Lukas Beeler wrote:
On Thu, Jun 4, 2009 at 22:01, Lukas Beeler
<lukas.beeler@xxxxxxxxxxxxxxxx> wrote:
> [cut] SRC C9002967
> Just got off the call with IBM. Turns out SI30387 can cause SCPF
> to loop under certain conditions. It happened once before, but there
> was insufficient data to escalate this to the country beyond
> the big pond ;)
> The conditions that can cause this are not known yet, and I'll
> investigate the details with IBM as soon as possible. However,
> since I have already applied the same CD images to multiple systems
> without a failure, I'd guess it has to do with patching systems that
> are way behind on PTFs.
> What we did:
> Shut down the machine by switching to manual, Option 8
> Booted the machine in B manual
> Booted into restricted state
> Looked at the SCPF job (WRKJOB SCPF), which was looping
What would need to be reviewed for the job that had the
problem is the QPJOBLOG for the old SCPF job that was active.
That is, the new SCPF job would not be the looping job, because
the active SCPF job is the one which /booted into restricted state/.
IIRC both the IPL with the PTF activity and the /reboot/ IPL would
be logged in the most recent SCPF QPJOBLOG. I would use WRKSPLF
(QSYS *N *N SCPF) versus WRKJOB to find that QPJOBLOG spool, and
then WRKJOB OPTION(*SPLF) using the specific SCPF job number that
was assigned to, and recorded on, that QPJOBLOG spool, in order to
see whether any other spool files were produced in that job; e.g. any
QPSRVDMP and/or QPDSPJOB spools that might be produced for failures.
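A rough sketch of that sequence, with the parameter keywords spelled
out. The job number nnnnnn is only a placeholder for whatever number
is recorded on the QPJOBLOG spool; it is not taken from the system
in question.

```cl
/* List spools owned by QSYS whose user data is SCPF;         */
/* *N takes the default for the device and form type elements */
WRKSPLF SELECT(QSYS *N *N SCPF)

/* nnnnnn is a placeholder: use the job number shown on the   */
/* QPJOBLOG spool to list every spool file that job produced  */
WRKJOB JOB(nnnnnn/QSYS/SCPF) OPTION(*SPLF)
```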
> Looked at PTF with DSPPTF
If the PTF being applied had failed to apply [due to the loop being
terminated by the pwrdwn/IPL], then the PTF presumably was identified
as /damaged/ according to DSPPTF. What might be of most interest are
the named /exit programs/ from the DSPPTF, as those would be most
likely to cause a loop that occurs both in SCPF and in a user job
during PTF apply processing; i.e. aside from some error in the PZ
[the OS PTF processing] code itself.
> (Tried manual apply - required a new LODPTF
LODPTF is required for recovery of a /damaged/ PTF; i.e. it enables
a new attempt at APYPTF or RMVPTF.
> - same loop, but could abort now)
For clarity only: that would have been during the APYPTF [i.e.
during the /manual apply/ noted, not during the LODPTF], with a loop
equivalent to the activity described as a loop in SCPF.
That result implies an /easy/ recreate is available, which makes it
much easier to "investigate the details with IBM" than if the loop
had been seen only during the IPL; i.e. only in the SCPF job.
> Then, disabled PTF apply by setting it not to apply
> (using APYPTF).
After yet another LODPTF, I presume; i.e. the PTF was damaged
again, this time by the SysRqs-2 used to effect ENDRQS to /abort/
the looping apply request? I expect RMVPTF would have been an option
instead, and that using APYPTF to reset for /not to apply/ would only
be necessary after a prior APYPTF for *ALL PTFs, since a PTF which is
only loaded [and not identified for apply] would not have any
pending apply activity.
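A sketch of the recovery sequence I am presuming. The product ID
5761SS1 and the device name OPT01 are assumptions for illustration
only; substitute the licensed program the PTF actually belongs to and
the device actually used.

```cl
/* Reload the /damaged/ PTF so that apply or remove is possible again */
/* (5761SS1 and OPT01 are assumed values, not from the original post) */
LODPTF LICPGM(5761SS1) DEV(OPT01) SELECT(SI30387)

/* Cancel any pending apply of the PTF at the next unattended IPL */
APYPTF LICPGM(5761SS1) SELECT(SI30387) DELAYED(*YES) IPLAPY(*NO)

/* The alternative: remove the loaded, never-applied PTF instead */
RMVPTF LICPGM(5761SS1) SELECT(SI30387) RMV(*PERM)
```

Only one of the APYPTF or RMVPTF requests would be needed, not both.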
> Started machine in B normal, rest of PTFs are being
> applied right now
<<SNIP>>
If the LODPTF + APYPTF still recreate the /looping/ condition in
a user job after the other PTFs are applied, a TRCJOB of that
activity might be valuable to diagnose the origin of the loop.
Presumably the error is in a /PTF exit program/ that is
improperly coded; e.g. a global MONMSG CPF0000 EXEC(GOTO CLEANUP) is
coded, and a failing statement is coded after the CLEANUP: label
which does not have its own local MONMSG CPF0000 coded. Such an error
could be specific to a system which, for example, does not have
QTEMP in *LIBL, where the failing statement is a DLTxxx ZOBJECT in
which ZOBJECT was not properly qualified with QTEMP; i.e. the defect
is a failure to code DLTxxx QTEMP/ZOBJECT instead. In this case simply
requesting the SysRqs-3 *PGMSTK would probably identify the failure,
which is even easier than a TRCJOB.
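To make that presumed defect concrete, here is a hypothetical CLP
fragment. It is not the actual exit program for SI30387; DLTF merely
stands in for the DLTxxx command, and ZOBJECT is the made-up name
from above.

```cl
PGM
  /* Global monitor: any escape message branches to CLEANUP */
  MONMSG MSGID(CPF0000) EXEC(GOTO CMDLBL(CLEANUP))

  /* ... normal exit program processing (omitted) ... */

CLEANUP:
  /* Unqualified name: fails when QTEMP is not in *LIBL. With no  */
  /* local MONMSG, the global monitor branches back to CLEANUP    */
  /* and the same failing statement repeats without end: a loop.  */
  DLTF FILE(ZOBJECT)
ENDPGM
```

The repair would be to qualify the object and monitor locally, so a
failed delete is simply ignored:

```cl
CLEANUP:
  DLTF FILE(QTEMP/ZOBJECT)
  MONMSG MSGID(CPF0000)   /* local monitor: no branch, just continue */
```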
A similar defect in a PTF exit program could be specific to SCPF,
where the origin is a failure to code a global MONMSG CPF0000, in
which case the default [e.g. CLP] exception handler would send an
inquiry message to the job, for which no reply can be given. That
condition is a HANG rather than a LOOP.
Regards, Chuck