


Lukas Beeler wrote:
On Thu, Jun 4, 2009 at 22:01, Lukas Beeler
<lukas.beeler@xxxxxxxxxxxxxxxx> wrote:
[cut] srcC9002967

Just got off the call with IBM. Turns out SI30387 can cause SCPF
to loop under certain conditions. Happened once before, but there
was insufficient data to escalate this into the country beyond
the big pond ;)

The conditions which can cause this are not known yet, and I'll
investigate the details with IBM as soon as possible. However,
since I already applied the same CD images to multiple systems
without fail, I'd guess it has to do with patching systems that
are wayyyy behind on PTFs.

What we did:

Powered down the machine by switching to manual, Option 8
Boot machine in B manual
Boot into restricted state
Looked at SCPF job (WRKJOB SCPF), which was looping

What would need to be reviewed for the job that had the problem is the QPJOBLOG for the old SCPF job that was active. That is, the new SCPF job would not be the looping job, because the active SCPF job is the one which /booted into restricted state/. IIRC both the IPL with the PTF activity and the /reboot/ IPL would be logged in the most recent SCPF QPJOBLOG. I would use WRKSPLF (QSYS *N *N SCPF) versus WRKJOB to find that QPJOBLOG spool. Then I would use WRKJOB OPTION(*SPLF) with the specific SCPF job number that was assigned for and recorded on that QPJOBLOG spool, in order to review whether any other spool files were produced in that job; e.g. any QPSRVDMP and\or QPDSPJOB spools that might occur for failures.
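A minimal sketch of those two requests, written out with keywords. The job number 123456 is only a placeholder for whatever number is recorded on the QPJOBLOG spool, and the SELECT values assume the spool is owned by QSYS with the job name SCPF as its user data:

```cl
/* List spooled files owned by QSYS whose user data is SCPF;      */
/* the QPJOBLOG of the earlier SCPF job should appear in the list */
WRKSPLF SELECT(QSYS *ALL *ALL SCPF)

/* 123456 is a placeholder: substitute the job number recorded    */
/* on the QPJOBLOG spool found above                              */
WRKJOB JOB(123456/QSYS/SCPF) OPTION(*SPLF)
```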

Looked at PTF with DSPPTF

If the PTF being applied had failed to apply [due to loop being terminated by pwrdwn\IPL], then the PTF presumably was identified as /damaged/ according to DSPPTF. What might be of most interest is the named /exit programs/ from the DSPPTF, as those would be most likely to effect a loop that exists both in SCPF and a user job during PTF apply processing; i.e. aside from some error in the PZ [the OS PTF processing] code itself.

(Tried manual apply - required a new LODPTF

LODPTF is required for recovery of a /damaged/ PTF; i.e. enables a new attempt at APYPTF or RMVPTF.
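As a rough sketch only, the reload and immediate apply might look like the following. The product ID 5722SS1 and the device OPT01 are assumptions, since neither was stated in this thread; substitute the product ID for the installed release and the device that actually holds the PTF images:

```cl
/* Assumed product ID and device; neither is given in the thread  */
LODPTF LICPGM(5722SS1) DEV(OPT01) SELECT(SI30387)

/* Attempt a temporary, immediate apply from an interactive job   */
APYPTF LICPGM(5722SS1) SELECT(SI30387) APY(*TEMP) DELAYED(*NO)
```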

- same loop, but could abort now)

That would have been during APYPTF [i.e. during the /manual apply/ noted, not during the LODPTF], for a loop that was equivalent to the [described as loop] activity in SCPF; I note this for clarity only. That result implies an /easy/ recreate is available, which makes it much easier to "investigate the details with IBM" than if the loop was seen only during the IPL; i.e. only in the SCPF job.

Then, disabled PTF apply by setting it not to apply
(using APYPTF).

After yet another LODPTF, I presume; i.e. the PTF was damaged again, this time by the SysRqs-2 used to effect ENDRQS to /abort/ the looping apply request? I expect RMVPTF would have been an option instead, and that an APYPTF used to reset for /not to apply/ would only be necessary after a prior APYPTF for *ALL PTFs, since a PTF which is only loaded [and not identified for apply] would not have any pending apply activity.

Started machine in B normal, rest of PTFs are being
applied right now

<<SNIP>>


If the LODPTF + APYPTF still recreate the /looping/ condition in a user job after the other PTFs are applied, a TRCJOB of that activity might be valuable to diagnose the origin of the loop.
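One possible shape for that trace, as a sketch only. The MAXSTG and TRCFULL values are my own choices rather than anything IBM requested, and I am assuming the trace remains active in the job after the looping request is ended:

```cl
/* Start tracing the current interactive job; wrap when full so   */
/* the most recent [repeating] activity of the loop is retained   */
TRCJOB SET(*ON) TRCTYPE(*ALL) MAXSTG(16000) TRCFULL(*WRAP)

/* ... reissue the LODPTF and APYPTF requests that recreate the   */
/* loop, then end the looping request with SysRqs-2 [ENDRQS] ...  */

/* Stop tracing; the trace data is written to a QPSRVTRC spool    */
TRCJOB SET(*OFF)
```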

Presumably the error is in a /PTF exit program/ that is improperly coded; e.g. a global MONMSG CPF0000 EXEC(GOTO CLEANUP) is coded, and a failing statement is coded after the CLEANUP: label without its own local MONMSG CPF0000, so each failure branches back to CLEANUP where the same statement fails again. Such an error could be specific to a system which, for example, does not have QTEMP in *LIBL, where the failing statement is a DLTxxx ZOBJECT and the ZOBJECT was not properly qualified with QTEMP; i.e. the defect is the failure to code DLTxxx QTEMP/ZOBJECT instead. In this case simply requesting the SysRqs-3 *PGMSTK would probably identify the failure, which is even easier than a TRCJOB.
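To illustrate, here is a hypothetical CLP in the corrected form; it is not the actual exit program of SI30387, which I have not seen. DLTDTAARA stands in for the unnamed DLTxxx, and ZOBJECT is just the illustrative name used above. Removing the local MONMSG and the QTEMP qualifier gives the looping variant: the delete fails, the global monitor branches to CLEANUP, and the delete fails again, indefinitely.

```cl
             PGM
             /* Global monitor: any unmonitored escape message    */
             /* branches to the CLEANUP label                     */
             MONMSG     MSGID(CPF0000) EXEC(GOTO CMDLBL(CLEANUP))

             /* Illustrative work: create a scratch object        */
             CRTDTAARA  DTAARA(QTEMP/ZOBJECT) TYPE(*CHAR) LEN(10)

 CLEANUP:
             /* Qualified with QTEMP so that the delete does not  */
             /* depend on QTEMP being in *LIBL                    */
             DLTDTAARA  DTAARA(QTEMP/ZOBJECT)
             /* Local monitor: a failure of the delete is ignored */
             /* here rather than handled by the global monitor,   */
             /* which would branch back to CLEANUP                */
             MONMSG     MSGID(CPF0000)
             ENDPGM
```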

A similar defect in a PTF exit program could be specific to SCPF, whereby the origin is a failure to code a global MONMSG CPF0000; in that case the default [e.g. CLP] exception handler would send an inquiry message to the job, to which no reply can be given. That condition is a HANG versus a LOOP.

Regards, Chuck


This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
