On 24 Apr 2012 05:13, Jerry C. Adams wrote:
I was in the process of looking through the job logs and history
yesterday when I had to leave (they kick me out at 1430!). But, as
you surmised, the ones I have looked at reveal nothing; even the one
that applied to my sessions.
I do not recall for certain, it having been too long since I looked at
such an incident, but I think the history will show what are effectively
"job ended" messages for the jobs that were active when the system went
down. However, those messages will appear in the history _after_ the
system started the IPL; i.e. after the starts of some of the system jobs
are recorded in the history. I believe the timestamp of the joblogs will
also be from during the IPL; I think the start of QSPLMAINT is what
precedes the spooling of /incomplete jobs/ from prior to the system
crash. I would also do a WRKSPLF (*ALL *ALL *ALL *ALL), generally with
OUTPUT(*PRINT), to review for any [esp. many] dump spools from just
before the apparent hang; e.g. a job looping may have produced a job
dump for each iteration, or maybe just one error and then its attempt to
There was about an hour gap, which stopped when I started the inquiry
of the error message I was investigating and started up when
everything started shutting down.
A gap from before both the F1\F10 when the system appeared to hang
and before the forced IPL? The VLogs from before the forced IPL would
likely be of interest; as I recall the descriptive text sometimes makes
the error [versus just diagnostic] logging somewhat conspicuous, and the
really bad ones often include a process dump which includes the full job
name for which WRKJOB JOB(name_from_dump) OPTION(*SPLF) might reveal
QPSRVDMP, QPDSPJOB, etc.
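
As a rough sketch of the history and spool checks described above, the
requests might look something like the following. The qualified job
name and the time\date values are hypothetical placeholders, not taken
from the actual incident, and the date on PERIOD must be entered in the
date format of the job issuing the request (MDY is assumed here).

```cl
/* Print the history log (QHST) for a window around the apparent hang. */
/* The times and the date are hypothetical; MDY format is assumed.     */
DSPLOG LOG(QHST) PERIOD((130000 042312) (150000 042312)) OUTPUT(*PRINT)

/* List every spooled file on the system to a printed report, to scan  */
/* for dump spools created just before the hang.                       */
WRKSPLF SELECT(*ALL *ALL *ALL *ALL) OUTPUT(*PRINT)

/* Show the spooled files of one job named in a process dump.          */
/* 123456/SOMEUSER/SOMEJOB is a made-up qualified job name.            */
WRKJOB JOB(123456/SOMEUSER/SOMEJOB) OPTION(*SPLF)
```

The printed WRKSPLF list can be large on a busy system, so narrowing
the SELECT elements may be preferable once a suspect user or job is
known.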
There is also WRKPRB data, which if retained might have some
diagnostic data for an issue that was identified before the system had
FWiW, being v5r1, I would keep an eye out for indications of restore
and\or index rebuild, or access path invalidation. The QDBSRV01 job
runs at a better priority default than the console IIRC, and a problem
in that job, manifesting as a loop, would likely appear similar to what
was described. The EDTRCYAP may be worth a review for possible adjustment,
and the EDTRBDAP screen should have appeared on the manual IPL; though
if the event handling job were looping instead of properly
processing, no AccPth would necessarily ever be added to the list.
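
For reference, a minimal sketch of how the reviews mentioned above
might be requested from a command line. Nothing here is specific to the
incident; the commands are shown with no parameters or with defaults,
and whether any useful data remains depends on what the system retained.

```cl
/* Review any problem log entries the system may have recorded.        */
WRKPRB

/* Review (and possibly adjust) the access path recovery times.        */
EDTRCYAP

/* Review the list of access paths awaiting rebuild, if any.           */
EDTRBDAP

/* Look at the database server job; if more than one job has this      */
/* name, a selection list is presented.                                */
WRKJOB JOB(QDBSRV01)
```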
By the way, I know that, when the system registers an abnormal end
(QABNORMSW = 1), this means that two IPLs will be necessary when
applying PTFs. But at V5R1 (sigh) I don't think that's going to
happen. Is there any other downside to leaving the system in this
state, or should I re-IPL to get it back to normal?
The QABNORMSW is just historical, now that the IPL is done. Any
impact the switch had was during that IPL; e.g. if PTFs had been
scheduled to apply, but perhaps were not, due to any caution\concern by
the PTF handler [PZ component]. Any code that reviews the status should
understand the system is not IPLing [when the value is truly pertinent],
and treat the value as merely historical. The spooled SCPF joblog [not
the currently active SCPF joblog], produced at the end of the IPL, would
record any PTF activity. WRKSPLF (QSYS *ALL *ALL SCPF) is how I would
locate the spooled QPJOBLOG from the most recent IPL.
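
A small sketch of checking the value and locating that joblog, offered
only as an illustration. The variable name &ABNORM and the message text
are made up for the example; QABNORMSW is returned as a one-character
value of '0' or '1'.

```cl
/* Interactively: show the current value of the system value.          */
DSPSYSVAL SYSVAL(QABNORMSW)

/* Interactively: list the spooled SCPF joblog(s) owned by QSYS.       */
WRKSPLF SELECT(QSYS *ALL *ALL SCPF)

/* In a CL program: retrieve the value and report it, treating it as   */
/* history of the previous system end rather than as current state.    */
PGM
  DCL VAR(&ABNORM) TYPE(*CHAR) LEN(1)
  RTVSYSVAL SYSVAL(QABNORMSW) RTNVAR(&ABNORM)
  IF COND(&ABNORM *EQ '1') THEN(SNDPGMMSG +
       MSG('The previous end of system was abnormal.'))
ENDPGM
```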