On 03 Jan 2013 10:00, Scott Schollenberger wrote:
On 02 Jan 2013 19:43, CRPence wrote:
What do the other SRC /words/ show as the source of the
srcB90036A0 which means "Failure to initiate critical
Word 11: B90036A0
Word 12: 0DB0008020AC0010, 2nd line 0001000100000000
Word 13: 0000000000000000, 2nd line 0000000000000000
The PTF you referred to would be on this server since it was on the
CUME 9321 and I have a CUME from 2011 on the system. I believe it
was the last V5R4 CUME issued. I had seen that APAR and PTF but
discounted it due to its reference to 1 TB of mainstore.
I've also done as the APAR circumvention mentions and let the server
sit on the DST screen before continuing the IPL but still got the
same SRC and at the same point, Directory Recovery.
My reference to that APAR was merely for descriptive purposes, as an
example, of how the "words" of the SRC are to be translated into
meaningful details; i.e. not as a recommendation, neither as a
preventive nor as a circumvention.
Lacking any [program name] details in the additional words [did the
available details stop at word 13?], I guess the best way to find the
source of the failure would be to get the formatted dump of the SCPF job
message queue. The VLogs might be telling as well, as to the origin of
the problem; e.g. they might identify an obvious cause from which a
resolution could be inferred.
Without any support available, and without a good understanding of how
to dig further, I might at this point attempt a reinstall [after copying
the MSD and vlogs\pals to media], with the expectation that the install
would resolve the origin of the failure; e.g. an important [possibly
executable] object that has gone missing, either physically or because
the address of that replaced\destroyed object, in the system EPT or in a
service program, was not properly updated to point to the new object.
That idea of course presumes the IPL failure is similar to the error for
which the pwrdwn\IPL sequence was initiated in the first place. An
object or a calculation that has exceeded some size limit is less likely
to be corrected by a reinstall, and thus an IPL for install might fail
in the same manner without corrective action or a preventive PTF.
FWiW, I believe the given data maps more accurately as:
Word 11: B90036A0 = Failure to initiate critical system job. (Sent
only from: QWCISTSJ)
Word 12a: 0DB00080 = ¿same data is shown in APAR SE33516; this is not
static for start system jobs QWCISTSJ per MA37982 showing 0D201080?
Word 12b: 20AC0010 = PSF20AC0010 phase 20AC \ subfunction 0010
Word 12c: 00010001 = ¿Function-->0001 *FC count-->0001?
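The split of word 12 shown above can be sketched in a few lines of
Python. This is only an illustration of the slicing, not anything taken
from the OS code: the function name split_src_word12 and the field
labels are made up here, and the meaning given to word 12c [function
and count] is the same guess as in the list above.

```python
HEX_DIGITS = set("0123456789ABCDEF")


def split_src_word12(line1, line2):
    """Split the two 16-hex-digit lines of SRC word 12 into 8-digit fields.

    The field names are hypothetical labels that follow the post; the
    function/count reading of word 12c is a guess, not documented fact.
    """
    line1 = line1.strip().upper()
    line2 = line2.strip().upper()
    for line in (line1, line2):
        if len(line) != 16 or not set(line) <= HEX_DIGITS:
            raise ValueError("expected 16 hex digits, got %r" % line)

    word12a = line1[0:8]   # first half of the first line
    word12b = line1[8:16]  # second half of the first line
    word12c = line2[0:8]   # first half of the second line

    return {
        "word12a": word12a,
        "word12b": word12b,
        "phase": word12b[0:4],        # PSF phase
        "subfunction": word12b[4:8],  # PSF subfunction
        "word12c": word12c,
        "function": word12c[0:4],     # guessed meaning
        "count": word12c[4:8],        # guessed meaning
    }


fields = split_src_word12("0DB0008020AC0010", "0001000100000000")
for name, value in fields.items():
    print(name, "=", value)
```

Fed the two lines of Word 12 as posted, this gives 0DB00080, 20AC0010
[phase 20AC, subfunction 0010] and 00010001, the same values as in the
list.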
FWiW: I was familiar with [or recalling] the use of the prefix PSF as
the designator in the Phase\Sub-Function symptom kwd string, versus the
prefix SRC for the additional word of an IPL failure in APAR text. So I
have since searched also on SRC0DB00080 and found SE25335, which perhaps
provides a better template for comparison, although it clearly
identifies QWCISCFR instead of QWCISTSJ, which I had presumed to be more
accurate. Difficult to determine without seeing the OS code\listing.
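For anyone repeating the search, the symptom keyword strings are just a
prefix joined to the 8-digit value with no separator. A small Python
sketch follows; the helper name symptom_keywords is made up here, and
whether a given APAR actually carries either keyword form is not
something this can tell you.

```python
def symptom_keywords(word11, word12a, word12b):
    """Build search strings in the SRC and PSF keyword styles.

    SRC<word> is the form used for the SRC words in APAR text;
    PSF<word> is the Phase\\Sub-Function form. Hypothetical helper.
    """
    return [
        "SRC" + word11.upper(),
        "SRC" + word12a.upper(),
        "PSF" + word12b.upper(),
    ]


keywords = symptom_keywords("B90036A0", "0DB00080", "20AC0010")
print(keywords)
```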
Hmm. Looking at this had me recall closing an APAR against the IPL
status, for incorrectly recording some database activity in the wrong
SRC. Makes me wonder if the SRCC9002AA5 Directory Recovery was that
place... and perhaps the work should have occurred under SRCC9002AA0 or
??. May be worth destroying [or damaging some of] the QDB* objects in
the QRECOVERY library... though best only if the SCPF joblog or VLogs
could implicate the database recovery as being problematic.