Re: SRC B90036A0 during OS/400 IPL - Directory Recovery step -- MIDRANGE-L

On 03 Jan 2013 10:00, Scott Schollenberger wrote:

On 02 Jan 2013 19:43, CRPence wrote:

What do the other SRC /words/ show as the source of the
SRC B90036A0, which means "Failure to initiate critical
system job"?

Word 11: B90036A0
Word 12: 0DB0008020AC0010, 2nd line 0001000100000000
Word 13: 0000000000000000, 2nd line 0000000000000000

The PTF you referred to would be on this server, since it was on
CUME 9321 and I have a CUME from 2011 on the system. I believe it
was the last V5R4 CUME issued. I had seen that APAR and PTF but
discounted it due to its reference to 1 TB of main storage.

I've also done what the APAR circumvention mentions and let the server
sit at the DST screen before continuing the IPL, but still got the
same SRC at the same point, Directory Recovery.

My reference to that APAR was merely for descriptive purposes, as an example, of how the "words" of the SRC are to be translated into meaningful details; i.e. not as a recommendation, neither as a preventive nor as a circumvention.

Given the lack of [program name] details in the additional words [did the available details stop at word 13?], I guess the best way to find the source of the failure would be to get the formatted dump of the SCPF job message queue. The VLogs might be telling as well about the origin of the problem; e.g. they might identify something that is an obvious cause, from which a resolution could be inferred.

Without any support available, or a good understanding of how to dig further, I might at this point attempt [after copying the MSD and vlogs\pals to media] to reinstall, with the expectation that the install would resolve the origin of the failure; e.g. an important [possibly executable] object that has gone missing, either physically or because the system EPT still addresses the replaced\destroyed object, or a service program whose pointer was not properly updated to the new object. That idea of course presumes the IPL failure is similar to the error for which the pwrdwn\IPL sequence was initiated in the first place. An object or a calculation whose size limit has been exceeded is less likely to be corrected by a reinstall, and thus an IPL for install might fail in the same manner without corrective action or a preventive PTF.

FWiW, here is the given data mapped, I believe, more accurately:
Word 11: B90036A0 = Failure to initiate critical system job. (Sent only from: QWCISTSJ)
Word 12a: 0DB00080 = ¿same data is shown in APAR SE33516; this is not static for start system jobs QWCISTSJ per MA37982 showing 0D201080?
Word 12b: 20AC0010 = PSF20AC0010 phase 20AC \ subfunction 0010
Word 12c: 00010001 = ¿Function-->0001 *FC count-->0001?
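To make the mapping above concrete, here is a minimal Python sketch that splits the reported 16-digit words into their 8-digit halves and labels the pieces per the tentative interpretation above. The function names and labels are hypothetical, and the phase\subfunction and function\count meanings are only the guesses shown above (those marked ¿?), not a documented IBM format. It also builds the PSF and SRC symptom keyword strings (e.g. PSF20AC0010, SRC0DB00080) that one might search for in APAR text.

```python
# Hypothetical helper: splits SRC words as reported in this thread and labels
# the pieces per the tentative mapping in this post (NOT an IBM format spec).

def split_word(hex_word: str) -> tuple[str, str]:
    """Split a 16-hex-digit SRC word into its two 8-digit halves."""
    hex_word = hex_word.strip().upper()
    if len(hex_word) != 16:
        raise ValueError(f"expected 16 hex digits, got {hex_word!r}")
    int(hex_word, 16)  # raises ValueError if the word is not hexadecimal
    return hex_word[:8], hex_word[8:]


def map_word12(line1: str, line2: str) -> dict[str, str]:
    """Label word 12 (both display lines) per the guesses above."""
    w12a, w12b = split_word(line1)
    w12c, _ = split_word(line2)  # second half of line 2 is all zeros here
    return {
        "12a": w12a,
        "12b": w12b,
        "phase": w12b[:4],             # assumed: phase 20AC
        "subfunction": w12b[4:],       # assumed: subfunction 0010
        "12c": w12c,
        "function_guess": w12c[:4],    # uncertain, per the ¿...? above
        "fc_count_guess": w12c[4:],    # uncertain, per the ¿...? above
        "psf_keyword": "PSF" + w12b,   # search string for APAR text
        "src_keyword": "SRC" + w12a,   # search string for APAR text
    }


if __name__ == "__main__":
    for key, value in map_word12("0DB0008020AC0010", "0001000100000000").items():
        print(f"{key:15} {value}")
```

Only the splitting is mechanical; which half carries which meaning still depends on the module that sent the SRC, and that is why the comparison against APAR text matters.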

FWiW: I was familiar with [or recalling] the use of the prefix PSF as the designator for the Phase\Sub-Function symptom keyword string, versus the prefix SRC for the additional word for an IPL failure in APAR text. So I have since also searched on SRC0DB00080 and found SE25335, which perhaps provides a better template for comparison, although it clearly identifies QWCISCFR instead of QWCISTSJ, which I had presumed to be more accurate. That is difficult to determine without seeing the OS code\listing.

Hmm. Looking at this had me recalling closing an APAR against the IPL status for incorrectly recording some database activity under the wrong SRC. Makes me wonder if SRCC9002AA5, Directory Recovery, was that place... and perhaps the work should have occurred under SRCC9002AA0 or ??. It may be worth destroying [or damaging some of] the QDB* objects in the QRECOVERY library... though that is best done only if the SCPF joblog or VLogs implicate the database recovery as being problematic.