Re: RCLSTG -- MIDRANGE-L

On 2/18/11 1:22 PM, rob@xxxxxxxxx wrote:

IBM is really discouraging me from running a full RCLSTG as a
general habit. Every other month, when we perform scheduled
maintenance, we run the full RCLSTG.

As they should. A RCLSTG should not be performed as regular maintenance, only as recovery from [termination and error] scenarios, or for effects that are known, or likely, to be corrected by performing the Reclaim Storage request. Most functions of RCLSTG SELECT(*ALL) can be effected by alternate means; the main exception is recovering storage for objects that are no longer addressable via their respective DLTxxx command, and growth in the /reclaimable/ space portion of the PRTDSKINF report should identify when that is needed.

Although there was talk of "reclaim user" support to recover addressability to some owned but unaddressable objects, and maybe it even exists as a command [I found it: RCLOBJOWN], there were restrictions for database objects that AFAIK were never resolved, so no support was added. Hmm, OK, now I remember. The primary requirements were that the startup phase of a full RCLSTG had run first, which also requires restricted state; those requirements were apparently implemented in the Reclaim Objects by Owner command processing. So only those unaddressable objects that are also unowned cannot be recovered without RCLSTG.
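To track that growth without running a reclaim, the disk information can be collected periodically and the printed reports compared. A minimal sketch, assuming the standard RTVDSKINF and PRTDSKINF commands; the batch job name is arbitrary, and which lines of the system summary to compare from run to run is a judgment call:

```cl
/* Collect current disk information. RTVDSKINF can run a long time, */
/* so submit it to batch. The job name is arbitrary.                */
SBMJOB CMD(RTVDSKINF) JOB(RTVDSKINF)

/* After that job completes, print the system summary report and    */
/* compare it against the report kept from an earlier run.          */
PRTDSKINF RPTTYPE(*SYS)
```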

Apparently if that has any abnormalities then the part of it that
does a RCLSTG *DBXREF may not behave and you could get results like:

There is no specific evidence that one thing led to the other. Running the full reclaim with OMIT(*DBXREF) first would seem to me to have been a better choice, or else simply not running the separate RCLSTG SELECT(*DBXREF); admittedly, for this scenario I expect the locking error would have persisted either way.

02/11/11 20:55 - RCLSTG with no options
02/11/11 21:05 - MCH3402, CPF9999
02/11/11 21:07 - ..
02/11/11 21:10 - ..
02/11/11 21:20 - ..

Such errors can be "normal" for objects affected by abnormal terminations; e.g. for "interrupted" operations against those objects. Of course, a bare message identifier provides no context about the object for which the condition was encountered, so no inference can be made about whether, or how, innocuous they may be.
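One way to recover that context is to list the logged messages with their full text, which includes replacement data such as object names. A sketch, assuming the messages were also sent to the history log QHST and that the job date format is MDY; the period comes from the quoted timeline. If the messages appear only in the reclaim job's own log, that job log is the place to look instead.

```cl
/* Show selected messages logged to QHST between 20:55 on 02/11/11  */
/* and 06:56 on 02/12/11 (MDY dates, hhmmss times).                 */
DSPLOG LOG(QHST) PERIOD((205500 021111) (065600 021211)) +
       MSGID(MCH3402 MCH3603 CPF3698 CPF9999) OUTPUT(*PRINT)
```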

02/11/11 22:24 - MCH2601, CPF9999, SYSTXTINDXSYSTE00001

This is most definitely a problem. I recall that such errors often originated from a poor implementation of the support for SYSIBM: after an IPL applying a PTF with changes to SYSIBM, the PTF exit processing improperly leaves locks, either as a side effect of a defect or as a side effect of the means used to get the changes applied outside of the PTF process [because SQL is not fully functional at IPL due to a design deficiency; somewhat of a chicken\egg scenario]. Find what and where the SYSTXTINDX *FILE object is; if it exists, determine its creation and change date\timestamp and which process holds the lock [likely the QDBSRVXR2 job], and then review the joblog and spool files [dumps and DSPJOB output may have been logged] for that job.
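The investigation just described might look like this from a command line. A sketch only: LIBNAME is a hypothetical placeholder for whatever library DSPOBJD reports, and QDBSRVXR2 is merely the likely lock holder, not a confirmed one.

```cl
/* 1. Find every library holding a *FILE named SYSTXTINDX and show   */
/*    its creation and change date/time.                              */
DSPOBJD OBJ(*ALL/SYSTXTINDX) OBJTYPE(*FILE) DETAIL(*FULL)

/* 2. Show which jobs hold locks on it. LIBNAME is a placeholder;     */
/*    substitute the library reported by DSPOBJD.                     */
WRKOBJLCK OBJ(LIBNAME/SYSTXTINDX) OBJTYPE(*FILE)

/* 3. Work with the likely lock holder to review its job log and      */
/*    spooled files (a selection list appears if several match).      */
WRKJOB JOB(QDBSRVXR2)
```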

I seem to recall some strange design decisions for the handling of the SQL catalog TABLE objects for\after a RCLSTG; perhaps only with iASP. I am not sure if, or what about, that might be an issue. But there is a PTF SI36148 "OSP-DB-OTHER-UNPRED QSYS2 VIEW GET PUBLIC *EXCLUDE AFTER RCL" for SE38909 that may imply that the QSQSYSIBM [SYSIBM creation\] exit routine is being invoked as part of RCLSTG.
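Whether that PTF is already on the system can be checked directly. A sketch, assuming IBM i 6.1 (product ID 5761SS1); SI36148 may have been built for a different release, in which case the product ID and PTF number would need to match the installed release instead.

```cl
/* Display status for PTF SI36148 only. The product ID 5761SS1 is an */
/* assumption; use the base OS product ID of the installed release.  */
DSPPTF LICPGM(5761SS1) SELECT(SI36148)
```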

02/11/11 22:36 - MCH3402, CPF9999
02/11/11 22:51 - ..
02/11/11 22:59 - ..
02/11/11 23:02 - ..
02/11/11 23:03 - ..
02/11/11 23:06 - MCH3603, CPF3698, CPF9999

That error is an indication of a horrible, potentially disastrous situation. An MCH3603 during the reclaim suggests that the reclaim attempted to process [possibly to destroy; again, no context for the message was given] an object [which was dumped, according to CPF3698] from a list of objects, but that object was not of the type the object handler expected. Scary. Imagine instead that the object type was valid but it was still the wrong object [implicit from "wrong type"], and that the request had been to destroy the object; e.g. the object is a *USRSPC and the *FILE object handler issued a DESS [destroy space], so that object has since been deleted... Seriously... Not good!

02/11/11 23:38 - MCH3402, CPF9999
02/12/11 00:11 - ..
02/12/11 01:00 - ..
02/12/11 01:31 - MCH3603, CPF3698, CPF9999
02/12/11 01:57 - MCH3402, CPF9999
02/12/11 02:02 - MCH3603, CPF3698, CPF9999
02/12/11 02:24 - MCH3402, CPF9999
02/12/11 02:37 - ..
02/12/11 02:38 - ..
02/12/11 02:46 - ..
02/12/11 02:53 - MCH3603, CPF3698, CPF9999
02/12/11 03:00 - MCH3402, CPF9999
02/12/11 03:45 - CPC2206, CPF327E, CPF7304, SQL0601, MCH2601, CPF2499,
CEE3201, CEE9901

This possibly relates to the earlier failure, though without the message text the object name is unknown. Presumably a database *FILE object had no owner and that was corrected, although the failure to rename seems suspect; perhaps that was a move [e.g. into QRCL] and the object already existed there, in which case the errors up to the MCH2601 would likely be normal.
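Whether the reclaim did move such an object can be checked by looking in the recovery library. A minimal sketch, assuming the objects in question landed in QRCL, the library a reclaim uses for recovered objects that had no library:

```cl
/* List everything the reclaim placed into the recovery library.     */
DSPLIB LIB(QRCL)

/* Narrow the list to database *FILE objects only.                   */
WRKOBJ OBJ(QRCL/*ALL) OBJTYPE(*FILE)
```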

02/12/11 06:53 - CPCA08C, CPCA08C, CPCA08C, CPCA08C, CPC2192, ...
02/12/11 06:56 - CPC8208, RCLSTG processing complete. 2807086 objects
processed. 13 deleted.

Given the several MCH3603 errors, hopefully the objects that were deleted were deleted intentionally rather than accidentally. I think that somewhere between v5r2 and v5r4 I worked with the reclaim developer to correct a problem with the same symptom, and I would expect that change to have made it into the base of release v6r1; so this is perhaps the same error, or perhaps something similar with a different origin.

02/12/11 06:56 - SAVSYS
02/12/11 07:04 - SAVLIB LIB(*NONSYS)
02/12/11 07:04 - MCH2601, SYSTXTINDXSYSTE00001, CPF3741
02/12/11 07:05 - 259 objects saved from QSYS2. 1 not saved.
02/12/11 11:14 - 761 libraries saved, 1 partially saved, 0 not saved.

This seems to be the same problem as in the full reclaim. It makes sense that the problem would remain, because a lock held [breaking protocol] cannot be resolved by the reclaim. Since the lock is likely held by a system job, likely the only recovery from the bogus lock [aside from patching] is to IPL. Because of the error, some information in the *DBXREF for a file SYSTXTINDX in whatever library is possibly not entirely correct; though for an error on the *MEM or a *QDIDX versus the *FILE, ...
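If the lock holder does turn out to be a system job, the IPL mentioned above could be performed as follows. A sketch; the 600-second controlled-end delay is an arbitrary value chosen for illustration.

```cl
/* End the system in a controlled manner, allowing up to 600 seconds */
/* for active jobs to end, then restart it (IPL).                    */
PWRDWNSYS OPTION(*CNTRLD) DELAY(600) RESTART(*YES)
```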

Apparently running a RCLSTG *DBXREF after a full RCLSTG is not a bad
idea. Even though the full RCLSTG does a RCLSTG *DBXREF. Just the
*DBXREF runs in a short amount of time.

IMO not a "good idea", unless performing the same work twice via two distinct code paths is somehow "good". IMO the best choice is to run RCLSTG SELECT(*ALL) OMIT(*DBXREF) followed by RCLSTG SELECT(*DBXREF). That way, canceling the first reclaim has no ramifications for the *DBXREF, and so there is effectively no more need to refresh the *DBXREF than existed before the first reclaim. By contrast, an interrupted reclaim that includes the *DBXREF effectively requires recovery by RCLSTG SELECT(*DBXREF) before any definitional activity involving triggers, long names, IDDU, and probably some other features can be done.
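The two-step sequence suggested above could be run from the system console roughly as follows. A sketch: ending all subsystems to reach restricted state is the usual prerequisite for a full reclaim, and QCTL as the controlling subsystem is an assumption (the QCTLSBSD system value names the actual one).

```cl
/* Bring the system to restricted state (run from the console).      */
ENDSBS SBS(*ALL) OPTION(*IMMED)

/* Full reclaim, leaving the database cross-reference alone, so that */
/* canceling this step does not leave the *DBXREF needing repair.    */
RCLSTG SELECT(*ALL) OMIT(*DBXREF)

/* Then reclaim only the database cross-reference information.       */
RCLSTG SELECT(*DBXREF)

/* Restart the controlling subsystem; QCTL is an assumption, check   */
/* the QCTLSBSD system value for the actual controlling subsystem.   */
STRSBS SBSD(QCTL)
```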

Can't get a feel from them if signing off/on may clear some of the
messages and affect the *DBXREF, like, maybe some programmer didn't
handle memory allocations well or some such thing.

Signing off and then on is unlikely to assist for the quoted scenario.

I have a PMR on this, but I don't think it's going anywhere.

Ugh. :-(

Regards, Chuck