On 2/18/11 1:22 PM, rob@xxxxxxxxx wrote:
IBM is really discouraging me from running a full RCLSTG as a
general habit. Every other month, when we perform scheduled
maintenance, we run the full RCLSTG.
As they should. A RCLSTG should not be performed as regular
maintenance, only as recovery from [termination and error] scenarios or
various effects that are known or are likely to be corrected by having
performed the Reclaim Storage request. Many functions of the RCLSTG
SELECT(*ALL) can be effected by alternate means; at least most things
other than recovering storage for objects that are not addressable via
their respective DLT command, and growth in the /reclaimable/ space
portion of the PRTDSKINF report should identify that. Although there
was talk of "reclaim user" support to recover addressability to some
owned but unaddressable objects, and maybe it even exists as a command
[I found it: RCLOBJOWN], there were restrictions for database objects
that AFAIK were never resolved and so no support was added. Hmmm, OK I
remember. The primary requirements were that the startup phase of a
full RCLSTG ran first, which also requires the restricted state; those
were apparently implemented for the Reclaim Objects by Owner command
processing. So only the unowned of the unaddressable objects could not
be recovered without the use of RCLSTG.
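One way to watch the reclaimable-space figure mentioned above, sketched
in CL. Submitting the collection to batch and the job name are
assumptions about how it would be run; nothing here comes from the
original poster's system.

```cl
/* Collect disk space information; this can run long, so it is         */
/* submitted to batch here (an assumption, not a requirement).         */
SBMJOB CMD(RTVDSKINF) JOB(RTVDSKINF)

/* After the collection job completes, print the system-level report,  */
/* which includes the storage that a RCLSTG could affect.              */
PRTDSKINF RPTTYPE(*SYS)
```

Comparing that figure across successive reports is what would show
growth, rather than any single value.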
Apparently if that has any abnormalities then the part of it that
does a RCLSTG *DBXREF may not behave and you could get results like:
There is no specific evidence of one thing leading to another.
Having first done OMIT(*DBXREF) would seem to me to have been a better
choice, or just omit the RCLSTG SELECT(*DBXREF); admittedly for this
scenario, I expect the locking error would have persisted either way.
02/11/11 20:55 - RCLSTG with no options
02/11/11 21:05 - MCH3402, CPF9999
02/11/11 21:07 - ..
02/11/11 21:10 - ..
02/11/11 21:20 - ..
Such errors can be "normal" for objects impacted by terminations;
e.g. for "interrupted" operations against objects. Of course, a bare
message identifier provides no context for the condition that was
encountered; no inference can be made about whether or how innocuous
the errors may be.
02/11/11 22:24 - MCH2601, CPF9999, SYSTXTINDXSYSTE00001
This is most definitely a problem. I recall such errors had often
originated from poor implementation of support for SYSIBM; that after
an IPL applying a PTF with changes to SYSIBM, the PTF exit processing
improperly leaves locks, either as a side effect of a defect or as a side
effect of the means of implementation to get the changes applied outside
of the PTF process [because SQL is not fully functional at IPL due to a
design deficiency; somewhat of a chicken\egg scenario]. Find the what and
where of the SYSTXTINDX *FILE object, and if it exists determine its
creation and change date\timestamp, and what process holds the lock
[likely the QDBSRVXR2 job], and then review the joblog and spool files
[dumps and dspjob may have been logged] for that job.
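A hedged sketch of that investigation in CL. The library QSYS2 is an
assumption, inferred only from the save log reporting one object not
saved from QSYS2; the user QSYS for the QDBSRVXR2 job is also an
assumption, and nnnnnn is a placeholder for the job number to be taken
from the lock display.

```cl
/* Locate every *FILE named SYSTXTINDX and show its full description,  */
/* including creation and change date and time.                        */
DSPOBJD OBJ(*ALL/SYSTXTINDX) OBJTYPE(*FILE) DETAIL(*FULL)

/* Show which jobs hold locks on the file (QSYS2 is an assumption).    */
WRKOBJLCK OBJ(QSYS2/SYSTXTINDX) OBJTYPE(*FILE)

/* Review the joblog of the lock holder; nnnnnn is a placeholder.      */
DSPJOBLOG JOB(nnnnnn/QSYS/QDBSRVXR2)

/* Review the spooled files [dumps, dspjob output] for the same job.   */
WRKJOB JOB(nnnnnn/QSYS/QDBSRVXR2) OPTION(*SPLF)
```

If the lock is on the member rather than the file, the same WRKOBJLCK
request with MBR(*ALL) may be needed to see the holder.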
I seem to recall some strange design decisions for the handling of
the SQL catalog TABLE objects for\after a RCLSTG; perhaps only with
iASP. Not sure if\what about that might be an issue. But there is a
PTF SI36148 "OSP-DB-OTHER-UNPRED QSYS2 VIEW GET PUBLIC *EXCLUDE AFTER
RCL" for SE38909 that may imply that QSQSYSIBM [SYSIBM creation\] exit
routine is being invoked as part of RCLSTG.
02/11/11 22:36 - MCH3402, CPF9999
02/11/11 22:51 - ..
02/11/11 22:59 - ..
02/11/11 23:02 - ..
02/11/11 23:03 - ..
02/11/11 23:06 - MCH3603, CPF3698, CPF9999
That error is an indication of a horrible, potentially disastrous
situation. A msgMCH3603 during the reclaim suggests that the reclaim
attempted to process [possibly to destroy; again, no context for the
message was given] an object [which was dumped, according to CPF3698]
from a list of objects, but that object was not the object type that the
object handler thought it must have been. Scary. Imagine if instead
the object type was valid, but still the wrong object [implicitly
from "wrong type"], and that the request had been to destroy the object;
e.g. object is a *USRSPC and the *FILE object handler issued a DESS
[destroy space], thus that object was since deleted... Seriously... Not
good!
02/11/11 23:38 - MCH3402, CPF9999
02/12/11 00:11 - ..
02/12/11 01:00 - ..
02/12/11 01:31 - MCH3603, CPF3698, CPF9999
02/12/11 01:57 - MCH3402, CPF9999
02/12/11 02:02 - MCH3603, CPF3698, CPF9999
02/12/11 02:24 - MCH3402, CPF9999
02/12/11 02:37 - ..
02/12/11 02:38 - ..
02/12/11 02:46 - ..
02/12/11 02:53 - MCH3603, CPF3698, CPF9999
02/12/11 03:00 - MCH3402, CPF9999
02/12/11 03:45 - CPC2206, CPF327E, CPF7304, SQL0601, MCH2601, CPF2499,
CEE3201, CEE9901
This seems possibly related to the earlier failure, though for lack
of message text, the object name is unknown. Presumably a database *FILE
object had no owner and that was corrected, although the failure
to rename seems suspect; perhaps that was a move [e.g. into QRCL] and
the object already existed, so the errors up until MCH2601 would seem
likely to be normal.
02/12/11 06:53 - CPCA08C, CPCA08C, CPCA08C, CPCA08C, CPC2192, ...
02/12/11 06:56 - CPC8208, RCLSTG processing complete. 2807086 objects
processed. 13 deleted.
With the several MCH3603, hopefully the deletion of those objects had
been intentional rather than accidental. I think in v5r4 down to v5r2 I
worked with the reclaim developer to correct a problem with the same
symptom, and I would expect that change to have made the base of
release v6r1; perhaps this is the same error, or perhaps something
similar of a different origin.
02/12/11 06:56 - SAVSYS
02/12/11 07:04 - SAVLIB LIB(*NONSYS)
02/12/11 07:04 - MCH2601, SYSTXTINDXSYSTE00001. CPF3741
02/12/11 07:05 - 259 objects saved from QSYS2. 1 not saved.
02/12/11 11:14 - 761 libraries saved, 1 partially saved, 0 not saved.
Seems to be the same problem as for the full reclaim. Makes sense the
problem would remain, because a lock held [breaking protocol] cannot be
resolved by the reclaim. Since the lock is likely held by a system job,
likely the only recovery from the bogus lock [aside from patching] is to
IPL. Because of the error, there is possibly some information not
entirely correct with the *DBXREF for a file SYSTXTINDX in whatever
library; though for an error on the *MEM or a *QDIDX versus the *FILE,
Apparently running a RCLSTG *DBXREF after a full RCLSTG is not a bad
idea. Even though the full RCLSTG does a RCLSTG *DBXREF. Just the
*DBXREF runs in a short amount of time.
IMO not a "good idea", unless performing the same work twice via two
distinct code paths is somehow "good". What is IMO the best choice is
to RCLSTG SELECT(*ALL) OMIT(*DBXREF) followed by the request to RCLSTG
SELECT(*DBXREF); in this manner the canceling of the first reclaim has
no ramifications to the *DBXREF, and thus there is effectively no more
requirement to effect a refresh of the *DBXREF than what existed before
the first reclaim. For an interrupted reclaim which includes the
*DBXREF, that would have an effective requirement for recovery by RCLSTG
SELECT(*DBXREF), in order to enable doing any definitional activity with
triggers, long names, IDDU, and probably some other features.
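A minimal sketch of that two-step sequence in CL, assuming the system
has already been brought to the restricted state and the requests are
issued from the console; only the RCLSTG parameters shown come from the
discussion above.

```cl
/* Step 1: reclaim everything except the database cross-reference.     */
/* Canceling this request leaves the *DBXREF as it was beforehand.     */
RCLSTG SELECT(*ALL) OMIT(*DBXREF)

/* Step 2: reclaim only the database cross-reference; this portion     */
/* runs in a comparatively short amount of time.                       */
RCLSTG SELECT(*DBXREF)
```

Done this way, the cross-reference work is performed once, by one code
path, instead of twice.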
Can't get a feel from them if signing off/on may clear some of the
messages and affect the *DBXREF, like, maybe some programmer didn't
handle memory allocations well or some such thing.
Signing off and then on is unlikely to assist for the quoted scenario.
I have a PMR on this, but I don't think it's going anywhere.
Ugh. :-(
Regards, Chuck