On 2/18/11 1:22 PM, rob@xxxxxxxxx wrote:
IBM is really discouraging me from running a full RCLSTG as a
general habit.  Every other month, when we perform scheduled
maintenance, we run the full RCLSTG.
  As they should.  A RCLSTG should not be performed as regular 
maintenance, only as recovery from [termination and error] scenarios or 
various effects that are known or are likely to be corrected by having 
performed the Reclaim Storage request.  Many functions of the RCLSTG 
SELECT(*ALL) can be effected by alternate means; the main exception is 
recovering storage for objects that are not addressable via their 
respective DLT command, and growth in the /reclaimable/ space portion 
of the PRTDSKINF report should identify that need.  Although there 
was talk of "reclaim user" support to recover addressability to some 
owned but unaddressable objects, and maybe even it exists as a command 
[I found it: RCLOBJOWN] there were restrictions for database objects 
that AFaIK were never resolved and so no support was added.  Hmmm, OK I 
remember.  The primary requirements were that the startup phase of a 
full RCLSTG ran first, which also requires the restricted state; those 
were apparently implemented for the Reclaim Object by Owner command 
processing.  So only the unowned of the unaddressable objects could not 
be recovered without the use of RCLSTG.
Apparently if that has any abnormalities then the part of it that
does a RCLSTG *DBXREF may not behave and you could get results like:
  There is no specific evidence of one thing leading to another. 
Having first done OMIT(*DBXREF) would seem to me to have been a better 
choice, or just omit the RCLSTG SELECT(*DBXREF); admittedly for this 
scenario, I expect the locking error would have persisted either way.
02/11/11  20:55 - RCLSTG with no options
02/11/11  21:05 - MCH3402, CPF9999
02/11/11  21:07 - ..
02/11/11  21:10 - ..
02/11/11  21:20 - ..
  Such errors can be "normal" for objects impacted by terminations; 
e.g. for "interrupted" operations against objects.  Of course, a record 
of just the message identifier provides no context in which the 
condition was encountered; no inference can be made about if or how 
innocuous the errors may be.
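  For what it is worth, a timeline like the one quoted can at least be 
tallied to see how often each identifier recurs.  A minimal sketch in 
Python, under two assumptions that are not from the original post: that 
a ".." entry means "same messages as the previous entry", and that a 
message identifier is three letters followed by four hex digits.  The 
sample lines are copied from the quoted log; the function name is 
hypothetical.

```python
import re
from collections import Counter

# Sample lines copied from the quoted timeline.
LOG = """\
02/11/11  20:55 - RCLSTG with no options
02/11/11  21:05 - MCH3402, CPF9999
02/11/11  21:07 - ..
02/11/11  21:10 - ..
02/11/11  21:20 - ..
02/11/11  22:24 - MCH2601, CPF9999, SYSTXTINDXSYSTE00001
02/11/11  22:36 - MCH3402, CPF9999
02/11/11  23:06 - MCH3603, CPF3698, CPF9999
"""

# Assumed shape of a message identifier: 3 letters + 4 hex digits.
MSGID = re.compile(r"\b[A-Z]{3}[0-9A-F]{4}\b")

def tally(log: str) -> Counter:
    """Count message identifiers, treating '..' as a repeat (assumption)."""
    counts = Counter()
    previous = []
    for line in log.splitlines():
        # Everything after the first " - " is the entry text.
        _, _, text = line.partition(" - ")
        text = text.strip()
        ids = previous if text == ".." else MSGID.findall(text)
        counts.update(ids)
        previous = ids
    return counts

counts = tally(LOG)
for msgid, n in counts.most_common():
    print(msgid, n)
```

  A tally only shows frequency; the message text and the joblog are 
still needed to learn which object each error was issued against.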
02/11/11  22:24 - MCH2601, CPF9999, SYSTXTINDXSYSTE00001
  This is most definitely a problem.  I recall such errors had often 
originated from a poor implementation of support for SYSIBM; that after 
an IPL applying a PTF with changes to SYSIBM, the PTF exit processing 
improperly leaves locks either as side effect of a defect or side effect 
of the means of implementation to get the changes applied outside of the 
PTF process [because SQL is not fully functional at IPL due to a design 
deficiency; somewhat of a chicken\egg scenario].  Find what and where of 
the SYSTXTINDX *FILE object, and if it exists determine creation and 
change date\timestamp, and what process holds the lock [likely the 
QDBSRVXR2 job] and then review the joblog and spool files [dumps and 
dspjob may have been logged] for that job.
  I seem to recall some strange design decisions for the handling of 
the SQL catalog TABLE objects for\after a RCLSTG; perhaps only with 
iASP.  Not sure if\what about that might be an issue.  But there is a 
PTF SI36148 "OSP-DB-OTHER-UNPRED QSYS2 VIEW GET PUBLIC *EXCLUDE AFTER 
RCL" for SE38909 that may imply that QSQSYSIBM [SYSIBM creation\] exit 
routine is being invoked as part of RCLSTG.
02/11/11  22:36 - MCH3402, CPF9999
02/11/11  22:51 - ..
02/11/11  22:59 - ..
02/11/11  23:02 - ..
02/11/11  23:03 - ..
02/11/11  23:06 - MCH3603, CPF3698, CPF9999
  That error is an indication of a horrible, potentially disastrous 
situation.  A msg MCH3603 during the reclaim suggests that the reclaim 
attempted to process [possibly to destroy; again, no context for the 
message was given] an object [which was dumped, according to CPF3698] 
from a list of objects, but that object was not of the object type that 
the object handler thought it must have been.  Scary.  Imagine instead 
that the object type had been valid, but the object was still the wrong 
object [as implied by the "wrong type"], and that the request had been 
to destroy the object; e.g. the object is a *USRSPC and the *FILE object 
handler issued a DESS [destroy space], such that the object is then 
deleted... Seriously... Not good!
02/11/11  23:38 - MCH3402, CPF9999
02/12/11  00:11 - ..
02/12/11  01:00 - ..
02/12/11  01:31 - MCH3603, CPF3698, CPF9999
02/12/11  01:57 - MCH3402, CPF9999
02/12/11  02:02 - MCH3603, CPF3698, CPF9999
02/12/11  02:24 - MCH3402, CPF9999
02/12/11  02:37 - ..
02/12/11  02:38 - ..
02/12/11  02:46 - ..
02/12/11  02:53 - MCH3603, CPF3698, CPF9999
02/12/11  03:00 - MCH3402, CPF9999
02/12/11  03:45 - CPC2206, CPF327E, CPF7304, SQL0601, MCH2601, CPF2499,
CEE3201, CEE9901
  This seems possibly related to the earlier failure, though for lack 
of message text, the object name is unknown.  Presumably a database 
*FILE object had no owner and that was corrected, although the failure 
to rename seems suspect; perhaps that was a move [e.g. into QRCL] and 
the object already existed, so the errors up until MCH2601 would seem 
likely to be normal.
02/12/11  06:53 - CPCA08C, CPCA08C, CPCA08C, CPCA08C, CPC2192, ...
02/12/11  06:56 - CPC8208, RCLSTG processing complete. 2807086 objects
processed. 13 deleted.
  With the several MCH3603, hopefully the objects that were deleted were 
deleted intentionally rather than accidentally.  I think in v5r4 down to v5r2 I 
worked with the reclaim developer to correct a problem with the same 
symptom, and I would expect that change would have made the base of the 
release v6r1; perhaps the same error, or perhaps something similar of a 
different origin.
02/12/11  06:56 - SAVSYS
02/12/11  07:04 - SAVLIB LIB(*NONSYS)
02/12/11  07:04 - MCH2601, SYSTXTINDXSYSTE00001. CPF3741
02/12/11  07:05 - 259 objects saved from QSYS2. 1 not saved.
02/12/11  11:14 - 761 libraries saved, 1 partially saved, 0 not saved.
  Seems to be the same problem as for the full reclaim.  Makes sense the 
problem would remain, because a lock held [breaking protocol] can not be 
resolved by the reclaim.  Since likely the lock is held by a system job, 
likely the only recovery from the bogus lock [aside from patching] is to 
IPL.  Because of the error, there is possibly some information not 
entirely correct with the *DBXREF for a file SYSTXTINDX in whatever 
library; though for an error on the *MEM or a *QDIDX versus the *FILE,
Apparently running a RCLSTG *DBXREF after a full RCLSTG is not a bad
idea. Even though the full RCLSTG does a RCLSTG *DBXREF. Just the
*DBXREF runs in a short amount of time.
  IMO not a "good idea", unless performing the same work twice via two 
distinct code paths is somehow "good".  What is IMO the best choice is 
to RCLSTG SELECT(*ALL) OMIT(*DBXREF) followed by the request to RCLSTG 
SELECT(*DBXREF); in this manner the canceling of the first reclaim has 
no ramifications to the *DBXREF, and thus effectively no more 
requirement to effect refresh of the *DBXREF than what existed before 
the first reclaim.  For an interrupted reclaim which includes the 
*DBXREF, that would have an effective requirement for recovery by RCLSTG 
SELECT(*DBXREF), in order to enable doing any definitional activity with 
triggers, long names, IDDU, and probably some other features.
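  The reasoning above can be written down as a small decision model.  
This is only a sketch in Python of the logic as described in this reply, 
not anything the system provides: the class and function names are 
hypothetical, and the command strings are the ones already named in the 
thread.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reclaim:
    command: str          # CL command string as written in this thread
    touches_dbxref: bool  # whether the request includes the *DBXREF work

# The three requests discussed in the thread.
FULL = Reclaim("RCLSTG SELECT(*ALL)", True)
ALL_BUT_DBXREF = Reclaim("RCLSTG SELECT(*ALL) OMIT(*DBXREF)", False)
DBXREF_ONLY = Reclaim("RCLSTG SELECT(*DBXREF)", True)

# Suggested order: the long-running reclaim first, without the *DBXREF,
# then the short *DBXREF-only reclaim.
RECOMMENDED = [ALL_BUT_DBXREF, DBXREF_ONLY]

def dbxref_recovery_required(interrupted: Reclaim) -> bool:
    """True if canceling this request leaves a need for RCLSTG SELECT(*DBXREF)."""
    return interrupted.touches_dbxref

for step in RECOMMENDED:
    print(step.command, "-> recovery needed if canceled:",
          dbxref_recovery_required(step))
```

  The point of the split is visible in the first step: canceling it 
leaves the *DBXREF no worse off than before, whereas canceling a full 
RCLSTG SELECT(*ALL) does not give that assurance.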
Can't get a feel from them if signing off/on may clear some of the
messages and affect the *DBXREF, like, maybe some programmer didn't
handle memory allocations well or some such thing.
  Signing off and then on is unlikely to assist for the quoted scenario.
I have a PMR on this, but I don't think it's going anywhere.
  Ugh. :-(
Regards, Chuck