On 22-Nov-2013 17:14 -0800, Steinmetz, Paul wrote:
> I had an issue some damaged objects with 3rd party software, ran
> RCLSTG, still had some issues.
Likely because Reclaim Storage cannot correct damage. Though "some
issues" is quite nebulous; that explanation applies only if what
remained were 'some issues with /damaged objects/.' That is, of course,
because the only correction for the "physical damage" types, hard\full
(object damage) and soft\partial (data damage) [to be clear, not
/logical/ damage], is to delete the damaged object, although
data-recovery actions are possible for soft damage.
> Opened a PMR with IBM, they preferred customers no longer use
> RCLSTG.
Except when recommended [by IBM] as part of a recovery action for an
issue with an understood origin. That is, recommended after an issue
for which the reclaim is _known_ to effect, or at least /should effect/,
the required recovery or some otherwise desirable effects. And if the
documented effects of having performed the reclaim are then not the
result, the reclaim storage feature has an apparent or obvious
defect\deficiency.
Although the aim was already to reduce customer usage of that
feature [hardly new BTW], I used to suggest customers still should use
reclaim *if* a message had explicitly directed them to run that request.
But that was only with regard to the *DBXREF, and always with warnings
that the QDBSRVXR* jobs needed first to be validated as functional, and
that repeated incidents of that same messaging were a blatantly obvious
indication either of some defect requiring a preventive fix or of an
abnormal issue requiring preventive recovery action. For example,
after a CPF32A1, there was little choice but to reclaim the DBXREF
data... and although its likely origin [power loss with hard failure
for lack of a UPS] was something for which a full reclaim might find
and correct some other issues, recovery from those other issues could
easily be deferred indefinitely, whereas the errors with the
cross-reference were likely to cause other failures very soon if not
already.
> Do you have any info on this?
Primarily, because Reclaim Storage is a long-running operation that
requires a dedicated\restricted-state system, and is intended for its
specific recovery effects rather than for maintenance; i.e. it is not to
be issued merely for want of a correction, simply for lack of knowing
what else to do with\for a particular problem that was encountered.
Secondarily, much of what can be accomplished with a full reclaim can be
accomplished by other means... because it is a recovery, the recovery
effects can be achieved more directly, whether reactively or
planned\scheduled, rather than effected with the heavy-handed and
thorough but very lengthy processing of the full reclaim.
One issue IBM faced was that many customers were issuing the command
as part of their /normal maintenance/ instead of using the feature for
the intended purpose of recovery from certain types of abnormal
failures. As such customers grew or changed their business computing in
various ways, the RCLSTG would often no longer be possible within an
SLA. Another issue was that for already-large systems on which a
legitimately encountered abnormal failure would greatly benefit from a
reclaim, the outage required to effect that recovery was already too
large an impact. And finally there was sometimes an impression that
the service\support [not just IBM] might have used a recommendation of
reclaim as a convenient way to push back on an issue; e.g. the reclaim
was suggested /possibly/ to fix the issue, the reclaim was done, but the
problems typically persisted, because of course the reclaim feature had
nothing to do with the issue nor the capability to correct it, so the
customer was no closer to recovery and much further along in time. As a
false panacea, /reclaim storage/ in some ways paralleled the /reboot/
and /defragment/ activities on the PC, but with conspicuously harsher
impact.
That the request would be recommended as, or presumed to be,
preventive or corrective of that which it was not, and that customers
would use the request so often, all while likely causing great impact
with so often little gain relative to the cost, made the system look
problematic, archaic, or whatever other negative term might describe
taking offline their one scale-up system vs one of many scale-out
systems. The only hope was to discourage its use, so...
Over time, great effort was made to reduce the need ever to use the
form of the command implementing the long outage; i.e. specifically,
RCLSTG SELECT(*ALL) OMIT(*NONE). Some sub-processing, namely both the
/directory/ and the /database cross-reference/, was made available as
separate paths to run that function alone, and made optional so that
omitting them reduces overall time. Additionally there were Reclaim
Objects by Owner (RCLOBJOWN), Reclaim DB Cross-Reference (RCLDBXREF),
and Reclaim Object Links (RCLLNK). There may have been improvements to
Reclaim Library (RCLLIB) to do something more than the
effectively-nothing it had done [best I could infer], although I doubt
that... as IIRC it was only just recently corrected to handle a damaged
*OIRSPC, and I recall no indication of what the new effect was.
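For illustration only, the narrower requests mentioned above might look
something like the following CL. This is a sketch from memory of the
command prompts, not a recovery procedure: the parameter values shown
[SELECT and OMIT on RCLSTG, OPTION and LIB on RCLDBXREF, OBJ and SUBTREE
on RCLLNK] should be verified by prompting each command on the release in
use, the IFS path is purely hypothetical, and every RCLSTG variant is
assumed still to require restricted state.

```cl
/* Full reclaim: the long restricted-state outage discussed above.   */
RCLSTG SELECT(*ALL) OMIT(*NONE)

/* Only the database cross-reference portion of the reclaim.         */
RCLSTG SELECT(*DBXREF)

/* Everything except the cross-reference portion, to shorten the run.*/
RCLSTG SELECT(*ALL) OMIT(*DBXREF)

/* Only the directory portion of the reclaim.                        */
RCLSTG SELECT(*DIR)

/* Check the cross-reference data first, then fix only the libraries */
/* that were reported in error [assumed usable outside restricted    */
/* state; confirm on the release in use].                            */
RCLDBXREF OPTION(*CHECK)
RCLDBXREF OPTION(*FIX) LIB(*ERR)

/* Reclaim object links beneath a hypothetical IFS directory.        */
RCLLNK OBJ('/home/example') SUBTREE(*ALL)
```

The point of the narrower forms is the same as in the text: pick the one
request that matches the understood failure, rather than defaulting to
the full reclaim.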