On 06-Dec-2017 15:29 -0700, RIAAN RAS/TCM Software & Services/ZA wrote:
We have a client with a system running V5R4. The system crashed a
week ago with three cache batteries failed, one disk failure and
another disk logging errors. The system had at least <???>
abnormal shut downs. After the hardware was restored, did <???>
the backups fail due to damaged objects.
Saves are designed to stop [issue a fatal error] on encountering an
object that is diagnosed as damaged during the save, because the integrity
of the save would be questionable with regard to what was planned to be
saved versus what could be, or already has been, dumped to media for that
save request. An object that was already marked as damaged before that
save can and will be omitted from the save [and a diagnostic is logged to
indicate the object was not included]; i.e. the feature ensures
consistency between what is planned to be saved [by the OS] and what can
actually be dumped to media [by the LIC].
Advised the customer to perform a reclaim storage, which failed.
One phase of reclaim storage is "damage notification"; damage found,
or previously found such that an object was marked as damaged, will be
reported to the [*sysopr message queue and the] history log. Note that
the reclaim cannot and will not find every object that may later
be found to be damaged when the object is saved or when the object is
"used" for its intended purpose; i.e. some damage is not subject to
detection by RCLSTG, given the limited access/touch performed
by the reclaim feature.
I IPL'ed the system and performed another RCLSTG. The "Read objects
from disk" failed after 68% with MCH3601 and MCH3202.
I ran a RCLSTG *DBXREF which completed successfully and restarted
the full RCLSTG. This time the "Read objects from disk"
step completed, and failed immediately after that.
All attempts since have been unsuccessful.
The CHKPRDOPT *OPSYS reported no errors.
Has anybody had a similar problem before? I cannot get any
information on the MCH3203 error
Is that msg MCH3203, or msg MCH3202, or both? Which of them was fatal
to, i.e. terminated, the reclaim request? Apparently the effect was the
following [which implies MCH3202]:
CPF9999 Diagnostic 40 06/12/17 20:57:42.887600
QMHUNMSG *N QUIMNDRV QSYS 060C
Thread . . . . : 00000004
Message . . . . : Function check. MCH3202 unmonitored by QRCLENUP
at statement *N, instruction X'0060'.
That is only a portion of the details from the "function check"
message, not the msg MCH3202 itself. It is rather meaningless, except to
imply that the actual failure, the preceding condition, was msgMCH3202
T/QRCLENUP x/0060; it gives no further context, such as the
Return Code (RC) that defines the Minor Code for the exception
diagnosed by the Licensed Internal Code (LIC). The symptom details from
the above are merely the unhelpful:
msgCPF9999 *FC F/QMHUNMSG rcMCH3202
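As a rough illustration of how little that function check text carries, the following Python sketch pulls the three usable pieces (the unmonitored message ID, the program that failed to monitor it, and the instruction) out of the quoted joblog text and assembles the informal symptom string used above. The regular expression is only an assumption based on the one message quoted in this thread, and the names parse_function_check and symptom are hypothetical helpers, not part of any IBM tooling.

```python
import re

# Assumed layout of the CPF9999 text, inferred only from the joblog quoted above.
FUNCTION_CHECK = re.compile(
    r"Function check\.\s+(?P<msgid>[A-Z]{3}[0-9A-F]{4})\s+unmonitored by\s+"
    r"(?P<program>\S+)\s+at statement\s+(?P<statement>\S+),\s+"
    r"instruction X'(?P<instruction>[0-9A-F]{4})'"
)


def parse_function_check(text):
    """Return the message ID, program, statement and instruction, or None."""
    # Collapse line wraps so the pattern can span the wrapped joblog lines.
    flattened = " ".join(text.split())
    match = FUNCTION_CHECK.search(flattened)
    return match.groupdict() if match else None


def symptom(details):
    """Build the informal 'msgXXX T/program x/instruction' symptom string."""
    return "msg{msgid} T/{program} x/{instruction}".format(**details)


joblog_text = """
Message . . . . :   Function check. MCH3202 unmonitored by QRCLENUP
  at statement *N, instruction X'0060'.
"""

details = parse_function_check(joblog_text)
print(symptom(details))
```

Note what is absent from the result: neither the "From program" nor the Return Code can be recovered from the function check alone, which is why the original msg MCH3202 from the spooled joblog is needed.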
The details for the apparent actual failing condition/message, the
msgMCH3202 RC####, are recorded in a VLIC Log (VLog), if I recall
correctly, as a VL0200####. The details from a spooled joblog, however,
would show the "From program" [and instruction] as additional context to
reveal what might be the origin of the difficulty for the LIC. For
example, the condition as issued from #dbdschk [symptom F/#dbdschk]
might suggest the condition is a duplicate of APAR MA42142 [for which a
PTF MF55800 exists for LIC level V5R4M5]. That APAR does not mention
what the "To program" might be, though, so although the above FC implies
a symptom T/QRCLENUP, the APAR text carries no similar implication.