RE: RSTLIBBRM failed due to "false" DFRID -- MIDRANGE-L

Chuck,

I appreciate all your input.

Below is a snapshot of QADBRSDFRJ, taken after the failure and before QSYS/RMVDFRID DFRID(*ALL). It matches the object in question.

File . . . . . . :   QADBRSDFRJ          Library . . . . :   QRECOVERY
Member . . . . . :   QADBRSDFRJ          Record . . . . . :   1
Control . . . . .                        Column . . . . . :   1
Find . . . . . . .
*...+....1....+....2....+....3....+....4....+....5....+....6....+....7..
Q1ARSTID BRCAUDIT CRMB42010 3¬}ò¬X^ó¬¤¬¬¬¬¬¬¬¬¬¬¬BON¬¬
****** END OF DATA ******

The libraries are deleted, in a separate job, prior to the RSTLIBBRM, so the libraries never exist at restore time.

The file in error is actually restored successfully; everything matches the source LPAR.

Here's the latest update from IBM support.
The database developer asked whether you would accept a trap. It would likely come in the form of an APAR that you load onto the system just as you would a PTF; when the problem recurs, it would dump additional data. The problem does not seem to be in the SR code, even though the call to the defer code is made there. Rather, the journal code is flagging the file as journaled, which in turn causes the SR code to defer the object. That leaves the entire restore with a dangling deferred object.

According to development:
"CPF3294 is sent in four places in the code. The normal one is an exception handler, but that handler enqueues the previous failure message which is not present in the joblog. So it must be one of the other three places. This (the trap) will help us to narrow down the problem by narrowing down which of the 3 remaining places is causing this problem. Will the customer take a trap? It will not do anything special or cause any problems; it will simply dump out an extra message (CPF9898) in the joblog when the CPF3294 is sent."

The problem is that I cannot recreate this; I've tried.

It may take another 3 months before I see the issue.

Paul

-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of CRPence
Sent: Monday, November 03, 2014 1:48 PM
To: midrange-l@xxxxxxxxxxxx
Subject: Re: RSTLIBBRM failed due to "false" DFRID

On 03-Nov-2014 10:51 -0600, Steinmetz, Paul wrote:

> We simply QSYS/RMVDFRID DFRID(*ALL) to clear the error.

Before doing that the next time, save\obtain-a-copy of the file of deferral data in QRECOVERY; review the data for each /library/ datum in the row(s) representative of the object(s) for that library, minimally to determine whether the library name is corrupted there. If a library name utilized in the STRJRNPF request for the deferred-restore had come from corrupted data in that file, then the CPF9810 would be /correct/, implying again that the prior failed attempt [the CPF3294] is the origin for the issue [and thus the later message and side effects are just /noise/].
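A minimal sketch of taking that copy, assuming the deferral data can be read with an ordinary Copy File (CPYF) and that the job has sufficient authority to objects in QRECOVERY. The library DFRSAVE and the file name DFRJ141103 are hypothetical placeholders, not names from this thread:

```cl
/* Copy the deferral data to a holding file before clearing it.      */
/* DFRSAVE and DFRJ141103 are hypothetical; DFRSAVE must exist.      */
CPYF       FROMFILE(QRECOVERY/QADBRSDFRJ) +
             TOFILE(DFRSAVE/DFRJ141103) +
             MBROPT(*REPLACE) CRTFILE(*YES)

/* Only after the copy is secured, clear the deferred identifiers.   */
RMVDFRID   DFRID(*ALL)
```

The copy can then be reviewed at leisure, e.g. with DSPPFM, without holding up the cleanup.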

> Another RSTLIBBRM of the same library works, recreates do not occur.

Of course the definition of /another RSTLIBBRM/ is quite open to implication\interpretation; the same request repeated again, with no other action except Remove Deferral Identifier (RMVDFRID) since the error, would of course be a restore-over vs a scratch-restore of that specific object [and possibly others]. Conspicuously, a scratch-restore of the problematic object is a minimal requirement. If the restored library did not exist prior to that RSTLIB with deferral-identifier, then a recreate attempt needs to have effected a Delete Library (DLTLIB) to have a minimally-similar recreation setup.

Are these always just RSTLIBBRM onto an LPAR where the libraries being restored never exist? If the *LIB objects exist but they have been cleared with Clear Library (CLRLIB), then can the procedure be changed to either Delete Library (DLTLIB) first, or at least issue a Dump Object (DMPOBJ) against each presumed-to-be-empty library before the restore actions start? If dump objects are taken, then after any failed restore, the dump of that library for which the failure then transpired could be reviewed for latent information that might have given rise to difficulties. If the DLTLIB is not effected, then along with DMPOBJ, the DSPLIBD output is easier to review than what might be encoded in the dump, so those might be taken additionally.
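For the case where a library does exist before the restore, the doc collection might look like the following sketch. TARGETLIB is a hypothetical library name; both commands write spooled output that would need to be kept until the restore is known to be clean:

```cl
/* TARGETLIB is a hypothetical name for a presumed-empty library.    */
/* Dump the *LIB object itself; libraries reside in QSYS.            */
DMPOBJ     OBJ(QSYS/TARGETLIB) OBJTYPE(*LIB)

/* Easier-to-read library description to keep alongside the dump.    */
DSPLIBD    LIB(TARGETLIB) OUTPUT(*PRINT)
```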

> No files are being journaled for the library being restored from the
> source LPAR.

Do the /number of members/ attribute and the actual count of members [as presented by the number of records in the outfile that represent a specific\distinct member name] match between the DSPFD TYPE(*MBRLIST) and DSPFD TYPE(*MBR) performed with OUTPUT(*OUTFILE) on the system from which the file was saved?
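As a sketch, the two outfiles could be produced on the source system as follows. SRCLIB/SRCFILE stands in for the file named in the CPF3294, and the outfile names in QTEMP are hypothetical:

```cl
/* SRCLIB/SRCFILE is a placeholder for the file named in CPF3294.    */
/* One record per member in each outfile; outfile names are made up. */
DSPFD      FILE(SRCLIB/SRCFILE) TYPE(*MBRLIST) +
             OUTPUT(*OUTFILE) OUTFILE(QTEMP/MBRLIST)
DSPFD      FILE(SRCLIB/SRCFILE) TYPE(*MBR) +
             OUTPUT(*OUTFILE) OUTFILE(QTEMP/MBR)
```

The record counts of the two outfiles can then be compared with each other and with the number-of-members attribute reported for the file.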

Does the particular file identified in the CPF3294 have a historical 'last journal' name [and qualified library name] attribute that appears in the output of either the Display Object Description (DSPOBJD) for DETAIL(*FULL) and\or the DSPFD?
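A sketch of collecting both outputs on the source system, again with SRCLIB/SRCFILE as a hypothetical placeholder for the file named in the CPF3294; any journaling details shown would be what to look for in the spooled output:

```cl
/* SRCLIB/SRCFILE is a placeholder for the file named in CPF3294.    */
/* Full object description, printed for later review.                */
DSPOBJD    OBJ(SRCLIB/SRCFILE) OBJTYPE(*FILE) +
             DETAIL(*FULL) OUTPUT(*PRINT)

/* Full file description (TYPE defaults to *ALL), also printed.      */
DSPFD      FILE(SRCLIB/SRCFILE) OUTPUT(*PRINT)
```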

> We do about 100 to 200 RSTLIBBRM on a monthly basis with no issue.
> I've had this issue occur about 5 times within the last year.
> I've never been able to do a recreate.

A re-create attempt likely could require that the situation\environment appear as the near equivalent [as close as possible] to what the situation was prior to the /failing/ restore.
Note: I realize that was a successful restore, but as a restore with errors, I will tend to refer to that as a /failure/.

So as noted, minimally the object that incorrectly was identified as having been journaled on the source LPAR [or as appearing to require journaling per implicit journal upon restore via QDFTJRN or STRJRNLIB] must be scratch-restored in a re-create attempt, just as it was on the failing request. Quite possibly the same situation must be achieved for all the other objects that were scratch-restored in the failing request; i.e. they also would need to be included as a scratch-restore on the re-create attempt. Another thing to mimic would be to have signed off and then signed on again, if that is what the job for the failing restore had done.

If the prior failures had all been either RSTLIBBRM as scratch-restore of the *LIB and object or as restore-into an existing library with all objects as\presumed-to-be scratch-restored, then maybe the full restore to a totally different partition [perhaps in the IBM lab] might recreate when using [a copy of] that media. If the original scenario is a restore over the existing libraries, then a first-pass restore could restore just the libraries [omitting all objects, e.g. by type], then a new process mimicking the full restore; otherwise just the full request onto a totally different LPAR might suffice to recreate.
If the first-pass restore is the full restore, then that might be tested first, and [esp. if originally the process is to effect] the CLRLIB requests could be performed in advance of a new process attempting the full restore wherein the *LIB objects are already there.

FWiW: While the same PTF-level for QDBRSPRE remains, if the messaging from the instruction x/17F2 is atypical, a debug break-point program could be activated before the restores to try to /catch/ the CPF3294 being diagnosed, and possibly allowing some more debug\doc-collection work to either improve the possibility to effect a recreate or better to diagnose the origin. One benefit to the OPM, is that breakpoints can be added to the programs easily via scripted [optionally compiled] command\statements, no listing may be required to figure out where to place the break, and the breakpoint can be automated [to collect docs] or prompted for more doc collections performed interactively.

--
Regards, Chuck