On 12-Sep-2015 20:59 -0600, CRPence wrote:
If I understand correctly, the failure originates as a seize conflict
detected during the Tape Exit (QTATAPEX) processing, which is invoked
by the Tape End Of Volume (QTAEOV) processing, which in turn is invoked
by the save processing [for the SAV request].
Given BRMS is not being used, and the issue appears to occur as a
side effect of the BRMS flight recorder [not the tape flight recorder
despite any allusions I made to that possibility], I think the simplest
thing might be to Delete Licensed Program (DLTLICPGM) of the BRMS LPP
[IIRC, that would be LICPGM(5722BR1) OPTION(*ALL)]; optionally obtain a
Save Licensed Program (SAVLICPGM) of that LPP first.
Similarly, I had previously suggested that same action for the Media And
Storage Extensions (MSE) feature. But not knowing if that feature is
unused, the recommended deletion would be in effect only during the GO
SAVE; i.e. temporary, to circumvent\bypass, such that after the save,
the recommendation was to restore the previously saved MSE so as not to
leave any lasting impacts from the deletion of the feature.
Nearly as simple [and I am similarly confident of the positive
effect, as with disabling\deleting either BRMS or MSE] would be to use
Change Attribute (CHGATR) against the file named flightrec [I do not know
the location of that file], specifying ATR(*ALWSAV) VALUE(*NO) so that the
file is no longer eligible to be saved. Of course, first locate that file,
Dump (DMP) the file, and verify the object address matches [what I had
posted earlier: 0AE6F6729C] before issuing that CHGATR request [though I
suppose all files with that name might just as well be omitted from a
full backup, as apparent log file(s)].
I will try to explain here, what I believe transpires in the failing
saves, from what I inferred from the VLog and Joblog information:
First some background: The SAV effectively groups\packages some large
number of files into something called a _descriptor_, after which the
SAV request hands-off that list to the LIC Load\Dump feature to prepare
[*asynchronously* in a LIC task], to write [aka Dump] to the tape
device. Meanwhile, the SAV starts grouping\packaging the next group of
files [concurrent to that LIC\LD task writing to the tape], into a new
descriptor that also eventually will be handed-off to LIC\LD. What
transpires in the failing scenario is:
01) for the previously handed-off list of files [the previous
descriptor], a seize [something akin to a lock] gets placed on each file.
02) the asynchronous LIC\LD starts to write that previous descriptor
of files\data to the tape, but encounters the tape-full condition.
03) the LIC\LD signals an event to the SAV processor informing of the
tape-full [end-of-volume] condition.
04) the SAV processing dutifully presents the inquiry message about
end-of-volume and awaits the reply from the operator. If the SAV was
still building the next descriptor, that work is interrupted; if the SAV
had completed the next descriptor, the SAV was just sitting in a[n
event-] wait status, awaiting LIC\LD to have written the prior
descriptor to the tape.
05) the tape-exit feature invokes BRMS, an obligation established as a
contract, per the Q1ARTMS being recorded in the Registration Information
as the Exit-Program; that BRMS is not being used in the failing scenario
06) despite that BRMS is not the feature effecting the currently
invoked save\backup activity [i.e. SAV vs SAVBRM is occurring], because
BRMS was invoked by the tape-exit, the BRMS feature feels obligated to
write some flight-recorder details about the save activity, into the
file named flightrec; this write activity being attempted, runs in the
operator's job. I do not know whether there is some incantation that will
ask BRMS to stop flight-recording; there may be some CALL
07) Concurrently [in parallel] the LIC\LD task is still holding a
seize on the files in the descriptor, awaiting write\dump to tape on the
new\next volume that needs to be loaded. One of those files [as I
interpret the given VLog information] is the file flightrec, and the
attempt to open-for-write to that same file fails with a seize-conflict
condition. That conflict exists because the operator-process running at
the console cannot access the file, while the concurrent LIC\LD task
holds the exclusive seize.
08) The LIC\RMSL [Resource Management Seize\Lock] feature signals the
seize-conflict to the LIC\LD, and the LIC\LD task falls into an
error-code-path, to log that an unexpected seize-conflict transpired;
the conflict is unexpected, because by design, no conflict is supposed
to occur. The LIC\LD logs the source\sink to identify the reason for
the abnormal termination of the LIC\LD task, and then effects a
Damage-Set to mark-in-error the tape-device that was being used.
09) The operator replies to the inquiry with the G=Go because the next
tape was loaded and ready to go.
10) The operator process [at the console] nearly immediately
encounters the damaged device when attempting the open of the newly
loaded volume; that is the damage that was set in the LIC\LD task.
11) The save operation goes into an exception path, and properly
bubbles-up the errors; the GO SAVE option-21 is shown to have failed.
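
The sequence above can be sketched as a toy model. This is only an illustration of the inferred timing, not anything resembling the actual LIC internals: the names SavedObject, SeizeConflict and simulate are hypothetical, a non-blocking lock stands in for the exclusive seize, and the two switches correspond to the two circumventions discussed earlier [BRMS no longer being invoked to write its log, or flightrec being excluded from the save].

```python
import threading


class SeizeConflict(Exception):
    """Hypothetical stand-in for the seize-conflict condition."""


class SavedObject:
    """Toy object whose exclusive seize is modeled with a non-blocking lock."""

    def __init__(self, name):
        self.name = name
        self._seize = threading.Lock()

    def seize(self):
        # A second seize attempt does not wait; it reports a conflict.
        if not self._seize.acquire(blocking=False):
            raise SeizeConflict(self.name)

    def release(self):
        self._seize.release()


def simulate(exit_program_logs=True, flightrec_in_save=True):
    """Walk the steps of the failing scenario and return the event trail."""
    events = []
    flightrec = SavedObject("flightrec")
    descriptor = [SavedObject("fileA"), SavedObject("fileB")]
    if flightrec_in_save:
        descriptor.append(flightrec)

    # Step 01: each file in the handed-off descriptor is seized.
    for obj in descriptor:
        obj.seize()
    events.append("descriptor seized")

    # Steps 02-04: tape full; the seizes stay held while the inquiry waits.
    events.append("end of volume: inquiry pending")

    # Steps 05-08: the exit program tries to open its log file for write.
    device_damaged = False
    if exit_program_logs:
        try:
            flightrec.seize()
            flightrec.release()
            events.append("flightrec written")
        except SeizeConflict:
            events.append("seize conflict on flightrec")
            device_damaged = True  # models the Damage-Set on the device

    # Steps 09-11: the reply of G only helps if the device is still usable.
    events.append("save failed" if device_damaged else "save continued")

    for obj in descriptor:
        obj.release()
    return events


if __name__ == "__main__":
    print(simulate())                         # the failing scenario
    print(simulate(exit_program_logs=False))  # exit program no longer logs
    print(simulate(flightrec_in_save=False))  # flightrec omitted from save
```

In this model, either circumvention alone removes the conflict; that is the same reasoning behind suggesting the DLTLICPGM or, alternatively, the CHGATR.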