On 12-Sep-2015 20:59 -0600, CRPence wrote:
If I understand correctly, the failure origin is a seize conflict
that is being detected during the Tape Exit (QTATAPEX) processing
that gets invoked during Tape End Of Volume (QTAEOV) processing that
gets invoked during the save processing [for the SAV request].
<<SNIP>>

Given BRMS is not being used, and the issue appears to occur as a side effect of the BRMS flight recorder [not the tape flight recorder despite any allusions I made to that possibility], I think the simplest thing might be to Delete Licensed Program (DLTLICPGM) of the BRMS LPP [IIRC, that would be LICPGM(5722BR1) OPTION(*ALL)]; optionally obtain a Save Licensed Program (SAVLICPGM) of that LPP first.

Similarly, I had previously suggested that same action for the Media And Storage Extensions (MSE) feature. But not knowing whether that feature is unused, I recommended that the deletion be in effect only during the GO SAVE; i.e. temporary, to circumvent\bypass, such that after the save, the previously saved MSE would be restored so as not to leave any lasting impacts from the deletion of the feature.

Nearly as simple [and I am similarly confident of the positive effect, as with disabling\deleting either of BRMS or MSE] would be to Change Object Attribute (CHGATR) of the file named flightrec [I do not know the location of that file] to effect *ALWSAV=*NO; of course, one first needs to locate that file, Dump (DMP) the file, and verify the object address matches [what I had posted earlier: 0AE6F6729C], before issuing that CHGATR request [though I suppose all files with that name might just as well be omitted from a full backup, as apparent log file(s)].

I will try to explain here, what I believe transpires in the failing saves, from what I inferred from the VLog and Joblog information:

First some background: The SAV effectively groups\packages some large number of files into something called a _descriptor_, after which the SAV request hands-off that list to the LIC Load\Dump feature to prepare [*asynchronously* in a LIC task], to write [aka Dump] to the tape device. Meanwhile, the SAV starts grouping\packaging the next group of files [concurrent to that LIC\LD task writing to the tape], into a new descriptor that also eventually will be handed-off to LIC\LD. What transpires in the failing scenario is:

01) for the previously handed-off list of files [the previous descriptor], a seize [something akin to a lock] gets placed on each file.

02) the asynchronous LIC\LD starts to write that previous descriptor of files\data to the tape, but encounters the tape-full condition.

03) the LIC\LD signals an event to the SAV processor informing of the tape-full condition.

04) the SAV processing dutifully presents the inquiry message about end-of-volume and awaits the reply from the operator. If the SAV was still building the next descriptor, that work is interrupted; if the SAV had completed the next descriptor, the SAV was just sitting in a[n event-] wait status, awaiting LIC\LD to have written the prior descriptor to the tape.

05) the tape-exit feature invokes BRMS, an obligation established as a contract, per the Q1ARTMS being recorded in the Registration Information as the Exit-Program; that BRMS is not being used in the failing scenario is moot.

06) despite that BRMS is not the feature effecting the currently invoked save\backup activity [i.e. SAV vs SAVBRM is occurring], because BRMS was invoked by the tape-exit, the BRMS feature feels obligated to write some flight-recorder details about the save activity into the file named flightrec; this write activity being attempted runs in the operator's job. I do not know if there is some incantation that will ask BRMS to stop flight-recording; there may be some CALL QBRM/flight_recorder_config capability.

07) Concurrently [in parallel] the LIC\LD task is still holding a seize on the files in the descriptor, awaiting write\dump to tape on the new\next volume that needs to be loaded. One of those files [as I interpret the given VLog information] is the file flightrec, and the attempt to open-for-write to that same file fails with a seize-conflict condition. That conflict exists because the operator-process running at the console cannot access the file while the concurrent LIC\LD task holds the exclusive seize.

08) The LIC\RMSL [Resource Management Seize\Lock] feature signals the seize-conflict to the LIC\LD, and the LIC\LD task falls into an error-code-path, to log that an unexpected seize-conflict transpired; the conflict is unexpected, because by design, no conflict is supposed to occur. The LIC\LD logs the source\sink to identify the reason for the abnormal termination of the LIC\LD task, and then effects a Damage-Set to mark-in-error the tape-device that was being used.

09) The operator replies to the inquiry with the G=Go because the next tape was loaded and ready to go.

10) The operator process [at the console], nearly immediately encounters the damaged device when attempting the open of the newly loaded volume; encountering the damage that was set in the LIC\LD task.

11) The save operation goes into an exception path, and properly bubbles-up the errors; the GO SAVE option-21 is shown to have failed.
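
For anyone who finds the sequence easier to follow as code, here is a toy model of steps 01 through 11, written in Python purely as an illustration. Nothing in it is an IBM i API; the names SeizeTable, simulate_save, "LD-task" and "operator-job" are all made up for this sketch, and the seize is modeled as a simple exclusive owner per file name. The omit_from_save argument stands in for the effect I expect from *ALWSAV=*NO on flightrec.

```python
class SeizeConflict(Exception):
    """Raised when a file is already seized by a different holder."""


class SeizeTable:
    """Toy stand-in for seizes: one exclusive holder per file name."""

    def __init__(self):
        self.holders = {}

    def seize(self, name, holder):
        owner = self.holders.get(name)
        if owner is not None and owner != holder:
            raise SeizeConflict(f"{name}: held by {owner}, requested by {holder}")
        self.holders[name] = holder


def simulate_save(files, tape_capacity, exit_log, omit_from_save=()):
    """Walk one descriptor through the end-of-volume path.

    files          -- file names packaged into the descriptor
    tape_capacity  -- how many files fit on the current volume
    exit_log       -- file the tape-exit program tries to open for write
    omit_from_save -- names left out of the save (hypothetical *ALWSAV=*NO effect)
    """
    seizes = SeizeTable()
    events = []
    descriptor = [f for f in files if f not in omit_from_save]

    # Step 01: every file in the handed-off descriptor gets seized.
    for name in descriptor:
        seizes.seize(name, "LD-task")
    events.append("seized")

    # Steps 02-04: no end-of-volume if the descriptor fits on the tape.
    if len(descriptor) <= tape_capacity:
        events.append("saved")
        return events
    events.append("end-of-volume")

    # Steps 05-07: the tape exit runs in the operator's job and tries
    # to open its log file while the LD task still holds its seizes.
    try:
        seizes.seize(exit_log, "operator-job")
    except SeizeConflict:
        # Steps 08-11: conflict logged, device marked damaged, save fails.
        events.extend(["seize-conflict", "device-damaged", "save-failed"])
        return events

    events.append("exit-logged")
    events.append("saved")
    return events


if __name__ == "__main__":
    names = ["a", "flightrec", "b"]
    print(simulate_save(names, 1, "flightrec"))
    print(simulate_save(names, 1, "flightrec", omit_from_save={"flightrec"}))
```

In this model the first call ends in "save-failed" because flightrec sits inside the seized descriptor when the exit tries to write to it; the second call completes because flightrec was never part of the descriptor. That is only the logic as I inferred it from the VLog and joblog, not a statement of how the LIC actually implements any of it.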


