Re: ENDSBS SBS(QINTER) OPTION(*IMMED) Takes 10 minutes? -- MIDRANGE-L

On 29 Apr 2013 13:58, Sam_L wrote:

In day end processing on Saturday and Sunday we issue
ENDSBS SBS(QINTER) OPTION(*IMMED).
Then we DLYJOB(120)
Then we do STRSBS SBSD(QINTER)
This Saturday evening it took QINTER 10 minutes to end, so the
subsequent STRSBS SBSD(QINTER) failed.

Not that it should matter, but presumably that was done in a program run from the console running in the controlling subsystem, or in a program run in a batch subsystem?

There are much better ways to handle the end than a fixed coded delay. One simple means is to use ALCOBJ OBJ((QINTER *SBSD *EXCL)) WAIT(120): the request returns as soon as the exclusive lock on the subsystem description can be obtained, i.e. once the subsystem has ended, with a maximum wait equal to the 120 seconds that is currently waited irrespective of the time taken for the subsystem to end. The wait could be longer if desirable, polling could be started after the wait times out, or ??? Alternatively, polling can be done with repeated ENDSBS requests. Waiting on the lock may be viewed by some as /cleaner/ or /easier/, but of course the allocate requires that the subsystem description also be deallocated after the lock is obtained, using the equivalent DLCOBJ request.
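
A minimal CL sketch of the lock-wait approach just described. The MONMSG handling, the message sent to the system operator, and the decision to skip the restart on a timeout are assumptions made for illustration; they are not taken from the OP's day-end program.

```cl
PGM
  /* Request the immediate end of the interactive subsystem.      */
  ENDSBS     SBS(QINTER) OPTION(*IMMED)
  MONMSG     MSGID(CPF1054) /* Subsystem was not active           */

  /* Wait for the exclusive lock on the *SBSD, 120 seconds at     */
  /* most. The request returns as soon as the lock is granted.    */
  ALCOBJ     OBJ((QINTER *SBSD *EXCL)) WAIT(120)
  MONMSG     MSGID(CPF1002 CPF1085) EXEC(DO)
    /* Assumption: on a timeout, report and skip the restart.     */
    SNDMSG     MSG('QINTER did not end within 120 seconds') +
                 TOMSGQ(*SYSOPR)
    RETURN
  ENDDO

  /* Release the lock again before restarting the subsystem.      */
  DLCOBJ     OBJ((QINTER *SBSD *EXCL))
  STRSBS     SBSD(QINTER)
ENDPGM
```

In the normal case of an end within about 12 seconds, the STRSBS would be issued shortly after that rather than after the full two minutes.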

Examining recent QINTER job logs it normally ends in less than 12
seconds.

That the /normal/ case takes one-tenth the time of the coded wait gives good reason for eliminating the coded wait; i.e. why wait when there is no reason to wait?

The job log for Saturday has this escape message:
MCH2401 Escape 40 04/27/13 09:05:50.414607
#iiinsen 000D4C QWTMMDSC QSYS 1577 Thread . . . . : 00000920
Message . . . . : Tried to insert duplicate key argument in index
MISREM Disconnected jobs.

The LIC exception x1801 is issued by the Independent Index Insert Entry method to the "SYSTEM EVENT MONITOR FOR DISCONNECT" job feature. That error is the symptom msgMCH2401 F/#iiinsen x/000D4C T/QWTMMDSC x/1577, for which there are no apparent matches [using very generic web searches]. The actual /name/ of the object is not entirely clear without accurate spacing, but it appears to be the device name, probably padded to between ten and fifteen bytes, followed by the string "Disconnected jobs".

FWiW, if there had been matching symptoms, knowing the release level is important; that was left unstated by the OP.

MISREM is a virtual display, type 3179.

By default the message MCH2401 ships with "Data to be dumped" DMPLST(*JOB), so if the error was unmonitored by the Work Monitor program, there would be a QPDSPJOB spooled file produced by QINTER at the time of the failure. If the message was monitored, it may just be a message that is left in the joblog even though it was handled, perhaps as an indication of an unlikely but possibly noteworthy incident, of use in the event that there was an apparent problem, which could include a performance concern for the ending of the subsystem.

But the job log contained nothing else, except CPF1124 (started) and
CPF1164 (ended.)

The error logged may be unrelated to the length of time. Seeing the status of the jobs and the subsystem monitor over the time of the long ending would likely be of more value.

Sysval QENDJOBLMT is set to 120 and is the only other thing I've
come up with that might impact ending the subsystem.
We were working overtime on Saturday, so there probably were 5250
sessions in QINTER on Saturday evening.

The End Subsystem (ENDSBS) command was enhanced many releases ago to provide for better end-time performance, by way of the special values available on the End subsystem option (ENDSBSOPT) parameter; e.g. *NOJOBLOG and *CHGPTY.
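
As a sketch only, the request from the day-end program with one of those special values added. Whether losing the job logs of the ended jobs is acceptable is a site decision, and the parameter is assumed to be available on the OP's unstated release.

```cl
/* End QINTER immediately without writing job logs for the     */
/* jobs that are ended because of this request.                */
ENDSBS     SBS(QINTER) OPTION(*IMMED) ENDSBSOPT(*NOJOBLOG)
```

Other values such as *CHGPTY can be listed on ENDSBSOPT alongside *NOJOBLOG; the command help for the release in use describes what each one trades off.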

Any idea why it took so long for QINTER to end?

Perhaps the jobs in the subsystem took a long time to end and/or to produce their joblogs? I would review the history of /job ended/ messages from the start of the ENDSBS until the last job ended and the subsystem then ended. There may be one specific job that did not end until just before the subsystem ended... which would likely be a good place to dig.
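
One way to pull that history, sketched with DSPLOG. The times in PERIOD are placeholders, not values from the OP's system, and the date is written assuming the job date format is MDY; substitute the actual window from the ENDSBS request to the subsystem end.

```cl
/* List the job-ended (CPF1164) messages logged in QHST during */
/* the window in which QINTER was ending. The times and the    */
/* MDY date below are placeholders to be replaced.             */
DSPLOG     LOG(QHST) +
             PERIOD((180000 042713) (183000 042713)) +
             MSGID(CPF1164) OUTPUT(*PRINT)
```

The last few job-ended entries before the subsystem's own end message show which jobs the subsystem was waiting on.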

I seem to recall an issue, perhaps by design but possibly also something since modified, whereby an interactive job that is already at the /job ending immediately/ screen could delay the ending of the subsystem. I think there was a design change after which that screen would no longer be presented, or at least not left active for more than a few moments?

Hmm. I did a little research with a web search of "PERFM" "ENDSBS" and found the following link for what was noted just above; that delay was formerly 30 seconds, and is now reduced to appear only "briefly":
APAR SE44615
http://www.ibm.com/support/docview.wss?uid=nas2d73b1eb4b14e196a8625777e004997fc