On 29 Apr 2013 13:58, Sam_L wrote:
In day end processing on Saturday and Sunday we issue
ENDSBS SBS(QINTER) OPTION(*IMMED).
Then we DLYJOB(120)
Then we do STRSBS SBSD(QINTER)
This Saturday evening it took QINTER 10 minutes to end, so the
subsequent STRSBS SBSD(QINTER) failed.
Not that it should matter, but presumably that was done in a program
run from the console running in the controlling subsystem, or in a
program run in a batch subsystem?
There are much better ways to handle the end without a coded wait.
One simple means is to use ALCOBJ OBJ((QINTER *SBSD *EXCL)) WAIT(120) in
place of the DLYJOB. The request completes as soon as the subsystem has
ended and its description can be locked, and otherwise waits no longer
than the 120 seconds that the DLYJOB waits irrespective of the time
taken for the subsystem to end. The lock wait could be longer if
desirable, or polling could be started after the wait times out.
Alternatively, polling can be done with repeated requests to ENDSBS.
Waiting on the lock may be viewed by some as /cleaner/ or /easier/, but
of course the lock, once obtained, must then be released with the
equivalent DLCOBJ request.
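A minimal sketch of that lock-wait approach, assuming the program runs
outside QINTER and that the subsystem description is found via the
library list. The message text and the choice to simply RETURN on a
timeout are illustrative only, not something from the original program:

```cl
PGM
  /* End the subsystem; tolerate it already being inactive */
  ENDSBS SBS(QINTER) OPTION(*IMMED)
  MONMSG MSGID(CPF1054) /* No subsystem QINTER active */

  /* Wait up to 120 seconds for an exclusive lock on the description; */
  /* the request returns as soon as the lock is available.            */
  ALCOBJ OBJ((QINTER *SBSD *EXCL)) WAIT(120)
  MONMSG MSGID(CPF1002) EXEC(DO) /* Cannot allocate object */
    /* Assumption: tell the operator and stop; polling or a longer */
    /* wait could be coded here instead.                           */
    SNDPGMMSG MSG('QINTER did not end within 120 seconds') +
                TOMSGQ(*SYSOPR)
    RETURN
  ENDDO

  /* Release the lock before restarting the subsystem */
  DLCOBJ OBJ((QINTER *SBSD *EXCL))
  STRSBS SBSD(QINTER)
ENDPGM
```

On a typical night this reaches the STRSBS within seconds of the
subsystem ending rather than after a fixed two minutes.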
Examining recent QINTER job logs it normally ends in less than 12
seconds.
That the /normal/ case is one-tenth the time of the coded wait gives
good reason for eliminating the coded wait; i.e. why wait when there is
no reason to wait?
The job log for Saturday has this escape message:
MCH2401 Escape 40 04/27/13 09:05:50.414607
#iiinsen 000D4C QWTMMDSC QSYS 1577
Thread . . . . : 00000920
Message . . . . : Tried to insert duplicate key argument in index
MISREM Disconnected jobs.
The LIC exception x1801 is issued by the Independent Index Insert
Entry method to the "SYSTEM EVENT MONITOR FOR DISCONNECT" job feature.
That error is the symptom msgMCH2401 F/#iiinsen x/000D4C T/QWTMMDSC
x/1577 for which there are no apparent matches [using very generic web
searches]. The actual /name/ of the object is not entirely clear
without accurate spacing, but it appears to be the device name, probably
padded to between ten and fifteen bytes, followed by the string
"Disconnected jobs". FWiW, if there had been matching symptoms, knowing
the release level would be important; that was left unstated by the OP.
MISREM is a virtual display, type 3179.
By default the message MCH2401 ships with "Data to be dumped"
DMPLST(*JOB), so if the error was unmonitored by the Work Monitor
program, there would be a QPDSPJOB spooled file produced by the QINTER
job at the time of the failure. If the message was monitored, it may
just be a message that is left in the joblog even though it was handled,
perhaps as an indication of an unlikely but possibly noteworthy
incident, in the event that there was an apparent problem; such a
problem could include a performance concern for the ending of the
subsystem.
But the job log contained nothing else, except CPF1124 (started) and
CPF1164 (ended.)
The error logged may be unrelated to the length of time. Seeing the
status of the jobs and the subsystem monitor over the time of the long
ending would likely be of more value.
Sysval QENDJOBLMT is set to 120 and is the only other thing I've
come up with that might impact ending the subsystem.
We were working overtime on Saturday, so there probably were 5250
sessions in QINTER on Saturday evening.
The End Subsystem (ENDSBS) was enhanced many releases ago to provide
for better end-time performance, using the available special values on
the End subsystem option (ENDSBSOPT) parameter; e.g. *NOJOBLOG and *CHGPTY.
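As a sketch only, the request from the day-end program with those two
special values added might look like the following; whether skipping the
job logs of the ended jobs is acceptable is a site decision, and the
parameter requires a release on which ENDSBSOPT is available:

```cl
/* End QINTER immediately; do not write job logs for the jobs ended */
/* with it, and change their run priority so that ending work does  */
/* not compete with other jobs on the system.                       */
ENDSBS SBS(QINTER) OPTION(*IMMED) ENDSBSOPT(*NOJOBLOG *CHGPTY)
```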
Any idea why it took so long for QINTER to end?
Perhaps the jobs in the subsystem took a long time to end and/or to
produce their joblogs? I would review the history of /job ended/
messages from the start of the ENDSBS until the last job ended and the
subsystem itself ended. There may be one specific job that did not end
until just before the subsystem ended, which would likely be a good
place to dig.
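One way to pull those /job ended/ entries, sketched here with the
default period for the current day; the PERIOD values are placeholders
to be replaced with the window between the ENDSBS request and the
subsystem ending:

```cl
/* Print the job-ended (CPF1164) messages from the history log.    */
/* Narrow PERIOD to the start and end times of the slow ENDSBS.    */
DSPLOG LOG(QHST) PERIOD((*AVAIL *CURRENT) (*AVAIL *CURRENT)) +
         MSGID(CPF1164) OUTPUT(*PRINT)
```

The timestamps on the listed messages show which job was the last to
end before the subsystem itself ended.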
I seem to recall an issue, perhaps by design but possibly also
something since modified, whereby an interactive job that is already at
the /job ending immediately/ screen could delay the ending of the
subsystem. I think there was a design change after which that screen
would no longer be presented, or at least not left active for more than
a few moments.
Hmm. I did a little research with a web search of "PERFM" "ENDSBS"
and found the following link for what was noted just above; that delay
was formerly 30 seconds, and is now reduced to appear only "briefly":
APAR SE44615
http://www.ibm.com/support/docview.wss?uid=nas2d73b1eb4b14e196a8625777e004997fc