Peter Dow wrote:
> One of my customers was running payroll and the job seemed to
> hang. Looking at his call stack I saw:
> Type  Program   Library     Stmt   /Inst  Procedure
>       QCMD      QSYS               /04F3
>       INLPGMC0  QGPL        15400  /00D6
>       GPRM950   GP#LIBR                   _QRNP_PEP_GPRM950
>       GPRM950   GP#LIBR     1790          GPRM950
>       PPCP400   HS#LIBR_PP                _CL_PEP
>       PPCP400   HS#LIBR_PP  32900         PPCP400
>       PPCU669   HS#LIBR_PP  3700   /001F
>       QCPEX0FL  QSYS               /0121
>       QCPEXCON  QSYS               /0AB7
>       QCPGENIO  QSYS               /01DF
>       QDBGETM   QSYS               /0533
>       QWTPECTL  QSYS               /013D
>       QMHDLVMS  QSYS               /01A3
>       QMHDSMSS  QSYS               /1772
>       QWSGET    QSYS               /065D
>       QT3REQIO  QSYS               /0253
> which to me looks like it's waiting for a response from the user;
> however, his (Rumba) session shows input inhibited. WRKACTJOB
> shows the job in a DSPW status.
According to the stack, the job is sending a status message to
the message line. If the job were awaiting a reply, the job status
would be MSGW, and QMHRCVM [or a similarly named program; i.e. a
receive-message reply processor] would be on the stack instead of
QMHDSMSS [the display status message processor].
> The joblog shows:
>   3700 - CPYF FROMFILE(PPPPCWK) TOFILE(QTEMP/PPBPCWK)
>            MBROPT(*REPLACE) CRTFILE(*YES)
>            INCREL((*IF WHOSP# *LT '00001')
>            (*OR WHOSP# *GT '99999'))
>   Physical file PPBPCWK created in library QTEMP.
>   Member PPPPCWK added to file PPBPCWK in QTEMP.
> I checked the program, and it has a MONMSG after that CPYF:
>   MONMSG MSGID(CPF2869 CPF2817 CPC2957)
The issue is not specifically related to any of the messages that
the CPYF can issue. Note also that a completion message cannot be
monitored: MONMSG handles only escape, notify, and status messages,
and nothing indicates that CPYF would issue CPC2957 as an escape
(monitorable) message.
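For illustration, the statement as it might appear in the CL source is sketched below. This is a reconstruction from the joblog, not the customer's actual program. The MONMSG keeps the two CPF message IDs from the original but drops CPC2957, because a completion message never triggers a MONMSG.

```cl
/* Sketch only: copy the out-of-range WHOSP# records to a work   */
/* file in QTEMP, as at statement 3700 in the joblog.            */
CPYF       FROMFILE(PPPPCWK) TOFILE(QTEMP/PPBPCWK) +
             MBROPT(*REPLACE) CRTFILE(*YES) +
             INCREL((*IF WHOSP# *LT '00001') +
                    (*OR WHOSP# *GT '99999'))
/* MONMSG traps escape, notify, and status messages only.        */
/* CPC2957 "No records copied" is a completion message, so it    */
/* is not listed; an empty selection simply falls through to     */
/* the next statement.                                           */
MONMSG     MSGID(CPF2869 CPF2817)
```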
> I checked the PPPPCWK file and all records have WHOSP# = 00001,
> which means the above CPYF stmt would select nothing, which
> should give a CPC2957 completion message (which it did when I
> ran the CPYF manually).
Again, the issue has effectively nothing to do with the CPYF itself,
so little can be inferred from it. What might be of more interest
is how /noisy/ the request was in sending its status messages.
> I had him cancel his Rumba session (without canceling the job).
> His job then showed status DSC (disconnected). He started Rumba
> again, reconnected to the job, and it took off and completed
> normally.
I believe the reconnection worked because the device had previously
been disconnected by the first error listed below (the "not active"
condition), and the device recovery action had been defined as
*MSG, for which the CPF509F-related messaging allowed a
monitored/handled recovery.
>   Device MISPC02S2 session not active.
>   Input or Output request failed. See message CPF5170.
>   Job connected again. Sign on information ignored.
>   Job has successfully connected after I/O error.
>   ? C
>   Cancel reply received for message CPF509F.
>   Error while processing file QDDSPMSG in library QSYS.
>   No records copied from file PPPPCWK in PP#FILE.   <= CPC2957
>   4100 - CLRPFM FILE(PPPPCWK)
>   Member PPPPCWK file PPPPCWK in PP#FILE cleared.
>   4500 - CPYF FROMFILE(QTEMP/PPBPCWK) TOFILE(PPPPCWK)
>            MBROPT(*REPLACE)
>   Empty member PPPPCWK in file PPBPCWK in library QTEMP is
>   not copied.
>   Copy command ended because of error.   <= CPF2817
> The CPF2817 was monitored for and the program finished normally.
From the joblog and continued processing, it looks like
everything is probably working as designed.
> The question is, why did it pause at the CPC2957 completion
> message?
The device disconnected, probably due to a communication error.
The time it actually disconnected may not match the time of the
"not active" message. That the "not active" condition was detected
at the same moment as the CPC2957 completion message suggests that
the detection was a direct consequence of the completion message
being sent to the UIM panel or DSPF; i.e. the loss of communication
with the device was probably detected only because sending the
completion message is an attempt to perform I/O to the device via
the UIM or DSPF.
> Are they missing a PTF? Is it a Rumba problem? Something
> completely different?
While there may be a problem with the comm, the emulation, or
something else, unless the devices are commonly disconnected, it may
just be a hiccup.
To further explain...
A program which performs no I/O to the display device can
continue processing unaffected by the loss of the device. However,
the job was sending status messages _asynchronously_ to the virtual
display device, the messages being sent from QDBGETM. When a status
message is sent to *EXT but cannot be delivered [to the QDDSPMSG
message subfile] because the device is in error, that is a
recoverable error. The /recoverable/ error for that I/O request
generally does not impair the processing of a program; however, the
_completion_ message sent to the display file, UIM panel, or menu is
not an I/O that would be considered "recoverable", because it is
not simply /status/.

I am not sure whether the *MSG setting [see the QDEVRCYACN system
value] is what caused the workstation support to await a reconnect
after that I/O error, but from the stack it would seem so. Anyhow,
the device remained "input inhibited" due to a combination of the
QDEVRCYACN setting and the way the virtual device and emulation
software interact. That the job could be reattached via that device
indicates that things worked well. Review the QDEVRCYACN system
value to decide whether a different recovery action would be
preferable.
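A sketch of how the setting could be reviewed and changed. The listed values are the documented choices for QDEVRCYACN. *DSCMSG is used purely as an example, not as a recommendation. The right choice depends on how the shop wants interactive jobs to behave after a device error. The job-level alternative uses the DEVRCYACN parameter of CHGJOB.

```cl
/* Display the current system-wide device recovery action.       */
DSPSYSVAL  SYSVAL(QDEVRCYACN)

/* Possible values: *MSG, *DSCMSG, *DSCENDRQS, *ENDJOB,           */
/* *ENDJOBNOLIST.  Illustration only: disconnect the job and      */
/* signal the error to the program after the user reconnects.     */
CHGSYSVAL  SYSVAL(QDEVRCYACN) VALUE('*DSCMSG')

/* Alternatively, change the recovery action for one job only.    */
CHGJOB     DEVRCYACN(*DSCMSG)
```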
FWIW, a CHGJOB STSMSG(*NONE) before performing the copy would
have avoided some wasted processing on sending the status messages,
i.e. the activity seen in the quoted stack.
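A minimal sketch of that suggestion, bracketing the copy in the CL program. Restoring with STSMSG(*USRPRF) afterwards assumes the job normally takes its status-message setting from the user profile. A program that must restore an arbitrary prior setting would need to save it first.

```cl
/* Turn off status messages for this job, so the progress         */
/* messages issued during the copy are not sent to the display.   */
CHGJOB     STSMSG(*NONE)

/* ... the CPYF and its MONMSG, as at statement 3700 ...          */

/* Turn status messages back on.  *USRPRF is an assumption about  */
/* where this job normally gets its STSMSG setting.               */
CHGJOB     STSMSG(*USRPRF)
```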
Regards, Chuck