Re: CPYF gives CPC2957 as completion message but hangs job -- MIDRANGE-L

Some corrections, inline:

CRPence wrote:

Peter Dow wrote:

One of my customers was running payroll and the job seemed to
hang. Looking at his call stack I saw:

Program   Library     Stmt   Inst  Procedure
QCMD      QSYS               04F3
INLPGMC0  QGPL        15400  00D6
GPRM950   GP#LIBR                  _QRNP_PEP_GPRM950
GPRM950   GP#LIBR     1790         GPRM950
PPCP400   HS#LIBR_PP               _CL_PEP
PPCP400   HS#LIBR_PP  32900        PPCP400
PPCU669   HS#LIBR_PP  3700   001F
QCPEX0FL  QSYS               0121
QCPEXCON  QSYS               0AB7
QCPGENIO  QSYS               01DF
QDBGETM   QSYS               0533
QWTPECTL  QSYS               013D
QMHDLVMS  QSYS               01A3
QMHDSMSS  QSYS               1772
QWSGET    QSYS               065D
QT3REQIO  QSYS               0253

which to me looks like it's waiting for a response from the user;
however, his (Rumba) session shows input inhibited. WRKACTJOB
shows the job in a DSPW status.

<ed: correction to the below text, following the quoted text>

According to the stack, the job is sending a status message to the message line. If the job were awaiting a reply, the job status would be MSGW and the QMHRCVM [or similar name; i.e. a receive message reply processor] would be on the stack instead of QMHDSMSS [the display status message processor].

According to the stack, the job is displaying a break message, which has interrupted a database get-multiple request that implements the CPYF request. The program QMHDSMSS is the "Display message" processor.

The joblog shows:

3700 - CPYF FROMFILE(PPPPCWK) TOFILE(QTEMP/PPBPCWK)
MBROPT(*REPLACE) CRTFILE(*YES)
INCREL((*IF WHOSP# *LT '00001')
(*OR WHOSP# *GT '99999'))
Physical file PPBPCWK created in library QTEMP.
Member PPPPCWK added to file PPBPCWK in QTEMP.

I checked the program, and it has a MONMSG after that CPYF:

MONMSG MSGID(CPF2869 CPF2817 CPC2957)

The issue is not specifically related to those messages which can
be issued by the CPYF. Note also that a completion message cannot
be monitored; nothing in the CPYF documentation indicates that
CPC2957 would ever be issued as an escape (i.e. monitorable)
message.
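To illustrate the monitoring point, a minimal CL sketch (not the customer's actual program): the MONMSG keeps only the escape messages from the original program, and, as an assumption about one way to detect the "nothing selected" case, a RCVMSG with RMV(*NO) peeks at the last message left on the program message queue. The variables &MSGID and &NOREC are hypothetical.

```cl
PGM
  DCL        VAR(&MSGID) TYPE(*CHAR) LEN(7)
  /* Hypothetical flag: '1' when the copy selected no records. */
  DCL        VAR(&NOREC) TYPE(*LGL) VALUE('0')

  /* MONMSG catches escape, notify and status messages, never */
  /* completion messages, so CPC2957 is omitted from the list. */
  CPYF       FROMFILE(PPPPCWK) TOFILE(QTEMP/PPBPCWK) +
               MBROPT(*REPLACE) CRTFILE(*YES) +
               INCREL((*IF WHOSP# *LT '00001') +
               (*OR WHOSP# *GT '99999'))
  MONMSG     MSGID(CPF2869 CPF2817)

  /* Peek at (RMV(*NO)) the last message on this program's    */
  /* message queue to see how the copy ended.                 */
  RCVMSG     MSGTYPE(*LAST) RMV(*NO) MSGID(&MSGID)
  IF         COND(&MSGID *EQ 'CPC2957') THEN(CHGVAR +
               VAR(&NOREC) VALUE('1'))
ENDPGM
```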

I checked the PPPPCWK file and all records have WHOSP# = 00001,
which means the above CPYF stmt would select nothing, which
should give a CPC2957 completion message (which it did when I
ran the CPYF manually).

Again, the issue has effectively nothing to do with the CPYF, so
that detail offers little from which to infer anything. What might
be of interest is how /noisy/ the request was with its status
messages.

Because the issue is not status messaging but instead a break message being delivered, how many status messages were sent [i.e. how /noisy/ the request was] is not of interest.

I had him cancel his Rumba session (without canceling the job).
His job then showed status DSC (disconnected). He started Rumba
again, reconnected to the job, and it took off and completed
normally.

I believe the reconnection was functional because the device
had previously been disconnected due to the first error listed
below, i.e. the "not active" condition, and the device recovery
action had been defined as *MSG, for which the CPF509F-related
messaging allowed a monitored/handled recovery.

It was clarified that the QDEVRCYACN system value was *DSCMSG. I did not explicitly note in the above comment that DEVRCYACN is also an attribute of the job [e.g. see CHGJOB], and the job's value need not be *SYSVAL; i.e. it may differ from the system value. AFaIK a CPF5170 should have effected a DSCJOB for a device recovery action of *DSCMSG.
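For reference, a short CL sketch of where the two settings can be inspected: the system value supplies the default, while the job's own DEVRCYACN attribute (from its job description, or as changed with CHGJOB) is what actually applies. This is illustrative only; *DSCMSG simply mirrors the value mentioned above.

```cl
/* System-wide default, used by jobs whose value is *SYSVAL.    */
DSPSYSVAL  SYSVAL(QDEVRCYACN)

/* Job definition attributes, including the device recovery     */
/* action actually in effect for the current job.               */
DSPJOB     OPTION(*DFNA)

/* Set the current job's action explicitly, rather than relying */
/* on the job description or the system value.                  */
CHGJOB     DEVRCYACN(*DSCMSG)
```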

Device MISPC02S2 session not active.
Input or Output request failed. See message CPF5170.
Job connected again. Sign on information ignored.
Job has successfully connected after I/O error.
? C
Cancel reply received for message CPF509F.
Error while processing file QDDSPMSG in library QSYS.
No records copied from file PPPPCWK in PP#FILE. <= CPC2957
4100 - CLRPFM FILE(PPPPCWK)
Member PPPPCWK file PPPPCWK in PP#FILE cleared.
4500 - CPYF FROMFILE(QTEMP/PPBPCWK) TOFILE(PPPPCWK)
MBROPT(*REPLACE)
Empty member PPPPCWK in file PPBPCWK in library QTEMP is
not copied.
Copy command ended because of error. <= CPF2817

The CPF2817 was monitored for and the program finished normally.

From the joblog and continued processing, it looks like everything is probably working as designed.

The question is, why did it pause at the CPC2957 completion message?

<ed: correction to the below text, following the quoted text>

The device disconnected, probably due to a communication error. The moment it actually disconnected may not coincide with the "not active" message. That the "not active" condition was detected at the same moment as the CPC2957 completion message would suggest that the disconnected device was detected as a direct consequence of the completion message being sent to the UIM panel or DSPF; i.e. the device's loss of communication was probably detected because sending the completion message is an attempt to perform I/O to the device via the UIM or DSPF.

The above reflects a false inference that the CPC2957 is related. Instead: the job was in DSPMSG, activated by the break message during the CPYF; a device error occurred [while the job was at DSPMSG, or sometime before; because the joblog snippet is not from a spooled joblog, the lack of timestamps inhibits good inferences]; the job reconnected, and the I/O error was then reported to the DSPMSG processor [the CPF509F, answered with C]; and its display file QDDSPMSG was terminated, thus allowing the CPYF to continue.

Are they missing a PTF? Is it a Rumba problem? Something completely different?

While there may be a problem with the comm, the emulation, or something else, unless the devices are commonly disconnected, it
may just be a hiccup.

To further explain...

A program which performs no I/O to the display device can
continue processing unaffected by the loss of the device.

The CPYF was performing I/O between database files; i.e. reading from one database file member (or members) and writing to another. As such, its only I/O beyond the FROMFILE and TOFILE would be status messages, and a status message sent to a disconnected display device is a recoverable error. The CPC2957 would have come at the completion of the CPYF, some time after the job reconnected; i.e. after however long it took to finish the copy, given that the request had been interrupted by the DSPMSG, which in turn was interrupted by the device error.

<ed: correction to the below text, following the quoted text>

However, the job was sending status messages _asynchronously_ to
the virtual display device, from QDBGETM. When a status message is
sent to *EXT but cannot be delivered [to the QDDSPMSG message
subfile] because the device is in error, that is a recoverable
error. The /recoverable/ error for that I/O request generally does
not impair the processing of a program; however, the _completion_
message sent to the display file, UIM panel, or menu is not an I/O
that would be considered "recoverable", because it is not simply
/status/.

Ignore the above quoted text; it was a response based on having misread the given program stack.

I am not sure whether *MSG [see the QDEVRCYACN system value] might
signal the I/O error such that the workstation support would then
await a reconnect, but from the stack it would seem so. Anyhow, the
device remained "input inhibited" due to a combination of the
QDEVRCYACN setting and the way the virtual device and the emulation
software interact. That the job could be reattached via that device
indicates that things worked well. Review the system value setting
to decide whether a different result might be preferable.

The origin of the device error, how the device error is handled according to the DEVRCYACN of the job, and how the emulator & job reacted to both seem to be the crux.

FWiW, I recall over some years having occasional but rare issues with break messages /hanging/ my session with the II (Input Inhibited) indicator left on [and, IIRC, using DSCJOB or ENDJOB to recover]; an issue I eventually concluded had most likely come about when I was actively typing while the DSPMSG suddenly appeared with a panel of output-only text. I never used Rumba, however, and I do not recall ever getting a specific resolution; changing my user profiles to have DLVRY(*HOLD) probably prevented some occurrences that might otherwise have persisted.
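A sketch of the DLVRY(*HOLD) change mentioned above. The profile name MYUSER is hypothetical, and the assumption is that the profile's delivery mode is applied to the user message queue at sign-on, hence the separate CHGMSGQ for the session already active.

```cl
/* MYUSER is a hypothetical profile name. With *HOLD, break   */
/* messages no longer interrupt the session; they stay on the */
/* queue until viewed.                                        */
CHGUSRPRF  USRPRF(MYUSER) DLVRY(*HOLD)

/* Assumed to be needed for the current sign-on only.         */
CHGMSGQ    MSGQ(*USRPRF) DLVRY(*HOLD)

/* Review held messages when convenient.                      */
DSPMSG     MSGQ(*USRPRF)
```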

FWiW, a CHGJOB STSMSG(*NONE) before performing the copy would
have prevented some wasted processing in sending the status
messages; i.e. what was seen active in the quoted stack.

The STSMSG comment was based on misreading the stack. It is still worthwhile information when a job will issue many status messages that nobody needs to see; i.e. turning them off saves resources and prevents asynchronous messaging from making an otherwise quickly completed process wait on completion of the enqueued messages.
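A minimal sketch of that suggestion, wrapped around the second copy from the joblog (statement 4500). It assumes the job ordinarily takes its STSMSG setting from the user profile, so it restores *USRPRF afterward rather than a saved prior value.

```cl
/* Suppress status messages only around the copy; completion, */
/* diagnostic and escape messages are unaffected.             */
CHGJOB     STSMSG(*NONE)

CPYF       FROMFILE(QTEMP/PPBPCWK) TOFILE(PPPPCWK) +
             MBROPT(*REPLACE)
MONMSG     MSGID(CPF2817)

/* Assumption: the job normally uses the user profile's       */
/* setting, so restore that.                                  */
CHGJOB     STSMSG(*USRPRF)
```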

Regards, Chuck