Re: CPYF gives CPC2957 as completion message but hangs job -- MIDRANGE-L

Thanks Chuck & Simon.

This was from their live environment; they'll be refreshing their test environment for other reasons, but I'll try to reproduce the problem there and get a spooled job log with timestamps.
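For anyone following along, a spooled joblog like that can be produced roughly as below. This is a sketch only: the job number 123456 and user PAYUSER are hypothetical placeholders, and the job name MISPC02S2 is the device named in the joblog further down, since an interactive job is normally named for its device. The printed job log (QPJOBLOG) carries the date and time each message was sent.

```cl
/* In the session that will run payroll, before reproducing:   */
/* log all messages with second-level text, and log the CL     */
/* commands that programs run.                                  */
CHGJOB LOG(4 00 *SECLVL) LOGCLPGM(*YES)

/* From another session, while the job sits in DSPW            */
/* (hypothetical job identifier; substitute number/user/name): */
DSPJOBLOG JOB(123456/PAYUSER/MISPC02S2) OUTPUT(*PRINT)
```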

In the meantime, just to clarify, the job will sit in DSPW status until the Rumba telnet session is manually terminated, at which point it gets the CPF5170. Hopefully this will be apparent in the forthcoming spooled joblog (it may be tomorrow before I'm able to do it).

Peter Dow
Dow Software Services, Inc.
909 793-9050
pdow@xxxxxxxxxxxxxxx

On 5/5/2010 11:14 PM, CRPence wrote:

Some corrections, inline:

CRPence wrote:

Peter Dow wrote:

One of my customers was running payroll and the job seemed to
hang. Looking at his call stack I saw:

Program   Library     Stmt   /Inst  Procedure
QCMD      QSYS               /04F3
INLPGMC0  QGPL        15400  /00D6
GPRM950   GP#LIBR                   _QRNP_PEP_GPRM950
GPRM950   GP#LIBR     1790          GPRM950
PPCP400   HS#LIBR_PP                _CL_PEP
PPCP400   HS#LIBR_PP  32900         PPCP400
PPCU669   HS#LIBR_PP  3700   /001F
QCPEX0FL  QSYS               /0121
QCPEXCON  QSYS               /0AB7
QCPGENIO  QSYS               /01DF
QDBGETM   QSYS               /0533
QWTPECTL  QSYS               /013D
QMHDLVMS  QSYS               /01A3
QMHDSMSS  QSYS               /1772
QWSGET    QSYS               /065D
QT3REQIO  QSYS               /0253

which to me looks like it's waiting for a response from the user;
however, his (Rumba) session shows input inhibited. WRKACTJOB
shows the job in a DSPW status.

<ed: correction to the below text, following the quoted text>

According to the stack, the job is sending a status message to the
message line. If the job were awaiting a reply, the job status would be
MSGW and the QMHRCVM [or similar name; i.e. a receive message reply
processor] would be on the stack instead of QMHDSMSS [the display status
message processor].

According to the stack, the job is displaying a break message,
which has interrupted a database get-multiple request that
implements the CPYF request. The program QMHDSMSS is the "Display
message" processor.

The joblog shows:

3700 - CPYF FROMFILE(PPPPCWK) TOFILE(QTEMP/PPBPCWK)
MBROPT(*REPLACE) CRTFILE(*YES)
INCREL((*IF WHOSP# *LT '00001')
(*OR WHOSP# *GT '99999'))
Physical file PPBPCWK created in library QTEMP.
Member PPPPCWK added to file PPBPCWK in QTEMP.

I checked the program, and it has a MONMSG after that CPYF:

MONMSG MSGID(CPF2869 CPF2817 CPC2957)

The issue is not specifically related to those messages which can
be issued by the CPYF. Note also that a completion message cannot
be monitored; nothing documented for CPYF indicates that CPC2957
would be issued as an escape (i.e. monitor-capable) message.
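Because MONMSG only traps escape, notify, and status messages, listing CPC2957 there accomplishes nothing. If the program does need to know that nothing was selected, one common CL approach is to receive the newest message from the program's own message queue right after the CPYF. A minimal sketch, assuming the CPYF completes normally so that its completion message is the last one on the queue; the &MSGID and &NOREC variables are illustrative names, not taken from the customer's program.

```cl
PGM
  DCL VAR(&MSGID) TYPE(*CHAR) LEN(7)
  DCL VAR(&NOREC) TYPE(*LGL) VALUE('0')

  CPYF FROMFILE(PPPPCWK) TOFILE(QTEMP/PPBPCWK) MBROPT(*REPLACE) +
         CRTFILE(*YES) INCREL((*IF WHOSP# *LT '00001') +
         (*OR WHOSP# *GT '99999'))
  /* Only escape/notify/status messages can be monitored;   */
  /* CPC2957 is a completion message, so it is not listed.  */
  MONMSG MSGID(CPF2869 CPF2817)

  /* Look at (without removing, so it stays in the joblog)  */
  /* the newest message on this program's message queue.    */
  RCVMSG MSGTYPE(*LAST) RMV(*NO) MSGID(&MSGID)
  IF COND(&MSGID *EQ 'CPC2957') THEN(DO)
    /* No records matched the INCREL selection. */
    CHGVAR VAR(&NOREC) VALUE('1')
  ENDDO
ENDPGM
```

If an escape such as CPF2817 was monitored instead, it would be the last message and the IF simply does not fire.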

I checked the PPPPCWK file and all records have WHOSP# = 00001,
which means the above CPYF stmt would select nothing, which
should give a CPC2957 completion message (which it did when I
ran the CPYF manually).

Again, the issue has effectively nothing to do with the CPYF, so
that message is of little value for inferring anything. What might
be of curiosity is how /noisy/ the request was with its status
messages.

Because the issue is not status messaging, but instead the
notification of a break message, how many status messages were sent
[i.e. how /noisy/ the request was] is not of interest.

I had him cancel his Rumba session (without canceling the job).
His job then showed status DSC (disconnected). He started Rumba
again, reconnected to the job, and it took off and completed
normally.

I believe the reconnection was functional because the device
had previously been disconnected due to the first error listed
below, i.e. the "not active" condition, and the device recovery
action had been defined as *MSG, for which the CPF509F-related
messaging allowed a monitored/handled recovery.

It was clarified that QDEVRCYACN was *DSCMSG. I did not
explicitly note in the above comment that DEVRCYACN is an attribute
of the job [e.g. see CHGJOB] which need not be set to *SYSVAL; i.e.
the job may not be deferring to the system value at all. AFAIK a
CPF5170 should have effected a DSCJOB for a device recovery action
of *DSCMSG.
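To see which value actually applies, a sketch comparing the system value with the job's own attribute. The job identifier is a hypothetical placeholder; the job's value can come from its job description or from a CHGJOB rather than from the system value.

```cl
/* System-wide default device recovery action */
DSPSYSVAL SYSVAL(QDEVRCYACN)

/* The job's own definition attributes, which include its  */
/* device recovery action (hypothetical job identifier).   */
DSPJOB JOB(123456/PAYUSER/MISPC02S2) OPTION(*DFNA)
```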

Device MISPC02S2 session not active.
Input or Output request failed. See message CPF5170.
Job connected again. Sign on information ignored.
Job has successfully connected after I/O error.
? C
Cancel reply received for message CPF509F.
Error while processing file QDDSPMSG in library QSYS.
No records copied from file PPPPCWK in PP#FILE.<= CPC2957
4100 - CLRPFM FILE(PPPPCWK)
Member PPPPCWK file PPPPCWK in PP#FILE cleared.
4500 - CPYF FROMFILE(QTEMP/PPBPCWK) TOFILE(PPPPCWK)
MBROPT(*REPLACE)
Empty member PPPPCWK in file PPBPCWK in library QTEMP is
not copied.
Copy command ended because of error.<= CPF2817

The CPF2817 was monitored for and the program finished normally.

From the joblog and continued processing, it looks like everything is
probably working as designed.

The question is, why did it pause at the CPC2957 completion message?

<ed: correction to the below text, following the quoted text>

The device disconnected, probably due to a communication error. The
moment it actually disconnected may not coincide with the "not active"
message. The correlation between when the "not active" condition was
detected and when the CPC2957 completion message was sent would
suggest that the disconnected device was detected as a direct
consequence of the completion message being sent to the UIM or DSPF;
i.e. the loss of communication with the device probably was detected
only because sending the completion message is an attempt to perform
I/O to the device via the UIM or DSPF.

The above reflects a false inference that the CPC2957 is related.
Instead: the job was in a DSPMSG activated during the CPYF; a
device error occurred [while the job was at DSPMSG, or sometime
before; since the joblog snippet is not a spooled joblog, the lack
of timestamps inhibits good inferences]; the job reconnected and
the I/O error was returned to the DSPMSG processor; and its
display file QDDSPMSG was terminated, thus allowing the CPYF to
continue.

Are they missing a PTF? Is it a Rumba problem? Something
completely different?

While there may be a problem with the comm, the emulation, or
something else, unless the devices are commonly disconnected, it
may just be a hiccup.

To further explain...

A program which performs no I/O to the display device can
continue processing unaffected by the loss of the device.

The CPYF was performing I/O between database files; i.e. reading
from the member(s) of one database file and writing to the member(s)
of another. As such, its only I/O beyond the FROMFILE and TOFILE
would be status messages, and a status message sent to a disconnected
display device is a recoverable error. The CPC2957 would have come at
the completion of the CPYF, some time after the job reconnected; the
delay was however long it took to finish the copy, since the request
had been interrupted by the DSPMSG, which itself had been interrupted
by the device error.

<ed: correction to the below text, following the quoted text>

However, the job was sending status messages _asynchronously_ to
the virtual display device, from QDBGETM. When a status message is
sent to *EXT but cannot be delivered [to the QDDSPMSG message
subfile] because the device is in error, that is a recoverable
error. The /recoverable/ error for that I/O request generally does
not impair the processing of a program; however, the _completion_
message sent to the display file, UIM panel, or menu is not an I/O
that would be considered "recoverable", because it is not simply
/status/.

Ignore the above quoted text; it was a response based on having
misread the given program stack.

I am not sure whether *MSG might affect how the I/O error is handled
[see the QDEVRCYACN system value], such that the workstation support
would then await reconnection, but from the stack it would seem so.
Anyhow, the device remained "input inhibited" due to a combination of
the QDEVRCYACN setting and the way the virtual device and emulation
software interact. That the job could be reattached via that device
indicates that things worked well. Review the system value setting
to decide whether a different result might be preferable.
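If the job turns out not to be following the system value, a sketch of pointing it back at QDEVRCYACN, either for the current job only or via the job description the interactive job starts with. MYLIB/PAYJOBD is a hypothetical name; substitute the job description actually in use.

```cl
/* Current job only: defer to the QDEVRCYACN system value. */
CHGJOB DEVRCYACN(*SYSVAL)

/* Jobs started later with this (hypothetical) job         */
/* description will also defer to the system value.        */
CHGJOBD JOBD(MYLIB/PAYJOBD) DEVRCYACN(*SYSVAL)
```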

The origin of the device error, how the device error is handled
according to the DEVRCYACN of the job, and how the emulator and job
reacted to both seem to be the crux.

FWIW I recall over some years having occasional but rare issues
with break messages /hanging/ my session with the II (Input
Inhibited) indicator left on [and IIRC using DSCJOB or ENDJOB to
recover]; an issue which I eventually concluded most likely came
about when I was actively typing while the DSPMSG suddenly appeared
with a panel of output-only text. I never used Rumba, however, and I
do not recall ever getting a specific resolution; having changed my
user profiles to have DLVRY(*HOLD) probably prevented some
occurrences that might otherwise have persisted.
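For reference, a sketch of both the preventive change and the recovery mentioned above. PAYUSER and the job number are hypothetical placeholders, and running DSCJOB against another user's job requires suitable authority.

```cl
/* Hold, rather than break on, messages arriving on the   */
/* user's message queue (hypothetical profile name).      */
CHGUSRPRF USRPRF(PAYUSER) DLVRY(*HOLD)

/* From another session: disconnect a session left with   */
/* II on, without ending the job (hypothetical job id).   */
DSCJOB JOB(123456/PAYUSER/MISPC02S2)
```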

FWIW, a CHGJOB STSMSG(*NONE) before performing the copy would
have prevented some wasted processing on sending the status
messages; i.e. what was seen active in the quoted stack.

The STSMSG comment is based on misreading the stack. It is still
worthwhile information if the job will issue many status messages
that are not required to be seen; i.e. turning them off saves
resources and prevents asynchronous messaging from making a
quickly completed process wait for enqueued messages to be delivered.
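A minimal sketch of bracketing the copy that way, reusing the CPYF from the joblog above. Restoring with *USRPRF assumes the job normally takes its status-message setting from the user profile; adjust if it is set some other way.

```cl
PGM
  /* Suppress status messages for the duration of the copy. */
  CHGJOB STSMSG(*NONE)

  CPYF FROMFILE(PPPPCWK) TOFILE(QTEMP/PPBPCWK) MBROPT(*REPLACE) +
         CRTFILE(*YES) INCREL((*IF WHOSP# *LT '00001') +
         (*OR WHOSP# *GT '99999'))
  MONMSG MSGID(CPF2869 CPF2817)

  /* Restore status messages (assumes the user-profile     */
  /* setting is what the job normally uses).                */
  CHGJOB STSMSG(*USRPRF)
ENDPGM
```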

Regards, Chuck