Re: RPGLE program quits on READ of subfile control -- MIDRANGE-L

On 01-Jun-2010 08:22, Bonnie Lokenvitz wrote:

Thanks for the insights. Ctrl-Esc is mapped to SysRqs.

The user's message queue status was set to *NOTIFY but she had 4
interactive sessions. The user message queue was allocated to a
different session.

IIRC, if the workstation [& user?] message queue were held by the requesting job, the TFRSECJOB would effect the drop of the *MSGQ allocation much like a signoff. Then, upon returning to the original job, the OS would attempt to reestablish the original /delivery method/ that the message queue had prior to the TFRSECJOB. Since the user did not have their *MSGQ allocated in that job, that should exclude the user message queue as the source of any issue. It might be interesting to note, however, whether messages were received during processing such that the /notify/ condition for the *WRKSTN message queue had been established.

The program that was active in the 'hung' session has a message
handling program that does send formatted messages TOPGMQ(*PRV).
The user had done over 2600 entries each of which would have
caused a message to be sent.

Unless it was a /break handling/ program associated with the user or workstation message queue, that message handling should be moot with regard to what processing occurs upon returning to the initial job after the signoff from a secondary job; at least it should be unrelated to any suspicion about the OS processing that handles the status of a break\notify condition for the workstation device\emulator or the user.

FWiW, it might be interesting to test whether sending a break message to the device resolves the apparent hang; i.e. if the message breaks and the user presses Enter\F3\F12 to exit from the DSPMSG panel, whether processing can then continue just as it does with the TFRSECJOB currently used to effect recovery.

Anyhow, I think the relatedness of user\workstation messaging to the issue may be off the mark. I only mentioned it because the TFRSECJOB was being utilized to circumvent. I very much suspect that something like SNDBRKMSG, or simply SysRqs-3 and then returning to the screen, might suffice to circumvent, versus actually transferring to a secondary job and then returning. Easy enough to test; though probably best tried before a SNDBRKMSG, but after getting the output from WRKJOB OUTPUT(*PRINT).
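
A rough sketch of that test sequence, as issued from another session, might look like the following. The qualified job name 123456/USERNAME/DSP01 and the device name DSP01 are placeholders for the apparently-hung job and its workstation, not values from this thread, and OPTION(*ALL) is just one choice for what to print.

```
/* Placeholders: 123456/USERNAME/DSP01 stands for the hung job. */
/* Capture the job status and program stack before changing anything. */
WRKJOB JOB(123456/USERNAME/DSP01) OUTPUT(*PRINT) OPTION(*ALL)

/* Have the user try SysRqs-3 and return to the screen first.   */
/* If the READ is still stuck, try a break message to the       */
/* workstation message queue (DSP01 is a placeholder).          */
SNDBRKMSG MSG('Test break message') TOMSGQ(DSP01)
```

Doing one step at a time, and noting which one (if any) lets the READ complete, should help narrow down whether messaging is involved at all.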

QJOBMSGQFL is set to *PRTWRAP. I did not see that happen. Other
message queue values (QJOBMSGQMX, QJOBMSGQSZ, QJOBMSGQTL) are set
at the shipped levels (64, 16, 24). This is a V6R1 system with
lots of disk & memory and only 4 users each running 4 sessions.

Interesting thought. So apparently it was determined that the condition was not just a /wait condition/ caused by having wrapped the job message queue? Note that the Job message queue maximum size [JOBMSGQMX] and Job message queue full action [JOBMSGQFL] are attributes of the job, which may not resolve directly from their respective system values. Also FWiW, of the noted sysvals, only QJOBMSGQMX and QJOBMSGQFL are still of interest; i.e. the other two mentioned [QJOBMSGQSZ and QJOBMSGQTL] have long been ineffectual.
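
To compare what the job is actually using against the system values, something like the following could be run from another session. The qualified job name is a placeholder; the job definition attributes display should include the job message queue maximum size and full action.

```
/* Placeholder job name; substitute the real number/user/name.  */
/* Job-level attributes, which are what the job actually uses.  */
DSPJOB JOB(123456/USERNAME/DSP01) OPTION(*DFNA)

/* System values, for comparison with the job attributes above. */
DSPSYSVAL SYSVAL(QJOBMSGQMX)
DSPSYSVAL SYSVAL(QJOBMSGQFL)
```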

We will do the WRKJOB OUTPUT(*PRINT) next time it happens.

Could also STRSRVJOB and TRCJOB *ON after the problem starts and before the circumvention is attempted, or even long before the problem appears, in anticipation that the trace will catch the apparent hang. Because the job can issue SysRqs-1 during the /stop/, the request to service the job should be possible even after the issue arises; note that this also allows starting debug on the subfile program.
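
A minimal sketch of the service-and-trace sequence, run from a different session, assuming the hung job is 123456/USERNAME/DSP01 (a placeholder):

```
/* Placeholder job name; substitute the real number/user/name.  */
STRSRVJOB JOB(123456/USERNAME/DSP01)
TRCJOB SET(*ON) TRCTYPE(*ALL)

/* ... let the hang occur, or attempt the circumvention ...     */

/* Stop the trace and produce the trace output, then end the    */
/* service request.                                             */
TRCJOB SET(*OFF)
ENDSRVJOB
```

While the job is being serviced, STRDBG issued from the servicing session applies to the serviced job, which is how debug could be started on the subfile program.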

Regards, Chuck

CRPence wrote:

On 27-May-2010 13:23, Bonnie Lokenvitz wrote:

I am working with an interactive production program that is
used daily. About once a month it quits functioning for one
user after she has entered 2000+ lines (which are written to
a file in QTEMP). The program stops after it writes the
subfile control record and attempts the following READ
statement.

Today I had the user do a CTRL/ESC to a new session, log on,
log off. This made the READ statement in the original session
work.

Any ideas of what is going on?

What is Ctrl-Esc mapped to? Presumably SysRqs [System
Request] such that SysRqs-1 effects TFRSECJOB, or SysRqs shows
the System Request panel? If so, are the other activities
required, or might just SysRqs followed by F12 to return to the
processing get things going again? Regardless, when it happens
again, probably better to have another user\job request a
WRKJOB OUTPUT(*PRINT) against the apparently-hung job to see
its status and program stack.

FWiW the return to a job after the sequence of TFRSECJOB,
signon, and signoff, will check the break\notify for the
workstation [& user?] message queue. The DLVRY() setting for
the user or the workstation message queue might be germane.