On 15-Aug-2014 14:58 -0500, Steinmetz, Paul wrote:
> After more research, I found this problem is NOT always occurring
> for this job.
If a specific job and set of conditions can be identified for which the
problem is consistently reproduced, then a Job Trace is a good place to
continue the review of the issue. IIRC the message identifier appears as
trace data [trace data, vs merely trace flow, must be included in the
trace request] when QMHSNMSG [I think that is the correct OS pgm]
performs the work of sending a program message that would count against
the overall number of messages sent in the job. The trace would also
reveal any use of Display Job Log (DSPJOBLOG) [or any indirect method
that causes the QMHJLOG program to be invoked].
> Also found the source for the 3rd party vendor's program.
> The job does send many messages to programs message q.
> I believe these messages are filling up the program message, which
> result in the job message q filling, which sometimes results in a
> CPI2417, but not always.
> The job log is not full, but job message.
Presumably that is meant to say "The job log is not full, but the Job
Message Queue is full."
Distinguishing the joblog from the Job Message Queue (JMQ) is
important in that scenario. The joblog is best viewed as the data that
is either spooled or directed to an output file, after the effects of
Log-Level (LOG) filtering; the JMQ is the unfiltered\comprehensive
record of _all messaging for the job_ [since the last time the JMQ was
cleared for re-use or wrapped]. The joblog could appear almost
completely empty even while the JMQ has reached its capacity; though one
would hope that /something/ would appear in the spooled joblog to help
identify what had caused the JMQ to become full. Thus the joblog itself
could never be /full/, but the output device\file into which the
/joblog/ is written can be /full/.
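To make the joblog vs JMQ distinction concrete, what follows is a toy Python model, not anything the OS provides: every message goes into one bounded JMQ, while the joblog is only what survives a filter. A single severity threshold stands in for the Log-Level (LOG) filtering, which is actually more involved; the class names, the capacity counted in messages, and the message identifier INF0001 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    msg_id: str      # hypothetical message identifier in the demo below
    severity: int    # 0-99, as for IBM i messages


@dataclass
class JobMessageQueue:
    capacity: int    # toy capacity, counted in messages rather than storage
    messages: List[Message] = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.messages) >= self.capacity

    def send(self, msg: Message) -> bool:
        """Every message lands in the JMQ, whatever the logging level."""
        if self.is_full():
            return False  # queue full
        self.messages.append(msg)
        return True


def spool_joblog(jmq: JobMessageQueue, min_severity: int) -> List[Message]:
    """Simplified stand-in for LOG() filtering: only messages at or above
    min_severity are written to the joblog output."""
    return [m for m in jmq.messages if m.severity >= min_severity]


# Many low-severity informational messages fill the JMQ ...
jmq = JobMessageQueue(capacity=5)
for _ in range(5):
    jmq.send(Message("INF0001", severity=0))
rejected = not jmq.send(Message("INF0001", severity=0))

# ... yet a joblog filtered at severity 30 shows none of them.
joblog = spool_joblog(jmq, min_severity=30)
print(jmq.is_full(), len(joblog))  # True 0
```

Run as-is, the JMQ reports full while the filtered joblog is empty, which is the scenario described above.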
The recovery text for CPI2417 suggests that one might be able to infer
the origin by reviewing "the messages in the job message queue to
determine if there is a problem", yet the /user/ really only has access
to the Job Log rather than the referenced Job Message Queue. So a
possible Design Change Request (DCR) idea is to ask that, when the OS
decides to send msg CPI2417, the output for the joblog should always
additionally include a "message" detailing an effective /map/ of the
JMQ; perhaps some counts\statistics that might help one visualize the
origin of the "full" condition being diagnosed. Or perhaps by some other
means, one could request that an effective /formatted dump/ of the JMQ
be produced when that condition is diagnosed; a /dump/ in that case
meaning something that can be comprehended by someone other than a
programmer of the OS. As a message, the CPI2417 itself might also
provide additional details to help infer the origin, beyond what is
found in the joblog output produced per the *PRTWRAP [Job Message Queue
Full Action, JOBMSGQFL] setting.
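The /map/ suggested for the DCR could be as simple as counts grouped by message identifier and by destination program message queue. A minimal Python sketch, assuming the JMQ contents were somehow available as (message id, queue) pairs; no such API is implied, and the names jmq_map, QueuedMessage, PGMA, and USR000n are hypothetical:

```python
from collections import Counter
from typing import Iterable, NamedTuple


class QueuedMessage(NamedTuple):
    msg_id: str         # message identifier
    program_queue: str  # program (or procedure) message queue it was sent to


def jmq_map(messages: Iterable[QueuedMessage], top: int = 3) -> str:
    """Summarize a job message queue: the total, plus the most frequent
    message identifiers and destination program message queues."""
    messages = list(messages)
    by_id = Counter(m.msg_id for m in messages)
    by_queue = Counter(m.program_queue for m in messages)
    lines = [f"Total messages: {len(messages)}", "Top message identifiers:"]
    lines += [f"  {mid}: {n}" for mid, n in by_id.most_common(top)]
    lines.append("Top program message queues:")
    lines += [f"  {q}: {n}" for q, n in by_queue.most_common(top)]
    return "\n".join(lines)


# Hypothetical contents: one program floods its own program message queue.
sample = ([QueuedMessage("USR0001", "PGMA")] * 90
          + [QueuedMessage("USR0002", "*EXT")] * 8
          + [QueuedMessage("USR0003", "*EXT")] * 2)
summary = jmq_map(sample)
print(summary)
```

Even a summary this crude would point straight at the program and message identifier responsible for most of the JMQ.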
> This job also sends to a dtaq.
> If the other job on the system that reads this dataq is not running,
> the dataq also fills up.
> <ed: Job> S001NITE19 <ed: Usr> trp1
> message - Storage limit exceeded for data queue PTMDTAQ001.
Presumably that reflects msg CPF950A "Storage limit exceeded for data
queue &1 in &2.", originating from a call to the Send Data Queue
(QSNDDTAQ) API; the condition could have been sent as a message or
provided as feedback via a[n effective] return code. Without the actual
spooled joblog to show some context of the failure, the above comment is
merely speculation.
> However, I'm not sure if the filling of the dataq is related.
If the requester adding data queue entries does not temper the
work\messages being added to the data queue in response to the "queue
full" failure condition, as an attempt to give the /other job/ an
opportunity to decrease the total number of messages on the DtaQ [i.e.
to allow the dequeuing job to decrement the count of messages on the
DtaQ], then each failed attempt may add yet more messages to the JMQ.
The filling of the JMQ might then easily be a side effect that occurs
only on those occasions when the data queue was allowed to reach that
storage limit.
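One way a producer could temper its work is bounded retries with exponential backoff: on "queue full", wait progressively longer to give the dequeuing job a chance to drain entries, and stop producing after a few attempts rather than failing (and adding messages) repeatedly. A toy Python sketch; the data queue is modeled by a hypothetical BoundedDataQueue with a limit counted in entries rather than storage, and it does not call QSNDDTAQ:

```python
from collections import deque
from typing import Callable


class BoundedDataQueue:
    """Toy data queue whose storage limit is modeled as a maximum entry count."""

    def __init__(self, limit: int):
        self.limit = limit
        self.entries = deque()

    def send(self, entry) -> bool:
        if len(self.entries) >= self.limit:
            return False  # analogous to the storage-limit failure
        self.entries.append(entry)
        return True

    def receive(self):
        return self.entries.popleft() if self.entries else None


def send_with_backoff(dtaq: BoundedDataQueue, entry,
                      wait: Callable[[float], None],
                      max_attempts: int = 5, base_delay: float = 0.1) -> bool:
    """Retry a send that failed because the queue is full, waiting twice as
    long each time so the dequeuing job can drain entries. Returns False once
    max_attempts is exhausted; the caller should then stop producing."""
    delay = base_delay
    for _ in range(max_attempts):
        if dtaq.send(entry):
            return True
        wait(delay)
        delay *= 2
    return False


# Reader not running: nothing drains the queue, so the producer gives up.
idle_q = BoundedDataQueue(limit=3)
for i in range(3):
    idle_q.send(i)
waits = []
ok_idle = send_with_backoff(idle_q, "new", wait=waits.append)

# Reader running: each wait period lets it dequeue one entry.
busy_q = BoundedDataQueue(limit=3)
for i in range(3):
    busy_q.send(i)
ok_busy = send_with_backoff(busy_q, "new", wait=lambda _: busy_q.receive())

print(ok_idle, len(waits), ok_busy, list(busy_q.entries))  # False 5 True [1, 2, 'new']
```

The wait callback is injected so the sketch runs instantly; in real use it would be something like time.sleep. Giving up, instead of retrying forever, is what keeps a stalled reader from turning into a steady stream of failure messages in the JMQ.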
> My next question is does a job have more than one message q, one for
> the joblog and one for program messages?
Each job has only one Job Message Queue object, which is a composite
of all messages sent to any of the program message queues. The External
(*EXT) program message queue is effectively active until EOJ; while the
job is active there may be many inactive program message queues, and
every program on the stack at any one moment has an active program
message queue [that may be devoid of any messages]. For an ILE program,
it is probably acceptable to substitute /procedure/ for /program/,
because in many ways they are synonymous for how messages are sent and
received [via call stack entries].
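The one-JMQ, many-program-queues relationship can be pictured with a toy Python model, again not an IBM i API: the call stack supplies the active program message queues, *EXT stays at the bottom until the end of the job, queues of returned programs become inactive but keep their messages, and every send counts against the single JMQ total. The names JobModel, PGMA, PGMB, and USR000n are hypothetical, and the toy simply refuses a send when full, whereas the real outcome is governed by the job's JOBMSGQFL action.

```python
from typing import List, Optional, Tuple


class JobModel:
    """Toy model, not an IBM i API: one job message queue (JMQ) holding the
    messages sent to every program message queue in the job."""

    def __init__(self, jmq_capacity: int):
        self.jmq_capacity = jmq_capacity
        self.jmq: List[Tuple[str, str]] = []   # (program message queue, msg id)
        self.call_stack: List[str] = ["*EXT"]  # *EXT stays active until end of job

    def call(self, program: str) -> None:
        self.call_stack.append(program)        # a new active program message queue

    def return_from(self) -> None:
        # The returning entry's queue becomes inactive; its messages stay in the JMQ.
        if len(self.call_stack) > 1:
            self.call_stack.pop()

    def send_program_message(self, msg_id: str, to_queue: Optional[str] = None) -> None:
        # Whatever the target queue, the message counts against the one JMQ total.
        if len(self.jmq) >= self.jmq_capacity:
            raise OverflowError("job message queue full")
        target = to_queue if to_queue is not None else self.call_stack[-1]
        self.jmq.append((target, msg_id))

    def messages_for(self, queue: str) -> List[str]:
        return [mid for q, mid in self.jmq if q == queue]


job = JobModel(jmq_capacity=10)
job.call("PGMA")
for _ in range(4):
    job.send_program_message("USR0001")                   # to PGMA's own queue
job.call("PGMB")
for _ in range(4):
    job.send_program_message("USR0002", to_queue="*EXT")  # PGMB sends to *EXT
job.return_from()                                          # PGMB ends
job.send_program_message("USR0003")
job.send_program_message("USR0003")                        # JMQ now holds 10

try:
    job.send_program_message("USR0004")
    overflowed = False
except OverflowError:
    overflowed = True

print(len(job.messages_for("PGMA")), len(job.messages_for("*EXT")), overflowed)  # 6 4 True
```

Neither PGMA nor PGMB alone sent as many messages as the capacity, yet together they exhaust the one JMQ; a program that sends many messages to a program message queue contributes in exactly the same way.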
> <<SNIP ZASNMS subroutine calling Y2SNMGC>>
Presumably the called program invokes the Send Program Message
(QMHSNDPM) API, directly contributing to the total number of messages in
the JMQ and to the specific [active] program message queue [possibly
*EXTernal, or even its own] to which the message(s) would be sent.