Overflow in the activation mark counter -- MIDRANGE-L

I apologize for the length of this email in advance.

We got a surprise while testing our month-end process on our development
machine, which we recently upgraded to V5R1. The job failed with the
following error:

50    03/25/02   13:28:14   QWTPITPP       QSYS        0695     *EXT

  Message . . . . :   Job ended abnormally because of error code MCH3203.
  Cause . . . . . :   The job was ended by the system because an error was
    found, and the job was in a condition where the process default exception
    handler (QMHPDEH) could not be given control. The system will not give
    control to QMHPDEH when the job is in between the starting phase and the
    problem phase, or between the problem phase and the ending phase. Some of
    the more common error codes and their meanings follow:... MCH3203 - A
    machine function check occurred.  The most likely cause is that the
    program stack for a routing step was nested too deeply....

Our production machine running at V4R5 is handling this job just fine.
IBM's response so far is as follows (edited for length):

The error is due to overflow in the activation mark counter for the job
(each job has one such counter).  Within a job, each activation group,
program (re)activation and service program (re)activation uses an
activation mark which is an integer ID unique within that job.  As of V4R5
and V5R1, at the SLIC level the counter is an 8-byte integer, while at the
MI level only the 4 low-order bytes are used...
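
To make the 8-byte versus 4-byte mismatch concrete, here is a minimal
Python sketch. It is not IBM's code: the function names are hypothetical,
and it assumes the 4 low-order bytes are read as an unsigned value, which
the response above does not state. It only shows that once an 8-byte
counter passes what 4 bytes can hold, the low-order view no longer matches
the full value.

```python
# Illustrative model only: an 8-byte counter of which a higher layer sees
# just the 4 low-order bytes. Names are hypothetical, not IBM's.
MASK_32 = 0xFFFFFFFF  # selects the 4 low-order bytes


def low_order_4_bytes(counter_8_byte: int) -> int:
    """Return the 4 low-order bytes of an 8-byte counter (assumed unsigned)."""
    return counter_8_byte & MASK_32


def fits_in_4_bytes(counter_8_byte: int) -> bool:
    """True while the full counter and its 4-byte view still agree."""
    return counter_8_byte == low_order_4_bytes(counter_8_byte)


last_ok = 2**32 - 1   # largest value the 4-byte view can represent
first_bad = 2**32     # one more activation mark than that

print(low_order_4_bytes(last_ok), fits_in_4_bytes(last_ok))
print(low_order_4_bytes(first_bad), fits_in_4_bytes(first_bad))
```

If the 4-byte value is in fact treated as signed, the usable range would
be half of what this sketch assumes.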

This scenario will typically appear in very large, long-running jobs,
typically server jobs or batch jobs processing large amounts of data.
Suggestions on avoiding this problem revolve around reducing usage of
activation marks to ensure the counter does not overflow:
 o   Avoid running programs in activation group *NEW unless there is a
     functional reason to do so.
 o   Avoid deactivating/reactivating programs unnecessarily (for example,
     in ILE RPG turn the LR indicator on only when it is really necessary;
     otherwise, use RETURN).
 o   Split up data processing into multiple smaller jobs, if viable.
 o   End/restart server jobs as needed.
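
For a rough sense of how quickly a job could use up its activation marks,
here is a back-of-the-envelope Python sketch. Everything in it is an
assumption for illustration: the limit of 2**32 marks (it would be 2**31 if
the 4-byte value is signed), the activation rates, and the function name.
None of these figures come from IBM or from our job.

```python
# Rough estimate only: the limit and the rates below are assumptions.
SECONDS_PER_HOUR = 3600


def hours_until_exhaustion(marks_per_second: float, limit: int = 2**32) -> float:
    """Hours a job can run before using `limit` activation marks."""
    if marks_per_second <= 0:
        raise ValueError("marks_per_second must be positive")
    return limit / marks_per_second / SECONDS_PER_HOUR


# Hypothetical activation rates, e.g. a program called in a tight loop
# that ends with LR on (or runs in *NEW) on every call.
for rate in (1_000, 10_000, 100_000):
    print(f"{rate:>7} marks/s -> {hours_until_exhaustion(rate):8.1f} hours")
```

The point of the estimate is that the suggestions above all work by
lowering the rate (fewer activations per call) or by resetting the counter
(a new job starts from a fresh count).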

Am I to believe that, due to an UPGRADE of the operating system, a very
long-running process might blow up when it worked fine in the past?

Did I mention that this is an interactive job!  Look, I didn't write it, I'm
just here helping out.  If it matters:  The machine this works on is a
9406-820 at V4R5.  The machine it does not work on is a 9406-170 at V5R1.

I thought long-running, data-intensive processing was a strong suit of our
mighty AS/400....er..iSeries.  Has anybody else ever run into this?

David Smith
Advanced Information Solutions, Inc.
