RE: Condition handler and lock waits - Retry! Retry! -- RPG400-L

Hi Duane,
 
<snip>
To do a retry you need to be able to "rerun" the failing command. I looked
into this years ago and could not find anything within the exception
handling APIs which would do this. What I ended up doing was issuing a C
function set "setjmp" and "longjmp". Before each file I/O I would call the
setjmp and if a record lock occurred (CPF5027) the condition handler for the
procedure would catch that error, wait 10 seconds and issue a longjmp. If
this occurred on the same statement 3 times I would send a break message to
both the locker and locked users. This is not very elegant IMO but I have
never found anything better and I assume that IBM uses something similar to
do the retry.
</snip>
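
For reference, the setjmp/longjmp retry described in the snipped text could look roughly like the C sketch below. This is only an illustration under stated assumptions, not the original poster's code: read_record(), on_record_lock(), read_with_retry() and the simulated lock counter are hypothetical stand-ins. In a real job the system would call the registered condition handler on CPF5027, and the handler would wait before jumping back.

```c
#include <setjmp.h>

#define MAX_ATTEMPTS 3            /* escalate after 3 lock failures, as in the text */

static jmp_buf retry_point;       /* where the handler jumps back to */
static int locks_remaining;       /* simulation only: reads that will still hit a lock */

/* Hypothetical stand-in for the condition handler. The real handler
   would be invoked by the system for CPF5027 and would wait (the text
   says 10 seconds) before issuing the longjmp. */
static void on_record_lock(void)
{
    longjmp(retry_point, 1);      /* never returns to the failing I/O */
}

/* Hypothetical stand-in for the file I/O that may hit a record lock. */
static int read_record(void)
{
    if (locks_remaining > 0) {
        locks_remaining--;
        on_record_lock();
    }
    return 0;
}

/* Returns the number of retries that were needed, or -1 once the lock
   has been hit MAX_ATTEMPTS times (the point at which the text sends a
   break message to the locking and locked users). */
int read_with_retry(int simulated_locks)
{
    /* volatile: modified between setjmp and longjmp, so it must keep
       its value when control comes back through setjmp */
    volatile int attempts = 0;

    locks_remaining = simulated_locks;

    if (setjmp(retry_point) != 0) {
        /* we arrived here via longjmp from the handler */
        attempts++;
        if (attempts >= MAX_ATTEMPTS) {
            return -1;            /* give up and escalate */
        }
    }

    read_record();                /* the statement being "rerun" */
    return attempts;
}
```

With two simulated locks the read succeeds on the third attempt and read_with_retry(2) returns 2. With three or more it returns -1, which is where the break message would be sent.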
 
Hmmm... The thing that gets me is the fact that the system happily allows
the failing command to be "rerun". If I ignore the RNX1218 *ESCAPE message
and resume at the next processing instruction (action code 10), it gets
"handled" and a RNX1218 *INQ message is issued - this has a (R)etry option.
If the user replies with R then the system attempts to allocate the record
again. So the code is in there, it is a case of forcing it to run without
displaying that screen to the user.
 
The way our system works for interactive jobs is the following:
 
1) The code running within the user's job receives an exception for such
things as divide-by-zero, array index error, substring error, etc...
2) The condition manager calls our condition handler.
3) The condition handler logs the exception details to a log file and
performs some tasks (determined by exception ID, job name, etc...)
4) Optionally, the job will display a screen which informs the user of the
error, instructs them what to say to a customer on the telephone, and asks
them to call helpdesk. (This screen can put the caller on hold, or hang up
the call). The user cannot proceed until helpdesk gives them a unique
pass-code.
5) Helpdesk have a screen with all outstanding (logged) exceptions. They
can look at the problem or pass it to a developer if they can't solve it.
Once an appropriate course of action is decided, helpdesk keys in the
response and a unique pass-code is generated. This code is passed to the
user to unlock the screen.
6) When the user keys in the pass-code provided by helpdesk then the
response set by helpdesk is issued to the job. Note - this is not a reply to
the message. The responses are combinations of Resume, Percolate, Promote,
Move Resume Point, End Job, End Activation Group (CEETREC) etc...
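
To make the flow in steps 3 to 6 concrete, here is a rough C sketch of the pass-code gating. It is only a model of the idea, not our actual code: the type and function names are invented for illustration, and the pass-code is simply passed in as a parameter because how it is generated is not described above.

```c
#include <string.h>

/* The responses helpdesk can choose from (names are illustrative). */
typedef enum {
    RESP_NONE,                    /* nothing decided yet, screen stays locked */
    RESP_RESUME,
    RESP_PERCOLATE,
    RESP_PROMOTE,
    RESP_MOVE_RESUME_POINT,
    RESP_END_JOB,
    RESP_END_ACTIVATION_GROUP
} response_t;

/* One logged exception as helpdesk would see it (hypothetical layout). */
typedef struct {
    char       exception_id[8];   /* 7-character message ID plus terminator */
    response_t response;          /* set by helpdesk in step 5 */
    unsigned   pass_code;         /* 0 until helpdesk has decided */
} logged_exception_t;

/* Step 3: the condition handler logs the exception. */
void log_exception(logged_exception_t *e, const char *id)
{
    strncpy(e->exception_id, id, sizeof e->exception_id - 1);
    e->exception_id[sizeof e->exception_id - 1] = '\0';
    e->response  = RESP_NONE;
    e->pass_code = 0;
}

/* Step 5: helpdesk keys in the response; the code is assumed to be
   non-zero and generated elsewhere. */
void helpdesk_decide(logged_exception_t *e, response_t r, unsigned code)
{
    e->response  = r;
    e->pass_code = code;
}

/* Step 6: the user keys in a code. Only the matching code releases the
   response chosen by helpdesk; anything else leaves the screen locked. */
response_t user_unlock(const logged_exception_t *e, unsigned entered)
{
    if (e->pass_code != 0 && entered == e->pass_code) {
        return e->response;
    }
    return RESP_NONE;
}
```

The point of the design is that the user never chooses the response. They only supply the code that releases whatever helpdesk has already chosen.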
 
Now, this is fine for such things as divide by zero because a developer can
service and debug the job, correct the value and manually perform the
division. With a response of "Resume at next instruction" you can get past
that problem without bombing the job. If the problem is fatal to the
application, it is possible to "Move the resume point to after the call to
the top-level program" followed by "Resume at next instruction". This will
pass control to the program which called your top-level program and resume
after the call. Alternatively, you can perform a normal end of the
activation group. These options are excellent when you have stand-alone
independent update sections within a job. If one bangs you can continue and
complete the rest. The problem section can be fixed-up and re-run.
 
But all of these options are simply combinations of moving the resume cursor
and continuing, or cancelling back to a point and continuing. Why is the
retry option offered to the person least qualified to decide whether to take
it? :-)
          
There must be a way.
 
Cheers
 
Larry Ducie