|
-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Roger Harman
Sent: Wednesday, September 27, 2006 6:58 PM
To: midrange-l@xxxxxxxxxxxx
Subject: RE: Pause technique

Are you saying that you never have failures? If so, I'd sure like to live on your planet <grin>.
Of course not.
Some processes (particularly when dealing with imported data) have potential weak spots and they need to be dealt with. I'd sure rather deal with the possibility of failure proactively than wait for an abend and have to figure out what needs to be unwound to start over.
Cleaning up imported data prior to processing it through your system is a separate issue and not what I'm talking about here. We are talking about processing after the data has been accepted into your application system.
We have a critical nightly job with about 50 distinct steps, one of which has about 15 sub-steps. The fact that we've planned for checkpoint restart capability doesn't imply to me that there is a design flaw. Rather, it implies that we've studied the issue and programmed defensively to deal with real world conditions.
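For anyone following along who hasn't built one of these, the driver CL for that kind of checkpoint restart can be as simple as recording the last completed step in a data area and jumping past it on a restart. This is only a rough sketch; the library, data area, and step/program names are all made up for illustration, and a real job would have far more steps:

   PGM
      DCL        VAR(&LASTSTEP) TYPE(*CHAR) LEN(10)

      /* Data area NIGHTCHK (a 10-character *CHAR, created once with  */
      /* CRTDTAARA) holds the name of the last step that completed.   */
      RTVDTAARA  DTAARA(APPLIB/NIGHTCHK) RTNVAR(&LASTSTEP)

      /* On a restart, jump past the steps that already completed.    */
      IF         COND(&LASTSTEP *EQ 'STEP010') THEN(GOTO CMDLBL(STEP020))
      IF         COND(&LASTSTEP *EQ 'STEP020') THEN(GOTO CMDLBL(STEP030))

   STEP010:   CALL       PGM(APPLIB/NIGHT010)
              CHGDTAARA  DTAARA(APPLIB/NIGHTCHK) VALUE('STEP010')

   STEP020:   CALL       PGM(APPLIB/NIGHT020)
              CHGDTAARA  DTAARA(APPLIB/NIGHTCHK) VALUE('STEP020')

   STEP030:   CALL       PGM(APPLIB/NIGHT030)
              /* The whole job finished, so clear the checkpoint.     */
              CHGDTAARA  DTAARA(APPLIB/NIGHTCHK) VALUE(' ')
   ENDPGM

Because the data area survives the job ending abnormally, the next submission picks up at the first step that never completed instead of rerunning everything.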
Let me ask you this: How many times have you had to restart that job this month/year? How many times has the same basic problem caused a restart? How many times have you said to yourself, "I could make a change to prevent the problem above," but not yet found the time to make the change? How do you determine that a checkpoint has been reached successfully? When a checkpoint fails, how do you fix the problem so that you can restart?

If you answered zero, never, never, automatically, and "it fixes itself," then congratulations; I'd say your shop is an exception. Most shops with an easily restartable and/or checkpointed multi-step job validate the checkpoints manually and end up using that restartability regularly, after fixing the data with DFU/SQL/DBU. That is what I mean when I say failure is treated as an expected and acceptable occurrence. Programming in such a manner isn't defensive in my mind; it's defective. The business process of validating the checkpoints, fixing the errors, and restarting merely limits the effects of the defective programming.

At the place I used to work, I too had a critical nightly job. Did I ever have a problem that required me to restart it? Yep, sure did, and let me tell you, it was a major PITA to do so. But in the seven years I was there it probably only had to be done three or four times, mostly early on when the system was new. Any problem that required a restart was tracked down and prevented from happening again at the source.
I assume you do range checks on input data or chain to master files to maintain data integrity? By doing so, are you not also assuming that failure (i.e., erroneous input data) is an expected and acceptable occurrence? Of course you are, and you're programming defensively to catch and correct those failures. I assume you use the MONMSG command in CL programs? Another example of defensive programming.
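For the archives, a bare-bones illustration of the kind of defensive CL being described, with a program-level MONMSG as a safety net and a command-level MONMSG for a condition the program expects. The file, library, and program names and the message text are invented for the example; a real program would monitor for the specific escapes it expects:

   PGM
      /* Program-level monitor: any CPF escape not handled below      */
      /* branches to the error handler instead of ending in a         */
      /* function check.                                              */
      MONMSG     MSGID(CPF0000) EXEC(GOTO CMDLBL(ERROR))

      /* An empty import file is expected now and then, so trap the   */
      /* CPF2817 escape from CPYF (copy ended because of error, e.g.  */
      /* nothing to copy) and carry on.                               */
      CPYF       FROMFILE(IMPLIB/ORDERSIN) TOFILE(APPLIB/ORDERSWRK) +
                   MBROPT(*REPLACE)
      MONMSG     MSGID(CPF2817) EXEC(DO)
         SNDPGMMSG  MSG('No orders to import tonight - post skipped')
         GOTO       CMDLBL(DONE)
      ENDDO

      CALL       PGM(APPLIB/POSTORDERS)
      GOTO       CMDLBL(DONE)

   ERROR:     SNDPGMMSG  MSGID(CPF9898) MSGF(QCPFMSG) +
                   MSGDTA('Nightly import failed - see the job log') +
                   MSGTYPE(*ESCAPE)

   DONE:      RETURN
   ENDPGM

The point of the two monitors is the difference between an expected condition you handle and keep going, and an unexpected one you surface to the caller immediately.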
I agree that you have to program defensively, and yes, my programs monitor for failures. I also validate data on input, but I'd argue that validation on input is offensive programming, and as the saying goes, the best defense is a good offense. Once the data is in your application system, it had better be good. Program defensively, but if your defenses are getting hit often, you need a better offense.

Charles Wilt
--
iSeries Systems Administrator / Developer
Mitsubishi Electric Automotive America
ph: 513-573-4343
fax: 513-398-1121