You might want to start a document similar to ours.
We have a document organized by descriptions of what might go wrong, each
followed by step-by-step instructions on how to resolve it. This is more
application specific than system specific.
The same document also has sections explaining the various kinds of
underlying causes, and how to inspect system indicators to see whether we are
at risk of such events happening again soon.
Such as how you can tell the biggest # supported by various application
files ... orders for example ... how soon they will hit the 999,999
ceiling and go back to 1, what that does to reports that list in order
sequence, and what happens if we have open history on, say, order 1234 and a
brand new unrelated order 1234 shows up in the system.
We are often asked to reset the order # to 1 because small numbers mean less
keying than big ones & this mounts up, but if we do this too soon, there
can be the overlap problem, so the people doing the reset need to know how
to calculate whether it will be safe to do.
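The rollover and reset questions above come down to simple arithmetic. Here is a rough sketch in Python, not anything from our actual document. The function names, the orders-per-day rate, and the retention period are all assumptions for illustration, and the safety check assumes every old order and its history is purged within the retention period after the reset.

```python
ORDER_CEILING = 999_999  # ceiling mentioned above; adjust to your own file's field size


def days_until_rollover(current_number, orders_per_day, ceiling=ORDER_CEILING):
    """Estimate how many days remain before order numbering wraps back to 1."""
    if orders_per_day <= 0:
        raise ValueError("orders_per_day must be positive")
    return (ceiling - current_number) / orders_per_day


def reset_is_safe(lowest_number_on_file, orders_per_day, retention_days):
    """Rough overlap check for resetting the order # to 1.

    Assumption: all old orders and history are purged within retention_days
    of the reset. The reset is treated as safe only if the numbers issued
    during that period stay below the lowest old number still on file.
    """
    numbers_issued = orders_per_day * retention_days
    return numbers_issued < lowest_number_on_file


if __name__ == "__main__":
    # Hypothetical figures: currently at order 990,000, about 100 orders a day.
    print(days_until_rollover(990_000, 100))
    # Hypothetical figures: order 1234 still has open history, 50 orders a day,
    # history kept 30 days. New numbering would reach 1234 before the purge.
    print(reset_is_safe(1234, 50, 30))
```

The real calculation depends on how your application purges closed orders and history, so treat the retention figure as something to verify, not a given.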
Such as messages to the system log about files needing to grow in size ...
check those to see if we are approaching the ceiling supported.
Such as how big a report can be before it runs out of spool space ... check
our biggest reports to see if their growth is approaching that threshold.
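The same kind of check works for file sizes and report sizes: note the size on a couple of dates and project when it reaches the ceiling. A minimal sketch in Python, assuming straight-line growth between the first and last reading. The function name, the sample figures, and the ceiling are made up for illustration; the units can be records, pages, or whatever your limit is stated in.

```python
def days_until_ceiling(samples, ceiling):
    """Project how many days remain before a size reaches its ceiling.

    samples: list of (day_number, size) readings, oldest first.
    Returns None if the size is not growing, 0.0 if already at the ceiling.
    Assumes straight-line growth between the first and last reading.
    """
    (first_day, first_size), (last_day, last_size) = samples[0], samples[-1]
    elapsed = last_day - first_day
    if elapsed <= 0:
        raise ValueError("need readings taken on at least two different days")
    growth_per_day = (last_size - first_size) / elapsed
    if growth_per_day <= 0:
        return None  # flat or shrinking, no projected date
    return max(0.0, (ceiling - last_size) / growth_per_day)


if __name__ == "__main__":
    # Hypothetical report: 8,000 pages on day 0, 8,600 pages on day 30,
    # with an assumed limit of 10,000 pages.
    print(days_until_ceiling([(0, 8000), (30, 8600)], 10_000))
```

Month-end and year-end reports tend to jump rather than grow smoothly, so compare like with like when picking the readings.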
Exploring any one of those "such as" can be an eye-opening education.
Any time we call tech support on a new problem, time can be wasted
communicating exactly what happened & us learning how to recognize what went
wrong & how to fix it. There is also the problem of end users recognizing
that something is a problem. Thus after we deal with each new problem, it
gets added to the HOW TO document.
At an earlier employer, on a system other than IBM i/400, a common refrain
in calls to me for help was "there is garbage on my screen", so I created a
little booklet, labeled "garbage on my screen". It consisted of screen shots
on the left side page, showing examples of garbage (typically an error
message), and on the right side were step-by-step instructions on what to do
to resolve it.
Our auditors recently left. One thing they asked for, and I gave them, was
my check list of fiscal duties.
There are also naming conventions for OUTQ, programs, and a great spectrum of
system ingredients. Will your operators be setting up security for new
users, configuring new connections?
-
Al Mac
-----Original Message-----
<rob@xxxxxxxxx> wrote in message
news:mailman.11585.1237898882.26163.midrange-l@xxxxxxxxxxxxxxx
Create a daily, weekly, and monthly runbook. Documented well enough that
someone can step in for them when they are on vacation, etc.
Just finished doing that - now I've got to train them how to manage the
system and respond to problems.