Thanks for the list of things for me to look into further. As it turned out, the local tornado seems to have disrupted Internet traffic, so the discussion list did not get to see my post about the problem until long after I had fixed the immediate problem.
Lessons learned
===============

The tornado struck Evansville a few miles from where we are located. I store off-site backups in 2 different places. I am now thinking they should be in waterproof baggies.
It was the disk SPACE that got wiped out, NOT the disk CONTENTS. I did not know that initially.
It is my understanding that an IPL includes recovering the disk space of deleted spool files. We did another IPL in the wee hours of Monday, after disk space was back to normal. However, I think there is a system value (I have to check which one) that controls how many days' worth of recovery ... so it might be smart, on some evening during the week when the night clerk finishes early, to use GO POWER to schedule another wee-hours IPL.
I am now certain the runaway job did it, although in my scramble to kill, kill, kill anything I could, I also lost the details of what precisely went wrong in that program.
I reran that job and it ran fine, so I now suspect that some of the other stuff I was running Saturday needs to go on the list of things not to run at the same time. I also need to study how to put safeties on jobs to make runaways less likely.
I have security auditing running, but I rarely remember to check what info it has collected ... I think I ought to put the command for that on one of my menus, as a reminder to do it more often.
Another disk space management issue
===================================

We have files that "grow" to grab more disk space as needed. This means that when I delete a lot of ancient records, the files are still "reserving" growth space that may be excessive, so I need to downsize some of them without losing the growth support ... that is for reasons of using disk space wisely. There is also a performance issue: spotting files that are nearing their next growth step, and perhaps upsizing them before that happens in the middle of the work day. I expect the answer is a query over an *OUTFILE, looking for files at either extreme: not much growth left, or excessive growth space.
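A rough sketch of the classification such a query might perform, written in Python rather than as an actual query. It assumes the *OUTFILE rows have already been read into records with the hypothetical fields below (initial record capacity, growth increment, maximum increments, increments used, current record count). These are not the real outfile field names, and the 5% and 50% thresholds and the FILE_A..FILE_D names are arbitrary placeholders.

```python
from dataclasses import dataclass


@dataclass
class FileSize:
    # Hypothetical fields describing one file's size settings and current fill.
    name: str
    initial: int          # records the file was initially sized for
    increment: int        # records added per automatic growth step
    max_increments: int   # how many growth steps are allowed
    increments_used: int  # growth steps already taken
    records: int          # records currently occupying space

    def capacity(self) -> int:
        # Current capacity = initial size plus every growth step taken so far.
        return self.initial + self.increment * self.increments_used

    def headroom(self) -> int:
        # Record slots still free before the next growth step is needed.
        return self.capacity() - self.records


def classify(f: FileSize, low: float = 0.05, high: float = 0.5) -> str:
    """Flag files at either extreme of growth space (thresholds are assumptions)."""
    cap = f.capacity()
    free_fraction = f.headroom() / cap if cap > 0 else 0.0
    if free_fraction < low:
        # Little room left: distinguish "can still grow" from "out of increments".
        if f.increments_used >= f.max_increments:
            return "at-limit"
        return "near-growth"
    if free_fraction > high:
        return "oversized"
    return "ok"


if __name__ == "__main__":
    files = [
        FileSize("FILE_A", 10000, 1000, 3, 3, 12900),
        FileSize("FILE_B", 10000, 1000, 3, 1, 10950),
        FileSize("FILE_C", 50000, 5000, 10, 6, 20000),
        FileSize("FILE_D", 10000, 1000, 3, 0, 7000),
    ]
    for f in files:
        print(f.name, classify(f), f.headroom())
```

Splitting the low-headroom case in two is only a design choice: a file that has used up all its allowed increments presumably needs attention sooner than one that can still extend itself, while "oversized" files are the candidates for downsizing after a purge.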
Backups ... why so infrequent
=============================

Since we moved to BPCS, we have historically had to recover stuff from backups perhaps once every 3-4 months. In most cases the recoveries do not impact most end users, so they do not witness the rate of recovery ... what typically happens is that one person deletes a query definition they think no one is using, then a month later someone else tries to run it and it bombs.
When we were on MAPICS we had to go back to the last backup, on average, a couple of times a week. When bad storms came through the area, we had to do so several times a day. I would ask management to have everyone stay off the system until the storm passed, but they never thought it necessary, because it was clerical people further down the food chain who had to rekey all their work several times.
Until the move to other offices, we used to run backups/400 almost every weeknight. A lot of people leave their workstations signed on in the middle of some update ... different people on different nights ... and if I kill their sessions, that crashes whatever they were updating, leaving other stuff to fix, so I figured a backup every 2nd or 3rd night was probably tolerable. I opted not to move my residence to the city of the new AS/400 site, and asked about people at that site who could perhaps run the kind of backup I had been doing.
Unfortunately we have people who need to be on the system until the wee hours. When I was doing backups where the AS/400 was located, I could wait until the people needing to do updates were all off the system (the last one at 2 am), then force restricted state and do a full backup. But in the current reality, I can only kick people off and start a backup that is not in restricted state; then people sign on again, and various critical files that their inquiries are accessing do not get into the backup.
The morning crew comes on at 5 am, so if I am going to do a nightly backup, I have to start it before 3 am, and have some way to keep the night-crew inquiry users off the system during that time frame. I do not have the political clout in the company food chain to get that. It is partly a matter of user education ... the users are accustomed to signing on and accessing the system whenever they please. This is one of the reasons why I have to visit the site when doing end-of-fiscal processing. I have to get a complete backup (I like to get two of them, one before and one after the fiscal update jobs), and I have to get restricted state when running the end-of-fiscal updates.
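The timing constraint above is simple window arithmetic, sketched here in Python. The 2-hour backup duration is only what the 3 am / 5 am figures imply, not a measured backup time, and the safety margin parameter and the calendar date in the example are hypothetical.

```python
from datetime import datetime, timedelta


def latest_start(deadline: datetime, duration: timedelta,
                 margin: timedelta = timedelta(0)) -> datetime:
    """Latest start time that still finishes (plus a safety margin) by the deadline."""
    return deadline - duration - margin


def fits_window(start: datetime, duration: timedelta, deadline: datetime,
                margin: timedelta = timedelta(0)) -> bool:
    """True if a backup starting at `start` ends, with margin, no later than `deadline`."""
    return start + duration + margin <= deadline


if __name__ == "__main__":
    # Arbitrary date; only the times matter. 5 am = morning crew arrives.
    morning_crew = datetime(2000, 1, 1, 5, 0)
    assumed_duration = timedelta(hours=2)  # assumption implied by "start before 3 am"
    print(latest_start(morning_crew, assumed_duration))
    # Last update user off at 2 am, as in the old routine.
    print(fits_window(datetime(2000, 1, 1, 2, 0), assumed_duration, morning_crew))
```

Adding a nonzero margin moves the latest start earlier, which shrinks the already narrow window between the last night-crew user leaving and the backup having to begin.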
One of the people at the AS/400 site changes the backup tape media for me, and runs the cleaning tape as needed.
- Al Macintyre http://www.ryze.com/go/Al9Mac BPCS/400 Computer Janitor ... see http://radio.weblogs.com/0107846/stories/2002/11/08/bpcsDocSources.html
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.