I've been working with the AS/400/iSeries/i5 for around 12 years now, and have administered at least one of them at various jobs for about 8 years. In that time I'd seen only one true crash where the machine went down hard with no advance warning. Until now...

Monday night, while Domino was going down for an automated backup, we lost a 90MB RAID controller card that controls our second set of four RAIDed drives. We switched our Domino users over to the clustered server and our WAS apps over to the other box as well (a 15-20 minute task). We found the box down the next morning and spent the morning diagnosing the actual failure with the DASD group in Rochester. The afternoon was spent trying to reseat the card, and then removing and reinstalling it after letting it sit outside the machine for a while, but this particular card never became visible to the server again.

The CE showed up with the couriered replacement at around 6pm, and we spent the next 5 hours on the phone with Rochester trying several other tricks to get the lost cache back - no dice (48800 'blocks' was the statistic for what was lost in the cache). We got the word at around 10:30pm that a reload/restore would be necessary.

Yesterday was spent doing the restore and testing the basics of WAS and Domino. I plugged the Ethernet cable back into the network at around 6:45pm last night, and Domino had replicated back to normal (from last Friday's backup through Wednesday evening's current activity) by 7:30pm. WAS and Domino both worked like a charm after restoring an Option 21 save from last month and then a full IFS save from last Friday night over top of the Option 21 restore.

This is one heck of a box, and for the most part IBM does one heck of a job supporting it. Also key in such an 'easy' (although fretful) recovery was the fact that we dutifully switch these apps to the backup box from the afternoon until the next morning each month. Each month the redundancy is tested, and it keeps us in good practice for what's required to get things moved over to the backup box (and then moved back to the main) - in short, we KNOW we can trust our backup server and we are comfortable doing the switch. Even if you know your redundancy works, if at all possible it's very worthwhile to keep testing and practicing it!
This mailing list archive is Copyright 1997-2014 by MIDRANGE dot COM and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.