-----Original Message-----
From: James Rich
Sent: Monday, June 30, 2014 13:41
Subject: RE: Failed disk in a 9406-170

On Sat, 28 Jun 2014, Porterfield, Sean wrote:

You may have already solved this by now, but the Backup and Recovery
manual has that sort of thing.
See PDF page 86.
Sean Porterfield

I've been reading exactly that manual. As I look at the tables on that page it
looks to me like I need to use checklist 15 as that seems to match my situation
of device parity protection active and no data loss.

Those checklists are so much fun...

Unfortunately that checklist says that tasks 1 and 2 are done by the service
representative, which won't be coming in our case since there is no
maintenance agreement. So I need to do task 1: attach the new disk unit
and task 2: rebuild the failed device parity disk unit data. I've been reading
the section on device parity protection and so far haven't found anything
on rebuilding the failed device parity disk unit data.

You are the service representative in this case.

I'm not sure what all I can get into on my system, since our disk is now mirrored. Related to Parity, I see from STRSST:
3. Work with disk units
2. Work with disk configuration
8. Work with device parity protection

I'm not sure you even have to do that, though. It's been a long time since I played with the hardware. I know I've had IBM replace disks, and I know I've done it myself. I've never had a problem with it, but that is not a guarantee that you won't. (See other post in this thread regarding FAILING vs FAILED.)

On the Concurrent maintenance topic:
1. Start a service tool
7. Hardware service manager
1. Packaging hardware resources (systems, frames, cards,...)
* I had to use option 9 to drill down to disks
3=Concurrent maintenance
8=Associated logical resource(s)

Note: I do not recall ever doing this in this fashion, so it may not be appropriate. I recall keying a disk resource name in a field and selecting Add or Remove (or similar) in a selection field.

Ah, yes. From the 7. Hardware service manager option, 8. Device Concurrent Maintenance. I then keyed the Device Resource Name, Action, Time delay.

I have found this midrange post interesting:

That sounds about right.

It mentions concurrent maintenance, but I haven't found what that is in any
manuals yet. Do I leave the failing disk unit in the machine and add the new
disk, or do I remove the failing unit from DPY, then remove it from the
machine, then add the new disk, then add the new disk to DPY?

Concurrent maintenance means the ability to replace the disk with the system up and running. I didn't find details of that in my System Builder guide that includes the 170. I know some models were "no concurrent maintenance", some were "concurrent maintenance with these disk controllers", and some (probably) were "concurrent maintenance is always possible." Unfortunately, I don't recall which category the 170 is in.

The "Device Parity Protection-Overview" section in the B&R manual says,
Device parity protection provides the following:
... Concurrent maintenance for single disk failures

The archive post you reference basically says to go into SST (or DST from a manual IPL if not doing concurrent maintenance) and tell the system you want to remove a disk. It should have a value for how long to wait, then it flashes the light on the drive. You remove the failing disk. Then you go back in and tell it you want to add a disk, from a very similar screen. Stage the new disk in the slot, but do not fully insert it before selecting the add option. Then when the light flashes, you push the new disk in.
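
The swap sequence described above can be written down as an ordered checklist. This is only an illustrative Python sketch: the step wording is paraphrased from the description above, and the names SWAP_STEPS and next_step are hypothetical helpers, not SST options or IBM commands.

```python
# Hypothetical checklist for the concurrent-maintenance disk swap described above.
# The step text is paraphrased from the post; none of these are real SST commands.
SWAP_STEPS = [
    "Select the remove action for the failing disk and set a time delay",
    "Wait for the light on the drive to flash",
    "Pull the failing disk",
    "Stage the new disk in the slot without fully inserting it",
    "Select the add action on the similar screen",
    "Wait for the light to flash again",
    "Fully insert the new disk",
]


def next_step(completed):
    """Return the next step given how many steps are done, or None when finished."""
    if completed < 0:
        raise ValueError("completed must be non-negative")
    return SWAP_STEPS[completed] if completed < len(SWAP_STEPS) else None


if __name__ == "__main__":
    # Print the checklist in order so it can be followed at the console.
    for number, step in enumerate(SWAP_STEPS, start=1):
        print(f"{number}. {step}")
```

The point of keeping it as a fixed list is the ordering: the new disk is staged before the add action is selected, and only pushed in once the light flashes.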

My comments are all out of order now, but the option 8 concurrent maintenance is what I recall using to replace disks.
Sean Porterfield


