MIDRANGE dot COM Mailing List Archive



Home » MIDRANGE-L » June 2014

RE: BRMS DUP failure CPF5349, CPP630C - 3573 LTO5 HH FC Drive failure




For the archives.
We had a repeat of this issue last night.
The tape failed to unload.
Tape support confirmed this issue (2EZ0C) from the drive dump.
The read/write head gets stuck, which prevents the tape from unloading.
Drive4 is being replaced tomorrow.
The latest LTO5 HH FC firmware, D8D9, fixed some, but not all, issues.
Tape support is looking at a possible EC change.

Paul

-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Steinmetz, Paul
Sent: Thursday, May 15, 2014 9:57 AM
To: 'Midrange Systems Technical Discussion'
Subject: RE: BRMS DUP failure CPF5349, CPP630C - 3573 LTO5 HH FC Drive failure

Just completed updating the firmware from D8D5 to D8D9 using the ITDT utility.
Still waiting for an update from IBM on why BRMS selected the failed drive and would not use another drive.

IBM Tape Diagnostic Tool Standard Edition - Device List

       Device              Model          Serial       Ucode  Changer    [#]
+----+-------------------+--------------+------------+------+------------+-+
|  0 | TAP01             | ULT3580-HH5  | 1068040782 | D8D9 |            | |
|  1 | TAP02             | ULT3580-HH5  | 1068076308 | D8D9 |            | |
|  2 | TAP03             | ULT3580-HH5  | 1068040373 | D8D9 |            | |
|  3 | TAP04             | ULT3580-HH5  | 1068040785 | D8D9 |            | |
|  4 |                   |              |            |      |            | |
+----+-------------------+--------------+------------+------+------------+-+
[S] Scan [T] Test [D] Dump [F] Firmware Update
[E] Encryption [W] Full Write [U] Tape Usage [O] Other...
[H] Help
<[Q] Quit | + | - | [N] Next | [P] Previous | Line # | Command >
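
For anyone who wants to check a listing like the one above without reading it by eye, here is a minimal Python sketch. It assumes the device list has been saved as plain pipe-delimited text; the helper names `parse_device_list` and `drives_not_at` are made up for illustration and are not part of ITDT. The rows are copied from the listing above.

```python
# Sketch only: parse pipe-delimited device-list rows and report any drive
# whose microcode level differs from the expected one.
# parse_device_list and drives_not_at are hypothetical helpers, not ITDT features.

DEVICE_LIST = """\
| 0 | TAP01 | ULT3580-HH5 | 1068040782 | D8D9 | | |
| 1 | TAP02 | ULT3580-HH5 | 1068076308 | D8D9 | | |
| 2 | TAP03 | ULT3580-HH5 | 1068040373 | D8D9 | | |
| 3 | TAP04 | ULT3580-HH5 | 1068040785 | D8D9 | | |
| 4 | | | | | | |
"""


def parse_device_list(text):
    """Return one dict per populated row; border and empty rows are skipped."""
    devices = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border or menu line
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 5 or not cells[1]:
            continue  # row has no device name
        devices.append({
            "index": int(cells[0]),
            "device": cells[1],
            "model": cells[2],
            "serial": cells[3],
            "ucode": cells[4],
        })
    return devices


def drives_not_at(devices, expected_ucode):
    """Return the device names whose microcode is not expected_ucode."""
    return [d["device"] for d in devices if d["ucode"] != expected_ucode]


if __name__ == "__main__":
    found = parse_device_list(DEVICE_LIST)
    print("Drives not at D8D9:", drives_not_at(found, "D8D9"))
```

With the four rows above, no drive is reported as being off D8D9.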

Paul

From: rob@xxxxxxxxx [mailto:rob@xxxxxxxxx]
Sent: Wednesday, May 14, 2014 7:50 AM
To: Steinmetz, Paul
Subject: RE: BRMS DUP failure CPF5349, CPP630C - 3573 LTO5 HH FC Drive failure

Traditionally I call IBM and have them come in and fix the drive.

However, if the concern is that they can't truck it in from three states away until after hours, and you have enough drives and can wait until tomorrow, then there's some merit to that solution.
Look at it this way: if you have multiple LPARs accessing the library at once (not uncommon), then BRMS already handles 'drive busy' situations and picks a different drive in that library. So using allocation to flag a drive should work the same way.
Try allocation on all but one drive on WRKMLBSTS. Then run a WRKMEDBRM and use 10=Reinitialize on some expired tape. Using the GUI on the tape library, you should see it load into that drive.
Not sure how to match up the drive in the library GUI with WRKMLBSTS? Try the serial number.
WRKHDWRSC TYPE(*STG)
9=Work with resource (to see the individual drives)
7=Display resource detail (to see the serial numbers on each drive)

If you're really anal you can go into STRSST, look up these resource names, and make sure each resource name on each LPAR matches for each serial number. That way the errors each one flags will all point to the same drive, instead of one calling it TAP02, another calling it TAP08, etc.
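
The serial-number cross-check described above can be sketched in a few lines of Python. This is only an illustration: it assumes you have already collected a resource-name-to-serial mapping from each LPAR by hand (for example from the 7=Display resource detail screens). The LPAR labels, the `find_name_mismatches` helper, and the TAP08 entry are hypothetical; the serial numbers are the ones shown in the ITDT listing elsewhere in this thread.

```python
# Sketch only: flag drives (by serial number) that are known under
# different resource names on different LPARs.
# find_name_mismatches is a hypothetical helper, not an IBM i API.


def find_name_mismatches(lpar_maps):
    """lpar_maps: {lpar: {resource_name: serial}}.

    Returns {serial: {lpar: resource_name}} for every serial whose
    resource name is not the same on all LPARs that report it.
    """
    by_serial = {}
    for lpar, resources in lpar_maps.items():
        for name, serial in resources.items():
            by_serial.setdefault(serial, {})[lpar] = name
    return {
        serial: names
        for serial, names in by_serial.items()
        if len(set(names.values())) > 1
    }


if __name__ == "__main__":
    # Which LPAR sees which names is an assumption made up for this example.
    lpar_maps = {
        "LPAR_A": {"TAP01": "1068040782", "TAP02": "1068076308",
                   "TAP03": "1068040373", "TAP04": "1068040785"},
        "LPAR_B": {"TAP01": "1068040782", "TAP08": "1068076308",
                   "TAP03": "1068040373", "TAP04": "1068040785"},
    }
    for serial, names in find_name_mismatches(lpar_maps).items():
        print(serial, names)
```

Any serial that comes back from the check is a drive whose resource name would need to be lined up across the LPARs.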

Rob Berendt
--
IBM Certified System Administrator - IBM i 6.1
Group Dekko
Dept 1600
Mail to: 2505 Dekko Drive
         Garrett, IN 46738
Ship to: Dock 108
         6928N 400E
         Kendallville, IN 46755
http://www.dekko.com/





From: "Steinmetz, Paul" <PSteinmetz@xxxxxxxxxx>
To: "'Midrange Systems Technical Discussion'" <midrange-l@xxxxxxxxxxxx>
Date: 05/13/2014 04:13 PM
Subject: RE: BRMS DUP failure CPF5349, CPP630C - 3573 LTO5 HH FC Drive failure
Sent by: "MIDRANGE-L" <midrange-l-bounces@xxxxxxxxxxxx>
________________________________



I have a question on handling tape errors.
LPAR A initially experienced the error on Drive4.
24 hours later, LPAR B attempted to load a different volume in Drive4, but Drive4 was still in an error state from the previous LPAR A error.
IBM suggested possibly using 6=Deallocate resource (in this case, Drive4) on LPAR B from LPAR A.
This would have forced BRMS to select a different drive, and the process on LPAR B could have continued without error.
I have never done much with allocating/deallocating resources. Has anyone worked with this in BRMS following an error condition?
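
To make the idea above concrete, here is a toy Python model of the scenario. It is not BRMS's actual selection logic, which this thread does not describe; it only assumes that a selector skips drives that are deallocated or busy and otherwise takes the first one in the list. The `Drive` class, the `pick_drive` function, and the ordering of the drives are all hypothetical.

```python
# Toy model only: shows why deallocating a failed drive would steer a
# simple selector to another drive. Not the real BRMS algorithm.
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    allocated: bool = True  # False models 6=Deallocate resource
    busy: bool = False      # True models a drive in use by another LPAR


def pick_drive(drives):
    """Return the name of the first allocated, non-busy drive, or None."""
    for drive in drives:
        if drive.allocated and not drive.busy:
            return drive.name
    return None


if __name__ == "__main__":
    # Assumed ordering: the failed drive happens to be tried first.
    drives = [Drive("Drive4"), Drive("Drive1"), Drive("Drive2"), Drive("Drive3")]
    print(pick_drive(drives))    # the failed drive is still selectable
    drives[0].allocated = False  # operator deallocates Drive4
    print(pick_drive(drives))    # selection moves on to another drive
```

In this model the failed drive keeps being picked until someone flags it, which matches what happened on LPAR B.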

Paul

-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Steinmetz, Paul
Sent: Monday, May 12, 2014 11:50 PM
To: 'Midrange Systems Technical Discussion'
Subject: RE: BRMS DUP failure CPF5349, CPP630C - 3573 LTO5 HH FC Drive failure

Resolution for the issue, for the archives.

Power cycling the 3573 library did clear the Drive 4 error.
Volume 001005 could then be moved/unloaded to a slot.
The Media Attention light remained on.
Moving 001005 from the I/O slot to the I/O door caused the I/O Station Bad Tape indicator to appear.
Removing 001005 from the 3573 tape library cleared both the Media Attention and Bad Tape lights.
Reloaded 001005 back into the library, Drive 4. A DSPTAP of 001005, *SAVRST, worked fine with no error.
Currently running another DUP, with a different volume as source in Drive 4, with no issue.
Volume 001005 is usable, but still in *ERR status within BRMS.
Beginning with V7R1, there is an option 9=Remove volume error status. Ran this on 001005.
At this point, we don't have solid confirmation of whether Drive 4 or volume 001005 caused the failure.
It was many moons ago, but I remember a similar issue where neither the drive nor the volume was replaced.
I reran the DUPMEDBRM on the same Drive4 with the same input volume, 001005, and everything ran fine, with no errors or failures.
This confirmed that neither Drive4 nor volume 001005 was the cause of the error.

After sending the library service logs and drive logs, and requesting that a PE review the logs, here is IBM's initial response.
There is new LTO5 HH FC drive firmware (not yet released) that should resolve the issue.
The current LTO5 HH FC firmware is D8D5; D8D9 is available only when recommended by a PE.

Summarizing, this is the 4th incident in the last 2 1/2 years, while on Power7 and/or LTO5, where NEWER firmware was recommended by IBM for the resolution.
In all cases, Power7 (VPD issues) or LTO5, critical errors could have been avoided if newer firmware had been installed.

Any feedback from the group on firmware-related errors? Has anyone experienced any of these?

Note: If there had been a scheduled restore process from LPAR A to LPAR B that night that needed volume 001005, the process would have failed with no immediate resolution.

Paul





-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Steinmetz, Paul
Sent: Monday, May 12, 2014 10:14 AM
To: 'Midrange Systems Technical Discussion'
Subject: BRMS DUP failure CPF5349, CPP630C - 3573 LTO5 HH FC Drive failure

Following a successful BRMS full save on LPAR A, the BRMS auto DUP job experienced a CPF5349, CPP630C on the source volume 001005.
Vol 001005 was used for a successful restore on LPAR B, 2 hours prior to the failure on LPAR A.
BRMS marked 001005 in *ERR status, which is normal.
LTO 5 HH FC Drive 4 is in error status.
Library warning trace logs show a Drive Warn or Crit Tape Alert flag.
001005 is currently stuck in Drive 4 of the 3573 and cannot be unloaded either through BRMS or the 3573 GUI move media commands.

Not sure of the cause: bad volume 001005 or bad Drive 4.
The 3573 operator guide suggests power cycling the library.
Our previous 3582 library had a button on the drive itself that you could reach within the library and force remove a stuck volume.
I'm not finding this same functionality with the 3573.

Any suggestions on proper next procedures?

Thank You
_____
Paul Steinmetz
IBM i Systems Administrator

Pencor Services, Inc.
462 Delaware Ave
Palmerton Pa 18071

610-826-9117 work
610-826-9188 fax
610-349-0913 cell
610-377-6012 home

psteinmetz@xxxxxxxxxx
http://www.pencor.com/





--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives at http://archive.midrange.com/midrange-l.








This mailing list archive is Copyright 1997-2014 by MIDRANGE dot COM and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.