


Here is an update from IBM PE regarding the 5913 L2 Cache failure.
IBM is stating that everything is working as designed (WAD).
So there is a rare case in which, even with dual controllers, one can still experience an issue or failure.
I'm not in 100% agreement, given how the dual controllers are marketed and sold.

"Many recoverable hardware errors can generate the same SRC57B58157.
In this case the error was logged due to a Cyclic Redundancy Check
(CRC) error. DC14 encountered a CRC error on a module on
the IOA and needed to reset in order to reload its firmware.
This is not a firmware problem that can be solved with a PTF. This
is a failure of the module on the 57B5 adapter that was resolved by
reloading the firmware for that module.

Our field experience with this error indicates it rarely recreates on
the same adapter. The IBM i error logging has a threshold of two 8157
errors within a 24 hour period before the SAL entry will call for the
replacement of the adapter. Replacing the hardware will also cause the
customer a disruption and that is the reason we recommend to replace the
57B5 adapter only if this recoverable hardware error occurs again.
IBM does not recommend the 57B5 adapter be replaced because
we do not expect the same SRC57B58157 to occur again. While there is
no Technical reason to replace the IOA, it can be replaced if client
would prefer that option.

The two adapters protect the customer's cache data. When one adapter
encounters an error the paired adapter must pause and write all of
the cache data to disk. Cache will then be disabled until the other
adapter is operational again and able to provide a second copy of the
cache data. In the PAL you see this secondary activity as a reset to
the 'good' adapter. The reset provides the method of freezing the
adapter until the data can be written safely to disk.

The entire system is generally not hung when these recovery steps are
performed. Disk activity related to the disks attached to the adapter
pair will be suspended until the resets complete. Other system activity
can continue but may appear hung if it is dependent on data from the
affected disks.

Hardware errors do occur and may be caused by many things. The design
of the system is to maintain availability and to also protect the
integrity of the data. In this case the hardware error should have
caused no more than a brief interruption of service to the customer
and all data was protected at all times. "
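
The threshold IBM describes above (two 8157 errors within a 24 hour period before the SAL entry calls for replacement) can be illustrated with a short Python sketch. This is only an illustration of the rule as worded in the response, not how IBM i implements it: the function name, the input format, and the choice to treat exactly 24 hours as inside the window are all assumptions, and the second timestamp in the usage lines is hypothetical.

```python
from datetime import datetime, timedelta

def replacement_threshold_met(error_times, window=timedelta(hours=24), count=2):
    """Return True if `count` errors fall within any period of length `window`.

    Hypothetical helper: models the "two 8157 errors within 24 hours" rule
    quoted above. The inclusive comparison (<=) is an assumption.
    """
    times = sorted(error_times)
    for i in range(len(times) - count + 1):
        if times[i + count - 1] - times[i] <= window:
            return True
    return False

# The single 8157 entry logged in this incident.
first = datetime(2015, 5, 12, 5, 49, 42)
print(replacement_threshold_met([first]))  # False

# Hypothetical second occurrence six hours later, for illustration only.
print(replacement_threshold_met([first, first + timedelta(hours=6)]))  # True
```

With only one occurrence logged, the threshold is not met, which matches IBM's recommendation to leave the adapter in place unless the error recurs.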

Paul

-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Steinmetz, Paul
Sent: Tuesday, May 12, 2015 11:23 AM
To: 'Midrange Systems Technical Discussion'
Subject: CPPEA46 - Warning - An I/O adapter to I/O adapter connection failed.

We experienced an issue this morning with our paired 5913 (57B5) DASD controllers.

This error was an L2 cache error and the cards needed to do a reset for data integrity reasons.

The controllers went into a recovery that lasted 23 seconds, and the LPAR was suspended during this period.

During the recovery, several applications failed.



I was under the impression that if one card failed or had an issue, the other card would take over.

I guess there are exceptions to that rule.

I have opened a PMR with hardware support and am waiting on feedback.



Anyone ever experience one of these or something similar?



CPPEA46 70 INFO Warning - An I/O adapter to I/O adapter connection failed.

QSYSCOMM1 QSYS 404596 QRUSDMSG 0000 05/12/15 05:50:44.544846 QSYS



CPPEA66 99 INFO *Attention* Hardware service may be required.

QSYSCOMM1 QSYS 404596 QRUSDMSG 0000 05/12/15 05:50:45.352414 QSYS



CPPEA46 70 INFO Warning - An I/O adapter to I/O adapter connection failed.

QSYSCOMM1 QSYS 404596 QRUSDMSG 0000 05/12/15 05:50:45.352726 QSYS



CPPEA45 40 INFO Informational only. I/O adapter to I/O adapter connection restored.

QSYSCOMM1 QSYS 404596 QRUSDMSG 0000 05/12/15 05:51:07.352712 QSYS
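
As a rough cross-check of the 23 second recovery, the gap between the first CPPEA46 message and the CPPEA45 "connection restored" message can be computed from the timestamps above. This is a minimal Python sketch; it assumes the timestamps are MM/DD/YY with microseconds, and it measures only the interval between these two logged messages, which may not be the exact period the LPAR was suspended.

```python
from datetime import datetime

# Timestamp layout assumed from the message log above: MM/DD/YY HH:MM:SS.ffffff
FMT = "%m/%d/%y %H:%M:%S.%f"

failed = datetime.strptime("05/12/15 05:50:44.544846", FMT)    # first CPPEA46
restored = datetime.strptime("05/12/15 05:51:07.352712", FMT)  # CPPEA45

outage_seconds = (restored - failed).total_seconds()
print(round(outage_seconds, 1))  # 22.8
```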




This error is an L2 cache error and the card needs to do a reset for data integrity reasons.



57B59072 05/12/15 05:49:42 Info 2/288/0/ 0-2/ /27/ / 1/ DC11 57B5 001 ********** 8025E960 No description is availa
57B58157 05/12/15 05:49:42 Perm 2/308/0/ 0-2/ /27/ / 1/ DC14 57B5 001 ********** 802589C0 No description is availa <===== See explanation below in Description for Case 4115
57B59070 05/12/15 05:49:42 Info 2/288/0/ 0-2/ /27/ / 1/ DC11 57B5 001 ********** 802589E8 No description is availa
B6000266 05/12/15 05:49:42 Reco 2/308/0/ 0-2/ / 0/14/ 0/ DMP069 59C2 109 ********** 8025E961 Contact was lost with th
B6005275 05/12/15 05:50:13 Info 2/288/0/ 0- / / / / / CMB13 57B5 001 ********** 802589E8 URC Information not avai
B600512D 05/12/15 05:50:40 Dump 2/308/0/ 0- / / / / / CMB17 57B5 001 ********** 802589C0 An IOP dump was initiate
50B18404 05/12/15 05:50:40 Dump 2/308/0/ 0-2/ / 2/38/ 0/ D07 50B1 001 ********** 802589C0 No description is availa
50B18404 05/12/15 05:50:40 Dump 2/308/0/ 0-2/ / 0/38/ 0/ D06 50B1 001 ********** 802589C0 No description is availa
57B59072 05/12/15 05:50:40 Info 2/288/0/ 0-2/ /27/ / 1/ DC11 57B5 001 ********** 8025E963 No description is avail
B6005120 05/12/15 05:50:44 Info 2/308/0/ 0-2/ / 0/18/ 0/ DMP038 59C2 109 ********** 8025E964 System LIC detected a

B6005120 05/12/15 05:50:44 Info 2/308/0/ 0-2/ / 0/15/ 0/ DMP073 59C2 109 ********** 8025E96E System LIC detected a pr
A6010266 05/12/15 05:50:44 Perm *PLATFORM C80011EC No description is availa
A6020266 05/12/15 05:50:44 Perm *PLATFORM C80011EE No description is availa
A6010266 05/12/15 05:50:44 Perm *PLATFORM C80011F0 No description is availa
57B59071 05/12/15 05:51:06 Info 2/288/0/ 0-2/ /27/ / 1/ DC11 57B5 001 ********** 8025E96F No description is availa
B6005275 05/12/15 05:51:11 Info 2/308/0/ 0- / / / / / CMB17 57B5 001 ********** 802589C0 URC Information not avai < ==== IOA Reset for SRC57B58157 recovery.
B6005120 05/12/15 05:51:30 Info 2/308/0/ 0-2/ / 0/18/ 0/ DMP038 59C2 109 ********** 8025E970 System LIC detected a pr
B6005120 05/12/15 05:51:30 Info 2/308/0/ 0-2/ / 0/19/ 0/ DMP040 59C2 109 ********** 8025E971 System LIC detected a
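
For anyone wanting to pick the 8157 entries out of a PAL listing like the one above, here is a minimal Python sketch. It assumes the first four whitespace-separated fields are the SRC, date, time, and entry class (Info, Perm, Reco, Dump), which is inferred from this listing rather than from any IBM documentation. The names PalEntry and parse_pal_line are hypothetical, and the sample lines are shortened copies of three entries above.

```python
from collections import namedtuple

# Field names are assumptions based on how the listing above appears to be laid out.
PalEntry = namedtuple("PalEntry", "src date time entry_class rest")

def parse_pal_line(line):
    """Split one PAL line into its first four fields plus the remainder."""
    parts = line.split(None, 4)
    if len(parts) < 4:
        return None  # blank or malformed line
    rest = parts[4] if len(parts) > 4 else ""
    return PalEntry(parts[0], parts[1], parts[2], parts[3], rest)

sample = """\
57B59072 05/12/15 05:49:42 Info 2/288/0/ 0-2/ /27/ / 1/ DC11 57B5 001
57B58157 05/12/15 05:49:42 Perm 2/308/0/ 0-2/ /27/ / 1/ DC14 57B5 001
B6005275 05/12/15 05:51:11 Info 2/308/0/ 0- / / / / / CMB17 57B5 001
"""

entries = [e for e in map(parse_pal_line, sample.splitlines()) if e]
hits = [e for e in entries if e.src.endswith("8157")]
print([(e.src, e.time, e.entry_class) for e in hits])
# [('57B58157', '05:49:42', 'Perm')]
```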




*******Description for Case 4115******
SRCxxxx8157 with NO degraded disk and Service Action Log (SAL) failing item = SVCDOCS, the IOA does NOT need to be replaced.
________________________________
******ACTION for Case 4115**********
This is an update. You only need to verify that the disk units under the IOA are NOT "DEGRADED" and the callout in the Service Action Log is "SVCDOCS" to = NO IOA REPLACEMENT.

*******Extended Text for Action *******
*************************************************************************
SRCxxxx8157 is a RECOVERABLE error and as long as no disk units are "degraded" and the Service Action Log has "SVCDOCS" as the part called out, DO NOT REPLACE THE IOA.
Other errors logged at the same time may be: (Example for 571F/575B IOA)
SRCA6xx0255/266
SRC571F8133
SRC571F3400
SRCB6005275
SRCB500512D
SRC506D8404
SRC506E8404
*************************************************************************
*
* This error is an L2 cache error and the card needs to do a reset
* for data integrity reasons.
*
* The card is working as designed if after the resets, all resources
* are "operational"
*
*************************************************************************
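
The Case 4115 guidance above boils down to three conditions that together mean "do not replace the IOA". A small Python sketch of that check follows. The function name and parameters are hypothetical, and the values passed in the usage lines are illustrative inputs rather than readings taken from this system. The sketch only says whether the no-replacement conditions are met; the case text does not state what to do when they are not.

```python
def case_4115_no_replacement(src, degraded_disks, sal_failing_item):
    """Return True when the Case 4115 "do not replace the IOA" conditions hold.

    Conditions taken from the case text above:
      - the SRC ends in 8157 (SRCxxxx8157)
      - no disk units under the IOA are degraded
      - the Service Action Log failing item is SVCDOCS
    """
    return (
        src.upper().endswith("8157")
        and degraded_disks == 0
        and sal_failing_item.strip().upper() == "SVCDOCS"
    )

# Illustrative inputs only.
print(case_4115_no_replacement("57B58157", 0, "SVCDOCS"))  # True
print(case_4115_no_replacement("57B58157", 1, "SVCDOCS"))  # False
```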

Thank You
_____
Paul Steinmetz
IBM i Systems Administrator

Pencor Services, Inc.
462 Delaware Ave
Palmerton Pa 18071

610-826-9117 work
610-826-9188 fax
610-349-0913 cell
610-377-6012 home

psteinmetz@xxxxxxxxxx<mailto:psteinmetz@xxxxxxxxxx>
http://www.pencor.com/





This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
