Do you monitor the HMC?
Anything in serviceable events on there?
I believe there may be SNMP traps available on the HMC, but I have not looked
into it in detail. Might be worth a look.

On Wed, 23 Jan 2019, 06:08 Steinmetz, Paul <PSteinmetz@xxxxxxxxxx> wrote:

Rob,

Yes, I thought of possibly using a watch (WRKWCH) for PAL and/or LIC entries.
However, the watch would not be running during an IPL, so the entries would
probably be missed.

The PAL entries are stored in QUSRSYS/QASXPROB.
I thought of maybe checking this PF with "something".

Paul

-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxxxxxxxx] On Behalf
Of Rob Berendt
Sent: Tuesday, January 22, 2019 8:29 AM
To: Midrange Systems Technical Discussion
Subject: RE: 57B5 (5913) card/pair failure/recovery - difficult to montor
and troubleshoot

IDK of any current service which will query PAL or SAL. But you could
submit an RFE.

https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_73/apis/qscswch.htm
The Start Watch (QSCSWCH) API starts the watch for event function, which
notifies the user by calling a user specified program when the specified
event (a message, a LIC log or a PAL) occurs. PAL stands for Product
Activity Log which shows errors that have occurred (such as in disk and
tape units, communications, and work stations).

Occasionally IBM allows some of the service stuff to be accessed outside
of SST also. Starting, stopping and managing the data from comm traces
comes to mind.

-----Original Message-----
From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
Steinmetz, Paul
Sent: Tuesday, January 22, 2019 8:04 AM
To: 'Midrange Systems Technical Discussion' <midrange-l@xxxxxxxxxxxxxxxxxx

Subject: RE: 57B5 (5913) card/pair failure/recovery - difficult to montor
and troubleshoot

Nothing in QHST.

One initial PAL entry.
One SAL entry with count of 3.

Can either the PAL or SAL be monitored?

Paul

-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxxxxxxxx] On Behalf
Of Rob Berendt
Sent: Tuesday, January 22, 2019 7:54 AM
To: Midrange Systems Technical Discussion
Subject: RE: 57B5 (5913) card/pair failure/recovery - difficult to montor
and troubleshoot

Were there any matching records in QHST during that time? If so, this is
quite easy to monitor, even if you have to periodically query QHST (using
the appropriate service API) and feed that into your monitoring tool.
Have you asked your tool vendor if they "catch up" on QHST after an IPL?
It's worth the time to question them.
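
The "catch up after an IPL" idea can be sketched as a checkpoint comparison. This is only an illustration, not any vendor's method: the function name `catch_up`, the tuple layout, and the sample text are assumptions, and the timestamps are borrowed from the log entries quoted elsewhere in this thread. A real monitor would persist the checkpoint somewhere that survives the IPL and would read its entries from QHST or another log source.

```python
from datetime import datetime

def catch_up(entries, checkpoint):
    """Return entries newer than the checkpoint, plus the updated checkpoint.

    entries:    iterable of (timestamp, text) tuples from any log source
    checkpoint: datetime of the newest entry that was already processed
    """
    # Keep only what was logged after the monitor last looked, oldest first.
    fresh = sorted(e for e in entries if e[0] > checkpoint)
    new_checkpoint = fresh[-1][0] if fresh else checkpoint
    return fresh, new_checkpoint

# Hypothetical sample: the monitor last ran at 00:30, before the 01:10 IPL.
last_seen = datetime(2019, 1, 21, 0, 30)
log = [
    (datetime(2019, 1, 21, 0, 15), "entry already processed"),
    (datetime(2019, 1, 21, 1, 10, 38), "57B59076 DC05"),
    (datetime(2019, 1, 21, 1, 11, 39), "B6005090 DMP048"),
]

missed, last_seen = catch_up(log, last_seen)
for stamp, text in missed:
    print(stamp.isoformat(), text)
```

Because the comparison is against a stored timestamp rather than "now", entries written while the monitor was down are still picked up on its first run after the IPL.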


-----Original Message-----
From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
Steinmetz, Paul
Sent: Monday, January 21, 2019 4:27 PM
To: 'Midrange Systems Technical Discussion' <midrange-l@xxxxxxxxxxxxxxxxxx

Subject: RE: 57B5 (5913) card/pair failure/recovery - difficult to montor
and troubleshoot

SST Log analysis shows the entries below during each IPL.
Most monitoring tools can't/don't access the PAL, SAL, LIC log, etc.
Even if they could, the entries would be missed, because the tools are not
active during an IPL.

Currently, I can't monitor during an IPL and I can't monitor for any of
these.

System Ref Code  Date      Time      Class  Resource Name  Resource Type

57B59076         01/21/19  01:10:38  Perm   DC05           57B5
B6005090         01/21/19  01:11:39  Qual   DMP048         19B3

57B59076         01/21/19  03:23:58  Perm   DC05           57B5
B6005090         01/21/19  03:24:59  Qual   DMP048         19B3

57B59076         01/21/19  04:05:47  Perm   DC05           57B5
B6005090         01/21/19  04:06:52  Qual   DMP048         19B3

Paul
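
If the log analysis rows above can be captured as plain text (an assumption; how they would be exported is not covered in this thread), a repeating reference code is easy to spot programmatically. The sketch below is hypothetical: `parse_rows` and `repeated_srcs` are illustrative names, and the six-field whitespace layout is taken only from the rows shown in this message.

```python
from collections import Counter

# Rows copied from the log analysis output quoted in this message.
SAMPLE = """\
57B59076 01/21/19 01:10:38 Perm DC05 57B5
B6005090 01/21/19 01:11:39 Qual DMP048 19B3
57B59076 01/21/19 03:23:58 Perm DC05 57B5
B6005090 01/21/19 03:24:59 Qual DMP048 19B3
57B59076 01/21/19 04:05:47 Perm DC05 57B5
B6005090 01/21/19 04:06:52 Qual DMP048 19B3
"""

def parse_rows(text):
    """Split each six-field line into a dict; skip anything else."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # header, blank line, or unexpected layout
        src, date, time, cls, resource, rtype = fields
        rows.append({"src": src, "date": date, "time": time,
                     "class": cls, "resource": resource, "type": rtype})
    return rows

def repeated_srcs(rows, threshold=2):
    """Return (SRC, resource) pairs logged at least `threshold` times."""
    counts = Counter((r["src"], r["resource"]) for r in rows)
    return {key: n for key, n in counts.items() if n >= threshold}

rows = parse_rows(SAMPLE)
repeats = repeated_srcs(rows)
for (src, resource), n in sorted(repeats.items()):
    print(f"{src} on {resource}: {n} occurrences")
```

Counting per SRC and resource flags the same condition that the SAL entry's count of 3 reflects, without relying on a new PAL entry being created at each IPL.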

-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxxxxxxxx] On Behalf
Of Rob Berendt
Sent: Monday, January 21, 2019 4:17 PM
To: Midrange Systems Technical Discussion
Subject: RE: 57B5 (5913) card/pair failure/recovery - difficult to montor
and troubleshoot

When you did the additional IPLs, did you select the option on PWRDWNSYS
to check hardware?
We only IPL our hosting LPAR once a quarter. Part of the checklist is to
ensure that WRKDSKSTS does not show performance degraded.

Reminder: Frequent IPLs delete SQL performance data and can adversely
affect performance.

You could use the DB2 service APIs to query QHST,
QSYS2.HISTORY_LOG_INFO(), about the time of the first IPL and see if you
see that message. Then try to figure out how to shoehorn that into your
monitoring. Startup query perhaps?
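
A startup query along those lines might be generated as follows. This is a rough sketch under assumptions: the helper name `history_log_query`, the window sizes, and the severity cutoff are all hypothetical, and the SQL text would still need to be run through whatever SQL interface the monitoring tool uses. It relies only on the start and end time arguments of QSYS2.HISTORY_LOG_INFO and on its MESSAGE_TIMESTAMP, MESSAGE_ID, SEVERITY, and MESSAGE_TEXT columns.

```python
from datetime import datetime, timedelta

def history_log_query(ipl_time, minutes_before=10, minutes_after=30,
                      min_severity=40):
    """Build SQL text that lists QHST messages logged around an IPL.

    The window sizes and severity cutoff are illustrative defaults only.
    """
    fmt = "%Y-%m-%d-%H.%M.%S.%f"  # IBM SQL timestamp string form
    start = (ipl_time - timedelta(minutes=minutes_before)).strftime(fmt)
    end = (ipl_time + timedelta(minutes=minutes_after)).strftime(fmt)
    return (
        "SELECT MESSAGE_TIMESTAMP, MESSAGE_ID, SEVERITY, MESSAGE_TEXT "
        "FROM TABLE(QSYS2.HISTORY_LOG_INFO("
        f"TIMESTAMP('{start}'), TIMESTAMP('{end}'))) AS H "
        f"WHERE SEVERITY >= {int(min_severity)} "
        "ORDER BY MESSAGE_TIMESTAMP"
    )

# Time of the first IPL-related entry reported in this thread.
query = history_log_query(datetime(2019, 1, 21, 1, 10, 38))
print(query)
```

Whether any QHST message actually accompanies this particular failure is an open question in the thread, so the query is a starting point for investigation rather than a proven alert.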

The storage service APIs do not seem to cover the performance degraded
status. I wonder if the System Health Services do?

-----Original Message-----
From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
Steinmetz, Paul
Sent: Monday, January 21, 2019 3:42 PM
To: 'Midrange Systems Technical Discussion' <midrange-l@xxxxxxxxxxxxxxxxxx

Subject: 57B5 (5913) card/pair failure/recovery - difficult to montor and
troubleshoot

Early this morning, I had a 57B5 (5913) pair fail during an IPL.
Because our monitoring software is not running during the IPL, I initially
did not see the alert.
Two additional IPLs followed. The LPAR continued to run, but performance was
EXTREMELY poor. The initial "call home" PAL entry did not reappear, only an
increased count in the SAL entry, which I didn't see until later on.

SAL entry
Status  Date      Time      SRC       Resource  Count  PLID
NEW     01/21/19  01:10:38  57B59076  DC05      3

It was later discovered, with IBM hardware support, via the SST option
"6. Display disk hardware status", that the pair was "Performance degraded",
which implied the disk controllers were running with ZERO cache.

Performance degraded-
This state indicates the device is functional but performance may be
impacted due to other hardware problems (such as an IOA cache problem).

We identified the "suspect" card, which was NOT operational.
Powered off the slot, powered slot back on.
LPAR disk performance problem resolved.

I have had similar failures on a different LPAR, with a different card pair,
over the years.
Those failures did not occur during an IPL, but while the LPAR was running.
The difference in those two previous failures was that the card/slot was
automatically reset by the code.

In those previous cases, the error was an L2 cache error and the cards needed
to do a reset for data integrity reasons.
The controllers went into a recovery that lasted 23 seconds; the LPAR was
"suspended" during this period.
During the recovery, several applications failed, which then needed a manual
reset/recycle.

1) How does one better monitor for these types of card/pair failures?
2) Why did 2nd and 3rd IPL not reset the card?
3) Why did 2nd and 3rd IPL not "call home" and create a new PAL entry?
4) Anyone else from the group experience similar card/pair failures?

Thank You
_____
Paul Steinmetz
IBM i Systems Administrator

Pencor Services, Inc.
462 Delaware Ave
Palmerton Pa 18071

610-826-9117 work
610-826-9188 fax
610-349-0913 cell
610-377-6012 home

psteinmetz@xxxxxxxxxx
http://www.pencor.com/









--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing
list To post a message email: MIDRANGE-L@xxxxxxxxxxxxxxxxxx To subscribe,
unsubscribe, or change list options,
visit: https://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxxxxxxxx
Before posting, please take a moment to review the archives at
https://archive.midrange.com/midrange-l.

Please contact support@xxxxxxxxxxxx for any subscription related
questions.

Help support midrange.com by shopping at amazon.com with our affiliate
link: https://amazon.midrange.com



This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
