Last Thursday evening our IBM i POWER6 on V5R4M5 became unavailable. QINTER green-screen users lost their connections and web apps went offline. The HMC green screen in QCTL remained up for only a few minutes longer.
We used the HMC to diagnose the problem with IBM's guidance. Based on the messages shown at the end of this message, the IBM tech brought a SCSI DASD controller card and a disk drive to our site.
He replaced the DASD controller card but could not IPL from disk, only from CD.
He put the old DASD controller card back in and replaced the failed disk unit.
The system IPLed successfully and our users were able to sign on.
Approximately 12 machine check messages were then sent to QSYSOPR at different intervals.
Message ID . . . . . . : CPF0937 Severity . . . . . . . : 90
Message type . . . . . : Information
Date sent . . . . . . : 03/05/11 Time sent . . . . . . : 22:50:49
Message . . . . : Machine check not recoverable. Error code X'0000'.
Cause . . . . . : The system operation is being continued. The licensed
internal code may have a problem.
====================
Also, saving to tape produces this message:
MCH3402 Escape 40 03/08/11 07:53:19.607000 < 000048 QDBDLTFI QSYS
From Program . . . . . . . : setspaceptrfromptr
Message . . . . : Tried to refer to all or part of an object that no longer
exists.
Cause . . . . . : The most common cause is that a stored address to an
object is no longer correct because that object was deleted or part of the
object was deleted.
=====================
The IBM tech scanned the 12 disk units on our 457 GB system and discovered numerous bad pages.
<begin IBM explanation>
Scanned all disk units this time and found 274 bad pages on unit 10. Of those, 110 were in free space and were cleaned up by the scan tool. The remaining 174 bad pages are affecting the following four objects:
Database file member PLANNING/F55NOTE2 INOTE
Service program QSYS/QJVAJNISYS
Database file member QSPL/Q04079N036Q651950294
Database file member MODEL/F3002 F3002
<end IBM explanation>
I deleted the damaged objects and restored them from good tape backups.
<begin IBM explanation> The damage was due to a hardware problem and is not specifically related to single-level storage. Your hardware rep may be able to answer this better, but in my experience, when a disk goes bad it can put "garbage" on the SCSI bus, which typically has multiple disks attached, so data on other drives using that SCSI bus can be affected. Hardware problems can also happen at the cache, IOA, or even IOP level. This is why, when I work a call like this, I scan all of the units under the storage controller for damage.
The MCH3402 messages you received look like the results of cleaning up the addresses where the first page was bad. So now, instead of getting a machine check error, you got better messages that identified what object was affected.
Tomorrow, after the IPL tonight, I'd like to get another remote session and rescan units 5 and 10 to make sure all of the bad pages have been cleaned up. Then I recommend scheduling a full system save to check for any further objects that may have pieces missing from the addresses I removed (where the first page was bad).
<end IBM explanation>
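The tech's reasoning (a failing disk can put garbage on a shared SCSI bus, and faults can also sit in the cache, IOA, or IOP) sets the scan scope at the storage controller rather than at the single failed unit. A minimal Python sketch of that scoping rule, assuming a hypothetical inventory that maps each disk unit number to the controller it sits under; the controller names and unit layout below are invented for illustration and are not read from any IBM tool:

```python
def units_to_scan(failed_unit, unit_to_controller):
    """Return every disk unit under the same storage controller as failed_unit.

    unit_to_controller is a hypothetical inventory: {unit number: controller id}.
    """
    controller = unit_to_controller[failed_unit]
    # Any unit sharing the controller (and therefore its bus, cache, IOA, IOP)
    # may have been hit by the same fault, so all of them get scanned.
    return sorted(unit for unit, ctl in unit_to_controller.items() if ctl == controller)


# Hypothetical inventory for illustration only.
inventory = {1: "DC01", 2: "DC01", 5: "DC02", 10: "DC02", 11: "DC02"}
print(units_to_scan(10, inventory))  # [5, 10, 11]
```

Scoping by controller rather than by bus is the wider of the two choices, which matches the tech's point that the fault may lie above the bus.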
The original System Service Tools messages are shown next.
===============
FI00580
FI00580 indicates that any storage device may be the failing item.
The address of the failing storage device cannot be determined.
Note the device type and refer to Finding parts, locations, and addresses to determine the FRU part number to replace.
===============
FI00500
FI00500 indicates that the I/O (SCSI) bus cable is the failing item.
See FI01140.
===============
FI00302
FI00302 indicates that the Licensed Internal Code for the magnetic storage I/O processor (MSIOP) or the combined function I/O processor (CFIOP) is the failing item.
Ask your next level of support for assistance.
===============
FI00301
FI00301 indicates that the magnetic storage I/O processor (MSIOP) or the combined function I/O processor (CFIOP) is the failing item.
Note the IOP type and refer to Managing PCI adapters to determine the FRU part number to replace.
AJDG301
Licensed Internal Code is the failing item. Look for PTFs associated with the reference code and have the customer apply them.
A6xx0266
Explanation
Contact was lost with the device indicated
Response
Do not power off the system. Perform the procedure indicated in the failing item list.
Failing Item
• LICIP13
• FI00580
• FI00500
• FI00302
• FI00301
• AJDG301
A6xx0255
Explanation
Contact was lost with the device indicated
Response
Do not power off the system. Perform the procedure indicated in the failing item list.
Failing Item
• LICIP13
• FI00580
• FI00500
• FI00302
• FI00301
• AJDG301
LICIP13
A disk unit seems to have stopped communicating with the system.
The system has stopped normal operation until the cause of the disk unit failure is found and corrected. Ensure you have read the Danger notices in Licensed Internal Code isolation procedures before continuing with this procedure.
If the disk unit that stopped communicating with the system has mirrored protection active, normal operation of the system stops for one to two minutes. Then the system suspends mirrored protection for that disk unit and continues normal operation.
Note: Do not power off the system or partition using the white button, function 08, ASMI, or HMC immediate power-off when performing this procedure. If this procedure or other isolation procedures referenced by this procedure direct you to IPL or power off the system,
• perform a partition main storage dump (see Performing dumps), or
• if additional dump information is not needed, perform a function 03 IPL or restart the system or partition using the HMC.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the problem. To determine if the system has logical partitions, go to Determining if the system has logical partitions before continuing with this procedure.
2. Was a problem summary form completed for this problem?
o No: Continue with the next step.
o Yes: Use the problem summary form information and go to step 4.
3. Fill out a Problem Reporting Form ... completely with the instructions provided.
4. Recovery from a device command time-out may have caused the communications loss condition (indicated by an SRC on the control panel or in the HMC). This communications loss condition has the following symptoms:
o The A6xx SRC does not increment within two minutes.
o The system continues to run normally after it recovers from the communications loss condition and the reference code is cleared from the control panel.
Does the communication loss condition have the above symptoms?
o Yes: Continue with the next step.
o No: Go to step 6.
5. Verify that all Licensed Internal Code PTFs have been applied to the system. Apply any Licensed Internal Code PTFs that have not been applied to the system. Does the intermittent condition continue?
o Yes: Print all product activity logs. Print the LIC logs with a major code of 1000. Provide this information to your next level of support. This ends the procedure.
o No: This ends the procedure.
6. Is the storage hosted by another partition?
o Yes: Contact your next level of support.
o No: Continue with the next step.
7. A manual reset of the IOP may clear the attention reference code. Perform the following:
If you are working from the control panel:
a. Select Manual mode on the control panel.
b. Select Function 25 and press Enter.
c. Select Function 26 and press Enter.
d. Select Function 67 and press Enter to reset the IOP.
e. Wait 10 minutes.
f. Select Function 25 and press Enter to disable the service functions on the control panel.
If you are working from the HMC:
Note: These steps need to be updated for the new HMC UI.
a. In the Navigation Area, open the Service Applications folder.
b. Select Service Focal Point.
c. In the contents area, select Service Utilities.
d. In the Service Utilities window, select the system you are working on.
e. Select Selected > Operator Panel Service Functions.
f. Select the logical partition, and then select Partition Functions.
g. Select Disk Unit IOP Reset/Reload (67).
h. Wait 10 minutes.
Did the reset successfully clear the control panel SRC or HMC panel value, and can commands be entered on the partition console?
o No: Continue with the next step.
o Yes: Look for a Service Action Log (SAL) entry since the last IPL, and use it to fix the problem (see Searching the service action log). If a B6xx 5090 SRC occurred since the last IPL, look for other SRC entries and take action on them first. This ends the procedure.
8. Is the SRC the same reference code that sent you here?
o Yes: The same reference code occurred. Continue with the next step.
o No: Collect all words of the reference code and perform problem analysis to resolve the new problem. This ends the procedure.
9. Powering off and powering on the affected IOP domain may clear the attention reference code. Perform the following:
If you are working from the control panel:
a. Select Manual mode on the control panel.
b. Select Function 25 and press Enter.
c. Select Function 26 and press Enter.
d. Select Function 68 and press Enter to power off the domain.
e. After the domain has been powered off or 10 minutes have passed, select Function 69 and press Enter to power on the domain.
f. Wait 10 minutes.
g. Select Function 25 and press Enter to disable the service functions on the control panel.
If you are working from the HMC:
a. In the Navigation Area, open the Service Applications folder.
b. Select Service Focal Point.
c. In the contents area, select Service Utilities.
d. In the Service Utilities window, select the system you are working on.
e. Select Selected > Operator Panel Service Functions.
f. Select the logical partition, and then select Partition Functions.
g. Select Power off domain (68).
h. After the domain has been powered off or 10 minutes have passed, select Power on domain (69).
i. Wait 10 minutes.
Did this successfully clear the control panel SRC or HMC panel value, and can commands be entered on the partition console?
o No: Continue with the next step.
o Yes: Look for a SAL entry since the last IPL, and use it to fix the problem (see Searching the service action log). If a B6xx 5090 SRC occurred since the last IPL, look for other SRC entries and take action on them first. This ends the procedure.
10. Is the SRC the same reference code that sent you here?
o Yes: The same reference code occurred. Continue with the next step.
o No: Collect all words of the reference code and perform problem analysis to resolve the new problem. This ends the procedure.
11. Perform a main storage dump, and then perform an IPL as follows:
If you are working from the control panel:
a. Select Manual mode on the control panel.
b. Select Function 22 and press Enter to dump the main storage to the load-source disk unit.
c. Wait for SRC A100 300x to occur, indicating that the dump is complete.
d. Then perform an IPL to DST (see Performing an IPL to dedicated service tools).
If you are working from the HMC:
a. In the Navigation Area, open Server and Partition.
b. Select Server Management.
c. In the contents area, open the server on which the logical partition is located.
d. Select Partitions.
e. Right-click the logical partition profile and select Restart Partition.
f. In the Restart Partition window, select the Dump restart option.
Does a different SRC occur, or does a display appear on the console showing reference codes?
o No: Continue with the next step.
o Yes: Perform problem analysis to correct the new problem. This ends the procedure.
12. Does the same reference code occur?
o Yes: Continue with the next step.
o No: The problem is intermittent. Perform the following:
a. Print the system product activity log for the magnetic storage subsystem and print the LIC logs with a major code of 1000.
b. Copy the main storage dump to removable media (see Managing dumps).
c. Contact your next level of support and provide them with this information. This ends the procedure.
13. Are characters 7-8 of the top 16-character line of function 12 (the 2 rightmost characters of word 2) equal to 13 or 17?
o Yes: Continue with the next step.
o No: Go to step 16.
14. Use the word 1 through 9 information recorded on the Problem summary form to determine the disk unit that stopped communicating with the system:
o Characters 9-16 of the top 16-character line of function 12 (word 3) contain the IOP direct select address.
o Characters 1-8 of the bottom 16-character line of function 12 (word 4) contain the unit address.
o Characters 1-8 of the top 16-character line of function 13 (word 6) may contain the disk unit type, level and model number.
o Characters 13-16 of the top 16-character line of function 13 (4 rightmost characters of word 7) may contain the disk unit reference code.
o Characters 1-8 of the bottom 16-character line of function 13 (word 8) may contain the disk unit serial number.
Note: For 2105 and 2107 disk units, characters 4-8 of the bottom 16-character line of function 13 (5 rightmost characters of word 8) contain the disk unit serial number.
15. Is the disk unit reference code 0000?
o No: Using the information from step 14, find the table for the indicated disk unit type. Perform problem analysis for the disk unit reference code. This ends the procedure.
o Yes: Perform the following steps:
a. Determine the IOP type by using characters 9-12 of the bottom 16-character line of function 13 (4 leftmost characters of word 9).
b. Find the unit reference code table for the IOP type. Determine the unit reference code by using characters 13-16 of the bottom 16-character line of function 13 (4 rightmost characters of word 9).
c. Perform problem analysis for the unit reference code. This ends the procedure.
16. Are characters 7-8 of the top 16-character line of function 12 (the two rightmost characters of word 2) equal to 27?
o Yes: Continue with the next step.
o No: Go to step 20.
17. Use the word 1 through 9 information recorded on the Problem summary form to determine the disk unit that stopped communicating with the system:
o Characters 9-16 of the top 16-character line of function 12 (word 3) contain the IOP direct select address.
o Characters 1-8 of the bottom 16-character line of function 12 (word 4) contain the disk unit address.
o Characters 9-16 of the bottom 16-character line of function 12 (word 5) contain the disk unit type, level and model number.
o Characters 1-8 of the bottom 16-character line of function 13 (word 8) contain the disk unit serial number.
Note: For 2105 and 2107 disk units, characters 4-8 of the bottom 16-character line of function 13 (5 rightmost characters of word 8) contain the disk unit serial number.
o Characters 13-16 of the bottom 16-character line of function 13 (4 rightmost characters of word 9) contain the disk unit reference code.
18. Is the disk unit reference code 0000?
o No: Continue with the next step.
o Yes: Find the table for the indicated disk unit type. Then find unit reference code (URC) 3002 in the table, and exchange the FRUs for that URC, one at a time.
Note: Do not perform any other isolation procedures that are associated with URC 3002.
This ends the procedure.
19. Are characters 9-16 of the bottom 16-character line of function 13 (word 9) equal to B6xx 51xx?
o Yes: Using the B6xx table, perform problem analysis for the 51xx unit reference code. This ends the procedure.
o No: Using the information from step 17, find the table for the indicated disk unit type. Perform problem analysis for the disk unit reference code. This ends the procedure.
20. Are the 2 rightmost characters of word 2 on the Problem summary form equal to 62?
o No: Use the information in characters 9-16 of the bottom 16-character line of function 13 (word 9) instead of the information in word 1 for the reference code. This ends the procedure.
o Yes: Continue with the next step.
21. Are characters 9-16 of the top 16-character line of function 12 (word 3) equal to 00010004?
o Yes: Continue with the next step.
o No: Go to step 24.
22. Are characters 13-16 of the bottom 16-character line of function 12 (4 rightmost characters of word 5) equal to 0000?
o No: Continue with the next step.
o Yes: Go to step 25.
23. Note the following:
o Characters 13-16 of the bottom 16-character line of function 12 (4 rightmost characters of word 5) contain the disk unit reference code.
o Characters 1-8 of the top 16-character line of function 13 (word 6) contain the disk unit address.
o Characters 9-16 of the top 16-character line of function 13 (word 7) contain the IOP direct select address.
o Characters 1-8 of the bottom 16-character line of function 13 (word 8) contain the disk unit type, level and model number.
Find the table for the disk unit type (characters 1-4 of the bottom 16-character line of function 13, the 4 leftmost characters of word 8), and use characters 13-16 of the bottom 16-character line of function 12 (4 rightmost characters of word 5) as the unit reference code. This ends the procedure.
24. Are characters 9-16 of the top 16-character line of function 12 (word 3) equal to 0002000D?
o Yes: Continue with the next step.
o No: Use the information in characters 9-16 of the bottom 16-character line of function 13 (word 9), instead of the information in word 1, for the reference code, and perform problem analysis.
Characters 1-8 of the top 16-character line of function 13 (word 6) may contain the disk unit address.
Characters 9-16 of the top 16-character line of function 13 (word 7) may contain the IOP direct select address.
Characters 1-8 of the bottom 16-character line of function 13 (word 8) may contain the disk unit type, level and model number. This ends the procedure.
25. Note the following:
o Characters 1-8 of the top 16-character line of function 13 (word 6) contain the disk unit address.
o Characters 9-16 of the top 16-character line of function 13 (word 7) contain the IOP direct select address.
o Characters 1-8 of the bottom 16-character line of function 13 (word 8) contain the disk unit type, level and model number.
Find the table for the disk unit type (characters 1-4 of the bottom 16-character line of function 13, the 4 leftmost characters of word 8), and use 3002 as the unit reference code. Exchange the FRUs for URC 3002 one at a time. This ends the procedure.
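Steps 13 through 25 amount to a decision tree over the nine SRC words. The Python sketch below restates that branching so the character-position arithmetic is easy to check; it is not IBM code. It assumes the words are typed in as 8-character hex strings in order (word 1 first), and that function 12 shows words 2-3 on its top line and 4-5 on its bottom line while function 13 shows words 6-7 and 8-9, which is what the character-to-word mappings in the steps imply. The function names and the returned (table, type or IOP, reference code) tuple are hypothetical.

```python
def panel_chars(w, function, line, first, last):
    """Characters first..last (1-based, inclusive) of the top or bottom
    16-character line of panel function 12 or 13.

    w is padded so that w[n] is word n. Function 12 shows words 2-3 (top)
    and 4-5 (bottom); function 13 shows words 6-7 (top) and 8-9 (bottom).
    """
    start = {12: 2, 13: 6}[function] + (0 if line == "top" else 2)
    text = w[start] + w[start + 1]
    return text[first - 1:last]


def licip13_action(nine_words):
    """Follow steps 13-25 and return (table, type_or_iop, reference_code)."""
    if len(nine_words) != 9:
        raise ValueError("expected words 1 through 9")
    w = [""] + [word.upper() for word in nine_words]  # w[n] is word n

    code = panel_chars(w, 12, "top", 7, 8)  # 2 rightmost characters of word 2

    if code in ("13", "17"):  # steps 13-15
        disk_urc = panel_chars(w, 13, "top", 13, 16)  # rightmost 4 of word 7
        if disk_urc != "0000":
            return ("disk unit", w[6], disk_urc)  # word 6: type, level, model
        return ("IOP", w[9][:4], w[9][4:])  # IOP type and URC from word 9

    if code == "27":  # steps 16-19
        disk_urc = w[9][4:]  # rightmost 4 of word 9
        if disk_urc == "0000":
            return ("disk unit", w[5], "3002")  # step 18: exchange FRUs one at a time
        if w[9].startswith("B6") and w[9][4:6] == "51":  # step 19: B6xx 51xx
            return ("B6xx", None, disk_urc)
        return ("disk unit", w[5], disk_urc)

    if code != "62":  # step 20
        return ("reference code", None, w[9])  # use word 9 instead of word 1

    if w[3] == "00010004":  # step 21
        if w[5][4:] != "0000":  # step 22 -> step 23
            return ("disk unit", w[8][:4], w[5][4:])
        return ("disk unit", w[8][:4], "3002")  # step 25
    if w[3] == "0002000D":  # step 24 -> step 25
        return ("disk unit", w[8][:4], "3002")
    return ("reference code", None, w[9])  # step 24, "No" branch


# Hypothetical words: word 2 ends in 13 and word 7 ends in 3002.
words = ["A6000266", "00000013", "00010004", "00000000",
         "00000000", "67180050", "00003002", "00000000", "00000000"]
print(licip13_action(words))  # ('disk unit', '67180050', '3002')
```

The sketch returns the full type/level/model word where the procedure only says "the indicated disk unit type", and the 4 leftmost characters of word 8 where steps 23 and 25 spell that out.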
=============================
BA070001
Explanation
SCSI controller error
Response
1. Verify the SCSI cabling and termination.
2. Retry with minimal bus configuration to isolate the failing device.
3. Replace the controller and/or device.
Problem determination
No action is required.
B2004158
Explanation
A problem occurred during the IPL of a partition.
Response
The partition ID is in extended word one as LP=xxx in decimal format. This error indicates a failure during a search for the load source. It is normal for a number of these failures to occur before a valid load source is found.
Look for SRCs in the Serviceable Event View logged at the time the partition was performing an IPL. If a B2xx3110 error is logged, a B2xx3200 may be posted to the control panel. Work the B2xx3110 error in the Serviceable Event View. If the system IPL hangs at B2xx3200 and you cannot check the SRC history, perform the actions indicated for the B2xx3110 SRC. If there are other SRCs in the Serviceable Event View, then work those errors. If this SRC is posted on the control panel and there are no other SRCs listed in the Serviceable Event View, then perform LICIP15.
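The Response above combines a small parsing step (the partition ID appears as LP=xxx, in decimal, in extended word one) with a triage order. A minimal Python sketch of both, assuming the extended word is available as plain text and the Serviceable Event View entries as a list of 8-character SRC strings. The function names are hypothetical, the sketch assumes B2004158 is the SRC on the control panel, and it skips the case where the IPL hangs at B2xx3200 and the SRC history cannot be checked.

```python
import re


def partition_id(extended_word_one):
    """Return the decimal partition ID from text containing LP=xxx, or None."""
    match = re.search(r"LP=(\d+)", extended_word_one)
    return int(match.group(1)) if match else None


def next_action(logged_srcs):
    """Pick what to work next from SRCs logged in the Serviceable Event View."""
    srcs = [s.upper() for s in logged_srcs]
    # A logged B2xx3110 takes priority (a B2xx3200 may show on the panel).
    if any(len(s) == 8 and s.startswith("B2") and s.endswith("3110") for s in srcs):
        return "work the B2xx3110 error"
    # Several B2004158 entries during the load-source search are normal.
    others = [s for s in srcs if s != "B2004158"]
    if others:
        return "work the other errors: " + ", ".join(others)
    # Only B2004158 entries remain, so fall through to LICIP15.
    return "perform LICIP15"


print(partition_id("LP=003"))                 # 3
print(next_action(["B2004158", "B2004158"]))  # perform LICIP15
```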
Problem determination
No action is required.
Product activity log
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.