I don't recall if you mentioned it, but I seem to recall this is an 8202-E4x machine and we are discussing drives in the system unit. If that's correct then yes, you DO have cache batteries.

CALL QSMBTTCC

The output from that will display any cards with batteries and their lifespans.

If you don't have any then you are good! Get more coffee.
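If batteries do exist, the output looks roughly like this (column headings from memory; they vary a bit by release, so treat this as illustrative only):

 Resource  Frame  Card      Estimated days  Battery pack can
 Name      ID     Position  to error        be safely replaced
 DC01      1      P1-C19    450             No

The numbers you care about are the estimated days remaining and that last column.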

Given that batteries exist (i.e., you are still reading this), when they croak (SRC xxxx8008) IBM i will simply stop using the cache. Any data IN cache is flushed to disk. The system slows down significantly, especially on writes.

When you replace those expired batteries you absolutely MUST verify that the output from QSMBTTCC shows YES under 'CAN SAFELY BE REPLACED'. If it says NO and you decide to replace it anyway, you risk losing your ASP. To force that to YES, you use the 'Work with resources containing cache battery packs' function, which forces the card to flush its cache.
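From memory, the way to get there is roughly this (menu wording and option numbers differ a little between releases, so verify against IBM's battery replacement documentation before you touch anything):

 STRSST
   1. Start a service tool
      7. Hardware service manager
         Work with resources containing cache battery packs
           2=Force battery pack into error state   (against the caching IOA)

Forcing the error state is what makes the adapter flush its write cache and stop caching, and afterwards QSMBTTCC should show YES under 'CAN SAFELY BE REPLACED'.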

As a double check, when you are powered down and the cover is off, there is a tiny LED on the card that blinks if cache data is present. That is your FINAL warning that what you are about to do is 'bad'.

So: confirm YES for 'can safely be replaced', confirm there is no blinking LED, and your battery replacement will be safe and successful.

- Larry

On 2/15/2021 11:44 PM, Thomas Garvey wrote:
We have a full complement of replacement drives now on order.

But, let's talk about the cache batteries, since you brought it up.
When we get the messages that the batteries are nearing end of life, we can still shut it down, take the option to flush the cache so the batteries can be replaced, replace the batteries, then bring it back up again without losing RAID5 protection, right?
It would have to work that way, right? Otherwise RAID5 would always fail. The problem is only if you allow the cache batteries to fail completely before replacement. That's what you're referring to, right?

Best Regards,

Thomas Garvey


On 2/15/2021 5:36 PM, a4g atl wrote:
It's not usual to get failures so quickly unless, as I found, you shut down and IPL some time later. Older disks do not like the change in temperature, and I found that when shutting down systems in winter the drives cool more than expected, and this results in drives failing. Since I started keeping my system running all the time, I have not had issues. Could be I just got lucky.

Another issue with RAID is the life of your CACHE battery. If you lose your
CACHE batteries, the RAID set is usually lost and requires a full
rebuild/restore. If your system is up, there is a procedure to start
recovery before you power down. You must then go into the system and tell
it to turn off RAID and CACHE. I do not remember the exact terminology.
You can then power down, replace the battery and the failed drive, and rebuild your system. Maybe others will chime in on this.

Just my experience.


On Mon, Feb 15, 2021 at 6:21 PM Thomas Garvey <tgarvey@xxxxxxxxxx> wrote:

Thanks, everyone. We had one spare drive and did the replacement. The
advice that replacement was urgent, because another failure would force a
complete rebuild from a system save, was the thing that lit the fire under me.
We are now back to...

Work with Disk Status

Elapsed time:   00:00:00

                     --Protection--
  Unit  ASP  Type  Status      Compression
     1    1  DPY   ACTIVE
     2    1  DPY   ACTIVE
     3    1  DPY   ACTIVE
     4    1  DPY   ACTIVE
     5    1  DPY   ACTIVE
     6    1  DPY   ACTIVE
     7    1  DPY   ACTIVE
     8    1  DPY   ACTIVE

and...

Display Disk Configuration Status

             Serial                     Resource                       Hot Spare
   ASP Unit  Number          Type Model Name       Status              Protection
     1 Unprotected
          1  Y010D30090TW    198C  099  DMP013     RAID 5/Active           N
          2  Y6800TV1N64J    198C  099  DMP020     RAID 5/Active           N
          3  Y010D3008RDW    198C  099  DMP015     RAID 5/Active           N
          4  Y010D300911A    198C  099  DMP005     RAID 5/Active           N
          5  Y010D3008UC7    198C  099  DMP001     RAID 5/Active           N
          6  Y010D3008R7Y    198C  099  DMP011     RAID 5/Active           N
          7  Y010D3008UBG    198C  099  DMP007     RAID 5/Active           N
          8  Y210W7K0JQ4C    198C  099  DMP009     RAID 5/Active           N


Best Regards,

Thomas Garvey
Corporate Scientist
Unbeaten Path International
630-462-3991
www.unpath.com

On 2/15/2021 3:49 PM, Patrik Schindler wrote:
Hello Thomas,

Am 15.02.2021 um 19:31 schrieb Thomas Garvey <tgarvey@xxxxxxxxxx>:

                          Display Device Parity Status

   Parity                      Resource Hot Spare
     Set  ASP Unit  Type Model Name       Status Protection
       1            2BE1  001  DC01       RAID 5                  N
            1    2  198C  099  DMP020 Unprotected
            1    1  198C  099  DMP013 Unprotected
            1    4  198C  099  DMP005 Unprotected
            1    5  198C  099  DMP001 Unprotected
            1    8  198C  099  DMP009 Failed
            1    3  198C  099  DMP015 Unprotected
            1    6  198C  099  DMP011 Unprotected
            1    7  198C  099  DMP007 Unprotected
In addition to other people's valid comments…

I had one occasion some months ago with an 8203-E4A containing five
disks in a RAID5. One was marked as faulty overnight. I know there are
faults other than fatal ones, so out of habit from the PC world I just re-added
that disk (forced a rebuild; I can't recall the precise thing I did in
SST). If it really had a (media-related) problem, the rebuild would have kicked
it out again.
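It was probably something like this, though take the option numbers with a grain of salt since I'm reconstructing them from memory:

 STRSST
   3. Work with disk units
      3. Work with disk unit recovery
         Rebuild disk unit data   (select the suspended unit)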
Rebuild went without any problems. That particular disk hasn't been
conspicuous for months now.
(From the beginning, there has been a solid backup strategy in place for that
machine, involving monthly Save 21s and daily SAVCHGOBJ runs to an
automatically created ISO image that is FTP'd to a backup server afterwards.
The IFS isn't used beyond what the OS itself installed there.)
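Roughly, the daily part looks like the following CL. BACKUP, OPTVRT01 and the paths are just placeholder names for this sketch, and the real job wraps all of this in error handling:

 /* One-time setup: virtual optical device plus an image catalog */
 CRTDEVOPT  DEVD(OPTVRT01) RSRCNAME(*VRT)
 CRTIMGCLG  IMGCLG(BACKUP) DIR('/backup/images')
 ADDIMGCLGE IMGCLG(BACKUP) FROMFILE(*NEW) TOFILE(DAILY) IMGSIZ(10000)

 /* Daily: attach the catalog and save changed objects into it  */
 VRYCFG     CFGOBJ(OPTVRT01) CFGTYPE(*DEV) STATUS(*ON)
 LODIMGCLG  IMGCLG(BACKUP) DEV(OPTVRT01)
 SAVCHGOBJ  OBJ(*ALL) LIB(*ALLUSR) DEV(OPTVRT01)

The image file that ends up under /backup/images is what gets FTP'd to the backup server.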
This outcome matches my decades-old experience with PC servers
(not no-name crap; I'm talking about HPE and IBM, for example) running
Linux. I describe these as SCSI hiccups (1990s, early 2000s) and, nowadays,
SAS hiccups, because nothing actually appears to be broken: the RAID logic
doesn't get an answer from the drive in a timely manner and declares it
faulty. A lot of kernel log entries, but no clear culprit. Happens once
every one or two years per machine, depending on I/O load.
Just saying. Your mileage may vary.

:wq! PoC
