
Thanks, everyone. I will keep you posted as we get a handle on it.

On Fri, Dec 10, 2010 at 11:33 AM, CRPence <CRPbottle@xxxxxxxxx> wrote:

On 12/8/10 11:21 AM, Kirk Goins wrote:
I have a client who initially reported that his 520 at V6R1 was dropping
off the network for 20-40 minutes and then coming back to life as if
nothing had happened. After some questioning, we know that the system
stops responding to pings on the Ethernet ports (both on the MB) AND
even the Twinax console stops responding. So I don't see this as a
network issue. During the problem times, the console shows a sign-on
screen, but entering an ID and password just results in an 'X' until
everything starts working again. The system has auto-tuning on, lots of
memory (I think a full 32GB, at least 16GB), and 65 disk arms. ALL
logging appears to stop; there is nothing in QHST or in the joblogs we
checked. It happens on random days and at random times. There is nothing
in WRKPRB or in SST (PAL or Service Action Logs). I just left a message
to check the VLOGs, so I don't know about them yet.
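Because the drops happen on random days and at random times, it helps to know exactly when each one starts and ends before digging through logs. A minimal sketch, assuming the polling itself (for example, a ping every 60 seconds from a separate machine) is done elsewhere and produces time-ordered (timestamp, reachable) samples; the function name `outage_windows` and the sample format are hypothetical:

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Turn time-ordered (timestamp, reachable) samples into outage windows.

    Each window is (first_failed_poll, first_good_poll_after). Resolution is
    limited to the polling interval. An outage still in progress at the end
    of the data is reported with None as its end.
    """
    windows = []
    start = None
    for ts, reachable in samples:
        if not reachable and start is None:
            start = ts  # first failed poll opens a window
        elif reachable and start is not None:
            windows.append((start, ts))  # first good poll closes it
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


if __name__ == "__main__":
    t0 = datetime(2010, 12, 8, 11, 0)
    # Hypothetical data: one poll per minute, unreachable for 25 minutes.
    samples = [(t0 + timedelta(minutes=m), not (10 <= m < 35)) for m in range(60)]
    for begin, end in outage_windows(samples):
        print(begin, "->", end, "duration:", end - begin)
```

The resulting start/end times are what you would then line up against QHST, joblogs, or performance data collected on the system itself.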

IBM is having the client install iDoctor, but other than that they
haven't found anything yet. I 'feel' that either the box is
thrashing or some very, very low-level task gets into a tight loop.
Has anyone else seen this before? Any thoughts?


A poorly established *MACHINE pool is a possible origin of the
described effect. There have been a number of auto-tuning fixes over
the years; I have no idea if or what fixes exist for V6R1, nor do I
recall search keywords for such fixes, but I would guess any existing
fixes would be included in the HIPER PTF group.

Even without iDoctor, given that "coming back to life" means the
interactive sessions did not terminate, a WRKACTJOB RESET(*YES) display
left active in an interactive job could be reviewed after the "drop":
pressing F5=Refresh shows the CPU %, AuxIO, and other statistics for the
interval since the reset, with CPU % averaged over that elapsed time. I
believe WRKSYSACT with output to an outfile supports automatic refresh,
so it could similarly log/catch a CPU issue, including one caused by LIC
task(s) rather than by a particular job/thread.
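The value of leaving the RESET(*YES) display up is that a job which saturated the processor only during the drop still shows up after F5, just diluted by the rest of the elapsed interval. A rough sketch of that arithmetic, assuming CPU % is CPU time used divided by elapsed time multiplied by the number of processors available to the partition (an assumption about the normalization, not IBM's documented formula); the function name is hypothetical:

```python
def averaged_cpu_pct(cpu_seconds, elapsed_seconds, processors=1):
    """Average CPU utilization over an interval, as a percentage of total
    processor capacity (assumed normalization: elapsed time x processors)."""
    if elapsed_seconds <= 0 or processors <= 0:
        raise ValueError("elapsed_seconds and processors must be positive")
    return 100.0 * cpu_seconds / (elapsed_seconds * processors)


if __name__ == "__main__":
    # Hypothetical case: one job burns a full processor for a 30-minute drop,
    # and the display is refreshed 2 hours after the reset on a 2-way system.
    drop_seconds = 30 * 60
    elapsed_seconds = 2 * 60 * 60
    print(f"{averaged_cpu_pct(drop_seconds, elapsed_seconds, processors=2):.1f}%")
```

Under these assumptions the job would show about 12.5%, which should stand out if that job (or LIC task, in the WRKSYSACT case) is normally near zero.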

Since one impact described is on interactive sign-on [and since you did
not note which joblogs were reviewed], reviewing the subsystem monitor
joblogs and the QSYSARB and QCMNARB joblogs seems appropriate, although
those probably should be notifying the history anyway. Note also that
reviewing the history logged *after* the incident seems to have resolved
is possibly worthwhile; e.g. CPF3100 messages issued some time after the
apparent recovery could be logged to QHST as an indication of work that
might have transpired over the several prior minutes. Alternatively, the
history might merely be delayed as a side effect of a problem with the
SCPF job, which would process the history by copying data from the QHST
*MSGQ into the QHST##### files.
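To act on the suggestion of checking history logged *after* the apparent recovery, pull everything logged from the start of the outage through some grace period past its end, rather than only the outage minutes themselves. A minimal sketch, assuming the history has already been extracted (for example from DSPLOG output or the QHST files) into (timestamp, message ID, text) tuples; the tuple layout, the 30-minute default grace period, and the function name are hypothetical:

```python
from datetime import datetime, timedelta


def history_near_outage(entries, outage_start, outage_end,
                        grace=timedelta(minutes=30)):
    """Return history entries logged between outage_start and
    outage_end + grace, sorted by timestamp.

    The grace period catches messages that only reach the log some time
    after the system appears to have recovered (e.g. if history processing
    was itself delayed). Each entry is a (timestamp, msg_id, text) tuple.
    """
    window_end = outage_end + grace
    hits = [e for e in entries if outage_start <= e[0] <= window_end]
    return sorted(hits, key=lambda e: e[0])


if __name__ == "__main__":
    t = datetime(2010, 12, 8, 11, 0)
    # Hypothetical entries; message IDs are placeholders, not real IBM i IDs.
    entries = [
        (t - timedelta(minutes=5), "MSG0001", "before the outage"),
        (t + timedelta(minutes=50), "MSG0003", "logged late, after recovery"),
        (t + timedelta(minutes=10), "MSG0002", "during the outage"),
        (t + timedelta(hours=2), "MSG0004", "unrelated, much later"),
    ]
    for ts, msg_id, text in history_near_outage(entries, t, t + timedelta(minutes=25)):
        print(ts, msg_id, text)
```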

Regards, Chuck
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/midrange-l.

This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.