Looks like I found the issue. The output file which receives a high volume of records in our "problem" job had a field defined as VARLEN(16), but 90% of the calls had a value that exceeded 16 bytes. This was a new field for this client and the size was determined based on all other clients (before we saw the new client's data). Once I discovered that, I upped the VARLEN to 25 (based on new client's data) and suddenly the job screams in speed. I knew a misuse of VARLEN would cause issues, but didn't realize it would be so bad.
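
The sizing check described above can be sketched roughly as follows. This is only an illustration in Python, assuming a sample of the field's values has been exported as plain strings; the function names, the 95% coverage target, and the sample data are hypothetical and do not come from the actual job or file.

```python
import math

def overflow_fraction(values, allocated_len):
    """Fraction of values whose byte length exceeds the allocated length."""
    if not values:
        return 0.0
    over = sum(1 for v in values if len(v.encode("utf-8")) > allocated_len)
    return over / len(values)

def suggest_allocated_len(values, coverage=0.95):
    """Smallest length that holds at least `coverage` of the sample values."""
    lengths = sorted(len(v.encode("utf-8")) for v in values)
    if not lengths:
        return 0
    # Index of the value at the requested coverage percentile.
    idx = max(0, math.ceil(coverage * len(lengths)) - 1)
    return lengths[idx]

# Hypothetical sample: 9 of 10 values are longer than 16 bytes.
sample = ["A" * 12, "B" * 20, "C" * 22, "D" * 25, "E" * 18,
          "F" * 21, "G" * 24, "H" * 19, "I" * 23, "J" * 20]

print(overflow_fraction(sample, 16))   # share of values that do not fit in 16 bytes
print(suggest_allocated_len(sample))   # candidate allocated length for this sample
```

Running a check like this against the new client's data before settling on the allocated length would have shown that most values spill past 16 bytes.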

Now, I had mentioned that a completely unrelated job took 24 hours to run (4-5x normal). That still boggles me because that job was not exposed to any of the changes made - which is what made me think there was a system issue.

We did check the Cache Battery, which at least showed us that we are 100 days away from a warning, so I really appreciate that tidbit. I've also been exposed to some system monitoring tools.

Thank you everyone for responding with ideas and things to check into.

-Kurt

-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Kurt Anderson
Sent: Thursday, June 10, 2010 2:57 PM
To: 'Midrange Systems Technical Discussion'
Subject: RE: System slow-down - disk usage?

I knew I forgot something.

Main Storage: 3885.01MB

QPFRADJ = 2

In WrkDskSts I see Active for all units.


-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of DrFranken
Sent: Thursday, June 10, 2010 12:09 PM
To: midrange-l@xxxxxxxxxxxx
Subject: Re: System slow-down - disk usage?

OK Cleaning up never hurts and often helps so that much is good.

Your disk configuration is a bit odd in that the first two units are
mirrored to each other while the rest are RAID protected. Not
unsupported or anything like that but you do lose about 35GB of
available storage this way and the RAID is on only 4 drives rather than
across 8 drives (all drives ARE still protected). In any case while 'odd'
it shouldn't be the source of your problem.

I forgot to mention that when looking at %busy on the drives or at the
paging/faulting ratios you need to wait until the elapsed time at the
top is about 5 minutes. Much shorter than that and you get 'spikey' data
that isn't real valuable. Much longer than that and all your data
averages into useless mush. It's not bad to know that your average
%busy for the entire morning was say 12% but it doesn't help you
troubleshoot much.

On the WRKDSKSTS screen after you pressed F11 did you see 'Degraded' or
'Unprotected'? That would indicate battery failure/drive failure.
'Active' is what you want to see.

One disk is one arm so you're correct there. How much memory is in the
machine? (Easy find is at top of WRKSHRPOOL screen.)

Definitely watch the paging/faulting and %BUSY numbers while the long
slow job is running.

Also what is your system value QPFRADJ set at?

- DrFranken

After I sent out my earlier email, we buckled down and cleaned up a lot of excess on the system, essentially gaining back 10% of the disk, which put us back to where we started. We had another job running, although this time with less to process, but it did take significantly longer than expected.

I found an article on the Cache Battery and have passed it along to my boss.
http://www.itjungle.com/fhg/fhg050907-story03.html

I checked our paging to fault ratio, and it seems decent. Our "hog" jobs aren't running right now, but I looked at wrksyssts enough while they were running to recall that the ratio was around 1 fault per 50 pages.

Using WRKDSKSTS, we have 7 units. Is each unit considered an arm? (I guess, 1 arm per disk? I'm a software guy doing his best to understand the system side here.) 1 & 2 are Protection Type MRR, 3-7 are DPY. All are Active. I'll have to start up some tests to take a look at the Busy %. At the moment the Busy % is in the low teens or lower.

In regard to our system, it's a 520, 9405, P10.

Thanks for the help,
Kurt

-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of DrFranken
Sent: Wednesday, June 09, 2010 11:45 PM
To: midrange-l@xxxxxxxxxxxx
Subject: Re: System slow-down - disk usage?

Absolutely check Richard's suggestion on the Cache Battery. If any of
them are dead this sort of performance WILL result during significant I/O.

Going from 60 to 70% DASD should not cause this dramatic slow down. It
may cause some small fraction but nothing like 4 times plus.

What is the %BUSY in WRKDSKSTS when the long running job is running? If
the disks are 40% or more busy then you likely need more arms, faster
arms, or bigger disk cache but even then that's only a 'probable'. Also
how many disk arms do you currently have? Are they DPY or MRR protected?
(From WRKDSKSTS F11.)
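
The %BUSY check above can be sketched as a small filter. This is a minimal illustration in Python, assuming the per-unit %BUSY figures have been copied by hand from a WRKDSKSTS screen into a dictionary; the function name and the sample numbers are hypothetical, and the 40% threshold is only the rule of thumb from this message, not a hard limit.

```python
def busy_units(busy_by_unit, threshold=40.0):
    """Return the unit numbers whose %BUSY is at or above the threshold."""
    return sorted(unit for unit, busy in busy_by_unit.items() if busy >= threshold)

# Hypothetical readings for 7 disk units, keyed by unit number.
readings = {1: 12.0, 2: 11.5, 3: 44.0, 4: 39.9, 5: 52.3, 6: 40.0, 7: 8.0}

print(busy_units(readings))  # units worth a closer look while the slow job runs
```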

You also need to check faulting as you mention. The big thing about
paging and faulting is the ratios. If the pool running the jobs is
paging at say 2500 but faulting at 25 then you're doing exceptionally
well as only 1 in 100 pages results in a fault. If you've got 500 faults
out of 500 pages then you likely have a memory pool that is far too
small or has too many jobs running in it. The reason you don't find
specific numbers is because 'It Depends'. If you have a 32 way 595 you
can have faulting numbers that would make a 520 user cry and not bat an
eye.
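
The ratio described above is simple arithmetic, sketched here in Python for illustration. The function name is hypothetical, and the inputs are assumed to be the page and fault figures read off the screen for one pool. No cut-off values are built in, since what counts as acceptable depends on the machine.

```python
def pages_per_fault(pages, faults):
    """Number of pages moved for every fault in a memory pool."""
    if faults == 0:
        return float("inf")  # no faults at all in the sampled interval
    return pages / faults

# The two examples from this message.
print(pages_per_fault(2500, 25))   # 1 fault per 100 pages: doing very well
print(pages_per_fault(500, 500))   # 1 fault per page: pool likely too small or too crowded
```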

What is your system CPW (or processor feature code) and how much memory
is installed?

- DrFranken

On 6/9/2010 6:35 PM, Kurt Anderson wrote:

I'm on v5r4, and we've recently gotten a very large customer and have had some speed issues. At first we thought they were specific to certain new programs, but today we discovered the issue was impacting another job that was completely and absolutely isolated as far as programs go. So, we were looking at things from a system point of view to see what changed to cause this other job to slow down so much. Our guess is that our % system ASP used went from ~60% to ~70%. Is it possible that would cause us an issue? (We had a job that would normally run 5 hours take almost 24 hours.)

We IPL'd over the weekend as well. Anyway, I realize this email is probably lacking a lot of specific information, but I'm not really a systems guy, and we're kind of grasping at straws, so I thought I'd see if such a change to disk % used should have such a big impact?

I am looking into other performance improving methods, but at this time we'd really like to pin down the cause of our performance crawl before attempting to put in enhancements.

While I'm at it, I'm curious how to quantify "excessive paging." I've seen reference to that phrase online, yet can't seem to find a number.

Thanks,

Kurt Anderson
Sr. Programmer/Analyst
CustomCall Data Systems




