Andy -- I am going to respond to your questions in installments, because I am multi-tasking and because the questions are complex.

Part 1
------

Your understanding of the impact of extra disk activity is correct, right up until the queueing effects are factored in. If each of the users is pressing the Enter key at regular intervals, say twice per second, and all of their transactions are uniformly small (requiring only a few disk accesses), and there are enough disk arms to keep up, then an extra disk access or two carries only a small additional price. But if the transactions are a little "lumpy", or someone runs an interactive query and all of the disk arms are busy right now gathering up a bunch of stuff, then that one extra disk access can take a long time.

Depending on the granularity of the metrics you are using, the cost of this "lumpiness" may be hard to visualize. Averages are tricky. How many families do you know that have one and a half children?

The example system I showed has a performance problem. Adding CPU will not help. Adding memory will not help. Adding disk arms _WILL_ help. It isn't easy to prove that, but I have convinced myself with system after system. If the cost of adding disk arms is too high, the other approach that will work is to somehow reduce the quantity of disk activity.

Maybe 10 milliseconds doesn't sound like much, but with queueing it might be 40 or 50 milliseconds for each access. And with 32,000 unnecessary disk faults occurring in a 5 minute interval, there is a chance that any one transaction might be delayed by seconds or even minutes. Not all transactions will be impacted equally. 50 milliseconds is 1/20th of a second, so 20 unnecessary disk accesses that arrive before you at the disk arm that has what you need might delay your transaction by one second; 60 of them might cause a 3 second delay. The resulting response time might be aggravating to the user. And the batch job that normally processes 10 million chunks per hour may slow down... (A rough sketch of this queueing arithmetic is in the P.S. at the bottom of this message.)

More later,

-- Charly

>From: "Andy Nolen-Parkhouse" <aparkhouse@attbi.com>
>Date: Fri, 12 Jul 2002 05:22:07 -0400
>
>Charly,
>
>No, I'm afraid I don't understand. I understand the impact of disk arms
>on overall performance. I understand the impact of paging/faulting on
>overall performance. I do not understand the impact of too few disk
>arms on faulting.
>
>If you have 17 disk arms servicing thousands of interactive users, I can
>appreciate that they could be overburdened. If this is the case, then
>your performance reports or WRKDSKSTS display should indicate a level of
>activity which would justify purchasing additional arms.
>
>So while I can see the effect that faulting would have on disk activity,
>I don't see the effect of disk activity on faulting. Other than
>tinkering with expert cache, adjusting your workload, or changing your
>activity levels, what can you do about faulting/paging other than
>increase memory?
>
>If your disk activity is within acceptable limits, then the extra disk
>accesses resulting from faulting/paging will increase the response time
>for some users by the duration of those accesses. If I interpret your
>status display correctly, this is less than one fault per interactive
>transaction. That one fault could add about 10 milliseconds to the
>overall response time of the transaction. This doesn't strike me as
>extreme.
>
>I don't have the answers, but I was responding to your paragraph below,
>which seemed to imply that a shortage of disk arms leads to faulting:
>
>"Most systems I have seen recently have lots and lots of memory and it
>is being mostly wasted. I can tell because they have an automatic tuner
>moving memory around like crazy - the faulting is still high - the
>bottleneck is usually the disk resources (don't get me started on that
>topic) - the CPU is not being fully utilized - and the solution to any
>performance problem is to buy more CPU or more memory."
>
>I do not see that adding more disk arms to the system you describe would
>significantly lessen the level of paging/faulting. Nor do I think that
>the term 'thrashing' is appropriate for a system with non-database
>faults of 109/second. Thrashing usually describes a system which is
>spending more processing power moving memory than performing work;
>that doesn't apply in your situation.
>
>Regards,
>Andy Nolen-Parkhouse

"Nothing would please me more than being able to hire ten programmers
and deluge the hobby market with good software." - Bill Gates in 1976
"We are still waiting..." - Alan Cox in 2002
"Linux is only free if your time is worthless."

Charly Jones
253 265-6244
Gig Harbor Washington USA
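P.S. For anyone who wants to see where numbers like 40 or 50 milliseconds come from, here is a rough back-of-the-envelope sketch. It uses the simplest textbook queueing model (M/M/1, where average response time is service time divided by one minus utilization); the model and the 10 millisecond access time are illustrative assumptions, not measurements from the system above:

    # Rough M/M/1 sketch of disk queueing delay.
    # Illustrative assumptions only: the 10 ms access time and the
    # M/M/1 model itself are not measurements of any real system.

    SERVICE_MS = 10.0  # assumed disk access time with no queueing

    def response_ms(utilization):
        # M/M/1 average response time = service / (1 - utilization)
        return SERVICE_MS / (1.0 - utilization)

    for busy_pct in (10, 50, 75, 80, 90):
        r = response_ms(busy_pct / 100.0)
        print("arms %2d%% busy: ~%5.1f ms per access; "
              "20 extra faults add ~%.1f s, 60 add ~%.1f s"
              % (busy_pct, r, 20 * r / 1000, 60 * r / 1000))

By 75 to 80 percent busy, the 10 millisecond access has already stretched to 40-50 milliseconds, and 60 extra faults queued ahead of you become the 3 second delay I described above.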