Subject: Re: Recommended CPU % utilized

> With all due respect, what you are going to do is to test experimentally a
> well-known result of queueing theory:
>
>    T = S * QM, where QM = 1/(1-U)
>
> T is the elapsed time to process the request (response time), S is the
> service time - the CPU time needed to process the request.
> QM is the Queueing Multiplier, U is the CPU utilization by processes with
> equal or higher priority.
>
> This formula is somewhat more complicated for more than one processor.
> I would expect that the results of your measurements will plot nicely onto
> the curve described by the formula above.

Since there seems to be some interest in this, I'll chip in with something I
wrote for a project once:

System Configuration Considerations Using Simple Queuing Theory
---------------------------------------------------------------

We consider a system receiving requests, servicing them, and consequently
producing responses. The treatment is based on the assumption that the
events causing requests to the system occur at random. The probability of
any particular event happening is generally low and uniform, i.e. the
probability of it happening in a given unit of time is the same as the
probability of it happening in any other unit of time. This situation is
typical of most (but not all) real-time systems.

The number of arrivals of events during a given time interval can then be
described by the Poisson distribution:

   P(n) = (exp(-<n>) * <n>**n) / n!

where P(n) denotes the probability of having n events per unit time if the
average number of events per unit time is <n>. [n! = n(n-1)(n-2)...1]

In the following we shall use angle brackets <x> to denote the average of a
variable x. If we have N single values x1, x2, ..., xN, the average value is
defined as

   <x> = (x1 + x2 + ... + xN) / N

Standard queuing theory shows that if the probability of one particular
event occurring in a given time interval is constant (uniform distribution),
then the probability of having n such events occur in a given time interval
follows the Poisson distribution. The probability that the time between
events is less than a certain value T then follows an exponential
distribution:

   P(t<T) = 1 - exp(-T/<t>)

where <t> is the mean interarrival time.

For illustration, consider a system where transactions occur at the average
rate of 2 per second. The probability of having n arrivals per second is
thus P(n) = (exp(-2) * 2**n) / n!, since <n> = 2 per second. Using this
formula for n from 0 through 9 yields:

   n    P(n)
   0   0.1353        It is possible, but very unlikely,
   1   0.2707        to get more than 9 arrivals per
   2   0.2707        second. On the other hand, there
   3   0.1804        is a 14.3% chance that the rate
   4   0.0902  --.   will reach or exceed twice the
   5   0.0361    |   average rate. That is, rather
   6   0.0120    |   large fluctuations can occur.
   7   0.0034    |   (The marked entries sum to 0.143.)
   8   0.0009    |   The degree of these fluctuations
   9   0.0002  --'   decreases with increasing <n>, as
                     shown in the following table.

   <n>   P(n>=2*<n>)
    1      0.265     This table shows the probability that the
    2      0.143     number of arrivals reaches or exceeds twice
    3      0.085     its average value, for several different
    4      0.051     mean values <n>.
    5      0.030
    6      0.019
    7      0.012

Most systems of the type we are considering can be regarded as a facility
with a certain utilization factor that we shall generally denote by `u'. If
the system can handle five transactions per unit time but during a certain
interval handled only three transactions per unit time, it was only
3/5 = 0.6 or 60% utilized.
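Before moving on to utilization, a quick cross-check: the two arrival-rate
tables above are easy to reproduce (up to rounding in the original tables)
with a few lines of Python. This is my own illustration, not part of the
original write-up:

   from math import exp, factorial

   def poisson(n, mean):
       """P(n) = exp(-<n>) * <n>**n / n!"""
       return exp(-mean) * mean**n / factorial(n)

   def tail(x, mean):
       """P(n >= x) = 1 - P(n <= x-1)."""
       return 1 - sum(poisson(n, mean) for n in range(x))

   # P(n) for <n> = 2 arrivals per second (first table)
   for n in range(10):
       print(n, round(poisson(n, 2), 4))

   # P(n >= 2*<n>) for <n> = 1..7 (second table)
   for mean in range(1, 8):
       print(mean, round(tail(2 * mean, mean), 3))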
If the facility utilization is low, system response is swift and there will
be little or no queuing. The greater the utilization, the greater the delays
eventually become and the longer the queues that build up. As we have just
seen, the utilization factor can fluctuate markedly simply because the
number of events per unit time (e.g. per second) fluctuates. If a system can
handle four events per second and the transaction intensity is two events
per second, the system will be overloaded 14.3% of the time. The overload
situation will relax shortly, though, because of the inherent system
capacity.

Experience has shown (and it has become a good engineering rule of thumb)
that being overloaded less than 10% of the time is generally safe and can be
absorbed by the system without noticeable penalties. In other words: the
system must have such a capacity that the traffic does not exceed the
capacity 90% of the time.

The quantity of interest here is the probability of having X or more
arrivals in a time interval for which the mean arrival rate is <n>:

   P(n>=X) = sum((exp(-<n>) * <n>**n) / n!) from n = X to infinity

A sound design goal is then to ensure that

   P(n>=X) < 0.10

i.e. to configure the system to handle a number X of arrivals per second,
given the average arrival rate <n>, so that the above inequality holds. The
following table gives X and the utilization u as functions of <n>:

   <n>      X     u=<n>/X
      1    2.8     0.36     For large <n>, X approaches the value
      2    4.4     0.45
      3    5.8     0.52        X -> <n> + 1.287 sqrt(<n>)
      4    7.2     0.56
      5    8.5     0.59     In fact, u -> 1 as <n> -> infinity.
      6    9.8     0.61     But for most values found in practice,
      7   11.0     0.64     X > <n> by a significant amount. This
      8   12.2     0.66     simple fact is often difficult to
      9   13.5     0.67     accept. As an example, if <n> = 10
     10   14.7     0.68     transactions per second, we need a
     20   26.3     0.76     capacity of X = 14.7, i.e. 47% larger,
    100  113.0     0.88     in order to cope smoothly with the
   1000 1041.0     0.96     load.

At first sight there seems to be a paradox in the above considerations. If
<n> = 10 per second, we need to have a capacity of X = 14.7 (47% more), but
if my unit of time is changed to a "dec", which is 10 seconds (just chosen
arbitrarily), then the very same real traffic volume would be <n> = 100
(namely 10*10), and according to the table we now need X = 113, or only 13%
overcapacity rather than 47%. It seems that simply by varying the unit of
time over which I arbitrarily measure the traffic intensity, I require
different amounts of capacity to handle what is really the same number of
events. How can that be?

The solution lies in the realization that the design criterion we used was
to configure the system so that delays would not become excessive. Now,
delays are measured in the same units of time as <n>, and increasing the
length of the time interval over which we measure <n> amounts to increasing
the delay we will tolerate as well. The implicit assumption behind the
considerations about facility overloading is in fact that we do not tolerate
delays much longer than the servicing time (which is 1/X). You do not want
to wait in line for an hour for a transaction that only takes a minute;
however, waiting a few minutes would be quite acceptable.
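To make the sizing rule concrete, here is a small Python sketch (again my
own, not from the original) that finds the smallest whole-number capacity X
satisfying P(n>=X) < 0.10. Since it works in whole transactions, it returns
the ceiling of the fractional X values interpolated in the table:

   from math import exp, factorial

   def smallest_capacity(mean, p=0.10):
       """Smallest integer X such that P(n >= X) < p for Poisson
       arrivals with the given mean rate."""
       cdf, x = 0.0, 0
       while True:
           cdf += exp(-mean) * mean**x / factorial(x)   # add P(n = x)
           x += 1
           if 1 - cdf < p:        # P(n >= x) = 1 - P(n <= x-1)
               return x

   for mean in (1, 2, 3, 4, 5, 10, 20):
       x = smallest_capacity(mean)
       print(mean, x, round(mean / x, 2))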
Let us now consider a system having a single server with a queue of items
awaiting service. Let ts be the service time for one item and sd be the
standard deviation ("spread") of all the service times. Further, let w be
the number of items waiting in the queue at a given time and let q be the
total number of items in the system (either waiting or being served). Let tw
be the time an item waits before being served and let tq be the total time
the item spends in the system. Then for each individual event we have

   tq = tw + ts

and thus also for the mean values: <tq> = <tw> + <ts>. On the average, for a
steady-state condition, we must have:

   <w> = <n> <tw>   and   <q> = <n> <tq>

If we serve <n> events in one second (in a steady state the number of
transactions served equals the number of arrivals, on the average) and the
mean service time is <ts>, the total time of a second used in servicing is
<n> <ts>, which is precisely the fraction of time the facility is being
used; hence the utilization factor becomes just that:

   u = <n> <ts>

Combining the above relations, we see that:

   <q> = <n> <tw> + <n> <ts> = <w> + u

The basic result of single-server queuing theory was developed by Khintchine
and Pollaczek and can be expressed:

   <w> = (u**2 / (2*(1-u))) * (1 + 1/Rs)

where the parameter Rs is given by

   Rs = (<ts>/sd)**2

When the service times are exponentially distributed, the standard deviation
sd is equal to the mean: sd = <ts> and Rs = 1; if the service times were
constant (sd = 0), we would get Rs = infinity. In practice, the assumption
of exponential service times is often a good one and we shall use Rs = 1.
Therefore, the mean number of items waiting is:

   <w> = u**2 / (1-u)

and then

   <q> = <w> + u = u**2 / (1-u) + u = u / (1-u)

We shall comment upon the relation <w> = <n> <tw>. In a steady state as many
items leave the queue as enter it from the other end. That number is <n>. If
the queue contains <w> items and <n> leave per second, it will take <w>/<n>
seconds to empty the queue, and the latest arrival would then spend that
many seconds in the queue; hence the time an item waits before being served
is <tw> = <w> / <n>, from which we get <w> = <n> <tw>.

We can now write <tw> = <w> / <n> = (<w> / u) * <ts>, since u = <n> <ts>,
and thus <tq> = <tw> + <ts> = (<w>/u + 1) <ts>. Using <w> = u**2/(1-u), we
finally get:

   <tq> = (1 + u/(1-u)) <ts> = (1 + <q>) <ts>

We have thus expressed the queuing time <tq> in terms of the service time
<ts> and the utilization u. The following table shows this relationship
between the load factor u, the number of items in the system <q>, and the
response time <tq> in terms of the mean service time <ts>:

   u     <q>     <tq>
   0.2    0.25    1.25 <ts>
   0.4    0.67    1.67
   0.6    1.50    2.50       It is clear that when the utilization
   0.7    2.33    3.33       increases beyond 75%, the queues grow
   0.8    4.00    5.00       alarmingly and the system is hardly
   0.85   5.67    6.67       usable at all.
   0.90   9.00   10.00
   0.95  19.00   20.00
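The single-server formulas are compact enough to tabulate directly. A
minimal Python sketch of my own, assuming exponential service times
(Rs = 1), reproduces the table above:

   def single_server(u, ts=1.0):
       """Single-server queue with Rs = 1 (exponential service).
       Returns mean items in system <q> and mean response time <tq>."""
       assert 0 <= u < 1, "a steady state requires u < 1"
       q = u / (1 - u)            # <q> = u/(1-u)
       tq = (1 + q) * ts          # <tq> = (1 + <q>) <ts>
       return q, tq

   for u in (0.2, 0.4, 0.6, 0.7, 0.8, 0.85, 0.90, 0.95):
       q, tq = single_server(u)
       print(u, round(q, 2), round(tq, 2))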
When we have a stream of transactions that arrive at random, the times
between arrivals are exponentially distributed. But suppose that there is a
two-way switch at the arrival point. The switch operates so as to send
alternate transactions down alternate output "lines" for service. The result
is a more uniform distribution of interarrival times. The distributions that
arise in this way were studied by the Danish mathematician A.K. Erlang and
are called Erlang distributions. (Erlang worked for the Copenhagen Telephone
Company, which explains the terminology of traffic, switches, and lines.)
The exponential distribution is an Erlang-1 distribution, while the two-way
switch generates Erlang-2 distributions. If we had switched the transactions
down a very large number of paths - instead of just the two above - the
resulting interarrival times would become very nearly constant. An
Erlang-infinity distribution of interarrival times means that these are
constant.

The converse is also true: if we concentrate onto one line the traffic from
many separate lines, the spread in interarrival times will increase, leading
to more queuing, as there will be bursts of events piling up as well as
intervals without traffic, where the server is idle and the facility is
underused.

Switching transactions down several output lines simply means having several
servers handle them. Let us assume there are M identical servers. The mean
number of items arriving per unit time is <n> as before. Thus <n>/M items go
to each server. Again, let the mean service time be <ts>. The facility
utilization therefore becomes:

   u = <n> <ts> / M

As before, <q> = <n> <tw> + <n> <ts>, and so:

   <q> = <w> + M*u

It can be shown that the probability that ALL servers are busy at a given
time is

   B = P(q>=M) = (1 - Z(M,u)) / (1 - u*Z(M,u))

where

             sum ((M*u)**i / i!) from i = 0 to M-1
   Z(M,u) = ---------------------------------------
             sum ((M*u)**i / i!) from i = 0 to M

For M = 1 the expression for B reduces to B = u. The table below explores B
as a function of u and M:

   u     M=1    M=2    M=4    M=8    M=16   M=50   M=100
   0.1   0.100  0.018  0.001  0.000  0.000  0.000  0.000
   0.2   0.200  0.067  0.010  0.000  0.000  0.000  0.000
   0.3   0.300  0.138  0.037  0.004  0.000  0.000  0.000
   0.4   0.400  0.229  0.091  0.018  0.001  0.000  0.000
   0.5   0.500  0.333  0.174  0.059  0.009  0.000  0.000
   0.6   0.600  0.450  0.287  0.140  0.042  0.001  0.000
   0.7   0.700  0.576  0.429  0.271  0.130  0.011  0.000
   0.8   0.800  0.711  0.596  0.458  0.305  0.087  0.020
   0.9   0.900  0.853  0.788  0.702  0.591  0.364  0.217
   0.94  0.940  0.911  0.870  0.815  0.740  0.568  0.434

The mean number of items in the queue can be shown to be

   <w> = B*u / (1-u)

hence

   <q> = <w> + M*u = (B/(1-u) + M)*u

and the response time becomes

   <tq> = <q> / <n> = (<q> <ts>) / (<n> <ts>) = <q> <ts> / (M*u)

so that, finally:

   <tq> = (1 + B/(M*(1-u))) <ts>

For M = 2 we have (exactly):

   <tq> = <ts> / (1-u**2)   and   B = 2*u**2 / (1+u)

The following simple formula is a good approximation for all M (exact for
M <= 2):

   <tq> = <ts> / (1-u**M)

We shall contrast the response times (in terms of the mean service time
<ts>) for various values of M and note the significant improvements when M
grows larger:

   u      <tq> M=1   <tq> M=2   <tq> M=4   <tq> M=16
   0.2      1.250      1.042      1.003      1.000
   0.4      1.667      1.190      1.038      1.000
   0.6      2.500      1.562      1.179      1.007
   0.7      3.333      1.961      1.357      1.027
   0.76     4.167      2.367      1.548      1.058
   0.80     5.000      2.778      1.746      1.095
   0.84     6.250      3.397      2.047      1.158
   0.88     8.333      4.433      2.558      1.273
   0.90    10.000      5.263      2.969      1.370
   0.92    12.500      6.510      3.589      1.518
   0.94    16.667      8.591      4.626      1.771
   0.96    25.000     12.755      6.705      2.284
   0.98    50.000     25.253     12.950      3.389

It is important not to confuse the true multi-server model with one where
arrivals are simply distributed cyclically to each server in turn. That sort
of model is really just M single-server queues in parallel and does not give
the improvement in response time to the same degree as the true multi-server
situation with only one queue.
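For readers who want to try other values of M and u, here is a short Python
sketch of my own that follows the formulas above for B and <tq>:

   from math import factorial

   def all_busy_probability(m, u):
       """B = (1 - Z)/(1 - u*Z), with Z = S(M-1)/S(M) and
       S(k) = sum of (M*u)**i / i! for i = 0..k."""
       a = m * u
       s_m1 = sum(a**i / factorial(i) for i in range(m))   # S(M-1)
       s_m = s_m1 + a**m / factorial(m)                    # S(M)
       z = s_m1 / s_m
       return (1 - z) / (1 - u * z)

   def response_time(m, u, ts=1.0):
       """<tq> = (1 + B/(M*(1-u))) <ts>."""
       return (1 + all_busy_probability(m, u) / (m * (1 - u))) * ts

   for m in (1, 2, 4, 16):
       print(m, round(response_time(m, 0.8), 3))   # the u = 0.80 row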
Since it is important, then, to have only one queue, we may ask: how long
could that queue become? The average number of items in the system is not
the measure we want; rather it is <w>, the number of items in the queue.
Using <w> = B*u/(1-u) we get:

   u     <w> M=1   <w> M=2   <w> M=4   <w> M=16
   0.6      0.9       0.7       0.4       0.1
   0.7      1.6       1.3       1.0       0.3
   0.8      3.2       2.8       2.4       1.2
   0.9      8.1       7.7       7.1       5.3
   0.94    14.7      14.3      13.6      11.6

For heavy load, <w> is not very sensitive to the number of servers. It is
maybe at first sight a bit surprising that the queue length does not depend
on the arrival rate <n>, but only on the facility utilization u.

How should the queue length be configured? Characteristic for this type of
queue are its wide fluctuations: the standard deviation of the queue length
is large. Queuing theory yields the result that the standard deviation of
the queue length is given by

   sd(w) = sqrt(B*u*(1 + u - B*u)) / (1-u)

while the mean queue length itself is

   <w> = B*u / (1-u)

Under heavy load, B and u are both close to one, so

   <w> = 1/(1-u)   and   sd(w) = 1/(1-u) = <w>

i.e. the standard deviation is of the same size as the mean queue length
itself. The probability that a value is more than three standard deviations
removed from the mean is very low (about 0.001), so the maximum queue length
could be set to

   w(max) = <w> + 3 * sd(w) = 4 * <w>

You could add an extra <w> for added margin and use w(max) = 5 * <w> with
confidence.
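As a closing illustration, the sizing rule translates to a hypothetical
helper like the following (my own sketch; for a single server B = u, and for
M servers you would pass in B from the Erlang formula above):

   from math import sqrt, ceil

   def max_queue_length(u, b=None):
       """w(max) = <w> + 3*sd(w), rounded up, using
       <w> = B*u/(1-u) and sd(w) = sqrt(B*u*(1 + u - B*u))/(1-u)."""
       if b is None:
           b = u                  # single server: B = u
       w = b * u / (1 - u)
       sd = sqrt(b * u * (1 + u - b * u)) / (1 - u)
       return ceil(w + 3 * sd)

   print(max_queue_length(0.9))   # 37; the heavy-load 4*<w> rule gives ~32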