|
Subject: Re: Recommended CPU % utilized > With all due respect what you are going to do is to test experimentally a > well-known result of a queueing theory: > > T = S * QM, where QM = 1/(1-U) > > T is elapsed time to process the request (response time), S is service time - > CPU time needed to process the request. > QM is Queueing Multiplier, U is CPU utilization by processes with equal or > higher priority. > > This formula is somewhat more complicated for more than one processor. > I would expect that results of your measurements will plot nicely the curve > described by formula above. Since there seems to be some interest in this, I'll chip in with something I wrote for a project once: System Configuration Considerations Using Simple Queuing Theory --------------------------------------------------------------- We consider a system receiving requests, servicing them, and consequently producing responses. The treatment is based on the assumption that the events causing requests to the system occur at random. The probability of any particular event happening is generally low and uniform, i.e. the probability of it happening in a given unit of time is the same as the probability of it happening in any other unit of time. This situation is typical of most (but not all) real-time systems. The number of arrivals of events during a given time interval can the be described by the Poisson distribution: P(n) = (exp(-<n>) * <n>**n) / n! Where P(n) denotes the probability of having n events per unit time if the average number of events per unit time is <n>. [n!=n(n-1)(n-2).1] In the following we shall use angle brackets <x> to denote the average of a variable x. If we have N single values x1, x2, ., xN, the average value is defined as <x> = (x1 + x2 + . + xN) / N Standard queuing theory shows that if the probability of one particular event occurring in a given time interval is constant (uniform distribution) then the probability of having n such events occur in a given time interval follows the Poisson distribution. The probability that times between events are less than a certain value then follows an exponential distribution: P(t<T) = 1 - exp(-T/<t>) where <t> is the mean interarrival time. For illustration, consider a system where transactions occur at the average rate of 2 per second. The probability of having n arrivals per second is thus P(n) = (exp(-2) * 2**n) / n!, since <n> = 2 per second. Using this formula for n from 0 through 9 yields: n P(n) It is possible, but very unlikely, 0 0.1353 to get more than 9 arrivals per 1 0.2707 second. On the other hand, there 2 0.2707 is a 14.3% chance that the rate 3 0.1804 will reach or exceed twice the 4 0.0902 --. average range. That is, rather 5 0.0361 | large fluctuations can occur. 6 0.0120 | The degree of these fluctuations 7 0.0034 | these sum to 0.143 decreases with increasing <n> as 8 0.0009 | shown in the following table. 9 0.0002 --' <n> P(n>=2*<n>) 1 0.265 This table shows the probability that the 2 0.143 number of arrivals exceeds more than twice 3 0.085 its average value for several different 4 0.051 mean values <n> 5 0.030 6 0.019 7 0.012 Most systems of the type we are considering can be regarded as a facility with a certain utilization factor that we shall generally denote by `u'. If the system can handle five transactions per unit time and it during a certain interval handled only three transactions per unit time, it was only 3/5 = 0.6 or 60% utilized. If the facility utilization is low, system response is swift and there will be little or no queuing. The greater the utilization is, the greater the delays eventually become and the longer the queues that will build up. As we have just seen, the utilization factor can fluctuate markedly simply because the number of events per unit time (e.g. per second) fluctuates. If a system can handle four events per second and the transaction intensity is two events per second, the system will be overloaded 14.3% of the time. The overload situation will relax shortly though because of the inherent system capacity. Experience has shown (becoming a good engineering rule of thumb) that an overloading of less than 10% is generally safe and can be absorbed by the system without noticeable penalties. In other words: The system must have such a capacity that the traffic Does not exceed the capacity 90% of the time. The quantity of interest here is the probability of having X or more arrivals in a time interval for which the mean arrival rate is <n>: P(n>=X) = sum((exp(-<n>)*<n>**n)/n!) from n=X to infinity A sound design goal is then to ensure that P(n>=X) < 0.10 i.e. to configure the system to handle a number X of arrivals per second given the average arrival rate of <n> so that the above inequality holds. The following table gives X and the utilization u as functions of <n>: <n> X u=<n>/X 1 2.8 0.36 For large <n>, X approaches the value 2 4.4 0.45 3 5.8 0.52 X -> <n> + 1.287 sqrt(<n>) 4 7.2 0.56 5 8.5 0.59 In fact, u -> 1 as <n> -> infinity. 6 9.8 0.61 But for most values found in practice, 7 11.0 0.64 X > <n> by a significant amount. This 8 12.2 0.66 simple fact is often difficult to 9 13.5 0.67 accept. As an example, if <n> = 10 10 14.7 0.68 transactions per second, we need a 20 26.3 0.76 capacity of X=14.7 or 47% larger in 100 113.0 0.88 order to cope smoothly with the load. 1000 1041.0 0.96 At first sight there seems to be a paradox in the above considerations. If <n> = 10 per second, we need to have a capacity of X=14.7 (47% more) but if my unit of time is changed to be a "dec", which is 10 seconds (just chosen arbitrarily), then the very same real traffic volume would be <n> = 100 (namely 10*10) and according to the table we need now X=113 or only 13% overcapacity rather than 47%. It seems that simply by varying my unit of time over which I arbitrarily measure the traffic intensity, I require different amount of capacity to handle what is really the same number of events. How can that be? Th solution lies in the realization that the design criterion we used was to configure the system so that delays would not become excessive. Now, delays are measured in the same units of time as <n>, and increasing the length of the time interval over which we measure <n> amounts to increasing the delay we will tolerate as well. The implicit assumption behind the considerations about facility overloading is in fact that we do not tolerate delays that are much longer than the servicing time (which is 1/X). You do not want to wait in line for an hour for a transaction that only takes a minute; however, waiting a few minutes would be quite acceptable. Let us now consider a system having a single server with a queue of items awaiting service. Let ts be the service time for one item and sd be the standard deviation ("spread") of all the service times. Further, let w be the number of items waiting in the queue at a given time and let q be the total number of items in the system (either waiting or being served). Let tw be the time an item waits before being served and let tq be the total time the items spends in the system, then for each individual event we have: tq = tw + ts and thus also for the mean values: <tq> = <tw> + <ts>. On the average, for a steady state condition, we must have: <w> = <n> <tw> and <q> = <n> <tq>. If we in one second serve <n> events (in a steady state the number of transactions served equals the number of arrivals on the average) and the mean service time is <ts>, the total time of a second used in servicing is <n> <ts>, which is precisely the amount of time the facility is being used, hence the utilization factor becomes just that: u = <n> <ts> Combining the above relations, we see that: <q> = <n> <tw> + <n> <ts> = <w> + u. The basic result of single-server queuing theory was developed by Khintchine and Pollaczek and can be expressed: <w> = (u**2)/(2*(1-u))*(1 + 1/Rs) where the parameter Rs is given by Rs = (<ts>/sd)**2 When the service times are exponentially distributed the standard deviation sd is equal to the mean: Sd = <ts> and Rs = 1; if the service times were constant (sd = 0), we get Rs = infinity. In practice, the assumption of exponential service times is often a good one and we shall use Rs = 1. Therefore, the mean number of items waiting is: <w> = u**2 / (1-u) and then <q> = <w> + u = u**2 / (1-u) + u = u /(1-u) We shall comment upon the relation <w> = <n> <tw>. In a steady state as many items leave the queue as enter it from the other end. That number is <n>. If the queue contains <w> items and <n> leave per second, it will take <w>/<n> seconds to empty the queue and the latest arrival would the spend so many seconds in the queue, hence the time an item waits before being served is <tw> = <w> / <n> from which we get <w> = <n> <tw>. We can now write <tw> = <w> / <n> = (<w> / u) * <ts>, since u = <n> <ts>, and thus <tq> = <tw> + <ts> = (<w>/u + 1) <ts>. Using <w> = u**2/(1-u), we finally get: <tq> = (1 + u/(1-u)) <ts> = (1 + <q>) <ts> We have thus expressed the queuing time <tq> in terms of the service time <ts> and the utilization u. The following table shows this relationship between the load factor u, the number of items in the system <q>, and the response times <tq> in terms of the mean service time <ts>: u <q> <tq> 0.2 0.25 1.25 <ts> 0.4 0.67 1.67 0.6 1.50 2.50 It is clear that when the utilization 0.7 2.33 3.33 increases to beyond 75%, the queues grow 0.8 4.00 5.00 alarmingly and the system is hardly 0.85 5.67 6.67 usable at all. 0.90 9.00 10.00 0.95 19.00 20.00 When we have a stream of transactions that arrive at random, the times between arrivals are exponentially distributed. But suppose that there is a two-way switch at the arrival point. The switch operates so as to send alternate transactions on alternate output "lines" for service. The result is a more uniform distribution of interarrival times. The distributions that result in this way have been studied by the Danish mathematician A.K. Erlang and are called Erlang distributions. (Erlang worked for the Copenhagen Telephone Company, which explains the terminology of traffic, switches, and lines). The exponential distribution is an Erlang-1 distribution, while the two-way switch generates Erlang-2 distributions. If we had switched the transactions down a very large number of paths - instead of just the two as above - the resulting interarrival times would become very nearly constant. An Erlang-infinity distribution of interarrival times means that these are constant. The converse is also true: if we concentrate onto one line the traffic from many separate lines, the spread in interarrival times will increase leading to more queuing, as there will be bursts of events piling up, as well as intervals without traffic, where the server is idle and the facility is underused. Switching down several output lines simply means having several servers handling transactions. Let us assume there are M identical servers. The men number of items arriving per unit time is <n> as before. Thus <n>/M items go to each server. Again, let the mean service time be <ts>. The facility utilization therefore becomes: u = <n> <ts> / M As before: <q> = <n> <tw> + <n> <ts> And so: <q> = <w> + M*u It can be shown that the probability that ALL servers are busy at a given time is B = P(q>=M) = (1 - Z(M,u)) / (1 - u*Z(M,u)) Where sum ((M*u)**i/i!) from i = 0 to M-1 Z(M,u) = ----------------------------------- sum ((M*u)**i/i!) from i = 0 to M For M = 1 the expression for B reduces to B = u. The table below explores B as a function of u and M: U M=1 M=2 M=4 M=8 M=16 M=50 M=100 0.1 0.100 0.018 0.001 0.000 0.000 0.000 0.000 0.2 0.200 0.067 0.010 0.000 0.000 0.000 0.000 0.3 0.300 0.138 0.037 0.004 0.000 0.000 0.000 0.4 0.400 0.229 0.091 0.018 0.001 0.000 0.000 0.5 0.500 0.333 0.174 0.059 0.009 0.000 0.000 0.6 0.600 0.450 0.287 0.140 0.042 0.001 0.000 0.7 0.700 0.576 0.429 0.271 0.130 0.011 0.000 0.8 0.800 0.711 0.596 0.458 0.305 0.087 0.020 0.9 0.900 0.853 0.788 0.702 0.591 0.364 0.217 0.94 0.940 0.911 0.870 0.815 0.740 0.568 0.434 The mean number of items in the queue can be shown to be <w> = B*u /(1-u) hence <q> = <w> + M*u = (B/(1-u) + M)*u and the response time becomes: <tq> = <q> / <n> = (<q> <ts>) / (<n> <ts>) = <q> <ts> / (M*u) so that, finally: <tq> = (1+ B/M/(1-u)) <ts> For M = 2 we have (exactly): <tq> = <ts> / (1-u**2) and B = 2u**2 / (1+u) The following simple formula is a good approximation for all M (exact for M <= 2): <tq> = <ts> / (1-u**M) We shall contrast the response times (in terms of the mean service time <ts>) for various values of M and note the significant improvements when M grows larger: u <tq>M=1 <tq>M=2 <tq>M=4 <tq>M=16 0.2 1.250 1.042 1.003 1.000 0.4 1.667 1.190 1.038 1.000 0.6 2.500 1.562 1.179 1.007 0.7 3.333 1.961 1.357 1.027 0.76 4.167 2.367 1.548 1.058 0.80 5.000 2.778 1.746 1.095 0.84 6.250 3.397 2.047 1.158 0.88 8.333 4.433 2.558 1.273 0.90 10.000 5.263 2.969 1.370 0.92 12.500 6.510 3.589 1.518 0.94 16.667 8.591 4.626 1.771 0.96 25.000 12.755 6.705 2.284 0.98 50.000 25.253 12.950 3.389 It is important not to confuse the true multi-server model with one where arrivals are just distributed cyclically to each server in turn. This sort of model is really just M single-server queues in parallel and does not give the improvement in response time to the same degree as the true multi-server situation with only one queue. Since it is important then to have only one queue, we may ask: how long could that queue become? The average number of items in the system is not the measure we want, rather it is <w>, the number of items in the queue. Using <w> = B*u/(1-u) we get: u <w>M=1 <w>M=2 <w>M=4 <w>M=16 0.6 0.9 0.7 0.4 0.1 0.7 1.6 1.3 1.0 0.3 0.8 3.2 2.8 2.4 1.2 0.9 8.1 7.7 7.1 5.3 0.94 14.7 14.3 13.6 11.6 For heavy load, <w> is not very sensitive to the number of servers. It is maybe at first sight a bit surprising that the queue length does not depend on the arrival rate <n>, but only on the facility utilization u. How should the queue length be configured? Characteristic for this type of queue is its wide fluctuations: the standard deviation of the queue length is large. Queuing theory yields the result that the standard deviation of the queue length is given by: sd(w) = sqrt(B*u*(1 + u - B*u))/(1-u) And the queue length itself is: w = B*u/(1-u) Under heavy load B and u are both close to one, so: w = 1 /(1-u) and sd(w) = 1 /(1-u) = w I.e. the standard deviation is of the same size as the length itself. The probability that a value is more than three standard deviations removed from the mean is very low (0.001) so that the maximum queue length could be set to w(max) = <w> + 3 * sd(w) = 4 * <w> You could add an extra one for added margin and use w(max) = 5 * <w> with confidence. +--- | This is the Midrange System Mailing List! | To submit a new message, send your mail to MIDRANGE-L@midrange.com. | To subscribe to this list send email to MIDRANGE-L-SUB@midrange.com. | To unsubscribe from this list send email to MIDRANGE-L-UNSUB@midrange.com. | Questions should be directed to the list owner/operator: david@midrange.com +---
As an Amazon Associate we earn from qualifying purchases.
This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.