Bottlenecks, Utilization, and Queues

The best bottleneck alarm is system response time, as perceived by the user. Users' perceptions are affected by their expectations and the kind of work they do. An accurate bottleneck alarm would be designed to reflect these same expectations and requirements. You needn't demand the same throughput on a system supporting word processing as you do on one madly calculating routes to Jupiter. Even if your processors, disks, and memory are running at near capacity, if they are not developing the queues that degrade their response time, you don't have a problem (although you might want to plan more capacity for the future).

Although 100% utilization of a resource is a clear warning, it is neither a necessary nor sufficient condition for a bottleneck. You can have bottlenecks on devices with utilization well below 100% and you can, at least in theory, have a device perking along at nearly 100% utilization with no signs that it is a bottleneck. That is, the device is not preventing any other resource from getting its work done, nothing is waiting for it, and even if it were infinitely fast, things wouldn't happen any sooner.

Whether a device becomes a bottleneck depends on the number of requests for service, the arrival pattern of the requests, and the amount of service time each request needs. If requests are perfectly synchronized, so that each one arrives only after the previous one has finished, no queues develop. But if arrivals and service times are random or unpredictable, queues develop at much lower utilization rates.

For example, suppose a process had ten threads, each of which used exactly 0.999 seconds of processor time once every ten seconds. If each request arrived exactly one second after the previous one in perfect sequence, the processor would be 99.9% busy, but there would be no queue, no interference between the threads and, technically, no bottleneck.
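The ten-thread example can be checked with a small simulation. The following is a minimal sketch, not a model of any real scheduler: it assumes a single processor that serves requests strictly in arrival order, and the function name `simulate_fifo`, the 100-second window, and the half-second delay are all hypothetical choices made for illustration.

```python
def simulate_fifo(arrivals, service_time):
    """Run requests through one processor in arrival order (FIFO).

    Returns a list with the time each request spent waiting in the queue.
    """
    free_at = 0.0  # moment the processor finishes its current request
    waits = []
    for arrive in sorted(arrivals):
        start = max(arrive, free_at)  # wait only if the processor is still busy
        waits.append(start - arrive)
        free_at = start + service_time
    return waits


SERVICE = 0.999   # seconds of processor time per request (from the example)
WINDOW = 100.0    # seconds simulated (assumed for illustration)

# Ten threads, each requesting 0.999 s every 10 s and staggered 1 s apart,
# amount to one request arriving every second.
steady = [float(t) for t in range(100)]
steady_waits = simulate_fifo(steady, SERVICE)
utilization = len(steady) * SERVICE / WINDOW

# Hypothetical disruption: the request due at t=5 arrives half a second late.
late = [5.5 if t == 5.0 else t for t in steady]
late_waits = simulate_fifo(late, SERVICE)

print(f"utilization: {utilization:.3f}")
print(f"steady pattern, requests that waited: {sum(w > 0 for w in steady_waits)}")
print(f"one late request, requests that waited: {sum(w > 0 for w in late_waits)}")
```

In the steady pattern no request ever waits, even though the processor is almost never idle. With a single request arriving late, every later request in the window has to wait, because the processor has only 0.001 seconds of slack per request with which to catch up.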

Admittedly, this is a highly idealized situation, but it's easy to see how any disruption in the pattern would quickly create a queue. According to queuing theory, if both the arrival pattern of requests and the duration of requested services are random, a device that is 66% utilized will hold an average of about two requests, counting the one being serviced. Even worse, if, instead of being evenly random, requests for service are either very short or very long, queues can form at even lower utilization. That is, a less heavily used device can develop an even longer queue.
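The 66% figure is consistent with the standard single-server queuing formulas. The sketch below assumes requests arrive at random (a Poisson process) at one device, and uses the Pollaczek-Khinchine formula for the average number of requests at the device. The function name `mean_in_system` and the parameter `cv2` (the squared coefficient of variation of service time: 0 for constant service times, 1 for exponentially distributed ones, larger for a mix of very short and very long requests) are illustrative names, and the value 4.0 is an arbitrary example of a highly variable workload.

```python
def mean_in_system(rho, cv2=1.0):
    """Average number of requests at a single device (waiting or in service).

    rho -- utilization, 0 <= rho < 1
    cv2 -- squared coefficient of variation of service time
           (assumption: arrivals are Poisson, one server, FIFO order)
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    # Pollaczek-Khinchine: requests in service plus requests waiting.
    return rho + (rho ** 2) * (1.0 + cv2) / (2.0 * (1.0 - rho))


for label, cv2 in [("constant", 0.0), ("random", 1.0), ("short-or-long", 4.0)]:
    print(f"{label:>13} service times at 66% busy: "
          f"{mean_in_system(0.66, cv2):.2f} requests")
```

With `cv2=1.0` the formula reduces to rho / (1 - rho), which gives roughly two requests at 66% utilization. With the more variable service times the same device holds about 3.9 requests at the same utilization, which is the effect the paragraph above describes.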