Introduction to Bottlenecks

A system bottleneck is the service center with the highest demand in the system. Demand is the number of visits to a service center multiplied by the average time each visit takes. For example, if a workload causes 100 disk accesses per second and each disk access takes 1 millisecond, then the demand for the disk is 100 milliseconds per second, or 10 percent of the disk's capacity. Keep the following concepts in mind when analyzing bottlenecks:

A service center is a resource in the system that requires tasks to wait when the resource is servicing another task. The CPU, disks, controllers, and network are all examples of service centers. Other examples are logical resources such as locks or critical sections in software. Memory is not such a resource. Even though the system starts paging when there is not enough memory, memory itself is not the bottleneck. Rather, insufficient memory causes the disk subsystem to become a bottleneck.
All bottlenecks are observed in the context of a workload. When a server is fulfilling a file-server role, its disks can be the source of the bottleneck. When the same server is acting as a domain controller, its CPU can be the bottleneck.
For any server and workload combination, there is a bottleneck. The bottleneck resource usually is not overloaded during normal operation. When it is overloaded, a queue builds up in front of that resource, and response times climb because each task waits in the queue before it is serviced. At that point you must eliminate the source of the bottleneck.
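
The demand calculation and the "highest demand" rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a measurement tool: the function names are hypothetical, and only the disk figures (100 accesses per second at 1 millisecond each) come from the text. The CPU and network figures are invented for the example.

```python
from typing import Dict, Tuple


def demand(visits_per_second: float, service_time_s: float) -> float:
    """Demand = visits per second x average seconds per visit.

    The result is seconds of service required per second of wall time,
    so 0.10 means the service center is busy 10 percent of the time.
    """
    return visits_per_second * service_time_s


def find_bottleneck(centers: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Return the name and demand of the service center with the highest demand."""
    demands = {name: demand(visits, secs) for name, (visits, secs) in centers.items()}
    name = max(demands, key=demands.get)
    return name, demands[name]


# (visits per second, average service time in seconds) for each service center.
workload = {
    "disk": (100, 0.001),      # from the text: 100 accesses/s at 1 ms each
    "cpu": (400, 0.0005),      # hypothetical figure for illustration
    "network": (50, 0.0008),   # hypothetical figure for illustration
}

name, value = find_bottleneck(workload)
print(f"disk demand: {demand(100, 0.001):.0%}")
print(f"bottleneck for this workload: {name} ({value:.0%})")
```

With these invented numbers the CPU has the highest demand, so it is the bottleneck for this workload even though it is far from saturated. A different workload on the same server could move the bottleneck to another service center.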

The illustration below shows a characteristic response-time curve that all multiuser servers share. As the load on the server increases, usage of the bottleneck resource comes closer to saturation (100 percent utilization) more of the time. As it does so, queuing in front of the resource increases, which causes response times to lengthen. As the resource nears 100 percent utilization, the response time climbs rapidly.
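
The shape of this curve can be reproduced with a simple queuing model. The sketch below assumes a single-server queue with random arrivals and service times (the textbook M/M/1 model), for which the average response time is the service time divided by (1 - utilization). The text does not name a specific model, so this formula and the 1 millisecond service time are assumptions chosen for illustration only.

```python
def response_time(service_time_s: float, utilization: float) -> float:
    """Average response time under the assumed M/M/1 model: S / (1 - U)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be at least 0 and below 1")
    return service_time_s / (1.0 - utilization)


SERVICE_TIME_S = 0.001  # hypothetical: 1 ms per visit to the bottleneck resource

# Response time grows slowly at first, then sharply as utilization nears 100 percent.
for u in (0.10, 0.50, 0.80, 0.90, 0.95, 0.99):
    r_ms = response_time(SERVICE_TIME_S, u) * 1000
    print(f"utilization {u:4.0%} -> response time {r_ms:7.2f} ms")
```

Under this assumed model, going from 50 to 80 percent utilization raises the response time by the same factor as going from 80 to 92 percent, which is why the last few percent before saturation are so costly.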

A server that hosts satisfied users is not operating with its resources very close to saturation. There is "elbow room" for the bottleneck resource, and the system operates in the part of the response-time curve that is below the acceptable response-time limit. When the system is operating above that limit, it is time to relieve the bottleneck.
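
The amount of "elbow room" can be estimated by turning the acceptable response-time limit into a utilization ceiling. The sketch below again assumes the single-server M/M/1 model, where response time is S / (1 - U). Solving for U gives a ceiling of 1 - S / limit. The function name and the sample numbers are hypothetical, and a real server would need measured data rather than this idealized formula.

```python
def max_utilization(service_time_s: float, limit_s: float) -> float:
    """Highest utilization that keeps the assumed M/M/1 response time within limit_s."""
    if service_time_s <= 0:
        raise ValueError("service time must be positive")
    if limit_s <= service_time_s:
        # Even an idle resource cannot respond faster than its service time.
        raise ValueError("limit must exceed the service time")
    return 1.0 - service_time_s / limit_s


# Hypothetical figures: 1 ms service time, 5 ms acceptable response time.
ceiling = max_utilization(0.001, 0.005)
print(f"keep the bottleneck resource below {ceiling:.0%} utilization")
```

If monitoring shows the bottleneck resource running above such a ceiling for sustained periods, the system is likely operating above the acceptable line on the curve.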