Troubleshooting the Disk Activity Counters
Sometimes, the disk activity counters just don't add up. %Disk Read Time and %Disk Write Time might sum to more than 100% even on a single disk, yet %Disk Time, which represents their sum, still displays 100%. Even worse, a disk set can look 100% busy even when some of its disks are idle.
Even the fanciest disk can't be more than 100% busy, but it can look that way to Performance Monitor. Several factors can cause this discrepancy, and sometimes they all occur at once:
- The Disk Performance Statistics Driver, which collects disk measurements for Performance Monitor, can exaggerate disk time. It doesn't actually time the disk; it times I/O requests. It assumes that, as long as a request is in process, the disk is busy. It also counts all processing time, including time in the disk driver stack and in the queue, as part of reading or writing time. Then it sums the busy time for all requests and divides it by the elapsed time of the sample interval. When more than one request is in process at a time, the total processing time is greater than the sample interval, and the disk looks more than 100% busy.
- When Performance Monitor combines data for more than one component of a disk or disk set, it often just sums the values; it doesn't recalculate them in proportion to the whole component. Therefore, a sum can exceed 100%, even when some of the instances are idle.
For example, the %Disk Time counter just displays the sum of %Disk Read Time and %Disk Write Time. The value is not recalculated as a percentage of all time for the disk.
Similarly, the _Total instance of many counters is just a sum of the values for all physical or logical disks. The value is not recalculated as a percentage of time for all disks. For example, if one disk is 100% busy and another is idle, the _Total displays 100% busy, not 50% busy.
Note
When calculating the _Total instance for the Avg. Disk Bytes/Transfer, Avg. Disk sec/Transfer, and %Free Space counters, Performance Monitor recalculates the sums as a percentage for each disk.
Also, the Physical Disk counters are sums of the values for the logical disks on that physical disk. If any logical disk is 100% busy, it looks like all partitions are 100% busy.
- Finally, the percentage counters are limited, by definition, to a maximum of 100%. When a value exceeds 100%, Performance Monitor still displays 100%. This is especially inconvenient when measuring values that are sums, which are even more likely to exceed 100%.
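The following Python sketch illustrates the first and last factors above. It is not the driver's actual code; the function names and the request timings are hypothetical, chosen only to show how summing per-request time can exceed the sample interval and how a percentage display then caps the result. The merged-interval figure is included only for contrast. Because it still includes queue and driver stack time, it is not a true measure of how long the disk itself was busy.

```python
def reported_busy_percent(requests, interval_ms):
    """Sum the in-process time of every request, then divide by the
    sample interval. Overlapping requests are counted more than once."""
    total = sum(end - start for start, end in requests)
    return 100.0 * total / interval_ms


def merged_busy_percent(requests, interval_ms):
    """Count each moment once, no matter how many requests overlap it."""
    busy = 0
    span_start = span_end = None
    for start, end in sorted(requests):
        if span_end is None or start > span_end:
            # A gap: close the previous busy span and open a new one.
            if span_end is not None:
                busy += span_end - span_start
            span_start, span_end = start, end
        else:
            # Overlap: extend the current busy span.
            span_end = max(span_end, end)
    if span_end is not None:
        busy += span_end - span_start
    return 100.0 * busy / interval_ms


def displayed_percent(value):
    """Percentage counters are limited to a maximum of 100%."""
    return min(value, 100.0)


# Hypothetical 1000 ms sample with three overlapping requests,
# each given as (start_ms, end_ms).
requests = [(0, 400), (100, 500), (300, 700)]
interval_ms = 1000

reported = reported_busy_percent(requests, interval_ms)  # 120.0
merged = merged_busy_percent(requests, interval_ms)      # 70.0
shown = displayed_percent(reported)                      # 100.0

print(f"summed request time: {reported:.1f}%")
print(f"merged busy time:    {merged:.1f}%")
print(f"value displayed:     {shown:.1f}%")
```

In this made-up sample, requests were in process for only 70% of the interval, but the summed request time is 120% and the displayed value is 100%.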
Now that you understand how the disk counters work, you can use them more effectively.
- Monitor individual instances in as much detail as you can see them. Whenever practical, avoid summed values and the _Total instance. When you need to use them, remember that they are sums.
- Use the new Performance Monitor disk activity counters: Avg. Disk Queue Length, Avg. Disk Read Queue Length, and Avg. Disk Write Queue Length. They use the same data as the %Disk Time counters, so the busy time values can be exaggerated. However, they report these values as decimals that have no defined maximum, so they can accurately display values above 1.0, the equivalent of 100%. For more information, see "New Disk Activity Counters," later in this chapter.
- If your disk configuration includes Ftdisk, use the -ye option of the Diskperf utility (Diskperf -ye). This installs the Disk Performance Statistics Driver low enough in the disk driver stack that it can see individual physical disks before they are logically combined. If you use Diskperf -y, statistics for all physical disks are summed as though they were one disk.
- If all else fails, factor in the discrepancy when you interpret the values.
For example, if the disks in a five-disk set were busy 30%, 33%, 38%, 0%, and 0% of the time respectively, the Avg. Disk Queue Length would be 1.01, which is the 101% sum of the busy times expressed as a decimal. Remember that this means that about 20% of disk set capacity is used (101% divided by five disks), not 101%.
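The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation described above; the variable names are illustrative and are not Performance Monitor identifiers.

```python
# Busy time for each disk in the five-disk set, in percent.
busy_percent = [30, 33, 38, 0, 0]

# The summed value, reported as a decimal with no defined maximum.
queue_length = sum(busy_percent) / 100

# The same sum recalculated as a share of the whole set's capacity.
set_utilization = sum(busy_percent) / len(busy_percent)

print(f"Avg. Disk Queue Length (sum): {queue_length:.2f}")
print(f"Disk set capacity used:       {set_utilization:.1f}%")
```

Dividing the sum by the number of disks is the recalculation that the summed counters do not perform for you.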