Understanding the Normal Distribution

You describe workloads to Response Probe, in part, by choosing a mean and standard deviation for several parameters that characterize the threads in the workload.

The theory behind Response Probe is that real workloads are normally distributed; that is, the time spent thinking, accessing memory, reading from and writing to files, finding a record in a file, computing, and performing other such tasks is distributed along a standard bell-shaped curve. Conversely, if the workloads were fixed or purely random, they would not simulate real use of real computers.

Note

Because the actions of threads and processes are normally distributed, they are not, strictly speaking, fixed. However, in repeated trials the same inputs produce the same results.

In a bell-shaped curve, most activity is concentrated around the mean, decreasing with distance from the mean and occurring least frequently at the extremes. The standard deviation determines how tightly activity clusters around the mean and how often more distant values occur. For a normal distribution, about 68% (roughly two-thirds) of all activity falls within one standard deviation on either side of the mean, about 95% falls within two standard deviations, and about 99.7% falls within three standard deviations of the mean.
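The 68/95/99.7 rule above is easy to confirm empirically. The following sketch (illustrative only, not part of Response Probe) draws samples from a normal distribution and measures how much activity lands within one, two, and three standard deviations of the mean:

```python
import random

# Sample a normal distribution with an arbitrary mean and standard
# deviation, then count how many samples fall within k standard
# deviations of the mean for k = 1, 2, 3.
mean, sd = 512, 170
samples = [random.gauss(mean, sd) for _ in range(100_000)]

for k in (1, 2, 3):
    within = sum(abs(x - mean) <= k * sd for x in samples) / len(samples)
    print(f"within {k} standard deviation(s): {within:.1%}")
```

With 100,000 samples, the printed fractions come out close to 68%, 95%, and 99.7%, regardless of the particular mean and standard deviation chosen.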

Usually, the midpoint of the range is chosen as the mean, and one-sixth of the total range is chosen as the standard deviation, so that three standard deviations on either side of the mean span the whole range.

For example, in a 1024-page file, if page 512 is the mean and 170 pages (about one-sixth of 1024) is the standard deviation, then about two-thirds of the reads are of pages 342 through 682, about 95% are of pages 172 through 852, and about 99.7% are of pages 2 through 1022, effectively the whole file.

If, instead, the mean were 512 and the standard deviation were 256 (half of 512), then two-thirds of the reads would come from pages 256 through 768, and the remaining third would be distributed throughout the rest of the file. At the extremes, if the standard deviation is 0, page 512 is the only page read; and if the standard deviation equals the mean, reading is effectively random, so all pages are about equally likely to be read.
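The file-access example above can be sketched as follows. This is a hedged illustration of how normally distributed page reads could be simulated; the page size, clamping, and rounding choices here are assumptions, and Response Probe's actual behavior may differ:

```python
import random

PAGES = 1024  # assumed file size, matching the 1024-page example

def next_page(mean, sd, pages=PAGES):
    """Pick a page number from a normal distribution, clamped to the file."""
    if sd == 0:
        return mean                   # degenerate case: always read the mean page
    p = round(random.gauss(mean, sd))
    return min(max(p, 1), pages)      # keep the pick inside the file

# Simulate reads with mean page 512 and a standard deviation of 170 pages.
mean, sd = 512, 170
reads = [next_page(mean, sd) for _ in range(10_000)]
near_mean = sum(abs(p - mean) <= sd for p in reads) / len(reads)
print(f"reads within one standard deviation of page {mean}: {near_mean:.0%}")
```

With these parameters, roughly two-thirds of the simulated reads fall between pages 342 and 682, and a standard deviation of 0 pins every read to page 512, matching the extremes described above.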