Measure Your Machine's Activity and Learn How to Use OS/2 Threads with CPUMETER

Bob Chiverton

It's a miracle my first sports car didn't explode. It had only two speeds-legal and illegal. But I was 19, it was summer, and the car wasn't really mine. I borrowed it from my friend, Roman. And if Roman knew I pushed the tachometer into the red zone in every gear including reverse, you wouldn't be reading this now.

I don't redline tachometers anymore, now that I pay for my auto repairs, but I still enjoy speed-especially in my computers. That's why I wrote CPUMETER, an OS/2¹ Presentation Manager program that shows when I'm redlining the CPU. It also taught me about using threads.

CPUMETER is a tachometer for your computer. "Tachometer" is not quite right, though. CPUMETER is really a CPU activity meter. It shows you the percentage of time the CPU is busy running threads. In this context, the CPU is active (that is, busy) when it is executing a thread and inactive when all threads are blocked or frozen (that is, not runnable). When iconized, CPUMETER uses a needle gauge that goes from 0 to 100 percent in increments of 10 percent. When your computer is idle, the needle is at 0. At full load, the needle points to 100. The client window is somewhat more informative because it keeps track of where the needle has been and displays a constantly updated performance histogram.

Does it surprise you to learn that sometimes no threads are running? It surprised me. The CPU doesn't pack up and leave town when all threads are blocked. It executes code inside the scheduler. The CPU loops, waiting for a thread to become ready. Some CPU cycles are spent servicing I/O time-outs, semaphore time-outs, and so on, but otherwise the CPU is polling for ready threads. So it's not that there are no runnable threads, it's that there are no runnable threads except for the OS/2 scheduler thread.

CPUMET

Before I discuss CPUMETER, I'd like to present two simpler versions, CPUMET and CPUMETE. CPUMET's main function creates a standard Presentation Manager (hereafter "PM") window (see Figure 1). After calling WinSetWindowPos to size and position the window on the screen, it calls _beginthread twice to create two additional threads of execution before dropping into the familiar WinGetMsg/WinDispatchMsg loop.

There are two peculiar things about CPUMET. It looks odd, and it uses three threads. CPUMET has no client window-just a title bar window and a system menu (see Figure 2). CPUMET displays its output in the title bar. This is the easiest way I know to display text in a PM application: just call WinSetWindowText when you have a string of text to output. There are no presentation spaces, or GPI functions, which makes everything nice and simple.
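For example, assuming hwndFrame holds the frame window handle, a single call puts any string into the title bar:

WinSetWindowText (hwndFrame, "any text you like") ;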

Using the title bar this way is also helpful during debugging. One application I wrote used two object windows to manage two separate DDE conversations. Object windows, by definition, are not visible, and mine had no need to be visible since they never interacted with the user. But when I debugged them, I made their title bars visible and displayed text in them that revealed their internal states (for example, "Waiting for Reply").

The CPUMET client window does exist, it's just hidden. To hide the client window, CPUMET creates the standard window by calling WinCreateStdWindow.

hwndFrame = WinCreateStdWindow (HWND_DESKTOP, WS_VISIBLE,
                                &flFrameFlags,
                                szClientClass,
                                NULL, 0L, NULL, NULL,
                                &hwndClient) ;

The frame creation flags that affect the client window (such as FCF_SIZEBORDER, FCF_MINMAX, FCF_HORZSCROLL) are not used. Since the client window isn't used, there's no need to maximize it or change its size. CPUMET doesn't use menus or an icon either, so the FCF_MENU and FCF_ICON flags aren't used, and the hIcon and hMenu parameters are NULL.

CPUMET next calls WinSetWindowPos.

WinSetWindowPos (hwndFrame, HWND_TOP, 0, 0,
                 WinQuerySysValue (HWND_DESKTOP, SV_CXSCREEN)/3,
                 WinQuerySysValue (HWND_DESKTOP, SV_CYTITLEBAR),
                 SWP_SHOW | SWP_SIZE | SWP_MOVE) ;

This positions the window in the lower left of the screen. Using a width of SV_CXSCREEN/3 and a height of SV_CYTITLEBAR, the frame window is one third the width of the screen and is just high enough to show the title bar.

CPUMET uses three threads, which is unusual for such a tiny program. But because of the way the OS/2 scheduler works, you'd have difficulty using fewer.

The OS/2 Scheduler and Threads

The OS/2 scheduler starves threads. A priority-based scheduler, it always assigns the CPU to the highest-priority runnable thread.

A thread is classified and runs in one of three priority classes: high priority, general priority, or low priority. These are also known as time-critical, normal, and idle-time, respectively. The special properties associated with them make CPU load calculations possible.
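In code, the three classes correspond to the PRTYC_TIMECRITICAL, PRTYC_REGULAR, and PRTYC_IDLETIME constants passed to DosSetPrty. A hedged one-liner, assuming an ID of 0 denotes the calling thread:

DosSetPrty (PRTYS_THREAD, PRTYC_IDLETIME, 0, 0) ;   /* make this thread idle-time */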

The scheduler runs the highest priority thread for as long as it needs. It doesn't matter how many other threads are waiting. They will not get the CPU-they will starve-if a higher priority thread is running. (In multiple-CPU systems, the scheduler may execute the n highest priority threads.)

The scheduler is rude, too. It will cut right into the middle of a thread's time slice and yank the CPU away if another thread with a higher priority becomes ready.

I'd like to clear up some misconceptions about time slices. There are two kinds in OS/2: those you specify with the TIMESLICE directive in the CONFIG.SYS file and real time slices. I'll distinguish between them by calling the first a "TIMESLICE" and the second a "time slice."

Real time slices are 31.25 milliseconds (msec) long. This is the length of time between two consecutive ticks of the system clock. With every tick, the scheduler sees if another thread with a higher priority is ready. If one is, the scheduler will preempt the current thread in favor of the higher priority one. When a thread gets a time slice, it's getting one of 31.25 msec duration. A thread can be preempted in the middle of a time slice, too, if it calls an API function and a higher priority thread has become ready to run since the last clock tick.

TIMESLICES become important only when several threads of equal priority are all ready to run and are all CPU-bound. Then each thread will run for a full TIMESLICE before control is passed to the next thread of the same priority. Note that the value of TIMESLICE must be an integral number of 31.25 msec time slices.
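For reference, the directive takes minimum and maximum values in milliseconds; a hedged CONFIG.SYS sketch (the pair of values shown is purely illustrative):

TIMESLICE=32,64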

Calculating the Load

In OS/2, an idle-time thread gets CPU cycles only if higher priority threads aren't available. This is the key to calculating the load. CPUMET simply uses an idle-time thread to soak up CPU cycles when no other threads are running. The busier the CPU is (with other threads), the fewer CPU cycles the idle-time thread will get.

For example, suppose the idle-time thread is getting 10 percent of the total CPU cycles every second and the rest of the time the scheduler runs other threads. By this definition, the CPU load is 90 percent. The trick is to record, somehow, the CPU cycles per second the idle-time thread gets. CPUMET does this indirectly, using the global variable lCount and two threads.

One thread that I'll call the counting thread runs idle-time and continually increments lCount. The other thread, the timing thread, runs time-critical and resets lCount to zero once every second. The idle-time thread function is CountingThread, shown below.

VOID CountingThread ()
{
    DosSetPrty (PRTYS_THREAD, PRTYC_IDLETIME, 0, tidCounting) ;
    while (TRUE) lCount++ ;
}

This thread is created when the message queue thread calls _beginthread in the main function.

tidCounting = _beginthread (CountingThread, iCountingThreadStack,
                            THREADSTACKSIZE, NULL) ;

First, the counting thread calls DosSetPrty to change its priority to idle-time.

By default, a thread inherits the priority of its creator thread, which in this case is the message queue thread. The message queue thread, in turn, inherited its general-priority class from the thread in the parent process (the PM shell) that called DosExecPgm to launch CPUMET.

Next, the counting thread drops into an infinite loop.

while (TRUE) lCount++;

Any CPU cycles this thread now receives will be used to increment lCount. This is the way to measure (indirectly) the CPU cycles per second the idle-time thread gets.

To get the "per second," you need the timing thread. The timing thread resets the value of lCount once per second. It is created by the message queue thread with the following call.

tidTiming = _beginthread (TimingThread, iTimingThreadStack,
                          THREADSTACKSIZE, NULL) ;

The first thing the timing thread does is change its priority to time-critical. The reason for this will be discussed in a moment. Next, it drops into an infinite loop.

while (TRUE)
{
    DosSleep (1000L) ;
    WinPostMsg (hwndClient, WM_SEM1, MPFROMLONG (lCount), NULL) ;
    lCount = 0L ;
}

Once per second (every 1000 msec), the timing thread wakes up, posts the message queue thread a WM_SEM1 message with lCount in the mp1 parameter, resets lCount to zero, then goes back to sleep. This continues until CPUMET terminates.

The timing thread runs time-critical to guarantee it will get the CPU when it wakes up (unless another time-critical thread is running). The thread has to be put to sleep with DosSleep until then so it doesn't hog the CPU. DosSleep is one of those API calls that allows the scheduler to preempt the running thread immediately (in this case the timing thread) and dispatch the next highest priority runnable thread (see Figure 3).

Usually when you have a global variable like lCount that is modified by two or more threads, you protect the variable with a semaphore. But in this case no damage results if, in the highly unlikely event, the counting thread increments lCount between the WinPostMsg call and the reset to zero; at worst, a few counts are lost.
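For reference, the conventional pattern the paragraph alludes to would look something like this hedged sketch using an OS/2 RAM semaphore (hsemCount is hypothetical; CPUMET deliberately omits all of this):

ULONG hsemCount = 0 ;                      /* hypothetical RAM semaphore */

DosSemRequest ((HSEM) &hsemCount, -1L) ;   /* -1L = wait indefinitely    */
lCount = 0L ;                              /* the protected update       */
DosSemClear ((HSEM) &hsemCount) ;          /* release the semaphore      */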

Note that WinPostMsg uses the WM_SEM1 message. WM_SEM1 messages aren't stored in the message queue; instead, they are handled like WM_TIMER messages. Multiple WM_SEM1 messages are consolidated into one, ensuring that only the most recent lCount is retrieved and displayed.

The message queue thread extracts lCount from the WM_SEM1 message and calls WinSetWindowText to display it in the title bar. You can tell how busy the CPU is by looking at the value in the title bar. The lower the number, the busier the CPU.
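In ClientWndProc, the handler amounts to something like the following sketch (szBuffer is an assumed static CHAR array, and sprintf needs stdio.h):

case WM_SEM1:
    sprintf (szBuffer, "%ld", LONGFROMMP (mp1)) ;  /* extract lCount       */
    WinSetWindowText (hwndFrame, szBuffer) ;       /* show it in title bar */
    return 0 ;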

Too Much of a Good Thing

Only one instance of CPUMET should run at a time, because extra instances load the CPU just as any other application would. Any CPU cycles that a single instance of CPUMET would have soaked up are instead divided equally among the instances. Because each counting thread has the same priority, the scheduler runs them on a round-robin basis (this is where TIMESLICE comes into play). With n instances running, each counting thread will count only lCount/n counts per second.

If CPUMET were a Windows application, you could use the hPrevInstance parameter in WinMain to detect multiple instances. Unfortunately, PM has no equivalent. But it has something Windows lacks-system semaphores.

Upon entry to main, before doing anything else, CPUMET tests for the existence of another instance with the following statement:

if ( DosCreateSem (CSEM_PUBLIC, &hSem, "\\sem\\cpumeter.sem") )
    DosExit (EXIT_PROCESS, 0) ;

The function DosCreateSem will attempt to create the system semaphore "\sem\cpumeter.sem". The first instance will succeed, but subsequent instances will fail and exit via DosExit.

This is a somewhat untraditional role for a semaphore. It matters not one iota if the semaphore is set or clear, only that it exists. The existence of the semaphore represents the existence of a single instance of CPUMET.

Of course, we have to destroy the semaphore when CPUMET terminates, which is why we need to store the handle in the global variable HSYSSEM hSem.

The semaphore is destroyed in the ClientWndProc with the following code:

case WM_DESTROY:
{
    DosCloseSem (hSem) ;
    return 0 ;
}

This allows the next instance to recreate the semaphore and run.

CPUMETE

CPUMET calculates the CPU load. However, there is a problem with using lCount to represent it, due to the inverse relationship between lCount and the CPU load. The lower the value of lCount, the higher the load. It would be better to display an amount that varies in direct proportion to the load.

Another problem is that lCount's value is dependent on a computer's CPU speed. A faster computer will calculate a higher value for lCount than a slower one, everything else being equal.

I solved both problems in CPUMETE.EXE by introducing the variable lCountMax, which represents the maximum value lCount can attain, and a new thread to calculate lCountMax. This allows you to display the CPU load as a percentage of maximum load. Let's see how CPUMETE accomplishes this (see Figure 4).

Adding a Calibration Thread

If you know the maximum value lCount can be for a given computer, you can calculate CPU usage as a percentage of this maximum with the following simple equation.

load = [1 - (lCount/lCountMax)] * 100%
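For instance, if calibration yields lCountMax = 400,000 and the counting thread manages lCount = 100,000 during a given second, the load is [1 - (100,000/400,000)] * 100% = 75 percent. (These numbers are purely illustrative.)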

The trick is to calculate lCountMax, which is done by the calibration thread. The calibration thread function looks almost identical to the counting thread function.

VOID CalibrationThread ()
{
    DosSetPrty (PRTYS_THREAD, PRTYC_TIMECRITICAL, 30, tidCalibration) ;
    while (TRUE) lCountMax++ ;
}

The differences are that it uses lCountMax instead of lCount, and it runs time-critical. Running time-critical guarantees the thread will get the maximum CPU cycles possible (assuming there are no other time-critical threads competing for the CPU). Notice that the DosSetPrty call uses 30 for the sChange parameter.

Within a priority class, a thread is further distinguished by its relative priority level, ranging from 0 to 31. If two threads belong to the same priority class, the one with the higher level will get the CPU before the other, all else being equal. The calibration thread has a priority level of 30, one less than the maximum. You need to reserve level 31 for a higher priority thread so it can preempt the calibration thread after it runs for one second. The thread doing the preempting is the message queue thread. Following is the sequence of code added to the main function for this purpose.

DosSetPrty (PRTYS_THREAD, PRTYC_TIMECRITICAL, 31, 1) ;
tidCalibration = _beginthread (CalibrationThread,
                               iCalibrationThreadStack,
                               THREADSTACKSIZE, NULL) ;
DosSleep (1000L) ;
DosSuspendThread (tidCalibration) ;
DosSetPrty (PRTYS_THREAD, PRTYC_REGULAR, 0, 1) ;

You identify the message queue thread with an ID of 1 in the DosSetPrty call. Thread IDs are assigned on a first-come, first-served basis, starting with 1. Don't expect your other n threads to follow in sequential order as 2, 3, . . ., n + 1. Some API calls generate captive threads-that is, threads captive within a DLL. Although you didn't explicitly create these threads, they are still assigned thread IDs within your process. The rule here is to not assume thread ID values; use the thread ID returned from a _beginthread or DosCreateThread call.

After setting the message queue thread's priority, CPUMETE creates the calibration thread. Although it runs time-critical, the calibration thread will not run yet because the message queue thread has a higher priority level of 31. But then DosSleep is called, knocking out the message queue thread for 1000 msec. During this time, the calibration thread becomes the highest priority runnable thread in the system and gets all the CPU cycles. The calibration thread gets preempted after one second, when the message queue wakes up. By this time, lCountMax is calculated. The calibration thread is no longer needed and is put to sleep by DosSuspendThread. This freezes the thread; the scheduler will ignore it until it is unfrozen.
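Should the calibration thread ever be needed again, the counterpart call would unfreeze it; CPUMETE never does this, so the line below is shown only for completeness.

DosResumeThread (tidCalibration) ;   /* would make the thread runnable again */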

In general, one thread cannot kill another thread. A thread can kill only itself or the entire process, by calling DosExit:

DosExit (fTerminate, usExitCode) ;

and using EXIT_THREAD or EXIT_PROCESS for the fTerminate parameter. Since the calibration thread doesn't call DosExit, the best you can do is freeze it from another thread. Finally, the message queue thread resets its priority so it doesn't hog the CPU.

CPUMETE's title bar is more informative than CPUMET's (see Figure 5). Instead of just displaying lCount like CPUMET did, it displays three values in the following format:

x / y -> z%

where x represents lCount, y represents lCountMax, and z represents the load. Occasionally, x will exceed y. This does not represent a serious error, but rather inaccuracies associated with the 31.25 msec resolution of the system clock.

Adding the Tachometer

CPUMETE uses a set of eleven icons to display the CPU load when the app is iconized.

HPOINTER hIcon [11] ;

I created these icons with the Icon Editor. The hIcon array is initialized in the WM_CREATE code,

case WM_CREATE:
{
    for (i = 0 ; i <= 10 ; i++)
        hIcon [i] = WinLoadPointer (HWND_DESKTOP, NULL, (100 + i)) ;
    return 0 ;
}

and the icons are released in the WM_DESTROY code.

case WM_DESTROY:
{
    for (i = 0 ; i <= 10 ; i++) WinDestroyPointer (hIcon [i]) ;
    DosCloseSem (hSem) ;
    return 0 ;
}

CPUMETE uses hIcon[0] when the CPU is at 0 percent, hIcon[1] at 10 percent, and so on, up to hIcon[10] at 100 percent. When the message queue thread receives a WM_SEM1 message, it extracts lCount from the mp1 parameter and calls the nearest_10_percent function.

i = nearest_10_percent (LONGFROMMP (mp1));

This function uses the load equation presented earlier and returns an integer from 0 to 10, in increments of 1, corresponding to a CPU load of 0 percent to 100 percent. The return value i is used in the WM_SETICON message sent to the frame window to change the icon.

WinSendMsg (hwndFrame, WM_SETICON, hIcon [i], NULL) ;
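The article doesn't reproduce the body of nearest_10_percent, but a plausible sketch based on the load equation might read as follows (the clamp guards against the occasional case, noted later, where lCount exceeds lCountMax):

INT nearest_10_percent (LONG lCount)
{
    double load ;

    if (lCount > lCountMax) lCount = lCountMax ;         /* clock-tick slop    */
    load = (1.0 - (double) lCount / lCountMax) * 100.0 ;
    return (INT) ((load + 5.0) / 10.0) ;                 /* 0..10, nearest 10% */
}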

To add icons, I needed to add the FCF_MINBUTTON and FCF_ICON frame flags to the WinCreateStdWindow call in main. I also changed the hIcon parameter to ICON00 to represent an initial CPU load of 0 percent. Finally, I had to change the make file to accommodate the CPUMETE.RC resource file.

CPUMETER

CPUMETE's icons are useful for showing the instantaneous CPU load, but it would be better to see the load over a period of time. That would give us a clearer picture of the demands an application places on the CPU. What CPUMETE needs is a histogram.

CPUMETER (see Figure 6), the final version of the program, displays a histogram of the CPU load in its client window (see Figure 7). The histogram is a simple bar graph, consisting of black vertical bars on a white background. The graph is updated each second. Each bar is one pixel wide, which means a client window 400 pixels wide can display up to 400 bars. On a VGA display, about 11 minutes of load history can be displayed in a maximized window. Each bar's height is directly proportional to the CPU load it represents and is scaled to the client window height. For example, given a client window that is 376 pixels high, a bar representing a 100 percent load will be 376 pixels high, a 50 percent load will be 188 pixels high, and so on. When the client window is resized vertically, the bars are resized vertically too.

The histogram logic is contained in the file HISTGRAM.C (see Figure 8). This file contains just five functions, corresponding to the five messages explicitly processed by ClientWndProc: create_histogram (WM_CREATE), size_histogram (WM_SIZE), paint_histogram (WM_PAINT), update_histogram (WM_SEM1), and destroy_histogram (WM_DESTROY).

The histogram is created when ClientWndProc processes the WM_CREATE message.

case WM_CREATE:
    .
    .
    .
    create_histogram (hab, hwnd) ;

The hab and hwnd variables are used by create_histogram to get a handle, hPS, to a micro presentation space (micro PS) by calling WinOpenWindowDC and GpiCreatePS.

hdc = WinOpenWindowDC (hwndClient) ;
hPS = GpiCreatePS (hab, hdc, ...) ;

Then hPS is used to set the foreground and background colors and the line type for the PS:

GpiSetBackColor (hPS, ...) ;
GpiSetColor (hPS, ...) ;
GpiSetLineType (hPS, ...) ;

A micro PS is a good choice for this application. With a cached micro PS, CPUMETER would have to reset the colors and line type each time it reacquired a handle to the PS (via WinBeginPaint or WinGetPS). Using a micro PS is more efficient because it retains its state information between paints. Using a normal PS would be overkill because CPUMETER is only outputting to the screen; it doesn't need a normal PS's capability to reassociate itself with another device context, such as a printer DC. Also, there's no need for the normal PS segments.
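The flags elided from the GpiCreatePS call shown earlier are what select the micro PS; a hedged rendering of the full call (the flag combination shown is illustrative) might be:

SIZEL sizl = { 0L, 0L } ;   /* 0,0 = default to the size of the device */

hPS = GpiCreatePS (hab, hdc, &sizl,
                   PU_PELS | GPIF_DEFAULT | GPIT_MICRO | GPIA_ASSOC) ;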

The histogram's main data structure is an array of ints. Each element in the array corresponds to one bar in the histogram. There must be enough elements to accommodate a maximized client window. This is determined by calling WinQuerySysValue.

iArraySize = WinQuerySysValue (HWND_DESKTOP, SV_CXSCREEN) ;

What CPUMETER needs is an array like the following.

INT anIntegerArray [iArraySize] ;

But because iArraySize is determined at run time, the array is allocated dynamically, rather than statically. This is done by using the calls to WinCreateHeap and WinAllocMem shown below.

hHeap = WinCreateHeap (0, 0, (sizeof (INT) * iArraySize), 0, 0, 0) ;

npArray = (PINT) WinAllocMem (hHeap, (sizeof (INT) * iArraySize)) ;

WinCreateHeap sets up the heap in the automatic data segment. This is one difference between PM and the Microsoft Windows™ graphical environment. In Windows², a local heap "comes standard" with your application, but in PM it's an option your application has to order.

The variable npArray points to the first element in the array. If you feel uncomfortable with dynamic arrays, you can still use the familiar notation npArray[i] to reference element i. Each array element stores a CPU load, ranging in value from 0 to 100. When the client window receives a WM_PAINT message and calls paint_histogram, the histogram bars are redrawn. Each bar is scaled to the client window's height by scaling the array element value.

ptl.y = (npArray[i] / 100.) * yClientHeight ;
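For context, here is a sketch of the loop paint_histogram might use, drawing one one-pixel-wide bar per array element (the variable names are illustrative):

POINTL ptl ;
INT    i ;

for (i = 0 ; i < xClientWidth ; i++)
{
    ptl.x = (LONG) i ;                                      /* bar's column    */
    ptl.y = 0L ;
    GpiMove (hPS, &ptl) ;                                   /* start at bottom */
    ptl.y = (LONG) ((npArray [i] / 100.) * yClientHeight) ;
    GpiLine (hPS, &ptl) ;                                   /* draw up to load */
}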

When CPUMETER terminates, the destroy_histogram function releases the micro PS and frees the heap. The heap, being in the automatic data segment, would be destroyed by OS/2 if CPUMETER didn't do it. But it's a good habit to clean up after yourself.

To accommodate the histogram, the FCF_MINMAX and FCF_SHELLPOSITION flags were added to the flFrameFlags variable in main. The FCF_SHELLPOSITION flag tells PM to give our client window an initial size and position, so I removed the call to WinSetWindowPos that followed WinCreateStdWindow in CPUMET and CPUMETE.

Finally, you need the new function nearest_1_percent to compute the CPU load to the nearest 1 percent. This is used in the code that processes WM_SEM1 messages in ClientWndProc to update the histogram.

i = nearest_1_percent (...) ;
    .
    .
    .
update_histogram (i) ;

Conclusion

CPUMETER is fun to watch and instructive. If you have access to the sample programs in Programming the OS/2 Presentation Manager (Microsoft Press, 1989), try running CPUMETER with the samples from Chapter 17. You'll discover that multithreaded programs make better use of the CPU than their single-threaded counterparts. That is, the multithreaded applications load the CPU more heavily than their single-threaded versions do. CPUMETER shows this quite clearly.

On a warm summer night, start up CPUMETER, then bring up your favorite program and take it for a spin. And if you get the urge. . .redline it!

¹As used herein, "OS/2" refers to the OS/2 operating system jointly developed by IBM and Microsoft.

²For ease of reading, "Windows" refers to the Microsoft Windows graphical environment. "Windows" refers solely to this Microsoft product and is not intended to refer to such products generally.