Does Size Matter?
The native type of a CPU is the size of an integral value that the CPU works with most efficiently. On the 386 and later, the preferred size is 32 bits. However, Intel CPUs can also work with 16-bit WORD values and 8-bit BYTE-sized values.
If you're one of those wacky funsters who's read the Intel architecture manual (and stayed awake), you might have noticed that many instructions have two forms. One works on byte-sized operands, while the other works with the native-size operand. On a current chip running a 32-bit operating system, this would be a 32-bit DWORD. When Windows 95 thunks down to 16-bit code, the preferred size is 16 bits.
Now, if the preferred size is 32 bits, how can the CPU work with 16-bit values? The answer is the operand size prefix. This 1-byte value (0x66 if you're curious) precedes the instruction's opcodes and toggles the native data size for that instruction. If you're running in 32-bit mode, and you use the operand size prefix, the instruction will operate on 16-bit WORDs. Likewise, if you're in 16-bit code, but you specify the size prefix, you'll use 32-bit operands.
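To make the size cost concrete, here's a small C sketch that stores two instruction encodings as byte arrays, as they would appear in 32-bit code. The choice of an INC instruction and the helper names (has_operand_size_prefix, prefix_overhead) are assumptions made for illustration; they aren't taken from NativeSize. The address is borrowed from the listing discussed later in this sidebar. The prefix check is a simplification: it only looks at the first byte, while real code may place other prefixes ahead of 0x66.

```c
#include <assert.h>
#include <stddef.h>

#define OPERAND_SIZE_PREFIX 0x66

/* inc dword ptr [0x0040AC2C] in 32-bit code:
   opcode FF, ModRM 05 (/0, 32-bit absolute address), then the address,
   least significant byte first. */
const unsigned char inc_dword[] = { 0xFF, 0x05, 0x2C, 0xAC, 0x40, 0x00 };

/* inc word ptr [0x0040AC2C] in 32-bit code:
   the same bytes, preceded by the operand size prefix. */
const unsigned char inc_word[] = { OPERAND_SIZE_PREFIX,
                                   0xFF, 0x05, 0x2C, 0xAC, 0x40, 0x00 };

/* Returns 1 if the first byte of the encoding is the operand size prefix.
   Simplified: other prefixes that may come before 0x66 are not handled. */
int has_operand_size_prefix(const unsigned char *code, size_t len)
{
    assert(code != NULL);
    return len > 0 && code[0] == OPERAND_SIZE_PREFIX;
}

/* Extra bytes the WORD form costs compared with the DWORD form. */
size_t prefix_overhead(void)
{
    return sizeof inc_word - sizeof inc_dword;
}
```

In this sketch the WORD form is 7 bytes and the DWORD form is 6, so each prefixed instruction costs one extra byte of code.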
The thing about these prefix bytes is that they increase the total code size and potentially slow things down. The amount of slowdown primarily depends on the CPU architecture you're running under. To test the potential performance degradation, I wrote the NativeSize program shown in Figure A.
NativeSize doesn't do anything spectacular. My goal was to use both 16-bit WORDs and 32-bit DWORDs in an identical manner to compare the relative times. Because instructions execute so quickly, I had to repeat the operations many times to get a measurable interval. Eventually, I settled on using a pair of nested for loops, with the WORDs and DWORDs acting as the counters. I had to use a nested loop since the largest value an unsigned 16-bit loop counter can hold is 65535, which is too few iterations to time reliably.
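The nested-loop idea can be sketched in portable C roughly as follows. This is not the NativeSize source from Figure A. The function names, the use of clock() for timing, and the volatile global counters are all assumptions made for this sketch. The counters are volatile so the compiler has to keep them in memory and can't throw away the empty loops. Whether a given compiler actually emits prefixed 16-bit instructions for the WORD version depends on the compiler and its options, so check the generated code before drawing conclusions from the timings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Loop counters live in memory (volatile globals) so every increment
   and comparison is a real memory access of the stated width. */
static volatile uint16_t w_outer, w_inner;   /* 16-bit WORD counters  */
static volatile uint32_t d_outer, d_inner;   /* 32-bit DWORD counters */

/* Runs outer * inner iterations using 16-bit counters.
   Stores the iteration count in *iterations and returns elapsed
   processor time in seconds as reported by clock(). */
double time_word_loops(uint16_t outer, uint16_t inner, uint32_t *iterations)
{
    uint32_t count = 0;
    clock_t start, end;

    assert(iterations != NULL);
    start = clock();
    for (w_outer = 0; w_outer < outer; w_outer++)
        for (w_inner = 0; w_inner < inner; w_inner++)
            count++;
    end = clock();

    *iterations = count;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Same loops, same limits, but with 32-bit counters. */
double time_dword_loops(uint16_t outer, uint16_t inner, uint32_t *iterations)
{
    uint32_t count = 0;
    clock_t start, end;

    assert(iterations != NULL);
    start = clock();
    for (d_outer = 0; d_outer < outer; d_outer++)
        for (d_inner = 0; d_inner < inner; d_inner++)
            count++;
    end = clock();

    *iterations = count;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Both functions take 16-bit limits so that the two versions do exactly the same amount of work. Even at the maximum limits of 65535 by 65535, the iteration count still fits in a 32-bit value.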
Here's the compiler-generated code for the WORD version of the inner loop. Note that the two instructions that reference the counter at address 0x0040AC2C explicitly use "WORD PTR" (and hence, have a size prefix). The code for the 32-bit version of the loop (not shown) looks identical, except that it uses "DWORD PTR" sized counters.