Charles Petzold
Charles Petzold is a contributing editor of MSJ and PC Magazine. He is the author of Programming Windows (Microsoft Press, 1990).
Back in the very early days of the IBM® PC, I worked with some people who used a word processor called EasyWriter®. One day, someone came to me with a problem. A long document had been typed into EasyWriter over a period of several days, but now the program wouldn’t accept any more text. No matter what we tried to type into the document, EasyWriter would just beep.
I saved the file, exited EasyWriter, and got a directory listing. You can probably guess what I saw. The file was just about 65,536 bytes long. That beep we heard was in lieu of an error message that would have said something like “Maximum document length exceeded.”
Of course, that was not the last time I would encounter 64KB limits. I remember reviewing the first 256-color video board available with a driver for version 1.0 of the Microsoft® Windows™ operating system. It looked great and worked fine, except that Windows¹ would crash whenever I invoked a large drop-down menu in one particular program. The problem? The screen area that Windows was saving behind the menu required a bitmap larger than 64KB, which wasn’t supported under Windows 1.0.
Of course, bitmaps larger than 64KB are supported under Windows now, but 64KB limitations still exist. In Windows 3.0, many users encountered memory constraints related to system resources. It turned out that the 64KB data segment in the USER dynamic-link library was getting filled up. Windows 3.1 eases the problem by allocating a second 64KB segment for system resources, but that’s not exactly a long-term solution.
The origins of this 64KB barrier go back to the early days of the personal computer.
The first Intel® microprocessor widely used in small computers was the 8080. It was used extensively in machines running the CP/M® operating system. The 8080 contained 8-bit registers, some of which could be combined to form a 16-bit memory address. The maximum amount of memory that could be accessed by the 8080 was 2¹⁶ or 65,536 bytes.
Intel’s first 16-bit processors, the 8086 and 8088, were designed for backward compatibility with the 8080. Existing 8080 assembly-language source code could be run through a translator and converted to 8086 code. The 8088 was of course the processor used in the original IBM PC, and some of the early PC programs were ported from CP/M using such a translator.
Making the 8086 and 8088 backward compatible with the 8080 was accomplished in two basic ways:
First, although the internal registers of the 8086/8 were 16 bits wide, most registers consisted of two 8-bit registers that could be used independently. These 8-bit registers were thus compatible with the 8-bit 8080 registers.
Second, memory addressing in the 8086/8 also used 16-bit registers, but with a twist. To increase the memory address space beyond 65,536 bytes (already a limitation on CP/M machines), Intel devised a segmented memory scheme. The 8086 contained four 16-bit segment registers. To form a memory address, a 16-bit segment address was internally shifted left four bits and added to a 16-bit offset register. The resulting 20-bit address could access 1MB of memory. By keeping all the segment registers set to the same value, a program running on the 8086/8 could use memory just like a program running on the 8080. This was the origin of the COM file format popular during the early years of DOS.
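The arithmetic is easy to express in a few lines of C. This is just an illustrative sketch (the function name is mine); it shows why, for example, 1000:0000H and 0FFF:0010H refer to the very same byte of memory.

    #include <stdio.h>

    /* Real-mode addressing: the 16-bit segment is shifted left
       4 bits and added to the 16-bit offset, producing a 20-bit
       physical address (00000H through FFFFFH, or 1MB). */

    unsigned long PhysAddress(unsigned int seg, unsigned int off)
    {
        return ((unsigned long) seg << 4) + (unsigned long) off;
    }

    int main(void)
    {
        /* Different segment:offset pairs can address the same byte. */
        printf("1000:0000 -> %05lX\n", PhysAddress(0x1000, 0x0000));
        printf("0FFF:0010 -> %05lX\n", PhysAddress(0x0FFF, 0x0010));
        return 0;
    }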
The 80286 chip expanded physical memory address space from 1MB to 16MB when running in protected mode. This was accomplished by making the segment registers an index into a descriptor table, each entry of which contained a 24-bit base address. The 16-bit offset address was then internally added to the base address when accessing memory.
Despite the migration from the 8080 to the 8086/8 to the 80286, and the increase in memory space from 64KB to 1MB to 16MB, the 64KB barrier remains, and it is easily the most hated nuisance for programmers working in MS-DOS or Windows.
As we use the personal computer for ever more varied and ambitious tasks, 65,536 has come to seem like an absurdly small number.
Think about the memory required to store a complete graphics screen image. For the CGA with 640 × 200 resolution and 2 colors, only 16KB was required. For the EGA (640 × 350 with 16 colors), we jumped up to 112,000 bytes. For the standard VGA (640 × 480 with 16 colors), it’s 150KB. For a common super-VGA resolution (800 × 600 with 256 colors), we’re up to the size of a large application—480,000 bytes. And for adapters such as the IBM 8514/A (1024 × 768 with 256 colors), it’s a whopping 768KB. Try fitting that in memory under DOS!
Or consider multimedia sound sampling and playback. Even at low-resolution sound sampling—monophonic with 11,025 samples per second and 8 bits per sample—you exceed a 64KB segment in less than 6 seconds. For CD-quality sound, you’re dealing with 44,100 samples per second, 16 bits per sample, and stereo. That comes to 176,400 bytes for every second of sound.
And let’s talk about full-blown full-screen full-motion VGA video. That’s 640 × 480 pixels, 24-bit color, and 30 frames per second: 26MB of uncompressed data for one second of video!
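If you want to check these numbers, the arithmetic fits in a few lines of C (a throwaway sketch of mine, not one of this article’s figures):

    #include <stdio.h>

    int main(void)
    {
        /* 800 x 600 pixels at 256 colors is 1 byte per pixel. */
        long lSuperVga = 800L * 600L;

        /* CD audio: 44,100 samples/sec x 2 bytes/sample x 2 channels. */
        long lCdAudio = 44100L * 2 * 2;

        /* Video: 640 x 480 pixels x 3 bytes/pixel x 30 frames/sec. */
        long lVideo = 640L * 480L * 3 * 30;

        printf("Super-VGA screen: %ld bytes\n", lSuperVga);   /* 480,000 */
        printf("CD audio: %ld bytes/sec\n", lCdAudio);        /* 176,400 */
        printf("24-bit video: %ld bytes/sec\n", lVideo);      /* 27,648,000 */
        return 0;
    }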
To access memory blocks greater than 64KB in C, the huge memory block has been introduced in many compilers. By using the huge keyword for an array or pointer, a C programmer can tell the compiler that it must deal with a memory block larger than 64KB. The compiler handles huge memory blocks by using multiple segments and by generating code that properly adjusts segment addresses when calculating pointers. (You’ll see an example of this below.)
However, huge memory blocks do not entirely solve the problems of the 64KB barrier. First, the byte size of the elements of a huge array should be a power of two. Otherwise, a huge array is limited to 128KB. (In other words, an array element cannot straddle a segment boundary.) There are other problems as well: the code generated by the compiler for huge pointers can really hurt performance, and many C library functions do not accept huge pointers.
The answer to the 64,000-byte question is actually quite simple: we need 32-bit operating systems and 32-bit applications. We need to move from a 65,536-byte limit to a 4,294,967,296-byte limit.
Although personal computers built around Intel’s 32-bit 80386 microprocessor have been around since the fall of 1986, the software half of the industry has been slow to catch up. A version of UNIX® for the 80386 was available quickly, of course, but OS/2® 2.0 didn’t arrive until earlier this year, and Microsoft’s solution, the Windows NT™ operating system, is currently in beta.
These three operating systems all use a flat memory model. From the viewpoint of an application, the memory address space is accessed simply, using 32-bit pointers without any segmentation. (Segments are used within the operating system for interprocess protection.)
For C programmers working in 32-bit operating systems, the size of an int is promoted from 16 bits to 32 bits. A short remains 16 bits and a long remains 32 bits. There is only one memory model, and concepts such as near, far, and huge disappear.
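You can verify this with a trivial program compiled under both environments; the comments show the results you should expect:

    #include <stdio.h>

    int main(void)
    {
        printf("short: %u bytes\n", (unsigned) sizeof(short));  /* 2 in both environments */
        printf("int:   %u bytes\n", (unsigned) sizeof(int));    /* 2 in 16-bit, 4 in 32-bit */
        printf("long:  %u bytes\n", (unsigned) sizeof(long));   /* 4 in both environments */
        return 0;
    }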
Of course, moving from a 16-bit environment to a 32-bit environment is a big change for the computer industry. Is it really necessary? After all, there have been ways to get around the 64KB barrier. If you want to use 32-bit integers, you just define them as long rather than int. If you want to use arrays larger than 64KB, you can define them as huge, or you can use halloc (or, in Windows, GlobalAlloc).
The reason to move to real 32-bit environments is simple: performance.
One problem with high-level languages is that they often obscure what goes on behind the scenes. You can write C code that looks clean, tight, and efficient, but what does that matter if the compiler is forced to generate clunky machine code? Yet this is precisely what happens when you use 32-bit integers or 32-bit addressing in a program targeted for a 16-bit environment.
Let’s examine some simple code and see the differences you can expect between 16-bit and 32-bit compiles. In the examples that follow, I used Microsoft C/C++ 7.0 for 16-bit MS-DOS code generation. For 32-bit Windows NT code generation, I used the 32-bit compiler included with the second prerelease of the Win32™ SDK. In both cases, I specified maximum optimization using the -Ox compiler flag. The assembly-language listings shown here were generated by the compiler (or obtained from the CodeView® debugger), but I’ve cleaned them up and added comments for readability.
Figure 1 shows a very simple C program, MULT16.C, that multiplies two 32-bit integers and stores the result. Unfortunately, 16-bit 80x86 assembly language does not include a 32-bit multiply instruction. The best you can do is a 16-bit multiply that produces a 32-bit result.
The compiler is forced to compensate for this deficiency by using the 16-bit multiply instruction three times and then adding the results. Both 32-bit operands must be broken into two 16-bit words. The two low words must be multiplied, and the low word of each operand must be multiplied by the high word of the other.
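Expressed in C, the technique looks something like the sketch below. This is my own illustration of the method, not the actual library routine; in the generated code, each product of two 16-bit words is a single MUL instruction.

    /* Form a 32-bit product from 16-bit multiplies. Writing the
       operands as (ah * 10000H + al) and (bh * 10000H + bl), the
       ah * bh term and the high words of the cross products fall
       entirely outside the low 32 bits and can be dropped. */

    unsigned long Mul32(unsigned long a, unsigned long b)
    {
        unsigned short al = (unsigned short) (a & 0xFFFF);
        unsigned short ah = (unsigned short) (a >> 16);
        unsigned short bl = (unsigned short) (b & 0xFFFF);
        unsigned short bh = (unsigned short) (b >> 16);

        unsigned long ulLow   = (unsigned long) al * bl;   /* low x low   */
        unsigned long ulCross = (unsigned long) al * bh
                              + (unsigned long) ah * bl;   /* cross terms */

        return ulLow + (ulCross << 16);
    }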
Compiling this for a 16-bit environment such as MS-DOS or Windows generates the assembly language code shown in Figure 2. Under the assumption that 32-bit multiplies might be fairly common in some programs, a library function does all the hard work. I don’t know about you, but the idea that my program is making a function call every time I perform a 32-bit multiply is quite unnerving to me.
The 32-bit code shown in Figure 3 is by contrast a model of simplicity—the type of code you expect the compiler to generate.
Figure 4 shows another simple program, INIT16.C, that initializes a 256KB array with a constant value (in this case, the byte 55H). Since the array is larger than 64KB, the array must be defined as huge in a program targeted for 16-bit environments.
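In outline, the program looks something like this sketch (mine, not Figure 4 itself, using Microsoft C’s halloc to obtain the huge block):

    #include <malloc.h>

    #define SIZE 262144L    /* 256KB, spanning four 64KB segments */

    int main(void)
    {
        /* halloc allocates a huge block; the huge qualifier tells
           the compiler to generate segment-crossing arithmetic
           for every pointer calculation. */
        char huge *array = (char huge *) halloc(SIZE, 1);
        long l;

        if (array == NULL)
            return 1;

        for (l = 0; l < SIZE; l++)
            array[l] = 0x55;

        hfree(array);
        return 0;
    }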
In a real-mode environment such as MS-DOS, a 256KB array actually occupies four consecutive 64KB segments in memory. But it cannot be accessed by a continuous range of incremented pointers. For example, if the first 64KB segment has a segment address of 1000H, then that segment must be referenced by the addresses 1000:0000H through 1000:FFFFH. The second segment in the array requires addresses 2000:0000H through 2000:FFFFH, and so forth. Hopping from segment to segment requires some special overhead, as shown in Figure 5.
In a flat 32-bit environment, the array can be accessed much more simply with a linear range of pointers. The source code shown in Figure 6 is the same as that in Figure 4 but without the huge keyword.
The 32-bit assembly-language code is shown in Figure 7. Again, it’s a model of simplicity and does exactly what you expect without any overhead.
The real-life speed improvements that you’ll experience when moving from 16-bit to 32-bit will depend on your code, of course. If you do a lot of work with 32-bit integers and arrays greater than 64KB, the results may be dramatic.
To illustrate just how dramatic, I wrote two versions of a small program—one 16-bit and the other 32-bit—that performs a simple bubble sort on an array (see Figures 8 and 9). For both programs, the elements of the array were 32-bit integers, initialized with 32-bit random integers.
The program sorted arrays of two sizes, one with 16,000 elements (for a total size of 64,000 bytes) and the other with 17,000 elements (for a total size of 68,000 bytes).
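Stripped of the timing and initialization code, the heart of both programs is the textbook algorithm. This sketch shows the 16-bit form; for the 32-bit compile, the huge keyword simply goes away.

    /* Bubble sort an array of 32-bit integers. At 17,000 elements
       the array occupies 68,000 bytes, so the 16-bit version must
       declare the pointer huge; the 32-bit version needs no
       special keyword at all. */

    void BubbleSort(long huge *alNum, long lCount)
    {
        long i, j, lTemp;

        for (i = 0; i < lCount - 1; i++)
            for (j = 0; j < lCount - 1 - i; j++)
                if (alNum[j] > alNum[j + 1])
                {
                    lTemp        = alNum[j];
                    alNum[j]     = alNum[j + 1];
                    alNum[j + 1] = lTemp;
                }
    }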
With the 16,000-element array, the performance advantage of 32-bit code derives mostly from using 32-bit compares instead of two 16-bit compares. The 16-bit version clocked in at 86 seconds and the 32-bit version took 62 seconds.
For the 17,000-element array, the 32-bit version ran a little longer, of course—72 seconds. This is roughly consistent with what you’d expect from a bubble sort, whose running time grows with the square of the array size: (17,000 / 16,000)² × 62 seconds is about 70 seconds.
Because an array of 17,000 32-bit integers is larger than 64KB, the 16-bit version required a huge array, and the time leapt to an unbearable 366 seconds—five times the duration of the 32-bit version (see Figure 10).
Your mileage may vary, of course, but the potential of five-to-one performance improvements—equivalent to boosting a 20MHz machine to 100MHz—is simply the best single reason for moving to 32-bit environments.
Figure 10 Sort Times for 16- versus 32-bit Code

    Array size           16-bit version    32-bit version
    16,000 elements      86 seconds        62 seconds
    17,000 elements      366 seconds       72 seconds
¹ For ease of reading, "Windows" refers to the Microsoft Windows operating system. Windows is a trademark that refers only to this Microsoft product.