Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for Windows

Dale Rogerson
Microsoft Developer Network Technology Group

Created: July 10, 1992

Abstract

One of the most shocking things that a first-time programmer for Windows has to learn is not to use malloc but to use special Microsoft® Windows™ memory allocation functions such as GlobalAlloc, GlobalReAlloc, GlobalLock, GlobalUnlock, and GlobalFree. The reasons for requiring special memory allocation functions have mostly gone away with the demise of real mode. In fact, Microsoft C/C++ version 7.0 brings us almost full circle, because the preferred method for memory allocation is the large-model version of malloc, or _fmalloc. Even the C startup code now uses malloc to allocate space for the environment.

This article discusses the behavior of malloc supplied with Microsoft C/C++ version 7.0. The article focuses on programming for the protected modes—standard and enhanced—of Microsoft Windows version 3.1. The following topics are discussed:

_nmalloc: Why _fmalloc is not the same

History: Why _fmalloc was bad

Subsegment Allocation: Why _fmalloc is good

_ffree: Why _fmalloc is not perfect

DLLs: Why _fmalloc may not do what you want

Versatility: Why _fmalloc is not for everything

The information for this article was gleaned from the C/C++ version 7.0 compiler run-time library source code.

To interactively explore the behavior of _fmalloc, the Smart Alloc (SMART.EXE) sample application is provided. Smart Alloc is best used in conjunction with Heap Walker, which shows the exact state of the global segments allocated. Segments allocated with GlobalAlloc (or _fmalloc) are listed by Heap Walker as having a type designation of "Private." Smart Alloc has a dynamic-link library (DLL) that intercepts all calls to GlobalAlloc, GlobalFree, and GlobalReAlloc made by Smart Alloc or the C run-time library and prints messages with OutputDebugString to the debugging console. It is usually most convenient to use DBWIN.EXE to view these messages.

_nmalloc: Why _fmalloc Is Not the Same

When compiling with the large data model libraries (compact-, large-, and huge-model programs), malloc is automatically mapped to _fmalloc. In other memory models, the programmer must explicitly call _fmalloc, because malloc maps to _nmalloc in these memory models.

_nmalloc functions differently from _fmalloc. _nmalloc directly maps to LocalAlloc with the LMEM_NODISCARD | LMEM_FIXED flags. _nfree directly calls LocalFree. Because _nmalloc allocates fixed memory blocks, it can lead to fragmentation of the local heap.

History: Why _fmalloc Was Bad

Before Microsoft® Windows™ version 3.1, programmers had to worry about compatibility with Windows-based real mode, which required the locking and unlocking of memory handles to support movable memory. A locked block in real mode is fixed in memory, and leaving blocks locked would result in performance degradation. The way _fmalloc is defined meant that an allocated block would have to be locked throughout its lifetime. When Microsoft C version 6.0 was released, real mode was the only mode in Windows; therefore, _fmalloc was designed to work under real mode.

Microsoft C/C++ version 7.0 was designed to develop protected-mode applications for Windows. In protected mode, there is no penalty for locking a memory handle and leaving it locked. It is not even necessary to retain the handle returned from GlobalAlloc, because the GlobalHandle function returns the handle for a selector obtained from GlobalLock. Macros defined in WINDOWSX.H simplify the process of getting a pointer to a block of memory. The GlobalAllocPtr and GlobalFreePtr macros automatically lock and unlock a memory block.

Microsoft C/C++ version 7.0 takes advantage of the new freedom allowed by protected mode. _fmalloc can now leave memory blocks locked with no penalty under the two protected modes of Windows version 3.x.

Subsegment Allocation: Why _fmalloc Is Good

One of the current limitations of Windows version 3.x is the systemwide limit of 8192 selectors (4096 for standard mode). Each call to GlobalAlloc uses one selector and has an overhead of 32 bytes, which makes GlobalAlloc inappropriate for allocating many small blocks of memory.

For example, take a flat file database that reads in a list of names and addresses from the hard disk and puts them in a binary tree. If GlobalAlloc is called for each name and address, this program would not be able to store more than 4096 names in standard mode. Many companies have more than 4096 employees. In fact, the actual number of available selectors is far less than 8192 because all Windows-based applications and libraries must share the same pool of selectors.

_fmalloc implements a much more intelligent use of selectors. Instead of allocating a new segment for each memory request, _fmalloc tries to satisfy as many requests as possible using a single segment. _fmalloc expands the segment as needed and returns pointers to areas of memory within the segment. This process of managing memory within a segment is called subsegment allocation.
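The saving in selectors can be estimated with a little arithmetic. The sketch below is an idealized model, not the run-time library's code: it assumes every record has the same size, ignores all heap bookkeeping overhead, and fills each segment up to the 65,510-byte _HEAP_MAXREQ limit discussed later in this article. The function names are hypothetical.

```c
/* Idealized selector estimate. Assumption: no per-block or per-segment
   overhead is counted, so the subsegment figure is a lower bound. */
#define HEAP_MAXREQ 65510UL   /* value of _HEAP_MAXREQ quoted in this article */

/* One GlobalAlloc per record: one selector per record. */
unsigned long selectors_with_globalalloc(unsigned long records)
{
    return records;
}

/* Subsegment allocation: records are packed into as few segments as possible. */
unsigned long selectors_with_subsegments(unsigned long records,
                                         unsigned long record_bytes)
{
    unsigned long per_segment;

    if (records == 0)
        return 0;
    if (record_bytes == 0 || record_bytes > HEAP_MAXREQ)
        return 0;                               /* outside this simple model */

    per_segment = HEAP_MAXREQ / record_bytes;   /* records that fit in one segment */
    return (records + per_segment - 1) / per_segment;   /* round up */
}
```

Under these assumptions, 5000 records of 100 bytes each would need 5000 selectors with one GlobalAlloc per record, but only 8 when packed into segments.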

In the first call, _fmalloc allocates a segment with GlobalAlloc using GMEM_MOVEABLE. (GMEM_SHARE, also set when compiling dynamic-link libraries [DLLs], will be examined in the section on DLLs.) The block allocated by _fmalloc is, therefore, not fixed in memory; it is movable. The selector associated with this block of memory will not change. However, because _fmalloc returns a pointer to a location within the segment, the pointer will not have an offset of zero (selector:0), as a pointer obtained from GlobalAlloc and GlobalLock does.

In the next call, _fmalloc first tries to satisfy the request without allocating any memory. If this is not possible, it attempts to do a GlobalReAlloc instead of a GlobalAlloc. This reduces the number of selectors used by the program. If the segment size must grow larger than the _HEAP_MAXREQ constant defined in malloc.h to meet the allocation request, GlobalAlloc is called again. _HEAP_MAXREQ is defined to be 0x0FFE6 or 65,510 bytes. This leaves enough room for the overhead needed to manage the heap and not have any memory crossing a segment boundary. If more than _HEAP_MAXREQ memory is requested, the _fmalloc call returns a null pointer.
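The sequence of decisions described above can be summarized as a small function. This is a simplified model for illustration only: it looks at a single segment, treats the free space as one contiguous region, and ignores the rounding of growth to multiples of _amblksiz. The enum and function names are hypothetical and do not appear in the run-time library.

```c
#define HEAP_MAXREQ 65510UL   /* 0x0FFE6, as defined in malloc.h per this article */

enum heap_action {
    ACT_FAIL,          /* request exceeds _HEAP_MAXREQ: null pointer returned */
    ACT_NEW_SEGMENT,   /* GlobalAlloc: consumes another selector */
    ACT_REUSE,         /* satisfied from space already in the segment */
    ACT_GROW           /* GlobalReAlloc: same selector, larger segment */
};

/* Model of the choice made for one request against one existing segment.
   segment_size == 0 means no segment has been allocated yet. */
enum heap_action next_action(unsigned long request,
                             unsigned long free_in_segment,
                             unsigned long segment_size)
{
    if (request > HEAP_MAXREQ)
        return ACT_FAIL;
    if (segment_size == 0)
        return ACT_NEW_SEGMENT;            /* first call */
    if (request <= free_in_segment)
        return ACT_REUSE;                  /* no call into Windows needed */
    if (segment_size + (request - free_in_segment) <= HEAP_MAXREQ)
        return ACT_GROW;                   /* segment can still grow */
    return ACT_NEW_SEGMENT;                /* growing would pass the limit */
}
```

For example, a 5000-byte request against a 4096-byte segment with 100 bytes free yields ACT_GROW in this model, while a 30,000-byte request against a full 60,000-byte segment yields ACT_NEW_SEGMENT.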

Figures 1 and 2 illustrate the behavior of _fmalloc.

Figure 1. _fmalloc versus GlobalAlloc

Figure 1 illustrates how _fmalloc satisfies several memory requests with one segment consuming only one selector when the requested blocks are less than _HEAP_MAXREQ. Each call to GlobalAlloc, on the other hand, uses up a selector.

Figure 2. _fmalloc subsegment allocation

Figure 2 shows how _fmalloc allocates a new segment when it cannot satisfy a request with the old segment because the requested block would cause the segment to grow larger than _HEAP_MAXREQ. Notice how neither GlobalAlloc nor _fmalloc allocates exactly the number of bytes that are requested. Both functions have some overhead. The current version of _fmalloc requires 22 bytes of overhead on top of the overhead of GlobalAlloc. It also defines the smallest segment size to be 26. Future versions of _fmalloc may require more or less overhead. _fmalloc also returns a pointer that is guaranteed to be aligned on double-word boundaries.

_fmalloc attempts to be more efficient than GlobalAlloc by allocating memory from Windows in chunks, hoping to satisfy several memory requests while using only one selector and without needing to call GlobalAlloc or GlobalReAlloc again. In some cases, this can lead to faster speeds.

The amount of memory that _fmalloc initially allocates to a new segment is rounded up to the next 4K boundary. If 4070 bytes (4096 - 26) or fewer are requested, 4K is allocated. If 4071 bytes (4096 - 26 + 1) are requested, 8K is allocated. This behavior differs from the explanation in the Microsoft C/C++ version 7.0 Run-Time Library Reference, which states that the initial requested size for a segment is just enough to satisfy the allocation request.
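The rounding rule can be written as a one-line calculation. The following sketch assumes that the 26 bytes mentioned above are simply added to the request before rounding up to a multiple of 4K; that formula is inferred from the thresholds given in this article rather than copied from the library source, and the function name is hypothetical.

```c
#define SEGMENT_OVERHEAD 26UL    /* assumption: bytes reserved in each new segment */
#define INITIAL_CHUNK    4096UL  /* initial segments are sized in 4K steps */

/* Size passed to GlobalAlloc for a new segment, under the assumption above. */
unsigned long initial_segment_size(unsigned long request)
{
    unsigned long needed = request + SEGMENT_OVERHEAD;

    /* Round up to the next multiple of 4K. */
    return (needed + INITIAL_CHUNK - 1) / INITIAL_CHUNK * INITIAL_CHUNK;
}
```

With this formula, requests of 1 byte and 4070 bytes both give 4096, and a request of 4071 bytes gives 8192.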

When _fmalloc can satisfy a request by growing the segment, it calls GlobalReAlloc. The global variable _amblksiz determines the amount by which the segment is grown. _fmalloc will grow the segment in enough multiples of _amblksiz to satisfy the request. The default value of _amblksiz is 4K for Windows, instead of the 8K used by MS-DOS®. You can set _amblksiz to any value, but _fmalloc rounds it up to the nearest whole power of two before it is used.
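As a rough illustration of the growth rule, the sketch below first rounds a block-size setting up to a power of two and then computes how many bytes a segment would grow by to cover a given shortfall. It is a model of the behavior described here, not the library's implementation; the function names and the shortfall parameter are hypothetical.

```c
/* Round n up to the nearest power of two (a value that is already a
   power of two is returned unchanged; 0 is treated as 1). */
unsigned long round_up_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes by which the segment grows: the smallest multiple of the rounded
   block size that covers the shortfall. */
unsigned long growth_bytes(unsigned long shortfall, unsigned long amblksiz)
{
    unsigned long step = round_up_pow2(amblksiz);

    return (shortfall + step - 1) / step * step;
}
```

In this model, a setting of 3000 is treated as 4096, so a shortfall of 100 bytes grows the segment by 4096 bytes, and a shortfall of 4097 bytes with the default 4K setting grows it by 8192 bytes.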

The sample application, Smart Alloc (SMART.EXE), can be used to explore the behavior of _fmalloc in detail. Examine Smart Alloc's Help file for more information on using it. Try allocating 1 byte of memory. _fmalloc calls GlobalAlloc with a size of 4K. Try allocating 4070 bytes and 4071 bytes. Smart Alloc also lets you experiment with different values of _amblksiz.

The frugal behavior of _fmalloc makes it suited to allocating bunches of small memory objects. However, as will be shown in the next section, _fmalloc is not suitable for all uses.

_ffree: Why _fmalloc Is Not Perfect

While the subsegment allocation scheme employed by _fmalloc is very good, the behavior of _ffree is not as straightforward as GlobalFree. Knowledge of this behavior is very important to avoid wasting large amounts of memory. The following example illustrates the behavior of _ffree.

Note:

In Figures 3 through 7, it is possible for Selector 3 to have a lower or higher value than Selector 1. The number indicates in what order the selectors were allocated.

Figure 3. Freed segments are not GlobalFree'd

In Figure 3, the last block allocated has been freed. However, its memory is not returned to the system.

Figure 4. Freed blocks are not reallocated

In Figure 4, the first and fourth blocks of memory are freed in addition to Block 5. Again, no memory is returned to Windows with a GlobalFree. If _fmalloc returned the memory for the first block to Windows, the pointer to Block 2 would have to change. It would be possible for _fmalloc to GlobalReAlloc the memory associated with Selector 2 and GlobalFree the memory associated with Selector 3. This can be accomplished with the C/C++ run-time library, as will be explained in conjunction with Figure 7.

Figure 5. Figure 4 followed with an _fmalloc(x/2)

In Figure 5, a new block has been allocated. Because this block is half the size of the previous first block, _fmalloc places it in this empty block of Selector 1.

Figure 6. Figure 5 followed with an _fmalloc(2 * x)

In Figure 6, another block of memory is allocated. This time it is twice the size of the previous blocks of memory. Because this block is too large to fit into the heap associated with Selector 2, the memory associated with Selector 3 is reallocated to hold it.

Figure 7. Figure 4 followed by _heapmin

If memory is set up as in Figure 4, calling _heapmin will leave memory in the state shown by Figure 7. _heapmin performs the following actions to achieve this state:

Memory associated with Selector 1 is GlobalReAlloc'ed to remove the padding.

Selector 2's memory is GlobalReAlloc'ed to remove the freed block and padding.

GlobalFree releases Selector 3 and all of its memory.

To recreate the previous examples with Smart Alloc, use 22,000 bytes for the size x. It is important to note that Smart Alloc sorts allocated memory by handle (that is, selector) and not the order in which it was allocated.

In addition to _heapmin, the C compiler run-time library contains many other functions to help manage the heap created by _fmalloc. Descriptions of these functions are in the Microsoft C/C++ version 7.0 Run-Time Library Reference. Like _heapmin, most of these functions are unique to C/C++ version 7.0 and are not ANSI C compatible. Below is a list of these unique functions:

Reallocation functions:
_fexpand	Expands or shrinks a block of memory without moving its location.
_frealloc	Reallocates a block to a new size. Might move the block of memory.
_heapadd	Adds memory to a heap.
_heapmin	Releases unused memory in a heap.
Information functions:
_fmsize	Returns size of an allocated block.
_fheapwalk	Returns information about each entry in a heap.
Debugging functions:
_fheapset	Fills free heap entries with a specified value.

All programmers who decide to use _fmalloc must be aware that _ffree does not return memory to the operating system. For example, an application might read in an entire text file and display it on the screen. Let's say that the application keeps a linked list of lines and mallocs the memory for each line in the file. If the user selects a large file of about 1 megabyte (MB), the application allocates at least 1 MB of memory. The user then closes the file. The application faithfully calls _ffree for each line in the file. Even though the application does not need the memory, it is still hogging it from the system. This application needs to call _heapmin or one of the other heap management functions.
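The text-file scenario can be mimicked with a toy bookkeeping model. The code below does not call the real heap functions. It only tracks two counters, the bytes obtained from the system and the bytes currently in use, and it ignores the 64K segment limit and all overhead. Every name in it is hypothetical.

```c
#define TOY_CHUNK 4096UL   /* assumption: memory is obtained in 4K steps */

struct toy_heap {
    unsigned long held;    /* bytes obtained from the system */
    unsigned long in_use;  /* bytes currently handed out to the program */
};

static unsigned long toy_round_up(unsigned long n)
{
    return (n + TOY_CHUNK - 1) / TOY_CHUNK * TOY_CHUNK;
}

/* Models _fmalloc: obtains more memory only when the heap is too small. */
void toy_alloc(struct toy_heap *h, unsigned long n)
{
    h->in_use += n;
    if (h->in_use > h->held)
        h->held = toy_round_up(h->in_use);
}

/* Models _ffree: the block becomes reusable, but nothing is given back. */
void toy_free(struct toy_heap *h, unsigned long n)
{
    h->in_use = (n > h->in_use) ? 0 : h->in_use - n;
}

/* Models _heapmin: unused memory is returned to the system. */
void toy_heapmin(struct toy_heap *h)
{
    h->held = toy_round_up(h->in_use);
}

/* Open a file of `lines` lines, close it, and report the bytes still held. */
unsigned long held_after_closing_file(unsigned long lines,
                                      unsigned long line_bytes,
                                      int call_heapmin)
{
    struct toy_heap h = { 0, 0 };
    unsigned long i;

    for (i = 0; i < lines; i++)
        toy_alloc(&h, line_bytes);
    for (i = 0; i < lines; i++)
        toy_free(&h, line_bytes);
    if (call_heapmin)
        toy_heapmin(&h);
    return h.held;
}
```

For 10,000 lines of 100 bytes each, the model still holds about 1 MB after every line has been freed, and holds nothing once the _heapmin step is included. In a real C/C++ 7.0 program, the corresponding step would be a call to _heapmin after the last _ffree.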

Why doesn't _ffree call GlobalFree? There are two main reasons:

Speed. It is faster to keep the memory allocated than to repeatedly call GlobalAlloc, GlobalReAlloc, and GlobalFree. _fmalloc calls can be extremely fast when _fmalloc only has to return a pointer to an existing block of memory.

Pointers. _fmalloc returns pointers to an offset inside a segment. _fmalloc would have to move the memory pointed to by these pointers if it were to actually call GlobalFree to free the memory. It is not possible for _fmalloc or _ffree to update all the pointers into its heap.

Note:

All memory (freed and unfreed) is returned to the system as part of the Windows kernel's normal clean-up process when the application exits.

DLLs: Why _fmalloc May Not Do What You Want

As mentioned above, when _fmalloc must allocate a segment, it makes a call to GlobalAlloc. For applications, it allocates the segment as GMEM_MOVEABLE. For DLLs, _fmalloc calls GlobalAlloc with GMEM_SHARE | GMEM_MOVEABLE flags. _fmalloc maintains only one heap for a DLL, which is shared by all applications that use the DLL. In most cases, programmers do not really want the memory allocated from a DLL marked as GMEM_SHARE.

The GMEM_SHARE flag tells Windows that this memory is going to be shared by several programs. The most immediate consequence of using GMEM_SHARE in a DLL is that the memory will not be released until the DLL is unloaded from memory. The DLL is not always unloaded from memory when the application that loads it exits. Because multiple applications or instances of an application are using a DLL, the DLL and its memory will not be unloaded until all applications using the DLL have exited.

The following are the possible times when memory is freed:

If an application allocates memory and does not free it, the memory is freed by Windows when the application exits.

If an application calls a DLL that allocates memory without the GMEM_SHARE flag (via GlobalAlloc), the memory is owned by the application and will be freed when the application exits.

If an application calls a DLL that allocates memory with the GMEM_SHARE flag, the memory will be owned by the DLL and not by the application. The memory will be released when the DLL is unloaded and not when the application exits.

If a programmer is not careful, the use of _fmalloc in a DLL can lead to large pools of allocated but unneeded memory. It is usually best to use the GMEM_SHARE flag only when memory must be shared or must exist for the lifetime of the DLL. This means that, in many cases, GlobalAlloc should be used instead of _fmalloc in a DLL.

Remember, calling _ffree does not generate a call to GlobalFree. Even if the DLL is freeing memory before it returns to the application, memory can be wasted. Refer to the previous section on _ffree for more information.

The situations listed above can be demonstrated by using the Smart Alloc sample application. Perform the following steps:

1. Run Heap Walker (HEAPWALK.EXE).

2. Run an instance of Smart Alloc (SMART.EXE).

3. GlobalAlloc 1000 bytes of movable memory from a DLL. (See the Smart Alloc help file for details on how to do this.)

4. Walk the global heap using Heap Walker and examine the listing. The memory allocated in step 3 should be owned by Smart Alloc. It will differ slightly in size due to the overhead and padding performed by GlobalAlloc.

5. GlobalAlloc 2000 bytes of shared memory from a DLL.

6. Walk the global heap using Heap Walker and examine the listing. The memory allocated in step 5 should be owned by SMARTDLL.DLL. It will differ slightly in size due to the overhead and padding performed by GlobalAlloc.

7. Run a second instance of Smart Alloc. Do not exit the first instance.

8. GlobalAlloc 3000 bytes of movable memory from a DLL using the second instance of Smart Alloc.

9. GlobalAlloc 4000 bytes of shared memory from a DLL using the second instance of Smart Alloc.

10. Walk the global heap in Heap Walker and examine the listing. The memory allocated in steps 8 and 9 should be owned and allocated like the memory allocated by the first instance in steps 3 and 5. In fact, the memory allocated in step 9 will be allocated in the same segment as the memory allocated for the first instance of Smart Alloc in step 5.

11. Exit the second instance of Smart Alloc.

12. Walk the global heap using Heap Walker and examine the listing. The 3000-byte segment will have been freed by Windows, but the 4000-byte segment owned by SMARTDLL.DLL will still exist.

Figures 8 and 9 illustrate the above sequence. Figure 8 illustrates the state of memory after executing steps 1 through 10 in the list above.

Figure 8. State of memory after step 10

Figure 9 illustrates what is freed after Instance 2 exits.

Figure 9. State of memory after closing Instance 2

Remember that _fmalloc, when called from a DLL, allocates memory with the GMEM_SHARE option set.

Versatility: Why _fmalloc Is Not for Everything

While the subsegment allocation makes _fmalloc better for general use, it does not provide the same kind of versatility that GlobalAlloc does. Below is a list of some of the things that GlobalAlloc can do that _fmalloc cannot:

Allocate memory with the GMEM_SHARE flag in an application.

Allocate nonshared memory from a DLL.

Allocate more than 64K. GlobalAlloc takes a DWORD, while _fmalloc takes a size_t, which is an unsigned int. _halloc can also be used to allocate more than 64K in a block of memory.

Allocate fixed memory, discardable memory, or memory with the other GMEM_* attributes.

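Although most programmers do not think of general protection faults as a positive thing, they can be helpful in locating where a program writes outside of a memory block. Because _fmalloc returns a pointer into a block of memory, it is possible to write past the end of the block without writing past the end of the segment. In that case no general protection fault occurs, and the overrun silently corrupts a neighboring block or the heap's own bookkeeping. A write past the end of a segment allocated directly with GlobalAlloc, by contrast, exceeds the selector's limit and faults.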
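The effect is easy to reproduce in miniature. In the sketch below, a static array stands in for one global segment, and two adjacent 16-byte regions stand in for blocks returned by _fmalloc. This layout is an assumption made for illustration: the real heap places bookkeeping data between blocks, and all names here are hypothetical. Writing too many bytes to the first block stays inside the array, so nothing faults, yet the second block is damaged.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Stands in for one segment that holds several suballocated blocks. */
static unsigned char segment[4 * BLOCK_SIZE];

/* Writes write_len bytes starting at block A and returns how many bytes
   of the neighboring block B were overwritten as a result. */
int bytes_clobbered_in_b(size_t write_len)
{
    unsigned char *a = segment;               /* block A: bytes 0..15  */
    unsigned char *b = segment + BLOCK_SIZE;  /* block B: bytes 16..31 */
    size_t i;
    int clobbered = 0;

    if (write_len > sizeof segment)
        write_len = sizeof segment;           /* stay inside the "segment" */

    memset(segment, 0, sizeof segment);
    memset(b, 0xBB, BLOCK_SIZE);              /* B holds valid data */
    memset(a, 0xAA, write_len);               /* the (possibly too long) write to A */

    for (i = 0; i < BLOCK_SIZE; i++)
        if (b[i] != 0xBB)
            clobbered++;
    return clobbered;
}
```

A 16-byte write leaves block B intact, while a 20-byte write overwrites its first 4 bytes without any fault being raised.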

Conclusion

In most cases, _fmalloc and _ffree utilize system resources better than directly calling GlobalAlloc and GlobalFree. The subsegment allocation scheme used by _fmalloc reduces the number of selectors needed and also reduces the amount of system overhead.

While the subsegment allocation scheme is a boon to programmers for Windows, _fmalloc is not without its limitations. The most important one to remember is that memory is not returned to Windows when _ffree is called.

Also keep in mind that calling _fmalloc from a DLL allocates memory with the GMEM_SHARE attribute set, which is usually not what is wanted because memory is not freed until the DLL is unloaded.