Memory Allocation for Server Applications

Platform SDK: Exchange Server

Server applications are rated by their throughput numbers and by the number of simultaneously connected users they can support. Therefore, performance is the most important consideration. Minimizing the effect of memory allocation functions on performance requires understanding how the application interacts with its allocator, and spending time tuning the system around those interactions.

Important issues include the following:

Locality of reference (how many memory pages are touched)
Scalability on multiple CPUs
Fragmentation
Serialization
Matching the memory allocation characteristics of an application to its environment
Considering multiple threads and multiple processors and their effect on the performance of memory allocation functions
Matching the memory allocation characteristics of an application to its purpose

For a server application, the choices include the MAPI allocator functions, the Win32 HeapAlloc or GlobalAlloc functions, or allocator functions written specifically for the application.

MAPI requires the use of MAPI allocators for certain allocations. When MAPI allocators are used and several very busy threads must make MAPI calls, there will be serialization and reduced performance.

The MAPI allocators do not give the application control over the placement of memory blocks, which makes it impossible for the application to tune locality. The MAPI allocators always serialize through a critical section on a common heap because they can make no assumptions about how they will be called. Besides adding CPU overhead, this can dramatically impact the scalability of a multithreaded server.

There has been considerable analysis of how Win32 memory allocator functions behave in the information store process. Fragmentation has not been shown to be a problem with a single heap. Win32 functions give good performance in coalescing free blocks and packing new allocations into the right place. The information store, however, is heavily serialized with a single heap. Using the multiple heaps available with some other allocation strategies virtually eliminates this contention.

Because calls to the MAPI functions PrepareRecips and ResolveNames reallocate memory in ADRENTRY, it is likely that the MAPI allocators must be called if an ADRLIST is used in your application. As a result, at some point in a multithreaded application there will be multiple threads trying to access the one Microsoft Exchange Server directory heap available to MAPI client applications. The contention for this heap will have an adverse effect on performance. To see if this is affecting your application, count the number of allocations per hour against the number of threads using MAPI allocators. A gateway is an example of an application that is very likely to use PrepareRecips and ResolveNames.
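The measurement suggested above can be sketched roughly as follows. This is a minimal illustration, not part of MAPI: the names record_allocation, counted_alloc, and allocations_per_hour_per_thread are hypothetical, and std::malloc stands in for the real allocator call (for example MAPIAllocateBuffer) so that the sketch stays portable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical instrumentation; these names are not part of MAPI or Win32.
static std::atomic<unsigned long long> g_alloc_count(0);  // total allocations
static std::atomic<unsigned> g_thread_count(0);           // distinct allocating threads

// Call once per allocation, on the thread that performs the allocation.
void record_allocation() {
    thread_local bool counted = false;  // one flag per thread
    if (!counted) {
        counted = true;
        g_thread_count.fetch_add(1, std::memory_order_relaxed);
    }
    g_alloc_count.fetch_add(1, std::memory_order_relaxed);
}

// Stand-in wrapper: a real application would call its MAPI allocator here
// instead of std::malloc (assumption made to keep the sketch portable).
void* counted_alloc(std::size_t size) {
    record_allocation();
    return std::malloc(size);
}

// Allocations per hour, divided by the number of threads that allocated.
double allocations_per_hour_per_thread(double elapsed_seconds) {
    assert(elapsed_seconds >= 0.0);  // a negative duration is a caller bug
    const unsigned threads = g_thread_count.load();
    if (threads == 0 || elapsed_seconds == 0.0) return 0.0;
    const double per_hour =
        static_cast<double>(g_alloc_count.load()) * 3600.0 / elapsed_seconds;
    return per_hour / threads;
}
```

Atomic counters and a per-thread flag are used so that the instrumentation does not itself add a shared lock to the allocation path. A high allocation rate combined with many allocating threads suggests that contention on the shared heap is worth investigating.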

Server applications do not normally use the OLE IMalloc memory allocation interface except where required by OLE interfaces.

The malloc and free C runtime functions cause problems for server applications in two ways. First, unless the Visual C++ 4.0 version is used, free does not return memory to the operating system. Unless the application is able to call _heapmin periodically, its virtual memory consumption will grow without bound. The malloc and free functions as implemented in Visual C++ 4.0 actually call the HeapAlloc and HeapFree functions respectively, so if that version is used, there is no problem with returning freed memory to the system. Second, the C++ new and delete operators use malloc by default. Overhead-conscious server applications written in C++ should always replace the new and delete operators. Whether to replace them with a new pair of global operators that call HeapAlloc, or to implement smarter class-specific operators, depends on how the application organizes its object lifetimes.
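As an illustration of the class-specific option, the following minimal sketch gives one class its own operator new and operator delete, backed by a free list of recycled blocks. The Message class, its fields, and the cached() helper are hypothetical, and the global ::operator new is used as the fallback allocator where a Win32 application might call HeapAlloc instead.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>

// Hypothetical message object; the name and fields are illustrative only.
class Message {
public:
    explicit Message(int id) : id_(id) {}
    int id() const { return id_; }

    // Reuse a previously freed block if one is cached; otherwise fall back
    // to the global allocator.
    static void* operator new(std::size_t size) {
        if (size != sizeof(Message))       // derived class with a different size
            return ::operator new(size);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (free_list_ != nullptr) {
                Node* block = free_list_;
                free_list_ = block->next;
                --cached_;
                return block;
            }
        }
        return ::operator new(size);
    }

    // Push the block onto the free list instead of returning it to the heap.
    static void operator delete(void* p, std::size_t size) noexcept {
        if (p == nullptr) return;
        if (size != sizeof(Message)) { ::operator delete(p); return; }
        std::lock_guard<std::mutex> lock(mutex_);
        Node* block = static_cast<Node*>(p);
        block->next = free_list_;
        free_list_ = block;
        ++cached_;
    }

    // Number of blocks currently waiting on the free list.
    static std::size_t cached() {
        std::lock_guard<std::mutex> lock(mutex_);
        return cached_;
    }

private:
    struct Node { Node* next; };

    int id_;
    char payload_[60];                     // placeholder for message data

    static std::mutex mutex_;
    static Node* free_list_;
    static std::size_t cached_;
};

static_assert(sizeof(Message) >= sizeof(void*),
              "a freed block must be able to hold a free-list link");

std::mutex Message::mutex_;
Message::Node* Message::free_list_ = nullptr;
std::size_t Message::cached_ = 0;
```

The lock here is private to one class, so only threads allocating Message objects contend for it. Whether that is an improvement over a pair of global operators depends, as noted above, on how the application organizes its object lifetimes. This sketch never returns cached blocks to the system, so a real implementation would likely cap the free list.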

Another option to consider is a package of memory allocation functions written for your application, which can implement multiple heaps serialized outside the HeapAlloc and HeapFree functions. Such a package can eliminate almost all heap contention by tracking which heap is busy and always trying to steer allocations to non-busy heaps. This is useful for per-thread heaps. When the heap is created, the parallelism can be specified, which controls the number of heaps.
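One possible shape for such a package is sketched below. It is an assumption-laden illustration, not the implementation used by any Microsoft product: the MultiHeap class and its methods are hypothetical, std::malloc and std::free stand in for HeapAlloc and HeapFree on a private heap, and a failed try-lock is used as the "busy" signal.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

// Hypothetical multi-heap package; names are illustrative only.
class MultiHeap {
public:
    // 'parallelism' controls the number of heaps (at least one).
    explicit MultiHeap(std::size_t parallelism)
        : heaps_(parallelism ? parallelism : 1), next_(0) {}

    // Try each heap in turn, starting at a rotating index, and use the
    // first one whose lock is free. If every heap is busy, wait on the
    // starting heap.
    void* alloc(std::size_t size) {
        const std::size_t n = heaps_.size();
        const std::size_t start =
            next_.fetch_add(1, std::memory_order_relaxed) % n;
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t idx = (start + i) % n;
            std::unique_lock<std::mutex> lock(heaps_[idx].mutex,
                                              std::try_to_lock);
            if (lock.owns_lock()) return alloc_from(idx, size);
        }
        std::unique_lock<std::mutex> lock(heaps_[start].mutex);
        return alloc_from(start, size);
    }

    // Return a block to the heap it came from.
    void release(void* p) {
        if (p == nullptr) return;
        Header* h = static_cast<Header*>(p) - 1;
        assert(h->heap < heaps_.size());
        SubHeap& heap = heaps_[h->heap];
        std::lock_guard<std::mutex> lock(heap.mutex);
        --heap.live;
        std::free(h);
    }

    // Index of the heap that owns block 'p'.
    std::size_t heap_of(const void* p) const {
        return (static_cast<const Header*>(p) - 1)->heap;
    }

    // Number of live blocks in heap 'idx'.
    std::size_t live(std::size_t idx) {
        std::lock_guard<std::mutex> lock(heaps_[idx].mutex);
        return heaps_[idx].live;
    }

    std::size_t heap_count() const { return heaps_.size(); }

private:
    // Header placed before each block; records the owning heap.
    struct alignas(std::max_align_t) Header { std::size_t heap; };
    struct SubHeap { std::mutex mutex; std::size_t live = 0; };

    // Caller must hold heaps_[idx].mutex.
    void* alloc_from(std::size_t idx, std::size_t size) {
        if (size > static_cast<std::size_t>(-1) - sizeof(Header)) return nullptr;
        void* raw = std::malloc(sizeof(Header) + size);  // stand-in for HeapAlloc
        if (raw == nullptr) return nullptr;
        Header* h = static_cast<Header*>(raw);
        h->heap = idx;
        ++heaps_[idx].live;
        return h + 1;
    }

    std::vector<SubHeap> heaps_;
    std::atomic<std::size_t> next_;
};
```

Because the locks live outside the underlying heap functions, a Win32 version could create each private heap without the heap's own internal serialization. The rotating start index is a design choice in this sketch that spreads allocations across heaps even when none is busy. The per-block header is the cost of letting any thread free a block back to its owning heap.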