Optimizing Memory Usage and Performance

Mark McCulley

Created: March 20, 1992

ABSTRACT

This article discusses techniques for optimizing memory usage and performance in applications designed for the Microsoft® Windows™ graphical environment. These techniques include:

Using global memory

Minimizing selector loads

Using processor-specific code

Using the script channel of a multimedia movie player (MMP) movie to play sound

USING GLOBAL MEMORY

Most applications use the global heap to satisfy their requirements for large memory blocks. How this global memory is managed affects the memory demands and performance of an application. The following are general guidelines for using global memory:

For small memory blocks, use LocalAlloc to allocate memory from the local heap. Accessing memory from the local heap is faster than accessing memory from the global heap because you can use near pointers. Remember, the local heap is limited to 64K, including the application’s stack, static data, and global data.

Always allocate global memory as movable, using the GMEM_MOVEABLE option. Avoid using the GMEM_FIXED option: in the Microsoft® Windows™ version 3.0 graphical environment, GMEM_FIXED page-locks memory. Avoid using the GPTR option because it includes the GMEM_FIXED flag.

The number of global memory blocks in Windows is limited. If you are using several hundred memory blocks, consider managing your own memory heap and allocating fewer blocks of global memory. Sample code to do this appears in the article “Improve Windows Application Memory Use with Subsegment Allocation and Custom Resources,” available from Microsoft OnLine.
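The idea of managing your own heap can be sketched in portable C. This is not the code from the referenced article; it is a minimal fixed-size record pool under stated assumptions: one large block is obtained once (here with malloc, standing in for a single GlobalAlloc call) and carved into small records, so many records consume only one global memory block. The names Pool, PoolInit, PoolAlloc, PoolFree, and PoolDestroy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size record pool carved from one large block. */
typedef struct Pool {
    unsigned char *base;     /* the single large block (malloc stands in
                                for one GlobalAlloc/GlobalLock pair)    */
    size_t         recSize;  /* size of each record, in bytes           */
    size_t         inUse;    /* number of records currently handed out  */
    void          *freeList; /* singly linked list of free records      */
} Pool;

/* Returns 1 on success, 0 if the large block could not be allocated. */
int PoolInit(Pool *p, size_t recSize, size_t capacity)
{
    size_t i;

    assert(p != NULL && capacity > 0);
    /* A free record stores the link to the next free record, so round
       the record size up to a whole number of pointers. */
    recSize = (recSize + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);
    if (recSize == 0)
        recSize = sizeof(void *);

    p->base = (unsigned char *)malloc(recSize * capacity);
    if (p->base == NULL)
        return 0;
    p->recSize = recSize;
    p->inUse = 0;
    p->freeList = NULL;

    /* Thread every record onto the free list, lowest address first. */
    for (i = capacity; i > 0; i--) {
        void *rec = p->base + (i - 1) * recSize;
        *(void **)rec = p->freeList;
        p->freeList = rec;
    }
    return 1;
}

/* Pops a record from the free list; returns NULL when the pool is full. */
void *PoolAlloc(Pool *p)
{
    void *rec = p->freeList;
    if (rec != NULL) {
        p->freeList = *(void **)rec;
        p->inUse++;
    }
    return rec;
}

/* Pushes a record back onto the free list. */
void PoolFree(Pool *p, void *rec)
{
    if (rec == NULL)
        return;
    *(void **)rec = p->freeList;
    p->freeList = rec;
    p->inUse--;
}

/* Releases the single large block. */
void PoolDestroy(Pool *p)
{
    free(p->base);
    p->base = NULL;
    p->freeList = NULL;
    p->inUse = 0;
}
```

A fixed record size keeps the bookkeeping to one pointer per free record. A general-purpose subsegment allocator would also need to handle variable sizes and fragmentation, which this sketch deliberately leaves out.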

Locking and Unlocking Global Memory

In Windows version 3.0 Real mode, you must lock and unlock global memory diligently because locked memory cannot be moved. Windows version 3.1 runs only in protected (Standard or Enhanced) mode, so locking and unlocking global memory aren’t as critical. Although you should always allocate memory only when you need it, in protected mode you can keep global memory segments locked from the time they are allocated until they are freed.

Converting Selectors into Handles

The 8086 microprocessor uses a segment and an offset to generate a physical memory address, whereas the 80286, 80386, and 80486 processors use a selector and an offset. For more information on the 80386 architecture, see The 80386 Book, referenced at the end of this article.

When you allocate memory with GlobalAlloc, it returns a handle; pass this handle to GlobalLock to get a pointer to the memory. Normally, you need to save this handle until you are through using the memory block so that you can pass the handle to GlobalUnlock and GlobalFree. The following facts about memory management with Windows can help you avoid saving the handle to every global memory block you allocate:

A selector is a handle.

A handle is not a selector. You must use GlobalLock to convert a handle to a selector.

The high-order 16 bits of the far pointer returned by GlobalLock are a selector. You can cast this selector to a handle and pass it to GlobalUnlock and GlobalFree.
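The relationship between a far pointer and its selector can be illustrated with a small portable model. This sketch treats a 16:16 far pointer as a plain 32-bit value so that it builds on any compiler; the type and function names (FARPTR32, MakeFarPtr, SelectorOf, OffsetOf) are hypothetical stand-ins for what the MAKELONG and HIWORD macros do in real 16-bit Windows code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 16:16 far pointer: selector in the high word,
   offset in the low word. */
typedef uint32_t FARPTR32;

/* Combine a selector and an offset into a far pointer value. */
FARPTR32 MakeFarPtr(uint16_t sel, uint16_t off)
{
    return ((FARPTR32)sel << 16) | off;
}

/* Extract the selector: the high-order 16 bits. */
uint16_t SelectorOf(FARPTR32 fp)
{
    return (uint16_t)(fp >> 16);
}

/* Extract the offset: the low-order 16 bits. */
uint16_t OffsetOf(FARPTR32 fp)
{
    return (uint16_t)(fp & 0xFFFFu);
}
```

For example, a far pointer value of 0x12370010 splits into selector 0x1237 and offset 0x0010. Because the selector alone identifies the memory block, an application that keeps the pointer does not also need to keep the handle.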

Global Memory Macros

You can use macros such as those shown below to allocate and free global memory. These macros are defined in the GMEM.H header file included with the MCITEST sample application in both the Multimedia Development Kit (MDK) and the Microsoft Windows version 3.1 Software Development Kit (SDK).

HANDLE __H;

// Helpers for GAllocPtr and GFreePtr macros.
#define MAKEP(sel,off) ((LPVOID)MAKELONG(off,sel))
#define GHandle(sel) ((HANDLE)(sel))
#define GSelector(h) (HIWORD((DWORD)GlobalLock(h)))
#define GAllocSelF(f,dwSize) \
    ((__H=GlobalAlloc(f,(LONG)(dwSize))) ? \
    GSelector(__H) : NULL)
#define GAllocPtrF(f,dwSize) MAKEP(GAllocSelF(f,dwSize),0)
#define GFreeSel(sel) \
    (GlobalUnlock(GHandle(sel)),GlobalFree(GHandle(sel)))

// These are the workhorses:
#define GAllocPtr(dwSize) GAllocPtrF(GMEM_MOVEABLE,dwSize)
#define GFreePtr(lp) GFreeSel(HIWORD((DWORD)(lp)))

Two of these macros, GAllocPtr and GFreePtr, are directly useful to applications for allocating and freeing global memory.

The syntax for GAllocPtr is:

LPVOID GAllocPtr (DWORD dwSize)

GAllocPtr returns a pointer to a block of movable global memory. If a block of the requested size could not be allocated, GAllocPtr returns NULL.

The syntax for GFreePtr is:

HANDLE GFreePtr (LPVOID lpMem)

GFreePtr returns NULL if the memory was successfully freed.

Note:

The WINDOWSX.H header file in the Windows version 3.1 SDK includes memory macros similar to the macros in GMEM.H.

MINIMIZING SELECTOR LOADS

If you are using far and huge pointers in C, be careful about how you write the code and which compiler optimization options you use, or you might end up with inefficient code. The instructions that load segment registers are costly in terms of CPU cycles when running in protected mode. Compilers often generate one of these instructions each time a program dereferences a far or a huge pointer. The following table identifies some of these expensive assembly language instructions.

Instruction      Description

LDS reg, mem     Load DS segment register
LES reg, mem     Load ES segment register
LFS reg, mem     Load FS segment register (80386 and higher)
LGS reg, mem     Load GS segment register (80386 and higher)
MOV sreg, mem    Move data to segment register

The following list presents some general techniques for minimizing selector loads:

Avoid huge pointers except when they are necessary.

Use based pointers, which let you define a base that specifies the selector for a pointer. This technique is especially useful with code that has several far or huge pointers that point to the same block of memory. Try to use as few bases as possible, and be sure to use segment bases. For details on using based pointers, see the Advanced Programming Techniques manual that comes with version 6.0 of the Microsoft C compiler.

Experiment with different compiler optimization options. Loop optimization (/Ol), “assume no aliasing” optimization (/Oa and /Ow), and global subexpression optimization (/Og) can produce code that handles selector loads more efficiently.

The best way to evaluate the results of these different optimization strategies is to look at the assembly language code produced by the compiler. You can do this with a debugger such as Microsoft CodeView® for Windows or have the compiler print an assembly language listing.
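A source-level change can sometimes achieve what the "assume no aliasing" options let the compiler do on its own. The following sketch is an illustration, not a measured result: when a loop dereferences two different far pointers, a 16-bit compiler may have to reload a segment register on every iteration, whereas copying values into local variables leaves only one far pointer inside the loop. The STATS structure and function names are hypothetical, and FAR is defined as empty here so that the sketch builds outside 16-bit Windows; whether the selector loads actually disappear depends on the compiler, so check the generated listing.

```c
#include <assert.h>
#include <stddef.h>

#ifndef FAR
#define FAR     /* expands to the far keyword under 16-bit Windows headers */
#endif

/* Hypothetical structure living in a global memory block. */
typedef struct {
    long total;
    int  count;
    int  scale;
} STATS;

/* Naive version: the loop body touches both lpStats and lpData, which may
   live in different segments, so each iteration may reload a segment
   register for each pointer. */
void AccumulateNaive(STATS FAR *lpStats, const int FAR *lpData, int n)
{
    int i;

    assert(lpStats != NULL && lpData != NULL);
    lpStats->total = 0;
    for (i = 0; i < n; i++)
        lpStats->total += (long)lpData[i] * lpStats->scale;
    lpStats->count = n;
}

/* Hoisted version: lpStats is read once before the loop and written once
   after it, so only lpData is dereferenced inside the loop. */
void AccumulateHoisted(STATS FAR *lpStats, const int FAR *lpData, int n)
{
    long total = 0;
    int  scale;
    int  i;

    assert(lpStats != NULL && lpData != NULL);
    scale = lpStats->scale;
    for (i = 0; i < n; i++)
        total += (long)lpData[i] * scale;
    lpStats->total = total;
    lpStats->count = n;
}
```

Both functions produce the same result as long as lpData does not overlap the structure that lpStats points to. That is the same assumption the compiler makes when the no-aliasing options are enabled, which is why this kind of rewrite needs the same care.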

Note:

Using based pointers and compiler optimization options requires intimate knowledge of how your program uses memory and how your compiler works. This type of optimization can introduce bugs; be sure your code is solid and stable before attempting these optimizations.

OPTIMIZING WITH PROCESSOR-SPECIFIC CODE

Another way to optimize memory-intensive code is to write it in assembly language to take advantage of the processor that it is running on. This can improve performance significantly, especially if the processor is an 80386 or higher. Here are some guidelines:

Use processor-specific instructions.

For 80386 and higher processors, use the additional segment registers FS and GS. You can address memory located in different segments without reloading segment registers.

For 80386 and higher processors, use 32-bit offsets to address memory. You essentially have flat memory addressing and can access a block of memory greater than 64K without detecting segment boundaries.
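The segment-boundary work that 32-bit offsets avoid can be modeled in a few lines of portable C. In a block larger than 64K, each 64K piece has its own selector, and consecutive selectors differ by the kernel's __AHINCR value. This sketch assumes the protected-mode value of 8 and uses hypothetical names (SEGADDR, HugeAddress, CrossesSegmentBoundary); it only computes addresses and does not touch memory.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: selector increment between consecutive 64K pieces of a
   large block in protected mode (the kernel exports it as __AHINCR). */
#define AHINCR_PMODE 8u

/* A selector:offset pair, as a 16-bit program must use it. */
typedef struct {
    uint16_t sel;
    uint16_t off;
} SEGADDR;

/* Map a 32-bit byte offset within a large block onto the selector:offset
   pair that 16-bit addressing requires. With a 32-bit offset on an 80386,
   baseSel and byteOffset could be used directly instead. */
SEGADDR HugeAddress(uint16_t baseSel, uint32_t byteOffset)
{
    SEGADDR a;

    a.sel = (uint16_t)(baseSel + (byteOffset >> 16) * AHINCR_PMODE);
    a.off = (uint16_t)(byteOffset & 0xFFFFu);
    return a;
}

/* Returns 1 if an access of the given length starting at byteOffset spans
   two 64K pieces and would therefore need a second selector. Assumes that
   byteOffset + length does not overflow 32 bits. */
int CrossesSegmentBoundary(uint32_t byteOffset, uint32_t length)
{
    if (length == 0)
        return 0;
    return (byteOffset >> 16) != ((byteOffset + length - 1) >> 16);
}
```

For example, byte offset 0x0002ABCD in a block whose first selector is 0x1237 resolves to selector 0x1247 and offset 0xABCD. Code that uses huge pointers must perform this kind of calculation, and a selector load, whenever an access moves into another 64K piece.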

Determining the Processor

To determine which processor your code is running on, you can call the GetWinFlags function. A faster way to do this in assembly language is to access the _WinFlags variable (defined in the Windows kernel) directly. Use the flags defined for GetWinFlags in the WINDOWS.INC include file to determine whether the processor is an 80286, an 80386, or higher, as shown in the following example:

;******************************************************************
; 286386.ASM - Shows how to determine processor type and
; call 80286-specific and 80386-specific routines.
;******************************************************************

?PLM=1                  ; PASCAL calling convention is DEFAULT
?WIN=0                  ; Windows calling convention (0 for protected mode)

.xlist
include cmacros.inc
include windows.inc
.list

externA __WinFlags      ; in kernel
externA __AHINCR        ; in kernel
externA __AHSHIFT       ; in kernel

;******************************************************************
; DATA SEGMENT DECLARATIONS
;******************************************************************

ifndef SEGNAME
    SEGNAME equ <_TEXT>
endif

createSeg %SEGNAME, CodeSeg, word, public, CODE

sBegin Data
sEnd Data

sBegin CodeSeg
    assumes cs,CodeSeg
    assumes ds,Data

;******************************************************************

cProc doit,<FAR,PUBLIC>,<>
; Don't generate parameter info or modify stack frame.
cBegin <nogen>
    mov ax,__WinFlags
    test ax,WF_CPU286
    jnz doit286
    errn$ doit386       ; assumes 386 routine is next (for speed)
; Don't clean up frame because it doesn't need it.
cEnd <nogen>

;******************************************************************

cProc doit386,<FAR,PUBLIC>,<>
; Note: CMACROS.INC does not allow 386 registers in the cProc
; statement.
;
; You can also put locals here. See cmacros documentation.
; Generate stack frame.
cBegin
; Don't turn .386 on until AFTER cBegin.
.386
; Now we can work with the 386 registers.
; For example:
    push edi
; Code goes here.
mf386_exit:
    pop edi
; Turn off .386 BEFORE cEnd.
.286
; Cleans up stack frame and returns.
cEnd
; Still in 286 mode.

;******************************************************************

cProc doit286,<FAR,PUBLIC>,<di>
cBegin
; We are limited to 286 registers and instructions.
mf_exit:
; Code goes here.
; Cleans up stack frame and returns.
cEnd

sEnd CodeSeg

end
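The same dispatch can be sketched in C for code that does not need the fall-through trick used above. This is a minimal illustration under stated assumptions: the flags value is passed in as a parameter (a Windows 3.x application would pass the result of GetWinFlags), the WF_ values are repeated here on the assumption that they match the Windows 3.x headers, and the routine names (Sum286, Sum386, SelectSumProc) are hypothetical. Both routines are plain C stand-ins for the processor-specific versions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match the GetWinFlags values in the Windows 3.x headers;
   repeated so that the sketch builds without them. */
#ifndef WF_CPU286
#define WF_CPU286 0x0002
#define WF_CPU386 0x0004
#define WF_CPU486 0x0008
#endif

typedef long (*SUMPROC)(const int *data, int n);

/* Stand-in for a routine limited to 80286 registers and instructions. */
long Sum286(const int *data, int n)
{
    long total = 0;
    int  i;

    for (i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Stand-in for a routine that would use 80386 registers and 32-bit
   offsets if it were written in assembly language. */
long Sum386(const int *data, int n)
{
    long total = 0;
    int  i;

    for (i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Choose the routine once, using the same test as the assembly example:
   if the 80286 flag is set, use the 286 routine; otherwise use the 386
   routine. */
SUMPROC SelectSumProc(unsigned long winFlags)
{
    return (winFlags & WF_CPU286) ? Sum286 : Sum386;
}
```

Selecting the routine once at startup and calling it through the saved pointer keeps the flag test out of the inner loops.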

OPTIMIZING MEMORY REQUIREMENTS OF MMP MOVIES

Playing multimedia movie player (MMP) movies with embedded waveform audio sounds can result in out-of-memory conditions, especially when running under ToolBook® on a 2-megabyte multimedia personal computer (MPC). The Movie Player loads the entire movie, including audio, into memory before playing it. Keeping the audio separate from the movie greatly reduces the memory demands of MMP movies.

Use the media control interface (MCI) to play audio during the movie by placing MCI command strings in the script channel of the movie. For details on how to do this, refer to the Multimedia Authoring Guide manual in the MDK.

SUGGESTED READING

For more information about memory management and optimization with Windows, refer to the following publications:

Nelson, Ross P. The 80386 Book: Assembly Language Programmer’s Guide for the 80386. Redmond: Microsoft Press, 1988.

Yao, Paul. “Windows 3.0 Memory Management: Supporting Disparate 80x86 Architectures.” Microsoft Systems Journal 5, no. 6 (1990).

Yao, Paul. “Improve Windows Application Memory Use with Subsegment Allocation and Custom Resources.” Microsoft Systems Journal 6, no. 1 (1991).