Microsoft® Windows™ and the C Compiler Options

Dale Rogerson

Microsoft Developer Network Technology Group

Mr. Rogerson is widely known for having reported the largest number of duckbill platypus sightings in the greater Seattle area.

Created: May 5, 1992

ABSTRACT

One of the key issues in the development and design of commercial applications is optimization: how to make an application run quickly while taking up as little memory as possible. Although optimization is a goal for all applications, the Microsoft® Windows™ graphical environment presents some unique challenges. This article provides tips and techniques for using the Microsoft C version 6.0 and C/C++ version 7.0 compilers to optimize applications for Windows. It discusses the following optimization techniques:

Using compiler options

Optimizing the prolog and epilog code

Optimizing the calling convention

Aliasing (using the /Ow and /Oa options)

GENERAL OPTIMIZATION STRATEGIES

Optimization is a battle between two forces: small size and fast execution. As with most engineering problems, deciding which side to take is never easy. The following guidelines will help you optimize your applications for the Microsoft® Windows™ graphical environment.

If your application runs in Real mode, always optimize for size. Memory is the limiting resource in Real mode. An application that uses too much of it also loses speed, because Windows must discard and reload segments more often.

Memory is not as scarce in protected mode (that is, in Standard and Enhanced modes) as it is in Real mode, so you must decide whether to optimize for speed or for size. However, as users start running multiple programs simultaneously, memory becomes scarce. The rule of thumb for both Windows and other operating environments is to optimize for speed the 10 percent of the code that runs 90 percent of the time. Tools such as the Microsoft Source Code Profiler help determine where optimizations should be made.

Because Windows is a visual, interactive environment, several shortcuts help identify areas that need speed optimization. Any code that displays information directly on the screen, including code that responds to WM_PAINT, WM_CREATE, and WM_INITDIALOG messages, should be optimized. For example, a dialog box does not appear until WM_INITDIALOG processing is complete, so the user must wait. Speed is less critical in other areas because the user can move the mouse only so fast. In most situations, the code that handles selections in a dialog box need not be optimized.

Note:

The Microsoft C version 6.0 compiler precedes most function modifiers with a single underscore (_), for example, _loadds, _export, _near, _far, _pascal, and _cdecl. The Microsoft C/C++ version 7.0 compiler uses two underscores (__) for ANSI C compatibility but recognizes the single underscore for backward compatibility. This article uses C version 6.0 compiler syntax except when discussing features available only in C/C++ version 7.0.

THE SAMPLE APPLICATION: ZUSAMMEN

The sample application, Zusammen, illustrates the use of the compiler options. Zusammen, which means together in German, scrolls two different windows simultaneously. To scroll, the user selects the windows with the mouse and clicks Zusammen’s scroll bars. This makes it easy to compare data in two different windows or applications.

Zusammen consists of a program generated by MAKEAPP and a dynamic link library (DLL) called Picker. MAKEAPP is a sample program included in the Windows version 3.1 Software Development Kit (SDK). The Picker DLL selects the windows to be scrolled.

The make files for Zusammen and Picker are combined for simplicity. All functions are classified as local, global, entry point, or DLL entry point and declared with an appropriate #define statement, for example:

void LOCAL cleanup(HWND hwndDlg);
BOOL DLL_ENTRY Picker_Do(HWND, LP_PICKER_STRUCT);

A local function is a function called from within a segment.

A global function is a function called from outside a segment.

An entry point is a function that Windows calls.

A DLL entry point is a DLL function that a client application calls.

For demonstration purposes, the symbols are defined in the make files. Using symbols makes it easier to switch memory models and optimize applications. You can also port applications to flat-model environments easily by using the NEAR and FAR macros (defined in WINDOWS.H) instead of __near and __far. Some possibilities are:

#define LOCAL NEAR
#define DLL_ENTRY FAR PASCAL

or:

#define LOCAL NEAR PASCAL
#define DLL_ENTRY FAR PASCAL __loadds
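The sketch below shows one way such symbols might be collected in a header so that a make file can still override them (for example, with /DLOCAL=NEAR). The header is hypothetical. The GLOBAL symbol and the scroll-position functions are illustrative names and are not part of Zusammen. The empty fallbacks for NEAR, FAR, and PASCAL apply only when WINDOWS.H has not defined them, which stands in for a flat-model build.

```c
/* Hypothetical function-class header modeled on the article's LOCAL and
   DLL_ENTRY symbols. GLOBAL and the functions below are assumed names. */
#include <assert.h>

/* WINDOWS.H defines these for 16-bit builds; in a flat-model build
   they expand to nothing. */
#ifndef NEAR
#define NEAR
#endif
#ifndef FAR
#define FAR
#endif
#ifndef PASCAL
#define PASCAL
#endif

/* A make file may predefine any of these, for example /DLOCAL=NEAR. */
#ifndef LOCAL
#define LOCAL NEAR
#endif
#ifndef GLOBAL
#define GLOBAL FAR
#endif
#ifndef DLL_ENTRY
#define DLL_ENTRY FAR PASCAL
#endif

/* Local function: called only from within its own segment. */
static int LOCAL clamp_int(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

/* Global function: may be called from another segment. */
int GLOBAL next_scroll_pos(int pos, int delta, int max_pos)
{
    return clamp_int(pos + delta, 0, max_pos);
}
```

Switching LOCAL from NEAR to NEAR PASCAL in the make file changes every local function at once, without touching the source.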

THE SOLUTION

Tables 1 through 3 show options recommended for general use. These options can be used as defaults in make files because they do not require changes to the source code to compile correctly. Each table shows the options for building an application and a DLL and differentiates between the debugging (development) phase and the released product. The options in Table 1 apply to applications or libraries that run in Real mode; the options in Tables 2 and 3 apply to applications or libraries that run only in protected mode. Table 3 is for C/C++ version 7.0 use only.

The developer must choose either the /Ot option to optimize for speed (time) or the /Os option to optimize for size. The C version 6.0 compiler defaults to /Ot. The C/C++ version 7.0 compiler defaults to /Od, which disables all optimizations and enables fast compiling (/f).

The /Oa and /Ow options do not appear in the tables; both options assume no aliasing and require that the C source meet certain conditions to work properly. These two options are discussed in the “Aliasing and Windows” section. In general, use /Ow instead of /Oa for Windows-based applications. You can turn the no-aliasing assumption on and off using #pragma optimize with the a or w switch.
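As a sketch, the no-aliasing assumption can be confined to one function with #pragma optimize. The function sum_widths is hypothetical. The guard assumes that C 6.0 and C/C++ 7.0 report _MSC_VER values of 600 and 700, and it keeps the 16-bit-only a and w switches away from other compilers.

```c
/* Sketch: turn on the /Ow-style no-aliasing assumption for one function
   only. sum_widths is a hypothetical example function. */
#include <assert.h>
#include <stddef.h>

#if defined(_MSC_VER) && _MSC_VER < 800
#pragma optimize("w", on)    /* assume no aliasing from here on */
#endif

/* Correct under the assumption only if total does not point into values. */
void sum_widths(const int *values, size_t count, long *total)
{
    size_t i;
    assert(values != NULL && total != NULL);
    *total = 0;
    for (i = 0; i < count; ++i)
        *total += values[i];   /* compiler may keep *total in a register */
}

#if defined(_MSC_VER) && _MSC_VER < 800
#pragma optimize("", on)     /* restore the command-line /O settings */
#endif
```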

Another option that is not included in the tables is the optimized prolog/epilog option /GW. In C version 6.0, this option generates code that does not work in Real mode; it is fixed in C/C++ version 7.0. For backward compatibility, the C/C++ version 7.0 /Gq option generates the same prolog/epilog as the C version 6.0 /GW switch. Although the fixed /GW option results in a smaller prolog for non-entry-point functions, better optimizations are available for protected-mode applications, as discussed in the next section.

Table 1. Compiler Options for Real Mode (C 6.0 and C/C++ 7.0)

The General Solution for Protected Mode

If your application runs only in protected mode, you can use the additional optimization options shown in the second row of Table 2. Make1 demonstrates the use of these options, which are safe for all modules in a protected-mode application.

You can realize additional savings in space and time by compiling modules without entry points separately from those with entry points. Use the options in the third row of Table 2 for modules without entry points. Make2 demonstrates the use of both sets of options. The Zusammen sample application is already set up with far calls and entry points in separate C files. Because this application should run only in protected mode, run the resource compiler (RC) with the /T option to ensure that the application never runs in Real mode.

DLLs can benefit from the techniques presented in the “Optimized DLL Prolog and Epilog” section. These techniques work with both C version 6.0 and C/C++ version 7.0.

Table 2. Compiler Options for Protected Mode Only (C 6.0 and C/C++ 7.0)

The General Solution for Protected Mode and C/C++ 7.0

The C/C++ version 7.0 compiler includes special optimizations for protected-mode Windows programs (see Table 3). These special optimizations include /GA (for applications), /GD (for DLLs), and /GEx (to customize the prolog) and help reduce the amount of overhead the prolog/epilog code causes. The /GA and /GD options add the prolog and epilog code only to far functions marked with __export instead of compiling all far functions with the extra code. With __export, entry points need not be placed in a separate file as required by C version 6.0.

Applications that do not mark far functions with __export can use the /GA /GEf or /GD /GEf options to generate the prolog/epilog code for all far functions. /GEe causes the compiler to export the functions by emitting a linker EXPDEF record. By default, /GD emits the EXPDEF record but /GA does not. Applications compiled with /GA usually do not need the EXPDEF record. Only Real-mode applications need /GEr and /GEm; protected-mode applications have no use for these options. The following options generate equivalent prolog/epilog code:

/GA is equivalent to /GA /GEs /D_WINDOWS.

/GD is equivalent to /GD /GEd /GEe /Aw /D_WINDOWS /D_WINDLL.

Table 3. Compiler Options for Protected Mode (C/C++ 7.0 Only)

OVERVIEW OF COMPILER OPTIONS

Generate Intrinsic Functions (/Oi)

The /Oi option replaces often-used C library functions with equivalent inline versions. This replacement saves time by removing the function overhead but increases program size because it expands the functions.

In C version 6.0, the /Oi option is not recommended for general use because it causes bugs in some situations, especially when DS != SS. Using #pragma intrinsic to selectively optimize functions reduces the chance of encountering a bug.

The ZUSAMMEN.C module of the sample application demonstrates the use of #pragma intrinsic. Although this particular use does not drastically increase program speed, it does demonstrate the right ideas: It speeds up the WM_PAINT function and is used on a function that is called three times per WM_PAINT message. The best savings occur when the intrinsic function is in a loop or is called frequently.
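A minimal sketch of the selective approach follows. It assumes a hypothetical helper, format_line, that a WM_PAINT handler might call for each line it draws. Only strlen and memcpy are made intrinsic, and #pragma function restores ordinary library calls afterward. The pragmas are guarded because they are Microsoft-specific.

```c
/* Sketch: expand strlen and memcpy inline for one hot function only.
   format_line is hypothetical, not the function in ZUSAMMEN.C. */
#include <assert.h>
#include <string.h>

#ifdef _MSC_VER
#pragma intrinsic(strlen, memcpy)
#endif

/* Builds "label: value" in buf and returns its length. */
size_t format_line(char *buf, size_t size, const char *label, const char *value)
{
    size_t n1 = strlen(label);
    size_t n2 = strlen(value);
    assert(n1 + 2 + n2 < size);              /* caller supplies enough room */
    memcpy(buf, label, n1);
    memcpy(buf + n1, ": ", 2);
    memcpy(buf + n1 + 2, value, n2 + 1);     /* copies the terminator too */
    return n1 + 2 + n2;
}

#ifdef _MSC_VER
#pragma function(strlen, memcpy)             /* back to library calls */
#endif
```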

Pack Structure Members (/Zp)

The /Zp option controls storage allocation for structures and structure members. To save as much memory as possible, Windows packs all structures on a 1-byte boundary. Although this saves memory, it can result in performance degradation: Intel® processors work more efficiently when word-sized data is placed at even addresses. An application must pack Windows structures to communicate successfully with Windows, but it need not pack its own structures. Because Windows structures are prevalent, it is better to compile with the /Zp option and use #pragma pack on internal data structures. Passing an improperly packed structure to Windows can lead to problems that are difficult to debug. Both Zusammen and Picker use #pragma pack on their internal data structures. (See the FRAME.H, APP.H, and PACK_DLL.H modules.)
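The sketch below contrasts the two layouts. The structure names are hypothetical. The explicit #pragma pack(1) stands in for /Zp so that the example gives the same layout whatever options it is compiled with; #pragma pack() restores the command-line default.

```c
/* Sketch: 1-byte packing for structures shared with Windows, word
   alignment for internal ones. Both structure names are hypothetical. */
#include <assert.h>
#include <stddef.h>

/* Shared with Windows or other programs: keep the /Zp layout. */
#pragma pack(1)
typedef struct {
    char  flag;
    short count;    /* offset 1: odd address */
} SHARED_REC;
#pragma pack()

/* Internal only: align word-sized members on even addresses. */
#pragma pack(2)
typedef struct {
    char  flag;
    short count;    /* offset 2: even address, one padding byte before it */
} INTERNAL_REC;
#pragma pack()
```

INTERNAL_REC spends one padding byte so that count sits at an even address; SHARED_REC keeps the packed layout that Windows expects.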

Note that PICKER.DLL packs PICKER_STRUCT. Because most Windows-based applications pack structures, it is safer to leave DLL structures packed. In most cases, the speed optimization is not worth the extra trouble of documenting the unpacked structures, especially if the DLL will be used with other languages or products, such as Microsoft Visual Basic™ or Microsoft Word for Windows.

Set Warning Level (/W3)

All Windows-based programs should be compiled at warning level 3. You can fix many hard-to-detect bugs by removing the warnings that appear during compilation. It is less expensive to fix a warning message than to ship a bug-fix release to unsatisfied users. All applications should also be run under the debugging version of Windows before release.

Compile for Debugging (/Zi) and Disable Optimizations (/Od )

It is often easier to turn off optimizations to debug a module. Some optimizations can introduce bugs into (or remove bugs from) otherwise correct programs. For this reason, an application must be fully tested with release options, and all developers and testers should be aware of the options used.

Stack Checking (/Gs)

By default, the compiler generates code to “check the stack”; that is, each time a function is called, chkstk (actually _aNchkstk) compares the available stack space with the additional amount the function needs. If the function requires more space than is available, the program generates a run-time error message. Table 4 shows the call to chkstk, which is removed by compiling with /Gs. Stack checking adds significant overhead, so it is usually disabled with the /Gs option after sufficient testing. It is usually a good idea to reenable stack checking on recursive functions with the check_stack pragma.
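As a sketch, assuming a module compiled with /Gs, the check_stack pragma can restore the stack probe around a single recursive function. TREE_NODE and count_nodes are hypothetical. The pragmas are guarded because check_stack is specific to Microsoft compilers.

```c
/* Sketch: stack probes stay off for the module (/Gs) but are turned
   back on for one recursive function. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

typedef struct tree_node {
    struct tree_node *left;
    struct tree_node *right;
} TREE_NODE;

#ifdef _MSC_VER
#pragma check_stack(on)     /* emit the chkstk call for what follows */
#endif

/* Recursion depth depends on the data, so keep the probe here. */
int count_nodes(const TREE_NODE *node)
{
    if (node == NULL)
        return 0;
    return 1 + count_nodes(node->left) + count_nodes(node->right);
}

#ifdef _MSC_VER
#pragma check_stack()       /* revert to the command-line setting */
#endif
```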

#define Statements (/DSTRICT, /D_WINDOWS, /D_WINDLL)

The /DSTRICT and /D_WINDOWS options are recommended for all Windows-based applications, and /D_WINDLL is recommended for all DLLs. Using /DSTRICT with WINDOWS.H results in a more robust and type-safe application. /DSTRICT lets you use macros to replace Windows functions such as GetStockObject with type-safe versions such as GetStockBrush and GetStockPen.
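The following model shows why STRICT catches handle mix-ups. It is a simplified stand-in, not the text of WINDOWS.H: DECLARE_HANDLE gives each handle type its own dummy structure, and FakeGetStockObject is a hypothetical replacement for GetStockObject so that the sketch compiles without the SDK.

```c
/* Model of STRICT handle typing, not the actual WINDOWS.H definitions.
   FakeGetStockObject is a hypothetical stand-in for GetStockObject. */
#include <assert.h>
#include <stddef.h>

/* Each handle type becomes a pointer to its own dummy structure. */
#define DECLARE_HANDLE(name) struct name##__ { int unused; }; \
                             typedef struct name##__ *name

typedef void *HGDIOBJ;          /* generic GDI object handle */
DECLARE_HANDLE(HBRUSH);
DECLARE_HANDLE(HPEN);

static struct { int unused; } stock_objects[8];

/* Returns an untyped handle, as GetStockObject does. */
static HGDIOBJ FakeGetStockObject(int index)
{
    assert(index >= 0 && index < 8);
    return &stock_objects[index];
}

/* Type-safe wrappers in the style of GetStockBrush and GetStockPen. */
#define GetStockBrush(i) ((HBRUSH)FakeGetStockObject(i))
#define GetStockPen(i)   ((HPEN)FakeGetStockObject(i))

/* HBRUSH hbr = GetStockPen(0);  draws an incompatible-pointer diagnostic */
```

Without STRICT, the handle types share one underlying type, so passing a pen where a brush is expected compiles silently.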

The C header files use /D_WINDOWS and /D_WINDLL to determine the correct prototypes and typedefs to include. /D_WINDLL ensures that using an invalid library function in a DLL generates an error. The C/C++ version 7.0 compiler /GA option automatically sets /D_WINDOWS; the /GD option sets both /D_WINDOWS and /D_WINDLL.

OPTIMIZING THE PROLOG AND EPILOG

Programs designed for Windows, unlike those designed for MS-DOS®, have special sections of code called the prolog and epilog added to entry points. For this reason, Windows programs require special compiler support. When you compile a program with the /Gw option, all far functions receive the extra prolog and epilog code and increase in size by about 10 bytes. You can take the following steps to reduce this overhead, especially for protected-mode-only applications:

Reduce the number of far calls.

Reduce the prolog and epilog code.

Reducing the Number of Far Calls

Because /Gw adds the extra code only to far functions, reducing the number of far functions is a good way to trim program size. In the small memory model, all functions are near unless explicitly labeled as far, so reducing far calls is not a problem. In the medium memory model, all functions default to far and therefore receive the extra prolog and epilog code. In C version 6.0, you can use two methods to reduce this overhead:

Organize source modules. Label all functions explicitly as either near or far, and compile with the medium model.

Use mixed-model programming with small model as the base.

C/C++ version 7.0 users do not need either of these methods; they can use the /GA and /GD options to add prolog/epilog code only to far functions marked with __export. Other far functions are compiled without additional overhead. To add the prolog and epilog code to all far functions, use /GA /GEf or /GD /GEf.

Organizing source modules

To reduce the number of far calls, you must organize source modules carefully. Each module is divided into internal functions and external functions. Internal functions are called only from within the module; external functions are called from outside the module. As a direct result of this arrangement, internal functions are marked near and external functions are marked far.

The Zusammen sample application is arranged in this manner. Each module has a header file that prototypes all external functions as far. Each source file prototypes its internal functions as near because they are not needed outside the module.

For large applications, you can use a tool such as MicroQuill’s Segmentor to determine the best segmentation to use. You can also organize source modules manually, but the process must be repeated whenever the source file changes.

Another method for reducing far-call overhead is to use the FARCALLTRANSLATION and PACKCODE linker options. This method works only for protected-mode-only applications and should not be used in Real mode. PACKCODE combines code segments. You can specify the size of the segments to pack on the command line (for example, /PACKCODE:8000); the default size limit is 65530 bytes. C/C++ version 7.0 turns PACKCODE on by default for all segmented executables. When a far function is called from within its own segment, FARCALLTRANSLATION replaces the far call with a near call.

Mixed-model programming

In mixed-model programming, the small model acts as the base. All far functions are explicitly labeled as in the previous method. Each module is compiled with the /NT option, which places the module in a different segment, for example:

cl /c /Gw /Od /Zp /W3 /NT _MOD1 mod1.c
cl /c /Gw /Od /Zp /W3 /NT _MOD2 mod2.c

Because the small model is used, all other functions default to near, and presto! No far-call overhead. The SDK Multipad sample application uses this method for compiling, although it also labels many of its near functions explicitly. Make3 compiles Zusammen using this method.

In practice, this method does not save much work—it only eliminates the need to label near functions explicitly. However, labeling near functions is useful for documenting local and global functions.

In mixed-model programming, only functions in the default _TEXT code segment can call the C run-time library. Multipad avoids this limitation by not calling any C run-time library functions. Mixed-model programming uses the small-model C library, which is placed in the _TEXT segment. Because these library routines are built for the small model, they assume that all code is near. If a C library function is called from a different segment, a linker fixup error occurs because the linker cannot resolve a near call into another segment. There is no convenient way to avoid this restriction.

Removing the C run-time library

Because the C run-time library is not used, you need not link to it. The Windows version 3.1 SDK includes libraries named xNOCRTW.LIB that do not contain any C run-time functions. Each memory model has one such library, containing the minimum amount of code needed to resolve all compiler references. Using this library saves about 1.5K from the _TEXT code segment and about 500 bytes from the default data segment. Linking time also improves slightly. When using the xNOCRTW.LIB libraries, keep in mind that the compiler implements some operations that seem ordinary, such as long multiplication, as calls to library helper routines.

Examining the Prolog and Epilog Code

Decreasing the number of far functions is only part of the battle. Not all far functions need the full prolog and epilog code, as the existence of the /GW, /GA, and /GD options shows. The C/C++ version 7.0 /GA and /GD options provide the best achievable optimizations of the prolog and epilog code. The C version 6.0 /GW option provides an optimized version of the prolog/epilog code for far functions that are not entry points. However, when armed with a little knowledge, the C version 6.0 compiler user can generate better results for protected-mode applications than those the /GW option provides, as discussed in the following sections.

What does the prolog/epilog code do anyway?

The prolog/epilog code sets the DS register to the correct value to compensate for the existence of multiple data segments and their movements. The second column of Table 4 shows the assembly-language listing of the prolog/epilog code that every far function receives when it is compiled with /Gw. The last column shows the prolog/epilog code that near functions receive. This is the same code that far functions contain when they are not compiled with /Gw.

Table 4. Assembly Listing of Prolog and Epilog Code (C 6.0)

C/C++ version 7.0 provides additional optimizations for Real mode, even if you use the /Gw and /GW options. These optimizations include:

Using mov ax,ds instead of a push/pop sequence in the Preamble phase.

Using lea sp, WORD PTR -2[bp] for the Release Frame phase.

Table 5 shows the compiler output for these options.

Table 5. Assembly Listing of Prolog and Epilog Code (C/C++ 7.0)

Most of the prolog/epilog code is not needed in protected mode but is essential for Real mode. The /GW option does not have the push ds instruction that all far functions require in Real mode to save the data segment; for this reason, /GW does not work in Real mode. Not much can be done to optimize the prolog/epilog code that C version 6.0 generates for Real-mode applications, so this article focuses on optimization in protected mode only. For more information on what happens during Real mode, see Programming Windows by Charles Petzold (Redmond, Wash.: Microsoft Press, 1990). For the compiler writer’s viewpoint, see the Windows version 3.1 SDK Help file.

The order of phases in the C/C++ version 7.0 compiler options /GA and /GD differs slightly from that of /Gw: The Alloc Frame phase occurs before the Save DS and Load DS phases (when compiling without /G2). As a result, the /GA and /GD options remove the two dec bp instructions from the Release Frame phase. The compiler output for the /GA and /GD options is shown in Table 6.

Table 6. Assembly Listing of Prolog and Epilog Code (C/C++ 7.0)

Protected mode only

The Mark Frame and Unmark Frame phases are not needed during protected mode and can be ignored. The prolog/epilog code for a near function and the prolog/epilog compiled with /Gw differ in four phases: Preamble, Save DS, Load DS, and Restore DS. The other phases—Link Frame, Alloc Frame, Release Frame, and Unlink Frame—are the same; they set up the stack frame for the function. (See Figure 1.)

Figure 1. Stack Frame Creation

The compiler generates code to access the parameters passed to the function using positive offsets from BP ([BP + XXXX]). Negative offsets from BP ([BP - XXXX]) access the function’s local variables. This happens for all C functions: near functions, far functions, and functions compiled with the /Gw option.

Optimizing for 80286 processors (/G2)

Because protected mode requires at least an 80286 processor, you should use the special 80286 instructions that the /G2 option enables. Two instructions, enter and leave, are relevant to our current discussion. Enter performs the same function as the Link Frame and Alloc Frame phases, and leave performs the same function as the Release Frame and Unlink Frame phases. Table 7 shows the prolog/epilog code for near and far functions compiled with the /G2s option and without the /Gw option.

Table 7. Assembly Listing of Prolog/Epilog Code Compiled with /G2s

Unfortunately, the /Gw option overrides the /G2 option in C version 6.0 and generates the prolog/epilog code without the enter and leave instructions. The C/C++ version 7.0 compiler corrects this limitation; it generates Windows prolog/epilog code with the enter and leave instructions when it compiles with /GA or /GD and /G2. Table 8 shows the prolog/epilog code for functions compiled with C/C++ version 7.0 options.

Table 8. Assembly Listing of Prolog and Epilog Code for C/C++ 7.0 (Protected Mode Only)

The prolog preamble's purpose

The Preamble, Save DS, Load DS, and Restore DS phases exist only when you compile a far function with a Windows option (/Gw, /GW, /GD, or /GA). Programs developed for Windows, unlike those developed for MS-DOS, can have multiple instances, each with its own movable default data segment. When control is transferred from Windows to an application or from an application to a DLL, a mechanism is needed for changing DS to point to the correct default data segment. This mechanism consists of the prolog/epilog code, the Windows program loader, the EXPORTS section of the DEF file (or _export), and the MakeProcInstance function.

Nothing seems to happen in the Preamble, Save DS, and Load DS phases:

push ds     ; move ds into ax
pop ax
nop         ; now ax = ds
push ds     ; save ds
mov ds,ax   ; ds = ax, but ax = ds
            ; therefore ds = ds

It seems like a lot of work to set DS equal to itself. However, a lot happens behind the scenes. Examining the code with the Microsoft CodeView® debugger reveals three variations of the Preamble phase, each different from the listing that the /Fc compiler option generates (see Make4). The Client_WinProc (in WCLIENT.C), Client_Initialize (in CLIENT.C), and Picker_Do (in PICKER.C) functions demonstrate these variations. Table 9 lists them.

Table 9. Preamble Variations

The Windows program loader magically changes the Preamble phase of the prolog. When it loads a program, the loader first examines the list of exported functions. When it finds an entry-point function with the /Gw preamble, it changes the preamble. If the function is not exported or the preamble is different, the loader leaves it alone, and DS retains its value. For example, Client_Initialize does not need a different DS value, so its preamble is left unchanged.

If the function is part of a single-instance application, the value can be set directly because single-instance applications have only one data segment. Because DLLs are always single instance, they belong to this group. AX is set directly to DGROUP. In the Load DS phase, DS is loaded with the DGROUP value from AX, resulting in a correct DS value for the function.

In exported far functions, as demonstrated by Client_WinProc, Windows removes the entire preamble but still loads DS from AX during the Load DS phase. So where does AX get its value? It depends on how Windows calls the function. For all window procedures, including Client_WinProc, Windows sets up AX correctly before calling the procedure.

That leaves callbacks such as those used with the EnumFontFamilies function. You can set up an EnumFontFamilies callback as follows:

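FARPROC lpCallBack;

/* CallBack must be listed in the EXPORTS section of the DEF file. */
lpCallBack = MakeProcInstance(CallBack, hInstance);
EnumFontFamilies(hdc, NULL, lpCallBack, (LPSTR)NULL);
FreeProcInstance(lpCallBack);   /* release the instance thunk */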

MakeProcInstance creates an instance thunk, which is basically a jump table with an added capability: setting AX. Instance thunks appear as follows:

mov ax,XXXX
jmp <actual function address>   ; jump to actual function

The return value of MakeProcInstance is the address of the instance thunk. This address is passed to EnumFontFamilies, which calls the instance thunk instead of the function itself. The instance thunk sets AX to the current address of the data segment. In Real mode, Windows updates this value each time it moves the data segment. The thunk then jumps to the function, whose prolog loads DS from AX. And presto! chango! DS has the correct value.
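The following C model may make the thunk mechanism easier to follow. It is a simulation, not Windows code: the registers are ordinary variables, make_thunk plays the role of MakeProcInstance, and every name in it is hypothetical.

```c
/* C model of an instance thunk. Registers are simulated with globals;
   all names are hypothetical and nothing here calls Windows. */
#include <assert.h>

static unsigned reg_ax, reg_ds;      /* simulated AX and DS */

typedef int (*CALLBACK_FN)(void);

typedef struct {
    unsigned    data_seg;            /* operand of "mov ax,XXXX" */
    CALLBACK_FN target;              /* destination of the jmp */
} INSTANCE_THUNK;

/* Model of MakeProcInstance: one thunk per instance. */
INSTANCE_THUNK make_thunk(CALLBACK_FN fn, unsigned data_seg)
{
    INSTANCE_THUNK t;
    assert(fn != 0);
    t.data_seg = data_seg;
    t.target = fn;
    return t;
}

/* Real mode: when Windows moves the data segment, it updates the thunk. */
void move_data_seg(INSTANCE_THUNK *t, unsigned new_seg)
{
    t->data_seg = new_seg;
}

/* Calling through the thunk: set AX, then jump to the function. */
int call_thunk(const INSTANCE_THUNK *t)
{
    reg_ax = t->data_seg;
    return t->target();
}

/* An exported callback: its prolog loads DS from AX (Load DS phase). */
int enum_callback(void)
{
    reg_ds = reg_ax;
    return (int)reg_ds;
}
```

Two instances of the same callback get two thunks, each carrying its own data segment, which is why MakeProcInstance takes an instance handle.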

This discussion leads to some interesting conclusions:

An application cannot call an exported far function directly; it must use the result of MakeProcInstance as a function pointer instead.

An application should not use MakeProcInstance when calling a function in a DLL.

DLLs should not call MakeProcInstance on any exported far function that resides inside the DLL.

Non-exported far functions do not need the prolog/epilog code.

Windows sets up the AX register as part of its message-passing mechanism. Window procedures do not have instance thunks.

There are no obvious optimizations.

FixDS (/GA and /GEs)

FixDS by Michael Geary is a public domain program available on CompuServe® that brings insight and imagination to the optimization process. Borland® C++ and Microsoft C/C++ version 7.0 both incorporate this feature. Under Microsoft C/C++ version 7.0 you can use /GA to perform the same function as FixDS (see Tables 6 and 8).

So far, we have not discussed the SS stack segment register. The prolog code does not set SS anywhere. This must mean that the Windows Task Manager sets SS before the function is executed. Because a Windows application is not normally compiled with the /Au or /Aw option, SS == DS. So there is no reason why DS cannot be loaded simply from SS.

Instead of pushing DS into AX, FixDS modifies the prolog to put SS into AX, which is eventually placed in DS (see the fourth column of Table 10). This preamble differs from the standard Windows preamble, so the Windows loader does not modify it.

This method has two convenient side effects:

You no longer need MakeProcInstance.

You do not have to export entry points.

FixDS does not work for DLLs because DS != SS.

Table 10. Assembly Listing of Optimized Prolog and Epilog

The C/C++ version 7.0 compiler extends the ideas of FixDS by letting the programmer specify where DS gets its value. You can use the /GEx option in conjunction with the /GA and /GD options to load DS. The following options are available:

/GEa—Load DS from AX. This is equivalent to /Gw and /GW.

/GEd—Load DS from DGROUP. This is the default behavior for /GD and is useful for DLLs, as explained in the next section.

/GEs—Load DS from SS. This is equivalent to FixDS and is the default behavior for /GA.
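As a rough model of these choices (not compiler output), the function below returns the value a prolog would load into DS for each /GEx setting, given the registers on entry. The names and segment values are illustrative.

```c
/* Model of the /GEx DS sources; not generated code. Names and values
   are illustrative only. */
#include <assert.h>

typedef enum { GE_A, GE_D, GE_S } DS_SOURCE;   /* /GEa, /GEd, /GEs */

typedef struct {
    unsigned ax;    /* set by an instance thunk or by Windows */
    unsigned ss;    /* the caller's stack segment */
} ENTRY_REGS;

/* Value the prolog's Load DS phase would place in DS. */
unsigned prolog_ds(DS_SOURCE src, ENTRY_REGS regs, unsigned dgroup)
{
    switch (src) {
    case GE_A: return regs.ax;   /* like /Gw and /GW */
    case GE_D: return dgroup;    /* default for /GD */
    case GE_S: return regs.ss;   /* like FixDS; default for /GA */
    }
    assert(0 && "unknown DS source");
    return 0;
}
```

In an application, SS equals DGROUP, so loading DS from SS gives the right value even when no thunk has set AX. In a DLL, SS belongs to the calling application, so loading from SS gives the wrong segment; /GEd loads the DLL’s own DGROUP directly.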

When you compile an application with /GA, the functions marked with __export are not really exported (you can look at the exported functions with EXEHDR). If you compile the program with /GA /GEe, the EXEHDR listing shows all exported functions. A program that you compile with /GA loads DS from SS and does not need to export its entry points, as mentioned above. A program compiled with /GA /GEa should normally be compiled with /GEe.

The /GD and /GA options work differently. The /GD option exports functions marked with __export. To stop the compiler from exporting functions in a DLL, use /GA /GEd /D_WINDLL /Aw instead of /GD.

Optimized DLL Prolog and Epilog

Although the previous recommendations (excluding FixDS) work fine with DLLs, a better optimization method exists. To optimize a DLL with C version 6.0, compile all DLL modules with the options listed in Table 2 for modules without entry points:

/Aw /G2 /Gs /Oclgne /W3 /Zp

This compilation does not generate prolog or epilog code because the /Gw option is not used. To load DS correctly, mark all entry-point functions with _loadds, and list the functions that the client application calls in the EXPORTS section of the DEF file. This changes the prolog/epilog code to match the second column of Table 10.

_loadds basically adds the same lines that the Windows program loader places in the Preamble for a DLL. See Make5 for an example of this method. Again, this is for protected-mode-only applications.

The /GD option in C/C++ version 7.0 defaults to loading DS from the default data segment (see the third column of Table 10). The /GD option also sets _WINDLL and /Aw.

Notice that the compiler options include /Aw but not /Au. The /Aw option informs the compiler that DS != SS. The /Au option is equivalent to /Aw and a _loadds on every function, far and near. This is not an optimization because even near functions receive the three lines of code that set up the DS register.

Using _loadds does not work for applications that have multiple instances and therefore multiple DGROUPs. It does, however, work for single-instance applications. A single-instance application need not export functions: it passes the function addresses to Windows directly, and _loadds sets up DS. The application should make sure that another instance cannot start by checking the hPrevInstance parameter of WinMain. Windows would create a new data segment for a second instance, but the application contains hard-coded pointers to the first data segment. The application should also set up a single data segment in the DEF file as:

DATA SINGLE MOVEABLE

Otherwise, the _loadds function modifier will generate warnings. There is no need to use MakeProcInstance because the _loadds function modifier sets up the DS register correctly.

EXPORT vs. _export

In the previous examples, the functions are exported in the DEF file. You can also use the _export keyword to export DLL functions. This method has some drawbacks, depending on the method you use to link the application with the DLL. There are three methods:

Including an IMPORTS line in the DEF file

Using the IMPLIB utility

Linking explicitly at run time

Including an IMPORTS line in the DEF file

Including an IMPORTS line in the DEF file of the application, for example:

IMPORTS
PICKER.Picker_Do

although inconvenient for DLLs with many functions, allows you to rename functions, for example:

IMPORTS
PickIt = PICKER.Picker_Do

Now the application can call PickIt instead of Picker_Do. This is useful when DLLs from different vendors use the same function name and when you import a function directly by its ordinal number. The linker gives each exported function an ordinal number so that Windows can locate the function at load time without searching for it by name. You can override the default ordinal number by specifying a number after an “at” sign (@) in the DLL’s DEF file, for example:

; DLL's DEF file
EXPORTS
    Picker_Do @1

An application can import this function with the following DEF file entry:

; Application's DEF file
IMPORTS
    PickIt = PICKER.1

DLLs should always include ordinal numbers on exported functions.

Using the IMPLIB utility

Most programmers use the IMPLIB utility instead of an IMPORTS section in their DEF files. IMPLIB builds an import library (LIB file) from the DLL's DEF file or, if _export is used, from the DLL itself. The application links with this LIB file to resolve its calls to the DLL, so no IMPORTS section is needed.

One drawback of _export is that it assumes linking by name instead of linking by ordinal number. As a result, the linker chooses the function's ordinal number itself and places the function name in the Resident Name Table.

The linker is not guaranteed to assign the same ordinal each time it links the DLL. For example, the EXEHDR output for a DLL with two exported functions (in addition to WEP and ___EXPORTEDSTUB) may originally look like this:

Exports:
ord  seg  offset  name
  1    1    07a1  WEP                exported, shared data
  4    1    0e06  ___EXPORTEDSTUB    exported, shared data
  3    1    00ac  PICKER_OLDDLGPROC  exported, shared data
  2    1    0061  PICKER_DO          exported, shared data

Adding a third exported function to the DLL may change the other ordinals in the EXEHDR output, for example:

Exports:
ord  seg  offset  name
  1    1    07a1  WEP                exported, shared data
  3    1    0e06  ___EXPORTEDSTUB    exported, shared data
  4    1    0f00  NewFunction        exported, shared data
  2    1    00ac  PICKER_OLDDLGPROC  exported, shared data
  5    1    0061  PICKER_DO          exported, shared data

Applications that link to the DLL by ordinal, by any method, must now be rebuilt to use the new ordinals. You may also have to rebuild if you use the EXPORTS statement without explicitly giving ordinal numbers. Having to rebuild an application each time the DLL changes offsets many of the advantages of using DLLs.

Linking by name also results in function names being placed in the Resident Name Table, which is an array of function addresses indexed by function name. The Resident Name Table stays in memory for the life of the DLL. When linking by ordinal number, the function names reside on disk in the Non-Resident Name Table while an array of function addresses indexed by ordinal number resides in memory.

For a large DLL, the Resident Name Table could consume a significant amount of memory. Also, linking by name is much slower than linking by ordinal number because Windows must perform a series of string comparisons to find the function in the table.
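The difference between the two lookups can be sketched in portable C. This is a simplified, hypothetical model and does not reproduce the actual NE-format tables. The export list, the functions wep, picker_do, and picker_undo, and the helpers find_by_name and find_by_ordinal are all invented for illustration. A name lookup costs one string comparison per entry until it finds a match. An ordinal lookup is a single array index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified export table; exports[k] holds ordinal k + 1. */
typedef int (*ExportFn)(int);

struct ExportEntry {
    const char *name;   /* exported name */
    ExportFn    fn;     /* address of the exported function */
};

static int wep(int x)         { return x; }
static int picker_do(int x)   { return x + 1; }
static int picker_undo(int x) { return x - 1; }

static const struct ExportEntry exports[] = {
    { "WEP",         wep },          /* ordinal 1 */
    { "PICKER_DO",   picker_do },    /* ordinal 2 */
    { "PICKER_UNDO", picker_undo },  /* ordinal 3 */
};
#define EXPORT_COUNT (sizeof exports / sizeof exports[0])

/* Link by name: one string comparison per entry until a match. */
static ExportFn find_by_name(const char *name, int *comparisons)
{
    size_t i;

    assert(name != NULL && comparisons != NULL);
    *comparisons = 0;
    for (i = 0; i < EXPORT_COUNT; i++) {
        ++*comparisons;
        if (strcmp(exports[i].name, name) == 0)
            return exports[i].fn;
    }
    return NULL;
}

/* Link by ordinal: a range check and a single array index. */
static ExportFn find_by_ordinal(unsigned ordinal)
{
    if (ordinal == 0 || ordinal > EXPORT_COUNT)
        return NULL;
    return exports[ordinal - 1].fn;
}
```

Finding PICKER_UNDO by name takes three comparisons in this three-entry table. The cost grows with the size of the table, whereas the ordinal lookup takes the same time regardless of how many functions the DLL exports.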

Linking explicitly at run time

Run-time dynamic linking occurs when a function call is resolved at run time instead of load time. For example:

HANDLE  hLib;
FARPROC lpfnPick;

// Get library handle.
hLib = LoadLibrary("PICKER.DLL");

// Get address of function by name.
lpfnPick = GetProcAddress(hLib, "Picker_Do");

// Call the function.
(*lpfnPick)(hwnd, &aPicker);

// Free the library.
FreeLibrary(hLib);

Run-time linking by name does not use the function's ordinal number. When you link by name, the lookup is much faster if the function name is in the Resident Name Table, because names in the Non-Resident Name Table must be read from disk.

However, using ordinal numbers is still faster and uses less memory. For example:

#define PICKER_DO 3   // ordinal number of Picker_Do

HANDLE  hLib;
FARPROC lpfnPick;

// Get library handle.
hLib = LoadLibrary("PICKER.DLL");

// Get address of function by ordinal number.
lpfnPick = GetProcAddress(hLib, MAKEINTRESOURCE(PICKER_DO));

// Call the function.
(*lpfnPick)(hwnd, &aPicker);

// Free the library.
FreeLibrary(hLib);

Whichever linking method you use, the fastest and most flexible approach is to list the functions explicitly, with ordinal numbers, in the EXPORTS section of the DLL's DEF file. The C/C++ version 7.0 /GD option encourages the use of __export to mark entry points. If you use this option, we recommend that you also add an EXPORTS entry in the DEF file for every function that an application calls.

DS != SS issues

Some problems can arise within a DLL because DS != SS. A common problem occurs when a DLL calls the standard C run-time library. For example, if you compile the following code with the /Aw option:

void Foo()
{
    char str[10];        // allocates str on the stack
    strcpy(str, "BAR");  // passes the far address of str
                         // where a near pointer is expected
}

the compiler generates a near/far mismatch error. strcpy expects str to be in the default data segment (a near pointer). However, str is allocated on the stack, and because the stack segment does not equal the data segment, its address must be a far pointer. The following examples show how to avoid this situation.

You can place the array in the data segment by making it static:

void Foo2()
{
    static char str[10];   // allocates str in the data segment
    strcpy(str, "BAR");
}

You can place the array in the data segment by making it global:

char str[10];              // allocates str in the data segment

void Foo3()
{
    strcpy(str, "BAR");
}

Instead of linking with the small-model version of strcpy, you can use the large-model (also called the model-independent) version:

void Foo4()
{
    char str[10];
    _fstrcpy(str, "BAR");  // accepts far pointers
}

This version takes far pointers instead of near pointers, so the compiler passes the address of str as a far pointer that keeps its stack segment.

You can also use the following functions from the Windows library:

lstrcat

lstrcmp

lstrcmpi

lstrcpy

lstrlen

wsprintf

wvsprintf

If you use one of these functions, the previous example becomes:

void Foo4()
{
    char str[10];
    lstrcpy(str, "BAR");   // accepts far pointers
}

The following code fragment:

void Foo5()
{
    char str[10];          // allocated on the stack
    char *pstr;            // near pointer, based on DS
    pstr = str;            // loss of segment
    strcpy(pstr, "BAR");
}

causes the compiler to generate the following warning:

warning C4758: address of automatic (local) variable taken. DS != SS.

In this example, pstr is set to the offset of str, and the segment is lost because pstr is a near pointer. Declaring pstr a far pointer eliminates this problem. However, strcpy does not accept a far pointer, so you must use _fstrcpy, which gives the following corrected code:

void Foo6()
{
    char str[10];
    char FAR *pstr;        // far pointer
    pstr = str;            // no segment loss
    _fstrcpy(pstr, "BAR");
}

The following code also prevents the segment loss:

void Foo7()
{
    static char str[10];   // allocated in the data segment
    char *pstr;            // near pointer, based on DS
    pstr = str;            // no segment loss
    strcpy(pstr, "BAR");
}
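The segment loss can be modeled in portable C. In this model, a far pointer is a segment:offset pair, and a near pointer is a bare offset that is always interpreted relative to DS. The FarPtr and NearPtr types, the selector values, and the two conversion helpers are hypothetical. A real 16-bit compiler performs these conversions in the code it generates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of 16-bit segmented addressing: a far pointer
   carries a segment and an offset; a near pointer carries only an
   offset and is always interpreted relative to DS. */
typedef struct { uint16_t seg; uint16_t off; } FarPtr;
typedef uint16_t NearPtr;

/* Example selector values, chosen for illustration only. */
enum { DS_SEL = 0x1000, SS_SEL = 0x2000 };

/* Storing a far address in a near pointer keeps only the offset. */
static NearPtr far_to_near(FarPtr p)
{
    return p.off;
}

/* Widening a near pointer always supplies DS as the segment. */
static FarPtr near_to_far(NearPtr p)
{
    FarPtr f = { DS_SEL, p };
    return f;
}
```

A stack address that passes through a near pointer (as in Foo5) comes back with DS as its segment, so it now points into the wrong segment. A static array (as in Foo7) already lives in DS, so the same round trip is harmless.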

What happens if the C run-time function does not have a far version? For example, in the Picker DLL, the picker_OnMouseUp function calls _splitpath, which requires near pointers. Using static or global structures poses problems for multiple applications that use Picker simultaneously. To avoid these problems, Picker allocates memory from the local heap with the LocalAlloc(LMEM_FIXED,size) function, which returns a local pointer. This is exactly what Picker needs to call _splitpath.

Summary

Follow these guidelines to avoid DS != SS problems:

Be sure that all pointers you pass to a DLL are far pointers.

Declare pointers to stack variables as far pointers.

Declare arrays as static or global.

Avoid storing arrays on the stack.

Avoid storing variables referenced by pointers on the stack.

Use the local heap for storing data.

Use far versions of C run-time functions (such as _fstrcpy).

Use equivalent Windows functions (such as wsprintf or lstrcpy).

Use prototypes on all functions.

Reminders about DLLs:

FixDS does not work with DLLs because DS != SS.

Avoid using _export in DLLs with C version 6.0.

Use the DEF file to override the default behavior of functions marked with _export.

Always assign ordinal numbers to all exported DLL functions.

/Au introduces a considerable amount of overhead; use /Aw and _loadds instead.

Replace /Gw with _loadds on exported functions.

OPTIMIZING THE CALLING CONVENTION

Several calling conventions can be used for optimization, including _cdecl (/Gd), PASCAL (/Gc), and _fastcall (/Gr):

_cdecl is the default C calling convention and is slightly slower than PASCAL and _fastcall.

PASCAL (defined in WINDOWS.H as _pascal) is used to communicate between Windows and an application. It is faster than _cdecl but does not allow variable argument functions such as wsprintf.

_fastcall is the fastest method. It passes some of the parameters in registers, but it does not support variable argument functions and cannot be used with _export or PASCAL, so entry points cannot use the _fastcall modifier. Under C/C++ version 7.0, the __fastcall modifier (see note 2) can conflict with the Windows prolog/epilog code when used in the following combinations:

__fastcall, __far, /Gw (also invalid in C version 6.0)

__fastcall, __far, __export, /GA

__fastcall, __far, __export, /GD

__fastcall, __far, /GA, /GEf

__fastcall, __far, /GD, /GEf

__fastcall, __far, __export, /GA, /GEf

__fastcall, __far, __export, /GD, /GEf

Because the C run-time library is compiled with the _cdecl convention, you must include header files such as STDLIB.H and STRING.H when you use a different calling convention. These header files explicitly mark each function as _cdecl to simplify changing the default convention. When you use a third-party library, you may have to add the _cdecl function modifier to the header files.

You can use any calling convention as the default for applications, as long as you declare all entry points FAR PASCAL and declare the WinMain function PASCAL. Explicitly marking callback functions PASCAL is safer, even if you use the /Gc option to make Pascal the default, because it avoids problems if the default convention changes inadvertently. It also documents the code.

Summary of calling conventions:

WinMain should use the PASCAL calling convention.

Entry points that Windows calls must be FAR PASCAL.

Only _cdecl allows variable arguments.

_fastcall is incompatible with _export or PASCAL and is therefore incompatible with Windows prolog/epilog code.

DLLs and _cdecl

A DLL, unlike an application, can use any calling convention, even for application-called entry points. An application that calls a DLL must know which calling convention the DLL expects and must use that convention.

A DLL may need to implement a variable argument function. Because _cdecl is the only calling convention that supports variable arguments, declare such DLL functions _cdecl instead of PASCAL.
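For comparison, here is a minimal variable argument function in standard C (C99), assuming a hosted compiler rather than a Win16 DLL. The name debug_format is hypothetical. The function uses vsnprintf, which is not available to Win16 DLLs; in a DLL you would call wvsprintf instead. Only the caller knows how many arguments it pushed, so the caller must remove them from the stack, and that is the arrangement the _cdecl convention provides.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical debug helper: formats its variable arguments into a
   caller-supplied buffer. In Win16 C only a _cdecl function can be
   declared with "..." like this. */
int debug_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);                  /* locate the first unnamed argument */
    n = vsnprintf(buf, size, fmt, args);  /* never writes past buf[size - 1] */
    va_end(args);
    return n;                             /* length the full output would have */
}
```

Called as debug_format(buf, sizeof buf, "%s, value passed: %d", "DebugPrint() called", 10), it produces the same text as the DebugPrint call shown later in this section, minus the trailing "\r\n".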

Note the following caveats when using variable argument lists in DLLs:

The variable argument macros from STDARG.H use the default pointer size to point to the arguments that are on the stack. In the small or medium model, the pointers are near pointers. Because DS != SS, these pointers do not point to the correct value and must be changed to far pointers before you can use these macros, as shown in the modified STDARG.H below:

/****************************************************************
 * File:    wstdarg.h
 * Remarks: Macro definitions for variable argument lists
 *          used in DLLs.
 ****************************************************************/

typedef char _far *wva_list;

#define wva_start(ap, v)  (ap = (wva_list) &v + sizeof(v))
#define wva_arg(ap, t)    (((t _far *)(ap += sizeof(t)))[-1])
#define wva_end(ap)       (ap = NULL)

When passing arguments by reference, always use far pointer declarations. The compiler synthesizes far pointers by pushing the DS and the offset of the memory location onto the stack. This provides the DLL with the proper information to access the application’s data segment.

The arguments in the variable part of a _cdecl function's parameter list have no prototype, so the compiler cannot convert them. Pointer arguments that are not declared in the parameter list must therefore be cast to far pointers (such as LPSTR) in the function call. Otherwise, in the small and medium models the compiler passes near pointers, with unpredictable results. For example:

void FAR _cdecl DebugPrint(LPSTR lpStr, LPSTR lpFmt, ...);

DebugPrint(szValue, "%s, value passed: %d\r\n",
           (LPSTR) "DebugPrint() called", (int) 10);

When you import or export a _cdecl function, you must give its name in the DEF file with the underscore (_) prefix that the C naming convention adds. You must also preserve the case of the function name. For example, you can export the function above as follows:

EXPORTS
    WEP          @1 RESIDENTNAME
    _DebugPrint  @2

_cdecl functions must either be linked by ordinal number or have all-uppercase names.

Unlike Pascal functions, which are converted to uppercase before they are exported, _cdecl functions retain their case when exported. The Windows dynamic linking mechanism always converts function names to uppercase before it looks in the DLL for the function. However, functions exported from a DLL are expected to be in uppercase and are not converted. The result is a comparison between an uppercase function name and a mixed-case function name. This comparison, of course, fails. The solution is to declare the function name all-uppercase or to link by ordinal number and avoid the whole comparison problem.
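The mismatch can be reproduced with a small, hypothetical model of the loader's lookup. The model uppercases the requested name and then compares it, case-sensitively, with the names stored in the DLL. The table contents, the name _DEBUGDUMP, and loader_find are invented for illustration; the real lookup searches the DLL's name tables.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical export names as they might be stored in a DLL. */
static const char *dll_names[] = {
    "WEP",
    "PICKER_DO",     /* PASCAL function: exported in uppercase */
    "_DebugPrint",   /* _cdecl function: exported with its original case */
    "_DEBUGDUMP",    /* _cdecl function given an all-uppercase name */
};

/* Uppercases the requested name, then compares it byte for byte with
   each stored name. Returns a 1-based position, or 0 if not found. */
static int loader_find(const char *requested)
{
    char upper[64];
    size_t i, len = strlen(requested);

    assert(len < sizeof upper);
    for (i = 0; i <= len; i++)            /* also copies the terminator */
        upper[i] = (char)toupper((unsigned char)requested[i]);

    for (i = 0; i < sizeof dll_names / sizeof dll_names[0]; i++)
        if (strcmp(dll_names[i], upper) == 0)
            return (int)i + 1;
    return 0;
}
```

loader_find("_DebugPrint") returns 0 even though the DLL exports a function with exactly that name. The all-uppercase _DEBUGDUMP is found whatever case the caller uses, and linking by ordinal avoids the name comparison altogether.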

Variable argument C run-time library functions such as vsprintf and vfprintf do not take DS != SS into account. These functions are not available in DLLs. Compile with /D_WINDLL instead of /D_WINDOWS to detect functions that DLLs do not support. The C/C++ version 7.0 compiler option /GD does this automatically.

If the DLL will be used with different languages such as Visual Basic, Borland C++, Microsoft Excel, Zortech C++, or Microsoft FORTRAN, you should use the PASCAL convention. The registers used by the _fastcall convention can change between compiler versions and are not compatible between compilers by different vendors.

ALIASING AND WINDOWS (/Ow and /Oa)

An alias is a second name that refers to a memory location. For example, in:

int i;
int *p;

p = &i;

pointer p is an alias of variable i. You can use aliases to perform tasks while keeping the original pointer around, for example:

// No error checking.
int   i;

// Get a pointer.
LPSTR ptr = (LPSTR)GlobalLock(GlobalAlloc(GHND, 1000));
LPSTR ptr_alias = ptr;            // alias the pointer

for (i = 0; i < 1000; i++)
    *(ptr_alias++) = foo(i);      // use the alias

GlobalFree(GlobalHandle(ptr));    // free the memory
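A portable version of the same pattern follows. It assumes standard C, with malloc and free standing in for GlobalAlloc, GlobalLock, and GlobalFree. The foo shown here is a hypothetical stand-in that just returns a letter. ptr_alias walks through the block while ptr keeps the start, so the block can still be freed afterward.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for foo(i) in the example above. */
static char foo(int i)
{
    return (char)('a' + i % 26);
}

/* Fills a 1000-byte block through an alias; ptr stays at the start. */
char *fill_block(void)
{
    char *ptr = malloc(1000);
    char *ptr_alias = ptr;            /* alias the pointer */
    int i;

    if (ptr == NULL)
        return NULL;
    for (i = 0; i < 1000; i++)
        *(ptr_alias++) = foo(i);      /* use the alias */
    return ptr;                       /* caller frees the block with free() */
}
```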

When you tell the compiler that there is no aliasing, it makes the following assumptions:

If a variable is used directly, no pointers reference that variable.

If a pointer references a variable, that variable is not used directly.

If a pointer modifies a memory location, no other pointers access the same memory location.
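The last assumption can be seen in a small example in standard C. store_twice and store_twice_restrict are hypothetical names. The C99 restrict qualifier is not the same thing as the /Oa option, but it makes a similar promise for individual pointers: the programmer guarantees that no aliasing occurs, so the compiler may keep *a in a register instead of reading it again.

```c
#include <assert.h>

/* a and b may alias, so the compiler must re-read *a after the store
   through b. */
int store_twice(int *a, int *b)
{
    *a = 1;
    *b = 2;        /* modifies *a as well when a == b */
    return *a;     /* 1 if a and b differ, 2 if they are the same */
}

/* restrict promises that a and b never alias; calling this function
   with a == b is undefined behavior. */
int store_twice_restrict(int *restrict a, int *restrict b)
{
    *a = 1;
    *b = 2;
    return *a;     /* the compiler may fold this to "return 1" */
}
```

Passing the same address twice is legal for store_twice and returns 2. For store_twice_restrict it breaks the promise, just as aliased code breaks under /Oa.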

Global Register Allocation (/Oe)

Although aliasing is a common and acceptable practice, the compiler can improve optimizations if it can assume that there is no aliasing, because it can place more memory locations into registers. By default, the compiler uses registers:

To hold temporary copies of variables.

To hold variables declared with the register keyword.

To pass arguments to functions declared with _fastcall or compiled with /Gr.

The /Ow and /Oa options signal the compiler that it has more freedom to place variables or memory locations into registers; these options do not cause the compiler to keep variables in registers.

The global register allocation option /Oe, on the other hand, allocates register storage to variables, memory locations, or common subexpressions. Instead of using registers only for temporary storage or for producing intermediate results, the /Oe option places the most frequently used variables into registers. For example, /Oe places a window handle, hWnd, in a register if a function is likely to use hWnd repeatedly.

Because the no-aliasing options increase the compiler’s opportunities to place a variable in a register, it makes sense to use these options with /Oe. In many cases, the /Ow and /Oa options do not optimize without the /Oe option. In some cases, you can eliminate problems with /Ow or /Oa by turning off /Oe optimization.

Using /Ow instead of /Oa

What is the difference between /Ow (Windows version) and /Oa? Basically, /Ow is a relaxed version of /Oa. It assumes aliasing will occur across function calls, so a memory location placed in a register is reloaded after a function call. For example, in:

void foobar(int *p)
{
    // Compiler puts the value that p points to into a register.
    *p = 5;

    foo();

    // If compiled with /Ow, the compiler reloads the register
    // with *p.
    (*p)++;
}

the compiler places the memory referenced by pointer p into a register. If the /Ow option is set, the compiler reloads the register after the call to foo. If the /Oa option is set, *p is not reloaded after the function call. Thus, /Ow tells the compiler to forget everything about pointed-to values after function calls.

Compiling the code fragment above with /Ox and /Oa results in the following code:

mov  si,WORD PTR [bp+4]   ; pointer p is passed in at [bp+4]
mov  WORD PTR [si],5
call _foo
mov  WORD PTR [si],6      ; compiler assumes that *p cannot
                          ; change and generates *p=6 instead
                          ; of (*p)++

Notice how the compiler optimized away the increment of the value that p points to and stored the constant 6 instead.

Compiling the code with /Ox and /Ow results in the following correct version:

mov  si,WORD PTR [bp+4]   ; p
mov  WORD PTR [si],5
call _foo
inc  WORD PTR [si]        ; compiler assumes that
                          ; *p might change.
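The /Ow behavior is what standard C requires. In the following portable sketch, a hypothetical global pointer named g_alias lets foo modify *p without the caller's knowledge. foobar is changed to return the final value of *p so that the result can be checked. Code generated under the /Oa assumption shown earlier would store 6 where this version returns 42.

```c
#include <assert.h>
#include <stddef.h>

static int *g_alias;   /* hypothetical global alias that foo() writes through */

/* Stand-in for foo(): may change *p without receiving p. */
static void foo(void)
{
    if (g_alias != NULL)
        *g_alias = 41;
}

/* Same body as foobar above, but returns the final value of *p. */
int foobar(int *p)
{
    *p = 5;
    foo();
    (*p)++;        /* must use the value of *p as it is after the call */
    return *p;
}
```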

To understand the benefit this technique adds to a Windows-based program, look at the following code fragment:

void Foo(HWND hwnd)
{
    char ach[80];

    // Zero-terminate the string in case of error.
    ach[0] = 0;

    SendMessage(hwnd, WM_GETTEXT, sizeof(ach), (LONG)(LPSTR)ach);

    // If some text is returned, do something with it.
    if (ach[0] != 0)
    {
        Bar(ach);
    }
}

If you compile this code fragment with /Oa and C version 6.0, Bar is never called. If you use C/C++ version 7.0, Bar is called. The C version 6.0 compiler assumes that the SendMessage call does not change ach, so it believes ach[0] is still zero and optimizes away the if block. If you compile the code with /Ow, the compiler expects ach to change after any function call, including SendMessage.

The C version 6.0 compiler appears to be pretty dumb: it does not realize that the ach pointer was passed to SendMessage. However, as far as the compiler can tell, a LONG was passed, not a pointer. If a pointer had been passed, /Oa would have worked. For example, in the following code:

void SomeFunc(HWND hwnd, LPSTR astr, int asize)
{
    SendMessage(hwnd, WM_GETTEXT, asize, (LONG)astr);
}

void Foo(HWND hwnd)
{
    char ach[80];

    // Pass a pointer.
    SomeFunc(hwnd, (LPSTR)ach, sizeof(ach));

    if (ach[0] != 0)
    {
        Bar(ach);
    }
}

the compiler knows that the pointer is being passed and can be changed. This problem can occur in any function that takes a pointer as a DWORD (lparam) or a WORD (wparam). The C/C++ version 7.0 compiler corrects this behavior.

You can also solve this problem by declaring ach volatile. This tells the compiler not to keep the contents of ach in a register, so every access reads memory. However, /Ow usually generates better code than using the volatile keyword.

Although /Ow is the easiest solution, the code it generates is not as efficient as the code /Oa generates, as illustrated by the hWnd window handle in the previous example. Window handles are commonly used in functions. They are perfect examples of variable types that are meant to be placed into registers; however, with the /Ow option they are reloaded after any function call. Using #pragma optimize at strategic locations to turn /Ow and /Oa off prevents problems associated with reloading. A profiler can help determine the placement of such statements.

The STRICT macros defined in WINDOWS.H in the Windows version 3.1 SDK also reduce the need for the /Ow option. WINDOWSX.H includes macros that make most window functions type-safe, so a pointer is passed as a pointer instead of as a LONG. The STRICT macros make an application more robust and should be used even if the /Oa option is not in effect.
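The SendMessage case can be sketched in portable C by passing the buffer address through an integer, as an lParam does. send_get_text is a hypothetical stand-in for SendMessage with WM_GETTEXT, intptr_t stands in for LONG, and "Picker" is invented window text. In standard C, converting the address of ach to an integer makes it reachable by the callee, so a conforming compiler must read ach[0] again after the call. The C version 6.0 /Oa behavior described above did not do this.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SendMessage(hwnd, WM_GETTEXT, size, lParam):
   the buffer address arrives as an integer. */
static long send_get_text(int size, intptr_t lparam)
{
    char *buf = (char *)lparam;          /* recover the pointer */
    const char *text = "Picker";         /* hypothetical window text */

    if (size <= 0)
        return 0;
    strncpy(buf, text, (size_t)size - 1);
    buf[size - 1] = '\0';
    return (long)strlen(buf);
}

static int bar_calls;                    /* counts calls to Bar */

static void Bar(const char *s)
{
    (void)s;
    bar_calls++;
}

void Foo(void)
{
    char ach[80];

    ach[0] = 0;                          /* zero-terminate in case of error */
    send_get_text((int)sizeof ach, (intptr_t)ach);
    if (ach[0] != 0)                     /* must be re-read after the call */
        Bar(ach);
}
```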

Avoiding Undocumented Features

Undocumented “features” are rarely necessary or useful, with the exception of file functions such as _lcreate that were not documented before Windows version 3.x. For example, an undocumented feature that saves neither time nor effort is demonstrated by the following code segment.

HANDLE h = LocalAlloc(LMEM_MOVEABLE, cb);
HANDLE h2;

// WARNING: Undocumented hack.
// Dereference the handle without locking it.
char* p = *((char**)h);

// Use *p for a bit.
*p = 0;

h2 = LocalAlloc(LMEM_MOVEABLE, cb);

// Hmm... It could have moved, so dereference it again.
p = *((char**)h);

if (*p == 0)
{
    // Do something.
}

You should not use this undocumented feature for two reasons:

Future versions of Windows will have a flat memory model and will not support this type of memory accessing.

The code will not compile as expected if you use the /Oa option. The p pointer is not passed to the LocalAlloc function, so the compiler assumes that the call does not change p. The programmer has tried to outsmart the compiler by dereferencing the handle again after the call, so the program appears to be safe. Not quite. The compiler removes the second dereference statement because it assumes that p did not change as a result of the function call; this is exactly what the person who had to support the code would do.

To avoid this problem:

Do not use this undocumented feature. (This is the best solution.)

Use /Ow instead of /Oa.

Always lock handles to memory before using them.

Use #pragma optimize to selectively turn the /Ow option on and off. You can also turn /Oe off.

Use the volatile keyword to ensure that variables are not placed in registers.
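A hypothetical model of moveable memory shows why the second dereference matters. In the model, a handle points to a master pointer, and a compaction step moves the block and updates that master pointer. Windows is not required to implement handles this way; the model only mirrors the assumption the hack relies on. The names heap_a, heap_b, master, Handle, and compact are invented.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical moveable-memory model: a handle points to a master
   pointer, which is updated whenever the block moves. */
typedef char **Handle;

static char heap_a[32], heap_b[32];
static char *master = heap_a;            /* block currently lives in heap_a */

/* Stand-in for an allocation that compacts memory and moves the block. */
static void compact(void)
{
    char *dest = (master == heap_a) ? heap_b : heap_a;

    memcpy(dest, master, sizeof heap_a);
    memset(master, 0x7F, sizeof heap_a); /* old location now holds garbage */
    master = dest;
}
```

A pointer cached before compact() still points at the old, overwritten location, while dereferencing the handle again finds the data. Skipping that second dereference is exactly what the compiler does under /Oa. Locking the handle avoids the problem, because a locked block does not move.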

1 Previous versions were named xNOCRT.LIB.

2 Note that the C/C++ version 7.0 compiler uses two underscores in modifier names but also recognizes the single-underscore names that C version 6.0 uses.