Microsoft DirectX Developer FAQ

Tony Cox
Microsoft Corporation

November 1999

Summary: This article provides in-depth answers to frequently asked development questions regarding Microsoft DirectX, version 7.0, and includes code samples, resources, newsgroups, and SDK information for the DirectX developer. (26 printed pages)

Contents

DirectX Developer Resources
The DirectX SDK and Run Time
General DirectX Development Issues
DirectDraw
Direct3D (Immediate Mode)
DirectSound
DirectPlay
DirectMusic

DirectX Developer Resources

A number of excellent resources are available to DirectX developers. The primary resource is, of course, the Microsoft® DirectX® Web site at www.microsoft.com/directx/. In addition, beta program members can obtain access to the Microsoft private newsgroups for DirectX.

The DirectX SDK and Run Time

Where can I obtain the latest version of the DirectX SDK?

The DirectX SDK can be obtained through Microsoft's DirectX Web site at http://msdn.microsoft.com/directx/downloads.asp. MSDN Universal subscribers automatically receive the latest SDK as part of their MSDN subscription. DirectX beta program members automatically receive both beta and final versions of the SDK, as they become available. The end user run-time portion of DirectX is always available for download from the Microsoft site.

How do I get on the DirectX beta program?

You should send email to directx@microsoft.com. Make sure you include your name, company name, and a fax number and/or postal address. You will be faxed/mailed a nondisclosure agreement (NDA), which you must sign and return before being accepted onto the beta program. Once you're on the beta program, you will be sent beta releases of the SDK, together with an account and password for the Microsoft private DirectX newsgroups.

How should I contact Microsoft with feedback and bug reports?

The general mailing address for all DirectX issues is directx@microsoft.com. Bug reports should be sent to the DirectX bug reporting address, dxbugs@microsoft.com. When you submit a bug report, always include the report produced by the DirectX bug reporting tool, which contains details of the hardware and driver versions of various components installed on your system. This information makes it much easier for Microsoft to track down and fix your bug.

What's the status of DirectX on Windows NT or Windows 2000?

DirectX version 3, plus Microsoft DirectPlay® version 5.2, is supported on Microsoft Windows NT® 4.0 with service pack 3. With service pack 4, DirectPlay is upgraded to version 6.0, with other components still remaining at version 3; and service pack 5 upgrades DirectPlay to version 6.1a.

Windows 2000 fully supports DirectX 7, including full Direct3D hardware acceleration.

What happened to the foreign language versions of the redistributable components for DirectX 7?

They're still there, but they are now all rolled into the same cabinet (CAB) file. The redistributable is now compressed, so it takes up less space than before, despite having all the foreign language versions included.

General DirectX Development Issues

Why do I get lots of error messages when I try to compile the samples?

You probably don't have your include path set correctly. Many compilers, including Microsoft Visual C++®, include an earlier version of the SDK. So if your include path searches the standard compiler include directories first, you'll get incorrect versions of the header files. To remedy this, make sure the include path and library paths are set to search the DirectX include and library paths first. See also the dxreadme.txt file in the SDK.

I get linker errors about multiple or missing symbols for GUIDs—what do I do?

The various globally unique identifiers (GUIDs) you use should be defined once and only once. The definition for the GUID will be inserted if you #define the INITGUID symbol before including the DirectX header files. Therefore, you should make sure that this only occurs for one compilation unit. An alternative to this method is to link with the dxguid.lib library, which contains definitions for all of the DirectX GUIDs. If you use this method (which is recommended), then you should never #define the INITGUID symbol.

Can I simply cast a pointer to a DirectX interface to a lower version number?

No. DirectX interfaces are Component Object Model (COM) interfaces. This means that there is no requirement for higher numbered interfaces to be derived from corresponding lower numbered ones. Therefore, the only safe way to obtain a different interface to a DirectX object is to use the QueryInterface() method of the interface. This method is part of the standard IUnknown interface, from which all COM interfaces must derive.
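
The reason a cast is unsafe can be illustrated without any DirectX headers. The following is a minimal sketch, not real COM code: IWidget, IWidget2, and Widget are hypothetical names, and the simplified QueryInterface() takes a string rather than a real interface ID. It shows that two interfaces on the same object can live at different addresses, so only the object itself can hand out the right pointer.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for two versions of an interface.
// IWidget2 is deliberately NOT derived from IWidget.
struct IWidget  { virtual int Version() const = 0;  virtual ~IWidget() {} };
struct IWidget2 { virtual int Version2() const = 0; virtual ~IWidget2() {} };

// One object implements both interfaces side by side.
class Widget : public IWidget, public IWidget2
{
public:
    int Version() const override  { return 1; }
    int Version2() const override { return 2; }

    // Simplified stand-in: real COM takes an interface ID and returns an HRESULT.
    void* QueryInterface(const char* name)
    {
        if (std::strcmp(name, "IWidget") == 0)  return static_cast<IWidget*>(this);
        if (std::strcmp(name, "IWidget2") == 0) return static_cast<IWidget2*>(this);
        return nullptr;   // interface not supported
    }
};

// Returns true if the two interface pointers refer to different addresses,
// in which case reinterpreting one as the other would be wrong.
inline bool InterfacesDiffer(Widget& w)
{
    void* a = w.QueryInterface("IWidget");
    void* b = w.QueryInterface("IWidget2");
    assert(a != nullptr && b != nullptr);
    return a != b;
}
```

In this sketch, a pointer obtained for "IWidget" and then cast directly to IWidget2* would point at the wrong part of the object, whereas the pointer returned for "IWidget2" is safe to use.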

What do the return values from the Release() or AddRef() methods mean?

The return value will be the current reference count of the object. However, the COM spec says that you should not rely on this, and the value is generally only available for debugging purposes. The values you observe may be unexpected because various other system objects may be holding references to the DirectX objects you create. For this reason, you should not write code that repeatedly calls Release() until the reference count is zero, as the object may then be freed even though another component may still be referencing it.

Does it matter in which order I release DirectX interfaces?

It shouldn't matter, because COM interfaces are reference counted. However, there are some known bugs with the release order of interfaces in some versions of DirectX. For safety, you are advised to release interfaces in the reverse of creation order, where possible.

What is a smart pointer, and should I use them?

A smart pointer is a C++ template class designed to encapsulate pointer functionality. In particular, some standard smart pointer classes are designed to encapsulate COM interface pointers. These pointers automatically perform QueryInterface() instead of a cast, and handle AddRef() and Release() for you. Whether you should use them is largely a matter of taste. If your code contains lots of copying of interface pointers, with multiple AddRef() and Release() calls, then smart pointers can probably make your code neater and less error prone. Otherwise, you can do without them. Visual C++ includes a standard Microsoft COM smart pointer, defined in the comdef.h header file (look up _com_ptr_t in the help).
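
To make the idea concrete, here is a minimal sketch of a reference-counting smart pointer. It is not the Visual C++ class: ComPtrSketch and MockInterface are hypothetical names, the constructor takes ownership of an existing reference without calling AddRef(), and the automatic QueryInterface() conversion is left out for brevity.

```cpp
#include <cassert>

// Hypothetical reference-counted object standing in for a COM interface.
struct MockInterface
{
    static int liveObjects;                    // how many objects currently exist
    MockInterface() : refs(1) { ++liveObjects; }
    unsigned long AddRef() { return ++refs; }
    unsigned long Release()
    {
        unsigned long remaining = --refs;
        if (remaining == 0) delete this;       // last reference frees the object
        return remaining;
    }
private:
    ~MockInterface() { --liveObjects; }
    unsigned long refs;
};
int MockInterface::liveObjects = 0;

// Minimal smart pointer: copies call AddRef(), destruction calls Release().
template <typename T>
class ComPtrSketch
{
public:
    ComPtrSketch() : p_(nullptr) {}
    // Takes over the caller's existing reference (no AddRef here).
    explicit ComPtrSketch(T* p) : p_(p) {}
    ComPtrSketch(const ComPtrSketch& other) : p_(other.p_) { if (p_) p_->AddRef(); }
    ComPtrSketch& operator=(const ComPtrSketch& other)
    {
        if (other.p_) other.p_->AddRef();      // AddRef first: safe for self-assignment
        if (p_) p_->Release();
        p_ = other.p_;
        return *this;
    }
    ~ComPtrSketch() { if (p_) p_->Release(); }
    T* operator->() const { assert(p_ != nullptr); return p_; }
    T* get() const { return p_; }
private:
    T* p_;
};
```

With a wrapper like this, copying the pointer and letting it go out of scope keeps the reference count balanced without any explicit AddRef() or Release() calls in the calling code.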

I have trouble debugging my DirectX application—any tips?

The most common problem with debugging DirectX applications is attempting to debug while a DirectDraw surface is locked. This situation can cause a "Win16 Lock" on Windows 9x systems, which prevents the debugger window from painting. Specifying the DDLOCK_NOSYSLOCK flag when locking the surface usually eliminates this. Windows 2000 does not suffer from this problem. When developing an application, it is also useful to be running with the debugging version of the DirectX run time (selected when you install the SDK), which performs some parameter validation and outputs useful messages to the debugger output.

What's the correct way to check return codes?

Use the SUCCEEDED() and FAILED() macros. DirectX methods can return multiple success and failure codes, so a simple ==DD_OK test will not always suffice.
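
The difference matters because some success codes are non-zero. The following sketch uses stand-in definitions (HResultSketch, Succeeded, Failed, and the k-prefixed constants are hypothetical names) so that it compiles without the Windows headers; the values mirror S_OK, S_FALSE, and E_FAIL. Real code should use HRESULT with the SUCCEEDED() and FAILED() macros directly.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for HRESULT, which is a 32-bit signed value.
typedef std::int32_t HResultSketch;

const HResultSketch kOk    = 0;   // same value as S_OK and DD_OK
const HResultSketch kFalse = 1;   // same value as S_FALSE: success, but not S_OK
const HResultSketch kFail  = static_cast<HResultSketch>(0x80004005u);  // same bits as E_FAIL

// Same rule as the SDK macros: negative values are failures.
inline bool Succeeded(HResultSketch hr) { return hr >= 0; }
inline bool Failed(HResultSketch hr)    { return hr < 0; }

// A naive check that treats every non-zero code as a failure.
inline bool NaiveIsOk(HResultSketch hr) { return hr == kOk; }

// Usage: the naive test misclassifies a non-zero success code.
inline void ReturnCodeExample()
{
    assert(Succeeded(kOk)    && NaiveIsOk(kOk));
    assert(Succeeded(kFalse) && !NaiveIsOk(kFalse));
    assert(Failed(kFail)     && !NaiveIsOk(kFail));
}
```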

I often get DDERR_INVALIDPARAMS or similar error codes. What should I do?

You are probably forgetting to correctly initialize a structure you're passing. In particular, ensure that the dwSize field is correctly filled out with the structure size. Also make sure that unused fields are zeroed. It's usually a good idea to do a ZeroMemory() or memset() to clear the structure before you use it. Another useful trick is to write code like:

DDSURFACEDESC2   desc = {sizeof(desc)};

This has the effect of zeroing the structure and setting the first member (dwSize is always the first member) to the size of the structure.
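
The behaviour of this initializer can be checked with a stand-in structure. In the sketch below, MockDesc is a hypothetical type that follows the same size-first convention; it is not a real DirectX structure.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical structure whose first member holds the structure size.
struct MockDesc
{
    unsigned long dwSize;
    unsigned long dwFlags;
    unsigned long dwWidth;
    unsigned long dwHeight;
    void*         lpData;
};

// Returns a descriptor prepared with the one-line initializer.
inline MockDesc MakeDesc()
{
    // The first member receives sizeof(desc); C++ aggregate initialization
    // zeroes every member that is not listed.
    MockDesc desc = { sizeof(desc) };
    assert(desc.dwSize == sizeof(MockDesc));
    return desc;
}
```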

What good books about DirectX are there?

Inside DirectX, published by Microsoft Press, covers most of the DirectX components, with the notable exception of Direct3D. A companion book, Inside Direct3D, is expected to be out soon.

Is there a good book explaining COM?

Inside COM by Dale Rogerson, published by Microsoft Press, is an excellent introduction to COM.

What books are there about general Windows programming?

Lots. However, the ones that are highly recommended are:

Programming Windows 95 by Charles Petzold and Paul Yao (Microsoft Press)
Advanced Windows by Jeffrey Richter (Microsoft Press)

How can I get started quickly with Direct3D programming?

Use the new D3DX utility library. This library takes care of all sorts of initialization problems, and provides a useful set of math functions and simple shapes (spheres, boxes, cylinders, etc.) as well as texture loading and manipulation functions.

DirectDraw

DirectDraw is responsible for managing basic display functions; it also acts as a video memory manager and provides access to hardware blitting functionality.

Do I really have to enumerate the DirectDraw devices?

No. If you know you just want to use the primary desktop display device, then it's safe to pass NULL when specifying the device for DirectDrawCreate. However, if you want to take advantage of multimonitor capabilities or use a secondary 3-D device that may be present, you must enumerate the available devices.

How do I know what format the screen pixels are stored in?

To determine the pixel format for the screen, you must have an interface to the primary display surface. You then call the GetPixelFormat() method to return the format description structure. From this, you can work out if the pixel values are indices into a palette, or actual packed colour values. If the pixels are actual colour values, the structure tells you how the bits are packed for each colour channel. Do not assume any particular pixel format (in particular, there are at least two popular ways of bitpacking 16-bit pixels).
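
As an illustration of working from the reported format rather than assuming one, the sketch below packs an 8-bit-per-channel colour into a 16-bit pixel using only the channel bit masks. The helper names are hypothetical. In a real application the three masks would come from the pixel format structure (the dwRBitMask, dwGBitMask, and dwBBitMask members); here they are passed in as plain integers.

```cpp
#include <cassert>
#include <cstdint>

// Number of zero bits below a channel mask (the channel's shift).
inline unsigned MaskShift(std::uint32_t mask)
{
    assert(mask != 0);
    unsigned shift = 0;
    while ((mask & 1u) == 0) { mask >>= 1; ++shift; }
    return shift;
}

// Number of set bits in a channel mask (the channel's width).
inline unsigned MaskBits(std::uint32_t mask)
{
    unsigned bits = 0;
    for (; mask != 0; mask >>= 1) bits += mask & 1u;
    return bits;
}

// Reduce an 8-bit channel value to the channel's width and move it into place.
inline std::uint32_t PackChannel(unsigned value8, std::uint32_t mask)
{
    unsigned bits = MaskBits(mask);
    assert(bits >= 1 && bits <= 8);
    return ((value8 >> (8 - bits)) << MaskShift(mask)) & mask;
}

// Pack r, g, b (0 to 255 each) into a 16-bit pixel described by three masks.
inline std::uint16_t PackRGB16(unsigned r, unsigned g, unsigned b,
                               std::uint32_t rMask, std::uint32_t gMask,
                               std::uint32_t bMask)
{
    return static_cast<std::uint16_t>(
        PackChannel(r, rMask) | PackChannel(g, gMask) | PackChannel(b, bMask));
}
```

For example, pure red packs to 0xF800 with the 5-6-5 masks (0xF800, 0x07E0, 0x001F) but to 0x7C00 with the 5-5-5 masks (0x7C00, 0x03E0, 0x001F), which is why hard-coding either layout is unsafe.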

How do I write a single pixel to a surface?

First, you must lock the surface, using the Lock() method. The surface description structure will be filled out with a pointer to, and the pitch of, the surface. The pitch of the surface tells you the increment, in bytes, between vertically adjacent pixels. Note that the pitch of a surface is not always equal to the width of the surface. The address of the desired pixel can then be computed. The data you write to the surface must be in the correct pixel format for the surface. The following code snippet shows writing a single pixel to a 16-bit surface:

void   WritePixel16( IDirectDrawSurface7* surface ,
                     int x , int y , WORD colour_value )
{
   DDSURFACEDESC2   desc;
   memset(&desc,0,sizeof(desc)); desc.dwSize = sizeof(desc);
   if (SUCCEEDED(surface->Lock(NULL,&desc,
                               DDLOCK_NOSYSLOCK|DDLOCK_WAIT,NULL)))
   {
      // lpSurface is an LPVOID, so C++ requires an explicit cast.
      char*   address = (char*) desc.lpSurface;
      // lPitch is the distance, in bytes, between the starts of rows.
      address += (x*2) + (y*desc.lPitch);
      *((WORD*)address) = colour_value;
      surface->Unlock(NULL);
   }
}

In practice, of course, you would not lock and unlock the surface each time a pixel is drawn, because locking can be an expensive operation if the surface is in video memory. Rather, you would lock the surface once, draw all the required pixels, and then unlock.
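
The lock-once pattern and the pitch arithmetic can be sketched against ordinary memory. In the example below, LockedSurface16 is a hypothetical stand-in for the information a successful Lock() provides (a base pointer and a pitch); it is not a DirectDraw type, and the buffer is plain system memory supplied by the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical view of a locked 16-bit surface.
struct LockedSurface16
{
    std::uint8_t* bits;    // plays the role of the surface pointer
    long          pitch;   // bytes between the starts of adjacent rows
    int           width;   // in pixels
    int           height;  // in pixels
};

// Write one pixel; the caller is assumed to hold the lock already.
inline void WritePixel16(const LockedSurface16& s, int x, int y, std::uint16_t colour)
{
    assert(x >= 0 && x < s.width && y >= 0 && y < s.height);
    std::uint8_t* address = s.bits + y * s.pitch + x * 2;
    std::memcpy(address, &colour, sizeof(colour));
}

// Draw many pixels under a single lock: here, one full row.
inline void FillRow16(const LockedSurface16& s, int y, std::uint16_t colour)
{
    for (int x = 0; x < s.width; ++x)
        WritePixel16(s, x, y, colour);
}
```

Note that the row stride is the pitch, not width * 2: in the usage below a 5-pixel-wide surface has a 16-byte pitch, and the padding bytes at the end of each row are never touched.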

Can I assume that the address of a surface will remain unchanged between locks?

No. Although it is often the case that the address of a surface is unchanged, you should not rely on this behavior. The driver is technically free to return a different address each time a surface is locked, and indeed it may wish to do so for virtual memory management reasons.

Does DirectDraw convert between pixel formats while blitting?

No. With the exception of unpacking compressed textures, a DirectDraw blit simply copies, bit-for-bit, the data held in the surface. You cannot use DirectDraw to perform general format conversion. In particular, no palette remapping is performed by a DirectDraw blit. You can use the graphics device interface (GDI) to obtain this functionality. DirectDraw will unpack 'DXTn' compressed textures to any sensible RGB(A) formatted surface.

Will DirectDraw emulate blit functions if they are not available in hardware?

Yes. With the exception of rotation and mirroring, all blit functions will be emulated when there is no blitting hardware present. You can disable this emulation by specifying the DDCREATE_HARDWAREONLY flag when creating the DirectDraw object (in which case, blits may fail where no hardware blit is available). Alternatively, you can force all blits to be emulated by specifying the DDCREATE_EMULATIONONLY flag.

Does DirectDraw supply any 2D drawing primitives (e.g., lines)?

No. The only operation supported is a blit (although this can be used to draw coloured boxes). Higher level 2-D functionality is available via the GDI.

I get crashes or my application grinds to a halt when I try to get a GDI device context (DC) for a surface. What should I do?

You are probably not releasing the device context properly after use. This is an easy mistake to make because, though the GetDC() method takes a pointer to an HDC, the ReleaseDC() method takes the HDC itself. For example:

// Get a DC for the surface.
HDC   dc;
if (SUCCEEDED(surface->GetDC(&dc)))
{
   // Do something with the DC.
   TextOut( dc , 0 , 0 , "Hello" , 5 );

   // Now release the DC...
   surface->ReleaseDC(&dc); // Wrong!! Should be dc, not &dc.
}

Unfortunately, the above (incorrect) code will compile without error. To remedy this, #define the symbol STRICT before including Windows.h, which will enable the compiler to spot type errors like this.
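
The reason STRICT helps can be sketched with stand-in types. Without STRICT, handle types are plain void pointers, and any object pointer (including the address of a handle) converts silently to void*. With STRICT, each handle is a pointer to its own dummy structure, so the mistake no longer compiles. LooseHDC, StrictHDC, and the helper below are hypothetical names used only for this illustration, which assumes a C++11 compiler.

```cpp
#include <cassert>
#include <type_traits>

// Without STRICT: a handle is just a void pointer.
typedef void* LooseHDC;

// With STRICT: a handle is a pointer to a distinct dummy structure.
struct StrictHDC__ { int unused; };
typedef StrictHDC__* StrictHDC;

// Can the address of a handle be passed where the handle itself is expected?
template <typename Handle>
constexpr bool AddressConvertsToHandle()
{
    return std::is_convertible<Handle*, Handle>::value;
}

// Usage: the loose definition accepts &dc, the strict one rejects it.
inline void StrictExample()
{
    assert(AddressConvertsToHandle<LooseHDC>());    // &dc silently accepted
    assert(!AddressConvertsToHandle<StrictHDC>());  // &dc would not compile
}
```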

How do I perform alpha blending?

DirectDraw does not currently support alpha blending in blit operations. If 3D hardware is available, then this can be used instead, by drawing alpha blended textured quads to simulate blits. This is also very likely to be the fastest method, and gives opportunity for additional effects like scaling, rotation, and filtering at little cost. If this approach is not suitable, perhaps because no 3-D hardware is available, the only alternative is to perform the alpha blend 'by hand,' locking the surfaces and modifying the data using the CPU. Most fast implementations use tables to speed up the blending operation, although where MMX instructions are available, these can be used to implement fast alpha blending for non-paletted surfaces. The following code snippet demonstrates how to use a lookup table to implement a fixed-weight alpha blend between two 8-bit paletted surfaces.

#include <limits.h>   // for INT_MAX

BYTE   g_BlendTable[256][256];

// Find nearest match for a given colour in the palette, using a
// squared distance error function.
int   FindColour( LPPALETTEENTRY palette , int r , int g , int b )
{
   int   best = 0;
   int   best_error = INT_MAX;
   for ( int i = 0 ; i < 256 ; i++ )
   {
      int er,eg,eb;
      er = r - (int)palette[i].peRed; er *= er;
      eg = g - (int)palette[i].peGreen; eg *= eg;
      eb = b - (int)palette[i].peBlue; eb *= eb;
      int error = er + eg + eb;
      if ( error < best_error )
      {
         best_error = error;
         best = i;
      }
   }
   return best;
}

// Initialise the blend table, given the palette.
// The weight factor is from 0 to 256.
void   InitialiseTable( LPPALETTEENTRY palette , int weight )
{
   for ( int i = 0 ; i < 256 ; i++ )
   {
      for ( int j = 0 ; j < 256 ; j++ )
      {
         // Compute the colour we'd like the blend
         // of colour indices i and j to be.
         int   r,g,b;
         r = (palette[i].peRed * weight);
         r += (palette[j].peRed * (256-weight));
         r /= 256;
         g = (palette[i].peGreen * weight);
         g += (palette[j].peGreen * (256-weight));
         g /= 256;
         b = (palette[i].peBlue * weight);
         b += (palette[j].peBlue * (256-weight));
         b /= 256;

         // Find nearest match in our palette.
         int index = FindColour( palette , r , g , b );
         
         // Store in table
         g_BlendTable[i][j] = (BYTE) index;
      }
   }
}

// Perform an alpha-blend, with no clipping or stretching. Assumes
// that the table has been initialised, and that both surfaces are
// 8-bit paletted, with the same palette as was used to
// initialise the table. The surfaces can't be the same.
void   AlphaBlend( IDirectDrawSurface7* dest ,
                   IDirectDrawSurface7* src ,
                   int dest_x , int dest_y ,
                   int src_x , int src_y ,
                   int width , int height )
{
   DDSURFACEDESC2   descd , descs;
   memset(&descd,0,sizeof(descd)); descd.dwSize = sizeof(descd);
   memset(&descs,0,sizeof(descs)); descs.dwSize = sizeof(descs);
   if (SUCCEEDED(dest->Lock(NULL,&descd,DDLOCK_WAIT,NULL)))
   {
      if (SUCCEEDED(src->Lock(NULL,&descs,DDLOCK_WAIT,NULL)))
      {
         BYTE*   destptr = (BYTE*) descd.lpSurface;
         BYTE*   srcptr = (BYTE*) descs.lpSurface;
         destptr += dest_x + (dest_y * descd.lPitch);
         srcptr += src_x + (src_y * descs.lPitch);

         while (--height>=0)
         {
            BYTE*   dd = destptr;
            BYTE*   ds = srcptr;
            int   w = width;
            while (--w>=0)
            {
               *dd = g_BlendTable[*dd][*ds];
               dd++;
               ds++;
            }
            destptr += descd.lPitch;
            srcptr += descs.lPitch;
         }

         src->Unlock(NULL);
      }
      dest->Unlock(NULL);
   }
}

The same can be done for 16-bit surfaces, by splitting the lookup table into two (the size would be prohibitive otherwise).

How can I fade the screen to and from black?

With paletted screen modes, a fade can be achieved via a simple manipulation of the palette. For non-paletted modes, the best method is to use the IDirectDrawGammaControl interface (queried from the primary surface) to adjust the colour ramp up or down. These methods also easily allow a fade to a colour other than black. Where gamma controls are not available, the fade needs to be done by manipulation of the pixels on the surface. The fastest mechanism is to use the alpha blending capabilities of 3-D hardware. Where this is not an option, the pixels must be manipulated 'by hand,' either by a table lookup (as with alpha blending) or by simple repeated decrement or division of the colour values.
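
For the paletted case, the manipulation amounts to scaling each palette entry towards black. The following is a minimal sketch: PaletteEntrySketch is a hypothetical stand-in with the same colour members as PALETTEENTRY, and the level parameter (0 for black, 256 for the original colours) is an assumption of this example. An application would keep the original palette, compute a faded copy each frame, and hand that copy to the palette object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in with the same colour members as PALETTEENTRY.
struct PaletteEntrySketch
{
    std::uint8_t peRed, peGreen, peBlue, peFlags;
};

// Scale every entry towards black.
// level runs from 0 (fully black) to 256 (original colours).
inline void FadePalette(const PaletteEntrySketch* original,
                        PaletteEntrySketch* faded,
                        int count, int level)
{
    assert(level >= 0 && level <= 256);
    for (int i = 0; i < count; ++i)
    {
        faded[i].peRed   = static_cast<std::uint8_t>((original[i].peRed   * level) / 256);
        faded[i].peGreen = static_cast<std::uint8_t>((original[i].peGreen * level) / 256);
        faded[i].peBlue  = static_cast<std::uint8_t>((original[i].peBlue  * level) / 256);
        faded[i].peFlags = original[i].peFlags;   // flags are left unchanged
    }
}
```

Stepping level down from 256 to 0 over a number of frames fades out; stepping it back up fades in. Always scaling from the saved original, rather than repeatedly scaling the current palette, avoids accumulating rounding errors.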

When my DirectDraw application exits, why do I get weird window resizing problems?

Shutting down DirectDraw incorrectly can cause problems. The most common mistake is to destroy the application window before shutting down DirectDraw. The window handle passed to SetCooperativeLevel() must remain valid until DirectDraw has been shut down. The safest place to shut down DirectDraw is in the processing of the WM_DESTROY message for your window.

What are the performance characteristics of video memory surfaces?

Accessing video memory with the CPU is very slow, especially read operations. For this reason, when you are doing a significant amount of direct manipulation of the surface, it is often faster to use a back buffer in system memory and blit it to the front buffer each frame. This is particularly true of alpha blending, or similar operations that require reading from surfaces. On the other hand, blits between video memory surfaces will often be significantly faster than system memory blits due to hardware acceleration. In addition, hardware accelerated blits can occur in parallel with regular processing, again boosting performance. Therefore, you should keep surfaces in video memory whenever possible if you will not need to access them with the CPU.

How can I determine how much video memory is available?

You can use the GetAvailableVidMem() method to determine the available video memory. However, not all drivers implement this method; those that do not simply return zero. Also, alignment restrictions, private data structures, and other factors mean that you should never rely on the exact byte count returned. For example, if there are x*y bytes free, it does not necessarily imply that you will be able to create an x by y 8-bit surface. The returned values are best treated as a guideline only.

What are the alignment rules for surfaces?

The alignment rules for surfaces are determined by the driver. A typical restriction is to pad surfaces to multiples of 8 bytes wide. However, the driver is free to impose any restriction it chooses. In particular, some legacy devices have rectangular allocation schemes. Therefore, it is important to determine the pitch of the surface by querying rather than computation.

How can I determine what chipset and/or driver is being used?

The GetDeviceIdentifier() method, introduced in version 6.0, returns a structure containing information about the chipset and driver: a unique GUID for the device/driver pair, vendor IDs, and descriptive strings that can be presented to a user.

Direct3D (Immediate Mode)

Direct3D is primarily responsible for providing access to 3-D acceleration hardware, although it does include software rasterization devices. Using Direct3D requires an understanding of DirectDraw, because DirectDraw is used to create and manage surfaces used by Direct3D (e.g., textures).

Where can I find information about 3-D graphics techniques?

The 'standard' book on the subject is Computer Graphics: Principles and Practice by Foley, Van Dam et al. and is a valuable resource for anyone wanting to understand the mathematical foundations of geometry, rasterization and lighting techniques. The FAQ for the comp.graphics.algorithms Usenet group also contains useful material.

Does Direct3D emulate functionality not provided by hardware?

No. If you are using a hardware device, Direct3D will perform no emulation of missing functionality. You must determine the available functionality by checking capability bits and using the ValidateDevice() method.

What functionality is provided by the software rasterizers?

Direct3D provides two software rasterization devices. The first is the regular RGB software rasterizer. This has been greatly improved in functionality from the 5.0 version, and now supports bilinear filtering and a full range of alpha blending operations. It also supports 2-stage multitexture with most common operations. It automatically takes advantage of MMX instructions, where available (previously the MMX rasterizer was enumerated as a separate device). The old ramp mode rasterizer is now obsolete and can only be accessed via the old (version 5.0 or earlier) interfaces. There is also a high-quality reference rasterizer. The reference rasterizer (known as refrast) is full featured, supporting eight-stage multitexture, all legal blending operations, anisotropic filtering, stencil buffer, and a wide range of texture formats. It is not high performance, but is a very useful reference to compare against the output of hardware devices when debugging.

What are the major changes between version 6.0 and version 7.0?

DirectX 7 supports hardware accelerated transformation and lighting. In addition, the programming model has been greatly simplified. Lights, materials, and viewports are no longer distinct COM objects, but are set by directly calling methods of IDirect3DDevice7. Textures no longer have a special interface, but use the regular IDirectDrawSurface7 interface with the Load() method moved to IDirect3DDevice7.

I have a Voodoo card; how do I get Direct3D to select it?

Your 3dfx card is a separate DirectDraw device to your main 2-D graphics card. Therefore, to select it, you must enumerate the DirectDraw devices in your system and select the Voodoo. You can then enumerate the 3-D devices associated with that card. One of them will be the Hardware Abstraction Layer (HAL) for the Voodoo.

Why doesn't my z-buffer work?

The following is a checklist of some of the common pitfalls when working with z-buffers:

Creation: the z-buffer should be created and attached to the backbuffer before the creation of the Direct3D device. It must be in a legal format, as returned by the EnumZBufferFormats() method.
Clearing: the z-buffer should be cleared each frame, by specifying the D3DCLEAR_ZBUFFER flag when calling the Clear() method.
Enabling: the z-buffer should be enabled by setting the ZENABLE and ZWRITEENABLE renderstates appropriately.
Geometry: care must be taken to feed correct z values to the rasterizer. Both rhw and sz should be computed correctly for each vertex. The sz value must lie in the range 0.0 to 1.0. This step is usually straightforward when using Direct3D's transformation pipeline, but care must be taken when the application is performing its own transformation.

What range of values is legal for w-buffers?

The driver determines what range of values is expected for w by examining the projection matrix. For this reason the application should set a correct projection matrix when w-buffering, even if the application performs its own transformations.

I upgraded my application to DirectX 7, and everything is all black—what's wrong?

With DirectX 7, lighting is performed based on the D3DRENDERSTATE_LIGHTING renderstate, rather than the flexible vertex format (FVF) you pass. Because lighting is on by default, you are probably getting all your polygons lit by Direct3D—which, if you didn't specify lights or materials, means everything comes out black. If you don't want Direct3D to perform lighting, you need to explicitly disable it.

How can I improve the performance of my Direct3D application?

The following are key areas to look at when optimising performance:

Batch size. Direct3D is optimized for large batches of primitives. The more polygons that can be sent in a single call, the better. A good rule of thumb is to aim to average over 100 polygons per call. Below that, you're probably not getting optimal performance; above that, you're into diminishing returns and potential conflicts with concurrency considerations (see below).
State changes. Changing renderstate can be an expensive operation, particularly when changing texture. For this reason, it is important to minimize as far as possible the number of state changes made per frame.
Concurrency. If you can arrange to perform rendering concurrently with other processing, you will be taking fullest advantage of system performance. This goal can conflict with the goal of reducing renderstate changes. You need to strike a balance between batching to reduce state changes and pushing data out to the driver early to help achieve concurrency. Using multiple vertex buffers in round-robin fashion can help with concurrency.
Texture uploads. Uploading textures to the device consumes bandwidth, competing with the bandwidth available for vertex data. Therefore, it is important not to overcommit texture memory, which would force your caching scheme to upload excessive quantities of textures each frame.
Context changes. Switching from 3-D to 2-D operations, for example, by blitting to or locking the frame buffer can cause a large stall as the device flushes its rendering pipeline. For this reason, it is important to avoid these operations as much as possible. Most 2-D components of scenes, head-up displays, and panels for example, can be rendered using 3-D primitives, generally resulting in better performance.
Vertex buffers (see below).
State macro blocks. These were introduced in version 7.0 and provide a mechanism for recording a series of state changes (including lighting, material and matrix changes) into a macro that can then be replayed by a single call. This has two advantages: first, you reduce the call overhead by making one call instead of many, and second, an aware driver can pre-parse and pre-compile the state changes, making it much faster to submit to the graphics hardware. State changes can still be expensive, but using state macros can help reduce at least some of the cost.
Use only a single Direct3D device. If you need to render to multiple targets, use SetRenderTarget. The run time is optimized for a single device, and there is a considerable speed penalty for using multiple devices.
Do not change FVF frequently. Changing FVF always causes a driver transition.
Try to avoid changing vertex buffer frequently (although obviously it is probably not possible to avoid changing vertex buffer altogether).

What is a vertex buffer, and how can it help me?

A vertex buffer is an object that encapsulates an array of vertices. It has similar semantics to a DirectDraw surface; a vertex buffer must be locked while accessing the contents. At creation time, the vertex buffer is placed in system or video memory, and the type of vertex contained is set. If the vertex buffer contains untransformed vertices, they can be transformed and the results placed in a target vertex buffer via the ProcessVertices() method. Using vertex buffers results in several potential benefits:

If multiple passes over the same data are required, perhaps to change renderstate between rendering different portions, it is then easier to avoid multiple transformations of the same vertex. This is achieved by transforming once, using ProcessVertices(), and then rendering from the transformed buffer, typically using several DrawIndexedPrimitiveVB() calls. This method is useful when you don't have hardware accelerated transform and lighting. If you have hardware accelerated transform and lighting then you should just submit the untransformed data multiple times (because ProcessVertices() would be performed on the host CPU).
The lock/unlock semantic avoids a redundant copy operation. Without a vertex buffer, the run time or driver is forced to take a copy of any block of data sent, because there is no guarantee that the application won't scribble over the data as soon as the call returns. With a vertex buffer, the application is required to lock the buffer to change the data, giving the driver the opportunity to block or fail on the Lock() call.
The vertex buffer could be stored in video memory, where it can be accessed very quickly by 3-D hardware. This is particularly important for hardware that accelerates transform and lighting.
The Optimize() method can provide a performance improvement where a vertex buffer will be reused multiple times without changing the contents. Optimize() rearranges the data in the buffer to make accessing it faster, depending on system configuration. Transformed buffers will be rearranged into the optimal format for rendering hardware, and untransformed buffers will be rearranged into the optimal format for transformation (which may depend on the CPU type of the system). Once a buffer has been optimized the data may be in a device/system-dependent format unknown to the application, so it will be no longer possible to lock the buffer.
DirectX 7 introduced several features that make vertex buffers easier to use and more efficient:
- DrawIndexedPrimitiveVB now allows a sub-range of a vertex buffer to be specified, which can greatly improve the efficiency in a software T&L situation.
- The additional flags DDLOCK_DISCARDCONTENTS and DDLOCK_NOOVERWRITE allow better concurrency without needing to allocate multiple 'round-robin' buffers. (See below)

What's a good usage pattern for vertex buffers if I'm generating dynamic data?

1. Create a VB with D3DVBCAPS_WRITEONLY that contains N (typically 1000 or less) vertices.
2. I = 0.
3. Set State (textures, renderstates, etc.).
4. Check if there is space in the VB, i.e., I + M <= N? (M is the number of new vertices).
5. If yes, then Lock the VB with DDLOCK_NOOVERWRITE. This tells Direct3D and the driver that you will be simply adding vertices and won't be modifying the ones that you previously batched. So, if a DMA operation was in progress, it is not interrupted. If no, go to 11.
6. Fill in the M vertices at I.
7. Unlock.
8. Draw[Indexed]PrimitiveVB using I as the offset.
9. I += M.
10. Go to 3.
11. OK, as we are out of space, let us start with a new VB. We don't want to use the same one because a DMA operation might be in progress. We communicate this to Direct3D and the driver by locking the **SAME** VB with DDLOCK_DISCARDCONTENTS. What this means is "you can give me a new pointer because I am finished with the old one and don't really care about the old contents any more."
12. I = 0.
13. Go to 4 (or 6).

This procedure is only valid for DirectX 7. Also note that VB locks on DirectX 7 are superfast (~50 cycles), and unlocks are faster, so don't worry about the cost of steps 5 or 7.
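
The bookkeeping in the procedure above can be sketched against a mock buffer. Everything in this example is a stand-in: MockVertexBuffer, DynamicBatcher, and the LockMode values are hypothetical and do not call Direct3D. The mock simply records when a discard lock was requested; a real driver would be free to return different memory at that point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

// Stand-ins for the two lock flags; these are not the SDK's values.
enum LockMode { kNoOverwrite, kDiscardContents };

// Hypothetical mock of a write-only vertex buffer holding N vertices.
class MockVertexBuffer
{
public:
    explicit MockVertexBuffer(std::size_t n) : storage_(n), discards_(0) {}
    Vertex* Lock(LockMode mode)
    {
        if (mode == kDiscardContents) ++discards_;   // old contents abandoned
        return storage_.data();
    }
    void Unlock() {}
    std::size_t Capacity() const { return storage_.size(); }
    int Discards() const { return discards_; }
private:
    std::vector<Vertex> storage_;
    int discards_;
};

// Appends batches of vertices, wrapping to the start when space runs out.
class DynamicBatcher
{
public:
    explicit DynamicBatcher(MockVertexBuffer& vb) : vb_(vb), next_(0) {}

    // Copies m vertices into the buffer and returns the offset to draw from.
    std::size_t Append(const Vertex* src, std::size_t m)
    {
        assert(m <= vb_.Capacity());
        LockMode mode = kNoOverwrite;            // enough space: append only
        if (next_ + m > vb_.Capacity())          // out of space
        {
            mode = kDiscardContents;             // ask for fresh contents
            next_ = 0;                           // restart at the beginning
        }
        Vertex* dst = vb_.Lock(mode);
        for (std::size_t i = 0; i < m; ++i)
            dst[next_ + i] = src[i];             // fill in the new vertices
        vb_.Unlock();
        std::size_t offset = next_;              // a draw call would use this offset
        next_ += m;
        return offset;
    }
private:
    MockVertexBuffer& vb_;
    std::size_t next_;
};
```

With a 10-vertex buffer and batches of 4, successive appends land at offsets 0 and 4 with no-overwrite locks; the third batch does not fit, so it triggers a discard lock and lands at offset 0 again.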

Which primitive types (strips, fans, lists, etc.) should I use?

Many meshes encountered in real data feature vertices that are shared by multiple polygons. To maximize performance, it is desirable to reduce the duplication in vertices transformed and sent across the bus to the rendering device. It is clear that using simple triangle lists achieves no vertex sharing, and so is the least optimal method. The choice then is between using strips and fans, which imply a specific connectivity relationship between polygons, and using indexed lists. Where the data naturally falls into strips and fans, this is the most appropriate choice, because they minimize the data sent to the driver. However, decomposing meshes into strips and fans often results in a large number of separate pieces, implying a large number of DrawPrimitive calls. For this reason, the most efficient method is usually to use a single DrawIndexedPrimitive call with a triangle list. An additional advantage of using an indexed list is that a benefit can be gained even when consecutive triangles only share a single vertex.

In summary: if your data naturally falls into large strips or fans, then use strips or fans; otherwise use indexed lists.
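
To make the trade-off concrete, the sketch below counts vertices and draw calls for a regular grid of quads (two triangles per quad) under three submission schemes. It is an illustration under simplifying assumptions: the function names are hypothetical, strips are issued one per grid row, and the cost of the index data in the indexed case is not modeled.

```cpp
#include <cstddef>

struct SubmissionCost {
    std::size_t vertices;   // vertices transformed and sent to the device
    std::size_t drawCalls;  // number of DrawPrimitive-style calls
};

// Plain triangle list: every triangle carries its own three vertices.
SubmissionCost triangleListCost(std::size_t cols, std::size_t rows) {
    return { cols * rows * 6, 1 };
}

// One triangle strip per grid row: 2 * (cols + 1) vertices per strip.
SubmissionCost rowStripsCost(std::size_t cols, std::size_t rows) {
    return { rows * 2 * (cols + 1), rows };
}

// Indexed triangle list: each unique grid vertex appears once.
SubmissionCost indexedListCost(std::size_t cols, std::size_t rows) {
    return { (cols + 1) * (rows + 1), 1 };
}
```

For a 10 x 10 grid this gives 600 vertices in one call for the plain list, 220 vertices in 10 calls for row strips, and 121 vertices in one call for the indexed list.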

Can I have multiple BeginScene()/EndScene() pairs per frame?

Yes and no. In theory, you should have only one BeginScene()/EndScene() pair per render target per frame, and this rule certainly applies for scene capture cards like the PowerVR. However, for most conventional rendering devices, this restriction is unnecessary, and you will get the expected results from using multiple pairs. In most situations, multiple pairs are unnecessary, and for the sake of scene capture cards should be avoided.

Can I share position data between vertices with different texture coordinates?

The usual example of this situation is a cube where you want to use a different texture for each face. Unfortunately the answer is no—it is not currently possible to index the vertex components independently. This is sometimes used as an argument in favour of custom transformation code, rather than using the Direct3D pipeline. However, this argument is often spurious for the following reasons. First, the cube example is somewhat contrived, and in more realistic situations with larger polygon count meshes, it is far more common for a vertex shared between polygons to share all components. Second, an independent indexing mechanism might interfere with the smooth flow of data to the driver and/or card, and would likely have to be emulated by extra copy operations, negating a large amount of potential benefit.

Does the Direct3D geometry code utilize 3DNow! and/or Pentium III SIMD instructions?

Yes. The Direct3D geometry pipeline has several different code paths, depending on the processor type, and will utilize the special floating point operations provided by the 3DNow! or Pentium III SIMD instructions, where these are available.

How do I prevent transparent pixels being written to the z-buffer?

You can filter out pixels with an alpha value above or below a given threshold. You control this behavior by using the renderstates ALPHATESTENABLE, ALPHAREF and ALPHAFUNC.
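
The test itself is a simple per-pixel comparison between the pixel's alpha and the reference value. The sketch below models that comparison in plain C++; AlphaFunc and alphaTestPasses are hypothetical names used for illustration, not SDK identifiers. A pixel that fails the test is discarded, so it reaches neither the frame buffer nor the z-buffer.

```cpp
#include <cstdint>

// Hypothetical stand-in for the comparison selected through ALPHAFUNC.
enum class AlphaFunc { Less, LessEqual, Equal, NotEqual, GreaterEqual, Greater };

// Returns true when a pixel with the given alpha survives the test against
// the reference value (the role played by ALPHAREF).
bool alphaTestPasses(AlphaFunc func, std::uint8_t alpha, std::uint8_t ref) {
    switch (func) {
        case AlphaFunc::Less:         return alpha <  ref;
        case AlphaFunc::LessEqual:    return alpha <= ref;
        case AlphaFunc::Equal:        return alpha == ref;
        case AlphaFunc::NotEqual:     return alpha != ref;
        case AlphaFunc::GreaterEqual: return alpha >= ref;
        case AlphaFunc::Greater:      return alpha >  ref;
    }
    return false;  // not reached for valid enum values
}
```

For example, with a greater-or-equal comparison and a reference of 8, fully transparent texels (alpha 0) are rejected while opaque texels pass.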

What is a stencil buffer?

A stencil buffer is an additional buffer of per-pixel information, much like a z-buffer. In fact it 'lives' in some of the bits of a z-buffer. Common stencil/z-buffer formats are 15-bit z and 1-bit stencil, or 24-bit z and 8-bit stencil. It is possible to perform simple arithmetic operations on the contents of the stencil buffer on a per-pixel basis as polygons are rendered. For example, the stencil buffer can be incremented or decremented, or the pixel can be rejected if the stencil value fails a simple comparison test. This is useful for effects that involve marking out a region of the frame buffer and then rendering only to the marked (or unmarked) region. Good examples are volumetric effects like shadow volumes.

How do I use a stencil buffer to render shadow volumes?

The key to this, and other volumetric stencil buffer effects, is the interaction between the stencil buffer and the z-buffer. A scene with a shadow volume is rendered in three stages. First, the scene without the shadow is rendered as usual, using the z-buffer. Next, the shadow is marked out in the stencil buffer as follows. The front faces of the shadow volume are drawn using invisible polygons, with z-testing enabled but z-writes disabled, and the stencil buffer incremented at every pixel passing the z-test. The back faces of the shadow volume are rendered similarly, but decrementing the stencil value instead. Now, consider a single pixel. Assuming the camera is not in the shadow volume, there are four possibilities for the corresponding point in the scene. If the ray from the camera to the point does not intersect the shadow volume, then no shadow polygons will have been drawn there, and the stencil buffer is still zero. Otherwise, if the point lies in front of the shadow volume, the shadow polygons will be z-buffered out and the stencil again remains unchanged. If the point lies behind the shadow volume, then the same number of front shadow faces as back faces will have been rendered and the stencil will be zero, having been incremented as many times as decremented. The final possibility is that the point lies inside the shadow volume. In this case, the back face of the shadow volume will be z-buffered out, but not the front face, so the stencil buffer will be a non-zero value. The end result is that portions of the frame buffer lying in shadow have non-zero stencil value. Finally, to actually render the shadow, the whole scene is washed over with an alpha-blended polygon set to only affect pixels with non-zero stencil value. An example of this technique can be seen in the "Shadow Volume" sample that comes with the DirectX SDK.
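
The per-pixel reasoning above can be checked with a small software model. The sketch below is not Direct3D code: VolumeFace and the two functions are hypothetical names, the z-test is assumed to be "less than", and the stencil value is modeled as a signed integer rather than a small hardware counter.

```cpp
#include <vector>

// One shadow-volume polygon crossed by the ray through a pixel.
struct VolumeFace {
    float depth;       // distance from the camera along the ray
    bool frontFacing;  // true for front faces, false for back faces
};

// Stencil value at a pixel after the shadow-marking pass: front faces that
// pass the z-test increment, back faces that pass the z-test decrement.
int stencilAfterShadowPass(float sceneDepth, const std::vector<VolumeFace>& faces) {
    int stencil = 0;
    for (const VolumeFace& face : faces) {
        if (face.depth < sceneDepth) {  // z-test passes; z-writes are disabled
            stencil += face.frontFacing ? 1 : -1;
        }
    }
    return stencil;
}

// A pixel is shaded by the final pass only if its stencil value is non-zero.
bool inShadow(float sceneDepth, const std::vector<VolumeFace>& faces) {
    return stencilAfterShadowPass(sceneDepth, faces) != 0;
}
```

With a volume whose front face is at depth 2 and back face at depth 5, a scene point at depth 1 (in front) or depth 10 (behind) ends with stencil 0, while a point at depth 3 (inside) ends with stencil 1.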

How do I defeat the automatic mipmapping that some drivers perform?

You can defeat the automatic mipmapping performed by some drivers (notably for nVidia hardware) by explicitly specifying a mipmap chain with a depth of 1.

I've just converted from DX5 to DX6 interfaces, and all my objects are screwed up. What have I done wrong?

A common 'gotcha' when upgrading to the new interfaces is specifying the vertex type incorrectly. The old DX5 interfaces took a member of the D3DVERTEXTYPE enum, whereas the DX6 interfaces expect a flexible vertex format (FVF) specification. For example, instead of D3DVT_VERTEX, you now need to use D3DFVF_VERTEX.

The lighting has changed for DX7, but I can't get the new lighting stuff to work right. What am I doing wrong?

Unfortunately, the lighting documentation has some errors. In particular, the documentation on how attenuation is computed is wrong. Attenuation is computed according to the following formula, shown in Figure 1:

Figure 1. Formula for computing attenuation

where D is the distance between the light and the vertex, in world units. Note that this distance is not normalized in any way, contrary to what the documentation says.

Also note that the range of the light has no effect at all on the attenuation calculation; it is used only in determining whether to consider that light at all.

These changes were made to make the Direct3D lighting model the same as the OpenGL lighting model, which is useful because hardware needs to implement the lighting model.
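
As an illustration, the sketch below uses the standard OpenGL attenuation form, 1 / (a0 + a1*D + a2*D*D), on the strength of the statement above that the two models match. The function and parameter names are hypothetical; a0, a1, and a2 stand for the light's constant, linear, and quadratic attenuation factors, and treating a distance exactly equal to the range as in range is an assumption.

```cpp
// Assumed OpenGL-style attenuation. D is the unnormalized world-space
// distance between the light and the vertex. The caller must make sure the
// three factors are not all zero.
float attenuation(float a0, float a1, float a2, float distance) {
    return 1.0f / (a0 + a1 * distance + a2 * distance * distance);
}

// The range only decides whether the light is considered at all; it does not
// enter the attenuation calculation.
bool lightIsConsidered(float range, float distance) {
    return distance <= range;
}
```

With factors (1, 0, 0) the light is not attenuated at any distance; with (0, 1, 0) a vertex 4 world units away receives one quarter of the light's intensity.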

What are the texel alignment rules? How do I get a one-to-one mapping?

This is explained fully in the DirectX 7 documentation (under the article entitled "Directly Mapping Texels to Pixels"). However, the executive summary is that you should bias your texture coordinates by –0.5 of a texel in order to align properly with screen pixels. Most cards now conform properly to the texel alignment rules; however, some older cards or drivers do not. To handle these cases, the best thing to do is contact the hardware vendor in question and request updated drivers or their suggested workaround.

Can I use the texture coordinate generation and/or transform functionality, if I'm doing my own transformation?

No. Texture coordinate generation and transformation functionality are only available when you are using the Direct3D transformation pipeline.

DirectSound

I just upgraded to DirectX 7, and I get a burst of static when I start my DirectSound application. This also happens with lots of other applications—why?

You probably installed the debug version of the DirectX 7 run time. The debug version of the run time initializes all newly allocated DirectSound buffers with static, in order to help developers catch bugs with uninitialized buffers. You should not assume what the contents of a newly allocated DirectSound buffer will be (in particular, the buffer is not guaranteed to be zeroed out).

DirectPlay

My PC clock loses time after running my DirectPlay application—what's wrong?

This can occur if you fail to free DirectPlay properly on the termination of your application. When the program is finished, all the COM interfaces must be freed by calling the Release() method on each interface. Remember also to uninitialize COM properly by calling CoUninitialize(). This error can also occur if you break into a DirectPlay application in the debugger and then terminate the application without going through the shutdown code. This problem should be fixed with the next original equipment manufacturer (OEM) service release (OSR) of Windows 98.

I experience extensive delays in sending packets—any suggestions?

The most common cause of packet delays is overloading your bandwidth. A common mistake is to send data on every frame. Many applications can get away with sending data much less frequently than their display frame rate. For example, fast action client-server applications often send data only at a 10 Hz rate.
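
One way to decouple the send rate from the frame rate is a small throttle that is consulted once per frame. The sketch below is a minimal illustration: SendThrottle is a hypothetical class, and the millisecond timestamp is assumed to come from whatever timer the application already uses.

```cpp
#include <cstdint>

// Allows at most one send per interval, regardless of the frame rate.
class SendThrottle {
public:
    explicit SendThrottle(std::uint32_t intervalMs)
        : intervalMs_(intervalMs), lastSendMs_(0), hasSent_(false) {}

    // Call once per frame with the current time in milliseconds. Returns
    // true when this frame should send an update.
    bool shouldSend(std::uint32_t nowMs) {
        // Unsigned subtraction stays correct if the timer wraps around.
        if (hasSent_ && nowMs - lastSendMs_ < intervalMs_) {
            return false;
        }
        lastSendMs_ = nowMs;
        hasSent_ = true;
        return true;
    }

private:
    std::uint32_t intervalMs_;
    std::uint32_t lastSendMs_;
    bool hasSent_;
};
```

With an interval of 100 ms, a game running at 60 frames per second sends on 10 of the 60 frames in one second.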

Are there known problems with reliable messaging upon host migration?

DirectPlay versions 6.0 and earlier have a known problem with host migration and reliable messaging. It is probably best to use reliable messaging only for setup and initialization. This problem has been fixed for version 6.1a.

I'm using the DirectPlay protocol. Why don't most of the service providers report that they support guaranteed messaging?

The flag (DPCAPS_GUARANTEEDSUPPORTED) indicates the capability of the service provider if the DirectPlay protocol is not used. If you use the DirectPlay protocol, you get guaranteed messaging capability, even if the service provider does not support this functionality.

DirectPlay and IPX on Windows NT 4: I am currently using DirectPlay (6.1a for NT with SP5) and was running some tests using different protocols. I do not have problems using, for example, the TCP protocol. But it hangs/crashes and has problems in general with enumerating sessions with IPX. Why?

Certain machines will not see others unless you set the frame type and network number for IPX in the system. Set the frame type to 802.3 and set the network number to 2702 on all the systems you're using IPX on.

I am having trouble using DirectPlay while using Internet Connection Sharing. What can I do?

We also call these NATs, for Network Address Translation. There is a problem with applications working across Internet Connection Sharing (ICS). ICS gives its clients local IP addresses and strips these addresses from traffic that leaves the ICS. DirectPlay has address information embedded in the message that ICSs do not manage currently.

Is there anything wrong with sending network packets from multiple threads? We do most of our sending in our main thread. However, in some instances we would like to send packets in our separate receive thread as well. Are we going to see conflicts, deadlocks, increased instability, missed packets, etc. by doing this?

There shouldn't be any penalty for sending from multiple threads. The DirectPlay Protocol takes everything you throw at it and funnels it through a single thread of its own anyway.

Enumerating players in a remote session is a blocking API that takes a long time to execute. We are going to move it to its own thread and trigger it every couple of seconds to get rid of this pausing of our main thread. Are there any problems in doing this?

DirectPlay should be thread safe, so the workaround shouldn't cause problems.

Is the send completion message received when an ASYNC|GUARANTEED is sent, or when the acknowledge packet from the remote player is received (which could then be used to measure latency, like a 'ping' message)?

"Send Completion" means "send complete, on the wire," not "received by other DPlay" and certainly not "pulled out of the queue by the application on the other side" (IDirectPlay::Receive). The application has to do this itself.

Parsing Addresses: My DirectPlay application has been lobby-launched. I can get the connection settings inside a DPLCONNECTION structure, and everything works as expected. If the session is to be "joined" instead of hosted, for purposes of debugging, I want to know what the remote IP address of the host is. I'm guessing this information is buried inside the DPLCONNECTION structure's "lpAddress" member, but I don't know how to parse this blob of data. What should I do?

Use EnumAddress. Then use a callback, like this:

BOOL FAR PASCAL EnumAddressCallback(REFGUID guidDataType,
                                    DWORD dwDataSize,
                                    LPCVOID lpData,
                                    LPVOID lpContext)
{
  if (DPAID_INet == guidDataType)
  {
    lstrcpy((char *) lpContext, (char *) lpData);
    return FALSE;
  }
  return TRUE;
}

The IP address ends up in the context passed into EnumAddress. Just pass in a buffer big enough to hold the IP address.

When I use TCP/IP connection, if one player crashed out of the session without doing any cleanup (like close session, close connection), why do the other players experience a one-minute delay when they try to exit the session?

The proper way to prevent this close delay in DirectPlay 7 is to poll the send queue with GetMessageQueue( , DPMESSAGEQUEUE_SEND, ) in a loop between CancelMessage and DestroyPlayer until the message count actually reaches 0. Then call Sleep(1000) to allow system messages to clear.

DirectMusic

Problems Loading Media: Why Don't I Hear Anything or Why Am I Hearing the Wrong Instruments?

Here is a checklist of suggestions/caveats/gotchas:

I think I did everything right, but I still don't hear anything

Did you call pDM->Activate(TRUE)? You need to.
Did you call pSegment->SetParam(GUID_ConnectToDLSCollection)? Do you need to? You will if you want to hear your custom instruments with .mid files or other Segments that do not explicitly have a reference to the Collection.
Does your Segment reference a DLS that didn't get loaded (DMUS_S_PARTIALLOAD)? Check your path, and make sure you called ScanDirectory for "dls" types. Please note that this applies to other media types as well. Not calling ScanDirectory for other media types you need can cause similar head-scratching.
Did you call pSegment->SetParam(GUID_Download)? This is preferred and encouraged over using SetGlobalParam(GUID_PerfAutoDownload), and you'll need to do this if your Segment is using a custom DLS.
Did you call pPerf->AssignPChannelBlock() on the port? You need to do this before you hear anything. Make sure you assign all channel ranges that are required.
Example: If you have a segment that has notes on PChannels 1, 4, 5, 33, and 59, you'll need to at least do the following:
```
// Each block covers 16 PChannels: block 0 = 0-15, block 2 = 32-47, block 3 = 48-63.
pPerf->AssignPChannelBlock(0, pPort, 1);  // for channels 1, 4, and 5
pPerf->AssignPChannelBlock(2, pPort, 2);  // for channel 33
pPerf->AssignPChannelBlock(3, pPort, 3);  // for channel 59
```
(Channel group is arbitrary, but must be 1 or greater, and not collide with other group assignments.)
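
Because each PChannel block covers 16 consecutive PChannels, the set of blocks a Segment needs can be derived from the PChannels it uses. The helper below is a hypothetical sketch (requiredBlocks is not a DirectMusic function); the application would then call AssignPChannelBlock once for each block number it returns.

```cpp
#include <set>
#include <vector>

// Number of PChannels covered by one block.
const unsigned kPChannelsPerBlock = 16;

// Returns the distinct block numbers needed to cover the given PChannels.
std::set<unsigned> requiredBlocks(const std::vector<unsigned>& pchannels) {
    std::set<unsigned> blocks;
    for (unsigned pchannel : pchannels) {
        blocks.insert(pchannel / kPChannelsPerBlock);
    }
    return blocks;
}
```

For PChannels 1, 4, 5, 33, and 59 this yields blocks 0, 2, and 3.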

Did you call pLoader->ScanDirectory(CLSID_DirectMusicCollection, "dls") and try to play a file that uses GM.dls, and there was a .dls collection in the directory being scanned? This is a bug which will be addressed in later versions of DirectMusic.

Here is the workaround:

HRESULT CacheDefaultGMCollection
(
    IDirectMusicLoader* pLoader
)
{
    HRESULT         hr  = E_FAIL;
    DMUS_OBJECTDESC desc; 

    static IDirectMusicCollection*    pCollection = NULL;

    if (NULL != pCollection)
    {
        pCollection->Release();
        pCollection = NULL;
    }
    
    //**********************************************************************
    // Setup DMUS_OBJECTDESC to represent Default GM Collection
    //**********************************************************************
    ZeroMemory(&desc, sizeof(desc));
    desc.dwSize        = sizeof(DMUS_OBJECTDESC);
    desc.guidObject    = GUID_DefaultGMCollection;
    desc.guidClass     = CLSID_DirectMusicCollection;
    desc.dwValidData   = (DMUS_OBJ_CLASS | DMUS_OBJ_OBJECT);
    
    hr = pLoader->GetObject(&desc, IID_IDirectMusicCollection, (void **)&pCollection);
    if ( FAILED(hr) )
    { 
        DPF(0, "**** Failed to Load Object [collection]");
    }

    return hr;    
}

The mere fact of doing a GetObject with the above DMUS_OBJECTDESC will re-establish the loader's link to the Default GM Collection. Also, you do not need to release the Collection; it will be released when the Loader goes away. Note: This is one possible implementation of this workaround. A more robust solution could wrap the check of the Collection pointer in a critical section to prevent synchronization problems.

Did you try to download a Band to the performance that required a PChannel in a PChannel group that has not been created?

Gotchas

If you're not hearing the right instruments (Band changes are incorrect) – Band changes may need to be at -1 tick from the nearest note event.
If you're hearing a Segment play with a Style from another Segment, check to make sure that the .sty files involved do not share the same GUID. This can happen if the composer copies the Style or can also happen if the Styles were copied outside of DirectMusic Producer.
If a DLS collection is not being found, resulting in a partial load, then make sure the Loader can find the DLS collections. The problems with missing sounds could be caused by either bad linkages in the Band files or inability to find some of the DLS collections or other files due to a programming error. The Loader needs to know where to find the files. The easiest way to handle it is to put them all in the same directory. Be sure NOT to rename any files. DirectMusic Producer makes this a little more confusing because it's able to play with bad linkages – it automatically downloads all the instruments from all the DLS collections, regardless of the linkages in the Bands.
If it's not a file location issue, and all of the instruments return S_OK when downloaded, but during playback one instrument (say, 1 of the 24) reports that no instrument was downloaded (perhaps while part of its note range plays correctly), then the instrument does not have a region that covers some of the notes being played. The missing range is either too low, too high, or a gap somewhere in the middle.
Please note that you can get useful debug messages indicating how the band download is failing if you set the debug level for DMBAND to 3. This is done in win.ini:
```
[debug]
DMBAND=3
```
If you are running the debug DLLs, this sets the amount of debug spew you get. Each DLL can be told a different level so you can focus on the problem at hand. The way the debug messages work is through severity levels of 0 through 5. By default, all DLLs are set to 0.

What are some recommendations for optimization?

You need to be careful if you want to use a 44.1k sampling rate for the software synth because, on ISA cards that don't do FDMA (a lot of older cards), simply doing 44.1-kHz stereo across the ISA bus can consume quite a lot of the CPU. On PCI and newer ISA cards, there is no problem. Note that a 44.1-kHz sampling rate only makes sense if you have 44.1-kHz samples.
There are two items that affect CPU usage the most: sampling rate and reverb. If you are finding CPU usage to be a concern, consider scaling your application as follows:

Table 1. CPU usage regarding reverb and sampling rate

Reverb Status	Sampling Rate	CPU Usage
Reverb off	22 kHz	Least CPU
Reverb on	22 kHz	Better sounding
Reverb off	44.1 kHz	Probably not that useful; 22 kHz with reverb on usually sounds better, but use your own taste
Reverb on	44.1 kHz	Best sounding if you are using 44.1-kHz samples

Or you can give the end user ultimate control via the audio control panel in your game. Of course, if all your samples are 22 kHz, you should run the synth no faster than 22 kHz.

The number of simultaneous voices being played can have a big impact on CPU usage. You will get the most performance boost if you limit the number of voices (simultaneous notes) allocated on the software synth. This can only be set in DirectMusic Producer 7.0, though it is available through both the DirectX 6.1 and DirectX 7.0 DirectMusic API. Also, sometimes, instruments have really long release times that can be cut back with little ill effect. This can cause a dramatic reduction in voices when there are a lot of repeated notes.
Additionally, composers can reduce polyphony themselves by quantizing the durations of their notes in their MIDI sequencer or editing them by hand. Often, polyphony is greatly increased merely by having a note's duration be a bit too long. For example, a simple 'C' scale in quarter notes may in fact often use two simultaneous voices (the overlap of the end of one quarter note with the beginning of the next). By editing judiciously, you can cut that down to only ever have a single voice playing. Along the same lines, quantized content in general will have fewer simultaneous voices.
Working with extremely large waves is not recommended because performance can suffer as a result. If possible, it is better to break them into smaller pieces so you don't max out your system memory. If you receive warnings about low system resources, you may wish to expand the size of your virtual memory.
Set the debug level for DMBAND to -1 (negative one). This is done in win.ini:
```
[debug]
DMBAND=-1
```
This can be used to turn off debug statements and is necessary sometimes because debug statements can affect performance for some DLLs more than others.
Finally, make sure you are manually downloading your Bands (DO NOT USE AutoDownload!).
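
The point above about note durations and polyphony can be checked with a quick count of overlapping notes. The sketch below is a rough model only: Note and peakVoices are hypothetical names, times are in arbitrary ticks (480 per quarter note is assumed for the example), and instrument release times, which also hold voices open, are ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Note {
    long start;     // note-on time, in ticks
    long duration;  // time until note-off, in ticks
};

// Returns the largest number of notes sounding at the same time.
std::size_t peakVoices(const std::vector<Note>& notes) {
    // Each note contributes a +1 event at its start and a -1 event at its end.
    std::vector<std::pair<long, int> > events;
    for (const Note& note : notes) {
        events.push_back(std::make_pair(note.start, +1));
        events.push_back(std::make_pair(note.start + note.duration, -1));
    }
    // Pairs sort by time, then by delta, so a note-off at the same tick as a
    // note-on is processed first and does not count as an overlap.
    std::sort(events.begin(), events.end());

    int current = 0;
    int peak = 0;
    for (const std::pair<long, int>& event : events) {
        current += event.second;
        peak = std::max(peak, current);
    }
    return static_cast<std::size_t>(peak);
}
```

A scale of quarter notes spaced 480 ticks apart peaks at two voices when each note lasts 500 ticks, and at one voice when the durations are quantized to exactly 480 ticks.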

Why is it bad practice to use AutoDownload? Would you use it for anything?

If you're working on a typical game application, you probably don't want to use AutoDownload, as it can cause a performance hit when playing back Segments. Instead, manually download with Segment->SetParam( GUID_Download, pIPerformance) to tell the Segment to download the DLS instruments associated with the Segment. This should be called at a convenient time (like a scene change) or you can call it in a separate thread prior to playback. The Band should be placed in a Band Track in your Segment to ensure that this will work properly.

After playing the Segment, call Segment->SetParam( GUID_Unload, pIPerformance) when you're done with the Segment. For this to work, all collections must be referenced properly from within the Band. When you load the Segment, the Band Track reads the name, file name, and GUID for each referenced collection and asks the Loader to load those as well. The easiest way for the Loader to know where to find them is to rely on file names. If you store your data in a resource, then you should call SetObject on each resource chunk first so the Loader will know where to find it.

When using AutoDownload, if you are using only the instruments from the default collection (GM.DLS) in your primary Segment and the Band in your Secondary Segment references only instruments from a custom collection (replacing GM.DLS), then the instruments from the default collection should be returned automatically when the Secondary Segment playback stops, if the primary Segment is still playing.

If you write a basic Play Segment/MIDI file app, you can use AutoDownload so you don't have to manage downloading the instruments. However, in a typical game situation, AutoDownload incurs a performance hit if you ever play a Segment more than once. And, it causes the downloading of instruments to occur right at the start of Segment playback, causing a blip in CPU at that point and potential delay in performance. Downloading and unloading repeatedly (which AutoDownload may do) takes time, and can potentially degrade performance. If you are concerned about CPU performance in your application, consider turning AutoDownload off.

Relying on AutoDownload can cause other problems: You might also want to turn off AutoDownload if you have a Band in a Secondary Segment (Secondary Segments are played on top of a primary Segment). Otherwise the instruments in the Band may be downloaded automatically when the Secondary Segment starts, changing your instruments. If the Secondary Segment stops playing before the primary Segment stops, AutoDownload will then unload the Band. If this happens, you will not revert to the original instruments, as you may expect. Rather, you may lose sound output entirely because you now have no Bands loaded.

For more information on DirectMusic, please check out the DirectMusic FAQ.