Z-Buffer Performance

Applications that use the ramp driver can increase performance when using z-buffering and texturing by ensuring that scenes are rendered from front to back. Textured z-buffered primitives are pretested against the z-buffer one scan line at a time. If a scan line is hidden by a previously rendered polygon, the system rejects it quickly and efficiently. Z-buffering can improve performance, but the technique is most useful when a scene includes a great deal of overdraw. Overdraw is the average number of times that each screen pixel is written. Overdraw is difficult to calculate exactly, but you can often make a close approximation. If the overdraw averages less than 2, you can achieve the best performance by turning z-buffering off.
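The overdraw estimate and the front-to-back ordering described above can be sketched as follows. This is a minimal illustration, not Direct3D code: the Primitive structure and the functions estimateOverdraw, shouldUseZBuffer, and sortFrontToBack are hypothetical names, and the estimate assumes that you already know the approximate screen area that each primitive covers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-primitive record; this is not a Direct3D structure.
struct Primitive {
    float viewDepth;   // distance from the camera (larger means farther away)
    float screenArea;  // approximate number of screen pixels the primitive covers
};

// Approximates overdraw as total pixels written divided by pixels on screen.
// The result is too high when primitives extend beyond the viewport.
float estimateOverdraw(const std::vector<Primitive>& prims,
                       int screenWidth, int screenHeight) {
    assert(screenWidth > 0 && screenHeight > 0);
    float written = 0.0f;
    for (const Primitive& p : prims) {
        written += p.screenArea;
    }
    return written / (static_cast<float>(screenWidth) * screenHeight);
}

// Applies the guideline from the text: below an average overdraw of 2,
// z-buffering is unlikely to pay for itself.
bool shouldUseZBuffer(float overdraw) {
    return overdraw >= 2.0f;
}

// Orders primitives nearest-first so that scan lines of later primitives
// that are hidden can be rejected by the z-buffer pretest.
void sortFrontToBack(std::vector<Primitive>& prims) {
    std::sort(prims.begin(), prims.end(),
              [](const Primitive& a, const Primitive& b) {
                  return a.viewDepth < b.viewDepth;
              });
}
```

An application might call estimateOverdraw once for a representative frame, enable or disable z-buffering based on shouldUseZBuffer, and call sortFrontToBack before submitting primitives whenever z-buffering is enabled.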

You can also improve the performance of your application by z-testing primitives; that is, by testing a given list of primitives against the z-buffer. If you render the bounding box of a complex object using z-visibility testing, you can easily discover whether the object is completely hidden. If it is hidden, you can avoid even starting to render the object. For example, imagine that the camera is in a room full of 3-D objects. Adjoining this room is a second room full of 3-D objects. The rooms are connected by an open door. If you render the first room and then draw the doorway to the second room using a z-test polygon, you may discover that the doorway is hidden by one of the objects in the first room and that you don't need to render anything at all in the second room.
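The idea behind the bounding-box test can be illustrated with a small software depth buffer. The sketch below does not use the Direct3D z-visibility interface. DepthBuffer, ScreenBox, and isPotentiallyVisible are hypothetical names, and the sketch assumes that smaller z values are closer to the camera. It treats the whole screen rectangle of the box as lying at the nearest depth of the box, so it can report a hidden object as visible but never reports a visible object as hidden.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software depth buffer; smaller z is closer to the camera.
struct DepthBuffer {
    int width;
    int height;
    std::vector<float> z;  // row-major, width * height entries

    DepthBuffer(int w, int h, float farZ)
        : width(w), height(h), z(static_cast<std::size_t>(w) * h, farZ) {}
};

// Screen-space bounds of an object's bounding box and its nearest depth.
struct ScreenBox {
    int x0, y0, x1, y1;  // inclusive pixel rectangle
    float nearestZ;      // depth of the closest point of the box
};

// Returns true if at least one pixel of the box would pass the depth test,
// which means the enclosed object may be visible and must be rendered.
bool isPotentiallyVisible(const DepthBuffer& db, const ScreenBox& box) {
    // Clip the rectangle to the buffer; an off-screen box yields an empty range.
    const int x0 = std::max(box.x0, 0);
    const int y0 = std::max(box.y0, 0);
    const int x1 = std::min(box.x1, db.width - 1);
    const int y1 = std::min(box.y1, db.height - 1);

    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * db.width + x;
            if (box.nearestZ < db.z[i]) {
                return true;  // this pixel is in front of what was drawn
            }
        }
    }
    return false;  // every covered pixel is already occluded
}
```

In the two-room example, the doorway plays the role of the ScreenBox: if isPotentiallyVisible returns false after the first room has been rendered, nothing in the second room needs to be drawn.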

You can use the fill-rate test in the D3dtest.exe application that is provided with this Programmer's Reference to demonstrate overdraw performance for a given driver. (The fill-rate test draws four tunnels from front to back or back to front, depending on the setting you choose.)

On faster personal computers, software rendering to system memory is often faster than rendering to video memory, although it has the disadvantage of not being able to use double buffering or hardware-accelerated clear operations. If your application can render to either system or video memory, and if you include a routine that tests which is faster, you can take advantage of the best approach on the current system. The Direct3D sample code in this Programmer's Reference demonstrates this strategy. It is necessary to implement both methods because there is no other way to test the speed. Speeds can vary enormously from computer to computer, depending on the main-memory architecture and the type of graphics adapter being used. Although you can use D3dtest.exe to test the speed of system memory against video memory, it cannot predict the performance of your user's personal computer.
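A routine that tests which memory type is faster might look like the following sketch. It is not taken from the Direct3D sample code. RenderTarget, timeFrames, and chooseTarget are hypothetical names, and the render callback stands in for whatever code your application uses to draw one frame to a given surface.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

enum class RenderTarget { SystemMemory, VideoMemory };

// Calls the supplied render callback `frames` times and returns the
// elapsed wall-clock time in seconds.
double timeFrames(const std::function<void()>& renderFrame, int frames) {
    assert(frames > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i) {
        renderFrame();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Picks the target with the shorter measured time. A tie goes to video
// memory (an assumption of this sketch) because video memory still
// permits double buffering and hardware-accelerated clears.
RenderTarget chooseTarget(double systemSeconds, double videoSeconds) {
    return systemSeconds < videoSeconds ? RenderTarget::SystemMemory
                                        : RenderTarget::VideoMemory;
}
```

At startup, an application could time the same short sequence of frames against a system-memory surface and a video-memory surface, pass both results to chooseTarget, and then create its rendering surfaces accordingly.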

You can run all of the Direct3D samples in system memory by using the -systemmemory command-line option. This option is also useful when developing code because it allows your application to fail in a way that stops the renderer without stopping your system: DirectDraw does not take the WIN16 lock for system-memory surfaces. (The WIN16 lock serializes access to GDI and USER, shutting down Windows for the interval between calls to the IDirectDrawSurface3::Lock and IDirectDrawSurface3::Unlock methods, as well as between calls to the IDirectDrawSurface3::GetDC and IDirectDrawSurface3::ReleaseDC methods.)