Optimizing Audio Performance with DirectX

Mark McCulley
Microsoft Developer Network Technology Group

August 26, 1996

Abstract

The audio components of Microsoft® DirectX™—DirectSound™ and Direct3DSound™—include powerful tools for games and interactive-media programmers. DirectX takes advantage of sound-accelerator hardware whenever possible to improve performance and minimize CPU usage, but audio can still have a significant impact on system performance. This article describes techniques that will help you use DirectSound and Direct3DSound to minimize the performance impact of audio playback.

Introduction

The Microsoft® DirectX™ wave audio playback services are designed to support the demanding requirements of games and interactive-media applications for Windows® 95 and Windows NT®. DirectSound™ and Direct3DSound™ allow you to play multiple wave files simultaneously and to move sound sources within a simulated 3-D space. Whenever possible, DirectX takes advantage of sound-accelerator hardware to improve performance and minimize CPU usage, but this doesn't mean you can go and code your game to blast out a multitude of sounds and move them willy-nilly around your 3-D game space. If you don't pay attention to how you use your computer's sound resources, you'll soon discover that a sizable percentage of your CPU cycles is being spent cranking out those 44.1 kHz 16-bit stereo bird sounds that you added to provide a little background ambiance to your outdoor adventure game.

The guidelines and techniques in this article will help you optimize audio performance with DirectX. To get the most benefit from this article, you should at least be familiar with the DirectSound application programming interface (API). If you're not yet acquainted with DirectSound, you may want to read a couple of other articles first. For a basic introduction to DirectSound, start with "Get World-Class Noise and Total Joy from Your Games with DirectSound and DirectInput" (Library, Periodicals, Microsoft Systems Journal, 1996 Volume 11, February 1996 Number 2). I discuss the technique of streaming (using a relatively small buffer to play a lengthy wave file) in my technical article "Streaming Wave Files with DirectSound."

Tips and Techniques

First, a few definitions. Those of you familiar with DirectSound may already know the following terms:

Secondary buffers are the buffers applications use to play wave data. Each wave file being played has one secondary buffer, and each of these buffers can have a different format.
The primary buffer is the output buffer for DirectSound. Normally, applications do not write wave data directly to the primary buffer. DirectSound mixes the data from secondary buffers into the primary buffer. There is only one primary buffer and its format determines the output format.
Static buffers contain a complete sound in memory. They are convenient because you can write the entire sound to the buffer in a single operation. Static buffers can be mixed by sound-card hardware to increase performance.
Streaming buffers contain only a portion of a sound and are useful for playing lengthy sounds without using a lot of memory. With streaming buffers, you must periodically write new data into the sound buffer. Streaming buffers cannot be mixed in hardware.

I'll also mention the DirectSound mixer, the component of DirectSound that is responsible for mixing the bits from all of the secondary buffers and performing operations such as volume scaling, panning (left-right balance), frequency shifting, and 3-D processing. While the mixer isn't a discrete component that you have access to through an API (other than by controlling operations like the ones described above), it's the most CPU-intensive part of DirectSound. Many performance issues can be discussed in terms of what's happening with the DirectSound mixer. For all of the visual learners out there, the following diagram illustrates the relationship between the mixer and primary and secondary buffers:

[Figure: Overly simple diagram showing the relationship of buffers to the DirectSound mixer.]

The DirectSound development team would throw a fit if they saw this diagram. The mixer is much more sophisticated than this diagram illustrates—I haven't included any components related to hardware mixing, 3-D, or other types of processing.

Now that I've gotten all of this background information out of the way, I can get on with the useful stuff. Following is a list of techniques that can help you maximize performance with DirectSound:

Use sounds wisely.
Use the same format for secondary and primary buffers.
Set the primary buffer format to have the lowest acceptable data rate.
Play the primary buffer continuously if you have frequent short intervals of silence.
Use hardware mixing as much as possible.
Minimize the number of control changes.
Use deferred processing of 3-D commands.

I explain each of these techniques in the following sections.

Using Sounds Wisely

One of the coolest features of DirectSound is its ability to play and control multiple audio tracks independently. While this is a real boon to sound designers, it doesn't come without cost. The cost is CPU cycles. Each secondary buffer you use consumes CPU cycles. Each processing operation such as frequency scaling consumes additional CPU cycles. Three-dimensional sounds consume more cycles than regular sounds. Get the picture?

Sit down with your sound designer and discuss the impact of sound use on overall performance. (If you're the programmer and the sound designer, sit down with yourself and mull this over.) Decide which sounds are most important to convey the desired experience to your users. Premix sounds whenever possible to reduce the use of secondary buffers. For example, if you're creating summertime night ambiance with chirping crickets on one track and croaking frogs on another track, combine the two into a single track.

If you design your application keeping in mind the tradeoffs that you may need to make later when you're tweaking performance, you'll simplify the process considerably. Remember that a relatively small number of properly designed and utilized sounds can go a long way. One of the seminal masterpieces of audio recording, the Beatles' Sgt. Pepper's Lonely Hearts Club Band, was recorded on a four-track tape recorder. By comparison, modern recording studios are equipped to provide at least 48 tracks and can provide a virtually unlimited number of tracks by synchronizing multiple tape decks and using MIDI sequencers.

Using the Same Format for the Secondary and Primary Buffers

The DirectSound mixer converts the data from each secondary buffer into the format of the primary buffer. This is done on the fly as data is mixed into the primary buffer. This format conversion costs CPU cycles. You can eliminate this overhead by ensuring that your secondary buffers (that is, your wave files) and primary buffer have the same format. In fact, due to the way DirectSound does format conversion, you only need to match the sample rate and number of channels; it doesn't matter if there is a difference in sample size (8-bit or 16-bit).
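As a rough sketch of the check described above, the following C++ helper compares a secondary buffer's format with the primary format and reports whether the two differ in the ways that matter. The SimpleFormat structure and the needsConversion function are hypothetical names I've made up for illustration; in a real application the three values would presumably come from the nSamplesPerSec, nChannels, and wBitsPerSample members of your WAVEFORMATEX structures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the WAVEFORMATEX fields that matter here.
struct SimpleFormat {
    std::uint32_t samplesPerSec;  // for example, 22050 or 44100
    std::uint16_t channels;       // 1 = mono, 2 = stereo
    std::uint16_t bitsPerSample;  // 8 or 16
};

// Returns true if `secondary` differs from `primary` in sample rate or
// channel count. Sample size is deliberately ignored, following the
// article's statement that a difference there doesn't matter.
inline bool needsConversion(const SimpleFormat& secondary,
                            const SimpleFormat& primary) {
    assert(primary.samplesPerSec > 0 && primary.channels > 0);
    return secondary.samplesPerSec != primary.samplesPerSec ||
           secondary.channels != primary.channels;
}
```

You could run a check like this over your wave files at load time (or in your build tools) and flag any file that would force the mixer to convert.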

Reducing Data Rate of Primary Buffer

Most of today's sound cards are ISA-bus cards that use DMA (direct memory access) to move sound data from system memory to local buffers. This DMA activity directly affects CPU performance when the processor is forced to wait for a DMA transfer to finish before it can access memory. This performance hit is unavoidable on ISA sound cards but is not a problem with the newer 32-bit PCI cards.

With DirectSound, the impact of DMA overhead is directly related to the data rate of the output (the primary buffer). I've heard anecdotally that if you set the primary format to 44.1 kHz 16-bit stereo on a 90 MHz Pentium, DMA will suck away almost 30 percent of the CPU cycles! DMA overhead could be the biggest single factor affecting the performance of DirectSound. The up side here is that this factor is also very easy to control when you're tweaking performance. Experiment with reducing the data rate requirement by changing the format of the primary buffer. The tradeoff here is obvious: performance versus sound quality. To change the format of the primary buffer, call the IDirectSoundBuffer::SetFormat method on the primary buffer. Don't forget that your cooperative level must be set to DSSCL_PRIORITY or DSSCL_EXCLUSIVE to mess around with the primary buffer.
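To get a feel for how much the primary format changes the data rate, here is a small arithmetic sketch. The bytesPerSecond function is a hypothetical helper of my own, not part of DirectSound; it computes the uncompressed PCM data rate, the same quantity that the nAvgBytesPerSec member of WAVEFORMATEX holds for PCM formats.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per second of uncompressed PCM audio for a given format.
// Hypothetical helper for comparing candidate primary buffer formats.
inline std::uint32_t bytesPerSecond(std::uint32_t samplesPerSec,
                                    std::uint32_t channels,
                                    std::uint32_t bitsPerSample) {
    assert(bitsPerSample % 8 == 0);  // PCM sample sizes are whole bytes
    return samplesPerSec * channels * (bitsPerSample / 8);
}
```

For 44.1 kHz 16-bit stereo this gives 176,400 bytes per second. Dropping to 22.05 kHz 16-bit stereo halves that to 88,200, and 22.05 kHz 8-bit mono comes to 22,050, one-eighth of the original rate. How much of that reduction shows up as recovered CPU time depends on the sound card and bus, so treat these numbers as a starting point for experiments rather than a prediction.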

Playing the Primary Buffer Continuously Through Periodic Silences

DMA affects performance in another way. When there are no sounds playing, DirectSound stops the mixer engine and halts DMA activity. If your game has frequent short intervals of silence, the overhead of starting and stopping the mixer each time a sound is played may be worse than the DMA overhead you would incur by keeping the mixer active. In this case, you can force the mixer engine to remain active by calling the Play method on the primary buffer. Then the mixer will continue to run (playing silence) even when there are no sounds playing. To resume the default behavior of stopping the mixer engine when there are no sounds playing, call the Stop method on the primary buffer.

Using Hardware Mixing

Most sound cards support some level of hardware mixing if there is a DirectSound driver for the card. The following tips will allow you to make the most of hardware mixing:

Use static buffers for sounds that you want to be mixed in hardware. DirectSound will attempt to use hardware mixing on static buffers.
Create sound buffers first for the sounds you use the most (there's a limit to the number of buffers that can be hardware mixed).
At run time, use the IDirectSound::GetCaps method to determine what formats are supported by the sound-accelerator hardware and use only those formats if possible. (Some sound cards can mix only certain formats. For example, the Sound Blaster AWE32 card can mix only mono 16-bit formats.)

To create a static buffer, specify the DSBCAPS_STATIC flag in the dwFlags field of the DSBUFFERDESC structure when you call CreateSoundBuffer to create a secondary buffer. You can also specify the DSBCAPS_LOCHARDWARE flag to force hardware mixing for a buffer; however, CreateSoundBuffer will fail if resources are not available for hardware mixing.

The IDirectSound::GetCaps method provides a complete description of the acceleration capabilities of a sound card and should prove helpful when assessing performance issues. You can also call GetCaps at launch time and adjust your audio subsystem to best use available hardware resources. Take a look at the DSCAPS structure and flags for DSCAPS.dwFlags in the DirectX documentation to get an idea of exactly what information is available.
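One way to act on the "most-used sounds first" tip is to sort your sound list before you create any buffers. The sketch below is only a planning aid under my own assumptions: SoundPlan, playsPerMinute, and planHardwareUse are hypothetical names, and the freeHardwareBuffers argument is assumed to be a count you obtained from GetCaps (I'd expect the dwFreeHwMixingStaticBuffers member of DSCAPS to be the relevant value). It makes no DirectSound calls itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one sound the game intends to load.
struct SoundPlan {
    std::string name;
    unsigned playsPerMinute;  // rough estimate of how often it is used
    bool requestHardware;     // filled in by planHardwareUse
};

// Orders the sounds most-used first (the order in which to create their
// buffers) and marks as hardware candidates only as many as the card
// reports it can mix.
inline void planHardwareUse(std::vector<SoundPlan>& sounds,
                            std::size_t freeHardwareBuffers) {
    std::stable_sort(sounds.begin(), sounds.end(),
                     [](const SoundPlan& a, const SoundPlan& b) {
                         return a.playsPerMinute > b.playsPerMinute;
                     });
    // Sanity check: the busiest sound is now at the front.
    assert(sounds.empty() ||
           sounds.front().playsPerMinute >= sounds.back().playsPerMinute);
    for (std::size_t i = 0; i < sounds.size(); ++i) {
        sounds[i].requestHardware = i < freeHardwareBuffers;
    }
}
```

After planning, you would create the buffers in list order. Whether you go further and pass DSBCAPS_LOCHARDWARE for the marked sounds is a judgment call, because CreateSoundBuffer fails with that flag when hardware resources run out; be prepared to retry without it.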

Minimizing Overhead of Control Changes

Changing pan, volume, or frequency on a secondary buffer also affects performance. To prevent interruptions in sound output, the DirectSound mixer must mix ahead from 20 to 100 or more milliseconds. Whenever you make a control change, the mixer has to flush its mix-ahead buffer and remix with the changed sound. It's a good idea to minimize the number of control changes you send, especially if you're sending them in streams or in bursts. Try reducing how often your routines call the SetVolume, SetPan, and SetFrequency methods. For example, if you have a frame-sync'd routine that moves a sound from the left to the right speaker, try calling SetPan once per frame instead of twice per frame.

Note: 3-D control changes (orientation, position, velocity, Doppler factor, and so on) also cause the DirectSound mixer to remix its mix-ahead buffer. However, you can group a number of 3-D control changes together and cause only a single remix. Read the following section for details on deferring control changes.
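One simple way to cut down on control changes is to put a small filter between your game logic and the buffer. The following is a minimal sketch under my own assumptions, not a DirectSound facility: PanThrottle is a hypothetical class, and its send callback stands in for wherever you would call SetPan on your secondary buffer. It forwards a new pan value only when it differs from the last value sent by at least a chosen step.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <utility>

// Forwards a pan value to `send` only when it differs from the last value
// actually sent by at least `minStep`. `send` is a hypothetical hook that
// stands in for a call such as SetPan on a secondary buffer.
class PanThrottle {
public:
    PanThrottle(std::function<void(long)> send, long minStep)
        : send_(std::move(send)), minStep_(minStep) {
        assert(minStep_ > 0);
    }

    // Call once per frame with the desired pan. Returns true if the value
    // was forwarded (and so would have triggered a remix).
    bool update(long desiredPan) {
        if (hasSent_ && std::labs(desiredPan - lastSent_) < minStep_) {
            return false;  // change too small to be worth a remix
        }
        send_(desiredPan);
        lastSent_ = desiredPan;
        hasSent_ = true;
        return true;
    }

private:
    std::function<void(long)> send_;
    long minStep_;
    long lastSent_ = 0;
    bool hasSent_ = false;
};
```

With a step of 500, a sweep across the full pan range (-10,000 to 10,000) in increments of 100 results in 41 forwarded calls instead of 201. The same idea applies to volume and frequency. One caveat of this design: a final value that lands inside the step is never sent, so you may want to force one last update when a movement ends.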

Using Deferred Processing of 3-D Commands

As I said earlier, 3-D sounds are more expensive than regular sounds. That's because on each mixer cycle, additional CPU cycles are spent calculating the 3-D effects. Use as few 3-D sounds as you can, and don't use 3-D on sounds that won't really benefit from the effect. This is another factor you'll have to experiment with when you're tuning your game's performance. To make this task easier, design your application so that it's easy to enable and disable 3-D effects on each sound. You can call the IDirectSound3DBuffer::SetMode method with the DS3DMODE_DISABLE flag to disable 3-D processing on any 3-D sound buffer.

Changes to 3-D sound buffer and listener settings such as position, velocity, and Doppler factor will cause the DirectSound mixer to remix its mix-ahead buffer, wasting a few CPU cycles. To minimize the performance impact of changing 3-D settings, you can use a feature that is unique to the 3-D sound component of DirectSound: deferred command processing. To use deferred command processing, specify the DS3D_DEFERRED flag for the dwApply parameter on any of the IDirectSound3DListener or IDirectSound3DBuffer methods that change 3-D settings (SetPosition, SetVelocity, and so forth). Make all of the changes for a frame deferred and then call IDirectSound3DListener::CommitDeferredSettings to execute all of the deferred commands with a single remix of the mix-ahead buffer. I bet you're wishing you had deferred processing for sounds that are not 3-D sounds. Unfortunately, this feature didn't make it into DirectX 3.
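To make the payoff concrete, here is a toy model of the behavior just described. It is not the DirectSound API: MockListener, Apply, Vec3, and exampleFrame are hypothetical names, and the "remix" is just a counter. The model also assumes that an immediate change flushes anything still pending, which is a simplification of mine.

```cpp
#include <cassert>

// Toy model only; these types are hypothetical, not DirectSound types.
enum class Apply { Immediate, Deferred };

struct Vec3 { float x, y, z; };

class MockListener {
public:
    void setPosition(const Vec3& p, Apply how) { pendingPosition_ = p; changed(how); }
    void setVelocity(const Vec3& v, Apply how) { pendingVelocity_ = v; changed(how); }

    // Applies every pending change with a single simulated remix.
    void commitDeferredSettings() {
        if (!dirty_) return;  // nothing pending, so no remix
        position_ = pendingPosition_;
        velocity_ = pendingVelocity_;
        dirty_ = false;
        ++remixCount_;
    }

    int remixCount() const { return remixCount_; }
    Vec3 position() const { return position_; }
    Vec3 velocity() const { return velocity_; }

private:
    void changed(Apply how) {
        dirty_ = true;
        if (how == Apply::Immediate) commitDeferredSettings();
    }

    Vec3 position_{0.0f, 0.0f, 0.0f};
    Vec3 velocity_{0.0f, 0.0f, 0.0f};
    Vec3 pendingPosition_{0.0f, 0.0f, 0.0f};
    Vec3 pendingVelocity_{0.0f, 0.0f, 0.0f};
    bool dirty_ = false;
    int remixCount_ = 0;
};

// One frame's worth of updates: two deferred changes, one commit, one remix.
inline void exampleFrame(MockListener& listener) {
    const int before = listener.remixCount();
    listener.setPosition(Vec3{1.0f, 0.0f, 2.0f}, Apply::Deferred);
    listener.setVelocity(Vec3{0.0f, 0.0f, 5.0f}, Apply::Deferred);
    listener.commitDeferredSettings();
    assert(listener.remixCount() == before + 1);
}
```

In the model, making the same two changes with Apply::Immediate costs two remixes per frame instead of one. In real code the equivalent pattern would be to pass DS3D_DEFERRED on every 3-D setting change during the frame and make one CommitDeferredSettings call at the end.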

Conclusion

I've presented a number of specific techniques for optimizing audio performance with DirectX. The best general advice I can give you is to design your audio subsystem to support performance monitoring and tuning. No doubt you've reserved plenty of time in your schedule to tweak your game's performance! If you've taken performance-tuning into consideration from the beginning, this task will be much easier.