Get World-Class Noise and Total Joy from Your Games with DirectSound and DirectInput

February 1996

Dave Edson

Dave Edson is a member of the Microsoft Premier Developer Support team and has published two books, Dave's Book of Top Ten Lists for Great Windows Programming (M&T Books, 1994) and Writing Windows Applications from Start to Finish (M&T Books, 1993).

Playing games on versions of Windows® prior to 3.1 often involved great leaps of imagination. If you had just blasted a bloody gaping hole in the head of a big evil alien, this line of code would be executed:

 MessageBeep(0);

If you had just caught that fly ball in center field in the bottom of the 9th inning:

 MessageBeep(0);

And if you stepped on the gas while racing your Formula 1 racing car:

 for (x = 0; x < 10; x++ ) MessageBeep(0);

With Windows 3.1 and the glorious sndPlaySound API, you could finally play a wave file for the above mentioned events. Of course, there were some tiny drawbacks. There was this latency thing. Latency is the amount of time it takes for the wave file to actually start playing from the moment the call to sndPlaySound was made. Since the sndPlaySound API does not implement anything sophisticated like caching away the sound, or mixing two wave files together, your games sounded a bit like a cheezy radio show where the sound effects person is a little bit behind and can only do one sound at a time. In other words, they were still less than adequate.

Then Microsoft came out with WAVEMIX.DLL, a nice, little, mainly-unheard-of technology used in the Arcade game pack. It mixed wave files, and it was available to developers for distribution. WAVEMIX never really caught on. Probably because DirectDraw™ wasn't out yet, so game graphics were still not that great.

Well, games for Windows® 95 have come a long way, baby. In my first article in this series, "The Game SDK for Windows 95 Gives You Direct Hardware Control for Speedy Animation," MSJ November 1995, I wrote about DirectDraw, one of the four components of the Microsoft Games SDK for Windows 95. In this article, I'll examine DirectSound™, a new technology that deals your PC's sound card from the bottom of the deck to ensure that you'll get world-class noise spewing from your games. DirectSound takes advantage wherever possible of hardware features such as on-card memory and mixing sounds. Mixing sounds means playing them simultaneously. (For example, if you are writing a baseball game program, you want the sound of the crowd and the sound of the announcer to play simultaneously. The baseball game program would have two wave files: one for the crowd noise, and one for the announcer. You'd have the app "mix" these two wave files together to produce the desired sound effect.) I'll also cover DirectInput™, the third Direct feature of the Game SDK, which gives you control of the joystick.

Let's first take a look at how DirectSound fits into the Win32® system under Windows 95. Figure 1 is one of those block diagrams Microsoft has grown so fond of. As you can see, regular Win32-based apps still use the sndPlaySound API to make their noise. The sndPlaySound API calls a function in the wave layer that in turn calls the Windows audio Device Driver Interface (DDI) (that's the sound driver to you and me), which then talks to the hardware card and produces sound. The specifications for these sound drivers (DDIs) only allow for one sound playing at a time. Prior to DirectSound, applications had to do their own mixing of wave files using things like WAVEMIX.DLL. Wave mixing tends to slow things down, is lower quality (you only get 8-bit audio), less reliable, and the latency problem is even worse.

Figure 1 DirectSound

Unlike WAVEMIX.DLL, which can only mix up to eight waves (game programmer lingo for WAV files) and only reliably produce 8-bit output waveforms, DirectSound places no predefined limit on the number of waves that can be mixed. The output waveform can be at any of the current standard sampling rates. On sound cards that support hardware mixing, there is virtually no CPU hit for each combination of sounds. On sound cards that do not support hardware mixing, the impact of mixing is minimal; each mixed wave file consumes less than 1 percent of CPU time.

Anyway, if you're writing an app with sound, you'll have to choose between using sndPlaySound or DirectSound. These two beasties don't currently get along. If your app plays wave files, and doesn't really care about latency (perhaps an adventure game that spends a lot of time painting), go ahead and continue to use sndPlaySound. If you need low latency or sound mixing, then you have to use DirectSound and nothing but DirectSound. Once you release DirectSound (see "Cleanup" down near the end of this article), you can use sndPlaySound again. By the way, many applications can use DirectSound concurrently but only one of them (the foreground one) will make sound at a time. DirectSound switches the sound whenever the input focus changes. If you have two apps that both need to make DirectSound noise simultaneously, you'll have to figure out a way to get some interprocess communication going, so that only one app makes the noise for both.

DirectSound automatically takes advantage of accelerated sound hardware, including hardware mixing and hardware sound buffer memory. Applications do not need to query the hardware or program specifically for hardware acceleration; DirectSound takes care of everything. However, applications can query the hardware to determine the capabilities of the hardware in order to optimize hardware resource allocation specifically for that application.

Anatomy of a Wave File

Before I jump into playing all these wave files on DirectSound, let's take a quickie look at what a wave file is. For DirectSound, a wave file consists of a header and the actual digitized samples. A sample is a scalar value of the average amplitude recorded over a predefined interval on a wave, here a sound wave.

There are an infinite number of points along a genuine analog sound wave, each point's amplitude having a value from zero to n, with n being the maximum pressure the wave carrier (what the sound wave is traveling through) can withstand before breaking down.

Now, let's look at a digitized wave in Figure 2. In this wonderful world of digital, there are a finite number of "points" along this wave, each point having a duration. These points are called samples, since they represent a sample of what the wave sounds like at that time. The duration is determined by the sampling frequency. A typical sampling rate is 22,050 times per second. This means each sample represents 1/22050th of a second of sound. The amplitude at each sample (illustrated in green in Figure 2) is played for the entire 1/22050th of a second. Obviously, the higher the sampling frequency, the smoother the digital wave sounds. And, of course, the more samples per second, the more memory required.
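To put a number on that memory cost, here is a minimal sketch of the arithmetic for uncompressed samples. The function name pcm_bytes is made up for this illustration; it is not part of any SDK.

```c
/* Bytes needed to hold uncompressed (PCM) audio.
   pcm_bytes is a hypothetical helper, not an SDK call. */
unsigned long pcm_bytes(unsigned long samples_per_sec,
                        unsigned int  channels,
                        unsigned int  bits_per_sample,
                        unsigned long seconds)
{
    /* One "frame" holds one sample for every channel */
    unsigned long bytes_per_frame = channels * (bits_per_sample / 8);

    return samples_per_sec * bytes_per_frame * seconds;
}
```

By this reckoning, one second of 8-bit mono sound at 22,050 samples per second takes 22,050 bytes, while one second of 16-bit stereo at 44,100 samples per second takes 176,400 bytes.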

Figure 2 Digitized Sound Wave

Now these wave samples can be represented using different file formats. This version of DirectSound requires the standard Windows WAV file format.

The header of the wave file is in the WAVEFORMATEX layout. For our WAV file, the wFormatTag is WAVE_FORMAT_PCM, which identifies the standard Windows WAV file format that DirectSound understands. This format assumes no compression, and therefore each sample is a fixed size. Check out the documentation for WAVEFORMATEX to see how these fields may vary for compressed file formats.

 typedef struct 
  {
  WORD  wFormatTag;        // Waveform Format
  WORD  nChannels;         // 1 = mono, 2 = stereo
  DWORD nSamplesPerSec;    // Samples per second
  DWORD nAvgBytesPerSec;   // Bytes per sec on playback
  WORD  nBlockAlign;       // Alignment of each sample
  WORD  wBitsPerSample;    // Size of each sample
  WORD  cbSize;            // Xtra bytes for Xtra info
  } 
WAVEFORMATEX;

A WAV file is a tagged data file, so the first three DWORDs identify the format and size of the wave file. After that, you search through the file for four-byte sequences of the characters "fmt " (there is a space for the fourth character) to find the WAVEFORMATEX header, and "data" to find the samples.
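As a rough sketch of that search, the code below hunts for a tag in a WAV file that has already been read into memory. Instead of scanning byte by byte, it hops from one tagged chunk to the next using each chunk's stored size, which is a slightly stricter variation on the search described above. The names read_le32 and find_chunk are my own for this illustration and are not taken from WAVE.C.

```c
#include <stddef.h>
#include <string.h>

/* Read a 32-bit little-endian value, as stored in WAV files */
unsigned long read_le32(const unsigned char *p)
{
    return  (unsigned long)p[0]        |
           ((unsigned long)p[1] << 8)  |
           ((unsigned long)p[2] << 16) |
           ((unsigned long)p[3] << 24);
}

/* Find the chunk whose four-character tag matches `tag`.
   Returns a pointer to the chunk's payload and stores its
   size in *size, or returns NULL if the tag is not found
   or the file is malformed. */
const unsigned char *find_chunk(const unsigned char *file,
                                size_t file_len,
                                const char *tag,
                                unsigned long *size)
{
    size_t pos = 12;   /* Skip "RIFF", the file size, and "WAVE" */

    if (file_len < 12 ||
        memcmp(file, "RIFF", 4) != 0 ||
        memcmp(file + 8, "WAVE", 4) != 0)
        return NULL;

    while (pos + 8 <= file_len)
    {
        unsigned long len = read_le32(file + pos + 4);

        if (len > file_len - pos - 8)
            return NULL;   /* Chunk claims more bytes than exist */

        if (memcmp(file + pos, tag, 4) == 0)
        {
            *size = len;
            return file + pos + 8;
        }

        /* Chunks are padded to an even number of bytes */
        pos += 8 + len + (len & 1);
    }
    return NULL;
}
```

Calling find_chunk with "fmt " (note the trailing space) yields the bytes you would overlay with WAVEFORMATEX, and calling it with "data" yields the samples.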

Figure 3 contains WAVE.C, a file that I have put together from a bunch of different samples. The WAVEFILE structure declared at the top of the file is just my own way of keeping everything in one handy place. The functions allocate a chunk of memory to hold the WAV, and then fill out the WAVEFILE structure to indicate the positions of the header and data inside the chunk of memory. It is the responsibility of the caller of these functions to toss the memory when the program is finished with them.

Now that you know what a wave file is, and have some nice helper functions to get a wave file into a format readable by DirectSound, let's move on to making some noise!

DirectSound Objects

Like DirectDraw, DirectSound uses the COM programming model, and all of the APIs are exposed as COM style interfaces. All except two, DirectSoundCreate and DirectSoundEnumerate. DirectSoundEnumerate is used to see what DirectSound devices are available in the system. Usually, there will only be one device, but the Power User with Toys by the Zillions (PUTZ) may have more than one sound card in his machine or he may have some really bitchin' Camaro card that has two complete sound units on it. In this case, you would have a choice of devices to create your DirectSound object for. Getting back to Earth, let's assume you are writing games for the rest of us with only one sound card.

DirectSoundCreate does exactly what you would want it to do: it creates a pointer to IDirectSound that you can then use to access all of the interfaces.

 if(DS_OK!=DirectSoundCreate(NULL,&lpDirectSound,NULL))
  {
  MessageBox (hWnd, "Silent Bob sez no Direct Sound!", 
              "MSJ", MB_OK );
  }

The first parameter is NULL if you want the default Windows sound device; otherwise it is a GUID representing the desired DirectSound driver that you picked from calling DirectSoundEnumerate. Once you have this DirectSound object, you can then set up a few characteristics and find out about your capabilities.

Just as in DirectDraw, you need to set the cooperative level before you can start to use any of the functionality. Unlike DirectDraw, you don't want to be Stalinist and take control of everything. You should set the cooperative level to DSSCL_NORMAL, which is a nice friendly mode that allows other apps on the system to live with DirectSound. (Note: DirectSound does allow multiple apps to use DirectSound simultaneously, unless DirectSound is using the Hardware Emulation Layer due to your totally lame card.) Why? Future versions of DirectSound will allow applications to play sounds that are audible even when the application does not have input focus. A typical call to SetCooperativeLevel would look like this:

 if (DS_OK != lpDirectSound->lpVtbl->
    SetCooperativeLevel (lpDirectSound, ghWnd, 
                         DSSCL_NORMAL))
     {
     MessageBox(ghWnd, "Silent Bob don't wanna cooperate!", 
                  "MSJ", MB_OK );
      }

lpDirectSound is obtained from DirectSoundCreate, and ghWnd is the top level window of your application.

After you have done the two required operations above, you can set up the one and only option about the system: the speaker configuration.

Not so fast. The speaker configuration is something best left alone (since the only two features currently supported are stereo and mono). The SetSpeakerConfig API does let you specify parameters to change the speaker configuration to indicate headphones, monophonic, stereophonic, quadraphonic, surround sound, or googolphonic. Read the docs for more information on this currently docile function.

In addition to the SetSpeakerConfig interface, there is a matching GetSpeakerConfig interface to tell your program what DirectSound knows about the current configuration.

Building DirectSound Buffers

Once you have created your DirectSound object, it is time to create DirectSound buffers. A buffer in DirectSound represents an audio stream (wave) that can be played back. There are two kinds of buffers: primary and secondary. The primary buffer is what is actually playing out the speakers. Secondary buffers are sounds that are ready to play. There is only one primary buffer, and your application should simply allow DirectSound to maintain it. Your program can access the primary buffer directly, but your application would then be responsible for real-time playback handling of the buffer at all times. Your application must respond in a very time-critical fashion, something a Ring 3 application cannot accomplish easily. Directly accessing primary buffers is beyond the scope of this article.

If you want to hear a sound in a secondary buffer, you simply "play it" into the primary buffer, a process I'll describe later. You can do basic and advanced operations with DirectSound buffers. The basics, such as creating, destroying, filling, and playing them, are required. The advanced operations involve tweaking attributes of the buffers to change things like frequency, balance, and volume. Let's take a look at these operations.

Basic Operations on DirectSound Buffers

By default a primary buffer has already been created for you. To create the secondary buffer, you must perform these three steps:

Set up a DSBUFFERDESC structure that defines the specific characteristics of the buffer.
Call the CreateSoundBuffer interface off the DirectSound object. This will give you a pointer to a DirectSoundBuffer object.
Copy the wave samples/bits into the buffer.

The DSBUFFERDESC structure has five fields:

 typedef struct 
  {
  DWORD     dwSize;        // sizeof(DSBUFFERDESC)
  DWORD     dwFlags;       // Buffer attributes
  DWORD     dwBufferBytes; // Size of wave file
  DWORD     dwReserved;    // Must be zero!

  LPWAVEFORMATEX lpwfxFormat; // Points to WAVEFORMATEX
  } 
DSBUFFERDESC;

The dwFlags field can have a combination of the values shown in Figure 4. I'll discuss these flags further below.

As just mentioned, to create a secondary buffer, you must first fill out this DSBUFFERDESC structure. The WAVE.C file contains all of the information needed to create a buffer. So let's assume you are trying to create a sound buffer out of the "Tub Full of Tires.wav" file (see Figure 5). Once this code succeeds, lpDSB will point to a DirectSoundBuffer. At this point, you can copy blocks of sound data into the buffer, using the Lock and Unlock interfaces on the DirectSoundBuffer (see Figure 6).

Notice in Figure 6 that the Lock and Unlock interfaces give you back two pointers, pbData and pbData2. This is due to the circular nature of sound buffers. If you provided an offset into the buffer during the Lock call, the first pointer would point to the chunk of memory starting at the offset and going to the end of the buffer (see Figure 7). The second pointer would point to the chunk of memory beginning at the start of the buffer and ending at the byte before the offset.

Figure 7 Sound Buffer Offsets
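The wrap-around arithmetic behind those two pointers can be sketched in plain C, with an ordinary array standing in for the sound buffer. LockSpans, split_lock, and write_wrapped are hypothetical names for this illustration only; the sketch assumes the offset lies inside the buffer and the byte count is no larger than the buffer.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
    size_t offset1, bytes1;  /* From the offset toward the end   */
    size_t offset2, bytes2;  /* Wrapped part at the buffer start;
                                bytes2 is 0 when nothing wraps   */
} LockSpans;

/* Work out the two regions covered by `bytes` bytes starting
   at `offset` in a circular buffer of `buffer_size` bytes */
LockSpans split_lock(size_t buffer_size, size_t offset, size_t bytes)
{
    LockSpans s;
    size_t to_end = buffer_size - offset;

    s.offset1 = offset;
    s.bytes1  = (bytes < to_end) ? bytes : to_end;
    s.offset2 = 0;
    s.bytes2  = bytes - s.bytes1;
    return s;
}

/* Copy `bytes` bytes from src into the circular buffer,
   wrapping to the start if the end is reached */
void write_wrapped(unsigned char *buffer, size_t buffer_size,
                   size_t offset,
                   const unsigned char *src, size_t bytes)
{
    LockSpans s = split_lock(buffer_size, offset, bytes);

    memcpy(buffer + s.offset1, src, s.bytes1);
    memcpy(buffer + s.offset2, src + s.bytes1, s.bytes2);
}
```

Writing five bytes at offset 6 of an 8-byte buffer puts two bytes at the end and the remaining three at the start, which is the same split the two pointers from Lock describe.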

Unlike the conventional Lock and Unlock-style Windows APIs you may be used to, the Lock and Unlock interfaces under DirectSound are actually wrappers around very device-specific code to get the bits of your wave file to the sound card. Some sound cards actually can make a chunk of their own memory available to the CPU's address space, while others cannot. You don't care how this is done, since DirectSound's drivers wrap this up in the Lock/Unlock metaphor. The memory pointer returned by Lock actually points to a virtualized memory address. When you write into the memory pointed to by this address, the DirectSound driver catches the memory-write operation and gets the bits to their destination in a manner defined by the device driver.

This means that the pointer returned by Lock is a write-only pointer. You are allowed to copy bits to the "memory" pointed to by the address returned by Lock, but you cannot read bits from the "memory."

Once you have copied the bits of your wave file into a DirectSound buffer with the Lock/Unlock interfaces, you can then call other DirectSound interfaces to play the wave. Calling Play on a sound buffer actually mixes the bits of the buffer's wave with any bits currently residing in the primary sound buffer. "Looped" sound buffers are buffers that play their waves repeatedly.
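To get a feel for what "mixing" means at the sample level, here is the textbook approach for signed 16-bit samples: add them and clamp the result so it cannot overflow. This is only an illustration; the article does not describe how DirectSound or the sound card actually implements its mixer, and mix_pcm16 is a made-up name.

```c
#include <stddef.h>

/* Mix `count` signed 16-bit samples from src into dst.
   Sums that fall outside the 16-bit range are clamped. */
void mix_pcm16(short *dst, const short *src, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
    {
        long sum = (long)dst[i] + (long)src[i];

        if (sum >  32767L) sum =  32767L;
        if (sum < -32768L) sum = -32768L;
        dst[i] = (short)sum;
    }
}
```

Two quiet samples simply add (1000 plus 500 gives 1500), while two loud ones (30000 plus 10000) are pinned at 32767 instead of wrapping around into noise.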

If you play a buffer that is already playing, DirectSound will stop playing the buffer and restart it again at the point you specify in your call to the Play interface. Think of a CD player. If you are playing your favorite Beatles song and halfway through it you press the restart button, your Beatles tune starts over again. That's the same way a DirectSound buffer works.

If you have a sound that needs to be played multiple times, and possibly simultaneously (such as bullets or explosions), you need multiple buffers that contain the same wave. DirectSound provides a method to duplicate a sound buffer by creating a virtual copy of the original buffer. There is only one copy of the actual bits that compose the wave, but two different sound buffers point to it. Once you have this virtual copy of the buffer, you can play it independently or simultaneously with the original buffer, whenever you like. I'll cover duplication of DirectSound buffers in a moment.

The code in Figure 8 uses the CD player analogy to play a DirectSound buffer. The code first checks to see if the buffer is already playing. If the buffer is playing, the play pointer is reset to the start of the buffer (the beginning of the song). Since the buffer is already playing, the music starts over. If the buffer was not playing, then the Play interface on the DirectSound buffer is called to get it started.

That's enough to get you started writing a noisy app! Except for the cleanup described down below, you can implement some fairly sophisticated sound in your game and you can start killing those baby seals and hear the pleasing sound of searing flesh from your ACM404-B Missile Launcher 95.

Advanced Options on DirectSound Buffers

Here's the gravy. Along with the pedestrian operations of creating and playing sounds, you can also dink around with the volume, balance, and frequency of the sound. For example, you could take a normally looping sound of a car engine, and you can tweak its frequency to make it sound like the engine is revving. Or you could move the sound of the racing car from the left to right speaker to match what's happening on the screen. Or you can lower the volume as your hot Porsche drives further away. Let's look at the six methods you can call on a DirectSoundBuffer for these three attributes:

 GetFrequency (lpDS, LPDWORD lpdwFrequency)
SetFrequency (lpDS, DWORD   dwFrequency)
GetPan       (lpDS, LPDWORD lpdwPan)
SetPan       (lpDS, DWORD   dwPan)
GetVolume    (lpDS, LPDWORD lpdwVolume)
SetVolume    (lpDS, DWORD   dwVolume)

For the frequency methods, dwFrequency indicates the number of samples getting played per second. Making this value higher increases the overall pitch of the sound; lowering this value decreases the overall pitch. Before you twiddle the frequency of a sound, you will want to call GetFrequency first to find out the current setting, because not all wave files use the same sampling rate. Some waves have 22,050 samples per second, others have 11,025 samples per second, and some others are 44,100 samples per second. Don't assume anything about the sampling rate (except that DirectSound currently only allows sampling rates up to 100,000 samples per second); always query the DirectSound buffer for the sampling rate. Figure 9 contains some code to double the sampling rate of a DirectSound buffer.

For the pan methods (balance), dwPan specifies how much to turn down one of the speakers, in hundredths of decibels. For example, if you want to move the balance 3dB left, you would turn the right speaker down 3dB. To move the balance left, use a negative number (here, -300). To move the balance right, use a positive number. The maximum amount you can specify is 100dB. Below is some sample code that pans a one-second wave file from left to right. It adjusts the pan volume in 3dB increments, which means it must cover a range of -10000 to 10000 in 67 steps (20000/300 = 66.6). To do this pan in one second, you need to wait 1/67th (0.0149) of a second between calls to SetPan. I used GetTickCount to get close to this number, but left out the very necessary DS_OK error-checking purely to make the code more readable.

 int idB;
int iTickCount;

// Start playing the buffer
lpDSB->lpVtbl->Play(lpDSB,   // Buffer to play
                    0,       // Reserved1 
                    0,       // Reserved2
                    0);      // Zero (not looping)

// Right channel silent (sound on left)
idB = -10000;
while (idB < 10000)
  {
  // Set the pan position
  lpDSB->lpVtbl->SetPan(lpDSB, idB);

  // Add three decibels for the next time
  idB = min (10000, idB+300);  

  // Wait 0.0149 of a second. GetTickCount counts
  // milliseconds, so that is about 15 ticks.
  iTickCount = GetTickCount();
  while ((GetTickCount() - iTickCount) < 15);
  }
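The step count and the delay can be double-checked with a little integer arithmetic. GetTickCount reports elapsed time in milliseconds, so the wait between SetPan calls needs to be expressed in milliseconds too. The helpers pan_steps and pan_delay_ms below are hypothetical names for this sketch.

```c
/* Number of fixed-size steps needed to sweep from lo to hi,
   rounded up (all values in hundredths of a decibel) */
long pan_steps(long lo, long hi, long step)
{
    return (hi - lo + step - 1) / step;
}

/* Milliseconds to wait between steps so that the whole
   sweep lasts duration_ms, rounded to the nearest ms */
long pan_delay_ms(long duration_ms, long steps)
{
    return (duration_ms + steps / 2) / steps;
}
```

Sweeping from -10000 to 10000 in steps of 300 takes 67 steps, and spreading 67 steps over 1000 milliseconds works out to roughly 15 milliseconds per step.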

And finally, the volume methods allow you to turn down the volume of a wave (the current version of DirectSound does not support amplification). A value of 0 leaves the wave at its original volume, and negative values indicate the number of decibels to turn down the wave. For example, if you wanted to play the wave at half its current volume (-10dB), you would use code like this:

 // Find out the current volume
if (DS_OK == lpDS->lpVtbl->GetVolume (lpDS, &dwVolume))
  {
  // Halve it by lowering 10dB (1000 hundredths of a
  // decibel), unless it is essentially silent (-100dB).
  // The casts keep the comparison signed.
  dwVolume = (DWORD) max (-10000L, (LONG) dwVolume - 1000L);
  // Set the new volume
  if (DS_OK != lpDS->lpVtbl->SetVolume (lpDS, dwVolume))
    { MessageBox (hWnd, "Silent Bob sez SetVolume failed", 
                  "MSJ", MB_OK );
     }
  }
else 
  {
  // Volume will fail if the DirectSound buffer
  // was not created with the DSBCAPS_CTRLVOLUME
  // flag (or DSBCAPS_CTRLDEFAULT).
  MessageBox (hWnd, "Silent Bob sez GetVolume failed", 
              "MSJ", MB_OK );
  }
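Because volume is stored in hundredths of a decibel, it is easy to be off by a factor of ten or a hundred. A small helper that takes whole decibels and clamps the result to the legal range keeps the units straight. The names attenuate, VOLUME_MAX, and VOLUME_MIN are my own for this sketch and are not SDK identifiers.

```c
#define VOLUME_MAX       0L   /* Original volume, no attenuation */
#define VOLUME_MIN (-10000L)  /* -100dB, essentially silent      */

/* Lower `current` (in hundredths of a decibel) by `db` whole
   decibels and keep the result inside the legal range */
long attenuate(long current, long db)
{
    long v = current - db * 100L;

    if (v < VOLUME_MIN) v = VOLUME_MIN;
    if (v > VOLUME_MAX) v = VOLUME_MAX;
    return v;
}
```

Starting from full volume (0), attenuate(0, 10) gives -1000, and attenuate(-9500, 10) bottoms out at -10000 rather than running past the limit.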

If you want to play more than one copy of a sound, you can use the DuplicateSoundBuffer method on DirectSound. It will return a new IDirectSoundBuffer interface that refers to the same sound buffer memory. The second virtual buffer may be played and stopped independently of the original; parameters such as volume and frequency can be controlled independently as well. However, if you change the sound buffer memory by Locking and writing, it will affect all duplicated buffers as well. This function takes two parameters: the source and the duplicate buffer! Here's a function to duplicate a wave (assuming lpDirectSound is a global variable pointing to the DirectSound object and ghWnd is your top-level window):

 LPDIRECTSOUNDBUFFER DuplicateBuffer (LPDIRECTSOUNDBUFFER
                                     lpOriginal )
{
  LPDIRECTSOUNDBUFFER lpDuplicate;

  if (DS_OK == lpDirectSound->lpVtbl->
                 DuplicateSoundBuffer(lpDirectSound,
                                      lpOriginal,
                                      &lpDuplicate) )
    {
    return lpDuplicate;
    }
  else
    { MessageBox (ghWnd, 
          "Silent Bob sez DuplicateSoundBuffer failed", 
          "MSJ", MB_OK );
    return NULL;
    }
}
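The "shared bits, separate knobs" idea can be modeled in a few lines of plain C. This is only a mental model with made-up names (ModelBuffer, model_duplicate); it says nothing about how DirectSound is implemented internally.

```c
typedef struct
{
    unsigned char *samples;  /* Shared: points at the wave bits */
    unsigned long  size;     /* Size of the wave bits in bytes  */
    long           volume;   /* Private to each buffer          */
    int            playing;  /* Private to each buffer          */
} ModelBuffer;

/* Make a virtual copy: the pointer is copied, the bits are not */
ModelBuffer model_duplicate(const ModelBuffer *original)
{
    ModelBuffer copy = *original;

    copy.playing = 0;        /* The duplicate starts out stopped */
    return copy;
}
```

Changing the duplicate's volume leaves the original's volume alone, but writing through the duplicate's samples pointer changes what both buffers play, just as Locking and writing does.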

Cleanup

When you are finished using a DirectSound buffer, you can free up the memory associated with the buffer using the Release interface:

 lpDSB->lpVtbl->Release(lpDSB);
lpDSB = NULL;

Once you have released the buffer, lpDSB is invalid, so you should set it to NULL to prevent random crashes caused by pointer reusage. That way you will get an access violation fault instead of some weird COM error (in the case that the pointer was reused for some other COM object).
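The release-then-NULL habit is easy to package in a macro. The sketch below uses a tiny stand-in object so it compiles anywhere; MockObject, MockVtbl, mock_release, and SAFE_RELEASE are illustrative names and not part of the Games SDK. With the real headers, the same macro shape would work on any interface pointer that has an lpVtbl with a Release entry.

```c
#include <stddef.h>

/* Stand-in for a COM object in C: the first member points
   to a table of function pointers */
typedef struct MockObject MockObject;

typedef struct
{
    unsigned long (*Release)(MockObject *self);
} MockVtbl;

struct MockObject
{
    const MockVtbl *lpVtbl;
    unsigned long   refs;
};

/* Drop one reference and report how many remain */
unsigned long mock_release(MockObject *self)
{
    return --self->refs;
}

const MockVtbl g_mock_vtbl = { mock_release };

/* Release the object if the pointer is live, then NULL the
   pointer so a second use cannot touch the old object */
#define SAFE_RELEASE(p)                  \
    do {                                 \
        if ((p) != NULL)                 \
        {                                \
            (p)->lpVtbl->Release(p);     \
            (p) = NULL;                  \
        }                                \
    } while (0)
```

Using the macro twice on the same pointer is harmless: the first use releases the object and clears the pointer, and the second sees NULL and does nothing.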

When you are all finished with the system, call the Release interface on the DirectSound object:

 lpDirectSound->lpVtbl->Release(lpDirectSound);
lpDirectSound = NULL;

At this point, you can safely call the Windows 95 sndPlaySound API again, since you have released the sound driver of its DirectSound monopoly.

My Thin Code Wrapper

I have written a thin little code wrapper to make things a little easier for you to jump right into DirectSound. My wrapper functions in DIRECTSOUND.C (see Figure 10) use the WAVE.C helper functions. Also, keep in mind that this code wrapper is entirely my own personal implementation: it is not part of the Games SDK.

Let's take a look at the two structures defined in DIRECTSOUND.H:

 typedef struct tagDIRECTSOUNDOBJECT
  {
  LPDIRECTSOUND         lpDirectSound;
  }
DIRECTSOUNDOBJECT, *LPDIRECTSOUNDOBJECT;

typedef struct tagDIRECTSOUNDWAVE
  {
  WAVEFILE WaveFile; // Standard WaveFile from WAVE.H
  LPDIRECTSOUNDBUFFER pDSB; // Ptr to dir. sound buffer
  DWORD dwFreq;    // Frequency
  DWORD dwPan;     // Panning info (L to R balance)
  DWORD dwVol;     // Volume
  BOOL  bLooped;   // Looped = TRUE (plays repeatedly)
  BOOL  bPlaying;  // Is this wave playing?
  } 
DIRECTSOUNDWAVE, *LPDIRECTSOUNDWAVE;

The DIRECTSOUNDOBJECT structure is simply a wrapper around the LPDIRECTSOUND interface; I did this just to make it easier to port this wrapper to C++ if I desire. There is only one of these objects in my app. The DIRECTSOUND_Enable and DIRECTSOUND_Disable functions call the DirectSoundCreate and lpDirectSound->lpVtbl->Release functions respectively.

The DIRECTSOUNDWAVE structure is a wrapper around a WAVEFILE structure (from WAVE.H) and an LPDIRECTSOUNDBUFFER interface pointer. The DIRECTSOUND_LoadWave function first calls the WAVE_LoadFile function to fill out the wave file portion of the structure. Then it calls lpDirectSound->lpVtbl->CreateSoundBuffer to create a DirectSound secondary buffer, and then it uses the pDSB->lpVtbl->Lock and pDSB->lpVtbl->Unlock interfaces to copy the bits from the WaveFile structure into the buffer. Finally, the memory in the WaveFile structure is released.

The dwFreq, dwPan, dwVol, bLooped, and bPlaying fields are cached values of the actual DirectSound buffer to avoid the extra COM calls.

The DIRECTSOUND_Play and DIRECTSOUND_Stop functions call the appropriate interfaces on the DirectSound buffer. The DIRECTSOUND_IsPlaying function returns the cached value that was set/cleared in one of the above Play or Stop functions.

The remaining Set/Get pairs from DIRECTSOUND.C also call the appropriate functions described above in the Advanced Options on DirectSound Buffers section. As you can see, this wrapper is rice-paper thin, even compared to the DirectDraw wrapper I wrote in the first article. That's because sounds basically just start and stop, while pictures require work to animate.

Let's look at how my WinDonut app has been modified to make lots of noise.

Makin' WinDonut Noisy!

The WinDonut sample app from my previous article had a little ship piloted by Ã (the Programmer Formerly Known as Brian) shooting little red donuts as they floated in space. Although in space no one can hear you scream, they can certainly hear Ã's ship flying around and shooting donuts. Just as the silent WinDonut loaded a bitmap file (WINDONUT.BMP) for its graphics, the noisy WinDonut loads four wave files: THRUST.WAV, FIRE.WAV, BACKGROUND.WAV, and EXPLODE.WAV. The thrust wave is played repeatedly (looping) while the thrust key is held down, the background wave is played continuously, and the fire and explode waves are played when the user fires a shot or when a donut is destroyed. Let's look at the code injections into WinDonut.

I added some new global variables in WINDONUT.H:

 GLOBAL      LPDIRECTSOUNDOBJECT     glpDirectSound;
GLOBAL      LPDIRECTSOUNDWAVE       glpSoundFire;
GLOBAL      LPDIRECTSOUNDWAVE       glpSoundExplosion;
GLOBAL      LPDIRECTSOUNDWAVE       glpSoundThrust;
GLOBAL      LPDIRECTSOUNDWAVE       glpSoundBackground;

glpDirectSound points to the DIRECTSOUNDOBJECT structure discussed above; glpSoundFire, glpSoundExplosion, glpSoundThrust, and glpSoundBackground point to the four waves loaded from those four wave files. This block of code was added to WINDONUT_InitGame (in the file WINDONUT.C):

 if (glpDirectSound = DIRECTSOUND_Enable())
 {
 glpSoundBackground=DIRECTSOUND_LoadWave(glpDirectSound, 
                                       "background.wav");
 glpSoundFire = DIRECTSOUND_LoadWave (glpDirectSound,
                                      "fire.wav" );
 glpSoundExplosion=DIRECTSOUND_LoadWave(glpDirectSound, 
                                         "explode.wav");
 glpSoundThrust = DIRECTSOUND_LoadWave (glpDirectSound,
                                        "thrust.wav" );

 // Check each pointer before touching it, the same
 // way the rest of the game does
 if (glpSoundThrust)
   glpSoundThrust->bLooped = TRUE;
 if (glpSoundBackground)
   {
   glpSoundBackground->bLooped = TRUE;
   DIRECTSOUND_Play (glpDirectSound, glpSoundBackground );
   }
 }

If DIRECTSOUND_Enable succeeds, the four wave files are loaded, the looping flag in the private structure is set to TRUE for the background and thrust waves, and the background "music" begins.

The FigureShipPosition function in WINDONUT.C is where the keyboard input is checked for ship movement. This code is executed when the thrust key is held down:

 if (glpSoundThrust)
   if (!(glpSoundThrust->bPlaying))
     DIRECTSOUND_Play (glpDirectSound, glpSoundThrust);

And if the thrust key is released, this code is called:

 if (glpSoundThrust)
   if (glpSoundThrust->bPlaying)
     DIRECTSOUND_Stop (glpDirectSound, glpSoundThrust);

As you can see, these lines of code turn on and off the looping thrust sound for the ship.

In the FigureBulletPositions function in WINDONUT.C, this line of code is added when a bullet is created:

 if (glpSoundFire)
   DIRECTSOUND_Play (glpDirectSound, glpSoundFire);

And finally, when a donut is hit in the FigureCollisions function in WINDONUT.C, this line of code is called:

 if (glpSoundExplosion)
  DIRECTSOUND_Play(glpDirectSound,glpSoundExplosion);

Here's the new shutdown code in the WM_DESTROY section in WINDOWS.C:

 if(glpSoundFire) DIRECTSOUND_UnLoad (glpDirectSound,  
                                      glpSoundFire);
if(glpSoundExplosion)DIRECTSOUND_UnLoad(glpDirectSound,
                                    glpSoundExplosion);
if(glpSoundThrust) DIRECTSOUND_UnLoad (glpDirectSound,
                                        glpSoundThrust);
if(glpSoundBackground)DIRECTSOUND_UnLoad(glpDirectSound,
                                   glpSoundBackground);
if(glpDirectSound)DIRECTSOUND_Disable(glpDirectSound );

That's it! If you run WinDonut now, you will find that it makes lots and lots of noise, all mixed together wonderfully. Sure to provide plenty of yuk-yuks for all.

The Future of DirectSound

This first version of DirectSound is very powerful. Future versions of DirectSound will emulate 3D "positional" sound via software.

What's positional sound? Currently, each sound has a few attributes: volume, left to right pan, and frequency. If you want to make the sound bounce around a room, you have a huge chore ahead of you. DirectSound 3D will add more attributes to each wave: position, direction, and reflectivity. You will be able to make a sound seem to originate from behind the player and come at them in a downward, dive-bomber fashion. This will even be achievable on two-speaker systems; it is possible to modify waveforms so that listeners think a sound is behind them even though it emanated in front of them (the QSound recording format used by Pink Floyd on a couple of albums uses this sort of technology). This sound modulation is all done by DirectSound 3D or the sound card; all your app will need to do is say where the sound originated from.

Let's go over some stuff you can do right now to make it easy to harness this technology in the future. If you are designing a new game right now and you will want to implement DirectSound 3D in the future, try to maintain the information now that DirectSound 3D will want. Maintain your objects' positions in 3D space, and decide if you'll want your sounds to do things like echo and bounce off things. Here's a description of the 3D coordinate system that's planned for DirectSound 3D (although nothing is final yet):

Three-dimensional space is represented by an xyz Cartesian coordinate system, measured in cm.
Velocity is measured in cm/second.
The orientation of the sound is a unit vector: x² + y² + z² = 1.
Volume and sound energy are measured in hundredths of a decibel.
Sound reflectivity (echo) is measured on an arbitrary scale where 0 is not reflective and 255 is completely reflective.
Sound dissipation is measured in hundredths of dB per meter².

Write your application so that these values can be computed later on. If you can maintain these values (or just put them in a table), porting your app to use DirectSound 3D will be a snap.
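One way to keep these values around is a small per-object record. The following is only a sketch under the assumptions listed above: the SoundSource3D type, its field names, and the helper functions are all hypothetical and are not part of DirectSound or any planned DirectSound 3D interface. It stores each quantity in the units described, and checks that an orientation really is a unit vector.

```c
#include <assert.h>

/* Hypothetical bookkeeping record; nothing here is a DirectSound type. */
typedef struct
  {
  long   xCm, yCm, zCm;        /* position in centimeters              */
  long   vxCmS, vyCmS, vzCmS;  /* velocity in centimeters per second   */
  double dirX, dirY, dirZ;     /* orientation, kept as a unit vector   */
  long   volumeCdB;            /* volume in hundredths of a decibel    */
  unsigned char reflectivity;  /* 0 = not reflective, 255 = completely */
  long   dissipationCdB;       /* hundredths of dB per square meter    */
  } SoundSource3D;

/* Nonzero when x*x + y*y + z*z equals 1 within a small tolerance. */
int IsUnitVector (double x, double y, double z)
  {
  double lenSq = x * x + y * y + z * z;
  return lenSq > 0.9999 && lenSq < 1.0001;
  }

/* Clamp any game-specific value into the 0..255 reflectivity scale. */
unsigned char ClampReflectivity (int value)
  {
  if (value < 0)   return 0;
  if (value > 255) return 255;
  return (unsigned char) value;
  }

/* Place a silent, motionless, non-reflective source at a position.
   Facing along +z is an arbitrary default chosen for this sketch. */
void SoundSource3D_Init (SoundSource3D *s, long xCm, long yCm, long zCm)
  {
  assert (s != 0);
  s->xCm = xCm;  s->yCm = yCm;  s->zCm = zCm;
  s->vxCmS = 0;  s->vyCmS = 0;  s->vzCmS = 0;
  s->dirX = 0.0; s->dirY = 0.0; s->dirZ = 1.0;
  s->volumeCdB = 0;
  s->reflectivity = 0;
  s->dissipationCdB = 0;
  }
```

A game would update the position and velocity fields wherever it already moves its objects, so the numbers are ready to hand over once a 3D sound interface exists.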

I won't go into any more detail on DirectSound 3D here, since it's subject to change. But you can rest assured that if you maintain the above information, you will have an easier time getting your app to make noises all over the room.

DirectInput

Unlike the other Direct features of the Games SDK, DirectInput comes with Windows 95 (by replacing MSJSTICK.DRV). It adds two new APIs (joyGetPosEx and joyGetDevCaps) that allow your program to find out information about the installed joystick. And only the joystick: this first incarnation of DirectInput is limited to supporting joystick-like devices. These would include the standard analog joysticks we have all come to hate, the new digital joysticks (such as Microsoft's Sidewinder Pro), and some of those new Nintendo-style controller pads. And I expect a rash of new input devices to swarm the game market over the next year.

First I will talk about the new features and advantages of DirectInput and discuss the placement of DirectInput amongst the New World Order of Windows 95 architecture. Second, I will talk about the calibration and testing of the new joysticks as well as support for multiple joysticks. After I explain how to read the joystick information, I'll modify WinDonut to allow play with the joystick.

DirectInput coupled with digital input devices gives two main benefits: performance and consistency. The analog joysticks we have been using for the past ten years require a tight noninterruptible polling loop to determine the values of the joystick's potentiometers. This polling takes up about ten percent of the entire CPU time for a game, and the results are not always consistent. Digital joysticks use newer technology to determine the position of the joystick, requiring only one poll to find out the position and status of the joystick controls. The DirectInput joystick driver can simply ask the joystick in one quick step for all of the information about its position. This means that a digital joystick and DirectInput can instantly add a ten percent performance improvement to your Windows-based game.

In addition to performance, you get consistency. DirectInput requires joystick manufacturers to write minidrivers that will bring some programming consistency to your game app. If a joystick has a throttle, you only need to look in one place to find it. Same goes for the point-of-view hats (for the uninitiated, that's a device on top of the joystick that lets you peek in a particular direction without committing yourself to a move) and extra buttons. And, since the interface with the joystick is handled by DirectInput, and since DirectInput is part of the operating system, players need to calibrate and test the joystick only once. After they have installed their joystick, the calibrations and capabilities of their joysticks are saved in the registry. No more writing those silly "Move yer stick to the upper left and fire, move to lower right, fire, move to center, fire" opening sequences.

And finally, DirectInput offers a lot more flexibility. Lots-o-buttons, throttles, and up to 16 joysticks are supported in DirectInput. To hook up multiple digital joysticks, all you have to do is get a splitter cable (each joystick will assign itself a digital ID).

Let's take a look at where DirectInput sits in the Windows 95 operating system (see Figure 11). As you can see, the joystick driver model uses both 16- and 32-bit interfaces. VJOYD.VXD is the actual 32-bit Ring 0 driver that polls the joystick device. If your app is Win32-based, your polling of the joystick will be done with the joystick APIs, which live in WINMM.DLL. These APIs map directly down to the VJOYD.VXD device to give you blistering speed. For 16-bit apps, which use MMSYSTEM.DLL, there is some thunking through MMSYSTEM.DLL that eventually makes it to VJOYD.VXD.

Figure 11 DirectInput Architecture

Nonpolling 16- and 32-bit functions, such as calibration and testing, map to MMSYSTEM.DLL, which calls the driver-dependent code for this feature. What does this mean to you? You won't have to worry about managing different joystick models; all of the grunge work is taken care of by MMSYSTEM.DLL. All you have to do is use the new APIs, which I will explain in the next two sections.

Calibration, Testing, and Multiple Joysticks

All of the calibration and testing of the DirectInput joystick drivers (both analog and digital) is done through the Joystick Control Panel applet. If your application wishes to ask the user to test or calibrate the joystick, your code simply spawns the Control Panel applet.

 WinExec("control joy.cpl", SW_NORMAL);

As I mentioned earlier, DirectInput supports multiple joysticks. The Windows 3.1 joystick APIs gave you the constants JOYSTICKID1 and JOYSTICKID2. Now you have JOYSTICKID1 through JOYSTICKID16 (although the SDK casually forgets to define JOYSTICKID3 through JOYSTICKID16). To enumerate the list of joysticks, you simply call the joyGetDevCaps API for each joystick, and skip over the ones with a bad return value:

 JOYCAPS JoyCaps;
 int j;
 char sz[24];

 for (j = JOYSTICKID1; j < JOYSTICKID1 + 16; j++)
   {
   if (JOYERR_NOERROR ==
       joyGetDevCaps (j, &JoyCaps, sizeof(JoyCaps)))
     {
     wsprintf (sz, "Joystick #%d", j - JOYSTICKID1 + 1);
     MessageBox (hWnd, JoyCaps.szPname, sz, MB_OK);
     }
   }

Your program can, of course, put all of these in a list box and let the user decide which input device to use. And, since each input device has its own registry entries for calibration, your user can quickly jump between devices.

Reading the Joystick Information

Once your game has decided which joystick to use, it is time to play! With the multimedia APIs of Windows 3.1, you could use the joySetCapture API to capture the joystick input to a window. After calling the joySetCapture API, you could query the joystick or respond to some notifications. For example, if you pushed joystick button 1, the MM_JOY1BUTTONDOWN message would be sent to the window indicated in joySetCapture. For those five people who used it, joyGetPos and joySetCapture returned the x and y positions and two button settings for your joystick.

This model and API set holds true for DirectInput. If your game already uses these APIs and relevant messages, it will automatically gain the performance from DirectInput, especially if you are using a digital joystick.

There is one new API for reading the joystick, joyGetPosEx. This API uses the JOYSTICKIDx constant and the JOYINFOEX structure (see Figure 12), which accounts for the new controls found on the new joysticks. If you use joyGetPosEx, you do not need to set capture (just as you didn't need to if you used the old joyGetPos API).

 MMRESULT joyGetPosEx(UINT IDDevice, LPJOYINFOEX lpJIEX)

Most of the fields of JOYINFOEX are straightforward. The axis values are from 0 to 65,535, with 32,768 indicating a centered axis. The bits set in dwFlags dictate which fields of the structure get filled out.

The button states are actually bitfields; if button 1 is pressed, the bitfield indicated by JOY_BUTTON1 is set, and so on up to JOY_BUTTON32. If your joystick has more than 32 buttons (!), you can use the dwButtonNumber to see if a particular button is pressed. Since this field is not a bitfield, you cannot check for multiple buttons pressed above button number 32. For the point of view hat, use the values shown in Figure 13.
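To make the bitfield idea concrete, here is a minimal sketch of a button test. It assumes the button word has already been copied out of JOYINFOEX into a plain unsigned long. The two JOY_BUTTON constants repeat the values from mmsystem.h so that the fragment compiles without the Windows headers, and IsButtonDown is a hypothetical helper, not an SDK function.

```c
#include <assert.h>

/* Same values as in mmsystem.h, repeated so this compiles anywhere. */
#define JOY_BUTTON1  0x0001UL
#define JOY_BUTTON2  0x0002UL

/* Nonzero if button n (1 through 32) is set in the button bitfield.
   Button n lives in bit n-1, so button 1 is the lowest bit.        */
int IsButtonDown (unsigned long dwButtons, int n)
  {
  assert (n >= 1 && n <= 32);
  return (dwButtons & (1UL << (n - 1))) != 0;
  }
```

Because every button has its own bit, several buttons can be tested from a single reading, for example by calling IsButtonDown for button 1 (fire) and again for button 2 (thrust) on the same value.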

So, how would you query the joystick? In the WinDonut game, I care about the x axis (left to right) for ship rotation, button 1 (fire) and button 2 (thrust). And since WinDonut does not care how far left or right you push the joystick, I'll convert the x axis range to a more manageable "right, left, or centered" value (see Figure 14). The code in Figure 14 looks innocent enough, but be careful: if the user has an analog joystick, the call to joyGetPosEx can take up to eight milliseconds to do that horrid interrupt disabled polling thing. If they have a digital joystick, it will take just a few clock cycles.
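The conversion from a raw axis value to "right, left, or centered" might look something like the sketch below. This is not the code from Figure 14. The AxisToDirection name and the size of the dead zone around the center are assumptions made for illustration; only the 0 to 65,535 range and the 32,768 center come from the description of JOYINFOEX above.

```c
#include <assert.h>

#define AXIS_CENTER    32768UL  /* centered axis, per JOYINFOEX      */
#define AXIS_DEADZONE   8192UL  /* assumed slop around the center    */

/* Collapse a 0..65535 axis reading into -1 (left), 0 (centered),
   or +1 (right). Readings inside the dead zone count as centered
   so a slightly off-center stick does not spin the ship.          */
int AxisToDirection (unsigned long pos)
  {
  assert (pos <= 65535UL);
  if (pos < AXIS_CENTER - AXIS_DEADZONE) return -1;
  if (pos > AXIS_CENTER + AXIS_DEADZONE) return  1;
  return 0;
  }
```

A wider dead zone makes the ship less twitchy, and a narrower one makes it more responsive; the right value depends on the game and the joystick.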

WinDonut: Total Joy

I jammed the code in Figure 14 into WinDonut. (Updated code can be found on the usual MSJ sources; see page 5.) Since the different user inputs are checked in different functions, I added a new function, ReadJoyStick, which is called at the start of the WINDONUT_GameHeartBeat function. Since WinDonut does so few computations, I can get away with calling it on every heartbeat. In most games, you would not want to call this function on every game heartbeat, since you could use up a lot of time if the user has an analog joystick (72 polls per second multiplied by 8 milliseconds per poll is 576 milliseconds, which means you could spend over half your time reading the joystick!).
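For a game that cannot afford a poll on every heartbeat, one option is to read the joystick only every few heartbeats. The sketch below shows the arithmetic and a simple throttle. Both function names are hypothetical, and the interval of 4 used in the note afterward is just an example, not something WinDonut does.

```c
#include <assert.h>

/* Worst-case milliseconds spent polling during one second of play. */
unsigned PollCostMsPerSecond (unsigned pollsPerSecond, unsigned msPerPoll)
  {
  return pollsPerSecond * msPerPoll;
  }

/* Nonzero on the heartbeats where the joystick should be read,
   which is once every 'interval' heartbeats, starting with beat 0. */
int ShouldPollJoystick (unsigned long heartbeat, unsigned interval)
  {
  assert (interval > 0);
  return heartbeat % interval == 0;
  }
```

With 72 heartbeats per second and 8 milliseconds per analog poll, polling on every beat costs up to 576 milliseconds of each second. Polling every fourth beat (18 polls per second) brings the worst case down to 144 milliseconds, at the price of slightly staler input.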

The ReadJoyStick function in Figure 15 fills out the giJoyPos and gbButton1 and gbButton2 globals, which are then checked in the FigureShipPosition and FigureBulletPositions functions. This way both the keyboard and the joystick work for WinDonut. It's very straightforward, and it works nicely!

Conclusion

Three down, one to go. DirectDraw is great since it makes Windows video performance incredibly fast. DirectSound is great since it introduces a few new features such as mixing along with extremely low latency to provide a great sound setup. DirectInput and DirectPlay differ from the other Direct features in that they offer something sorely lacking from previous incarnations of Windows games: a standard. With DirectPlay and DirectInput, you can actually use an API set that handles the dirty work of connecting players together via modem and connecting players to their machines with nifty new input devices.

From the February 1996 issue of Microsoft Systems Journal.