Streaming Wave Files with DirectSound

Mark McCulley
Microsoft Corporation

July 30, 1996


Abstract

Playing small wave files with Microsoft® DirectSound® requires little buffer management; you can simply load the entire sound into memory and play it. With larger wave files, though, you should be more efficient in your memory usage, especially if you will be playing multiple sounds simultaneously. Streaming is a technique of using a small buffer to play a large file by filling the buffer with data from the file at the same rate that data is taken from the buffer and played.

In this article I discuss the techniques required to stream wave files from disk and play them using the DirectSound application programming interface (API). I chose to implement my solution in C++, but the techniques presented here apply to a C implementation as well.

Introduction to DirectSound

Microsoft® DirectSound® is the 32-bit audio application programming interface (API) for Microsoft Windows® 95 and Windows NT® that replaces the 16-bit wave API introduced in Windows 3.1. It provides device-independent access to audio accelerator hardware, giving you access to features like real-time mixing of audio streams and control over volume, panning (left/right balance control), and frequency shifting during playback. DirectSound also provides low-latency playback (on the order of 20 milliseconds) so that you can better synchronize sounds with other events. DirectSound is available in both the DirectX™ 2 and the DirectX 3 SDKs.
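
To give you a flavor of those per-buffer controls, here's a small fragment. It's not taken from the sample code in this article, and it assumes pdsb is a secondary buffer created with the DSBCAPS_CTRLVOLUME, DSBCAPS_CTRLPAN, and DSBCAPS_CTRLFREQUENCY flags set in its DSBUFFERDESC:

// Volume is attenuation in hundredths of a decibel (0 is full volume,
// -10000 is effectively silent); pan ranges from -10000 (full left) to
// 10000 (full right); frequency is the playback rate in samples per second.
pdsb->SetVolume (-600);        // attenuate output by 6 dB
pdsb->SetPan (-2500);          // shift the sound toward the left channel
pdsb->SetFrequency (11025);    // half speed for a sound recorded at 22,050 Hz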

Just the Facts, Ma'am

I'm going to stick to the subject of streaming wave files and not rehash all of the basics of DirectSound. If you want a thorough overview of DirectSound, check out Dave Edson's article, "Get World-Class Noise and Total Joy from Your Games with DirectSound and DirectInput" in the MSDN Library (Microsoft Systems Journal, 1996 Volume 11, February 1996 Number 2).

If you want to experiment with DirectSound or build the STREAMS sample application, you'll need the DirectX 2 or the DirectX 3 SDK. The DirectX 3 SDK is available in the January release of the Development Platform.

If you're already familiar with DirectSound and don't want to read this entire article to get the goodies, skip to the Quick Fix section for a summary of what you need to know about streaming wave files with DirectSound.

How Streaming Works

The purpose of streaming is to use a relatively small buffer to play a large file. Specific implementations vary, but visualize streaming by imagining continually pouring water into a barrel with a hole in it. The idea is to keep enough water in the barrel so that the flow out of it is uninterrupted. In our case, the barrel is a sound buffer and the water is wave data. Let's carry this metaphor a bit further and say that to put water in the barrel, we have to fetch it from a lake with a bucket. The challenge of streaming, then, is to get the proper-sized bucket and a helper who can carry the bucket between the lake and the barrel fast enough to keep up with the outflow from the barrel. If the barrel (buffer) runs out of water (wave data), the flow (sound) is interrupted.

Streaming with DirectSound

If you've worked with the low-level wave API in Windows 3.1, you're probably familiar with the waveOutWrite function. This function sends a block of wave data to the driver; when the driver has finished playing the block, it notifies the application and returns the buffer. To keep the driver satisfied, the application must use at least two buffers and be able to fill a buffer with data in less time than it takes the driver to play a buffer. The following diagram illustrates the streaming mechanism used with the low-level wave API:

Double-buffer streaming with 16-bit wave API
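
As a refresher, here's a minimal sketch of the refill step in that double-buffer scheme (it's not taken from any sample in this article). It assumes the device was opened with CALLBACK_WINDOW, so the driver posts MM_WOM_DONE to a window each time it finishes a block; ReadMoreWaveData and BLOCK_SIZE are hypothetical names used only for illustration.

// Handler for MM_WOM_DONE: refill the block the driver just finished and
// hand it back with waveOutWrite. With two blocks in flight, playback is
// uninterrupted as long as a block can be refilled faster than it is played.
LRESULT OnWomDone (WPARAM wParam, LPARAM lParam)
{
  HWAVEOUT hwo = (HWAVEOUT) wParam;      // device that finished a block
  LPWAVEHDR pwhdr = (LPWAVEHDR) lParam;  // header for the finished block

  pwhdr->dwBufferLength = ReadMoreWaveData (pwhdr->lpData, BLOCK_SIZE);
  if (pwhdr->dwBufferLength > 0)
    waveOutWrite (hwo, pwhdr, sizeof (WAVEHDR));

  return 0;
}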

The streaming mechanism used with DirectSound is a different beast altogether. With DirectSound, you create a looping secondary buffer object (I'll explain the "looping secondary" part of this jargon in a bit). This buffer is owned by DirectSound, and you must query the buffer to determine how much of the wave data has been played and how much space in the buffer is available to be filled with additional data. Conceptually, this mechanism is identical to a traditional circular buffer with head and tail pointers. The following diagram illustrates the streaming mechanism used with DirectSound:

Single-buffer streaming with DirectSound

With single-buffer streaming, the application is responsible for writing sound data into the buffer before the driver plays the data. The application should keep the buffer as full as possible to prevent any interruptions in sound playback.

Polling vs. Interrupt-Driven Buffer Monitoring

Single-buffer streaming requires that the application monitor the buffer and supply it with sound data when necessary. There are two approaches to implementing buffer monitoring: the application can continuously poll the buffer in a tight loop, or it can use periodic timer interrupts (callbacks) to check the buffer at a fixed interval.

The second approach, using interrupts to periodically monitor the buffer levels, is the most commonly used solution to the problem of maintaining a streaming buffer. This is the solution I chose to implement in the STREAMS sample application. The first approach, continuous polling, needlessly consumes CPU cycles.

A C++ Implementation of Streaming

The STREAMS sample application includes a C++ implementation of streaming with DirectSound. I chose to do a C++ implementation of streaming for several reasons.

You don't have to use C++ to work with DirectSound, but since DirectSound is based on the Component Object Model (COM), C++ is the native interface. If you choose to use C, the DirectX 2 and 3 SDKs provide macros that allow you to access DirectSound methods in C-language programs. For a C-language implementation of streaming with DirectSound, check out the DSSTREAM sample in the DirectX SDK.
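
For example, the same CreateSoundBuffer call can be written either way; the C macro form expands to an explicit vtable call. (The macro shown here follows the Interface_Method naming pattern used by the DirectX headers; check DSOUND.H in your SDK for the exact set of macros available.)

// C++: methods are called directly through the interface pointer.
hr = pds->CreateSoundBuffer (&dsbd, &pdsb, NULL);

// C: the same call made through the helper macro from DSOUND.H.
hr = IDirectSound_CreateSoundBuffer (pds, &dsbd, &pdsb, NULL);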

Design Goals

My primary design goal was to create some reusable objects that implement streaming with DirectSound. I didn't want to introduce the complexities of COM or OLE, so the objects are reusable at the source-code level. I wanted the objects to have high-level interfaces and be easy to use in an application.

The STREAMS sample application uses the Microsoft Foundation Class (MFC) Library, a C++ application framework. I didn't base any of my streaming classes on MFC, so if you're using a different application framework, you should be able to reuse this code easily.

Building the STREAMS Sample Application

The STREAMS sample-application package includes source code for one target executable, STREAMS.EXE. I've included a project file for Visual C++®, Version 4.0. The following table summarizes the files required to make STREAMS.EXE. If you're not using Visual C++, you can use this table to easily recreate the project in your favorite IDE.

File Description
ASSERT.C Source file containing basic assert services
DEBUG.C Source file containing basic debug services
AUDIOSTREAM.CPP Source file containing implementation of AudioStreamServices and AudioStream objects
TIMER.CPP Source file containing implementation of Timer object
WAVEFILE.CPP Source file containing implementation of WaveFile object
STREAMS.CPP Source file for application
STREAMS.RC Resource script file
WINMM.LIB System library file
DSOUND.LIB System library file

The key source files are AUDIOSTREAM.CPP, TIMER.CPP, and WAVEFILE.CPP. These files contain the source for all of the objects required to implement wave streaming with DirectSound. The ASSERT.C and DEBUG.C files contain source for some simple debug and assert macros. The remaining source file, STREAMS.CPP, contains the source for a basic MFC-based application.

To build the STREAMS sample application, you'll need the Win32 SDK and the DirectX 2 or DirectX 3 SDK. To run STREAMS.EXE, you need the DirectX SDK runtime libraries and, of course, a sound card.

A Top-Down View

Before I get into the implementation of the objects that support streaming (the AudioStreamServices, AudioStream, Timer, and WaveFile objects), let's take a look at how these objects are used in the STREAMS sample application.

STREAMS is built on a basic two-object MFC model for frame window applications. The two objects are CMainWindow and CTheApp, derived from CFrameWnd and CWinApp, respectively. The following is the declaration of the CMainWindow class taken from STREAMS.H:

class CMainWindow : public CFrameWnd
{
public:
  AudioStreamServices * m_pass;   // ptr to AudioStreamServices object
  AudioStream *m_pasCurrent;      // ptr to current AudioStream object
  
  CMainWindow();

  //{{AFX_MSG( CMainWindow )
  afx_msg void OnAbout();
  afx_msg void OnFileOpen();
  afx_msg void OnTestPlay();
  afx_msg void OnTestStop();
  afx_msg void OnUpdateTestPlay(CCmdUI* pCmdUI);
  afx_msg void OnUpdateTestStop(CCmdUI* pCmdUI);
  afx_msg int  OnCreate(LPCREATESTRUCT lpCreateStruct);
  afx_msg void OnDestroy();
  //}}AFX_MSG

  DECLARE_MESSAGE_MAP()
};

Note the two data members m_pass and m_pasCurrent. These data members hold pointers to an AudioStreamServices and AudioStream object. For simplicity, the STREAMS sample application allows only a single wave file to be opened at a time. The m_pasCurrent member contains a pointer to an AudioStream object created from the currently open wave file.

Creating and initializing the AudioStreamServices object

Before a window uses streaming services, it must create an AudioStreamServices object. The following code shows how the OnCreate handler for the CMainWindow class creates and initializes an AudioStreamServices object:

int CMainWindow::OnCreate(LPCREATESTRUCT lpCreateStruct) 
{
  if (CFrameWnd ::OnCreate(lpCreateStruct) == -1)
    return -1;

  // Create and initialize AudioStreamServices object.
  m_pass = new AudioStreamServices;
  if (m_pass)
  {
    m_pass->Initialize (m_hWnd);
  }

  // Initialize ptr to current AudioStream object
  m_pasCurrent = NULL;
  
  return 0;
}

Each window using streaming services must create an AudioStreamServices object and initialize it with a window handle. This requirement comes directly from the architecture of DirectSound, which apportions services on a per-window basis so that the sounds associated with a window can be muted when the window loses focus.

Creating an AudioStream object

Once a window has created and initialized an AudioStreamServices object, the window can create one or more AudioStream objects. The following code is the command handler for the File Open menu item:

void CMainWindow::OnFileOpen() 
{
  CString cstrPath;

  // Create standard Open File dialog
  CFileDialog * pfd 
    = new CFileDialog (TRUE, NULL, NULL,
               OFN_EXPLORER | OFN_NONETWORKBUTTON | OFN_HIDEREADONLY,
               "Wave Files (*.wav)|*.wav||", this);

  // Show dialog
  if (pfd->DoModal () == IDOK)
  {
    // Get pathname
    cstrPath = pfd->GetPathName();

    // Delete current AudioStream object
    if (m_pasCurrent)
    {
      delete (m_pasCurrent);
    }

    // Create new AudioStream object
    m_pasCurrent = new AudioStream;
    m_pasCurrent->Create ((LPSTR)(LPCTSTR (cstrPath)), m_pass);
  }
    
  delete (pfd);
}

Two lines of code are required to create an AudioStream object:

m_pasCurrent = new AudioStream;
m_pasCurrent->Create ((LPSTR)(LPCTSTR (cstrPath)), m_pass);

What looks like typecasting to LPCTSTR on the cstrPath parameter is actually a CString operator that extracts a pointer to a read-only C-style null-terminated string from a CString object. You might also be wondering why I didn't just create a constructor for the AudioStream class that accepts a pointer to a filename instead of adding a Create member function that takes the filename. I didn't do this because it's possible for the operation to fail, and in C++ you can't easily return an error code from a constructor.
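
If you want to guard against either step failing, the two lines can be written a bit more defensively. The following variation is mine, not code from STREAMS; FAILURE is the same status constant used elsewhere in the sample.

// Defensive version: both operator new and Create can fail.
m_pasCurrent = new AudioStream;
if (m_pasCurrent == NULL
    || m_pasCurrent->Create ((LPSTR)(LPCTSTR (cstrPath)), m_pass) == FAILURE)
{
  AfxMessageBox ("Unable to create audio stream");
  delete m_pasCurrent;
  m_pasCurrent = NULL;
}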

Controlling an AudioStream object

Once you've created an AudioStream object, you can begin playback with the Play method. The following is the command handler for the Test Play menu item:

void CMainWindow::OnTestPlay() 
{
  if (m_pasCurrent)
  {
    m_pasCurrent->Play ();
  }
}

And here's the command handler for the Test Stop menu item:

void CMainWindow::OnTestStop() 
{
  if (m_pasCurrent)
  {
    m_pasCurrent->Stop ();
  }
}

This code is so simple, I don't think it really needs any explanation. The only control methods I implemented for AudioStream objects are Play and Stop. In a real application, you'd probably want to add some more functionality.

The Timer and WaveFile Objects

Now that I've given you a look at how to use the AudioStreamServices and AudioStream objects in an application, let's dig into their implementation. I'll begin with two helper objects, Timer and WaveFile, that are used by AudioStream objects.

The Timer object

The Timer object is used to provide timer services that allow AudioStream objects to service the sound buffer periodically. Here's the declaration for the Timer class:

class Timer
{
public:
  Timer (void);
  ~Timer (void);
  BOOL Create (UINT nPeriod, UINT nRes, DWORD dwUser,
               TIMERCALLBACK pfnCallback);
protected:
  static void CALLBACK TimeProc(UINT uID, UINT uMsg, DWORD dwUser,
                                DWORD dw1, DWORD dw2);
  TIMERCALLBACK m_pfnCallback;
  DWORD m_dwUser;
  UINT m_nPeriod;
  UINT m_nRes;
  UINT m_nIDTimer;
};

The Timer object uses the multimedia timer services provided through the Win32 timeSetEvent function. These services call a user-supplied callback function at a periodic interval specified in milliseconds. The Create member does all of the work here:

BOOL Create (UINT nPeriod, UINT nRes, DWORD dwUser, TIMERCALLBACK pfnCallback);

The nPeriod and nRes parameters specify the timer period and resolution in milliseconds. The dwUser parameter specifies a DWORD that is passed back to you with each timer callback. The pfnCallback parameter specifies the callback function. Here's the source for Create:

BOOL Timer::Create (UINT nPeriod, UINT nRes, DWORD dwUser,
                    TIMERCALLBACK pfnCallback)

{
  BOOL bRtn = SUCCESS;  // assume success
  
  // Set data members
  m_nPeriod = nPeriod;
  m_nRes = nRes;
  m_dwUser = dwUser;
  m_pfnCallback = pfnCallback;

  // Create multimedia timer
  if ((m_nIDTimer = timeSetEvent (m_nPeriod, m_nRes, TimeProc, 
                                  (DWORD) this, TIME_PERIODIC)) == NULL)
  {
    bRtn = FAILURE;
  }

  return (bRtn);
}

After stuffing the four parameters into data members, Create calls timeSetEvent and passes the this pointer as the user-supplied data to the multimedia timer callback. This data is passed back to the callback to identify which Timer object is associated with the callback.

Before I lose you here, take a look at the declaration of the Timer::TimeProc member function. It must be declared as static so that it can be used as a C-style callback for the multimedia timer set with timeSetEvent. Because TimeProc is a static member function, it's not associated with a Timer object and does not have access to the this pointer. Here's the source for TimeProc:

void CALLBACK Timer::TimeProc(UINT uID, UINT uMsg, DWORD dwUser,
                              DWORD dw1, DWORD dw2)
{
  // dwUser contains ptr to Timer object
  Timer * ptimer = (Timer *) dwUser;

  // Call user-specified callback and pass back user specified data
  (ptimer->m_pfnCallback) (ptimer->m_dwUser);
}

TimeProc contains two action-packed lines of code. The first line simply casts the dwUser parameter to a pointer to a Timer object and saves it in a local variable, ptimer. The second line of code dereferences ptimer to call the user-supplied callback and pass back the user-supplied data. I could have done away with the first line of code altogether and just cast dwUser to access the data members of the associated Timer object, but I wrote it this way to better illustrate what's going on. Note that when I say "user-supplied" here, I'm talking about the user of the Timer object, which in this case is an AudioStream object.

In similar fashion, any object that uses a Timer object must supply a callback that is a static member function and supply its this pointer as the user-supplied data for the callback. For example, here's the code from AudioStream::Play that creates the Timer object:

// Kick off timer to service buffer
m_ptimer = new Timer ();
if (m_ptimer)
{
  m_ptimer->Create (m_nBufService, m_nBufService, DWORD (this), TimerCallback);
}

And here's the static member function that serves as a callback for the Timer object:

BOOL AudioStream::TimerCallback (DWORD dwUser)
{
  // dwUser contains ptr to AudioStream object
  AudioStream * pas = (AudioStream *) dwUser;

  return (pas->ServiceBuffer ());
}

All the important work is done in the AudioStream::ServiceBuffer routine. You could move everything into AudioStream::TimerCallback, but because it's static, you'd have to use the this pointer contained in dwUser to access all class members. I think using a separate nonstatic member function results in code that is easier to read.

The WaveFile object

In addition to an object to encapsulate multimedia timer services, I needed an object to represent a wave file, so I created the WaveFile class. The following is the class declaration for the WaveFile class:

class WaveFile
{
public:
  WaveFile (void);
  ~WaveFile (void);
  BOOL Open (LPSTR pszFilename);
  BOOL Cue (void);
  UINT Read (BYTE * pbDest, UINT cbSize);
  UINT GetNumBytesRemaining (void) { return (m_nDataSize - m_nBytesPlayed); }
  UINT GetAvgDataRate (void) { return (m_nAvgDataRate); }
  UINT GetDataSize (void) { return (m_nDataSize); }
  UINT GetNumBytesPlayed (void) { return (m_nBytesPlayed); }
  UINT GetDuration (void) { return (m_nDuration); }
  BYTE GetSilenceData (void);
  WAVEFORMATEX * m_pwfmt;
protected:
  HMMIO m_hmmio;
  MMRESULT m_mmr;
  MMCKINFO m_mmckiRiff;
  MMCKINFO m_mmckiFmt;
  MMCKINFO m_mmckiData;
  UINT m_nDuration;      // duration of sound in msec
  UINT m_nBlockAlign;    // wave data block alignment spec
  UINT m_nAvgDataRate;   // average wave data rate
  UINT m_nDataSize;      // size of data chunk
  UINT m_nBytesPlayed;   // offset into data chunk
};

This class was designed expressly to stream wave file data, hence there are none of the traditional file I/O functions for operations such as seeking, writing, and creating new files. The following table describes the purpose of each of the member functions in the WaveFile class:

Function Description
Open Opens a wave file.
Cue Cues a wave file for playback.
Read Reads a given number of data bytes.
GetNumBytesRemaining Returns the number of data bytes remaining to be read.
GetAvgDataRate Returns the average data rate in bytes per second.
GetDataSize Returns the total number of wave data bytes.
GetNumBytesPlayed Returns the number of data bytes that have been read.
GetDuration Gets the duration of the wave file in milliseconds.
GetSilenceData Returns a byte of data representing silence.

I chose to use the Win32 Multimedia File I/O services (MMIO) for implementation of WaveFile objects because these services take care of the basics of parsing the chunks in Resource Interchange File Format (RIFF) files. Since the point of this article is to explain streaming with DirectSound, I'm not going to explain the WaveFile code in detail. Take my word for it: The biggest challenge in writing this code was properly handling the myriad of errors that can occur when accessing files.
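
If you haven't used MMIO before, the following fragment sketches the kind of calls WaveFile::Open builds on. It's a bare-bones illustration that assumes a plain PCM file and strips out all error checking; it's not the actual code from WAVEFILE.CPP.

// Open the file, descend into the RIFF/WAVE chunk and then into the fmt
// chunk, and read the PCM format (PCMWAVEFORMAT is the 16-byte fmt chunk body).
HMMIO hmmio = mmioOpen (pszFilename, NULL, MMIO_ALLOCBUF | MMIO_READ);

MMCKINFO mmckiRiff, mmckiFmt;
mmckiRiff.fccType = mmioFOURCC ('W', 'A', 'V', 'E');
mmioDescend (hmmio, &mmckiRiff, NULL, MMIO_FINDRIFF);

mmckiFmt.ckid = mmioFOURCC ('f', 'm', 't', ' ');
mmioDescend (hmmio, &mmckiFmt, &mmckiRiff, MMIO_FINDCHUNK);

PCMWAVEFORMAT pcmwf;
mmioRead (hmmio, (HPSTR) &pcmwf, sizeof (pcmwf));
mmioAscend (hmmio, &mmckiFmt, 0);

// The data chunk is located the same way with MMIO_FINDCHUNK and a ckid of
// mmioFOURCC ('d', 'a', 't', 'a'); its cksize member gives the total size of
// the wave data, and mmioRead pulls samples out as the stream needs them.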

Silence, please!

There is one detail I do want to explain. Implementing the AudioStream class required that blocks of data representing silence be written to the sound buffer. (If you read the remainder of this article, you'll learn why.) Since the data representing silence depends on the format of the wave file, I added a GetSilenceData member function to the WaveFile class. Word size for pulse-code modulation (PCM) formats can range from one byte for 8-bit mono to four bytes for 16-bit stereo, as shown in the following table.

PCM Format Word Size Silence Data
8-bit mono 1 byte 0x80
8-bit stereo 2 bytes 0x8080
16-bit mono 2 bytes 0x0000
16-bit stereo 4 bytes 0x00000000

Rather than make the AudioStream code deal with the different word sizes for different wave file formats, I took advantage of the fact that regardless of word size, silence data for PCM formats can be represented by a single byte. Thus, the GetSilenceData function returns a BYTE. This shortcut saved me from having to write a lot of extra code.
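
The entire function reduces to a one-liner. The version below is a plausible sketch based on the table above (the code in WAVEFILE.CPP may differ slightly); m_pwfmt is the WAVEFORMATEX pointer that WaveFile::Open fills in.

// Silence for 8-bit PCM is the unsigned midpoint 0x80; for 16-bit PCM it is 0x00.
BYTE WaveFile::GetSilenceData (void)
{
  return (BYTE) ((m_pwfmt->wBitsPerSample == 8) ? 0x80 : 0x00);
}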

The AudioStreamServices Object

The DirectSound interface consists of two objects, IDirectSound and IDirectSoundBuffer. The IDirectSound object represents the DirectSound services for a single window. Services are apportioned on a per-window basis to facilitate muting a sound stream when a window loses the input focus. I created the AudioStreamServices class to wrap the IDirectSound object:

class AudioStreamServices
{
public:
  AudioStreamServices (void);
  ~AudioStreamServices (void);
  BOOL Initialize (HWND hwnd);
  LPDIRECTSOUND GetPDS (void) { return m_pds; }
protected:
  HWND m_hwnd;
  LPDIRECTSOUND m_pds;
};

As you can see, this is a pretty light class. In addition to a constructor and destructor, there are two member functions, Initialize and GetPDS. The GetPDS function returns the pointer to the IDirectSound object created by the Initialize function. The Initialize function takes a window handle and creates and initializes an IDirectSound object. Here's the code for the Initialize function:

// Initialize
BOOL AudioStreamServices::Initialize (HWND hwnd)
{
  BOOL fRtn = SUCCESS;  // assume success

  if (m_pds == NULL)
  {
    if (hwnd)
    {
      m_hwnd = hwnd;

      // Create IDirectSound object
      if (DirectSoundCreate (NULL, &m_pds, NULL) == DS_OK)
      {
        // Set cooperative level for DirectSound. Normal means our
        // sounds will be silenced when our window loses input focus.
        if (m_pds->SetCooperativeLevel (m_hwnd, DSSCL_NORMAL) == DS_OK)
        {
          // Any additional initialization goes here.
        }
        else
        {
          // Error
          DOUT ("ERROR: Unable to set cooperative level\n\r");
          fRtn = FAILURE;
        }
      }
      else
      {
        // Error
        DOUT ("ERROR: Unable to create IDirectSound object\n\r");
        fRtn = FAILURE;
      }
    }
    else
    {
      // Error, invalid hwnd
      DOUT ("ERROR: Invalid hwnd, unable to initialize services\n\r");
      fRtn = FAILURE;
    }
  }

  return (fRtn);
}

The Initialize function creates an IDirectSound object by calling the DirectSoundCreate function. The first parameter to the DirectSoundCreate call is NULL to request the default DirectSound device. The second parameter is a pointer to a location that DirectSoundCreate fills with a pointer to an IDirectSound object. The pointer returned by DirectSoundCreate provides an interface for accessing IDirectSound member functions.

After successfully creating an IDirectSound object, the Initialize code calls the SetCooperativeLevel member function specifying the DSSCL_NORMAL flag to set the normal cooperative level. This is the lowest cooperative level—other levels are available if you require more control of DirectSound's buffers. For example, in normal cooperative level, the format of audio output is always 8-bit 22kHz mono. To change to another output format, you have to set the priority cooperative level (DSSCL_PRIORITY) and call the SetFormat function.
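
Here's a sketch of what that looks like. It's not part of STREAMS, the 16-bit, 22-kHz mono format is just an example, and m_pds and m_hwnd are the same members used by AudioStreamServices above. The primary buffer must be created with the DSBCAPS_PRIMARYBUFFER flag before SetFormat can be called on it, and SetFormat succeeds only at the priority cooperative level.

// Switch to the priority cooperative level and change the output format.
LPDIRECTSOUNDBUFFER pdsbPrimary = NULL;
DSBUFFERDESC dsbd;
WAVEFORMATEX wfx;

m_pds->SetCooperativeLevel (m_hwnd, DSSCL_PRIORITY);

memset (&dsbd, 0, sizeof (dsbd));
dsbd.dwSize = sizeof (dsbd);
dsbd.dwFlags = DSBCAPS_PRIMARYBUFFER;   // the primary buffer takes no format here

if (m_pds->CreateSoundBuffer (&dsbd, &pdsbPrimary, NULL) == DS_OK)
{
  memset (&wfx, 0, sizeof (wfx));
  wfx.wFormatTag = WAVE_FORMAT_PCM;
  wfx.nChannels = 1;
  wfx.nSamplesPerSec = 22050;
  wfx.wBitsPerSample = 16;
  wfx.nBlockAlign = (wfx.nChannels * wfx.wBitsPerSample) / 8;
  wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

  pdsbPrimary->SetFormat (&wfx);        // fails at DSSCL_NORMAL
}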

The AudioStream Object

Now we're down to the good stuff. I've explained how to use AudioStreamServices and AudioStream objects in an application. I've described the Timer and WaveFile objects that are used to provide periodic timer services and read wave files. Now I'm going to explain the implementation of the AudioStream object, the object that actually streams wave files using DirectSound. Here's the AudioStream class declaration:

class AudioStream
{
public:
  AudioStream (void);
  ~AudioStream (void);
  BOOL Create (LPSTR pszFilename, AudioStreamServices * pass);
  BOOL Destroy (void);
  void Play (void);
  void Stop (void);
protected:
  void Cue (void);
  BOOL WriteWaveData (UINT cbSize);
  BOOL WriteSilence (UINT cbSize);
  DWORD GetMaxWriteSize (void);
  BOOL ServiceBuffer (void);
  static BOOL TimerCallback (DWORD dwUser);
  AudioStreamServices * m_pass;  // ptr to AudioStreamServices object
  LPDIRECTSOUNDBUFFER m_pdsb;    // sound buffer
  WaveFile * m_pwavefile;        // ptr to WaveFile object
  Timer * m_ptimer;              // ptr to Timer object
  BOOL m_fCued;                  // semaphore (stream cued)
  BOOL m_fPlaying;               // semaphore (stream playing)
  DSBUFFERDESC m_dsbd;           // sound buffer description
  LONG m_lInService;             // reentrancy semaphore
  UINT m_cbBufOffset;            // last write position
  UINT m_nBufLength;             // length of sound buffer in msec
  UINT m_cbBufSize;              // size of sound buffer in bytes
  UINT m_nBufService;            // service interval in msec
  UINT m_nDuration;              // duration of wave file
  UINT m_nTimeStarted;           // time (in system time) playback started
  UINT m_nTimeElapsed;           // elapsed time in msec since playback started
};

In addition to a standard constructor and destructor, there are four public interface methods: Create, Destroy, Play, and Stop. The purpose of these methods should be obvious from the names I've given them.

The main players here are the Create and Play methods, and a third method, ServiceBuffer, that is not part of the public interface. Create opens the wave file, creates the DirectSound sound buffer, and fills the buffer with the first block of wave data. Play starts playback of the buffer and creates a Timer object to service the buffer periodically. ServiceBuffer, called on each timer callback, keeps the buffer supplied with wave data (or silence) and stops playback once the entire sound has played.

Creating the sound buffer

Before creating a sound buffer, you must open the wave file to determine its format, average data rate, and duration. Here's the corresponding code from the Create method:

// Create a new WaveFile object
if (m_pwavefile = new WaveFile)
{
  // Open given file
  if (m_pwavefile->Open (pszFilename))
  {
    // Calculate sound buffer size in bytes
    m_cbBufSize = (m_pwavefile->GetAvgDataRate () * m_nBufLength) / 1000;
    m_cbBufSize =   (m_cbBufSize > m_pwavefile->GetDataSize ())
            ? m_pwavefile->GetDataSize ()
            : m_cbBufSize;

    // Get duration of sound (in milliseconds)
    m_nDuration = m_pwavefile->GetDuration ();
    
    . . .
  }
}

After opening the file, Create determines the required size of the sound buffer and the duration of the sound. The size of the sound buffer is calculated from the average data rate and the default buffer length in milliseconds (the m_nBufLength data member). The default buffer length is set to a constant in the AudioStream constructor. I chose to use a two-second sound buffer, but it's a good idea to experiment with your particular application. The timer interval for servicing the sound buffer should be no more than half of the buffer length; the STREAMS sample uses a 250-millisecond service interval, one-eighth the length of the sound buffer. You can adjust the buffer length and buffer service intervals in the STREAMS sample application by changing the DefBufferLength and DefBufferServiceInterval constants in the AUDIOSTREAM.CPP file:

const UINT DefBufferLength      = 2000;
const UINT DefBufferServiceInterval  = 250;

After successfully opening the wave file and calculating the required buffer size, Create creates a DirectSound sound buffer by initializing a DSBUFFERDESC structure and calling IDirectSound::CreateSoundBuffer:

// Create sound buffer
HRESULT hr;
memset (&m_dsbd, 0, sizeof (DSBUFFERDESC));
m_dsbd.dwSize = sizeof (DSBUFFERDESC);
m_dsbd.dwBufferBytes = m_cbBufSize;
m_dsbd.lpwfxFormat = m_pwavefile->m_pwfmt;
hr = (m_pass->GetPDS ())->CreateSoundBuffer (&m_dsbd, &m_pdsb, NULL);

The lpwfxFormat element of the DSBUFFERDESC structure points to a WAVEFORMATEX structure specifying the format of the wave file. Currently, DirectSound will not play compressed wave formats; the CreateSoundBuffer method will fail for any format that is not PCM. Note that no flags are specified for DSBUFFERDESC.dwFlags, so CreateSoundBuffer creates a secondary buffer, which is the proper type of buffer for streaming. (The looping behavior is requested later, when the buffer is played with the DSBPLAY_LOOPING flag.)
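
Because CreateSoundBuffer rejects non-PCM data, Create could screen the format up front rather than waiting for the call to fail. A small addition along these lines (it's not in the sample as shipped) uses the sample's DOUT debug macro and the standard WAVE_FORMAT_PCM constant:

// Streaming in this sample supports only uncompressed PCM wave files.
if (m_pwavefile->m_pwfmt->wFormatTag != WAVE_FORMAT_PCM)
{
  DOUT ("ERROR: only PCM wave files can be streamed\n\r");
  return (FAILURE);
}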

Filling the sound buffer with wave data

After successfully creating the sound buffer, Create calls the AudioStream::Cue method to prepare the stream for playback. Cue resets the buffer pointers and the file pointer and then calls AudioStream::WriteWaveData to fill the buffer with data from the wave file. The following is the source for WriteWaveData:

BOOL AudioStream::WriteWaveData (UINT size)
{
  HRESULT hr;
  LPBYTE lpbuf1 = NULL;
  LPBYTE lpbuf2 = NULL;
  DWORD dwsize1 = 0;
  DWORD dwsize2 = 0;
  DWORD dwbyteswritten1 = 0;
  DWORD dwbyteswritten2 = 0;
  BOOL fRtn = SUCCESS;

  // Lock the sound buffer
  hr = m_pdsb->Lock (m_cbBufOffset, size, &lpbuf1, &dwsize1, &lpbuf2, &dwsize2,
                     0);
  if (hr == DS_OK)
  {
    // Write data to sound buffer. Because the sound buffer is circular,
    // we may have to do two write operations if locked portion of buffer
    // wraps around to start of buffer.
    ASSERT (lpbuf1);
    if ((dwbyteswritten1 = m_pwavefile->Read (lpbuf1, dwsize1)) == dwsize1)
    {
      // Second write required?
      if (lpbuf2)
      {
        if ((dwbyteswritten2 = m_pwavefile->Read (lpbuf2, dwsize2)) == dwsize2)
        {
          // Both write operations successful!
        }
        else
        {
          // Error, didn't read wave data completely
          fRtn = FAILURE;
        }
      }
    }
    else
    {
      // Error, didn't read wave data completely
      fRtn = FAILURE;
    }

    // Update our buffer offset and unlock sound buffer
    m_cbBufOffset = (m_cbBufOffset + dwbyteswritten1 + dwbyteswritten2)
                     % m_cbBufSize;
    m_pdsb->Unlock (lpbuf1, dwbyteswritten1, lpbuf2, dwbyteswritten2);
  }
  else
  {
    // Error locking sound buffer
    fRtn = FAILURE;
  }

  return (fRtn);
}

WriteWaveData reads a given number of data bytes from the wave file and writes the data to the sound buffer. To write data to a DirectSound sound buffer, you must first call the IDirectSoundBuffer::Lock method to get write pointers. No, that's not a typo: Lock returns two pointers. Usually, the second pointer is returned as NULL, but if the write operation spans the end of the buffer, the second pointer will be a valid address (the beginning of the buffer). That's the nature of circular buffers. No problem, though; the resulting code is still pretty simple and straightforward.

Beginning playback

The AudioStream::Play method begins playback by calling the IDirectSoundBuffer::Play method and creating a timer to service the sound buffer:

// Begin DirectSound playback
HRESULT hr = m_pdsb->Play (0, 0, DSBPLAY_LOOPING);
if (hr == DS_OK)
{
  // Save current time (for elapsed time calculation)
  m_nTimeStarted = timeGetTime ();
  
  // Kick off timer to service buffer
  m_ptimer = new Timer ();
  if (m_ptimer)
  {
    m_ptimer->Create (m_nBufService, m_nBufService, DWORD (this),
                      TimerCallback);
  }

  . . . 
}

Note that the call to IDirectSoundBuffer::Play includes the DSBPLAY_LOOPING flag to specify that playback continue until explicitly stopped. Play also sets the m_nTimeStarted data member to the current system time (in milliseconds) to allow calculation of the time that has elapsed since playback was started.

Servicing the Sound Buffer

The Timer object created by AudioStream::Play periodically calls the ServiceBuffer routine, which maintains the elapsed-time count, determines how much free space is available in the sound buffer, fills that space with wave data (or with silence once all the wave data has been read), and stops playback when enough time has elapsed to play the entire sound.

The following is the complete source for ServiceBuffer:

LONG lInService = FALSE;  // reentrancy semaphore

BOOL AudioStream::ServiceBuffer (void)
{
  BOOL fRtn = TRUE;

  // Check for reentrance
  if (InterlockedExchange (&lInService, TRUE) == FALSE)
  { // Not reentered, proceed normally
    // Maintain elapsed time count
    m_nTimeElapsed = timeGetTime () - m_nTimeStarted;

    // Stop if all of sound has played
    if (m_nTimeElapsed < m_nDuration)
    {
      // All of sound not played yet, send more data to buffer
      DWORD dwFreeSpace = GetMaxWriteSize ();

      // Determine free space in sound buffer
      if (dwFreeSpace)
      {
        // See how much wave data remains to be sent to buffer
        DWORD dwDataRemaining = m_pwavefile->GetNumBytesRemaining ();
        if (dwDataRemaining == 0)
        { // All wave data has been sent to buffer
          // Fill free space with silence
          if (WriteSilence (dwFreeSpace) == FAILURE)
          { // Error writing silence data
            fRtn = FALSE;
          }
        }
        else if (dwDataRemaining >= dwFreeSpace)
        { // Enough wave data remains to fill free space in buffer
          // Fill free space in buffer with wave data
          if (WriteWaveData (dwFreeSpace) == FAILURE)
          { // Error writing wave data
            fRtn = FALSE;
          }
        }
        else
        { // Some wave data remains, but not enough to fill free space
          // Write wave data, fill remainder of free space with silence
          if (WriteWaveData (dwDataRemaining) == SUCCESS)
          {
            if (WriteSilence (dwFreeSpace - dwDataRemaining) == FAILURE)
            { // Error writing silence data
              fRtn = FALSE;
            }
          }
          else
          { // Error writing wave data
            fRtn = FALSE;
          }
        }
      }
      else
      { // No free space in buffer for some reason
        fRtn = FALSE;
      }
    }
    else
    { // All of sound has played, stop playback
      Stop ();
    }
    // Reset reentrancy semaphore
    InterlockedExchange (&lInService, FALSE);
  }
  else
  { // Service routine reentered. Do nothing, just return
    fRtn = FALSE;
  }
  return (fRtn);
}

I feel that the code pretty much speaks for itself here (that's why I included all of this rather lengthy routine). There are several things I want to explain, however. The first is the call to InterlockedExchange. This is a nifty Win32 synchronization mechanism that I'm using to detect whether the ServiceBuffer routine is reentered. It's possible that you could still be servicing the buffer when another timer interrupt comes along. If ServiceBuffer is reentered, it simply returns immediately without doing anything.

I also want to explain why you need to write silence data to the sound buffer. DirectSound has no concept of when playback of a wave file is complete—it just happily cycles through the sound buffer playing whatever data is there until it's told to stop. The ServiceBuffer routine keeps track of how much time has elapsed since playback was started and stops playback as soon as enough time has elapsed to play the entire wave file. Since you can't stop playback at the exact millisecond that the last wave data byte is played, you have to follow the wave data with data representing silence. If you don't do this, you will get some random blip of sound at the end of a wave file.
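
I haven't shown WriteSilence, but it follows the same Lock/fill/Unlock pattern as WriteWaveData. Here's a condensed sketch of how it can be written (the code in AUDIOSTREAM.CPP may differ in its details):

BOOL AudioStream::WriteSilence (UINT size)
{
  HRESULT hr;
  LPBYTE lpbuf1 = NULL;
  LPBYTE lpbuf2 = NULL;
  DWORD dwsize1 = 0;
  DWORD dwsize2 = 0;
  BOOL fRtn = SUCCESS;

  // Lock the requested region of the sound buffer
  hr = m_pdsb->Lock (m_cbBufOffset, size, &lpbuf1, &dwsize1, &lpbuf2, &dwsize2,
                     0);
  if (hr == DS_OK)
  {
    // Fill the locked region(s) with the silence value for this wave format.
    // The second pointer is non-NULL only when the region wraps around.
    BYTE bSilence = m_pwavefile->GetSilenceData ();
    memset (lpbuf1, bSilence, dwsize1);
    if (lpbuf2)
    {
      memset (lpbuf2, bSilence, dwsize2);
    }

    // Advance our write cursor and unlock the sound buffer
    m_cbBufOffset = (m_cbBufOffset + dwsize1 + dwsize2) % m_cbBufSize;
    m_pdsb->Unlock (lpbuf1, dwsize1, lpbuf2, dwsize2);
  }
  else
  {
    // Error locking sound buffer
    fRtn = FAILURE;
  }

  return (fRtn);
}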

Managing the read-and-write cursors

Two offsets are required to manage data in a circular buffer. Traditionally these offsets are called the head and the tail of the buffer. I can never remember which is the head and which is the tail, so I like to call these two offsets the "read cursor" and the "write cursor." In this case, the read cursor identifies the location in the buffer where DirectSound is reading wave data and the write cursor identifies the location where we need to write the next block of wave data.

If you take a look at the IDirectSoundBuffer::GetCurrentPosition method, you'll see that it returns a read cursor and a write cursor. Looks easy enough. At least that's what I thought, but that's not exactly correct. It took me several days of hair-pulling to figure out that the write cursor returned by GetCurrentPosition was not the write cursor I needed to manage a sound buffer. Don't you hate it when things don't work like you want them to?

To manage a sound buffer with DirectSound, you must maintain your own write cursor. In the AudioStream class I represent the write cursor with the m_cbBufOffset data member. Each time you write wave data to the sound buffer, you must increment m_cbBufOffset and check to see if it has wrapped around to the beginning of the buffer. It's not difficult code to write, but it certainly took me a while to discover that I couldn't use the write cursor provided by DirectSound! The following code is a helper method called by ServiceBuffer to determine how much of the sound buffer has already been played (in other words, how much data can be written to the sound buffer):

DWORD AudioStream::GetMaxWriteSize (void)
{
  DWORD dwWriteCursor, dwPlayCursor, dwMaxSize;

  // Get current play position
  if (m_pdsb->GetCurrentPosition (&dwPlayCursor, &dwWriteCursor) == DS_OK)
  {
    if (m_cbBufOffset <= dwPlayCursor)
    {
      // Our write position trails play cursor
      dwMaxSize = dwPlayCursor - m_cbBufOffset;
    }

    else // (m_cbBufOffset > dwPlayCursor)
    {
      // Play cursor has wrapped
      dwMaxSize = m_cbBufSize - m_cbBufOffset + dwPlayCursor;
    }
  }
  else
  {
    // GetCurrentPosition call failed
    ASSERT (0);
    dwMaxSize = 0;
  }
  return (dwMaxSize);
}

GetMaxWriteSize provides a good illustration of how to manage the read and write cursors. You may also want to look at the WriteWaveData method presented earlier and see how m_cbBufOffset is used with the IDirectSoundBuffer::Lock method to get an actual write pointer in the sound buffer.

Now, I'll bet you're wondering what the deal is with the write cursor maintained by DirectSound. No, it's not broken—that's the way it was designed to operate! DirectSound's write cursor specifies the position in the buffer where it is safe to write data. During playback, DirectSound won't allow you to write to the section of the sound buffer that begins with its play cursor and ends with its write cursor. Typically, this is about 15 milliseconds worth of data. DirectSound does not change its write cursor when you write data to a sound buffer—the write cursor always tracks the play cursor and leads it by about 15 milliseconds during playback.

Quick Fix: A Summary of Streaming with DirectSound

The following list summarizes what you need to know about streaming wave files with DirectSound: