Recording and Playing Waveform Audio

Nigel Thompson
Microsoft Developer Network Technology Group

Created: July 27, 1994
Revised: October 30, 1995

The sample code was rebuilt in October 1995 using Visual C++™ version 4.0. The resulting code was tested on the following platforms:

All platforms were running their display drivers in 256-color mode (8 bits per pixel) unless otherwise noted in this article. Also, the AnimTest sample application was added in this revision.

Abstract

This technical article describes a way to record and play waveform audio in a Win32®-based application built using Microsoft® Visual C++™ version 1.1 for Windows NT™. Two sample applications accompany this technical article: SpelEdit and Speller. SpelEdit demonstrates recording and allows you to construct a spelling list consisting of a number of words and their sounds. Speller demonstrates playback and uses the spelling list created by the SpelEdit application. Speller plays word sounds from the list and requires the user to type them in correctly before moving on to the next word. Help for the user is provided by playing portions of speech that are included as WAVE resources in the application. This particular article was inspired by my daughter, Nell, whose life is currently besieged by spelling lists.

The code in the sample applications themselves is quite simple. The sample applications make extensive use of a library of Microsoft Foundation Class Library (MFC) classes that I originally developed for the samples in my book Animation Techniques for Win32: A C++ Programmer's Guide to DIBs, Palettes, and Sprites (Development Library, Books and Periodicals). The source code for the library is supplied in the Animate sample directory. This library of code changes as I improve the classes used with my sample applications. The version of the Animate library released with this edition of the Library is somewhat altered from the version that comes with the book. I plan to use this library in several samples in the future. The latest version of the library classes may not always be exactly compatible with earlier versions, so if you plan to upgrade, beware. The library code is documented in the ANIMATE.HLP Help file, which is produced automatically from the library source code using the AutoDuck tool.

The AutoDuck tool is supplied in the Developer Library in the Unsupported Tools and Utilities section. AutoDuck was developed internally in the Microsoft Multimedia group as a means to document the system code as it was being developed. The tool extracts tagged comments from the source code and creates either an .RTF file or a Windows Help file. The Help file that accompanies AutoDuck explains how to use it and includes a reference section describing the text tags that it supports.

Introduction

There have been a number of requests recently for more articles about how to record and play waveform audio. This article is in response to those requests. There are several ways to play waveform audio in Microsoft® Windows®: using sndPlaySound, using the Media Control Interface (MCI) string interface, using the MCI message interface, and using the low-level audio services. This article shows how to play waveform audio using the low-level services. This is by no means the simplest way, but don't be put off—I've encapsulated the bulk of the code in a number of C++ classes contained in the Animate library, so if you don't need to know how it works, you can just use the Animate library classes and be happy. Recording of waveform audio can be done through MCI or the low-level audio services. The SpelEdit sample uses the low-level services. Using the low-level services provides the most control over what is going on during recording and playback, but does require a bit more support code. If you want to know the details or want to change the functionality in some way, you can modify the library code to suit your purposes.
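
For contrast, here's about the simplest of those routes: a single call to sndPlaySound. This little sketch isn't part of the samples; the file name is just an example, and SND_NODEFAULT stops the system from playing the default sound if the file can't be found.

#include <windows.h>
#include <mmsystem.h>   // Link with the multimedia library (WINMM.LIB).

void PlayTheEasyWay()
{
    // Play the WAVE file asynchronously and return immediately.
    sndPlaySound("chimes.wav", SND_ASYNC | SND_NODEFAULT);
}

That's fine for fire-and-forget sounds, but it gives you very little control over what happens once the sound starts, which is one reason the samples use the low-level services instead.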

Building the Samples

The SpelEdit and Speller samples both link to the Animate library. To be a little more precise, a debug build of one of the samples needs to be linked with ANIMATED.LIB and a retail build needs to be linked with ANIMATER.LIB. The paths for the libraries in the sample makefiles might need to be altered according to where you put the libraries and headers on your system.

The Animate library has a single header file—ANIMATE.H, which you can most simply include in your application by adding it to your STDAFX.H file.

Terminology

This article deals with sampled audio—that is, pieces of audio such as speech or music that have been digitally recorded. These audio samples are often referred to by a variety of names, such as sampled audio, waveform audio, digitized audio, and so on. I tend to refer to them as waveforms.

When one of these waveforms is stored in a disk file, I generally refer to it as a WAVE file. In practice, this is a file on the system with a .WAV extension. The extension is not mandatory, but is obviously convenient and is used by some system components (such as MCI) to automatically select a suitable piece of code for playing it. You may also see these files referred to as .WAV files because of the common use of the .WAV extension.

The SpelEdit and Speller Samples

In creating sample applications, I usually try to keep the code down to an absolute minimum so that the features of the sample don't obscure the key points I'm trying to show. SpelEdit and Speller include rather more code than that required to simply record and play back a chunk of audio, so a quick guided tour of what the samples do might help to show which bits of the code are relevant to waveform recording and playback and which bits are simply part of the user interface.

The SpelEdit Sample

The SpelEdit sample is used to record the sound of words and add them to a spelling list. Figure 1 shows a screen shot of the application with only a few words in the list.

Figure 1. A screen shot of the SpelEdit application

Words can be added to the list, played, or deleted, using either the Edit menu or the toolbar buttons. New words are added to the list by supplying the name and sound of the word in the Add Word dialog box, shown in Figure 2.

Figure 2. The Add Word dialog box

The user types the word in the Word edit box, and its sound is added either by recording it directly or by supplying a WAVE file containing the sound. Clicking the Sound File. . . button brings up a standard File Open dialog box that allows the user to select a WAVE file. Note that if the sound is supplied by a WAVE file, the file itself is not required at playback time. The audio in the file is extracted from it and saved in the word list.

A better way to supply the sound of the word is to record it directly using a microphone connected to your sound card. Clicking the Record. . . button displays the Record dialog box, shown in Figure 3.

Figure 3. The Record dialog box

The volume units (VU) meter in the dialog box shows the input signal level both while idle (in the stopped mode) and while recording, so setting a good level is trivial—simply speak into the microphone and watch the meter. When the level is set correctly, click the Record button to begin recording. When finished, click the Stop button. The recorded sound can then be played by clicking the Play button. If it doesn't sound right, you can record over it by clicking the Record button again.

The VU meter has two needles. The white needle shows the average signal level and the red needle shows the peaks. The red area of the meter shows the level at which the input signal starts to be clipped off, so ideally you record with the peak signal level just below the red part of the scale.

If your PC has more than one sound card, you can select which one to record from in the Source list box. You can also choose the format that you would like the waveform to be recorded in from the Format list box. Note that the list of formats in the Format list box is taken from the device driver, so it only shows formats the audio card supports.

Once a sound has been recorded, click the OK button to return to the Add Word dialog box. Clicking the OK button in the Add Word dialog box adds the word and its sound to the current spelling list. When you're done adding words to the list, save it using the File menu or toolbar button.

The Speller Sample

The Speller sample uses spelling lists created by SpelEdit. Figure 4 shows a screen shot of Speller.

Figure 4. A screen shot of the Speller sample

The application plays the word sounds and the user enters them into the edit box. Great fun, unless you are seven (as my daughter Nell is), in which case it's hard work.

Playing and Recording Waveform Audio Using the Low-Level Audio Services

Playing waveform audio with the low-level services requires the application to open an output device and send it a series of one or more blocks of waveform data. The output device driver sends a notification message to the application each time a block has finished playing.
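
In raw API terms, that flow looks roughly like the sketch below. This is not the Animate library code, just a heavily simplified illustration that assumes an 8-bit, mono, 11.025-kHz PCM buffer and omits all error handling and cleanup.

#include <windows.h>
#include <mmsystem.h>   // Link with WINMM.LIB.
#include <string.h>

void SketchPlayOneBlock(HWND hwndNotify, LPSTR pData, DWORD cbData)
{
    // Describe the data we are about to play.
    WAVEFORMATEX wfx;
    memset(&wfx, 0, sizeof(wfx));
    wfx.wFormatTag      = WAVE_FORMAT_PCM;
    wfx.nChannels       = 1;
    wfx.nSamplesPerSec  = 11025;
    wfx.wBitsPerSample  = 8;
    wfx.nBlockAlign     = 1;        // nChannels * wBitsPerSample / 8
    wfx.nAvgBytesPerSec = 11025;    // nSamplesPerSec * nBlockAlign

    // Open the default output device and ask for MM_WOM_DONE messages to
    // be posted to hwndNotify each time a block finishes playing.
    HWAVEOUT hwo;
    waveOutOpen(&hwo, WAVE_MAPPER, &wfx, (DWORD)hwndNotify, 0, CALLBACK_WINDOW);

    // The header and data must stay valid until MM_WOM_DONE arrives, so the
    // header lives on the heap. A real application unprepares and frees it,
    // and closes the device, in its MM_WOM_DONE handler.
    WAVEHDR* phdr = new WAVEHDR;
    memset(phdr, 0, sizeof(WAVEHDR));
    phdr->lpData = pData;
    phdr->dwBufferLength = cbData;
    waveOutPrepareHeader(hwo, phdr, sizeof(WAVEHDR));
    waveOutWrite(hwo, phdr, sizeof(WAVEHDR));
}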

In order to record waveform audio, an application must supply a series of data buffers to the wave device driver. The device driver fills the buffers as data becomes available and, as each buffer fills up, posts a message telling the application that it may now process that buffer.
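
The recording half has the same shape: the application opens an input device, hands it one or more empty buffers, and starts recording. Again, this is only a simplified illustration with no error handling, not the library code.

#include <windows.h>
#include <mmsystem.h>   // Link with WINMM.LIB.
#include <string.h>

void SketchRecordOneBlock(HWND hwndNotify, LPSTR pBuffer, DWORD cbBuffer,
                          WAVEFORMATEX* pwfx)
{
    // Open the default input device; MM_WIM_DATA is posted to hwndNotify
    // each time a buffer has been filled.
    HWAVEIN hwi;
    waveInOpen(&hwi, WAVE_MAPPER, pwfx, (DWORD)hwndNotify, 0, CALLBACK_WINDOW);

    // As with playback, the header and buffer must stay valid while the
    // device driver is using them.
    WAVEHDR* phdr = new WAVEHDR;
    memset(phdr, 0, sizeof(WAVEHDR));
    phdr->lpData = pBuffer;
    phdr->dwBufferLength = cbBuffer;
    waveInPrepareHeader(hwi, phdr, sizeof(WAVEHDR));
    waveInAddBuffer(hwi, phdr, sizeof(WAVEHDR));
    waveInStart(hwi);

    // In the MM_WIM_DATA handler the application processes the data and
    // either adds another buffer or calls waveInReset and waveInClose.
}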

That's a rather over-simplified description of the process, but it will do for now. Most applications (such as the SpelEdit sample) really don't want to have to deal with audio data at the buffer level or deal directly with the audio device drivers either, so I encapsulated most of the code to record and play waveform audio into a number of C++ classes. Let's look next at a description of those classes. Later we can look at how they are used to implement the features of the SpelEdit and Speller samples.

The Waveform Audio Classes

The Animate library contains a number of classes that together handle waveform audio. The CWave class encapsulates a piece of waveform audio; its member functions include Play, Record, Stop, and so on. The CWaveInDevice class encapsulates the functionality of a waveform audio input device and deals with creating buffers and handling messages from the device driver. The CWaveOutDevice encapsulates the functionality of a waveform audio output device and deals with sending the data from a CWave object to the device driver and handling the notification messages from the driver. The CWave class is supported by the CWaveBlock and CWaveBlockList classes, which together encapsulate the actual waveform data, and CWaveNotifyObj, which is used to notify an application of events occurring in the CWave object, such as the end of playback or the arrival of a new block of recorded data.

Note   If you are familiar with the earlier version of the CWave class I developed, you will notice that the class now supports multiple data blocks instead of only one. This change was required in order to support recording.

The CWave Class

The CWave class provides a simple encapsulation of everything required for the recording and playback of waveform audio. Figure 5 shows the architecture of a CWave object.

Figure 5. The CWave class architecture

The playback of waveform audio is usually done asynchronously to the execution of the application's code, which can present some management problems for the application. Consider an application that starts the playback of a waveform that might play for 2 minutes. Just after the waveform starts playing, the user changes the state of the application in some way. It's quite possible to lose track of which waveforms are playing if we aren't careful about how we write the code. The most annoying case occurs when we simply want to load a waveform and play it. We don't want to hang around while it plays, but we must be sure to free the memory used by the waveform when it's finished.

The CWave class provides two ways of handling this problem. First, you can use a notification object in your application derived from CWaveNotifyObj, which will be called when playback terminates. The code in your notification object then deletes the CWave object. Second, you can create the CWave object with the auto-destruct option and have it delete itself when it is no longer in use.

We'll look at using both notification objects and the auto-destruct feature when we look at the Record dialog box code.

Recording of waveform audio in a CWave object is actually done by a CWaveInDevice object, and playback is done via a CWaveOutDevice object.

The CWaveOutDevice Class

The CWaveOutDevice class provides an encapsulation of the functionality of a waveform output device driver. It handles the notification messages the driver sends out, so the application need not deal with them directly. The CWaveOutDevice class also deals with actually playing the blocks of data in a CWave object.

Because most systems have only one sound card and usually only one waveform output device, the CWaveOutDevice class defines a global object: theDefaultWaveOutDevice, which can be used to play the common WAVE formats (8-bit, mono, 11.025 kHz; and 8-bit, mono, 22.05 kHz) plus any other format the card in the machine happens to support. The CWave::Play function can be called with the pWaveOutDevice parameter either set to NULL or omitted altogether. This will cause the waveform to be played on the default output device (assuming the device supports the format of the waveform). For example, this code:

   m_MyWave.Play();

will play the waveform on the default device. The output device is open only as long as is required to play a waveform. Once playing is finished, the device is closed. This allows the output device to be used by other applications when we aren't using it in our application.

The CWaveInDevice Class

The CWaveInDevice class provides an encapsulation of the functionality of a waveform input device driver. It handles the notification messages the driver sends out, so the application need not deal with them directly. The CWaveInDevice class also deals with actually creating and recording the blocks of data that become attached to a CWave object. Recording to a CWave object continues until the object's Stop member function is called or either CWaveInDevice::Reset or CWaveInDevice::Close is called.

The CWaveBlock and CWaveBlockList Classes

The CWaveBlock class is used to encapsulate a single chunk of waveform data. The CWaveBlockList class keeps track of the list of CWaveBlock objects attached to a single CWave object.

The CWaveNotifyObj Class

The CWaveNotifyObj class is an abstract base class from which you can derive your own notification objects. You'll see how this is used when we look at the Record dialog box.

Recording: SpelEdit's Record Dialog Box

Now let's look at the code that manages the Record dialog box shown in Figure 3. This dialog box records data to a CWave object under user control. It allows the user to play back the waveform and record over it if required. The VU meter in the dialog box shows the current input signal level all the time except while playing back a sample that has just been recorded. So you can see the input signal level before recording, which makes it easy to set the initial recording level, and while actually recording, which gives you some sense that the recording is taking place.

Let's begin by seeing how the dialog box is used from the application. Here's the code from SpelEdit's Add Word dialog box, which opens the Record dialog box (from ADDWDLG.CPP):

void CAddWordDlg::OnClickedRecord()
{
    CRecordDlg dlg;
    if (dlg.DoModal() == IDOK) {
        if (m_pWave) delete m_pWave;
        ASSERT(dlg.m_pWave);
        m_pWave = dlg.m_pWave;
    }
    ValidateButtons();
}

A CRecordDlg object is created and the dialog box is shown by calling its DoModal function. If the user records some audio and clicks the OK button, the waveform currently associated with the Add Word dialog box is deleted and replaced with the newly recorded one. As you can see, this is quite simple to use—very much like using one of the common dialog box functions to open a file.

Please note that I originally wanted to make CRecordDlg a part of the Animate library, but in order to do this, I would have had either to make the library (or part of it) into a dynamic-link library (DLL) or otherwise to find a way to include the template for the dialog box in the library. I really didn't want to provide a DLL, inasmuch as part of the goal for the Animate library was to aid the construction of stand-alone animation applications that didn't need to be shipped with a boatload of DLLs. The problem with including the template in the library as an includable text file is that adding this template to an existing MFC-based application is quite messy. I found that when I tried to do this, I had to make several changes to avoid resource ID conflicts and fiddle around with the classes a lot before ClassWizard would work with them. So I decided simply to give you an application with the dialog box sources included in it. That way you can cut and paste the template and code if you want to, or you can do your own thing and create a dialog box of your own design.

Now let's look at the code that manages the Record dialog box itself. There's quite a lot of code involved in this (which is why I wanted to add it to the library), so I want to discuss only the important points. I won't describe some of the helper functions whose implementation is trivial. Let's begin by looking at the OnInitDialog function:

BOOL CRecordDlg::OnInitDialog()
{
    CDialog::OnInitDialog();
    
    // Wave must not be provided by caller.
    ASSERT(m_pWave == NULL);
    CRect rcVU;
    m_wndFrame.GetWindowRect(&rcVU);
    ScreenToClient(&rcVU);
    m_VU.Create("VU",
                WS_CHILD | WS_VISIBLE,
                rcVU,
                this,
                1);
    m_VU.SetValue(0, 0);

    // Fill the input device list box.
    FillDeviceList();
    // Fill the format list box.
    FillFormatList();
    // Get the VU meter going.
    SetMode(SAMPLING);
    return TRUE;  // return TRUE  unless you set the focus to a control
}

A test is made to ensure that a CWave object has not been provided by the caller (the dialog box creates its own CWave object to return the recording in). Then comes a rather hacky looking piece of code that creates the VU meter. Because App Studio doesn't know how to include one of my VU meters in a dialog box, I simply chose to design the dialog box with a static control where I wanted the VU meter window to be and then placed the VU meter window over the control at run time. There are other ways to do this, but this is easy and works. The CVUMeter class is part of the Animate library, and since we have enough to talk about just describing waveform audio, I'll leave investigating how the CVUMeter class works to you.

Once the VU meter has been created, the device list is filled with all the available input devices, and the format list is filled with all the formats the current (in this case, the first) input device supports. Finally, the sampling of the input level is started, so the VU meter will show the signal level prior to recording.

Let's see now how the input device list is filled out:

void CRecordDlg::FillDeviceList()
{
    m_iNumDevs = waveInGetNumDevs();
    if (m_iNumDevs == 0) {
        AfxMessageBox("There are no suitable input devices");
        EndDialog(IDCANCEL);
        return;
    }
    // Allocate memory for the device list.
    if (m_pDevCaps) delete [] m_pDevCaps;   // Array delete: allocated with new[].
    m_pDevCaps = new WAVEINCAPS[m_iNumDevs];
    m_cbSource.ResetContent();
    for (int i=0; i<m_iNumDevs; i++) {
        waveInGetDevCaps(i,
                         &m_pDevCaps[i],
                         sizeof(WAVEINCAPS));
        // Save the device ID in the manufacturer field.
        m_pDevCaps[i].wMid = i;
        m_cbSource.AddString((LPCSTR)&m_pDevCaps[i]);
    }
    m_cbSource.SetCurSel(0);
}

Note here that I'm not using the CWaveInDevice class but rather the low-level audio services themselves. I could have added some of this functionality to the CWaveInDevice class, but it would have been no simpler to use. A call to waveInGetNumDevs obtains the number of available input devices, and a subsequent call to waveInGetDevCaps fetches the capabilities of each device. A pointer to each device's WAVEINCAPS structure is added to the device list. The current device is set to the first one in the list. Details of the WAVEINCAPS structure can be found in the Microsoft Windows version 3.1 Software Development Kit (SDK) Multimedia Programmer's Reference. Note that the list box is of the owner-drawn type, so the drawing code takes the pointer to the WAVEINCAPS structure, extracts the device name, and draws it in the list box window. Here's how the format list is built:

void CRecordDlg::FillFormatList()
{
    m_cbFormat.ResetContent();
    int iSel = m_cbSource.GetCurSel();
    if(iSel == CB_ERR) return;
    WAVEINCAPS* pCaps = (WAVEINCAPS*) m_cbSource.GetItemData(iSel);
    ASSERT(pCaps);
    DWORD dwMask = 0x00000001;
    for (int i=0; i<12; i++) {
        if (pCaps->dwFormats & dwMask) {
            m_cbFormat.AddString((LPCSTR) dwMask);
        }
        dwMask = dwMask << 1;
    }
    m_cbFormat.SetCurSel(0);
}

This list box is also owner-drawn, and in this case, a bit vector is added to the list for each format the device supports. The drawing code for the list box converts the bit vector to a suitable string of text. Figure 6 shows the format list box in its dropped-down state.

Figure 6. An example of the Format list box
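
I won't reproduce the drawing code here, but the conversion it performs amounts to something like the hypothetical helper below, which maps each WAVE_FORMAT_* bit from MMSYSTEM.H to a description (the actual strings in the sample may differ).

static LPCSTR FormatToString(DWORD dwFormat)
{
    switch (dwFormat) {
    case WAVE_FORMAT_1M08: return "11.025 kHz, Mono, 8-bit";
    case WAVE_FORMAT_1S08: return "11.025 kHz, Stereo, 8-bit";
    case WAVE_FORMAT_1M16: return "11.025 kHz, Mono, 16-bit";
    case WAVE_FORMAT_1S16: return "11.025 kHz, Stereo, 16-bit";
    case WAVE_FORMAT_2M08: return "22.05 kHz, Mono, 8-bit";
    case WAVE_FORMAT_2S08: return "22.05 kHz, Stereo, 8-bit";
    case WAVE_FORMAT_2M16: return "22.05 kHz, Mono, 16-bit";
    case WAVE_FORMAT_2S16: return "22.05 kHz, Stereo, 16-bit";
    case WAVE_FORMAT_4M08: return "44.1 kHz, Mono, 8-bit";
    case WAVE_FORMAT_4S08: return "44.1 kHz, Stereo, 8-bit";
    case WAVE_FORMAT_4M16: return "44.1 kHz, Mono, 16-bit";
    case WAVE_FORMAT_4S16: return "44.1 kHz, Stereo, 16-bit";
    }
    return "Unknown format";
}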

Let's see now how the VU meter is used to monitor the input signal level prior to recording. In order to do this, we must ask the input device to record small blocks of data continuously. As each block is filled up, we can examine its contents to determine the peak value and set the VU meter accordingly. Let's look at how the recording process is started when the mode is set to SAMPLING or RECORDING:

// Start up in the new mode.
    switch (m) {
    case SAMPLING:
    case RECORDING:
        {
        // Get the selected input device.
        int iSel = m_cbSource.GetCurSel();
        if(iSel == CB_ERR) return;
        WAVEINCAPS* pCaps = (WAVEINCAPS*) m_cbSource.GetItemData(iSel);
        ASSERT(pCaps);
        // Get the device ID we saved in the manufacturer's ID slot. 
        UINT uiID = pCaps->wMid;
        // Get the selected format.
        iSel = m_cbFormat.GetCurSel();
        if(iSel == CB_ERR) return;
        DWORD dwFormat = m_cbFormat.GetItemData(iSel);
        ASSERT(dwFormat);
        // Open the device.
        PCMWAVEFORMAT fmt;
        BuildFormat(fmt, dwFormat);
        if (!m_InDev.Open(uiID, &fmt)) return;
        if (m == SAMPLING) {
            m_SampleWave.DeleteAll();
            m_SampleWave.Create(&fmt);
            m_SampleWave.Record(&m_InDev,
                                1024,  
                                &m_NotifyObj);
        } else if (m == RECORDING) {
            if (!m_pWave) m_pWave = new CWave;
            ASSERT(m_pWave);
            m_pWave->Create(&fmt);
            m_pWave->Record(&m_InDev,
                            4096,  
                            &m_NotifyObj);
        }
        } break; 

The currently selected input device is opened with the currently selected format. If the mode has been set to SAMPLING, recording is started into a special CWave object (m_SampleWave) with a small block size. If the mode has been set to RECORDING, recording is started to a new CWave object created to match the requested format and using a larger block size. Using a small block size while sampling means we will get samples to update the VU meter more often. Using a larger block for the actual recording avoids the overhead of managing a large number of small wave data blocks in the final CWave object.
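
The BuildFormat helper is one of the trivial functions I'm not walking through, but for completeness, here's a sketch of the job it has to do: translate the selected WAVE_FORMAT_* bit into a filled-out PCMWAVEFORMAT. This is an illustration of the idea rather than the exact code in SPELEDIT.

static void BuildFormat(PCMWAVEFORMAT& fmt, DWORD dwFormat)
{
    fmt.wf.wFormatTag = WAVE_FORMAT_PCM;

    // Sample rate: 1x formats are 11.025 kHz, 2x are 22.05 kHz, 4x are 44.1 kHz.
    if (dwFormat & (WAVE_FORMAT_4M08 | WAVE_FORMAT_4S08 |
                    WAVE_FORMAT_4M16 | WAVE_FORMAT_4S16)) {
        fmt.wf.nSamplesPerSec = 44100;
    } else if (dwFormat & (WAVE_FORMAT_2M08 | WAVE_FORMAT_2S08 |
                           WAVE_FORMAT_2M16 | WAVE_FORMAT_2S16)) {
        fmt.wf.nSamplesPerSec = 22050;
    } else {
        fmt.wf.nSamplesPerSec = 11025;
    }

    // Channels and sample size.
    fmt.wf.nChannels = (WORD)((dwFormat & (WAVE_FORMAT_1S08 | WAVE_FORMAT_1S16 |
                                           WAVE_FORMAT_2S08 | WAVE_FORMAT_2S16 |
                                           WAVE_FORMAT_4S08 | WAVE_FORMAT_4S16))
                              ? 2 : 1);
    fmt.wBitsPerSample = (WORD)((dwFormat & (WAVE_FORMAT_1M16 | WAVE_FORMAT_1S16 |
                                             WAVE_FORMAT_2M16 | WAVE_FORMAT_2S16 |
                                             WAVE_FORMAT_4M16 | WAVE_FORMAT_4S16))
                                ? 16 : 8);

    fmt.wf.nBlockAlign = (WORD)(fmt.wf.nChannels * fmt.wBitsPerSample / 8);
    fmt.wf.nAvgBytesPerSec = fmt.wf.nSamplesPerSec * fmt.wf.nBlockAlign;
}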

So what happens now? Notice that, when we called the Record function, we supplied the address of an object as the third parameter. This notification object is derived from CWaveNotifyObj and is used to monitor events occurring while the waveform is being recorded. Here's the definition of the CRecDlgNotifyObj:

class CRecDlgNotifyObj : public CWaveNotifyObj
{
public:
    CRecDlgNotifyObj();
    ~CRecDlgNotifyObj();
    void Attach(CRecordDlg* pDlg)
        {m_pDlg = pDlg;}
    virtual void NewData(CWave *pWave,
                         CWaveBlock* pBlock);
    virtual void EndPlayback(CWave *pWave);
    CRecordDlg* m_pDlg;
};

Let's just look at the NewData member, which is called during recording as a new block is filled up:

void CRecDlgNotifyObj::NewData(CWave *pWave,
                         CWaveBlock* pBlock)
{
    ASSERT(m_pDlg);
    m_pDlg->NewData(pWave, pBlock);
}

A test is made to ensure that the m_pDlg member is valid, and the dialog box's NewData member function is called, effectively passing the notification back to the dialog box. So let's see how the dialog box code handles the callback.

void CRecordDlg::NewData(CWave *pWave, CWaveBlock* pBlock)
{
    ASSERT(pWave);
    // Update the VU meter from the samples.
    ASSERT(pBlock);

    // Get the format of the data.
    PCMWAVEFORMAT* pwf = (PCMWAVEFORMAT*) pWave->GetFormat();
    ASSERT(pwf->wf.wFormatTag == WAVE_FORMAT_PCM);
    int iCount = pBlock->GetNumSamples();
    if (pwf->wBitsPerSample == 8) {
        BYTE* pData = (BYTE*)pBlock->GetSamples();
        BYTE bMax = 0;
        while (iCount--) {
            if (*pData > bMax) bMax = *pData;
            pData++;
        }
        if (bMax < 128) {
            bMax = 0;
        } else {
            bMax -= 128;
        }
        m_VU.SetValue(bMax << 8, bMax << 8);
    } else {
        // Assume 16-bit samples.
        ASSERT(sizeof(short int) == 2);
        short int* pData = (short int*) pBlock->GetSamples();
        int iMax = 0;
        while (iCount--) {
            if (*pData > iMax) iMax = *pData;
            pData++;
        }
        m_VU.SetValue(iMax, iMax);
    }

    // If we are just sampling, nuke the wave blocks.
    if (m_iMode != RECORDING) {
        pWave->GetBlockList()->FreeAll();
    }
} 

We don't need to add the new wave data to the wave object itself—that's done for us in the CWaveInDevice code. All we need to do is examine the wave data, find the peak value, and update the VU meter.

Note   Using the callback objects might seem a bit complex, but it's the only really clean way to do this using C++ classes. In fact, the technique is very flexible, and once you become familiar with it, you can use it in many different situations. This basic idea is the core of OLE's interface object design.

Blocks of data continue to be recorded and added to the CWave object until the user clicks the Stop button. At that point the user can close the dialog box, play what has been recorded, or do the recording again.

Playing: SpelEdit's Record Dialog Box

The user can test what has been recorded by clicking the Play button. The mode is changed to PLAYING and playback of the current sample is started:

case PLAYING:
        if (m_pWave) m_pWave->Play(NULL, &m_NotifyObj);
        break;
    }

In order to start playback, the CWave's Play function is called. The first parameter is NULL, indicating that we want to play the waveform on the default output device. (You could choose to use a specific CWaveOutDevice if you wanted to.) The second parameter is a pointer to a callback object that will be notified when playback stops. This is used to re-enable the Play button and disable the Stop button to reflect the "stopped" state at the end of playback:

void CRecordDlg::EndPlayback(CWave *pWave)
{
    ASSERT(pWave);
    SetMode(SAMPLING);
}

As you can see, playing CWave objects is trivial, which is exactly what I had in mind when creating the CWave class.

Using CWave Objects as Resources

The Speller sample has a number of waveforms built into the application as resources that it uses to give instructions to the user. Let's look at how these resources are built in.

Adding CWave Objects to the Resource Script

The first problem to solve is how, exactly, do we add WAVE files as resources to our project? App Studio certainly knows nothing of WAVE files, so we have to find another way. Fortunately, the Microsoft Visual C++™ architects foresaw this problem and created a simple backdoor method for adding user-defined resources to the project: They are added to the .RC2 file using a text editor. You can simply use the Visual C++ editor to open the .RC2 file and add the resources yourself. Here's the appropriate section of Speller's .RC2 file (from the RES subdirectory):

////////////////////////////////////////////////////////////////////
// Add additional manually edited resources here. . .

IDR_RIGHT       WAVE    res\right.wav
IDR_WRONG       WAVE    res\wrong.wav
IDR_SILENCE     WAVE    res\silence.wav
IDR_INSTRUCT    WAVE    res\instruct.wav
IDR_NOWORDS     WAVE    res\nowords.wav

/////////////////////////////////////////////////////////////////////

Each entry consists of a resource ID value, followed by the resource type (in this case WAVE), followed finally by the path of the file. The resource IDs are created by using the Edit Symbols menu item in App Studio. The actual values of the symbols are not important, so just let App Studio select them. The resource type "WAVE" does not need to be defined anywhere; it is simply entered into the .RC2 file as a string. The resource files themselves were added to the project's RES subdirectory, and consequently, the path to the files must be "res\. . .".

Loading and Playing a WAVE Resource

Speller uses several WAVE resources that are all loaded in the constructor for the view class. This might seem a weird place to be doing this, but the sounds are all played in response to user actions, and those actions are handled in the view class, so keeping the sound objects themselves in the view class is easiest. Here's the part of the constructor that loads the sound resources:

CWordView::CWordView()
    : CFormView(CWordView::IDD)
{
    ...
    m_wavRight.LoadResource(IDR_RIGHT);
    m_wavWrong.LoadResource(IDR_WRONG);
    m_wavSilence.LoadResource(IDR_SILENCE);
    m_wavInstruct.LoadResource(IDR_INSTRUCT);
    m_wavNoWords.LoadResource(IDR_NOWORDS);
    ...
}
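
CWave::LoadResource does the underlying resource work for you. If you're curious what that involves, the core of it is the standard Win32 resource API, roughly as in the sketch below (this isn't the library's exact implementation, which also walks the RIFF chunks in the data it finds). In an MFC application the instance handle would typically come from AfxGetResourceHandle.

#include <windows.h>

BOOL GetWaveResource(HINSTANCE hInst, UINT uiResID,
                     void** ppData, DWORD* pcbData)
{
    // Find the custom "WAVE" resource by ID.
    HRSRC hrsrc = ::FindResource(hInst, MAKEINTRESOURCE(uiResID), "WAVE");
    if (hrsrc == NULL) return FALSE;

    // Load and lock it to get a pointer to the raw RIFF file image.
    HGLOBAL hglb = ::LoadResource(hInst, hrsrc);
    if (hglb == NULL) return FALSE;
    *ppData  = ::LockResource(hglb);
    *pcbData = ::SizeofResource(hInst, hrsrc);
    return (*ppData != NULL);
}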

Not too difficult to handle. Let's see how the sounds are played. Here's the piece of code that handles the Help button in Figure 4:

void CWordView::OnClickedHelp()
{
    RestartTimer();
    m_wavInstruct.Play();
    m_wndWord.SetFocus();
    RestartTimer();
}

As you can see, playing the sound is simply a case of calling the CWave::Play function. We don't care on which device it's played, and we are not using a notification object, so no parameters are required. Because these sounds are used fairly often in the application, I chose to keep them around all the time. But what if we don't want to do that? What if we only want to load a sound when it's needed? Read on. . . .

CWave Objects That Take Care of Themselves

One of the tricky problems in dealing with creating and playing one-time asynchronous sounds is making sure that we tidy up properly when the sound is finished playing. For example, in response to some user action, we might want to play a sound from a WAVE file while the user's request is processed. Playing the sound isn't a problem, but deleting the CWave object when the sound has finished playing is something of a chore. Either we need to use a callback object to find when the sound is done playing, or we need to keep a pointer to the object and poll it occasionally to see when it's done.

A better solution is to use a reference-counting system in the CWave object so that when the reference count falls to zero (indicating that no other piece of code is using the object), the object deletes itself. Those of you familiar with the Component Object Model (COM) used in OLE will recognize this mechanism. For those of you not familiar with COM objects, I'll describe my implementation of the reference-counting system I used with the CWave class.

CWave's Reference-Counting System

The CWave class includes an optional reference-counting system that can be used to ensure that the object deletes itself when no longer in use. Here's an example of how it might be used to play a WAVE resource:

void SomeClass::PlayResource()
{
    CWave* pWave = new CWave(CWave::flagAutoDestruct);
    pWave->AddRef();
    if (!pWave->LoadResource(IDR_WAVERES)) {
        pWave->Release();
        return;
    }
    pWave->Play();
    pWave->Release();
}

A new CWave object is created with the flagAutoDestruct option, which enables the object's reference-counting system. The object is initially created with a reference count of zero, so the first thing to do is increase the reference count by calling its AddRef function. Now the object has a reference count of one, so it is not going to disappear. An attempt is then made to load the actual sound from a resource. If that fails, the object's reference count is decremented, causing the count to fall to zero and the object to delete itself. If the resource is loaded successfully, the Play function is called to begin playback. The Play function also calls AddRef (since it is using the object), so now the reference count is two.

Once playback is started, we call the object's Release function, in effect saying that we have finished with it. This decrements the reference count so that it becomes one (the playback routine is keeping the object alive). When playback finishes, the reference count is decremented again by the object's playback code, causing the count to fall to zero and hence the object to delete itself. This simple system has many uses—not only in CWave objects. Watch out for more of this in the future.

Just to round things out, let's look at the code in the CWave implementation that handles the reference counting. Here's the constructor:

CWave::CWave(WAVEFLAG flag)
{
    m_pcmfmt.wf.wFormatTag = WAVE_FORMAT_PCM;
    m_pcmfmt.wf.nChannels = 1;
    m_pcmfmt.wf.nSamplesPerSec = 11025;
    m_pcmfmt.wf.nAvgBytesPerSec = 11025;
    m_pcmfmt.wf.nBlockAlign = 1;
    m_pcmfmt.wBitsPerSample = 8;
    m_pOutDev = NULL;
    m_pInDev = NULL;
    m_iRefCount = 0;
    m_iPlayCount = 0;
    if (flag & CWave::flagAutoDestruct) {
        m_bAutoDestruct = TRUE;
    } else {
        m_bAutoDestruct = FALSE;
    }
    m_pNotifyObj = NULL;
}

The only relevant thing here is that if the flagAutoDestruct option is used, the internal flag m_bAutoDestruct is set to TRUE. Here are the AddRef and Release functions:

int CWave::AddRef()
{
    ASSERT(m_iRefCount < 1000);
    return ++m_iRefCount;
}

int CWave::Release()
{
    int i;
    i = --m_iRefCount;
    ASSERT(i >= 0);
    if ((i == 0) && m_bAutoDestruct) {
        delete this;
    }
    return i;
}

And finally, here's the internal function DecPlayCount, which is called by the CWaveOutDevice that is actually playing the waveform each time playback of a block finishes:

void CWave::DecPlayCount() 
{
    ASSERT(m_iPlayCount > 0);
    m_iPlayCount--;
    if (m_iPlayCount == 0) {
        // See if there is anyone who wants to be notified.
        if (m_pNotifyObj) {
            m_pNotifyObj->EndPlayback(this);
        }
        if (m_bAutoDestruct) {
            Release(); // and maybe die
        }
    }
}

The play count is the number of CWaveBlocks currently in the output device awaiting playback. If this count falls to zero (that is, playback is complete), two things happen. First, if a notification object is in use in the CWave object, the notification function is called to notify the owner of the CWave object that playback has ended. Second, the object's reference count is decremented by calling the Release function, possibly resulting in the object's destruction.

That's All for Now

There's a lot of code in the samples that I haven't shown here. I don't think any of it is very complicated, so you should be able to dig around in it quite happily for yourself. I've tried to show you how to use the CWave class and its associates to simply record and play waveform audio. Dealing with the AddRef and Release functions can be a bit trying until you figure out what you're doing, which is why I made their use optional with the CWave class. I'm slowly converting more of my objects to use this form of reference counting because I believe it really does help with many of the object ownership issues. I think that if you try using the scheme, you'll eventually grow to like it. If you really didn't follow the story, drop me a line (nigelt@microsoft.com) and I'll try to sort it out for you.