Sound Data

DirectSound and DirectSoundCapture work with waveform audio data, which consists of digital samples of the sound taken at a fixed sampling rate. The format of a sound is described by a WAVEFORMATEX structure. This structure is documented in the Multimedia Structures section of the Platform SDK documentation, but is briefly described here for convenience:

typedef struct { 
    WORD  wFormatTag; 
    WORD  nChannels; 
    DWORD nSamplesPerSec; 
    DWORD nAvgBytesPerSec; 
    WORD  nBlockAlign; 
    WORD  wBitsPerSample; 
    WORD  cbSize; 
} WAVEFORMATEX; 
 

The wFormatTag member contains a unique identifier for the wave format, assigned by Microsoft Corporation. A complete list of format tags can be found in the Mmreg.h header file. The only tag valid with DirectSound is WAVE_FORMAT_PCM. This tag indicates Pulse Code Modulation (PCM), an uncompressed format in which each sample represents the amplitude of the signal at the time of sampling. DirectSoundCapture can capture data in other formats by using the Audio Compression Manager.

For information on using non-PCM data with DirectSound, see Compressed Wave Formats.

The nChannels member describes the number of channels, usually either one (mono) or two (stereo). For stereo data, the samples are interleaved. The nSamplesPerSec member describes the sampling rate, or frequency, in hertz. Typical values are 11,025, 22,050, and 44,100.
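To illustrate what interleaving means in practice, here is a minimal sketch that splits a stereo buffer into separate left and right channels. It assumes 16-bit signed samples stored in the usual PCM order (left sample first, then right). The function name deinterleave_stereo16 is hypothetical and is not part of DirectSound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a DirectSound API.
   Splits interleaved 16-bit stereo data (L, R, L, R, ...) into two
   separate channel buffers. 'frames' is the number of left/right pairs;
   'left' and 'right' must each have room for 'frames' samples. */
static void deinterleave_stereo16(const int16_t *interleaved, size_t frames,
                                  int16_t *left, int16_t *right)
{
    size_t i;

    assert(interleaved != NULL && left != NULL && right != NULL);

    for (i = 0; i < frames; ++i) {
        left[i]  = interleaved[2 * i];      /* even index: left channel  */
        right[i] = interleaved[2 * i + 1];  /* odd index:  right channel */
    }
}
```

For mono data no such step is needed, because the buffer holds a single run of consecutive samples.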

The wBitsPerSample member gives the size of each sample, generally 8 or 16 bits. The value in nBlockAlign is the number of bytes required for each complete sample, that is, one sample for every channel. For PCM formats it is equal to (wBitsPerSample * nChannels / 8). The value in nAvgBytesPerSec is the product of nBlockAlign and nSamplesPerSec. For example, 16-bit stereo data at 44,100 Hz has an nBlockAlign of 4 and an nAvgBytesPerSec of 176,400.
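The relationships among these members can be sketched as a small helper that fills in a PCM format description. So that the sketch compiles without the Windows headers, it uses a stand-in structure named PcmFormat that mirrors the WAVEFORMATEX members shown earlier. Both PcmFormat and make_pcm_format are hypothetical names. In a real application you would likely include the Platform SDK headers and fill in a WAVEFORMATEX directly, using WAVE_FORMAT_PCM for the tag.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for WAVEFORMATEX (assumption: same members, fixed-width types
   in place of WORD and DWORD so the example builds on any platform). */
typedef struct {
    uint16_t wFormatTag;
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
    uint16_t cbSize;
} PcmFormat;

/* Numeric value of WAVE_FORMAT_PCM as defined in Mmreg.h. */
#define PCM_FORMAT_TAG 1u

/* Hypothetical helper: builds a PCM format from the three values the
   caller chooses and derives the dependent members from them. */
static PcmFormat make_pcm_format(uint16_t channels, uint32_t samples_per_sec,
                                 uint16_t bits_per_sample)
{
    PcmFormat f;

    /* PCM samples occupy a whole number of bytes (8 or 16 bits here). */
    assert(channels > 0 && bits_per_sample % 8 == 0);

    f.wFormatTag      = PCM_FORMAT_TAG;
    f.nChannels       = channels;
    f.nSamplesPerSec  = samples_per_sec;
    f.wBitsPerSample  = bits_per_sample;
    f.nBlockAlign     = (uint16_t)(bits_per_sample * channels / 8);
    f.nAvgBytesPerSec = (uint32_t)f.nBlockAlign * samples_per_sec;
    f.cbSize          = 0;  /* no extra format bytes for PCM */
    return f;
}
```

Calling make_pcm_format(2, 44100, 16) describes 16-bit stereo data at 44,100 Hz. Deriving nBlockAlign and nAvgBytesPerSec in one place keeps the three dependent members consistent with each other.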

Finally, cbSize gives the size of any extra fields required to describe a specialized wave format. This member is always zero for PCM formats.