Microsoft DirectX 8.1 (C++)

Voice Codecs

The compression/decompression (codec) algorithms provided with Microsoft® DirectPlay® are optimized for low-bandwidth voice transmission. All of these codecs operate on 8 kHz, 16-bit mono data, and DirectPlay Voice handles all the details of converting voice data to and from this intermediate format. Third-party codecs are not supported, and you cannot write proprietary codecs for use with DirectPlay Voice.

As the bandwidth requirement of a codec drops, so does the audio quality of the voice data. The following table lists the supported codecs, the bandwidth each requires in kilobits per second (Kbps), and the compression GUID used to select it. The compression GUIDs are defined in dvoice.h.

Codec              Bandwidth                    GUID
Voxware VR12       Variable (1.2 Kbps average)  DPVCTGUID_VR12
Voxware SC03       3.2 Kbps                     DPVCTGUID_SC03
Voxware SC06       6.4 Kbps                     DPVCTGUID_SC06
TrueSpeech         8 Kbps                       DPVCTGUID_TRUESPEECH
Microsoft® GSM     13 Kbps                      DPVCTGUID_GSM
Microsoft® ADPCM   32 Kbps                      DPVCTGUID_ADPCM
Microsoft® PCM     64 Kbps                      DPVCTGUID_NONE
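
Because quality rises with bandwidth, one way to choose from the table is to take the highest-bandwidth codec that still fits a per-stream budget. The following is a minimal sketch of that idea, not part of the DirectPlay API. CodecInfo, kCodecs, and PickCodec are hypothetical names, and the GUIDs are stored as their symbolic names so the sketch compiles without dvoice.h. It also ignores whether a codec is actually installed.

```cpp
#include <cassert>

// One row of the codec table. The GUID is kept as its symbolic name only;
// real code would use the GUID constants defined in dvoice.h.
struct CodecInfo {
    const char* name;
    const char* guidName;
    double      kbps;  // nominal rate; VR12 is variable and 1.2 is its average
};

// Rows in table order, lowest bandwidth first.
static const CodecInfo kCodecs[] = {
    {"Voxware VR12",    "DPVCTGUID_VR12",       1.2},
    {"Voxware SC03",    "DPVCTGUID_SC03",       3.2},
    {"Voxware SC06",    "DPVCTGUID_SC06",       6.4},
    {"TrueSpeech",      "DPVCTGUID_TRUESPEECH", 8.0},
    {"Microsoft GSM",   "DPVCTGUID_GSM",        13.0},
    {"Microsoft ADPCM", "DPVCTGUID_ADPCM",      32.0},
    {"Microsoft PCM",   "DPVCTGUID_NONE",       64.0},
};

// Hypothetical helper: returns the highest-bandwidth codec whose nominal
// rate fits within budgetKbps, or nullptr if even the smallest does not fit.
inline const CodecInfo* PickCodec(double budgetKbps) {
    assert(budgetKbps >= 0.0);
    const CodecInfo* best = nullptr;
    for (const CodecInfo& codec : kCodecs) {
        if (codec.kbps <= budgetKbps) {
            best = &codec;  // rows are ascending, so the last fit is the largest
        }
    }
    return best;
}
```

With a budget of 10 Kbps per stream, PickCodec returns the TrueSpeech row. With a budget of 1 Kbps it returns nullptr, because no codec in the table fits.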

The first three codecs provide a high level of compression and have approximately the same resource demands. On a 500 MHz Pentium III class computer, these codecs use approximately 1.5 percent of the CPU capacity. The VR12 codec sounds tinny and robotic, but the SC03 and SC06 codecs provide reasonable fidelity. The PCM codec provides the highest sound quality and is essentially uncompressed 8 kHz 16-bit mono-format audio data.

Note  The GSM, ADPCM, and PCM codecs are included with the Microsoft® Windows® installation but may not have been installed by the user. You may need to prompt the user to install them. You can determine which codecs are available on your system by calling IDirectPlayVoiceServer::GetCompressionTypes. If a codec is not listed, the corresponding ACM codec is not installed.

Selecting a Codec

As with all other game setup parameters, the host controls which codec is used for the voice session. All members of the voice session must use the same codec. Remember that in a peer-to-peer voice session, the voice-session host does not necessarily have to be the same as the game-data host. The host selects the codec when it calls IDirectPlayVoiceServer::StartSession. Set the guidCT member of the DVSESSIONDESC structure to the compression GUID of the codec that you want to use. A client can retrieve this structure by calling IDirectPlayVoiceClient::GetSessionDesc.

The same codec might not be ideal for the entire duration of a game. For instance, you might want to use one codec for the lobby chat feature that players use to set up the game, and another to handle voice communication after the game is launched. You cannot dynamically change codecs during a voice session. To switch to another codec, you must terminate the current voice session and create a new voice session with the new codec. However, you can stop and restart a voice session without terminating the underlying DirectPlay core session.

As with any form of network communication, it is important to analyze the cost of the voice communication to ensure that adequate bandwidth is available to support communication of the game data and voice data. Analyzing the voice bandwidth consumption is straightforward: Estimate the number of simultaneous voice streams that you anticipate and multiply that number by the sum of the bandwidth required by the codec and the protocol overhead. CPU consumption is another factor to consider when choosing a codec. As with network bandwidth, CPU resource consumption is additive, per stream.
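
The bandwidth estimate described above can be sketched as a single function. This is an illustrative helper only: EstimateVoiceKbps is a hypothetical name, and the per-stream protocol overhead is an input you must measure or estimate for your own transport, because this documentation does not give a value for it.

```cpp
#include <cassert>

// Estimated total voice bandwidth in Kbps:
//   simultaneous streams * (codec rate + per-stream protocol overhead)
// All three inputs are supplied by the caller; overheadKbps in particular
// is an assumption that depends on the transport and packet size.
inline double EstimateVoiceKbps(int streams, double codecKbps, double overheadKbps) {
    assert(streams >= 0 && codecKbps >= 0.0 && overheadKbps >= 0.0);
    return streams * (codecKbps + overheadKbps);
}
```

For example, four simultaneous GSM streams (13 Kbps each) with an assumed 3 Kbps of overhead per stream come to 4 x (13 + 3) = 64 Kbps. Because CPU consumption is also additive per stream, the same multiplication applies to a per-stream CPU figure.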