XACT Streaming Wave Banks

This section discusses streaming wave banks and how to use them under different circumstances with the Microsoft Cross-Platform Audio Creation Tool (XACT).

What is Streaming?

Wave banks may be designated as in-memory, which means that their waves are to be loaded entirely into local memory from the storage medium (DVD or hard drive) for later playback through audio hardware. For waves played directly from memory there are no inherent timing or latency issues.

Wave banks may also be designated as streamed, which means that their wave data remains on the storage medium. The audio data is delivered from the DVD or hard drive to the audio hardware for playback through a set of work buffers. This is often necessary for audio (such as background music) that is too large to load into working memory.

To better understand streaming audio, it will help to become familiar with some important concepts:

Latency

Latency is time spent waiting for data rather than delivering it. On a storage medium such as a DVD or hard drive, latency is the time the device takes to position the proper sector under the read/write head. Latency can delay the start of streamed audio, or even cause intermittent breaks if a slow-reading disk fails to feed data to the audio output fast enough for continuous sound.

The inherent latency of DVD and that of a hard drive can be quite substantial. While the worst-case latency of a hard drive might be around 100 milliseconds, the worst-case latency of a DVD can be a full second or more.

Latency on a DVD can be minimized when audio files are strategically placed on the disc so that the data to be streamed is contiguous. If the read head does not have to move far to retrieve the next block of data, delays will remain small.

Zero-Latency Streaming

One of the biggest challenges for game audio is trying to fit all the desired sounds within the available memory. To do so, an audio designer might reduce the sampling rate, compress the data, or both. However, this can reduce the fidelity of the sounds.

The XACT solution to this problem is zero-latency streaming, sometimes called "primed streaming." Zero-latency streaming allows a game title to make use of large, high-quality wave banks, tens of megabytes or larger, without crowding out other needs for memory while supporting smooth, uninterrupted playback.

For each streamed wave, XACT allocates a set of memory buffers that are considerably smaller than the total size of the file. XACT gives itself a head start by loading a small portion of the beginning of the wave into the allocated memory; the rest remains on the storage medium. Once playback begins, XACT continues to load subsequent portions of the wave into the memory buffers as they become free. If the buffers are appropriately sized, the inherent latency of the storage medium will not interrupt playback.
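The buffering behavior described above can be explored with a toy timing model. The sketch below is not how XACT schedules its reads; it assumes that every refill takes the same fixed time, that refills are serviced one at a time, and that all buffers are primed before playback starts. The function name playsWithoutGaps and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model (an assumption, not XACT's actual scheduler):
//  - bufferCount buffers, each holding bufferMs of audio, all primed at t = 0
//  - a buffer is refilled as soon as it finishes playing
//  - refills are serviced one at a time, and each takes refillMs
// Returns true if every chunk is in memory before playback reaches it.
bool playsWithoutGaps(int bufferCount, std::int64_t bufferMs,
                      std::int64_t refillMs, int totalChunks)
{
    assert(bufferCount > 0 && bufferMs > 0 && refillMs >= 0 && totalChunks >= 0);

    const std::size_t n = static_cast<std::size_t>(totalChunks);
    std::vector<std::int64_t> readyAt(n, 0); // time at which chunk k is in memory
    std::vector<std::int64_t> endsAt(n, 0);  // time at which chunk k finishes playing

    for (int k = 0; k < totalChunks; ++k)
    {
        if (k >= bufferCount)
        {
            // Chunk k reuses the buffer freed by chunk k - bufferCount, and the
            // drive cannot start this read until the previous read has finished.
            const std::int64_t readStart =
                std::max(endsAt[k - bufferCount], readyAt[k - 1]);
            readyAt[k] = readStart + refillMs;
        }

        const std::int64_t wantedAt = (k == 0) ? 0 : endsAt[k - 1];
        if (readyAt[k] > wantedAt)
        {
            return false; // the data arrived late, so playback would break up
        }
        endsAt[k] = wantedAt + bufferMs;
    }
    return true;
}
```

In this model, three 100-millisecond buffers keep up with a 100-millisecond refill time but not with a one-second refill time, which is why buffer size has to be chosen with the worst-case latency of the medium in mind.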

When Should Waves Be Streamed?

In general, it is preferable to designate wave banks as in-memory. Playback from memory always occurs without error or delay.

That is not always possible, especially when wave files are large and memory resources are limited. In those circumstances, the audio designer may choose to designate a wave bank as streaming. This is done in the XACT design tool by setting the "Streaming" property of the wave bank.

Examples of audio that may be appropriate for streaming include (but are not limited to) large assets such as background music.

Also, if your title has a large number of speech phrases, it may be most efficient to place them all in one large streaming wave bank.

The audio designer should consult with the game title programmer to determine if streaming is appropriate. With XACT, the audio designer does not need to be concerned with the details of how the audio is streamed, or what settings will work best. Implementing streaming, determining how large the memory buffers need to be, and managing resources are entirely under the control of the programmer.

Programming for XACT Streaming

Using a streaming wave bank, in most respects, follows the process discussed in the XACT Programming Guide. You will use the function IXACTEngine::CreateStreamingWaveBank to instantiate a wave bank object.

For cues that use streamed waves, the latency of the storage medium produces a gap between the request to play the cue and the moment the audio is heard. On a DVD, this is typically around 100 milliseconds, but it can be substantially more (see the section on latency above). If this is acceptable to gameplay, then these cues can be triggered using the IXACTSoundBank::Play method. No other support is needed.

XACTINDEX cue_index;

// Look up the cue's index by its friendly name
cue_index = pSoundBank->GetCueIndex( "Gunshot_1" );
// Now trigger the cue; playback starts once the first data arrives from the storage medium
pSoundBank->Play( cue_index, 0, NULL );

Programming for Zero-Latency Playback

In some cases, the programmer may need to synchronize sound to gameplay, such as speech for animations. This can be difficult without knowing the exact latency of the streamed sound. XACT provides the programmer with a way to address this problem:

Using the Prepare Method

The game title can prime the sound cue using the IXACTSoundBank::Prepare method. This method reads the beginning of the wave into memory. After XACT completes its priming, it notifies the program. Once the cue is primed, it can be played with zero latency and synchronized with the animation.

When the function IXACTEngine::CreateStreamingWaveBank is called to instantiate the streamed wave bank, it is passed a XACT_WAVEBANK_STREAMING_PARAMETERS structure that identifies the wave bank file and establishes the resources for streaming. The packetSize member specifies the amount of memory, in media sectors, for each memory buffer (or "packet") used to stream the waves in the wave bank.

When the game title wishes to trigger the cue that uses a streamed wave, it may use the Prepare method to prime the cue. This method will return the IXACTCue object for the cue. When the Play method of the primed cue is invoked, the title can expect the sound to begin immediately and commence the animation without any additional wait.

The initialization portion of the title might look like the following code sample.

XACT_WAVEBANK_STREAMING_PARAMETERS  XStreaming_Params = {0};

XStreaming_Params.fh = fh_MyStreamingWaveBankFile;
XStreaming_Params.packetSize = 4;			// Four DVD sectors = 8192 bytes

// Create the streaming wave bank first, then the sound bank whose cues use it
if ( SUCCEEDED( hr = pEngine->CreateStreamingWaveBank( &XStreaming_Params, &pWaveBank ) ) )
{
  pEngine->CreateSoundBank( pbSoundBank, dwSoundBankSize, 0, 0, &pSoundBank );
}

When the game title wishes to trigger the cue for synchronization with animation, the following sequence may be used:

XACTINDEX cue_anim_index;
IXACTCue * pCue_lips = NULL;
DWORD dwState = 0;

// Look up the cue's index
cue_anim_index = pSoundBank->GetCueIndex( "Lips_Moving1" );
// Prime the cue; pCue_lips receives the prepared cue object
pSoundBank->Prepare( cue_anim_index, 0, &pCue_lips );

// Poll until the cue reports that it has finished preparing
for (;;)
{
  pCue_lips->GetState( &dwState );
  if( dwState & XACT_CUESTATE_PREPARED )
  {
    break;
  }
  Sleep( 1 );   // Yield briefly before checking again
}

// Now trigger the cue
pCue_lips->Play();
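The polling loop above waits indefinitely if the cue never reaches the prepared state. A title may prefer to bound the wait. The following is a minimal sketch in standard C++ under that assumption; waitUntil, its parameters, and the 500-millisecond figure in the comment are hypothetical and are not part of XACT.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Polls isReady() until it returns true or the timeout elapses.
// Returns true if the condition was met, false if the wait timed out.
template <typename Predicate>
bool waitUntil(Predicate isReady,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds pollInterval = std::chrono::milliseconds(1))
{
    assert(pollInterval.count() > 0);

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!isReady())
    {
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return false; // give up; the caller decides how to recover
        }
        std::this_thread::sleep_for(pollInterval);
    }
    return true;
}

// Hypothetical usage with the cue from the sample above:
//   bool primed = waitUntil(
//       [&] { DWORD s = 0; pCue_lips->GetState( &s );
//             return ( s & XACT_CUESTATE_PREPARED ) != 0; },
//       std::chrono::milliseconds( 500 ) );
```

If the wait times out, the title could start the animation anyway or skip the line of speech; which is better depends on the game.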

Memory Usage

How much system memory a particular streaming wave uses for buffering depends on the value of the packetSize member of the XACT_WAVEBANK_STREAMING_PARAMETERS structure passed to IXACTEngine::CreateStreamingWaveBank. A non-looping sound uses three buffers; a looping sound uses four. The size of each buffer is specified in packetSize, as the number of sectors. The number of bytes in a packet depends on the storage media sector size (512 bytes for hard drive wave banks, 2048 bytes for DVD).
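The buffer arithmetic described above can be written as a small helper. This is a sketch only: the function and constant names are illustrative and not part of XACT, while the buffer counts and sector sizes are the ones quoted in the text.

```cpp
#include <cassert>
#include <cstdint>

// Sector sizes quoted in the text.
constexpr std::uint32_t kHardDriveSectorBytes = 512;
constexpr std::uint32_t kDvdSectorBytes       = 2048;

// Buffers per streamed wave, as stated in the text:
// three for a non-looping sound, four for a looping sound.
constexpr std::uint32_t buffersPerWave(bool looping)
{
    return looping ? 4u : 3u;
}

// Bytes of buffer memory used by one playing streamed wave.
inline std::uint64_t streamingBytesPerWave(std::uint32_t packetSize,
                                           std::uint32_t sectorBytes,
                                           bool looping)
{
    assert(packetSize > 0 && sectorBytes > 0);
    return static_cast<std::uint64_t>(packetSize) * sectorBytes * buffersPerWave(looping);
}
```

For instance, a packetSize of 4 on DVD gives 8192-byte buffers, so a non-looping wave would use 24,576 bytes in total.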

You will want to determine the ideal value for packetSize from the samples-per-second resolution of the audio and the amount of playback time you wish to buffer. The following examples show how this is done, and what memory usage would result:

Example 1

To begin, we must compute the number of DVD sectors we will need to set as packetSize for a streaming wave bank with a single 44.1-kHz, 16-bit, non-looping wave entry. Assume that we have chosen to buffer 100 milliseconds of wave data.

Therefore, we would compute the amount of memory we need for one buffer as:

44,100 (samples per second) × 2 (bytes per sample) × 0.100 (seconds) = 8820 bytes

To convert this to a number of sectors, divide by the sector size:

8820 (buffer size in bytes) / 2048 (bytes per DVD sector) ≈ 4.3 sectors

The value of 4.3 sectors must be rounded up to 5 sectors, so the effective lines of code would read:

XACT_WAVEBANK_STREAMING_PARAMETERS  XStreaming_Params = {0};

XStreaming_Params.fh = fh_MyStreamingWaveBankFile;
XStreaming_Params.packetSize = 5;			// Five DVD sectors = 10,240 bytes

The amount of memory allocated for a single buffer would be 10 KB. Because non-looping waves use 3 buffers per wave, the actual amount of memory used in this case would be 30 KB (3 × 10 KB). Since there is only one wave in this wave bank, that is all the memory that will ever be used.

Note that the amount of system RAM required does not depend on the length of the sound. It depends only on the format and the value specified in packetSize.
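The calculation in this example generalizes to other formats and sector sizes. The sketch below assumes uncompressed PCM data; the function name and the channels parameter are illustrative additions (the arithmetic in the example corresponds to a single channel), and nothing here is part of XACT.

```cpp
#include <cassert>
#include <cstdint>

// Smallest packetSize (in sectors) whose buffer holds at least
// `milliseconds` of uncompressed PCM audio.
inline std::uint32_t packetSizeForDuration(std::uint32_t samplesPerSecond,
                                           std::uint32_t bytesPerSample,
                                           std::uint32_t channels,
                                           std::uint32_t milliseconds,
                                           std::uint32_t sectorBytes)
{
    assert(sectorBytes > 0);

    const std::uint64_t bytesPerSecond =
        static_cast<std::uint64_t>(samplesPerSecond) * bytesPerSample * channels;

    // Round up to whole bytes first, then up to whole sectors.
    const std::uint64_t bytesNeeded = (bytesPerSecond * milliseconds + 999) / 1000;
    return static_cast<std::uint32_t>((bytesNeeded + sectorBytes - 1) / sectorBytes);
}
```

Calling packetSizeForDuration(44100, 2, 1, 100, 2048) reproduces the 5 sectors computed in this example. The same audio on a hard drive with 512-byte sectors would need 18 sectors.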

Example 2

Let us examine another case, a streaming wave bank that contains 200 looping wave entries. As in the example above, each wave is in 44.1-kHz, 16-bit format. If the average length of each wave is 3 seconds, then the approximate size of the wave bank would be 52 MB (200 waves × 3 seconds × 88,200 bytes per second).

If we decide that we need to buffer 100 milliseconds as in the above example, this means that our value for packetSize remains the same (5 DVD sectors), and each buffer requires 10 KB.

Because looping waves use 4 buffers (instead of 3 in the above example which specified non-looping), the actual amount of memory used per wave will be 40 KB (4 × 10 KB).

If we would like to support up to 40 concurrently playing streamed waves, then the actual amount of memory used would be computed as:

40 KB (per wave) × 40 (concurrent waves) = 1600 KB, or about 1.6 MB of memory

In this example, streaming the wave bank will require only about 3 percent of the memory that would be required if the 52 MB wave bank were designated as in-memory (1.6 MB / 52 MB ≈ 0.03).
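The comparison in this example can be checked with a few lines of arithmetic. The sketch below uses illustrative names that are not part of XACT and again assumes uncompressed PCM data.

```cpp
#include <cassert>
#include <cstdint>

struct StreamingEstimate
{
    std::uint64_t inMemoryBytes;  // whole wave bank resident in memory
    std::uint64_t streamingBytes; // buffers for the concurrently playing waves only
};

// Compares the footprint of an in-memory wave bank with that of the
// streaming buffers needed for `concurrentWaves` simultaneous streams.
inline StreamingEstimate estimateFootprint(std::uint32_t waveCount,
                                           std::uint32_t averageMilliseconds,
                                           std::uint32_t bytesPerSecond,
                                           std::uint32_t packetSize,
                                           std::uint32_t sectorBytes,
                                           std::uint32_t buffersPerWave,
                                           std::uint32_t concurrentWaves)
{
    assert(packetSize > 0 && sectorBytes > 0 && buffersPerWave > 0);

    StreamingEstimate e;
    e.inMemoryBytes = static_cast<std::uint64_t>(waveCount) * bytesPerSecond
                      * averageMilliseconds / 1000;
    e.streamingBytes = static_cast<std::uint64_t>(concurrentWaves) * buffersPerWave
                       * packetSize * sectorBytes;
    return e;
}
```

With the figures from this example, estimateFootprint(200, 3000, 88200, 5, 2048, 4, 40) yields 52,920,000 bytes for the in-memory case and 1,638,400 bytes for streaming.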

See Also

XACT Overviews, XACT Reference, XACT Audio Authoring