XACT API Streaming Tutorial

This tutorial builds on the basic knowledge gained in the first API tutorial and takes a closer look at using the Microsoft Cross-Platform Audio Creation Tool (XACT) to stream audio data directly from a storage medium, such as a DVD or hard drive.

Getting Started

The full working source code for this tutorial is in:

<Installed SDK Location>\Samples\C++\XACT\Tutorials\Tut02_Stream

This source code is designed to be examined separately, after reading the discussion below.

Background Concepts

Streaming means that data is read directly from a storage medium, such as a DVD or hard drive, as it is needed, instead of being cached in memory. Streaming saves memory because the whole file does not need to be read at once. However, depending on the speed of the storage medium, there can be a significant delay between the time the request to read the data is made and the time the data is actually read. This delay is called latency.

For audio, latency is the time between when the request to play is made and when the audio is heard. When reading audio data from a storage medium, the latency can be substantial because the device takes time to position the read/write head over the proper sector. Latency can cause delays in playback of streamed audio, or even intermittent breaks if a slow-reading disk fails to feed data to the audio output fast enough for continuous sound. While the worst-case latency of a hard drive might be around 100 milliseconds, the worst-case latency of a DVD can be a full second or more. Latency on a DVD can be minimized when audio data is strategically placed on the disc so that it is streamed contiguously. If the read head does not have to move far to retrieve the next block of data, delays will remain small.

XACT Wave Bank Types

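There are two types of wave banks available in XACT: in-memory wave banks, whose wave data is loaded into memory, and streaming wave banks, whose wave data is read from the storage medium as it is played.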

Creating a Streaming Wave Bank

To stream, the first step is to set the wave bank's "streaming" property to true in the XACT authoring tool. Typically, the sound designer sets this property and tells the programmer that the wave bank is to be streamed.

Once this is done, the programmer can use CreateFile to open the wave bank file with the FILE_FLAG_OVERLAPPED and FILE_FLAG_NO_BUFFERING flags and pass the resulting handle to CreateStreamingWaveBank, like so:

IXACTWaveBank* pStreamingWaveBank = NULL;

hFile = CreateFile( str, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 
                    FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING, NULL );

if( hFile != INVALID_HANDLE_VALUE )
{
    XACT_WAVEBANK_STREAMING_PARAMETERS wsParams;
    ZeroMemory( &wsParams, sizeof(XACT_WAVEBANK_STREAMING_PARAMETERS) );
    wsParams.file = hFile;
    wsParams.offset = 0;
    wsParams.packetSize = 64;

    hr = pEngine->CreateStreamingWaveBank( &wsParams, &pStreamingWaveBank );
}

The packetSize parameter specifies the stream packet size in multiples of 2 KB (the size of a DVD sector). A value of 64 therefore gives a stream packet size of 128 KB, which equates to approximately 6 seconds of stereo audio (the exact duration depends on the wave format) and takes better advantage of the drive's internal read cache than a smaller size would. When reading from a DVD, an optimal value is a multiple of the DVD block size, which is 16 DVD sectors.
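How much audio one packet holds depends on the data rate of the wave format, which the text above does not state. The following is a minimal sketch of the arithmetic, not part of the XACT API: the helper names PacketBytes and PacketSeconds are hypothetical, and the data rate is an assumption supplied by the caller.

```cpp
#include <cassert>
#include <cstdint>

// Size of one DVD sector in bytes; packetSize is expressed in these units.
constexpr std::uint32_t kSectorBytes = 2048;

// Hypothetical helper: bytes held by one stream packet for a packetSize value.
constexpr std::uint32_t PacketBytes( std::uint32_t packetSize )
{
    return packetSize * kSectorBytes;
}

// Hypothetical helper: seconds of audio held by one packet.
// bytesPerSecond is an assumption; it depends on the wave format and compression.
inline double PacketSeconds( std::uint32_t packetSize, double bytesPerSecond )
{
    assert( bytesPerSecond > 0.0 );
    return PacketBytes( packetSize ) / bytesPerSecond;
}

// Assumed example format: 44.1 kHz, 16-bit (2 bytes per sample), stereo, uncompressed PCM.
constexpr double kExamplePcmBytesPerSecond = 44100.0 * 2 * 2;
```

For example, PacketBytes( 64 ) is 131072 bytes (128 KB), and PacketSeconds( 64, kExamplePcmBytesPerSecond ) is roughly 0.74 seconds. Under this uncompressed PCM assumption a packet holds well under a second of audio, so a duration of several seconds per packet implies a lower data rate, such as a compressed wave format.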

After a streaming wave bank is created, it cannot be used by a cue until the wave bank itself is prepared. XACT prepares the wave bank automatically, but time must be allowed for this process to complete. The application can tell when preparation is done by registering for and handling the XACTNOTIFICATIONTYPE_WAVEBANKPREPARED notification, like so:

XACT_NOTIFICATION_DESCRIPTION desc = {0};
desc.flags = XACT_FLAG_NOTIFICATION_PERSIST;
desc.type = XACTNOTIFICATIONTYPE_WAVEBANKPREPARED;
pEngine->RegisterNotification(&desc);

Then handle it in the notification callback like this:

void WINAPI XACTNotificationCallback(const XACT_NOTIFICATION* pNotification)
{
    if( pNotification->type == XACTNOTIFICATIONTYPE_WAVEBANKPREPARED &&
        pNotification->waveBank.pWaveBank == g_pStreamingWaveBank )
    {
        // Respond to this notification outside of this callback so Prepare() can be called
        EnterCriticalSection( &g_cs );
        g_bHandleStreamingWaveBankPrepared = true;
        LeaveCriticalSection( &g_cs );
    }
}

Because IXACTSoundBank::Prepare cannot be called from inside the callback, you must call it outside the callback, like so:

// Read and clear the flag under a single lock so that a notification
// arriving in between cannot be lost.
EnterCriticalSection( &g_cs );
bool bHandleStreamingWaveBankPrepared = g_bHandleStreamingWaveBankPrepared;
g_bHandleStreamingWaveBankPrepared = false;
LeaveCriticalSection( &g_cs );

if( bHandleStreamingWaveBankPrepared )
{
    // This code runs outside the notification callback, so Prepare() is allowed.
    g_pSoundBank->Prepare( iCue, 0, &gpZeroLatencyRevCue );
}

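Notice that you must use critical sections properly to make shared data thread safe while avoiding deadlocks. In the code above, the critical section is held only long enough to read or write the shared flag, and no XACT method is called while it is held.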

Instead of using a critical section, you can also use a non-blocking queue to keep track of notifications: the callback only pushes and never pops, and the application thread only pops and never pushes.
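The queue approach can be sketched as a fixed-size ring buffer for exactly one producer and one consumer. This is a minimal illustration that uses standard C++ atomics, not the tutorial sample's implementation and not part of XACT; the class name SpscQueue and its methods are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical single-producer, single-consumer queue.
// Producer: the notification callback (Push only).
// Consumer: the application thread (Pop only).
// One slot is always left empty, so it holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscQueue
{
    static_assert( Capacity >= 2, "Capacity must leave room for at least one item" );

public:
    // Producer side. Returns false, without blocking, if the queue is full.
    bool Push( const T& item )
    {
        const std::size_t head = m_head.load( std::memory_order_relaxed );
        const std::size_t next = ( head + 1 ) % Capacity;
        if( next == m_tail.load( std::memory_order_acquire ) )
            return false;
        m_items[head] = item;
        // Publish the item only after it has been written.
        m_head.store( next, std::memory_order_release );
        return true;
    }

    // Consumer side. Returns false, without blocking, if the queue is empty.
    bool Pop( T& item )
    {
        const std::size_t tail = m_tail.load( std::memory_order_relaxed );
        if( tail == m_head.load( std::memory_order_acquire ) )
            return false;
        item = m_items[tail];
        // Free the slot only after the item has been copied out.
        m_tail.store( ( tail + 1 ) % Capacity, std::memory_order_release );
        return true;
    }

private:
    T m_items[Capacity];
    std::atomic<std::size_t> m_head{ 0 }; // next slot to write; written only by the producer
    std::atomic<std::size_t> m_tail{ 0 }; // next slot to read; written only by the consumer
};
```

In this sketch the callback would push a small value identifying the notification, and the application thread would pop pending values once per frame and respond to each, for example by calling IXACTSoundBank::Prepare. Because each index is written by only one thread, neither side ever waits on a lock.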

Playing a Cue that Uses Streamed Data

If you are not concerned about latency when playing a cue that references streamed wave data, playing the cue is no different than if it used in-memory wave data:

pSoundBank->Play( iCueIndex, 0, NULL );

However, if latency is important (such as with dialog or action sound effects), XACT has a method for setting up zero-latency streaming. This is discussed next.

Zero-Latency Streaming

Zero-latency streaming, sometimes called "primed streaming," allows a game to make use of large, high-quality wave banks without crowding out other needs for memory, while still supporting smooth, uninterrupted playback by preparing the cue ahead of time.

This means that for each wave streamed by the prepared cue, XACT allocates a set of memory buffers that are considerably smaller than the total size of the file. XACT gives itself a head start by loading a small portion of the beginning of the wave file into the allocated memory; the rest remains on the storage medium. Once playback begins, XACT continues to load subsequent portions of the wave into the memory buffers as they become free. If the packet size chosen when the streaming wave bank was created is appropriate, the inherent latency of the storage medium will not interrupt playback.

Because zero-latency streaming requires additional setup by the programmer, the programmer and the audio designer need to work together to identify which cues (if any) reference streaming wave data and must avoid latency.
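The relationship between buffer size and latency can be illustrated with a simplified model. This sketch is an assumption-laden approximation, not a description of XACT's internal buffering: it assumes a hypothetical number of equally sized packet buffers, all filled before playback starts, and a refill that completes within the worst-case latency of the storage medium. The function name and parameters are hypothetical.

```cpp
#include <cassert>

// Simplified model (an assumption, not XACT's actual algorithm):
// when one buffer drains and its refill is requested, the audio left in the
// remaining buffers must last at least as long as the worst-case read latency.
inline bool CoversWorstCaseLatency( int bufferCount,
                                    double packetSeconds,
                                    double worstCaseLatencySeconds )
{
    assert( bufferCount >= 1 );
    assert( packetSeconds > 0.0 );
    assert( worstCaseLatencySeconds >= 0.0 );

    // Audio still queued while the drained buffer is being refilled.
    const double audioStillBuffered = ( bufferCount - 1 ) * packetSeconds;
    return audioStillBuffered >= worstCaseLatencySeconds;
}
```

Under this model, two buffers of 0.74 seconds each cover a 100 millisecond hard drive latency but not a one second DVD latency, while three such buffers cover both. The buffer count and packet duration here are illustrative values only.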

To accomplish zero-latency streaming, a three-step process must be followed:

Step 1: Prepare the Cue

A reasonable amount of time before the sound needs to play, the game engine calls IXACTSoundBank::Prepare, like so:

pSoundBank->Prepare( iZapCueIndex, 0, &pCue );

The game engine needs to store the cue instance returned by IXACTSoundBank::Prepare, because this is the only cue instance that will be prepared for zero-latency streaming.

Preparing the cue well before playback is essential. If a cue has not been prepared and is needed immediately, little can be done other than calling IXACTSoundBank::Play or IXACTCue::Play, which plays the cue as soon as it has been prepared and results in noticeable latency. With good design and planning, however, the cue can be prepared before it is needed.

For cues that reference in-memory waves, calling IXACTSoundBank::Prepare first brings no benefit but also does no harm.

Step 2: Wait

After calling IXACTSoundBank::Prepare, a reasonable amount of time must be allowed to pass while IXACTEngine::DoWork is called periodically. In other words, the cue will not become prepared unless IXACTEngine::DoWork is called regularly, so do not busy-loop while waiting for the cue to be prepared.

While waiting for the storage medium to seek to the data, the game should continue doing other tasks such as rendering or updating the game state.

If your game needs to know when the cue has been prepared, there are a few methods available. One method is to call IXACTCue::GetState on the cue and check for the XACT_CUESTATE_PREPARED state. Another, preferred method is to register for and handle the XACTNOTIFICATIONTYPE_CUEPREPARED notification:

XACT_NOTIFICATION_DESCRIPTION desc = {0};
desc.flags = XACT_FLAG_NOTIFICATION_PERSIST;
desc.type = XACTNOTIFICATIONTYPE_CUEPREPARED;
desc.cueIndex = XACTINDEX_INVALID;
pEngine->RegisterNotification(&desc);

One good reason to check the state of the cue before playing is to sync animation to a streaming sound: by waiting until the cue is prepared, the game can trigger both at the same time so that they remain in sync.

Note that once a cue has been prepared, it never becomes unprepared. The prepared buffers remain in memory until you release the cue.

Step 3: Play the Cue Instance

After the first two steps are complete and the cue is needed for immediate playback, simply call IXACTCue::Play on the cue instance returned by IXACTSoundBank::Prepare:

pCue->Play();

Note that you can call IXACTCue::Play on this cue instance only while it is in the preparing or prepared state. The call will fail if the cue is already playing or stopped.
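The three steps above can be sketched with a small stand-in for the cue lifecycle. This is a simplified model for illustration only, not the XACT API: MockCue, CueState, and PlayWhenPrepared are hypothetical names, the mock's DoWork stands in for the periodic IXACTEngine::DoWork call, and the cue is assumed to become prepared after a fixed number of such calls.

```cpp
#include <cassert>

// Simplified stand-in for a cue's states; NOT the XACT state flags.
enum class CueState { Preparing, Prepared, Playing, Stopped };

// Hypothetical mock of a cue returned by a Prepare call (step 1).
class MockCue
{
public:
    explicit MockCue( int ticksUntilPrepared ) : m_ticksLeft( ticksUntilPrepared ) {}

    // Stands in for one periodic DoWork call; preparation only advances here.
    void DoWork()
    {
        if( m_state == CueState::Preparing && --m_ticksLeft <= 0 )
            m_state = CueState::Prepared;
    }

    // Succeeds only while preparing or prepared, mirroring the rule above.
    bool Play()
    {
        if( m_state != CueState::Preparing && m_state != CueState::Prepared )
            return false;
        m_state = CueState::Playing;
        return true;
    }

    void Stop() { m_state = CueState::Stopped; }
    CueState State() const { return m_state; }

private:
    CueState m_state = CueState::Preparing;
    int m_ticksLeft;
};

// Runs up to maxFrames frames. Each frame does the periodic work (step 2) and
// plays the cue as soon as it is prepared (step 3).
// Returns the frame on which playback started, or -1 if it never did.
inline int PlayWhenPrepared( MockCue& cue, int maxFrames )
{
    for( int frame = 1; frame <= maxFrames; ++frame )
    {
        cue.DoWork();
        // ... render and update game state here instead of busy-waiting ...
        if( cue.State() == CueState::Prepared )
        {
            cue.Play();
            return frame;
        }
    }
    return -1;
}
```

The frame loop keeps doing other work while the cue prepares, which is the behavior Step 2 asks for. A game that syncs animation to the sound could start the animation on the same frame that PlayWhenPrepared starts playback.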

Conclusion

That's all there is to it. Identify which sounds, if any, should go into streaming wave banks, and which of those need zero latency. Then have the game do the proper setup work, and play away.