Using CODECs to Compress Wave Audio

Nigel Thompson

April 4, 1997

Abstract

Microsoft® Windows® 95 and Microsoft Windows NT® operating systems both include CODECs that can compress and decompress wave audio streams. Saving your wave audio data in compressed form can help with data storage requirements and reduce data transmission times when audio is sent over a network.

This article and its accompanying sample code show how to compress wave audio data using any of the CODECs installed on a Windows system. With slight alterations, the code can also be used to decompress compressed data or to perform data format conversions.

The sample code was developed using Microsoft Visual C++® version 5.0 and tested on the Windows 95 and Windows NT 4.0 operating systems.

Introduction

Windows 95 and, more recently, Windows NT both include the ability to handle compressed waveform audio and video data streams using installable CODECs.

A CODEC is a small piece of code used to COmpress or DECompress a data stream (hence CO-DEC). Most CODECs handle both compression and decompression. However, some CODECs are designed only to decompress so that proprietary data can be played on a system but the data format cannot be created on that system.

Although a CODEC can in principle be used to compress or decompress any stream of data, various CODECs have been designed to compress certain data types with higher compression ratios, better fidelity, or real-time performance. For example, the best way to get a high degree of video data compression may not give adequate results when applied to audio data and vice versa.

This article focuses on how to use a CODEC from your own code to compress audio data into one of the formats supported by the CODECs on your system. The primary reason for compressing audio data is to reduce the volume of data required to store a sound sequence. Smaller data volumes mean that less disk space is occupied by the sounds and that they can be transmitted faster over a modem or network link. If the data is compressed into one of the common formats supported by Windows systems, it can be played back directly without the need to decompress it manually—the system will use its own CODECs to decompress the data for playback.

What CODECs Are in My System?

Windows 95 and Windows NT come complete with a number of standard CODECs and can have others installed by applications that are installed on the system. For example, the DSP Group, Inc. TrueSpeech CODEC ships with Windows 95, so any program you write for Windows 95 will have this CODEC available (providing the user hasn't removed it or disabled it using the Control Panel). An example of a CODEC that might be installed later is the one that the Microsoft Network (MSN) software uses for its own audio data.

All the installed CODECs are managed by the Audio Compression Manager (ACM). We can find out what CODECs are installed, and what formats each of them supports, by querying the ACM from a simple program. You can also double-click Multimedia in the Control Panel, and then click the Advanced tab to see a list of the installed CODECs on your system.

Writing a simple command line program to query the ACM provides a good introduction to dealing with the ACM and examining what each CODEC it manages can do. The CAPS program that accompanies this article does just that—so let's have a look at the code and I'll explain what each step does as we go through it.

Let's begin by looking at what header files we need to include to be able to call the ACM application programming interfaces (APIs):

#include <windows.h>
#include <mmsystem.h>
#include <mmreg.h>  // Multimedia registration
#include <msacm.h>    // Audio Compression Manager
#include <stdio.h>

The mmsystem.h header defines most of the multimedia support for Windows but not the ACM API set or any of the manufacturer-specific defines. Mmreg.h contains definitions of wave format tags for various wave data types as designed by different manufacturers. It also contains definitions of structures (based on WAVEFORMATEX) that are used to manipulate the different wave data types. The msacm.h file contains the APIs, flags, and so on for the ACM.

The first thing we can do is perform some general queries of the ACM to determine its version number and get information such as how many drivers it is currently managing. Here's part of the code that queries the ACM:

    // Get the ACM version.
    DWORD dwACMVer = acmGetVersion();
    printf("ACM version %u.%.02u build %u",
            HIWORD(dwACMVer) >> 8,
            HIWORD(dwACMVer) & 0x00FF,
            LOWORD(dwACMVer));
    if (LOWORD(dwACMVer) == 0) printf(" (Retail)");
    printf("\n");

    // Show some ACM metrics.
    printf("ACM metrics:\n");

    DWORD dwCodecs = 0;
    MMRESULT mmr = acmMetrics(NULL, ACM_METRIC_COUNT_CODECS, &dwCodecs);
    if (mmr) {
        show_error(mmr);
    } else {
        printf("%lu codecs installed\n", dwCodecs);
    }
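
The code above reports failures through a small show_error helper that the article doesn't list; the same helper is used in the snippets that follow. A minimal sketch of such a helper might look like this (the particular error codes singled out and the message text are my own choices, not necessarily what the CAPS sample prints):

void show_error(MMRESULT mmr)
{
    // Translate a few common ACM error codes; anything else is just
    // reported numerically.
    switch (mmr) {
    case MMSYSERR_INVALPARAM:  printf("Error: invalid parameter\n"); break;
    case MMSYSERR_INVALHANDLE: printf("Error: invalid handle\n"); break;
    case MMSYSERR_NODRIVER:    printf("Error: no driver is available\n"); break;
    case ACMERR_NOTPOSSIBLE:   printf("Error: the requested operation is not possible\n"); break;
    default:                   printf("Error: MMRESULT code %u\n", mmr); break;
    }
}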

The CAPS sample queries the ACM for a few more metrics. With the sample files, you can look at the code in detail and run the application to see the results for yourself.
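
For example, additional counts can be requested through the same acmMetrics call. The queries below are my own guess at the kind of metrics CAPS reports; the exact set in the sample may differ:

    DWORD dwCount = 0;

    // How many format converters (same-format conversion drivers) are installed?
    mmr = acmMetrics(NULL, ACM_METRIC_COUNT_CONVERTERS, &dwCount);
    if (!mmr) printf("%lu converters installed\n", dwCount);

    // How many filter drivers are installed?
    mmr = acmMetrics(NULL, ACM_METRIC_COUNT_FILTERS, &dwCount);
    if (!mmr) printf("%lu filters installed\n", dwCount);

    // How many drivers are currently disabled?
    mmr = acmMetrics(NULL, ACM_METRIC_COUNT_DISABLED, &dwCount);
    if (!mmr) printf("%lu drivers disabled\n", dwCount);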

Having looked at the ACM, we can now ask it to enumerate all of the drivers currently in the system. As is common practice in Windows programming, the enumeration function we call uses a callback function in our code to report data for each enumerated device. Here's the call that begins the enumeration of all the devices currently managed by the ACM:

    // Enumerate the set of enabled drivers.
    printf("Enabled drivers:\n");
    mmr = acmDriverEnum(DriverEnumProc, 0, 0); 
    if (mmr) show_error(mmr);

Like many other multimedia functions, most of the ACM function calls return an MMRESULT value that indicates any error that might occur. A zero value indicates that the function call completed successfully. Now, let's see the enumeration callback function DriverEnumProc, which is called for each driver in the system:

BOOL CALLBACK DriverEnumProc(HACMDRIVERID hadid, DWORD dwInstance, DWORD fdwSupport)
{
    printf(" id: %8.8lxH", hadid);
    printf("  supports:\n");
    if (fdwSupport & ACMDRIVERDETAILS_SUPPORTF_ASYNC) printf("   async conversions\n");
    if (fdwSupport & ACMDRIVERDETAILS_SUPPORTF_CODEC) printf("   different format conversions\n");
    if (fdwSupport & ACMDRIVERDETAILS_SUPPORTF_CONVERTER) printf("   same format conversions\n");
    if (fdwSupport & ACMDRIVERDETAILS_SUPPORTF_FILTER) printf("   filtering\n");

    // Get some details.
    ACMDRIVERDETAILS dd;
    dd.cbStruct = sizeof(dd);
    MMRESULT mmr = acmDriverDetails(hadid, &dd, 0);
    if (mmr) {
        printf("   "); show_error(mmr);
    } else {
        printf("   Short name: %s\n", dd.szShortName);
        printf("   Long name:  %s\n", dd.szLongName);
        printf("   Copyright:  %s\n", dd.szCopyright);
        printf("   Licensing:  %s\n", dd.szLicensing);
        printf("   Features:   %s\n", dd.szFeatures);
        printf("   Supports %u formats\n", dd.cFormatTags);
        printf("   Supports %u filter formats\n", dd.cFilterTags);
    }

    // Open the driver.
    HACMDRIVER had = NULL;
    mmr = acmDriverOpen(&had, hadid, 0);
    if (mmr) {
        printf("   "); show_error(mmr);
    } else {
        DWORD dwSize = 0;
        mmr = acmMetrics(had, ACM_METRIC_MAX_SIZE_FORMAT, &dwSize);
        if (dwSize < sizeof(WAVEFORMATEX)) dwSize = sizeof(WAVEFORMATEX); // for MS-PCM
        WAVEFORMATEX* pwf = (WAVEFORMATEX*) malloc(dwSize);
        memset(pwf, 0, dwSize);
        pwf->cbSize = LOWORD(dwSize) - sizeof(WAVEFORMATEX);
        pwf->wFormatTag = WAVE_FORMAT_UNKNOWN;
        ACMFORMATDETAILS fd;
        memset(&fd, 0, sizeof(fd));
        fd.cbStruct = sizeof(fd);
        fd.pwfx = pwf;
        fd.cbwfx = dwSize;
        fd.dwFormatTag = WAVE_FORMAT_UNKNOWN;
        mmr = acmFormatEnum(had, &fd, FormatEnumProc, 0, 0);  
        if (mmr) {
            printf("   ");
            show_error(mmr);
        }
        free(pwf);

        acmDriverClose(had, 0);
    }

    return TRUE; // Continue enumeration.
}

The callback function is passed a set of flags that describe the type of support the driver has. Some drivers can operate asynchronously while others cannot. Some drivers can convert one wave data format to another (these are CODECs), and other drivers can only perform filtering operations in which the input and output formats are the same. Note that the ACM maintains this data, along with the text name of the driver, copyright information, and so on, so we can look at it without needing to load or open a specific driver. This is convenient, for example, when we want to present a list box from which the user can select a specific driver to use.

To obtain more detailed information about the capabilities of a driver, we must load the driver and open it, which is done by calling acmDriverOpen. Once the driver is open, we can request that it enumerate the wave data formats it supports. There is one minor complication here: although all wave format description structures are based on WAVEFORMATEX, many formats use an extended form of the structure to hold format-specific information. If we want to enumerate all the formats, we need some idea of how big a structure to allocate so that the driver can fill in the details. We can find the size of the largest structure required by calling acmMetrics with the ACM_METRIC_MAX_SIZE_FORMAT metric.

If you look at the code above, you'll see that I simply cast the result of the allocation to be a WAVEFORMATEX pointer. I'm not interested in any type-specific data here, just the common information so that this pointer does all I need it to.

Having allocated the structure, I can now call acmFormatEnum to enumerate the supported formats. Once again, we use a callback function to receive the enumerated format data:

BOOL CALLBACK FormatEnumProc(HACMDRIVERID hadid, LPACMFORMATDETAILS pafd, DWORD dwInstance, DWORD fdwSupport)
{
    printf("    %4.4lXH, %s\n", pafd->dwFormatTag, pafd->szFormat);


    return TRUE; // Continue enumerating.
}

As you can see, this one's trivial and just prints out some of the information about the format.

So, with the code above, you can query the ACM for all its drivers and find what formats are supported by each. I suggest you run the CAPS program now and see what your system currently has installed.

Using a Specific CODEC

Okay, so we've seen how to find out what CODECs are installed on your system—now let's see how we locate a specific CODEC and use it to compress some audio data. Let's look at the CONV sample, which compresses a simple wave data packet using one of the available CODECs. To keep the code as simple as possible, I implemented it in a console application and have made no attempt either to play the compressed data or to save it to a file. The sample code simply shows how to find the driver you need and get it to convert the data into a compressed form. The rest, as we say, is up to you.

Doing the Compression Two-Step

In an ideal world, compressing some data would simply be a case of saying to the system: "Here's some data, compress it in this format please." Unfortunately, the Windows programming world is far from ideal and, as usual, we get to do a lot of the grunt work ourselves. The first and most important problem to solve arises because any given CODEC may not be able to compress the data format you happen to be working in. For example, let's say that we record some data (perhaps a user speaking into a microphone) at 11.025 kHz, in 8-bit, mono pulse-code modulation (PCM), which is a format all Multimedia PCs can record in. We'd like to send this message to a relative via modem, so we want to compress it as much as possible to get the data size down. We choose to use the TrueSpeech CODEC, which comes with Windows and can achieve approximately 10:1 compression. The problem we immediately come up against is that the TrueSpeech CODEC can't handle 11.025 kHz, 8-bit, mono PCM data. It can only handle 8.000 kHz, 16-bit, mono PCM (or 8-bit in some cases). So we have to first convert the source data into an intermediate PCM format that the TrueSpeech CODEC can handle, and then get the TrueSpeech CODEC to convert the intermediate data to the final format that we need.

Converting one PCM format to another can be done using a different CODEC that also ships with Windows, so you need to use one CODEC to convert the data to the format the other CODEC can handle. Given that we know how to enumerate the CODECs and their supported formats, this looks reasonable.

There is, however, one further problem that I chose to ignore in my sample code and will leave for you to resolve. If we have a CODEC that will create the compressed format we want but supports several input formats, how do we choose the best intermediate format to use? Following Nigel's maxim, which states, "Always do the least amount of work possible," I chose to use the first enumerated PCM format the CODEC supports. While this is very easy to implement, it can lead to some loss of data fidelity. Consider a CODEC that has an almost lossless compression algorithm and can accept 8- or 16-bit PCM data at 11.025 or 22.050 kHz, and say we want to convert a high-fidelity sample recorded at 44.1 kHz, 16-bit stereo. We are trying to reduce the data volume, but not at the expense of quality. If we simply enumerate the formats the CODEC supports, the first one we find might well be 11.025 kHz, 8-bit mono. If we convert to this format first and then compress it, we will certainly have lost some quality because the intermediate format we chose was not good enough. If we had used 16-bit samples at 22.050 kHz, we would have done much better. Having warned you of this pitfall, let's look now at the CONV sample and see how it works.

The CONV Sample Application

The CONV sample works in four stages: it creates some sample waveform data, locates a suitable CODEC, converts the data to an intermediate form the CODEC can handle, and finally converts it to the required form. For simplicity, the source data is created programmatically rather than by a live recording or by reading a .wav file:

    // First we create a wave that might have been just recorded.
    // The format is 11.025 kHz, 8 bit mono PCM which is a recording
    // format available on all machines.
    // Our sample wave will be 1 second long and will be a sine wave 
    // of 1kHz, which is exactly 1,000 cycles.

    WAVEFORMATEX wfSrc;
    memset(&wfSrc, 0, sizeof(wfSrc));
    wfSrc.cbSize = 0;
    wfSrc.wFormatTag = WAVE_FORMAT_PCM; // PCM
    wfSrc.nChannels = 1; // Mono
    wfSrc.nSamplesPerSec = 11025; // 11.025 kHz
    wfSrc.wBitsPerSample = 8; // 8 bit
    wfSrc.nBlockAlign = wfSrc.nChannels * wfSrc.wBitsPerSample / 8;
    wfSrc.nAvgBytesPerSec = wfSrc.nSamplesPerSec * wfSrc.nBlockAlign;

    DWORD dwSrcSamples = wfSrc.nSamplesPerSec;
    BYTE* pSrcData = new BYTE [dwSrcSamples]; // 1 second duration
    BYTE* pData = pSrcData;
    double f = 1000.0;
    double pi = 4.0 * atan(1.0);
    double w = 2.0 * pi * f;
    for (DWORD dw = 0; dw < dwSrcSamples; dw++) {
        double t = (double) dw / (double) wfSrc.nSamplesPerSec; 
        *pData++ = (BYTE)(128.0 + 127.0 * sin(w * t));
    }

A WAVEFORMATEX structure is created to describe the source data format, and an 11.025 kHz, 8-bit, mono PCM wave of one-second duration is generated with some simple math.

The next step is to choose a format we'd like to convert the data to and locate a suitable CODEC:

    WORD wFormatTag = WAVE_FORMAT_DSPGROUP_TRUESPEECH;

    // Now we locate a CODEC that supports the destination format tag.
    HACMDRIVERID hadid = find_driver(wFormatTag);
    if (hadid == NULL) {
        printf("No driver found\n");
        exit(1);
    }
    printf("Driver found (hadid: %4.4lXH)\n", hadid);

The find_driver function enumerates all the drivers until it finds one that supports the given tag value (in this case WAVE_FORMAT_DSPGROUP_TRUESPEECH). I won't show the details because it's very similar to the enumeration code we looked at earlier. You can examine how it works for yourself later.
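
For reference, here is a rough approximation of such a helper. This is my own sketch rather than the sample's exact code: instead of enumerating each driver's formats, it uses acmFormatTagDetails to ask each driver whether it recognizes the requested format tag. The global variables and the callback name are mine.

static HACMDRIVERID g_hadidFound; // set by the callback below
static WORD g_wTagWanted;         // tag we are searching for

BOOL CALLBACK FindDriverEnumProc(HACMDRIVERID hadid, DWORD dwInstance, DWORD fdwSupport)
{
    HACMDRIVER had = NULL;
    if (acmDriverOpen(&had, hadid, 0) != 0) return TRUE; // skip this driver

    // Ask the driver whether it knows about the format tag we want.
    ACMFORMATTAGDETAILS ftd;
    memset(&ftd, 0, sizeof(ftd));
    ftd.cbStruct = sizeof(ftd);
    ftd.dwFormatTag = g_wTagWanted;
    if (acmFormatTagDetails(had, &ftd, ACM_FORMATTAGDETAILSF_FORMATTAG) == 0) {
        g_hadidFound = hadid; // this driver supports the tag
    }
    acmDriverClose(had, 0);
    return g_hadidFound == NULL; // stop enumerating once we have a match
}

HACMDRIVERID find_driver(WORD wFormatTag)
{
    g_hadidFound = NULL;
    g_wTagWanted = wFormatTag;
    acmDriverEnum(FindDriverEnumProc, 0, 0);
    return g_hadidFound;
}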

Having located the driver, we now need to construct a WAVEFORMATEX structure for the final compressed data format that the driver will generate and also one for the intermediate PCM format that the driver needs as input:

    // Get the details of the format.
    // Note: this is just the first of one or more possible formats for the given tag.
    WAVEFORMATEX* pwfDrv = get_driver_format(hadid, wFormatTag);
    if (pwfDrv == NULL) {
        printf("Error getting format info\n");
        exit(1);
    }
    printf("Driver format: %u bits, %lu samples per second\n",
         pwfDrv->wBitsPerSample, pwfDrv->nSamplesPerSec);

    // Get a PCM format tag the driver supports.
    // Note: we just pick the first supported PCM format which might not really
    // be the best choice.
    WAVEFORMATEX* pwfPCM = get_driver_format(hadid, WAVE_FORMAT_PCM);
    if (pwfPCM == NULL) {
        printf("Error getting PCM format info\n");
        exit(1);
    }
    printf("PCM format: %u bits, %lu samples per second\n",
         pwfPCM->wBitsPerSample, pwfPCM->nSamplesPerSec);

At the risk of repeating myself, beware that the get_driver_format function just enumerates for the first matching format—this might not give you the best possible quality.
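
For reference, here is a rough sketch of what a helper like get_driver_format can look like. Again, this is my own approximation rather than the sample's exact code; it simply copies the first format enumerated for the requested tag and returns it to the caller. A more careful version would examine every enumerated format and keep the one closest to the source data's sample rate and bit depth.

static WAVEFORMATEX* g_pwfFound; // set by the callback below

BOOL CALLBACK FirstFormatEnumProc(HACMDRIVERID hadid, LPACMFORMATDETAILS pafd,
                                  DWORD dwInstance, DWORD fdwSupport)
{
    // Take a copy of the first format offered and stop the enumeration.
    DWORD cb = sizeof(WAVEFORMATEX) + pafd->pwfx->cbSize;
    g_pwfFound = (WAVEFORMATEX*) malloc(cb);
    memcpy(g_pwfFound, pafd->pwfx, cb);
    return FALSE; // no need to see any more formats
}

WAVEFORMATEX* get_driver_format(HACMDRIVERID hadid, WORD wFormatTag)
{
    HACMDRIVER had = NULL;
    if (acmDriverOpen(&had, hadid, 0) != 0) return NULL;

    // Allocate a description buffer big enough for the driver's largest format.
    DWORD dwSize = 0;
    acmMetrics((HACMOBJ) had, ACM_METRIC_MAX_SIZE_FORMAT, &dwSize);
    if (dwSize < sizeof(WAVEFORMATEX)) dwSize = sizeof(WAVEFORMATEX);
    WAVEFORMATEX* pwf = (WAVEFORMATEX*) malloc(dwSize);
    memset(pwf, 0, dwSize);
    pwf->cbSize = (WORD)(dwSize - sizeof(WAVEFORMATEX));
    pwf->wFormatTag = wFormatTag;

    ACMFORMATDETAILS fd;
    memset(&fd, 0, sizeof(fd));
    fd.cbStruct = sizeof(fd);
    fd.pwfx = pwf;
    fd.cbwfx = dwSize;
    fd.dwFormatTag = wFormatTag;

    // Restrict the enumeration to formats that use the requested tag.
    g_pwfFound = NULL;
    acmFormatEnum(had, &fd, FirstFormatEnumProc, 0, ACM_FORMATENUMF_WFORMATTAG);

    free(pwf);
    acmDriverClose(had, 0);
    return g_pwfFound; // NULL if the driver has no format with this tag; caller frees
}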

Now we have WAVEFORMATEX structures built to describe the source format, the intermediate PCM format, and the final compressed format. It's time to start converting the data. Conversion is done by using what the ACM calls a stream. We open the stream, passing descriptions of the source and destination formats, and then ask the stream to convert the data from one format to the other.

In the case we'll look at here, the conversion is done synchronously and may take quite some time if the CODEC algorithm is complex. Some CODECs can work asynchronously, notifying you as things progress via a message to a window, a call to a callback function, or setting an event. The code here just gets the job done with the least fuss—but you do get to wait until it's complete. There is one other important point. As you'll see, when we open the conversion streams, we specify the ACM_STREAMOPENF_NONREALTIME flag. This is very important. If you omit this flag then some drivers (for example the TrueSpeech driver) will report error 512 (not possible). This error is telling you that the conversion you asked for cannot be done in real time. This isn't an issue in my sample, but it would be if you were trying to convert a lot of data at the same time you were playing it.

So, let's look now at the first conversion step, which converts the source format to the intermediate format:

    /////////////////////////////////////////////////////////////////////////////
    // Convert the source wave to the PCM format supported by the CODEC.
    // We use any driver that can do the PCM to PCM conversion.
    HACMSTREAM hstr = NULL;
    mmr = acmStreamOpen(&hstr,
                        NULL, // Any driver
                        &wfSrc, // Source format
                        pwfPCM, // Destination format
                        NULL, // No filter
                        NULL, // No callback
                        0, // Instance data (not used)
                        ACM_STREAMOPENF_NONREALTIME); // flags
    if (mmr) {
        printf("Failed to open a stream to do PCM to PCM conversion\n");
        exit(1);
    }

    // Allocate a buffer for the result of the conversion.
    DWORD dwSrcBytes = dwSrcSamples * wfSrc.wBitsPerSample / 8;
    DWORD dwDst1Samples = dwSrcSamples * pwfPCM->nSamplesPerSec / wfSrc.nSamplesPerSec;
    DWORD dwDst1Bytes = dwDst1Samples * pwfPCM->wBitsPerSample / 8;
    BYTE* pDst1Data = new BYTE [dwDst1Bytes];

    // Fill in the conversion info.
    ACMSTREAMHEADER strhdr;
    memset(&strhdr, 0, sizeof(strhdr));
    strhdr.cbStruct = sizeof(strhdr);
    strhdr.pbSrc = pSrcData; // The source data to convert
    strhdr.cbSrcLength = dwSrcBytes;
    strhdr.pbDst = pDst1Data;
    strhdr.cbDstLength = dwDst1Bytes;

    // Prep the header.
    mmr = acmStreamPrepareHeader(hstr, &strhdr, 0); 

    // Convert the data.
    printf("Converting to intermediate PCM format...\n");
    mmr = acmStreamConvert(hstr, &strhdr, 0);
    if (mmr) {
        printf("Failed to do PCM to PCM conversion\n");
        exit(1);
    }
    printf("Converted OK\n");

    // Close the stream.
    acmStreamClose(hstr, 0);

When the stream is opened, the second parameter is set to NULL, indicating that we will accept any driver to perform this conversion. The only complexity is computing how much buffer space we'll need for the output data. Because a PCM to PCM conversion involves no compression or decompression, the computation is straightforward.

You might note the call to acmStreamPrepareHeader, which gives the driver a chance to prepare for the conversion, for example by locking the source and destination buffers in memory before conversion begins. Strictly speaking, a prepared header should be unprepared with acmStreamUnprepareHeader before the buffers are freed, a step these fragments don't show.

The final step is to convert the intermediate format to the final compressed format:

    /////////////////////////////////////////////////////////////////////////////
    // Convert the intermediate PCM format to the final format.

    // Open the driver.
    HACMDRIVER had = NULL;
    mmr = acmDriverOpen(&had, hadid, 0);
    if (mmr) {
        printf("Failed to open driver\n");
        exit(1);
    }

    // Open the conversion stream.
    // Note the use of the ACM_STREAMOPENF_NONREALTIME flag. Without this
    // some software compressors will report error 512 - not possible.
    mmr = acmStreamOpen(&hstr,
                        had, // Driver handle
                        pwfPCM, // Source format
                        pwfDrv, // Destination format
                        NULL, // No filter
                        NULL, // No callback
                        0, // Instance data (not used)
                        ACM_STREAMOPENF_NONREALTIME); // Flags
    if (mmr) {
        printf("Failed to open a stream to do PCM to driver format conversion\n");
        exit(1);
    }

    // Allocate a buffer for the result of the conversion.
    // Compute the output buffer size based on the average byte rate
    // and add a bit for randomness.
    // The IMA_ADPCM driver fails the conversion without this extra space.
    DWORD dwDst2Bytes = pwfDrv->nAvgBytesPerSec * dwDst1Samples /
                         pwfPCM->nSamplesPerSec;
    dwDst2Bytes = dwDst2Bytes * 3 / 2; // add a little room
    BYTE* pDst2Data = new BYTE [dwDst2Bytes];

    // Fill in the conversion info.
    ACMSTREAMHEADER strhdr2;
    memset(&strhdr2, 0, sizeof(strhdr2));
    strhdr2.cbStruct = sizeof(strhdr2);
    strhdr2.pbSrc = pDst1Data; // the source data to convert
    strhdr2.cbSrcLength = dwDst1Bytes;
    strhdr2.pbDst = pDst2Data;
    strhdr2.cbDstLength = dwDst2Bytes;

    // Prep the header.
    mmr = acmStreamPrepareHeader(hstr, &strhdr2, 0); 

    // Convert the data.
    printf("Converting to final format...\n");
    mmr = acmStreamConvert(hstr, &strhdr2, 0);
    if (mmr) {
        printf("Failed to do PCM to driver format conversion\n");
        exit(1);
    }
    printf("Converted OK\n");

    // Close the stream and driver.
    mmr = acmStreamClose(hstr, 0);
    mmr = acmDriverClose(had, 0);

This is very similar to the PCM to PCM conversion, but in this case we supply the handle to the driver we want to use when we open the stream. Actually, we could supply NULL here too because we already ascertained that the driver exists, but supplying the handle helps the system avoid wasting time finding the driver for us.

Computing the buffer size for the compressed data is a little tricky and requires some slight guesswork. The nAvgBytesPerSec field of the WAVEFORMATEX structure indicates the average rate at which bytes are consumed during playback. We can use this to estimate how much space we need to store the compressed wave. For some drivers this value really is an average rather than a worst case, so I chose to add 50 percent to the buffer size. This works well in practice even if it is a little wasteful. Once the conversion is complete, the cbDstLengthUsed field of the ACMSTREAMHEADER structure contains the actual number of bytes used in the buffer. I used this to compute the compression ratio:

    // Show the conversion stats.
    printf("Source wave had %lu bytes\n", dwSrcBytes);
    printf("Converted wave has %lu bytes\n", strhdr2.cbDstLengthUsed);
    printf("Compression ratio is %f\n", (double) dwSrcBytes /
                         (double) strhdr2.cbDstLengthUsed);
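
As the abstract mentioned, much the same code can decompress data; essentially you swap the source and destination formats when the stream is opened. The fragment below is my own sketch rather than part of the CONV sample. It also shows acmStreamSize, which asks the ACM how large the destination buffer needs to be instead of guessing as I did above.

    // Decompression sketch (not part of the CONV sample): open a stream with
    // the compressed format as the source and the PCM format as the
    // destination, then convert exactly as before.
    HACMSTREAM hstrDec = NULL;
    mmr = acmStreamOpen(&hstrDec,
                        NULL,    // Let the ACM find a suitable driver
                        pwfDrv,  // Source: the compressed format
                        pwfPCM,  // Destination: a PCM format
                        NULL, NULL, 0,
                        ACM_STREAMOPENF_NONREALTIME);
    if (mmr == 0) {
        // Ask the ACM how big the destination buffer must be for our
        // compressed data rather than guessing at a size.
        DWORD dwDecBytes = 0;
        mmr = acmStreamSize(hstrDec, strhdr2.cbDstLengthUsed, &dwDecBytes,
                            ACM_STREAMSIZEF_SOURCE);

        // Allocate dwDecBytes bytes, fill in an ACMSTREAMHEADER with the
        // compressed data as the source, prepare it, and call
        // acmStreamConvert just as in the compression steps above.

        acmStreamClose(hstrDec, 0);
    }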

Summary

Compressing waveform data using the CODECs that ship with the Windows operating systems is easy to do and results in data that occupies less disk space and takes less time to transmit. If you have a proprietary compression format, you can create your own CODEC to install and use it in the same way I've shown here.

As usual, I'm happy to answer questions regarding this article. I can be contacted by email: nigel-t@msn.com.