Microsoft Corporation
September 1997
The convergence of desktop multimedia and the Internet presents tremendous opportunities for developers to reach broad audiences with compelling, media-rich content and applications. With the Microsoft® DirectX® version 5.0 set of application programming interfaces (APIs), Microsoft significantly expands its system-level APIs to deliver a unified, comprehensive solution for developers. This article details the application-level services of DirectX media, which provide rich support for media interaction and integration.
The Microsoft DirectX set of APIs offers a broad spectrum of services from chip level to cyberspace. In addition to low-level APIs that access hardware acceleration—called DirectX foundation—DirectX now includes DirectX media, a layer of high-level services including streaming, animation, and behaviors.
DirectX media is a family of application-level APIs and controls for multimedia that provides rich support for interaction and integration of different media types in order to develop online and digital media authoring applications.
DirectX media currently consists of the five APIs shown in Table 1.
Table 1. DirectX APIs
DirectX Media Service | Description |
Direct3D® Retained Mode | 3-D scene graph |
DirectPlay® | Multiuser player services |
DirectShow™ (formerly the ActiveMovie™ Software Development Kit [SDK]) | Media playback, streaming, and capture |
DirectAnimation™ (formerly ActiveX™ Animation) | Rich animation and interaction of diverse media that can integrate with Dynamic Hypertext Markup Language (DHTML) |
DirectModel | Three-dimensional (3-D) model support |
DirectX media has been developed to meet a set of related goals:
DirectX media is the result of the conceptual reorganization of DirectX version 5.0 into two levels: a system-layer DirectX foundation and an application-layer DirectX media.
The big-picture organization of DirectX services is shown in Figures 1 through 3.
Figure 1. DirectX architecture
DirectX media services use DirectX foundation. These services include Direct3D Retained Mode, DirectAnimation, DirectPlay and DirectShow. Support for Virtual Reality Markup/Modeling Language (VRML) is also provided in DirectX media.
Figure 2. DirectX media
DirectX foundation provides the basis for performance media on Microsoft Windows®–based computers through DirectDraw®, DirectInput®, DirectSound®, DirectSound3D, and Direct3D Immediate Mode.
Figure 3. DirectX foundation
Direct3D Retained Mode is a high-level 3-D scene graph manager that simplifies the building and animation of 3-D worlds and data.
New in DirectX version 5.0:
All access to Direct3D Retained Mode is through a small set of objects. Table 2 lists these objects with a brief description.
Table 2. Direct3D Retained Mode Objects
Object | Description |
Direct3DRMAnimation | Defines how a transformation will be modified, often in reference to a Direct3DRMFrame or Direct3DRMFrame2 object. You use it to animate position, orientation, and scaling of Direct3DRMVisual, Direct3DRMLight, and Direct3DRMViewport objects. |
Direct3DRMAnimationSet | Allows Direct3DRMAnimation objects to be grouped together |
Direct3DRMDevice | Represents the visual display destination for the renderer |
Direct3DRMDevice2 | Performs the same as the Direct3DRMDevice object but with enhanced control of transparency |
Direct3DRMFace | Represents a single polygon in a mesh |
Direct3DRMFrame | Positions objects within a scene and defines the positions and orientations of visual objects |
Direct3DRMFrame2 | Extends the Direct3DRMFrame object by enabling access to the frame axes, bounding boxes, and materials. Also supports ray picking |
Direct3DRMInterpolator | Stores actions and applies the actions to objects with automatic calculation of in-between values |
Direct3DRMLight | Defines one of five types of lights that are used to illuminate the visual objects in a scene |
Direct3DRMMaterial | Defines how a surface reflects light |
Direct3DRMMesh | A set of polygonal faces; can be used to manipulate groups of faces and vertices |
Direct3DRMMeshBuilder2 | Allows you to work with individual vertices and faces in a mesh |
Direct3DRMMeshBuilder | Allows you to work with individual vertices and faces in a mesh (older version, superseded by Direct3DRMMeshBuilder2) |
Direct3DRMObject | A base class used by all other Direct3D Retained-Mode objects; it has characteristics that are common to all objects |
Direct3DRMPickedArray | Identifies a visual object that corresponds to a given 2-D point |
Direct3DRMPicked2Array | Identifies a visual object corresponding to a given ray intersection |
Direct3DRMProgressiveMesh | A coarse base mesh together with records describing how to incrementally refine it. Allows a generalized level of detail to be set on the mesh and supports progressive download of the mesh from a remote source |
Direct3DRMShadow | Defines a shadow |
Direct3DRMTexture | A rectangular array of colored pixels |
Direct3DRMTexture2 | Same as the Direct3DRMTexture object except that resources can be loaded from files other than the currently executing file, textures can be created from images in memory, and MIP maps can be generated |
Direct3DRMUserVisual | Defined by an application to provide functionality not otherwise available in the system |
Direct3DRMViewport | Defines how the 3-D scene is rendered into a 2-D window |
Direct3DRMVisual | Anything that can be rendered in a scene. Visual objects need not be visible; for example, a frame can be added as a visual. |
Direct3DRMWrap | Calculates texture coordinates for a face or mesh |
An animation in Retained Mode is defined by a set of keys. A key is a time value associated with a scaling operation, an orientation, or a position. A Direct3DRMAnimation object defines how a transformation is modified according to the time value. The animation can be set to operate on a Direct3DRMFrame object, so it can be used to animate the position, orientation, and scaling of Direct3DRMVisual, Direct3DRMLight, and Direct3DRMViewport objects.
IDirect3DRMAnimation::AddPositionKey, IDirect3DRMAnimation::AddRotateKey, and IDirect3DRMAnimation::AddScaleKey are methods that each specify a time value whose units are arbitrary. If an application adds a position key with a time value of 99, for example, a new position key with a time value of 49 would fall roughly halfway between the (zero-based) beginning of the animation and the first position key.
Calling the IDirect3DRMAnimation::SetTime method drives the animation. This sets the visual object's transformation to the interpolated position, orientation, and scale of the nearby keys in the animation. As with the methods that add animation keys, the time value for IDirect3DRMAnimation::SetTime is an arbitrary value, based on the positions of keys the application has already added.
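The way a time value selects a point between neighboring keys can be sketched in a few lines of Python. This is only a conceptual model, not the Retained Mode API: the PositionTrack class and its method names are hypothetical, and plain linear interpolation between keys is an assumption made for simplicity.

```python
from bisect import bisect_right


def lerp(a, b, t):
    """Linearly interpolate between two equal-length tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


class PositionTrack:
    """Hypothetical stand-in for the position keys of one animation."""

    def __init__(self):
        self._keys = []  # (time, position) pairs, kept sorted by time

    def add_position_key(self, time, position):
        self._keys.append((time, position))
        self._keys.sort(key=lambda key: key[0])

    def position_at(self, time):
        """Return the position interpolated from the keys around `time`."""
        if not self._keys:
            raise ValueError("no keys have been added")
        times = [key[0] for key in self._keys]
        i = bisect_right(times, time)
        if i == 0:                      # before the first key
            return self._keys[0][1]
        if i == len(self._keys):        # at or after the last key
            return self._keys[-1][1]
        (t0, p0), (t1, p1) = self._keys[i - 1], self._keys[i]
        return lerp(p0, p1, (time - t0) / (t1 - t0))


track = PositionTrack()
track.add_position_key(0, (0.0, 0.0, 0.0))
track.add_position_key(100, (10.0, 0.0, 0.0))
print(track.position_at(50))  # (5.0, 0.0, 0.0)
```

Because the time units are arbitrary, only the ratio between key times matters: scaling every key time and every queried time by the same factor gives the same motion.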
A Direct3DRMAnimationSet object allows Direct3DRMAnimation objects to be grouped together. This allows all the animations in an animation set to share the same time parameter, simplifying the playback of complex articulated animation sequences. An application can add an animation to an animation set by using the IDirect3DRMAnimationSet::AddAnimation method, and it can remove one by using the IDirect3DRMAnimationSet::DeleteAnimation method. Calling the IDirect3DRMAnimationSet::SetTime method drives animation sets.
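The benefit of a shared time parameter can be illustrated with a small Python model. The Animation and AnimationSet classes below are hypothetical simplifications (each animation drives a single number between two values), not the Direct3DRMAnimationSet interface itself.

```python
class Animation:
    """Hypothetical animation that drives one value from `start` to `end`."""

    def __init__(self, start, end, duration):
        self.start, self.end, self.duration = start, end, duration
        self.value = start

    def set_time(self, time):
        # Clamp the normalized time to [0, 1], then interpolate.
        t = min(max(time / self.duration, 0.0), 1.0)
        self.value = self.start + (self.end - self.start) * t


class AnimationSet:
    """Groups animations so one call advances all of them together."""

    def __init__(self):
        self._animations = []

    def add_animation(self, animation):
        self._animations.append(animation)

    def delete_animation(self, animation):
        self._animations.remove(animation)

    def set_time(self, time):
        for animation in self._animations:
            animation.set_time(time)


elbow = Animation(0.0, 90.0, 10.0)   # joint angle in degrees
wrist = Animation(0.0, 30.0, 10.0)
arm = AnimationSet()
arm.add_animation(elbow)
arm.add_animation(wrist)
arm.set_time(5.0)
print(elbow.value, wrist.value)  # 45.0 15.0
```

A single set_time call keeps every joint of the articulated figure in step, which is the simplification the paragraph above describes.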
A mesh is a visual object that is made up of a set of polygonal faces. A mesh defines a set of vertices and a set of faces.
A progressive mesh is stored as a base mesh (a coarse version) and a set of records that are used to increasingly refine the mesh. This allows you to set the level of detail rendered for a mesh; it also allows progressive download from remote sources.
Using the methods of the IDirect3DRMProgressiveMesh interface, you can set the number of vertices or faces to render, and thereby control the render detail. You can also specify a minimum level of detail required for rendering. Normally, a progressive mesh is rendered once the base mesh is available, but with the IDirect3DRMProgressiveMesh::SetMinRenderDetail method you can specify that a greater level of detail is necessary before rendering. You can also build a Direct3DRMMesh object from a particular state of the progressive mesh using the IDirect3DRMProgressiveMesh::CreateMesh method.
You can load a progressive mesh from a file, resource, memory, or Uniform Resource Locator (URL). Loading can be done synchronously or asynchronously. You can check the status of a download with the IDirect3DRMProgressiveMesh::GetLoadStatus method, and terminate a download with the IDirect3DRMProgressiveMesh::Abort method. If loading is asynchronous, it is up to the application to use events through the IDirect3DRMProgressiveMesh::RegisterEvents and IDirect3DRMProgressiveMesh::GetLoadStatus methods to find out how the load is progressing.
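The interplay between progressive download, requested detail, and minimum render detail can be modeled roughly as follows. This Python sketch is not the IDirect3DRMProgressiveMesh interface: the class, its attributes, and the assumption that each refinement record adds exactly one vertex are all hypothetical.

```python
class ProgressiveMesh:
    """Hypothetical model: a base mesh plus records that each add one vertex."""

    def __init__(self, base_vertices, total_records):
        self.base_vertices = base_vertices
        self.total_records = total_records
        self.loaded_records = 0           # grows as the download proceeds
        self.requested_vertices = None    # None means "as detailed as possible"
        self.min_vertices = base_vertices  # default: render once the base is in

    def receive(self, records):
        """Simulate more refinement records arriving from the source."""
        self.loaded_records = min(self.total_records,
                                  self.loaded_records + records)

    def available_vertices(self):
        return self.base_vertices + self.loaded_records

    def rendered_vertices(self):
        """Vertex count to draw now, or 0 if the minimum detail is not loaded."""
        available = self.available_vertices()
        if available < self.min_vertices:
            return 0
        if self.requested_vertices is None:
            return available
        return max(self.base_vertices, min(self.requested_vertices, available))


mesh = ProgressiveMesh(base_vertices=100, total_records=900)
mesh.min_vertices = 400            # demand more than the base mesh
mesh.receive(200)
early = mesh.rendered_vertices()   # 0: only 300 of the required 400 are loaded
mesh.receive(300)
later = mesh.rendered_vertices()   # 600: everything loaded so far
mesh.requested_vertices = 450
capped = mesh.rendered_vertices()  # 450: application asked for less detail
```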
DirectPlay makes it easy to connect to games over the Internet, a modem link, or a network.
DirectPlay is a software interface that simplifies application access to communication services. DirectPlay has become a technology family that provides a way for applications to communicate with each other independently of the underlying transport, protocol, or online service; it also provides this independence for matchmaking servers, game servers, and billing.
Applications (especially games) can be more compelling if they can be played against real players, and the personal computer has richer connectivity options than any game platform in history. Instead of forcing the developer to deal with the differences that each connectivity solution represents, DirectPlay provides well-defined, generalized communication capabilities. DirectPlay shields developers from the underlying complexities of diverse connectivity implementations, freeing them to concentrate on producing a great application.
DirectPlay version 5.0 has a new interface, IDirectPlay3. This interface inherits directly from IDirectPlay2 and by default behaves as IDirectPlay2. All new functionality is enabled through new methods or new flags.
DirectPlay 5.0 includes numerous new features and improvements:
IDirectPlay3::EnumConnections enumerates the Connection Shortcuts available to the application. This method supersedes DirectPlayEnumerate.
IDirectPlay3::InitializeConnection initializes a DirectPlay connection. This method supersedes DirectPlayCreate. The new IDirectPlayLobby2::CreateCompoundAddress method creates Connection Shortcuts to pass to the InitializeConnection method.
IDirectPlay3::SecureOpen creates or joins a session on a machine that uses Microsoft Windows NT® LAN Manager (NTLM) security.
IDirectPlay3::CreateGroupInGroup, IDirectPlay3::AddGroupToGroup, IDirectPlay3::DeleteGroupFromGroup, and IDirectPlay3::EnumGroupsInGroup add richer group functionality and navigation when connected to a lobby server.
IDirectPlay3::SendChatMessage enables players to chat with other players connected to a lobby server.
IDirectPlay3::SetGroupConnectionSettings, IDirectPlay3::GetGroupConnectionSettings, and IDirectPlay3::StartSession enable synchronized application launching from a lobby server.
Password protection of sessions has been greatly improved. The new DPCREDENTIALS structure holds the user name and password to use when connecting to a secure server. The DPSECURITYDESC structure describes the security properties of a DirectPlay session instance.
An application can create multiple DirectPlay objects.
Guaranteed messaging is available for all service providers.
A new multicast server improves group messaging.
Support has been added for highly scalable client/server architecture applications.
The DirectPlay API is a network abstraction to which applications can be written. The API defines the functionality of the abstract DirectPlay network, and all the functionality is available to your application regardless of whether the actual underlying network supports it or not. When the underlying network does not support a function, DirectPlay contains all the code necessary to emulate it. Examples include group messaging and guaranteed messaging.
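The emulation idea can be sketched in Python for group messaging. Nothing here is DirectPlay code: Provider, its multicast flag, and send_to_group are hypothetical names that only illustrate how an abstraction layer can fall back to individual sends when the network lacks a native group send.

```python
class Provider:
    """Hypothetical service provider wrapping one kind of network."""

    def __init__(self, multicast):
        self.multicast = multicast  # can the network fan out a message itself?
        self.sends = []             # log of (destination, payload) pairs

    def send(self, destination, payload):
        self.sends.append((destination, payload))


def send_to_group(provider, group_name, members, payload):
    """Deliver `payload` to every member, emulating multicast if needed."""
    if provider.multicast:
        # The network handles the fan-out: one send addressed to the group.
        provider.send(group_name, payload)
    else:
        # Emulation: the abstraction layer sends one copy per member.
        for member in members:
            provider.send(member, payload)


native = Provider(multicast=True)
emulated = Provider(multicast=False)
for provider in (native, emulated):
    send_to_group(provider, "red-team", ["ann", "bob"], "go")
```

The caller uses the same function in both cases; only the traffic handed to the underlying network differs.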
The DirectPlay service provider architecture insulates the application from the underlying network it is running on. The application can query DirectPlay for specific capabilities of the underlying network, such as latency and bandwidth, and adjust its communications accordingly.
Figure 4. DirectPlay architecture
A DirectPlay session is a communications channel between several machines. Before an application can start communicating with other machines it must join a session. An application can do this in one of two ways: It can enumerate all the existing sessions on a network and join one of them, or it can create a new session and wait for other machines to join it. Once the application has joined a session, it can create a player and exchange messages with all the other players in the session.
Each session has one machine that is designated as the host. The host, the owner of the session, is the only machine that can change the properties of the session.
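The two ways of entering a session, and the special role of the host, can be modeled in a short Python sketch. The Network and Session classes and their method names are hypothetical and only mirror the rules described above; they are not the IDirectPlay3 interface.

```python
class Session:
    """Hypothetical session: a named channel owned by a host machine."""

    def __init__(self, name, host):
        self.name = name
        self.host = host
        self.machines = [host]
        self.properties = {}

    def join(self, machine):
        self.machines.append(machine)

    def set_property(self, machine, key, value):
        # Only the host, as owner of the session, may change its properties.
        if machine != self.host:
            raise PermissionError("only the host can change session properties")
        self.properties[key] = value


class Network:
    """Hypothetical network on which sessions can be created and found."""

    def __init__(self):
        self._sessions = {}

    def enum_sessions(self):
        return sorted(self._sessions)

    def create_session(self, name, host):
        session = Session(name, host)
        self._sessions[name] = session
        return session

    def join_session(self, name, machine):
        session = self._sessions[name]
        session.join(machine)
        return session


net = Network()
race = net.create_session("race", host="pc-a")  # pc-a creates and hosts
net.join_session("race", "pc-b")                # pc-b enumerates, then joins
race.set_property("pc-a", "max_players", 4)
try:
    race.set_property("pc-b", "max_players", 8)
    rejected = False
except PermissionError:
    rejected = True
```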
Figure 5 illustrates the DirectPlay session model: An application must join a session to communicate with other machines using DirectPlay.
Figure 5. The DirectPlay session model
The most basic entity within a DirectPlay session is a player. A player represents a logical object within the session that can send and receive messages. DirectPlay does not have any representation of a physical machine in the session. Each player is identified as being either a local player (one that exists on your machine) or a remote player (one that exists on another machine). Each machine must have at least one local player before it can start sending and receiving messages. Individual machines can have more than one local player but, within the context of a DirectPlay session, they are considered to be distinct entities.
When an application sends a message, it is always directed to another player—not another machine. The player can be another local player (in which case the message will not go out over the network) or a remote player. Similarly, when messages are received by an application they are always addressed to a specific (local) player and marked as coming from some other player (except system messages, which are always marked as coming from DPID_SYSMSG).
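Player-addressed messaging can be illustrated with the following Python sketch. The MessageRouter class is hypothetical, the join notification is an invented example of a system message, and the numeric value used for DPID_SYSMSG is a placeholder chosen for the sketch.

```python
DPID_SYSMSG = 0  # placeholder value for the sketch


class MessageRouter:
    """Hypothetical model of players exchanging messages in one session."""

    def __init__(self):
        self.machine_of = {}    # player id -> machine name
        self.inboxes = {}       # player id -> list of (from_id, payload)
        self.network_sends = 0  # messages that had to leave a machine
        self._next_id = 1

    def create_player(self, machine):
        player_id = self._next_id
        self._next_id += 1
        # Existing players are told about the newcomer by a system message.
        for inbox in self.inboxes.values():
            inbox.append((DPID_SYSMSG, f"player {player_id} joined"))
        self.machine_of[player_id] = machine
        self.inboxes[player_id] = []
        return player_id

    def send(self, from_id, to_id, payload):
        # Messages are addressed to players; the machine is looked up here.
        if self.machine_of[from_id] != self.machine_of[to_id]:
            self.network_sends += 1  # remote player: goes over the network
        self.inboxes[to_id].append((from_id, payload))


router = MessageRouter()
a = router.create_player("pc-1")
b = router.create_player("pc-1")   # second local player on the same machine
c = router.create_player("pc-2")
router.send(a, b, "hi")            # local delivery, no network traffic
router.send(a, c, "hello")         # remote delivery
```

Note that the two players on pc-1 are still distinct entities: each has its own identifier and its own inbox.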
Figure 6. Player and group structure
DirectPlay supports the concept of groups within a session. A group is a logical collection of players. By creating a group of players, an application can send a single message to the group and all the players in the group will receive a copy of the message. A group is the means by which the multicast capabilities of the network are exposed to the application.
Groups can also be used as a general means to organize players in a session. A player can belong to more than one group. Functions are provided for administering groups and their membership. Additional functions are also provided to associate names and data with individual groups as a convenience, but they are not necessary to use groups.
The DirectAnimation component of DirectX media provides an integrated, comprehensive API and run time with support for a diverse set of media types and a powerful time/event model for developing rich animation and interaction. And because DirectAnimation is integrated with Dynamic HTML, it is especially suited to adding compact animation effects to Web pages.
The DirectAnimation run-time library is part of the Microsoft Internet Explorer version 4.0 minimal install. This means that Internet Explorer 4.0 contains all the software necessary to view multimedia created with DirectAnimation.
The key features of the DirectAnimation API are:
Figure 7 shows the DirectAnimation architecture.
Figure 7. DirectAnimation architecture
The DirectAnimation multimedia controls provide an interface to some of the DirectAnimation library, which is also accessible directly. The DirectAnimation library in turn uses the DirectShow API, the DirectX foundation, and certain operating system services. "SG" stands for Structured Graphics control and "Seq" stands for the Sequencer control.
DirectAnimation is a COM API and an underlying run time, the functionality of which can be accessed in different ways by different user groups.
You can use DirectAnimation in the following ways:
Table 3 shows the typical ways different developers would access DirectAnimation.
Table 3. DirectAnimation Access
Developers | Access DirectAnimation through |
Creative professionals | DirectAnimation Client Controls |
Web-site builders | DirectAnimation Client Controls, DirectAnimation scripting |
Script writers | DirectAnimation scripting, DirectAnimation for Java |
Application developers | DirectAnimation for Java, DirectAnimation scripting |
Graphics-systems programmers | DirectAnimation through native COM, DirectX foundation, and DirectShow |
You can access DirectAnimation from JScript, VBScript, Visual Basic, and C++ through the scripting (COM) interfaces directly. You can also add DirectAnimation content to your Web pages without programming at all by using the DirectAnimation controls and setting parameters on these controls. Using the DirectAnimation controls directly, or using JScript or VBScript, allows you to describe inline animations with HTML. Such animations can integrate with Dynamic HTML by being windowless on the page (overlaying other elements such as text) or by driving the properties of other entities on the page. It is also possible to import HTML-rendered text and use it as a texture in an animation.
There is a special Java binding for DirectAnimation provided on top of the COM API that takes advantage of specific Java features. For example, operations are overloaded so that several COM methods that perform similar functions but use different parameter types are given the same name in Java.
To create presentations with DirectAnimation, you need the following:
The DirectAnimation multimedia controls (formerly called Multimedia DHTML controls) supply a scripting interface to some of the DirectAnimation API functions and libraries. These controls allow you to deliver impressive animation, image, sound and vector graphics content over the Web with low code overhead and without incurring long download times.
The DirectAnimation multimedia controls consist of:
In the following example, "btnOval" is a button that says, "This is a moving button." A DirectAnimation path control named "pthOval" targets the button, telling it to move along an oval-shaped path on the page.
<HTML>
<BODY>
<!-- The button that the path control moves around the page -->
<INPUT NAME=btnOval TYPE=BUTTON VALUE="This is a moving button"
STYLE="position:absolute;LEFT: 20; TOP: 80">
<!-- DirectAnimation path control that targets btnOval -->
<OBJECT ID="pthOval"
CLASSID = "CLSID:D7A7D7C3-D47F-11D0-89D3-00A0C90833E6">
<PARAM NAME="Target" VALUE="btnOval">
<PARAM NAME="Shape" VALUE="Oval(50,50,400,200)">
<PARAM NAME="AutoStart" VALUE="-1">
<PARAM NAME="Repeat" VALUE="-1">
<PARAM NAME="Duration" VALUE="10">
</OBJECT>
</BODY>
</HTML>
DirectShow (formerly the ActiveMovie SDK) is the media streaming architecture of DirectX media for controlling and processing streams of multimedia data.
DirectShow offers three core features:
DirectShow provides native support for the following formats:
DirectShow uses all DirectDraw and DirectSound hardware capabilities whenever possible. When no special DirectX hardware is available, DirectShow uses the graphics device interface (GDI) to draw video and the waveOut multimedia APIs to play back audio.
Features added to DirectShow since the ActiveMovie 1.0 SDK include:
At the heart of the DirectShow services is a modular system of pluggable components called "filters," arranged in a configuration called a "filter graph." A component called the "filter graph manager" oversees the connection of these filters and controls the stream's data flow.
To use the filter graph manager from an application, it is not necessary to know much about the underlying filter graphs. It is useful, however, to understand at least the basic principles of filter graphs if you ever want to configure your own filter graph rather than letting the filter graph manager configure it for you. A filter graph is composed of a collection of filters of different types. Most filters can be categorized into one of the following three types:
In addition to these three types, there are other kinds of filters. Examples include effect filters, which add effects without changing the data type, and parser filters, which understand the format of the source data and know how to read the correct bytes, create time stamps, and perform seeks.
For example, a filter graph that plays back MPEG-compressed video from a file would use the following filters:
Figure 8 shows such a filter graph.
Figure 8. A filter graph
It is possible for some filters to represent a combination of types. For example, a filter might be an audio renderer that also acts as a transform filter by passing through the video data. But typically, filters fit only one of these three types.
Filter graphs stream multimedia data through filters. In the media stream, each filter passes the media downstream to the next filter. Relative to a given filter, the upstream filter is the one that passes data to it, and the downstream filter is the next one in line to receive the data. This distinction is important because media flows downstream, but other information can travel upstream.
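The direction of flow in a filter graph can be sketched with a minimal Python model. It assumes the three usual filter categories are source, transform, and renderer; the Filter class, the upper-casing "decoder," and the "late" notification are hypothetical stand-ins and do not correspond to DirectShow interfaces.

```python
class Filter:
    """Hypothetical filter: processes a sample and hands it downstream."""

    def __init__(self, name, process):
        self.name = name
        self.process = process
        self.upstream = None
        self.downstream = None

    def connect(self, other):
        """Link this filter's output to `other` and return `other`."""
        self.downstream = other
        other.upstream = self
        return other

    def push(self, sample):
        # Media data travels downstream, one filter at a time.
        result = self.process(sample)
        if self.downstream is not None:
            self.downstream.push(result)

    def notify_upstream(self, message, log):
        # Control information (for example, "running late") travels upstream.
        log.append((self.name, message))
        if self.upstream is not None:
            self.upstream.notify_upstream(message, log)


rendered = []
source = Filter("source", lambda sample: sample)
decoder = Filter("decoder", lambda sample: sample.upper())  # stand-in decode
renderer = Filter("renderer", rendered.append)
source.connect(decoder).connect(renderer)

for chunk in ["frame1", "frame2"]:
    source.push(chunk)

log = []
renderer.notify_upstream("late", log)
```

After the loop, rendered holds the decoded samples in order, and log records the notification passing from the renderer back through the decoder to the source.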
There are three ways to use DirectShow:
The amount you must know about an underlying or supported technology depends on your task. For example, you will need to understand COM programming when using C or C++ to control DirectShow playback or create a filter. But you do not need to understand COM programming to use the ActiveMovie control.
DirectShow provides prebuilt filters as part of the DirectShow SDK. A prebuilt filter, supplied as binary code only, is one of the filters listed in the Filter Graph Editor when you choose Insert Filters from the Graph menu.
The DirectShow SDK provides the filters listed in Table 4.
Table 4. Prebuilt Filter List
ACM Wrapper | Analog Video CrossBar | Audio Capture |
Audio Renderer | AVI Compressor | AVI Decompressor |
AVI Draw | AVI MUX | AVI Splitter |
AVI/WAV File Source | Color Space Transform | DSound Audio Renderer |
DVD Navigator | DV MUX | DV Video Splitter |
DV Video Decoder | DV Video Encoder | File Source (async) |
File Source (URL) | File Stream Renderer | File Writer |
Full Screen Renderer | Indeo Video R4.1 Compression | Indeo Video R4.1 Decompression |
Indeo 4.2 Video Compression | Indeo 4.2 Video Decompression | Indeo 5.0 Audio Decompression |
Indeo 5.0 Video Compression | Indeo 5.0 Video Decompression | Infinite Pin Tee |
Internal Script Command Renderer | Line 21 Decoder | Lyric Parser |
MIDI Parser | MIDI Renderer | MPEG-1 Stream Splitter |
MPEG Audio Decoder | MPEG Video Decoder | Multi-File Parser |
Overlay Mixer | QuickTime Decompression | QuickTime Movie Parser |
SAMI (CC) Parser | TrueMotion Decompression | TV Tuner |
VFW Video Capture | VGA 16 Color Ditherer | Video Renderer |
WAVE Parser | WDM Video Capture |
A sample filter includes source code; you must build and register it before it will appear in the Filter Graph Editor:
When developers use multimedia streaming in their applications, it greatly reduces the amount of format-specific programming needed. Typically, an application that must obtain media data from a file or hardware source must know everything about the data format and the hardware device. The application must handle the connection, transfer of data, necessary data conversion, and the actual data rendering or file storage. Because each format and device is slightly different, this process is often complex and cumbersome. Multimedia streaming, on the other hand, automatically negotiates the transfer and conversion of data from the source to the application. The streaming interfaces provide a uniform and predictable method of data access and control, which makes it easy for an application to play back the data, regardless of its original source or format.
Figure 9 shows the basic object hierarchy used in multimedia streaming.
Figure 9. Multimedia streaming
The following steps show how to implement streaming from a hardware device to rendered playback:
There are three basic object types defined in the multimedia streaming architecture:
Multimedia by definition requires integration, and multimedia over the Internet requires even more. DirectX provides the first unified solution to take advantage of the cross-platform flexibility of the Internet and the powerful multimedia capabilities of the personal computer. With all DirectX services designed to work together with a single programming model, DirectX makes it easy to build innovation and ease of use into online and digital media-authoring applications, with benefits that go far beyond present-day technologies.
For additional information on DirectX technologies, visit the Microsoft DirectX Web site (http://www.microsoft.com/directx).
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.