Introduction to TAPI 3.0

[This is preliminary documentation and subject to change.]

What is TAPI 3.0?

As telephony and call control become more common on desktop computers, a general telephony interface is needed so that applications can access all the telephony options available on any machine. It is equally important that the media or data on a call be available to applications in a standard manner.

TAPI 3.0 is an architecture that provides simple and generic methods for making connections between two or more machines, and accessing any media streams involved in that connection. It abstracts call-control functionality to allow different, and seemingly incompatible, communication protocols to expose a common interface to applications.

IP Telephony is poised for explosive growth as organizations begin a historic shift from expensive and inflexible circuit-switched public telephone networks to intelligent, flexible, and inexpensive IP networks. In anticipation of this trend, Microsoft has created a robust computer telephony infrastructure, TAPI. Now in its third major version, TAPI is suitable for quick and easy development of IP Telephony applications.

Inside TAPI 3.0

TAPI 3.0 integrates multimedia stream control with legacy telephony. Additionally, it is an evolution of the TAPI 2.1 API to the COM model, allowing TAPI applications to be written in any language, such as Java™, C/C++ and the Microsoft® Visual Basic® programming system.

Besides supporting classic telephony providers, TAPI 3.0 supports standard H.323 conferencing and IP multicast conferencing. TAPI 3.0 utilizes the Windows® NT 5.0 Active Directory service to simplify deployment within an organization, and it supports quality of service (QoS) features to improve conference quality and network manageability.

There are four major components to TAPI 3.0: the TAPI 3.0 COM API, the TAPI Server, Telephony Service Providers, and Media Stream Providers.

In contrast to TAPI 2.1, the TAPI 3.0 API is implemented as a suite of Component Object Model (COM) objects. Moving TAPI to the object-oriented COM model allows component upgrades of TAPI features. It also allows developers to write TAPI-enabled applications in any language, such as Java, Visual Basic, or C/C++.

The TAPI Server process (TAPISRV.EXE) abstracts the TSPI (TAPI Service Provider Interface) from TAPI 3.0 and TAPI 2.1, which allows TAPI 2.1 Telephony Service Providers to be used with TAPI 3.0. It also maintains the internal state of TAPI.

Telephony Service Providers (TSPs) are responsible for resolving the protocol-independent call model of TAPI into protocol-specific call control mechanisms. TAPI 3.0 provides backward compatibility with TAPI 2.1 TSPs. Two IP Telephony service providers (and their associated MSPs) ship by default with TAPI 3.0: the H.323 TSP and the IP Multicast Conferencing TSP, which are discussed later in this document.

TAPI 3.0 provides a uniform way to access the media streams in a call, supporting the DirectShow™ API as the primary media stream handler. TAPI Media Stream Providers (MSPs) implement DirectShow interfaces for a particular TSP and are required for any telephony service that makes use of DirectShow streaming. Generic streams are handled by the application.
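
The division of labor between applications and service providers described above can be illustrated with a small model. The following Python sketch is not TAPI code: the class and function names are hypothetical, and the strings only describe what a provider of each kind might do. It shows the idea that the application issues one protocol-independent request and each provider resolves it in its own way.

```python
from abc import ABC, abstractmethod


class ServiceProvider(ABC):
    """Hypothetical stand-in for a TSP: maps a generic request to a protocol."""

    @abstractmethod
    def make_call(self, destination: str) -> str:
        """Return a description of the protocol-specific action taken."""


class H323Provider(ServiceProvider):
    def make_call(self, destination: str) -> str:
        # An H.323 call is set up point to point with the remote party.
        return f"H.323: send call setup to {destination}"


class MulticastConferenceProvider(ServiceProvider):
    def make_call(self, destination: str) -> str:
        # A multicast conference is joined rather than dialed.
        return f"IP multicast: join group {destination}"


def place_call(provider: ServiceProvider, destination: str) -> str:
    """Application-side code: identical regardless of the provider."""
    return provider.make_call(destination)


if __name__ == "__main__":
    print(place_call(H323Provider(), "host.example.com"))
    print(place_call(MulticastConferenceProvider(), "239.1.2.3"))
```

The application-side function never inspects the provider type, which mirrors how a protocol-independent call model keeps protocol details inside the provider.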

Call Control Model

There are five objects in the TAPI 3.0 API: TAPI, Address, Terminal, Call, and CallHub.

The TAPI object is the application's entry point to TAPI 3.0. This object represents all telephony resources to which the local computer has access, allowing an application to enumerate all local and remote addresses.

An Address object represents the origination or destination point for a call. Address capabilities, such as media and terminal support, can be retrieved from this object. An application can wait for a call on an Address object, or can create an outgoing call object from an Address object.

A Terminal object represents the sink, or renderer, at the termination or origination point of a connection. The Terminal object can map to hardware used for human interaction, such as a telephone or microphone, but can also be a file or any other device capable of receiving input or creating output.

The Call object represents a connection between the local address and one or more other addresses. This connection can be made directly or through a CallHub. The Call object can be imagined as a first-party view of a telephone call. All call control is done through the Call object. There is a Call object for each member of a CallHub.

The CallHub object represents a set of related calls. An application cannot create a CallHub object directly; CallHub objects are created indirectly when incoming calls are received through TAPI 3.0. Using a CallHub object, a user can enumerate the other participants in a call or conference and, because of the location-independent nature of COM, possibly perform call control on the remote Call objects associated with those users, subject to sufficient permissions.

To Place a Call
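
One plausible outgoing-call flow, inferred from the object descriptions above rather than taken from the TAPI reference, is: enumerate the addresses, choose one that supports the required media, create a call from it, select terminals, and connect. The Python sketch below models that flow with hypothetical classes (Tapi, Address, Terminal, Call) and hypothetical method names; it does not use the real COM interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Terminal:
    name: str    # e.g. "microphone"
    media: str   # media type the terminal handles


@dataclass
class Call:
    destination: str
    terminals: list = field(default_factory=list)
    state: str = "idle"

    def select_terminal(self, terminal: Terminal) -> None:
        self.terminals.append(terminal)

    def connect(self) -> None:
        # A real provider would signal the remote party here.
        self.state = "connected"


@dataclass
class Address:
    name: str
    media: set                                    # media types supported
    terminals: list = field(default_factory=list)

    def create_call(self, destination: str) -> Call:
        return Call(destination)


@dataclass
class Tapi:
    addresses: list

    def enumerate_addresses(self):
        return iter(self.addresses)


def place_audio_call(tapi: Tapi, destination: str) -> Call:
    # 1. Find the first address that supports audio.
    address = next(a for a in tapi.enumerate_addresses() if "audio" in a.media)
    # 2. Create an outgoing call object from that address.
    call = address.create_call(destination)
    # 3. Select the audio terminals the address offers.
    for terminal in address.terminals:
        if terminal.media == "audio":
            call.select_terminal(terminal)
    # 4. Connect the call.
    call.connect()
    return call


if __name__ == "__main__":
    mic = Terminal("microphone", "audio")
    tapi = Tapi([Address("modem line", {"data"}),
                 Address("H.323 line", {"audio", "video"}, [mic])])
    call = place_audio_call(tapi, "conference-room")
    print(call.state, [t.name for t in call.terminals])
```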

To Answer a Call
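
Answering can be sketched in the same spirit, assuming an application registers interest in an address and is notified when a call arrives. The Python model below is hypothetical throughout (the names register_for_calls and incoming, and the state strings, are illustrative only). It also shows a CallHub being created as a side effect of the incoming call rather than by the application.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Call:
    caller: str
    state: str = "offering"   # an incoming call that is not yet answered

    def answer(self) -> None:
        self.state = "connected"


@dataclass
class CallHub:
    calls: List[Call] = field(default_factory=list)


@dataclass
class Address:
    name: str
    listeners: List[Callable[[Call], None]] = field(default_factory=list)
    hubs: List[CallHub] = field(default_factory=list)

    def register_for_calls(self, callback: Callable[[Call], None]) -> None:
        """The application asks to be told about calls on this address."""
        self.listeners.append(callback)

    def incoming(self, caller: str) -> Call:
        """Simulate the provider reporting a new incoming call."""
        call = Call(caller)
        self.hubs.append(CallHub([call]))   # hub created indirectly
        for callback in self.listeners:
            callback(call)
        return call


if __name__ == "__main__":
    address = Address("H.323 line")
    address.register_for_calls(lambda call: call.answer())
    call = address.incoming("alice")
    print(call.caller, call.state, len(address.hubs))
```

An application that registers no callback leaves the call in the "offering" state, which is the toy equivalent of letting the phone ring.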

Media Streaming Model

The Windows® operating system provides an extensible framework, the DirectShow API, for efficient control and manipulation of streaming media. DirectShow, through its exposed COM interfaces, provides TAPI 3.0 with unified stream control.

At the heart of the DirectShow services is a modular system of pluggable components called filters, arranged in a configuration called a filter graph. A component called the filter graph manager oversees the connection of these filters and controls the stream's data flow. Each filter's capabilities are described by a number of special COM interfaces called pins. Each pin instance can consume or produce streaming data, such as digital audio.

While COM objects are usually exposed in user mode programs, the DirectShow streaming architecture includes an extension to the Windows driver model that allows the connection of media streams directly at the device driver level. The diagram below shows a simple PSTN-to-IP bridge: a 64 kbps voice stream from an ISDN line is compressed into a G.723 audio stream and passed to an RTP payload handler, to be sent out over the network.
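
The bridge described above can be modeled as a chain of connected filters. The Python sketch below is a toy model, not DirectShow code: Filter and FilterGraph are hypothetical classes, the "encoder" only emits a placeholder frame instead of compressing audio, and the frame sizes assume 30 ms frames and the 6.3 kbit/s rate of G.723.1 (24 bytes per frame), which the text does not specify.

```python
FRAME_MS = 30
PCM_FRAME_BYTES = 64000 // 8 * FRAME_MS // 1000   # 240 bytes per 30 ms at 64 kbps
G723_FRAME_BYTES = 24                              # assumed 6.3 kbit/s frame size
RTP_HEADER_BYTES = 12                              # fixed RTP header length


class Filter:
    """Hypothetical filter with one input pin and one output pin."""

    def __init__(self, name, transform):
        self.name = name
        self.transform = transform   # function applied to each sample
        self.downstream = None       # filter connected to the output pin

    def connect(self, other):
        self.downstream = other

    def push(self, sample):
        result = self.transform(sample)
        if self.downstream is None:
            return result
        return self.downstream.push(result)


class FilterGraph:
    """Connects filters in order and feeds samples into the first one."""

    def __init__(self, *filters):
        self.filters = list(filters)
        for upstream, downstream in zip(filters, filters[1:]):
            upstream.connect(downstream)

    def run(self, sample):
        return self.filters[0].push(sample)


def build_bridge():
    source = Filter("ISDN source", lambda pcm: pcm)
    # Placeholder: a real encoder would compress the PCM samples.
    encoder = Filter("G.723 encoder", lambda pcm: bytes(G723_FRAME_BYTES))
    # Placeholder header of zero bytes in front of the encoded frame.
    rtp = Filter("RTP payload handler", lambda f: bytes(RTP_HEADER_BYTES) + f)
    return FilterGraph(source, encoder, rtp)


if __name__ == "__main__":
    packet = build_bridge().run(bytes(PCM_FRAME_BYTES))
    print(PCM_FRAME_BYTES, "bytes in,", len(packet), "bytes out")
```

Under these assumptions each 240-byte PCM frame leaves the graph as a 36-byte packet, which gives a feel for why compressing before transmission matters on an IP network.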

These high-performance streaming extensions to the Windows driver model avoid user-to-kernel mode transitions, and allow efficient routing of data streams between different hardware components at the device driver level. Each kernel mode filter is mirrored by a corresponding user mode proxy that facilitates connection setup and can be used to control hardware-specific features.

DirectShow network filters extend the streaming architecture to machines connected on an IP network. The Real-time Transport Protocol (RTP), designed to carry real-time data over connectionless networks, transports TAPI media streams and provides appropriate time stamp information. TAPI 3.0 includes a kernel mode RTP network filter.
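
To make the time stamp information concrete, the following Python sketch packs and unpacks the 12-byte fixed RTP header (version, marker, payload type, sequence number, timestamp, and synchronization source). It illustrates the wire format only and says nothing about how the TAPI 3.0 RTP filter is implemented. The helper names are hypothetical, and the example values assume payload type 4 (G723 in the standard RTP audio profile) with an 8 kHz clock.

```python
import struct

RTP_VERSION = 2
HEADER_FORMAT = "!BBHII"   # network byte order, 12 bytes in total


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build a fixed RTP header with no padding, extension, or CSRC list."""
    byte0 = RTP_VERSION << 6                              # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)    # M bit + 7-bit PT
    return struct.pack(HEADER_FORMAT, byte0, byte1,
                       sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Parse the first 12 bytes of an RTP packet into a dictionary."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack(
        HEADER_FORMAT, data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    # Two consecutive 30 ms frames at an 8 kHz clock: timestamp advances by 240.
    first = pack_rtp_header(4, sequence=1, timestamp=0, ssrc=0x12345678)
    second = pack_rtp_header(4, sequence=2, timestamp=240, ssrc=0x12345678)
    print(unpack_rtp_header(first))
    print(unpack_rtp_header(second))
```

A receiver uses the sequence number to detect loss and reordering, and the timestamp to schedule playback, which is what allows real-time media to be carried over a connectionless network.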

TAPI 3.0 utilizes this technology to present a unified access method for the media streams in multimedia calls. Applications can route these streams by manipulating corresponding filter graphs; they can also easily connect streams from multiple calls for bridging and conferencing capabilities.