November 1998
Windows NT 5.0 Brings You New Telephony Development Features with TAPI 3.0
Download Nov98Tapi3.exe (13KB)
Michelle Quinton is the development lead for TAPI 3.0. She often can be seen walking her two Dalmatians around Microsoft's campus.
TAPI 3.0, the next version of Microsoft's telephony API, is scheduled to be released with Windows NT® 5.0. It differs from TAPI 2.1 in several ways. First, it is a set of COM interfaces, rather than a procedural C API. This allows developers to write TAPI 3.0 programs in Visual Basic® and Java, as well as C/C++. Second, it adds media control to the API, so you can handle the recording and playback of voice messages. Finally, TAPI 3.0 has added support for IP, which is becoming increasingly important in the telephony world. This article describes these new features and includes a short TAPI 3.0 sample program.
To try TAPI 3.0 out, download the beta of Windows NT 5.0 from http://www.microsoft.com/NTServer/Basics/Future/WindowsNT5/default.asp. As with all betas, TAPI 3.0 is subject to change until it ships with Windows NT 5.0.
TAPI 3.0 Architecture
New Features of TAPI 3.0
When the application calls this method, it supplies both the destination address to call and the address type that describes the format of the destination address. Any method in TAPI 3.0 that takes a destination address string as a parameter also takes an address type.

Applications need to know what protocols a device supports, and the API provides a method for discovering this. Each TAPI line device supports a single protocol. Figure 2 shows the protocols that are currently defined for TAPI 3.0: PSTN (Public Switched Telephone Network), H.323, and multicast conferencing. If an application is only interested in H.323, it can easily find the device to use.

TAPI 3.0 allows a call to support multiple media modes simultaneously. In previous versions of TAPI, a call could only have one media mode, and both TSPs and applications used the LINEMEDIAMODE_XXX constants. In TAPI 3.0, applications use the new TAPIMEDIAMODE_XXX constants, and a TSP can specify that a call has more than one media mode. For example, if a call has both audio and video on it, the TSP can report both LINEMEDIAMODE_INTERACTIVEVOICE and LINEMEDIAMODE_VIDEO in the LINECALLINFO structure.

Call center support was introduced with TAPI 2.0, and has been greatly enhanced in version 3 with standard call center features such as agents, sessions, groups, and queues. I won't be discussing the call center features in this article. Please refer to the TAPI 3.0 documentation in MSDN for further information.
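Reporting several media modes on one call amounts to treating the media mode as a bit field. The following is a minimal, portable sketch of that idea rather than code from the sample. The constant names and their values are placeholders invented for illustration; the real constants come from the TAPI headers and may have different values.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder flag values for illustration only. The real media mode
// constants are defined in the TAPI headers and may differ.
constexpr std::uint32_t kMediaAudioSketch = 0x1;
constexpr std::uint32_t kMediaVideoSketch = 0x2;
constexpr std::uint32_t kMediaDataSketch  = 0x4;

// Returns true when every mode in `wanted` is present in `callModes`.
inline bool HasAllModes(std::uint32_t callModes, std::uint32_t wanted) {
    assert(wanted != 0 && "ask for at least one media mode");
    return (callModes & wanted) == wanted;
}

// Returns true when at least one mode in `wanted` is present in `callModes`.
inline bool HasAnyMode(std::uint32_t callModes, std::uint32_t wanted) {
    return (callModes & wanted) != 0;
}
```

A call carrying audio and video would be described as kMediaAudioSketch | kMediaVideoSketch, and an application that only cares about audio would test for that single bit.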
Media Control
The COM Interfaces
TAPI 3.0 Object Model
Figure 3 TAPI 3.0 Core Objects
Address objects own Call objects. Call objects correspond directly to call handles in previous versions of TAPI. The Call object also is the first-party view of a connection, as described earlier. Call objects are either created by the application or generated by the TSP. Generally, an application creates outgoing calls and the TSP creates incoming calls.
Address objects also own Terminal objects. Terminals let the application select what media and media devices to use on a call. On a computer with a sound card, there would be Terminal objects that correspond to the sound card's microphone and speakers. The application can select those terminals on a call to indicate that it wants those devices to be the source and sink of media on that call. Note that while Terminal objects are similar in concept to terminals in previous versions of TAPI, they are not derived from those terminals. So if a TSP supports the TAPI 2.x concept of a terminal, this is not exposed in TAPI 3.0 as a Terminal object.
TAPI Object Interfaces
RegisterCallNotifications is similar to lineOpen in TAPI 2.x. In this method, pAddress specifies the Address object on which the application wants call-related events reported. Remember that Address objects own Call objects, so applications can't be informed of a call unless they are listening on an address. If the application wants to receive call events on more than one address, it must call this method for each address.
fMonitor and fOwner specify whether the application wants to monitor or own incoming calls. Even if these are both VARIANT_FALSE, the application will receive call events about outgoing calls it makes on that address. If fMonitor is VARIANT_TRUE, the application will receive call events about all calls on that address, but will not own any of the calls. If fOwner is VARIANT_TRUE, the application will receive call events only about calls that it owns. This also indicates to TAPI that it wants to own incoming calls. Both fMonitor and fOwner may be VARIANT_TRUE at the same time. This tells TAPI that the application wants to own incoming calls, and it wants to see events about any call on that address, whether or not it owns the call.

lMediaTypes specifies the media modes of calls in which the application is interested. This parameter is only relevant when fOwner is VARIANT_TRUE. Basically, it tells TAPI that the application is only interested in owning calls of the specified media type. The TAPI 3.0 media types are listed in Figure 2.

lCallbackInstance is an application-defined value that is returned to the application on any event that is fired as a result of this call to RegisterCallNotifications. An application can call RegisterCallNotifications multiple times for the same Address object, and use this value to distinguish between events.

TAPI 3.0 returns a unique registration value in plRegister. The application uses this value to stop receiving call events by passing it as the parameter to the ITTAPI::UnregisterNotification method.
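The interaction between fMonitor and fOwner is easy to misread, so here is a small portable sketch that restates the rules above as a truth function. It models only the behavior described in this article and does not call TAPI. The CallView structure and both helper functions are hypothetical names introduced for illustration.

```cpp
// Hypothetical description of one call as seen from this application.
struct CallView {
    bool ownedByApp;  // the application owns the call
    bool madeByApp;   // outgoing call the application made on this address
};

// Hypothetical helper: does the application receive call events for `call`,
// given the flags it passed when registering on the address?
inline bool ReceivesCallEvents(bool fMonitor, bool fOwner, const CallView& call) {
    if (call.madeByApp) return true;     // reported even when both flags are false
    if (fMonitor) return true;           // monitors see every call on the address
    if (fOwner) return call.ownedByApp;  // owners see only the calls they own
    return false;
}

// Hypothetical helper: does the registration ask to own incoming calls?
inline bool WantsIncomingOwnership(bool fOwner) {
    return fOwner;
}
```

With both flags set, the first two branches cover every call on the address, which matches the combined behavior described above.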
Address Object Interfaces
pDestAddress is the destination address string, such as a phone number or an email address. lAddressType is the address type of pDestAddress. So if pDestAddress is in the format of a phone number, lAddressType will be LINEADDRESSTYPE_PHONENUMBER. The created Call object is returned in ppCall.
The CreateCall method simply creates a Call object. After the call is created, the application still needs to set up and connect the call before a connection is actually made. I will discuss this further in the Call object overview.

The Address object also supports the ITAddressCapabilities interface, which is used to obtain detailed information on the capabilities of the address. The two main methods on the ITAddressCapabilities interface are get_AddressCapability and get_AddressCapabilityString.
ADDRESS_CAPABILITY and ADDRESS_CAPABILITY_STRING are enums that specify which capabilities the application is interested in querying. An example of an ADDRESS_CAPABILITY is AC_ADDRESSTYPES, which requests the address types supported by the Address object. An example of an ADDRESS_CAPABILITY_STRING is ACS_PROTOCOL, which requests the protocol that the Address object supports. The protocol is a GUID, but it is passed in the interface in string format. Many capabilities can be queried through these two methods.
ITMediaSupport, another Address object interface, is used to describe the media supported by the address. The application can obtain the TAPIMEDIAMODEs supported, and learn whether the Address object supports media streaming through DirectShow. Finally, the Address object has the ITTerminalSupport interface, which lets applications find out which terminals can be used on calls that are owned by this address.
Terminal Object Interfaces
Call Object Interfaces
The Call object also supports the ITCallInfo interface. As the Call object corresponds directly to a TAPI 2.x call handle, this interface provides methods to access fields in the related TAPI 2.x LINECALLINFO structure. It also lets the application set the fields in the related TAPI 2.x LINECALLPARAMS structure, which is used when setting up an outgoing call. For example, this interface has a method called put_BearerMode that lets the application set the desired LINEBEARERMODE_XXX before making a call. The method get_BearerMode retrieves the LINEBEARERMODE_XXX being used on the call. There are about 50 methods on ITCallInfo, so I won't cover all of them here. But if you are familiar with TAPI 2.x and are looking for something that you found previously in LINECALLINFO or LINECALLPARAMS, ITCallInfo is the place to look.
To place an outgoing call, you create a call, select terminals, then call Connect. The Connect method on ITBasicCallControl occasionally causes confusion because of the parameter it uses. The method looks like this:
fSync tells TAPI 3.0 when the application wants the method to return. It either can return directly after the call request is made, or it can wait until the call is in the CS_CONNECTED state. If fSync is VARIANT_TRUE, Connect will return when the call is connected, disconnected, or times out. Setting fSync to VARIANT_TRUE should only be done in very simple applications that don't want to register a callback to wait for a state change. Almost all applications should set fSync to VARIANT_FALSE and monitor the call state of the call through the event mechanism.
CallHub Object Interfaces
A Sample TAPI 3.0 Program
This single Event method is used to fire all TAPI 3.0 events to the application. callnot.cpp implements Event very simply: it posts a message to the application's UI thread to handle the event. A multithreaded apartment model application should do as little as possible on the thread in which Event is called and should not call back into TAPI 3.0, since this can cause a deadlock situation. Also note that Event calls AddRef so that the Event object is not deleted when Event returns.
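The pattern of taking a reference, handing the event to another thread, and returning immediately can be sketched without any Windows code. In the sketch below, std::shared_ptr stands in for COM's AddRef and Release, and a mutex-protected queue stands in for posting a message to the UI thread. SketchEvent and EventQueue are hypothetical names and are not part of the sample or of TAPI.

```cpp
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for a TAPI event object. In COM its lifetime would be
// managed with AddRef/Release; std::shared_ptr plays that role here.
struct SketchEvent {
    int type;  // placeholder for the kind of event (TAPI_EVENT in the article)
};

// Hypothetical stand-in for the UI thread's message queue.
class EventQueue {
public:
    // Called on the callback thread: keep a reference and return at once.
    void Post(std::shared_ptr<SketchEvent> ev) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(ev));  // the queue now holds its own reference
    }

    // Called on the UI thread: take the next event, or nullptr if none waits.
    std::shared_ptr<SketchEvent> Take() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pending_.empty()) return nullptr;
        std::shared_ptr<SketchEvent> ev = std::move(pending_.front());
        pending_.pop();
        return ev;  // the reference is dropped when the caller lets go of it
    }

private:
    std::mutex mutex_;
    std::queue<std::shared_ptr<SketchEvent>> pending_;
};
```

The callback thread does nothing except Post, so it never calls back into the telephony layer, and the event stays alive until the UI thread has finished with it.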
OnTapiEvent is the function that eventually gets called to handle TAPI events. The TAPI_EVENT enum defines the events that can be fired. For each TAPI_EVENT, an event interface is defined. The Event object, pEvent, which is passed in Event, supports this corresponding interface. For example, the event TE_CALLNOTIFICATION supports the ITCallNotificationEvent interface. In OnTapiEvent, the application only handles the TE_CALLNOTIFICATION and TE_CALLSTATE events. All other events are ignored. Also, notice the call to Release at the end of the function. This corresponds to the call to AddRef made when posting the event to the UI thread.

Going back to RegisterTapiEventInterface, you can see that after the CTAPIEventNotification object is created, the application finds the ITTAPIEventNotification connection point and registers the callback object. After this registration, the application calls ListenOnAddresses.

The ListenOnAddresses function starts by calling gpTapi->EnumerateAddresses. TAPI returns an enumerator of all Address objects present on the system. The application then loops through all the addresses by calling pEnumAddress->Next, and checks to see if the address supports TAPIMEDIAMODE_AUDIO. If it does, the application calls another function, ListenOnThisAddress, to start listening for calls on that Address object. Of course, there are many other capabilities that an application may want to query before using the Address object. As discussed previously, the ITAddressCapabilities interface provides lots of information about the address.

The function ListenOnThisAddress first queries for the ITMediaSupport interface, then obtains all the TAPIMEDIAMODEs supported by the address. ITMediaSupport::get_MediaTypes returns a long, which is actually a bit field of the supported TAPIMEDIAMODEs. That long is then used in RegisterCallNotifications to tell TAPI which media the application would like to listen for.
I already know that the address supports audio because I checked this in ListenOnAddresses. For this application, I also want to listen for video, if available. I could have specifically checked for video, then called RegisterCallNotifications with TAPIMEDIAMODE_AUDIO|TAPIMEDIAMODE_VIDEO, if it was supported. The application instead checks for one specific mode in ListenOnAddresses and retrieves the full set in ListenOnThisAddress, to demonstrate both ways of finding the supported media modes.

Also, notice that the application keeps a global array of registration instances from RegisterCallNotifications in gplRegistrationInstances. As discussed previously, this value is used to stop listening for calls. The implementation in this application is slightly awkward. It keeps an array with no way to map the registration instance back to an address, so there is no way for the application to selectively unregister for notifications. Usually an application would keep this value associated with an address in case it decided to stop listening on that address.

After it starts to listen for calls, the application is finished with its initialization and simply waits for something to happen. The user interface (see Figure 5) is very simple. It lets the user answer calls, disconnect calls, and exit. Let's see what happens when a call comes in.
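One way to keep each registration value associated with its address is a small lookup table. The sketch below is portable C++ and is not taken from the sample. RegistrationTable is a hypothetical class, and keying on a display name is an assumption made to keep the example short; a real program might key on the Address object itself.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical bookkeeping: remember the registration value returned for each
// address so that a single address can be unregistered later.
class RegistrationTable {
public:
    // Records (or replaces) the registration value for an address.
    void Remember(const std::string& addressName, long registration) {
        byAddress_[addressName] = registration;
    }

    // Removes the entry and writes its registration value to *registration.
    // Returns false, leaving *registration untouched, when the address was
    // never registered.
    bool Forget(const std::string& addressName, long* registration) {
        auto it = byAddress_.find(addressName);
        if (it == byAddress_.end()) return false;
        *registration = it->second;
        byAddress_.erase(it);
        return true;
    }

    std::size_t Count() const { return byAddress_.size(); }

private:
    std::map<std::string, long> byAddress_;
};
```

The value written by Forget is the one the application would pass to the unregister method for that address.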
Figure 5 INCOMING UI
From this interface, the application can obtain the call about which it's being notified, the CALL_NOTIFICATION_EVENT (which tells the application if it's the owner or just a monitor of this call), and the callback instance that was given to TAPI in the call to RegisterCallNotifications.
The INCOMING application checks the CALL_NOTIFICATION_EVENT to make sure that it is the owner of the call. If it isn't the owner, it ignores the call and returns. It is important to note that at this point, there is no reference to this call. The application has nothing else to clean up related to the call it's ignoring.

If the application is the owner of the call, it retrieves the call, saves it in its global pointer, and returns. Actually, using a global variable in this way is bad; if there were already a call there, it would be overwritten. To simplify the demonstration, this application assumes one call at a time.

At this point, the Answer button in the application has not been enabled because the application hasn't yet received an event that indicates an offering state. This is similar to the LINE_APPNEWCALL message in TAPI 2.x. The notification event lets the application know about the existence of a call, but the application shouldn't do anything with it until it gets a call state message. A notification is always followed immediately by a call state message.

Next, the application will receive a TE_CALLSTATE event. This is eventually handled in HandleCallStateEvent, which queries the event for the ITCallStateEvent interface and looks like this:
The ITCallStateEvent interface gives the application the call, the new call state, the cause for the call state change, and the callback instance. HandleCallStateEvent first checks the new call state, and if it's not CS_OFFERING, CS_DISCONNECTED, or CS_CONNECTED, it ignores it.
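The filtering that HandleCallStateEvent performs can be pictured as a mapping from call state to a user interface reaction. The sketch below is a portable illustration and not the sample's code. Both enums are stand-ins invented here (the real call state enum lives in the TAPI headers and has its own members), and the reaction chosen for the disconnected state is an assumption based on the cleanup the sample is described as doing.

```cpp
// Stand-in for the call states named in the article. The real enum is defined
// in the TAPI headers; only the states needed for the illustration appear here.
enum class SketchCallState { Idle, InProgress, Offering, Connected, Disconnected };

// Hypothetical UI reactions for the three states the sample handles.
enum class UiAction { None, EnableAnswer, ShowVideoWindows, ResetForNextCall };

// Maps a new call state to the reaction the application should take.
inline UiAction ActionFor(SketchCallState state) {
    switch (state) {
        case SketchCallState::Offering:     return UiAction::EnableAnswer;
        case SketchCallState::Connected:    return UiAction::ShowVideoWindows;
        case SketchCallState::Disconnected: return UiAction::ResetForNextCall;  // assumed cleanup step
        default:                            return UiAction::None;  // every other state is ignored
    }
}
```

Keeping the mapping in one function makes it obvious which states the application deliberately ignores.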
The Answer button finally gets enabled when CS_OFFERING is handled, although the application waits for the user to press the button before actually answering the call. When CS_CONNECTED calls are handled, the MakeWindowsVisible function is called. By default, video windows are hidden in TAPI 3.0. This gives the application time to place the windows and set their properties before showing them. The best time to set these properties is when the call is connected.

The user can now press the Answer button. When the button is pressed, the dialog procedure will call AnswerTheCall. This function finds and selects terminals on the call, and then calls the Answer method. As I mentioned while discussing the Call object, terminals must be selected on the call before the call can be answered or connected. So let's look at how the program discovers the terminals to be used.

The CreateTerminals function takes an Address object and returns an array of terminals to use on the call. First, it tries to find a Terminal object that supports audio rendering by calling GetDefaultTerminal to obtain the default audio render terminal. It then checks the actual direction of the terminal. Some terminals can both render and capture the stream. When asked for a specific direction, TAPI can return a terminal that supports both directions. If the terminal does support both, then there is no need to get the capture terminal as well.

Next, the CreateTerminals function determines if the address also supports video. If so, it obtains a video render terminal. This is always a video window, which is a dynamic terminal. The function GetDefaultTerminal only returns static terminals, so it will fail for a video render terminal. Instead, CreateTerminals calls GetVideoRenderTerminal, which is a wrapper around the ITTerminalSupport::CreateTerminal method.
The only trick with CreateTerminal is that the terminal class being requested, in this case CLSID_VideoWindowTerm, must be passed in as a BSTR, not a GUID. The function converts the CLSID to a BSTR, CreateTerminal is called, then any allocated memory is freed.

Finally, CreateTerminals uses GetDefaultTerminal to obtain the video capture terminal. If the video capture terminal exists, it also enables the preview window on this terminal, so the application will display a preview of what it is sending. Note that it's possible for an address to support video even though no video capture terminal exists. The capability of the address is a separate issue from whether a video capture device is present on the computer. In contrast, video rendering is always available if the address supports video because this involves simply creating a window.

The CreateTerminals function uses a simple method to find terminals for a call, relying solely on GetDefaultTerminal to retrieve the static Terminal objects. Typically, an application will default to GetDefaultTerminal, but give the user the option of choosing which terminals to use on a call.

When execution returns to AnswerTheCall, the application has the terminals to select on the call. It loops through the array of returned terminals and selects each terminal on the call. Finally, the application calls ITBasicCallControl::Answer on the Call object.

That about covers the application. Disconnecting the call is very simple: just call ITBasicCallControl::Disconnect. There is also some cleanup after a call is disconnected, and when the application shuts down.
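To make the CLSID-to-string step mentioned earlier more concrete, here is a portable sketch of the textual form a GUID takes. On Windows this is normally done with StringFromCLSID followed by SysAllocString, with the intermediate string released by CoTaskMemFree, which is presumably what the sample's wrapper does. The SketchGuid structure and GuidToString function below are hypothetical stand-ins that only illustrate the format, and the GUID used to exercise them is arbitrary, not the value of CLSID_VideoWindowTerm.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Portable stand-in with the same field layout as the Windows GUID structure.
struct SketchGuid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// Formats a GUID in registry style: {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}.
inline std::string GuidToString(const SketchGuid& g) {
    char buf[39];  // 38 characters plus the terminating NUL
    std::snprintf(buf, sizeof(buf),
                  "{%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
                  static_cast<unsigned>(g.data1),
                  static_cast<unsigned>(g.data2),
                  static_cast<unsigned>(g.data3),
                  static_cast<unsigned>(g.data4[0]),
                  static_cast<unsigned>(g.data4[1]),
                  static_cast<unsigned>(g.data4[2]),
                  static_cast<unsigned>(g.data4[3]),
                  static_cast<unsigned>(g.data4[4]),
                  static_cast<unsigned>(g.data4[5]),
                  static_cast<unsigned>(g.data4[6]),
                  static_cast<unsigned>(g.data4[7]));
    return std::string(buf);
}
```

For the arbitrary GUID with fields 0x12345678, 0x9ABC, 0xDEF0 and bytes 01 23 45 67 89 AB CD EF, the function produces {12345678-9ABC-DEF0-0123-456789ABCDEF}.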
Conclusion
From the November 1998 issue of Microsoft Systems Journal.