Uncork the Power of Microsoft Agent 2.0
Tandy Trower
Looking for an advanced character-based help system for your application? Microsoft Agent, which will be included in Windows 2000, provides powerful programmability and UI innovations in an easy-to-use package.
Microsoft® Agent version 2.0 is an innovative, royalty-free interface technology that can be included as part of your Web pages or conventional applications. This user interface element can be used on Windows® platforms (except Windows CE), and it enables you to
display and animate an interactive character. You can even compile character animations of your own using the Microsoft Agent Character Editor. You can define the character's name, its animations, and its other characteristics.
While this sounds a little like Microsoft Bob, Agent differs in several ways. Bob was an application suite targeted at new or novice PC users that featured interactive guides as the interface to a set of applications. Agent is a technology that includes a programming interface that can be coded from any language that supports COM, such as C++ or Visual Basic®. It also includes an ActiveX® control that makes it easy to program from languages like Visual Basic or from scripting languages like VBScript or JScript®. In Bob, the characters appeared in an interface that masked the Windows interface. With Agent, characters appear in their own non-rectangular windows, shaped to the current frame of animation, and can appear anywhere in the conventional Windows interface. Also, unlike Bob, Agent characters can be dismissed, keeping the user in charge of the interaction.
This may now sound a lot like the Office Assistant featured in Microsoft Office. While Agent is now the technology used to support the Office 2000 Assistant, there are more important differences. The Office Assistant is specifically intended as an enhanced form of user assistance for Office users. Agent is not intended for any specific kind of application; characters can be used as guides, instructors, chat avatars, or even game opponents. Agent also does not include support for the IntelliSense® feature of Office that attempts to provide appropriate help topics or suggest helpful tips. However, similar technology can be combined easily with Agent. In addition, Agent includes support for complementary technologies such as speech input and output.
[Figure: Robby the robot, Peedy the parrot, Genie, and Merlin]
Microsoft supplies four characters: Genie, Merlin, Peedy the parrot, and Robby the robot. Microsoft Agent, the Microsoft characters, speech engines, tools, full API documentation, and code samples are all posted on the Microsoft Agent Web site at http://www.microsoft.com/msagent. You can also access copies of the Microsoft Agent Distribution License Agreement that enables you to distribute Agent with your applications.
The Microsoft Agent Programming Interface
The Microsoft Agent programming interface is based on COM. By using COM, Agent is able to provide an interface that enables multiple applications, or clients, to use its animation services simultaneously. These services include the ability to load a character, play a specified animation, speak using a synthesized speech engine or audio file, and respond to user input. Automatic lip-synch support is provided for spoken output. The animation services exposed through interfaces of the Agent animation server enable client applications to direct the input and output of a character while managing shared resources like the audio channel for speech input and output. The server's most significant task is the display and management of the character animation on the screen. The server requests frames from the Agent Data Provider, which supplies the animation data from the character's file.
To use the Agent services, a client application must first establish a connection with the Agent animation server. There are standard COM interfaces defined for this purpose, but Agent also includes an ActiveX control that simplifies the process. You can add the control to a project by creating an instance of it on a form window. You can then begin programming the control using its various methods, properties, and events. For example, in Visual Basic, to load a character you use the Load method, passing it a character animation file name.
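A minimal reconstruction of the Load call; the control name Agent1 and the file name genie.acs are assumptions based on the samples that ship with Agent:

    Agent1.Characters.Load "my character", "genie.acs"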
This loads the character's data into the Agent Characters collection. You also supply your own named reference in the Load statement (in the previous example, "my character") that lets you reference the character for specifying other methods or properties for it. If you only specify a file name, Agent attempts to load the character from its Chars subdirectory. However, you can also specify a full path name.
To make the character appear, use the Show method, specifying the character reference you used in the Load call.
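For example, again assuming the control is named Agent1:

    Agent1.Characters("my character").Show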
Once visible, you can play a character's animation using the Play method, specifying the name of the animation you want to show.
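A sketch of a Play call; the animation name Greet is an assumption (it is among the animations documented for the Microsoft characters):

    Agent1.Characters("my character").Play "Greet"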
For the names of the character's animations, you need to contact the character's supplier. All the animations for the four supplied Microsoft characters are documented on the Microsoft Agent Web site at http://msdn.microsoft.com/workshop/imedia/agent/default.asp.
To make a character speak, use the Speak method, specifying the text to be spoken.
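For example:

    Agent1.Characters("my character").Speak "Hello world!"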
Spoken text appears in a word balloon. Text defined in the Speak method is also spoken audibly if a compatible synthesized text-to-speech (TTS) speech engine is installed. While Microsoft provides a US English TTS engine that may be used, the Agent speech interfaces are based on the industry standard Speech API (SAPI). This means that Agent can be hosted with speech engines provided by other vendors in other languages. Spoken output can also be supported by using recorded .wav files.
Both audio alternatives were implemented to provide the greatest flexibility for developers to optimize the spoken output. For example, TTS speech synthesis provides greater flexibility in what the character can say, since any string can be spoken without recording it in advance, while recorded .wav files generally provide more natural-sounding speech. Whether using TTS or sound files, Agent automatically lip-synchs the character's mouth animation with the audio output. (TTS engines must support International Phonetic Alphabet phonemes as part of the SAPI NotifySink interface.) Audio files can also be enhanced for lip-synching by postprocessing them using the Microsoft Linguistic Information Sound Editing Tool, which uses the audio stream and a textual representation of the speech to add phoneme and word-break information to the audio file.
In some programming languages, like Visual Basic, you can declare an object variable and set it equal to the character you load. You can then use this object variable with the various methods and properties supported by Agent, also making your code more readable. For example, the preceding statements could be rewritten as follows:
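A sketch using an object variable, shown here with the Genie character for readability; the variable is declared As Object for late binding:

    Dim Genie As Object
    Agent1.Characters.Load "Genie", "genie.acs"
    Set Genie = Agent1.Characters("Genie")
    Genie.Show
    Genie.Play "Greet"
    Genie.Speak "Hello world!"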
Handling Input Events
Agent also supports a number of events. For example, when the user clicks on the character, Agent fires an event that passes back the button that was clicked, any modifier keys that were pressed, and the x and y coordinates of the mouse. Using Visual Basic, an event handler for this event might look like the following:
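A sketch of a Click event handler; the parameter list follows the control's documented Click event, and the spoken response is just an illustration:

    Private Sub Agent1_Click(ByVal CharacterID As String, ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer)
       ' Button 1 is the left mouse button
       If Button = 1 Then
          Agent1.Characters(CharacterID).Speak "You clicked me at " & X & ", " & Y & "."
       End If
    End Sub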
Agent includes support for a popup menu when the user right-clicks the character. You can add your own entries on this menu by creating a Command object in the Commands collection. Use the Add method to add a command.
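For example (the command name and caption here are illustrative):

    Agent1.Characters("Genie").Commands.Add "ReadMail", "Read My E&mail"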
If you create a reference to the character, then the command syntax is simpler.
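Using the Genie object variable declared earlier:

    Genie.Commands.Add "ReadMail", "Read My E&mail"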
The first parameter is a unique name you use to reference the command. The second parameter is the text that will appear on the character's popup menu. Include an & character to define the menu item's underlined access key.
When the user selects an item on the popup menu, it generates a Command event. The Command event passes back information about what was selected through the UserInput object. To simply determine if one of your commands was selected, check the Name property of the UserInput object:
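A minimal Command event handler, matching on the command name defined above:

    Private Sub Agent1_Command(ByVal UserInput As Object)
       If UserInput.Name = "ReadMail" Then
          ' Respond to the selected command
          Genie.Speak "Opening your mail now."
       End If
    End Sub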
Synchronizing Character Animation
To avoid blocking your program code execution, Agent Speak and Play calls are played asynchronously. However, there may be times when you want to synchronize your code with a character's animation. For example, you might want to display a message box when the character speaks an introductory statement about it. The following code will not work:
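The broken version, for illustration:

    Genie.Speak "Here comes a message box."
    ' Wrong: this executes immediately, while the character is still speaking
    MsgBox "Hello!"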
You cannot simply follow the statement with the MsgBox call because the character's animations will be played asynchronously while your code continues to execute the MsgBox call. This would result in the message box appearing before the character finishes speaking.
When using Visual Basic, developers often resort to using some type of loop with a DoEvents call, but there are much better ways. If you are trying to synch with a word as it is spoken, you can simply include a bookmark tag in the Speak method text. This fires an event when the particular word it precedes is spoken. (The bookmark tag will not appear in the character's word balloon.) You can then use the Bookmark event to display the message at that point.
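A sketch using the \Mrk\ speech tag; the bookmark number 1 is an arbitrary choice:

    Genie.Speak "I'll display the message \Mrk=1\ now."

    Private Sub Agent1_Bookmark(ByVal BookmarkID As Long)
       ' Fires when the spoken output reaches the bookmark
       If BookmarkID = 1 Then MsgBox "Hello!"
    End Sub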
Agent also supports a more generalized technique for synchronizing animations. You declare an object and assign it to the animation method you want to synch (for example, a Play or Speak method). This creates a Request object. When the animation is played, Agent fires an event when the animation begins (RequestStart) and when it ends (RequestComplete). In these event handlers you can program the activity you want to synch.
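A sketch of this pattern; it assumes the server passes back the same object reference it returned, so the handler compares with Is:

    Dim SpeakRequest As Object   ' Module-level scope so it survives the call

    Private Sub IntroduceMessage()
       Set SpeakRequest = Genie.Speak("Here comes a message box.")
    End Sub

    Private Sub Agent1_RequestComplete(ByVal Request As Object)
       ' Make certain this is the request we issued
       If Request Is SpeakRequest Then MsgBox "Hello!"
    End Sub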
Make certain you declare your Request object with correct scope. In Visual Basic, if variables are not declared globally their values are lost when your code execution steps outside the subroutine where the variables were assigned.
Agent lets you display multiple characters at the same time. In scenarios where multiple characters are being used, you may want to animate characters synchronously. For example, you might want to have one character wait for the other character to say something or to complete a specific animation. You can use Request objects along with the Wait method to do this. Let's say you have loaded two characters, Genie and Merlin, and want Merlin to wait and respond based on something Genie says.
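A sketch, assuming Genie and Merlin have already been loaded and assigned to object variables as shown earlier; the dialog is illustrative:

    Dim GenieRequest As Object

    Set GenieRequest = Genie.Speak("What do you think, Merlin?")
    ' Merlin's queue holds here until Genie's Speak request completes
    Merlin.Wait GenieRequest
    Merlin.Speak "I think it's a splendid idea!"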
Any animations in Merlin's queue before the Wait statement will play independently, but any that fall after it will be held until the Speak statement request completes. Of course, you could also use another Request object and Wait statement to make Genie's queue hold until Merlin responds.
Supporting Speech Input
In addition to programming a character to respond to keyboard or mouse input, Agent also includes support for speech input. To support speech input, the user must have a SAPI 4.0-compliant Command and Control speech recognition engine installed that matches the character's language ID setting. To select the engine, use the SRModeID property. To support speech input, you use a Command object (as I described earlier), including a voice grammar for the core words you want the character to recognize and match to this command. For example, if you want to voice-enable a command like "read mail," add a Command object to the Commands collection.
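For example; here the voice grammar is passed as the third parameter of Add, and the caption is omitted so the command is voice-only:

    Agent1.Characters("Genie").Commands.Add "ReadMail", , "read mail"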
You can also add to a command that you already defined for the popup menu by setting the Voice property.
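For example:

    Genie.Commands("ReadMail").Voice = "read mail"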
Or better yet, you can define both your menu and voice command text at the same time.
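For example:

    Genie.Commands.Add "ReadMail", "Read My E&mail", "read mail"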
When you define your voice grammar, you can also include optional words the user might say, like "please" and "the." You can add these using square brackets to indicate that the words are optional, and parentheses to indicate alternative words.
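A sketch of a more forgiving grammar; the vertical bar separates the alternatives inside the parentheses:

    Genie.Commands("ReadMail").Voice = "[please] read [the] (mail | messages)"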
Once you have defined the voice grammar, Agent passes it to the speech engine. Similar to the user's selection of one of your commands on the popup menu, when the user speaks a phrase that matches your voice grammar, Agent fires the Command event and passes back an object that includes the identifier assigned to that grammar, as shown in the following example:
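The handler is the same one used for menu selection; the spoken reply here is illustrative:

    Private Sub Agent1_Command(ByVal UserInput As Object)
       Select Case UserInput.Name
          Case "ReadMail"
             Genie.Speak "Reading your mail now."
       End Select
    End Sub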
The UserInput object also passes back other possible matches considered by the speech engine and similarity scores for each of them. This lets your code evaluate the quality of the speech input.
Using Agent from Web Pages
Because Microsoft Agent includes an ActiveX control, it can also be hosted and scripted from Web pages using languages like VBScript and JScript. To view the results of the script, the page must be viewed with a browser like Microsoft Internet Explorer that supports ActiveX and these scripting languages. If you choose Netscape Navigator, you can use the NCompass plug-in for ActiveX controls. To declare the control on the page, you can use the HTML <object> tag and specify the class identifier (CLSID) for the Agent ActiveX control:
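A sketch of the declaration; the CLSID shown is the one widely published for the version 2.0 control, but verify it against the Agent documentation before use:

    <object id="AgentControl" width="0" height="0"
       classid="clsid:D45FD31B-5C6E-11D1-9EC1-00C04FD7081F"
       codebase="#VERSION=2,0,0,0">
    </object>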
This loads the ActiveX control, and if Agent has not been installed on the user's local system, it also automatically attempts to download Agent when the page loads and prompts the user for installation. From there the programming model follows conventions similar to those described in this article.
There is one additional feature worth noting: you can optionally load a character's animations from an HTTP server rather than requiring that the character be installed on the local system. To do this, the character must be available on the server in the .acf format (rather than as an .acs file). The sample Microsoft characters are all stored at the following URL:
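This base URL is reconstructed from the Agent documentation of the time and may no longer resolve:

    http://agent.microsoft.com/agent2/chars/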
Loading a character in this way enables you to download only the animations you want to use; you must load the animation before you can play it. The following code illustrates how to load a character using the .acf format:
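A VBScript sketch; the character path is an assumption built on the base URL above, and Get is used to fetch the Showing state and the Greet animation before they are played:

    AgentControl.Characters.Load "Genie", "http://agent.microsoft.com/agent2/chars/genie/genie.acf"
    AgentControl.Characters("Genie").Get "State", "Showing"
    AgentControl.Characters("Genie").Show
    AgentControl.Characters("Genie").Get "Animation", "Greet"
    AgentControl.Characters("Genie").Play "Greet"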
By assigning Request objects to the Get statements, you can determine if the animation loaded from the server successfully. You can also set the Queue parameter of the Get method to False to load animations asynchronously. This enables you to download a few animations to get started, then download others while the character is speaking. Animations loaded this way are stored in the browser's cache. This makes subsequent access to the animation data faster as long as the data remains in the cache.
Defining Characters for Microsoft Agent
If you don't want to use the four sample characters, you can use your preferred rendering tools to create your own character animation images and then assemble, time, branch, and compile them into an Agent character file using the Microsoft Agent Character Editor. Like traditional cel animation, each character's animation is made up of separate images, each altered slightly, that when played sequentially create the illusion of motion. In addition, mouth overlays can be included to provide the visual images for lip-synched animation. The Character Editor lets you define the character's default word balloon characteristics. If the character uses TTS output, you can also set its default voice settings.
Agent supports two different formats for compiled character files. With one format, all character and animation data is compiled into a single file. This format is primarily used when the character file is located on a local disk drive. A second format compiles each animation as a separate file and is used primarily in Web-based scenarios where animations are loaded from a server on demand.
The Agent server also manages certain states of the character to make programming easier for clients. For example, when no animation has been played for four seconds, the server automatically places the character in an idle state, playing animations to keep the character from remaining in a frozen pose. The idle state typically begins with simple animations such as breathing, eye blinks, or changes of gaze, attempting to model that of a real person patiently waiting for input. If the first idle animation completes (it does not loop), then the server plays another approximately four seconds later. After one minute, the server proceeds to the second idle level, running animations that indicate that the character has a level of self-occupation. After two minutes, the character enters the third idle level, in which animations are played that indicate the character is present but inactive. A typical animation used at this level has the character sleeping; however, all of the animations assigned for these states are left to the character author to define.
The server also provides states for moving and gesturing in a particular direction. When the character is programmed to use the MoveTo or GestureAt methods, the server automatically determines the location of the character and the x, y location it is to move or gesture to, and then plays the appropriate animation. Similarly, the character author can assign specific animations to the listening and hearing states. The server plays the assigned listening animation when speech recognition is initialized, and then switches to the hearing animation when it detects an utterance. This makes the character appear to respond naturally to the user when it's waiting to speak and when words are actually spoken. Finally, Agent provides states to manage the appearance and hiding of the character by coordinating the timing between the playing of the animation and the visibility of the frame. For example, when the character is shown, the frame must be made visible first, followed by the appropriate transitional animation. When hiding the character, the reverse must happen. A short example of the movement methods appears below.
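A sketch of the movement calls; the screen coordinates are arbitrary:

    ' Move the character to screen coordinates (100, 100),
    ' then gesture toward (400, 300); the server picks the animations
    Genie.MoveTo 100, 100
    Genie.GestureAt 400, 300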
Conclusion
Considering the marketing failure of Bob, you might question the value of interactive characters. However, research conducted by Nass and Reeves (The Media Equation, Cambridge University Press) has consistently demonstrated that humans respond to social cues presented by interactive technology much the way they respond to another person. This has huge implications for the design of application and Web user interfaces. Up to now, human factors specialists have focused only on the cognitive and usability aspects of design. Tapping into the social aspects of communication provides the potential for designing more natural interaction. As Nass and Reeves have reported, social expectations and responses can be generated from the wording of text on the screen. Add an interactive character that can show meaningful facial expressions and gestures, and you significantly expand the potential bandwidth of communication.