Speech Input Support

In addition to supporting mouse and keyboard interaction, Microsoft Agent includes support for speech input. You can use the Microsoft Command and Control Engine for speech recognition. This engine enables users to speak naturally without pausing between words. Speech recognition is speaker-independent, but the engine can be trained for improved performance. Because Microsoft Agent's support for speech input is based on the Microsoft Speech Application Programming Interface (SAPI), you can also use Microsoft Agent with other SAPI-compliant engines.

The user can initiate speech input by pressing and holding the push-to-talk listening hotkey. In this mode, if the speech engine receives the beginning of spoken input, it holds the audio channel open until it detects the end of the utterance. However, when not receiving input, it does not block audio output. This enables the user to issue multiple voice commands while holding down the key, and the character can respond when the user isn't speaking. If a character attempts to speak while the user is speaking, the character's audible output fails, though text may still be displayed in its word balloon. If the character has the audio channel while the listening key is pressed, the server automatically transfers control back to the user after processing the text in the Speak method. An optional MIDI tone is played to cue the user to begin speaking. This automatic transfer enables the user to provide input even if the application driving the character fails to provide logical pauses in its output.
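The push-to-talk rules above can be sketched as a small state model. This is an illustrative Python simulation only, not the Microsoft Agent API: the class PushToTalkChannel and all of its method names are hypothetical, and the choice to reject new user speech while the character holds the channel is an assumption drawn from the description of control being transferred back after Speak.

```python
class PushToTalkChannel:
    """Toy model of the shared audio channel (hypothetical, not the real API)."""

    def __init__(self, cue_tone=True):
        self.cue_tone = cue_tone          # optional tone that cues the user
        self.key_down = False             # listening hotkey is held
        self.user_speaking = False        # engine is inside an utterance
        self.character_speaking = False   # character holds the audio channel
        self.events = []                  # record of what the model did

    def press_key(self):
        self.key_down = True

    def release_key(self):
        self.key_down = False

    def begin_utterance(self):
        # Assumption: input is accepted only while the key is held and the
        # character is not currently using the audio channel.
        if not self.key_down or self.character_speaking:
            return False
        self.user_speaking = True         # channel held until utterance ends
        return True

    def end_utterance(self):
        self.user_speaking = False        # audio output is no longer blocked

    def speak(self, text):
        """Character tries to speak; returns True if audio was played."""
        if self.user_speaking:
            # Audible output fails, but balloon text may still appear.
            self.events.append(("balloon", text))
            return False
        self.character_speaking = True
        self.events.append(("audio", text))
        return True

    def finish_speak(self):
        # After the Speak text is processed, control returns to the user
        # if the listening key is still held.
        self.character_speaking = False
        if self.key_down:
            if self.cue_tone:
                self.events.append(("tone", None))
            self.events.append(("listening", None))
```

In this sketch, a call to speak() during an utterance records only a balloon event, while a call made between utterances records an audio event and, once finished, hands the channel back to the user with the optional tone.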

Because multiple client applications can share the same character and because multiple clients can use different characters at the same time, the server designates one client as the input-active client and sends mouse and voice input only to that client application. This keeps the management of user input orderly, so that an appropriate client responds to the input. Typically, user interaction determines which client application becomes input-active. For example, if the user clicks a character, that character's client application becomes input-active. Similarly, if a user speaks the name of a character, that character becomes input-active. Also, when the server processes a character's Show method, the client of that character becomes input-active. In addition, you can call the Activate method to make your client input-active, but you should do so only when your client application is active. For example, if the user clicks your application's window, activating your application, you can call the Activate method to receive and process mouse and speech input.

If multiple clients use the same character, the server treats the client that was last shown or last input-active as the input-active client for that character. However, you can also use the Activate method to make your client input-active, or to keep it non-input-active, when the user selects that character.
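The input-active rules described above can be illustrated with a minimal Python sketch. It is a simulation under stated assumptions, not the Microsoft Agent API: InputRouter, its methods, the active flag, and the client names used in the example are all hypothetical. In particular, treating a non-input-active request as removing the client from the selection order for that character is a simplification of this sketch.

```python
class InputRouter:
    """Toy model of the server's input-active bookkeeping (hypothetical)."""

    def __init__(self):
        # character -> clients, ordered so the most recently shown or
        # input-active client is last
        self._recent = {}
        # (character, client) that currently receives mouse and voice input
        self.input_active = None

    def _promote(self, character, client):
        order = self._recent.setdefault(character, [])
        if client in order:
            order.remove(client)
        order.append(client)
        self.input_active = (character, client)

    def show(self, character, client):
        # Processing Show makes the client of that character input-active.
        self._promote(character, client)

    def activate(self, character, client, active=True):
        # Stand-in for the Activate method; the boolean flag is an assumption.
        if active:
            self._promote(character, client)
            return
        order = self._recent.get(character, [])
        if client in order:
            order.remove(client)
        if self.input_active == (character, client):
            # Fall back to the next most recent client, if any.
            self.input_active = (character, order[-1]) if order else None

    def user_selects(self, character):
        # A click on the character or its spoken name: the last client shown
        # or last input-active for that character receives input.
        order = self._recent.get(character, [])
        if order:
            self.input_active = (character, order[-1])

    def dispatch(self, event):
        # Input goes only to the input-active client.
        if self.input_active is None:
            return None
        character, client = self.input_active
        return (client, character, event)
```

For example, if hypothetical clients "mail" and "calendar" both show the Genie character in that order, a later user_selects("Genie") routes input to "calendar"; calling activate("Genie", "mail") then moves input to "mail" instead.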