The engine must call out through the AudioStop(), AudioStart(), and Visual() notifications. The Visual notification must provide IPA phonemes. (The International Phonetic Alphabet [IPA] is a universal notation for describing the phonetic content of speech. Every speakable phoneme has an IPA representation. IPA is described in the Microsoft Speech API specification [part of the Speech SDK 3.0 download] at http://research.microsoft.com/research/srg/install.htm.)
Although the Visual notification is fairly rich, Microsoft Agent uses only the cIPAPhoneme value to animate the mouth as the character speaks. Any Microsoft Agent-compatible engine must provide a closely synchronized stream of Visual notifications that reflects the phonetic content of the utterance being produced. In this case, "relatively timely notification" is not adequate, because speaker-hearers are fairly sensitive to discrepancies between mouth position and acoustic content. Visual notifications must therefore be delivered promptly, in step with the audio they describe.
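The ordering and timing requirement can be illustrated with a small sketch. It is not the Speech API interface: RecordingSink, notify_utterance, the millisecond timestamps, and the phoneme durations are all hypothetical names and values chosen for illustration. The sketch only shows one way an engine might stamp each Visual notification at the onset of its phoneme, bracketed by the audio start and stop notifications.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RecordingSink:
    """Hypothetical stand-in for a notification sink; it only records calls."""
    events: List[Tuple[str, int, str]] = field(default_factory=list)

    def audio_start(self, timestamp_ms: int) -> None:
        self.events.append(("AudioStart", timestamp_ms, ""))

    def visual(self, timestamp_ms: int, ipa_phoneme: str) -> None:
        # Only the IPA phoneme is carried here, mirroring the fact that
        # Microsoft Agent uses just that value to animate the mouth.
        self.events.append(("Visual", timestamp_ms, ipa_phoneme))

    def audio_stop(self, timestamp_ms: int) -> None:
        self.events.append(("AudioStop", timestamp_ms, ""))


def notify_utterance(sink: RecordingSink,
                     phonemes: List[Tuple[str, int]],
                     start_ms: int = 0) -> int:
    """Emit one Visual notification per phoneme, stamped at its onset.

    phonemes is a list of (ipa_symbol, duration_ms) pairs (assumed input).
    Returns the timestamp at which the audio ends.
    """
    t = start_ms
    sink.audio_start(t)
    for ipa_symbol, duration_ms in phonemes:
        sink.visual(t, ipa_symbol)  # timestamp equals the phoneme's audio onset
        t += duration_ms
    sink.audio_stop(t)
    return t


# Usage: an assumed phoneme sequence with made-up durations.
sink = RecordingSink()
end_ms = notify_utterance(sink, [("h", 80), ("a", 120), ("ɪ", 100)])
for event in sink.events:
    print(event)
```

Stamping each notification at the phoneme's onset, rather than whenever the engine happens to finish processing it, is what keeps the mouth position aligned with the acoustic content in this sketch.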