MSJ 1996 sidebar

Figure 3 Speech Hardware Minimum Requirements

Technology	CPU	RAM
Discrete command and control
User speaks simple commands like "mail," "change time," "minimize."	386/33	500KB
Continuous command and control
User speaks complex commands, like "Send mail to Fred," "Change the time to ten o'clock," and "Minimize the window."	486/33	1MB
Discrete dictation
Transcribes whatever the user says into a word processor. The user must pause between words.	486/66	8MB
Continuous dictation
Transcribes natural speech into a word processor.	P6	16MB
Text-to-speech
Converts ASCII or Unicode strings to natural speech.	486/33	1MB

Figure 5 High-level Speech Objects

Voice Commands Object
IUnknown	Provides access to other interfaces in the object.
IVoiceCmd	Simple command and control speech recognition. Member functions let the app create Voice Menu objects.
IVCmdAttributes	Controls the attributes of the speech recognition engine, such as the automatic gain, speaker name, and recognition threshold.
IVCmdDialogs	Displays Windows dialog boxes that let the user configure the speech recognition engine, such as training.
IVCmdNotifySink	Supplied by the app. Used to notify the app when a command is recognized, the user is speaking too loudly or softly, or something else happens.

Voice Menu Object
IUnknown	Provides access to other interfaces in the object.
IVCmdMenu	Methods to add/remove/modify voice commands, and to start listening for them.

Voice Text Object
IUnknown	Provides access to other interfaces in the object.
IVoiceText	Main interface for generating speech; contains the Speak function.
IVTxtAttributes	Controls the attributes of the TTS engine, such as the voice's pitch and gender.
IVTxtDialogs	Displays dialog boxes that let the user configure the TTS engine.
IVTxtNotifySink	Supplied by the app. Used to notify the app when talking has begun or ended, or when a bookmark is reached or something else happens.
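
The "Supplied by the app" rows describe a callback relationship: the app implements the sink, and the speech object calls into it. The following is a minimal sketch of that direction of control only. It does not use the real speech headers; `TextNotifySink`, `MockVoiceText`, `LoggingSink`, and their member names are hypothetical stand-ins invented for illustration, and no audio is produced.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an app-supplied notification sink.
// The real IVTxtNotifySink is a COM interface; this is plain C++.
struct TextNotifySink {
    virtual void SpeakingStarted() = 0;
    virtual void SpeakingDone() = 0;
    virtual ~TextNotifySink() {}
};

// Hypothetical stand-in for the Voice Text object. It does no audio work;
// it only reports progress to the sink the app registered.
class MockVoiceText {
public:
    explicit MockVoiceText(TextNotifySink* sink) : m_sink(sink) {
        assert(m_sink != 0);
    }
    void Speak(const std::string& text) {
        m_sink->SpeakingStarted();
        m_spoken.push_back(text);   // a real engine would synthesize here
        m_sink->SpeakingDone();
    }
private:
    TextNotifySink* m_sink;
    std::vector<std::string> m_spoken;
};

// App-side sink that records the order in which notifications arrive.
struct LoggingSink : TextNotifySink {
    std::vector<std::string> events;
    void SpeakingStarted() { events.push_back("started"); }
    void SpeakingDone()    { events.push_back("done"); }
};
```

An app would construct a `LoggingSink`, hand its address to `MockVoiceText`, and call `Speak`; the sink then receives one "started" and one "done" notification per call.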

Figure 6 Low-level Speech Objects

Speech Recognition Grammar Object
IUnknown	Provides access to other interfaces in the object.
ISRGramCommon	Provides methods to activate and deactivate the grammar object, or archive it to disk.
ISRGramCFG	Provides interfaces specific to context-free grammars and methods to manage lists of words and link grammars together.
ISRGramDictation	Used for dictation grammars. Apps can supply hints about what the user might be dictating next.
ISRGramNotifySink	Supplied by the app. Used to pass grammar notifications from the engine to the app.

Speech Recognition Results Object (All interfaces are optional except IUnknown)
IUnknown	Provides access to other interfaces in the object.
ISRResAudio	Gets an audio recording of what was spoken.
ISRResBasic	Provides general information about what was spoken, such as the phrase that was recognized and when it was spoken.
ISRResCorrection	Lets the app confirm that the phrase was correctly or incorrectly recognized, so the engine can learn from its mistakes.
ISRResEval	Tells the engine to reevaluate a recognition decision based on what it now knows about the context.
ISRResGraph	Provides a graph of alternate recognition hypotheses, either for words or phonemes.
ISRResMemory	Since storing results objects consumes memory, this interface is provided to let apps control how results objects are stored.
ISRResMerge	Merges or splits results objects.
ISRResModifyGUI	Tells the engine to display a graphical user interface so the user can correct a recognition result.
ISRResSpeaker	If an engine supports this, the application can use it to identify who spoke.
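
Because every interface on the results object except IUnknown is optional, an app has to ask for each one and handle a refusal. The sketch below shows that probing pattern under stated assumptions: `MockResults`, `ResultIid`, `Supports`, and the `kOk`/`kNoInterface` codes are hypothetical stand-ins so the example builds without the COM or speech headers. Real code would call `QueryInterface` with the actual interface IDs, check for `E_NOINTERFACE`, and `Release` any pointer it obtained.

```cpp
#include <cassert>
#include <set>

typedef int HResult;
const HResult kOk          = 0;
const HResult kNoInterface = -1;   // stand-in for E_NOINTERFACE

// Stand-in interface IDs named after three rows of the table above.
enum ResultIid { IID_ResBasic, IID_ResAudio, IID_ResSpeaker };

// Hypothetical results object from an engine that implements only
// some of the optional interfaces.
class MockResults {
public:
    explicit MockResults(const std::set<ResultIid>& supported)
        : m_supported(supported) {}

    HResult QueryInterface(ResultIid iid, void** ppv) {
        assert(ppv != 0);
        if (m_supported.count(iid) != 0) {
            *ppv = this;           // real COM would also AddRef here
            return kOk;
        }
        *ppv = 0;                  // always null the out-pointer on failure
        return kNoInterface;
    }
private:
    std::set<ResultIid> m_supported;
};

// Returns true only if the engine exposes the requested interface.
bool Supports(MockResults& results, ResultIid iid) {
    void* p = 0;
    return results.QueryInterface(iid, &p) == kOk && p != 0;
}
```

An app written this way can enable a feature such as audio playback only when `Supports` reports the matching interface, and fall back otherwise.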

Text-to-Speech Engine Object
IUnknown	Provides access to other interfaces in the object.
ITTSAttributes	Controls the attributes of the text-to-speech engine such as the volume, processor usage, speaking speed, and pitch.
ITTSCentral	Controls the engine object. Member functions allow an application to add buffers, and start and stop speech.
ITTSDialogs	Displays Windows dialog boxes that allow the end user to configure the text-to-speech engine, such as correcting word pronunciations.
ITTSBufNotifySink	Supplied by the app. Used to notify the app of changes to the text buffer, such as when bookmarks are reached.
ITTSNotifySink	Supplied by the app. Used to notify the app when audio starts or stops, or when attributes are changed.
ILexPronounce	Optional. Lets app query and control the pronunciation of words.

Figure 8 CIVCmdNotifySink

class CIVCmdNotifySink : public IVCmdNotifySink {
public:
   CIVCmdNotifySink(void);
   ~CIVCmdNotifySink(void);

   // Standard IUnknown members;
   // all COM objects must have them.
   //
   STDMETHODIMP QueryInterface (REFIID, LPVOID FAR *);
   STDMETHODIMP_(ULONG) AddRef(void);
   STDMETHODIMP_(ULONG) Release(void);

   // IVCmdNotifySink members
   //
   STDMETHODIMP CommandRecognize (DWORD, PVCMDNAME, DWORD, DWORD, PVOID,
                                  DWORD, PSTR, PSTR);
   STDMETHODIMP CommandOther     (PVCMDNAME, PSTR);
   STDMETHODIMP MenuActivate     (PVCMDNAME, BOOL);
   STDMETHODIMP UtteranceBegin   (void);
   STDMETHODIMP UtteranceEnd     (void);
   STDMETHODIMP CommandStart     (void);
   STDMETHODIMP VUMeter          (WORD);
   STDMETHODIMP AttribChanged    (DWORD);
   STDMETHODIMP Interference     (DWORD);
};
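
The declaration lists the three IUnknown members but not their bodies. One common way to implement them is plain reference counting, sketched below. This is an illustration under assumptions, not the article's implementation: `IUnknownLike`, `INotifySinkLike`, `InterfaceId`, and `CNotifySink` are hypothetical stand-ins so the code compiles without Windows headers, the one-argument `CommandRecognize` is simplified, and the count is assumed to start at zero with the creator calling `AddRef`.

```cpp
#include <cassert>

// Stand-ins for COM types; real code would use HRESULT, REFIID, and
// IUnknown from the platform headers.
typedef int           HResult;
typedef unsigned long ULong;
const HResult kOk          = 0;
const HResult kNoInterface = -1;   // stand-in for E_NOINTERFACE

enum InterfaceId { IID_Unknown, IID_NotifySink, IID_Other };

struct IUnknownLike {
    virtual HResult QueryInterface(InterfaceId iid, void** ppv) = 0;
    virtual ULong   AddRef() = 0;
    virtual ULong   Release() = 0;
protected:
    virtual ~IUnknownLike() {}     // destroyed only through Release()
};

struct INotifySinkLike : IUnknownLike {
    virtual HResult CommandRecognize(const char* command) = 0;
};

class CNotifySink : public INotifySinkLike {
public:
    CNotifySink() : m_refs(0), m_lastCommand(0) {}

    HResult QueryInterface(InterfaceId iid, void** ppv) {
        assert(ppv != 0);
        if (iid == IID_Unknown || iid == IID_NotifySink) {
            *ppv = static_cast<INotifySinkLike*>(this);
            AddRef();              // every handed-out pointer holds a reference
            return kOk;
        }
        *ppv = 0;
        return kNoInterface;
    }

    ULong AddRef() { return ++m_refs; }

    ULong Release() {
        ULong remaining = --m_refs;
        if (remaining == 0) delete this;   // last reference frees the object
        return remaining;
    }

    HResult CommandRecognize(const char* command) {
        m_lastCommand = command;
        return kOk;
    }
    const char* LastCommand() const { return m_lastCommand; }

private:
    ~CNotifySink() {}              // private: forces heap allocation
    ULong       m_refs;
    const char* m_lastCommand;
};
```

The private destructor is a design choice in this sketch: it prevents stack allocation, so the object can only be destroyed by the final `Release`.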
