Figure 3   Speech Hardware Minimum Requirements

Technology                          CPU       RAM

Discrete command and control        386/33    500KB
   User speaks simple commands like "mail," "change time," and "minimize."

Continuous command and control      486/33    1MB
   User speaks complex commands like "Send mail to Fred," "Change the time to ten o'clock," and "Minimize the window."

Discrete dictation                  486/66    8MB
   Transcribes whatever the user says into a word processor. The user must pause between words.

Continuous dictation                P6        16MB
   Transcribes natural speech into a word processor.

Text-to-speech                      486/33    1MB
   Converts ASCII or Unicode strings to natural speech.
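
As an illustration only, the minimums in Figure 3 could be encoded as a small lookup table that an app consults before enabling a speech feature. This is a sketch, not part of the Speech API: the names CpuClass, Requirement, and SupportedTechnologies are hypothetical, and 1MB is assumed to mean 1024KB.

```cpp
#include <string>
#include <vector>

namespace sketch {

// CPU classes named in Figure 3, ordered slowest to fastest (hypothetical enum).
enum CpuClass { k386_33, k486_33, k486_66, kP6 };

// One row of Figure 3.
struct Requirement {
    const char* technology;
    CpuClass    minCpu;
    unsigned    minRamKB;
};

// Figure 3 minimums; 1MB is assumed to be 1024KB.
const Requirement kRequirements[] = {
    {"Discrete command and control",   k386_33, 500},
    {"Continuous command and control", k486_33, 1024},
    {"Discrete dictation",             k486_66, 8 * 1024},
    {"Continuous dictation",           kP6,     16 * 1024},
    {"Text-to-speech",                 k486_33, 1024},
};

// Returns the technologies whose CPU and RAM minimums are both met.
std::vector<std::string> SupportedTechnologies(CpuClass cpu, unsigned ramKB) {
    std::vector<std::string> result;
    for (const Requirement& r : kRequirements) {
        if (cpu >= r.minCpu && ramKB >= r.minRamKB) {
            result.push_back(r.technology);
        }
    }
    return result;
}

}  // namespace sketch
```

For a 486/33 with 4MB available to the engine, SupportedTechnologies(sketch::k486_33, 4096) lists both command-and-control modes and text-to-speech, but neither dictation mode.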

Figure 5   High-level Speech Objects

Voice Commands Object

IUnknown

Provides access to other interfaces in the object.

IVoiceCmd

Simple command and control speech recognition. Member functions let the app create Voice Menu objects.

IVCmdAttributes

Controls the attributes of the speech recognition engine, such as the automatic gain, speaker name, and recognition threshold.

IVCmdDialogs

Displays Windows dialog boxes that let the user configure the speech recognition engine, for example to train it.

IVCmdNotifySink

Supplied by the app. Used to notify the app when a command is recognized, when the user is speaking too loudly or softly, or when some other event occurs.

Voice Menu Object

IUnknown

Provides access to other interfaces in the object.

IVCmdMenu

Methods to add/remove/modify voice commands, and to start listening for them.

Voice Text Object

IUnknown

Provides access to other interfaces in the object.

IVoiceText

Main interface for generating speech; contains the Speak function.

IVTxtAttributes

Controls the attributes of the TTS engine, such as the voice's pitch and gender.

IVTxtDialogs

Displays dialog boxes that let the user configure the TTS engine.

IVTxtNotifySink

Supplied by the app. Used to notify the app when talking has begun or ended, when a bookmark is reached, or when some other event occurs.

Figure 6   Low-level Speech Objects

Speech Recognition Grammar Object

IUnknown

Provides access to other interfaces in the object.

ISRGramCommon

Provides methods to activate and deactivate the grammar object, or archive it to disk.

ISRGramCFG

Provides methods specific to context-free grammars, including methods to manage lists of words and link grammars together.

ISRGramDictation

Used for dictation grammars. Apps can supply hints about what the user might be dictating next.

ISRGramNotifySink

Supplied by the app. Used to pass grammar notifications from the engine to the app.

Speech Recognition Results Object
(All interfaces are optional except IUnknown)

IUnknown

Provides access to other interfaces in the object.

ISRResAudio

Gets an audio recording of what was spoken.

ISRResBasic

Provides general information about what was spoken, such as the phrase that was recognized and when it was spoken.

ISRResCorrection

Lets the app confirm that the phrase was correctly or incorrectly recognized, so the engine can learn from its mistakes.

ISRResEval

Tells the engine to reevaluate a recognition decision based on what it now knows about the context.

ISRResGraph

Provides a graph of alternate recognition hypotheses, either for words or phonemes.

ISRResMemory

Lets apps control how results objects are stored, since storing them consumes memory.

ISRResMerge

Merges or splits results objects.

ISRResModifyGUI

Tells the engine to display a graphical user interface so the user can correct a recognition result.

ISRResSpeaker

If an engine supports this, the application can use it to identify who spoke.

Text-to-Speech Engine Object

IUnknown

Provides access to other interfaces in the object.

ITTSAttributes

Controls the attributes of the text-to-speech engine such as the volume, processor usage, speaking speed, and pitch.

ITTSCentral

Controls the engine object. Member functions allow an application to add buffers, and start and stop speech.

ITTSDialogs

Displays Windows dialog boxes that allow the end user to configure the text-to-speech engine, such as correcting word pronunciations.

ITTSBufNotifySink

Supplied by the app. Used to notify the app of changes to the text buffer, such as when bookmarks are reached.

ITTSNotifySink

Supplied by the app. Used to notify the app when audio starts or stops, or when attributes are changed.

ILexPronounce

Optional. Lets the app query and control the pronunciation of words.

Figure 8   CIVCmdNotifySink

class CIVCmdNotifySink : public IVCmdNotifySink {
public:
   CIVCmdNotifySink(void);
   ~CIVCmdNotifySink(void);

   // Standard IUnknown members;
   // all COM objects must have them.
   //
   STDMETHODIMP QueryInterface (REFIID, LPVOID FAR *);
   STDMETHODIMP_(ULONG) AddRef(void);
   STDMETHODIMP_(ULONG) Release(void);

   // IVCmdNotifySink members
   //
   STDMETHODIMP CommandRecognize (DWORD, PVCMDNAME, DWORD, DWORD, PVOID,
                                  DWORD, PSTR, PSTR);
   STDMETHODIMP CommandOther     (PVCMDNAME, PSTR);
   STDMETHODIMP MenuActivate     (PVCMDNAME, BOOL);
   STDMETHODIMP UtteranceBegin   (void);
   STDMETHODIMP UtteranceEnd     (void);
   STDMETHODIMP CommandStart     (void);
   STDMETHODIMP VUMeter          (WORD);
   STDMETHODIMP AttribChanged    (DWORD);
   STDMETHODIMP Interference     (DWORD);
};
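
Figure 8 only declares the sink, so here is a minimal, portable sketch of how the three IUnknown members of such a class are commonly implemented. It does not use the Windows headers: Unknown, VCmdNotifySink, Result, and InterfaceId are hypothetical stand-ins for IUnknown, IVCmdNotifySink, HRESULT, and REFIID, and the notification is reduced to a single command ID. Deleting the object when the count reaches zero is one common convention, assumed here; a sink embedded in another object would not delete itself.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-ins for HRESULT codes and interface IDs (REFIID).
enum Result { kOk = 0, kNoInterface = 1 };
enum InterfaceId { kIdUnknown, kIdVCmdNotifySink, kIdOther };

// Stand-in for IUnknown.
struct Unknown {
    virtual Result QueryInterface(InterfaceId id, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~Unknown() {}
};

// Stand-in for IVCmdNotifySink, reduced to one notification.
struct VCmdNotifySink : Unknown {
    virtual Result CommandRecognize(unsigned long commandId) = 0;
};

class NotifySink : public VCmdNotifySink {
public:
    // The creator holds the first reference.
    NotifySink() : refs_(1), lastCommand_(0) {}

    // Hands out a pointer only for interfaces this object supports,
    // and counts the new reference.
    Result QueryInterface(InterfaceId id, void** out) override {
        if (id == kIdUnknown || id == kIdVCmdNotifySink) {
            *out = static_cast<VCmdNotifySink*>(this);
            AddRef();
            return kOk;
        }
        *out = nullptr;
        return kNoInterface;
    }

    unsigned long AddRef() override { return ++refs_; }

    // Destroys the object when the last reference is released.
    unsigned long Release() override {
        assert(refs_ > 0);
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }

    // Records the recognized command so the app can act on it.
    Result CommandRecognize(unsigned long commandId) override {
        lastCommand_ = commandId;
        return kOk;
    }

    unsigned long lastCommand() const { return lastCommand_; }

private:
    ~NotifySink() {}  // heap-only: destroyed through Release()

    unsigned long refs_;
    unsigned long lastCommand_;
};

}  // namespace sketch
```

A caller creates the sink with new, passes it to whoever will send notifications (which takes its own reference through QueryInterface or AddRef), and calls Release once for every reference it holds.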