Figure 3 Speech Hardware Minimum Requirements
Technology | CPU | RAM |
Discrete command and control | ||
User speaks simple commands like "mail," "change time," "minimize." | 386/33 | 500KB |
Continuous command and control | ||
User speaks complex commands, like "Send mail to Fred," "Change the time to ten o'clock," and "Minimize the window." | 486/33 | 1MB |
Discrete dictation | ||
Transcribes whatever the user says into a word processor. The user must pause between words. | 486/66 | 8MB |
Continuous dictation | ||
Transcribes natural speech into a word processor. | P6 | 16MB |
Text-to-speech | ||
Converts ASCII or Unicode strings to natural speech. | 486/33 | 1MB |
Figure 5 High-level Speech Objects
Voice Commands Object | |
IUnknown | Provides access to other interfaces in the object. |
IVoiceCmd | Simple command and control speech recognition. Member functions let the app create Voice Menu objects. |
IVCmdAttributes | Controls the attributes of the speech recognition engine, such as the automatic gain, speaker name, and recognition threshold. |
IVCmdDialogs | Displays Windows dialog boxes that let the user configure the speech recognition engine, for example to train it. |
IVCmdNotifySink | Supplied by the app. Used to notify the app when a command is recognized, the user is speaking too loudly or softly, or something else happens. |
Voice Menu Object | |
IUnknown | Provides access to other interfaces in the object. |
IVCmdMenu | Methods to add/remove/modify voice commands, and to start listening for them. |
Voice Text Object | |
IUnknown | Provides access to other interfaces in the object. |
IVoiceText | Main interface for generating speech; contains the Speak function. |
IVTxtAttributes | Controls the attributes of the TTS engine, such as the voice's pitch and gender. |
IVTxtDialogs | Displays dialog boxes that let the user configure the TTS engine. |
IVTxtNotifySink | Supplied by the app. Used to notify the app when talking has begun or ended, when a bookmark is reached, or when something else happens. |
Figure 6 Low-level Speech Objects
Speech Recognition Grammar Object | |
IUnknown | Provides access to other interfaces in the object. |
ISRGramCommon | Provides methods to activate and deactivate the grammar object, or archive it to disk. |
ISRGramCFG | Provides interfaces specific to context-free grammars and methods to manage lists of words and link grammars together. |
ISRGramDictation | Used for dictation grammars. Apps can supply hints about what the user might be dictating next. |
ISRGramNotifySink | Supplied by the app. Used to pass grammar notifications from the engine to the app. |
Speech Recognition Results Object | |
IUnknown | Provides access to other interfaces in the object. |
ISRResAudio | Gets an audio recording of what was spoken. |
ISRResBasic | Provides general information about what was spoken, such as the phrase that was recognized and when it was spoken. |
ISRResCorrection | Lets the app confirm that the phrase was correctly or incorrectly recognized, so the engine can learn from its mistakes. |
ISRResEval | Tells the engine to reevaluate a recognition decision based on what it now knows about the context. |
ISRResGraph | Provides a graph of alternate recognition hypotheses, either for words or phonemes. |
ISRResMemory | Since storing results objects consumes memory, this interface is provided to let apps control how results objects are stored. |
ISRResMerge | Merges or splits results objects. |
ISRResModifyGUI | Tells the engine to display a graphical user interface so the user can correct a recognition result. |
ISRResSpeaker | If an engine supports this, the application can use it to identify who spoke. |
Text-to-Speech Engine Object | |
IUnknown | Provides access to other interfaces in the object. |
ITTSAttributes | Controls the attributes of the text-to-speech engine such as the volume, processor usage, speaking speed, and pitch. |
ITTSCentral | Controls the engine object. Member functions allow an application to add buffers, and start and stop speech. |
ITTSDialogs | Displays Windows dialog boxes that allow the end user to configure the text-to-speech engine, such as correcting word pronunciations. |
ITTSBufNotifySink | Supplied by the app. Used to notify the app of changes to the text buffer, such as when bookmarks are reached. |
ITTSNotifySink | Supplied by the app. Used to notify the app when audio starts or stops, or when attributes are changed. |
ILexPronounce | Optional. Lets the app query and control the pronunciation of words. |
Figure 8 CIVCmdNotifySink
class CIVCmdNotifySink : public IVCmdNotifySink {
public:
    CIVCmdNotifySink(void);
    ~CIVCmdNotifySink(void);

    // Standard IUnknown members;
    // all COM objects must have them.
    //
    STDMETHODIMP QueryInterface (REFIID, LPVOID FAR *);
    STDMETHODIMP_(ULONG) AddRef(void);
    STDMETHODIMP_(ULONG) Release(void);

    // IVCmdNotifySink members
    //
    STDMETHODIMP CommandRecognize (DWORD, PVCMDNAME, DWORD, DWORD, PVOID, DWORD, PSTR, PSTR);
    STDMETHODIMP CommandOther (PVCMDNAME, PSTR);
    STDMETHODIMP MenuActivate (PVCMDNAME, BOOL);
    STDMETHODIMP UtteranceBegin (void);
    STDMETHODIMP UtteranceEnd (void);
    STDMETHODIMP CommandStart (void);
    STDMETHODIMP VUMeter (WORD);
    STDMETHODIMP AttribChanged (DWORD);
    STDMETHODIMP Interference (DWORD);
};
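The declaration in Figure 8 only lists the members; it does not show how the three IUnknown functions are usually written. The sketch below illustrates the common reference-counting pattern in portable C++ so that it compiles without the Windows or Speech API headers. Everything in the sketch namespace (Result, InterfaceId, IUnknownLike, IVCmdNotifySinkLike, NotifySink, utterances) is a hypothetical stand-in, not the real COM or Speech API definition, and only two of the sink's notification members are included. It also assumes a single-threaded caller and an object that starts with one reference owned by its creator; real COM code would typically use HRESULT, REFIID, and interlocked counters instead.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical stand-ins for HRESULT codes and interface IDs (REFIID).
enum class Result { Ok, NoInterface };
enum class InterfaceId { Unknown, VCmdNotifySink, Other };

// Simplified analogue of IUnknown: interface lookup plus reference counting.
struct IUnknownLike {
    virtual Result QueryInterface(InterfaceId id, void** out) = 0;
    virtual std::uint32_t AddRef() = 0;
    virtual std::uint32_t Release() = 0;
    virtual ~IUnknownLike() = default;
};

// Reduced analogue of IVCmdNotifySink with two of its notifications.
struct IVCmdNotifySinkLike : IUnknownLike {
    virtual Result UtteranceBegin() = 0;
    virtual Result UtteranceEnd() = 0;
};

class NotifySink final : public IVCmdNotifySinkLike {
public:
    NotifySink() = default;  // starts with one reference, owned by the creator

    // Hand out a pointer only for interfaces this object supports, and
    // count the new reference on success.
    Result QueryInterface(InterfaceId id, void** out) override {
        if (id == InterfaceId::Unknown || id == InterfaceId::VCmdNotifySink) {
            *out = static_cast<IVCmdNotifySinkLike*>(this);
            AddRef();
            return Result::Ok;
        }
        *out = nullptr;
        return Result::NoInterface;
    }

    std::uint32_t AddRef() override { return ++refs_; }

    // Destroy the object when the last reference goes away. The count is
    // copied to a local first because members are invalid after delete.
    std::uint32_t Release() override {
        const std::uint32_t remaining = --refs_;
        if (remaining == 0) {
            delete this;
        }
        return remaining;
    }

    // Notifications: this sketch only counts utterances.
    Result UtteranceBegin() override { ++utterances_; return Result::Ok; }
    Result UtteranceEnd() override { return Result::Ok; }

    int utterances() const { return utterances_; }

private:
    ~NotifySink() override = default;  // private: destroyed only via Release

    std::uint32_t refs_ = 1;  // not thread-safe; a sketch only
    int utterances_ = 0;
};

}  // namespace sketch
```

A caller would create the sink with new, pass it to whatever object delivers the notifications, and call Release when finished with it. Making the destructor private is a design choice of this sketch: it keeps the object from being deleted directly while other references are still outstanding.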