text figures

Figure 3 Techniques for Converting Phonemes

Technique	Description	Pros	Cons
Word Concatenation	Uses prerecorded words as a basis for constructing sentences.	Easy to code.	Not very flexible and produces a discontinuous sound.
Synthesis	Generates a synthetic voice by means of mathematical algorithms.	Very flexible since the voice can be changed by varying parameters such as throat, mouth, and lips.	Produces a robotic sounding voice.
Subword Concatenation	Uses short, prerecorded sounds that are combined into a continuous sound by applying mathematical algorithms.	Most accurate and precise technique.	Requires a large database of basic, prerecorded sounds that need to be localized.
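
To make the trade-offs of word concatenation concrete, here is a minimal sketch of the idea. It is an illustration only: the Clip type, the ConcatenateWords() function, and the sample values are hypothetical and do not come from the Speech SDK. It shows why the technique is easy to code and also why it is inflexible (a word with no recording cannot be spoken) and discontinuous (clips are joined with no smoothing).

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Audio samples for one prerecorded word (hypothetical 16-bit PCM data).
using Clip = std::vector<std::int16_t>;

// Joins the clips for each whitespace-separated word of the sentence.
// Returns false (and leaves output empty) if any word has no recording.
bool ConcatenateWords(const std::map<std::string, Clip>& recordings,
                      const std::string& sentence,
                      Clip& output)
{
    output.clear();
    std::istringstream words(sentence);
    std::string word;
    while (words >> word)
    {
        const auto found = recordings.find(word);
        if (found == recordings.end())
        {
            output.clear();   // only prerecorded words can be spoken
            return false;
        }
        // Clips are appended as-is, with no smoothing at the joins.
        output.insert(output.end(),
                      found->second.begin(), found->second.end());
    }
    return true;
}
```

A caller would fill the map once with one clip per supported word and then pass each sentence to ConcatenateWords() before sending the result to the audio device.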

Figure 4 Voice Text Object Properties

General	Description
Enabled	Whether the voice text capability is enabled
CountEngines	The number of synthesis engines installed
IsSpeaking	Whether the engine is speaking
Voice	Description
Age	A value that denotes the age of the speaker
Gender	A value that denotes whether the speaker is male or female
Speaker	The name of the speaker, used as a mnemonic when you handle multiple voices at the same time
Speed	The average talking speed in words per minute
Style	The style of the voice (business, excited, casual)
Physical	Description
JawOpen	The angle to which the jaw is open
LipTension	A value denoting the degree of lip tension
MouthHeight	The height of the mouth
MouthWidth	The width of the mouth
MouthUpturn	The angle to which the mouth is open at the corners ("smile amplitude")
TeethLowerVisible	The extent to which the lower teeth are visible
TeethUpperVisible	The extent to which the upper teeth are visible
TonguePosn	The position of the tongue with respect to the teeth

Figure 5 Voice Text Object Methods

Methods
Dialog	Description
AboutDlg	Copyright notice for the TTS engine
GeneralDlg	General properties of the engine
LexiconDlg	Lets you add new words by specifying the exact pronunciation
TranslateDlg	Lets you add new abbreviations and acronyms by specifying how to translate them
Engine	Description
FastForward	Moves playback forward by approximately a sentence
Pause	Pauses the text being played
Resume	Resumes a previously interrupted playback
Speak	Speaks the specified text
StopSpeaking	Stops speaking and flushes any text still pending in the engine queue
Events
Engine	Description
AttribChanged	Fired when the value of a property changes
Speak	Fired when new text is added to the engine queue for further playback
SpeakingStarted	Indicates that the engine has just started speaking
SpeakingDone	Indicates that the engine has just finished speaking and the queue is empty
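
The way the engine methods and events relate to the queue can be sketched with a toy model. This is not the Speech SDK object: ToyVoiceQueue, its EventSink callback, and FinishCurrent() are hypothetical names, and the exact order in which a real engine fires Speak and SpeakingStarted is an assumption made for illustration.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Toy model of the engine queue (hypothetical; not an SDK class).
class ToyVoiceQueue
{
public:
    using EventSink = std::function<void(const std::string&)>;

    explicit ToyVoiceQueue(EventSink sink) : m_sink(std::move(sink)) {}

    // Adds text to the queue. Reports "Speak", then "SpeakingStarted"
    // if nothing was being played (ordering is an assumption).
    void Speak(const std::wstring& text)
    {
        const bool wasIdle = m_queue.empty();
        m_queue.push_back(text);
        m_sink("Speak");
        if (wasIdle)
            m_sink("SpeakingStarted");
    }

    // Simulates the engine reaching the end of the current text.
    // "SpeakingDone" is reported only once the queue is empty.
    void FinishCurrent()
    {
        if (m_queue.empty())
            return;
        m_queue.pop_front();
        if (m_queue.empty())
            m_sink("SpeakingDone");
    }

    // Stops playback and discards everything still in the queue.
    void StopSpeaking() { m_queue.clear(); }

    bool IsSpeaking() const { return !m_queue.empty(); }

private:
    EventSink m_sink;
    std::deque<std::wstring> m_queue;
};
```

In this model, two consecutive Speak() calls queue both texts, and SpeakingDone is reported only after the second one finishes, which matches the description of the SpeakingDone event in the table.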

Figure 7 SAPIMsgBox.cpp


 /*****************************************************************
 *
 *  Project.....:  Testing Speech SDK 4.0
 *  Application.:  TELLMEMORE.exe
 *  Module......:  SAPIMsgBox.cpp
 *  Description.:  Add speech capabilities to MessageBox()
 *  Compiler....:  MS Visual C++ 6.0
 *  Written by..:  D. Esposito
 *  Environment.:  Windows 9x/NT
 *
 ******************************************************************/
 
 /*---------------------------------------------------------------*/
 //                        INCLUDE section
 /*---------------------------------------------------------------*/
 #include <windows.h>
 #include <stdlib.h>      // mbstowcs()
 #include "SAPIUtils.h"
 
 /*---------------------------------------------------------------*/
 //                        GLOBAL section
 /*---------------------------------------------------------------*/
 // Data
 static HHOOK g_hookCBT=NULL;
 static LPTSTR g_pszText=NULL;
 static BOOL g_bMsgBoxActivated=false;
 
 // Callbacks
 static LRESULT CALLBACK CBTProc(int, WPARAM, LPARAM);
 
 /*---------------------------------------------------------------*/
 // Procedure...: SAPIMessageBox()
 // Description.: Add speech capabilities to MessageBox() calls 
 /*---------------------------------------------------------------*/
 int SAPIMessageBox(HWND hwnd, LPCTSTR szText, LPCTSTR szTitle, 
     UINT fuFlags)
 {
     // If no special style, then call the default MessageBox()
     if (!(fuFlags & MB_ENABLESPEAK))
         return MessageBox(hwnd, szText, szTitle, fuFlags);
 
     // By installing a CBT hook it's possible to detect when the
     // dialog gets created and activated. 
     g_hookCBT = SetWindowsHookEx(WH_CBT,
                                  reinterpret_cast<HOOKPROC>(CBTProc),
                                  NULL,GetCurrentThreadId());
 
     g_pszText = const_cast<LPTSTR>(szText);
     
     // Now call the real MessageBox() after removing the additional
     // speech-specific flag
     fuFlags &= ~MB_ENABLESPEAK;
     g_bMsgBoxActivated = false;
     int irc = MessageBox(hwnd, szText, szTitle, fuFlags);
 
     // Remove the hook 
     UnhookWindowsHookEx(g_hookCBT);
     return irc;
 }
 
 /*---------------------------------------------------------------*/
 // Procedure...: CBTProc()
 // Description.: CBT hook procedure that runs the Speech engine
 // INPUT.......: int, WPARAM, LPARAM
 // OUTPUT......: LRESULT
 /*---------------------------------------------------------------*/
 LRESULT CALLBACK CBTProc(int iHookCode, WPARAM wParam, LPARAM lParam)
 {
     // No interest this time...
     if (0 > iHookCode)
          return CallNextHookEx(g_hookCBT, iHookCode, wParam, lParam);
     
     // When the first window is activated, repeat the text...
     // (Since I set the hook just before MessageBox() I'm sure this
     // is just its window.)
     if (HCBT_ACTIVATE == iHookCode)
     {
         if (!g_bMsgBoxActivated)
         {
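             // Convert the ANSI text to Unicode for the speech engine.
             // (This assumes an ANSI build, where LPTSTR is char*.)
             int cchText = lstrlen(g_pszText) + 1;
             LPWSTR pwszText = new WCHAR[cchText];
             // ZeroMemory() takes a size in bytes, not in characters
             ZeroMemory(pwszText, cchText * sizeof(WCHAR));
             mbstowcs(pwszText, g_pszText, cchText);
 
             TellMe(pwszText);
             delete [] pwszText;   // array form matches new[]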
             g_bMsgBoxActivated = true;
         }
     }
     else if (HCBT_DESTROYWND == iHookCode)  // stop playing on exit
         TellMe(NULL);
 
     return CallNextHookEx(g_hookCBT, iHookCode, wParam, lParam);
 }
 
 /*  End of file: SAPIMsgBox.cpp  */
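
The narrow-to-wide conversion that CBTProc() performs before calling TellMe() can also be sketched with standard containers, so that the buffer size is counted in characters and the memory is released automatically. ToWide() is a hypothetical helper, not part of the original sample, and it assumes the input is a char string in the current C locale, as in an ANSI build.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: converts a narrow (multibyte) string to a wide
// string. Returns an empty string for NULL input or invalid sequences.
std::wstring ToWide(const char* narrow)
{
    if (narrow == nullptr)
        return std::wstring();

    // A wide string never needs more characters than the source has
    // bytes, so strlen() + 1 elements are always enough.
    const std::size_t capacity = std::strlen(narrow) + 1;
    std::vector<wchar_t> buffer(capacity, L'\0');

    const std::size_t converted =
        std::mbstowcs(buffer.data(), narrow, capacity);
    if (converted == static_cast<std::size_t>(-1))
        return std::wstring();   // invalid multibyte sequence

    return std::wstring(buffer.data(), converted);
}
```

With such a helper, the text could be converted once and its c_str() pointer handed to the speech call; the const qualifier would have to be cast away, because TellMe() as declared in the sample takes a non-const LPWSTR.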

Figure 8 Tags

Tag	Description
\Pau=number\	The engine will pause for the specified number of milliseconds.
\Mrk=number\	Sets a bookmark in the text. Number is the ID of the bookmark. When a bookmark is encountered, an event is generated. Bookmarks are supported only by DirectTextToSpeech.
\RmS=number\	Forces the engine to spell out all the words encountered until it is reset. Number is a boolean value.
\Vce=char=val [, char=val]\	Sets the specified attribute for the voice. Multiple settings can be specified at the same time.
\Ctx=string\	Defines a context for the next word. A context is a string that describes the word that follows, such as an email address, a URL, a number, or a date.
\Vol=number\	Sets the volume. Number is a value from 0 to 65535, where 0 is silence.
Note: Not all tags are supported by all engines.
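
Because the tags are plain text embedded in the string to be spoken, they are easy to build programmatically. The following sketch shows one way to do it. The helper names (PauseTag, VolumeTag, BookmarkTag, BuildAnnouncement) are hypothetical and not part of the Speech SDK, and the spaces placed around each tag are a precaution rather than a documented requirement.

```cpp
#include <string>

// Hypothetical helpers (not SDK functions) that format control tags.

// \Pau=number\ : pause for the given number of milliseconds.
std::wstring PauseTag(unsigned milliseconds)
{
    return L"\\Pau=" + std::to_wstring(milliseconds) + L"\\";
}

// \Vol=number\ : volume from 0 (silence) to 65535; larger values
// are clamped to the top of that range.
std::wstring VolumeTag(unsigned volume)
{
    if (volume > 65535u)
        volume = 65535u;
    return L"\\Vol=" + std::to_wstring(volume) + L"\\";
}

// \Mrk=number\ : bookmark with the given ID.
std::wstring BookmarkTag(unsigned id)
{
    return L"\\Mrk=" + std::to_wstring(id) + L"\\";
}

// Builds a message that sets the volume, speaks one sentence,
// pauses for half a second, and then speaks a second sentence.
std::wstring BuildAnnouncement()
{
    return VolumeTag(40000) + L" Copy complete. " +
           PauseTag(500) + L" Press OK to continue.";
}
```

The resulting string can be passed to whatever function speaks the text; whether a given tag has any effect depends on the installed engine.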

Figure 9 SAPIUtils.cpp


 /*****************************************************************
 *  Project.....:  Testing Speech SDK 4.0
 *  Application.:  TELLMEMORE.exe
 *  Module......:  SAPIUtils.cpp
 *  Description.:  Calls the Speech API functions
 *  Compiler....:  MS Visual C++ 6.0
 *  Written by..:  D. Esposito
 *  Environment.:  Windows 9x/NT
 ******************************************************************/
 
 /*---------------------------------------------------------------*/
 //                        PRAGMA section
 /*---------------------------------------------------------------*/
 // Force the linker to add the following libraries.
 #ifdef _MSC_VER
 #pragma comment(lib, "spchwrap.lib")
 #endif
 
 /*---------------------------------------------------------------*/
 //                        INCLUDE section
 /*---------------------------------------------------------------*/
 #include <objbase.h>
 #include <initguid.h>
 #include <spchwrap.h> 
 
 /*---------------------------------------------------------------*/
 //                        GLOBAL section
 /*---------------------------------------------------------------*/
 // Data
 static CVoiceText *g_pSpeakObject;
 
 /*---------------------------------------------------------------*/
 // Procedure...: InitSpeech()
 // Description.: Initializes the speech engine
 // INPUT.......: void
 // OUTPUT......: BOOL
 /*---------------------------------------------------------------*/
 BOOL InitSpeech(void)
 {
     CoInitialize(NULL);
 
     g_pSpeakObject = new CVoiceText;
     if (!g_pSpeakObject)
         return false;
 
     // Init() returns an HRESULT; treat any failure code as "no engine"
     if (FAILED(g_pSpeakObject->Init(L"TellMeMore"))) 
         return false;
 
     return true;
 }
 
 /*---------------------------------------------------------------*/
 // Procedure...: TellMe()
 // Description.: Repeats the specified text
 // INPUT.......: LPWSTR
 // OUTPUT......: void
 /*---------------------------------------------------------------*/
 void TellMe(LPWSTR wszText)
 {
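     // Nothing to do if InitSpeech() has not created the object
     if (NULL == g_pSpeakObject)
         return;
 
     // Interrupt any text that is still being spoken
     if (g_pSpeakObject->IsSpeaking())
         g_pSpeakObject->StopSpeaking();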
 
     if (NULL == wszText)
         return;
 
     g_pSpeakObject->Speak(wszText);
 }
 
 /*---------------------------------------------------------------*/
 // Procedure...: ShowDialog()
 // Description.: Run the wizard to add new words
 // INPUT.......: HWND 
 // OUTPUT......: void
 /*---------------------------------------------------------------*/
 void ShowDialog(HWND hWnd)
 {
     g_pSpeakObject->LexiconDlg(hWnd);
 }
 
 /*---------------------------------------------------------------*/
 // Procedure...: TermSpeech()
 // Description.: Closes the speech engine
 // INPUT.......: void
 // OUTPUT......: BOOL
 /*---------------------------------------------------------------*/
 BOOL TermSpeech(void)
 {
     delete g_pSpeakObject;
     g_pSpeakObject = NULL;   // avoid a dangling pointer
     CoUninitialize();
     return true;
 }
 
 /*  End of file: SAPIUtils.cpp  */

Figure 14 Detecting Clipboard Changes


 void OnDrawClipboard(HWND hwnd)
 {
    // List of the formats we're interested in
    static UINT auPriorityList[] = {CF_TEXT};  
 
    // Verify that the content is in CF_TEXT format. The function
    // returns 0 if the clipboard is empty and -1 if none of the
    // listed formats is available.
    int iFormat = GetPriorityClipboardFormat(auPriorityList, 1); 
    if (iFormat <= 0)
        return;
 
    // Get data from the clipboard
    if (OpenClipboard(hwnd))                     
    { 
       HGLOBAL hglb = GetClipboardData(iFormat); 
       if (hglb)
       {
          // CF_TEXT data is always ANSI text
          LPCSTR psz = static_cast<LPCSTR>(GlobalLock(hglb));  
          if (psz)
          {
             SetDlgItemTextA(hwnd, IDC_TEXT, psz);
             GlobalUnlock(hglb); 
          }
       }
       CloseClipboard();                     
    }
 
    // Make the window flash. (Only Windows 98 and 2000)
    // When you then click on the window, it'll receive a
    // WM_NCACTIVATE message where the original title is 
    // restored
    SetWindowText(hwnd, APPTITLE_HILITE);
    SetForegroundWindow(hwnd);
 
    return;
 }

Figure 16 Speech Tips

Check the engine	Make the speech capability of your program available only if an engine is installed. Gray the related UI elements otherwise.
Make speech capability optional	While speech capability is intriguing, it's still a relatively new feature that not all systems can support. Make speech capability settable by the user.
Use speech together with ordinary UI	Don't design speech-only user interfaces; mix speech with ordinary UI elements.
Prefer short sentences	Speech can be burdensome, both for the machine and the user, so try to use only short sentences whenever possible.
Speech or recorded voice	Don't mix synthesis-generated voice with recorded voice. There are too many differences in speed, tone, pitch, sound, fluency, and volume. Choose one approach and stick with it.
