Figure 3   Techniques for Converting Phonemes

Technique | Description | Pros | Cons
Word Concatenation | Uses prerecorded words as a basis for constructing sentences. | Easy to code. | Not very flexible and produces a discontinuous sound.
Synthesis | Generates a synthetic voice by means of mathematical algorithms. | Very flexible since the voice can be changed by varying parameters such as throat, mouth, and lips. | Produces a robotic sounding voice.
Subword Concatenation | Uses short, prerecorded sounds that are combined into a continuous sound by applying mathematical algorithms. | Most accurate and precise technique. | Requires a large database of basic, prerecorded sounds that need to be localized.


Figure 4   Voice Text Object Properties

General | Description
Enabled | Whether the voice text capability is enabled
CountEngines | The number of synthesis engines installed
IsSpeaking | Whether the engine is speaking

Voice | Description
Age | A value that denotes the age of the speaker
Gender | A value that denotes whether the speaker is male or female
Speaker | The name of the speaker, used as a mnemonic when you handle multiple voices at the same time
Speed | The average talking speed in words per minute
Style | The style of the voice (business, excited, casual)

Physical | Description
JawOpen | The angle to which the jaw is open
LipTension | A value denoting the degree of lip tension
MouthHeight | The height of the mouth
MouthWidth | The width of the mouth
MouthUpturn | The angle to which the mouth is open at the corners ("smile amplitude")
TeethLowerVisible | The extent to which the lower teeth are visible
TeethUpperVisible | The extent to which the upper teeth are visible
TonguePosn | The position of the tongue with respect to the teeth


Figure 5   Voice Text Object Methods

Methods

Dialog | Description
AboutDlg | Copyright notice for the TTS engine
GeneralDlg | General properties of the engine
LexiconDlg | Lets you add new words by specifying the exact pronunciation
TranslateDlg | Lets you add new abbreviations and acronyms by specifying how to translate them

Engine | Description
FastForward | Moves playback forward by approximately a sentence
Pause | Pauses the text being played
Resume | Resumes a previously interrupted playback
Speak | Speaks the specified text
StopSpeaking | Stops speaking and flushes all the changes being made to the engine queue

Events

Engine | Description
AttribChanged | Fired when the value of a property changes
Speak | Fired when new text is added to the engine queue for further playback
SpeakingStarted | Indicates that the engine has just started speaking
SpeakingDone | Indicates that the engine has just finished speaking and the queue is empty


Figure 7   SAPIMsgBox.cpp


 /*****************************************************************
 *
 *  Project.....:  Testing Speech SDK 4.0
 *  Application.:  TELLMEMORE.exe
 *  Module......:  SAPIMsgBox.cpp
 *  Description.:  Add speech capabilities to MessageBox()
 *  Compiler....:  MS Visual C++ 6.0
 *  Written by..:  D. Esposito
 *  Environment.:  Windows 9x/NT
 *
 ******************************************************************/
 
 /*---------------------------------------------------------------*/
 //                        INCLUDE section
 /*---------------------------------------------------------------*/
 #include <windows.h>
 #include <stdlib.h>      // mbstowcs()
 #include "SAPIUtils.h"
 
 /*---------------------------------------------------------------*/
 //                        GLOBAL section
 /*---------------------------------------------------------------*/
 // Data
 static HHOOK g_hookCBT=NULL;
 static LPTSTR g_pszText=NULL;
 static BOOL g_bMsgBoxActivated=false;
 
 // Callbacks
 static LRESULT CALLBACK CBTProc(int, WPARAM, LPARAM);
 
 /*---------------------------------------------------------------*/
 // Procedure...: SAPIMessageBox()
 // Description.: Add speech capabilities to MessageBox() calls 
 /*---------------------------------------------------------------*/
 int SAPIMessageBox(HWND hwnd, LPCTSTR szText, LPCTSTR szTitle, 
     UINT fuFlags)
 {
     // If no special style, then call the default MessageBox()
     if (!(fuFlags & MB_ENABLESPEAK))
         return MessageBox(hwnd, szText, szTitle, fuFlags);
 
     // By installing a CBT hook it's possible to detect when the
     // dialog gets created and activated. 
     g_hookCBT = SetWindowsHookEx(WH_CBT,
                                  reinterpret_cast<HOOKPROC>(CBTProc),
                                  NULL,GetCurrentThreadId());
 
     g_pszText = const_cast<LPTSTR>(szText);
     
     // Now call the real MessageBox() after removing the additional
     // speech-specific flag
     fuFlags &= ~MB_ENABLESPEAK;
     g_bMsgBoxActivated = false;
     int irc = MessageBox(hwnd, szText, szTitle, fuFlags);
 
     // Remove the hook 
     UnhookWindowsHookEx(g_hookCBT);
     return irc;
 }
 
 /*---------------------------------------------------------------*/
 // Procedure...: CBTProc()
 // Description.: CBT hook procedure that runs the Speech engine
 // INPUT.......: int, WPARAM, LPARAM
 // OUTPUT......: LRESULT
 /*---------------------------------------------------------------*/
 LRESULT CALLBACK CBTProc(int iHookCode, WPARAM wParam, LPARAM lParam)
 {
     // No interest this time...
     if (0 > iHookCode)
          return CallNextHookEx(g_hookCBT, iHookCode, wParam, lParam);
     
     // When the first window is activated, repeat the text...
     // (Since I set the hook just before MessageBox() I'm sure this
     // is just its window.)
     if (HCBT_ACTIVATE == iHookCode)
     {
         if (!g_bMsgBoxActivated)
         {
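             // Convert the ANSI text to Unicode for the speech engine.
             // (This assumes a non-UNICODE build, where LPTSTR is char*.)
             int cch = lstrlen(g_pszText) + 1;
             LPWSTR pwszText = new WCHAR[cch];
             ZeroMemory(pwszText, cch * sizeof(WCHAR));  // size in bytes
             mbstowcs(pwszText, g_pszText, cch);
 
             TellMe(pwszText);
             delete [] pwszText;   // allocated with new[]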
             g_bMsgBoxActivated = true;
         }
     }
     else  // stop playing on exit
     if (HCBT_DESTROYWND == iHookCode)
         TellMe(NULL);
 
     return CallNextHookEx(g_hookCBT, iHookCode, wParam, lParam);
 }
 
 /*  End of file: SAPIMsgBox.cpp  */
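
The narrow-to-wide conversion inside CBTProc() is the step most likely to go wrong: the buffer must be sized in characters, cleared in bytes, and released with the array form of delete. As a minimal sketch in portable C++, the same conversion can be written so that the buffer manages itself. The helper name ToWide is hypothetical and does not appear in the article's sources; the sketch also assumes the text is valid in the current C locale.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of the article's sources): converts a
// narrow string to a wide string using the current C locale.
std::wstring ToWide(const char* narrow)
{
    assert(narrow != nullptr);

    // One wide character per input byte is always enough, plus the terminator.
    const std::size_t cch = std::strlen(narrow) + 1;
    std::vector<wchar_t> buffer(cch, L'\0');   // zero-filled, freed automatically

    const std::size_t converted = std::mbstowcs(buffer.data(), narrow, cch);
    if (converted == static_cast<std::size_t>(-1))
        return std::wstring();                 // invalid multibyte sequence

    return std::wstring(buffer.data(), converted);
}
```

Because the vector owns the memory, there is no delete call to get wrong, and an early return cannot leak the buffer.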


Figure 8   Tags

Tag | Description
\Pau=number\ | The engine will pause for the specified number of milliseconds.
\Mrk=number\ | Sets a bookmark in the text. Number is the ID of the bookmark. When a bookmark is encountered, an event is generated. Bookmarks are supported only by DirectTextToSpeech.
\RmS=number\ | Forces the engine to spell out all the words encountered until it is reset. Number is a boolean value.
\Vce=char=val[, char=val]\ | Sets the specified attribute for the voice. Several settings can be specified at the same time.
\Ctx=string\ | Defines a context for the next word. A context is a string that describes the word that follows, for example, an email address, a URL, a number, or a date.
\Vol=number\ | Sets the volume. Number is a value from 0 to 65535, where 0 is silence.
NB: Not all tags are supported by all engines.
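
Since tags are plain text delimited by backslashes, they can be composed with ordinary string code before the text is handed to the engine. The following is a minimal sketch under that assumption. PauseTag and VolumeTag are hypothetical helper names, not part of the Speech SDK, and only the tag syntax and the volume range are taken from Figure 8.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers (not part of the Speech SDK). Each one returns a
// single control tag from Figure 8 as a wide string.

// \Pau=number\ : pause for the given number of milliseconds.
std::wstring PauseTag(unsigned milliseconds)
{
    return L"\\Pau=" + std::to_wstring(milliseconds) + L"\\";
}

// \Vol=number\ : set the volume, where 0 is silence.
std::wstring VolumeTag(unsigned volume)
{
    assert(volume <= 65535);   // Figure 8 gives the range as 0 to 65535
    return L"\\Vol=" + std::to_wstring(volume) + L"\\";
}
```

A caller might then build a string such as L"Backup complete. " + PauseTag(500) + L" You can remove the disk." and pass it on for speaking. Whether a given engine honors each tag still depends on the engine, as the note above says.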


Figure 9   SAPIUtils.cpp


 /*****************************************************************
 *  Project.....:  Testing Speech SDK 4.0
 *  Application.:  TELLMEMORE.exe
 *  Module......:  SAPIUtils.cpp
 *  Description.:  Calls the Speech API functions
 *  Compiler....:  MS Visual C++ 6.0
 *  Written by..:  D. Esposito
 *  Environment.:  Windows 9x/NT
 ******************************************************************/
 
 /*---------------------------------------------------------------*/
 //                        PRAGMA section
 /*---------------------------------------------------------------*/
 // Force the linker to add the following libraries.
 #ifdef _MSC_VER
 #pragma comment(lib, "spchwrap.lib")
 #endif
 
 /*---------------------------------------------------------------*/
 //                        INCLUDE section
 /*---------------------------------------------------------------*/
 #include <objbase.h>
 #include <initguid.h>
 #include <spchwrap.h> 
 
 /*---------------------------------------------------------------*/
 //                        GLOBAL section
 /*---------------------------------------------------------------*/
 // Data
 static CVoiceText *g_pSpeakObject;
 
 /*---------------------------------------------------------------*/
 // Procedure...: InitSpeech()
 // Description.: Initializes the speech engine
 // INPUT.......: void
 // OUTPUT......: BOOL
 /*---------------------------------------------------------------*/
 BOOL InitSpeech(void)
 {
     CoInitialize(NULL);
 
     g_pSpeakObject = new CVoiceText;
     if (!g_pSpeakObject)
         return false;
 
     if (g_pSpeakObject->Init(L"TellMeMore")) 
         return false;
 
     return true;
 }
 
 /*---------------------------------------------------------------*/
 // Procedure...: TellMe()
 // Description.: Repeats the specified text
 // INPUT.......: LPWSTR
 // OUTPUT......: void
 /*---------------------------------------------------------------*/
 void TellMe(LPWSTR wszText)
 {
     // Nothing to do if InitSpeech() was never called or has failed
     if (!g_pSpeakObject)
         return;
 
     if (g_pSpeakObject->IsSpeaking())
         g_pSpeakObject->StopSpeaking();
 
     if (NULL == wszText)
         return;
 
     g_pSpeakObject->Speak(wszText);
 }
 
 /*---------------------------------------------------------------*/
 // Procedure...: ShowDialog()
 // Description.: Runs the lexicon dialog to add new words
 // INPUT.......: HWND 
 // OUTPUT......: void
 /*---------------------------------------------------------------*/
 void ShowDialog(HWND hWnd)
 {
     g_pSpeakObject->LexiconDlg(hWnd);
 }
 
 /*---------------------------------------------------------------*/
 // Procedure...: TermSpeech()
 // Description.: Closes the speech engine
 // INPUT.......: void
 // OUTPUT......: BOOL
 /*---------------------------------------------------------------*/
 BOOL TermSpeech(void)
 {
     delete g_pSpeakObject;
     g_pSpeakObject = NULL;   // guard against use after termination
     CoUninitialize();
     return true;
 }
 
 /*  End of file: SAPIUtils.cpp  */


Figure 14   Detecting Clipboard Changes


 void OnDrawClipboard(HWND hwnd)
 {
    // List of the formats we're interested in
    static UINT auPriorityList[] = {CF_TEXT};  
 
    // Verify that the content is in CF_TEXT format. The function returns
    // -1 if no listed format is available and 0 if the clipboard is empty.
    int iFormat = GetPriorityClipboardFormat(auPriorityList, 1); 
    if (iFormat <= 0)
        return;
 
    // Get data from the clipboard
    if (OpenClipboard(hwnd))                     
    { 
       HGLOBAL hglb = GetClipboardData(iFormat); 
       if (hglb)
       {
          // CF_TEXT data is always ANSI text
          LPCSTR psz = static_cast<LPCSTR>(GlobalLock(hglb));  
          if (psz)
          {
             SetDlgItemTextA(hwnd, IDC_TEXT, psz);
             GlobalUnlock(hglb); 
          }
       }
       CloseClipboard();                     
    }
 
    // Make the window flash. (Only Windows 98 and 2000)
    // When you then click on the window, it'll receive a
    // WM_NCACTIVATE message where the original title is 
    // restored
    SetWindowText(hwnd, APPTITLE_HILITE);
    SetForegroundWindow(hwnd);
 
    return;
 }


Figure 16   Speech Tips

Check the engine: Make the speech capability of your program available only if an engine is installed. Gray the related UI elements otherwise.
Make speech capability optional: While speech capability is intriguing, it's still a relatively new feature that not all systems can support. Make speech capability settable by the user.
Use speech together with ordinary UI: Don't design speech-only user interfaces; mix speech with ordinary UI elements.
Prefer short sentences: Speech can be burdensome, both for the machine and the user, so try to use only short sentences whenever possible.
Speech or recorded voice: Don't mix synthesis-generated voice with recorded voice. There are too many differences in speed, tone, pitch, sound, fluency, and volume. Choose one approach and stick with it.
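
When the text to be spoken comes from elsewhere (the clipboard, for instance), one possible way to follow the short-sentences tip is to break the text at sentence boundaries and queue the pieces one at a time. The sketch below is only an illustration of that idea. SplitSentences is a hypothetical helper, and the boundary rule it uses (a period, exclamation mark, or question mark followed by white space or the end of the text) is a simplifying assumption.

```cpp
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of the article's sources): splits text
// into sentences so that each one can be spoken separately.
std::vector<std::wstring> SplitSentences(const std::wstring& text)
{
    std::vector<std::wstring> sentences;
    std::wstring current;

    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const wchar_t ch = text[i];

        // Skip white space between sentences.
        if (current.empty() && std::iswspace(ch))
            continue;
        current += ch;

        // A sentence ends at . ! or ? followed by white space or end of text.
        const bool terminator = (ch == L'.' || ch == L'!' || ch == L'?');
        const bool atBoundary =
            (i + 1 == text.size()) || std::iswspace(text[i + 1]);
        if (terminator && atBoundary)
        {
            sentences.push_back(current);
            current.clear();
        }
    }

    if (!current.empty())
        sentences.push_back(current);   // trailing text without a terminator

    return sentences;
}
```

Requiring white space after the terminator keeps a value such as "3.5" from being split in the middle. Abbreviations like "e.g." would still be treated as sentence ends, so real text may need a more careful rule.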