Technique | Description | Pros | Cons |
Word Concatenation | Uses prerecorded words as a basis for constructing sentences. | Easy to code. | Not very flexible and produces a discontinuous sound. |
Synthesis | Generates a synthetic voice by means of mathematical algorithms. | Very flexible since the voice can be changed by varying parameters such as throat, mouth, and lips. | Produces a robotic-sounding voice. |
Subword Concatenation | Uses short, prerecorded sounds that are combined into a continuous sound by applying mathematical algorithms. | Most accurate and precise technique. | Requires a large database of basic, prerecorded sounds that need to be localized. |
Figure 4 Voice Text Object Properties
General | Description |
Enabled | Whether the voice text capability is enabled |
CountEngines | The number of synthesis engines installed |
IsSpeaking | Whether the engine is speaking |
Voice | Description |
Age | A value that denotes the age of the speaker |
Gender | A value that denotes whether the speaker is male or female |
Speaker | The name of the speaker, used as a mnemonic when you handle multiple voices at the same time |
Speed | The average talking speed in words per minute |
Style | The style of the voice (business, excited, casual) |
Physical | Description |
JawOpen | The angle to which the jaw is open |
LipTension | A value denoting the degree of lip tension |
MouthHeight | The height of the mouth |
MouthWidth | The width of the mouth |
MouthUpturn | The angle to which the mouth is open at the corners ("smile amplitude") |
TeethLowerVisible | The extent to which the lower teeth are visible |
TeethUpperVisible | The extent to which the upper teeth are visible |
TonguePosn | The position of the tongue with respect to the teeth |
Figure 5 Voice Text Object Methods
Methods |
Dialog | Description |
AboutDlg | Copyright notice for the TTS engine |
GeneralDlg | General properties of the engine |
LexiconDlg | Lets you add new words by specifying the exact pronunciation |
TranslateDlg | Lets you add new abbreviations and acronyms by specifying how to translate them |
Engine | Description |
FastForward | Moves playback forward by approximately a sentence |
Pause | Pauses the text being played |
Resume | Resumes playback that was previously paused |
Speak | Speaks the specified text |
StopSpeaking | Stops speaking and flushes any text still pending in the engine queue |
Events |
Engine | Description |
AttribChanged | Fired when the value of a property changes |
Speak | Fired when new text is added to the engine queue for further playback |
SpeakingStarted | Indicates that the engine has just started speaking |
SpeakingDone | Indicates that the engine has just finished speaking and the queue is empty |
Figure 7 SAPIMsgBox.cpp
/*****************************************************************
*
* Project.....: Testing Speech SDK 4.0
* Application.: TELLMEMORE.exe
* Module......: SAPIMsgBox.cpp
* Description.: Add speech capabilities to MessageBox()
* Compiler....: MS Visual C++ 6.0
* Written by..: D. Esposito
* Environment.: Windows 9x/NT
*
******************************************************************/
/*---------------------------------------------------------------*/
// INCLUDE section
/*---------------------------------------------------------------*/
#include <windows.h>
#include <stdlib.h>     // mbstowcs()
#include "SAPIUtils.h"
/*---------------------------------------------------------------*/
// GLOBAL section
/*---------------------------------------------------------------*/
// Data
static HHOOK g_hookCBT=NULL;
static LPTSTR g_pszText=NULL;
static BOOL g_bMsgBoxActivated=false;
// Callbacks
static LRESULT CALLBACK CBTProc(int, WPARAM, LPARAM);
/*---------------------------------------------------------------*/
// Procedure...: SAPIMessageBox()
// Description.: Add speech capabilities to MessageBox() calls
/*---------------------------------------------------------------*/
int SAPIMessageBox(HWND hwnd, LPCTSTR szText, LPCTSTR szTitle,
                   UINT fuFlags)
{
    // If no special style, then call the default MessageBox()
    if (!(fuFlags & MB_ENABLESPEAK))
        return MessageBox(hwnd, szText, szTitle, fuFlags);
    // By installing a CBT hook it's possible to detect when the
    // dialog gets created and activated.
    g_hookCBT = SetWindowsHookEx(WH_CBT,
        reinterpret_cast<HOOKPROC>(CBTProc),
        NULL, GetCurrentThreadId());
    g_pszText = const_cast<LPTSTR>(szText);
    // Now call the real MessageBox() after removing the additional
    // speech-specific flag
    fuFlags &= ~MB_ENABLESPEAK;
    g_bMsgBoxActivated = false;
    int irc = MessageBox(hwnd, szText, szTitle, fuFlags);
    // Remove the hook (if it was installed) and reset the globals
    if (g_hookCBT)
        UnhookWindowsHookEx(g_hookCBT);
    g_hookCBT = NULL;
    g_pszText = NULL;
    return irc;
}
/*---------------------------------------------------------------*/
// Procedure...: CBTProc()
// Description.: CBT hook procedure that runs the Speech engine
// INPUT.......: int, WPARAM, LPARAM
// OUTPUT......: LRESULT
/*---------------------------------------------------------------*/
LRESULT CALLBACK CBTProc(int iHookCode, WPARAM wParam, LPARAM lParam)
{
    // No interest this time...
    if (0 > iHookCode)
        return CallNextHookEx(g_hookCBT, iHookCode, wParam, lParam);
    // When the first window is activated, speak the text...
    // (Since I set the hook just before MessageBox() I'm sure this
    // is just its window.)
    if (HCBT_ACTIVATE == iHookCode)
    {
        if (!g_bMsgBoxActivated && g_pszText)
        {
            // Buffer size in characters, including the terminator.
            // mbstowcs() expects a char string, so this code assumes
            // an ANSI (non-UNICODE) build.
            int cch = lstrlen(g_pszText) + 1;
            LPWSTR pwszText = new WCHAR[cch];
            // ZeroMemory() takes a size in bytes, not in characters
            ZeroMemory(pwszText, cch * sizeof(WCHAR));
            mbstowcs(pwszText, g_pszText, cch);
            TellMe(pwszText);
            delete [] pwszText;     // array form to match new[]
            g_bMsgBoxActivated = true;
        }
    }
    else // stop playing on exit
        if (HCBT_DESTROYWND == iHookCode)
            TellMe(NULL);
    return CallNextHookEx(g_hookCBT, iHookCode, wParam, lParam);
}
/* End of file: SAPIMsgBox.cpp */
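CBTProc() converts the message text to wide characters with mbstowcs() before passing it to TellMe(). The following is a minimal sketch of the same conversion in portable C++, not part of the original project. It assumes a compiler newer than Visual C++ 6.0 and the current C locale; the helper name ToWide is hypothetical. A std::vector owns the temporary buffer, so there is no manual allocation to size or free.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not in the original listing): converts a narrow
// string to a wide string using the current C locale.
std::wstring ToWide(const char *text)
{
    if (text == nullptr)
        return std::wstring();

    // A multibyte string never yields more wide characters than it has
    // bytes, so strlen + 1 elements always leave room for the terminator.
    std::vector<wchar_t> buffer(std::strlen(text) + 1, L'\0');

    // mbstowcs() takes the destination size in wide characters and
    // returns the number converted, excluding the terminator.
    std::size_t converted = std::mbstowcs(&buffer[0], text, buffer.size());
    if (converted == static_cast<std::size_t>(-1))
        return std::wstring();      // invalid multibyte sequence

    assert(converted < buffer.size());
    return std::wstring(&buffer[0], converted);
}
```

Because TellMe() takes a non-const LPWSTR, a caller would still have to copy the result into a writable buffer before passing it on.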
Figure 8 Tags
Tag | Description |
\Pau=number\ | The engine pauses for the specified number of milliseconds. |
\Mrk=number\ | Sets a bookmark in the text. Number is the ID of the bookmark. When a bookmark is encountered, an event is generated. Bookmarks are supported only by DirectTextToSpeech. |
\RmS=number\ | Forces the engine to spell out all the words encountered until it is reset. Number is a Boolean value. |
\Vce=char=val [, char=val]\ | Sets the specified attribute for the voice. Several settings can be specified at the same time. |
\Ctx=string\ | Defines a context for the next word. A context is a string that describes the word that follows; for example, an email address, a URL, a number, or a date. |
\Vol=number\ | Sets the volume. Number is a value from 0 to 65535, where 0 is silence. |
Note: Not all tags are supported by all engines. |
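Tags are embedded directly in the text handed to the engine. The following is a minimal sketch of how such strings might be composed. The helper names PauseTag, VolumeTag, and BookmarkTag are hypothetical and not part of the Speech SDK, and std::to_wstring assumes a C++11 compiler rather than Visual C++ 6.0.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers (not part of the Speech SDK) that format some of
// the control tags listed in Figure 8.

// Builds \Pau=number\ : pause for the given number of milliseconds.
std::wstring PauseTag(unsigned milliseconds)
{
    return L"\\Pau=" + std::to_wstring(milliseconds) + L"\\";
}

// Builds \Vol=number\ : volume from 0 (silence) to 65535.
std::wstring VolumeTag(unsigned volume)
{
    assert(volume <= 65535);
    return L"\\Vol=" + std::to_wstring(volume) + L"\\";
}

// Builds \Mrk=number\ : bookmark with the given ID.
std::wstring BookmarkTag(unsigned id)
{
    return L"\\Mrk=" + std::to_wstring(id) + L"\\";
}
```

For example, `L"File saved." + PauseTag(500) + L" Anything else?"` yields text with a half-second pause between the two sentences, provided the installed engine honors the tag.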
Figure 9 SAPIUtils.cpp
/*****************************************************************
* Project.....: Testing Speech SDK 4.0
* Application.: TELLMEMORE.exe
* Module......: SAPIUtils.cpp
* Description.: Calls the Speech API functions
* Compiler....: MS Visual C++ 6.0
* Written by..: D. Esposito
* Environment.: Windows 9x/NT
******************************************************************/
/*---------------------------------------------------------------*/
// PRAGMA section
/*---------------------------------------------------------------*/
// Force the linker to add the following libraries.
#ifdef _MSC_VER
#pragma comment(lib, "spchwrap.lib")
#endif
/*---------------------------------------------------------------*/
// INCLUDE section
/*---------------------------------------------------------------*/
#include <objbase.h>
#include <initguid.h>
#include <spchwrap.h>
/*---------------------------------------------------------------*/
// GLOBAL section
/*---------------------------------------------------------------*/
// Data
static CVoiceText *g_pSpeakObject;
/*---------------------------------------------------------------*/
// Procedure...: InitSpeech()
// Description.: Initializes the speech engine
// INPUT.......: void
// OUTPUT......: BOOL
/*---------------------------------------------------------------*/
BOOL InitSpeech(void)
{
    CoInitialize(NULL);
    g_pSpeakObject = new CVoiceText;
    if (!g_pSpeakObject)
        return false;
    // A nonzero result from Init() is treated as a failure
    if (g_pSpeakObject->Init(L"TellMeMore"))
        return false;
    return true;
}
/*---------------------------------------------------------------*/
// Procedure...: TellMe()
// Description.: Speaks the specified text (NULL just stops playback)
// INPUT.......: LPWSTR
// OUTPUT......: void
/*---------------------------------------------------------------*/
void TellMe(LPWSTR wszText)
{
    // Nothing to do if InitSpeech() hasn't created the object
    if (!g_pSpeakObject)
        return;
    if (g_pSpeakObject->IsSpeaking())
        g_pSpeakObject->StopSpeaking();
    if (NULL == wszText)
        return;
    g_pSpeakObject->Speak(wszText);
}
/*---------------------------------------------------------------*/
// Procedure...: ShowDialog()
// Description.: Run the lexicon wizard to add new words
// INPUT.......: HWND
// OUTPUT......: void
/*---------------------------------------------------------------*/
void ShowDialog(HWND hWnd)
{
    g_pSpeakObject->LexiconDlg(hWnd);
}
/*---------------------------------------------------------------*/
// Procedure...: TermSpeech()
// Description.: Closes the speech engine
// INPUT.......: void
// OUTPUT......: BOOL
/*---------------------------------------------------------------*/
BOOL TermSpeech(void)
{
    delete g_pSpeakObject;
    g_pSpeakObject = NULL;      // avoid a dangling pointer
    CoUninitialize();
    return true;
}
/* End of file: SAPIUtils.cpp */
Figure 14 Detecting Clipboard Changes
void OnDrawClipboard(HWND hwnd)
{
    // List of the formats we're interested in
    static UINT auPriorityList[] = {CF_TEXT};
    // Verify that the content is in CF_TEXT format.
    // GetPriorityClipboardFormat() returns 0 if the clipboard is
    // empty and -1 if it holds data in none of the listed formats.
    int iFormat = GetPriorityClipboardFormat(auPriorityList, 1);
    if (iFormat <= 0)
        return;
    // Get data from the clipboard
    if (OpenClipboard(hwnd))
    {
        HGLOBAL hglb = GetClipboardData(iFormat);
        if (hglb)
        {
            // CF_TEXT data is always ANSI text
            LPCSTR psz = static_cast<LPCSTR>(GlobalLock(hglb));
            if (psz)
            {
                SetDlgItemTextA(hwnd, IDC_TEXT, psz);
                GlobalUnlock(hglb);
            }
        }
        CloseClipboard();
    }
    // Make the window flash. (Only Windows 98 and 2000)
    // When you then click on the window, it'll receive a
    // WM_NCACTIVATE message where the original title is
    // restored
    SetWindowText(hwnd, APPTITLE_HILITE);
    SetForegroundWindow(hwnd);
    return;
}
Figure 16 Speech Tips
Check the engine | Make the speech capability of your program available only if an engine is installed. Gray the related UI elements otherwise. |
Make speech capability optional | While speech capability is intriguing, it's still a relatively new feature that not all systems can support. Make speech capability settable by the user. |
Use speech together with ordinary UI | Don't design speech-only user interfaces; mix speech with ordinary UI elements. |
Prefer short sentences | Speech can be burdensome, both for the machine and the user, so try to use only short sentences whenever possible. |
Speech or recorded voice | Don't mix synthesis-generated voice with recorded voice. There are too many differences in speed, tone, pitch, sound, fluency, and volume. Choose one approach and stick with it. |