SRRESWORDNODE

This structure provides information about a word node in a recognition/alternative graph generated by a speech-recognition engine.

Syntax

typedef struct { // srrwn
    DWORD dwNextWordNode;
    DWORD dwUpAlternateWordNode;
    DWORD dwDownAlternateWordNode;
    DWORD dwPreviousWordNode;
    DWORD dwPhonemeNode;
    QWORD qwStartTime;
    QWORD qwEndTime;
    DWORD dwWordScore;
    WORD wVolume;
    WORD wPitch;
    VOICEPARTOFSPEECH pos;
    DWORD dwCFGParse;
    DWORD dwCue;
} SRRESWORDNODE, *PSRRESWORDNODE;

Members

dwNextWordNode

Specifies the node number of the node that follows next in time. For example, if the current node contains “Mail,” the next node would contain “To.” If the next node has several alternatives, this member should identify the one on the path that produces the highest path score at the final node. If there is nothing left in the path, this can be NULL.

dwUpAlternateWordNode

Specifies the node number of an alternative node with a higher score than that of the word in this node. For example, if this node contains “Nail,” the alternative node might contain “Mail.” The alternative nodes should be ordered by word score, highest first. If there are no more alternatives, this can be NULL.

dwDownAlternateWordNode

Specifies the node number of an alternative node with a lower score than that of the word in this node. For example, if this node contains “Mail,” the alternative node might contain “Nail.” The alternative nodes should be ordered by word score, highest first. If there are no more alternatives, this can be NULL.
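
The up and down links form an ordered chain of alternatives for one position in the utterance. The following is a minimal sketch of walking that chain. It assumes the nodes have already been copied into an array indexed by node number, with index 0 unused so that zero can stand for NULL. The AltNode type and both function names are hypothetical stand-ins, not part of the API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DWORD; /* stand-in for the Windows typedef */

/* Hypothetical reduced copy of SRRESWORDNODE: only the members used here. */
typedef struct {
    DWORD dwUpAlternateWordNode;
    DWORD dwDownAlternateWordNode;
    DWORD dwWordScore;
} AltNode;

/* Follow the up links from `start` to the highest-scoring alternative.
   Returns 0 if `start` is not a valid node number. */
static DWORD top_alternative(const AltNode *nodes, size_t count, DWORD start)
{
    assert(nodes != NULL);
    if (start == 0 || start >= count) return 0;
    DWORD cur = start;
    /* The step limit guards against a malformed chain that loops. */
    for (size_t steps = 0; steps < count; ++steps) {
        DWORD up = nodes[cur].dwUpAlternateWordNode;
        if (up == 0 || up >= count) break;
        cur = up;
    }
    return cur;
}

/* Write the node numbers from `top` downward into `out`, best score first.
   Returns the number of entries written (at most `cap`). */
static size_t list_alternatives(const AltNode *nodes, size_t count,
                                DWORD top, DWORD *out, size_t cap)
{
    assert(nodes != NULL && out != NULL);
    size_t n = 0;
    DWORD cur = top;
    while (cur != 0 && cur < count && n < cap && n < count) {
        out[n++] = cur;
        cur = nodes[cur].dwDownAlternateWordNode;
    }
    return n;
}
```

With “Mail” stored as node 1 and “Nail” as node 2, calling top_alternative on node 2 yields node 1, and list_alternatives from node 1 yields the sequence 1, 2.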

dwPreviousWordNode

Specifies the node number of the previous node. If there are several previous nodes, this member identifies the one on the path that produces the highest path score at the final node. If there is nothing before this node in the path, this can be NULL.
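
The previous and next links together describe the best path through the graph. The sketch below walks back from a final node to the start of the path and then joins the words in forward order. It assumes the nodes sit in an array indexed by node number (index 0 unused) and that the word text is held in a parallel array; the structure itself carries no text. PathNode, path_start, and join_best_path are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t DWORD; /* stand-in for the Windows typedef */

/* Hypothetical reduced copy of SRRESWORDNODE: only the path links. */
typedef struct {
    DWORD dwNextWordNode;
    DWORD dwPreviousWordNode;
} PathNode;

/* Walk dwPreviousWordNode links back to the first node on the best path. */
static DWORD path_start(const PathNode *nodes, size_t count, DWORD node)
{
    assert(nodes != NULL);
    if (node == 0 || node >= count) return 0;
    for (size_t steps = 0; steps < count; ++steps) {
        DWORD prev = nodes[node].dwPreviousWordNode;
        if (prev == 0 || prev >= count) break;
        node = prev;
    }
    return node;
}

/* Join the words along dwNextWordNode into `buf`, separated by spaces.
   Stops early if `buf` is full. Returns the number of words written. */
static size_t join_best_path(const PathNode *nodes, const char *const *words,
                             size_t count, DWORD first,
                             char *buf, size_t bufsize)
{
    assert(nodes != NULL && words != NULL && buf != NULL && bufsize > 0);
    size_t used = 0, visited = 0;
    buf[0] = '\0';
    for (DWORD cur = first; cur != 0 && cur < count && visited < count;
         cur = nodes[cur].dwNextWordNode) {
        size_t len = strlen(words[cur]);
        size_t need = len + (visited ? 1 : 0);
        if (used + need + 1 > bufsize) break; /* no room for this word */
        if (visited) buf[used++] = ' ';
        memcpy(buf + used, words[cur], len);
        used += len;
        buf[used] = '\0';
        ++visited;
    }
    return visited;
}
```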

dwPhonemeNode

Specifies the node number of the first phoneme in the word. If an engine does not support phonemes or if such a mapping is inappropriate, this should be NULL.

qwStartTime

Time stamp, in bytes, when the audio for this node started. If the value is indeterminate, this is zero.

qwEndTime

Time stamp, in bytes, when the audio for this node ended. If the value is indeterminate, this is zero.

dwWordScore

Specifies the score for this individual word. It is valid to compare this score only against the scores of this word’s alternatives. Engines should make this value a linear probability if possible. If an engine does not know the value, this is zero.

wVolume

Unsigned 16-bit integer that contains the linear volume level of the word, from 1 to 0xFFFF. If the engine does not know the volume, this should be zero. Applications can use this information to determine emphasis or sentence type (for example, question or command) or to do transplanted prosody.

wPitch

Unsigned 16-bit integer that contains the average pitch of the word in hertz. Engines that do not have information about this value set it to zero. Applications can use this information to determine emphasis or sentence type (for example, question or command) or to do transplanted prosody.

pos

Part of speech of the word, if known. Examples include noun, verb, adverb, adjective, and conjunction.

dwCFGParse

Specifies the unique identifier of the rule used to parse the word. This is the rule identifier from the SRCFGRULE structure that defines the rule in the grammar. This member is valid only for context-free grammars.

If an imported rule was used to parse the word, dwCFGParse contains the unique rule identifier from the SRCFGIMPRULE structure that defines the imported rule in the grammar. An application can use this information to determine the semantics of a word and save itself the work of parsing the tree. For example, suppose the context-free grammar contains the following rules:

1 = SEQ(Send mail to 2)

2 = ALT(Mike, Bob, Fred)

If the engine recognizes “Send mail to Bob,” the value of dwCFGParse would be 1 for “Send mail to” and 2 for “Bob.”
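
To illustrate how an application might use the rule identifiers instead of parsing the tree, the sketch below picks out the word that was parsed by a given rule. It assumes the best-path nodes and their word text have been collected into parallel arrays in utterance order. ParsedNode and first_word_for_rule are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DWORD; /* stand-in for the Windows typedef */

/* Hypothetical reduced copy of SRRESWORDNODE: only the rule identifier. */
typedef struct {
    DWORD dwCFGParse;
} ParsedNode;

/* Return the first word whose node was parsed by `ruleId`,
   or NULL if no word on the path came from that rule. */
static const char *first_word_for_rule(const ParsedNode *nodes,
                                       const char *const *words,
                                       size_t count, DWORD ruleId)
{
    assert(nodes != NULL && words != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (nodes[i].dwCFGParse == ruleId)
            return words[i];
    }
    return NULL;
}
```

For the grammar above, looking up rule 2 in the result for “Send mail to Bob” returns “Bob,” which the application can treat as the recipient.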

dwCue

Specifies a cue that the engine has recognized but cannot translate into a word string. A node with a cue may not have a word associated with it. An engine is not required to return cues, so an application should not depend on them, although it should take advantage of any cues that are returned. This member is one of the following flags:

SRRESCUE_COMMA: A comma should go here.
SRRESCUE_DECLARATIVEBEGIN: A declarative sentence has begun. This cue must be followed by a closing SRRESCUE_DECLARATIVEEND.
SRRESCUE_DECLARATIVEEND: A declarative sentence has ended.
SRRESCUE_IMPERATIVEBEGIN: An imperative sentence has begun. This cue must be followed by a closing SRRESCUE_IMPERATIVEEND.
SRRESCUE_IMPERATIVEEND: An imperative sentence has ended.
SRRESCUE_INTERROGATIVEBEGIN: An interrogative sentence has begun. This cue must be followed by a closing SRRESCUE_INTERROGATIVEEND.
SRRESCUE_INTERROGATIVEEND: An interrogative sentence has ended.
SRRESCUE_NOISE: A noise has occurred.
SRRESCUE_PAUSE: The speaker has paused.
SRRESCUE_SENTENCEBEGIN: A sentence has begun. This cue must be followed by a closing SRRESCUE_SENTENCEEND. If the engine has identified the sentence as interrogative, declarative, or imperative, it should use the more specific SRRESCUE_INTERROGATIVEBEGIN, SRRESCUE_DECLARATIVEBEGIN, or SRRESCUE_IMPERATIVEBEGIN cues.
SRRESCUE_SENTENCEEND: A sentence has ended. If the engine has identified the sentence as interrogative, declarative, or imperative, it should use the more specific SRRESCUE_INTERROGATIVEEND, SRRESCUE_DECLARATIVEEND, or SRRESCUE_IMPERATIVEEND cues.
SRRESCUE_UM: The speaker has said a nonsense sound, such as “Um.”
SRRESCUE_WILDCARD: This is a placeholder for a wildcard word in a context-free grammar.
SRRESCUE_WORD: The node contains a word.
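
As one way an application might take advantage of cues, the sketch below maps a few of them to punctuation when rendering recognized text. The numeric values defined here are placeholders for illustration only and are not taken from the API header; real code should use the header's definitions. The punctuation chosen for each cue is an application decision, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t DWORD; /* stand-in for the Windows typedef */

/* PLACEHOLDER values, assumed for this sketch only. */
#ifndef SRRESCUE_COMMA
#define SRRESCUE_COMMA            1
#define SRRESCUE_DECLARATIVEEND   2
#define SRRESCUE_IMPERATIVEEND    3
#define SRRESCUE_INTERROGATIVEEND 4
#define SRRESCUE_SENTENCEEND      5
#define SRRESCUE_WORD             6
#endif

/* Return the punctuation to emit for a cue, or "" if it has none. */
static const char *cue_punctuation(DWORD dwCue)
{
    switch (dwCue) {
    case SRRESCUE_COMMA:            return ",";
    case SRRESCUE_INTERROGATIVEEND: return "?";
    case SRRESCUE_IMPERATIVEEND:    return "!";
    case SRRESCUE_DECLARATIVEEND:
    case SRRESCUE_SENTENCEEND:      return ".";
    default:                        return ""; /* pause, noise, word, ... */
    }
}

/* Append the punctuation for `dwCue` to the NUL-terminated text in `buf`.
   Returns 1 on success, or 0 if `buf` is too small. */
static int append_cue(char *buf, size_t bufsize, DWORD dwCue)
{
    assert(buf != NULL);
    const char *p = cue_punctuation(dwCue);
    size_t used = strlen(buf), len = strlen(p);
    if (used + len + 1 > bufsize) return 0;
    memcpy(buf + used, p, len + 1); /* copies the terminating NUL too */
    return 1;
}
```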