INF: How the SQL SOUNDEX Algorithm Works

ID Number: Q67754

1.10 1.11 4.20

OS/2

Summary:

This article describes how the SQL SOUNDEX algorithm works and gives
examples of its abilities and limitations in distinguishing between
words of similar length and/or phonetic structure.

The SOUNDEX function returns a four-character code that describes the
phonetic characteristics of the word passed to it as an argument. The
first character of the code is the first letter of the word, and the
remaining three characters are single digits that describe the phonetic
"value" of the consonant sounds that follow that letter. Vowels are not
encoded, and a word with fewer than three such consonant sounds is
padded with zeros. For example, the SOUNDEX function might return the
value "A123", which is read as follows:

"A" is the first letter of the word.
"1" is the phonetic value of the first encoded consonant sound.
"2" is the phonetic value of the second encoded consonant sound.
"3" is the phonetic value of the third encoded consonant sound.

Unfortunately, the SOUNDEX algorithm suffers from some serious
limitations, which are imposed by the fact that it can only record
nine possible phoneme patterns. The SOUNDEX algorithm was not designed
to distinguish between such soundalike words as "string" and "sing"
(and it doesn't), but it also cannot register differences between
vowel sounds. For example, the value "B300" is returned for all of the
following words: "bit", "bite", "bat", "bait", "boat", and "beet".
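
The vowel limitation described above can be reproduced with a short
sketch. The code below is not the SQL implementation; it assumes the
widely documented American Soundex rules (six consonant classes, vowels
skipped, adjacent repeats collapsed, zero padding), and the product's
results may differ in edge cases. The names CODES and soundex are
hypothetical and chosen only for this illustration.

```python
# Consonant classes of the standard American Soundex.
# ASSUMPTION: the SQL SOUNDEX function may not use exactly these rules.
CODES = {}
for digit, letters in enumerate(
    ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1
):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return a four-character Soundex-style code for word."""
    # Keep only the letters A-Z, ignoring case, digits, and punctuation.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    code = letters[0]                  # first character: the first letter
    prev = CODES.get(letters[0], "")   # class of the most recent letter

    for c in letters[1:]:
        digit = CODES.get(c, "")       # vowels, H, W, and Y have no digit
        if digit and digit != prev:
            code += digit              # new consonant class: record it
        if c not in "HW":
            prev = digit               # H and W do not separate repeats
        if len(code) == 4:
            break

    return code.ljust(4, "0")          # pad short codes with zeros


# Every word below differs only in its vowels, so each prints B300.
for w in ("bit", "bite", "bat", "bait", "boat", "beet"):
    print(w, soundex(w))
```

Because the vowels never contribute a digit, only the "t" is recorded
after the leading "B", and the remaining two positions are filled with
zeros.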

Additional reference words: 1.10 1.11 4.20