Tuning vocabulary pronunciations

The pronunciations in system dictionaries for supported languages have been tuned extensively. However, you may still need to modify them, for any of the reasons described below. You can tune pronunciations by:

  • Anticipating hard-to-recognize vocabulary items, such as words that have unexpected pronunciations. This is done during development by examining pronunciations, and also by testing them with Recognizer.
  • Identifying accuracy problems with specific vocabulary items. This is done by analyzing telephone calls received during the deployment phase.

Common causes for tuning

Nuance recommends you pay particular attention to the following cases:

  • A proper noun, such as a product name, might have an atypical pronunciation or might not appear in the system dictionary.
  • The word shows up very often in your application or has more significance than other words (for example, the company president’s name in an auto-attendant application), and you believe that Recognizer does not cover all the pronunciations likely to be used.
  • The item is a multi-word phrase whose component words are slurred together when spoken naturally (for example, “want to” becomes “wanna”).
  • The word is likely to be spoken quickly and without much emphasis. In the case of US English, you might want to add pronunciations with the “schwa” phoneme for some of the vowels. See the Language Supplement appropriate for your language for more details.
  • The word is of foreign origin. In all languages, Recognizer handles many common words from other languages, but check them carefully.
  • The word has two very different pronunciations that are likely to be used (for example, the word “either”), and Recognizer provides only one of them.
  • The word is a regional variation. For example, in Massachusetts, a milkshake may be referred to as a “frappe”.
  • Some of your users are likely to have strong regional accents. For instance, some Bostonians pronounce the word “car” as “cah”. Similarly, there are towns of the same name whose pronunciation differs by region.

See Pronunciations for numbers for more about how pronunciations are generated for numbers, percentages, and other special cases.

Common errors

The most common mistake people make when writing phonemic transcriptions is to think in terms of spelling and break words down into letters. Phonemics is about the sounds of the words, not the letters in the words.

It helps to say words aloud when writing phonemically. Simply stated, it’s easier to hear the difference between two sounds than to imagine it.

Phonemic spelling does not match regular spelling

Remember that a word may be pronounced differently from how it is spelled. For example, consider the word “dogs” in en-US. A very common mistake would be to write this as dQgs, with an “s” at the end; but the correct phonemic representation is dQgz with a “z” at the end, because that is how the word is actually pronounced in conversation.

Another example is the word “cube”. If you think of spelling when writing the word phonemically, you might use ku:b (“koob”). However, the correct pronunciation is actually kju:b (“kyoub”).
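One way to catch spelling-influenced mistakes is to compare a draft transcription against a pronunciation you have verified by saying the word aloud. The following Python sketch is illustrative only and is not part of Recognizer: the helper names split_phonemes and spelling_slips are hypothetical, and the sketch assumes that each character is one phoneme symbol, except that a ":" length mark belongs to the symbol before it.

```python
import difflib


def split_phonemes(transcription):
    """Split a transcription into symbols, keeping a ':' length mark with its vowel."""
    symbols = []
    for ch in transcription:
        if ch == ":" and symbols:
            symbols[-1] += ch  # "u" followed by ":" becomes the single symbol "u:"
        else:
            symbols.append(ch)
    return symbols


def spelling_slips(draft, reference):
    """Return (draft_part, reference_part) pairs where the two transcriptions differ."""
    a = split_phonemes(draft)
    b = split_phonemes(reference)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        ("".join(a[i1:i2]), "".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Examples from the text: a spelled-out "s", and a missing "j" glide.
    print(spelling_slips("dQgs", "dQgz"))   # [('s', 'z')]
    print(spelling_slips("ku:b", "kju:b"))  # [('', 'j')]
```

An empty string on the draft side means that the draft is missing a sound that is present in the verified pronunciation.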

n and N

The phonemes {n and {N in en-US can be confusing. To decide which to use, say the particular word with the {n, and then say the word with the {N. Compare the “n” sound in “bank” with the “n” in “band”. In “band”, the “n” sound is crisper and cleaner ({n); in “bank”, the “n” sound is less clearly enunciated and more nasal, as in “sing” or “finger” ({N).

band    b{nd
bank    b{Nk

u: and U

The same is true for similar-sounding vowels, like U and u:. Try to say the word “good” with the vowel in “food”. The two vowels are different: u: is correct for “food”, but it is not the right vowel for writing “good” phonemically.

good    gUd
food    fu:d
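When reviewing a transcription, it can help to list the alternatives that differ only by one of these easily confused symbols, and then say each alternative aloud. The following Python sketch is a hypothetical helper, not a Recognizer tool: the CONFUSABLE list contains only the two pairs discussed above, and the names split_phonemes and variants_to_compare are assumptions made for illustration.

```python
# Symbol pairs discussed above; extend the list for your own language (illustrative only).
CONFUSABLE = [("n", "N"), ("U", "u:")]


def split_phonemes(transcription):
    """Split a transcription into symbols, keeping a ':' length mark with its vowel."""
    symbols = []
    for ch in transcription:
        if ch == ":" and symbols:
            symbols[-1] += ch
        else:
            symbols.append(ch)
    return symbols


def variants_to_compare(transcription):
    """Return transcriptions that differ from the input by exactly one confusable symbol."""
    symbols = split_phonemes(transcription)
    variants = []
    for i, symbol in enumerate(symbols):
        for first, second in CONFUSABLE:
            if symbol in (first, second):
                other = second if symbol == first else first
                variants.append("".join(symbols[:i] + [other] + symbols[i + 1:]))
    return variants


if __name__ == "__main__":
    print(variants_to_compare("b{nk"))  # ['b{Nk']
    print(variants_to_compare("gu:d"))  # ['gUd']
```

The helper only proposes candidates. Deciding which one matches the spoken word still requires saying both aloud.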

Pronunciations in a non-default language

When a grammar contains vocabulary items that originate in more than one language, you can work with the pronunciations in either of these ways:

  • When a foreign word or phrase appears, set the recognition language to match. For examples, see Setting the language for items in a list.

    The advantage of this technique is improved accuracy, because Recognizer can use language-specific acoustic models for the items, and a language-specific system dictionary to find correctly tuned pronunciations for the foreign items. The disadvantage is that the additional language must be installed, and more memory is used when the language is loaded.

  • Without changing the grammar’s default language, add pronunciations that mimic each foreign word or phrase. For example, consider the French name “Jean-Guy” appearing in an English-based grammar. The English system dictionary includes the French pronunciation for “Jean” (ZO:n) but not for “Guy” (gi:). In this case, you can add gi: as a pronunciation of “Guy” to a user dictionary.

    The advantage of this technique is that it does not require installation of the additional language or an increase in memory usage when the language is loaded. The disadvantage is that it requires manual tuning (the insertion of foreign language pronunciations into user dictionaries), and future maintenance of the user dictionary.
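The second technique can be pictured as a lookup that consults a user dictionary alongside the system dictionary. The following Python sketch is a conceptual model only: the two dictionaries are hypothetical in-memory stand-ins, not Recognizer's dictionary file format, and the lookup order (user entries first) is an assumption made for illustration.

```python
# Hypothetical stand-ins; Recognizer's real dictionaries are not Python dicts.
SYSTEM_DICTIONARY = {"jean": ["ZO:n"]}  # French pronunciation described in the text
USER_DICTIONARY = {"guy": ["gi:"]}      # pronunciation added by hand


def pronunciations(word, user=USER_DICTIONARY, system=SYSTEM_DICTIONARY):
    """Return user-dictionary pronunciations first, then system ones, without duplicates."""
    key = word.lower()
    merged = []
    for source in (user, system):
        for pron in source.get(key, []):
            if pron not in merged:
                merged.append(pron)
    return merged


def phrase_pronunciations(phrase):
    """Look up each hyphen- or space-separated part; None marks a part with no entry."""
    parts = phrase.replace("-", " ").split()
    return {part: pronunciations(part) or None for part in parts}


if __name__ == "__main__":
    print(phrase_pronunciations("Jean-Guy"))
    # {'Jean': ['ZO:n'], 'Guy': ['gi:']}
```

A part that maps to None is a candidate for a new user dictionary entry, which is the manual tuning and maintenance cost noted above.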