English United States (en-US)
This documentation was updated on May 8, 2023.
Creating grammars
The following subsections describe key issues for working with grammar documents in the American English language.
Grammar file encoding
Nuance has full internal Unicode support. Create your grammars using ISO-8859-1 (also known as Latin-1) or UTF-8. For example, your grammar header might be:
<?xml version='1.0' encoding='UTF-8'?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="test">
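Before deploying a grammar, you may want to confirm that the bytes on disk actually match the encoding the header declares. The following is a minimal sketch. It assumes the file begins with an XML declaration like the one above. check_grammar_encoding is a hypothetical helper and is not part of any Nuance tool.

```python
import re

# Encodings this page allows for grammar files (Python codec names, lowercase).
SUPPORTED_ENCODINGS = {"utf-8", "iso-8859-1", "latin-1"}

# Captures the encoding pseudo-attribute of a leading XML declaration.
XML_DECL = re.compile(rb"""<\?xml[^>]*?encoding=['"]([A-Za-z0-9._-]+)['"]""")


def check_grammar_encoding(data: bytes) -> str:
    """Return the declared encoding, or raise ValueError if it is unsupported
    or the file's bytes do not decode with it."""
    match = XML_DECL.match(data)
    # Treat a missing declaration as UTF-8, the XML default without a byte-order mark.
    declared = match.group(1).decode("ascii").lower() if match else "utf-8"
    if declared not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported grammar encoding: {declared}")
    data.decode(declared)  # raises UnicodeDecodeError (a ValueError) on mismatched bytes
    return declared


if __name__ == "__main__":
    header = (
        "<?xml version='1.0' encoding='UTF-8'?>\n"
        '<grammar xml:lang="en-US" version="1.0" root="test">'
    )
    print(check_grammar_encoding(header.encode("utf-8")))
```

A Latin-1 file that is mislabeled as UTF-8 fails at the decode step. That is usually the first symptom of an encoding mismatch.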
alphanum_lc built-in grammar
The alphanum_lc built-in grammar recognizes a connected string of up to 20 digits and lowercase alphabetic characters, such as “a8f9h23”. For example, this grammar could be used to recognize a product code or user id. The “lc” in the name of this built-in means lowercase. The possible characters are the lowercase letters a-z and the digits 0-9. The application layer can adjust the case of the returned letters as needed for further processing.
Note: This grammar replaces the alphanum built-in grammar.
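The application layer can adjust the case of the result after recognition. The sketch below validates a returned value against the documented character set and length, then uppercases it. It assumes MEANING arrives as a plain string. normalize_code and the uppercase policy are hypothetical; use whatever case your back end expects.

```python
import re

# alphanum_lc results: 1 to 20 characters from a-z and 0-9 (20 is the documented maximum).
ALPHANUM_LC = re.compile(r"[a-z0-9]{1,20}")


def normalize_code(meaning: str) -> str:
    """Validate an alphanum_lc result and return it in uppercase."""
    if not ALPHANUM_LC.fullmatch(meaning):
        raise ValueError(f"unexpected alphanum_lc result: {meaning!r}")
    return meaning.upper()


if __name__ == "__main__":
    print(normalize_code("a8f9h23"))  # A8F9H23
```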
alphanum built-in grammar
Note: This grammar has been replaced by the alphanum_lc grammar and is retained for backward compatibility only. For new implementations, use the alphanum_lc built-in grammar.
The alphanum built-in grammar recognizes a connected string of up to 20 digits and uppercase or lowercase alphabetic characters, such as “A8f9h23”. For example, this grammar could be used to recognize a product code or order number. The possible characters are the uppercase letters A-Z, the lowercase letters a-z, and the digits 0-9. Because uppercase and lowercase letters are homonyms (e.g., “B” and “b”), including both is redundant when recognizing case-insensitive items such as product codes. This is why the alphanum built-in grammar has been replaced by the alphanum_lc grammar.
boolean built-in grammar
The boolean built-in grammar accepts a yes/no utterance from the caller. “Correct” is accepted as a synonym for “yes”.
Properties
The y and n parameters let you associate any two touchtone buttons as synonyms for yes and no.
Parameter | Description |
---|---|
y | Desired DTMF digit to be equivalent to “yes” (default = 1) |
n | Desired DTMF digit to be equivalent to “no” (default = 2) |
Examples
Caller says… | MEANING key |
---|---|
yes | true |
no | false |
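The following is a minimal sketch of what the y and n parameters do. It assumes the VoiceXML 2.0 convention for passing parameters to a built-in grammar through its URI (for example, builtin:dtmf/boolean?y=7;n=9). Check your platform's grammar URI syntax before relying on it. Both helper functions are hypothetical.

```python
from typing import Optional


def boolean_builtin_uri(y: str = "1", n: str = "2", mode: str = "dtmf") -> str:
    """Build a parameterized boolean grammar URI (VoiceXML 2.0 syntax assumed)."""
    if mode not in ("dtmf", "grammar"):
        raise ValueError(f"unknown mode: {mode!r}")
    return f"builtin:{mode}/boolean?y={y};n={n}"


def boolean_from_dtmf(key: str, y: str = "1", n: str = "2") -> Optional[bool]:
    """Mirror the DTMF mapping: True for the y key, False for the n key, None otherwise."""
    if y == n:
        raise ValueError("y and n must be different keys")
    if key == y:
        return True
    if key == n:
        return False
    return None
```

With the defaults, key 1 means “yes” and key 2 means “no”. Passing y="7", n="9" remaps them as described in the parameter table.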
cancel built-in grammar
The cancel grammar collects a single word “cancel.” It also allows (and ignores) “please” and disfluencies such as “umm… er.”
ccexpdate built-in grammar
The ccexpdate grammar understands the expiration date on a credit card. Expiration dates are usually a month and a year, and are often embossed on a credit card in the form “mm/yy.” The grammar recognizes variations on the date, for example, “December 2007,” “twelve oh seven,” “twelve of two thousand and seven,” “twelve slash zero seven,” etc.
creditcard built-in grammar
The creditcard grammar understands a caller saying a credit card number, optionally preceding the number with the credit card name, or the words “account number” or “account.” For example, a caller can say, “visa account number four oh one seven…,” “mastercard five zero zero two…,” or “three seven three five….”
currency built-in grammar
The currency built-in grammar collects currency amounts in dollars and cents, such as “ten dollars,” “ten dollars and fifteen cents,” and “ten fifteen.”
Return keys/values
Key | Description |
---|---|
MEANING | Contains a string of the form USDmain_unit_amount.subunit_amount. The USD prefix is added only if the caller explicitly says “U S dollars.” If the caller omits the main unit or subunit amount, that field is zero, so a subunit amount collected without a main unit has a leading zero (for example, 0.05). |
AMBIGUOUS | Set to 1 if the caller says an ambiguous phrase such as “fifteen twelve,” which could be either $15.12 or $1512.00; otherwise, set to 0. |
SWI_literal | Contains the exact text that was recognized. |
Examples
Caller says | MEANING |
---|---|
five dollars | 5.00 |
five U S dollars | USD5.00 |
five cents | 0.05 |
five dollars and five cents | 5.05 |
five dollars and twenty-five, five dollars twenty-five, or five twenty-five | 5.25 |
six hundred twenty-five thousand four hundred sixty-four dollars | 625464.00 |
one dollar zero cents | 1.00 |
one twenty two | 1.22 |
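The currency MEANING string can be turned into an exact decimal amount in application code. The sketch below uses Decimal rather than float so that cents are not rounded. It assumes the subunit is always two digits, as in every example above. parse_currency_meaning and CurrencyResult are hypothetical names.

```python
from decimal import Decimal
from typing import NamedTuple


class CurrencyResult(NamedTuple):
    currency: str    # "USD" if the caller said "U S dollars", otherwise ""
    amount: Decimal


def parse_currency_meaning(meaning: str) -> CurrencyResult:
    """Parse a MEANING string such as "5.05" or "USD5.00"."""
    currency = ""
    if meaning.startswith("USD"):
        currency, meaning = "USD", meaning[3:]
    main, dot, sub = meaning.partition(".")
    # Assumption: two-digit subunit, as in the documented examples.
    if not (main.isdecimal() and dot and sub.isdecimal() and len(sub) == 2):
        raise ValueError(f"unexpected currency MEANING: {meaning!r}")
    return CurrencyResult(currency, Decimal(f"{main}.{sub}"))
```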
date built-in grammar
The date built-in grammar accepts spoken date utterances from the caller.
Recognized phrases include “June four,” “four June two thousand six,” “six four two thousand six,” “the fourth,” “fourth of June,” and “Monday, the fourth of June.”
The grammar also accepts “yesterday,” “today,” and “tomorrow,” which return the values -1, 0, and +1, respectively, in the MEANING key.
Examples
Caller says | MEANING key |
---|---|
January 5th, 2000 | 20000105 |
Yesterday | -1 |
Today | 0 |
Tomorrow | +1 |
the fourth | ??????04 |
Wednesday | (Phrase not recognized) |
Wednesday the 12th | ??????12 |
June 4 or June 4th | ????0604 |
June 4, 1997 | 19970604 |
June 4, 97 | ??970604 |
Wednesday, June 4, 1997 | 19970604 |
the 6th | ??????06 |
4, 6 | ????0604 |
10, 12 | ????1210 |
10, 12, 97 | ??971210 |
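The examples show two result shapes: a relative day offset, and an eight-character yyyymmdd pattern in which “?” marks digits the caller did not say. The sketch below separates the two. It keeps the year as text because it can be partial (for example, “??97”). parse_date_meaning and SpokenDate are hypothetical names.

```python
from typing import NamedTuple, Optional, Union


class SpokenDate(NamedTuple):
    year: Optional[str]   # text, because it can be partial, e.g. "??97"
    month: Optional[int]  # None when no month was said
    day: Optional[int]    # None when no day was said


def parse_date_meaning(meaning: str) -> Union[int, SpokenDate]:
    """Return a day offset for yesterday/today/tomorrow, else a SpokenDate."""
    if meaning in ("-1", "0", "+1"):
        return int(meaning)
    if len(meaning) != 8 or not all(c.isdigit() or c == "?" for c in meaning):
        raise ValueError(f"unexpected date MEANING: {meaning!r}")
    year, month, day = meaning[:4], meaning[4:6], meaning[6:]
    return SpokenDate(
        year=None if year == "????" else year,
        month=None if month == "??" else int(month),
        day=None if day == "??" else int(day),
    )
```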
digits built-in grammar
The digits built-in grammar recognizes a string of spoken digits. Valid characters are the digits 0-9; the digit “0” can be pronounced as either “oh” or “zero.”
exit built-in grammar
The exit grammar collects a single word “exit.” It also allows (and ignores) “please” and disfluencies such as “umm… er.”
help built-in grammar
The help grammar collects a single word “help.” It also allows (and ignores) “please” and disfluencies such as “umm… er.”
number built-in grammar
The number built-in grammar accepts quantities such as “ten,” “one hundred and forty,” “five hundred sixty one point five,” “negative five,” and “minus four point three.”
Numbers from -999,999,999.99 to 999,999,999.99 are recognized, but by default the minallowed parameter is set to zero, which limits recognition to non-negative values.
Note:
The source grammar contains a rule named COLLOQ_NATNUM. If you remove the comment markers from the corresponding <item> tags, the grammar also accepts colloquial numbers such as “one twenty two.”
Enabling this rule increases the complexity of the grammar, which leads to a higher error rate for regular natural numbers.
Examples
Caller says | MEANING key |
---|---|
twenty five | 25 |
twelve thousand three hundred forty five | 12345 |
twelve hundred | 1200 |
minus two or negative two | -2 |
fourteen point five six | 14.56 |
fourteen dot fifty six | (Phrase not recognized; the words “dot” and “fifty six” are not allowed) |
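If the application re-checks a number result, for example against business rules, it can mirror the documented range and the minallowed default. The sketch below does that with Decimal. It assumes MEANING arrives as a plain decimal string such as “14.56” or “-2”. accept_number is hypothetical and does not configure the grammar itself.

```python
from decimal import Decimal, InvalidOperation

# Documented recognition range of the number grammar.
RANGE_MIN = Decimal("-999999999.99")
RANGE_MAX = Decimal("999999999.99")


def accept_number(meaning: str, minallowed: Decimal = Decimal(0)) -> Decimal:
    """Return MEANING as a Decimal if it is within the documented range and not
    below minallowed (default 0, matching the grammar's default)."""
    try:
        value = Decimal(meaning)
    except InvalidOperation:
        raise ValueError(f"not a number: {meaning!r}") from None
    if not value.is_finite():
        raise ValueError(f"not a finite number: {meaning!r}")
    if not max(RANGE_MIN, minallowed) <= value <= RANGE_MAX:
        raise ValueError(f"out of range: {value}")
    return value
```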
operator built-in grammar
The operator grammar collects a single word “operator.” It also allows (and ignores) “please” and disfluencies such as “umm… er.”
phone built-in grammar
The phone grammar is a VoiceXML built-in grammar. It accepts 7- and 10-digit North American phone numbers as well as three-digit numbers ending in 11 (for example, “911”). An optional “1” can precede the 7- or 10-digit numbers.
Return keys/values
Key | Description |
---|---|
MEANING | A string of digits representing the recognized phone number. A leading “1” is omitted from the return value; for example, if “16789999” is recognized, the result is “6789999”. The string may contain the character “x” to indicate an extension, for example “8005551234x789”. |
SWI_literal | Contains the exact text that was recognized. |
The grammar allows phrases such as “three two four fifty five seventy two” as well as strings of individual digits.
Properties
As stipulated in the VoiceXML specification, the caller may also specify an extension, for example, “five four two three five six seven extension two thousand.” By default, extensions one to four digits long are supported.
Property | Description |
---|---|
minextension | Minimum numeric value allowed for an extension (default is 1). |
maxextension | Maximum numeric value allowed for an extension. Set this to 0 to disallow extensions. (Default is 9999.) |
DTMF interpretation
DTMF keys are interpreted according to the VoiceXML specification. The DTMF asterisk key (*) corresponds to the “x” that marks an extension.
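The phone MEANING value packs the number and an optional extension into one string. The sketch below splits them and re-checks the documented lengths and extension limits in application code. The minextension and maxextension arguments only mirror the grammar properties and their defaults; they do not set them. parse_phone_meaning and PhoneResult are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

DIGITS = re.compile(r"[0-9]+")


class PhoneResult(NamedTuple):
    number: str
    extension: Optional[str]


def parse_phone_meaning(meaning: str, minextension: int = 1,
                        maxextension: int = 9999) -> PhoneResult:
    """Split a value such as "8005551234x789" into number and extension."""
    number, has_ext, ext = meaning.partition("x")
    valid_length = len(number) in (7, 10) or (len(number) == 3 and number.endswith("11"))
    if not (DIGITS.fullmatch(number) and valid_length):
        raise ValueError(f"unexpected phone number: {number!r}")
    if not has_ext:
        return PhoneResult(number, None)
    if maxextension == 0:
        raise ValueError("extensions are disabled (maxextension = 0)")
    if not (DIGITS.fullmatch(ext) and minextension <= int(ext) <= maxextension):
        raise ValueError(f"extension out of range: {ext!r}")
    return PhoneResult(number, ext)
```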
time built-in grammar
The time built-in grammar accepts spoken time-of-day utterances from the caller.
Examples
For each entry, the values returned in the MEANING and QUALIFIER keys are shown. (Not shown are the values of the HOUR, MINUTE, and AMPM keys.)
Caller says | MEANING | QUALIFIER |
---|---|---|
now, immediately… | (Phrase not recognized) | -- |
in a half hour | (Phrase not recognized) | -- |
at noon | 1200p | exact |
at midnight | 0000? | exact |
before noon | 1200p | before |
after thirteen thirty | 1330h | after |
twenty twenty | 2020h | exact |
eight twenty in the morning | 0820a | exact |
half past eight | 0830? | exact |
seven fifteen pm or quarter past seven in the evening | 0715p | exact |
twenty four hundred hours or twenty four hundred | 0000h | exact |
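The MEANING values above follow the VoiceXML 2.0 hhmmx convention. In that convention, the last character is “a” (AM), “p” (PM), “h” (24-hour clock), or “?” (ambiguous). The sketch below parses that form and lists the possible 24-hour readings. It does not interpret the QUALIFIER key (exact, before, after). parse_time_meaning, to_24h, and SpokenTime are hypothetical names.

```python
from typing import List, NamedTuple, Tuple

# Suffix letters of the hhmmx MEANING string (VoiceXML 2.0 convention).
CLOCKS = {"a": "am", "p": "pm", "h": "24h", "?": "ambiguous"}


class SpokenTime(NamedTuple):
    hour: int
    minute: int
    clock: str  # "am", "pm", "24h", or "ambiguous"


def parse_time_meaning(meaning: str) -> SpokenTime:
    """Parse a MEANING value such as "0820a" or "1330h"."""
    if len(meaning) != 5 or not meaning[:4].isdigit() or meaning[4] not in CLOCKS:
        raise ValueError(f"unexpected time MEANING: {meaning!r}")
    return SpokenTime(int(meaning[:2]), int(meaning[2:4]), CLOCKS[meaning[4]])


def to_24h(t: SpokenTime) -> List[Tuple[int, int]]:
    """List the (hour, minute) readings on a 24-hour clock; two if ambiguous."""
    if t.clock == "24h":
        return [(t.hour, t.minute)]
    morning = (t.hour % 12, t.minute)
    evening = (t.hour % 12 + 12, t.minute)
    if t.clock == "am":
        return [morning]
    if t.clock == "pm":
        return [evening]
    return [morning, evening]
```

For example, “half past eight” (0830?) yields both 08:30 and 20:30. An application would then need to ask the caller which one is meant.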
zipcode built-in grammar
The zipcode grammar recognizes valid United States ZIP Codes in either five- or nine-digit format.
Return keys/values
Upon return, the MEANING key contains the recognized ZIP Code, which can have either five or nine digits.
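For display or storage, a nine-digit result is commonly written in ZIP+4 form with a hyphen. The sketch below assumes MEANING is a bare digit string with no hyphen. format_zip is a hypothetical helper.

```python
import re

# Five digits, optionally followed by the four-digit add-on.
ZIP = re.compile(r"[0-9]{5}(?:[0-9]{4})?")


def format_zip(meaning: str) -> str:
    """Return "12345" or "12345-6789" for a five- or nine-digit result."""
    if not ZIP.fullmatch(meaning):
        raise ValueError(f"unexpected ZIP Code: {meaning!r}")
    return meaning if len(meaning) == 5 else f"{meaning[:5]}-{meaning[5:]}"
```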
Vocabulary items and pronunciations
This chapter describes considerations for vocabularies and their pronunciations in United States English (en-US).
Specially tuned pronunciations
The following list shows common words that are fine-tuned by Nuance. Each of these words contains “word-specific phonemes”; that is, phonemes and associated models created especially for the words.
Words with tuned pronunciations (do not modify):
- All letters of the alphabet, a-z
- Affirmation and negation: yes, no
- Monetary units: dollar, dollars, cent, cents
- Cardinal numbers: 0-99, 100, and 1000
- Ordinal numbers: 1st through 31st
United States English pronunciations
This section provides detailed reference information to help create pronunciation dictionaries. It is intended for people who have sufficient knowledge of English as spoken in the United States. It provides information about transcription and pronunciation.
This section explains all the phonemes and their SAMPA symbols used in the American English language. As a reference pronunciation dictionary, we use:
Wells, John C.: Longman Pronunciation Dictionary. Burnt Mill: Longman 1990. (ISBN 0-582-96411-3)
In this dictionary you will find the American English as well as the British English pronunciation.
If you are not sure how a certain word is pronounced you can refer to the IPA transcriptions given there and then convert them into the SAMPA symbols, given in the alphabetic SAMPA-IPA table ( United States English phoneme set in alphabetical order ).
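The IPA-to-SAMPA conversion can be partly automated with a longest-match lookup built from the alphabetical table. This is a minimal sketch. The last row of the mapping adds common dictionary spellings that are not in the table (script ɡ, unlengthened i and u, non-ligature tʃ and dʒ, American oʊ, short ɑ and ɔ); these are assumptions. Check every result by hand, because greedy matching reads any t+ʃ or d+ʒ sequence as an affricate. ipa_to_sampa is a hypothetical helper.

```python
from typing import List

# IPA -> en-US SAMPA, from the alphabetical table on this page.
IPA_TO_SAMPA = {
    "ɾ": "4", "ə": "@", "əʊ": "@W", "ɜː": "3:", "æ": "a", "aʊ": "aW", "aɪ": "aY",
    "b": "b", "d": "d", "ð": "D", "ʤ": "dZ", "e": "e", "eɪ": "eY", "f": "f",
    "g": "g", "h": "h", "iː": "i", "ɪ": "I", "j": "j", "k": "k", "l": "l",
    "m": "m", "n": "n", "ŋ": "N", "ɔː": "O", "p": "p", "ɑː": "Q", "ɒ": "Q",
    "ɔɪ": "QY", "r": "r", "s": "s", "ʃ": "S", "t": "t", "θ": "T", "ʧ": "tS",
    "uː": "u", "ʊ": "U", "v": "v", "ʌ": "V", "w": "w", "z": "z", "ʒ": "Z",
    # Assumptions, not from the table: variant spellings found in dictionaries.
    "ɡ": "g", "i": "i", "u": "u", "tʃ": "tS", "dʒ": "dZ", "oʊ": "@W", "ɑ": "Q", "ɔ": "O",
}
IGNORED = {"ˈ", "ˌ", ".", " "}  # stress marks, syllable dots, spaces
LONGEST = max(len(key) for key in IPA_TO_SAMPA)


def ipa_to_sampa(ipa: str) -> List[str]:
    """Convert an IPA transcription to en-US SAMPA symbols, longest match first."""
    symbols: List[str] = []
    i = 0
    while i < len(ipa):
        if ipa[i] in IGNORED:
            i += 1
            continue
        for size in range(LONGEST, 0, -1):
            chunk = ipa[i:i + size]
            if len(chunk) == size and chunk in IPA_TO_SAMPA:
                symbols.append(IPA_TO_SAMPA[chunk])
                i += size
                break
        else:
            raise ValueError(f"no en-US SAMPA symbol for {ipa[i]!r} in {ipa!r}")
    return symbols
```

For example, "".join(ipa_to_sampa("ˈvɪʒən")) gives vIZ@n, which matches the transcription of “vision” in the phoneme table.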
The United States English phoneme system
The American English phoneme system can be divided into two groups:
- Consonants
- Vowels
Furthermore, it is possible to distinguish seven different types of consonants:
- Plosives
- Fricatives
- Affricates
- Nasals
- Laterals
- Flaps
- Glides
Within the vowel group, further distinctions can be made between front, central and back vowels and diphthongs.
American English spelling does have a certain complexity, since the orthography of most of its constituent words does not necessarily reflect their pronunciation. This lack of rigid structure means that the relationship between spelling (grapheme) and sound (phoneme) is not easy to define. Generally speaking, the phonetic transcription of a word is influenced by:
- Specific phonetic rules
- Pronunciation peculiarities that have developed over time
The following table shows all phonemes used in American English transcriptions. They are listed according to their phoneme classes with their IPA and SAMPA representations. If you are already familiar with the IPA symbol set, you will find this table useful when converting into SAMPA symbols.
United States English phoneme set grouped by class
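Phoneme class | Type | SAMPA | IPA | Examples of usage |
---|---|---|---|---|
Consonant | Plosive | b | b | bin |
Consonant | Plosive | p | p | pin /pIn/ |
Consonant | Plosive | g | g | give /gIv/ |
Consonant | Plosive | k | k | skin /skIn/ |
Consonant | Plosive | d | d | day /deY/ |
Consonant | Plosive | t | t | tin /tIn/ |
Consonant | Fricative | v | v | saving |
Consonant | Fricative | f | f | coffee /kQfi/ |
Consonant | Fricative | D | ð | this /DIs/ |
Consonant | Fricative | T | θ | thin /TIn/ |
Consonant | Fricative | z | z | crazy /kreYzi/ |
Consonant | Fricative | Z | ʒ | vision /vIZ@n/ |
Consonant | Fricative | s | s | sin /sIn/ |
Consonant | Fricative | S | ʃ | ship /SIp/ |
Consonant | Fricative | h | h | hit /hIt/ |
Consonant | Affricate | tS | ʧ | ketchup |
Consonant | Affricate | dZ | ʤ | Jim /dZIm/ |
Consonant | Nasal | m | m | mock |
Consonant | Nasal | n | n | knock /nQk/ |
Consonant | Nasal | N | ŋ | thing /TIN/ |
Consonant | Lateral | l | l | life |
Consonant | Flap | 4 | ɾ | butter /bV43:/, hidden /hI4n/ |
Consonant | Glide | r | r | ring |
Consonant | Glide | j | j | yes /jes/ |
Consonant | Glide | w | w | wet /wet/ |
Vowel | Single vowel | i | i: | ease |
Vowel | Single vowel | u | u: | lose /luz/ |
Vowel | Single vowel | I | ɪ | pit /pIt/ |
Vowel | Single vowel | U | ʊ | put /pUt/ |
Vowel | Single vowel | e | e | pet /pet/ |
Vowel | Single vowel | @ | ə | away /@weY/ |
Vowel | Single vowel | 3: | ɜː | furs /f3:z/ |
Vowel | Single vowel | V | ʌ | cut /kVt/ |
Vowel | Single vowel | O | ɔː | award /@wOrd/ |
Vowel | Single vowel | a | æ | bad /bad/ |
Vowel | Single vowel | Q | ɑː / ɒ | stars /stQrz/, pot /pQt/ |
Vowel | Diphthong | @W | əʊ | nose |
Vowel | Diphthong | QY | ɔɪ | toy /tQY/ |
Vowel | Diphthong | eY | eɪ | raise /reYz/ |
Vowel | Diphthong | aY | aɪ | rise /raYz/ |
Vowel | Diphthong | aW | au̬ / aʊ̬ | rouse /raWz/ |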
United States English consonants
American English consonants typically consist of:
- six plosives
- nine fricatives
- two affricates
- three nasals
- one lateral
- one flap
- three glides
Plosives
There are three voiceless and three voiced plosives in the American English SAMPA symbol set which can be arranged in pairs.
Voiced | Examples | Voiceless counterpart |
---|---|---|
/b/ | bit, rabid, cab | /p/ |
/g/ | gold, degree, bag | /k/ |
/d/ | down, medal, sad | /t/ |
American English has an additional plosive, the so-called glottal stop ‘?’. It does not have a distinctive function and is not uniformly represented in the language’s orthography, but it is pronounced (for example, before initial vowels when heavily stressed) and is sometimes used as a variant supplanting the medial and final /t/. It can, however, be ignored for pronunciation transcription purposes.
Fricatives
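There are nine fricatives in American English, five voiceless and four voiced:
Phoneme | Voicing | Examples |
---|---|---|
/v/ | voiced (voiceless counterpart /f/) | vine, marvel, prove |
/D/ | voiced (voiceless counterpart /T/) | this, worthy, with |
/z/ | voiced (voiceless counterpart /s/) | zinc, razor, plays |
/Z/ | voiced (voiceless counterpart /S/) | gendarme, vision |
/h/ | voiceless (no voiced counterpart) | hot, behind |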
In American English the voiceless fricative /h/ does not appear in the final position.
Affricates
In American English there are two affricates, /dZ/ and /tS/, for example gin /dZIn/ and chin /tSIn/.
Note that in SAMPA each affricate is written as a combination of two single-phoneme symbols (/t/ + /S/, /d/ + /Z/).
Voiced | Examples | Voiceless counterpart |
---|---|---|
/dZ/ | gin, ridges, large | /tS/ |
Nasals
There are three nasals in American English, /m/, /n/, and /N/. The velar nasal /N/ (back of the tongue touches the soft palate) never appears in the initial position.
Phoneme | Examples | Transcriptions |
---|---|---|
/m/ | might, simmer, sum | /maYt/, /sIm@r/, /sVm/ |
/n/ | night, sinner, sun | /naYt/, /sIn@r/, /sVn/ |
/N/ | singer, finger, sung | /sIN@r/, /fINg@r/, /sVN/ |
Laterals
There is one lateral in American English, /l/.
Phoneme | Examples | Transcriptions |
---|---|---|
/l/ | look, silly, milk | /lUk/, /sIli/, /mIlk/ |
Flap
There is one flap in American English, /4/.
Phoneme | Examples | Transcriptions |
---|---|---|
/4/ | butter, hidden | /bV43:/, /hI4n/ |
Glides
There are three glides in American English, /r/, /j/, and /w/.
Phoneme | Examples | Transcriptions |
---|---|---|
/r/ | rag, mirror, far | /rag/, /mIr@r/, /fQr/ |
/j/ | yes, view | /jes/, /vju/ |
/w/ | wet, away | /wet/, /@weY/ |
United States vowels
Front, central, and back vowels
American English single vowels (monophthongs) can be divided into three groups according to their place of articulation: front, central or back. Within each group vowels differ in their degree of the mouth opening. The length of vowels is of minor importance in the American English vowel system, and the length of a particular vowel in a given word may change considerably in connected speech. Thus the colon, which appears in some phonetic symbols to denote length, is used in the transcription of American English to denote a different vowel quality rather than quantity (length).
The three vowel groups are shown in the following table, ranging in each group from closed (top) to open (bottom) mouth:
Group | Phoneme | Examples | Transcriptions |
---|---|---|---|
Front | /i/ | feel, machine, glee | /fi@l/, /m@Sin/, /gli/ |
Front | /I/ | inflame, fill | /InfleYm/, /fIl/ |
Front | /e/ | excellence, fell | /eks@l@ns/, /fel/ |
Front | /a/ | axe, cat | /aks/, /kat/ |
Central | /3:/ | bird, furs | /b3:d/, /f3:z/ |
Because the realization of certain vowel sounds varies considerably between regions and speakers, it may be useful in some cases to use one symbol for different phonemes. Thus, the open and mid-back vowels (SAMPA: ‘A’, ‘Q’) are both represented by /Q/. When the “r-sound” follows the central vowel, it is absorbed into /3:/ and not written separately, as in:
bird /b3:d/
When the “r-sound” follows /@/, it is written explicitly as /r/, as in:
another /@nVD@r/
Diphthongs
There are five diphthongs in the American English phoneme inventory:
Phoneme | Examples | Transcriptions |
---|---|---|
/@W/ | open, bone, know | /@Wp@n/, /b@Wn/, /n@W/ |
/QY/ | oil, boisterous, boy | /QY@l/, /bQYst@r@s/, /bQY/ |
/eY/ | April, safe, play | /eYpr@l/, /seYf/, /pleY/ |
/aY/ | ice, time, sky | /aYs/, /taYm/, /skaY/ |
/aW/ | owl, house, now | /aWl/, /haWs/, /naW/ |
Diphthongs can artificially emerge in a transcription when the phonemes /@/ and /U/, /Q/ and /I/, or /e/ and /I/ are adjacent in a word, as in autoimmune. However, such words are rare and can be ignored.
Specific pronunciation transcription methods
Assimilation
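In American English, assimilation is transcribed only when /n/ is followed by /k/ (spelled <nk>, or <nc> as in the example below); the /n/ then becomes /N/:
Word | Transcription |
---|---|
incredible | /INkred@b@l/ |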
Initial <wh>
A word-initial <wh>, pronounced /hw/ by some American speakers, is simply represented as /w/ in the Nuance transcriptions; the /w/ phoneme has been trained to cover the /hw/ variant.
For example:
Word | Transcription |
---|---|
which | /wItS/ |
whale | /weYl/ |
Pronunciation of foreign words
To transcribe foreign words, you must use the American English SAMPA symbol set.
If you use a different symbol set, the system cannot interpret the transcription.
Every language has a different phoneme inventory, so you may not be able to represent every sound exactly. For the most common cases, we offer some transcription examples.
French nasals
Try to apply a pronunciation that has been adapted to American English, for example:
Word | Transcription |
---|---|
Bon-Bon | /bQnbOn/ |
The original transcription ‘bo~bo~’ cannot be realized, because the symbol ‘o~’ is not part of the American English SAMPA symbol set.
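A quick way to catch symbols outside the American English set before you add a transcription is to split it into known symbols. The sketch below does this with a greedy longest match over the 41 symbols in the alphabetical table, so “tS” and “dZ” are read as affricates. It ignores spaces and enclosing slashes. tokenize_sampa is a hypothetical helper, not a Nuance API.

```python
from typing import List

# The American English SAMPA symbol set, from the alphabetical table on this page.
EN_US_SAMPA = {
    "4", "@", "@W", "3:", "a", "aW", "aY", "b", "d", "D", "dZ", "e", "eY", "f",
    "g", "h", "i", "I", "j", "k", "l", "m", "n", "N", "O", "p", "Q", "QY", "r",
    "s", "S", "t", "T", "tS", "u", "U", "v", "V", "w", "z", "Z",
}


def tokenize_sampa(transcription: str) -> List[str]:
    """Split a transcription such as "/bQnbOn/" into en-US SAMPA symbols.

    Raises ValueError at the first symbol that is not in the set.
    """
    text = transcription.strip("/").replace(" ", "")
    symbols: List[str] = []
    i = 0
    while i < len(text):
        for size in (2, 1):  # the longest en-US symbols have two characters
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in EN_US_SAMPA:
                symbols.append(chunk)
                i += size
                break
        else:
            raise ValueError(f"{text[i:]!r} is not valid en-US SAMPA (position {i})")
    return symbols
```

The adapted transcription “/bQnbOn/” passes. The original “bo~bo~” is rejected because neither “o~” nor “o” is in the set.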
Vowel ‘y’ in German and French
The vowel ‘y’ found in some German or French words can be represented by /u/, as in:
Word | Transcription |
---|---|
Duchamp | /duSQmp/ |
duenn | /duen/ |
Conveniently this reflects the pronunciation commonly used by American English speakers who are not fully conversant with the particular language.
German fricatives ‘C’ and ‘x’
Palatal and velar fricatives that occur in, for example, German can be transcribed as either /tS/ or /k/ instead of ‘C’ or ‘x’, as in:
Word | Transcription |
---|---|
Bach | /bQk/ |
bachus | /batS@s/ |
Multiple pronunciations (variants)
The type of pronunciation used in SAMPA and in the American English dictionary conforms to the standard non-regional American pronunciation. It is possible for other varieties of American English to occur in an application. If they markedly differ from the standard form, they should be transcribed as a separate variant, as in:
Word | Transcription |
---|---|
advocates | /adv@k@ts/ |
advocates (variant) | /adv@keYts/ |
United States English phoneme set in alphabetical order
SAMPA | IPA | Examples of usage |
---|---|---|
4 | ɾ | butter hidden |
@ | ə | away |
@W | əʊ | nose |
3: | ɜː | furs |
a | æ | bad |
aW | au̬ / aʊ̬ | rouse |
aY | aɪ | rise |
b | b | bin |
d | d | day |
D | ð | this |
dZ | ʤ | Jim |
e | e | pet |
eY | eɪ | raise |
f | f | coffee |
g | g | give |
h | h | hit |
i | i: | ease |
I | ɪ | pit |
j | j | yes |
k | k | skin |
l | l | life |
m | m | mock |
n | n | knock |
N | ŋ | thing |
O | ɔː | award |
p | p | pin |
Q | ɑː / ɒ | stars pot |
QY | ɔɪ | toy |
r | r | ring |
s | s | sin |
S | ʃ | ship |
t | t | tin |
T | θ | thin |
tS | ʧ | ketchup |
u | u: | lose |
U | ʊ | put |
v | v | saving |
V | ʌ | cut |
w | w | wet |
z | z | crazy |
Z | ʒ | vision |