Spanish United States (es-US)
This documentation was updated on May 8, 2023.
Creating grammars
The following subsections describe key issues for working with grammar documents in the Spanish language.
Character encoding
Nuance Recognizer has full internal Unicode support. You can create your grammars using UTF-8 or Latin-1 (also known as ISO-8859-1) character encoding. For example, your grammar header might be:
<?xml version='1.0' encoding='UTF-8'?>
<grammar xml:lang="es-US" version="1.0" root="test">
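The encoding declared in the header has to match the bytes actually written to the file. The following Python sketch illustrates this for both supported encodings. The rule body and the helper names `encode_grammar` and `read_items` are hypothetical and exist only for this illustration; the namespace is the standard SRGS one.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal grammar; only the header attributes come from the text above.
GRAMMAR_TEMPLATE = """<?xml version='1.0' encoding='{encoding}'?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="es-US" version="1.0" root="test">
  <rule id="test">
    <one-of>
      <item>sí</item>
      <item>mañana</item>
    </one-of>
  </rule>
</grammar>
"""

SRGS_NS = {"g": "http://www.w3.org/2001/06/grammar"}


def encode_grammar(encoding: str) -> bytes:
    """Return the grammar as bytes, declared and encoded with the same encoding."""
    return GRAMMAR_TEMPLATE.format(encoding=encoding).encode(encoding)


def read_items(data: bytes) -> list:
    """Parse grammar bytes (honoring the XML declaration) and list the item texts."""
    root = ET.fromstring(data)
    return [item.text for item in root.findall(".//g:item", SRGS_NS)]


if __name__ == "__main__":
    for enc in ("UTF-8", "ISO-8859-1"):
        print(enc, read_items(encode_grammar(enc)))
```

The accented characters occupy two bytes each in UTF-8 and one byte each in Latin-1, so the two files differ on disk even though they parse to the same text.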
If your keyboard does not match your target language, you can add the appropriate keyboard layout on Windows: open the “Control Panel,” click “Regional and Language,” and select “Keyboards and languages.”
Below are codes for writing some common Spanish characters. These are useful if you do not have access to a Spanish keyboard. To type a character, hold down the Alt key while entering the digits on the numeric keypad; the character appears on your screen when you release the Alt key:
Key sequence | Character | Key sequence | Character |
---|---|---|---|
Alt/0225 | á | Alt/0250 | ú |
Alt/0233 | é | Alt/0252 | ü |
Alt/0237 | í | Alt/0241 | ñ |
Alt/0243 | ó | | |
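The four-digit codes above are decimal positions in the Windows Western code page. As a quick check, the following Python sketch decodes each number as a single byte; it assumes code page 1252, and the helper name `alt_code_to_char` is made up for this example.

```python
# Alt codes from the table above, mapped to the characters they should produce.
ALT_CODES = {
    "0225": "á",
    "0233": "é",
    "0237": "í",
    "0243": "ó",
    "0250": "ú",
    "0252": "ü",
    "0241": "ñ",
}


def alt_code_to_char(code: str) -> str:
    """Decode a zero-prefixed Alt code as one byte in Windows code page 1252 (assumed)."""
    return bytes([int(code)]).decode("cp1252")


if __name__ == "__main__":
    for code, expected in ALT_CODES.items():
        print(f"Alt/{code} -> {alt_code_to_char(code)} (expected {expected})")
```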
alphanum_lc built-in grammar
The alphanum_lc built-in grammar recognizes a connected string of up to 20 digits and lowercase alphabetic characters, such as “a8f9h23”. For example, this grammar could be used to recognize a product code or user ID. The “lc” in the name of this built-in means lowercase. The possible characters are the lowercase letters a-z, ñ, and the digits 0-9. The character sequences “ch” and “ll” can be spoken as two separate letters, as in “c h” and “l l,” or as single letters, as in “doble ele” and “doble erre.” The application layer can adjust the case of the returned letters as needed for further processing.
Note: This grammar replaces the alphanum built-in grammar.
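The application layer can check and adjust the returned string itself. The following Python sketch assumes the result arrives as a plain string; the function names are hypothetical. It validates against the character set and length described above and converts the result to uppercase.

```python
import re

# Up to 20 characters drawn from a-z, ñ, and 0-9, as described for alphanum_lc.
_ALPHANUM_LC = re.compile(r"[a-zñ0-9]{1,20}")


def is_valid_alphanum_lc(value: str) -> bool:
    """Return True if the string fits the alphanum_lc character set and length."""
    return _ALPHANUM_LC.fullmatch(value) is not None


def to_uppercase_code(value: str) -> str:
    """Adjust the case in the application layer, for example for a product code."""
    if not is_valid_alphanum_lc(value):
        raise ValueError(f"not an alphanum_lc result: {value!r}")
    return value.upper()


if __name__ == "__main__":
    print(to_uppercase_code("a8f9h23"))
```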
alphanum built-in grammar
(NOTE: For backward compatibility only. Otherwise, use the alphanum_lc built-in grammar.)
This grammar has been replaced by the alphanum_lc grammar and is retained only for backward compatibility. For new implementations, use the alphanum_lc built-in grammar.
The alphanum built-in grammar recognizes a connected string of up to 20 digits and uppercase or lowercase alphabetic characters, such as “A8f9h23”. For example, this grammar could be used to recognize a product code or order number. The possible characters are the uppercase letters A-Z, lowercase letters a-z, and digits 0-9. Uppercase and lowercase letters sound identical (for example, “B” and “b”), so including both is redundant when recognizing case-insensitive items such as product codes. For this reason, the alphanum built-in grammar has been replaced by the alphanum_lc grammar.
boolean built-in grammar
The boolean grammar collects an affirmative or negative response.
Properties
The y and n parameters let you associate any two touchtone buttons as synonyms for yes and no.
Parameter | Description |
---|---|
y | Desired DTMF digit to be equivalent to “sí” (default = 1) |
n | Desired DTMF digit to be equivalent to “no” (default = 2) |
Examples
Caller says… | MEANING key |
---|---|
sí | true |
no | false |
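The effect of the y and n parameters can be pictured as a simple key lookup. The following Python sketch mimics that mapping in application code; the function name `interpret_boolean_dtmf` is hypothetical, and the sketch does not describe how the recognizer itself is implemented.

```python
from typing import Optional


def interpret_boolean_dtmf(key: str, y: str = "1", n: str = "2") -> Optional[str]:
    """Map a DTMF key to the MEANING value, using the documented defaults for y and n."""
    if y == n:
        raise ValueError("y and n must be different keys")
    if key == y:
        return "true"
    if key == n:
        return "false"
    return None  # any other key is neither "sí" nor "no"


if __name__ == "__main__":
    print(interpret_boolean_dtmf("1"))
    print(interpret_boolean_dtmf("9", y="7", n="9"))
```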
ccexpdate built-in grammar
The ccexpdate grammar understands the expiration date on a credit card. Expiration dates are usually a month and a year, and are often embossed on a credit card in the form “mm/yy.” The grammar recognizes variations on the date, for example, “septiembre 2007,” “doce cero siete,” “ocho treinta y uno de cero siete,” “doce diagonal cero siete,” etc.
creditcard built-in grammar
The creditcard grammar understands a caller saying a credit card number, optionally preceding the number with the credit card name or the word “cuenta.” For example, a caller can say, “visa cuenta cuatro cero uno siete…,” “tarjeta mastercard cinco cero cero dos…,” or “cinco cero cero dos….”
currency built-in grammar
The currency grammar collects currency amounts using dólares and centavos. Because some speakers will say “pesos” when referring to dollars, the grammar recognizes peso as a synonym.
Key | Description |
---|---|
MEANING | Contains a string in the following form: [currency]main_unit_amount.subunit_amount. If the caller explicitly says “dólar,” then a currency value of USD is added as a prefix. If the caller explicitly says “peso,” then a currency value of MXN is added as a prefix. If the caller does not explicitly indicate the currency type, or if they say “centavos,” then no prefix is added. If the caller omits the main unit or subunit amount, then that field is zero. The string contains a leading zero if the subunit amount is collected without the main unit. |
SWI_literal | Contains the exact text that was recognized. |
Examples
Caller says | MEANING |
---|---|
cinco pesos | MXN5.00 |
cinco centavos | 0.05 |
cinco pesos y cinco centavos | MXN5.05 |
cinco pesos y veinticinco centavos | MXN5.25 |
cinco pesos y veinticinco | MXN5.25 |
seiscientos veinticinco mil cuatrocientos sesenta y cuatro pesos | MXN625464.00 |
cuatrocientos doce mil quinientos sesenta pesos con diez centavos | MXN412560.10 |
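An application typically needs to split the MEANING string into its currency prefix and amount. The following Python sketch does this under two assumptions taken from the examples above: the prefix, when present, is a three-letter code, and the subunit always has two digits. The function name is hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Optional three-letter prefix, then main_unit_amount.subunit_amount (two digits assumed).
_CURRENCY = re.compile(r"(?P<currency>[A-Z]{3})?(?P<amount>\d+\.\d{2})")


def parse_currency_meaning(meaning: str) -> Tuple[Optional[str], Decimal]:
    """Split a currency MEANING value into (currency code or None, amount)."""
    match = _CURRENCY.fullmatch(meaning)
    if match is None:
        raise ValueError(f"unexpected currency MEANING: {meaning!r}")
    return match.group("currency"), Decimal(match.group("amount"))


if __name__ == "__main__":
    for value in ("MXN5.00", "0.05", "MXN412560.10"):
        print(value, parse_currency_meaning(value))
```

Using `Decimal` rather than `float` keeps the cent amounts exact.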
date built-in grammar
The date grammar accepts a date spoken in any of several formats.
Recognized phrases include “cuatro de junio,” “cuatro de junio de dos mil uno,” “lunes cuatro de junio,” and “el lunes cuatro de junio.”
The grammar also accepts “anteayer,” “ayer,” “hoy,” “mañana,” and “pasado mañana,” which return values of -2, -1, 0, +1, and +2, respectively, in the MEANING key.
Examples
Caller says | MEANING key |
---|---|
el cinco de enero de dos mil uno | 20010105 |
ayer | -1 |
anteayer | -2 |
hoy | 0 |
mañana | +1 |
pasado mañana | +2 |
el cuatro | ??????04 |
el miércoles | (Phrase not recognized) |
miércoles doce | ??????12 |
el cuatro de junio | ????0604 |
junio cuatro | ????0604 |
el cuatro de junio de mil novecientos noventa y siete | 19970604 |
junio cuatro de mil novecientos noventa y siete | 19970604 |
el cuatro de junio del noventa y siete | ??970604 |
junio cuatro del noventa y siete | ??970604 |
miércoles el cuatro de junio de mil novecientos noventa y siete | 19970604 |
el seis | ??????06 |
diez, doce | (Phrase not recognized) |
diez, doce, noventa y siete | (Phrase not recognized) |
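The MEANING values in the table take two shapes: a signed day offset for relative dates, or an eight-character YYYYMMDD string in which unknown positions are filled with question marks. The following Python sketch separates the two cases; the type and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class DateMeaning(NamedTuple):
    offset_days: Optional[int]  # relative offset, e.g. -1 for "ayer"; None for dates
    year: Optional[str]         # "2001", "??97" (century unknown), or None
    month: Optional[int]
    day: Optional[int]


def _field(text: str) -> Optional[int]:
    """Return the numeric value of a field, or None if it contains '?'."""
    return None if "?" in text else int(text)


def parse_date_meaning(meaning: str) -> DateMeaning:
    """Interpret a date MEANING value as an offset or a partially known date."""
    if re.fullmatch(r"[0-9?]{8}", meaning):
        year = meaning[:4]
        return DateMeaning(None, None if year == "????" else year,
                           _field(meaning[4:6]), _field(meaning[6:8]))
    if re.fullmatch(r"[+-]?\d+", meaning):
        return DateMeaning(int(meaning), None, None, None)
    raise ValueError(f"unexpected date MEANING: {meaning!r}")


if __name__ == "__main__":
    for value in ("20010105", "??970604", "??????04", "-1", "+2"):
        print(value, parse_date_meaning(value))
```

The year is kept as a string so that a partially known year such as “??97” is not silently turned into a number.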
digits built-in grammar
Valid characters are the digits 0-9.
number built-in grammar
The number grammar recognizes numbers spoken as natural numbers (for example, “veinticinco”); the caller must not speak the integer part as individual digits.
Up to two decimal places are recognized by default; this can be extended to nine using the maxdecimal parameter. After the decimal point, the caller must speak individual digits (natural numbers are not allowed).
Numbers from -999,999,999.99 to 999,999,999.99 are recognized, but by default the minallowed parameter is set to zero, which limits recognition to non-negative values.
Examples
Caller says | MEANING key |
---|---|
veinticinco | 25 |
doce mil trescientos cuarenta y cinco | 12345 |
menos dos | -2 |
tres punto uno cuatro uno seis | 3.1416 |
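To see how the documented limits interact, the following Python sketch checks a numeric string against the overall range, a minallowed value, and a maxdecimal value. It mirrors the constraints in application code only; the function name and the upper-bound constant are assumptions, and the recognizer's own behavior is not implied.

```python
from decimal import Decimal, InvalidOperation

UPPER_LIMIT = Decimal("999999999.99")  # upper end of the documented range


def is_allowed_number(meaning: str,
                      minallowed: Decimal = Decimal(0),
                      maxdecimal: int = 2) -> bool:
    """Check a number against the range, minallowed, and maxdecimal constraints."""
    try:
        value = Decimal(meaning)
    except InvalidOperation:
        return False
    if not value.is_finite():
        return False
    # A negative exponent gives the number of digits after the decimal point.
    decimals = max(0, -value.as_tuple().exponent)
    return minallowed <= value <= UPPER_LIMIT and decimals <= maxdecimal


if __name__ == "__main__":
    print(is_allowed_number("25"))
    print(is_allowed_number("-2"))
    print(is_allowed_number("-2", minallowed=Decimal("-999999999.99")))
    print(is_allowed_number("3.1416", maxdecimal=4))
```

Under these assumptions, the “menos dos” and “tres punto uno cuatro uno seis” examples pass only when minallowed and maxdecimal are changed from their defaults.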
phone built-in grammar
The phone grammar collects telephone numbers (landline and cellular) using the North American dialing plan. Callers must speak each digit one at a time.
The grammar accepts 7- and 10-digit North American phone numbers as well as three-digit numbers ending in 11 (for example, “911”). An optional “1” can be placed before the 7- or 10-digit numbers.
Additionally, as stipulated in the VoiceXML specification, the caller may specify an extension, for example, “cinco cuatro dos tres cinco seis cinco siete extensión dos mil.” By default, extensions of one to four digits are supported. Natural numbers are allowed for extensions.
Return keys/values
Upon return, the MEANING key is assigned a string of up to 10 characters representing the recognized phone number. A leading “1” is omitted from the return value. For example, if “16789999” is recognized, the returned result is “6789999”.
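The accepted number shapes and the leading “1” rule can be summarized in a few lines. The following Python sketch applies them to a string of spoken digits; it ignores extensions, and the function name is hypothetical.

```python
def normalize_phone(digits: str) -> str:
    """Drop an optional leading 1 and accept only 3-, 7-, or 10-digit results."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a digit string: {digits!r}")
    # The optional "1" may precede a 7- or 10-digit number.
    if len(digits) in (8, 11) and digits[0] == "1":
        digits = digits[1:]
    if len(digits) == 3 and digits.endswith("11"):
        return digits  # three-digit numbers ending in 11, such as 911
    if len(digits) in (7, 10):
        return digits
    raise ValueError(f"unsupported phone number length: {digits!r}")


if __name__ == "__main__":
    for spoken in ("16789999", "911", "14155550123"):
        print(spoken, "->", normalize_phone(spoken))
```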
Properties
Property | Description |
---|---|
minextension | Minimum numeric value allowed for an extension. |
maxextension | Maximum numeric value allowed for an extension. Set this to -1 to disallow extensions. |
Extensions to VoiceXML specification
Nuance has interpreted the VoiceXML specification to limit numbers to North American formats. Also, we have extended the VoiceXML specification with the parameters used to limit the allowed extension numbers.
time built-in grammar
The time grammar recognizes spoken time of day utterances from the caller. Recognized phrases include times given in 12-hour format (for example, “a las cinco”) and 24-hour format (“veintitrés quince”). In addition, it will recognize “qualified” times such as “antes de las cinco” and “como a las cinco.”
Examples
For each entry, the values returned in the MEANING and QUALIFIER keys are shown. (Not shown are the values of the HOUR, MINUTE, and AMPM keys.)
Caller says | MEANING | QUALIFIER |
---|---|---|
ahora | (Phrase not recognized) | -- |
en media hora | (Phrase not recognized) | -- |
a mediodía | 1200p | exact |
a media noche | 0000? | exact |
antes del mediodía | 1200p | before |
después de las trece treinta | 1330? | after |
veinte veinte | 2020? | exact |
a las ocho y veinte de la mañana | 0820a | exact |
a las ocho de la mañana con veinte minutos | 0820a | exact |
ocho en punto | 0800? | exact |
ocho y cuarto | 0815? | exact |
ocho y media | 0830? | exact |
a las siete y cuarto por la tarde | 0715p | exact |
cuarto para las ocho | 0745? | exact |
al cuarto para las ocho | 0745? | exact |
cuarto antes de las ocho | 0745? | exact |
quince antes de las ocho | 0745? | exact |
cinco antes de la una | 1255? | exact |
al diez para la una | 1250? | exact |
a las siete y cuarto de la tarde | 0715p | exact |
alrededor de las trece horas | 1300h | approx |
a las veinticuatro horas | 0000h | exact |
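Each MEANING value in the table is four digits (hhmm) followed by one character. The following Python sketch parses that form and converts it to a 24-hour value where possible. It assumes the final character follows the VoiceXML built-in time convention (a for AM, p for PM, h for 24-hour, ? for ambiguous); the names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class TimeMeaning(NamedTuple):
    hour: int
    minute: int
    flag: str  # "a", "p", "h", or "?"


def parse_time_meaning(meaning: str) -> TimeMeaning:
    """Split a time MEANING value such as '0715p' into hour, minute, and flag."""
    match = re.fullmatch(r"(\d{2})(\d{2})([aph?])", meaning)
    if match is None:
        raise ValueError(f"unexpected time MEANING: {meaning!r}")
    return TimeMeaning(int(match.group(1)), int(match.group(2)), match.group(3))


def hour_24(time: TimeMeaning) -> Optional[int]:
    """Return the hour on a 24-hour clock, or None if AM/PM is ambiguous."""
    if time.flag == "h" or time.hour == 0 or time.hour > 12:
        return time.hour
    if time.flag == "a":
        return time.hour % 12
    if time.flag == "p":
        return time.hour % 12 + 12
    return None


if __name__ == "__main__":
    for value in ("1200p", "0820a", "0715p", "0800?", "1300h"):
        parsed = parse_time_meaning(value)
        print(value, parsed, hour_24(parsed))
```

An application would typically re-prompt the caller for AM or PM when `hour_24` returns `None`.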
zipcode built-in grammar
The zipcode grammar recognizes valid United States ZIP Codes in either five- or nine-digit format.
Return keys/values
Upon return, the key MEANING is assigned to the recognized zipcode, and can contain either five or nine digits.
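An application can split the returned value into the five-digit ZIP Code and the optional four-digit add-on. The following Python sketch assumes the MEANING value contains digits only, with no hyphen; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple


def split_zipcode(meaning: str) -> Tuple[str, Optional[str]]:
    """Split a 5- or 9-digit zipcode MEANING into (ZIP Code, add-on or None)."""
    if re.fullmatch(r"\d{5}(\d{4})?", meaning) is None:
        raise ValueError(f"unexpected zipcode MEANING: {meaning!r}")
    return meaning[:5], (meaning[5:] or None)


if __name__ == "__main__":
    print(split_zipcode("94105"))
    print(split_zipcode("941051234"))
```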
Vocabulary items and pronunciations
This chapter describes considerations for vocabularies and their pronunciations in Spanish (es-US).
Spanish pronunciations
This section provides detailed reference information to help create pronunciation dictionaries. It is intended for people who have sufficient knowledge of the Spanish language as spoken in the United States. It provides information about transcription and pronunciation.
The type of pronunciation used in SAMPA and the Spanish dictionary conforms to the standard non-regional American Spanish pronunciation.
If you are not sure how a certain word is pronounced, you can look up its IPA transcription in a dictionary and then convert it into the SAMPA symbols given in The Spanish symbol set in alphabetical order.
The Spanish phoneme system
The Spanish phoneme system can be conveniently divided into three groups:
- Consonants
- Vowels
- Semi-vowels
Furthermore, it is possible to define seven different types of consonants:
- Plosives
- Fricatives
- Affricates
- Nasals
- Laterals
- Taps
- Trills
Spanish spelling is very regular: the orthography correlates closely with pronunciation, so the relationship between spelling (grapheme) and sound (phoneme) is easy to define. Nevertheless, there are some pronunciation variants, mostly regional, that are explained further in Multiple pronunciations (variants).
Within the vowel group, a distinction can be made between vowels and semi-vowels. Furthermore, diphthongs represent an additional characteristic among the group of vowels. Spanish has two groups of diphthongs: increasing diphthongs and decreasing diphthongs.
A special feature of the Spanish transcription is the reduction of the vowel set, which is explained in the subsection Single vowels (monophthongs).
Spanish symbol set grouped by phoneme classes
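Phoneme class | Type | SAMPA | IPA | Examples of usage | Transcription |
---|---|---|---|---|---|
Consonants | Plosives | b | b / β | basta cabo | /basta/ /kabo/ |
Consonants | Plosives | p | p | paso | /paso/ |
Consonants | Plosives | g | g / ɣ | gusto agua | /gusto/ /agwa/ |
Consonants | Plosives | k | k | casa quitar | /kasa/ /kitar/ |
Consonants | Plosives | d | d / ð | donde juzgado nada | /donde/ /xusgado/ /nada/ |
Consonants | Plosives | t | t | nata | /nata/ |
Consonants | Fricatives | s | s / z | cinco casa juzgado mismo | /sinko/ /kasa/ /xusgado/ /mismo/ |
Consonants | Fricatives | x | x | gente jaca | /xente/ /xaka/ |
Consonants | Fricatives | f | f | fama | /fama/ |
Consonants | Fricatives | j | j | yema | /jema/ |
Consonants | Affricates | tS | ʧ | mucho | /mutSo/ |
Consonants | Nasals | m | m / ɱ | malo convento confuso | /malo/ /kombento/ /komfuso/ |
Consonants | Nasals | n | n / ŋ | nota tengo manco | /nota/ /tengo/ /manko/ |
Consonants | Nasals | J | ɲ | año | /aJo/ |
Consonants | Laterals | l | l | lento | /lento/ |
Consonants | Tap | r | ɾ | puro | /puro/ |
Consonants | Trill | rr | r | perro | /perro/ |
Vowels | Single vowels | a | a | mano | /mano/ |
Vowels | Single vowels | e | e / ɛ | meseta llover | /meseta/ /jober/ |
Vowels | Single vowels | i | ɪ / i | mina | /mina/ |
Vowels | Single vowels | o | o / ɔ | oficina ojo | /ofisina/ /oxo/ |
Vowels | Single vowels | u | u | pluma | /pluma/ |
Vowels | Semi-vowels | j | j / ʎ | piel calle | /pjel/ /kaje/ |
Vowels | Semi-vowels | w | ʊ / w | cual | /kwal/ |
Vowels | Decreasing diphthongs | aj | ai | aire | /ajre/ |
Vowels | Decreasing diphthongs | ej | ɛi | veinte | /bejnte/ |
Vowels | Decreasing diphthongs | oj | ɔi | boina | /bojna/ |
Vowels | Decreasing diphthongs | aw | au | áureo | /awreo/ |
Vowels | Decreasing diphthongs | ew | ɛu | éustilo | /ewstilo/ |
Vowels | Decreasing diphthongs | ow | ɔu | bou | /bow/ |
Vowels | Increasing diphthongs | ja | ia | cambiar | /kambjar/ |
Vowels | Increasing diphthongs | je | ie | pie | /pje/ |
Vowels | Increasing diphthongs | jo | iɔ | piojo | /pjoxo/ |
Vowels | Increasing diphthongs | ju | iu | viuda | /bjuda/ |
Vowels | Increasing diphthongs | wa | ua | cuadro | /kwadro/ |
Vowels | Increasing diphthongs | we | ue | puerto | /pwerto/ |
Vowels | Increasing diphthongs | wi | ui | cuidar | /kwidar/ |
Vowels | Increasing diphthongs | wo | uo | cuota | /kwota/ |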
Spanish consonants
The standard Spanish consonant system is generally considered to have:
- Six plosives
- Four fricatives
- One affricate
- Three nasals
- One lateral
- One tap
- One trill
The sample words given below demonstrate the different contexts in which the sounds can appear. A short explanation is also given.
Plosives
There are three voiced and three voiceless plosives in Spanish, which can be arranged in pairs as shown below:
Voiced | Examples | Voiceless | Examples |
---|---|---|---|
/b/ | basura invierno wáter | /p/ | paso |
/g/ | guisar ángulo | /k/ | casa quitar |
/d/ | donde andar caldo | /t/ | nata |
Fricatives
There are four fricatives in Spanish, /s/, /x/, /f/, and /j/:
/s/ | cero acceder pez sal arsenal dos | /sero/ /akseder/ /pes/ /sal/ /arsenal/ /dos/ |
---|---|---|
/x/ | gente caja boj | /xente/ /kaxa/ /box/ |
/f/ | fama fiel | /fama/ /fjel/ |
/j/ | yodo reyes | /jodo/ /rrejes/ |
Affricates
In the Spanish SAMPA symbol set there is one affricate, /tS/. Affricates in SAMPA are always represented by two single phonemes.
/tS/ | chucho cacharro | /tSutSo/ /katSarro/ |
---|---|---|
Nasals
There are three nasals in Spanish, /m/, /n/, and /J/:
/m/ | mano ambos convento confusión tándem | /mano/ /ambos/ /kombento/ /komfusjon/ /tandem/ |
---|---|---|
/n/ | nota antes carne canción | /nota/ /antes/ /karne/ /kansjon/ |
/J/ | ñoño año | /JoJo/ /aJo/ |
Laterals
There is one lateral in the Spanish SAMPA set, /l/:
/l/ | lento palco alar leal | /lento/ /palko/ /alar/ /leal/ |
---|---|---|
Tap and trill
Spanish has one tap and one trill; both are pronounced with the tip of the tongue: /r/ and /rr/.
/r/ | cero parar artificio | /sero/ /parar/ /artifisjo/ |
---|---|---|
/rr/ | ratón carro alrededor enrubiar desrizar | /rraton/ /karro/ /alrrededor/ /enrrubjar/ /desrrisar/ |
Spanish vowels
This section discusses the Spanish vowels in these groupings:
- Single vowels (monophthongs)
- Semi-vowels
- Diphthongs
Single vowels (monophthongs)
Generally, the Spanish language has nine distinguishable monophthongs:
- the vowel /a/
- two representations for each of the vowels <e>, <i>, <o>, and <u> (formed basically as a long and a short variant)
Since the two variants of each vowel are pronounced similarly, and the distinction between them does not affect the meaning of a word, it was decided to use only one phoneme for each vowel. Subsequent speech recognition testing has shown very good results for this practice. The main advantage for transcription is that it reduces the number of phonemes to be considered and, at the same time, removes a possible source of error.
SAMPA | Examples |
---|---|
/a/ | amar desatar boca |
/e/ | enano camelar célebre |
/i/ | icono desinflar tití |
/o/ | olor déspota campo |
/u/ | humano reputación menú |
Semi-vowels
In this Spanish SAMPA phoneme inventory, two semi-vowels are to be found, /j/ and /w/. For example:
/j/ | ciudad cambiar piel piojo llanto calle | /sjudad/ /kambjar/ /pjel/ /pjoxo/ /janto/ /kaje/ |
---|---|---|
/w/ | agua tuerto | /agwa/ /twerto/ |
/j/ is used as a fricative consonant as well. For example:
/j/ | yema | /jema/ |
---|---|---|
Diphthongs
In Spanish, diphthongs are normally formed by the combination of a strong vowel (a, e, o) and a weak vowel (i, u). The strong vowel forms the nucleus of the syllable. The reduced vowel set (see Single vowels (monophthongs)) also applies to the diphthongs.
Take care with hiatuses. These are also sequences of two vowels, but each vowel forms the nucleus of a different syllable. For example:
pi-a-no | /piano/ |
---|---|
con-ti-nú-a | /kontinua/ |
In the Spanish language, six decreasing diphthongs (/aj/, /ej/, /oj/, /aw/, /ew/, /ow/) and eight increasing diphthongs (/ja/, /je/, /jo/, /ju/, /wa/, /we/, /wi/, /wo/) can be distinguished.
Decreasing diphthongs
Decreasing diphthongs have the first vowel as the nucleus of the syllable. The vocal organs move from an open position into a closed position.
/aj/ | aire desairar hay | /ajre/ /desajrar/ /aj/ |
---|---|---|
/ej/ | veinte rey | /bejnte/ /rrej/ |
/oj/ | boina voy | /bojna/ /boj/ |
/aw/ | áureo | /awreo/ |
/ew/ | éustilo | /ewstilo/ |
/ow/ | bou | /bow/ |
Increasing diphthongs
Increasing diphthongs have the second vowel as the nucleus of the syllable. The vocal organs, especially the tongue, move from a closed into an open position.
/ja/ | cambiar | /kambjar/ |
---|---|---|
/je/ | pie | /pje/ |
/jo/ | piojo | /pjoxo/ |
/ju/ | viuda | /bjuda/ |
/wa/ | cuadro | /kwadro/ |
/we/ | cuenca | /kwenka/ |
/wi/ | cuidar | /kwidar/ |
/wo/ | cuota | /kwota/ |
Specific pronunciation transcription methods
Initial <h>
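The letter <h>, whether at the beginning of a word or inside it, should always be ignored in transcription, as it is not pronounced in Spanish. The exception is the digraph <ch>, which is transcribed as /tS/. For example: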
hotel | /otel/ |
---|---|
ahora | /aora/ |
Transcription of the tap /r/ and trill /rr/
The tap /r/ appears in the middle of a word between two vowels, between a vowel and a consonant, and between a consonant other than <n>, <l>, or <s> and a vowel. It also occurs at the end of a word. For example:
cero | /sero/ |
---|---|
artificio | /artifisjo/ |
apresar | /apresar/ |
The trill /rr/ appears as <rr> between two vowels in the middle of a word, as <r> preceded by the letters <n>, <l>, or <s>, or as initial <r>. For example:
carro | /karro/ |
---|---|
enrubiar | /enrrubjar/ |
alrededor | /alrrededor/ |
desrizar | /desrrisar/ |
ratón | /rraton/ |
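The tap and trill rules above depend only on the position of the letter and on the preceding letter, so they are easy to express in code. The following Python sketch returns the SAMPA symbol chosen for each <r> or <rr> in a word. It handles nothing but the rhotics, and the function name is hypothetical.

```python
from typing import List


def rhotic_symbols(word: str) -> List[str]:
    """Return 'r' (tap) or 'rr' (trill) for each <r>/<rr> in the word, in order."""
    word = word.lower()
    symbols = []
    i = 0
    while i < len(word):
        if word[i] != "r":
            i += 1
        elif word.startswith("rr", i):
            symbols.append("rr")  # written <rr> between vowels
            i += 2
        elif i == 0 or word[i - 1] in "nls":
            symbols.append("rr")  # initial <r>, or <r> after <n>, <l>, <s>
            i += 1
        else:
            symbols.append("r")   # every other position is a tap
            i += 1
    return symbols


if __name__ == "__main__":
    for word in ("cero", "carro", "alrededor", "enrubiar", "ratón"):
        print(word, rhotic_symbols(word))
```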
Transcription of the nasals /J/ and /n/
The Spanish letter <ñ> is always represented by the Spanish SAMPA symbol /J/. For example:
año | /aJo/ |
---|---|
muñeca | /muJeka/ |
The Spanish nasal /n/ is also used before the phonemes /g/, /k/, and /x/, where it is realized as the velar nasal [ŋ]; no separate SAMPA symbol is used for this sound. For example:
tengo | /tengo/ |
---|---|
donjuanesco | /donxwanesko/ |
nunca | /nunka/ |
inquirir | /inkirir/ |
Assimilation
In Spanish, only /n/ can be assimilated. Before <b>, <f>, and <v> it is transcribed as /m/:
nb | /mb/ | convento | /kombento/ |
---|---|---|---|
nf | /mf/ | confuso | /komfuso/ |
nv | /mb/ | invita | /imbita/ |
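The assimilation table can be applied mechanically to a spelling. The following Python sketch rewrites only the three clusters listed above and leaves every other letter as spelled, so its output is not a complete transcription; the function name is hypothetical.

```python
import re


def assimilate_n(text: str) -> str:
    """Apply the nb, nf, and nv rules from the table; other letters are untouched."""
    text = text.replace("nv", "mb")         # nv -> /mb/
    return re.sub(r"n(?=[bf])", "m", text)  # nb -> /mb/, nf -> /mf/


if __name__ == "__main__":
    for word in ("convento", "confuso", "invita", "nota"):
        print(word, "->", assimilate_n(word))
```

For “convento” the result is “combento”; the initial <c> still has to be converted to /k/ by a separate rule to obtain /kombento/.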
Pronunciation of foreign words
To transcribe foreign words, you must use the Spanish SAMPA symbols. If you use a different symbol set, the system cannot interpret the input.
Every language has a different phoneme inventory, so you may not be able to cover every sound of the original word. To obtain a Spanish transcription that is as close as possible to the transcription in the original language, use the Spanish SAMPA symbols that most closely resemble the SAMPA symbols of the foreign language.
For example:
bordeaux | /bordo/ |
---|---|
The original transcription ‘bORDo’ cannot be realized because the French symbols ‘O’ and ‘R’ do not belong to the Spanish SAMPA symbol set. Therefore, these symbols have to be replaced by the Spanish symbols which are closest to the French ones. In this case /o/ replaces ‘O’ and /r/ replaces ‘R’.
In addition, the symbol ‘S’ is available as a standalone phoneme for foreign words. ‘S’ does not exist in Spanish as a single phoneme; it occurs only in combination with the plosive /t/ in the affricate /tS/. The affricate has therefore been reinterpreted so that ‘S’ can be used on its own to transcribe foreign words that contain this sound.
For example:
beige | beS |
---|---|
beige<bejs> | bejs |
beige<bejx> | bejx |
In this example, the base form contains the ‘S’ phoneme and is closest to the original transcription, while the other two variants show other phonetic variations closer to the Spanish pronunciation.
For the most common cases, we offer transcription examples. In some cases we provide one transcription, whereas in others a second or even a third variant is introduced. The need for these variants shows that Spanish speakers pronounce foreign words using the Spanish phonetic set.
French <g> and <j>
Try to apply a pronunciation that has been adapted to Spanish, for example:
collage | kolas |
---|---|
déjà_vu | desabu |
rouge | rrus |
The original transcriptions ‘kolaZ’, ‘deZavy’, and ‘RuZ’ cannot be realized because the French symbols ‘Z’, ‘v’, ‘y’, and ‘R’ are not part of the Spanish SAMPA symbol set.
Double <l> in foreign words
The grapheme <ll> can be well represented by the Spanish SAMPA symbol /l/. For example:
Nelly | neli |
---|---|
Nelly<neji> | neji |
allegro | alegro |
Foreign vowels
Even with English vowels you have to try to apply a pronunciation that has been adapted to Spanish, for example:
buggy | bagi |
---|---|
buggy<bugi> | bugi |
cross-country | kroskantri |
The original transcriptions ‘bVgI’ and ‘krQskVntrI’ cannot be realized because the English symbols ‘V’, ‘I’, and ‘Q’ are not part of the Spanish SAMPA set.
French nasals
Try to apply a pronunciation that has been adapted to Spanish, for example:
bombon | /bombon/ |
---|---|
The original transcription ‘bo~bo~’ cannot be realized because the French symbol ‘o~’ is not part of the Spanish SAMPA symbol set.
Multiple pronunciations (variants)
The type of pronunciation used in SAMPA and the Spanish dictionary conforms to the standard non-regional American Spanish pronunciation. Other varieties can also occur in an application. If they markedly differ from the standard form, they should be transcribed as separate variants in the format below:
cansado | kansado |
---|---|
cansado <kansao> | kansao |
In this section, the most common pronunciation variants of the Spanish language are listed. In some regions it may be helpful to consider different ways of pronunciation.
Reduction of -ado (such as in some Caribbean regions)
Words ending with -ado are sometimes reduced to -ao. For example:
jorobado | /xorobado/ versus /xorobao/ |
---|---|
cansado | /kansado/ versus /kansao/ |
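Variants of this kind can be generated from the base transcription. The following Python sketch produces the base entry and, where the transcription ends in -ado, the reduced variant, labeled in the word <variant> style shown earlier in this section. The function names are hypothetical, and the output is a list of label and transcription pairs, not a dictionary file format.

```python
from typing import List, Optional, Tuple


def ado_reduction(transcription: str) -> Optional[str]:
    """Return the reduced -ao form of a transcription ending in -ado, else None."""
    if transcription.endswith("ado"):
        return transcription[:-3] + "ao"
    return None


def variant_entries(word: str, transcription: str) -> List[Tuple[str, str]]:
    """List the base entry followed by its -ao variant, if there is one."""
    entries = [(word, transcription)]
    reduced = ado_reduction(transcription)
    if reduced is not None:
        entries.append((f"{word} <{reduced}>", reduced))
    return entries


if __name__ == "__main__":
    for word, sampa in (("cansado", "kansado"), ("jorobado", "xorobado")):
        print(variant_entries(word, sampa))
```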
The Spanish symbol set in alphabetical order:
SAMPA | IPA | Examples of usage |
---|---|---|
a | a | mano |
aj | ai | aire |
aw | au | áureo |
b | b / β | basta cabo |
d | d / ð | donde juzgado nada |
e | e / ɛ | meseta llover |
ej | ɛi | veinte |
ew | ɛu | éustilo |
f | f | fama |
g | g / ɣ | gusto agua |
i | ɪ / i | mina |
j | j | yema |
j | j / ʎ | piel calle |
J | ɲ | año |
ja | ia | cambiar |
je | ie | pie |
jo | iɔ | piojo |
ju | iu | viuda |
k | k | casa quitar |
l | l | lento |
m | m / ɱ | malo convento confuso |
n | n / ŋ | nota tengo manco |
o | o / ɔ | oficina ojo |
oj | ɔi | boina |
ow | ɔu | bou |
p | p | paso |
r | ɾ | puro |
rr | r | perro |
s | s / z | cinco casa juzgado mismo |
t | t | nata |
tS | ʧ | mucho |
u | u | pluma |
w | ʊ / w | cual |
wa | wa | cuadro |
we | we | puerto |
wi | wi | cuidar |
wo | wo | cuota |
x | x | gente jaca |
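Because transcriptions may use only the symbols in this table, it can be useful to check them before adding them to a dictionary. The following Python sketch splits a transcription into SAMPA symbols, matching the two-character symbols rr and tS first, and rejects anything outside the set. The function name and the `allow_foreign_s` switch (for the standalone ‘S’ used in foreign words) are assumptions made for this example.

```python
from typing import List

# Basic symbols from the table; diphthongs are sequences of these symbols.
SPANISH_SAMPA = ["rr", "tS", "a", "b", "d", "e", "f", "g", "i", "j", "J", "k",
                 "l", "m", "n", "o", "p", "r", "s", "t", "u", "w", "x"]


def tokenize_sampa(transcription: str, allow_foreign_s: bool = False) -> List[str]:
    """Split a transcription into Spanish SAMPA symbols, longest symbol first."""
    symbols = SPANISH_SAMPA + (["S"] if allow_foreign_s else [])
    symbols = sorted(symbols, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(transcription):
        for symbol in symbols:
            if transcription.startswith(symbol, i):
                tokens.append(symbol)
                i += len(symbol)
                break
        else:
            raise ValueError(
                f"{transcription[i]!r} is not a Spanish SAMPA symbol in {transcription!r}")
    return tokens


if __name__ == "__main__":
    print(tokenize_sampa("katSarro"))
    print(tokenize_sampa("beS", allow_foreign_s=True))
```

A French-style transcription such as ‘bORDo’ raises `ValueError` at the first symbol outside the Spanish set.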