Portuguese Portugal (pt-PT)

This documentation was updated on January 26, 2024.

Creating grammars

The following subsections describe key issues for working with grammar documents in the Portuguese language.

Character encoding

Nuance Recognizer has full internal Unicode support, so you can create your grammars using either UTF-8 or Latin-1 (also known as ISO-8859-1) character encoding. For example, your grammar header might be:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xml:lang="pt-PT" version="1.0" root="test">

alphanum_lc built-in grammar

The alphanum_lc built-in grammar recognizes a connected string of up to 20 digits and lower case alphabetic characters. For example, this grammar could be used to recognize a product code or order number.

Characters are the letters a-z, including accented letters.

Digits are 0-9.

Returned keys/values

MEANING Contains a string of digits and lowercase letters, with no embedded spaces.
SWI_literal Contains the exact text that was recognized.

Note: This alphanum_lc built-in grammar replaces the alphanum built-in grammar.
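
An application may want to sanity-check the MEANING value before using it as a product code or order number. The following is a minimal sketch of such a check, based only on the constraints stated above (up to 20 characters, digits 0-9 and lowercase letters, no spaces). The function name is hypothetical, and the check accepts any lowercase Unicode letter, which is broader than the Portuguese accented letters.

```python
def is_valid_alphanum_lc(meaning: str) -> bool:
    """Return True if the value looks like an alphanum_lc MEANING string.

    Assumed constraints (taken from the description above): 1 to 20
    characters, each one a digit 0-9 or a lowercase letter, no spaces.
    """
    if not 1 <= len(meaning) <= 20:
        return False
    # Digits are restricted to ASCII 0-9; letters may be accented (for example "ç").
    return all(
        ch in "0123456789" or (ch.isalpha() and ch.islower())
        for ch in meaning
    )
```

For example, is_valid_alphanum_lc("ab12ç9") is True, while a value containing a space or an uppercase letter is rejected.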

alphanum built-in grammar

The alphanum built-in grammar recognizes a connected string of up to 20 digits and alphabetic characters. For example, this grammar could be used to recognize a product code or order number.

Characters are the letters a-z, including accented letters.

Digits are 0-9.

Returned keys/values

MEANING Contains a string of digits and lowercase letters, with no embedded spaces.
SWI_literal Contains the exact text that was recognized.

boolean built-in grammar

The boolean grammar collects an affirmative or negative response.

Properties

The y and n parameters let you associate any two touch tone buttons as synonyms for yes and no.

Parameter Description
y Desired DTMF digit to be equivalent to “sim” (default = 1)
n Desired DTMF digit to be equivalent to “não” (default = 2)

Examples

Caller says… MEANING key
sim true
não false

ccexpdate built-in grammar

The ccexpdate grammar understands the expiration date on a credit card. Expiration dates are usually a month and a year, and are often embossed on a credit card in the form “mm/yy.” The grammar recognizes variations on the date, for example, “dezembro 2005,” “zero quatro zero zero,” “onze barra zero três,” etc. The forward slash symbol can be spoken as “barra,” “de,” or “do.”

Some credit cards are stamped with a day of the month as well as the month and year; the ccexpdate grammar recognizes these dates as well. However, the only day of the month it recognizes is the last day of a given month, e.g., “30 novembro 2005,” “zero dois dois nove zero zero,” etc. The grammar does not check for leap years: both 28 February and 29 February are recognized, regardless of the given year.

Return keys/values

Upon return, the MEANING key is assigned to the recognized date in YYYYMMDD format, where YYYY is the year, MM is the month, and DD is the day. For example, 20050331 refers to 31 March 2005. The value is the same regardless of whether the caller specified a day of the month or not; the day is always set to the last day of the month. For example, both “zero seis zero cinco” and “zero seis três zero zero cinco” return 20050630. Note that if the expiration month is February, MMDD is always 0228, regardless of what the caller said or whether or not the expiration year is a leap year.
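
The day-of-month rule described above can be sketched in a few lines. This is an illustration of the documented behavior, not the grammar's actual implementation; the function name is hypothetical.

```python
import calendar


def ccexpdate_meaning(month: int, year: int) -> str:
    """Build a YYYYMMDD string whose day is the last day of the month.

    February is always reported as the 28th, regardless of leap years,
    as described for the ccexpdate MEANING key.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    if month == 2:
        day = 28  # leap years are deliberately ignored
    else:
        day = calendar.monthrange(year, month)[1]  # number of days in the month
    return f"{year:04d}{month:02d}{day:02d}"
```

An application can use a helper like this to generate expected values when testing its handling of the MEANING key.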

creditcard built-in grammar

The creditcard grammar understands a caller saying a credit card number, optionally preceding the number with the credit card name, or the words “conta número” or “conta.” For example, a caller can say, “visa conta número quatro zero um sete…,” “cartão mastercard cinco zero zero dois…,” or “três sete três cinco….”

currency built-in grammar

The currency grammar collects currency amounts using euro and cêntimo, escudo and centavo, or dólar and cêntimo.

Returned keys/values

MEANING contains a string in the following form: <currency> main_unit_amount . subunit_amount (returned without spaces, for example EUR5.05). If the caller explicitly says “euro,” then EUR is added as a prefix as the value of <currency>. If the caller says “escudo,” then a PTE prefix is added. If the caller says “dólar,” then a USD prefix is added. In all other cases, no prefix is added.
SWI_literal contains the exact text that was recognized.

Examples

Caller says MEANING
5 euros EUR5.00
5 escudos PTE5.00
5 centavos 0.05
5 euros e 5 cêntimos EUR5.05
5 euros 25 cêntimos EUR5.25
5 euros 25 EUR5.25
seiscentos e vinte e cinco mil quatrocentos sessenta e quatro euros EUR625464.00
um dólar zero cêntimo USD1.00
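
The MEANING values in the table can be split into a currency prefix and an amount with a small parser. The sketch below assumes the subunit always has exactly two digits, as in every example above; the function name is hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Optional EUR/PTE/USD prefix, main unit digits, a dot, two subunit digits (assumed).
_CURRENCY_RE = re.compile(r"(EUR|PTE|USD)?(\d+)\.(\d{2})")


def parse_currency_meaning(meaning: str) -> Tuple[Optional[str], Decimal]:
    """Split a currency MEANING string into (prefix or None, amount)."""
    match = _CURRENCY_RE.fullmatch(meaning)
    if match is None:
        raise ValueError(f"unexpected currency value: {meaning!r}")
    prefix, main_unit, subunit = match.groups()
    # Decimal avoids the rounding surprises of binary floating point.
    return prefix, Decimal(f"{main_unit}.{subunit}")
```

A prefix of None corresponds to the case where the caller did not name a main currency unit, as in “5 centavos.”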

date built-in grammar

The date grammar accepts a date spoken in any of several formats.

Recognized phrases include “4 junho,” “4 junho 2006,” “4, 6, 2006,” “o dia quatro,” and “segunda-feira, o quatro de junho.”

Because the grammar does not know the current date, it returns question marks (?) wherever the caller omits information (see the examples and the discussion of return keys for more information).

The grammar also accepts “ontem,” “hoje,” “amanhã,” and “depois de amanhã,” which return values of -1, 0, +1, and +2, respectively, in the MEANING key.

Note that to be understood, at least the day of the month must be present. Phrases like “próxima quarta-feira” and “junho 2006” are not understood.

There is no validity checking on the date, either for day-of-week or days-in-month validity. For example, “segunda-feira, 4 de julho, 2001” is not automatically rejected even though July 4, 2001 is a Wednesday. Similarly, “31 abril” is not rejected even though April has only 30 days.

Examples

Caller says MEANING key
5 Janeiro, 2000 20000105
ontem -1
hoje 0
amanhã +1
depois de amanhã +2
o dia quatro ??????04
quarta-feira (Phrase not recognized)
4 de junho ????0604
4 de junho de 1997 19970604
4 de junho de 97 ??970604
quarta-feira 4 de junho de 1997 19970604
dez doze Not allowed
dez doze noventa e sete Not allowed
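
Because the grammar does not know the current date, the application has to interpret the relative offsets and the question-mark placeholders itself. The following sketch shows one way to do that. The function name is hypothetical, and a partially known year such as ??97 is reported as unknown (None), which is a simplification; century handling is left to the application.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def resolve_date_meaning(
    meaning: str, today: date
) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Return (year, month, day); a field is None if it contains '?'."""
    if meaning in {"-1", "0", "+1", "+2"}:
        # Relative values: ontem, hoje, amanhã, depois de amanhã.
        resolved = today + timedelta(days=int(meaning))
        return resolved.year, resolved.month, resolved.day
    if len(meaning) != 8 or any(ch not in "0123456789?" for ch in meaning):
        raise ValueError(f"unexpected date value: {meaning!r}")

    def field(text: str) -> Optional[int]:
        # A field is only usable when every position is a digit.
        return int(text) if text.isdigit() else None

    return field(meaning[0:4]), field(meaning[4:6]), field(meaning[6:8])
```

Since the grammar performs no validity checking, an application that needs a real calendar date should still validate the result, for example by passing the three fields to datetime.date and handling the ValueError.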

digits built-in grammar

Valid characters are the digits 0-9.

number built-in grammar

The number grammar recognizes whole numbers spoken as natural numbers (the caller must not speak the individual digits). Decimal places are not allowed.

Examples

Numbers from -999,999,999.99 to 999,999,999.99 are recognized. For example:

Caller says MEANING key
vinte e cinco 25
doze mil trezentos e quarenta cinco 12345
menos quatro -4

phone built-in grammar

The phone grammar recognizes 9-digit telephone numbers (landline and cellular).

postcode built-in grammar

The postcode grammar recognizes valid postal codes in Portugal in either four- or seven-digit format:

xxxx-xxx
xxxx

The grammar allows callers to speak natural numbers for the first four digits. For example, callers can say “mil” instead of “um zero zero zero.” However, this introduces possible ambiguities: what is meant if the caller says “mil duzentos trinta_e_quatro”? If the caller is using a 4-digit postal code, the meaning is 1234; but if the caller is using 7 digits, the intent is 1000234. The recognizer handles the ambiguous result using its standard mechanism: it reports both values separated by the pound symbol (1234#1000234).

Return keys/values

Upon return, the key MEANING is assigned to the recognized code, and can contain either 4 or 7 digits.
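
An application that receives an ambiguous result needs to split it into its candidates before confirming with the caller. The sketch below assumes the candidates are separated by the pound symbol as described above, and formats 7-digit codes as xxxx-xxx for display. The function name is hypothetical.

```python
from typing import List


def split_postcode_meaning(meaning: str) -> List[str]:
    """Split a postcode MEANING value into display-ready candidates."""
    formatted = []
    for code in meaning.split("#"):
        if not (code.isascii() and code.isdigit()):
            raise ValueError(f"unexpected postcode value: {code!r}")
        if len(code) == 4:
            formatted.append(code)
        elif len(code) == 7:
            # Seven-digit codes are written as xxxx-xxx.
            formatted.append(f"{code[:4]}-{code[4:]}")
        else:
            raise ValueError(f"postcode must have 4 or 7 digits: {code!r}")
    return formatted
```

With the ambiguous example from this section, the value 1234#1000234 yields the two candidates 1234 and 1000-234.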

Properties

The length parameter is useful for tuning purposes. Initially, applications should accept the default, and then change the value if callers are speaking 4 or 7 digits exclusively. Applications can also set the length to avoid the ambiguous situation described above.

Parameter Description
length Sets the allowed number of digits in the spoken postal code. Possible values: 4, 7, or 4%2B7 (the default). (The %2B is the escape sequence for the plus sign, which is a special character that must be escaped in a URI.)
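
The escaping of the plus sign can be done with a standard URI-quoting routine rather than by hand. The sketch below only shows how the parameter value is encoded; the helper name is hypothetical, and how the parameter is attached to the grammar reference is not shown here.

```python
from urllib.parse import quote


def encode_length_value(value: str) -> str:
    """Percent-encode a length value so that '+' becomes '%2B'."""
    # safe="" makes quote() escape every reserved character, including '+'.
    return quote(value, safe="")


parameter = f"length={encode_length_value('4+7')}"  # "length=4%2B7"
```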

time built-in grammar

The time grammar recognizes a time of day.

The grammar accepts spoken time utterances from the caller. Recognized phrases include times given in 12-hour format (e.g., “5 horas”) and 24-hour format (“vinte e três e quinze”). In addition, it will recognize “qualified” times such as “antes das cinco horas” and “ao redor das cinco” (or “por volta das cinco”).

Examples

For each entry, the values returned in the MEANING and QUALIFIER keys are shown. (Not shown are the values of the HOUR, MINUTE, and AMPM keys.)

Caller says MEANING QUALIFIER
ao meio-dia 1200p exact
à meia-noite 0000? exact
antes do meio-dia 1200p before
depois das treze e trinta 1330h after
vinte e vinte 2020h exact
oito e vinte da manhã 0820a exact
oito e meia 0830? exact
sete e quinze da noite 0715p exact
às vinte e quatro horas 2400 Not allowed unless maxexpected = 2459.
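
The MEANING values above consist of four digits followed by a one-character suffix. The sketch below converts such a value into candidate 24-hour times. The interpretation of the suffixes (a = morning, p = afternoon or evening, h = 24-hour time, ? = not specified) is an assumption inferred from the examples in the table, and the function name is hypothetical. Values without a suffix, such as the 2400 case, are rejected.

```python
import re
from typing import List, Tuple

_TIME_RE = re.compile(r"(\d{2})(\d{2})([aph?])")


def candidate_times(meaning: str) -> List[Tuple[int, int]]:
    """Return the possible (hour, minute) pairs on a 24-hour clock."""
    match = _TIME_RE.fullmatch(meaning)
    if match is None:
        raise ValueError(f"unexpected time value: {meaning!r}")
    hour, minute, suffix = int(match.group(1)), int(match.group(2)), match.group(3)
    if suffix == "h":
        return [(hour, minute)]  # already a 24-hour time
    base = hour % 12  # 12 o'clock maps to 0 before the AM/PM offset is applied
    if suffix == "a":
        return [(base, minute)]
    if suffix == "p":
        return [(base + 12, minute)]
    # "?": the caller did not say which half of the day was meant.
    return [(base, minute), (base + 12, minute)]
```

When two candidates are returned, the application would typically disambiguate from context or ask the caller.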

Vocabulary items and pronunciations

This chapter describes considerations for vocabularies and their pronunciations in Portuguese (pt-PT).

Specially tuned pronunciations

The following table shows common words that are fine-tuned by Nuance. Each of these words contains “word-specific phonemes;” that is, phonemes and associated models created especially for the words.

Words with tuned pronunciations (do not modify):

All letters of the alphabet, a-z
Boolean: sim and não
Digits: 0-9
Cardinal numbers: 0-99, 100, and 1000
Ordinal numbers: 1.-31. (1º through 31º)

Portuguese pronunciations

This section provides detailed reference information to help create pronunciation dictionaries. It is intended for people who have sufficient knowledge of the Portuguese language as spoken in Portugal. It provides information about transcription and pronunciation.

This section explains all the phonemes and their SAMPA symbols used in Portuguese. As reference pronunciation dictionaries, we use:

CASTELEIRO, J. M. (ed.), Dicionário Gramatical de Verbos Portugueses, Lisboa: Texto Editores, 2007 [for verbal inflexion]

Dicionário da Língua Portuguesa Contemporânea da Academia das Ciências, Lisboa: Ed. Verbo, 2001.

Grande Dicionário Língua Portuguesa, Porto: Porto Editora, 2004.

MATEUS, ANDRADE, VIANA, VILLALVA: Fonética, Fonologia e Morfologia do Português, Lisboa: Univ. Aberta, 1990

If you are not sure how a certain word is pronounced, you can refer to the IPA transcriptions given in these dictionaries and then convert them into the SAMPA symbols listed in The Portuguese symbol set in alphabetical order.

The Portuguese phoneme system

The Portuguese phoneme system can be divided into two groups:

  • Consonants
  • Vowels

Furthermore, it is possible to distinguish five different types of Portuguese consonants:

  • Plosives
  • Fricatives
  • Nasals
  • Laterals
  • Trills

Within the vowel group, a further distinction can be made between vowels and semi-vowels.

In addition, the vowel group includes three further categories: the diphthongs, the schwa, and the nasal vowels.

Affricates do not play an important role in contemporary Portuguese, since they only occur in obsolete dialectal variants.

Portuguese symbol set grouped by phoneme classes

The following table is an overview of the phonemes of the Portuguese SAMPA and IPA symbol set, grouped by the phoneme classes to which they belong (according to the manner of their articulation):

SAMPA IPA Examples of use

Consonants: plosives
b b beber
p p picar /pikar/
g g golpear /gol~pjar/
k k casa /kaz6/
d d dar /dar/
t t tirar /tirar/

Consonants: fricatives
v v viver
f f faca /fak6/
z z casa /kaz6/
s s sol /sOl~/
Z ʒ jornal /Zurnal~/
S ʃ chover /Suver/

Consonants: nasals
m m amiga
n n nuvem /nuv6~j/
J ɲ sozinho /sOziJu/

Consonants: laterals
l l lado /ladu/
l~ ɫ mal /mal~/
L ʎ valha /vaL6/

Consonants: trills
r r caro /karu/
R R carro /kaRu/

Vowels: single vowels
i i dividir
e e mês /meS/
E ɛ pé /pE/
a a pá /pa/
6 ɐ para /p6r6/
O ɔ nove /nOv@/
o o novo /novu/
u u tudo /tudu/

Vowels: diphthongs
aj aj pai /paj/
aw aw pau /paw/
6j ɐj falei /f6l6j/
ew ew seu /sew/
Ew ɛw céu /sEw/
Ej ɛj papéis /p6pEjS/
iw iw frio /friw/
Oj ɔj atóis /6tOjS/
oj oj coisa /kojz6/
6~j ɐ~j Belém /b@l6~j/
6~w ɐ~w chão /S6~w/
o~j o~j põe /po~j/
uj uj fui /fuj/
u~j u~j muito /mu~jtu/

Vowels: reduced vowel (schwa)
@ ə de /d@/

Vowels: semi-vowels
j j pai /paj/
w w pau /paw/

Vowels: nasal vowels
i~ i~ fim /fi~/
e~ e~ pente /pe~t@/
6~ ɐ~ irmã /irm6~/
o~ o~ bom /bo~/
u~ u~ mundo /mu~du/

Portuguese consonants

The standard Portuguese consonant system is considered to have:

  • Six plosives
  • Six fricatives
  • Three nasals
  • Three laterals
  • Two trills

The sample words given below demonstrate the different contexts in which the sounds can appear. A short explanation is also given.

Plosives

There are three voiced and three voiceless plosives in Portuguese, which can be arranged in pairs as shown below:

Voiced Voiceless
/b/ baile caber /p/ picar
/g/ gato fogo /k/ casa
/d/ dar mandar /t/ tirar

Fricatives

The six fricatives also can be arranged in pairs:

Voiced Voiceless
/v/ vir viver /f/ faca
/z/ zangar casa /s/ sol
/Z/ jantar ajudar /S/ chover

In general, affricates no longer exist in Portuguese. The two affricates [tS] and [dZ] only occur as rare dialectal variants and are therefore not included in our phoneme inventory.

Nasals

There are three nasals in Portuguese:

/m/ mais cama /majS/ /k6m6/
/n/ no cano /nu/ /k6nu/
/J/ sozinho /sOziJu/

The SAMPA symbol /J/ is used wherever <n> is followed by <h>.

Laterals

There are three laterals in Portuguese:

/l/ lado falar /ladu/ /f6lar/
/L/ olho /oLu/
/l~/ mal sol /mal~/ /sOl~/

The SAMPA lateral /L/ is produced when <l> is followed by <h>.

Trills

Two trills are used in Portuguese. They are pronounced with the tongue tip against the alveolar ridge [r] or with the uvula [R].

/r/ caro fazer /karu/ /f6zer/
/R/ rua carro /Ru6/ /kaRu/

Another trill, [r~], exists in the Portuguese phoneme system. This sound is a dialectal variant that is not often used. It is produced by the tongue tip touching the alveolar ridge several times.

Portuguese vowels

Diphthongs

There are fourteen diphthongs in Portuguese:

/aj/ pai /paj/
/aw/ mau /maw/
/6j/ farei /f6r6j/
/ew/ eu /ew/
/Ew/ céu /sEw/
/Ej/ papéis /p6pEjS/
/iw/ fugiu /fuZiw/
/Oj/ dóis /dOjS/
/oj/ noite foi /nojt@/ /foj/
/6~j/ mamem /m6m6~j/
/6~w/ não /n6~w/
/o~j/ põe /po~j/
/uj/ uiva compossuidor fui /ujv6/ /ko~pusujdor/ /fuj/
/u~j/ muito /mu~jtu/

Reduced vowel schwa

The schwa sound is short and occurs in an unstressed position. Mostly it represents a reduction of a vowel sound.

/@/ escapar meter de /@Sk6par/ /m@ter/ /d@/

Semi-vowels

In the Portuguese SAMPA Phoneme Inventory two semi-vowels can be found, /j/ and /w/. For example:

/j/ pai /paj/
/w/ pau /paw/

Nasal vowels

There are five different nasal vowels in the Portuguese phoneme inventory. Examples:

/i~/ fim /fi~/
/e~/ pente /pe~t@/
/6~/ irmã /irm6~/
/o~/ bom /bo~/
/u~/ mundo /mu~du/

Specific pronunciation transcription methods

Initial <h>

The initial <h> should always be ignored in transcription as it is not pronounced in Portuguese. For example:

hotel /OtEl~/
hora /Or6/

Two identical vowels adjacent to each other

When the vowel <e> is followed by another <e>, the first <e> is transcribed as /j/. For example:

preencher /prje~Ser/

Differences between fricatives and plosives

/g/ versus /Z/

Portuguese realizes <g> as the plosive /g/ when it is followed by the vowels /u/, /a/, or /o/, and as the fricative /Z/ when it is followed by the vowels /e/ or /i/.

When <g> is followed by <u> and then <e> or <i>, the <u> is not pronounced and <g> is realized as /g/. For example:

golpear /gol~pjar/
guloso /gulozu/
garagem /g6raZ6~j/

But:

girar /Zirar/
gemer /Z@mer/

u-combination:

guiar /gjar/
guerra /gER6/

/k/ versus /s/

The Portuguese <c> can likewise be realized as either a plosive or a fricative. As with <g> above, the realization depends on the vowel that follows: <c> is realized as /k/ before /a/, /o/, and /u/, and as /s/ before /e/ and /i/. For example:

casa /kaz6/
coisa /kojz6/
cúmulo /kumulu/

But:

cego /sEgu/
cigarro /sigaRu/

There is no corresponding combination of <c> with <u>. To realize a /k/ in front of /e/ and /i/, the combination <qu> is used. For example:

querer /k@rer/
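
The rules for <g>, <c>, and <qu> can be summarized as a small lookup on the following letters. This is a minimal sketch based only on the rules and examples in this section; the function name is hypothetical. It does not cover <ch>, <ç>, or <qu> before other vowels, and it returns how many letters the phoneme consumes so that the silent <u> can be skipped.

```python
from typing import Tuple

# Letters that spell the vowels /e/ and /i/, including accented forms.
FRONT_VOWELS = set("eiéêí")


def realize_onset(word: str, i: int = 0) -> Tuple[str, int]:
    """Return (SAMPA phoneme, letters consumed) for <g>, <c>, or <qu> at index i."""
    word = word.lower()
    letter = word[i]
    nxt = word[i + 1] if i + 1 < len(word) else ""
    after = word[i + 2] if i + 2 < len(word) else ""
    if letter == "g":
        if nxt in FRONT_VOWELS:
            return "Z", 1  # girar, gemer
        if nxt == "u" and after in FRONT_VOWELS:
            return "g", 2  # guerra, guiar: the <u> is not pronounced
        return "g", 1  # golpear, guloso, garagem
    if letter == "c":
        return ("s", 1) if nxt in FRONT_VOWELS else ("k", 1)
    if letter == "q" and nxt == "u" and after in FRONT_VOWELS:
        return "k", 2  # querer
    raise ValueError(f"no rule for {word[i:i + 3]!r} in this sketch")
```

For instance, realize_onset("guerra") returns ("g", 2), while realize_onset("garagem", 4) returns ("Z", 1) for the second <g>.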

Transcription of the fricatives /s/ and /z/

The voiceless fricative /s/ occurs in word-initial position before vowels. It also occurs in onset position after a nasal vowel, a lateral consonant, or a trill. Before a voiceless consonant and in word-final position, <s> is transcribed as /S/. For /S/ to occur at the start of a word, the <s> must be preceded by an <e>. Before voiced consonants, <s> is pronounced /Z/. For example:

sol /sOl~/
tenso /te~su/
falso /fal~su/
urso /ursu/
estar /@Star/
mês /meS/
mesmo /meZmu/
desde /deZd@/

The voiced fricative /z/ occurs before vowels. For example:

casa /kaz6/
zangar /z6~gar/
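
The contexts described in this section can be expressed as a decision on the letters before and after <s>. The following is a rough sketch that works on spelling only; the function name and the letter classes are assumptions for illustration, and cases not discussed here (such as <ss>) are not handled.

```python
VOWEL_LETTERS = set("aeiouáâãàéêíóôõú")
VOICED_CONSONANT_LETTERS = set("bdgvzjlmnr")


def realize_s(word: str, i: int) -> str:
    """Return the SAMPA symbol for the letter <s> at index i of word."""
    word = word.lower()
    if word[i] != "s":
        raise ValueError(f"letter at index {i} is not <s>")
    prev = word[i - 1] if i > 0 else ""
    nxt = word[i + 1] if i + 1 < len(word) else ""
    if nxt == "":
        return "S"  # word-final: mês
    if nxt in VOWEL_LETTERS:
        # Between two vowels <s> is voiced (casa); otherwise it is /s/
        # (word-initial, or after a nasal, lateral, or trill).
        return "z" if prev in VOWEL_LETTERS else "s"
    if nxt in VOICED_CONSONANT_LETTERS:
        return "Z"  # mesmo, desde
    return "S"  # before a voiceless consonant: estar
```

Applied to the examples above, the sketch returns /s/ for sol, tenso, falso, and urso, /S/ for estar and mês, /Z/ for mesmo and desde, and /z/ for casa.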

Transcription of the trills /r/ and /R/

The trill /r/ appears in the middle of a word, either between two vowels or between a vowel and a consonant. It also occurs at the end of a word. For example:

zero /zEru/
jornal /Zurnal~/
tirar /tirar/

The trill /R/ appears in the middle of a word as <rr> between two vowels or as initial <r>. For example:

rua /Ru6/
carro /kaRu/

Transcription of the nasal /J/

The combination of <n> and <h> always produces the sound /J/. For example:

vinho /viJu/
cozinha /kuziJ6/

Transcription of the lateral /L/

The combination of <l> and <h> always produces the sound /L/. For example:

olho /oLu/
alho /aLu/

Elimination

In Portuguese the consonant <c> and the consonant <p> can be eliminated when they are followed by the plosive /t/. For example:

óptico /Otiku/ or /Optiku/
sector /sEtor/ or /sEktor/

Pronunciation of foreign words

When you need to transcribe foreign words, the general rule is to use the same SAMPA symbol set as for the rest of the dictionary. For a Portuguese dictionary, this means transcribing every word with the Portuguese SAMPA symbols.

If you use a different symbol set your system will be incapable of understanding the input.

Every language has a different phoneme inventory, so you may have problems in covering each and every sound. For the most common cases we offer transcription examples.

French nasals

Try to apply a pronunciation that has been adapted to Portuguese, for example:

bâton /bato~/

The original transcription ‘batO~’ cannot be realized because the French symbol ‘O~’ is not a part of the Portuguese SAMPA symbol set.

English words

Even with English words you have to try to apply a pronunciation that has been adapted to Portuguese, for example:

camping /kEmpiJ/

The English phonemes ‘{’ and ‘N’ in the original transcription ‘k{mpIN’ are not elements of the Portuguese SAMPA symbol set.
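
Before adding a foreign word to a dictionary, it can be useful to check that its transcription uses only Portuguese SAMPA symbols. The sketch below does this with the 36 symbols listed in this chapter. The function name is hypothetical, and the only tokenization rule assumed is that a tilde always belongs to the character before it.

```python
from typing import List

# The Portuguese SAMPA symbols listed in this chapter.
PT_SAMPA = {
    "6", "6~", "@", "a", "b", "d", "E", "e", "e~", "f", "g", "i", "i~",
    "J", "j", "k", "L", "l", "l~", "m", "n", "O", "o", "o~", "p", "R",
    "r", "S", "s", "t", "u", "u~", "v", "w", "Z", "z",
}


def unknown_symbols(transcription: str) -> List[str]:
    """Return the symbols in a transcription that are not Portuguese SAMPA."""
    unknown = []
    i = 0
    while i < len(transcription):
        # A following tilde is part of the symbol (for example "o~" or "l~").
        if transcription[i + 1:i + 2] == "~":
            symbol = transcription[i:i + 2]
        else:
            symbol = transcription[i]
        if symbol not in PT_SAMPA:
            unknown.append(symbol)
        i += len(symbol)
    return unknown
```

The adapted transcriptions bato~ and kEmpiJ pass this check, while the original transcriptions batO~ and k{mpIN are flagged.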

The Portuguese symbol set in alphabetical order

The following table shows the Portuguese symbol set in alphabetical order:

SAMPA IPA Examples of Usage
6 ɐ para
6~ ɐ~ irmã
@ ə de
a a pá
b b beber
d d dar
E ɛ pé
e e mês
e~ e~ mente
f f fogo
g g gato
i i dividir
i~ i~ fim
J ɲ sozinho
j j pai
k k carro
L ʎ valha
l l lado
l~ ɫ mal
m m amiga
n n no
O ɔ dorme
o o novo
o~ o~ onde
p p por
R R carro
r r dor
S ʃ chover
s s sal
t t tanto
u u Portugal
u~ u~ um
v v ver
w w água
Z ʒ jornal
z z zona
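
When comparing a SAMPA transcription against the IPA given in a reference dictionary, a simple converter built from the table above can help. This is only a sketch: the function name is hypothetical, nasal vowels are rendered with the combining tilde (U+0303), and /R/ is rendered as the IPA uvular trill ʀ, both of which are assumptions about the IPA intended by the table.

```python
TILDE = "\u0303"  # combining tilde, used here for nasal vowels (assumption)

SAMPA_TO_IPA = {
    "6": "ɐ", "6~": "ɐ" + TILDE, "@": "ə", "a": "a", "b": "b", "d": "d",
    "E": "ɛ", "e": "e", "e~": "e" + TILDE, "f": "f", "g": "g", "i": "i",
    "i~": "i" + TILDE, "J": "ɲ", "j": "j", "k": "k", "L": "ʎ", "l": "l",
    "l~": "ɫ", "m": "m", "n": "n", "O": "ɔ", "o": "o", "o~": "o" + TILDE,
    "p": "p", "R": "ʀ", "r": "r", "S": "ʃ", "s": "s", "t": "t", "u": "u",
    "u~": "u" + TILDE, "v": "v", "w": "w", "Z": "ʒ", "z": "z",
}


def sampa_to_ipa(transcription: str) -> str:
    """Convert a Portuguese SAMPA transcription (without slashes) to IPA."""
    result = []
    i = 0
    while i < len(transcription):
        # A tilde always attaches to the preceding character.
        if transcription[i + 1:i + 2] == "~":
            symbol = transcription[i:i + 2]
        else:
            symbol = transcription[i]
        if symbol not in SAMPA_TO_IPA:
            raise ValueError(f"not a Portuguese SAMPA symbol: {symbol!r}")
        result.append(SAMPA_TO_IPA[symbol])
        i += len(symbol)
    return "".join(result)
```

Diphthongs need no entries of their own, because each one is written as a vowel symbol followed by /j/ or /w/.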