Spanish Catalan (ca-ES)
This documentation was updated on October 31, 2023.
Creating grammars
The following subsections describe key issues for working with grammar documents in the Catalan language.
Character encoding
Nuance Recognizer has full internal Unicode support. You can create your grammars using, for example, UTF-8 or Latin-1 (also known as ISO-8859-1) character encoding. Your grammar header might be:
<?xml version='1.0' encoding='UTF-8'?>
<grammar xml:lang="ca-ES" version="1.0" root="test">
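The encoding named in the header has to match the bytes actually stored in the grammar file. The following Python sketch illustrates this. It is not part of Nuance Recognizer: the helper names `declared_encoding` and `decode_grammar` are hypothetical, and the sketch assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1.

```python
import re

def declared_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or a default."""
    # The XML declaration only contains ASCII characters, so it can be
    # inspected before the real encoding of the file is known.
    match = re.match(rb"<\?xml[^>]*encoding=['\"]([A-Za-z0-9._-]+)['\"]", raw)
    return match.group(1).decode("ascii").lower() if match else default

def decode_grammar(raw: bytes) -> str:
    """Decode grammar bytes using the encoding the header declares."""
    return raw.decode(declared_encoding(raw))

# The same Catalan text stored in two encodings (illustrative content only).
body = "<!-- sí, això és català -->"
utf8_doc = ("<?xml version='1.0' encoding='UTF-8'?>\n" + body).encode("utf-8")
latin1_doc = ("<?xml version='1.0' encoding='ISO-8859-1'?>\n" + body).encode("iso-8859-1")

print(decode_grammar(utf8_doc).endswith(body))    # True
print(decode_grammar(latin1_doc).endswith(body))  # True
```

The two byte strings differ, but each decodes to the same text when the declared encoding is honored.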
Below are codes for writing some common Catalan characters. These are useful if you do not have access to a Catalan keyboard, and are typed by pressing the Alt key while entering digits on your keyboard (after typing the last digit, the desired character appears on your screen when you release the Alt key):
Alt code | Character | Alt code | Character |
---|---|---|---|
Alt/0224 | à | Alt/0237 | í |
Alt/0225 | á | Alt/0239 | ï |
Alt/0231 | ç | Alt/0242 | ò |
Alt/0232 | è | Alt/0243 | ó |
Alt/0233 | é | Alt/0249 | ù |
Alt/0250 | ú | | |
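The numbers in these Alt codes are the decimal ISO-8859-1 (Latin-1) values of the characters. As a small check, the following Python sketch decodes each value as a single Latin-1 byte. The helper name `alt_code_to_char` is hypothetical and is used for illustration only.

```python
# Decimal values from the Alt codes listed above (typed after a leading 0).
ALT_CODES = [224, 225, 231, 232, 233, 237, 239, 242, 243, 249, 250]

def alt_code_to_char(code: int) -> str:
    """Decode a single Alt+0nnn value as an ISO-8859-1 (Latin-1) byte."""
    return bytes([code]).decode("iso-8859-1")

for code in ALT_CODES:
    print(f"Alt/0{code} = {alt_code_to_char(code)}")
```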
If your keyboard does not match your target language on Windows, add the appropriate keyboard: open the “Control Panel”, click “Regional and Language”, and select “Keyboards and languages”.
alphanum_lc built-in grammar
The alphanum_lc grammar recognizes a connected string of up to 20 digits and lower case alphabetic characters.
For example, this grammar could be used to recognize a product code or order number.
Characters are the letters a-z, and à, á, ç, è, é, í, ï, ò, ó, ù, and ú.
Digits are 0-9.
Non-alphanumeric characters such as hyphens (-), dots (.), and underscores (_) are not recognized; if spoken they reduce recognition accuracy.
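An application may want to check values against the same constraints, for example before comparing a recognized code with a database entry. The following Python sketch is one way to do this. The function name `is_valid_alphanum_lc` is hypothetical, and the minimum length of one character is an assumption (the description above only states the maximum of 20).

```python
import re

# Letters listed for the Catalan alphanum_lc grammar, plus the digits 0-9.
ALLOWED = "a-zàáçèéíïòóùú0-9"
ALPHANUM_LC = re.compile(f"[{ALLOWED}]{{1,20}}")

def is_valid_alphanum_lc(value: str) -> bool:
    """True if value has 1-20 characters, all lowercase letters or digits."""
    return ALPHANUM_LC.fullmatch(value) is not None

print(is_valid_alphanum_lc("ab12ç"))   # True
print(is_valid_alphanum_lc("ab-12"))   # False: hyphens are not recognized
```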
Returned keys/values
Key | Description |
---|---|
MEANING | Contains a string of ISO-8859-1 digits and lowercase letters, with no embedded spaces. |
SWI_literal | Contains the exact text that was recognized. |
Note: the alphanum_lc built-in grammar replaces the alphanum built-in grammar.
alphanum built-in grammar
The alphanum grammar recognizes a connected string of up to 20 digits and alphabetic characters. For example, this grammar could be used to recognize a product code or order number.
Characters are the letters a-z, and à, á, ç, è, é, í, ï, ò, ó, ù, and ú.
Digits are 0-9.
Non-alphanumeric characters such as hyphens (-), dots (.), and underscores (_) are not recognized; if spoken they reduce recognition accuracy.
Returned keys/values
Key | Description |
---|---|
MEANING | Contains a string of ISO-8859-1 digits and lowercase letters, with no embedded spaces. |
SWI_literal | Contains the exact text that was recognized. |
boolean built-in grammar
The boolean grammar collects an affirmative or negative response.
Properties
The y and n parameters let you associate any two touchtone buttons as synonyms for yes and no.
Parameter | Description |
---|---|
y | Desired DTMF digit to be equivalent to “sí” (default = 1) |
n | Desired DTMF digit to be equivalent to “no” (default = 2) |
Examples
Caller says | MEANING key |
---|---|
sí | true |
no | false |
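The mapping described above can be sketched in a few lines of Python. This only mirrors the documented behavior and is not how the recognizer is implemented: the function name `boolean_meaning` is hypothetical, and the sketch covers only the two words and two DTMF digits shown in the tables.

```python
def boolean_meaning(user_input: str, y: str = "1", n: str = "2"):
    """Map a spoken or DTMF response to the MEANING value, or None."""
    token = user_input.strip().lower()
    if token in ("sí", y):
        return "true"
    if token in ("no", n):
        return "false"
    return None  # input not covered by this sketch

print(boolean_meaning("sí"))                # true
print(boolean_meaning("2"))                 # false
print(boolean_meaning("9", y="9", n="7"))   # true
```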
digits built-in grammar
Valid characters are the digits 0-9.
Vocabulary items and pronunciations
This chapter describes considerations for vocabularies and their pronunciations in Catalan (ca-ES). Your product documentation covers details about how to work with pronunciations and dictionaries.
Specially tuned pronunciations
The following table shows common words that are fine-tuned by Nuance. Each of these words contains “word-specific phonemes;” that is, phonemes and associated models created especially for the words.
Words with tuned pronunciations (do not modify):
All letters of the alphabet, a-z, à, á, ç, è, é, í, ï, ò, ó, ù, and ú.
Boolean: sí and no
Digits: 0-9
Catalan pronunciations
This section provides detailed reference information to help create pronunciation dictionaries. It is intended for people who have sufficient knowledge of the Catalan language as spoken in Spain. It provides information about transcription and pronunciation.
As a reference pronunciation dictionary we use:
> Langenscheidts Universalwörterbuch Katalanisch, Katalanisch-Deutsch/Deutsch-Katalanisch, edition Langenscheidt, Berlin 2000
If you are not sure how a certain word is pronounced, you can refer to the IPA transcriptions given there and then convert them into the SAMPA symbols given in The Catalan symbol set in alphabetical order.
The Catalan phoneme system
The Catalan phoneme system can be divided into two groups:
- Vowels
- Consonants
It is possible to distinguish seven different types of Catalan consonants:
- Plosives
- Fricatives
- Affricates
- Nasals
- Laterals
- Trills
- Xenophones
Catalan symbol set grouped by phoneme classes
The following table shows all phonemes used in Catalan transcriptions. They are listed according to their phoneme classes with their SAMPA and IPA representations.
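Phoneme class | Subclass | SAMPA | IPA | Example | Transcription |
---|---|---|---|---|---|
Consonants | Plosives | b | b | bodega | |
Consonants | Plosives | p | p | pellícula | /p@likul@/ |
Consonants | Plosives | d | d | diploma | /diplom@/ |
Consonants | Plosives | t | t | teatre | /teatr@/ |
Consonants | Plosives | g | g | garantia | /g@r@nti@/ |
Consonants | Plosives | k | k | informàtica | /imfurrmatik@/ |
Consonants | Fricatives | f | f | micròfon | |
Consonants | Fricatives | s | s | observar | /ups@rba/ |
Consonants | Fricatives | S | ʃ | peix | /peS/ |
Consonants | Fricatives | Z | ʒ | pellroja | /peLrrOZ@/ |
Consonants | Fricatives | j | j | aires | /ajr@s/ |
Consonants | Fricatives | z | z | atletisme | /@ll@tizm@/ |
Consonants | Affricates | dZ | ʤ | garatge | /g@radZ@/ |
Consonants | Nasals | m | m | humana | |
Consonants | Nasals | N | ŋ | llengua | /LeNgw@/ |
Consonants | Nasals | J | ɲ | acompanyar | /@kump@Ja/ |
Consonants | Nasals | n | n | ajuntament | /@Zunt@men/ |
Consonants | Laterals | l | l | blanca | |
Consonants | Laterals | L | ʎ | caballero | /kabaLero/ |
Consonants | Trills | r | r | cabaret | /k@b@rEt/ |
Consonants | Trills | rr | rr | caràcter | /k@rakt@rr/ |
Vowels | Single vowels | a | a | casado | |
Vowels | Single vowels | e | e | doble | /doble/ |
Vowels | Single vowels | E | ɛ | dèbil | /dEbil/ |
Vowels | Single vowels | i | i | edicions | /@disions/ |
Vowels | Single vowels | o | o | editor | /@dito/ |
Vowels | Single vowels | O | ɔ | escola | /@skOl@/ |
Vowels | Single vowels | u | u | espero | /@speru/ |
Vowels | Semivowels | j | j | fruita | |
Vowels | Semivowels | w | w | guardar | /gw@rda/ |
Vowels | Semivowels | @ | ə | gaita | /gajt@/ |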
Catalan consonants
The standard Catalan consonant system is considered to have:
- six plosives
- six fricatives
- one affricate
- four nasals
- two laterals
- two trills
- two xenophones
The sample words given below demonstrate the different contexts in which the sounds can appear. A short explanation is also given.
Plosives
There are three voiced and three voiceless plosives in Catalan, which can be arranged in pairs as shown here:
Voiced | Examples | Transcriptions | Voiceless | Examples | Transcriptions |
---|---|---|---|---|---|
/b/ | bellesa setembre | /b@LEz@/ /s@tembr@/ | /p/ | palma aplicar verb | /palm@/ /@plika/ /bErrp/ |
/d/ | declarar setze | /d@kl@ra/ /sEdz@/ | /t/ | teatre canta cabaret | /teatr@/ /kant@/ /k@b@rEt/ |
/g/ | gaita agost | /gajt@/ /@gost/ | /k/ | kilo descansar despòtic | /kilu/ /d@sk@nsa/ /d@spOtik/ |
Fricatives
There are six fricatives in the Catalan SAMPA symbol set, three voiced and three voiceless:
Voiced | Examples | Transcriptions | Voiceless | Examples | Transcriptions |
---|---|---|---|---|---|
/z/ | zero reserva | /zEru/ /rr@zerb@/ | /f/ | fabricar cafè fotògraf | /f@brika/ /k@fE/ /futOgr@f/ |
/Z/ | genial urgent | /Z@nial/ /urrZen/ | /s/ | saber caríssima sentiments | /s@bE/ /k@risim@/ /s@ntimens/ |
/j/ | ioga paisatge bonsai | /jOg@/ /p@jzadZ@/ /bunsaj/ | /S/ | caixa cruz | /kaS@/ /kruS/ |
Affricates
In Catalan there is one affricate, /dZ/.
SAMPA | Example | Transcription |
---|---|---|
/dZ/ | fotomuntatge | /futumuntadZ@/ |
Nasals
There are four nasals in Catalan, /m/, /n/, /N/, and /J/.
SAMPA | Examples | Transcriptions |
---|---|---|
/m/ | mes habitualment quelcom | /mes/ /@bitualmen/ /k@lkOm/ |
/n/ | nata organitzar pagament | /nat@/ /urg@nidza/ /p@g@men/ |
/N/ | significa cinc | /siNnifik@/ /siN/ |
/J/ | acompanyar any | /@kump@Ja/ /aJ/ |
Laterals
There are two laterals in Catalan, /l/ and /L/.
SAMPA | Examples | Transcriptions |
---|---|---|
/l/ | lògica diabòlic mal | /lOZik@/ /di@bOlik/ /mal/ |
/L/ | lluna mallorca castell | /Lun@/ /m@LOrrk@/ /k@steL/ |
Trills
There are two trills in Catalan, /r/ and /rr/.
SAMPA | Examples | Transcriptions |
---|---|---|
/r/ | compra | /kompr@/ |
/rr/ | racista sorprendre valor | /rr@sist@/ /surrpEndr@/ /b@lorr/ |
Catalan vowels
Single vowels (monophthongs)
The Catalan language has seven distinguishable monophthongs:
SAMPA | Examples | Transcriptions |
---|---|---|
/a/ | cada caminar | /kad@/ /k@mina/ |
/e/ | ella gomera porter | /eL@/ /gumer@/ /purrte/ |
/E/ | època aquella què | /Epuk@/ /@kEL@/ /kE/ |
/i/ | hipnòtic idea imaginar introduir | /ibnOtik/ /ide@/ /im@Zina/ /intrudui/ |
/o/ | onze operadora operació | /onz@/ /up@r@dor@/ /up@r@sio/ |
/O/ | òptica amazònica | /Optik@/ /@m@zOnik@/ |
/u/ | homòfon important menú | /umOfun/ /impurrtan/ /m@nu/ |
Semi-vowels
There are three semi-vowels in Catalan:
SAMPA | Examples | Transcriptions |
---|---|---|
/j/ | ioga vizcaya massai | /jOg@/ /biSkaja/ /m@saj/ |
/w/ | washington ambigua europeu | /waSinton/ /@mbigw@/ /@wrupEw/ |
/@/ | evitar exagerat fantasia | /@bita/ /@gz@Z@rat/ /f@nt@zi@/ |
Specific pronunciation transcription methods
The grapheme <h>
There is no phonetic realization of the grapheme <h>. For example:
Word | Transcription |
---|---|
hotel | /utEl/ |
Transcription of the fricatives /s/ and /z/
The voiceless fricative /s/ occurs before vowels, voiceless consonants and at the end of a word. For example:
Word | Transcription |
---|---|
sala | /sal@/ |
absorbit | /@psurbit/ |
articles | /@rrtikl@s/ |
The voiced fricative /z/ occurs before voiced consonants or between two vowels. For example:
Word | Transcription |
---|---|
anglicismes | /@Nglisizm@s/ |
bellesa | /b@LEz@/ |
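The two context rules above can be written as a small decision function. The following Python sketch is an illustration only: the function name `sibilant` is hypothetical, and the set of voiced consonants is an assumption based on the symbol table in this document. It works on neighbouring sounds, so it does not cover spellings such as <ss> or <ç>, which are transcribed /s/ even between vowels (for example, caríssima /k@risim@/).

```python
VOWELS = {"a", "e", "E", "i", "o", "O", "u", "@"}
# Assumption: voiced plosives, fricatives, the affricate, nasals,
# laterals, and trills from the Catalan symbol set.
VOICED_CONSONANTS = {"b", "d", "g", "z", "Z", "j", "dZ",
                     "m", "n", "N", "J", "l", "L", "r", "rr"}

def sibilant(prev, nxt):
    """Pick /s/ or /z/ from the neighbouring SAMPA symbols (None = word edge)."""
    if nxt in VOICED_CONSONANTS:
        return "z"   # before a voiced consonant
    if prev in VOWELS and nxt in VOWELS:
        return "z"   # between two vowels
    return "s"       # before a vowel, before a voiceless consonant, or word-final

print(sibilant(None, "a"))  # s, as in sala /sal@/
print(sibilant("i", "m"))   # z, as in anglicismes /@Nglisizm@s/
print(sibilant("E", "@"))   # z, as in bellesa /b@LEz@/
```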
Transcription of the trills /r/ and /rr/
The trill /r/ appears in the middle of a word between two vowels and between a vowel and a consonant other than <n>, <l>, or <s>. For example:
Word | Transcription |
---|---|
bolero | /buleru/ |
celebrar | /s@l@bra/ |
The trill /rr/ appears at the beginning and at the end of a word. In the middle of a word, its position is between two vowels or between a vowel and a consonant other than <n>, <l>, or <s>. For example:
Word | Transcription |
---|---|
realitat | /rre@litat/ |
familiar | /f@miliarr/ |
catorze | /k@torrz@/ |
Transcription of the nasals /J/ and /N/
The grapheme <ny> is always represented by the SAMPA symbol /J/.
Word | Transcription |
---|---|
companyia | /kump@Ji@/ |
desengany | /d@z@NgaJ/ |
The nasal /N/ appears before the phonemes /g/ and /k/ and in word-final position. For example:
Word | Transcription |
---|---|
lingüista | /liNgwist@/ |
banc | /baN/ |
Transcription of the affricate /dZ/
/dZ/ usually appears between two vowels and is represented orthographically as <tg> or <tj>.
Word | Transcription |
---|---|
paisatge | /p@jzadZ@/ |
lletjos | /LedZus/ |
Pronunciation of foreign words
When you need to transcribe foreign words, the general rule is to use the same SAMPA symbol set as for the rest of the dictionary. For a Catalan dictionary, every word must be transcribed with the Catalan SAMPA symbols.
If you use a different symbol set, your system will be unable to understand the input.
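A dictionary entry can be checked against the Catalan symbol set before it is deployed. The following Python sketch splits a transcription into symbols and rejects any character that is not in the set. The function name `tokenize_sampa` is hypothetical and this is not a Nuance tool. The sketch always reads "dZ" and "rr" as single symbols.

```python
import re

# SAMPA symbols from the Catalan symbol set; two-character symbols come
# first so that "dZ" and "rr" are matched before "d" and "r".
SAMPA_SYMBOLS = ["dZ", "rr", "@", "a", "b", "d", "e", "E", "f", "g", "i",
                 "j", "J", "k", "l", "L", "m", "n", "N", "o", "O", "p",
                 "r", "s", "S", "t", "u", "w", "z", "Z"]
_TOKEN = re.compile("|".join(re.escape(s) for s in SAMPA_SYMBOLS))

def tokenize_sampa(transcription: str) -> list[str]:
    """Split a transcription into SAMPA symbols; reject anything else."""
    text = transcription.strip("/")
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"{text[pos]!r} is not a Catalan SAMPA symbol")
        tokens.append(match.group())
        pos = match.end()
    return tokens

print(tokenize_sampa("/k@torrz@/"))  # ['k', '@', 't', 'o', 'rr', 'z', '@']
```

A transcription that uses a symbol from another language's set, such as "T", raises a `ValueError` instead of being silently accepted.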
Every language has a different phoneme inventory, so you may have problems covering each and every sound. For the most common cases we offer transcription examples.
The French nasals, for example, are adapted to the Catalan vowel system:
Word | Transcription |
---|---|
blanc | /blaN/ |
centre | /sentr@/ |
In general, foreign words are adapted to Catalan phonetics; sometimes the orthography is changed as well.
Word | Transcription |
---|---|
garatge | /g@radZ@/ |
communications | /komunikEjSons/ |
hannover | /@nOb@rr/ |
lletjos | /LedZus/ |
Multiple pronunciations (variants)
The type of pronunciation used in SAMPA and in the Catalan Background dictionary conforms to the standard non-regional Catalan pronunciation. Since it is possible to have more than one pronunciation for a word by using pronunciation variants, it may be difficult to determine how many pronunciation variants should be created.
The general rule is: create variants only if the pronunciation differs in more than one phoneme.
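One way to apply this rule is to count the differences between two transcriptions in phonemes rather than in characters. The following Python sketch does this with an edit distance. Reading "differs in more than one phoneme" as an edit distance greater than one is an assumption, and the function names are hypothetical.

```python
import re

def phonemes(transcription: str) -> list[str]:
    """Split a SAMPA string, keeping "dZ" and "rr" as single symbols."""
    return re.findall(r"dZ|rr|.", transcription.strip("/"))

def phoneme_distance(a: str, b: str) -> int:
    """Levenshtein distance counted in phonemes rather than characters."""
    x, y = phonemes(a), phonemes(b)
    prev = list(range(len(y) + 1))
    for i, px in enumerate(x, start=1):
        curr = [i]
        for j, py in enumerate(y, start=1):
            cost = 0 if px == py else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def needs_variant(a: str, b: str) -> bool:
    """True if the two pronunciations differ in more than one phoneme."""
    return phoneme_distance(a, b) > 1
```

For instance, /b@lorr/ (valor) and a hypothetical alternative /b@lo/ differ in a single phoneme, so under this reading no variant would be added.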
The Catalan symbol set in alphabetical order
The following table shows the Catalan symbol set in alphabetical order:
SAMPA | IPA | Examples of use |
---|---|---|
@ | ə | gaita |
a | a | casado |
b | b | bodega |
d | d | diploma |
dZ | ʤ | garatge |
e | e | doble |
E | ɛ | dèbil |
f | f | micròfon |
g | g | garantia |
i | i | edicions |
j | j | aires |
j | j | fruita |
J | ɲ | acompanyar |
k | k | informàtica |
l | l | blanca |
L | ʎ | caballero |
m | m | humana |
n | n | ajuntament |
N | ŋ | llengua |
o | o | editor |
O | ɔ | escola |
p | p | pellícula |
r | r | cabaret |
rr | rr | caràcter |
s | s | observar |
S | ʃ | peix |
t | t | teatre |
u | u | espero |
w | w | guardar |
z | z | atletisme |
Z | ʒ | pellroja |
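The table above can be used as a lookup when converting IPA transcriptions from a reference dictionary into SAMPA. The following Python sketch shows the idea. The function name `ipa_to_sampa` is hypothetical, and the sketch only knows the IPA symbols listed in this table, so stress marks or other IPA symbols raise an error and have to be handled by hand.

```python
# IPA -> SAMPA pairs copied from the alphabetical symbol table.
IPA_TO_SAMPA = {
    "ə": "@", "a": "a", "b": "b", "d": "d", "ʤ": "dZ", "e": "e",
    "ɛ": "E", "f": "f", "g": "g", "i": "i", "j": "j", "ɲ": "J",
    "k": "k", "l": "l", "ʎ": "L", "m": "m", "n": "n", "ŋ": "N",
    "o": "o", "ɔ": "O", "p": "p", "r": "r", "rr": "rr", "s": "s",
    "ʃ": "S", "t": "t", "u": "u", "w": "w", "z": "z", "ʒ": "Z",
}

def ipa_to_sampa(ipa: str) -> str:
    """Convert an IPA string to SAMPA, trying the longest symbol first."""
    symbols = sorted(IPA_TO_SAMPA, key=len, reverse=True)
    out, i = [], 0
    while i < len(ipa):
        for sym in symbols:
            if ipa.startswith(sym, i):
                out.append(IPA_TO_SAMPA[sym])
                i += len(sym)
                break
        else:
            raise ValueError(f"no SAMPA symbol for IPA {ipa[i]!r}")
    return "".join(out)

print(ipa_to_sampa("əskɔlə"))  # @skOl@
```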