Spanish Catalan (ca-ES)

This documentation was updated on October 31, 2023.

Creating grammars

The following subsections describe key issues for working with grammar documents in the Catalan language.

Character encoding

Nuance Recognizer has full internal Unicode support, so you can create your grammars using UTF-8 or Latin-1 (also known as ISO-8859-1) character encoding. For example, your grammar header might be:

<?xml version='1.0' encoding='UTF-8'?>
<grammar xml:lang="ca-ES" version="1.0" root="test">
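
As a rough illustration of why the declared encoding has to match the way the grammar file was actually saved, the following Python sketch encodes the same Catalan word both ways. The sample word is arbitrary and nothing in the sketch is specific to the Recognizer.

```python
# The same Catalan text produces different bytes in each encoding,
# so the encoding named in the XML header must match the saved file.
word = "cafè"

utf8_bytes = word.encode("utf-8")      # è becomes two bytes: C3 A8
latin1_bytes = word.encode("latin-1")  # è becomes one byte: E8

print(utf8_bytes)    # b'caf\xc3\xa8'
print(latin1_bytes)  # b'caf\xe8'

# Decoding with the wrong encoding garbles the accented character.
print(utf8_bytes.decode("latin-1"))  # cafÃ¨
```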

Below are codes for typing some common Catalan characters. They are useful if you do not have access to a Catalan keyboard. Hold down the Alt key while entering the digits on the numeric keypad; the character appears on your screen when you release the Alt key after the last digit:

Alt/0224 = à Alt/0237 = í
Alt/0225 = á Alt/0239 = ï
Alt/0231 = ç Alt/0242 = ò
Alt/0232 = è Alt/0243 = ó
Alt/0233 = é Alt/0249 = ù
Alt/0250 = ú
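
The digits in these Alt codes are the decimal code points of the characters, which are the same in Latin-1 and Unicode for this range. A minimal Python sketch that checks the table above (the dictionary name is only illustrative):

```python
# Alt codes from the table above, as decimal code -> expected character.
ALT_CODES = {
    224: "à", 225: "á", 231: "ç", 232: "è", 233: "é", 237: "í",
    239: "ï", 242: "ò", 243: "ó", 249: "ù", 250: "ú",
}

for code, expected in ALT_CODES.items():
    # chr() maps a code point to its character.
    assert chr(code) == expected
    # The same value is also the single Latin-1 byte for that character.
    assert bytes([code]).decode("latin-1") == expected
    print(f"Alt/0{code} = {chr(code)}")
```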

If your keyboard does not match your target language on Windows, add the appropriate keyboard: open the “Control Panel”, click “Regional and Language”, and select “Keyboards and languages”.

alphanum_lc built-in grammar

The alphanum_lc grammar recognizes a connected string of up to 20 digits and lower case alphabetic characters.

For example, this grammar could be used to recognize a product code or order number.

Characters are the letters a-z, and à, á, ç, è, é, í, ï, ò, ó, ù, and ú.

Digits are 0-9.

Non-alphanumeric characters such as hyphens (-), dots (.), and underscores (_) are not recognized; if spoken they reduce recognition accuracy.
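
If you want to check in advance whether the codes you expect callers to speak fit these constraints, a pre-check along the following lines may help. This is only a sketch: the function name is hypothetical, the minimum length of one character is an assumption, and the check does not reproduce the recognizer's own behavior.

```python
import re

# Characters listed for alphanum_lc: a-z, the accented letters, and digits 0-9.
ALLOWED = "a-z0-9àáçèéíïòóùú"
# Assumption: at least 1 and at most 20 allowed characters, nothing else.
ALPHANUM_LC = re.compile(f"[{ALLOWED}]{{1,20}}")

def is_valid_alphanum_lc(code: str) -> bool:
    """Return True if code uses only the listed characters and is at most 20 long."""
    return ALPHANUM_LC.fullmatch(code) is not None

print(is_valid_alphanum_lc("ab12ç9"))  # True
print(is_valid_alphanum_lc("ab-12"))   # False: the hyphen is not recognized
print(is_valid_alphanum_lc("a" * 21))  # False: longer than 20 characters
print(is_valid_alphanum_lc("AB12"))    # False: uppercase is outside the lowercase set
```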

Returned keys/values

MEANING Contains a string of ISO-8859-1 digits and lowercase letters, with no embedded spaces.
SWI_literal Contains the exact text that was recognized.

Note: the alphanum_lc built-in grammar replaces the alphanum built-in grammar.

alphanum built-in grammar

The alphanum grammar recognizes a connected string of up to 20 digits and alphabetic characters. For example, this grammar could be used to recognize a product code or order number.

Characters are the letters a-z, and à, á, ç, è, é, í, ï, ò, ó, ù, and ú.

Digits are 0-9.

Non-alphanumeric characters such as hyphens (-), dots (.), and underscores (_) are not recognized; if spoken they reduce recognition accuracy.

Returned keys/values

MEANING Contains a string of ISO-8859-1 digits and lowercase letters, with no embedded spaces.
SWI_literal Contains the exact text that was recognized.

boolean built-in grammar

The boolean grammar collects an affirmative or negative response.

Properties

The y and n parameters let you associate any two touchtone buttons as synonyms for yes and no.

Parameter Description
y Desired DTMF digit to be equivalent to “sí” (default = 1)
n Desired DTMF digit to be equivalent to “no” (default = 2)

Examples

Caller says MEANING key
sí true
no false
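
The effect of the y and n parameters on touchtone input can be pictured with the small Python sketch below. The helper is hypothetical and only models the mapping described above; returning None for an unmapped key and rejecting identical digits are assumptions, not documented behavior.

```python
from typing import Optional

def interpret_dtmf(digit: str, y: str = "1", n: str = "2") -> Optional[bool]:
    """Map a touchtone digit to a boolean using the y and n parameters.

    Hypothetical helper: True for the y digit, False for the n digit,
    and None for any other key.
    """
    if y == n:
        raise ValueError("y and n must be different digits")
    if digit == y:
        return True
    if digit == n:
        return False
    return None

print(interpret_dtmf("1"))                # True  (default y = 1)
print(interpret_dtmf("2"))                # False (default n = 2)
print(interpret_dtmf("9", y="9", n="0"))  # True  (custom mapping)
print(interpret_dtmf("5"))                # None  (key not mapped)
```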

digits built-in grammar

Valid characters are the digits 0-9.

Vocabulary items and pronunciations

This chapter describes considerations for vocabularies and their pronunciations in Catalan (ca-ES). Your product documentation covers details about how to work with pronunciations and dictionaries.

Specially tuned pronunciations

The following table shows common words that are fine-tuned by Nuance. Each of these words contains “word-specific phonemes”, that is, phonemes and associated models created especially for the words.

Words with tuned pronunciations (do not modify):

All letters of the alphabet, a-z, à, á, ç, è, é, í, ï, ò, ó, ù, and ú.
Boolean: sí and no
Digits: 0-9

Catalan pronunciations

This section provides detailed reference information to help create pronunciation dictionaries. It is intended for people who have sufficient knowledge of the Catalan language as spoken in Spain. It provides information about transcription and pronunciation.

As a reference pronunciation dictionary we use:

> Langenscheidts Universalwörterbuch Katalanisch, Katalanisch-Deutsch/Deutsch-Katalanisch, edition Langenscheidt, Berlin 2000

If you are not sure how a certain word is pronounced, you can refer to the IPA transcriptions given there and convert them into the SAMPA symbols listed in “The Catalan symbol set in alphabetical order”.
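
The conversion can be sketched in Python as a simple character mapping. The mapping below contains only the IPA symbols that differ from SAMPA in the Catalan symbol set; the function name is illustrative, and stress marks or other symbols a dictionary may use pass through unchanged and need manual handling.

```python
# IPA symbols that differ from their SAMPA counterparts in the Catalan symbol set.
# Symbols not listed here (b, d, f, ...) are written the same way in both.
IPA_TO_SAMPA = {
    "ə": "@", "ɛ": "E", "ɔ": "O", "ʃ": "S", "ʒ": "Z",
    "ʤ": "dZ", "ɲ": "J", "ʎ": "L", "ŋ": "N",
}

def ipa_to_sampa(ipa: str) -> str:
    """Convert an IPA string to SAMPA one character at a time."""
    return "".join(IPA_TO_SAMPA.get(ch, ch) for ch in ipa)

print(ipa_to_sampa("gajtə"))   # gajt@
print(ipa_to_sampa("ʎeŋgwə"))  # LeNgw@
print(ipa_to_sampa("peʃ"))     # peS
```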

The Catalan phoneme system

The Catalan phoneme system can be divided into two groups:

  • Vowels
  • Consonants

It is possible to distinguish seven different types of Catalan consonants:

  • Plosives
  • Fricatives
  • Affricates
  • Nasals
  • Laterals
  • Trills
  • Xenophones

Catalan symbol set grouped by phoneme classes

The following table shows all phonemes used in Catalan transcriptions. They are listed according to their phoneme classes with their SAMPA and IPA representations.

Phoneme class SAMPA IPA Examples of use
Consonants: Plosives
b b bodega
p p pel·lícula /p@likul@/
d d diploma /diplom@/
t t teatre /teatr@/
g g garantia /g@r@nti@/
k k informàtica /imfurrmatik@/
Consonants: Fricatives
f f micròfon
s s observar /ups@rba/
S ʃ peix /peS/
Z ʒ pellroja /peLrrOZ@/
j j aires /ajr@s/
z z atletisme /@ll@tizm@/
Consonants: Affricates
dZ ʤ garatge
Consonants: Nasals
m m humana
N ŋ llengua /LeNgw@/
J ɲ acompanyar /@kump@Ja/
n n ajuntament /@Zunt@men/
Consonants: Laterals
l l blanca
L ʎ caballero /kabaLero/
Consonants: Trills
r r cabaret
rr rr caràcter /k@rakt@rr/
Vowels: Single vowels
a a casado
e e doble /doble/
E ɛ dèbil /dEbil/
i i edicions /@disions/
o o editor /@dito/
O ɔ escola /@skOl@/
u u espero /@speru/
Vowels: Semivowels
j j fruita
w w guardar /gw@rda/
@ ə gaita /gajt@/
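
When you check dictionary entries against this symbol set, it can help to split each transcription into its symbols first. The Python sketch below does this with a greedy longest match, so that the two-character symbols /dZ/ and /rr/ stay whole. Treating every "dZ" and "rr" sequence as a single symbol is an assumption of the sketch, and the function name is illustrative.

```python
# SAMPA symbols from the table above.
SYMBOLS = {
    "b", "p", "d", "t", "g", "k", "f", "s", "S", "Z", "j", "z", "dZ",
    "m", "N", "J", "n", "l", "L", "r", "rr",
    "a", "e", "E", "i", "o", "O", "u", "w", "@",
}

def tokenize(transcription: str) -> list[str]:
    """Split a transcription such as '/LeNgw@/' into SAMPA symbols.

    Raises ValueError on any character outside the symbol set.
    """
    text = transcription.strip("/")
    tokens = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in SYMBOLS:        # longest match first: 'dZ' and 'rr'
            tokens.append(pair)
            i += 2
        elif text[i] in SYMBOLS:
            tokens.append(text[i])
            i += 1
        else:
            raise ValueError(f"unknown symbol {text[i]!r} in {transcription!r}")
    return tokens

print(tokenize("/LeNgw@/"))     # ['L', 'e', 'N', 'g', 'w', '@']
print(tokenize("/p@jzadZ@/"))   # ['p', '@', 'j', 'z', 'a', 'dZ', '@']
print(tokenize("/k@rakt@rr/"))  # ['k', '@', 'r', 'a', 'k', 't', '@', 'rr']
```

A symbol from another language's SAMPA set, such as an uppercase T, raises ValueError, which makes entries transcribed with the wrong inventory easy to spot.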

Catalan consonants

The standard Catalan consonant system is considered to have:

  • six plosives
  • six fricatives
  • one affricate
  • four nasals
  • two laterals
  • two trills
  • two xenophones

The sample words given below demonstrate the different contexts in which the sounds can appear. A short explanation is also given.

Plosives

There are three voiced and three voiceless plosives in Catalan, which can be arranged in pairs as shown here:

Voiced Examples Voiceless Examples
/b/ bellesa setembre /b@LEz@/ /s@tembr@/ /p/ palma aplicar verb /palm@/ /@plika/ /bErrp/
/d/ declarar setze /d@kl@ra/ /sEdz@/ /t/ teatre canta cabaret /teatr@/ /kant@/ /k@b@rEt/
/g/ gaita agost /gajt@/ /@gost/ /k/ kilo descansar despòtic /kilu/ /d@sk@nsa/ /d@spOtik/

Fricatives

There are six fricatives in the Catalan SAMPA symbol set, three voiced and three voiceless:

Voiced Examples
/z/ zero reserva /zEru/ /rr@zerb@/
/Z/ genial urgent /Z@nial/ /urrZen/
/j/ ioga paisatge bonsai /jOg@/ /p@jzadZ@/ /bunsaj/

Voiceless Examples
/S/ caixa cruz /kaS@/ /kruS/
/f/ fabricar cafè fotògraf /f@brika/ /k@fE/ /futOgr@f/
/s/ saber caríssima sentiments /s@bE/ /k@risim@/ /s@ntimens/

Affricates

In Catalan there is one affricate, /dZ/.

/dZ/ fotomuntatge /futumuntadZ@/

Nasals

There are four nasals in Catalan, /m/, /n/, /N/, and /J/.

/m/ mes habitualment quelcom /mes/ /@bitualmen/ /k@lkOm/
/n/ nata organitzar pagament /nat@/ /urg@nidza/ /p@g@men/
/N/ significa cinc /siNnifik@/ /siN/
/J/ acompanyar any /@kump@Ja/ /aJ/

Laterals

There are two laterals in Catalan, /l/ and /L/.

/l/ lògica diabòlic mal /lOZik@/ /di@bolik/ /mal/
/L/ lluna mallorca castell /Lun@/ /m@LOrrk@/ /k@steL/

Trills

There are two trills in Catalan, /r/ and /rr/.

/r/ compra /kompr@/
/rr/ racista sorprendre valor /rr@sist@/ /surrpEndr@/ /b@lorr/

Catalan vowels

Single vowels (monophthongs)

The Catalan language has seven distinguishable monophthongs:

/a/ cada caminar /kad@/ /k@mina/
/e/ ella gomera porter /eL@/ /gumer@/ /purrte/
/E/ època aquella què /Epuk@/ /@kEL@/ /kE/
/i/ hipnòtic idea imaginar introduir /ibnOtik/ /ide@/ /im@Zina/ /intrudui/
/o/ onze operadora operació /onz@/ /up@r@dor@/ /up@r@sio/
/O/ òptica amazònica /Optik@/ /@m@zOnik@/
/u/ homòfon important menú /umOfun/ /impurrtan/ /m@nu/

Semi-vowels

There are three semi-vowels in Catalan:

/j/ ioga vizcaya massai /jOg@/ /biSkaja/ /m@saj/
/w/ washington ambigua europeu /waSinton/ /@mbigw@/ /@wrupEw/
/@/ evitar exagerat fantasia /@bita/ /@gz@Z@rat/ /f@nt@zi@/

Specific pronunciation transcription methods

The grapheme <h>

There is no phonetic realization of the grapheme <h>. For example:

hotel /utEl/

Transcription of the fricatives /s/ and /z/

The voiceless fricative /s/ occurs before vowels, voiceless consonants and at the end of a word. For example:

sala /sal@/
absorbit /@psurbit/
articles /@rrtikl@s/

The voiced fricative /z/ occurs before voiced consonants or between two vowels. For example:

anglicismes /@Nglisizm@s/
bellesa /b@LEz@/
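
These context rules lend themselves to a simple consistency check on finished transcriptions. The Python sketch below flags only the unambiguous cases: /s/ directly before a voiced consonant, and /z/ at the end of a word or before a voiceless consonant. The function name and the decision to treat the semivowels /j/ and /w/ like vowels are assumptions of the sketch.

```python
VOICED = set("bdgzZmnNJlLr")  # first characters of voiced consonant symbols
VOICELESS = set("ptkfsS")     # voiceless consonant symbols

def check_s_z(transcription: str) -> list[str]:
    """Return warnings where /s/ or /z/ conflicts with the context rules above."""
    text = transcription.strip("/")
    warnings = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else None
        if ch == "s" and nxt in VOICED:
            warnings.append(f"/s/ before voiced consonant at position {i}")
        elif ch == "z" and (nxt is None or nxt in VOICELESS):
            warnings.append(f"/z/ at word end or before voiceless consonant at position {i}")
    return warnings

print(check_s_z("/@Nglisizm@s/"))  # []
print(check_s_z("/b@LEz@/"))       # []
print(check_s_z("/@ll@tism@/"))    # ['/s/ before voiced consonant at position 6']
```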

Transcription of the trills /r/ and /rr/

The trill /r/ appears in the middle of a word between two vowels and between a vowel and a consonant other than <n>, <l>, or <s>. For example:

bolero /buleru/
celebrar /s@l@bra/

The trill /rr/ appears at the beginning and at the end of a word. In the middle of a word, its position is between two vowels or between a vowel and a consonant other than <n>, <l>, or <s>. For example:

realitat /rre@litat/
familiar /f@miliarr/
catorze /k@torrz@/

Transcription of the nasals /J/ and /N/

The grapheme <ny> is always represented by the SAMPA symbol /J/.

companyia /kump@Ji@/
desengany /d@z@NgaJ/

The nasal /N/ appears before the phonemes /g/ and /k/ and at the end of a word. For example:

lingüista /liNgwist@/
banc /baN/
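
The first part of this rule can be checked mechanically: an /n/ directly followed by /g/ or /k/ is a likely candidate for /N/. A minimal Python sketch follows; the function name is illustrative. Word-final /N/, as in banc, depends on the spelling and cannot be detected from the transcription alone.

```python
def check_velar_nasal(transcription: str) -> list[int]:
    """Return positions where /n/ is directly followed by /g/ or /k/."""
    text = transcription.strip("/")
    return [
        i for i in range(len(text) - 1)
        if text[i] == "n" and text[i + 1] in "gk"
    ]

print(check_velar_nasal("/liNgwist@/"))  # []
print(check_velar_nasal("/lingwist@/"))  # [2]
```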

Transcription of the affricate /dZ/

/dZ/ usually appears between two vowels and is represented orthographically as <tg> or <tj>.

paisatge /p@jzadZ@/
lletjos /LedZus/

Pronunciation of foreign words

When you need to transcribe foreign words, the general rule is to use the same SAMPA symbol set as for the rest of the dictionary. For a Catalan dictionary, this means transcribing every word with the Catalan SAMPA symbols.

If you use a different symbol set, your system cannot interpret the input.

Every language has a different phoneme inventory, so you may have problems covering each and every sound. For the most common cases we offer transcription examples.

The French nasals, for example, are adapted to the Catalan vowel system:

blanc /blaN/
centre /sentr@/

In general, foreign words are integrated into Catalan phonetics; sometimes the orthography is changed as well.

garatge /g@radZ@/
communications /komunikEjSons/
hannover /@nOb@rr/
lletjos /LedZus/

Multiple pronunciations (variants)

The type of pronunciation used in SAMPA and in the Catalan Background dictionary conforms to the standard non-regional Catalan pronunciation. Since it is possible to have more than one pronunciation for a word by using pronunciation variants, it may be difficult to determine how many pronunciation variants should be created.

The general rule is: create variants only if the pronunciation differs in more than one phoneme.
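
One way to apply this rule consistently is to count the phoneme differences between two candidate transcriptions. The Python sketch below interprets "differs in more than one phoneme" as an edit distance greater than one, counted in SAMPA symbols; that interpretation, the function names, and the alternative transcriptions in the usage lines are assumptions made for illustration.

```python
def split_symbols(transcription: str) -> list[str]:
    """Split a SAMPA transcription into symbols, keeping 'dZ' and 'rr' whole."""
    text = transcription.strip("/")
    symbols, i = [], 0
    while i < len(text):
        step = 2 if text[i:i + 2] in ("dZ", "rr") else 1
        symbols.append(text[i:i + step])
        i += step
    return symbols

def phoneme_distance(a: str, b: str) -> int:
    """Levenshtein distance between two transcriptions, counted in phonemes."""
    x, y = split_symbols(a), split_symbols(b)
    prev = list(range(len(y) + 1))
    for i, sx in enumerate(x, start=1):
        curr = [i]
        for j, sy in enumerate(y, start=1):
            cost = 0 if sx == sy else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def needs_variant(a: str, b: str) -> bool:
    """A variant is warranted only if more than one phoneme differs."""
    return phoneme_distance(a, b) > 1

# Hypothetical alternative pronunciations, for illustration only.
print(needs_variant("/k@fE/", "/k@fe/"))         # False: one phoneme differs
print(needs_variant("/futOgr@f/", "/fotOgraf/")) # True: two phonemes differ
```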

The Catalan symbol set in alphabetical order

The following table shows the Catalan symbol set in alphabetical order:

SAMPA IPA Examples of use
@ ə gaita
a a casado
b b bodega
d d diploma
dZ ʤ garatge
e e doble
E ɛ dèbil
f f micròfon
g g garantia
i i edicions
j j aires
j j fruita
J ɲ acompanyar
k k informàtica
l l blanca
L ʎ caballero
m m humana
n n ajuntament
N ŋ llengua
o o editor
O ɔ escola
p p pel·lícula
r r cabaret
rr rr caràcter
s s observar
S ʃ peix
t t teatre
u u espero
w w guardar
z z atletisme
Z ʒ pellroja