English India (en-IN)
This documentation was updated on December 17, 2023.
Creating grammars
The following subsections describe key issues for working with grammar documents in the Indian English language.
Character encoding
Nuance has full internal Unicode support, so you can create your grammars using UTF-8 or Latin-1 (also known as ISO-8859-1) character encoding. For example, your grammar header might be:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xml:lang="en-IN" version="1.0" root="test">
If you do not have access to a keyboard for your target language, you can use the Windows character map. (Choose the “System” font and the “Latin-1” subset.)
Start→Programs→Accessories→System Tools→Character Map
alphanum_lc built-in grammar
The alphanum_lc built-in grammar recognizes a connected string of up to 20 digits and lowercase alphabetic characters. For example, this grammar could be used to recognize a product code or order number.
Characters are the letters a-z. Digits are 0-9.
Callers can speak digits as English or Hindi numbers: “nine seven five three” or “nau sath panch theen,” but they cannot mix languages in a single utterance.
Note: This grammar replaces the alphanum built-in grammar.
alphanum built-in grammar
The alphanum built-in grammar recognizes a connected string of up to 20 digits and alphabetic characters. For example, this grammar could be used to recognize a product code or order number.
Characters are the letters A-Z and a-z. Digits are 0-9.
Callers can speak digits as English or Hindi numbers: “nine seven five three” or “nau sath panch theen,” but they cannot mix languages in a single utterance.
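As a rough illustration of the constraints above, an application might sanity-check a returned string before using it. The following Python sketch is not part of the Nuance API: the function name `is_valid_alphanum` and the `lowercase_only` flag are hypothetical, and it assumes the recognized string arrives as plain text with no separators and contains at least one character.

```python
import re

# Up to 20 characters, each a digit or an ASCII letter (alphanum),
# or a digit or a lowercase ASCII letter (alphanum_lc).
# The minimum length of 1 is an assumption, not stated in the documentation.
_ALPHANUM = re.compile(r"[A-Za-z0-9]{1,20}")
_ALPHANUM_LC = re.compile(r"[a-z0-9]{1,20}")


def is_valid_alphanum(value: str, lowercase_only: bool = False) -> bool:
    """Return True if value fits the character set and length limit."""
    pattern = _ALPHANUM_LC if lowercase_only else _ALPHANUM
    return pattern.fullmatch(value) is not None
```

For example, `is_valid_alphanum("AB12cd")` is accepted, while the same string with `lowercase_only=True` is rejected because of the uppercase letters.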
boolean built-in grammar
The boolean grammar collects an affirmative or negative response.
In addition to “yes,” callers can say: true, correct, right, sure, ha, yep, yeah.
In addition to “no,” callers can say: wrong, not correct, incorrect, false, nope, nahee.
Properties
The y and n parameters let you associate any two touchtone buttons as synonyms for yes and no.
Parameter | Description |
---|---|
y | Desired DTMF digit to be equivalent to “yes” (default = 1) |
n | Desired DTMF digit to be equivalent to “no” (default = 2) |
Examples
Caller says | MEANING key |
---|---|
yes or ha | true |
no or nahee | false |
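The effect of the y and n parameters can be pictured with a small Python sketch. It only mimics the behavior described above and does not call the recognizer: the function name `dtmf_to_meaning` is hypothetical, and returning `None` for any other key is an assumption made for the illustration.

```python
from typing import Optional


def dtmf_to_meaning(digit: str, y: str = "1", n: str = "2") -> Optional[str]:
    """Map a pressed touchtone key to the MEANING value it stands for.

    y and n mirror the grammar parameters (defaults 1 and 2).
    Keys that are neither y nor n give None (an assumption of this sketch).
    """
    if y == n:
        raise ValueError("y and n must be different digits")
    if digit == y:
        return "true"
    if digit == n:
        return "false"
    return None
```

With the defaults, key 1 stands for “yes” and key 2 for “no”; calling `dtmf_to_meaning("7", y="7", n="9")` shows the same mapping with reassigned keys.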
ccexpdate built-in grammar
The ccexpdate grammar understands the expiration date on a credit card. Expiration dates are usually a month and a year, and are often embossed on a credit card in the form “mm/yy.” The grammar recognizes variations on the date, for example, “December 2007,” “twelve oh seven,” “twelve of two thousand and seven,” “twelve slash zero seven,” etc.
creditcard built-in grammar
The creditcard grammar understands a caller saying a credit card number, optionally preceding the number with the credit card name, or the words “account number” or “account.” For example, a caller can say, “visa account number four oh one seven…,” “mastercard five zero zero two…,” or “three seven three five….”
currency built-in grammar
The currency grammar collects amounts of money in rupees and paise, British pounds and pence, or euros and cents, such as “ten rupees,” “ten rupees and fifteen paise,” and “ten fifteen.”
Return keys/values
Key | Description |
---|---|
MEANING | Contains a string in the form currency main_unit_amount.subunit_amount, written without spaces (for example, INR5.05). If the caller explicitly says the denomination of the currency, such as “rupees,” a currency code is added as a prefix: INR for rupees, GBP for pounds, EUR for euros. If the caller omits the main unit or subunit amount, that field is zero, so the string contains a leading zero when the subunit amount is collected without the main unit. The use of two currencies in the same amount, for example, “five rupees and five cents,” is not allowed. The key AMBIGUOUS is set to 1 if the caller says an ambiguous phrase such as “fifteen twelve,” which could be either 15.12 or 1512.00; otherwise, it is set to 0. |
SWI_literal | Contains the exact text that was recognized. |
Examples
Caller says | MEANING |
---|---|
five rupees | INR5.00 |
five euros | EUR5.00 |
five cents | EUR0.05 |
five rupees and five paise | INR5.05 |
five rupees and twenty-five or five rupees twenty-five | INR5.25 |
five twenty-five | 5.25 |
six hundred twenty-five thousand four hundred sixty-four rupees | INR625464.00 |
one euro zero cents | EUR1.00 |
one pound twenty two | GBP1.22 |
one twenty two | 1.22 |
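An application that receives the MEANING string usually needs to split it into a currency code and an amount. The Python sketch below shows one way to do that for the values in the table above. The names `CurrencyMeaning` and `parse_currency_meaning` are hypothetical, and the pattern assumes that the subunit amount always has exactly two digits, as in every example shown.

```python
import re
from decimal import Decimal
from typing import NamedTuple, Optional


class CurrencyMeaning(NamedTuple):
    currency: Optional[str]  # "INR", "GBP", "EUR", or None if no unit was spoken
    amount: Decimal


# Optional currency prefix, main unit amount, ".", two-digit subunit amount.
# The two-digit subunit is an assumption based on the documented examples.
_MEANING = re.compile(r"(INR|GBP|EUR)?(\d+)\.(\d{2})")


def parse_currency_meaning(meaning: str) -> CurrencyMeaning:
    """Split a currency MEANING string into (currency code, amount)."""
    match = _MEANING.fullmatch(meaning)
    if match is None:
        raise ValueError(f"unexpected MEANING value: {meaning!r}")
    code, main, sub = match.groups()
    return CurrencyMeaning(code, Decimal(f"{main}.{sub}"))
```

`Decimal` is used instead of `float` so that amounts such as 5.05 are kept exactly. A value without a prefix, such as 5.25, yields `None` for the currency, which the application must then resolve itself.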
date built-in grammar
The date grammar accepts a date spoken in any of several formats. Recognized phrases include “4 June,” “4 June 2006,” “4, 6, 2006,” “the 4th,” “4th June,” and “Monday, the 4th of June.”
The grammar also accepts “yesterday,” “today,” and “tomorrow,” which return values of -1, 0, and +1 respectively in the MEANING key.
Examples
Caller says | MEANING key |
---|---|
January 5th, 2000 | 20000105 |
Yesterday | -1 |
Today | 0 |
Tomorrow | +1 |
the fourth | ??????04 |
Wednesday | (Phrase not recognized) |
Wednesday the 12th | ??????12 |
June 4 or June 4th | ????0604 |
June 4, 1997 | 19970604 |
June 4, 97 | ??970604 |
Wednesday, June 4, 1997 | 19970604 |
the 6th | ??????06 |
4, 6 | ????0604 |
10, 12 | ????1210 |
10, 12, 97 | ??971210 |
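The MEANING values in the table above follow a yyyymmdd layout in which unknown positions are filled with question marks, plus the three relative values -1, 0, and +1. The following Python sketch shows one possible way to interpret them. The names `PartialDate` and `parse_date_meaning` are hypothetical, and keeping a partly known year such as ??97 as a raw string is a choice made for this illustration only.

```python
import re
from typing import NamedTuple, Optional, Union


class PartialDate(NamedTuple):
    year: Optional[str]   # "1997", "??97" (century unknown), or None
    month: Optional[int]  # None when the caller did not say a month
    day: Optional[int]    # None when the caller did not say a day


_DATE = re.compile(r"([0-9?]{4})([0-9?]{2})([0-9?]{2})")
_RELATIVE = {"-1": -1, "0": 0, "+1": 1}  # yesterday, today, tomorrow


def _field(text: str) -> Optional[int]:
    """Convert a two-character field to int, or None if it is all '?'."""
    return None if set(text) == {"?"} else int(text)


def parse_date_meaning(meaning: str) -> Union[int, PartialDate]:
    """Return a day offset for relative dates, else a PartialDate."""
    if meaning in _RELATIVE:
        return _RELATIVE[meaning]
    match = _DATE.fullmatch(meaning)
    if match is None:
        raise ValueError(f"unexpected MEANING value: {meaning!r}")
    year, month, day = match.groups()
    return PartialDate(None if year == "????" else year, _field(month), _field(day))
```

For instance, ????0604 gives month 6 and day 4 with no year, so the application can decide which year the caller most likely meant.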
digits built-in grammar
Valid characters are the digits 0-9. The digit ‘0’ can be pronounced as either “oh” or “zero.” The digits can be spoken as English or Hindi numbers: “nine seven five three…” or “nau sath panch theen…” The languages cannot be mixed in a single utterance.
number built-in grammar
The number grammar recognizes numbers spoken as natural numeric phrases rather than as individual digits, such as “ten,” “one hundred and forty,” “five hundred sixty one point five,” “negative five,” and “minus four point three.”
Numbers from -999,999,999.99 to 999,999,999.99 are recognized, but by default the minallowed parameter is set to zero, which limits recognition to non-negative values.
Examples
Caller says | MEANING key |
---|---|
twenty five | 25 |
twelve thousand three hundred forty five | 12345 |
twelve hundred | 1200 |
minus two or negative two | -2 |
fourteen point five six | 14.56 |
fourteen dot fifty six | (Phrase not recognized; the words “dot” and “fifty six” are not allowed) |
time built-in grammar
The time built-in grammar accepts spoken time-of-day utterances from the caller.
Examples
For each entry, the values returned in the MEANING and QUALIFIER keys are shown. (Not shown are the values of the HOUR, MINUTE, and AMPM keys.)
Caller says | MEANING | QUALIFIER |
---|---|---|
now, immediately… | (Phrase not recognized) | -- |
in a half hour | (Phrase not recognized) | -- |
at noon | 1200p | exact |
at midnight | 0000? | exact |
before noon | 1200p | before |
after thirteen thirty | 1330h | after |
twenty twenty | 2020h | exact |
eight twenty in the morning | 0820a | exact |
half past eight | 0830? | exact |
seven fifteen pm or quarter past seven in the evening | 0715p | exact |
twenty four hundred hours or twenty four hundred | 0000h | exact |
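The MEANING values above consist of four digits (hhmm) followed by a one-character suffix. The Python sketch below converts such a value to a 24-hour clock where possible. The function name `parse_time_meaning` is hypothetical, and the reading of the suffixes is inferred from the examples rather than stated in this documentation: “a” for morning, “p” for afternoon or evening, “h” for a 24-hour time, and “?” when the caller did not say which half of the day was meant.

```python
import re
from typing import Optional, Tuple

_TIME = re.compile(r"(\d{2})(\d{2})([aph?])")


def parse_time_meaning(meaning: str) -> Tuple[Optional[int], int, str]:
    """Return (hour on a 24-hour clock or None, minute, suffix)."""
    match = _TIME.fullmatch(meaning)
    if match is None:
        raise ValueError(f"unexpected MEANING value: {meaning!r}")
    hour, minute, suffix = int(match.group(1)), int(match.group(2)), match.group(3)
    if suffix == "a":
        hour24 = hour % 12          # 12 am becomes 0
    elif suffix == "p":
        hour24 = hour % 12 + 12     # 12 pm stays 12, 7 pm becomes 19
    elif suffix == "h":
        hour24 = hour               # already a 24-hour time
    else:
        hour24 = None               # "?": am/pm unknown, the caller must decide
    return hour24, minute, suffix
```

Returning `None` for the hour of an ambiguous value such as 0830? leaves the decision to the application, for example by asking the caller a follow-up question.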
zipcode built-in grammar
The zipcode grammar recognizes Indian Postal Index Numbers (PIN codes) in six-digit format. Digits 0 and 9 never occur in the first position.
Return keys/values
Upon return, the MEANING key contains the recognized six-digit PIN code.
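The format described above (six digits, with neither 0 nor 9 in the first position) can be checked with a short pattern. This Python sketch is only an application-side sanity check under that description; the function name `is_valid_pin` is hypothetical.

```python
import re

# First digit 1-8, followed by five more digits.
_PIN = re.compile(r"[1-8][0-9]{5}")


def is_valid_pin(meaning: str) -> bool:
    """Return True if meaning looks like a PIN code as described above."""
    return _PIN.fullmatch(meaning) is not None
```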
Vocabulary items and pronunciations
This chapter describes considerations for vocabularies and their pronunciations in Indian English (en-IN).
Specially tuned pronunciations
The following list shows common words whose pronunciations are fine-tuned by Nuance. Each of these words contains “word-specific phonemes,” that is, phonemes and associated models created especially for the word.
Words with tuned pronunciations (do not modify):
Hindi booleans: ha, nahee
Hindi digits: shoonya (0), akh (1), dhoh (2), theen (3), char (4), panch (5), cheh (6), sath (7), aath (8) and nau (9)
Indian English pronunciations
This section provides detailed reference information to help create pronunciation dictionaries. It is intended for people who have sufficient knowledge of the Indian English language. It provides information about transcription and pronunciation.
The Indian English phoneme system
The Indian English phoneme system can be divided into two groups:
- consonants
- vowels
Furthermore, it is possible to define six different types of consonants:
- plosives
- fricatives
- affricates
- nasals
- laterals
- semivowels
Within the vowel group, further distinctions can be made between front, central, and back vowels and diphthongs.
Indian English spelling does have a certain complexity, since the orthography of most of its constituent words does not necessarily reflect their pronunciation. This lack of rigid structure means that the relationship between spelling (grapheme) and sound (phoneme) is difficult to define. Generally speaking, the phonetic transcription of a word is influenced by:
- specific phonetic rules
- pronunciation peculiarities that have developed over time
Indian English symbol set grouped by phoneme classes
The following table shows all phonemes used in Indian English transcriptions. They are listed according to their phoneme classes, with their SAMPA and IPA representations.
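Group | Phoneme class | SAMPA | IPA | Example | Transcription |
---|---|---|---|---|---|
Consonants | Plosives | b | b | bin | /bIn/ |
Consonants | Plosives | p | p | pin | /pIn/ |
Consonants | Plosives | g | g | give | /gIv/ |
Consonants | Plosives | k | k | skin | /skIn/ |
Consonants | Plosives | d | d | dummy | /dVmi:/ |
Consonants | Plosives | t | t | tin | /tIn/ |
Consonants | Fricatives | v | v | saving | /seYvIN/ |
Consonants | Fricatives | f | f | coffee | /kQfi:/ |
Consonants | Fricatives | D | ð | this | /DIs/ |
Consonants | Fricatives | T | θ | thin | /TIn/ |
Consonants | Fricatives | z | z | crazy | /kreYzi:/ |
Consonants | Fricatives | s | s | sin | /sIn/ |
Consonants | Fricatives | S | ʃ | ship | /SIp/ |
Consonants | Fricatives | Z | ʒ | vision | /vIZ@n/ |
Consonants | Fricatives | h | h | hit | /hIt/ |
Consonants | Affricates | tS | ʧ | chat | /tS{t/ |
Consonants | Affricates | dZ | ʤ | ginger | /dZIndZ@/ |
Consonants | Nasals | m | m | mock | /mQk/ |
Consonants | Nasals | n | n | knock | /nQk/ |
Consonants | Nasals | N | ŋ | thing | /TIN/ |
Consonants | Laterals | l | l | long | /lQN/ |
Consonants | Semivowels | r | r | run | /rVn/ |
Consonants | Semivowels | j | j | yes | /jes/ |
Consonants | Semivowels | w | w | wet | /wet/ |
Vowels | Single vowels | I | ɪ | pit | /pIt/ |
Vowels | Single vowels | i: | i: | ease | /i:z/ |
Vowels | Single vowels | e | e | pet | /pet/ |
Vowels | Single vowels | u: | u: | lose | /lu:z/ |
Vowels | Single vowels | @ | ə | away | /@weY/ |
Vowels | Single vowels | { | æ | bad | /b{d/ |
Vowels | Single vowels | A: | ɑ: | stars | /stA:z/ |
Vowels | Single vowels | Q | ɒ | pot | /pQt/ |
Vowels | Single vowels | O: | ɔ: | north | /nO:T/ |
Vowels | Single vowels | V | ʌ | cut | /kVt/ |
Vowels | Single vowels | 3: | ɜ: | furs | /f3:z/ |
Vowels | Single vowels | U | ʊ | put | /pUt/ |
Vowels | Diphthongs | eY | eɪ | raise | /reYz/ |
Vowels | Diphthongs | aY | aɪ | rise | /raYz/ |
Vowels | Diphthongs | QY | ɔɪ | noise | /nQYz/ |
Vowels | Diphthongs | @W | əʊ | nose | /n@Wz/ |
Vowels | Diphthongs | aW | au̬ / aʊ̬ | rouse | /raWz/ |
Vowels | Diphthongs | eR | eə | stairs | /steRz/ |
Vowels | Diphthongs | IR | ɪə | appear | /@pIR/ |
Vowels | Diphthongs | UR | ʊə | tourist | /tURrIst/ |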
Indian English consonants
Indian English consonants consist of:
- six plosives
- nine fricatives
- two affricates
- three nasals
- one lateral
- three semivowels
Plosives
There are three voiced and three voiceless plosives in Indian English, which can be arranged in pairs as shown here:
Voiced | Examples | Voiceless counterpart |
---|---|---|
/b/ | bit, rabid, cab | /p/ |
/g/ | gold, degree, bag | /k/ |
/d/ | down, medal, sad | /t/ |
Glottal stop
Indian English has an additional plosive, the so-called glottal stop ‘?’. It does not have a distinctive function and is not uniformly represented in the language’s orthography, but it is pronounced, for example, before heavily stressed initial vowels, and it is sometimes used as a variant supplanting medial and final /t/. It can, however, be ignored for lexicon transcription purposes.
Fricatives
There are nine fricatives in the Indian English SAMPA symbol set, five voiceless and four voiced:
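Voiced | Examples | Voiceless counterpart |
---|---|---|
/v/ | vine, even, prove | /f/ |
/D/ | this, worthy, with | /T/ |
/z/ | zone, razor, plays | /s/ |
/Z/ | gendarme, vision | /S/ |
The fifth voiceless fricative, /h/, has no voiced counterpart.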
In Indian English the voiceless fricative /h/ does not appear in the final position.
Affricates
In Indian English there are two affricates: /dZ/ and /tS/.
Note that in SAMPA an affricate is always written as two single-phoneme symbols:
Voiced | Examples | Voiceless counterpart |
---|---|---|
/dZ/ | gin, ridges, large | /tS/ |
Nasals
There are three nasals in Indian English, /m/, /n/, and /N/. The velar nasal /N/ (back of the tongue touches the soft palate) never appears in the initial position.
Phoneme | Examples | Transcriptions |
---|---|---|
/m/ | man, hammer, ham | /m{n/ /h{m@/ /h{m/ |
/n/ | net, enter, run | /net/ /ent@/ /rVn/ |
/N/ | sing, finger | /sIN/ /fINg@/ |
Pronunciation note: The grapheme n before c, g, k, q, x is pronounced as /N/.
Syllabic /m/ and /n/ are represented as /@m/ and /@n/ respectively, for example:
garden /gA:d@n/
Laterals
There is one lateral in Indian English: /l/.
Phoneme | Examples | Transcriptions |
---|---|---|
/l/ | long, falling, roll | /lQN/ /fO:lIN/ /r@Wl/ |
Syllabic /l/ is represented as /@l/, for example: level /lev@l/.
Semivowels
A semivowel is articulated by allowing air to escape over the center of the tongue through a stricture (in the case of /w/ two strictures) that is not so narrow as to cause audible friction. Semivowels are articulated like vowels, but function as consonants since they are not syllabic. They can also be referred to as approximants.
There are three semivowels in Indian English, /r/, /j/, and /w/.
Phoneme | Examples | Transcriptions |
---|---|---|
/r/ | rich, blurring | /rItS/ /bl3:rIN/ |
/j/ | young, view | /jVN/ /vju:/ |
/w/ | win, away | /wIn/ /@weY/ |
In Indian English final /r/ is usually not pronounced, unless it appears in combined words as a linking-r, for example: far-off /fA:rQf/.
Indian English vowels
Front, central, and back vowels
Indian English single vowels (monophthongs) can be divided into three groups according to their place of articulation: front, central or back. Within each group vowels differ in their degree of mouth opening. Length is of minor importance in the Indian English vowel system, and the length of a particular vowel in a given word may change considerably in connected speech. Thus the colon, which appears in some phonetic symbols to denote length, is used in the transcription of Indian English to denote a different vowel quality rather than quantity (length).
The three vowel groups are shown in the following table, ranging in each group from closed (top) to open (bottom) mouth:
Position | Phoneme | Examples | Transcriptions |
---|---|---|---|
Front | /i:/ | ease, believe, free | /i:z/ /bIli:v/ /fri:/ |
Front | /I/ | itch, pit | /ItS/ /pIt/ |
Front | /e/ | pet, ever | /pet/ /ev@/ |
Front | /{/ | apt, sad | /{pt/ /s{d/ |
Central | /3:/ | urban, nurse, fur | /3:b@n/ /n3:s/ /f3:/ |
Back | /U/ | umlaut, put | /UmlaWt/ /pUt/ |
Back | /O:/ | awe, north, cause | /O:/ /nO:T/ /kO:z/ |
Back | /A:/ | start, father, bar | /stA:t/ /fA:D@/ /bA:/ |
Back | /Q/ | optimistic, pot | /QptImIstIk/ /pQt/ |
Pronunciation note: The short o-sound is regularly transcribed as /Q/, for example:
moral /mQr@l/
Diphthongs
There are eight diphthongs in the Indian English phoneme inventory:
Phoneme | Examples | Transcriptions |
---|---|---|
/eY/ | aim, face, hay | /eYm/ /feYs/ /heY/ |
/aY/ | ice, price, high | /aYs/ /praYs/ /haY/ |
/QY/ | oyster, toys, boy | /QYst@/ /tQYz/ /bQY/ |
/@W/ | omen, home, blow | /@Wm@n/ /h@Wm/ /bl@W/ |
/aW/ | our, mouth, now | /aW@/ /maWT/ /naW/ |
/IR/ | ear, near | /IR/ /nIR/ |
/eR/ | air, area, square | /eR/ /eRrIR/ /skweR/ |
/UR/ | cure | /kjUR/ |
Diphthongs can artificially emerge in a transcription when the individual phonemes that usually form a diphthong are placed adjacent in a word, for example, in autoimmune. However, such words are rare and can be ignored.
Specific pronunciation transcription methods
Linking-r
In Indian English, a word pronounced in isolation never ends in /r/. However, in connected speech the final r is pronounced when it is followed by a vowel, as in the combined word:
far-away /fA:r@weY/
The inserted r-sound is known as ‘linking-r’ and should be transcribed to avoid liaison problems.
Syllabic consonants
The consonants l, m and n can sometimes form a syllable on their own. In these cases they are transcribed as /@l/, /@m/, and /@n/ respectively.
Pronunciation of foreign words
When foreign words must be transcribed, the general rule is to transcribe them with the same SAMPA symbol set as the rest of the lexicon. For an Indian English transcription, every word in the lexicon must be transcribed with the Indian English SAMPA symbols.
If you use a different symbol set, your system will be incapable of understanding the input.
Every language has a different phoneme inventory, so you may have problems in covering each and every sound. For the most common cases we offer the following transcription examples.
French nasals
Try to apply a pronunciation that has been adapted to Indian English, for example:
bon-bon /bQnbQn/
The original transcription ‘bo~bo~’ cannot be realized because the French phoneme ‘o~’ is not part of the Indian English SAMPA Symbol set.
Vowel ‘y’ in German and French
The vowel ‘y’, found in some German or French words can be represented by /jU/, such as:
Dubonnet /djUbQneY/
Conveniently, this reflects the pronunciation commonly used by Indian English speakers who are not fully conversant with the particular language.
German fricatives ‘C’ and ‘x’
Palatal and velar fricatives, which occur in German, for example, can be transcribed as /k/ instead of ‘C’ or ‘x’:
Word | Transcription |
---|---|
Bach | /bA:k/ |
Reich | /raYk/ |
Multiple pronunciations (variants)
The type of pronunciation used in SAMPA and in the Indian-English Background lexicon conforms to standard non-regional British pronunciation. Where Indian English differs from the standard British pronunciation, a variant should be added.
As the speakers of Indian English are influenced by different mother tongues, the pronunciation variants are not always consistent. The standard British English pronunciation should always be used as the canonical form. Some frequently occurring variations are the following:
Variation | Canonical form | Variant(s) |
---|---|---|
Contractions | November n@Wvemb@ | November<nQmb@> |
“Ellipse” | five faYv | five<faY> |
Rolled r | cheers tSIRz | cheers<tSIRrs> cheers<tIRs> |
“Auslautverhärtung” (final devoicing) | red red | red<ret> |
w → v problem | water wO:t@r | water<vO:t@r> |
w → v problem | William wIlIRm | William<vIlIRm> |
The Indian English symbol set in alphabetical order
The following table shows the Indian English symbol set in alphabetical order:
SAMPA | IPA | Examples of usage |
---|---|---|
@ | ə | away |
@W | əʊ | nose |
{ | æ | bad |
3: | ɜ: | furs |
A: | ɑ: | stars |
aY | aɪ | rise |
aW | au̬ / aʊ̬ | rouse |
b | b | bin |
d | d | dummy |
D | ð | this |
dZ | ʤ | ginger |
e | e | pet |
eR | eə | stairs |
eY | eɪ | raise |
f | f | coffee |
g | g | give |
h | h | hit |
I | ɪ | pit |
i: | i: | ease |
IR | ɪə | appear |
j | j | yes |
k | k | skin |
l | l | long |
m | m | mock |
n | n | knock |
N | ŋ | thing |
O: | ɔ: | north |
QY | ɔɪ | noise |
p | p | pin |
Q | ɒ | pot |
r | r | run |
s | s | sin |
S | ʃ | ship |
t | t | tin |
T | θ | thin |
tS | ʧ | chat |
U | ʊ | put |
u: | u: | lose |
UR | ʊə | tourist |
v | v | saving |
V | ʌ | cut |
w | w | wet |
z | z | crazy |
Z | ʒ | vision |
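When reviewing a pronunciation dictionary, it can help to view SAMPA transcriptions as IPA. The Python sketch below splits a transcription into symbols from the table above by always trying the two-character symbols first, then maps each symbol to IPA. The names `SAMPA_TO_IPA`, `tokenize_sampa`, and `sampa_to_ipa` are hypothetical. Two simplifications are assumed: 3: is mapped to ɜ:, the standard IPA counterpart of that SAMPA symbol, and aW is mapped to a plain aʊ without diacritics.

```python
# SAMPA symbol -> IPA, following the alphabetical symbol table.
SAMPA_TO_IPA = {
    "@": "ə", "@W": "əʊ", "{": "æ", "3:": "ɜ:", "A:": "ɑ:", "aY": "aɪ",
    "aW": "aʊ", "b": "b", "d": "d", "D": "ð", "dZ": "ʤ", "e": "e",
    "eR": "eə", "eY": "eɪ", "f": "f", "g": "g", "h": "h", "I": "ɪ",
    "i:": "i:", "IR": "ɪə", "j": "j", "k": "k", "l": "l", "m": "m",
    "n": "n", "N": "ŋ", "O:": "ɔ:", "QY": "ɔɪ", "p": "p", "Q": "ɒ",
    "r": "r", "s": "s", "S": "ʃ", "t": "t", "T": "θ", "tS": "ʧ",
    "U": "ʊ", "u:": "u:", "UR": "ʊə", "v": "v", "V": "ʌ", "w": "w",
    "z": "z", "Z": "ʒ",
}


def tokenize_sampa(transcription: str) -> list:
    """Split a transcription into symbols, preferring two-character ones."""
    symbols = []
    i = 0
    while i < len(transcription):
        for length in (2, 1):
            candidate = transcription[i:i + length]
            if candidate in SAMPA_TO_IPA:
                symbols.append(candidate)
                i += length
                break
        else:
            # Neither a two-character nor a one-character symbol matched.
            raise ValueError(f"unknown symbol at position {i}: {transcription[i]!r}")
    return symbols


def sampa_to_ipa(transcription: str) -> str:
    """Convert a SAMPA transcription (with or without slashes) to IPA."""
    return "".join(SAMPA_TO_IPA[s] for s in tokenize_sampa(transcription.strip("/")))
```

Because the longest symbol always wins, a /t/ followed by /S/ across a morpheme boundary would be read as the affricate /tS/. A lexicon that needs to keep such sequences apart would require an explicit separator, which this sketch does not handle.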