Mandarin China (zh-CN)
This documentation was updated on November 22, 2023.
Creating grammars
The following subsections describe key issues for working with grammar documents in the Mandarin language.
Character encoding
Nuance Recognizer has full internal Unicode support. For example, you can create your grammars using UTF-8 or GB character encoding. UTF-8 is preferred. Traditional Chinese characters shouldn’t be used. For example, your grammar header might be:
<?xml version='1.0' encoding='UTF-8'?>
<grammar xml:lang="zh-CN" version="1.0" root="test">
alphanum_lc built-in grammar
The alphanum_lc built-in grammar recognizes a connected string of up to 20 digits and lower case alphabetic characters. For example, this grammar could be used to recognize a product code or order number.
Valid characters are the English letters of the alphabet (a–z), so callers can speak English letters in addition to Mandarin numbers. The pronunciation of the letter z as the British-style “zed” is recognized, but the American-style “zii” is likely to be misrecognized as the letter c.
Valid digits are 0–9. Although specified as Arabic numbers, callers speak the Mandarin equivalents: 零 一 二 三 四 五 六 七 八 九. The digit “一” can be pronounced as either “yi1” or “yao1.”
Non-alphanumeric characters such as hyphens (-), dots (.), and underscores (_) are not recognized; if spoken they reduce recognition accuracy.
Return keys/values
Key | Description |
---|---|
MEANING | Contains a string of ISO-8859-1 digits and lowercase letters, with no embedded spaces. |
SWI_literal | Contains the exact text that was recognized. |
Examples
In the following examples, note that the English letters of the alphabet are allowed. This is done to allow callers to speak English characters in addition to Mandarin.
Caller says | MEANING key |
---|---|
Spaces between digits indicate individually spoken numbers: 零 一 二 三 四 五 六 七 八 九 | 0123456789 |
a b c d e f g | abcdefg |
a b c 1 e 6 g | abc1e6g |
a 一 s 二 d 三 f 四 | a1s2d3f4 |
Here are examples of utterances that do not parse when spoken by callers:
Caller says | Reason for not being recognized |
---|---|
十 二 | Natural numbers are not recognized with this grammar. Each digit must be spoken individually. |
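The mapping from the "Caller says" column to the MEANING key in the tables above can be sketched in Python. This is only an illustration of the documented behavior, not the grammar's implementation. The function name `alphanum_meaning` is hypothetical, and treating 幺 as a written form of the "yao1" reading of 1 is an assumption.

```python
# Mandarin digit characters and their Arabic equivalents.
# Assumption: 幺 is accepted as the "yao1" reading of the digit 1.
MANDARIN_DIGITS = {
    "零": "0", "一": "1", "幺": "1", "二": "2", "三": "3", "四": "4",
    "五": "5", "六": "6", "七": "7", "八": "8", "九": "9",
}
MAX_LENGTH = 20  # the grammar accepts strings of up to 20 characters


def alphanum_meaning(transcription):
    """Convert a space-separated transcription into a MEANING-style string."""
    out = []
    for token in transcription.split():
        if token in MANDARIN_DIGITS:
            out.append(MANDARIN_DIGITS[token])
        elif len(token) == 1 and token.isascii() and token.isalnum():
            out.append(token.lower())
        else:
            # Natural numbers such as 十 二 and punctuation are not accepted.
            raise ValueError(f"token not accepted: {token!r}")
    if not 1 <= len(out) <= MAX_LENGTH:
        raise ValueError("expected between 1 and 20 characters")
    return "".join(out)


print(alphanum_meaning("a 一 s 二 d 三 f 四"))
```

Tokens such as 十 raise `ValueError` in this sketch, mirroring the rule that each digit must be spoken individually.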
alphanum built-in grammar
**NOTE: This grammar is retained for backward compatibility only. It has been replaced by the alphanum_lc built-in grammar; use alphanum_lc for new implementations.**
The alphanum built-in grammar recognizes a connected string of up to 20 digits and upper and lower case alphabetic characters. For example, this grammar could be used to recognize a product code or order number.
Valid characters are the English letters of the alphabet (a–z), so callers can speak English letters in addition to Mandarin numbers. The pronunciation of the letter z as the British-style “zed” is recognized, but the American-style “zii” is likely to be misrecognized as the letter c.
Valid digits are 0–9. Although specified as Arabic numbers, callers speak the Mandarin equivalents: 零 一 二 三 四 五 六 七 八 九. The digit “一” can be pronounced as either “yi1” or “yao1.”
Non-alphanumeric characters such as hyphens (-), dots (.), and underscores (_) are not recognized; if spoken they reduce recognition accuracy.
Return keys/values
Key | Description |
---|---|
MEANING | Contains a string of ISO-8859-1 digits and lowercase letters, with no embedded spaces. |
SWI_literal | Contains the exact text that was recognized. |
Examples
In the following examples, note that the English letters of the alphabet are allowed. This is done to allow callers to speak English characters in addition to Mandarin.
Caller says | MEANING key |
---|---|
Spaces between digits indicate individually spoken numbers: 零 一 二 三 四 五 六 七 八 九 | 0123456789 |
a b c d e f g | abcdefg |
a b c 1 e 6 g | abc1e6g |
a 一 s 二 d 三 f 四 | a1s2d3f4 |
Here are examples of utterances that do not parse when spoken by callers:
Caller says | Reason for not being recognized |
---|---|
十 二 | Natural numbers are not recognized with this grammar. Each digit must be spoken individually. |
boolean built-in grammar
The boolean grammar collects an affirmative or negative response.
Properties
The y and n parameters let you associate any two touchtone buttons as synonyms for yes and no.
Parameter | Description |
---|---|
y | Desired DTMF digit to be equivalent to " 对 " (default = 1) |
n | Desired DTMF digit to be equivalent to " 错 " (default = 2) |
Examples
Caller says | MEANING key |
---|---|
对 | true |
错 | false |
ccexpdate built-in grammar
The ccexpdate grammar understands the expiration date on a credit card. Expiration dates are usually a month and a year, and are often embossed on a credit card in the form “mm/yy.” The grammar recognizes variations on the date, for example, December 2005 ( 二 零 零 五 年 十 二 月 ) and oh four oh five ( 二 零 零 五 年 四 月 ).
Some credit cards are stamped with a day of the month as well as the month and year; the ccexpdate grammar recognizes these dates as well. However, the only day of the month it recognizes is the last day of a given month, for example, November 30th, 2005 ( 二 零 零 五 年 十 一 月 三 十 号 ). The grammar does not check for leap years: both February 28 and February 29 are recognized, regardless of the given year.
Return keys/values
Upon return, the MEANING key is assigned to the recognized date in YYYYMMDD format, where YYYY is the year, MM is the month, and DD is the day. For example, 20100331 refers to March 31, 2010. The value is the same regardless of whether the caller specified a day of the month or not; the day is always set to the last day of the month. For example, both “oh six three oh oh five” ( 二 零 零 五 年 六 月 三 十 号 ) and “oh six oh five” ( 二 零 零 五 年 六 月 ) return 20050630. Note that if the expiration month is February, MMDD is always 0228, regardless of what the caller said or whether or not the expiration year is a leap year.
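The normalization rule described above (the day is always the last day of the month, and February always yields 0228) can be sketched as follows. The function name `ccexpdate_meaning` is hypothetical; this illustrates the documented return value and is not the grammar's own code.

```python
import calendar


def ccexpdate_meaning(year, month):
    """Return the YYYYMMDD value for an expiration month and year."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if month == 2:
        # February is always reported as the 28th, even in leap years.
        day = 28
    else:
        day = calendar.monthrange(year, month)[1]  # last day of the month
    return f"{year:04d}{month:02d}{day:02d}"


print(ccexpdate_meaning(2005, 6))
```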
citizenid (Mainland China) built-in grammar
The citizenid grammar understands 15-digit and 18-digit PRC citizen ID numbers:
- The 15-digit ID applies only to persons born before Oct 1st, 1989
- The 18-digit ID applies only to persons at least 16 years old
These are the parts contained in the ID number:
15 digit ID:
area code - 6 digits
date of birth - format yymmdd - 6 digits
sequence number - 3 digits
NOTE: The 15 digit ID number does NOT have a checksum
18 digit ID:
area code - 6 digits
date of birth - format yyyymmdd - 8 digits
sequence number - 3 digits
checksum - 1 digit or X
Example
Caller says | MEANING key |
---|---|
三 八 零 零 零 零 幺 九 幺 九 幺 幺 零 四 零 零 八 九 | 380000191911040089 |
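The field layout listed above can be illustrated with a small Python helper that splits a returned MEANING value into its parts. The name `split_citizen_id` and the dictionary keys are hypothetical. The sketch only separates the fields; it does not validate the date or the checksum.

```python
ASCII_DIGITS = set("0123456789")


def split_citizen_id(id_number):
    """Split a 15- or 18-character citizen ID into its documented fields."""
    if len(id_number) == 15 and set(id_number) <= ASCII_DIGITS:
        return {
            "area_code": id_number[:6],
            "date_of_birth": id_number[6:12],   # yymmdd
            "sequence_number": id_number[12:15],
            "checksum": None,                    # 15-digit IDs have no checksum
        }
    if (len(id_number) == 18
            and set(id_number[:17]) <= ASCII_DIGITS
            and (id_number[17] in ASCII_DIGITS or id_number[17] == "X")):
        return {
            "area_code": id_number[:6],
            "date_of_birth": id_number[6:14],   # yyyymmdd
            "sequence_number": id_number[14:17],
            "checksum": id_number[17],           # one digit or X
        }
    raise ValueError("expected a 15-digit or 18-character citizen ID")


print(split_citizen_id("380000191911040089"))
```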
creditcard built-in grammar
The creditcard grammar understands a caller saying a credit card number, optionally preceding the number with the credit card name, or the words “account number” ( 账号 ) or “account” ( 账户 ). For example, a caller can say, “visa account number four seven six four…” ( 维萨 卡 账号 四 七 六 四 ), “mastercard five two seven eight…” ( 万事达 卡 五 二 七 八 ), or “three seven three five…” ( 三 七 三 五 ).
The following card types are allowed by default: China Unionpay Card, Mastercard, Visa, JCB.
To allow other card types, add them to the default card tags (“mastercard+visa+cup+jcb”) on your grammar load line, joined by + signs. For example:
[credit card grammar]?SWI_vars.typesallowed=mastercard+visa+dinersclub+private+amex+discover+jcb
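As a minimal sketch, the value for SWI_vars.typesallowed could be assembled like this. The helper name `typesallowed_value` is hypothetical, and the default tag list is taken from the text above; `[credit card grammar]` remains a placeholder for your actual grammar reference.

```python
# Default card tags as listed in the text above.
DEFAULT_CARD_TYPES = ["mastercard", "visa", "cup", "jcb"]


def typesallowed_value(extra_types):
    """Join the default card tags and any extra tags with + signs."""
    tags = list(DEFAULT_CARD_TYPES)
    for tag in extra_types:
        if tag not in tags:  # avoid listing a tag twice
            tags.append(tag)
    return "+".join(tags)


# "[credit card grammar]" is a placeholder, as in the example above.
print("[credit card grammar]?SWI_vars.typesallowed="
      + typesallowed_value(["amex", "discover"]))
```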
currency built-in grammar
The currency grammar collects currency amounts using 元 , 角 , and 分 .
Returned keys in resultData
Key | Description |
---|---|
MEANING | Contains a string in this form: CNYmain_unit_amount.subunit_amount. If the caller omits the main unit or subunit amount, then that field is zero. The string contains a leading zero if the subunit amount is collected without the main unit. |
Examples
Caller says | MEANING |
---|---|
五 块 | CNY5.00 |
五 分 | CNY0.05 |
五 块 零 五 分 | CNY5.05 |
五 块 两 毛 五 分 | CNY5.25 |
六 十 二 万 五 千 四 百 六 十 四 块 | CNY625464.00 |
一 块 | CNY1.00 |
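The MEANING format shown in the table can be reproduced with a short Python sketch. The function name `currency_meaning` and its parameters (`yuan`, `jiao`, `fen`) are hypothetical; the sketch assumes one 角 (or 毛) is ten 分, which matches the 五 块 两 毛 五 分 example.

```python
def currency_meaning(yuan=0, jiao=0, fen=0):
    """Format an amount as CNY<main unit>.<two-digit subunit>."""
    subunit = jiao * 10 + fen  # one jiao (毛) is ten fen (分)
    if yuan < 0 or not 0 <= subunit <= 99:
        raise ValueError("amount out of range")
    # An omitted field defaults to zero, so 五 分 becomes CNY0.05.
    return f"CNY{yuan}.{subunit:02d}"


print(currency_meaning(yuan=5, jiao=2, fen=5))
```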
date built-in grammar
The date grammar accepts a date spoken in any of several formats.
Recognized phrases include:
- 六 月 四 号
- 二 零 零 一 年 六 月 四 日
- 四 号
- 六 月 四 号 星期 一
Because the grammar does not know the current date, it returns question marks (?) wherever the caller omits information (see the examples and the discussion of return keys for more information).
The grammar also accepts these utterances, which return values of -2, -1, 0, +1, and +2 respectively in the MEANING key:
- 前 天
- 昨 天
- 今 天
- 明 天
- 后 天
Note that to be understood, at least the day of the month must be present. Phrases like these are not understood:
- 下 星期 三
- 二 零 零 一 年 六 月
There is no validity checking on the date, either for day-of-week or days-in-month validity. For example, “二 零 零 一 年 七 月 四 号 星期 一” is not automatically rejected even though July 4, 2001 is a Wednesday. Similarly, " 四 月 三 十 一 日 " is not rejected even though April has only 30 days.
The grammar also recognizes “民 国” years, which are counted from the establishment of the Republic of China; the Gregorian year is the 民国 year plus 1911. So “民国 九 十 年” is the year 2001.
Examples
Caller says | MEANING key |
---|---|
今 天 | 0 |
明 天 | +1 |
昨 天 | -1 |
后 天 | +2 |
前 天 | -2 |
一 号 | ??????01 |
十 二 月 四 号 星期 三 | ????1204 |
十 二 月 四 号 | ????1204 |
四 号 | ??????04 |
十 五 日 | ??????15 |
二 零 零 零 年 一 月 一 号 | 20000101 |
二 零 零 一 年 一 月 一 号 | 20010101 |
Here is an utterance that does not parse when spoken by callers:
Caller says | Reason for not being recognized |
---|---|
四 号,一月三号 | The grammar does not recognize corrections inside an utterance. |
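The placeholder convention in the MEANING key (question marks for omitted fields) can be sketched as follows. The names `date_meaning` and `RELATIVE_DAYS` are hypothetical. Like the grammar, this sketch does not check days-in-month validity.

```python
# Relative-day utterances and their MEANING values, as listed above.
RELATIVE_DAYS = {"前 天": "-2", "昨 天": "-1", "今 天": "0", "明 天": "+1", "后 天": "+2"}


def date_meaning(day, month=None, year=None):
    """Build a YYYYMMDD string, using ? for fields the caller omitted."""
    if not 1 <= day <= 31:
        raise ValueError("the day of the month must be present and in 1..31")
    yyyy = f"{year:04d}" if year is not None else "????"
    mm = f"{month:02d}" if month is not None else "??"
    return f"{yyyy}{mm}{day:02d}"


print(date_meaning(4, month=12))
```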
digits built-in grammar
Valid characters are 零 一 二 三 四 五 六 七 八 九 . The digit " 一 " can be pronounced as either “yi1” or “yao1.”
Characters need to be space separated.
Examples
Caller says | MEANING key |
---|---|
零 | 0 |
一 | 1 |
零 一 二 三 四 五 六 七 八 九 | 0123456789 |
Here are examples of utterances that do not parse when spoken by callers:
Caller says | Reason for not being recognized |
---|---|
十 二 | Natural numbers are not recognized by this grammar |
number built-in grammar
The number grammar recognizes numbers spoken as natural numbers, for example 十 二 for 12. (The caller must not speak the number as individual digits.)
Up to two decimal places are recognized by default; this can be extended to 9 using the maxdecimal parameter. The caller must speak individual digits after the decimal point (natural numbers not allowed).
In order to allow negative values the parameter SWI_vars.minallowed must be used with a value below the lowest number that has to be accepted.
Number values in transcriptions need to be separated by spaces.
Examples
Caller says | MEANING key |
---|---|
十 二 | 12 |
二 十 一 | 21 |
二 十 二 | 22 |
三 十 | 30 |
一 百 零 一 | 101 |
四 百 二 十 | 420 |
四 百 二 | 420 |
三 千 零 二 | 3002 |
一 万 两 千 三 百 四 十 五 | 12345 |
一 百 二 十 三 | 123 |
负 四 | -4 |
十 四 点 五 六 | 14.56 |
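To make the mapping in the table concrete, here is a rough Python sketch that converts a space-separated transcription into the corresponding MEANING string. It is an illustration only, covering values up to the 万 range; the function names are hypothetical, and it handles the colloquial short form (四 百 二 for 420) by scaling a trailing digit that directly follows a unit.

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}


def parse_integer(tokens):
    """Parse natural-number tokens such as 一 百 零 一 into an int."""
    if not tokens:
        raise ValueError("no integer part")
    total = 0       # value of completed 万 groups
    section = 0     # value accumulated since the last 万
    pending = None  # digit not yet attached to a unit
    pending_follows_unit = False
    last_unit = 1
    prev_was_unit = False
    for tok in tokens:
        if tok in DIGITS:
            pending = DIGITS[tok]
            pending_follows_unit = prev_was_unit
            prev_was_unit = False
        elif tok in UNITS:
            # A bare unit such as 十 counts as one times that unit.
            section += (1 if pending is None else pending) * UNITS[tok]
            last_unit, pending, prev_was_unit = UNITS[tok], None, True
        elif tok == "万":
            total += (section + (pending or 0)) * 10000
            section, pending, last_unit, prev_was_unit = 0, None, 10000, True
        else:
            raise ValueError(f"unexpected token: {tok!r}")
    if pending is not None:
        # Colloquial short form: in 四 百 二 the trailing 二 means twenty.
        section += pending * (last_unit // 10 if pending_follows_unit else 1)
    return total + section


def number_meaning(transcription):
    """Return a MEANING-style string for a spoken number."""
    tokens = transcription.split()
    sign = ""
    if tokens and tokens[0] == "负":
        sign, tokens = "-", tokens[1:]
    if "点" in tokens:
        i = tokens.index("点")
        int_tokens, dec_tokens = tokens[:i], tokens[i + 1:]
    else:
        int_tokens, dec_tokens = tokens, []
    result = sign + str(parse_integer(int_tokens))
    if dec_tokens:
        # Digits after the decimal point are spoken one at a time.
        result += "." + "".join(str(DIGITS[t]) for t in dec_tokens)
    return result


print(number_meaning("一 万 两 千 三 百 四 十 五"))
```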
phone built-in grammar
The following telephone numbers are supported:
- Landline numbers with area code
  - Area code always starts with 0
  - Length of area code between 3 and 7 digits
  - Local number - 7 to 8 digits long - never starting with 0
  - Optional: Extension - default value between 1 and 9999
- Landline numbers without area code
  - Local number - 7 to 8 digits long - never starting with 0
  - Optional: Extension - default value between 1 and 9999
- Cellular numbers - length 11 digits - starting with 13, 15 or 18
- Special (emergency) numbers: 110, 112, 114, 119, 120 and 122
- Service numbers:
  - 5 digit numbers, starting with 1 or 9
  - 6 digit number: 118114
  - 400 or 800 numbers - starting with 400 or 800 - length 10 digits
The variable SWI_vars.typesallowed can be used to switch on or off the following phone number groups:
Tag to use | Related phone number section |
---|---|
landline | landline numbers |
cellular | cellular numbers |
service | service numbers |
special | special numbers |
By default all groups above are allowed.
To restrict the phone grammar to selected groups only you should call the grammar using a SWI_vars setting.
Sample:
Only allow cellular and service numbers:
[phone grammar]?SWI_vars.typesallowed=cellular+service
The caller must speak each digit one at a time. The grammar does not allow natural number phrases such as “三 百 二 十 四 五 十 五 七 十 二.”
Examples
Caller says | MEANING key |
---|---|
一 一 零 | 110 |
零 三 五 零 一 二 三 四 五 六 七 | 03501234567 |
零 三 五 零 一 二 三 四 五 六 七 转 分机 二 零 三 六 | 03501234567x2036 |
Here are examples of utterances that do not parse when spoken by callers:
Caller says | Reason for not being recognized |
---|---|
零 三 五 零 一 二 三 四 五 六 七 转 分机 二 零 三 六 六 | Extensions cannot be longer than 4 digits |
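The number groups described in this section can be approximated with a few regular expressions. The sketch below classifies a digit string (without extension) into one of the four typesallowed tags. The function name `classify_phone` is hypothetical, and placing 400 and 800 numbers in the service group is an assumption about how the list above is nested.

```python
import re

SPECIAL_NUMBERS = {"110", "112", "114", "119", "120", "122"}


def classify_phone(digits):
    """Return the phone number group for a digit string, or None."""
    if digits in SPECIAL_NUMBERS:
        return "special"
    if (re.fullmatch(r"[19][0-9]{4}", digits)          # 5 digits, starts with 1 or 9
            or digits == "118114"
            or re.fullmatch(r"(400|800)[0-9]{7}", digits)):  # assumed service group
        return "service"
    if re.fullmatch(r"1[358][0-9]{9}", digits):         # 11 digits, starts 13/15/18
        return "cellular"
    # Optional area code (0 plus 2 to 6 digits), then a 7- or 8-digit
    # local number that never starts with 0.
    if re.fullmatch(r"(0[0-9]{2,6})?[1-9][0-9]{6,7}", digits):
        return "landline"
    return None


print(classify_phone("03501234567"))
```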
postcode built-in grammar
The postcode grammar recognizes valid postal code in Mainland China in six-digit format.
time built-in grammar
The time grammar recognizes a time of day.
The grammar accepts spoken time utterances from the caller. Recognized phrases include times given in 12-hour format (for example, 五 点 ) and 24-hour format ( 二 十 三 点 ). In addition, it recognizes “qualified” times such as " 五 点 以前 " and “大约 五 点.”
Examples
For each entry, the values returned in the MEANING and QUALIFIER keys are shown. (Not shown are the values of the HOUR, MINUTE, and AMPM keys.)
Caller says | MEANING key | QUALIFIER key |
---|---|---|
正 午 | 1200p | exact |
午 夜 | 1200a | exact |
中 午 以前 | 1200p | before |
八 点 半 | 0830? | exact |
晚上 七 点 一 刻 | 0715p | exact |
凌 晨 一 点 | 0100a | exact |
二 十 四 点 | 0000h | exact |
一 点 十 分 | 0110? | exact |
一 点 一 刻 | 0115? | exact |
大约 一 点 一 刻 | 0115? | approx |
下 午 一 点 一 刻 | 0115p | exact |
早 上 一 点 一 刻 | 0115a | exact |
一 点 半 | 0130? | exact |
十 二 点 十 分 | 1210? | exact |
中 午 十 二 点 | 1200p | exact |
中 午 十 二 点 过 五 分 | 1205p | exact |
午 夜 十 二 点 过 五 分 | 1205a | exact |
中 午 十 二 点 差 五 分 | 1155a | exact |
午 夜 差 五 分 | 1155p | exact |
Here are examples of utterances that do not parse when spoken by callers:
Caller says | Reason for not being recognized |
---|---|
一 点 六 十 | Minute values must be less than 60. |
三 | Ambiguous. The caller must say “san1 dian3.” |
三 点 二 | Incomplete phrase. The caller must say something like “san1 dian3 er4 fen1” or “san1 dian3 er4 shi2.” |
现 在 | (Phrase not recognized) |
半 小时以内 | (Phrase not recognized) |
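The MEANING values in the examples table follow an hhmm pattern with a one-character suffix. Reading the table, the suffix appears to be a for a.m., p for p.m., h for 24-hour times, and ? when the caller gives no indication; that interpretation is an inference from the examples. A hedged sketch with a hypothetical function name:

```python
def time_meaning(hour, minute=0, marker="?"):
    """Format a time as hhmm plus a suffix (a, p, h, or ?)."""
    if marker not in ("a", "p", "h", "?"):
        raise ValueError("marker must be one of a, p, h, ?")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    if marker == "h":
        if not 0 <= hour <= 24:
            raise ValueError("24-hour times use hours 0..24")
        hour %= 24  # 二 十 四 点 is reported as 0000h
    elif not 1 <= hour <= 12:
        raise ValueError("12-hour times use hours 1..12")
    return f"{hour:02d}{minute:02d}{marker}"


print(time_meaning(7, 15, "p"))
```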
Vocabulary items and pronunciations
This chapter describes considerations for vocabularies and their pronunciations in Mandarin (zh-CN). Your product documentation covers details about how to work with pronunciations and dictionaries.
Mandarin pronunciations
This section provides detailed reference information to help create pronunciation dictionaries. It is intended for people who have sufficient knowledge of the Mandarin language as spoken in Mainland China. It provides information about transcription and pronunciation, and all the phonemes and their Nuance symbols used in the language. If you are not sure how a certain word is pronounced you can refer to the IPA transcriptions and then convert them into the Nuance symbols. (See The Mandarin symbol set in alphabetical order .)
The Mandarin phoneme system
The Mandarin phoneme system can be divided into two groups:
- Initials
- Finals
According to Li & Thompson initials and finals are defined as:
> The initial represents the consonantal beginning of a syllable. Since Mandarin does not have consonant clusters (sequences of consonants), the consonantal beginning of a syllable can only be a single consonant. There are, however, Mandarin syllables that do not have initial consonant. For those syllables the tradition is to describe their initials as “zero.” …The final is the part of the syllable excluding the initial.
>
> (Li, Charles N. and Sandra A. Thompson. 1981. Mandarin Chinese: A Functional Reference Grammar. C.A: University of California. Reprinting in Taipei: Crane, 1997.)
There are 24 initials (including the empty initial, and initial /u/ and /i/, which are represented as /w/ and /j/) and 36 finals in Mandarin.
The Mandarin phonetic system is quite complex since it has not only initials and finals but also a unique tone system and tone sandhi. After introducing the phonetic system, we will discuss the tone system in detail.
To comply with phonetic conventions, we refer to the initials still as consonants since they don’t differ in any systematic way.
Furthermore, it is possible to distinguish six different types of Mandarin consonants:
- Plosives
- Fricatives
- Affricates
- Glides
- Nasals
- Liquids
Mandarin symbol set grouped by phoneme classes
The following table shows all phonemes used in Mandarin transcriptions. They are listed according to their phoneme classes with their Nuance and Pinyin representations.
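Phoneme class | Nuance | IPA | Pinyin | Examples | Characters |
---|---|---|---|---|---|
Consonants: Plosives | b | b | b | ba1 | |
Consonants: Plosives | p | p h | p | pi2 | 皮 |
Consonants: Plosives | d | d | d | da4 | 大 |
Consonants: Plosives | t | t h | t | tai2 | 台 |
Consonants: Plosives | g | g | g | guo2 | 国 |
Consonants: Plosives | k | k h | k | kou3 | 口 |
Consonants: Fricatives | f | f | f | feng1 | 风 |
Consonants: Fricatives | s | s | s | san1 | 三 |
Consonants: Fricatives | S | s̘ | sh | shui3 | 水 |
Consonants: Fricatives | x | ç | x | xiao3 | 小 |
Consonants: Fricatives | h | h | h | he2 | 河 |
Consonants: Affricates | c | ts h | c | cun1 | |
Consonants: Affricates | C | ts̘ h | ch | cha2 | 茶 |
Consonants: Affricates | q | tçh | q | qi1 | 七 |
Consonants: Affricates | j | dʐ | j | jing1 | 京 |
Consonants: Affricates | z | dz | z | zi3 | 子 |
Consonants: Affricates | Z | dʐ | zh | zhong1 | 中 |
Consonants: Glides | w | u | w | wang2 | 王 |
Consonants: Glides | y | i | y | you3 | 有 |
Consonants: Glides | H | y | y | yuan1 | 渊 |
Consonants: Nasals | m | m | m | men2 | 门 |
Consonants: Nasals | n | n | n | nan2 | 南 |
Consonants: Nasals | N | n | n | tian1 | 天 |
Consonants: Nasals | G | ɳ | ng | shang4 | 上 |
Consonants: Liquids | l | l | l | lu4 | 路 |
Consonants: Liquids | r | r | r | ren2 | 人 |
Vowels | a | a a/ɑ | a | da4, shan1 | 大, 山 |
Vowels | @ | ʌ ə ʌ | e | de2, men2, feng1 | 德, 门, 风 |
Vowels | $ | ɜ ɨ | i | shi4, si4 | 士, 四 |
Vowels | i | i ɨ | i | yi1, xin1 | 一, 心 |
Vowels | wo | ɔ uɔ | o uo | bo2, guo2 | 伯, 国 |
Vowels | u | u y | u | wu3, qu4 | 五, 去 |
Vowels | e | ɛ | e | xie4 | 谢 |
Vowels | y | i | i | xia4 | 下 |
Vowels | o | ʊ | o | gong1 | 工 |
Vowels | w | u | u | chuan2 | 船 |
Vowels | v | u: | u: / v | nv3 | 女 |
Vowels | I | ai | ai | sai4 | 赛 |
Vowels | A | aʊ | ao | zao3 | 早 |
Vowels | E | ei | ei | bei3 | 北 |
Vowels | O | ou | ou | kou3 | 口 |
Vowels | R | r | r | er4 | 二 |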
Mandarin consonants
The standard Mandarin consonants system is considered to have:
- Six plosives
- Five fricatives
- Six affricates
- Four nasals
- Three glides
- Two liquids
Mandarin Chinese has a primary distinction of obstruents (stops/plosives, affricates, fricatives) and sonorants (nasals, liquids and semivowels). The obstruents are all voiceless, the sonorants all voiced. The stops and affricates fall into two contrasting series, one unaspirated, the other aspirated. The unaspirated series (/b/, /d/, /g/, etc.) often gives the impression of being voiced to the untrained ear. The second series (/p/, /t/, /k/, etc.) is strongly aspirated.
Plosives
There are three aspirated and three unaspirated plosives in Mandarin, which can be arranged in pairs as shown here:
Unaspirated | Pinyin | Nuance | Character | Aspirated counterpart |
---|---|---|---|---|
/b/ | bai2 | /bI2/ | 白 | /p/ |
/d/ | du1 | /dO1/ | 都 | /t/ |
/g/ | gao3 | /gA3/ | 搞 | /k/ |
Fricatives
There are five fricatives in the Mandarin Nuance symbol set, /S/, /s/, /f/, /h/, and /x/.
Symbol | Pinyin | Nuance | Character | Gloss |
---|---|---|---|---|
/S/ | shuan1 | /Swa1n/ | 拴 | tie |
/s/ | suan1 | /swa1n/ | 酸 | sour |
/f/ | fu2 | /fu2/ | 福 | luck |
/h/ | hao3 | /hA3/ | 好 | good |
/x/ | xin1 | /xi1n/ | 心 | heart |
Affricates
In Mandarin there are six affricates, /c/, /C/, /q/, /j/, /z/, and /Z/.
Unaspirated | Pinyin | Nuance | Character | Aspirated counterpart |
---|---|---|---|---|
/z/ | zu2 | /zu2/ | 捽 | /c/ |
/Z/ | zhu1 | /Zu1/ | 朱 | /C/ |
/j/ | jing4 | /ji4G/ | 静 | /q/ |
Nasals
There are four nasals in Mandarin, /m/, /n/, /N/ and /G/.
Symbol | Pinyin | Nuance | Character | Gloss |
---|---|---|---|---|
/m/ | ma3 | /ma3/ | 马 | horse |
/n/ | na2 | /na2/ | 拿 | take |
/N/ | tian1 | /tye1N/ | 天 | heaven |
/G/ | shang4 | /Sa4G/ | 上 | up |
Glides
There are three glides in Mandarin, /y/, /w/, and /H/.
Symbol | Pinyin | Nuance | Character | Gloss |
---|---|---|---|---|
/y/ | you3 | /yO3/ | 有 | to have |
/w/ | wang2 | /wa2G/ | 王 | king |
/H/ | yuan1 | /He1n/ | 渊 | abyss |
Note that if the glides “w” and “y” are followed by /u/ and /i/ respectively, they are merged into a single /u/ or /i/ in the Nuance transcription.
For example, 无 is wu2 in Pinyin but /u2/ in the Nuance transcription.
Liquids
There are two liquids in Mandarin, /l/ and /r/.
Symbol | Pinyin | Nuance | Character | Gloss |
---|---|---|---|---|
/l/ | lu4 | /lu4/ | 路 | road |
/r/ | rou2 | /rO2/ | 柔 | gentle |
/r/ appears only at initial position, articulated slightly postalveolarized.
Mandarin vowels
The Mandarin vowels refer to the finals they correspond to.
Vowels
There are eleven vowels in Mandarin:
Nuance | IPA | Pinyin | Examples | Characters | Gloss | Nuance transcription |
---|---|---|---|---|---|---|
$ | ɜ ɨ | i | shi4, si4 | 士, 四 | scholar, four | /S$4/, /s$4/ |
@ | ʌ ə ʌ | e | de2, men2, feng1 | 德, 门, 风 | virtue, door, wind | /d@2/, /m@2n/, /f@1G/ |
a | a a/ɑ | a | da4, shan1 | 大, 山 | big, mountain | /da4/, /Sa1n/ |
e | ɛ | e | xie4 | 谢 | thanks | /xye4/ |
v | u: | u: / v | nv3 | 女 | female | /nv3/ |
i | i ɨ | i | yi1, xin1 | 一, 心 | one, heart | /i1/, /xi1n/ |
o | ʊ | o | gong1 | 工 | work | /go1G/ |
u | u y | u | wu3, qu4 | 五, 去 | five, to go | /u3/, /qu4/ |
w | u | u | chuan1 | 穿 | wear | /Cwa1N/ |
wo | ɔ | o | bo2 | 伯 | uncle | /bwo2/ |
y | i | i | xia4 | 下 | down | /xya4/ |
Diphthongs
There are five diphthongs in Mandarin.
Nuance | IPA | Pinyin | Example | Character | Gloss | Nuance transcription |
---|---|---|---|---|---|---|
/wo/ | uɔ | uo | guo2 | 国 | country | /gwo2/ |
/I/ | ai | ai | sai4 | 赛 | contest | /sI4/ |
/A/ | aʊ | ao | zao3 | 早 | early | /zA3/ |
/E/ | ei | ei | bei3 | 北 | north | /bE3/ |
/O/ | ou | ou | kou3 | 口 | mouth | /kO3/ |
Semivowel
There is one semivowel in Mandarin:
Nuance | IPA | Pinyin | Example | Character | Gloss | Nuance transcription |
---|---|---|---|---|---|---|
/R/ | r | r | er4 | 二 | two | /R4/ |
The system of initials and finals
After describing the Mandarin phonetic system in a canonical way, we now introduce the initials/finals system as it is more commonly used to describe the Mandarin phonetic system. There are twenty-four initials in Mandarin. As mentioned above, since there is no consonant cluster in Mandarin, the initials basically correspond to the consonants.
However, this is not the case for the finals. The Mandarin final is generally a larger unit than a vowel. It includes the medial, the main vowel, and the remaining syllabic ending, as in ing ‘iG’. The medial and the ending are both optional; only the main vowel is required. The finals are listed in the following tables.
Mandarin initials
Pinyin | b | p | m | f | t | n | l | z | c | s | |||
Nuance | b | p | m | f | t | n | l | z | c | s | |||
Pinyin | zh | ch | sh | r | w | y | q | x | g | k | h | d | null initial |
Nuance | Z | C | S | r | w / u* | y / i* | q | x | g | k | h | d |
- Note that if the glides “w” and “y” are followed by respectively /u/ and /i/ then in the Nuance transcription they are merged to a single /u/ and /i/ respectively.
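The initials table and the merge rule in the note above can be sketched as a lookup in Python. The function name `split_syllable` is hypothetical, and the input is assumed to be a Pinyin syllable ending in a tone digit. The entry for j is taken from the affricates in the symbol set, since the initials table above does not list it. Only the initial is mapped; the final is returned as Pinyin.

```python
# Pinyin initial -> Nuance symbol, following the initials table.
# Assumption: j maps to j, as in the affricate list of the symbol set.
NUANCE_INITIALS = {
    "b": "b", "p": "p", "m": "m", "f": "f", "d": "d", "t": "t", "n": "n",
    "l": "l", "z": "z", "c": "c", "s": "s", "zh": "Z", "ch": "C", "sh": "S",
    "r": "r", "w": "w", "y": "y", "j": "j", "q": "q", "x": "x",
    "g": "g", "k": "k", "h": "h",
}


def split_syllable(syllable):
    """Return (Nuance initial, Pinyin final, tone) for e.g. 'zhong1'."""
    body, tone = syllable[:-1], int(syllable[-1])
    initial = ""
    for length in (2, 1):  # try two-letter initials (zh, ch, sh) first
        if body[:length] in NUANCE_INITIALS:
            initial = body[:length]
            break
    final = body[len(initial):]
    symbol = NUANCE_INITIALS.get(initial, "")  # "" is the null initial
    # Merge rule: w before u, and y before i, are not written separately.
    if (initial == "w" and final.startswith("u")) or \
            (initial == "y" and final.startswith("i")):
        symbol = ""
    return symbol, final, tone


print(split_syllable("zhong1"))
```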
Mandarin finals without glides
Pinyin | a | e | i | o | u | u: / v | er | ||
Nuance | a | @ | i | wo | u | v | R | ||
Pinyin | ai | ei | ao | ou | an | en | ang | eng | ong |
Nuance | I | E | A | O | a_n | @_n | a_G | @_G | v_G |
Mandarin finals with glides
Pinyin | ia | iao | ie | iou/iu | ian | in | iang | ing | iong | ua |
Nuance | ja_ | ja_v_ | je_ | jO_ | je_m | i_m | ja_G | i_G | jv_G | wa_ |
Pinyin | uo | uai | uei/ui | uan | uen/un | uang | ueng | u:e / ve | u:an / van | u:n / vn |
Nuance | wo_ | wI_ | wE_ | wa_n | w@_n | wa_G | wv_G | He_ | He_n | v_n |
- Replace the underscore (_) in the above notations with one of the tonal symbols 1–5.
- The combinations wu and yi at initial position are handled differently in Nuance transcriptions: they are merged into /u/ and /i/ respectively.
- In standard Pinyin forms, when “ou” is with “i,” “iu” but not “iou” is used. When “ei” and “en” are with “u,” “ui,” and “un,” but not “uei” and “uen,” are used. When “eng” is with “u:,” “iong” is used instead.
- In Pinyin forms, when “eng” is with “u,” preceded by initials, “ueng” is replaced by “ong.”
- In Pinyin forms, when “u:” is preceded by “j,” “q,” and “x,” “u” is used instead, as in “ju,” “qu,” and “xu.” On the other hand, when “u:” is preceded by “n” and “l,” “u:” is kept the same without any change as in “nu:” and “lu:.”
- The empty finals are not included in the table.
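As a small illustration of the first note, the underscore in a final's notation can be filled in with a tone symbol like this. The function name `apply_tone` is hypothetical, and the templates used are entries copied from the tables above.

```python
def apply_tone(template, tone):
    """Replace the underscore placeholder in a final with a tone symbol 1-5."""
    if tone not in (1, 2, 3, 4, 5):
        raise ValueError("tone must be one of 1-5")
    if "_" not in template:
        raise ValueError("template has no tone placeholder")
    return template.replace("_", str(tone))


# "wa_G" is the table entry for the final uang; with initial w merged in,
# wang2 is transcribed /wa2G/ elsewhere in this document.
print(apply_tone("wa_G", 2))
```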
Rhotacization
In addition to the primary set of finals discussed above, Standard Chinese also possesses a series of “rhotacized” (er2-hua4) finals; morphemically these finals consist of one of the primary finals followed by the subsyllabic suffix -r. The effects of the processes are several:
- the syllabic endings i and n are dropped
- front vowels become centralized
- final ng fuses with r to form a nasalized retroflexed vowel
Mandarin tone system
Mandarin is a tonal language, in which tone pitches are obligatory to the construction of a syllable. There are four tones in Mandarin, and every syllable must be assigned one of them. Tone pitch has the same status as initials and finals: if the wrong pitch is used, listeners can no longer identify the syllable. Chao introduced a method for describing the Mandarin tone system that divides the pitch range into five levels, where 5 is the highest and 1 is the lowest. According to Chao’s system, the Mandarin four-tone system can be described as follows:
- The first tone is high level. It is relatively constant in its intensity (loudness). In Chao’s notation it is [55]. Examples: ge1 “song,” ting1 “listen.”
- The second tone is high rising. It begins in about the middle of the speaker’s normal speaking range and rises abruptly to the top of his range. It tends to rise in intensity toward the end and is short in duration. It can be represented as [35] on the tonal scale. Examples: na2 “hold in the hand,” jiao2 “chew,” ling2 “zero.”
- The third tone has two basic variants. When it is pronounced in isolation, it begins low, falls to the very bottom limit of the voice and then rises to a half-high level. It can be described as [214] on the tonal scale. When the third tone occurs before any tone except another third tone, it becomes what is commonly known as a “half third;” that is, it loses its final rise and remains low throughout: [21].
- The fourth tone is high falling. It begins at the top of the speaker’s pitch range and falls abruptly to the bottom limit of his voice. Its value on the tonal scale is [51]. Examples: liu4 “six,” nen4 “tender,” qu4 “go.”
(Chao, Yuan Ren. 1968. A Grammar of Spoken Chinese. Berkeley and Los Angeles: University of California.)
The Mandarin tonal system can be summarized in the following table:
Tone | Description | Pitch | Example |
---|---|---|---|
1 | High Level | 55 | 一 |
2 | High Rising | 35 | 移 |
3 | Falling-Rising/Falling | 214/21 | 以 |
4 | High Falling | 51 | 亿 |
In addition to the four tones, a neutral tone is included in the standard Mandarin.
Tone sandhi
One of the most interesting tonal phenomena in Mandarin is tone sandhi, which is described as the change of tones when syllables are in sequence. That is, a syllable has one of the tones in isolation, and the same syllable may take on a different tone without any change in meaning when it is followed by another syllable. The most important and common tone sandhi rule involves the third tone. For example, both jiu3 “nine” and wu3 “five” are syllables with third tones. When they are in sequence, jiu3 wu3 “nine five / ninety-five,” jiu3 would be changed from third tone to be second tone, jiu2.
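The third-tone rule described above can be sketched in Python. This is a deliberate simplification: it changes every third tone that is directly followed by another third tone, and does not model how longer sequences depend on phrase structure. The function name is hypothetical, and syllables are assumed to be Pinyin with a trailing tone digit.

```python
def third_tone_sandhi(syllables):
    """Change a third tone to a second tone when another third tone follows."""
    result = list(syllables)
    for i in range(len(syllables) - 1):
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            result[i] = syllables[i][:-1] + "2"
    return result


print(third_tone_sandhi(["jiu3", "wu3"]))
```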
Specific pronunciation transcription methods
Transcription of the retroflexes
In some cases, symbols with upper or lower case are used to distinguish retroflexes from their non-retroflex counterpart in Mandarin. /S/, /Z/, and /C/ are used to represent “retroflex,” and /s/, /z/, and /c/, non-retroflex.
Pinyin | Nuance (retroflex) | Character | Gloss | Pinyin | Nuance (non-retroflex) |
---|---|---|---|---|---|
shi4 | /S$4/ | 是 | yes | si4 | /s$4/ |
zhu3 | /Zu3/ | 主 | master | zu3 | /zu3/ |
chi2 | /C$2/ | 持 | hold | ci2 | /c$2/ |
Transcription for sounds with distinct positions / contexts
Symbols which are used to represent sounds with distinct positions or contexts are introduced in this section.
Transcription of liquids
Symbol “R” is used only at syllable-final position, as a semivowel in Mandarin. The liquid, which appears only at initial position and is articulated slightly postalveolarized, is represented by “r.”
Symbol | Pinyin | Nuance |
---|---|---|
R | er2 | / R 2/ |
r | ren2 | / r @2n/ |
Transcription of glides
Symbols “w” and “y” represent glides in Mandarin. They are used in medial position (as in chuan2 and xia4) and at initial position (as in wang2 and you3). “i” is used when followed by a nasal.
Symbol | Pinyin | Nuance |
---|---|---|
w | chuan2 | /Cwa2N/ |
y | xia4 | /xya4/ |
i | xin1 | /xi1n/ |
w | wang2 | /wa2G/ |
y | you3 | /yO3/ |
Transcription of nasals and semivowels
Symbol “n” is used to represent the alveolar nasal at initial position, and “N” is used at final position, being a semivowel. Symbol “m” is used for the bilabial nasal, and “G” (being a semivowel in Mandarin) for the velar nasal.
Symbol | Pinyin | Nuance |
---|---|---|
n, N | nan2 | / n a2 N / |
Note that, as in European SAMPA, symbol “m” is used for the bilabial nasal; the velar nasal (a semivowel in Mandarin) is written “G.”
Symbol | Pinyin | Nuance |
---|---|---|
m, G | meng4 | / m @4 G / |
Pronunciation of foreign words
When there is a need to transcribe foreign words, the general rule is to transcribe those words with the same Nuance symbol set as the rest. If one uses a different symbol set, the system will be incapable of understanding the input.
However, every language has a different phoneme inventory, so one may have problems in covering each and every sound. For the most common case transcription examples are offered.
English words
English words are transcribed with letters and English Nuance symbols, except when English letters are pronounced as Chinese syllables.
For example, the English letters C and G are sometimes pronounced by Chinese speakers as ‘Ci:’ and ‘dCi:’, which fit the pronunciation of the Chinese syllables “xi1” and “ji1”:
C. ‘Ci:’ (pronounced as Chinese syllable “xi1”)
G. ‘dCi:’ (pronounced as Chinese syllable “ji1”)
In these cases Nuance symbols which fit into Chinese syllables are used to transcribe the sound.
Multiple pronunciations (variants)
The type of pronunciation used in Nuance and the Mandarin dictionary conforms to the standard non-regional Mandarin pronunciation. Since it is possible to have more than one pronunciation for a word by using pronunciation variants, it may be difficult to determine how many pronunciation variants should be created.
The general rule is that variants should only be created if the pronunciation differs in more than one phoneme. Systematic variants in some dialects such as fricative /S/ and /s/, affricate /Z/ and /z/, or semivowel /n/ and /G/ can usually be reflected in the training material for the phonemes, so they need not be covered by pronunciation variants. If such a word causes recognition errors, the creation of pronunciation variants may help to solve the problem. They should be transcribed as separated variants in the format below:
Base forms: Orthographic Form Phonetic Form
Variants: Orthographic<Phonetic Variant> Phonetic Variant
The phonetic transcription of a variant is appended to the orthographic form in angle brackets in order to differentiate between base and variant forms.
The same syntax could be also applied to the notations of neutral tone, tone sandhi and modified tones.
The Mandarin symbol set in alphabetical order
The following table shows the Mandarin symbol set in alphabetical order:
Nuance | IPA | Pinyin | Examples of Usage |
---|---|---|---|
$ | ɜ ɨ | i | sh i 4 s i 4 |
@ | ʌ ə ʌ | e | d e 2 m e n2 f e ng1 |
a | a a/ɑ | a | d a 4 sh a n1 |
A | aʊ | ao | z ao 3 |
b | b | b | b a1 |
c | ts h | c | c un1 |
C | ts̘ h | ch | c ha2 |
d | d | d | d a4 |
e | ɛ | e | xi e 4 |
E | ei | ei | b ei 3 |
f | f | f | f eng1 |
g | g | g | g uo2 |
G | ɳ | ng | sha ng 4 |
h | h | h | h e2 |
H | y | y | y uan1 y uan2 |
i | i ɨ | i | y i 1 x i n1 |
I | ai | ai | s ai 4 |
j | dʐ | j | j ing1 |
k | k h | k | k ou3 |
l | l | l | l u4 |
m | m | m | m en2 |
n | n | n | n an2 |
N | n | n | tia n 1 |
o | ʊ | o | g o ng1 |
O | ou | ou | k ou 3 |
p | p h | p | p i2 |
q | t ç h | q | q i1 |
r | r | r | r en2 |
R | r | r | e r 4 |
s | s | s | s an1 |
S | s̘ | sh | sh ui3 |
t | t h | t | t ai2 |
u | u y | u | w u 3 q u 4 |
v | u: | u: / v | nv3 |
w | u | w u | w ang2 ch u an2 |
wo | ɔ uɔ | o uo | b o 2 g uo 2 |
x | ç | x | x iao3 |
y | j i | y i | y ou3 x i a4 |
z | dz | z | z i3 |
Z | dʐ | zh | zh ong1 |
Automatic pronunciation module
The automatic pronunciation module is provided to pronounce words that are not in any dictionary.
The automatic pronunciation module supports pinyin and native characters:
- 1403 pinyin single syllables
- 20204 native single characters