Korean Korea (ko-KR)

This documentation was updated on January 29, 2024.

Creating grammars

The following subsections describe key issues for working with grammar documents in the Korean language.

Grammar file encoding

Nuance has full internal Unicode support. Create your grammars using EUC-KR or UTF-8. For example, your grammar header might be:

<?xml version='1.0' encoding='EUC-KR'?>
<grammar xml:lang="ko-KR" version="1.0" root="test">
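
The declared encoding must match the bytes actually written to disk. As a minimal sketch of keeping the two in sync, the Python helper below builds the header from the same encoding name it uses to encode the file. The function name `write_grammar` and the one-line rule body in the usage note are hypothetical; only the header attributes come from the example above.

```python
from pathlib import Path


def write_grammar(path, body, encoding="EUC-KR"):
    """Write a grammar file whose XML declaration names the encoding used on disk."""
    header = (
        f"<?xml version='1.0' encoding='{encoding}'?>\n"
        '<grammar xml:lang="ko-KR" version="1.0" root="test">\n'
    )
    text = header + body + "\n</grammar>\n"
    # Encoding explicitly raises UnicodeEncodeError if the grammar contains
    # a character that the chosen encoding cannot represent.
    Path(path).write_bytes(text.encode(encoding))
    return text
```

For example, `write_grammar("yes.grxml", '<rule id="test">예</rule>')` would produce an EUC-KR file, and passing `encoding="UTF-8"` would switch both the declaration and the bytes together.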

Boolean built-in grammar

The boolean grammar collects an affirmative or negative response.

Properties

The y and n parameters let you associate any two touchtone buttons as synonyms for yes and no.

Parameter Description
y The DTMF digit equivalent to “예” (y means “yes”). Default: 1.
n The DTMF digit equivalent to “아니오” (n means “no”). Default: 2.

Examples

Caller says… MEANING key
예, 네 true
아니오 false
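
The effect of the y and n parameters on touchtone input can be pictured with a small Python sketch. It only models the mapping described above; the function name `dtmf_to_boolean` is hypothetical and is not part of any Nuance API.

```python
def dtmf_to_boolean(key, y="1", n="2"):
    """Map a DTMF key to the boolean MEANING value, or None if the key is neither."""
    if y == n:
        raise ValueError("y and n must be different keys")
    if key == y:
        return "true"
    if key == n:
        return "false"
    return None
```

With the defaults, key 1 yields `true` and key 2 yields `false`; passing `y="7", n="9"` reassigns the two answers to other buttons.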

Ccexpdate built-in grammar

The ccexpdate grammar understands the expiration date on a credit card. Expiration dates are usually a month and a year, and are often embossed on a credit card in the form “mm/yy.” The grammar recognizes variations on the date; for example, “십 이 공 오,” “십 이 슬레시 공 오,” and “이 천 오 년 십 이 월.”

Some credit cards are stamped with a day of the month as well as the month and year; the ccexpdate grammar does not recognize days.

When speaking digits:

  • Callers can speak as follows: 영,공,일,이, …, 구, 하나, 둘
  • The following pronunciations are not allowed: 셋,넷,다섯,…, 아홉

Citizenid built-in grammar

The citizenid grammar understands 13-digit Korean Citizen ID numbers. For example, a caller can say, “육 일 공 사 이 육 일 삼 사 육 일 삼 칠.” Numbers with invalid formats for citizen IDs are rejected.

The advantage of using this grammar rather than a digits grammar (of length 13) is that identification numbers have constraints that significantly reduce the set of possible recognition hypotheses (and thus increase recognition accuracy).
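
This documentation does not list the constraints the grammar applies. As an illustration of the kind of constraint that narrows the hypothesis space, the Python sketch below checks two commonly cited properties of 13-digit resident registration numbers: a plausible month and day in the first six digits, and a weighted modulo-11 check digit. Treat both rules, and the name `looks_like_citizen_id`, as assumptions rather than a description of the grammar's internals.

```python
from datetime import date

# Weights applied to the first 12 digits in the commonly cited check-digit rule.
WEIGHTS = (2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5)


def looks_like_citizen_id(number):
    """Return True if a 13-digit string passes basic date and check-digit tests."""
    if len(number) != 13 or not (number.isascii() and number.isdigit()):
        return False
    digits = [int(ch) for ch in number]
    # Digits 3-6 are the birth month and day. The century is not checked,
    # so a leap year is used only to validate the month/day combination.
    try:
        date(2000, int(number[2:4]), int(number[4:6]))
    except ValueError:
        return False
    total = sum(w * d for w, d in zip(WEIGHTS, digits))
    return (11 - total % 11) % 10 == digits[12]
```

The example utterance above corresponds to the digit string 6104261346137, which passes both tests; changing its last digit makes the check fail.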

When speaking digits:

  • Callers can speak as follows: 영,공,일,이, …, 구, 하나, 둘
  • The following pronunciations are not allowed: 셋,넷,다섯,…, 아홉

Return keys/values

Upon return, the recognized number is assigned to the MEANING key.

Creditcard built-in grammar

The creditcard grammar understands a caller saying a credit card number, optionally preceding the number with the credit card name, or the words “카드 번호” or “번호.” For example, a caller can say, “비자 카드 번호 사 공 일 칠 …,” “마스터 카드 오 공 공 공 …,” or “사 공 일 칠 ….”

Currency built-in grammar

The currency grammar collects currency amounts using the Korean Won “원.” There are no currency sub-units (whole Won amounts only).

Return keys/values

MEANING Contains a string with the spoken value. It has the form KRWunit_amount, with no space between the currency code and the amount (for example, KRW8700).
SWI_literal Contains the exact text that was recognized.

Examples

Caller says MEANING key
팔 천 칠 백 원 KRW8700
천 백 원 KRW1100
만 원 KRW10000
일 만 원 KRW10000
이 십 만 삼 천 원 KRW203000
이 억 오 천 만 원 KRW250000000
영 원 KRW0
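
An application typically needs the amount as an integer. The Python sketch below decodes MEANING values of the form seen in the table above; the function name `parse_currency_meaning` is hypothetical.

```python
import re

# "KRW" followed by a whole won amount, as in the examples table.
_CURRENCY = re.compile(r"KRW(\d+)")


def parse_currency_meaning(meaning):
    """Return the won amount encoded in a currency MEANING value."""
    match = _CURRENCY.fullmatch(meaning)
    if match is None:
        raise ValueError(f"unrecognized currency MEANING: {meaning!r}")
    return int(match.group(1))
```

Because there are no currency sub-units, an integer is sufficient and no decimal handling is needed.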

Here are examples of utterances that do not parse when spoken by callers:

Caller says Reason for not being recognized
사 만 The monetary unit was not spoken.
공 원 Not a natural way of speaking. Caller should say “영 원”.
일 백 원 Not a natural way of speaking. Caller should say “백 원”.
일 천 원 Not a natural way of speaking. Caller should say “천 원”.

Date built-in grammar

The date grammar accepts a date spoken in any of several formats.

Below are examples of recognized phrases. The examples use Arabic numerals to make them easier to read, but the actual recognition phrases are entirely in Hangul:

6월 4일
2001년 6월 4일
6월 4일 월요일
4일

The grammar also accepts “그저께” (the day before yesterday), “어제” (yesterday), “오늘” (today), “내일” (tomorrow), and “모레” (the day after tomorrow), which return values of -2, -1, 0, +1, and +2, respectively, in the MEANING key.

Examples

Caller says MEANING key
그저께 -2
어제 -1
오늘 0
내일 +1
모레 +2
이 천 일 년 삼 월 구 일 20010309
천 구 백 팔 십 구 년 이 월 팔 일 19890208
칠 십 칠 년 유 월 십 사 일 ??770614
칠 월 이 십 육 일 토요일 ????0726
일 월 이 십 삼 일 ????0123
사 일 ??????04
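
The examples above suggest two MEANING shapes: a signed day offset for relative dates, and an eight-character string in which “?” stands for digits the caller did not speak. The Python sketch below decodes both under that reading. The two-character field split (century, year, month, day) and the name `parse_date_meaning` are assumptions made for illustration.

```python
import re

# Eight characters, each a digit or "?", read as century, year, month, day.
_ABSOLUTE = re.compile(r"([\d?]{2})([\d?]{2})([\d?]{2})([\d?]{2})")
# A single digit with an optional sign, as in -2, 0, +1.
_RELATIVE = re.compile(r"[+-]?\d")


def parse_date_meaning(meaning):
    """Decode a date MEANING value; fields the caller omitted come back as None."""
    match = _ABSOLUTE.fullmatch(meaning)
    if match:
        century, year, month, day = (
            int(field) if field.isdigit() else None for field in match.groups()
        )
        return {"century": century, "year": year, "month": month, "day": day}
    if _RELATIVE.fullmatch(meaning):
        # Relative dates are an offset in days from today.
        return {"offset_days": int(meaning)}
    raise ValueError(f"unrecognized date MEANING: {meaning!r}")
```

A caller who says only a month and day therefore yields `None` for both century and year, and the application must supply the year from context.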

Here are examples of utterances that do not parse when spoken by callers:

Caller says Reason for not being recognized
공 일 년 이 월 오 일 A single digit year is not allowed. The caller should say “이 천 일 년 …” or “천 구 백 일 년.”
수요일 The caller must speak the month and date.

Digits built-in grammar

Valid characters are the digits 0-9. When speaking digits:

  • Callers can speak as follows: 영,공,일,이, …, 구, 하나, 둘
  • The following pronunciations are not allowed: 셋,넷,다섯,…, 아홉

Examples

Caller says MEANING key
일 이 삼 사 오 12345
공 삼 사 칠 구 03479
사 팔 영 480
사 팔 공 480
영 오 05
공 오 05
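
The allowed and disallowed pronunciations can be summarized as a lookup table. The Python sketch below maps a space-separated utterance to its MEANING string using only the words listed above; it models the rule and says nothing about how the recognizer is implemented. The names `DIGIT_WORDS` and `digits_meaning` are hypothetical.

```python
# Sino-Korean digits plus the two native Korean alternatives that are allowed.
DIGIT_WORDS = {
    "영": "0", "공": "0",
    "일": "1", "하나": "1",
    "이": "2", "둘": "2",
    "삼": "3", "사": "4", "오": "5",
    "육": "6", "칠": "7", "팔": "8", "구": "9",
}


def digits_meaning(utterance):
    """Convert space-separated digit words to a digit string."""
    words = utterance.split()
    if not words:
        raise ValueError("empty utterance")
    result = []
    for word in words:
        if word not in DIGIT_WORDS:
            # Native Korean numerals such as 셋 or 아홉 end up here.
            raise ValueError(f"{word!r} is not an allowed digit word")
        result.append(DIGIT_WORDS[word])
    return "".join(result)
```

Note that leading zeros are preserved because the result is a string, which matches examples such as 05.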

Here are examples of utterances that do not parse when spoken by callers:

Caller says Reason for not being recognized
아홉 Caller should say “구”.
Only 0-9 are allowed.

Number built-in grammar

The number grammar recognizes numbers spoken as quantities (the caller must not speak the individual digits). For example, “이 백 오 십 칠” or “삼 만 오 천.”

Numbers from -999,999,999,999.99 to 999,999,999,999.99 are recognized, but by default the minallowed parameter is set to zero, which limits recognition to non-negative values.

Caller says MEANING key
백 오 십 팔 158
구 천 칠 십 9070
일 만 10000
만 10000
천 이 백 만 12000000
마이나스 이 십 점 영 일 -20.01
영 점 일 삼 0.13
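
The interaction between the overall range and the minallowed default can be sketched in Python as follows. Only `minallowed` is a parameter name from this documentation; the function name `is_allowed_number` and the constants are illustrative.

```python
from decimal import Decimal

# Overall range stated for the number grammar.
LOWER = Decimal("-999999999999.99")
UPPER = Decimal("999999999999.99")


def is_allowed_number(meaning, minallowed=0):
    """Return True if a MEANING value falls inside the permitted range."""
    value = Decimal(meaning)
    # minallowed can only narrow the range, never widen it past LOWER.
    floor = max(LOWER, Decimal(minallowed))
    return floor <= value <= UPPER
```

With the default, a value such as -20.01 is rejected; it is accepted only if minallowed is lowered, for example `is_allowed_number("-20.01", minallowed=-100)`. `Decimal` is used instead of `float` so that the two-decimal bounds compare exactly.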

Here are examples of utterances that do not parse when spoken by callers:

Caller says Reason for not being recognized
일 천 The caller should say “천”
일 백 The caller should say “백”
일 십 The caller should say “십”
하나 The caller should say “일”
열 The caller should say “십”
서른 셋 The caller should say “삼 십 삼”
억 The caller should say “일 억”
오 억 만 The caller should say “오 억 일 만”

Phone built-in grammar

The phone grammar recognizes telephone numbers (landline and cellular). When speaking digits:

  • Callers can speak as follows: 영,공,일,이, …, 구, 하나, 둘
  • The following pronunciations are not allowed: 셋,넷,다섯,…, 아홉

The grammar allows naturally-spoken phrases as well as responses of just digits. For example: “오 백 삼 십 사 국에 구 천 이 백 오 십 삼 번”.

Examples

Caller says MEANING key
사 백 오 십 팔 국에 오 삼 하나 칠 4585317
공 삼 일 삼 육 칠 에 하나 공 하나 오 교환 구 칠 육 번 0313671015x976
이 팔 사 에 구 천 사 백 오 십 이 구내 삼 십 일 2849452x31
공 일 일 칠 칠 팔 에 사 공 육 구 번 0117784069
칠 백 국에 사 삼 이 이 번 7004322
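
In the examples above, an extension spoken after “교환” or “구내” appears in the MEANING value after an “x”. Assuming that reading is correct, the Python sketch below separates the main number from the extension. The function name `split_phone_meaning` is hypothetical.

```python
def split_phone_meaning(meaning):
    """Split a phone MEANING value into (number, extension or None)."""
    number, separator, extension = meaning.partition("x")
    if not number.isdigit():
        raise ValueError(f"unrecognized phone MEANING: {meaning!r}")
    if separator and not extension.isdigit():
        raise ValueError(f"unrecognized extension in: {meaning!r}")
    return number, (extension if separator else None)
```

Both parts are kept as strings so that leading zeros in area codes survive.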

The following restrictions apply:

  • The native Korean numerals 하나 and 둘 are allowed as alternatives, but 셋, 넷, 다섯, … are not.
  • Three-digit numbers are not allowed. For example: 일 일 구

Postcode built-in grammar

The postcode grammar recognizes valid Korean postal codes in either three- or six-digit format. When speaking digits:

  • Callers can speak as follows: 영,공,일,이, …, 구, 하나, 둘
  • Callers can speak a dash symbol (다시 or 에) between numbers.
  • The following pronunciations are not allowed: 셋,넷,다섯,…, 아홉

Return keys/values

Upon return, the recognized postal code is assigned to the MEANING key, and can contain either 3 or 6 digits.
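
A caller-side sanity check on the returned value might look like the Python sketch below. It tests only the length and character set stated above, not whether the code exists; the function name `is_postcode_meaning` is hypothetical, and the sample values in the usage note are arbitrary.

```python
def is_postcode_meaning(meaning):
    """Return True if the value is a string of exactly 3 or 6 ASCII digits."""
    return meaning.isascii() and meaning.isdigit() and len(meaning) in (3, 6)
```

For example, `is_postcode_meaning("135")` and `is_postcode_meaning("135080")` return `True`, while a four-digit value or a value that still contains a dash returns `False`.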

Time built-in grammar

The time grammar recognizes spoken time utterances from the caller. Recognized phrases include times given as follows:

  • 12-hour format; for example, “다섯 시”
  • 24-hour format; for example, “열 여덟 시 오 십 분”
  • “Qualified” times; for example, “오전 아홉 시 이전” and “다섯 시 경”

Callers can clarify the time with words to indicate AM/PM, dawn, morning, evening, and night. The time range for each indicator is interpreted as follows:

Indicator Time range
오전/에이엠 00:00-11:59 AM
오후/피엠 12:00-11:59 PM
새벽 03:00-05:59 AM
아침 05:00-10:59 AM
저녁 05:00-09:59 PM
낮/점심 11:00 AM-04:59 PM
밤 09:00 PM-03:59 AM
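
The ranges above determine which indicator and hour combinations make sense. The Python sketch below encodes the table and resolves a spoken hour to a 24-hour time, returning `None` when neither the AM nor the PM reading falls inside the indicator's range. It assumes that the night range (09:00 PM-03:59 AM) belongs to “밤”, which is the word used for night in the examples in this section; the names `RANGES` and `resolve` are hypothetical.

```python
def _minutes(hour, minute):
    return hour * 60 + minute


# Inclusive ranges in minutes after midnight, taken from the table above.
RANGES = {
    "오전": (_minutes(0, 0), _minutes(11, 59)),
    "에이엠": (_minutes(0, 0), _minutes(11, 59)),
    "오후": (_minutes(12, 0), _minutes(23, 59)),
    "피엠": (_minutes(12, 0), _minutes(23, 59)),
    "새벽": (_minutes(3, 0), _minutes(5, 59)),
    "아침": (_minutes(5, 0), _minutes(10, 59)),
    "저녁": (_minutes(17, 0), _minutes(21, 59)),
    "낮": (_minutes(11, 0), _minutes(16, 59)),
    "점심": (_minutes(11, 0), _minutes(16, 59)),
    "밤": (_minutes(21, 0), _minutes(3, 59)),  # assumed label; wraps past midnight
}


def resolve(indicator, hour, minute=0):
    """Return the (hour, minute) in 24-hour form that fits the indicator, or None."""
    start, end = RANGES[indicator]
    for candidate in (hour % 12, hour % 12 + 12):
        t = _minutes(candidate, minute)
        if start <= end:
            inside = start <= t <= end
        else:
            # Range wraps around midnight.
            inside = t >= start or t <= end
        if inside:
            return (candidate, minute)
    return None
```

Under these ranges, “아침” with hour 1 and “낮” with hour 9 resolve to `None`, while “밤” with hour 1 resolves to 01:00.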

Examples

For each entry, the values returned in the MEANING and QUALIFIER keys are shown. (Not shown are the values of the HOUR, MINUTE, and AMPM keys.)

Caller says MEANING key QUALIFIER key
지금 9999? exact
정오 1200p exact
자정 0000a exact
정오 이전 1200p before
새벽 네 시 반 0430a exact
아침 아홉 시 정각 0900a exact
에이엠 여덟 시 삼십 분 0830a exact
여섯 시 0600? exact
점심 열 한 시 십 오 분 이후 1115a after
오후 세 시 경 0300p approx
낮 열 두 시 쯤 1200p approx
밤 영시 정각 0000a exact
여섯 시 육 분 피엠 이전 0606p before
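
Judging from the examples above, a time MEANING value is four digits (hour and minute) followed by “a”, “p”, or “?”, with 9999? returned for “지금” (now). The Python sketch below decodes values under that reading; the interpretation of the suffix and the name `parse_time_meaning` are assumptions drawn from the table, not a documented format.

```python
import re

# Two-digit hour, two-digit minute, then a (AM), p (PM), or ? (not specified).
_TIME = re.compile(r"(\d{2})(\d{2})([ap?])")


def parse_time_meaning(meaning):
    """Decode a time MEANING value into its parts."""
    match = _TIME.fullmatch(meaning)
    if match is None:
        raise ValueError(f"unrecognized time MEANING: {meaning!r}")
    hour, minute, flag = int(match[1]), int(match[2]), match[3]
    if (hour, minute) == (99, 99):
        # Value returned for "지금" in the examples table.
        return {"now": True}
    if flag == "?":
        hour24 = None  # AM/PM was not spoken, so the 24-hour value is unknown
    elif flag == "p":
        hour24 = hour if hour >= 12 else hour + 12
    else:
        hour24 = 0 if hour == 12 else hour
    return {"hour": hour, "minute": minute, "ampm": flag, "hour24": hour24}
```

When the suffix is “?”, the application has to disambiguate from context or ask the caller a follow-up question.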

Here are examples of utterances that do not parse when spoken by callers:

Caller says Reason for not being recognized
아침 한 시 The caller should say “밤 한 시”
낮 아홉 시 The caller should say “아침 아홉 시”

Vocabulary items and pronunciations

This chapter describes considerations for vocabularies and their pronunciations in Korean (ko-KR).

Specially tuned pronunciations

The following common words have pronunciations that are fine-tuned by Nuance:

영, 공, 일, 이, 삼, …, 십, 백, 천, 만

Korean pronunciations

This section provides detailed reference information to help create pronunciation dictionaries. It is intended for people who have sufficient knowledge of the Korean language. It provides information about transcription and pronunciation.

The Korean phoneme system

The Korean spelling system is very regular. This means that in most cases the relationship between spelling (grapheme) and sound is easy to define since the orthography correlates well with the pronunciation. Nevertheless, there are some co-articulatory effects and pronunciation variants.

Korean symbol set grouped by phoneme classes

The following table shows all phonemes that can be used in Korean transcriptions. They are listed according to their phoneme classes with their SAMPA and Hangul representations.

Phoneme class SAMPA Examples of use
Consonants
  Plosives
    b 바람 /bAlAm/
    p 빠르다 /pAlIdA/
    pH 파랑 /pHAlAg~/
    d 다음 /dAIm/
    t 따뜻한 /tAtIzhAn/
    tH 타다 /tHAdA/
    dZ 자세히 /dZAzehi/
    tS 짜증 /tSAdZIg~/
    tSH 차장 /tSHAdZAg~/
    g 가구 /gAgu/
    k 까지 /kAdZi/
    kH 카드 /kHAdI/
  Fricatives
    z 사슴
    s 싸다 /sAdA/
    h 하나 /hAnA/
  Nasals
    m 마음 /mAIm/
    n 난방 /nAnbAg~/
    g~ 항공 /hAg~gog~/
  Liquids
    l 알림 /Allim/
  Glides
    j 야구
    w 왕자 /wAg~dZA/
Vowels
    A 아이 /Ai/
    ^ 어머니 /^m^ni/
    o 오후 /ohu/
    u 우리 /uli/
    I 으뜸 /ItIm/
    i 이상 /izAg~/
    E 애쓰다 /EsIdA/
    e 에서 /ez^/

Korean consonants

Plosives

There are twelve plosives in Korean. They are distinguished by place and effort of articulation (see table below).

/b/, /d/, /g/, and /dZ/ are laxed, unaspirated plosives that are normally unvoiced in syllable-initial or syllable-final position and voiced in intervocalic position.

/p/, /t/, /k/, and /tS/ are tensed. They are uttered with extra glottal pressure and are usually voiceless and unaspirated.

/pH/, /tH/, /kH/, and /tSH/ are strongly aspirated voiceless plosives.

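Place   Laxed   Tensed   Aspirated
Bilabial   b 바람 /bAlAm/   p 빠르다 /pAlIdA/   pH 파랑 /pHAlAg~/
Alveolar   d 다음 /dAIm/   t 따뜻한 /tAtIzhAn/   tH 타다 /tHAdA/
Velar   g 가구 /gAgu/   k 까지 /kAdZi/   kH 카드 /kHAdI/
Palatal   dZ 자세히 /dZAzehi/   tS 짜증 /tSAdZIg~/   tSH 차장 /tSHAdZAg~/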

Fricatives

There are three fricatives in Korean; the glottal fricative /h/ and the alveolar fricatives /z/ and /s/, the latter being the tensed variant of /z/.

Place   Laxed   Tensed
Alveolar   z 사슴   s 싸다 /sAdA/
Glottal   h 하나 /hAnA/

Nasals

There are three nasals in Korean. The nasal /g~/ is the syllable-final realization of the glottal stop (represented by the Korean orthographic symbol “ㅇ”).

The syllable coda /g~/ can undergo considerable assimilation depending on the following sound. It is, however, always represented by the same SAMPA symbol /g~/.

Bilabial m 마음 /mAIm/
Alveolar n 난방 /nAnbAg~/
Velar g~ 항공 /hAg~gog~/

Liquids

There is one liquid in Korean. It is realized as [r] in intervocalic position and as [l] in all other positions. The transcription is phonemic, however, so the same symbol /l/ is used for both realizations.

/l/ 알림 /Allim/

Glides

There are two glides in Korean. They are used as parts of the diphthongs (see Diphthongs):

  • /j/
  • /w/

Korean vowels

Monophthongs

There are eight vowels in Korean:

Vowel Example Transcription
A 아이 /Ai/
^ 어머니 /^m^ni/
o 오후 /ohu/
u 우리 /uli/
I 으뜸 /ItIm/
i 이상 /izAg~/
E 애쓰다 /EsIdA/
e 에서 /ez^/

Diphthongs

There are 12 diphthongs in Korean. Only one is a “real” diphthong, made of two vowel segments. The others are combinations of a glide with a vowel. The sequence /wE/ is used in the transcription of the Korean characters “왜” and “외”.

Diphthong Korean character
jA 야
j^ 여
jo 요
ju 유
jE 얘
je 예
wA 와
wE 왜, 외
we 웨
w^ 워
wi 위
Ii 의

Pronunciation of foreign words

Korean uses some foreign words (mainly English) that contain phonemes which are not part of the Korean symbol set. These phonemes are known as xenophones. They are sounds borrowed from English that do not exist in the native Korean set of sounds. Do not use these phonemes in your transcriptions, because doing so causes runtime errors. Instead, map them to similar phonemes that are part of the Korean set. The following are examples of such mappings:

Dental and labiodental fricatives

The English sound ‘f’ can be mapped to /pH/; the interdentals ‘T’ and ‘D’ (the orthographic “th”) can be mapped to the alveolar sounds /t/, /s/, or /d/:

F /epHI/
the /d^/
mother /mAd^l/
thank you /tEg~kHju/
month /m^nsI/

Vowel insertion

The Korean syllable structure is very strict; that is, many consonants are not allowed in the syllable coda. If in a foreign word such an “illegal” syllable coda occurs, the additional vowel /I/ is often inserted to create an extra open syllable:

S /esI/

Insertion of /I/ also occurs in consonant clusters that do not normally exist in the Korean language:

speaker /zIpHikH^/
X /egsI/

However, many younger speakers now pronounce English words without vowel insertion.

The Korean symbol set in alphabetical order:

SAMPA Examples of use
^ 어머니
A 아이
b 바람
d 다음
dZ 자세히
e 에서
E 애쓰다
g 가구
g~ 항공
h 하나
i 이상
I 으뜸
j 야구
k 까지
kH 카드
l 알림
m 마음
n 난방
o 오후
p 빠르다
pH 파랑
s 싸다
t 따뜻한
tH 타다
tS 짜증
tSH 차장
u 우리
w 왕자
z 사슴
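
Because symbols outside this set cause runtime errors, it can help to check transcriptions before adding them to a dictionary. The Python sketch below splits a transcription into symbols from the list above by longest match and rejects anything else. It is an illustrative pre-check only; the names `SYMBOLS` and `tokenize` are hypothetical and not part of any Nuance tool.

```python
# The 29 symbols of the Korean set, as listed above.
SYMBOLS = [
    "^", "A", "b", "d", "dZ", "e", "E", "g", "g~", "h", "i", "I", "j",
    "k", "kH", "l", "m", "n", "o", "p", "pH", "s", "t", "tH", "tS", "tSH",
    "u", "w", "z",
]

# Try longer symbols first so that "tSH" is not read as "tS" plus a stray "H".
_BY_LENGTH = sorted(SYMBOLS, key=len, reverse=True)


def tokenize(transcription):
    """Split a transcription such as /hAg~gog~/ into symbols from the Korean set."""
    text = transcription.strip("/")
    tokens = []
    pos = 0
    while pos < len(text):
        for symbol in _BY_LENGTH:
            if text.startswith(symbol, pos):
                tokens.append(symbol)
                pos += len(symbol)
                break
        else:
            raise ValueError(f"{text[pos]!r} is not in the Korean symbol set")
    return tokens
```

A transcription containing a xenophone, such as one that uses “f” directly instead of /pH/, raises `ValueError` at the offending character.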