User dictionaries
You can create application-specific user dictionaries to improve recognition accuracy of words and phrases that are giving you problems—for example, when application users have non-standard or heavily accented pronunciations. You can also create dictionaries that supplement the system dictionary vocabulary by adding terms that may be particular to your application.
Each user dictionary is an XML text file that maps pronunciations, expressed in a phonemic alphabet, to the text that Recognizer returns when it hears them. The phonemic alphabet used depends on the language; refer to the Language Supplement in the language pack for details.
Note: The Language Supplements are located in %SWISRSDK%\documentation\languages.
You may discover the need for a user dictionary after testing; for example, if the parseTool utility fails to parse your test sentences. A dictionary may also be indicated during early stages of application deployment when you analyze call logs. For example, if you experience many false rejections and unnecessary confirmations, a dictionary can tune pronunciations of problematic words.
Carefully used, the user dictionary is an excellent way to tune pronunciations for an application. However, overusing it often reduces accuracy, because user entries override the system dictionary, which is tuned for high accuracy.
Importing a user dictionary
To import a dictionary, use the <lexicon> element in a main grammar header:
<lexicon uri="http://myuri/my_user_dict.xml"/>
The SRGS specification allows a "type" attribute with <lexicon>, but the specification does not define its contents. Recognizer ignores this attribute.
You can include SWI properties in the URI after a question mark:
<lexicon uri="user_dict.xml?SWI.type=backup"/>
(For more on the SWI.type parameter, see Dictionary precedence.)
User dictionaries are local to the grammar where they are defined. If two or more grammars are active at one time, each can define different dictionary sets.
You can use the xml:lang attribute to specify the language of the dictionary in the <lexicon> element, though that will already be specified in the dictionary itself.
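As a sketch of where the <lexicon> element sits, the following minimal grammar imports a user dictionary in its header, before any rules. The rule name, the vocabulary items, and the dictionary file name are hypothetical and serve only as an illustration.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="produce" mode="voice">

  <!-- Header: import the user dictionary for this grammar only.
       xml:lang is optional here; the dictionary declares its own language. -->
  <lexicon uri="my_user_dict.xml" xml:lang="en-US"/>

  <!-- Hypothetical rule whose words are looked up in the dictionary. -->
  <rule id="produce" scope="public">
    <one-of>
      <item>tomato</item>
      <item>potato</item>
    </one-of>
  </rule>
</grammar>
```

Because the import is local to the grammar, a second grammar that is active at the same time would need its own <lexicon> element to use the same dictionary.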
A simple user dictionary file appears below:
<?xml version="1.0" encoding="UTF-8" ?>
<lexicon xml:lang="en-US"
alphabet="application/sampa;localization=nuance">
<entry key="tomato">
<definition value="t @ m Q t @W" />
<definition value="t @ m e Y t @W" />
</entry>
<entry key="George">
<definition value="d Z O r d Z" />
</entry>
<entry key="record">
<definition value="r e k @ r d" part="noun" />
<definition value="r I k O r d" part="verb" />
</entry>
</lexicon>
The components of the file are described below.
Dictionary header
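The dictionary header consists of the XML declaration and the opening <lexicon> element, which sets the dictionary language (xml:lang) and the phonemic alphabet (alphabet).
Create a separate dictionary file for each target language. This is necessary because the xml:lang attribute sets the default language for all vocabulary items in that <lexicon>.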
Alphabet
The alphabet attribute declares the phonemic symbol set used to define pronunciations. There are three symbol sets available:
application/sampa;localization=nuance (Nuance SAMPA)
application/sampa;localization=swi (OSR SAMPA)
application/arpa;localization=swi (ARPAbet, this is the default)
Nuance SAMPA is recommended, even though OSR SAMPA and ARPAbet are allowed. With either of the other alphabets, the system must convert the pronunciations to Nuance SAMPA at runtime, which adds processing.
Although it is not recommended, you can mix the SAMPA and ARPAbet alphabets in a single file. The alphabet attribute can be set at three levels: on the <lexicon>, <entry>, and <definition> elements.
- If specified in a <lexicon> element, the alphabet is valid inside the entire dictionary, unless it is overridden in an <entry> or <definition>.
- If specified in an <entry> element, the alphabet is valid for any pronunciation inside the entry, unless it is overridden in a <definition>.
- If specified in a <definition> element, the alphabet is only valid for the individual pronunciation.
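The three override levels described above can be sketched in one small dictionary. The SAMPA strings are taken from the sample file earlier in this section. The ARPAbet-style string is only an illustrative guess at the notation, so check the exact symbols against the Language Supplement before using it.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Lexicon level: Nuance SAMPA is the default for the whole file. -->
<lexicon xml:lang="en-US"
         alphabet="application/sampa;localization=nuance">

  <entry key="George">
    <!-- No override: inherits Nuance SAMPA from <lexicon>. -->
    <definition value="d Z O r d Z" />
  </entry>

  <!-- Entry level: definitions in this entry default to ARPAbet. -->
  <entry key="tomato" alphabet="application/arpa;localization=swi">
    <!-- Illustrative ARPAbet-style value (an assumption, not verified). -->
    <definition value="t ah m ey t ow" />
    <!-- Definition level: this one pronunciation switches back to Nuance SAMPA. -->
    <definition value="t @ m Q t @W"
                alphabet="application/sampa;localization=nuance" />
  </entry>
</lexicon>
```

The sketch mixes alphabets only to show how each level overrides the one above it. In practice a single alphabet per file is easier to maintain, in line with the recommendation above.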
For details on the SAMPA symbol set, see the Language Supplement provided for each supported language.
Main body
Elements found in the main body of a dictionary are:
- <entry>: An <entry> element defines the text to be matched with a pronunciation. The text is specified in the element key attribute:
<entry key="tomato">
Each entry may include several <definition> elements.
- <definition>: Each <definition> element specifies the phonemic representation (the pronunciation) of the key:
<definition value="t @ m Q t @W" />
Spaces are optional between phonemes. They can help readability, and in some instances they avoid ambiguities. The following are equivalent:
<definition value="t @ m Q t @W" />
<definition value="t@mQt@W" />
One risk of adding spaces is splitting a multi-character phoneme. For example, if you write "@ W" instead of "@W" above, the definition might be invalid in the current language; at a minimum, the erroneous space defines an unintended phoneme and pronunciation. An invalid phoneme causes a compilation error in the grammar, but is not detected by the dicttest tool.
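To make the spacing risk concrete, the fragment below contrasts the intended definition with the mistaken one. Both values come from the example in the text; only the comments are added.

```xml
<entry key="tomato">
  <!-- Intended: the last phoneme is the single two-character symbol "@W". -->
  <definition value="t @ m Q t @W" />

  <!-- Mistake: the extra space splits "@W" into two symbols, "@" and "W".
       Depending on the language, this is either an invalid definition
       or a valid but unintended pronunciation. -->
  <definition value="t @ m Q t @ W" />
</entry>
```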
The value must use the phoneme set for the language being recognized.
Duplicate keys and pronunciations
Duplicate keys and pronunciations are allowed. For example, the following code results in a single dictionary entry with two pronunciations:
<entry key="tomato">
<definition value="t@mQt@W" />
</entry>
<entry key="tomato">
<definition value="t@meYt@W" />
<definition value="t@mQt@W" />
</entry>
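Based on the description above, the duplicated entries should behave like the single entry sketched below, with the repeated pronunciation counted once. The order of the two definitions is an assumption; the text does not say how merged pronunciations are ordered.

```xml
<!-- Equivalent single entry after the duplicates are merged. -->
<entry key="tomato">
  <definition value="t@mQt@W" />
  <definition value="t@meYt@W" />
</entry>
```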
Dictionary precedence
When Recognizer loads a grammar, it also loads needed pronunciations for that grammar. To determine the pronunciations, Recognizer searches for dictionary entries using the following precedence scheme:
- Primary dictionaries are searched first. Recognizer searches all of them and loads all the pronunciations found. When you define a user dictionary, it is a primary dictionary by default.
- Backup dictionaries are searched when the previous dictionaries have not yielded a match. As with primary dictionaries, there can be more than one backup. By default, the system dictionary is a backup dictionary.
- Automatic dictionaries are searched when a pronunciation is not found in primary or backup dictionaries. Recognizer uses text-to-phoneme rules to generate the pronunciation. The diagnostic log reports each instance of a generated pronunciation; it is recommended that you investigate each one to ensure the pronunciations are correct.
Changing the default precedence
Applications can control the precedence of the user and system dictionaries. In this way, the pronunciations for a word in one dictionary can be supplemented by additional pronunciations in another. This means you can take advantage of the user and system dictionaries simultaneously.
By default, if a pronunciation is found in a user dictionary, the system dictionary is not consulted, but it is often better to use both. To do so, define the user dictionary as a "backup" dictionary, giving it equal precedence with the system dictionary. Recognizer then uses all pronunciations from both dictionaries.
To designate a dictionary as a backup, add the SWI.type property to the dictionary URI in a grammar:
<lexicon uri="user_dict.xml?SWI.type=backup"/>
The possible values for SWI.type are "primary" and "backup". The setting is local to the grammar (the precedence is determined on a grammar-by-grammar basis, even if grammars are imported or loaded in parallel). The precedence remains changed until the grammar is deactivated.
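As an illustration of both SWI.type values in one grammar, the sketch below imports one dictionary as primary and another as backup. The dictionary file names, the rule, and the vocabulary items are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="produce" mode="voice">

  <!-- Primary (the default, stated explicitly here): searched first. -->
  <lexicon uri="brand_names.xml?SWI.type=primary"/>

  <!-- Backup: searched only when no primary dictionary has a match,
       at the same precedence as the system dictionary. -->
  <lexicon uri="regional_variants.xml?SWI.type=backup"/>

  <rule id="produce" scope="public">
    <one-of>
      <item>tomato</item>
      <item>potato</item>
    </one-of>
  </rule>
</grammar>
```

With this arrangement, a word found in brand_names.xml uses only the pronunciations listed there. A word found only at the backup level takes pronunciations from both regional_variants.xml and the system dictionary.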
swirec_max_dict_prons parameter
You can specify any number of <definition> elements for a key. However, this does not guarantee that the pronunciations will be used by Recognizer. The swirec_max_dict_prons parameter configures the maximum number of pronunciations allowed: if this parameter is set to 4 and the user dictionary contains 5 pronunciations for an item, then only 4 pronunciations are used.
As an example, suppose the default precedence is changed so that the user and system dictionaries have equal precedence. Their pronunciations are merged, and Recognizer consults them in an undefined order, without regard to their source. If swirec_max_dict_prons is set to 4 and each dictionary has several pronunciations for a vocabulary item, Recognizer uses only 4 of them, and you cannot predict which dictionaries or which pronunciations they come from.
See swirec_max_dict_prons.
Compiling a user dictionary
To improve runtime performance, you can compile user dictionaries with the make_dict tool. This tool is stored in the %SWISRSDK%\bin directory. The command format is:
make_dict InputDict OutputDict -language LangCode
Where:
- The InputDict is the input XML dictionary.
- The OutputDict is the resulting binary dictionary.
- The LangCode is the default language for the dictionary.
See make_dict.