SLM training file main body

The main body of a training file contains a vocabulary section, which defines the words allowed in the SLM, and a training section, which lists example sentences that use those words.

The vocabulary section

The vocabulary section of the training file defines all words allowed in the training section. Words that appear in the training sentences but not in the vocabulary section are ignored by the compiler.

The vocabulary section is defined by the <vocab> element. Within the <vocab> element, words and classes are defined with <item> and <ruleref> elements.
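Because the compiler ignores out-of-vocabulary words, it can help to check a file for them before compiling. The following is a minimal sketch, not part of the SLM tools. It assumes a hypothetical <SLMTraining> wrapper element, that each <item> holds a single word as its text, and that sentences are split on whitespace. It uses the <training> and <sentence> elements described in the next section, and it does not handle classes defined with <ruleref>.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <SLMTraining> wrapper and the sample words are
# assumptions for illustration. Only <vocab>, <item>, <training>, and
# <sentence> are element names taken from this document.
SAMPLE = """
<SLMTraining>
  <vocab>
    <item>check</item>
    <item>my</item>
    <item>balance</item>
  </vocab>
  <training>
    <sentence>check my balance</sentence>
    <sentence>check my savings balance</sentence>
  </training>
</SLMTraining>
"""

def out_of_vocabulary(xml_text):
    """Return the sorted training words that have no matching <item>."""
    root = ET.fromstring(xml_text)
    vocab_section = root.find("vocab")
    training_section = root.find("training")
    # Collect the text of every <item> in the vocabulary section.
    vocab = {
        item.text.strip()
        for item in vocab_section.iter("item")
        if item.text and item.text.strip()
    }
    missing = set()
    for sentence in training_section.iter("sentence"):
        # Whitespace tokenization is an assumption of this sketch.
        for word in (sentence.text or "").split():
            if word not in vocab:
                missing.add(word)
    return sorted(missing)

print(out_of_vocabulary(SAMPLE))  # prints ['savings']
```

Here "savings" is reported because it appears in a training sentence but has no <item> entry. The sketch only reports such words; it does not reproduce what the compiler does with them.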

The training section

The training section of the file lists example sentences that use the words from the vocabulary section. These sentences are used by the compiler to determine the probabilities to be used in recognizing user utterances.

The training section is defined by the <training> element. Within the <training> element, each sentence is enclosed in a pair of <sentence> tags.

The order of the sentences has no effect on the trained results.
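One way to see why sentence order does not matter is to assume that training relies on word-sequence counts gathered within each sentence. This document does not describe the compiler's internals, so the sketch below is only an illustration under that assumption. The boundary markers "<s>" and "</s>" and the sample sentences are hypothetical.

```python
from collections import Counter
import random

def bigram_counts(sentences):
    """Count adjacent word pairs within each sentence (assumed model)."""
    counts = Counter()
    for sentence in sentences:
        # Hypothetical boundary markers keep pairs from spanning sentences.
        words = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts

sentences = ["check my balance", "transfer my balance", "check my account"]

# Reorder a copy of the list with a fixed seed so the run is repeatable.
shuffled = sentences[:]
random.Random(0).shuffle(shuffled)

# The counts are equal whatever order the sentences are listed in.
print(bigram_counts(sentences) == bigram_counts(shuffled))  # prints True
```

Because no word pair spans two sentences, reordering the sentences leaves every count unchanged.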

The test section

Optionally, the training file may include a test section, enclosed in a pair of <test> tags. This section lists sentences that can be used to test the SLM.

The test section is not used during SLM training. However, it comes into play when the SLM is used to support an SSM (see SLMs).
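The sketch below reads the training and test sentences separately, for example to review which sentences are reserved for testing. It is an illustration only. It assumes a hypothetical <SLMTraining> wrapper element and assumes that test sentences use the same <sentence> tags as training sentences, which this document does not confirm.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the wrapper element, the use of <sentence> inside
# <test>, and the sample words are assumptions for illustration.
SAMPLE_WITH_TEST = """
<SLMTraining>
  <vocab>
    <item>check</item>
    <item>my</item>
    <item>balance</item>
  </vocab>
  <training>
    <sentence>check my balance</sentence>
  </training>
  <test>
    <sentence>my balance</sentence>
  </test>
</SLMTraining>
"""

def sentences_by_section(xml_text):
    """Return the sentences of the training and test sections as lists."""
    root = ET.fromstring(xml_text)
    result = {}
    for name in ("training", "test"):
        section = root.find(name)
        if section is None:
            # The test section is optional, so a missing section is empty.
            result[name] = []
        else:
            # Normalize internal whitespace in each sentence.
            result[name] = [
                " ".join((s.text or "").split())
                for s in section.iter("sentence")
            ]
    return result

print(sentences_by_section(SAMPLE_WITH_TEST))
```

A file without a test section yields an empty list for "test", which matches the section being optional.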

Very large training files

Although large training files can improve the quality of an SLM, they can be difficult to manage. They become unwieldy to edit, and some third-party software may not accept files over a certain size.

Two techniques for dividing large training files into collections of smaller files are described below.