Confidence engine models

By default, an SSM calculates confidence scores based on probabilities determined during training. You can improve these scores by training a confidence engine model and adding it to the wrapper grammar.

The SSM determines its results independently from Recognizer, and with no consideration of Recognizer’s confidence scores. In other words, once the n-best entries pass Recognizer’s confidence thresholds, the SSM processes the text of those entries, and the SSM result shows the probability of the meaning—but not whether the audio signal itself was correctly recognized.

During training, the SSM determines probabilities and computes values for default confidence scores. For any given recognized text, the SSM returns the same result and confidence every time.

When you build a confidence engine model, you improve accuracy by taking Recognizer acoustic features into account in the overall confidence score. You create this synthesis by training the model with the ssm_train_conf utility, which runs Recognizer on training audio files. At runtime, the static confidence engine model computes the confidence of incoming utterances using both acoustic and SSM features, producing more accurate final confidence scores.

A simple example:

  • Recognizer’s best guess (the top entry on the n-best list) is that the user has said, “Joe’s”.
  • The SSM meaning is “Joe’s pizza restaurant”.
  • The confidence in the SSM meaning depends on two factors: that the utterance “Joe’s” means the user wants “Joe’s pizza restaurant” (and not “Joe’s repair shop”, for example), and that Recognizer’s guess of what was said is correct (that the caller did not say “Moe’s” instead, for example).
  • When you train a confidence engine model, you combine training features from both Recognizer and the SSM to arrive at an overall confidence for each meaning.
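The two factors above can be sketched as a naive product of probabilities. This is purely illustrative: the actual confidence engine is a trained model that synthesizes many acoustic and SSM features, not this formula, and the numbers below are invented.

```python
def overall_confidence(p_text_correct, p_meaning_given_text):
    """Naive illustration: the meaning is right only if the recognition
    was right AND the meaning follows from the recognized text."""
    return p_text_correct * p_meaning_given_text

# Invented numbers: Recognizer is 90% confident the caller said "Joe's";
# given the text "Joe's", the SSM is 80% confident the meaning is
# "Joe's pizza restaurant" rather than, say, "Joe's repair shop".
conf = overall_confidence(0.9, 0.8)
print(round(conf, 2))  # 0.72
```

A trained confidence engine replaces this hand-wired product with weights learned from audio files, which is why it outperforms the default SSM-only scores.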

Overview of confidence training

The training process for a confidence engine model is similar to training an SSM: you create a training file (much simpler than an SSM training file) and submit the file to an iterative training tool.

The training file for a confidence engine model does the following:

  • It declares an SSM for training confidences.
  • It declares sentences for training and (optionally) testing. Unlike an SSM training file, which declares the text and meanings of sentences, the confidence training file declares audio files and meanings.

Although the training uses a specific SSM and its meanings, the resulting model is generalized, and you can use it with more than one SSM at runtime. The model achieves good performance across a variety of applications.

Alternatively, you can build a specialized model by setting the use_meanings parameter in the training file; the resulting engine is tightly linked to the SSM used in training. Doing this improves performance for that SSM, but the model cannot be reused for another SSM task, and you must retrain a new model whenever you change the list of SSM meanings.
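For example, a specialized model might be requested with a <param> element in the training file. The attribute names shown here are an assumption for illustration; see Parameters in the confidence file for the exact syntax.

```xml
<!-- Illustrative only: attribute names are assumptions. -->
<param name="use_meanings" value="1"/>
```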

In summary, you have two options for building confidence engine models:

  • Build a generalized, re-usable model (the default).
  • Build a specialized model for an individual SSM.

Note: The number of available training sentences can determine whether you need to build a generalized or specialized model for your application. With fewer sentences, a specialized model gets better results, because a sample with fewer sentences does not generalize as well as one with abundant sentences.

Elements in the confidence file

A confidence engine training file contains the following elements:

<audio>
Optional, but strongly recommended. Specifies an audio file for a <sentence>. One <audio> element is allowed per sentence.

<grammar>
Required. Specifies a wrapper grammar containing an SSM, as well as the SLM definition files (finite-state machine and wordlist). When you train the model, the SSM is used to generate features. For best results, train the confidence engine with the SLM and SSM that will be deployed in the grammar. Only one <grammar> is allowed per training file.

<item>
Optional. Specifies a slot. See <slots_to_use>.

<param>
Required for some parameters; see Parameters in the confidence file. Specifies a parameter and its value.

<semantics>
Required. A container for the true meanings of a sentence. The meanings are specified in <slot> elements.

<slot>
Required. Specifies a true meaning. (If the SSM returns this meaning, the result is considered correct.) More than one <slot> is allowed per sentence.

<slots_to_use>
Optional. A container for <item> elements; each <item> names a slot inside the specified <grammar>. By default, all slots in the grammar are used; this element lets you specify a subset. If you specify <slots_to_use>, only the listed slots are used when computing matches between SSM results and truth (during confidence engine training only); any slots not listed are ignored.

<sentence>
Required. A container for audio files and meanings.

<SSMConfidenceTraining>
Required. The file header; one is required. Match the specified language to the language of the SSM (specified with the <grammar> element).

<test>
Required. A container for sentences that test the model. One <test> element is allowed; the section has the same format as the <training> element. If the <test> element is missing or empty, the file that is created will be empty.

<training>
Required. A container for all the <sentence> elements that train the model. One <training> element is allowed.
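Putting these elements together, a minimal training file might look like the following sketch. The element names come from the table above, but the attribute names (language, src, name), slot names, and file names are assumptions for illustration; check the reference syntax before use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative structure only; attribute names are assumptions. -->
<SSMConfidenceTraining language="en-US">
  <grammar src="pizza_wrapper.grxml"/>
  <training>
    <sentence>
      <audio src="audio/utt0001.wav"/>
      <semantics>
        <slot name="MEANING">joes_pizza_restaurant</slot>
      </semantics>
    </sentence>
  </training>
  <test>
    <sentence>
      <audio src="audio/utt0100.wav"/>
      <semantics>
        <slot name="MEANING">joes_repair_shop</slot>
      </semantics>
    </sentence>
  </test>
</SSMConfidenceTraining>
```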

Training the confidence engine model

Use the ssm_train_conf tool to generate a confidence engine model from a training file:

ssm_train_conf <filename>.xml

The default output filename is out.conf. (You can override the default with the output configuration parameters in the training file.)

You can include the output file in a wrapper grammar. See Using in applications (wrapper grammars).

When training the model, the tool writes a warning (SWI_SSMCONF_TRAIN_WARNING) for any wave file it cannot read. If the tool can read a file but cannot process it for some reason, it writes an error message (SWI_SSMCONF_TRAIN_ERROR). As long as the tool successfully processes at least one wave file, ssm_train_conf produces a model file. If no wave files can be processed, ssm_train_conf writes an error message and does not produce an output model file.
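As a workflow sketch (not a feature of the toolkit), a script along these lines could tally those markers in a captured training log to flag runs that need inspection. The marker strings come from the text above; post-processing the console output this way is an assumption.

```python
def summarize_training_log(log_text: str) -> dict:
    """Count warning/error markers emitted by ssm_train_conf.

    Marker names are from the documentation; parsing captured console
    output like this is a workflow choice, not a tool feature.
    """
    return {
        "warnings": log_text.count("SWI_SSMCONF_TRAIN_WARNING"),
        "errors": log_text.count("SWI_SSMCONF_TRAIN_ERROR"),
    }

log = (
    "SWI_SSMCONF_TRAIN_WARNING: cannot read utt0042.wav\n"
    "SWI_SSMCONF_TRAIN_ERROR: cannot process utt0043.wav\n"
)
print(summarize_training_log(log))  # {'warnings': 1, 'errors': 1}
```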

Special keys in recognition results

Special SWI_ keys describes special keys (and values) identified with the "SWI_" prefix. Several keys are specific to the statistical models used for natural language processing, such as: