Controlling the self-learning recognizer

Nuance Recognizer contains self-learning technology that improves accuracy over time.

The technology is nearly invisible to voice browser developers and application developers, but you can control some of its mechanisms.

Acoustic adaptation

Self-learning is also called acoustic adaptation. Essentially, Recognizer distinguishes between high- and low-confidence recognition results and uses this information to improve acoustic models for future recognitions.

By default, Recognizer adapts automatically and requires no intervention. Optionally, you can change storage locations, rate of learning, and other configurations. See Summary of self-learning configuration parameters.

Self-learning works as follows:

  1. Nuance Recognizer installs one set of models that is shared by all recognition servers on that host.
  2. Recognizer copies the installed models and adapts them for all applications or each tenant’s applications depending on the configuration. Optionally, use swirec_acoustic_adapt_root to change the adaptation scope.
  3. When applications load grammars, Recognizer loads appropriate models along with statistics files that store self-learned data for improving recognition.
  4. When Recognizer receives audio and computes a result, it collects new statistics, and automatically learns and adapts from the recognition result. The CPU load is insignificant and adds no noticeable real-time delay to recognition.

    For various reasons, application developers can prevent adaptation for individual recognition events. For details, see swirec_acoustic_adapt_suppress_adaptation.

  5. Recognizer periodically updates its recognition models using the collected statistics. The adaptation processing adds a small CPU load that could be noticed by application users. For this reason, updates do not happen often (no more than once a day). You can control the timing of updates with swirec_acoustic_adapt_model_update_time.

    If an administrator stops and restarts Recognizer, Recognizer discards any statistics collected since the previous update of the models. The loss of data does not reduce recognition accuracy or corrupt the system in any way.

  6. After adaptation updates, Recognizer archives the statistics data. The archive is not used in future updates. It assists Nuance technical support when troubleshooting your system. System administrators control the amount of archived data with swirec_acoustic_adapt_num_archive.
  7. Recognizer writes adaptation messages to the event and diagnostic log files.
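
As a rough illustration of steps 4 through 6 above, the following Python sketch models the collect, update, and archive cycle. It is a toy model, not Recognizer's implementation: the class, its fields, and the thresholds are hypothetical, and it assumes that an update is skipped (with statistics kept) when too little data has been collected.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptationCycle:
    """Toy model of steps 4 to 6; all names and thresholds are hypothetical."""
    min_utterances: int = 3   # stands in for a minimum-data threshold
    max_archives: int = 2     # stands in for a limit on archived statistics
    model_version: int = 0
    pending: list = field(default_factory=list)  # statistics since last update
    archive: list = field(default_factory=list)  # kept only for troubleshooting

    def on_recognition(self, confidence, suppress=False):
        # Step 4: collect statistics unless the application suppressed
        # adaptation for this recognition event.
        if not suppress:
            self.pending.append(confidence)

    def scheduled_update(self):
        # Step 5: update the models only when enough statistics exist
        # (assumption: otherwise the statistics are simply kept).
        if len(self.pending) < self.min_utterances:
            return False
        self.model_version += 1
        # Step 6: archive the consumed statistics and trim the oldest archives.
        self.archive.append(self.pending)
        self.archive = self.archive[-self.max_archives:]
        self.pending = []
        return True

    def restart(self):
        # Statistics collected since the previous update are discarded;
        # the models themselves are untouched.
        self.pending = []
```

Calling restart() between updates leaves model_version unchanged, which mirrors the statement that a restart loses recent statistics without harming the models.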

Hot insert of acoustic models

Hot insert enables updates of acoustic speech recognition models while the system continues to operate, so you do not have to stop the system before performing the updates. Telephone callers do not perceive delays when these updates occur. Hot insert is nearly invisible to voice browser developers and application developers, but some mechanisms can be controlled.

The automatic self-learning feature uses hot insert to perform its updates. In addition, applications that use customized acoustic models can update those models simply by updating files. (For example, as application developers iteratively tune acoustic models, they place new versions of the models onto the file system, and Recognizer automatically inserts the new models.)

Specifically, the feature works as follows:

  1. When loading the default language (during system startup) and when loading grammars, Recognizer loads all files associated with acoustic modeling into memory and records the timestamps of those files.
  2. Recognizer periodically checks those files and detects whether the timestamps have changed. The browser can define the time period with swirec_update_interval.
  3. If a timestamp has changed, Recognizer replaces the currently loaded file with the new file. All subsequent recognitions use the new file. If a recognition is underway when the update occurs, the recognition continues unchanged (using the original model).
  4. When placing new modeling files onto a system, disable the Recognizer’s update processing temporarily. Doing this ensures that all needed files are updated simultaneously. To disable the processing, the operator creates a file named by the swirec_update_lockfile parameter.
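
The four steps above amount to timestamp polling guarded by a lock file. The following Python sketch shows that pattern in miniature. It is only an illustration under stated assumptions: the class and method names are hypothetical, file contents stand in for loaded models, and the lock file path is passed in directly rather than read from swirec_update_lockfile.

```python
from pathlib import Path


class HotInsertWatcher:
    """Toy model of timestamp-based hot insert (not Recognizer's actual code)."""

    def __init__(self, model_paths, lockfile):
        self.lockfile = Path(lockfile)
        self.loaded = {}      # path -> file contents currently "in memory"
        self.timestamps = {}  # path -> modification time recorded at load
        # Step 1: load every modeling file and remember its timestamp.
        for path in map(Path, model_paths):
            self._load(path)

    def _load(self, path):
        self.loaded[path] = path.read_bytes()
        self.timestamps[path] = path.stat().st_mtime_ns

    def check_once(self):
        """Run one periodic check and return the paths that were reloaded."""
        # Step 4: while the lock file exists, skip update processing.
        if self.lockfile.exists():
            return []
        reloaded = []
        # Step 2: compare each file's current timestamp with the recorded one.
        for path, recorded in list(self.timestamps.items()):
            if path.stat().st_mtime_ns != recorded:
                # Step 3: replace the loaded copy; later lookups see the new one.
                self._load(path)
                reloaded.append(path)
        return reloaded
```

A caller would invoke check_once() on whatever period swirec_update_interval defines. A recognition already in progress would keep using the contents it obtained before the check, which corresponds to the "continues unchanged" behavior in step 3.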

This feature works independently of the language being recognized. Each language has its own set of acoustic modeling files. For example, if your system has French and German installed, and you place new German models onto the hard drive, Recognizer updates the German models and there is no change to the French.
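
From the operator's side, placing new models for one language might look like the following Python sketch. It assumes a hypothetical layout with one directory per language, takes the lock file path as an argument (in practice it would be the file named by swirec_update_lockfile), and assumes the lock file should be removed once copying is done so that update processing resumes.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def update_lock(lockfile):
    """Create the lock file for the duration of the block, then remove it."""
    lockfile = Path(lockfile)
    lockfile.touch()
    try:
        yield
    finally:
        lockfile.unlink(missing_ok=True)


def deploy_models(new_dir, live_dir, lockfile):
    """Copy every file from new_dir into live_dir while updates are disabled."""
    new_dir, live_dir = Path(new_dir), Path(live_dir)
    copied = []
    with update_lock(lockfile):
        for src in sorted(new_dir.iterdir()):
            if src.is_file():
                # shutil.copy gives the destination a fresh modification time,
                # so a timestamp check will notice the replacement.
                shutil.copy(src, live_dir / src.name)
                copied.append(src.name)
    return copied
```

Because only the files under live_dir are touched, deploying into a German model directory leaves a French directory, and therefore its timestamps, unchanged.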

Summary of self-learning configuration parameters

Most systems use default values for the self-learning configuration parameters.

Because the ideal values vary for each language, these parameters are set on a language-by-language basis:

| Parameter | Description | Default |
|---|---|---|
| swirec_acoustic_adapt_adapt_model | Disables self-learning for one or more languages. | 0 (adaptation enabled) |
| swirec_acoustic_adapt_min_num_utts | Minimum data for updating acoustic models. | (depends on language) |
| swirec_acoustic_adapt_model_update_time | When to update recognition models with learned statistics. | 0000 (midnight) |
| swirec_acoustic_adapt_num_archive | How long to save old statistics files. | 3 (months) |
| swirec_acoustic_adapt_rate | Controls the rate of the Recognizer’s self-learning. | (depends on language) |
| swirec_acoustic_adapt_root | Storage location of self-learning files. Controls sharing of models across the server, tenants, or applications. | (empty) |
| swirec_acoustic_adapt_suppress_acoustic_model_update | Stops self-learning adaptation of recognition models. | 0 (updates enabled) |
| swirec_acoustic_adapt_suppress_adaptation | Temporarily stops self-learning activities for one or more languages. | (depends on language) |
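
To make the defaults easier to reason about, the Python sketch below restates them as a dictionary and works out when the next model update would fall. It is an illustration only: the helper functions are hypothetical, values are represented as strings for convenience, and the update time is assumed to be a four-digit 24-hour HHMM value, which the default of 0000 (midnight) suggests but does not confirm.

```python
from datetime import datetime, timedelta

# Defaults copied from the summary table; language-dependent values are None.
DEFAULTS = {
    "swirec_acoustic_adapt_adapt_model": "0",
    "swirec_acoustic_adapt_min_num_utts": None,
    "swirec_acoustic_adapt_model_update_time": "0000",
    "swirec_acoustic_adapt_num_archive": "3",
    "swirec_acoustic_adapt_rate": None,
    "swirec_acoustic_adapt_root": "",
    "swirec_acoustic_adapt_suppress_acoustic_model_update": "0",
    "swirec_acoustic_adapt_suppress_adaptation": None,
}


def effective_settings(overrides):
    """Merge site overrides onto the defaults, rejecting unknown names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def next_update(now, hhmm):
    """Return the first datetime at or after `now` with the given HHMM time."""
    hour, minute = int(hhmm[:2]), int(hhmm[2:])
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate
```

Rejecting unknown names in effective_settings() is a design choice for the sketch: a misspelled parameter name would otherwise be silently ignored.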

Self-learning for distributed architectures

When you install Nuance Recognizer on more than one host, each host adapts independently. Depending on configuration, each tenant (or even each application) improves independently on each host (see swirec_acoustic_adapt_root).

Over time, recognition accuracy improves to approximately the same levels, but in the short term some hosts, tenants, and applications will improve faster than others.

If your operational environment requires all hosts to remain identical (that is, that all files, programs, and so on, are the same), do either of the following:

  • Turn off adaptation updates on every Nuance recognition service in Management Station with swirec_acoustic_adapt_suppress_acoustic_model_update. With this technique, Recognizer continues to collect statistics (with very low CPU load), and the models remain unchanged until a system administrator performs a single, simultaneous update for all hosts. Contact Nuance Professional Services for information on merging statistics files and generating models for distributed systems.
  • Turn off self-learning on every Nuance recognition service in Management Station with swirec_acoustic_adapt_adapt_model. This eliminates all system load and disk storage for adaptation purposes.
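
Whichever option you choose, you may want to confirm that the hosts really are identical. The following Python sketch compares two directory trees by content hash. It is a generic check, not a Nuance tool: the function names are hypothetical, and it assumes each host's model directory is reachable as a local path (for example, through a mount or a copied snapshot).

```python
import hashlib
from pathlib import Path


def manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differences(reference, other):
    """Return the sorted relative paths that are missing or differ."""
    paths = set(reference) | set(other)
    return sorted(p for p in paths if reference.get(p) != other.get(p))
```

An empty list from differences(manifest(host_a), manifest(host_b)) indicates that the two trees hold the same files with the same contents.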