Configuring Recognizer

There are hundreds of recognizer parameters, but only a few dozen are used frequently. Others are useful in specific situations, and many exist for historical compatibility and are rarely used.

Configuration mechanisms

Note: This discussion focuses on Nuance Recognizer configuration, but you can use some of these mechanisms to configure other components in the system.

Here is a summary of recognizer configuration mechanisms listed in order of precedence, from highest to lowest (see Rules of parameter precedence):

  • VoiceXML properties: Properties set by the application in a VoiceXML document, and translated by the system to set Recognizer parameters. Some properties have the same name as parameters, and some are different. See Parameters set in VoiceXML applications.
  • Voice browser: Settings in MRCP client messages sent to Speech Server. See Configuring Recognizer with the browser.
  • Parameter grammar files: Settings that affect the recognition context, including all active speech grammars. See Parameters set in parameter grammar files.
  • <meta> tags in grammars: Settings inside of speech grammars, affecting those grammars and their children. See Parameters set in grammar files.
  • session.xml settings: Application defaults loaded from a session.xml file. See Parameters set in session.xml.
  • Management Station: Service properties (configuration parameters) defined centrally on the Management Station and propagated to Speech Server instances. See Setting service properties.
  • Nuance Speech Server: Administrator settings defined on the Speech Server via the Management Station. See Configuring Speech Server.
  • Recognizer configurations: Operators sometimes configure the SpeechWorks.cfg file during installation. This file's main purpose is to assist with bootstrapping during recognizer startup. Administrator settings are defined on the recognition service via the Management Station. See Recognizer parameter categories. If not using the Management Station, set parameters in a Recognizer configuration file. See Configuration without the Management Station.
  • Baseline: Product defaults.
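The precedence order summarized above can be pictured as a layered lookup, where the highest-precedence mechanism that sets a parameter supplies its value. The following Python sketch is illustrative only: the parameter names and values are hypothetical, the resolve function is not part of any Nuance product, and the actual rules (see Rules of parameter precedence) may include exceptions that this simple model does not capture.

```python
from collections import ChainMap

# Mechanism names follow the summary above, highest precedence first.
PRECEDENCE = [
    "VoiceXML properties",
    "Voice browser",
    "Parameter grammar files",
    "<meta> tags in grammars",
    "session.xml settings",
    "Management Station",
    "Nuance Speech Server",
    "Recognizer configurations",
    "Baseline",
]


def resolve(settings_by_mechanism):
    """Return (effective values, mechanism that supplied each value)."""
    layers = [settings_by_mechanism.get(name, {}) for name in PRECEDENCE]
    # ChainMap searches the layers in order, so the first layer
    # (highest precedence) that contains a key wins.
    effective = dict(ChainMap(*layers))
    source = {}
    for key in effective:
        for name, layer in zip(PRECEDENCE, layers):
            if key in layer:
                source[key] = name
                break
    return effective, source


# Hypothetical parameter names and values, for illustration only.
example = {
    "Baseline": {"example_timeout_ms": 7000, "example_nbest": 1},
    "session.xml settings": {"example_timeout_ms": 5000},
    "VoiceXML properties": {"example_nbest": 3},
}
effective, source = resolve(example)
```

In this sketch, the session.xml value overrides the product default for one parameter, and the VoiceXML property overrides the default for the other. Tracking the source of each value is a design choice that can help when troubleshooting an unexpected setting.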

One reason for different mechanisms is to enable different components to load and change the configuration.

Configuration lifecycle

Different people configure the Recognizer at different times for different purposes. Configuration begins during installation with a few site-specific settings. The installation can be on a small development system or a large production system.

Application developers change parameters during implementation and deployment. Later, administrators and application developers fine-tune many parameters in response to real-world performance in the production environment.

Configuring during installation

The initial configuration created during installation differs between development and production systems. A development system means any pre-deployment environment used by voice browser developers or application developers. A production system is for deployed applications receiving real telephone calls.

The installation configuration consists of default settings and a few site-specific values (for example, parameters that point to the location of servers). For site-specific configuration changes, see Configuration during installation.

Configuring during integration

A voice browser uses Nuance Speech Server to access the Recognizer. Whoever develops the browser makes general configuration decisions for the runtime environment, and specific decisions for each application.

Typically, a voice platform integrator builds an MRCP client as part of a VoiceXML browser that serves any number of VoiceXML applications. Alternatively, application developers can build an MRCP client with a non-VoiceXML application (with no need for a voice browser). In both scenarios, MRCP is the mechanism for communicating between the application and the Nuance Speech Server.

Voice platform integrators determine how to best install and configure Nuance speech products for their platform. Typically, Nuance components are installed separately from platform components (using the Nuance installers), and the integrator provides configuration instructions to join the two.

When the system hosts multiple applications, system administrators can define provisioning and data logging directories for each tenant and application being hosted. A tenant is the owner of the application, and is typically represented by a company name.

At runtime, the voice browser configures components on behalf of applications. It can send the application’s session.xml to Speech Server at the start of a session, and when the application sets properties during the session, the client communicates them to Speech Server. Applications can specify any VoiceXML property plus the additional Nuance parameters described in Nuance documentation.

Configuring during development

Application developers make the majority of configuration decisions. They decide global settings for all recognition events, and case-by-case settings for individual events. They use various mechanisms: specifying parameters in application code and in speech grammars, and changing configuration files. In addition, developers use different settings at different times in the application lifecycle: they enable or disable features during development and reverse those settings during deployment.

Most application developers deploy to an existing platform infrastructure. Someone else is responsible for the production system, and has installed and configured the needed components. In this scenario, the developer is responsible for the application configuration. For security purposes, these developers might not have access to some parts of the system (for example, the Speech Server host). In this case, the developer must communicate needed changes to those responsible for the production hosts.

Many applications use products such as Nuance DialogModules, which are mostly controlled through configuration properties. This guide does not address DialogModules directly, but the process is the same, with configuration iterations during installation, development, deployment, and tuning.

See Configuration during application development.

Configuring during deployment

When moving an application to a production system, the developer reproduces the application’s configuration on the runtime machine, tests the initial deployment, and adjusts configuration values as needed.

A typical application gets deployed in phases that gradually increase the number of users. This enables tests for usability and performance while the system remains somewhat private, and gives developers a chance to fine-tune the configuration before committing to full production loads. Developers assess the application’s functionality, stability, and performance throughout the deployment phases, and modify the configuration accordingly.

There is no special deployment mechanism for moving a configuration. Replicating a configuration involves the following:

  • Many configurations are embedded in application files (for example, in <meta> values in speech grammars and properties in VoiceXML files). These settings automatically transfer to the production system when application files are moved.
  • The application’s default configurations are in a session.xml file, which moves to the production system with other application files.
  • Application developers can replicate parameter values from their development system to the production system. When you have access to the host system where Nuance products are installed, you can change each product’s configuration files.
  • System administrators (or application developers) are responsible for assessing factors that affect overall performance. These include licensing, caching behaviors, memory usage, network loads, latency, and system capacity.
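Because there is no special mechanism for moving a configuration, it can help to compare the settings on the two systems before and after replication. The Python sketch below assumes that the settings from each system have already been read into flat name-to-value mappings. The diff_settings function and the parameter names are hypothetical, and reading the actual configuration files is outside the scope of the sketch.

```python
def diff_settings(development, production):
    """Compare two flat name -> value mappings and group the differences."""
    dev_keys, prod_keys = set(development), set(production)
    return {
        # Set on the development system but absent on production.
        "missing_in_production": sorted(dev_keys - prod_keys),
        # Set on production only (for example, site-specific values).
        "only_in_production": sorted(prod_keys - dev_keys),
        # Present on both systems with different values: (dev, prod).
        "different_values": {
            name: (development[name], production[name])
            for name in sorted(dev_keys & prod_keys)
            if development[name] != production[name]
        },
    }


# Hypothetical parameter names and values, for illustration only.
dev = {"example_log_level": "debug", "example_cache_mb": 64, "example_timeout_ms": 5000}
prod = {"example_log_level": "info", "example_cache_mb": 64}
report = diff_settings(dev, prod)
```

Some differences are intentional, such as features enabled during development and reversed for deployment, so a report like this is a checklist for review and not a list of errors.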

For a description of configuration changes when moving applications to a production system, see Configuration during application deployment.

Tuning applications

As load increases on a deployed system, administrators and developers compare actual performance with expectations, and then they tune the configuration accordingly. In practice, analysis and tuning are done by more than one person, and not necessarily the original application developer. Numerous people can contribute:

  • System administrators tune performance. They identify and troubleshoot physical performance problems (CPU, memory, memory cache size, and network issues). They can collect and deliver call logs for further analysis by other users.
  • Application designers use call logs to study the application’s dialog with users (the prompts and expected responses). They identify specific problem areas, and decide how to improve them.
  • User interface designers tune application transaction performance. They determine whether the prompts elicit the expected responses from callers, and make adjustments to achieve more transaction success.
  • Grammar developers tune recognition performance. For example, they compare recordings of caller speech with the resulting confidence scores, and make adjustments if the scores seem too low or high.
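One way to picture the grammar developer's comparison of recordings with confidence scores is to count how often a candidate rejection threshold would accept a wrong result or reject a correct one. The Python sketch below assumes that each reviewed utterance has been reduced to a confidence score between 0 and 1 and a flag saying whether the recognition result was correct. The data, the threshold values, and the threshold_report function are made up for illustration and do not correspond to a specific Recognizer parameter.

```python
def threshold_report(utterances, threshold):
    """utterances: list of (confidence, was_correct) pairs from reviewed recordings."""
    # Wrong results that the threshold would still accept.
    false_accepts = sum(1 for conf, ok in utterances if conf >= threshold and not ok)
    # Correct results that the threshold would reject.
    false_rejects = sum(1 for conf, ok in utterances if conf < threshold and ok)
    total = len(utterances)
    return {
        "threshold": threshold,
        "false_accept_rate": false_accepts / total if total else 0.0,
        "false_reject_rate": false_rejects / total if total else 0.0,
    }


# Made-up sample data: (confidence score, recognition result was correct).
sample = [(0.92, True), (0.81, True), (0.64, False), (0.55, True), (0.30, False)]
reports = [threshold_report(sample, t) for t in (0.4, 0.6, 0.8)]
```

Raising the threshold tends to reduce false accepts at the cost of more false rejects, so comparing the two rates across several candidate values shows the trade-off behind each adjustment.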

Tuning is highly iterative: you measure physical performance (CPU, memory, and network traffic), analyze the call logs for application successes and failures, identify problems, make configuration changes, and evaluate results. You also record the changes in the original development environment to be used for future releases of the application.

During tuning, some configuration changes are temporary: for example, collecting extra data for troubleshooting. Often, changes are incremental: just a few parameters, or small variations in values, followed by a period of usage and data collection. This approach limits the number of variables that change at once, which makes the results easier to interpret.

To detect problems, analyze data from different angles:

  • Recognition performance: A measure of caller speech that is not covered by the grammars, and the resulting recognition rate perceived by callers.
  • Containment: The percentage of calls completed by the application versus calls transferred to human agents.
  • Caller experience: The average call duration, the average number of prompts and responses, and the number of calls slowed by delays of backend resources (such as responses from a database).
  • Transaction success: The number of times a particular logical path in the application was followed, the success rate of that path, the average time to completion, and the frequency of the path resulting in a transfer.
  • System load: The number of calls, the durations of speech, and the usage of memory, CPU, and disk space.
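Several of the measures listed above can be computed from per-call summaries. The Python sketch below assumes that call logs have already been reduced to simple records. The CallRecord fields and the summarize function are hypothetical and do not reflect the actual call log format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CallRecord:
    # Hypothetical fields; real call logs use their own format and names.
    duration_s: float
    prompts: int
    transferred_to_agent: bool
    out_of_grammar_utterances: int
    total_utterances: int


def summarize(calls):
    """Summarize a non-empty list of CallRecord objects."""
    total_utts = sum(c.total_utterances for c in calls)
    oog = sum(c.out_of_grammar_utterances for c in calls)
    contained = sum(1 for c in calls if not c.transferred_to_agent)
    return {
        "calls": len(calls),
        # Containment: calls completed without transfer to a human agent.
        "containment_pct": 100.0 * contained / len(calls),
        # Caller experience: average duration and number of prompts.
        "avg_duration_s": mean(c.duration_s for c in calls),
        "avg_prompts": mean(c.prompts for c in calls),
        # Recognition performance: speech not covered by the grammars.
        "out_of_grammar_pct": 100.0 * oog / total_utts if total_utts else 0.0,
    }


# Made-up sample data, for illustration only.
calls = [
    CallRecord(120.0, 6, False, 1, 5),
    CallRecord(200.0, 9, True, 3, 8),
    CallRecord(100.0, 5, False, 0, 4),
    CallRecord(180.0, 8, False, 1, 3),
]
summary = summarize(calls)
```

Computing the same summary before and after each configuration change gives a consistent basis for evaluating whether the change helped.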

For a description of tuning problems solved through configuration changes, see Configuration for tuning.