Recognizer parameter categories
This topic describes groups of related parameters. For an alphabetical list, see the Recognizer parameter reference.
If you use the Management Station to set defaults on a specific Nuance recognition service instance, your settings override the default values.
Application developers do not typically work with recognition server configuration files; they use session.xml files instead.
Set licensing parameters during installation:
Parameter |
Description |
Default |
---|---|---|
Number of licenses to check out during initialization of the speech detector (endpointer). |
8 (licenses) |
|
Number of licenses the endpointer can claim before the system generates a warning. |
-1 (disabled) |
|
The number of licenses to check out during Recognizer initialization. |
4 (licenses) |
|
The number of licenses Recognizer can claim before the system generates a warning. |
-1 (disabled) |
|
Specifies which features in the license file to enable at the start of each session. |
(all features enabled) |
Use these parameters to control fetching and caching. Also, see Understanding grammar caching.
Parameter |
Description |
Default |
---|---|---|
Enables (or disables) the disk and inet caches. |
1 (enabled) |
|
Largest size of the disk and inet caches after cache cleanup. |
400 (MB) |
|
Minimum size of disk and inet cache entries. |
0 (KB) |
|
Desired maximum size of the disk and inet caches. |
500 (MB) |
|
Default optimization level for fully-optimized grammars. |
9 |
|
Number of previous activations before fully optimizing a grammar. |
3 |
|
Specifies the user agent name presented to the web server when HTTP requests are made. |
OpenSpeechRecognizer/1.0 |
|
Keeps preloaded grammars in the memory cache. |
0 (flushing allowed) |
|
Maximum size of the memory cache after removing grammars to create available space. |
0 (adaptation enabled) |
|
Minimum size for memory cache entries. |
85 (MB) |
|
Desired size of the memory cache. |
0 (KB) |
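The disk and inet cache sizes above suggest a high/low watermark scheme: cleanup starts once the cache grows past the desired maximum size (500 MB by default) and continues until the cache is no larger than the post-cleanup size (400 MB by default). The Python sketch below illustrates only that idea. The eviction order (oldest entry first) and every name in it are assumptions made for illustration, not a description of how Recognizer actually manages its caches.

```python
from collections import OrderedDict


def cleanup(entries, high_mb=500, low_mb=400):
    """Evict oldest entries once the total exceeds high_mb, until it is <= low_mb.

    entries: OrderedDict mapping a cache key to its size in MB, oldest first.
    Returns the list of evicted keys.
    """
    total = sum(entries.values())
    evicted = []
    if total <= high_mb:
        return evicted  # still below the desired maximum: nothing to do
    while entries and total > low_mb:
        key, size = entries.popitem(last=False)  # remove the oldest entry
        total -= size
        evicted.append(key)
    return evicted


# Hypothetical cache contents: 550 MB in total, which exceeds the 500 MB maximum.
cache = OrderedDict([("g1", 200), ("g2", 150), ("g3", 120), ("g4", 80)])
removed = cleanup(cache)
```

Keeping the post-cleanup size below the maximum leaves headroom, so a single cleanup pass is not immediately followed by another one.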
You can use these Recognizer parameters to set up a proxy:
Parameter |
Description |
Default |
---|---|---|
Proxy server to be used by Recognizer when fetching grammar URIs. |
<value/> (empty value) |
|
The port of a proxy server. |
<value/> (empty value) |
Optional. Use these parameters to set up mutual authentication when fetching grammars with HTTPS. Most applications use one-way authentication, which does not require that any Recognizer parameters be set.
Parameter |
Description |
Default |
---|---|---|
A file containing one or more sequential PEM-encoded public CA certificates, used by Recognizer when mutual authentication is required for fetching grammars. |
(none) |
|
The PEM-encoded client certificate used by the Recognizer when mutual authentication is required for fetching grammars. |
(none) |
|
swirec_inet_ssl_private_key_file |
The PEM-encoded private key for Recognizer when mutual authentication is required. |
(none) |
swirec_inet_ssl_verify |
Enables peer authentication when mutual authentication is required. |
0 |
swirec_inet_ssl_verify_depth |
Limits the depth of the certificate chain for validation when using mutual authentication. |
2 (accommodates one intermediate CA) |
These parameters cannot be changed dynamically at runtime because they apply to the process as a whole and not to individual recognition events.
Parameter |
Description |
Default |
---|---|---|
Audio formats that will be supplied to Recognizer. |
audio/basic;rate=8000 |
|
Number of licenses to check out during initialization of the speech detector (endpointer). |
8 (licenses) |
|
Number of licenses the endpointer can claim before the system generates a warning. |
-1 (disabled) |
|
Minimum amount of audio data processed by the endpointer. |
800 (bytes) |
|
Maximum number of channels to save waveforms (recordings of speech from callers). |
-1 (no maximum) |
|
Minimum data for updating acoustic models. |
(depends on language) |
|
When to update recognition models with learned statistics. |
0000 (midnight) |
|
How long to save old statistics files. |
3 (months) |
|
Stops self-learning adaptation of recognition models. |
0 (updates enabled) |
|
Audio formats that will be supplied to Recognizer. |
audio/basic;rate=8000 |
|
Activates all public rules in a speech grammar as the root rule. |
0 (disabled) |
|
Default optimization level for grammars. |
6 |
|
Enables (or disables) the disk and inet caches. |
1 (enabled) |
|
Largest size of the disk and inet caches after cache cleanup. |
400 (MB) |
|
Minimum size of disk and inet cache entries. |
0 (KB) |
|
Desired maximum size of the disk and inet caches. |
500 (MB) |
|
Default optimization level for fully-optimized grammars. |
9 |
|
Number of previous activations before fully optimizing a grammar. |
3 |
|
A media type used during grammar activation. (It replaces the media type fetched from the server.) |
0 (ignore fetched type) |
|
Defines characters that represent delimiters in URI strings. |
; and & (semicolon and ampersand) |
|
Specifies a text file that maps language declarations in grammars to Nuance language codes. |
(Recognizer language codes) |
|
The number of licenses to check out during Recognizer initialization. |
4 (licenses) |
|
The number of licenses Recognizer can claim before the system generates a warning. |
-1 (disabled) |
|
Specifies which features in the license file to enable at the start of each session. |
(all features enabled) |
|
Keeps preloaded grammars in the memory cache. |
0 (flushing allowed) |
|
Number of pronunciations to generate automatically when a word is not found. |
1 (pronunciations) |
|
Maximum allowed size of grammars that can be dynamically compiled. |
-1 (unlimited) |
|
Limits the CPU and memory cost of SLMs trained at run-time. |
-1 (unlimited) |
|
Maximum size of the memory cache after removing grammars to create available space. |
0 (adaptation enabled) |
|
Minimum size for memory cache entries. |
85 (MB) |
|
File that loads grammars during Recognizer initialization. |
$SWISRSDK/config/SWIgrmPreload.xml |
|
Writes detailed statistics of Recognizer processing to the call logs. |
0 (SWIstats is disabled) |
|
Name and location of the system dictionary file. |
(system dictionary provided by the Recognizer) |
|
Interval for hot insert loading of acoustic models. |
300 (5 minutes) |
|
Disables the hot insert feature. |
$SWISRSDK/config/update_lock.txt |
|
The number of digits used in filenames when the Recognizer saves waveform files. |
2 (limit of 99 files per session) |
|
Assigns the same number in the filenames of related waveforms. |
0 (disabled) |
|
Density of the word lattice. |
100.0 |
|
Density of the word lattice. |
7.0 |
|
Default confidence threshold for any application SSMs. |
0.0 |
Here are the remaining parameters set in a recognizer configuration file. The values override the installation defaults. They can also be set via other mechanisms.
Parameter |
Description |
Default |
---|---|---|
Allows callers to interrupt prompts. |
1 (enabled) |
|
How long to wait before concluding that a caller is finished speaking. |
0 (timer disabled) |
|
Minimum confidence score. Nuance Recognizer rejects utterances with scores below this value. |
0 (all utterances accepted) |
|
Duration of silence to determine that callers have finished speaking. |
1500 (milliseconds) |
|
Sensitivity of the speech detector when looking for speech. |
0.5 |
|
Safety margin to ensure that the begin-of-speech is captured. |
200 (milliseconds) |
|
Safety margin to ensure the end-of-speech is captured. |
350 (milliseconds) |
|
Controls how loudly callers must speak to interrupt prompts (barge-in) and detect speech. |
50 (percent) |
|
Maximum duration of a magic word candidate for recognition. |
800 (milliseconds) |
|
Version of a Recognizer language to use. |
(most recent version of each language) |
|
Minimum duration of a magic word candidate for recognition. |
200 (milliseconds) |
|
Sets special recognition modes (such as magic word) in the endpointer. |
begin_only |
|
Disables barge-in briefly at the beginning of a prompt. |
0 (no delay) |
|
Storage location of self-learning files. Controls sharing of models across the server, tenants, or applications. |
(empty) |
|
Temporarily stops self-learning activities for one or more languages. |
(depends on language) |
|
Controls the logged output by enabling tags in diagnostic log files. |
(none) |
|
Adds grammar keys to the XML result. |
SWI_meaning, SWI_literal, SWI_grammarName |
|
Adds weight to match the dynamic ranges of language and acoustic models. |
1.0 |
|
Defines ranges of system activity (idle, normal, and busy) based on CPU capacity. |
0, 14, 40, 101 (percentages) |
|
Overrides the automatic detection of CPU load, and forces specific values for parameters that balance speed and accuracy. |
on |
|
Confidence threshold for magic word recognition results. |
500 |
|
Maximum number of active FSM arcs. |
10000, 5000, 3000 |
|
Maximum CPU time used to recognize an answer. |
20000 (milliseconds) |
|
Maximum number of pronunciations per word. |
8 (pronunciations) |
|
Number of n-best entries written to the call log. |
2 (n-best entries) |
|
Maximum number of parses evaluated by Recognizer for a single literal string. |
10 (parses) |
|
Maximum CPU time for the search phase of recognition. |
5000 (milliseconds) |
|
Maximum number of candidates for filling the n-best list. |
999999 (sentences) |
|
Maximum number of n-best answers that can be returned. |
2 (n-best length) |
|
Provides a secondary guide to the Viterbi beam search. |
-30, -60, -60 |
|
Returns waveforms in recognition results. |
1 (enabled) |
|
Sets security levels for protecting confidential data. |
open |
|
Confidence threshold for selective_barge_in mode. |
500 |
|
Limits search paths that end in a silence model during pruning. |
56, 56, 56 |
|
Defines allophone maps for secondpass processing in the Recognizer. |
(default mapfiles used) |
|
Defines finite state machines for secondpass processing in the Recognizer. |
(default fsm files used) |
|
Acoustic models for secondpass processing in Recognizer. |
(default models used) |
|
Primary guide for the Viterbi beam search. |
0, -15, -35 |
|
How much silence is kept at the start of a collected utterance. |
0 (milliseconds) |
|
Controls whether Recognizer performs word confidence calculations. |
0 (disabled) |
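Several defaults in the table above are lists of three values (for example, "10000, 5000, 3000"), and the system activity ranges default to "0, 14, 40, 101". One plausible reading, which is an assumption and not confirmed by this topic, is that the four boundaries delimit the idle, normal, and busy ranges, and that each three-valued parameter supplies one value per range. A minimal Python sketch of that reading, with hypothetical function names:

```python
from bisect import bisect_right

# Default range boundaries from the table above, in percent of CPU capacity.
LOAD_BOUNDARIES = [0, 14, 40, 101]
LOAD_NAMES = ["idle", "normal", "busy"]


def load_index(cpu_percent, boundaries=LOAD_BOUNDARIES):
    """Return 0 (idle), 1 (normal), or 2 (busy) for a CPU load percentage."""
    if not boundaries[0] <= cpu_percent < boundaries[-1]:
        raise ValueError(f"CPU load out of range: {cpu_percent}")
    # Assumption: each lower boundary is inclusive, each upper boundary exclusive.
    return bisect_right(boundaries, cpu_percent) - 1


def value_for_load(values, cpu_percent):
    """Pick the entry of a three-valued parameter that matches the current load."""
    return values[load_index(cpu_percent)]


max_arcs = [10000, 5000, 3000]  # a three-valued default shown in the table
```

Under this reading, `value_for_load(max_arcs, 25)` selects the middle value because 25 percent falls in the normal range.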
Application developers can set any parameter defined in the VoiceXML standard. They can also set Nuance-specific recognizer parameters using the <property> tag, which the voice browser handles as an MRCP vendor-specific parameter.
These parameters are defined in the VoiceXML standard:
Parameter |
Description |
Default |
---|---|---|
Allows callers to interrupt prompts. |
1 (enabled) |
|
How long to wait before concluding that a caller is finished speaking. |
0 (timer disabled) |
|
Minimum confidence score. Recognizer rejects utterances with scores below this value. |
0 (all utterances accepted) |
|
Duration of silence to determine that callers have finished speaking. |
1500 (milliseconds) |
|
Maximum duration of an utterance collected from users. |
-1 (no timeout) |
|
Sensitivity of the speech detector when looking for speech. |
0.5 |
|
Sets security levels for protecting confidential data. |
open |
|
Specifies how long to wait for speech after a prompt ends. |
7000 (milliseconds) |
The voice browser must pass these parameters to the Speech Server. It reads the parameters from a VoiceXML document, and performs any needed translation for the recognizer. For example, a VoiceXML value of "10s" might need to be passed as "10000". For a list of needed translations, see Implementing an MRCP client.
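As an illustration of the "10s" to "10000" translation mentioned above, the following Python sketch converts a VoiceXML time designation (a non-negative number followed by `s` or `ms`) into a millisecond string. The function name is hypothetical, and the sketch assumes the recognizer expects whole milliseconds; consult Implementing an MRCP client for the translations that are actually required.

```python
import re

# A VoiceXML time designation: a non-negative number followed by "s" or "ms".
_TIME = re.compile(r"^(\d*\.?\d+)(ms|s)$")


def vxml_time_to_ms(value):
    """Convert a value such as "10s" or "300ms" to a millisecond string."""
    match = _TIME.match(value.strip())
    if match is None:
        raise ValueError(f"not a VoiceXML time designation: {value!r}")
    number, unit = match.groups()
    factor = 1000 if unit == "s" else 1
    # Assumption: fractional milliseconds are rounded to the nearest integer.
    return str(round(float(number) * factor))
```

For example, `vxml_time_to_ms("10s")` returns `"10000"`, and a value without a unit raises `ValueError` instead of being passed through unchanged.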
Application developers can define these parameters in a session.xml file when building and tuning applications. For details, see Configuring sessions (session.xml).
Parameter |
Description |
Default |
---|---|---|
Preloads a language during Recognizer startup, and sets the default for built-in grammars. |
default (first language installed) |
|
Storage location of self-learning files. Controls sharing of models across the server, tenants, or applications. |
(empty) |
|
Temporarily stops self-learning activities for one or more languages. |
(depends on language) |
|
Includes "*#ABCD" in DTMF built-in grammars. |
0 |
|
Controls the logged output by enabling tags in diagnostic log files. |
(none) |
|
Ignores missing pronunciations during grammar compilation. |
0 (disabled) |
|
Ignores the media type that is returned by the server upon fetching a grammar. |
1 (ignore fetched type) |
|
Default language for built-in grammars. |
(value of Recognizer’s DefaultLanguage parameter) |
|
Version of a Recognizer language to use. |
(most recent version of each language) |
|
Adds weight to match the dynamic ranges of language and acoustic models. |
1.0 |
|
Overrides the automatic detection of CPU load, and forces specific values for parameters that balance speed and accuracy. |
on |
|
Maximum number of active FSM arcs. |
10000, 5000, 3000 |
|
Provides a secondary guide to the Viterbi beam search. |
-30, -60, -60 |
|
Defines allophone maps for secondpass processing in Recognizer. |
(default mapfiles used) |
|
Defines finite state machines for secondpass processing in Recognizer. |
(default fsm files used) |
|
Acoustic models for secondpass processing in Recognizer. |
(default models used) |
|
Retains the semicolon as separator in the query string when importing a grammar. |
0 |
|
Suppresses logging of confidential values in grammar URI strings. |
(empty) |
|
Primary guide for the Viterbi beam search. |
0, -15, -35 |
|
For Nuance use only. |
Certain parameters can be set with the <meta> tag inside grammar files. This raises a question of when to apply values: during grammar compilation (which allows different grammars to have different values for the same parameter) or during recognition (which requires a single, shared value when different grammars reference each other).
When a parameter is set by more than one active grammar, there are implications for precedence. See Precedence of parameters set via <meta> tags.
Compilation-time parameters
The settings of “compilation-time” parameters are determined when the grammar is compiled. The setting is used in the grammar even if the grammar is subsequently imported by another grammar that sets the same parameter differently.
Parameter |
Description |
Default |
---|---|---|
Speeds up recognition at the cost of compilation performance. |
0 (feature is off) |
|
Specifies an n-gram grammar file that defines a Statistical Language Model (SLM). |
(empty) |
|
Specifies a finite state machine (fsm) used by a speech grammar. |
(empty) |
|
Specifies a wordlist used by a speech grammar. |
(empty) |
|
Maximum number of pronunciations per word. |
8 (pronunciations) |
|
Limits the number of pronunciations for phrases in user dictionaries. |
0 (pronunciations for whole phrases and their individual words) |
|
Optimization level for the grammar. |
6 (for dynamic compilations) |
|
Improves accuracy by adding a normalized, probabilistic language model. |
0 (normalization is off) |
|
Specifies an SLM training set. |
(empty) |
Recognition-time parameters
When “recognition-time” parameters are set via a <meta> tag in a grammar, and the grammar is subsequently imported by another grammar, the <meta> setting is ignored and the parent grammar determines the parameter value.
Parameter |
Description |
Default |
---|---|---|
Duration of silence to determine that callers have finished speaking. |
1500 (milliseconds) |
|
Temporarily stops self-learning activities for one or more languages. |
(depends on language) |
|
Adds application or browser information to call logs to synchronize runtime activities with log analysis. |
(empty) |
|
Maximum number of nodes visited during the a-star search. |
100000 (nodes visited) |
|
Adds weight to match the dynamic ranges of language and acoustic models. |
1.0 |
|
Maximum number of active FSM arcs. |
10000, 5000, 3000 |
|
Maximum CPU time used to recognize an answer. |
20000 (milliseconds) |
|
Number of n-best entries written to the call log. |
2 (n-best entries) |
|
Maximum number of parses evaluated by Recognizer for a single literal string. |
10 (parses) |
|
Maximum number of n-best answers that can be returned. |
2 (n-best length) |
|
Provides a secondary guide to the Viterbi beam search. |
-30, -60, -60 |
|
Returns waveforms in recognition results. |
1 (enabled) |
|
Sets security levels for protecting confidential data. |
open |
|
Limits search paths that end in a silence model during pruning. |
56, 56, 56 |
|
Specifies a single key to return in the recognition result instead of all keys. |
(empty) |
|
Primary guide for the Viterbi beam search. |
0, -15, -35 |
|
Controls whether Recognizer performs word confidence calculations. |
0 (disabled) |
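The difference between compilation-time and recognition-time parameters can be modeled in a few lines of Python. This is only a sketch of the rule described above: the class, the function, and both parameter names are hypothetical, and the fallback to a default when the parent grammar does not set a recognition-time parameter is an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical parameter names, used only to illustrate the two categories.
COMPILE_TIME = {"example_compile_param"}
RECOGNITION_TIME = {"example_recognition_param"}


@dataclass
class Grammar:
    name: str
    meta: dict = field(default_factory=dict)     # values set via <meta> tags
    imports: list = field(default_factory=list)  # grammars this grammar imports


def effective_value(param, grammar, parent, defaults):
    """Value of param for `grammar` when it is imported by `parent`."""
    if param in COMPILE_TIME:
        # Fixed when the grammar was compiled; the importing grammar cannot change it.
        return grammar.meta.get(param, defaults[param])
    if param in RECOGNITION_TIME:
        # The imported grammar's own <meta> setting is ignored.
        return parent.meta.get(param, defaults[param])
    raise KeyError(param)


defaults = {"example_compile_param": 6, "example_recognition_param": 1500}
child = Grammar("child", {"example_compile_param": 9, "example_recognition_param": 800})
parent = Grammar("parent", {"example_compile_param": 6, "example_recognition_param": 1200}, [child])
```

Here the child keeps its own compilation-time value (9) but takes the parent's recognition-time value (1200) instead of its own 800.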
A parameter grammar sets recognition parameters on all active speech grammars. For parameter grammar format and activation, see Understanding parameter grammars.
These parameters can be set via parameter grammars:
Parameter |
Description |
Default |
---|---|---|
How long to wait before concluding that a caller is finished speaking. |
0 (timer disabled) |
|
Duration of silence to determine that callers have finished speaking. |
1500 (milliseconds) |
|
Maximum duration of an utterance collected from users. |
-1 (no timeout) |
|
Sensitivity of the speech detector when looking for speech. |
0.5 |
|
Temporarily stops self-learning activities for one or more languages. |
(depends on language) |
|
Adds application or browser information to call logs to synchronize runtime activities with log analysis. |
(empty) |
|
Sets special recognition modes in Recognizer. |
normal |
|
Adds grammar keys to the XML result. |
SWI_meaning, SWI_literal, SWI_grammarName |
|
A grammar script to be invoked on the root rule of each n-best result. |
(empty) |
|
A grammar script to be invoked on the root rule of each n-best result. |
(empty) |
|
Defines ranges of system activity (idle, normal, and busy) based on CPU capacity. |
0, 14, 40, 101 (percentages) |
|
Overrides the automatic detection of CPU load, and forces specific values for parameters that balance speed and accuracy. |
on |
|
Confidence threshold for magic word recognition results. |
500 |
|
Maximum number of active FSM arcs. |
10000, 5000, 3000 |
|
Maximum CPU time used to recognize an answer. |
20000 (milliseconds) |
|
Number of n-best entries written to the call log. |
2 (n-best entries) |
|
Maximum number of parses evaluated by the Recognizer for a single literal string. |
10 (parses) |
|
Maximum number of candidates for filling the n-best list. |
999999 (sentences) |
|
Maximum number of n-best answers that can be returned. |
2 (n-best length) |
|
Provides a secondary guide to the Viterbi beam search. |
-30, -60, -60 |
|
Returns waveforms in recognition results. |
1 (enabled) |
|
Confidence threshold for selective_barge_in mode. |
500 |
|
Limits search paths that end in a silence model during pruning. |
56, 56, 56 |
|
Defines allophone maps for secondpass processing in Recognizer. |
(default mapfiles used) |
|
Defines finite state machines for secondpass processing in Recognizer. |
(default fsm files used) |
|
Acoustic models for secondpass processing in Recognizer. |
(default models used) |
|
Primary guide for the Viterbi beam search. |
0, -15, -35 |
|
How much silence is kept at the start of a collected utterance. |
0 (milliseconds) |
|
Controls whether Recognizer performs word confidence calculations. |
0 (disabled) |
|
Default confidence threshold for any application SSMs. |
0.0 |
Recognizer needs values for some parameters before initialization. These parameters are static and are seldom changed after the initial installation.
Operational parameters
The system administrator sets the following parameters as environment variables or in SpeechWorks.cfg. If the parameter is set in both locations, the environment variable is used and the value in the configuration file is ignored.
Parameter |
Description |
Default |
---|---|---|
Preloads a language during Recognizer startup, and sets the default for built-in grammars. |
default (first language installed) |
|
Storage location for grammars fetched by Recognizer. |
NULL (disabled) |
|
Maximum size of the grammar dump directory. |
100000 (100 MB) |
|
Specifies the locations of License Managers. |
27000@localhost |
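The precedence rule for operational parameters (an environment variable overrides the same parameter in SpeechWorks.cfg) can be sketched as a simple lookup. The function and the parameter name `EXAMPLE_PARAM` are hypothetical, and the configuration file is represented as an already-parsed dictionary because its syntax is not described in this topic.

```python
import os


def resolve(name, cfg, environ=None, default=None):
    """Return the effective value of an operational parameter.

    cfg: dict of values read from the configuration file (parsing not shown).
    environ: mapping of environment variables; defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]       # the environment variable takes precedence
    return cfg.get(name, default)  # otherwise fall back to the file, then the default


cfg = {"EXAMPLE_PARAM": "from_cfg"}
from_env = resolve("EXAMPLE_PARAM", cfg, environ={"EXAMPLE_PARAM": "from_env"})
from_file = resolve("EXAMPLE_PARAM", cfg, environ={})
```

Passing `environ` explicitly keeps the lookup easy to test without modifying the real process environment.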
Diagnostic parameters in SpeechWorks.cfg
These parameters are for TRC diagnostic logging. As above, they are set as environment variables or in SpeechWorks.cfg.
Parameter |
Description |
Default |
---|---|---|
Defines how often to reload the tagmap files from disk. |
600 (seconds) |
|
File that maps diagnostic log messages into any language. |
$SWISRSDK/config/SWIErrors.en.us.txt |
|
Deprecated. |
(not applicable) |
|
Maximum size of the diagnostic log file. |
1024 (KB) |
|
Writes diagnostic logs to a file, stdout, or both. |
debug, file (stdout and file) |
|
Suppresses timestamps in TRC diagnostic logs. |
0 (timestamps written) |
|
Recognizer’s tagmap files for TRC diagnostic logging. |
$SWISRSDK/config/defaultTagmap.txt;$SWISRSDK/config/bwcompatTagmap.txt |
|
Application’s tagmap files for custom TRC diagnostic logging. |
(empty) |
Paths for Recognizer initialization
The Nuance Speech Suite installer sets the following parameter as an environment variable. On rare occasions, a system administrator might change this value.
Parameter |
Default |
Description |
---|---|---|
|
Environment variable pointing to the Recognizer installation directory. |