Configuration during application development

Application developers configure application sessions and individual recognition events. They use different configurations for development and deployment systems, depending on whether the emphasis is troubleshooting, testing, tuning, or performance.

This illustration shows configuration mechanisms available to application developers:

Configuration summary:

The VoiceXML application specifies properties that the voice browser communicates to the Speech Server, Recognizer, and text-to-speech engine. The settings are often valid for a single recognition or synthesis event, whereupon they revert to their default values.
Applications specify two kinds of properties: basic properties as defined in the VoiceXML specification, and Nuance-specific parameters that refine the control of Recognizer. The browser passes the Nuance parameters as MRCP vendor-specific properties.
The session.xml configures defaults for the duration of the session.
Speech grammars contain parameters to control individual recognition events. Recognizer loads the settings when the application activates a grammar. Parameter grammars configure a group of active grammars.
Management Station sets defaults for a specific recognition service instance. Application developers change parameters this way when a higher-level mechanism is not available.
The text-to-speech capabilities of Nuance Vocalizer support parameters that may be set in Management Station.

Temporary settings to aid development

These parameters are occasionally set differently for application development and deployment.

Setting these parameters requires access to the recognition host.

Parameter	Description	Default
DiagTagMapsUser	Application’s tagmap files for custom TRC diagnostic logging.	(empty)
swirec_disk_cache_enabled	Enables (or disables) the disk and inet caches.	1 (enabled)
swirec_enable_robust_compile	Ignores missing pronunciations during grammar compilation. Useful for applications that automatically generate grammars (for example, generating grammars at runtime from data in a database). This parameter is not useful for hand-written grammars.	0 (disabled)

Parameters for each recognition

Applications configure Recognizer for each recognition event.

Typically, the VoiceXML application sets these parameters with a <property> tag, and the browser passes values to Speech Server in MRCP headers.

Parameter	Description	Default
bargein	Allows callers to interrupt prompts.	1 (enabled)
completetimeout	How long to wait before concluding that a caller is finished speaking.	0 (timer disabled)
confidencelevel	Minimum confidence score. Nuance Recognizer rejects utterances with scores below this value. (Does not apply to Dragon Voice recognition.)	0 (all utterances accepted)
incompletetimeout	Duration of silence to determine that callers have finished speaking.	1500 (milliseconds)
maxspeechtimeout	Maximum duration of an utterance collected from users.	-1 (no timeout)
sensitivity	Sensitivity of the speech detector when looking for speech.	0.5
swirec.secure_context	Sets security levels for protecting confidential data.	open
timeout	How long to wait for speech after a prompt ends.	7000 (milliseconds)
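
As a minimal sketch of the <property> mechanism described above, the following VoiceXML fragment sets a few of these parameters at form scope and overrides one at field scope. The form and field names, the grammar file name, the prompt text, and all values are illustrative assumptions. Values are written in VoiceXML 2.0 property syntax (for example, true and 5s), which may differ from the Recognizer-side notation in the table; check how your voice browser maps them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="get_account">
    <!-- Form-scoped properties: apply to every recognition in this form -->
    <property name="bargein" value="true"/>
    <property name="timeout" value="5s"/>
    <property name="confidencelevel" value="0.4"/>

    <field name="account">
      <!-- Field-scoped property: overrides the inherited value for this field only -->
      <property name="incompletetimeout" value="2s"/>
      <!-- account.grxml is a hypothetical grammar file -->
      <grammar src="account.grxml" type="application/srgs+xml"/>
      <prompt>Please say your account number.</prompt>
    </field>
  </form>
</vxml>
```

Setting a property at the narrowest scope that needs it keeps the change local: once the field completes, the form-level and default values apply again.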

Parameters for recognition results

Applications use these parameters to control a grammar’s recognition results and to return special key/value pairs in the results.

Typically, the VoiceXML application sets these parameters with a <property> tag, and the browser passes values to Speech Server as MRCP vendor-specific parameters.

Parameter	Description	Default
swirec_extra_nbest_keys	Adds grammar keys to the XML result.	SWI_meaning, SWI_literal, SWI_grammarName
swirec_grammar_script	A grammar script to be invoked on the root rule of each n-best result.	(empty)
swirec_grammar_script_sisr	A grammar script to be invoked on the root rule of each n-best result.	(empty)
swirec_nbest_list_length	Maximum number of n-best answers that can be returned.	2 (n-best length)
swirec_save_comp_stats	Adds the speech mode attribute to nomatch recognition results (to conform to VoiceXML 2.0).	0 (disabled)
resultNbestExtraKeys	Dragon Voice: adds confidence scores to the XML result.	(empty)
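
The following sketch shows how an application might raise the n-best list length for a single form. It assumes that the voice browser forwards unrecognized property names to Speech Server unchanged as vendor-specific parameters; some browsers may require a different naming convention. The form name, field name, grammar file, and the value 3 are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="get_city">
    <!-- Nuance-specific parameter: allow up to 3 n-best answers (default is 2) -->
    <property name="swirec_nbest_list_length" value="3"/>

    <field name="city">
      <!-- city.grxml is a hypothetical grammar file -->
      <grammar src="city.grxml" type="application/srgs+xml"/>
      <prompt>Which city?</prompt>
    </field>
  </form>
</vxml>
```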

Parameters for accuracy and performance

All grammar parameters can affect CPU usage, compilation or recognition latencies, and recognition accuracy. The parameters listed here have an especially strong impact.

Application developers set some of these parameters using a <meta> inside grammars. Other parameters are set on the Nuance recognition service via Management Station.

Parameter	Description	Default
swirec_compile_parser	Speeds recognition time at the cost of compilation performance.	0 (feature is off)
swirec_enable_robust_compile	Ignores missing pronunciations during grammar compilation.	0 (disabled)
swirec_language_translation_table	Specifies a text file that maps language declarations in grammars to Nuance language codes.	(Recognizer language codes)
swirec_max_auto_prons	Number of pronunciations to generate automatically when a word is not found.	1 (pronunciations)
swirec_max_dict_prons	Maximum number of pronunciations per word.	8 (pronunciations)
swirec_multiword_replace	Limits the number of pronunciations for phrases in user dictionaries.	0 (pronunciations for whole phrases and their individual words)
swirec_normalize_to_probabilities	Improves accuracy by adding a normalized, probabilistic language model.	0 (normalization is off)
swirec_optimization	Optimization level for the grammar.	6 (for dynamic compilations)

Configuration in grammar files

When writing speech grammars, use the parameters shown in the sections below. Set these parameters inside grammar files using the <meta> tag. There are two general categories:

Parameters applied to individual grammars
Parameters applied to the set of active grammars (grammars activated in parallel)

Choosing performance of compilation time, recognition time, or accuracy

These parameters balance compilation time, recognition time, and recognition accuracy. Typically, improving performance of one dimension decreases performance of the others:

Parameter	Description	Default
swirec_compile_parser	Speeds recognition time at the cost of compilation performance.	0 (feature is off)
swirec_optimization	Optimization level for the grammar.	6 (for dynamic compilations)

These parameters balance the variety of pronunciations with compilation time, recognition time, CPU load, and recognition accuracy:

Parameter	Description	Default
swirec_max_dict_prons	Maximum number of pronunciations per word.	8 (pronunciations)
swirec_multiword_replace	Limits the number of pronunciations for phrases in user dictionaries.	0 (pronunciations for whole phrases and their individual words)
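
As an illustration of setting parameters inside a grammar file, here is a minimal SRGS XML grammar with two <meta> elements in its header. The rule content and the parameter values are assumptions chosen for the example, not recommended settings; measure compilation and recognition behavior before changing defaults.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="yesno">
  <!-- Header: meta elements must appear before the first rule -->
  <!-- Trade longer compilation for faster recognition (default is 0) -->
  <meta name="swirec_compile_parser" content="1"/>
  <!-- Consider at most 4 pronunciations per word (default is 8) -->
  <meta name="swirec_max_dict_prons" content="4"/>

  <rule id="yesno" scope="public">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
```

Because the settings live in the grammar, Recognizer applies them whenever the application activates this grammar, without any change to the VoiceXML page.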

Tuning Recognizer performance

These parameters tune application performance by controlling Recognizer’s search for matches:

Parameter	Description	Default
swirec_astar_max_paths	Maximum number of nodes visited during the a-star search.	100000 (nodes visited)
swirec_lmweight	Adds weight to match the dynamic ranges of language and acoustic models.	1.0
swirec_max_arcs	Maximum number of active FSM arcs.	10000, 5000, 3000
swirec_phoneme_lookahead_beam	Provides a secondary guide to the Viterbi beam search.	-30, -60, -60
swirec_silence_prune_offset	Limits search paths that end in a silence model during pruning.	56, 56, 56
swirec_state_beam	Primary guide for the Viterbi beam search.	0, -15, -35
swirec_nbest_list_length	Maximum number of n-best answers that can be returned.	2 (n-best length)

Semantic interpretation:

Parameter	Description	Default
swirec_max_parses_per_literal	Maximum number of parses evaluated by Recognizer for a single literal string.	10 (parses)
swirec_simple_result_key	Specifies a single key to return in the recognition result instead of all keys.	(empty)
swirec_word_confidence_enabled	Controls whether Recognizer performs word confidence calculations.	0 (disabled)

Using custom models

Nuance can provide custom models for an application, and for specific contexts within a speech grammar. These parameters control the usage of those models:

Parameter	Description	Default
swirec_model_name	Points to custom models for firstpass processing in Recognizer.	(default models are used for each language)
swirec_secondpass_allophone_mapfile_name	Defines allophone maps for secondpass processing in Recognizer.	(default mapfiles used)
swirec_secondpass_global_fsm_name	Defines finite state machines for secondpass processing in Recognizer.	(default fsm files used)
swirec_secondpass_model_name	Acoustic models for secondpass processing in Recognizer.	(default models used)

Controlling language models (SLMs)

Use these parameters when building models for SLMs and robust parsing grammars:

Parameter	Description	Default
swirec_first_pass_grammar	Specifies an n-gram grammar file that defines a Statistical Language Model (SLM).	(empty)
swirec_fsm_grammar	Specifies a finite state machine (fsm) used by a speech grammar.	(empty)
swirec_fsm_wordlist	Specifies a wordlist used by a speech grammar.	(empty)
swirec_training_grammar	Specifies an SLM training set.	(empty)

These parameters are used when building SLMs, but also have other uses:

Parameter	Description	Default
swirec_acoustic_adapt_suppress_adaptation	Temporarily stops self-learning activities for one or more languages.	(depends on language)
swirec_normalize_to_probabilities	Improves accuracy by adding a normalized, probabilistic language model.	0 (normalization is off)

This parameter is used when building SSMs:

Parameter	Description	Default
swissm_confidence_threshold	Default confidence threshold for any application SSMs.	0.0

Controlling waveforms saved by Recognizer

These parameters control audio data after the application collects utterances from users and delivers them to Recognizer.

The VoiceXML application sets some of these parameters with a <property> tag, and the browser passes values to Speech Server as MRCP vendor-specific parameters. Other parameters are set by system administrators on the recognition host.

Parameter	Description	Default
swiep_waveform_logging_max_channels	Maximum number of channels to save waveforms (recordings of speech from callers).	-1 (no maximum)
swirec_waveform_begin_silence	How much silence is kept at the start of a collected utterance.	0 (milliseconds)
swirec_waveform_end_silence	How much silence is kept at the end of a collected utterance.	0 (milliseconds)
swirec_waveform_interword_max	How much silence is kept in a collected utterance.	0 (milliseconds)
swirec_waveform_logging_max_channels	Maximum number of channels allowed to save waveforms (recordings of speech from callers).	-1 (no maximum)
swirec_waveform_speech_thresh	Removes line noise from audio recordings.	10 (percent)
swirec_return_waveform	Returns waveforms in recognition results.	1 (enabled)

Controlling data written to logs

These parameters control Recognizer data written to the call logs.

Typically, the VoiceXML application sets these parameters with a <property> tag, and the browser passes values to Speech Server as MRCP vendor-specific parameters.

Parameter	Description	Default
swirec.secure_context	Sets security levels for protecting confidential data.	open
swirec_app_state_tokens	Adds application or browser information to call logs to synchronize runtime activities with log analysis.	(empty)
swirec_sensitive_query_keys	Suppresses logging of confidential values in grammar URI strings.	(empty)
swirec_max_logged_nbest	Number of n-best entries written to the call log.	2 (n-best entries)

Controlling magic word recognition modes

These parameters control the magic word and selective barge-in features, which enable responses based on detecting specified words.

Typically, the VoiceXML application sets these parameters with a <property> tag, and the browser passes values to Speech Server as MRCP vendor-specific parameters.

Parameter	Description	Default
swirec_magic_word_conf_thresh	Confidence threshold for magic word recognition results.	500
swirec_selective_barge_in_conf_thresh	Confidence threshold for selective_barge_in mode.	500

Typically, these parameters are set for all applications running on the Speech Server host. Alternatively, the browser can set these parameters with MRCP vendor-specific parameters.

Parameter	Description	Default
swiep_magic_word_max_msec	Maximum duration of a magic word candidate for recognition.	800 (milliseconds)
swiep_magic_word_min_msec	Minimum duration of a magic word candidate for recognition.	200 (milliseconds)
swiep_mode	Sets special recognition modes (such as magic word) in the endpointer.	begin_only
swirec_barge_in_mode	Sets special recognition modes in Recognizer.	normal

Troubleshooting during development

These parameters are for troubleshooting speech grammars. System administrators set them on the recognition host:

Parameter	Description	Default
GrammarDumpDirectory	Storage location for grammars fetched by Recognizer.	NULL (disabled)
GrammarDumpDirectorySize	Maximum size of the grammar dump directory.	100000 (100 MB)

Troubleshooting languages

By default, built-in grammars use the Recognizer’s default language (which is determined during installation). When the application language does not match the default, the application must declare a language whenever it uses a built-in grammar.
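
One way to declare the language is the standard xml:lang attribute on the <grammar> element, sketched below for a French application running on a system whose default language is assumed to be something else. The builtin:grammar/digits URI form is platform-dependent and is an assumption here, as are the form and field names; confirm the built-in grammar syntax that your browser and Recognizer accept.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="fr-FR">
  <form id="get_pin">
    <field name="pin">
      <!-- Declare the language explicitly on the built-in grammar reference -->
      <grammar src="builtin:grammar/digits" xml:lang="fr-FR"/>
      <prompt>Veuillez dire votre code.</prompt>
    </field>
  </form>
</vxml>
```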