<speechserver>
The <speechserver> element defines the default parameters that apply to the Speech Server interactions with the Recognizer and Vocalizer.
This element has three child elements:
- <speechrecog>—Lists parameters that apply to Speech Server interactions with the Recognizer.
- <speechrecorder>—Lists parameters that apply to Speech Server waveform recordings.
- <speechsynth>—Lists parameters that apply to Speech Server interactions with Vocalizer.
Note that the parameters in these <speechsynth> and <speechrecog> sections under <speechserver> are different from the parameters specified in the main <speechsynth> and <speechrecog> sections.
This table lists parameters that can be set in any <speechserver> child element. They apply only to the section in which they are set. For example, you can set one fetch-timeout parameter to apply for recognition resources in the <speechrecog> child element and a different fetch-timeout parameter to apply for audio resources in the <speechsynth> child element.
| Parameter | Type | Description |
|---|---|---|
| cache-control.max-age | Integer | The server may use only content whose age is no greater than the specified time in seconds. |
| cache-control.max-stale | Integer | The server may use cached data that has exceeded its expiration time by up to the specified number of seconds. If no value is assigned, the server may use stale data of any age. |
| cache-control.min-fresh | Integer | The server may only use cached data whose expiration is no less than its current age plus the specified time in seconds. |
| fetch-timeout | Integer | Timeout for resources the Speech Server may need to fetch from the network. The value, specified in milliseconds, controls URI access properties when fetching documents or other resources like speech audio files. |
| logging-tag | String | Specifies a string of text to include in the server logs. |
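For example, a minimal <speechserver> sketch might set a different fetch-timeout for recognition resources than for audio resources. The <param name="...">value</param> entry syntax and the values shown are assumptions for illustration only; follow the notation your session.xml actually uses.

```xml
<!-- Illustrative sketch only: the <param name="...">value</param> entry syntax
     and the values are assumptions; follow your actual session.xml notation. -->
<speechserver>
  <speechrecog>
    <!-- 10-second fetch timeout for recognition resources such as grammars -->
    <param name="fetch-timeout">10000</param>
    <!-- Accept cached content up to one hour old -->
    <param name="cache-control.max-age">3600</param>
  </speechrecog>
  <speechsynth>
    <!-- A shorter 5-second fetch timeout for audio resources -->
    <param name="fetch-timeout">5000</param>
    <!-- Tag synthesis activity in the server logs -->
    <param name="logging-tag">synth-defaults</param>
  </speechsynth>
</speechserver>
```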
A <speechrecog> section within a <speechserver> element may include these parameters (see also Parameters settable in all child elements):
| Parameter | Type | Description |
|---|---|---|
| accept-charset | String | The acceptable character set for entities returned by the Recognizer in responses to requests. |
| dtmf-buffer-time | Integer | Period in milliseconds allocated for the Recognizer type-ahead buffer. The type-ahead buffer collects DTMF digits as they are pressed, even when no recognition is currently active. When a subsequent recognition is activated, it takes the digits from the buffer; if those digits are not sufficient, it continues to listen for more digits to match the grammar. If the value is set to 0, all DTMF input is ignored. |
| dtmf-interdigit-timeout | Integer | Maximum period in milliseconds that may pass between DTMF keystrokes without causing a timeout. |
| dtmf-term-char | Char | The DTMF character that marks the end of DTMF input. |
| dtmf-term-timeout | Integer | Time period, in milliseconds, that marks the end of DTMF input once the total number of tones allowed by the grammar has been entered and the user has not typed the optional termination character (dtmf-term-char). |
| hotword-max-duration | Integer | Maximum duration in milliseconds of an utterance that is considered for hotword recognition. This limit lets the Recognizer immediately reject utterances that are too long to be a hotword, without wasting CPU to interpret them. For more information, see Using hot word recognition. |
| hotword-min-duration | Integer | Minimum duration in milliseconds of an utterance that is considered for hotword recognition. This limit lets the Recognizer immediately reject utterances that are too short to be a hotword, without wasting CPU to interpret them. For more information, see Using hot word recognition. |
| media-type | String | Specifies the MIME content type to be used for captured audio. |
| new-audio-channel | Boolean | Indicates to the Recognizer that the audio data is from a new audio source, channel, or speaker. If the recognition resource had collected any input statistics or adaptation state, it must do what is appropriate for the specific recognition technology, including but not limited to discarding any collected input statistics or adaptation state before starting the recognition request. If multiple resources share a media pipe and collect or use this data, and the client issues this header to one of the resources, the reset operation applies to all resources that use the shared media stream. This helps in a number of use cases, including when the client wishes to reuse an open recognition session with an existing media session for multiple telephone calls. |
| recognition-mode | String | Mode of operation for the recognition method. |
| recognition-timeout | Integer | Maximum number of milliseconds allowed for the Recognizer to complete a recognition. |
| save-waveform | Boolean | Instructs the Recognizer to save the current utterance without endpointing, and to return a pointer to it for logging purposes. |
| speech-language | String | Specifies the default language for built-in grammars. (All other grammars define their languages internally.) |
| start-input-timers | Boolean | Starts the various Recognizer input timers. Use this when the synthesizer is playing a barge-in enabled prompt and you want the recognition request to be simultaneously active so that it can detect and implement barge-in. |
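A hedged sketch of a <speechrecog> section that combines DTMF handling with a default language. As above, the <param name="...">value</param> syntax and the specific values are assumptions for illustration; only the parameter names come from the table.

```xml
<!-- Illustrative sketch; entry syntax and values are assumptions. -->
<speechrecog>
  <!-- Buffer type-ahead DTMF digits for up to 20 seconds -->
  <param name="dtmf-buffer-time">20000</param>
  <!-- Allow 3 seconds between keystrokes; '#' terminates DTMF input -->
  <param name="dtmf-interdigit-timeout">3000</param>
  <param name="dtmf-term-char">#</param>
  <!-- Default language for built-in grammars -->
  <param name="speech-language">en-US</param>
  <!-- Keep the full utterance waveform for logging (assumed: true enables saving) -->
  <param name="save-waveform">true</param>
</speechrecog>
```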
A <speechrecorder> section within a <speechserver> element may include these parameters (see also Parameters settable in all child elements):
| Parameter | Type | Description |
|---|---|---|
| capture-on-speech | Boolean | Specifies whether to begin recording immediately or wait for the end-pointing functionality to detect speech before starting. You can also set this in your application with the captureonspeech attribute of the <record> element; the application setting takes precedence over the session.xml setting. |
| max-time | Integer | Specifies the maximum length of the recording in milliseconds, calculated from the time the actual capture and store begins, which is not necessarily the time the RECORD method is received. The Max-Time header specifies the duration before any silence suppression applied by the recorder resource. After this time, the recording stops and the server must return a RECORD-COMPLETE event to the client with a request-state of "COMPLETE". |
| media-type | String | Specifies the MIME content type to be used for captured audio. |
| new-audio-channel | Boolean | Allows the client to tell the server that, from this point on, further input audio comes from a different audio source, channel, or speaker. If multiple resources share a media pipe and collect or use this data, and the client issues this header to one of the resources, the reset operation applies to all resources that use the shared media stream. This helps in a number of use cases, including when the client wishes to reuse an open recognition session with an existing media session for multiple telephone calls. |
| trim-length | Integer | Specifies the length of audio, in milliseconds, to be trimmed from the end of the recording after the stop. The default value is 0. |
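A hedged sketch of a <speechrecorder> section. The entry syntax, the MIME type, and the assumption that true for capture-on-speech means "wait for detected speech" are illustrative only; the parameter names come from the table above.

```xml
<!-- Illustrative sketch; entry syntax and values are assumptions. -->
<speechrecorder>
  <!-- Assumed: true waits for detected speech before capturing -->
  <param name="capture-on-speech">true</param>
  <!-- Stop recording after 60 seconds and trim 500 ms from the end -->
  <param name="max-time">60000</param>
  <param name="trim-length">500</param>
  <!-- Store captured audio as WAV (MIME type is an assumed example) -->
  <param name="media-type">audio/x-wav</param>
</speechrecorder>
```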
A <speechsynth> section within a <speechserver> element may include these parameters (see also Parameters settable in all child elements):
| Parameter | Type | Description |
|---|---|---|
| lexicon-search-order | String | A list of active lexicon URIs and the search order among the active lexicons. |
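A hedged sketch of a <speechsynth> section. The entry syntax, the URI list separator, and the example URIs are all assumptions for illustration; only the parameter names are taken from this page.

```xml
<!-- Illustrative sketch; entry syntax, URIs, and the list separator are assumptions. -->
<speechsynth>
  <!-- Search the application lexicon before the default lexicon -->
  <param name="lexicon-search-order">http://example.com/lexicons/app.pls;http://example.com/lexicons/default.pls</param>
  <!-- Common parameter: 7-second fetch timeout for audio and lexicon resources -->
  <param name="fetch-timeout">7000</param>
</speechsynth>
```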