<speechserver>
The <speechserver> element defines the default parameters that apply to the Speech Server interactions with the Recognizer and Vocalizer.
This element has three child elements:
- <speechrecog>—Lists parameters that apply to Speech Server interactions with the Recognizer.
- <speechrecorder>—Lists parameters that apply to Speech Server waveform recordings.
- <speechsynth>—Lists parameters that apply to Speech Server interactions with Vocalizer.
Note that the parameters in these <speechsynth> and <speechrecog> sections under <speechserver> are different from the parameters specified in the main <speechsynth> and <speechrecog> sections.
This table lists parameters that can be set in any <speechserver> child element. They apply only to the section in which they are set. For example, you can set one fetch-timeout parameter to apply for recognition resources in the <speechrecog> child element and a different fetch-timeout parameter to apply for audio resources in the <speechsynth> child element.
| Parameter | Type | Description |
|---|---|---|
| cache-control.max-age | Integer | The server may use only content whose age is no greater than the specified time in seconds. |
| cache-control.max-stale | Integer | The server may use cached data that has exceeded its expiration time by up to the specified number of seconds. If no value is assigned, the server may use stale data of any age. |
| cache-control.min-fresh | Integer | The server may only use cached data whose expiration is no less than its current age plus the specified time in seconds. |
| fetch-timeout | Integer | Timeout for resources the Speech Server may need to fetch from the network. The value, specified in milliseconds, controls URI access properties when fetching documents or other resources like speech audio files. |
| logging-tag | String | Specifies a string of text to include in the server logs. |
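For example, a minimal <speechserver> sketch might set a different fetch-timeout for recognition resources than for audio resources. The <param name="...">value</param> entry syntax and the values shown are assumptions for illustration only; follow the notation your session.xml actually uses.

```xml
<!-- Illustrative sketch only: the <param name="...">value</param> entry syntax
     and the values are assumptions; follow your actual session.xml notation. -->
<speechserver>
  <speechrecog>
    <!-- 10-second fetch timeout for recognition resources such as grammars -->
    <param name="fetch-timeout">10000</param>
    <!-- Accept cached content up to one hour old -->
    <param name="cache-control.max-age">3600</param>
  </speechrecog>
  <speechsynth>
    <!-- A shorter 5-second fetch timeout for audio resources -->
    <param name="fetch-timeout">5000</param>
    <!-- Tag synthesis activity in the server logs -->
    <param name="logging-tag">synth-defaults</param>
  </speechsynth>
</speechserver>
```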
A <speechrecog> section within a <speechserver> element may include these parameters (see also Parameters settable in all child elements):
| Parameter | Type | Description |
|---|---|---|
| accept-charset | String | The acceptable character set for entities returned by the Recognizer in responses to requests. |
| dtmf-buffer-time | Integer | Period in milliseconds allocated for the Recognizer type-ahead buffer. The type-ahead buffer collects DTMF digits as they are pressed, even when no recognition is currently active. When a subsequent recognition is activated, it takes the digits from the buffer; if those digits are not sufficient, it continues to listen for more digits to match the grammar. If the value is set to 0, all DTMF input is ignored. |
| dtmf-interdigit-timeout | Integer | Maximum period in milliseconds that may pass between DTMF keystrokes without causing a timeout. |
| dtmf-term-char | Char | The DTMF character that marks the end of DTMF input. |
| dtmf-term-timeout | Integer | Time period, in milliseconds, that marks the end of DTMF input once the total number of tones allowed by the grammar has been entered and the user has not typed the optional termination character (dtmf-term-char). |
| hotword-max-duration | Integer | Maximum duration in milliseconds of an utterance that is considered for hotword recognition. This limit lets the Recognizer immediately reject utterances that are too long to be a hotword, without wasting CPU to interpret them. For more information, see Using hot word recognition. |
| hotword-min-duration | Integer | Minimum duration in milliseconds of an utterance that is considered for hotword recognition. This limit lets the Recognizer immediately reject utterances that are too short to be a hotword, without wasting CPU to interpret them. For more information, see Using hot word recognition. |
| media-type | String | Specifies the MIME content type to be used for captured audio. |
| new-audio-channel | Boolean | Indicates to the Recognizer that the audio data is from a new audio source, channel, or speaker. If the recognition resource had collected any input statistics or adaptation state, it must do what is appropriate for the specific recognition technology, including but not limited to discarding any collected input statistics or adaptation state before starting the recognition request. If multiple resources share a media pipe and collect or use this data, and the client issues this header to one of the resources, the reset operation applies to all resources that use the shared media stream. This helps in a number of use cases, including when the client wishes to reuse an open recognition session with an existing media session for multiple telephone calls. |
| recognition-mode | String | Mode of operation for the recognition method. |
| recognition-timeout | Integer | Maximum number of milliseconds allowed for the Recognizer to complete a recognition. |
| save-waveform | Boolean | Instructs the Recognizer to save the current utterance without endpointing, and to return a pointer to it for logging purposes. |
| speech-language | String | Specifies the default language for built-in grammars. (All other grammars define their languages internally.) |
| start-input-timers | Boolean | Starts the various Recognizer input timers. Use this when the synthesizer is playing a barge-in enabled prompt and you want the recognition request to be simultaneously active so that it can detect and implement barge-in. |
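A hedged sketch of a <speechrecog> section that combines DTMF handling with a default language. As above, the <param name="...">value</param> syntax and the specific values are assumptions for illustration; only the parameter names come from the table.

```xml
<!-- Illustrative sketch; entry syntax and values are assumptions. -->
<speechrecog>
  <!-- Buffer type-ahead DTMF digits for up to 20 seconds -->
  <param name="dtmf-buffer-time">20000</param>
  <!-- Allow 3 seconds between keystrokes; '#' terminates DTMF input -->
  <param name="dtmf-interdigit-timeout">3000</param>
  <param name="dtmf-term-char">#</param>
  <!-- Default language for built-in grammars -->
  <param name="speech-language">en-US</param>
  <!-- Keep the full utterance waveform for logging (assumed: true enables saving) -->
  <param name="save-waveform">true</param>
</speechrecog>
```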
A <speechrecorder> section within a <speechserver> element may include these parameters (see also Parameters settable in all child elements):
| Parameter | Type | Description |
|---|---|---|
| capture-on-speech | Boolean | Specifies whether to begin recording immediately or wait for the end-pointing functionality to detect speech before starting. You can also set this in your application with the captureonspeech attribute of the <record> element; the application setting takes precedence over the session.xml setting. |
| max-time | Integer | Specifies the maximum length of the recording in milliseconds, calculated from the time the actual capture and store begins, which is not necessarily the time the RECORD method is received. The Max-Time header specifies the duration before any silence suppression applied by the recorder resource. After this time, the recording stops and the server must return a RECORD-COMPLETE event to the client with a request-state of "COMPLETE". |
| media-type | String | Specifies the MIME content type to be used for captured audio. |
| new-audio-channel | Boolean | Allows the client to tell the server that, from this point on, further input audio comes from a different audio source, channel, or speaker. If multiple resources share a media pipe and collect or use this data, and the client issues this header to one of the resources, the reset operation applies to all resources that use the shared media stream. This helps in a number of use cases, including when the client wishes to reuse an open recognition session with an existing media session for multiple telephone calls. |
| trim-length | Integer | Specifies the length of audio, in milliseconds, to be trimmed from the end of the recording after the stop. The default value is 0. |
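A hedged sketch of a <speechrecorder> section. The entry syntax, the MIME type, and the assumption that true for capture-on-speech means "wait for detected speech" are illustrative only; the parameter names come from the table above.

```xml
<!-- Illustrative sketch; entry syntax and values are assumptions. -->
<speechrecorder>
  <!-- Assumed: true waits for detected speech before capturing -->
  <param name="capture-on-speech">true</param>
  <!-- Stop recording after 60 seconds and trim 500 ms from the end -->
  <param name="max-time">60000</param>
  <param name="trim-length">500</param>
  <!-- Store captured audio as WAV (MIME type is an assumed example) -->
  <param name="media-type">audio/x-wav</param>
</speechrecorder>
```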
A <speechsynth> section within a <speechserver> element may include these parameters (see also Parameters settable in all child elements):
| Parameter | Type | Description |
|---|---|---|
| lexicon-search-order | String | A list of active lexicon URIs and the search order among the active lexicons. |
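A hedged sketch of a <speechsynth> section. The entry syntax, the URI list separator, and the example URIs are all assumptions for illustration; only the parameter names are taken from this page.

```xml
<!-- Illustrative sketch; entry syntax, URIs, and the list separator are assumptions. -->
<speechsynth>
  <!-- Search the application lexicon before the default lexicon -->
  <param name="lexicon-search-order">http://example.com/lexicons/app.pls;http://example.com/lexicons/default.pls</param>
  <!-- Common parameter: 7-second fetch timeout for audio and lexicon resources -->
  <param name="fetch-timeout">7000</param>
</speechsynth>
```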