Controlling speech processing
After configuring the general environment, the application requests specific properties for each activity within a session: generating speech, recognizing utterances, recording audio, and so on.
Recognition resources
Nuance Speech Server supports all MRCP recognition resources. See Controlling speech recognition and TTS or the MRCP recommendation.
Speech Server supports the following MRCP recognition methods:
Method |
Description |
---|---|
DEFINE-GRAMMAR |
For Nuance Recognizer, specifies one or more grammars and requests the server to access, fetch, and compile the grammars as needed. For Dragon Voice, specifies domain language models and optional wordsets required for recognition and interpretation. |
RECOGNIZE |
Requests the recognizer to start recognition. The RECOGNIZE method can carry headers to control properties such as sensitivity and the level of detail in results provided by the recognizer. For Nuance Recognizer, the RECOGNIZE method can request to operate in normal or hotword mode as specified by the Recognition-Mode header. Note: The Krypton recognition engine does not support hotword mode recognition. |
INTERPRET |
Requests to start semantic interpretation of a specified text (contained in the Interpret-Text header). For Nuance Recognizer, specifies one or more grammars to match against the input text. For NLE, specifies the semantic model to use. |
GET-RESULT |
Speech Server does not support GET-RESULT. For Nuance Recognizer, the n-best recognition result is generated once from the RECOGNIZE method and then returned to the client. To retrieve a larger number of n-best results, applications can set the N-Best-List-Length header. |
START-INPUT-TIMERS (MRCPv2) RECOGNITION-START-TIMERS (MRCPv1) |
Notifies the recognizer that a kill-on-barge-in prompt has finished playing. |
STOP |
Instructs the recognizer to stop any in-progress recognition. |
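As an illustration of how a client might issue one of these methods, the following is a minimal sketch of serializing an MRCPv2 RECOGNIZE request. It assumes the MRCPv2 start-line layout (version, message length, method, request-id); the helper name `build_mrcp2_request`, the channel identifier, and the grammar URI are hypothetical and are not taken from Speech Server.

```python
def build_mrcp2_request(method, request_id, headers, body=""):
    """Serialize an MRCPv2 request as bytes.

    The start line is "MRCP/2.0 <message-length> <method> <request-id>", where
    message-length counts every byte of the message, including the start line.
    """
    body_bytes = body.encode("utf-8")
    lines = [f"{name}: {value}\r\n" for name, value in headers.items()]
    if body_bytes:
        lines.append(f"Content-Length: {len(body_bytes)}\r\n")
    tail = "".join(lines).encode("utf-8") + b"\r\n" + body_bytes

    # The length field counts its own digits, so recompute until it is stable.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("utf-8")
        total = len(start_line) + len(tail)
        if total == length:
            return start_line + tail
        length = total


# Hypothetical channel identifier and grammar URI, for illustration only.
request = build_mrcp2_request(
    "RECOGNIZE",
    1001,
    {
        "Channel-Identifier": "32AECB23433801@speechrecog",
        "Cancel-If-Queue": "false",
        "No-Input-Timeout": "7000",
        "N-Best-List-Length": "3",
        "Content-Type": "text/uri-list",
    },
    body="http://example.com/grammars/yesno.grxml\r\n",
)
print(request.decode("utf-8"))
```

The same helper could serialize DEFINE-GRAMMAR, INTERPRET, or STOP by changing the method name and headers.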
The following methods apply to voice-phrase enrollment and Nuance Recognizer only:
Method |
Description |
---|---|
START-PHRASE-ENROLLMENT |
Starts a new phrase-enrollment session consisting of a set of calls to RECOGNIZE in which the caller speaks a phrase several times so the system can "learn" it. The phrase is then added to a personal grammar (speaker-trained grammar), so that the system can recognize it later. |
ENROLLMENT-ROLLBACK |
Discards the last live utterance from the RECOGNIZE operation. The client can invoke this method when the caller provides undesirable input such as non-speech noises, side speech, commands, etc. |
END-PHRASE-ENROLLMENT |
Commits the new phrase in the personal grammar. The client can call this method once successive calls to RECOGNIZE have succeeded and Num-Repetitions-Still-Needed has been returned as 0 in the RECOGNITION-COMPLETE event. Alternatively, the client can abort the phrase enrollment session by calling this method with the Abort-Phrase-Enrollment header. |
MODIFY-PHRASE |
Changes the phrase ID, natural-language phrase, and/or weight for a given phrase in a personal grammar. |
DELETE-PHRASE |
Deletes a phrase in a personal grammar added through voice enrollment or text enrollment. |
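The enrollment methods form a small loop on the client side. The sketch below shows one possible way to drive it, under stated assumptions: `send` is a hypothetical transport callback that returns the headers of the resulting response or RECOGNITION-COMPLETE event as a dictionary, `is_undesirable` is the client's own check for noise or side speech, and the `"true"` value for Abort-Phrase-Enrollment is assumed.

```python
def enroll_phrase(send, is_undesirable, max_utterances=10):
    """Drive one phrase-enrollment session.

    Returns True if the phrase was committed, False if the session was aborted.
    """
    send("START-PHRASE-ENROLLMENT", {})
    for _ in range(max_utterances):
        event = send("RECOGNIZE", {})
        if is_undesirable(event):
            # Discard the last utterance and let the caller repeat the phrase.
            send("ENROLLMENT-ROLLBACK", {})
            continue
        if int(event.get("Num-Repetitions-Still-Needed", 1)) == 0:
            # Enough consistent repetitions: commit the phrase.
            send("END-PHRASE-ENROLLMENT", {})
            return True
    # Too many attempts: abort instead of committing (header value assumed).
    send("END-PHRASE-ENROLLMENT", {"Abort-Phrase-Enrollment": "true"})
    return False


# Scripted stand-in for a real MRCP connection, for illustration only.
scripted = iter([
    {"Num-Repetitions-Still-Needed": "2"},
    {"Num-Repetitions-Still-Needed": "1", "noise": True},
    {"Num-Repetitions-Still-Needed": "1"},
    {"Num-Repetitions-Still-Needed": "0"},
])
sent = []


def fake_send(method, headers):
    sent.append(method)
    return next(scripted) if method == "RECOGNIZE" else {}


committed = enroll_phrase(fake_send, lambda event: event.get("noise", False))
```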
Speech Server supports the following MRCP recognition events:
Event |
Description |
---|---|
END-OF-INPUT (Nuance extension) |
The time between the END-OF-INPUT and the beginning of the following prompt is used to compute caller-perceived latency. Not supported by the Krypton recognition engine. |
START-OF-INPUT |
Indicates that the recognizer has detected speech or a DTMF digit in the media stream. In a kill-on-barge-in scenario the client must act as an intermediary and respond to this event by issuing a BARGE-IN-OCCURRED method to the synthesizer. |
RECOGNITION-COMPLETE |
Indicates that the recognition completed and that this is the last event with that request-id. The recognition result is sent in the body of the MRCP message. The RECOGNITION-COMPLETE event can specify a URI to the audio waveform in a Waveform-URI header. |
INTERPRETATION-COMPLETE |
Indicates that the INTERPRET operation is complete. The interpretation result is returned in the body of the INTERPRETATION-COMPLETE event. It is very similar to the result returned from a RECOGNIZE method invocation, but excludes portions relevant only to acoustic matching. |
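The intermediary role described for START-OF-INPUT can be sketched as a small event handler. This is a minimal sketch that assumes the MRCPv2 event start-line layout (version, message length, event name, request-id, request-state); `send_to_synthesizer` is a hypothetical callback, and the message lengths in the usage lines are arbitrary and not validated.

```python
def parse_event_line(line):
    """Split an MRCPv2 event start line into its fields."""
    version, length, name, request_id, state = line.strip().split(" ")
    if version != "MRCP/2.0":
        raise ValueError(f"unexpected version: {version}")
    return {
        "length": int(length),
        "event": name,
        "request_id": int(request_id),
        "state": state,
    }


def on_recognizer_event(line, send_to_synthesizer):
    """Relay barge-in to the synthesizer; return True when the request is finished."""
    event = parse_event_line(line)
    if event["event"] == "START-OF-INPUT":
        # Kill-on-barge-in: the client forwards the detection to the synthesizer.
        send_to_synthesizer("BARGE-IN-OCCURRED")
    return event["event"] in ("RECOGNITION-COMPLETE", "INTERPRETATION-COMPLETE")


relayed = []
still_running = on_recognizer_event(
    "MRCP/2.0 120 START-OF-INPUT 1001 IN-PROGRESS\r\n", relayed.append)
finished = on_recognizer_event(
    "MRCP/2.0 480 RECOGNITION-COMPLETE 1001 COMPLETE\r\n", relayed.append)
```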
Speech Server supports the following MRCP recognition headers. When the MRCP draft recommendation indicates that a default value is "implementation specific," the value is defined on the Speech Server via the Management Station.
Header |
Values |
Default |
Description |
---|---|---|---|
Cache-Control |
max-age; max-stale; min-fresh |
n/a |
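(generic header) Overrides the default cache expiration mechanisms.
The scope of this header depends on the method it is sent on. If the directives are sent on a SET-PARAMS method, they apply to all requests for external documents the server makes during that session, unless overridden by a Cache-Control header on an individual request. If the directives are sent on any other request, they apply only to external document requests the server makes for that request. An empty Cache-Control header on the GET-PARAMS method requests the server to return the current Cache-Control directive settings on the server. |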
Cancel-If-Queue (MRCPv2 only) |
TRUE, FALSE |
n/a Must be specified with each RECOGNIZE message. |
Specifies what happens if the client invokes another RECOGNIZE method while this RECOGNIZE request is in progress.
|
Clear-DTMF-Buffer (MRCPv2 only) |
TRUE, FALSE |
FALSE |
Discards any digits pressed by the caller before the recognition begins. |
Completion-Cause |
String: numeric code from 000 to 016 |
n/a |
Text describing the cause for the RECOGNIZE request completion. For detailed value descriptions, see Recognition Completion-Causes, or the MRCPv2 draft specification. |
Completion-Reason (MRCPv2 only) |
String |
n/a |
Text describing the reason for a RECOGNIZE failure; for use in logs or debugging. |
Confidence-Threshold |
0.0–1.0 (MRCPv2) |
n/a |
The confidence level the application considers a successful match. Note: Ignored by Dragon Voice engines. |
DTMF-Buffer-Time |
0–MAXINT |
0 |
Size (in milliseconds) of the type-ahead buffer. The type-ahead buffer collects DTMF digits as they are pressed, even when there is no RECOGNIZE command active. When a subsequent RECOGNIZE method is received, the recognizer may first look in this buffer for digits that match the request. If the digits in the buffer are not sufficient, it can continue to listen for more digits to match the grammar. |
DTMF-Interdigit-Timeout |
0–MAXINT |
5000 |
Maximum time allowed between each DTMF character entered by the user. |
DTMF-Term-Char |
valid DTMF character |
NULL |
DTMF character that terminates DTMF input. |
DTMF-Term-Timeout |
0–MAXINT |
10000 (MRCPv1) 5000 (MRCPv2) |
Time period (in milliseconds) that terminates DTMF input once the total number of tones allowed by the grammar has been entered and the user fails to type an optional termination character (DTMF-Term-Char). |
Failed-URI |
String |
n/a |
Bad URI. When a method instructs the recognizer to fetch or access a URI and the access fails, Speech Server provides the failed URI in the header of the method response. |
Failed-URI-Cause |
String |
n/a |
Reason for a failed URI fetch. When a recognition method needs a recognizer to fetch or access a URI and the access fails, Speech Server provides the URI or protocol-specific response code in the header of the method response. |
Fetch-Timeout |
10–MAXINT |
5000 |
(generic header) Maximum number of milliseconds for the server to fetch content from the network. |
Hotword-Confidence-Threshold (MRCPv1 VSP) |
0–100 |
n/a |
Confidence level the application considers a successful match in hotword mode. Note: Ignored by Dragon Voice engines. |
Hotword-Max-Duration Hotword-Min-Duration (MRCPv2; VSP in MRCPv1) |
Integer expressing the duration in milliseconds |
800 200 |
Maximum/minimum length of an utterance that is considered for hotword recognition. These headers can be used to tune performance by preventing the recognizer from evaluating utterances that are too short or too long to be one of the hotwords in the grammar(s). They enable a client application to support the Nuance Recognizer recognition modes selective barge-in and magic word. Hotword-Min-Duration is ignored if Hotword-Max-Duration is 0. See Proprietary Recognizer features. Note: Ignored by Dragon Voice engines. |
Input-Modes (Nuance extension) |
voice, dtmf, dtmf voice |
dtmf voice |
Specifies which input modes are enabled:
Note: This parameter does not control grammar activation in Nuance Recognizer. For instance, if you allow only DTMF input, even active voice grammars are not matched during recognition. |
Input-Type |
dtmf, speech |
n/a |
When the recognizer detects barge-in-able input and generates a START-OF-INPUT event, Input-Type specifies whether the input that caused the barge-in was DTMF or speech. |
Input-Waveform-URI (MRCPv2 only) |
String |
n/a |
URI pointing to audio content to be processed by the RECOGNIZE operation. |
Interpret-Text (MRCPv2 only) |
String |
n/a |
Text for which a natural-language interpretation is desired or a URI that points to the text. This header must be used when invoking the INTERPRET method. |
Logging-Tag |
String |
n/a |
(generic header) Text to include in the call logs. |
Media-Type (MRCPv2 only) |
MIME content type |
audio/x-wav |
MIME content type in which to store captured audio or video (such as the one returned by Waveform-URI). |
N-Best-List-Length |
1–999 |
1 |
Returns more than one recognition match, if available. All matches must be above the confidence threshold. (You cannot get below-threshold matches by increasing the length.) The header affects Krypton but not the other Dragon Voice engines. The length for the Krypton engine never exceeds 10, regardless of the header setting. |
New-Audio-Channel (MRCPv2 only) |
TRUE, FALSE |
FALSE |
Indicates to the recognizer that the audio data is from a new audio source, channel, or speaker. Using this header helps improve accuracy, because the recognizer readjusts to the audio signal. If there are multiple resources sharing a media pipe and collecting or using this data, and the client issues this header to one of the resources, the reset operation applies to all resources that use the shared media stream. This helps in a number of scenarios, including where the client wishes to reuse an open recognition session with an existing media session for multiple telephone calls. |
No-Input-Timeout |
0–MAXINT |
7000 |
Maximum number of milliseconds for the recognizer to wait for speech to be detected. |
One-Of-Rule-ID-URI |
Not supported by Speech Server. |
||
Recognizer-Context-Block |
n/a |
n/a |
Line information (acoustic state information) in Nuance proprietary format. |
Recognition-Mode (MRCPv2; VSP in MRCPv1) |
normal, hotword |
normal |
Mode of operation for the RECOGNIZE method.
Note: The Krypton recognition engine does not support hotword mode recognition. |
Recognizer-Start-Timers (MRCPv1) |
TRUE, FALSE |
TRUE |
Starts recognizer input timers. Use this when the synthesizer is playing a kill-on-barge-in prompt, and the client wants the RECOGNIZE request to be simultaneously active so that it can detect and implement kill-on-barge-in. However, the recognizer ought not start the no-input timers until the prompt is finished. (=Start-Input-Timers) |
Recognition-Timeout |
10–MAXINT |
10000 (MRCPv2) 60000 (MRCPv1) |
Maximum number of milliseconds for the recognizer to complete a recognition. |
Save-Waveform |
TRUE, FALSE |
FALSE |
Instructs the recognizer to save the current utterance (without endpointing) and return a pointer to it via the Waveform-URI header. To use this capability, you must have a web server installed. |
Save-Waveform-On-DTMF (Nuance extension) |
TRUE, FALSE |
TRUE |
When DTMF is detected, instructs the recognizer to save the current utterance and return a pointer to it via the Waveform-URI header. |
Sensitivity-Level |
0.0–1.0 |
0.5 |
Sensitivity of the end-pointer. Used to filter out background noise and not mistake it for speech. Lower values are less sensitive. |
Speech-Complete-Timeout |
0–MAXINT |
0 |
(Nuance Recognizer only) Length of silence required following user speech before the recognizer finalizes a result (either accepting it or generating a nomatch event). The Speech-Complete-Timeout applies when the recognizer currently has a complete match against an active grammar. By contrast, the Incomplete-Timeout is used when the speech is an incomplete match to an active grammar. The value is in milliseconds. A long Speech-Complete-Timeout value delays the result to the client and therefore makes the application's response to a user slow. A short Speech-Complete-Timeout may lead to an utterance being broken up inappropriately. Reasonable Speech-Complete-Timeout values are typically in the range of 0.3–1.0 second. Note: By default, Speech Server does not set this value, but uses the completetimeout value. |
Speech-Incomplete-Timeout |
0–MAXINT |
1500 |
Length of silence required following user speech after which a recognizer finalizes a result. The Incomplete-Timeout applies when the speech prior to the silence is an incomplete match. In this case, once the timeout is triggered, the partial result is rejected (with a Completion-Cause of "partial-match"). The value is in milliseconds. The Speech-Incomplete-Timeout also applies when the speech prior to the silence is a complete match, but where it is possible to speak further and still achieve a complete match. By contrast, the Complete-Timeout is used when the speech is a complete match and no further spoken words can continue to represent a match. A long Speech-Incomplete-Timeout value delays the result to the client and therefore makes the application's response to a user slow. A short Speech-Incomplete-Timeout may lead to an utterance being broken up inappropriately. The Speech-Incomplete-Timeout is usually longer than the Speech-Complete-Timeout to allow users to pause mid-utterance (for example, to breathe). Note: By default, Speech Server does not set this value, but uses the incompletetimeout value. |
Speech-Language |
Any installed recognizer language code |
en-US |
(Nuance Recognizer only) Default language for built-in grammars (all other grammars define their languages internally). |
Speed-Vs-Accuracy |
0.0–1.0 (MRCPv2) 0–100 (MRCPv1) |
0.5 50 |
A value of 0.0 means fastest recognition; a value of 1.0 means best accuracy. Note: This header can be sent with no error, but currently is ignored by the recognizer. |
Start-Input-Timers (MRCPv2) |
TRUE, FALSE |
TRUE |
Starts recognizer input timers. Use this when the synthesizer is playing a kill-on-barge-in prompt, and the client wants the RECOGNIZE request to be simultaneously active so that it can detect and implement kill-on-barge-in. However, the recognizer ought not start the no-input timers until the prompt is finished. (=Recognizer-Start-Timers) |
Vendor-Specific-Parameters |
You can pass Nuance Recognizer-specific parameters to the server for a given session using this header. For example, to set swirec_search_threshold you would send the following: Vendor-Specific-Parameters: swirec_search_threshold=8 |
||
Ver-Buffer-Utterance (MRCPv2 only) |
Not supported by Speech Server. Instructs the server to buffer the utterance associated with this request into the verification buffer, so that this utterance could be later considered for speaker verification. This buffer is shared across resources within a session. Sending this header is permitted only if the verification buffer is instantiated for the session. |
||
Waveform-URI (MRCPv2) Waveform-URL (MRCPv1) |
n/a |
n/a |
If Save-Waveform is set to TRUE, returns a URI for the recording of the current utterance. For MRCPv2, also returns the size in bytes and duration in milliseconds of the recording. Returns an empty string if an error prevented recording. |
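The relationship between Speech-Complete-Timeout and Speech-Incomplete-Timeout described in the table can be summarized in a few lines. This is only a sketch of the selection rule as the descriptions state it; the function name and the example values (500 ms and 1500 ms) are hypothetical.

```python
def silence_timeout_ms(match_state, can_extend, complete_ms, incomplete_ms):
    """Pick which silence timeout governs the end of speech.

    match_state: "complete" or "incomplete" against the active grammars.
    can_extend: True if further words could still produce a complete match.
    """
    if match_state == "complete" and not can_extend:
        # Nothing more can be said that still matches: use the shorter timeout.
        return complete_ms
    # Incomplete match, or a complete match that could still be extended.
    return incomplete_ms


# Example: "yes" fully matches and cannot be extended, so 500 ms applies.
timeout_after_yes = silence_timeout_ms("complete", False, 500, 1500)
# Example: "two" matches, but "two hundred" would also match, so 1500 ms applies.
timeout_after_two = silence_timeout_ms("complete", True, 500, 1500)
```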
Note: The low-end accuracy for some timeout parameters is affected by the precision of the operating system. For example, on Windows the scheduling precision is 10 milliseconds, so that a value of 10 ms can result in less than a 10 ms scheduling time slice.
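A client can check header values against the ranges in the table before sending them. The following is a minimal sketch covering a handful of MRCPv2 headers; the value of `MAXINT` is an assumption (this section does not define it), and `check_header` is a hypothetical helper, not part of any Speech Server API.

```python
MAXINT = 2**31 - 1  # assumption: the table does not define MAXINT

# (minimum, maximum, type) for a few MRCPv2 recognition headers from the table.
RANGES = {
    "Confidence-Threshold": (0.0, 1.0, float),
    "Sensitivity-Level": (0.0, 1.0, float),
    "N-Best-List-Length": (1, 999, int),
    "No-Input-Timeout": (0, MAXINT, int),
    "Recognition-Timeout": (10, MAXINT, int),
    "Fetch-Timeout": (10, MAXINT, int),
}


def check_header(name, raw_value):
    """Return None if the value is in range, otherwise a description of the problem."""
    low, high, kind = RANGES[name]
    try:
        value = kind(raw_value)
    except ValueError:
        return f"{name}: {raw_value!r} is not a valid {kind.__name__}"
    if not low <= value <= high:
        return f"{name}: {value} is outside {low}..{high}"
    return None


problems = [
    message
    for message in (
        check_header("N-Best-List-Length", "3"),
        check_header("Recognition-Timeout", "5"),
        check_header("Confidence-Threshold", "high"),
    )
    if message is not None
]
```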
The Completion-Cause header must be part of a RECOGNITION-COMPLETE event coming from the recognizer resource to the client. It indicates the reason behind the RECOGNIZE method completion. This header MUST be sent in the DEFINE-GRAMMAR and RECOGNIZE responses, if they return with a failure status and a COMPLETE state.
Cause-code |
Cause-name |
Description |
---|---|---|
000 |
success |
RECOGNIZE completed with a match or DEFINE-GRAMMAR succeeded in downloading the resource (and compiling the grammar when required). |
001 |
no-match |
RECOGNIZE completed, but no match was found. |
002 |
no-input-timeout |
RECOGNIZE completed without a match due to a No-Input-Timeout. |
003 |
hotword-maxtime |
RECOGNIZE in hotword mode completed without a match due to a Recognition-Timeout. (Nuance Recognizer only) |
004 |
grammar-load-failure |
RECOGNIZE failed due to grammar load failure. |
005 |
grammar-compilation-failure |
RECOGNIZE failed due to grammar compilation failure. |
006 |
recognizer-error |
RECOGNIZE request terminated prematurely due to a recognizer error. |
007 |
speech-too-early |
RECOGNIZE request terminated because speech was too early. This happens when the audio stream is already "in-speech" when the RECOGNIZE request was received. |
008 |
success-maxtime |
RECOGNIZE request terminated because speech was too long, but whatever was spoken until that point was a full match. |
009 |
uri-failure |
Failure accessing a URI. |
010 |
language-unsupported |
Language not supported. |
011 |
cancelled |
A new RECOGNIZE cancelled this one. |
012 |
semantics-failure |
Recognition succeeded, but semantic interpretation of the recognized input failed. The RECOGNITION-COMPLETE event must contain the recognition result with only input text and no interpretation. |
013 |
partial-match |
Speech-Incomplete-Timeout expired before there was a full match, but whatever was spoken until that point was a partial match. |
014 |
partial-match-maxtime |
Recognition-Timeout expired before full match was achieved, but whatever was spoken until that point was a partial match. |
015 |
no-match-maxtime |
Recognition-Timeout expired. Either whatever was spoken until that point did not match, or the recognizer does not support detecting partial matches. |
016 |
grammar-definition-failure |
DEFINE-GRAMMAR error other than grammar-load-failure and grammar-compilation-failure. |
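For client-side handling, the cause codes above can be held in a lookup table. The sketch below assumes the header value arrives as a three-digit code followed by the cause name (for example `001 no-match`); the names `parse_completion_cause` and `FULL_MATCH_CODES` are illustrative only.

```python
# Cause names in code order 000..016, copied from the table above.
_NAMES = [
    "success", "no-match", "no-input-timeout", "hotword-maxtime",
    "grammar-load-failure", "grammar-compilation-failure", "recognizer-error",
    "speech-too-early", "success-maxtime", "uri-failure",
    "language-unsupported", "cancelled", "semantics-failure", "partial-match",
    "partial-match-maxtime", "no-match-maxtime", "grammar-definition-failure",
]
RECOGNITION_CAUSES = dict(enumerate(_NAMES))

# Codes whose description states that a full match was obtained.
FULL_MATCH_CODES = {0, 8}


def parse_completion_cause(header_value):
    """Parse a value such as "001 no-match" into (code, name)."""
    code = int(header_value.split()[0])
    if code not in RECOGNITION_CAUSES:
        raise ValueError(f"unknown Completion-Cause code: {code:03d}")
    return code, RECOGNITION_CAUSES[code]


code, name = parse_completion_cause("008 success-maxtime")
has_full_match = code in FULL_MATCH_CODES
```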
Text-to-speech (TTS) resources
The Nuance Speech Server software supports many MRCP text-to-speech resources. See Controlling speech recognition and TTS or the MRCP recommendation.
Speech Server supports the following MRCP text-to-speech methods:
Method |
Description |
---|---|
BARGE-IN-OCCURRED |
Notifies the synthesizer that the client has detected a barge-in-able event. This method is useful in two scenarios:
|
CONTROL |
Tells a synthesizer that is speaking to modify what it is speaking on the fly: to jump forward or backward in what it is speaking, change speaker rate, speaker parameters, etc. It affects only the currently IN-PROGRESS "SPEAK" request. |
DEFINE-LEXICON |
Specifies a user dictionary and tells the server to load, unload, activate, or deactivate it. DEFINE-LEXICON enables the dictionary for the current text-to-speech instance, and for all succeeding speak requests until it is explicitly disabled. Hence, DEFINE-LEXICON enables loading a user dictionary once and leaving it enabled for an entire call, although the voice platform must remember to turn it back off. Note: The SPEAK request can also enable a user dictionary, but only for the current text-to-speech request. |
GET-PARAMS |
Retrieves the value of the specified parameter. |
PAUSE |
Tells the synthesizer to pause speech output if it is speaking something. |
RESUME |
Tells a paused synthesizer to resume speaking. |
SET-PARAMS |
The supported headers for this request are described in the section Text-to-speech headers. |
SPEAK |
The supported headers for this request are described in the section Text-to-speech headers. |
STOP |
Instructs the synthesizer to stop speech if a request is active. |
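The effect of these methods on a synthesizer can be pictured as a small state machine. The sketch below is a deliberate simplification: it assumes three states (idle, speaking, paused), ignores queued SPEAK requests, and treats barge-in while paused the same as barge-in while speaking. The function name is hypothetical.

```python
def next_state(state, method, kill_on_barge_in=True):
    """Return the synthesizer state after a method; states: idle, speaking, paused."""
    if method == "SPEAK" and state == "idle":
        return "speaking"
    if method == "PAUSE" and state == "speaking":
        return "paused"
    if method == "RESUME" and state == "paused":
        return "speaking"
    if method == "STOP":
        return "idle"
    if method == "BARGE-IN-OCCURRED" and kill_on_barge_in and state != "idle":
        # With Kill-On-Barge-In enabled, barge-in stops the active SPEAK.
        return "idle"
    # Any other combination leaves the state unchanged in this sketch.
    return state


state = "idle"
for method in ("SPEAK", "PAUSE", "RESUME", "BARGE-IN-OCCURRED"):
    state = next_state(state, method)
```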
Speech Server supports the following MRCP text-to-speech events:
Events |
Description |
---|---|
SPEAK-COMPLETE |
Supported. |
SPEECH-MARKER |
Supported. |
Speech Server supports the following MRCP text-to-speech headers.
Header |
Values |
Default |
Description |
---|---|---|---|
Audio-Fetch-Hint |
prefetch, safe, stream |
prefetch |
This header is ignored by Vocalizer. |
Cache-Control |
max-age; max-stale; min-fresh |
n/a |
(generic header) Overrides the default cache expiration mechanisms.
The scope of this header depends on the method it is sent on. If the directives are sent on a SET-PARAMS method, they apply for all requests for external documents the server makes during that session, unless overridden by a Cache-Control header on an individual request. If the directives are sent on any other request, they apply only to external document requests the server makes for that request. An empty cache-control header on the GET-PARAMS method requests the server to return the current Cache-Control directive settings on the server. |
Completion-Cause |
String: numeric code from 000 to 007 |
n/a |
Text describing the reason for the SPEAK request completion. For detailed value descriptions, see Text-to-speech Completion-Causes, or the MRCPv2 draft specification. |
Completion-Reason (MRCPv2 only) |
String |
n/a |
Text describing the reason for a SPEAK failure; for use in logs or debugging, such as an error in parsing the speech markup text. |
Failed-URI |
String |
n/a |
Bad URI. When a method instructs the synthesizer to fetch or access a URI and the access fails, Speech Server provides the failed URI in the header of the method response. |
Failed-URI-Cause |
String |
n/a |
Reason for a failed URI fetch. When a synthesizer method needs a synthesizer to fetch or access a URI and the access fails, Speech Server provides the URI or protocol-specific response code in the header of the method response. |
Fetch-Hint |
prefetch, safe |
prefetch |
Instructs the synthesizer to retrieve documents or other resources such as speech markup or audio files from the server. Possible values are:
This header field can occur in SPEAK, SET-PARAMS or GET-PARAMS requests |
Fetch-Timeout (generic in MRCPv2) |
0–MAXINT |
Synthesizer timeout for resources Speech Server may need to fetch from the network. The value, specified in milliseconds, controls URI access properties when the synthesizer needs to fetch documents or other resources like speech audio files. The default value is dependent on the voice platform. This header field can occur in SPEAK, SET-PARAMS or GET-PARAMS. |
|
Jump-Size (MRCPv2) Jump-Target (MRCPv1) |
Not supported by Speech Server. It is accepted and returns 408 "Unrecognized or unsupported message entity." Amount to jump forward or backward in an active "SPEAK" request. A + or - indicates a relative value to what is being currently played. This header MAY also be specified in a SPEAK request as a desired offset into the synthesized speech. In this case, the synthesizer must begin speaking from this amount of time into the speech markup. Note that an offset that extends beyond the end of the produced speech will result in audio of length zero. The different speech length units (second, word, sentence, paragraph) supported are dependent on the synthesizer implementation. |
||
Kill-On-Barge-In |
TRUE, FALSE |
TRUE |
Enables barge-in. If set to TRUE, or if not set, the server stops the SPEAK method when it receives a BARGE-IN-OCCURRED method. If set to FALSE, the server ignores a BARGE-IN-OCCURRED method and sends the appropriate response. The client sends a BARGE-IN-OCCURRED method to the synthesizer when it receives a barge-in-able event such as DTMF input detected by a signal detector resource or by the start of speech sensed or recognized by the speech recognizer resource. If the recognizer or signal detector resource is on the same server as the synthesizer and both are part of the same session, the server can work with both to provide internal notification to the synthesizer so that audio can be stopped without having to wait for the client's BARGE-IN-OCCURRED event. |
Lexicon-Search-Order (MRCPv2 only) |
n/a |
List of active lexicon URIs and the search order among the active lexicons. |
|
Load-Lexicon (MRCPv2 only) |
TRUE, FALSE |
TRUE |
Loads a lexicon (dictionary). If set to TRUE, or if not set, this header indicates to load the dictionary. If set to FALSE, it indicates to unload the dictionary. This header can occur only in DEFINE-LEXICON. |
Logging-Tag |
String |
n/a |
(generic header) Text to include in the call logs. |
Prosody-Contour Prosody-Duration Prosody-Pitch Prosody-Range |
These headers are ignored by Speech Server. |
||
Prosody-Rate Prosody-Volume (MRCPv2 only) |
Prosody-Rate accepts values in the following forms:
Prosody-Volume accepts values in the following forms:
When the value is a number, you can specify a simple floating point value without exponentials. Legal formats are "n", "n.", ".n" and "n.n" where "n" is a sequence of one or more digits. When the value indicates a relative change (an offset based on the current setting), the value must begin with a plus sign (+) or minus sign (-) followed by a number. Optionally, the number can be a percentage change. Examples:
|
||
Speaker-Profile |
Not supported by Speech Server. Specifies a URI which references the profile of the speaker. Speaker profiles are collections of voice parameters like gender and accent. |
||
Speak-Length |
Not supported by Speech Server. |
||
Speak-Restart |
Not supported by Speech Server. |
||
Speech-Language |
Default language for the text-to-speech engine. |
||
Speech-Marker |
Integer |
Marker tag that can be embedded in the speech data. When the synthesizer reaches these marker fields, it generates SPEECH-MARKER events. Marker tags are limited to integer values. For details on marker tags, see . |
|
Vendor-Specific Parameters (generic in MRCPv2) |
n/a |
The following Nuance-supplied (vendor-specific) parameters are supported:
Note: Speech Server processes vendor-specific dictionary parameters in the order listed here, not in the order they appear in the MRCP request. To process these parameters in a different order, specify them in consecutive MRCP SET-PARAMS requests. You can load and unload dictionaries by including the parameters with a comma-separated list of the dictionaries, as in the following example: SET-PARAMS 100014 MRCP/1.0 Vendor-Specific: ssftrs_dict_load="http://my.com/dict1,http://my.com/dict2";
|
Voice-Age |
Setting these fields results in a 201 "parameter is ignored" message. |
||
Voice-Gender |
male, female, neutral |
n/a |
Gender of the TTS voice. |
Voice-Name |
String |
n/a |
Name of the TTS voice. Specify the complete voice name (including any suffix). |
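The numeric formats accepted by Prosody-Rate and Prosody-Volume ("n", "n.", ".n", "n.n", with an optional leading sign for relative changes and an optional percent sign) can be checked with a regular expression. This is a sketch of one reading of those rules: it accepts a percent sign only on signed (relative) values, which is an assumption, and it does not handle symbolic strings such as "medium".

```python
import re

# "n", "n.", "n.n" or ".n", where n is one or more digits; no exponents.
_NUMBER = r"(?:\d+\.?\d*|\.\d+)"
_PATTERN = re.compile(rf"^(?P<sign>[+-])?(?P<number>{_NUMBER})(?P<percent>%)?$")


def parse_prosody_value(text):
    """Parse a numeric prosody value into its parts, or raise ValueError."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a legal prosody number: {text!r}")
    sign, number, percent = match.group("sign", "number", "percent")
    if percent and not sign:
        # Assumption: a percentage is only meaningful as a relative change.
        raise ValueError(f"percentage without a sign: {text!r}")
    value = float(number)
    if sign == "-":
        value = -value
    return {
        "relative": sign is not None,
        "percent": percent is not None,
        "value": value,
    }


absolute = parse_prosody_value("50")
relative = parse_prosody_value("+10%")
```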
The parameter server.mrcp2.rsspeechsynth.mrcpdefaults.prosodyVolume defines the value for the symbolic string "default". The default value is 100, which is interpreted as an “extra loud” volume. The complete set of values appears below:
Value |
Symbolic string |
---|---|
0 |
silent |
10 |
x-soft |
25 |
soft |
50 |
medium (default) |
75 |
loud |
100 |
x-loud |
The parameter server.mrcp2.session.mrcpdefaults.prosodyRate overrides the value for the symbolic string "default". The default value is 50, which is interpreted as a "medium" rate. The complete set of values appears below:
Value |
Symbolic string |
---|---|
1 |
x-slow |
25 |
slow |
50 |
medium (default) |
75 |
fast |
100 |
x-fast |
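The two tables above map directly onto lookup dictionaries. The sketch below resolves a symbolic string to its numeric value, with "default" resolving to whatever the corresponding server parameter is configured to; the function name and the `configured_default` argument are hypothetical.

```python
# Values copied from the Prosody-Volume and Prosody-Rate tables above.
PROSODY_VOLUME = {
    "silent": 0, "x-soft": 10, "soft": 25, "medium": 50, "loud": 75, "x-loud": 100,
}
PROSODY_RATE = {
    "x-slow": 1, "slow": 25, "medium": 50, "fast": 75, "x-fast": 100,
}


def resolve_symbol(table, symbol, configured_default):
    """Map a symbolic string to its numeric value.

    "default" resolves to the value configured on the server for that property.
    """
    if symbol == "default":
        return configured_default
    return table[symbol]


volume = resolve_symbol(PROSODY_VOLUME, "loud", configured_default=100)
rate = resolve_symbol(PROSODY_RATE, "default", configured_default=50)
```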
The Completion-Cause header must be part of a SPEAK-COMPLETE event coming from the synthesizer resource to the client. It indicates the reason behind the SPEAK method completion.
Cause-code |
Cause-name |
Description |
---|---|---|
000 |
normal |
SPEAK completed normally. |
001 |
barge-in |
SPEAK request was terminated because of barge-in. |
002 |
parse-failure |
SPEAK request terminated because of a failure to parse the speech markup text. |
003 |
uri-failure |
SPEAK request terminated because access to one of the URIs failed. |
004 |
error |
SPEAK request terminated prematurely due to synthesizer error. |
005 |
language-unsupported |
Language not supported. |
006 |
lexicon-load-failure |
Lexicon loading failed. |
007 |
cancelled |
A prior SPEAK request failed while this one was still in the queue. |
Recorder resources
Speech Server can record speaker utterances for future reference, such as for use in recognizer tuning or legal confirmation.
Nuance Speech Server supports the following recorder methods:
Method |
Description |
---|---|
RECORD |
Places the recorder in the Recording state. Depending on the headers specified in the RECORD method, the recorder can start recording the audio immediately or wait for the end pointer to detect speech in the audio. The RECORD request then saves the audio to the URI supplied in the Record-URI header. If Record-URI is not specified, the server saves the audio file anywhere it finds convenient and returns a URI pointing to the file in the RECORD-COMPLETE event. |
START-INPUT-TIMERS |
Sent when a kill-on-barge-in prompt has finished playing. Use this when the recorder and synthesizer are not in the same MRCPv2 session. When a kill-on-barge-in prompt is playing, the client wants the RECORD request to be active at the same time (so that it can detect and implement kill on barge-in) without starting the server's no-input timers until the prompt is finished. The Start-Input-Timers header in the RECORD request specifies whether to start the timers. |
STOP |
Instructs the recorder to stop recording if a recording is active. If a RECORD request is active and the STOP request successfully terminates it, the response contains an Active-Request-ID-List header with the request-id of the RECORD request that was terminated. In this case, no RECORD-COMPLETE event is sent for the terminated request. If there is no recording active, the response must not contain an Active-Request-ID-List header. |
Nuance Speech Server supports the following recorder events:
Event |
Description |
---|---|
RECORD-COMPLETE |
If the recording was a success, contains a Record-URI header pointing to the recorded audio file on the server or to a MIME part containing the recorded audio in the body of the message. If the recording completes due to no-input, silence after speech, or max-time, the server must generate the RECORD-COMPLETE event to the client with a request-state of "COMPLETE". |
START-OF-INPUT |
Returned from the server to the client once the server has detected speech. |
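The three conditions that end a recording (no input, silence after speech, and maximum time) can be sketched as a single check. This sketch makes several assumptions: the recorder headers No-Input-Timeout, Final-Silence, and Max-Time supply the limits, a value of 0 disables a limit (stated in this document for Final-Silence and Max-Time, assumed for No-Input-Timeout), and the cause names "success-silence" and "success-maxtime" follow the MRCPv2 specification's recorder causes rather than this section. The function and argument names are hypothetical.

```python
def record_stop_cause(capture_ms, trailing_silence_ms, speech_seen, waited_ms,
                      max_time=0, final_silence=0, no_input_timeout=0):
    """Return the stop condition that applies, or None to keep recording.

    capture_ms: time since capture actually began.
    trailing_silence_ms: length of the current silence after speech.
    waited_ms: time spent waiting for speech to start.
    A limit of 0 means that condition never triggers.
    """
    if not speech_seen and no_input_timeout and waited_ms >= no_input_timeout:
        return "no-input-timeout"
    if max_time and capture_ms >= max_time:
        return "success-maxtime"
    if speech_seen and final_silence and trailing_silence_ms >= final_silence:
        return "success-silence"
    return None


# Caller spoke, then stayed silent for 2 seconds with Final-Silence set to 1500.
cause = record_stop_cause(5000, 2000, True, 0, final_silence=1500)
```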
Nuance Speech Server supports the following recorder headers:
Header |
Values |
Default |
Description |
---|---|---|---|
Capture-On-Speech |
TRUE, FALSE |
FALSE |
Instructs the recorder to start capturing immediately (FALSE) when it starts, or wait for the end-pointing functionality to detect speech (TRUE) before it starts capturing. This header can occur in the RECORD, SET-PARAMS, or GET-PARAMS method. |
Completion-Cause |
String: numeric code from 000 to 004 |
n/a |
Reason behind the RECORD request completion. Must be included in a RECORD-COMPLETE event. For detailed value descriptions, see Recorder Completion-Causes, or the MRCPv2 draft specification. |
Completion-Reason |
String |
n/a |
Text describing the reason behind the RECORD request completion, such as the specific error encountered. May be included in a RECORD-COMPLETE event. Do not interpret the completion reason text. Instead, record the reason in client logs and make it available for debugging and instrumentation purposes. |
Failed-Uri |
Alphanumeric string plus "@" ":" "/" "\" |
n/a |
Bad URI. When a method instructs the recorder to fetch or access a URI and the access fails, Speech Server provides the failed URI in the header of the method response. |
Failed-Uri-Cause |
Alphanumeric string |
n/a |
Reason for a failed URI fetch. When a recorder method needs a recorder to fetch or access a URI and the access fails, Speech Server provides the URI or protocol-specific response code in the header of the method response. |
Final-Silence |
0–MAXINT |
Implementation specific |
Length of silence in the audio (in milliseconds) to be interpreted as the end of the recording. This header can occur in RECORD, SET-PARAMS, or GET-PARAMS. A value of zero means infinity and allows the recording to continue until one of the other stop conditions is met. The endpointer is used to include the silence before the start of speech in the recording even when Final-Silence is non-zero and Capture-On-Speech is false. Samples are collected whenever Capture-On-Speech is true or Final-Silence is non-zero. Because the endpointer does not support intermediary silence suppression, the recorder resource does not support it either. |
Max-Time |
0–MAXINT |
0 |
Maximum length of the recording in milliseconds, calculated from the time the actual capture and store begins (not necessarily the time the RECORD method is received). After this time, the recording stops and the server returns a RECORD-COMPLETE event with a request-state of COMPLETE. This header can occur in RECORD, SET-PARAMS or GET-PARAMS. A value of zero means infinity and allows the recording to continue until one or more of the other stop conditions is met. |
Media-Type |
MIME content type |
audio/x-wav |
MIME content type in which to store the captured audio or video. |
New-Audio-Channel |
TRUE, FALSE |
n/a |
Tells the server that, from this point on, further input audio comes from a different audio source, channel, or speaker. It is specified in a RECORD request. If multiple resources are sharing a media pipe and are collecting or using this data, and the client issues this header to one of the resources, the reset operation applies to all resources that use the shared media stream. This helps in a number of scenarios, including where the client wishes to reuse an open recognition session with an existing media session for multiple telephone calls. |
No-Input-Timeout |
0–MAXINT |
Implementation specific |
Time, in milliseconds, that the recorder waits for speech after recording starts. If no speech is detected within this period, the recorder issues a RECORD-COMPLETE event with a Completion-Cause of "002 no-input-timeout" and terminates the record operation. This header can occur in RECORD, SET-PARAMS, or GET-PARAMS. The value ranges from 0 to an application-specific maximum. |
Record-URI |
Alphanumeric string plus "@" ":" "/" "\" |
n/a |
When a recorder method contains the Record-URI header, the server must capture the audio and store it.
The server must also return the size in bytes and the duration in milliseconds of the recorded audio waveform as parameters associated with the header.
|
Sensitivity-Level |
0.0–1.0 |
Implementation specific |
Sensitivity level for the recorder. A higher value for this header means higher sensitivity. The recorder may support a variable level of sound sensitivity to filter out background noise and not mistake it for speech. This header can occur in RECORD, SET-PARAMS, or GET-PARAMS. |
Start-Input-Timers |
TRUE, FALSE |
TRUE |
Specifies whether the recorder starts the no-input timers when the RECORD request begins. When the value is FALSE, the recorder does not start the timers until the client sends a START-INPUT-TIMERS method to the recorder. This is useful when the recorder and synthesizer resources are not part of the same session. While a kill-on-barge-in prompt plays, the client may want the RECORD request to be active at the same time so that it can detect speech and implement kill-on-barge-in. However, the client does not want the recorder resource to start the no-input timers until the prompt has finished. The Start-Input-Timers header can be sent as part of the RECORD request.
|
Trim-Length |
0–MAXINT |
0 |
Length of audio to be trimmed from the end of the recording after the STOP (in milliseconds). |
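As an illustration of the value ranges listed in the table, the following Python sketch checks a few recorder header values before they are added to a RECORD request and renders them as header lines. It is not part of Speech Server or of any MRCP client library: the function and dictionary names are hypothetical, the numeric value chosen for MAXINT is an assumption (the table does not define it), and the sketch covers only a subset of the headers.

```python
# Minimal sketch: validate recorder header values against the ranges in the
# table and render "Name: value" lines. All helper names are hypothetical.

MAXINT = 2**31 - 1  # assumption: the documentation does not define MAXINT


def _boolean(value):
    # The table lists TRUE and FALSE as the allowed values.
    if not isinstance(value, bool):
        raise ValueError("expected a boolean")
    return "TRUE" if value else "FALSE"


def _milliseconds(value):
    # Timer and length headers take an integer from 0 to MAXINT.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("expected an integer number of milliseconds")
    if not 0 <= value <= MAXINT:
        raise ValueError("value out of range 0..MAXINT")
    return str(value)


def _sensitivity(value):
    # Sensitivity-Level is a number from 0.0 to 1.0.
    number = float(value)
    if not 0.0 <= number <= 1.0:
        raise ValueError("Sensitivity-Level must be between 0.0 and 1.0")
    return str(number)


VALIDATORS = {
    "Capture-On-Speech": _boolean,
    "Final-Silence": _milliseconds,
    "Max-Time": _milliseconds,
    "No-Input-Timeout": _milliseconds,
    "Sensitivity-Level": _sensitivity,
    "Start-Input-Timers": _boolean,
}


def render_record_headers(headers):
    """Return validated header lines for the given name/value mapping."""
    lines = []
    for name, value in headers.items():
        if name not in VALIDATORS:
            raise KeyError(f"header not covered by this sketch: {name}")
        lines.append(f"{name}: {VALIDATORS[name](value)}")
    return lines
```

For example, `render_record_headers({"Capture-On-Speech": True, "Max-Time": 30000})` returns the two lines `Capture-On-Speech: TRUE` and `Max-Time: 30000`, while a Sensitivity-Level of 1.5 raises `ValueError`.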
The Completion-Cause header must be part of a RECORD-COMPLETE event coming from the recorder resource to the client. It indicates the reason behind the RECORD method completion. This header must be sent in the RECORD responses, if they return with a failure status and a COMPLETE state.
Cause-code |
Cause-name |
Description |
---|---|---|
000 |
success-silence |
RECORD completed with a silence at the end. |
001 |
success-maxtime |
RECORD completed after reaching the maximum recording time specified in the RECORD method. |
002 |
no-input-timeout |
RECORD failed due to no input. |
003 |
uri-failure |
Failure accessing the record URI. |
004 |
error |
RECORD request terminated prematurely due to a recorder error. |
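A client that receives a RECORD-COMPLETE event typically reads the Completion-Cause value and, on success, the Record-URI value. The following Python sketch shows one way to do that. It assumes the Completion-Cause value has the form "code name" (as in "002 no-input-timeout") and that the Record-URI value has the form `<uri>;size=…;duration=…`, which is how the MRCPv2 specification lays out these headers as far as this sketch is concerned. The function names and the sample values in the usage note are made up for illustration.

```python
import re

# Cause codes and names copied from the table above.
COMPLETION_CAUSES = {
    "000": "success-silence",
    "001": "success-maxtime",
    "002": "no-input-timeout",
    "003": "uri-failure",
    "004": "error",
}

# Codes 000 and 001 describe a recording that completed normally.
SUCCESS_CODES = {"000", "001"}

# Assumed Record-URI value layout: <uri>;size=<bytes>;duration=<milliseconds>
_RECORD_URI = re.compile(
    r"^\s*<(?P<uri>[^>]*)>\s*;\s*size=(?P<size>\d+)\s*;\s*duration=(?P<duration>\d+)\s*$"
)


def parse_completion_cause(value):
    """Return (code, name, succeeded) for a Completion-Cause header value."""
    parts = value.split()
    if not parts or parts[0] not in COMPLETION_CAUSES:
        raise ValueError(f"unknown Completion-Cause: {value!r}")
    code = parts[0]
    return code, COMPLETION_CAUSES[code], code in SUCCESS_CODES


def parse_record_uri(value):
    """Return (uri, size_in_bytes, duration_in_ms) for a Record-URI value."""
    match = _RECORD_URI.match(value)
    if match is None:
        raise ValueError(f"unexpected Record-URI value: {value!r}")
    return match["uri"], int(match["size"]), int(match["duration"])
```

With these helpers, `parse_completion_cause("001 success-maxtime")` yields the code, the name, and a success flag, and `parse_record_uri("<file://mediaserver/recordings/myfile.wav>;size=242552;duration=25645")` yields the URI together with the size in bytes and the duration in milliseconds.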
Related topics