Controlling speech processing

After configuring the general environment, the application requests specific properties for each activity within a session: generating speech, recognizing utterances, recording audio, and so on.

Recognition resources

Nuance Speech Server supports all MRCP recognition resources. See Controlling speech recognition and TTS or the MRCP recommendation.

Recognition methods

Speech Server supports the following MRCP recognition methods:

Method	Description
DEFINE-GRAMMAR	For Nuance Recognizer, specifies one or more grammars and requests the server to access, fetch, and compile the grammars as needed. For Dragon Voice, specifies domain language models and optional wordsets required for recognition and interpretation.
RECOGNIZE	Requests the recognizer to start recognition. The RECOGNIZE method can carry headers to control properties such as sensitivity and the level of detail in results provided by the recognizer. For Nuance Recognizer, the RECOGNIZE method can request to operate in normal or hotword mode as specified by the Recognition-Mode header. Note: The Krypton recognition engine does not support hotword mode recognition.
INTERPRET	Requests to start semantic interpretation of a specified text (contained in the Interpret-Text header). For Nuance Recognizer, specifies one or more grammars to match against the input text. For NLE, specifies the semantic model to use.
GET-RESULT	Speech Server does not support GET-RESULT. For Nuance Recognizer, the n-best recognition result is generated once from the RECOGNIZE method and then returned to the client. To retrieve a larger number of n-best results, applications can set the N-Best-List-Length header.
START-INPUT-TIMERS (MRCPv2) RECOGNITION-START-TIMERS (MRCPv1)	Notifies the recognizer that a kill-on-barge-in prompt has finished playing.
STOP	Instructs the recognizer to stop any in-progress recognition.
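
As a rough illustration of how a client might frame a RECOGNIZE request, the following Python sketch assembles an MRCPv2 message as bytes. It assumes MRCPv2 framing, in which the start line carries the total message length. The helper build_mrcp2_request, the channel identifier, the request ID, and the grammar URI are hypothetical placeholders and are not part of Speech Server.

```python
def build_mrcp2_request(method, request_id, headers, body=""):
    """Assemble an MRCPv2 request as bytes (sketch; assumes MRCP/2.0 framing)."""
    body_bytes = body.encode("utf-8")
    all_headers = dict(headers)
    if body_bytes:
        all_headers["Content-Length"] = str(len(body_bytes))
    header_block = "".join(f"{k}: {v}\r\n" for k, v in all_headers.items()) + "\r\n"
    header_bytes = header_block.encode("utf-8")

    # The start line states the length of the whole message, including the
    # start line itself, so iterate until the value is self-consistent.
    length = 0
    while True:
        start = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("utf-8")
        total = len(start) + len(header_bytes) + len(body_bytes)
        if total == length:
            return start + header_bytes + body_bytes
        length = total


request = build_mrcp2_request(
    "RECOGNIZE",
    request_id=1,
    headers={
        "Channel-Identifier": "0123456789abcdef@speechrecog",  # placeholder
        "Cancel-If-Queue": "false",   # must accompany each RECOGNIZE (MRCPv2)
        "N-Best-List-Length": "3",
        "No-Input-Timeout": "7000",
        "Content-Type": "text/uri-list",
    },
    body="http://example.com/grammars/main.grxml\r\n",  # placeholder grammar URI
)
print(request.decode("utf-8"))
```

The length loop is needed because adding a digit to the length field changes the length itself; it settles after at most a few passes.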

The following methods apply to voice-phrase enrollment and Nuance Recognizer only:

Method	Description
START-PHRASE-ENROLLMENT	Starts a new phrase-enrollment session consisting of a set of calls to RECOGNIZE in which the caller speaks a phrase several times so the system can "learn" it. The phrase is then added to a personal grammar (speaker-trained grammar), so that the system can recognize it later.
ENROLLMENT-ROLLBACK	Discards the last live utterance from the RECOGNIZE operation. The client can invoke this method when the caller provides undesirable input such as non-speech noises, side speech, commands, etc.
END-PHRASE-ENROLLMENT	Commits the new phrase in the personal grammar. The client can call this method once successive calls to RECOGNIZE have succeeded and Num-Repetitions-Still-Needed has been returned as 0 in the RECOGNITION-COMPLETE event. Alternatively, the client can abort the phrase enrollment session by calling this method with the Abort-Phrase-Enrollment header.
MODIFY-PHRASE	Changes the phrase ID, natural-language phrase, and/or weight for a given phrase in a personal grammar.
DELETE-PHRASE	Deletes a phrase in a personal grammar added through voice enrollment or text enrollment.
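
The enrollment methods above are typically used as a loop. The Python sketch below shows one possible client-side flow. The send callable is a hypothetical transport function that issues a request and returns the headers of the final response or event as a dictionary. Rolling back every non-successful utterance and giving up after max_attempts are assumptions made for illustration, not documented server behavior.

```python
def enroll_phrase(send, start_headers, max_attempts=10):
    """Drive one phrase-enrollment session; return True if the phrase was committed.

    `send(method, headers)` is a hypothetical callable returning a dict of
    headers from the final response or event for that request.
    """
    send("START-PHRASE-ENROLLMENT", start_headers)
    for _ in range(max_attempts):
        event = send("RECOGNIZE", {"Cancel-If-Queue": "false"})
        cause = event.get("Completion-Cause", "")
        if not cause.startswith("000"):
            # Assumption: treat any non-success result as undesirable input
            # and discard the last utterance.
            send("ENROLLMENT-ROLLBACK", {})
            continue
        if int(event.get("Num-Repetitions-Still-Needed", "1")) == 0:
            send("END-PHRASE-ENROLLMENT", {})  # commit the new phrase
            return True
    # Too many attempts: abort the session instead of committing.
    send("END-PHRASE-ENROLLMENT", {"Abort-Phrase-Enrollment": "true"})
    return False
```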

Recognition events

Speech Server supports the following MRCP recognition events:

Event	Description
END-OF-INPUT (Nuance extension)	The time between the END-OF-INPUT and the beginning of the following prompt is used to compute caller-perceived latency. Not supported by the Krypton recognition engine.
START-OF-INPUT	Indicates that the recognizer has detected speech or a DTMF digit in the media stream. In a kill-on-barge-in scenario the client must act as an intermediary and respond to this event by issuing a BARGE-IN-OCCURRED method to the synthesizer.
RECOGNITION-COMPLETE	Indicates that the recognition completed and that this is the last event with that request-id. The recognition result is sent in the body of the MRCP message. The RECOGNITION-COMPLETE event can specify a URI to the audio waveform in a Waveform-URI header.
INTERPRETATION-COMPLETE	Indicates that the INTERPRET operation is complete. The interpretation result is returned in the body of the INTERPRETATION-COMPLETE event message. The result is very similar to the one returned from a RECOGNIZE method invocation, but excludes portions of the result relevant only to acoustic matching.
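
When the client acts as an intermediary for kill-on-barge-in, its event handling can be quite small. The Python sketch below relays START-OF-INPUT to the synthesizer and computes caller-perceived latency from END-OF-INPUT. The callable send_to_synthesizer and the timestamp arguments (in seconds) are hypothetical and stand in for whatever transport and clock the client uses.

```python
def on_recognizer_event(event_name, event_headers, send_to_synthesizer):
    """Relay barge-in from the recognizer to the synthesizer.

    `send_to_synthesizer(method, headers)` is a hypothetical callable.
    Returns True when a BARGE-IN-OCCURRED request was sent.
    """
    if event_name != "START-OF-INPUT":
        return False
    headers = {}
    # Pass Proxy-Sync-Id through when the recognizer supplied one.
    if "Proxy-Sync-Id" in event_headers:
        headers["Proxy-Sync-Id"] = event_headers["Proxy-Sync-Id"]
    send_to_synthesizer("BARGE-IN-OCCURRED", headers)
    return True


def caller_perceived_latency_ms(end_of_input_time, next_prompt_start_time):
    """Milliseconds between END-OF-INPUT and the start of the following prompt."""
    return (next_prompt_start_time - end_of_input_time) * 1000.0
```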

Recognition headers

Speech Server supports the following MRCP recognition headers. When the MRCP draft recommendation indicates that a default value is "implementation specific," the value is defined on the Speech Server via the Management Station.

Header	Values	Default	Description
Cache-Control	max-age; max-stale; min-fresh	n/a	(generic header) Overrides the default cache expiration mechanisms. max-age—The server uses only cached data whose age is no greater than the specified time in seconds. min-fresh—The server uses only cached data whose expiration is no less than its current age plus the specified time in seconds. max-stale—The server uses only cached data that has exceeded the expiration time by up to the specified number of seconds. If no value is assigned to max-stale, the server may use stale data of any age. The scope of this header depends on the method it is sent on: If the directives are sent on a SET-PARAMS method, they apply for that entire session, unless overridden by a Cache-Control header on an individual request. If the directives are sent on any other request, they apply only to external document requests the server makes for that request. An empty cache-control header on the GET-PARAMS method requests the server to return the current Cache-Control directive settings on the server.
Cancel-If-Queue (MRCPv2 only)	TRUE, FALSE	n/a Must be specified with each RECOGNIZE message.	Specifies what happens if the client invokes another RECOGNIZE method while this RECOGNIZE request is in progress. TRUE—The current recognition is cancelled, and the new one started. FALSE—The new recognition is put in a queue until the current one finishes.
Clear-DTMF-Buffer (MRCPv2 only)	TRUE, FALSE	FALSE	Discards any digits pressed by the caller before the recognition begins.
Completion-Cause	String: numeric code from 000 to 016	n/a	Text describing the cause for the RECOGNIZE request completion. For detailed value descriptions, see Recognition Completion-Causes, or the MRCPv2 draft specification.
Completion-Reason (MRCPv2 only)	String	n/a	Text describing the reason for a RECOGNIZE failure; for use in logs or debugging.
Confidence-Threshold	0.0–1.0 (MRCPv2) 0–100 (MRCPv1)	n/a	The confidence level the application considers a successful match. Note: Ignored by Dragon Voice engines.
dtmf-buffer-time	0–MAXINT	0	Size (in milliseconds) of the type-ahead buffer. The type-ahead buffer collects DTMF digits as they are pressed, even when there is no RECOGNIZE command active. When a subsequent RECOGNIZE method is received, it may look to this buffer to match the RECOGNIZE request. If the digits in the buffer are not sufficient, then it can continue to listen for more digits to match the grammar.
dtmf-interdigit-timeout	0–MAXINT	5000	Maximum time allowed between each DTMF character entered by the user.
dtmf-term-char	valid DTMF character	NULL	DTMF character that terminates DTMF input.
dtmf-term-timeout	0–MAXINT	10000 (MRCPv1) 5000 (MRCPv2)	Time period (in milliseconds) that terminates DTMF input once the total number of tones allowed by the grammar has been entered and the user fails to type an optional termination character (DTMF-Term-Char).
Failed-URI	String		Bad URI. When a method instructs the recognizer to fetch or access a URI and the access fails, Speech Server provides the failed URI in the header of the method response.
Failed-URI-Cause	String		Reason for a failed URI fetch. When a recognition method needs a recognizer to fetch or access a URI and the access fails, Speech Server provides the URI or protocol-specific response code in the header of the method response.
Fetch-Timeout	10–MAXINT	5000	(generic header) Maximum number of milliseconds for the server to fetch content from the network.
Hotword-Confidence-Threshold (MRCPv1 VSP)	0–100	n/a	Confidence level the application considers a successful match in hotword mode. Note: Ignored by Dragon Voice engines.
Hotword-Max-Duration Hotword-Min-Duration (MRCPv2; VSP in MRCPv1)	Integer expressing the duration in milliseconds	800 200	Maximum/minimum length of an utterance that is considered for hotword recognition. These headers can be used to tune performance by preventing the recognizer from evaluating utterances that are too short or too long to be one of the hotwords in the grammar(s). They enable a client application to support the Nuance Recognizer recognition modes selective barge-in and magic word. Hotword-Min-Duration is ignored if Hotword-Max-Duration is 0. See Proprietary Recognizer features. Note: Ignored by Dragon Voice engines.
Input-Modes (Nuance extension)	voice, dtmf, dtmf voice	dtmf voice	Specifies which input modes are enabled: voice—Allows only speech input; disables DTMF input. dtmf—Allows only DTMF input; disables speech input. dtmf voice—Allows both DTMF and speech input. Note: This parameter does not control grammar activation in Nuance Recognizer. For instance, if you allow only DTMF input, even active voice grammars are not matched during recognition.
Input-Type	dtmf, speech	n/a	When the recognizer detects barge-in-able input and generates a START-OF-INPUT event, Input-Type specifies whether the input that caused the barge-in was DTMF or speech.
Input-Waveform-URI (MRCPv2 only)	String	n/a	URI pointing to audio content to be processed by the RECOGNIZE operation.
Interpret-Text (MRCPv2 only)	String	n/a	Text for which a natural-language interpretation is desired or a URI that points to the text. This header must be used when invoking the INTERPRET method.
Logging-Tag	String	n/a	(generic header) Text to include in the call logs.
Media-Type (MRCPv2 only)	MIME content type	audio/x-wav	MIME content type to use when storing captured audio or video (such as the audio referenced by Waveform-URI).
N-Best-List-Length	1–999	1	Returns more than one recognition match, if available. All matches must be above the confidence-threshold. (You cannot get below-threshold matches by increasing the length.) The header affects Krypton but not the other Dragon Voice engines. The length for the Krypton engine never exceeds 10, regardless of the header setting.
New-Audio-Channel (MRCPv2 only)	TRUE, FALSE	FALSE	Indicates to the recognizer that the audio data is from a new audio source, channel, or speaker. Using this header helps improve accuracy, because the recognizer readjusts to the audio signal. If there are multiple resources sharing a media pipe and collecting or using this data, and the client issues this header to one of the resources, the reset operation applies to all resources that use the shared media stream. This helps in a number of scenarios, including where the client wishes to reuse an open recognition session with an existing media session for multiple telephone calls.
No-Input-Timeout	0–MAXINT	7000	Maximum number of milliseconds for the recognizer to wait for speech to be detected.
One-Of-Rule-ID-URI			Not supported by Speech Server.
Recognizer-Context-Block	n/a	n/a	Line information (acoustic state information) in Nuance proprietary format.
Recognition-Mode (MRCPv2; VSP in MRCPv1)	normal, hotword	normal	Mode of operation for the RECOGNIZE method. normal—The recognizer matches speech and DTMF to the grammars specified in the RECOGNIZE request. hotword—The recognizer only looks for the particular keywords or DTMF sequences specified in the grammar and ignores silence or other speech in the audio stream. Note: The Krypton recognition engine does not support hotword mode recognition.
Recognizer-Start-Timers (MRCPv1)	TRUE, FALSE	TRUE	Starts recognizer input timers. Use this when the synthesizer is playing a kill-on-barge-in prompt, and the client wants the RECOGNIZE request to be simultaneously active so that it can detect and implement kill-on-barge-in. However, the recognizer ought not start the no-input timers until the prompt is finished. (=Start-Input-Timers)
Recognition-Timeout	10–MAXINT	10000 (MRCPv2) 60000 (MRCPv1)	Maximum number of milliseconds for the recognizer to complete a recognition.
Save-Waveform	TRUE, FALSE	FALSE	Instructs the recognizer to save the current utterance (without endpointing) and return a pointer to it via the Waveform-URI header. To use this capability, you must have a web server installed.
Save-Waveform-On-DTMF (Nuance extension)	TRUE, FALSE	TRUE	When DTMF is detected, instructs the recognizer to save the current utterance and return a pointer to it via the Waveform-URI header.
Sensitivity-Level	0.0–1.0	0.5	Sensitivity of the end-pointer. Used to filter out background noise and not mistake it for speech. Lower values are less sensitive.
Speech-Complete-Timeout	0–MAXINT	0	(Nuance Recognizer only) Length of silence required following user speech before the recognizer finalizes a result (either accepting it or generating a nomatch event). The Speech-Complete-Timeout applies when the recognizer currently has a complete match against an active grammar. By contrast, the Incomplete-Timeout is used when the speech is an incomplete match to an active grammar. The value is in milliseconds. A long Speech-Complete-Timeout value delays the result to the client and therefore makes the application's response to a user slow. A short Speech-Complete-Timeout may lead to an utterance being broken up inappropriately. Reasonable Speech-Complete-Timeout values are typically in the range of 0.3–1.0 second. Note: By default, Speech Server does not set this value, but uses the completetimeout value.
Speech-Incomplete-Timeout	0–MAXINT	1500	Length of silence required following user speech after which a recognizer finalizes a result. The Incomplete-Timeout applies when the speech prior to the silence is an incomplete match. In this case, once the timeout is triggered, the partial result is rejected (with a Completion-Cause of "partial-match"). The value is in milliseconds. The Speech-Incomplete-Timeout also applies when the speech prior to the silence is a complete match, but where it is possible to speak further and still achieve a complete match. By contrast, the Complete-Timeout is used when the speech is a complete match and no further spoken words can continue to represent a match. A long Speech-Incomplete-Timeout value delays the result to the client and therefore makes the application's response to a user slow. A short Speech-Incomplete-Timeout may lead to an utterance being broken up inappropriately. The Speech-Incomplete-Timeout is usually longer than the Speech-Complete-Timeout to allow users to pause mid-utterance (for example, to breathe). Note: By default, Speech Server does not set this value, but uses the incompletetimeout value.
Speech-Language	Any installed recognizer language code	en-US	(Nuance Recognizer only) Default language for built-in grammars (all other grammars define their languages internally).
Speed-Vs-Accuracy	0.0–1.0 (MRCPv2) 0–100 (MRCPv1)	0.5 50	A value of 0.0 means fastest recognition; a value of 1.0 means best accuracy. Note: This header can be sent with no error, but currently is ignored by the recognizer.
Start-Input-Timers (MRCPv2)	TRUE, FALSE	TRUE	Starts recognizer input timers. Use this when the synthesizer is playing a kill-on-barge-in prompt, and the client wants the RECOGNIZE request to be simultaneously active so that it can detect and implement kill-on-barge-in. However, the recognizer ought not start the no-input timers until the prompt is finished. (=Recognizer-Start-Timers)
Vendor-Specific-Parameters			You can pass Nuance Recognizer-specific parameters to the server for a given session using this header. For example, to set swirec_search_threshold you would send the following: Vendor-Specific-Parameters: swirec_search_threshold=8
Ver-Buffer-Utterance (MRCPv2 only)			Not supported by Speech Server. Instructs the server to buffer the utterance associated with this request into the verification buffer, so that this utterance could be later considered for speaker verification. This buffer is shared across resources within a session. Sending this header is permitted only if the verification buffer is instantiated for the session.
Waveform-URI (MRCPv2) Waveform-URL (MRCPv1)	n/a	n/a	If Save-Waveform is set to TRUE, returns a URI for the recording of the current utterance. For MRCPv2, also returns the size in bytes and duration in milliseconds of the recording. Returns an empty string if an error prevented recording.

Note: The low-end accuracy for some timeout parameters is affected by the precision of the operating system. For example, on Windows the scheduling precision is 10 milliseconds, so that a value of 10 ms can result in less than a 10 ms scheduling time slice.
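
Several headers in the table use different scales or defaults in MRCPv1 and MRCPv2. The Python sketch below collects the version-dependent defaults listed above and formats a Confidence-Threshold value for either protocol version. The names RECOGNITION_DEFAULTS and confidence_header are hypothetical, and the linear mapping between the 0.0–1.0 and 0–100 scales is an assumption.

```python
# Version-dependent defaults taken from the recognition headers table.
RECOGNITION_DEFAULTS = {
    "MRCPv1": {"dtmf-term-timeout": 10000, "Recognition-Timeout": 60000},
    "MRCPv2": {"dtmf-term-timeout": 5000, "Recognition-Timeout": 10000},
}


def confidence_header(level, version):
    """Format a confidence level given as a fraction between 0.0 and 1.0.

    Assumption: the MRCPv1 0-100 scale is the MRCPv2 0.0-1.0 scale times 100.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0.0 and 1.0")
    if version == "MRCPv2":
        return f"{level:.2f}"
    if version == "MRCPv1":
        return str(round(level * 100))
    raise ValueError(f"unknown protocol version: {version}")


for version in ("MRCPv1", "MRCPv2"):
    print(version, "Confidence-Threshold:", confidence_header(0.5, version))
```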

Recognition Completion-Causes

The Completion-Cause header must be part of a RECOGNITION-COMPLETE event coming from the recognizer resource to the client. It indicates the reason behind the RECOGNIZE method completion. This header MUST be sent in the DEFINE-GRAMMAR and RECOGNIZE responses, if they return with a failure status and a COMPLETE state.

Cause-code	Cause-name	Description
000	success	RECOGNIZE completed with a match or DEFINE-GRAMMAR succeeded in downloading the resource (and compiling the grammar when required).
001	no-match	RECOGNIZE completed, but no match was found.
002	no-input-timeout	RECOGNIZE completed without a match due to a No-Input-Timeout.
003	hotword-maxtime	RECOGNIZE in hotword mode completed without a match due to a Recognition-Timeout. (Nuance Recognizer only)
004	grammar-load-failure	RECOGNIZE failed due to grammar load failure.
005	grammar-compilation-failure	RECOGNIZE failed due to grammar compilation failure.
006	recognizer-error	RECOGNIZE request terminated prematurely due to a recognizer error.
007	speech-too-early	RECOGNIZE request terminated because speech was too early. This happens when the audio stream is already "in-speech" when the RECOGNIZE request was received.
008	success-maxtime	RECOGNIZE request terminated because speech was too long, but whatever was spoken until that point was a full match.
009	uri-failure	Failure accessing a URI.
010	language-unsupported	Language not supported.
011	cancelled	A new RECOGNIZE cancelled this one.
012	semantics-failure	Recognition succeeded, but semantic interpretation of the recognized input failed. The RECOGNITION-COMPLETE event must contain the recognition result with only input text and no interpretation.
013	partial-match	Speech-Incomplete-Timeout expired before there was a full match, but whatever was spoken until that point was a partial match.
014	partial-match-maxtime	Recognition-Timeout expired before full match was achieved, but whatever was spoken until that point was a partial match.
015	no-match-maxtime	Recognition-Timeout expired. Either whatever was spoken until that point did not match, or the recognizer does not support detecting partial matches.
016	grammar-definition-failure	DEFINE-GRAMMAR error other than grammar-load-failure and grammar-compilation-failure.
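
A client usually branches on the cause code in the RECOGNITION-COMPLETE event. The Python sketch below parses a Completion-Cause value of the form "013 partial-match" and maps it to a next step. The cause table is copied from the table above. The grouping into accept, reprompt, cancelled, and error is a hypothetical application policy, not something Speech Server prescribes.

```python
RECOGNITION_CAUSES = {
    0: "success", 1: "no-match", 2: "no-input-timeout", 3: "hotword-maxtime",
    4: "grammar-load-failure", 5: "grammar-compilation-failure",
    6: "recognizer-error", 7: "speech-too-early", 8: "success-maxtime",
    9: "uri-failure", 10: "language-unsupported", 11: "cancelled",
    12: "semantics-failure", 13: "partial-match", 14: "partial-match-maxtime",
    15: "no-match-maxtime", 16: "grammar-definition-failure",
}

# Hypothetical policy: which causes yield a usable result or merit a reprompt.
ACCEPT = {0, 8}
REPROMPT = {1, 2, 3, 7, 13, 14, 15}


def parse_completion_cause(value):
    """Split a value such as '001 no-match' into (code, cause-name)."""
    code_text = value.strip().partition(" ")[0]
    if len(code_text) != 3 or not code_text.isdigit():
        raise ValueError(f"malformed Completion-Cause: {value!r}")
    code = int(code_text)
    if code not in RECOGNITION_CAUSES:
        raise ValueError(f"unknown cause code: {code_text}")
    return code, RECOGNITION_CAUSES[code]


def next_action(code):
    """Map a cause code to an application-level step (illustrative only)."""
    if code in ACCEPT:
        return "accept"
    if code in REPROMPT:
        return "reprompt"
    if code == 11:
        return "cancelled"
    return "error"
```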

Text-to-speech (TTS) resources

The Nuance Speech Server software supports many MRCP text-to-speech resources. See Controlling speech recognition and TTS or the MRCP recommendation.

Text-to-speech methods

Speech Server supports the following MRCP text-to-speech methods:

Method	Description
BARGE-IN-OCCURRED	Notifies the synthesizer that the client has detected a barge-in-able event. This method is useful in two scenarios: The client detects DTMF digits in the input media or some other barge-in-able event. The recognizer and the synthesizer are on different servers. In this case the client acts as an intermediary for the two servers. It receives a START-OF-INPUT event from the recognizer and sends a BARGE-IN-OCCURRED request to the synthesizer. In such cases, the BARGE-IN-OCCURRED method would also have a Proxy-Sync-Id header received from the resource generating the original event.
CONTROL	Tells a synthesizer that is speaking to modify what it is speaking on the fly: to jump forward or backward in what it is speaking, change speaker rate, speaker parameters, etc. It affects only the currently IN-PROGRESS "SPEAK" request.
DEFINE-LEXICON	Specifies a user dictionary and tells the server to load, unload, activate, or deactivate it. DEFINE-LEXICON enables the dictionary for the current text-to-speech instance, and for all succeeding speak requests until it is explicitly disabled. Hence, DEFINE-LEXICON enables loading a user dictionary once and leaving it enabled for an entire call, although the voice platform must remember to turn it back off. Note: The SPEAK request can also enable a user dictionary, but only for the current text-to-speech request.
GET-PARAMS	Retrieves the value of the specified parameter.
PAUSE	Tells the synthesizer to pause speech output if it is speaking something.
RESUME	Tells a paused synthesizer to resume speaking.
SET-PARAMS	The supported headers for this request are described in the section Text-to-speech headers.
SPEAK	The supported headers for this request are described in the section Text-to-speech headers.
STOP	Instructs the synthesizer to stop speech if a request is active.

Text-to-speech events

Speech Server supports the following MRCP text-to-speech events:

Event	Description
SPEAK-COMPLETE	Supported.
SPEECH-MARKER	Supported.

Text-to-speech headers

Speech Server supports the following MRCP text-to-speech headers.

Header	Values	Default	Description
Audio-Fetch-Hint	prefetch, safe, stream	prefetch	This header is ignored by Vocalizer.
Cache-Control	max-age; max-stale; min-fresh	n/a	(generic header) Overrides the default cache expiration mechanisms. max-age—The server may use only content whose age is no greater than the specified time in seconds. min-fresh—The server may respond only with cached data whose expiration is no less than its current age plus the specified time in seconds. max-stale—The server may use cached data that has exceeded its expiration time by up to the specified number of seconds. If no value is assigned to max-stale, the server may use stale data of any age. The scope of this header depends on the method it is sent on. If the directives are sent on a SET-PARAMS method, they apply for all requests for external documents the server makes during that session, unless overridden by a Cache-Control header on an individual request. If the directives are sent on any other request, they apply only to external document requests the server makes for that request. An empty cache-control header on the GET-PARAMS method requests the server to return the current Cache-Control directive settings on the server.
Completion-Cause	String: numeric code from 000 to 007	n/a	Text describing the reason for the SPEAK request completion. For detailed value descriptions, see Text-to-speech Completion-Causes, or the MRCPv2 draft specification.
Completion-Reason (MRCPv2 only)	String	n/a	Text describing the reason for a SPEAK failure; for use in logs or debugging, such as an error in parsing the speech markup text.
Failed-URI	String	n/a	Bad URI. When a method instructs the synthesizer to fetch or access a URI and the access fails, Speech Server provides the failed URI in the header of the method response.
Failed-URI-Cause	String	n/a	Reason for a failed URI fetch. When a synthesizer method needs the synthesizer to fetch or access a URI and the access fails, Speech Server provides the URI or protocol-specific response code in the header of the method response.
Fetch-Hint	prefetch, safe	prefetch	Instructs the synthesizer to retrieve documents or other resources such as speech markup or audio files from the server. Possible values are: prefetch—Files can be downloaded when the request is received. safe—Files are downloaded when needed. This header field can occur in SPEAK, SET-PARAMS, or GET-PARAMS requests.
Fetch-Timeout (generic in MRCPv2)	0–MAXINT		Synthesizer timeout for resources Speech Server may need to fetch from the network. The value, specified in milliseconds, controls URI access properties when the synthesizer needs to fetch documents or other resources like speech audio files. The default value is dependent on the voice platform. This header field can occur in SPEAK, SET-PARAMS or GET-PARAMS.
Jump-Size (MRCPv2) Jump-Target (MRCPv1)			Not supported by Speech Server. It is accepted and returns 408 "Unrecognized or unsupported message entity." Amount to jump forward or backward in an active "SPEAK" request. A + or - indicates a relative value to what is being currently played. This header MAY also be specified in a SPEAK request as a desired offset into the synthesized speech. In this case, the synthesizer must begin speaking from this amount of time into the speech markup. Note that an offset that extends beyond the end of the produced speech will result in audio of length zero. The different speech length units (second, word, sentence, paragraph) supported are dependent on the synthesizer implementation.
Kill-On-Barge-In	TRUE, FALSE	TRUE	Enables barge-in. If set to TRUE, or if not set, the server stops the SPEAK method when it receives a BARGE-IN-OCCURRED method. If set to FALSE, the server ignores a BARGE-IN-OCCURRED method and sends the appropriate response. The client sends a BARGE-IN-OCCURRED method to the synthesizer when it receives a barge-in-able event such as DTMF input detected by a signal detector resource or by the start of speech sensed or recognized by the speech recognizer resource. If the recognizer or signal detector resource is on the same server as the synthesizer and both are part of the same session, the server can work with both to provide internal notification to the synthesizer so that audio can be stopped without having to wait for the client's BARGE-IN-OCCURRED event.
Lexicon-Search-Order (MRCPv2 only)		n/a	List of active lexicon URIs and the search order among the active lexicons.
Load-Lexicon (MRCPv2 only)	TRUE, FALSE	TRUE	Loads a lexicon (dictionary). If set to TRUE, or if not set, this header indicates to load the dictionary. If set to FALSE, it indicates to unload the dictionary. This header can occur only in DEFINE-LEXICON.
Logging-Tag	String	n/a	(generic header) Text to include in the call logs.
Prosody-Contour Prosody-Duration Prosody-Pitch Prosody-Range			These headers are ignored by Speech Server.
Prosody-Rate Prosody-Volume (MRCPv2 only)			Prosody-Rate accepts values in the following forms: A number in the range 0 to 100 A number indicating a relative change in speed One of these strings: x-fast, fast, medium, slow, x-slow, or default. Prosody-Volume accepts values in the following forms: A number in the range 0 to 100 A number indicating a relative change in volume One of these strings: silent, x-soft, soft, medium, loud, x-loud, or default. When the value is a number, you can specify a simple floating point value without exponentials. Legal formats are "n", "n.", ".n" and "n.n" where "n" is a sequence of one or more digits. When the value indicates a relative change (an offset based on the current setting), the value must begin with a plus sign (+) or minus sign (-) followed by a number. Optionally, the number can be a percentage change. Examples: Changes expressed as a number: +10, -5.5 Changes expressed as percentages: 3%, +15.2%, -8.0%
Speaker-Profile			Not supported by Speech Server. Specifies a URI which references the profile of the speaker. Speaker profiles are collections of voice parameters like gender and accent.
Speak-Length			Not supported by Speech Server.
Speak-Restart			Not supported by Speech Server.
Speech-Language			Default language for the text-to-speech engine.
Speech-Marker	Integer		Marker tag that can be embedded in the speech data. When the synthesizer reaches these marker fields, it generates SPEECH-MARKER events. Marker tags are limited to integer values. For details on marker tags, see .
Vendor-Specific-Parameters (generic in MRCPv2)		n/a	The following Nuance-supplied (vendor-specific) parameters are supported: switts.ssftrs_dict_load switts.ssftrs_dict_unload switts.ssftrs_dict_enable switts.ssftrs_dict_disable switts.secure_context switts.speechlanguage switts.voicename Note: Speech Server processes vendor-specific dictionary parameters in the order listed here, not in the order they appear in the MRCP request. To process these parameters in a different order, specify them in consecutive MRCP SET-PARAMS requests. You can load and unload dictionaries by including the parameters with a comma-separated list of the dictionaries, as in the following example: SET_PARAM 100014 MRCP/1.0 Vendor-Specific: ssftrs_dict_load="http://my.com/dict1,http://my.com/dict2"; ssftrs_dict_unload="http://my.com/dict3,http://my.com/dict4"; ssftrs_dict_enable="http://my.com/dict2"; ssftrs_dict_disable="http://my.com/dict5"
Voice-Age Voice-Category Voice-Variant			Setting these fields results in a 201 "parameter is ignored" message.
Voice-Gender	male, female, neutral	n/a	Gender of the TTS voice.
Voice-Name	String	n/a	Name of the TTS voice. Specify the complete voice name (including any suffix).
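
The three Cache-Control directives described in the table can be read as a freshness test on a cached entry. The Python sketch below parses a directive string and applies that test. The function names are hypothetical, the comma-separated syntax is an assumption, and refusing stale data when max-stale is absent is an assumption about the default rather than documented server behavior.

```python
import math


def parse_cache_control(value):
    """Parse e.g. 'max-age=60, max-stale' into a dict of seconds."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, number = part.partition("=")
        name = name.strip().lower()
        if sep:
            directives[name] = int(number)
        elif name == "max-stale":
            # max-stale with no value: stale data of any age is acceptable.
            directives[name] = math.inf
        else:
            raise ValueError(f"directive needs a value: {part!r}")
    return directives


def cache_entry_usable(age, lifetime, directives):
    """Decide whether a cached entry may be used.

    age: seconds since the entry was fetched.
    lifetime: seconds after fetching at which the entry expires.
    """
    max_age = directives.get("max-age")
    if max_age is not None and age > max_age:
        return False
    min_fresh = directives.get("min-fresh")
    if min_fresh is not None and lifetime < age + min_fresh:
        return False
    staleness = age - lifetime  # positive once the entry has expired
    return staleness <= directives.get("max-stale", 0)
```

For example, an entry that expired 10 seconds ago is usable with max-stale=15 but not when no max-stale directive is given.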

The parameter server.mrcp2.rsspeechsynth.mrcpdefaults.prosodyVolume defines the value for the symbolic string "default". The default value is 100, which is interpreted as an “extra loud” volume. The complete set of values appears below:

Value	Symbolic string
0	silent
10	x-soft
25	soft
50	medium (default)
75	loud
100	x-loud

The parameter server.mrcp2.session.mrcpdefaults.prosodyRate overrides the value for the symbolic string "default". The default value is 50, which is interpreted as a "medium" rate. The complete set of values appears below:

Value	Symbolic string
1	x-slow
25	slow
50	medium (default)
75	fast
100	x-fast
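
The accepted forms for Prosody-Rate and Prosody-Volume can be checked on the client before a request is sent. The Python sketch below classifies a value as absolute, relative, relative-percent, or default, using the number formats and symbolic strings listed above. The function parse_prosody and its return shape are hypothetical. Treating an unsigned percentage such as 3% as a relative change follows the examples in the header table.

```python
import re

# Legal number formats: "n", "n.", ".n", "n.n" (no exponents).
_VALUE = re.compile(r"^(?P<sign>[+-])?(?P<num>\d+\.?\d*|\.\d+)(?P<pct>%)?$")

RATE_NAMES = {"x-slow": 1, "slow": 25, "medium": 50, "fast": 75, "x-fast": 100}
VOLUME_NAMES = {"silent": 0, "x-soft": 10, "soft": 25, "medium": 50,
                "loud": 75, "x-loud": 100}


def parse_prosody(value, names):
    """Classify a Prosody-Rate or Prosody-Volume value.

    Returns (kind, number), where kind is 'absolute', 'relative',
    'relative-percent', or 'default' (number is None for 'default',
    because that value is resolved by a server parameter).
    """
    text = value.strip()
    if text == "default":
        return ("default", None)
    if text in names:
        return ("absolute", float(names[text]))
    match = _VALUE.match(text)
    if match is None:
        raise ValueError(f"unsupported prosody value: {value!r}")
    number = float(match["num"])
    if match["sign"] == "-":
        number = -number
    if match["pct"]:
        return ("relative-percent", number)
    if match["sign"]:
        return ("relative", number)
    if not 0 <= number <= 100:
        raise ValueError("absolute values must be in the range 0 to 100")
    return ("absolute", number)
```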

Text-to-speech Completion-Causes

The Completion-Cause header must be part of a SPEAK-COMPLETE event coming from the synthesizer resource to the client. It indicates the reason behind the SPEAK method completion.

Cause-code	Cause-name	Description
000	normal	SPEAK completed normally.
001	barge-in	SPEAK request was terminated because of barge-in.
002	parse-failure	SPEAK request terminated because of a failure to parse the speech markup text.
003	uri-failure	SPEAK request terminated because access to one of the URIs failed.
004	error	SPEAK request terminated prematurely due to synthesizer error.
005	language-unsupported	Language not supported.
006	lexicon-load-failure	Lexicon loading failed.
007	cancelled	A prior SPEAK request failed while this one was still in the queue.

Recorder resources

Speech Server can record speaker utterances for future reference, such as for use in recognizer tuning or legal confirmation.

Recorder methods

Nuance Speech Server supports the following recorder methods:

Method	Description
RECORD	Places the recorder in the Recording state. Depending on the headers specified in the RECORD method, the recorder can start recording the audio immediately or wait for the end pointer to detect speech in the audio. The RECORD request then saves the audio to the URI supplied in the Record-URI header. If Record-URI is not specified, the server saves the audio file anywhere it finds convenient and returns a URI pointing to the file in the RECORD-COMPLETE event.
START-INPUT-TIMERS	Sent when a kill-on-barge-in prompt has finished playing. Use this when the recorder and synthesizer are not in the same MRCPv2 session. When a kill-on-barge-in prompt is playing, the client wants the RECORD request to be active at the same time (so that it can detect and implement kill on barge-in) without starting the server's no-input timers until the prompt is finished. The Start-Input-Timers header in the RECORD request specifies whether to start the timers.
STOP	Instructs the recorder to stop recording if a recording is active. If a RECORD request is active and the STOP request successfully terminates it, the response contains an Active-Request-ID-List header with the request-id of the RECORD request that was terminated. In this case, no RECORD-COMPLETE event is sent for the terminated request. If there is no recording active, the response must not contain an Active-Request-ID-List header.

Recorder events

Nuance Speech Server supports the following recorder events:

Event	Description
RECORD-COMPLETE	If the recording succeeded, contains a Record-URI header pointing to the recorded audio file on the server or to a MIME part containing the recorded audio in the body of the message. If the recording completes due to no-input, silence after speech, or max-time, the server must send the RECORD-COMPLETE event to the client with a request-state of "COMPLETE".
START-OF-INPUT	Returned from the server to the client once the server has detected speech.
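
On the client side, handling these events mostly means reading the event name and request-state from the start line and then inspecting the headers. The following is a rough sketch, assuming MRCPv2 event framing (`MRCP/2.0 length event-name request-id request-state`). The function name `parse_mrcp_event` and the sample message are illustrative only; the sketch handles events rather than responses and does not validate the message-length field.

```python
def parse_mrcp_event(raw: bytes) -> dict:
    """Split an MRCPv2 event into start-line fields and headers (illustrative sketch)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    # Event start line: version, message-length, event-name, request-id, request-state
    version, length, event_name, request_id, request_state = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are matched case-insensitively, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return {
        "event": event_name,
        "request_id": int(request_id),
        "request_state": request_state,
        "headers": headers,
        "body": body,
    }


# Sample event; the length value (170) is a placeholder and is not checked here.
sample = (
    b"MRCP/2.0 170 RECORD-COMPLETE 543257 COMPLETE\r\n"
    b"Channel-Identifier:32AECB23433802@recorder\r\n"
    b"Completion-Cause:000 success-silence\r\n"
    b"Record-URI:<file://mediaserver/recordings/myfile.wav>;size=325325;duration=24652\r\n"
    b"\r\n"
)

event = parse_mrcp_event(sample)
if event["event"] == "RECORD-COMPLETE" and event["request_state"] == "COMPLETE":
    recorded_at = event["headers"].get("record-uri")
```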

Recorder headers

Nuance Speech Server supports the following recorder headers:

Header	Values	Default	Description
Capture-On-Speech	TRUE, FALSE	FALSE	Instructs the recorder to start capturing immediately when it starts (FALSE), or to wait for the end-pointing functionality to detect speech before it starts capturing (TRUE). This header can occur in the RECORD, SET-PARAMS, or GET-PARAMS method.
Completion-Cause	String: numeric code from 000 to 004	n/a	Reason behind the RECORD request completion. Must be included in a RECORD-COMPLETE event. For detailed value descriptions, see Recorder Completion-Causes, or the MRCPv2 draft specification.
Completion-Reason	String	n/a	Reason behind the RECORD request completion, such as the specific error encountered. May be included in a RECORD-COMPLETE event. Do not interpret the completion reason text. Instead, record the reason in client logs and make it available for debugging and instrumentation purposes.
Failed-Uri	Alphanumeric string plus "@" ":" "/" "\"	n/a	Bad URI. When a method instructs the recorder to fetch or access a URI and the access fails, Speech Server provides the failed URI in the header of the method response.
Failed-Uri-Cause	Alphanumeric string	n/a	Reason for a failed URI fetch. When a recorder method needs a recorder to fetch or access a URI and the access fails, Speech Server provides the URI or protocol-specific response code in the header of the method response.
Final-Silence	0–MAXINT	Implementation specific	Length of silence in the audio (in milliseconds) to be interpreted as the end of the recording. This header can occur in RECORD, SET-PARAMS, or GET-PARAMS. A value of zero means infinity and allows the recording to continue until one of the other stop conditions is met. The endpointer is used to include the silence before the start of speech in the recording, even when Final-Silence is non-zero and Capture-On-Speech is false. Samples are collected whenever Capture-On-Speech is true or Final-Silence is non-zero. Because the endpointer does not support intermediary silence suppression, the recorder resource does not support it either.
Max-Time	0–MAXINT	0	Maximum length of the recording in milliseconds, calculated from the time the actual capture and store begins (not necessarily the time the RECORD method is received). After this time, the recording stops and the server returns a RECORD-COMPLETE event with a request-state of COMPLETE. This header can occur in RECORD, SET-PARAMS or GET-PARAMS. A value of zero means infinity and allows the recording to continue until one or more of the other stop conditions is met.
Media-Type	MIME content type	audio/x-wav	MIME content type in which to store the captured audio or video.
New-Audio-Channel	TRUE, FALSE	n/a	Tells the server that, from this point on, further input audio comes from a different audio source, channel, or speaker. It is specified in a RECORD request. If multiple resources are sharing a media pipe and are collecting or using this data, and the client issues this header to one of the resources, the reset operation applies to all resources that use the shared media stream. This helps in a number of scenarios, including where the client wishes to reuse an open recognition session with an existing media session for multiple telephone calls.
No-Input-Timeout	0–MAXINT	Implementation specific	Time (in milliseconds) to wait for speech before giving up. If no speech is detected within this period, the recorder issues a RECORD-COMPLETE event with a Completion-Cause of "002 no-input-timeout" and terminates the recording operation. This header can occur in RECORD, SET-PARAMS, or GET-PARAMS. The value ranges from 0 to an application-specific maximum.
Record-URI	Alphanumeric string plus "@" ":" "/" "\"	n/a	When a recorder method contains this header, the server must capture the audio and store it. If the header is empty in the RECORD request, the server must store the content locally and return a URI that points to it in the STOP response or the RECORD-COMPLETE event. If the header in the RECORD request specifies a URI, the server must attempt to capture and store the audio at that location. If this header is not specified in the RECORD request, the server must capture the audio and send it in the STOP response or the RECORD-COMPLETE event as a message body. In this case, the response carrying the audio content contains this header with a cid value pointing to the Content-ID in the message body. The server must also return the size in bytes and the duration in milliseconds of the recorded audio waveform as parameters associated with the header; for example: `Record-URI:<file://mediaserver/recordings/myfile.wav>;size=325325;duration=24652`
Sensitivity-Level	0.0–1.0	Implementation specific	Sensitivity level for speech detection in the recorder. A higher value means higher sensitivity. The recorder may support a variable level of sound sensitivity to filter out background noise and not mistake it for speech. This header can occur in RECORD, SET-PARAMS, or GET-PARAMS.
Start-Input-Timers	TRUE, FALSE	TRUE	Specifies whether the recorder starts the no-input timers when the RECORD request begins. This is useful when the recorder and synthesizer resources are not part of the same session. When a kill-on-barge-in prompt plays, the client may want the RECORD request to be active at the same time, so that it can detect and implement kill-on-barge-in. However, the client does not want the recorder resource to start the no-input timers until the prompt is finished. The Start-Input-Timers header can be sent as part of the RECORD request. TRUE = Start the recorder and the no-input timers immediately. FALSE = Start the recorder, but do not start the no-input timers until the client sends a START-INPUT-TIMERS method.
Trim-Length	0–MAXINT	0	Length of audio to be trimmed from the end of the recording after the STOP (in milliseconds).
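
The Record-URI value returned by the server carries the URI in angle brackets followed by `size` and `duration` parameters. A small parser for that shape might look like the sketch below. The names `RecordUri` and `parse_record_uri` are hypothetical helpers, and the sketch assumes the parameters are separated by semicolons as in the example in the table.

```python
import re
from typing import NamedTuple, Optional


class RecordUri(NamedTuple):
    uri: str
    size_bytes: Optional[int]   # size of the recorded audio in bytes, if reported
    duration_ms: Optional[int]  # duration of the recorded audio in milliseconds, if reported


def parse_record_uri(value: str) -> RecordUri:
    """Parse a Record-URI header value such as '<uri>;size=N;duration=M'."""
    match = re.match(r"\s*<([^>]*)>(.*)$", value)
    if match is None:
        raise ValueError(f"unexpected Record-URI value: {value!r}")
    uri, rest = match.groups()
    params = {}
    for part in rest.split(";"):
        key, sep, val = part.partition("=")
        if sep:  # skip empty segments that have no key=value pair
            params[key.strip().lower()] = val.strip()
    size = params.get("size")
    duration = params.get("duration")
    return RecordUri(
        uri,
        int(size) if size is not None else None,
        int(duration) if duration is not None else None,
    )


info = parse_record_uri("<file://mediaserver/recordings/myfile.wav>;size=325325;duration=24652")
```

Returning `None` for a missing parameter keeps the parser usable for RECORD requests, where the header carries only the target URI.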

Recorder Completion-Causes

The Completion-Cause header must be part of every RECORD-COMPLETE event sent from the recorder resource to the client. It indicates why the RECORD method completed. The header must also be sent in RECORD responses that return with a failure status and a COMPLETE state.

Cause-code	Cause-name	Description
000	success-silence	RECORD completed with a silence at the end.
001	success-maxtime	RECORD completed after reaching the maximum recording time specified in the RECORD method.
002	no-input-timeout	RECORD failed due to no input.
003	uri-failure	Failure accessing the record URI.
004	error	RECORD request terminated prematurely due to a recorder error.
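
A client can translate the Completion-Cause value into a simple success or failure decision. The sketch below mirrors the table above. The helper names are hypothetical, and it assumes the header value consists of the three-digit code optionally followed by the cause name (for example, `000 success-silence`).

```python
# Cause codes and names copied from the Recorder Completion-Causes table.
RECORDER_COMPLETION_CAUSES = {
    0: "success-silence",
    1: "success-maxtime",
    2: "no-input-timeout",
    3: "uri-failure",
    4: "error",
}


def parse_completion_cause(value: str) -> tuple[int, str]:
    """Return (code, name) for a value such as '001 success-maxtime'."""
    fields = value.split()
    if not fields:
        raise ValueError("empty Completion-Cause value")
    code = int(fields[0])
    if code not in RECORDER_COMPLETION_CAUSES:
        raise ValueError(f"unknown recorder completion cause: {value!r}")
    return code, RECORDER_COMPLETION_CAUSES[code]


def recording_succeeded(value: str) -> bool:
    """True for the two success causes (000 and 001), False otherwise."""
    code, _ = parse_completion_cause(value)
    return code in (0, 1)
```

Keying on the numeric code rather than the name means the check does not depend on how the name is spelled or capitalized in the message.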
