Configuring MRCP clients

This topic describes the configuration of resources on the Speech Server using Management Station. The purpose of this configuration is to match Nuance Speech Server settings to the expectations of the MRCP client.

Note: Speech Server supports both MRCPv2 and MRCPv1; however, Nuance recommends using MRCPv2. For example, Dragon Voice does not support MRCPv1.

Although Speech Server generally works well with the installation defaults, your speech browser (and its MRCP client) might require non-default values. System administrators must review the defaults to determine appropriate values and, if necessary, modify the settings.

In general, these parameters are not modified for individual applications. Instead, application developers use a session.xml file. For details, see Configuring application sessions.

Configuring network security

Speech Server supports enhanced security for communication with MRCPv2 clients. Use Transport Layer Security (TLS) to encrypt SIP and MRCP requests and responses between the MRCP client and Speech Server so that requests and responses remain invisible to outside observers.

Speech Server supports TLS versions 1.0, 1.1, and 1.2.

Note: You can also use Secure Real-time Transport Protocol (SRTP) to encrypt the audio channel. See Setting audio channel security.

Setting up TLS

These parameters configure TLS for Speech Server:

Parameter	Description	Value
server.mrcp2.sip.transport.tls.port	Specifies the SIP TLS port to use for the application server.	An available port number. DEFAULT: 5061
server.mrcp2.transport.tls.port	Specifies the MRCPv2 TLS port to use for the application server.	An available port number. DEFAULT: 6076
server.callLog.tls.port	Specifies the listening TLS port for clients to communicate with the call log server.	Integer. An available port number for TLS. DEFAULT: 10102
server.tls.tlsVersion	Specifies the allowed versions of the TLS protocol.	Integer. One of the following: 0 (TLS v1.0 only) 1 (TLS v1.1 and above) 2 (TLS v1.2 and above) DEFAULT: 0
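
The server.tls.tlsVersion values can be read as a mapping from the setting to the protocol versions that remain allowed. The following Python sketch illustrates that reading. The function name and the version labels are hypothetical and are not part of Speech Server; the list of versions assumes only the TLS 1.0 to 1.2 range stated earlier in this topic.

```python
# Illustrative only: maps a server.tls.tlsVersion value to the TLS versions
# it allows, following the value descriptions in the table above.
SUPPORTED_VERSIONS = ["TLSv1.0", "TLSv1.1", "TLSv1.2"]

def accepted_tls_versions(tls_version_setting):
    """Return the TLS versions allowed for a given tlsVersion value."""
    if tls_version_setting == 0:
        return ["TLSv1.0"]                # TLS v1.0 only (the default)
    if tls_version_setting == 1:
        return SUPPORTED_VERSIONS[1:]     # TLS v1.1 and above
    if tls_version_setting == 2:
        return SUPPORTED_VERSIONS[2:]     # TLS v1.2 and above
    raise ValueError("tlsVersion must be 0, 1, or 2")

print(accepted_tls_versions(2))  # ['TLSv1.2']
```

A client that negotiates only TLS 1.2 would therefore be unable to connect while the default value of 0 is in effect, if the "TLS v1.0 only" description is taken literally.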

Setting up peer authentication

You can enable two-way peer authentication with web servers with these Speech Server parameters:

Parameter	Description	Value
server.tls.caCertificatesFile	Specifies the file containing one or more sequential PEM-encoded public CA certificates.	String DEFAULT: (none)
server.tls.certificateFile	Specifies the PEM-encoded certificate for Speech Server.	String DEFAULT: (none)
server.tls.privateKeyFile	Specifies the PEM-encoded private key for Speech Server.	String DEFAULT: (none)
server.inet.tls.verify	Enables peer authentication.	Boolean DEFAULT: 0
server.tls.verifyDepth	Limits the depth of the certificate chain for validation.	Integer DEFAULT: 2 (accommodates one intermediate CA)
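
Because server.tls.caCertificatesFile can hold several PEM-encoded certificates in sequence, it can be useful to confirm how many certificates a bundle contains before pointing Speech Server at it. The following Python sketch counts PEM certificate blocks in a string. The helper name and the sample bundle are hypothetical, and the placeholder bodies are not real certificates.

```python
import re

# Matches one PEM certificate block, including its BEGIN and END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)

def count_pem_certificates(pem_text):
    """Count the PEM certificate blocks found in the given text."""
    return len(PEM_CERT.findall(pem_text))

# Placeholder bundle with two sequential blocks (bodies are not real data).
sample_bundle = (
    "-----BEGIN CERTIFICATE-----\nplaceholder-1\n-----END CERTIFICATE-----\n"
    "-----BEGIN CERTIFICATE-----\nplaceholder-2\n-----END CERTIFICATE-----\n"
)
print(count_pem_certificates(sample_bundle))  # 2
```

This checks only the block markers. It does not validate the certificates or their chain.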

Setting up strongest security

Set these Speech Server parameters to require the highest level of security:

Parameter	Description	Value
server.mrcp2.sip.transport.tls.useStrongestCipherSuite	Specifies the encryption ciphers on the TLS port. Requires use of only the strongest TLS ciphers.	Integer. Must be 0 (accept weak ciphers), 1 (accept strong ciphers only), or 2 (accept cipher suites based on the DH key exchange method with GCM mode only). DEFAULT: 0 (accept weak ciphers)
server.mrcp2.transport.tls.useStrongestCipherSuite	Specifies the encryption ciphers on the TLS port. Requires use of only the strongest TLS ciphers.	Integer. Must be 0 (accept weak ciphers), 1 (accept strong ciphers only), or 2 (accept cipher suites based on the DH key exchange method with GCM mode only). DEFAULT: 0 (accept weak ciphers)
server.callLog.useStrongestCipherSuite	Specifies the encryption ciphers on the TLS port. Requires use of only the strongest TLS ciphers for communication with the call log server.	Integer. Must be 0 (accept weak ciphers), 1 (accept strong ciphers only), or 2 (accept cipher suites based on the DH key exchange method with GCM mode only). DEFAULT: 0 (accept weak ciphers)
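
Since all three parameters default to 0, a deployment that requires strong ciphers must change each one. The following Python sketch reports which of them still accept weak ciphers. It assumes the settings have already been collected into a dictionary; how they are read from Management Station is outside the scope of this sketch, and the function name is hypothetical.

```python
# The three cipher-suite parameters listed in the table above.
CIPHER_PARAMS = (
    "server.mrcp2.sip.transport.tls.useStrongestCipherSuite",
    "server.mrcp2.transport.tls.useStrongestCipherSuite",
    "server.callLog.useStrongestCipherSuite",
)

def params_accepting_weak_ciphers(settings):
    """Return the cipher parameters whose value is 0 (accept weak ciphers).

    Parameters missing from `settings` fall back to the documented default, 0.
    """
    return [name for name in CIPHER_PARAMS if int(settings.get(name, 0)) == 0]

# Example: only the SIP TLS port has been hardened.
example = {"server.mrcp2.sip.transport.tls.useStrongestCipherSuite": 1}
for name in params_accepting_weak_ciphers(example):
    print("still accepts weak ciphers:", name)
```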

Configuring recognition resources

This section describes the configuration of speech recognition resources on Speech Server in Management Station, including for selective barge-in and magic word.

Note: The Krypton recognition engine does not support hotword mode recognition, including selective barge-in and magic word.

MRCPv2 recognition parameters

Set these parameters if your voice browser uses MRCPv2. (If your browser uses MRCPv1, it ignores these parameters.) You can control several types of activity:

Parameter	Description	Value
Audio processing
server.mrcp2.audioengine.audioThreadNumber	Specifies the number of audio threads used to fetch audio and feed it to the recognizer and recorder.	Integer: 1–INT_MAX. DEFAULT: 20
Endpointer usage
server.mrcp2.osrspeechrecog.endpointer	Controls use of the endpointer.	Boolean DEFAULT: 1 (external endpointer)
Enable/disable use of cookies with Internet fetches
server.mrcp2.osrspeechrecog.cookie.enable	Enables the use of cookies for retrieving files.	Boolean DEFAULT: 0
Responses to various recognitions
server.mrcp2.osrspeechrecog.internalBargein	Sends a BARGE-IN-OCCURRED event directly from the recognizer to Vocalizer to quickly stop a prompt.	Boolean DEFAULT: 0 (disabled)
server.mrcp2.osrspeechrecog.startOfInputOnDTMF	Sends a START-OF-INPUT event on DTMF input.	Boolean DEFAULT: 0 (disabled)
server.mrcp2.osrspeechrecog.startOfInputOnHotword	Sends a START-OF-INPUT event each time new candidate speech is detected in a hotword mode recognition.	Boolean DEFAULT: 0 (disabled)
server.mrcp2.osrspeechrecog.hotwordSuppression	Prevents a hotword from being included in recordings.	Integer. 0 (hotword is not suppressed) or 1 (hotword is suppressed) DEFAULT: 0 (hotword is not suppressed)
Result formats
server.mrcp2.osrspeechrecog.mrcpdefaults.VSP.server.osrspeechrecog.result.mediatype	Specifies the media type of the recognition result returned to the application.	MIME media type supported. DEFAULT: application/x-vnd.speechworks.emma+xml;strictconfidencelevel=1;mrcpv=2.06
server.mrcp2.osrspeechrecog.mrcpdefaults.VSP.server.osrspeechrecog.result.sendnomatch	Returns the recognition result to the MRCP client even when the confidence is low.	Boolean DEFAULT: false (does not return the result of low confidence recognitions)

MRCPv1 recognition parameters

Set these parameters if your voice browser uses MRCPv1. (If your browser uses MRCPv2, it ignores these parameters.) You can control several types of activity:

Parameter	Description	Value
Audio processing
server.mrcp1.osrspeechrecog.audioBufferSize	Specifies the size of the audio buffer.	The range of values depends on the audio type and the sampling rate. Assuming a sampling rate of 8 kHz: ulaw, alaw: 0–500 milliseconds. L16: 0–250 milliseconds. The maximum buffer size is equivalent to 4000 bytes. DEFAULT: 100 (milliseconds)
server.mrcp1.osrspeechrecog.audioThreadNumber	Specifies the number of threads used to fetch audio and feed it to the recognizer.	Integer: 1–INT_MAX. DEFAULT: 10
Endpointer usage
server.mrcp1.osrspeechrecog.endpointer	Controls use of the endpointer.	Boolean DEFAULT: 1 (external endpointer)
Enable/disable use of cookies with Internet fetches
server.mrcp1.osrspeechrecog.cookie.enable	Enables the use of cookies for retrieving files.	Boolean DEFAULT: 0 (no cookies)
Responses to various recognitions
server.mrcp1.osrspeechrecog.internalBargein	Sends a BARGE-IN-OCCURRED event directly from the Recognizer to Vocalizer to quickly stop a prompt.	Boolean DEFAULT: 0 (disabled)
server.mrcp1.osrspeechrecog.startOfSpeechOnDTMF	Sends a START-OF-SPEECH event on DTMF input.	Boolean DEFAULT: 1 (enabled)
server.mrcp1.osrspeechrecog.startOfSpeechOnHotword	Sends a START-OF-SPEECH event each time new candidate speech is detected in a hotword mode recognition.	Boolean DEFAULT: 0 (disabled)
Result formats
server.mrcp1.osrspeechrecog.nlsml.encoding	Inserts an XML header before the NLSML results.	Character encoding type DEFAULT: ISO-8859-1
server.mrcp1.osrspeechrecog.result.mediatype	Specifies the media type of the recognition result returned to the application.	Media type DEFAULT: application/x-vnd.speechworks.emma+xml;strictconfidencelevel=1
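
The millisecond limits given for server.mrcp1.osrspeechrecog.audioBufferSize follow from the 4000-byte maximum. The following Python sketch reproduces that arithmetic, assuming one byte per sample for ulaw and alaw and two bytes per sample for L16. The function and constant names are hypothetical.

```python
MAX_BUFFER_BYTES = 4000  # maximum buffer size stated in the table above

# Bytes per audio sample for each audio type (8-bit companded vs. 16-bit linear).
BYTES_PER_SAMPLE = {"ulaw": 1, "alaw": 1, "L16": 2}

def max_buffer_msec(audio_type, sample_rate_hz=8000):
    """Return the largest buffer duration (ms) that fits in 4000 bytes."""
    bytes_per_msec = BYTES_PER_SAMPLE[audio_type] * sample_rate_hz / 1000
    return int(MAX_BUFFER_BYTES / bytes_per_msec)

print(max_buffer_msec("ulaw"))  # 500
print(max_buffer_msec("L16"))   # 250
```

At a higher sampling rate the same byte limit yields a shorter maximum duration, which is why the table qualifies its ranges with the 8 kHz assumption.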

Selective barge-in and magic word (MRCPv1 & MRCPv2)

An application can use the hotword mode to support two Nuance-specific barge-in modes: selective barge-in and magic word. These modes enable the application to recognize a specific speech or DTMF sequence and ignore anything else.

Selective barge-in prevents accidental interruption by allowing applications to define a small set of key words (to be spoken by callers) that trigger barge-in. An application that supports selective barge-in always listens for commands, whether the caller is speaking or listening to prompts.
Magic word is identical to selective barge-in except that it also rejects candidates that are too short or long.

The application must coordinate the endpointer and the recognizer, using the parameters swiep_mode and swirec_barge_in_mode. Both resources must be set to compatible modes for any given recognition:

swiep_mode	swirec_barge_in_mode
begin_only (default)	normal
magic_word	magic_word
selective_barge_in	selective_barge_in
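
The pairing rule in the table above can be expressed as a simple lookup. The following Python sketch checks whether a given combination of swiep_mode and swirec_barge_in_mode is one of the compatible pairs. The function name is hypothetical; only the mode values come from the table.

```python
# Compatible (swiep_mode -> swirec_barge_in_mode) pairs from the table above.
COMPATIBLE_MODES = {
    "begin_only": "normal",
    "magic_word": "magic_word",
    "selective_barge_in": "selective_barge_in",
}

def modes_compatible(swiep_mode, swirec_barge_in_mode):
    """Return True if the endpointer and recognizer modes form a listed pair."""
    return (
        swiep_mode in COMPATIBLE_MODES
        and COMPATIBLE_MODES[swiep_mode] == swirec_barge_in_mode
    )

print(modes_compatible("magic_word", "magic_word"))  # True
print(modes_compatible("magic_word", "normal"))      # False
```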

You can also use the following parameters to control barge-in modes:

Parameter	Description	Value
swiep_magic_word_max_msec	Specifies the maximum duration of a magic word candidate for recognition.	Integer: milliseconds. Minimum is 0; there is no maximum. DEFAULT: 800 (milliseconds)
swiep_magic_word_min_msec	Specifies the minimum duration of a magic word candidate for recognition.	Integer: 0–swiep_magic_word_max_msec milliseconds. DEFAULT: 200 (milliseconds)
swirec_magic_word_conf_thresh	Specifies the confidence threshold for recognition results computed while the magic_word mode is active.	Integer: 0–999. DEFAULT: 500
swirec_selective_barge_in_conf_thresh	Specifies the confidence threshold for recognition results computed while the selective_barge_in mode is active.	Integer: 0–999. DEFAULT: 500
incompletetimeout	Specifies the duration of silence to determine that callers have finished speaking.	Integer: 0–INT_MAX (milliseconds). A value of 0 disables the timer (a zero-length silence period). DEFAULT: 1500 (1.5 seconds)
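
The two duration parameters define a window for magic word candidates. The following Python sketch shows how a candidate's duration might be compared against that window using the documented defaults. It assumes the bounds are inclusive, which this topic does not state, and the function name is hypothetical.

```python
def magic_word_duration_ok(duration_msec, min_msec=200, max_msec=800):
    """Return True if a candidate's duration falls inside the magic word window.

    Defaults mirror swiep_magic_word_min_msec and swiep_magic_word_max_msec.
    Inclusive bounds are an assumption made for this sketch.
    """
    if not 0 <= min_msec <= max_msec:
        raise ValueError("min_msec must be between 0 and max_msec")
    return min_msec <= duration_msec <= max_msec

print(magic_word_duration_ok(150))  # False: too short
print(magic_word_duration_ok(500))  # True
print(magic_word_duration_ok(900))  # False: too long
```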

Recognition result formats

You can control the media type and encoding of recognition results using the following parameters:

Parameter	Description	Value
server.mrcp2.osrspeechrecog.mrcpdefaults.VSP.server.osrspeechrecog.result.mediatype	Specifies the media type of the recognition result returned to the application.	MIME media type supported. DEFAULT: application/x-vnd.speechworks.emma+xml;strictconfidencelevel=1;mrcpv=2.06
server.mrcp1.osrspeechrecog.nlsml.encoding	Inserts an XML header before the NLSML results.	Character encoding type DEFAULT: ISO-8859-1
server.mrcp1.osrspeechrecog.result.mediatype	Specifies the media type of the recognition result returned to the application.	Media type DEFAULT: application/x-vnd.speechworks.emma+xml;strictconfidencelevel=1
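
The default media type values carry semicolon-separated parameters after the base type. The following Python sketch splits such a value into its base type and parameters, which can help when comparing a configured value with what the MRCP client expects. The function name is hypothetical, and the sketch does not interpret the meaning of the individual parameters.

```python
def parse_media_type(value):
    """Split a media type string into (base_type, {parameter: value})."""
    base_type, *params = [part.strip() for part in value.split(";")]
    # partition() tolerates a parameter without "=" (its value becomes "").
    parsed = {key: val for key, _, val in (p.partition("=") for p in params)}
    return base_type, parsed

mrcp2_default = (
    "application/x-vnd.speechworks.emma+xml;strictconfidencelevel=1;mrcpv=2.06"
)
base, params = parse_media_type(mrcp2_default)
print(base)    # application/x-vnd.speechworks.emma+xml
print(params)  # {'strictconfidencelevel': '1', 'mrcpv': '2.06'}
```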

Configuring text-to-speech resources

This section describes the configuration of text-to-speech resources on Speech Server using Management Station.

MRCPv2 text-to-speech parameters

Set these parameters if your voice browser uses MRCPv2. (If your browser uses MRCPv1, it ignores these parameters.) You can control several types of activity:

Parameter	Description	Value
Audio processing
server.mrcp2.rsspeechsynth.audioThreadNumber	Specifies the number of sending threads.	Integer DEFAULT: 20
server.mrcp2.rsspeechsynth.errorOnNoAudio	Determines whether it is an error when Vocalizer does not generate audio.	Boolean DEFAULT: 0
Speed of RTP stream
server.mrcp2.rsspeechsynth.rtpSendRate	Specifies the number of audio samples sent per second.	Integer: 1–INT_MAX samples. DEFAULT: 8000
server.mrcp2.rsspeechsynth.rtpBufferFillMultiplier	Adjusts the RTP sending speed by filling packets more quickly.	Integer: 1–10. DEFAULT: 2
server.mrcp2.rsspeechsynth.rtpLowerBoundarySamples	Adjusts the RTP sending speed by setting the minimum number of samples that can be sent ahead.	Integer: 0–rtpUpperBoundarySamples samples. DEFAULT: 300
server.mrcp2.rsspeechsynth.rtpUpperBoundarySamples	Adjusts the RTP sending speed by setting the maximum number of samples that can be sent ahead.	Integer: rtpLowerBoundarySamples–INT_MAX samples. DEFAULT: 600
Enable/disable use of cookies with Internet fetches
server.mrcp2.rsspeechsynth.cookie.enable	Enables the use of cookies for retrieving files.	Boolean DEFAULT: 0
Vocalizer input/output
server.mrcp2.rsspeechsynth.plainTextSSMLEncoding	Specifies the default encoding for text/plain MRCP messages.	ISO-8859-1, UTF-8, UTF-16 DEFAULT: ISO-8859-1
server.mrcp2.rsspeechsynth.playsilence	Enables writing silence before and after a prompt.	Boolean DEFAULT: 0 (no writing silence)
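
The boundary parameters are expressed in samples, so their effect in time depends on rtpSendRate. The following Python sketch converts the lower and upper boundaries to milliseconds and checks that they are ordered as the value ranges require. Reading the result as lead time assumes that samples are consumed at rtpSendRate samples per second; the function names are hypothetical.

```python
def samples_to_msec(samples, rtp_send_rate=8000):
    """Convert a sample count to milliseconds at the given send rate."""
    return samples * 1000 / rtp_send_rate

def rtp_boundaries_msec(lower=300, upper=600, rtp_send_rate=8000):
    """Return (lower, upper) send-ahead boundaries in milliseconds.

    Defaults mirror rtpLowerBoundarySamples, rtpUpperBoundarySamples,
    and rtpSendRate from the table above.
    """
    if not 0 <= lower <= upper:
        raise ValueError("lower boundary must be between 0 and upper boundary")
    return (
        samples_to_msec(lower, rtp_send_rate),
        samples_to_msec(upper, rtp_send_rate),
    )

print(rtp_boundaries_msec())  # (37.5, 75.0)
```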

MRCPv1 text-to-speech parameters

Set these parameters if your voice browser uses MRCPv1. (If your browser uses MRCPv2, it ignores them.) You can control several types of activity:

Parameter	Description	Value
Configuring Vocalizer
server.mrcp1.rsspeechsynth.initialNumber	Specifies the number of synthesizer plug-in instances created during system initialization.	Integer: 0–number of available TTS licenses. DEFAULT: 0
Audio processing
server.mrcp1.rsspeechsynth.audioThreadNumber	Specifies the number of sending threads.	Integer DEFAULT: 8
Speed of RTP stream
server.mrcp1.rsspeechsynth.rtpSendRate	Specifies the number of audio samples sent per second.	Integer: 1–INT_MAX samples. DEFAULT: 8000
server.mrcp1.rsspeechsynth.rtpPacketSamples	Specifies the size of the RTP packet in samples.	Integer: 1–1000 samples. Typical settings are 160 or 240. DEFAULT: 160
server.mrcp1.rsspeechsynth.rtpBufferFillMultiplier	Adjusts the RTP sending speed by filling packets more quickly.	Integer: 1–10. DEFAULT: 2
server.mrcp1.rsspeechsynth.rtpLowerBoundarySamples	Adjusts the RTP sending speed by setting the minimum number of samples that can be sent ahead.	Integer: 0–rtpUpperBoundarySamples. DEFAULT: 300
server.mrcp1.rsspeechsynth.rtpUpperBoundarySamples	Adjusts the RTP sending speed by setting the maximum number of samples that can be sent ahead.	Integer: rtpLowerBoundarySamples–INT_MAX samples. DEFAULT: 600
Enable/disable use of cookies with Internet fetches
server.mrcp1.rsspeechsynth.cookie.enable	Enables the use of cookies.	Boolean DEFAULT: 0
Vocalizer input/output
server.mrcp1.rsspeechsynth.plainTextSSMLEncoding	Specifies the default encoding for text/plain MRCP messages.	ISO-8859-1, UTF-8, UTF-16 DEFAULT: ISO-8859-1
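
The typical rtpPacketSamples settings of 160 and 240 correspond to common packet durations at the default send rate. The following Python sketch shows the conversion, assuming the packet duration is simply the sample count divided by rtpSendRate. The function name is hypothetical.

```python
def rtp_packet_msec(packet_samples=160, rtp_send_rate=8000):
    """Return the duration of one RTP packet in milliseconds.

    Defaults mirror rtpPacketSamples and rtpSendRate from the table above.
    """
    if not 1 <= packet_samples <= 1000:
        raise ValueError("rtpPacketSamples must be between 1 and 1000")
    return packet_samples * 1000 / rtp_send_rate

print(rtp_packet_msec(160))  # 20.0
print(rtp_packet_msec(240))  # 30.0
```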
