Configuring MRCP clients
This topic describes the configuration of resources on the Speech Server using Management Station. The purpose of this configuration is to match Nuance Speech Server settings to the expectations of the MRCP client.
Note: Speech Server supports both MRCPv2 and MRCPv1; however, Nuance recommends using MRCPv2. For example, Dragon Voice does not support MRCPv1.
Although Speech Server generally works well with the installation defaults, your speech browser (and its MRCP client) might require non-default values. System administrators must review the defaults to determine appropriate values and modify the settings if necessary.
In general, these parameters are not modified for individual applications. Instead, application developers use a session.xml file. For details, see Configuring application sessions.
Configuring network security
Speech Server supports enhanced security for communication with MRCPv2 clients. Use Transport Layer Security (TLS) to encrypt SIP and MRCP requests and responses between the MRCP client and Speech Server so that requests and responses remain invisible to outside observers.
Speech Server supports TLS versions 1.0, 1.1, and 1.2.
Note: You can also use Secure Real-time Transport Protocol (SRTP) to encrypt the audio channel. See Setting audio channel security.
These parameters configure TLS for Speech Server:
Parameter | Description | Value |
---|---|---|
| Specifies the SIP TLS port to use for the application server. | An available port number. DEFAULT: 5061 |
| Specifies the MRCPv2 TLS port to use for the application server. | An available port number. DEFAULT: 6076 |
server.callLog.tls.port | Specifies the listening TLS port for clients to communicate with the call log server. | Integer. An available port number for TLS. DEFAULT: 10102 |
server.tls.tlsVersion | Specifies the allowed versions of the TLS protocol. | Integer. One of the following: DEFAULT: 0 |
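To check from the client side that a TLS port accepts a given protocol version, a short handshake probe can help. The following is a minimal sketch, not a Nuance tool: it assumes the default SIP TLS port 5061 from the table above, the host name passed to `negotiated_version` is whatever your deployment uses, and the server certificate must be trusted by the machine running the probe.

```python
import socket
import ssl

SIP_TLS_PORT = 5061  # default SIP TLS port listed in the table above


def make_client_context(version=ssl.TLSVersion.TLSv1_2):
    """Build a client context pinned to a single TLS protocol version."""
    context = ssl.create_default_context()
    context.minimum_version = version
    context.maximum_version = version
    return context


def negotiated_version(host, port=SIP_TLS_PORT, timeout=5.0):
    """Complete a TLS handshake and return the protocol name the peer agreed to."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

The probe only performs the handshake; it does not send SIP or MRCP messages.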
You can enable two-way peer authentication with web servers with these Speech Server parameters:
Parameter | Description | Value |
---|---|---|
| Specifies the file containing one or more sequential PEM-encoded public CA certificates. | String DEFAULT: (none) |
| Specifies the PEM-encoded certificate for Speech Server. | String DEFAULT: (none) |
| Specifies the PEM-encoded private key for Speech Server. | String DEFAULT: (none) |
| Enables peer authentication. | Boolean DEFAULT: 0 |
| Limits the depth of the certificate chain for validation. | Integer DEFAULT: 2 (accommodates one intermediate CA) |
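Because the CA file may hold several certificates back to back, it can be useful to confirm how many PEM blocks a bundle actually contains before pointing Speech Server at it. This is a small sketch under the assumption that the bundle uses the standard `BEGIN CERTIFICATE` / `END CERTIFICATE` armor; the function name is illustrative only.

```python
import re

# One PEM certificate block: armor lines with base64 text in between.
PEM_CERTIFICATE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def count_pem_certificates(bundle_text):
    """Return the number of PEM-encoded certificates found in the text."""
    return len(PEM_CERTIFICATE.findall(bundle_text))
```

For a file on disk, read it as text first, for example `count_pem_certificates(open(path).read())`.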
Set these Speech Server parameters to require the highest level of security:
Parameter | Description | Value |
---|---|---|
| Specifies the encryption ciphers on the TLS port. Requires use of only the strongest TLS ciphers. | Integer. Must be 0 (accept weak ciphers), 1 (accept strong ciphers only), or 2 (accept cipher suites based on the DH key exchange method with GCM mode only). DEFAULT: 0 (accept weak ciphers) |
| Specifies the encryption ciphers on the TLS port. Requires use of only the strongest TLS ciphers. | Integer. Must be 0 (accept weak ciphers), 1 (accept strong ciphers only), or 2 (accept cipher suites based on the DH key exchange method with GCM mode only). DEFAULT: 0 (accept weak ciphers) |
server.callLog.useStrongestCipherSuite | Specifies the encryption ciphers on the TLS port. Requires use of only the strongest TLS ciphers for communication with the call log server. | Integer. Must be 0 (accept weak ciphers), 1 (accept strong ciphers only), or 2 (accept cipher suites based on the DH key exchange method with GCM mode only). DEFAULT: 0 (accept weak ciphers) |
Configuring recognition resources
This section describes the configuration of speech recognition resources on Speech Server in Management Station, including the settings for selective barge-in and magic word.
Note: The Krypton recognition engine does not support hotword mode recognition, including selective barge-in and magic word.
Set these parameters if your voice browser uses MRCPv2. (If your browser uses MRCPv1, it ignores these parameters.) You can control several types of activity:
Parameter | Description | Value |
---|---|---|
Audio processing |
||
Specifies the number of audio threads used to fetch audio and feed it to the recognizer and recorder. |
Integer: 1–INT_MAX. DEFAULT: 20 |
|
Endpointer usage |
||
Controls use of the endpointer. |
Boolean DEFAULT: 1 (external endpointer) |
|
Enable/disable use of cookies with Internet fetches |
||
Enables the use of cookies for retrieving files. |
Boolean DEFAULT: 0 |
|
Responses to various recognitions |
||
Sends a BARGE-IN-OCCURRED event directly from the recognizer to Vocalizer to quickly stop a prompt. |
Boolean DEFAULT: 0 (disabled) |
|
Sends a START-OF-INPUT event on DTMF input. |
Boolean DEFAULT: 0 (disabled) |
|
Sends a START-OF-INPUT event each time new candidate speech is detected in a hotword mode recognition. |
Boolean DEFAULT: 0 (disabled) |
|
server.mrcp2.osrspeechrecog.hotwordSuppression |
Prevents a hotword from being included in recordings. |
Integer. 0 (hotword is not suppressed) or 1 (hotword is suppressed) DEFAULT: 0 (hotword is not suppressed) |
Result formats |
||
server.mrcp2.osrspeechrecog.mrcpdefaults.VSP.server. |
Specifies the media type of the recognition result returned to the application. |
MIME media type supported. DEFAULT: application/x-vnd.speechworks.emma+xml;strictconfidencelevel=1;mrcpv=2.06 |
server.mrcp2.osrspeechrecog.mrcpdefaults.VSP.server. |
Returns the recognition result to the MRCP client even when the confidence is low. |
Boolean DEFAULT: false (does not return the result of low confidence recognitions) |
Set these parameters if your voice browser uses MRCPv1. (If your browser uses MRCPv2, it ignores these parameters.) You can control several types of activity:
Parameter | Description | Value |
---|---|---|
Audio processing |
||
Specifies the size of the audio buffer. |
The range of values depends on the audio type and the sampling rate. Assuming a sampling rate of 8kHz: ulaw, alaw: 0–500 milliseconds. L16: 0–250 milliseconds. The maximum buffer size is equivalent to 4000 bytes. DEFAULT: 100 (milliseconds) |
|
Specifies the number of threads used to fetch audio and feed it to the recognizer. |
Integer: 1–INT_MAX. DEFAULT: 10 |
|
Endpointer usage |
||
Controls use of the endpointer. |
Boolean DEFAULT: 1 (external endpointer) |
|
Enable/disable use of cookies with Internet fetches |
||
Enables the use of cookies for retrieving files. |
Boolean DEFAULT: 0 (no cookies) |
|
Responses to various recognitions |
||
Sends a BARGE-IN-OCCURRED event directly from the Recognizer to Vocalizer to quickly stop a prompt. |
Boolean DEFAULT: 0 (disabled) |
|
Send a START-OF-SPEECH event on DTMF input. |
Boolean DEFAULT: 1 (enabled) |
|
Send a START-OF-SPEECH event each time new candidate speech is detected in a hotword mode recognition. |
Boolean DEFAULT: 0 (disabled) |
|
Result formats |
||
Inserts an XML header before the NLSML results. |
Character encoding type DEFAULT: ISO-8859-1 |
|
Specifies the media type of the recognition result returned to the application. |
Media type DEFAULT: application/x-vnd.speechworks.emma+xml;strictconfidencelevel=1 |
|
Send a START-OF-SPEECH event on DTMF input. |
Boolean DEFAULT: 1 (enabled) |
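The audio buffer ranges in the MRCPv1 table above follow from the 4000-byte maximum: at 8 kHz, ulaw and alaw use one byte per sample and L16 uses two. The arithmetic can be sketched as follows; the function and constant names are illustrative and are not Speech Server parameters.

```python
MAX_BUFFER_BYTES = 4000  # maximum buffer size stated in the table above

# Bytes per sample for each audio type.
BYTES_PER_SAMPLE = {"ulaw": 1, "alaw": 1, "L16": 2}


def max_buffer_ms(audio_type, sample_rate_hz=8000):
    """Longest buffer, in milliseconds, that fits in MAX_BUFFER_BYTES."""
    bytes_per_second = sample_rate_hz * BYTES_PER_SAMPLE[audio_type]
    return MAX_BUFFER_BYTES * 1000 // bytes_per_second


def buffer_bytes(audio_type, buffer_ms, sample_rate_hz=8000):
    """Size in bytes of a buffer holding buffer_ms of audio."""
    return sample_rate_hz * BYTES_PER_SAMPLE[audio_type] * buffer_ms // 1000
```

With these definitions, `max_buffer_ms("ulaw")` gives 500 and `max_buffer_ms("L16")` gives 250, matching the documented ranges, and the default of 100 milliseconds corresponds to 800 bytes of ulaw audio.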
An application can use the hotword mode to support two Nuance-specific barge-in modes: selective barge-in and magic word. These modes enable the application to recognize a specific speech or DTMF sequence and ignore anything else.
- Selective barge-in prevents accidental interruption by allowing applications to define a small set of key words (to be spoken by callers) that trigger barge-in. An application that supports selective barge-in always listens for commands, whether the caller is speaking or listening to prompts.
- Magic word is identical to selective barge-in except that it also rejects candidates that are too short or too long.
The application must coordinate the endpointer and the recognizer, using the parameters swiep_mode and swirec_barge_in_mode. Both resources must be set to compatible modes for any given recognition:
swiep_mode | swirec_barge_in_mode |
---|---|
begin_only (default) | normal |
magic_word | magic_word |
selective_barge_in | selective_barge_in |
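An application that sets both parameters per recognition can guard against mismatched modes with a simple lookup. The sketch below encodes only the pairs from the table above; the function name is hypothetical.

```python
# Compatible pairs taken from the table: swiep_mode -> swirec_barge_in_mode.
COMPATIBLE_MODES = {
    "begin_only": "normal",
    "magic_word": "magic_word",
    "selective_barge_in": "selective_barge_in",
}


def modes_compatible(swiep_mode, swirec_barge_in_mode):
    """Return True when the endpointer and recognizer modes form a listed pair."""
    return COMPATIBLE_MODES.get(swiep_mode) == swirec_barge_in_mode
```

For example, `modes_compatible("begin_only", "magic_word")` is False, so the application should correct one of the two settings before starting the recognition.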
You can also use the following parameters to control barge-in modes:
Parameter | Description | Value |
---|---|---|
| Specifies the maximum duration of a magic word candidate for recognition. | Integer: milliseconds. Minimum is 0; there is no maximum. DEFAULT: 800 (milliseconds) |
| Specifies the minimum duration of a magic word candidate for recognition. | Integer: 0–swiep_magic_word_max_msec milliseconds. DEFAULT: 200 (milliseconds) |
| Specifies the confidence threshold for recognition results computed while the magic_word mode is active. | Integer: 0–999. DEFAULT: 500 |
| Specifies the confidence threshold for recognition results computed while the selective_barge_in mode is active. | Integer: 0–999. DEFAULT: 500 |
| Specifies the duration of silence to determine that callers have finished speaking. | Integer: 0–INT_MAX (milliseconds). A value of 0 disables the timer (a zero-length silence period). DEFAULT: 1500 (1.5 seconds) |
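To see how the duration window and confidence threshold combine for magic word, consider this simplified model. It is an illustration only, not the engine's actual decision logic: the class and field names are hypothetical, and treating both duration bounds and the threshold as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MagicWordSettings:
    """Defaults copied from the table above; field names are illustrative."""
    min_msec: int = 200
    max_msec: int = 800
    confidence_threshold: int = 500  # on the documented 0-999 scale


def accepts_candidate(duration_msec, confidence, settings=MagicWordSettings()):
    """Accept a candidate only if its length and confidence both qualify."""
    within_window = settings.min_msec <= duration_msec <= settings.max_msec
    confident = confidence >= settings.confidence_threshold  # assumed inclusive
    return within_window and confident
```

Under this model, a 150 millisecond cough is rejected for being too short and a 2 second sentence for being too long, regardless of confidence.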
You can control the media type and encoding of recognition results using the following parameters:
Parameter | Description | Value |
---|---|---|
server.mrcp2.osrspeechrecog.mrcpdefaults.VSP.server. | Specifies the media type of the recognition result returned to the application. | MIME media type supported. DEFAULT: application/x-vnd.speechworks.emma+xml;strictconfidencelevel=1;mrcpv=2.06 |
| Inserts an XML header before the NLSML results. | Character encoding type DEFAULT: ISO-8859-1 |
| Specifies the media type of the recognition result returned to the application. | Media type DEFAULT: application/x-vnd.speechworks.emma+xml;strictconfidencelevel=1 |
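The default media type values carry parameters after the semicolons. A client that needs to inspect them can split the value as sketched below; the function name is illustrative, and the sketch assumes simple unquoted parameters like those in the defaults above.

```python
def parse_media_type(value):
    """Split a media type string into (type, {parameter: value})."""
    parts = [part.strip() for part in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing semicolon
        name, _, param_value = part.partition("=")
        params[name.strip().lower()] = param_value.strip()
    return media_type, params


DEFAULT_V2 = (
    "application/x-vnd.speechworks.emma+xml;strictconfidencelevel=1;mrcpv=2.06"
)
media_type, params = parse_media_type(DEFAULT_V2)
```

Here `media_type` is `application/x-vnd.speechworks.emma+xml` and `params` holds `strictconfidencelevel` and `mrcpv` as strings.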
Configuring text-to-speech resources
This section describes the configuration of text-to-speech resources on Speech Server using Management Station.
Set these parameters if your voice browser uses MRCPv2. (If your browser uses MRCPv1, it ignores these parameters.) You can control several types of activity:
Parameter | Description | Value |
---|---|---|
Audio processing |
||
Specifies the number of sending threads. |
Integer DEFAULT: 20 |
|
Determines whether it is an error when Vocalizer does not generate audio. |
Boolean DEFAULT: 0 |
|
Speed of RTP stream |
||
Specifies the number of audio samples sent per second. |
Integer: 1–INT_MAX samples. DEFAULT: 8000 |
|
Adjusts the RTP sending speed by filling packets more quickly. |
Integer: 1–10. DEFAULT: 2 |
|
Adjusts the RTP sending speed by setting the minimum number of samples that can be sent ahead. |
Integer: 0–rtpUpperBoundarySamples samples. DEFAULT: 300 |
|
Adjusts the RTP sending speed by setting the maximum number of samples that can be sent ahead. |
Integer: rtpLowerBoundarySamples–INT_MAX samples. DEFAULT: 600 |
|
Enable/disable use of cookies with Internet fetches |
||
Enables the use of cookies for retrieving files. |
Boolean DEFAULT: 0 |
|
Vocalizer input/output |
||
Specifies the default encoding for text/plain MRCP messages. |
ISO-8859-1, UTF-8, UTF-16 DEFAULT: ISO-8859-1 |
|
Enables writing silence before and after a prompt. |
Boolean DEFAULT: 0 (no writing silence) |
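The default encoding for text/plain messages matters as soon as the prompt text contains non-ASCII characters. The sketch below, which uses only standard Python codecs and an arbitrary sample string, shows that the same text yields different byte sequences and that decoding with the wrong encoding garbles it.

```python
SUPPORTED_ENCODINGS = ("ISO-8859-1", "UTF-8", "UTF-16")  # values listed above


def encoded_sizes(text):
    """Return the byte length of text under each supported encoding."""
    return {enc: len(text.encode(enc)) for enc in SUPPORTED_ENCODINGS}


def misread_as_latin1(text):
    """Show what UTF-8 text looks like when decoded as ISO-8859-1."""
    return text.encode("UTF-8").decode("ISO-8859-1")


sizes = encoded_sizes("café")
garbled = misread_as_latin1("café")
```

If the MRCP client sends UTF-8 text while the server default remains ISO-8859-1, accented characters are at risk of being read as in `garbled`, so the two sides should agree on one encoding.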
Set these parameters if your voice browser uses MRCPv1. (If your browser uses MRCPv2, it ignores them.) You can control several types of activity:
Parameter |
Description |
Value |
---|---|---|
Configuring Vocalizer | ||
Specifies the number of synthesizer plug-in instances created during system initialization. |
Integer: 0–number of available TTS licenses. DEFAULT: 0 |
|
Audio processing |
||
Specifies the number of sending threads. |
Integer DEFAULT: 8 |
|
Speed of RTP stream |
||
Specifies the number of audio samples sent per second. |
Integer: 1–INT_MAX samples. DEFAULT: 8000 |
|
Specifies the size of the RTP packet in samples. |
Integer: 1–1000 samples. Typical settings are 160 or 240. DEFAULT: 160 |
|
Adjusts the RTP sending speed by filling packets more quickly. |
Integer: 1–10. DEFAULT: 2 |
|
Adjusts the RTP sending speed by setting the minimum number of samples that can be sent ahead. |
Integer: 0–rtpUpperBoundarySamples. DEFAULT: 300 |
|
Adjusts the RTP sending speed by setting the maximum number of samples that can be sent ahead. |
Integer: rtpLowerBoundarySamples–INT_MAX samples. DEFAULT: 600 |
|
Enable/disable use of cookies with Internet fetches |
||
Enables the use of cookies. |
Boolean DEFAULT: 0 |
|
Vocalizer input/output |
||
Specifies the default encoding for text/plain MRCP messages. |
ISO-8859-1, UTF-8, UTF-16 DEFAULT: ISO-8859-1 |
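The RTP settings in the tables above are expressed in samples, so it can help to convert them to time. The following sketch shows the arithmetic for the default 8000 samples per second; the function name is illustrative and is not a Speech Server parameter.

```python
def samples_to_ms(samples, sample_rate_hz=8000):
    """Convert a number of audio samples to milliseconds of audio."""
    return samples * 1000 / sample_rate_hz


# Packet sizes named as typical in the MRCPv1 table.
packet_160_ms = samples_to_ms(160)
packet_240_ms = samples_to_ms(240)

# Default send-ahead boundaries (300 and 600 samples).
lower_bound_ms = samples_to_ms(300)
upper_bound_ms = samples_to_ms(600)
```

At 8 kHz, a 160-sample packet carries 20 milliseconds of audio and a 240-sample packet carries 30 milliseconds, while the default boundaries allow between 37.5 and 75 milliseconds of audio to be sent ahead.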