Configuring the speech processing environment
You can use MRCP commands to configure basic properties of the speech processing environment.
Setting expected media types and encodings
Components of a speech-processing system must agree on media types for sharing messages, text files, and audio files. The VoiceXML application and the voice browser can set the media type at runtime, or you can set default media types in the component configuration files.
Elements that require specification of media types include the following:
Text files—File types and encodings for:
- SIP messages—Between the voice browser (MRCP client) and Nuance Speech Server (which passes the messages to Nuance speech software such as Nuance Recognizer, Natural Language Processing service, or Nuance Vocalizer)
- Web server—For text file storage
Audio files—File types for:
- Storing audio, such as audio recordings or synthesized speech.
- Transmitting audio from the telephone call.

These elements have media types associated with the following:
- Audio files to recognize.
- Audio files saved by Speech Server (Save-Waveform) and returned to the application.
- Recognition results.
- Text files to interpret.
The default media type is mulaw (audio/basic;rate=8000).
To use other media types, you must modify the parameters swiep_audio_media_type and swirec_audio_media_type on the recognition service in the Management Station.

If an MRCP SPEAK request is sent as SSML, the MRCP client can specify the encoding at the beginning of the SSML. See the SSML specification. For example:
<?xml version="1.0" encoding="ISO-8859-1"?>
In such cases, Nuance Speech Server sends the SSML to Vocalizer without modification. This means that the request can use any character encoding that Vocalizer supports.
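Because the SSML is passed through unmodified, the bytes the client sends must actually use the encoding named in the XML declaration. The following is a minimal client-side sketch of that idea in Python. The helper name `encode_ssml` and the sample SSML are hypothetical and are not part of any Nuance API.

```python
import xml.etree.ElementTree as ET


def encode_ssml(ssml_body: str, encoding: str = "ISO-8859-1") -> bytes:
    """Prefix SSML with an XML declaration and encode it to match."""
    declaration = '<?xml version="1.0" encoding="{}"?>\n'.format(encoding)
    # Encode the declaration and the body together so that the bytes on the
    # wire really use the encoding that the declaration announces.
    return (declaration + ssml_body).encode(encoding)


# Hypothetical SSML body containing a non-ASCII character.
ssml = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xml:lang="fr-FR">Café</speak>'
)
payload = encode_ssml(ssml)

# Sanity check: an XML parser honors the declaration when decoding the bytes.
decoded_text = ET.fromstring(payload).text
```

If the declaration and the byte encoding disagree, a conforming XML parser may reject the document or misread non-ASCII characters, so deriving both from one `encoding` argument avoids that class of error.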
If the MRCP SPEAK request is sent as plain text, Speech Server needs to know when the character encoding is not ISO-8859-1.
The voice browser can use the MRCP Content-Type header to pass the character encoding to Speech Server:
Header | Value | Default |
---|---|---|
Content-Type | ISO-8859-1 | ISO-8859-1 |
This example specifies the UTF-8 character set with Content-Type:
MRCP/2.0 nn SPEAK 1
Content-Length: nn
Content-Type: text/plain; charset=UTF-8
Speech-Language: en-US
The next example does not specify the character set, so Speech Server uses ISO-8859-1 by default:
MRCP/2.0 nn SPEAK 2
Content-Length: nn
Content-Type: text/plain
Speech-Language: en-US
You can configure the default Speech Server character set using the plainTextSSMLEncoding parameter in the Management Station.
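The two SPEAK examples above can be sketched in Python as follows. This is an illustration only: the function name `build_speak_request` is hypothetical, the default of ISO-8859-1 assumes the server default has not been changed, and, like the simplified examples in this section, the sketch omits the Channel-Identifier header that a complete MRCPv2 request carries. It shows how the `nn` placeholders could be filled in: Content-Length is the size of the encoded body in bytes, and the message length in the start line counts every byte of the message, including its own digits.

```python
DEFAULT_CHARSET = "ISO-8859-1"  # assumed server default, as described in this section


def build_speak_request(text, request_id, charset=None, language="en-US"):
    """Return the bytes of a simplified MRCPv2 SPEAK request for plain text."""
    # With no explicit charset, encode the text in the assumed server default.
    body = text.encode(charset or DEFAULT_CHARSET)
    content_type = "text/plain"
    if charset:
        content_type += "; charset=" + charset
    # Simplified header set; a complete request also needs Channel-Identifier.
    headers = (
        "Content-Length: {}\r\n"
        "Content-Type: {}\r\n"
        "Speech-Language: {}\r\n"
        "\r\n"
    ).format(len(body), content_type, language).encode("ascii")

    # The message-length field counts every byte of the message, including
    # its own digits, so recompute until the value stops changing.
    length = 0
    while True:
        start_line = "MRCP/2.0 {} SPEAK {}\r\n".format(length, request_id).encode("ascii")
        total = len(start_line) + len(headers) + len(body)
        if total == length:
            return start_line + headers + body
        length = total


utf8_request = build_speak_request("Café", 1, charset="UTF-8")
default_request = build_speak_request("Café", 2)
```

The same text produces a 5-byte body in UTF-8 and a 4-byte body in ISO-8859-1, which is why the byte length, not the character count, goes into Content-Length.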

These parameters configure the encoding of recognition results sent from Nuance Speech Server to the voice browser:
- server.mrcp2.osrspeechrecog.mrcpdefaults.VSP.server.osrspeechrecog.result.mediatype
- server.mrcp1.osrspeechrecog.result.mediatype
See Configuring recognition resources.
To set the encoding of recognition results at runtime, the voice browser can use the MRCP Accept-Charset header. For example:
MRCP/2.0 nn SET-PARAMS 100
Accept-Charset: UTF-8
Setting audio channel security
Secure Real-time Transport Protocol (SRTP) encrypts the audio channel so that confidential information cannot be intercepted and played by outside observers.
SRTP is configured through the SDP that is sent in the INVITE request, using the mechanism defined in RFC 4568, Session Description Protocol (SDP) Security Descriptions for Media Streams. In particular, you use the SDP attribute "crypto" to specify that the preceding media stream is to be encrypted.
The following example is a simplified INVITE that shows how the SRTP channel is specified using the "crypto" attribute:
INVITE sip:mresources@10.0.0.1:5060 SIP/2.0
...
Content-Type: application/sdp
v=0
o=client_user 2890844526 2890842808 IN IP4 example.com
s=SomeSIPsession
m=application 9 TCP/MRCPv2 1
c=IN IP4 10.0.0.2
a=setup:active
a=connection:new
a=resource:speechrecog
a=cmid:1
m=audio 47774 RTP/SAVP 0 96
c=IN IP4 10.0.0.2
a=rtpmap:0 pcmu/8000
a=rtpmap:96 l16/8000
a=sendonly
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^40|1:32
a=mid:1
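As a rough illustration of how a client might produce the "crypto" line shown above, the following Python sketch generates fresh keying material and formats the attribute. The function name `make_crypto_attribute` and its parameters are hypothetical. The sizes follow RFC 4568: the AES_CM_128_HMAC_SHA1_80 suite uses a 128-bit master key and a 112-bit master salt, concatenated and base64-encoded after `inline:`.

```python
import base64
import secrets

# AES_CM_128_HMAC_SHA1_80 uses a 128-bit master key and a 112-bit master salt.
KEY_BYTES = 16
SALT_BYTES = 14


def make_crypto_attribute(tag=1, lifetime=None, mki=None):
    """Build an SDP crypto attribute line with fresh random keying material."""
    key_and_salt = secrets.token_bytes(KEY_BYTES + SALT_BYTES)
    # 30 bytes encode to exactly 40 base64 characters, with no padding.
    key_params = [base64.b64encode(key_and_salt).decode("ascii")]
    if lifetime:
        key_params.append(lifetime)  # optional key lifetime, for example "2^40"
    if mki:
        key_params.append(mki)  # optional MKI value and length, for example "1:32"
    return "a=crypto:{} AES_CM_128_HMAC_SHA1_80 inline:{}".format(
        tag, "|".join(key_params)
    )


line = make_crypto_attribute(tag=1, lifetime="2^40", mki="1:32")
```

Because the master key travels in the SDP itself, RFC 4568 expects the signaling that carries it to be protected as well (for example, SIP over TLS). Otherwise anyone who can read the INVITE can decrypt the audio.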