Configuring the speech processing environment

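You can use MRCP and SIP messages to configure basic properties of the speech processing environment, such as the expected character encodings and the security of the audio channel.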

Setting expected media types and encodings

Components of a speech-processing system must agree on media types for sharing messages, text files, and audio files. The VoiceXML application and the voice browser can set the media type at runtime, or you can set default media types in the component configuration files.

Elements that require specification of media types include the following:

Text files, with file types and encodings for:

SIP messages: exchanged between the voice browser (MRCP client) and Nuance Speech Server (which passes the messages to Nuance speech software such as Nuance Recognizer, Natural Language Processing service, or Nuance Vocalizer)
Web server: for text file storage

Audio files, with file types for:

Storing audio, such as audio recordings or synthesized speech.
Transmitting audio from the telephone call.

Media types in Vocalizer (SSML and Plain-text)

If an MRCP SPEAK request is sent as SSML, the MRCP client can specify the character encoding in the XML declaration at the beginning of the SSML document (see the SSML specification). For example:

<?xml version="1.0" encoding="ISO-8859-1"?>

In such cases, Nuance Speech Server sends the SSML to Vocalizer without modification. This means that the request can use any character encoding that Vocalizer supports.
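As a minimal sketch of the client side, the following Python builds an SSML document whose bytes match the encoding named in its XML declaration. The function name build_ssml and the sample text are illustrative assumptions, not part of any Nuance API; the speak element and its namespace follow the W3C SSML 1.0 specification.

```python
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, encoding: str = "ISO-8859-1", lang: str = "en-US") -> bytes:
    """Return an SSML document encoded to match its own XML declaration."""
    document = (
        f'<?xml version="1.0" encoding="{encoding}"?>\n'
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">'
        f"{escape(text)}"
        "</speak>"
    )
    # Encode with the same charset that the declaration names; a mismatch
    # would make the receiver misread any non-ASCII characters.
    return document.encode(encoding)


latin1_body = build_ssml("Café")
utf8_body = build_ssml("Café", encoding="UTF-8")
```

The two results differ only in the declaration and in how the accented character is stored: one byte in ISO-8859-1, two bytes in UTF-8.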

If the MRCP SPEAK request is sent as plain text, Speech Server must be told the character encoding whenever it is not ISO-8859-1.

The voice browser can use the MRCP Content-Type header to pass the character encoding to Speech Server:

Header	Supported charset values	Default
Content-Type	ISO-8859-1, UTF-8, UTF-16	ISO-8859-1

This example specifies the UTF-8 character set with Content-Type:

MRCP/2.0 nn SPEAK 1
 Content-Length: nn
 Content-Type: text/plain; charset=UTF-8
 Speech-Language: en-US

The next example does not specify the character set, so Speech Server uses ISO-8859-1 by default:

MRCP/2.0 nn SPEAK 2
 Content-Length: nn
 Content-Type: text/plain
 Speech-Language: en-US
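On the client side, the two requests above differ only in the charset parameter and in how the text is turned into bytes. The sketch below shows one way to keep the two in step; plain_text_entity is a hypothetical helper, and it deliberately leaves out the MRCP start line and any other headers a real request needs.

```python
from typing import Optional

DEFAULT_CHARSET = "ISO-8859-1"  # the documented default when no charset is sent


def plain_text_entity(text: str, charset: Optional[str] = None,
                      language: str = "en-US") -> bytes:
    """Return the header lines and body for a plain-text SPEAK request.

    The MRCP start line and other required headers are omitted here.
    """
    body = text.encode(charset or DEFAULT_CHARSET)
    content_type = "text/plain"
    if charset:
        content_type += f"; charset={charset}"
    headers = (
        f"Content-Length: {len(body)}\r\n"  # byte count, not character count
        f"Content-Type: {content_type}\r\n"
        f"Speech-Language: {language}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + body


utf8_request = plain_text_entity("Café au lait", charset="UTF-8")
default_request = plain_text_entity("Café au lait")
```

Because Content-Length counts bytes, the same twelve-character text yields a different length in each encoding, which is why the body is encoded before the header is written.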

You can change the default character set that Speech Server assumes by setting the plainTextSSMLEncoding parameter in the Management Station.
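The fallback rule can be pictured with the sketch below. It illustrates the documented behavior only and is not Speech Server's implementation; resolve_charset and decode_plain_text are hypothetical helpers, and the default argument stands in for whatever value plainTextSSMLEncoding holds.

```python
from email.message import Message


def resolve_charset(content_type: str, default: str = "ISO-8859-1") -> str:
    """Return the charset named in a Content-Type value, else the default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases a charset it finds and returns
    # failobj unchanged when the header carries no charset parameter.
    return msg.get_content_charset(failobj=default)


def decode_plain_text(body: bytes, content_type: str,
                      default: str = "ISO-8859-1") -> str:
    """Decode a plain-text body using the resolved character set."""
    return body.decode(resolve_charset(content_type, default))


explicit = resolve_charset("text/plain; charset=UTF-8")
implicit = resolve_charset("text/plain")
reconfigured = resolve_charset("text/plain", default="UTF-8")
```

An explicit charset in the request always wins; the configured default applies only when the header omits the parameter.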

Setting audio channel security

Secure Real-time Transport Protocol (SRTP) encrypts the audio channel so that confidential information cannot be intercepted and played by outside observers.

SRTP is configured in the SDP body that is sent in the SIP INVITE request, as defined in RFC 4568, Session Description Protocol (SDP) Security Descriptions for Media Streams. In particular, you add the SDP attribute "crypto" after a media description (an m= line) to specify that the preceding media is to be encrypted.

The following example is a simplified INVITE that shows how the SRTP channel is specified using the "crypto" attribute:

INVITE sip:mresources@10.0.0.1:5060 SIP/2.0

...
Content-Type: application/sdp

v=0
o=client_user 2890844526 2890842808 IN IP4 example.com
s=SomeSIPsession
m=application 9 TCP/MRCPv2 1
c=IN IP4 10.0.0.2
a=setup:active
a=connection:new
a=resource:speechrecog
a=cmid:1
m=audio 47774 RTP/SAVP 0 96
c=IN IP4 10.0.0.2
a=rtpmap:0 pcmu/8000
a=rtpmap:96 l16/8000
a=sendonly
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^40|1:32
a=mid:1
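To show how the parts of the "crypto" attribute fit together, here is a minimal Python sketch that pairs each crypto attribute with the media description it follows and splits it into the fields that RFC 4568 defines (tag, crypto suite, and inline key parameters). The names CryptoAttribute, parse_crypto, and find_crypto are illustrative assumptions. The sketch expects the attribute on a single line, handles only one inline key, and ignores session parameters.

```python
import base64
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CryptoAttribute:
    tag: int
    suite: str
    key_and_salt: bytes            # decoded master key followed by master salt
    lifetime: Optional[str] = None
    mki: Optional[str] = None      # "value:length", if present


def parse_crypto(line: str) -> CryptoAttribute:
    """Parse one 'a=crypto:' line that carries a single inline key."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not a crypto attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("expected tag, crypto suite, and key parameters")
    tag, suite, key_params = fields[:3]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only the inline key method is handled here")
    parts = key_info.split("|")
    attr = CryptoAttribute(int(tag), suite, base64.b64decode(parts[0]))
    for part in parts[1:]:
        if ":" in part:            # the MKI is written as value:length
            attr.mki = part
        else:                      # otherwise treat the field as the lifetime
            attr.lifetime = part
    return attr


def find_crypto(sdp: str) -> List[Tuple[str, CryptoAttribute]]:
    """Pair each crypto attribute with the m= line that precedes it."""
    found = []
    media = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            media = line
        elif line.startswith("a=crypto:") and media is not None:
            found.append((media, parse_crypto(line)))
    return found


SAMPLE_SDP = "\n".join([
    "m=application 9 TCP/MRCPv2 1",
    "a=resource:speechrecog",
    "m=audio 47774 RTP/SAVP 0 96",
    "a=sendonly",
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 "
    "inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^40|1:32",
    "a=mid:1",
])

secured = find_crypto(SAMPLE_SDP)
```

In this sample only the audio stream is reported as secured, because the crypto attribute follows the m=audio line and not the m=application line.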
