Configuring the speech processing environment

You can use MRCP commands to configure basic properties of the speech processing environment.

Setting expected media types and encodings

Components of a speech-processing system must agree on media types for sharing messages, text files, and audio files. The VoiceXML application and the voice browser can set the media type at runtime, or you can set default media types in the component configuration files.

Elements that require specification of media types include the following:

Text files (file types and encodings) for:

  • SIP messages between the voice browser (MRCP client) and Nuance Speech Server (which passes the messages to Nuance speech software such as Nuance Recognizer, the Natural Language Processing service, or Nuance Vocalizer)
  • Text file storage on the web server

Audio files (file types) for:

  • Storing audio, such as audio recordings or synthesized speech
  • Transmitting audio from the telephone call

Setting audio channel security

Secure Real-time Transport Protocol (SRTP) encrypts the audio channel so that confidential information cannot be intercepted and played by outside observers.

SRTP is configured through the SDP that is sent in the INVITE request, as defined in RFC 4568, Session Description Protocol (SDP) Security Descriptions for Media Streams. In particular, the SDP "crypto" attribute specifies that the media stream described by the preceding m= line is to be encrypted.

The following example is a simplified INVITE that shows how the SRTP channel is specified using the "crypto" attribute:

INVITE sip:mresources@10.0.0.1:5060 SIP/2.0
...
Content-Type: application/sdp

v=0
o=client_user 2890844526 2890842808 IN IP4 example.com
s=SomeSIPsession
m=application 9 TCP/MRCPv2 1
c=IN IP4 10.0.0.2
a=setup:active
a=connection:new
a=resource:speechrecog
a=cmid:1
m=audio 47774 RTP/SAVP 0 96
c=IN IP4 10.0.0.2
a=rtpmap:0 pcmu/8000
a=rtpmap:96 l16/8000
a=sendonly
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^40|1:32
a=mid:1
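
To make the layout of the "crypto" attribute easier to follow, here is a minimal Python sketch that splits such a line into its RFC 4568 fields: tag, crypto suite, and the inline key parameters (base64 key and salt, optional lifetime, optional MKI). The names CryptoAttribute and parse_crypto are hypothetical and are not part of any Nuance or MRCP API. The sketch assumes an AES_CM_128 suite (16-byte master key followed by a 14-byte master salt) and reads only the first key if several are listed.

```python
import base64
from dataclasses import dataclass
from typing import Optional

# Assumption: AES_CM_128_* suites use a 16-byte master key; the rest is salt.
MASTER_KEY_LEN = 16


@dataclass
class CryptoAttribute:
    """Hypothetical container for the fields of an SDP a=crypto line."""
    tag: int
    suite: str
    key: bytes
    salt: bytes
    lifetime: Optional[str]  # for example "2^40"; None if absent
    mki: Optional[str]       # "<mki>:<length>", for example "1:32"; None if absent


def parse_crypto(line: str) -> CryptoAttribute:
    """Parse 'a=crypto:<tag> <suite> inline:<key||salt>[|lifetime][|mki:len]'."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not a crypto attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 2 + 1:
        raise ValueError("expected tag, crypto suite, and key parameters")
    tag, suite, key_params = fields[0], fields[1], fields[2]

    # Several keys may be separated by ';'. This sketch reads only the first.
    first_key = key_params.split(";")[0]
    method, _, key_info = first_key.partition(":")
    if method != "inline":
        raise ValueError("only the 'inline' key method is handled here")

    parts = key_info.split("|")
    raw = base64.b64decode(parts[0])  # concatenated master key and salt

    lifetime = None
    mki = None
    for part in parts[1:]:
        if ":" in part:      # the MKI field has the form value:length
            mki = part
        else:                # otherwise treat the field as the key lifetime
            lifetime = part

    return CryptoAttribute(
        tag=int(tag),
        suite=suite,
        key=raw[:MASTER_KEY_LEN],
        salt=raw[MASTER_KEY_LEN:],
        lifetime=lifetime,
        mki=mki,
    )


if __name__ == "__main__":
    attr = parse_crypto(
        "a=crypto:1 AES_CM_128_HMAC_SHA1_80 "
        "inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^40|1:32"
    )
    print(attr.suite, len(attr.key), len(attr.salt), attr.lifetime, attr.mki)
```

The parser expects the whole attribute on a single line, which is how SDP carries it on the wire. A production implementation would also validate the crypto suite name and check the key length against that suite.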