Saving audio files

Nuance Speech Server can save the incoming audio stream used for recognition and provide the MRCP client with a URI for the resulting file. The waveform logging parameters control this feature: they identify the location of the web server, the location for writing files, and the amount of data that can be written.

Note: This Speech Server feature is separate from whole call recording (see Configuring whole call recording) and from Recognizer’s logging of audio waveforms. You control each feature separately, and each produces different waveforms. However, all of these files are saved to the same base location. (For troubleshooting purposes, you can compare pairs of files created by the different mechanisms in search of unexpected differences.)

Recording the audio stream

The MRCP header Save-Waveform instructs the Speech Server to record the incoming audio stream of the recognition and to use the Waveform-URI header to provide a URI where the client can access the resulting file. For information on the location of saved waveforms, see Call log directories and files.

The following example illustrates this scenario:

C->S: MRCP/2.0 RECOGNIZE 543257
Channel-Identifier: 1@speechrecog
Content-Type: text/uri-list
Content-Length: 22
Save-Waveform: true

builtin:grammar/digits

S->C: MRCP/2.0 543257 200 IN-PROGRESS
Channel-Identifier: 1@speechrecog

S->C: MRCP/2.0 85 START-OF-SPEECH 543257 IN-PROGRESS
Channel-Identifier: 1@speechrecog

S->C: MRCP/2.0 RECOGNITION-COMPLETE 543257 COMPLETE
Channel-Identifier: 1@speechrecog
Waveform-URI: http://localhost/1/20051128204946_677309.wav
Completion-Cause: 000 success
Content-Type: application/x-nlsml
Content-Length: 175

<?xml version='1.0'?>
<result>
  <interpretation grammar="builtin:grammar/digits" confidence="100">
    <input mode="speech">one</input>
    <instance>1</instance>
  </interpretation>
</result>
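
On the client side, the saved file's location has to be pulled out of the RECOGNITION-COMPLETE headers. The following Python sketch shows one way this might be done. The function name extract_waveform_uri is hypothetical and not part of any Nuance API. It accepts both the Waveform-URI spelling used by MRCPv2 and the Waveform-URL spelling used by MRCPv1, and it assumes the message is available as a single string.

```python
from typing import Optional


def extract_waveform_uri(message: str) -> Optional[str]:
    """Return the saved-waveform URI from an MRCP message, or None if absent.

    Hypothetical helper for illustration; not part of any Nuance API.
    """
    for line in message.splitlines():
        if not line.strip():
            break  # a blank line ends the header section
        name, sep, value = line.partition(":")
        # The start line has no colon, so sep is empty and it is skipped.
        if sep and name.strip().lower() in ("waveform-uri", "waveform-url"):
            value = value.strip()
            if value.startswith("<"):
                # Form "<uri>;size=...;duration=...": keep what is inside <>.
                return value[1:].split(">", 1)[0]
            # Bare form: drop any trailing ";param=value" attributes.
            return value.split(";", 1)[0].strip()
    return None


sample = (
    "MRCP/2.0 RECOGNITION-COMPLETE 543257 COMPLETE\r\n"
    "Channel-Identifier: 1@speechrecog\r\n"
    "Waveform-URI: http://localhost/1/20051128204946_677309.wav\r\n"
    "Completion-Cause: 000 success\r\n"
    "\r\n"
)
print(extract_waveform_uri(sample))
```

The returned URI can then be fetched with any HTTP client, provided the web server described in the next section is running.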

Configuring a web server

Nuance Speech Server stores logs and audio recordings on the local disk. To make them retrievable over HTTP, and therefore to record utterances (save waveforms), you must install and configure a web server on the same machine as Speech Server.

Note: If Apache is already installed when you install Nuance speech products, the existing httpd.conf file in the Apache config directory is automatically backed up to httpd.conf.bak and replaced with an httpd.conf file that supports integration with Speech Server. If you wish to retain customizations to httpd.conf, you must copy them manually to the new configuration file.

The listening port of the web server must match the value of the server.session.ossweb.port parameter, which is set for Speech Server in the Management Station.
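
A port mismatch typically shows up as a waveform URI that the client cannot fetch. As a rough diagnostic, a client could compare the port in a returned URI against the configured value. The sketch below is only an illustration: the helper name uri_uses_port is hypothetical, and the port numbers in the usage lines are placeholders rather than documented defaults.

```python
from urllib.parse import urlsplit


def uri_uses_port(uri: str, expected_port: int) -> bool:
    """Check whether a URI targets the expected web server port.

    Hypothetical helper; expected_port stands for the value configured
    in server.session.ossweb.port.
    """
    parts = urlsplit(uri)
    # When the URI carries no explicit port, fall back to the scheme default.
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    port = parts.port if parts.port is not None else default_port
    return port == expected_port


# Placeholder values for illustration only.
print(uri_uses_port("http://localhost/1/20051128204946_677309.wav", 80))
print(uri_uses_port("http://localhost:8080/1/20051128204946_677309.wav", 80))
```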

Supported media types

To set the media type for a stored recording so that it can be read by the application, use the Speech Server parameter. The default value is audio/x-wav.

The following media types are supported for the Save-Waveform and Input-Waveform-URI headers:

Header               Supported media types
Save-Waveform        audio/x-wav, audio/x-nist
Input-Waveform-URI   audio/basic, audio/x-alaw-basic, audio/L16, audio/x-wav, audio/x-nist
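
A client application might encode the supported media types as a lookup table and validate a media type before sending a request. The following is a minimal sketch under that assumption. The names SUPPORTED_MEDIA_TYPES and is_supported are hypothetical, and the comparison is case-insensitive because media type names are not case-sensitive.

```python
# Media types listed in this section, keyed by MRCP header name.
SUPPORTED_MEDIA_TYPES = {
    "Save-Waveform": {"audio/x-wav", "audio/x-nist"},
    "Input-Waveform-URI": {
        "audio/basic",
        "audio/x-alaw-basic",
        "audio/L16",
        "audio/x-wav",
        "audio/x-nist",
    },
}


def is_supported(header: str, media_type: str) -> bool:
    """Return True if media_type is listed for the given header.

    Hypothetical client-side check; unknown headers yield False.
    """
    allowed = {m.lower() for m in SUPPORTED_MEDIA_TYPES.get(header, set())}
    return media_type.strip().lower() in allowed


print(is_supported("Save-Waveform", "audio/x-wav"))
print(is_supported("Save-Waveform", "audio/basic"))
```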