Audio files

Voice Platform includes a text-to-speech (TTS) engine that converts text input into synthesized speech. With TTS, you can enter prompts as text within a VoiceXML file and have them spoken at runtime, which makes it relatively simple to generate prompts and responses dynamically.

The Voice Platform Voice Browser service also supports Speech Synthesis Markup Language (SSML) within VoiceXML prompts. SSML elements affect how text is synthesized (for example, how loudly or quickly certain words are spoken), so you can refine the text-to-speech output.
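As a rough illustration of generating a TTS prompt dynamically, the sketch below builds a VoiceXML prompt string that wraps text in the standard SSML prosody and emphasis elements. The function name render_prompt and its parameters are hypothetical, and which SSML attributes the Voice Platform TTS engine actually honors is an assumption to verify against the platform documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def render_prompt(text, emphasized=None, rate="medium", volume="medium"):
    """Return a VoiceXML <prompt> string that speaks `text` through TTS.

    `rate` and `volume` are passed to the SSML <prosody> element. If
    `emphasized` is given, it is appended after `text` inside <emphasis>.
    (Hypothetical helper, not part of Voice Platform.)
    """
    # Escape &, < and > so caller-supplied text cannot break the markup.
    body = escape(text)
    if emphasized is not None:
        body += " <emphasis>" + escape(emphasized) + "</emphasis>"
    return (
        "<prompt>"
        "<prosody rate=" + quoteattr(rate) + " volume=" + quoteattr(volume) + ">"
        + body
        + "</prosody>"
        "</prompt>"
    )


if __name__ == "__main__":
    print(render_prompt("You ordered a large pizza with", "mushrooms",
                        rate="slow", volume="loud"))
```

The returned string would be placed inside a field or block of a dynamically generated VoiceXML page; how that page is served is outside the scope of this sketch.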

These features make it possible to create your application prompts using only TTS if you prefer. For most applications, however, you will create at least some audio files. These may be as simple as a signature sound (an earcon) played at the beginning and end of each call as corporate branding, or as extensive as a complete set of custom prompts that never invoke the TTS engine.

Supported formats

Voice Platform supports the Microsoft RIFF (.wav), NIST Sphere (.nis or .nist), and Raw (headerless) audio formats with the following encodings:

  • 8-bit, 8 kHz, mu-law encoding, single channel (recommended)
  • 8-bit, 8 kHz, A-law encoding, single channel
  • 16-bit, 8 kHz, linear encoding, single channel

The sample rate of each audio file must match the sample rate of the current session (8 kHz).
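One way to catch format mismatches before deployment is to inspect the file header. The sketch below reads the format chunk of a RIFF (.wav) file and compares it with the encodings listed above. The function names are hypothetical, the format tag values (1, 6, 7) come from the RIFF/WAVE specification, and NIST Sphere and headerless files are not handled. Python's built-in wave module is not used because it reads only linear PCM files.

```python
import struct

# WAVE format tags defined by the RIFF/WAVE specification.
FORMAT_TAGS = {1: "linear", 6: "A-law", 7: "mu-law"}

# (encoding, bits per sample) pairs from the supported-formats list.
SUPPORTED = {("mu-law", 8), ("A-law", 8), ("linear", 16)}


def read_wav_format(data):
    """Parse the 'fmt ' chunk from the raw bytes of a RIFF/WAVE file."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt ":
            if size < 16 or pos + 24 > len(data):
                raise ValueError("truncated 'fmt ' chunk")
            # Format tag, channels, sample rate, byte rate, block align, bits.
            tag, channels, rate, _, _, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8)
            return {
                "encoding": FORMAT_TAGS.get(tag, "unknown"),
                "channels": channels,
                "rate": rate,
                "bits": bits,
            }
        # Skip to the next chunk; chunks are padded to an even length.
        pos += 8 + size + (size & 1)
    raise ValueError("no 'fmt ' chunk found")


def is_supported(fmt, session_rate=8000):
    """Check a parsed format against the supported encodings and session rate."""
    return (
        fmt["channels"] == 1
        and fmt["rate"] == session_rate
        and (fmt["encoding"], fmt["bits"]) in SUPPORTED
    )
```

A typical use would be to read each prompt file in binary mode (for example, open(path, "rb").read()), pass the bytes to read_wav_format, and report any file for which is_supported returns False.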

Persona

One common reason for recording your own prompts is to create a persona for your application. The persona is the personality that callers perceive as they interact with the application. It is shaped by factors such as the age and gender of the voice used for the application and its patterns of speech.

Samples

The audio files for PizzaTalk are stored in:

$NVP_HOME/appservice/applications/webapps/PizzaTalk/prompts/

These audio files are all .wav files (16-bit, 8 kHz). There are prerecorded prompts and error messages for each dialog state in the application. Many of the files contain individual words that can be used to repeat selections back to the caller. For example, the application concatenates some of these files when verifying the chosen toppings or confirming the caller’s phone number.
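The concatenation idea can be sketched as follows for reading back a phone number. This is not the PizzaTalk implementation: the function name, the prompts/digits directory, and the one-file-per-digit naming (0.wav through 9.wav) are all assumptions made for illustration. The text inside each VoiceXML audio element is standard fallback content that is spoken by TTS only if the recording cannot be played.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical layout: one recording per digit, named 0.wav ... 9.wav.
PROMPT_DIR = "prompts/digits"


def phone_number_prompt(number):
    """Return a VoiceXML <prompt> that reads `number` back digit by digit."""
    # Keep only ASCII digits, ignoring spaces, dashes and parentheses.
    digits = [ch for ch in number if ch in "0123456789"]
    if not digits:
        raise ValueError("no digits in phone number")
    parts = []
    for d in digits:
        src = PROMPT_DIR + "/" + d + ".wav"
        # The element content is the TTS fallback for an unplayable file.
        parts.append("<audio src=" + quoteattr(src) + ">" + escape(d) + "</audio>")
    return "<prompt>" + "".join(parts) + "</prompt>"


if __name__ == "__main__":
    print(phone_number_prompt("555-0142"))
```

Playing the audio elements in sequence within one prompt lets a small set of recorded words cover many possible confirmations, at the cost of less natural intonation than a single recorded sentence.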