Input and output behavior

Applications provide input text as a text buffer, an input stream, or a document specified via a URI.

  • The text buffer method is the simplest and yields the best performance. The application passes the text buffer via the MRCP SPEAK method.
  • The URI method is particularly useful in configurations with multiple servers: the application stores input texts on a central Web server and passes the URI to the MRCP SPEAK method (Speech Server). Vocalizer supports documents on HTTP servers and local files. It also supports HTTP and HTTPS proxy servers and can cache the retrieved documents.
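The two input methods above differ mainly in what the SPEAK request body carries. The following is a minimal sketch, assuming MRCPv2 framing as defined in RFC 6787 (this section does not say which MRCP version is in use). The helper name build_speak, the channel identifier, and the URL are hypothetical placeholders, and a real client would send these bytes over an established MRCP session.

```python
def build_speak(request_id: int, channel_id: str, content_type: str, body: str) -> bytes:
    """Assemble an MRCPv2 SPEAK request (illustrative only)."""
    body_bytes = body.encode("utf-8")
    headers = (
        f"Channel-Identifier: {channel_id}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body_bytes)}\r\n"
        "\r\n"
    ).encode("ascii")
    # The message-length field counts the whole message, including the
    # start line that contains the field itself, so iterate until stable.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} SPEAK {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(headers) + len(body_bytes)
        if total == length:
            return start_line + headers + body_bytes
        length = total


# Text buffer method: the text travels inside the request body.
inline = build_speak(1, "32AECB23433802@speechsynth", "text/plain",
                     "Welcome to the service.")

# URI method: the body is a URI list and the server fetches the document.
# The URL below is a made-up example.
by_uri = build_speak(2, "32AECB23433802@speechsynth", "text/uri-list",
                     "http://content.example.com/prompts/welcome.txt\r\n")

print(inline.decode("utf-8"))
print(by_uri.decode("utf-8"))
```

With the URI method the request stays small regardless of document size, which is one reason it suits multi-server configurations.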

Using markup to control output

Use a markup language to control the voice, pronunciation, volume, rate, and other aspects of the generated speech. Vocalizer supports these markup languages:

  • The native Vocalizer markup language, which is explained in Control sequences, with further details on guiding text normalization in the Language Supplement for each Vocalizer language.
  • W3C SSML v1.0 (XML-based) with some proprietary extensions. See Vocalizer SSML support.
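As an illustration of the SSML option, the sketch below builds a small SSML 1.0 document that sets rate and volume with the standard prosody element. It uses only elements from the W3C specification and none of the proprietary extensions; the function name build_ssml is hypothetical, and which attribute values Vocalizer honors should be checked in Vocalizer SSML support.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # namespace defined by W3C SSML 1.0


def build_ssml(text: str, lang: str = "en-US",
               rate: str = "medium", volume: str = "medium") -> str:
    """Wrap plain text in a minimal SSML 1.0 document (illustrative only)."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f'<prosody rate={quoteattr(rate)} volume={quoteattr(volume)}>'
        f'{escape(text)}'  # escape &, < and > so the document stays well-formed
        '</prosody></speak>'
    )


doc = build_ssml("Terms & conditions apply.", rate="slow", volume="loud")
ET.fromstring(doc)  # raises ParseError if the markup is not well-formed
print(doc)
```

Escaping the input text matters because characters such as & would otherwise make the XML invalid before it reaches the engine.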

Vocalizer supports a wide range of character sets and encodings. The engine handles the transcoding of the input text to the native (or internal) Unicode UTF-16 character set.
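To make the transcoding step concrete, here is a rough sketch of what converting input text to UTF-16 involves. It is not Vocalizer code: the function name to_utf16 is hypothetical, and little-endian byte order is an assumption, since this section does not state which byte order the engine uses internally.

```python
def to_utf16(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from the declared source encoding, then re-encode as UTF-16.

    Little-endian output without a byte order mark is an assumption here.
    """
    return raw.decode(source_encoding).encode("utf-16-le")


latin1_input = "café".encode("iso-8859-1")   # 4 bytes, one per character
utf16_output = to_utf16(latin1_input, "iso-8859-1")
print(len(latin1_input), len(utf16_output))  # prints "4 8"
```

Because the engine performs this conversion itself, the application only needs to declare the input encoding correctly.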

Switching languages and voices

The active language and voice can be specified when a TTS engine instance is opened, between the open and the TTS request, or during the processing of input text (via markup). The voice and language can be switched at any location in the input text.

See Managing languages and voices.
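Switching voice and language inside the input text can be sketched with the standard SSML 1.0 voice element, which accepts name and xml:lang attributes. The voice names below ("VoiceA", "VoiceB") are placeholders rather than actual Vocalizer voices, and build_multivoice_ssml is a hypothetical helper; the native control sequences offer an alternative that is not shown here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # namespace defined by W3C SSML 1.0


def build_multivoice_ssml(segments, default_lang: str = "en-US") -> str:
    """Build SSML where each (voice_name, lang, text) segment gets its own voice."""
    parts = [f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(default_lang)}>']
    for voice_name, lang, text in segments:
        parts.append(
            f'<voice name={quoteattr(voice_name)} xml:lang={quoteattr(lang)}>'
            f'{escape(text)}</voice>'
        )
    parts.append('</speak>')
    return "".join(parts)


# Placeholder voice names; substitute voices that are actually installed.
doc = build_multivoice_ssml([
    ("VoiceA", "en-US", "Your flight departs at nine."),
    ("VoiceB", "fr-FR", "Votre vol part à neuf heures."),
])
ET.fromstring(doc)  # raises ParseError if the markup is not well-formed
print(doc)
```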

Streaming audio output

Vocalizer streams the audio output to the application via RTP (Nuance Speech Server integrations). This allows applications to play the audio output for the initial blocks of text to callers while the TTS engine is still processing the blocks of text that follow, minimizing caller-perceived latency. The application can specify the desired audio format (A-law, µ-law, 16-bit linear).
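The choice of audio format affects how much data each streamed packet carries. The arithmetic below is a back-of-the-envelope sketch, assuming an 8 kHz sample rate and 20 ms of audio per RTP packet; both figures are common telephony defaults and are not stated in this section. The format keys and the function name payload_bytes are illustrative.

```python
# Bytes per sample: A-law and µ-law are 8-bit companded, linear PCM is 16-bit.
BYTES_PER_SAMPLE = {"alaw": 1, "mulaw": 1, "linear16": 2}


def payload_bytes(audio_format: str, sample_rate_hz: int = 8000,
                  packet_ms: int = 20) -> int:
    """Audio payload size of one packet (sample rate and packet time are assumptions)."""
    samples = sample_rate_hz * packet_ms // 1000
    return samples * BYTES_PER_SAMPLE[audio_format]


for fmt in ("alaw", "mulaw", "linear16"):
    print(fmt, payload_bytes(fmt))
# prints:
# alaw 160
# mulaw 160
# linear16 320
```

Under these assumptions, 16-bit linear audio needs twice the bandwidth of the companded formats for the same duration of speech.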