Managing languages and voices

After you purchase, download, and install a language and voice, your applications can activate it at any time in the logical callflow:

  • when opening a TTS engine instance
  • after opening the engine instance and before sending the TTS request
  • when processing the input text (via markup)
  • inside the input text

Note: The application can switch voice and language at any point in the input text. However, each switch forces a sentence break, which can affect the prosody of the output.

For a list of available downloads, see Vocalizer languages and voices.

Supporting existing applications

Nuance continually releases new variants of languages and voices with improvements to performance and features such as voice styles. You can install and use these new variants while continuing to use the original versions. For example, your existing application might rely on a highly tuned voice that you do not want to replace. In that case, follow this procedure:

  1. Download and install a new variant from Nuance Network. By default, every application uses the newer, high-quality variant.
  2. To force Vocalizer to use a lesser variant, configure the voice_model parameter for a given application.

Using automatic language identification

Vocalizer can automatically detect the language of a block of text and switch to an appropriate voice, provided that you have installed a voice for that language. This feature is especially useful when your application does not know which language is needed.

Use these configuration parameters to control the feature:

| Parameter | Description | Default |
|---|---|---|
| language_identifier_scope | Enables automatic language identification and sets its scope. | user-defined |
| language_identifier_languages | Permissible languages for automatic language identification, in precedence order. | (all installed voices that support language identification) |
| language_identifier_mode | Mode for automatic language identification during the session. | rejection |

Installing voices without restarting processes

You can install voices on a running Vocalizer system without needing to restart any processes.

This behavior applies to all the ways you can specify voice criteria:

  • MRCP SET-PARAMS requests
  • session.xml configuration
  • SSML markup
  • Native Vocalizer control sequences

How this feature works:

  1. Vocalizer caches voice criteria during initialization. (Voice criteria include language, voice name, gender, voice age, voice model, frequency, and others.)
  2. If your application specifies criteria that are not cached, Vocalizer scans the voice installation directories for voices that were installed after initialization.
  3. If Vocalizer finds a matching voice, it returns success without logging errors or warnings.
  4. If Vocalizer cannot satisfy the voice criteria, it returns an error to the application (and writes the error to the logs).

If your application repeatedly requests unavailable voice criteria, the repeated directory scans can degrade performance enough that your customers notice delays. To limit scanning, use the voice_rescan_interval parameter.
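The lookup steps above, together with a limit on rescanning, can be sketched in Python. This is an illustrative model only, not Vocalizer code: the VoiceCache class and its method names are hypothetical, voice criteria are reduced to a voice name, and the sketch assumes that voice_rescan_interval acts as a minimum time between directory scans.

```python
import time
from typing import Callable, Set


class VoiceCache:
    """Toy model of cache-then-rescan voice lookup (hypothetical, not Vocalizer code)."""

    def __init__(
        self,
        scan_installed: Callable[[], Set[str]],
        rescan_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._scan_installed = scan_installed        # stands in for a directory scan
        self._rescan_interval_s = rescan_interval_s  # assumed meaning of voice_rescan_interval
        self._clock = clock
        self._cached: Set[str] = set()
        self._last_scan = 0.0
        self.scan_count = 0
        self._rescan()  # step 1: cache what is installed at initialization

    def _rescan(self) -> None:
        self._cached = {name.lower() for name in self._scan_installed()}
        self._last_scan = self._clock()
        self.scan_count += 1

    def resolve(self, voice_name: str) -> bool:
        """Return True if the voice can be satisfied, False if the request fails."""
        key = voice_name.lower()
        if key in self._cached:
            return True
        # Step 2: unknown voice, so rescan, but only if the interval has elapsed.
        if self._clock() - self._last_scan >= self._rescan_interval_s:
            self._rescan()
        # Steps 3 and 4: succeed if the rescan found the voice, otherwise fail.
        return key in self._cached
```

Under this assumption the trade-off is visible: a longer interval caps the cost of repeated requests for unavailable voices, but a voice installed during the interval is not found until the interval has elapsed.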

Selecting custom voices

Nuance develops custom voices upon request. For example, if you have a special voice associated with your business, Nuance can work with your voice talent to create a TTS voice with the same recognizable characteristics. This enables seamless transition between pre-recorded audio files (recorded in a studio by the voice talent) and synthesized audio (for dynamic text that cannot be anticipated in advance).

After installing a custom voice, you can use its unique name and language to select it: 

  • Selecting via SSML: use the <voice> element and voice attribute
  • Selecting via a control sequence: use the native <ESC>\voice\ tag
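As an illustration of the SSML route, the following Python sketch builds a minimal SSML document that wraps the text in a <voice> element. It assumes the W3C SSML 1.0 form, in which the voice is named through the name attribute of <voice>; check the Vocalizer SSML reference for the exact attributes your version expects. The ssml_with_voice helper and the voice name "Alice" are placeholders, not part of any Nuance API.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # W3C SSML 1.0 namespace


def ssml_with_voice(voice_name: str, lang: str, text: str) -> str:
    """Wrap text in a <voice> element inside a minimal <speak> document.

    Hypothetical helper: it only builds the markup and does not call any TTS engine.
    """
    return (
        '<?xml version="1.0"?>'
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(ssml_with_voice("Alice", "en-US", "Your balance is below the limit."))
```

Escaping the text and quoting the attribute values keeps the document well-formed when dynamic text contains characters such as & or <.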

Preloading multiple voices

For applications that use multiple languages or voices, you can preload voices to minimize voice-loading delays. The voices must be installed on the system before they can be preloaded.

Note: Preloading voices requires additional licenses for the preloaded voices. Before implementing this feature, ensure that you have enough licenses for both the preloaded voices and the voices used in SPEAK requests. The licenses attached to the preloaded voices are released when the application closes.

If your application uses Management Station, preload multiple voices by adding the nserver.nvs.PreloadVoicesList parameter to the <command> element of the Nuance vocalizer service role file. See Configuring services with role files for more information about configuring the <command> element.

For applications that do not use Management Station, add the list of voices to preload to the Nuance vocalizer service command-line startup script. See Starting services without Management Station for more information.

The nserver.nvs.PreloadVoicesList parameter is a comma-separated list of voice name and model pairs (the voice names and model names are not case-sensitive):

nserver.nvs.PreloadVoicesList=Voice1:Model1,Voice2:Model2,Voice3:Model3

For example, the nserver.nvs.PreloadVoicesList parameter in a <command> element could look like the following:

nserver.nvs.PreloadVoicesList=Alice:XPremium,Alice-ml:xpremium,Luca:xpremium,Federica:xpremium-high

Note: Spaces and quotation marks are not allowed in the list.
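The format rules above (comma-separated Voice:Model pairs, no spaces or quotation marks) can be checked before the value reaches a role file or startup script. The following Python sketch is illustrative only: build_preload_list is a hypothetical helper, not part of Vocalizer, and rejecting commas and colons inside names is an assumption made because those characters are the list's separators.

```python
from typing import Iterable, List, Tuple

PARAM_NAME = "nserver.nvs.PreloadVoicesList"
# Spaces and quotation marks are disallowed by the documented format.
# Commas and colons are rejected too, as an assumption, because they are separators.
_FORBIDDEN_CHARS = " \t\"',:"


def build_preload_list(pairs: Iterable[Tuple[str, str]]) -> str:
    """Build 'nserver.nvs.PreloadVoicesList=Voice1:Model1,Voice2:Model2' from pairs."""
    items: List[str] = []
    for voice, model in pairs:
        for token in (voice, model):
            if not token or any(ch in _FORBIDDEN_CHARS for ch in token):
                raise ValueError(f"invalid voice or model name: {token!r}")
        items.append(f"{voice}:{model}")
    if not items:
        raise ValueError("at least one voice:model pair is required")
    return f"{PARAM_NAME}=" + ",".join(items)


if __name__ == "__main__":
    print(build_preload_list([("Alice", "XPremium"), ("Luca", "xpremium")]))
```

Names are passed through unchanged, since the voice and model names are not case-sensitive.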

Instead of a list of voices, you can specify allvoices (nserver.nvs.PreloadVoicesList=allvoices) to preload all installed voices. However, use this setting only when a limited number of voices are installed on the system, because the preloaded voices can consume enough system resources to degrade performance.