<speechsynth>

The <speechsynth> element defines the default parameters that apply to Vocalizer while converting text to synthesized speech.

The parameters that may appear in a <speechsynth> element include:

Parameter

Type

Description

activeprompt_dbs

String

Lists one or more ActivePrompt databases to load for tuning speech output, where each database is specified via a URI. Use one <value> child element for each database URI.

ActivePrompt databases are voice-specific, as indicated in the database header. At runtime Vocalizer only applies those ActivePrompt databases that match the current synthesis voice (see the voice parameter).

dictionaries

String

Lists one or more user dictionaries to load for tuning speech output, where each dictionary is specified via a URI. Use one <value> child element for each dictionary URI.

User dictionaries are language-specific, as indicated in the user dictionary file header. At runtime Vocalizer only does lookups within user dictionaries that match the current synthesis language (see the language parameter).
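The language matching described above can be pictured with a short sketch. It is only an illustration: the `UserDictionary` type, the `dictionaries_for` helper, the example URIs, and the exact-string comparison are all assumptions, since the header format and the matching rules are not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UserDictionary:
    """Hypothetical stand-in for a loaded user dictionary."""
    uri: str        # where the dictionary was loaded from
    language: str   # language declared in the dictionary file header


def dictionaries_for(language: str, loaded: List[UserDictionary]) -> List[UserDictionary]:
    """Keep only the dictionaries whose declared language matches, preserving order.

    Assumption: languages are compared as exact strings.
    """
    return [d for d in loaded if d.language == language]


loaded = [
    UserDictionary("http://example.com/dicts/names_en.dct", "en-US"),
    UserDictionary("http://example.com/dicts/names_fr.dct", "fr-FR"),
    UserDictionary("http://example.com/dicts/products_en.dct", "en-US"),
]
active = dictionaries_for("en-US", loaded)
```

Dictionaries for other languages stay loaded but are not consulted, which is why listing dictionaries for several languages in one configuration is harmless.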

escape_sequence

String

Specifies an alternative to the <ESC> character (ASCII 0x1B) for using Vocalizer control sequences within the input text.

You must define an escape_sequence to use the Vocalizer native markup in control sequences, as the <ESC> character is not permitted in VoiceXML documents.

language

String

Specifies the language to be used for speech synthesis. The value must be an IETF language code (for example, “en-US”) or a Vocalizer language name (for example, “American English”).

language_identifier_languages

String

Lists the languages permitted for language identification, in order of precedence. Use one <value> child element for each language, where the content of the element is the 3-letter language code. Each language in the list takes precedence over the languages listed after it.

The language identification feature determines which language Vocalizer uses to synthesize speech from text when the language is not known.

language_identifier_mode

String

Specifies how Vocalizer behaves when the language identifier assigns a low confidence score to its identification of an unknown language, and this low-confidence first choice would cause a switch in language. There are two possible values:

  • rejection: If the confidence score is too low, Vocalizer continues using the current language.
  • forced-choice: Vocalizer always uses the language with the highest confidence score.
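The difference between the two modes can be sketched as follows. This is a minimal illustration rather than Vocalizer's actual logic: the `pick_language` function, the score dictionary, the 3-letter codes used in the example, and the `ASSUMED_MIN_CONFIDENCE` cutoff are all hypothetical, because this section does not state what score counts as too low.

```python
from typing import Dict

# Hypothetical cutoff; the real threshold is not documented in this section.
ASSUMED_MIN_CONFIDENCE = 0.5


def pick_language(current: str, scores: Dict[str, float], mode: str,
                  min_confidence: float = ASSUMED_MIN_CONFIDENCE) -> str:
    """Return the language to synthesize with, given identifier confidence scores."""
    if not scores:
        return current  # nothing identified, so keep the current language
    best = max(scores, key=scores.get)
    if mode == "forced-choice":
        # Always switch to the top-scoring language, however low its score.
        return best
    if mode == "rejection":
        # Switch only when the top score is high enough; otherwise stay put.
        return best if scores[best] >= min_confidence else current
    raise ValueError(f"unknown language_identifier_mode: {mode!r}")


low_confidence = {"fra": 0.3, "deu": 0.2}
kept = pick_language("eng", low_confidence, "rejection")
switched = pick_language("eng", low_confidence, "forced-choice")
```

With the same low scores, rejection keeps the current language while forced-choice switches to the top candidate.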

language_identifier_scope

String

Specifies when the language identifier feature is used. There are three possible values:

  • user-defined: Use the language identifier only for blocks labeled with the native markup <ESC>\lang=unknown\ control sequence or SSML xml:lang="unknown".
  • message: Use the language identifier automatically on each input message (typically a sentence).
  • none: Disable the language identifier feature.

marker_mode

String

Specifies the types of markers to deliver. The allowable values are zero or more of the following:

  • TTS_MRK_SENTENCE (0x0001) for sentence markers
  • TTS_MRK_WORD (0x0002) for word markers
  • TTS_MRK_PHONEME (0x0004) for phoneme markers
  • TTS_MRK_BOOK (0x0008) for bookmarks, the only marker type supported in MRCP environments
  • TTS_MRK_PARAGRAPH (0x0200) for paragraph markers
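Because each marker type has its own bit value, a selection of marker types can be represented as a single mask. The sketch below shows only that arithmetic, assuming the values are combined with bitwise OR; how the combined value is written in the configuration is not specified here, and the `MarkerMode` class and helper functions are illustrative names.

```python
from enum import IntFlag


class MarkerMode(IntFlag):
    """Marker flag values as listed above."""
    TTS_MRK_SENTENCE = 0x0001
    TTS_MRK_WORD = 0x0002
    TTS_MRK_PHONEME = 0x0004
    TTS_MRK_BOOK = 0x0008
    TTS_MRK_PARAGRAPH = 0x0200


def combine(*flags: MarkerMode) -> MarkerMode:
    """OR the selected flags together; passing no flags yields 0 (no markers)."""
    mask = MarkerMode(0)
    for flag in flags:
        mask |= flag
    return mask


def is_enabled(mask: MarkerMode, flag: MarkerMode) -> bool:
    """Return True when `flag` is set in `mask`."""
    return bool(mask & flag)


sentence_and_bookmarks = combine(MarkerMode.TTS_MRK_SENTENCE, MarkerMode.TTS_MRK_BOOK)
print(hex(sentence_and_bookmarks))  # 0x9
```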

rate

Integer

Specifies the speaking rate on a scale of 1–100 (inclusive), where lower values represent slower speaking rates.

rulesets

String

Specifies one or more user rulesets to load, where each ruleset is specified via a URI. Use one <value> element for each ruleset URI.

User rulesets are language-specific, as indicated in the user ruleset file header. At runtime Vocalizer only applies user rulesets that match the current synthesis language.

ssml_validation

String

Specifies the Vocalizer SSML validation mode.

  • strict: Validate the input against the SSML 1.0 Recommendation with Nuance extensions. If validation fails, Vocalizer logs error messages and aborts the synthesis operation with an error.
  • warn: Perform the same validation, but only log the errors and continue with the synthesis operation.
  • none: Skip validation.

The strict setting is the most robust, as it ensures that Vocalizer does not attempt to handle bad input that could otherwise lead to inaccurate speech synthesis.
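The three modes differ only in whether validation runs and whether a failure stops synthesis. The sketch below illustrates that decision; it is not Vocalizer code. The `should_synthesize` function and the `validate` callable, which stands in for an SSML validator returning a list of error messages, are hypothetical.

```python
import logging
from typing import Callable, List

log = logging.getLogger("ssml_validation_sketch")


def should_synthesize(ssml: str, mode: str,
                      validate: Callable[[str], List[str]]) -> bool:
    """Return True if synthesis should proceed under the given validation mode."""
    if mode == "none":
        return True  # validation is skipped entirely
    if mode not in ("strict", "warn"):
        raise ValueError(f"unknown ssml_validation mode: {mode!r}")
    errors = validate(ssml)
    for message in errors:
        log.error("SSML validation: %s", message)  # both modes log every error
    # Only strict turns validation errors into a failed synthesis request.
    return not (mode == "strict" and errors)
```

For example, with a validator that always reports one error, `should_synthesize(doc, "warn", validator)` returns True and `should_synthesize(doc, "strict", validator)` returns False.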

voice

String

Specifies the name of the voice used for speech synthesis.

voice_model

String

Indicates the type of TTS technology to be used for speech synthesis. The possible values are:

  • full_encryptf8: Value used for 8 kHz voices
  • full_vssq5f22: Value used for 22 kHz voices
  • bet4: Value used for both 8 kHz and 22 kHz voices. It produces higher-quality speech, but uses more disk space, CPU, and memory.

volume

Integer

Specifies the volume for synthesized speech on a scale of 0–100 (inclusive), where lower values produce quieter output.
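Note that the two integer parameters have different lower bounds: rate starts at 1, while volume starts at 0. A minimal sketch of a pre-flight range check, using hypothetical helper names (`check_rate`, `check_volume`) that are not part of Vocalizer:

```python
def check_rate(rate: int) -> int:
    """Return `rate` unchanged if it lies in 1..100 inclusive, else raise ValueError."""
    if not 1 <= rate <= 100:
        raise ValueError(f"rate must be between 1 and 100, got {rate}")
    return rate


def check_volume(volume: int) -> int:
    """Return `volume` unchanged if it lies in 0..100 inclusive, else raise ValueError."""
    if not 0 <= volume <= 100:
        raise ValueError(f"volume must be between 0 and 100, got {volume}")
    return volume
```

A value of 0 therefore passes `check_volume` but is rejected by `check_rate`.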