Synthesis resources

Synthesis resources are objects that facilitate or improve speech synthesis. The principal resource is a mandatory voice pack; optional resources include user dictionaries, ActivePrompt databases, rulesets, and audio files.

To use these optional resources, upload them to storage using UploadRequest, then reference them in a synthesis request with the SynthesisResource type and uri fields. You may also specify user dictionaries inline, using the SynthesisResource body field.

See the following scenarios for details about each type of resource.

Voice pack

TTSaaS works with one or more factory voice packs, available in several languages and locales.

For the list of voices available in the Mix environment, see Geographies.

You may also query your environment programmatically for supported voices using GetVoicesRequest. See Get voices in the sample synthesis client for an example.

For issues relating to voices, see Known issues.

User dictionary

A user dictionary alters the default pronunciation of words spoken by TTSaaS. For example, you can define the pronunciation of words from foreign languages, expand special acronyms, and tune the pronunciation of words with unusual spelling.

User dictionaries are created using Nuance Vocalizer Studio. For details, see “Specifying pronunciations with user dictionaries” in the Nuance Vocalizer for Enterprise documentation.

The steps for using a user dictionary are:

  1. Compile the source dictionary using Nuance Vocalizer Studio to create a .dcb file, for example, coffee-dictionary.dcb.

  2. Upload the dictionary to storage with UploadRequest. See Upload user dictionary in the sample storage client.

    UploadResponse returns the complete URN for this dictionary:

    uri: "urn:nuance-mix:tag:tuning:lang/coffee_app/coffee_dict/en-us/mix.tts?type=userdict"
    
  3. Reference the dictionary using its URN in a synthesis request. See Run client with resources in the sample synthesis client.

    synthesis_resource = SynthesisResource()
    synthesis_resource.type = EnumResourceType.USER_DICTIONARY
    synthesis_resource.uri = "urn:nuance-mix:tag:tuning:lang/coffee_app/coffee_dict/en-us/mix.tts"
    request.input.resources.extend([synthesis_resource])
    

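Note that UploadResponse returns the URN with a ?type=userdict query appended, while the synthesis request in step 3 references the resource without it. The helper below (an illustrative sketch, not part of the TTSaaS API; the name strip_urn_query is invented here) derives the reference form from the uploaded form:

```python
def strip_urn_query(urn):
    # Drop the "?type=..." query that UploadResponse appends, leaving
    # the bare URN used in SynthesisResource.uri.
    return urn.split("?", 1)[0]

uploaded = "urn:nuance-mix:tag:tuning:lang/coffee_app/coffee_dict/en-us/mix.tts?type=userdict"
print(strip_urn_query(uploaded))
# urn:nuance-mix:tag:tuning:lang/coffee_app/coffee_dict/en-us/mix.tts
```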
To remove a resource from storage, use DeleteRequest. See Delete resource in the sample storage client.

Inline dictionary

Alternatively, you may provide a dictionary inline in the request, rather than referencing it from storage.

The dictionary shown below includes the pronunciation of “zero,” the expansion and pronunciation of “addr” and “adm,” plus the expansion of several abbreviated words and acronyms. To use this as an inline dictionary:

  1. Compile the source dictionary using Nuance Vocalizer Studio or its conversion tool, dictcpl. In this example, the resulting compiled file is user_dictionary.dcb.

    [Header]
    Language = ENU
    [SubHeader]
    Content = EDCT_CONTENT_BROAD_NARROWS
    Representation = EDCT_REPR_SZZ_STRING
    [Data]
    zero // #'zi.R+o&U#
    addr // #'@.dR+Es#
    adm // #@d.'2mI.n$.'stR+e&I.S$n#
    [SubHeader]
    Content=EDCT_CONTENT_ORTHOGRAPHIC
    Representation=EDCT_REPR_SZ_STRING
    [Data]
    Info      Information
    IT        "Information Technology"
    DLL       "Dynamic Link Library"
    A-level   "advanced level"
    Afr       africa
    Acc       account
    TEL       telephone
    Anon      anonymous
    AP        "associated press"
    
  2. Read the compiled dictionary from a local file into the body field. This example shows user_dictionary.dcb in flow.py, which serves as input to the Sample synthesis client.

    request.input.text.text = "I need to find a DLL."
    
    synthesis_resource = SynthesisResource()
    synthesis_resource.type = EnumResourceType.USER_DICTIONARY
    with open('/path/to/user_dictionary.dcb', 'rb') as f:
        synthesis_resource.body = f.read()
    request.input.resources.extend([synthesis_resource])
    
  3. Run client.py, the main file in the sample synthesis client. The audio output is: “I need to find a dynamic link library.”
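
To see at a glance what the orthographic section maps, here is a plain-Python sketch (not part of any Nuance tooling) that parses a few [Data] lines from the source dictionary above; note this reads the text source, not the compiled binary .dcb. shlex keeps the quoted multi-word expansions as single values:

```python
import shlex

# A few orthographic [Data] lines from the source dictionary above.
source = '''\
Info      Information
DLL       "Dynamic Link Library"
A-level   "advanced level"
'''

expansions = {}
for line in source.splitlines():
    key, value = shlex.split(line)  # quoted expansions stay one value
    expansions[key] = value

print(expansions["DLL"])
# Dynamic Link Library
```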

ActivePrompt database

An ActivePrompt database is a collection of digital audio recordings and pronunciation instructions that can be inserted into synthesized speech using the Nuance prompt control code.

ActivePrompt databases are created using Nuance Vocalizer Studio. For details, see “Tuning TTS output with ActivePrompts” in the Nuance Vocalizer for Enterprise documentation.

To create and use an ActivePrompt database:

  1. Create the database using Nuance Vocalizer Studio.

  2. Rename the database to index.dat, and add the database and all recordings to a zip file without a root folder, for example, coffee-prompts.zip.

  3. Upload the database to storage using UploadRequest. See Upload ActivePrompts in the sample storage client.

    UploadResponse returns the complete URN for this database:

    uri: "urn:nuance-mix:tag:tuning:voice/coffee_app/coffee_prompts/evan/mix.tts?type=activeprompt"
    
  4. Load the database into a synthesis session with its URN. See Run client with resources in the sample synthesis client.

    synthesis_resource = SynthesisResource()
    synthesis_resource.type = EnumResourceType.ACTIVEPROMPT_DB
    synthesis_resource.uri = "urn:nuance-mix:tag:tuning:voice/coffee_app/coffee_prompts/evan/mix.tts"
    request.input.resources.extend([synthesis_resource])
    
  5. Reference prompts in the database using the prompt control code.

    Token(control_code=ControlCode(key="prompt", value="coffee::confirm_order")),
    Token(text="Thanks ")
    

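Step 2 above can be sketched with Python's standard zipfile module. The file names are examples; the only requirements taken from the step are that the database is stored as index.dat and that all entries sit at the archive root (no root folder):

```python
import os
import zipfile

def build_activeprompt_zip(db_path, recording_paths, zip_path):
    """Package the renamed database and its recordings with no root folder."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        # arcname places every entry at the top level of the archive
        zf.write(db_path, arcname="index.dat")
        for rec in recording_paths:
            zf.write(rec, arcname=os.path.basename(rec))
```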
To remove a resource from storage, use DeleteRequest. See Delete resource in the sample storage client.

Ruleset

A user ruleset is a set of match-and-replace rules that replace sections of input text during voice synthesis. For example, a ruleset may expand an abbreviation (from “PIN” to “personal identification number”), or convert currency symbols into full words.

Whereas user dictionaries only support search-and-replace for complete words or phrases, user rulesets support any search pattern that can be expressed using regular expressions. You can use rulesets to search for multiple words, part of a word, or a repeated pattern. For example, you can use an expression to find all uses of a currency symbol, and replace it with words (“dollars” or “euros”) regardless of the amounts.
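
The ruleset file syntax itself is defined in the Vocalizer documentation, but the match-and-replace behavior can be illustrated in plain Python with the re module. The rule below, which expands dollar amounts, shows the kind of pattern a ruleset might contain; it is not actual ruleset syntax:

```python
import re

def expand_dollars(text):
    # "$5" or "$5.25" -> "5 dollars" / "5.25 dollars", regardless of amount
    return re.sub(r"\$(\d+(?:\.\d+)?)", r"\1 dollars", text)

print(expand_dollars("The total is $5.25, not $7."))
# The total is 5.25 dollars, not 7 dollars.
```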

Rulesets are created following the instructions in “Rulesets” in the Nuance Vocalizer for Enterprise documentation. Only text rulesets are allowed: binary (or encrypted) rulesets are not supported.

To include rulesets in your applications:

  1. Define the ruleset as a text file, for example, coffee-ruleset.rst.txt.

  2. Upload the ruleset to storage using UploadRequest. See Upload rulesets in the sample storage client.

    UploadResponse returns the complete URN for the ruleset:

    uri: "urn:nuance-mix:tag:tuning:lang/coffee_app/coffee_rules/en-us/mix.tts?type=textruleset"
    
  3. Load the ruleset into a synthesis session with its URN. See Run client with resources in the sample synthesis client.

    synthesis_resource = SynthesisResource()
    synthesis_resource.type = EnumResourceType.TEXT_USER_RULESET
    synthesis_resource.uri = "urn:nuance-mix:tag:tuning:lang/coffee_app/coffee_rules/en-us/mix.tts"
    request.input.resources.extend([synthesis_resource])
    

To remove a resource from storage, use DeleteRequest. See Delete resource in the sample storage client.

Audio file

An audio file may be included in SSML input or tokenized sequences to provide speech or sounds during synthesis. You may include audio files using the SSML <audio> element or the audio control code.

You may optionally include alternative text in the SSML audio element as <audio src="file.wav">Alt text</audio>. If the file is not found or is not a WAV file, TTSaaS synthesizes the alternative text and includes it in the results.

This example references an audio file in cloud storage via URN:

<speak>Please leave your name after the tone. 
<audio src="urn:nuance-mix:tag:tuning:audio/coffee_app/beep/mix.tts">Beep</audio>
</speak>

And this example references an audio file via a secure URL:

<speak>Please leave your name after the tone. 
<audio src="https://<host>/audio/beep.wav">Beep</audio>
</speak>
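
Either SSML input can also be assembled programmatically, which keeps the src attribute and the alternative text properly escaped. This sketch uses Python's standard xml.etree; the element names are standard SSML and the URN is the one from the first example:

```python
import xml.etree.ElementTree as ET

speak = ET.Element("speak")
speak.text = "Please leave your name after the tone. "
audio = ET.SubElement(
    speak, "audio",
    src="urn:nuance-mix:tag:tuning:audio/coffee_app/beep/mix.tts")
audio.text = "Beep"  # synthesized only if the file is missing or not WAV

print(ET.tostring(speak, encoding="unicode"))
```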

Tokenized sequences do not support alternative text for the audio file. With the Synthesize method, if the audio file is not found or is not a WAV file, TTSaaS reports an error but synthesizes any text tokens in the sequence, ignoring the audio file. With UnarySynthesize, TTSaaS does not synthesize the text tokens and returns no audio.

If there is no alternative text or text token, TTSaaS reports errors for unavailable or non-WAV files.