Synthesizer HTTP API

TTSaaS includes an HTTP API for requesting voices and synthesis operations. It is based on the Synthesizer gRPC API and offers two commands: voices and synthesize.

This API is a transcoded version of the main gRPC API, so it follows the standard protobuf JSON mapping for messages and field names.

Base URL and authorization

The endpoint for TTSaaS HTTP commands in the Mix environment is:

https://tts.api.nuance.com/api/v1/

This service requires an authorization token. To generate the token, you can use this shell script, get-token.sh, replacing the CLIENT_ID and SECRET values with your credentials from Mix. See Prerequisites from Mix.

The script changes the colons in your client ID to their percent-encoded form, so you may enter your client ID as is.

CLIENT_ID=<Mix client ID>
SECRET=<Mix client secret>
CLIENT_ID=${CLIENT_ID//:/%3A}

export MY_TOKEN="`curl -s -u $CLIENT_ID:$SECRET \
https://auth.crt.nuance.com/oauth2/token \
-d 'grant_type=client_credentials' -d 'scope=tts' \
| jq -j .access_token`"
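
The colon substitution in the script is ordinary percent-encoding. As a minimal sketch, the same transformation in Python (the client ID shown is a hypothetical placeholder, not a real credential):

```python
from urllib.parse import quote

# Percent-encode a Mix client ID so its colons become %3A,
# matching the CLIENT_ID substitution in get-token.sh.
client_id = "appID:namespace:user"  # hypothetical placeholder value
encoded = quote(client_id, safe="")
print(encoded)  # appID%3Anamespace%3Auser
```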

“Source” this script to generate an authorization token and make it available in the current shell. Then test the URL with a simple voices request using cURL:

source get-token.sh

curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/voices \
-d '{ "voice": { "name": "Evan" } }'

{
 "voices": [
  {
   "name": "Evan",
   "model": "enhanced",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "MALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "1.1.1",
   "foreignLanguages": []
  }
 ]
}

You must provide the token when calling the service. For example:

  • In a cURL command:

     curl -H "Authorization: Bearer $MY_TOKEN" https://tts.api.nuance.com/api/v1/voices
    
  • In a REST client, you may either generate a token manually and enter it in your request, or have your development environment generate it for you.

    Authorization: Bearer <token>
    
  • In a Python client:

    http_headers['Authorization'] = "Bearer {}".format(token)
    

Your authorization token expires after a short period of time. Source get-token.sh again when you receive a 401 Unauthorized error, meaning the request could not be authorized.
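
Using only the standard library, the header can be attached to a request as in this sketch (the voices_request helper is ours, not part of any client library):

```python
import json
import urllib.request

def voices_request(token, body=None):
    """Build (but do not send) a /voices request carrying the Bearer token."""
    data = json.dumps(body).encode("utf-8") if body else None
    return urllib.request.Request(
        "https://tts.api.nuance.com/api/v1/voices",
        data=data,
        headers={"Authorization": "Bearer " + token},
    )

req = voices_request("my-token", {"voice": {"name": "Evan"}})
print(req.get_header("Authorization"))  # Bearer my-token
```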

/api/v1/voices

Queries the voice packs to learn which voices are available. Optionally include parameters to filter the results.

GET https://tts.api.nuance.com/api/v1/voices

The parameters for the voices command are:

Voices
Name In Type Description
Authorization header object Mandatory. Authorization token, in the form Bearer <token>.
voice body voice Optional. Filter the voices to retrieve. For example, set language to en-US to return only American English voices.

A successful response details the available voices, filtered when requested. See Status codes for other responses.

Get all available voices (cURL example):

curl -H "Authorization: Bearer $MY_TOKEN" https://tts.api.nuance.com/api/v1/voices

{
 "voices": [
  {
   "name": "Allison",
   "model": "standard",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "5.2.3.12283",
   "foreignLanguages": []
  },
  {
   "name": "Allison",
   "model": "standard",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 8000,
   "languageTlw": "enu",
   "restricted": false,
   "version": "5.2.3.12283",
   "foreignLanguages": []
  },
  {
   "name": "Ava-Ml",
   "model": "enhanced",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "3.0.1",
   "foreignLanguages": [
    "es-mx"
   ]
  },
  {
   "name": "Chloe",
   "model": "standard",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "5.2.3.15315",
   "foreignLanguages": []
  },
. . .

voice (in voices)

Filters the requested voices in the voices command. It contains one of the following:

Voice in get voices
Name Type Description
name string The voice’s name, for example, Evan.
model string The voice’s quality model, for example, enhanced or standard. (For backward compatibility, xpremium-high or xpremium are also accepted.)
language string IETF language code, for example, en-US. Search for voices with a specific language. Some voices support multiple languages.
age_group string Search for adult or child voices, using a keyword: ADULT (default) or CHILD.
gender string Search for voices with a certain gender, using a keyword: ANY (default), MALE, FEMALE, NEUTRAL.
sample_rate_hz integer Search for a certain native sample rate.
language_tlw string Three-letter language code (for example, enu for American English) for configuring language identification.
Filter results to retrieve voice name Evan:

curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/voices -d '{ "voice": { "name": "evan" } }'

{
 "voices": [
  {
   "name": "Evan",
   "model": "enhanced",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "MALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "1.1.1",
   "foreignLanguages": []
  }
 ]
}

Filter results to retrieve all French Canadian voices:

curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/voices -d '{ "voice": { "language": "fr-ca" } }'

{
 "voices": [
  {
   "name": "Amelie-Ml",
   "model": "enhanced",
   "language": "fr-ca",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 22050,
   "languageTlw": "frc",
   "restricted": false,
   "version": "2.1.1",
   "foreignLanguages": [
    "en-us",
    "en-gb",
    "es-mx"
   ]
  },
  {
   "name": "Chantal",
   "model": "standard",
   "language": "fr-ca",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 22050,
   "languageTlw": "frc",
   "restricted": false,
   "version": "2.1.0",
   "foreignLanguages": []
  },
  {
   "name": "Nicolas",
   "model": "standard",
   "language": "fr-ca",
   "ageGroup": "ADULT",
   "gender": "MALE",
   "sampleRateHz": 22050,
   "languageTlw": "frc",
   "restricted": false,
   "version": "2.0.0",
   "foreignLanguages": []
  }
 ]
}
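
Filtering can also be applied client-side to the decoded JSON. A minimal sketch (the filter_voices helper is ours) that narrows a voices response by gender or model:

```python
def filter_voices(response, gender=None, model=None):
    """Return the voices in a /voices JSON response matching the given fields."""
    return [
        v for v in response.get("voices", [])
        if (gender is None or v.get("gender") == gender)
        and (model is None or v.get("model") == model)
    ]

response = {"voices": [
    {"name": "Chantal", "model": "standard", "gender": "FEMALE"},
    {"name": "Nicolas", "model": "standard", "gender": "MALE"},
]}
print([v["name"] for v in filter_voices(response, gender="MALE")])  # ['Nicolas']
```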

/api/v1/synthesize

Sends a synthesis request and returns a unary (non-streamed) synthesis response. The request specifies a mandatory voice and input text, along with optional audio parameters, synthesis resources, and event subscriptions.

POST https://tts.api.nuance.com/api/v1/synthesize

The parameters for the synthesize command are:

Synthesize
Name In Type Description
Authorization header object Mandatory. Authorization token, in the form Bearer <token>.
voice body voice Mandatory. The voice to perform the synthesis.
audio_params body audio_params Output audio parameters, such as encoding and volume. Default is PCM audio at 22050 Hz.
input body input Mandatory. Input text to synthesize, tuning data, etc.
event_params body event_params Markers and other info to include in server events returned during synthesis.
client_data body map<string,string> Map of client-supplied key:value pairs to inject into the call log.
user_id body string Identifies a specific user within the application.

Synthesize plain text (cURL example):

curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/synthesize \
-d '{ "voice": { "name": "Evan", "model": "enhanced" }, "input": { "text": { "text": "This is a test. A very simple test."} } }'

For examples of the results, see Response to synthesize below.

voice (in synthesize)

In the synthesize command, this mandatory parameter specifies the voice to use for the synthesis operation. The other entries in the voice parameter are not used for synthesis.

Voice in synthesize
Name Type Description
name string Mandatory. The voice’s name, for example, Evan.
model string Mandatory. The voice’s quality model, for example, enhanced or standard. (For backward compatibility, xpremium-high or xpremium are also accepted.)

Mandatory voice parameters identify the voice to perform the synthesis:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  }
}

audio_params

Audio-related parameters for synthesis, including encoding, volume, and audio length. Included in synthesize. The default is PCM audio at 22050 Hz.

Audio parameters
Name Type Description
audio_format audio_format Audio encoding. Default PCM 22050 Hz.
volume_percentage integer Volume amplitude, from 0 to 100. Default 80.
speaking_rate_factor number Speaking rate, from 0 to 2.0. Default 1.0.
audio_chunk_duration_ms integer Maximum duration, in ms, of an audio chunk delivered to the client, from 1 to 60000. Default is 20000 (20 seconds). When this parameter is large enough (for example, 20 or 30 seconds), each audio chunk contains an audible segment surrounded by silence.
target_audio_length_ms integer Maximum duration, in ms, of synthesized audio. When greater than 0, the server stops ongoing synthesis at the first sentence end, or silence, closest to the value.
disable_early_emission boolean By default, audio segments are emitted as soon as possible, even if they are not audible. This behavior may be disabled.

Optional audio parameters set audio to Ogg Opus and include three other options:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  },
  "audio_params":{
    "audio_format":{
      "ogg_opus":{
        "sample_rate_hz":16000
      }
    },
    "volume_percentage": 100,
    "speaking_rate_factor": 1.2,
    "target_audio_length_ms": 10
  }
}

audio_format

Audio encoding of synthesized text. Included in audio_params.

Audio format
Name Type Description
pcm pcm Signed 16-bit little endian PCM.
alaw alaw G.711 A-law, 8 kHz.
ulaw ulaw G.711 μ-law, 8 kHz.
ogg_opus ogg_opus Ogg Opus, 8 kHz, 16 kHz, or 24 kHz.
opus opus Opus, 8 kHz, 16 kHz, or 24 kHz. The audio will be sent one Opus packet at a time.

pcm

The PCM sample rate. Included in audio_format.

PCM audio
Name Type Description
sample_rate_hz integer Output sample rate in Hz. Supported values: 8000, 11025, 16000, 22050, 24000.

PCM sample rate changed to 16000 (from default 22050):

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  },
  "audio_params":{
    "audio_format":{
      "pcm":{
        "sample_rate_hz": 16000
      }
    }
  }
}

alaw

The A-law audio format. Included in audio_format. G.711 audio formats are set to 8 kHz.

Audio format changed to A-law:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  },
  "audio_params":{
    "audio_format":{
      "alaw":{}
    }
  }
}

ulaw

The μ-law audio format. Included in audio_format. G.711 audio formats are set to 8 kHz.

Audio format changed to μ-law:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  },
  "audio_params":{
    "audio_format":{
      "ulaw":{}
    }
  }
}

ogg_opus

The Ogg Opus output rate. Included in audio_format.

Ogg Opus audio
Name Type Description
sample_rate_hz integer Output sample rate in Hz. Supported values: 8000, 16000, 24000.
bit_rate_bps integer Valid range is 500 to 256000 bps. Default 28000.
max_frame_duration_ms number Opus frame size in ms: 2.5, 5, 10, 20, 40, 60. Default 20.
complexity integer Computational complexity. A complexity of 0 means the codec default.
vbr vbr Variable bitrate. On by default.

Audio format changed to Ogg Opus:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  },
  "audio_params":{
    "audio_format":{
      "ogg_opus":{
        "sample_rate_hz":16000
      }
    }
  }
}

opus

Opus output rate. Included in audio_format.

Opus audio
Name Type Description
sample_rate_hz integer Output sample rate in Hz. Supported values: 8000, 16000, 24000.
bit_rate_bps integer Valid range is 500 to 256000 bps. Default 28000.
max_frame_duration_ms number Opus frame size in ms: 2.5, 5, 10, 20, 40, 60. Default 20.
complexity integer Computational complexity. A complexity of 0 means the codec default.
vbr vbr Variable bitrate. On by default.
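
There is no example above for Opus output. By analogy with the ogg_opus example, a request selecting Opus might look like this (a sketch assembled from the field tables above, not verified server output):

```json
{
  "voice": { "name": "Evan", "model": "enhanced" },
  "input": { "text": { "text": "This is a test. A very simple test." } },
  "audio_params": {
    "audio_format": {
      "opus": {
        "sample_rate_hz": 16000,
        "bit_rate_bps": 28000
      }
    }
  }
}
```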

vbr

Settings for variable bitrate. Included in ogg_opus and opus. Turned on by default.

Variable bitrate
Name Number Description
VARIABLE_BITRATE_ON 0 Use variable bitrate. Default.
VARIABLE_BITRATE_OFF 1 Do not use variable bitrate.
VARIABLE_BITRATE_CONSTRAINED 2 Use constrained variable bitrate.

input

Text to synthesize and synthesis parameters, including tuning data, etc. Included in synthesize. The type of input may be plain text, SSML, or a sequence of plain text and Nuance control codes.

Input
Name Type Description
text text Plain text input.
ssml ssml SSML input, including text and SSML elements.
tokenized_sequence tokenized_sequence Sequence of text and Nuance control codes.
resources resources Repeated. Synthesis resources (user dictionaries, rulesets, etc.) to tune synthesized audio. Default empty.
lid_params lid_params LID parameters.
download_params download_params Remote file download parameters.

Minimal mandatory input:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  }
}

text

Input for synthesizing plain text. The encoding must be UTF-8. Included in input.

Text input
Name Type Description
text string Plain input text in UTF-8 encoding.
uri string Remote URI to the plain input text. Not supported in Nuance-hosted TTS.

ssml

Input for synthesizing SSML. Included in input. See SSML input for a list of supported elements.

SSML input
Name Type Description
text string SSML input text and elements.
uri string Remote URI to the SSML input text. Not supported in Nuance-hosted TTS.
ssml_validation_mode ssml_validation_mode SSML validation mode. Default STRICT.

Minimal SSML input:

{
  "voice": {
    "name": "Evan",
    "model": "enhanced"
  },
  "input": {
    "ssml": {
      "text": "<speak>This is an SSML test. A super simple test.</speak>"
    }
  }
}

ssml_validation_mode

SSML validation mode when using SSML input. Included in ssml. Strict by default but can be relaxed.

SSML validation mode
Name Number Description
STRICT 0 Strict SSML validation. Default.
WARN 1 Give warning only.
NONE 2 Do not validate.

tokenized_sequence

Input for synthesizing a sequence of plain text and Nuance control codes. Included in input.

Tokenized sequence
Name Type Description
tokens tokens Repeated. Sequence of text and control codes.

tokens

The unit when using tokenized_sequence for input. Included in tokenized_sequence. Each token can be either plain text or a Nuance control code. See Tokenized sequence for supported codes.

Tokens
Name Type Description
text string Plain input text.
control_code control_code Nuance control code.

control_code

Nuance control code that specifies how text should be spoken, similarly to SSML. Included in tokens.

Control code
Name Type Description
key string Name of the control code, for example, pause.
value string Value of the control code.
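
There is no example above for tokenized input. As a sketch based on the tokens and control_code tables (the pause value is illustrative):

```json
{
  "voice": { "name": "Evan", "model": "enhanced" },
  "input": {
    "tokenized_sequence": {
      "tokens": [
        { "text": "The total is" },
        { "control_code": { "key": "pause", "value": "300" } },
        { "text": "five dollars." }
      ]
    }
  }
}
```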

resources

A resource for tuning the synthesized output. Included in input.

Resources
Name Type Description
type type Resource type, for example, user dictionary, etc. Default USER_DICTIONARY.
uri string URI to the remote resource. Either a URL or the URN of a resource previously uploaded to cloud storage with the Storage gRPC API. See URNs for the format.
body bytes For resource type USER_DICTIONARY, the contents of the file.

type

The type of synthesis resource to tune the output. Included in resources. User dictionaries provide custom pronunciations, rulesets apply search-and-replace rules to input text, and ActivePrompt databases help tune synthesized audio under certain conditions, using Nuance Vocalizer Studio.

Type of resource
Name Number Description
USER_DICTIONARY 0 User dictionary (application/edct-bin-dictionary). Default.
TEXT_USER_RULESET 1 Text user ruleset (application/x-vocalizer-rettt+text).
BINARY_USER_RULESET 2 Not supported. Binary user ruleset (application/x-vocalizer-rettt+bin).
ACTIVEPROMPT_DB 3 ActivePrompt database (application/x-vocalizer-activeprompt-db).
ACTIVEPROMPT_DB_AUTO 4 ActivePrompt database with automatic insertion (application/x-vocalizer-activeprompt-db;mode=automatic).
SYSTEM_DICTIONARY 5 Nuance system dictionary (application/sdct-bin-dictionary).

lid_params

Parameters for controlling the language identifier. Included in input. The language identifier runs on input blocks labeled with the control code lang unknown or the SSML attribute xml:lang="unknown", and automatically restricts the matched languages to the installed voices. The languages parameter below further limits the permissible languages, and also sets the order of precedence (first to last) when they have equal confidence scores.

Language identification parameters
Name Type Description
disable boolean Whether to disable language identification. Turned on by default.
languages string Repeated. List of three-letter language codes (for example, enu, frc, spm) to restrict language identification results, in order of precedence. Use voices to obtain the three-letter codes, returned in language_tlw. Default empty.
always_use_highest_confidence boolean If enabled, language identification always chooses the language with the highest confidence score, even if the score is low. Default false, meaning use language with any confidence.

download_params

Parameters for remote file download, whether for input text (input.uri) or a synthesis resource (resource.uri). Included in input.

Download parameters
Name Type Description
headers map<string,string> Map of HTTP header name,value pairs to include in outgoing requests. Supported headers: max_age, max_stale.
request_timeout_ms integer Request timeout in ms. Default (0) means server default, usually 30000 (30 seconds).
refuse_cookies boolean Whether to disable cookies. By default, HTTP requests accept cookies.

event_params

Event subscription parameters. Included in synthesize. Requested events are reported in the response.

Event parameters
Name Type Description
send_sentence_marker_events boolean Sentence marker. Default: do not send.
send_word_marker_events boolean Word marker. Default: do not send.
send_phoneme_marker_events boolean Phoneme marker. Default: do not send.
send_bookmark_marker_events boolean Bookmark marker. Default: do not send.
send_paragraph_marker_events boolean Paragraph marker. Default: do not send.
send_visemes boolean Lipsync information. Default: do not send.
send_log_events boolean Whether to log events during synthesis. By default, logging is turned off.
suppress_input boolean Whether to omit input text and URIs from log events. By default, these items are included.

Event parameters:

{
  "voice": {
    "name": "Evan",
    "model": "enhanced"
   },
  "input": {
    "text": {
      "text": "This is a test. A very simple test."
    }
  },
  "event_params": {
        "send_log_events": true,
        "send_sentence_marker_events": true,
        "send_word_marker_events": true
  }
}

Response to synthesize

The synthesize command returns a unary (non-streamed) message containing:

  • A status code, indicating completion or failure of the request. See Status codes.
  • A list of events the client has requested. See event_params for details.
  • The complete audio buffer of the synthesized text, in base64 format.
curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/synthesize \
-d '{ "voice": { "name": "Evan", "model": "enhanced" }, "input": { "text": { "text": "This is a test. A very simple test."} } }'

{
  "status": {
    "code": 200,
    "message": "OK",
    "details": ""
  },
  "audio": "AAAAAAAAA... (synthesized audio in base64 format)"
}
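
The audio field is base64 text, not raw bytes. A small Python sketch (the audio_duration_ms helper is ours) that decodes it and estimates its duration for the default 22050 Hz, 16-bit PCM output:

```python
import base64

def audio_duration_ms(b64_audio, sample_rate_hz=22050, bytes_per_sample=2):
    """Duration in ms of base64-encoded 16-bit mono PCM from synthesize."""
    pcm = base64.b64decode(b64_audio)
    samples = len(pcm) // bytes_per_sample
    return samples * 1000.0 / sample_rate_hz

# One second of silence at 22050 Hz: 22050 samples * 2 bytes each.
silence = base64.b64encode(b"\x00" * 44100).decode("ascii")
print(audio_duration_ms(silence))  # 1000.0
```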

Sample client for WAV output

Additional processing is required to convert this audio to a playable format. For example, the Python script http-wav-client.py decodes the base64 audio and prepends a WAV header:

#!/usr/bin/env python3

import requests as req
import base64
import os
import argparse

# Generates the .wav file header for a given set of parameters
def generate_wav_header(sampleRate, bitsPerSample, channels, datasize, formattype):
    # (4byte) Marks file as RIFF
    o = bytes("RIFF", 'ascii')
    # (4byte) File size in bytes excluding this and RIFF marker
    o += (datasize + 36).to_bytes(4, 'little')
    # (4byte) File type
    o += bytes("WAVE", 'ascii')
    # (4byte) Format Chunk Marker
    o += bytes("fmt ", 'ascii')
    # (4byte) Length of above format data
    o += (16).to_bytes(4, 'little')
    # (2byte) Format type (1 - PCM)
    o += (formattype).to_bytes(2, 'little')
    # (2byte) Will always be 1 for TTS
    o += (channels).to_bytes(2, 'little')
    # (4byte)
    o += (sampleRate).to_bytes(4, 'little')
    o += (sampleRate * channels * bitsPerSample // 8).to_bytes(4, 'little')  # (4byte)
    o += (channels * bitsPerSample // 8).to_bytes(2,'little')               # (2byte)
    # (2byte)
    o += (bitsPerSample).to_bytes(2, 'little')
    # (4byte) Data Chunk Marker
    o += bytes("data", 'ascii')
    # (4byte) Data size in bytes
    o += (datasize).to_bytes(4, 'little')

    return o

token = os.getenv('MY_TOKEN')

parser = argparse.ArgumentParser(description='TTS HTTP Client')

options = parser.add_argument_group("options")
options.add_argument("--wav", action="store_true",
                     help="Save audio file in WAVE format")
options.add_argument("--voice", nargs="?",
                     help="Voice name (default=Evan)", default="Evan")
options.add_argument("--model", nargs="?",
                     help="Voice model type (default=enhanced)", default="enhanced")
options.add_argument("--type", nargs="?",
                     help="Input type: text or ssml (default=text)", default="text")
options.add_argument("--input", nargs="?",
                     help="Input text (default=This is a test)", default="This is a test.")

args = parser.parse_args()

http_headers = {}
http_headers['Authorization'] = "Bearer {}".format(token)

formatted_data = '{{ "voice": {{ "name": "{voice_name}", "model": "{model_name}" }}, "input": {{ "{input_type}": {{ "text": "{input_text}"}} }} }}'.format(voice_name=args.voice, model_name=args.model, input_type=args.type, input_text=args.input)

response = req.post('https://tts.api.nuance.com/api/v1/synthesize', data=formatted_data, headers=http_headers)

if response.status_code != 200:
    raise Exception("Failed to synthesize. Status: {}".format(response.status_code))

json_response = response.json()

if json_response["status"]["code"] != 200:
    print("Failed to synthesize. Message: {}. Status: {}".format(json_response["status"]["message"], json_response["status"]["code"]))
else:
    decoded_audio_response = base64.b64decode(json_response["audio"])

    waveheader = generate_wav_header(22050, 16, 1, len(decoded_audio_response), 1)

    if args.wav:
        with open("output.wav", "wb") as output_file:
            output_file.write(waveheader)
            output_file.write(decoded_audio_response)
            print("Audio successfully written to", output_file.name)
    else:
        with open("output.raw", "wb") as output_file:
            output_file.write(decoded_audio_response)
            print("Audio successfully written to", output_file.name)

Copy this code into a text file named http-wav-client.py. If you don’t have the Python requests module, install it with pip install requests.

This client reads the authorization token from the MY_TOKEN environment variable and saves the synthesized audio, either as raw PCM in output.raw or, with the --wav argument, as a WAV file named output.wav.

Source the get-token.sh script (see Base URL and authorization) to generate and export an authorization token, then call the Python client. This client accepts the following arguments:

python3 http-wav-client.py --help

usage: http-wav-client.py [-h] [--wav] [--voice [VOICE]] [--model [MODEL]] [--type [TYPE]] [--input [INPUT]]

TTS HTTP Client

options:
  -h, --help       show this help message and exit

options:
  --wav            Save audio file in WAVE format
  --voice [VOICE]  Voice name (default=Evan)
  --model [MODEL]  Voice model type (default=enhanced)
  --type [TYPE]    Input type: text or ssml (default=text)
  --input [INPUT]  Input text (default=This is a test)

This example uses the default voice and input but sets the output file to WAV format:

source get-token.sh

python3 http-wav-client.py --wav

Audio successfully written to output.wav

Optionally use a different voice and specify your own input:

python3 http-wav-client.py --wav --voice "Zoe-Ml" --input "Shall I compare thee to a summers day"

Audio successfully written to output.wav

Or change to SSML input:

python3 http-wav-client.py --wav --voice "Zoe-Ml" --type "ssml" \
--input "<speak>Thou art more lovely and more temperate.</speak>"

Audio successfully written to output.wav

Your authorization token expires after a short period of time. Source get-token.sh again when you get status error 401: Failed to synthesize. See Status codes for other codes.

Synthesizer HTTP on Windows

To use the Synthesizer HTTP API on Windows, you generate an authorization token and then call the API using cURL or the Python client. Copy the following into a Windows batch file named run-http.bat.

This file contains several cURL and Python commands. Enter your Mix credentials as described in Prerequisites from Mix, then uncomment the command you want.

@echo off
setlocal enabledelayedexpansion

set CLIENT_ID=<Mix client ID, starting with appID:>
set SECRET=<Mix client secret>
rem Change colons (:) to %3A in client ID
set CLIENT_ID=!CLIENT_ID::=%%3A!

set command=curl -s ^
-u %CLIENT_ID%:%SECRET% ^
-d "grant_type=client_credentials" -d "scope=tts" ^
https://auth.crt.nuance.com/oauth2/token

for /f "delims={}" %%a in ('%command%') do (
  for /f "tokens=1 delims=:, " %%b in ("%%a") do set key=%%b
  for /f "tokens=2 delims=:, " %%b in ("%%a") do set value=%%b
  goto done
)

:done

rem Remove quotes
set MY_TOKEN=!value:"=!

rem Uncomment the command you want to run

rem 1) This gives information about Evan
REM curl -H "Authorization: Bearer %MY_TOKEN%" ^
REM https://tts.api.nuance.com/api/v1/voices ^
REM -d "{ \"voice\": { \"name\": \"Evan\" } }"

rem 2) This gives information about all the Mix voices
REM curl -H "Authorization: Bearer %MY_TOKEN%" https://tts.api.nuance.com/api/v1/voices

rem 3) This gives information about all French Canadian voices available in Mix
REM curl -H "Authorization: Bearer %MY_TOKEN%" ^
REM https://tts.api.nuance.com/api/v1/voices ^
REM -d "{ \"voice\": { \"language\": \"fr-ca\" } }"

rem 4) This generates audio in base64 format. Be aware the output is very long...
REM curl -H "Authorization: Bearer %MY_TOKEN%" ^
REM https://tts.api.nuance.com/api/v1/synthesize ^
REM -d "{ \"voice\": { \"name\": \"Evan\", \"model\": \"enhanced\" }, \"input\": { \"text\": { \"text\": \"This is a test. A very simple test.\"} } }"

rem 5) This generates audio as a wav file, with several variations.
REM python http-wav-client.py --wav

REM python http-wav-client.py --wav --voice "Zoe-Ml" --input "Shall I compare thee to a summers day"

REM python http-wav-client.py --wav --voice "Zoe-Ml" --type "ssml" ^
REM --input "<speak>Thou art more lovely and more temperate.</speak>"

For example, to see information about the voice Evan, uncomment the lines following “1) This gives information about Evan”:

rem 1) This gives information about Evan
curl -H "Authorization: Bearer %MY_TOKEN%" ^
https://tts.api.nuance.com/api/v1/voices ^
-d "{ \"voice\": { \"name\": \"Evan\" } }"

Then run the batch file:

run-http.bat

{
 "voices": [
  {
   "name": "Evan",
   "model": "enhanced",
   "language": "en-US",
   "ageGroup": "ADULT",
   "gender": "MALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "1.1.1",
   "foreignLanguages": []
  }
 ]
}

To synthesize Evan saying, “This is a test,” uncomment the first Python command. It uses the default settings and the --wav argument:

rem 5) This generates audio as a wav file, with several variations.
python http-wav-client.py --wav
run-http.bat

Audio successfully written to output.wav