Synthesizer HTTP API

TTSaaS includes an HTTP API for requesting voices and synthesis operations. It is based on the Synthesizer gRPC API and offers two commands: voices and synthesize.

This API is a transcoded version of the main gRPC API, so it follows the standard protobuf JSON mapping for messages and field names.

Base URL and authorization

The endpoint for TTSaaS HTTP commands in the Mix environment is:

https://tts.api.nuance.com/api/v1/

This service requires an authorization token. To generate the token, you can use this shell script, get-token.sh, replacing the CLIENT_ID and SECRET values with your credentials from Mix. See Prerequisites from Mix.

The script changes the colons in your client ID to their percent-encoded form, so you may enter your client ID as is.

CLIENT_ID=<Mix client ID>
SECRET=<Mix client secret>
CLIENT_ID=${CLIENT_ID//:/%3A}

export MY_TOKEN="`curl -s -u $CLIENT_ID:$SECRET \
https://auth.crt.nuance.com/oauth2/token \
-d 'grant_type=client_credentials' -d 'scope=tts' \
| jq -j .access_token`"
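
The colon substitution in the script is ordinary percent-encoding. As a minimal sketch, the same transformation in Python (the client ID shown is a hypothetical placeholder, not a real credential):

```python
from urllib.parse import quote

# Percent-encode a Mix client ID so its colons become %3A,
# matching the CLIENT_ID substitution in get-token.sh.
client_id = "appID:namespace:user"  # hypothetical placeholder value
encoded = quote(client_id, safe="")
print(encoded)  # appID%3Anamespace%3Auser
```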

“Source” this script to generate an authorization token and make it available in the current shell. Then test the URL with a simple voices request using cURL:

source get-token.sh

curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/voices \
-d '{ "voice": { "name": "Evan" } }'

{
 "voices": [
  {
   "name": "Evan",
   "model": "enhanced",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "MALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "1.1.1",
   "foreignLanguages": []
  }
 ]
}

You must provide the token when calling the service. For example:

  • In a cURL command:

     curl -H "Authorization: Bearer $MY_TOKEN" https://tts.api.nuance.com/api/v1/voices
    
  • In a REST client, you may either generate a token manually and enter it in your request, or have your development environment generate it for you.

    Authorization: Bearer <token>
    
  • In a Python client:

    http_headers['Authorization'] = "Bearer {}".format(token)
    

Your authorization token expires after a short period of time. Source get-token.sh again when you receive a 401 Unauthorized error, meaning the request could not be authorized.
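
Using only the standard library, the header can be attached to a request as in this sketch (the voices_request helper is ours, not part of any client library):

```python
import json
import urllib.request

def voices_request(token, body=None):
    """Build (but do not send) a /voices request carrying the Bearer token."""
    data = json.dumps(body).encode("utf-8") if body else None
    return urllib.request.Request(
        "https://tts.api.nuance.com/api/v1/voices",
        data=data,
        headers={"Authorization": "Bearer " + token},
    )

req = voices_request("my-token", {"voice": {"name": "Evan"}})
print(req.get_header("Authorization"))  # Bearer my-token
```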

/api/v1/voices

Queries the voice packs to learn which voices are available. Optionally include parameters to filter the results.

GET https://tts.api.nuance.com/api/v1/voices

The parameters for the voices command are:

Voices
Name In Type Description
Authorization header object Mandatory. Authorization token, in the form Bearer <token>.
voice body voice Optional. Filter the voices to retrieve. For example, set language to en-US to return only American English voices.

A successful response details the available voices, filtered when requested. See Status codes for other responses.

Get all available voices (cURL example):

curl -H "Authorization: Bearer $MY_TOKEN" https://tts.api.nuance.com/api/v1/voices

{
 "voices": [
  {
   "name": "Allison",
   "model": "standard",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "5.2.3.12283",
   "foreignLanguages": []
  },
  {
   "name": "Allison",
   "model": "standard",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 8000,
   "languageTlw": "enu",
   "restricted": false,
   "version": "5.2.3.12283",
   "foreignLanguages": []
  },
  {
   "name": "Ava-Ml",
   "model": "enhanced",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "3.0.1",
   "foreignLanguages": [
    "es-mx"
   ]
  },
  {
   "name": "Chloe",
   "model": "standard",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "5.2.3.15315",
   "foreignLanguages": []
  },
. . .

voice (in voices)

Filters the requested voices in the voices command. It contains one of the following:

Voice in get voices
Name Type Description
name string The voice’s name, for example, Evan.
model string The voice’s quality model, for example, enhanced or standard. (For backward compatibility, xpremium-high or xpremium are also accepted.)
language string IETF language code, for example, en-US. Search for voices with a specific language. Some voices support multiple languages.
age_group string Search for adult or child voices, using a keyword: ADULT (default) or CHILD.
gender string Search for voices with a certain gender, using a keyword: ANY (default), MALE, FEMALE, NEUTRAL.
sample_rate_hz integer Search for a certain native sample rate.
language_tlw string Three-letter language code (for example, enu for American English) for configuring language identification.
Filter results to retrieve voice name Evan:

curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/voices -d '{ "voice": { "name": "evan" } }'

{
 "voices": [
  {
   "name": "Evan",
   "model": "enhanced",
   "language": "en-us",
   "ageGroup": "ADULT",
   "gender": "MALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "1.1.1",
   "foreignLanguages": []
  }
 ]
}

Filter results to retrieve all French Canadian voices:

curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/voices -d '{ "voice": { "language": "fr-ca" } }'

{
 "voices": [
  {
   "name": "Amelie-Ml",
   "model": "enhanced",
   "language": "fr-ca",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 22050,
   "languageTlw": "frc",
   "restricted": false,
   "version": "2.1.1",
   "foreignLanguages": [
    "en-us",
    "en-gb",
    "es-mx"
   ]
  },
  {
   "name": "Chantal",
   "model": "standard",
   "language": "fr-ca",
   "ageGroup": "ADULT",
   "gender": "FEMALE",
   "sampleRateHz": 22050,
   "languageTlw": "frc",
   "restricted": false,
   "version": "2.1.0",
   "foreignLanguages": []
  },
  {
   "name": "Nicolas",
   "model": "standard",
   "language": "fr-ca",
   "ageGroup": "ADULT",
   "gender": "MALE",
   "sampleRateHz": 22050,
   "languageTlw": "frc",
   "restricted": false,
   "version": "2.0.0",
   "foreignLanguages": []
  }
 ]
}
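
Filtering can also be applied client-side to the decoded JSON. A minimal sketch (the filter_voices helper is ours) that narrows a voices response by gender or model:

```python
def filter_voices(response, gender=None, model=None):
    """Return the voices in a /voices JSON response matching the given fields."""
    return [
        v for v in response.get("voices", [])
        if (gender is None or v.get("gender") == gender)
        and (model is None or v.get("model") == model)
    ]

response = {"voices": [
    {"name": "Chantal", "model": "standard", "gender": "FEMALE"},
    {"name": "Nicolas", "model": "standard", "gender": "MALE"},
]}
print([v["name"] for v in filter_voices(response, gender="MALE")])  # ['Nicolas']
```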

/api/v1/synthesize

Sends a synthesis request and returns a unary (non-streamed) synthesis response. The request specifies a mandatory voice and input text, along with optional audio parameters, synthesis resources, and event subscriptions.

POST https://tts.api.nuance.com/api/v1/synthesize

The parameters for the synthesize command are:

Synthesize
Name In Type Description
Authorization header object Mandatory. Authorization token, in the form Bearer <token>.
voice body voice Mandatory. The voice to perform the synthesis.
audio_params body audio_params Output audio parameters, such as encoding and volume. Default is PCM audio at 22050 Hz.
input body input Mandatory. Input text to synthesize, tuning data, etc.
event_params body event_params Markers and other info to include in server events returned during synthesis.
client_data body map<string,string> Map of client-supplied key:value pairs to inject into the call log.
user_id body string Identifies a specific user within the application.

Synthesize plain text (cURL example):

curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/synthesize \
-d '{ "voice": { "name": "Evan", "model": "enhanced" }, "input": { "text": { "text": "This is a test. A very simple test."} } }'

For examples of the results, see Response to synthesize below.

voice (in synthesize)

In the synthesize command, this mandatory parameter specifies the voice to use for the synthesis operation. The other entries in the voice parameter are not used for synthesis.

Voice in synthesize
Name Type Description
name string Mandatory. The voice’s name, for example, Evan.
model string Mandatory. The voice’s quality model, for example, enhanced or standard. (For backward compatibility, xpremium-high or xpremium are also accepted.)

Mandatory voice parameters identify the voice to perform the synthesis:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  }
}

audio_params

Audio-related parameters for synthesis, including encoding, volume, and audio length. Included in synthesize. The default is PCM audio at 22050 Hz.

Audio parameters
Name Type Description
audio_format audio_format Audio encoding. Default PCM 22050 Hz.
volume_percentage integer Volume amplitude, from 0 to 100. Default 80.
speaking_rate_factor number Speaking rate, from 0 to 2.0. Default 1.0.
audio_chunk_duration_ms integer Maximum duration, in ms, of an audio chunk delivered to the client, from 1 to 60000. Default is 20000 (20 seconds). When this parameter is large enough (for example, 20 or 30 seconds), each audio chunk contains an audible segment surrounded by silence.
target_audio_length_ms integer Maximum duration, in ms, of synthesized audio. When greater than 0, the server stops ongoing synthesis at the first sentence end, or silence, closest to the value.
disable_early_emission boolean By default, audio segments are emitted as soon as possible, even if they are not audible. This behavior may be disabled.

Optional audio parameters set audio to Ogg Opus and include three other options:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  },
  "audio_params":{
    "audio_format":{
      "ogg_opus":{
        "sample_rate_hz":16000
      }
    },
    "volume_percentage": 100,
    "speaking_rate_factor": 1.2,
    "target_audio_length_ms": 10
  }
}

audio_format

Audio encoding of synthesized text. Included in audio_params.

Audio format
Name Type Description
pcm pcm Signed 16-bit little endian PCM.
alaw alaw G.711 A-law, 8 kHz.
ulaw ulaw G.711 μ-law, 8 kHz.
ogg_opus ogg_opus Ogg Opus, 8 kHz, 16 kHz, or 24 kHz.
opus opus Opus, 8 kHz, 16 kHz, or 24 kHz. The audio will be sent one Opus packet at a time.

pcm

The PCM sample rate. Included in audio_format.

PCM audio
Name Type Description
sample_rate_hz integer Output sample rate in Hz. Supported values: 8000, 11025, 16000, 22050, 24000.

PCM sample rate changed to 16000 (from default 22050):

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  },
  "audio_params":{
    "audio_format":{
      "pcm":{
        "sample_rate_hz": 16000
      }
    }
  }
}

alaw

The A-law audio format. Included in audio_format. G.711 audio formats are set to 8 kHz.

Audio format changed to A-law:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  },
  "audio_params":{
    "audio_format":{
      "alaw":{}
    }
  }
}

ulaw

The μ-law audio format. Included in audio_format. G.711 audio formats are set to 8 kHz.

Audio format changed to μ-law:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  },
  "audio_params":{
    "audio_format":{
      "ulaw":{}
    }
  }
}

ogg_opus

The Ogg Opus output rate. Included in audio_format.

Ogg Opus audio
Name Type Description
sample_rate_hz integer Output sample rate in Hz. Supported values: 8000, 16000, 24000.
bit_rate_bps integer Valid range is 500 to 256000 bps. Default 28000.
max_frame_duration_ms number Opus frame size in ms: 2.5, 5, 10, 20, 40, 60. Default 20.
complexity integer Computational complexity. A complexity of 0 means the codec default.
vbr vbr Variable bitrate. On by default.

Audio format changed to Ogg Opus:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  },
  "audio_params":{
    "audio_format":{
      "ogg_opus":{
        "sample_rate_hz":16000
      }
    }
  }
}

opus

Opus output rate. Included in audio_format.

Opus audio
Name Type Description
sample_rate_hz integer Output sample rate in Hz. Supported values: 8000, 16000, 24000.
bit_rate_bps integer Valid range is 500 to 256000 bps. Default 28000.
max_frame_duration_ms number Opus frame size in ms: 2.5, 5, 10, 20, 40, 60. Default 20.
complexity integer Computational complexity. A complexity of 0 means the codec default.
vbr vbr Variable bitrate. On by default.
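
There is no example above for Opus output. By analogy with the ogg_opus example, a request selecting Opus might look like this (a sketch assembled from the field tables above, not verified server output):

```json
{
  "voice": { "name": "Evan", "model": "enhanced" },
  "input": { "text": { "text": "This is a test. A very simple test." } },
  "audio_params": {
    "audio_format": {
      "opus": {
        "sample_rate_hz": 16000,
        "bit_rate_bps": 28000
      }
    }
  }
}
```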

vbr

Settings for variable bitrate. Included in ogg_opus and opus. Turned on by default.

Variable bitrate
Name Number Description
VARIABLE_BITRATE_ON 0 Use variable bitrate. Default.
VARIABLE_BITRATE_OFF 1 Do not use variable bitrate.
VARIABLE_BITRATE_CONSTRAINED 2 Use constrained variable bitrate.

input

Text to synthesize and synthesis parameters, including tuning data, etc. Included in synthesize. The type of input may be plain text, SSML, or a sequence of plain text and Nuance control codes.

Input
Name Type Description
text text Plain text input.
ssml ssml SSML input, including text and SSML elements.
tokenized_sequence tokenized_sequence Sequence of text and Nuance control codes.
resources resources Repeated. Synthesis resources (user dictionaries, rulesets, etc.) to tune synthesized audio. Default empty.
lid_params lid_params LID parameters.
download_params download_params Remote file download parameters.

Minimal mandatory input:

{
  "voice":{
    "name":"Evan",
    "model":"enhanced"
  },
  "input":{
    "text":{
      "text":"This is a test. A very simple test."
    }
  }
}

text

Input for synthesizing plain text. The encoding must be UTF-8. Included in input.

Text input
Name Type Description
text string Plain input text in UTF-8 encoding.
uri string Remote URI to the plain input text. Not supported in Nuance-hosted TTS.

ssml

Input for synthesizing SSML. Included in input. See SSML input for a list of supported elements.

SSML input
Name Type Description
text string SSML input text and elements.
uri string Remote URI to the SSML input text. Not supported in Nuance-hosted TTS.
ssml_validation_mode ssml_validation_mode SSML validation mode. Default STRICT.

Minimal SSML input:

{
  "voice": {
    "name": "Evan",
    "model": "enhanced"
  },
  "input": {
    "ssml": {
      "text": "<speak>This is an SSML test. A super simple test.</speak>"
    }
  }
}

ssml_validation_mode

SSML validation mode when using SSML input. Included in ssml. Strict by default but can be relaxed.

SSML validation mode
Name Number Description
STRICT 0 Strict SSML validation. Default.
WARN 1 Give warning only.
NONE 2 Do not validate.

tokenized_sequence

Input for synthesizing a sequence of plain text and Nuance control codes. Included in input.

Tokenized sequence
Name Type Description
tokens tokens Repeated. Sequence of text and control codes.

tokens

The unit when using tokenized_sequence for input. Included in tokenized_sequence. Each token can be either plain text or a Nuance control code. See Tokenized sequence for supported codes.

Tokens
Name Type Description
text string Plain input text.
control_code control_code Nuance control code.

control_code

Nuance control code that specifies how text should be spoken, similarly to SSML. Included in tokens.

Control code
Name Type Description
key string Name of the control code, for example, pause.
value string Value of the control code.
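
There is no example above for tokenized input. As a sketch based on the tokens and control_code tables (the pause value is illustrative):

```json
{
  "voice": { "name": "Evan", "model": "enhanced" },
  "input": {
    "tokenized_sequence": {
      "tokens": [
        { "text": "The total is" },
        { "control_code": { "key": "pause", "value": "300" } },
        { "text": "five dollars." }
      ]
    }
  }
}
```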

resources

A resource for tuning the synthesized output. Included in input.

Resources
Name Type Description
type type Resource type, for example, user dictionary, etc. Default USER_DICTIONARY.
uri string URI to the remote resource. Either a URL or the URN of a resource previously uploaded to cloud storage with the Storage gRPC API. See URNs for the format.
body bytes For resource type USER_DICTIONARY, the contents of the file.

type

The type of synthesis resource to tune the output. Included in resources. User dictionaries provide custom pronunciations, rulesets apply search-and-replace rules to input text, and ActivePrompt databases help tune synthesized audio under certain conditions, using Nuance Vocalizer Studio.

Type of resource
Name Number Description
USER_DICTIONARY 0 User dictionary (application/edct-bin-dictionary). Default.
TEXT_USER_RULESET 1 Text user ruleset (application/x-vocalizer-rettt+text).
BINARY_USER_RULESET 2 Not supported. Binary user ruleset (application/x-vocalizer-rettt+bin).
ACTIVEPROMPT_DB 3 ActivePrompt database (application/x-vocalizer-activeprompt-db).
ACTIVEPROMPT_DB_AUTO 4 ActivePrompt database with automatic insertion (application/x-vocalizer-activeprompt-db;mode=automatic).
SYSTEM_DICTIONARY 5 Nuance system dictionary (application/sdct-bin-dictionary).

lid_params

Parameters for controlling the language identifier. Included in input. The language identifier runs on input blocks labeled with the control code lang unknown or the SSML attribute xml:lang="unknown", and automatically restricts the matched languages to the installed voices. The languages parameter below further limits the permissible languages, and also sets the order of precedence (first to last) when they have equal confidence scores.

Language identification parameters
Name Type Description
disable boolean Whether to disable language identification. Turned on by default.
languages string Repeated. List of three-letter language codes (for example, enu, frc, spm) to restrict language identification results, in order of precedence. Use voices to obtain the three-letter codes, returned in language_tlw. Default empty.
always_use_highest_confidence boolean If enabled, language identification always chooses the language with the highest confidence score, even if the score is low. Default false, meaning use language with any confidence.

download_params

Parameters for remote file download, whether for input text (input.uri) or a synthesis resource (resource.uri). Included in input.

Download parameters
Name Type Description
headers map<string,string> Map of HTTP header name,value pairs to include in outgoing requests. Supported headers: max_age, max_stale.
request_timeout_ms integer Request timeout in ms. Default (0) means server default, usually 30000 (30 seconds).
refuse_cookies boolean Whether to disable cookies. By default, HTTP requests accept cookies.

event_params

Event subscription parameters. Included in synthesize. Requested events are reported in the response.

Event parameters
Name Type Description
send_sentence_marker_events boolean Sentence marker. Default: do not send.
send_word_marker_events boolean Word marker. Default: do not send.
send_phoneme_marker_events boolean Phoneme marker. Default: do not send.
send_bookmark_marker_events boolean Bookmark marker. Default: do not send.
send_paragraph_marker_events boolean Paragraph marker. Default: do not send.
send_visemes boolean Lipsync information. Default: do not send.
send_log_events boolean Whether to log events during synthesis. By default, logging is turned off.
suppress_input boolean Whether to omit input text and URIs from log events. By default, these items are included.

Event parameters:

{
  "voice": {
    "name": "Evan",
    "model": "enhanced"
   },
  "input": {
    "text": {
      "text": "This is a test. A very simple test."
    }
  },
  "event_params": {
        "send_log_events": true,
        "send_sentence_marker_events": true,
        "send_word_marker_events": true
  }
}

Response to synthesize

The synthesize command returns a unary (non-streamed) message containing:

  • A status code, indicating completion or failure of the request. See Status codes.
  • A list of events the client has requested. See event_params for details.
  • The complete audio buffer of the synthesized text, in base64 format.
curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/synthesize \
-d '{ "voice": { "name": "Evan", "model": "enhanced" }, "input": { "text": { "text": "This is a test. A very simple test."} } }'

{
  "status": {
    "code": 200,
    "message": "OK",
    "details": ""
  },
  "audio": "AAAAAAAAA... (synthesized audio in base64 format)"
}
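
The audio field is base64 text, not raw bytes. A small Python sketch (the audio_duration_ms helper is ours) that decodes it and estimates its duration for the default 22050 Hz, 16-bit PCM output:

```python
import base64

def audio_duration_ms(b64_audio, sample_rate_hz=22050, bytes_per_sample=2):
    """Duration in ms of base64-encoded 16-bit mono PCM from synthesize."""
    pcm = base64.b64decode(b64_audio)
    samples = len(pcm) // bytes_per_sample
    return samples * 1000.0 / sample_rate_hz

# One second of silence at 22050 Hz: 22050 samples * 2 bytes each.
silence = base64.b64encode(b"\x00" * 44100).decode("ascii")
print(audio_duration_ms(silence))  # 1000.0
```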

Sample client for WAV output

Additional processing is required to convert this audio to a playable format. For example, the Python script http-wav-client.py decodes the base64 audio and prepends a WAV header:

#!/usr/bin/env python3

import requests as req
import base64
import os
import argparse

# Generates the .wav file header for a given set of parameters
def generate_wav_header(sampleRate, bitsPerSample, channels, datasize, formattype):
    # (4byte) Marks file as RIFF
    o = bytes("RIFF", 'ascii')
    # (4byte) File size in bytes excluding this and RIFF marker
    o += (datasize + 36).to_bytes(4, 'little')
    # (4byte) File type
    o += bytes("WAVE", 'ascii')
    # (4byte) Format Chunk Marker
    o += bytes("fmt ", 'ascii')
    # (4byte) Length of above format data
    o += (16).to_bytes(4, 'little')
    # (2byte) Format type (1 - PCM)
    o += (formattype).to_bytes(2, 'little')
    # (2byte) Will always be 1 for TTS
    o += (channels).to_bytes(2, 'little')
    # (4byte)
    o += (sampleRate).to_bytes(4, 'little')
    o += (sampleRate * channels * bitsPerSample // 8).to_bytes(4, 'little')  # (4byte)
    o += (channels * bitsPerSample // 8).to_bytes(2,'little')               # (2byte)
    # (2byte)
    o += (bitsPerSample).to_bytes(2, 'little')
    # (4byte) Data Chunk Marker
    o += bytes("data", 'ascii')
    # (4byte) Data size in bytes
    o += (datasize).to_bytes(4, 'little')

    return o

token = os.getenv('MY_TOKEN')

parser = argparse.ArgumentParser(description='TTS HTTP Client')

options = parser.add_argument_group("options")
options.add_argument("--wav", action="store_true",
                     help="Save audio file in WAVE format")
options.add_argument("--voice", nargs="?",
                     help="Voice name (default=Evan)", default="Evan")
options.add_argument("--model", nargs="?",
                     help="Voice model type (default=enhanced)", default="enhanced")
options.add_argument("--type", nargs="?",
                     help="Input type: text or ssml (default=text)", default="text")
options.add_argument("--input", nargs="?",
                     help="Input text (default=This is a test)", default="This is a test.")

args = parser.parse_args()

http_headers = {}
http_headers['Authorization'] = "Bearer {}".format(token)

formatted_data = '{{ "voice": {{ "name": "{voice_name}", "model": "{model_name}" }}, "input": {{ "{input_type}": {{ "text": "{input_text}"}} }} }}'.format(voice_name=args.voice, model_name=args.model, input_type=args.type, input_text=args.input)

response = req.post('https://tts.api.nuance.com/api/v1/synthesize', data=formatted_data, headers=http_headers)

if response.status_code != 200:
    raise Exception("Failed to synthesize. Status: {}".format(response.status_code))

json_response = response.json()

if json_response["status"]["code"] != 200:
    print("Failed to synthesize. Message: {}. Status: {}".format(json_response["status"]["message"], json_response["status"]["code"]))
else:
    decoded_audio_response = base64.b64decode(json_response["audio"])

    waveheader = generate_wav_header(22050, 16, 1, len(decoded_audio_response), 1)

    if args.wav:
        with open("output.wav", "wb") as output_file:
            output_file.write(waveheader)
            output_file.write(decoded_audio_response)
            print("Audio successfully written to", output_file.name)
    else:
        with open("output.raw", "wb") as output_file:
            output_file.write(decoded_audio_response)
            print("Audio successfully written to", output_file.name)

Copy this code into a text file named http-wav-client.py. If you don’t have the Python requests module, install it with pip install requests.

This client reads the authorization token from the MY_TOKEN environment variable and saves the synthesized audio, either as raw PCM in output.raw or, with the --wav argument, as a WAV file named output.wav.

Source the get-token.sh script (see Base URL and authorization) to generate and export an authorization token, then call the Python client. This client accepts the following arguments:

python3 http-wav-client.py --help

usage: http-wav-client.py [-h] [--wav] [--voice [VOICE]] [--model [MODEL]] [--type [TYPE]] [--input [INPUT]]

TTS HTTP Client

options:
  -h, --help       show this help message and exit

options:
  --wav            Save audio file in WAVE format
  --voice [VOICE]  Voice name (default=Evan)
  --model [MODEL]  Voice model type (default=enhanced)
  --type [TYPE]    Input type: text or ssml (default=text)
  --input [INPUT]  Input text (default=This is a test)

This example uses the default voice and input but sets the output file to WAV format:

source get-token.sh

python3 http-wav-client.py --wav

Audio successfully written to output.wav

Optionally use a different voice and specify your own input:

python3 http-wav-client.py --wav --voice "Zoe-Ml" --input "Shall I compare thee to a summers day"

Audio successfully written to output.wav

Or change to SSML input:

python3 http-wav-client.py --wav --voice "Zoe-Ml" --type "ssml" \
--input "<speak>Thou art more lovely and more temperate.</speak>"

Audio successfully written to output.wav

Your authorization token expires after a short period of time. Source get-token.sh again when you get status error 401: Failed to synthesize. See Status codes for other codes.

Synthesizer HTTP on Windows

To use the Synthesizer HTTP API on Windows, you generate an authorization token and then call the API using cURL or the Python client. Copy the following into a Windows batch file named run-http.bat.

This file contains several cURL and Python commands. Enter your Mix credentials as described in Prerequisites from Mix, then uncomment the command you want.

@echo off
setlocal enabledelayedexpansion

set CLIENT_ID=<Mix client ID, starting with appID:>
set SECRET=<Mix client secret>
rem Change colons (:) to %3A in client ID
set CLIENT_ID=!CLIENT_ID::=%%3A!

set command=curl -s ^
-u %CLIENT_ID%:%SECRET% ^
-d "grant_type=client_credentials" -d "scope=tts" ^
https://auth.crt.nuance.com/oauth2/token

for /f "delims={}" %%a in ('%command%') do (
  for /f "tokens=1 delims=:, " %%b in ("%%a") do set key=%%b
  for /f "tokens=2 delims=:, " %%b in ("%%a") do set value=%%b
  goto done
)

:done

rem Remove quotes
set MY_TOKEN=!value:"=!

rem Uncomment the command you want to run

rem 1) This gives information about Evan
REM curl -H "Authorization: Bearer %MY_TOKEN%" ^
REM https://tts.api.nuance.com/api/v1/voices ^
REM -d "{ \"voice\": { \"name\": \"Evan\" } }"

rem 2) This gives information about all the Mix voices
REM curl -H "Authorization: Bearer %MY_TOKEN%" https://tts.api.nuance.com/api/v1/voices

rem 3) This gives information about all French Canadian voices available in Mix
REM curl -H "Authorization: Bearer %MY_TOKEN%" ^
REM https://tts.api.nuance.com/api/v1/voices ^
REM -d "{ \"voice\": { \"language\": \"fr-ca\" } }"

rem 4) This generates audio in base64 format. Be aware the output is very long...
REM curl -H "Authorization: Bearer %MY_TOKEN%" ^
REM https://tts.api.nuance.com/api/v1/synthesize ^
REM -d "{ \"voice\": { \"name\": \"Evan\", \"model\": \"enhanced\" }, \"input\": { \"text\": { \"text\": \"This is a test. A very simple test.\"} } }"

rem 5) This generates audio as a wav file, with several variations.
REM python http-wav-client.py --wav

REM python http-wav-client.py --wav --voice "Zoe-Ml" --input "Shall I compare thee to a summers day"

REM python http-wav-client.py --wav --voice "Zoe-Ml" --type "ssml" ^
REM --input "<speak>Thou art more lovely and more temperate.</speak>"

For example, to see information about the voice Evan, uncomment the lines following “1) This gives information about Evan”:

rem 1) This gives information about Evan
curl -H "Authorization: Bearer %MY_TOKEN%" ^
https://tts.api.nuance.com/api/v1/voices ^
-d "{ \"voice\": { \"name\": \"Evan\" } }"

Then run the batch file:

run-http.bat

{
 "voices": [
  {
   "name": "Evan",
   "model": "enhanced",
   "language": "en-US",
   "ageGroup": "ADULT",
   "gender": "MALE",
   "sampleRateHz": 22050,
   "languageTlw": "enu",
   "restricted": false,
   "version": "1.1.1",
   "foreignLanguages": []
  }
 ]
}

To synthesize Evan saying, “This is a test,” uncomment the first Python command. It uses the default settings and the --wav argument:

rem 5) This generates audio as a wav file, with several variations.
python http-wav-client.py --wav
run-http.bat

Audio successfully written to output.wav