Synthesizer HTTP API
TTSaaS includes an HTTP API for requesting voices and synthesis operations. It is based on the Synthesizer gRPC API and offers two commands: voices and synthesize.
This API is a transcoded version of the main gRPC API, so it respects the JSON mapping detailed here.
Note:
This API uses UnarySynthesize, so it does not provide real-time streaming. If you require streaming output, use the gRPC API.
Base URL and authorization
The endpoint for TTSaaS HTTP commands in the Mix environment is:
https://tts.api.nuance.com/api/v1/
This service requires an authorization token. To generate the token, you can use this shell script, get-token.sh, replacing the CLIENT_ID and SECRET values with your credentials from Mix. See Prerequisites from Mix.
Note:
For a Windows version of the examples, see Synthesizer HTTP on Windows.
The script changes the colons in your client ID to their percent-encoded form, so you may enter your client ID as is.
CLIENT_ID=<Mix client ID>
SECRET=<Mix client secret>
CLIENT_ID=${CLIENT_ID//:/%3A}
export MY_TOKEN="`curl -s -u $CLIENT_ID:$SECRET \
https://auth.crt.nuance.com/oauth2/token \
-d 'grant_type=client_credentials' -d 'scope=tts' \
| jq -j .access_token`"
“Source” this script to generate an authorization token and make it available in the current shell. Then test the URL with a simple voices request using cURL:
source get-token.sh
curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/voices \
-d '{ "voice": { "name": "Evan" } }'
{
"voices": [
{
"name": "Evan",
"model": "enhanced",
"language": "en-us",
"ageGroup": "ADULT",
"gender": "MALE",
"sampleRateHz": 22050,
"languageTlw": "enu",
"restricted": false,
"version": "1.1.1",
"foreignLanguages": []
}
]
}
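The colon-to-%3A step in get-token.sh is ordinary percent-encoding. If you build a client in Python instead, the standard library gives the same result (a minimal sketch; the sample client ID is hypothetical):

```python
from urllib.parse import quote

def encode_client_id(client_id):
    # Percent-encode the Mix client ID so its colons become %3A,
    # matching the CLIENT_ID=${CLIENT_ID//:/%3A} line in get-token.sh.
    return quote(client_id, safe="")

print(encode_client_id("appID:NMDPTRIAL_user"))  # appID%3ANMDPTRIAL_user
```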
You must provide the token when calling the service. For example:
- In a cURL command:
curl -H "Authorization: Bearer $MY_TOKEN" https://tts.api.nuance.com/api/v1/voices
- In a REST client, you may either generate a token manually and enter it in your request, or have your development environment generate it for you.
Authorization: Bearer <token>
- In a Python client:
http_headers['Authorization'] = "Bearer {}".format(token)
Your authorization token expires after a short period of time. Source get-token.sh again when you get a 401 error, meaning status Unauthorized: The request could not be authorized.
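In code, the two pieces you need on every call are the header itself and a check for the expired-token case. A minimal sketch (the helper names are my own, not part of the API):

```python
def bearer_header(token):
    # Header in the form the service expects: "Authorization: Bearer <token>"
    return {"Authorization": "Bearer {}".format(token)}

def token_expired(status_code):
    # HTTP 401 (Unauthorized) means the token has expired;
    # source get-token.sh again and retry the request.
    return status_code == 401

print(bearer_header("abc123"))
```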
/api/v1/voices
Queries the voice packs to learn which voices are available. Optionally include parameters to filter the results.
GET https://tts.api.nuance.com/api/v1/voices
The parameters for the voices command are:
Name | In | Type | Description |
---|---|---|---|
Authorization | header | object | Mandatory. Authorization token in the form Bearer <token> |
voice | body | voice | Optional. Filter the voices to retrieve. For example, set language to en-US to return only American English voices. |
A successful response details the available voices, filtered when requested. See Status codes for other responses.
Get all available voices (cURL example):
curl -H "Authorization: Bearer $MY_TOKEN" https://tts.api.nuance.com/api/v1/voices
{
"voices": [
{
"name": "Allison",
"model": "standard",
"language": "en-us",
"ageGroup": "ADULT",
"gender": "FEMALE",
"sampleRateHz": 22050,
"languageTlw": "enu",
"restricted": false,
"version": "5.2.3.12283",
"foreignLanguages": []
},
{
"name": "Allison",
"model": "standard",
"language": "en-us",
"ageGroup": "ADULT",
"gender": "FEMALE",
"sampleRateHz": 8000,
"languageTlw": "enu",
"restricted": false,
"version": "5.2.3.12283",
"foreignLanguages": []
},
{
"name": "Ava-Ml",
"model": "enhanced",
"language": "en-us",
"ageGroup": "ADULT",
"gender": "FEMALE",
"sampleRateHz": 22050,
"languageTlw": "enu",
"restricted": false,
"version": "3.0.1",
"foreignLanguages": [
"es-mx"
]
},
{
"name": "Chloe",
"model": "standard",
"language": "en-us",
"ageGroup": "ADULT",
"gender": "FEMALE",
"sampleRateHz": 22050,
"languageTlw": "enu",
"restricted": false,
"version": "5.2.3.15315",
"foreignLanguages": []
},
. . .
voice (in voices)
Filters the requested voices in the voices command. It contains one of the following:
Name | Type | Description |
---|---|---|
name | string | The voice’s name, for example, Evan. |
model | string | The voice’s quality model, for example, enhanced or standard. (For backward compatibility, xpremium-high or xpremium are also accepted.) |
language | string | IETF language code, for example, en-US. Search for voices with a specific language. Some voices support multiple languages. |
age_group | string | Search for adult or child voices, using a keyword: ADULT (default) or CHILD. |
gender | string | Search for voices with a certain gender, using a keyword: ANY (default), MALE, FEMALE, NEUTRAL. |
sample_rate_hz | integer | Search for a certain native sample rate. |
language_tlw | string | Three-letter language code (for example, enu for American English) for configuring language identification. |
Filter results to retrieve a specific voice (the name match is not case-sensitive):
curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/voices -d '{ "voice": { "name": "evan" } }'
{
"voices": [
{
"name": "Evan",
"model": "enhanced",
"language": "en-us",
"ageGroup": "ADULT",
"gender": "MALE",
"sampleRateHz": 22050,
"languageTlw": "enu",
"restricted": false,
"version": "1.1.1",
"foreignLanguages": []
}
]
}
Filter results to retrieve all French Canadian voices:
curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/voices -d '{ "voice": { "language": "fr-ca" } }'
{
"voices": [
{
"name": "Amelie-Ml",
"model": "enhanced",
"language": "fr-ca",
"ageGroup": "ADULT",
"gender": "FEMALE",
"sampleRateHz": 22050,
"languageTlw": "frc",
"restricted": false,
"version": "2.1.1",
"foreignLanguages": [
"en-us",
"en-gb",
"es-mx"
]
},
{
"name": "Chantal",
"model": "standard",
"language": "fr-ca",
"ageGroup": "ADULT",
"gender": "FEMALE",
"sampleRateHz": 22050,
"languageTlw": "frc",
"restricted": false,
"version": "2.1.0",
"foreignLanguages": []
},
{
"name": "Nicolas",
"model": "standard",
"language": "fr-ca",
"ageGroup": "ADULT",
"gender": "MALE",
"sampleRateHz": 22050,
"languageTlw": "frc",
"restricted": false,
"version": "2.0.0",
"foreignLanguages": []
}
]
}
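The filter bodies in the examples above can also be assembled programmatically instead of hand-written in shell quotes. A sketch (the helper name is illustrative):

```python
import json

def voices_filter(**fields):
    # Build the request body for a filtered voices call,
    # e.g. voices_filter(language="fr-ca") or voices_filter(name="Evan").
    return json.dumps({"voice": fields})

print(voices_filter(language="fr-ca"))  # {"voice": {"language": "fr-ca"}}
```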
/api/v1/synthesize
Sends a synthesis request and returns a unary (non-streamed) synthesis response. The request specifies a mandatory voice and input text, as well as optional audio parameters and so on.
POST https://tts.api.nuance.com/api/v1/synthesize
The parameters for the synthesize command are:
Name | In | Type | Description |
---|---|---|---|
Authorization | header | object | Mandatory. Authorization token in the form Bearer <token> |
voice | body | voice | Mandatory. The voice to perform the synthesis. |
audio_params | body | audio_params | Output audio parameters, such as encoding and volume. Default is PCM audio at 22050 Hz. |
input | body | input | Mandatory. Input text to synthesize, tuning data, etc. |
event_params | body | event_params | Markers and other info to include in server events returned during synthesis. |
client_data | body | map<string,string> | Map of client-supplied key:value pairs to inject into the call log. |
user_id | body | string | Identifies a specific user within the application. |
Synthesize plain text (cURL example):
curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/synthesize \
-d '{ "voice": { "name": "Evan", "model": "enhanced" }, "input": { "text": { "text": "This is a test. A very simple test."} } }'
For examples of the results, see Response to synthesize below.
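Escaping nested JSON inside shell quotes gets error-prone as requests grow; a client can build the same body from a dict. A minimal sketch (the helper and the sample user_id are illustrative, not part of the API):

```python
import json

def synthesize_request(text, voice_name="Evan", model="enhanced", **extra):
    # Mandatory parts: voice and input. Optional parts such as
    # audio_params, event_params, client_data, or user_id can be
    # passed through as keyword arguments.
    body = {
        "voice": {"name": voice_name, "model": model},
        "input": {"text": {"text": text}},
    }
    body.update(extra)
    return json.dumps(body)

print(synthesize_request("This is a test.", user_id="user-42"))
```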
voice (in synthesize)
In the synthesize command, this mandatory parameter specifies the voice to use for the synthesis operation. The other entries in the voice parameter are not used for synthesis.
Name | Type | Description |
---|---|---|
name | string | Mandatory. The voice’s name, for example, Evan. |
model | string | Mandatory. The voice’s quality model, for example, enhanced or standard. (For backward compatibility, xpremium-high or xpremium are also accepted.) |
Mandatory voice parameters identify the voice to perform the synthesis:
{
"voice":{
"name":"Evan",
"model":"enhanced"
},
"input":{
"text":{
"text":"This is a test. A very simple test."
}
}
}
audio_params
Audio-related parameters for synthesis, including encoding, volume, and audio length. Included in synthesize. The default is PCM audio at 22050 Hz.
Name | Type | Description |
---|---|---|
audio_format | audio_format | Audio encoding. Default PCM 22050 Hz. |
volume_percentage | integer | Volume amplitude, from 0 to 100. Default 80. |
speaking_rate_factor | number | Speaking rate, from 0 to 2.0. Default 1.0. |
audio_chunk_duration_ms | integer | Maximum duration, in ms, of an audio chunk delivered to the client, from 1 to 60000. Default is 20000 (20 seconds). When this parameter is large enough (for example, 20 or 30 seconds), each audio chunk contains an audible segment surrounded by silence. |
target_audio_length_ms | integer | Maximum duration, in ms, of synthesized audio. When greater than 0, the server stops ongoing synthesis at the first sentence end, or silence, closest to the value. |
disable_early_emission | boolean | By default, audio segments are emitted as soon as possible, even if they are not audible. This behavior may be disabled. |
Optional audio parameters set audio to Ogg Opus and include three other options:
{
"voice":{
"name":"Evan",
"model":"enhanced"
},
"input":{
"text":{
"text":"This is a test. A very simple test."
}
},
"audio_params":{
"audio_format":{
"ogg_opus":{
"sample_rate_hz":16000
}
},
"volume_percentage": 100,
"speaking_rate_factor": 1.2,
"target_audio_length_ms": 10
}
}
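Durations such as audio_chunk_duration_ms and target_audio_length_ms are expressed in milliseconds, while the service returns raw bytes; for the default 16-bit mono PCM output the conversion is simple arithmetic (a sketch under those assumptions):

```python
def pcm_bytes_to_ms(num_bytes, sample_rate_hz=22050):
    # 16-bit mono PCM: 2 bytes per sample, sample_rate_hz samples per second.
    return num_bytes * 1000 / (sample_rate_hz * 2)

def ms_to_pcm_bytes(duration_ms, sample_rate_hz=22050):
    # Inverse: approximate buffer size in bytes for a given duration.
    return int(duration_ms * sample_rate_hz * 2 / 1000)

print(pcm_bytes_to_ms(44100))  # 1000.0 -- one second of audio at 22050 Hz
```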
audio_format
Audio encoding of the synthesized text. Included in audio_params.
Name | Type | Description |
---|---|---|
pcm | pcm | Signed 16-bit little endian PCM. |
alaw | alaw | G.711 A-law, 8 kHz. |
ulaw | ulaw | G.711 μ-law, 8 kHz. |
ogg_opus | ogg_opus | Ogg Opus, 8 kHz, 16 kHz, or 24 kHz. |
opus | opus | Opus, 8 kHz, 16 kHz, or 24 kHz. The audio will be sent one Opus packet at a time. |
pcm
The PCM sample rate. Included in audio_format.
Name | Type | Description |
---|---|---|
sample_rate_hz | integer | Output sample rate in Hz. Supported values: 8000, 11025, 16000, 22050, 24000. |
PCM sample rate changed to 16000 (from default 22050):
{
"voice":{
"name":"Evan",
"model":"enhanced"
},
"input":{
"text":{
"text":"This is a test. A very simple test."
}
},
"audio_params":{
"audio_format":{
"pcm":{
"sample_rate_hz": 16000
}
}
}
}
alaw
The A-law audio format. Included in audio_format. G.711 audio formats are set to 8 kHz.
Audio format changed to A-law:
{
"voice":{
"name":"Evan",
"model":"enhanced"
},
"input":{
"text":{
"text":"This is a test. A very simple test."
}
},
"audio_params":{
"audio_format":{
"alaw":{}
}
}
}
ulaw
The μ-law audio format. Included in audio_format. G.711 audio formats are set to 8 kHz.
Audio format changed to μ-law:
{
"voice":{
"name":"Evan",
"model":"enhanced"
},
"input":{
"text":{
"text":"This is a test. A very simple test."
}
},
"audio_params":{
"audio_format":{
"ulaw":{}
}
}
}
ogg_opus
The Ogg Opus output rate. Included in audio_format.
Name | Type | Description |
---|---|---|
sample_rate_hz | integer | Output sample rate in Hz. Supported values: 8000, 16000, 24000. |
bit_rate_bps | integer | Valid range is 500 to 256000 bps. Default 28000. |
max_frame_duration_ms | number | Opus frame size in ms: 2.5, 5, 10, 20, 40, 60. Default 20. |
complexity | integer | Computational complexity. A complexity of 0 means the codec default. |
vbr | vbr | Variable bitrate. On by default. |
Audio format changed to Ogg Opus:
{
"voice":{
"name":"Evan",
"model":"enhanced"
},
"input":{
"text":{
"text":"This is a test. A very simple test."
}
},
"audio_params":{
"audio_format":{
"ogg_opus":{
"sample_rate_hz":16000
}
}
}
}
opus
Opus output rate. Included in audio_format.
Name | Type | Description |
---|---|---|
sample_rate_hz | integer | Output sample rate in Hz. Supported values: 8000, 16000, 24000. |
bit_rate_bps | integer | Valid range is 500 to 256000 bps. Default 28000. |
max_frame_duration_ms | number | Opus frame size in ms: 2.5, 5, 10, 20, 40, 60. Default 20. |
complexity | integer | Computational complexity. A complexity of 0 means the codec default. |
vbr | vbr | Variable bitrate. On by default. |
vbr
Settings for variable bitrate. Included in ogg_opus and opus. Turned on by default.
Name | Number | Description |
---|---|---|
VARIABLE_BITRATE_ON | 0 | Use variable bitrate. Default. |
VARIABLE_BITRATE_OFF | 1 | Do not use variable bitrate. |
VARIABLE_BITRATE_CONSTRAINED | 2 | Use constrained variable bitrate. |
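Under the JSON mapping, enum fields such as vbr are passed by name. For example, an audio_params block requesting constrained variable bitrate for Ogg Opus might look like this (a sketch):

```python
import json

# audio_params requesting Ogg Opus with constrained variable bitrate;
# the enum value is written as its name under the JSON mapping.
audio_params = {
    "audio_format": {
        "ogg_opus": {
            "sample_rate_hz": 16000,
            "vbr": "VARIABLE_BITRATE_CONSTRAINED",
        }
    }
}
print(json.dumps(audio_params, indent=2))
```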
input
Text to synthesize and synthesis parameters, including tuning data, etc. Included in synthesize. The type of input may be plain text, SSML, or a sequence of plain text and Nuance control codes.
Name | Type | Description |
---|---|---|
text | text | Plain text input. |
ssml | ssml | SSML input, including text and SSML elements. |
tokenized_sequence | tokenized_sequence | Sequence of text and Nuance control codes. |
resources | resources | Repeated. Synthesis resources (user dictionaries, rulesets, etc.) to tune synthesized audio. Default empty. |
lid_params | lid_params | LID parameters. |
download_params | download_params | Remote file download parameters. |
Minimal mandatory input:
{
"voice":{
"name":"Evan",
"model":"enhanced"
},
"input":{
"text":{
"text":"This is a test. A very simple test."
}
}
}
text
Input for synthesizing plain text. The encoding must be UTF-8. Included in input.
Name | Type | Description |
---|---|---|
text | string | Plain input text in UTF-8 encoding. |
uri | string | Remote URI to the plain input text. Not supported in Nuance-hosted TTS. |
ssml
Input for synthesizing SSML. Included in input. See SSML input for a list of supported elements.
Name | Type | Description |
---|---|---|
text | string | SSML input text and elements. |
uri | string | Remote URI to the SSML input text. Not supported in Nuance-hosted TTS. |
ssml_validation_mode | ssml_validation_mode | SSML validation mode. Default STRICT. |
Minimal SSML input:
{
"voice": {
"name": "Evan",
"model": "enhanced"
},
"input": {
"ssml": {
"text": "<speak>This is an SSML test. A super simple test.</speak>"
}
}
}
ssml_validation_mode
SSML validation mode when using SSML input. Included in ssml. Strict by default but can be relaxed.
Name | Number | Description |
---|---|---|
STRICT | 0 | Strict SSML validation. Default. |
WARN | 1 | Give warning only. |
NONE | 2 | Do not validate. |
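To relax validation, set ssml_validation_mode inside the ssml input. A sketch of such a request body (the enum value is passed by name, as elsewhere in the JSON mapping):

```python
import json

body = {
    "voice": {"name": "Evan", "model": "enhanced"},
    "input": {
        "ssml": {
            "text": "<speak>This is an SSML test.</speak>",
            # WARN: report SSML problems as warnings instead of rejecting input
            "ssml_validation_mode": "WARN",
        }
    },
}
print(json.dumps(body))
```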
tokenized_sequence
Input for synthesizing a sequence of plain text and Nuance control codes. Included in input.
Name | Type | Description |
---|---|---|
tokens | tokens | Repeated. Sequence of text and control codes. |
tokens
The unit when using tokenized_sequence for input. Included in tokenized_sequence. Each token can be either plain text or a Nuance control code. See Tokenized sequence for supported codes.
Name | Type | Description |
---|---|---|
text | string | Plain input text. |
control_code | control_code | Nuance control code. |
control_code
Nuance control code that specifies how text should be spoken, similarly to SSML. Included in tokens.
Name | Type | Description |
---|---|---|
key | string | Name of the control code, for example, pause |
value | string | Value of the control code. |
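A tokenized_sequence request interleaves text tokens with control_code tokens. A sketch (the pause code and its value are illustrative; see Tokenized sequence for the supported codes):

```python
import json

body = {
    "voice": {"name": "Evan", "model": "enhanced"},
    "input": {
        "tokenized_sequence": {
            "tokens": [
                {"text": "This is a test."},
                # Illustrative control code: a pause between sentences
                {"control_code": {"key": "pause", "value": "300"}},
                {"text": "A very simple test."},
            ]
        }
    },
}
print(json.dumps(body))
```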
resources
A resource for tuning the synthesized output. Included in input.
Name | Type | Description |
---|---|---|
type | type | Resource type, for example, user dictionary, etc. Default USER_DICTIONARY. |
uri | string | URI to the remote resource. Either a URL or the URN of a resource previously uploaded to cloud storage with the Storage gRPC API. See URNs for the format. |
body | bytes | For resource type USER_DICTIONARY, the contents of the file. |
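Because body is a protobuf bytes field, the JSON mapping carries it as a base64 string. A sketch of attaching a user dictionary inline (the file contents here are dummy bytes, and the helper name is my own):

```python
import base64
import json

def dictionary_resource(raw_bytes):
    # A user-dictionary entry for input.resources; bytes fields are
    # base64-encoded under the gRPC-JSON mapping.
    return {
        "type": "USER_DICTIONARY",
        "body": base64.b64encode(raw_bytes).decode("ascii"),
    }

res = dictionary_resource(b"\x00\x01dummy-dictionary-bytes")
print(json.dumps(res))
```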
type
The type of synthesis resource to tune the output. Included in resources. User dictionaries provide custom pronunciations, rulesets apply search-and-replace rules to input text, and ActivePrompt databases help tune synthesized audio under certain conditions, using Nuance Vocalizer Studio.
Name | Number | Description |
---|---|---|
USER_DICTIONARY | 0 | User dictionary (application/edct-bin-dictionary). Default. |
TEXT_USER_RULESET | 1 | Text user ruleset (application/x-vocalizer-rettt+text). |
BINARY_USER_RULESET | 2 | Not supported. Binary user ruleset (application/x-vocalizer-rettt+bin). |
ACTIVEPROMPT_DB | 3 | ActivePrompt database (application/x-vocalizer-activeprompt-db). |
ACTIVEPROMPT_DB_AUTO | 4 | ActivePrompt database with automatic insertion (application/x-vocalizer-activeprompt-db;mode=automatic). |
SYSTEM_DICTIONARY | 5 | Nuance system dictionary (application/sdct-bin-dictionary). |
lid_params
Parameters for controlling the language identifier. Included in input. The language identifier runs on input blocks labeled with the control code lang unknown or the SSML attribute xml:lang="unknown". The language identifier automatically restricts the matched languages to the installed voices. This limits the permissible languages, and also sets the order of precedence (first to last) when they have equal confidence scores.
Name | Type | Description |
---|---|---|
disable | boolean | Whether to disable language identification. By default, language identification is turned on. |
languages | string | Repeated. List of three-letter language codes (for example, enu, frc, spm) to restrict language identification results, in order of precedence. Use voices to obtain the three-letter codes, returned in language_tlw. Default empty. |
always_use_highest_confidence | boolean | If enabled, language identification always chooses the language with the highest confidence score, even if the score is low. Default false, meaning use language with any confidence. |
download_params
Parameters for remote file download, whether for input text (input.uri) or a synthesis resource (resource.uri). Included in input.
Name | Type | Description |
---|---|---|
headers | map<string,string> | Map of HTTP header name:value pairs to include in outgoing requests. Supported headers: max_age, max_stale. |
request_timeout_ms | integer | Request timeout in ms. Default (0) means server default, usually 30000 (30 seconds). |
refuse_cookies | boolean | Whether to disable cookies. By default, HTTP requests accept cookies. |
event_params
Event subscription parameters. Included in synthesize. Requested events are reported in the response.
Name | Type | Description |
---|---|---|
send_sentence_marker_events | boolean | Sentence marker. Default: do not send. |
send_word_marker_events | boolean | Word marker. Default: do not send. |
send_phoneme_marker_events | boolean | Phoneme marker. Default: do not send. |
send_bookmark_marker_events | boolean | Bookmark marker. Default: do not send. |
send_paragraph_marker_events | boolean | Paragraph marker. Default: do not send. |
send_visemes | boolean | Lipsync information. Default: do not send. |
send_log_events | boolean | Whether to log events during synthesis. By default, logging is turned off. |
suppress_input | boolean | Whether to omit input text and URIs from log events. By default, these items are included. |
Event parameters:
{
"voice": {
"name": "Evan",
"model": "enhanced"
},
"input": {
"text": {
"text": "This is a test. A very simple test."
}
},
"event_params": {
"send_log_events": true,
"send_sentence_marker_events": true,
"send_word_marker_events": true
}
}
Response to synthesize
The synthesize command returns a unary (non-streamed) message containing:
- A status code, indicating completion or failure of the request. See Status codes.
- A list of events the client has requested. See event_params for details.
- The complete audio buffer of the synthesized text, in base64 format.
curl -H "Authorization: Bearer $MY_TOKEN" \
https://tts.api.nuance.com/api/v1/synthesize \
-d '{ "voice": { "name": "Evan", "model": "enhanced" }, "input": { "text": { "text": "This is a test. A very simple test."} } }'
{
"status": {
"code": 200,
"message": "OK",
"details": ""
},
"audio": "AAAAAAAAA... (synthesized audio in base64 format)"
}
Sample client for WAV output
Additional processing is required to convert this audio to a playable audio format. For example, Python processing in http-wav-client.py converts base64 audio to WAV format:
#!/usr/bin/env python3
import requests as req
import base64
import os
import argparse

# Generates the .wav file header for a given set of parameters
def generate_wav_header(sampleRate, bitsPerSample, channels, datasize, formattype):
    # (4byte) Marks file as RIFF
    o = bytes("RIFF", 'ascii')
    # (4byte) File size in bytes excluding this and RIFF marker
    o += (datasize + 36).to_bytes(4, 'little')
    # (4byte) File type
    o += bytes("WAVE", 'ascii')
    # (4byte) Format Chunk Marker
    o += bytes("fmt ", 'ascii')
    # (4byte) Length of above format data
    o += (16).to_bytes(4, 'little')
    # (2byte) Format type (1 - PCM)
    o += (formattype).to_bytes(2, 'little')
    # (2byte) Number of channels; will always be 1 for TTS
    o += (channels).to_bytes(2, 'little')
    # (4byte) Sample rate
    o += (sampleRate).to_bytes(4, 'little')
    # (4byte) Byte rate
    o += (sampleRate * channels * bitsPerSample // 8).to_bytes(4, 'little')
    # (2byte) Block alignment
    o += (channels * bitsPerSample // 8).to_bytes(2, 'little')
    # (2byte) Bits per sample
    o += (bitsPerSample).to_bytes(2, 'little')
    # (4byte) Data Chunk Marker
    o += bytes("data", 'ascii')
    # (4byte) Data size in bytes
    o += (datasize).to_bytes(4, 'little')
    return o

token = os.getenv('MY_TOKEN')

parser = argparse.ArgumentParser(description='TTS HTTP Client')
options = parser.add_argument_group("options")
options.add_argument("--wav", action="store_true",
                     help="Save audio file in WAVE format")
options.add_argument("--voice", nargs="?",
                     help="Voice name (default=Evan)", default="Evan")
options.add_argument("--model", nargs="?",
                     help="Voice model type (default=enhanced)", default="enhanced")
options.add_argument("--type", nargs="?",
                     help="Input type: text or ssml (default=text)", default="text")
options.add_argument("--input", nargs="?",
                     help="Input text (default=This is a test)", default="This is a test.")
args = parser.parse_args()

http_headers = {}
http_headers['Authorization'] = "Bearer {}".format(token)

formatted_data = '{{ "voice": {{ "name": "{voice_name}", "model": "{model_name}" }}, "input": {{ "{input_type}": {{ "text": "{input_text}"}} }} }}'.format(
    voice_name=args.voice, model_name=args.model,
    input_type=args.type, input_text=args.input)

response = req.post('https://tts.api.nuance.com/api/v1/synthesize',
                    data=formatted_data, headers=http_headers)
if response.status_code != 200:
    raise Exception("Failed to synthesize. Status: {}".format(response.status_code))

json_response = response.json()
if json_response["status"]["code"] != 200:
    print("Failed to synthesize. Message: {}. Status: {}".format(
        json_response["status"]["message"], json_response["status"]["code"]))
else:
    decoded_audio_response = base64.b64decode(json_response["audio"])
    waveheader = generate_wav_header(22050, 16, 1, len(decoded_audio_response), 1)
    if args.wav:
        with open("output.wav", "wb") as output_file:
            output_file.write(waveheader)
            output_file.write(decoded_audio_response)
            print("Audio successfully written to", output_file.name)
    else:
        with open("output.raw", "wb") as output_file:
            output_file.write(decoded_audio_response)
            print("Audio successfully written to", output_file.name)
Copy this code into a text file named http-wav-client.py. If you don’t have the Python requests module, install it with pip install requests.
This client accepts an authorization token and returns the audio, either as base64 audio or, with the --wav
argument, as a WAV file named output.wav.
Source the get-token.sh script (see Base URL and authorization) to generate and export an authorization token, then call the Python client. This client accepts the following arguments:
python3 http-wav-client.py --help
usage: http-wav-client.py [-h] [--wav] [--voice [VOICE]] [--model [MODEL]] [--type [TYPE]] [--input [INPUT]]
TTS HTTP Client
options:
-h, --help show this help message and exit
options:
--wav Save audio file in WAVE format
--voice [VOICE] Voice name (default=Evan)
--model [MODEL] Voice model type (default=enhanced)
--type [TYPE] Input type: text or ssml (default=text)
--input [INPUT] Input text (default=This is a test)
This example uses the default voice and input but sets the output file to WAV format:
source get-token.sh
python3 http-wav-client.py --wav
Audio successfully written to output.wav
Optionally use a different voice and specify your own input:
python3 http-wav-client.py --wav --voice "Zoe-Ml" --input "Shall I compare thee to a summers day"
Audio successfully written to output.wav
Or change to SSML input:
python3 http-wav-client.py --wav --voice "Zoe-Ml" --type "ssml" \
--input "<speak>Thou art more lovely and more temperate.</speak>"
Audio successfully written to output.wav
Your authorization token expires after a short period of time. Source get-token.sh again when a request fails with status 401: Failed to synthesize. See Status codes for other codes.
Synthesizer HTTP on Windows
To use the Synthesizer HTTP API on Windows, you generate an authorization token and then call the API using cURL or the Python client. Copy the following into a Windows batch file named run-http.bat.
This file contains several cURL and Python commands. Enter your Mix credentials as described in Prerequisites from Mix, then uncomment the command you want.
@echo off
setlocal enabledelayedexpansion
set CLIENT_ID=<Mix client ID, starting with appID:>
set SECRET=<Mix client secret>
rem Change colons (:) to %3A in client ID
set CLIENT_ID=!CLIENT_ID::=%%3A!
set command=curl -s ^
-u %CLIENT_ID%:%SECRET% ^
-d "grant_type=client_credentials" -d "scope=tts" ^
https://auth.crt.nuance.com/oauth2/token
for /f "delims={}" %%a in ('%command%') do (
for /f "tokens=1 delims=:, " %%b in ("%%a") do set key=%%b
for /f "tokens=2 delims=:, " %%b in ("%%a") do set value=%%b
goto done:
)
:done
rem Remove quotes
set MY_TOKEN=!value:"=!
rem Uncomment the command you want to run
rem 1) This gives information about Evan
REM curl -H "Authorization: Bearer %MY_TOKEN%" ^
REM https://tts.api.nuance.com/api/v1/voices ^
REM -d "{ \"voice\": { \"name\": \"Evan\" } }"
rem 2) This gives information about all the Mix voices
REM curl -H "Authorization: Bearer %MY_TOKEN%" https://tts.api.nuance.com/api/v1/voices
rem 3) This gives information about all French Canadian voices available in Mix
REM curl -H "Authorization: Bearer %MY_TOKEN%" ^
REM https://tts.api.nuance.com/api/v1/voices ^
REM -d "{ \"voice\": { \"language\": \"fr-ca\" } }"
rem 4) This generates audio in base64 format. Be aware the output is very long...
REM curl -H "Authorization: Bearer %MY_TOKEN%" ^
REM https://tts.api.nuance.com/api/v1/synthesize ^
REM -d "{ \"voice\": { \"name\": \"Evan\", \"model\": \"enhanced\" }, \"input\": { \"text\": { \"text\": \"This is a test. A very simple test.\"} } }"
rem 5) This generates audio as a wav file, with several variations.
REM python http-wav-client.py --wav
REM python http-wav-client.py --wav --voice "Zoe-Ml" --input "Shall I compare thee to a summers day"
REM python http-wav-client.py --wav --voice "Zoe-Ml" --type "ssml" ^
REM --input "<speak>Thou art more lovely and more temperate.</speak>"
For example, to see information about the voice Evan, uncomment the lines following “1) This gives information about Evan”:
rem 1) This gives information about Evan
curl -H "Authorization: Bearer %MY_TOKEN%" ^
https://tts.api.nuance.com/api/v1/voices ^
-d "{ \"voice\": { \"name\": \"Evan\" } }"
Then run the batch file:
run-http.bat
{
"voices": [
{
"name": "Evan",
"model": "enhanced",
"language": "en-us",
"ageGroup": "ADULT",
"gender": "MALE",
"sampleRateHz": 22050,
"languageTlw": "enu",
"restricted": false,
"version": "1.1.1",
"foreignLanguages": []
}
]
}
To synthesize Evan saying, “This is a test,” uncomment the first Python command. It uses the default settings and the --wav
argument:
rem 5) This generates audio as a wav file, with several variations.
python http-wav-client.py --wav
run-http.bat
Audio successfully written to output.wav