Sample synthesis client
TTSaaS offers a fully functional Python client application that you may download and run on Linux or Windows to synthesize speech using the Synthesizer API.
Note: You may also use this client with Microsoft neural voices, as described in Neural TTSaaS > Sample synthesis client for Neural TTSaaS.
To run this client, you need:
- Python 3.6 or later.
- The generated Python stub files from gRPC setup.
- Your client ID and secret from Prerequisites from Mix.
- The Python client files: sample-synthesis-client.zip
Download this zip file and extract its files into the same directory as the nuance directory, which contains your proto files and Python stubs.
On Linux, give run-mix-client.sh execute permission with chmod +x. For example:
unzip sample-synthesis-client.zip
chmod +x run-mix-client.sh
These are the resulting client files, in the same directory as the nuance directory:
├── flow.py
├── flow-multi.py
├── client.py
├── run-mix-client.bat
├── run-mix-client.sh
└── nuance
├── rpc (RPC message files)
└── tts
├── storage (Storage files)
└── v1
├── synthesizer_pb2_grpc.py
├── synthesizer_pb2.py
└── synthesizer.proto
You can use the client to search for available voices and/or request synthesis. Here are a few scenarios you can try.
Get help
For a quick check that the client is working, and to see the arguments it accepts, run it using the help (-h or --help) option.
See the results below and notice:
- -s or --serverUrl: The URL of the service. The sample run script specifies the Mix service, tts.api.nuance.com, on its default port, 443.
- Authorization: Include --oauthURL, --clientID, and --clientSecret. Alternatively, use the hidden --token argument. See Authorize.
- --secure: Include this argument when calling TTSaaS.
- -f or --files: This points to an input file, by default flow.py. For multiple files, specify --files flow1.py flow2.py.
The results are the same on Linux and Windows:
python3 client.py --help
usage: client.py [-options]
options:
-h, --help Show this help message and exit
--appid [appID:client-id] Client ID or group name, prefixed with appID:
-f file [file ...], --files file [file ...]
List of flow files to execute sequentially,
default=['flow.py']
-p, --parallel Run each flow in a separate thread
-i [num], --iterations [num] Number of times to run the list of files, default=1
--infinite Run all files infinitely (overrides number of
iterations)
-t [num], --timeoutSeconds [num] Timeout in seconds for every RPC call, default=30
-s [url], --serverUrl [url] NVC server URL, default=localhost:8080
--oauthURL [url] OAuth 2.0 URL
--clientRequestID [id] Client-generated request ID
--clientID [url] OAuth 2.0 Client ID
--clientSecret [url] OAuth 2.0 Client Secret
--oauthScope [url] OAuth 2.0 Scope, default=tts
--secure Connect to the server using a secure gRPC channel
--rootCerts [file] Not used
--privateKey [file] Not used
--certChain [file] Not used
--audioDir [dir] Audio output directory, default=./audio
--saveAudio Save whole audio to disk
--saveAudioChunks Save each individual audio chunk to disk
--saveAudioAsWav Save each audio file in the WAVE format
--jaeger [addr] Send UDP opentrace spans, default
addr=udp://localhost:6831
--sendUnary Receive one response (UnarySynthesize) instead of a
stream of responses (Synthesize)
--sendHTTP Send the requests using the HTTP-to-gRPC API
--maxReceiveSizeMB [megabytes] Maximum length of gRPC server response in megabytes,
default=50 MB
--neural Send the request to Neural TTS, if available.
Input files
The sample client includes two input files, flow.py and flow-multi.py. These files provide an easy way to customize the client without editing the main client.py file.
You will learn more about these input files in the following sections.
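If you later want to write your own input file, the only structural requirement (described under What’s list_of_requests? below) is a global list_of_requests array. Here is a minimal sketch, using a hypothetical file name my-flow.py that you would pass to the client with --files my-flow.py:
# my-flow.py (hypothetical): build request messages and append them to the
# global list_of_requests array that client.py reads.
from nuance.tts.v1.synthesizer_pb2 import *

list_of_requests = []

request = SynthesisRequest()
request.voice.name = "Evan"
request.voice.model = "enhanced"
request.input.text.text = "Hello from a custom flow file."

list_of_requests.append(request)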
Edit run script
Edit the sample shell script or batch file to add your Mix client ID and secret. See Authorize.
#!/bin/bash
CLIENT_ID=<Mix client ID, starting with appID:>
SECRET=<Mix client secret>
#Change colons (:) to %3A in client ID
CLIENT_ID=${CLIENT_ID//:/%3A}
python3 client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
--clientID $CLIENT_ID --clientSecret $SECRET \
--secure --serverUrl tts.api.nuance.com \
--saveAudio --saveAudioAsWav
@echo off
setlocal enabledelayedexpansion
set CLIENT_ID=<Mix client ID, starting with appID:>
set SECRET=<Mix client secret>
rem Change colons (:) to %3A in client ID
set CLIENT_ID=!CLIENT_ID::=%%3A!
python client.py --oauthURL https://auth.crt.nuance.com/oauth2/token ^
--clientID %CLIENT_ID% --clientSecret %SECRET% ^
--secure --serverUrl tts.api.nuance.com ^
--saveAudio --saveAudioAsWav
Notice the --saveAudio and --saveAudioAsWav arguments. These save the synthesized audio as a wave file in the --audioDir default location, ./audio.
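The run script lets client.py obtain the access token from the --oauthURL endpoint using your client ID and secret. If you prefer to fetch a token yourself and pass it with the hidden --token argument, the sketch below shows one way to do it; it assumes the standard OAuth 2.0 client_credentials grant and the third-party requests library, neither of which is part of the sample client.
import requests
from urllib.parse import quote

CLIENT_ID = "<Mix client ID, starting with appID:>"
SECRET = "<Mix client secret>"

# URL-encode the credentials (the same thing the run script's %3A substitution does),
# then request a token with the client_credentials grant and the tts scope.
resp = requests.post(
    "https://auth.crt.nuance.com/oauth2/token",
    auth=(quote(CLIENT_ID, safe=""), quote(SECRET, safe="")),
    data={"grant_type": "client_credentials", "scope": "tts"},
)
resp.raise_for_status()
token = resp.json()["access_token"]
print(token)  # pass this value to client.py with --token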
Synthesize text input
In this first scenario, use the default input file to synthesize a text string using SynthesisRequest and save the audio in a wave file.
- Open the input file, flow.py, and notice two sections. For this exercise, you don’t need to change anything in this file. The # GetVoices request section asks for information about the Evan voice. The # Synthesis request section uses the same voice to synthesize a text string.
from nuance.tts.v1.synthesizer_pb2 import *

list_of_requests = []

# GetVoices request
request = GetVoicesRequest()
request.voice.name = "Evan"

# Add request to list
list_of_requests.append(request)
# ---

# Synthesis request
request = SynthesisRequest()
request.voice.name = "Evan"
request.voice.model = "enhanced"
pcm = PCM(sample_rate_hz=22050)
request.audio_params.audio_format.pcm.CopyFrom(pcm)
request.audio_params.volume_percentage = 80
request.audio_params.speaking_rate_factor = 1.0
request.audio_params.audio_chunk_duration_ms = 2000
request.input.text.text = "This is a test. A very simple test."
request.event_params.send_log_events = True
request.user_id = "MyApplicationUser"

#Add request to list
list_of_requests.append(request)
# ---
- Run the client using the shell script or batch file.
./run-mix-client.sh
run-mix-client.bat
The client takes the information in the input flow.py file and creates an audio file, flow.py_i1_s1.wav, of Evan saying “This is a test. A very simple test.”
The results are the same on Linux and Windows. Some lines have been omitted for brevity.
2023-10-12 16:58:03,713 (139946639763264) INFO Obtaining auth token
2023-10-12 16:58:03,834 (139946639763264) DEBUG Creating secure gRPC channel
2023-10-12 16:58:04,026 (139946639763264) INFO Running file [flow.py]
2023-10-12 16:58:04,026 (139946639763264) DEBUG [voice {
name: "Evan"
}
, voice {
name: "Evan"
model: "Enhanced"
}
audio_params {
audio_format {
pcm {
sample_rate_hz: 22050
}
}
volume_percentage: 80
speaking_rate_factor: 1.0
audio_chunk_duration_ms: 2000
}
input {
text {
text: "This is a test. A very simple test."
}
}
event_params {
send_log_events: true
}
user_id: "MyApplicationUser"
]
2023-10-12 16:58:04,026 (139946639763264) INFO Sending GetVoices request
2023-10-12 16:58:04,350 (139946639763264) INFO voices {
name: "Evan"
model: "enhanced"
language: "en-US"
gender: MALE
sample_rate_hz: 22050
language_tlw: "enu"
version: "1.1.1"
}
2023-10-12 16:58:04,351 (139946639763264) INFO Sending Synthesis request
*** Events and received audio chunks here ***
2023-10-12 16:58:04,748 (139946639763264) INFO Received status response: SUCCESS
2023-10-12 16:58:04,748 (139946639763264) INFO Wrote audio to ./audio/flow.py_i1_s1.wav
2023-10-12 16:58:04,748 (139946639763264) INFO Done running file [flow.py]
2023-10-12 16:58:04,749 (139946639763264) INFO Done
For an example of events in the results, see Events.
Warning: The file created by the client, flow.py_i1_s1.wav, is overwritten every time you run the client. If you wish to save the file, rename it, for example, to evan-simple.wav.
Change text and voice
Optionally change the voice and the input text in the synthesis request, and rerun the client. To learn what other voices are available, see Get voices below.
To avoid the long list of events in the response, disable send_log_events. For example:
# Synthesis request
request = SynthesisRequest()
request.voice.name = "Zoe-Ml"
request.voice.model = "enhanced"
request.input.text.text = "Your coffee will be ready in 5 minutes."
#request.event_params.send_log_events = True # Comment out or change to False
Synthesize SSML input
You may provide SSML input instead of plain text.
- Edit flow.py to comment out the request.input.text.text line and add an SSML line:
#request.input.text.text = "This is a test. A very simple test."
request.input.ssml.text = "<speak>It's 24,901 miles around the earth, or 40,075 km.</speak>"
- Run the client using the shell script or batch file.
./run-mix-client.sh
run-mix-client.bat
The client sends a SynthesisRequest to turn the SSML into speech. It creates a file named flow.py_i1_s1.wav telling us the distance around the earth.
For more SSML examples, as well as examples using Nuance control codes, see Input to synthesize.
2023-10-12 17:10:27,576 (140011284363072) INFO Obtaining auth token
2023-10-12 17:10:28,234 (140011284363072) DEBUG Creating secure gRPC channel
...
input {
ssml {
text: "<speak>It\'s 24,901 miles around the earth, or 40,075 km.</speak>"
}
}
user_id: "MyApplicationUser"
]
2023-06-22 09:03:04,375 (13572) INFO Sending GetVoices request
2023-06-22 09:03:04,628 (13572) INFO voices {
name: "Evan"
model: "enhanced"
language: "en-US"
gender: MALE
sample_rate_hz: 22050
language_tlw: "enu"
version: "1.1.1"
}
2023-06-22 09:03:04,629 (13572) INFO Sending Synthesis request
2023-06-22 09:03:04,852 (13572) INFO Received audio: 21336 bytes
2023-06-22 09:03:04,878 (13572) INFO Received audio: 17856 bytes
2023-06-22 09:03:04,968 (13572) INFO Received audio: 82492 bytes
2023-06-22 09:03:05,004 (13572) INFO Received audio: 17030 bytes
2023-06-22 09:03:05,048 (13572) INFO Received audio: 45300 bytes
2023-06-22 09:03:05,130 (13572) INFO Received audio: 70044 bytes
2023-06-22 09:03:05,138 (13572) INFO Received status response: SUCCESS
2023-06-22 09:03:05,141 (13572) INFO Wrote audio to flow.py_i1_s1.wav
2023-06-22 09:03:05,141 (13572) INFO Done running file [flow.py]
2023-06-22 09:03:05,142 (13572) INFO Done
Without send_log_events in the input flow.py file, notice that only the received audio chunks are shown in the results.
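To experiment further, you could replace the request.input.ssml.text line with a richer SSML document. This is only a hedged sketch using standard SSML elements (break and prosody); how each voice renders them may vary.
# Hypothetical SSML variation for flow.py: add a pause and a slower passage.
request.input.ssml.text = (
    '<speak>Here is a pause.<break time="500ms"/>'
    '<prosody rate="slow">And this part is spoken more slowly.</prosody></speak>'
)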
Get voices
When you ask TTSaaS to synthesize text, you must specify a named voice. To learn which voices are available, send a GetVoicesRequest, entering your requirements in the flow.py input file.
- Edit flow.py to request American English female voices. This combination of options returns voices that are both American English and female. Optionally turn off synthesis for this request.
from nuance.tts.v1.synthesizer_pb2 import *

list_of_requests = []

# GetVoices request
request = GetVoicesRequest()
#request.voice.name = "Evan"
request.voice.language = "en-us"  # Request American English voices
request.voice.gender = EnumGender.FEMALE  # Request female voices

# Add request to list
list_of_requests.append(request)

# Synthesis request
...

#Add request to list
#list_of_requests.append(request)  # Disable synthesis with #
- Run the client using the shell script or batch file.
./run-mix-client.sh
run-mix-client.bat
The client returns information about all female American English voices available in the current environment.
2023-10-12 17:16:55,041 (140380487186240) INFO Obtaining auth token
2023-10-12 17:16:55,256 (140380487186240) DEBUG Creating secure gRPC channel
...
2023-10-12 17:16:55,361 (140380487186240) DEBUG [voice {
language: "en-US"
gender: FEMALE
}
]
2023-10-12 17:16:55,362 (140380487186240) INFO Sending GetVoices request
2023-10-12 17:16:55,604 (140380487186240) INFO voices {
name: "Allison"
model: "standard"
language: "en-US"
gender: FEMALE
sample_rate_hz: 22050
language_tlw: "enu"
version: "2.0.0"
}
voices {
name: "Ava-Ml"
model: "enhanced"
language: "en-US"
gender: FEMALE
sample_rate_hz: 22050
language_tlw: "enu"
version: "3.0.1"
foreign_languages: "es-MX"
}
...
voices {
name: "Zoe-Ml"
model: "enhanced"
language: "en-US"
gender: FEMALE
sample_rate_hz: 22050
language_tlw: "enu"
version: "2.0.0"
foreign_languages: "es-MX"
foreign_languages: "fr-CA"
}
2023-10-12 17:16:55,604 (140380487186240) INFO Done running file [flow.py]
2023-10-12 17:16:55,605 (140380487186240) INFO Done
Notice the information that TTSaaS returns for each voice:
- All voices include the voice name, model (standard or enhanced), language code, gender, and other parameters described in Voice.
- Multilingual voices (ending in -Ml) list supported languages other than their native language.
Get more voices
You can experiment with this request: for example, to see all available voices, remove or comment out all the request.voice lines, leaving only the main GetVoicesRequest.
# GetVoices request
request = GetVoicesRequest() # Keep only this line
#request.voice.name = "Evan"
#request.voice.language = "en-us"
The results include all voices available from the Nuance-hosted TTSaaS service.
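You can also try narrowing the list using other fields that appear in the Voice message, such as model or sample_rate_hz. Treat the following as a sketch to experiment with, not a guarantee that the service filters on every field:
# Hypothetical additional GetVoices filters
request = GetVoicesRequest()
request.voice.language = "en-us"
request.voice.model = "enhanced"        # enhanced-model voices only
request.voice.sample_rate_hz = 22050    # 22.05 kHz voices only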
Redirect results to file
If you request a large number of voices, you may wish to save the output to a file. For example, these commands request all voices and save the output to a text file.
./run-mix-client.sh &> all-voices.txt
ls *.txt
-rw-r--r-- 1 xxx xxx 60185 Apr 17 14:57 all-voices.txt
cat all-voices.txt
run-mix-client.bat > all-voices.txt 2>&1
dir *.txt
2023-06-22 09:15 AM 30,807 all-voices.txt
Run client with resources
If you have uploaded synthesis resources using the Storage API (see the Sample storage client), you can reference them in a synthesis request.
- Edit flow.py to specify one or more resources within the synthesis request, for example, a user dictionary uploaded with the Storage API.
from nuance.tts.v1.synthesizer_pb2 import *
. . .
# Synthesis request
request = SynthesisRequest()
request.voice.name = "Evan"
request.voice.model = "enhanced"
pcm = PCM(sample_rate_hz=22050)
request.audio_params.audio_format.pcm.CopyFrom(pcm)

user_dict = SynthesisResource()  # Add a user dictionary
user_dict.type = EnumResourceType.USER_DICTIONARY
user_dict.uri = "urn:nuance-mix:tag:tuning:lang/coffee_app/coffee_dict/en-us/mix.tts"
request.input.resources.extend([user_dict])

request.input.text.text = "This is a test. A very simple test."

#Add request to list
list_of_requests.append(request)
- Run the client using the shell script or batch file.
./run-mix-client.sh
run-mix-client.bat
In the results, the user dictionary is listed under resources. As this is a Python example, type: USER_DICTIONARY is not shown under resources because it’s the default value.
2023-10-12 17:20:27,762 (139961486419776) INFO Obtaining auth token
2023-10-12 17:20:27,834 (139961486419776) DEBUG Creating secure gRPC channel
...
2023-10-12 17:20:28,014 (139961486419776) DEBUG [voice {
name: "Evan"
model: "Enhanced"
}
audio_params {
audio_format {
pcm {
sample_rate_hz: 22050
}
}
volume_percentage: 80
speaking_rate_factor: 1.0
audio_chunk_duration_ms: 2000
}
input {
ssml {
text: "<speak>It\'s 24,901 miles around the earth, or 40,075 km.</speak>"
}
resources {
uri: "urn:nuance-mix:tag:tuning:lang/coffee_app/coffee_dict/en-us/mix.tts"
}
}
user_id: "MyApplicationUser"
]
2023-10-12 17:20:28,015 (139961486419776) INFO Sending Synthesis request
2023-10-12 17:20:28,727 (139961486419776) INFO Received audio: 21336 bytes
2023-10-12 17:20:28,752 (139961486419776) INFO Received audio: 17856 bytes
2023-10-12 17:20:28,916 (139961486419776) INFO Received audio: 82492 bytes
2023-10-12 17:20:28,945 (139961486419776) INFO Received audio: 17030 bytes
2023-10-12 17:20:29,001 (139961486419776) INFO Received audio: 45300 bytes
2023-10-12 17:20:29,107 (139961486419776) INFO Received audio: 70044 bytes
2023-10-12 17:20:29,114 (139961486419776) INFO Received status response: SUCCESS
2023-10-12 17:20:29,118 (139961486419776) INFO Wrote audio to ./audio/flow.py_i1_s1.wav
2023-10-12 17:20:29,118 (139961486419776) INFO Done running file [flow.py]
2023-10-12 17:20:29,119 (139961486419776) INFO Done
For examples of using all types of synthesis resources, see Synthesis resources.
Multiple requests
You can send multiple requests for synthesis (and/or get voices) in the same session. For efficient communication with the TTSaaS server, all requests use the same channel and stub. This scenario sends three synthesis requests.
- Open the input file, flow-multi.py, and notice it contains three synthesis requests, pausing for a couple of seconds after each request.
from nuance.tts.v1.synthesizer_pb2 import *

list_of_requests = []

# Synthesis request
request = SynthesisRequest()  # First request
request.voice.name = "Evan"
request.voice.model = "enhanced"
pcm = PCM(sample_rate_hz=22050)
request.audio_params.audio_format.pcm.CopyFrom(pcm)
request.input.text.text = "This is a test. A very simple test."
list_of_requests.append(request)
list_of_requests.append(2)  # Pause after request

# Synthesis request
request = SynthesisRequest()  # Second request
request.voice.name = "Evan"
request.voice.model = "enhanced"
pcm = PCM(sample_rate_hz=22050)
request.audio_params.audio_format.pcm.CopyFrom(pcm)
request.input.text.text = "Your coffee will be ready in 5 minutes."
list_of_requests.append(request)
list_of_requests.append(2)  # Pause after request

# Synthesis request
request = SynthesisRequest()  # Third request
request.voice.name = "Zoe-Ml"
request.voice.model = "enhanced"
pcm = PCM(sample_rate_hz=22050)
request.audio_params.audio_format.pcm.CopyFrom(pcm)
request.input.text.text = "The wind was a torrent of darkness, among the gusty trees."
list_of_requests.append(request)
- Edit your shell script or batch file to point to the flow-multi.py input file:
...
python3 client.py --token $MY_TOKEN --saveAudio --saveAudioAsWav \
  --files flow-multi.py
...
python client.py --token %MY_TOKEN% --saveAudio --saveAudioAsWav ^
  --files flow-multi.py
- Run the client using the shell script or batch file.
./run-mix-client.sh
run-mix-client.bat
The client makes three synthesis requests and creates three audio files:
- flow-multi.py_i1_s1.wav: Evan saying: “This is a test…”
- flow-multi.py_i1_s2.wav: Evan saying: “Your coffee will be ready…”
- flow-multi.py_i1_s3.wav: Zoe saying: “The wind was a torrent of darkness…”
2023-10-12 17:25:52,663 (140168510699328) INFO Obtaining auth token
2023-10-12 17:25:52,725 (140168510699328) DEBUG Creating secure gRPC channel
...
2023-10-12 17:25:52,981 (140168510699328) DEBUG [voice {
name: "Evan"
model: "enhanced"
}
audio_params {
audio_format {
pcm {
sample_rate_hz: 22050
}
}
}
input {
text {
text: "This is a test. A very simple test."
}
}
, 2, voice {
name: "Evan"
model: "enhanced"
}
audio_params {
audio_format {
pcm {
sample_rate_hz: 22050
}
}
}
input {
text {
text: "Your coffee will be ready in 5 minutes."
}
}
, 2, voice {
name: "Zoe-Ml"
model: "enhanced"
}
audio_params {
audio_format {
pcm {
sample_rate_hz: 22050
}
}
}
input {
text {
text: "The wind was a torrent of darkness, among the gusty trees."
}
}
]
2023-10-12 17:25:52,982 (140168510699328) INFO Sending Synthesis request
2023-10-12 17:25:53,504 (140168510699328) INFO Received audio: 57484 bytes
2023-10-12 17:25:53,635 (140168510699328) INFO Received audio: 70432 bytes
2023-10-12 17:25:53,635 (140168510699328) INFO Received status response: SUCCESS
2023-10-12 17:25:53,636 (140168510699328) INFO Wrote audio to ./audio/flow-multi.py_i1_s1.wav
2023-10-12 17:25:53,636 (140168510699328) INFO Waiting for 2 seconds
2023-10-12 17:25:55,638 (140168510699328) INFO Sending Synthesis request
2023-10-12 17:25:55,946 (140168510699328) INFO Received audio: 44756 bytes
2023-10-12 17:25:56,010 (140168510699328) INFO Received audio: 67030 bytes
2023-10-12 17:25:56,011 (140168510699328) INFO Received status response: SUCCESS
2023-10-12 17:25:56,011 (140168510699328) INFO Wrote audio to ./audio/flow-multi.py_i1_s2.wav
2023-10-12 17:25:56,011 (140168510699328) INFO Waiting for 2 seconds
2023-10-12 17:25:58,013 (140168510699328) INFO Sending Synthesis request
2023-10-12 17:25:58,278 (140168510699328) INFO Received audio: 42424 bytes
2023-10-12 17:25:58,278 (140168510699328) INFO Received audio: 1040 bytes
2023-10-12 17:25:58,278 (140168510699328) INFO Received audio: 26648 bytes
2023-10-12 17:25:58,309 (140168510699328) INFO Received audio: 20558 bytes
2023-10-12 17:25:58,309 (140168510699328) INFO Received audio: 7902 bytes
2023-10-12 17:25:58,310 (140168510699328) INFO Received audio: 10292 bytes
2023-10-12 17:25:58,318 (140168510699328) INFO Received audio: 50508 bytes
2023-10-12 17:25:58,323 (140168510699328) INFO Received status response: SUCCESS
2023-10-12 17:25:58,326 (140168510699328) INFO Wrote audio to ./audio/flow-multi.py_i1_s3.wav
2023-10-12 17:25:58,326 (140168510699328) INFO Done running file [flow-multi.py]
2023-10-12 17:25:58,327 (140168510699328) INFO Done
What’s list_of_requests?
The client expects all input files to declare a global array named list_of_requests. It sequentially processes the requests contained in that array.
You may optionally instruct the client to wait a number of seconds between requests, by appending a number value to list_of_requests. For example:
list_of_requests.append(request1)
list_of_requests.append(1.5)
list_of_requests.append(request2)
Once request1 is complete, the client pauses for 1.5 seconds before executing request2.
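Conceptually, the client’s handling of list_of_requests amounts to something like the sketch below. This is a simplified illustration, not the actual client.py code, which also handles authorization, logging, event responses, and saving audio to disk.
# Simplified sketch: walk list_of_requests over one channel/stub,
# sending each request and sleeping on numeric entries.
import time
from nuance.tts.v1 import synthesizer_pb2_grpc
from nuance.tts.v1.synthesizer_pb2 import GetVoicesRequest, SynthesisRequest

def run_flow(list_of_requests, channel):
    stub = synthesizer_pb2_grpc.SynthesizerStub(channel)
    for item in list_of_requests:
        if isinstance(item, (int, float)):      # a number means "pause this long"
            time.sleep(item)
        elif isinstance(item, GetVoicesRequest):
            print(stub.GetVoices(item))
        elif isinstance(item, SynthesisRequest):
            for response in stub.Synthesize(item):   # streamed responses
                if response.HasField("audio"):
                    print(f"Received audio: {len(response.audio)} bytes")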
Run client for unary response
By default, the synthesized voice is streamed back to the client, but you may request a unary (non-streamed, single package) response.
- Using the sample client, include the --sendUnary argument in the run script. This example uses the same input flow.py file as Synthesize text input.
...
python3 client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
  --clientID $CLIENT_ID --clientSecret $SECRET \
  --secure --serverUrl tts.api.nuance.com \
  --saveAudio --saveAudioAsWav --sendUnary
...
python client.py --oauthURL https://auth.crt.nuance.com/oauth2/token ^
  --clientID %CLIENT_ID% --clientSecret %SECRET% ^
  --secure --serverUrl tts.api.nuance.com ^
  --saveAudio --saveAudioAsWav --sendUnary
- Run the client using the shell script or batch file.
./run-mix-client.sh
run-mix-client.bat
This unary response returns a single non-streamed audio package, logged as one Received audio line:
2023-10-12 17:33:10,583 (140049666340672) INFO Obtaining auth token
2023-10-12 17:33:10,629 (140049666340672) DEBUG Creating secure gRPC channel
...
2023-10-12 17:33:10,900 (140049666340672) DEBUG [voice {
name: "Evan"
model: "Enhanced"
}
audio_params {
audio_format {
pcm {
sample_rate_hz: 22050
}
}
volume_percentage: 80
speaking_rate_factor: 1.0
audio_chunk_duration_ms: 2000
}
input {
ssml {
text: "<speak>It\'s 24,901 miles around the earth, or 40,075 km.</speak>"
}
resources {
uri: "urn:nuance-mix:tag:tuning:lang/coffee_app/coffee_dict/en-us/mix.tts"
}
}
user_id: "MyApplicationUser"
]
2023-10-12 17:33:12,363 (140049666340672) INFO Sending Unary Synthesis request
2023-10-12 17:33:12,363 (140049666340672) INFO Received audio: 254058 bytes
2023-10-12 17:33:12,363 (140049666340672) INFO Received status response: SUCCESS
2023-10-12 17:33:12,364 (140049666340672) INFO Wrote audio to ./audio/flow.py_i1_s1.wav
2023-10-12 17:33:12,364 (140049666340672) INFO Done running file [flow.py]
2023-10-12 17:33:12,365 (140049666340672) INFO Done
If you have multiple requests, each request returns a single audio package.
See also Streamed vs. unary response.
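As a rough sketch of the difference at the gRPC level, assuming the generated SynthesizerStub from synthesizer_pb2_grpc.py and a token obtained as in the run script: Synthesize yields a stream of responses, while UnarySynthesize returns one response whose audio field holds the complete audio.
# Hedged sketch: streamed vs. unary synthesis with the generated stub.
import grpc
from nuance.tts.v1 import synthesizer_pb2_grpc
from nuance.tts.v1.synthesizer_pb2 import SynthesisRequest

token = "<access token>"  # obtained as in the run script

request = SynthesisRequest()
request.voice.name = "Evan"
request.voice.model = "enhanced"
request.input.text.text = "This is a test. A very simple test."

creds = grpc.composite_channel_credentials(
    grpc.ssl_channel_credentials(),
    grpc.access_token_call_credentials(token))

with grpc.secure_channel("tts.api.nuance.com:443", creds) as channel:
    stub = synthesizer_pb2_grpc.SynthesizerStub(channel)

    # Streamed: audio arrives in chunks as the service produces it.
    streamed_audio = b""
    for response in stub.Synthesize(request):
        if response.HasField("audio"):
            streamed_audio += response.audio

    # Unary: one response containing the whole audio.
    unary_response = stub.UnarySynthesize(request)
    unary_audio = unary_response.audio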