Sample synthesis client for Neural TTSaaS
Neural TTSaaS offers a fully functional Python client that you may download and use on Linux or Windows to synthesize speech using the Synthesizer gRPC API for Neural TTSaaS.
Note:
This is the same client provided with TTSaaS. When used with Neural TTSaaS, the client includes different SSML sample input and a header that sends the request to the Neural TTSaaS service.
To run this client, you need:
- Python 3.6 or later.
- The generated Python stub files from gRPC setup.
- Your Mix client ID and secret from Prerequisites from Mix.
- The Python client files: sample-synthesis-client.zip
Download the zip file to Linux or Windows and extract the files into the same directory as the nuance directory, which contains your proto files and Python stubs.
On Linux, give the shell script execute permission with chmod +x. For example:
unzip sample-synthesis-client.zip
chmod +x run-client.sh
These are the resulting client files, in the same directory as the nuance directory.
├── client.py
├── flow.py
├── flow-multi.py
├── run-client.sh
├── run-client.bat
└── nuance
    └── tts
        └── v1
            ├── synthesizer_pb2_grpc.py
            ├── synthesizer_pb2.py
            └── synthesizer.proto
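Before running the client, it can help to confirm that the files landed where the listing above shows them. The following is a minimal sketch using only the standard library; the helper name missing_client_files is hypothetical and not part of the client.

```python
from pathlib import Path

# Relative paths taken from the directory listing above.
EXPECTED_FILES = [
    "client.py",
    "flow.py",
    "flow-multi.py",
    "run-client.sh",
    "run-client.bat",
    "nuance/tts/v1/synthesizer_pb2_grpc.py",
    "nuance/tts/v1/synthesizer_pb2.py",
    "nuance/tts/v1/synthesizer.proto",
]


def missing_client_files(root="."):
    """Return the expected files that are absent under root (hypothetical helper)."""
    base = Path(root)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


# Example: run from the directory where you extracted the zip file.
missing = missing_client_files(".")
print("Missing files:", missing if missing else "none")
```

An empty result means the client files and the nuance directory sit side by side as expected.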
You can use the client to check for available voices and/or request synthesis. Here are a few scenarios you can try.
Get help
For a quick check that the client is working, and to see the arguments it accepts, run it on Linux or Windows using the help (-h or --help) option.
See the results below and notice:
- -s or --serverUrl: The URL of the service. The sample run script specifies the Mix service, tts.api.nuance.com, on its default port, 443.
- Authorization: Include --oauthURL, --clientID, and --clientSecret. Alternatively, use the (hidden) --token argument. See Authorize.
- --neural: Include this argument to send the request to Neural TTSaaS. The client adds the x-nuance-tts-neural header as it calls the service, which directs the request to Neural TTSaaS instead of TTSaaS.
- -f or --files: The name of the input file to use for the request. The default is flow.py.
The results are the same on Linux and Windows:
python3 client.py --help
usage: client.py [-options]
options:
-h, --help Show this help message and exit
--appid [appID:client-id] Not used
-f file [file ...], --files file [file ...] List of flow files to execute sequentially,
default=['flow.py']
-p, --parallel Run each flow in a separate thread
-i [num], --iterations [num] Number of times to run the list of files, default=1
--infinite Run all files infinitely (overrides number of
iterations)
-t [num], --timeoutSeconds [num] Timeout in seconds for every RPC call, default=30
-s [url], --serverUrl [url] NVC server URL, default=localhost:8080
--oauthURL [url] OAuth 2.0 URL
--clientRequestID [id] Client-generated request ID
--clientID [url] OAuth 2.0 Client ID
--clientSecret [url] OAuth 2.0 Client Secret
--oauthScope [url] OAuth 2.0 Scope, default=tts
--secure Connect to the server using a secure gRPC channel
--rootCerts [file] Not used
--privateKey [file] Not used
--certChain [file] Not used
--audioDir [dir] Audio output directory, default=./audio
--saveAudio Save whole audio to disk
--saveAudioChunks Save each individual audio chunk to disk
--saveAudioAsWav Save each audio file in the WAVE format
--jaeger [addr] Not used
--sendUnary Not used
--sendHTTP Not used
--maxReceiveSizeMB [megabytes] Maximum length of gRPC server response in megabytes,
default=50 MB
--neural Send the request to Neural TTS, if available.
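If you script around the client, it may be useful to see how a few of these options and their defaults could be declared. The sketch below mirrors a subset of the help output with argparse. It is illustrative only and is not the code in client.py, which may parse its arguments differently.

```python
import argparse


def build_parser():
    """Declare a few of the options listed in the help output (illustrative only)."""
    parser = argparse.ArgumentParser(prog="client.py")
    parser.add_argument("-f", "--files", nargs="+", default=["flow.py"],
                        metavar="file", help="Flow files to execute sequentially")
    parser.add_argument("-i", "--iterations", type=int, default=1)
    parser.add_argument("-t", "--timeoutSeconds", type=int, default=30)
    parser.add_argument("-s", "--serverUrl", default="localhost:8080")
    parser.add_argument("--oauthScope", default="tts")
    parser.add_argument("--audioDir", default="./audio")
    parser.add_argument("--secure", action="store_true")
    parser.add_argument("--saveAudio", action="store_true")
    parser.add_argument("--saveAudioAsWav", action="store_true")
    parser.add_argument("--neural", action="store_true")
    return parser


# Parse an argument list similar to the one used by the sample run script.
args = build_parser().parse_args(
    ["--secure", "--serverUrl", "tts.api.nuance.com:443", "--neural"]
)
print(args.files, args.serverUrl, args.neural)
# prints: ['flow.py'] tts.api.nuance.com:443 True
```

Note that the input file falls back to flow.py when -f or --files is not given, matching the default shown in the help output.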
Input files
The sample client includes two input files, flow.py and flow-multi.py. These files provide an easy way to customize the client without editing the main client.py file.
You’ll learn more about these input files in the following sections.
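The flow files are plain Python modules that build a list named list_of_requests (flow-multi.py also appends integers to request pauses). As a rough illustration of how such a file could be consumed, the sketch below loads a flow file with runpy and walks the list. The run_flow function and its send callback are hypothetical stand-ins, not part of client.py, whose actual implementation may differ.

```python
import runpy
import time


def run_flow(path, send, sleep=time.sleep):
    """Load a flow file and process its list_of_requests in order.

    send is a caller-supplied callable standing in for the gRPC call.
    Numeric entries are treated as pauses, in seconds.
    """
    namespace = runpy.run_path(path)  # Execute the flow file, collect its globals
    results = []
    for item in namespace["list_of_requests"]:
        if isinstance(item, (int, float)):
            sleep(item)               # Pause between requests
        else:
            results.append(send(item))
    return results


# Example (requires the generated stubs to be importable by the flow file):
# run_flow("flow.py", send=print)
```

Keeping the requests in a separate module means the request content can change without touching the code that opens the channel and calls the service.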
Synthesize text input
In this first scenario, use the default input file to ask Neural TTSaaS to synthesize a text string using SynthesisRequest and save the resulting audio in a wave file.
- Edit the run script, run-client.sh or run-client.bat, to add your Mix client ID and secret. (See Authorize for details.)

Linux (run-client.sh):

#!/bin/bash

CLIENT_ID=<Mix client ID, starting with appID:>
SECRET=<Mix client secret>
# Change colons (:) to %3A in client ID
CLIENT_ID=${CLIENT_ID//:/%3A}

python3 client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
--clientID $CLIENT_ID --clientSecret $SECRET \
--secure --serverUrl tts.api.nuance.com:443 --neural --saveAudio --saveAudioAsWav

Windows (run-client.bat):

@echo off
setlocal enabledelayedexpansion

set CLIENT_ID=<Mix client ID, starting with appID:>
set SECRET=<Mix client secret>
rem Change colons (:) to %3A in client ID
set CLIENT_ID=!CLIENT_ID::=%%3A!

python client.py --oauthURL https://auth.crt.nuance.com/oauth2/token ^
--clientID %CLIENT_ID% --clientSecret %SECRET% ^
--secure --serverUrl tts.api.nuance.com --neural --saveAudio --saveAudioAsWav

Notice the --neural argument. This adds the x-nuance-tts-neural header, which directs the request to Neural TTSaaS instead of TTSaaS.

Also notice the --saveAudio and --saveAudioAsWav arguments. These save the synthesized result as a wave file. There is no need to include the --files argument since flow.py is the default input filename.
- Open the input file, flow.py, and notice the two sections:

# GetVoices request asks for information about the JennyNeural voice.
# Synthesis request requests the same voice and provides input text to synthesize.

from nuance.tts.v1.synthesizer_pb2 import *

list_of_requests = []

# GetVoices request
request = GetVoicesRequest()
request.voice.name = "en-US-JennyNeural"
#request.voice.language = "en-US"
#request.voice.gender = EnumGender.FEMALE

# Add request to list
list_of_requests.append(request)

# ---

# Synthesis request
request = SynthesisRequest()
request.voice.name = "en-US-JennyNeural"

pcm = PCM(sample_rate_hz=16000)
request.audio_params.audio_format.pcm.CopyFrom(pcm)

request.input.text.text = "This is a test, a very simple test."
#request.input.ssml.text = . . .

#request.user_id = "MyApplicationUser"
#request.client_data["company"] = "My Company"
#request.client_data["user"] = "My User Name"

# Add request to list
list_of_requests.append(request)

# ---
- Run the client using the script or batch file:
./run-client.sh
run-client.bat
The client first sends a GetVoicesRequest, which returns information about the JennyNeural voice.
It then sends a SynthesisRequest to turn the text into speech using the same voice, and creates a file named flow.py_i1_s1.wav in the default --audioDir location, ./audio. The WAV file contains the voice of Jenny saying “This is a test, a very simple test.”
These are the results. Some lines are omitted for brevity.
2023-10-26 16:48:16,111 (139817866266432) INFO Obtaining auth token
2023-10-26 16:48:16,476 (139817866266432) DEBUG Creating secure gRPC channel
2023-10-26 16:48:16,483 (139817866266432) INFO Running file [flow.py]
2023-10-26 16:48:16,483 (139817866266432) DEBUG [voice {
name: "en-US-JennyNeural"
}
, voice {
name: "en-US-JennyNeural"
}
input {
text {
text: "This is a test, a very simple test."
}
}
]
2023-10-26 16:48:16,483 (139817866266432) INFO Sending GetVoices request
2023-10-26 16:48:16,483 (139817866266432) INFO Adding x-nuance-tts-neural header
2023-10-26 16:48:16,615 (139817866266432) INFO voices {
name: "en-US-JennyNeural"
model: "neural"
language: "en-US"
gender: FEMALE
sample_rate_hz: 24000
styles: "assistant"
styles: "chat"
styles: "customerservice"
styles: "newscast"
styles: "angry"
styles: "cheerful"
styles: "sad"
styles: "excited"
styles: "friendly"
styles: "terrified"
styles: "shouting"
styles: "unfriendly"
styles: "whispering"
styles: "hopeful"
}
2023-10-26 16:48:16,616 (139817866266432) INFO Adding x-nuance-tts-neural header
2023-10-26 16:48:16,616 (139817866266432) INFO Sending Synthesis request
2023-10-26 16:48:16,897 (139817866266432) INFO Received audio: 62842 bytes
2023-10-26 16:48:16,897 (139817866266432) INFO Received audio: 30870 bytes
2023-10-26 16:48:16,898 (139817866266432) INFO Received audio: 66 bytes
2023-10-26 16:48:16,898 (139817866266432) INFO Received status response: SUCCESS
2023-10-26 16:48:16,899 (139817866266432) INFO Wrote audio to ./audio/flow.py_i1_s1.wav
2023-10-26 16:48:16,899 (139817866266432) INFO Done running file [flow.py]
2023-10-26 16:48:16,900 (139817866266432) INFO Done
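To confirm what the client wrote, you can inspect the WAV file's header with Python's standard wave module. This is a minimal sketch; the describe_wav helper is a hypothetical name, and the path in the commented example is the one reported in the "Wrote audio to" log line.

```python
import wave


def describe_wav(path):
    """Return basic properties of a WAV file using the standard wave module."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }


# Example, after a successful run of the client:
# print(describe_wav("./audio/flow.py_i1_s1.wav"))
```

The reported sample rate can be compared with the value passed to PCM(sample_rate_hz=...) in the flow file.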
Warning:
The file created by the client, flow.py_i1_s1.wav, is overwritten every time you run the client. If you want to keep the file, rename it, for example, to jenny-simple.wav.
Change text and voice
Optionally change the voice and the input text in the synthesis request, and rerun the client. (To learn what other voices are available, see Get voices below.) For example:
# Synthesis request
request.voice.name = "en-US-ChristopherNeural"
request.input.text.text = "Your coffee will be ready in 5 minutes."
Include metadata
You may include metadata that will appear in event logs. Uncomment the following lines in the sample flow.py file and add your own values for user_id and one or more client_data key-value pairs:
request.user_id = "MyApplicationUser"
request.client_data["company"] = "My Company"
request.client_data["user"] = "My User Name"
The information is shown in the results:
2023-10-26 16:52:59,572 (140255182530368) DEBUG [voice {
name: "en-US-JennyNeural"
}
, voice {
name: "en-US-JennyNeural"
}
input {
text {
text: "This is a test, a very simple test."
}
}
client_data {
key: "company"
value: "My Company"
}
client_data {
key: "user"
value: "My User Name"
}
user_id: "MyApplicationUser"
]
Synthesize SSML input
You may provide SSML input instead of plain text.
- Edit flow.py to disable the request.input.text.text line and enable request.input.ssml.text.

Optionally remove the enclosing <speak> </speak> element in the SSML, as Neural TTSaaS will add it automatically.

from nuance.tts.v1.synthesizer_pb2 import *

list_of_requests = []

# Synthesis request
request = SynthesisRequest()
request.voice.name = "en-US-JennyNeural"

pcm = PCM(sample_rate_hz=16000)
request.audio_params.audio_format.pcm.CopyFrom(pcm)

#request.input.text.text = "This is a test, a very simple test."
request.input.ssml.text = '''<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-JennyNeural">Hello, it's Jenny.</voice>
<voice name="en-US-AriaNeural">Hi, it's Aria.</voice>
</speak>'''

# Add request to list
list_of_requests.append(request)
- Run the client as before.
./run-client.sh
run-client.bat
The client sends a SynthesisRequest to turn the SSML text into speech. It creates a file named flow.py_i1_s1.wav containing the speech: Jenny saying “Hello, it’s Jenny,” followed by Aria saying “Hi, it’s Aria.”
These are the results. (Some lines are omitted for brevity.)
2022-12-13 09:45:07,272 (140618171987776) INFO Obtaining auth token
2022-12-13 09:45:07,642 (140618171987776) DEBUG Creating secure gRPC channel
2022-12-13 09:45:07,649 (140618171987776) INFO Running file [flow.py]
2022-12-13 09:45:07,649 (140618171987776) DEBUG [voice {
name: "en-US-JennyNeural"
}
, voice {
name: "en-US-JennyNeural"
}
audio_params {
audio_format {
pcm {
sample_rate_hz: 16000
}
}
}
input {
ssml {
text: "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">\n<voice name=\"en-US-JennyNeural\">Hello, it\'s Jenny.</voice>\n<voice name=\"en-US-AriaNeural\">Hi, it\'s Aria.</voice>\n</speak>"
}
}
]
2022-12-13 09:45:07,649 (140618171987776) INFO Sending GetVoices request
2022-12-13 09:45:07,649 (140618171987776) INFO Adding x-nuance-tts-neural header
2022-12-13 09:45:08,049 (140618171987776) INFO voices {
name: "en-US-JennyNeural"
...
}
2022-12-13 09:45:08,050 (140618171987776) INFO Adding x-nuance-tts-neural header
2022-12-13 09:45:08,050 (140618171987776) INFO Sending Synthesis request
2022-12-13 09:45:08,373 (140618171987776) INFO Received audio: 34358 bytes
2022-12-13 09:45:08,400 (140618171987776) INFO Received audio: 25642 bytes
2022-12-13 09:45:08,467 (140618171987776) INFO Received audio: 34358 bytes
2022-12-13 09:45:08,468 (140618171987776) INFO Received audio: 24842 bytes
2022-12-13 09:45:08,469 (140618171987776) INFO Received status response: SUCCESS
2022-12-13 09:45:08,470 (140618171987776) INFO Wrote audio to ./audio/flow.py_i1_s1.wav
2022-12-13 09:45:08,470 (140618171987776) INFO Done running file [flow.py]
2022-12-13 09:45:08,471 (140618171987776) INFO Done
For more SSML examples, including how to add lexicons and prerecorded audio, see Reference topics: Input to synthesize and SSML elements.
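Because malformed SSML is easy to produce when editing by hand, you may want to check the markup locally before sending it. The sketch below parses the sample SSML with the standard xml.etree.ElementTree module and lists the voices it references. The voices_in_ssml helper is hypothetical, and the check only tests that the XML is well-formed: it does not validate SSML semantics, and it assumes the input keeps its enclosing speak element.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# The SSML input from the sample flow.py above.
SSML = '''<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-JennyNeural">Hello, it's Jenny.</voice>
<voice name="en-US-AriaNeural">Hi, it's Aria.</voice>
</speak>'''


def voices_in_ssml(ssml):
    """Parse SSML and return the voice names it references, in document order.

    Raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
    """
    root = ET.fromstring(ssml)
    voice_tags = ("voice", f"{{{SSML_NS}}}voice")  # With or without the namespace
    return [el.get("name") for el in root.iter() if el.tag in voice_tags]


print(voices_in_ssml(SSML))
# prints: ['en-US-JennyNeural', 'en-US-AriaNeural']
```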
Get voices
When you ask Neural TTSaaS to synthesize text, you must specify a named voice. To learn which voices are available, send a GetVoicesRequest, entering your requirements in the flow.py input file.
- Make sure your run script, run-client.sh or run-client.bat, contains your Mix client ID and secret. (See Authorize for details.)
- Edit the input file, flow.py, to request American English female voices. This combination of options returns voices that are both American English and female. Optionally turn off synthesis for this request.

from nuance.tts.v1.synthesizer_pb2 import *

list_of_requests = []

# GetVoices request
request = GetVoicesRequest()
#request.voice.name = "en-US-JennyNeural"
request.voice.language = "en-US"          # Request American English voices
request.voice.gender = EnumGender.FEMALE  # Request female voices

# Add request to list
list_of_requests.append(request)          # Make sure voice request is enabled

# Synthesis request
...
# Add request to list
#list_of_requests.append(request)         # Disable synthesis request

- Run the client using the script or batch file:
./run-client.sh
run-client.bat
The results include all female American English voices available. Neural TTSaaS returns the following information for each voice:
- All voices include the voice name, model (usually “neural”), language code, gender, and audio sampling rate.
- Voices that support expression styles return a list of styles that you may include in SSML input. See Voice style.
- The Jenny multilingual voice returns the languages other than English (“foreign_languages”) that this voice supports. See Multilingual voice.
These are the American English female voices in the results:
2022-12-13 09:50:12,489 (140290220357440) INFO Obtaining auth token
2022-12-13 09:50:12,769 (140290220357440) DEBUG Creating secure gRPC channel
2022-12-13 09:50:12,775 (140290220357440) INFO Running file [flow.py]
2022-12-13 09:50:12,775 (140290220357440) DEBUG [voice {
language: "en-US"
gender: FEMALE
}
]
2022-12-13 09:50:12,776 (140290220357440) INFO Sending GetVoices request
2022-12-13 09:50:12,776 (140290220357440) INFO Adding x-nuance-tts-neural header
2022-12-13 09:50:13,223 (140290220357440) INFO voices {
name: "en-US-JennyNeural"
model: "neural"
language: "en-US"
gender: FEMALE
sample_rate_hz: 24000
styles: "assistant"
styles: "chat"
styles: "customerservice"
styles: "newscast"
styles: "angry"
styles: "cheerful"
styles: "sad"
styles: "excited"
styles: "friendly"
styles: "terrified"
styles: "shouting"
styles: "unfriendly"
styles: "whispering"
styles: "hopeful"
}
voices {
name: "en-US-JennyMultilingualNeural"
model: "neural"
language: "en-US"
gender: FEMALE
sample_rate_hz: 24000
foreign_languages: "de-DE"
foreign_languages: "en-AU"
foreign_languages: "en-CA"
foreign_languages: "en-GB"
foreign_languages: "es-ES"
foreign_languages: "es-MX"
foreign_languages: "fr-CA"
foreign_languages: "fr-FR"
foreign_languages: "it-IT"
foreign_languages: "ja-JP"
foreign_languages: "ko-KR"
foreign_languages: "pt-BR"
foreign_languages: "zh-CN"
}
voices {
name: "en-US-AmberNeural"
model: "neural"
language: "en-US"
gender: FEMALE
sample_rate_hz: 24000
}
voices {
name: "en-US-AnaNeural"
model: "neural"
language: "en-US"
gender: FEMALE
sample_rate_hz: 24000
}
voices {
name: "en-US-AriaNeural"
model: "neural"
language: "en-US"
gender: FEMALE
sample_rate_hz: 24000
styles: "chat"
styles: "customerservice"
styles: "narration-professional"
styles: "newscast-casual"
styles: "newscast-formal"
styles: "cheerful"
styles: "empathetic"
styles: "angry"
styles: "sad"
styles: "excited"
styles: "friendly"
styles: "terrified"
styles: "shouting"
styles: "unfriendly"
styles: "whispering"
styles: "hopeful"
}
... Voices omitted here ...
2022-12-13 09:50:13,223 (140290220357440) INFO Done running file [flow.py]
2022-12-13 09:50:13,227 (140290220357440) INFO Done
Get more voices
You can experiment with this request by commenting and uncommenting the request.voice lines in your flow.py file. For example, uncomment only the language line to see all American English voices, or change the language to es-ES, for example, to see Spanish voices.
# GetVoices request
request = GetVoicesRequest()
#request.voice.name = "en-US-JennyNeural"
request.voice.language = "en-US" # Or try "es-ES", "en-GB", or "zh-CN"
#request.voice.gender = EnumGender.FEMALE
Or, to see all available voices, comment out all request.voice lines, leaving only the main GetVoicesRequest.
# GetVoices request
request = GetVoicesRequest() # Keep only this line to see all voices
#request.voice.name = "en-US-JennyNeural"
#request.voice.language = "en-US"
#request.voice.gender = EnumGender.FEMALE
Redirect results to file
If you request a large number of voices, you may wish to save the output to a file. For example, this requests all voices and saves them to a text file.
$ ./run-client.sh &> all-voices.txt
$ ls *.txt
-rw-r--r-- 1 xxx xxx 60185 Apr 17 14:57 all-voices.txt
$ cat all-voices.txt
>run-client.bat > all-voices.txt 2>&1
>dir *.txt
2023-04-17 11:15 AM 63,498 all-voices.txt
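Once the output is saved, a short script can summarize it. The sketch below assumes the saved file looks like the log excerpts shown earlier, with each voice in a block that starts with "voices {" and ends with "}". The parse_voices helper is hypothetical and relies only on that observed text layout, not on any documented output format.

```python
import re

FIELD = re.compile(r'^(\w+):\s*(.+)$')
REPEATED = {"styles", "foreign_languages"}  # Fields that occur several times per voice


def parse_voices(text):
    """Extract voice blocks from saved client output into a list of dicts."""
    voices, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("voices {"):        # Start of a voice block
            current = {}
        elif line == "}" and current is not None:
            voices.append(current)           # End of a voice block
            current = None
        elif current is not None:
            match = FIELD.match(line)
            if match:
                key, value = match.group(1), match.group(2).strip('"')
                if key in REPEATED:
                    current.setdefault(key, []).append(value)
                else:
                    current[key] = value
    return voices


# Example, after redirecting the results to all-voices.txt:
# with open("all-voices.txt", encoding="utf-8") as f:
#     for voice in parse_voices(f.read()):
#         print(voice["name"], voice.get("styles", []))
```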
Multiple requests
You can send multiple requests for synthesis (and/or get voices) in the same session. For efficient communication with Neural TTSaaS, all requests use the same channel and stub. This scenario sends three synthesis requests.
- Use the flow-multi.py input file, which contains three synthesis requests, with a pause between each one.

from nuance.tts.v1.synthesizer_pb2 import *

list_of_requests = []

# Synthesis request
request = SynthesisRequest()        # First request
request.voice.name = "en-US-JennyNeural"
pcm = PCM(sample_rate_hz=22050)
request.audio_params.audio_format.pcm.CopyFrom(pcm)
request.input.text.text = "This is a test. A very simple test."
list_of_requests.append(request)
list_of_requests.append(2)          # Optionally pause after request

# Synthesis request
request = SynthesisRequest()        # Second request
request.voice.name = "en-US-JennyNeural"
pcm = PCM(sample_rate_hz=22050)
request.audio_params.audio_format.pcm.CopyFrom(pcm)
request.input.text.text = "Your coffee will be ready in 5 minutes."
list_of_requests.append(request)
list_of_requests.append(2)          # Optionally pause after request

# Synthesis request
request = SynthesisRequest()        # Third request
request.voice.name = "en-US-ChristopherNeural"
pcm = PCM(sample_rate_hz=22050)
request.audio_params.audio_format.pcm.CopyFrom(pcm)
request.input.text.text = "The wind was a torrent of darkness, among the gusty trees."
list_of_requests.append(request)

- Edit the script or batch file to include the --files argument pointing to flow-multi.py.

...
python3 client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
--clientID $CLIENT_ID --clientSecret $SECRET \
--secure --serverUrl tts.api.nuance.com:443 --neural \
--saveAudio --saveAudioAsWav --files flow-multi.py

- Run the client using the script or batch file.
./run-client.sh
run-client.bat
See the results below and notice the three audio files created:
- flow-multi.py_i1_s1.wav: Jenny saying: “This is a test, a very simple test.”
- flow-multi.py_i1_s2.wav: Jenny saying: “Your coffee will be ready in five minutes.”
- flow-multi.py_i1_s3.wav: Christopher saying: “The wind was a torrent of darkness, among the gusty trees.”
These are the results from multiple synthesis requests:
2022-12-13 15:33:11,048 (139787073779520) INFO Obtaining auth token
2022-12-13 15:33:11,449 (139787073779520) DEBUG Creating secure gRPC channel
2022-12-13 15:33:11,454 (139787073779520) INFO Running file [flow-multi.py]
2022-12-13 15:33:11,454 (139787073779520) DEBUG [voice {
name: "en-US-JennyNeural"
}
audio_params {
audio_format {
pcm {
sample_rate_hz: 22050
}
}
}
input {
text {
text: "This is a test, a very simple test."
}
}
, 2, voice {
name: "en-US-JennyNeural"
}
audio_params {
audio_format {
pcm {
sample_rate_hz: 22050
}
}
}
input {
text {
text: "Your coffee will be ready in 5 minutes."
}
}
, 2, voice {
name: "en-US-ChristopherNeural"
}
audio_params {
audio_format {
pcm {
sample_rate_hz: 22050
}
}
}
input {
text {
text: "The wind was a torrent of darkness, among the gusty trees."
}
}
]
2022-12-13 15:33:11,455 (139787073779520) INFO Adding x-nuance-tts-neural header
2022-12-13 15:33:11,455 (139787073779520) INFO Sending Synthesis request
2022-12-13 15:33:11,966 (139787073779520) INFO Received audio: 55058 bytes
2022-12-13 15:33:11,992 (139787073779520) INFO Received audio: 55126 bytes
2022-12-13 15:33:11,994 (139787073779520) INFO Received audio: 7716 bytes
2022-12-13 15:33:11,995 (139787073779520) INFO Received audio: 30870 bytes
2022-12-13 15:33:11,995 (139787073779520) INFO Received audio: 66 bytes
2022-12-13 15:33:11,996 (139787073779520) INFO Received status response: SUCCESS
2022-12-13 15:33:11,997 (139787073779520) INFO Wrote audio to ./audio/flow-multi.py_i1_s1.wav
2022-12-13 15:33:11,997 (139787073779520) INFO Waiting for 2 seconds
2022-12-13 15:33:14,000 (139787073779520) INFO Adding x-nuance-tts-neural header
2022-12-13 15:33:14,000 (139787073779520) INFO Sending Synthesis request
2022-12-13 15:33:14,378 (139787073779520) INFO Received audio: 55058 bytes
2022-12-13 15:33:14,404 (139787073779520) INFO Received audio: 47958 bytes
2022-12-13 15:33:14,405 (139787073779520) INFO Received audio: 30870 bytes
2022-12-13 15:33:14,405 (139787073779520) INFO Received audio: 66 bytes
2022-12-13 15:33:14,406 (139787073779520) INFO Received status response: SUCCESS
2022-12-13 15:33:14,407 (139787073779520) INFO Wrote audio to ./audio/flow-multi.py_i1_s2.wav
2022-12-13 15:33:14,407 (139787073779520) INFO Waiting for 2 seconds
2022-12-13 15:33:16,410 (139787073779520) INFO Adding x-nuance-tts-neural header
2022-12-13 15:33:16,410 (139787073779520) INFO Sending Synthesis request
2022-12-13 15:33:16,905 (139787073779520) INFO Received audio: 55058 bytes
2022-12-13 15:33:16,933 (139787073779520) INFO Received audio: 55126 bytes
2022-12-13 15:33:16,934 (139787073779520) INFO Received audio: 48510 bytes
2022-12-13 15:33:16,934 (139787073779520) INFO Received audio: 30870 bytes
2022-12-13 15:33:16,935 (139787073779520) INFO Received audio: 66 bytes
2022-12-13 15:33:16,935 (139787073779520) INFO Received status response: SUCCESS
2022-12-13 15:33:16,936 (139787073779520) INFO Wrote audio to ./audio/flow-multi.py_i1_s3.wav
2022-12-13 15:33:16,936 (139787073779520) INFO Done running file [flow-multi.py]
2022-12-13 15:33:16,939 (139787073779520) INFO Done
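If you prefer to launch the client from Python rather than from the shell script or batch file, the same command line can be assembled as an argument list. The sketch below mirrors the arguments in the sample run script, including the replacement of colons with %3A in the client ID, and uses --files as listed in the help output. The build_client_command helper and the example credentials are hypothetical.

```python
def build_client_command(client_id, secret, flow_files=None, python="python3"):
    """Assemble the client.py argument list used by the sample run script."""
    command = [
        python, "client.py",
        "--oauthURL", "https://auth.crt.nuance.com/oauth2/token",
        "--clientID", client_id.replace(":", "%3A"),  # Change colons to %3A
        "--clientSecret", secret,
        "--secure", "--serverUrl", "tts.api.nuance.com:443",
        "--neural", "--saveAudio", "--saveAudioAsWav",
    ]
    if flow_files:
        command += ["--files", *flow_files]  # Omit to use the default, flow.py
    return command


# Placeholder credentials for illustration only.
command = build_client_command("appID:EXAMPLE", "example-secret", ["flow-multi.py"])
print(" ".join(command))

# To run it from the directory containing client.py:
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues and works the same way on Linux and Windows; on Windows, pass python="python" to match the batch file.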