Storage gRPC API

The Storage API contains methods for uploading synthesis resources to a central cloud location. You can then use these resources in the Synthesizer API.

Tip:

Try out this API using a Sample storage client.

Proto file structure

The Storage API is defined in the storage.proto file.

└── nuance
    ├── rpc (RPC message files)
    └── tts
        ├── storage
        │   └── v1beta1
        │       └── storage.proto
        └── v1 
            └── synthesizer.proto

The proto file defines a Storage service with two RPC methods: Upload and Delete.

Proto file fields for Upload method

Proto file fields for Delete method

For the RPC field, see RPC gRPC messages.

Storage

The Storage service offers two methods: Upload and Delete.

Storage service
Method	Request	Response	Description
Upload	UploadRequest stream	UploadResponse	Uploads a synthesis resource to cloud storage and returns a URN to refer to it.
Delete	DeleteRequest	DeleteResponse	Deletes the synthesis resource in storage.

These are the general steps for uploading synthesis resources to cloud storage and deleting them:

Send a stream of UploadRequest messages: first the initial message describing the content, then the content itself. The service returns an UploadResponse containing a URN that identifies the resource.
To remove content from storage, send a DeleteRequest with the URN of the resource. If the resource exists in storage, it is removed, and DeleteResponse returns the status of the delete operation.

UploadRequest

Request to upload (stream) content to central cloud storage. The client sends these messages one at a time, in order: first a message containing upload_init_message, then the data to upload in data_chunk messages. The stream returns a single UploadResponse.

Upload request
Field	Type	Description
One of:
upload_init_message	UploadInitMessage	Mandatory. First message in the RPC input stream, to define the content that will follow.
data_chunk	bytes	Mandatory. Data to upload, in chunks smaller than the allowed maximum gRPC message size. If uploading an ActivePrompt database, the data must be a zip file.

This message includes:

UploadRequest
  upload_init_message (UploadInitMessage)
    context_tag
    name
    metadata
    (One of the following)
    active_prompt_db (ActivePromptDB)
    dictionary (UserDictionary)
    text_ruleset (TextUserRuleset)
    wav (Wav)
  data_chunk

After the initial message, the client reads the file in chunks and sends each chunk in its own UploadRequest:

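# Inside the request generator, after yielding the init message:
while True:
    data = file_handle.read(max_chunk_size_bytes)
    if not data:
        log.info("Done reading data")
        break
    # Each chunk is sent in its own UploadRequest
    upload_request = UploadRequest()
    upload_request.data_chunk = data
    yield upload_request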
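The ordering rule for the stream (init message first, then data chunks) can be isolated in a small generator. This is a minimal sketch, not the sample client's read_file: the hypothetical factory parameters make_init_request and make_chunk_request stand in for building UploadRequest messages from the generated stubs, so the generator runs without them.

```python
import io
from typing import BinaryIO, Callable, Iterator, TypeVar

Req = TypeVar("Req")


def iter_upload_requests(
    file_handle: BinaryIO,
    make_init_request: Callable[[], Req],
    make_chunk_request: Callable[[bytes], Req],
    max_chunk_size_bytes: int,
) -> Iterator[Req]:
    """Yield the init request first, then one request per chunk of file data.

    make_init_request and make_chunk_request are hypothetical factories; in a
    real client they would return UploadRequest messages with
    upload_init_message or data_chunk set.
    """
    if max_chunk_size_bytes <= 0:
        raise ValueError("max_chunk_size_bytes must be positive")
    # The init message must be the first message in the stream.
    yield make_init_request()
    while True:
        data = file_handle.read(max_chunk_size_bytes)
        if not data:  # end of file
            break
        yield make_chunk_request(data)


# Usage with plain tuples standing in for UploadRequest messages.
requests = list(
    iter_upload_requests(
        io.BytesIO(b"abcdefgh"),
        make_init_request=lambda: ("init", None),
        make_chunk_request=lambda chunk: ("chunk", chunk),
        max_chunk_size_bytes=3,
    )
)
print(requests)
```

In a real client, make_chunk_request could simply return UploadRequest(data_chunk=chunk), since generated protobuf message classes accept field values as keyword arguments.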

UploadInitMessage

The required first message sent by the client. It defines the type of the content as well as the output URN. Included in UploadRequest. There are three types of URNs:

Language-scoped: urn:nuance-mix:tag:tuning:lang/<context_tag>/<name>/<language>/mix.tts
Voice-scoped: urn:nuance-mix:tag:tuning:voice/<context_tag>/<name>/<voice>/mix.tts
Audio-scoped: urn:nuance-mix:tag:tuning:audio/<context_tag>/<name>/mix.tts
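Because the URN format is predictable, a client can compute a URN locally before uploading, or rebuild one later for a DeleteRequest. A minimal sketch: build_urn and its scope strings ("lang", "voice", "audio", taken from the formats above) are hypothetical, and the example values match the coffee application upload shown under UploadResponse.

```python
from typing import Optional

URN_PREFIX = "urn:nuance-mix:tag:tuning"


def build_urn(scope: str, context_tag: str, name: str,
              qualifier: Optional[str] = None) -> str:
    """Build a tuning-resource URN in one of the three documented formats.

    scope: "lang", "voice", or "audio".
    qualifier: the language (lang scope) or voice (voice scope); not used for audio.
    """
    if scope in ("lang", "voice"):
        if not qualifier:
            raise ValueError(f"{scope}-scoped URNs need a language or voice")
        return f"{URN_PREFIX}:{scope}/{context_tag}/{name}/{qualifier}/mix.tts"
    if scope == "audio":
        if qualifier:
            raise ValueError("audio-scoped URNs take no language or voice")
        return f"{URN_PREFIX}:audio/{context_tag}/{name}/mix.tts"
    raise ValueError(f"unknown scope: {scope!r}")


print(build_urn("voice", "coffee_app", "coffee_prompts", "evan"))
```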

Upload initial message
Field	Type	Description
context_tag	string	Mandatory. Context tag of the current application. A context tag can contain many resources. Will be included in the URN.
name	string	Mandatory. Name of the uploaded content. Should be unique within a context tag. Will be included in the URN.
metadata	map<string,string>	Map of client-supplied metadata key, value pairs.
One of:		Mandatory. Resource type to upload.
active_prompt_db	ActivePromptDB	ActivePrompt database (application/x-vocalizer-activeprompt-db). Voice-scoped.
dictionary	UserDictionary	User dictionary (application/edct-bin-dictionary). Language-scoped.
text_ruleset	TextUserRuleset	Text user ruleset (application/x-vocalizer-rettt+text). Language-scoped.
binary_ruleset	BinaryUserRuleset	Not supported. Binary user ruleset (application/x-vocalizer-rettt+bin).
wav	Wav	Wav audio file, for insertion into synthesis via SSML or Nuance control codes. See SSML input and Tokenized sequence.
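A client that builds URNs locally also needs to know which scope each resource type uses. The table above can be captured as a lookup keyed by the oneof field name. This is a sketch; ResourceKind and resource_kind are hypothetical, and mapping wav to the audio-scoped URN format is an assumption, since the table does not state a scope for it.

```python
from typing import NamedTuple, Optional


class ResourceKind(NamedTuple):
    content_type: Optional[str]  # MIME type from the table; None where not stated
    scope: str                   # URN scope: "lang", "voice", or "audio"


# Supported oneof fields of UploadInitMessage, transcribed from the table above.
# Assumption: wav uses the audio-scoped URN format (not stated in the table).
RESOURCE_KINDS = {
    "active_prompt_db": ResourceKind("application/x-vocalizer-activeprompt-db", "voice"),
    "dictionary": ResourceKind("application/edct-bin-dictionary", "lang"),
    "text_ruleset": ResourceKind("application/x-vocalizer-rettt+text", "lang"),
    "wav": ResourceKind(None, "audio"),
}


def resource_kind(field: str) -> ResourceKind:
    """Look up a oneof field name, rejecting the unsupported binary_ruleset."""
    if field == "binary_ruleset":
        raise ValueError("binary (encrypted) user rulesets are not supported")
    try:
        return RESOURCE_KINDS[field]
    except KeyError:
        raise ValueError(f"unknown resource field: {field!r}") from None
```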

This message includes:

UploadRequest
  upload_init_message (UploadInitMessage)
    context_tag
    name
    metadata
    (One of the following)
    active_prompt_db (ActivePromptDB)
      voice
      voice_version
      voice_model
      vocalizer_studio_version
    dictionary (UserDictionary)
      language
    text_ruleset (TextUserRuleset)
      language
    wav (Wav)

This code creates the upload init message, taking the context tag and name from the command-line arguments:

upload_request = UploadRequest()
upload_init_message = UploadInitMessage()
upload_init_message.context_tag = args.context_tag
upload_init_message.name = args.name

ActivePromptDB

Parameters for uploading an ActivePrompt database. Included in UploadInitMessage. See ActivePrompt database.

An ActivePrompt database is a voice-scoped tuning resource used to control the output audio and dynamically insert recordings during synthesis. These databases must be created with Nuance Vocalizer Studio. When uploading an ActivePrompt database:

The database file itself must be renamed to index.dat before upload.
A zip file containing both the .dat file and all recordings is required.
The database and audio must be zipped together without a root folder.
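The packaging rules above can be sketched with Python's standard zipfile module. Assumptions: the database and its recordings sit together in one directory, with recordings at the relative paths the database expects; package_activeprompt and its parameters are hypothetical names, not part of the sample client.

```python
import zipfile
from pathlib import Path


def package_activeprompt(prompt_dir: Path, db_name: str, zip_path: Path) -> None:
    """Zip an ActivePrompt database and its recordings for upload.

    The database is stored as index.dat, and entries are written relative to
    prompt_dir, so the archive has no enclosing root folder.
    """
    prompt_dir = Path(prompt_dir)
    zip_path = Path(zip_path).resolve()
    if not (prompt_dir / db_name).is_file():
        raise FileNotFoundError(prompt_dir / db_name)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(prompt_dir.rglob("*")):
            # Skip directories and the archive itself if it lives in prompt_dir.
            if not path.is_file() or path.resolve() == zip_path:
                continue
            rel = path.relative_to(prompt_dir)
            # Rename the database to index.dat; keep recordings' relative paths.
            arcname = "index.dat" if rel == Path(db_name) else rel.as_posix()
            zf.write(path, arcname=arcname)
```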

Active prompt database
Field	Type	Description
voice	string	Mandatory. Voice name.
voice_version	string	Mandatory. Voice version.
voice_model	string	Mandatory. Voice model.
vocalizer_studio_version	string	Mandatory. Vocalizer Studio version used to build the ActivePrompt.

Parameters for ActivePrompt databases are collected from the user:

options.add_argument("--file", metavar="file", nargs="?",
                     help="File to upload. If an ActivePrompt Database, must be packaged as a zip.", required=True)
options.add_argument("--context_tag", metavar="tag", nargs="?",
                     help="Context tag", default='', required=True)
options.add_argument("--name", metavar="name", nargs="?",
                     help="Resource name", default='', required=True)
options.add_argument("--type", metavar="type", nargs="?",
                     help="Resource type. Must be one of: [activeprompt, "
                          "user_dictionary, text_ruleset]", required=True)
options.add_argument("--voice", metavar="voice", nargs="?",
                     help="ActivePrompt voice", default='')
options.add_argument("--voice_model", metavar="model", nargs="?",
                     help="ActivePrompt voice model", default='')
options.add_argument("--voice_version", metavar="version", nargs="?",
                     help="ActivePrompt voice version", default='')
options.add_argument("--vocalizer_studio_version", metavar="version", nargs="?",
                     help="ActivePrompt Vocalizer Studio version", default='')
. . .
upload_request = UploadRequest()
upload_init_message = UploadInitMessage()
upload_init_message.context_tag = args.context_tag
upload_init_message.name = args.name

if type == 'activeprompt':
    log.info('Type is ActivePromptDB')
    active_prompt_db = ActivePromptDB()
    active_prompt_db.voice = voice
    active_prompt_db.voice_model = voice_model
    active_prompt_db.voice_version = voice_version
    active_prompt_db.vocalizer_studio_version = vocalizer_studio_version
    upload_init_message.active_prompt_db.CopyFrom(active_prompt_db)

UserDictionary

Parameters for uploading a user dictionary. Included in UploadInitMessage. See User dictionary.

A user dictionary is a language-scoped tuning resource, to control pronunciation and acronym expansion.

User dictionary
Field	Type	Description
language	string	Mandatory. IETF language of the dictionary.

Parameters for user dictionaries are collected from the user:

options.add_argument("--file", metavar="file", nargs="?",
                     help="File to upload...", required=True)
options.add_argument("--context_tag", metavar="tag", nargs="?",
                     help="Context tag", default='', required=True)
options.add_argument("--name", metavar="name", nargs="?",
                     help="Resource name", default='', required=True)
options.add_argument("--type", metavar="type", nargs="?",
                     help="Resource type. Must be one of: [activeprompt, "
                          "user_dictionary, text_ruleset]", required=True)
options.add_argument("--language", metavar="language", nargs="?",
                     help="IETF language code. Required if type is [user_dictionary, "
                          "text_ruleset]", default='')
. . .
upload_request = UploadRequest()
upload_init_message = UploadInitMessage()
upload_init_message.context_tag = args.context_tag
upload_init_message.name = args.name
. . .
elif type == "user_dictionary":
    log.info('Type is User Dictionary')
    user_dictionary = UserDictionary()
    user_dictionary.language = language
    upload_init_message.dictionary.CopyFrom(user_dictionary)

TextUserRuleset

Parameters for uploading a text user ruleset. Included in UploadInitMessage. See User ruleset.

A user ruleset is a language-scoped tuning resource used to apply find-and-replace and regular expression rules to the input text.

Text user ruleset
Field	Type	Description
language	string	Mandatory. IETF language of the ruleset.

Parameters for text rulesets are collected from the user:

options.add_argument("--file", metavar="file", nargs="?",
                     help="File to upload...", required=True)
options.add_argument("--context_tag", metavar="tag", nargs="?",
                     help="Context tag", default='', required=True)
options.add_argument("--name", metavar="name", nargs="?",
                     help="Resource name", default='', required=True)
options.add_argument("--type", metavar="type", nargs="?",
                     help="Resource type. Must be one of: [activeprompt, "
                          "user_dictionary, text_ruleset]", required=True)
options.add_argument("--language", metavar="language", nargs="?",
                     help="IETF language code. Required if type is [user_dictionary, "
                          "text_ruleset]", default='')
. . .
upload_request = UploadRequest()
upload_init_message = UploadInitMessage()
upload_init_message.context_tag = args.context_tag
upload_init_message.name = args.name
. . .
elif type == "text_ruleset":
    log.info('Type is Text User Ruleset')
    text_ruleset = TextUserRuleset()
    text_ruleset.language = language
    upload_init_message.text_ruleset.CopyFrom(text_ruleset)
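The sample parsers default the type-specific options to an empty string instead of marking them required, so nothing on the client side stops an upload with an empty voice or language. A hypothetical pre-flight check, assuming the required options mirror the Mandatory fields in the ActivePromptDB, UserDictionary, and TextUserRuleset tables:

```python
from argparse import Namespace
from typing import Dict, List, Tuple

# Options each --type needs beyond --file, --context_tag, and --name,
# following the "Mandatory" fields in the message tables.
REQUIRED_BY_TYPE: Dict[str, Tuple[str, ...]] = {
    "activeprompt": ("voice", "voice_version", "voice_model", "vocalizer_studio_version"),
    "user_dictionary": ("language",),
    "text_ruleset": ("language",),
}


def missing_args(args: Namespace) -> List[str]:
    """Return the type-specific options that are absent or empty."""
    if args.type not in REQUIRED_BY_TYPE:
        raise ValueError(f"unsupported --type: {args.type!r}")
    # '' (the parser default) and None (flag given without a value) both count as missing.
    return [name for name in REQUIRED_BY_TYPE[args.type] if not getattr(args, name, "")]
```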

BinaryUserRuleset

Binary (encrypted) rulesets are not supported.

Wav

A WAV audio recording, uploaded for insertion into synthesis using the SSML <audio> element or the Nuance audio control code. Included in UploadInitMessage. See Audio file.

UploadResponse

Response to UploadRequest, indicating whether the upload was successful.

Upload response
Field	Type	Description
status	nuance.rpc.Status	Any error response means the data was not stored. If no response is received at all (for example, due to a communication issue), the data may have been stored. To retry, send a new UploadRequest stream; any existing content is overwritten.
uri	string	Output URN, used to refer to the content at runtime. This is for informational purposes: the URN format is predictable from the input parameters in the UploadInitMessage. The returned URN includes a type parameter (for example, ?type=activeprompt) that identifies the resource type. This parameter is not required when using the URN in other requests.

This message includes:

UploadResponse
  status (nuance.rpc.Status)
  uri
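Since the type parameter is informational and not needed in later requests, a client may want to store the bare URN and the type separately. A minimal sketch; split_returned_urn is hypothetical and assumes the query uses standard key=value encoding, as in the example response below.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs


def split_returned_urn(uri: str) -> Tuple[str, Optional[str]]:
    """Split a returned URN into the bare URN and its optional type parameter."""
    urn, _, query = uri.partition("?")
    types = parse_qs(query).get("type")
    return urn, (types[0] if types else None)
```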

This code sends the upload request stream and logs the response:

with create_channel() as channel:
    storage_stub = StorageStub(channel)
    request_iterator = read_file(
        file=args.file, context_tag=args.context_tag, name=args.name,
        type=args.type, voice=args.voice, voice_model=args.voice_model,
        voice_version=args.voice_version,
        vocalizer_studio_version=args.vocalizer_studio_version,
        language=args.language, max_chunk_size_bytes=args.max_chunk_size_bytes)
    upload_response = storage_stub.Upload(request_iterator)
    log.info(text_format.MessageToString(upload_response))
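Because a missing response leaves the upload state unknown and re-sending overwrites existing content, retrying the whole upload is a reasonable recovery. One catch: a request generator can only be consumed once, so each attempt needs a fresh stream. A minimal sketch; upload_with_retry is hypothetical, and the default retry_on=(ConnectionError,) is only a stand-in. With the grpc package you would more likely catch grpc.RpcError and check its status code, since retrying an error response caused by invalid input will not help.

```python
import time
from typing import Callable, Iterator, Tuple, Type, TypeVar

Req = TypeVar("Req")
Resp = TypeVar("Resp")


def upload_with_retry(
    upload: Callable[[Iterator[Req]], Resp],
    make_requests: Callable[[], Iterator[Req]],
    attempts: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    backoff_s: float = 0.5,
) -> Resp:
    """Call upload with a freshly built request stream on each attempt.

    make_requests must rebuild the whole stream (init message + chunks) every
    time, because a generator cannot be replayed after a failed attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            return upload(make_requests())
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
    raise ValueError("attempts must be at least 1")
```

With the sample code, the call might look like upload_with_retry(storage_stub.Upload, lambda: read_file(...), retry_on=(grpc.RpcError,)), passing the same read_file arguments as above.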

This is the response to uploading an ActivePrompt database for a coffee application:

./run-ap-storage-client.sh

2021-05-18 11:27:33,610 INFO  Type is ActivePromptDB
2021-05-18 11:27:33,928 INFO  Done reading data
2021-05-18 11:27:34,427 INFO  status {
  status_code: OK
}
uri: "urn:nuance-mix:tag:tuning:voice/coffee_app/coffee_prompts/evan/mix.tts?type=activeprompt"

DeleteRequest

Request to remove an item from storage. This request returns DeleteResponse.

Delete request
Field	Type	Description
uri	string	Mandatory. URN of the uploaded content, using one of these formats: `urn:nuance-mix:tag:tuning:lang/<context_tag>/<name>/<language>/mix.tts`, `urn:nuance-mix:tag:tuning:voice/<context_tag>/<name>/<voice>/mix.tts`, or `urn:nuance-mix:tag:tuning:audio/<context_tag>/<name>/mix.tts`

This message includes:

DeleteRequest
  uri
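Since DeleteResponse reports success even when nothing was stored under the URN, a typo in the URN goes unnoticed. Checking the URN against the three documented formats before sending catches malformed URNs (though not wrong names). A sketch; parse_urn is hypothetical and assumes context tags, names, languages, and voices contain no / or ? characters.

```python
import re
from typing import Dict

_PREFIX = re.escape("urn:nuance-mix:tag:tuning:")
_SEG = r"[^/?]+"       # assumption: segments contain no '/' or '?'
_QUERY = r"(?:\?.*)?"  # tolerate an informational query such as ?type=activeprompt


def _pattern(scope: str, qualifier: str = "") -> "re.Pattern[str]":
    qual = rf"/(?P<{qualifier}>{_SEG})" if qualifier else ""
    return re.compile(
        rf"{_PREFIX}{scope}/(?P<context_tag>{_SEG})/(?P<name>{_SEG}){qual}/mix\.tts{_QUERY}"
    )


_URN_PATTERNS = {
    "lang": _pattern("lang", "language"),
    "voice": _pattern("voice", "voice"),
    "audio": _pattern("audio"),
}


def parse_urn(uri: str) -> Dict[str, str]:
    """Return the scope and named parts of a tuning URN, or raise ValueError."""
    for scope, pattern in _URN_PATTERNS.items():
        match = pattern.fullmatch(uri)
        if match:
            return {"scope": scope, **match.groupdict()}
    raise ValueError(f"not a recognized tuning URN: {uri!r}")
```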

DeleteResponse

Response to DeleteRequest, indicating whether the deletion was successful.

Delete response
Field	Type	Description
status	nuance.rpc.Status	Success means the data is no longer in storage, either because the request deleted it or because it was never there. The delete operation is idempotent.

This message includes:

DeleteResponse
  status (nuance.rpc.Status)
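For unit-testing client code without the service, the documented semantics (re-uploading overwrites existing content, deleting is idempotent) can be mimicked by an in-memory stand-in. This ToyStorage class is a hypothetical test double; it does not model the real service's storage, status codes, or error behavior.

```python
from typing import Dict, Optional


class ToyStorage:
    """In-memory stand-in mimicking the documented Upload/Delete semantics."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def upload(self, urn: str, data: bytes) -> str:
        self._items[urn] = data  # re-uploading the same URN overwrites
        return "OK"

    def delete(self, urn: str) -> str:
        self._items.pop(urn, None)  # deleting a missing item still succeeds
        return "OK"

    def get(self, urn: str) -> Optional[bytes]:
        return self._items.get(urn)
```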

Scalar value types

The data types in the proto files are mapped to equivalent types in the generated client stub files.

Scalar data types
Proto	Notes	C++	Java	Python
double		double	double	float
float		float	float	float
int32	Uses variable-length encoding. Inefficient for encoding negative numbers. If your field is likely to have negative values, use sint32 instead.	int32	int	int
int64	Uses variable-length encoding. Inefficient for encoding negative numbers. If your field is likely to have negative values, use sint64 instead.	int64	long	int/long
uint32	Uses variable-length encoding.	uint32	int	int/long
uint64	Uses variable-length encoding.	uint64	long	int/long
sint32	Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s.	int32	int	int
sint64	Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s.	int64	long	int/long
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	uint32	int	int
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	uint64	long	int/long
sfixed32	Always four bytes.	int32	int	int
sfixed64	Always eight bytes.	int64	long	int/long
bool		bool	boolean	bool
string	A string must always contain UTF-8 encoded or 7-bit ASCII text.	string	String	str/unicode (Python 2), str (Python 3)
bytes	May contain any arbitrary sequence of bytes.	string	ByteString	str (Python 2), bytes (Python 3)
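The int32 versus sint32 notes follow from protobuf's wire format: varints carry 7 payload bits per byte, a negative int32 is sign-extended to 64 bits before encoding (10 bytes), and sint32 first applies ZigZag encoding so small negative values stay short. The pure-Python sketch below only computes encoded lengths for illustration; it is not the protobuf library's implementation.

```python
def zigzag32(n: int) -> int:
    """Map a signed 32-bit int to unsigned, as protobuf does for sint32."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint_len(value: int) -> int:
    """Bytes needed to encode a non-negative value as a base-128 varint."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def int32_wire_len(n: int) -> int:
    # Negative int32 values are sign-extended to 64 bits before varint encoding.
    return varint_len(n & 0xFFFFFFFFFFFFFFFF)


def sint32_wire_len(n: int) -> int:
    return varint_len(zigzag32(n))


for n in (0, 1, -1, -64, 300):
    print(n, int32_wire_len(n), sint32_wire_len(n))
```

The same arithmetic explains the fixed32 note: varint_len(2**28) is 5, one byte more than the fixed four bytes.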
