Storage gRPC API

The Storage API contains methods for uploading synthesis resources to a central cloud location. You can then use these resources in the Synthesizer API.

Proto file structure

The Storage API is defined in the storage.proto file.

└── nuance
    ├── rpc (RPC message files)
    └── tts
        ├── storage
        │   └── v1beta1
        │       └── storage.proto
        └── v1 
            └── synthesizer.proto

The proto file defines a Storage service with two RPC methods: Upload and Delete.

  Proto file fields for Upload method  
  Proto file fields for Delete method  

For the RPC field, see RPC gRPC messages.

Storage

The Storage service offers two methods: Upload and Delete.

Storage service
Method | Request | Response | Description
Upload | UploadRequest stream | UploadResponse | Uploads a synthesis resource to cloud storage and returns a URN to refer to it.
Delete | DeleteRequest | DeleteResponse | Deletes the synthesis resource in storage.

These are the general steps for uploading synthesis resources to cloud storage and deleting them:

  1. Send an UploadRequest with the content to upload and other parameters. The request is streamed to the service and UploadResponse returns a URN to identify the resource.

  2. To remove content from storage, send DeleteRequest with the URN of the resource to remove. If the resource exists in storage, it is removed, and DeleteResponse returns the status of the delete process.

UploadRequest

Requests to upload (stream) content to central cloud storage, sent one at a time in order. First send upload_init_message then the data to upload. This request returns UploadResponse.

Upload request
Field Type Description
One of:
   upload_init_message UploadInitMessage Mandatory. First message in the RPC input stream, to define the content that will follow.
   data_chunk bytes Mandatory. Data to upload, in chunks smaller than the maximum allowed gRPC message size. If uploading an ActivePrompt, a zipped stream is required.
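
The ordering described above (one upload_init_message first, then the data in chunks) can be sketched without the generated stubs. This is a minimal illustration only: FakeUploadRequest is a hypothetical stand-in for the generated UploadRequest message, the init message is modeled as a plain dict, and request_stream is an assumed helper name, not part of the API.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterator, Optional


@dataclass
class FakeUploadRequest:
    """Hypothetical stand-in for the generated UploadRequest message."""
    upload_init_message: Optional[dict] = None
    data_chunk: Optional[bytes] = None


def request_stream(init: dict, source: BinaryIO,
                   max_chunk_size_bytes: int) -> Iterator[FakeUploadRequest]:
    """Yield the init message first, then the content in bounded chunks."""
    # The first message in the stream defines the content that follows.
    yield FakeUploadRequest(upload_init_message=init)
    while True:
        data = source.read(max_chunk_size_bytes)
        if not data:
            break  # end of file: the stream is complete
        # Each later message carries only a data chunk.
        yield FakeUploadRequest(data_chunk=data)


requests = list(request_stream({"name": "coffee_prompts"},
                               io.BytesIO(b"abcdefghij"), 4))
```

With a real client, the generator would yield generated UploadRequest objects instead, and the chunk size would be chosen to stay under the gRPC message size limit configured for the channel.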

This message includes:

UploadRequest
  upload_init_message (UploadInitMessage)
    context_tag
    name
    metadata
    (One of the following)
    active_prompt_db (ActivePromptDB)
    dictionary (UserDictionary)
    text_ruleset (TextUserRuleset)
    wav (Wav)
  data_chunk

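This excerpt reads the file in chunks and yields each chunk as an UploadRequest, after the initial message has been sent:

# Read until the file is exhausted, sending each chunk as its own request
while True:
    data = file_handle.read(max_chunk_size_bytes)
    if not data:
        log.info("Done reading data")
        break
    upload_request = UploadRequest()
    upload_request.data_chunk = data
    yield upload_request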

UploadInitMessage

The required first message sent by the client. It defines the type of the content as well as the output URN. Included in UploadRequest. There are three types of URNs:

  • Language-scoped: urn:nuance-mix:tag:tuning:lang/<context_tag>/<name>/<language>/mix.tts
  • Voice-scoped: urn:nuance-mix:tag:tuning:voice/<context_tag>/<name>/<voice>/mix.tts
  • Audio-scoped: urn:nuance-mix:tag:tuning:audio/<context_tag>/<name>/mix.tts
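
Because the URN is built from the fields of the init message, it can be predicted before uploading. The following is a small sketch of the three formats listed above; build_urn is a hypothetical helper, and the scope names lang, voice, and audio are taken from the URN patterns themselves.

```python
def build_urn(scope: str, context_tag: str, name: str, qualifier: str = "") -> str:
    """Return the URN for a resource; qualifier is the language or the voice."""
    prefix = f"urn:nuance-mix:tag:tuning:{scope}/{context_tag}/{name}"
    if scope in ("lang", "voice"):
        # Language- and voice-scoped URNs carry one extra path segment.
        if not qualifier:
            raise ValueError(f"scope '{scope}' needs a language or voice")
        return f"{prefix}/{qualifier}/mix.tts"
    if scope == "audio":
        return f"{prefix}/mix.tts"
    raise ValueError(f"unknown scope: {scope}")


voice_urn = build_urn("voice", "coffee_app", "coffee_prompts", "evan")
audio_urn = build_urn("audio", "coffee_app", "greeting")
```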
Upload initial message
Field Type Description
context_tag string Mandatory. Context tag of the current application. A context tag can contain many resources. Will be included in the URN.
name string Mandatory. Name of the uploaded content. Should be unique within a context tag. Will be included in the URN.
metadata map<string,string> Map of client-supplied metadata key, value pairs.
One of:   Mandatory. Resource type to upload.
   active_prompt_db ActivePromptDB ActivePrompt database (application/x-vocalizer-activeprompt-db). Voice-scoped.
   dictionary UserDictionary User dictionary (application/edct-bin-dictionary). Language-scoped.
   text_ruleset TextUserRuleset Text user ruleset (application/x-vocalizer-rettt+text). Language-scoped.
   binary_ruleset BinaryUserRuleset Not supported. Binary user ruleset (application/x-vocalizer-rettt+bin).
   wav Wav Wav audio file, for insertion into synthesis via SSML or Nuance control codes. See SSML input and Tokenized sequence.
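
Each resource type needs its own mandatory fields, so a client may want to check its inputs before opening the stream. The sketch below is an assumption about how such a check could look: the type names are the ones accepted by the sample client's --type option, missing_fields is a hypothetical helper, and wav is left out because no Wav fields are listed in this section.

```python
# Mandatory fields per resource type, as listed in the message tables.
REQUIRED_FIELDS = {
    "activeprompt": ("voice", "voice_version", "voice_model",
                     "vocalizer_studio_version"),
    "user_dictionary": ("language",),
    "text_ruleset": ("language",),
}


def missing_fields(resource_type: str, **values: str) -> list:
    """Return the mandatory fields that are absent or empty for this type."""
    if resource_type not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported resource type: {resource_type}")
    return [field for field in REQUIRED_FIELDS[resource_type]
            if not values.get(field)]


problems = missing_fields("user_dictionary", language="")
```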

This message includes:

UploadRequest
  upload_init_message (UploadInitMessage)
    context_tag
    name
    metadata
    (One of the following)
    active_prompt_db (ActivePromptDB)
      voice
      voice_version
      voice_model
      vocalizer_studio_version
    dictionary (UserDictionary)
      language
    text_ruleset (TextUserRuleset)
      language
    wav (Wav)
      status nuance.rpc.Status
      uri

This upload init message takes the context tag and name from arguments:

upload_request = UploadRequest()
upload_init_message = UploadInitMessage()
upload_init_message.context_tag = args.context_tag
upload_init_message.name = args.name

ActivePromptDB

Parameters for uploading an ActivePrompt database. Included in UploadInitMessage. See ActivePrompt database.

An ActivePrompt database is a voice-scoped tuning resource, to control the output audio and dynamically insert recordings during synthesis. These databases must be created through Nuance Vocalizer Studio. When uploading an ActivePrompt database:

  • The database file itself must be renamed to index.dat before upload.
  • A zip file containing both the .dat file and all recordings is required.
  • The database and audio must be zipped together without a root folder.
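
The packaging rules above can be sketched with the Python standard library. This is an illustration under stated assumptions: the recordings are assumed to sit in a single directory, paths below that directory are kept as they are, and package_activeprompt is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def package_activeprompt(db_file: Path, recordings_dir: Path, zip_path: Path) -> None:
    """Zip the database as index.dat plus all recordings, with no root folder."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Store the database under the required name at the top of the archive.
        archive.write(db_file, arcname="index.dat")
        for recording in sorted(recordings_dir.rglob("*")):
            if recording.is_file():
                # Paths are relative to recordings_dir, so no root folder appears.
                relative = recording.relative_to(recordings_dir).as_posix()
                archive.write(recording, arcname=relative)
```

The resulting zip file is what the client streams in data_chunk messages.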
Active prompt database
Field Type Description
voice string Mandatory. Voice name.
voice_version string Mandatory. Voice version.
voice_model string Mandatory. Voice model.
vocalizer_studio_version string Mandatory. Vocalizer Studio version used to build the ActivePrompt.

Parameters for ActivePrompt databases are collected from the user:

options.add_argument("--file", metavar="file", nargs="?",
                     help="File to upload. If an ActivePrompt Database, must be packaged as a zip.", required=True)
options.add_argument("--context_tag", metavar="tag", nargs="?",
                     help="Context tag", default='', required=True)
options.add_argument("--name", metavar="name", nargs="?",
                     help="Resource name", default='', required=True)
options.add_argument("--type", metavar="type", nargs="?",
                     help="Resource type. Must be one of: [activeprompt, "
                          "user_dictionary, text_ruleset]", required=True)
options.add_argument("--voice", metavar="voice", nargs="?",
                     help="ActivePrompt voice", default='')
options.add_argument("--voice_model", metavar="model", nargs="?",
                     help="ActivePrompt voice model", default='')
options.add_argument("--voice_version", metavar="version", nargs="?",
                     help="ActivePrompt voice version", default='')
options.add_argument("--vocalizer_studio_version", metavar="version", nargs="?",
                     help="ActivePrompt Vocalizer Studio version", default='')
. . .
upload_request = UploadRequest()
upload_init_message = UploadInitMessage()
upload_init_message.context_tag = args.context_tag
upload_init_message.name = args.name

if type == 'activeprompt':
    log.info('Type is ActivePromptDB')
    active_prompt_db = ActivePromptDB()
    active_prompt_db.voice = voice
    active_prompt_db.voice_model = voice_model
    active_prompt_db.voice_version = voice_version
    active_prompt_db.vocalizer_studio_version = vocalizer_studio_version
    upload_init_message.active_prompt_db.CopyFrom(active_prompt_db)

UserDictionary

Parameters for uploading a user dictionary. Included in UploadInitMessage. See User dictionary.

A user dictionary is a language-scoped tuning resource, to control pronunciation and acronym expansion.

User dictionary
Field Type Description
language string Mandatory. IETF language of the dictionary.

Parameters for user dictionaries are collected from the user:

options.add_argument("--file", metavar="file", nargs="?",
                     help="File to upload...", required=True)
options.add_argument("--context_tag", metavar="tag", nargs="?",
                     help="Context tag", default='', required=True)
options.add_argument("--name", metavar="name", nargs="?",
                     help="Resource name", default='', required=True)
options.add_argument("--type", metavar="type", nargs="?",
                     help="Resource type. Must be one of: [activeprompt, "
                          "user_dictionary, text_ruleset]", required=True)
options.add_argument("--language", metavar="language", nargs="?",
                     help="IETF language code. Required if type is "
                          "[user_dictionary, text_ruleset]", default='')
. . .
upload_request = UploadRequest()
upload_init_message = UploadInitMessage()
upload_init_message.context_tag = args.context_tag
upload_init_message.name = args.name
. . .
elif type == "user_dictionary":
    log.info('Type is User Dictionary')
    user_dictionary = UserDictionary()
    user_dictionary.language = language
    upload_init_message.dictionary.CopyFrom(user_dictionary)

TextUserRuleset

Parameters for uploading a text user ruleset. Included in UploadInitMessage. See User ruleset.

A user ruleset is a language-scoped tuning resource, to apply find+replace and regular expression rules on the input text.

Text user ruleset
Field Type Description
language string Mandatory. IETF language of the ruleset.

Parameters for text rulesets are collected from the user:

options.add_argument("--file", metavar="file", nargs="?",
                     help="File to upload...", required=True)
options.add_argument("--context_tag", metavar="tag", nargs="?",
                     help="Context tag", default='', required=True)
options.add_argument("--name", metavar="name", nargs="?",
                     help="Resource name", default='', required=True)
options.add_argument("--type", metavar="type", nargs="?",
                     help="Resource type. Must be one of: [activeprompt, "
                          "user_dictionary, text_ruleset]", required=True)
options.add_argument("--language", metavar="language", nargs="?",
                     help="IETF language code. Required if type is "
                          "[user_dictionary, text_ruleset]", default='')
. . .
upload_request = UploadRequest()
upload_init_message = UploadInitMessage()
upload_init_message.context_tag = args.context_tag
upload_init_message.name = args.name
. . .
elif type == "text_ruleset":
    log.info('Type is Text User Ruleset')
    text_ruleset = TextUserRuleset()
    text_ruleset.language = language
    upload_init_message.text_ruleset.CopyFrom(text_ruleset)

BinaryUserRuleset

Binary (encrypted) rulesets are not supported.

Wav

An audio wave recording can be inserted into the synthesis using the SSML <audio> tag or the Nuance control code, audio. Included in UploadInitMessage. See Audio file.

UploadResponse

Response to UploadRequest, indicating whether the upload was successful.

Upload response
Field Type Description
status nuance.rpc.Status Any error response means the data was not stored. If no response at all is received (e.g. due to a communication issue), data may have been stored. Another UploadRequest can be sent to restart; any existing files will be overwritten.
uri string Output URN, to refer to the content at runtime. This is for informational purposes: the URN format is predictable based on the input parameters in the UploadInitMessage. The URN includes a type field to identify the type of request. This field is not required when using the URN in other requests.
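
Since the type field is optional in later requests, a client can separate it from the URN it stores. This is a minimal sketch, assuming the type is always carried as a ?type=... query suffix, as in the sample output in this section; split_urn is a hypothetical helper.

```python
from urllib.parse import parse_qs


def split_urn(uri: str):
    """Split an UploadResponse uri into (base URN, type or None)."""
    base, _, query = uri.partition("?")
    types = parse_qs(query).get("type", [])
    return base, (types[0] if types else None)


base, resource_type = split_urn(
    "urn:nuance-mix:tag:tuning:voice/coffee_app/coffee_prompts/evan/mix.tts"
    "?type=activeprompt")
```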

This message includes:

UploadResponse
  status (nuance.rpc.Status)
  uri

Upload request and response:

with create_channel() as channel:
    storage_stub = StorageStub(channel)
    request_iterator = read_file(
        file=args.file, context_tag=args.context_tag, name=args.name,
        type=args.type, voice=args.voice, voice_model=args.voice_model,
        voice_version=args.voice_version,
        vocalizer_studio_version=args.vocalizer_studio_version,
        language=args.language, max_chunk_size_bytes=args.max_chunk_size_bytes)
    upload_response = storage_stub.Upload(request_iterator)
    log.info(text_format.MessageToString(upload_response))

This is the response from uploading an ActivePrompt database for a coffee application:

./run-ap-storage-client.sh

2021-05-18 11:27:33,610 INFO  Type is ActivePromptDB
2021-05-18 11:27:33,928 INFO  Done reading data
2021-05-18 11:27:34,427 INFO  status {
  status_code: OK
}
uri: "urn:nuance-mix:tag:tuning:voice/coffee_app/coffee_prompts/evan/mix.tts?type=activeprompt"

DeleteRequest

Request to remove an item from storage. This request returns DeleteResponse.

Delete request
Field Type Description
uri string Mandatory. URN of the uploaded content, using one of these formats:
urn:nuance-mix:tag:tuning:lang/<context_tag>/<name>/<language>/mix.tts
urn:nuance-mix:tag:tuning:voice/<context_tag>/<name>/<voice>/mix.tts
urn:nuance-mix:tag:tuning:audio/<context_tag>/<name>/mix.tts

This message includes:

DeleteRequest
  uri
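
A delete call could look like the sketch below. It assumes that the Python stubs generated from storage.proto provide a DeleteRequest class and a StorageStub with a Delete method, following the names in this section. The stub and the request class are passed in as parameters so the sketch does not depend on the generated modules; delete_resource is a hypothetical helper.

```python
def delete_resource(storage_stub, request_cls, urn):
    """Send a DeleteRequest for `urn` and return the service's DeleteResponse."""
    # Generated protobuf message classes accept field values as keyword arguments.
    request = request_cls(uri=urn)
    return storage_stub.Delete(request)
```

With the generated code available, the call would be along the lines of delete_resource(StorageStub(channel), DeleteRequest, urn).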

DeleteResponse

Response to DeleteRequest, indicating whether the deletion was successful.

Delete response
Field Type Description
status nuance.rpc.Status Success means the data is no longer in the system, either because it was deleted by the request or because it was never there (idempotency).

This message includes:

DeleteResponse
  status (nuance.rpc.Status)

Scalar value types

The data types in the proto files are mapped to equivalent types in the generated client stub files.

Scalar data types
Proto | Notes | C++ | Java | Python
double | | double | double | float
float | | float | float | float
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers. If your field is likely to have negative values, use sint32 instead. | int32 | int | int
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers. If your field is likely to have negative values, use sint64 instead. | int64 | long | int/long
uint32 | Uses variable-length encoding. | uint32 | int | int/long
uint64 | Uses variable-length encoding. | uint64 | long | int/long
sint32 | Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s. | int32 | int | int
sint64 | Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s. | int64 | long | int/long
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long
sfixed32 | Always four bytes. | int32 | int | int
sfixed64 | Always eight bytes. | int64 | long | int/long
bool | | bool | boolean | bool
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode
bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str (Python 2) / bytes (Python 3)
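
The notes on variable-length encoding can be checked with a short calculation. The sketch below reimplements the size of a protobuf varint and the ZigZag mapping used by sint32. It illustrates the wire format only and is not code from the generated stubs; the function names are hypothetical.

```python
def varint_size(value: int) -> int:
    """Bytes needed to varint-encode an unsigned value (7 payload bits per byte)."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def int32_size(value: int) -> int:
    # Negative int32 values are sign-extended to 64 bits before encoding.
    return varint_size(value & 0xFFFFFFFFFFFFFFFF)


def sint32_size(value: int) -> int:
    # ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small negatives stay small.
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return varint_size(zigzag)


sizes = {
    "int32(-1)": int32_size(-1),             # 10 bytes
    "sint32(-1)": sint32_size(-1),           # 1 byte
    "uint32(2**28)": varint_size(2 ** 28),   # 5 bytes, versus 4 for fixed32
}
```

This is why sint32 is preferred for fields that are often negative, and why fixed32 becomes the smaller choice once values reach 2^28.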