Dialog service gRPC API

DLGaaS provides three protocol buffer (.proto) files to define the Dialog service for gRPC:

  • The dlg_interface.proto file defines the main DialogService interface methods.
  • The dlg_messages.proto file defines the request and response messages for the main DialogService methods.
  • The dlg_common_messages.proto file defines other objects used in the fields of messages.

Once you have transformed the proto files into functions and classes in your programming language using gRPC tools, you can call these functions from your client application to start a conversation with a user, collect the user’s input, obtain the action to perform, and so on.
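
In Python, this generation step might be scripted roughly as follows. This is a sketch, assuming the grpcio-tools package is installed and the three proto files sit together in one local directory. The `protos` and `generated` directory names and both helper functions are hypothetical. Depending on how the proto files import each other and the ASR, TTS, and NLU types they reference, additional include paths and files may be needed.

```python
import subprocess
import sys
from pathlib import Path

# File names as listed in this document. The directory layout is an assumption.
PROTO_FILES = (
    "dlg_interface.proto",
    "dlg_messages.proto",
    "dlg_common_messages.proto",
)


def build_protoc_command(proto_dir, out_dir):
    """Build the command line that runs the gRPC Python code generator."""
    command = [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{proto_dir}",                # where protoc resolves imports
        f"--python_out={out_dir}",       # message classes (*_pb2.py)
        f"--grpc_python_out={out_dir}",  # service stubs (*_pb2_grpc.py)
    ]
    command.extend(str(Path(proto_dir) / name) for name in PROTO_FILES)
    return command


def generate_stubs(proto_dir="protos", out_dir="generated"):
    """Generate the Python classes; raises CalledProcessError on failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_protoc_command(proto_dir, out_dir), check=True)
```

Calling `generate_stubs()` once, before building the client, is enough; the generated modules can then be imported like any other Python module.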

See Client app development for a scenario using Python that provides an overview of the different methods and messages used in a sample order coffee application.

For other languages, consult the gRPC and protocol buffer documentation.

Proto file structure

The proto files define the methods and message types for the API.

DialogService

The Dialog service contains six methods related to starting, executing, managing, and closing a conversation flow or dialog.

Dialog service methods description
Name Request Type Response Type Description
Start StartRequest StartResponse Starts a conversation. Returns a StartResponse object.
Status StatusRequest StatusResponse Returns the status of a session as a StatusResponse object. The gRPC status is 0 (OK) if the session is found, or 5 (NOT_FOUND) if it is not.
Update UpdateRequest UpdateResponse Updates the state of a session, for example session variables, without advancing the conversation. Returns an UpdateResponse object.
Execute ExecuteRequest ExecuteResponse Used to continuously interact with the conversation based on end user input or events. Returns an ExecuteResponse object that will contain data related to the dialog interactions and that can be used by the client to interact with the end user.
ExecuteStream StreamInput stream StreamOutput stream Performs recognition on streamed audio using ASRaaS and provides speech synthesis using TTSaaS.
Stop StopRequest StopResponse Ends a conversation and performs cleanup. Returns a StopResponse object.
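
The Status behavior described in the table can be interpreted on the client side along these lines. This sketch works on the numeric gRPC status code only (0 for OK and 5 for NOT_FOUND, as stated above). The `session_exists` helper is hypothetical, and how you obtain the code depends on your gRPC library.

```python
GRPC_OK = 0
GRPC_NOT_FOUND = 5


def session_exists(status_code: int) -> bool:
    """Map the gRPC status code of a Status call to session existence."""
    if status_code == GRPC_OK:
        return True
    if status_code == GRPC_NOT_FOUND:
        return False
    # Any other code signals a failure unrelated to whether the session exists.
    raise RuntimeError(f"Status call failed with gRPC status {status_code}")
```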

StartRequest

Request object used by the Start method.

Start request fields description
Field Type Description
session_id string Optional session ID. If not provided then one will be generated.
selector common.Selector Selector providing the channel and language used for the conversation.
payload common.StartRequestPayload Payload of the Start request.
session_timeout_sec uint32 Session idle timeout limit (in seconds), after which the session is terminated. Maximum of 259200 (72 hours). The maximum is configurable in self-hosted deployments.
user_id string Identifies a specific user within the application. See User ID.
client_data map<string,string> Map of client-supplied key-value pairs to inject into the call log. Optional.
Example:
"client_data": { "param1": "value1", "param2": "value2" }

This method includes:

StartRequest
    session_id
    selector
        channel
        language
        library
    payload
        model_ref
            uri
            type
        data
        suppress_log_user_data
    session_timeout_sec
    user_id
    client_data
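
As an illustration, a StartRequest could be assembled as a plain dictionary that mirrors the fields above, for example before converting it to the generated message class. This is a sketch only: the `build_start_request` helper is hypothetical, and any channel, language, library, or model URI values you pass are your own. The upper bound check uses the documented default maximum, which may differ in self-hosted deployments.

```python
MAX_SESSION_TIMEOUT_SEC = 259200  # 72 hours, the documented maximum


def build_start_request(model_uri, channel, language, library,
                        session_timeout_sec, session_id=None,
                        user_id=None, client_data=None):
    """Return a dict shaped like StartRequest."""
    if not 0 <= session_timeout_sec <= MAX_SESSION_TIMEOUT_SEC:
        raise ValueError(
            f"session_timeout_sec must be between 0 and {MAX_SESSION_TIMEOUT_SEC}"
        )
    request = {
        "selector": {"channel": channel, "language": language, "library": library},
        "payload": {"model_ref": {"uri": model_uri, "type": "APPLICATION_MODEL"}},
        "session_timeout_sec": session_timeout_sec,
    }
    # Optional fields are omitted when not supplied. If session_id is left
    # out, the service generates one.
    optional = {"session_id": session_id, "user_id": user_id, "client_data": client_data}
    request.update({key: value for key, value in optional.items() if value})
    return request
```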

StartResponse

Response object used by the Start method.

Start response fields description
Field Type Description
payload common.StartResponsePayload Payload of the Start response. Contains session ID.

This method includes:

StartResponse
    payload
        session_id

StatusRequest

Request object used by the Status method. For more information about the Status method, see Step 5. Check session status.

Status request fields description
Field Type Description
session_id string ID for the session.

This method includes:

StatusRequest
    session_id

StatusResponse

Response object used by the Status method.

Status response fields description
Field Type Description
session_remaining_sec uint32 Remaining session time to live (TTL) value in seconds, after which the session is terminated.
Note: The TTL may be a few seconds off based on how long the round trip of the request took.

This method includes:

StatusResponse
    session_remaining_sec
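
Because the reported TTL can be off by a few seconds, a client that decides whether a session is about to expire may want to apply a safety margin. A minimal sketch: the `should_refresh` helper and the threshold and margin values are illustrative assumptions, not values defined by the API.

```python
def should_refresh(session_remaining_sec: int, threshold_sec: int = 60,
                   round_trip_margin_sec: int = 5) -> bool:
    """Return True when the session is close enough to expiry to act on."""
    # Subtract a margin to allow for the round-trip time of the Status call.
    effective_remaining = max(session_remaining_sec - round_trip_margin_sec, 0)
    return effective_remaining <= threshold_sec
```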

UpdateRequest

Request object used by the Update method. For more information about the Update method, see Step 6. Update session data.

Update request fields description
Field Type Description
session_id string ID for the session.
payload common.UpdateRequestPayload Payload of the Update request.
client_data map<string,string> Map of client-supplied key-value pairs to inject into the call log. Optional.
Example: "client_data": { "param1": "value1", "param2": "value2" }
user_id string Identifies a specific user within the application. See User ID.

This method includes:

UpdateRequest
    session_id
    payload
    client_data
    user_id

UpdateResponse

Response object used by the Update method. Currently empty.

This method includes:

UpdateResponse

ExecuteRequest

Request object used by the Execute method.

Execute request fields description
Field Type Description
session_id string ID for the session.
selector common.Selector Selector providing the channel and language used for the conversation.
payload common.ExecuteRequestPayload Payload of the Execute request.
user_id string Identifies a specific user within the application. See User ID.

This method includes:

ExecuteRequest
    session_id
    selector
        channel
        language
        library
    payload
        user_input
            user_text
            interpretation
                confidence
                input_mode
                utterance
                data
                    key
                    value
                slot_literals
                    key
                    value
                slot_formatted_literals
                    key
                    value
                slot_confidences
                    key
                    value
                alternative_interpretations
            selected_item
                id
                value
            nluaas_interpretation
            asraas_result
            input_mode
        dialog_event
            type
            message
            event_name
        requested_data
            id
            data
    user_id

ExecuteResponse

Response object used by the Execute method. This object carries a payload, which instructs the client app to play messages to the user (as needed) and do one of the following:

  • Prompt for user input
  • Provide requested data
  • Fill time and keep the user engaged while the server side fetches data
  • Transfer or end the conversation

The payload also includes references to ASR, NLU, TTS, and NR resources that can be used to orchestrate externally with these other services rather than having Dialog perform orchestration.

Execute response fields description
Field Type Description
payload common.ExecuteResponsePayload Payload of the Execute response.

This method includes:

ExecuteResponse
    payload
        messages
            nlg
                text
                mask
                barge_in_disabled
            visual
                text
                mask
                barge_in_disabled
            audio
                text
                uri
                mask
                barge_in_disabled
            view
                id
                name
            language
            tts_parameters
                voice
            channel
        qa_action
            message
                nlg
                    text
                    mask
                    barge_in_disabled
                visual
                    text
                    mask
                    barge_in_disabled
                audio
                    text
                    uri
                    mask
                    barge_in_disabled
                view
                    id
                    name
                language
                tts_parameters
                    voice
                channel
            data
            view
                id
                name
            selectable
                selectable_items
                    value
                        id
                        value
                    description
                    display_text
                    display_image_uri
            recognition_settings
                dtmf_mappings
                collection_settings
                speech_settings
                dtmf_settings
                input_modes
            mask
            orchestration_resource_reference
                grammar_references
                recognition_resources
                interpretation_resources
            recognition_init_resources
                recognition_init_message
                recognition_init
                dtmf_recognition_init
            language
        da_action
            id
            message
                nlg
                    text
                    mask
                    barge_in_disabled
                visual
                    text
                    mask
                    barge_in_disabled
                audio
                    text
                    uri
                    mask
                    barge_in_disabled
                view
                    id
                    name
                language
                tts_parameters
                    voice
                channel
            view
                id
                name
            message_settings
                delay
                minimum
            data
        escalation_action
            message
                nlg
                    text
                    mask
                    barge_in_disabled
                visual
                    text
                    mask
                    barge_in_disabled
                audio
                    text
                    uri
                    mask
                    barge_in_disabled
                view
                    id
                    name
                language
                tts_parameters
                    voice
                channel
            view
                id
                name
            data
            id
            escalation_settings
                type
                destination
        end_action
            data
            id
        continue_action
            message
                nlg
                    text
                    mask
                    barge_in_disabled
                visual
                    text
                    mask
                    barge_in_disabled
                audio
                    text
                    uri
                    mask
                    barge_in_disabled
                view
                    id
                    name
                language
                tts_parameters
                    voice
                channel
            message_settings
                delay
                minimum
            backend_connection_settings
                fetch_timeout
                connect_timeout
            view
                id
                name
            data
            id
        channel

StreamInput

Performs recognition on streamed audio using ASRaaS and requests speech synthesis using TTSaaS.

The field asr_control_v1 (and control_message if applicable) must be sent as part of the first StreamInput message in order for DLGaaS to chain the audio stream with ASRaaS. Audio is streamed, in order, segment by segment, over the course of the various StreamInput messages.

Stream input fields description
Field Type Description
request ExecuteRequest Standard DLGaaS ExecuteRequest. Used to continue the dialog interactions. Used on the first StreamInput only.
asr_control_v1 AsrParamsV1 Defines audio recognition parameters to be forwarded to the ASR service to initiate audio streaming. The contents of this message correspond to those of the recognition_init_message field used in the first message of the ASR input stream. Used on the first StreamInput only.
audio bytes A segment of the input speech audio in the selected encoding for recognition.
tts_control_v1 TtsParamsv1 Parameters to be forwarded to the TTS service.
control_message nuance.asr.v1.ControlMessage Optional input message to be forwarded to the ASR service. This corresponds to the optional control_message field used in the first message of the ASR input stream. ASR uses this message to start the recognition no-input timer if it was disabled by a stall_timers recognition flag in asr_control_v1. See the ASRaaS RecognitionRequest documentation for details. Used on the first StreamInput only.

This method includes:

StreamInput
    request Standard DLGaaS ExecuteRequest
    asr_control_v1
        audio_format
            pcm | alaw | ulaw | opus | ogg_opus
        utterance_detection_mode
            SINGLE | MULTIPLE | DISABLED
        recognition_flags
            auto_punctuate
            filter_profanity
            include_tokenization
            stall_timers
            etc.
        result_type
        no_input_timeout_ms
        recognition_timeout_ms
        utterance_end_silence_ms
        speech_detection_sensitivity
        max_hypotheses
        end_stream_no_valid_hypotheses
        resources
        speech_domain
        formatting
    audio
    tts_control_v1
        audio_params
            audio_format
            volume_percentage
            speaking_rate_percentage
            and so on
        voice
            name
            model
            and so on
    control_message
        start_timers_message
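
The ordering rule above (request, asr_control_v1, and any control_message on the first StreamInput only, followed by the audio segments in order) can be sketched as a generator. Plain dictionaries stand in for the generated StreamInput class, and the `stream_inputs` helper is hypothetical. Whether the first message may also carry an audio segment is not stated here, so this sketch sends audio in separate messages; tts_control_v1 is left out for brevity.

```python
from typing import Iterable, Iterator, Optional


def stream_inputs(execute_request: dict, asr_control_v1: dict,
                  audio_chunks: Iterable[bytes],
                  control_message: Optional[dict] = None) -> Iterator[dict]:
    """Yield StreamInput-shaped dicts in the order the service expects."""
    # First message: dialog request plus ASR parameters (and control message).
    first = {"request": execute_request, "asr_control_v1": asr_control_v1}
    if control_message is not None:
        first["control_message"] = control_message
    yield first

    # Remaining messages: audio only, segment by segment, in order.
    for chunk in audio_chunks:
        if chunk:  # skip empty segments
            yield {"audio": chunk}
```

A generator fits here because gRPC client streaming in Python typically consumes an iterator of request messages.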

StreamOutput

Streams the requested TTS output and returns ASR results.

Stream output fields description
Field Type Description
response ExecuteResponse Standard DLGaaS ExecuteResponse; used to continue the dialog interactions. Used on the first StreamOutput only.
audio nuance.tts.v1.SynthesisResponse TTS output. See the TTSaaS SynthesisResponse documentation for details.
asr_result nuance.asr.v1.Result Output message containing the transcription result, including the result type, the start and end times, metadata about the transcription, and one or more transcription hypotheses. See the ASRaaS Result documentation for details. Used on the first StreamOutput only.
asr_status nuance.asr.v1.Status Output message indicating the status of the transcription. See the ASRaaS Status documentation for details. Used on the first StreamOutput only.
asr_start_of_speech nuance.asr.v1.StartOfSpeech Output message containing the start-of-speech message. See the ASRaaS StartOfSpeech documentation for details. Used on the first StreamOutput only.

This method includes:

StreamOutput
    response Standard DLGaaS ExecuteResponse
    audio
    asr_result
    asr_status
    asr_start_of_speech

StopRequest

Request object used by the Stop method.

Stop request fields description
Field Type Description
session_id string ID for the session.
user_id string Identifies a specific user within the application. See User ID.

This method includes:

StopRequest
    session_id
    user_id

StopResponse

Response object used by the Stop method. Currently empty; reserved for future use.

This method includes:

StopResponse

Fields reference

The following section contains additional details about the message types of fields used in the request and response messages.

AsrParamsV1

Parameters to be forwarded to the ASR service. See Step 4b. Interact with the user using audio for details.

ASR params fields description
Field Type Description
audio_format nuance.asr.v1.AudioFormat Audio codec type and sample rate. See the ASRaaS AudioFormat documentation for details.
utterance_detection_mode nuance.asr.v1.EnumUtteranceDetectionMode How end of utterance is determined. Defaults to SINGLE. See the ASRaaS EnumUtteranceDetectionMode documentation for details.
recognition_flags nuance.asr.v1.RecognitionFlags Flags to fine tune recognition. See the ASRaaS RecognitionFlags documentation for details.
result_type nuance.asr.v1.EnumResultType Whether final, partial, or immutable results are returned. See the ASRaaS EnumResultType documentation for details.
no_input_timeout_ms uint32 Maximum silence, in milliseconds, allowed while waiting for user input after recognition timers are started. Default (0) means server default, usually no timeout. See the ASRaaS Timers documentation for details.
recognition_timeout_ms uint32 Maximum duration, in milliseconds, of recognition turn. Default (0) means server default, usually no timeout. See the ASRaaS Timers documentation for details.
utterance_end_silence_ms uint32 Minimum silence, in milliseconds, that determines the end of an utterance. Default (0) means server default, usually 500 ms (half a second). See the ASRaaS Timers documentation for details.
speech_detection_sensitivity float A balance between detecting speech and noise (breathing, etc.), from 0 to 1. 0 means ignore all noise, 1 means interpret all noise as speech. Default is 0.5. See the ASRaaS Timers documentation for details.
max_hypotheses uint32 Maximum number of n-best hypotheses to return. Default (0) means a server default, usually 10 hypotheses.
end_stream_no_valid_hypotheses bool Determines whether the dialog application or the client application handles the dialog flow when ASRaaS does not return a valid hypothesis. When set to false (default), the dialog flow is determined by the Mix.dialog application, according to the processing defined for the NO_INPUT and NO_MATCH events. To configure the streaming request so that the stream is closed if ASRaaS does not return a valid hypothesis, set to true. See Client handling of ASR no valid hypotheses for details.
resources nuance.asr.v1.RecognitionResource Repeated. Resources (DLMs, wordsets, builtins) to improve recognition. See the ASRaaS RecognitionResource documentation for details.
speech_domain string Mapping to internal weight sets for language models in the data pack. Values depend on the data pack.
formatting nuance.asr.v1.Formatting Specifies how the transcription results are presented, using keywords for formatting schemes and options supported by the data pack. See ASRaaS Formatting for details.

BackendConnectionSettings

Settings configured for a data access node backend connection.

Backend connection settings fields description
Field Type Description
fetch_timeout string Number of milliseconds allowed for fetching the data before timing out.
connect_timeout string Connect timeout in milliseconds.

ContinueAction

Continue action provides the client application with information useful for handling latency or delays involved with a data access node using a backend data connection. The continue action prompts the client application to respond to initiate the data access. See Continue actions for more detail.

Continue action fields description
Field Type Description
message Message Latency message to be played to the user while waiting for the backend data access.
view View View details for this action.
data google.protobuf.Struct Map of data exchanged in this node.
id string ID identifying the Continue action node in the dialog application.
message_settings MessageSettings Settings to be used along with messages returned to the present user.
backend_connection_settings BackendConnectionSettings Backend settings that will be used by DLGaaS for connecting to and fetching from the backend.

DAAction

A Data Access action is associated with a Data access node using client-side data access. It provides the client application with data needed to perform the data access as well as a message to play to the user while waiting for the data access to complete. See Data access actions for more detail.

DA action fields description
Field Type Description
id string ID identifying the Data Access node in the dialog application.
message Message Message to be played to the user while waiting for the data access to complete.
view View View details for this action.
data google.protobuf.Struct Map of data exchanged in this node.
message_settings MessageSettings Settings to be used along with messages played to the present user.

DialogEvent

Message used to indicate an event that occurred during the dialog interactions.

Dialog event fields description
Field Type Description
type DialogEvent.EventType Type of event being triggered.
message string Optional message providing additional information about the event.
event_name string Name of custom event. Must be set to the name of the custom event defined in Mix.dialog. See Handling events for details. Applies only when DialogEvent.EventType is set to CUSTOM.

DialogEvent.EventType

The possible event types that can occur on the client side of interactions.

Dialog event event type fields description
Name Number Description
SUCCESS 0 Everything went as expected.
ERROR 1 An unexpected problem occurred.
NO_INPUT 2 End user has not provided any input.
NO_MATCH 3 End user provided unrecognizable input.
HANGUP 4 The end user session has been terminated by the user. This event is used both for IVR (caller hangup) and for digital channels (for example the user disconnecting from a chat session). In Mix.dialog, this event type triggers a UserDisconnect event.
CUSTOM 5 Custom event. You must set field event_name in DialogEvent to the name of the custom event defined in Mix.dialog.
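
The event types and the event_name rule for CUSTOM events can be mirrored in client code, for example to catch mistakes before a request is sent. A sketch under assumptions: the `EventType` enum below is a local copy of the values in the table rather than the generated class, and `build_dialog_event` is a hypothetical helper that returns a DialogEvent-shaped dictionary.

```python
from enum import IntEnum
from typing import Optional


class EventType(IntEnum):
    SUCCESS = 0
    ERROR = 1
    NO_INPUT = 2
    NO_MATCH = 3
    HANGUP = 4
    CUSTOM = 5


def build_dialog_event(event_type, message: str = "",
                       event_name: Optional[str] = None) -> dict:
    """Return a DialogEvent-shaped dict, enforcing the event_name rule."""
    event_type = EventType(event_type)  # accepts an EventType or its number
    if event_type is EventType.CUSTOM and not event_name:
        raise ValueError("event_name is required when type is CUSTOM")
    if event_type is not EventType.CUSTOM and event_name:
        raise ValueError("event_name applies only when type is CUSTOM")

    event = {"type": event_type.name}
    if message:
        event["message"] = message
    if event_name:
        event["event_name"] = event_name
    return event
```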

EndAction

End action, returned from an End node. Indicates that the dialog has ended. See End actions for more detail.

End action fields description
Field Type Description
data google.protobuf.Struct Map of data exchanged in this node.
id string ID identifying the End Action node in the dialog application.

EscalationAction

Escalation action to be performed by the client application. See Transfer actions for more detail.

Escalation action fields description
Field Type Description
message Message Message to be played as part of the escalation action.
view View View details for this action.
data google.protobuf.Struct Map of data exchanged in this node.
id string ID identifying the External Action node in the dialog application.
escalation_settings EscalationSettings Settings to configure the escalation transfer.

EscalationSettings

Settings to configure a transfer of the dialog, for example, to a live agent.

Escalation settings fields description
Field Type Description
type string Type of escalation transfer. Values can include “blind” and “route-request”. Empty if not set.
destination string Optional, provided if a specific transfer type is set. Destination for the transfer, for example, a phone number.

ExecuteRequestPayload

Payload sent with the Execute request. If both an event and a user input are provided, the event has precedence. For example, if an error event is provided, the input will be ignored.

Execute request payload fields description
Field Type Description
user_input UserInput Input provided to the Dialog engine.
dialog_event DialogEvent Used to pass in events that can drive the flow. Optional; if an event is not passed, the operation is assumed to be successful.
requested_data RequestData Data that was previously requested by the engine.
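
The precedence rule (an event wins over user input) can be made explicit on the client side, for example to log what the service will actually act on. This is a sketch using a plain dictionary shaped like ExecuteRequestPayload; the `effective_payload` helper is hypothetical and only models the rule stated above.

```python
def effective_payload(payload: dict) -> dict:
    """Return a copy of the payload showing which input takes effect."""
    effective = dict(payload)  # shallow copy; the caller's dict is untouched
    if effective.get("dialog_event") and "user_input" in effective:
        # The event has precedence, so the user input would be ignored.
        del effective["user_input"]
    return effective
```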

ExecuteResponsePayload

Payload returned after the Execute method is called. Specifies the action to be performed by the client application.

Execute response payload fields description
Field Type Description
messages Message Repeated. Message action to be performed by the client application.
One of:    
   qa_action QAAction Question and answer action to be performed by the client application.
   da_action DAAction Data access action to be performed by the client application in relation to data access node using client-side data connection.
   escalation_action EscalationAction Escalation action to be performed by the client application.
   end_action EndAction End action to be performed by the client application.
   continue_action ContinueAction Continue action to be performed by the client application in relation to data access node using server-side data connection.
channel string Active channel for the action.
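
Since at most one of the action fields is set, a client typically branches on whichever one is present. The sketch below works on a dictionary shaped like ExecuteResponsePayload; the `next_step` helper and the step descriptions are illustrative, based on the action descriptions in this document.

```python
ACTION_STEPS = {
    "qa_action": "prompt for user input",
    "da_action": "provide requested data",
    "continue_action": "fill time while the server fetches data",
    "escalation_action": "transfer the conversation",
    "end_action": "end the conversation",
}


def next_step(payload: dict):
    """Return (action_field, description) for the action that is set."""
    present = [name for name in ACTION_STEPS if payload.get(name) is not None]
    if len(present) > 1:
        raise ValueError(f"expected at most one action, got {present}")
    if not present:
        return None, None  # no action in this payload
    return present[0], ACTION_STEPS[present[0]]
```

Any entries in `messages` would be played before acting on the returned step.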

Message

Specifies the message to be played to the user. See Message actions for details.

Message fields description
Field Type Description
nlg Message.Nlg Repeated. Text to be played using text to speech.
visual Message.Visual Repeated. Text to be displayed to the user (for example, in a chat).
audio Message.Audio Repeated. Prompt to be played from an audio file.
view View View details for this message.
language string Message language in xx-XX format. For example, en-US.
tts_parameters TTSParameters Voice parameters for TTS to be used when TTSaaS is orchestrated separately from DLGaaS.
channel string Active channel for the message.

Message.Audio

Message audio details.

Message audio fields description
Field Type Description
text string Text to be used as TTS backup if the audio file cannot be played.
uri string URI to the audio file, in the following format:
<language>/prompts/<library>/<channel>/<filename>?version=<version>
For example:
en-US/prompts/default/Omni_Channel_VA/Message_ini_01.wav?version=1.0_1602096507331
See To provide speech response using recorded audio for more details on how the filename portion is generated.
mask bool When set to true, indicates that the text contains sensitive data that will be masked in logs.
barge_in_disabled bool When set to true, indicates that barge-in is disabled.
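
A client that resolves recorded prompts itself may need to split the uri value into its parts. The following sketch parses the format shown above with the standard library; the `parse_prompt_uri` helper is hypothetical and assumes that none of the path components contains a slash.

```python
from urllib.parse import parse_qs, urlsplit


def parse_prompt_uri(uri: str) -> dict:
    """Split <language>/prompts/<library>/<channel>/<filename>?version=<version>."""
    parts = urlsplit(uri)
    segments = parts.path.split("/")
    if len(segments) != 5 or segments[1] != "prompts":
        raise ValueError(f"unexpected prompt URI format: {uri!r}")
    language, _, library, channel, filename = segments
    version = parse_qs(parts.query).get("version", [None])[0]
    return {
        "language": language,
        "library": library,
        "channel": channel,
        "filename": filename,
        "version": version,
    }
```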

Message.TTSParameters

Message TTS parameters.

TTS params fields description
Field Type Description
voice Voice TTSaaS voice to be used.

Message.TTSParameters.Voice

Message TTS voice details.

TTS params voice fields description
Field Type Description
name string The voice’s name, for example ‘Evan’. Mandatory for SynthesisRequest.
model string The voice’s quality model, for example ‘standard’ or ‘enhanced’. Mandatory for SynthesisRequest.
gender EnumGender Voice gender. Default ANY for SynthesisRequest.
language string Language associated with the voice in xx-XX format, for example en-US.
voice_type string TTS voice type, for example ‘neural’ or ‘standard’. Identifies the TTS engine to use.

Message.TTSParameters.Voice.EnumGender

TTSaaS voice gender.

TTS params voice gender enum description
Name Number Description
ANY 0 Any gender voice. Default for SynthesisRequest.
MALE 1 Male voice.
FEMALE 2 Female voice.
NEUTRAL 3 Neutral gender voice.

Message.Nlg

Text for text to speech.

Message nlg fields description
Field Type Description
text string Text to be played using text to speech.
mask bool When set to true, indicates that the text contains sensitive data that will be masked in logs.
barge_in_disabled bool When set to true, indicates that barge-in is disabled.

Message.Visual

Text to be displayed to the user.

Message visual fields description
Field Type Description
text string Text to be displayed to the user (for example, in a chat).
mask bool When set to true, indicates that the text contains sensitive data that will be masked in logs.
barge_in_disabled bool When set to true, indicates that barge-in is disabled.

MessageSettings

Settings to be used with latency messages returned by DAAction or ContinueAction.

Message settings fields description
Field Type Description
delay string Time, in milliseconds, to wait before presenting the message to the user.
minimum string Time, in milliseconds, to display or play the message to the user.
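
One way a client might apply these two settings to a latency message is sketched below. This is an interpretation, not documented behavior: it assumes both fields hold integer millisecond values as strings, that no message is needed if the data arrives before the delay elapses, and that a message, once shown, stays for at least the minimum time. The `latency_message_window` helper is hypothetical.

```python
def latency_message_window(delay: str, minimum: str, fetch_duration_ms: int):
    """Return (show_at_ms, hide_at_ms), or None if no message is needed."""
    delay_ms = int(delay or 0)      # empty string treated as 0
    minimum_ms = int(minimum or 0)
    if fetch_duration_ms <= delay_ms:
        return None  # data arrived before the message was due
    # Keep the message until the fetch completes, but never less than minimum.
    return delay_ms, max(fetch_duration_ms, delay_ms + minimum_ms)
```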

QAAction

Question and answer action to be performed by the client application. See Question and answer actions for more details.

QA action fields description
Field Type Description
message Message Message to be played as part of the question and answer action.
data google.protobuf.Struct Map of data exchanged in this node.
view View View details for this action.
selectable Selectable Interactive elements to be displayed by the client app, such as clickable buttons or links. See Interactive elements for details.
recognition_settings RecognitionSettings Configuration information to be used during recognition when handled externally.
mask bool When set to true, indicates that the question and answer node is marked on the node level as sensitive, or is meant to collect an entity that will hold sensitive data to be masked in logs. Also true when logs are suppressed globally.
orchestration_resource_reference OrchestrationResourceReference References to ASR/NLU/NR resources to support external orchestration.
recognition_init_resources RecognitionInitResources ASR/NR parameters to configure speech or DTMF recognition. Dialog service does not populate this field in responses.
language string Language expected for user input.

RecognitionSettings

Configuration information to be used during recognition when handled externally.

Recognition settings fields description
Field Type Description
dtmf_mappings DtmfMapping Repeated. DTMF mappings configured in Mix.dialog.
collection_settings CollectionSettings Collection settings configured in Mix.dialog.
speech_settings SpeechSettings Speech settings configured in Mix.dialog.
dtmf_settings DtmfSettings DTMF settings configured in Mix.dialog.
input_modes string Repeated. Input modes configured in Mix.dialog for the presently selected channel. Possible values are “text”, “interactivity”, “voice”, or “dtmf”.

RecognitionSettings.CollectionSettings

Collection settings configured in Mix.dialog.

Recognition collection settings fields description
Field Type Description
timeout string Time, in ms, to wait for speech once a prompt has finished playing before throwing a NO_INPUT event.
complete_timeout string Duration of silence, in milliseconds, to determine the user has finished speaking. The timer starts when the recognizer has a well-formed hypothesis.
incomplete_timeout string Duration of silence, in milliseconds, to determine the user has finished speaking. The timer starts when the user stops speaking.
max_speech_timeout string Maximum duration, in milliseconds, of an utterance collected from the user.

RecognitionSettings.DtmfMapping

DTMF mappings configured in Mix.dialog. See Set DTMF mappings for details.

Recognition DTMF mapping fields description
Field Type Description
id string Name of the entity to which the DTMF mapping applies.
value string Entity value to map to a DTMF key.
dtmf_key string DTMF key associated with this entity value. Valid values are: 0-9, *, #
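
When DTMF is handled by the client, the repeated dtmf_mappings can be turned into a lookup from key press to entity value. A minimal sketch over dictionaries shaped like DtmfMapping; the `build_dtmf_lookup` helper is hypothetical, and if the same key appears more than once the last mapping wins.

```python
VALID_DTMF_KEYS = set("0123456789*#")


def build_dtmf_lookup(dtmf_mappings):
    """Map each DTMF key to an (entity name, entity value) pair."""
    lookup = {}
    for mapping in dtmf_mappings:
        key = mapping["dtmf_key"]
        if key not in VALID_DTMF_KEYS:
            raise ValueError(f"invalid DTMF key: {key!r}")
        lookup[key] = (mapping["id"], mapping["value"])
    return lookup
```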

RecognitionSettings.DtmfSettings

DTMF settings configured in Mix.dialog.

Recognition DTMF settings fields description
Field Type Description
inter_digit_timeout string Maximum time, in milliseconds, allowed between each DTMF character entered by the user.
term_timeout string Maximum time, in milliseconds, to wait for an additional DTMF character before terminating the input.
term_char string Character that terminates a DTMF input.

RecognitionSettings.SpeechSettings

Speech settings configured in Mix.dialog.

Recognition speech settings fields description
Field Type Description
sensitivity string Level of sensitivity to speech. 1.0 means highly sensitive to quiet input, while 0.0 means least sensitive to noise.
barge_in_type string Barge-in type; possible values: “speech” (interrupt a prompt by using any word) and “hotword” (interrupt a prompt by using a specific hotword).
speed_vs_accuracy string Desired balance between speed and accuracy. 0.0 means fastest recognition, while 1.0 means best accuracy.

RequestData

Data that was requested by the dialog application.

Request data fields description
Field Type Description
id string ID used by the dialog application to identify which node requested the data.
data google.protobuf.Struct Map of keys to JSON objects of the data requested.

ResourceReference

Reference object of the resource to use for the request (for example, URN or URL of the model).

Resource reference fields description
Field Type Description
uri string Reference (for example, the URL or URN for the Dialog model).
type ResourceReference.EnumResourceType Type of resource.

ResourceReference.EnumResourceType

Resource reference resource type enum.

Resource reference type enum fields description
Name Number Description
APPLICATION_MODEL 0 Dialog application model.

Selectable

Interactive elements to be displayed by the client app, such as clickable buttons or links. See Interactive elements for details.

Selectable fields description
Field Type Description
selectable_items Selectable.SelectableItem Repeated. Ordered list of interactive elements.

Selectable.SelectableItem

Selectable item details.

Selectable item fields description
Field Type Description
value Selectable.SelectableItem.SelectedValue Key-value pair of entity information (name and value) for the interactive element. A selected key-value pair is passed in an ExecuteRequest when the user interacts with the element.
description string Description of the interactive element.
display_text string Label to display for this interactive element.
display_image_uri string URI of image to display for this interactive element.

Selectable.SelectableItem.SelectedValue

Selectable item value.

Selectable item value fields description
Field Type Description
id string Name of the entity being collected.
value string Entity value corresponding to the interactive element.
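The relationship between a SelectableItem and the value sent back when the user selects it can be sketched as follows. Plain dictionaries stand in for the generated message classes, and the function name and sample item are hypothetical.

```python
# Hypothetical helper; dicts mirror the SelectableItem and UserInput field names.
def selected_item_input(item: dict) -> dict:
    """Turn a SelectableItem-shaped dict into a UserInput-shaped dict.

    Only the id/value pair from the item's `value` field is sent back;
    display fields such as display_text are not part of SelectedValue.
    """
    selected = item["value"]
    return {"selected_item": {"id": selected["id"], "value": selected["value"]}}


# Sample item with made-up entity name and label.
item = {
    "value": {"id": "COFFEE_SIZE", "value": "lg"},
    "display_text": "Large",
}
user_input = selected_item_input(item)
```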

OrchestrationResourceReference

References to orchestration resources for ASR, NLU, and Nuance Recognizer (NR).

Orchestration Resource References fields description
Field Type Description
grammar_references OrchestrationResourceReference.GrammarResourceReference Repeated. Nuance Recognizer resource references.
recognition_resources nuance.asr.v1.RecognitionResource Repeated. ASR resource references.
interpretation_resources nuance.nlu.v1.InterpretationResource Repeated. NLU resource references.

OrchestrationResourceReference.GrammarResourceReference

Reference to Nuance Recognizer grammar.

Grammar Resource Reference fields description
Field Type Description
uri string Reference (for example, the URL or URN).
type OrchestrationResourceReference.GrammarResourceReference.EnumResourceType Type of resource.

OrchestrationResourceReference.GrammarResourceReference.EnumResourceType

Nuance Recognizer grammar type.

Grammar Resource Reference type enum fields description
Name Number Description
SPEECH_GRAMMAR 0 SRGS grammar for speech (XML, gram, and so on).
DTMF_GRAMMAR 1 SRGS grammar for DTMF (XML, gram, and so on).

RecognitionInitResources

RecognitionInitResources is used to pass ASR/NR parameters to configure speech or DTMF recognition. Dialog service does not populate this message in responses.

Recognition Init Resources fields description
Field Type Description
recognition_init_message nuance.asr.v1.RecognitionInitMessage ASR parameters and resources to configure speech recognition. Dialog service does not populate this field in responses. See the RecognitionInitMessage documentation for details.
recognition_init nuance.nrc.v1.RecognitionInit NR parameters and resources to configure speech recognition. Dialog service does not populate this field in responses. See the RecognitionInit documentation for details.
dtmf_recognition_init nuance.nrc.v1.DTMFRecognitionInit NR parameters and resources to configure DTMF recognition. Dialog service does not populate this field in responses. See the DTMFRecognitionInit documentation for details.

Selector

Provides channel and language used for the conversation. See Languages, channels, and modalities for details.

Selector fields description
Field Type Description
channel string Optional. Channel that this conversation is going to use (for example, WebVA).
Note: Replace any spaces or slashes in the name of the channel with the underscore character (_).
language string Optional. Language to use for this conversation. This sets the language session variable. The format is xx-XX, for example, “en-US”.
library string Optional. Library to use for this conversation. Advanced customization reserved for future use. Always use the default value for now, which is default.
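The channel note and the language format above can be applied on the client side before sending a request. The sketch below is one possible approach using plain dictionaries. The helper names are hypothetical, and the regular expression reflects one reading of "xx-XX" (two lowercase letters, a hyphen, two uppercase letters), which is an assumption.

```python
import re

# Assumed reading of the "xx-XX" format: two lowercase letters, hyphen, two uppercase.
LANGUAGE_PATTERN = re.compile(r"^[a-z]{2}-[A-Z]{2}$")


def normalize_channel(name: str) -> str:
    """Replace each space or forward slash in a channel name with an underscore."""
    return re.sub(r"[ /]", "_", name)


def build_selector(channel=None, language=None, library="default") -> dict:
    """Build a Selector-shaped dict; channel and language are optional."""
    selector = {"library": library}
    if channel is not None:
        selector["channel"] = normalize_channel(channel)
    if language is not None:
        if not LANGUAGE_PATTERN.match(language):
            raise ValueError(f"language not in xx-XX format: {language!r}")
        selector["language"] = language
    return selector


# "Rich Text/Web" is a made-up channel name for this example.
selector = build_selector(channel="Rich Text/Web", language="en-US")
```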

StartRequestPayload

Payload sent with the Start request.

Start request payload fields description
Field Type Description
model_ref ResourceReference Reference object for the Dialog model.
data google.protobuf.Struct  Session variables data sent in the request as a map of key-value pairs.
suppress_log_user_data bool Set to true to disable logging for ASR, NLU, TTS, and Dialog.
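To show how the three StartRequestPayload fields fit together, here is a minimal sketch that assembles them as a plain dictionary. It does not use the generated stub classes; the function name, the placeholder URN, and the session variable are hypothetical.

```python
import json


def build_start_payload(model_uri, session_data=None, suppress_log_user_data=False):
    """Assemble a dict with the StartRequestPayload fields from the table above."""
    return {
        # APPLICATION_MODEL is the only value listed for EnumResourceType.
        "model_ref": {"uri": model_uri, "type": "APPLICATION_MODEL"},
        # Session variables as a map of key-value pairs.
        "data": dict(session_data or {}),
        "suppress_log_user_data": bool(suppress_log_user_data),
    }


# The URN below is a placeholder, not a real model reference.
payload = build_start_payload("urn:example:placeholder-model", {"userName": "Ada"})
print(json.dumps(payload, indent=2))
```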

StartResponsePayload

Payload returned after the Start method is called. If a session ID is not provided in the request, a new one is generated and should be used for subsequent calls.

Start response payload fields description
Field Type Description
session_id string Returns session ID to use for subsequent calls.

UpdateRequestPayload

Payload sent with the Update request.

Update request payload fields description
Field Type Description
data google.protobuf.Struct  Map of key-value pairs of session variables to update.

TtsParamsv1

Parameters to be forwarded to the TTS service. See Step 4b. Interact with the user (using audio) for details.

TTS params fields description
Field Type Description
audio_params nuance.tts.v1.AudioParameters Output audio parameters, such as encoding and volume. See the TTSaaS AudioParameters documentation for details.
voice nuance.tts.v1.Voice The voice to use for audio synthesis. See the TTSaaS Voice documentation for details.

UserInput

Provides input to the Dialog engine. The client application sends either the text collected from the user, to be interpreted by Mix, or an interpretation that was performed externally.

Note: Provide only one of the following fields: user_text, interpretation, selected_item, nluaas_interpretation, asraas_result.

User input fields description
Field Type Description
One of:    
   user_text string Text collected from end user.
   interpretation UserInput.Interpretation Interpretation that was done externally (for example, Nuance Recognizer for VoiceXML). This can be used for simple interpretations that include entities with string values only. Use nluaas_interpretation for interpretations that include complex entities.
   selected_item Selectable.SelectableItem.SelectedValue Value of element selected by end user.
   nluaas_interpretation nuance.nlu.v1.InterpretResult Interpretation that was done externally (for example, Nuance Recognizer for VoiceXML), provided in the NLUaaS InterpretResult format. See Interpreting text user input for an example. Note that DLGaaS currently only supports single intent interpretations.
   asraas_result nuance.asr.v1.Result Speech recognition that was done externally, provided in the ASRaaS result format.
input_mode string Optional. Input mode. Used for reporting. Current values are dtmf/voice. Applies to user_text and nluaas_interpretation input only.
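The "only one of" rule in the note above can be checked before sending a request. The following sketch works on a plain dictionary shaped like UserInput; the function name is hypothetical, and the generated stub classes may already enforce this rule on their own.

```python
# Field names taken from the "One of" group in the table above.
ONE_OF_FIELDS = (
    "user_text",
    "interpretation",
    "selected_item",
    "nluaas_interpretation",
    "asraas_result",
)


def selected_one_of(user_input: dict):
    """Return the name of the one-of field that is set, or None if none is set.

    Raises ValueError when more than one of the mutually exclusive fields is present.
    """
    present = [name for name in ONE_OF_FIELDS if name in user_input]
    if len(present) > 1:
        raise ValueError(f"provide only one of {ONE_OF_FIELDS}, got {present}")
    return present[0] if present else None


kind = selected_one_of({"user_text": "I want a large coffee", "input_mode": "voice"})
```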

UserInput.Interpretation

Sends interpretation data.

User input interpretation fields description
Field Type Description
confidence float Required. Value from 0 to 1 that indicates the confidence of the interpretation.
input_mode string Optional. Input mode. Current values are dtmf/voice (but input mode not limited to these).
utterance string Raw collected text.
data UserInput.Interpretation.DataEntry Repeated. Data from the interpretation of intents and entities. For example, INTENT:BILL_PAY or AMOUNT:100.
slot_literals UserInput.Interpretation.SlotLiteralsEntry Repeated. Slot literals from the interpretation of the entities. The slot literal provides the exact words used by the user. For example, AMOUNT: One hundred dollars.
slot_formatted_literals UserInput.Interpretation.SlotFormattedLiteralsEntry Repeated. Slot formatted literals from the interpretation of the entities.
slot_confidences UserInput.Interpretation.SlotConfidencesEntry Repeated. Slot confidences from the interpretation of the entities.
alternative_interpretations UserInput.Interpretation Repeated. Alternative interpretations possible from the interaction, that is, n-best list.
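The INTENT:BILL_PAY and AMOUNT:100 examples from the table can be assembled into an interpretation as sketched below. Plain dictionaries are used in place of the generated message classes, the function name is hypothetical, and the utterance and confidence values are made up for illustration.

```python
def build_interpretation(utterance, confidence, data,
                         slot_literals=None, slot_confidences=None,
                         input_mode=None) -> dict:
    """Build a dict shaped like UserInput.Interpretation.

    confidence is required and must lie between 0 and 1 inclusive.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {confidence}")
    interpretation = {
        "confidence": confidence,
        "utterance": utterance,
        "data": dict(data),
    }
    # Optional fields are added only when provided.
    if slot_literals:
        interpretation["slot_literals"] = dict(slot_literals)
    if slot_confidences:
        interpretation["slot_confidences"] = dict(slot_confidences)
    if input_mode is not None:
        interpretation["input_mode"] = input_mode
    return interpretation


interpretation = build_interpretation(
    utterance="Pay one hundred dollars on my bill",
    confidence=0.9,
    data={"INTENT": "BILL_PAY", "AMOUNT": "100"},
    slot_literals={"AMOUNT": "One hundred dollars"},
    input_mode="voice",
)
```

Entity values are kept as strings here because this message supports entities with string values only.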

UserInput.Interpretation.DataEntry

Data entry from interpretation of user input.

User input interpretation data entry fields description
Field Type Description
key string Key of the data.
value string Value of the data.

UserInput.Interpretation.SlotConfidencesEntry

Slot confidence entry from interpretation of user input.

User input interpretation slot confidence fields description
Field Type Description
key string Name of the entity.
value float Value from 0 to 1 that indicates the confidence of the interpretation for this entity.

UserInput.Interpretation.SlotLiteralsEntry

Slot literal entry from interpretation of user input.

User input interpretation slot literal fields description
Field Type Description
key string Name of the entity.
value string Literal value of the entity.

UserInput.Interpretation.SlotFormattedLiteralsEntry

Slot formatted literal entry from interpretation of user input.

User input interpretation slot formatted literal fields description
Field Type Description
key string Name of the entity.
value string Formatted literal value of the entity.

View

Specifies view details for this action.

View fields description
Field Type Description
id string Class or CSS defined for the view details in the node.
name string Type defined for the view details in the node.

Scalar value types

The data types in the proto files are mapped to equivalent types in the generated client stub files.

Scalar data types
Proto Notes C++ Java Python
double double double float
float float float float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers. If your field is likely to have negative values, use sint32 instead. int32 int int
int64 Uses variable-length encoding. Inefficient for encoding negative numbers. If your field is likely to have negative values, use sint64 instead. int64 long int/long
uint32 Uses variable-length encoding. uint32 int int/long
uint64 Uses variable-length encoding. uint64 long int/long
sint32 Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s. int32 int int
sint64 Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s. int64 long int/long
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 2^28. uint32 int int
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 2^56. uint64 long int/long
sfixed32 Always four bytes. int32 int int
sfixed64 Always eight bytes. int64 long int/long
bool bool boolean bool
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string String str/unicode
bytes May contain any arbitrary sequence of bytes. string ByteString str