Responses and input and output modalities

At a given point in the dialog, your dialog model supports different modalities for system output and user input depending on what is configured in Mix.dialog for the current channel and node.

The supported modalities in both directions are indicated in the execute response payload at each turn of the dialog. On each turn, your client application can use this information to first play messages to the user and then, if the response contains a QA action, collect input accordingly.

Responses and supported output modalities

The output modalities supported for each message in a given turn can be determined from the fields that are present in each ExecuteResponse message action:

The nlg field indicates that TTS-generated speech output is supported for the message. The generated audio itself is not carried in this field; it is returned in the audio field of a StreamOutput when TTS is requested in an ExecuteStream() call. The nlg field contents serve as backup text that the client can use to try again if the requested TTS generation was not successful.
The visual field indicates that rich text output is supported for the message.
The audio field indicates that audio script prerecorded messages are supported as output for the message.
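The field check described above can be sketched as follows. This is a minimal illustration, assuming the message action has already been decoded into a plain Python dict with the proto field names preserved. The modality labels, the helper name, and the sample message contents are hypothetical and not part of the Dialog API.

```python
from typing import Any, Dict, List

# Field names come from the text above; the labels on the right are
# hypothetical names chosen for this sketch.
OUTPUT_FIELDS = {
    "nlg": "tts",
    "visual": "rich_text",
    "audio": "recorded_audio",
}


def supported_output_modalities(message: Dict[str, Any]) -> List[str]:
    """Return labels for the output fields that are present and non-empty."""
    return [label for field, label in OUTPUT_FIELDS.items() if message.get(field)]


# Hypothetical message content, for illustration only.
message = {"nlg": [{"text": "Hello"}], "visual": [{"text": "<b>Hello</b>"}]}
print(supported_output_modalities(message))  # ['tts', 'rich_text']
```

Treating an empty field the same as a missing one is a design choice of this sketch; a client that needs to distinguish the two cases would check for key presence instead.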

Responses and supported input modalities

Self-hosted environments: This feature is only available for version 1.5.0 (or later) of the Dialog service. This corresponds to engine pack 2.4 for Speech Suite deployments and engine pack 3.11 for self-hosted Mix deployments.

The input modalities supported for the current channel and QA node are listed in the input_modes field of the Execute response payload, under the QAAction recognition_settings.

{
    "payload": {
        ...
        "qa_action": {
            ...
            "recognition_settings": {
                "collection_settings": {
                    "timeout": "7000",
                    "complete_timeout": "0",
                    "incomplete_timeout": "1500",
                    "max_speech_timeout": "12000"
                },
                "speech_settings": {
                    "sensitivity": "0.5",
                    "barge_in_type": "speech",
                    "speed_vs_accuracy": "0.5"
                },
                "input_modes": [
                    "text",
                    "voice"
                ]
            },
            ...
        },
        "channel": "Voice_Chat"
    }
}
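As a rough sketch, a client could read input_modes out of a response shaped like the sample above. The code assumes the response has been decoded into a plain Python dict; the helper name get_input_modes is hypothetical, and only the keys shown in the sample are used.

```python
from typing import Any, Dict, List


def get_input_modes(response: Dict[str, Any]) -> List[str]:
    """Return the input modes listed for the current QA action, if any."""
    qa_action = response.get("payload", {}).get("qa_action")
    if not qa_action:
        # No QA action in this turn, so there is no input to collect.
        return []
    settings = qa_action.get("recognition_settings", {})
    return list(settings.get("input_modes", []))


# Trimmed-down version of the sample payload above.
response = {
    "payload": {
        "qa_action": {
            "recognition_settings": {"input_modes": ["text", "voice"]}
        },
        "channel": "Voice_Chat",
    }
}
print(get_input_modes(response))  # ['text', 'voice']
```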

Note:

For Dialog projects created before the release of this feature, the text and voice input modalities are added to your project and enabled by default.

Warning:

If streaming audio is sent to Dialog as part of an ExecuteStream call, Dialog will attempt to orchestrate with ASRaaS, even if the voice input mode is not enabled for the current channel. The client application is responsible for checking the enabled input modalities and for not sending streaming audio when the voice input mode is not enabled.
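One way a client might enforce this check is sketched below, again over a response decoded into a plain Python dict. The function names and the returned labels are hypothetical; the sketch only decides whether audio streaming is allowed and does not make any Dialog API calls.

```python
from typing import Any, Dict


def voice_input_enabled(response: Dict[str, Any]) -> bool:
    """True only if the current QA action lists "voice" in input_modes."""
    qa_action = response.get("payload", {}).get("qa_action") or {}
    modes = qa_action.get("recognition_settings", {}).get("input_modes", [])
    return "voice" in modes


def pick_input_path(response: Dict[str, Any]) -> str:
    """Choose how to collect input; never stream audio unless voice is enabled."""
    return "stream_audio" if voice_input_enabled(response) else "text_only"


# Hypothetical response for a channel where only text input is enabled.
text_only_response = {
    "payload": {
        "qa_action": {"recognition_settings": {"input_modes": ["text"]}}
    }
}
print(pick_input_path(text_only_response))  # text_only
```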

The following topics describe how to handle these different types of output and input in more detail.
