Sample Python runtime client

DLGaaS offers a sample Python client application that you can download and use to access a deployed Dialog model with the Runtime gRPC API.

This sample app can accept text input or speech audio input and return text output or synthesized speech audio (TTS) output. To run this client, you will need to have:

  • Installed Python 3.6 or later.
  • Generated Python stubs from gRPC setup.
  • Your Mix client ID and secret, as described in Prerequisites from Mix. These authorize you to access your built and deployed Mix Dialog and NLU models.
  • A built and deployed Mix project with Dialog and NLU resources, as described in Prerequisites from Mix. The Mix Coffee app quick start project is an easy way to get started.
  • For TTS output, a project that supports the TTS output modality.
  • For speech input, a speech audio file. A sample wave file is provided in the sample client package, or you can use your own audio file. Your project must support the voice input modality to use voice input.
  • The Mix URN for your deployed Dialog model.
  • The Python sample client files for Linux or Windows.

Download the Python client zip file for Linux or Windows and extract its files into the same directory as the nuance directory that contains your proto files and Python stubs.

On Linux, give the run scripts execute permission with chmod +x. For example:

unzip dialog-python-client-linux.zip 
chmod +x *.sh

Client files

These are the resulting client files, in the same directory as the nuance directory holding your Python stubs:

    ├── dlg_client.py
    ├── run-mix-client.sh or run-mix-client.bat
    ├── run-mix-token-client.sh or run-mix-token-client.bat
    ├── OrderCoffee_i_want_a_double_espresso.wav
    ├── google
    └── nuance
        ├── dlg
        │   └── v1
        │       ├── common
        │       │   ├── dlg_common_messages.proto
        │       │   └── dlg_common_messages_pb2.py
        │       ├── dlg_interface.proto
        │       ├── dlg_interface_pb2.py
        │       ├── dlg_interface_pb2_grpc.py
        │       ├── dlg_messages.proto
        │       └── dlg_messages_pb2.py
        ├── asr
        │   └── v1
        │       ├── recognizer_pb2_grpc.py
        │       ├── recognizer_pb2.py
        │       ├── recognizer.proto
        │       ├── resource_pb2.py
        │       ├── resource.proto
        │       ├── result_pb2.py
        │       └── result.proto
        ├── tts
        │   └── v1
        │       ├── nuance_tts_v1.proto
        │       ├── nuance_tts_v1_pb2.py
        │       └── nuance_tts_v1_pb2_grpc.py
        ├── nlu
        │   └── v1
        │       ├── interpretation_common_pb2.py
        │       ├── interpretation-common.proto
        │       ├── multi_intent_interpretation_pb2.py
        │       ├── multi-intent-interpretation.proto
        │       ├── result.proto
        │       ├── result_pb2.py
        │       ├── runtime.proto
        │       ├── runtime_pb2.py
        │       ├── runtime_pb2_grpc.py
        │       ├── single_intent_interpretation_pb2.py
        │       └── single-intent-interpretation.proto
        └── rpc
            ├── error_details.proto
            ├── error_details_pb2.py
            ├── status.proto
            ├── status_pb2.py
            ├── status_code.proto
            └── status_code_pb2.py

Python app file

Each sample app package contains a common Python client application file, dlg_client.py. This file imports the generated Python stubs and contains the main application code. The client apps include command line scripts to run the app on the respective platforms along with other files to use with the app.


Audio file for speech input

Each sample app package also includes a common audio file, OrderCoffee_i_want_a_double_espresso.wav, containing a text-to-speech rendering of the phrase “I want a double espresso.”

This audio file is intended for trying out speech processing with the client app, specifically in relation to a Dialog model built from the Mix Coffee app quick start project.

Given a Dialog model and associated NLU model built from this quick start project, NLUaaS interprets this phrase as the intent ORDER_COFFEE, with the entity values COFFEE_TYPE espresso and COFFEE_SIZE double. The Dialog model can then proceed down the path defined for that intent and those entity values, and return responses accordingly.

Applicability of the audio file

Because this audio clip is designed for the coffee-shop themed quick start project, it is mainly useful for testing models built from that project. It may also work with models in a similar domain that include intents for ordering coffee or other drinks. If you’re using a Dialog model from a very different domain, you’ll need to provide your own audio clip that suits your model.

Specifications for creating your own audio file

The Python sample app currently only supports .wav audio files. The .wav audio file must be encoded with the following format to be usable with ASRaaS:

  • Linear pulse-code modulated (PCM)
  • 16-bit signed little-endian samples
  • 8 or 16 kHz sample rate
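
Before passing your own recording to the client, you may want to confirm it matches the format listed above. The following is a minimal sketch using Python's standard wave module; the function name check_wav_format is hypothetical and is not part of the sample client.

```python
import wave


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            # getsampwidth() reports bytes per sample: 2 bytes = 16-bit
            if wav.getsampwidth() != 2:
                problems.append(
                    f"samples are {8 * wav.getsampwidth()}-bit, expected 16-bit"
                )
            if wav.getframerate() not in (8000, 16000):
                problems.append(
                    f"sample rate is {wav.getframerate()} Hz, expected 8000 or 16000"
                )
    except wave.Error as err:
        # The wave module rejects files it cannot parse as PCM .wav data
        problems.append(f"not a readable PCM .wav file: {err}")
    return problems
```

Calling check_wav_format("OrderCoffee_i_want_a_double_espresso.wav") from the client directory should return an empty list if the file meets the requirements.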

Run Python client for help

For a quick check that the client is working, and to see the arguments it accepts, run it on Linux or Windows using the help (-h or --help) option.

See the results below and notice:

  • -s, --serverUrl: URL for the Dialog server. By default this is localhost:8080 but the sample scripts specify hosted Mix Dialog at dlg.api.nuance.com.

  • Authorization: Include --oauthURL, --clientID, and --clientSecret. Alternatively, generate a token and use the (hidden) --token argument. The --oauthScope is set by default to dlg and so does not need to be specified for the provided Python client, which does not require any other scopes.

  • --secure: Boolean signalling whether to use a secure gRPC channel.

  • --modelUrn: Mix URN for the Dialog model to use.

  • --textInput: Text input string to the dialog.

  • --audioFile: Audio file containing speech input recording.

  • --tts: Boolean signalling whether text to speech output is required.

  • --audioDir: Directory for audio output files. This is set to audio by default.
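
To make the defaults described above concrete, here is a rough sketch of how these options could be declared with Python's argparse module. This is an illustration only and is not the parser used in dlg_client.py; the option names and defaults are taken from the list above, and everything else (such as the build_parser name) is an assumption.

```python
import argparse


def build_parser():
    """Declare the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(prog="dlg_client.py")
    parser.add_argument("--oauthURL")
    parser.add_argument("--clientID")
    parser.add_argument("--clientSecret")
    parser.add_argument("--oauthScope", default="dlg")
    parser.add_argument("--secure", action="store_true")
    parser.add_argument("-s", "--serverUrl", default="localhost:8080")
    parser.add_argument("--modelUrn")
    parser.add_argument("--textInput")
    parser.add_argument("--audioFile")
    parser.add_argument("--tts", action="store_true")
    parser.add_argument("--audioDir", default="audio")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration
args = build_parser().parse_args(
    ["--secure", "--tts", "--textInput", "I want a double espresso"]
)
print(args.serverUrl, args.oauthScope, args.audioDir, args.secure, args.tts)
```

With no -s argument, serverUrl falls back to localhost:8080, which is why the run scripts pass the hosted address explicitly.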


```bash
python dlg_client.py -h
usage: dlg_client.py [-options]

options:
  -h, --help                               Show this help message and exit
  --appId [appId]                          Mix appId. For self-hosted use only. Used by Dialog
                                           service to resolve resource URNs in self-hosted setup.
  --oauthURL [oauthUrl]                    OAuth 2.0 URL
  --clientID [clientID]                    OAuth 2.0 Client ID
  --clientSecret [clientSecret]            OAuth 2.0 Client Secret
  --oauthScope [oauthScope]                OAuth 2.0 Scope, default=dlg
  --secure                                 Connect to the server using a secure gRPC channel
  -s [serverUrl], --serverUrl [serverUrl]  Dialog server URL, default=localhost:8080
  --modelUrn [modelUrn]                    Dialog model URN, e.g. urn:nuance:mix/eng-
                                           USA/A2_C16/mix.dialog
  --textInput [textInput]                  Text to perform interpretation on
  --audioFile [audioFile]                  audio file name for speech input to trigger speech
                                           recognition and then interpretation
  --tts                                    Boolean whether to request TTS
  --audioDir [audio directory]             Audio output directory for TTS, default=audio. To be used
                                           together with --tts.
```

Edit run script

First, edit the sample shell script (run-mix-client.sh) or batch file (run-mix-client.bat) to add your Mix client ID and secret. The script replaces the colons in the client ID with %3A so the value can be parsed correctly in subsequent operations.

To use the run-mix-token-client.sh or run-mix-token-client.bat script instead, see Authorize.

#!/bin/bash

CLIENT_ID=<Mix client ID, starting with appID:>
SECRET=<Mix client secret>
# Change colons (:) to %3A in client ID 
CLIENT_ID=${CLIENT_ID//:/%3A}

# Scenario 1: Text input and text output
python3 dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
    --clientID $CLIENT_ID --clientSecret $SECRET \
    --serverUrl dlg.api.nuance.com \
    --secure \
    --modelUrn "$1" \
    --textInput "$2"

# Scenario 2: Text input and TTS output
# python3 dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
#     --clientID $CLIENT_ID --clientSecret $SECRET \
#     --serverUrl dlg.api.nuance.com \
#     --secure \
#     --tts \
#     --modelUrn "$1" \
#     --textInput "$2"

# Scenario 3: Audio input and TTS output
# python3 dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
#     --clientID $CLIENT_ID --clientSecret $SECRET \
#     --serverUrl dlg.api.nuance.com \
#     --secure \
#     --tts \
#     --audioFile OrderCoffee_i_want_a_double_espresso.wav \
#     --modelUrn "$1"
@echo off
setlocal enabledelayedexpansion

set CLIENT_ID=<Mix client ID, starting with appID:>
set SECRET=<Mix client secret>
rem Change colons (:) to %3A in client ID
set CLIENT_ID=!CLIENT_ID::=%%3A!

rem Scenario 1: Text input and output
python dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token ^
    --clientID %CLIENT_ID% --clientSecret %SECRET% ^
    --serverUrl dlg.api.nuance.com ^
    --secure ^
    --modelUrn %1 ^
    --textInput %2 

rem Scenario 2: Text input and TTS output
rem python dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token ^
rem     --clientID %CLIENT_ID% --clientSecret %SECRET% ^
rem     --serverUrl dlg.api.nuance.com ^
rem     --secure ^
rem     --tts ^
rem     --modelUrn %1 ^
rem     --textInput %2 

rem Scenario 3: Audio file input and TTS output
rem python dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token ^
rem     --clientID %CLIENT_ID% --clientSecret %SECRET% ^
rem     --serverUrl dlg.api.nuance.com ^
rem     --secure ^
rem     --tts ^
rem     --audioFile OrderCoffee_i_want_a_double_espresso.wav ^
rem     --modelUrn %1 
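
Both run scripts above rewrite the client ID so that each colon becomes %3A. If you call the service from your own Python code rather than from the scripts, the same substitution can be sketched as follows. The function name and the example client ID are hypothetical and shown for illustration only.

```python
def encode_client_id(client_id):
    """Replace each colon with %3A, mirroring what the run scripts do."""
    return client_id.replace(":", "%3A")


# Hypothetical client ID, for illustration only
print(encode_client_id("appID:EXAMPLE_APP:geo:dev:client:example"))
# prints appID%3AEXAMPLE_APP%3Ageo%3Adev%3Aclient%3Aexample
```

A plain string replacement is used here, rather than a general URL-encoding function, so that the result matches the scripts exactly and no other characters are altered.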

Run the sample client

With your client ID and secret added to the run script, you can run the sample client. There are three options for running the client, depending on the scenario you want to try.

Scenario 1: Text input and text output

By default, this client accepts a text string as input and returns a text response: the next prompt to send to the user. To try this scenario, run the sample shell script or batch file, passing the URN of your Dialog model and your text input to be interpreted.

./run-mix-client.sh "urn:nuance-mix:tag:model/TestMixClient/mix.dialog" "I want a double espresso"
run-mix-client.bat "urn:nuance-mix:tag:model/TestMixClient/mix.dialog" "I want a double espresso"

The client takes your text string and calls DLGaaS to interpret it and return the next prompt in the application as a text string. The prompt should be: “Perfect, a double espresso coming right up!” The response is the same on Linux and Windows.

2024-01-07 17:04:05,414 DEBUG: Creating secure gRPC channel
2024-01-07 17:04:05,420 DEBUG: Start Request: selector {
  channel: "default"
  language: "en-US"
  library: "default"
}
payload {
  model_ref {
    uri: "urn:nuance-mix:tag:model/TestMixClient/mix.dialog"
  }
}

2024-01-07 17:04:05,945 DEBUG: Start Request Response: {'payload': {'sessionId': '92705444-cd59-4a04-b79c-e67203f04f0d'}}
2024-01-07 17:04:05,948 DEBUG: Session: 92705444-cd59-4a04-b79c-e67203f04f0d
2024-01-07 17:04:05,949 DEBUG: Initial request, no input from the user to get initial prompt
2024-01-07 17:04:05,952 DEBUG: Execute Request: user_input {
}

2024-01-07 17:04:06,193 DEBUG: Execute Response: {'payload': {'messages':
[{'visual': [{'text': 'Hello and welcome to the coffee app.'}], 'view': {}}],
'qaAction': {'message': {'visual': [{'text': 'What can I get you today?'}]},
'data': {}, 'view': {}}}}
2024-01-07 17:04:06,198 DEBUG: Second request, passing in user input
2024-01-07 17:04:06,199 DEBUG: Execute Request: user_input {
  user_text: "I want a double espresso"
}

2024-01-07 17:04:06,791 DEBUG: Execute Response: {'payload': {'messages':
[{'visual': [{'text': 'Perfect, a double espresso coming right up!'}], 'view':
{}}], 'endAction': {'data': {}, 'id': 'End dialog'}}}

If you receive errors, or don’t get the response you expect (“Perfect…”), see Troubleshooting.

Scenario 2: Text input and TTS output

In this scenario, you input a text string but DLGaaS returns a wave file with synthesized text-to-speech audio, ready to play to the user instead of a text prompt.

Edit the sample shell script or batch file to uncomment scenario 2: the lines for text input and TTS output. Comment out scenario 1.

# Scenario 1: Text input and output
# python3 dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
#    --clientID $CLIENT_ID --clientSecret $SECRET \
#    --serverUrl dlg.api.nuance.com \
#    --secure \
#    --modelUrn "$1" \
#    --textInput "$2"

# Scenario 2: Text input and TTS output
python3 dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
    --clientID $CLIENT_ID --clientSecret $SECRET \
    --serverUrl dlg.api.nuance.com \
    --secure \
    --tts \
    --modelUrn "$1" \
    --textInput "$2"
rem Scenario 1: Text input and output
rem python dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token ^
rem    --clientID %CLIENT_ID% --clientSecret %SECRET% ^
rem    --serverUrl dlg.api.nuance.com ^
rem    --secure ^
rem    --modelUrn %1 ^
rem    --textInput %2 

rem Scenario 2: Text input and TTS output
python dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token ^
    --clientID %CLIENT_ID% --clientSecret %SECRET% ^
    --serverUrl dlg.api.nuance.com ^
    --secure ^
    --tts ^
    --modelUrn %1 ^
    --textInput %2 

As in scenario 1, run the client from the shell script or batch file, passing the URN of your Dialog model and your text input to be interpreted.

./run-mix-client.sh urn:nuance-mix:tag:model/TestMixClient/mix.dialog "I want a double espresso"
run-mix-client.bat urn:nuance-mix:tag:model/TestMixClient/mix.dialog "I want a double espresso"

The client takes the text string and calls DLGaaS to interpret it and return the next prompt in the application. In this scenario, the client signals Dialog via the streaming API to call TTSaaS and saves the audio that comes back as .wav files. You’ll see two audio files under a new folder, audio:

  • initial_tts_audio.wav: The audio for the initial prompt.
  • main_tts_audio.wav: The audio for the response to the user input.
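
The log output for this scenario shows the TTS audio arriving in several chunks before a .wav file is written. As a rough sketch of that last step, the following function writes a sequence of raw PCM chunks to a .wav file with Python's standard wave module. It is not the code in dlg_client.py: the function name is hypothetical, and mono 16-bit samples are assumptions. Only the 16000 Hz rate appears in the logged TTS parameters.

```python
import os
import wave


def save_pcm_chunks(chunks, path, sample_rate_hz=16000):
    """Write raw PCM audio chunks to a .wav file, creating its directory if needed."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono audio
        wav.setsampwidth(2)   # assumption: 16-bit (2-byte) samples
        wav.setframerate(sample_rate_hz)
        for chunk in chunks:
            # The wave module fixes up the header sizes when the file is closed
            wav.writeframes(chunk)
```

For example, save_pcm_chunks(received_chunks, "audio/initial_tts_audio.wav") would produce a file in the same location the sample client uses, where received_chunks is whatever list of byte strings your code collected from the stream.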

If all goes well, you should see output similar to the following:

2024-01-07 16:16:20,415 DEBUG: Adding CallCredentials using token parameter
2024-01-07 16:16:20,416 DEBUG: Creating secure gRPC channel
2024-01-07 16:16:20,422 DEBUG: Start Request: selector {
  channel: "default"
  language: "en-US"
  library: "default"
}
payload {
  model_ref {
    uri: "urn:nuance-mix:tag:model/TestMixClient/mix.dialog"
  }
}

2024-01-07 16:16:20,738 DEBUG: Start Request Response: {'payload': {'sessionId': '6303610e-8d97-4f95-bf97-2d423c4baad0'}}
2024-01-07 16:16:20,738 DEBUG: Session: 6303610e-8d97-4f95-bf97-2d423c4baad0
2024-01-07 16:16:20,738 DEBUG: Initial request, no input from the user to get initial prompt
2024-01-07 16:16:20,739 DEBUG: Stream input with parameters for TTS: audio_params {
  audio_format {
    pcm {
      sample_rate_hz: 16000
    }
  }
}
voice {
  name: "Evan"
  model: "enhanced"
}

2024-01-07 16:16:20,931 DEBUG: Received Execute response: {'payload': {'messages': [{'nlg': [{'text': 'Hello and welcome to the coffee app.'}], 'visual': [{'text': 'Hello and welcome to the coffee app.'}], 'view': {}}], 'qaAction': {'message': {'nlg': [{'text': 'What can I get you today?'}], 'visual': [{'text': 'What can I get you today?'}]}, 'view': {}, 'recognitionSettings': {'collectionSettings': {'timeout': '7000', 'completeTimeout': '0', 'incompleteTimeout': '1500', 'maxSpeechTimeout': '12000'}, 'speechSettings': {'sensitivity': '0.5', 'bargeInType': 'speech', 'speedVsAccuracy': '0.5'}}}}}
2024-01-07 16:16:20,998 DEBUG: Received TTS audio: 70806 bytes
2024-01-07 16:16:20,998 DEBUG: Received TTS audio: 12596 bytes
2024-01-07 16:16:21,000 DEBUG: Received TTS audio: 42758 bytes
2024-01-07 16:16:21,001 DEBUG: Wrote generated speech audio response to audio/initial_tts_audio.wav
2024-01-07 16:16:21,002 DEBUG: Stream input with parameters for TTS: audio_params {
  audio_format {
    pcm {
      sample_rate_hz: 16000
    }
  }
}
voice {
  name: "Evan"
  model: "enhanced"
}

2024-01-07 16:16:21,176 DEBUG: Received Execute response: {'payload': {'messages': [{'nlg': [{'text': 'Perfect, a double espresso coming right up!'}], 'visual': [{'text': 'Perfect, a double espresso coming right up!'}], 'view': {}}], 'endAction': {'data': {}, 'id': 'End dialog'}}}
2024-01-07 16:16:21,193 DEBUG: Received TTS audio: 62572 bytes
2024-01-07 16:16:21,194 DEBUG: Received TTS audio: 36856 bytes
2024-01-07 16:16:21,196 DEBUG: Wrote generated speech audio response to audio/main_tts_audio.wav

If you receive errors, or don’t get the response you expect, see Troubleshooting.

Scenario 3: Audio input and TTS output

In this final scenario, you input an audio file and DLGaaS returns wave files with synthesized text-to-speech audio, simulating a complete voice conversation between the application and the end user.

Edit the shell script or batch file to uncomment scenario 3: the lines for audio input and TTS output. Comment out the lines for scenarios 1 and 2.

# Scenario 1: Text input and output
# python3 dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
#    --clientID $CLIENT_ID --clientSecret $SECRET \
#    --serverUrl dlg.api.nuance.com \
#    --secure \
#    --modelUrn "$1" \
#    --textInput "$2" 

# Scenario 2: Text input and TTS output
# python3 dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
#    --clientID $CLIENT_ID --clientSecret $SECRET \
#    --serverUrl dlg.api.nuance.com \
#    --secure \
#    --tts \
#    --modelUrn "$1" \
#    --textInput "$2"

# Scenario 3: Audio file input and TTS output
python3 dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token \
    --clientID $CLIENT_ID --clientSecret $SECRET \
    --serverUrl dlg.api.nuance.com \
    --secure \
    --tts \
    --audioFile OrderCoffee_i_want_a_double_espresso.wav \
    --modelUrn "$1"
rem Scenario 1: Text input and output
rem python dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token ^
rem    --clientID %CLIENT_ID% --clientSecret %SECRET% ^
rem    --serverUrl dlg.api.nuance.com ^
rem    --secure ^
rem    --modelUrn %1 ^
rem    --textInput %2

rem Scenario 2: Text input and TTS output
rem python dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token ^
rem    --clientID %CLIENT_ID% --clientSecret %SECRET% ^
rem    --serverUrl dlg.api.nuance.com ^
rem    --secure ^
rem    --tts ^
rem    --modelUrn %1 ^
rem    --textInput %2

rem Scenario 3: Audio file input and TTS output
python dlg_client.py --oauthURL https://auth.crt.nuance.com/oauth2/token ^
    --clientID %CLIENT_ID% --clientSecret %SECRET% ^
    --serverUrl dlg.api.nuance.com ^
    --secure ^
    --tts ^
    --audioFile OrderCoffee_i_want_a_double_espresso.wav ^
    --modelUrn %1 

Run the client from the shell script or batch file and pass only the URN of your Dialog model. The audio file used as input is set in the run script.

./run-mix-client.sh "urn:nuance-mix:tag:model/TestMixClient/mix.dialog"
run-mix-client.bat "urn:nuance-mix:tag:model/TestMixClient/mix.dialog" 

The client takes the audio file and calls DLGaaS to recognize the speech, interpret its meaning, and return the next prompt in the application.

The client simulates streaming of the audio by breaking the audio file into chunks and sending them to ASRaaS via DLGaaS’s streaming API. The client also signals DLGaaS to call TTSaaS via the streaming API and saves the audio that comes back as .wav files. As in scenario 2, you’ll see two audio files under a new folder, audio:

  • initial_tts_audio.wav: The audio for the initial prompt.
  • main_tts_audio.wav: The audio for the response to the user input.
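
The chunking step described above can be sketched with a small generator. This is an illustration under stated assumptions, not the client's own implementation: the function name is hypothetical, and the 20 ms packet duration is inferred from the 640-byte packets in the sample log (320 frames of 16-bit mono audio at 16 kHz).

```python
import wave


def audio_packets(path, packet_ms=20):
    """Yield the audio data of a .wav file in packets of packet_ms milliseconds."""
    with wave.open(path, "rb") as wav:
        frames_per_packet = wav.getframerate() * packet_ms // 1000
        while True:
            packet = wav.readframes(frames_per_packet)
            if not packet:
                break  # no audio left; the caller can now signal end of stream
            yield packet
```

Each yielded packet would then be wrapped in a stream input message and sent in turn. The final packet is usually shorter than the others, as with the 160-byte packet at the end of the sample log.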

If all goes well, you should see output similar to the following:

2024-01-07 16:25:08,367 DEBUG: Adding CallCredentials using token parameter
2024-01-07 16:25:08,368 DEBUG: Creating secure gRPC channel
2024-01-07 16:25:08,373 DEBUG: Start Request: selector {
  channel: "default"
  language: "en-US"
  library: "default"
}
payload {
  model_ref {
    uri: "urn:nuance-mix:tag:model/TestMixClient/mix.dialog"
  }
}

2024-01-07 16:25:08,506 DEBUG: Start Request Response: {'payload': {'sessionId': 'a4ff950f-db77-455a-bc6e-bc5aad33b328'}}
2024-01-07 16:25:08,507 DEBUG: Session: a4ff950f-db77-455a-bc6e-bc5aad33b328
2024-01-07 16:25:08,507 DEBUG: Initial request, no input from the user to get initial prompt
2024-01-07 16:25:08,508 DEBUG: Stream input with parameters for TTS: audio_params {
  audio_format {
    pcm {
      sample_rate_hz: 16000
    }
  }
}
voice {
  name: "Evan"
  model: "enhanced"
}

2024-01-07 16:25:08,754 DEBUG: Received Execute response: {'payload': {'messages': [{'nlg': [{'text': 'Hello and welcome to the coffee app.'}], 'visual': [{'text': 'Hello and welcome to the coffee app.'}], 'view': {}}], 'qaAction': {'message': {'nlg': [{'text': 'What can I get you today?'}], 'visual': [{'text': 'What can I get you today?'}]}, 'view': {}, 'recognitionSettings': {'collectionSettings': {'timeout': '7000', 'completeTimeout': '0', 'incompleteTimeout': '1500', 'maxSpeechTimeout': '12000'}, 'speechSettings': {'sensitivity': '0.5', 'bargeInType': 'speech', 'speedVsAccuracy': '0.5'}}}}}
2024-01-07 16:25:08,820 DEBUG: Received TTS audio: 70806 bytes
2024-01-07 16:25:08,821 DEBUG: Received TTS audio: 12596 bytes
2024-01-07 16:25:08,822 DEBUG: Received TTS audio: 42758 bytes
2024-01-07 16:25:08,823 DEBUG: Wrote generated speech audio response to audio/initial_tts_audio.wav
2024-01-07 16:25:08,825 DEBUG: Streaming audio input...
2024-01-07 16:25:08,825 DEBUG: First streamed packet:
2024-01-07 16:25:08,825 DEBUG: Sending parameters for ASR: audio_format {
  pcm {
    sample_rate_hz: 16000
  }
}
end_stream_no_valid_hypotheses: true

2024-01-07 16:25:08,825 DEBUG: Sending parameters TTS: audio_params {
  audio_format {
    pcm {
      sample_rate_hz: 16000
    }
  }
}
voice {
  name: "Evan"
  model: "enhanced"
}

2024-01-07 16:25:08,825 DEBUG: Sending first speech input audio packet. Sending 640 bytes
2024-01-07 16:25:08,846 DEBUG: Sending subsequent speech audio packet. Sending 640 bytes.
. . . (more audio packets)
2024-01-07 16:25:11,103 DEBUG: Sending subsequent speech audio packet. Sending 640 bytes.
2024-01-07 16:25:11,118 DEBUG: Received ASR status response: 200 - Success
. . . (more audio packets)
2024-01-07 16:25:11,249 DEBUG: Sending subsequent speech audio packet. Sending 160 bytes.
2024-01-07 16:25:11,269 DEBUG: Sending empty stream input to signal end of stream.
2024-01-07 16:25:11,416 DEBUG: Received Execute response: {'payload': {'messages': [{'nlg': [{'text': 'Perfect, a double espresso coming right up!'}], 'visual': [{'text': 'Perfect, a double espresso coming right up!'}], 'view': {}}], 'endAction': {'data': {}, 'id': 'End dialog'}}}
2024-01-07 16:25:11,434 DEBUG: Received TTS audio: 62572 bytes
2024-01-07 16:25:11,435 DEBUG: Received TTS audio: 36856 bytes
2024-01-07 16:25:11,437 DEBUG: Wrote generated speech audio response to audio/main_tts_audio.wav

If you receive errors, or don’t get the response you expect, see Troubleshooting.

Troubleshooting

In these examples, the client should return two prompts, either in text or TTS format:

  • The initial prompt: “Hello and welcome… What can I get…”
  • The response to the user’s input: “Perfect, a double espresso coming right up!”

Depending on your input and how you have set up your project, you may encounter issues or receive different responses. Here are some tips for troubleshooting.

Dialog fails to generate TTS output

Confirm in the Mix dashboard that your project is configured to support the TTS output modality. If it is not, you will not be able to generate TTS output. Edit your project settings in the Mix dashboard to enable the TTS output modality in at least one channel.

Then rebuild project resources and redeploy the resources.

Dialog only partially captures the meaning of your input and asks a followup question

Instead of “Perfect,” your response may be “What type of coffee would you like?” or “What size coffee would you like?” These responses alert you to issues with your input text or your NLU model. The NLU model may fail to understand the size or type of coffee you want in terms of the entities and values defined in the model.

In this case you could try a new input more similar in wording and entity values to your NLU model training samples. Alternatively, add new training samples and rebuild and redeploy your NLU model.

Dialog fails to capture the meaning of the input / NO_MATCH

Check that your input is relevant to the domain on which your model is based and is reasonably similar to the training samples in your NLU model. If your input is very different from the training samples, the NLU model may fail to recognize a valid intent or entities.

StatusCode.NOT_FOUND

If you receive a StatusCode.NOT_FOUND error, with details of “model … could not be found,” the model URN you specified does not exist under the client ID and secret you specified for authorization. Check that you have specified the correct URN for your Dialog model and that your authorization credentials give you access to that model.

Next steps

This is a very simple toy client to demonstrate some of the basic mechanics of how to access and use the DLGaaS API. It provides useful functions to access the methods of the DLGaaS API that could serve as building blocks in a more complete application. The client authorizes, starts the dialog, and goes through a single step of a dialog using a single text input string or audio file, provided as a command line argument.

However, additional work is required to create an app that can run through a full, multi-step, interactive dialog, collect input from a user, and handle data transfers.

What follows are some brief tips on next steps to build on this to create a more fully functional app.

Multi-step dialog loop

Most real dialogs include multiple steps of back and forth interaction. You will need to write a loop to cycle through playing prompts, collecting user input and data, as needed, and sending the input and data back within requests until the dialog is finished.
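
One way such a loop might be structured is sketched below. It works on response dictionaries shaped like the Execute responses in the logs above, using only the messages, qaAction, and endAction keys that appear there. The run_dialog function and its two callable parameters are hypothetical placeholders that you would back with real DLGaaS requests and real input collection.

```python
def run_dialog(execute, get_user_input):
    """Drive a dialog until an endAction arrives and return the prompts seen.

    execute(user_text) returns a response dict shaped like the logged responses.
    get_user_input(prompt) returns the user's reply as a string.
    Both callables are placeholders supplied by the caller.
    """
    transcript = []
    response = execute(None)  # initial request with no user input
    while True:
        payload = response.get("payload", {})
        for message in payload.get("messages", []):
            for visual in message.get("visual", []):
                transcript.append(visual["text"])
        if "endAction" in payload:
            break  # the dialog reached its natural endpoint
        if "qaAction" in payload:
            visuals = payload["qaAction"].get("message", {}).get("visual", [])
            prompt = " ".join(v["text"] for v in visuals)
            transcript.append(prompt)
            response = execute(get_user_input(prompt))
        else:
            # Other action types, such as data access, are not handled in this sketch
            break
    return transcript
```

Passing the request and input functions in as parameters keeps the loop independent of how requests are sent, so the same structure can be exercised with canned responses before it is connected to the service.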

Collecting user input

You will need to write code to collect input, whether text or audio.

The app includes the function execute_request() to handle text input and the functions execute_stream_request() and build_stream_input() to process and stream audio input to Dialog from an existing audio file. However, you will need to write code to collect the text or audio input from the user and save to a file.

Supporting other audio formats

While this sample app only supports PCM-encoded audio, DLGaaS can accept any audio format that ASRaaS supports. If your application needs it, you could add support for other formats.

Handling data transfers

Some dialogs rely on data transfers, whether client-side or server-side. If your dialog contains data access nodes, you will need to write code to recognize and handle both data access actions and continue actions as part of your main dialog loop.

Terminating the dialog

You will want to write code to handle an end action indicating that the dialog has terminated at its natural endpoint, as well as to allow the user to leave the conversation early and send a stop_request().

Authorize

DLGaaS is a hosted service on the Nuance Mix platform. To access this service, your client applications must be authorized with an access token generated by the OAuth 2 protocol.

In order to request a token, you need your Mix client ID and secret as described in Prerequisites from Mix. Once you have these credentials, you can request an access token in several ways.

The sample client supports two methods.

Let client generate token

The client includes token-generation code, checking first to see whether the token has expired. To use this method, pass your credentials and the location of the OAuth server in the --clientID, --clientSecret, and --oauthURL arguments.

Note: This is the preferred method.
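
The idea of checking for expiry before requesting a new token can be sketched as a small cache. This is not the client's actual token code; the TokenCache class, its parameters, and the 30-second safety margin are all assumptions made for illustration.

```python
import time


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch_token, clock=time.time, margin_s=30):
        # fetch_token is a caller-supplied callable returning
        # (token, expires_in_seconds)
        self._fetch_token = fetch_token
        self._clock = clock
        self._margin_s = margin_s
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a cached token, fetching a new one only when needed."""
        expired = self._clock() >= self._expires_at - self._margin_s
        if self._token is None or expired:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token
```

Reusing the token this way keeps the number of token requests low, which is one reason letting the client manage the token is convenient.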

Edit your run script, run-mix-client.sh or run-mix-client.bat, to add your Mix client ID and secret.

#!/bin/bash

CLIENT_ID=<Mix client ID, starting with appID:>
SECRET=<Mix client secret>
# Change colons (:) to %3A in client ID 
CLIENT_ID=${CLIENT_ID//:/%3A}
@echo off
setlocal enabledelayedexpansion

set CLIENT_ID=<Mix client ID, starting with appID:>
set SECRET=<Mix client secret>
rem Change colons (:) to %3A in client ID
set CLIENT_ID=!CLIENT_ID::=%%3A!

Generate token manually

For testing purposes, you may instead generate the token manually and pass it to the client as an environment variable in the --token argument.

This token expires after a short time (around 15 minutes), so it must be regenerated frequently. Keep in mind that the number of token requests is limited for security reasons.

To use this method, use the run-mix-token-client.sh or *.bat file, adding your Mix client ID and secret.

#!/bin/bash

CLIENT_ID=<Mix client ID, starting with appID:>
SECRET=<Mix client secret>
# Change colons (:) to %3A in client ID 
CLIENT_ID=${CLIENT_ID//:/%3A}

export MY_TOKEN="`curl -s -u "$CLIENT_ID:$SECRET" \
"https://auth.crt.nuance.com/oauth2/token" \
-d 'grant_type=client_credentials' -d 'scope=dlg' \
| python -c 'import sys, json; print(json.load(sys.stdin)["access_token"])'`"
@echo off
setlocal enabledelayedexpansion

set CLIENT_ID=<Mix client ID, starting with appID:>
set SECRET=<Mix client secret>
rem Change colons (:) to %3A in client ID 
set CLIENT_ID=!CLIENT_ID::=%%3A!

set command=curl -s -u %CLIENT_ID%:%SECRET% ^
-d "grant_type=client_credentials" -d "scope=dlg" ^
"https://auth.crt.nuance.com/oauth2/token"

for /f "delims={}" %%a in ('%command%') do (
    for /f "tokens=1 delims=:, " %%b in ("%%a") do set key=%%b
    for /f "tokens=2 delims=:, " %%b in ("%%a") do set value=%%b
    goto done:
)

:done
rem Check if the token was found
if not !key!=="access_token" (
    echo Access token not found^^!
    pause
    exit
)

rem Remove quotes
set MY_TOKEN=!value:"=!
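
For comparison, the token request that both scripts above issue with curl can be sketched in Python using only the standard library. The URL, grant type, and scope are taken from the scripts; the function names are hypothetical, and this is not code from the sample client. As in the scripts, the client ID is expected to have its colons already replaced with %3A.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://auth.crt.nuance.com/oauth2/token"


def build_token_request(client_id, secret, scope="dlg"):
    """Build the POST request that curl -u ... -d ... sends in the scripts."""
    # curl -u sends the credentials as an HTTP Basic Authorization header
    credentials = base64.b64encode(f"{client_id}:{secret}".encode()).decode()
    # curl -d sends the fields as a form-encoded request body
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Authorization": f"Basic {credentials}"},
        method="POST",
    )


def fetch_token(client_id, secret):
    """Send the request and return the access_token field of the JSON reply."""
    with urllib.request.urlopen(build_token_request(client_id, secret)) as resp:
        return json.load(resp)["access_token"]
```

Splitting request construction from sending makes it possible to inspect the request without contacting the authorization server.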