Using Mix resources in DLGaaS/Dialog applications

This section describes how to use speech resources built in Mix in DLGaaS (Mix-hosted) or Dialog (self-hosted) applications.

Dialog service API

A Dialog client application uses the Dialog service API. The conversation proceeds in back-and-forth steps between the user and a dialog agent. When the dialog agent is configured for speech input, the client application can collect the user's speech audio and stream it to the Dialog service.

Dialog orchestration with ASR and NLU

When Nuance ASR and NLU are available alongside Dialog, Dialog can orchestrate behind the scenes with ASR to perform speech recognition on streaming speech inputs, and then with NLU to obtain semantic interpretation of the transcript.

When Dialog uses orchestration with Mix services for handling speech inputs, the following flow occurs:

  1. The Dialog client app collects speech audio from the user in response to prompts from the Dialog model.

  2. The Dialog client app sends a series of streaming requests to the Dialog service. The first request includes audio configurations, including references to Mix speech resources. Subsequent requests include the speech audio data.

  3. Dialog configures requests to the ASR service using any configured resources.

  4. The ASR service returns a text transcript of the user speech to the Dialog service.

  5. The Dialog service configures a request to the NLU service, including the text transcript and a reference to the NLU model from the same app configuration context tag as the Dialog model.

  6. The NLU service returns a semantic interpretation of the text transcript to the Dialog service.

  7. Dialog advances the dialog based on the intent identified by the NLU service and on the Dialog model.

  8. Dialog returns a prompt to the client app to drive the next turn.
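The flow above can be sketched in a few lines of Python. This is a minimal illustration of the order of operations only, not the Dialog service implementation: the function run_speech_turn, the Interpretation type, and the three stand-in services are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Interpretation:
    """Hypothetical container for an NLU result."""
    intent: str
    transcript: str


def run_speech_turn(
    audio_chunks: Iterable[bytes],
    recognize: Callable[[bytes], str],
    interpret: Callable[[str], Interpretation],
    next_prompt: Callable[[Interpretation], str],
) -> str:
    """Mimic one orchestrated turn: audio -> transcript -> intent -> prompt."""
    audio = b"".join(audio_chunks)          # steps 1-2: streamed audio arrives
    transcript = recognize(audio)           # steps 3-4: ASR returns a transcript
    interpretation = interpret(transcript)  # steps 5-6: NLU returns an interpretation
    return next_prompt(interpretation)      # steps 7-8: dialog advances, prompt returned


# Stand-in services, for illustration only (not real ASR, NLU, or Dialog calls).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_nlu(text: str) -> Interpretation:
    intent = "ORDER_COFFEE" if "coffee" in text else "NO_MATCH"
    return Interpretation(intent=intent, transcript=text)


def fake_dialog(result: Interpretation) -> str:
    if result.intent == "ORDER_COFFEE":
        return "What size would you like?"
    return "Sorry, I did not get that."


prompt = run_speech_turn([b"a large ", b"coffee please"], fake_asr, fake_nlu, fake_dialog)
print(prompt)
```

The client only ever sees the first and last stages (sending audio, receiving a prompt); the ASR and NLU calls in the middle happen inside the service.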

For more details on the runtime flow in a Dialog client application, see Dialog client app development.

Orchestration for Nuance-hosted DLGaaS

For Nuance-hosted DLGaaS, this orchestration is available by default, because DLGaaS, ASRaaS, and NLUaaS are all part of the same Mix platform. Resources built in Mix.nlu and those used by ASRaaS, NLUaaS, and DLGaaS are all hosted by Nuance on that platform, so no separate deployment process is necessary.

Domain language models (DLMs) and other resources built and deployed in Mix app configurations can be referenced and accessed at runtime via a Mix URN. Any DLM deployed within the same Mix application can be referenced and used by DLGaaS, even if it belongs to a different app configuration.
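As a rough illustration, a URN for a DLM could be assembled from the app configuration's context tag and a language code. The URN layout used below is an assumption made for this sketch and should be checked against the Mix documentation; build_dlm_urn is a hypothetical helper, not part of any Nuance client library.

```python
def build_dlm_urn(context_tag: str, language: str) -> str:
    """Build a Mix-style URN for a DLM.

    ASSUMPTION: the layout 'urn:nuance-mix:tag:model/<tag>/mix.asr?=language=<lang>'
    is illustrative; confirm the exact format in the Mix documentation.
    """
    if not context_tag or "/" in context_tag:
        raise ValueError("context_tag must be a non-empty tag without '/'")
    if not language:
        raise ValueError("language must be a non-empty code such as 'eng-USA'")
    return f"urn:nuance-mix:tag:model/{context_tag}/mix.asr?=language={language}"


# Hypothetical context tag, for illustration only.
print(build_dlm_urn("my_context_tag", "eng-USA"))
```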

Orchestration for self-hosted Dialog

When self-hosted Dialog is used, orchestration with ASR and NLU can be configured in one of two different ways:

  • Self-hosted Dialog service orchestrates with Nuance-hosted ASRaaS and NLUaaS
  • Self-hosted Dialog service orchestrates with local installs of Krypton and NLU service

Orchestration with Nuance-hosted services

For this arrangement, the self-hosted Dialog installation has to be configured to access the Nuance-hosted ASRaaS and NLUaaS. The dialog model has to be downloaded from Mix and deployed locally, but no separate deployment is required for speech and NLU resources. These can be accessed at runtime from the Mix platform via URN, as in the Nuance-hosted DLGaaS case.

A local install of Nuance Customer configuration storage service (CCSS) is required.

See the Dialog installation documentation for more details.

Orchestration with self-hosted Krypton and NLU

In this case, Krypton and NLU must both be installed locally alongside the Dialog service, and Dialog must be configured to orchestrate with them.

A local install of Nuance Customer configuration storage service (CCSS) is required. Resources for all the services are deployed and retrieved via CCSS. As in the Nuance-hosted case, CCSS allows resources to be accessed via Mix URN.

Speech and NLU resources must be deployed locally in this case. To deploy the Mix-built DLMs and other resources locally:

  • Export the relevant app configurations containing the resources from Mix. One app configuration has to include the relevant Dialog model and its associated NLU model.

  • For each app configuration, extract the resource files from the export package.

  • Deploy all the resource files in local CCSS storage.

See the Dialog installation documentation for deployment details.

Referencing speech resources in Dialog

Speech resources can be referenced in the Dialog service API in one of two ways:

  • Via an ExternalResourceReferences object
  • Via resources in a StreamInput object’s asr_control_v1 field contents

Here is a summary of the key details of these two methods. For full details, see the DLGaaS API documentation.

Calling resources via ExternalResourceReferences object

The DLGaaS API allows you to pass references to existing ASR and NLU resources at runtime using the session variable ExternalResourceReferences. The variable can be passed to Dialog at runtime via a StartRequest or UpdateRequest payload, or via a data access node transfer.

A number of different resource types can be included here:

  • NLU and ASR compiled wordsets, both app-level and user-level
  • DLMs
  • ASR settings
  • Speaker profile

Once these resources are referenced via ExternalResourceReferences, Dialog uses them for the rest of the Dialog session whenever it orchestrates with NLU and ASR.
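A minimal sketch of assembling such a session variable might look like the following. Only the names ExternalResourceReferences and weight_value come from this document; the keys "dlms", "wordsets", and "uri", the helper build_session_variables, and the example URNs are hypothetical placeholders, so consult the DLGaaS API documentation for the actual schema.

```python
import json
from typing import List, Optional


def build_session_variables(
    dlm_urns: List[str],
    wordset_urns: List[str],
    dlm_weight: Optional[float] = None,
) -> dict:
    """Assemble a session-variable payload carrying resource references."""
    dlms = []
    for urn in dlm_urns:
        entry = {"uri": urn}  # "uri" is a hypothetical key name
        if dlm_weight is not None:
            entry["weight_value"] = dlm_weight  # field named in this document
        dlms.append(entry)
    return {
        "ExternalResourceReferences": {
            "dlms": dlms,  # hypothetical key name
            "wordsets": [{"uri": urn} for urn in wordset_urns],  # hypothetical key name
        }
    }


# Placeholder URNs, for illustration only.
variables = build_session_variables(
    ["urn:example:dlm"], ["urn:example:wordset"], dlm_weight=0.5
)
print(json.dumps(variables, indent=2))
```

The resulting dictionary would be supplied once, for example alongside a StartRequest, rather than rebuilt on every turn.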

Calling ASR resources via StreamInput.asr_control_v1 resources

Alternatively, speech resources can be referenced on each dialog turn. On each turn when speech input is collected from the user, the input is streamed to Dialog as StreamInput objects in a series of ExecuteStream calls. The first call of each turn configures speech audio and recognition settings, while subsequent calls send the audio bytes. The asr_control_v1 field of StreamInput lets you set various ASR parameters, including multiple ASR RecognitionResource references. These are configured in the same way as for ASR and Krypton applications.
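The per-turn message ordering described above (settings first, audio afterwards) can be sketched with a Python generator. Plain dictionaries stand in for StreamInput messages here; the "audio" key, the "resources" key inside the control settings, the chunk size, and the function stream_inputs are assumptions made for illustration, not the generated gRPC classes.

```python
from typing import Iterator


def stream_inputs(asr_control: dict, audio: bytes, chunk_size: int = 3200) -> Iterator[dict]:
    """Yield one configuration message, then the audio in fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # First message of the turn: recognition settings and resource references.
    yield {"asr_control_v1": asr_control}
    # Subsequent messages: raw audio bytes only.
    for start in range(0, len(audio), chunk_size):
        yield {"audio": audio[start:start + chunk_size]}


# Hypothetical settings and silent placeholder audio, for illustration only.
control = {"resources": [{"uri": "urn:example:dlm"}]}
messages = list(stream_inputs(control, b"\x00" * 8000, chunk_size=3200))
print(len(messages))
```

A generator fits this pattern because a streaming call typically consumes an iterator of request messages, so audio can be sent as it is captured.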

Limitations on number of each type of resource

Limitations on speech resources are the same as for ASRaaS and Krypton. You can use up to five DLMs and five wordsets.
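A client could check these limits before sending a request. The sketch below assumes each resource is described by a dictionary with a "type" key whose value is "dlm" or "wordset"; that representation and the function check_resource_limits are hypothetical and exist only for this example.

```python
from collections import Counter
from typing import Dict, List

# Limits stated in this section: up to five DLMs and five wordsets.
MAX_PER_TYPE = {"dlm": 5, "wordset": 5}


def check_resource_limits(resources: List[Dict[str, str]]) -> None:
    """Raise ValueError if any resource type exceeds its limit."""
    counts = Counter(resource["type"] for resource in resources)
    for resource_type, limit in MAX_PER_TYPE.items():
        if counts[resource_type] > limit:
            raise ValueError(
                f"Too many {resource_type} resources: {counts[resource_type]} > {limit}"
            )


# Five of each type is accepted without error.
check_resource_limits([{"type": "dlm"}] * 5 + [{"type": "wordset"}] * 5)
```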

Weighting resources

Weightings for resources can be set in different ways.

Weighting main DLM in Mix.dialog front-end

A default weight for the application's main DLM (the DLM built from the same project as the dialog and NLU models) can be set in the Mix.dialog front-end. See Configure the weight of the ASR domain language model in the Mix.dialog documentation for more details.

Weighting resources in ExternalResourceReferences object

The ExternalResourceReferences format allows for a weight_value to be set for referenced DLMs.

Weighting resources in StreamInput.asr_control_v1 resources

In this case, the same ResourceReference object is used as with ASR. As in the ASRaaS and Krypton case, the object includes fields to set weights for weightable resources with either a float value or an enumerated setting level.
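The "float value or enumerated level" choice can be modeled as two optional fields of which at most one is set. This is a sketch only: the class WeightedResource, the WeightLevel members, the field name weight_level, and the 0.0 to 1.0 range are assumptions for illustration, and the actual field names and allowed values are defined in the ASRaaS and Krypton API documentation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WeightLevel(Enum):
    """Hypothetical enumerated weight levels."""
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


@dataclass
class WeightedResource:
    """Hypothetical stand-in for a weightable resource reference."""
    uri: str
    weight_value: Optional[float] = None        # numeric weight
    weight_level: Optional[WeightLevel] = None  # enumerated weight

    def __post_init__(self) -> None:
        # Only one way of expressing the weight may be used at a time.
        if self.weight_value is not None and self.weight_level is not None:
            raise ValueError("Set weight_value or weight_level, not both")
        # ASSUMPTION: numeric weights fall in the range 0.0 to 1.0.
        if self.weight_value is not None and not 0.0 <= self.weight_value <= 1.0:
            raise ValueError("weight_value must be between 0.0 and 1.0")


by_value = WeightedResource("urn:example:dlm", weight_value=0.7)
by_level = WeightedResource("urn:example:dlm", weight_level=WeightLevel.HIGH)
print(by_value, by_level)
```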

Resource reuse

When resources are referenced in an ExternalResourceReferences object, they are reused for all remaining speech-based conversation turns within the Dialog session.