Using Mix resources in ASRaaS/Krypton applications

This section describes how to use Mix speech resources, once built, in ASRaaS (Mix-hosted) or Krypton (self-hosted) applications. This involves:

Deploying resources
Referencing and using resources, including applying weights as needed

This section is a high-level overview only. For full details, see the ASRaaS or Krypton documentation.

Deploying resources

Resources must be deployed before they can be used at runtime.

Nuance-hosted ASRaaS

For Nuance-hosted ASRaaS, the relevant Nuance datapacks are already installed on the platform. DLMs built and deployed in app configurations in Mix can be accessed at runtime by Mix URN. For details of the URN format, see the Mix URN documentation. No separate deployment process is necessary, since Mix.nlu and ASRaaS are both hosted by Nuance as part of the same Mix platform.

Self-hosted Krypton

For self-hosted Krypton, deploy the Mix-built DLMs locally as follows:

Export the relevant app configurations containing the DLMs from Mix.
Extract DLM zip files from the export package.
Place the DLM zip files on a local web server as described in the Krypton installation documentation.
As needed, compile wordsets using NQAS and deploy them on a local web server as described in the Krypton installation documentation.

In this case, you also need to install the Nuance datapacks locally.

For full deployment details, see the Krypton installation documentation.
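The extraction step can be scripted. The following is a minimal sketch only, assuming the export package is a zip archive in which each DLM appears as a nested .zip entry. The function name, the paths, and that package layout are illustrative assumptions, not part of the Krypton tooling, so check the layout of your own export package before relying on it.

```python
import zipfile
from pathlib import Path


def extract_dlm_archives(export_package: Path, web_root: Path) -> list[Path]:
    """Copy nested .zip entries from a Mix export package into web_root.

    Assumption: the export package is itself a zip archive and every nested
    .zip entry in it is a DLM. Adjust the filter if your package differs.
    """
    web_root.mkdir(parents=True, exist_ok=True)
    copied = []
    with zipfile.ZipFile(export_package) as package:
        for entry in package.namelist():
            if entry.lower().endswith(".zip"):
                # Flatten the path so the file sits directly under web_root.
                target = web_root / Path(entry).name
                target.write_bytes(package.read(entry))
                copied.append(target)
    return copied
```

Here web_root stands for the directory served by the local web server. The DLM zip files are copied as-is and are not unpacked further.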

Compiling wordsets

In ASRaaS, wordsets are compiled via the ASRaaS Training API.

In self-hosted Krypton, wordsets can be compiled using the NQAS utility. For more information on NQAS, see the Krypton documentation set.

When compiling a wordset, whether via the ASRaaS Training API or NQAS, a companion DLM associated with the relevant entities must be referenced.

When using the ASRaaS Training API, a target Mix URN for the resulting compiled wordset must also be included. The resulting compiled wordset can be referenced at runtime by this target URN as a RecognitionResource external resource.

For NQAS, the resulting compiled wordset must be deployed on a web server with other speech resources and referenced in a RecognitionResource external resource by server URL.
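The inputs required by each compilation route can be summarized in a small check. This is a sketch under stated assumptions: WordsetCompileJob, missing_inputs, and the field names are hypothetical and do not correspond to the ASRaaS Training API or to NQAS options. The sketch only encodes the two rules stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WordsetCompileJob:
    """Inputs for one wordset compilation (hypothetical structure)."""
    wordset_json: str                 # source wordset as a JSON string
    companion_dlm: Optional[str]      # URN or URL of the companion DLM
    target_urn: Optional[str] = None  # target Mix URN for the compiled wordset


def missing_inputs(job: WordsetCompileJob, tool: str) -> list[str]:
    """Return the names of required inputs that are missing for the given tool.

    tool is "asraas" (Training API) or "nqas" (self-hosted utility).
    """
    if tool not in ("asraas", "nqas"):
        raise ValueError(f"unknown tool: {tool}")
    missing = []
    if not job.companion_dlm:
        missing.append("companion_dlm")  # required by both routes
    if tool == "asraas" and not job.target_urn:
        missing.append("target_urn")     # required by the Training API only
    return missing
```

A job with a companion DLM but no target URN passes the check for NQAS and fails it for the ASRaaS Training API.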

Recognizer API and referencing resources

In an ASRaaS or Krypton client application, speech recognition is performed using the Recognizer API. Each round of recognition is handled as a series of streaming recognition requests.

An initial request sends parameters, recognition resources, and optionally a user_id for the specific user.

Subsequent requests send the streaming audio.

The initial request references the resources to use in recognition:

Builtins: By name
DLMs: External reference to file by URN or URL
Wordsets
- Inline wordsets: Inline as JSON string
- Compiled wordsets: External reference to file by URN or URL
Speaker profile: External reference

For more details on how to reference these resources, see the ASRaaS or Krypton documentation.

For speaker profiles and user-level compiled wordsets, you need to include a user_id.
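The request sequence described above can be sketched with plain Python dictionaries. This is an illustration of the ordering only, not the Recognizer API itself: the dictionary keys, the "kind" labels, the chunk size, and the placeholder references are all assumptions made for the example. A real client would build the message types defined by the ASRaaS or Krypton API.

```python
from typing import Iterator, Optional

# Resource kinds that require a user_id (labels are hypothetical).
USER_SCOPED_KINDS = {"speaker_profile", "user_compiled_wordset"}


def recognition_requests(
    audio: bytes,
    resources: list[dict],
    user_id: Optional[str] = None,
    chunk_size: int = 3200,
) -> Iterator[dict]:
    """Yield one initial request, then the audio in fixed-size chunks."""
    needs_user = any(r["kind"] in USER_SCOPED_KINDS for r in resources)
    if needs_user and user_id is None:
        raise ValueError("user_id is required for user-scoped resources")

    # Initial request: parameters, resources, and optionally the user_id.
    init = {"parameters": {}, "resources": resources}
    if user_id is not None:
        init["user_id"] = user_id
    yield {"init": init}

    # Subsequent requests: streaming audio only.
    for start in range(0, len(audio), chunk_size):
        yield {"audio": audio[start:start + chunk_size]}


# Placeholder references; real names, URNs, and URLs depend on your deployment.
example_resources = [
    {"kind": "builtin", "name": "<builtin-name>"},
    {"kind": "dlm", "uri": "<dlm-urn-or-url>"},
    {"kind": "inline_wordset", "json": "<wordset-json>"},
    {"kind": "compiled_wordset", "uri": "<wordset-urn-or-url>"},
    {"kind": "speaker_profile"},
]
```

Because example_resources includes a speaker profile, iterating over recognition_requests without a user_id raises an error before any request is produced.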

Limitations on resources

The following limits apply to use of DLMs and wordsets in ASRaaS and Krypton:

DLMs: Maximum of 5
Wordsets: Maximum of 5
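
A client can check these limits before sending the initial request. The sketch below is a hypothetical helper, not part of either API. It assumes each resource has been labeled "dlm", "wordset", or another kind, and that inline and compiled wordsets count toward the same limit, which this section does not state explicitly.

```python
from collections import Counter

# Limits as stated in this section.
MAX_PER_KIND = {"dlm": 5, "wordset": 5}


def check_resource_limits(kinds: list[str]) -> None:
    """Raise ValueError if a request lists too many DLMs or wordsets.

    kinds holds one label per resource, e.g. "dlm", "wordset", "builtin".
    """
    counts = Counter(kinds)
    for kind, limit in MAX_PER_KIND.items():
        if counts[kind] > limit:
            raise ValueError(
                f"{counts[kind]} {kind} resources exceed the limit of {limit}"
            )
```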

Resource weighting

On each recognition turn using the API, each RecognitionResource can be weighted to tune the relative importance of the resource in the recognition. Resources can be weighted with either a float value or one of a set of discrete weight levels.

Weights can be set for DLMs, builtins, and wordsets. Weights are not set for speaker profiles.

If weights are not specified, certain minimum values are applied.

The remaining weight is assigned to the base language model in the data pack, subject to a default minimum weight for the base language model. If the combined weight of the other resources does not leave this minimum, the other weights are scaled down accordingly.
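
The arithmetic behind this rule can be illustrated as follows. This is a sketch under assumptions, not the recognizer's actual algorithm: the minimum base weight of 0.1 is a placeholder value, weights are treated as fractions of a total of 1.0, and proportional scaling is only one possible reading of "scaled down accordingly". See the Resource weights reference topic for the actual values and behavior.

```python
def allocate_weights(
    resource_weights: dict[str, float],
    min_base_weight: float = 0.1,  # placeholder value, not taken from the API
) -> dict[str, float]:
    """Return effective weights, including the base language model's share."""
    total = sum(resource_weights.values())
    budget = 1.0 - min_base_weight  # the most the other resources may use

    # Scale proportionally only when the resources would crowd out the base LM.
    scale = budget / total if total > budget else 1.0
    effective = {name: w * scale for name, w in resource_weights.items()}

    # Whatever is left goes to the base language model.
    effective["base_lm"] = 1.0 - sum(effective.values())
    return effective
```

Under these assumptions, weights of 0.3 for a DLM and 0.2 for a wordset leave about 0.5 for the base language model. Weights of 0.8 and 0.4 exceed the available 0.9, so they are scaled to about 0.6 and 0.3, and the base language model keeps the assumed minimum of about 0.1.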

Weighting can also be tuned in RecognitionParameters.

For more details on weighting, see the Resource weights reference topic in the ASRaaS API documentation.