Triggering the Dragon Voice recognizer

Note: The content in this topic is for Dragon Voice in on-premise deployments.

To trigger Dragon Voice recognition, provide a URL with key/value pairs that control the details of the load:

<grammar src="http://base_path/filename?key=value"></grammar>

Syntax:

<grammar> is an element in your VoiceXML application. To understand how <grammar> fits into the workflow, see VoiceXML application structure and Dragon Voice recognition flow.

base_path points to the central repository where you store the artifacts (manifest file, models, and wordsets).

filename is optional. It specifies a DLM or wordset.

key=value pairs load artifacts needed for recognition. Use a question mark (?) to specify more than one pair.
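
For example, this URL (the server name, filename, and weight are illustrative only) loads a DLM for the next recognition event and assigns it a weight:

<grammar src="http://myserver/grammars/names.dlm?nlptype=krypton?dlm_weight=0.4"></grammar>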

The key=value pairs are:
nlptype=config

Trigger for Dragon Voice recognition.

Loads artifacts as defined in the manifest for the duration of the session.

nlptype=krypton

Trigger for Dragon Voice recognition.

Loads artifacts using their fully-qualified paths for the next recognition event.

nlptype=nle

Loads a semantic model for extracting meaning from text.

Required for semantic interpretation. You can load only one semantic model per session, and it cannot be changed for the remainder of the session.

Not allowed for Krypton-only recognition.

nlptype=wordset

Optional. Loads a wordset containing new vocabulary into the models.

You can load more than one wordset during a session. See Using wordsets.

dlm_weight=value

Assigns a weight or importance to a domain language model. See Understanding weight.

Required when loading a DLM for recognition scope (via nlptype=krypton).

Optional when loading a DLM for session scope (via nlptype=config).

builtin_NAME_weight=value

Optional. Specifies one or more builtins and assigns a weight to each. You can load only builtins that are supported by the base language model.
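
For example, this URL (the server name, filename, builtin name, and weights are illustrative only) loads a DLM for the next recognition event and also assigns a weight to a builtin supported by the base language model:

<grammar src="http://myserver/grammars/names.dlm?nlptype=krypton?dlm_weight=0.4?builtin_places_weight=0.2"></grammar>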

Dragon Voice artifacts

You create artifacts using Nuance tools that are provided separately from Speech Suite. The output from these tools is a set of artifacts that become inputs to Dragon Voice in Speech Suite.

Note: After generating a manifest and its artifacts, store them in the same base directory, and specify that directory as the base path of the <grammar> src attribute in your VoiceXML documents. If you generate more than one manifest and set of artifacts, store each set in a different base directory. (You cannot substitute or move files from one artifact set to another. For example, you cannot insert a DLM from one set into the file path of a different set.)
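
For example, a hypothetical base directory might hold one complete set of artifacts (all names except the manifest filename are illustrative):

http://myserver/grammars/
    nuance_package.json    (manifest; this filename is mandatory)
    names.dlm              (domain language model)
    contacts.json          (wordset)

Here, http://myserver/grammars is the base_path to reference in the <grammar> src attribute.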

The artifacts are:
Manifest file

Required. You must supply a manifest file to identify the project, artifacts, and resources that serve each recognition event. See Understanding the manifest file.

  • The manifest provides a default configuration for loading the artifacts. You can use the same manifest for every session.
  • You can override the configurations during the load. For example, you can assign a different DLM weight for each session.

Note: The <grammar> element points to the storage location but does not explicitly name the manifest file. The filename must be nuance_package.json. The NLP service automatically fetches the manifest and loads the engines. For an example, see Preloading artifacts for session scope below.

Semantic models

NLU (natural language understanding) and linguistic models for the NLE and NTpE components.

Nuance creates these models on your behalf, or you create them with the Nuance Command Line Interface, Nuance Experience Studio, or Nuance Mix Tools.

Not allowed for Krypton-only recognition.

Domain language models

Optional. DLMs provide specialized knowledge of a domain or application-specific content. These models add to the factory or base language model that Krypton loads on startup.

Wordsets

Optional vocabularies that inject dynamic content at runtime. For example, a list of contact names or payees. You create wordsets in JSON format.
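
As a sketch only (the entity name and values are hypothetical; see Using wordsets for the exact schema), a wordset maps an entity to new vocabulary entries, each with a written form and optional spoken forms:

{
    "PAYEE": [
        { "literal": "Amex", "spoken": ["amex", "american express"] },
        { "literal": "PG&E", "spoken": ["p g and e", "pacific gas and electric"] }
    ]
}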

Loading artifacts

You can load Dragon Voice artifacts in three scopes: service, session, and recognition. Service scope spans the period when the recognizer service is running, session scope typically spans the time a call is connected, and recognition scope spans the period during which the recognizer is processing input. Service is the highest scope and recognition is the lowest: a single service scope can contain multiple session scopes, and a single session scope can contain multiple recognition scopes.

While artifacts provide faster and more accurate recognition, loading them can be time-consuming. To reduce the latency that users experience, it is good practice to load artifacts at service or session scope rather than at recognition scope, where they are used; preloading significantly reduces latency, especially when the artifacts are large and used in multiple recognition events. The disadvantage of preloading is that the artifacts occupy resources before they are needed.

In some situations, you may prefer to load artifacts at recognition scope because you don't know which artifacts are needed until immediately before recognition. The downside is that users may experience latency during the call (while the artifacts load), and those artifacts are available for only one recognition turn.

Loading artifacts for service scope

Service scope begins when the recognition service starts up or restarts. Accordingly, if you load artifacts at service scope, they are available without delay to subsequent sessions and recognition turns. Loading at service scope is different from loading at session and recognition scope because it doesn't use a manifest.

Preloading artifacts for session scope

Session scope begins when a user is connected on a call. This is a good time to preload artifacts that are likely to be used during the call. Use the nlptype=config keyword pair to load and activate artifacts for the remainder of the session. You can do this while a welcome prompt plays, so that the user does not notice the loading time, as in the sketch below.
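
For example, a minimal VoiceXML sketch (the server name is illustrative) that loads the artifacts defined in the manifest while the caller hears a welcome prompt:

<field name="greeting">
    <prompt>Welcome. How can I help you today?</prompt>
    <grammar src="http://myserver/grammars?nlptype=config"></grammar>
</field>

Because src points at the base path rather than a file, the NLP service fetches nuance_package.json from that location and loads the artifacts it lists for the remainder of the session.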

Loading artifacts for recognition scope

Use the nlptype=krypton and nlptype=wordset keyword pairs to load and activate artifacts for the next recognition event. This allows your application to respond flexibly to momentary needs. For example, you could load a DLM or wordset in the middle of a call to improve accuracy based on the context of the conversation.
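
The sketch below (the server name, filenames, and weight are illustrative only) loads a DLM and a wordset for the next recognition event only:

<grammar src="http://myserver/grammars/names.dlm?nlptype=krypton?dlm_weight=0.4"></grammar>
<grammar src="http://myserver/grammars/contacts.json?nlptype=wordset"></grammar>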