VoiceXML application structure

The VoiceXML specification provides ways to request and control prompts and speech recognition:

  • Define the flow of a dialog (within a <form> element).
  • Specify text to be spoken to the caller (using the <prompt> element).
  • Specify needed speech grammars (using the <grammar> element).
  • Configure recognition processing. For each recognition event, determine which grammars or models to use, set timers, and define the bargein state (using the <property> element).
  • Request recognition of the collected speech (using the <field> element).
  • Receive recognition results (in the VoiceXML variable application.lastresult$) and process them for further action. See Getting recognition results.
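
The elements listed above typically fit together as in the following minimal sketch. It uses only standard VoiceXML 2.1 elements. The form and field names, the grammar file name (yesno.grxml), the property values, and the prompt wording are hypothetical placeholders, not taken from a Nuance sample.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- <form> defines the flow of the dialog. -->
  <form id="confirm_order">
    <!-- <property> configures recognition: no-input timer and bargein. -->
    <property name="timeout" value="5s"/>
    <property name="bargein" value="true"/>

    <!-- <field> requests recognition of the caller's speech. -->
    <field name="answer">
      <!-- <grammar> names the speech grammar (hypothetical file name). -->
      <grammar src="yesno.grxml" type="application/srgs+xml"/>
      <!-- <prompt> specifies the text spoken to the caller. -->
      <prompt>Would you like to place the order?</prompt>

      <filled>
        <!-- The latest result is also exposed in application.lastresult$. -->
        <log>
          Heard: <value expr="application.lastresult$[0].utterance"/>
          Confidence: <value expr="application.lastresult$[0].confidence"/>
        </log>
        <exit/>
      </filled>
    </field>
  </form>
</vxml>
```

Here the field variable (answer) holds the interpreted value, while application.lastresult$ carries the raw utterance and confidence for the same recognition.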

Example: directed dialog using Nuance Recognizer

The following simplified example illustrates a common way in which VoiceXML implements these actions. The example provides the current weather for a requested city and state using Nuance Recognizer.

  1. The application sets the grammar to be used for this session (cityandstate.grxml).
  2. The application begins with an announcement and an advertisement for the service. The caller is prohibited from interrupting the ad.
  3. Following the welcome ad, the application asks for the city and state for which the caller wants to know the weather. If the caller does not respond within 5 seconds, the prompt is repeated twice.
  4. If there is still no answer, the application asks again, more specifically. If the caller provides the state, the application asks for the city. Note that the request for the city name includes the state that the application understood from the preceding question, as a way of confirming it.
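
The steps above might be sketched as follows, loosely modeled on the mixed-initiative weather example in the W3C VoiceXML 2.0 specification. This is an illustration under assumptions, not the Nuance sample itself: the grammar is assumed to return slots named city and state, the prompt wording and the submit URL (/weather) are hypothetical, and "repeated twice" is read as two reprompts before the application falls back to the directed questions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather_info">
    <!-- Step 1: form-level grammar, active for every item in this form. -->
    <grammar src="cityandstate.grxml" type="application/srgs+xml"/>
    <!-- Wait 5 seconds for speech before throwing a noinput event. -->
    <property name="timeout" value="5s"/>

    <!-- Step 2: announcement and ad; bargein="false" blocks interruption. -->
    <block>
      <prompt bargein="false">
        Welcome to the weather information service.
        This forecast is brought to you by our sponsor.
      </prompt>
    </block>

    <!-- Step 3: open question; repeat the prompt on silence. -->
    <initial name="start">
      <prompt>For what city and state would you like the weather?</prompt>
      <noinput count="1"><reprompt/></noinput>
      <noinput count="2"><reprompt/></noinput>
      <!-- Third silence: mark <initial> as done and move to the fields. -->
      <noinput count="3">
        <assign name="start" expr="true"/>
        <reprompt/>
      </noinput>
    </initial>

    <!-- Step 4: directed questions for whatever is still missing. -->
    <field name="state">
      <prompt>What state?</prompt>
    </field>
    <field name="city">
      <!-- Echoing the recognized state lets the caller confirm it. -->
      <prompt>
        Please say the city in <value expr="state"/>
        for which you want the weather.
      </prompt>
    </field>

    <!-- Runs once both fields are filled (hypothetical target URL). -->
    <filled>
      <submit next="/weather" namelist="city state"/>
    </filled>
  </form>
</vxml>
```

Field-level grammars are omitted for brevity. The sketch relies on the form-level grammar to fill both fields, whether the caller answers the open question or one of the directed ones.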

Example: raw recognition with Krypton-only

Note: Nuance Recognizer and Dragon Voice applications require different artifacts. (They do not share artifacts.) To create Dragon Voice artifacts, contact Nuance to get access to Nuance Command Line Interface, Nuance Experience Studio, or Nuance Mix Tools.

Note: The content in this topic is for Dragon Voice in on-premise deployments.

This rudimentary example begins with a prompt, collects the information provided by the caller, and ends.

  1. In preparation for recognizing collected speech, the VoiceXML document loads a domain language model (DLM) and two wordsets (which expand the vocabulary of the DLM) into the Krypton engine. For details, see Triggering the Dragon Voice recognizer.
  2. A greeting collects input from the caller. The prompt is open-ended: "This is a test. Please Speak."
  3. If the caller says anything, the VoiceXML document disconnects (exits).
  4. If the caller says nothing, or if the speech is not recognized, the VoiceXML document repeats the prompt.
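
The control flow in steps 2 through 4 could look like the sketch below, which uses standard VoiceXML elements only. The markup that loads the DLM and wordsets into Krypton (step 1) is deliberately left as a comment, because its syntax is Nuance-specific and is not reproduced here. Without it the field has no active recognizer input, so the page is an outline rather than a complete application. The form and field names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="raw_recognition">
    <field name="utterance">
      <!-- Step 1 (not shown): the markup that loads the DLM and the two
           wordsets into Krypton belongs here. See Triggering the Dragon
           Voice recognizer for the actual syntax. -->

      <!-- Step 2: open-ended greeting. -->
      <prompt>This is a test. Please Speak.</prompt>

      <!-- Step 4: silence or unrecognized speech repeats the prompt. -->
      <noinput><reprompt/></noinput>
      <nomatch><reprompt/></nomatch>

      <!-- Step 3: any recognized speech ends the session. -->
      <filled>
        <log>Raw result: <value expr="application.lastresult$[0].utterance"/></log>
        <exit/>
      </filled>
    </field>
  </form>
</vxml>
```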

Example: open-dialog using Dragon Voice

Note: Nuance Recognizer and Dragon Voice applications require different artifacts. (They do not share artifacts.) To create Dragon Voice artifacts, contact Nuance to get access to Nuance Command Line Interface, Nuance Experience Studio, or Nuance Mix Tools.

This example begins with an open-ended prompt, collects the information provided by the caller, and uses a directed dialog to prompt for each of the remaining information slots. The scenario for this VoiceXML page is a banking application where callers transition to this page after indicating the desire to make a payment. The page collects the processing information: amount, date, payee, and account. The page can collect multiple slots per dialog turn, and the caller can change slot values at any time.

  1. The VoiceXML document loads the models and dynamic content (in this case, two wordsets) into the Dragon Voice engines in preparation for recognizing and interpreting the collected speech. For details, see Triggering the Dragon Voice recognizer.
  2. A greeting collects input from the caller. The prompt is open-ended: "Thank you for calling, how can I help you?"
  3. If the caller provides all of the information needed to satisfy the dialog (all fields filled), confirmation follows.

    For example, to the initial prompt a caller might say "Pay thirty dollars to my Visa from checking on February first 2018." This utterance fills all slots (provides all entity values). Confirmation follows: "Thanks, we'll pay Visa thirty dollars from account 123412341234 on February first, two thousand eighteen."

    If the caller does not provide all of the information needed, he or she is prompted to provide the missing information. If the utterance fills two of the slots—the caller says, for example, "Pay fifty dollars to Visa" (thus filling the AMOUNT and PAYEE slots)—he or she is prompted to provide the remaining pieces of information (the date as per the DATE field, and the account from which to make the payment as per the FROM_ACCOUNT field).

    Similarly, if the caller says simply "Pay Visa", he or she will be prompted for the amount, date, and account.

  4. When collection is complete, the VoiceXML document confirms the information collected (AMOUNT, PAYEE, DATE, and FROM_ACCOUNT slots) and disconnects (exits).
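
The slot-filling flow described above maps naturally onto a standard VoiceXML mixed-initiative form, sketched below. The loading of the Dragon Voice models and wordsets (step 1) is left as a comment because its syntax is Nuance-specific and is not reproduced here. The sketch assumes the interpretation results fill fields named after the slots (AMOUNT, PAYEE, DATE, FROM_ACCOUNT). The form name and the directed prompt wording are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="make_payment">
    <!-- Step 1 (not shown): form-level loading of the Dragon Voice models
         and wordsets. See Triggering the Dragon Voice recognizer. -->

    <!-- Step 2: open-ended greeting; one utterance may fill several slots. -->
    <initial name="greeting">
      <prompt>Thank you for calling, how can I help you?</prompt>
    </initial>

    <!-- Step 3: a field is visited only while its slot is still empty. -->
    <field name="AMOUNT">
      <prompt>How much would you like to pay?</prompt>
    </field>
    <field name="PAYEE">
      <prompt>Who would you like to pay?</prompt>
    </field>
    <field name="DATE">
      <prompt>On what date should the payment be made?</prompt>
    </field>
    <field name="FROM_ACCOUNT">
      <prompt>Which account should the payment come from?</prompt>
    </field>

    <!-- Step 4: runs once all four slots are filled, then exits. -->
    <filled mode="all" namelist="AMOUNT PAYEE DATE FROM_ACCOUNT">
      <prompt>
        Thanks, we'll pay <value expr="PAYEE"/> <value expr="AMOUNT"/>
        from account <value expr="FROM_ACCOUNT"/>
        on <value expr="DATE"/>.
      </prompt>
      <exit/>
    </filled>
  </form>
</vxml>
```

Because the form interpretation algorithm skips any field whose variable is already set, an utterance such as "Pay fifty dollars to Visa" leaves only the DATE and FROM_ACCOUNT prompts to be played. Letting the caller change a slot value that is already filled would need additional handling that this sketch omits.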