Dialogs

Within a document, the caller interacts with dialogs in which the application plays audio output, typically asking for information. The caller provides input by speaking or by pressing keys on the telephone keypad (DTMF). Caller speech must be recognized and its meaning interpreted; DTMF input arrives as a sequence of tones, each identifying a key.

VoiceXML offers two kinds of dialogs: forms and menus.

A form interacts with the user to obtain information. It may fill in a number of fields or carry out an action.
A menu presents the user with a number of choices.

Forms

Forms serve as the backbone of a VoiceXML document. Each form can contain its own error handling, help, prompts, grammars, variables, subdialogs, and scripts. Each form is named by its id attribute, which must be unique within the document. A form can also use its scope attribute to set the scope of the grammars declared in it to either dialog or document.

A form can contain several different form items. Form items can be input items that collect information from the caller, or control items that perform a task.

The input item elements available for forms in NVP are:

<field>—Fills a variable with the interpretation of a caller’s reply.
<record>—Records the caller’s speech for later playback.
<subdialog>—Calls another form to act as a subroutine and stores the result.
<transfer>—Transfers the caller to another line.
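As a rough illustration of two of these input items, the fragment below records a short message and then calls a subdialog. The document GetAddress.vxml, its Address form, the returned street value, and the Street variable are hypothetical names; the attributes shown are standard VoiceXML 2.0, but check NVP's element reference for the values it supports.

```xml
<form id="LeaveMessage">
  <var name="Street"/>
  <!-- <record>: the recorded audio is stored in the form item variable "msg" -->
  <record name="msg" beep="true" maxtime="30s" finalsilence="3s" type="audio/x-wav">
    <prompt>After the beep, please leave your message.</prompt>
    <filled>
      <!-- Play the recording back to the caller -->
      <prompt>You said <audio expr="msg"/></prompt>
    </filled>
  </record>
  <!-- <subdialog>: runs the hypothetical Address form like a subroutine.
       Values it sends back with <return namelist="street"/> become
       properties of the form item variable "addr". -->
  <subdialog name="addr" src="GetAddress.vxml#Address">
    <filled>
      <assign name="Street" expr="addr.street"/>
    </filled>
  </subdialog>
</form>
```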

The control items are:

<block>—Carries out instructions whenever the form is entered.
<initial>—Controls the initial, form-wide prompting in a mixed-initiative form, before the caller is asked for individual fields.
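A sketch of both control items in one mixed-initiative form follows. It assumes a hypothetical form-level rule, Pizza_Order, that can fill both size and crust from a single reply such as "a large thin crust", and a hypothetical Pizza_Crust rule; only Pizza_Size comes from the PizzaTalk sample.

```xml
<form id="OrderPizza">
  <!-- Form-level grammar (hypothetical rule) that can fill several fields at once -->
  <grammar src="../grammars/pizza.grxml#Pizza_Order"/>
  <!-- <block>: its form item variable is set to true when it runs,
       so it is not repeated while the caller stays in the form -->
  <block>
    <prompt>Welcome to PizzaTalk.</prompt>
  </block>
  <!-- <initial>: asks an open question for the form-level grammar; once a
       recognition fills any field, the interpreter stops visiting it -->
  <initial name="start">
    <prompt>What kind of pizza would you like?</prompt>
  </initial>
  <!-- Directed fields: visited only for values the caller has not yet supplied -->
  <field name="size">
    <prompt>What size?</prompt>
    <grammar src="../grammars/pizza.grxml#Pizza_Size"/>
  </field>
  <field name="crust">
    <prompt>Thin or thick crust?</prompt>
    <grammar src="../grammars/pizza.grxml#Pizza_Crust"/>
  </field>
</form>
```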

Each form item in a form has an associated form item variable whose value is reset to undefined each time the form is entered. This variable has the same name as the item, or an internally generated name if no name is specified.

When the voice browser service loads a VoiceXML document, it begins execution at the first dialog in the document by default, unless another dialog was specified when the document was invoked. When that form completes, execution ends unless control has been passed to another dialog or document.
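A minimal sketch of this behavior, assuming a hypothetical document named Order.vxml: execution starts in the first form, which passes control to the second form with a <goto>. Invoking the document as Order.vxml#GetPizzaSize would start directly in the second form.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- First dialog in the document: execution starts here by default -->
  <form id="Welcome">
    <block>
      <prompt>Welcome to PizzaTalk.</prompt>
      <!-- Explicit transition; without it, the session would end
           when this form has no more items to visit -->
      <goto next="#GetPizzaSize"/>
    </block>
  </form>
  <!-- Reached through the goto above, or by invoking Order.vxml#GetPizzaSize -->
  <form id="GetPizzaSize">
    <field name="size">
      <prompt>What size pizza would you like?</prompt>
      <grammar src="../grammars/pizza.grxml#Pizza_Size"/>
    </field>
  </form>
</vxml>
```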

Form item attributes

Each type of form item has its own specific attributes, but all of them share three common attributes:

name—A unique identifier for the item and the associated form item variable.
expr—An initial value assigned to the form item variable (optional).
cond—An additional guard condition that determines whether the item is used or skipped (optional).
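The fragment below is a small sketch of all three attributes. The variable isMember and the form, block, and field names are hypothetical.

```xml
<form id="GetAccount">
  <!-- Hypothetical flag; a real application might set it in an earlier dialog -->
  <var name="isMember" expr="false"/>
  <!-- expr: the form item variable starts out defined, so this block is skipped -->
  <block name="intro" expr="true">
    <prompt>This prompt is never played.</prompt>
  </block>
  <!-- cond: this field is visited only while isMember evaluates to true -->
  <field name="memberId" type="digits" cond="isMember">
    <prompt>Please say your member number.</prompt>
  </field>
  <!-- name: the caller's answer is stored in the form item variable "zip" -->
  <field name="zip" type="digits">
    <prompt>Please say your ZIP code.</prompt>
  </field>
</form>
```

With isMember set to false, the only question the caller hears is the ZIP code prompt.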

Example of a form

As an example, consider the GetPizzaSize form in GetPizzaSize.vxml:

<form id="GetPizzaSize">
  <field name="size">
    <property name="nuance.grammarlabel" value="GetPizzaSize"/>
    <prompt cond="entry == 'init'">
      <audio expr="PromptPath + 'GetPizzaSize_init.wav'"/>
    </prompt>
    <prompt cond="entry == 'reentry'">
      <audio expr="PromptPath + 'GetPizzaSize_reentry.wav'"/>
    </prompt>
    <prompt cond="entry == 'rej'">
      <audio expr="PromptPath + 'GetPizzaSize_rej.wav'"/>
    </prompt>
    <prompt cond="entry == 'nst'">
      <audio expr="PromptPath + 'GetPizzaSize_nst.wav'"/>
    </prompt>
    <prompt cond="entry == 'help'">
      <audio expr="PromptPath + 'GetPizzaSize_help.wav'"/>
    </prompt>
    <grammar src="../grammars/pizza.grxml#Pizza_Size"/>
    <filled>
      <assign name="PizzaSize" expr="size"/>
      <goto next="GetPizzaToppings.vxml"/>
    </filled>
  </field>
</form>

This example illustrates several important points:

This form consists of a single input item, the “size” field.
The prompt played is determined by the value of the “entry” variable.
A single grammar rule interprets the caller’s response. Because the field has no type attribute, no built-in grammar applies, so the rule comes from an external grammar file (pizza.grxml).
When the size is determined, that value is assigned to the PizzaSize variable and the application proceeds to the GetPizzaToppings.vxml document.

This basic form structure is used throughout the PizzaTalk application. See the other VoiceXML documents in the directory for similar examples of forms.

Guard conditions

An item in a form is only used when its guard conditions are met. These are:

The form item variable must be null or undefined.

The condition specified in the cond attribute must be met.

If the form item variable has already been filled and not cleared, or if the condition is not met (that is, if it evaluates to Boolean false), then the item is skipped and processing moves on to the next item.
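The guard conditions also explain how a form can ask a question again: <clear> resets form item variables to undefined, which makes those items eligible once more. The sketch below reuses the Pizza_Size rule from the PizzaTalk sample; the confirm field is a hypothetical addition.

```xml
<form id="GetPizzaSize">
  <field name="size">
    <prompt>What size pizza would you like?</prompt>
    <grammar src="../grammars/pizza.grxml#Pizza_Size"/>
  </field>
  <!-- Items are selected in document order, so this field is reached
       once size has a value -->
  <field name="confirm" type="boolean">
    <prompt>You asked for a <value expr="size"/> pizza. Is that right?</prompt>
    <filled>
      <if cond="!confirm">
        <!-- Both variables become undefined again, so their guard
             conditions are met and the form asks for the size again -->
        <clear namelist="size confirm"/>
      </if>
    </filled>
  </field>
</form>
```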

Fields

In a typical voice application, most input items are <field> elements that are used to fill a specific variable. These fields are usually introduced with a <prompt>, and invoke a grammar to be used for recognizing the reply. This grammar may be a built-in that is invoked automatically when you assign the field type, or it may be an external grammar that you create separately.

The field variable has a dialog scope, so it can only be used within the form that contains it. To use a field variable value in other parts of the application, you can assign the value to one of your application global variables. This strategy is used in the PizzaTalk application, where the values obtained for different form variables (size, toppings) are assigned to global variables (PizzaSize, PizzaToppings, and so on).
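One way to set this up is sketched below, assuming the global variables are declared in an application root document named root.vxml (a hypothetical name; the PizzaTalk sample may organize this differently). First, the root document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- root.vxml (hypothetical): the application root document -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Variables declared here have application scope -->
  <var name="PizzaSize"/>
  <var name="PizzaToppings"/>
</vxml>
```

A leaf document then names root.vxml in its application attribute and copies the dialog-scoped field value into the application variable:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" application="root.vxml">
  <form id="GetPizzaSize">
    <field name="size">
      <prompt>What size pizza would you like?</prompt>
      <grammar src="../grammars/pizza.grxml#Pizza_Size"/>
      <filled>
        <!-- size is visible only in this form; PizzaSize outlives it -->
        <assign name="application.PizzaSize" expr="size"/>
        <goto next="GetPizzaToppings.vxml"/>
      </filled>
    </field>
  </form>
</vxml>
```

The value survives the transition only if GetPizzaToppings.vxml names the same root document, because leaf documents share application variables only through a common root.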

Filled elements

The <filled> element specifies an action to be taken when a value has been assigned to one or more variables. You can use it within a single input item (<field>, <record>, <subdialog>, or <transfer>), or within the form as a whole.

When used within an input item, the <filled> element carries out its content when that input item has been filled.

When used within a form, the <filled> element carries out its content when one or more of the form’s input items has been filled. Its mode attribute controls whether it runs once all of the relevant items are filled (mode="all", the default) or as soon as any one of them is (mode="any"), and its namelist attribute limits the relevant items to a specified set. By default, every input item in the form is relevant.

For example, when a form is filled you can use this element to play a prompt (“Got it!”) and then transition to the next document in your application.
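A sketch of that pattern follows. The crust field and its Pizza_Crust rule are hypothetical; the Pizza_Size rule and the GetPizzaToppings.vxml destination come from the PizzaTalk sample.

```xml
<form id="GetPizzaOrder">
  <field name="size">
    <prompt>What size pizza would you like?</prompt>
    <grammar src="../grammars/pizza.grxml#Pizza_Size"/>
  </field>
  <field name="crust">
    <prompt>Thin or thick crust?</prompt>
    <!-- Hypothetical rule name -->
    <grammar src="../grammars/pizza.grxml#Pizza_Crust"/>
  </field>
  <!-- Form-level <filled>: runs only when both listed items have values;
       mode="any" would run it as soon as either one is filled -->
  <filled mode="all" namelist="size crust">
    <prompt>Got it!</prompt>
    <goto next="GetPizzaToppings.vxml"/>
  </filled>
</form>
```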

Menus

The <menu> element defines a menu, and each choice in it is defined by a <choice> element. The “next” attribute of a <choice> element specifies the destination to which the interpreter should send the caller when that choice is selected. For example, the following menu consists of three choices:

<menu>
  <prompt>Please choose one of <enumerate/></prompt>
  <choice dtmf="1" next="#MovieForm">
    local movies
  </choice>
  <choice dtmf="2" next="localBroadcast.vxml#RadioForm">
    local radio stations
  </choice>
  <choice dtmf="3" next="http://www.nationTV.org/tv.vxml">
    national TV listings
  </choice>
</menu>

When the destination of a transition is a <form> or <menu> element, give the destination dialog a unique id attribute; the next attribute then refers to it with a fragment such as #MovieForm:

<form id="MovieForm">
  <field name="category">
    <prompt>What sort of movie would you like?</prompt>
...

The prompt in this menu includes <enumerate>. This element lets you set up a template for an automatically generated description of the choices.

By default, the <enumerate> template simply lists all the choices. In the example, the prompt is “Please choose one of local movies, local radio stations, national TV listings.” The destination dialog specified by the next attribute can be in the current document or in a different document:

If the caller says “local movies” or presses 1, the voice browser service transitions to the dialog named “MovieForm” in the same document.
If the caller says “local radio stations” or presses 2, the voice browser service transitions to the “RadioForm” dialog in the document localBroadcast.vxml.
If the caller says “national TV listings” or presses 3, the voice browser service transitions to the first dialog in tv.vxml on the national TV web site.
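Rather than the default list, <enumerate> can contain a template that is applied to each choice in turn. The sketch below uses the _prompt and _dtmf variables that VoiceXML 2.0 defines inside <enumerate>; it trims the menu above to two choices for brevity.

```xml
<menu>
  <!-- _prompt holds each choice's text and _dtmf its key -->
  <prompt>
    <enumerate>For <value expr="_prompt"/>, press <value expr="_dtmf"/>.</enumerate>
  </prompt>
  <choice dtmf="1" next="#MovieForm">local movies</choice>
  <choice dtmf="2" next="localBroadcast.vxml#RadioForm">local radio stations</choice>
</menu>
```

This prompt should be read as “For local movies, press 1. For local radio stations, press 2.”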

In this example, each choice can be activated by a specific phrase—an implicit in-line grammar, expressed simply as text within the <choice> element. However, you can use an external grammar to activate the choice instead.
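A sketch of a choice driven by an external grammar follows. The file menu.grxml and its Movies rule are hypothetical. When a <choice> contains a <grammar>, that grammar is used instead of the one generated from the choice text, while the text is still used by <enumerate>.

```xml
<menu>
  <prompt>Please choose one of <enumerate/></prompt>
  <choice dtmf="1" next="#MovieForm">
    <!-- Hypothetical rule accepting phrases such as "movies" or "cinema" -->
    <grammar type="application/srgs+xml" src="../grammars/menu.grxml#Movies"/>
    local movies
  </choice>
  <choice dtmf="2" next="localBroadcast.vxml#RadioForm">local radio stations</choice>
</menu>
```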

Grammars in a dialog

NVP uses grammars at many different points in a dialog. They are most often used within a <form>, a <field> in a form, or a menu <choice>, but they can also be used within a <link>, an <option> within a field, a <record>, or a <transfer>.
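Two of the less common cases are sketched below in one hypothetical form: a form-level <link> with an inline grammar, and <option> elements that give a field implicit phrases and DTMF keys. The MainMenu target and all names are hypothetical.

```xml
<form id="ChooseCrust">
  <!-- While this form is active, saying "main menu" or pressing 0
       jumps to the hypothetical MainMenu dialog -->
  <link next="#MainMenu" dtmf="0">
    <grammar type="application/srgs+xml" root="mainmenu" version="1.0">
      <rule id="mainmenu" scope="public">main menu</rule>
    </grammar>
  </link>
  <field name="crust">
    <prompt>Thin or thick crust?</prompt>
    <!-- Each option adds a phrase and a key; value is stored in "crust" -->
    <option dtmf="1" value="thin">thin crust</option>
    <option dtmf="2" value="thick">thick crust</option>
  </field>
</form>
```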

The elements of the XML form of SRGS (GrXML for short) are listed in VoiceXML elements.

Sources

Grammars have three basic sources:

A built-in grammar is installed automatically as part of NVP and covers common input such as numbers and dates. Built-ins can be invoked with the <grammar> element. For example, the NameSpeller sample application uses the alphanum_lc built-in grammar for US English:

<field name="fname">
  <grammar src="builtin:grammar/alphanum_lc"/>
  <prompt>Welcome to the name spelling application.
    Please spell your first name.</prompt>
  <noinput>Sorry, I didn't hear you say anything. Please
    spell your first name. For instance, if your name is
    peter, just say p e t e r</noinput>
  <nomatch>Sorry, I didn't understand what you said. Please
    spell your first name. For instance, if your name is
    peter, just say p e t e r</nomatch>
</field>

If the grammar needs to interpret one of the standard types defined in the VoiceXML specification, you can instead assign the type to the <field>:

<field name="departureTime" type="time">
  <prompt>What time of day would you like to depart?</prompt>
</field>

The type attribute works only for built-in types that NVP supports. The VoiceXML 2.0 specification defines boolean, date, digits, currency, number, phone, and time.
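The VoiceXML 2.0 specification also lets some built-in types take parameters in the type attribute. The sketch below assumes NVP accepts that parameter syntax for digits; the pin field is a hypothetical example.

```xml
<!-- Accepts exactly four digits, spoken or keyed -->
<field name="pin" type="digits?length=4">
  <prompt>Please say or key in your four digit PIN.</prompt>
  <filled>
    <prompt>You entered <value expr="pin"/>.</prompt>
  </filled>
</field>
```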

An inline grammar consists of a few words written directly in the VoiceXML file itself, and interpreted by the voice browser service at runtime. For example, the BlackJack sample includes a simple inline grammar directly in the BlackJack.vxml file.

<field name="request">
  <prompt>Would you like to hit or hold?</prompt>
  <grammar type="application/srgs+xml" root="root" version="1.0">
    <rule id="root" scope="public">
      <item>
        <one-of>
          <item>hit</item>
          <item>hold</item>
          <item>split</item>
          <item>double down</item>
        </one-of>
      </item>
    </rule>
  </grammar>
...

An inline grammar is often defined in a menu <choice> element.

An external grammar is created by the application developer as a separate file written in the XML form of SRGS (*.grxml). A reference to an external grammar often names a specific rule, appended to the file URI as a fragment. For example, the GetPizzaSize.vxml file refers to the Pizza_Size rule in the pizza.grxml grammar:
```xml
<form id="GetPizzaSize">
  <field name="size">
    <grammar src="../grammars/pizza.grxml#Pizza_Size"/>
...
```
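The contents of pizza.grxml are not shown in this section. As a hypothetical sketch only, a file exposing a Pizza_Size rule might look like the following, assuming the platform supports the W3C semantic interpretation tag format (semantics/1.0); the real PizzaTalk grammar may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch of pizza.grxml -->
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="Pizza_Size"
         tag-format="semantics/1.0">
  <!-- scope="public" allows references such as pizza.grxml#Pizza_Size -->
  <rule id="Pizza_Size" scope="public">
    <one-of>
      <!-- Each tag sets the value returned to the field -->
      <item>small <tag>out = "small";</tag></item>
      <item>medium <tag>out = "medium";</tag></item>
      <item>large <tag>out = "large";</tag></item>
    </one-of>
  </rule>
</grammar>
```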

Grammar scope

Like other components of a voice application, a grammar has a scope that determines where it can be used. This scope is determined by the level at which the grammar is declared. For example, a grammar declared within a form is active only while that form is executing, and a grammar declared within a field is active only while that field is collecting input.

However, the <grammar> element includes a scope attribute that lets you override the default scope. When this attribute is set to “document”, the grammar is active throughout the document. If the document is a root document, the grammar is active throughout the application.
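A sketch of a form-level grammar widened to document scope follows. The Pizza_Order rule is hypothetical. Under VoiceXML 2.0 rules, a match on such a grammar while another dialog in the document is active should bring the caller into this form.

```xml
<form id="GetPizzaSize">
  <!-- Form-level grammar (hypothetical rule): scope="document" keeps it
       active in every dialog of this document, not only in this form -->
  <grammar src="../grammars/pizza.grxml#Pizza_Order" scope="document"/>
  <field name="size">
    <prompt>What size pizza would you like?</prompt>
    <!-- Field-level grammar: active only while this field collects input -->
    <grammar src="../grammars/pizza.grxml#Pizza_Size"/>
  </field>
</form>
```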

Weight

More than one grammar may be active at a time. In cases where you have two or more active grammars available to interpret a response, you may want to guide recognition to favor some grammars over others. To do so, use the “weight” attribute to increase or decrease the probability that the grammar will be used. For an example of grammar weighting, see Links.

A weight can only be applied to a grammar declared using the <grammar> element. Implicit grammars, such as text within a <choice>, cannot be weighted.
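A minimal sketch of weighting two grammars in one field follows, with a hypothetical Pizza_Order rule alongside Pizza_Size. Weights are relative (1.0 is the default), and exactly how a weight shifts recognition is platform-dependent.

```xml
<field name="size">
  <prompt>What size pizza would you like?</prompt>
  <!-- Favored: the direct answer to this question -->
  <grammar src="../grammars/pizza.grxml#Pizza_Size" weight="2.0"/>
  <!-- Disfavored: a hypothetical broader rule for full orders -->
  <grammar src="../grammars/pizza.grxml#Pizza_Order" weight="0.5"/>
</field>
```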