Planning the dialog

Once you know how information flows in your application, you can use that flow to plan your dialogs.

The information flow provides a skeleton for the interactions. It gives you an idea of where you will have to ask the caller for information, what pieces of information you need at different times or for different tasks, and what information you may have to return to the caller.

Once you have an idea of the information you need to gather and the order in which you need to gather it, you can begin to design the flow of interactions between the voice application and the user, and pinpoint where you will need to use grammars to interpret a user response.

Directed dialogs vs. mixed-initiative dialogs

One of the key issues in dialog design is the choice between directed dialogs and mixed-initiative dialogs. The application may request information one piece at a time in a given order (directed dialog), or it can accept several pieces of information at once in any order and prompt for missing items as necessary (mixed-initiative dialog).

A directed dialog is simpler, tends to have lower out-of-grammar rates, and allows for different branches in the application where there are information dependencies. However, a directed dialog can be cumbersome when the user has to supply many different pieces of information.

A mixed-initiative dialog can accept many different pieces of information at once, which makes it very quick from the user’s point of view. However, it requires an extensive and complex grammar to cover all reasonable responses, and tends to have higher out-of-grammar rates than a corresponding directed dialog.

To illustrate the difference, consider our flight reservation example.

Directed dialog

A directed dialog might begin by presenting the caller with three options: “Would you like to make a new reservation, check an existing reservation, or cancel an existing reservation?” Once the caller makes a choice, the application continues with the next question: “What city are you leaving from?”, or “What is the reservation number?”

A directed dialog is simple and straightforward. The caller is prompted for each piece of information, often by stating the options available and asking the caller to make a selection. This approach generally has the advantage of keeping the interaction on track, since the caller has a good idea of what response is expected at each stage of the dialog. It can also lead to smaller (and therefore more efficient and accurate) grammars, since the options are clearly stated.
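As a rough illustration of this one-question-at-a-time pattern, the following Python sketch models a directed dialog as a small script of prompts. Everything in it is hypothetical (the `DIRECTED_STEPS` table, the short keywords `new`, `check`, and `cancel`, and the `run_directed` helper); the list of answers stands in for recognized caller utterances, and the sketch stops after the first follow-up question.

```python
# Hypothetical directed-dialog script: each step asks for exactly one item.
# The second element lists the accepted replies (None means any reply is
# accepted in this sketch, e.g. a city name or a reservation number).
DIRECTED_STEPS = {
    "task": (
        "Would you like to make a new reservation, check an existing "
        "reservation, or cancel an existing reservation?",
        {"new", "check", "cancel"},
    ),
    "origin": ("What city are you leaving from?", None),
    "reservation_number": ("What is the reservation number?", None),
}

# Which question follows the caller's choice of task.
NEXT_STEP = {
    "new": "origin",
    "check": "reservation_number",
    "cancel": "reservation_number",
}


def run_directed(answers):
    """Walk the script, consuming one answer per prompt.

    Returns the filled fields and the list of prompts that were played.
    Raises StopIteration if the answers run out before the script ends.
    """
    answers = iter(answers)
    filled, prompts = {}, []
    step = "task"
    while step is not None:
        prompt, allowed = DIRECTED_STEPS[step]
        prompts.append(prompt)
        reply = next(answers)
        if allowed is not None and reply not in allowed:
            # Reply is outside the small set of options: ask the same question again.
            continue
        filled[step] = reply
        # Only the task question branches; the sketch ends after one follow-up.
        step = NEXT_STEP.get(reply) if step == "task" else None
    return filled, prompts
```

Because each step accepts only a handful of replies, the set of phrases to recognize at any moment stays small, which is the property the paragraph above attributes to directed dialogs.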

However, directed dialogs do have potential disadvantages. They can be extremely cumbersome for the caller, who is forced to wait to be prompted for each piece of information separately. This can be particularly frustrating when the task requires that the caller supply many different items: the caller will find it tiresome, and the call will take a long time. Also, a grammar designed for a directed dialog is unlikely to handle unexpected responses very well.

A directed dialog is generally most suitable when the task at hand requires only a few pieces of information, when the options are not intuitively obvious, when the caller does not know what information must be supplied, or when there are a limited number of acceptable replies to a prompt.

Mixed-initiative dialog

A mixed-initiative dialog will begin with a general prompt, such as “What can I do for you today?” The caller is able to reply with more complete information, such as “I’d like to reserve a flight from New York to Chicago on July 3rd,” or “I want to cancel reservation number 5213.” The application then prompts the caller for any needed information that may be missing from the initial response.

A mixed-initiative dialog is far more open-ended than a directed dialog. It can understand an entire sentence, and accept many different pieces of information at once. If the caller leaves anything out, the application can still prompt for the missing information as needed, without revisiting items that have already been filled. This makes the process faster and more convenient from the caller’s point of view, especially when the task requires many different pieces of information. Instead of plodding laboriously through a series of questions, the caller can provide all necessary information naturally in a single sentence. However, the caller must have an idea of what sort of input is expected.
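The fill-what-you-can, then prompt-for-the-rest behavior can be sketched in Python as follows. This is only an illustration under stated assumptions: the regular expressions in `PATTERNS` are a toy stand-in for a real recognition grammar, and the slot names, follow-up prompts, and helper functions are all hypothetical.

```python
import re

# Hypothetical slots for the "book a flight" task, in the order we would
# prompt for them if they are missing.
SLOTS = ("origin", "destination", "date")

FOLLOW_UP = {
    "origin": "What city are you leaving from?",
    "destination": "What city are you flying to?",
    "date": "What day would you like to travel?",
}

# Toy patterns standing in for a real grammar: capitalized words after
# "from" / "to", and "Month day" after "on".
PATTERNS = {
    "origin": re.compile(r"\bfrom ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "destination": re.compile(r"\bto ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "date": re.compile(r"\bon ([A-Z][a-z]+ \d{1,2}(?:st|nd|rd|th)?)"),
}


def interpret(utterance, filled=None):
    """Fill every slot found in the utterance, keeping slots already filled."""
    filled = dict(filled or {})
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match and slot not in filled:
            filled[slot] = match.group(1)
    return filled


def next_prompt(filled):
    """Return the prompt for the first missing slot, or None when complete."""
    for slot in SLOTS:
        if slot not in filled:
            return FOLLOW_UP[slot]
    return None
```

A caller who says everything at once is never asked a follow-up question, while a caller who says only "I need a flight to Chicago" is asked for the origin and then the date, without being asked for the destination again.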

The grammars for mixed-initiative dialogs are more complex than the grammars used in a directed dialog. Since it’s more difficult to predict the answers a user may provide to an open-ended question, the grammar must be able to handle a large number of possible responses. A mixed-initiative dialog will also tend to have higher out-of-grammar rates. Many advanced natural language techniques are designed to optimize mixed-initiative dialogs. These are discussed in Adding natural language capabilities.

Mixed-initiative dialogs are best used when the caller has to supply a great deal of information for a task, when the caller is experienced and already knows what’s expected, and when you are trying to keep calls as brief as possible for the convenience of your customers or in order to keep phone lines open.

Combining the types of dialog

In practice, the choice between a directed dialog and a mixed-initiative dialog is not absolute. You may want to switch your approaches at suitable points in your dialog. For example, you may begin with a directed prompt (“Would you like to book, verify, or cancel a reservation?”) and continue with the directed method if the caller chooses to verify or cancel a reservation. However, if the caller instead chooses to book a new reservation, you may decide it’s better to use a mixed-initiative approach for that branch of the application.

Unfortunately, experience shows that switching approaches within a call can confuse callers, especially when the transition is abrupt. Callers expect the application to be consistent in the kind of input it accepts, and they become annoyed when it changes. We recommend that you use a consistent approach for the whole application.

Built-in grammars

For each supported language, Nuance provides several pre-defined grammars for common recognition tasks. You can invoke these built-ins from within your VXML file or another grammar. Built-in grammars can save you development time and effort, so it’s a good idea to structure your dialog to use them whenever possible.

In our flight reservation application example, the date and time can each be covered by built-in grammars that are included with the installation for US English.

The built-ins that are available vary for each language. See Built-in grammars.

Usage patterns

Your dialog may be profoundly influenced by the kind of callers you expect. First-time callers may require careful explanations of technical terms and acronyms, while experienced callers may grow impatient and want to skip ahead. The needs of these two groups are another factor to balance.

If your application is intended to encourage repeat callers (for example, if it’s a subscriber-based service) you will want to offer such help to first-time callers who need it, while allowing experienced callers to jump ahead and complete their transactions quickly. Let the expected ratio of new to experienced callers guide your choices. If you expect mostly repeat callers, you may use a mixed-initiative dialog as your default, while still letting a new caller explicitly ask for help (“What would you like to do? If you’d like a list of possible actions, say ’help’”). If you instead anticipate a lot of one-time callers, you can use a directed dialog as the default, but allow callers to ask to jump ahead.

Confirmation requests

There is always a chance that Recognizer will misunderstand a caller’s utterance, or that the caller will have a change of mind after responding. It’s generally a good idea to ask callers to confirm their choices before actually committing them to a database.

Confirmation is best done when a block of information has been completed, rather than for each individual piece of information. In the flight reservation example, you may want to confirm once when the caller selects the departure and destination cities (before compiling the list of available flights), and then again once you also have the date and time, before finalizing the reservation.

If the caller rejects the information, your application will have to go back and repeat at least some of your dialog. If the dialog is a long one, you may want to determine which values need to be replaced, and only prompt for those fields.
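One way to picture block-level confirmation with selective re-prompting is the Python sketch below. The function names and the prompt wording are hypothetical, and how the application learns which fields the caller rejected is left out; the sketch only shows that accepted values are kept and that just the rejected fields are asked again.

```python
def confirmation_prompt(block):
    """Build a single yes/no question covering a whole block of fields."""
    summary = ", ".join(f"{name} {value}" for name, value in block.items())
    return f"I have {summary}. Is that correct?"


def fields_to_reprompt(block, rejected):
    """Split a block into values to keep and field names to ask for again.

    `rejected` is the set of field names the caller said were wrong.
    """
    kept = {name: value for name, value in block.items() if name not in rejected}
    to_ask = [name for name in block if name in rejected]
    return kept, to_ask
```

For a block containing an origin and a destination, rejecting only the destination leaves the origin in place, so the caller answers one question instead of repeating the whole block.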

Dynamic items

At some points in your dialog, you may need to generate a dynamic list for your grammar. In the flight reservation example, the list of available flights is known only once the caller specifies the origin and destination cities and a date. For a banking application, you might need to provide account-specific information once the caller has supplied identifying information. Since these lists are usually intended as selection lists, you may need to generate a grammar or part of a grammar dynamically as well, in order to accept the caller’s choice.

In these cases, you can create dynamic-link grammars that will retrieve needed words and phrases at runtime. See Dynamic-link grammars.
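To make the idea of a runtime-generated selection list concrete, the Python sketch below builds a small grammar in the W3C SRGS XML format from a list of phrases. It is not the dynamic-link mechanism referred to above, whose details are not covered in this section; the function name `build_flight_grammar`, the rule name `flight_choice`, and the sample phrases are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C Speech Recognition Grammar Specification (SRGS) XML form.
SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_flight_grammar(flight_phrases, rule_id="flight_choice"):
    """Return an SRGS XML grammar whose root rule accepts one of the phrases."""
    grammar = ET.Element(
        "grammar",
        {
            "xmlns": SRGS_NS,
            "version": "1.0",
            "xml:lang": "en-US",
            "mode": "voice",
            "root": rule_id,
        },
    )
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in flight_phrases:
        # ElementTree escapes special characters such as "&" when serializing.
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")
```

In use, the phrases would come from the flight lookup performed after the caller supplies the cities and date, for example `build_flight_grammar(["the eight fifteen flight", "the noon flight"])`.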