VoiceXML document structure

A VoiceXML document is an XML document, so it begins with a header that identifies the document type. The main body of the file contains all the VoiceXML instructions: the variables, audio prompts, grammar files, and other instructions that the voice browser service needs to conduct a dialog.

The header must conform to the XML standard:

<?xml version="1.0"?>

<!DOCTYPE vxml PUBLIC "-//Nuance/DTD VoiceXML 2.0//EN" "http://voicexml.nuance.com/dtd/nuancevoicexml-2-0.dtd">

<vxml version="2.0">

XML declaration

The first line (<?xml version="1.0"?>) is the XML declaration, which indicates that the document is an XML document. Always use this line exactly as specified above, with straight quotation marks and no space between <? and xml.

DTD declaration

The second line, the <!DOCTYPE> declaration, identifies the Document Type Definition (DTD). A DTD describes the format of the data that can appear in an XML document. That is, the DTD defines the valid elements by specifying what attributes each element can have and what child elements or other content each element can contain. For NVP voice applications, you should use the Nuance VoiceXML DTD as shown.

<vxml>

Finally, the <vxml> start tag opens the main body of the VoiceXML document.

If the document is a root document, this tag lists only the version, as shown above.

If the document is a subdocument, the tag also includes an application attribute that identifies the root document. For example, the subdocuments for the PizzaTalk application all identify pizza.vxml as their root document:

<vxml version="2.0" application="pizza.vxml">

Other optional attributes of the <vxml> element let you specify a language (xml:lang), a namespace defining the available elements and their attributes (xmlns), and a base URI to be used as the starting point for all relative URIs in the document (xml:base). For more information, see xml:base.
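As a rough illustration of these optional attributes, the following sketch shows a subdocument that sets all of them. The xmlns value is the standard VoiceXML 2.0 namespace. The xml:lang value, the xml:base URI, and the form contents are hypothetical placeholders chosen for this example, not values taken from the PizzaTalk application.

```xml
<?xml version="1.0"?>
<!DOCTYPE vxml PUBLIC "-//Nuance/DTD VoiceXML 2.0//EN" "http://voicexml.nuance.com/dtd/nuancevoicexml-2-0.dtd">
<vxml version="2.0"
      application="pizza.vxml"
      xmlns="http://www.w3.org/2001/vxml"
      xml:lang="en-US"
      xml:base="http://www.example.com/pizzatalk/">

  <!-- Relative URIs in this document, including application="pizza.vxml",
       resolve against the xml:base value (an assumed example URI). -->
  <form id="welcome">
    <block>Welcome to PizzaTalk.</block>
  </form>

</vxml>
```

Because the root document is named with a relative URI, changing xml:base in this sketch would also change where the voice browser looks for pizza.vxml.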

Main body

The main body of a VoiceXML document consists of everything that appears within the <vxml> element. A given VoiceXML document will typically include many different instructions for prompting a caller, receiving input, interpreting that input, and taking actions based on the interpretation.

These instructions fall into several general categories:

Properties customize the behavior of the voice browser service.

Variables store data temporarily for use by the application.

Dialogs define how the application interacts with the caller.

Executable content manipulates data or performs other actions based on input.

Because any voice application is interactive, these categories overlap considerably. For example, the <filled> element contains executable content, but that content runs only after the specified field variables have been assigned values. As a result, this element can come up in discussions of variables, dialogs, or executable content.
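To make the four categories concrete, here is a minimal sketch of a complete root document that contains one instruction of each kind. The variable name, form and field names, prompt wording, and the grammar file size.grxml are all assumptions made for this illustration. They are not part of the PizzaTalk application, and the grammar format your platform expects may differ.

```xml
<?xml version="1.0"?>
<!DOCTYPE vxml PUBLIC "-//Nuance/DTD VoiceXML 2.0//EN" "http://voicexml.nuance.com/dtd/nuancevoicexml-2-0.dtd">
<vxml version="2.0">

  <!-- Property: customizes voice browser behavior (here, the no-input timeout) -->
  <property name="timeout" value="5s"/>

  <!-- Variable: stores data temporarily for use by the application -->
  <var name="greeting" expr="'Welcome to PizzaTalk.'"/>

  <!-- Dialog: a form that prompts the caller and collects input -->
  <form id="order">
    <block>
      <prompt><value expr="greeting"/></prompt>
    </block>

    <field name="size">
      <prompt>What size pizza would you like?</prompt>
      <!-- Hypothetical grammar file, assumed for this example -->
      <grammar src="size.grxml" type="application/srgs+xml"/>

      <!-- Executable content: runs only after the size field variable has a value -->
      <filled>
        <prompt>You asked for a <value expr="size"/> pizza.</prompt>
      </filled>
    </field>
  </form>

</vxml>
```

Note how the <filled> element in this sketch touches three categories at once: it sits inside a dialog, it is triggered by a field variable, and its body is executable content.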
