Syntax of robust parsing grammars

A robust parsing grammar is an SRGS grammar that includes the definition of a concept set. In addition, its header includes <meta> elements that refer to one of the following: an n-gram grammar, a finite state machine (FSM) and wordlist, or an SLM training set.

For a short but complete example of a robust parsing grammar, see Detailed restaurant guide example.

Referring to n-gram grammars

A robust parsing grammar can refer to an n-gram grammar. For example:

<meta name="swirec_first_pass_grammar" content="MyN-gram.xml"/>

An n-gram file is a textual description of an SLM. It’s not a training file, which simply lists sentences and vocabulary. Instead, an n-gram file specifies the relationships between words by giving the probability that a given word follows another in a sequence. You can specify unigrams (single words), bigrams (two-word sequences), and trigrams (three-word sequences).

See N-gram grammars and swirec_first_pass_grammar.

Referring to FSMs and wordlists

When you generate an SLM, you automatically create a finite state machine and wordlist (see SLMs).

To use an SLM, a robust parsing grammar must refer to the finite state machine (FSM) file and the wordlist. To do this, use the meta parameters swirec_fsm_grammar and swirec_fsm_wordlist. For example:

<meta name="swirec_fsm_grammar" content="MyNgram.fsm"/>
<meta name="swirec_fsm_wordlist" content="MyNgram.wordlist"/>

The grammar writer is responsible for ensuring that the SLM vocabulary (the vocabulary of the recognition grammar) is complete. Words that are not contained in the SLM vocabulary will not be recognized, even if they appear in a concept rule of a robust parsing grammar. All grammar words must therefore be included in the SLM vocabulary.

See swirec_fsm_grammar.

Referring to training sets

As an alternative to training an SLM in advance and referring to the resulting FSM and wordlist, a robust parsing grammar can simply refer to the SLM training set instead. For example:

<meta name="swirec_training_grammar" content="myTraining.xml"/>

When the grammar contains this <meta> element, the SLM will be trained every time the grammar is compiled. This has performance implications; see Compiling and loading robust parsing grammars.

Defining concept sets

Within the main body of the grammar, you define the concepts in a special <conceptset> section. Each concept definition specifies the concept rule used to identify significant information, and the tags that fill the returned slots.

The following example defines a concept set with three concepts:

<conceptset id="allConcepts" 
        xmlns="http://www.nuance.com/grammar">
    <concept>
        <ruleref uri="#c_origin"/>
        <tag>origin = c_origin.v</tag>
    </concept>
    <concept>
        <ruleref uri="#c_destination"/>
        <tag>destination = c_destination.v</tag>
    </concept>
    <concept>
        <ruleref uri="#c_date"/>
        <tag>date = c_date.v</tag>
    </concept>
</conceptset>

Provided that the concept rules define phrases like "from Twickenham," "to London," and "on Sunday," this set of concept definitions would allow recognition of the following sentence:

"On Sunday I'd like to go from Twickenham to London"

In the recognition result the slots "origin," "destination," and "date" would be filled with the values "Twickenham," "London," and "Sunday." Note that the rules referenced may themselves refer to other rules; for example, a city rule that identifies the city.
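As an illustration of how such concept rules might look, the fragment below sketches ordinary SRGS rules for c_origin, c_destination, and c_date, plus a shared city rule. The rule bodies, the word lists, and the use of a variable named v inside each rule are assumptions made for this sketch (chosen to match the c_origin.v style used in the concept set above); they are not taken from an actual grammar. The fragment is meant to sit inside the <grammar> element, alongside the concept set.

```xml
<!-- Hypothetical concept rules; phrases and word lists are assumptions. -->
<rule id="c_origin">
    from <ruleref uri="#city"/>
    <tag>v = city.v</tag>
</rule>

<rule id="c_destination">
    to <ruleref uri="#city"/>
    <tag>v = city.v</tag>
</rule>

<rule id="c_date">
    on
    <one-of>
        <item>Saturday <tag>v = 'Saturday'</tag></item>
        <item>Sunday <tag>v = 'Sunday'</tag></item>
    </one-of>
</rule>

<!-- Helper rule referred to by both c_origin and c_destination. -->
<rule id="city">
    <one-of>
        <item>Twickenham <tag>v = 'Twickenham'</tag></item>
        <item>London <tag>v = 'London'</tag></item>
    </one-of>
</rule>
```

With rules along these lines, the words "I'd like to go" in the sample sentence match no concept rule and fill no slot, while the three concept phrases fill origin, destination, and date.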

<conceptset> syntax

A robust parsing grammar can contain only one <conceptset> element, and this concept set must be declared as the root rule of the grammar.

Like a <rule> element, a <conceptset> element must have an id attribute.

Unlike a <rule> element, a <conceptset> cannot have a scope attribute. The scope of a conceptset is always "private." As a consequence, a concept set can only be activated via the root rule mechanism. Another difference between a <conceptset> and a <rule> element is that a conceptset cannot be referred to (you cannot specify a conceptset with the uri attribute of a <ruleref> element).

A <conceptset> contains one or more <concept> elements. Any number of concepts in any order is allowed.

Each <concept> element must contain one <ruleref> element, and may contain any number of <tag> elements. No other elements are allowed in a <concept> element.

Concept elements do not have attributes.
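The constraints above can be summarized in a minimal skeleton. This is a sketch under stated assumptions: the language, the training file name, the c_date rule body, and the second tag (has_date) are hypothetical and serve only to show where each piece goes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: language, file name, and rule content are assumptions. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US"
         root="allConcepts">
    <!-- root names the concept set, the only way to activate it. -->

    <meta name="swirec_training_grammar" content="myTraining.xml"/>

    <!-- The single concept set: id is required, scope is not allowed. -->
    <conceptset id="allConcepts" xmlns="http://www.nuance.com/grammar">
        <!-- A concept has no attributes: one ruleref, then any number of tags. -->
        <concept>
            <ruleref uri="#c_date"/>
            <tag>date = c_date.v</tag>
            <tag>has_date = 'true'</tag> <!-- hypothetical second tag -->
        </concept>
    </conceptset>

    <!-- Ordinary SRGS concept rule, outside the concept set. -->
    <rule id="c_date">
        on
        <one-of>
            <item>Saturday <tag>v = 'Saturday'</tag></item>
            <item>Sunday <tag>v = 'Sunday'</tag></item>
        </one-of>
    </rule>
</grammar>
```

Note that no <ruleref> anywhere in the grammar points at allConcepts, since a concept set cannot be referred to.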

Required namespace for concepts

The <conceptset> element is an extension to SRGS. Therefore, <conceptset> and the elements it contains must reside in the following required namespace:

http://www.nuance.com/grammar

You must declare this namespace in addition to the SRGS namespace in your robust parsing grammars (see examples below).

The <ruleref> and the <tag> elements in this namespace are identical to the corresponding elements in the SRGS namespace. This also means that the concept rules (the rules that are referred to by <ruleref> elements in a <concept> element) are ordinary SRGS rules.

According to the rules of XML namespaces, there are two ways to identify elements as residing in a namespace:

  • Declare a default namespace on an enclosing element (thus, overriding the SRGS namespace).
  • Declare a namespace prefix on an enclosing element, and then add the prefix to each element that belongs to the namespace.

You can use either, or both, of these syntaxes in your grammars; there is no difference with regard to grammar processing in Recognizer. Below are examples of each syntax.