Robust parsing grammars

A robust parsing grammar is a collection of SRGS rules (or concepts) that are applied to input text. Unlike a regular SRGS grammar, which applies rules in a specific order, a robust parsing grammar applies rules flexibly wherever they provide the best matches, and extracts meaning from those matching fragments. This allows the grammar to ignore regions of text that contain filler words (such as the dysfluencies “um” or “ah”), and lets the caller provide information in any order.

For example, consider a banking application. In order to transfer money between accounts, the caller might have to specify these distinct concepts:

  • An amount of money
  • A source account
  • A destination account

In a directed dialog, where each piece of information is requested separately, it would be relatively simple to create a grammar for each prompt. However, such a directed dialog could prove tedious for the caller. A robust parsing grammar is able to extract all the required information from a single sentence: it identifies the key information-bearing words, and ignores the rest. This makes a robust parsing grammar ideal for interpreting answers that fill many different slots.

Robust parsing grammars give applications more flexibility: they let users phrase their answers in a wider variety of ways, which expands the range of in-grammar speech. Specifically, this technology allows Recognizer to:

  • Interpret more spontaneous speech effects such as hesitations, dysfluencies, and out-of-grammar sentences.
  • Maintain a high level of accuracy while using SLMs.
  • Handle flexible responses typical of mixed-initiative systems.

Robust parsing grammars always require an underlying statistical language model. The SLM recognizes the spoken language, and the robust parsing grammar fills grammar slots for the meaningful phrases. For an overview of grammar combinations that can employ robust parsing grammars, see NLU techniques.

How robust parsing fills slots

The rules that define the key words and phrases are called concepts. When a robust parsing grammar is used for recognition, any concept phrases in an utterance will be found, regardless of the order in which they appear. The robust parsing grammar will also recognize concept phrases that are separated by arbitrary word sequences. Each sequence of words that does not match a concept phrase will automatically be covered by a so-called "filler rule".
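The way concept phrases and filler divide up an utterance can be illustrated with a minimal Python sketch. It is not Recognizer's implementation: the `CONCEPT` pattern, the travel phrases, and the `segment` function are all hypothetical, and a regular expression stands in for SRGS concept rules.

```python
import re

# Hypothetical concept phrases; each named group stands in for one concept rule.
CONCEPT = re.compile(
    r"\b(?:(?P<origin>from twickenham)"
    r"|(?P<destination>to london)"
    r"|(?P<date>on sunday))\b",
    re.IGNORECASE,
)

def segment(text):
    """Split text into (label, words) pairs; gaps between concepts become 'filler'."""
    segments, pos = [], 0
    for m in CONCEPT.finditer(text):
        gap = text[pos:m.start()].strip()
        if gap:
            segments.append(("filler", gap))
        # m.lastgroup is the name of the concept group that matched.
        segments.append((m.lastgroup, m.group()))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("filler", tail))
    return segments

for label, words in segment("On Sunday I'd like to go from Twickenham to London, please"):
    print(label, "|", words)
```

In this sketch, every stretch of text between two concept matches is labeled as filler, which mirrors the role of the filler rule described above.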

Concepts are not the same thing as slots: concepts are rules that return values that fill slots. However, in most cases you will write one concept rule per slot.

By default, the robust parser fills a slot with the value extracted from the last word sequence that matched the concept. As an example, if you have a “city” slot to be filled and Recognizer returns “Boston no i meant new york”, then the city is “new york.” For a detailed discussion, see Concepts that fill more than one slot.
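The last-match default can be sketched in a few lines of Python. This is an illustration only, not how Recognizer is implemented: the `CITY` pattern and the `fill_city_slot` function are hypothetical, and a regular expression stands in for a concept rule.

```python
import re

# Hypothetical stand-in for a "city" concept rule; a real grammar would use SRGS.
CITY = re.compile(r"\b(boston|new york|london)\b", re.IGNORECASE)

def fill_city_slot(text):
    """Return the last city phrase found in text (lowercased), or None."""
    matches = CITY.findall(text)
    return matches[-1].lower() if matches else None

print(fill_city_slot("Boston no i meant new york"))
```

Because the slot takes the last match, a caller's self-correction overrides the earlier value without any extra dialog logic.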

As a further example, suppose concept rules are developed for recognition of an origin-city, a destination-city and a date. If these rules define phrases like "from Twickenham," "to London," and "on Sunday," utterances like the following will be understood, and the correct meaning will be extracted:

  • "On Sunday I'd like to go from Twickenham to London, please"
  • "I want to travel from Twickenham to London on Sunday"

The meaning obtained from these two sentences will be the same, even though the same pieces of information are given in a different order.
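The order independence of the two sentences above can be checked with a small Python sketch. It assumes hypothetical regular-expression patterns in place of real concept rules, and the `CONCEPTS` table and `parse` function are illustrative names, not part of any Recognizer API.

```python
import re

# Hypothetical concept patterns; each key is the slot that the concept fills.
CONCEPTS = {
    "origin": re.compile(r"\bfrom (twickenham|london)\b", re.IGNORECASE),
    "destination": re.compile(r"\bto (twickenham|london)\b", re.IGNORECASE),
    "date": re.compile(r"\bon (sunday|monday)\b", re.IGNORECASE),
}

def parse(text):
    """Spot each concept anywhere in text; words that match nothing are ignored."""
    slots = {}
    for slot, pattern in CONCEPTS.items():
        matches = pattern.findall(text)
        if matches:
            slots[slot] = matches[-1].lower()  # the last match fills the slot
    return slots

first = parse("On Sunday I'd like to go from Twickenham to London, please")
second = parse("I want to travel from Twickenham to London on Sunday")
print(first == second)
```

Each pattern is searched independently over the whole utterance, so the position of a phrase in the sentence has no effect on which slot it fills.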

Confidence values for robust parsing

Since the main purpose of a robust parsing grammar is to extract certain pieces of information and fill slots, the reliability of recognition results is best expressed by slot confidences, rather than sentence confidences:

  • Sentence confidence scores depend on all words in a recognized sentence (including semantically insignificant ones).
  • Slot confidence scores are focused on the significant portions of the recognized sentence, and different slots from the same utterance can have different confidence values. This enables an application to accept more natural speech from callers, because the application can distinguish confidences for each piece of information within the utterances.

Based on the slot confidence score, the application decides whether each slot value is discarded (and asked for again), verified (confirmed with the caller), or marked as successfully retrieved.
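A per-slot decision of this kind might look like the following Python sketch. The thresholds, the 0.0 to 1.0 confidence scale, and the `triage` function are assumptions made for illustration; actual confidence ranges and suitable cut-offs depend on the platform and should be tuned for each application.

```python
# Hypothetical thresholds on an assumed 0.0 to 1.0 confidence scale.
REJECT_BELOW = 0.40
ACCEPT_FROM = 0.80

def triage(slot_confidences):
    """Map each slot to 'reprompt', 'confirm', or 'accept' by its confidence."""
    actions = {}
    for slot, confidence in slot_confidences.items():
        if confidence < REJECT_BELOW:
            actions[slot] = "reprompt"   # discard the value and ask again
        elif confidence < ACCEPT_FROM:
            actions[slot] = "confirm"    # verify the value with the caller
        else:
            actions[slot] = "accept"     # treat the value as retrieved
    return actions

print(triage({"amount": 0.92, "source": 0.55, "destination": 0.21}))
```

Because each slot is judged separately, the application can keep a well-recognized amount while asking again only for the account it did not hear clearly.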

Compiling and loading robust parsing grammars

There is no special compile or load command for robust parsing. You can pre-compile a robust parsing grammar, or load it at runtime, just as you would an SRGS grammar.

However, because the robust parsing grammar requires a trained SLM, you must use one of the following techniques as part of your compilation strategy:

  • Complete compilation in advance: You can train the SLM and pre-compile the robust parsing grammar. The output of SLM training is an FSM (finite state machine) file and wordlist. When the grammar is compiled, it uses <meta> elements to reference the FSM and wordlist. Use this technique for the best runtime performance.
  • Partial compilation in advance: You can also train the SLM in advance and compile the robust parsing grammar dynamically at runtime. This technique avoids the cost of training the SLM at runtime, but accepts the cost of runtime compilation. The direct costs are increased CPU and memory usage. The indirect cost is increased delay perceived by callers.
  • Runtime compilation: Finally, you can train the SLM while dynamically compiling the robust parsing grammar at runtime. When the grammar is compiled, it contains a <meta> element to reference the SLM for training. This technique incurs the heaviest runtime cost. Avoid it for large SLM training sets or grammars, or limit the size of the training file.

Tip: For details on training an SLM, see SLMs and Training an SLM. For details on referring to the output of a trained SLM from within a robust parsing grammar, see Referring to FSMs and wordlists. For details on training an SLM during compilation, see Referring to training sets.