Advanced robust parsing grammars

This topic covers advanced aspects of robust parsing grammars.

Concepts that fill more than one slot

In most robust parsing grammars, each slot is represented by one concept. However, there is no restriction on the number of slots one concept can fill.

For example, a grammar for a timetable application could define an "origin" concept, a "destination" concept, and an additional "two_locations" concept. The origin and destination concepts cover phrases like "from Twickenham" and "to London." The two_locations concept covers phrases like "Twickenham London," where the significant words "from" and "to" are missing, but where it is reasonable to assume that the first city is the origin and the second is the destination.

A conceptset for this grammar might look like this:

<conceptset id="concepts" 
            xmlns="http://www.nuance.com/grammar">
    <concept>
        <ruleref uri="#c_origin"/>
        <tag>origin = c_origin.V</tag>
    </concept>
    <concept>
        <ruleref uri="#c_destination"/>
        <tag>destination = c_destination.V</tag>
    </concept>
    <concept>
        <ruleref uri="#two_locations"/>
        <tag>origin = two_locations.city1</tag>
        <tag>destination = two_locations.city2</tag>
    </concept>
</conceptset>
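
The conceptset above expects the two_locations rule to return two properties, city1 and city2. The following is only a sketch of how such a rule might be written. The "city" helper rule, its two entries, and the assumption that a rule returns its value in a variable named V (by analogy with c_origin.V above) are illustrative and are not taken from an actual grammar:

```xml
<!-- Hypothetical concept rule: two city names in a row, without "from" or "to". -->
<rule id="two_locations">
    <ruleref uri="#city"/>
    <!-- Copy the first city before the second reference replaces city.V. -->
    <tag>city1 = city.V</tag>
    <ruleref uri="#city"/>
    <tag>city2 = city.V</tag>
</rule>

<!-- Hypothetical helper rule; a real grammar would list many more cities. -->
<rule id="city">
    <one-of>
        <item>Twickenham <tag>V = "Twickenham"</tag></item>
        <item>London <tag>V = "London"</tag></item>
    </one-of>
</rule>
```

With rules along these lines, "Twickenham London" would fill origin with Twickenham and destination with London.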

Slots that are filled more than once

During parsing, sentences are processed (and robust parsing concepts are applied) in a left-to-right manner. For example, consider a sentence where the user makes a correction: “I want to go to New York, er, I mean New Jersey”.

Assuming the robust parsing grammar fills a slot called Destination, the value will first be set to New York, and then replaced by New Jersey.

Because an ECMAScript object cannot have two properties with the same name, a slot cannot return more than one value. You can write ECMAScript (in a concept's <tag> element) to return either the first or the last value that is extracted from an utterance. In some cases it makes sense to collect all values instead, which you can do by concatenating each value as it occurs. In the following example, the "toppings" slot is filled with a list of spoken pizza toppings separated by a "|" character.

<concept>
    <ruleref uri="#pizza_topping"/>
    <tag>
        if (typeof(toppings) == "undefined") {
            toppings = pizza_topping.v;
        } else {
            toppings += "|" + pizza_topping.v;
        }
    </tag>
</concept>
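
The same typeof test can be adapted to keep only the first value spoken for a slot. The following sketch reuses the destination concept from the timetable example and assumes, as in that example, that the c_destination rule returns its value in V. A plain assignment without the test keeps the last value, because each later match replaces the earlier one.

```xml
<concept>
    <ruleref uri="#c_destination"/>
    <tag>
        // Assign only if the slot is still empty, so the first value is kept.
        if (typeof(destination) == "undefined") {
            destination = c_destination.V;
        }
    </tag>
</concept>
```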

Parser weights

When you compile a robust parsing grammar, the compiler automatically adds weights to improve the recognition accuracy of concepts and fillers. To enable processing of these weights, swirec_compile_parser_with_weights is automatically set to 1. You cannot override this automatic setting: if the robust parsing grammar contains a meta element that specifies a different value, the value 1 is still used and a warning message is written to the error log.

Set swirec_compile_parser_with_weights to “0” (or do not change the default value of 0) in every concept rules grammar file (that is, every file that is referred to by a "ruleref" element inside the conceptset).
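
If you prefer to state the value explicitly in a concept rules grammar file instead of relying on the default, a meta element along the following lines could be used. This sketch assumes the same meta syntax that this topic shows for swirec_max_parses_per_literal:

```xml
<!-- In a concept rules grammar file, not in the robust parsing grammar itself. -->
<meta name="swirec_compile_parser_with_weights" content="0"/>
```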

n-best list length

For robust parsing grammars, swirec_nbest_list_length is automatically set to a value that yields good-quality slot confidence values. This can increase CPU time. If necessary, you can set swirec_nbest_list_length to a different (lower) value in a meta element in the robust parsing grammar. After making such a change, carefully evaluate its effect on recognition accuracy and on the quality of slot confidence values.
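
As a sketch, such an override might look like the following. The value 5 is purely illustrative and is not a recommended setting. Choose a value based on your own accuracy and CPU measurements:

```xml
<!-- Illustrative value only; evaluate accuracy and slot confidence after changing it. -->
<meta name="swirec_nbest_list_length" content="5"/>
```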

Ambiguities

For robust parsing grammars, swirec_max_parses_per_literal is automatically set to 1 unless you override it (see the next paragraph). The reason is that the filler rule of a robust parsing grammar can cover any sequence of words, including phrases that belong to a concept rule. As a consequence, each recognized sentence has multiple parses during intermediate stages of recognition processing. With swirec_max_parses_per_literal set to 1, only the parse with the highest score is returned. The robust parsing mechanism ensures that this is the parse that covers the most words with the fewest concepts.

To intentionally allow multiple interpretations (parses) of a single string, set swirec_max_parses_per_literal to an appropriate value in the robust parsing grammar. For example, if a robust parsing grammar contains the following meta element, the specified value is used:

<meta name="swirec_max_parses_per_literal" content="3"/>