Applying bigram language models

Recognizer allows the application of a bigram language model using the SWIlanguageModel parameter within a <meta> element as follows:

<meta name="SWIlanguageModel" content="letsgo-ngram.xml"/>

The n-gram format is specified in the W3C Working Draft of January 3, 2001, “Stochastic Language Models (N-Gram) Specification”.

Recognizer supports a subset of the n-gram specification. In particular, it supports bigrams and class-based bigrams, allows multi-word tokens where the words are separated by spaces (called super-words in the n-gram specification), and also allows underscore-separated words. Back-off and interpolated models, as outlined in the specification, are not supported.
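For example, a multi-word city name could appear in an n-gram lexicon in either form. The following fragment is a minimal sketch (the tokens are illustrative):

<lexicon>
 <token index="1">san francisco</token> <!-- super-word: words separated by spaces -->
 <token index="2">new_york</token> <!-- underscore-separated form -->
</lexicon>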

When to use a bigram

A bigram model is useful for tuning speech recognition accuracy when you have good reason to believe that some words or word sequences will occur more often than others. Applying a bigram boosts scores on more likely phrases compared to less likely ones. For difficult recognition tasks, this can improve accuracy. For tasks that already produce highly accurate results, the extra effort is likely not worthwhile.

Since applying bigrams is quite complex, Nuance recommends contacting technical support through Nuance Network for guidance if you intend to use this feature.

The W3C specification is supported as described; however, n-grams are more conveniently used in a statistical language model grammar, and this is the recommended method. The SWIlanguageModel meta parameter cannot be used with robust parsing grammars. See Adding natural language capabilities for a detailed discussion of this and other natural language strategies.

Bigrams vs. SWI_scoreDelta

Note that bigrams are an alternative to SWI_scoreDelta for boosting the scores of frequently uttered words and phrases. Especially for large vocabularies, bigrams are the better choice: they are applied earlier in the recognition process (before parsing), improving both efficiency and, to a lesser extent, accuracy. However, they are more complex to apply, and they are not useful for boosting scores based on dynamic criteria (such as today’s date or the caller’s area code).
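For comparison, SWI_scoreDelta is set directly on grammar items. The following fragment is a minimal sketch (the cities and delta values are illustrative and must be tuned for your application):

<one-of>
 <item>Boston <tag>SWI_scoreDelta = 50</tag></item> <!-- boost a frequently uttered city -->
 <item>Topeka <tag>SWI_scoreDelta = -50</tag></item> <!-- penalize a rarely uttered one -->
</one-of>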

Bigram example

The following presents a basic example of using a class bigram to boost scores for some cities more than others in a travel application.

Here is a sample grammar file that uses a bigram:

<?xml version='1.0' encoding='UTF-8'?>
<grammar root="START" version="1.0" xml:lang="en-US"
  xmlns="http://www.w3.org/2001/06/grammar">
<meta name="SWIlanguageModel" content="fly.bgxml"/>
<rule id="START" scope="public">
 i want to fly from
 <ruleref uri="#CITIES_CLASS" />
 <tag>ORIGIN=CITIES_CLASS.SWI_literal </tag>
 to
 <ruleref uri="#CITIES_CLASS" />
 <tag>DESTINATION=CITIES_CLASS.SWI_literal </tag>
</rule>
 
<rule id="CITIES_CLASS" SWI_listClass="1">
 <one-of>
  <item><token>Montreal</token></item>
  <item>Denver</item>
  <item>Boston</item>
  <item><token>San Francisco</token></item> <!-- multi-word city as a single token -->
 </one-of>
</rule>
</grammar>

This imports a bigram defined in the file fly.bgxml.
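For example, if a caller says “i want to fly from Boston to Denver”, the <tag> elements above fill the keys along these lines (illustrative):

ORIGIN      Boston
DESTINATION Denver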

Here is an excerpt of fly.bgxml:

<N-Gram>
 <import uri="cities.bgxml" name="cities" />
 <lexicon>
  <token index="1">-pau-</token>
  <token index="2">i</token>
  <token index="3">want</token>
  <token index="4">to</token>
  <token index="5">fly</token>
  <token index="6">from</token>
  <token index="7"> 
   <ruleref import="cities#CITIES_CLASS" /> 
  </token>
  <token index="8">-pau2-</token>
 </lexicon>
 <tree>
  <node>7 8 </node>   <!-- root: 7 classes/words, count of 8 -->
  <node>1 1 1 </node> <!-- word 1 (-pau-) has one successor -->
  <node>2 1 </node>   <!-- -pau- is followed by word 2 (i) -->
  <node>2 1 1 </node> <!-- word 2 (i) has one successor -->
  <node>3 1 </node>   <!-- i is followed by word 3 (want) -->
  <node>3 1 1 </node> <!-- word 3 (want) has one successor -->
  <node>4 1 </node>   <!-- want is followed by word 4 (to) -->
  <node>4 2 2 </node> <!-- word 4 (to) is used in 2 places -->
  <node>5 1 </node>   <!-- to is followed by word 5 (fly) -->
  <node>7 1 </node>   <!-- to is followed by word 7 (city class) -->
  <node>5 1 1 </node> <!-- word 5 (fly) has one successor -->
  <node>6 1 </node>   <!-- fly is followed by word 6 (from) -->
  <node>6 1 1 </node> <!-- word 6 (from) has one successor -->
  <node>7 1 </node>   <!-- from is followed by word 7 (city class) -->
  <node>7 2 2 </node> <!-- word 7 (city class) is used in 2 places -->
  <node>4 1 </node>   <!-- city class is followed by word 4 (to) -->
  <node>8 1 </node>   <!-- city class is followed by word 8 (-pau2-) -->
 </tree>
</N-Gram>

In conformance with the W3C n-gram specification, fly.bgxml imports a bigram from the file cities.bgxml (shown below) to represent the cities class. Note that the class is referred to as CITIES_CLASS, matching the name given to the class in fly.grxml, where it is defined as a SWI_listClass. Both the matching class name and the SWI_listClass designation are required. Similarly, all the words in the bigram must exactly match those in the grammar.

Also note that the bigram must place counts on all the words in the sentence, even though there is no variation in how the sentence is said (it is always “i want to fly from…”). In this case, each word is simply followed once by the word after it, with a count of 1. Omitting a word causes recognition to fail, since a probability of 0 is placed on any word not included.
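Roughly speaking, the counts translate into conditional probabilities as relative frequencies (ignoring any internal normalization that Recognizer may apply). For example, the node for word 4 (to) has a count of 2 and two successors, each with a count of 1, giving:

P(fly | to) = 1/2
P(city class | to) = 1/2

Every other word in this tree has a single successor whose count matches its own, so each of those transitions has probability 1.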

Note that Recognizer uses the special words "-pau-" and "-pau2-" to mark the beginning and the end of a sentence, respectively. These two special words are required in the lexicon section of the parent n-gram. They allow bigram probabilities to be set from "-pau-" to any other word in the lexicon, and from any word in the lexicon to "-pau2-".

Finally, here is the bigram specified in cities.bgxml:

<N-Gram>
 <lexicon>
  	<token index="1">-pau-</token>
  	<token index="2">Montreal</token>
  	<token index="3">Denver</token>
  	<token index="4">Boston</token>
  	<token index="5">San Francisco</token>
  	<token index="6">-pau2-</token>
 </lexicon>
 <tree>
  <node>6 6 </node> <!-- root: 6 classes/words, count of 6 -->
  <node>1 1 </node> <!-- -pau- -->
  <node>2 2 </node> <!-- Montreal, count of 2 (boosted) -->
  <node>3 2 </node> <!-- Denver, count of 2 (boosted) -->
  <node>4 1 </node> <!-- Boston -->
  <node>5 1 </node> <!-- San Francisco -->
  <node>6 1 </node> <!-- -pau2- -->
 </tree>
</N-Gram>
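
Within the class, the counts of 2 for Montreal and Denver make each of them twice as likely as Boston or San Francisco (counts of 1), which is how the boost is expressed. To favor a city further, increase its count relative to the others.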