SLMs

Application developers can choose various combinations of speech grammars to provide natural language solutions. Most speech recognition systems have two stages: recognition and semantic interpretation. SRGS grammars define both stages simultaneously. However, more difficult contexts require more powerful techniques, which usually involve separately defining recognition and semantic interpretation stages.

A statistical language model (SLM) controls recognition in an application using a probabilistic model of word combinations. An SLM is generated from a training file, and it improves recognition accuracy by describing the probability of each valid utterance.

Note: An SLM training file can be activated directly, precompiled, or imported into an SRGS wrapper grammar that packages it with a means of semantic interpretation.

Why use an SLM

SLMs are not meant to completely replace traditional SRGS grammars, which are quite suitable in many circumstances:

  • The application’s prompts are sufficient to restrict user responses to the vocabulary constrained by those grammars.
  • The vocabulary is small (<500 words) or is composed of a small number of classes. For example, a date grammar has a small vocabulary. A city-state grammar—despite being large—is essentially a composition of two classes.
  • The CPU/accuracy operating point has to be carefully tuned.
  • Only a small amount of transcribed data is available to train an SLM.

Since SLMs need a large set of examples to train, they often require that you gather examples with a data collection system or a pilot application that uses an SRGS grammar.

SLMs are useful for recognizing free-style speech, especially when the out-of-grammar (OOG) rate would be high with an SRGS grammar—for example, when caller replies are inherently unpredictable. However, an SLM is seldom used alone, because it does not provide the semantic meaning of what a caller says. For full natural language capability, you must combine the SLM with an SRGS grammar, a robust parsing grammar, or a statistical semantic model (SSM). With this combined approach, you can use a single, generalized language model with more than one SSM or robust parsing grammar: the SLM handles the recognition, and the semantic layer interprets meanings.

The following points compare SRGS grammars and SLMs:

  • Training data: not required for an SRGS grammar; required for an SLM.
  • OOG: a big problem for SRGS grammars; solved for SLMs by having enough training data.
  • Authoring: SRGS grammars are difficult to write; for an SLM, you collect training data instead.
  • Flexibility: with an SRGS grammar, small changes in the input language require complicated grammar changes, which require special expertise; an SLM requires more training data, but not special expertise.
  • Semantic interpretation: part of an SRGS grammar; for an SLM, it is a separate layer, implemented with an SRGS grammar (difficult), a robust parsing grammar, or a statistical semantic model (SSM).
  • CPU demands: generally lower for SRGS grammars; higher for SLMs.
  • Accuracy: for SRGS grammars, often limited by the presence of OOG utterances; for SLMs, higher in complex contexts.

How an SLM works

At any moment in a dialog, some sentences are more likely to be spoken than others. An SLM models what callers might say by defining a vocabulary of acceptable words (listed in the <vocab> section of the training file) and by describing the probability that a given word will follow other particular words in a sentence (trained from the example sentences in the <training> section).

For example, consider the prompt “Who are you trying to reach?” and a recognition vocabulary that allows answers such as the following:

  • “I’m trying to reach the Vice President of Sales.”
  • “I want to talk with the Vice President of R and D.”
  • “The Vice President of Marketing, please.”

The SLM can predict the probability that a given word comes after the phrase “the Vice President of…”, and weight the sentences accordingly. In this way, a statistical language model improves recognition accuracy by favoring probable word sequences.

In theory, an SLM can accept any combination of vocabulary words, even if they occur in an order that doesn’t make sense. For example, it could accept the nonsense phrase “Sales Sales the please.” Words in the vocabulary can be recognized at any time, in any combination. However, in practice, the weights that the SLM assigns make such nonsense phrases far more likely to be rejected.
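
The following Python sketch is purely illustrative (it is not the recognizer’s training-file format or API): assuming a handful of made-up training sentences and simple add-alpha smoothing, it estimates bigram probabilities and then scores a plausible reply against the nonsense phrase above.

```python
from collections import Counter, defaultdict
from math import log

# Illustrative training sentences (assumptions for this sketch only).
training = [
    "i'm trying to reach the vice president of sales",
    "i want to talk with the vice president of r and d",
    "the vice president of marketing please",
]

# Count, for each word, which words follow it and how often.
bigram_counts = defaultdict(Counter)
for sentence in training:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, word in zip(words, words[1:]):
        bigram_counts[prev][word] += 1

vocab = {w for c in bigram_counts.values() for w in c} | set(bigram_counts)

def bigram_prob(prev, word, alpha=0.1):
    """P(word | prev) with add-alpha smoothing, so unseen word pairs
    get a small but nonzero probability (an assumption of this sketch)."""
    counts = bigram_counts.get(prev, Counter())
    return (counts[word] + alpha) / (sum(counts.values()) + alpha * len(vocab))

def sentence_logprob(sentence):
    """Log probability of a whole word sequence under the bigram model."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(log(bigram_prob(p, w)) for p, w in zip(words, words[1:]))

# A likely word order scores much higher than the nonsense phrase.
print(sentence_logprob("the vice president of sales please"))
print(sentence_logprob("sales sales the please"))
```

The nonsense phrase is built almost entirely from word pairs that never occur in the training sentences, so its score is far lower, and a recognizer weighting hypotheses this way is far more likely to reject it.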

An SLM assigns probabilities not just to individual words, but to entire sequences of words. For example, consider a travel application, where a caller may reply to a prompt with a sentence such as:

“I’d like to travel to London from Twickenham on Thursday.”

Here, it’s possible to predict what will come after the word “to” based on the word that precedes it:

  • The words “like to” are likely to be followed by an action or verb (“travel”).
  • The words “travel to” are likely to be followed by a location (“London”).
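
As a rough sketch of this idea (the reply sentences and names below are assumptions, not part of any product API), the following Python code counts which word follows each two-word history, so the prediction after “to” depends on the word that precedes it:

```python
from collections import Counter, defaultdict

# Illustrative caller replies to a travel prompt (assumed for this sketch).
replies = [
    "i'd like to travel to london from twickenham on thursday",
    "i'd like to travel to paris on friday",
    "i want to travel to york from london",
]

# For each two-word history, count which word follows it.
followers = defaultdict(Counter)
for reply in replies:
    words = reply.split()
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        followers[(w1, w2)][w3] += 1

# The same word "to" leads to different predictions depending on the
# word before it: an action after "like to", a place after "travel to".
print(followers[("like", "to")].most_common(3))
print(followers[("travel", "to")].most_common(3))
```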

An n-gram SLM predicts the probability that a given word will follow the previous n-1 words, where n is the order of the probabilistic model. Consider this phrase: “…to be or not to be…”.

  • If n is 1, the SLM is called a unigram. In a unigram, the probability of a word is not affected by the preceding word. For example, the n-gram sequence would be: …, to, be, or, not, to, be, …
  • If n is 2, the SLM is called a bigram. In a bigram, the probability of each word is calculated based on the previous word. For example, the n-gram sequence would be: …, to be, be or, or not, not to, to be, … A bigram with a vocabulary of size V contains as many as V² probabilities.
  • If n is 3, the SLM is called a trigram. In a trigram, the probability of the next word is calculated based on the two previous words. For example, the n-gram sequence would be: …, to be or, be or not, or not to, not to be, … A trigram with a vocabulary of size V contains as many as V³ probabilities.
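
For illustration, a minimal Python sketch that extracts these unigram, bigram, and trigram sequences from the example phrase (the helper function is hypothetical, not part of any SLM toolkit):

```python
def ngrams(words, n):
    """Return the list of n-grams (tuples of n consecutive words)."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "to be or not to be".split()

print(ngrams(words, 1))  # unigrams: to, be, or, not, to, be
print(ngrams(words, 2))  # bigrams:  to be, be or, or not, not to, to be
print(ngrams(words, 3))  # trigrams: to be or, be or not, or not to, not to be
```

In each case, the SLM estimates the probability of the last word in an n-gram given the n-1 words that precede it.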

You can set the order of an SLM in the training file header, as discussed in SLM training file header.