Grammar file standards

Before discussing the issues involved in actually writing a grammar, here is some background on the standards that apply to Recognizer.

SRGS specification

Recognizer conforms to the W3C Recommendation of March 16, 2004 (“Speech Recognition Grammar Specification”, known hereafter as “the SRGS specification”, or “the W3C specification”).

The SRGS specification describes two grammar syntaxes:

  • ABNF (Augmented BNF) syntax is essentially a combination of traditional BNF and a regular expression language. This syntax is more compact than XML, and more familiar to many voice application developers.
  • XML grammar syntax represents a grammar in an XML document.

Recognizer supports only the XML form of SRGS (referred to as “GrXML” for convenience). In practical terms, this means that you must either write your grammars in GrXML, or convert them from ABNF to GrXML using the abnf2xml utility.
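
For orientation, a minimal GrXML grammar might look like the following sketch. The namespace and header attributes come from the SRGS specification; the rule name (yesno) and the vocabulary are hypothetical and only illustrate the layout.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal SRGS XML (GrXML) grammar. Rule name and words are illustrative only. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="yesno">
  <!-- The root rule matches exactly one of the listed words -->
  <rule id="yesno" scope="public">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
```

This grammar only recognizes words; it returns no semantic values because it contains no <tag> elements.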

ECMAScript

You can include ECMAScript within the <tag> element in your grammars. Recognizer supports ECMA-262. ECMAScript is supported by all three of the possible values for the tag-format attribute in the <grammar> element.

File extensions and formats

Grammar files written in the XML form of SRGS use the .grxml extension:

grammar_filename.grxml 

Recognizer only accepts XML (.grxml) format text files. However, the Recognizer installation package includes an offline grammar compilation tool called sgc, which precompiles grammars into a proprietary format (see Compiling grammars). These compiled binary grammars use a .gram extension:

grammar_filename.gram 

Grammar media types

If you store grammars on a web server, Recognizer can fetch them when performing load and activate functions. Configure the web server to report the following media type for each file extension:

File extension    Media type
.grxml            application/srgs+xml
.gram             application/x-swi-grammar
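
How you declare these mappings depends on the web server. As one sketch, assuming the grammars are served by a Java servlet container such as Apache Tomcat (an assumption; the source does not name a server), the mappings could be added to the web application's deployment descriptor (WEB-INF/web.xml) as follows. Other servers have their own equivalents, for example the AddType directive in Apache HTTP Server.

```xml
<!-- Fragment to place inside the <web-app> element of WEB-INF/web.xml. -->
<!-- Assumes a servlet container; extensions are written without the leading dot. -->
<mime-mapping>
  <extension>grxml</extension>
  <mime-type>application/srgs+xml</mime-type>
</mime-mapping>
<mime-mapping>
  <extension>gram</extension>
  <mime-type>application/x-swi-grammar</mime-type>
</mime-mapping>
```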

Semantic tags

The main purpose of a grammar is to recognize user utterances and translate them into information that the main application can use, generally by assigning values to one or more variables. The SRGS specification does not include any element that assigns a value to a variable directly. Instead, you code these assignments in a semantic scripting language inside matching <tag></tag> elements.

For these semantic tags, your grammars can use W3C syntax or Nuance syntax:

  • The W3C syntax provides general semantic capabilities, and these grammars will work on any recognizer that is SISR-compliant.
  • The Nuance syntax (or SWI syntax) is Nuance’s proprietary syntax. It provides the same capabilities as W3C syntax, and also enables access to additional SWI keys. SWI syntax is not compatible with recognizers other than Nuance Recognizer.

When adding semantic tags to grammars, use the tag-format attribute in the header to define which syntax is used. A single grammar can only use one format; however, a grammar can import subgrammars that use a different format (this is known as mixed-format import). For a detailed discussion, see Importing grammars with mixed formats.
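
As a sketch of the W3C syntax, the following grammar declares tag-format="semantics/1.0" (the script format defined by the SISR specification) in its header and uses ECMAScript inside <tag> elements to assign values. The rule name (quantity), the vocabulary, and the property names (count, isBulk) are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- W3C (SISR) semantic tags. Rule, words, and property names are illustrative only. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="quantity"
         tag-format="semantics/1.0">
  <rule id="quantity" scope="public">
    <one-of>
      <!-- Each tag assigns a property on the rule's "out" object -->
      <item>one <tag>out.count = 1;</tag></item>
      <item>a pair <tag>out.count = 2;</tag></item>
      <!-- Tag content is ECMAScript, so expressions are allowed -->
      <item>a dozen <tag>out.count = 3 * 4;</tag></item>
    </one-of>
    <!-- A later tag in the same rule can read what an earlier tag assigned -->
    <tag>out.isBulk = (out.count >= 12);</tag>
  </rule>
</grammar>
```

Under these assumptions, the utterance "a dozen" would yield a result object whose count is 12 and whose isBulk is true.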

The W3C tag syntax is described in “Semantic Interpretation for Speech Recognition (SISR) Version 1.0 W3C Recommendation” (known hereafter as “the SISR specification”). The SISR specification describes the "meta" object. Recognizer supports it, but with the following differences:

Property                  Note
meta.rulename.score       Recognizer interprets the confidence score as the minimum of all word confidences in the rule.
meta.rulename.starttime   Recognizer does not support this property.
meta.rulename.endtime     Recognizer does not support this property.

The SISR specification also describes the "out" and "rules" objects for semantic tags. Recognizer supports both, and also offers additional objects:

  • SWI: This object is similar to the SISR "out" object.
  • SWIrules: This object is similar to the SISR "rules" object.

These objects enable access to the semantic SWI_ keys.
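
To illustrate how the SISR "out" and "rules" objects work together in W3C syntax, the following sketch copies the result of a referenced rule into the result of the root rule. The rule names (order, size), the vocabulary, and the returned codes are hypothetical; the SWI and SWIrules objects are not shown here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- SISR "out" and "rules" objects. All rule names and values are illustrative only. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="order"
         tag-format="semantics/1.0">
  <rule id="order" scope="public">
    a <ruleref uri="#size"/>
    <!-- rules.size holds the value that the "size" rule assigned to its own "out" -->
    <tag>out.size = rules.size;</tag>
    pizza
  </rule>
  <rule id="size">
    <one-of>
      <item>small <tag>out = "S";</tag></item>
      <item>large <tag>out = "L";</tag></item>
    </one-of>
  </rule>
</grammar>
```

Under these assumptions, "a large pizza" would produce a result whose size property is "L".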