Writing a grammar main body
The main body of a grammar consists of rules. Each rule is a child of <grammar>, and defined in a separate <rule> element, which will use some combination of words and contained child elements (<one-of>, <item>, or <ruleref> elements) to define the rule. The recognizable words are entered as text. By default, Recognizer interprets all text as automatic <token> content unless otherwise specified.
For full details on GrXML elements and their attributes, refer to the SRGS specification.
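As a minimal sketch of this layout, the following grammar has a single root rule that combines plain text with a <one-of> list. The header attributes (version, xmlns, xml:lang, root) are the standard SRGS ones; the rule name and vocabulary are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical example: rule id and vocabulary are illustrative only -->
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="drink">
  <rule id="drink">
    <!-- Plain text is treated as token content -->
    I would like
    <one-of>
      <item>coffee</item>
      <item>tea</item>
    </one-of>
  </rule>
</grammar>
```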
GrXML elements
Sample grammar file provides an example of how GrXML elements can be used in the main body of a grammar. These elements are described here:
The <example> element contains text that represents an example of an acceptable response or utterance to the current rule. This text is ignored by Recognizer, but serves as a guideline for human readers.
An <example> can also be read by testing and parsing tools to create a sample script for testing purposes. See Test the grammars.
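For instance, a rule might carry one <example> for each typical utterance. In SRGS, the <example> elements are placed at the start of the rule, before the words and elements that define it. The rule below is a hypothetical sketch, not taken from the sample grammar.

```xml
<rule id="action">
  <!-- Ignored during recognition; documents what a caller might say -->
  <example> reserve </example>
  <example> cancel please </example>
  <one-of>
    <item>reserve</item>
    <item>cancel</item>
  </one-of>
  <item repeat="0-1">please</item>
</rule>
```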
The <item> element defines words and phrases that the user can say. These expressions are written as text, and interpreted by Recognizer.
Any word that appears in an <item> is mandatory; users must say it in order for Recognizer to accept their utterances. Because this requirement is strict for a voice application, two methods are commonly used to relax it and prevent incorrect rejections:
- When there are several acceptable alternatives, the <item> element is often put within a <one-of> list that includes an <item> for each option. The user must then say only one of those items in order to fall within the grammar.
- For optional utterances, the <item> element includes a repeat attribute that indicates the number of times the item may be repeated by the user. If the repeat attribute is set to "0-1", the item may be said once, or not at all.
For example, the filler items in the following rule are optional:
<rule id="gettime">
<item repeat="0-1"> I want to </item>
<item repeat="0-1"> I wanna </item>
<item repeat="0-1"> I would like to </item>
<item repeat="0-1"> Let me </item>
<one-of>
<item>departure</item>
<item>depart</item>
<item>leave</item>
</one-of>
<item>at</item>
<ruleref uri="#time"/>
<item repeat="0-1">please</item>
</rule>
An <item> can include tokens (the default) and elements. Each text word within an <item> is automatically a token unless it appears within a <tag> element.
The <one-of> element defines a list of acceptable words and phrases. If a user says any one of them, that utterance is accepted by Recognizer as falling within the grammar. The <one-of> element lets you list many different replies to a given prompt, and thus include different ways of making the same choice.
In the sample grammar, <one-of> is used to define the many acceptable variations of “yes” and “no”: “yes”, “yeah”, “correct”, “right”, and so on.
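The sample grammar is not reproduced here, so the following is only a sketch of how such a list might look. The "yes" variations are the ones named above; the "no" variations and the rule id are assumptions.

```xml
<rule id="YesNo">
  <one-of>
    <!-- Variations of "yes" -->
    <item>yes</item>
    <item>yeah</item>
    <item>correct</item>
    <item>right</item>
    <!-- Variations of "no" (hypothetical choices) -->
    <item>no</item>
    <item>nope</item>
    <item>incorrect</item>
  </one-of>
</rule>
```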
The main body of a grammar file consists of rules defined with the <rule> element. Each rule has a unique identifier and a scope (public or private), and specifies the actions to be taken when a user utterance matches the rule.
- Private rules can only be referenced by another rule in the same grammar.
- Public rules can be referenced independently from the rest of the grammar file. This is useful for rules that may have to be consulted frequently in your application. For example, universal commands that can be invoked at any time can be public; or in a banking application, a single rule may be used to recognize an account number at many different stages of a transaction.
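The two scopes might be combined as in the sketch below, which is a fragment to be placed inside a <grammar> element. The scope attribute and its values are standard SRGS; the rule names and the eight-digit account length are hypothetical.

```xml
<!-- Public: can be referenced from outside this grammar file -->
<rule id="AccountNumber" scope="public">
  <!-- Assumption: account numbers have exactly eight digits -->
  <item repeat="8"><ruleref uri="#Digit"/></item>
</rule>

<!-- Private: can be referenced only by rules in this grammar -->
<rule id="Digit" scope="private">
  <one-of>
    <item>zero</item> <item>one</item> <item>two</item> <item>three</item>
    <item>four</item> <item>five</item> <item>six</item> <item>seven</item>
    <item>eight</item> <item>nine</item>
  </one-of>
</rule>
```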
Note: Do not use reserved words for the rule identifier, for example, Call, Block, Function, Object, Array, Boolean, Date, Math, Number, Error, and so on. (These words are case-sensitive.)
The <ruleref> element refers to a rule. The rule it references may appear within the same grammar, or in a different grammar file; or it may be a special rule.
If the rule exists in a grammar file, you must specify the location of the rule using the uri attribute. For example:
<ruleref uri="#YesNo"/>
<ruleref uri="./universals#BuySell"/>
As shown, the rule identifier must always be preceded with a hash mark (#).
The <ruleref> element is always an empty element; it never contains other elements. It therefore has no closing tag: its single tag must end with a slash before the closing angle bracket (/>) to be syntactically correct.
To use a special rule, you must use the "special" attribute instead of a uri:
<ruleref special="NULL"/>
The three special rules are explained under Special rules (NULL, VOID, and GARBAGE).
The <tag> element defines semantic information or scripts that must be parsed and evaluated. The type of scripts allowed is determined by the value assigned to the tag-format attribute of the header <grammar> element.
Instructions in a <tag> element are typically used to assign or calculate the values for variables, which are then returned to the main application.
For a discussion of scripts and semantic interpretation, see Scripts, tags, and semantic interpretation.
The <token> element identifies text that Recognizer will interpret as a pronunciation. By default, text in a grammar file is treated as a token unless otherwise specified (for example, it appears within a <tag> element set).
You can use the <token> element to assign a different language to a particular word or phrase within an <item> by using the xml:lang attribute.
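As a hedged illustration, the rule below marks one word for French (Canada) pronunciation inside an otherwise en-US rule. The rule id and vocabulary are hypothetical, and the sketch assumes that the fr-CA language is installed alongside the grammar's main language.

```xml
<rule id="pastry">
  I would like a
  <one-of>
    <item>muffin</item>
    <!-- Hypothetical: pronounce this word using fr-CA rules -->
    <item><token xml:lang="fr-CA">croissant</token></item>
  </one-of>
</rule>
```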
Allowed symbols and digit strings
You can use certain non-alphabetic symbols and strings of digits in your grammars, but Recognizer’s interpretation of them will depend on the grammar language. For best results, it is recommended that you spell out vocabulary items in your grammars as words; avoid using digit strings and symbols if possible.
Consider the following two items:
<item> 50% </item>
<item> fifty percent</item>
In this example, the second item is preferable because it explicitly defines what words can be spoken, and avoids using non-alphabetic symbols (the percent sign) and strings of digits (50).
When you use a digit string or other abbreviation, you cannot be certain about phrases covered by the grammar. Always test the pronunciations (see Checking pronunciations with dicttest). What is usable in one language may not be in another. Some digit strings may not generate any pronunciation and therefore may interfere with grammar compilation. To avoid such failures, set swirec_enable_robust_compile in a <meta> element in the grammar.
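A sketch of such a setting follows. The <meta name="..." content="..."/> form is the standard SRGS way to attach a parameter to a grammar; the content value of 1 is an assumption here, so confirm the accepted values in the parameter reference. The rule itself is hypothetical.

```xml
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="amount">
  <!-- Assumption: a content value of 1 enables the parameter -->
  <meta name="swirec_enable_robust_compile" content="1"/>
  <rule id="amount">
    <one-of>
      <!-- Preferred: spelled out as words -->
      <item>fifty percent</item>
      <!-- Riskier: depends on how the language interprets digits and symbols -->
      <item>50%</item>
    </one-of>
  </rule>
</grammar>
```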
Recognizer interprets some symbols in each language automatically (such as the percent symbol "%" and dollar sign "$" in en-US), but most symbols cause an error unless defined in a dictionary. Problem symbols include:
Character name | Character | Character name | Character
---|---|---|---
hyphen | - | period | .
underscore | _ | comma | ,
opening parenthesis | ( | forward slash | /
closing parenthesis | ) | question mark | ?
single quotation mark | ' | |
Double quotation marks (") are never allowed inside vocabulary items; they are reserved as delimiters.
Use digit strings cautiously to avoid problems:
- Individual digits (0–9) are acceptable. Recognizer interprets them as the number they represent (zero to nine).
- Strings of digits with matching entries in a user dictionary are acceptable. If a grammar covers the digits “32564,” and you provide a dictionary pronunciation for that number, recognition accuracy remains high.
- For random strings of digits, use extra caution and remember that spelling the names of the numbers will get better recognition accuracy.
For example, the phrase “one hundred and twenty three” generates accurate pronunciations and gets high accuracy. But the same phrase as digits “123,” with no matching entry in a user dictionary, generates additional dissimilar pronunciations (for example, “one two three," "one hundred and twenty three," "one hundred twenty three," "twelve three," and so on), and decreased accuracy.
- Some languages limit the length of string to avoid accuracy problems. If you exceed the allowed length, you get a parsing error similar to this:
SWI_ERROR_GENERIC| error| lookupIndividualWords | Could not generate pronunciation for phrase '1234' (lang en-gb).
If Recognizer cannot generate a pronunciation for a symbol, the grammar compilation fails. However, you can instruct Recognizer to ignore failures and continue compiling by using swirec_enable_robust_compile.
As noted above, the rules for translating symbols and digit strings into pronunciations are different for each language. For example, in US English the phrase “$5” is translated to “five dollars”; in other languages, the dollar sign may translate to something else. Similarly, the digit string “519” is translated to “five hundred nineteen” in en-US, but another language might translate the digits individually as “5”, “1”, and “9”, or not translate the string successfully.
To check the translation of any symbol or number for any installed language, use the dicttest tool (see Checking pronunciations with dicttest).
All grammars (and any embedded ECMAScript code) must respect characters reserved by the XML standard. For example, the ampersand "&" functions as an escape character: any XML or GrXML parser will interpret it as the beginning of code, rather than as the ampersand character itself.
To represent special characters, you must “escape” them: encode them so they will be interpreted correctly. The basic code for each such character consists of an ampersand followed by a letter or number code, and ending in a semi-colon.
The characters that must be escaped to ensure correct interpretation include:
Character name | XML code
---|---
quote (") | &quot;
apostrophe (') | &apos;
ampersand (&) | &amp;
less than (<) | &lt;
greater than (>) | &gt;
For example, to encode the company name AT&T, you would have to represent the ampersand with its XML code equivalent:
<item>AT&amp;T</item>
Escaped characters in URLs
Here is the general form of the URL:
http://myServer/myDir/myGrammar.gram?x=y
Any needed escapes occur in the x and y expressions. Do not escape the equals sign (=).
Escaped characters in ECMAScript
This restriction on escaped characters applies somewhat differently to characters that appear in a script. In ECMAScript, you must escape double quotes differently, and also escape any backslash characters:
Character name | Escape code
---|---
quote (") | \"
apostrophe (') | \'
backslash (\) | \\
Remember also that in any information returned to the application, you may have to use XML escape codes for certain characters (quote, ampersand, apostrophe, less/greater than) as described in Escaped characters.
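The two kinds of escaping can appear side by side, as in the hypothetical rule below. The XML parser resolves &amp; before the script is evaluated, while the backslash escape is handled by ECMAScript itself. The variable names follow the style of the other examples in this section and are assumptions, as is the vocabulary; the spoken text in the second item is written without an apostrophe because the single quotation mark is one of the problem symbols in vocabulary items.

```xml
<rule id="lookup">
  <one-of>
    <!-- XML escape: &amp; stands for a literal ampersand -->
    <item> AT&amp;T <tag>company='AT&amp;T'</tag> </item>
    <!-- ECMAScript escape: backslash before an apostrophe in a single-quoted string -->
    <item> the owners manual <tag>doc='owner\'s manual'</tag> </item>
  </one-of>
</rule>
```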
Within a grammar, a word is defined as a unit of the grammar separated by whitespace. The following grammar excerpt has 5 words:
<item> the destination is San Francisco </item>
The underscore character (_) can be used to link two words to create a single word that will be recognized as a single unit. For example, “San Francisco” could be written as “San_Francisco” to make the grammar 4 words:
<item> the destination is San_Francisco </item>
As a general rule, do not use the underscore character (_) to join words into phrases in your grammars. Underscores can improve recognition accuracy of the joined phrase; but they can also degrade accuracy if used improperly. If you do add an underscore to a phrase, be sure to test its effects thoroughly.
For more discussion, see A phrase can have a pronunciation.
A URI can be composed of two parts separated by a question mark (?). The first part indicates a transport protocol (such as http or ftp) and a path location. The second part contains additional information in the form of key/value pairs, separated by semi-colons. Such key/value pairs are often used to define constraints for built-in grammars. For example, consider the URI below:
builtin:grammar/date?language=ko-KR;minallowed=20010101;
maxallowed=20011231;minexpected=20010301;maxexpected=20010430
Here, builtin:grammar/date identifies the protocol and path, while everything after the question mark consists of key/value pairs separated by semi-colons.
A URI can include key/value pairs that contain Recognizer-specific information (see SWI_vars):
file://mygrammars/birthdate.grxml?SWI_vars.today=20090922
When interpreting the application’s URIs, Recognizer extracts and removes Recognizer-specific information (as indicated by the SWI_vars. prefix) and passes the remaining string to the internet fetching mechanism.
For security purposes, you can prevent Recognizer from logging the values of any key/value pairs in the URI.
Recognizer assumes that the URIs use the semicolon and ampersand characters (; and &) as delimiters to distinguish Recognizer-specific from fetch-specific information. If you need these characters in the URI string sent to the internet server, you can assign a different Recognizer delimiter using the swirec_inet_query_delimiters parameter.
Certain ASCII characters are not allowed in a URI, and are treated as blank spaces. For example, the plus symbol (+) will be treated as a space. To include such characters in a URI, you must instead use the appropriate escape sequence. For example, you would have to encode a plus sign as %2B instead. Do not escape the equals sign (=).
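As a small sketch based on the general URL form used in this section, the reference below passes the hypothetical value C++ in the query string. Each plus sign is written as %2B, and the equals sign is left as it is.

```xml
<!-- Hypothetical query value "C++": each plus sign is encoded as %2B -->
<ruleref uri="http://myServer/myDir/myGrammar.gram?x=C%2B%2B"/>
```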
Special rules (NULL, VOID, and GARBAGE)
GrXML offers three special rules that simplify grammar development:
- NULL: defines a rule that is matched automatically.
- VOID: defines a rule that cannot be spoken.
- GARBAGE: defines a rule that matches any speech up to the next rule match, the next token, or to the end of spoken input.
These names are all reserved in GrXML, so you must not use them for rules of your own creation. Recognizer will interpret them automatically. To invoke one of these rules, use the special attribute of the <ruleref> element as shown:
<ruleref special="NULL"/>
Details on each of these rules appear below.
The NULL rule automatically matches a user utterance, regardless of what the caller actually said. You can use NULL to supply default values for a variable:
<ruleref special="NULL" tag="TEMP='HOT';"/>
In this example, the TEMP variable is set to the value HOT automatically, regardless of what the user said.
The NULL rule can be used to initialize the values of variables (for example, a variable used in a recursive loop), or simply as a placeholder for a rule that will be defined more fully at a later time.
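One way to picture the default-value use is a list in which the last alternative matches without any speech. The sketch below is hypothetical: it uses <tag> elements in the style of the other examples in this section rather than the tag attribute, and the rule id and vocabulary are assumptions.

```xml
<rule id="temperature">
  <one-of>
    <item> hot <tag>TEMP='HOT'</tag> </item>
    <item> cold <tag>TEMP='COLD'</tag> </item>
    <!-- Matches automatically when neither word is spoken; supplies the default -->
    <item> <ruleref special="NULL"/> <tag>TEMP='HOT'</tag> </item>
  </one-of>
</rule>
```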
The VOID rule defines an utterance that is automatically rejected when spoken. It can be used during testing to disable a given branch of your grammar, effectively commenting out that section of the GrXML. For example:
<rule id="action">
<one-of>
<item> reserve <tag>command='reserve'</tag> </item>
<item> cancel <tag>command='cancel'</tag> </item>
<item><ruleref special="VOID"/> check
<tag>command='check'</tag> </item>
</one-of>
</rule>
In this example, any utterance containing the word “check” will be rejected.
Avoid activating VOID rules along with other rules, as they lower the confidence scores of recognition matches. This happens because the VOID rules will match silence at the beginning of utterances. The effect is not usually significant, but is magnified if the grammars have high weight values.
The GARBAGE rule may match any speech up to the next rule, token, or end of spoken input. This rule is used to create a placeholder for extra words that are not part of desired recognitions—in other words, the GARBAGE rule lets you account for noise or other forms of speech, like hesitations and false starts, that are not described explicitly in the grammar.
- Short, monosyllabic utterances might not trigger the garbage rule as effectively as longer utterances.
- You cannot use both the GARBAGE rule and a constraint list in the same grammar. See the example that follows, and Constraint lists.
The following example tries to capture possible environmental noise and also some extra speech like "Please, hum, ah,…":
...
<ruleref special="GARBAGE"/>
I
<one-of>
<item>want</item>
<item>need</item>
</one-of>
<ruleref special="GARBAGE"/>
<item>a taxi</item>
<ruleref special="GARBAGE"/>
...
This example uses a garbage rule to spot the word “taxi” inside any sentence:
<ruleref special="GARBAGE"/> taxi <ruleref special="GARBAGE"/>