Writing a grammar main body

<item>

<one-of>

<rule>

<ruleref>

The <ruleref> element refers to a rule. The referenced rule may be defined in the same grammar or in a different grammar file, or it may be a special rule.

To reference a rule that is defined in a grammar, whether in the same file or in another grammar file, specify the rule's location with the uri attribute. For example:

<ruleref uri="#YesNo"/>

<ruleref uri="./universals#BuySell"/>

As shown, the rule identifier must always be preceded with a hash mark (#).

The <ruleref> element is always empty: it never contains other elements or text. Write it as a single self-closing tag, with a slash before the closing angle bracket (/>), rather than as a separate opening and closing tag.
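The following sketch places both forms of reference in a minimal grammar. The rule names answer and YesNo, the xml:lang value, and the contents of ./universals are assumptions for illustration. Under the W3C SRGS specification, a rule referenced from another grammar file (here, BuySell) must be declared with scope="public" in that file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="answer">
  <!-- Hypothetical top-level rule that accepts either kind of answer -->
  <rule id="answer" scope="public">
    <one-of>
      <!-- Local reference: YesNo is defined later in this same file -->
      <item><ruleref uri="#YesNo"/></item>
      <!-- External reference: assumes ./universals defines a public BuySell rule -->
      <item><ruleref uri="./universals#BuySell"/></item>
    </one-of>
  </rule>
  <rule id="YesNo">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
```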

To use a special rule, you must use the "special" attribute instead of a uri:

<ruleref special="NULL"/>

The three special rules are explained under Special rules (NULL, VOID, and GARBAGE).

<tag>

<token>

Character name	Character	Character name	Character
hyphen	-	period	.
underscore	_	comma	,
opening parenthesis	(	forward slash	/
closing parenthesis	)	question mark	?
single quotation mark	'

Overriding an error

The effect of grammar language

Escaped characters

All grammars (and any embedded ECMAScript code) must respect the characters reserved by the XML standard. For example, the ampersand (&) introduces an escape code: any XML or GrXML parser will interpret it as the beginning of an escape sequence rather than as a literal ampersand.

To represent a reserved character, you must “escape” it: encode it so that it will be interpreted correctly. Each escape code consists of an ampersand, followed by a name or number, and ending with a semicolon.

The characters that must be escaped to ensure correct interpretation include:

Character name	XML code
quote (")	&quot;
apostrophe (')	&apos;
ampersand (&)	&amp;
less than (<)	&lt;
greater than (>)	&gt;

For example, to encode the company name AT&T, you would have to represent the ampersand with its XML code equivalent:

<item>AT&amp;T</item>
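The same rule applies inside <tag> content, which the XML parser reads before any ECMAScript is evaluated. In this sketch, the variable name label is hypothetical. The ampersand and the angle brackets in the string are written with their XML codes so that the parser does not mistake them for an escape sequence or the start of an element.

```xml
<item>
  Q and A
  <!-- After XML parsing, the script engine receives: label='Q&A <beta>'; -->
  <tag>label='Q&amp;A &lt;beta&gt;';</tag>
</item>
```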

Escaped characters in URLs

Here is the general form of the URL:

http://myServer/myDir/myGrammar.gram?x=y

Apply any needed escapes only within the key (x) and value (y) expressions. Do not escape the equals sign (=) that separates them.
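As a sketch, the reference below passes two hypothetical key/value pairs, city and region. The space in the city value is percent-encoded as %20 and the equals signs are left as-is. Because the URL sits inside an XML attribute, the ampersand that separates the two pairs is written with its XML code, &amp;.

```xml
<!-- Hypothetical keys and values; the XML parser turns &amp; into & before the URL is used -->
<ruleref uri="http://myServer/myDir/myGrammar.gram?city=San%20Francisco&amp;region=west"/>
```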

Escaped characters in ECMAScript

Escaping works somewhat differently for characters that appear in a script. Inside an ECMAScript string literal, you escape quotes and backslashes by preceding them with a backslash:

Character name	Escape code
quote (")	\"
apostrophe (')	\'
backslash (\)	\\

Remember also that in any information returned to the application, you may have to use XML escape codes for certain characters (quote, ampersand, apostrophe, less/greater than) as described in Escaped characters.
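The sketch below combines both layers of escaping; the variable names note and path are hypothetical. Inside the ECMAScript string literals, the double quotes and the backslash are escaped with backslashes. Because the tag content is still XML text, the ampersand also keeps its XML code.

```xml
<item>
  AT and T
  <!-- The script engine receives: note='say \"AT&T\"'; path='C:\\grammars';
       so note holds: say "AT&T"   and path holds: C:\grammars -->
  <tag>note='say \"AT&amp;T\"'; path='C:\\grammars';</tag>
</item>
```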

Words, underscores, pronunciations, and accuracy

Within a grammar, a word is defined as a unit of the grammar separated by whitespace. The following grammar excerpt has 5 words:

<item> the destination is San Francisco </item>

The underscore character (_) can be used to link two words to create a single word that will be recognized as a single unit. For example, “San Francisco” could be written as “San_Francisco” to make the grammar 4 words:

<item> the destination is San_Francisco </item>

As a general rule, do not use the underscore character (_) to join words into phrases in your grammars. Underscores can improve recognition accuracy for the joined phrase, but they can also degrade accuracy if used improperly. If you do add an underscore to a phrase, test its effects thoroughly.

For more discussion, see A phrase can have a pronunciation.

URIs and delimiters

A URI can be composed of two parts separated by a question mark (?). The first part indicates a transport protocol (such as http or ftp) and a path location. The second part contains additional information in the form of key/value pairs, separated by semicolons. Such key/value pairs are often used to define constraints for built-in grammars. For example, consider the URI below:

builtin:grammar/date?language=ko-KR;minallowed=20010101;maxallowed=20011231;minexpected=20010301;maxexpected=20010430

Here, builtin:grammar/date is the protocol and path, while everything after the question mark consists of key/value pairs separated by semicolons.
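As a sketch, assuming your Recognizer version accepts builtin URIs in the uri attribute of a <ruleref>, the reference below reuses two of the constraints shown above. The semicolons separating the key/value pairs need no XML escaping.

```xml
<!-- Accept only dates in 2001, using the minallowed/maxallowed keys shown above -->
<ruleref uri="builtin:grammar/date?minallowed=20010101;maxallowed=20011231"/>
```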

A URI can include key/value pairs that contain Recognizer-specific information (see SWI_vars):

file://mygrammars/birthdate.grxml?SWI_vars.today=20090922

When interpreting the application’s URIs, Recognizer extracts and removes Recognizer-specific information (as indicated by the SWI_vars. prefix) and passes the remaining string to the internet fetching mechanism.
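For illustration, the reference below mixes a Recognizer-specific pair with a hypothetical fetch parameter, region=west, separated by a semicolon (one of the default delimiters). Going by the description above, Recognizer would be expected to consume SWI_vars.today itself and pass only region=west on to the fetching mechanism.

```xml
<!-- SWI_vars.today is read by Recognizer; region=west (hypothetical) is left for the web server -->
<ruleref uri="http://myServer/myDir/birthdate.grxml?SWI_vars.today=20090922;region=west"/>
```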

For security purposes, you can prevent Recognizer from logging the values of any key/value pairs in the URI.

Recognizer assumes that the URIs use the semicolon and ampersand characters (; and &) as delimiters to distinguish Recognizer-specific from fetch-specific information. If you need these characters in the URI string sent to the internet server, you can assign a different Recognizer delimiter using the swirec_inet_query_delimiters parameter.

Certain ASCII characters cannot be used literally in a URI. For example, a plus sign (+) is treated as a space. To include such a character in a URI, use the appropriate escape sequence instead; for example, encode a plus sign as %2B. Do not escape the equals sign (=).
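For example, to pass the hypothetical value C+ in a query key named level, percent-encode the plus sign and leave the equals sign unescaped:

```xml
<!-- %2B reaches the server as "+"; a literal + would be read as a space -->
<ruleref uri="http://myServer/myDir/myGrammar.gram?level=C%2B"/>
```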

NULL rule

The NULL rule is matched automatically, without the caller speaking any words, regardless of what the caller actually said. You can use NULL to supply default values for a variable:

<ruleref special="NULL" tag="TEMP='HOT';"/>

In this example, the TEMP variable is set to the value HOT automatically, regardless of what the user said.

The NULL rule can be used to initialize the values of variables (for example, a variable used in a recursive loop), or simply as a placeholder for a rule that will be defined more fully at a later time.
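The sketch below shows both uses; the rule names size and payment and the SIZE variable are hypothetical. In the size rule, NULL consumes no speech, so the tag that follows it always runs and sets a default that a spoken size can then override. The payment rule is a placeholder that matches automatically until real items replace it.

```xml
<rule id="size">
  <!-- Default: NULL matches without speech, so this tag always runs first -->
  <ruleref special="NULL"/>
  <tag>SIZE='MEDIUM';</tag>
  <!-- Optional spoken size; its tag overrides the default -->
  <item repeat="0-1">
    <one-of>
      <item> small <tag>SIZE='SMALL';</tag> </item>
      <item> large <tag>SIZE='LARGE';</tag> </item>
    </one-of>
  </item>
</rule>

<rule id="payment">
  <!-- Placeholder: always matches until real payment items are written -->
  <ruleref special="NULL"/>
</rule>
```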

VOID rule

The VOID rule can never be matched: any sequence of items that contains it can never be recognized. It can be used during testing to disable a given branch of your grammar, effectively commenting out that section of the GrXML. For example:

<rule id="action">
    <one-of>
        <item> reserve <tag>command='reserve'</tag> </item>
        <item> cancel <tag>command='cancel'</tag> </item>
        <item> <ruleref special="VOID"/> check <tag>command='check'</tag> </item>
    </one-of>
</rule>

In this example, any utterance containing the word “check” will be rejected.

Avoid activating VOID rules along with other rules, as they lower the confidence scores of recognition matches. This happens because the VOID rules will match silence at the beginning of utterances. The effect is not usually significant, but is magnified if the grammars have high weight values.

GARBAGE rule

The GARBAGE rule may match any speech up to the next rule, token, or end of spoken input. This rule is used to create a placeholder for extra words that are not part of desired recognitions—in other words, the GARBAGE rule lets you account for noise or other forms of speech, like hesitations and false starts, that are not described explicitly in the grammar.

Notes:

Short, monosyllabic utterances might not trigger the garbage rule as effectively as longer utterances.
You cannot use both the GARBAGE rule and a constraint list in the same grammar. See the example that follows, and Constraint lists.

The following example tries to capture possible environmental noise and also some extra speech like "Please, hum, ah,…":

...
<ruleref special="GARBAGE"/>
<one-of>
  <item>want</item>
  <item>need</item>
</one-of>
<ruleref special="GARBAGE"/>
<item>a taxi</item>
<ruleref special="GARBAGE"/>
...

This example uses a garbage rule to spot the word “taxi” inside any sentence:

<ruleref special="GARBAGE"/> taxi <ruleref special="GARBAGE"/>
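To show the word-spotting fragment in context, here is a minimal, self-contained grammar built around it. The root rule name taxi_spotter and the en-US language are assumptions; the rule body is the fragment above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="taxi_spotter">
  <!-- GARBAGE absorbs any speech before and after the word "taxi" -->
  <rule id="taxi_spotter" scope="public">
    <ruleref special="GARBAGE"/>
    taxi
    <ruleref special="GARBAGE"/>
  </rule>
</grammar>
```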
