Troubleshooting grammars
Once you have tested your grammar and identified errors, you can correct them. Unfortunately, it isn’t always easy to determine exactly what is causing an error, especially if your grammar is very large.
Hard-coded limits
Remember that Recognizer has a few precompiled limits that affect grammars and recognition. Typical grammars will never encounter these limits, but they do exist, and you may encounter problems if they are exceeded.
- Maximum number of simultaneous languages is 20.
- Maximum size of the n-best list is 100 entries.
- Maximum length of a rule name is 256.
- Maximum length of a word or underscored phrase is 256.
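As a rough pre-check for the two length limits above, a short script can scan a grammar file before it is loaded. This is a minimal sketch, not part of Recognizer: the function name find_limit_violations is hypothetical, and it assumes the limits are counted in characters, that rule names are the id attributes of <rule> elements, and that the grammar uses the standard SRGS namespace.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"  # standard SRGS XML namespace
MAX_RULE_NAME = 256  # limit quoted in the list above (assumed to be characters)
MAX_TOKEN = 256      # limit quoted in the list above (assumed to be characters)


def find_limit_violations(grxml_text):
    """Return (kind, value) pairs for rule names or words over the limits."""
    root = ET.fromstring(grxml_text)
    violations = []

    # Rule names: the id attribute of each <rule>.
    for rule in root.iter(f"{{{SRGS_NS}}}rule"):
        name = rule.get("id", "")
        if len(name) > MAX_RULE_NAME:
            violations.append(("rule", name))

    # Words and underscored phrases: whitespace-separated tokens that sit
    # directly inside an <item> (its own text plus the text after each child).
    for item in root.iter(f"{{{SRGS_NS}}}item"):
        chunks = [item.text or ""] + [child.tail or "" for child in item]
        for word in " ".join(chunks).split():
            if len(word) > MAX_TOKEN:
                violations.append(("word", word))

    return violations
```

An empty list means neither limit was exceeded under these assumptions. The limits on simultaneous languages and n-best size are runtime settings and are not checked here.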
Syntax errors
Many errors are syntax errors. GrXML is unforgiving of such errors, and Recognizer will not always give a good indication of the problem when loading a grammar that is not well-formed XML. To locate them, it can help to view the file in a Web browser, which may give more informative syntax error messages than the testing tools.
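A general-purpose XML parser can serve the same purpose as the browser. The following minimal sketch uses Python's standard xml.etree.ElementTree module to report where parsing fails. The helper name check_well_formed is hypothetical, and the check covers only XML well-formedness, not validity against the SRGS specification.

```python
import xml.etree.ElementTree as ET


def check_well_formed(grxml_text):
    """Return None if the text parses as XML, else (line, column, message)."""
    try:
        ET.fromstring(grxml_text)
    except ET.ParseError as err:
        line, column = err.position  # location where the parser gave up
        return (line, column, str(err))
    return None


# A <ruleref> that is missing its closing slash, so </item> no longer matches.
broken = '<rule id="main">\n  <item><ruleref uri="#X"></item>\n</rule>'
print(check_well_formed(broken))
```

The reported position is where the parser detected the problem, which is often slightly after the actual mistake (here, at the </item> that follows the unclosed <ruleref>).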
When you do encounter problems, look for the following errors in the file:
- The most common mistakes are:
  - Forgetting to close an element with a closing delimiter (</item>, </rule>).
  - Incorrect nesting (for example, putting a <rule> within an <item>).
- A very common mistake is to forget to close a <ruleref>. Since <ruleref> is a self-closing element, each <ruleref> delimiter must end with a close:
  <ruleref uri="#X"/>
  where the /> indicates the closing. It is very common to leave out the forward slash at the end, and miswrite this as <ruleref uri="#X">.
- Another common mistake is to forget to use the "#" in a <ruleref>; for example, you may accidentally put <ruleref uri="localrule"/> rather than <ruleref uri="#localrule"/>. In this case, the parser interprets "localrule" as a URI, and generates an error when unable to find it.
- You may find it useful to edit the grammar with an XML-aware editor. Your grammar may be well-formed XML, but not a valid grammar according to the SRGS specification and its extensions. Use the testing tools described in Test the grammars to identify any invalid lines.
- Another common mistake is to forget to escape special characters in XML or ECMAScript (see Allowed symbols and digit strings). Be aware that such characters are used especially often in scripting.
- Within a grammar, the most common oversight is to forget to escape an apostrophe using the escape sequence "&apos;", as in this unescaped item:
<item>o'reilly</item>
The correct form is:
<item>o&apos;reilly</item>
- Certain ASCII characters are not allowed in a URI. For example, the plus symbol (+) in a URI must be escaped. To include an actual plus symbol, you must encode it as %2B. For example, consider these two URI specifications:
uri="a.grxml?SWI_vars.xxx=yyy%2Bzzz;"
uri="a.grxml?SWI_vars.xxx=yyy+zzz;"
Both URIs are valid, but Recognizer will interpret them respectively as:
xxx='yyy+zzz';
xxx='yyy zzz';
To learn about encoding reserved characters, see the specification for URIs.
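When grammar URIs are built programmatically, a standard percent-encoding routine avoids this problem. The sketch below uses Python's urllib.parse. The helper name build_grammar_uri is hypothetical, and unquote_plus is used only to imitate the interpretation described above (an unescaped plus read as a space); it is not Recognizer's own decoder.

```python
from urllib.parse import quote, unquote_plus


def build_grammar_uri(grammar, var_name, value):
    """Build a grammar URI with one SWI_vars value, percent-encoding the value."""
    encoded = quote(value, safe="")  # '+' becomes '%2B', a space becomes '%20'
    return f"{grammar}?SWI_vars.{var_name}={encoded};"


uri = build_grammar_uri("a.grxml", "xxx", "yyy+zzz")
print(uri)

# Imitating the two interpretations described above:
print(unquote_plus("yyy%2Bzzz"))  # escaped plus is read back as a plus
print(unquote_plus("yyy+zzz"))    # unescaped plus is read back as a space
```

Encoding only the value, rather than the whole URI, leaves the "?", "=", and ";" separators intact.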
Using a grammar dump directory
A grammar dump directory stores copies of grammars that are generated dynamically in response to runtime conditions. This makes it possible to verify and debug grammars that have been created at runtime on remote servers. Whenever Recognizer dumps a copy of a grammar, it writes the SWIiffi event to the call log, along with the filename of the dumped file.
To use a grammar dump directory, specify its location in the GrammarDumpDirectory parameter and the amount of disk space to allocate, measured in Kbytes, in the GrammarDumpDirectorySize parameter. If this space limit is reached, Recognizer removes previously written grammars (first in, first out) before writing new ones.
Both parameters may be specified in the SpeechWorks.cfg configuration file (located in %SWISRSDK%\config), or as environment variables.