Testing grammars

parseTool and test_parser, described in detail below, are tools designed specifically for testing grammars. Both ship with Recognizer and are stored in the %SWISRSDK%\bin directory.

Also useful are the following utilities:

  • acc_test for recognition accuracy testing, which takes one or more prepared scripts as input.
  • dicttest for checking dictionary pronunciations.

Using parseTool

Use the parseTool program to test a single grammar interactively. It lets you type sentences into the grammar to see how the grammar handles them. You can test grammar coverage, interpretation, ambiguity, and overgeneration.

To use parseTool, navigate to the %SWISRSDK%\bin directory, and enter a command with the following format at the prompt:

parseTool grammarfile.grxml [option1 arg1] [option2 arg2] [...]

Where grammarfile is the path and name of the grammar file to be tested. The options most often used for regular GrXML grammars are described in the table that follows:

Note: Some parseTool options only apply for natural language grammars.

Option

Description

-debug_output

Prints information about ECMAScript operations. Used with -test_sentences and -test_file. Can be abbreviated to -d_o.

-dump_parser filename

Prints parser information to the specified file.

-gen_file filename

Generates random output sentences from the grammar to the specified file. Use with -max_gen to specify a number of sentences to be generated.

-gen_sentences

Generates random output sentences from the grammar. Used to detect overgeneration. Use with -max_gen to specify a number of sentences to be generated.

Can be abbreviated to -g_s.

-iso8859

Specifies the encoding format of the input and output files as ISO-8859. Used to override UTF-8 format when UTF-8 is the default.

-max_gen

Specifies how many sentences to generate. Used with -gen_sentences and -gen_file.

-media_type

Specifies the media type of the grammar. The value is either "application/x-vnd.speechworks.emma+xml" or "application/x-vnd.speechworks.recresult+xml".

-no_pretty

Prints the parse result with no formatting (that is, as a continuous line of text).

-no_script_check

Disables validity checking of the grammar.

-s

Enables silence mode, which stops the printing of argument information at the beginning of output.

-test_file filename

Specifies an input file with test sentences (one sentence per line).

-test_sentences

Enables input of sentences to test the grammar. You can type sentences from the keyboard (the default) or specify an input file (using the -test_file option). The tool evaluates each sentence and shows whether it is covered by the grammar.

Can be abbreviated to -t_s.

-utf8

Specifies the encoding format of the input and output files as UTF-8. Used to override ISO-8859 format when ISO-8859 is the default.

-utt

Enables input of audio files to be parsed. Only audio/basic files may be input.

Use this option with -test_sentences. You cannot use this option with -test_file.

With this option, you can specify audio files in addition to typing sentences (see below). The syntax is <filename (the angle bracket is required, and no whitespace is allowed between the bracket and the filename).

-verbose

Prints additional parse details.

Note: You can put the parseTool options in any order on the command line.
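When the same tests are run repeatedly, it can help to assemble the parseTool command line in a script. The following is a minimal sketch in Python, not part of Recognizer: the function name parsetool_command and the file names digits.grxml and sentences.txt are hypothetical. It assumes that -test_file is combined with -test_sentences and that the sentence count follows -max_gen as a separate argument. It only builds the argument list and does not run the tool.

```python
from typing import List, Optional


def parsetool_command(
    grammar_file: str,
    test_file: Optional[str] = None,
    generate: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Build a parseTool argument list from options described in the table."""
    args = ["parseTool", grammar_file]
    if test_file is not None:
        # -test_sentences enables sentence testing; -test_file names the
        # input file (one sentence per line).
        args += ["-test_sentences", "-test_file", test_file]
    if generate is not None:
        # Assumption: the count is passed as the argument after -max_gen.
        args += ["-gen_sentences", "-max_gen", str(generate)]
    if verbose:
        args.append("-verbose")
    return args


if __name__ == "__main__":
    # Hypothetical grammar and sentence files, for illustration only.
    print(" ".join(parsetool_command("digits.grxml", test_file="sentences.txt")))
    # prints: parseTool digits.grxml -test_sentences -test_file sentences.txt
```

The resulting list can be handed to a process launcher such as subprocess.run from the %SWISRSDK%\bin directory. Because option order does not matter to parseTool, the order chosen here is arbitrary.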

Using test_parser

The test_parser tool lets you perform interpretation tests on grammars by comparing the expected key/value pairs passed to Recognizer with those actually generated.

Note that test_parser only tests keys/values that are set at the root and thus passed back to Recognizer; it cannot test attribute settings from subroot rules.

The tool operates on a test file in which each line defines a test, or directive. Lines beginning with # are treated as comments. Like parseTool, test_parser accepts the -iso8859 argument when the input file is not encoded in UTF-8.

Each test line in the test file must be of the form:

xml_grammar_file sentence_text key_name correct_value_for_key

Item

Description

xml_grammar_file

Name of the grammar file. A hyphen (-) reuses the grammar from the previous test.

sentence_text

Text of the sentence to be recognized (in quotes). Precede the text with a tilde (~) to indicate sentences not allowed by the grammar (sentences that should not parse, or that parse and cause SWI_disallow to be set to 1).

key_name

Name of key to test. Precede the text with a tilde (~) to indicate keys to ignore.

correct_value_for_key

The expected value for key_name. The test_parser program compares the value actually returned with this value. Place the value in double quotes if it has spaces.
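The test line format above can be produced by a small script. The following Python sketch is an illustration, not a Recognizer utility: the helper names format_test_line and build_test_file, the grammar name city.grxml, and the key name city are all hypothetical. It assumes that fields are separated by single spaces and that sentences and values contain no embedded double quotes. It covers positive tests only and leaves out the tilde (~) forms.

```python
from typing import Iterable, Optional, Tuple

# (grammar file, or None to reuse the previous grammar; sentence; key; expected value)
TestCase = Tuple[Optional[str], str, str, str]


def format_test_line(
    grammar: Optional[str], sentence: str, key: str, expected: str
) -> str:
    """Format one test line: grammar, quoted sentence, key, expected value."""
    # A hyphen stands for "use the previous grammar".
    grammar_field = grammar if grammar is not None else "-"
    # The expected value is quoted only when it contains spaces.
    value_field = f'"{expected}"' if " " in expected else expected
    return f'{grammar_field} "{sentence}" {key} {value_field}'


def build_test_file(cases: Iterable[TestCase], comment: Optional[str] = None) -> str:
    """Return the text of a test file, with an optional leading # comment."""
    lines = []
    if comment:
        lines.append(f"# {comment}")
    lines.extend(format_test_line(*case) for case in cases)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical grammar, sentences, and key, for illustration only.
    cases = [
        ("city.grxml", "new york", "city", "New York"),
        (None, "boston", "city", "Boston"),
    ]
    print(build_test_file(cases, comment="city grammar interpretation tests"), end="")
```

Generating the file from a list of cases keeps the expected values in one place, which makes it easier to extend the test set as the grammar grows.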

You can run test_parser with the following command-line options: