A simple sample script appears below:
# Example script. Use pound sign (#) for Comments
# Header (:ACC)
:ACC
# Load grammars
SWIrecGrammarLoad G0 g0.grxml
SWIrecGrammarLoad G1 g1.grxml
SWIrecGrammarLoad G2 g2.grxml
# Define the contexts
context_define context1 500 800
context_add G1 1
context_add G2 1
context_end
context_define context2 200 900
context_add G0 1
context_end
# Open the cumulative files
open utd test_grammar.utd
open errors test_grammar.err
open xmlresult test_grammar_nlsml.xml
xmlresult_media_type application/x-vnd.speechworks.emma+xml
# Test the contexts
context_use context1
transcription blue elephant
meaning {toto}
recognize blue_elephant.ulaw
transcription blue
recognize blue.ulaw
# Reset channel normalization
SWIrecAcousticStateReset
context_use context2
transcription i want to fly from denver to boston at five o'clock
recognize boston_denver_at_5.ulaw
# Generate reports
report summary test_grammar.summary
report confidence test_grammar.confidence
report nbest test_grammar.nbest
report oov test_grammar.oov
report words test_grammar.words
# Close the cumulative files
close utd
close errors
close xmlresult
Each script begins with the following header:
:ACC
This header tells acc_test to call Recognizer, which uses the speech detector to detect end of speech. This means that the utility may declare the end of speech before the end of the file, based on the "incompletetimeout" parameter. Typical input waveform files will have already been endpointed and padded with approximately 200 ms of silence before and after the speech.
To enter comments anywhere in the script, use a hashmark (#) at the beginning of the line. This character tells acc_test to ignore the rest of the line.
The next section of the script tells acc_test which grammars to load for testing:
SWIrecGrammarLoad G0 g0.grxml
SWIrecGrammarLoad G1 g1.grxml
SWIrecGrammarLoad G2 g2.grxml
Here, each SWIrecGrammarLoad command defines an internal name (for example, G0) and matches it with a grammar to be loaded (g0.grxml):
SWIrecGrammarLoad gname gpath
Here, gname is the name to be assigned to the grammar in the rest of the script, and gpath is the URI to the grammar. This URI must include the full pathname for the grammar relative to the script. In the examples above, all the grammars are assumed to be in the same directory as the script itself. If there is a problem loading the grammar, the script will exit with an error message.
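Because a grammar that fails to load stops the script, it can be useful to check the grammar paths before running acc_test. The following Python sketch is not part of acc_test; the helper names find_grammar_loads and missing_grammars are hypothetical, and the sketch assumes every gpath is a local file path relative to the script rather than a remote URI.

```python
from pathlib import Path


def find_grammar_loads(script_text):
    """Return (gname, gpath) pairs for each SWIrecGrammarLoad line."""
    loads = []
    for line in script_text.splitlines():
        parts = line.split()
        # Only the three-token form shown above is recognized here.
        if len(parts) == 3 and parts[0] == "SWIrecGrammarLoad":
            loads.append((parts[1], parts[2]))
    return loads


def missing_grammars(script_path):
    """List the grammars whose files cannot be found next to the script.

    Assumption: each gpath is a local path relative to the script's directory.
    """
    script = Path(script_path)
    missing = []
    for gname, gpath in find_grammar_loads(script.read_text()):
        if not (script.parent / gpath).is_file():
            missing.append((gname, gpath))
    return missing
```

An empty list from missing_grammars means every referenced grammar file was found; it does not guarantee that the grammars themselves are valid.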
Once the grammars are loaded, they are used to define grammar contexts:
context_define context1 500 800
context_add G1 1
context_add G2 1
context_end
In this excerpt, the script defines "context1" as a combination of grammars G1 and G2, weights these grammars equally, and sets the confidence thresholds that will be considered low (500) and high (800) on a scale of one thousand.
The commands used to define contexts are:
context_define cname low_thresh high_thresh
This command begins each definition, specifying the name to be used for the context (cname), and setting the low (low_thresh) and high (high_thresh) confidence thresholds for the context. These thresholds range from 1 to 1000. Both are required, but you can use the same number for both if desired.
context_add gname weight
This command adds the grammar gname to the current context, assigning it the specified weight within the context. To weight all grammars equally, assign each one the same weight.
context_end
This command marks the end of the current context definition.
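When many contexts are needed, the definitions can be generated rather than typed by hand. The Python sketch below is one possible way to do this, not a feature of acc_test; the function name context_block is hypothetical. It checks only the 1 to 1000 threshold range stated above and does not assume any ordering between the low and high thresholds.

```python
def context_block(cname, low_thresh, high_thresh, grammars):
    """Build the script lines for one context definition.

    grammars is a list of (gname, weight) pairs.
    """
    for value in (low_thresh, high_thresh):
        # The documented range for both thresholds is 1 to 1000.
        if not 1 <= value <= 1000:
            raise ValueError(f"threshold {value} is outside 1..1000")
    if not grammars:
        # Assumption: a context without any grammar is treated as a mistake.
        raise ValueError("a context needs at least one grammar")
    lines = [f"context_define {cname} {low_thresh} {high_thresh}"]
    lines += [f"context_add {gname} {weight}" for gname, weight in grammars]
    lines.append("context_end")
    return lines


# Reproduces the context1 excerpt from the sample script.
print("\n".join(context_block("context1", 500, 800, [("G1", 1), ("G2", 1)])))
```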
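The acc_test reports represent data that has been accumulated internally; this data is written only when the report command is processed. However, some kinds of data are written to cumulative files as the recognitions happen. The sample script opens three such file types: utd, errors, and xmlresult.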
To activate these files so new results will be written during the current session, the script uses the open command:
open filetype fname
Here, filetype is one of the options listed above (utd, errors, or xmlresult) and fname is the name and location for the file. If the named files already exist, they will be overwritten with the new results.
These cumulative files can be closed later, normally at the end of the script, by using the close command. See Close the cumulative files for details.
The testing section specifies the tests themselves. Each subsection uses a context_use command to invoke a context with which to test recognitions, and specifies the tests to be conducted. For example, the sample file above tests the context1 context with two items:
context_use context1
transcription blue elephant
meaning {toto}
recognize blue_elephant.ulaw
transcription blue
recognize blue.ulaw
The context_use command takes one argument: the name of the context to be tested (the cname specified in the context_define command). Only one context can be active at a time: each context_use command implicitly deactivates whatever context was active up to that point.
Each test can include the following commands:
- transcription: The transcription of the audio file being used for recognition.
- meaning: The meaning to be assigned to the item when recognized, if this is different from the meaning that will be returned by Recognizer (optional). Enclose the meaning in braces {like this}.
- recognize: The name and location of the audio file to be recognized. In the example, both audio files are in the same directory as the script.
Use the -format option to specify an audio type (the default is 8-bit, 8 kHz ulaw audio). For example:
recognize blue.alaw -format audio/x-alaw-basic
You can use a recognize command to specify text as well:
recognize -format text/plain "blue elephant"
However, acc_test is intended for audio tests, so this is not recommended.
It is strongly recommended that you use end-pointed wave files generated by Recognizer, as these have correct begin and end silence times that the acc_test utility requires. Wave files generated or processed by other methods are not recommended.
The transcription and meaning values are used in reports (see below).
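For a large test set, the transcription, meaning, and recognize lines can be produced from a list of audio files. The following Python sketch shows one way to do so; the function name recognition_item and its parameters are hypothetical and are not part of acc_test. It emits only the audio form of the recognize command described above.

```python
def recognition_item(audio, transcription, meaning=None, media_type=None):
    """Build the script lines for one recognition test.

    meaning and media_type are optional, mirroring the optional meaning
    command and the optional -format argument.
    """
    lines = [f"transcription {transcription}"]
    if meaning is not None:
        # The meaning is enclosed in braces, as in the sample script.
        lines.append(f"meaning {{{meaning}}}")
    command = f"recognize {audio}"
    if media_type is not None:
        command += f" -format {media_type}"
    lines.append(command)
    return lines


# Reproduces the two tests that the sample script runs against context1.
for item in (
    recognition_item("blue_elephant.ulaw", "blue elephant", meaning="toto"),
    recognition_item("blue.ulaw", "blue"),
):
    print("\n".join(item))
```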
To reset the speaker/channel normalization between tests (in order to simulate the start of a new call, for example), use a SWIrecAcousticStateReset command:
SWIrecAcousticStateReset
You can reset the result count at any point with a context_reset command:
context_reset
This command erases the results from all recognitions performed up to this point, so they are not counted for subsequent reports. You will probably only want to do this if you have already generated one set of reports (see below), and want to reinitialize for the next set of reports.
Once the tests are complete, you can use the results to write several different kinds of reports, using a separate command for each desired report:
report rtype reportfile
Here, the rtype is the type of report to be generated, while the reportfile specifies the location and name for the report. The name can include an environment variable (for example, %fname.err%). You can use a report command anywhere in the script. Usually, reports are generated before a context_reset or just before the script’s end. See Reset recognition count within a script.
The types of reports recommended for your testing include:
- summary: Provides an overall summary of the results for each context.
- nbest: Shows where on the nbest list the correct answer occurred.
- oov: Lists the words found to be out-of-vocabulary, by count.
- words: Evaluates the overall accuracy of recognition for each word.
- confidence: Lists the confidences of recognitions for each context.
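The report commands for a test run tend to share a base file name, as in the sample script (test_grammar.summary, test_grammar.oov, and so on). The Python sketch below generates them under that naming assumption; report_commands and LISTED_REPORT_TYPES are hypothetical names, and the sketch does not reject other report types, since only the types listed above are known here.

```python
# The report types listed above; other types may exist and are passed through.
LISTED_REPORT_TYPES = ("summary", "confidence", "nbest", "oov", "words")


def report_commands(basename, rtypes=LISTED_REPORT_TYPES):
    """Build one report command per type, naming each file basename.rtype.

    Assumption: the basename.rtype naming follows the sample script and is a
    convention, not a requirement.
    """
    return [f"report {rtype} {basename}.{rtype}" for rtype in rtypes]


print("\n".join(report_commands("test_grammar")))
```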
You stop writing to the cumulative files using the close command:
close filetype
Normally these files will be closed at the end of the script, as in the example. Only one file of each type can be open at a time, so there is no need to specify the file name when you close. You can make any number of cumulative files, but when you open a second file of a type, the first closes automatically.
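The one-file-per-type rule can be pictured with a small model. The Python sketch below is only an illustration of the behavior described above, not how acc_test is implemented; the class name CumulativeFiles is hypothetical, and ignoring a close for a type that is not open is an assumption.

```python
class CumulativeFiles:
    """Toy model: at most one open cumulative file per file type."""

    def __init__(self):
        self.open_files = {}  # filetype -> fname currently open
        self.closed = []      # (filetype, fname) pairs in closing order

    def open(self, filetype, fname):
        # Opening a second file of a type closes the first automatically.
        if filetype in self.open_files:
            self.close(filetype)
        self.open_files[filetype] = fname

    def close(self, filetype):
        # No file name is needed: only one file of each type can be open.
        fname = self.open_files.pop(filetype, None)
        if fname is not None:
            self.closed.append((filetype, fname))


files = CumulativeFiles()
files.open("utd", "first.utd")
files.open("utd", "second.utd")  # first.utd is closed at this point
files.close("utd")
```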