Getting raw recognition results
This topic describes the format of raw recognition results for use by advanced users. The raw format is returned using a wordlattice media type. You can use this format for semantic analysis and other purposes.
Overview
The wordlattice format returns results as XML encoded in UTF-8. The XML has these primary elements:
- The <result> element is the root element.
- The <param> element defines parameters global to the lattice.
- The <lattice> element is a container for the lattice.
- The <node> element defines attributes of nodes in the lattice.
- The <arc> element defines the arcs in the lattice.
The following shows the general structure of the XML. For an example of real output, see Example wordlattice output.
```xml
<result type="wordlattice" version="1.0" nlattices="1">
  <param …/>
  <param …/>
  …
  <lattice>
    <node …/>
    <node …/>
    …
    <arc …> … </arc>
    <arc …> … </arc>
    …
  </lattice>
</result>
```
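The nesting above (`<param>` elements directly under `<result>`, `<node>` and `<arc>` elements under `<lattice>`) can be walked with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, not part of any Nuance API: the function name `summarize` and the sample document (the grammar name `g1`, the frame counts, and the scores) are hypothetical and only mirror the element layout described in this topic.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample for illustration only; it is not real recognizer output.
SAMPLE = """<result type="wordlattice" version="1.0" nlattices="1">
  <param name="frame_length" value="0.01"/>
  <param name="utterance_length" value="120"/>
  <lattice gramname="g1" nnodes="3" narcs="2">
    <node id="n0" frame="0"/>
    <node id="n1" frame="60"/>
    <node id="n2" frame="120"/>
    <arc from="n0" to="n1" acoustic_score="-1200.5" confidence="0.870">yes</arc>
    <arc type="silence" from="n1" to="n2" acoustic_score="-300.0"/>
  </lattice>
</result>"""


def summarize(xml_text):
    """Return the global <param> values and per-lattice node/arc counts."""
    root = ET.fromstring(xml_text)  # root is the <result> element
    # <param> elements are direct children of <result>.
    params = {p.get("name"): p.get("value") for p in root.findall("param")}
    lattices = []
    for lat in root.findall("lattice"):
        # <node> and <arc> elements are direct children of <lattice>.
        lattices.append({
            "gramname": lat.get("gramname"),
            "nodes": len(lat.findall("node")),
            "arcs": len(lat.findall("arc")),
        })
    return params, lattices


params, lattices = summarize(SAMPLE)
print(params)
print(lattices)
```

Because the results are UTF-8 encoded, you can also pass the raw bytes to `ET.fromstring` and let the parser handle decoding.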
Getting wordlattice recognition results
To get word lattice output, an application does the following:
- Sets swirec_word_confidence_enabled to true. This adds word confidence scores to the output (see swirec_word_confidence_enabled).
- Gets the recognition results using the wordlattice media type ("application/x-vnd.speechworks.wordlattice+xml").
- Optionally, specifies identification information to include in the returned results by appending the userid attribute to the media type. For example:
"application/x-vnd.speechworks.wordlattice+xml;userid=my_information"
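Building the media type string is plain string construction. The following minimal sketch assumes nothing beyond the string format shown above; the helper name `wordlattice_media_type` is hypothetical, and the call that actually passes this string to the recognizer is not shown because it is not described in this topic.

```python
WORDLATTICE_MEDIA_TYPE = "application/x-vnd.speechworks.wordlattice+xml"


def wordlattice_media_type(userid=None):
    """Return the wordlattice media type, optionally with a userid appended."""
    if userid is None:
        return WORDLATTICE_MEDIA_TYPE
    # Assumption (not from the documentation): reject ';' because it would be
    # read as the start of another media type parameter.
    if ";" in userid:
        raise ValueError("userid must not contain ';'")
    return f"{WORDLATTICE_MEDIA_TYPE};userid={userid}"


print(wordlattice_media_type("my_information"))
```

The value supplied this way comes back in the results as the userid parameter of a `<param>` element.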
Related parameters
These Recognizer parameters affect word lattice results:
Parameter | Description |
---|---|
 | These parameters control the density of the word lattice. Higher values create larger lattices, which may consume more CPU cycles. |
The <result> element
The <result> element (the root of the lattice result) has the following attributes:
Attribute | Description |
---|---|
type | The type of result; always "wordlattice". |
version | The version of the XML document. |
nlattices | Integer. The number of lattices in the result. In a multi-grammar context, each grammar can generate its own word lattice; however, in version 1.0 of the word lattice feature, the result contains only one word lattice, corresponding to the grammar that generated the top result on the n-best list. |
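Before processing a result, an application might check the root attributes. This is a minimal sketch using `xml.etree.ElementTree`; the function name `check_result_root` is hypothetical, and treating a mismatch between nlattices and the actual number of `<lattice>` children as an error is a design choice of the sketch, not documented behavior.

```python
import xml.etree.ElementTree as ET


def check_result_root(xml_text):
    """Validate the <result> root attributes and return (version, lattices)."""
    root = ET.fromstring(xml_text)
    if root.tag != "result" or root.get("type") != "wordlattice":
        raise ValueError("not a wordlattice result")
    lattices = root.findall("lattice")
    declared = int(root.get("nlattices", "0"))
    if declared != len(lattices):
        raise ValueError(
            f"nlattices={declared} but found {len(lattices)} <lattice> elements")
    # In version 1.0 this list holds a single lattice: the one for the
    # grammar that produced the top n-best result.
    return root.get("version"), lattices
```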
The <param> element
In the returned recognition results, the <param> elements contain parameters that apply to the entire lattice, such as the frame length and the absolute start time.
The <param> element has the following attributes:
Attribute | Description |
---|---|
name | A parameter name (see below). |
value | Parameter value. |
The valid parameter names are:
Parameter | Description |
---|---|
start_time | Absolute start time of the utterance, in the format "YYYYMMDDHHMMSS.mmm". |
frame_length | Float. Length of a frame in seconds. This is the basic unit used for time measurement in the lattice. |
utterance_length | Integer. Length of the utterance, in frames. This is used to determine the final nodes in the lattice. |
userid | Optional. Alphanumeric. Identification information supplied by the application through the userid attribute of the media type. |
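The parameter values arrive as strings. The following minimal sketch converts them to typed values, assuming they have already been collected into a dict mapping name to value (for example with a `findall("param")` loop). The helper name `decode_params` is hypothetical. It uses frame_length as seconds per frame and utterance_length as a frame count, so the utterance duration is utterance_length * frame_length. The documented start_time format carries no time zone, so the sketch returns a naive `datetime`.

```python
from datetime import datetime, timedelta


def decode_params(params):
    """Convert raw <param> name/value strings into typed values."""
    # "YYYYMMDDHHMMSS.mmm": %f accepts the three millisecond digits and
    # treats them as a fraction of a second.
    start = datetime.strptime(params["start_time"], "%Y%m%d%H%M%S.%f")
    frame_length = float(params["frame_length"])   # seconds per frame
    n_frames = int(params["utterance_length"])      # utterance length in frames
    duration = n_frames * frame_length
    return {
        "start_time": start,
        "frame_length": frame_length,
        "utterance_frames": n_frames,
        "duration_s": duration,
        "end_time": start + timedelta(seconds=duration),
        # Present only if a userid was appended to the media type.
        "userid": params.get("userid"),
    }
```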
The <lattice> element
In the returned recognition results, a recognition context may contain multiple grammars, each of which can generate its own lattice. Each <lattice> element holds the lattice for one grammar.
The <lattice> element has the following attributes:
Attribute | Description |
---|---|
gramname | Alphanumeric. Name of the grammar associated with the lattice. |
nnodes | Integer. Number of nodes in the graph. |
narcs | Integer. Number of arcs in the graph. |
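The nnodes and narcs attributes can serve as a consistency check on a parsed lattice. This is a minimal sketch; the function name `check_lattice` is hypothetical, and reporting mismatches as a list (rather than raising) is a choice of the sketch.

```python
import xml.etree.ElementTree as ET


def check_lattice(lattice):
    """Compare a <lattice> element's nnodes/narcs with its actual children."""
    nodes = lattice.findall("node")
    arcs = lattice.findall("arc")
    problems = []
    # A missing attribute is parsed as -1 so that it always reports a mismatch.
    if int(lattice.get("nnodes", "-1")) != len(nodes):
        problems.append(f"nnodes={lattice.get('nnodes')}, found {len(nodes)}")
    if int(lattice.get("narcs", "-1")) != len(arcs):
        problems.append(f"narcs={lattice.get('narcs')}, found {len(arcs)}")
    return lattice.get("gramname"), problems
```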
The <node> element
In the returned recognition results, the <node> element associates a unique node id with a frame number. Several node ids may share the same frame number (for example, where the lattice branches).
The <node> element has the following attributes:
Attribute | Description |
---|---|
id | Alphanumeric. An arbitrary string that uniquely identifies the node. |
frame | Integer. The frame number associated with this node. |
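Because several node ids can share a frame, it can help to index nodes both by id and by frame. This is a minimal sketch; the function name `index_nodes` is hypothetical. It assumes a node is final when its frame equals utterance_length. The documentation says only that utterance_length is used to determine the final nodes, so check this assumption against real output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_nodes(lattice, utterance_length):
    """Map node id -> frame, group ids by frame, and pick out final nodes."""
    frame_of = {}
    by_frame = defaultdict(list)
    for node in lattice.findall("node"):
        node_id = node.get("id")
        if node_id in frame_of:
            raise ValueError(f"duplicate node id {node_id!r}")
        frame = int(node.get("frame"))
        frame_of[node_id] = frame
        by_frame[frame].append(node_id)  # several ids may share one frame
    # Assumption: final nodes are those whose frame equals utterance_length.
    final = by_frame.get(utterance_length, [])
    return frame_of, dict(by_frame), final
```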
The <arc> element
In the returned recognition results, the <arc> elements specify the lattice network that connects the nodes. The content of each <arc> element is the textual representation of the data on that arc (typically the word).
The <arc> element has the following attributes:
Attribute | Description |
---|---|
type | Optional. Values are "word" (the default) or "silence". A silence arc has no text; a word arc must contain text. |
from | Alphanumeric. The id of the node that this arc leaves. |
to | Alphanumeric. The id of the node that this arc enters. |
acoustic_score | Float. The acoustic score for this arc. |
lm_score | Float. The language model (LM) score for this arc. This attribute can be omitted when the LM score is zero. |
confidence | Float in the range 0.000 to 1.000. The confidence score for this arc. (Valid for word arcs only.) |
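Combining arcs with node frames and frame_length gives approximate word timings. This is a minimal sketch; the function name `word_arcs` is hypothetical. It treats a missing lm_score as zero, as the table above allows. It copies acoustic_score and lm_score without interpreting them, since their scale and sign are not documented here.

```python
import xml.etree.ElementTree as ET


def word_arcs(lattice, frame_length):
    """List word arcs with start/end times in seconds, skipping silence arcs."""
    frame_of = {n.get("id"): int(n.get("frame")) for n in lattice.findall("node")}
    words = []
    for arc in lattice.findall("arc"):
        if arc.get("type", "word") == "silence":
            continue  # silence arcs carry no text and no confidence
        start = frame_of[arc.get("from")]
        end = frame_of[arc.get("to")]
        conf = arc.get("confidence")
        words.append({
            "word": (arc.text or "").strip(),
            "start_s": start * frame_length,
            "end_s": end * frame_length,
            "acoustic_score": float(arc.get("acoustic_score")),
            # lm_score may be omitted when it is zero.
            "lm_score": float(arc.get("lm_score", "0")),
            "confidence": float(conf) if conf is not None else None,
        })
    return words
```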