Getting raw recognition results

This topic describes the format of raw recognition results for use by advanced users. The raw format is returned using a wordlattice media type. You can use this format for semantic analysis and other purposes.

Overview

Wordlattice results are returned as XML encoded in UTF-8. The XML has these primary elements:

The <result> element is the root element.
The <param> element defines parameters global to the lattice.
The <lattice> element is a container for the lattice.
The <node> element defines attributes of nodes in the lattice.
The <arc> element defines the arcs in the lattice.

The general structure of the XML is shown below. For an example of real output, see Example wordlattice output.

```xml
<result type="wordlattice" version="1.0" nlattices="1">
 <param …/>
 <param …/>
 …
 <lattice>
  <node …/>
  <node …/>
  …
  <arc …> … </arc>
  <arc …> … </arc>
  …
 </lattice>
</result>
```
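The nesting above can be walked with any XML parser. The following is a minimal sketch that assumes Python's standard-library `xml.etree.ElementTree` and a small hypothetical document (the grammar name `digits`, node ids, and scores are invented for illustration). It shows that `<param>` elements are direct children of `<result>`, while `<node>` and `<arc>` elements are direct children of each `<lattice>`. The helper name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal document following the general structure above.
SAMPLE = """<result type="wordlattice" version="1.0" nlattices="1">
 <param name="frame_length" value=".01"/>
 <param name="utterance_length" value="40"/>
 <lattice gramname="digits" nnodes="2" narcs="1">
  <node id="n0" frame="0"/>
  <node id="n1" frame="40"/>
  <arc from="n0" to="n1" acoustic_score="120.5" confidence="0.9">one</arc>
 </lattice>
</result>"""


def summarize(xml_text):
    """Return (params, lattices); lattices maps gramname -> (node count, arc count)."""
    # ET.fromstring accepts either str or bytes (for example, a raw UTF-8 payload).
    root = ET.fromstring(xml_text)
    # <param> elements are direct children of <result>.
    params = {p.get("name"): p.get("value") for p in root.findall("param")}
    # <node> and <arc> elements are direct children of each <lattice>.
    lattices = {
        lat.get("gramname"): (len(lat.findall("node")), len(lat.findall("arc")))
        for lat in root.findall("lattice")
    }
    return params, lattices


params, lattices = summarize(SAMPLE)
print(params)    # {'frame_length': '.01', 'utterance_length': '40'}
print(lattices)  # {'digits': (2, 1)}
```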

Getting wordlattice recognition results

To get word lattice output, set swirec_word_confidence_enabled to true. This also adds word confidence scores to the output (see swirec_word_confidence_enabled). The application then gets the recognition results using the wordlattice media type ("application/x-vnd.speechworks.wordlattice+xml").

Optionally, you can include identification information in the returned results by appending the userid attribute to the media type. For example:
```
"application/x-vnd.speechworks.wordlattice+xml;userid=my_information"
```
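Building the media type string is simple string concatenation. This is a small sketch only: the helper `wordlattice_media_type` is hypothetical, and how the resulting string is handed to the recognizer depends on your integration. Rejecting a semicolon in the userid value is an added assumption (a semicolon would begin another media type attribute), not a documented rule.

```python
from typing import Optional

# Media type documented above for raw wordlattice results.
WORDLATTICE_MEDIA_TYPE = "application/x-vnd.speechworks.wordlattice+xml"


def wordlattice_media_type(userid: Optional[str] = None) -> str:
    """Return the wordlattice media type, optionally with a userid attribute (hypothetical helper)."""
    if userid is None:
        return WORDLATTICE_MEDIA_TYPE
    if ";" in userid:
        # Assumption: a ';' would start a new media type attribute, so refuse it.
        raise ValueError("userid must not contain ';'")
    return f"{WORDLATTICE_MEDIA_TYPE};userid={userid}"


print(wordlattice_media_type("my_information"))
# application/x-vnd.speechworks.wordlattice+xml;userid=my_information
```

The returned userid value then appears in the results as a `<param name="userid" .../>` element.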

Related parameters

These Recognizer parameters affect word lattice results:

Parameter	Description
swirec_word_posterior_pruning, swirec_word_lattice_density	These parameters control the density of the word lattice. Higher values create larger lattices, which may consume more CPU cycles.


The <result> element

The <result> element (the root of the lattice result) has the following attributes:

Attribute	Description
type	The type of result; always "wordlattice".
version	The version of the XML document.
nlattices	Integer. The number of lattices. In a multi-grammar context, each grammar can generate its own word lattice. In version 1.0 of the word lattice feature, the result contains only one word lattice (which corresponds to the grammar generating the top result on the n-best list).

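A consumer can sanity-check the root element before reading the lattice. This minimal sketch assumes `xml.etree.ElementTree`; the function name `check_result` and the sample document are hypothetical. It confirms that the root is a `<result>` of type "wordlattice" and that `nlattices` matches the number of `<lattice>` children (which, in version 1.0, should be one).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a version 1.0 result with a single (empty) lattice.
SAMPLE = """<result type="wordlattice" version="1.0" nlattices="1">
 <lattice gramname="mygrammar" nnodes="0" narcs="0"/>
</result>"""


def check_result(xml_text):
    """Validate the <result> root and return (version, nlattices)."""
    root = ET.fromstring(xml_text)
    if root.tag != "result" or root.get("type") != "wordlattice":
        raise ValueError("not a wordlattice result")
    declared = int(root.get("nlattices"))
    actual = len(root.findall("lattice"))
    if declared != actual:
        raise ValueError(f"nlattices={declared}, but found {actual} <lattice> elements")
    return root.get("version"), declared


print(check_result(SAMPLE))  # ('1.0', 1)
```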

The <param> element

In the returned recognition results, the <param> elements contain parameters that apply to the entire lattice, such as the absolute start time and the frame length.

The <param> element has the following attributes:

Attribute	Description
name	A parameter name (see below).
value	Parameter value.

The legal parameter names are:

Parameter	Description
start_time	Absolute start time of the utterance, in the format "YYYYMMDDHHMMSS.mmm".
frame_length	Float. Length of a frame in seconds. This is the basic unit used for time measurement in the lattice.
utterance_length	Integer. Length of the utterance, in frames. This is used to determine the final nodes in the lattice.
userid	Optional. Alphanumeric. Identification information supplied by the user in the media type.
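Because every `<param>` value arrives as a string, a consumer typically converts it to the type listed above. This sketch assumes Python's `xml.etree.ElementTree` and `datetime.strptime`; the function `read_params` and the derived `duration_s` key are hypothetical. The sample values are taken from the example output in this topic. The utterance duration in seconds is `utterance_length` (frames) multiplied by `frame_length` (seconds per frame).

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# <param> values copied from the example output in this topic.
SAMPLE = """<result type="wordlattice" version="1.0" nlattices="1">
 <param name="start_time" value="20050812141250.123"/>
 <param name="frame_length" value=".001"/>
 <param name="utterance_length" value="198"/>
 <param name="userid" value="abc.wav"/>
</result>"""


def read_params(xml_text):
    """Convert the global <param> values to Python types (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    raw = {p.get("name"): p.get("value") for p in root.findall("param")}
    params = {
        # "YYYYMMDDHHMMSS.mmm": %f accepts the 3-digit millisecond field.
        "start_time": datetime.strptime(raw["start_time"], "%Y%m%d%H%M%S.%f"),
        "frame_length": float(raw["frame_length"]),        # seconds per frame
        "utterance_length": int(raw["utterance_length"]),  # frames
        "userid": raw.get("userid"),                       # optional; None if absent
    }
    # Duration in seconds = number of frames * seconds per frame.
    params["duration_s"] = params["utterance_length"] * params["frame_length"]
    return params


p = read_params(SAMPLE)
print(p["start_time"])  # 2005-08-12 14:12:50.123000
```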

The <lattice> element

In the returned recognition results, a recognition context may contain multiple grammars, each of which can generate its own lattice. Each <lattice> element holds the lattice for one grammar.

The <lattice> element has the following attributes:

Attribute	Description
gramname	Alphanumeric. Name of the grammar associated with the lattice.
nnodes	Integer. Number of nodes in the graph.
narcs	Integer. Number of arcs in the graph.
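The `nnodes` and `narcs` attributes let a consumer check that it received the whole lattice. The sketch below assumes `xml.etree.ElementTree`; the function `lattice_counts` and the small sample lattice are hypothetical. It compares the declared counts with the number of `<node>` and `<arc>` children and raises an error on a mismatch.

```python
import xml.etree.ElementTree as ET

# Hypothetical lattice: a leading silence arc followed by one word arc.
SAMPLE = """<result type="wordlattice" version="1.0" nlattices="1">
 <lattice gramname="mygrammar" nnodes="3" narcs="2">
  <node id="a" frame="0"/>
  <node id="b" frame="20"/>
  <node id="c" frame="45"/>
  <arc type="silence" from="a" to="b" acoustic_score="80.1" confidence="0.9"/>
  <arc from="b" to="c" acoustic_score="150.2" confidence="0.9">one</arc>
 </lattice>
</result>"""


def lattice_counts(xml_text):
    """Map gramname -> (nnodes, narcs) after checking them against the actual children."""
    counts = {}
    for lat in ET.fromstring(xml_text).findall("lattice"):
        nnodes, narcs = int(lat.get("nnodes")), int(lat.get("narcs"))
        found = (len(lat.findall("node")), len(lat.findall("arc")))
        if (nnodes, narcs) != found:
            raise ValueError(
                f"{lat.get('gramname')}: declared {nnodes} nodes/{narcs} arcs, "
                f"found {found[0]}/{found[1]}")
        counts[lat.get("gramname")] = (nnodes, narcs)
    return counts


print(lattice_counts(SAMPLE))  # {'mygrammar': (3, 2)}
```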

The <node> element

In the returned recognition results, the <node> element associates a unique node id with a frame number. Multiple node ids may share the same frame number (for example, where the lattice branches).

The <node> element has the following attributes:

Attribute	Description
id	Alphanumeric. An arbitrary string that uniquely identifies the node.
frame	Integer. The frame number associated with this node.
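Since frames are the lattice's unit of time, a node's time in seconds is its `frame` multiplied by the `frame_length` parameter. This minimal sketch assumes `xml.etree.ElementTree`; the function `node_frames` is hypothetical, and the sample nodes are a subset of those in the example output. It also groups node ids that share a frame, as s0001 and s0016 do at frame 53 in the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A subset of the nodes from the example output.
LATTICE = """<lattice gramname="mygrammar" nnodes="4" narcs="0">
 <node id="s0000" frame="0"/>
 <node id="s0001" frame="53"/>
 <node id="s0002" frame="92"/>
 <node id="s0016" frame="53"/>
</lattice>"""


def node_frames(lattice_xml, frame_length):
    """Return (node id -> time in seconds, frame -> node ids at that frame)."""
    lattice = ET.fromstring(lattice_xml)
    times = {}
    by_frame = defaultdict(list)
    for node in lattice.findall("node"):
        frame = int(node.get("frame"))
        times[node.get("id")] = frame * frame_length  # frames -> seconds
        by_frame[frame].append(node.get("id"))        # several ids may share a frame
    return times, dict(by_frame)


times, by_frame = node_frames(LATTICE, 0.01)
print(by_frame[53])  # ['s0001', 's0016']
```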

The <arc> element

In the returned recognition results, each <arc> element connects two nodes, and together the arcs form the lattice network. The text content of an <arc> is the textual representation of the data on that arc (typically the word).

The <arc> element has the following attributes:

Attribute	Description
type	Optional. Values are "word" (the default) or "silence". A silence arc contains no text; a word arc must contain text.
from	Alphanumeric. The id of the node that this arc leaves.
to	Alphanumeric. The id of the node that this arc enters.
acoustic_score	Float. The acoustic score for this arc.
lm_score	Optional. Float. The language model (LM) score for this arc. This attribute may be omitted when the LM score is zero.
confidence	Float in the range 0.000 to 1.000. The confidence score for this arc. (Valid for word arcs only.)
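The defaults in the table above can be applied while parsing. This sketch assumes `xml.etree.ElementTree` and a hypothetical `Arc` dataclass and `parse_arcs` function: `type` defaults to "word", a missing `lm_score` is read as zero, surrounding whitespace is stripped from the word text, and a missing `confidence` is kept as `None` rather than guessed. The sample arcs reuse values from the example output.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Arc:
    kind: str                    # "word" or "silence"
    src: str                     # value of the from attribute
    dst: str                     # value of the to attribute
    acoustic_score: float
    lm_score: float
    confidence: Optional[float]  # None if the attribute is absent
    word: Optional[str]          # None for silence arcs


def parse_arcs(lattice: ET.Element) -> List[Arc]:
    """Convert <arc> elements into Arc records, applying the documented defaults."""
    arcs = []
    for el in lattice.findall("arc"):
        kind = el.get("type", "word")   # type defaults to "word"
        text = (el.text or "").strip()  # drop surrounding whitespace and newlines
        if kind == "word" and not text:
            raise ValueError(f"word arc {el.get('from')}->{el.get('to')} has no text")
        conf = el.get("confidence")
        arcs.append(Arc(
            kind=kind,
            src=el.get("from"),
            dst=el.get("to"),
            acoustic_score=float(el.get("acoustic_score")),
            lm_score=float(el.get("lm_score", "0")),  # may be omitted when zero
            confidence=float(conf) if conf is not None else None,
            word=text if kind == "word" else None,
        ))
    return arcs


# Two arcs with values taken from the example output.
LATTICE = """<lattice gramname="mygrammar" nnodes="3" narcs="2">
 <node id="s0000" frame="0"/>
 <node id="s0001" frame="53"/>
 <node id="s0002" frame="92"/>
 <arc type="silence" from="s0000" to="s0001" acoustic_score="401.805" confidence="0.937"/>
 <arc from="s0001" to="s0002" acoustic_score="336.66" confidence="0.937">
   one
 </arc>
</lattice>"""

arcs = parse_arcs(ET.fromstring(LATTICE))
print(arcs[1].word, arcs[1].lm_score)  # one 0.0
```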

Example wordlattice output

```xml
<result type="wordlattice" version="1.0" nlattices="1">
 <param name="start_time" value="20050812141250.123"/>
 <param name="frame_length" value=".001"/>
 <param name="utterance_length" value="198"/>
 <param name="userid" value="abc.wav"/>
 <lattice gramname="mygrammar" nnodes="26" narcs="38">
  <node id="s0000" frame="0"/>
  <node id="s0001" frame="53"/>
  <node id="s0002" frame="92"/>
  <node id="s0003" frame="93"/>
  <node id="s0004" frame="122"/>
  <node id="s0005" frame="123"/>
  <node id="s0006" frame="155"/>
  <node id="s0007" frame="156"/>
  <node id="s0008" frame="196"/>
  <node id="s0009" frame="197"/>
  <node id="s0010" frame="32"/>
  <node id="s0011" frame="35"/>
  <node id="s0012" frame="38"/>
  <node id="s0013" frame="38"/>
  <node id="s0014" frame="50"/>
  <node id="s0015" frame="47"/>
  <node id="s0016" frame="53"/>
  <node id="s0017" frame="92"/>
  <node id="s0018" frame="93"/>
  <node id="s0019" frame="122"/>
  <node id="s0020" frame="123"/>
  <node id="s0021" frame="155"/>
  <node id="s0022" frame="156"/>
  <node id="s0023" frame="196"/>
  <node id="s0024" frame="197"/>
  <node id="s0025" frame="198"/>
  <arc type="silence" from="s0000" to="s0015"
   acoustic_score="355.395" confidence="0.00099" />
  <arc type="silence" from="s0000" to="s0014"
   acoustic_score="378.981" confidence="0.0191" />
  <arc type="silence" from="s0000" to="s0011"
   acoustic_score="263.683" confidence="0.0423" />
  <arc type="silence" from="s0000" to="s0010"
   acoustic_score="242.347" confidence="0.00076" />
  <arc type="silence" from="s0000" to="s0001"
   acoustic_score="401.805" confidence="0.937" />
  <arc type="word" from="s0001" to="s0002"
   acoustic_score="336.66" confidence="0.937" >
    one
  </arc>
  <arc type="word" from="s0001" to="s0003"
   acoustic_score="348.273" confidence="0.937" >
    one
  </arc>
  <arc type="silence" from="s0002" to="s0003"
   acoustic_score="13.28" confidence="0.284" />
  <arc type="word" from="s0003" to="s0004"
   acoustic_score="256.012" confidence="0.937" >
    two
  </arc>
  <arc type="word" from="s0003" to="s0005"
   acoustic_score="267.469" confidence="0.937" >
    two
  </arc>
  <arc type="silence" from="s0004" to="s0005"
   acoustic_score="12.3631" confidence="0.364" />
  <arc type="word" from="s0005" to="s0006"
   acoustic_score="277.704" confidence="0.937" >
    three
  </arc>
  <arc type="word" from="s0005" to="s0007"
   acoustic_score="288.781" confidence="0.937" >
    three
  </arc>
  <arc type="silence" from="s0006" to="s0007"
   acoustic_score="12.3804" confidence="0.321" />
  <arc type="word" from="s0007" to="s0008"
   acoustic_score="330.39" confidence="1" >
    four
  </arc>
  <arc type="word" from="s0007" to="s0009"
   acoustic_score="341.57" confidence="1" >
    four
  </arc>
  <arc type="silence" from="s0009" to="s0025"
   acoustic_score="7.44958" confidence="0.937" />
  <arc type="word" from="s0010" to="s0012"
   acoustic_score="55.9089" confidence="0.043" >
    um
  </arc>
  <arc type="word" from="s0011" to="s0013"
   acoustic_score="28.1015" confidence="0.043" >
    um
  </arc>
  <arc type="word" from="s0011" to="s0012"
   acoustic_score="27.7555" confidence="0.043" >
    um
  </arc>
  <arc type="silence" from="s0012" to="s0016"
   acoustic_score="118.19" confidence="0.0237" />
  <arc type="silence" from="s0013" to="s0016"
   acoustic_score="118.19" confidence="0.0193" />
  <arc type="word" from="s0014" to="s0016"
   acoustic_score="31.0167" confidence="0.0201" >
    um
  </arc>
  <arc type="word" from="s0015" to="s0016"
   acoustic_score="60.5195" confidence="0.0201" >
    um
  </arc>
  <arc type="word" from="s0016" to="s0017"
   acoustic_score="336.255" confidence="0.0631" >
    two
  </arc>
  <arc type="word" from="s0016" to="s0018"
   acoustic_score="347.868" confidence="0.0631" >
    two
  </arc>
  <arc type="silence" from="s0017" to="s0018"
   acoustic_score="13.28" confidence="0.0191" />
  <arc type="word" from="s0018" to="s0019"
   acoustic_score="256.012" confidence="0.0631" >
    three
  </arc>
  <arc type="word" from="s0018" to="s0020"
   acoustic_score="267.469" confidence="0.0631" >
    three
  </arc>
  <arc type="silence" from="s0019" to="s0020"
   acoustic_score="12.3631" confidence="0.0245" />
  <arc type="word" from="s0020" to="s0021"
   acoustic_score="277.704" confidence="0.0631" >
    four
  </arc>
  <arc type="word" from="s0020" to="s0022"
   acoustic_score="288.781" confidence="0.0631" >
    four
  </arc>
  <arc type="silence" from="s0021" to="s0022"
   acoustic_score="12.3804" confidence="0.0216" />
  <arc type="word" from="s0022" to="s0023"
   acoustic_score="330.39" confidence="1" >
    four
  </arc>
  <arc type="word" from="s0022" to="s0024"
   acoustic_score="341.57" confidence="1" >
    four
  </arc>
  <arc type="silence" from="s0023" to="s0024"
   acoustic_score="12.1729" confidence="0.0239" />
  <arc type="silence" from="s0024" to="s0025"
   acoustic_score="7.44958" confidence="0.0631" />
 </lattice>
</result>
```
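A lattice like the one above encodes several competing hypotheses. One way to inspect them is to list every distinct word string along a path from a start node to a final node. This is a minimal sketch under stated assumptions: start nodes are taken to be those with no incoming arcs (s0000 above), and final nodes are those whose frame equals `utterance_length` (s0025 at frame 198 above). The function `word_sequences` and the abbreviated sample lattice are hypothetical. The sketch deliberately ignores `acoustic_score` and `lm_score`, because this topic does not specify how they are combined.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, abbreviated lattice with two competing paths ("one two" and "um two").
SAMPLE = """<result type="wordlattice" version="1.0" nlattices="1">
 <param name="utterance_length" value="90"/>
 <lattice gramname="mygrammar" nnodes="6" narcs="6">
  <node id="s0" frame="0"/>
  <node id="s1" frame="10"/>
  <node id="s2" frame="50"/>
  <node id="s3" frame="20"/>
  <node id="s4" frame="85"/>
  <node id="s5" frame="90"/>
  <arc type="silence" from="s0" to="s1" acoustic_score="40.0" confidence="0.9"/>
  <arc from="s1" to="s2" acoustic_score="300.0" confidence="0.9">one</arc>
  <arc from="s2" to="s4" acoustic_score="250.0" confidence="0.9">two</arc>
  <arc from="s0" to="s3" acoustic_score="60.0" confidence="0.1">um</arc>
  <arc from="s3" to="s4" acoustic_score="280.0" confidence="0.1">two</arc>
  <arc type="silence" from="s4" to="s5" acoustic_score="7.0" confidence="0.9"/>
 </lattice>
</result>"""


def word_sequences(xml_text):
    """Return the distinct word strings on paths from start nodes to final nodes."""
    root = ET.fromstring(xml_text)
    params = {p.get("name"): p.get("value") for p in root.findall("param")}
    final_frame = int(params["utterance_length"])
    lattice = root.find("lattice")

    frames = {n.get("id"): int(n.get("frame")) for n in lattice.findall("node")}
    outgoing = defaultdict(list)  # node id -> [(destination id, word or None)]
    has_incoming = set()
    for arc in lattice.findall("arc"):
        is_word = arc.get("type", "word") == "word"
        word = (arc.text or "").strip() if is_word else None  # silence arcs carry no word
        outgoing[arc.get("from")].append((arc.get("to"), word))
        has_incoming.add(arc.get("to"))

    results = set()

    def walk(node, words):
        # Assumption: a node at the utterance's last frame ends a complete hypothesis.
        if frames[node] == final_frame:
            results.add(" ".join(words))
        for dst, word in outgoing[node]:
            walk(dst, (words + [word]) if word else words)

    for start in (n for n in frames if n not in has_incoming):
        walk(start, [])
    return results


print(sorted(word_sequences(SAMPLE)))  # ['one two', 'um two']
```

Because the number of paths can grow exponentially with the size of the lattice, exhaustive enumeration like this is only practical for small lattices; larger lattices would call for a dynamic-programming pass over the nodes instead.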