Getting raw recognition results

This topic describes the format of raw recognition results for use by advanced users. The raw format is returned using a wordlattice media type. You can use this format for semantic analysis and other purposes.

Overview

The wordlattice format returns results as XML (encoded as UTF-8). The XML has these primary elements: <result>, <param>, <lattice>, <node>, and <arc>.

The general XML format is as follows (for an example of real output, see Example wordlattice output):

<result type="wordlattice" version="1.0" nlattices="1">
 <param …/>
 <param …/>
 …
 <lattice>
  <node …/>
  <node …/>
  …
  <arc …> … </arc>
  <arc …> … </arc>
  …
 </lattice>
</result>
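As a minimal sketch of how an application might read this structure, the following Python parses a wordlattice document with the standard library and counts its primary elements. The sample document, including every attribute value and the grammar name, is invented for illustration and is not real recognizer output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: all names and values below are invented for illustration.
SAMPLE = """<result type="wordlattice" version="1.0" nlattices="1">
 <param name="frame_length" value="0.01"/>
 <param name="utterance_length" value="120"/>
 <lattice gramname="demo" nnodes="3" narcs="2">
  <node id="n0" frame="0"/>
  <node id="n1" frame="60"/>
  <node id="n2" frame="120"/>
  <arc from="n0" to="n1" acoustic_score="-310.5" confidence="0.912">hello</arc>
  <arc from="n1" to="n2" acoustic_score="-295.0" confidence="0.874">world</arc>
 </lattice>
</result>"""


def summarize(xml_text):
    """Parse a wordlattice result and count its primary elements."""
    root = ET.fromstring(xml_text)
    # The root must be a <result> element of type "wordlattice".
    if root.tag != "result" or root.get("type") != "wordlattice":
        raise ValueError("not a wordlattice result")
    return {
        "params": len(root.findall("param")),
        "lattices": len(root.findall("lattice")),
        "nodes": len(root.findall("lattice/node")),
        "arcs": len(root.findall("lattice/arc")),
    }


summary = summarize(SAMPLE)
print(summary)
```

The element paths used here follow the nesting shown in the skeleton: <param> and <lattice> sit directly under <result>, and <node> and <arc> sit under <lattice>.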

Getting wordlattice recognition results

To get word lattice output, set swirec_word_confidence_enabled to true. This adds word confidence scores to the output (see swirec_word_confidence_enabled). The application then does the following:

  1. Gets the recognition results using the wordlattice media type ("application/x-vnd.speechworks.wordlattice+xml").
  2. Optionally, specifies identification information to include in the returned results by appending the userid attribute to the media type. For example:
    "application/x-vnd.speechworks.wordlattice+xml;userid=my_information"

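The media type string can be assembled with a small helper. This is only a sketch: the function name wordlattice_media_type is hypothetical, and how the string is then passed to the recognizer is outside what this topic describes.

```python
# Media type taken from the step above.
WORDLATTICE_MEDIA_TYPE = "application/x-vnd.speechworks.wordlattice+xml"


def wordlattice_media_type(userid=None):
    """Return the wordlattice media type, with the optional userid attribute appended."""
    if userid is None:
        return WORDLATTICE_MEDIA_TYPE
    return f"{WORDLATTICE_MEDIA_TYPE};userid={userid}"


print(wordlattice_media_type("my_information"))
```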
Related parameters

These Recognizer parameters affect word lattice results:

Parameter                                                    Description
swirec_word_posterior_pruning, swirec_word_lattice_density   These parameters control the density of the word lattice. Higher values create larger lattices, which may consume more CPU cycles.

The <result> element

The <result> element (the root of the lattice result) has the following attributes:

Attribute   Description
type        The type of result; always "wordlattice".
version     The version of the XML document.
nlattices   Integer. The number of lattices. In a multi-grammar context, each grammar can generate its own word lattice. In version 1.0 of the word lattice feature, the result contains only one word lattice (which corresponds to the grammar generating the top result on the n-best list).

The <param> element

In the returned recognition results, the <param> elements contain parameters that are global to the entire lattice, such as the frame length and the absolute start time.

The <param> element has the following attributes:

Attribute   Description
name        A parameter name (see below).
value       Parameter value.

The legal parameter names are:

Parameter          Description
start_time         Absolute start time of the utterance. ("YYYYMMDDHHMMSS.mmm")
frame_length       Float. Length of a frame in seconds. This is the basic unit used for time measurement in the lattice.
utterance_length   Integer. Length of the utterance, in frames. This is used to determine the final nodes in the lattice.
userid             Optional. Alphanumeric. Identification information supplied by the user in the media type.
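The following sketch shows one way to read these parameters into typed values and derive the utterance duration in seconds as frame_length multiplied by utterance_length. The sample values are invented, and the time zone of start_time is not stated in this topic, so the parsed timestamp is left as a naive datetime.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Hypothetical sample: all values below are invented for illustration.
SAMPLE = """<result type="wordlattice" version="1.0" nlattices="1">
 <param name="start_time" value="20240131093015.250"/>
 <param name="frame_length" value="0.01"/>
 <param name="utterance_length" value="120"/>
 <param name="userid" value="my_information"/>
</result>"""


def read_params(xml_text):
    """Collect the <param> name/value pairs and convert them to typed values."""
    root = ET.fromstring(xml_text)
    raw = {p.get("name"): p.get("value") for p in root.findall("param")}
    params = {
        # "YYYYMMDDHHMMSS.mmm"; %f accepts the three millisecond digits.
        "start_time": datetime.strptime(raw["start_time"], "%Y%m%d%H%M%S.%f"),
        "frame_length": float(raw["frame_length"]),          # seconds per frame
        "utterance_length": int(raw["utterance_length"]),    # frames
        "userid": raw.get("userid"),                         # optional
    }
    # Duration in seconds follows from the two definitions above.
    params["duration_seconds"] = params["frame_length"] * params["utterance_length"]
    return params


params = read_params(SAMPLE)
print(params["start_time"], params["duration_seconds"])
```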

The <lattice> element

A recognition context may contain multiple grammars, and each grammar has its own lattice in the returned recognition results.

The <lattice> element has the following attributes:

Attribute   Description
gramname    Alphanumeric. Name of the grammar associated with the lattice.
nnodes      Integer. Number of nodes in the graph.
narcs       Integer. Number of arcs in the graph.
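Because nnodes and narcs declare how many child elements to expect, an application can use them as a consistency check after parsing. The sketch below assumes a hypothetical lattice with invented values; the helper name count_mismatches is not part of any product API.

```python
import xml.etree.ElementTree as ET

# Hypothetical lattice: names and values are invented for illustration.
SAMPLE = """<lattice gramname="demo" nnodes="3" narcs="2">
 <node id="n0" frame="0"/>
 <node id="n1" frame="60"/>
 <node id="n2" frame="120"/>
 <arc from="n0" to="n1" acoustic_score="-310.5" confidence="0.912">hello</arc>
 <arc from="n1" to="n2" acoustic_score="-295.0" confidence="0.874">world</arc>
</lattice>"""


def count_mismatches(lattice):
    """Return (attribute, declared, actual) tuples for counts that disagree."""
    declared = {
        "nnodes": int(lattice.get("nnodes")),
        "narcs": int(lattice.get("narcs")),
    }
    actual = {
        "nnodes": len(lattice.findall("node")),
        "narcs": len(lattice.findall("arc")),
    }
    return [(k, declared[k], actual[k]) for k in declared if declared[k] != actual[k]]


lattice = ET.fromstring(SAMPLE)
mismatches = count_mismatches(lattice)
print(lattice.get("gramname"), mismatches)
```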

The <node> element

In the returned recognition results, each <node> element associates a unique node id with a frame number. Multiple node ids may share the same frame number (for example, at branches in the lattice).

The <node> element has the following attributes:

Attribute   Description
id          Alphanumeric. An id associated with the node. It is an arbitrary string.
frame       Integer. The frame number associated with this node.
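Since several node ids can share a frame number, it can help to group nodes by frame and to convert frames to seconds using the frame_length parameter. The following is a sketch with invented node ids and an assumed frame length of 0.01 seconds.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical lattice fragment: ids and frames are invented for illustration.
SAMPLE = """<lattice gramname="demo" nnodes="4" narcs="0">
 <node id="n0" frame="0"/>
 <node id="n1" frame="60"/>
 <node id="n1b" frame="60"/>
 <node id="n2" frame="120"/>
</lattice>"""

FRAME_LENGTH = 0.01  # assumed value of the frame_length parameter, in seconds


def nodes_by_frame(lattice_xml):
    """Map each frame number to the list of node ids located at that frame."""
    lattice = ET.fromstring(lattice_xml)
    groups = defaultdict(list)
    for node in lattice.findall("node"):
        groups[int(node.get("frame"))].append(node.get("id"))
    return dict(groups)


def frame_to_seconds(frame, frame_length=FRAME_LENGTH):
    """Convert a frame number to an offset in seconds from the utterance start."""
    return frame * frame_length


groups = nodes_by_frame(SAMPLE)
print(groups, frame_to_seconds(60))
```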

The <arc> element

In the returned recognition results, the <arc> elements specify the lattice network connecting the nodes. The content of an <arc> element is the textual representation of the data on that arc (typically the word).

The <arc> element has the following attributes:

Attribute        Description
type             Optional. Either "word" (the default) or "silence". A silence arc has no text; a word arc requires text.
from             Alphanumeric. The id of the node that this arc departs from.
to               Alphanumeric. The id of the node that this arc enters.
acoustic_score   Float. The acoustic score for this arc.
lm_score         Optional. Float. The language model (LM) score for this arc. This attribute may be omitted when the LM score is zero.
confidence       Float in the range 0.000 to 1.000. The confidence score for this arc. (Valid for word arcs only.)
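To illustrate how the nodes and arcs fit together, the sketch below reads the arcs with the documented defaults (type defaults to "word", lm_score defaults to zero) and enumerates every word sequence through the lattice. It rests on two assumptions that this topic does not confirm: start nodes are taken to be nodes with no incoming arc, and final nodes are taken to be nodes whose frame equals utterance_length. It also assumes arcs always move forward in time, so the walk terminates. The sample lattice is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: all ids, scores, and words are invented for illustration.
SAMPLE = """<result type="wordlattice" version="1.0" nlattices="1">
 <param name="utterance_length" value="100"/>
 <lattice gramname="demo" nnodes="5" narcs="5">
  <node id="s" frame="0"/>
  <node id="a" frame="10"/>
  <node id="b" frame="50"/>
  <node id="c" frame="50"/>
  <node id="e" frame="100"/>
  <arc type="silence" from="s" to="a" acoustic_score="-20.0"/>
  <arc from="a" to="b" acoustic_score="-200.0" lm_score="-3.0" confidence="0.900">call</arc>
  <arc from="a" to="c" acoustic_score="-210.0" lm_score="-4.0" confidence="0.300">tall</arc>
  <arc from="b" to="e" acoustic_score="-180.0" confidence="0.800">home</arc>
  <arc from="c" to="e" acoustic_score="-185.0" confidence="0.700">home</arc>
 </lattice>
</result>"""


def read_arcs(lattice):
    """Read <arc> elements, applying the documented attribute defaults."""
    arcs = []
    for arc in lattice.findall("arc"):
        kind = arc.get("type", "word")          # "word" is the default type
        conf = arc.get("confidence")            # valid for word arcs only
        arcs.append({
            "type": kind,
            "from": arc.get("from"),
            "to": arc.get("to"),
            "acoustic_score": float(arc.get("acoustic_score")),
            "lm_score": float(arc.get("lm_score", "0")),  # omitted means zero
            "confidence": float(conf) if kind == "word" and conf is not None else None,
            "word": (arc.text or "").strip() if kind == "word" else None,
        })
    return arcs


def all_paths(root):
    """Yield every arc path from a start node to a final node."""
    params = {p.get("name"): p.get("value") for p in root.findall("param")}
    last_frame = int(params["utterance_length"])  # assumption: final nodes sit here
    lattice = root.find("lattice")
    frames = {n.get("id"): int(n.get("frame")) for n in lattice.findall("node")}
    arcs = read_arcs(lattice)
    outgoing = {}
    for arc in arcs:
        outgoing.setdefault(arc["from"], []).append(arc)
    entered = {arc["to"] for arc in arcs}

    def walk(node, path):
        if frames[node] == last_frame:
            yield path
            return
        for arc in outgoing.get(node, []):
            yield from walk(arc["to"], path + [arc])

    # Assumption: a start node is one that no arc enters.
    for start in (n for n in frames if n not in entered):
        yield from walk(start, [])


root = ET.fromstring(SAMPLE)
hypotheses = [
    [(a["word"], a["confidence"]) for a in path if a["type"] == "word"]
    for path in all_paths(root)
]
for hyp in hypotheses:
    print(hyp)
```

Silence arcs are kept in each path but skipped when listing words, since they carry no text. Exhaustive enumeration is only practical for small lattices; a dense lattice would call for a dynamic-programming search instead.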