Recognition Response

A speech or DTMF recognition performed on Nuance Recognizer can result in one of these gRPC response messages:

  • A Result response when recognition succeeded.

  • A Status response when recognition failed.

A recognition can fail for different reasons. For example, Nuance Recognizer can terminate a recognition if it receives insufficient audio data within a certain time, or if it detects no speech in the received audio within a configurable time. When Nuance Recognizer detects speech in the audio (and detects the end of that speech within an allowed time) or, in the case of a DTMF recognition, receives at least one DTMF digit, it returns a Result message.

The final Status or Result message returned by Nuance Recognizer contains:

On a failed recognition:

message Status {
  uint32 code = 1;                                  // HTTP-style return code: 100, 200, 4xx, or 5xx as appropriate.
  string message = 2;                               // Brief description of the status.
  string details = 3;                               // Longer description if available.
}

On a successful recognition:

message Result {
  string formatted_text = 1;                        // Formatted recognition result (could be empty).
  string status = 2;                                // Recognition status information.
}

On a successful recognition, the Result will not contain a formatted_text entry if the recognized input did not match any of the grammars (loaded via the DTMFRecognitionInit/RecognitionInit message). In that case, the Result status string shows the recognition status returned by Nuance Recognizer. For example, a status of “NO_MATCH” means that the input could not be matched against the grammars.

When the recognition is successful and there is a match, the formatted_text string contains an XML result. This XML result can be in Natural Language Semantics Markup Language (NLSML) or Extensible MultiModal Annotation (EMMA) format. The client application specifies the desired format for the result in the DTMFRecognitionInit/RecognitionInit message when initiating the recognition.
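The three outcomes described above (failure, success without a match, success with a match) can be told apart with a few checks. The following is a minimal Python sketch, not part of the Nuance API. It assumes the response has already been decoded into a dictionary with either a "result" or a "status" key, as in the JSON examples in this section. The function name classify_response and the sample values are hypothetical.

```python
from typing import Any, Dict, Tuple


def classify_response(response: Dict[str, Any]) -> Tuple[str, str]:
    """Return (outcome, detail) for a decoded response dictionary.

    outcome is "match", "no_match", or "failed".
    """
    if "result" in response:
        result = response["result"]
        # Accept both the proto field name and its camelCase JSON form.
        text = result.get("formattedText") or result.get("formatted_text") or ""
        outcome = "match" if text else "no_match"
        return outcome, result.get("status", "")
    # No Result present: treat the response as a Status (failed recognition).
    status = response.get("status", {})
    detail = f"{status.get('code', '')} {status.get('message', '')}".strip()
    return "failed", detail


# Hypothetical sample responses, for illustration only.
print(classify_response({"result": {"formattedText": "<result/>", "status": "SUCCESS"}}))
print(classify_response({"result": {"status": "NO_MATCH"}}))
print(classify_response({"status": {"code": 400, "message": "hypothetical failure"}}))
```

Checking for an empty formatted_text rather than comparing the status string against a fixed list keeps the sketch independent of the full set of status values, which this section does not enumerate.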

Example result messages

These are sample result messages.

Success with builtin grammar for digits

The digits 0 1 2 3 4 were recognized. The specified result format for the recognition was NLSML (the default when no result format is specified in the RecognitionInit message). The double-quote characters are escaped (prefixed with a \ (backslash) character) so that the result can be transmitted in JSON.

"result": {
  "formattedText": "<result><interpretation conf=\"0.93\"><text mode=\"voice\">zero one two three four</text><instance grammar=\"builtin:grammar/digits\"><SWI_meaning>01234</SWI_meaning><MEANING conf=\"0.93\">01234</MEANING><SWI_literal>zero one two three four</SWI_literal><SWI_grammarName>builtin:grammar/digits</SWI_grammarName></instance></interpretation><interpretation conf=\"0.48\"><text mode=\"voice\">zero one two oh three four</text><instance grammar=\"builtin:grammar/digits\"><SWI_meaning>012034</SWI_meaning><MEANING conf=\"0.48\">012034</MEANING><SWI_literal>zero one two oh three four</SWI_literal><SWI_grammarName>builtin:grammar/digits</SWI_grammarName></instance></interpretation></result>",
  "status": "SUCCESS"
}

Successful NLSML result

The NLSML result (reformatted for easier viewing) shows that Nuance Recognizer returned two possible interpretations for the audio. The first is the correct one, which has a confidence level of 0.93.

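<result>
  <interpretation conf="0.93">
    <text mode="voice">zero one two three four</text>
    <instance grammar="builtin:grammar/digits">
      <SWI_meaning>01234</SWI_meaning>
      <MEANING conf="0.93">01234</MEANING>
      <SWI_literal>zero one two three four</SWI_literal>
      <SWI_grammarName>builtin:grammar/digits</SWI_grammarName>
    </instance>
  </interpretation>
  <interpretation conf="0.48">
    <text mode="voice">zero one two oh three four</text>
    <instance grammar="builtin:grammar/digits">
      <SWI_meaning>012034</SWI_meaning>
      <MEANING conf="0.48">012034</MEANING>
      <SWI_literal>zero one two oh three four</SWI_literal>
      <SWI_grammarName>builtin:grammar/digits</SWI_grammarName>
    </instance>
  </interpretation>
</result>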
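A client typically needs to undo the JSON escaping and then read the interpretations out of the NLSML. The sketch below shows one way to do this with the Python standard library. It is an illustration under assumptions, not a Nuance-provided parser: the function name parse_nlsml is hypothetical, the sample XML is a shortened copy of the result above, and json.dumps is used only to simulate the escaped form that arrives on the wire.

```python
import json
import xml.etree.ElementTree as ET

# Shortened copy of the NLSML result shown above.
NLSML = (
    '<result>'
    '<interpretation conf="0.93"><text mode="voice">zero one two three four</text>'
    '<instance grammar="builtin:grammar/digits">'
    '<MEANING conf="0.93">01234</MEANING></instance></interpretation>'
    '<interpretation conf="0.48"><text mode="voice">zero one two oh three four</text>'
    '<instance grammar="builtin:grammar/digits">'
    '<MEANING conf="0.48">012034</MEANING></instance></interpretation>'
    '</result>'
)

# Simulate the wire format: json.dumps escapes the double quotes as \".
wire = json.dumps({"result": {"formattedText": NLSML, "status": "SUCCESS"}})


def parse_nlsml(formatted_text):
    """Return the interpretations sorted by descending confidence."""
    root = ET.fromstring(formatted_text)
    hypotheses = []
    for interp in root.iter("interpretation"):
        hypotheses.append({
            "confidence": float(interp.get("conf", "0")),
            "text": interp.findtext("text", default=""),
            "meaning": interp.findtext("instance/MEANING", default=""),
        })
    hypotheses.sort(key=lambda h: h["confidence"], reverse=True)
    return hypotheses


# json.loads removes the backslash escaping before the XML is parsed.
result = json.loads(wire)["result"]
best = parse_nlsml(result["formattedText"])[0]
print(best["meaning"], best["confidence"])  # 01234 0.93
```

Sorting by confidence is a defensive choice for the sketch. In the sample above the interpretations already appear in descending order of confidence.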

Successful EMMA result

The same recognition with an EMMA result format:

"result" : {
  "formatted_text": "<?xml version='1.0'?><emma:emma version=\"1.0\" xmlns:emma=\"\" xmlns:nuance=\"\"><emma:grammar id=\"grammar_1\" ref=\"builtin:grammar/digits\"/><emma:interpretation id=\"interp_1\" emma:confidence=\"0.93\" emma:grammar-ref=\"grammar_1\" emma:tokens=\"zero one two three four\" emma:duration=\"2821\" emma:mode=\"voice\"><emma:literal>01234</emma:literal></emma:interpretation></emma:emma>",
  "status": "SUCCESS"
}

The EMMA result returned in formatted_text:

<?xml version='1.0'?>
<emma:emma xmlns:emma="" xmlns:nuance="" version="1.0">
  <emma:grammar id="grammar_1" ref="builtin:grammar/digits"/>
  <emma:interpretation id="interp_1" emma:confidence="0.93" emma:grammar-ref="grammar_1" emma:tokens="zero one two three four" emma:duration="2821" emma:mode="voice">
    <emma:literal>01234</emma:literal>
  </emma:interpretation>
</emma:emma>
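EMMA results use namespaced elements and attributes, so reading them takes slightly more care than NLSML. The following Python sketch is an illustration only. The namespace URIs are blank in the sample above, so the sketch assumes the standard W3C EMMA 1.0 namespace (http://www.w3.org/2003/04/emma) and omits the nuance namespace. Check the actual xmlns:emma value in your results before relying on it. The function name parse_emma is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the W3C EMMA 1.0 namespace. The sample in this section shows
# an empty xmlns:emma value, so verify this against a real result.
EMMA_NS = "http://www.w3.org/2003/04/emma"

EMMA_XML = (
    "<?xml version='1.0'?>"
    f'<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">'
    '<emma:grammar id="grammar_1" ref="builtin:grammar/digits"/>'
    '<emma:interpretation id="interp_1" emma:confidence="0.93" '
    'emma:grammar-ref="grammar_1" emma:tokens="zero one two three four" '
    'emma:duration="2821" emma:mode="voice">'
    '<emma:literal>01234</emma:literal>'
    '</emma:interpretation></emma:emma>'
)


def qname(local_name):
    """Build the '{namespace}name' form that ElementTree uses internally."""
    return f"{{{EMMA_NS}}}{local_name}"


def parse_emma(xml_text):
    """Return one dictionary per emma:interpretation element."""
    root = ET.fromstring(xml_text)
    interpretations = []
    for interp in root.iter(qname("interpretation")):
        interpretations.append({
            # Namespaced attributes are keyed the same way as element tags.
            "confidence": float(interp.get(qname("confidence"), "0")),
            "tokens": interp.get(qname("tokens"), ""),
            "literal": interp.findtext(qname("literal"), default=""),
        })
    return interpretations


print(parse_emma(EMMA_XML))
```

Matching on the namespace URI rather than on the emma: prefix keeps the lookup correct even if a result declares the same namespace under a different prefix.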