Proprietary Recognizer features

The voice browser can use these special features in Nuance Recognizer:

Selective barge-in and magic word (MRCPv1 & MRCPv2)

An application can use the hotword mode to support two Nuance-specific barge-in modes in Nuance Recognizer: selective barge-in and magic word. These modes enable the application to recognize a specific speech or DTMF sequence and ignore anything else:

  • Selective barge-in prevents accidental interruption by allowing applications to define a small set of key words (to be spoken by callers) that trigger barge-in. An application that supports selective barge-in always listens for commands, whether the caller is speaking or listening to prompts.
  • Magic word is identical to selective barge-in except that it also rejects candidates that are too short or too long.

Typically, these modes are used with prompts that play long, informational messages (for example, email messages), where an accidental barge-in would disrupt the user’s experience. The application takes no action until the caller speaks a key word or phrase, so the prompt continues to play until a successful recognition result is returned.

Another use of these modes involves no prompt at all. The application can wait silently until triggered into action by a successful recognition. For example, a voice-dialing application could allow callers to have a conversation while Nuance Recognizer listens for a command word (a magic word). The application could let a caller place a series of telephone calls without needing to hang up: at the end of one call, the caller could speak a magic word and then give commands for the next phone call.

An application can use either selective barge-in or magic word depending on its needs:

  • Because of the duration constraints in magic word, Nuance Recognizer receives less audio. This has the advantage of reducing network traffic and Recognizer load.

    However, with magic word the duration of each sound must be checked before the voice platform can send the first audio sample to Recognizer. This has the disadvantage of adding latency to magic word recognitions.

  • With selective barge-in, on the other hand, Speech Server can send audio to Nuance Recognizer immediately once speech begins.

In both magic word and selective barge-in, the application must activate grammars that contain only single words or very short phrases such as “go back”, “skip this”, or “wake up”:

  • A complex grammar adds too much recognition processing in this mode.
  • Callers might speak with uncertainty while a prompt continues to play, and uncertainty can lower recognition success. With single words and short phrases, you limit the duration of the overlaid speech.
  • To improve the performance of magic word and selective barge-in, applications must instruct callers to pause briefly before and after saying the key words. Also, use Speech Server to set incompletetimeout to a small value (for example, 300 milliseconds).

Barge-in modes

The barge-in modes selective barge-in and magic word correspond to recognizer endpointer modes.

The endpointer and recognizer support several recognition modes: begin_only, selective_barge_in, and magic_word.

  • begin_only is the most common mode. When the endpointer detects the beginning of speech, Speech Server terminates the current prompt immediately, and sends the speech for recognition. Speech Server continues to send speech so the endpointer can adapt to the speech volume, background noise, and line noise.

    Note: Dragon Voice (Krypton recognition engine) supports this mode, but not the others.

  • selective_barge_in defines a small set of keywords that trigger barge-in. For example, an application can listen for specific commands at all times, whether the caller is speaking or listening to prompts. Speech Server sends the speech to Nuance Recognizer and awaits a successful result before terminating the current utterance or prompt.
  • magic_word is identical to selective_barge_in except that in magic_word, the endpointer rejects candidates that are too short or too long before sending them to Nuance Recognizer.

By nature, selective_barge_in and magic_word are less responsive than begin_only because they must detect speech and complete a recognition before stopping prompts.

Setting barge-in modes

The application must coordinate the endpointer and the recognizer: both resources must be set to compatible modes for any given recognition.

swiep_mode            swirec_barge_in_mode  Support
begin_only (default)  normal                Nuance Recognizer and Krypton
magic_word            magic_word            Nuance Recognizer
selective_barge_in    selective_barge_in    Nuance Recognizer
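
The pairing rule in the table above can be sketched as a small lookup. This is only an illustration: the function names are hypothetical and are not part of any Nuance API, and the mapping simply restates the table.

```python
# Compatible (swiep_mode -> swirec_barge_in_mode) pairs, restated from the table.
COMPATIBLE_MODES = {
    "begin_only": "normal",
    "magic_word": "magic_word",
    "selective_barge_in": "selective_barge_in",
}


def recognizer_mode_for(swiep_mode: str) -> str:
    """Return the swirec_barge_in_mode that pairs with the given swiep_mode."""
    try:
        return COMPATIBLE_MODES[swiep_mode]
    except KeyError:
        raise ValueError(f"unknown swiep_mode: {swiep_mode!r}") from None


def modes_are_compatible(swiep_mode: str, swirec_barge_in_mode: str) -> bool:
    """True when the endpointer and recognizer modes form a supported pair."""
    return COMPATIBLE_MODES.get(swiep_mode) == swirec_barge_in_mode
```

A check like this, run before each recognition, would catch a mismatch such as begin_only on the endpointer with magic_word on the recognizer.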

You can also use the following parameters to control barge-in modes:

Parameter                              Description                                                                                         Support
swiep_magic_word_max_msec              Maximum length of a magic word utterance that is a candidate for recognition.                       Nuance Recognizer
swiep_magic_word_min_msec              Minimum length of a magic word utterance that is a candidate for recognition.                       Nuance Recognizer
swirec_magic_word_conf_thresh          Confidence threshold for recognition results computed while the magic_word mode is active.          Nuance Recognizer
swirec_selective_barge_in_conf_thresh  Confidence threshold for recognition results computed while the selective_barge_in mode is active.  Nuance Recognizer
incompletetimeout                      Controls the duration of silence to indicate that the caller has finished speaking.                 Nuance Recognizer and Krypton
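
To illustrate how the duration and confidence parameters interact, here is a minimal sketch of the two checks described above. It is not the endpointer's or recognizer's actual implementation. The function names, the inclusive bounds, and the numeric values (300 ms, 2000 ms, a threshold of 0.5 on a 0 to 1 scale) are assumptions chosen for illustration only.

```python
def is_magic_word_candidate(duration_msec: int, min_msec: int, max_msec: int) -> bool:
    """Duration gate in the spirit of swiep_magic_word_min_msec/max_msec.

    Assumption: both bounds are inclusive.
    """
    return min_msec <= duration_msec <= max_msec


def accept_result(confidence: float, conf_thresh: float) -> bool:
    """Confidence gate in the spirit of the *_conf_thresh parameters.

    Assumption: confidence and threshold use the same scale.
    """
    return confidence >= conf_thresh


# Hypothetical settings, not product defaults.
MIN_MSEC, MAX_MSEC, CONF_THRESH = 300, 2000, 0.5

for duration, confidence in [(150, 0.9), (800, 0.7), (800, 0.2), (2500, 0.9)]:
    candidate = is_magic_word_candidate(duration, MIN_MSEC, MAX_MSEC)
    # In magic_word mode, only candidates that pass the duration gate are recognized at all.
    triggers = candidate and accept_result(confidence, CONF_THRESH)
    print(duration, confidence, "barge-in" if triggers else "ignored")
```

In selective_barge_in mode only the confidence gate applies, because the endpointer does not filter candidates by duration.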

MRCP headers

The browser uses MRCP headers to set the parameters that determine whether to interpret an utterance as a barge-in or a magic word. In MRCPv1, hotword mode is a Nuance vendor-specific feature. In MRCPv2, it is part of the MRCP specification, with changes in syntax.

The following settings determine which recognition mode is used for a given recognition:

Recognition-Mode  Hotword-Max-Duration  Hotword-Min-Duration  Mode used           Support
normal            Ignored.              Ignored.              begin_only          Nuance Recognizer and Krypton
hotword           0                     Ignored.              selective_barge_in  Nuance Recognizer
hotword           Any nonzero value     Any nonzero value     magic_word          Nuance Recognizer
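
The decision table above can be read as the following sketch. The function name is hypothetical. The table does not cover hotword mode with a nonzero maximum and a zero minimum, so the sketch returns None for that case rather than guessing.

```python
from typing import Optional


def resolve_barge_in_mode(
    recognition_mode: str,
    hotword_max_duration: int = 0,
    hotword_min_duration: int = 0,
) -> Optional[str]:
    """Map the MRCP header values to the recognition mode named in the table."""
    if recognition_mode == "normal":
        # Both duration headers are ignored in normal mode.
        return "begin_only"
    if recognition_mode == "hotword":
        if hotword_max_duration == 0:
            # The minimum duration is ignored when the maximum is 0.
            return "selective_barge_in"
        if hotword_min_duration != 0:
            return "magic_word"
        # Nonzero maximum with zero minimum: not covered by the table.
        return None
    raise ValueError(f"unknown Recognition-Mode: {recognition_mode!r}")
```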

Examples

The browser can set the barge-in mode as in the following examples.
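
As a minimal sketch, the header lines for each mode might be generated as follows. It assumes the MRCPv2 header names from the table above are used verbatim, in the Name:value style of the messages in this section. The helper name and the duration values (300 and 2000 milliseconds) are hypothetical, and the MRCPv1 vendor-specific syntax is not shown.

```python
from typing import List


def barge_in_headers(mode: str, min_msec: int = 0, max_msec: int = 0) -> List[str]:
    """Return the MRCPv2 header lines that select the given barge-in mode."""
    if mode == "begin_only":
        return ["Recognition-Mode:normal"]
    if mode == "selective_barge_in":
        # A maximum duration of 0 selects selective barge-in.
        return ["Recognition-Mode:hotword", "Hotword-Max-Duration:0"]
    if mode == "magic_word":
        # Both durations must be nonzero for magic word.
        if min_msec <= 0 or max_msec <= 0 or min_msec > max_msec:
            raise ValueError("magic_word needs 0 < min_msec <= max_msec")
        return [
            "Recognition-Mode:hotword",
            f"Hotword-Max-Duration:{max_msec}",
            f"Hotword-Min-Duration:{min_msec}",
        ]
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("begin_only", "selective_barge_in"):
    print(mode, barge_in_headers(mode))
print("magic_word", barge_in_headers("magic_word", min_msec=300, max_msec=2000))
```

These lines would be added to the headers of the recognition request alongside the usual Channel-Identifier and content headers.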

Interpreting text (MRCPv2 only)

The MRCPv2 INTERPRET method enables Nuance Recognizer to recognize plain text as an alternative to audio input. As such, it returns a standard recognition result without audio-specific values. The browser sends an Interpret-Text header containing the text for interpretation to the server and receives an INTERPRETATION-COMPLETE event with the result of the interpretation.

Note: If a RECOGNIZE, RECORD, or another INTERPRET operation is already in progress, invoking the INTERPRET method causes the response to have a status code of 402, "Method not valid in this state," and a COMPLETE request state.

The following example illustrates an INTERPRET and an INTERPRETATION-COMPLETE interaction between a client application and the server:

C->S:
 MRCP/2.0 123 INTERPRET 543266
 Channel-Identifier:32AECB23433801@speechrecog
 Interpret-Text:may I speak to Andre Roy
 Content-Type:application/srgs+xml
 Content-Id:<request1@form-level.store>
 Content-Length:104

 <?xml version="1.0"?>
 <!-- the default grammar language is US English -->
 <grammar xmlns="http://www.w3.org/2001/06/grammar"
    xml:lang="en-US" version="1.0" root="request">
   <!-- single language attachment to tokens -->
   <rule id="yes">
     <one-of>
       <item xml:lang="fr-CA">oui</item>
       <item xml:lang="en-US">yes</item>
     </one-of>
   </rule>
   <!-- single language attachment to a rule expansion -->
   <rule id="request">
     may I speak to
     <one-of xml:lang="fr-CA">
       <item>Michel Tremblay</item>
       <item>Andre Roy</item>
     </one-of>
   </rule>
 </grammar>
 
S->C:
 MRCP/2.0 49 543266 200 IN-PROGRESS
 Channel-Identifier:32AECB23433801@speechrecog
 
S->C:
 MRCP/2.0 49 INTERPRETATION-COMPLETE 543266 COMPLETE
 Channel-Identifier:32AECB23433801@speechrecog
 Completion-Cause:000 success
 Content-Type:application/nlsml+xml
 Content-Length:276

 <?xml version="1.0"?>
 <result grammar="session:request1@form-level.store">
   <interpretation>
     <instance name="Person">
       <Person>
         <Name> Andre Roy </Name>
       </Person>
     </instance>
     <input>   may I speak to Andre Roy </input>
   </interpretation>
 </result>

The event illustrated in the example shows that the INTERPRET operation is complete. The interpretation result appears in the body of the MRCP message. The request state is COMPLETE.

The Completion-Cause header is included in this event and must be set to an appropriate value from the list of cause codes.
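
A client might read such an event along the following lines. This is a simplified sketch, not a complete MRCPv2 parser: it assumes a well-formed event, ignores the message-length and Content-Length fields, and uses hypothetical function names. The sample string is an abbreviated copy of the event shown above.

```python
from typing import Dict, Tuple


def parse_mrcp_event(message: str) -> Tuple[Dict[str, str], Dict[str, str], str]:
    """Split an MRCPv2 event into start-line fields, headers, and body."""
    text = message.replace("\r\n", "\n")
    # A blank line separates the headers from the message body.
    head, _, body = text.partition("\n\n")
    lines = head.splitlines()
    version, length, event_name, request_id, request_state = lines[0].split()
    start = {
        "version": version,
        "length": length,
        "event": event_name,
        "request_id": request_id,
        "request_state": request_state,
    }
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start, headers, body


def completion_cause(headers: Dict[str, str]) -> Tuple[str, str]:
    """Return the (code, name) pair from the Completion-Cause header."""
    code, _, name = headers["Completion-Cause"].partition(" ")
    return code, name


SAMPLE_EVENT = (
    "MRCP/2.0 49 INTERPRETATION-COMPLETE 543266 COMPLETE\r\n"
    "Channel-Identifier:32AECB23433801@speechrecog\r\n"
    "Completion-Cause:000 success\r\n"
    "\r\n"
    "<result/>"
)

start, headers, body = parse_mrcp_event(SAMPLE_EVENT)
if start["event"] == "INTERPRETATION-COMPLETE" and start["request_state"] == "COMPLETE":
    print(completion_cause(headers), body)
```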