Controlling speech recognition and TTS

When applications use Nuance speech resources, communications pass up and down a multi-layer stack. The stack begins with the VoiceXML application and continues down through the voice browser (including its MRCP client) and Nuance Speech Server to the individual Nuance resources.

Although the application may not have direct control over the lower layers of this stack, developers should know the defaults at those layers and can choose non-default values where appropriate, for example by setting VoiceXML properties, MRCP parameters, or Nuance Recognizer and Vocalizer configuration parameters.

Below is a summary of the relationship among configuration parameters at different levels of the resource stack, listed in order of decreasing precedence.

  • VoiceXML properties—Properties set by the VoiceXML application for a particular session or utterance. The VoiceXML 2.0 specification defines numerous properties. When the browser interprets properties on a VoiceXML page, it is responsible for setting the corresponding MRCP or vendor-specific parameters to configure Nuance components. See Speech processing with VoiceXML.

    Note: An application also controls recognition with a session.xml file. See Configuring application sessions.
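
    For illustration, here is a minimal sketch that sets a non-default confidence threshold and no-input timeout for a single form. The confidencelevel and timeout properties are defined by VoiceXML 2.0; the form, field, and grammar names are hypothetical:

      <?xml version="1.0" encoding="UTF-8"?>
      <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
        <form id="order">
          <!-- Raise the confidence threshold and shorten the no-input
               timeout for this form only; platform defaults apply elsewhere. -->
          <property name="confidencelevel" value="0.6"/>
          <property name="timeout" value="4s"/>
          <field name="item">
            <prompt>What would you like to order?</prompt>
            <grammar src="items.grxml" type="application/srgs+xml"/>
          </field>
        </form>
      </vxml>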

  • MRCP—Your browser can either pass configuration requests through from the application or use its own default values. The browser translates VoiceXML property settings into standard MRCP headers and uses those headers to set parameters: for example, headers on the RECOGNIZE or SET-PARAMS message set recognition parameters, and headers on the SPEAK or SET-PARAMS message set Vocalizer parameters. See Implementing an MRCP client.

    The Nuance Speech Server software supports all MRCP recognition resources. See the MRCP recommendation.
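
    As a sketch of that translation, assuming MRCP v2 and purely illustrative values for the message length, request ID, channel identifier, and grammar URI, the properties from the VoiceXML example above could surface as Confidence-Threshold and No-Input-Timeout headers on a RECOGNIZE request:

      MRCP/2.0 268 RECOGNIZE 10001
      Channel-Identifier: 23af1e13@speechrecog
      Confidence-Threshold: 0.6
      No-Input-Timeout: 4000
      Content-Type: text/uri-list
      Content-Length: 31

      http://example.com/items.grxml

    The same headers could instead be sent on a SET-PARAMS request to apply to subsequent recognitions in the session rather than to a single RECOGNIZE.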

  • Nuance configuration parameters—Recognizer and Vocalizer configuration parameters directly control many aspects of the recognition and speech synthesis process when they are not overridden by requests from higher-precedence layers such as Speech Server.
    Notes: