Dragon Voice features
Dragon Voice provides raw recognition as well as a rich conversational voice experience by leveraging AI-based speech technology to support a more natural, open-dialog flow.
Main features | For information |
---|---|
Raw recognition of spoken words | See Krypton-only recognition. |
Natural conversations | See Semantic interpretation. |
Data packs, semantic and linguistic, and dynamic wordsets | See Engine models for accuracy and intelligence. |
Starter packs to accelerate application development | See Starter packs to speed development. |
Custom pronunciations for specific environments | See Custom pronunciations. |
Krypton-only recognition
The Krypton-only recognition feature provides raw recognition. It returns the words spoken in the audio.
- The application loads any needed linguistic models and wordsets, and then streams audio (speech) and triggers a recognition.
- The audio passes through the Nuance Speech Server, the Natural Language Processing service, and the Krypton recognition engine.
- Speech Server passes the result to the voice browser.
What you need to know:
- To enable Krypton-only, you must configure nlps-audio-only or server.nlps.audioOnly.
- To deploy Krypton-only, see Deployment architecture decisions.
- To understand the recognition dataflow, see Dragon Voice recognition flow.
- To prepare your application, see VoiceXML application structure.
- To prepare for recognition, see Triggering the Dragon Voice recognizer.
- To understand the recognition results, see Getting recognition results.
Semantic interpretation
Dragon Voice engines can extract intentions and entities (and values) from caller requests. This enables your applications to conduct highly intelligent, contextually aware, and natural conversational experiences.
For example, to determine what a caller wants to accomplish from what the caller says, processing runs as follows:
- Speech (audio) is streamed by the Nuance Speech Server to the Krypton recognition engine (via the Natural Language Processing service, which manages the connections between Speech Server and Dragon Voice engines).
- The Krypton engine sends the ensuing recognition result to the Natural Language Engine (NLE) for semantic processing (again via the Natural Language Processing service).
- NLE receives the result from Krypton and sends the top interpretation to NTpE for tokenization.
- NTpE feeds the resulting token sequence to NLE.
- NLE, in turn, returns the semantic results to the Speech Server as a text result (via the Natural Language Processing service). The results contain the spoken intent and any entities (also known as "concepts," "slots," or "mentions").
- Speech Server passes the result to the voice browser.
Chaining these engines together yields more accurate recognition and interpretation results than any one engine produces alone.
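The chain described above can be pictured as a sequence of function calls. The following Python sketch is purely illustrative: the function names, the `SemanticResult` type, and the keyword rules are all hypothetical stand-ins and are not part of any Dragon Voice API. It leaves out the Speech Server and the Natural Language Processing service, which in a real deployment carry every message between the engines.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SemanticResult:
    """Hypothetical container for an intent and its entities."""
    intent: str
    entities: dict = field(default_factory=dict)


def krypton_recognize(audio: bytes) -> list[str]:
    """Stand-in for Krypton: returns candidate transcriptions, best first."""
    # A real engine decodes speech; this stub pretends the audio is UTF-8 text.
    return [audio.decode("utf-8")]


def ntpe_tokenize(text: str) -> list[str]:
    """Stand-in for NTpE: lowercases the text and splits on whitespace."""
    return text.lower().split()


def nle_interpret(tokens: list[str]) -> SemanticResult:
    """Stand-in for NLE: toy keyword rules, not a real semantic model."""
    entities = {}
    if "savings" in tokens:
        entities["account_type"] = "savings"
    elif "checking" in tokens:
        entities["account_type"] = "checking"
    intent = "check_balance" if "balance" in tokens else "unknown"
    return SemanticResult(intent=intent, entities=entities)


def interpret_utterance(audio: bytes) -> SemanticResult:
    """Chain the stages: recognize, take the top result, tokenize, interpret."""
    hypotheses = krypton_recognize(audio)
    top_interpretation = hypotheses[0]
    tokens = ntpe_tokenize(top_interpretation)
    return nle_interpret(tokens)


result = interpret_utterance(b"What is my savings balance")
print(result.intent, result.entities)
```

Each stage consumes only the output of the previous one, which is why the order of the steps matters: NLE cannot produce an intent until NTpE has tokenized the top recognition result.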
Engine models for accuracy and intelligence
As input, each of the core engines requires a set of models to accomplish its tasks. The models dictate how to manipulate and understand the input to each service. In effect, the input is layered from general understandings to more specific or specialized information geared to the end user.
- Some models provide a capability that is general to all applications, while other models provide a focus specific to an application.
- You can download default models with general capabilities from Nuance Network, and you can generate custom models with specific capabilities using Nuance Experience Studio or Nuance Mix Tools. Speak to your Nuance sales representative about obtaining access to either of those tools.
For information on referencing models and wordsets (artifacts) from your Dragon Voice application, see Triggering the Dragon Voice recognizer.
Starter packs to speed development
Starter packs accelerate development of natural language applications by providing developers and Nuance Professional Services with out-of-the-box recognition and semantic understanding capabilities. Starter packs minimize the need for upfront data collection, tagging, and building of acoustic models, language models, and semantic models. Starter packs are provisioned by Nuance and typically made available as part of your project in Nuance Experience Studio or Nuance Mix Tools. Consult Nuance for more information.
Profanity filter
Krypton can remove profanities from transcriptions of spoken text. To enable this feature, edit Krypton's default.yaml file and add enableProfanityFilter in the protocol section. For example:
protocol:
  defaultVersion: '1.0'
  ...
  enableProfanityFilter: true
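If you manage several Krypton configuration files, you may prefer to script this change. The following Python sketch is one possible approach, not a Nuance-provided tool: the helper name `enable_profanity_filter` is hypothetical, and the `logging` section in the sample is invented for illustration. It edits the file contents as plain text, on the assumption that `protocol:` is a top-level key whose entries are indented beneath it. A YAML library would be more robust for complex files.

```python
def _indent_of(line: str) -> str:
    """Return the leading whitespace of a line."""
    return line[: len(line) - len(line.lstrip())]


def enable_profanity_filter(yaml_text: str) -> str:
    """Return yaml_text with enableProfanityFilter: true under protocol."""
    lines = yaml_text.splitlines()
    key = "enableProfanityFilter:"

    start = next(
        (i for i, line in enumerate(lines) if line.rstrip() == "protocol:"),
        None,
    )
    if start is None:
        # Assumption: with no protocol section present, append a new one.
        lines += ["protocol:", f"  {key} true"]
        return "\n".join(lines) + "\n"

    # The section ends at the next unindented line that is not a comment.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        line = lines[i]
        if line.strip() and not line[0].isspace() and not line.startswith("#"):
            end = i
            break

    # If the key already exists in the section, set it to true in place.
    for i in range(start + 1, end):
        if lines[i].strip().startswith(key):
            lines[i] = f"{_indent_of(lines[i])}{key} true"
            return "\n".join(lines) + "\n"

    # Otherwise insert it after the last non-blank line of the section,
    # reusing the indentation of the first existing entry (default: 2 spaces).
    indent = None
    insert_at = start + 1
    for i in range(start + 1, end):
        if lines[i].strip():
            if indent is None:
                indent = _indent_of(lines[i])
            insert_at = i + 1
    lines.insert(insert_at, f"{indent or '  '}{key} true")
    return "\n".join(lines) + "\n"


# Sample input; the "logging" section is a made-up neighbor section.
sample = "protocol:\n  defaultVersion: '1.0'\nlogging:\n  level: info\n"
updated = enable_profanity_filter(sample)
print(updated)
```

The function is idempotent, so running it twice on the same file leaves the file unchanged. Keep a backup of default.yaml before writing the result back.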
Custom pronunciations
You can add custom pronunciations to improve Krypton recognition of user speech in specific environments. You accomplish this by generating a DLM that includes a pronunciation file (also called a prons file) with words expressed in the IPA or XSAMPA phonetic alphabet. The words can be existing vocabulary or new vocabulary added with a dynamic wordset.
General procedure:
- Create a prons file named _client_prons.txt.
- Include the file when generating a DLM.
- Load the DLM into a Krypton session.
Note: The custom pronunciation feature is intended for speech scientists who are comfortable with phonetic alphabets. An alternative, non-technical way to specify unusual pronunciations is to use the wordset "spoken" option. See Using wordsets.