Understanding the manifest file
Dragon Voice requires a manifest file named nuance_package.json. The manifest is one element of a recognition model created with Nuance tools, which are not included in the Speech Suite product. You can acquire a manifest in any of the following ways:
- Nuance can create a manifest (and its artifacts) on your behalf.
- You can use Nuance Experience Studio to build DLMs and manifests.
- You can use Nuance Mix Tools to build DLMs, and then use the Speech Suite Mix Command Line Tool (provided separately, not included with Speech Suite) to generate and download manifests.
Note: The contents of the manifest file are case-sensitive. For example, the language "eng-USA" is different from "eng-usa".
Do not modify the generated JSON file. Its purpose is to define dependencies among the specific artifacts and data packs (built separately), and to enable allocation of the best resources for each application session:
- Krypton resource — allocates a specific language, data pack topic, and data pack version. The values correspond to the installed data pack file.
- NLE resource — allocates a specific pipeline name, pipeline version, and the semantic model ID.
- Optionally, the manifest configures Krypton recognition parameters, semantic models, and DLMs.
For more information on engine allocation, see Resource manager features.
Note: After generating a manifest and its artifacts, store them in the same base directory, and specify that directory as the base path of the <grammar> src attribute in VoiceXML documents. If you generate more than one manifest and its artifacts, store each set in a different base directory. (You cannot substitute or move files from one set of artifacts to another. For example, you cannot insert a DLM from one set into the file path of a different set.)
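For example, a VoiceXML document could reference the manifest as the source of a grammar, using the base directory that holds the manifest and its artifacts as the base path. This is a minimal sketch, not a definitive integration: the host, path, and form structure are assumptions for illustration only.

```
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <field name="userRequest">
      <!-- Hypothetical base path http://myHost/grammars/ holds nuance_package.json
           together with all of its generated artifacts (DLMs, NLE model, and so on) -->
      <grammar src="http://myHost/grammars/nuance_package.json"/>
      <prompt>How can I help you today?</prompt>
    </field>
  </form>
</vxml>
```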
Note: For Krypton-only recognition, the nuance_package.json manifest must not include NLE information. (The manifest fails to load if it mentions NLE resources.) When using Nuance Experience Studio, use the recognition-only project setting.

The following example shows a minimal manifest with a single DLM:
```
{
  "configVersion": "11.0.1",                      //constant value
  "domain": "DomainName",                         //constant value
  "project": "ProjectName",                       //Project name defined in the Nuance tool that generated the manifest
  "language": "eng-USA",                          //Language code
  "version": "1.2.3",
  "krypton": {
    "dpTopic": "GEN",                             //Datapack topic defined in the Nuance tool
    "dpVersion": "3.7.0",                         //Datapack version
    "sessionObjects": [
      {
        "id": "<someID>",                         //unique ID
        "type": "application/x-nuance-domainlm",  //constant value
        "url": "http://myPath/grammars/myDLM.zip", //the ASR model
        "weight": 0
      }
    ]
  }
}
```
{ "configVersion" : "11.0.1", //constant value "domain" : "DomainName", "project" : "ProjectName", "language" : "eng-USA", "version" : "1.2.3", "krypton" : { "dpTopic" : "GEN", "dpVersion" : "3.7.1", "sessionObjects" : [ { "id" : "1000_MainMenu", "type" : "application/x-nuance-domainlm", "url" : "./A1000_MainMenu_DLM.zip", "weight" : 0.25 }, { "id" : "2000_BillingMenu", "type" : "application/x-nuance-domainlm", "url" : "./A2000_BillingMenu_DLM.zip", "weight" : 0 } ], "parameters": { "noInputTimeout" : "500ms", "recognitionTimeout" : "20000ms", "utteranceEndSilence" : "1500ms", "speechDetectionSensitivity" : 500, "audioFormat" : "audio/basic;rate=8000" }, "comments": { "dpTopic" : "Name of data pack used for model", "dpVersion" : "Version of data pack used for model", "sessionObjects" : "List of DLMs to load", "sessionObjects.id" : "ID is used to specify DLM weight at runtime in URL", "sessionObjects.type" : "DLM object type", "sessionObjects.url" : "URL is the relative path to DLM", "sessionObjects.weight" : "Default weight to use for DLM", "parameters" : "Default values for runtime parameters", "parameters.noInputTimeout" : "Interval of silence permitted while awaiting user input", "parameters.recognitionTimeout" : "Maximum duration of recognition turn", "parameters.utteranceEndSilence" : "Minimum amount of silence that will cause an utterance to end", "parameters.speechDetectionSensitivity" : "Integer value that controls speech detection sensitivity level", "parameters.audioFormat" : "Incoming audio MIME type" } }, "nle" : { "pipelineName" : "QuickNLP", //inside the NLU model the nle.properties file: pipeline.providerName "pipelineVersion" : "2.6.5", //inside the NLU model the nle.properties file: pipeline.version "url" : "./nle_model.zip", //the NLU model "parameters" : { }, "comments" : { "pipelineName" : "Pipeline name.", "pipelineVersion" : "Pipeline version.", "url" : "Relative path to NLE model.", "parameters" : "NLE parameters. For future use." } } }
Set these optional parameters in the manifest file. They override session-level recognition parameters.
Note: If you set these parameters in the Speech Server configuration, you override the values set in the manifest.
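For reference, the manifest sets these parameters in the krypton parameters object, as in the complete example above. The following excerpt is an illustrative sketch only: the values are examples, and the bare integer and Boolean representations of nBestListLength and autoPunctuation are assumptions based on the Type column below, not a confirmed schema.

```
"krypton": {
  "dpTopic": "GEN",
  "dpVersion": "3.7.1",
  "parameters": {
    "noInputTimeout": "7000ms",            //time designation string
    "recognitionTimeout": "20000ms",       //maximum duration of the recognition turn
    "utteranceEndSilence": "1500ms",       //minimum silence that ends an utterance
    "speechDetectionSensitivity": 500,     //0 (least sensitive) to 1000 (most sensitive)
    "audioFormat": "audio/L16;rate=8000",  //incoming audio MIME type
    "nBestListLength": 5,                  //assumed integer representation (1-10)
    "autoPunctuation": true                //assumed Boolean representation
  }
}
```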
Parameter | Type | Default | Description
---|---|---|---
allowZeroBaseWeight | Boolean | False | When true, custom resources (DLMs, wordsets, and so on) can use the entire weight space, disabling the base LM contribution. By default, the base LM uses at least 10% of the weight space. Even when true, words from the base LM are still recognized, but with lower probability.
audioFormat | String | audio/L16;rate=8000 | Incoming audio MIME type. Default is 16-bit linear data (Linear PCM), 8 kHz sampling rate.
autoEnd | Boolean | True | Whether the recognition turn ends when Krypton detects the end of an utterance. See utteranceEndSilence below. True (default): Recognition ends when an end of utterance is detected. False: Recognition continues until the client sends a Stop or EndOfInput command. This allows multiple utterances to be transcribed in a single recognition command.
autoPunctuation | Boolean | True | Whether to enable auto punctuation. True (default): Auto punctuation is enabled if available for the language. False: Auto punctuation is disabled.
enablePartialResults | Boolean | True | Whether to enable partial results. True (default): Partial results are returned, followed by full results. See also immutablePartialResults below. False: Only full results are returned.
enableProfanityFilter | Boolean | False | Whether to filter profanities and other unacceptable language from transcriptions. Not all data packs support this feature. True: Known profanities for the language, when recognized, are replaced with asterisks (***) to reduce the explicit offensiveness of the transcription. False (default): Profanities are left as is in the transcription.
enableSpeakerProfileUpdate | Boolean | True | If speaker profiles are used in the session, whether updated speaker adaptation data is stored in the Minio server after the session. The default is set in the protocol section of the configuration (true in the sample configuration file). This setting may be overridden in EndSession.
enableUtteranceDetection | Boolean | True | Whether to activate the internal utterance detection mechanism. True (default): Utterances are detected. False: Utterances are not detected, meaning the StartOfSpeech event is not generated and certain recognition parameters are not allowed. The client must explicitly terminate the audio stream with EndOfInput. Use disableUtteranceDetection (default false) with version 1.0.
formattingParams | Array | n/a | One or more formatting options that affect how results are presented in the formattedText field. These options may be specified with or without formattingType, but should not conflict with formattingType. Values depend on the data pack, as listed in the data pack readme file or from GetInfo.
formattingType | String | n/a | A keyword that affects how ambiguous numbers are presented in the formattedText field. For example, the utterance "seven eleven" is formatted as follows for these types: all_as_words: seven eleven; date: 7/11; time: 7:11. Values depend on the data pack, as listed in the data pack readme file or from GetInfo.
immutablePartialResults | Boolean | False | When partial results are enabled (see enablePartialResults above), whether to return stable partial results. This feature may be useful if you are using real-time analytics systems. Not all data packs support this feature. True: Partial results are delivered, after a slight delay, to ensure that the recognized words do not change with the rest of the received speech. Some data packs perform additional processing after the initial transcription; the transcription may change slightly during this second pass, even when immutablePartialResults is true. False (default): Partial results are delivered as soon as speech is detected, but with low recognition confidence. These results usually change as more speech is processed and the context is better understood.
nBestListLength | Int (1-10) | 10 | The number of n-best hypotheses returned in the nBest recognition result. Default is 10.
noInputTimeout | String | n/a | A time designation string. Interval of silence allowed while waiting for user input after recognition timers have been started. Related to the timeout VoiceXML property. The default is 7000 milliseconds. (Speech Server overwrites the manifest default of no timeout.)
recognitionTimeout | String | n/a | A time designation string. Maximum duration of the recognition turn. Related to the maxspeechtimeout VoiceXML property and the Recognition-Timeout MRCP property. The default is 1000 milliseconds. (Speech Server overwrites the manifest default of no timeout.)
speechDetectionSensitivity | Integer | 500 | Integer value that controls the sensitivity level when detecting speech. Used to filter out background noise so it is not mistaken for speech. Values range from 0 (least likely to interpret noise as speech) to 1000 (least likely to interpret speech as noise or silence; in other words, highly sensitive to quiet input). Similar to the sensitivity VoiceXML property.
speechDomain | String | n/a | A mapping to internal weightsets for language models in the data pack. Providing a specific speechDomain tells Krypton to expect that type of audio input and can improve transcription quality using an internal weightset. Values depend on the data pack, as listed in the data pack readme file or from GetInfo, for example: ivr, healthcare, messaging, and so on.
startRecognitionTimers | Boolean | True | Whether timers are activated when a recognition turn starts. True (default): Timers start when the turn begins. False: Timers do not start when the turn begins. They can be started later, typically following a barge-in event, using StartRecognitionTimers. Once started, timers cannot be stopped.
suppressCallRecording | Boolean | False | Whether call recording is disabled for the recognition turn. True: Information is not sent to the call log aggregator. False (default): Call logs, metadata, and audio are collected by the aggregator.
suppressInitialCapitalization | Boolean | False | When true, the first word in a sentence is not automatically capitalized. This option does not affect words that are capitalized by definition, such as proper names, place names, and so on.
utteranceEndSilence | String | 500 | A time designation string. The minimum amount of silence that will cause an utterance to end, up to a maximum of 2750 milliseconds. Related to the incompletetimeout VoiceXML property and the Speech-Incomplete-Timeout MRCP property. The default is 1500 (1.5 seconds). (Speech Server overwrites the manifest default of 500.) Accepted values in milliseconds are: 0, 250, 500, 1000, 1500, 2000, 2250, 2500, 2750. Any other number is mapped to the nearest greater allowed value. (For example, 1200 is rounded to 1500, and 3500 is rounded to 2750, the greatest value allowed.)