Understanding the manifest file
Dragon Voice requires a manifest file named nuance_package.json. The manifest is one element of a recognition model created with Nuance tools, which are not included in the Speech Suite product. You can acquire a manifest in any of the following ways:
- Nuance can create a manifest (and its artifacts) on your behalf.
- You can use Nuance Experience Studio to build DLMs and manifests.
- You can use Nuance Mix Tools to build DLMs, and then use the Speech Suite Mix Command Line Tool (provided separately, not included with Speech Suite) to generate and download manifests.
Note: The contents of the manifest file are case-sensitive. For example, the language "eng-USA" is different from "eng-usa".
Do not modify the generated JSON file. Its purpose is to define dependencies among the specific artifacts and data packs (built separately), and to enable allocation of the best resources for each application session:
- Krypton resource — allocates a specific language, data pack topic, and data pack version. The values correspond to the installed data pack file.
- NLE resource — allocates a specific pipeline name, pipeline version, and the semantic model ID.
- Optionally, the manifest configures Krypton recognition parameters, semantic models, and DLMs.
For more information on engine allocation, see Resource manager features.
Note: After generating a manifest and its artifacts, store them in the same base directory, and specify that directory as the base path of the <grammar> src attribute in VoiceXML documents. If you generate more than one manifest and its artifacts, store each set in a different base directory. (You cannot substitute or move files from one set of artifacts to another. For example, you cannot insert a DLM from one set into the file path of a different set.)
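For example, a VoiceXML document could reference the manifest as the source of a grammar, using the base directory that holds the manifest and its artifacts as the base path. This is a minimal sketch, not a definitive integration: the host, path, and form structure are assumptions for illustration only.

```
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <field name="userRequest">
      <!-- Hypothetical base path http://myHost/grammars/ holds nuance_package.json
           together with all of its generated artifacts (DLMs, NLE model, and so on) -->
      <grammar src="http://myHost/grammars/nuance_package.json"/>
      <prompt>How can I help you today?</prompt>
    </field>
  </form>
</vxml>
```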
Note: For Krypton-only recognition, the nuance_package.json manifest must not include NLE information. (The manifest fails to load if it mentions NLE resources.) When using Nuance Experience Studio, use the recognition-only project setting.

The following example shows a minimal manifest with a single DLM:
```
{
  "configVersion": "11.0.1",                      //constant value
  "domain": "DomainName",                         //constant value
  "project": "ProjectName",                       //Project name defined in the Nuance tool that generated the manifest
  "language": "eng-USA",                          //Language code
  "version": "1.2.3",
  "krypton": {
    "dpTopic": "GEN",                             //Datapack topic defined in the Nuance tool
    "dpVersion": "3.7.0",                         //Datapack version
    "sessionObjects": [
      {
        "id": "<someID>",                         //unique ID
        "type": "application/x-nuance-domainlm",  //constant value
        "url": "http://myPath/grammars/myDLM.zip", //the ASR model
        "weight": 0
      }
    ]
  }
}
```
{ "configVersion" : "11.0.1", //constant value "domain" : "DomainName", "project" : "ProjectName", "language" : "eng-USA", "version" : "1.2.3", "krypton" : { "dpTopic" : "GEN", "dpVersion" : "3.7.1", "sessionObjects" : [ { "id" : "1000_MainMenu", "type" : "application/x-nuance-domainlm", "url" : "./A1000_MainMenu_DLM.zip", "weight" : 0.25 }, { "id" : "2000_BillingMenu", "type" : "application/x-nuance-domainlm", "url" : "./A2000_BillingMenu_DLM.zip", "weight" : 0 } ], "parameters": { "noInputTimeout" : "500ms", "recognitionTimeout" : "20000ms", "utteranceEndSilence" : "1500ms", "speechDetectionSensitivity" : 500, "audioFormat" : "audio/basic;rate=8000" }, "comments": { "dpTopic" : "Name of data pack used for model", "dpVersion" : "Version of data pack used for model", "sessionObjects" : "List of DLMs to load", "sessionObjects.id" : "ID is used to specify DLM weight at runtime in URL", "sessionObjects.type" : "DLM object type", "sessionObjects.url" : "URL is the relative path to DLM", "sessionObjects.weight" : "Default weight to use for DLM", "parameters" : "Default values for runtime parameters", "parameters.noInputTimeout" : "Interval of silence permitted while awaiting user input", "parameters.recognitionTimeout" : "Maximum duration of recognition turn", "parameters.utteranceEndSilence" : "Minimum amount of silence that will cause an utterance to end", "parameters.speechDetectionSensitivity" : "Integer value that controls speech detection sensitivity level", "parameters.audioFormat" : "Incoming audio MIME type" } }, "nle" : { "pipelineName" : "QuickNLP", //inside the NLU model the nle.properties file: pipeline.providerName "pipelineVersion" : "2.6.5", //inside the NLU model the nle.properties file: pipeline.version "url" : "./nle_model.zip", //the NLU model "parameters" : { }, "comments" : { "pipelineName" : "Pipeline name.", "pipelineVersion" : "Pipeline version.", "url" : "Relative path to NLE model.", "parameters" : "NLE parameters. For future use." } } }
Set these optional parameters in the manifest file. They override session-level recognition parameters.
Note: If you set these parameters in the Speech Server configuration, you override the values set in the manifest.
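For reference, the manifest sets these parameters in the krypton parameters object, as in the complete example above. The following excerpt is an illustrative sketch only: the values are examples, and the bare integer and Boolean representations of nBestListLength and autoPunctuation are assumptions based on the Type column below, not a confirmed schema.

```
"krypton": {
  "dpTopic": "GEN",
  "dpVersion": "3.7.1",
  "parameters": {
    "noInputTimeout": "7000ms",            //time designation string
    "recognitionTimeout": "20000ms",       //maximum duration of the recognition turn
    "utteranceEndSilence": "1500ms",       //minimum silence that ends an utterance
    "speechDetectionSensitivity": 500,     //0 (least sensitive) to 1000 (most sensitive)
    "audioFormat": "audio/L16;rate=8000",  //incoming audio MIME type
    "nBestListLength": 5,                  //assumed integer representation (1-10)
    "autoPunctuation": true                //assumed Boolean representation
  }
}
```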
Parameter | Type | Default | Description
---|---|---|---
allowZeroBaseWeight | Boolean | False | When true, custom resources (DLMs, wordsets, and so on) can use the entire weight space, disabling the base LM contribution. By default, the base LM uses at least 10% of the weight space. Even when true, words from the base LM are still recognized, but with lower probability.
audioFormat | String | audio/L16;rate=8000 | Incoming audio MIME type. Default is 16-bit linear data (Linear PCM), 8 kHz sampling rate.
autoEnd | Boolean | True | Whether the recognition turn ends when Krypton detects the end of an utterance. See utteranceEndSilence below. True (default): Recognition ends when an end of utterance is detected. False: Recognition continues until the client sends a Stop or EndOfInput command. This allows multiple utterances to be transcribed in a single recognition command.
autoPunctuation | Boolean | True | Whether to enable auto punctuation. True (default): Auto punctuation is enabled if available for the language. False: Auto punctuation is disabled.
enablePartialResults | Boolean | True | Whether to enable partial results. True (default): Partial results are returned, followed by full results. See also immutablePartialResults below. False: Only full results are returned.
enableProfanityFilter | Boolean | False | Whether to filter profanities and other unacceptable language from transcriptions. Not all data packs support this feature. True: Known profanities for the language, when recognized, are replaced with asterisks (***) to reduce the explicit offensiveness of the transcription. False (default): Profanities are left as is in the transcription.
enableSpeakerProfileUpdate | Boolean | True | If speaker profiles are used in the session, whether updated speaker adaptation data is stored in the Minio server after the session. The default is set in the protocol section of the configuration (true in the sample configuration file). This setting may be overridden in EndSession.
enableUtteranceDetection | Boolean | True | Whether to activate the internal utterance detection mechanism. True (default): Utterances are detected. False: Utterances are not detected, meaning the StartOfSpeech event is not generated and certain recognition parameters are not allowed. The client must explicitly terminate the audio stream with EndOfInput. Use disableUtteranceDetection (default false) with version 1.0.
formattingParams | Array | n/a | One or more formatting options that affect how results are presented in the formattedText field. These options may be specified with or without formattingType, but should not conflict with formattingType. Values depend on the data pack, as listed in the data pack readme file or from GetInfo.
formattingType | String | n/a | A keyword that affects how ambiguous numbers are presented in the formattedText field. For example, the utterance "seven eleven" is formatted as follows for these types: all_as_words: seven eleven; date: 7/11; time: 7:11. Values depend on the data pack, as listed in the data pack readme file or from GetInfo.
immutablePartialResults | Boolean | False | When partial results are enabled (see enablePartialResults above), whether to return stable partial results. This feature may be useful if you are using real-time analytics systems. Not all data packs support this feature. True: Partial results are delivered, after a slight delay, to ensure that the recognized words do not change with the rest of the received speech. Some data packs perform additional processing after the initial transcription; the transcription may change slightly during this second pass, even when immutablePartialResults is true. False (default): Partial results are delivered as soon as speech is detected, but with low recognition confidence. These results usually change as more speech is processed and the context is better understood.
nBestListLength | Int (1-10) | 10 | The number of n-best hypotheses returned in the nBest recognition result. Default is 10.
noInputTimeout | String | n/a | A time designation string. Interval of silence allowed while waiting for user input after recognition timers have been started. Related to the timeout VoiceXML property. The default is 7000 milliseconds. (Speech Server overwrites the manifest default of no timeout.)
recognitionTimeout | String | n/a | A time designation string. Maximum duration of the recognition turn. Related to the maxspeechtimeout VoiceXML property and the Recognition-Timeout MRCP property. The default is 1000 milliseconds. (Speech Server overwrites the manifest default of no timeout.)
speechDetectionSensitivity | Integer | 500 | Integer value that controls the sensitivity level when detecting speech. Used to filter out background noise so it is not mistaken for speech. Values range from 0 (least likely to interpret noise as speech) to 1000 (least likely to interpret speech as noise or silence; in other words, highly sensitive to quiet input). Similar to the sensitivity VoiceXML property.
speechDomain | String | n/a | A mapping to internal weightsets for language models in the data pack. Providing a specific speechDomain tells Krypton to expect that type of audio input and can improve transcription quality using an internal weightset. Values depend on the data pack, as listed in the data pack readme file or from GetInfo, for example: ivr, healthcare, messaging, and so on.
startRecognitionTimers | Boolean | True | Whether timers are activated when a recognition turn starts. True (default): Timers start when the turn begins. False: Timers do not start when the turn begins. They can be started later, typically following a barge-in event, using StartRecognitionTimers. Once started, timers cannot be stopped.
suppressCallRecording | Boolean | False | Whether call recording is disabled for the recognition turn. True: Information is not sent to the call log aggregator. False (default): Call logs, metadata, and audio are collected by the aggregator.
suppressInitialCapitalization | Boolean | False | When true, the first word in a sentence is not automatically capitalized. This option does not affect words that are capitalized by definition, such as proper names, place names, and so on.
utteranceEndSilence | String | 500 | A time designation string. The minimum amount of silence that will cause an utterance to end, up to a maximum of 2750 milliseconds. Related to the incompletetimeout VoiceXML property and the Speech-Incomplete-Timeout MRCP property. The default is 1500 (1.5 seconds). (Speech Server overwrites the manifest default of 500.) Accepted values in milliseconds are: 0, 250, 500, 1000, 1500, 2000, 2250, 2500, 2750. Any other number is mapped to the nearest greater allowed value. (For example, 1200 is rounded to 1500, and 3500 is rounded to 2750, the greatest value allowed.)