Domain LMs
Each data pack supplied with ASRaaS provides a base language model that lets ASRaaS recognize the most common terms and constructs in the language and locale.
You may complement this language model with one or more domain-specific models, called domain language models (domain LMs or DLMs). Each DLM is based on sentences from a specific environment, or domain, and may include one or more entities, or collections of terms used in that environment.
For example, a Pharmacy DLM might have a MEDS entity, and a Travel DLM might have NAMES and PLACES entities.
DLMs are created in Mix from your training sentences, entities, and optionally a custom pronunciation file. For an in-depth look at DLMs in Mix, consult Best practices > ASR modeling.
To use a DLM in ASRaaS, declare it in RecognitionInitMessage: RecognitionResource, specifying the DLM’s location as a URN available in Mix. See URN format of resources.
Optionally give it a weight with weight_value or weight_enum. The default weight for each DLM is 0.25 or MEDIUM. See Resource weights.
Each recognition request allows five DLMs for each reuse setting (LOW_REUSE and HIGH_REUSE).
# Declare DLM
travel_dlm = RecognitionResource(
    external_reference = ResourceReference(
        type = 'DOMAIN_LM',
        uri = 'urn:nuance-mix:tag:model/<context_tag>/mix.asr?=language=eng-USA'
    ),
    weight_value = 0.5
)

# Include DLM in recognition request
init = RecognitionInitMessage(
    parameters = RecognitionParameters(
        language = 'en-US',
        topic = 'GEN',
        audio_format = AudioFormat(pcm=PCM(sample_rate_hz=16000))
    ),
    resources = [travel_dlm]
)
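Because each request allows only five DLMs per reuse setting, you may want to count them before building the request. The following is a minimal sketch of such a check, not part of the ASRaaS API: DlmSpec and check_dlm_limits are hypothetical names, and DlmSpec is a plain stand-in for the RecognitionResource message so that the example runs without the generated gRPC stubs.

```python
from collections import Counter
from dataclasses import dataclass

MAX_DLMS_PER_REUSE = 5  # limit stated in this section


@dataclass
class DlmSpec:
    """Hypothetical stand-in for a DOMAIN_LM RecognitionResource."""
    uri: str
    reuse: str  # 'LOW_REUSE' or 'HIGH_REUSE'


def check_dlm_limits(dlms):
    """Return one message per reuse setting that exceeds the limit."""
    counts = Counter(dlm.reuse for dlm in dlms)
    return [
        f"{count} DLMs declared with {reuse}; the limit is {MAX_DLMS_PER_REUSE}"
        for reuse, count in counts.items()
        if count > MAX_DLMS_PER_REUSE
    ]
```

An empty list means the declared DLMs are within the limit for both reuse settings.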
DLMs and wordsets
You may optionally extend DLM entities with wordsets. A wordset associated with an entity provides additional terms in the same category as the entity, used for a recognition session only. For example, the MEDS entity could be extended with a wordset that lists new medications, or medications available in a specific region.
You may also create standalone wordsets, or wordsets not associated with a DLM entity. See Standalone wordsets.
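As an illustration of the MEDS example above, a wordset might be built and serialized as follows. This is a sketch under assumptions: it assumes the wordset is a JSON object keyed by entity name, with each term given as a "literal" and optional "spoken" forms. The medication names are invented, and wordset_json is a hypothetical variable name. See the wordset documentation for the authoritative format.

```python
import json

# Assumed layout: entity name -> list of terms, each with a "literal"
# and, optionally, a list of "spoken" forms.
meds_wordset = {
    "MEDS": [
        {"literal": "Examplomab"},  # invented medication name
        {"literal": "Fictizol", "spoken": ["fik tih zol"]},  # invented
    ]
}

# Serialize the wordset to a JSON string for use in a request.
wordset_json = json.dumps(meds_wordset)
```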
Custom pronunciations
A DLM may optionally contain custom pronunciations to improve recognition of user speech in a specific environment. As you generate a DLM in Mix, you may include a file containing words expressed in the IPA or XSAMPA phonetic alphabet. The words may be existing words or words to be added with a wordset. To create this file, named _client_prons.txt, see Best practices > ASR modeling > Tips on prons files.
Then upload the prons file in Mix using the Import/Export tab in PROJECTS. Choose ASR Pronunciations and upload your _client_prons.txt file. See Best practices > ASR modeling > Importing files.
Note:
In Mix, custom prons are handled by Nuance Professional Services.
Language in request and DLM
Mix generates a DLM using the project’s data pack, identified by language and topic. (The topic is usually gen, or General.) This information is shown in the Project tab under Details.
When you use a DLM in ASRaaS, the recognition request must use the same language and topic as the DLM. For example, this request has language = 'en-US' and topic = 'GEN' and includes a DLM, travel_dlm. The DLM was created in a Mix project with en-US and gen, so it’s compatible with the request.
# Declare DLM, which was created with en-US GEN data pack
travel_dlm = RecognitionResource(
    external_reference = ResourceReference(
        type = 'DOMAIN_LM',
        uri = 'urn:nuance-mix:tag:model/names-places/mix.asr?=language=eng-USA'
    )
)

# Recognition request specifies en-US GEN data pack
RecognitionInitMessage(
    parameters = RecognitionParameters(
        language = 'en-US',
        topic = 'GEN',
        ...
    ),
    resources = [travel_dlm]
)
If the language/topic in the request and the DLM do not match, ASRaaS returns an error. For example, if the request has language = 'en-GB' but the DLM was created in an en-US Mix project, a “language mismatch” error is returned:
code: 400,
message: "Bad request",
details: "language mismatch for URN(s): urn:nuance-mix:tag:model/names-places/mix.asr?=language=eng-USA"
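To catch this kind of mismatch before sending a request, you might compare the language in each DLM URN with the request language. The sketch below is a client-side convenience only, not an ASRaaS feature. The helper names and the URN_LANGUAGE table are hypothetical, and only the en-US to eng-USA pairing is taken from the example above. Any request language missing from the table is treated as unconfirmed and reported.

```python
import re

# Assumed mapping from request language codes to the codes used in Mix URNs.
# Only the en-US entry comes from the example above; extend it as needed.
URN_LANGUAGE = {"en-US": "eng-USA"}


def urn_language(urn):
    """Extract the language value from a Mix DLM URN, or None if absent."""
    match = re.search(r"language=([A-Za-z]{3}-[A-Za-z]{3})", urn)
    return match.group(1) if match else None


def mismatched_urns(request_language, urns):
    """Return the URNs whose language cannot be confirmed to match the request."""
    expected = URN_LANGUAGE.get(request_language)
    return [u for u in urns if expected is None or urn_language(u) != expected]
```

For the names-places URN above, mismatched_urns returns an empty list when the request language is 'en-US', and returns the URN when it is 'en-GB'.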