Speaker profiles

Speaker adaptation is a technique that adapts the acoustic model to the qualities of the speaker and the channel, improving speech recognition. The best results come from updating the data pack’s acoustic model in real time, based on the current utterance.

ASRaaS can store each caller’s adaptation data as a speaker profile in an internal datastore.

To use a speaker profile in ASRaaS, declare it in a ResourceReference with type SPEAKER_PROFILE, and include a user_id in the RecognitionInitMessage:

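# Assumes the ASRaaS message classes used below are imported from your
# generated Python stubs, and that travel_dlm and places_wordset are
# resources defined earlier.

# Define speaker profile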
speaker_profile = RecognitionResource(
    external_reference = ResourceReference(
        type = 'SPEAKER_PROFILE'
    )
)

# Include profile in RecognitionInitMessage
init = RecognitionInitMessage(
    parameters = RecognitionParameters(
        language = 'en-US',
        topic = 'GEN',
        audio_format = AudioFormat(pcm=PCM(sample_rate_hz=16000))
    ),
    resources = [travel_dlm, places_wordset, speaker_profile],
    user_id = 'james.somebody@aardvark.com'
)

The user_id must uniquely identify the speaker. For example:

user_id='socha.someone@aardvark.com'
user_id='erij-lastname'
user_id='device-1234'
user_id='33ba3676-3423-438c-9581-bec1dc52548a'
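The examples above mix email addresses, names, device IDs, and UUIDs. If you would rather not send personal data such as an email address as the user_id, one option (our assumption, not an ASRaaS requirement) is to derive a stable, pseudonymous ID from your own account key with a name-based UUID (uuid5). Because uuid5 is deterministic, the same key always yields the same user_id, so the caller keeps reaching the same profile. The namespace string `asr.example.com` and the helper name `speaker_user_id` are hypothetical.

```python
import uuid

# Hypothetical application namespace. Keep it fixed: changing it changes
# every derived user_id, which would orphan existing speaker profiles.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "asr.example.com")


def speaker_user_id(account_key: str) -> str:
    """Derive a stable, pseudonymous user_id from an internal account key.

    uuid5 hashes (namespace, name) with SHA-1, so the same account_key
    always maps to the same user_id and therefore the same speaker profile.
    """
    return str(uuid.uuid5(APP_NAMESPACE, account_key))


if __name__ == "__main__":
    uid = speaker_user_id("james.somebody@aardvark.com")
    print(uid)  # a 36-character UUID string, identical on every run
```

The result is pseudonymous rather than anonymous: anyone who knows the namespace and can guess the input can recompute the ID. Keeping the derivation deterministic also means you can recompute the same user_id later, for example when removing a profile.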

The first time you send a request with a speaker profile, ASRaaS creates a profile for that user_id and stores the adaptation data in it. On subsequent requests with the same user_id, ASRaaS adds new data to the profile, further adapting the acoustic model to that specific speaker for customized recognition.
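The create-then-accumulate behavior described above can be pictured with a toy, in-memory store keyed by user_id. This is only a conceptual sketch: the class `SpeakerProfileStore` and its methods are hypothetical, and ASRaaS keeps real profiles in its own internal datastore, not in client code.

```python
class SpeakerProfileStore:
    """Toy illustration of the documented profile rules (not ASRaaS internals).

    - First request for a user_id: a new profile is created.
    - Later requests with the same user_id: data is added to that profile.
    """

    def __init__(self) -> None:
        # Maps user_id -> list of adaptation data items (strings here).
        self._profiles: dict[str, list[str]] = {}

    def record(self, user_id: str, adaptation_data: str) -> bool:
        """Add data for user_id; return True if this created a new profile."""
        created = user_id not in self._profiles
        self._profiles.setdefault(user_id, []).append(adaptation_data)
        return created

    def data_for(self, user_id: str) -> list[str]:
        """Return a copy of the data stored for user_id (empty if none)."""
        return list(self._profiles.get(user_id, []))


if __name__ == "__main__":
    store = SpeakerProfileStore()
    print(store.record("device-1234", "request 1 data"))  # True: profile created
    print(store.record("device-1234", "request 2 data"))  # False: added to profile
    print(store.data_for("device-1234"))  # ['request 1 data', 'request 2 data']
```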

Unlike DLMs, speaker profiles do not take a weight.

By default, the adaptation data is saved to the speaker profile after the ASRaaS session ends. If you do not need to keep this data after the session, set discard_speaker_adaptation to true in RecognitionFlags:

# Define speaker profile
speaker_profile = RecognitionResource(
    external_reference = ResourceReference(
        type = 'SPEAKER_PROFILE'
    )
)

# Include profile in RecognitionInitMessage, optionally discard after adaptation
init = RecognitionInitMessage(
    parameters = RecognitionParameters(
        language = 'en-US',
        topic = 'GEN',
        audio_format = AudioFormat(pcm=PCM(sample_rate_hz=16000)),
        recognition_flags = RecognitionFlags(discard_speaker_adaptation=True)
    ),
    resources = [travel_dlm, places_wordset, speaker_profile],
    user_id = 'james.somebody@aardvark.com'
)
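To make the save-by-default versus discard behavior concrete, here is a toy model of a single session. It assumes, as the wording "after the session" suggests, that adaptation is still gathered during the session and that the flag only controls whether it is kept afterward. The class `AdaptationSession` and its methods are hypothetical; ASRaaS applies this logic on the server, and the only real control you have is the discard_speaker_adaptation flag shown above.

```python
class AdaptationSession:
    """Toy model of one session's adaptation data (illustration only).

    Data collected during the session is written to the persistent profile
    store when the session closes, unless discard_speaker_adaptation is set.
    """

    def __init__(self, profiles: dict[str, list[str]], user_id: str,
                 discard_speaker_adaptation: bool = False) -> None:
        self._profiles = profiles          # persistent store, keyed by user_id
        self._user_id = user_id
        self._discard = discard_speaker_adaptation
        self._pending: list[str] = []      # adaptation gathered in this session

    def adapt(self, utterance_data: str) -> None:
        """Collect adaptation data while the session is running."""
        self._pending.append(utterance_data)

    def close(self) -> None:
        """End the session: save pending data unless discard was requested."""
        if not self._discard:
            self._profiles.setdefault(self._user_id, []).extend(self._pending)
        self._pending.clear()


if __name__ == "__main__":
    profiles: dict[str, list[str]] = {}

    kept = AdaptationSession(profiles, "device-1234")
    kept.adapt("utterance 1")
    kept.close()
    print(profiles)  # {'device-1234': ['utterance 1']}

    dropped = AdaptationSession(profiles, "device-1234",
                                discard_speaker_adaptation=True)
    dropped.adapt("utterance 2")
    dropped.close()
    print(profiles)  # still {'device-1234': ['utterance 1']}
```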

If you need to remove speaker profiles, use the ForgetMe API.