Configuring voice enrollment

Abort-Phrase-Enrollment

The Abort-Phrase-Enrollment header can optionally be specified in the END-PHRASE-ENROLLMENT method to abort the phrase enrollment, rather than committing the phrase to the personal grammar.

abort-phrase-enrollment = "Abort-Phrase-Enrollment" ":" Boolean-Value CRLF

Clash-Threshold

The Clash-Threshold header can be sent as part of the START-PHRASE-ENROLLMENT, SET-PARAMS, or GET-PARAMS method. Used during voice enrollment, this header specifies how similar the pronunciations of two different phrases can be before they are considered to clash. For example, the pronunciations of “John Smith” and “Jon Smits” may be so similar that the recognizer cannot reliably distinguish them. A smaller threshold reduces the number of clashes detected. The value is a float between 0.0 and 1.0. The default value is implementation specific. To turn off clash testing, set the Clash-Threshold header value to 0.

clash-threshold = "Clash-Threshold" ":" FLOAT CRLF

Confusable-Phrases-URI

The Confusable-Phrases-URI header specifies a grammar that defines invalid phrases for enrollment. For example, typical applications do not allow an enrolled phrase that is also a command word. This header may occur in RECOGNIZE requests that are part of an enrollment session.

confusable-phrases-uri = "Confusable-Phrases-URI" ":" Uri CRLF

Consistency-Threshold

The Consistency-Threshold header may be sent as part of the START-PHRASE-ENROLLMENT, SET-PARAMS, or GET-PARAMS method. Used during voice enrollment, this header specifies how similar an utterance must be to a previously enrolled pronunciation of the same phrase to be considered consistent. The higher the threshold, the closer the match between an utterance and previous pronunciations must be. The value is a float between 0.0 and 1.0. The default value is implementation specific.

consistency-threshold = "Consistency-Threshold" ":" FLOAT CRLF
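Because both Clash-Threshold and Consistency-Threshold are documented as floats between 0.0 and 1.0, a client may find it useful to check a value before building the header. The following is a minimal Python sketch. The function name threshold_header and the decision to reject out-of-range values on the client side are assumptions made for illustration; they are not part of Nuance Speech Server or MRCP.

```python
def threshold_header(name: str, value: float) -> str:
    """Format a threshold header line such as 'Consistency-Threshold:0.5'.

    Hypothetical helper: rejects values outside the documented 0.0-1.0 range.
    """
    value = float(value)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return f"{name}:{value}"


print(threshold_header("Consistency-Threshold", 0.5))
print(threshold_header("Clash-Threshold", 0))  # 0 turns off clash testing
```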

Enroll-Utterance

The Enroll-Utterance header may be specified in the RECOGNIZE method. If this header is set to TRUE and an enrollment session is active, the RECOGNIZE command must add the collected utterance to the personal grammar that is being enrolled. The default value for this header is FALSE.

enroll-utterance = "Enroll-Utterance" ":" Boolean-Value CRLF

The client is expected to set this header to TRUE when the RECOGNIZE request is part of an enrollment, and to FALSE when it performs a regular recognition during an enrollment session.

New-Phrase-Id

The New-Phrase-Id header replaces the ID used to identify the phrase in a personal grammar. The recognizer returns the new ID when the enrollment grammar is used. This header may occur in MODIFY-PHRASE requests.

new-phrase-id = "New-Phrase-ID" ":" 1*VCHAR CRLF

New-Phrase-Id is used for MODIFY-PHRASE and changes the rule in the personal grammar.

Num-Min-Consistent-Pronunciations

The Num-Min-Consistent-Pronunciations header may be specified in a START-PHRASE-ENROLLMENT, SET-PARAMS, or GET-PARAMS method and is used to specify the minimum number of consistent pronunciations that must be obtained to voice enroll a new phrase. The minimum value is 1. The default value is implementation specific and may be greater than 1.

num-min-consistent-pronunciations = "Num-Min-Consistent-Pronunciations" ":" 1*DIGIT CRLF

Nuance Speech Server tracks the number of consistent pronunciations and does not count no-matches or no-inputs. Speech Server must also parse the enrollment result to determine the consistency status; only consistent utterances count towards Num-Min-Consistent-Pronunciations.
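The counting rule can be sketched in a few lines of Python. The outcome labels and the function name are hypothetical: the enrollment result format is not shown in this section, so the sketch assumes each RECOGNIZE outcome has already been reduced to a simple label.

```python
def repetitions_still_needed(outcomes: list[str], minimum: int) -> int:
    """Return how many more consistent pronunciations are required.

    `outcomes` holds one hypothetical label per RECOGNIZE call:
    "consistent", "inconsistent", "no-match", or "no-input".
    Only "consistent" counts towards Num-Min-Consistent-Pronunciations.
    """
    if minimum < 1:
        raise ValueError("Num-Min-Consistent-Pronunciations must be at least 1")
    consistent = sum(1 for outcome in outcomes if outcome == "consistent")
    return max(0, minimum - consistent)


# One consistent utterance so far, two required: one more is needed.
print(repetitions_still_needed(["no-input", "consistent", "no-match"], 2))
```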

Personal-Grammar-URI

The Personal-Grammar-URI header specifies the speaker-trained grammar to be used or referenced during enrollment operations. Phrases are added to this grammar during enrollment. For example, a contact list for user “Jeff” could be stored at the Personal-Grammar-URI http://myserver.example.com/myenrollmentdb/jeff-list. Nuance Speech Server, using the HTTP database, stores the source grammar at this location. The generated grammar syntax may be implementation specific. There is no default value for this header.

personal-grammar-uri = "Personal-Grammar-URI" ":" Uri CRLF

Phrase-Id

In a request, the Phrase-Id header identifies a phrase in an existing personal grammar for which enrollment is desired. It is also returned to the client in the RECOGNIZE complete event. This header may occur in START-PHRASE-ENROLLMENT, MODIFY-PHRASE or DELETE-PHRASE requests. There is no default value for this header.

phrase-id = "Phrase-ID" ":" 1*VCHAR CRLF

Nuance Speech Server stores this rule in a personal grammar.

Phrase-NL

The Phrase-NL header is a string that specifies the natural language statement, in one of the active grammars, that applies to the phrase once the phrase is recognized. This header can occur in START-PHRASE-ENROLLMENT and MODIFY-PHRASE requests. There is no default value for this header.

phrase-nl = "Phrase-NL" ":" 1*VCHAR CRLF

Nuance Speech Server stores Phrase-NL as SWI_meaning in a personal grammar.

Save-Best-Waveform

The Save-Best-Waveform header allows the client to request that the recognizer resource save the audio stream for the best repetition of the phrase used during the enrollment session. The recognizer must attempt to record the recognized audio and make it available to the client as a URI returned in the Waveform-URI header in the response to the END-PHRASE-ENROLLMENT method. If there was an error in recording the stream, or the audio data is otherwise not available, the recognizer must return an empty Waveform-URI header.

save-best-waveform = "Save-Best-Waveform" ":" Boolean-Value CRLF

Nuance Speech Server uses the last utterance or, if possible, parses recognition results and keeps the utterance with the highest confidence.

Weight

The value of the Weight header represents the occurrence likelihood of a phrase in an enrolled grammar. When using grammar enrollment, the system is essentially constructing a grammar segment consisting of a list of possible match phrases. This is similar to the dynamic construction of a <one-of> tag in the W3C grammar specification. Each enrolled phrase becomes an item in the list that can be matched against spoken input similar to an <item> within a <one-of> list.

This header allows you to assign a weight to the phrase (that is, <item> entry) in the <one-of> list that is enrolled. Grammar weights are normalized to a sum of one at grammar compilation time, so a weight value of 1 for each phrase in an enrolled grammar list indicates that all items in the list have the same weight. This header may occur in START-PHRASE-ENROLLMENT and MODIFY-PHRASE requests. The default value for this header is implementation specific.

weight = "Weight" ":" weight-value CRLF

Nuance Speech Server stores Weight as SWI_scoreDelta in a personal grammar.
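The normalization described for the Weight header can be illustrated with a short Python sketch. It only demonstrates the arithmetic of scaling weights to a sum of one; the function name and the phrase IDs are made up, and the sketch does not claim to reproduce how the grammar compiler performs this step.

```python
def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale phrase weights so that they sum to one."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {phrase_id: weight / total for phrase_id, weight in weights.items()}


# Equal weights of 1 give every phrase the same normalized share.
print(normalize_weights({"jeff": 1, "maria": 1, "lee": 1, "ana": 1}))
# A phrase with weight 2 is twice as likely as a phrase with weight 1.
print(normalize_weights({"jeff": 2, "maria": 1, "lee": 1}))
```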

DELETE-PHRASE

The DELETE-PHRASE method sent from the client to the server is used to delete a phrase in a personal grammar added through voice enrollment or text enrollment. If the specified phrase does not exist, this method has no effect.

C->S: MRCP/2.0 123 DELETE-PHRASE 543266
  Channel-Identifier:32AECB23433801@speechrecog
  Personal-Grammar-URI:personal_grammar_uri
  Phrase-Id:phrase_id

S->C: MRCP/2.0 49 543266 200 COMPLETE
  Channel-Identifier:32AECB23433801@speechrecog
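In MRCPv2, the number after MRCP/2.0 in the start-line is the message length in octets, and it counts the start-line itself. The length values shown in the examples in this section (123 and 49) appear to be illustrative rather than exact. The Python sketch below shows one way a client could compute the real value for a header-only request such as DELETE-PHRASE. The function name build_request is hypothetical, and the sketch assumes there is no message body.

```python
def build_request(method: str, request_id: int, headers: dict[str, str]) -> bytes:
    """Serialize a header-only MRCPv2 request with a correct message-length."""
    header_block = "".join(f"{name}:{value}\r\n" for name, value in headers.items())
    tail = (header_block + "\r\n").encode("utf-8")  # blank line ends the headers

    # The length field counts its own digits, so iterate until it is stable.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("utf-8")
        total = len(start_line) + len(tail)
        if total == length:
            return start_line + tail
        length = total


message = build_request(
    "DELETE-PHRASE",
    543266,
    {
        "Channel-Identifier": "32AECB23433801@speechrecog",
        "Personal-Grammar-URI": "personal_grammar_uri",  # placeholder from the example
        "Phrase-Id": "phrase_id",                        # placeholder from the example
    },
)
print(message.decode("utf-8"))
```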

END-PHRASE-ENROLLMENT

Do not call the END-PHRASE-ENROLLMENT method during an ongoing RECOGNIZE operation.

Instead, call it during an active phrase-enrollment session to commit a new phrase to the grammar. Do this after a series of successful calls to RECOGNIZE, once Num-Repetitions-Still-Needed is returned as 0 in the RECOGNITION-COMPLETE event. Alternatively, call it with the Abort-Phrase-Enrollment header to abort the phrase-enrollment session.

If the client specified Save-Best-Waveform as true in the START-PHRASE-ENROLLMENT request, the response includes the location/URI of a recording of the best repetition of the learned phrase. For example:

C->S: MRCP/2.0 49 END-PHRASE-ENROLLMENT 543262
  Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 123 543262 200 COMPLETE
  Channel-Identifier:32AECB23433801@speechrecog
  Waveform-URI:http://mediaserver.com/recordings/file1324.wav;size=242453;duration=25432
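A client that receives the Waveform-URI header usually needs to separate the URI from the size and duration parameters. The following Python sketch parses the value shown in the example above. The function name is hypothetical, and the sketch assumes that parameters follow the URI as ;name=digits and that the URI itself contains no semicolons.

```python
def parse_waveform_uri(value: str) -> tuple[str, dict[str, int]]:
    """Split a Waveform-URI value into the URI and its numeric parameters.

    An empty value (no recording available) yields ("", {}).
    """
    uri, *params = value.strip().split(";")
    parsed: dict[str, int] = {}
    for param in params:
        name, _, digits = param.partition("=")
        parsed[name.strip()] = int(digits)
    # Tolerate an optional <...> wrapper around the URI.
    return uri.strip().strip("<>"), parsed


uri, info = parse_waveform_uri(
    "http://mediaserver.com/recordings/file1324.wav;size=242453;duration=25432"
)
print(uri, info)
```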

ENROLLMENT-ROLLBACK

The ENROLLMENT-ROLLBACK method discards the last live utterance from the RECOGNIZE operation. Use this method when the caller provides undesirable input such as non-speech noises, side speech, commands, or an utterance from the RECOGNIZE grammar.

This method does not provide a stack of rollback states. If ENROLLMENT-ROLLBACK is executed twice in succession without an intervening recognition operation, the second call has no effect. For example:

C->S: MRCP/2.0 49 ENROLLMENT-ROLLBACK 543261
  Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 49 543261 200 COMPLETE
  Channel-Identifier:32AECB23433801@speechrecog
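The single-level rollback behavior can be modeled with a small Python class. This is a toy illustration of the rule that only the most recent utterance can be discarded. The class and method names are invented for the sketch and do not describe how the recognizer stores utterances internally.

```python
class UtteranceBuffer:
    """Toy model: rollback removes only the utterance from the last RECOGNIZE."""

    def __init__(self) -> None:
        self.utterances: list[str] = []
        self._can_rollback = False

    def recognize(self, utterance: str) -> None:
        self.utterances.append(utterance)
        self._can_rollback = True  # a new utterance can be rolled back once

    def rollback(self) -> None:
        if self._can_rollback:
            self.utterances.pop()
            self._can_rollback = False  # no stack of earlier states is kept


buffer = UtteranceBuffer()
buffer.recognize("john smith")
buffer.recognize("(cough)")
buffer.rollback()  # discards "(cough)"
buffer.rollback()  # no intervening RECOGNIZE, so this has no effect
print(buffer.utterances)
```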

MODIFY-PHRASE

The MODIFY-PHRASE method sent from the client to the server is used to change the phrase ID, NL phrase, and/or weight for a given phrase in a personal grammar.

If no fields are supplied then calling this method has no effect.

C->S: MRCP/2.0 123 MODIFY-PHRASE 543265
  Channel-Identifier:32AECB23433801@speechrecog
  Personal-Grammar-URI:personal_grammar_uri
  Phrase-Id:phrase_id
  New-Phrase-Id:new_phrase_id
  Phrase-NL:NL_phrase
  Weight:1

S->C: MRCP/2.0 49 543265 200 COMPLETE

START-PHRASE-ENROLLMENT

The START-PHRASE-ENROLLMENT method from the client to the server starts a new phrase-enrollment session during which the client can call RECOGNIZE multiple times to enroll a new utterance in a grammar. An enrollment session consists of a set of calls to RECOGNIZE in which the caller speaks a phrase several times so the system can learn it. The phrase is then added to a personal grammar (speaker-trained grammar), so that the system can recognize it later.

Only one phrase-enrollment session may be active at a time for a resource. The Personal-Grammar-URI identifies the grammar that is used during enrollment to store the personal list of phrases. Once RECOGNIZE is called, the result is returned in a RECOGNITION-COMPLETE event and may contain either an enrollment result or a recognition result for a regular recognition. Calling END-PHRASE-ENROLLMENT ends the ongoing phrase-enrollment session, which is typically done after a sequence of successful calls to RECOGNIZE. END-PHRASE-ENROLLMENT can either commit the new phrase to the personal grammar or abort the phrase-enrollment session.

The Personal-Grammar-URI, which specifies the grammar to contain the new enrolled phrase, is created if it does not exist. Also, the personal grammar can only contain phrases added via a phrase-enrollment session.

The Phrase-ID passed to this method is used to identify this phrase in the grammar and is returned as the speech input when doing a RECOGNIZE on the grammar. The Phrase-NL similarly is returned in a RECOGNITION-COMPLETE event in the same manner as other NL in a grammar. The tag-format of this NL is implementation specific.

If the client specifies Save-Best-Waveform as true, the response to END-PHRASE-ENROLLMENT, sent when the phrase-enrollment session ends, includes the location/URI of a recording of the best repetition of the learned phrase.

C->S: MRCP/2.0 123 START-PHRASE-ENROLLMENT 543258
  Channel-Identifier:32AECB23433801@speechrecog
  Num-Min-Consistent-Pronunciations:2
  Consistency-Threshold:30
  Clash-Threshold:12
  Personal-Grammar-URI:personal_grammar_uri
  Phrase-NL:NL_phrase
  Weight:1
  Save-Best-Waveform:true

S->C: MRCP/2.0 49 543258 200 COMPLETE
  Channel-Identifier:32AECB23433801@speechrecog
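The session rules described in this topic (one active session per resource, a personal grammar that is created on demand, utterances collected only when Enroll-Utterance is true, and a commit or abort at the end) can be summarized in a toy in-memory model. The Python sketch below is not an MRCP client and does not reflect the server implementation. All class, method, and parameter names are assumptions chosen to mirror the method and header names.

```python
from typing import Optional


class EnrollmentModel:
    """Toy in-memory model of the phrase-enrollment session rules."""

    def __init__(self) -> None:
        # grammar URI -> {phrase ID -> enrolled utterances}
        self.grammars: dict[str, dict[str, list[str]]] = {}
        self.active: Optional[tuple[str, str, list[str]]] = None

    def start_phrase_enrollment(self, grammar_uri: str, phrase_id: str) -> None:
        if self.active is not None:
            raise RuntimeError("only one phrase-enrollment session may be active")
        self.grammars.setdefault(grammar_uri, {})  # created if it does not exist
        self.active = (grammar_uri, phrase_id, [])

    def recognize(self, utterance: str, enroll_utterance: bool = False) -> None:
        # Regular recognitions during the session are not collected.
        if enroll_utterance and self.active is not None:
            self.active[2].append(utterance)

    def end_phrase_enrollment(self, abort: bool = False) -> None:
        if self.active is None:
            raise RuntimeError("no phrase-enrollment session is active")
        grammar_uri, phrase_id, utterances = self.active
        self.active = None
        if not abort:  # Abort-Phrase-Enrollment discards the phrase
            self.grammars[grammar_uri][phrase_id] = utterances


model = EnrollmentModel()
model.start_phrase_enrollment("personal_grammar_uri", "jeff")
model.recognize("jeff smith", enroll_utterance=True)
model.recognize("cancel")  # regular recognition, not enrolled
model.recognize("jeff smith", enroll_utterance=True)
model.end_phrase_enrollment()
print(model.grammars)
```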
