Configuring voice enrollment

Voice enrollment is a speech dialog in which a user associates a pronunciation with a given function. For example, an enrollment application could associate the spoken phrase “call home” with a command to dial a phone number.

The main task of a voice enrollment application is to enroll a word or phrase (the pronunciation). The caller is prompted to speak the same utterance several times so that the system can compute a pronunciation for it. Then, the application adds the pronunciation to a user dictionary. For example, the caller repeats the phrase “call mom” several times, the pronunciation is computed, and the following entry is added to a dictionary:

<entry key="phone_5551234_1">
  <definition value=" k ah l m ah m" />
</entry>

At this point, the enrollment is complete and the application, or any other application with access to the dictionary, can use the pronunciation in a grammar (which enables recognition when the caller speaks the enrolled utterance).
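As a rough illustration of the dictionary entry shown earlier, the following Python sketch builds the same kind of entry element from a key and a phoneme list. It is a minimal sketch, not the product's API: the helper name make_entry is hypothetical, and it assumes the pronunciation has already been computed by the recognizer and is available as a list of phoneme symbols.

```python
import xml.etree.ElementTree as ET


def make_entry(key, phonemes):
    """Build an <entry> element holding one pronunciation (hypothetical helper)."""
    # The key is an arbitrary identifier chosen by the application.
    entry = ET.Element("entry", {"key": key})
    # The definition value is the phoneme sequence joined by single spaces.
    ET.SubElement(entry, "definition", {"value": " ".join(phonemes)})
    return entry


entry = make_entry("phone_5551234_1", ["k", "ah", "l", "m", "ah", "m"])
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

How the resulting entry is stored in a user dictionary, and how a grammar refers to it, depends on the recognizer and is not covered by this sketch.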

Note: Voice enrollment is a feature of Nuance Recognizer only.

In the example above, note that the key name (phone_5551234_1) is completely arbitrary with respect to the meaning of the utterance. The application does not know the meaning or spelling of the enrolled utterance. Instead, the application hears the phrase repeatedly and compares the collected utterances until it can compute a phonetic sequence that reliably represents the sounds it heard. The enrolled utterance is a piece of data that will match when that specific caller speaks the same sounds. When the application inserts the pronunciation into a dictionary, it can assign any useful key name.
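The idea of repeating the phrase until the attempts agree can be pictured with a toy loop. This is only an illustration under stated assumptions, not the recognizer's actual algorithm: it assumes each attempt has already been decoded into a list of phoneme symbols, and the function names, the min_consistent count, and the 0.8 similarity threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity score between two phoneme sequences."""
    return SequenceMatcher(None, a, b).ratio()


def enroll(attempts, min_consistent=2, threshold=0.8):
    """Return the first attempt that agrees with enough attempts, else None.

    An attempt counts toward its own total, so min_consistent=2 means
    the candidate plus one other sufficiently similar attempt.
    """
    seen = []
    for attempt in attempts:
        seen.append(attempt)
        for candidate in seen:
            matches = sum(
                1 for other in seen if similarity(candidate, other) >= threshold
            )
            if matches >= min_consistent:
                return candidate
    return None  # not enough consistent attempts yet; prompt the caller again


call_mom = ["k", "ah", "l", "m", "ah", "m"]
noise = ["h", "ow", "m"]
result = enroll([call_mom, noise, call_mom])
```

Here the second attempt is too different from the first, so the loop keeps collecting until the third attempt confirms the first one. Note that the key name plays no part in this comparison; only the sounds are compared.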

Voice enrollment headers

Speech Server supports the following MRCP headers for voice enrollment:

Voice enrollment methods

Speech Server supports these MRCP methods for voice enrollment: