NRaaS gRPC API

The Nuance Recognizer gRPC API contains methods for requesting recognitions.

GrammarRecognizer

Streaming grammar-based recognition service API.

GrammarRecognizer method table
Name Request Type Response Type Description
Recognize RecognitionRequest stream RecognitionResponse stream Starts a recognition request and returns a response.
DTMFRecognize DTMFRecognitionRequest stream RecognitionResponse stream Starts a DTMF recognition request and returns a response.

RecognitionRequest

Input stream messages that request speech recognition, sent one at a time in a specific order. The first, mandatory message sends recognition parameters and resources. An optional Control message can then be sent. The remaining messages send the audio to be recognized. When the stall_timers recognition flag is set to true in the RecognitionInit message, an optional Control message can be sent at any time after the RecognitionInit to start the recognition timers. Included in the GrammarRecognizer Recognize service.

RecognitionRequest parameter table
Field Type Description
recognition_init RecognitionInit Mandatory. Required first message in the RPC input stream, sends parameters and resources for recognition.
control Control Optional. Second message in the RPC input stream, for timer control.
audio bytes Audio samples in the selected encoding for recognition.
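
The message order above can be sketched as a Python generator. This is a minimal sketch under stated assumptions: each request message is modeled as a plain dict holding the one field being set, standing in for the RecognitionRequest class a client would generate from the Nuance proto files. With gRPC Python, an iterator like this is typically passed as the request iterator of a bidirectional streaming stub method.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def request_stream(
    init: Dict[str, Any],
    audio_chunks: Iterable[bytes],
    control: Optional[Dict[str, Any]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield Recognize request messages in the documented order.

    Each yielded dict stands in for a RecognitionRequest with one field set
    (hypothetical representation, not the generated stub class).
    """
    # 1. Mandatory first message: recognition parameters and resources.
    yield {"recognition_init": init}
    # 2. Optional Control message, sent second.
    if control is not None:
        yield {"control": control}
    # 3. Remaining messages: audio in the encoding selected in audio_format.
    for chunk in audio_chunks:
        yield {"audio": chunk}


# Example: init, a start_timers control, then two 20 ms chunks of 8 kHz PCM silence.
stream = request_stream({"parameters": {}, "resources": []},
                        [bytes(320), bytes(320)],
                        control={"start_timers": {}})
print([next(iter(message)) for message in stream])
# ['recognition_init', 'control', 'audio', 'audio']
```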

RecognitionInit

Input message that initiates a new recognition turn. Included in RecognitionRequest.

RecognitionInit parameter table
Field Type Description
parameters RecognitionParameters Mandatory. Various endpointer and recognition parameters, recognition result format.
resources RecognitionResource Repeated. Mandatory. Resources (grammars) to be used for the recognition.
client_data RecognitionInit.ClientDataEntry Repeated. Client-supplied event data, as key=value pairs, to inject into the call log. To specify multiple key=value pairs in one entry, separate them with a pipe character. Example: client_data["event"] = "key1=value1|key2=value2";
user_id string A user identification to associate with the recognition.
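
The pipe-separated client_data value can be assembled with a small helper. This is a hypothetical sketch: the reference does not describe any escaping, so the helper simply rejects keys containing = or | and values containing |, since those would make the entry ambiguous. The ClientDataEntry type name suggests client_data is a proto map field, which Python protobuf exposes as a dict-like object, so the result is shown assigned into a plain dict.

```python
from typing import Mapping


def format_client_data(pairs: Mapping[str, str]) -> str:
    """Join key=value pairs with '|' to form one client_data value."""
    for key, value in pairs.items():
        # No escaping is documented, so refuse separators inside keys/values.
        if "=" in key or "|" in key or "|" in value:
            raise ValueError(f"separator character in {key!r}={value!r}")
    return "|".join(f"{key}={value}" for key, value in pairs.items())


client_data = {"event": format_client_data({"key1": "value1", "key2": "value2"})}
print(client_data)  # {'event': 'key1=value1|key2=value2'}
```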

RecognitionParameters

Input message that defines parameters for the recognition process. Included in RecognitionInit. The audio_format field is required; all other fields are optional.

RecognitionParameters parameter table
Field Type Description
audio_format AudioFormat Mandatory. Audio codec type and sample rate.
recognition_flags RecognitionFlags Boolean recognition parameters.
no_input_timeout_ms int32 Maximum silence, in milliseconds, allowed while waiting for user input after recognition timers are started. Default is 7000 ms. A value of -1 means no timeout.
complete_timeout_ms int32 Duration of silence, in milliseconds, required after a valid recognition before the caller is considered to have finished speaking. Default is 0 (timer disabled).
incomplete_timeout_ms int32 Duration of silence, in milliseconds, after an utterance before concluding that the caller has finished speaking. Default is 1500 ms. A value of 0 disables the timer.
max_speech_timeout_ms int32 Maximum duration, in milliseconds, of an utterance collected from the user. Default is 22000 ms (22 seconds). A value of -1 means no timeout.
speech_detection_sensitivity float A balance between detecting speech and noise (for example, breathing), 0 to 1.0. 0 means ignore all noise, 1.0 means interpret all noise as speech. Default is 0.5.
nbest int32 Maximum number of n-best hypotheses to return. Range is 0 to 999. Values greater than 5 require additional CPU cycles. Default is 2.
confidence_level float When the score of the first n-best entry is less than the value of confidence_level, the recognition will return a no-match. Range is 0 to 1.0. Default is 0 (all utterances accepted).
result_format ResultFormat Specifies in what format the recognition result should be returned.
cookies string Repeated. HTTP cookies to include when fetching a grammar resource, in Set-Cookie or Set-Cookie2 format: "Set-Cookie:name=cookie-name;value=cookie-value;expires=cookie-expiration;…" or "Set-Cookie2:name=cookie-name;value=cookie-value;expires=cookie-expiration;…". The name and value attributes are required; the remaining attributes are optional.
endpointer_parameters RecognitionParameters.EndpointerParametersEntry Client-supplied key-value pairs representing parameters to set on the endpointer.
recognizer_parameters RecognitionParameters.RecognizerParametersEntry Client-supplied key-value pairs representing parameters to set on the recognizer.
secure_context_level EnumSecureContextLevel Specifies the level of security for the recognition. Default is OPEN.
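
The documented defaults and ranges can be mirrored on the client to catch mistakes before a request is sent. This sketch covers only a few numeric fields. The SpeechParams class is hypothetical, and rejecting negative complete_timeout_ms and incomplete_timeout_ms values is an assumption, since the table documents -1 only for the no-input and max-speech timers.

```python
from dataclasses import dataclass


@dataclass
class SpeechParams:
    """Hypothetical mirror of some RecognitionParameters fields.

    Defaults are the documented server defaults; -1 means "no timeout".
    """
    no_input_timeout_ms: int = 7000
    complete_timeout_ms: int = 0
    incomplete_timeout_ms: int = 1500
    max_speech_timeout_ms: int = 22000
    speech_detection_sensitivity: float = 0.5
    nbest: int = 2
    confidence_level: float = 0.0

    def validate(self) -> None:
        """Raise ValueError if a field is outside its documented range."""
        for name in ("no_input_timeout_ms", "max_speech_timeout_ms"):
            if getattr(self, name) < -1:  # -1 disables these timers
                raise ValueError(f"{name} must be -1 or >= 0")
        for name in ("complete_timeout_ms", "incomplete_timeout_ms"):
            if getattr(self, name) < 0:  # 0 disables these timers (assumed floor)
                raise ValueError(f"{name} must be >= 0")
        if not 0.0 <= self.speech_detection_sensitivity <= 1.0:
            raise ValueError("speech_detection_sensitivity must be in [0, 1.0]")
        if not 0 <= self.nbest <= 999:
            raise ValueError("nbest must be in [0, 999]")
        if not 0.0 <= self.confidence_level <= 1.0:
            raise ValueError("confidence_level must be in [0, 1.0]")

    def extra_cpu_expected(self) -> bool:
        """The table notes additional CPU cycles when nbest exceeds 5."""
        return self.nbest > 5


params = SpeechParams(nbest=10, confidence_level=0.4)
params.validate()  # passes: both values are in range
print(params.extra_cpu_expected())  # True
```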

DTMFRecognitionRequest

Input stream messages that request DTMF recognition, sent one at a time in a specific order. The first, mandatory message sends recognition parameters and resources. An optional Control message can then be sent. The remaining messages send the DTMF characters to be recognized. When the stall_timers recognition flag is set to true in the DTMFRecognitionInit message, an optional Control message can be sent at any time after the DTMFRecognitionInit to start the recognition timers. Included in the GrammarRecognizer DTMFRecognize service.

DTMFRecognitionRequest parameter table
Field Type Description
recognition_init DTMFRecognitionInit Mandatory. Required first message in the RPC input stream, sends parameters and resources for recognition.
control Control Optional. Second message in the RPC input stream, for timer control.
dtmf string A DTMF character or string of characters to add to the recognition.

DTMFRecognitionInit

Input message that initiates a new DTMF recognition turn. Included in DTMFRecognitionRequest.

DTMFRecognitionInit parameter table
Field Type Description
parameters DTMFRecognitionParameters Various DTMF timeout and recognition parameters, recognition result format.
resources RecognitionResource Repeated. Mandatory. Resources (grammars) to be used for the recognition.
client_data DTMFRecognitionInit.ClientDataEntry Repeated. Client-supplied event data, as key=value pairs, to inject into the call log. To specify multiple key=value pairs in one entry, separate them with a pipe character. Example: client_data["event"] = "key1=value1|key2=value2";
user_id string A user identification to associate with the recognition.

DTMFRecognitionParameters

Input message that defines parameters for the DTMF recognition process. Included in DTMFRecognitionInit.

DTMFRecognitionParameters parameter table
Field Type Description
recognition_flags RecognitionFlags Boolean recognition parameters.
no_input_timeout_ms int32 Maximum time, in milliseconds, allowed while waiting for user input after recognition timers are started. Default is 7000 ms. A value of -1 means no timeout.
dtmf_interdigit_timeout_ms int32 Maximum time, in milliseconds, to wait for the next DTMF character. Default is 5000 ms. A value of -1 means no timeout. After Nuance Recognizer receives the first DTMF digit, each subsequent digit must arrive within dtmf_interdigit_timeout_ms of the previous one; otherwise, Nuance Recognizer ends the recognition and returns the result up to that point.
dtmf_term_timeout_ms int32 Maximum time, in milliseconds, to wait for the DTMF termination character. Default is 10000 ms. A value of -1 means no timeout. This timer is active only when dtmf_term_char specifies a termination character. When the DTMF sequence received so far matches the grammar, Nuance Recognizer waits for the termination character; if it does not arrive within dtmf_term_timeout_ms, Nuance Recognizer ends the recognition and returns the result up to that point.
dtmf_term_char string Terminating DTMF character for DTMF input recognition.
nbest int32 Maximum number of n-best hypotheses to return. Range is 0 to 999. Values greater than 5 require additional CPU cycles. Default is 2.
result_format ResultFormat Specifies in what format the recognition result should be returned.
cookies string Repeated. HTTP cookies to include when fetching a grammar resource, in Set-Cookie or Set-Cookie2 format: "Set-Cookie:name=cookie-name;value=cookie-value;expires=cookie-expiration;…" or "Set-Cookie2:name=cookie-name;value=cookie-value;expires=cookie-expiration;…". The name and value attributes are required; the remaining attributes are optional.
recognizer_parameters DTMFRecognitionParameters.RecognizerParametersEntry Client-supplied key-value pairs representing parameters to set on the recognizer.
secure_context_level EnumSecureContextLevel Specifies the level of security for the recognition. Default is OPEN.
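
The interaction between dtmf_interdigit_timeout_ms and dtmf_term_char can be illustrated with a simplified model. This is a sketch only: it ignores grammar matching, dtmf_term_timeout_ms, and no_input_timeout_ms, and it assumes the termination character is not included in the returned digits. The collect_digits function is hypothetical and is not Nuance Recognizer's algorithm.

```python
from typing import List, Optional, Sequence, Tuple


def collect_digits(
    events: Sequence[Tuple[int, str]],
    interdigit_timeout_ms: int = 5000,
    term_char: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (digits, reason) under a simplified interdigit/termination model.

    events are (arrival_ms, dtmf_char) pairs in arrival order. Grammar
    matching and the other DTMF timers are deliberately ignored.
    """
    digits: List[str] = []
    last_ms: Optional[int] = None
    for arrival_ms, char in events:
        # After the first digit, each gap must stay within the interdigit timeout.
        if (last_ms is not None and interdigit_timeout_ms != -1
                and arrival_ms - last_ms > interdigit_timeout_ms):
            return "".join(digits), "interdigit_timeout"
        # The termination character ends input; it is not kept (assumption).
        if term_char is not None and char == term_char:
            return "".join(digits), "term_char"
        digits.append(char)
        last_ms = arrival_ms
    return "".join(digits), "end_of_input"


print(collect_digits([(0, "1"), (800, "2"), (7000, "3")]))
# ('12', 'interdigit_timeout')
print(collect_digits([(0, "1"), (500, "2"), (900, "#")], term_char="#"))
# ('12', 'term_char')
```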

AudioFormat

Input message specifying the format of the audio to recognize. Included in RecognitionParameters.

AudioFormat parameter table
Field Type Description
pcm PCM Signed 16-bit little-endian linear encoding, 8 kHz (“audio/L16;rate=8000”).
ulaw ULaw G.711 mu-law 8-bit encoding, 8 kHz (“audio/basic;rate=8000”).
alaw ALaw G.711 A-law 8-bit encoding, 8 kHz (“audio/x-alaw-basic;rate=8000”).

PCM

Input message defining PCM audio format. The sample rate is 8 kHz.

ALaw

Input message defining ALaw audio format. G.711 audio formats are set to 8kHz.

ULaw

Input message defining ULaw audio format. G.711 audio formats are set to 8kHz.
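
Because all three formats are fixed at 8 kHz, the byte rate of the audio field follows from the sample size: 8000 samples/s × 2 bytes gives 16 bytes per millisecond for PCM, and the 8-bit G.711 formats give 8 bytes per millisecond. The sketch below uses that arithmetic to pack PCM samples little-endian and to cut a buffer into chunks for successive audio messages. The 20 ms chunk length is an assumption (a common telephony frame size), not a requirement stated in this reference.

```python
import struct
from typing import List, Sequence

SAMPLE_RATE_HZ = 8000  # PCM, ULaw, and ALaw are all 8 kHz here
BYTES_PER_SAMPLE = {"pcm": 2, "ulaw": 1, "alaw": 1}  # 16-bit vs. 8-bit G.711


def bytes_per_ms(encoding: str) -> int:
    """Audio bytes produced per millisecond for the given encoding."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE[encoding] // 1000


def pcm_bytes(samples: Sequence[int]) -> bytes:
    """Pack signed 16-bit samples little-endian, as the pcm format expects."""
    return struct.pack(f"<{len(samples)}h", *samples)


def chunk_audio(audio: bytes, encoding: str, chunk_ms: int = 20) -> List[bytes]:
    """Split audio into fixed-duration chunks for successive audio messages."""
    size = bytes_per_ms(encoding) * chunk_ms
    return [audio[i:i + size] for i in range(0, len(audio), size)]


print(bytes_per_ms("pcm"), bytes_per_ms("ulaw"))  # 16 8
print(pcm_bytes([1, -1]))  # b'\x01\x00\xff\xff'
print(len(chunk_audio(bytes(16000), "pcm")))  # 50 chunks for 1 s of PCM
```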

RecognitionFlags

Input message containing boolean recognition parameters. The default is false in all cases.

For speech recognitions, this is included in RecognitionParameters.

For DTMF recognitions, this is included in DTMFRecognitionParameters.

RecognitionFlags parameter table
Field Type Description
stall_timers bool If true, recognition timers do not start when recognition begins; the client starts them later by sending a Control message. If false (the default), timers start when recognition begins.

ResultFormat

Input message used to specify the format to use for the recognition result.

For speech recognitions, this is included in RecognitionParameters.

For DTMF recognitions, this is included in DTMFRecognitionParameters.

ResultFormat parameter table
Field Type Description
format EnumResultFormat The result format to use. If not set, the NLSML format is used (“application/x-vnd.speechworks.emma+xml”).
additional_parameters string Additional parameters controlling the formatting of the result. Example: ";mrcpv=2.06;strictconfidencelevel=1"
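
The additional_parameters example suggests a sequence of ;name=value pairs. Assuming that pattern holds in general (this reference does not list the accepted parameter names), the string can be assembled from a mapping. Both helpers below are hypothetical, and the dict returned by result_format only mirrors the ResultFormat fields; it is not the generated message class. The format numbers come from the EnumResultFormat table.

```python
from typing import Dict, Mapping

RESULT_FORMATS = {"NLSML": 0, "EMMA": 1}  # EnumResultFormat numbers


def format_additional_parameters(params: Mapping[str, object]) -> str:
    """Render parameters as ';name=value' pairs, in insertion order."""
    return "".join(f";{name}={value}" for name, value in params.items())


def result_format(name: str = "NLSML", **params: object) -> Dict[str, object]:
    """Build a dict mirroring ResultFormat (hypothetical representation)."""
    return {"format": RESULT_FORMATS[name],
            "additional_parameters": format_additional_parameters(params)}


print(result_format("NLSML", mrcpv="2.06", strictconfidencelevel=1))
# {'format': 0, 'additional_parameters': ';mrcpv=2.06;strictconfidencelevel=1'}
```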

EnumResultFormat

Supported formats for the recognition result.

EnumResultFormat parameter table
Name Number Description
NLSML 0 Natural Language Semantics Markup Language (NLSML) format. See www.w3.org/TR/nl-spec for details. Media type: “application/x-vnd.speechworks.emma+xml”
EMMA 1 Extensible Multimodal Annotation Language (EMMA) format. See www.w3.org/TR/emma for details. Media type: “application/x-vnd.nuance.emma+xml”

EnumSecureContextLevel

Secure context level for the recognition, which controls whether recognition data appears in logs and recordings.

EnumSecureContextLevel parameter table
Name Number Description
OPEN 0 Prompt text and recognition results appear in the diagnostic and call logs, and utterance waveforms are recorded.
SUPPRESS 1 Utterance waveforms are not recorded, and recognition results are suppressed in the diagnostic and call logs.

RecognitionResource

Input message defining one or more recognition resources (grammars) to be used for the recognition.

For speech recognitions, this is included in RecognitionInit.

For DTMF recognitions, this is included in DTMFRecognitionInit.

RecognitionResource parameter table
Field Type Description
builtin string Name of a built-in resource supported by the installed language pack.
uri_grammar UriGrammar An external grammar resource, referenced by URI.
inline_grammar InlineGrammar Inline grammar, SRGS XML format, or other format.
language string Mandatory. Language and country (locale) code in xx-XX format (two-letter language code and two-letter country code), for example en-US. Must be one of the languages available in the language group of the URI being called.
weight int32 Specifies the grammar’s weight relative to other grammars active for that recognition. This value can range from 1 to 32767. Default is 1.
grammar_id string Specifies the id that Nuance Recognizer will use to identify the grammar in the recognition result. If not set, Nuance Recognizer generates a unique one.
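
A resource entry can be checked against the documented constraints before it is sent. This hypothetical helper builds a dict whose keys follow the RecognitionResource field names. Requiring lowercase language and uppercase country letters is an assumption drawn from the xx-XX pattern and the en-US example, and the built-in name used in any call is a placeholder, since available built-ins depend on the installed language pack.

```python
import re
from typing import Any, Dict, Optional

# xx-XX: two lowercase letters, a hyphen, two uppercase letters (e.g. en-US).
LOCALE_PATTERN = re.compile(r"[a-z]{2}-[A-Z]{2}")


def builtin_resource(name: str, language: str, weight: int = 1,
                     grammar_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a dict mirroring a RecognitionResource that uses a built-in grammar."""
    if not LOCALE_PATTERN.fullmatch(language):
        raise ValueError(f"language must use the xx-XX format, got {language!r}")
    if not 1 <= weight <= 32767:
        raise ValueError("weight must be between 1 and 32767")
    resource: Dict[str, Any] = {"builtin": name, "language": language, "weight": weight}
    if grammar_id is not None:  # otherwise the recognizer generates an ID
        resource["grammar_id"] = grammar_id
    return resource
```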

UriGrammar

Input message defining the URI reference to a grammar resource.

UriGrammar parameter table
Field Type Description
uri string Mandatory for UriGrammar resources. Location of the resource as a URI reference.
media_type EnumMediaType The type of media used for the grammar being fetched. If not specified, Nuance Recognizer detects the media type.
parameters UriGrammarParameters Parameters controlling the grammar fetch.

InlineGrammar

Input message containing an inline recognition grammar.

InlineGrammar parameter table
Field Type Description
media_type EnumMediaType The type of media used for the inline grammar data. If not specified, Nuance Recognizer detects the media type.
grammar bytes Mandatory for InlineGrammar resources. Grammar data.
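
For an SRGS XML inline grammar, the grammar field carries the XML document as bytes. The sketch below builds InlineGrammar content as a dict (a hypothetical stand-in for the generated message) and checks only that the XML is well-formed with an SRGS root element; it does not validate the grammar against the SRGS specification. Setting media_type explicitly to APPLICATION_SRGS_XML (value 1 in the EnumMediaType table) instead of relying on automatic detection is a design choice, not a documented requirement.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

SRGS_NAMESPACE = "http://www.w3.org/2001/06/grammar"
APPLICATION_SRGS_XML = 1  # EnumMediaType number for "application/srgs+xml"

# A minimal SRGS 1.0 XML grammar that accepts "yes" or "no".
YES_NO_GRAMMAR = """<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="yesno">
  <rule id="yesno" scope="public">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
"""


def inline_grammar(grammar_xml: str) -> Dict[str, Any]:
    """Return a dict mirroring InlineGrammar, after a well-formedness check."""
    data = grammar_xml.encode("utf-8")  # the grammar field is bytes
    root = ET.fromstring(data)  # raises ET.ParseError if the XML is malformed
    if root.tag != f"{{{SRGS_NAMESPACE}}}grammar":
        raise ValueError("root element is not an SRGS <grammar>")
    return {"media_type": APPLICATION_SRGS_XML, "grammar": data}
```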

EnumMediaType

Grammar format.

EnumMediaType parameter table
Name Number Description
AUTOMATIC 0 Recognizer will attempt to automatically determine the loaded grammar format.
APPLICATION_SRGS_XML 1 “application/srgs+xml”
APPLICATION_X_SWI_GRAMMAR 2 “application/x-swi-grammar”
APPLICATION_X_SWI_PARAMETER 3 “application/x-swi-parameter”

UriGrammarParameters

Input message for fetching an external recognition grammar.

UriGrammarParameters parameter table
Field Type Description
request_timeout_ms uint32 Time to wait when downloading resources, in milliseconds. The default, 0, uses the server default of 30000 ms (30 seconds).
content_base string Base URI for resolving relative URLs. The default, an empty string, uses the server default (no base).
max_age uint32 Cache control parameter that sets max-age, in seconds. The default, 0, uses the server default (directive not present).
max_stale uint32 Cache control parameter that sets max-stale, in seconds. The default, 0, uses the server default (expired entries are not used).
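
The "0 means server default" convention of these fields can be made explicit on the client. The sketch below resolves request_timeout_ms and renders max_age and max_stale as HTTP Cache-Control directives to show what they mean. Both helpers are hypothetical, and the exact header Nuance Recognizer sends when fetching a grammar is not described in this reference.

```python
from typing import Optional

SERVER_DEFAULT_REQUEST_TIMEOUT_MS = 30000  # documented server default


def effective_request_timeout_ms(request_timeout_ms: int = 0) -> int:
    """A request_timeout_ms of 0 falls back to the server default."""
    return request_timeout_ms or SERVER_DEFAULT_REQUEST_TIMEOUT_MS


def cache_control(max_age: int = 0, max_stale: int = 0) -> Optional[str]:
    """Render max_age/max_stale as Cache-Control directives; 0 omits a directive."""
    directives = []
    if max_age:
        directives.append(f"max-age={max_age}")
    if max_stale:
        directives.append(f"max-stale={max_stale}")
    return ", ".join(directives) or None


print(effective_request_timeout_ms(), cache_control(max_age=300))  # 30000 max-age=300
```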

Control

Input message that starts the recognition no-input timer.

For speech recognitions, this is included in RecognitionRequest.

For DTMF recognitions, this is included in DTMFRecognitionRequest.

Control parameter table
Field Type Description
start_timers StartTimersControl Starts the recognition no-input timer.

StartTimersControl

Input message the client sends when starting the no-input timer. Included in Control.
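
The Control ordering rules given in the RecognitionRequest and DTMFRecognitionRequest descriptions can be expressed as a small check. This hypothetical helper takes the name of the field set on each request message. It assumes recognition_init may appear only once, which follows from its role as the required first message but is not stated explicitly.

```python
from typing import Sequence


def check_message_order(kinds: Sequence[str], stall_timers: bool) -> None:
    """Raise ValueError if a request stream breaks the documented ordering.

    kinds: field set on each request message, in send order, for example
    "recognition_init", "control", "audio" (or "dtmf" for DTMFRecognize).
    """
    if not kinds or kinds[0] != "recognition_init":
        raise ValueError("the first message must be recognition_init")
    for index, kind in enumerate(kinds[1:], start=1):
        if kind == "recognition_init":
            raise ValueError("recognition_init may only be sent once")
        # Without stall_timers, Control is only allowed as the second message.
        if kind == "control" and index != 1 and not stall_timers:
            raise ValueError("without stall_timers, control must be the second message")
```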

RecognitionResponse

Output stream of messages in response to a recognition request. Returned by the GrammarRecognizer Recognize and DTMFRecognize services.

RecognitionResponse parameter table
Field Type Description
status Status Always the first message returned, indicates whether recognition was initiated successfully.
start_of_speech StartOfSpeech When the start of speech was detected.
end_of_speech EndOfSpeech When the end of speech was detected.
result Result The partial or final recognition result. A series of partial results may precede the final result.
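
A client typically reads the response stream until it ends, checking the initial Status and keeping the most recent Result, since partial results may precede the final one. This sketch models each response as a dict with one key and treats any 4xx or 5xx code as a failure; with generated messages, presence checks such as HasField would replace the dict lookups. The consume_responses function is hypothetical.

```python
from typing import Any, Dict, Iterable


def consume_responses(responses: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize a RecognitionResponse stream.

    Each response is modeled as a dict with one key (status, start_of_speech,
    end_of_speech, or result), a hypothetical stand-in for the generated message.
    """
    summary: Dict[str, Any] = {"start_ms": None, "end_ms": None, "result": None}
    for index, response in enumerate(responses):
        if index == 0 and "status" not in response:
            raise RuntimeError("expected a status message first")
        if "status" in response:
            status = response["status"]
            if status["code"] >= 400:  # 4xx or 5xx error code
                raise RuntimeError(f"{status['code']}: {status.get('message', '')}")
        elif "start_of_speech" in response:
            summary["start_ms"] = response["start_of_speech"]["first_audio_to_start_of_speech_ms"]
        elif "end_of_speech" in response:
            summary["end_ms"] = response["end_of_speech"]["first_audio_to_end_of_speech_ms"]
        elif "result" in response:
            # Partial results may precede the final one; the last result wins.
            summary["result"] = response["result"]
    return summary
```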

Status

Output message indicating the status of the recognition. The message and details fields are developer-facing error messages in English. User-facing messages should be localized by the client based on the status code. Included in RecognitionResponse.

See Status codes for details about the codes.

Status parameter table
Field Type Description
code uint32 HTTP-style return code: 100, 200, 4xx, or 5xx as appropriate.
message string Brief description of the status.
details string Longer description if available.

StartOfSpeech

Output message indicating when speech was first detected. Included in RecognitionResponse.

StartOfSpeech parameter table
Field Type Description
first_audio_to_start_of_speech_ms uint32 Offset from start of audio stream to start of speech detected, in milliseconds.

EndOfSpeech

Output message indicating when the end of speech was detected. Included in RecognitionResponse.

EndOfSpeech parameter table
Field Type Description
first_audio_to_end_of_speech_ms uint32 Offset from start of audio stream to end of speech detected, in milliseconds.
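
Because the audio rate is fixed at 8 kHz, the start-of-speech and end-of-speech offsets translate directly into byte positions in the audio the client sent. This sketch cuts the detected speech out of a buffer, assuming the offsets count from the first audio byte sent in the request stream; speech_slice is a hypothetical helper.

```python
SAMPLE_RATE_HZ = 8000  # all supported audio formats are 8 kHz
SAMPLES_PER_MS = SAMPLE_RATE_HZ // 1000


def speech_slice(audio: bytes, start_ms: int, end_ms: int,
                 bytes_per_sample: int = 2) -> bytes:
    """Cut the detected speech out of the audio that was sent.

    bytes_per_sample is 2 for pcm and 1 for ulaw/alaw. Assumes the offsets
    count from the first audio byte sent in the request stream.
    """
    start = start_ms * SAMPLES_PER_MS * bytes_per_sample
    end = end_ms * SAMPLES_PER_MS * bytes_per_sample
    return audio[start:end]


two_seconds_pcm = bytes(2 * SAMPLE_RATE_HZ * 2)
speech = speech_slice(two_seconds_pcm, 300, 1800)
print(len(speech))  # 24000 bytes = 1500 ms of 16-bit 8 kHz audio
```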

Result

Output message containing the recognition result, including the result status. Included in RecognitionResponse.

Result parameter table
Field Type Description
formatted_text string Formatted recognition result (may be empty).
status string Recognition status information: SUCCESS, NO_MATCH, INCOMPLETE, NON_SPEECH_DETECTED, SPEECH_DETECTED, SPEECH_COMPLETE, MAX_CPU_TIME, MAX_SPEECH, STOPPED, REJECTED or NO_SPEECH_FOUND.

Scalar value types

The data types in the proto files are mapped to equivalent types in the generated client stub files.

Scalar data types
Proto Notes C++ Java Python
double double double float
float float float float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers. If your field is likely to have negative values, use sint32 instead. int32 int int
int64 Uses variable-length encoding. Inefficient for encoding negative numbers. If your field is likely to have negative values, use sint64 instead. int64 long int/long
uint32 Uses variable-length encoding. uint32 int int/long
uint64 Uses variable-length encoding. uint64 long int/long
sint32 Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s. int32 int int
sint64 Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s. int64 long int/long
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 2^28. uint32 int int
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 2^56. uint64 long int/long
sfixed32 Always four bytes. int32 int int
sfixed64 Always eight bytes. int64 long int/long
bool bool boolean boolean
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string String str/unicode
bytes May contain any arbitrary sequence of bytes. string ByteString str
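
The int32 note above can be checked with a little arithmetic on the protobuf wire format. Negative int32 values are sign-extended to 64 bits before varint encoding, so -1 always takes 10 bytes, while sint32 applies ZigZag encoding first and keeps small negative numbers small. In this API, fields such as no_input_timeout_ms are int32 and accept -1, which therefore costs 10 bytes on the wire instead of 1. The sketch computes encoded sizes without any protobuf library.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a protobuf varint."""
    size = 1
    while value >= 0x80:  # each varint byte carries 7 payload bits
        value >>= 7
        size += 1
    return size


def int32_wire_size(value: int) -> int:
    # Negative int32 values are sign-extended to 64 bits before encoding.
    return varint_size(value & 0xFFFFFFFFFFFFFFFF)


def sint32_wire_size(value: int) -> int:
    # ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return varint_size(zigzag)


print(int32_wire_size(-1), sint32_wire_size(-1))  # 10 1
print(int32_wire_size(7000), sint32_wire_size(7000))  # 2 2
```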