NRaaS gRPC API
The Nuance Recognizer gRPC API contains methods for requesting recognitions.
GrammarRecognizer
Streaming grammar-based recognition service API.
Name | Request Type | Response Type | Description |
---|---|---|---|
Recognize | RecognitionRequest stream | RecognitionResponse stream | Starts a recognition request and returns a response. |
DTMFRecognize | DTMFRecognitionRequest stream | RecognitionResponse stream | Starts a DTMF recognition request and returns a response. |
RecognitionRequest
Input stream messages that request speech recognition, sent one at a time in a specific order. The first, mandatory message sends the recognition parameters and resources. An optional Control message can then be sent. The remaining messages send the audio to be recognized. When the stall_timers recognition flag is set to true in the RecognitionInit message, an optional Control message can be sent at any time after the RecognitionInit to start the timeout timers. Included in the GrammarRecognizer Recognize method.
Field | Type | Description |
---|---|---|
recognition_init | RecognitionInit | Mandatory. Required first message in the RPC input stream, sends parameters and resources for recognition. |
control | Control | Optional. Second message in the RPC input stream, for timer control. |
audio | bytes | Audio samples in the selected encoding for recognition. |
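The ordering rules above can be sketched as a Python generator. This is a minimal sketch, not the generated client: each request is modeled as a plain dict holding one of recognition_init, control, or audio, standing in for the generated RecognitionRequest class, and the control_after argument is hypothetical. Sending Control anywhere other than directly after the init message assumes stall_timers is true.

```python
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical stand-in for the generated RecognitionRequest message: a dict
# holding exactly one of "recognition_init", "control", or "audio".
Request = Dict[str, object]

START_TIMERS: Request = {"control": {"start_timers": {}}}


def recognition_requests(init: dict,
                         audio_chunks: Iterable[bytes],
                         control_after: Optional[int] = None) -> Iterator[Request]:
    """Yield requests in the order the Recognize stream expects.

    control_after: number of audio chunks to send before the Control message
    (0 = immediately after RecognitionInit, None = never send Control).
    Values greater than 0 assume stall_timers is true in RecognitionInit.
    """
    # The mandatory RecognitionInit is always first.
    yield {"recognition_init": init}
    if control_after == 0:
        yield START_TIMERS
    for sent, chunk in enumerate(audio_chunks, start=1):
        yield {"audio": chunk}
        if sent == control_after:
            yield START_TIMERS
```

A real client would build protobuf messages instead of dicts and pass such a generator as the request iterator of the streaming Recognize call.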
RecognitionInit
Input message that initiates a new recognition turn. Included in RecognitionRequest.
Field | Type | Description |
---|---|---|
parameters | RecognitionParameters | Mandatory. Various endpointer and recognition parameters, recognition result format. |
resources | RecognitionResource | Repeated. Mandatory. Resources (grammars) to be used for the recognition. |
client_data | RecognitionInit.ClientDataEntry | Repeated. Client-supplied event: key=value pairs to inject into the call log. To specify multiple key=value entries in one string, separate them with a pipe character. Example: client_data["event"] = "key1=value1\|key2=value2"; |
user_id | string | A user identification to associate with the recognition. |
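The pipe-separated client_data convention can be illustrated with a small helper. The function name client_data_event is hypothetical, and rejecting separator characters inside keys and values is an assumption made here for safety; the API description does not say how such characters are handled.

```python
from typing import Dict, Mapping


def client_data_event(pairs: Mapping[str, str]) -> str:
    """Join key=value pairs with '|' so they fit in one client_data value."""
    for key, value in pairs.items():
        # Conservative assumption: '=' and '|' act as separators, so they are
        # rejected inside keys, and '|' is rejected inside values.
        if "=" in key or "|" in key or "|" in value:
            raise ValueError(f"separator character in {key!r}={value!r}")
    return "|".join(f"{key}={value}" for key, value in pairs.items())


# Hypothetical usage: the resulting dict would populate the client_data map field.
client_data: Dict[str, str] = {
    "event": client_data_event({"key1": "value1", "key2": "value2"}),
}
print(client_data)  # {'event': 'key1=value1|key2=value2'}
```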
RecognitionParameters
Input message that defines parameters for the recognition process. Included in RecognitionInit. The audio_format field is mandatory; all others are optional.
Field | Type | Description |
---|---|---|
audio_format | AudioFormat | Mandatory. Audio codec type and sample rate. |
recognition_flags | RecognitionFlags | Boolean recognition parameters. |
no_input_timeout_ms | int32 | Maximum silence, in milliseconds, allowed while waiting for user input after recognition timers are started. Default is 7000 ms. A value of -1 means no timeout. |
complete_timeout_ms | int32 | Duration of silence, in milliseconds, after a valid recognition that determines the caller has finished speaking. Default is 0 (timer disabled). |
incomplete_timeout_ms | int32 | Duration of silence, in milliseconds, after an utterance before concluding that the caller has finished speaking. Default is 1500 ms. A value of 0 disables the timer. |
max_speech_timeout_ms | int32 | Maximum duration, in milliseconds, of an utterance collected from the user. Default is 22000 ms (22 seconds). A value of -1 means no timeout. |
speech_detection_sensitivity | float | A balance between detecting speech and noise (for example, breathing), 0 to 1.0. 0 means ignore all noise, 1.0 means interpret all noise as speech. Default is 0.5. |
nbest | int32 | Maximum number of n-best hypotheses to return. Range is 0 to 999. Values above 5 require additional CPU cycles. Default is 2. |
confidence_level | float | When the score of the first n-best entry is less than the value of confidence_level, the recognition will return a no-match. Range is 0 to 1.0. Default is 0 (all utterances accepted). |
result_format | ResultFormat | Specifies in what format the recognition result should be returned. |
cookies | string | Repeated. HTTP cookies to include when fetching a grammar resource, specified in Set-Cookie or Set-Cookie2 format: "Set-Cookie:name=cookie-name;value=cookie-value;expires=cookie-expiration;…" or "Set-Cookie2:name=cookie-name;value=cookie-value;expires=cookie-expiration;…". The name and value attributes are required; the remaining attributes are optional. |
endpointer_parameters | RecognitionParameters.EndpointerParametersEntry | Client-supplied key-value pairs representing parameters to set on the endpointer. |
recognizer_parameters | RecognitionParameters.RecognizerParametersEntry | Client-supplied key-value pairs representing parameters to set on the recognizer. |
secure_context_level | EnumSecureContextLevel | Specifies the level of security for the recognition. Default is OPEN. |
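Because every field except audio_format is optional, a client may want to catch out-of-range values before opening the stream. The sketch below is a hypothetical client-side check, not part of the API: the class name SpeechTimers is made up, the ranges and defaults come from the table above, and treating negative timeouts other than -1 as invalid is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechTimers:
    """Hypothetical client-side mirror of some RecognitionParameters fields.

    Defaults match the documented server defaults.
    """
    no_input_timeout_ms: int = 7000
    complete_timeout_ms: int = 0
    incomplete_timeout_ms: int = 1500
    max_speech_timeout_ms: int = 22000
    speech_detection_sensitivity: float = 0.5
    nbest: int = 2
    confidence_level: float = 0.0

    def check(self) -> List[str]:
        """Return problems and warnings; an empty list means the values look valid."""
        issues: List[str] = []
        # -1 means "no timeout" for these two; other negatives are assumed invalid.
        for name in ("no_input_timeout_ms", "max_speech_timeout_ms"):
            if getattr(self, name) < -1:
                issues.append(f"{name} must be -1 or >= 0")
        # 0 disables these timers; negatives are assumed invalid.
        for name in ("complete_timeout_ms", "incomplete_timeout_ms"):
            if getattr(self, name) < 0:
                issues.append(f"{name} must be >= 0")
        if not 0.0 <= self.speech_detection_sensitivity <= 1.0:
            issues.append("speech_detection_sensitivity must be in [0, 1.0]")
        if not 0 <= self.nbest <= 999:
            issues.append("nbest must be in [0, 999]")
        elif self.nbest > 5:
            issues.append("nbest > 5 costs additional CPU")
        if not 0.0 <= self.confidence_level <= 1.0:
            issues.append("confidence_level must be in [0, 1.0]")
        return issues
```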
DTMFRecognitionRequest
Input stream messages that request DTMF recognition, sent one at a time in a specific order. The first, mandatory message sends the recognition parameters and resources. An optional Control message can then be sent. The remaining messages send the DTMF digits to be recognized. When the stall_timers recognition flag is set to true in the DTMFRecognitionInit message, an optional Control message can be sent at any time after the DTMFRecognitionInit to start the timeout timers. Included in the GrammarRecognizer DTMFRecognize method.
Field | Type | Description |
---|---|---|
recognition_init | DTMFRecognitionInit | Mandatory. Required first message in the RPC input stream, sends parameters and resources for recognition. |
control | Control | Optional. Second message in the RPC input stream, for timer control. |
dtmf | string | A DTMF character or string of characters to add to the recognition. |
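A DTMF request stream follows the same pattern as the speech stream, with dtmf strings in place of audio. In this hypothetical sketch, requests are plain dicts rather than generated DTMFRecognitionRequest objects, and each key press is sent as its own message; since the dtmf field also accepts a multi-character string, batching keys would be an equally valid choice.

```python
from typing import Iterator

# The 16 standard DTMF keypad symbols. Whether the service accepts A-D is an
# assumption; the API description does not list the accepted characters.
DTMF_SYMBOLS = frozenset("0123456789*#ABCD")


def dtmf_requests(init: dict, keys: str) -> Iterator[dict]:
    """Yield DTMFRecognitionRequest stand-ins: init first, then one dtmf message per key."""
    bad = [k for k in keys if k.upper() not in DTMF_SYMBOLS]
    if bad:
        raise ValueError(f"not DTMF symbols: {bad}")
    # The mandatory DTMFRecognitionInit is always first.
    yield {"recognition_init": init}
    for key in keys:
        yield {"dtmf": key.upper()}
```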
DTMFRecognitionInit
Input message that initiates a new DTMF recognition turn. Included in DTMFRecognitionRequest.
Field | Type | Description |
---|---|---|
parameters | DTMFRecognitionParameters | Various endpointer and recognition parameters, recognition result format. |
resources | RecognitionResource | Repeated. Mandatory. Resources (grammars) to be used for the recognition. |
client_data | DTMFRecognitionInit.ClientDataEntry | Repeated. Client-supplied event: key=value pairs to inject into the call log. To specify multiple key=value entries in one string, separate them with a pipe character. Example: client_data["event"] = "key1=value1\|key2=value2"; |
user_id | string | A user identification to associate with the recognition. |
DTMFRecognitionParameters
Input message that defines parameters for the DTMF recognition process. Included in DTMFRecognitionInit.
Field | Type | Description |
---|---|---|
recognition_flags | RecognitionFlags | Boolean recognition parameters. |
no_input_timeout_ms | int32 | Maximum time, in milliseconds, allowed while waiting for user input after recognition timers are started. Default is 7000 ms. A value of -1 means no timeout. |
dtmf_interdigit_timeout_ms | int32 | Maximum time, in milliseconds, to wait for the next DTMF character. Default is 5000 ms. A value of -1 means no timeout. After Nuance Recognizer receives the first DTMF digit, each subsequent digit must arrive within dtmf_interdigit_timeout_ms; otherwise, Nuance Recognizer ends the recognition and returns the result up to that point. |
dtmf_term_timeout_ms | int32 | Maximum time, in milliseconds, to wait for the DTMF termination character. Default is 10000 ms. A value of -1 means no timeout. This timer is active only when dtmf_term_char is set. When the DTMF sequence received so far matches the grammar, Nuance Recognizer waits for the termination character. If it does not arrive within dtmf_term_timeout_ms, Nuance Recognizer ends the recognition and returns the result up to that point. |
dtmf_term_char | string | Terminating DTMF character for DTMF input recognition. |
nbest | int32 | Maximum number of n-best hypotheses to return. Range is 0 to 999. Values above 5 require additional CPU cycles. Default is 2. |
result_format | ResultFormat | Specifies in what format the recognition result should be returned. |
cookies | string | Repeated. HTTP cookies to include when fetching a grammar resource, specified in Set-Cookie or Set-Cookie2 format: "Set-Cookie:name=cookie-name;value=cookie-value;expires=cookie-expiration;…" or "Set-Cookie2:name=cookie-name;value=cookie-value;expires=cookie-expiration;…". The name and value attributes are required; the remaining attributes are optional. |
recognizer_parameters | DTMFRecognitionParameters.RecognizerParametersEntry | Client-supplied key-value pairs representing parameters to set on the recognizer. |
secure_context_level | EnumSecureContextLevel | Specifies the level of security for the recognition. Default is OPEN. |
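The interaction of no_input_timeout_ms, dtmf_interdigit_timeout_ms, and dtmf_term_char can be made concrete by replaying timestamped key presses. This is a simplified model under stated assumptions, not Nuance Recognizer's implementation: timestamps are measured from when the timers start, the termination character is assumed not to be part of the returned digits, and grammar matching (and therefore dtmf_term_timeout_ms) is not modeled.

```python
from typing import List, Optional, Tuple


def collect_dtmf(events: List[Tuple[int, str]],
                 no_input_timeout_ms: int = 7000,
                 interdigit_timeout_ms: int = 5000,
                 term_char: Optional[str] = None) -> Tuple[str, str]:
    """Replay (timestamp_ms, key) events and return (digits, reason_for_stopping).

    Timestamps are relative to the moment the recognition timers start.
    A timeout of -1 disables that timer, as in DTMFRecognitionParameters.
    """
    digits = ""
    last_ms: Optional[int] = None
    for t, key in sorted(events):
        if last_ms is None:
            # Waiting for the first key: only the no-input timer applies.
            if no_input_timeout_ms != -1 and t > no_input_timeout_ms:
                return digits, "no_input_timeout"
        elif interdigit_timeout_ms != -1 and t - last_ms > interdigit_timeout_ms:
            # Too long since the previous key: return what was collected so far.
            return digits, "interdigit_timeout"
        if term_char is not None and key == term_char:
            return digits, "term_char"
        digits += key
        last_ms = t
    # No more input: whichever timer is pending would eventually fire.
    pending = no_input_timeout_ms if last_ms is None else interdigit_timeout_ms
    reason = "no_input_timeout" if last_ms is None else "interdigit_timeout"
    return digits, (reason if pending != -1 else "still_waiting")


print(collect_dtmf([(1000, "1"), (2000, "2"), (9000, "3")]))  # ('12', 'interdigit_timeout')
```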
AudioFormat
Input message specifying the format of the audio to recognize. Included in RecognitionParameters.
Field | Type | Description |
---|---|---|
pcm | PCM | Signed 16-bit little-endian linear PCM at 8 kHz ("audio/L16;rate=8000"). |
ulaw | ULaw | G.711 μ-law, 8-bit at 8 kHz ("audio/basic;rate=8000"). |
alaw | ALaw | G.711 A-law, 8-bit at 8 kHz ("audio/x-alaw-basic;rate=8000"). |
PCM
Input message defining the PCM audio format. The sample rate is 8 kHz.
ALaw
Input message defining the A-law audio format. G.711 audio formats use an 8 kHz sample rate.
ULaw
Input message defining the μ-law audio format. G.711 audio formats use an 8 kHz sample rate.
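Since all three formats are fixed at 8 kHz, the number of bytes per millisecond of audio follows directly from the sample width: 16 bytes/ms for 16-bit PCM and 8 bytes/ms for μ-law or A-law. The sketch below uses that to split a buffer into audio messages. The 20 ms chunk size is an assumption (a common telephony frame length), not a requirement stated by the API, and the encoding keys simply mirror the pcm/ulaw/alaw field names.

```python
from typing import Dict, List

SAMPLE_RATE_HZ = 8000  # all three formats are 8 kHz
BYTES_PER_SAMPLE: Dict[str, int] = {"pcm": 2, "ulaw": 1, "alaw": 1}


def bytes_per_ms(encoding: str) -> int:
    """Bytes of audio per millisecond: 8 samples/ms times the sample width."""
    return SAMPLE_RATE_HZ // 1000 * BYTES_PER_SAMPLE[encoding]


def split_audio(audio: bytes, encoding: str, chunk_ms: int = 20) -> List[bytes]:
    """Split raw audio into chunks of chunk_ms milliseconds (the last may be shorter)."""
    size = bytes_per_ms(encoding) * chunk_ms
    return [audio[i:i + size] for i in range(0, len(audio), size)]


print(bytes_per_ms("pcm") * 20, bytes_per_ms("ulaw") * 20)  # 320 160
```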
RecognitionFlags
Input message containing boolean recognition parameters. All flags default to false.
For speech recognitions, this is included in RecognitionParameters. For DTMF recognitions, this is included in DTMFRecognitionParameters.
Field | Type | Description |
---|---|---|
stall_timers | bool | If true, recognition timers do not start when recognition begins; the client starts them by sending a Control message. If false (the default), timers start when recognition begins. |
ResultFormat
Input message used to specify the format to use for the recognition result.
For speech recognitions, this is included in RecognitionParameters. For DTMF recognitions, this is included in DTMFRecognitionParameters.
Field | Type | Description |
---|---|---|
format | EnumResultFormat | The result format to use. If not set, the NLSML format is used ("application/x-vnd.speechworks.emma+xml"). |
additional_parameters | string | Additional parameters controlling the formatting of the result. Example: ";mrcpv=2.06;strictconfidencelevel=1" |
EnumResultFormat
Supported formats for the recognition result.
Name | Number | Description |
---|---|---|
NLSML | 0 | Natural Language Semantics Markup Language (NLSML) format. See www.w3.org/TR/nl-spec for details. Media type: "application/x-vnd.speechworks.emma+xml". |
EMMA | 1 | Extensible Multimodal Annotation Language (EMMA) format. See www.w3.org/TR/emma for details. Media type: "application/x-vnd.nuance.emma+xml". |
EnumSecureContextLevel
Secure context level.
Name | Number | Description |
---|---|---|
OPEN | 0 | Prompt text and recognition results appear in the diagnostic and call logs, and utterance waveforms are recorded. |
SUPPRESS | 1 | Utterance waveforms are not recorded, and recognition results are suppressed in the diagnostic and call logs. |
RecognitionResource
Input message defining one or more recognition resources (grammars) to be used for the recognition.
For speech recognitions, this is included in RecognitionInit. For DTMF recognitions, this is included in DTMFRecognitionInit.
Field | Type | Description |
---|---|---|
builtin | string | Name of a built-in resource supported by the installed language pack. |
uri_grammar | UriGrammar | An external grammar file, referenced by URI. |
inline_grammar | InlineGrammar | A grammar supplied inline, in SRGS XML or another supported format. |
language | string | Mandatory. Language and country (locale) code in xx-XX format, for example en-US. Must be one of the languages available in the language group of the URI being called. |
weight | int32 | Specifies the grammar’s weight relative to other grammars active for that recognition. This value can range from 1 to 32767. Default is 1. |
grammar_id | string | The ID that Nuance Recognizer uses to identify the grammar in the recognition result. If not set, Nuance Recognizer generates a unique ID. |
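A client typically assembles the repeated resources field from several grammars. The helper below is a hypothetical sketch that builds dict stand-ins for RecognitionResource and checks the documented constraints (xx-XX locale, weight from 1 to 32767). It assumes builtin, uri_grammar, and inline_grammar are mutually exclusive within one resource, which the table implies but does not state.

```python
import re
from typing import Optional

# xx-XX, for example en-US
LOCALE_RE = re.compile(r"[a-z]{2}-[A-Z]{2}")


def make_resource(language: str, *,
                  builtin: Optional[str] = None,
                  uri: Optional[str] = None,
                  inline_grammar: Optional[bytes] = None,
                  weight: int = 1,
                  grammar_id: Optional[str] = None) -> dict:
    """Build a dict stand-in for one RecognitionResource entry."""
    # Assumption: exactly one grammar source is set per resource.
    sources = [s for s in (builtin, uri, inline_grammar) if s is not None]
    if len(sources) != 1:
        raise ValueError("set exactly one of builtin, uri, inline_grammar")
    if not LOCALE_RE.fullmatch(language):
        raise ValueError(f"language must look like en-US, got {language!r}")
    if not 1 <= weight <= 32767:
        raise ValueError("weight must be between 1 and 32767")
    resource: dict = {"language": language, "weight": weight}
    if builtin is not None:
        resource["builtin"] = builtin
    elif uri is not None:
        resource["uri_grammar"] = {"uri": uri}
    else:
        resource["inline_grammar"] = {"grammar": inline_grammar}
    if grammar_id is not None:
        resource["grammar_id"] = grammar_id
    return resource
```

For example, make_resource("en-US", uri="https://example.com/grammars/yesno.grxml", weight=2, grammar_id="yesno") builds one entry; the URI is a placeholder.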
UriGrammar
Input message defining the URI reference to a grammar resource.
Field | Type | Description |
---|---|---|
uri | string | Mandatory for UriGrammar resources. Location of the resource as a URI reference. |
media_type | EnumMediaType | The type of media used for the grammar being fetched. If not specified, Nuance Recognizer detects the media type. |
parameters | UriGrammarParameters | Parameters controlling the grammar fetch. |
InlineGrammar
Input message containing an inline recognition grammar.
Field | Type | Description |
---|---|---|
media_type | EnumMediaType | The type of media used for the inline grammar data. If not specified, Nuance Recognizer detects the media type. |
grammar | bytes | Mandatory for InlineGrammar resources. Grammar data. |
EnumMediaType
Grammar format.
Name | Number | Description |
---|---|---|
AUTOMATIC | 0 | Recognizer will attempt to automatically determine the loaded grammar format. |
APPLICATION_SRGS_XML | 1 | "application/srgs+xml" |
APPLICATION_X_SWI_GRAMMAR | 2 | "application/x-swi-grammar" |
APPLICATION_X_SWI_PARAMETER | 3 | "application/x-swi-parameter" |
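For an InlineGrammar, the grammar bytes are the grammar document itself. The sketch below builds a minimal SRGS 1.0 XML grammar for a fixed list of phrases and pairs it with APPLICATION_SRGS_XML. The function name srgs_one_of is hypothetical; the grammar uses standard W3C SRGS elements (grammar, rule, one-of, item), and setting media_type explicitly is optional because Nuance Recognizer detects the media type when it is not specified.

```python
from typing import Iterable
from xml.sax.saxutils import escape

APPLICATION_SRGS_XML = 1  # EnumMediaType value from the table above


def srgs_one_of(rule_id: str, phrases: Iterable[str], lang: str = "en-US") -> bytes:
    """Build a minimal SRGS 1.0 XML grammar that matches any one of the phrases.

    rule_id and lang are assumed to be plain identifiers (not escaped);
    phrase text is XML-escaped.
    """
    items = "".join(f"<item>{escape(p)}</item>" for p in phrases)
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" '
        f'xml:lang="{lang}" root="{rule_id}" mode="voice">'
        f'<rule id="{rule_id}" scope="public"><one-of>{items}</one-of></rule>'
        "</grammar>"
    )
    return xml.encode("utf-8")


# Hypothetical InlineGrammar stand-in carrying the grammar bytes and its media type.
inline = {"media_type": APPLICATION_SRGS_XML, "grammar": srgs_one_of("yesno", ["yes", "no"])}
```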
UriGrammarParameters
Input message for fetching an external recognition grammar.
Field | Type | Description |
---|---|---|
request_timeout_ms | uint32 | Time to wait when downloading resources, in milliseconds. Default of 0 will use the server default of 30000 milliseconds (30 seconds). |
content_base | string | Used to specify the base URI for resolving relative URLs. Default "" is the server default (no base). |
max_age | uint32 | Cache control parameter. Sets max-age, in seconds. Default of 0 is the server default (not present). |
max_stale | uint32 | Cache control parameter. Sets max-stale, in seconds. Default of 0 is the server default (do not use expired entries). |
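Several UriGrammarParameters fields use 0 or an empty string to mean "server default". The sketch below spells out those defaults on the client side, for example for logging what a fetch will do. It is an illustration under assumptions: the function is hypothetical, relative URIs are resolved with standard URL joining (urllib.parse.urljoin), and max_age/max_stale are rendered as HTTP Cache-Control directives, which matches the HTTP meaning of those names but is not stated by this API.

```python
from typing import Dict, Union
from urllib.parse import urljoin

SERVER_DEFAULT_TIMEOUT_MS = 30000  # documented server default


def effective_fetch_settings(uri: str,
                             request_timeout_ms: int = 0,
                             content_base: str = "",
                             max_age: int = 0,
                             max_stale: int = 0) -> Dict[str, Union[str, int]]:
    """Resolve UriGrammarParameters zero/empty values to what they stand for."""
    return {
        # Relative URIs are resolved against content_base when one is given.
        "url": urljoin(content_base, uri) if content_base else uri,
        # 0 means "use the server default".
        "timeout_ms": request_timeout_ms or SERVER_DEFAULT_TIMEOUT_MS,
        # 0 means the directive is omitted (assumed Cache-Control rendering).
        "cache_control": ", ".join(
            f"{name}={value}"
            for name, value in (("max-age", max_age), ("max-stale", max_stale))
            if value > 0
        ),
    }
```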
Control
Input message that starts the recognition no-input timer.
For speech recognitions, this is included in RecognitionRequest. For DTMF recognitions, this is included in DTMFRecognitionRequest.
Field | Type | Description |
---|---|---|
start_timers | StartTimersControl | Starts the recognition no-input timer. |
StartTimersControl
Input message the client sends when starting the no-input timer. Included in Control.
RecognitionResponse
Output stream of messages in response to a recognition request. Returned by the GrammarRecognizer Recognize and DTMFRecognize methods.
Field | Type | Description |
---|---|---|
status | Status | Always the first message returned, indicates whether recognition was initiated successfully. |
start_of_speech | StartOfSpeech | When the start of speech was detected. |
end_of_speech | EndOfSpeech | When the end of speech was detected. |
result | Result | The partial or final recognition result. A series of partial results may precede the final result. |
Status
Output message indicating the status of the recognition request. The message and details fields are developer-facing error messages in English. User-facing messages should be localized by the client based on the status code. Included in RecognitionResponse.
See Status codes for details about the codes.
Field | Type | Description |
---|---|---|
code | uint32 | HTTP-style return code: 100, 200, 4xx, or 5xx as appropriate. |
message | string | Brief description of the status. |
details | string | Longer description if available. |
StartOfSpeech
Output message containing the start-of-speech message. Included in RecognitionResponse.
Field | Type | Description |
---|---|---|
first_audio_to_start_of_speech_ms | uint32 | Offset from start of audio stream to start of speech detected, in milliseconds. |
EndOfSpeech
Output message containing the end-of-speech message. Included in RecognitionResponse.
Field | Type | Description |
---|---|---|
first_audio_to_end_of_speech_ms | uint32 | Offset from start of audio stream to end of speech detected, in milliseconds. |
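Because both offsets are measured in milliseconds from the start of the audio stream and all formats run at 8 kHz, a client that buffers the audio it sent can cut out the detected speech. A minimal sketch, assuming the offsets count from the first audio byte the client sent and that the buffer contains exactly that stream; the function name is hypothetical.

```python
def speech_segment(audio: bytes, start_ms: int, end_ms: int,
                   bytes_per_sample: int = 2) -> bytes:
    """Slice the sent audio between the start- and end-of-speech offsets.

    bytes_per_sample is 2 for 16-bit PCM and 1 for G.711 mu-law/A-law.
    """
    bytes_per_ms = 8 * bytes_per_sample  # 8 kHz means 8 samples per millisecond
    return audio[start_ms * bytes_per_ms:end_ms * bytes_per_ms]
```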
Result
Output message containing the recognition result, including the result status. Included in RecognitionResponse.
Field | Type | Description |
---|---|---|
formatted_text | string | Formatted recognition result (could be empty). |
status | string | Recognition status information: SUCCESS, NO_MATCH, INCOMPLETE, NON_SPEECH_DETECTED, SPEECH_DETECTED, SPEECH_COMPLETE, MAX_CPU_TIME, MAX_SPEECH, STOPPED, REJECTED or NO_SPEECH_FOUND. |
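A client reading the RecognitionResponse stream can rely on the Status message arriving first, then react to speech events and results as they arrive. The sketch below uses plain dicts in place of generated RecognitionResponse objects; the RecognitionError class and the function name are hypothetical, and because Result has no documented "final" flag, it assumes the last result received is the final one.

```python
from typing import Iterable


class RecognitionError(Exception):
    """Raised when the service reports a 4xx or 5xx status."""


def consume_responses(responses: Iterable[dict]) -> dict:
    """Walk a RecognitionResponse stream (dict stand-ins) and summarize it."""
    stream = iter(responses)
    first = next(stream, None)
    # The first message is always a Status.
    if first is None or "status" not in first:
        raise RecognitionError("expected a status message first")
    summary: dict = {"start_ms": None, "end_ms": None, "result": None}
    for message in [first, *stream]:
        if "status" in message:
            code = message["status"]["code"]
            if code >= 400:  # HTTP-style: 4xx client error, 5xx server error
                raise RecognitionError(f"{code} {message['status'].get('message', '')}")
        elif "start_of_speech" in message:
            summary["start_ms"] = message["start_of_speech"]["first_audio_to_start_of_speech_ms"]
        elif "end_of_speech" in message:
            summary["end_ms"] = message["end_of_speech"]["first_audio_to_end_of_speech_ms"]
        elif "result" in message:
            # Assumption: each later result replaces the earlier partial one.
            summary["result"] = message["result"]
    return summary
```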
Scalar value types
The data types in the proto files are mapped to equivalent types in the generated client stub files.