NRaaS gRPC API

The Nuance Recognizer gRPC API contains methods for requesting recognitions.

GrammarRecognizer

Streaming grammar-based recognition service API.

NRC parameter table
Name	Request Type	Response Type	Description
Recognize	RecognitionRequest stream	RecognitionResponse stream	Starts a recognition request and returns a response.
DTMFRecognize	DTMFRecognitionRequest stream	RecognitionResponse stream	Starts a DTMF recognition request and returns a response.

RecognitionRequest

Input stream messages that request speech recognition, sent one at a time in a specific order. The first, mandatory message sends recognition parameters and resources. An optional Control message can then be sent. The remaining messages send the audio to be recognized. When the stall_timers recognition flag is set to true in the RecognitionInit message, the optional Control message can be sent at any time after the RecognitionInit to start the timeout timers. Included in the GrammarRecognizer Recognize service.

RecognitionRequest parameter table
Field	Type	Description
recognition_init	RecognitionInit	Mandatory. Required first message in the RPC input stream, sends parameters and resources for recognition.
control	Control	Optional. Second message in the RPC input stream, for timer control.
audio	bytes	Audio samples in the selected encoding for recognition.
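
The message order described above can be sketched as a Python generator. This is only an illustration: it uses plain dictionaries in place of the generated protobuf classes, and the function name build_request_stream and the dictionary layout are assumptions, not part of the API.

```python
from typing import Dict, Iterable, Iterator


def build_request_stream(
    init: Dict,
    audio_chunks: Iterable[bytes],
    start_timers: bool = False,
) -> Iterator[Dict]:
    """Yield request messages in the order the input stream expects."""
    # 1. recognition_init is mandatory and always comes first.
    yield {"recognition_init": init}
    # 2. The optional control message starts the recognition timers.
    if start_timers:
        yield {"control": {"start_timers": {}}}
    # 3. Every remaining message carries audio.
    for chunk in audio_chunks:
        yield {"audio": chunk}


# Placeholder init contents and audio bytes, for illustration only.
messages = list(
    build_request_stream(
        {"parameters": {}, "resources": []},
        [b"\x00\x01", b"\x02\x03"],
        start_timers=True,
    )
)
```

In this sketch the control message is sent right after the init message. With stall_timers set to true, it could instead be sent later in the stream.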

RecognitionInit

Input message that initiates a new recognition turn. Included in RecognitionRequest.

RecognitionInit parameter table
Field	Type	Description
parameters	RecognitionParameters	Mandatory. Various endpointer and recognition parameters, recognition result format.
resources	RecognitionResource	Repeated. Mandatory. Resources (grammars) to be used for the recognition.
client_data	RecognitionInit.ClientDataEntry	Repeated. Client-supplied event and key=value pairs to inject into the call log. To specify multiple key=value entries, separate them with a pipe character in the key=value string. Example: client_data["event"] = "key1=value1|key2=value2";
user_id	string	A user identification to associate with the recognition.
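
The pipe-separated client_data value can be built with a small helper. This is a sketch: the helper name format_client_data is hypothetical, and rejecting pipe or equals characters inside keys and values is an assumption, since no escaping rule is described here.

```python
from typing import Dict


def format_client_data(pairs: Dict[str, str]) -> str:
    """Join key=value pairs with a pipe character."""
    for key, value in pairs.items():
        # Assumption: separators inside keys or values cannot be escaped.
        if "|" in key or "|" in value or "=" in key:
            raise ValueError(f"unsupported character in {key!r}={value!r}")
    return "|".join(f"{key}={value}" for key, value in pairs.items())


client_data = {"event": format_client_data({"key1": "value1", "key2": "value2"})}
```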

RecognitionParameters

Input message that defines parameters for the recognition process. Included in RecognitionInit. The audio_format field is required; all others are optional.

RecognitionParameters parameter table
Field	Type	Description
audio_format	AudioFormat	Mandatory. Audio codec type and sample rate.
recognition_flags	RecognitionFlags	Boolean recognition parameters.
no_input_timeout_ms	int32	Maximum silence, in milliseconds, allowed while waiting for user input after recognition timers are started. Default is 7000 ms. A value of -1 means no timeout.
complete_timeout_ms	int32	Specify the duration of silence, in milliseconds, after a valid recognition has occurred that determines the caller has finished speaking. Default is 0 (timer disabled).
incomplete_timeout_ms	int32	Specify the duration of silence, in milliseconds, after an utterance before concluding that the caller has finished speaking. Default is 1500 ms. A value of 0 disables the timer.
max_speech_timeout_ms	int32	Maximum duration, in milliseconds, of an utterance collected from the user. Default is 22000 ms (22 seconds). A value of -1 means no timeout.
speech_detection_sensitivity	float	A balance between detecting speech and noise (for example, breathing), 0 to 1.0. 0 means ignore all noise, 1.0 means interpret all noise as speech. Default is 0.5.
nbest	int32	Maximum number of n-best hypotheses to return. Range is 0 to 999. Values greater than 5 require additional CPU cycles. Default is 2.
confidence_level	float	When the score of the first n-best entry is less than the value of confidence_level, the recognition will return a no-match. Range is 0 to 1.0. Default is 0 (all utterances accepted).
result_format	ResultFormat	Specifies in what format the recognition result should be returned.
cookies	string	Repeated. Defines the HTTP cookies to be included when fetching a grammar resource using the Set-Cookie or Set-Cookie2 format. Format: “Set-Cookie:name=cookie-name;value=cookie-value;expires=cookie-expiration;…”, “Set-Cookie2:name=cookie-name;value=cookie-value;expires=cookie-expiration;…”. The name and value attributes are required. The remaining attributes are optional.
endpointer_parameters	RecognitionParameters.EndpointerParametersEntry	Client-supplied key-value pairs representing parameters to set on the endpointer.
recognizer_parameters	RecognitionParameters.RecognizerParametersEntry	Client-supplied key-value pairs representing parameters to set on the recognizer.
secure_context_level	EnumSecureContextLevel	Specifies the level of security for the recognition. Default is OPEN.
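
The defaults and ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch with plain dictionaries; the names DEFAULTS and check_parameters are hypothetical, and the service may apply its own validation.

```python
from typing import Dict

# Default values taken from the table above.
DEFAULTS = {
    "no_input_timeout_ms": 7000,
    "complete_timeout_ms": 0,
    "incomplete_timeout_ms": 1500,
    "max_speech_timeout_ms": 22000,
    "speech_detection_sensitivity": 0.5,
    "nbest": 2,
    "confidence_level": 0.0,
}


def check_parameters(overrides: Dict) -> Dict:
    """Merge overrides into the defaults and check the documented ranges."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    if not 0 <= params["nbest"] <= 999:
        raise ValueError("nbest must be in the range 0 to 999")
    for name in ("speech_detection_sensitivity", "confidence_level"):
        if not 0.0 <= params[name] <= 1.0:
            raise ValueError(f"{name} must be in the range 0 to 1.0")
    return params


params = check_parameters({"nbest": 5, "no_input_timeout_ms": -1})
```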

DTMFRecognitionRequest

Input stream messages that request DTMF recognition, sent one at a time, in a specific order. The first mandatory message sends recognition parameters and resources. An optional Control message can then be sent. The remaining messages send the DTMFs to be recognized. When the stall_timers recognition flag is set to true in the RecognitionInit message, an optional Control message can be sent at any time after the RecognitionInit to initiate the timing of timeout events. Included in GrammarRecognizer DTMFRecognize service.

DTMFRecognitionRequest parameter table
Field	Type	Description
recognition_init	DTMFRecognitionInit	Mandatory. Required first message in the RPC input stream, sends parameters and resources for recognition.
control	Control	Optional. Second message in the RPC input stream, for timer control.
dtmf	string	A DTMF character or string to add to the recognition.

DTMFRecognitionInit

Input message that initiates a new DTMF recognition turn. Included in DTMFRecognitionRequest.

DTMFRecognitionInit parameter table
Field	Type	Description
parameters	DTMFRecognitionParameters	Various endpointer and recognition parameters, recognition result format.
resources	RecognitionResource	Repeated. Mandatory. Resources (grammars) to be used for the recognition.
client_data	DTMFRecognitionInit.ClientDataEntry	Repeated. Client-supplied event and key=value pairs to inject into the call log. To specify multiple key=value entries, separate them with a pipe character in the key=value string. Example: client_data["event"] = "key1=value1|key2=value2";
user_id	string	A user identification to associate with the recognition.

DTMFRecognitionParameters

Input message that defines parameters for the DTMF recognition process. Included in DTMFRecognitionInit.

DTMFRecognitionParameters parameter table
Field	Type	Description
recognition_flags	RecognitionFlags	Boolean recognition parameters.
no_input_timeout_ms	int32	Maximum time, in milliseconds, allowed while waiting for user input after recognition timers are started. Default is 7000 ms. A value of -1 means no timeout.
dtmf_interdigit_timeout_ms	int32	Maximum time, in milliseconds, allowed while waiting for next DTMF char. Default is 5000 ms. A value of -1 means no timeout. After Nuance Recognizer receives the first DTMF digit, any subsequent DTMF must come within the dtmf_interdigit_timeout_ms time. Otherwise, Nuance Recognizer ends the recognition and returns the result up to that point.
dtmf_term_timeout_ms	int32	Maximum duration, in milliseconds, to wait for DTMF term char. Default is 10000 ms. A value of -1 means no timeout. This timer is active when DTMFRecognitionParameters specifies a DTMF termination character. When Nuance Recognizer finds a match for the DTMF sequence at a point in time, Nuance Recognizer will wait for the terminating DTMF character. If it does not arrive within the dtmf_term_timeout_ms time, Nuance Recognizer will end the recognition and return the result up to that point.
dtmf_term_char	string	Terminating DTMF character for DTMF input recognition.
nbest	int32	Maximum number of n-best hypotheses to return. Range is 0 to 999. Values greater than 5 require additional CPU cycles. Default is 2.
result_format	ResultFormat	Specifies in what format the recognition result should be returned.
cookies	string	Repeated. Defines the HTTP cookies to be included when fetching a grammar resource using the Set-Cookie or Set-Cookie2 format. Format: “Set-Cookie:name=cookie-name;value=cookie-value;expires=cookie-expiration;…”, “Set-Cookie2:name=cookie-name;value=cookie-value;expires=cookie-expiration;…”. The name and value attributes are required. The remaining attributes are optional.
recognizer_parameters	DTMFRecognitionParameters.RecognizerParametersEntry	Client-supplied key-value pairs representing parameters to set on the recognizer.
secure_context_level	EnumSecureContextLevel	Specifies the level of security for the recognition. Default is OPEN.
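
The effect of dtmf_interdigit_timeout_ms can be illustrated with a small client-side simulation. This is a sketch of the behavior described in the table, not Nuance Recognizer's implementation: the function name collect_digits is hypothetical, the no-input and termination-character timers are ignored, and treating a gap exactly equal to the timeout as still valid is an assumption.

```python
from typing import List, Optional, Tuple


def collect_digits(
    events: List[Tuple[int, str]],
    interdigit_timeout_ms: int = 5000,
) -> str:
    """Return the digits gathered before the interdigit timer expires.

    events holds (arrival_time_ms, digit) pairs sorted by arrival time.
    """
    collected = ""
    previous_time: Optional[int] = None
    for time_ms, digit in events:
        gap_exceeded = (
            previous_time is not None
            and interdigit_timeout_ms != -1  # -1 means no timeout
            and time_ms - previous_time > interdigit_timeout_ms
        )
        if gap_exceeded:
            break  # recognition ends; later digits are not collected
        collected += digit
        previous_time = time_ms
    return collected


# The third digit arrives 6000 ms after the second, so it is dropped.
digits = collect_digits([(0, "1"), (1000, "2"), (7000, "3")])
```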

AudioFormat

Input message specifying the format of the audio to recognize. Included in RecognitionParameters.

AudioFormat parameter table
Field	Type	Description
pcm	PCM	Signed 16-bit little endian -> “audio/L16;rate=8000” 16-bit 8 kHz linear encoding.
ulaw	ULaw	G.711 Mu-law, 8kHz -> “audio/basic;rate=8000” 8-bit 8 kHz u-law encoding.
alaw	ALaw	G.711 A-law, 8kHz -> “audio/x-alaw-basic;rate=8000” 8-bit 8 kHz A-law encoding.
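
Because all three formats use an 8 kHz sample rate, the size of an audio chunk follows directly from the sample width. The sketch below computes chunk sizes for streaming; the names AUDIO_FORMATS and chunk_size_bytes are hypothetical, and single-channel (mono) audio is an assumption.

```python
SAMPLE_RATE_HZ = 8000

# Format name -> (media type, bytes per sample), from the table above.
AUDIO_FORMATS = {
    "pcm": ("audio/L16;rate=8000", 2),
    "ulaw": ("audio/basic;rate=8000", 1),
    "alaw": ("audio/x-alaw-basic;rate=8000", 1),
}


def chunk_size_bytes(audio_format: str, duration_ms: int) -> int:
    """Number of bytes needed to hold duration_ms of mono audio."""
    _, bytes_per_sample = AUDIO_FORMATS[audio_format]
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * bytes_per_sample


pcm_20ms = chunk_size_bytes("pcm", 20)    # 160 samples * 2 bytes
ulaw_20ms = chunk_size_bytes("ulaw", 20)  # 160 samples * 1 byte
```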

PCM

Input message defining PCM audio format. Audio rate is 8kHz.

ALaw

Input message defining ALaw audio format. G.711 audio formats are set to 8kHz.

ULaw

Input message defining ULaw audio format. G.711 audio formats are set to 8kHz.

RecognitionFlags

Input message containing boolean recognition parameters. The default is false in all cases.

For speech recognitions, this is included in RecognitionParameters.

For DTMF recognitions, this is included in DTMFRecognitionParameters.

RecognitionFlags parameter table
Field	Type	Description
stall_timers	bool	Whether to disable recognition timers. By default, timers start when recognition begins.

ResultFormat

Input message used to specify the format to use for the recognition result.

For speech recognitions, this is included in RecognitionParameters.

For DTMF recognitions, this is included in DTMFRecognitionParameters.

ResultFormat table
Field	Type	Description
format	EnumResultFormat	The result format to use. If not set, the NLSML format is used (“application/x-vnd.speechworks.emma+xml”).
additional_parameters	string	Additional parameters controlling the formatting of the result. Example: “;mrcpv=2.06;strictconfidencelevel=1”

EnumResultFormat

Supported formats for the recognition result.

EnumResultFormat parameter table
Name	Number	Description
NLSML	0	Natural Language Semantics Markup Language (NLSML) format. See www.w3.org/TR/nl-spec for details. “application/x-vnd.speechworks.emma+xml”
EMMA	1	Extensible Multimodal Annotation Language (EMMA) format. See www.w3.org/TR/emma for details. “application/x-vnd.nuance.emma+xml”
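
The enum values and their media types can be mirrored in client code, for example to decide how to parse a result. This is a sketch only: a real client would use the enum from the generated stub files, and result_media_type is a hypothetical helper.

```python
from enum import IntEnum


class EnumResultFormat(IntEnum):
    NLSML = 0
    EMMA = 1


# Media types as listed in the table above.
MEDIA_TYPES = {
    EnumResultFormat.NLSML: "application/x-vnd.speechworks.emma+xml",
    EnumResultFormat.EMMA: "application/x-vnd.nuance.emma+xml",
}


def result_media_type(format_value: int = 0) -> str:
    """Look up the media type; a format that is not set has value 0 (NLSML)."""
    return MEDIA_TYPES[EnumResultFormat(format_value)]
```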

EnumSecureContextLevel

Secure context level.

EnumSecureContextLevel parameter table
Name	Number	Description
OPEN	0	Prompt text and recognition results appear in the diagnostic and call logs, and utterance waveforms are recorded.
SUPPRESS	1	Utterance waveforms are not recorded, and recognition results in the diagnostic and call logs are suppressed.

RecognitionResource

Input message defining one or more recognition resources (grammars) to be used for the recognition.

For speech recognitions, this is included in RecognitionInit.

For DTMF recognitions, this is included in DTMFRecognitionInit.

RecognitionResource parameter table
Field	Type	Description
builtin	string	Name of a built-in resource supported by the installed language pack.
uri_grammar	UriGrammar	The resource is an external file.
inline_grammar	InlineGrammar	Inline grammar, SRGS XML format, or other format.
language	string	Mandatory. Language and country (locale) code in the form xx-XX (two-letter codes), for example en-US. Must be one of the languages available in the language group of the URI being called.
weight	int32	Specifies the grammar’s weight relative to other grammars active for that recognition. This value can range from 1 to 32767. Default is 1.
grammar_id	string	Specifies the id that Nuance Recognizer will use to identify the grammar in the recognition result. If not set, Nuance Recognizer generates a unique one.
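
The constraints in the table can be checked before a resource is sent. The sketch below uses a plain dictionary in place of the generated message; the name check_resource is hypothetical, and it assumes that each resource carries exactly one of builtin, uri_grammar, or inline_grammar.

```python
import re
from typing import Dict

LOCALE_PATTERN = re.compile(r"^[a-z]{2}-[A-Z]{2}$")
GRAMMAR_FIELDS = ("builtin", "uri_grammar", "inline_grammar")


def check_resource(resource: Dict) -> str:
    """Validate a resource and return the name of its grammar field."""
    present = [name for name in GRAMMAR_FIELDS if name in resource]
    # Assumption: the three grammar fields are mutually exclusive.
    if len(present) != 1:
        raise ValueError("expected exactly one grammar field")
    if not LOCALE_PATTERN.match(resource.get("language", "")):
        raise ValueError("language must look like en-US")
    if not 1 <= resource.get("weight", 1) <= 32767:
        raise ValueError("weight must be in the range 1 to 32767")
    return present[0]


# "digits" is a placeholder built-in name, used for illustration only.
kind = check_resource({"builtin": "digits", "language": "en-US", "weight": 2})
```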

UriGrammar

Input message defining the URI reference to a grammar resource.

UriGrammar parameter table
Field	Type	Description
uri	string	Mandatory for UriGrammar resources. Location of the resource as a URI reference.
media_type	EnumMediaType	The type of media used for the grammar being fetched. If not specified, Nuance Recognizer detects the media type.
parameters	UriGrammarParameters	Parameters controlling the grammar fetch.

InlineGrammar

Input message containing an inline recognition grammar.

InlineGrammar parameter table
Field	Type	Description
media_type	EnumMediaType	The type of media used for the inline grammar data. If not specified, Nuance Recognizer detects the media type.
grammar	bytes	Mandatory for InlineGrammar resources. Grammar data.

EnumMediaType

Grammar format.

EnumMediaType parameter table
Name	Number	Description
AUTOMATIC	0	Recognizer will attempt to automatically determine the loaded grammar format.
APPLICATION_SRGS_XML	1	“application/srgs+xml”
APPLICATION_X_SWI_GRAMMAR	2	“application/x-swi-grammar”
APPLICATION_X_SWI_PARAMETER	3	“application/x-swi-parameter”

UriGrammarParameters

Input message for fetching an external recognition grammar.

UriGrammarParameters parameter table
Field	Type	Description
request_timeout_ms	uint32	Time to wait when downloading resources, in milliseconds. Default of 0 will use the server default of 30000 milliseconds (30 seconds).
content_base	string	Used to specify the base URI for resolving relative URLs. Default "" is the server default (no base).
max_age	uint32	Cache control parameter. Sets max-age, in seconds. Default of 0 is the server default (not present).
max_stale	uint32	Cache control parameter. Sets max-stale, in seconds. Default of 0 is the server default (do not use expired entries).
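
In these fields a value of 0 means "use the server default" rather than a literal zero. The sketch below shows one way a client could reason about the effective values; both function names are hypothetical, and how the server actually builds its fetch request is not described here.

```python
from typing import List

SERVER_DEFAULT_REQUEST_TIMEOUT_MS = 30000


def effective_request_timeout_ms(request_timeout_ms: int = 0) -> int:
    """A value of 0 falls back to the server default of 30000 ms."""
    return request_timeout_ms or SERVER_DEFAULT_REQUEST_TIMEOUT_MS


def cache_control_directives(max_age: int = 0, max_stale: int = 0) -> List[str]:
    """List the cache control directives that are explicitly set."""
    directives = []
    if max_age:
        directives.append(f"max-age={max_age}")
    if max_stale:
        directives.append(f"max-stale={max_stale}")
    return directives
```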

Control

Input message that starts the recognition no-input timer.

For speech recognitions, this is included in RecognitionRequest.

For DTMF recognitions, this is included in DTMFRecognitionRequest.

Control parameter table
Field	Type	Description
start_timers	StartTimersControl	Starts the recognition no-input timer.

StartTimersControl

Input message the client sends when starting the no-input timer. Included in Control.

RecognitionResponse

Output stream of messages in response to a recognize request. Included in the GrammarRecognizer Recognize and DTMFRecognize services.

RecognitionResponse parameter table
Field	Type	Description
status	Status	Always the first message returned, indicates whether recognition was initiated successfully.
start_of_speech	StartOfSpeech	When the start of speech was detected.
end_of_speech	EndOfSpeech	When the end of speech was detected.
result	Result	The partial or final recognition result. A series of partial results may precede the final result.
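
A client typically reads the response stream and branches on which field each message carries. The sketch below models responses as plain dictionaries rather than generated protobuf objects; the function name summarize_responses and all sample values are invented for illustration.

```python
from typing import Dict, Iterable, List


def summarize_responses(responses: Iterable[Dict]) -> List[str]:
    """Turn each response message into a short log line."""
    events = []
    for response in responses:
        if "status" in response:
            events.append(f"status {response['status']['code']}")
        elif "start_of_speech" in response:
            offset = response["start_of_speech"]["first_audio_to_start_of_speech_ms"]
            events.append(f"speech started at {offset} ms")
        elif "end_of_speech" in response:
            offset = response["end_of_speech"]["first_audio_to_end_of_speech_ms"]
            events.append(f"speech ended at {offset} ms")
        elif "result" in response:
            events.append(f"result {response['result']['status']}")
    return events


# Hypothetical stream; the status message always comes first.
sample_stream = [
    {"status": {"code": 200, "message": "OK", "details": ""}},
    {"start_of_speech": {"first_audio_to_start_of_speech_ms": 840}},
    {"end_of_speech": {"first_audio_to_end_of_speech_ms": 2310}},
    {"result": {"formatted_text": "", "status": "NO_MATCH"}},
]
events = summarize_responses(sample_stream)
```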

Status

Output message indicating the status of the recognition. The message and details are developer-facing error messages in English. User-facing messages should be localized by the client based on the status code. Included in RecognitionResponse.

See Status codes for details about the codes.

Status parameter table
Field	Type	Description
code	uint32	HTTP-style return code: 100, 200, 4xx, or 5xx as appropriate.
message	string	Brief description of the status.
details	string	Longer description if available.
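
Because the codes are HTTP-style, a client can group them by class before deciding how to react. The sketch below follows general HTTP conventions; the function name classify_status and the labels it returns are assumptions, and the specific meaning of each code is given in Status codes.

```python
def classify_status(code: int) -> str:
    """Group an HTTP-style status code into a broad class."""
    if 100 <= code < 200:
        return "informational"
    if 200 <= code < 300:
        return "success"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "unknown"
```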

StartOfSpeech

Output message containing the start-of-speech message. Included in RecognitionResponse.

StartOfSpeech parameter table
Field	Type	Description
first_audio_to_start_of_speech_ms	uint32	Offset from start of audio stream to start of speech detected, in milliseconds.

EndOfSpeech

Output message containing the end-of-speech message. Included in RecognitionResponse.

EndOfSpeech parameter table
Field	Type	Description
first_audio_to_end_of_speech_ms	uint32	Offset from start of audio stream to end of speech detected, in milliseconds.

Result

Output message containing the result, including the result status.

Result parameter table
Field	Type	Description
formatted_text	string	Formatted recognition result (could be empty).
status	string	Recognition status information: SUCCESS, NO_MATCH, INCOMPLETE, NON_SPEECH_DETECTED, SPEECH_DETECTED, SPEECH_COMPLETE, MAX_CPU_TIME, MAX_SPEECH, STOPPED, REJECTED or NO_SPEECH_FOUND.

Scalar value types

The data types in the proto files are mapped to equivalent types in the generated client stub files.

Scalar data types
Proto	Notes	C++	Java	Python
double		double	double	float
float		float	float	float
int32	Uses variable-length encoding. Inefficient for encoding negative numbers. If your field is likely to have negative values, use sint32 instead.	int32	int	int
int64	Uses variable-length encoding. Inefficient for encoding negative numbers. If your field is likely to have negative values, use sint64 instead.	int64	long	int/long
uint32	Uses variable-length encoding.	uint32	int	int/long
uint64	Uses variable-length encoding.	uint64	long	int/long
sint32	Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s.	int32	int	int
sint64	Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s.	int64	long	int/long
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	uint32	int	int
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	uint64	long	int/long
sfixed32	Always four bytes.	int32	int	int
sfixed64	Always eight bytes.	int64	long	int/long
bool		bool	boolean	bool
string	A string must always contain UTF-8 encoded or 7-bit ASCII text.	string	String	str/unicode
bytes	May contain any arbitrary sequence of bytes.	string	ByteString	str (Python 2), bytes (Python 3)
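
The note about negative numbers matters for int32 fields that accept -1, such as the timeout fields. The sketch below reproduces the protobuf varint and ZigZag encodings in plain Python to show the size difference; the function names are illustrative, and real clients rely on the protobuf library for encoding.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    # Negative int32 values are sign-extended to 64 bits before encoding.
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(value: int) -> bytes:
    # ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... before encoding.
    return encode_varint(((value << 1) ^ (value >> 31)) & 0xFFFFFFFF)


int32_size = len(encode_int32(-1))    # ten bytes on the wire
sint32_size = len(encode_sint32(-1))  # one byte on the wire
```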
