Audio formats

The audio you provide in RecognitionRequest must be a raw, headerless, monophonic (single-channel) stream of audio samples in one of the following formats. Before sending the audio, set the codec and sampling rate in the mandatory RecognitionParameters: AudioFormat.
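
As an illustration of the "raw, headerless, mono" requirement, the following minimal sketch strips the header from a 16-bit PCM WAV file and returns the bare samples together with a matching MIME type from the table below. The function name wav_to_raw_pcm is hypothetical and not part of ASRaaS; the sketch uses only Python's standard wave module and assumes the input is an uncompressed PCM WAV file.

```python
import io
import wave
from typing import Tuple


def wav_to_raw_pcm(wav_bytes: bytes) -> Tuple[bytes, str]:
    """Strip the WAV header and return (raw PCM samples, MIME type).

    Hypothetical helper: it only checks the constraints described above
    (mono, 16-bit linear PCM, 8 kHz or 16 kHz).
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("audio must be mono (single channel)")
        if wav.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit linear PCM")
        rate = wav.getframerate()
        if rate not in (8000, 16000):
            raise ValueError(f"unsupported sampling rate: {rate} Hz")
        # readframes() returns only the sample data, without any header.
        pcm = wav.readframes(wav.getnframes())
    return pcm, f"audio/L16;rate={rate}"
```

The returned bytes are what would be streamed as audio, and the returned string corresponds to the codec and sampling rate that must be declared before sending it.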

List of supported audio formats
Audio format	MIME type
Linear PCM, 16-bit, signed little-endian, 8 kHz	audio/L16;rate=8000 (default), audio/x-raw;format=S16LE;rate=8000
Linear PCM, 16-bit, signed little-endian, 16 kHz	audio/L16;rate=16000, audio/x-raw;format=S16LE;rate=16000
µ-law, 8-bit, 8 kHz	audio/basic;rate=8000
A-law, 8-bit, 8 kHz	audio/x-alaw-basic;rate=8000
Ogg-encapsulated Opus, 8 kHz	audio/ogg, audio/ogg; codecs=opus, audio/ogg; rate=8000, audio/ogg; codecs=opus; rate=8000
Ogg-encapsulated Opus, 16 kHz	audio/ogg; rate=16000, audio/ogg; codecs=opus; rate=16000
Raw Opus, 8 kHz	audio/opus; rate=8000, audio/opus; rate=8000; preskip=x
Raw Opus, 16 kHz	audio/opus; rate=16000, audio/opus; rate=16000; preskip=x
Not supported	audio/ogg; codecs=vorbis

In preskip=x, x is the number of samples (counted at 48 kHz) to pre-skip in the raw Opus audio.
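
Because the pre-skip value is counted in 48 kHz samples regardless of the declared rate, a duration in milliseconds converts as samples = ms × 48. The sketch below builds a raw Opus MIME type string on that basis. The helper name raw_opus_mime and the 6.5 ms figure in the usage note are illustrative assumptions, not values prescribed by ASRaaS.

```python
from typing import Optional

OPUS_PRESKIP_CLOCK_HZ = 48_000  # pre-skip is always expressed at 48 kHz


def raw_opus_mime(rate_hz: int, preskip_ms: Optional[float] = None) -> str:
    """Build a raw Opus MIME type string (hypothetical helper)."""
    if rate_hz not in (8000, 16000):
        raise ValueError(f"unsupported sampling rate: {rate_hz} Hz")
    mime = f"audio/opus; rate={rate_hz}"
    if preskip_ms is not None:
        # Convert milliseconds to a whole number of 48 kHz samples.
        samples = round(preskip_ms * OPUS_PRESKIP_CLOCK_HZ / 1000)
        mime += f"; preskip={samples}"
    return mime
```

For example, raw_opus_mime(16000, 6.5) yields "audio/opus; rate=16000; preskip=312", since 6.5 ms at 48 kHz is 312 samples.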

ASRaaS supports the Opus audio format, either raw Opus (RFC 6716) or Ogg-encapsulated Opus (RFC 7845). The recommended encoder settings for Opus for speech recognition are:

Sampling rate: 16 kHz
Complexity: 3
Bitrate: 28 kbps recommended (20 kbps minimum)
Bitrate type: VBR (variable bitrate) or CBR (constant bitrate)
Packet length: 20 ms
Encoder mode: SILK-only mode
With Ogg encapsulation, the maximum Ogg container delay must be <= 100 ms
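
One way to approximate these settings for Ogg-encapsulated Opus is the opusenc tool from opus-tools. The sketch below only assembles the command line; it assumes opusenc is installed and that the input is already a 16 kHz mono WAV file. The function name opusenc_command is hypothetical. Note that opusenc does not expose an option to force SILK-only mode, so this sketch does not cover that setting.

```python
from typing import List


def opusenc_command(wav_in: str, opus_out: str,
                    bitrate_kbps: int = 28, cbr: bool = False) -> List[str]:
    """Assemble an opusenc command line (hypothetical helper).

    Assumes wav_in is a 16 kHz mono WAV file; opusenc itself is not invoked here.
    """
    if bitrate_kbps < 20:
        raise ValueError("bitrate must be at least 20 kbps")
    return [
        "opusenc",
        "--bitrate", str(bitrate_kbps),      # target bitrate in kbps
        "--hard-cbr" if cbr else "--vbr",    # CBR or VBR
        "--comp", "3",                       # encoder complexity
        "--framesize", "20",                 # packet length in ms
        "--max-delay", "100",                # max Ogg container delay in ms
        wav_in,
        opus_out,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) to perform the encoding.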

Opus is a lossy codec, so do not expect recognition results to be identical to those obtained with PCM audio.
