Audio formats

The audio you provide in RecognitionRequest must be a raw, headerless, monophonic (single-channel) stream of audio samples in one of the following formats. Before sending the audio, specify its codec and sampling rate in the mandatory RecognitionParameters: AudioFormat.
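Audio captured to a WAV file carries a header, so it must be stripped before it can be sent as a raw stream. The sketch below uses Python's standard `wave` module to check that a WAV file is mono, 16-bit linear PCM at 8 or 16 kHz, and to return only the sample bytes along with a matching L16 MIME type. `wav_to_raw_pcm` and `PCM_MIME_TYPES` are hypothetical helpers, not part of any ASRaaS client, and the sketch assumes your audio starts out as an uncompressed PCM WAV file.

```python
import wave
from typing import BinaryIO, Tuple, Union

# Hypothetical lookup: L16 MIME types for the two supported linear PCM rates.
PCM_MIME_TYPES = {
    8000: "audio/L16;rate=8000",
    16000: "audio/L16;rate=16000",
}


def wav_to_raw_pcm(source: Union[str, BinaryIO]) -> Tuple[bytes, str]:
    """Strip the WAV header and return (headerless S16LE samples, MIME type).

    `source` may be a file path or an open binary file object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("audio must be mono (single channel)")
        if wav.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit linear PCM")
        rate = wav.getframerate()
        if rate not in PCM_MIME_TYPES:
            raise ValueError(f"unsupported sample rate: {rate} Hz")
        # WAV stores 16-bit PCM as signed little-endian, so these bytes are
        # already in S16LE order and need no conversion.
        samples = wav.readframes(wav.getnframes())
    return samples, PCM_MIME_TYPES[rate]
```

Stereo recordings or other sample rates are rejected rather than converted; downmixing or resampling is left to your audio pipeline.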

List of supported audio formats
Audio format                                        MIME type
Linear PCM, 16-bit, signed little-endian, 8 kHz     audio/L16;rate=8000 (default)
                                                    audio/x-raw;format=S16LE;rate=8000
Linear PCM, 16-bit, signed little-endian, 16 kHz    audio/L16;rate=16000
                                                    audio/x-raw;format=S16LE;rate=16000
µ-law, 8-bit, 8 kHz                                 audio/basic;rate=8000
A-law, 8-bit, 8 kHz                                 audio/x-alaw-basic;rate=8000
Ogg-encapsulated Opus, 8 kHz                        audio/ogg
                                                    audio/ogg; codecs=opus
                                                    audio/ogg; rate=8000
                                                    audio/ogg; codecs=opus; rate=8000
Ogg-encapsulated Opus, 16 kHz                       audio/ogg; rate=16000
                                                    audio/ogg; codecs=opus; rate=16000
Raw Opus, 8 kHz                                     audio/opus; rate=8000
                                                    audio/opus; rate=8000; preskip=x
Raw Opus, 16 kHz                                    audio/opus; rate=16000
                                                    audio/opus; rate=16000; preskip=x
Not supported: Ogg-encapsulated Vorbis              audio/ogg; codecs=vorbis

In the raw Opus MIME types, x is the pre-skip: the number of samples, counted at 48 kHz, to discard from the start of the decoded audio.
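Because pre-skip is always counted at 48 kHz, its duration does not depend on whether the stream is 8 kHz or 16 kHz. The sketch below builds a raw Opus MIME type string in the forms listed in the table and converts a pre-skip value to milliseconds. `raw_opus_mime` and `preskip_ms` are hypothetical helpers, and the pre-skip value 312 used in the usage note is only an example; read the real value from your encoder.

```python
from typing import Optional

SUPPORTED_OPUS_RATES = (8000, 16000)


def raw_opus_mime(rate: int, preskip: Optional[int] = None) -> str:
    """Build a raw Opus MIME type, optionally including the pre-skip."""
    if rate not in SUPPORTED_OPUS_RATES:
        raise ValueError(f"unsupported Opus sample rate: {rate} Hz")
    mime = f"audio/opus; rate={rate}"
    if preskip is not None:
        if preskip < 0:
            raise ValueError("preskip must be non-negative")
        mime += f"; preskip={preskip}"
    return mime


def preskip_ms(preskip: int) -> float:
    """Pre-skip duration in ms; the count is in 48 kHz samples at any rate."""
    return preskip * 1000 / 48000
```

For example, `raw_opus_mime(16000, 312)` gives `audio/opus; rate=16000; preskip=312`, and `preskip_ms(312)` gives 6.5 ms.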

ASRaaS supports Opus audio either as raw Opus (RFC 6716) or as Ogg-encapsulated Opus (RFC 7845). The recommended Opus encoder settings for speech recognition are:

  • Sampling rate: 16 kHz
  • Complexity: 3
  • Bitrate: 28 kbps recommended (20 kbps minimum)
  • Bitrate type: VBR (variable bitrate) or CBR (constant bitrate)
  • Packet length: 20 ms
  • Encoder mode: SILK-only mode
  • With Ogg encapsulation, the maximum Ogg container delay must be at most 100 ms
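One way to apply most of these settings is the `opusenc` command-line tool from opus-tools, which writes Ogg-encapsulated Opus. The sketch below only builds the argument list; it assumes `opusenc` is installed and that the input WAV is already sampled at 16 kHz. The opusenc options used here are `--downmix-mono`, `--bitrate` (kbit/s), `--vbr` or `--hard-cbr`, `--comp`, `--framesize` (ms), and `--max-delay` (ms). We know of no opusenc option that forces SILK-only mode, so the sketch does not enforce it. It also computes the average packet size: kbit/s multiplied by ms gives bits, so 28 kbps with 20 ms packets averages 560 bits, or 70 bytes, per packet. `opusenc_args` and `avg_packet_bytes` are hypothetical helpers.

```python
from typing import List


def opusenc_args(src: str, dst: str, cbr: bool = False) -> List[str]:
    """opusenc command line following the recommended settings (sketch)."""
    return [
        "opusenc",
        "--downmix-mono",                    # single-channel output
        "--bitrate", "28",                   # kbit/s: 28 recommended, 20 minimum
        "--hard-cbr" if cbr else "--vbr",    # CBR or VBR are both acceptable
        "--comp", "3",                       # encoder complexity
        "--framesize", "20",                 # 20 ms packets
        "--max-delay", "100",                # Ogg container delay cap, in ms
        src,
        dst,
    ]


def avg_packet_bytes(bitrate_kbps: float, frame_ms: float) -> float:
    """Average packet payload: kbit/s x ms = bits, then divide by 8."""
    return bitrate_kbps * frame_ms / 8
```

To run the encoder, pass the list to `subprocess.run(opusenc_args("speech_16k.wav", "speech_16k.opus"), check=True)`; the file names here are placeholders.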

Opus is a lossy codec, so recognition results for Opus audio may differ from those obtained for the same audio sent as PCM.