Audio formats
The audio you provide in RecognitionRequest must be a raw, headerless, monophonic (single-channel) stream of audio samples in one of the following formats. Before sending the audio, set the codec and sampling rate in the mandatory RecognitionParameters: AudioFormat.
Audio format | MIME type |
---|---|
Linear PCM, 16-bit, signed little-endian, 8 kHz | audio/L16;rate=8000 (default)<br>audio/x-raw;format=S16LE;rate=8000 |
Linear PCM, 16-bit, signed little-endian, 16 kHz | audio/L16;rate=16000<br>audio/x-raw;format=S16LE;rate=16000 |
µ-law, 8-bit, 8 kHz | audio/basic;rate=8000 |
A-law, 8-bit, 8 kHz | audio/x-alaw-basic;rate=8000 |
Ogg-encapsulated Opus, 8 kHz | audio/ogg<br>audio/ogg; codecs=opus<br>audio/ogg; rate=8000<br>audio/ogg; codecs=opus; rate=8000 |
Ogg-encapsulated Opus, 16 kHz | audio/ogg; rate=16000<br>audio/ogg; codecs=opus; rate=16000 |
Raw Opus, 8 kHz | audio/opus; rate=8000<br>audio/opus; rate=8000; preskip=x |
Raw Opus, 16 kHz | audio/opus; rate=16000<br>audio/opus; rate=16000; preskip=x |
Not supported | audio/ogg; codecs=vorbis |
In the raw Opus MIME types, x is the number of samples (at 48 kHz) to pre-skip in the raw Opus audio.
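As an illustration of the formats above, the following minimal sketch uses Python's standard `wave` module to strip the header from a WAV file and derive the matching linear PCM MIME type. It also shows how a raw Opus MIME type with an optional pre-skip value might be assembled. The helper names `wav_to_raw_pcm` and `opus_mime_type` are hypothetical and are not part of the ASRaaS API. How the resulting bytes and MIME type are passed into RecognitionRequest and AudioFormat is not shown here.

```python
import wave

# Sampling rates listed in the table above for linear PCM and Opus.
SUPPORTED_RATES = (8000, 16000)


def wav_to_raw_pcm(wav_file):
    """Return (raw_pcm_bytes, mime_type) for a mono, 16-bit WAV file.

    wav_file may be a path string or a binary file-like object.
    (Hypothetical helper, not part of the ASRaaS API.)
    """
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("audio must be mono (single channel)")
        if wav.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit linear PCM")
        rate = wav.getframerate()
        if rate not in SUPPORTED_RATES:
            raise ValueError(f"unsupported sampling rate: {rate} Hz")
        # readframes() returns only the sample data, without the WAV header.
        raw = wav.readframes(wav.getnframes())
    return raw, f"audio/L16;rate={rate}"


def opus_mime_type(rate, preskip=None):
    """Build a raw Opus MIME type string as listed in the table.

    preskip, if given, is a sample count at 48 kHz.
    (Hypothetical helper, not part of the ASRaaS API.)
    """
    if rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sampling rate: {rate} Hz")
    mime = f"audio/opus; rate={rate}"
    if preskip is not None:
        mime += f"; preskip={preskip}"
    return mime
```

The WAV helper rejects stereo or non-16-bit input rather than converting it, on the assumption that any resampling or downmixing is done beforehand with a dedicated audio tool.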
ASRaaS supports the Opus audio format, either raw Opus (RFC 6716) or Ogg-encapsulated Opus (RFC 7845). The recommended encoder settings for Opus for speech recognition are:
- Sampling rate: 16 kHz
- Complexity: 3
- Bitrate: 28 kbps recommended (20 kbps minimum)
- Bitrate type: VBR (variable bitrate) or CBR (constant bitrate)
- Packet length: 20 ms
- Encoder mode: SILK-only mode
- With Ogg encapsulation, the maximum Ogg container delay must be <= 100 ms
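The settings above can be captured in a small configuration object and checked before encoding. The sketch below is illustrative only: the `OpusSettings` class, its field names, and `check_opus_settings` are hypothetical and do not correspond to any encoder or ASRaaS API. The defaults mirror the recommended values, and the check flags only the limits stated on this page (the supported sampling rates, the 20 kbps minimum, the VBR or CBR bitrate types, and the 100 ms Ogg container delay).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OpusSettings:
    """Hypothetical container for Opus encoder settings.

    Defaults are the recommended values listed above.
    """
    sampling_rate_hz: int = 16000
    complexity: int = 3
    bitrate_kbps: float = 28.0
    bitrate_type: str = "VBR"          # "VBR" or "CBR"
    packet_length_ms: float = 20.0
    encoder_mode: str = "SILK"         # SILK-only mode
    ogg_max_delay_ms: Optional[float] = None  # set only for Ogg encapsulation


def check_opus_settings(settings: OpusSettings) -> List[str]:
    """Return a list of messages for values outside the documented limits."""
    problems = []
    if settings.sampling_rate_hz not in (8000, 16000):
        problems.append("sampling rate is not one of the listed rates (8 kHz, 16 kHz)")
    if settings.bitrate_kbps < 20:
        problems.append("bitrate is below the 20 kbps minimum")
    if settings.bitrate_type not in ("VBR", "CBR"):
        problems.append("bitrate type is neither VBR nor CBR")
    if settings.ogg_max_delay_ms is not None and settings.ogg_max_delay_ms > 100:
        problems.append("Ogg container delay exceeds 100 ms")
    return problems
```

For example, `check_opus_settings(OpusSettings())` returns an empty list, while a configuration with `bitrate_kbps=16` and `ogg_max_delay_ms=250` yields two messages.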
Please note that Opus is a lossy codec, so you should not expect recognition results to be identical to those obtained with PCM audio.