Minimizing latency

This topic provides suggestions to help optimize performance and minimize latency.

Optimal audio buffer size

The audio buffer size is an important factor in minimizing latency (time to first audio) and avoiding underruns. Larger buffers are more efficient for CPU use, but they increase latency and the risk of underruns. A good starting point is a buffer big enough for half a second of audio, rounded up to the next multiple of 1024 bytes (1 KB):

  • 4096 bytes for an 8 kHz sampling rate voice for µ-law or A-law audio output
  • 8192 bytes for an 8 kHz sampling rate voice for linear 16-bit PCM audio output
  • 22528 bytes for a 22 kHz sampling rate voice for linear 16-bit PCM audio output
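
The arithmetic behind these values can be sketched as follows. This is only an illustration, not part of any product API: the function name and its parameters are hypothetical, and the 22 kHz case assumes a 22050 Hz sampling rate with 2 bytes per sample.

```python
import math

def recommended_buffer_size(sample_rate_hz, bytes_per_sample, seconds=0.5, multiple=1024):
    """Bytes needed for `seconds` of audio, rounded up to a multiple of `multiple`."""
    raw_bytes = sample_rate_hz * bytes_per_sample * seconds
    return math.ceil(raw_bytes / multiple) * multiple

print(recommended_buffer_size(8000, 1))    # 8 kHz, µ-law or A-law (1 byte per sample): 4096
print(recommended_buffer_size(8000, 2))    # 8 kHz, linear 16-bit PCM: 8192
print(recommended_buffer_size(22050, 2))   # 22 kHz (assumed 22050 Hz), linear 16-bit PCM: 22528
```

Treat the result as a starting point and tune it by measuring time to first audio and underruns in the target deployment.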

For Speech Server based applications, the audio buffer size used for synthesis (8192 bytes by default) is controlled by a configuration parameter that can be set on the Speech Server in Management Station. The RTP packet size is generally smaller and is determined by the negotiated RTP stream parameters.

Limiting delays when internet fetching is used

When content such as input texts, user dictionaries, rulesets, and ActivePrompt databases is located on a web server, fetching it for the first time can cause delays. The internet fetch library uses a configurable cache, so the download time is minimal when three conditions hold: the cache is configured well (big enough, with a reasonable cache entry expiration time), the web server is configured to support caching of all the data (it specifies HTTP/1.1 caching parameters such as max-age), and the cache has been warmed up.
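
One way to check that a web server marks its responses as cacheable is to inspect the Cache-Control header it returns. The sketch below is a hypothetical helper, not part of the internet fetch library, that extracts the max-age value from such a header; how the fetch library itself interprets caching headers is not shown here.

```python
def max_age_seconds(cache_control):
    """Return the max-age value (in seconds) from a Cache-Control header value, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            try:
                return int(value.strip('"'))
            except ValueError:
                return None   # malformed value: treat as not cacheable
    return None               # no max-age directive present

# Example header values as a web server might send them.
print(max_age_seconds("public, max-age=3600"))
print(max_age_seconds("no-store"))
```

A response without max-age (or another freshness indicator) may be refetched on every request, which defeats the purpose of warming up the cache.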

To warm up the cache, the application can perform a number of dummy speak requests. For input texts, the content is already cached before the first audio packet is delivered, so during the warm-up the application can stop each synthesis request after the first audio packet to save time.
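
The warm-up loop can be sketched as follows. The `synthesize` callable is a hypothetical stand-in for whatever speak request the application uses; it is assumed to return an iterator of audio packets that can be closed to stop synthesis. The real API calls will differ.

```python
def warm_up_cache(synthesize, items):
    """Issue one dummy speak request per item, stopping after the first audio packet."""
    warmed = 0
    for item in items:
        stream = synthesize(item)        # hypothetical: iterator of audio packets
        try:
            for _packet in stream:
                warmed += 1              # first packet arrived, so the input is cached
                break
        finally:
            close = getattr(stream, "close", None)
            if close is not None:
                close()                  # stop the synthesis request early
    return warmed

def fake_synthesize(uri):
    """Stand-in for a real speak request; yields three dummy audio packets."""
    for _ in range(3):
        yield b"\x00" * 4

print(warm_up_cache(fake_synthesize, ["http://example.com/a.txt", "http://example.com/b.txt"]))
```

The returned count lets the application confirm that every dummy request produced at least one audio packet.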

Audio content specified via the SSML <audio> tag is always fetched on message (normally a sentence) boundaries, but not necessarily before the first audio packet is delivered. User dictionaries, rulesets, and ActivePrompt databases can be loaded and then unloaded to obtain a copy in the cache without consuming RAM. If RAM usage is not a problem, load them as soon as possible and leave them loaded.