Troubleshooting system latency
This topic covers some of the causes of latency that may occur in your system. Latency is the delay observed by callers who notice a long pause between the end of their speech and the next prompt they hear. Callers are very sensitive to these latencies.
Always test for latencies and verify operations in a full production environment. During initial deployments and full production, you can detect latencies in various ways: test calls to the system, reports from customers (for example, comments to agents after users transfer from the application), and call analysis (for example, places in the application where users hang up unexpectedly, or other unexpected values in the call logs).
Follow the steps below to diagnose the causes of latency. Causes can be general or specific: a heavily loaded system is a general problem; a slow fetch of a large file is specific. Depending on what you already know, you might focus on one part of the system or need to survey all of them. For example, latency can be due to delays in these locations:
- Voice platform or application activities. See Diagnosing platform latency.
- Grammar load activities. See Diagnosing delays during grammar loading.
- Fetch delay: Getting the grammar, often across the network.
- Compile delay: Waiting for large grammars to compile.
- Recognizer processing. See Diagnosing Recognizer latency.
- If you don’t have a specific starting point, see General troubleshooting.
The sections that follow help you look for clues in call logs.
Tip: See Logging for Nuance speech products to learn about call log paths, filenames, events, and tokens.
Diagnosing platform latency
To investigate whether the observed latency is caused by a voice platform, consider these questions:
- Does the browser access a database after the recognition?
- Is the application playing a prompt that begins with silence? Callers perceive the leading silence as a delay. (A measurement sketch follows this list.)
- Are there network delays that slow interaction between the system and the application?
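If you suspect a prompt begins with silence, you can measure the leading silence directly instead of relying on listening alone. The following sketch is not part of the product: it assumes the prompt is a 16-bit PCM WAV file, and the silence threshold is an illustrative value you should tune for your recording levels.

```python
import sys
import wave
import array

def leading_silence_ms(path, threshold=500):
    """Approximate leading silence (ms) in a 16-bit PCM WAV prompt.

    threshold is an assumed amplitude (out of 32767) below which a sample
    counts as silence; tune it for your recording levels.
    """
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch handles only 16-bit PCM audio")
        rate = w.getframerate()
        channels = w.getnchannels()
        samples = array.array("h", w.readframes(w.getnframes()))

    # Walk frame by frame (one sample per channel) until speech-level audio appears.
    for i in range(0, len(samples), channels):
        if any(abs(s) > threshold for s in samples[i:i + channels]):
            return (i // channels) * 1000 / rate
    return (len(samples) // channels) * 1000 / rate  # the whole file is below the threshold

if __name__ == "__main__":
    for prompt in sys.argv[1:]:
        print(f"{prompt}: ~{leading_silence_ms(prompt):.0f} ms of leading silence")
```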
Diagnosing delays during grammar loading
If you suspect the latency occurs during grammar load, investigate the SWIgrld (grammar load) event in the call logs.
Use the URI token to identify the grammar type. A grammar name ending in .xml or .grxml is a source grammar. A name ending in .gram is a binary grammar.
For source grammars, the latency can be caused by fetching or by compiling. (A log-scanning sketch that applies the checks below appears after this list.)
- Check the values of these tokens:
- FETCHES: Number of fetches needed to load the grammar.
- GCCPU: Total CPU milliseconds used for grammar compilation.
- GCTIME: Total clock-time milliseconds used for grammar compilation.
- IFCPU: Total CPU milliseconds to fetch the grammar(s) from inet.
- IFTIME: Total clock-time milliseconds to fetch the grammar(s) from inet.
- IFBYTES: Total bytes fetched (or re-fetched) from inet or the disk cache.
- LDCPU: Total CPU milliseconds used for the API call.
- LDTIME: Total clock-time milliseconds used for the API call.
- If LDTIME (the clock time for the whole grammar load) is significantly greater than the sum of LDCPU and IFCPU (the time used by the CPU for the load), CPU use may have been interrupted.
- If IFTIME is high, it suggests network or server delays while fetching grammars.
- If FETCHES (number of fetches needed to load the grammar) is not 1, the additional fetches can be caused by rulerefs to other grammars or user dictionaries. Each fetch causes latency.
- If GCCPU and GCTIME are significant, it suggests that the grammars are being compiled because they are not in memory cache or disk cache.
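To apply these checks across a whole set of call logs rather than one event at a time, a small script can scan for SWIgrld lines. This is only a sketch: the pipe-delimited TOKEN=value layout and the numeric thresholds are assumptions, not product specifications, so adapt both to your environment before relying on the output.

```python
import re
import sys

# Assumed layout: SWIgrld lines carry pipe-delimited TOKEN=value pairs, for example
#   ... SWIgrld|URI=http://host/app/main.grxml|LDTIME=312|LDCPU=80|IFCPU=5|IFTIME=210|FETCHES=2|GCTIME=95 ...
# Adjust the parsing to your actual call log format.
TOKEN_RE = re.compile(r"(\w+)=([^|\s]*)")

def intval(tok, name, default=0):
    value = tok.get(name, "")
    return int(value) if value.isdigit() else default

def check_grammar_load(line):
    tok = dict(TOKEN_RE.findall(line))
    findings = []
    ldtime, ldcpu, ifcpu = intval(tok, "LDTIME"), intval(tok, "LDCPU"), intval(tok, "IFCPU")
    if ldtime > 2 * (ldcpu + ifcpu) and ldtime > 100:   # illustrative ratio, not a product threshold
        findings.append(f"clock time {ldtime} ms far exceeds CPU time {ldcpu + ifcpu} ms")
    if intval(tok, "IFTIME") > 500:                      # illustrative 500 ms fetch budget
        findings.append(f"slow fetch (IFTIME={tok['IFTIME']} ms)")
    if intval(tok, "FETCHES", 1) > 1:
        findings.append(f"{tok['FETCHES']} fetches (rulerefs or user dictionaries?)")
    if intval(tok, "GCTIME") > 0:
        findings.append(f"compiled at load time (GCTIME={tok['GCTIME']} ms)")
    return tok.get("URI", "?"), findings

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                if "SWIgrld" in line:
                    uri, findings = check_grammar_load(line)
                    for msg in findings:
                        print(f"{path}: {uri}: {msg}")
```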
For binary grammars, latency can be caused by fetching. A grammar must be loaded into the memory cache before recognition can occur. If a grammar is not in memory, it is loaded from the disk cache. If it is also not in the disk cache, it is fetched, cached to disk, and then loaded into memory.
Check the values of these tokens (a cache hit-rate sketch follows this list):
- IFCPU and IFTIME. If IFCPU is low and IFTIME is high, this indicates network or server delays in fetching a grammar across the internet. Long fetches translate directly into latency: the utterance cannot be analyzed until the grammar is in memory.
- IFCPU: Total CPU milliseconds to fetch the grammar(s) from inet.
- IFTIME: Total clock-time milliseconds to fetch the grammar(s) from inet.
- MEMMISS and MEMHITS allow you to see how often grammars are found in memory:
- MEMMISS: Memory cache misses for this load. (The number of loaded grammars that were not already available in the memory cache.)
- MEMHITS: Memory cache hits for this load. (The number of loaded grammars that were already in the memory cache.)
- DISKHITS and DISKMISS allow you to see how often grammars are found in the disk cache:
- DISKMISS: Disk cache misses for this load. (The number of loaded grammars that were not already available in the disk cache.)
- DISKHITS: Disk cache hits for this load. (The number of loaded grammars that were already in the disk cache.)
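To see how often grammars are served from the memory and disk caches across many loads, you can total these tokens over a set of logs. As in the previous sketch, the pipe-delimited TOKEN=value layout is an assumption about your log format; adjust the parsing before drawing conclusions. A low memory hit rate points toward the preload and cache-lock suggestions below.

```python
import re
import sys
from collections import Counter

TOKEN_RE = re.compile(r"(\w+)=(\d+)")
CACHE_TOKENS = ("MEMHITS", "MEMMISS", "DISKHITS", "DISKMISS")

def rate(hits, misses):
    total = hits + misses
    return f"{hits}/{total} ({100.0 * hits / total:.1f}%)" if total else "no loads recorded"

totals = Counter()
for path in sys.argv[1:]:
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if "SWIgrld" not in line:
                continue
            # Sum the cache counters over every grammar load event in the logs.
            for name, value in TOKEN_RE.findall(line):
                if name in CACHE_TOKENS:
                    totals[name] += int(value)

print("memory cache hit rate:", rate(totals["MEMHITS"], totals["MEMMISS"]))
print("disk cache hit rate:  ", rate(totals["DISKHITS"], totals["DISKMISS"]))
```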
Here are some suggestions for reducing or avoiding latency when loading grammars:
- Avoid time delays and multiple file fetches per grammar, and reduce CPU load by precompiling grammars: use the sgc compiler utility with the correct optimization level. (See Compiling grammars.)
- Reduce fetch time by pre-loading grammars: use a “preload file” for all your common and/or large grammars. See swirec_preload_file.
- Reduce fetch time by locking grammars into the disk cache: set swirec_disk_cache_lock.
- Avoid fetches by locking grammars into memory: set swirec_memory_cache_lock.
- Avoid fetches by giving grammars long expiration dates on the web server. (A header-check sketch follows this list.)
- Avoid fetches by increasing memory and disk cache sizes. Set these parameters:
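For the expiration-date suggestion above, you can confirm what caching policy the web server actually sends for a grammar before changing anything else. This sketch issues a HEAD request with Python's standard urllib and prints the usual HTTP caching headers; the grammar URLs are whatever you pass on the command line, not values from the product.

```python
import sys
import urllib.request

def caching_headers(url):
    """Report the standard HTTP caching headers the server sends for a grammar URL."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return {name: response.headers.get(name, "(not set)")
                for name in ("Cache-Control", "Expires", "ETag", "Last-Modified")}

if __name__ == "__main__":
    for url in sys.argv[1:]:          # pass your grammar URLs as arguments
        print(url)
        for name, value in caching_headers(url).items():
            print(f"  {name}: {value}")
```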
Diagnosing Recognizer latency
Use the call logs to diagnose latency during recognition:
- Investigate the SWIrcst event. Review the list of active grammars (GURIn), and ensure that all of them are needed for the recognition. Otherwise, Recognizer needlessly processes those grammars and attempts to match utterances against them.
- Investigate the SWIrcnd event. Make note of the following tokens (a sketch that scripts these checks follows this list):
- DURS: Duration of the speech signal.
- EOST: Clock time from first speech packet received until end of speech declared.
- EORT: Clock time from first speech packet received to end of recognition (when the results are ready).
- EOSS: Milliseconds into the audio stream where end-of-speech occurs.
- RCPU: Amount of CPU used for the recognition.
- If DURS is far apart from EOST and EORT, there is a CPU problem, such as non-Nuance software running on the machine. (Check the CPU usage of virus scanners and automatic software updaters.) This might also mean that the CPU is not sufficient for the number of simultaneous recognitions.
- If EOST and EORT are far apart, the problem is usually insufficient CPU or a very complex recognition task.
Rarely, this can reflect a delay in the audio sent to Recognizer.
- If RCPU is high, the problem is probably a complex grammar, noisy speech, or an utterance that is covered by more than one grammar.
- If EOSS and EOST are far apart, it can signal a CPU problem.
- Enable the swirec_save_comp_stats parameter, which writes detailed statistics of Recognizer processing to the call logs. Collect the additional data and deliver the logs to Nuance technical support.
- Investigate large gaps in timestamps between events. Tracing MRCP packets can help identify issues.
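The SWIrcnd token checks above can also be scripted over a batch of call logs. As with the grammar-load sketch, the pipe-delimited TOKEN=value layout and the gap thresholds are assumptions for illustration only; adjust both to your environment.

```python
import re
import sys

TOKEN_RE = re.compile(r"(\w+)=(-?\d+)")

def review_recognition(line, gap_ms=500, cpu_ms=1000):
    """Flag suspicious timing tokens in one SWIrcnd line (thresholds are illustrative)."""
    tok = {name: int(value) for name, value in TOKEN_RE.findall(line)}
    durs, eost, eort = tok.get("DURS", 0), tok.get("EOST", 0), tok.get("EORT", 0)
    eoss, rcpu = tok.get("EOSS", 0), tok.get("RCPU", 0)

    findings = []
    if eort - eost > gap_ms:
        findings.append(f"results arrived {eort - eost} ms after end of speech")
    if eost - durs > 2 * gap_ms:
        findings.append(f"EOST exceeds DURS by {eost - durs} ms (possible CPU problem)")
    if eost - eoss > 2 * gap_ms:
        findings.append(f"EOST exceeds EOSS by {eost - eoss} ms (possible CPU problem)")
    if rcpu > cpu_ms:
        findings.append(f"high recognition CPU (RCPU={rcpu} ms)")
    return findings

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for number, line in enumerate(f, 1):
                if "SWIrcnd" in line:
                    for msg in review_recognition(line):
                        print(f"{path}:{number}: {msg}")
```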
Here are some suggestions for reducing or avoiding latency during recognition:
- Remove unnecessary grammars from the context.
- Reduce the complexity of the grammars.
- Reduce the expected length of utterances (change the prompt to elicit a shorter response from the user).
- Reduce CPU usage on the machine.
- Look for network delays.
General troubleshooting
A strong indicator of latency is when multiple callers hang up unexpectedly at the same place in an application. If you are getting general reports of slow recognition but don't know which grammar or recognition context is performing slowly, try the following:
- If it’s a repeatable test case, turn on perfmon (Windows) or sar or top (Linux; sar is not included in a typical Linux installation) to track CPU use.
- To narrow in on a problem recognition context, look at the call logs for the application from the time period in which latency was reported.
Once you have selected call logs, look for a recognition that took a long time. (Look for large recognition times between SWIrcst and SWIrcnd.) If you find several recognitions for the same context for various calls that seem to take longer from SWIrcst to SWIrcnd than anticipated, zero in on that grammar by following the suggestions in Diagnosing Recognizer latency.
Likewise, look for large grammar load times, and follow the suggestions in Diagnosing delays during grammar loading.
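To find the large SWIrcst-to-SWIrcnd gaps described above across many calls, you can pair the two events and group the gaps by the active grammars. This sketch assumes each log line carries TIME=<milliseconds> and CHAN=<channel> tokens and that the two events for one recognition appear in order on the same channel; all of that is an assumption about your log layout, so adapt the parsing before trusting the numbers.

```python
import re
import sys
from collections import defaultdict

TOKEN_RE = re.compile(r"(\w+)=([^|\s]*)")

def tokens(line):
    return dict(TOKEN_RE.findall(line))

def recognition_gaps(paths):
    pending = {}                 # channel -> (start time in ms, active grammars)
    gaps = defaultdict(list)     # active grammars -> list of gaps in ms
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                tok = tokens(line)
                chan, time = tok.get("CHAN"), tok.get("TIME", "")
                if not chan or not time.isdigit():
                    continue
                if "SWIrcst" in line:
                    grammars = " ".join(v for k, v in sorted(tok.items()) if k.startswith("GURI"))
                    pending[chan] = (int(time), grammars or "(unknown grammars)")
                elif "SWIrcnd" in line and chan in pending:
                    start, grammars = pending.pop(chan)
                    gaps[grammars].append(int(time) - start)
    return gaps

if __name__ == "__main__":
    results = recognition_gaps(sys.argv[1:])
    for grammars, values in sorted(results.items(), key=lambda item: -max(item[1])):
        print(f"max {max(values)} ms, mean {sum(values) / len(values):.0f} ms "
              f"over {len(values)} recognitions: {grammars}")
```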
Tuning scenarios
The following topics cover typical tuning scenarios, and provide some suggestions of what to analyze.
VoiceXML receives a NOINPUT return.
- Check telephony hardware settings to ensure the signal coming in has a decent level.
- Check endpointer sensitivity. If it is set incorrectly, start-of-speech may not be identified.
VoiceXML receives frequent NOMATCH returns.
- This is most likely a grammar problem.
Matches are made, but confidence is often below threshold.
- Potential hardware issue or noise on the line. Turn on waveform saving and listen to confirm that the captured audio is good. Transcribe the utterances to check the grammars. (A quick level-check sketch follows this scenario.)
- Check configuration of Recognizer.
- More careful analysis may be required.
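Listening remains the best check, but a quick script can triage a large batch of saved waveforms for obviously bad levels. This sketch assumes the saved utterances are (or have been converted to) 16-bit PCM WAV files, and the "check this one" thresholds are illustrative; it is a rough screen, not a substitute for listening or transcription.

```python
import sys
import wave
import array

def audio_stats(path):
    """Peak level and clipped-sample percentage for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch handles only 16-bit PCM audio")
        samples = array.array("h", w.readframes(w.getnframes()))
    peak = max((abs(s) for s in samples), default=0)
    clipped = sum(1 for s in samples if abs(s) >= 32767)
    return peak, 100.0 * clipped / max(len(samples), 1)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        peak, clip_pct = audio_stats(path)
        # Illustrative thresholds: very quiet (peak < 2000) or heavily clipped (> 1%) audio.
        flag = "  <-- check this one" if peak < 2000 or clip_pct > 1.0 else ""
        print(f"{path}: peak {peak}/32767, {clip_pct:.2f}% clipped{flag}")
```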
Problem: The content of saved audio files does not match your expectations.
Description: When listening to waveforms captured during a call, you expect the first waveforms to match the first utterances spoken. If, instead, the first waveforms appear to originate later in the call, the problem might be caused by an indexing problem when writing the files. If the system runs out of index digits, it restarts counters and begins overwriting the first files written during the session.
Solution: See these parameters:
Match is made, but confidence is below threshold.
- Check the grammars. Ensure the grammars are active. Check their coverage: they must include the speech spoken but not be overly broad. Ensure they are not overly confusable. Bring in a speech scientist.
- Listen to the utterance. Eliminate hardware as the problem. Confirm that the captured audio is good.
- Check the call logs. Check tokens related to endpointing and call duration.
- Adapt the grammar or change the confidence threshold.
Dictionary pronunciations need tuning.
- Use dicttest to see what pronunciation is being used.
The problem may be global or context-specific. If the problem is context-specific:
- Is the context complex?
- Ensure there are no external issues in the form of database lookups or application computation.
- Is the recognition just too hard? Look at RCPU in the SWIrcnd event. If it is high (relative to others), study the grammar to see why it uses so much CPU.
- Check whether the grammar is being compiled (see the SWIgrld event in the call log), or whether it is being pushed out of the cache.
- Semantic interpretation (ECMAScript or SSM): Use RCPU and involve a speech scientist as required.
If the problem is global:
- Look at the application server. Look at networking. Ensure no latencies.
- Enable swirec_save_comp_stats, collect data, and deliver the resulting call logs to Nuance technical support.
- Check CPU usage for Speech Server and individual recognition servers. If it is high, figure out why.
- Determine whether grammars are being compiled frequently (see the SWIgrld event in the call log), or whether they are being pushed out of the cache. (A counting sketch follows.)
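To check the last point over a set of logs, count how many SWIgrld events show compile time or a disk-cache miss. As in the earlier sketches, the pipe-delimited TOKEN=value layout is an assumption about your logs, so adjust the parsing to match your actual format.

```python
import re
import sys

TOKEN_RE = re.compile(r"(\w+)=(\d+)")

loads = compiles = disk_misses = 0
for path in sys.argv[1:]:
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if "SWIgrld" not in line:
                continue
            tok = {name: int(value) for name, value in TOKEN_RE.findall(line)}
            loads += 1
            compiles += 1 if tok.get("GCTIME", 0) > 0 else 0
            disk_misses += 1 if tok.get("DISKMISS", 0) > 0 else 0

print(f"{loads} grammar loads: {compiles} involved compilation, "
      f"{disk_misses} missed the disk cache")
```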