Grammar performance

These guidelines for detecting and avoiding resource usage problems in your grammars serve as best practices that promote efficiency for voice platforms and applications.

The foremost objective for grammar development is to design for optimal recognition accuracy. The next goal is to write for clarity, maintainability, and extensibility. The third goal is to create efficient recognition contexts.

You can evaluate the first two goals to some extent by using the testing tools described in Testing grammars. The following topics address grammar performance as it applies to the CPU and storage space resources that your grammar uses: the factors that contribute to resource use, and strategies you can use to make your grammars as resource-efficient as possible.

How grammars affect resource usage

Below are grammar characteristics that affect resource usage:

  • Coverage: The grammar covers (includes) the phrases you expect the caller to use. Under-coverage leads to an increase in out-of-vocabulary utterances, confirmations, and retries, which all increase CPU usage and call duration.
  • Over-generation: It is important that the grammar not over-generate by allowing nonsensical phrases, as this reduces accuracy. For example, a grammar that recognizes a city and state needs to constrain utterances to valid combinations of cities and states.
  • Multiple parses: See Multiple parses.
  • Keys passed to the application: You must ensure that key/value pairs are set correctly.
  • SWI_meaning key: This key can improve efficiency by combining redundant answers into a single entry on the n-best list. See SWI_meaning.
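
To make the over-generation item above concrete, the following minimal Python sketch compares a loose grammar that accepts any city with any state against one restricted to valid pairs. The city and state data and the function names are made up for illustration; none of this is part of a Recognizer API.

```python
from itertools import product

# Hypothetical sample data for illustration only.
VALID_PAIRS = {
    ("Boston", "Massachusetts"),
    ("Springfield", "Massachusetts"),
    ("Springfield", "Illinois"),
    ("Chicago", "Illinois"),
}

def unconstrained_phrases(pairs):
    """All city/state combinations that an over-generating grammar would accept."""
    cities = sorted({city for city, _ in pairs})
    states = sorted({state for _, state in pairs})
    return set(product(cities, states))

def nonsensical_phrases(pairs):
    """Combinations the loose grammar accepts that are not valid pairs."""
    return unconstrained_phrases(pairs) - set(pairs)

if __name__ == "__main__":
    loose = unconstrained_phrases(VALID_PAIRS)
    extra = nonsensical_phrases(VALID_PAIRS)
    print(f"valid: {len(VALID_PAIRS)}, loose: {len(loose)}, nonsensical: {len(extra)}")
```

Even with this tiny sample, the loose grammar accepts phrases that can never be correct. With real city and state lists the gap grows quickly, which is why constraining the grammar to valid combinations matters.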

Performance considerations

The following list summarizes strategies for conserving grammar resources:

  • Fetching grammars from web servers: When you load grammars, Recognizer fetches them from a web server and caches them on the local machine. It’s important to configure Cache-Control headers on the web server to inform Recognizer about timers such as expiration and maximum age. Otherwise, performance degrades if Recognizer re-fetches files repeatedly and unnecessarily.

    If a web server does not provide expiration information, Recognizer calculates a default expiration (using the Last-Modified timestamp). This default might not be optimal for your system: the system might re-fetch data more often than needed.

    To avoid performance problems, configure all web servers to specify the cache policy for your grammar file types:

    • Ensure the web server specifies the proper HTTP/1.1 Cache-Control headers. The max-age value is expressed in seconds. For example, "Cache-Control: max-age=86400" allows the system to cache fetched data for 24 hours (86400 seconds).
    • Choose the cache duration carefully based on each application’s requirements. Be especially careful with dynamically generated grammars: make the duration long enough so that fetches are infrequent, but short enough so that the system acquires updated grammars within a reasonable time frame.
  • Caching grammars: After fetching grammars to the local machine, Recognizer compiles them, writes them to the local disk, and keeps them in memory for a period of time before replacing them. All these activities are influenced by the caching configuration. See Understanding grammar caching.

    Your strategy for managing grammars must include decisions about the costs and benefits of re-fetching, storing in the disk cache, and storing in memory. Keeping grammars in memory is a good strategy when the cost of loading is high (such as when a grammar is large, must compile at runtime, or must be reloaded frequently).

    To troubleshoot a caching problem, examine the HTTP/1.1 responses to the grammar fetch requests and examine the expiration settings. By tuning the web server’s Cache-Control headers, you can solve most performance issues.

  • Load source grammars: You can load grammars in their SRGS source form and allow Recognizer to compile them at runtime. This strategy is useful for small grammars and grammars generated at runtime (such as grammars that must be customized for each caller). The drawback is the cost of CPU cycles for compiling each time the grammar is loaded.
  • Load binary grammars: You can precompile grammars (see Compiling grammars). This strategy is good for large grammars that cause latency problems if compiled at runtime. The drawback is that these grammars are static and cannot change their coverage at runtime. You can also precompile user dictionaries to improve performance; see Compiling a user dictionary.
  • Combine source and binary grammars: You can combine precompiled and source grammars using dynamic-link grammars (see Dynamic-link grammars). This strategy combines the previous techniques to mitigate their drawbacks, but it requires more planning and maintenance because your grammar libraries must be designed as modules.
  • Preload grammars: You can load grammars when your application starts. This strategy incurs the costs of fetching, compiling and loading before the application accepts telephone calls (otherwise, the first callers to the application would experience any delays associated with those costs). This technique is only useful for static grammars, since grammars that change at runtime must be recompiled each time they are used. See swirec_preload_file.
  • Trade recognition time for faster compilation: When your application requires large lists of items, you can create a wordlist grammar that reduces compilation time at the cost of increased CPU usage during runtime recognition. This strategy is useful when the list changes frequently, or when it is not used frequently enough to keep in memory. For details, see Wordlist (directory-assistance) grammars.
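
When troubleshooting the Cache-Control settings described in the fetching and caching items above, it can help to check what a header value means in practice. The following is a minimal Python sketch, not a Recognizer tool: the function names are hypothetical, and it assumes you have already copied the Cache-Control value from a grammar fetch response.

```python
def max_age_seconds(cache_control):
    """Return the max-age value in seconds, or None if the directive is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            try:
                return int(value.strip('"'))
            except ValueError:
                # Directive present but malformed; treat as missing.
                return None
    return None

def describe_policy(cache_control):
    """Summarize the cache lifetime that a Cache-Control value grants."""
    age = max_age_seconds(cache_control)
    if age is None:
        return "no max-age: expiration falls back to default behavior"
    return f"cacheable for {age} seconds ({age / 3600:.2f} hours)"

if __name__ == "__main__":
    print(describe_policy("public, max-age=3600"))
    print(describe_policy("no-cache"))
```

Converting the value to hours makes it easier to spot a duration that is far shorter or longer than the application needs.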

Latency issues

Latency is the time that elapses from the moment the caller stops speaking (including the configured end-of-speech timeout) until a recognition result is returned to the application. When latency is too high, the caller’s experience degrades: the system appears sluggish, which frustrates callers and leads to further user interface complications.

In extreme circumstances, excess latency causes unsuccessful application transactions if callers hang up without accomplishing the goal of their calls. Poor recognition response times can have many contributing factors:

  • Use of very large grammars containing hundreds of thousands of items.
  • Extremely long average utterance lengths.
  • High amounts of ECMAScript processing within the grammar.
  • Insufficient system memory, resulting in excessive paging and swapping.
  • Extra time to compile grammars that are not precompiled.
  • Extra time when application servers dynamically generate grammars.
  • Network delays when fetching grammars.
  • Processes on the host machine that are not part of the recognition service.

The first step to finding the source of latency is to measure the response time of your recognition contexts, as discussed below.
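
As a rough illustration of such a measurement, the following Python sketch summarizes latency from pairs of timestamps. It assumes you can obtain, for each recognition, the time the caller stopped speaking and the time the result was returned (for example, from your own application logs). The function names and the input format are hypothetical and not part of any Recognizer API.

```python
import math
from statistics import mean

def latencies_ms(events):
    """events: iterable of (speech_stopped_ms, result_returned_ms) pairs."""
    return [returned - stopped for stopped, returned in events]

def percentile(values, pct):
    """Nearest-rank percentile of values; pct is in the range (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(events):
    """Return mean, 95th-percentile, and maximum latency in milliseconds."""
    values = latencies_ms(events)
    return {
        "mean_ms": mean(values),
        "p95_ms": percentile(values, 95),
        "max_ms": max(values),
    }

if __name__ == "__main__":
    sample = [(1000, 1200), (5000, 5350), (9000, 9100), (12000, 12900)]
    print(summarize(sample))
```

Looking at a high percentile alongside the mean helps reveal occasional slow recognitions that an average alone would hide.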

Managing performance

The following subsections describe factors in managing performance.

Self-learning feature (acoustic adaptation)

As an additional performance-enhancing feature, Recognizer automatically improves recognition accuracy over time by using high-confidence results to tune the underlying recognition models. This feature uses negligible CPU and memory resources except for a daily update, which typically occurs during low-usage times.

The benefits of self-learning depend on the language being recognized. For languages where little or no benefit is expected, Nuance suppresses the feature by default. Furthermore, adaptation is intended for deployed systems and not recommended when developing and testing grammars.

Do not use the self-learning feature for voice enrollment grammars or any highly unconstrained grammars (such as a “phoneme loop” grammar). To suppress the feature, use a <meta> element in the grammar header to set the swirec_acoustic_adapt_suppress_adaptation parameter to "1", as follows:

<meta name="swirec_acoustic_adapt_suppress_adaptation" content="1"/>

To suppress acoustic adaptation only when the CPU load is high, use the swirec_load_adjusted_cpu_ranges parameter to define the idle, normal, high, and pegged levels of CPU activity. Then use the swirec_acoustic_adapt_suppress_adaptation parameter to specify the levels at which acoustic adaptation is to be suppressed.

See swirec_load_adjusted_cpu_ranges and swirec_acoustic_adapt_suppress_adaptation.