Hardware and memory for self-hosted environments

Requirements for processors and memory in self-hosted runtime environments.

Nuance recommends more powerful machines for large deployments with high port densities and heavy call volumes.
Nuance recommends explicitly configuring the date and time of all host machines in the network. Using a common date and time on all hosts simplifies licensing, troubleshooting, and collecting and comparing logs.

Basic environment

To compute sizing estimates, Nuance uses a basic Kubernetes environment with Docker and Helm charts in a local Kubernetes cluster. The tests cover a range of node configurations (1 master; 1 master and 1 worker; 1 master and 2 workers) and CPU counts (4, 8, and 12). You can assume similar estimates for WAR deployments.

VoiceXML Connector instances are tested to handle 127 requests per second concurrently with default settings and a resource limit of 4 CPU and 4 GB storage. The service uses standard OpenJDK 11 settings for heap usage by default. You can override the values by modifying the java_options setting. For example:

java_options: -XX:MaxRAMPercentage=50 -XX:G1ReservePercent=20 -XX:ConcGCThreads=2

Requests per second refers to the number of requests sent from the client application to VoiceXML Connector. Nuance uses a test application that generates 4 requests per call: start, execute, execute, and stop. (Start requests use slightly more resources than other requests.) To estimate total requests for a real-world application, assume one request for each dialog component executed in the Mix application, plus the start and stop requests for each call.
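The estimate described above can be sketched in a few lines of Python. This is an illustration only, not a Nuance tool: the function names are hypothetical, and the conversion to requests per second assumes a steady rate of calls in which every call executes the same number of dialog components.

```python
def requests_per_call(dialog_components: int) -> int:
    """One request per dialog component executed, plus the start and stop requests."""
    if dialog_components < 0:
        raise ValueError("dialog_components must not be negative")
    return dialog_components + 2


def requests_per_second(dialog_components: int, calls_per_second: float) -> float:
    """Assumption: calls arrive at a steady rate and all have the same shape."""
    return requests_per_call(dialog_components) * calls_per_second


# The test application executes 2 dialog components per call:
# start, execute, execute, stop.
print(requests_per_call(2))  # prints 4
```

For example, an application that executes 8 dialog components per call at 5 calls per second would generate roughly (8 + 2) x 5 = 50 requests per second under these assumptions.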

Nuance uses a sample test application that maximizes the number of requests and minimizes call durations. It does this by emphasizing VoiceXML Connector requests to the dialog service and minimizing the duration of prompts, input audio, and speech recognition. The request density reaches 150 requests/second for durations up to 60 minutes.

Memory

In general, VoiceXML Connector does not require large amounts of memory.

Recommended minimum: 2 GB per instance.

CPU usage

CPU usage increases linearly on each node with the rate of requests. The slope increases with the number of nodes.

After determining a satisfactory processing capability for VoiceXML Connector, you can estimate the needed CPUs in direct proportion to any number of additional nodes.

Guideline:

To determine the total number of CPUs needed in a cluster for a given workload, calculate the minimal CPU processing needed, and round up. (Typically, you round up to an even number.)

Formula:

total minimal CPU = number of nodes * ((0.02 * requests/second) + 0.25)

Example:

To process 200 requests/second with 1 node, you need 4.25 CPUs (in practice at least 5 whole CPUs, typically rounded up to 6 or 8):

total minimal CPU = 1 * ((0.02 * 200) + 0.25) = 1 * (4.0 + 0.25) = 1 * 4.25 = 4.25 CPUs
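As a rough aid, the formula and the rounding guideline can be expressed as a short Python sketch. The function names are hypothetical, and the constants 0.02 and 0.25 are taken directly from the formula above. Rounding to the next even number reflects the "typically" in the guideline and is not a strict requirement.

```python
import math


def minimal_cpus(requests_per_second: float, nodes: int = 1) -> float:
    """total minimal CPU = number of nodes * ((0.02 * requests/second) + 0.25)"""
    return nodes * ((0.02 * requests_per_second) + 0.25)


def rounded_cpus(minimal: float, to_even: bool = True) -> int:
    """Round up to a whole CPU, then optionally to the next even number."""
    # round() guards against floating-point noise before taking the ceiling
    whole = math.ceil(round(minimal, 9))
    if to_even and whole % 2 == 1:
        whole += 1
    return whole


needed = minimal_cpus(200, nodes=1)
print(needed)                               # prints 4.25
print(rounded_cpus(needed, to_even=False))  # prints 5
print(rounded_cpus(needed))                 # prints 6
```

Treat the result as a starting point for sizing; the guideline also allows rounding to a larger value such as 8.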