Failure recovery strategy and fault tolerance

Nuance recommends the following as part of a failure-recovery strategy for Management Station:

Nuance implements fault tolerance between Management Station and a managed network at both the host level and the system level. Fault tolerance is the ability of a system or component to continue normal operation despite hardware or software faults.

Host-level fault tolerance

Each managed host has the watcher service installed. The watcher service is an internal service that implements all management functions local to the host, and it acts as the communication gateway between Management Station and the host. (For details, see Configuring watcher and SNMP.)

Management Station monitors managed hosts and services. It consists of three services:

  • Nuance Management Station
  • Nuance Management Station Data Collection
  • Nuance Management Station Stats Analyzer

By default, these services are configured to start automatically when the Management Station host starts up and to restart on failure.

System-level fault tolerance

A Nuance system is typically distributed as a network of Management Station, managed hosts, and application servers. Because a distributed network can contain failure or fault points, Nuance provides fault tolerance at the system level in the following areas:

  • Communication between Management Station and hosts
  • Communication with application servers

Communication between Management Station and hosts

A high-level representation of a Nuance network might look like this:

Management Station monitors managed hosts and generates an alarm if it detects that a host has failed.
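The following is a minimal sketch of one common way to detect failed hosts, not a description of how Management Station does it internally. It assumes a heartbeat model in which each host reports a last-seen timestamp. The function name find_failed_hosts, the host names, and the 30-second timeout are all hypothetical.

```python
def find_failed_hosts(last_seen, now, timeout_s=30.0):
    """Return the sorted names of hosts whose last heartbeat is too old.

    last_seen maps a host name to the time (in seconds) of its most
    recent heartbeat. A host counts as failed when more than timeout_s
    seconds have passed since that heartbeat. (Assumed model, not the
    product's actual detection logic.)
    """
    return sorted(
        host for host, seen_at in last_seen.items()
        if now - seen_at > timeout_s
    )


# Hypothetical heartbeat table: host-a reported just now, the others are stale.
last_seen = {"host-a": 100.0, "host-b": 65.0, "host-c": 40.0}
for host in find_failed_hosts(last_seen, now=100.0):
    print(f"ALARM: no heartbeat from {host}")
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the check deterministic and easy to test.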

Managed hosts have built-in tolerance for Management Station failures. When a configured managed host is brought into service, all of its services are started through Management Station. The managed host then creates a configuration file on its local drive that contains the startup information for every configured service on that host. If the host is restarted (for example, after a power failure) and cannot find a running Management Station, it uses this local configuration file to restart all of its services. Managed hosts can also continue to serve calls until a standby Management Station is brought into service.
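The fallback behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the watcher service's actual implementation: the JSON file format, the function names, and the service names "recognizer" and "tts" are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_local_config(path, services):
    """Store startup information locally (hypothetical JSON format)."""
    Path(path).write_text(json.dumps({"services": services}))


def resolve_startup_services(station_services, cache_path):
    """Decide which services to start and where the list came from.

    station_services is the list supplied by Management Station, or
    None when no running Management Station can be found.
    Returns a (services, source) tuple.
    """
    if station_services is not None:
        # Normal case: start through Management Station and refresh
        # the local copy for future restarts.
        save_local_config(cache_path, station_services)
        return station_services, "management-station"
    cache = Path(cache_path)
    if cache.exists():
        # Fallback: Management Station is unreachable, so reuse the
        # startup information saved on the local drive.
        return json.loads(cache.read_text())["services"], "local-config"
    # The host was never brought into service, so nothing can be started.
    return [], "none"


# Usage: the first start goes through Management Station; a later
# restart with no Management Station falls back to the local file.
cache_file = Path(tempfile.mkdtemp()) / "startup.json"
print(resolve_startup_services(["recognizer", "tts"], cache_file))
print(resolve_startup_services(None, cache_file))
```

Refreshing the local copy on every successful start keeps the fallback list in step with the most recent configuration that Management Station supplied.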

Communication with application servers

Application servers are typically deployed in a server farm, with requests distributed by techniques such as TCP/IP load balancing. A Nuance system functions independently of the load-balancing mechanism and can operate with many of the commercially available load-balancing solutions.