Application call logs

At runtime, Nuance Vocalizer writes call-log information that reports TTS engine statistics for application tuning and capacity planning. When you use Vocalizer with Speech Server, all Vocalizer events appear in the Speech Server call logs. This is the preferred way to access Vocalizer call-log information in a Speech Server environment, because it provides a single view of the system with information from multiple Nuance products. When you use the Vocalizer API, applications can register a callback to receive event notifications, which lets the application write events to an application-specific location or use them for online monitoring and reporting.

Application-provided logging callbacks

Speech Server writes certain Vocalizer-specific application callbacks to the Speech Server call log, and uploads them to the Management Station.

Vocalizer-written call-log files

Vocalizer-written call-log files are disabled by default. Do not use them in Speech Server environments. These files are similar to Vocalizer error and diagnostic logs: there are two call-log files at most, and there is no hierarchical directory tree of separate call logs.

Call log file format

Call logs are encoded as UTF-8. For storage and transport purposes (for example, storing in CVS and transporting via FTP), treat these files as binary.

Input text log files

Vocalizer supports logging the full input text for speak requests, which is important for analyzing the text spoken by Vocalizer and reviewing it for TTS tuning purposes. In Speech Server-written log files, this text appears inline within the NVOCinpt event for every speak request. For Vocalizer-written log files, Vocalizer logs the input text to a separate XML-format input text log file, then logs an NVOCinpt event to the main call log to correlate that text with the speak request. See the description of NVOCinpt—input text.

You can use Vocalizer configuration parameters to limit the number of simultaneous speak requests where the input text is being logged or to completely disable input text capture. When the secure_context parameter is set, input text capture is automatically disabled for the corresponding speak requests.

To configure input text logging, set the following parameters:

  • event_log_input_text_max_capture to disable input text capture completely, or to throttle it to a limited number of simultaneous speak requests, which limits the performance impact and the logged data volume.
  • event_log_input_text_file_base_name to specify the XML file name used for logging input texts.
  • event_log_input_text_file_max_size to specify the maximum size for the input text log file.

The input text log file is a UTF-8 encoded XML file, where each input text is written using an <entry> element with a unique "id" attribute that is generated using the current date, timestamp to the millisecond, and session ID, similar to how waveform capture file names are generated. By generating the "id" attribute in this manner, the ID will be unique for at least that single system, and if the session IDs specified by the application are unique, then the ID will be unique across the entire deployment. The content of the <entry> element is the input text.
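
As an illustration of reading such a file, the following Python sketch collects each <entry> into a dictionary keyed by its "id" attribute. The root element name, the exact layout of the id values, and the assumption that SSML input is stored as escaped text inside <entry> are all hypothetical; only the <entry> element and its "id" attribute come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file. The <inputtexts> root and the id layout are
# assumptions made for this sketch, not the documented format.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<inputtexts>
  <entry id="20240102-030405123-sess01">Hello world.</entry>
  <entry id="20240102-030406001-sess01">&lt;speak&gt;Hi&lt;/speak&gt;</entry>
</inputtexts>
"""


def load_input_texts(xml_bytes):
    """Return a dict mapping each <entry> id to its logged input text."""
    root = ET.fromstring(xml_bytes)
    # Assumes the input text is stored as character data (escaped if SSML).
    return {entry.get("id"): (entry.text or "") for entry in root.iter("entry")}


texts = load_input_texts(SAMPLE.encode("utf-8"))
for entry_id, text in texts.items():
    print(entry_id, "->", text)
```

A dictionary keyed by id makes it straightforward to look up the text for a given speak request once the reference reported in the call log is known.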

The logged input text is the original plain text or SSML input, without any modifications except for transcoding to UTF-8 for logging purposes.

Note: For SSML with an encoding specified in the XML declaration, that encoding is not updated to indicate the input log file’s encoding of UTF-8. Before playing back that SSML for analysis, make sure you update the encoding attribute to specify encoding="utf-8" (or simply remove the encoding attribute).
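
One possible way to apply this adjustment before replaying logged SSML is sketched below in Python. It simply removes the encoding attribute from a leading XML declaration. The function name is hypothetical, and the sketch assumes the logged text has already been read as a decoded string.

```python
import re

# Matches the encoding attribute inside a leading XML declaration only.
_DECL_ENCODING = re.compile(
    r'^(\s*<\?xml\b[^?]*?)\s+encoding\s*=\s*(["\'])[^"\']*\2'
)


def drop_declared_encoding(ssml):
    """Remove encoding="..." from the XML declaration, if one is present."""
    return _DECL_ENCODING.sub(r"\1", ssml, count=1)


logged = '<?xml version="1.0" encoding="ISO-8859-1"?><speak>Bonjour</speak>'
print(drop_declared_encoding(logged))
```

Text without an XML declaration, such as plain text input, passes through unchanged.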

The NVOCinpt events that cross-reference these entries report the MIME content type for the input text (MIME token), a reference to the input text (TXID token, empty if input text capture logging is disabled in the configuration file or for a secure context), and the text input size in bytes (TXSZ token). See the description of NVOCinpt—input text.

Call logs—merging distributed logs

A voice platform might have several different call-log streams depending on which products are being used. For example, each of the following components can write a call log:

  • VoiceXML browser
  • Nuance Dialog Modules
  • Speech recognition service (for example, Nuance Recognizer)
  • Audio output service (text-to-speech) engine

To merge the logs, set the "SWI.appsessionid" and "SWI.appstepid" parameters via SSML <meta> elements. Setting both parameters generates an NVOCapps event. Nuance packaged applications use these identifiers to merge component logs, including application logs, to enable analysis, tuning, and reporting. The parameters are typically set several times during a session to provide information about logical steps within the application. For example:

<meta name="SWI.appsessionid"
 content="431cc972eaa41c1a22e99ac59f5e4fa4"/> 
<meta name="SWI.appstepid" content="3"/>
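
An application that assembles its SSML programmatically might generate these two elements with a small helper. The Python sketch below is one way to do that. The function name is hypothetical, and the only facts taken from the text above are the two parameter names and the <meta> element form.

```python
from xml.sax.saxutils import quoteattr


def app_step_meta(session_id, step_id):
    """Return the two SSML <meta> elements that identify one application step."""
    # quoteattr() adds the surrounding quotes and escapes special characters.
    return (
        f'<meta name="SWI.appsessionid" content={quoteattr(session_id)}/>\n'
        f'<meta name="SWI.appstepid" content={quoteattr(str(step_id))}/>'
    )


print(app_step_meta("431cc972eaa41c1a22e99ac59f5e4fa4", 3))
```

Emitting both elements together matters here, because the NVOCapps event is generated only when both parameters are set.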

Suppressing confidential data

In mask-sensitive (suppress) mode, all affected events report a SECURE=mask-sensitive token and substitute the string "_SUPPRESSED" wherever confidential data would otherwise appear. In encrypt-sensitive mode, all affected events report a SECURE=encrypt-sensitive token and encrypt all the confidential data.

You can also set "secure_context" via an SSML <meta> element to affect the current speak request only. For example:

<meta name="secure_context" content="encrypt-sensitive"/>

Tokens used for every event

The first entries in each log record are TIME, CHAN, and EVNT; the last entries are UCPU and SCPU.

Token   Description
TIME    System time when the event occurred, in the following format (accurate to within 0.01 second): YYYYMMDDhhmmssmmm
CHAN    Unique session identification name provided in calls to TtsSessionStart or TtsSessionStartEx.
EVNT    Prefix used for event codes. Limited to 8 characters; longer names are truncated. All Vocalizer event codes are prefixed with "NVOC".
UCPU    Current running value of "user" CPU time consumed from the start of synthesis. This value is reported in milliseconds, accurate to within 0.01 second.
SCPU    Current running value of "system" CPU time consumed from the start of synthesis. This value is reported in milliseconds, accurate to within 0.01 second.
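
For offline analysis, the TIME value can be converted to a native timestamp, as in the Python sketch below. The TIME layout comes from the table above. The record layout used by parse_record (NAME=value tokens separated by "|") and the sample record itself are assumptions made for illustration, so adjust the separator to match the actual log files.

```python
from datetime import datetime, timedelta


def parse_time_token(value):
    """Convert a YYYYMMDDhhmmssmmm string to a datetime."""
    if len(value) != 17 or not value.isdigit():
        raise ValueError(f"unexpected TIME value: {value!r}")
    base = datetime.strptime(value[:14], "%Y%m%d%H%M%S")
    return base + timedelta(milliseconds=int(value[14:]))


def parse_record(line, sep="|"):
    """Split one log record into a dict of token names and values.

    ASSUMPTION: tokens are NAME=value pairs separated by `sep`.
    """
    tokens = {}
    for field in line.rstrip("\n").split(sep):
        name, eq, val = field.partition("=")
        if eq:
            tokens[name] = val
    return tokens


# Hypothetical record, for illustration only.
record = "TIME=20240102030405123|CHAN=sess01|EVNT=NVOCinpt|UCPU=10|SCPU=0"
tokens = parse_record(record)
print(parse_time_token(tokens["TIME"]), tokens["EVNT"])
```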

Standard events and tokens

When Vocalizer runs with the Nuance Speech Server, the Speech Server receives Vocalizer events, and relays the information to the logging server. Speech Server also receives events from other Nuance speech products and these too are merged into the call logs.

The following list shows groups of standard Vocalizer event codes. Production sites might also encounter events that are defined and inserted by the application. For descriptions of each event code and its tokens, see the Call log event reference.