Using rulesets

This topic describes how to use rulesets to tune or normalize text input.

A ruleset is a set of match-and-replace rules that the engine uses to change strings of input text when converting to TTS. For example, a ruleset might expand an abbreviation (from “PIN” to “personal identification number”) or find all uses of a currency symbol and replace it with words ("dollars", "euros", and so on) regardless of the amounts. Each rule has two parts:

  • The match specification finds a string in the input.

  • The replacement specification substitutes a different string.

Whereas user dictionaries only support search and replace for literal strings—that is, complete words or tagged multi-word fragments—rulesets support any search pattern that can be expressed using regular expressions. You can use them to search for multiple words, part of a word, or a repeated pattern.
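To illustrate the difference, here is a minimal Python sketch (not the Vocalizer API) contrasting literal dictionary replacement with a single regex rule; the function names and the phone-number example are hypothetical:

```python
import re

# Hypothetical illustration: a literal dictionary replaces exact strings only.
literal_dictionary = {"Dr.": "Doctor"}

def apply_dictionary(text, dictionary):
    # Literal search and replace: one entry per exact string.
    for key, value in dictionary.items():
        text = text.replace(key, value)
    return text

def apply_regex_rule(text, pattern, replacement):
    # A single regex rule covers a whole family of strings.
    return re.sub(pattern, replacement, text)

text = "Call 555-0100 or 555-0199."
# A literal dictionary would need one entry per number;
# one regex rule normalizes them all.
normalized = apply_regex_rule(text, r"\b555-(\d{4})\b", r"extension \1")
```

A dictionary lookup would need a separate entry for every phone number; the pattern handles any four-digit extension.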

Loading

You can load any number of rulesets at runtime. The mechanism you use determines whether a ruleset is global or narrowly scoped. The load order determines the processing order: rulesets are processed from first loaded to last loaded.

Mechanisms:

  • Load with the <default_rulesets> XML configuration parameter in Management Station (global: applies to the entire input text).
  • Load with the SSML <lexicon> element in the input text (typed: only affects the input text in the same scope as the element). Use this technique to narrow the scope of text replacements to specific text fragments. These are called typed rulesets because their matching text fragments are labeled as a particular data type by the SSML <say-as> element or the native <ESC>\tn\ control sequence. Typed rulesets augment or override the Vocalizer built-in text normalization types.

Precedence and processing

Vocalizer applies a ruleset when the active language matches the language in the ruleset header. To perform the substitutions, it uses the Regular Expression Text-To-Text (RETTT) engine, a subcomponent of each text-to-text engine instance.

Text normalization sequence:

  1. If you provide the input text via the input text callback mechanism, Vocalizer collects the entire input.
  2. Vocalizer transcodes the input to UTF-16.
  3. Vocalizer expands the SSML markup to native control sequences, applies rulesets in order, and then applies all other text normalizations (for example, rulesets are applied before orthographic user dictionary entries).

For typed rulesets, the engine strips the SSML <say-as> or <ESC>\tn\ type while processing the typed ruleset (even if the ruleset makes no changes to the contained text fragment). This allows typed rulesets to override Vocalizer built-in text normalization types. Optionally, you can delegate portions of your typed ruleset processing to Vocalizer built-in text normalization types by adding <ESC>\tn\ wrappers around the text. (You must use the native control sequence because SSML processing is complete before ruleset processing begins.)

The engine processes rulesets as follows:

  1. The engine processes rulesets in the order they were loaded and completes the typed rulesets before starting the global rulesets. This strategy allows you to define a sequential chain of rulesets that progressively refine and normalize the input text.
  2. In each ruleset, rules are processed in document order.
  3. Regardless of the effect of a rule, the engine applies the next rule, and so on. A later rule may change an input string that has already been transformed by a previous rule.
  4. The engine continues processing rulesets from first loaded to last loaded. It stops after the last rule of the most recently loaded ruleset.
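The steps above can be sketched in Python (this is an illustrative model, not the RETTT engine; the function and rule names are hypothetical):

```python
import re

# Sketch of the processing order: typed rulesets run before global ones;
# rulesets run in load order, rules within a ruleset in document order,
# and every rule is applied regardless of what earlier rules did.
def apply_rulesets(text, typed_rulesets, global_rulesets):
    for ruleset in list(typed_rulesets) + list(global_rulesets):
        for pattern, replacement in ruleset:       # document order
            text = re.sub(pattern, replacement, text)
    return text

typed = [[(r"NUAN", "Nuance")]]
# A later rule may transform text produced by an earlier rule,
# forming a sequential chain of refinements.
global_ = [[(r"Nuance", "Nuance Communications"),
            (r"Communications", "Communications Inc.")]]
result = apply_rulesets("NUAN stock", typed, global_)
```

Here the typed rule rewrites "NUAN" first, and the two global rules then refine that output in turn, yielding "Nuance Communications Inc. stock".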

Ruleset format

In general, a user-defined ruleset is a UTF-8 text file with a header followed by data. Use the pound symbol (#) for comments.

Short example ruleset:

[header]
language = ENU
charset = "utf-8"
type = financial:stocks
# The language is American English
[data]
/NUAN/ --> "Nuance" 
/\x{20ac}(\d+)\.(\d{2})\d*/ --> "$1 euro $2 cents" 
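The euro rule above can be tried out in Python's regex dialect. Note the syntax differences from the PCRE-style rule: Python writes the euro sign escape as \u20ac rather than \x{20ac}, and the replacement tokens $1 and $2 become \1 and \2. This is only a demonstration of the pattern's effect, not how Vocalizer executes it:

```python
import re

# The euro rule from the example ruleset, in Python regex syntax.
euro_rule = (r"\u20ac(\d+)\.(\d{2})\d*", r"\1 euro \2 cents")

def apply_rule(text, rule):
    pattern, replacement = rule
    return re.sub(pattern, replacement, text)

# Any digits past the first two cent digits are matched and dropped.
spoken = apply_rule("\u20ac12.345", euro_rule)
```

For the input €12.345, the rule captures "12" and "34", discards the trailing "5", and produces "12 euro 34 cents".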

Effect of rulesets on location markers

Markers on input text have source position fields that represent positions after the ruleset transformations (and not the original text positions).

Effect of rulesets on performance

Loading rulesets can affect synthesis performance, increasing latency (time to first audio) and overall CPU use. Test the system to see how the rulesets affect its performance, and ensure that the impact is acceptable.

This makes it important to consider pattern efficiency carefully when writing rulesets. Certain regular expression patterns are more efficient than others. For example, a character class (such as "[aeiou]") is more efficient than the equivalent set of alternatives (such as "(a|e|i|o|u)").

For details, see the pcreperform.html man page of the PCRE package.