Natural language understanding

Vocalizer for Enterprise uses natural language understanding (NLU) to improve the initial processing of input text, which results in better prosody and more natural-sounding speech. This feature is available only when using Vocalizer for Enterprise with XPremium-high and XPremium-high-nb voices.

The processing component extracts information from the input text that relates to part-of-speech (POS), phrase type (PHR), word prominence (PRM), and phrase boundary strength (BND).

  • POS defines word types like noun, verb, adjective, and so on
  • PHR defines phrase types like noun-phrase, verb-phrase, adverbial phrase, and so on
  • PRM defines how prominent a word is compared with surrounding words
  • BND defines the strength of the boundary at the point where a particular word ends

Of these four, you can configure word prominence (PRM) and phrase boundary strength (BND) using the native markup \nlu tag to modify the synthesis output. The syntax is:

<ESC>\nlu=key1:value1;key2:value2...\

In the control sequence, the \nlu tag can have one or more key-value pairs, separated by semicolons. Each pair is composed of a key and a value, separated by a colon. The parameters and their values are:

  • PRM
    • 0—Reduced, unaccented, destressed
    • 1—De-accented, weak
    • 2—Accented, main phrase accent
    • 3—Emphasized
  • BND
    • W—Weak, minor phrase boundary
    • S—Strong, major phrase boundary
    • N—No boundary
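
The syntax and parameter values above can be sketched as a small string-building helper. This is only an illustration, not part of the product: the function name nlu_tag is hypothetical, and the sketch assumes that <ESC> stands for the ASCII escape character (0x1B).

```python
ESC = "\x1b"  # assumption: <ESC> denotes the ASCII escape character (0x1B)

# Allowed values, taken from the parameter list above.
ALLOWED = {
    "PRM": {"0", "1", "2", "3"},
    "BND": {"W", "S", "N"},
}


def nlu_tag(**params):
    """Build an nlu control sequence from keyword arguments (hypothetical helper)."""
    if not params:
        raise ValueError("at least one key:value pair is required")
    pairs = []
    for key, value in params.items():
        key, value = key.upper(), str(value).upper()
        if key not in ALLOWED:
            raise ValueError(f"unsupported key: {key}")
        if value not in ALLOWED[key]:
            raise ValueError(f"unsupported value for {key}: {value}")
        pairs.append(f"{key}:{value}")
    # Pairs are joined with semicolons; the tag is closed by a backslash.
    return f"{ESC}\\nlu=" + ";".join(pairs) + "\\"


# Prefix the tag to the text that is sent for synthesis.
print(repr(nlu_tag(PRM=3, BND="S") + "Hello world."))
```

Validating keys and values before building the string catches typos such as PRM:4 early, which may be easier to debug than unexpected synthesis output.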

For example, each of these sentences, when synthesized, leads to different voice output.

Example                                Voice output
-------                                ------------
Hello world.                           Default output.
<ESC>\nlu=PRM:0\Hello world.           “Hello” is destressed by setting the word prominence to 0.
<ESC>\nlu=PRM:3;BND:S\Hello world.     “Hello” is emphasized, and the boundary between “Hello” and “world” becomes a strong (major) phrase boundary.
Hello, how are you?                    Default output.
<ESC>\nlu=BND:N\Hello, how are you?    The boundary between “Hello” and “how” is removed.

The \nlu tag is placed immediately before the word that it modifies. PRM applies to that word itself, whereas BND applies to the boundary that follows that word. In these examples, PRM applies to “Hello” but BND applies to the boundary after “Hello”.
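
To make the placement rule concrete, the following sketch inserts a tag in front of a chosen word in the input text. The helper tag_word is hypothetical, and the sketch again assumes that <ESC> is the ASCII escape character (0x1B) and that words are separated by single spaces.

```python
ESC = "\x1b"  # assumption: <ESC> denotes the ASCII escape character (0x1B)


def tag_word(text, index, tag_body):
    """Insert an nlu tag before the word at position `index` (0-based)."""
    words = text.split(" ")
    if not 0 <= index < len(words):
        raise IndexError("word index out of range")
    # The tag goes directly in front of the word it modifies.
    words[index] = f"{ESC}\\nlu={tag_body}\\" + words[index]
    return " ".join(words)


# PRM applies to "how"; BND applies to the boundary after "how".
marked = tag_word("Hello, how are you?", 1, "PRM:2;BND:W")
print(repr(marked))
```

Here the tag precedes “how”, so the prominence setting affects “how” and the boundary setting affects the transition from “how” to “are”.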

See Control sequences.