Control sequences

A control sequence is a piece of text that is not read out, but instead affects how other text is spoken, or that performs a specific task. By using control sequences, you can acquire full control over the pronunciation of the input text.

For example, you can use a control sequence to tell Vocalizer to speak a particular word in your text more loudly than the others, or to insert a bookmark that will appear in your application logs. See also Natural language understanding.

Vocalizer supports two types of control sequences:

  • Native control sequences: Vocalizer supports a proprietary syntax for control sequences. It is described in the HTML-format Language Supplement for each language, found in the VOCALIZER_SDK\doc\languages directory.
  • SSML markup: Vocalizer accepts SSML (Speech Synthesis Markup Language) elements in an XML document.

Native control sequence format

All native control sequences follow this general syntax notation:

<ESC> \ parameter = value \

In this notation:

  • <ESC> normally represents the ASCII escape character (decimal 27, hex 1B), often written as "\x1B" in source code.
  • parameter is the name of the control parameter that the control sequence affects.
  • value is the value you want to assign to the control parameter.

For example, you can insert a half-second pause in your text with the pause parameter:

Welcome to our phone system. <ESC>\pause=500\ How may I help you?

A value that is set with a control sequence remains active until another control sequence sets a new value, or until the end of the input text.
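As an illustration of the syntax above, the following Python sketch assembles the pause example as a string. The helper name control_sequence and its escape argument are hypothetical conveniences for this example, not part of any Vocalizer API; only the <ESC>\parameter=value\ layout comes from the text.

```python
# Minimal sketch: build a native control sequence as a plain string.
# control_sequence() is a hypothetical helper, not a Vocalizer function.

ESC = "\x1b"  # ASCII escape character, decimal 27


def control_sequence(parameter, value, escape=ESC):
    """Return escape + \\parameter=value\\ as described in the syntax notation."""
    return escape + "\\" + parameter + "=" + str(value) + "\\"


text = (
    "Welcome to our phone system. "
    + control_sequence("pause", 500)
    + " How may I help you?"
)

# repr() makes the otherwise invisible escape character visible as \x1b.
print(repr(text))
```

Building the sequence in one place keeps the escape character and the two backslashes from being mistyped in each prompt.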

Place control sequences that affect the pronunciation of a word outside that word. A control sequence entered inside a word breaks it into two words.

It is possible to use native control sequences within an SSML document. However, this is difficult because the default <ESC> escape character for native control sequences is forbidden in XML documents. For more on using native control sequences within an XML document, see Defining an alternative escape sequence.
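The restriction can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module to show that a document containing the raw escape character is rejected as not well-formed, while the same text with a printable alternative sequence parses. The bare <speak> root and the ### sequence are illustrative assumptions; the sketch says nothing about how Vocalizer itself reports the error.

```python
# Minimal sketch: the raw ESC character (0x1B) is not a legal XML 1.0 character,
# so a standard XML parser refuses a document that contains it.
import xml.etree.ElementTree as ET

ESC = "\x1b"


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Same prompt, once with the default escape character and once with a
# printable alternative sequence (### is an assumed, illustrative choice).
with_esc = "<speak>Welcome. " + ESC + "\\pause=500\\ How may I help you?</speak>"
with_alt = "<speak>Welcome. ###\\pause=500\\ How may I help you?</speak>"

print(is_well_formed(with_esc))
print(is_well_formed(with_alt))
```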

SSML markup

The SSML markup language includes several elements that have the same effect as native Vocalizer control sequences. For example, you can insert a half-second pause using the SSML <break> element:

Welcome to our phone system. <break time="500ms"/> How may I help you?

See the SSML Specification for a list of SSML elements, and Vocalizer SSML support for details on how Vocalizer supports them.

You cannot use SSML outside an XML document.
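To make the requirement concrete, the following Python sketch wraps the <break> example in a complete XML document. The <speak> root element with its version, xmlns, and xml:lang attributes follows the W3C SSML 1.0 specification; whether Vocalizer requires each of these attributes is an assumption to verify against Vocalizer SSML support. The function name ssml_prompt is hypothetical.

```python
# Minimal sketch: wrap prompt text and a <break> element in an SSML document.
# ssml_prompt() is a hypothetical helper; the <speak> attributes follow W3C SSML 1.0.
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def ssml_prompt(before, pause_ms, after, lang="en-US"):
    """Return an SSML document that pauses for pause_ms between two text runs."""
    return (
        '<?xml version="1.0"?>\n'
        '<speak version="1.0" xmlns="' + SSML_NS + '" xml:lang="' + lang + '">'
        + escape(before)  # escape &, <, > so the text stays valid XML
        + ' <break time="' + str(pause_ms) + 'ms"/> '
        + escape(after)
        + "</speak>"
    )


doc = ssml_prompt("Welcome to our phone system.", 500, "How may I help you?")
print(doc)

# Parsing the result confirms that it is a well-formed XML document.
root = ET.fromstring(doc)
```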

Using native control sequences within SSML can lead to unexpected behavior, so test them carefully. See Native control sequences.

Defining an alternative escape sequence

In some circumstances you cannot use the <ESC> escape character, for example when including a native control sequence in an SSML document. You may also wish to supplement <ESC> with an alternative sequence of your own.

To define an alternative sequence, use <escape_sequence> in Management Station. For example, to define three hash marks (###) as the escape sequence:

<escape_sequence>###</escape_sequence>

You can then use this new sequence instead of <ESC>:

Welcome to our phone system. ###\pause=500\ How may I help you?

The alternative sequence must be a Perl 5 compatible regular expression. This means that any special characters in the sequence, such as a period (.), pipe (|), question mark (?), and so on, must be escaped using a backslash character (\).
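The escaping rule can be illustrated with Python's re.escape, which inserts the backslashes for you. This is only a sketch: Python's regular expression syntax is close to Perl 5 for simple literals like these, but re.escape may also escape characters that do not strictly need it, and how Vocalizer treats such extra backslashes is not covered here. The helper name escape_for_regex is hypothetical.

```python
# Minimal sketch: backslash-escape regular expression special characters
# in a literal string. escape_for_regex() is a hypothetical helper.
import re


def escape_for_regex(literal):
    """Return literal with regular expression special characters escaped."""
    return re.escape(literal)


for ch in [".", "|", "?", "$"]:
    print(ch, "->", escape_for_regex(ch))

# The escaped pattern matches the original literal text exactly.
sample = "a.b|c?"
assert re.fullmatch(escape_for_regex(sample), sample) is not None
```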

Avoid characters that might appear in your input text; otherwise you may inadvertently create an extra escape sequence. For example, "\$" would be a bad choice if your input text might include "$". This example creates an unwanted escape sequence:

\$volume=80\$50 has been transferred to your savings account
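One way to catch such collisions before deployment is to search representative input text, without any control sequences in it, for matches of the candidate escape sequence. The Python sketch below does this with the standard re module; the function name accidental_matches is hypothetical, and Python's regex engine stands in for the Perl 5 compatible one the text describes.

```python
# Minimal sketch: find places where a candidate escape sequence would
# match ordinary input text. accidental_matches() is a hypothetical helper.
import re


def accidental_matches(escape_regex, plain_text):
    """Return the start offsets of every match of escape_regex in plain_text."""
    return [m.start() for m in re.finditer(escape_regex, plain_text)]


sample = "$50 has been transferred to your savings account"

# "\$" matches the dollar sign that is part of the ordinary text.
print(accidental_matches(r"\$", sample))

# "###" does not occur in this sample, so no collision is reported.
print(accidental_matches("###", sample))
```

An empty result for a sample is not a guarantee; it only shows that the sequence does not collide with the text you tested.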

The alternative sequence supplements <ESC> rather than replacing it, so you can still use <ESC> for control sequences in non-XML documents.

Control sequence tasks

The following table summarizes the tasks you can achieve using control sequences, and whether they are supported in native or SSML format:

Task                                                            Native  SSML

Inserting a digital audio recording                             X       X
Inserting an ActivePrompt                                       X
Activating implicit matching for an ActivePrompt domain         X       X
Inserting phonetic input                                        X       X
Inserting Pinyin input for Chinese languages                    X
Marking a multi-word string for lookup in the user dictionary   X
Inserting a pause                                               X       X
Guiding text normalization                                      X       X
Inserting a bookmark                                            X       X
Changing the speaking rate                                      X       X
Changing the pitch                                              X       X
Changing the volume                                             X       X
Setting the end-of-sentence pause duration                      X
Setting the spelling pause duration                             X
Controlling end-of-sentence detection                           X       X
Setting the textual context explicitly                          X       X
Controlling the read mode                                       X
Changing the voice                                              X       X
Labeling text for language identification                       X       X
Indicating a paragraph break                                    X       X
Resetting control sequences to the default                      X
Changing the speaking style                                     X
Controlling agreement of number, gender, and case               X