Control sequences

Task	Native	SSML
Inserting a digital audio recording	X	X
Inserting an ActivePrompt	X
Activating implicit matching for an ActivePrompt domain	X	X
Inserting phonetic input	X	X
Inserting Pinyin input for Chinese languages	X
Marking a multi-word string for lookup in the user dictionary	X
Inserting a pause	X	X
Guiding text normalization	X	X
Inserting a bookmark	X	X
Changing the speaking rate	X	X
Changing the pitch	X	X
Changing the volume	X	X
Setting the end-of-sentence pause duration	X
Setting the spelling pause duration	X
Controlling end-of-sentence detection	X	X
Setting the textual context explicitly	X	X
Controlling the read mode	X
Changing the voice	X	X
Labeling text for language identification	X	X
Indicating a paragraph break	X	X
Resetting control sequences to the default	X
Changing the speaking style	X
Controlling agreement of number, gender, and case	X

Inserting a digital audio recording

Use this control sequence to insert a digital audio recording at a specific location in the text.

The control sequence <ESC>\audio="path"\ inserts the recording specified by path, a URI or local file system path. For example, the following sequence plays the audio recording found at c:\recordings\beep.wav:

Say your name at the beep. <ESC>\audio="c:\recordings\beep.wav"\

Vocalizer supports inserting headerless, WAV format, AU format, or NIST SPHERE format audio files that contain mulaw, alaw, or linear 16-bit PCM samples. If the recording sampling rate does not match the current voice, Vocalizer resamples it before inserting it in the speech output.

The SSML equivalent of this control sequence is the <audio> element:

Say your name at the beep. <audio src="c:\recordings\beep.wav"/>

The control sequence can also accept alternate text for an audio recording. For example, the following sequence tells Vocalizer to read "beep sound" if it cannot play the audio recording at c:\recordings\beep.wav:

<ESC>\audio="c:\recordings\beep.wav":"beep sound"\

Vocalizer uses the alternate text "beep sound" when the audio recording is unavailable or incompatible (for example, when the file format is unsupported). Changes in rate, volume, and pitch affect the alternate text in the same way as normal input text.

Note: Vocalizer extracts the alternate text from the control sequence without the surrounding double quotes. If the alternate text contains a double quote character, escape it as \".
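
As a minimal sketch of how an application might assemble this sequence, the following Python helper builds the audio control sequence and escapes double quotes in the alternate text. It assumes that <ESC> is the ASCII escape character (0x1B); the function name audio_sequence is hypothetical and is not part of any Vocalizer API.

```python
ESC = "\x1b"  # assumption: <ESC> is the ASCII escape character (0x1B)


def audio_sequence(path, alt_text=None):
    # Build the audio control sequence, optionally with alternate text.
    if alt_text is None:
        return f'{ESC}\\audio="{path}"\\'
    # Escape double quotes inside the alternate text with a backslash.
    escaped = alt_text.replace('"', '\\"')
    return f'{ESC}\\audio="{path}":"{escaped}"\\'


prompt = "Say your name at the beep. " + audio_sequence(
    r"c:\recordings\beep.wav", "beep sound"
)
```

The helper only concatenates strings; it does not check that the file exists or that its format is supported.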

Inserting an ActivePrompt

Use this control sequence to explicitly insert an ActivePrompt at a specific location in the text.

For example:

<ESC>\prompt=banking::confirm_account_number\ 238773?

ActivePrompts are explained in Tuning TTS output with ActivePrompts.

This control sequence has no equivalent in SSML.

Activating implicit matching for an ActivePrompt domain

Use this control sequence to activate implicit matching for an ActivePrompt domain starting at a specific location in the text. If no domain value is specified (for example, <ESC>\domain\), the most recently activated domain is activated.

For example:

<ESC>\domain=banking\Is your account number 238773?

The SSML equivalent of this control sequence consists of using the ssft-domaintype attribute within a <p> or <s> element:

<s ssft-domaintype="banking">Is your account number 238773?</s>

ActivePrompts are explained in Tuning TTS output with ActivePrompts.

Inserting phonetic input

Use this control sequence to clarify the phonetic value of words to ensure they are pronounced correctly. The sequence is useful for words whose spelling deviates from the pronunciation rules of a given language. For example, use it for foreign words or acronyms unknown to the system.

The phonetic input string is composed of symbols of the L&H+ phonetic alphabet, a Nuance-specific alphabet that can be conveniently entered from a keyboard. See the Language Supplement for the subset of the L&H+ phonetic alphabet relevant for each language, along with examples to help you construct proper phonetic text. The following general information applies across all languages, with American English examples to illustrate proper use.

The SSML equivalent of this control sequence is the <phoneme> element:

Would you like a <phoneme alphabet="ipa" ph="t&#x259;&#x2c8;m&#x065;&#x361;&#x26a;.t&#x06f;&#x361;&#x28a;">tomato</phoneme>?

Use the control sequence <ESC>\toi=lhp\ to mark the beginning of a piece of phonetic text (switch to L&H+ phonetic input mode), and <ESC>\toi=orth\ to mark the end (switch back to orthographic input mode).

This control sequence can be extended as <ESC>\toi=lhp:"orthographic_text"\, where orthographic_text supplies the orthographic (plain text) equivalent of the phonetic fragment. The orthographic equivalent can be any Unicode string, but a double quote (") has to be escaped as (\"), a backslash has to be escaped as a double backslash (\\), and the string cannot contain other <ESC> sequences or SSML markup. The orthographic text may be used by Vocalizer for cross-word effects in some languages, but typically is used as an application comment. This is similar to SSML <phoneme> where the "ph" attribute specifies the phonetic input and the <phoneme> element’s content supplies the orthographic alternative. However, unlike SSML <phoneme>, if Vocalizer encounters invalid symbols in the <ESC>\toi=lhp\ phonetic fragment, it drops those symbols, rather than falling back to the orthographic equivalent.
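
The escaping rules for the orthographic equivalent can be sketched in Python as follows. This is an illustration only: it assumes <ESC> is the ASCII escape character (0x1B), and the function name phonetic is hypothetical.

```python
ESC = "\x1b"  # assumption: <ESC> is the ASCII escape character (0x1B)


def phonetic(lhp, orth=None):
    # Wrap an L&H+ phonetic string between toi=lhp and toi=orth.
    if orth is None:
        start = f"{ESC}\\toi=lhp\\"
    else:
        if ESC in orth:
            raise ValueError("orthographic text cannot contain <ESC> sequences")
        # Escape backslashes first so the backslashes added for quotes
        # are not doubled afterwards.
        escaped = orth.replace("\\", "\\\\").replace('"', '\\"')
        start = f'{ESC}\\toi=lhp:"{escaped}"\\'
    return f"{start}{lhp}{ESC}\\toi=orth\\"


fragment = phonetic("nu.'jOR+k", "New York")
```

The sketch does not validate the L&H+ symbols themselves, and it does not check the orthographic text for SSML markup.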

In addition to the L&H+ phonetic symbols in the Language Supplement, use the following characters to clarify the pronunciation of the phonetic input string:

L&H+ symbol	Meaning	Example
' (ASCII 39, Hex 27)	Primary word stress	<ESC>\toi=lhp:"record"\ R+I.'kOR+d <ESC>\toi=orth\ (the verb “record”) versus: <ESC>\toi=lhp:"record"\ 'R+E.kOR+d <ESC>\toi=orth\ (the noun “record”)
'2	Secondary word stress	<ESC>\toi=lhp:"explanation"\ '2Ek.spl$.'ne&I.S$n<ESC>\toi=orth\ (“explanation”)
" (ASCII 34, Hex 22)	Sentence accent	<ESC>\toi=lhp\DER+_AR+_"tu_"@k.sEnts_?In_DI_'sEn.t$ns <ESC>\toi=orth\ (“There are TWO ACCENTS in this sentence”)
.	Syllable boundary	<ESC>\toi=lhp:"syllable"\ 'sI.l$.b$l <ESC>\toi=orth\ (“syllable”)
#	Silence (pause)	<ESC>\toi=lhp\?a&I_"sEd#do&Unt_"du_It <ESC>\toi=orth\ (“I said: don’t do it.”)

Punctuation marks remain useful within phonetic input to assure correct intonation. Each punctuation mark must be preceded by an asterisk.

L&H+ symbol	Meaning
_	Word delimiter
*.	End of declarative
*,	Comma
*!	End of exclamation
*?	End of question
*;	Semicolon
*:	Colon

For example:

<ESC>\toi=lhp\ "jEs.t$.de&I*,_De&I_'lEft_"?E0.li*. <ESC>\toi=orth\

(“Yesterday, they left early.”)

Lexical stress and sentence accents can be indicated in phonetic strings by using a single quote (') or double quote (") respectively. Vocalizer automatically converts all lexical stress marks into sentence accents if the phonetic input doesn’t contain any sentence accents.

Note that manually specified lexical stress marks and sentence accents sometimes have no effect in Vocalizer, because the synthesis module sometimes needs to override the requested stress or accent.

For example:

<ESC>\toi=lhp\If_D$_'wE.D$R+_Is_fa&In_t$.'mA.R+o&U*,_wi_wIl_liv_fOR+_nu.'jOR+k*.<ESC>\toi=orth\

(“If the weather is fine tomorrow, we will leave for New York.”)

If the phonetic input contains at least one manually added sentence accent, no additional sentence accents are assigned by Vocalizer. Therefore, only those words marked with a double quote (") will get a sentence accent. As a consequence, input containing only one manual sentence accent produces an almost flat intonation on all the other words.

For example:

<ESC>\toi=lhp\If_D$_wE.D$R+_Is_fa&In_t$."mA.R+o&U*,_wi_wIl_liv_fOR+_nu.jOR+k*.<ESC>\toi=orth\

(Only one sentence accent is realized in, “If the weather is fine tomorrow, we will leave for New York.”)

Phonetic input can also be combined with orthographic input. If no sentence accents are found in the input text (indicated by <ESC>\sent_accent\ in orthographic input, or by " in phonetic input), Vocalizer automatically assigns sentence accents. In the orthographic part of the input, Vocalizer realizes these sentence accents on the basis of part-of-speech and syntactic information. In the phonetic part of the input, all lexical stress marks (if any) are converted into sentence accents. If there are no lexical stress marks, no sentence accent will be realized for the phonetic part of the input. If the user has manually specified one or more sentence accents, no additional sentence accents are realized.

For example:

If the weather is fine tomorrow, we will leave for <ESC>\toi=lhp:"New York"\nu.'jOR+k <ESC>\toi=orth\.

(No sentence accents are found; Vocalizer automatically assigns sentence accents.)

If the weather is fine tomorrow, we will leave for <ESC>\toi=lhp:"New York"\nu."jOR+k <ESC>\toi=orth\.

(A sentence accent is specified in the phonetic part of the input text. No additional sentence accents will be realized.)

If the weather is <ESC>\sent_accent\fine tomorrow, we will leave for <ESC>\toi=lhp:"New York"\nu.jOR+k<ESC>\toi=orth\.

(A sentence accent is specified in the orthographic part of the input text. No additional sentence accents will be realized.)

If the weather is <ESC>\sent_accent\fine tomorrow, we will leave for <ESC>\toi=lhp:"New York"\nu."jOR+k<ESC>\toi=orth\.

(Two sentence accents are specified; no additional sentence accents will be realized.)

Inserting Pinyin input for Chinese languages

Inserting diacritized input for Arabic

Use this control sequence to pronounce Arabic input according to the rules of Arabic orthography. By default, Vocalizer assumes undiacritized orthographic input. Use the control sequence <ESC>\toi=diacritized\ to insert diacritized input, and <ESC>\toi=orth\ at the end of that input to restore orthographic input mode.

For example, undiacritized orthographic input:

Using diacritized input:

<ESC>\toi=diacritized\

<ESC>\toi=orth\

This control sequence has no equivalent in SSML.

Marking a multi-word string for lookup in the user dictionary

Use the control sequence <ESC>\mw\ to mark the beginning and the end of a multi-word string that you want Vocalizer to look up as a single entry in a user dictionary.

For example:

Alternatively use the <ESC>\mw\ IP address <ESC>\mw\ to connect.

This is explained in Specifying pronunciations with user dictionaries.

This control sequence has no equivalent in SSML.

Inserting a pause

Use this control sequence to insert a pause of a specified duration at a specific location in the text. For example:

His name is <ESC>\pause=300\ Michael.

The control sequence <ESC>\pause=dur_ms\ inserts a pause of dur_ms milliseconds; the supported range is 1–65535 msec.

The SSML equivalent of this control sequence is the <break> element:

<speak>His name is <break time="300ms"/> Michael.</speak>

The default duration of a pause is 200 milliseconds, inserted as follows:

<ESC>\pause\

To prevent a pause between phrases, specify 0. This example reads the sentence without a pause before saying "Michael":

His name is: <ESC>\pause=0\ Michael.
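
A minimal Python sketch of the pause rules described above (default pause, explicit duration, and 0 to suppress a pause). It assumes <ESC> is the ASCII escape character (0x1B); the function name pause is hypothetical.

```python
ESC = "\x1b"  # assumption: <ESC> is the ASCII escape character (0x1B)


def pause(duration_ms=None):
    # No argument: insert the default pause (200 ms according to the text above).
    if duration_ms is None:
        return f"{ESC}\\pause\\"
    # 0 suppresses a pause; 1 to 65535 is the supported duration range.
    if not 0 <= duration_ms <= 65535:
        raise ValueError("pause duration must be between 0 and 65535 ms")
    return f"{ESC}\\pause={duration_ms}\\"


sentence = f"His name is {pause(300)} Michael."
```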

Guiding text normalization

TN type	Use	Examples
address	Address reading	<ESC>\tn=address\Apt. 7-12, 28 N. Whitney St., Saint Augustine Beach, FL 32084-6715<ESC>\tn=normal\ <say-as interpret-as="address">Apt. 7-12, 28 N. Whitney St., Saint Augustine Beach, FL 32084-6715</say-as>
alphanumeric	Alias of spell:alphanumeric
boolean	Alias of vxml:boolean
cardinal	Alias of number
characters	Alias of spell:alphanumeric
currency	Currency reading	<ESC>\tn=currency\12USD<ESC>\tn=normal\ <say-as interpret-as="currency">12USD</say-as>
date	Date reading	<ESC>\tn=date\12/3/1995<ESC>\tn=normal\ <say-as interpret-as="date">12/3/1995</say-as>
digits	Alias of spell:alphanumeric
name	Proper name reading	<ESC>\tn=name\Care Telecom Ltd<ESC>\tn=normal\ <say-as interpret-as="name">Care Telecom Ltd</say-as>
ordinal	Ordinal number reading	<ESC>\tn=ordinal\12th<ESC>\tn=normal\ <say-as interpret-as="ordinal">12th</say-as>
phone	Telephone number reading	<ESC>\tn=phone\1-800-688-0068<ESC>\tn=normal\ <say-as interpret-as="phone">1-800-688-0068</say-as>
raw	Block expansions of abbreviations and acronyms.	<ESC>\tn=raw\app.<ESC>\tn=normal\ <say-as interpret-as="raw">app.</say-as>
scope	Activate a dictionary based on a scope, where the "scope" is any TN type.	<ESC>\tn=biking\brevet<ESC>\tn=normal\ <say-as interpret-as="biking">brevet</say-as>
sms	Short message service (SMS) reading	<ESC>\tn=sms\CU (-:<ESC>\tn=normal\ <say-as interpret-as="sms">CU (-:</say-as>
spell	Alias of spell:strict
spell:alphanumeric	Spell alphanumeric characters except for white space and punctuation	<ESC>\tn=spell:alphanumeric\a34y<ESC>\tn=normal\ <say-as interpret-as="spell" format="alphanumeric"> a34y</say-as>
spell:strict	Spell all characters including white space and punctuation	<ESC>\tn=spell:strict\ a34y-347<ESC>\tn=normal\ <say-as interpret-as="spell" format="strict">a34y-347</say-as>
state	(Not all languages.) State, city, and province names and abbreviations reading	<ESC>\tn=state\ FL<ESC>\tn=normal\ <say-as interpret-as="state">FL</say-as>
streetname	(Not all languages.) Street name and abbreviation reading	<ESC>\tn=streetname\ Emerson Rd.<ESC>\tn=normal\ <say-as interpret-as="streetname">Emerson Rd.</say-as>
streetnumber	(Not all languages.) Street number reading	<ESC>\tn=streetnumber\11001-11010<ESC>\tn=normal\ <say-as interpret-as="streetnumber">11001-11010</say-as>
telephone	Alias of phone
time	Time of day reading	<ESC>\tn=time\10:00<ESC>\tn=normal\ <say-as interpret-as="time">10:00</say-as>
vxml:boolean	VoiceXML 2.0 defined type for boolean input	<ESC>\tn=vxml:boolean\true<ESC>\tn=normal\ <say-as interpret-as="vxml:boolean">true</say-as>
vxml:currency	VoiceXML 2.0 defined type for currencies	<ESC>\tn=vxml:currency\EUR15.23<ESC>\tn=normal\ <say-as interpret-as="vxml:currency">EUR15.23</say-as>
vxml:date	VoiceXML 2.0 defined type for dates	<ESC>\tn=vxml:date\20100102<ESC>\tn=normal\ <say-as interpret-as="vxml:date">20100102</say-as>
vxml:digits	VoiceXML 2.0 defined type for digit sequences	<ESC>\tn=vxml:digits\20051225<ESC>\tn=normal\ <say-as interpret-as="vxml:digits">20051225</say-as>
vxml:number	VoiceXML 2.0 defined type for numbers	<ESC>\tn=vxml:number\+15243.1235<ESC>\tn=normal\ <say-as interpret-as="vxml:number">+15243.1235</say-as>
vxml:phone	VoiceXML 2.0 defined type for telephone numbers	<ESC>\tn=vxml:phone\7815655000<ESC>\tn=normal\ <say-as interpret-as="vxml:phone">7815655000</say-as>
vxml:time	VoiceXML 2.0 defined type for time strings	<ESC>\tn=vxml:time\0100a<ESC>\tn=normal\ <say-as interpret-as="vxml:time">0100a</say-as>
zip	(American English only.) ZIP codes	<ESC>\tn=zip\01803<ESC>\tn=normal\ <say-as interpret-as="zip">01803</say-as>
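
The pattern shared by the examples in this table can be sketched in Python. The helper names tn_native and tn_ssml are hypothetical, and <ESC> is assumed to be the ASCII escape character (0x1B). Following the table, only the spell types are split into separate interpret-as and format attributes; other types, including the vxml: types, are passed through unchanged.

```python
from xml.sax.saxutils import escape

ESC = "\x1b"  # assumption: <ESC> is the ASCII escape character (0x1B)


def tn_native(tn_type, text):
    # Activate a TN type for the text, then return to normal processing.
    return f"{ESC}\\tn={tn_type}\\{text}{ESC}\\tn=normal\\"


def tn_ssml(tn_type, text):
    # spell:alphanumeric and spell:strict map to interpret-as plus format.
    if tn_type.startswith("spell:"):
        interpret_as, fmt = tn_type.split(":", 1)
        return (
            f'<say-as interpret-as="{interpret_as}" format="{fmt}">'
            f"{escape(text)}</say-as>"
        )
    return f'<say-as interpret-as="{tn_type}">{escape(text)}</say-as>'


native = tn_native("date", "12/3/1995")
ssml = tn_ssml("spell:alphanumeric", "a34y")
```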

Using scopes to activate dictionaries

Use the control sequence <ESC>\tn=scope\ to activate a dictionary for a specific scope. The value of scope is any TN type, including any user-defined types you might create.

When creating a dictionary with Vocalizer Studio, you define a scope by assigning a domain to that dictionary. When the dictionary is loaded, the scope is declared as a suffix to the MIME type. When your application supplies marked-up text to be spoken, the mark-up can activate that dictionary by referring to its scope: when the mark-up matches the language and scope of any loaded dictionary, Vocalizer consults that dictionary at runtime. Otherwise, Vocalizer ignores dictionaries that don't match the language and scope.

Imagine you have an English-speaking application for the sport of long-distance bicycling, and many of the technical descriptions use French words such as "brevet" and "randonneuring" with peculiar American pronunciations. You could create a user dictionary designated as a "biking" domain. In the following example mark-up, the dictionary supplies text substitutions for "randonneuring" and "brevet":

<ESC>\tn=biking\Welcome to the randonneuring hotline. Every brevet in the series begins on Thursday mornings.<ESC>\tn=normal\

For example, the dictionary might normalize the spoken text as "Welcome to the render nearing hotline. Every brevay in the series begins on Thursday mornings."

Inserting a bookmark

Use the control sequence <ESC>\mrk=name\ to mark a position in the input text. Vocalizer tracks this position throughout the TTS conversion. The bookmark name can be any text sequence. After synthesis, Vocalizer delivers a bookmark marker that refers to this position in the input text and the corresponding position in the audio output.

The use of this control sequence does not affect the speech output process.

Some examples:

This bookmark <ESC>\mrk=bookmark 1\ marks a reference point.

Another <ESC>\mrk=bookmark 2\ does the same.

The SSML equivalent of this control sequence is the <mark> element:

<speak>This bookmark <mark name="bookmark1"/> marks a reference point.

    Another <mark name="bookmark2"/> does the same.</speak>

Changing the speaking rate

Use this control sequence to set the speaking rate to a specified value. The format is <ESC>\rate=level\ where level is between 50 (half the default rate) and 400 (four times the default rate), and 100 is the default speaking rate.

Example:

I can <ESC>\rate=150\ speed up the rate <ESC>\rate=75\ or slow it down.

The SSML equivalent is the rate attribute of the <prosody> element:

<speak>I can <prosody rate="+50%">speed up the rate</prosody>

    <prosody rate="-25%">or slow it down</prosody></speak>

See Rate scale conversion.

For more precise results, experiment with different combinations of pitch, rate, and timbre. For example, you can create a more gender-neutral voice by assigning pitch and timbre to 80 or 90 for a female voice.

Changing the pitch

Use this control sequence to set the pitch to the specified level. The pitch code changes the speaking voice to sound deep (lower values) or thin (higher values).

The format is <ESC>\pitch=level\ where level is a value between 50 (lower pitch) and 200 (higher pitch), and 100 is typical. For example:

<ESC>\pitch=50\ I can speak with a deep voice, <ESC>\pitch=170\ but also very thin.

The SSML equivalent is the pitch attribute of the <prosody> element where the values are relative percentages of change:

<prosody pitch="-50%">I can speak rather deeply,</prosody>

    <prosody pitch="+50%">but also very thinly.</prosody>

In SSML, you can set symbolic values instead of percentages: x-low, low, medium, high, x-high, and default.

For more precise results, experiment with different combinations of pitch, rate, and timbre. For example, you can create a more gender-neutral voice by assigning pitch and timbre to 80 or 90 for a female voice.

Changing the volume

Use this control sequence to set the volume to the specified level. The format is <ESC>\vol=level\ where level is a value between 0 (no volume) and 100 (the maximum volume), and 80 is typically the default volume. For example:

<ESC>\vol=10\ I can speak rather quietly, <ESC>\vol=90\ but also very loudly.

The SSML equivalent is the volume attribute of the <prosody> element:

<prosody volume="-50%">I can speak rather quietly,</prosody>

    <prosody volume="+50%">but also very loudly.</prosody>

See Volume scale conversion.

Changing the timbre

Use this control sequence to make the timbre of the speaking voice sound older (lower values) or younger (higher values). You can use this feature on any voice.

The control sequence <ESC>\timbre=level\ sets the timbre to the specified level, where level is a percentage value between 50 and 200, and 100 is typical. For example:

<ESC>\timbre=180\ I can sound like this, <ESC>\timbre=50\ but also sound very different.

The SSML equivalent is the timbre attribute of the <prosody> element:

<prosody timbre="180">I can sound like this,</prosody>

<prosody timbre="50">but I can also sound very different.</prosody>

You can set symbolic values instead of percentages:

Symbolic value	Corresponding percentage
x-young	+35%
young	+20%
medium	0%
default	0%
old	-20%
x-old	-35%

For more precise results, experiment with different combinations of pitch, rate, and timbre. For example, you can create a more gender-neutral voice by assigning pitch and timbre to 80 or 90 for a female voice.
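
The value ranges documented for the rate, pitch, volume, and timbre sequences can be checked before the text is sent for synthesis. The following Python sketch is one way to do that; it assumes <ESC> is the ASCII escape character (0x1B), and the names RANGES and set_level are hypothetical.

```python
ESC = "\x1b"  # assumption: <ESC> is the ASCII escape character (0x1B)

# Ranges as documented in the sections on rate, pitch, volume, and timbre.
RANGES = {
    "rate": (50, 400),
    "pitch": (50, 200),
    "vol": (0, 100),
    "timbre": (50, 200),
}


def set_level(name, level):
    # Build a control sequence such as rate=150, rejecting out-of-range values.
    low, high = RANGES[name]
    if not low <= level <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {level}")
    return f"{ESC}\\{name}={level}\\"


text = (
    f"I can {set_level('rate', 150)} speed up the rate "
    f"{set_level('rate', 75)} or slow it down."
)
```

Rejecting out-of-range values in the application is a design choice of this sketch; this section does not state how Vocalizer itself treats such values.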

Setting the end-of-sentence pause duration

Use this control sequence to set an end-of-sentence pause duration (wait period). The format is <ESC>\wait=value\ where the value is between 0 and 9. The pause is that number multiplied by 200 milliseconds.

Examples:

<ESC>\wait=2\ There will be a short wait period after this sentence.

<ESC>\wait=9\ This sentence will be followed by a long wait period. Did you notice the difference?
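
The relationship between the wait value and the resulting pause can be sketched in Python as follows. The function names are hypothetical, and <ESC> is assumed to be the ASCII escape character (0x1B).

```python
ESC = "\x1b"  # assumption: <ESC> is the ASCII escape character (0x1B)


def wait_duration_ms(value):
    # The end-of-sentence pause is the wait value multiplied by 200 ms.
    if value not in range(10):
        raise ValueError("wait value must be an integer between 0 and 9")
    return value * 200


def wait_sequence(value):
    wait_duration_ms(value)  # reuse the range check
    return f"{ESC}\\wait={value}\\"


short_wait = wait_duration_ms(2)  # 400 ms
long_wait = wait_duration_ms(9)   # 1800 ms
```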

This control sequence has no equivalent in SSML, although you can use the <break> element to set the length of pauses explicitly.

Setting the spelling pause duration

Use this control sequence to set the inter-character pause. The format is <ESC>\spell=duration\ where the duration value is milliseconds. For example:

The part code is <ESC>\tn=spell\<ESC>\spell=200\a134b<ESC>\tn=normal\

Note: The spelling pause duration does not affect the spelling done by <ESC>\readmode=char\, because that mode treats each character as a separate sentence. To adjust the spelling pause duration for <ESC>\readmode=char\, set the end of sentence pause duration using <ESC>\wait\ instead.

This control sequence has no equivalent in SSML.

Controlling end-of-sentence detection

Use this control sequence to control end of sentence detection. The format is <ESC>\eos=1\ and <ESC>\eos=0\ where the value of 1 forces a sentence break and 0 suppresses a sentence break. Optionally, use this sequence in conjunction with explicit read mode (which disables automatic end-of-sentence detection for a block of text). See Controlling the read mode.

For suppression, the sequence must appear immediately after the symbol that would normally trigger a break (such as after a period).

Examples:

Tom lives in the U.S. <ESC>\eos=1\ So does John.

180 Park Ave. <ESC>\eos=0\ Room 24

The SSML equivalent of this control sequence is the <s> (or <sentence>) element to force a sentence break, and a <break> with attribute strength set to "none" to suppress a break:

<s>Tom lives in the U.S.</s>

<s>So does John.</s>

<s>180 Park Ave. <break strength="none"/> Room 24</s>

There is no SSML equivalent for the <ESC>\readmode=explicit_eos\ sequence. SSML lets you force or suppress a sentence break, but does not allow you to activate explicit end-of-sentence mode.

Setting the textual context explicitly

Use this control sequence to indicate where a piece of text sits within a sentence, so that Vocalizer can adjust the intonation appropriately. The format is <ESC>\prosody=position\ where the value is one of these:

Prosody position values	Description
<ESC>\prosody=medial\	Mark for middle of phrase
<ESC>\prosody=phrase-break\	Mark for phrase boundary
<ESC>\prosody=sentence-break\	Mark for sentence boundary

For example, this markup identifies the date as being preceded by a carrier phrase and followed by a sentence boundary:

<ESC>\prosody=medial\<ESC>\tn=date\ 2011-07-04<ESC>\tn=normal\<ESC>\prosody=sentence-break\

With SSML, use the detail attribute in the <say-as> element:

Prosody	Preceded by	Followed by
prosody-start-medial-end-medial	Carrier phrase	Carrier phrase
prosody-start-medial-end-phrase	Carrier phrase	Phrase boundary
prosody-start-medial-end-sentence	Carrier phrase	Sentence boundary
prosody-start-phrase-end-medial	Phrase boundary	Carrier phrase
prosody-start-phrase-end-phrase	Phrase boundary	Phrase boundary
prosody-start-phrase-end-sentence	Phrase boundary	Sentence boundary
prosody-start-sentence-end-medial	Sentence boundary	Carrier phrase
prosody-start-sentence-end-phrase	Sentence boundary	Phrase boundary
prosody-start-sentence-end-sentence	Sentence boundary	Sentence boundary

The SSML equivalent to the above example identifies the date as being preceded by a carrier phrase and followed by a sentence boundary:

<say-as interpret-as="date" detail="prosody-start-medial-end-sentence"> 2011-07-04</say-as>
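
The nine detail values in the table follow a regular naming pattern, which the following Python sketch reproduces. The function names detail_value and say_as_with_context are hypothetical helpers for illustration only.

```python
from xml.sax.saxutils import escape

# Context keywords used in the detail values:
# medial = carrier phrase, phrase = phrase boundary, sentence = sentence boundary.
POSITIONS = ("medial", "phrase", "sentence")


def detail_value(preceded_by, followed_by):
    # Compose a value such as prosody-start-medial-end-sentence.
    for position in (preceded_by, followed_by):
        if position not in POSITIONS:
            raise ValueError(f"unknown position: {position}")
    return f"prosody-start-{preceded_by}-end-{followed_by}"


def say_as_with_context(interpret_as, text, preceded_by, followed_by):
    detail = detail_value(preceded_by, followed_by)
    return (
        f'<say-as interpret-as="{interpret_as}" detail="{detail}">'
        f"{escape(text)}</say-as>"
    )


all_details = [detail_value(a, b) for a in POSITIONS for b in POSITIONS]
element = say_as_with_context("date", "2011-07-04", "medial", "sentence")
```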

If the intonation pattern isn’t explicitly specified at the SSML level, Vocalizer uses intonation patterns implicitly provided by textual context. If the intonation pattern is explicitly specified at the SSML level, the detail attribute in the <say-as> element has priority over the textual context.

For example, the following SSML element implicitly considers the date inserted as being preceded by a carrier phrase and followed by sentence boundary:

The date is: <say-as interpret-as="date">2011-06-28</say-as>.

Controlling the read mode

Read mode	Description
<ESC>\readmode=sent\	Sentence mode (the default)
<ESC>\readmode=char\	Character mode (similar to spelling)
<ESC>\readmode=word\	Word-by-word mode
<ESC>\readmode=line\	Line-by-line mode
<ESC>\readmode=explicit_eos\	Explicit end-of-sentence mode (sentence breaks only where indicated by <ESC>\eos=1\)

Changing the voice

Use this control sequence to change the speaking voice and force a sentence break. The format is <ESC>\voice=voice_name\ where the value is any installed voice. For example:

<ESC>\voice=samantha\ Hello, this is Samantha.

<ESC>\voice=tom\ Hello, this is Tom.

The SSML equivalent of this control sequence is the <voice> element:

<voice name="Samantha">Hello, this is Samantha.</voice>

<voice name="Tom">Hello, this is Tom.</voice>

To use this control sequence successfully, you must have more than one voice installed. If you do not have the requested voice installed, Vocalizer flags a warning and does its best to carry on. In this example, if Samantha is installed, but Tom is not, Vocalizer synthesizes, “Hello, this is Tom,” in Samantha’s voice, and produces this debug message:

SEVERE 16123: TTSEG|Could not do a mid-synthesis voice switch, voice load failed, voice=Tom

Instead of naming a specific voice, the native control sequence can accept key-value pairs that let you choose a voice by language or gender. The format is:

<ESC>\voice=key:value[,key:value]\

Where a key may be:

lang—The three-letter code for a language (for example, ENU for American English). This may be "unknown".
gender—A gender for the voice (male or female).
ietf—The IETF code for a language (for example, en-US for American English).

Several key-value pairs may be included, using a comma or semi-colon as a separator between each pair. For example:

<ESC>\voice=(lang:unknown,gender:female)\

Vocalizer chooses a default voice that meets the specified key and value criteria, if such a voice is available.

Labeling text for language identification

Use this escape sequence to label text as being in an unknown language. Vocalizer then determines the language automatically with its built-in language identifier. This feature only works with languages that support language ID. See Using automatic language identification.

To label the text, use the following control sequence:

Begin the string with: <ESC>\lang=unknown\
End the string with: <ESC>\lang=normal\
(alternatively, simply end the input)

The automatic language identifier scope is enabled by default (set to user-defined). Use a Vocalizer configuration file to change the setting. If the scope is not enabled, Vocalizer ignores the control sequence.

Vocalizer identifies the language on a sentence-by-sentence basis within the text and switches the synthesis voice if necessary. Vocalizer restores the original synthesis voice at the next <ESC>\lang=normal\ or the end of the synthesis request.

Note: Vocalizer does not support specifying an explicit language name instead of "unknown".

Example:

Le titre de la chanson est : <ESC>\lang=unknown\In Between <ESC>\lang=normal\

The SSML equivalent of this control sequence is the xml:lang attribute, which is available for several SSML elements, including <p>, <speak>, and <s>:

<speak>Le titre de la chanson est:</speak>

<speak xml:lang="unknown">In Between</speak>

Indicating a paragraph break

Use this escape sequence to declare a paragraph break (which also implies a sentence break). The format is <ESC>\para\ with no value to specify.

Example:

Introduction to Vocalizer. <ESC>\para\ Vocalizer is a state-of-the-art text to speech system.

The SSML equivalent of this control sequence is the <p> (or <paragraph>) element:

<p>Introduction to Vocalizer.</p>

<p>Vocalizer is a state-of-the-art text to speech system.</p>

Resetting control sequences to the default

Use this escape sequence to reset all parameters to the original settings at the start of synthesis. The format is <ESC>\rst\ with no value to specify. For example:

<ESC>\vol=10\ The volume is set to a low value. <ESC>\rst\ Now it is reset to its default value.

<ESC>\rate=50\ The rate is set to a low value. <ESC>\rst\ Now it is reset to its default value.

This control sequence has no equivalent in SSML.

Changing the speaking style

Use this escape sequence to change the speaking style of the current voice. The format is <ESC>\style=value\ where the value is the name of a style. For example:

<ESC>\style=lively\This text would be read in lively style.

Different voices support different styles. Typical values are lively, neutral, formal, didactic, and apologetic. See Changing the speaking style. If you specify an unsupported value, there is no change in the speaking style.

To reset the speaking style to the default, specify the value default. For example:

<ESC>\style=default\This text would be read in the default style of the voice.

This control sequence has no equivalent in SSML.

Controlling agreement of number, gender, and case

Use this control sequence to define the case, gender, and number of a word. This feature and its values can vary for each voice. It was first implemented for German Petra-ml xpremium-high and is being added to other voices over time. For details, see the Language Supplement for each voice you download. (If a supplement does not mention the feature, the voice does not support it.)

The format is <ESC>\agreement=features\ where the value is one or more key/value pairs separated by semi-colons. You can list the pairs in any order. The sequence applies to the next input word (Sekunde, in the following example):

<ESC>\agreement=gender:FEM;case:NOM;number:SING\1 Sekunde

Use this sequence (in languages and voices that support it) where words can vary by number, gender, or case depending upon the implied context where the words appear. Typically, this occurs when reading numeric values where the actual spoken numbers may change based on the context where they are used. When you explicitly define this context, the engine generates the correct word.

Feature	Value	Description
case	ACC DAT GEN NOM	Accusative Dative Genitive Nominative
gender	FEM NEUT MASC	Feminine Neuter Masculine
number	PLUR SING	Plural Singular

It's not necessary to specify all features. If you omit a feature, Vocalizer uses its normal processing algorithms.

Vocalizer ignores all specified features in the following situations:

If the voice does not support the feature.
If any value is incompatible with the input context. For example, 2 is plural even if the sequence declares number:SING.
If there's any punctuation between the sequence and the target word.
If any key/value pair is malformed.

This control sequence has no equivalent in SSML.
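
Because Vocalizer ignores the whole sequence when any key/value pair is malformed, an application may want to validate the pairs before building the sequence. The following Python sketch shows one way to do this using the features and values from the table above. It assumes <ESC> is the ASCII escape character (0x1B); the names FEATURES and agreement are hypothetical, and the values actually supported depend on the voice.

```python
ESC = "\x1b"  # assumption: <ESC> is the ASCII escape character (0x1B)

# Features and values as listed in the table above.
FEATURES = {
    "case": {"ACC", "DAT", "GEN", "NOM"},
    "gender": {"FEM", "NEUT", "MASC"},
    "number": {"PLUR", "SING"},
}


def agreement(**features):
    # Build an agreement sequence from keyword arguments, in the order given.
    for key, value in features.items():
        if key not in FEATURES:
            raise ValueError(f"unknown feature: {key}")
        if value not in FEATURES[key]:
            raise ValueError(f"invalid value for {key}: {value}")
    pairs = ";".join(f"{key}:{value}" for key, value in features.items())
    return f"{ESC}\\agreement={pairs}\\"


# No punctuation between the sequence and the target word.
text = agreement(gender="FEM", case="NOM", number="SING") + "1 Sekunde"
```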
