Getting key values in results
To retrieve the current values of special SWI_ keys, get the recognition results as returned by your environment or voice platform.
Note: Nuance Recognizer supports all keys. Dragon Voice supports SWI_literal, SWI_spoken, SWI_meaning, SWI_utteranceSNR.
Default keys in recognition results
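These keys appear in recognition results by default, because they are in the swirec_extra_nbest_keys list by default: SWI_literal, SWI_grammarName, SWI_meaning, SWI_ssmConfidences, and SWI_ssmMeanings.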
For all other keys, use swirec_extra_nbest_keys to add them to recognition results.
Keys set in application URIs for ECMAScript use
These keys are set by an application and can be used by ECMAScript in grammar files, but are not available directly in the recognition result.
Only one SWI_ key appears in this category, but it is a significant one which we have already used in our ECMAScript examples: SWI_vars.
SWI_vars
Nuance Recognizer supports this key. Dragon Voice does not.
The SWI_vars key passes variables to a grammar from an application. This lets you affect grammar processing based on information that is available to that application.
For example, consider an application that asks for a date of birth. Since a person’s date of birth must be the current date or a date in the past, the application can pass the current date to the grammar, and the grammar can use this date to disallow all parses of dates in the future. This capability requires your application to pass a variable to the grammar.
In your application, you can set these variables by appending one or more variable names and values to the grammar’s URI. Then, in the grammar, you access the variables through the SWI_vars object.
For example, imagine that you have a birthdate grammar URI called file://mygrammars/birthdate.grxml and your application has computed the present date, which is 20100922. The URI you pass when loading or activating grammars would be:
file://mygrammars/birthdate.grxml?SWI_vars.today=20100922
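To show how an application might produce that URI, here is a minimal JavaScript sketch. The helper names formatYYYYMMDD and birthdateGrammarUri are hypothetical and not part of any Nuance API; how the finished URI reaches Recognizer depends on your voice platform.

```javascript
// Hypothetical helpers: not part of any Nuance API.
// formatYYYYMMDD renders a Date in the YYYYMMDD form the grammar expects.
function formatYYYYMMDD(date) {
  const yyyy = String(date.getFullYear());
  const mm = String(date.getMonth() + 1).padStart(2, "0"); // getMonth() is 0-based
  const dd = String(date.getDate()).padStart(2, "0");
  return yyyy + mm + dd;
}

// Appends the date to the grammar URI as the SWI_vars.today variable.
function birthdateGrammarUri(baseUri, date) {
  return baseUri + "?SWI_vars.today=" + formatYYYYMMDD(date);
}

// September 22, 2010 (months are 0-based in the Date constructor).
console.log(birthdateGrammarUri("file://mygrammars/birthdate.grxml", new Date(2010, 8, 22)));
// prints file://mygrammars/birthdate.grxml?SWI_vars.today=20100922
```

Zero-padding the month and day keeps the value at a fixed eight characters, which the grammar relies on when it compares dates as strings.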
The file birthdate.grxml itself would contain the following:
<rule id="BD" scope="public">
<ruleref uri="#BIRTHDATE" />
<tag>
today = SWI_vars.today ? SWI_vars.today : '20100630';
if (BIRTHDATE.date &gt; today) SWI_disallow=1;
</tag>
</rule>
<rule id="BIRTHDATE">
<tag>… ; date = …; </tag>
<!-- ... grammar elements and scripts that accept -->
<!-- ... dates and convert them to a string of the -->
<!-- ... form YYYYMMDD -->
</rule>
In this grammar, the BIRTHDATE rule does the work of recognizing the date and converting it into a numeric string. The root rule BD simply executes the script that sets SWI_disallow to 1 if the date is in the future.
Note several things about this example:
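- The &gt; entity represents the greater-than sign (>). Escaping it is a safe habit, since angle brackets have special meaning in XML markup; the less-than sign (<) must always be escaped, as &lt;, in character data such as the contents of a <tag> element (unless you use a CDATA section). See Escaped characters.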
- The construct today = SWI_vars.today ? SWI_vars.today : '20100630'; is used often when passing variables in with SWI_vars. It uses a default value for “today” for the case where the application has not provided a value in the URI. This is necessary to avoid a scripting error that would occur from trying to manipulate an unset value.
- Because today and BIRTHDATE.date are both fixed-length YYYYMMDD strings, the comparison is a string comparison, which orders them correctly. In other cases, you might treat them as integers.
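The logic in the BD rule's <tag> can be tried outside Recognizer by wrapping it in a plain function. This is a sketch: the function name isFutureBirthdate is hypothetical, and SWI_vars is passed in as an ordinary object, whereas in a grammar Recognizer supplies it from the URI.

```javascript
// The logic of the BD rule's <tag>, wrapped in a function so it can run
// outside Recognizer. SWI_vars is an ordinary object argument here.
function isFutureBirthdate(birthdate, SWI_vars) {
  // Use a default when the application did not pass SWI_vars.today.
  const today = SWI_vars.today ? SWI_vars.today : "20100630";
  // Equal-length YYYYMMDD strings compare in date order.
  return birthdate > today;
}

console.log(isFutureBirthdate("20101225", { today: "20100922" })); // true
console.log(isFutureBirthdate("19600130", {}));                    // false
// Without a fixed width, string order breaks down: "9" sorts after "10".
console.log("9" > "10");                                           // true
```

In the grammar, a true result corresponds to setting SWI_disallow=1.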
Protecting SWI_vars data
When a SWI_vars variable contains confidential information, use the swirec_sensitive_query_keys parameter to protect the data from unauthorized access.
By default, Recognizer logs grammar URIs including the variables and values. For example, each of the following URIs contains sensitive data that needs protection:
http://myserver/securityCode.grxml?SWI_vars.ssnum=123456789
http://myserver/birthdate.grxml?SWI_vars.bdate=01301960
http://myserver/password.grxml?sessionID=45678
You can protect any key/value pair in the URI query string. Above, ssnum and bdate are SWI_vars variables, but sessionID is not.
To protect the data, use swirec_sensitive_query_keys:
<param name="swirec_sensitive_query_keys">
<value>SWI_vars.ssnum</value>
<value>SWI_vars.bdate</value>
<value>sessionID</value>
</param>
With this setting, the call log contains URIs like these:
http://myserver/securityCode.grxml?SWI_vars.ssnum=_suppressed
http://myserver/birthdate.grxml?SWI_vars.bdate=_suppressed
http://myserver/password.grxml?sessionID=_suppressed
See swirec_sensitive_query_keys.
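The effect on the logged URI can be illustrated with a short JavaScript sketch. It only reproduces the key=_suppressed pattern shown in the call log; the function suppressSensitiveKeys is hypothetical and is not how Recognizer implements the parameter.

```javascript
// Illustration only, not Recognizer's implementation: replaces the value of
// each listed query key with "_suppressed", as in the call log above.
function suppressSensitiveKeys(uri, sensitiveKeys) {
  const q = uri.indexOf("?");
  if (q < 0) return uri; // no query string, nothing to mask
  const masked = uri
    .slice(q + 1)
    .split("&")
    .map((pair) => {
      const eq = pair.indexOf("=");
      const key = eq < 0 ? pair : pair.slice(0, eq);
      return sensitiveKeys.includes(key) ? key + "=_suppressed" : pair;
    });
  return uri.slice(0, q) + "?" + masked.join("&");
}

const keys = ["SWI_vars.ssnum", "SWI_vars.bdate", "sessionID"];
console.log(suppressSensitiveKeys("http://myserver/birthdate.grxml?SWI_vars.bdate=01301960", keys));
// prints http://myserver/birthdate.grxml?SWI_vars.bdate=_suppressed
```

Note that masking is by exact key name: a key that is not listed keeps its value in the log.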
Scoping of SWI_vars
The SWI_vars variable is passed to the specified grammar document only; its scope does not extend to any other grammar documents, even if they are referenced from the specified grammar. You can use the SWI_vars key with any grammar that uses the swi or W3C tag syntax.
To make the variable available in a subgrammar, you must explicitly pass the variable in the <ruleref> element that calls the subgrammar. The following example shows a line from a parent grammar that copies the bdate variable into the new context provided by the mydate.xml subgrammar:
<item>
<ruleref uri="mydate.xml#date?SWI_vars.bdate=SWI_vars.bdate"/>
</item>
The following example copies all its SWI_vars variables into the new context:
<item><ruleref uri="mydate.xml#date?SWI_vars=SWI_vars"/></item>
Passing UTF-8 characters in SWI_vars
To send a query string with UTF-8 encodings in SWI_vars, you must use double escapes for the UTF-8 characters. Recognizer removes the first level of escape from the URI, and the grammar ECMAScript removes the second level.
The following URI query string passes Hindi language characters. Here is the string with double escapes:
myGrammar.grxml?SWI_vars.myKey=%25e0%25a4%2596%25e0%25a4%25be%25e0%25a4%25a4%25e0%25a4%25be
Recognizer removes one level of escapes. Here is the single encoding of the same string:
myGrammar.grxml?SWI_vars.myKey=%e0%a4%96%e0%a4%be%e0%a4%a4%e0%a4%be
When the grammar receives the variable, its ECMAScript decodes the value as follows:
myKey = decodeURIComponent(SWI_vars.myKey);
Here is an example. This grammar (school.grxml) contains the vocabulary word école, whose é requires UTF-8 encoding when parameters are passed to the grammar. The example uses the before and after keys to return the single-escaped value (the result of Recognizer processing of the URI) and the value after the grammar ECMAScript removes the remaining escapes:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xml:lang="fr-CA" version="1.0" root="_root"
xmlns="http://www.w3.org/2001/06/grammar">
<meta name="swirec_compile_parser" content="1"/>
<rule id="_root" scope="public">
<item>
école
<tag>
before=SWI_vars.disallow;
after=decodeURIComponent(before);
</tag>
</item>
</rule>
</grammar>
When you run parsetool, the command line must use double escapes to pass the accented é character:
parsetool school.grxml\?SWI_vars.disallow=%25%43%33%25%41%39cole -test_file tst.utf8
Parsetool returns the following results.
<?xml version='1.0'?>
<result>
<interpretation grammar="ParseToolGrammar" confidence="100">
<input mode="speech">
école
</input>
<instance>
<before confidence="100">
%C3%A9cole
</before>
<after confidence="100">
école
</after>
<SWI_literal>
École
</SWI_literal>
<SWI_grammarName>
ParseToolGrammar
</SWI_grammarName>
<SWI_meaning>
{after:école before:%C3%A9cole}
</SWI_meaning>
</instance>
</interpretation>
</result>
To summarize, Recognizer translates the double-encoded SWI_vars string "%25%43%33%25%41%39cole" into the single-encoded "%C3%A9cole". The grammar’s ECMAScript then converts that string into the correct wide-character representation with a call to decodeURIComponent.
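The two escape levels can be reproduced with the standard ECMAScript functions encodeURIComponent and decodeURIComponent, for example in Node.js. This sketch assumes that Recognizer's unescaping step behaves like decodeURIComponent for this input, which is consistent with the parsetool output above.

```javascript
// Shows the two escape levels for the word in school.grxml.
const word = "école";

const singleEscaped = encodeURIComponent(word);          // "%C3%A9cole"
const doubleEscaped = encodeURIComponent(singleEscaped); // "%25C3%25A9cole"

// Level 1: removed by Recognizer when it processes the grammar URI.
const afterRecognizer = decodeURIComponent(doubleEscaped); // "%C3%A9cole"
// Level 2: removed by the grammar's ECMAScript (the "after" key).
const afterGrammar = decodeURIComponent(afterRecognizer);  // "école"

// The parsetool command line escapes every character of "%C3%A9", not only
// "%"; after one decoding step it yields the same single-escaped string.
console.log(decodeURIComponent("%25%43%33%25%41%39cole") === singleEscaped); // true
console.log(afterGrammar); // école
```

Both double-escaped forms are valid; encodeURIComponent produces the shorter one because it escapes only the "%" signs.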
Using SWI_vars to control conditional phrases
A typical use of SWI_vars is to control the SWI_disallow key. For example:
var mina = SWI_vars.minallowed
? SWI_vars.minallowed : '19000101';
var maxa = SWI_vars.maxallowed
? SWI_vars.maxallowed : '21991231';
if (date < mina || date > maxa ) {
SWI_disallow=1;
}
You could also use SWI_vars to control setting SWI_scoreDelta to bias against particular responses. (See SWI_scoreDelta.)
Keys available during recognition to ECMAScript scripts
These keys are returned by Recognizer, and are available during recognition for use by ECMAScript scripts in the loaded grammars.
As with SWI_vars, we have already seen some of these keys in our examples (see Using parseTool to verify ECMAScript).
SWI_literal
Nuance Recognizer and Dragon Voice support this key.
SWI_literal contains text representing the exact words that were recognized. This key is included in the swirec_extra_nbest_keys list by default.
The SWI_literal key is returned only after a rule has resulted in recognition. As a consequence, you can only use SWI_literal in a context outside the invoked rule in your GrXML file. For example, the following rule will not assign a value to the myvar variable:
<rule id="myrule">
word
<tag>myvar=SWI_literal </tag>
</rule>
Instead, you would have to create a sub-rule to recognize the word, invoke it from within the myrule rule, and assign the value from SWI_literal as follows:
<rule id="myrule">
<ruleref uri="#subrule" />
<tag>myvar=subrule.SWI_literal </tag>
</rule>
<rule id="subrule">
<item>word</item>
</rule>
SWI_literal is set for every rule in a grammar. In particular, it is set for the root rule, which leads to the following consequences:
- In the grammar, you can use the key in script computations.
- In an application, you can get the key from the recognition result to determine the raw text that was recognized.
SWI_literalConfidence
Nuance Recognizer supports this key. Dragon Voice does not.
SWI_literalConfidence measures the confidence that the words that matched the rule are correct. This is usually the same part of the utterance covered by the words in SWI_literal. SWI_literalConfidence is computed at the same time as SWI_literal, and is set for every rule in the grammar, including the root rule.
SWI_literalConfidence is intended to provide a more accurate confidence for keys whose values are determined from only a small part of the utterance (such as when there is a lot of filler speech in the utterance). For example, suppose the caller says “book a meeting at 10 am with John Smith, Harry Potter, and Harriet Vane,” and that we have a rule, Time, that captures “at 10 am.” SWI_literalConfidence for the Time rule supplies the confidence of “at 10 am” only, without being affected by the rest of the utterance.
SWI_literalConfidence is useful in cases such as these:
- When the grammar writer needs to use a confidence within a rule’s ECMAScript during its execution.
Example: In this example, the grammar writer chooses which value to return based on the SWI_literalConfidence of the rule with the ID AnotherRule:
<rule id="root" scope="public">
…
<item>
<ruleref uri="#OneRule"/> <ruleref uri="#AnotherRule"/>
<tag>
if ( AnotherRule.SWI_literalConfidence &lt;= 250 ) {
Value = OneRule.Content;
}
else {
Value = AnotherRule.Content;
}
</tag>
</item>
…
</rule>
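As with SWI_confidence, if the grammar writer needs the SWI_literalConfidence of a rule that is not referenced directly, the intermediate rule must store it and pass it back. Here, AnotherRule copies the SWI_literalConfidence of YetAnotherRule into its confidenceValue key: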
<rule id="root" scope="public">
…
<item>
<ruleref uri="#OneRule"/> <ruleref uri="#AnotherRule"/>
<tag>
if ( AnotherRule.confidenceValue &lt;= 250 ) {
Value = OneRule.Content;
ValueConfidence = OneRule.
}
else {
Value = AnotherRule.Content;
}
</tag>
</item>
…
</rule>
<rule id="AnotherRule">
…
<item>
… <ruleref uri="#YetAnotherRule"/> …
<tag>
…
confidenceValue = YetAnotherRule.SWI_literalConfidence;
…
</tag>
</item>
</rule>
- When a slot’s value is a list of several values concatenated, and we want a slot confidence for each item in the list.
Example:
<rule id="List" scope="public">
<item repeat="1-">
<ruleref uri="#ListItem"/>
<tag>
VALUE = (VALUE ? VALUE + ',' : '') + ListItem.VALUE;
CONFS = (CONFS ? CONFS + ',':'') + ListItem.SWI_literalConfidence;
</tag>
</item>
</rule>
<rule id="ListItem">
<one-of>
<item> john smith <tag>VALUE="John Smith"</tag> </item>
<item> harry potter <tag> VALUE="Harry Potter"</tag> </item>
<item> harriet vane <tag>VALUE="Harriet Vane"</tag> </item>
…
</one-of>
</rule>
If the spoken utterance is "John Smith, Harry Potter, Harriet Vane," this grammar will return the two slots:
- VALUE = "John Smith,Harry Potter,Harriet Vane"
- CONFS = "856,721,900"
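The accumulation in the List rule's <tag> can be traced with a small JavaScript sketch that runs the same two statements once per matched item. The function runListTag and the ListItem objects are hypothetical stand-ins for what Recognizer supplies, and the confidence values are the illustrative ones from the text.

```javascript
// Runs the List rule's <tag> script once per matched ListItem.
function runListTag(matches) {
  let VALUE; // unset before the first repetition
  let CONFS;
  for (const ListItem of matches) {
    VALUE = (VALUE ? VALUE + "," : "") + ListItem.VALUE;
    CONFS = (CONFS ? CONFS + "," : "") + ListItem.SWI_literalConfidence;
  }
  return { VALUE, CONFS };
}

const slots = runListTag([
  { VALUE: "John Smith", SWI_literalConfidence: 856 },
  { VALUE: "Harry Potter", SWI_literalConfidence: 721 },
  { VALUE: "Harriet Vane", SWI_literalConfidence: 900 },
]);
console.log(slots.VALUE); // John Smith,Harry Potter,Harriet Vane
console.log(slots.CONFS); // 856,721,900
```

Because the two lists are built in the same loop, the nth confidence in CONFS always belongs to the nth value in VALUE.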
SWI_spoken
Nuance Recognizer and Dragon Voice support this key.
This key is set in recognition results. It contains the exact text that was recognized.
The value is almost always identical to that of the SWI_literal key. The only difference arises when the text is written in a normalized form, which happens if the grammar was processed with a normalizer before being loaded.
Keys set to control n-best processing and confidence scoring
Set these keys in grammars to control and improve n-best processing and confidence scoring.
Nuance Recognizer supports these keys. Dragon Voice does not.
SWI_decoy
SWI_decoy enables the use of "decoy" items in grammars.
A decoy is a vocabulary item used to improve the rejection of out-of-vocabulary utterances in small-vocabulary grammars, thereby avoiding false recognitions.
When a defined decoy word tops the n-best list for recognition, it is returned with an overall confidence score of zero, and confidence scores for the individual keys returned are also set to 0. Assuming that the minimum confidence level required is greater than 0, this results in a rejection rather than a false recognition (also called a false acceptance).
Decoys can be used in many situations:
- Consider a grammar for a hot word, where people can interrupt a long prompt by speaking the appropriate command, and advance to the next segment of the prompt. Suppose the only vocabulary item in the grammar supporting this command is a single word, "skip".
With no competing vocabulary items, the probability of falsely recognizing "skip" increases. Since this is a magic word, it’s especially important to reject false candidates and avoid accidental barge-in. To avoid false recognitions, the application can activate a parallel grammar that recognizes noises and other words. This adds competing sounds to the vocabulary, and allows Recognizer to perform more refined comparisons before returning a result. If the result matches a decoy, the application does not interrupt the prompts.
- Decoys can be used for security purposes. For example, if a grammar is needed to recognize a password, there is a chance that any similar-sounding word will be recognized as that one item. To eliminate such an error, decoy items that sound similar to the password can be added to the grammar. If a decoy item is recognized, the utterance is rejected.
While decoys reduce the number of false acceptances, they also increase the level of false rejections (that is, rejections of utterances that should be accepted). The more accurate your grammar is to begin with, the less likely it is that decoys will reduce false acceptances more than they increase false rejections.
It is possible to fine-tune the sensitivity of a decoy by changing grammar weights. Typically, this work is provided through Nuance solutions services or by expert speech scientists.
The SWI_decoy key is a flag to indicate that a particular word or phrase in the grammar is a decoy. Possible values are 0 and 1 (zero means no decoy; 1 means treat the item as a decoy). In this example, the word “hello” is set as a decoy:
...
<item><tag>SWI_decoy='1'</tag>hello</item>
...
Note: SWI_decoy must be set on the root rule.
The following example shows a decoy grammar that recognizes words similar to a password (“skip”) defined in the root rule:
<!-- Note that SWI_decoy is set on the root rule. -->
<rule id="root" scope="public">
<one-of>
<item>skip</item> <!-- This is the password -->
<item>
<ruleref uri="#decoys"/>
<tag>SWI_decoy=1</tag>
</item>
</one-of>
</rule>
<rule id="decoys" scope="public">
<one-of>
<item>skid</item>
<item>skiff</item>
<item>skill</item>
<item>sip</item>
<item>skin</item>
<item>skim</item>
<item>skit</item>
<item>scope</item>
<item>escape</item>
<item>scoop</item>
<item>scab</item>
</one-of>
</rule>
If the user says a word that more closely matches any one of the words in the decoys rule than it does the word “skip”, then it will be rejected.
This may result in more false rejections of the word “skip”, but this increase in the false rejection rate will hopefully be small compared to the decrease in false acceptances, or at least tolerable given the importance of not falsely recognizing the password.
SWI_disallow
SWI_disallow lets you improve the accuracy of Recognizer by disallowing items from the n-best list, based on criteria you specify in a script.
For example, suppose your application collects departure and destination city names from the caller, and the two cities must never be the same. The following example shows how the SWI_disallow key can prevent Recognizer from accepting the departure city as the destination city:
<?xml version='1.0' encoding='UTF-8'?>
<grammar xml:lang="en-US" version="1.0" root="ROOT"
tag-format="swi-semantics/1.0"
xmlns="http://www.w3.org/2001/06/grammar">
<rule id="ROOT" scope="public">
<item>
from
<ruleref uri="#City"/>
<tag>DEPARTURE=City.CITY</tag>
to
<ruleref uri="#City"/>
<tag>DESTINATION=City.CITY</tag>
<tag>
if (DEPARTURE==DESTINATION) SWI_disallow=1
</tag>
</item>
</rule>
<rule id="City">
<one-of>
<item>
<tag>CITY='boston'</tag>
boston
</item>
<item>
<tag>CITY='seattle'</tag>
seattle
</item>
</one-of>
</rule>
</grammar>
When set to a zero value (character "0", integer 0, or boolean false), SWI_disallow has no effect. When set to a non-zero value, it invalidates the parse immediately (that is, the parse is removed from the n-best list).
You can also use SWI_disallow to prevent the recognition of an item that has previously been rejected by the caller. For an example, see the discussion of swirec_grammar_script.
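As a conceptual model only, the pruning can be pictured as a filter over the n-best list. Recognizer does this internally; the functions isDisallowed and pruneNbest are hypothetical, the entries are made-up parses of the departure/destination grammar above, and treating undefined or null as "unset" is an assumption.

```javascript
// Conceptual model of how SWI_disallow prunes the n-best list.
function isDisallowed(value) {
  // Leaving the key unset has no effect (undefined/null: assumption).
  if (value === undefined || value === null) return false;
  // "0", 0, and false have no effect; any other value invalidates the parse.
  return !(value === "0" || value === 0 || value === false);
}

function pruneNbest(nbest) {
  return nbest.filter((parse) => !isDisallowed(parse.SWI_disallow));
}

const nbest = [
  { DEPARTURE: "boston", DESTINATION: "boston", SWI_disallow: 1 },
  { DEPARTURE: "boston", DESTINATION: "seattle" },
];
console.log(pruneNbest(nbest).length); // 1
```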
SWI_safeKey
Grammar developers can use SWI_safeKey to pass non-confidential recognition results to log files even if security settings are enabled.
The SWI_safeKey is useful for systems that set swirec.secure_context to protect confidential data. Typically, grammar developers use the key to pass partial recognition results to log files when passing the whole result might be a security risk. For example, it can pass several digits of a recognized credit card number, but not the whole number. In the logs, the data appears with the SAFEK token in the SWIrcnd event.
To implement a data pass-through (also known as partial masking) in a grammar, set an ECMAScript variable (or object) named SWI_safeKey in the <tag> portion of the root grammar rule. Recognizer converts the variable to a string, and writes the result to the SAFEK token.
The following example takes the credit_card value (a sixteen-digit string) and assigns the last four digits to the last4digits variable in the SWI_safeKey object:
SWI_safeKey = new Object();
SWI_safeKey.last4digits = credit_card.substr(12, 4);
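Those two lines can be checked outside Recognizer. This sketch uses the sixteen-digit number from the call-log example below as a stand-in for credit_card.

```javascript
// The two lines above, run outside Recognizer with a sample number.
var credit_card = "5093340521027901";

var SWI_safeKey = new Object();
SWI_safeKey.last4digits = credit_card.substr(12, 4); // characters 12 through 15

console.log(SWI_safeKey.last4digits); // 7901
// slice(-4) gives the same result and does not assume a sixteen-digit number.
console.log(credit_card.slice(-4));   // 7901
```

substr is a legacy ECMAScript method that engines still support; slice is the standard alternative.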
The next example sets SWI_safeKey as a simple variable and adds surrounding context. It defines _creditcard as the root rule and sets the key in that rule’s <tag>:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/06/grammar
http://www.w3.org/TR/speech-grammar/grammar.xsd"
xml:lang="en-US" mode="voice" root="_creditcard">
<!-- Fragment of a credit card grammar -->
<rule id="_creditcard" scope="public">
<item> ... </item>
<ruleref uri="#CREDITCARD"/>
<tag>
SWI_meaning=CREDITCARD.account_num;
SWI_safeKey = CREDITCARD.account_num.substr(12, 4);
...
Recognizer returns the value of SWI_safeKey regardless of security settings (swirec.secure_context). This example shows a fragment of an SWIrcnd event in the call log when security is set to open:
TIME=20100923111612272|CHAN=myrec|EVNT=SWIrcnd|RSTT=ok|RSLT=5093340521027901|SPOK=five zero nine three three four zero five two one zero two seven nine zero one|SAFEK=7901|KEYS=<SWI_safeKey conf="0">7901</SWI_safeKey><CARDTYPE conf="779">mastercard</CARDTYPE><MEANING conf="779">5093340521027901</MEANING>
This example shows the same recognition with security set to suppressed:
TIME=20100927161841936|CHAN=myrec|EVNT=SWIrcnd|SECURE=TRUE|RSTT=ok|RSLT=_SUPPRESSED|SPOK=_SUPPRESSED|SAFEK=7901|KEYS=_SUPPRESSED
SWI_scoreDelta
SWI_scoreDelta lets you improve recognition accuracy by assigning weights to recognition items, based on how frequently you expect each item to be used.
The SWI_scoreDelta value is added to the recognition raw score before confidence scores are computed. A positive value increases the score to make it more likely that the item will be recognized, while a negative SWI_scoreDelta decreases the score so recognition is less likely. If you use SWI_scoreDelta in a grammar, the key is also returned with the recognition results.
For example, if you have a grammar consisting of many cities, you could assign higher SWI_scoreDelta values to cities you expect to be spoken more often:
<rule id="ROOT">
<one-of>
<item>
<tag>CITY='new york'; SWI_scoreDelta=80</tag>
new york
</item>
<item>
<tag>CITY='newark'; SWI_scoreDelta=-80</tag>
newark
</item>
</one-of>
</rule>
In this example, recognition would be weighted in favour of the interpretation “new york”, and against the interpretation “newark”.
Note: If used, SWI_scoreDelta must be set on the root rule.
For American English, typical SWI_scoreDelta values range from a maximum of 100 to a minimum of -100. The score range differs in other languages: before you use SWI_scoreDelta in another language, it is strongly recommended that you contact Nuance Network.
Keys set by Recognizer and available in recognition result
The Recognizer sets these keys during recognition, and returns them in the recognition result. Unless noted otherwise, these keys are not accessible within a grammar.
SWI_attributes
Nuance Recognizer supports this key. Dragon Voice does not.
The SWI_attributes key contains all attributes that are in the XML result. The purpose of this key is to provide the attributes as character data, to ensure that slot confidence scores are not lost.
Recognizer returns an XML-formatted recognition result, containing slot confidence scores expressed as XML attributes. Voice platforms typically transform this result into an ECMAScript object that is available to the voice application. However, there is no general standard for platforms to make this transformation, and some platforms omit some information contained in the XML attributes.
Because advanced natural language applications rely on slot confidences, the SWI_attributes key ensures that this information remains available in the ECMAScript objects returned by your voice platform. By representing slot confidence as character data in the recognition result, the information usually survives any platform transformations.
The recognition result has different XML formats, depending on the media type requested by the platform. The examples below provide a comparison. This example shows a media type "application/x-vnd.speechworks.emma+xml" recognition result. Note that the confidence values are expressed both as attributes of the city and state elements and as part of the SWI_attributes key.
<instance>
<location>
<city confidence="82">
Aachen
</city>
<state confidence="73">
NRW
</state>
</location>
<SWI_attributes>
<location>
<city>
<confidence>
82
</confidence>
</city>
<state>
<confidence>
73
</confidence>
</state>
</location>
</SWI_attributes>
</instance>
This example shows a media type "application/x-vnd.speechworks.recresult+xml" recognition result that includes the SWI_attributes key. Note that the confidence values are expressed both as attributes of the city and state elements and as part of the SWI_attributes key:
<instance grammar="ParseToolGrammar">
<location>
<city conf="0.82">Aachen</city>
<state conf="0.73">NRW</state>
</location>
<SWI_attributes>
<location>
<city>
<conf>
0.82
</conf>
</city>
<state>
<conf>
0.73
</conf>
</state>
</location>
</SWI_attributes>
</instance>
The following is a generic example of the SWI_attributes key. This example does not represent an actual recognition result; instead, it shows the general case for expressing attributes within the SWI_attributes key:
<instance>
<location attr1="value1">
<city attr2="value2" attr3="value3">
Aachen
</city>
<state attr2="value2">
NRW
</state>
</location>
<SWI_attributes>
<location>
<attr1>value1</attr1>
<city>
<attr2>value2</attr2>
<attr3>value3</attr3>
</city>
<state>
<attr2>value2</attr2>
</state>
</location>
</SWI_attributes>
</instance>
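The general case above can be sketched as a recursive transformation in JavaScript. This is an illustration under stated assumptions, not Recognizer's algorithm: the node shape { name, attrs, children } and the functions toSwiAttributes and serialize are hypothetical, elements with no attributes anywhere beneath them are assumed to be omitted, and values are not XML-escaped.

```javascript
// Converts attributes into child elements carrying the values as character data.
function toSwiAttributes(node) {
  const out = { name: node.name, children: [] };
  // Each attribute becomes a child element holding its value.
  for (const [attrName, value] of Object.entries(node.attrs || {})) {
    out.children.push({ name: attrName, text: value });
  }
  // Recurse into child elements, keeping only those that carry attributes.
  for (const child of node.children || []) {
    const converted = toSwiAttributes(child);
    if (converted.children.length > 0) out.children.push(converted);
  }
  return out;
}

function serialize(node) {
  const inner = node.text !== undefined ? node.text : node.children.map(serialize).join("");
  return "<" + node.name + ">" + inner + "</" + node.name + ">";
}

const example = {
  name: "location",
  attrs: { attr1: "value1" },
  children: [
    { name: "city", attrs: { attr2: "value2", attr3: "value3" } },
    { name: "state", attrs: { attr2: "value2" } },
  ],
};
console.log("<SWI_attributes>" + serialize(toSwiAttributes(example)) + "</SWI_attributes>");
```

The output mirrors the SWI_attributes element of the generic example, without its indentation.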
SWI_bestModel
Nuance Recognizer supports this key. Dragon Voice does not.
The SWI_bestModel key contains the full path name of the recognition model used to recognize the current n-best entry.
SWI_confidence
Nuance Recognizer supports this key. Dragon Voice does not.
This key is always available in the recognition result. You can set it explicitly in the root rule of a grammar. Otherwise, the Recognizer sets it.
This key shows the acoustic confidence of each recognized word. The acoustic confidence is different from normal sentence and slot confidences. It shows the match of speech models to the utterance, but does not compare the match to other sentences on the n-best list.
Note: To include this key in the XML result, you must add it to the swirec_extra_nbest_keys list, and enable the swirec_word_confidence_enabled parameter.
SWI_grammarName
Nuance Recognizer supports this key. Dragon Voice does not.
The SWI_grammarName key contains the name of the grammar as set during grammar activation. This key is useful when more than one grammar is active during a recognition; it makes it possible to identify which grammar parsed each result on the n-best list.
This key is included in the swirec_extra_nbest_keys list by default.
SWI_literalTimings
Nuance Recognizer supports this key. Dragon Voice does not.
The SWI_literalTimings key contains information about the beginning and ending times (relative to the start of speech in the audio buffer) of the words recognized.
This key returns the literal string with time markings for each word:
<alignment type="word" unit_msecs="1">
<word start="50" end="520"> one </word>
<word start="600" end="1190"> two </word>
<word start="1350" end="1980"> three </word>
</alignment>
Since there may be silence between words, the end time of a word often does not equal the start time of the next word.
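For example, an application could measure the pauses between words from these timings. This JavaScript sketch assumes the platform has already parsed the XML; the words array is transcribed by hand from the example above, and the function interWordGaps is hypothetical.

```javascript
// Word timings from the SWI_literalTimings example above
// (milliseconds, relative to the start of speech).
const words = [
  { text: "one", start: 50, end: 520 },
  { text: "two", start: 600, end: 1190 },
  { text: "three", start: 1350, end: 1980 },
];

// Gap between the end of each word and the start of the next.
function interWordGaps(ws) {
  const gaps = [];
  for (let i = 1; i < ws.length; i++) {
    gaps.push(ws[i].start - ws[i - 1].end);
  }
  return gaps;
}

console.log(interWordGaps(words)); // [ 80, 160 ]
```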
Note: You can use this key, but it is best to use its replacement instead. The newer SWI_wordTimings key provides additional timings for silence and garbage words.
When swirec_word_confidence_enabled is enabled, the XML recognition result will also include the confidence score. For example:
<SWI_literalTimings>
<alignment type="word" unit_msecs="1">
<word start="0" end="430" confidence="1.00">
Jeff
</word>
<word start="430" end="1170" confidence="1.00">
Bond
</word>
</alignment>
</SWI_literalTimings>
To learn about setting the swirec_word_confidence_enabled parameter, see swirec_word_confidence_enabled.
When there are multiple parses for a single n-best entry, this key is returned for the first parse only. See Multiple parses.
SWI_meaning
Nuance Recognizer and Dragon Voice support this key.
The SWI_meaning key contains the semantic meaning of a recognized phrase. It can only be set for the root rule. This key is included in the swirec_extra_nbest_keys list by default, so it will appear in the XML result if your grammar sets this key.
As shown in our Example of scripts in a grammar, SWI_meaning filters out redundant answers so that entries on the n-best list are truly distinct. Eliminating redundancy improves confidence scores, and improves usefulness of the n-best list.
When one recognized phrase is similar to another in the grammar, it will often have a low confidence score, because Recognizer is unsure which phrase is correct. When SWI_meaning is used properly, Recognizer groups redundant interpretations into the same slot on the n-best list. In the example below, SWI_meaning is set to “direct calls home” whether the recognized phrase is “send calls home” or “direct calls to my home”.
Without SWI_meaning, the grammar might produce the following n-best list:
N | Text
1 | direct my calls to my car phone
2 | direct calls to my car
3 | send calls home
4 | please send my calls to the office
5 | send my calls to the office
6 | direct calls to my home
When SWI_meaning is used, Recognizer arranges the n-best list by the meaning of the interpretation rather than the exact phrase spoken, so that entries on the n-best list are truly distinct:
N | Text | Top-level SWI_meaning key
1 | direct my calls to my car phone | direct calls car
  | direct calls to my car | direct calls car
2 | send calls home | direct calls home
  | direct calls to my home | direct calls home
3 | please send my calls to the office | direct calls work
  | send my calls to the office | direct calls work
Recognizer sets SWI_meaning automatically, even if it is not explicitly set in a script within the grammar:
If SWI_meaning is not explicitly defined on the root rule, Recognizer constructs it by concatenating all the keys defined in the root and their values (except keys beginning with SWI_, such as SWI_literal). The key/value pairs are first sorted alphabetically. The reasoning is that, as far as the application is concerned, the set of keys returned is the sentence’s meaning.
If there are no keys, the results depend on whether you are using SISR or SWI semantics. With SISR, the SWI_meaning key is not set if there are no keys. In contrast, with SWI semantics, SWI_meaning is set to the following:
{SWI_literal:<literal>}
If SWI_meaning is an object, it is converted to a string representation.
While the application can access SWI_meaning, it is more often the case that it will use other key/value pairs defined specifically for it.
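The default construction can be approximated in JavaScript for flat string values. The "{key:value key:value}" layout is taken from the parsetool output earlier in this section; the function defaultMeaning is hypothetical, using JavaScript's default code-unit sort as "alphabetical" is an assumption, and nested objects and the no-keys case are not modeled.

```javascript
// Approximation of the default SWI_meaning for flat string values only.
function defaultMeaning(rootKeys) {
  const parts = Object.keys(rootKeys)
    .filter((key) => !key.startsWith("SWI_")) // SWI_ keys are excluded
    .sort() // alphabetical (code-unit) order: an assumption
    .map((key) => key + ":" + rootKeys[key]);
  return "{" + parts.join(" ") + "}";
}

console.log(defaultMeaning({ before: "%C3%A9cole", after: "école", SWI_literal: "école" }));
// prints {after:école before:%C3%A9cole}
```

Sorting before concatenating means two parses that set the same keys to the same values produce the same SWI_meaning, whatever order the scripts assigned them in.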
SWI_rawScore
Nuance Recognizer supports this key. Dragon Voice does not.
The SWI_rawScore key contains the raw score for an interpretation—a measure of how close the utterance is to Recognizer’s internal statistical model.
When matching vocabulary items to an utterance, Recognizer assigns raw scores to each match. In subsequent processing, Recognizer compares the relative merits of all the matches, creates an n-best list, and assigns confidence scores for each entry on the list. The raw score measures a pure match between the utterance and a single vocabulary item, while the confidence score considers statistical possibilities of what might have been spoken.
This read-only key is set for each entry in the n-best list. Recognizer sets this key for return in the recognition result, and the key is not accessible within the grammar itself.
SWI_semanticSource
Nuance Recognizer supports this key. Dragon Voice does not.
This read-only key appears in the recognition results when you use statistical semantic models.
The key is set for each slot of each recognized sentence. It takes one of two possible values:
Value | Description
ssm | The slot was assigned by the classifier.
ssm_feature | The slot was assigned by ECMAScript in a feature rule.
SWI_ssmConfidences
Nuance Recognizer supports this key. Dragon Voice does not.
This read-only key appears in the recognition results when you use a statistical semantic model. It is a list of confidence scores, which are set for each recognized sentence. This key is included in the swirec_extra_nbest_keys list by default, so it will appear in the XML result if your grammar sets this key.
Each score corresponds to a semantic label in SWI_ssmMeanings. The list elements are separated with a double colon:
score1::score2::score3
SWI_ssmMeanings
Nuance Recognizer supports this key. Dragon Voice does not.
This read-only key appears in the recognition results when you use a statistical semantic model. It is a list of semantic labels associated with the n-best list of sentences, and it is set for each recognized sentence. This key is included in the swirec_extra_nbest_keys list by default, so it will appear in the XML result if your grammar sets this key.
The list elements are separated with a double colon:
label1::label2::label3
For example, if the sentence is “my bill has a charge I don’t understand”, the SSM might have three hypotheses for its meaning: billing, billing_overviews, and tech_support. SWI_ssmConfidences provides the confidence score for each hypothesis.
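SWI_ssmMeanings and SWI_ssmConfidences are parallel lists joined by a double colon, so an application can split both lists and pair each label with its score. The following JavaScript is a minimal sketch. The function name pairSsmHypotheses is hypothetical, and the sketch assumes that both lists have the same length and that each score parses as a number. The sample scores are invented for illustration and are not real Recognizer output.

```javascript
// Hypothetical helper: pair SSM semantic labels with their confidence scores.
// meanings:    SWI_ssmMeanings value, e.g. "label1::label2::label3"
// confidences: SWI_ssmConfidences value, e.g. "score1::score2::score3"
function pairSsmHypotheses(meanings, confidences) {
  const labels = meanings === "" ? [] : meanings.split("::");
  const scores = confidences === "" ? [] : confidences.split("::").map(Number);
  if (labels.length !== scores.length) {
    throw new Error("SWI_ssmMeanings and SWI_ssmConfidences differ in length");
  }
  // Position i in one list corresponds to position i in the other.
  return labels.map((label, i) => ({ label: label, confidence: scores[i] }));
}

// Illustrative input only: the scores below are made up.
const hypotheses = pairSsmHypotheses(
  "billing::billing_overviews::tech_support",
  "0.71::0.20::0.09"
);

// Pick the highest-scoring hypothesis (does not assume the lists are sorted).
const top = hypotheses.reduce((a, b) => (b.confidence > a.confidence ? b : a));
// top.label is "billing"
```

Using reduce instead of taking the first element avoids relying on any ordering of the lists, which the key description does not specify.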
SWI_utteranceSNR
Nuance Recognizer and Dragon Voice support this key.
This read-only key is in the recognition results for each utterance. It contains an estimate of the utterance’s signal-to-noise ratio (SNR) in decibels.
Applications can use the SNR to assess the line noise of the telephone signal. For example, the application might turn off barge-in for a noisy line, or transfer the call to a human agent instead of attempting subsequent recognitions.
The SNR value is an estimate, not an exact measurement. The estimate is influenced by the speech sounds in the utterance (vowels, stops, fricatives, and so on) and varies considerably with endpointer accuracy and the contents of each utterance. Consequently, do not base decisions on the SNR alone. Instead, use this key together with other indicators, such as repeated recognition failures or frequent retries during the telephone call.
As a general guideline, SNR values between 22.0 and 35.0 dB indicate a signal reliable enough for accurate recognition. In noisy environments, however, applications must set confidence thresholds carefully, because confidence scores are often more sensitive to noise than recognition accuracy is.
This key is set in the recognition result. It is not accessible within the grammar itself.
If the recognition engine cannot compute the SNR for an utterance, the key value is an empty string.
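The advice to combine SNR with other indicators can be sketched as a small decision helper. This is a minimal JavaScript sketch, not a Nuance API. The function isLineLikelyNoisy is hypothetical, the failure count is assumed to be tracked by the application, and the default of 2 failures is an arbitrary illustrative threshold. The 22.0 dB default comes from the low end of the guideline range above. An empty SWI_utteranceSNR value is treated as "no information" rather than as 0 dB.

```javascript
// Hypothetical helper: flag a probably noisy line from SWI_utteranceSNR
// plus an application-tracked failure count, never from SNR alone.
// snrText:        SWI_utteranceSNR value in dB, or "" when it was not computed
// failedAttempts: recognition failures/retries the application has counted
// minSnrDb:       22.0 dB, the low end of the guideline range
// maxFailures:    2 is an arbitrary illustrative threshold; tune per application
function isLineLikelyNoisy(snrText, failedAttempts, minSnrDb = 22.0, maxFailures = 2) {
  const text = snrText == null ? "" : String(snrText).trim();
  if (text === "") {
    return false; // SNR unavailable: make no decision based on it
  }
  const snr = Number(text);
  if (!Number.isFinite(snr)) {
    return false; // unparseable value: treat it like "unavailable"
  }
  // Both conditions must hold: low SNR and repeated failures.
  return snr < minSnrDb && failedAttempts >= maxFailures;
}
```

An application might call this after each recognition and, when it returns true, turn off barge-in or transfer the caller to an agent. Those actions are described above, but how to perform them depends on your platform.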
SWI_wordTimings
Nuance Recognizer supports this key. Dragon Voice does not.
The SWI_wordTimings key measures where recognized words start and end, returns a structured XML recognition result that a VoiceXML application can parse, and provides confidence scores for individual words if you activate swirec_word_confidence_enabled.
This key returns the literal string with time markings for each word, and for silence and garbage between words. The measurements, in milliseconds, are relative to the start of speech in the audio buffer delivered to Recognizer. If you save the recognized waveform, you can use the timings to index into that audio.
The timings require adjustment if you index into the original audio submitted to Recognizer (for example, an RTP stream). If an endpointer removed leading silence during processing, after receiving the RTP stream and before sending the audio to Recognizer, the original audio begins with silence that Recognizer never measured.
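The adjustment is an offset followed by a unit conversion. The following minimal JavaScript sketch uses a hypothetical helper named timingToSampleIndex. It assumes two things: the application knows how much leading silence the endpointer removed, and the audio is sampled at 8000 Hz, which is typical of narrowband telephony. Substitute your stream's actual rate.

```javascript
// Hypothetical helper: convert a SWI_wordTimings time (ms from the start of the
// audio Recognizer received) into a sample index in the original stream.
// leadingTrimMs: leading silence removed by the endpointer before Recognizer saw
//                the audio; the application must know this, the timings omit it
// sampleRateHz:  assumed 8000 (narrowband telephony); use your stream's real rate
function timingToSampleIndex(timingMs, leadingTrimMs = 0, sampleRateHz = 8000) {
  const originalMs = timingMs + leadingTrimMs; // shift back onto the original timeline
  return Math.round((originalMs * sampleRateHz) / 1000); // ms -> samples
}
```

For example, a word starting at 930 ms maps to sample 7440 when no silence was trimmed, or to sample 9440 if the endpointer removed 250 ms of leading silence.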
Here is an example XML recognition result:
<SWI_wordTimings>
  <alignment>
    <type>word</type>
    <version>1.0.0</version>
    <unit_msecs>1</unit_msecs>
    <num_segments>9</num_segments>
    <segments>
      <seg0000>
        <type>silence</type>
        <start>0</start>
        <end>930</end>
      </seg0000>
      <seg0001>
        <type>word</type>
        <start>930</start>
        <end>1290</end>
        <confidence>0.00</confidence>
        <literal>one</literal>
      </seg0001>
      <seg0002>
        <type>word</type>
        <start>1290</start>
        <end>1500</end>
        <confidence>0.00</confidence>
        <literal>one</literal>
      </seg0002>
      <seg0003>
        <type>silence</type>
        <start>1500</start>
        <end>1560</end>
      </seg0003>
      <seg0004>
        <type>word</type>
        <start>1560</start>
        <end>2090</end>
        <confidence>0.00</confidence>
        <literal>one</literal>
      </seg0004>
      <seg0005>
        <type>silence</type>
        <start>2090</start>
        <end>2410</end>
      </seg0005>
      <seg0006>
        <type>word</type>
        <start>2410</start>
        <end>2870</end>
        <confidence>0.00</confidence>
        <literal>six</literal>
      </seg0006>
      <seg0007>
        <type>word</type>
        <start>2870</start>
        <end>3370</end>
        <confidence>0.00</confidence>
        <literal>four</literal>
      </seg0007>
      <seg0008>
        <type>silence</type>
        <start>3370</start>
        <end>4010</end>
      </seg0008>
    </segments>
  </alignment>
</SWI_wordTimings>
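An application that parses this result usually wants a flat list of segments. The following JavaScript is a minimal regex-based sketch. The function parseWordTimings is hypothetical, and the sketch assumes the flat layout shown above, with one child element per field. It is not a general XML parser, and a real application would more likely use its platform's XML facilities.

```javascript
// Minimal regex-based sketch of reading SWI_wordTimings; it assumes the flat
// <segNNNN> layout shown above and is not a general-purpose XML parser.
function parseWordTimings(xml) {
  // Text of the first <name>...</name> child in a segment body, or null.
  const field = (body, name) => {
    const m = body.match(new RegExp("<" + name + ">([\\s\\S]*?)</" + name + ">"));
    return m ? m[1].trim() : null;
  };
  // <seg0000> ... </seg0000>; the backreference \1 pairs each open/close tag.
  // "<segments>" is not matched because "seg" must be followed by digits.
  const segPattern = /<seg(\d+)>([\s\S]*?)<\/seg\1>/g;
  const segments = [];
  let seg;
  while ((seg = segPattern.exec(xml)) !== null) {
    const body = seg[2];
    const confidence = field(body, "confidence");
    segments.push({
      index: Number(seg[1]),                // "0007" -> 7
      type: field(body, "type"),            // "word", "silence", or "garbage"
      start: Number(field(body, "start")),  // milliseconds
      end: Number(field(body, "end")),      // milliseconds
      confidence: confidence === null ? null : Number(confidence), // words only
      literal: field(body, "literal"),      // null for silence and garbage
    });
  }
  return segments;
}
```

For example, parseWordTimings(xml).filter((s) => s.type === "word") keeps only the recognized words. Their start and end values can then index into a saved waveform.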
When there are multiple parses for a single n-best entry, this key is returned for the first parse only. See Multiple parses.
To model random or unpredictable speech, grammars reference the special garbage rule: <ruleref special="GARBAGE"/>. SWI_wordTimings reports speech matched by a garbage rule as a segment of type garbage. For example:
<seg0007>
  <type>garbage</type>
  <start>2870</start>
  <end>3370</end>
</seg0007>
If you define pronunciations for a multi-word phrase in a user dictionary, SWI_wordTimings writes output for the whole phrase. For example, a financial application might define "at the market" as a single word. Because Recognizer does not know the boundaries between the words of such a phrase, SWI_wordTimings writes the output as follows:
- The first word segment measures the start and end of the whole phrase.
- Subsequent word segments have zero length (the start and end time are equal).
For example, you might see:
...
<seg0000>
  <type>silence</type>
  <start>0</start>
  <end>930</end>
</seg0000>
<seg0001>
  <type>word</type>
  <start>930</start>
  <end>1650</end>
  <confidence>1.00</confidence>
  <literal>at</literal>
</seg0001>
<seg0002>
  <type>word</type>
  <start>1650</start>
  <end>1650</end>
  <confidence>1.00</confidence>
  <literal>the</literal>
</seg0002>
<seg0003>
  <type>word</type>
  <start>1650</start>
  <end>1650</end>
  <confidence>1.00</confidence>
  <literal>market</literal>
</seg0003>
...
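If an application needs one timed span per dictionary phrase, it can fold the zero-length word segments into the word that precedes them. This JavaScript is a minimal sketch built on a heuristic. The function mergePhraseSegments is hypothetical, and its input is assumed to be an array of objects with type, start, end, and literal fields, which your own parsing code would produce. The sketch treats any zero-length word that begins exactly where the previous word ends as part of that word's phrase.

```javascript
// Hypothetical helper: fold zero-length word segments into the preceding word,
// so a user-dictionary phrase such as "at the market" becomes one timed span.
// Input: [{ type, start, end, literal }, ...] in result order.
function mergePhraseSegments(segments) {
  const merged = [];
  for (const seg of segments) {
    if (seg.type !== "word") continue; // skip silence and garbage segments
    const prev = merged[merged.length - 1];
    const zeroLength = seg.start === seg.end;
    if (zeroLength && prev && prev.end === seg.start) {
      prev.literal += " " + seg.literal; // extend the phrase text, keep its timing
    } else {
      // Copy into a new object so the caller's input is not modified.
      merged.push({ literal: seg.literal, start: seg.start, end: seg.end });
    }
  }
  return merged;
}

const phrase = mergePhraseSegments([
  { type: "silence", start: 0, end: 930 },
  { type: "word", start: 930, end: 1650, literal: "at" },
  { type: "word", start: 1650, end: 1650, literal: "the" },
  { type: "word", start: 1650, end: 1650, literal: "market" },
]);
// phrase → [{ literal: "at the market", start: 930, end: 1650 }]
```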
Using SWI_ keys in grammars with W3C script syntax
To provide access to SWI_ keys in grammars written with W3C script syntax, Nuance provides the ECMAScript variables SWI and SWIrules.
Like the out variable, the SWI variable is an object that is created automatically before the first semantic script in a rule executes. Your scripts inside the rule can set SWI_ keys on the SWI object.
For example:
SWI.SWI_meaning = "blue"
The SWIrules variable works like the rules variable: you can use it to access any of the SWI_ keys that were set on the SWI object of a previously referenced rule:
- SWIrules.rulename holds the value of the SWI variable of a referenced rule.
- SWIrules.latest() holds the value of the SWI variable of the last referenced rule.
As with swi syntax grammars, SWI_confidence, SWI_literal, and SWI_spoken are set automatically for every rule. Because their values exist only after the rule has resulted in a positive recognition, you can read them only from outside that rule in your GrXML file, for example through SWIrules in a rule that references it.
For example:
<rule id="myrule">
  <ruleref uri="#subrule" />
  <tag>
    out.mytag=SWIrules.subrule.SWI_literal;
    out.myconf=SWIrules.subrule.SWI_confidence;
    out.myspoken=SWIrules.subrule.SWI_spoken;
  </tag>
</rule>
<rule id="subrule">
  ... <!-- Defines the words to be recognized -->
</rule>
For comparison, the following example is not correct, because it reads SWI.SWI_literal, SWI.SWI_confidence, and SWI.SWI_spoken inside myrule itself:
<rule id="myrule">
  yyy
  <tag>
    out.x=SWI.SWI_literal;
    out.y=SWI.SWI_confidence;
    out.z=SWI.SWI_spoken;
  </tag>
</rule>
The following grammar illustrates the use of SWI and SWIrules in W3C script syntax:
<?xml version='1.0' encoding='UTF-8'?>
<grammar xml:lang="en-US" version="1.0" root="ROOT"
         tag-format="semantics/1.0"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="A">
    indigo
    <tag>
      SWI.SWI_meaning = "blue";
      out = "lovely";
    </tag>
  </rule>
  <rule id="ROOT">
    <ruleref uri="#A"/>
    elephant
    <tag>
      out.animal = "weird";
      out.other = rules.A.out;
      out.color = SWIrules.A.SWI_literal;
      SWI.SWI_meaning = SWIrules.latest().SWI_meaning;
      SWI.SWI_decoy = 1;
    </tag>
  </rule>
</grammar>
The XML result for a parse of “indigo elephant” would contain the following semantic elements (here shown in the application/x-vnd.speechworks.emma+xml format):
...
<instance>
  <animal confidence="98">weird</animal>
  <other confidence="98">lovely</other>
  <color confidence="98">indigo</color>
  <SWI_meaning>blue</SWI_meaning>
  <SWI_decoy>1</SWI_decoy>
  <SWI_literal>Indigo elephant</SWI_literal>
</instance>
...
Note: This example assumes that swirec_extra_nbest_keys is set to SWI_meaning, SWI_decoy and SWI_literal.
Using SWIjsPrint to debug ECMAScript
In addition to the -debug_output tool, which shows the results of each script execution, you can call the SWIjsPrint function inside your grammar ECMAScript to show intermediate results. The function writes the value of its argument to the diagnostic log file under tag 4516, which is named SWIjsPrint in the default tag map file.
For example:
A = "hello";
B = A + " world";
SWIjsPrint(A);
SWIjsPrint(B);
This script produces the following output when tag 4516 is turned on:
Dec 31 16:57:14.56| 2520| 0||| SWIjsPrint|| hello
Dec 31 16:57:14.57| 2520| 0||| SWIjsPrint|| hello world