Rule-based entities

An entity with the rule-based collection method defines a set of values based on a GrXML grammar file.

While regular expressions can be useful for matching short alphanumeric patterns in text-based input and formatted speech recognition results, grammars are more useful for matching multi-word patterns in spoken user inputs. A grammar uses rules to systematically describe all the ways users could express values for an entity.

Creating rule-based entities

To create an entity using the rule-based collection method:

  1. Prepare the grammar file. See Understanding grammar files and GrXML file rules below for more details on filename conventions and the required format of the file.

  2. (As required) In Mix.nlu, select the language from the menu near the project name. (GrXML files are language-specific.)

  3. Create a new entity and name it appropriately, keeping in mind the naming requirements described in GrXML file rules.

  4. Choose as the Type: Rule-based.

  5. Browse to upload the grammar file that you have prepared.

  6. Click Download project and save rule-based entity.

  7. If your project includes multiple languages, upload separate grammar files, one for each language. See the note below.

Before the new entity is saved (or modified), Mix.nlu exports your existing NLU model to a ZIP file (one ZIP file per language) so that you have a backup of your NLU model. Creating (or modifying) a rule-based entity requires your NLU model to be retokenized, which may take some time and impact your existing annotations. You receive a message when the entity is saved successfully.

At any time you can use the download button to view the contents of the GrXML file.

Tips and considerations for rule-based entities

Note that in the Mix NLUaaS runtime, GrXML files are used to help parse entity values from within speech recognition text rather than for supporting speech recognition directly. Because of this, you should not use any speech-specific rules in the grammar file such as you might use in Nuance Recognizer. For example, do not include a <ruleref special="GARBAGE"/> rule in your GrXML file. Such rules do not have any meaning in Mix and may cause your Mix build to fail.
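
As a pre-upload check, the following minimal sketch scans a grammar for the GARBAGE special rule mentioned above. The function name find_garbage_rulerefs is hypothetical and not part of any Mix tooling; the sketch assumes the grammar is well-formed XML and reports only the one special rule named in this section.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace such as {http://www.w3.org/2001/06/grammar}."""
    return tag.rsplit("}", 1)[-1]


def find_garbage_rulerefs(grxml_text: str) -> list[str]:
    """Return the ids of rules that contain <ruleref special="GARBAGE"/>."""
    grammar = ET.fromstring(grxml_text.strip())
    offenders = []
    for rule in grammar.iter():
        if local_name(rule.tag) != "rule":
            continue
        for ref in rule.iter():
            if local_name(ref.tag) == "ruleref" and ref.get("special") == "GARBAGE":
                offenders.append(rule.get("id", "?"))
                break  # one report per rule is enough
    return offenders
```

An empty list means that no GARBAGE reference was found; it does not guarantee that the grammar is free of other speech-specific constructs.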

Note also the following additional points when creating entities using a rule-based collection method:

  • You can change a rule-based entity to any other entity collection method. However, in this case, the associated GrXML file is not retained. The GrXML file is completely removed from the project.
  • Dynamic rule-based entities are not supported.
  • A GrXML file must have the .grxml extension.
  • The name of the rule-based entity must match the grammar root and rule ID of the GrXML file. As a result, you cannot rename a rule-based entity. See GrXML file rules for details.
  • The grammar must return a value.

Understanding grammar files

Let’s take a look at the structure of GrXML files. The following is an example of a GrXML file.

  Example GrXML file  

The grammar file shown above is designed to recognize a specific account number type in conjunction with a rule-based entity called DP_NUMBER.

From the attributes of the grammar element, we know the language for the grammar is United States English (xml:lang="en-US").

Notice that the header of the file identifies “DP_NUMBER” (the same name as the rule-based entity) as the root rule (root="DP_NUMBER").

Below this, we see the root rule definition (<rule id="DP_NUMBER" scope="public">).

This rule consists of a one-of list with two options representing two possible formats for the account number. Each option refers, via a ruleref element, to a sub-rule that appears further on in the file. The first option refers to a rule named “S” (<ruleref uri="#S"/>). The second option refers to another rule named “EMIR” (<ruleref uri="#EMIR"/>). Both sub-rules in turn reference the additional rules “DIGIT”, “dash”, and “zero”.
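
To make the rule structure described above easier to picture, the following minimal sketch parses a hypothetical skeleton grammar and lists which rules each rule references. Only the root and the rule names come from the description; the bodies of the sub-rules, the word lists, and the helper rule_references are invented placeholders, and the tag elements are omitted. It is not the example file itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton: the rule names and the root match the description,
# but the sub-rule contents are placeholders, not the original example.
SKELETON = """
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" version="1.0" root="DP_NUMBER">
  <rule id="DP_NUMBER" scope="public">
    <one-of>
      <item><ruleref uri="#S"/></item>
      <item><ruleref uri="#EMIR"/></item>
    </one-of>
  </rule>
  <rule id="S"><ruleref uri="#DIGIT"/><ruleref uri="#dash"/><ruleref uri="#zero"/></rule>
  <rule id="EMIR"><ruleref uri="#zero"/><ruleref uri="#dash"/><ruleref uri="#DIGIT"/></rule>
  <rule id="DIGIT"><one-of><item>one</item><item>two</item></one-of></rule>
  <rule id="dash"><item>dash</item></rule>
  <rule id="zero"><item>zero</item></rule>
</grammar>
"""


def local_name(tag: str) -> str:
    """Strip the XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def rule_references(grxml_text: str) -> dict[str, list[str]]:
    """Map each rule id to the local rules it references, in document order."""
    grammar = ET.fromstring(grxml_text.strip())
    graph = {}
    for rule in grammar:
        if local_name(rule.tag) != "rule":
            continue
        refs = []
        for el in rule.iter():
            uri = el.get("uri", "")
            if local_name(el.tag) == "ruleref" and uri.startswith("#"):
                if uri[1:] not in refs:
                    refs.append(uri[1:])
        graph[rule.get("id")] = refs
    return graph


graph = rule_references(SKELETON)
for rule_id, refs in graph.items():
    print(rule_id, "->", refs)
```

The root rule maps to the two sub-rules S and EMIR, and the leaf rules (DIGIT, dash, zero) map to empty lists because they reference no other rules.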

At runtime, Mix.nlu compares what the user says with the patterns defined in the different sub-rule branches. If the user utterance matches a pattern, this activates that branch. The code in the tag element of the branch assigns the appropriate value to the DP_NUMBER variable and returns this value.

If the user utterance doesn’t match an option from any of the rules with reasonable accuracy, the rule-based entity and any intents using the entity will not match with significant confidence.

For more information on GrXML, refer to the Speech Recognition Grammar Specification standard.

GrXML file rules

The filename for the GrXML file must be 1 to 128 characters long, and may include upper and lowercase letters, 0-9, - (hyphen), and _ (underscore).
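
The filename convention can be checked locally before upload. The following is a minimal sketch; the function name is_valid_grxml_filename is hypothetical, and two details are assumptions because this section does not state them: that the 1 to 128 character limit applies to the name before the .grxml extension, and that the extension must be lowercase.

```python
import re

# Assumption: the length limit covers the name before the extension only.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,128}")


def is_valid_grxml_filename(filename: str) -> bool:
    """Check the allowed characters, the length, and the .grxml extension."""
    stem, dot, ext = filename.rpartition(".")
    # Assumption: only the lowercase extension "grxml" is accepted.
    return dot == "." and ext == "grxml" and _NAME_RE.fullmatch(stem) is not None
```

For example, is_valid_grxml_filename("DP_NUMBER.grxml") returns True, while a name containing a space or a second period is rejected.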

A rule grammar file has this format:

  • The file must be a valid GrXML file that defines the pattern of the entity using <rule> and other standard GrXML elements.

  • Only one rule-based entity may be defined per GrXML file.

  • Within the GrXML file, the grammar root and rule ID must match the name of the entity that uses it. In the GrXML sample, notice that both root="DP_NUMBER" and rule id="DP_NUMBER" take the same value, which reflects the name of the associated entity, DP_NUMBER.
    Tip: The “normalize to probabilities” and “robust compile” parameters are recommended in all rule grammar files. The first parameter improves recognition accuracy, while the second allows missing pronunciations to be ignored during grammar compilation (without this parameter, the compilation fails if a pronunciation cannot be found).

  • The variable in the return tag must also match the entity name, for example:
    <tag>DP_NUMBER = S.V</tag>

  • The file may not reference any other GrXML files, so any dependencies must be included within the file itself.
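
The requirements listed above can be checked with a short script before uploading a file. The following is a minimal sketch, not a reproduction of the validation that Mix.nlu performs: the function check_grxml_rules and its messages are hypothetical, and the return-tag check is a simple pattern match that assumes the assignment form shown above (ENTITY_NAME = ...).

```python
import re
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def check_grxml_rules(grxml_text: str, entity_name: str) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    grammar = ET.fromstring(grxml_text.strip())

    # The grammar root must match the entity name.
    if grammar.get("root") != entity_name:
        problems.append(
            f'grammar root is {grammar.get("root")!r}, expected {entity_name!r}'
        )

    # A rule whose id matches the entity name must exist.
    root_rule = next(
        (el for el in grammar
         if local_name(el.tag) == "rule" and el.get("id") == entity_name),
        None,
    )
    if root_rule is None:
        problems.append(f"no rule with id {entity_name!r}")
    else:
        # Assumption: the return tag assigns to a variable named after the entity.
        assigns = re.compile(r"(?<![\w.])" + re.escape(entity_name) + r"\s*=(?!=)")
        tags = [el.text or "" for el in root_rule.iter()
                if local_name(el.tag) == "tag"]
        if not any(assigns.search(text) for text in tags):
            problems.append(f"no <tag> in the root rule assigns to {entity_name}")

    # Rule references must stay inside the file (uri="#RULE").
    for el in grammar.iter():
        uri = el.get("uri")
        if local_name(el.tag) == "ruleref" and uri and not uri.startswith("#"):
            problems.append(f"reference to another file: {uri}")
    return problems
```

Calling check_grxml_rules(text, "DP_NUMBER") on a grammar that meets the requirements returns an empty list. The sketch does not verify that the file is valid GrXML beyond being well-formed XML.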

Troubleshooting GrXML errors

Here are some notes that may help if you encounter problems creating rule-based entities.

Troubleshooting GrXML

  • Invalid file extension: The file is not a GrXML file. If you are creating a rule-based entity, you must upload a GrXML file with the *.grxml extension.

  • Invalid file name: The filename must not exceed 128 characters and is limited to upper and lowercase letters, 0-9, - (hyphen), and _ (underscore).

  • Grammar root value: The grammar root in the GrXML file must be the entity name. For example:
    <grammar ... root="DP_NUMBER" ...>

  • File contains GrXML errors: There are format errors in the file’s GrXML markup. For example, check that the grammar root, the rule ID, and the return tag all use the entity name:
    <grammar ... root="DP_NUMBER" ...>
    <rule id="DP_NUMBER" ...>
    <tag>DP_NUMBER = S.V</tag>

  • Grammars may not reference other files: The grammar file may not include references to other files; for example, this is not supported:
    <ruleref uri="acct_num.grxml#emir"/>
    Any related rules required by the grammar must be included in the file being uploaded.