Overview of NLU modeling

The best practices in this document are organized around the following steps of NLU model building:

  1. Designing the model: How to design your ontology to maximize accuracy and minimize effort.

  2. Generating data and training the initial model: How to generate initial training data for the model. The training data consists of examples of the kinds of utterances that the model will need to understand, along with their semantic interpretations. Mix.nlu trains the NLU model on this data so that, given an input utterance, it outputs a semantic interpretation that is as similar as possible to the interpretations in the training data. Although training data can be written from scratch, it is better to draw on existing logs of user requests and select utterances from that data.

  3. Evaluating NLU accuracy: Once Mix.nlu has trained the NLU model on the training data, you will also need test data to run against the model, so that you can measure how well the model has learned to predict the meanings of new utterances. (A simple sketch of this kind of evaluation appears after this list.)

  4. Improving NLU accuracy: Once you have a trained NLU model and a set of test results from it, you can improve the model's accuracy through analysis.

    There are two main kinds of analysis:

    • With error analysis, you start from the errors the model is making and improve the model so that it no longer makes them.

    • With consistency checking, you analyze your annotations to make sure that similar utterances are always annotated in the same way. (A simple sketch of one such check also appears after this list.)

      Improving the model (including retesting the improved model) is an ongoing process—as more test and training data become available, additional iterations of accuracy improvement can be performed.
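
Mix provides its own tools for running a test set against a trained model and reporting accuracy. Purely as an illustration of the ideas in steps 3 and 4, the hypothetical Python sketch below measures intent accuracy on a set of test results and summarizes the errors, which is the natural starting point for error analysis. The function name, data layout, and example utterances are assumptions made for this sketch, not part of the Mix API.

# Illustrative sketch only (not Mix functionality): given, for each test
# utterance, the reference intent from its annotation and the intent predicted
# by the trained model, compute intent accuracy and summarize the errors.

from collections import Counter

def evaluate_intents(test_results):
    """test_results: list of (utterance, reference_intent, predicted_intent)."""
    errors = [(utt, ref, pred) for utt, ref, pred in test_results if ref != pred]
    accuracy = 1 - len(errors) / len(test_results)

    # Error analysis starts from the mistakes the model actually makes:
    # grouping them by (reference, predicted) pair exposes systematic confusions.
    confusions = Counter((ref, pred) for _, ref, pred in errors)

    print(f"Intent accuracy: {accuracy:.1%} ({len(errors)} errors in {len(test_results)} utterances)")
    for (ref, pred), count in confusions.most_common():
        print(f"  {ref} predicted as {pred}: {count} time(s)")
    return errors

# Hypothetical test results for a pizza-ordering model.
evaluate_intents([
    ("i want to order a large pizza", "ORDER_PIZZA", "ORDER_PIZZA"),
    ("cancel my pizza order", "CANCEL_ORDER", "ORDER_PIZZA"),
    ("what toppings do you have", "ASK_TOPPINGS", "ASK_TOPPINGS"),
])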
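
A simple form of consistency checking can be automated as well. The hypothetical sketch below, again an illustration rather than Mix functionality, flags utterances that appear more than once in the training data with different intent annotations, which is one common source of annotation inconsistency.

# Illustrative sketch only: flag identical training utterances that have been
# annotated with different intents.

from collections import defaultdict

def find_conflicting_intents(annotations):
    """annotations: list of (utterance_text, intent) pairs from the training data."""
    intents_by_text = defaultdict(set)
    for text, intent in annotations:
        intents_by_text[text.lower().strip()].add(intent)
    # Any utterance text mapped to more than one intent deserves a second look.
    return {text: intents for text, intents in intents_by_text.items() if len(intents) > 1}

# Hypothetical annotations; the duplicated utterance has conflicting intents.
conflicts = find_conflicting_intents([
    ("I want to order a large pizza", "ORDER_PIZZA"),
    ("i want to order a large pizza", "MODIFY_ORDER"),
    ("what are your opening hours", "ASK_HOURS"),
])
print(conflicts)  # {'i want to order a large pizza': {'ORDER_PIZZA', 'MODIFY_ORDER'}}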

This section also includes frequently asked questions (FAQs) that are not addressed elsewhere in the document.

Notation convention for NLU annotations

In this section, examples of annotated utterances follow the notation used by Mix.nlu itself. Word sequences that correspond to entities are preceded by the name of the entity in square brackets (such as “[SIZE]”), and are followed by “[/]”. Word sequences that correspond to intents (in most cases, this is the entire utterance) are preceded by the name of the intent in curly braces (such as “{ORDER_PIZZA}”) and followed by “{/}”. For example, the utterance “I want to order a large pizza” could be annotated as follows:

{ORDER_PIZZA} I want to order a [SIZE] large [/] pizza {/}

In this example, the intent associated with the entire utterance is ORDER_PIZZA, and the only entity in the utterance is the SIZE entity, which has the literal “large”.

Nuance refers to this notation as XMix (eXtended Mix) format.

(Note that Mix supports exporting annotations in a different format called TRSX. However, TRSX uses an XML format that is not easily human-readable; thus, it is not helpful for this discussion.)
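
To make the notation concrete, the following hypothetical Python sketch (not part of Mix or its tooling) parses a single XMix-style annotated utterance into its intent, its entity literals, and the plain utterance text, assuming only the bracket conventions described above.

import re

def parse_xmix(annotation):
    """Parse one well-formed XMix-style annotated utterance. Illustrative only."""
    # Intent wrapper: {INTENT_NAME} ... {/}
    intent_name, body = re.match(r"\{(\w+)\}\s*(.*?)\s*\{/\}\s*$", annotation).groups()

    # Entity spans: [ENTITY_NAME] literal words [/]
    entities = [
        {"entity": name, "literal": literal.strip()}
        for name, literal in re.findall(r"\[(\w+)\]\s*(.*?)\s*\[/\]", body)
    ]

    # Plain utterance text with the annotation markup stripped out.
    text = re.sub(r"\s+", " ", re.sub(r"\[\w+\]|\[/\]", "", body)).strip()

    return {"intent": intent_name, "text": text, "entities": entities}

print(parse_xmix("{ORDER_PIZZA} I want to order a [SIZE] large [/] pizza {/}"))
# {'intent': 'ORDER_PIZZA', 'text': 'I want to order a large pizza',
#  'entities': [{'entity': 'SIZE', 'literal': 'large'}]}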