Designing the model

Designing a model means creating an ontology that captures the meanings of the sorts of requests your users will make.

In the context of Mix.nlu, an ontology refers to the schema of intents, entities, and their relationships that you specify and that are used when annotating your samples and interpreting user queries. These represent the things your users will want to do and the things they will mention to specify that intention.

Don’t overuse intents

In many cases, you need to make an ontology design choice around how to categorize the different user requests you want to support. Generally, it’s better to use a few relatively broad intents that capture very similar types of requests, with the specific differences captured in entities, rather than use multiple specific intents.

For example, consider the ontology needed to model the following set of utterances:

  • Order a pizza
  • Order a large pizza
  • Order a small pizza
  • Order a large coke
  • Order chicken wings

In this case there is a single common request: to place a food order. The order can consist of one of a set of different menu items, and some of the items come in different sizes.

Best practice—most general ontology

The best practice in this case is to define a single intent ORDER, and then define an entity SIZE with possible values of “large” and “small”, and an entity MENU_ITEM that includes values of “pizza”, “coke”, and “chicken wings”:

  • {ORDER} order a [MENU_ITEM] pizza [/] {/}
  • {ORDER} order a [SIZE] large [/] [MENU_ITEM] pizza [/] {/}
  • {ORDER} order a [SIZE] small [/] [MENU_ITEM] pizza [/] {/}
  • {ORDER} Order a [SIZE] large [/] [MENU_ITEM] coke [/] {/}
  • {ORDER} Order [MENU_ITEM] chicken wings [/] {/}
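The recommended ontology can be pictured as plain data. The following Python sketch uses the ORDER, SIZE, and MENU_ITEM names from the annotations above, but the dictionary layout and the naive literal matcher are illustrative only, not part of Mix:

```python
# Illustrative-only representation of the general ontology as plain Python data.
ontology = {
    "intents": {
        "ORDER": {"entities": ["SIZE", "MENU_ITEM"]},
    },
    "entities": {
        "SIZE": ["large", "small"],
        "MENU_ITEM": ["pizza", "coke", "chicken wings"],
    },
}

def annotate(utterance: str, ontology: dict) -> dict:
    """Naively tag known entity literals in an utterance (longest match first)."""
    found = {}
    text = utterance.lower()
    for entity, literals in ontology["entities"].items():
        for literal in sorted(literals, key=len, reverse=True):
            if literal in text:
                found[entity] = literal
                break
    return {"intent": "ORDER", "entities": found}
```

With this layout, `annotate("Order a large pizza", ontology)` tags "large" as SIZE and "pizza" as MENU_ITEM under the single ORDER intent, mirroring the annotated samples above. A trained NLU model does this statistically rather than by substring matching, of course; the sketch only shows the shape of the ontology.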

Another approach would be to encode the menu item literals into the intent but keep the SIZE entity only:

  • {ORDER_PIZZA} order a pizza {/}
  • {ORDER_PIZZA} order a [SIZE] large [/] pizza {/}
  • {ORDER_PIZZA} order a small pizza {/}
  • {ORDER_COKE} order a [SIZE] large [/] coke {/}
  • {ORDER_CHICKEN_WINGS} order chicken wings {/}

Yet another approach would be to encode both the menu item and size entity literals into the intent:

  • {ORDER_PIZZA} order a pizza {/}
  • {ORDER_LARGE_PIZZA} order a large pizza {/}
  • {ORDER_SMALL_PIZZA} order a small pizza {/}
  • {ORDER_LARGE_COKE} order a large coke {/}
  • {ORDER_CHICKEN_WINGS} order chicken wings {/}

By using a general intent and defining the entities SIZE and MENU_ITEM, the model will learn about these entities across intents, and you don’t need examples containing each entity literal for each relevant intent. By contrast, if the size and menu item are part of the intent, then training examples containing each entity literal will need to exist for each intent. The net effect is that less general ontologies will require more training data in order to achieve the same accuracy as the recommended approach.
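The data-cost argument can be made concrete with some back-of-the-envelope arithmetic. The counts below, including the hypothetical figure of 10 examples per pattern, are made up for illustration:

```python
# Back-of-the-envelope illustration of the data cost of encoding entities
# into intents. The "10 examples per pattern" figure is hypothetical.
sizes = ["large", "small"]
items = ["pizza", "coke", "chicken wings"]
examples_per_pattern = 10  # hypothetical count

# General ontology: each SIZE and MENU_ITEM literal is learned once,
# shared across the single ORDER intent.
general_patterns = len(sizes) + len(items)         # 5 patterns

# Size-in-intent ontology: every (item, sized-or-plain) combination is its
# own intent, so each combination needs its own training examples.
specific_patterns = len(items) * (len(sizes) + 1)  # 9 patterns

print(general_patterns * examples_per_pattern)     # 50 examples
print(specific_patterns * examples_per_pattern)    # 90 examples
```

The gap widens quickly as more items and sizes are added, which is why less general ontologies need more data for the same accuracy.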

Another reason to use a more general intent is that once an intent is identified, you usually want to use this information to route your system to some procedure to handle the intent. Since food orders will all be handled in similar ways, regardless of the item or size, it makes sense to define intents that group closely related tasks together, specifying important differences with entities.

Use predefined entities when appropriate

Mix includes a number of predefined entities.

When possible, use predefined entities. Developing your own entities from scratch can be a time-consuming and potentially error-prone process when those entities are complex; for example, for dates, which may be spoken or typed in many different ways. It is much faster and easier to use the predefined entity, when it exists. Since the predefined entities are tried and tested, they will also likely perform more accurately when recognizing the entity than if you tried to create it yourself. There is no point in reinventing the wheel.

You can effectively assign a custom name to a predefined entity by creating an entity with the name you prefer, and then defining that entity to be an instance of (“isA”) a predefined entity. For example, say you are creating an application that lets a user book a return ticket for travel. You need to look for two calendar dates in the user sentence: one for the departure date and one for the return date. By defining two custom entities, DEPARTURE_DATE isA nuance_CALENDARX and RETURN_DATE isA nuance_CALENDARX, your model will be able to learn to resolve which date corresponds to which role in a sentence, while still taking advantage of the benefits of using the predefined entity nuance_CALENDARX.
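One way to picture the “isA” link is as a lookup from the custom name down to the underlying predefined entity. The dictionary layout and helper below are an illustrative sketch, not a Mix API; only the entity names come from the example above:

```python
# Illustrative model of "isA" links from custom entities to a predefined one.
entity_relations = {
    "DEPARTURE_DATE": {"isA": "nuance_CALENDARX"},
    "RETURN_DATE": {"isA": "nuance_CALENDARX"},
}

def base_entity(name: str) -> str:
    """Follow isA links down to the underlying (predefined) entity."""
    while name in entity_relations:
        name = entity_relations[name]["isA"]
    return name
```

Both role-specific entities resolve to the same predefined grammar, so the model gets the accuracy of nuance_CALENDARX while still learning which date plays which role in the sentence.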

Use the NO_INTENT predefined intent for fragments

Users often speak in fragments; that is, utterances that consist entirely or almost entirely of entities. For example, in the coffee ordering domain, some likely fragments might be “short latte,” “Italian soda,” or “hot chocolate with whipped cream”. Because fragments are so common, Mix has a predefined intent called NO_INTENT that is designed to capture them. NO_INTENT automatically includes all of the entities that have been defined in the model, so any entity or sequence of entities spoken on its own does not require its own training data.

Make use of hierarchical entities when appropriate

Making use of hierarchical entities can make your ontology simpler. For example, a BANK_ACCOUNT entity might have ACCOUNT_NUMBER, ACCOUNT_TYPE, and ACCOUNT_BALANCE sub-entities. These are defined via “hasA” relationships: “BANK_ACCOUNT hasA (ACCOUNT_NUMBER, ACCOUNT_TYPE, ACCOUNT_BALANCE).”

On the other hand, for a funds transfer utterance such as “transfer $100 from checking to savings,” you may want to define FROM_ACCOUNT and TO_ACCOUNT entities. By using the “isA” relationship, you can define the relationships as follows:
  • FROM_ACCOUNT isA BANK_ACCOUNT
  • TO_ACCOUNT isA BANK_ACCOUNT

This way, the sub-entities of BANK_ACCOUNT also become sub-entities of FROM_ACCOUNT and TO_ACCOUNT—there is no need to define the sub-entities separately for each parent entity.
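A small sketch can show how sub-entities are inherited through these relationships. The entity names follow the funds-transfer example; the data layout and lookup function are illustrative only, not how Mix stores relationships internally:

```python
# Illustrative "hasA" and "isA" tables for the bank-account example.
has_a = {"BANK_ACCOUNT": ["ACCOUNT_NUMBER", "ACCOUNT_TYPE", "ACCOUNT_BALANCE"]}
is_a = {"FROM_ACCOUNT": "BANK_ACCOUNT", "TO_ACCOUNT": "BANK_ACCOUNT"}

def sub_entities(entity: str) -> list:
    """Return an entity's sub-entities, including those inherited via isA."""
    own = list(has_a.get(entity, []))
    parent = is_a.get(entity)
    if parent:
        own.extend(sub_entities(parent))
    return own
```

Here `sub_entities("FROM_ACCOUNT")` yields the three BANK_ACCOUNT sub-entities without FROM_ACCOUNT declaring any of its own, which is exactly the duplication the “isA” relationship avoids.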

For more details, see Relationship entities.

Define intents and entities that are semantically distinct

In your ontology, every element should be semantically distinct. You shouldn’t define intents or entities that are semantically similar to, or that overlap with, other intents or entities. If you want the model to resolve these distinctions at inference time, there must be a clear, clean separation between the different intents the model supports, and between the different entities it sees.

It will be difficult for the trained NLU model to distinguish between semantically similar elements, because the overlap or closeness in meaning will cause ambiguity and uncertainty in interpretation, with low confidence levels. The model will have trouble identifying a clear best interpretation. In choosing a best interpretation, the model will make mistakes, lowering the accuracy of your model.

The differences in meaning between the different intents should be clearly visible in the words used to express each intent. If possible, each intent should be associated with a distinct set of carrier phrases and entities. A carrier phrase is the part of a sentence for an intent that is not the entities themselves. For example, if your application is trying to distinguish between showing bills (REQUEST_BILL) and paying bills (PAY_BILL), then you might set up (and document via an annotation guide) the following rules:

  • “Show bills” → REQUEST_BILL
  • “Pay bills” → PAY_BILL
  • “What are my bills” → REQUEST_BILL
  • “Handle my bills” → PAY_BILL. Here you could argue that the user might like to see the bills first, rather than pay them directly, so the decision has a UX impact.
  • “View my bills so I can pay them” → ? This kind of hard-to-annotate utterance will occur in real usage data. Ultimately, a decision (even if arbitrary) must be made and documented.

Use an out-of-domain intent

The end users of an NLU model don’t know what the model can and can’t understand. They will say things that the model isn’t designed to understand. For this reason, NLU models should typically include an out-of-domain intent that is designed to catch utterances that the model can’t handle properly. This intent can be called something like OUT_OF_DOMAIN, and it should be trained on a variety of utterances that the system is expected to encounter but cannot otherwise handle. At runtime, when the OUT_OF_DOMAIN intent is returned, the system can accurately reply with “I don’t know how to do that”.

Many older NLU systems do not use an out-of-domain intent, but instead rely on a rejection threshold to achieve the same effect: NLU interpretations whose confidence values fall below a particular threshold are thrown out, and the system instead replies with “I didn’t understand”.

However, a rejection threshold has two disadvantages compared to an out-of-domain intent:

  • The system is not trained on any out-of-domain data, so the model is not trained to explicitly discriminate between in-domain and out-of-domain utterances. Because of this, the accuracy of the system will be lower, particularly on utterances that are not clearly in-domain or out-of-domain (that is, utterances that are closer to the decision boundary between in-domain and out-of-domain).
  • The confidence threshold needs to be manually tuned, which is difficult to do properly without a test set of usage data that includes a large proportion of out-of-domain utterances.
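The two mechanisms can also be combined, with the OUT_OF_DOMAIN intent as the primary signal and a threshold as a backstop. The following routing sketch is hypothetical; the interpretation dictionaries and the 0.5 threshold stand in for whatever structures and tuning your NLU runtime actually uses:

```python
# Hypothetical routing on NLU output, combining an OUT_OF_DOMAIN intent
# with a confidence-threshold backstop. Not a Mix API.
REJECTION_THRESHOLD = 0.5  # must be hand-tuned, ideally on real usage data

def respond(interpretation: dict) -> str:
    intent = interpretation["intent"]
    confidence = interpretation["confidence"]
    # With an OUT_OF_DOMAIN intent, the model itself flags unsupported requests:
    if intent == "OUT_OF_DOMAIN":
        return "I don't know how to do that"
    # A threshold remains a useful backstop for low-confidence in-domain guesses:
    if confidence < REJECTION_THRESHOLD:
        return "I didn't understand"
    return f"Handling {intent}"
```

With only the threshold branch, every out-of-domain utterance must fall below the cutoff to be rejected; with the trained OUT_OF_DOMAIN intent, the model can reject such utterances confidently.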

The data used to train the out-of-domain intent must come from the same sources as all other training data:

  • From usage data if available
  • From data collection, if possible
  • From individuals’ best guesses as to what out-of-domain utterances users will say to the production system

Don’t ask your model to try to do too much

Some types of utterances are inherently very difficult to tag accurately. Whenever possible, design your ontology so that it does not depend on this kind of difficult tagging.

For example, it is possible to design an ontology that requires the trained NLU model to use large dictionaries to predict the correct interpretation. This can happen when exactly the same carrier phrase occurs in multiple intents. For example, consider the following utterances:

  1. {PLAY_MEDIA} play the film [MOVIE] Citizen Kane [/] {/}
  2. {PLAY_MEDIA} play the track [SONG] Mister Brightside [/] {/}
  3. {PLAY_MEDIA} play [MOVIE] Citizen Kane [/] {/}
  4. {PLAY_MEDIA} play [SONG] Mister Brightside [/] {/}

In the first two utterances (1-2), the carrier phrases themselves (“play the film” and “play the track”) provide enough information for the model to correctly predict the entity type of the following words (MOVIE and SONG, respectively).

However, in utterances 3-4, the carrier phrases of the two utterances are the same (“play”), even though the entity types are different. In this case, in order for the NLU model to correctly predict the entity types of “Citizen Kane” and “Mister Brightside”, these strings must be present in MOVIE and SONG dictionaries, respectively.

This situation is acceptable in domains where all possible literals for an entity can be listed in a dictionary for that entity in a straightforward manner. But for some entity types, the required dictionaries would either be too large (for example, song titles) or change too quickly (for example, news topics) for Mix to effectively model. In this type of case, the best practice solution is to underspecify the entity within NLU, and then let a post-NLU search disambiguate. In the case of utterances 3-4, underspecifying means defining an entity that can consist of either a movie or a song:

  1. {PLAY_MEDIA} play [QUERY] citizen kane [/] {/}
  2. {PLAY_MEDIA} play [QUERY] mister brightside [/] {/}

Now, the carrier phrase “play” provides the information necessary to infer that the following words are a QUERY entity. (Note that Mix is able to learn that “play” by itself is likely to be followed by a QUERY, while a word sequence such as “play the film” is likely to be followed by a MOVIE.)

This approach requires a post-NLU search to disambiguate the QUERY into a concrete entity type—but this task can be easily solved with standard search algorithms.
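Such a post-NLU search might look like the following sketch, where the catalogs are made-up stand-ins for a real media index:

```python
# Illustrative post-NLU lookup that maps an underspecified QUERY literal
# to a concrete entity type. The catalogs are hypothetical stand-ins.
movie_catalog = {"citizen kane"}
song_catalog = {"mister brightside"}

def disambiguate(query: str) -> str:
    """Resolve a QUERY literal to MOVIE, SONG, or UNKNOWN via catalog lookup."""
    q = query.lower()
    if q in movie_catalog:
        return "MOVIE"
    if q in song_catalog:
        return "SONG"
    return "UNKNOWN"
```

In production this exact-match lookup would typically be replaced by a fuzzy or full-text search over the catalogs, but the division of labor is the same: NLU identifies the QUERY span, and the search layer decides what kind of media it names.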

Use a consistent naming convention

It is a good idea to use a consistent convention for the names of intents and entities in your ontology. This is particularly helpful if there are multiple developers working on your project.

There is not necessarily one right answer. It is important to plan ahead, have a systematic logical convention, and be consistent.

Here are some tips that you may find useful:

  • Use all caps (uppercase) with underscores separating words. For example, ORDER_COFFEE for an intent, and COFFEE_SIZE and COFFEE_TYPE for entities. All caps makes the entities visually stand out clearly in annotated sentences as distinct from the sentence contents, and the underscores keep the name to a valid format while making the name easily readable.
  • Use an ACTION_OBJECT or VERB_OBJECT format for intents. A lot of intents fall into this schema where an action is carried out and involves or applies to some object. Here, object is used in the grammatical sense of subject-verb-object. For example, PAY_BILL or MESSAGE_RECIPIENT.
  • Use a prefix to identify the domain for an intent, with the domain in lowercase, as one word. For example, personalbanking_PAY_BILL. This practice can help to visually group related intents in Mix.nlu.
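Conventions like these are easier to keep, especially across multiple developers, when they are checked mechanically. The regular expressions below encode the tips above; the patterns are suggestions to adapt to your own convention, not a Mix requirement:

```python
# Illustrative checks for the naming tips above; adapt to your own convention.
import re

ENTITY_PATTERN = re.compile(r"^[A-Z]+(_[A-Z]+)*$")            # e.g. COFFEE_SIZE
INTENT_PATTERN = re.compile(r"^([a-z]+_)?[A-Z]+(_[A-Z]+)*$")  # e.g. personalbanking_PAY_BILL

def valid_intent_name(name: str) -> bool:
    """Accept ACTION_OBJECT names, optionally prefixed by a lowercase domain."""
    return bool(INTENT_PATTERN.match(name))
```

For example, `valid_intent_name("personalbanking_PAY_BILL")` and `valid_intent_name("PAY_BILL")` pass, while a mixed-case name like `PayBill` is rejected.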

An example of a best practice ontology

The Coffee app quick start is an example of a recommended best-practice NLU ontology.