Improving NLU accuracy

Once you have trained and evaluated your NLU model, the next step is to improve the model’s accuracy. There are two general ways of doing this:

  1. Collect and annotate additional data for model training and testing. This additional data can come from two different sources:
    1. Prior to deployment, improve the model based on internal or external data collection.
    2. After deployment, improve the model based on actual usage data.
  2. Improve the model by conducting error analysis on the validation set.

Conduct error analysis on your validation set—but don’t overfit

Conducting error analysis means going through the errors that the trained model makes on your validation set, and improving your training data to fix those errors (and hopefully related errors). In general, there are two types of validation set errors:

  • Intents and entity types with little or no training data
  • Annotation errors, either in your training data or in validation set data
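
To make the first pass of error analysis concrete, here is a minimal sketch that tallies misclassified validation utterances by their annotated intent. The result layout (utterance, annotated intent, predicted intent) and the sample intent names are assumptions made for illustration, not the output format of any particular NLU toolkit:

    from collections import defaultdict

    # Hypothetical validation results as (utterance, annotated intent,
    # predicted intent) triples; adapt the layout to your own tooling.
    results = [
        ("please pay my electricity bill", "PAY_BILL", "PAY_BILL"),
        ("I want to pay my water bill", "PAY_BILL", "CHECK_BALANCE"),
        ("make it cooler in here", "SET_TEMPERATURE", "PLAY_MUSIC"),
    ]

    errors_by_intent = defaultdict(list)
    for utterance, gold, predicted in results:
        if gold != predicted:
            errors_by_intent[gold].append((utterance, predicted))

    # Intents with many errors are candidates for more training data;
    # isolated errors are worth inspecting for annotation mistakes.
    for intent, errors in sorted(errors_by_intent.items(),
                                 key=lambda kv: -len(kv[1])):
        print(f"{intent}: {len(errors)} error(s)")
        for utterance, predicted in errors:
            print(f"  {utterance!r} -> predicted {predicted}")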

For errors caused by too little training data, the solution is simple: add training data that is similar to the failing validation set utterances. However, do not add the failing utterances themselves unless they are likely to be seen frequently in production; this restriction helps prevent overfitting.

Overfitting happens when you make changes to your training data that improve the validation set accuracy, but which are so tailored to the validation set that they generalize poorly to real-world usage data. It is something like fitting a curve to a set of plotted points by drawing a complicated curve that passes perfectly through every point rather than drawing a smooth, simple curve that broadly captures the overall shape of the data.

Overfitting typically happens when you add to the training data carrier phrases and entity literals that occur only rarely in the validation set. Because such phrases are unlikely to also occur in test sets or in future usage data, the accuracy gains they produce on the validation set do not generalize.

The problem of annotation errors is addressed in the next best practice below.

Make sure your annotations are correct and consistent

One important type of problem that error analysis can reveal is the presence of incorrect and inconsistent annotations, both in your training data and in your test sets. Some common types of annotation inconsistency are as follows (a simple automated check is sketched after the list):

  • Intents: Two utterances with similar carrier phrases and meaning, but which belong to different intents. For example, if “please pay my electricity bill” and “I want to pay my water bill” are tagged with different intents, then this is likely an annotation inconsistency since in both utterances the user is attempting to pay a bill.
  • Different/missing entities: Two utterances that contain similar words with similar meanings, but where the words are tagged as an entity in one utterance and as a different entity, or as no entity at all, in the other utterance. For example, in the following pair, the word “large” is tagged as SIZE in the first utterance, but in the second utterance, the similar word “short” is (incorrectly) tagged as part of the DRINK_TYPE entity:
    • {ORDER} I would like a [SIZE] large [/] [DRINK_TYPE] latte [/] {/}
    • {ORDER} I want a [DRINK_TYPE] short mocha [/] {/}
  • Entity spans: Two utterances that contain similar words tagged with the same entity, but where the tagged entities span inconsistent sequences of words. For example, given the words “in five minutes” and a DURATION entity, one utterance might be tagged “[DURATION] in five minutes [/]” while the other is tagged “in [DURATION] five minutes [/]”.
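
Inconsistencies like these can often be surfaced automatically before they are fixed by hand. The sketch below assumes a simple in-memory representation of annotated utterances (lists of word/entity pairs) purely for illustration, not any toolkit's actual format; it flags words that carry different entity labels in different utterances:

    from collections import defaultdict

    # Hypothetical annotated utterances: lists of (word, entity-or-None)
    # pairs. This representation is assumed purely for illustration.
    annotated = [
        [("I", None), ("would", None), ("like", None), ("a", None),
         ("large", "SIZE"), ("latte", "DRINK_TYPE")],
        [("I", None), ("want", None), ("a", None),
         ("large", None), ("mocha", "DRINK_TYPE")],
    ]

    labels_by_word = defaultdict(set)
    for utterance in annotated:
        for word, entity in utterance:
            labels_by_word[word.lower()].add(entity)

    # A word tagged with different entities (or sometimes left untagged)
    # across utterances is a candidate for manual review.
    for word, labels in sorted(labels_by_word.items()):
        if len(labels) > 1:
            print(f"inconsistent labels for {word!r}: {labels}")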

Often, annotation inconsistencies will occur not just between two utterances, but between two sets of utterances. In this case, it may be possible (and indeed preferable) to fix all of the affected utterances at once with a regular expression. This can be done by exporting a TRSX file from the project, running the regular expression over the TRSX file, and then re-uploading the corrected TRSX into the same project (after first deleting all data from the project to get rid of the inconsistently annotated data). See Import and export data for more information.
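
As a sketch of that round trip, the script below applies a regular expression to an exported file to pull a stray preposition inside a DURATION span. The file name and, in particular, the annotation markup are assumptions made for illustration; inspect your own TRSX export and adjust the pattern to its actual schema before running anything like this:

    import re
    from pathlib import Path

    # Assumed markup, for illustration only: check the real TRSX export
    # and adjust this pattern to its actual annotation elements.
    PATTERN = re.compile(
        r'in <annotation conceptref="DURATION">(.*?)</annotation>'
    )
    REPLACEMENT = r'<annotation conceptref="DURATION">in \1</annotation>'

    def fix_duration_spans(path):
        trsx = Path(path)
        text = trsx.read_text(encoding="utf-8")
        fixed = PATTERN.sub(REPLACEMENT, text)
        # Write to a new file so the original export is preserved.
        trsx.with_suffix(".fixed.trsx").write_text(fixed, encoding="utf-8")

    # "project_export.trsx" is a hypothetical export file name.
    fix_duration_spans("project_export.trsx")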

Note that because you should not look at test set data yourself, correcting test set annotations is not straightforward. One possibility is to have a developer who is not involved in maintaining the training data review the test set annotations.

Make sure you have an annotation guide

The problem of incorrect and inconsistent annotations can be greatly alleviated by maintaining an annotation guide that describes the correct usage for each intent and entity in your ontology, and provides examples. A good annotation guide acts as a rule book that standardizes annotations across multiple annotators. In particular, an annotation guide provides two benefits:

  1. In any data set that contains real usage data, it will occasionally be unclear what the correct annotation for a particular utterance is. The annotation guide is where such guidelines are recorded: if guidance doesn’t already exist for that type of utterance, then once the correct annotation is decided, a new guideline can be added to the guide.
  2. The annotation guide helps to make sure that different people annotate in the same fashion.

As one simple example, whether or not determiners should be tagged as part of entities, as discussed above, should be documented in the annotation guide.
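
For instance, a guide entry covering that question might read something like the following (illustrative wording only; your guide may well make the opposite choice):

    DURATION: tag only the quantity and unit, not a preceding preposition
    or determiner. Correct: in [DURATION] five minutes [/].
    Incorrect: [DURATION] in five minutes [/].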