Using wordsets

The Krypton recognition engine and the Natural Language Engine (NLE) use wordsets to support dynamic content that is not available until after a session begins. For example, user-specific information such as a payee list can be added to the vocabulary at runtime so that those items can be recognized and interpreted. Krypton and NLE accept the same wordset JSON file:

  • For Krypton, a wordset adds new vocabulary, along with the alternative spoken forms associated with a single word or phrase.
  • For NLE, a wordset describes the canonical value to be returned as the semantic value for the specified entities.
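
Because wordset content is only known once a session begins, an application typically assembles it from per-user data. The following Python sketch builds a wordset from a hypothetical list of payee records and serializes it with the standard json module. The payees structure and the build_wordset function are illustrative assumptions, not part of the Krypton or NLE API; loading the resulting JSON into the engines is a separate step.

```python
import json

# Hypothetical per-user payee records, e.g. fetched from a back-end system
# when the session starts. Each record has a canonical ID and the written
# forms a caller might use for it.
payees = [
    {"id": "AMEX", "names": ["amex", "american express"]},
    {"id": "SOCALGAS", "names": ["southern california gas", "the gas company"]},
]


def build_wordset(entity: str, records: list[dict]) -> dict:
    """Build a wordset dict with one entry per written form (literal)."""
    entries = []
    for record in records:
        for name in record["names"]:
            # Every written form of the same payee shares one canonical value.
            entries.append({"canonical": record["id"], "literal": name})
    return {entity: entries}


wordset = build_wordset("PAYEE", payees)
wordset_json = json.dumps(wordset)
print(wordset_json)
```

Emitting one entry per written form keeps each literal explicit while letting all of them resolve to the same canonical value in NLE.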

To load wordsets into the engines, see Triggering the Dragon Voice recognizer.

The format of wordset files

A wordset is a JSON file containing words and phrases that can be applied to known entities in the models at runtime. A given wordset can contain one or more entries for one or more entities, and can also specify custom pronunciations.

{
   "category-name_1" : [                                 <-- entity name
     {
       "canonical" : "<semantic value>",                 <-- semantic value returned in NLE result
       "literal" : "<written form>",                     <-- written form returned in Krypton result
       "spoken" : ["<spoken form 1>", "<spoken form n>"] <-- alternative spoken forms to be recognized
     },
     …
   ],
   …,
   "category-name_n" : [
     {
       "canonical" : "<semantic value>",
       "literal" : "<written form>",
       "spoken" : ["<spoken form 1>", "<spoken form n>"]
     },
     …
   ]
}

Where:

  • category-name_1 … category-name_n are the dynamic entity names defined in a domain language model, for example PAYEE (see the example below).
  • canonical is an optional string consumed only by NLE (Krypton ignores it). NLE returns it as the semantic value for the specified entity. When canonical is omitted, the value of the literal field is used instead.
  • literal is the written form (ASR literal) that Krypton returns in its formatted result. It is also passed to NLE as input to the interpretation process.
  • spoken is an optional array of alternative spoken forms (pronunciations) that Krypton (the ASR model) associates with the literal. NLE ignores it.

    Tip: All spoken forms in an entry resolve to that entry's single literal. Omit spoken unless the literal is difficult to pronounce as written.
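
The field rules above can be checked before a wordset is sent to the engines. The following Python sketch is a hypothetical validator, not a Nuance tool: it assumes literal is required (the text implies this, since canonical falls back to it), treats canonical and spoken as optional, and does not model any additional constraints the engines may enforce.

```python
import json


def validate_wordset(data: object) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed.

    Only the field rules described in this section are checked; the
    assumption that 'literal' is required is inferred, not documented.
    """
    problems = []
    if not isinstance(data, dict):
        return ["top level must be a JSON object keyed by entity name"]
    for entity, entries in data.items():
        if not isinstance(entries, list):
            problems.append(f"{entity}: value must be an array of entries")
            continue
        for i, entry in enumerate(entries):
            where = f"{entity}[{i}]"
            if not isinstance(entry, dict):
                problems.append(f"{where}: entry must be an object")
                continue
            # literal: written form, assumed required.
            if not isinstance(entry.get("literal"), str):
                problems.append(f"{where}: 'literal' is required and must be a string")
            # canonical: optional, but must be a string when present.
            if "canonical" in entry and not isinstance(entry["canonical"], str):
                problems.append(f"{where}: 'canonical' must be a string")
            # spoken: optional array of strings.
            spoken = entry.get("spoken")
            if spoken is not None and not (
                isinstance(spoken, list) and all(isinstance(s, str) for s in spoken)
            ):
                problems.append(f"{where}: 'spoken' must be an array of strings")
    return problems


sample = json.loads(
    '{"PAYEE": [{"literal": "amex", "spoken": ["A M E X"]}, {"canonical": "VISA"}]}'
)
print(validate_wordset(sample))
```

Collecting every problem rather than stopping at the first one makes it easier to fix a generated wordset in a single pass.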

An example wordset

If the entity "PAYEE" is a placeholder in the model set, dynamic entries for that entity might include "AMEX", "Visa", and any number of specific payees.

{
  "PAYEE" : [
    {
      "canonical" : "AMEX",
      "literal" : "amex"
    },
    {
      "canonical" : "AMEX",
      "literal" : "amex",
      "spoken" : ["A M E X"]
    },
    {
      "canonical" : "AMEX",
      "literal" : "american express"
    },
    {
      "canonical" : "VISA",
      "literal" : "visa"
    },
    {
      "canonical" : "SOCALGAS",
      "literal" : "southern california gas"
    },
    {
      "canonical" : "SOCALGAS",
      "literal" : "southern california gas company"
    },
    {
      "canonical" : "SOCALGAS",
      "literal" : "the gas company"
    },
    …
  ]
}

This excerpt from a wordset file defines several values for the PAYEE entity: the names of credit card companies and the name of a utility company, which might be sourced from a back-end system or customer database. Because three literals share the canonical value SOCALGAS, the caller can say "Southern California Gas" or "the gas company", and the application understands either literal as the SOCALGAS value of the PAYEE entity.
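
The literal-to-canonical mapping in this example can be modeled with a small lookup table. The following Python sketch is only an illustration of the fallback rule for canonical, not of how NLE performs interpretation; the canonical_lookup function and the "chase" entry (which has no canonical field) are hypothetical additions.

```python
# A subset of the example wordset above, as a Python dict.
wordset = {
    "PAYEE": [
        {"canonical": "AMEX", "literal": "amex"},
        {"canonical": "AMEX", "literal": "american express"},
        {"canonical": "VISA", "literal": "visa"},
        {"canonical": "SOCALGAS", "literal": "southern california gas"},
        {"canonical": "SOCALGAS", "literal": "the gas company"},
        {"literal": "chase"},  # hypothetical entry with no canonical value
    ]
}


def canonical_lookup(wordset: dict, entity: str) -> dict[str, str]:
    """Map each literal of an entity to its canonical value.

    When "canonical" is missing, the literal itself is used, matching the
    fallback rule described for NLE.
    """
    return {
        entry["literal"]: entry.get("canonical", entry["literal"])
        for entry in wordset.get(entity, [])
    }


payee_values = canonical_lookup(wordset, "PAYEE")
print(payee_values["the gas company"])
print(payee_values["chase"])
```

Here "the gas company" resolves to SOCALGAS, while "chase" falls back to its own literal.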