nrc-callsummary payload
Current version: v1
The application/x-nuance-nrc-callsummary payloads provide summary information about the NRaaS interaction (full recognition turn), including request parameters, results, statistics, and internal events.
In addition to the standard fields described in data field structure, messages with the application/x-nuance-nrc-callsummary dataContentType include the following fields:
Field | Description |
---|---|
status | Contains the status code. Go to Status codes in the NRaaS documentation for details. |
nrcSessionid | Identifier of the current NR session. |
absEndTime | Audio stream end time. Go to absEndTime for details. |
audioPacketStats | Timing information about audio packets. |
audioURN | URN of the audio file for the utterance. This field can be used to download the audio file with the AFSS API. Go to AFSS API message for details. |
audioDump | URL of the audio recording. Not included in DTMF recognitions. |
nrcallogs | The call logs with all NRaaS events that occurred during the interaction. |
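The sketch below shows what a deserialized callsummary payload might look like. All values are hypothetical placeholders, the timestamp format is assumed to be ISO 8601, and the exact shape of audioPacketStats and nrcallogs depends on your deployment; the absEndTime sub-fields are described in the next table.

```python
# Hypothetical example of a deserialized callsummary payload.
# Field names follow the table above; every value is a placeholder.
call_summary = {
    "status": 200,                                    # status code of the interaction
    "nrcSessionid": "f0a1b2c3-example-session-id",    # NR session identifier
    "absEndTime": {
        "firstPacketTime": "2024-01-01T12:00:00.000Z",  # assumed ISO 8601 timestamp
        "lastPacketTime": "2024-01-01T12:00:04.250Z",
        "audioDurationMs": 3100,                        # excludes leading/trailing silence
    },
    "audioPacketStats": {},                           # timing information about audio packets
    "audioURN": "urn:example:afss:utterance-0001",    # placeholder URN for the AFSS API
    "audioDump": "https://example.com/audio/utt-0001.wav",  # not present for DTMF
    "nrcallogs": [],                                  # NRaaS call log events (see below)
}
```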
absEndTime
The absEndTime field contains the following values:
Field | Description |
---|---|
firstPacketTime | Date and time the first audio packet was received. |
lastPacketTime | Date and time the last audio packet was received. |
audioDurationMs | Duration of the audio received, in milliseconds, minus the beginning and end silence periods as detected by NRaaS. |
Tokens in nrcallogs events
The nrcallogs field contains event records that detail recognizer executions, recognitions, special events (such as compilation and cache activities), and caller utterances. These records contain information such as:
- Timestamps of each event
- Recognition results with confidence scores
- Timing statistics of each recognition event
- Names of audio files containing caller utterances
Tokens used in every event
Token | Description |
---|---|
TIME | System time when the event occurred, in the following format (accurate to within 0.01 second): YYYYMMDDhhmmssmmm. |
CHAN | A unique session identification name provided when the session is created. |
EVNT | The event identifier. |
UCPU | The current running value of CPU time consumed from the start of the recognition or synthesis. This value is reported in milliseconds, accurate to within 0.01 second. For events where this doesn’t apply, the value is 0. |
SCPU | This value is always 0. |
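As an illustration only, the following sketch parses a single call log event, assuming it is serialized as a pipe-delimited sequence of TOKEN=value pairs (the classic Nuance Recognizer call log layout); if your deployment delivers events as structured objects, the keys are the same tokens. The TIME conversion follows the YYYYMMDDhhmmssmmm format described above.

```python
from datetime import datetime

def parse_event(line: str) -> dict:
    """Split a pipe-delimited TOKEN=value call log line into a dict.

    Assumes the classic 'TIME=...|CHAN=...|EVNT=...|...' layout; adjust the
    splitting logic if your events arrive as structured JSON instead.
    """
    tokens = {}
    for field in line.strip().split("|"):
        if "=" in field:
            key, _, value = field.partition("=")
            tokens[key] = value
    return tokens

def event_time(tokens: dict) -> datetime:
    """Convert the TIME token (YYYYMMDDhhmmssmmm) into a datetime."""
    # %f pads the trailing 3 digits up to microseconds, so '123' becomes 123 ms.
    return datetime.strptime(tokens["TIME"], "%Y%m%d%H%M%S%f")

# Hypothetical event line used purely for illustration.
sample = "TIME=20240101120000123|CHAN=chan-0001|EVNT=SWIrcst|UCPU=0|SCPU=0"
event = parse_event(sample)
print(event["EVNT"], event_time(event))
```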
SWIfrmt
This event identifies the format of call log events written by Nuance Recognizer and occurs at the beginning of a call.
Token | Description |
---|---|
ENCD | Format of log events written by Nuance Recognizer. For example, UTF-8. |
SWIclst
This event indicates the beginning of a call to the system. It is triggered at the beginning of a session.
Token | Description |
---|---|
VALU | Session ID and status. |
SRC | Component that issued the event. |
SWIliss
The SWIliss and SWIlise events indicate recognizer license usage at the beginning and end of a call to the system.
SWIliss is triggered at the start of a session, and the tokens describe the count of licenses in use after incrementing for the new license.
Token | Description |
---|---|
LUSED | Licenses used. The current number of recognizer instances. |
OMAX | Overdraft maximum. The number of available license ports (not including overdraft ports). |
LFEAT | License features. A comma-separated list showing which features are associated with the license. |
SWIgrld
This event summarizes the loading of a grammar. The event is logged whenever a grammar is loaded, activated, or compiled.
Token | Description |
---|---|
API | The called Recognizer function: “SWIrecGrammarLoad”, “SWIrecGrammarActivate”, or “SWIrecGrammarCompile”. |
TYPE | The data type of the grammar. |
URI | The grammar URI (token not written if grammar is not a URI). |
PROPS | Any properties supplied for the grammar. |
FETCHES | Number of fetches needed to load the grammar. |
MEMHITS | Memory cache hits for this load. (The number of loaded grammars that were already in the memory cache.) |
MEMMISS | Memory cache misses for this load. (The number of loaded grammars that were not already available in the memory cache.) |
DISKHITS | Disk cache hits for this load. (The number of loaded grammars that were already in the disk cache.) |
DISKMISS | Disk cache misses for this load. (The number of loaded grammars that were not already available in the disk cache.) |
LDCPU | Total CPU milliseconds used for the API call. |
LDTIME | Total clock-time milliseconds used for the API call. |
GCCPU | Total CPU milliseconds used for grammar compilation. |
GCTIME | Total clock-time milliseconds used for grammar compilation. |
IFCPU | Total CPU milliseconds to fetch the grammar(s) from inet. |
IFTIME | Total clock-time milliseconds to fetch the grammar(s) from inet. |
IFBYTES | Total bytes fetched (or re-fetched) from inet or the disk cache. |
COMPILES | Number of “real” compiles from source or old (OSR 1.n) binary files. (Total count of loaded grammars that required compilation; grammars already pre-compiled with the sgc 2.0 compiler are excluded.) |
RC | The return code from the API call. |
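Because the SWIgrld tokens expose cache counters and load timings, they are handy for spotting grammar loads that forced a compile or missed the caches. A small, hypothetical helper (building on the parse_event sketch above) might look like this:

```python
def grammar_load_report(tokens: dict) -> str:
    """Summarize a SWIgrld event: cache behaviour and load cost.

    Token names come from the SWIgrld table; the event dict is assumed to be
    produced by a parser such as parse_event above.
    """
    mem_hits = int(tokens.get("MEMHITS", 0))
    mem_miss = int(tokens.get("MEMMISS", 0))
    disk_hits = int(tokens.get("DISKHITS", 0))
    disk_miss = int(tokens.get("DISKMISS", 0))
    compiles = int(tokens.get("COMPILES", 0))
    load_ms = tokens.get("LDTIME", "?")
    return "; ".join([
        f"memory cache {mem_hits} hit(s)/{mem_miss} miss(es)",
        f"disk cache {disk_hits} hit(s)/{disk_miss} miss(es)",
        f"{compiles} compile(s) from source",
        f"load time {load_ms} ms",
    ])
```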
SWIcach
This event is a periodic summary of grammar caching activities.
Token | Description |
---|---|
MHIT | Number of times grammars were found in the memory cache. |
MMISS | Number of times grammars were not found in the memory cache. |
MSIZE | Size of the memory cache (in kilobytes). |
SWIrcst
This event is logged at the start of a recognition.
Token | Description |
---|---|
ACST | Indicates whether the acoustic state has been reset: Set to 1 when a recognizer is created or the acoustic state is reset. Set to 0 after the SWIrcst event has been logged. For the second and subsequent recognition events during a telephone call, the expected value is 0, which indicates that the acoustic state has not been reset during the call. |
GURIx | Grammar URI, where x is an integer enumerating active speech grammars, starting at 0. |
GRNM | Grammar name. (For a parameter grammar, this is the grammar ID.) |
LANG | Grammar language. (This field is empty for parameter grammars.) |
GRMT | Grammar media type. For example, “GRMT=application/srgs+xml”. |
WGHT | Activation weight of a grammar. |
OSRVER | Recognizer version number. Logged only if ACST=1. |
SWIepst
This event is written for each recognition turn and signals that the endpointer has begun the attempt to detect the start of speech.
Token | Description |
---|---|
VERSION | NR version. |
SWIepss
The SWIepss and SWIepse events indicate the endpointer license usage at the beginning and end of a call to the system. SWIepss is triggered at the start of a session, and the tokens describe the count of licenses in use after incrementing for the new license.
Token | Description |
---|---|
LUSED | Licenses used. The current number of endpointer instances. |
LMAX | License maximum. The maximum number of available licenses. The number of licenses actually checked out and available for use by an endpointer instance. |
OMAX | Overdraft maximum. The number of available license ports (not including overdraft ports). |
LFEAT | License features. A comma-separated list showing which features are associated with the license. |
SWIrcnd
This event is logged at the end of a recognition and reports the outcome, including the return status, the n-best results, and timing statistics for the recognition.
Token | Description |
---|---|
RSTT | See Return codes. |
RENR | See Reasons for end of recognition. |
ENDR | See Reasons for end of speech. |
NBST | Number of n-best items. Used only if RSTT is “ok” or “lowconf”. |
RSLT | Parsed text for n-best item. |
SPOK | Normalized raw text for n-best item; set to the value of the SWI_spoken key. |
GRMR | Grammar for n-best item. |
KEYS | List of key/value pairs for the top result. |
CONF | Confidence value for n-best item. Values can range from 0 to 999. |
RAWS | Raw score for n-best item. |
SPIV | The second pass has been invoked. When the recognizer is “unsure” about the accuracy of the nbest list, it invokes a second pass through the data to help improve the accuracy. A second pass uses more CPU and may also presage a low-confidence recognition. |
SPAG | The second pass has not modified the result of the first pass. When the recognizer is “unsure” about the accuracy of the nbest list, it invokes a second pass through the data to help improve the accuracy. A second pass uses more CPU and may also presage a low-confidence recognition. |
MDVR | Model version—version stamp of models. Format is L.M.m.s, where L is language number, M is major version, m is minor version, and s is the set number. |
MPNM | Indicates the acoustic models used for generating the recognition result. Contains a comma separated list showing the language and acoustic model filenames used for first-pass recognition processing to get the top choice on the n-best list. Each list element has the format LangCode/Version/Path/Filename. (If there is no applicable value to report, a value of NA is used.) For example: MPNM=en.us/10.0.0/models/FirstPass/models.hmm,de.de/10.0.0/models/FirstPass/models.hmm |
DPNM | Root name of the diphone acoustic models used to recognize the top choice on the n-best list. (If there is no applicable value to report, a value of NA is used.) |
MACC | Filename of the statistics file (the monophone accumulator) that tuned the acoustic model used for the recognition event. |
MEDIA | An audio media type. For example, “MEDIA=audio/basic;rate:8000”. |
EOSS | End-of-speech signal: where in the input stream the endpointer wanted the recognizer to stop. |
DURS | Amount of speech processed by the recognizer in milliseconds. The value can sometimes exceed EOSS by small amounts. |
EOSD | How much speech data was passed to the endpointer before EOS was determined. This token helps determine latency due to endpointer decision-making (mostly end of speech timeout). If EOSD equals EOSS then something unusual caused the end-of-speech; for example, the maximum speech duration timer expired. |
BORT | Beginning of recognition time (when the recognizer first processed the signal). |
EOST | End-of-speech time in milliseconds. Clock time when the endpointer determined the end of caller speech; measured in real time from the arrival of the first packet; delays in the audio path are not counted. |
EORT | End-of-recognition time in milliseconds. Clock time when the results are ready. Measured in real time from the arrival of the first packet of the input stream. |
LA | Value of the swirec_load_adjusted_speedvsaccuracy parameter used for the recognition. Values include: idle, normal, busy, Xidle, Xnormal, Xbusy. “X” values indicate that the parameter specified that value. Values without “X” were determined at runtime with the parameter setting “on”. |
OFFS | For internal use only. Shows an offset value for acoustic models. For example, “OFFS=1.3”. |
SCAL | For internal use only. Shows a multiplier for acoustic scale. For example, “SCAL=5.5”. |
RCPU | Recognizer CPU time in milliseconds. Measures how much CPU was used for the recognition. |
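The timing tokens above can be used to estimate the latency a caller perceives. As a rough, hypothetical sketch: EOST is when the endpointer decided the caller stopped speaking and EORT is when results were ready, so their difference approximates the recognition delay after end of speech.

```python
def recognition_latency_ms(tokens: dict) -> int | None:
    """Approximate post-speech recognition latency from a SWIrcnd event.

    EORT (end-of-recognition time) minus EOST (end-of-speech time) is the
    time spent producing results after the caller stopped speaking; both are
    reported in milliseconds from the arrival of the first audio packet.
    """
    try:
        return int(tokens["EORT"]) - int(tokens["EOST"])
    except (KeyError, ValueError):
        return None  # token missing or non-numeric (e.g. no speech detected)

def has_nbest_result(tokens: dict) -> bool:
    """True when RSTT indicates an n-best result was produced (ok or lowconf)."""
    return tokens.get("RSTT") in ("ok", "lowconf")
```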
Return codes
Return code | Status |
---|---|
serr | A system error occurred. |
lowconf | There was an n-best result (including any possible decoys), but it was below the setting of the confidencelevel parameter. |
maxc | The maximum CPU time was reached (swirec_max_cpu_time). |
nomatch | There was no recognition match, and no n-best result. |
ok | Recognition was successful. There is an n-best result. |
stop | Recognizer received a stop request. |
Reasons for end of recognition
Return code | Status |
---|---|
count | The maximum number of sentences was reached. (The maximum is determined by internal algorithms; this is not swirec_max_sentences.) |
err | A system error occurred. |
maxc | The maximum CPU time was reached. |
maxsrch | Recognizer’s maximum allowed search time was reached. |
maxsent | The maximum number of sentences to try was reached. |
ok | Recognition was successful. There is an n-best result. |
prun | Stopped generating the n-best list. This can occur even if no n-best entries returned. One cause is that the pruning threshold was exceeded (swirec_state_beam). But typically, it simply means that there were no more hypotheses to consider. For example, this happens if requesting an n-best size of n but the grammar has fewer than n choices. It will also happen if the recognizer has found a compelling acoustic match so that all the other hypotheses are pruned in the first pass search. |
stop | Recognizer received a stop request. |
Reasons for end of speech
Return code | Status |
---|---|
ctimeout | The end of speech was detected (completetimeout was triggered). |
eeos | External end of speech. The audio sample sent to the recognizer was labeled as the last sample. |
itimeout | Normal end of speech. |
maxs | The maximum speech time was reached (maxspeechtimeout). |
nobos | No beginning of speech detected. |
SWIacum
This event is written whenever the Recognizer collects a statistic as part of its self-learning feature (acoustic adaptation).
Token | Description |
---|---|
MODNM | Name of the recognition model associated with the statistics. |
LANG | The language of the acoustic models associated with the statistics. |
SWIrslt
This event logs the complete XML recognition result at the end of a successful recognition (SWIrcnd) when a voice platform requests a result from Nuance Recognizer.
Token | Description |
---|---|
MEDIA | Media type of the result. |
CNTNT | XML result of the recognition. The exact format of the XML string depends on the voice platform (for example, the platform might request NLSML result format). |
SECURE=TRUE | Confidential information has been suppressed (removed) from the call log record. The token only appears when TRUE. |
SWIepms
This event signals that the external endpointer has finished attempting to detect the beginning of speech.
Token | Description |
---|---|
PD | The offset, in milliseconds, from when the prompt started playing to when it stopped (either due to barge-in or because it finished playing). This value is reset to -1 before the next prompt plays. If no barge-in occurs, this value reflects the total duration of the prompt that was played. |
BOS | The offset time, in milliseconds, at which the beginning of speech in the signal was detected, with some additional backoff. For the true start of speech, see the SOS value. When set to -1, this means that the endpointer timed out. |
SOS | The offset time, in milliseconds, at which the beginning of speech in the signal was detected. If SOS is set to -1, this means that the endpointer timed out. If SOS=PD this indicates that there was barge-in, because the prompt stopped at the start of speech. |
EOS | End of speech time. The default reset value is -1, meaning that the external endpointer did not find the end of speech. The -1 value is expected when the endpointer is in begin_only mode. |
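The SWIepms offsets can be combined to tell whether the caller barged in and roughly how long they spoke. The sketch below is purely illustrative and relies only on the PD, SOS, and EOS semantics described above.

```python
def barge_in_occurred(tokens: dict) -> bool:
    """A caller barged in when the prompt stopped at the detected start of speech (SOS=PD)."""
    pd = int(tokens.get("PD", -1))
    sos = int(tokens.get("SOS", -1))
    return sos != -1 and sos == pd

def speech_duration_ms(tokens: dict) -> int | None:
    """Milliseconds between detected start and end of speech, if both were found."""
    sos = int(tokens.get("SOS", -1))
    eos = int(tokens.get("EOS", -1))
    if sos == -1 or eos == -1:
        return None  # endpointer timed out, or ran in begin_only mode
    return eos - sos
```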
SWIendp
This event is written for every recognition attempt where start of speech is detected, whether or not the recognition was successful. It is not logged if there is no start of speech. The event is also triggered when the voice platform stops the endpointer.
Token | Description |
---|---|
SRC | This token, if present, is set to “SWIep.” |
BRGN | Boolean value, set to 1 if speech was detected while the prompt was playing, 0 if not. |
BTIM | Integer number of elapsed milliseconds between the first sample and the detection of speech, counted based on the duration of the samples passed into the endpointer. |
MODE | Input mode used: spch (caller used speech), dtmf (caller used DTMF), hangup (caller disconnected; some older systems logged this as empty), timeout (no speech detected before timeout), other (the voice browser requested a stop for an unknown reason). |
SWIepse
The SWIepss and SWIepse events indicate endpointer license usage at the beginning and end of a call to the system.
SWIepse indicates the duration a license was held for the call (reported by the LTIME token, in milliseconds); the event is triggered at the end of a session, and the tokens describe the count of licenses in use after the license is released.
Token | Description |
---|---|
LUSED | Licenses used. The current number of endpointer instances. |
LMAX | License maximum. The maximum number of available licenses. The number of licenses actually checked out and available for use by an endpointer instance. |
OMAX | Overdraft maximum. The number of available license ports (not including overdraft ports). |
LFEAT | License features. A comma-separated list showing which features are associated with the license. |
LTIME | License time. It shows the number of milliseconds that the license was held since the beginning of the call. |
SWIlise
The SWIliss and SWIlise events indicate recognizer license usage at the beginning and end of a call to the system.
SWIlise indicates the duration a license was held for the call (reported by the LTIME token, in milliseconds); the event is triggered at the end of a session, and the tokens describe the count of licenses in use after the license is released.
Token | Description |
---|---|
LUSED | Licenses used. The current number of recognizer instances. |
OMAX | Overdraft maximum. The number of available license ports (not including overdraft ports). |
LMAX | License maximum. The maximum number of available licenses. The number of licenses actually checked out and available for use by a recognizer instance. |
LFEAT | License features. A comma-separated list showing which features are associated with the license. |
LTIME | License time. It shows the number of milliseconds that the license was held since the beginning of the call. |
SWIlps
This event indicates the versions of data packs used for recognition during the session.
Token | Description |
---|---|
LANGVER | Concatenation of all languages (and their data pack versions) used during the session. |
NUANtnat
This event is written near the end of every call.
Token | Description |
---|---|
TNAT | The tenant name associated with the recognition. |
SWIclnd
This event indicates the end of a call to the system. It is triggered at the end of a session.
Token | Description |
---|---|
VALU | Session ID and status. |
SRC | Nuance component that issued the event. |