Managing sensitive information in an application

Mix lets you manage sensitive information in an application so that the information is redacted in the logs.

You can mark entities, variables, and question and answer nodes as sensitive in the Mix tools.

There are two redaction options: partial and complete.

  • Partial redaction: Only the information marked as sensitive is redacted in the dialog and NLU logs. Partial redaction is implemented by marking entities and variables as sensitive in the Mix tools.

  • Complete redaction: All user data is redacted in the logs. Complete redaction is enabled by:

    • Setting a question and answer node as sensitive. In this case, all user input collected at this node is fully redacted from the ASRaaS, NLUaaS, and DLGaaS logs.

    • Setting the suppress_log_user_data field to true in the DLGaaS StartRequest. In this case, all user input is fully redacted from the logs across all services for the entire session.

    • Setting a field in the NLUaaS, TTSaaS, ASRaaS, or NRaaS API request. In this case, the logs are redacted for that specific service only.

This section summarizes these options.

Partial redaction

When you mark an entity or a variable as sensitive in Mix.nlu or Mix.dialog and complete redaction is not enabled, only the entities and variables marked as sensitive are redacted in the dialog and NLU logs.

For details on the values masked with partial redaction, see NLU values possibly masked and Dialog values possibly masked.

Complete redaction

ASRaaS, NLUaaS, TTSaaS, DLGaaS, and NRaaS all provide a field that disables logging of user data, as follows:

  • ASRaaS: Set the suppress_call_recording RecognitionFlags field to True to disable call logging. See ASR values possibly masked.

  • NLUaaS: Set the interpretation_input_logging_mode InterpretationParameters field to SUPPRESSED so that input is masked. See NLU values possibly masked.

  • TTSaaS: Set the suppress_input EventParameters field to True to omit input text and URIs from log events. See TTS values possibly masked.

  • DLGaaS: Set the suppress_log_user_data field in the StartRequestPayload to True to disable logging of user data for all the services that Dialog orchestrates. For any service that the client application calls directly, the client application must set the corresponding field in its request to that service. See Dialog values possibly masked.

  • NRaaS: Set the secure_context_level field to SUPPRESS to disable utterance waveform recording and suppress recognition results. See NR values possibly masked.
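The per-service settings listed above can be collected in a small lookup table. This is only a sketch: the field names and values come from the list, while the dictionary layout and the suppression_setting helper are hypothetical and do not call any Mix API.

```python
# Field names and values are taken from the list above.
# The table layout and helper name are hypothetical, for illustration only.
SUPPRESSION_FIELDS = {
    "ASRaaS": ("RecognitionFlags", "suppress_call_recording", True),
    "NLUaaS": ("InterpretationParameters", "interpretation_input_logging_mode", "SUPPRESSED"),
    "TTSaaS": ("EventParameters", "suppress_input", True),
    "DLGaaS": ("StartRequestPayload", "suppress_log_user_data", True),
    # For NRaaS the field also exists in DTMFRecognitionParameters.
    "NRaaS": ("RecognitionParameters", "secure_context_level", "SUPPRESS"),
}


def suppression_setting(service: str) -> str:
    """Describe the field to set for complete redaction in one service."""
    message, field, value = SUPPRESSION_FIELDS[service]
    return f"{message}.{field} = {value}"


if __name__ == "__main__":
    for name in SUPPRESSION_FIELDS:
        print(name, "->", suppression_setting(name))
```

A client that calls several services directly could use such a table to confirm that every request sets its own suppression field.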

You can also mark a question and answer node as sensitive. This enables complete redaction of all user input collected at this node (user text, utterance, intent and entity values and literals). This applies to NLU intent and entity collection, as well as to all events at that node: collection, recovery, confirmation, nomatch, noinput, max events, NO_INTENT, intent switching, and so on.

ASR values possibly masked

When the suppress_call_recording RecognitionFlags field is set to true, the recognition response is suppressed in the event logs and the corresponding audio is removed. The hypotheses field is empty and the redactedReason field is provided:

"hypotheses": [],
"redactedReason": "suppressCallRecording"

The minimally formatted text is available in the final result returned to the application, but this information is not logged.

See the ASRaaS RecognitionFlags documentation for details on setting suppress_call_recording.
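When post-processing event logs, a check like the following could detect a redacted recognition result. It is a minimal sketch that assumes the record is available as a JSON object containing the two fields shown above; the rest of the log record structure and the function name are assumptions.

```python
import json
from typing import Optional


def redaction_reason(record_json: str) -> Optional[str]:
    """Return redactedReason if the hypotheses were removed, else None.

    Assumes a JSON object with the "hypotheses" and "redactedReason"
    fields shown above; any surrounding structure is not modeled here.
    """
    record = json.loads(record_json)
    if record.get("hypotheses") == [] and "redactedReason" in record:
        return record["redactedReason"]
    return None


if __name__ == "__main__":
    sample = '{"hypotheses": [], "redactedReason": "suppressCallRecording"}'
    print(redaction_reason(sample))
```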

NLU values possibly masked

The NLU values masked depend on whether partial or complete redaction is enabled.

Partial redaction

When an entity is marked as sensitive (either in Mix.nlu or Mix.dialog) and complete redaction is not enabled, values in the data field are redacted as follows:

  • request.input: Only the sensitive entity value is redacted from the input.
  • response.result.interpretations: The literal, formatted literal, and string values are redacted for sensitive entities.
  • response.result.literal: Only the sensitive entity value is redacted from the literal.
  • response.result.formattedLiteral: The formatted literal value is redacted.
  Example of partial redaction, COFFEE_SIZE entity is marked as sensitive  
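
As a rough illustration of redacting only the sensitive entity value from an input string, the following sketch replaces the character span of each sensitive entity and leaves the rest of the text intact. It is not the Mix implementation and does not reproduce actual log output: the span representation, the function name, and the "[redacted]" placeholder are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# (entity name, start offset, end offset) within the literal; hypothetical shape.
EntitySpan = Tuple[str, int, int]


def redact_entities(literal: str, entities: Iterable[EntitySpan],
                    sensitive: Set[str], mask: str = "[redacted]") -> str:
    """Replace the text of sensitive entity spans with a placeholder."""
    parts: List[str] = []
    pos = 0
    for name, start, end in sorted(entities, key=lambda e: e[1]):
        if name in sensitive:
            parts.append(literal[pos:start])  # keep text before the entity
            parts.append(mask)                # hide the entity value itself
            pos = end
    parts.append(literal[pos:])               # keep the remaining text
    return "".join(parts)


if __name__ == "__main__":
    text = "I want a large coffee"
    spans = [("COFFEE_SIZE", 9, 14)]
    print(redact_entities(text, spans, {"COFFEE_SIZE"}))
```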

Complete redaction

When the interpretation_input_logging_mode field in the InterpretRequest is set to SUPPRESSED, values in the data field are redacted as follows:

  • request.input: Fully redacted.
  • response.result.interpretations: Fully redacted.
  • response.result.literal: Fully redacted.
  • response.result.formattedLiteral: Fully redacted.

See the NLUaaS InterpretationParameters documentation for details on setting interpretation_input_logging_mode.

  Example of complete redaction  

Dialog values possibly masked

The dialog values masked depend on whether partial or complete redaction is enabled.

Partial redaction

When an entity or a variable is marked as sensitive (either in Mix.nlu for entities or Mix.dialog for entities, variables, and question and answer nodes) and complete redaction is not enabled, values in Dialog application logs are redacted as follows:

  • Messages: Only the sensitive variable or entity is redacted from messages.
  • Data: The values of sensitive entities and variables are redacted from the data fields.
  • Utterances: The values of sensitive entities are redacted from the utterance field.
  • Entity values and literals: The values and literals of sensitive entities are redacted in the qa-config, question-router, and input-received event logs.
  • Reporting variables: Only non-sensitive variables configured for reporting are included in the reporting-vars event log.
  Example of message event, partial redaction of user_name variable  
  Example of data field in session-update event, partial redaction of user_name variable  
  Example of partial redaction in input-received event, COFFEE_SIZE entity is marked as sensitive  

Complete redaction

When the suppress_log_user_data field in the StartRequest is set to true, values in Dialog application logs are redacted as follows:

  • Messages: The messages are fully redacted.

  • Data: All entity and variable values are redacted from the data fields.

  • Utterances: The utterance fields are fully redacted.

  • Entity values and literals: All entity values and literals are redacted in the qa-config, question-router, and input-received event logs.

  • Reporting variables: Only non-sensitive variables configured for reporting are included in the reporting-vars event log.

See the StartRequestPayload documentation for details on setting suppress_log_user_data.

  Example of message event, complete redaction  
  Example of data field in session-update event, complete redaction  
  Example of complete redaction in input-received event  

NR values possibly masked

When the secure_context_level field in RecognitionParameters or DTMFRecognitionParameters is set to SUPPRESS, sensitive information is suppressed from the event logs and the corresponding audio is removed. Utterance waveforms are not recorded, and recognition results in the diagnostic and call logs are suppressed.

See the RecognitionParameters or DTMFRecognitionParameters documentation for details on setting secure_context_level.

TTS values possibly masked

When the suppress_input field in the SynthesisRequest is set to true, input text and URIs are omitted from the log events.

Credit card numbers

All the Mix engines attempt to mask or redact credit card numbers in the event logs.

In all engines except ASR, credit card numbers that are between 13 and 19 characters long and pass the Luhn algorithm test are masked in the Mix event logs. Potential credit card numbers can be interspersed with spaces or hyphens. For example, the following values are masked:

  • XXXX XXXX XXXX XXXX
  • XXXXXXXXXXXXXXXX
  • XXXX-XXXX-XXXX-XXXX

As an additional precaution, you may also mark entities that collect credit card numbers as sensitive.

ASR uses a slightly different algorithm to detect and redact credit card numbers: see ASR below.
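
The Luhn-based check described above for the engines other than ASR can be sketched as follows. This is an illustration, not the code that Mix runs: it assumes the 13 to 19 length refers to the number of digits, allows a single space or hyphen between digits, and uses the NLU replacement string ****** as a default mask. The function names and the regular expression are hypothetical.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens,
# and not embedded in a longer run of digits.
CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_card_numbers(text: str, mask: str = "******") -> str:
    """Replace candidates that pass the Luhn check; leave other text as is."""
    def repl(match: "re.Match[str]") -> str:
        digits = re.sub(r"[ -]", "", match.group())
        return mask if luhn_ok(digits) else match.group()
    return CANDIDATE.sub(repl, text)


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number that passes the check.
    print(mask_card_numbers("card 4111 1111 1111 1111 please"))
```

Because the Luhn check rejects most random digit strings, a value such as an order number of the same length is usually left unmasked, which is one reason to also mark card-collecting entities as sensitive.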

NLU

In NLU, the credit card number is redacted from the logs and replaced with ******.

  Example of redacted log for NLU  

TTS

In TTS, the credit card number is redacted from the logs and replaced with the string *** POSSIBLE CC NUMBER REDACTED ***.

  Example of redacted log for TTS  

NR

In NR, if a potential credit card number is detected, the results are deleted in the call logs for that recognition and the string FluentD redacted possible CCN appears instead.

  Example of redacted log for NR  

Dialog

In Dialog, a credit card number is redacted from the logs and replaced with the string ****POSSIBLE CC NUMBER REDACTED****.

Dialog redacts content in the following cases:

  • When the user provides input (for example, in qa-config, input-required, input-received, and question-router events).

  • During data exchanges with the client application or a backend server (for example, in session-update, data-required, data-received, input-required, transfer-initiated, transfer-completed, continue-initiated, continue-completed, application-ended, and message events).

  Example of redacted log for Dialog  

ASR

ASR checks for credit card numbers in the recognition result by looking for 12 or more digits in each transcription hypothesis. The digits can be consecutive or non-consecutive. It then removes hypotheses that contain a potential credit card number from the event logs. For example, both these hypotheses are redacted:

  • My number is 123456789012
  • My number is 45004688, no sorry, that’s 4689

When 12 or more digits are detected in a transcription, the hypotheses are deleted for that recognition and the redactedReason field is returned, explaining why the content was redacted:

"hypotheses": [],
"redactedReason": "generic_digits"
  Example of redacted log for ASR
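
The digit-count rule described in this section can be sketched as follows. This is an approximation under stated assumptions, not the ASR implementation: hypotheses are modeled as plain strings, only the ASCII digits 0 to 9 are counted, and the function name is hypothetical. The returned fields mirror the fragment shown above.

```python
from typing import Dict, List

MIN_DIGITS = 12  # threshold stated in this section


def count_digits(text: str) -> int:
    """Count ASCII digits anywhere in the text, consecutive or not."""
    return sum(1 for ch in text if "0" <= ch <= "9")


def redact_hypotheses(hypotheses: List[str]) -> Dict[str, object]:
    """Drop all hypotheses for a recognition if any holds 12 or more digits."""
    if any(count_digits(h) >= MIN_DIGITS for h in hypotheses):
        return {"hypotheses": [], "redactedReason": "generic_digits"}
    return {"hypotheses": list(hypotheses)}


if __name__ == "__main__":
    print(redact_hypotheses(["My number is 123456789012"]))
    print(redact_hypotheses(["Call me at 5551234"]))
```

Unlike the Luhn-based check used by the other engines, this rule does not validate the digits, so any transcription with enough digits is removed from the logs.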