Remove sensitive content
The Remove sensitive content action in CodeTwo Exchange Rules Pro automatically detects confidential data and sensitive content in messages, and removes, masks or replaces it. If a rule containing this action is enabled, the program searches for the selected phrases in all messages that meet the criteria defined on the Conditions tab.
Phrases available in the program are grouped into categories such as Offensive Language, Spam, Drugs & Alcohol, etc. Each phrase is assigned a score, and each category (a collection of phrases) is assigned a required cumulative score that needs to be reached for the action to be triggered. The Remove sensitive content action takes effect only when the total score of all phrases found within a message reaches the required cumulative score of a category.
To start configuring this action, click Add on the Actions tab and select Remove sensitive content (Fig. 2.).
The Remove sensitive content action will appear on the List of actions while the right pane will show the action's properties (Fig. 3.).
First, specify what should happen to the sensitive content once it’s found in the message (Fig. 4.). The choice is between:
- removing the content - every sensitive phrase will be deleted from the message,
- masking the content - every sensitive phrase found in the message will be replaced with a string of ****,
- replacing the content - every sensitive phrase will be replaced with another chosen phrase.
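The three options above can be illustrated with a short Python sketch. This is not how the program is implemented; the function name apply_action, the mode strings, and the fixed four-asterisk mask are assumptions made for the example.

```python
import re

def apply_action(text, phrase, mode, replacement=""):
    """Remove, mask, or replace every case-insensitive occurrence of `phrase`.

    Hypothetical helper: the mode names and the "****" mask are assumptions.
    """
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    if mode == "remove":
        return pattern.sub("", text)
    if mode == "mask":
        return pattern.sub("****", text)
    if mode == "replace":
        # A function is used so that backslashes in `replacement` stay literal.
        return pattern.sub(lambda match: replacement, text)
    raise ValueError(f"unknown mode: {mode}")

print(apply_action("Top SECRET plan", "secret", "mask"))
print(apply_action("Top SECRET plan", "secret", "replace", "[redacted]"))
```

Note that plain removal can leave doubled spaces behind, which is one reason masking or replacing may read better in the delivered message.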
Next, select the phrases that the program will search for within the contents of messages. Categories of phrases are displayed in the right pane. You can select an entire category by checking the box next to it. Here you can also modify the category's cumulative score required to trigger the Remove sensitive content action (Fig. 5.).
You can also:
- create a new category or remove an existing category from the list,
- see the phrases in each category, add new phrases to a category, edit or remove the existing phrases.
A phrase score is a numerical value assigned to each phrase. All the predefined phrases in each category have their score set by default. If you are adding a new phrase, you need to set its score manually. You can adjust the score at any time.
The Remove sensitive content action is triggered only when the required cumulative score for the category is reached. When a phrase appears more than once in a message, its score is counted only once when the total for the category is calculated. For example, if the required cumulative score for the Sexual Content category is 20, and the phrase uncensored with a score of 10 appears three times in the message, the total is still 10, so the Remove sensitive content action will not take effect.
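The scoring rule can be sketched in Python as follows. The function name category_triggered is hypothetical, whole-word matching is assumed for simplicity, and the second phrase (clip, score 10) is invented for the example.

```python
import re

def category_triggered(message, phrase_scores, required_score):
    """Return True when the summed scores of distinct phrases found in
    `message` reach `required_score`. Each phrase counts at most once."""
    total = 0
    for phrase, score in phrase_scores.items():
        # Whole-word, case-insensitive match (an assumption for this sketch).
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, message, re.IGNORECASE):
            total += score  # added once, however often the phrase occurs
    return total >= required_score

message = "Uncensored clip, uncensored photos, uncensored everything."
print(category_triggered(message, {"uncensored": 10}, 20))              # False
print(category_triggered(message, {"uncensored": 10, "clip": 10}, 20))  # True
```

The first call mirrors the example above: three occurrences of one phrase still contribute only 10. The second call reaches 20 because two different phrases are found.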
The program can use four types of phrases when it processes messages (Fig. 7.):
- Exact phrase - the program searches for the exact match of a chosen phrase, e.g. test. If the phrase test is found in the message, its score is taken into account when calculating the required cumulative score. If the message contains the word tester, the program simply ignores it. The search operation is case-insensitive.
- Wildcard - the program searches for the phrase that matches or contains the selected phrase, e.g. test*. If the message contains the phrase test or any other word containing it, e.g. tester or testing, the score of each such phrase is taken into account when calculating the required cumulative score for a category. Note that wildcards (*) can be used only at the beginning or at the end of a selected phrase, e.g. *test or test*. The search is case-insensitive.
- Regular expression - allows you to define a sequence of characters or phrases that form a search pattern. The search operation is case-insensitive. CodeTwo Exchange Rules Pro supports the standard Microsoft .NET Framework regular expression syntax. Learn more about regular expressions in this Microsoft article and visit our Knowledge Base article to see examples.
- Algorithm (Fig. 8.) - this phrase type is available only for credit card numbers.
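The difference between the Exact phrase and Wildcard types described above can be sketched in Python. The helper phrase_to_regex is hypothetical, and treating word boundaries as "not adjacent to a letter, digit or underscore" is an assumption about how the program delimits words.

```python
import re

def phrase_to_regex(phrase):
    """Translate an Exact phrase ("test") or a Wildcard phrase ("test*",
    "*test") into a case-insensitive regular expression (illustrative only)."""
    leading = phrase.startswith("*")
    trailing = phrase.endswith("*")
    core = re.escape(phrase.strip("*"))
    # A wildcard side absorbs extra word characters; a plain side must sit
    # on a word boundary.
    prefix = r"\w*" if leading else r"(?<!\w)"
    suffix = r"\w*" if trailing else r"(?!\w)"
    return re.compile(prefix + core + suffix, re.IGNORECASE)

print(phrase_to_regex("test").search("the tester"))                 # None
print(phrase_to_regex("test*").search("we are testing").group())    # testing
print(phrase_to_regex("*test").search("a contest").group())         # contest
```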
The Algorithm phrase definition is based on the Luhn algorithm and can be used to search for correctly entered credit card numbers within the body of a message. A credit card number inside a message will be recognized by the program if:
- it is entered correctly (according to the pattern defined by the credit card provider);
- it is preceded and followed by a white space character or any character other than a digit, a letter, or a plus (+) sign;
- the digits are separated by dots (.), dashes (-) or white spaces (except for the end of line characters).
If any other characters are found within the credit card number, that number will not be considered a valid occurrence.
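For reference, the Luhn check and the boundary and separator rules listed above can be sketched in Python. This is an illustration, not the program's implementation: the names luhn_valid and find_card_numbers are hypothetical, the 13 to 19 digit length range and ASCII-only letters are assumptions, and provider-specific prefixes and lengths are not checked.

```python
import re

def luhn_valid(digits):
    """Luhn checksum: double every second digit from the right, subtract 9
    from any result above 9, and require the sum to be divisible by 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Candidate: 13-19 digits (assumed range), optionally separated by a dot,
# a dash, a space or a tab, and not adjacent to a letter, a digit or "+".
CANDIDATE = re.compile(
    r"(?<![0-9A-Za-z+])\d(?:[ \t.\-]?\d){12,18}(?![0-9A-Za-z+])"
)

def find_card_numbers(text):
    """Return the digit strings of all candidates that pass the Luhn check."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ \t.\-]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits

# 378282246310005 is a widely published American Express test number.
print(find_card_numbers("Card: 3782-822463-10005."))  # ['378282246310005']
print(find_card_numbers("Ref X378282246310005"))      # []
```

The second call returns nothing because the number is directly preceded by a letter, which matches the boundary rule in the list above.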
Currently, the program can recognize the numbers of credit cards issued by the following providers:
- American Express
- Diners Club
If you want the program to recognize the numbers of credit cards from other providers, you need to define them in the Credit Cards category by using the regular expression type of phrase.
In certain situations, some strings of numbers may be wrongly recognized as credit card numbers, for example when a message contains passenger name records (PNR) from a database of a computer reservation system (CRS) or technical parameters of devices.
Remember to click the Submit changes button located on the Administration Panel's ribbon to apply your changes.
In the case of HTML messages, the Remove sensitive content action searches for phrases inside the HTML code, and may not work as expected in the following cases:
- Part of the phrase is formatted
The Remove sensitive content action may not recognize a multi-word phrase when part of it is in bold, colored, etc. or when an additional character, such as a non-breaking space (&nbsp;), is included in the phrase. In such cases, HTML tags or entities appear in the phrase’s source code. If any tags are included in the phrase’s source code, it is impossible for the program to process the phrase correctly.
- A line break <br> tag is inserted by Outlook inside a phrase
When a long (multi-word) sensitive phrase is present in a message sent from Outlook, Outlook may automatically break such a phrase into two or more lines. This adds the <br> tag to the phrase’s HTML code and prevents the program from recognizing the phrase as sensitive.
One possible solution to these issues is to use regular expressions when defining sensitive phrases. For example, if you want to remove the phrase "External Email: Exercise caution before clicking on links or attachments" from HTML messages, use the following regular expression:
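As a rough illustration of the idea only, and not CodeTwo's own expression, the Python sketch below builds a pattern that tolerates white space, the &nbsp; entity and HTML tags between the words of a phrase. The names GAP and tolerant_pattern are hypothetical, and the sample HTML is invented. The constructs used, (?:...), \s and [^>]+, also exist in .NET regular expressions, but the resulting pattern should be tested in the program before use.

```python
import re

# Anything that may sit between two words of the phrase in HTML source:
# white space, a non-breaking-space entity, or a tag such as <b> or <br>.
GAP = r"(?:\s|&nbsp;|<[^>]+>)+"

def tolerant_pattern(phrase):
    """Join the escaped words of `phrase` with the GAP sub-pattern."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(GAP.join(words), re.IGNORECASE)

phrase = "External Email: Exercise caution before clicking on links or attachments"
html = ("<p>External <b>Email:</b> Exercise caution<br>"
        "before clicking on&nbsp;links or attachments</p>")

print(tolerant_pattern(phrase).pattern)
print(tolerant_pattern(phrase).sub("", html))  # <p></p>
```

This sketch only handles markup between words. A tag that splits a single word, or one that opens before the phrase and closes inside it, would need a more permissive pattern.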
The option of last resort is to send messages in the RTF or Plain Text format.