How to use regular expressions
Problem:
How to use regular expressions.
Solution:
Regular expressions are patterns used to match or find character strings by means of wildcards and metacharacters. In the CodeTwo Exchange Rules family, regular expressions can be used to remove sensitive content or to check whether the subject contains text that follows a given pattern. Unlike a search for a fixed phrase, a single regular expression can cover a whole variety of character strings.
Let’s analyze the following regular expression:
[0-9]-[a-z]
And the character string: a12-c3
The regular expression engine goes through the string character by character and checks whether the characters match the pattern. First, it checks whether the character is a digit in the range 0-9. In our string, the first character is a letter (a), so the search restarts from the next character. That character is a digit (1), so it meets the first requirement. To meet the second requirement, the following character must be a hyphen, but in our string it is the digit 2. Therefore, the search restarts from that character (2). This time both the first and the second requirements are met (2 is a digit and the character that follows it is a hyphen). The last requirement is that the next character is a lower-case letter in the range a-z, which is also fulfilled, as the next character in our string is the letter ‘c’.
Summing up, string fragment 2-c matches our regular expression [0-9]-[a-z].
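The walk-through above can be reproduced with a short script. This is a minimal sketch that assumes Python's re module purely for illustration; the article does not tie the example to any particular engine, and the variable names are made up for this example.

```python
import re

pattern = re.compile(r"[0-9]-[a-z]")
text = "a12-c3"

# search() scans the string from left to right and returns the first
# position at which the whole pattern matches.
match = pattern.search(text)

print(match.group())  # 2-c
print(match.span())   # (2, 5)
```

The reported span starts at index 2, which is the digit 2: the attempts starting at 'a' and at '1' both fail before a complete match is found.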
But how can we match a whole string? The regular expression above matches exactly 1 digit, 1 hyphen and 1 letter. To match the whole string, we use:
[a-z0-9]{3}-[a-z0-9]{2}
This way we specify that the string should consist of 3 characters that are digits or lower-case letters, followed by one hyphen, and end with 2 characters that are digits or lower-case letters.
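The difference between matching a fragment and matching the whole string can be checked as follows. This is a sketch that assumes Python's re module; fullmatch() is Python's way of requiring the pattern to cover the entire string, and other engines may express this differently.

```python
import re

text = "a12-c3"
fragment_pattern = re.compile(r"[0-9]-[a-z]")
whole_pattern = re.compile(r"[a-z0-9]{3}-[a-z0-9]{2}")

# fullmatch() succeeds only if the pattern covers the entire string.
fragment_result = fragment_pattern.fullmatch(text)
whole_result = whole_pattern.fullmatch(text)

print(fragment_result)       # None
print(whole_result.group())  # a12-c3
```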
Syntax:
To use regular expressions correctly, it is vital to get the syntax right. Below you will find the key syntax elements with examples:
- . (dot) means 'any character':
John.
- matches John1, Johns, John@ etc.
- \d or [0-9] means 'any digit':
invoice no\d
- matches invoice no1, invoice no2
- \w or [A-Za-z0-9_] means 'any word character' (digits, letters and underscore)
\w\w\w
- matches A53, S11, _a0, a_0 etc.
- \s means 'any whitespace character' (space, tab, line break etc.)
no\s\d
- matches no 3, no 8, etc.
- * means 'zero or more occurrences of the preceding element'
John.*
- matches John, Johny, Johnny, John12311, John?, John!
- + means 'one or more occurrences of the preceding element'
John+y
- matches Johny, Johnny, Johnnny
- ? means 'zero or one occurrence of the preceding element'
Not?
- matches No, Not
- {N} means 'N instances of the preceding character'
\d{3}
- matches 007, 123, 545, etc.
- {N,M} means 'between N and M instances of the preceding character'
.{2,4}
- matches jack, c4, ..., &&
- {N,} means 'at least N instances of the preceding character'
\d{3,}
- matches 123, 3245234 etc.
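- {,N} means 'a maximum of N instances of the preceding character'; not every regular expression engine accepts this form, and {0,N} is the widely supported equivalent
\d{,2}
- matches 1, 11, etc.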
- $ indicates the end of a string or a line
domain\.com$
- matches domain.com, my-domain.com, etc. but does not match my-domain.com.pl
- [ ] matches one element on the list, all listed elements must be located between square brackets
[not]{2}
- matches no, to, oo, on, nn, etc.
- - (dash) creates a scope on the list
[a-c0-2]{2,}
- matches abc012, ab, c2, 120, ac02, etc.
- ^ placed at the beginning of a list (right after the opening square bracket) means a logical ‘not’
[^7]{2}
- matches all two-character strings without digit 7
- | means a logical ‘or’
(c|m)at
- matches cat and mat
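- & means a logical ‘and’ in this notation; note that most regular expression engines treat & as a literal ampersand and ^ outside square brackets as the start of a string or line, so test such expressions before relying on them
(^\d)&(^[a-z])
- is meant to match any character that is NOT a digit and that is not a letter, e.g. @, &, ), #, etc.; the widely supported way to write this is the negated list [^0-9a-z]
- \ means that the character that follows should be treated as a normal character, not as a syntax element, e.g. \. means a dot rather than any character.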
\\small\.
- matches only string \small. (with a dot at the end).
- \n is a new line.
- \r is a carriage return.
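Several of the syntax elements listed above can be checked against the sample strings given for them. The sketch below assumes Python's re module for illustration only; CASES and failures are hypothetical names introduced for this example, and other engines may differ in details such as what \w and \d cover.

```python
import re

# (pattern, strings the list above gives as matching it)
CASES = [
    (r"John.", ["John1", "Johns", "John@"]),
    (r"invoice no\d", ["invoice no1", "invoice no2"]),
    (r"\w\w\w", ["A53", "S11", "_a0", "a_0"]),
    (r"no\s\d", ["no 3", "no 8"]),
    (r"John.*", ["John", "Johny", "Johnny", "John12311", "John?", "John!"]),
    (r"John+y", ["Johny", "Johnny", "Johnnny"]),
    (r"\d{3}", ["007", "123", "545"]),
    (r".{2,4}", ["jack", "c4", "...", "&&"]),
    (r"\d{3,}", ["123", "3245234"]),
    (r"[not]{2}", ["no", "to", "oo", "on", "nn"]),
    (r"(c|m)at", ["cat", "mat"]),
    (r"\\small\.", ["\\small."]),
]


def failures(cases):
    """Return (pattern, sample) pairs where the whole sample does not match."""
    return [
        (pattern, sample)
        for pattern, samples in cases
        for sample in samples
        if re.fullmatch(pattern, sample) is None
    ]


print(failures(CASES))  # []

# $ anchors the match at the end of the string, so search() is used here.
ends_with_domain = re.compile(r"domain\.com$")
print(bool(ends_with_domain.search("my-domain.com")))     # True
print(bool(ends_with_domain.search("my-domain.com.pl")))  # False

# A negated list: two characters, neither of which is the digit 7.
print(bool(re.fullmatch(r"[^7]{2}", "12")))  # True
print(bool(re.fullmatch(r"[^7]{2}", "17")))  # False
```

An empty list from failures(CASES) means every sample is matched in full by its pattern. A quick check of this kind is a cheap way to catch a typo in a pattern before it is used in a rule.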
Example 1:
invoice\s((no|number)\s)?\d{4,}
This expression matches the following character strings:
- invoice no 1233
- invoice number 213252341324
- invoice 12366
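Example 1 can be tried out with a few lines of code. This sketch assumes Python's re module, where & is a literal ampersand rather than an operator, so it writes the optional group as (no|number) followed by whitespace, with no & in it. The variable names are hypothetical.

```python
import re

invoice = re.compile(r"invoice\s((no|number)\s)?\d{4,}")

samples = ["invoice no 1233", "invoice number 213252341324", "invoice 12366"]
invoice_results = [bool(invoice.fullmatch(s)) for s in samples]
print(invoice_results)  # [True, True, True]

# Fewer than four digits: \d{4,} cannot be satisfied.
too_short = bool(invoice.fullmatch("invoice no 123"))
print(too_short)  # False
```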
Example 2:
Order\s(no\s)?[0-9\-]+
This expression matches the following character strings:
- Order 123-15-123413
- Order no 1
- Order -
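Example 2 can be verified in the same way. The sketch below assumes Python's re module. The stricter variant at the end is not part of the article; it is a hypothetical refinement for cases where a lone hyphen, as in "Order -", should not count as an order number.

```python
import re

order = re.compile(r"Order\s(no\s)?[0-9\-]+")

samples = ["Order 123-15-123413", "Order no 1", "Order -"]
order_results = [bool(order.fullmatch(s)) for s in samples]
print(order_results)  # [True, True, True]

# Hypothetical stricter variant: the number must start with a digit.
strict_order = re.compile(r"Order\s(no\s)?[0-9][0-9\-]*")
strict_results = [bool(strict_order.fullmatch(s)) for s in samples]
print(strict_results)  # [True, True, False]
```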
If you’d like to learn more about regular expressions, go here.
Related products: | CodeTwo Exchange Rules 2007 4.x, CodeTwo Exchange Rules 2010 3.x, CodeTwo Exchange Rules 2013 2.x, CodeTwo Exchange Rules 2016 1.x, CodeTwo Exchange Rules 2019 1.x, CodeTwo Exchange Rules Pro 1.x, 2.x, General (Microsoft 365, Exchange & more) |
Categories: | How-To |
Last modified: | November 6, 2023 |
Created: | May 31, 2011 |
ID: | 177 |