CodeTwo Base

How to use regular expressions

Problem:

How to use regular expressions.

Solution:

Regular expressions are formulas that match or find character strings by using wildcards and metacharacters. In the CodeTwo Exchange Rules family, regular expressions can be used to remove sensitive content or to check whether a message subject contains specific text. Searching for a text phrase finds only that exact phrase. A regular expression, by contrast, is a single formula that covers a whole variety of character strings.

Let’s analyze the following regular expression:

[0-9]-[a-z]

and how it is applied to the string: a12-c3

The regular expression examines the characters of a given string one after another and checks whether they match a predefined formula.

First, it checks whether the character is a digit in the range <0,9>. In our string, the first character is a letter, so the analysis restarts at the next character. The next character is a digit (1), so it meets the first requirement of our expression. To meet the next requirement, the following character must be a hyphen, but in our string it is the digit 2 instead. The search therefore restarts at the next character (2).

This time both the first and the second requirements are met: 2 is a digit, and the character that follows it is a hyphen. The last condition is that the next character must be a lowercase letter in the range <a,z>. This is fulfilled as well, because the next character in our string is the letter 'c'.

To sum up, the string fragment 2-c matches our regular expression [0-9]-[a-z].
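The same left-to-right scan can be reproduced with Python's re module. It is used here only as an illustration, since the engine in CodeTwo products may differ in details. re.search returns the first fragment of the string that fits the pattern.

```python
import re

# Scan "a12-c3" from left to right and return the first fragment that is
# "one digit, a hyphen, one lowercase letter".
match = re.search(r"[0-9]-[a-z]", "a12-c3")

print(match.group())  # prints 2-c
print(match.start())  # prints 2 (zero-based position of the character '2')
```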

But how can we match the whole string? The regular expression above matches exactly one digit, one hyphen and one letter. To match the whole string a12-c3, we use:

[a-z0-9]{3}-[a-z0-9]{2}

This specifies that the string must start with 3 characters that are digits or lowercase letters. These are followed by one hyphen, and the string ends with 2 characters that are digits or lowercase letters.
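In Python's re module (again only an illustrative stand-in), the difference between "somewhere in the string" and "the whole string" is visible when re.search is compared with re.fullmatch. The sketch below uses this pattern and one extra, made-up test string, xa12-c3.

```python
import re

PATTERN = r"[a-z0-9]{3}-[a-z0-9]{2}"

# fullmatch succeeds only if the pattern covers the entire string.
print(bool(re.fullmatch(PATTERN, "a12-c3")))   # True
# Four characters before the hyphen, so the whole string does not fit.
print(bool(re.fullmatch(PATTERN, "xa12-c3")))  # False
# search looks anywhere in the string and still finds "a12-c3" inside it.
print(re.search(PATTERN, "xa12-c3").group())   # a12-c3
```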

Syntax:

To use regular expressions correctly, you need to use the correct syntax. Below you will find the key syntax elements with examples:

  • . (dot) means 'any character':
    John.

    - matches John1, Johns, John@ etc.

  • \d or [0-9] means 'any digit':
    invoice no\d 

    - matches invoice no1, invoice no2

  • \w or [A-Za-z0-9_] means 'any word character' (digits, letters and underscore)
    \w\w\w

    - matches A53, S11, _a0, a_0 etc.

  • \s means 'any whitespace character' (space, tab, line break, etc.)
    no\s\d

    - matches no 3, no 8, etc.

  • * means 'zero or more occurrences of the preceding character'
    John.*

    - matches John, Johny, Johnny, John12311, John?, John!

  • + means 'one or more occurrences of the preceding character'
    John+y

    - matches Johny, Johnny, Johnnny

  • ? means 'zero or one occurrence of the preceding character'
    not?

    - matches no, not

  • {N} means 'N instances of the preceding character'
    \d{3}

    - matches 007, 123, 545, etc.

  • {N,M} means 'between N and M instances of the preceding character'
    .{2,4}

    - matches jack, c4, ..., &&

  • {N,} means 'at least N instances of the preceding character'
    \d{3,}

    - matches 123, 3245234 etc.

  • {,N} means 'a maximum of N instances of the preceding character'
    \d{,2}

    - matches 1, 11, etc.

  • [ ] matches any one of the characters listed between the square brackets
    [not]{2}

    - matches no, to, oo, on, nn, etc.

  • - (dash) creates a range inside the square brackets
    [a-c0-2]{2,}

    - matches abc012, ab, c2, 120, etc.

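  • ^ placed directly after the opening square bracket means a logical 'not' (any character except the listed ones)
    [^7]{2}

    - matches all two-character strings that do not contain the digit 7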

  • | means a logical ‘or’
    (c|m)at

    - matches cat and mat

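  • & has no special meaning in common regular expression engines (for example .NET, PCRE and Python); it simply matches an ampersand. To require that a character is neither a digit nor a letter, put both exclusions in one negated list:
    [^0-9a-zA-Z]

    - matches any character that is NOT a digit and NOT a letter, e.g. @, &, ), #, etc.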

  • \ (backslash) means that the character that follows is treated as a normal character, not as a syntax element. For example, \. means a dot rather than any character.
    \\small\.

    - matches only string \small. (with a dot at the end).

  • \n is a new line.
  • \r is a carriage return.
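The check below runs most of the examples from the list through Python's re module. This engine is an illustrative stand-in, and the rule engine in CodeTwo products may differ in details. re.fullmatch is used so that the whole string has to fit the pattern. Each pattern is paired with a made-up string it should reject.

```python
import re

# (pattern, strings that should match in full, one string that should not)
CASES = [
    (r"John.", ["John1", "Johns", "John@"], "John"),
    (r"invoice no\d", ["invoice no1", "invoice no2"], "invoice noX"),
    (r"\w\w\w", ["A53", "S11", "_a0", "a_0"], "A-3"),
    (r"no\s\d", ["no 3", "no 8"], "no3"),
    (r"John.*", ["John", "Johny", "John12311", "John!"], "Jon"),
    (r"John+y", ["Johny", "Johnny", "Johnnny"], "Johy"),
    (r"not?", ["no", "not"], "nott"),
    (r"\d{3}", ["007", "123", "545"], "12"),
    (r".{2,4}", ["jack", "c4", "&&"], "jackson"),
    (r"\d{3,}", ["123", "3245234"], "12"),
    (r"[not]{2}", ["no", "to", "oo", "on", "nn"], "na"),
    (r"[a-c0-2]{2,}", ["abc012", "ab", "c2", "120"], "ac03"),
    (r"[^7]{2}", ["12", "ab"], "17"),
    (r"(c|m)at", ["cat", "mat"], "bat"),
    (r"\\small\.", ["\\small."], "\\smallx"),
]

for pattern, good, bad in CASES:
    for text in good:
        assert re.fullmatch(pattern, text), (pattern, text)
    assert not re.fullmatch(pattern, bad), (pattern, bad)

print("all syntax examples behave as described")
```

Note that ac03 is rejected by [a-c0-2]{2,}, because the digit 3 lies outside the range 0-2.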

Example 1:

invoice\s((no|number)\s)?\d{4,}

This expression matches the following character strings:

  • invoice no 1233
  • invoice number 213252341324
  • invoice 12366
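The sketch below checks the three sample strings with Python's re module, using the pattern invoice\s((no|number)\s)?\d{4,}. It assumes standard regular expression syntax, in which & is an ordinary character rather than an operator. The last line shows what happens when & is kept in the pattern.

```python
import re

# Optional "no " or "number " between "invoice " and a number of 4+ digits.
PATTERN = r"invoice\s((no|number)\s)?\d{4,}"

samples = ["invoice no 1233", "invoice number 213252341324", "invoice 12366"]
for text in samples:
    print(text, "->", bool(re.fullmatch(PATTERN, text)))  # all True

# Fewer than four digits is rejected.
print(bool(re.fullmatch(PATTERN, "invoice no 123")))  # False

# With "&" in the pattern, Python requires a literal ampersand after "no"/"number",
# so the first sample is no longer matched.
print(bool(re.fullmatch(r"invoice\s((no|number)&\s)?\d{4,}", "invoice no 1233")))  # False
```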

Example 2:

Order\s(no\s)?[0-9\-]+

This expression matches the following character strings:

  • Order 123-15-123413
  • Order no 1
  • Order -
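The same kind of check for Example 2, again using Python's re module as an illustrative engine, is shown below. "Order no ABC" is a made-up counterexample.

```python
import re

# "Order", whitespace, an optional "no ", then one or more digits or hyphens.
PATTERN = r"Order\s(no\s)?[0-9\-]+"

samples = ["Order 123-15-123413", "Order no 1", "Order -"]
for text in samples:
    print(text, "->", bool(re.fullmatch(PATTERN, text)))  # all True

# Letters are outside [0-9\-], so this string is not matched.
print(bool(re.fullmatch(PATTERN, "Order no ABC")))  # False
```

The third sample shows that [0-9\-]+ also accepts a lone hyphen. If that is unwanted, a variant such as \d[0-9\-]* would require at least one leading digit.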

