**Overview**
When normalization rules were first explained to me in an Office Communications Server 2007 training class, I left completely confused.

I spent quite a lot of time trying to understand how normalization rules work. First, I found that normalization rules are .Net Expressions. A quick search of the Internet for .Net Expression primers and help guides did not help with understanding how they worked.

I finally found a piece of software called

.NET Regular Express Designer that allowed me to see what is happening in a .Net expression and more importantly a normalization rule.

Let's start with why we need telephone numbers (straight from the IETF/ITU standards).

- A telephone number is a string of decimal digits that uniquely indicates the network termination point.
- The number contains the information necessary to route the call to the termination point.

A Normalization Rule modifies the user input and presents a fully routable telephone number that can be used by Office Communications Server (OCS) / Lync Server and the PSTN to delivery a voice call to the intended termination point. To OCS / Lync Server, your telephone number is effectively meaningless if it is not presented in E.164 format.

Humans are inconsistent, especially with how we write phone numbers down. People use parens, dashes, dots, and spaces for example. Users in a business might only know a 4 digit extension to call another employee. Normalization Rules help the humans enter the phone number in the format they are used to and then translate that to the pattern that OCS / Lync Server is expecting.

There are three main processes happening when a normalization rule is used.

- Does the Phone Pattern Regular Expression match the input?
- What is captured in the Phone Pattern Regular Expression to be used by the Translation Pattern Regular Expression?
- What is the Translated number?

**Example Normalization Rule**

Phone Pattern Regular Expression: ^(2\d\d\d)$

Translation Pattern Regular Expression: +1425555$1

The "^" specifies that the match must occur at the beginning of the string.

Anything between parens is captured into a group. If there are more than one set of parens then there are multiple groups.

Any letter that is after "\" is considered a language element and has a special function. For example \d is a single digit wildcard. \D is a single character wild card.

"$" Specifies that the match must occur at the end of the string.

In the above example we are matching against any 4 digit number that starts with a 2. We are capturing the 2 into group 1 plus any other 3 digits that follow. If a number is 5 digits it will not match. If a number starts with any other number than 2 it will not match.

Now that we have captured group 1 we can take a look at the Translated Pattern Regular Expression

Phone Pattern Regular Expression: ^(2\d\d\d)$

Translation Pattern Regular Expression: +1425555$1

The +1425555 are absolute digits and will be inserted before the captured digits in group 1 "$1". Each group is represented by a $ and a digit for the order in which they were captured. The second group captured would have a "$2" in the Translation Pattern Regular Expression.

If we entered 2345 then the translated pattern would be +1425552345.

What if we wanted to match against 5 digits and only capture 4 for example?

Phone Pattern Regular Expression: ^6(\d\d\d\d)$

Translation Pattern Regular Expression: +1425555$1

The above rule would match any 5 digit number that started with a 6. But, because the 6 is not within the Parens we will not capture the 6 into group 1.

If we entered 62345 then the translated pattern would be +1425552345.

Is there an easier way to specify multiple digits rather than writing\d\d\d\d?

Phone Pattern Regular Expression: ^6(\d\d\d\d)$

Translation Pattern Regular Expression: +1425555$1

is the same as

Phone Pattern Regular Expression: ^6(\d{4})$

Translation Pattern Regular Expression: +1425555$1

The {x} specifies the number of matches for the preceding Language Element. In this case we are looking for 4 digits. If I specified \D{4} then it would be 4 characters.

If we entered 62345 then the translated pattern would be +1425552345.

What does a normalization rule look like capturing multiple groups of numbers?

Phone Pattern Regular Expression: ^(\d{3})(\d{4})$

Translation Pattern Regular Expression: +1425$1$2

In the above Phone Pattern there are two sets of parens. Each set of parens captures into a different group. The first three digits are captured into group 1 "$1" and the next 4 digits are captured into group 2 "$2".

In the Translation Pattern we use $1 and $2 after the +1425.

If we entered 5552345 then the translated pattern would be +1425552345.

What if we wanted to handle dashes, spaces, dots, and whatever else users dream up?

Phone Pattern Regular Expression: ^(\d{3})\D(\d{3})\D(\d{4})$

Translation Pattern Regular Expression: +1$1$2$3

In the Phone Pattern Regular Expression we are matching for 3 digits, then a single character. Then another three digits, and a single character. Then a final four digits. Since the \D is not within the parens we match against it, but are not capturing it. The result is the Translation Pattern has no dashes, dots, spaces, or any other character the user can dream up.

If we entered 425-555-2345 or 425.555.2345 then the translated pattern would be +1425552345.

Why do you use \D instead of [\s()\-\./] ?

Simple. It does the same thing and more! \D will match any non-digit. [\s()\-\./] will only match space, parens, dash or dots.

Is there a way to do optional matches?

Phone Pattern Regular Expression: ^9?(\d{3})\D(\d{3})\D(\d{4})$

Translation Pattern Regular Expression: +1$1$2$3

In the Phone Pattern Regular Expression above we start of with a "9?". This means the expression will match if there is a 9 or not a 9. The key is using the question mark after the number (or character). This is handy if you want to be allow users to still dial a 9 like they used to on a PBX. They can type it in or not, we simply don't care because it is optional and we are not capturing that digit into a group.

If we entered 9425-555-2345 or 425-555-2345 then the translated pattern would be +1425552345.

How would I do a wild card for any number of characters/digits?

Phone Pattern Regular Expression: ^\D*(\d{3})\D*(\d{3})\D*(\d{4})$

Translation Pattern Regular Expression: +1$1$2$3

The above Phone Pattern Regular Expression will look for any amount of characters until it matches against 3 digits. Then any amount of characters until it matches against another 3 digits. Then a match against the last four digits.

The benefit of this is that we can handle "(425) 555-1234" or "425-555-1234" or "4255551234" and to be honest we can handle this too "Your grandma 425 has white 555 hair 1234". All the examples would be translated to +14255551234

How about logical OR?

Phone Pattern Regular Expression: ^\D*(303|720)\D*(\d{3})\D*(\d{4})$

Translation Pattern Regular Expression: +1$1$2$3

A logical OR is very handy if you need to handle multiple area codes, or NXXs (the second set of 3 digits for non-voice people). The pipe sign is what does the logical OR within the parens. The above Phone Pattern Regular Expression would look to match the first three digits to 303 or 720, but not both.

If we entered 303-555-2345 or (720) 555-2345 then the translated pattern would be either +1303552345 or +17205552345.

**Conclusion**

In my experience the above examples will help with 90% of the needs for Normalization Rules. There are much more complicated Normalization Rules that could be written, but I'll leave that to another post. If you want to play around with Normalization Rules I strongly encourage downloading

.NET Regular Expression Designer so that you can visibly see how Normalization Rules work.