Wednesday, January 18, 2012

Learning RegEx - Grouping and Backlinks

Switch on the Code: Regular Expressions - Grouping

As we learned in our Regular Expressions Primer, regular expressions can be simple or extremely complex. However, there are a few concepts that were left out. Today we are going to cover one of those concepts... grouping.

In regular expressions, grouping allows you to "chunk" parts of your expression and tell the regular expression engine to treat each "chunk" as a separate match. This allows you to, say, find the type of domain (com, eu, biz) of an email address, or check whether a URI is using encryption (https). There are a whole lot of reasons you might want to use grouping, but enough talk, let's get to it. Today we will be using PHP yet again, but grouping is supported by any self-respecting regular expression engine.

The Basics of Grouping

Grouping is implemented with ()s. Anything inside a pair of parentheses is considered a group, and groups are numbered by the order in which they appear in the expression. So if I have two groups, whichever appears first (from left to right) will be marked as match one, and match two will be the one that appears next. Groups can also be nested (sub-groups), but for today we will stick to single-scope groups.
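Since the original example is elided here, the following is only a minimal sketch of group numbering using PHP's preg_match. The sample address and the pattern are hypothetical, chosen just to show that each pair of parentheses gets its own slot in the matches array.

```php
<?php
// Hypothetical sample address and pattern, not the original article's example.
$email = 'someone@example.com';

// Group 1: user name, group 2: domain name, group 3: top-level domain.
$pattern = '/^([^@\s]+)@([^@\s.]+)\.([a-z]+)$/i';

$matches = array();
if (preg_match($pattern, $email, $matches)) {
    // $matches[0] holds the whole match; numbered groups start at index 1.
    echo "user:   ", $matches[1], "\n";
    echo "domain: ", $matches[2], "\n";
    echo "tld:    ", $matches[3], "\n";
}
```

Note that index 0 is always the full match, so the first group in the expression lands at index 1.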

...

Back References

In regular expressions, back-references are references to groups that you can use when replacing matches. In order to use back-references, you must have groups defined in your expression.

Taking the example above, let's say we want to replace all the domains in a set of emails with "awesomesite". In PHP you use the preg_replace function, which uses regular expressions to replace text in a string. Let's check out how to use back-references to replace the domain in our email.
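As a rough sketch of that idea (not the original article's code, which is elided here), the snippet below uses preg_replace with back-references. The sample addresses and the pattern are assumptions made up for illustration; the replacement keeps groups 1 and 3 and swaps out whatever group 2 matched.

```php
<?php
// Hypothetical sample data, made up for this sketch.
$emails = 'alice@example.com, bob@oldsite.biz';

// Group 1: user name plus "@", group 2: domain name, group 3: dot plus TLD.
$pattern = '/([^@\s,]+@)([^@\s.,]+)(\.[a-z]+)/i';

// ${1} and ${3} are back-references to groups 1 and 3.
// Group 2 is not referenced, so its text is replaced by "awesomesite".
$result = preg_replace($pattern, '${1}awesomesite${3}', $emails);

echo $result, "\n"; // alice@awesomesite.com, bob@awesomesite.biz
```

The braces in ${1} keep the group number separate from the literal text that follows it; the replacement string is single-quoted so PHP itself does not try to interpolate it.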

...


[Insert my usual, "I don't RegEx enough to remember this so I'm capturing this here so I can find it in the future the next time I have to RegEx" statement block here]
