Negative lookbehind assertion put to use
I'm pretty handy with regular expressions. They're an incredibly useful tool, and certainly one that more programmers out there need to learn. Even so, there are some constructs that I rarely, if ever, use. The zero-width negative lookbehind assertion (NLBA) is one of them. That's not to say I don't know how to use it, but in my experience either you have a clear-cut case for it, or you're desperate to solve some regex problem and are reaching into the bag of tricks for anything that might make the magic happen.
More often than not, it's the latter reason that makes me dust off the ol' NLBA. And more often than not, it's not where the magic is. As a rule, if you're convinced that you can devise a single regex to solve an intricate problem, and you succeed, you've likely created a monster that will be a maintenance liability, with or without annotation. You're probably better off splitting it into a couple of regex calls, for your own sake and for the sake of those who follow you.
That being said, a coworker who was new to Perl and new to regex had an intricate problem, reached into the bag of tricks and found the NLBA, and this time it really was where the magic was. It still seems odd and sort of hacky, but here goes:
- The Problem
Evaluate a comment string and flag it if it contains any URL outside a particular domain (let's say mydomain.com).
- BAD: "My site is at http://foo.manchu.info/hi/there/mom"
- GOOD: "I have an account on manchu"
- GOOD: "I have an account on http://foo.manchu.mydomain.com/hi/there/mom"
- The (dirty hacky single regex) Solution
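# Look for urls to other domains in the comment.
# Match one character, a dot, and a TLD ("etc" stands in for the rest of the list),
# then let the zero-width lookbehind reject it if the text just before ends in mydomain.com.
if ( $comment =~ /[a-z0-9\-]\.(?:com|info|net|etc)\b(?<!mydomain\.com)/i ) {
    # Do something immoral and unkind. Unless it's comment spam, in which case, hurray.
    ...
}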
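To see the single regex do its job, here is a minimal, self-contained sketch that runs it over the three sample comments from the problem statement. The `@cases` list and `$foreign_url` name are illustrative only, and the `etc` placeholder from the snippet above is dropped in favour of just `com|info|net`.

```perl
use strict;
use warnings;

# Sample comments from the problem statement, each paired with the verdict
# we expect (1 = flag it, 0 = leave it alone).
my @cases = (
    [ 'My site is at http://foo.manchu.info/hi/there/mom',                 1 ],
    [ 'I have an account on manchu',                                       0 ],
    [ 'I have an account on http://foo.manchu.mydomain.com/hi/there/mom', 0 ],
);

# One character, a dot, a known TLD, a word boundary; then the zero-width
# lookbehind rejects the match if the 12 characters ending at that point
# spell "mydomain.com". Add real TLDs to the alternation as needed.
my $foreign_url = qr/[a-z0-9\-]\.(?:com|info|net)\b(?<!mydomain\.com)/i;

my @results;
for my $case (@cases) {
    my ($comment, $expected) = @$case;
    my $flagged = $comment =~ $foreign_url ? 1 : 0;
    push @results, $flagged;
    warn "unexpected verdict for: $comment\n" if $flagged != $expected;
    printf "%-4s %s\n", ($flagged ? 'FLAG' : 'ok'), $comment;
}
```

Because the lookbehind only inspects the last 12 characters, a host such as notmydomain.com would also pass unflagged; anchoring on a preceding dot or string start is left as an exercise.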
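The alternative, without the lookbehind, is to pull out the URLs first and then check them:
# Look for urls in the comment. "https?" is non-capturing and each label is
# [a-z0-9\-]+, so @urls ends up holding whole host names only.
if ( my @urls = ($comment =~ m%(?:https?://)?((?:[a-z0-9\-]+\.)+(?:com|net|info|etc))\b%gi) ) {
    # Check for urls to other domains
    if ( grep { ! /mydomain\.com$/i } @urls ) {
        # Do something immoral etc.
        ...
    }
}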
That's much more verbose, though. Maybe I'm missing something...
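For comparison, here is a rough, runnable sketch of the two-step approach over the same three sample comments. The sub name `flags_foreign_url` is made up for illustration. It also shows why the scheme group must be non-capturing: a list-context `//g` match returns every capture group for every match, so a capturing `(s)?` contributes an `undef` that would then pass the `grep` filter and flag every comment containing a URL.

```perl
use strict;
use warnings;

my @comments = (
    'My site is at http://foo.manchu.info/hi/there/mom',
    'I have an account on manchu',
    'I have an account on http://foo.manchu.mydomain.com/hi/there/mom',
);

# Two-step check: collect host names, then count any outside mydomain.com.
sub flags_foreign_url {
    my ($comment) = @_;
    my @urls = ($comment =~ m%(?:https?://)?((?:[a-z0-9\-]+\.)+(?:com|net|info))\b%gi);
    return scalar(grep { !/mydomain\.com$/i } @urls);
}

my @verdicts = map { flags_foreign_url($_) ? 1 : 0 } @comments;
print "@verdicts\n";    # 1 0 0

# For contrast: with a capturing (s)? group, each match yields two list
# elements, and the first one is undef when the scheme is plain "http".
my @with_capture = ('see http://foo.mydomain.com/' =~ m%(?:http(s)?://)?((?:[a-z0-9\-]+\.)+com)\b%gi);
print scalar(@with_capture), "\n";    # 2
```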