Negative lookbehind assertion put to use

By Luke Smith on March 5, 2007 10:46 PM

I'm pretty handy with regular expressions. They're an incredibly useful tool, and certainly one that more programmers out there need to learn. Even so, there are some constructs that I rarely if ever use. The zero-width negative lookbehind assertion (NLBA) is one of them. That's not to say I don't know how to use it, but it's been my experience that either you have a clear-cut case for it, or you're desperate to solve some regex problem, so you're reaching into the bag of tricks for anything that might make the magic happen. More often than not, it's the latter reason that will make me dust off the ol' NLBA. And more often than not, it's not where the magic is.

As a rule, if you're convinced that you can devise a single regex to solve an intricate problem, and you succeed, you've likely created a monster that will be a liability when it comes to code maintenance, with or without annotation. You're probably better off splitting it up into a couple of regex calls, for your own sake and for those who follow you.

That being said, a coworker, new to Perl and new to regex, had an intricate problem, reached into the bag of tricks, found the NLBA, and it was where the magic was. It still seems odd and sort of hacky, but here goes:
The Problem
Evaluate a comment string, and flag it for the presence of any URL that does not belong to a particular domain (let's say mydomain.com).
  • BAD: "My site is at http://foo.manchu.info/hi/there/mom"
  • GOOD: "I have an account on manchu"
  • GOOD: "I have an account on http://foo.manchu.mydomain.com/hi/there/mom"
The (dirty hacky single regex) Solution
  # Look for urls to other domains in the comment
  if ( $comment =~ /[a-z0-9\-]\.(?:com|info|net|etc)\b(?<!mydomain\.com)/i ) {
    # Do something immoral and unkind.  Unless it's comment spam, in which case, hurray.
    ...
  }
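For what it's worth, here's a rough sketch of how the three sample comments could be run through that pattern. The `is_offsite` helper name is made up for illustration, the dot inside the lookbehind is escaped so it only matches a literal dot, and the TLD list is trimmed to three entries (the `etc` in the pattern stands in for the rest).

```perl
use strict;
use warnings;

# Assumption: a shortened TLD list; extend the alternation as needed.
# The lookbehind runs right after the TLD has matched and rejects the
# match when the 12 characters just consumed spell "mydomain.com".
my $offsite = qr/[a-z0-9\-]\.(?:com|info|net)\b(?<!mydomain\.com)/i;

# Hypothetical helper: 1 if the comment mentions a host outside mydomain.com
sub is_offsite {
    my ($comment) = @_;
    return $comment =~ $offsite ? 1 : 0;
}

my @cases = (
    'My site is at http://foo.manchu.info/hi/there/mom',
    'I have an account on manchu',
    'I have an account on http://foo.manchu.mydomain.com/hi/there/mom',
);

for my $comment (@cases) {
    printf "%-4s %s\n", ( is_offsite($comment) ? 'BAD' : 'GOOD' ), $comment;
}
```

One thing this sketch doesn't guard against: the lookbehind only inspects the last 12 characters, so a host like notmydomain.com would slip through as GOOD.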
Maybe my test cases were limited, but I'll be damned if it didn't do just what it should have with each. Personally, I think it would be easier to maintain as
  # Look for urls in the comment
  if ( my @urls = ($comment =~ m%(?:https?://)?((?:[a-z0-9\-]+\.)+(?:com|net|info|etc))\b%gi) ) {
    # Check for urls to other domains
    if ( grep { ! /mydomain\.com$/i } @urls ) {
      # Do something immoral etc.
      ...
    }
  }
Though much more verbose. Maybe I'm missing something...
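Here's a minimal sketch of the two-step version wrapped in a sub, mostly so it can be poked at with the sample comments. A few assumptions of mine: the `offsite_hosts` name is hypothetical, the scheme is matched but not captured (so the list holds only host names), each host label may be more than one character long, and the TLD list is trimmed. The filter is also a bit stricter than a bare `mydomain.com` suffix check, in that it wants a dot or the start of the string before `mydomain`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the hosts in $comment that fall outside
# mydomain.com (in scalar context, how many there are).
sub offsite_hosts {
    my ($comment) = @_;

    # Step 1: collect every host-like token. With a single capture group
    # and /g in list context, the match returns one host per hit.
    my @hosts = $comment =~ m{(?:https?://)?((?:[a-z0-9\-]+\.)+(?:com|net|info))\b}gi;

    # Step 2: drop mydomain.com and its subdomains, keep everything else.
    return grep { !/(?:^|\.)mydomain\.com$/i } @hosts;
}

for my $comment (
    'My site is at http://foo.manchu.info/hi/there/mom',
    'I have an account on manchu',
    'I have an account on http://foo.manchu.mydomain.com/hi/there/mom',
) {
    my @bad = offsite_hosts($comment);
    printf "%-4s %s\n", ( @bad ? 'BAD' : 'GOOD' ), $comment;
}
```

Splitting it this way also hands back the offending hosts themselves, which could be handy for logging, whereas the single regex only says yes or no.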

I'm Luke. I am a front end engineer at Yahoo! on the YUI team.

Mostly I write about code stuff, but occasionally I'll mix in some real life. You've been warned.

Content licensed under Creative Commons

Code licensed under BSD license

©2005 - 2010 Lucas Smith
