BurlyHost.com, Inc. Web Hosting Blog » regular expressions


Perl Programming and regular expression fun.

Saturday, October 4th, 2008 by Tim Greer

Recently, in the Perl programming language Usenet newsgroup (comp.lang.perl.misc), a poster asked whether a certain task was possible, thinking it wasn’t. I read it and immediately thought “Sure, that’s possible, it’s easy…” and posted a solution a short time later. However, I had to think about it for a minute, because I had never considered doing something like this. To be more accurate, I have done things just like this, but wouldn’t have approached it this way. Of course, there’s more than one way to skin a cat, and some ways are far more creative, fun, challenging or inventive than others.

The question was an interesting challenge, because the poster wanted to use only a regular expression to detect any character that is repeated in a string, keep its first instance and remove every later one. Most people assumed that the duplicated characters would be known in advance, but they weren’t (i.e., it could be any single character repeated in the string, or several different characters repeated throughout, not just one character you already knew about).

They had posed the following string examples:

a''bc'def'g' -> a'bcdefg
'''ab'cd'efg -> 'abcdefg
abc'd'e''f'g -> abc'defg

My posted solution was a simple and effective one line of relevant code:

$string =~ s|(.)| ($` =~ m/$1/) ? '' : $1 |eg;

This matches each single character in turn, captures it into $1, and checks whether that character has already appeared earlier in the string. If it has, the replacement is the empty string, so the duplicate is removed; if it hasn’t, the replacement is $1 itself, so the first instance is kept. In detail: =~ s|…|…|g; replaces globally, once for each single character that (.) matches and captures into $1. The /e modifier evaluates the right side as Perl code rather than as a string. $` holds everything in the string before the current match, so ($` =~ m/$1/) is true when $1 has already been seen and false otherwise (it is a true/false test, not a count). The ternary operator then replaces the character with '' (nothing) if the test is true (?), or else (:) with $1 if it is false, meaning that character has not shown up before.
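One thing worth keeping in mind with this approach is that $1 is interpolated into the inner match as a pattern, so a captured metacharacter such as . or * would not be matched literally. The following is a minimal sketch, not part of the newsgroup thread, that wraps the same substitution in a subroutine and quotes $1 with \Q…\E. The subroutine name dedupe_chars and the second test string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): keep the first
# occurrence of each character and drop every later repeat.
sub dedupe_chars {
    my ($string) = @_;
    # $` is the text before the current match. \Q...\E quotes $1 so
    # that characters like . or * are compared literally.
    $string =~ s|(.)| ($` =~ m/\Q$1\E/) ? '' : $1 |eg;
    return $string;
}

print dedupe_chars("a''bc'def'g'"), "\n";   # a'bcdefg
print dedupe_chars("a.b.c*d*e"), "\n";      # a.bc*de
```

Without the \Q…\E, the second string would not behave the same way: a captured . would match any earlier character, and a captured * on its own is not a valid pattern.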

I created the following script to demonstrate this (pardon the fact that I didn’t put it in a loop to show the varied strings and parsed output):

#!/usr/bin/perl
use warnings;
use strict;

my $linea = "a''bc'def'g'";
my $lineb = "'''ab'cd'efg";
my $linec = "abc'd'e''f'g";

print "$linea -> ";
$linea =~ s|(.)| ($` =~ m/$1/) ? '' : $1 |eg;
print "$linea\n";

print "$lineb -> ";
$lineb =~ s|(.)| ($` =~ m/$1/) ? '' : $1 |eg;
print "$lineb\n";

print "$linec -> ";
$linec =~ s|(.)| ($` =~ m/$1/) ? '' : $1 |eg;
print "$linec\n";

The output:
~]$ ./script.pl
a''bc'def'g' -> a'bcdefg
'''ab'cd'efg -> 'abcdefg
abc'd'e''f'g -> abc'defg
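For comparison, here is a rough sketch of how the same demonstration might look as a loop. It applies the same substitution to the same three strings; the variable names @lines, @results and $parsed are picked for this sketch only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ("a''bc'def'g'", "'''ab'cd'efg", "abc'd'e''f'g");
my @results;

for my $line (@lines) {
    # Copy first so the original string can be shown next to the result.
    (my $parsed = $line) =~ s|(.)| ($` =~ m/$1/) ? '' : $1 |eg;
    push @results, "$line -> $parsed";
}

print "$_\n" for @results;
```

Copying into $parsed before substituting avoids printing in two steps, since the original string is left untouched.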

After I posted this example, another poster (Ben Morrow) followed up with some other methods of accomplishing the same task, also using only regular expressions:

Without using /e:
~% perl -le'$_ = "abccbdcdc"; 1 while s/(.)(.*)\1/$1$2/g; print'
abcd
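The reason for the "1 while" is that a single s///g pass only handles matches that do not overlap, so some repeats can survive the first pass and the substitution has to be repeated until nothing is left to replace. As a small sketch of that behaviour (the subroutine name dedupe_loop and the pass counter are additions for illustration), the loop can be made to count its successful passes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: runs the substitution until it stops matching
# and reports how many passes actually changed the string.
sub dedupe_loop {
    my ($string) = @_;
    my $passes = 0;
    # s///g returns the number of substitutions made, which is false
    # once no character is followed by a later copy of itself.
    $passes++ while $string =~ s/(.)(.*)\1/$1$2/g;
    return ($string, $passes);
}

my ($result, $passes) = dedupe_loop("abccbdcdc");
print "$result (after $passes successful passes)\n";
```

Because \1 is a backreference, it is compared as literal text, so this variant does not need any extra quoting for metacharacters.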

Using 5.10’s \K we can remove the replacement part:
~% perl5.10.0 -le'$_="abccbdcdc"; 1 while s/(.).*\K\1//g; print'
abcd

and if we reverse the string before and after (so we can use look*ahead* instead, which can be variable-length) we can remove the while loop:

~% perl -le'$_ = reverse "abccbdcdc"; s/(.)(?=.*\1)//g;
print scalar reverse'
abcd
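To see the four approaches side by side, here is a sketch that wraps each one in a subroutine and runs them on the same inputs. The subroutine names are invented for this sketch, and the first one adds \Q…\E around $1 as a precaution against metacharacters; otherwise the regular expressions are the ones shown above. It assumes Perl 5.10 or later because of \K.

```perl
#!/usr/bin/perl
use 5.010;      # \K needs Perl 5.10 or later
use strict;
use warnings;

# Hypothetical wrapper names, one per approach.
sub dedupe_prematch {
    my ($s) = @_;
    $s =~ s|(.)| ($` =~ m/\Q$1\E/) ? '' : $1 |eg;
    return $s;
}

sub dedupe_loop {
    my ($s) = @_;
    1 while $s =~ s/(.)(.*)\1/$1$2/g;
    return $s;
}

sub dedupe_keep {
    my ($s) = @_;
    # \K keeps everything matched so far out of the replaced text,
    # so only the trailing repeat is removed.
    1 while $s =~ s/(.).*\K\1//g;
    return $s;
}

sub dedupe_reverse {
    my ($s) = @_;
    $s = reverse $s;           # scalar assignment reverses the string
    $s =~ s/(.)(?=.*\1)//g;    # drop characters that appear again later
    return scalar reverse $s;  # restore the original order
}

my %methods = (
    prematch => \&dedupe_prematch,
    loop     => \&dedupe_loop,
    keep     => \&dedupe_keep,
    reverse  => \&dedupe_reverse,
);

for my $input ("abccbdcdc", "a''bc'def'g'") {
    for my $name (sort keys %methods) {
        printf "%-9s %s -> %s\n", $name, $input, $methods{$name}->($input);
    }
}
```

All four should print abcd for the first input and a'bcdefg for the second. In the reversed string, removing every character that appears again later keeps the last copy of each, which is the first copy once the string is reversed back.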

Now, how cool is that? I offered one method, and Ben offered three more, for a total of four ways to accomplish the task using only regular expressions. Perl’s regular expressions, and the variety of ways it gives you to come up with a solution, truly make it a great language. This is only one of many cool tricks and challenges people have posed that impress me about Perl’s power, but it was a recent one that stuck in my mind. Since I didn’t see any existing examples out there, this could be useful information to learn from and use if you should run into such a challenge some day.

There are a million reasons to use regular expressions, and if used properly, they are a very powerful, time-saving and accurate feature of any language. The above is just one of many examples that illustrate the Tim Toady (TIMTOWTDI, “There Is More Than One Way To Do It”) power of Perl: that much logic fits in a very short, simple regular expression. It is one of about 10,000 things that show how cool Perl is, which is by far my language of choice when coding.

