Regex help

Ezra_Zygmuntowicz3 · 15 June 2005 21:44

Hello list!
Could someone help me do a little regex conversion? I've got a few Perl-compatible regexes from a PHP script I am trying to port to Ruby, but I need a little help. Here are the PHP functions:

$buffer = preg_replace("#(?<!\"|http:\/\/)www\.(?:[a-zA-Z0-9\-]+\.)*[a-zA-Z]{2,4}(?:/[^ \n\r\"\'<]+)?#", "http://$0", $buffer);
$buffer = preg_replace("#(?<!\"|href=|href\s=\s|href=\s|href\s=)(?:http:\/\/|https:\/\/|ftp:\/\/)(?:[a-zA-Z0-9\-]+\.)+[a-zA-Z]{2,4}(?::[0-9]+)?(?:/[^ \n\r\"\'<]+)?#", "<a href=\"$0\" target=\"_blank\">$0</a>", $buffer);
$buffer = preg_replace("#(?<=[\n ])([a-z0-9\-_.]+?)@([^,< \n\r]+)#i", "<a href=\"mailto:$0\">$0</a>", $buffer);

Can someone please help me get these into a format that Ruby will like? I know I will end up using gsub! to do the substitution, but these regexes don't parse correctly in Ruby and I am not sure of the rules I need to follow to make Ruby happy. Help is much appreciated.
Thanks-
-Ezra Zygmuntowicz
Yakima Herald-Republic
WebMaster
509-577-7732
ezra@yakima-herald.com
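For reference, a mechanical PHP-to-Ruby translation of the first call might look roughly like this. It is a sketch that assumes Ruby 1.9 or later: the negative lookbehind `(?<!...)` is not supported by the Ruby 1.8 regex engine that was current when this thread was written, which is probably part of why the patterns would not parse. The constant `WWW_LINK`, the method `add_http_prefix`, and the sample inputs are hypothetical.

```ruby
# Sketch of a direct port of the first preg_replace call (assumes Ruby 1.9+,
# since (?<!...) lookbehind is unavailable in Ruby 1.8).
#
# Translation rules used:
#   - PHP's #...# delimiters become a %r{...} literal, so "/" needs no escaping.
#   - PHP string escapes like \" and \' become plain " and ' inside the regexp.
#   - PHP's $0 (the whole match) becomes \0 in a single-quoted Ruby string.
#   - preg_replace becomes String#gsub (or gsub! to modify in place).

# Hypothetical name: bare "www." hosts not preceded by " or http://
WWW_LINK = %r{(?<!"|http://)www\.(?:[a-zA-Z0-9\-]+\.)*[a-zA-Z]{2,4}(?:/[^ \n\r"'<]+)?}

# Hypothetical helper: prefix each matched host with http://
def add_http_prefix(buffer)
  buffer.gsub(WWW_LINK, 'http://\0')
end

puts add_http_prefix('see www.example.com/news today')
# => see http://www.example.com/news today
puts add_http_prefix('http://www.example.com')
# => http://www.example.com
```

The second call shows the lookbehind at work: a host that already follows `http://` is left alone.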

Chris_Eidhof · 15 June 2005 21:52

I'm willing to help, but could you give a little more detail on what
the regexen should do?

--
Best regards,
Chris Eidhof

Nikolai_Weibull · 15 June 2005 22:36

OK, this wins my newly instated prize for _worst regexes ever_. Inefficient,
inconclusive, inconsistent, and just plain wrong. I really hope you
don’t have to work with a lot of code like this.

Nonetheless, here’s my solution:

domain_part = /(?:[[:alnum:]\-]+\.)/
tld = /[[:alpha:]]{2,4}/
buffer.gsub!(/(?<!"|http:\/\/)www\.#{domain_part}*#{tld}/, 'http://\0')
buffer.gsub!(/(?<!\"|href=|href\s=\s|href=\s|href\s=)
              (?:https?|ftp):\/\/#{domain_part}+#{tld}
              (?::\d+)?(?:\/[^\s"'<]+)?/x,
             '<a href="\0" target="_blank">\0</a>')
buffer.gsub!(/(?<=\s)[[:alnum:]\-_.]+@[^,<\s]+/i,
             '<a href="mailto:\0">\0</a>')

Totally untested, but at least it’s somewhat easier to understand and a
bit more correct. There are better ways to extract URLs and email
addresses from an input than this, mind you,
nikolai

--
Nikolai Weibull: now available free of charge at http://bitwi.se/\!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
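Here is Nikolai's code assembled into one runnable sketch. His two helper patterns become constants (`DOMAIN_PART`, `TLD`) so they are visible inside a method. Because of the lookbehinds it needs Ruby 1.9 or later (Ruby 1.8's regex engine does not support them). The method name `linkify` and the sample input are hypothetical.

```ruby
# Nikolai's helper patterns, as constants so the method below can see them.
DOMAIN_PART = /(?:[[:alnum:]\-]+\.)/   # one "label." piece of a host name
TLD         = /[[:alpha:]]{2,4}/       # a 2-4 letter top-level domain

# Hypothetical wrapper name; runs the three substitutions in order.
def linkify(text)
  buffer = text.dup

  # 1. Prefix bare www. hosts with http:// unless a quote or http:// precedes them.
  buffer.gsub!(/(?<!"|http:\/\/)www\.#{DOMAIN_PART}*#{TLD}/, 'http://\0')

  # 2. Wrap http/https/ftp URLs in an anchor unless they follow a quote or href=.
  buffer.gsub!(/(?<!"|href=|href\s=\s|href=\s|href\s=)
               (?:https?|ftp):\/\/#{DOMAIN_PART}+#{TLD}
               (?::\d+)?(?:\/[^\s"'<]+)?/x,
               '<a href="\0" target="_blank">\0</a>')

  # 3. Wrap whitespace-preceded email addresses in a mailto: link.
  buffer.gsub!(/(?<=\s)[[:alnum:]\-_.]+@[^,<\s]+/i,
               '<a href="mailto:\0">\0</a>')

  buffer
end

puts linkify('Visit www.example.com or mail ezra@example.com')
```

Order matters: step 1 runs first so that a bare `www.` host has already gained its `http://` when step 2 turns it into a link. A URL that already sits inside `href="..."` is skipped by the lookbehind in step 2.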

Ezra_Zygmuntowicz3 · 15 June 2005 22:26

Thanks Chris-
I was able to hack these out and get them to work in Ruby. They just do some formatting and conversion of hyperlinks and FTP links. It was the (?...) groupings that were messing things up a bit. Thanks all the same though!
-Ezra Zygmuntowicz
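The `(?...)` groupings in these patterns are actually different constructs that happen to share the same opening. A small sketch of the difference, assuming Ruby 1.9 or later (the sample strings are made up):

```ruby
# (?:...)  non-capturing group: groups alternatives, consumes text, no $1.
# (?<!...) negative lookbehind: consumes nothing; fails if the text just
#          before the current position matches. Not available in Ruby 1.8.
# (?<=...) positive lookbehind: consumes nothing; requires that text before.

m = 'ftp://x.org'.match(%r{(?:https?|ftp)://})
p m[0]        # => "ftp://"
p m.captures  # => []

# Negative lookbehind: skip a "www." that directly follows a double quote.
p 'go to www.a.com, not "www.b.com"'.scan(/(?<!")www\.\w+\.com/)
# => ["www.a.com"]

# Positive lookbehind: require whitespace before the match without
# including that whitespace in the result.
p 'a x@y.org b'.scan(/(?<=\s)\S+@\S+/)
# => ["x@y.org"]
```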

Ezra_Zygmuntowicz3 · 15 June 2005 22:46

Nikolai-
Thank you. I have inherited a ton of NASTY PHP code like this at the newspaper I work at. I am rewriting it all in Rails and Ruby CGI scripts. But the guy who wrote this stuff is no longer here, and I think he liked making his code as obfuscated as possible in order to keep his job secure. I am by no means a regex master, so digesting volumes of stuff like this hurts my head. Thank you for the help.

-Ezra Zygmuntowicz

Martin_DeMello1 · 16 June 2005 06:45

http://www.weitz.de/regex-coach/ is a nice way to interactively test
regexps as you develop them.

martin
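For a quick check without leaving Ruby, a tiny helper that brackets every match gives a similar at-a-glance view of what a pattern hits. `highlight` is a hypothetical helper, not a library method, and the example pattern assumes Ruby 1.9 or later for its lookbehind.

```ruby
# Hypothetical helper: wrap each match of +pattern+ in [brackets]
# so it is easy to see exactly what a regexp hits in +text+.
def highlight(pattern, text)
  text.gsub(pattern) { |match| "[#{match}]" }
end

puts highlight(/(?<!")www\.\w+\.com/, 'www.a.com and "www.b.com"')
# => [www.a.com] and "www.b.com"
```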

Ezra_Zygmuntowicz3 · 16 June 2005 07:31

Martin-
Thank you for the link! That is exactly the tool I needed. I really appreciate it.

-Ezra Zygmuntowicz
