Detect any "<a href=mailto:...>...</a>" string in a string?

Hi all

I have a long, long string of HTML tags. There might be some unprotected
email links in there like this:

<a href="mailto:some@email.xx">Some Email</a>

or

<a href="mailto:some@email.xx?subject=Something">Some Email</a>

or...

I need to detect these email links and substitute them with something
different, like a JavaScript function to obfuscate them:

obfuscate("some","email.xx","Something","Some Email")

or something like that. Sadly, I have no idea how to extract the needed
string parts. I stumbled upon the TMail gem and guess it could help me a
lot, but I'm stuck right now.

Any help is appreciated! Thanks!
Josh

--
Posted via http://www.ruby-forum.com/.

Here's how I do it in PHP, if you wanna rework it into Ruby:

$GLOBALS[ 'EMAIL_LINK_REGEX' ] = "#<a[^>]*mailto:([^'\" ]*)['\" ]>([^<]*)</a>#i";

$html = preg_replace_callback( $GLOBALS[ 'EMAIL_LINK_REGEX' ],
'fubarEmail', $html );

function fubarEmail( $matches )
{
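  // replaceEntities() is my own helper (not shown here). The explode()
  // below relies on it having encoded '@' as '&#64;' and '.' as '&#46;'.
  $strNewAddress = replaceEntities( $matches[ 1 ] );

  $strText = replaceEntities( $matches[ 2 ] );

  // Split the encoded address into the user part and the domain part.
  $arrEmail = explode( '&#64;', $strNewAddress );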

  $strTag = "<script language='Javascript' type='text/javascript'>\r";
  $strTag .= "<!--\r";
  $strTag .= "document.write('<a href=\"mai');\r";
  $strTag .= "document.write('lto');\r";
  $strTag .= "document.write(':$arrEmail[0]');\r";
  $strTag .= "document.write('@');\r";
  $strTag .= "document.write('$arrEmail[1]\">');\r";
  $strTag .= "document.write('$strText<\/a>');\r";
  $strTag .= "// -->\r";
  $strTag .= "</script><noscript>$arrEmail[0] at \r";
  $strTag .= str_replace( '&#46;', ' dot ', $arrEmail[ 1 ] ) . '</noscript>';

  return $strTag;
}
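
A rough Ruby rework of the same idea might look like the sketch below. The names EMAIL_LINK_REGEX and obfuscate_mailto_links are made up for illustration. Unlike the PHP version it does not entity-encode anything first (replaceEntities is not shown above), so it splits on a literal "@", and it assumes the address and link text contain no quotes or backslashes.

```ruby
# Hypothetical names; a sketch of the PHP approach, not a drop-in port.
# Same pattern as the PHP regex: group 1 = address, group 2 = link text.
EMAIL_LINK_REGEX = /<a[^>]*mailto:([^'" ]*)['" ]>([^<]*)<\/a>/i

def obfuscate_mailto_links(html)
  html.gsub(EMAIL_LINK_REGEX) do
    address, text = $1, $2
    user, domain = address.split("@", 2)
    domain = domain.to_s # guard against an address without "@"

    # Emit the link in small document.write pieces so the full
    # "mailto:user@domain" string never appears in the page source.
    [
      "<script type='text/javascript'>",
      "document.write('<a href=\"mai');",
      "document.write('lto');",
      "document.write(':#{user}');",
      "document.write('@');",
      "document.write('#{domain}\">');",
      "document.write('#{text}<\\/a>');",
      "</script><noscript>#{user} at #{domain.gsub('.', ' dot ')}</noscript>"
    ].join("\n")
  end
end

puts obfuscate_mailto_links('<p><a href="mailto:some@email.xx">Some Email</a></p>')
```

As in the PHP, a "?subject=..." query string simply stays attached to the domain part.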

On Wed, Oct 7, 2009 at 3:57 PM, Joshua Muheim <forum@josh.ch> wrote:

> I need to detect these email links and substitute them with something
> different, like a JavaScript function to obfuscate them:
>
> obfuscate("some","email.xx","Something","Some Email")

--
Greg Donald
http://destiney.com/

Joshua Muheim wrote:

> Sadly I have no idea how to find the needed
> string parts.

1) regexes
2) gsub()
3) split()

html =<<ENDOFHTML
<html>
<head>
  <title>html page</title>
</head>
<body>
  <a href="mailto:some@email.xx">Some Email</a>
  <div>hello</div>
  <div>world</div>
  <div>goodbye</div>
  <a href="mailto:some@email.xx?subject=Something&cost=10">Some Email</a>
</body>
</html>
ENDOFHTML

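# Note: this pattern matches every <a href="...">, not only mailto: links.
new_html = html.gsub(/<a href="(.+?)">(.+?)<\/a>/) do |match|
  p match
  addy = $1  # the href value
  link = $2  # the link text
  p addy, link

  # Separate the address from an optional query string.
  pieces = addy.split("?")
  if pieces.length == 2
    puts "there is a query string to parse"
    name_vals = pieces[1].split("&")
    p name_vals
  end

  puts

  # The block's return value replaces the matched link.
  "the replacement string cobbled together from the pieces above"
end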

puts new_html

--output:--
"<a href=\"mailto:some@email.xx\">Some Email</a>"
"mailto:some@email.xx"
"Some Email"

"<a href=\"mailto:some@email.xx?subject=Something&cost=10\">Some Email</a>"
"mailto:some@email.xx?subject=Something&cost=10"
"Some Email"
there is a query string to parse
["subject=Something", "cost=10"]

<html>
<head>
        <title>html page</title>
</head>
<body>
  the replacement string cobbled together from the pieces above
        <div>hello</div>
        <div>world</div>
        <div>goodbye</div>
  the replacement string cobbled together from the pieces above
</body>
</html>
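
One way to turn those pieces into the parts the original question asks for is sketched below. The method name mailto_parts is hypothetical. It uses split() as above, plus URI.decode_www_form from Ruby's standard library for the query string, and it assumes the "&" separators in the href are not written as "&amp;".

```ruby
require "uri"

# Hypothetical helper: break a mailto: href into user, domain and query params.
def mailto_parts(href)
  # Drop the scheme, then separate the address from an optional query string.
  address, query = href.sub(/\Amailto:/i, "").split("?", 2)
  user, domain = address.to_s.split("@", 2)
  # decode_www_form returns [name, value] pairs; to_h makes them a Hash.
  params = query ? URI.decode_www_form(query).to_h : {}
  { user: user, domain: domain, params: params }
end

p mailto_parts("mailto:some@email.xx?subject=Something&cost=10")
```

Inside the gsub() block this could be called as mailto_parts(addy), and the resulting hash used to build the replacement string.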


Joshua Muheim wrote:

I need to detect these email links and substitute them with something
different, like a JavaScript function to obfuscate them:

obfuscate("some","email.xx","Something","Some Email")

By the way, you can't substitute a bare JavaScript function call for an <a>
tag: outside a <script> element the browser just displays it as text.
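
In other words, the replacement still has to be HTML, and the call itself can live inside a <script> element. A small sketch, assuming a JavaScript function obfuscate(user, domain, subject, text) is defined elsewhere in the page (the question only names it) and using a made-up Ruby helper name:

```ruby
require "json"

# Hypothetical helper: builds a <script> element that calls a page-defined
# JavaScript function obfuscate(user, domain, subject, text).
def obfuscate_script(user, domain, subject, text)
  # String#to_json yields double-quoted, escaped JavaScript string literals.
  args = [user, domain, subject, text].map { |arg| arg.to_s.to_json }
  %(<script type="text/javascript">obfuscate(#{args.join(",")})</script>)
end

puts obfuscate_script("some", "email.xx", "Something", "Some Email")
```

Link text that itself contains "</script>" would still need extra care; this sketch does not handle that case.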


You could also use a library such as hpricot or nokogiri to search and
replace all the <a> tags.

And you should have \r\n, not \r, if you are writing HTML.

On Oct 7, 5:19 pm, Greg Donald <gdon...@gmail.com> wrote:

> Here's how I do it in PHP, if you wanna rework it into Ruby:

Daniel Danopia wrote:

> You could also use a library such as hpricot or nokogiri to search and
> replace all the <a> tags.

Thank you guys. Nokogiri looks really useful, I will take a look at it. :)