Gsub + loop

This question actually pertains to a Rails app but it's more of a
general Ruby question so I'll ask it here.

Within a body of text I'm trying to match URLs for
youtube/google/myspace/etc videos and replace them with their associated
embed codes. For each different site there is a different regular
expression and a different embed code. So I made a hash with each value
being an array containing the regular expression and replacement like:
{:videosite => [regexp, replacement]}. From there I figured I should
loop through the hash and check the text against each expression and
replace the URLs if necessary. Unfortunately I am having a problem:

http://pastie.caboo.se/64785

class Embedment < ActiveRecord::Base
  def self.capture_embedments(text)
    embedments = []
    regexps.each do |k,v|
      text.gsub!(v[0]) do |match|
        embedments << embedment = self.new
        embedment.html = v[1]
      end
    end

    return embedments
  end

  def self.regexps
    return {
      :youtube => [/\(youtube:.*?(?:v=)?([\w|-]{11}).*\)/, "<object
width=\"425\" height=\"350\"><param name=\"movie\"
value=\"http://www.youtube.com/v/#{match[0]}&quot;></param><param
name=\"wmode\" value=\"transparent\"></param><embed
src=\"http://www.youtube.com/v/#{match[0]}&quot;
type=\"application/x-shockwave-flash\" wmode=\"transparent\"
width=\"425\" height=\"350\"></embed></object>"]
    }
  end
end

I think the flaw in my plan is that the second member of the array is
being parsed as soon as it's referenced, so this raises an exception
(undefined local variable or method `match' for Embedment:Class) since
the object 'match' does not exist yet. I'm guessing my approach to this
problem is very very wrong, but I have yet to see past my poor solution.
The reason why I separated the regular expressions from the method is
because there's going to be a few tens of them and I wanted to
consolidate them. Any help would be appreciated.

···

--
Posted via http://www.ruby-forum.com/.

I think the flaw in my plan is that the second member of the array is
being parsed as soon as it's referenced, so this raises an exception
(undefined local variable or method `match' for Embedment:Class) since
the object 'match' does not exist yet.

You're right on!
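
To make that concrete, here is a minimal sketch (the method name eager_template is made up for illustration). Ruby evaluates #{...} at the moment the string literal itself is evaluated, so the lookup of `match` fails before gsub! ever gets to run:

```ruby
# Interpolation happens when the string literal is evaluated,
# not later when the string is used as a replacement.
def eager_template
  "id=#{match[0]}" # `match` is not defined in this scope
end

error = begin
  eager_template
  nil
rescue NameError => e
  e
end

puts error.message
```

The same thing happens inside self.regexps: the hash literal is built, the string is interpolated, and the exception is raised there.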

I'm guessing my approach to this
problem is very very wrong, but I have yet to see past my poor solution.
The reason why I separated the regular expressions from the method is
because there's going to be a few tens of them and I wanted to
consolidate them. Any help would be appreciated.

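You are not too far away. Use regexp backreferences in the replacement string instead of interpolation, and apply the replacement in two steps: gsub! finds each match, then sub on that match fills in the template. So #{match[0]} becomes \\& (the whole match) and #{match[1]} becomes \\1 (the first group) etc.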
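
A small sketch of the backreference idea, assuming a shortened pattern and template (both are made up for illustration and are not the full embed code from the question). Backreferences are resolved by sub at replacement time, so nothing needs to exist when the template string is defined:

```ruby
# Hypothetical, simplified pattern and template.
pattern  = /\(youtube:([\w\-]{11})\)/
template = '<embed src="http://www.youtube.com/v/\1"></embed>'

text = "Watch this: (youtube:dQw4w9WgXcQ)"

# \1 is filled in with the first capture group.
html = text.sub(pattern, template)

# \& stands for the whole match; in a double-quoted string
# the backslash itself has to be escaped.
whole = text.sub(pattern, "[\\&]")

puts html
puts whole
```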

class Embedment < ActiveRecord::Base
   def self.capture_embedments(text)
     embedments = []
     REGEXPS.each do |k,v|
       text.gsub!(v[0]) do |match|
         embedment = new
         embedments << embedment
         embedment.html = match.sub(*v)
       end
     end

     return embedments
   end
end

Note also that it's better to make the replacements a constant in the class. And if you only iterate, you don't need symbols as keys; just use the regexps themselves as keys:

REGEXPS = {
   /\(youtube:.*?(?:v=)?([\w|-]{11}).*\)/ => "...",
}

And then

     REGEXPS.each do |rx,repl|
       text.gsub!(rx) do |match|
         embedment = new
         embedments << embedment
         embedment.html = match.sub(rx,repl)
       end
     end
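
For anyone who wants to try this outside Rails, here is a self-contained sketch. It assumes a plain Ruby class with an html accessor as a stand-in for the ActiveRecord model, and a shortened pattern and template that are hypothetical:

```ruby
# Plain-Ruby stand-in for the ActiveRecord model (an assumption,
# so that the sketch runs without Rails).
class Embedment
  attr_accessor :html

  # Hypothetical, shortened pattern => template pair.
  REGEXPS = {
    /\(youtube:([\w\-]{11})\)/ =>
      '<embed src="http://www.youtube.com/v/\1"></embed>'
  }.freeze

  def self.capture_embedments(text)
    embedments = []
    REGEXPS.each do |rx, repl|
      text.gsub!(rx) do |match|
        embedment = new
        embedments << embedment
        # The assignment evaluates to the new HTML, which becomes
        # the block's value and so replaces the match in text.
        embedment.html = match.sub(rx, repl)
      end
    end
    embedments
  end
end

text = "Intro (youtube:dQw4w9WgXcQ) outro".dup
found = Embedment.capture_embedments(text)
puts text
puts found.size
```

Note that gsub! modifies the string passed in, so the caller sees the replaced text as well as the returned list.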

If you need more flexibility, you can replace the replacement string with a block (here a lambda):

class Embedment
   REGEXPS = {
     /\(youtube:.*?(?:v=)?([\w|-]{11}).*\)/ =>
       lambda {|match| "<!-- matched: #{match} -->" }
   }
end

The loop then passes the lambda as the block:

     REGEXPS.each do |rx,repl|
       text.gsub!(rx) do |match|
         embedment = new
         embedments << embedment
         embedment.html = match.sub(rx,&repl)
       end
     end
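
A variation on the same idea, as a sketch only: instead of passing the lambda as the block, call it with the MatchData from inside the gsub block, so the lambda can index capture groups the way the original #{match[0]} attempt wanted to. The names EMBED_BUILDERS and embed_videos are made up, and the template is shortened:

```ruby
# Hypothetical table: each pattern maps to a lambda that
# receives the MatchData and builds the HTML.
EMBED_BUILDERS = {
  /\(youtube:([\w\-]{11})\)/ =>
    ->(m) { %(<embed src="http://www.youtube.com/v/#{m[1]}"></embed>) }
}.freeze

def embed_videos(text)
  EMBED_BUILDERS.reduce(text) do |result, (rx, builder)|
    # Inside a gsub block, Regexp.last_match is the current match.
    result.gsub(rx) { builder.call(Regexp.last_match) }
  end
end

output = embed_videos("See (youtube:dQw4w9WgXcQ) now")
puts output
```

Here the interpolation is safe because it sits inside the lambda body and only runs when the lambda is called with a real match.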

Kind regards

  robert

···

On 26.05.2007 09:44, Eleo wrote:

While not related to using gsub/regular expressions, consider an
option like hpricot?

Robert Klemme wrote:

some helpful crap

Thanks, this seems to work out fine. I used hash keys just in case. It
is easier to type a site name than it is to type out a regular
expression, and although I don't need to access the hash directly yet,
well, you nevva know.

As for lambda, that's something I don't fully understand yet. While I was
pondering the solution I kind of had this weird feeling that it might be
the solution, though. I'll look into it.

Paul Stickney wrote:

While not related to using gsub/regular expressions, consider an
option like hpricot?

I just glanced at it, but I'm not sure why it would be a better option.
In my case I wanted users to be able to embed videos in their comments,
but they have no access to html, so I instead created this generic
markup like (youtube:link) to accomplish the same ends. I figured
allowing them the option of using real HTML <embed> code would be too
risky. I don't know for sure whether or not there are malicious uses
for <embed> but I imagined so.
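
Roughly what I have in mind, as a sketch under assumptions: escape whatever the user typed first, then expand the generic markup. The method name render_comment and the shortened embed template are hypothetical; ERB::Util.html_escape is from Ruby's standard library:

```ruby
require "erb"

# Hypothetical, shortened pattern for the (youtube:ID) markup.
YOUTUBE_MARKUP = /\(youtube:([\w\-]{11})\)/

def render_comment(raw)
  # Escape any HTML the user typed, so only our own tags survive.
  safe = ERB::Util.html_escape(raw)
  safe.gsub(YOUTUBE_MARKUP) do
    %(<embed src="http://www.youtube.com/v/#{Regexp.last_match(1)}"></embed>)
  end
end

rendered = render_comment("<script>x</script> (youtube:dQw4w9WgXcQ)")
puts rendered
```

The markup itself contains no characters that escaping would touch, so it still matches after the escape step.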
