[regexp] How to extract

Jan_Kowalski · 5 April 2008 00:37

From:

#header h1 a {
  display: block;
  background: url(images/wordpress-logo.png) center left no-repeat;
  width: 301px;
  height: 88px;
  text-indent: -9999px;
  float: left;
}

#header ul li#download a {
  background: #d54e21 url(images/download-tab-bg.png) bottom left repeat-x;
  color: #fff;
  -moz-border-radius-topleft: 3px;
  -khtml-border-top-left-radius: 3px;
  -webkit-border-top-left-radius: 3px;
  border-top-left-radius: 3px;
  -moz-border-radius-topright: 3px;
  -khtml-border-top-right-radius: 3px;
  -webkit-border-top-right-radius: 3px;
  border-top-right-radius: 3px;
  text-shadow: #b5421c 1px 1px 1px;
}

I need each occurrence of the text enclosed in url(************); for
this example:

images/wordpress-logo.png
images/download-tab-bg.png

--
Posted via http://www.ruby-forum.com/.

7stud · 5 April 2008 01:00

str = <<CSS
#header h1 a {
  display: block;
  background: url(images/wordpress-logo.png) center left no-repeat;
  width: 301px;
  height: 88px;
  text-indent: -9999px;
  float: left;
}

#header ul li#download a {
  background: #d54e21 url(images/download-tab-bg.png) bottom left repeat-x;
  color: #fff;
  -moz-border-radius-topleft: 3px;
  -khtml-border-top-left-radius: 3px;
  -webkit-border-top-left-radius: 3px;
  border-top-left-radius: 3px;
  -moz-border-radius-topright: 3px;
  -khtml-border-top-right-radius: 3px;
  -webkit-border-top-right-radius: 3px;
  border-top-right-radius: 3px;
  text-shadow: #b5421c 1px 1px 1px;
}
CSS

pattern = /url\((.*)\)/ #to match a parenthesis escape it with a '\'

str.each_line do |line| #String#each was removed in Ruby 1.9; each_line works on 1.8 too
  match = line[pattern, 1] #1 is the parenthesized sub pattern
  if match
    puts match
  end
end

--output:--
images/wordpress-logo.png
images/download-tab-bg.png
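
In case line[pattern, 1] looks unfamiliar: String#[] called with a regexp and an index returns that capture group, or nil when nothing matches. A minimal sketch, assuming a helper name (first_url) that is made up here and is not part of the code above:

```ruby
# Hypothetical helper (name invented for illustration):
# returns the text inside the first url(...) on a line, or nil if there is none.
def first_url(line)
  line[/url\((.*)\)/, 1]
end

p first_url("  background: url(images/wordpress-logo.png) center left no-repeat;")
p first_url("  width: 301px;")
```

The first call prints "images/wordpress-logo.png" and the second prints nil, which is why the loop checks match before calling puts.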

7stud · 5 April 2008 01:19

7stud -- wrote:

pattern = /url\((.*)\)/ #to match a parenthesis escape it with a '\'

Actually, to be safe, make the .* non-greedy so it stops at the first closing parenthesis:

pattern = /url\((.*?)\)/ #to match a parenthesis escape it with a '\'

Here's the difference:

str = "abcurl(good)bad)xyz"

pattern1 = /url\((.*)\)/  #greedy: .* runs on to the last ')'
pattern2 = /url\((.*?)\)/ #non-greedy: .*? stops at the first ')'

match1 = str[pattern1, 1] #1 is the parenthesized sub pattern
puts match1

match2 = str[pattern2, 1] #1 is the parenthesized sub pattern
puts match2

--output:--
good)bad
good
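
The same thing can bite in real CSS when one line carries two url(...) values. A small sketch under that assumption; the rule below and the constant names are made up for illustration and do not come from the original stylesheet:

```ruby
GREEDY     = /url\((.*)\)/   # .* grabs as much as it can
NON_GREEDY = /url\((.*?)\)/  # .*? grabs as little as it can

# Made-up one-line rule with two url(...) values; not from the original CSS.
LINE = "background: url(a.png) no-repeat, url(b.png) repeat-x;"

p LINE[GREEDY, 1]      # capture runs up to the last ")" on the line
p LINE[NON_GREEDY, 1]  # capture stops at the first ")"
```

The greedy pattern prints "a.png) no-repeat, url(b.png", while the non-greedy one prints "a.png". Note that String#[] only returns the first match, so the second URL is still not picked up this way.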

Todd_Benson · 5 April 2008 08:41

You can use a non-greedy multi-line #scan pattern as well...

str.scan /url\((.*?)\)/m
=> [["images/wordpress-logo.png"], ["images/download-tab-bg.png"]]

...which you can #flatten if you want.
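
Roughly like this, as a sketch only: the helper name css_urls is hypothetical, and the stylesheet is a trimmed copy of the one posted above. Putting the regexp in parentheses also avoids Ruby's "ambiguous first argument" warning.

```ruby
# Hypothetical helper (name invented for illustration).
# scan returns one array per match because the pattern has a capture group;
# flatten turns [["a"], ["b"]] into ["a", "b"].
def css_urls(css)
  css.scan(/url\((.*?)\)/m).flatten
end

css = <<CSS
#header h1 a {
  background: url(images/wordpress-logo.png) center left no-repeat;
}
#header ul li#download a {
  background: #d54e21 url(images/download-tab-bg.png) bottom left repeat-x;
}
CSS

p css_urls(css)
```

Unlike the line-by-line loop, this works on the whole string at once and also finds several url(...) values on the same line.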

Todd

Robert_K1 · 5 April 2008 08:55

I'd rather use a more specific match, as I believe this could be a tad more efficient:

str.scan /url\(([^)]*)\)/m
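
A quick sketch to compare the two patterns side by side. The constant names, the helper urls_with and the sample string are all made up for illustration, and no timing is measured here, so whether the difference is noticeable will depend on the input:

```ruby
LAZY    = /url\((.*?)\)/m    # non-greedy dot; /m lets "." match newlines
NEGATED = /url\(([^)]*)\)/   # "anything but )" already matches newlines, so /m is optional

# Hypothetical helper (name invented for illustration).
def urls_with(pattern, css)
  css.scan(pattern).flatten
end

# Made-up sample input, not the original stylesheet.
SAMPLE = "a { background: url(one.png) } b { background: url(two.png) }"

p urls_with(LAZY, SAMPLE)
p urls_with(NEGATED, SAMPLE)
```

Both calls print ["one.png", "two.png"]; the negated class simply states up front that the capture can never contain a closing parenthesis.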

Cheers

robert

Todd_Benson · 5 April 2008 09:07

Yep, I think that's better.

Todd
