Regex problem

K_R · 27 November 2007 16:28

hi @all

I would like to scan a string of HTML tags. I need to extract all
links (a-tags) from the string, but I get only the last one. What is
wrong? See the code below...

response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'
response.scan(/<a.*href="(.*?)"/) do |line|
  puts line
end

thanks for helping!

--
Posted via http://www.ruby-forum.com/.

franco · 27 November 2007 16:44

The first Kleene star probably needs to be non-greedy. In other words,
it should stop at the first href it reaches, not the last:
/<a.*?href="(.*?)"/
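To make the difference concrete, here is a minimal sketch that runs both patterns against the string from the original post. The variable names `greedy` and `lazy` are only illustrative, and `flatten` is added because `scan` returns one array of captures per match.

```ruby
response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'

# Greedy: `.*` first runs to the end of the string, then backtracks only as
# far as the LAST `href="`, so a single match swallows both tags.
greedy = response.scan(/<a.*href="(.*?)"/).flatten

# Non-greedy: `.*?` expands one character at a time and stops at the FIRST
# `href="` after each `<a`, so every tag produces its own match.
lazy = response.scan(/<a.*?href="(.*?)"/).flatten

p greedy  # => ["hello2.html"]
p lazy    # => ["hello1.html", "hello2.html"]
```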

Christian_von_Kleist · 27 November 2007 17:00

Franco is right. You could fix it by using "a.*?href". However, I
would change "a.*href" to "a\s+href", since what you're looking for is
one or more whitespace characters after the "a" and before the "href".

response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'
response.scan(/<a\s+href="(.*?)"/) do |line|
  puts line
end
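One thing that may not be obvious about `\s+`: it also matches a line break, so a tag that has been wrapped between `<a` and `href` is still found. A small sketch, where the `wrapped` string is a made-up variation of the original input:

```ruby
# Hypothetical input: the first tag is wrapped across two lines and the
# second one has two spaces before href.
wrapped = "<a\nhref=\"hello1.html\">test1</a> - <a  href=\"hello2.html\">test2</a>"

# `\s+` matches one or more whitespace characters, including "\n".
# scan yields one array of captures per match, hence the flatten.
links = wrapped.scan(/<a\s+href="(.*?)"/).flatten

p links  # => ["hello1.html", "hello2.html"]
```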

franco · 1 December 2007 16:34

> response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'
> response.scan(/<a.*href="(.*?)"/) do |line|
> puts line
> end

but what if href is not the first attribute of <a/>?

K_R · 2 December 2007 13:05

> response.scan(/<a.*href="(.*?)"/) do |line|

> but what if href is not the first attribute of <a/>?

The order of the attributes does not matter, because the .* allows any
sequence of characters between the <a tag and href.
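A caveat on this, as a rough sketch rather than a definitive answer: a dot-star between `<a` and `href` does accept attributes in any order, but it can also run past the end of the tag. Restricting it to `[^>]` keeps the match inside a single tag. The `html` string and the `\b` anchors below are assumptions added for illustration; they are not from the posts above.

```ruby
# Hypothetical input: href is not the first attribute of the first link,
# the second <a> has no href at all, and a later <link> element has one.
html = '<a class="nav" href="hello1.html">test1</a> ' \
       '<a name="top">top</a> <img src="x.png"> <link href="style.css">'

# `.*?` copes with href in any position, but for the second <a> it keeps
# expanding past the closing `>` until it finds the href of <link>.
loose = html.scan(/<a.*?href="(.*?)"/).flatten

# `[^>]*?` cannot cross a `>`, so the match stays inside one tag.
# `\b` after `<a` keeps longer tag names such as <abbr> from matching.
strict = html.scan(/<a\b[^>]*?\bhref="(.*?)"/).flatten

p loose   # => ["hello1.html", "style.css"]
p strict  # => ["hello1.html"]
```

This is still only an approximation: an attribute value that itself contains `>` would defeat it.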
