Regexp question

Lyndon_Samson · 9 May 2005 11:43

I have an HTML document containing many table cells whose contents I wish
to extract.

r = Regexp.new("\<TD.*?\>(.*?)\<\/TD\>", Regexp::MULTILINE)
m = r.match(table)

The above only matches the first cell; I'd like to continue the match,
finding each subsequent cell. The not-very-nice way to do this is to
take the char offset of the match, create a new string from that point
and feed it back into match.

What's the better way?

--
Into RFID? www.rfidnewsupdate.com Simple, fast, news.

CT1 · 9 May 2005 11:57

Have you tried scan? Try the same regexp with String#scan to get an array
of matched groups.
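
A minimal sketch of what that could look like. The sample `table` string is made up for illustration, and the closing tag is written as `</TD>`, which I assume is what was intended:

```ruby
# Hypothetical sample input; any HTML string is handled the same way.
table = "<TR><TD>one</TD><TD align=\"right\">two</TD></TR>\n" \
        "<TR><TD>three\nfour</TD></TR>"

# Regexp::MULTILINE makes "." match newlines, so a cell may span lines.
r = Regexp.new("<TD.*?>(.*?)</TD>", Regexp::MULTILINE)

# With one capture group, scan returns an array of one-element arrays,
# one per match; flatten turns that into a plain list of cell contents.
cells = table.scan(r).flatten

cells.each { |cell| puts cell.inspect }
```

scan also accepts a block if you would rather handle each cell as it is found instead of collecting them all first.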

- Shajith.

On 5/9/05, Lyndon Samson <lyndon.samson@gmail.com> wrote:

The above only matches the first cell; I'd like to continue the match,
finding each subsequent cell.

Robert · 9 May 2005 11:59

Lyndon Samson wrote:

I have an HTML document containing many table cells whose contents I wish
to extract.

r = Regexp.new("\<TD.*?\>(.*?)\<\/TD\>", Regexp::MULTILINE)
m = r.match(table)

The above only matches the first cell; I'd like to continue the match,
finding each subsequent cell. The not-very-nice way to do this is to
take the char offset of the match, create a new string from that point
and feed it back into match.

What's the better way?

Use String#scan.

robert

Nikolai_Weibull · 9 May 2005 12:13

Lyndon Samson, May 9:

r = Regexp.new("\<TD.*?\>(.*?)\<\/TD\>", Regexp::MULTILINE)
m = r.match(table)

The above only matches the first cell; I'd like to continue the match,
finding each subsequent cell. The not-very-nice way to do this is to
take the char offset of the match, create a new string from that point
and feed it back into match.

What's the better way?

Using String#scan, as previously suggested, is the easiest method.
Another way of doing it is to use a loop while m is non-nil and match
against m.post_match on each iteration. See the documentation of the
MatchData class for more information.
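
A rough sketch of the loop variant, for comparison. The sample `table` string and the `cells` array are made up for illustration, and the closing tag is assumed to be `</TD>`:

```ruby
# Hypothetical sample input.
table = "<TR><TD>one</TD><TD>two</TD></TR>"

r = Regexp.new("<TD.*?>(.*?)</TD>", Regexp::MULTILINE)

cells = []
m = r.match(table)
while m
  cells << m[1]              # contents of the first capture group
  m = r.match(m.post_match)  # retry on the text after this match
end

puts cells.inspect
```

Each pass matches against only the remainder of the string, so the loop ends once no further cell is found. This terminates because the pattern never matches an empty string.
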
nikolai

--
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
