Extracting text

I suspect you’ll want to use a parser, instead of regular expressions, to
parse HTML. There’s an html-parser module on the RAA, though I haven’t used
it myself.
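
To give an idea of what walking the markup tag by tag looks like, as opposed to matching one fixed pattern, here is a minimal sketch using StringScanner from Ruby's standard library. It is a hand-rolled tokenizer, not a real HTML parser and not the html-parser module mentioned above; the helper name td_texts and the sample input are assumptions made up for illustration.

```ruby
require 'strscan'

# Hypothetical helper: returns the text of every TD cell in +html+.
# Tag names are matched case-insensitively and attributes are optional.
def td_texts(html)
  scanner = StringScanner.new(html)
  cells = []
  current = nil # text collected for the cell being read, or nil outside a cell

  until scanner.eos?
    if scanner.scan(/<\s*td\b[^>]*>/i)       # opening TD tag, any attributes
      current = +''
    elsif scanner.scan(/<\s*\/\s*td\s*>/i)   # closing TD tag
      cells << current.strip if current
      current = nil
    elsif scanner.scan(/<[^>]*>/)            # any other tag: skip it
      next
    else
      # Plain text up to the next "<", or a single stray character.
      chunk = scanner.scan(/[^<]+/) || scanner.getch
      current << chunk if current
    end
  end
  cells
end

# Assumed sample input, not taken from the original question.
p td_texts('<TR><TD class="x">Some text</TD><td>More</td></TR>')
```

This tolerates attribute changes, lowercase tags and nested inline tags, but it still knows nothing about comments, scripts or unclosed cells, which is where a real parser earns its keep.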

Regards,

Dan

-----Original Message-----
From: Tim Hunter [mailto:Tim.Hunter@sas.com]
Sent: Friday, July 11, 2003 12:36 PM
To: ruby-talk@ruby-lang.org
Subject: Re: extracting text

Here’s one answer to your question. Watch out, almost any
change to the input will break it.

irb(main):012:0> s = "Some text"
"Some text"
irb(main):013:0> m = %r{<TD [^>]+>([^<]+)}.match(s)
#<MatchData:0x276f978>
irb(main):014:0> p m[1]
"Some text"
nil
irb(main):015:0>
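
The same idea as a standalone script, as a rough sketch. The input strings here are assumptions: the pattern expects a space and at least one more character between "<TD" and ">", so the sample cell carries a made-up class attribute. The last two matches illustrate the warning that small changes to the input break the pattern.

```ruby
# Assumed input: the class attribute is hypothetical, chosen only so that
# the pattern has something to match between "<TD " and ">".
s = '<TD class="cell">Some text</TD>'

# Same pattern as in the irb session.
pattern = %r{<TD [^>]+>([^<]+)}

m = pattern.match(s)
text = m && m[1] # guard against nil when nothing matches
puts text

# A cell without attributes, or with a lowercase tag, no longer matches.
bare  = pattern.match('<TD>Some text</TD>')
lower = pattern.match('<td class="cell">Some text</td>')
p bare
p lower
```

Both of the last two calls return nil, so any code that goes straight to m[1] would raise NoMethodError on that kind of input.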

On Fri, 11 Jul 2003 07:46:44 -0400, "Dan" <falseflyboy@yahoo.comNONO> wrote:

I have an HTML table from which I would like to extract the text
inside a <TD>.
For example, a cell containing "Some text".

I can write code that detects the beginning of the TD…
print line =~ ""

But how do I make it stop at the </TD>? In the code above, I
just want to
print "Some text"

thanks