I have a nice little regex to pull the information-rich guts from a
table:
%r{</thead.*?>(.*?)</table>}m =~ html
# $1 now contains all the rows of the table as one long string.
I'd like to turn that into an array of rows, but I am not exactly sure
how.
Additionally, I'd like to process the rows so that I can get the data
between the nth <td></td> pair.
Any help?
If you have a string with a repeating pattern that you want an array
of, String#scan is your man.
irb(main):001:0> html = "<td>foo</td><td>bar</td>"
=> "<td>foo</td><td>bar</td>"
irb(main):002:0> a = html.scan(/<td>(.+?)<\/td>/)
=> [["foo"], ["bar"]]
Hmmm, that's sort of ugly. Because the pattern has a capture group, scan
returns one array of captures per match.
irb(main):003:0> a = html.scan(/<td>(.+?)<\/td>/).flatten
=> ["foo", "bar"]
Much better.
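The same scan-and-flatten trick can be applied twice to cover the whole
question: once to split the rows string into <tr> rows, then once per row
to pull out the <td> cells, after which the nth cell is just an index.
This is a minimal sketch that assumes simple, well-formed markup (no nested
tables, no </td> or </tr> inside cell text). The sample string and the
helper names table_rows, row_cells and nth_column are hypothetical, not
from the original thread.

```ruby
# Hypothetical sample: roughly what $1 would hold after the
# </thead> ... </table> match in the question.
rows_html = <<~HTML
  <tbody>
    <tr><td>apple</td><td class="price">1.20</td><td>red</td></tr>
    <tr><td>banana</td><td class="price">0.50</td><td>yellow</td></tr>
  </tbody>
HTML

# Hypothetical helper: one string per <tr>...</tr> row.
# [^>]* tolerates attributes on the tag; /m lets . match newlines.
def table_rows(html)
  html.scan(/<tr[^>]*>(.*?)<\/tr>/m).flatten
end

# Hypothetical helper: the raw text between each <td>...</td> pair in a row.
def row_cells(row_html)
  row_html.scan(/<td[^>]*>(.*?)<\/td>/m).flatten
end

# Hypothetical helper: the nth cell (0-based) of every row,
# nil for rows that have fewer than n + 1 cells.
def nth_column(table, n)
  table.map { |cells| cells[n] }
end

# Array of rows, each row an array of cell strings.
table = table_rows(rows_html).map { |row| row_cells(row) }

p table                  # => [["apple", "1.20", "red"], ["banana", "0.50", "yellow"]]
p nth_column(table, 1)   # => ["1.20", "0.50"]
```

Cell contents come back as raw inner HTML, so any nested tags or entities
inside a <td> are left undecoded.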
Ad hoc regexes are fine for quick-n-dirty scripting. But if you're
serious about parsing HTML, you might want to look into Hpricot or
Nokogiri.
-Michael Libby
On Fri, Nov 7, 2008 at 3:08 PM, soldier.coder <geekprogrammer.ed@googlemail.com> wrote: