Parsing HTML tables

Nicolas_Pioupiou · 31 March 2009 18:59

Hi everybody,

I'm looking for a clean way to write code that parses an HTML
table.

The table is dynamic: it always has three columns, but the number of
rows varies.

For each row (<tr></tr>) I want to extract the content of the cells
(<td></td>), and then create a new object from that data.

So far I have done it by splitting the HTML source with split("<tr>")
and using regular expressions to extract what I want. But I'm not
satisfied with this solution: it's unmaintainable.
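For illustration, a hypothetical reconstruction of the split/regexp approach (the poster's actual code is not shown, so `parse_rows_naively` and the sample table are made up). It sketches one way such code breaks: an uppercase `<TR>` tag, or any attribute on the row, is still valid HTML but is not matched by `split('<tr>')`, so two rows silently merge into one.

```ruby
# Hypothetical reconstruction of a split/regexp "parser"; not the poster's code.
def parse_rows_naively(html)
  # Split on the literal "<tr>" and grab every <td>...</td> in each chunk.
  html.split('<tr>').drop(1).map { |chunk| chunk.scan(%r{<td>(.*?)</td>}).flatten }
end

# Both rows are valid HTML; the second only spells its row tag differently.
html = '<table><tr><td>One</td><td>Two</td><td>Three</td></tr>' \
       '<TR class="odd"><td>Four</td><td>Five</td><td>Six</td></TR></table>'

rows = parse_rows_naively(html)
p rows
# prints [["One", "Two", "Three", "Four", "Five", "Six"]] (one "row" of six cells)
```

Every such variation (attributes on `<td>`, whitespace inside tags, nested markup) needs another special case in the regexp, which is what makes this style hard to maintain.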

I'm pretty sure there is a cleverer way to do it...

Does anyone have an idea, a clue, a thought?

Thanks.
PS: English is not my native language...


--
Posted via http://www.ruby-forum.com/.

Mark_Thomas · 31 March 2009 19:44

Use a real parser. Example:

#---
require 'nokogiri'

html = <<eohtml
<html>
<body>
<table>
  <tr>
    <td>One</td><td>Two</td><td>Three</td>
  </tr>
</table>
</html>
eohtml

doc = Nokogiri::HTML(html)

# For each row, print the text nodes directly inside its <td> cells
doc.search('//tr').each do |line|
  puts line.search('td/text()')
end

#---
Output:
One
Two
Three

Nicolas_Pioupiou · 31 March 2009 20:35


Hi,

Thanks for your help.
I ran some tests with Hpricot (already included in my Ruby
distribution) and got good results. Great tool!

Thanks again for your help!
