HTML to plain text

Okay, I have played with Hpricot and I am a convert. Amazing stuff.

I am struggling to get up to speed and I can't find what must be a basic
function. I've scraped the FAA site and they store all their stuff
wrapped in td's, wrapped in tr's, wrapped in tables. Thank you
Hpricot.

Now that I have "<b>Manufacturer</b>" isn't there a simple call to get
rid of the last bit of html?

Thanks,
--Colin

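Hi Colin, consult the API docs for Hpricot's inner_text:

require 'rubygems'
require 'hpricot'
require 'open-uri'
# Fetch the page and parse it; open returns the block's value (the parsed document)
doc = open( 'http://www.google.com/ncr' ) { |io| Hpricot io }
# inner_text gives the text content with the markup removed
puts doc.inner_text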

Regards
Florian

It looks like you're looking for the inner_text method.

HTH,
Chris

The code above (Florian's open-uri example) doesn't work on my system.

The following does:

require 'rubygems'
require 'hpricot'
html_string = '<b>Manufacturer</b>'
html_data = Hpricot html_string
html_element = html_data / "b"
puts html_element.inner_html

Todd

···

On 6/24/07, Florian Aßmann <florian.assmann@email.de> wrote:

Hi Colin, consult api doc for Hpricot.inner_text:

require 'rubygems'
require 'hpricot'
require 'open-uri'
doc = open( 'http://www.google.com/ncr' ) { |io| Hpricot io }
doc.inner_text

Another "jump too soon moment".

In the above code, I didn't point out that html_element should be
plural, since / returns every matching element rather than a single
one. It still works, but the more accurate naming would be:

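require 'rubygems'
require 'hpricot'
html_string = '<b>Manufacturer</b>'
html_data = Hpricot html_string
# "/" returns every matching element as a collection
html_elements = html_data / "b"
# "at" returns only the first match
first_b_element = html_data.at "b"
# the same element, taken from the collection instead
first_b_element_also = (html_data / "b").first
puts first_b_element.inner_html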

Todd
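
For anyone who lands here holding only a string fragment like the one Colin describes, here is a minimal sketch that uses nothing but the standard library. It is not how Hpricot does it: it swaps in a regular expression for the parser, so it assumes small, well-behaved fragments such as a single table cell. The helper name strip_tags is made up for this example.

```ruby
require 'cgi'

# Hypothetical helper (not part of Hpricot): remove anything that looks
# like a tag, then decode entities such as &amp; into plain characters.
# Assumes simple fragments; a regex is no substitute for a real parser.
def strip_tags(fragment)
  text = fragment.gsub(/<[^>]*>/, '')
  CGI.unescapeHTML(text).strip
end

puts strip_tags('<b>Manufacturer</b>')
puts strip_tags('<td><b>Engine &amp; Model</b></td>')
```

Note that this removes nested tags as well, which is closer to what inner_text gives you. Todd's inner_html call prints clean text only because the <b> element contains nothing but text; on an element with nested markup, inner_html would keep the inner tags.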
