Extracting some text from HTML

It's a quick question (I hope).

I'm using Nokogiri.

I have the following HTML:

...
  <div class="lead">
    <div class="something"> blah blha blah </div>
    <div class="another-something"> blah blah blah </div>

     ...some text I want to extract...

    <div class="blah"> whatever </div>
  </div>
...

How can I extract the text "...some text I want to extract..." ?

My problem is that this text isn't wrapped in a DIV. It's a "text" node
(not an "element" node).

Can anybody figure out an XPath expression for it?

--
Posted via http://www.ruby-forum.com/.

Albert Schlef wrote:

I'm using Nokogiri.

I have the following HTML:

...
  <div class="lead">
    <div class="something"> blah blha blah </div>
    <div class="another-something"> blah blah blah </div>

     ...some text I want to extract...

    <div class="blah"> whatever </div>
  </div>
...

How can I extract the text "...some text I want to extract..." ?

I solved the problem. I used the following code:

the_text_i_want = doc.at_xpath('//div[@class="lead"]/text()[3]')
puts the_text_i_want.content


Hi, I want to do something similar to what you are doing.

Basically I would like to go through a whole bunch of links and text on
a page and scrape just the Text I want, and if I get that text scrape
the corresponding URL in the same table with it.

Here is my actual code so far

require 'rubygems'
require 'nokogiri'
require 'open-uri'

def certs
  @certs = %w{MCSE A+ MCITP MCDBA MCPD MCSA} # Text I would like scraped
end

for i in 1..100 do # yay for page loop
  url = "http://www.hawaiicrcs.org/searchprog.asp?cat=&pg=#{i}" # pages scraped
  doc = Nokogiri::HTML(open(url)) # on Ruby 3+, use URI.open(url)
  for s in 1..100 do # yay for table loop
    doc.css("tr:nth-child(#{s})").each do |var| # pages in an array
      puts var
    end
  end
end

I hope I am being clear. For the href, I have gotten it to display with
tts = doc.css("tr:nth-child(#{s})")[:href]

But I am unsure how to get the href only for the rows whose text matches
one of the certs. I am thinking of an if/then statement or a case. Maybe
someone can help.

Something like

if compared data = true
p "doc.css("tr:nth-child(#{s})")[:href]"

or something of the sort. I am a newb, so forgive my errors if there are
any when I type.
