7stud2
(7stud --)
31 October 2012 05:20
1
Ruby 1.9.3
For http://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A
I would like to make a list of airport names using Nokogiri.
The following code seems to work but it does not insert "\n" as I wish.
Can you tell me why?
require 'open-uri'
require 'nokogiri'

test_url = "http://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A"
url_list_file = "list_page_url.txt"
test_xpath = "//tr"
output_file = "list_airport_names_wiki_url.txt"

test = Nokogiri::HTML(open(test_url))
File.open(output_file, "a") {|f|
  test.xpath(test_xpath).each do |e|
    f.write e.xpath("//td[3]/a").text + "\n" #### HERE!!! ####
  end
}
--
Posted via http://www.ruby-forum.com/ .
Robert_K1
(Robert K.)
31 October 2012 09:12
2
First of all, the XPath looks suspicious: you certainly want only the "td"
elements nested below the current "tr". So you should use either of
td[3]/a
.//td[3]/a
Otherwise the first selection is useless, because //td[3]/a is evaluated
from the document root and selects the "a" children of every third "td"
in the whole document, no matter which "tr" you start from. Also, e.xpath
returns a NodeSet, and calling #text on a NodeSet concatenates the text of
all its nodes, which can lead to surprising results:
irb(main):026:0> puts dom
<?xml version="1.0"?>
<table>
<td>abc</td>
</table>
=> nil
irb(main):027:0> dom.xpath('//*')
=> [#<Nokogiri::XML::Element:0x..fc00768e6 name="table"
children=[#<Nokogiri::XML::Element:0x..fc00766d4 name="td"
children=[#<Nokogiri::XML::Text:0x..fc0076526 "abc">]>]>,
#<Nokogiri::XML::Element:0x..fc00766d4 name="td"
children=[#<Nokogiri::XML::Text:0x..fc0076526 "abc">]>]
irb(main):028:0> dom.xpath('//*').text
=> "abcabc"
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/