Ruby 1.9
I'm trying to scrape part of a web page,
http://www3.nhk.or.jp/nhkworld/chinese/top/index.html
(excuse me, it's in a language most of you won't know. It's the Chinese
page of a Japanese news site)
I hope you can see the portion I want in the attached file.
The XPath for the portion should be
/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']
The code would be
require 'nokogiri'
require 'open-uri'
url_date = "http://www3.nhk.or.jp/nhkworld/chinese/top/update_news.js"
doc_init = Nokogiri::HTML(open(url_date))
date = doc_init.xpath("/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']")
p date.text()
But it returns nothing. The expected outcome is something like
更新 6月12日 21:34(日本时间)
showing the date and time of update, which of course varies depending on
when you execute it.
Looking at the source of this page at line 96,
<h2><img src="fixed/images/h2_news.gif" alt="新闻" width="39"
height="20"><span class="update"><script type="text/javascript"
src="./update_news.js"></script></span></h2>
is the place. It seems that this JavaScript file, 'update_news.js',
generates the date and time dynamically.
Is there any way to get this particular portion of the page?
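A minimal sketch of one possible approach, under a loud assumption: that update_news.js emits the timestamp through document.write(...) with a quoted string literal. I have not confirmed this, since the file's contents are not shown here. Nokogiri parses HTML but does not execute JavaScript, so the span stays empty in the parsed page; the idea is to fetch the script itself and pull the string out with a regular expression. The method names extract_document_write and fetch_update_text are hypothetical, and treating the response as UTF-8 is also an assumption.

```ruby
# encoding: utf-8
require 'net/http'
require 'uri'

# Collect the string literals passed to document.write(...) or
# document.writeln(...) in a JavaScript source, join them, and strip HTML tags.
# ASSUMPTION: the script writes plain quoted literals, not computed values.
def extract_document_write(js_source)
  pattern = /document\.write(?:ln)?\(\s*(["'])(.*?)\1\s*\)/m
  pieces = js_source.scan(pattern).map { |_quote, body| body }
  pieces.join.gsub(/<[^>]+>/, '').strip
end

# Fetch the script over HTTP and extract the written text.
# ASSUMPTION: the server sends UTF-8; adjust force_encoding otherwise.
def fetch_update_text(url)
  js = Net::HTTP.get(URI(url)).force_encoding('UTF-8')
  extract_document_write(js)
end
```

Calling fetch_update_text("http://www3.nhk.or.jp/nhkworld/chinese/top/update_news.js") would then print the update line if the assumption holds. If the script builds the string at run time instead (for example from a Date object), a regular expression will not be enough and something that actually runs JavaScript would be needed.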
soichi
Attachments:
http://www.ruby-forum.com/attachment/7486/ruby_scraping.jpg