Scraping with Nokogiri for dynamic page(?)

Ruby 1.9

I'm trying to scrape part of a web page:

http://www3.nhk.or.jp/nhkworld/chinese/top/index.html

(Sorry, it is in a language most of you won't know. It's the Chinese
page of a Japanese news site.)

The portion I want is shown in the attached file.

The XPath for that portion should be

/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']

The code would be

require 'nokogiri'
require 'open-uri'

url_date = "http://www3.nhk.or.jp/nhkworld/chinese/top/update_news.js"
doc_init = Nokogiri::HTML(open(url_date))
date = doc_init.xpath("/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']")
p date.text

But it does not get anything. The expected outcome is something like

更新 6月12日 21:34(日本时间)

showing the date and time of update, which of course varies depending on
when you execute it.

The relevant place in the page source is line 96:

<h2><img src="fixed/images/h2_news.gif" alt="新闻" width="39"
height="20"><span class="update"><script type="text/javascript"
src="./update_news.js"></script></span></h2>

It seems that this JavaScript file, 'update_news.js', writes the date
and time into the page dynamically.

Is there any way to get this particular portion of the page?

soichi

Attachments:
http://www.ruby-forum.com/attachment/7486/ruby_scraping.jpg

--
Posted via http://www.ruby-forum.com/.

Have you looked at the file?

http://www3.nhk.or.jp/nhkworld/chinese/top/update_news.js

It basically just writes out the date; just get it from there.
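
Something along these lines might do it. This is only a rough sketch: I'm
assuming the file contains a single document.write("...") call and is
served as UTF-8, so adjust the regex and the encoding if that turns out
to be wrong. The method names (resolve_script_url, extract_written_text,
fetch_update_text) are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Resolve the script's relative src (e.g. "./update_news.js") against
# the URL of the page that references it.
def resolve_script_url(page_url, src)
  URI.join(page_url, src).to_s
end

# Return the string literal passed to the first document.write(...) call,
# or nil if there is none.
# Assumption: the argument is a single quoted literal, not an expression.
def extract_written_text(js_source)
  match = js_source.match(/document\.write(?:ln)?\(\s*(["'])(.*?)\1\s*\)/m)
  match && match[2]
end

# Download the script and return the text it would write into the page.
# Assumption: the file is UTF-8 encoded.
def fetch_update_text(url)
  body = Net::HTTP.get(URI(url)).force_encoding('UTF-8')
  extract_written_text(body)
end

# Usage (performs a network request):
#   page = "http://www3.nhk.or.jp/nhkworld/chinese/top/index.html"
#   puts fetch_update_text(resolve_script_url(page, "./update_news.js"))
```

Nokogiri isn't needed for this step, since it only parses the HTML and
never runs the script, which is why the span comes back empty.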

-- Matma Rex

Thanks, that was simple.

soichi
