Scraping with Nokogiri for dynamic page(?)

7stud2 · 13 June 2012 03:24

Ruby 1.9

I'm trying to scrape a part of a web page,

http://www3.nhk.or.jp/nhkworld/chinese/top/index.html

(Excuse me, the language is probably unfamiliar to most of you. It's the
Chinese page of a Japanese news site.)

I hope you can see the portion which I want in the attached file.

The XPath for that portion should be

/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']

The code would be

require 'nokogiri'
require 'open-uri'

url_date = "http://www3.nhk.or.jp/nhkworld/chinese/top/update_news.js"
# Parse the fetched document and look up the span that should hold the update time.
doc_init = Nokogiri::HTML(open(url_date))
date = doc_init.xpath("/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']")
p date.text

But it does not get anything. The expected outcome is something like

更新 6月12日 21:34（日本时间）

showing the date and time of update, which of course varies depending on
when you execute it.

Looking at the source of this page, line 96 is the place. It seems that
the JavaScript file referenced there, 'update_news.js', produces the date
and time dynamically.

Is there any way to get this particular portion of the page?

soichi

Attachments:
http://www.ruby-forum.com/attachment/7486/ruby_scraping.jpg

--
Posted via http://www.ruby-forum.com/.

11142 · 13 June 2012 14:08

Have you looked at the file?

http://www3.nhk.or.jp/nhkworld/chinese/top/update_news.js

It basically just writes out the date; just get it from there.

-- Matma Rex
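
The suggestion above can be sketched roughly as follows. Nokogiri does not execute JavaScript, so the idea is to download update_news.js as plain text and pull the date out with a regular expression. The thread does not show the contents of the script, so this assumes it emits the date through a document.write call with a quoted string; the helper names extract_written_text and fetch_update_text are made up for illustration.

```ruby
require 'open-uri'

# URL taken from the thread.
UPDATE_JS_URL = "http://www3.nhk.or.jp/nhkworld/chinese/top/update_news.js"

# Hypothetical helper: collects the string literals passed to
# document.write(...) / document.writeln(...) and joins them.
# ASSUMPTION: the script writes the date as a simple quoted string
# (no escaped quotes, no string concatenation).
def extract_written_text(js_source)
  pattern = /document\.write(?:ln)?\(\s*(["'])(.*?)\1\s*\)/m
  js_source.scan(pattern).map { |_quote, text| text }.join
end

# Hypothetical helper: downloads the script and returns the written text.
def fetch_update_text(url = UPDATE_JS_URL)
  js_source = URI.parse(url).open.read
  extract_written_text(js_source)
end
```

Calling `puts fetch_update_text` would then print whatever the script writes, provided the assumption about document.write holds. If the script builds the string differently, the regular expression needs to be adapted after looking at the file.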

7stud2 · 14 June 2012 00:14

Thanks, that was simple.

soichi

