I'm trying to get some website screen scraping working, but I'm
suffering from a lack of examples and documentation for either
WWW::Mechanize or Rubyful Soup.
My ultimate goal is to create a series of screen scrapers that are able
to access airline websites (including entering username and password,
dealing with redirects, etc.), find my mileage and recent flights,
parse the data, put it in some variables, and save it to MySQL (with
Rails).
I was trying to start with baby steps to understand the methods these
libraries support. Specifically, I was trying to fetch my own web
page, and then use a regex to match my wife's name, "Julie Pullen",
since I have link text on www.dankohn.com saying "My wife, Julie
Pullen". I was then going to gradually increase the complexity of the
scraping.
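For reference, here is a minimal sketch of that baby step using only Ruby's
standard library (Net::HTTP plus a regex), not WWW::Mechanize or Rubyful Soup.
The helper names fetch_page and matching_link_texts are made up for
illustration, and the sample HTML is an assumed stand-in for the real page.
Note that Net::HTTP.get does not follow redirects on its own.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: download a page body as a string.
# Net::HTTP.get does not follow redirects, so a redirecting site
# would need extra handling.
def fetch_page(url)
  Net::HTTP.get(URI.parse(url))
end

# Hypothetical helper: return the text of every <a> element whose
# link text matches the given pattern. Regex-based, so it assumes
# simple, non-nested anchor tags.
def matching_link_texts(html, pattern)
  html.scan(%r{<a\b[^>]*>(.*?)</a>}mi).flatten.select { |text| text =~ pattern }
end

# Assumed sample markup standing in for the real page.
sample_html = '<p><a href="/julie">My wife, Julie Pullen</a> and ' \
              '<a href="/other">something else</a></p>'

matches = matching_link_texts(sample_html, /Julie\s+Pullen/)
puts matches.inspect
```

To try it against the live page, something like
matching_link_texts(fetch_page('http://www.dankohn.com/'), /Julie\s+Pullen/)
should work, assuming the page is served without a redirect.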
Thanks in advance for any example scripts or documentation that you can
provide showing web scraping in Ruby.
And Lyndon, I'm a huge fan of Tidy for cleaning up my own web pages,
but I'm not sure it's helpful here, as I was aiming to use regexes to
parse the HTML rather than the DOM.
Well, DOM allows you to use XPath, which is a powerful query mechanism.
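A rough sketch of the XPath approach with REXML, assuming the page has
already been turned into well-formed XHTML (for example by running it through
Tidy first, since REXML rejects malformed markup). The sample document and
URLs below are invented for illustration.

```ruby
require 'rexml/document'

# Assumed sample input: well-formed XHTML, as Tidy might produce.
xhtml = <<~XHTML
  <html>
    <body>
      <p><a href="http://example.com/julie">My wife, Julie Pullen</a></p>
      <p><a href="http://example.com/other">Something else</a></p>
    </body>
  </html>
XHTML

doc = REXML::Document.new(xhtml)

# Select every <a> element with XPath, then filter on the link text in Ruby.
links = REXML::XPath.match(doc, '//a').select do |a|
  a.text =~ /Julie\s+Pullen/
end

links.each do |a|
  puts "#{a.text} -> #{a.attributes['href']}"
end
```

The filtering is done in Ruby rather than in the XPath expression only to keep
the query simple; a predicate inside the XPath would be another option.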
Looks cool, thanks. I'm developing on a Windows machine, but plan to
move to a "real" machine for production. So it looks like Mechanize,
REXML, and XQuery will be the best bet.