Is there a tutorial on parsing text?

I’m thinking of building a TV Guide viewer in Ruby.

Currently, to view a TV Guide, I have to go to a website, select the
channels and date I want to view.

The problem is that it only lets me select 3 channels at a time and view one
day at a time, and I have to wait for the webpage to load every time.

So, I’m thinking of getting Ruby to send an HTTP POST to that site,
requesting all the channels and dates, and hopefully I’ll end up with
the HTML. Then I’ll strip the HTML down to just the channel names, programs,
program times, and dates, and store them somewhere. Then I should be
able to tell Ruby which channels I wanna view, and it’ll show them to me
immediately.

Later on, I might even be able to create a GUI for it using FXRuby or
something.

The thing is, I have no idea how to do these things: how to send an HTTP POST
using Ruby, how to get the HTML pages back, and how to extract the TV programs
out of those HTML pages.

So, can anyone provide any help or links on how to do this kind of thing? And
is there a better method to accomplish this than the one I mentioned?

Robo

“Robo” robo@mars.com wrote in message

The thing is, I have no idea how to do these things. How to send HTTP POST
using Ruby, how to get the HTML pages back, how to extract the TV programs
out of those HTML pages.

So, can anyone provide any help or links on how to do this kinda things, and
is there a better method to accomplish this than the one I mentioned?

Take a look at Internet Explorer Controller by Chris Morris at
http://clabs.org/ruby.htm

I don’t know of a tutorial on parsing text in Ruby.

For getting pages, you might look at how webfetcher does it:

http://www.acc.umu.se/~r2d2/programming/ruby/webfetcher/
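
For the POST step itself, Ruby's standard net/http library may be enough on its own. Below is a minimal sketch, not a description of what webfetcher does. The URL and the form field names (channels, date) are made up for illustration; the real names would have to be read from the form in the site's page source.

```ruby
require 'net/http'
require 'uri'

# Hypothetical address of the TV guide form; replace with the real one.
GUIDE_URL = 'http://tvguide.example.com/listings'

# Build the POST request without sending it.
# The field names 'channels' and 'date' are assumptions, not the site's real ones.
def build_guide_request(url, channels, date)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  request.set_form_data('channels' => channels.join(','), 'date' => date)
  [uri, request]
end

# Send the request and return the response body (the HTML) as a String.
def fetch_guide_html(url, channels, date)
  uri, request = build_guide_request(url, channels, date)
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(request)
  end
  response.body
end

# Example call (needs network access and a real URL):
#   html = fetch_guide_html(GUIDE_URL, %w[1 2 3], '2003-05-10')
```

Keeping the request building separate from the sending makes it easy to inspect what would be posted before hitting the site.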

curl, a Unix program, can handle pages where the site must set a cookie
before it gives you access to a page. See:

http://curl.haxx.se/
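
If you'd rather stay inside Ruby than call curl, the same cookie round trip can be approximated with net/http: fetch a first page, keep the name=value part of each Set-Cookie header, and send those back in a Cookie header. This is only a rough sketch. The function names are illustrative, and it ignores cookie expiry, paths, and domains, all of which curl handles properly.

```ruby
require 'net/http'
require 'uri'

# Turn raw Set-Cookie header lines into one Cookie header value.
# Only the leading name=value pair of each line is kept.
def cookie_header(set_cookie_lines)
  (set_cookie_lines || []).map { |line| line[/\A[^;]*/].strip }.join('; ')
end

# Visit first_url to collect cookies, then request target_url with them.
def fetch_with_cookies(first_url, target_url)
  first = Net::HTTP.get_response(URI.parse(first_url))
  cookies = cookie_header(first.get_fields('Set-Cookie'))

  target = URI.parse(target_url)
  request = Net::HTTP::Get.new(target.request_uri)
  request['Cookie'] = cookies unless cookies.empty?
  Net::HTTP.start(target.host, target.port) { |http| http.request(request) }.body
end
```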

For text parsing after you get the pages, see the relevant sections of
the Pickaxe book, Programming Ruby (strings and regular expressions). The
book can be found here:

http://www.rubycentral.com/book/index.html
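
As a taste of what those sections cover, String#scan with a regular expression is often enough to pull rows out of a simple page. Here is a sketch, assuming (hypothetically) that each program is one table row holding a time, a channel, and a title in that order. The real site's markup will differ, so the pattern would have to be adapted to it.

```ruby
# Assumed row layout (hypothetical, not taken from any real site):
#   <tr><td>19:30</td><td>Channel 2</td><td>News</td></tr>
ROW_PATTERN = %r{<tr>\s*<td>(\d{1,2}:\d{2})</td>\s*<td>([^<]+)</td>\s*<td>([^<]+)</td>\s*</tr>}i

Program = Struct.new(:time, :channel, :title)

# Return one Program per table row that matches ROW_PATTERN.
def extract_programs(html)
  html.scan(ROW_PATTERN).map do |time, channel, title|
    Program.new(time, channel.strip, title.strip)
  end
end

# Keep only the programs on the channels the viewer asked for.
def programs_for(programs, wanted_channels)
  programs.select { |program| wanted_channels.include?(program.channel) }
end
```

Once the programs are plain Ruby objects, showing only the channels you want is a single select call, with no page reloads involved.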

···

On Friday, May 9, 2003, at 09:23 PM, Robo wrote:

[snip]

How to send HTTP POST using Ruby, how to get the HTML pages back, how to
extract the TV programs out of those HTML pages.

[snip]

There’s also at least one HTML parsing library in the RAA. I haven’t
used it, so I can’t tell you more details.

Regards,
Pit

···

On 10 May 2003 at 12:29, Mark Wilson wrote:

On Friday, May 9, 2003, at 09:23 PM, Robo wrote:

[snip]

How to send HTTP POST using Ruby, how to get the HTML pages back, how to
extract the TV programs out of those HTML pages.

[snip]

I don’t know of a tutorial on parsing text in Ruby.

For getting pages, you might look at how webfetcher does it:

http://www.acc.umu.se/~r2d2/programming/ruby/webfetcher/

curl, a Unix program, can handle pages where the site must set a cookie
before it gives you access to a page. See:

http://curl.haxx.se/

For text parsing after you get the pages, see the relevant sections of
the pick-ax book (strings and regular expressions). The book can be
found here:

http://www.rubycentral.com/book/index.html