Hi,
I want to:
1) connect to an HTTP server and get an HTML file
2) parse the HTML file to retrieve information from it
I want to automate my routine work.
Could you recommend a good library or solution?
regards
kwatch
> 1) connect to an HTTP server and get an HTML file
You can use Net::HTTP. Some documentation for it can be found here:
http://www.rubycentral.com/book/lib_network.html
Search for “Net::HTTP” in that page (it’s about halfway down).
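A minimal sketch of the Net::HTTP approach in current Ruby, assuming a plain http:// URL. `fetch_html` is a hypothetical helper name; it does not follow redirects, and any non-2xx status raises an exception instead of returning an error page.

```ruby
require 'net/http'
require 'uri'

# Fetch an HTML page and return its body as a String.
# Sketch only: redirects are not followed, and Net::HTTPResponse#value
# raises an error for any response that is not 2xx.
def fetch_html(url)
  uri = URI.parse(url)
  response = Net::HTTP.get_response(uri)
  response.value   # raises unless the status is 2xx
  response.body
end
```

For example, `fetch_html('http://www.example.com/index.htm')` returns the page source as a String, ready to hand to a parser.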
> 2) parse the HTML file to retrieve information from it
Here’s a Ruby module that parses HTML. There may be others (look in
the Ruby Application Archive):
http://www.ruby-lang.org/en/raa-list.rhtml?name=html-parser
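If you only need one or two values from a page whose layout you already know, a couple of regular expressions can stand in for a parser. This is a rough sketch, not a real HTML parser: it assumes simple markup (a single `<title>`, quoted `href` values) and will miss anything fancier. `extract_title` and `extract_links` are hypothetical names.

```ruby
# Crude, regexp-based extraction. Good enough for one predictable page in a
# routine job; use a real parser for arbitrary HTML.

# Text of the first <title> element, stripped of surrounding whitespace.
def extract_title(html)
  html[%r{<title[^>]*>(.*?)</title>}im, 1]&.strip
end

# Every quoted href value found in an <a ...> start tag, in document order.
def extract_links(html)
  html.scan(/<a\s[^>]*href\s*=\s*["']([^"']*)["']/i).flatten
end
```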
> I want to automate my routine work.
Assuming you're on a UNIX system, make a cron job that periodically
runs your Ruby script.
On Tue, Jun 25, 2002 at 04:32:50PM +0900, kwatch wrote:
“kwatch” kwatch@lycos.jp wrote in message
news:cf674456.0206242322.638840b5@posting.google.com…
> I want to:
> 1) connect to an HTTP server and get an HTML file
I like PHP’s “URL fopen wrapper” feature.
Here is my Ruby draft implementation:
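require 'net/ftp'    # a bundled gem since Ruby 3.1; part of the stdlib before that
require 'net/http'
require 'uri'
require 'tempfile'

class RFile
  def initialize(uri, net)
    @uri = uri
    @net = net
  end

  # Open a local path, an http:// URL, or an ftp:// URL.
  # Local paths return a File; remote URLs return an RFile.
  def RFile.open(fileName, aMode = "r", aPerm = nil, &block)
    uri = URI.parse(fileName)
    case uri.scheme
    when nil
      aFile = aPerm ? File.new(fileName, aMode, aPerm) : File.new(fileName, aMode)
      if block_given?
        yield aFile
        aFile.close
        aFile = nil
      end
      aFile
    when 'http'
      new(uri, Net::HTTP.new(uri.host, uri.port))
    when 'ftp'
      user, pass = uri.userinfo.split(':') if uri.userinfo
      new(uri, Net::FTP.new(uri.host, user, pass))
    end
  end

  # Return the whole remote file as a String.
  def read
    case @uri.scheme
    when 'http'
      # Current Net::HTTP#get returns a single response object
      # (Net::HTTP.version_1_1 and its [response, body] pair are gone).
      @net.get(@uri.request_uri).body
    when 'ftp'
      dir, file = File.split(@uri.path)
      # URI::FTP#path is relative to the login directory (RFC 1738),
      # so only change directory when the path has a directory part.
      @net.chdir(dir) unless dir == '.'
      data = String.new
      @net.retrbinary("RETR #{file}", 1024) { |d| data << d }
      data
    end
  end

  # Yield each record (with the separator removed), like File#each.
  def each(aSepString = $/, &block)
    read.split(aSepString).each(&block)
  end

  # Upload data; only FTP supports writing here.
  def write(data)
    case @uri.scheme
    when 'ftp'
      dir, file = File.split(@uri.path)
      @net.chdir(dir) unless dir == '.'
      f = Tempfile.new("tmp")
      f.write(data)
      f.close
      @net.putbinaryfile(f.path, file, 1024)
      f.close(true)   # delete the temporary file
      data.length
    end
  end

  def close
    @net.close if @uri.scheme == 'ftp'
  end
end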
usage:
RFile.open("/home/path/file.txt", "r").each { |x| puts x }
RFile.open("http://www.example.com/index.htm", "r").each { |x| puts x }
RFile.open("ftp://user:password@example.com/file", "r").read
RFile.open("ftp://user:password@example.com/file", "w").write("test")
Park Heesob.
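For comparison, current Ruby ships open-uri in its standard library, which covers the read side of the same "open a URL like a file" idea for http, https and ftp URLs (it does not upload). A minimal sketch; `read_remote` and `each_remote_line` are hypothetical helper names, and `URI.open` needs Ruby 2.5 or later.

```ruby
require 'open-uri'

# Read a whole remote resource, much like RFile#read.
def read_remote(url)
  URI.open(url) { |f| f.read }
end

# Yield the resource line by line. Unlike RFile#each, the line
# separators are kept, as with IO#each_line.
def each_remote_line(url, &block)
  URI.open(url) { |f| f.each_line(&block) }
end
```

Kernel#open no longer accepts URLs in Ruby 3.0 and later, so `URI.open` is the form to use.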
I had to modify the html-parser a little to get it to work.
For example:
#! /usr/local/bin/ruby
require "net/http"
require "html-parser"
require "formatter"

def htmltest(data)
  w = DumbWriter.new
  f = AbstractFormatter.new(w)
  p = HTMLParser.new(f)
  p.feed(data)
  p.close
end

domain = 'www.rubycentral.com'
file = '/book/rubyworld.html'
h = Net::HTTP.new(domain, 80)
resp = h.get(file)   # current Net::HTTP#get returns one response object
data = resp.body
puts domain + file if $DEBUG
htmltest(data)
This program generated the following error:
c:/ruby/lib/ruby/site_ruby/html-parser.rb:409:in `Integer': invalid value for Integer: "\"1\"" (ArgumentError)
        from c:/ruby/lib/ruby/site_ruby/html-parser.rb:409:in `do_img'
        from c:/ruby/lib/ruby/site_ruby/html-parser.rb:395:in `each'
        from c:/ruby/lib/ruby/site_ruby/html-parser.rb:395:in `do_img'
        from c:/ruby/lib/ruby/site_ruby/sgml-parser.rb:281:in `send'
        from c:/ruby/lib/ruby/site_ruby/sgml-parser.rb:281:in `handle_starttag'
        from c:/ruby/lib/ruby/site_ruby/sgml-parser.rb:233:in `finish_starttag'
        from c:/ruby/lib/ruby/site_ruby/sgml-parser.rb:208:in `parse_starttag'
        from c:/ruby/lib/ruby/site_ruby/sgml-parser.rb:89:in `goahead'
        from c:/ruby/lib/ruby/site_ruby/sgml-parser.rb:58:in `feed'
        from htmltest00.rb:10:in `htmltest'
        from htmltest00.rb:21
To fix it, I modified the do_img method in html-parser.rb (at line 409),
where it was:
if attrname == 'width'
width = Integer(value)
end
if attrname == 'height'
height = Integer(value)
end
I changed it to:
if attrname == 'width'
  width = Integer(value.gsub(/['"]/, ''))   # strip all double quotes (") and single quotes (')
end
if attrname == 'height'
  height = Integer(value.gsub(/['"]/, ''))
end
And then it worked.
I am not sure this is the best way to do it, but I thought I should
share it with you.
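To check the idea in isolation, here is a hypothetical helper (`parse_dimension`, not part of html-parser) that applies the same quote stripping before calling `Integer()`. As an assumption beyond the patch above, it returns nil instead of raising when the value still isn't an integer, e.g. `width="50%"`.

```ruby
# Strip any quote characters left around an attribute value, then convert
# it to an Integer (base 10). Returns nil for values such as "50%" or ""
# that Integer() rejects, rather than aborting the whole parse.
def parse_dimension(value)
  Integer(value.to_s.gsub(/['"]/, '').strip, 10)
rescue ArgumentError
  nil
end
```

`parse_dimension('"1"')` gives 1, where the unpatched `Integer('"1"')` raises the ArgumentError shown in the traceback.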
Also, here is a change I made to sgml-parser.rb at line 57:
def feed(data)
@rawdata << data
goahead(false)
end
changed to:
def feed(data)
@rawdata << data if data # make sure that data is not nil
goahead(false)
end
HTH,
– Shanko
“Philip Mak” pmak@animeglobe.com wrote in message
news:20020625073957.GR9237@trapezoid.interserver.net…
[snip]
I suggest Ned Konz’s html parser, available from RAA. It can return a
REXML tree object which lets you treat the page as if it had been an
XML document.
Aidan
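A small sketch of the REXML side of Aidan's suggestion. It skips Ned Konz's parser and assumes the page is already well-formed markup with no XML namespace declaration, which real-world HTML often is not; `page_summary` is a hypothetical name. REXML has been a bundled gem since Ruby 3.0.

```ruby
require 'rexml/document'

# Treat a well-formed page as an XML document and pull values out with XPath.
def page_summary(xhtml)
  doc = REXML::Document.new(xhtml)
  title = REXML::XPath.first(doc, '//title')&.text
  links = REXML::XPath.match(doc, '//a/@href').map(&:value)
  { title: title, links: links }
end
```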