Open-uri error

I am writing a script to download Warcraft 3 replays for me. It downloaded a few (which work), then failed with this error:

URI::InvalidURIError: bad URI(is not URI?): http://ftp.replays.net/w3g/060607/060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g

The URL works in Safari, so I'm not sure what's going on. My wild guess is that Safari accepts technically invalid URLs. Hopefully someone knowledgeable can tell me what the issue is. Here's my code:

require "open-uri"
path = "/Applications/Warcraft\ III/Replay/auto/"
files = Dir.glob "#{path}*"
count = 0

urls = `lynx -dump http://war3.replays.net/`.split "\n"
urls = urls.select { |url| url =~ %r-\d{1,3}\. http://ftp.replays.net/w3g- }
urls = urls.collect { |url| url.sub(%r-\s*\d{1,3}\.\s*-, "")}

urls.each do |url|
   filename = url.sub(%r-http://ftp.replays.net/w3g/\d*/-, "")
   if not files.include?(filename)
     open(url) do |remote_file|
       File.open(path + filename, "w") do |local_file|
         local_file.write remote_file.read
         count += 1
       end
     end
   end
end

puts "I got #{count} files!"
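
The error can be reproduced without downloading anything, because it is raised while the URL string is parsed, before any request goes out. Here is a minimal sketch, assuming a hypothetical helper named parseable_uri? (not part of the script above). It suggests that the ] in the filename is what the parser rejects, since percent-encoding that one character is enough to make the URL parse.

```ruby
require "uri"

# Hypothetical helper (not from the original script): returns true if Ruby's
# URI parser accepts the string, false if it raises URI::InvalidURIError.
def parseable_uri?(url)
  URI.parse(url)
  true
rescue URI::InvalidURIError
  false
end

bad  = "http://ftp.replays.net/w3g/060607/060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g"
good = bad.sub("]", "%5D") # percent-encode only the offending character

puts parseable_uri?(bad)
puts parseable_uri?(good)
```

A check like this could be run over the scraped list first, to see which URLs need escaping before any download is attempted.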

-- Elliot Temple

I fixed my problem. The key change is:

url = URI.escape(url)

Here's the current version of the code:

require "open-uri"
path = "/Applications/Warcraft\ III/Replay/auto/"
Dir.chdir path
files = Dir.glob "*"
count = 0

urls = `lynx -dump http://war3.replays.net/`.split "\n"
urls = urls.select {|url| url =~ %r-\d{1,3}\. http://ftp.replays.net/w3g- }
urls = urls.collect {|url| url.sub(%r-\s*\d{1,3}\.\s*-, "")}.uniq

puts "I found #{urls.length} replays!"

urls.each do |url|
   filename = url.sub(%r-http://ftp.replays.net/w3g/\d*/-, "")
   url = URI.escape(url)
   if not files.include?(filename)
     puts "Count is #{count}. Getting #{url}"
     open(url) do |remote_file|
       File.open(path + filename, "w") do |local_file|
         local_file.write remote_file.read
         count += 1
       end
     end
   end
end

puts "I got #{count} files!"

-- Elliot Temple

···

On Jun 6, 2006, at 11:47 PM, Elliot Temple wrote:

I am writing a script to download Warcraft 3 replays for me. It downloaded a few (which work), then failed with this error:

URI::InvalidURIError: bad URI(is not URI?): http://ftp.replays.net/w3g/060607/060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g

Oops. That didn't work for URLs with brackets in them. Now I've added this code:

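     # Needs require "cgi". get_replay is assumed to wrap the download step.
     begin
       get_replay url, filename
     rescue URI::InvalidURIError
       # Retry with only the filename escaped; the directory part stays as-is.
       url = url.scan(%r-http://ftp.replays.net/w3g/\d*/-)[0] + CGI.escape(filename)
       begin
         get_replay url, filename
       rescue URI::InvalidURIError
         # Still unparseable: report the error and move on to the next URL.
         STDERR.puts $!
       end
     end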

CGI.escape changes the brackets, but it isn't safe to apply to the entire URL (it changes slashes as well). Observe:

irb(main):013:0> x = CGI.escape "http://www.google.com"
=> "http%3A%2F%2Fwww.google.com"
irb(main):014:0> open(x)
Errno::ENOENT: No such file or directory - http%3A%2F%2Fwww.google.com
         from /usr/local/lib/ruby/1.8/open-uri.rb:88:in `initialize'
         from /usr/local/lib/ruby/1.8/open-uri.rb:88:in `open'
         from (irb):14
irb(main):015:0> open "http://www.google.com"
=> #<StringIO:0x585b74>

I don't know if I'm doing this the correct way, but it's working so far (got about 60 files).
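
The same idea, escaping only the filename and leaving the rest of the URL alone, can be sketched as a small standalone method. The name escape_filename is hypothetical, and splitting on the last slash is an assumption that holds for the replay URLs shown here (no query string after the filename).

```ruby
require "cgi"
require "uri"

# Hypothetical helper: percent-encode only the last path segment (the
# filename), leaving the scheme, host, and directory slashes untouched.
def escape_filename(url)
  dir, slash, filename = url.rpartition("/")
  dir + slash + CGI.escape(filename)
end

url = "http://ftp.replays.net/w3g/060607/060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g"
escaped = escape_filename(url)

puts escaped
URI.parse(escaped) # parses now that ] has become %5D
```

Note that CGI.escape turns a space into +, which is meant for form data rather than paths, so a filename containing spaces would probably need different handling.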

Elliot

···

On Jun 7, 2006, at 11:23 AM, Elliot Temple wrote:

I fixed my problem. The key change is:

url = URI.escape(url)