Efficient file downloading

Hello,

I'm using open-uri to download files through a read buffer. It seems very
inefficient in terms of resource usage (CPU usage sits at around 10-20%).

If possible, I'd like some suggestions for downloading a file so that the
output file gets the same name as it has in the URL, and nothing is
written to disk if the request comes back as a 404 (or some other
exception is raised).

Current code:
require 'open-uri'

BUFFER_SIZE = 4096

def download(url)
  from = open(url)
  if (buffer = from.read(BUFFER_SIZE))
    puts "Downloading #{url}"
    File.open(url.split('/').last, 'wb') do |file|
      begin
        file.write(buffer)
      end while (buffer = from.read(BUFFER_SIZE))
    end
  end
end
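Roughly what I'm imagining is something like this (untested sketch; `filename_for` is a helper name I made up, the 64 KB buffer is just a guess, and `URI.open` is the newer spelling of open-uri's `open`):

```ruby
require 'open-uri'
require 'uri'

BUFFER_SIZE = 64 * 1024  # guessing at something larger than 4096

# Name the local file after the last path segment of the URL,
# ignoring any query string; fall back for bare "/" paths.
def filename_for(url)
  name = File.basename(URI.parse(url).path)
  (name.empty? || name == '/') ? 'index.html' : name
end

def download(url)
  # open-uri raises OpenURI::HTTPError on a 404/500 etc., so the
  # File.open below never runs for a failed request and no local
  # file gets created.
  URI.open(url) do |from|
    puts "Downloading #{url}"
    File.open(filename_for(url), 'wb') do |file|
      while (buffer = from.read(BUFFER_SIZE))
        file.write(buffer)
      end
    end
  end
rescue OpenURI::HTTPError => e
  warn "Skipped #{url}: #{e.message}"
end
```

Using `File.basename` on the URI path (instead of `url.split('/').last`) also keeps a query string like `?download=1` out of the filename.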

···

--
Posted via http://www.ruby-forum.com/.

To clarify, I mean the filename should be the same as the file's name on
the web, not the same as the full URL.

···

--
Posted via http://www.ruby-forum.com/.

BUFFER_SIZE=4096

Try making that a lot lot bigger.

···

On 22 Feb 2008, at 01:54, Kyle Hunter wrote:

--
Posted via http://www.ruby-forum.com/.

$ sudo gem install snoopy
$ snoopy http://en.wikipedia.org/wiki/Main_Page
  => file Main_Page

Ta dah! There's a lot of magic behind it right now, and torrentz don't work (fixed on my machine, I just need to release it). It does segmented downloading, which is ideal for large files. For smaller ones, it still works fine.

The problem with open-uri is this: it downloads the whole thing to your tmp directory first, so using the BUFFER_SIZE thing won't actually help.

snoopy won't write the file if there's an error.
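If you'd rather not pull in a gem, the stdlib's Net::HTTP can stream straight to disk and check the status code before the local file is even created. A rough sketch (`stream_download` is just a name I picked, and this assumes a plain http URL; https needs `use_ssl`):

```ruby
require 'net/http'
require 'uri'

# Stream a URL to disk with Net::HTTP. The response status is
# checked before the destination file is opened, so a 404 never
# leaves an empty file behind, and nothing is spooled to a temp
# file the way open-uri does.
def stream_download(url, dest = nil)
  uri = URI.parse(url)
  dest ||= File.basename(uri.path)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request_get(uri.request_uri) do |response|
      return false unless response.is_a?(Net::HTTPSuccess)
      File.open(dest, 'wb') do |file|
        # read_body yields the body in chunks as it arrives
        response.read_body { |chunk| file.write(chunk) }
      end
    end
  end
  true
end
```

Calling `request_get` with a block yields the response headers before the body is read, which is what makes the early status check possible.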

-------------------------------------------------------|
~ Ari
Some people want love
Others want money
Me... Well...
I just want this code to compile

···

On Feb 21, 2008, at 8:54 PM, Kyle Hunter wrote:


James Tucker wrote:

···

On 22 Feb 2008, at 01:54, Kyle Hunter wrote:

BUFFER_SIZE=4096

Try making that a lot lot bigger.

Doh! Thanks James, that brings it down to much more reasonable usage. I
totally overlooked the very small buffer size that was set - thanks.
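For anyone who finds this later, here's a quick way to see why the buffer size mattered, with StringIO standing in for the HTTP response (the sizes are just for illustration):

```ruby
require 'stringio'

DATA = 'x' * (8 * 1024 * 1024)  # 8 MB of fake response body

# Count how many read calls a given buffer size costs for DATA.
# Each read call has fixed per-call overhead, so a tiny buffer
# multiplies that overhead enormously.
def reads_needed(buffer_size)
  from = StringIO.new(DATA)
  count = 0
  count += 1 while from.read(buffer_size)
  count
end

puts "4 KB buffer:   #{reads_needed(4 * 1024)} reads"
puts "256 KB buffer: #{reads_needed(256 * 1024)} reads"
```

With an 8 MB body, the 4 KB buffer costs 2048 read calls versus 32 for a 256 KB buffer - a 64x difference in loop iterations for the same bytes.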
--
Posted via http://www.ruby-forum.com/.