Best way to download >1GB files

What is the best way to download files from the internet (HTTP) that are greater than 1GB?

Here's the whole story...
I was trying to use Ruby's Net::HTTP to manage a download from Wikipedia... specifically, all current versions of the English one. But anyway, as I was downloading it, I got a memory error when I ran out of RAM.

My current code:
       open(@opts[:out], "w") do |f|
         http = Net::HTTP.new(@url.host, @url.port)
         c = http.start do |http|
           a = Net::HTTP::Get.new(@url.page)
           http.request(a)
         end
         f.write(c.body)
       end

I was hoping there'd be some method I could attach a block to, so that for each byte it would call the block.

Is there some way to write the bytes to the file as they come in, not at the end?
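
There is such a method, as it happens: pass a block to request and Net::HTTP yields the response before reading the body, and calling read_body with a block on that response then hands over each chunk as it comes off the socket (per chunk rather than per byte). A minimal sketch reworked from the snippet above, assuming @url is a URI (so path stands in for page) and opening the output file in binary mode:

      require 'net/http'

      open(@opts[:out], "wb") do |f|
        http = Net::HTTP.new(@url.host, @url.port)
        http.start do |conn|
          req = Net::HTTP::Get.new(@url.path)
          conn.request(req) do |response|
            # read_body with a block streams each chunk as it arrives
            # instead of buffering the whole body in response.body
            response.read_body do |chunk|
              f.write(chunk)
            end
          end
        end
      end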

Thanks,
---------------------------------------------------------------|
~Ari
"I don't suffer from insanity. I enjoy every minute of it" --1337est man alive

thefed wrote:

Is there some way to write the bytes to the file as they come in, not at the end?

Not precisely what you asked for, but this is how ara t. howard told me to download large files, using open-uri. This gets one 8KB-sized chunk at a time:

         require 'open-uri'

         open(uri) do |fin|
           open(File.basename(uri), "w") do |fout|
             # copy in 8KB chunks so the whole body never sits in memory
             while (buf = fin.read(8192))
               fout.write buf
             end
           end
         end

···

--
RMagick: http://rmagick.rubyforge.org/
RMagick 2: http://rmagick.rubyforge.org/rmagick2.html

Is there some reason not to use wget or curl? Those are both written already. What are you hoping to do with the files you download?

-Bryan

···

On Dec 31, 2007, at 2:04 PM, thefed wrote:

What is the best way to download files from the internet (HTTP) that are greater than 1GB?


But doesn't open-uri download the whole thing to your compy? I was about to use it, but then I ran it in irb and saw it returned a file object.

-------------------------------------------------------|
~ Ari
seydar: it's like a crazy love triangle of Kernel commands and C code

···

On Dec 31, 2007, at 5:15 PM, Tim Hunter wrote:

Not precisely what you asked for, but this is how ara t. howard told me to download large files, using open-uri. This gets one 8KB-sized chunk at a time.


I'm trying to write wget/axel in ruby. Plus add torrent support!

···

On Dec 31, 2007, at 7:23 PM, Bryan Duxbury wrote:

Is there some reason not to use wget or curl? Those are both written already. What are you hoping to do with the files you download?

thefed wrote:

But doesn't open-uri download the whole thing to your compy? I was about to use it, but then I ran it in irb and saw it returned a file object.

Isn't that what you want to happen? I thought your question was about how to download it in small chunks so it's not all in memory at the same time. This code downloads the whole file, but 8KB at a time.

···

--
RMagick: http://rmagick.rubyforge.org/
RMagick 2: http://rmagick.rubyforge.org/rmagick2.html

Is there some particular reason not to use Aria2? It's already written ;-)

Yes, the UI sucks, and it cannot download multi-file torrents from the
web either, but to compete with it you would have to make something
really good :)

Thanks

Michal

···

On 01/01/2008, thefed <fedzor@gmail.com> wrote:


I'm trying to write wget/axel in ruby. Plus add torrent support!

No, I thought when you use Kernel#open with open-uri, it FIRST downloads the entire 1GB file to your temp folder, and THEN runs your block on that file in temp.

···

On Dec 31, 2007, at 7:20 PM, Tim Hunter wrote:


Isn't that what you want to happen? I thought your question was about how to download it in small chunks so it's not all in memory at the same time. This code downloads the whole file, but 8KB at a time.

Well then I have a competitor!

I'm really writing this just for practice, but also because I think the world needs a ruby downloader.

Maybe to give myself a fighting chance against aria2, I'll lower the version numbers instead of raising them.

- Ari

···

On Jan 1, 2008, at 1:38 PM, Michal Suchanek wrote:

Is there some particular reason not to use Aria2? It's already written ;-)


thefed wrote:


No, I thought when you use Kernel#open with open-uri, it FIRST downloads the entire 1GB file to your temp folder, and THEN runs your block on that file in temp.

Interesting. I just tried downloading a 6.1MB file with open-uri and didn't see that behavior. I'm using Ruby 1.8.6 on OS X 10.5.

···

On Dec 31, 2007, at 7:20 PM, Tim Hunter wrote:

--
RMagick: http://rmagick.rubyforge.org/
RMagick 2: http://rmagick.rubyforge.org/rmagick2.html


There's also ruby-torrent:

http://rubytorrent.rubyforge.org/

I think you should definitely use BitTorrent rather than HTTP. I spoke
to the maintainer/developer a while ago and I think ruby-torrent isn't
being actively worked on, but it could definitely save you some
headaches if you start there.

···

--
Giles Bowkett

Podcast: http://hollywoodgrit.blogspot.com
Blog: http://gilesbowkett.blogspot.com
Portfolio: http://www.gilesgoatboy.org
Tumblelog: http://giles.tumblr.com

That's good then! I'll test it out myself juuuust to make sure. I don't want to waste 4GB of space when I only need 2GB.

open-uri uses Net::HTTP, of course. Am I correct?

Net::HTTP wraps connections in a Timeout, which is REALLY screwing with my large-file downloads.

Will probably get some monkeys to patch that for me.
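
For what it's worth, no monkeys should be needed: Net::HTTP exposes read_timeout and open_timeout as plain accessors, and the read timeout applies per socket read rather than to the whole transfer, so only a stalled connection trips it. A minimal sketch with a placeholder URL:

      require 'net/http'
      require 'uri'

      url = URI.parse("http://example.com/big-file.xml")  # placeholder URL
      http = Net::HTTP.new(url.host, url.port)
      http.open_timeout = 30   # seconds allowed for establishing the connection
      http.read_timeout = nil  # nil should disable the per-read timeout entirely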

- Ari

···

On Jan 1, 2008, at 1:56 PM, Tim Hunter wrote:


Interesting. I just tried downloading a 6.1MB file with open-uri and didn't see that behavior. I'm using Ruby 1.8.6 on OS X 10.5.

Heh, that's what I'm using in snoopy right now. Although, ruby-torrent is about 1000 LOC, and since it's not a gem, snoopy could get pretty fat.

Would you mind packaging it and releasing it as a gem?

(I have homework to do)

-------------------------------------------------------|
~ Ari
if god gives you lemons
YOU FIND A NEW GOD

···

On Jan 1, 2008, at 2:46 PM, Giles Bowkett wrote:

There's also ruby-torrent:

http://rubytorrent.rubyforge.org/

Hmmm ... fetching a ~5MB file over HTTP, the entire file was
downloaded prior to the 8192-byte chunk reads. Ruby 1.8.6 p111 on
Solaris 2.11. Same behavior with JRuby. FWIW, I'm observing interface
stats to make my determination.
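
Another way to check from inside Ruby rather than from interface stats is to timestamp each read; a rough sketch with a placeholder URL follows. If open-uri spools the body first, every timestamp lands after the transfer has finished.

      require 'open-uri'

      start = Time.now
      open("http://example.com/5mb.bin") do |fin|  # placeholder URL
        # If open-uri buffers the whole body before yielding, all of
        # these lines print in a burst after the download finishes;
        # true streaming would spread them across the transfer.
        while (buf = fin.read(8192))
          printf "%6.2fs: read %d bytes\n", Time.now - start, buf.size
        end
      end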

···

On Jan 1, 1:56 pm, Tim Hunter <TimHun...@nc.rr.com> wrote:


Interesting. I just tried downloading a 6.1MB file with open-uri and
didn't see that behavior. I'm using Ruby 1.8.6 on OS X 10.5.


Would you mind packaging it and releasing it as a gem?

(I have homework to do)

Are you insane? Firstly, it already has a RubyForge page with download
files; secondly, I mentioned having spoken to the maintainer, which
would mean the maintainer was not me; and thirdly, who would say yes
to that?

(And fourth, kind of a tangent, but who expects an O'Reilly book on
Ruby to have accurate information?)

···

--
Giles Bowkett

Podcast: http://hollywoodgrit.blogspot.com
Blog: http://gilesbowkett.blogspot.com
Portfolio: http://www.gilesgoatboy.org
Tumblelog: http://giles.tumblr.com

ccheetham@gmail.com wrote:


Hmmm ... fetching a ~5MB file over HTTP, the entire file was
downloaded prior to the 8192-byte chunk reads. Ruby 1.8.6 p111 on
Solaris 2.11. Same behavior with JRuby. FWIW, I'm observing interface
stats to make my determination.

Fascinating. Learning every day...

*Where* did it download the file to? Did it write it to disk or just keep it all in memory?
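
For what it's worth, open-uri's buffering should answer this: bodies up to OpenURI::Buffer::StringMax bytes (10KB by default) are kept in a StringIO in memory, and anything larger is spooled to a Tempfile on disk. A quick check, with a placeholder URL:

      require 'open-uri'

      open("http://example.com/big.iso") do |f|  # placeholder URL
        # StringIO for small bodies, Tempfile once the body exceeds
        # OpenURI::Buffer::StringMax and gets spooled to disk
        p f.class
        p f.path if f.respond_to?(:path)  # a Tempfile reveals where it landed
      end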

--
RMagick: http://rmagick.rubyforge.org/
RMagick 2: http://rmagick.rubyforge.org/rmagick2.html


Are you insane?

If it's a gem, it means EASY INSTALL

and thirdly, who would say yes to that?

Someone who's looking to start off the new year with a good deed :)
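
For what it's worth, the packaging itself is a small job. A rough sketch of a minimal gemspec, where every name and value is hypothetical rather than RubyTorrent's actual metadata:

      # rubytorrent.gemspec -- hypothetical minimal packaging
      Gem::Specification.new do |s|
        s.name    = "rubytorrent"
        s.version = "0.1.0"                       # placeholder version
        s.summary = "Pure-Ruby BitTorrent library"
        s.authors = ["Upstream Author"]           # placeholder
        s.files   = Dir["lib/**/*.rb"]
      end

Then gem build rubytorrent.gemspec produces an installable .gem.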

-------------------------------------------------------|
~ Ari
crap my sig won't fit

···


Wouldn't it be cool if we could keep Zed Shaw in a cage and feed him newbies?

···


--
Giles Bowkett

Podcast: http://hollywoodgrit.blogspot.com
Blog: http://gilesbowkett.blogspot.com
Portfolio: http://www.gilesgoatboy.org
Tumblelog: http://giles.tumblr.com

not funny >:-(

You asked whether frames are tagged as "leather" or "no leather"

···

On Jan 1, 2008, at 4:37 PM, Giles Bowkett wrote:

Wouldn't it be cool if we could keep Zed Shaw in a cage and feed him newbies?

Giles Bowkett wrote:

Wouldn't it be cool if we could keep Zed Shaw in a cage and feed him newbies?

You mean AFTER you have sniped at the newbies, right? The kettle, the pot, et cetera.

Csmr