What is the best way to download files from the internet (HTTP) that are greater than 1GB?
Here's the whole story....
I was trying to use Ruby Net::HTTP to manage a download from Wikipedia... specifically, all current versions of the English one... But anyway, as I was downloading it I got a memory error because I ran out of RAM.
My current code:
open(@opts[:out], "w") do |f|
  http = Net::HTTP.new(@url.host, @url.port)
  c = http.start do |http|
    a = Net::HTTP::Get.new(@url.page)
    http.request(a)
  end
  f.write(c.body)
end
I was hoping there'd be some method that I can attach a block to, so that for each byte it will call the block.
Is there some way to write the bytes to the file as they come in, not at the end?
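Something along these lines is what I'm imagining -- just a rough sketch with a made-up URL and output filename, and I don't actually know whether Net::HTTP has anything like a read_body that yields chunks:

require 'net/http'
require 'uri'

url = URI.parse("http://example.com/big-file.xml")   # placeholder URL

Net::HTTP.start(url.host, url.port) do |http|
  http.request(Net::HTTP::Get.new(url.path)) do |response|
    open("big-file.xml", "wb") do |f|   # placeholder filename
      # the hope: get handed the body in chunks as they arrive
      response.read_body { |chunk| f.write(chunk) }
    end
  end
end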
Thanks,
---------------------------------------------------------------|
~Ari
"I don't suffer from insanity. I enjoy every minute of it" --1337est man alive
Not precisely what you asked for, but this is how ara t. howard told me to download large files, using open-uri. This reads one 8KB chunk at a time:
open(uri) do |fin|
  open(File.basename(uri), "w") do |fout|
    while (buf = fin.read(8192))
      fout.write buf
    end
  end
end
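If you also want a callback as the data comes in (closer to the block-per-chunk idea), open-uri takes :content_length_proc and :progress_proc options. Something along these lines should work -- an untested sketch on my end, and the URL is just a placeholder:

require 'open-uri'

url = "http://example.com/big-file.xml"   # placeholder URL

open(url,
     :content_length_proc => lambda { |total| puts "Expecting #{total.inspect} bytes" },  # total may be nil
     :progress_proc => lambda { |received| print "\r#{received} bytes so far" }) do |fin|
  open(File.basename(url), "w") do |fout|
    while (buf = fin.read(8192))
      fout.write buf
    end
  end
end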
Is there some reason to not use wget or curl? Those are both written already. What are you hoping to do with the files you download?
-Bryan
···
On Dec 31, 2007, at 2:04 PM, thefed wrote:
What is the best way to download files from the internet (HTTP) that are greater than 1GB?
Here's the story in whole....
I was trying to use Ruby Net::HTTP to manage a download from wikipedia... Specifically all current versions of the english one... But anyways, as I was downloading it, I got a memory error as I ran out of RAM.
My current code:
open(@opts[:out], "w") do |f|
  http = Net::HTTP.new(@url.host, @url.port)
  c = http.start do |http|
    a = Net::HTTP::Get.new(@url.page)
    http.request(a)
  end
  f.write(c.body)
end
I was hoping there'd be some method that I can attach a block to, so that for each byte it will call the block.
Is there some way to write the bytes to the file as they come in, not at the end?
Thanks,
---------------------------------------------------------------|
~Ari
"I don't suffer from insanity. I enjoy every minute of it" --1337est man alive
But doesn't open-uri download the whole thing to your compy? I was about to use it, but then I ran it in irb and saw it returned a file object.
-------------------------------------------------------|
~ Ari
seydar: it's like a crazy love triangle of Kernel commands and C code
···
On Dec 31, 2007, at 5:15 PM, Tim Hunter wrote:
Not precisely what you asked for, but this is how ara t. howard told me to download large files, using open-uri. This gets one 8kb sized chunk at a time:
open(uri) do |fin|
  open(File.basename(uri), "w") do |fout|
    while (buf = fin.read(8192))
      fout.write buf
    end
  end
end
But doesn't open-uri download the whole thing to your compy? I was about to use it, but then I ran it in irb and saw it returned a file object.
Isn't that what you want to happen? I thought your question was about how to download it in small chunks so it's not all in memory at the same time. This code downloads the whole file, but 8kb at a time.
Is there some particular reason not to use Aria2? It's already written.
Yes, the UI sucks, and it cannot download multifile torrents from the
web as well, but to compete with that you would have to make something
really good.
Thanks
Michal
···
On 01/01/2008, thefed <fedzor@gmail.com> wrote:
On Dec 31, 2007, at 7:23 PM, Bryan Duxbury wrote:
> Is there some reason to not use wget or curl? Those are both
> written already. What are you hoping to do with the files you
> download?
I'm trying to write wget/axel in ruby. Plus add torrent support!
No, I thought when you use Kernel#open with open-uri, it FIRST downloads the entire 1GB file to your temp folder, and THEN runs your block on that file in temp
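Here's roughly how I was going to check what it hands you -- just a sketch with a placeholder URL, based on my reading of how open-uri buffers things:

require 'open-uri'

open("http://example.com/some-file.xml") do |f|
  p f.class                            # StringIO for small bodies, Tempfile once it spills to disk
  p f.path if f.respond_to?(:path)     # a Tempfile will tell you where it landed
end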
···
On Dec 31, 2007, at 7:20 PM, Tim Hunter wrote:
thefed wrote:
But doesn't open-uri download the whole thing to your compy? I was about to use it, but then I ran it in irb and saw it returned a file object.
Isn't that what you want to happen? I thought your question was about how to download it in small chunks so it's not all in memory at the same time. This code downloads the whole file, but 8kb at a time.
I'm really writing this just for practice, but also because I think the world needs a Ruby downloader.
Maybe to give myself a fighting chance against aria2, I'll lower the version numbers instead of raising them.
- Ari
···
On Jan 1, 2008, at 1:38 PM, Michal Suchanek wrote:
Is there some particular reason not to use Aria2, it's already written
Yes, the UI sucks, and it cannot download multifile torrents from the
web as well but to compete with that you would have to make something
really good
But doesn't open-uri download the whole thing to your compy? I was about to use it, but then I ran it in irb and saw it returned a file object.
Isn't that what you want to happen? I thought your question was about how to download it in small chunks so it's not all in memory at the same time. This code downloads the whole file, but 8kb at a time.
No, I thought when you use Kernel#open with open-uri, it FIRST downloads the entire 1GB file to your temp folder, and THEN runs your block on that file in temp
Interesting. I just tried downloading a 6.1MB file with open-uri and didn't see that behavior. I'm using Ruby 1.8.6 on OS X 10.5.
> Is there some particular reason not to use Aria2, it's already
> written
>
> Yes, the UI sucks, and it cannot download multifile torrents from the
> web as well but to compete with that you would have to make something
> really good
Well then I have a competitor!
I'm really writing this just for practice, but also because I think
the world needs a ruby downloader.
Maybe to give myself a fighting chance against aria2, I'll lower the
version numbers instead of raising them.
I think you should definitely use BitTorrent rather than HTTP. I spoke
to the maintainer/developer a while ago and I think ruby-torrent isn't
being actively worked on, but it could definitely save you some
headaches if you start there.
That's good then! I'll test it out myself juuuust to make sure. I don't want to waste 4GB of space when I only need 2GB.
open-uri uses Net::HTTP, of course. Am I correct?
Net::HTTP wraps connections in a Timeout, which is REALLY screwing up my downloads of large files.
Will probably get some monkeys to patch that for me.
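In the meantime I might just drop down to Net::HTTP and stretch the timeouts myself. Something like this, I think -- an untested sketch, and the host, path, and filename are placeholders:

require 'net/http'

http = Net::HTTP.new("example.com", 80)   # placeholder host
http.open_timeout = 30           # seconds allowed for the connect
http.read_timeout = 60 * 60      # well past the 60-second default, for slow reads
http.start do |h|
  h.request_get("/big-file.xml") do |response|   # placeholder path
    open("big-file.xml", "wb") do |f|
      response.read_body { |chunk| f.write(chunk) }
    end
  end
end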
- Ari
···
On Jan 1, 2008, at 1:56 PM, Tim Hunter wrote:
thefed wrote:
On Dec 31, 2007, at 7:20 PM, Tim Hunter wrote:
thefed wrote:
But doesn't open-uri download the whole thing to your compy? I was about to use it, but then I ran it in irb and saw it returned a file object.
Isn't that what you want to happen? I thought your question was about how to download it in small chunks so it's not all in memory at the same time. This code downloads the whole file, but 8kb at a time.
No, I thought when you use Kernel#open with open-uri, it FIRST downloads the entire 1GB file to your temp folder, and THEN runs your block on that file in temp
Interesting. I just tried downloading a 6.1MB file with open-uri and didn't see that behavior. I'm using Ruby 1.8.6 on OS X 10.5.
Hmmm ... fetching a ~5MB file over HTTP, the entire file was
downloaded prior to the 8192-byte chunk reads. Ruby 1.8.6 p111 on Solaris
2.11. Same behavior with JRuby. FWIW, I'm observing interface stats
to make my determination.
···
On Jan 1, 1:56 pm, Tim Hunter <TimHun...@nc.rr.com> wrote:
thefed wrote:
> On Dec 31, 2007, at 7:20 PM, Tim Hunter wrote:
>> thefed wrote:
>>> But doesn't open-uri download the whole thing to your compy? I was
>>> about to use it, but then I ran it in irb and saw it returned a file
>>> object.
>> Isn't that what you want to happen? I thought your question was about
>> how to download it in small chunks so it's not all in memory at the
>> same time. This code downloads the whole file, but 8kb at a time.
> No, I thought when you use Kernel#open with open-uri, it FIRST downloads
> the entire 1GB file to your temp folder, and THEN runs your block on
> that file in temp
Interesting. I just tried downloading a 6.1MB file with open-uri and
didn't see that behavior. I'm using Ruby 1.8.6 on OS X 10.5.
Heh, that's what I'm using in snoopy right now. Although, ruby-
torrent is about 1000 LOC, and since its not a gem, snoopy could get
pretty fat.
Would you mind packaging it and releasing it as a gem?
(I have homework to do)
Are you insane? Firstly it already has a RubyForge page with download
files, secondly I mentioned having spoken to the maintainer - which
would mean the maintainer was not me - and thirdly who would say yes
to that?
(And fourth, kind of a tangent, but who expects an O'Reilly book on
Ruby to have accurate information?)
On Jan 1, 1:56 pm, Tim Hunter <TimHun...@nc.rr.com> wrote:
thefed wrote:
On Dec 31, 2007, at 7:20 PM, Tim Hunter wrote:
thefed wrote:
But doesn't open-uri download the whole thing to your compy? I was
about to use it, but then I ran it in irb and saw it returned a file
object.
Isn't that what you want to happen? I thought your question was about
how to download it in small chunks so it's not all in memory at the
same time. This code downloads the whole file, but 8kb at a time.
No, I thought when you use Kernel#open with open-uri, it FIRST downloads
the entire 1GB file to your temp folder, and THEN runs your block on
that file in temp
Interesting. I just tried downloading a 6.1MB file with open-uri and
didn't see that behavior. I'm using Ruby 1.8.6 on OS X 10.5.
Hmmm ... fetching a ~5MB file over HTTP, the entire file was
downloaded prior to the 8192 chunk reads. Ruby 1.8.6 p111 on Solaris
2.11. Same behavior with JRuby. FWIW, I'm observing interface stats
to make my determination.
Fascinating. Learning every day...
*Where* did it download the file to? Did it write it to disk or just keep it all in memory?
Wouldn't it be cool if we could keep Zed Shaw in a cage and feed him newbies?
···
On 1/1/08, thefed <fedzor@gmail.com> wrote:
On Jan 1, 2008, at 3:43 PM, Giles Bowkett wrote:
>>> There's also ruby-torrent:
>>>
>>> http://rubytorrent.rubyforge.org/
>>
>> Heh, that's what I'm using in snoopy right now. Although, ruby-
>> torrent is about 1000 LOC, and since its not a gem, snoopy could get
>> pretty fat.
>>
>> Would you mind packaging it and releasing it as a gem?
>>
>> (I have homework to do)
>
> Are you insane?
If it's a gem, it means EASY INSTALL
> and thirdly who would say yes
> to that?
Someone who's looking to start off the new year with a good deed
-------------------------------------------------------|
~ Ari
crap my sig won't fit