Saving a PDF locally

Hi Ruby-folks,

I am currently working on a small program that saves copies of websites
locally to my hard disk. For normal HTML pages this works as expected,
but now I am struggling with binary files such as PDFs.

Here is my code:

open("http://www.somewebpage.com/atestfile.pdf"){|u|
   targetFile = File.new("test.pdf","w")
   u.each_byte {|ch|
     targetFile.putc ch
   }
}

The resulting local file cannot be opened with my PDF reader. When I
open it in an editor, the file seems to contain only numbers (-> not
binary).

Instead of putc I tried write, which did not work either.

Any hints?

Yochen
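
A note on the "only numbers" symptom: one plausible explanation, at least for the write variant, is that each_byte yields Integers and write converts a non-String argument with to_s. The sketch below reproduces that offline, using a StringIO as a stand-in for the downloaded data (the sample bytes and variable names are assumptions for illustration, not taken from the original program).

```ruby
require 'stringio'

# Stand-in for the downloaded data: the first bytes of a PDF header.
source = StringIO.new("%PDF-1.4\n".b)

as_text  = StringIO.new
as_bytes = StringIO.new

source.each_byte do |ch|
  as_text.write(ch)   # ch is an Integer; write calls to_s on it, so "37", "80", ... are written
  as_bytes.putc(ch)   # putc with an Integer writes the single byte with that value
end

puts as_text.string   # decimal digits only
puts as_bytes.string  # the original header text
```

If this is what happened, writing whole strings (for example the result of read) avoids the conversion entirely.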

···

--
Posted via http://www.ruby-forum.com/.

Yochen Gutmann wrote:

Hi Ruby-folks,

I am currently working on a small program that saves copies of websites
locally to my hard disk. For normal HTML pages this works as expected,
but now I am struggling with binary files such as PDFs.

   targetFile = File.new("test.pdf","w")

Use "wb" instead of "w". On Windows, this treats the data as binary
instead of lines of text that should be terminated with CR-LF.
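
As a minimal sketch of the "wb" suggestion, the following writes a few arbitrary bytes in binary mode and reads them back with "rb". The scratch file name and the sample bytes are made up for illustration; on Unix-like systems "w" and "wb" do not differ in newline handling, so the difference matters mainly on Windows.

```ruby
require 'tmpdir'

# Arbitrary bytes, including LF, CR, NUL and 0xFF, standing in for PDF content.
payload = [0x25, 0x50, 0x44, 0x46, 0x0A, 0x0D, 0x00, 0xFF].pack("C*")

# Hypothetical scratch file in the system temp directory.
path = File.join(Dir.tmpdir, "binmode_demo_#{Process.pid}.bin")

# "wb" opens for writing in binary mode; the block form closes the file on exit.
File.open(path, "wb") { |f| f.write(payload) }

# "rb" reads the bytes back without newline translation.
round_trip = File.open(path, "rb") { |f| f.read }
File.delete(path)

puts round_trip == payload
```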

···

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Try:

open("http://www.somewebpage.com/atestfile.pdf", 'rb'){|u|
    File.open("test.pdf", "wb") do |f|
      f.write(u.read)
    end
}

-- Daniel

···

On May 30, 2006, at 12:29 AM, Yochen Gutmann wrote:

open("http://www.somewebpage.com/atestfile.pdf"){|u|
   targetFile = File.new("test.pdf","w")
   u.each_byte {|ch|
     targetFile.putc ch
   }
}

Joel VanderWerf wrote:

Use "wb" instead of "w". On Windows, this treats the data as binary
instead of lines of text that should be terminated with CR-LF.

Well, although I am working on OS X (forgot to mention that), I tried your
hint, but it did not work (as expected). Hm... Any other ideas?

-Yochen

···


Daniel Harple wrote:

Try:

open("http://www.somewebpage.com/atestfile.pdf", 'rb'){|u|
    File.open("test.pdf", "wb") do |f|
      f.write(u.read)
    end
}

Thanx, Daniel, your solution is working as well! And it is even shorter!

Slowly I am wondering why I didn't come up with a working solution by
myself ;-]

-- Yochen

···


Yochen Gutmann wrote:

Joel VanderWerf wrote:

Use "wb" instead of "w". On Windows, this treats the data as binary
instead of lines of text that should be terminated with CR-LF.

Well, although I am working on OS X (forgot to mention that), I tried your
hint, but it did not work (as expected). Hm... Any other ideas?

-Yochen

Sorry! I jumped to conclusions about the problem.

The following works for me on linux. Can't make any predictions about
OSX, tho'.

require 'open-uri'

open("http://path.berkeley.edu/~vjoel/redshift/ruby-sdforum.pdf"){|u|
   targetFile = File.new("test.pdf","w")
   u.each_byte {|ch|
     targetFile.putc ch
   }
}

Why are you doing it a byte at a time? This seems to run much faster for me:

require 'open-uri'

open("http://path.berkeley.edu/~vjoel/redshift/ruby-sdforum.pdf") do |u|
  targetFile = File.new("test.pdf","w")
  loop do
    dat = u.read(1000)
    break unless dat
    targetFile.write dat
  end
end
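
The chunked loop above can also be sketched with the block form of File.open, which closes (and therefore flushes) the target file when the block ends. The helper name copy_in_chunks, the scratch file name and the sample data are hypothetical; a StringIO stands in for the download so the sketch runs without a network connection.

```ruby
require 'stringio'
require 'tmpdir'

# Hypothetical helper: copy everything from an IO-like source to a file,
# chunk_size bytes at a time.
def copy_in_chunks(source, dest_path, chunk_size = 1000)
  File.open(dest_path, "wb") do |out|        # block form closes the file on exit
    while (chunk = source.read(chunk_size))  # read(n) returns nil at end of input
      out.write(chunk)
    end
  end
end

# 4000 bytes of made-up binary data standing in for a downloaded PDF.
data   = "\x00\x01\x02\xFF".b * 1000
source = StringIO.new(data)
dest   = File.join(Dir.tmpdir, "chunk_demo_#{Process.pid}.bin")

copy_in_chunks(source, dest)
puts File.size(dest)
```

The object that open-uri yields to the block responds to read as well, so it could presumably be passed as the source here. Note that on Ruby 3.0 and later, open-uri no longer hooks into a bare open call for URLs; the equivalent call is spelled URI.open.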

···

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Joel VanderWerf wrote:

Why are you doing it a byte at a time?

just ran out of ideas ;-)

This seems to run much faster for me:

require 'open-uri'

open("http://path.berkeley.edu/~vjoel/redshift/ruby-sdforum.pdf") do |u|
  targetFile = File.new("test.pdf","w")
  loop do
    dat = u.read(1000)
    break unless dat
    targetFile.write dat
  end
end

Fantastic!

Thanx a lot.

-Yochen

···
