How can I tell whether a file is binary or text in Ruby?

I am a newbie to Ruby and to programming.
For a simple Ruby script, I need some file information:
whether a file is binary or text.
In Ruby, how can I get this information?
I found the File class and the File::Stat class,
but there is no method that tells me this (isbinary?).
Please give me some advice.
Thanks.

(In reply to mookhae, Friday 19 July 2002 09:20 am.)

Ignoring for a moment what it would mean for a Unicode file to be “binary”, you could just do this:

#!/usr/bin/ruby

From the Perl documentation:

The “-T” and “-B” switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or characters with the high bit set. If too many strange characters (>30%) are found, it’s a “-B” file, otherwise it’s a “-T” file. Also, any file containing null in the first block is considered a binary file. If “-T” or “-B” is used on a filehandle, the current stdio buffer is examined rather than the first block. Both “-T” and “-B” return true on a null file, or a file at EOF when testing a filehandle. Because you have to read a file to do the “-T” test, on most occasions you want to use a “-f” against the file first, as in “next unless -f $file && -T $file”.
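The rule quoted above can be sketched in Ruby as a pure function on a block of bytes that has already been read, plus a check that mirrors `next unless -f $file && -T $file`. This is a minimal sketch, not Perl's actual implementation: the names binary_block?, text_block? and text_file?, the 512-byte block size, and the choice of tab, LF, CR and form feed as the only acceptable control characters are all assumptions.

```ruby
# Hypothetical helpers sketching the perlfunc -T/-B rule quoted above.
# Assumption: "odd" bytes are those outside printable ASCII (0x20..0x7e),
# except tab, LF, CR and form feed.
ODD_BYTE = /[^\x20-\x7e\t\n\r\f]/

# True if the block looks binary: empty, contains a NUL, or >30% odd bytes.
def binary_block?(blk)
  return true if blk.nil? || blk.empty?   # Perl: -B is true on an empty file
  bytes = blk.b                           # treat the data as raw bytes
  return true if bytes.include?("\x00")   # any NUL means binary
  bytes.scan(ODD_BYTE).size.fdiv(bytes.size) > 0.3
end

# True if the block looks like text; Perl's -T is also true on an empty file.
def text_block?(blk)
  return true if blk.nil? || blk.empty?
  !binary_block?(blk)
end

# Analogue of `-f $file && -T $file`: only regular files are read.
def text_file?(path, block_size = 512)
  File.file?(path) && text_block?(File.binread(path, block_size))
end
```

For example, `Dir.glob('*') { |f| puts f if text_file?(f) }` lists the entries in the current directory that pass the test.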

I don’t know how to get at the stdio buffer from Ruby, so the code below always examines the first block of the named file.

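class File
  # Heuristic modelled on Perl's -B test quoted above.
  # Returns false for anything that is not a regular file.
  def self.isBinary(name)
    myStat = stat(name)
    return false unless myStat.file?
    open(name, 'rb') do |file|
      # blksize can be nil on some platforms, so fall back to 512 bytes.
      blk = file.read(myStat.blksize || 512)
      # read returns nil on an empty file; Perl's -B is true there.
      return true if blk.nil? || blk.empty?
      # Any NUL byte in the first block means binary.
      return true if blk.count("\x00") > 0
      # Bytes that are neither printable ASCII nor CR/LF. fdiv avoids integer
      # division, under which the 30% test only fired when every byte was odd.
      blk.count("^ -~", "^\r\n").fdiv(blk.size) > 0.3
    end
  end
end

Dir.foreach('.') do |entry|
  if File.stat(entry).file?
    puts "#{entry} #{File.isBinary(entry) ? 'binary' : 'text'}"
  else
    puts "#{entry} directory"
  end
end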
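One way to approximate the filehandle case in Ruby is to read the first block from an already-open IO and push it back with IO#ungetbyte, so later reads still see those bytes. This is a sketch under assumptions: binary_io? and the 512-byte default are hypothetical, and unlike Perl it reads ahead rather than looking only at what is already buffered, so on a pipe or terminal it can block until data arrives.

```ruby
# Hypothetical helper approximating Perl's -B on a filehandle: read up to
# `size` bytes from an open IO, push them back with IO#ungetbyte so the
# caller's next read still sees them, then apply the same 30% / NUL rule.
def binary_io?(io, size = 512)
  blk = io.read(size)
  return true if blk.nil? || blk.empty?  # Perl: -B is true at EOF
  io.ungetbyte(blk)                      # un-consume the peeked bytes
  return true if blk.include?("\x00")
  # Bytes that are neither printable ASCII nor tab/CR/LF.
  blk.count("^ -~", "^\t\r\n").fdiv(blk.size) > 0.3
end
```

For example, `puts binary_io?($stdin) ? 'binary' : 'text'` classifies piped input while leaving it readable by the rest of the script.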


Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE

It is impossible to determine with certainty whether a file is binary,
because “binary file” is not a well-defined concept. The Unix command
`file' does it by heuristics; in practice it works well enough, but it
is not complete.
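A Ruby script can also simply defer to those heuristics by running `file'. This sketch assumes a file(1) that supports the `--brief` and `--mime-encoding` options (common versions on Linux and macOS do); with them it prints only the guessed encoding, and `binary` for data it does not consider text. The method names are hypothetical.

```ruby
# Hypothetical wrapper: ask file(1) which character encoding it guesses.
# Returns nil if the command is missing or exits with an error.
def file_mime_encoding(path)
  out = IO.popen(['file', '--brief', '--mime-encoding', path], &:read)
  $?.success? ? out.strip : nil
rescue SystemCallError  # e.g. Errno::ENOENT when `file' is not installed
  nil
end

# true/false according to file(1), or nil if it could not be asked.
def binary_by_file_command?(path)
  enc = file_mime_encoding(path)
  enc.nil? ? nil : enc == 'binary'
end
```

Using the array form of IO.popen passes the path straight to the command without a shell, so file names containing spaces or shell metacharacters are safe.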

The following script tests whether a file contains any byte that is not
printable ASCII or whitespace. You can extend it to accept other
character sets, such as Latin-1. However, some encodings are stateful,
for example ISO-2022, and this approach does not work for them.

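#! ruby
# Any byte that is not printable ASCII (0x20..0x7e) or whitespace.
NON_ASCII_PRINTABLE = /[^\x20-\x7e\s]/

# Read io in chunks of size bytes; false as soon as a chunk contains a
# forbidden byte, true if the whole stream is clean.
def nonbinary?(io, forbidden, size = 1024)
  while (buf = io.read(size))
    return false if forbidden =~ buf
  end
  true
end

# usage: ruby this_script.rb filename ...
ARGV.each do |fn|
  begin
    # File.open rather than Kernel#open, so a name like "|cmd" is not run.
    File.open(fn, 'rb') do |f|
      if nonbinary?(f, NON_ASCII_PRINTABLE)
        puts "#{fn}: ascii printable"
      else
        puts "#{fn}: binary"
      end
    end
  rescue => e
    warn "#{$0}: #{e.message}"
  end
end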
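Following the Latin-1 suggestion, a variant only needs a different forbidden pattern. A sketch, assuming the text is ISO-8859-1: bytes 0xA0..0xFF are accepted as printable, while NUL and the other C0 controls (except whitespace), DEL and 0x80..0x9F still count as binary. NON_LATIN1_PRINTABLE is a hypothetical name; nonbinary? is repeated, with a `.b` so data is always matched as raw bytes, so that the block runs on its own.

```ruby
# Hypothetical Latin-1 pattern. The /n flag makes this a binary regexp,
# so \xa0-\xff mean raw bytes rather than UTF-8 characters.
NON_LATIN1_PRINTABLE = /[^\x20-\x7e\xa0-\xff\s]/n

# Same chunked scan as the script above.
def nonbinary?(io, forbidden, size = 1024)
  while (buf = io.read(size))
    return false if forbidden =~ buf.b   # match the chunk as raw bytes
  end
  true
end
```

In the script, passing NON_LATIN1_PRINTABLE instead of NON_ASCII_PRINTABLE to nonbinary? switches it to this check. Stateful encodings such as ISO-2022 are still misclassified, since they use the ESC control byte.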

(In reply to mookhae, Sat, 20 Jul 2002 01:20:56 +0900.)
