Is there a file-type detection script available for Ruby, similar to
the unix ‘file’ program? At a minimum, is there a way in Ruby to
reliably tell the difference between a binary file format and a text
file format?
I’m working on a script which will run ‘tidy’ recursively against all
files in a directory structure – but it should only run on files
which are not binary. (I use a Windows-based CMS, City Desk, that
doesn’t do XHTML; I’m going to use Tidy to convert the non-XHTML
parts to XHTML during the publish process.)
I don’t want to make the detection extension-based, as there’s
already a wide variety of possible extensions that could be used
(most of them outside of my own application, but I plan on making
this publicly available, possibly with a Ruby-ized Tidy) – from htm
and html to cfm and asp to xml and css.
Ideally, I’d be able to come up with a MIME-type for the file and
create callbacks based on the MIME-type and/or extension (I may wish
to make an HTML-fragment which calls Tidy with the show-body-only
option set).
I can get my local version working easily – it’s predictable. I’m
just looking at doing this in a “larger” way…
-austin
– Austin Ziegler, austin@halostatue.ca on 2002.10.05 at 01.36.45
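One way the MIME-type-plus-callbacks idea could look, as a minimal sketch only. The names `MAGIC_SIGNATURES`, `sniff_mime`, `HANDLERS` and `dispatch` are hypothetical, the signature table covers just a few well-known leading byte sequences, and the HTML check is a crude guess rather than a real parser.

```ruby
# Hypothetical table of leading "magic" bytes; extend as needed.
MAGIC_SIGNATURES = [
  ["\xFF\xD8\xFF".b,      "image/jpeg"],
  ["\x89PNG\r\n\x1A\n".b, "image/png"],
  ["GIF8".b,              "image/gif"],
  ["%PDF-".b,             "application/pdf"],
].freeze

# Guess a MIME type from the first bytes of a file's content.
def sniff_mime(head)
  head = head.to_s.b
  MAGIC_SIGNATURES.each do |magic, mime|
    return mime if head.start_with?(magic)
  end
  # A NUL byte is treated as a sign of binary data.
  return "application/octet-stream" if head.include?("\x00".b)
  return "text/html" if head =~ /<(?:!doctype\s+html|html|body)\b/i
  "text/plain"
end

# Callbacks keyed by MIME type; unknown types fall through to a skip.
HANDLERS = Hash.new(->(path) { [:skip, path] })
HANDLERS["text/html"] = ->(path) { [:tidy, path] }

# Read a small sample in binary mode and call the matching handler.
def dispatch(path)
  head = File.open(path, "rb") { |f| f.read(512) }
  HANDLERS[sniff_mime(head)].call(path)
end
```

A handler for HTML fragments could be registered the same way, under whatever key the application chooses for them.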
> Is there a file-type detection script available for Ruby, similar to
> the unix ‘file’ program? At a minimum, is there a way in Ruby to
> reliably tell the difference between a binary file format and a text
> file format?
Here’s something in Ruby that I wrote to do that. It’s based on the
algorithm that Perl uses to determine “textness” and “binariness”:
module FileTest
  private
  def self.isText(block)
    return (block.count("^ -~", "^\b\f\t\r\n") < (block.size / 3.0) &&
            block.count("\x00") < 1)
  end
  public
  # The textfile? and binaryfile? methods are not inverses of each
  # other. Both return false if the item is not a file, or if the
  # item is a zero-length file. The "textness" or "binariness" of
  # an item can only be determined if it's a file that contains at
  # least one byte.
  def self.textfile?(item)
    size = self.size(item)
    blksize = File.stat(item).blksize
    if size < 1 then
      return false
    end
    begin
      open(item) { |file|
        block = file.read(blksize < size ? blksize : size)
        return self.isText(block)
      }
    rescue
      return false
    end
  end
  def self.binaryfile?(item)
    size = self.size(item)
    blksize = File.stat(item).blksize
    if size < 1 then
      return false
    end
    begin
      open(item) { |file|
        block = file.read(blksize < size ? blksize : size)
        return !self.isText(block)
      }
    rescue
      return false
    end
  end
end
[ … ]
–
Lloyd Zusman
ljz@asfast.com
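For the recursive use case from the original question, a sketch along these lines might work. It assumes the same heuristic as the module above (no NUL bytes, and fewer than a third of the sampled bytes outside printable ASCII and common whitespace); `looks_like_text?`, `each_text_file` and the 4096-byte sample size are made up for illustration.

```ruby
require "find"

# True when the sample has no NUL bytes and fewer than one third of its
# bytes fall outside printable ASCII and \b \f \t \r \n.
def looks_like_text?(block)
  return false if block.nil? || block.empty?
  block = block.b
  odd = block.count("^ -~", "^\b\f\t\r\n")
  block.count("\x00") < 1 && odd < block.size / 3.0
end

# Walk a directory tree and yield only the paths that look like text.
def each_text_file(root, sample_size = 4096)
  Find.find(root) do |path|
    next unless File.file?(path)
    head = File.open(path, "rb") { |f| f.read(sample_size) }
    yield path if looks_like_text?(head)
  end
end

# Usage: replace the puts with whatever should run on each text file.
# each_text_file("site") { |path| puts path }
```

The sample is read in binary mode so that, on Windows, line-ending translation does not alter the bytes being counted.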
Sat, 5 Oct 2002 15:38:40 +0900, Lloyd Zusman ljz@asfast.com writes:
> The textfile? and binaryfile? methods are not inverses of each
> other. Both return false if the item is not a file, or if the
> item is a zero-length file.
In Perl both are true for an empty file.
–
__("< Marcin Kowalczyk
__/ qrczak@knm.org.pl
^^ Blog człowieka poczciwego.
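If the Perl behaviour for empty files is wanted, where both tests succeed on a zero-length file, a variant could look roughly like this. It is only a sketch: `mostly_printable?`, `perlish_text?` and `perlish_binary?` are hypothetical names, and the 512-byte sample size is an arbitrary choice.

```ruby
# Same ratio test as in the thread: no NUL bytes and fewer than one
# third "odd" bytes in the sample.
def mostly_printable?(block)
  odd = block.count("^ -~", "^\b\f\t\r\n")
  block.count("\x00") < 1 && odd < block.size / 3.0
end

# An empty file counts as text...
def perlish_text?(path)
  head = File.open(path, "rb") { |f| f.read(512) }
  head.nil? || head.empty? || mostly_printable?(head)
end

# ...and also as binary, so the two predicates are still not inverses.
def perlish_binary?(path)
  head = File.open(path, "rb") { |f| f.read(512) }
  head.nil? || head.empty? || !mostly_printable?(head)
end
```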
Austin Ziegler austin@halostatue.ca wrote in message news:20021005055015.CIXT20369.tomts5-srv.bellnexxia.net@hogwarts…
> Is there a file-type detection script available for Ruby, similar to
> the unix ‘file’ program? At a minimum, is there a way in Ruby to
> reliably tell the difference between a binary file format and a text
> file format?
There’s a native win32 port of ‘file’, if that helps:
martin
“Lloyd Zusman” ljz@asfast.com wrote in message
news:m28z1d4cgj.fsf@asfast.com…
> […code snip]
>   def self.textfile?(item)
>     size = self.size(item)
>     blksize = File.stat(item).blksize
                                ^^^^^^^
>     if size < 1 then
>       return false
>     end
> […code snip]
statfile.blksize returns 0 in ruby 1.7.2 (2002-07-02) [i386-mswin32]
So I modified it to
blksize = File.stat(item).blksize
blksize = 2048 if (blksize == 0)
Hope I did the right thing …
Thanks,
– shanko
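A slightly more defensive version of that fallback, as a sketch: it also covers `blksize` coming back as `nil`, which current Ruby documents for platforms that do not support it. `DEFAULT_BLOCK_SIZE`, `sample_size` and `read_sample` are hypothetical names, and 2048 is simply the value used above.

```ruby
DEFAULT_BLOCK_SIZE = 2048

# How many bytes to sample: the filesystem block size when it is usable,
# otherwise the default, and never more than the file actually holds.
def sample_size(path)
  stat = File.stat(path)
  blksize = stat.blksize
  blksize = DEFAULT_BLOCK_SIZE if blksize.nil? || blksize <= 0
  [blksize, stat.size].min
end

# Read the sample in binary mode; an empty file yields an empty string.
def read_sample(path)
  n = sample_size(path)
  return "".b if n.zero?
  File.open(path, "rb") { |f| f.read(n) }
end
```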
Thanks. That’ll be a good start to make sure that I don’t run tidy on
my JPGs (:
-austin
– Austin Ziegler, austin@halostatue.ca on 2002.10.05 at 11.59.53
On Sat, 5 Oct 2002 15:38:40 +0900, Lloyd Zusman wrote:
>> Is there a file-type detection script available for Ruby, similar
>> to the unix ‘file’ program? At a minimum, is there a way in Ruby to
>> reliably tell the difference between a binary file format and a
>> text file format?
> Here’s something in Ruby that I wrote to do that. It’s based on the
> algorithm that Perl uses to determine “textness” and “binariness”: