In Perl, -T returns true if the file is a text file and
-B returns true if it is binary.
How do you do this in Ruby? I’ve checked File, FileTest, and
File::Stat, and there is nothing there to test for text vs. binary file type.
Thanks,
P
–
^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^^~^~^~^~^
Peter B. Ensch peterbe@attbi.com
^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^^~^~^~^~^
Yukihiro Matsumoto wrote:
Hi,
In Perl, -T returns true if the file is a text file and
-B returns true if it is binary.
You have to define what a text file is first. The traditional Perl test
does not work for Japanese text (nor for Unicode text).
matz.
Ah! Not having worked with Unicode before, I had not considered this.
Does your answer assume that I need to define ‘text file’ and then
open/read the file to see if it meets the definition? In other words,
it is not possible to do something like:
if text file
open and process
else
do not open
end
BTW, thanks for answering my question, Matz, and thanks for ruby. I’m
having a lot of fun learning it.
P
In message “How to test for text file” > on 03/04/17, “Peter B. Ensch” pNOeterSPAM4MEbe@attbi.com writes:
Peter B. Ensch wrote:
Ah! Not having worked with Unicode before, I had not considered this.
Does your answer assume that I need to define ‘text file’ and then
open/read the file to see if it meets the definition? In other words,
it is not possible to do something like:
if text file
open and process
else
do not open
end
That’s what Perl does as well, internally: it opens the file and checks
(I believe) the first block or so for characters that indicate one way
or the other. A NUL is a good indication of binary, and CRLF is a good
indication it’s text, under Windows. Under Unix, and when dealing with
other character sets, things get way hairier.
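For what it's worth, here is a minimal Ruby sketch of that kind of check. It is only loosely modelled on what perlfunc describes for -T/-B (sample the start of the file, treat a NUL as binary, call it binary if too many bytes look odd). The method name `looks_like_text?`, the 512-byte sample size, and the 30% cut-off are my assumptions, not anything Ruby or Perl defines.

```ruby
# Hypothetical helper: a rough, Perl-like guess at "is this a text file?".
# The sample size and threshold are assumptions chosen for illustration.
SAMPLE_SIZE   = 512
ODD_THRESHOLD = 0.30

def looks_like_text?(path, sample_size = SAMPLE_SIZE)
  # Read the first block in binary mode so no newline or encoding
  # conversion happens on any platform.
  block = File.open(path, "rb") { |f| f.read(sample_size) }

  # read(n) returns nil at end of file, so an empty file lands here.
  # Perl reports an empty file as both -T and -B; this sketch says "text".
  return true if block.nil? || block.empty?

  # A NUL byte in the sample is taken as a sign of binary data.
  return false if block.include?("\0")

  # Count bytes that are neither printable ASCII nor one of the usual
  # control characters (BS, TAB, LF, FF, CR, ESC).
  odd = block.each_byte.count do |b|
    !(b.between?(32, 126) || [8, 9, 10, 12, 13, 27].include?(b))
  end

  odd.to_f / block.bytesize <= ODD_THRESHOLD
end
```

With that, `if looks_like_text?(name) ... else ... end` gives the "open and process, else do not open" shape asked about, though the file still gets opened for the test itself. Note that this has exactly the weakness matz pointed out: Japanese text in a multibyte encoding is mostly high-bit bytes, and UTF-16 text is full of NULs, so both get reported as binary.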
I’ve never found a good way to detect file types easily. There’s the
“file” command under many unices that looks in a database of magic
numbers to guess file formats: GIF files start with GIF8?a, where ? is
7 or 9. JPEG files start with the bytes FF D8 (the JFIF string comes a
few bytes later), Windows executables start with MZ, and Linux and many
other unices use ELF, which starts with a 0x7F byte followed by the
letters ELF. Windows NT and later stamp Unicode text files with a byte
order mark (U+FEFF): if it’s read with the bytes swapped you get U+FFFE,
which is not a valid Unicode character, and if it’s read correctly it’s
a zero-width no-break space, which tells Windows which byte order the
file is in. If the file command doesn’t see anything that indicates a
certain type, it just guesses similarly to Perl and says something like
“data” or “ASCII text”.
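A rough Ruby sketch of the magic-number idea, for illustration only. The table below is a tiny hand-written stand-in for the real “file” database, and the names `MAGIC` and `sniff` are made up for this example. For JPEG it checks the FF D8 FF bytes at the very start of the file rather than the JFIF string, which sits a few bytes further in.

```ruby
# Illustrative signature table; the real magic database is far larger
# and also matches at offsets other than zero.
MAGIC = [
  ["GIF87a".b,       "GIF image"],
  ["GIF89a".b,       "GIF image"],
  ["\xFF\xD8\xFF".b, "JPEG image"],
  ["MZ".b,           "DOS/Windows executable"],
  ["\x7FELF".b,      "ELF executable or object"],
  ["\xFF\xFE".b,     "UTF-16 text, little-endian BOM"],
  ["\xFE\xFF".b,     "UTF-16 text, big-endian BOM"],
].freeze

# Hypothetical helper: returns a description for the first matching
# signature, or "unknown" when nothing in the table matches.
def sniff(path)
  # Read just enough bytes to cover the longest signature above.
  head = File.open(path, "rb") { |f| f.read(8) } || "".b
  match = MAGIC.find { |magic, _description| head.start_with?(magic) }
  match ? match[1] : "unknown"
end
```

A file that comes back as "unknown" is where you would fall back to guessing from the content, which is the part that depends on your own definition of text.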
Windows and DOS try to solve this with file extensions, of course, and
classic Macs with type and creator codes. Nowadays Mac OS X, being
Unix-based, is starting to use some combination of MacOS magic, file
extensions, and guessing, which starts to get confusing.
Unix traditionally avoids the issue by making the user decide what to
open, or by just throwing errors when a program gets an ugly file, like
a word processor opening a JPEG. Such fun.
Aredridel
You could use MIME::Types [RAA:mime-types], which does a decent job
of MIME Content-Type analysis, and it knows the extensions for most
file types.
I may port File::MMagic from Perl to do this sort of thing, though.
-austin
– Austin Ziegler, austin@halostatue.ca on 2003.04.18 at 22:09:30
On Sat, 19 Apr 2003 01:14:08 +0900, Peter B. Ensch wrote:
Yukihiro Matsumoto wrote:
In Perl, -T returns true if the file is a text file and -B
returns true if it is binary.
You have to define what a text file is first. The traditional Perl test
does not work for Japanese text (nor for Unicode text).
Ah! Not having worked with Unicode before, I had not considered
this. Does your answer assume that I need to define ‘text file’
and then open/read the file to see if it meets the definition? In
other words, it is not possible to do something like:
if text file
open and process
else
do not open
end
BTW, thanks for answering my question, Matz, and thanks for ruby.
I’m having a lot of fun learning it.