Check for text file

Hi guys,

After some research I still cannot find a way to tell whether a file is
plain text or binary. In fact, I want to check whether a file is plain text no
matter what characters are in it.
Is this possible using Ruby?

Thanks,

Alin

--
Posted via http://www.ruby-forum.com/.

Alin Popa wrote:

Hi guys,

After some research I still cannot find a way how to see if a file is
plain text or binary. In fact I want to check if a file is plain text no
matter what characters are in it.
This thing may be possible by using ruby ?

I think so, but it's a little unclear exactly what you're trying to achieve. Do you have an example?

--
Alex

Hi guys,

After some research I still cannot find a way how to see if a file is
plain text or binary. In fact I want to check if a file is plain text no
matter what characters are in it.
This thing may be possible by using ruby ?

If you can't use 'file' directly, you should look at its source and see how
the detection works. I think CVS also detects text quite well.

Thanks,

On 6/19/07, Alin Popa <alin.popa@gmail.com> wrote:

Alin


--
I always thought Smalltalk would beat Java, I just didn't know it would be
called 'Ruby' when it did.
-- Kent Beck

http://blog.zenspider.com/archives/2006/08/i_miss_perls_b.html

On Jun 18, 2007, at 23:59 , Alin Popa wrote:

After some research I still cannot find a way how to see if a file is
plain text or binary. In fact I want to check if a file is plain text no
matter what characters are in it.

Alex Young wrote:

I think so, but it's a little unclear exactly what you're trying to
achieve. Do you have an example?

I'm trying to do a search-and-replace on some text in files, but I want to
skip archives and other binary files.
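
A minimal sketch of that kind of search-and-replace, assuming a NUL byte in the first 4096 bytes is a good enough sign that a file is binary. The names looks_binary? and replace_in_text_files are made up for illustration, and the NUL check is only one possible heuristic:

```ruby
require "find"

# Hypothetical helper: guess "binary" when the first 4096 bytes contain a NUL.
def looks_binary?(path)
  sample = File.binread(path, 4096)   # nil for an empty file
  !sample.nil? && sample.include?("\0")
end

# Hypothetical helper: replace +from+ with +to+ in every non-binary file
# below +root+ and return the paths that were rewritten.
def replace_in_text_files(root, from, to)
  changed = []
  Find.find(root) do |path|
    next unless File.file?(path)
    next if looks_binary?(path)

    original = File.binread(path)
    # Work on raw bytes so the file's encoding does not matter.
    updated = original.gsub(from.b, to.b)
    next if updated == original

    File.binwrite(path, updated)
    changed << path
  end
  changed
end
```

Calling replace_in_text_files("some_dir", "old", "new") would rewrite matching text files in place, so it is worth trying on a copy first.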


Ryan Davis wrote:

On Jun 18, 2007, at 23:59 , Alin Popa wrote:

After some research I still cannot find a way how to see if a file is
plain text or binary. In fact I want to check if a file is plain
text no
matter what characters are in it.

http://blog.zenspider.com/archives/2006/08/i_miss_perls_b.html

Nice, thanks.


Hi,

At Wed, 20 Jun 2007 02:10:57 +0900,
Ryan Davis wrote in [ruby-talk:256206]:

> After some research I still cannot find a way how to see if a file is
> plain text or binary. In fact I want to check if a file is plain
> text no
> matter what characters are in it.

http://blog.zenspider.com/archives/2006/08/i_miss_perls_b.html

You can use String#count:

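  # True if the first 4096 bytes contain a NUL byte or are at most 70%
  # printable ASCII (tab, newline, space through tilde).
  def File.binary?(path)
    s = read(path, 4096) and
    !s.empty? and
    (/\0/n =~ s or s.count("\t\n -~").to_f/s.size<=0.7)
  end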

In any case, it doesn't work for non-ascii files.
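
One possible way around the ASCII limitation, assuming the text files of interest are UTF-8: treat the content as text when it has no NUL byte and is valid UTF-8. The name text_sample? is hypothetical, and this is a different test from the ratio above, not a drop-in replacement. A sample cut at a fixed byte count can end mid-character and be misjudged, so this sketch expects the whole content (or a sample cut on a character boundary):

```ruby
# Hypothetical helper: true when the bytes look like UTF-8 text.
def text_sample?(bytes)
  return true if bytes.empty?            # nothing binary in an empty file
  return false if bytes.include?("\0")   # NUL bytes rarely appear in text
  # Reinterpret a copy of the bytes as UTF-8 and check that they decode cleanly.
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end
```

Something like text_sample?(File.binread(path)) would then accept Japanese or Cyrillic text in UTF-8, but it would still reject legacy single-byte encodings such as Latin-1 whenever an accented character appears.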

--
Nobu Nakada

Alin Popa wrote:

I'm trying to do a replace in file for some text but I don't want to
consider files like archives or other binary files.

Of course, on Windows I can go by the file extension and ignore
specific ones (e.g. .exe, .zip, .jar, .rar, .anything_i_want),
but I don't know how to do it on Linux/Unix, where a file extension is
not mandatory.


Which I shamelessly plagiarized and stuck in the ptools library.

gem install ptools

require 'ptools'
File.binary?('some_file')

Regards,

Dan

On Jun 19, 11:33 am, Alin Popa <alin.p...@gmail.com> wrote:

Ryan Davis wrote:
> On Jun 18, 2007, at 23:59 , Alin Popa wrote:

>> After some research I still cannot find a way how to see if a file is
>> plain text or binary. In fact I want to check if a file is plain
>> text no
>> matter what characters are in it.

>http://blog.zenspider.com/archives/2006/08/i_miss_perls_b.html

Nice, thanks.

Nobuyoshi Nakada wrote:

You can use String#count:

  def File.binary?(path)
    s = read(path, 4096) and
    !s.empty? and
    (/\0/n =~ s or s.count("\t\n -~").to_f/s.size<=0.7)
  end

In any case, it doesn't work for non-ascii files.

Pedantic correction: it doesn't work for non-Western scripts. French uses accents here and there, but it would still pass the test above.

Still, I have to say I was surprised; I didn't know that a hyphen in String#count had the same effect as in a regexp character class. Talk about an undocumented feature!

Daniel

You could read the file (or portion of the file), create a histogram of byte (or groups of bytes) occurrences and compare that to what you expect for text files (e.g. most chars are "0-9a-zA-Z" and punctuation).

You could also run the "file" command and parse its output.
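
The histogram idea could be sketched roughly like this. The helper names are made up for illustration, and the set of "expected" bytes (tab, LF, CR and printable ASCII) is an assumption that only suits ASCII text:

```ruby
# Hypothetical helper: count how often each byte value occurs.
def byte_histogram(bytes)
  counts = Hash.new(0)
  bytes.each_byte { |b| counts[b] += 1 }
  counts
end

# Assumed definition of a "text" byte: tab, LF, CR, or 0x20..0x7E.
def text_byte?(byte)
  byte == 9 || byte == 10 || byte == 13 || (32..126).cover?(byte)
end

# Fraction of the sample made up of "text" bytes, between 0.0 and 1.0.
def textual_share(bytes)
  return 1.0 if bytes.empty?
  hist = byte_histogram(bytes)
  text_total = hist.sum { |byte, n| text_byte?(byte) ? n : 0 }
  text_total.to_f / bytes.bytesize
end
```

A caller would then compare textual_share(File.binread(path, 4096).to_s) against a threshold of its own choosing; the full histogram is also there if a finer comparison against an expected distribution is wanted.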

Kind regards

  robert

On 19.06.2007 09:33, Alin Popa wrote:

I'm trying to do a replace in file for some text but I don't want to consider files like archives or other binary files.

Of course, when I'm on windows I can go after the file extension and try to ignore some specific (eg. .exe, .zip, .jar, .rar, .anything_i_want) but I don't know how to do it on Linux/Unix OS where file extension is not mandatory.

Hello,

On a *nix system, you can do

file_type = `file my_file`
puts file_type

but this will not work on Windows.
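
A rough sketch of parsing that output, relying on the file(1) convention that descriptions of text files contain the word "text". Both helper names are hypothetical, and the -b (brief) option is used so the file name is left out of the output:

```ruby
# Hypothetical helper: does a description printed by `file -b` denote text?
def text_description?(description)
  description.match?(/\btext\b/)
end

# Hypothetical helper: true/false according to `file`, or nil when the
# file command is not installed.
def text_file?(path)
  # The array form of popen runs the command without going through a shell,
  # so odd characters in the path need no quoting.
  description = IO.popen(["file", "-b", path], &:read)
  text_description?(description)
rescue Errno::ENOENT
  nil
end
```

With this, text_file?("my_file") gives a yes/no answer wherever a file command is on the PATH.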

George

On 19 Jun 2007, at 08:33, Alin Popa wrote:

I'm trying to do a replace in file for some text but I don't want to
consider files like archives or other binary files.

Of course, when I'm on windows I can go after the file extension and try
to ignore some specific (eg. .exe, .zip, .jar, .rar, .anything_i_want)
but I don't know how to do it on Linux/Unix OS where file extension is
not mandatory.


Hi,

At Wed, 20 Jun 2007 08:22:51 +0900,
Daniel DeLorme wrote in [ruby-talk:256241]:

Still, I have to say I was surprised; I didn't know that a hyphen in
String#count had the same effect as in a regexp character class. Talk
about an undocumented feature!

It's documented.

It can be
  s.count("^\t\n -~").to_f/s.size>0.3
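
A small illustration of the two String#count forms, using a made-up sample string: a hyphen between two characters denotes a range, and a leading caret negates the whole set, so the two counts always add up to the string's size.

```ruby
sample = "Hello, World!\x01\x02"

# Tab, newline, and every character from space through tilde.
printable = sample.count("\t\n -~")       # => 13

# The leading ^ negates the set: everything that is NOT printable.
non_printable = sample.count("^\t\n -~")  # => 2

puts printable + non_printable == sample.size
```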

--
Nobu Nakada

robert@fussel ~
$ file .inputrc
.inputrc: ASCII English text

robert@fussel ~
$ uname -a
CYGWIN_NT-5.1 fussel 1.5.24(0.156/4/2) 2007-01-31 10:57 i686 Cygwin

:)

  robert

On 19.06.2007 10:01, George Malamidis wrote:

Hello,

On a *nix system, you can do

file_type = `file my_file`
puts file_type

but this will not work on Windows.

George Malamidis wrote:

Hello,

On a *nix system, you can do

file_type = `file my_file`
puts file_type

but this will not work on Windows.

George

Thanks guys, the problem is solved thanks to your suggestions ;)

Regarding the file command, I can use it on Windows too, since there are
the GnuWin32 tools :)

Best regards,

Alin
