Fromdos dos2unix in ruby

how can I achieve in ruby the result of running:
fromdos dos_file.txt unix_file.txt

or in vim:
set ff=unix

?

thanks,
chris

just to add, I need to do that conversion under windows.

thanks,
chris

···

On 18 Aug., 13:38, Krzysztof Cierpisz <ciape...@gmail.com> wrote:

how can I achieve in ruby the result of running:
fromdos dos_file.txt unix_file.txt

or in vim:
set ff=unix

?

thanks,
chris

krzysztof cierpisz <ciapecki@gmail.com> writes:

just to add, I need to do that conversion under windows.

Well, you would read from the input file, replace the dos/windows line
endings with unix ones and write to the output file.

--
Dominik Honnef
dominikho@gmx.net

Well, you would read from the input file, replace the dos/windows line
endings with unix ones and write to the output file.

I tried with the following dos2unix.rb script:

###### dos2unix.rb ######################
out = File.open(ARGV[1],"w")

File.open(ARGV[0]).each {|line|
  out << line.gsub!(/\r$/,'')
}

out.close

#########################################

this:
ruby dos2unix.rb u8nl_utf8_tab.dos.txt d

works fine on Linux (d with length 408 bytes) but not on Windows, on
Windows d is a file with 0 bytes

input file u8nl_utf8_tab.dos.txt looks like this:
col1,col2|~|
"first line of cell 1
second line of cell 1",only line in 2|~|
"Czy specjalny telefon przeznaczony dla dzieci w wieku od 3 do 7 lat
podbije rynek? Jest prosty, bezpieczny i ma tylko 4 klawisze.
Sprzedawać go chce między innymi telefonia ojca Rydzyka. więcej
","Copyright © World Group.
Реклама
Help
Сделать World стартовой"|~|
äöüб фыва,"asdf,эжх"|~|

thanks,
chris

You are not closing the File object properly so your output might
never get flushed to disk...

Cheers

robert
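One way to act on this advice is the block form of File.open, which closes (and therefore flushes) the handle when the block exits. A minimal sketch; the temporary path and the sample text are made up for illustration:

```ruby
require "tmpdir"

# Hypothetical path, used only for this illustration.
path = File.join(Dir.mktmpdir, "demo.txt")

handle = nil
File.open(path, "wb") do |out|
  handle = out            # keep a reference so it can be inspected afterwards
  out << "line one\n"
end                       # leaving the block flushes and closes the file

closed_after_block = handle.closed?
written = File.binread(path)
```

The same pattern works for the input file, so neither handle is left open if something raises in between.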

···

2009/8/18 krzysztof cierpisz <ciapecki@gmail.com>:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

You are not closing the File object properly so your output might
never get flushed to disk...

Cheers

robert

can you let me know how to close it properly?

thanks,
chris

Well, you would read from the input file, replace the dos/windows line
endings with unix ones and write to the output file.

I tried with following dos2unix.rb script

###### dos2unix.rb ######################
out = File.open(ARGV[1],"w")

File.open(ARGV[0]).each {|line|
out << line.gsub!(/\r$/,'')

You open the file with the default mode of 'r' here so the File class is going to do the line-ending conversion for you. Then you use String#gsub! which returns nil when no changes are made. You are never going to get output this way.

}

out.close
#########################################
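The nil return value of String#gsub! mentioned above can be checked in isolation. This is a small sketch with a made-up sample line that contains no "\r", which is what a Windows text-mode read hands to Ruby according to the explanation above:

```ruby
require "stringio"

line = "no carriage return here\n"

bang_result  = line.dup.gsub!(/\r$/, "")  # nil, because nothing was replaced
plain_result = line.gsub(/\r$/, "")       # the unchanged string

out = StringIO.new
out << bang_result      # appends nil.to_s, which is an empty string
out << plain_result     # appends the whole line
```

With gsub! every unmodified line contributes nothing to the output, while the non-bang gsub always returns a string that can be written.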


Try something like this:

buffer = ''
File.open(ARGV[1], 'wb') do |out| # open for writing binary
   File.open(ARGV[0], 'rb') do |input| # open for reading binary ("in" is a reserved word)
     while input.read(1024, buffer) # read up to 1024 bytes into buffer
       out.write buffer.gsub(/\r\n/, "\n") # change ending and write out
     end
   end # end of block closes input
end # end of block closes output

-Rob

P.S. This is untested straight from my head.

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

···

On Aug 18, 2009, at 11:21 AM, Robert Klemme wrote:

2009/8/18 krzysztof cierpisz <ciapecki@gmail.com>:

thanks Rob,

I just added binary mode to what I had, and now it's working under
Windows as well.
I am always forgetting about "b" mode under windows.

thanks
chris
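Adding binary mode to the original script could look roughly like the sketch below. The method name dos2unix is made up here, and sub is used instead of gsub! so that a line without a trailing CR (for example a last line with no newline) is still written:

```ruby
# Hypothetical helper; mirrors the original script but opens both files
# in binary mode so no newline translation happens on Windows.
def dos2unix(src, dst)
  File.open(dst, "wb") do |out|
    File.open(src, "rb") do |input|
      input.each_line do |line|
        out << line.sub(/\r\n\z/, "\n")   # only a CR right before the LF goes
      end
    end
  end
end

# Command-line use, e.g.: ruby dos2unix.rb dos_file.txt unix_file.txt
dos2unix(ARGV[0], ARGV[1]) if __FILE__ == $0 && ARGV.size == 2
```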

fancy little bug here Rob, do you spot it?

What if \r is the 1024th char?
This will happen, one day :wink:
Unless the file is huge I would try
File.open....
   File.open ...
      out.print input.read.gsub( /\r\n/, "\n" )
   end
end

If performance can be an issue we could use File#each with 10.chr as a separator

     input.each 10.chr do | line |
         out.print line.sub( /\r\n\z/, 10.chr )
     end

HTH
Robert
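The boundary problem can be reproduced without any files. The sketch below uses StringIO and a deliberately tiny chunk size so that a "\r\n" pair is split across two reads; the helper name convert_in_chunks and the sample data are made up for illustration:

```ruby
require "stringio"

# Same loop shape as the chunked version, with a configurable chunk size.
def convert_in_chunks(io, chunk_size)
  result = +""
  buffer = +""
  while io.read(chunk_size, buffer)
    result << buffer.gsub(/\r\n/, "\n")
  end
  result
end

data = "ab\r\ncd\r\n"

# Chunks of 3 bytes are "ab\r", "\ncd" and "\r\n": the first pair is split.
split_pair = convert_in_chunks(StringIO.new(data), 3)

# Reading everything at once, or reading line by line, sees every pair whole.
whole_file = data.gsub(/\r\n/, "\n")
line_based = StringIO.new(data).each_line("\n").map { |line| line.sub(/\r\n\z/, "\n") }.join
```

Here split_pair still contains the first "\r\n", while whole_file and line_based are fully converted.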

···

On Tue, Aug 18, 2009 at 5:50 PM, Rob Biedenharn<Rob@agileconsultingllc.com> wrote:

--
module Kernel
  alias_method :λ, :lambda
end

Just for the record... in Ruby "\n" == 10.chr on all platforms. I find
"\n" to be more obvious.
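A quick check of that equivalence, as a sketch with made-up variable names:

```ruby
newline = "\n"

same_string   = (newline == 10.chr)  # both are the single LF character
newline_bytes = newline.bytes        # byte values of "\n"
crlf_bytes    = "\r\n".bytes         # byte values of a DOS line ending
```

The string literal itself is always a single LF (byte 10); any CR only appears when the I/O layer adds one on output in text mode.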

···

2009/8/18 Robert Dober <robert.dober@gmail.com>:

I wanted to point out the subtle bug because I thought it useful. But
I hate backslashes and use 10.chr often; this, however, is not good
practice because it is unconventional. It is just me ;).
I post it in the infinitesimal hope that 10.chr is useful for some folks anyway.
Cheers
Robert

···

On Tue, Aug 18, 2009 at 7:19 PM, Xavier Noria<fxn@hashref.com> wrote:

I would let Ruby do the line detection to avoid the issue Robert pointed out. For the record, this is what I'd probably be doing:

WIN_LE = "\r\n".freeze

File.open ARGV[0] do |input|   # "in" is a reserved word in Ruby
   File.open ARGV[1], "wb" do |out|
     input.each do |line|
       line.chomp!
       out.print line, WIN_LE
       # or:
       # out.write(line)
       # out.write(WIN_LE)
     end
   end
end

In this particular case I would not use File.foreach because then the output file is created even if the input file isn't there.

Kind regards

  robert

···

On 18.08.2009 21:46, Robert Dober wrote:

Hey but this is dos2unix :-).

You can't read in text-mode just like that in a portable way, because
chomp! only chomps "\n".

If you can assume the program is gonna run only on Windows then the
solution is trivial: read in text-mode, and write in binary mode. No
chomping or gsubs needed, just read and write.

If the program has to be portable then you need to deal with the
spurious \015 that may come up.

···

On Tue, Aug 18, 2009 at 10:15 PM, Robert Klemme<shortcutter@googlemail.com> wrote:

Yup, I thought my code solved the issue: tell Ruby in each that a line
ends with "\n" (that was tough to type :wink:) and replace a potential
"\r" before it?
But maybe this does not work on binary files under Windows; no way to
test, sorry.

Cheers
Robert

···

On Tue, Aug 18, 2009 at 10:39 PM, Xavier Noria<fxn@hashref.com> wrote:

Hey but this is dos2unix :-).

Ooops, make that then

LE = "\n".freeze

and of course

out.print line, LE

You can't read in text-mode just like that in a portable way, because
chomp! only chomps "\n".

No.

$ allruby -e 'p "a\r\n".chomp'
CYGWIN_NT-5.1 padrklemme1 1.5.25(0.156/4/2) 2008-06-12 19:34 i686 Cygwin

========================================
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
"a"

ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-cygwin]
"a"

If the program has to be portable then you need to deal with the
spurious \015 that may come up.

String#chomp does that nicely.
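A short sketch of what String#chomp removes when $/ is left at its default; the sample strings are made up:

```ruby
examples = ["a\r\n", "a\n", "a\r", "a"]

# With the default record separator, chomp strips one trailing
# "\r\n", "\n" or "\r", and leaves strings without a line ending alone.
chomped = examples.map(&:chomp)

# Only one line ending is removed, and "\n\r" is not treated as a pair.
one_ending_only = "a\r\n\r\n".chomp
reversed_pair   = "a\n\r".chomp
```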

Kind regards

robert

The idea is good, but this topic is brittle (though easy once you get
the facts straight).

The problem is that on CRLF platforms the I/O system filters out the CR
of any CRLF pair before the string arrives in Ruby land. That is, if you
work in text-mode. In fact that is the definition of text-mode: the
conversion is on.

When you write in text mode on a CRLF platform, the I/O system
monitors the stream of bytes and inserts a CR every time it sees an
LF. Unconditionally.

On Unix these conversions do not happen, text-mode and binary-mode are
the same, and Unix uses LF on disk to mean a newline.

And the point is that those conversions happen in text-mode *no matter
which is the input record separator*, so in those solutions the file
opened for reading should be opened in binary mode anyway. If you
don't do this, a file that has on disk

   \r\r\n

will come up as \r\n on Windows, and that gets gsubbed to \n, so you've
lost a \r that didn't belong to the newline.

In a portable script you have to work in binary mode, and in a
Windows-only script it is enough to read in text-mode and write
verbatim in binary-mode.
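The \r\r\n case can be illustrated with plain strings. In this sketch the Windows text-mode read is only simulated with a gsub, since no such translation happens on Unix; the sample data and variable names are made up:

```ruby
on_disk = "one\r\r\ntwo\r\n"

# Simulated text-mode read on a CRLF platform: every CRLF pair arrives as LF.
text_mode_view = on_disk.gsub("\r\n", "\n")

# Converting the text-mode view again also eats the stray CR.
lossy = text_mode_view.gsub("\r\n", "\n")

# Converting the bytes as they are on disk (binary mode) keeps it.
faithful = on_disk.gsub("\r\n", "\n")
```

The lossy result has dropped a CR that was not part of any line ending, which is the data loss described above.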

···

On Tue, Aug 18, 2009 at 11:39 PM, Robert Dober<robert.dober@gmail.com> wrote:

Oh you are right. I thought chomp chomped only the input record
separator, but I see in the Pickaxe that this holds only when $/ has
been changed; with the default $/ it also removes \r\n.

···

On Wed, Aug 19, 2009 at 9:14 AM, Robert Klemme<shortcutter@googlemail.com> wrote:

But I did open it in binary mode, did I not?
Anyway, if I had a typo in my snippet, thanx for the correction.

The only issue I can see is the following

Newline = "\n" || 10.chr || "\012" || ";-)"

File.open( "...", "rb"){ | f |
   f.each( Newline ) { ...
####### ^
####### Does this work on Windows?

Cheers
Robert

···

On Wed, Aug 19, 2009 at 12:38 AM, Xavier Noria<fxn@hashref.com> wrote:
