How to use the encoding option :universal_newline while working with File::new?

Hi,
If I use the option universal_newline with a simple String object, it works:

"foo \r \r\n".encode(universal_newline: true)
# => "foo \n \n"

Now, the File::new documentation says it supports all the options that String#encode does. That's why I tried the code below:

File.open("#{__dir__}/out.txt", universal_newline: true) do |file|
  file.each_line do |line|
    p line
  end
end

# gives output

# "I am a good boy \\r \\r\\n\n"
# "I am a good girl \\r\\n \\r\n"
As you can see, it didn't convert all the \r and \r\n sequences to \n. It seems I am using the option the wrong way. Can anyone give me a pointer on how to use this option when dealing with File _encoding_? I ran the code against this text file:

I am a good boy \r \r\n
I am a good girl \r\n \r
Regards,
Arup Rakshit
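
A side note on the output above, assuming it was pasted verbatim: p prints a real carriage return as \r, while \\r stands for a backslash followed by the letter r. The sample file therefore appears to contain literal backslash sequences rather than CR characters, and no newline conversion would touch those. A minimal sketch of the difference (the variable names are made up for illustration):

```ruby
# Single quotes keep the backslashes: the string holds the two characters
# '\' and 'r', which is what the p output in the question suggests.
literal = 'I am a good boy \r \r\n'

# Double quotes produce real CR and LF control characters.
real = "I am a good boy \r \r\n"

p literal  # prints "I am a good boy \\r \\r\\n"
p real     # prints "I am a good boy \r \r\n"

# universal_newline only rewrites real CR and CRLF sequences.
literal_converted = literal.encode(universal_newline: true)
real_converted    = real.encode(universal_newline: true)

p literal_converted == literal  # prints true
p real_converted                # prints "I am a good boy \n \n"
```

If the real input file has genuine CR or CRLF line endings, this caveat does not apply and the rest of the thread is the relevant part.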

Where exactly do you see that? When looking at

Class: File (Ruby 2.2.0)

I see no such statement. When following the reference to

Class: IO (Ruby 2.2.0)

It does not mention universal_newline. What am I missing?

Kind regards

robert

On Tue, Jan 27, 2015 at 9:30 AM, Arup Rakshit <aruprakshit@rocketmail.com> wrote:

"foo \r \r\n".encode(universal_newline: true)
# => "foo \n \n"

Now, the File::new documentation says it supports all the options that String#encode does. That's why I tried the code below:

--
[guy, jim, charlie].each {|him| remember.him do |as, often| as.you_can -
without end}
http://blog.rubybestpractices.com/

Hi Robert Klemme,

I was luckier than you! ;)

I have stepped on...

I have no idea how to help on this!

Abinoam Jr.

On Tue, Jan 27, 2015 at 12:24 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

Where exactly do you see that? When looking at

Class: File (Ruby 2.2.0)

I see no such statement. When following the reference to

Class: IO (Ruby 2.2.0)

It does not mention universal_newline. What am I missing?

Kind regards

robert


Please look at the doc, Class: IO (Ruby 2.2.0), and its last line:

Also, `opt` can have same keys in `String#encode` for controlling conversion between the external encoding and the internal encoding.

On Tuesday, January 27, 2015 04:24:21 PM you wrote:

Where exactly do you see that? When looking at

Class: File (Ruby 2.2.0)

I see no such statement. When following the reference to

Class: IO (Ruby 2.2.0)

It does not mention universal_newline. What am I missing?

Kind regards

robert

--

Regards,
Arup Rakshit

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

--Brian Kernighan

Thanks for helping my feeble eyes!

The explanation is pretty simple: the option is only effective if there is
an encoding conversion going on:

encoding-line-break-test.rb · GitHub

If one thinks about it this is pretty logical. :)
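
The linked test script is not reproduced in this thread, so here is a minimal sketch of the same idea, not the original code. It assumes the file is stored as ISO-8859-1 and should be read as UTF-8; the helper name read_with_universal_newline and the sample data are made up for illustration.

```ruby
require "tmpdir"

# Hypothetical helper: opens the file with an external:internal encoding
# pair so that a conversion takes place, and passes universal_newline as
# one of the conversion options.
def read_with_universal_newline(path)
  File.open(path, "r:ISO-8859-1:UTF-8", universal_newline: true) do |file|
    file.readlines
  end
end

lines = Dir.mktmpdir do |dir|
  path = File.join(dir, "out.txt")
  # binwrite stores the bytes untouched, so the file really has CRLF endings.
  File.binwrite(path, "good boy\r\ngood girl\r\n")
  read_with_universal_newline(path)
end

p lines
```

With the conversion in place, this should print the two lines with plain \n endings. Whether universal_newline: true alone has any effect without such a conversion may depend on the Ruby version and platform, so it is worth testing on the target setup.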

Kind regards

robert

On Wed, Jan 28, 2015 at 2:56 AM, Arup Rakshit <aruprakshit@rocketmail.com> wrote:

Please look the doc :
http://www.ruby-doc.org/core-2.2.0/IO.html#method-c-new-label-IO+Encoding
and the end line :-

Also, `opt` can have same keys in `String#encode` for controlling
conversion between the external encoding and the internal encoding.


Thanks Robert.

Ohh! So that's how I need to use it. Then it seems I won't be able to take advantage of it. Actually, I am trying to replace such characters in a big file:

text = File.open('file/to/path/foo.txt').read
text.gsub!(/\r\n?/, "\n")

What is the most efficient way to do this without reading the whole file?

On Wednesday, January 28, 2015 10:49:35 AM Robert Klemme wrote:

Thanks for helping my feeble eyes!

The explanation is pretty simple: the option is only effective if there is
an encoding conversion going on:

encoding-line-break-test.rb · GitHub

robert

--

Regards,
Arup Rakshit


Hi Arup,

text = File.open('file/to/path/foo.txt').read
text.gsub!(/\r\n?/, "\n")

What is the most efficient way to do this, without reading the whole file ?

Robert Klemme himself told us this some time ago! ;)

"Use File.foreach for large files"

http://blog.rubybestpractices.com/posts/rklemme/001-Using_blocks_for_Robustness.html
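
A minimal sketch of what the File.foreach suggestion could look like for this task. The helper name convert_newlines_by_line and the temporary file names are made up for illustration. It assumes the input contains at least some \n characters: File.foreach splits on \n, so a file that uses only bare \r line endings would still be read as one huge line.

```ruby
require "tmpdir"

# Hypothetical helper: streams src_path one line at a time and writes the
# converted text to dst_path, so only a single line is held in memory.
def convert_newlines_by_line(src_path, dst_path)
  File.open(dst_path, "wb") do |dst|
    # Binary mode keeps the bytes as they are on every platform.
    File.foreach(src_path, mode: "rb") do |line|
      # Each line ends at "\n", so a CRLF pair is never split across lines.
      dst.write(line.gsub(/\r\n?/, "\n"))
    end
  end
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "in.txt")
  dst = File.join(dir, "out.txt")
  File.binwrite(src, "one\r\ntwo\rthree\n")
  convert_newlines_by_line(src, dst)
  p File.binread(dst)  # prints "one\ntwo\nthree\n"
end
```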

Abinoam Jr.

Hi Arup,

> text = File.open('file/to/path/foo.txt').read
> text.gsub!(/\r\n?/, "\n")
>
> What is the most efficient way to do this, without reading the whole file ?

Robert Klemme himself told us this some time ago! ;)

:-))

"Use File.foreach for large files"

http://blog.rubybestpractices.com/posts/rklemme/001-Using_blocks_for_Robustness.html

In this case you could also work with a fixed buffer, i.e. use IO#read.
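
A minimal sketch of the fixed-buffer approach with IO#read. The helper name normalize_newlines and the default chunk size are assumptions chosen for illustration, and the sketch assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1 (not UTF-16). The one subtle point is a \r at the very end of a chunk, which may be the first half of a \r\n pair.

```ruby
require "tmpdir"

# Hypothetical helper: converts CRLF and bare CR to LF while reading the
# source in fixed-size chunks, so memory use does not grow with file size.
def normalize_newlines(src_path, dst_path, chunk_size = 64 * 1024)
  File.open(src_path, "rb") do |src|
    File.open(dst_path, "wb") do |dst|
      carry = "".b  # a trailing CR held back from the previous chunk
      while (chunk = src.read(chunk_size))
        chunk = carry + chunk
        # A CR at the end may belong to a CRLF pair that continues in the
        # next chunk, so hold it back instead of converting it now.
        carry = chunk.end_with?("\r") ? chunk.slice!(-1) : "".b
        dst.write(chunk.gsub(/\r\n?/, "\n"))
      end
      dst.write("\n") unless carry.empty?  # the file ended with a bare CR
    end
  end
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "in.txt")
  dst = File.join(dir, "out.txt")
  File.binwrite(src, "one\r\ntwo\rthree\n")
  normalize_newlines(src, dst, 4)  # tiny chunks to exercise the boundary case
  p File.binread(dst)  # prints "one\ntwo\nthree\n"
end
```

Unlike the line-based variant, this also copes with files that contain only bare \r line endings, because it never depends on finding a \n.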

But even more efficient is probably this:

$ dos2unix your-large-file

:)

Kind regards

robert

On Wed, Jan 28, 2015 at 9:28 PM, Abinoam Jr. <abinoam@gmail.com> wrote:
