Why does IO.readlines() keep newlines?

At the very least, the win32 implementation of Ruby's IO.readlines()
method keeps the newline character on each string in the array. Considering
that it is the newline that defines a "line," it would not be wholly
unreasonable to omit it from the returned array. I would have imagined
that it was implemented using String.split(), which omits the splitting
character. On a simply practical note, I'm sure the former is more popular
than the latter in the following:

out = File.open('file.txt', 'r'){|file| file.readlines.collect{|line| line.chomp}}
out = File.open('file.txt', 'r'){|file|
               }

    ...in that rarely do people actually want newlines in their strings.
    Interestingly enough, I discovered this behaviour from a bug in a
program which was hidden by another peculiar function, puts(). Can you
imagine my surprise that puts() not only appends a newline to a string
printed to stdout but, if a newline already exists, it doesn't bother
appending one! So, printing strings with puts() can hide whether strings
have a newline or not. Weird...
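
The puts() behaviour described here can be checked by capturing its output in a StringIO instead of printing to stdout. This is only a minimal sketch; the helper name puts_output is made up for illustration.

```ruby
require "stringio"

# Capture what puts writes so the trailing characters can be inspected.
# (puts_output is a hypothetical helper, not part of Ruby.)
def puts_output(str)
  io = StringIO.new
  io.puts(str)
  io.string
end

with_newline    = puts_output("hello\n")
without_newline = puts_output("hello")

# puts appends a newline only when the string does not already end in one,
# so both calls write exactly the same bytes.
p with_newline      # "hello\n"
p without_newline   # "hello\n"
```

Printing with p (or calling inspect) instead of puts shows the escape sequences, which makes a stray newline visible while debugging.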
    So, who thinks my suggested change is a good idea? How do I go about
popularizing my opinion?
    Thank you...
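
For reference, a small sketch of the behaviour in question and two ways to drop the newlines. The sample file is created on the fly so the snippet is self-contained; the chomp: true keyword is an assumption about the reader's Ruby version (it was added in Ruby 2.4, well after this thread was written).

```ruby
require "tempfile"

lines_raw = lines_mapped = lines_keyword = nil

# Hypothetical sample file, written here only so the example can run anywhere.
Tempfile.create("readlines_demo") do |f|
  f.write("one\ntwo\nthree\n")
  f.flush

  # Default behaviour: every element keeps its trailing newline.
  lines_raw = File.readlines(f.path)

  # Classic workaround: chomp each line after reading.
  lines_mapped = File.readlines(f.path).map(&:chomp)

  # Ruby 2.4+ only: ask readlines to chomp for you.
  lines_keyword = File.readlines(f.path, chomp: true)
end

p lines_raw      # ["one\n", "two\n", "three\n"]
p lines_mapped   # ["one", "two", "three"]
p lines_keyword  # ["one", "two", "three"]
```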

I'm going to speculate that readlines does this because of operating
system differences in line endings.
For compatibility between most systems, it would have to remove line
feeds (\x0A) or carriage-return/line-feed combinations (\x0D\x0A).

I personally rather prefer the current behavior of readlines. I don't
think the puts behavior matters, and it is certainly not worth changing.
I'm aware of their behavior and if it matters, I code accordingly.

humbly,
Daniel Brumbaugh Keeney

···

On Nov 19, 2007 1:15 PM, Just Another Victim of the Ambient Morality <ihatespam@hotmail.com> wrote:

...

character. On a simply practical note, I'm sure the former is more popular
than the latter in the following:

out = File.open('file.txt', 'r'){|file| file.readlines.collect{|line|
line.chomp}}
out = File.open('file.txt', 'r'){|file|
    ...in that rarely do people actually want newlines in their strings.

FWIW, I never use readlines for this exact reason. I find its
preservation of line endings entirely annoying. I always use
IO.read().split when I can.
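
One caveat with the split approach, sketched below under the assumption that the goal is one array element per line: split with no argument breaks on any whitespace, and split("\n") silently drops trailing empty lines. The string literal stands in for the result of IO.read on a hypothetical file.

```ruby
# Stand-in for IO.read('file.txt'); note the blank last line.
text = "first line\nsecond line\n\n"

# No argument: splits on runs of whitespace, so words are separated too.
by_whitespace = text.split

# Explicit separator: one element per line, but trailing empty lines vanish.
by_newline = text.split("\n")

# each_line + chomp keeps every line, including the empty one at the end.
by_each_line = text.each_line.map(&:chomp)

p by_whitespace  # ["first", "line", "second", "line"]
p by_newline     # ["first line", "second line"]
p by_each_line   # ["first line", "second line", ""]
```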

As much as I'd personally like it changed, and know that such a change
would not affect any of my scripts, I'm concerned that such a change
must fall into the category of "not backwards compatible", and thus
unlikely to be effected without very strong support.

How do I go about popularizing my opinion?

Discuss the issue here as you are doing. If you don't get a large
vocal outcry against the proposal, or are not swayed by any arguments
that come against it, file an RCR[1] (preferably with a source code
patch attached) and hope that Matz accepts your change into the core.

[1] http://rcrchive.net/

···

On Nov 19, 12:14 pm, "Just Another Victim of the Ambient Morality" <ihates...@hotmail.com> wrote:

Indeed that's not the case.

On CRLF platforms the I/O layer handles newlines in text mode so that the programmer *always* works with "\n"; no CRLF ever reaches the Ruby level on Windows. Nor do you need to print CRLFs by hand at the Ruby level. At the Ruby level a newline is always == "\n" and always has length 1.

The string "\n" is the logical newline in Ruby, meaning it is portable and the I/O layer takes care of its actual representation on disk according to the runtime platform. In Java, for example, this works differently: "\n" is not portable, and to write a portable newline in Java you invoke some println().

This article explains how newlines work in C-based languages. It is Perl-based but in general it applies to Ruby except that in Ruby there's no platform where "\n" == "\015". In Ruby "\n" == "\012" everywhere and that simplifies things a bit. The I/O layer in MRI is C's stdio instead of PerlIO, but the explained newline mangling in and out is analogous:

   Radar – O’Reilly

I am the author but that doesn't matter.

-- fxn
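
The claims about "\n" at the Ruby level are easy to check in irb. A minimal sketch; it only inspects the string itself and does not try to demonstrate the text-mode translation, which depends on the platform the code runs on.

```ruby
newline = "\n"

# "\n" is the single byte 0x0A (octal 012) regardless of platform.
same_as_octal = (newline == "\012")
same_as_hex   = (newline == "\x0A")
newline_bytes = newline.bytes
newline_size  = newline.length

# A literal CRLF, by contrast, is two characters at the Ruby level.
crlf_size = "\r\n".length

p same_as_octal  # true
p same_as_hex    # true
p newline_bytes  # [10]
p newline_size   # 1
p crlf_size      # 2
```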

···

On Nov 20, 2007, at 12:43 AM, Daniel Brumbaugh Keeney wrote:

I'm going to speculate that readlines does this because of operating
system differences in line endings.
For compatibility between most systems, it would have to remove line
feeds (\x0A) or line-feed/carriage return combinations (\x0D\x0A).

<snip>

I personally rather prefer the current behavior of readline.

But then you could do
readlines/(\n\r?)/,
as default behavior I find it most annoying too.

Robert

···

On Nov 20, 2007 12:43 AM, Daniel Brumbaugh Keeney <devi.webmaster@gmail.com> wrote:
--
what do I think about Ruby?
http://ruby-smalltalk.blogspot.com/

Unfortunately, files created on one platform inevitably make their way
to another. When an IO with \r\n is read on a UNIX system, it preserves
the carriage return.

Daniel Brumbaugh Keeney
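
A sketch of that situation: a file with CRLF line endings is written and read back in binary mode, which is used here as a stand-in for reading a Windows-made file on a UNIX system (where no newline translation happens). The file name and contents are made up for illustration.

```ruby
require "tempfile"

raw_lines = chomped_lines = nil

Tempfile.create("crlf_demo") do |f|
  # Binary mode so the \r\n bytes are written exactly as given on any platform.
  f.binmode
  f.write("alpha\r\nbeta\r\n")
  f.flush

  # Reading without newline translation: readlines splits on "\n" only,
  # so the carriage return stays attached to each line.
  raw_lines = File.open(f.path, "rb") { |io| io.readlines }

  # String#chomp with no argument strips "\n", "\r" or "\r\n".
  chomped_lines = raw_lines.map(&:chomp)
end

p raw_lines      # ["alpha\r\n", "beta\r\n"]
p chomped_lines  # ["alpha", "beta"]
```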

···

On Nov 20, 2007 2:53 AM, Xavier Noria <fxn@hashref.com> wrote:

On Nov 20, 2007, at 12:43 AM, Daniel Brumbaugh Keeney wrote:

> I'm going to speculate that readlines does this because of operating
> system differences in line endings.
> For compatibility between most systems, it would have to remove line
> feeds (\x0A) or line-feed/carriage return combinations (\x0D\x0A).

Indeed that's not the case.

In CRLF platforms the I/O layer handles newlines in text mode so that
the programmer *always* works with "\n", no CRLF ever goes up on
Windows. Nor you need to print CRLFs by hand at the Ruby level. At the
Ruby level a newline is always == "\n" and has always length 1.
-- fxn

Yes, that's covered in the article I mentioned as well:

   Radar – O’Reilly

-- fxn

···

On Nov 20, 2007, at 10:17 PM, Daniel Brumbaugh Keeney wrote:

Unfortunately, files created on one platform inevitably make their way
to another. When an IO with \r\n is read on a UNIX, it preserves the
carriage return.