Why won't ruby chomp for me?

Hello,

I was wondering... IIRC, Perl came up with this "\n" at the end of the
line in gets etc. because in Perl, "" is false. So, if you want to have:
while (<>) {print $_;}
working, you need even empty lines not to be "false". So they said
they would put the "\n" in the line, and the problem is gone. Nice
hack, etc.

but in ruby, "" is true, so i'm wondering... why did ruby take this
over from perl? i often find myself forgetting that chomp, and the
fact that ruby gives me the "raw" line format has never helped me in
any way. is it just a historical homage to Perl?
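A minimal sketch of the behaviour in question, with `StringIO` from Ruby's standard library standing in for a file so the example is self-contained: `gets` hands back the line with its trailing "\n", and `chomp` returns a copy without it.

```ruby
require 'stringio'

# StringIO stands in for a real file so the example needs no disk access.
io = StringIO.new("first\nsecond\n")

raw     = io.gets     # the line *with* its separator
trimmed = raw.chomp   # a new string with the trailing "\n" removed

p raw      # prints "first\n"
p trimmed  # prints "first"
```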

just curious,

emmanuel

> I was wondering.. IIRC, Perl came up with this "\n" in the end of
> line in gets etc because in Perl, "" is false. So, if you want to have:
> while (<>) {print $_;}
> working, you needed that even empty lines won't be "false". So they
> said that they will put the "\n" in the line, and problem is gone, nice
> hack etc.

Well, I know nothing about this strange P language, but this is not really the
reason.

while (<>) {} is in reality a shortcut for while(defined($_ = <>)) {} (there
must have been an old version of this strange language where while($_ = <>) {}
was different from while (<>) {})

The reason is the case where the file doesn't end with a newline: as you say, ""
is false, but defined("") is true.
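Ruby needs no `defined()` wrapper here: only `nil` and `false` are false, and `gets` returns `nil` only at end of input. A sketch (again with `StringIO` standing in for a file) showing that neither an empty line nor a last line of "0" with no newline stops the loop, even when every line is chomped inside it:

```ruby
require 'stringio'

# Only nil and false are false in Ruby, so "" and "0" both count as true;
# IO#gets returns nil only at end of input, and that is what ends the loop.
io = StringIO.new("a\n\n0")   # an empty line, then a last line "0" with no newline

lines = []
while (line = io.gets)
  lines << line.chomp         # chomping inside the loop cannot end it early
end

p lines   # prints ["a", "", "0"]
```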

> but in ruby, "" is true, so i'm wondering... why did ruby take this
> over from perl? i find myself many times forgetting that chomp and the
> fact ruby offers me the "raw" line format never ever helped me in any
> way.

Probably the right question is: what is a line? Does the line separator
belong to the line or not?
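For what Ruby does, a small sketch: whatever the separator is, `gets` keeps it as part of the record it returns, and `chomp` accepts the same separator to strip it. The comma separator here is just an example.

```ruby
require 'stringio'

# IO#gets takes an optional separator; the record it returns still ends with it.
io = StringIO.new("a,b,c")

first  = io.gets(",")             # separator kept: "a,"
second = io.gets(",").chomp(",")  # chomp takes the same separator: "b"
last   = io.gets(",")             # the final record has no separator: "c"

p [first, second, last]   # prints ["a,", "b", "c"]
```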

Guy Decoux

Quoting emmanuel.touzery@wanadoo.fr, on Fri, Jan 30, 2004 at 12:04:30AM +0900:

> Hello,
>
> I was wondering… IIRC, Perl came up with this "\n" in the end of
> line in gets etc because in Perl, "" is false. So, if you want to have:

This is an interesting observation, but it's not the reason. perl does it
like that because a line HAS a newline character at the end (that's
what makes it a line).

> while (<>) {print $_;}
> working, you needed that even empty lines won't be "false". So they
> said that they will put the "\n" in the line, and problem is gone, nice
> hack etc.

It's not a hack, it's an anti-hack! perl, unlike awk (IIRC), tries not to
mangle your input data. What it reads from the file is what it gives you.
This allows you to process binary files with perl, for example, whereas
with awk/sed/grep etc., the assumption that input is always line-oriented,
and thus that newlines should be stripped when reading a line for
"convenience", makes this hard or impossible.
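Ruby's `gets` behaves the same way. A sketch over a binary string (the byte values are arbitrary): the CRLF pair, the NUL and the 0xFF byte come back untouched, so joining the records reproduces the input exactly. `StringIO` stands in for a file opened in binary mode.

```ruby
require 'stringio'

# Arbitrary bytes: a CRLF pair, a NUL, a 0xFF, and no newline at the end.
data = "ab\r\n\x00\xFFcd".b

io = StringIO.new(data)
records = []
while (rec = io.gets)   # splits on "\n" but removes nothing
  records << rec
end

p records.first              # prints "ab\r\n" (the \r is still there)
p records.join == data       # prints true: nothing was added or removed
```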

> but in ruby, "" is true, so i'm wondering… why did ruby take this
> over from perl? i find myself many times forgetting that chomp and the
> fact ruby offers me the "raw" line format never ever helped me in any
> way. is it just historical praise to Perl?

Even with text, the last line in a file may or may not end with a
newline character.

If you wanted to write out the file exactly as you read it, AND perl/ruby
auto-chomped any newlines, how would you know whether the last line had
a newline or not?
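A minimal sketch of that ambiguity: two inputs that differ only in whether the last line ends with a newline are distinguishable from the raw lines, but not once the lines are chomped. `raw_lines` is a hypothetical helper for the example.

```ruby
require 'stringio'

# Hypothetical helper: read every line the way gets would, separators kept.
def raw_lines(text)
  StringIO.new(text).each_line.to_a
end

with_newline    = "a\nb\n"
without_newline = "a\nb"

# The raw lines differ in the last element...
p raw_lines(with_newline)      # prints ["a\n", "b\n"]
p raw_lines(without_newline)   # prints ["a\n", "b"]

# ...but once chomped, the two inputs are indistinguishable.
p raw_lines(with_newline).map(&:chomp) == raw_lines(without_newline).map(&:chomp)  # prints true
```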

In your example above, if the newline was removed by <>, would you
expect the print to put it back? But print doesn't do that, and
shouldn't: what if you want to do two prints to build up a single line?
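Ruby draws the same line: `print` writes exactly what it is given, while `puts` appends a newline only when the string doesn't already end with one, so `puts` on an unchomped line from `gets` doesn't double it. A sketch, with a `StringIO` capturing the output instead of the terminal:

```ruby
require 'stringio'

out = StringIO.new   # collects output instead of writing to the terminal

out.print "foo"      # print adds nothing...
out.print "bar\n"    # ...so two prints can build up one line
out.puts "baz"       # puts appends a newline...
out.puts "qux\n"     # ...but not a second one if the string already ends with "\n"

p out.string         # prints "foobar\nbaz\nqux\n"
```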

> just curious,

Basically, removing the newline on input is the hack, just like adding
it on output is, and it's only a useful hack if you are always doing
line-oriented IO. perl (and ruby) are great for line-oriented processing,
like awk/sed/…, but unlike those others, they give you control over your
input and output; you aren't FORCED to do everything line-oriented.

Cheers,
Sam

Hello,

> One problem with "autochomping" is that at the end of file you don't
> know if the record (line) ended "properly" or whether the read ended
> because of an end of file.

so this is why so many tools insist on a final newline at the end of a
file? in case the file was truncated? i never quite got it :O)

> In your example above, if the newline was removed by <>, would you
> expect the print to put it back? But print doesn't do that, and
> shouldn't, what if you want to do two prints to build up a single line!

i'm satisfied with the distinction between puts and print :O)

generally, i think it's a good design decision, but i have a bit of a feeling
that the particular case killed the general case (when you don't care about that).
but i agree that uniformity pays off, and it's definitely very good that
it's not "half of the methods give the \n, half don't". and since sometimes
it's needed…
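The uniformity is easy to check. A sketch using `StringIO` (the `IO` methods of the same names behave the same way by default): `gets`, `readline`, `readlines` and `each_line` all return lines with the separator attached.

```ruby
require 'stringio'

text = "one\ntwo\n"

# Each reader gets a fresh StringIO so they all start at the beginning.
via_gets      = StringIO.new(text).gets
via_readline  = StringIO.new(text).readline
via_readlines = StringIO.new(text).readlines
via_each_line = StringIO.new(text).each_line.to_a

p via_gets       # prints "one\n"
p via_readline   # prints "one\n"
p via_readlines  # prints ["one\n", "two\n"]
p via_each_line  # prints ["one\n", "two\n"]
```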

thanks for all the very clear answers :O)

PS: somehow i often thought that Guy meant "Pascal" when he mentioned
"the P language" ;O)

I too find the chomp an ugly perlism. Has anyone written code that
makes use of the trailing line separator from IO#gets?

Unless someone argues why it would not be a good idea, I will submit an
RCR for changing IO#gets to not include the trailing line separator by
default.

Dion.


On Fri, Jan 30, 2004 at 12:04:30AM +0900, Emmanuel Touzery wrote:

> i find myself many times forgetting that chomp and the
> fact ruby offers me the "raw" line format never ever helped me in any
> way. is it just historical praise to Perl?

ts wrote:

> Probably the good question is : what is a line ? do the "line" separator
> belong to the line or not ?

I suppose any language can define it any way. But it seems clear that
much of the Ruby library is modeled after C and Unix (which are defined
by international standards). In ISO C, the newline is removed from the
string returned by gets().

On the other hand, C doesn’t let you redefine the input line separator.
Making one language comply with another language’s specification is tricky.

Steve

In article 200401291519.i0TFJ8d15618@moulon.inra.fr,

> > I was wondering.. IIRC, Perl came up with this "\n" in the end of
> > line in gets etc because in Perl, "" is false. So, if you want to have:
> > while (<>) {print $_;}
> > working, you needed that even empty lines won't be "false". So they
> > said that they will put the "\n" in the line, and problem is gone, nice
> > hack etc.

> Well, I know nothing in this strange P language but this is not really the
> reason.
>
> while (<>) {} is in reality a shortcut for while(defined($_ = <>)) {} (it
> must exist an old version of this strange language where while($_ = <>) {}
> was different from while (<>) {})
>
> The reason is that if the file don't end with a newline, like you say ""
> is false but defined("") is true

The real reason is that it was possible in ancient perls to miss the
last line of a file iff the file ended like "blah\n0" (e.g. emacs can
write out files without a trailing \n after the last line.)

Perl's notion of truth and the behaviour of <> meant that the last thing
read by <> would be '0', which is false (but defined); if there were a
trailing \n then the string "0\n" would be true.

> > but in ruby, "" is true, so i'm wondering... why did ruby take this
> > over from perl? i find myself many times forgetting that chomp and the
> > fact ruby offers me the "raw" line format never ever helped me in any
> > way.
>
> Probably the good question is : what is a line ? do the "line" separator
> belong to the line or not ?

One problem with “autochomping” is that at the end of file you don’t
know if the record (line) ended “properly” or whether the read ended
because of an end of file.
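Keeping the separator lets the reader make that distinction itself. A sketch, with `complete_and_partial` as a hypothetical helper: records that end with the separator are chomped and accepted, and a trailing fragment without one (a truncated or still-growing file, say) is returned separately instead of being mistaken for a full line.

```ruby
require 'stringio'

# Hypothetical helper: split input into records that ended with a separator
# and a trailing fragment that did not.
def complete_and_partial(io, sep = "\n")
  complete = []
  partial  = nil
  while (rec = io.gets(sep))
    if rec.end_with?(sep)
      complete << rec.chomp(sep)
    else
      partial = rec   # only the last record can lack the separator
    end
  end
  [complete, partial]
end

p complete_and_partial(StringIO.new("x=1\ny=2\nz="))  # prints [["x=1", "y=2"], "z="]
p complete_and_partial(StringIO.new("x=1\ny=2\n"))    # prints [["x=1", "y=2"], nil]
```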

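Depending on what you're doing, you might just be able to make a class
which inherits everything from File and define your own readline:

class MyFile < File
  # readline raises EOFError at end of file, so super never returns nil here
  def readline(*args)
    super.chomp   # bare super passes the same args on to File#readline
  end
end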
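The same idea can be extended to `gets`, with one catch: `gets` returns `nil` at end of file instead of raising `EOFError`, so calling `chomp` directly on `super` would raise `NoMethodError` there. A sketch; `ChompedFile` is a hypothetical name, and `Tempfile` only makes the example self-contained.

```ruby
require 'tempfile'

# Hypothetical File subclass whose gets strips the trailing line separator.
class ChompedFile < File
  def gets(*args)
    line = super          # bare super forwards the same arguments
    line && line.chomp    # gets returns nil at EOF; don't call chomp on nil
  end
end

lines = []
Tempfile.create("chomp") do |tmp|
  tmp.write("alpha\nbeta")
  tmp.flush

  ChompedFile.open(tmp.path) do |f|
    while (line = f.gets)
      lines << line
    end
  end
end

p lines   # prints ["alpha", "beta"]
```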

Hope this helps,

Mike


ts decoux@moulon.inra.fr wrote:


mike@stok.co.uk | The “`Stok’ disclaimers” apply.

Hi,


At Fri, 30 Jan 2004 10:07:49 +0900, Dion Mendel wrote:

> > i find myself many times forgetting that chomp and the
> > fact ruby offers me the "raw" line format never ever helped me in any
> > way. is it just historical praise to Perl?
>
> I too find the chomp an ugly perlism. Has anyone written code that
> makes use of the trailing line separator from IO#gets?

It should be another method, I guess.
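Later Ruby releases (2.4 onward) took an opt-in route rather than a separate method: `IO#gets`, `IO#readline`, `IO#readlines`, `IO#each_line` and `IO.foreach`/`IO.readlines` accept a `chomp: true` keyword, while the default still keeps the separator. A sketch using a temporary file:

```ruby
require 'tempfile'

lines   = nil
chomped = nil

Tempfile.create("lines") do |tmp|
  tmp.write("one\ntwo\n")
  tmp.flush

  lines   = File.readlines(tmp.path)               # default: separators kept
  chomped = File.readlines(tmp.path, chomp: true)  # opt-in: separators stripped
end

p lines     # prints ["one\n", "two\n"]
p chomped   # prints ["one", "two"]
```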


Nobu Nakada

Date: Fri, 30 Jan 2004 10:07:49 +0900
From: Dion Mendel nsb034@lostrealm.com
Newsgroups: comp.lang.ruby
Subject: Re: why won’t ruby chomp for me?

> > i find myself many times forgetting that chomp and the
> > fact ruby offers me the "raw" line format never ever helped me in any
> > way. is it just historical praise to Perl?
>
> I too find the chomp an ugly perlism. Has anyone written code that makes
> use of the trailing line separator from IO#gets?
>
> Unless someone argues why it would not be a good idea, I will submit an RCR
> for changing IO#gets to not include the trailing line separator by default.

it might not be a good idea when lines are empty… but i suppose it would
work. i would loathe having to remember to open a file in binary mode in order
for methods in IO not to do automagical things to the data too…

2 cts.

-a


On Fri, 30 Jan 2004, Dion Mendel wrote:

On Fri, Jan 30, 2004 at 12:04:30AM +0900, Emmanuel Touzery wrote:

Dion.

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov

Quoting steven.jenkins@ieee.org, on Fri, Jan 30, 2004 at 12:47:49AM +0900:

> ts wrote:
>
> > Probably the good question is : what is a line ? do the "line" separator
> > belong to the line or not ?
>
> I suppose any language can define it any way. But it seems clear that
> much of the Ruby library is modeled after C and Unix (which are defined
> by international standards). In ISO C, the newline is removed from the
> string returned by gets().
>
> On the other hand, C doesn't let you redefine the input line separator.

Modelled after, but supposed to be easier to use than C, and gets() is widely
considered to be a mistake even by C programmers.

The C IO library has some APIs that strip newlines and some that don't, and
it's a common source of bugs and confusion, not profitably emulated.

For your amusement, here are some of the comments from the GNU C library docs:

Line-Oriented Input
===================

Since many programs interpret input on the basis of lines, it is
convenient to have functions to read a line of text from a stream.

Standard C has functions to do this, but they aren't very safe: null
characters and even (for `gets') long lines can confuse them. So the
GNU library provides the nonstandard `getline' function that makes it
easy to read lines reliably.

Another GNU extension, `getdelim', generalizes `getline'. It reads
a delimited record, defined as everything through the next occurrence
of a specified delimiter character.

  • Deprecated function: char * gets (char *S)
    The function `gets' reads characters from the stream `stdin' up to
    the next newline character, and stores them in the string S. The
    newline character is discarded (note that this differs from the
    behavior of `fgets', which copies the newline character into the
    string). If `gets' encounters a read error or end-of-file, it
    returns a null pointer; otherwise it returns S.

    Warning: The `gets' function is *very dangerous* because it provides
    no protection against overflowing the string S. The GNU library
    includes it for compatibility only. You should *always* use `fgets'
    or `getline' instead. To remind you of this, the linker (if using
    GNU `ld') will issue a warning whenever you use `gets'.