1.8.7 String#lines keeps new-line chars (say it ain't so in 1.9)

Ruby 1.8.7 p72

  >> "A\nB\nC".lines.to_a
  => ["A\n", "B\n", "C"]

Please, tell me that's a mishap, and not how 1.9 works. I'd expect:

  >> "A\nB\nC".lines.to_a
  => ["A", "B", "C"]

Thanks.

Thomas Sawyer wrote:

Ruby 1.8.7 p72

  >> "A\nB\nC".lines.to_a
  => ["A\n", "B\n", "C"]

Please, tell me that's a mishap, and not how 1.9 works. I'd expect:

  >> "A\nB\nC".lines.to_a
  => ["A", "B", "C"]

Why would you expect that? The documentation is very clear.

--------------------------------------------------------------- IO#lines
     ios.lines(sep=$/) => anEnumerator
     ios.lines(limit) => anEnumerator
     ios.lines(sep, limit) => anEnumerator

···

------------------------------------------------------------------------
     Returns an enumerator that gives each line in _ios_. The stream
     must be opened for reading or an +IOError+ will be raised.

        f = File.new("testfile")
        f.lines.to_a #=> ["foo\n", "bar\n"]
        f.rewind
        f.lines.sort #=> ["bar\n", "foo\n"]

If it changed in 1.9, that would be another source of incompatibilities.

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.
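A small sketch of the reversibility point above. The sample strings and the two helper names are made up for illustration; only `String#lines`, `split` and `join` come from the thread.

```ruby
# Compare how well two ways of cutting a string into lines can be reversed.
# ORIGINALS is illustrative sample data, not taken from a real file.
ORIGINALS = ["A\nB\nC", "A\nB\nC\n"]

# True when joining the terminator-preserving lines rebuilds the input.
def lines_round_trip?(str)
  str.lines.to_a.join == str
end

# True when split("\n") followed by join("\n") rebuilds the input.
def split_round_trip?(str)
  str.split("\n").join("\n") == str
end

ORIGINALS.each do |s|
  puts "#{s.inspect}: lines=#{lines_round_trip?(s)} split=#{split_round_trip?(s)}"
end
```

With these inputs the `lines` version should round-trip both strings, while the `split` version only round-trips the one without a final newline.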

--
Posted via http://www.ruby-forum.com/.

$ ruby19 r1test.rb
["A\n", "B\n", "C\n"]

$ ri19 String#lines
----------------------------------------------------------- String#lines
     str.lines(separator=$/) => anEnumerator
     str.lines(separator=$/) {|substr| block } => str

     From Ruby 1.9.1

···

------------------------------------------------------------------------
     Returns an enumerator that gives each line in the string. If a
     block is given, it iterates over each line in the string.

        "foo\nbar\n".lines.to_a #=> ["foo\n", "bar\n"]
        "foo\nb ar".lines.sort #=> ["b ar", "foo\n"]

--
Posted via http://www.ruby-forum.com/.

I know I'm going to be accused of bullying you again, but...
Install the latest 1.9.1 and try it yourself (or if you feel fancy,
the 1.9.2 preview).

These aren't questions irb can't answer for you.

Alternatively, try out ruby-versions:
http://ruby-versions.net/

···

On Sat, Aug 22, 2009 at 11:46 AM, Intransition<transfire@gmail.com> wrote:


Hi --

String#lines is essentially the same as String#each, which is gone in
1.9. You get, instead of #each and friends (Enumerable), a whole
toolkit of ways to enumerate through strings:

   * lines
   * bytes
   * chars
   * codepoints

There's no auto-chomping, but there never has been in any string
operation I can think of.
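For reference, a minimal sketch of the four methods listed above applied to one short string. `SAMPLE` is an arbitrary ASCII string picked for illustration; `.to_a` is added so the result prints the same whether the method hands back an enumerator or an array.

```ruby
# Arbitrary ASCII sample so bytes and codepoints coincide.
SAMPLE = "hi\nyo"

p SAMPLE.lines.to_a       # ["hi\n", "yo"]  (terminator kept, no auto-chomp)
p SAMPLE.bytes.to_a       # [104, 105, 10, 121, 111]
p SAMPLE.chars.to_a       # ["h", "i", "\n", "y", "o"]
p SAMPLE.codepoints.to_a  # [104, 105, 10, 121, 111]
```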

David

···

On Sun, 23 Aug 2009, Intransition wrote:

--
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Q: What's the best way to get a really solid knowledge of Ruby?
A: Come to our Ruby training in Edison, New Jersey, September 14-17!
    Instructors: David A. Black and Erik Kastner
    More info and registration: http://rubyurl.com/vmzN

$ ruby -v -e 'p "A\nB\nC".lines.to_a'
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
["A\n", "B\n", "C"]

$ ruby-trunk -v -e 'p "A\nB\nC".lines.to_a'
ruby 1.9.2dev (2009-08-23 trunk 24631) [x86_64-linux]
["A\n", "B\n", "C"]

···

On Sat, Aug 22, 2009 at 5:46 PM, Intransition<transfire@gmail.com> wrote:


--
Pozdrawiam

Radosław Bułat
http://radarek.jogger.pl - mój blog

...oh, yeah:

$ ruby19 -v
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin8.11.1]

···

--
Posted via http://www.ruby-forum.com/.

It isn't the same but in many places where I might use String#lines I'd use code like this in Ruby 1.8.6:

   >> "first line\nsecond line\nthird line".split("\n")
   => ["first line", "second line", "third line"]
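A short sketch of where `split("\n")` behaves differently from what one might assume. The helper name `split_lines` and the inputs are illustrative only.

```ruby
# Thin wrapper around the split("\n") idiom shown above (hypothetical name).
def split_lines(str)
  str.split("\n")
end

p split_lines("first\n\nsecond")      # ["first", "", "second"]  interior blank line kept
p split_lines("first\r\nsecond")      # ["first\r", "second"]    the "\r" stays behind
p "first\r\nsecond".split(/\r?\n/)    # ["first", "second"]      regexp handles both endings
```

So the idiom works well for Unix line endings, but text with `\r\n` endings probably wants the regexp form or a later `chomp`.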

···

At 9:58 PM +0900 8/23/09, David A. Black wrote:

On Sun, 23 Aug 2009, Intransition wrote:

I'd expect it from a StringIO, but not a String.

T.

···

On Aug 23, 3:32 am, Brian Candler <b.cand...@pobox.com> wrote:

Why would you expect that? The documentation is very clear.

--------------------------------------------------------------- IO#lines
ios.lines(sep=$/) => anEnumerator
ios.lines(limit) => anEnumerator
ios.lines(sep, limit) => anEnumerator
------------------------------------------------------------------------
Returns an enumerator that gives each line in _ios_. The stream
must be opened for reading or an +IOError+ will be raised.

    f = File.new("testfile")
    f.lines.to_a  #=> ["foo\n", "bar\n"]
    f.rewind
    f.lines.sort  #=> ["bar\n", "foo\n"]

If it changed in 1.9, that would be another source of incompatibilities.

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.

How is there loss of data, when you know what was removed? Just join
("\n").

···

On Aug 23, 3:32 am, Brian Candler <b.cand...@pobox.com> wrote:

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.

String#lines wasn't defined in 1.8.6 so I did not think there was any
precedent for it. My use case has always been (as Radoslaw said):

  "first line\nsecond line\nthird line".split("\n")

Wanting my program to read better, I have often defined #lines to do
just that. In my experience that's the frequent case. Wanting to keep
the separator I think is the lesser need --for which I would be
happier with a less concise method name. As it stands #lines does me
no good now.

  "first line\nsecond line\nthird line".lines.map{ |s| s.chomp("\n") }

Is even worse than before! ;)
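One way to keep call sites short without redefining `#lines` is a small helper. This is only a sketch: `chomped_lines` is a hypothetical name, not a core method.

```ruby
# Hypothetical helper: returns the lines of str with terminators removed.
# With the default separator, chomp strips "\n", "\r\n" or a lone "\r".
def chomped_lines(str)
  str.each_line.map { |line| line.chomp }
end

p chomped_lines("first line\nsecond line\nthird line")
# ["first line", "second line", "third line"]
```

For what it's worth, Ruby 2.4 and later (well after this thread) accept `str.lines(chomp: true)`, which covers the same use case directly.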

···

On Aug 23, 8:58 am, "David A. Black" <dbl...@rubypal.com> wrote:

String#lines is essentially the same as String#each, which is gone in
1.9. You get, instead of #each and friends (Enumerable), a whole
toolkit of ways to enumerate through strings:

* lines
* bytes
* chars
* codepoints

There's no auto-chomping, but there never has been in any string
operation I can think of.

I know I'm going to be accused of bullying you again, but...
Install the latest 1.9.1 and try it yourself (or if you feel fancy,
the 1.9.2 preview).

These aren't questions irb can't answer for you.

Looking it up isn't the main issue mate. It was the "wherefore?" that
I pondered upon finding it to be the case.

Alternatively, try out ruby-versions: http://ruby-versions.net/

Cool, thanks.

···

On Aug 23, 8:50 am, Gregory Brown <gregory.t.br...@gmail.com> wrote:

On Sat, Aug 22, 2009 at 11:46 AM, Intransition<transf...@gmail.com> wrote:

Exactly. And I guess I'll just have to keep on doing that then.

···

On Aug 23, 10:05 am, Stephen Bannasch <stephen.banna...@deanbrook.org> wrote:

It isn't the same but in many places where I might use String#lines
I'd use code like this in Ruby 1.8.6:

>> "first line\nsecond line\nthird line".split("\n")
=> ["first line", "second line", "third line"]

What do we do for lines ending in \r\n? Do we take both or just the \n? I say both would be the most consistent, but then you don't know if you need to put back a \r\n or just an \n.

Also, how do you know if the last line ended in a \n? join("\n") wouldn't put it back in either case.
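Both questions go away if the terminator is kept next to the content instead of being thrown out. A minimal sketch, assuming a hypothetical helper `lines_with_terminators`:

```ruby
# Hypothetical helper: splits str into [content, terminator] pairs so that
# the exact line ending ("\n", "\r\n", or none) is still known afterwards.
def lines_with_terminators(str)
  str.lines.to_a.map do |line|
    content = line.chomp
    [content, line[content.length..-1]]
  end
end

pairs = lines_with_terminators("a\r\nb\nc")
p pairs                  # [["a", "\r\n"], ["b", "\n"], ["c", ""]]
p pairs.flatten.join     # "a\r\nb\nc", the original string again
```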

James Edward Gray II

···

On Aug 23, 2009, at 3:48 PM, Trans wrote:

On Aug 23, 3:32 am, Brian Candler <b.cand...@pobox.com> wrote:

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.

How is there loss of data, when you know what was removed? Just join
("\n").

Granting that perfect reversibility is a requirement here, then yes
the latter makes sense. I do not think it surprising.

To me returning the newline code with #lines would be like returning
the spaces for a method #words. Eg.

  "show it\nto me".words => ["show ", "it\n", "to ", "me "]

I think the broader issue here is the question of whether or not
String is intended for use by code-point (ie. low-level character)
manipulators, or for higher-level human-oriented textual manipulation.
I always thought StringIO was for the former case. But now I am seeing
Ruby's String is some sort of hodge-podge mixture of the two.

T.
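To make the #words analogy concrete, here is a sketch of both flavours. Neither method exists in core Ruby; `plain_words` and `spaced_words` are hypothetical names built on `split` and `scan`.

```ruby
TEXT = "show it\nto me"

# Words only, all whitespace discarded.
def plain_words(str)
  str.split
end

# Words with whatever whitespace followed them, analogous to how
# String#lines keeps the line terminator.
def spaced_words(str)
  str.scan(/\S+\s*/)
end

p plain_words(TEXT)   # ["show", "it", "to", "me"]
p spaced_words(TEXT)  # ["show ", "it\n", "to ", "me"]
```

Note that the last element has no trailing space in the second form, because the input does not end with one.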

···

On Aug 24, 4:50 am, Brian Candler <b.cand...@pobox.com> wrote:

Thomas Sawyer wrote:
> On Aug 23, 3:32 am, Brian Candler <b.cand...@pobox.com> wrote:

>> Perhaps most importantly of all, if the newlines were stripped, there
>> would be loss of data and it would be impossible to reconstruct the
>> original file exactly from the members of the lines array (e.g.
>> lines.join), as your example shows nicely.

> How is there loss of data, when you know what was removed? Just join
> ("\n").

"Loss of data" means "you don't get back exactly what you started with".
Taking your example, I believe that you want both "A\nB\nC" and
"A\nB\nC\n" to result in lines ["A","B","C"], so this operation is not
reversible.

Or did you want "A\nB\nC\n" to result in ["A","B","C",""] ? That would
surprise me more. Most inputs have terminating newlines on the final
line.
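The two outcomes described above can both be produced with `split`, depending on its limit argument. A quick sketch (the constant names are just for illustration):

```ruby
WITHOUT_FINAL_NEWLINE = "A\nB\nC"
WITH_FINAL_NEWLINE    = "A\nB\nC\n"

# Default split drops trailing empty fields, so both inputs look identical.
p WITHOUT_FINAL_NEWLINE.split("\n")      # ["A", "B", "C"]
p WITH_FINAL_NEWLINE.split("\n")         # ["A", "B", "C"]

# A negative limit keeps trailing empty fields, exposing the final newline.
p WITH_FINAL_NEWLINE.split("\n", -1)     # ["A", "B", "C", ""]
p WITHOUT_FINAL_NEWLINE.split("\n", -1)  # ["A", "B", "C"]
```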

Thomas Sawyer wrote:

To me returning the newline code with #lines would be like returning
the spaces for a method #words. Eg.

  "show it\nto me".words => ["show ", "it\n", "to ", "me "]

...except there is no built-in method 'words' so you can't accuse it of
being inconsistent :)

Some people would want lines with trailing whitespace stripped as well
as terminators. Some people would want leading whitespace stripped too.
I don't think you can please everyone, so IMO the most flexible and
consistent approach is to give the line complete with its terminator,
and let the user apply whatever post-processing they like.

I think the broader issue here is the question of whether or not
String is intended for use by code-point (ie. low-level character)
manipulators, or for higher-level human-oriented textual manipulation.
I always thought StringIO was for the former case. But now I am seeing
Ruby's String is some sort of hodge-podge mixture of the two.

I think StringIO is for when you want to duck-type a File, but with
in-RAM backing.

ruby is certainly lacking consistency in this area. In ruby 1.9, for
example, IO still has #each (meaning #each_line), whereas String doesn't
any more.
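A sketch of the duck-typing point, on Ruby 1.9 or later: code that only calls `each_line` works on a String and on a StringIO alike, even though only the IO-like object still answers to `each`. The helper name `all_lines` is hypothetical.

```ruby
require 'stringio'

# Hypothetical helper: works on anything that responds to each_line.
def all_lines(source)
  source.each_line.to_a
end

p all_lines("foo\nbar\n")                # ["foo\n", "bar\n"]
p all_lines(StringIO.new("foo\nbar\n"))  # ["foo\n", "bar\n"]

p StringIO.new("").respond_to?(:each)    # true
p "".respond_to?(:each)                  # false on 1.9 and later
```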

···

--
Posted via http://www.ruby-forum.com/.

Thomas Sawyer wrote:
> To me returning the newline code with #lines would be like returning
> the spaces for a method #words. Eg.

> "show it\nto me".words => ["show ", "it\n", "to ", "me "]

...except there is no built-in method 'words' so you can't accuse it of
being inconsistent :)

ok ;) ...just making an analogy.

Some people would want lines with trailing whitespace stripped as well
as terminators. Some people would want leading whitespace stripped too.
I don't think you can please everyone, so IMO the most flexible and
consistent approach is to give the line complete with its terminator,
and let the user apply whatever post-processing they like.

Sure, but at that point we are moving into a realm of narrower
use cases. I believe the short, more concise method name should go to
the most frequent use. I have no definitive statistics, but I'd wager
that split("\n") is by far the more common case. Based on that, I'd
rather see the current def be called something else, like #newlines or
#rawlines. But Ruby is ultimately Matz' baby so maybe his more common
use is otherwise.

> I think the broader issue here is the question of whether or not
> String is intended for use by code-point (ie. low-level character)
> manipulators, or for higher-level human-oriented textual manipulation.
> I always thought StringIO was for the former case. But now I am seeing
> Ruby's String is some sort of hodge-podge mixture of the two.

I think StringIO is for when you want to duck-type a File, but with
in-RAM backing.

ruby is certainly lacking consistency in this area. In ruby 1.9, for
example, IO still has #each (meaning #each_line), whereas String doesn't
any more.

Yeah, I think that's because StringIO is an IO first and foremost. So I
don't think String should aspire to be like StringIO per se. And StringIO
can only be like String insofar as it doesn't interfere with it being
an IO. But I may be presuming too much.

Appreciate the discussion.

···

On Aug 24, 9:17 am, Brian Candler <b.cand...@pobox.com> wrote:

I don't think it is a good idea to change the default behavior. If
you frequently need line endings stripped, you can always define your
own method for this, for example:

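# Wraps a line-yielding object and yields each line without its
# trailing line terminator.
class LineEnum
  include Enumerable

  # meth defaults to :each_line for String and IO, and to :each otherwise.
  def initialize(obj, meth = case obj
                             when String, IO then :each_line
                             else :each
                             end)
    @obj = obj
    @meth = meth
  end

  def each
    @obj.send(@meth) do |elem|
      # chomp (not chomp!) so strings owned by @obj are never modified
      yield elem.chomp
    end

    self
  end
end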

$ irb19 -r lineenum.rb
Ruby version 1.9.1
irb(main):001:0> s = "foo\nbar\n"
=> "foo\nbar\n"
irb(main):002:0> se = LineEnum.new s
=> #<LineEnum:0x10169bc0 @obj="foo\nbar\n", @meth=:each_line>
irb(main):003:0> se.each {|l| p l}
"foo"
"bar"
=> #<LineEnum:0x10169bc0 @obj="foo\nbar\n", @meth=:each_line>
irb(main):004:0>

We could also extend Enumerator to honor blocks passed to them so we could do

$stdin.to_enum(:each_line) {|l| l.strip!}.each do |line|
  p line # no \n present
end

But frankly, I'd rather just add a "line.chomp!" to my block body and
be done. :)
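Something close to the block-on-enumerator idea can be sketched with a lazy enumerator, which arrived in Ruby 2.0 (after this thread), so treat this as an approximation rather than the proposed extension itself. `chomped_line_enum` is a hypothetical name, and a StringIO stands in for `$stdin` so the snippet runs on its own.

```ruby
require 'stringio'

# Hypothetical helper: a lazy enumerator over io's lines, chomped on demand.
def chomped_line_enum(io)
  io.each_line.lazy.map { |line| line.chomp }
end

input = StringIO.new("foo\nbar\n")  # stand-in for $stdin
chomped_line_enum(input).each do |line|
  p line  # no "\n" present
end
```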

Kind regards

robert

···

2009/8/24 Trans <transfire@gmail.com>:


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/