Iterate chars in a string

Modifying String#to_a to return an array of characters has been brought
up before (ruby-talk:148588 and following). I don't think Matz likes
the idea, though.

Dan

···

-----Original Message-----
From: Mike Austin [mailto:noone@nowhere.com]
Sent: Monday, March 20, 2006 12:39 PM
To: ruby-talk ML
Subject: Re: iterate chars in a string

"I am puzzled".each_byte { |b| puts b.chr }

I'm surprised that not many people knew about 'each_byte()'. Maybe it's
a problem with the Ruby docs? Or maybe it is just counter-intuitive - I
would expect each() to iterate over bytes, and an each_lines() to be
provided to iterate over lines instead.

Mike

http://www.rubycentral.com/ref/
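
A minimal sketch of the each_byte idiom above, assuming Ruby 1.9 or later; the variable names are illustrative only. each_byte yields integer byte values, which is why the block has to call chr on each one:

```ruby
# each_byte yields Integer byte values; chr turns each one back into a
# one-character string. Variable names here are illustrative.
s = "I am puzzled"

bytes = []
s.each_byte { |b| bytes << b }

chars = bytes.map { |b| b.chr }

p chars.first(4)  # ["I", " ", "a", "m"]
```

For plain ASCII text, bytes and characters coincide, which is the case the thread is discussing.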

Mike, I really agree. I *was* expecting that behavior from "each" too, some
time ago, and it took me some time to figure it out.
BTW I think the "nicest" solution is mine combined with Robert's (pun
intended):

"Ty Mr. Klemme".each_byte { |b| puts b.chr }

my "%" stuff was rather clumsy.

bye for now
Robert

···

On 3/20/06, Berger, Daniel <Daniel.Berger@qwest.com> wrote:

Modifying String#to_a to return an array of characters has been brought
up before (ruby-talk:148588 and following). I don't think Matz likes
the idea, though.

Dan

--
Two things are infinite: the universe and human stupidity; and as for
the universe, I have not yet acquired absolute certainty.

- Albert Einstein

> Modifying String#to_a to return an array of characters has been brought
> up before (ruby-talk:148588 and following). I don't think Matz likes
> the idea, though.

It should be pointed out that there really is a flaw in the way Ruby
handles this. #each takes a parameter, which defaults to the global
variable $/, to determine the actual split to perform. Not only can't
Enumerable handle the parameter but, worse, relying on a global
variable like that is dangerous! To be safe one would have to set the
global every time it is used, or always give the parameter. Otherwise
another lib might change it on you. The upshot of all this is that we
are much less inclined to even bother with String#each, which is too
bad.

BTW, see Calibre's EnumerablePass for a way to allow #each to take
parameters and still have enumerability.

T.
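
To illustrate the point about $/, here is a small sketch. It assumes Ruby 1.9 or later, where the line iterator is String#each_line (String#each no longer exists there). Passing the separator explicitly means the result does not depend on the global:

```ruby
text = "one\ntwo\nthree"

# Explicit separator: independent of whatever $/ currently holds.
lines = text.each_line("\n").to_a

# Any other separator works the same way; the separator stays
# attached to each piece.
fields = "a,b,c".each_line(",").to_a

p lines   # ["one\n", "two\n", "three"]
p fields  # ["a,", "b,", "c"]
```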

The thing I don't like about this behaviour is that an algorithm which
operates on containers and expects #each can't work immediately with
strings.

Daniel Tse

···

On 3/21/06, Robert Dober <robert.dober@gmail.com> wrote:

Mike, I really agree. I *was* expecting that behavior from "each" too, some
time ago, and it took me some time to figure it out.
BTW I think the "nicest" solution is mine combined with Robert's (pun
intended):

"Ty Mr. Klemme".each_byte { |b| puts b.chr }

my "%" stuff was rather clumsy.

bye for now
Robert

Hi --

···

On Tue, 21 Mar 2006, Trans wrote:

It should be pointed out that there really is a flaw in the way Ruby
handles this. #each takes a parameter, which defaults to the global
variable $/, to determine the actual split to perform. Not only can't
Enumerable handle the parameter but, worse, relying on a global
variable like that is dangerous! To be safe one would have to set the
global every time it is used, or always give the parameter. Otherwise
another lib might change it on you. The upshot of all this is that we
are much less inclined to even bother with String#each, which is too
bad.

Another lib might change *anything* on you :-) We all have to trust
each other not to do that.

David

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" chapters now available
from Manning Early Access Program! http://www.manning.com/books/black

jogloran wrote:

The thing I don't like about this behaviour is that an algorithm which
operates on containers and expects #each can't work immediately with
strings.

That's not true. It just depends on what you consider to be the parts of a string. I'd agree that naturally one would expect characters to be that - but "lines" is another option. And that's the one that has been chosen by Matz. It has some advantages, too, e.g. if you slurp a file into a single string and then want to iterate over the lines.

Kind regards

  robert

} The thing I don't like about this behaviour is that an algorithm which
} operates on containers and expects #each can't work immediately with
} strings.

A decision had to be made on how to split up a string when using each. The
decision was made to split it up by lines, since that was deemed to be the
most often used case. I think that's probably a correct assessment. Now, if
you want it to use something other than newlines as its split, you can
explicitly split by whatever you want:

'foobar'.split('').each { |c| puts c }

Basically, stringvar.each is roughly a shortcut for
stringvar.split("\n").each (except that each keeps the trailing newline
on every line), because line splitting is the common case.

} Daniel Tse
--Greg
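
The difference between line iteration and an explicit split can be seen in a short sketch. It assumes Ruby 1.9 or later, where the line iterator is spelled each_line rather than each; the variable names are illustrative:

```ruby
text = "abc\ndef\nghi"

with_newlines = text.each_line.to_a  # line iteration keeps "\n"
without       = text.split("\n")     # split drops the separator
chars         = "foobar".split("")   # one element per character

p with_newlines  # ["abc\n", "def\n", "ghi"]
p without        # ["abc", "def", "ghi"]
p chars          # ["f", "o", "o", "b", "a", "r"]
```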

···

On Tue, Mar 21, 2006 at 04:31:39PM +0900, jogloran wrote:

Another lib might change *anything* on you :-) We all have to trust
each other not to do that.

Of course, but the behavior of String#each is not a good context to
_invite_ global modification. The behavior needs to be more reliable
than that.

T.

Would it not be nice to simply extend the behavior of String#each, e.g. like
this:

"Really nothing intelligent to tell".each <some_intelligent_choice> do |c|
  puts c
end
==>
R
e
etc. etc.

<some_intelligent_choice> might be 1 (imagine what n could do!)
[ we could even formulate endless loops like this
   "".each 0 do
    puts "Eternity is a hack of a long time"
   end
that is *really* great ;-)
]
or shall everybody do it oneself

class String
      def my_iterator ....

I just do not think so.

Cheers
Robert

···

On 3/21/06, Robert Klemme <bob.news@gmx.net> wrote:

Besides,

RUBY_VERSION # => "1.8.4"
require 'enumerator'

def foo(x)
  x.each{|y| p y.chr}
end

str = "foo bar baz"
foo(str.enum_for(:each_byte))
# >> "f"
# >> "o"
# >> "o"
# >> " "
# >> "b"
# >> "a"
# >> "r"
# >> " "
# >> "b"
# >> "a"
# >> "z"

···

On Tue, Mar 21, 2006 at 05:58:50PM +0900, Robert Klemme wrote:

--
Mauricio Fernandez - http://eigenclass.org - singular Ruby

Hi --

···

On Tue, 21 Mar 2006, Trans wrote:

Another lib might change *anything* on you :-) We all have to trust
each other not to do that.

Of course, but the behavior of String#each is not a good context to
_invite_ global modification. The behavior needs to be more reliable
than that.

Well... I'd rather rule out global modification as a response to
features one doesn't like, even if it means living with a few of
those.

David

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" chapters now available
from Manning Early Access Program! http://www.manning.com/books/black

Sorry for the doubletons, I am working on it :-(

I am not aware of the impact this kind of discussion might have on the
evolution of Ruby
<<off topic it would really be great if someone could point me to
information regarding ruby-2.0>>
but I fail to understand why extending the behavior of String#each in a
complete backward compatible way might be a problem.

In my ideal Ruby world
aString.each would do what it always did
aString.each(anotherString) would do what it always did

but
aString.each(aFixnum) would give us slices, "doing what one might expect it
to do".
aString.each(0) might implement an endless loop (naah, too dangerous)
aString.each(nil) might become meaningful in some other way

Matz did such great and beautiful work on getting close to that ambitious
goal, and I still have the feeling that String#each does not fit into that
picture.

The beauty of the thing is that Ruby is probably one of the few languages
where that kind of discussion can occur, others being too ugly anyway to
plead for beauty (very biased opinion, I admit).
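
What each(aFixnum) is imagined to do can be sketched with a standalone helper. The name each_slice_of is hypothetical and not part of Ruby; the sketch assumes Ruby 1.9 or later for each_char, and it rejects n = 0 rather than looping forever:

```ruby
# Hypothetical helper, not part of Ruby: yields successive slices of
# n characters, roughly what each(aFixnum) is imagined to do above.
def each_slice_of(str, n)
  raise ArgumentError, "n must be positive" unless n > 0
  str.each_char.each_slice(n) { |chars| yield chars.join }
  str
end

pieces = []
each_slice_of("abcdefg", 3) { |piece| pieces << piece }
p pieces  # ["abc", "def", "g"]

singles = []
each_slice_of("Ty", 1) { |c| singles << c }
p singles  # ["T", "y"]
```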

···

On 3/21/06, Trans <transfire@gmail.com> wrote:

Robert Dober wrote:

Please don't top post.

Would it not be nice to simply extend the behavior of String#each, e.g. like
this:

"Really nothing intelligent to tell".each <some_intelligent_choice> do |c|
  puts c
end
==>
R
e
etc. etc.

<some_intelligent_choice> might be 1 (imagine what n could do!)

We have that already: it's Enumerator.

irb(main):003:0> require 'enumerator'
=> true
irb(main):004:0> "foo\nbar".to_enum(:each_byte).each {|c| puts c.chr}
f
o
o

b
a
r
=> "foo\nbar"
irb(main):005:0> "foo\nbar".to_enum(:scan, /./m).each {|c| puts c}
f
o
o

b
a
r
=> "foo\nbar"

Cheers

  robert
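
The same two wrappers, written as a sketch for Ruby 1.9 or later (where Enumerator is built in) and collecting the elements into arrays instead of printing them; the variable names are illustrative:

```ruby
s = "foo\nbar"

# Wrap each_byte so that the enumerator's #each walks over bytes.
via_bytes = s.to_enum(:each_byte).map { |b| b.chr }

# Wrap scan with a pattern; /./m also matches the newline.
via_scan = s.to_enum(:scan, /./m).to_a

p via_bytes  # ["f", "o", "o", "\n", "b", "a", "r"]
p via_scan   # ["f", "o", "o", "\n", "b", "a", "r"]
```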

Well... I'd rather rule out global modification as a response to
features one doesn't like, even if it means living with a few of
those.

What other kind of features would one be inclined to modify?

T.

Sorry Robert, but I do not think I made myself clear enough.

We know about each_byte, and I already combined your #chr and String#each_byte,

so we had already established

"Ty Mr. Klemme".each_byte { |b| puts b.chr }

which is pretty elegant, don't you agree?
So Enumerators are not needed.

Now after that we switched to discussing the behaviour of String#each, and I
*really* feel that String#each should give us that kind of behaviour.
As I see that a lot of people think that the current behaviour of
String#each is a good one, and changing it would not be an option anyway, I
thought that the only solution would be to enhance the behaviour of
String#each.
The idea for that behaviour comes from Ruby itself (look at IO#gets).

then

   "Ty Mr. Klemme".each_byte { |b| puts b.chr }

would be the same as

    "Ty Mr. Klemme".each( 1 ) { |b| puts b }

There is always more than one way to do it ;-)

Robert

Sorry Robert, but I do not think I made myself clear enough.

We know about each_byte, and I already combined your #chr and String#each_byte,

so we had already established

"Ty Mr. Klemme".each_byte { |b| puts b.chr }

which is pretty elegant, don't you agree?
So Enumerators are not needed.

Duck typing?

require 'enumerator'

def just_iterate(obj)
  obj.each { |e| p e }
end

just_iterate("abc")
# (prints) "abc"

just_iterate("abc".enum_for(:each_byte))
# (prints) 97
# 98
# 99

Now after that we switched to discussing the behaviour of String#each, and I
*really* feel that String#each should give us that kind of behaviour.
As I see that a lot of people think that the current behaviour of
String#each is a good one, and changing it would not be an option anyway, I
thought that the only solution would be to enhance the behaviour of
String#each.
The idea for that behaviour comes from Ruby itself (look at IO#gets).

then

   "Ty Mr. Klemme".each_byte { |b| puts b.chr }

would be the same as

    "Ty Mr. Klemme".each( 1 ) { |b| puts b }

There is always more than one way to do it ;-)

Umm, there's *already* more than one way to do it :-) But Enumerator
really is a very useful class:

"abc\ndef\nghi\njkl".each { |e| p e }
# "abc\n"
# "def\n"
# "ghi\n"
# "jkl"

"abc\ndef\nghi\njkl\n".enum_for(:each_byte).each { |e| p e.chr }
# "a"
# "b"
# "c"
# "\n"
# "d"
# "e"
# etc.

"abc\ndef\nghi\njkl\n".enum_slice(2).each { |e| p e }
# ["abc\n", "def\n"]
# ["ghi\n", "jkl\n"]

"abc\ndef\nghi\njkl\n".enum_for(:each_byte).enum_slice(2).each { |e| p e }
# [97, 98]
# [99, 10]
# [100, 101]

And crucially:

"abc\ndef\nghi\njkl\n".enum_for(:each_byte).enum_slice(2).map do |(a,b)|
  (a + b).chr
end
# => ["\303", "m", "\311", "p", "\317", "s", "\325", "v"]

How would map and other Enumerable methods work if 'each' needed
arguments?
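
One way to reconcile the two positions, sketched under assumptions: fix the argument when the object is built, so that #each itself takes none and Enumerable keeps working. The class name SeparatedPieces is hypothetical, and the sketch assumes Ruby 1.9 or later (String#each_line):

```ruby
# Hypothetical wrapper: the separator is fixed at construction time,
# so #each itself takes no arguments and Enumerable works as usual.
class SeparatedPieces
  include Enumerable

  def initialize(text, separator)
    @text = text
    @separator = separator
  end

  def each(&block)
    return enum_for(:each) unless block
    @text.each_line(@separator, &block)
    self
  end
end

pieces = SeparatedPieces.new("a,b,c", ",")
p pieces.map { |piece| piece.upcase }  # ["A,", "B,", "C"]
p pieces.first                         # "a,"
```

This is essentially what enum_for does generically: it captures the method name and its arguments up front and exposes an argument-free #each.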

···

On Tue, 2006-03-21 at 18:47 +0900, Robert Dober wrote:

--
Ross Bamford - rosco@roscopeco.REMOVE.co.uk

Ross Bamford wrote:

How would map and other Enumerable methods work if 'each' needed
arguments?

Maybe a little off the topic, but I still think it would be pretty nifty if calling enumerators without a block returned its own enum_for():

"abcdefg".each_byte.each_slice(2).map do |a, b|
   (a + b).chr
end

To me it feels like partial application. It can't do anything without a block, so it simply passes its enumerator. No more messy 'enum_for()'s littered everywhere. Here's another example:

"abcdefg".each_byte.with_index.map do |a, i|
   a + i
end

Just my 2c, and thinking out loud
Mike

Mike Austin wrote:

Maybe a little off the topic, but I still think it would be pretty nifty
if calling enumerators without a block returned its own enum_for():

"abcdefg".each_byte.each_slice(2).map do |a, b|
  (a + b).chr
end

I believe Ruby 1.9 (experimental) has this feature.

Cheers,
Dave
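
In Ruby 1.9 and later, iterator methods called without a block do return an Enumerator, so chains like the ones proposed above can be written directly. A short sketch, with illustrative variable names:

```ruby
# Without a block, each_byte returns an Enumerator that can be chained.
sums = "abcdefgh".each_byte.each_slice(2).map { |a, b| a + b }
p sums  # [195, 199, 203, 207]

shifted = "abcdefg".each_byte.with_index.map { |a, i| a + i }
p shifted  # [97, 99, 101, 103, 105, 107, 109]
```

The pairwise sum uses an even-length string on purpose: with an odd length such as "abcdefg", the final slice holds a single byte, b is nil, and a + b raises a TypeError.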