Behaviour of Enumerables reject vs. select mixed into Hash

BTW all this does not answer OP's, Tom's and my *original* concern.
We were talking about Hash#reject and Hash#select not
Enumerable#reject and Enumerable#select, but nobody seems to notice,
sigh :frowning:

As I see it, there should _be_ no Hash#reject (nor Array#reject).

Hash (and in fact any class mixing in Enumerable) should be able to use Enumerable#reject to filter out elements.
That is what Trans's/Daniel's code aims at (hope I am not misinterpreting them here).

Other than providing exactly this generality for iteration(==enumeration) methods I see no need to have introduced the class Enumerable at all.

Sincerely yours
Alex

BTW all this does not answer OP's, Tom's and my *original* concern.
We were talking about Hash#reject and Hash#select not
Enumerable#reject and Enumerable#select, but nobody seems to notice,
sigh :frowning:
This topic would be less difficult to discuss I think.

My hash isn't your hash. What's the most generic behavior to expect?

See, you've already dismissed me right there, because you've decided
if my Hash doesn't work the way _you_ expect it to, it's my fault.

Offense, are you kidding? The only thing which bothers me from a set
theory point of view is the symmetry in your statement above.

Honestly Todd if you wanted to make a point I missed it, completely...

That wouldn't be the first time I've been accused of saying a whole
lot of nothin' :slight_smile:

I stopped talking about Ruby, and started talking instead about
programming in general. I introduced too many things at once. My
bad.

Todd

···

On 6/22/07, Robert Dober <robert.dober@gmail.com> wrote:

A duck is a bird. It doesn't behave exactly like every other bird you
know about. But you can be relatively certain it has wings. I
suspect the least common denominator return of the object is there for
several reasons, including testing, ease of the ruby language
development, etc. Why the difference for the Hash returning a
different object between select and reject? I think it's one of those
oversight things.

Thinking about LCD another way, why is it that "13s".to_i gives me 13?
Because it _mostly_ is an integer. So, if I have an Enumerable
object that I perform some operation on; where does it say in
duck-typing that I should get an Enumerable object back?

Just by its very nature, duck-typing will forever be laden with
inconsistencies. The LCD object return is a simple way around it. I
agree with the general consensus here that #select and #reject should
return the same way for a Hash, but you have to keep in mind that
writing a pragmatic language that does what you want is like trying to
assume everything about you as a programmer before you start typing.
To do that, maybe the designer should be using the LCD for pretty much
everything :slight_smile:

Todd

···

On 6/21/07, Alexander Presber <aljoscha@weisshuhn.de> wrote:

Obviously breaking compatibility is a very bad thing.
But without knowing for sure, I imagine Enumerable's implementation
should have been something more along the lines of the above from the
beginning.
(I know, this is a bold claim and I'd like to see more opinions on
that. But transfire's approach is exactly what I was thinking of when
bringing this up.)

If Enumerable is made more agnostic about the class it is mixed into
(by letting the class itself provide a method to add an "element" to
an instance of itself), it becomes truly mixable into anything
that provides "each" and "<<".
Then doing reject on any class that mixes in Enumerable will yield a
filtered instance of that class, not Array.
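
A minimal sketch of that idea, not anything Ruby ships: the module name SameClassFilter, the method names select_same and reject_same, and the WordList class are all made up for illustration. It assumes the host class offers each, << and a constructor that takes no arguments.

```ruby
# Hypothetical mixin: filtering methods that rebuild the receiver's own class.
# Assumes the host class provides #each, #<< and a no-argument constructor.
module SameClassFilter
  def select_same
    result = self.class.new
    each { |element| result << element if yield(element) }
    result
  end

  def reject_same
    select_same { |element| !yield(element) }
  end
end

# Hypothetical host class, used only to exercise the mixin.
class WordList
  include Enumerable
  include SameClassFilter

  def initialize
    @words = []
  end

  # Append one word and return self so calls can be chained.
  def <<(word)
    @words << word
    self
  end

  def each(&block)
    @words.each(&block)
    self
  end
end

list = WordList.new
list << "foo" << "bar" << "baz"

picked = list.select_same { |w| w.start_with?("b") }
p picked.class   # prints WordList
p picked.to_a    # prints ["bar", "baz"]
```

The built-in select is left alone here, so existing callers that expect an Array keep working.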

That said - I think there should be no such thing as TomsEnum or any
special implementation.
Enumerable is the place to define methods for all things containing
enumerable elements.

>>
>> The other option would require an #each_assoc method (maybe assoc
>> isn't the best term, but anyhow...)
>>
>> module Enumerable
>>   def select_assoc(&blk)
>>     h = {}
>>     each_assoc{|k,v| h[k]=v if blk[k,v]}
>>     h
>>   end
>> end
>>
>> The downside here of course, is twice the number of Enumerable
>> methods.
> And although I cannot imagine a case, how do we know that there are
> not Enumerables that take three or forty-two params ;).
> After all Enumerable is a Mixin and we have to be prepared that it be
> mixed in, right?

Yes, one could not possibly provide for all possible Mixees like this.

Yours,
Alex

Hi,

At Fri, 22 Jun 2007 06:30:40 +0900,
Trans wrote in [ruby-talk:256489]:

Hmm...well for the first solution, I suppose we need a special
constructor to provide the kind of enumerable result we will be
building. In my example, I used self.class.new, but obviously that's
not always the case, so the class will need to tell us.

I'd prefer this.
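
To illustrate what such a class-supplied constructor could look like, here is a rough sketch. The hook name new_empty, the module ConstructingFilter and the BoundedList class are hypothetical; nothing like them exists in Ruby's core.

```ruby
# Hypothetical mixin: the class tells the filter how to build an empty result.
module ConstructingFilter
  # Default hook; classes whose .new needs arguments override it.
  def new_empty
    self.class.new
  end

  def select_same
    result = new_empty
    each { |element| result << element if yield(element) }
    result
  end
end

# Hypothetical class whose constructor requires an argument.
class BoundedList
  include Enumerable
  include ConstructingFilter
  attr_reader :limit

  def initialize(limit)
    @limit = limit
    @items = []
  end

  def <<(item)
    raise ArgumentError, "list is full" if @items.size >= @limit
    @items << item
    self
  end

  def each(&block)
    @items.each(&block)
    self
  end

  # Carry the limit over, since self.class.new alone would raise here.
  def new_empty
    self.class.new(limit)
  end
end

list = BoundedList.new(3)
list << 1 << 2 << 3
evens = list.select_same { |n| n.even? }
p evens.class   # prints BoundedList
p evens.to_a    # prints [2]
```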

···

--
Nobu Nakada

As I see it, there should _be_ no Hash#reject (nor Array#reject).

Alexander thank you so much :slight_smile:
In the future maybe.
Right now Hash#reject is just the poor man's implementation of the
concepts which are discussed in this post.
Right now I feel it would be nice if its sibling Hash#select returned
a Hash too.

Hash (and in fact any class mixing in Enumerable) should be able to
use Enumerable#reject to filter out elements.
That is what Trans's/Daniel's code aims at (hope I am not
misinterpreting them here).

Not at all IMHO

Other than providing exactly this generality for iteration
(==enumeration) methods I see no need to have introduced the class
Enumerable at all.

Sure, that is its purpose, but be careful: it is a module.
I do not see any harm in overriding some of the mixed-in methods for
special cases like Hash; it is a reasonable approach.

Sincerely yours
Alex

Robert

···

On 6/22/07, Alexander Presber <aljoscha@weisshuhn.de> wrote:

--
You see things; and you say Why?
But I dream things that never were; and I say Why not?
-- George Bernard Shaw

> BTW all this does not answer OP's, Tom's and my *original* concern.
> We were talking about Hash#reject and Hash#select not
> Enumerable#reject and Enumerable#select, but nobody seems to notice,
> sigh :frowning:
> This topic would be less difficult to discuss I think.

My hash isn't your hash. What's the most generic behavior to expect?

See, you've already dismissed me right there, because you've decided
if my Hash doesn't work the way _you_ expect it to, it's my fault.

I do not recall having dismissed you at all? I have made a judgement
about two levels of complexity. If you do not like the behavior of
Hash#reject or Hash#select it might be a good idea to say so, right?
I do not mind at all if somebody says, I like Hash#reject to return a
Set because (this is just an example) etc. etc.

I guess I got confused about who thinks what in this thread :frowning:

>
> Offense, are you kidding? The only thing which bothers me from a set
> theory point of view is the symmetry in your statement above.
>
> Honestly Todd if you wanted to make a point I missed it, completely...

That wouldn't be the first time I've been accused of saying a whole
lot of nothin' :slight_smile:

I stopped talking about Ruby, and started talking instead about
programming in general. I introduced too many things at once. My
bad.

Indeed :wink:

···

On 6/22/07, Todd Benson <caduceass@gmail.com> wrote:

On 6/22/07, Robert Dober <robert.dober@gmail.com> wrote:

Todd

A duck is a bird. It doesn't behave exactly like every other bird you
know about. But you can be relatively certain it has wings.

I am on your side up to here :slight_smile:

I suspect the least common denominator return of the object is there for
several reasons, including testing, ease of the ruby language
development, etc. Why the difference for the Hash returning a
different object between select and reject? I think it's one of those
oversight things.

Thinking about LCD another way, why is it that "13s".to_i gives me 13?

13 is a good result for "13s".to_i, I couldn't come up with a better one: You ask the string class to
give you an integer (hence the _i) for a certain string and it does so in an (arguably) optimal way.

However, [["baz", "qux"]] is not a good result for
{'foo' => 'bar', 'baz' => 'qux'}.select{|k,v| k=='baz' },
{'baz' => 'qux'} would be much more sensible.

Can you see the difference?
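
For readers on Ruby 1.9 or later, where Hash#select returns a Hash, the array-of-pairs result can still be seen by going through the generic Enumerable#select, for instance via the Enumerator that a block-less each returns. A small sketch of the two results side by side:

```ruby
h = { 'foo' => 'bar', 'baz' => 'qux' }

# Generic Enumerable#select, reached through an Enumerator: array of pairs.
generic = h.each.select { |k, v| k == 'baz' }

# Hash's own select and reject (Ruby 1.9 and later): both return a Hash.
selected = h.select { |k, v| k == 'baz' }
rejected = h.reject { |k, v| k == 'baz' }

p generic    # prints [["baz", "qux"]]
p selected   # prints {"baz"=>"qux"} (newer Rubies print {"baz" => "qux"})
p rejected   # prints {"foo"=>"bar"} (newer Rubies print {"foo" => "bar"})
```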

Because it _mostly_ is an integer. So, if I have an Enumerable
object that I perform some operation on; where does it say in
duck-typing that I should get an Enumerable object back?

It is not duck-typing promising me an enumerable.
When calling enum.to_a I _expect_ an Array and nothing else.

It is the _concept_ of "select" that promises a Hash when calling it on a Hash.
Enumerability (and therefore the possibility to _iterate_) just happens to be a requirement for selecting.

But as Yossef pointed out already, it doesn't have to do all that much with duck typing.
Duck typing allows me to let a class share methods with other classes (provided some conditions on these classes are met)
without specifying the exact classes beforehand.
A la: This is an "enumerable", it must be able to "select" some of its elements with a specified rule.

You seem to imply, that in order to allow for classes to mix Enumerable in, we would have to accept
to get Arrays back for filter operations as a "least common denominator".

I disagree, and in fact that is the whole point of my first post.

By adding a second requirement to the class that wants to mix in Enumerable (implementing an "appendElement" method or, as Array calls it already "<<")
we can let go of that arbitrary and confusing "LCD": Any filtering method returns the class of the object that gets filtered.
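
Applied to Hash, the second requirement might look like the sketch below. PairHash, SameClassSelect and select_same are hypothetical names, and treating << as "store this [key, value] pair" is an assumption of this example, not existing Hash behavior.

```ruby
# Hypothetical mixin: build the result with the receiver's class via #<<.
module SameClassSelect
  def select_same
    result = self.class.new
    each { |element| result << element if yield(element) }
    result
  end
end

# Hypothetical Hash subclass that satisfies the "<<" requirement.
class PairHash < Hash
  include SameClassSelect

  # Accept a [key, value] pair, the element shape that Hash#each yields
  # to a one-parameter block.
  def <<(pair)
    key, value = pair
    self[key] = value
    self
  end
end

h = PairHash.new
h << ['foo', 'bar'] << ['baz', 'qux']

picked = h.select_same { |k, v| k == 'baz' }
p picked.class                # prints PairHash
p picked == { 'baz' => 'qux' } # prints true
```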

Just by its very nature, duck-typing will forever be laden with
inconsistencies.

I do not think so.

The LCD object return is a simple way around it. I
agree with the general consensus here that #select and #reject should
return the same way for a Hash, but you have to keep in mind that
writing a pragmatic language that does what you want is like trying to
assume everything about you as a programmer before you start typing.
To do that, maybe the designer should be using the LCD for pretty much
everything :slight_smile:

I do not agree. I have not yet heard of the concept of "LCD" being necessary to work with duck typing.

Sincerely yours,
Alex

···

Am 21.06.2007 um 19:31 schrieb Todd Benson:

...
If Enumerable is made more agnostic about the class it is mixed into
(by letting the class itself provide a method to add an "element" to
an instance of itself), it becomes truly mixable into anything
that provides "each" and "<<".
Then doing reject on any class that mixes in Enumerable will yield a
filtered instance of that class, not Array.

That said - I think there should be no such thing as TomsEnum or any
special implementation.
Enumerable is the place to define methods for all things containing
enumerable elements.

>>
>> The other option would require an #each_assoc method (maybe assoc
>> isn't the best term, but anyhow...)
>>
>> module Enumerable
>>   def select_assoc(&blk)
>>     h = {}
>>     each_assoc{|k,v| h[k]=v if blk[k,v]}
>>     h
>>   end
>> end
>>
>> The downside here of course, is twice the number of Enumerable
>> methods.
> And although I cannot imagine a case, how do we know that there are
> not Enumerables that take three or forty-two params ;).
> After all Enumerable is a Mixin and we have to be prepared that it be
> mixed in, right?

Yes, one could not possibly provide for all possible Mixees like this.

Yours,
Alex

A duck is a bird. It doesn't behave exactly like every other bird you
know about. But you can be relatively certain it has wings. I
suspect the least common denominator return of the object is there for
several reasons, including testing, ease of the ruby language
development, etc. Why the difference for the Hash returning a
different object between select and reject? I think it's one of those
oversight things.

...

Todd

What no one seems to have (directly) focussed on is that Hash#each gives two-element [k,v] arrays back as the content. Is this what Alexander Presber means with:

Enumerable is the place to define methods for all things containing
enumerable elements.

It's how you define the elements that are enumerable. Hash#each_key, Hash#each_value, and Hash#each_pair make this explicit. Hash#each being simply Hash#each_pair (the documentation says it's the other way around, but if they're synonyms what does it matter) makes the definition of a "pair" or more explicitly "key-value pair" a good place to focus.

Perhaps a Hash could be a collection of "entries", so that #each yielded something closer to {k=>v} rather than an array.

s = [1, 2, 3]
r = s.class.new
s.each {|e| r << e}
r
=> [1, 2, 3]

Currently, if we replace the value of the array s with a hash { 1 => 'uno', 2 => 'dos', 3 => 'tres' }
you'd need something like:

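   class Hash
     # Let << merge a one-entry hash such as {k=>v} into the receiver.
     alias_method :<<, :update
     # Keep the built-in iterator reachable under another name.
     alias_method :orig_each, :each
     # Yield each entry as a one-entry hash instead of a [k,v] array.
     def each
       orig_each {|k,v| yield({k=>v}) }
     end
   end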

to get a similar result:

s={1=>'uno',2=>'dos',3=>'tres'}
r = s.class.new
s.each {|e| r << e}
r
=> {1=>"uno", 2=>"dos", 3=>"tres"}

If the notation to "unpack" block arguments, .each {|(k,v)| ... }, would pick apart a hash entry like {k=>v} the same way that it does [k,v] (which, of course, doesn't actually need the parentheses), then the "syntactic compatibility" would be close enough for my brain, eyes, and fingers. I think that the arity of the block that each_pair expects should remain 2 and each_pair should yield([k,v]) (so the meaning of "pair" for English speakers is maintained; I realize I'm being anglophilic here). The arity of the block expected by #each would be 1.

It seems to me that you'd also get "benefits" like:

{1=>"uno", 2=>"dos", 3=>"tres"}.sort_by {|k,v| v}
=> [{2=>"dos"}, {3=>"tres"}, {1=>"uno"}]

Since you'd clearly need to transform to a type that was ordered if you wanted to "sort" an unordered Hash.

And if there were a #<=> defined for a Hash entry, you could just use #sort.
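
As a rough illustration of that last point, here is a sketch with a made-up Entry struct standing in for a hash entry. Ordering entries by value first and key second is an arbitrary choice for the example.

```ruby
# Hypothetical Entry: one key/value association that knows how to compare itself.
Entry = Struct.new(:key, :value) do
  include Comparable

  # Compare by value, falling back to the key when values are equal.
  def <=>(other)
    [value, key] <=> [other.value, other.key]
  end
end

entries = { 1 => 'uno', 2 => 'dos', 3 => 'tres' }.map { |k, v| Entry.new(k, v) }

# With <=> in place, plain sort works and no sort_by block is needed.
sorted = entries.sort
p sorted.map(&:key)     # prints [2, 3, 1]
p sorted.map(&:value)   # prints ["dos", "tres", "uno"]
```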

This is clearly a slippery slope because you'd then have to redefine Hash#shift to give {k=>v} rather than [k,v] also. You'd almost certainly want something like #first and #last to give the key and value from {k=>v} like they'd pull those parts from [k,v].

[Blah! I took too long to answer and Alex(ander) got another response to Todd in with some of the same ideas, but I decided to post anyway.]

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

···

On Jun 21, 2007, at 1:31 PM, Todd Benson wrote:

On 6/21/07, Alexander Presber <aljoscha@weisshuhn.de> wrote:

> A duck is a bird. It doesn't behave exactly like every other bird you
> know about. But you can be relatively certain it has wings.

I am on your side up to here :slight_smile:

> I suspect the least common denominator return of the object is
> there for
> several reasons, including testing, ease of the ruby language
> development, etc. Why the difference for the Hash returning a
> different object between select and reject? I think it's one of those
> oversight things.
>
> Thinking about LCD another way, why is it that "13s".to_i gives me 13?

13 is a good result for "13s".to_i, I couldn't come up with a better
one: You ask the string class to
give you an integer (hence the _i) for a certain string and it does
so in an (arguably) optimal way.

However, [["baz", "qux"]] is not a good result for
{'foo' => 'bar', 'baz' => 'qux'}.select{|k,v| k=='baz' },
{'baz' => 'qux'} would be much more sensible.

Can you see the difference?

Sure. Why does [1,2,3].delete(3) return the integer 3 instead of the
array? It's just semantics. What you think is a duck is not
necessarily what somebody else thinks is a duck.

> Because it _mostly_ is an integer. So, if I have an Enumerable
> object that I perform some operation on; where does it say in
> duck-typing that I should get an Enumerable object back?

It is not duck-typing promising me an enumerable.
When calling enum.to_a I _expect_ an Array and nothing else.

I don't know what you expect, but I think you expect some semblance of
continuity (in English).

It is the _concept_ of "select" that promises a Hash when calling it
on a Hash.

_concept_ of "select", again that's what you think it should mean.

Enumerability (and therefore the possibility to _iterate_) just
happens to be a requirement for selecting.

But as Yossef pointed out already, it doesn't have to do all that
much with duck typing.
Duck typing allows me to let a class share methods with other classes
(provided some conditions on these classes are met)
without specifying the exact classes beforehand.
A la: This is an "enumerable", it must be able to "select" some of
its elements with a specified rule.

You seem to imply, that in order to allow for classes to mix
Enumerable in, we would have to accept
to get Arrays back for filter operations as a "least common
denominator".

Okay, here's where you make a good point. Array actually is _not_ the
LCD of an Enumerable object logistically. But it may very well be the
LCD at the bottom level; the inner workings of Ruby. That's a bold
thing for me to say because I haven't read the Ruby source code yet.

I disagree, and in fact that is the whole point of my first post.

By adding a second requirement to the class that wants to mix in
Enumerable (implementing an "appendElement" method or, as Array calls
it already "<<")
we can let go of that arbitrary and confusing "LCD": Any filtering
method returns the class of the object that gets filtered.

> Just by its very nature, duck-typing will forever be laden with
> inconsistencies.

I do not think so.

If you can have objects be inconsistent, your "pragmatic program"
means naught to me unless I already know where you're coming from --
cost of flexibility argument.

> The LCD object return is a simple way around it. I
> agree with the general consensus here that #select and #reject should
> return the same way for a Hash, but you have to keep in mind that
> writing a pragmatic language that does what you want is like trying to
> assume everything about you as a programmer before you start typing.
> To do that, maybe the designer should be using the LCD for pretty much
> everything :slight_smile:

I do not agree. I have not yet heard of the concept of "LCD" being
necessary to work with duck typing.

I didn't say necessary.

I guess the point is moot because we seem to all agree that #select
and #reject should be polar opposites (I'm leaving out 3-valued logic
when I say that).

I was just pointing out that reject and select may not mean the same
thing to you as it does to me. When you don't know what the person
expects as a return value, you give them something that you yourself
might expect.

Todd

···

On 6/21/07, Alexander Presber <aljoscha@weisshuhn.de> wrote:

Am 21.06.2007 um 19:31 schrieb Todd Benson:

Wow. Even though I've dealt with this all the time, that thought
didn't occur to me.

···

On 6/21/07, Rob Biedenharn <Rob@agileconsultingllc.com> wrote:

What no one seems to have (directly) focussed on is that Hash#each
gives two-element [k,v] arrays back as the content.

That's right. And if you wanted things to be "just ducky", it would
have to give only the v. I've actually argued in favor of that before,
b/c it's not unreasonable to see that an Array index is like a Hash
key. So, for a full parallel we'd need to see something like:

  {:x=>'m'}.each { |v| v } # v #=> 'm'

  ['m'].each { |v| v } # v #=> 'm'

  {'x'=>'m'}.each_assoc { |a| a } # a #=> ['x','m']

  ['m'].each_assoc { |a| a } # a #=> [0,'m']

One could easily argue that an Assoc class would be quite useful here,
rather than relying on 2-element Array to fulfill the role. With that
in hand, it would be easy enough to add the #<< method for enumerable
construction.
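
A sketch of how each_assoc could behave for both classes, together with the select_assoc helper quoted earlier in the thread. Neither method exists in Ruby; reopening Hash, Array and Enumerable here is purely for illustration.

```ruby
class Hash
  # Hypothetical: yield every [key, value] association as a single object.
  def each_assoc
    return to_enum(:each_assoc) unless block_given?
    each_pair { |k, v| yield [k, v] }
    self
  end
end

class Array
  # Hypothetical: treat the index as the key, yielding [index, value].
  def each_assoc
    return to_enum(:each_assoc) unless block_given?
    each_with_index { |v, i| yield [i, v] }
    self
  end
end

module Enumerable
  # Hypothetical: filter associations and always hand back a Hash.
  # Only works for classes that define each_assoc as above.
  def select_assoc
    result = {}
    each_assoc { |k, v| result[k] = v if yield(k, v) }
    result
  end
end

p({ 'x' => 'm' }.each_assoc.to_a)   # prints [["x", "m"]]
p ['m'].each_assoc.to_a             # prints [[0, "m"]]
p ['a', 'b'].select_assoc { |i, v| i > 0 } == { 1 => 'b' }   # prints true
```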

T.

···

On 6/21/07, Rob Biedenharn <R...@agileconsultingllc.com> wrote:

> What no one seems to have (directly) focussed on is that Hash#each
> gives two-element [k,v] arrays back as the content.

Regardless of what we think is the best, it seems pretty clear to me
that the duplicate of the Hash is broken during a delete_if, reject,
select, etc. Thus, the array, and thus my comment about the array
being the LCD. Like I said, I haven't read the Ruby source, but I'm
starting to get curious enough to do so.

···

On 6/21/07, Trans <transfire@gmail.com> wrote:

> On 6/21/07, Rob Biedenharn <R...@agileconsultingllc.com> wrote:
>
> > What no one seems to have (directly) focussed on is that Hash#each
> > gives two-element [k,v] arrays back as the content.

That's right. And if you wanted things to be "just ducky", it would
have to give only the v. I've actually argued in favor of that before,
b/c it's not unreasonable to see that an Array index is like a Hash
key.

Hi --

What no one seems to have (directly) focussed on is that Hash#each
gives two-element [k,v] arrays back as the content.

That's right. And if you wanted things to be "just ducky", it would
have to give only the v. I've actually argued in favor of that before,
b/c it's not unreasonable to see that an Array index is like a Hash
key. So, for a full parallel we'd need to see something like:

{:x=>'m'}.each { |v| v } # v #=> 'm'

['m'].each { |v| v } # v #=> 'm'

{'x'=>'m'}.each_assoc { |a| a } # a #=> ['x','m']

['m'].each_assoc { |a| a } # a #=> [0,'m']

One could easily argue that an Assoc class would be quite useful here,
rather than relying on 2-element Array to fulfill the role. With that
in hand, it would be easy enough to add the #<< method for enumerable
construction.

I don't think there's any reason to expect just the value when
iterating through a hash. In fact if you're using a hash, you
probably have a reason for storing data that way and iterating
pair-wise seems logical to me.

I think it's a question of how one looks at the underlying types or
behaviors. I don't think the language has to converge around the
smallest possible number of interfaces. You can look at it the other
way around. It's useful (extremely) to have hashes, and to iterate
over them in pairs. Enumerable is one way to help introduce that
construct into the language -- not a one-stop-shopping hash
implementation, but helpful.

If we then find fault with hash behavior because it's not in line
precisely with other enumerables, that's a kind of reverse logic; it's
a way to talk the language out of having something useful, which I
don't think is a good idea. Enumerable not only allows but requires
that each class implement #each, and there's no constraint that every
enumerable class has to yield exactly one value at a time. I'd want
to see more concrete evidence that having hashes and arrays behave
differently is really creating problems before wanting to normalize
them around one construct.
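
To make the point about #each concrete, a small sketch: the Grid class below is invented for this example and yields two values per step. Enumerable copes by packing each step's values into an array wherever it has to hand back an element.

```ruby
# Hypothetical class whose #each yields two values at a time.
class Grid
  include Enumerable

  def each
    yield 0, :a
    yield 1, :b
    self
  end
end

grid = Grid.new

p grid.to_a                            # prints [[0, :a], [1, :b]]
p grid.select { |i, s| i > 0 }         # prints [[1, :b]]
p grid.map { |i, s| "#{i}#{s}" }       # prints ["0a", "1b"]
```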

David

···

On Fri, 22 Jun 2007, Trans wrote:

On 6/21/07, Rob Biedenharn <R...@agileconsultingllc.com> wrote:

--
* Books:
   RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
   RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
     & consulting: Ruby Power and Light, LLC (http://www.rubypal.com)

Todd Benson wrote:

···

On 6/21/07, Trans <transfire@gmail.com> wrote:

> On 6/21/07, Rob Biedenharn <R...@agileconsultingllc.com> wrote:
>
> > What no one seems to have (directly) focussed on is that Hash#each
> > gives two-element [k,v] arrays back as the content.

That's right. And if you wanted things to be "just ducky", it would
have to give only the v. I've actually argued in favor of that before,
b/c it's not unreasonable to see that an Array index is like a Hash
key.

Regardless of what we think is the best, it seems pretty clear to me
that the duplicate of the Hash is broken during a delete_if, reject,
select, etc.

That's one reason I'd like to see more #to_hash methods lying around.

--
Alex

I don't necessarily disagree with you. I was following the natural
conclusion of one possible perspective, namely, that a hash key
corresponds to the array index. It's one possible way to fix the
issue posited by the thread. And yet, if our hashes were in fact
ordered, as some have asked for, this assumption would fail. So I'm
not actually for it, but it does offer some contrast.

Consider the ordered hash perspective. We have an index, plus a key and
a value. So in this case, what exactly are we enumerating? We say
"pairs" as if it is something, but Ruby doesn't really have such a
thing. The closest we come to is a 2-element array. Perhaps that is
enough, but it's hardly embraced as such. We see it only in the
iteration of #each _if_ we use a single var. There is no
Hash.new([:a,'x'],[:b,'y']) or hash << [:a,'x'], etc. If there were, I
think this issue wouldn't exist. I think Enumerable would be a little
more robust, and we could expect that a Hash be returned from #select.
While the 2-element array covers the need, maybe not so much the want,
and we might even consider a real Pair object:

  pair = (:a => 'x')
  pair.key #=> :a
  pair.value #=> 'x'

If we don't take this perspective (regardless of an actual Pair
class, or not) I don't see any good reason to have Enumerable included
in Hash. Just define the desired "enumerating" methods on Hash itself
--just like #reject. But personally, I'd prefer we get Enumerable
right.

And while we're on the subject --it seems that's exactly what we're
doing with String. I hear that String will no longer be Enumerable in
future Ruby. I really just don't get this. What's so problematic with
a default "view" of strings as ordered-sets of characters? All it
requires is the proper definition of #each. Clearly the way things are
now is broken. But does this really require us to scrap String
enumerability altogether? At the very least, how terribly inefficient
it will be to have to convert a string to an array of characters (eg.
bunches of little strings), just to iterate over it.
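
For what it's worth, on Ruby 1.9 and later String no longer includes Enumerable, but each_char without a block returns an Enumerator, so the Enumerable methods stay within reach without first building an array of all the characters (each character is still handed over as its own little string). A quick sketch:

```ruby
text = "duck typing"

# String is not Enumerable on Ruby 1.9 and later.
string_is_enumerable = String.include?(Enumerable)

# each_char without a block returns an Enumerator over the characters.
chars = text.each_char
vowels = chars.select { |c| "aeiou".include?(c) }

p string_is_enumerable   # prints false
p chars.class            # prints Enumerator
p vowels                 # prints ["u", "i"]
```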

T.

···

On Jun 25, 9:47 am, dbl...@wobblini.net wrote:

Hi --

On Fri, 22 Jun 2007, Trans wrote:

>> On 6/21/07, Rob Biedenharn <R...@agileconsultingllc.com> wrote:

>>> What no one seems to have (directly) focussed on is that Hash#each
>>> gives two-element [k,v] arrays back as the content.

> That's right. And if you wanted things to be "just ducky", it would
> have to give only the v. I've actually argued in favor of that before,
> b/c it's not unreasonable to see that an Array index is like a Hash
> key. So, for a full parallel we'd need to see something like:

> {:x=>'m'}.each { |v| v } # v #=> 'm'

> ['m'].each { |v| v } # v #=> 'm'

> {'x'=>'m'}.each_assoc { |a| a } # a #=> ['x','m']

> ['m'].each_assoc { |a| a } # a #=> [0,'m']

> One could easily argue that an Assoc class would be quite useful here,
> rather than relying on 2-element Array to fulfill the role. With that
> in hand, it would be easy enough to add the #<< method for enumerable
> construction.

I don't think there's any reason to expect just the value when
iterating through a hash. In fact if you're using a hash, you
probably have a reason for storing data that way and iterating
pair-wise seems logical to me.

I think it's a question of how one looks at the underlying types or
behaviors. I don't think the language has to converge around the
smallest possible number of interfaces. You can look at it the other
way around. It's useful (extremely) to have hashes, and to iterate
over them in pairs. Enumerable is one way to help introduce that
construct into the language -- not a one-stop-shopping hash
implementation, but helpful.

If we then find fault with hash behavior because it's not in line
precisely with other enumerables, that's a kind of reverse logic; it's
a way to talk the language out of having something useful, which I
don't think is a good idea. Enumerable not only allows but requires
that each class implement #each, and there's no constraint that every
enumerable class has to yield exactly one value at a time. I'd want
to see more concrete evidence that having hashes and arrays behave
differently is really creating problems before wanting to normalize
them around one construct.