Mode method for Array

Hi,

I'd like to write a get_mode method for the Array class. The method would return an array of the most frequently occurring element or elements.

So [3, 1, 1, 55, 55].get_mode would return [1, 55].

I have a way to do this but I don't know if it's the best way. I was wondering if anyone had any suggestions?

Thanks!

What is your way? Maybe we can get some idea of what criteria you are using
to find the most frequent elements. Using something like

irb(main):001:0> [3,1,1,55,55].inject(Hash.new(0)){|memo,item| memo[item] += 1; memo}.sort_by {|e| e[1]}.reverse
=> [[55, 2], [1, 2], [3, 1]]

can return you some elements ordered by frequency.
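
If only the tied leaders are wanted, the sorted pairs can be trimmed with take_while. A minimal sketch, assuming Ruby 1.9 or later for each_with_object; the method name `leaders` is made up for illustration.

```ruby
# Count occurrences, sort by descending count, then keep only the
# leading run of pairs that share the highest count.
def leaders(items)
  counts = items.each_with_object(Hash.new(0)) { |item, memo| memo[item] += 1 }
  pairs  = counts.sort_by { |_, n| -n }
  return [] if pairs.empty?

  top = pairs.first[1]
  pairs.take_while { |_, n| n == top }.map { |item, _| item }
end

# sort_by is not stable, so the result is sorted here for display.
p leaders([3, 1, 1, 55, 55]).sort   # prints [1, 55]
```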

Facets has:

  module Enumerable

    # In Statistics mode is the value that occurs most
    # frequently in a given set of data.

    def mode
      count = Hash.new(0)
      each {|x| count[x] += 1 }
      count.sort_by{|k,v| v}.last[0]
    end

  end

Hmm.. but that thwarts ties. I'll have to consider how to fix.
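
One possible way to keep ties, sketched as a standalone helper rather than as the actual Facets fix (the name `modes_of` is hypothetical): find the highest count first, then keep every element that reaches it.

```ruby
# Returns every element that occurs the maximum number of times.
def modes_of(enum)
  count = Hash.new(0)
  enum.each { |x| count[x] += 1 }
  top = count.values.max          # nil when enum is empty
  count.select { |_, v| v == top }.map { |k, _| k }
end

p modes_of([3, 1, 1, 55, 55]).sort   # prints [1, 55]
p modes_of([])                       # prints []
```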

T.

Using Enumerable#cluster_by (already defined in Facets):

module Enumerable
  def mode
    cluster_by do |element|
      element
    end.cluster_by do |cluster|
      cluster.length
    end.last.ergo do |clusters|
      clusters.transpose.first
    end # || []
  end
end

gegroet,
Erik V.

There's one more problem with your code: [].mode doesn't work.

gegroet,
Erik V.

Shame that the standard Hash#invert doesn't handle duplicate values
well. My suggestion:

class Hash
  def ninvert
    inject({}) { |h,(k,v)| (h[v] ||= []) << k; h }
  end
end

class Array
  def get_mode
    (inject(Hash.new(0)) { |h,e| h[e] += 1; h }.ninvert.max || [[]]).last
  end
end

p [3, 1, 1, 55, 55].get_mode
p [3, 1, 1, 55].get_mode
p [:foo, 3, "bar", :foo, 4, "bar"].get_mode
p [].get_mode

(with ruby 1.8, if there are multiple mode values you get them in an
arbitrary order; I think with 1.9 you'd get them in the order first seen
in the original array)
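
The ordering remark can be checked with a short sketch, assuming Ruby 1.9 or later, where a Hash keeps insertion order. The sample array is made up; it places 55 before 1 so the first-seen order is visible.

```ruby
# On Ruby 1.9 and later a Hash remembers insertion order, so keys come
# back in the order they were first seen in the source array.
counts = Hash.new(0)
[3, 55, 1, 1, 55].each { |e| counts[e] += 1 }
p counts.keys          # prints [3, 55, 1]

# Inverting count => elements therefore keeps first-seen order too.
by_count = {}
counts.each { |k, v| (by_count[v] ||= []) << k }
p by_count.max.last    # prints [55, 1]
```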

···

--
Posted via http://www.ruby-forum.com/.

Thanks Eric.

T.

···

On Sep 30, 10:12 pm, Erik Veenstra <erikv...@gmail.com> wrote:

There's one more problem with your code: [].mode doesn't work.

And since we all love speed, we tend to avoid inject. (For
those who don't know: inject and inject! are really, really
slow. I mean, _really_ slow...)

Speed is also the reason for "pre-defining" variables used in
iterations. (6% faster!!)

For low level methods like these, speed is much more important
than readability. And the inject versions of the methods below
aren't even more readable than the faster implementations.

So, we'll go for the fast ones:

module Enumerable
   def mode
     empty? ? [] : frequencies.group_by_value.max.last
   end

   def frequencies
     x = nil
     res = Hash.new(0)
     each{|x| res[x] += 1}
     res
   end
end

class Hash
   def group_by_value
     k = v = nil
     res = {}
     each{|k, v| (res[v] ||= []) << k}
     res
   end
end
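
The speed claim is easy to check locally. A rough sketch using the standard Benchmark library; the array size and labels are arbitrary choices, and the numbers will vary by Ruby version and machine, so no timings are shown.

```ruby
require 'benchmark'

# Sample data: 200,000 integers drawn from 1,000 distinct values.
data = Array.new(200_000) { |i| i % 1000 }

# Frequency count with inject, returning the hash from every block call.
via_inject = lambda do
  data.inject(Hash.new(0)) { |h, e| h[e] += 1; h }
end

# Frequency count with a plain each and an outer hash.
via_each = lambda do
  res = Hash.new(0)
  data.each { |e| res[e] += 1 }
  res
end

Benchmark.bm(8) do |bm|
  bm.report('inject:') { via_inject.call }
  bm.report('each:')   { via_each.call }
end
```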

gegroet,
Erik V. - http://www.erikveen.dds.nl/

Premature optimization IMHO.

Here's another nice and short one:

irb(main):001:0> module Enumerable
irb(main):002:1> def mode
irb(main):003:2> max = 0
irb(main):004:2> c = Hash.new 0
irb(main):005:2> each {|x| cc = c[x] += 1; max = cc if cc > max}
irb(main):006:2> c.select {|k,v| v == max}.map {|k,v| k}
irb(main):007:2> end
irb(main):008:1> end
=> nil
irb(main):009:0> [3, 1, 1, 55, 55].mode
=> [55, 1]
irb(main):010:0> [].mode
=> []
irb(main):011:0>

Cheers

robert

···

2008/10/1 Erik Veenstra <erikveen@gmail.com>:

Speed is also the reason for "pre-defining" variables used in
iterations. (6% faster!!)

--
remember.guy do |as, often| as.you_can - without end

Hi --

···

On Wed, 1 Oct 2008, Erik Veenstra wrote:

And since we all love speed, we tend to avoid inject. (For
those who don't know: inject and inject! are really, really
slow. I mean, _really_ slow...)

What's inject! ?

David

--
Rails training from David A. Black and Ruby Power and Light:
   Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
   Advancing with Rails January 19-22 Fort Lauderdale, FL *
   * Co-taught with Patrick Ewing!
See http://www.rubypal.com for details and updates!

I use inject for conceptual reasons. You can always refactor for speed later.

Todd

···

On Wed, Oct 1, 2008 at 6:29 AM, Erik Veenstra <erikveen@gmail.com> wrote:

And since we all love speed, we tend to avoid inject. (For
those who don't know: inject and inject! are really, really
slow. I mean, _really_ slow...)

Hi --

Speed is also the reason for "pre-defining" variables used in
iterations. (6% faster!!)

Premature optimization IMHO.

As much as I like inject, I have to say I've always felt that the ones
that look like this:

   inject({}) {|h,item| do_something; h }

are kind of unidiomatic. Evan Phoenix was saying recently on IRC (I
hope I'm remembering/quoting correctly) that his rule of thumb was
that inject was for cases where the accumulator was not the same
object every time, and that where a single object is having elements
added to it, an each iteration from the source collection was better.
I tend to agree, though I'm not able to come up with a very technical
rationale.
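
A small illustration of that rule of thumb (the sample data is invented): the first case produces a new accumulator value on every step, the second fills in one object.

```ruby
words = %w[apple bob cat bob]

# Accumulator is a new object on every step: a natural fit for inject.
total_length = words.inject(0) { |sum, w| sum + w.length }

# One object is filled in step by step: a plain each reads more directly.
lengths = {}
words.each { |w| lengths[w] = w.length }

p total_length   # prints 14
p lengths
```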

What say you, oh inject king?

David

What's inject! ?

It's the thing you're talking about, all day... :}

It's part of Facets:

http://facets.rubyforge.org/doc/api/core/classes/Enumerable.html#M000416

  def inject!(s)
    k = s
    each { |i| yield(k, i) }
    k
  end

gegroet,
Erik V.

David A. Black wrote:

What say you, oh inject king?

I don't know who that is, but I'll add my 2c anyway:

As much as I like inject, I have to say I've always felt that the ones
that look like this:

   inject({}) {|h,item| do_something; h }

are kind of unidiomatic.

I agree; 'inject' is ideally for when you're creating a new data
structure each iteration rather than modifying an existing one. You
could do

  inject({}) {|h,item| h.merge(something => otherthing)}

but that creates lots of waste.

I only used it as a convenient holder for the target object. Maybe
there's a more ruby-ish pattern where the target is the same each time
round, although I don't know what you'd call it:

  module Enumerable
    def into(obj)
      each { |e| yield obj, e }
      obj
    end
  end

  src = {:foo=>1, :bar=>1, :baz=>2}
  p src.into({}) { |tgt,(k,v)| (tgt[v] ||= []) << k }
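
For comparison, Ruby 1.9 and later ship Enumerable#each_with_object, which plays much the same role as the `into` sketch above, except that the block receives the element first and the target second. A minimal sketch with the same sample data:

```ruby
src = { :foo => 1, :bar => 1, :baz => 2 }

# The target object is passed to every block call and returned at the end.
inverted = src.each_with_object({}) { |(k, v), tgt| (tgt[v] ||= []) << k }
p inverted
```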

There was also a previous suggestion of generalising map so that it
would build into an arbitrary object, not just an array.

  module Enumerable
    def map2(target = [])
      each { |e| target << (yield e) }
      target
    end
  end

  p [1,2,3].map2 { |e| e * 2 }

  class Hash
    def <<(x)
      self[x[0]] = x[1]
    end
  end

  p [1,2,3].map2({}) { |e| [e, e * 2] }

That would allow any target which implements :<<, so map to $stdout
would be fine.
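
To make that concrete, here is a sketch that writes mapped values into a StringIO instead of $stdout, so the result can be inspected. `map_into` is a hypothetical standalone version of `map2` that avoids reopening Enumerable.

```ruby
require 'stringio'

# Appends each mapped value to any target that responds to <<.
def map_into(source, target = [])
  source.each { |e| target << (yield e) }
  target
end

# An IO-like target: each mapped value is written as a line.
out = map_into([1, 2, 3], StringIO.new) { |e| "#{e * 2}\n" }
p out.string                                      # prints "2\n4\n6\n"

# A String target: each mapped value is appended to the string.
p map_into([1, 2, 3], String.new) { |e| e.to_s }  # prints "123"
```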

It's not so useful here, since we'd need a :<< method suitable for hash
inversion. And I suppose for completeness, you'd need a wrapper class
analogous to Enumerator to map :<< to an arbitrary method name...
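
Such a wrapper might look roughly like this. The class name `AppendVia` is invented here and exists in neither Ruby nor Facets; it simply forwards << to whatever method name it is given (assuming Ruby 1.9 or later for public_send).

```ruby
require 'set'

# Adapts an arbitrary one-argument method so it can be driven through <<.
class AppendVia
  attr_reader :target

  def initialize(target, method_name)
    @target = target
    @method_name = method_name
  end

  def <<(item)
    @target.public_send(@method_name, item)
    self   # allow chaining, as Array#<< does
  end
end

# Drive Set#add through <<; duplicates are dropped by the Set.
sink = AppendVia.new(Set.new, :add)
[1, 2, 2, 3].each { |e| sink << e * 10 }
p sink.target.to_a   # prints [10, 20, 30]
```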

Hi --

Speed is also the reason for "pre-defining" variables used in
iterations. (6% faster!!)

Premature optimization IMHO.

As much as I like inject, I have to say I've always felt that the ones
that look like this:

inject({}) {|h,item| do_something; h }

are kind of unidiomatic. Evan Phoenix was saying recently on IRC (I
hope I'm remembering/quoting correctly) that his rule of thumb was
that inject was for cases where the accumulator was not the same
object every time, and that where a single object is having elements
added to it, an each iteration from the source collection was better.

In that case #map might be more appropriate - at least if the target
collection is an Array. Btw, did we ever discuss having #map accept a
parameter which defaults to []? i.e.

module Enumerable
  def map(target = [])
    each {|x| target << yield(x)}
    target
  end
end

I tend to agree, though I'm not able to come up with a very technical
rationale.

What say you, oh inject king?

Um..., I kind of agree about the unidiomaticness. It's ugly. These
are certainly much nicer

inject(0) {|h,item| item + h }
inject("") {|s,item| s << item }

I have to admit I use it sparingly these days. :-)

Kind regards

robert

Hi --

In that case #map might be more appropriate - at least if the target
collection is an Array. Btw, did we ever discuss having #map accept a
parameter which defaults to []? i.e.

module Enumerable
def map(target = [])
   each {|x| target << yield(x)}
   target
end
end

I don't think it's a good idea; it generalizes the idea of a mapping
of a collection beyond anything that really seems to me to be a
mapping. If I saw:

   [1,2,3,4,5] => "Hi."

I would not just say it's a weird mapping; I would not be able to
identify it as a mapping at all. It's a more general transformation.

David

I'm not sure I understand exactly what you mean. My point was simply
to allow the caller to provide the target collection where something
is mapped to. Of course you can then also use a String or anything
else that supports << (a stream for example) which I find quite neat.
But where does "=>" syntax come into play?

Kind regards

robert

Hi --

I'm not sure I understand exactly what you mean. My point was simply
to allow the caller to provide the target collection where something
is mapped to. Of course you can then also use a String or anything
else that supports << (a stream for example) which I find quite neat.
But where does "=>" syntax come into play?

It doesn't; I'm just using it to separate the collection and a
potential "mapping".

   [1,2,3,4,5].map("") {...} # Result: "Hi."

I don't think that << correctly represents the concept of mapping, so
I would not want map to be generalized to any <<-capable target
object. It's more a <<'ing, or something, than a mapping. It happens
that the current behavior of map can be implemented using << and an
empty array, but I don't think that means that << per se is at the
heart of mapping.

David

Ah, now I get your point. Thanks for elaborating. So you'd rather call such a method #append or similar.

Kind regards

  robert

···

On 01.10.2008 18:17, David A. Black wrote:

It doesn't; I'm just using it to separate the collection and a
potential "mapping".

   [1,2,3,4,5].map("") {...} # Result: "Hi."

I don't think that << correctly represents the concept of mapping, so
I would not want map to be generalized to any <<-capable target
object. It's more a <<'ing, or something, than a mapping. It happens
that the current behavior of map can be implemented using << and an
empty array, but I don't think that means that << per se is at the
heart of mapping.

Why not? Building up a collection requires some means of "building
up", and #<< is that means. Standardizing around that method allows
for a more comprehensible and flexible system. Is it a semantic thing
for you? Would it help to think of #collect, instead of #map?

In any case, Robert's idea had to do with providing an initial
collection with which to build, whether #<< is used to do that or not.

T.
