a = [4,5,6,4,5,6,6,7]
result = Hash.new(0)
a.each { |x| result[x] += 1 }
p result
The result I am getting
{4=>2, 5=>2, 6=>3, 7=>1}
is what I want.
Is there a better way; perhaps using uniq?
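Since the question mentions uniq, here is a sketch of what a uniq-based version could look like (not from the thread); note it re-scans the array once per distinct value:

```ruby
a = [4, 5, 6, 4, 5, 6, 6, 7]

# Map each distinct value to its count. This is O(n * k) for k distinct
# values, so it is slower than a single pass for large inputs.
result = Hash[a.uniq.map { |x| [x, a.count(x)] }]
p result  # => {4=>2, 5=>2, 6=>3, 7=>1}
```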
The first that came to my mind.
[4,5,6,4,5,6,6,7].inject(Hash.new(0)) {|res, x| res[x] += 1; res }
Is it good enough for you?
On Jan 16, 2012, at 5:51 PM, Ralph Shnelvar wrote:
a = [4,5,6,4,5,6,6,7]
result = Hash.new(0)
a.each { |x| result[x] += 1 }
p result
The result I am getting
{4=>2, 5=>2, 6=>3, 7=>1}
is what I want.
Is there a better way; perhaps using uniq?
I like this
a = [4,5,6,4,5,6,6,7]
# 1
p Hash[a.group_by{|n|n}.map{|k, v|[k, v.size]}]
# 2
p Hash.new(0).tap{|h|a.each{|n|h[n] += 1}}
2012/1/17 Ralph Shnelvar <ralphs@dos32.com>:
a = [4,5,6,4,5,6,6,7]
result = Hash.new(0)
a.each { |x| result[x] += 1 }
p result
The result I am getting
{4=>2, 5=>2, 6=>3, 7=>1}
is what I want.
Is there a better way; perhaps using uniq?
If your data items are integers, and from a rather small range (compared
to computer memory...), then you can use an array instead of a hash:
maxval = 10
result = Array.new(maxval+1, 0)
ar.each{ |x| result[x] += 1 }
This returns an array and not a hash.
[0, 0, 0, 0, 2, 2, 3, 1, 0, 0, 0]
To make a histogram, that data structure is even better. Otherwise you
need to transform it to a hash again. But for large data sets I still
expect it to be faster:
Your cpu does not need to calculate a hash key of every single data
item, because the data item is already a perfect key for the array. Also
no hash key collisions can occur.
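The back-conversion mentioned above ("transform it to a hash again") might look like this sketch, skipping the empty slots:

```ruby
# Positional counts for the sample data [4,5,6,4,5,6,6,7] with maxval = 10.
hist = [0, 0, 0, 0, 2, 2, 3, 1, 0, 0, 0]

# The index is the value; keep only non-zero counters.
result = {}
hist.each_with_index { |count, value| result[value] = count if count > 0 }
p result  # => {4=>2, 5=>2, 6=>3, 7=>1}
```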
Regards
Karsten Meier
--
Posted via http://www.ruby-forum.com/.
Ok, I tried some benchmarks. We have now even more variables, as they
also depend on "maxval" from the dataset.
maxval = 1000
ar = [].tap{|a| 1_000_000.times {a << rand(maxval)}}
b.report("Meier:") {
n.times {
hist = Array.new(maxval+1, 0)
ar.each{|x| hist[x] += 1;}
result = Hash.new(0)
0.upto(maxval){|i| result[i] = hist[i] unless hist[i] == 0}
result
}
}
On my jruby and my Windows MRI 1.8.7 my algorithm was fastest for
maxval of 10, 100 or 10000, for example:
SIZE
1000000
MAXVAL
10000
user system total real
Ralph Shneiver: 0.533000 0.000000 0.533000 ( 0.518000)
Meier: 0.312000 0.000000 0.312000 ( 0.312000)
Keinich #1 0.814000 0.000000 0.814000 ( 0.814000)
(I have no 1.9.3 yet on my windows PC, so it may be different there)
But here are two observations:
1) The speed-up is not as big as I expected. In C, I expect array lookup to
be factors better than hash calculation (followed by an array lookup in
the hash table...). In Ruby it seems to be not much faster. But the
speedup gets bigger for bigger values of maxval.
2) My algorithm sometimes runs much, much slower when #kennich1 had run
before mine.
It seems to get worse with big values of maxval, but not
with the jruby --1.9 option. It is not the array allocation itself that is the
problem.
Is it possible that group_by changes the internal array structure, so I
get a non-contiguous array?
Regards
Karsten Meier
I think this is a misuse of inject, personally, every time I see it. It's
harder to read and it doesn't give the feeling of actually "reducing"
(inject's alias) the array down to one thing. The required `; res` is a
sign of that. Compare:
[1, 2, 3, 4].inject(5) { |a, b| a + b }
On Mon, Jan 16, 2012 at 16:00, Sigurd <cu9ypd@gmail.com> wrote:
[4,5,6,4,5,6,6,7].inject(Hash.new(0)) {|res, x| res[x] += 1; res }
Kenichi,
Monday, January 16, 2012, 9:21:51 AM, you wrote:
2012/1/17 Ralph Shnelvar <ralphs@dos32.com>:
a = [4,5,6,4,5,6,6,7]
result = Hash.new(0)
a.each { |x| result[x] += 1 }
p result
The result I am getting
{4=>2, 5=>2, 6=>3, 7=>1}
is what I want.
Is there a better way; perhaps using uniq?
I like this
a = [4,5,6,4,5,6,6,7]
# 1
p Hash[a.group_by{|n|n}.map{|k, v|[k, v.size]}]
# 2
p Hash.new(0).tap{|h|a.each{|n|h[n] += 1}}
I like #2. I can understand it. I'm still having trouble wrapping my head around #1.
Having said that, is your #2 better than mine in any dimension (comprehensibility and/or speed of execution)?
If your data items are integers, and from a rather small range (compared
to computer memory...), then you can use an array instead of a hash:
maxval = 10
result = Array.new(maxval+1, 0)
ar.each{ |x| result[x] += 1 }
This returns an array and not a hash.
[0, 0, 0, 0, 2, 2, 3, 1, 0, 0, 0]
To make a histogram, that data structure is even better. Otherwise you
need to transform it to a hash again. But for large data sets I still
expect it to be faster:
Don't expect, measure. There's Benchmark...
Your cpu does not need to calculate a hash key of every single data
item, because the data item is already a perfect key for the array. Also
no hash key collisions can occur.
But if only few of the numbers in the range are used you waste a
potentially large Array for just a few entries.
Kind regards
robert
On Mon, Jan 23, 2012 at 11:15 AM, Karsten Meier <developer@handylearn-projects.de> wrote:
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
Ok, I tried some benchmarks. We have now even more variables, as they
also depend on "maxval" from the dataset.
maxval = 1000
ar = [].tap{|a| 1_000_000.times {a << rand(maxval)}}
b.report("Meier:") {
n.times {
hist = Array.new(maxval+1, 0)
ar.each{|x| hist[x] += 1;}
result = Hash.new(0)
0.upto(maxval){|i| result[i] = hist[i] unless hist[i] == 0}
You could also do
hist.each_with_index {|c,i| result[i] = c if c.nonzero?}
result
}
}
That's just part of the testing code, isn't it? Why not share the
complete code?
On my jruby and my Windows MRI 1.8.7 my algorithm was fastest for
maxval of 10, 100 or 10000, for example:
SIZE
1000000
MAXVAL
10000
user system total real
Ralph Shneiver: 0.533000 0.000000 0.533000 ( 0.518000)
Meier: 0.312000 0.000000 0.312000 ( 0.312000)
Keinich #1 0.814000 0.000000 0.814000 ( 0.814000)
(I have no 1.9.3 yet on my windows PC, so it may be different there)
But here are two observations:
1) The speed-up is not as big as I expected. In C, I expect array lookup to
be factors better than hash calculation (followed by an array lookup in
the hash table...). In Ruby it seems to be not much faster. But the
speedup gets bigger for bigger values of maxval.
2) My algorithm sometimes runs much, much slower when #kennich1 had run
before mine. It seems to get worse with big values of maxval, but not
with the jruby --1.9 option. It is not the array allocation itself that is the
problem.
Is it possible that group_by changes the internal array structure, so I
get a non-contiguous array?
No. It's more likely that you are hit by GC I'd say. You could also
try Benchmark.bmbm for warm up before the test.
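A minimal Benchmark.bmbm sketch of the comparison (sizes shrunk for illustration; the labels and sizes are my own, not from the thread). The rehearsal pass lets allocation and GC warm up before the timed run:

```ruby
require 'benchmark'

maxval = 1_000
ar = Array.new(100_000) { rand(maxval) }

Benchmark.bmbm(12) do |b|
  # Hash-based tally, as in the original post.
  b.report("hash:") do
    h = Hash.new(0)
    ar.each { |x| h[x] += 1 }
  end
  # Array-based histogram, as in Karsten's variant.
  b.report("array:") do
    hist = Array.new(maxval + 1, 0)
    ar.each { |x| hist[x] += 1 }
  end
end
```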
Kind regards
robert
On Mon, Jan 23, 2012 at 3:08 PM, Karsten Meier <developer@handylearn-projects.de> wrote:
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
Interesting.
I added your algorithm to the list and tested on ruby 1.9.3
$ ruby -v
ruby 1.9.3p0 (2011-10-30 revision 33570) [i686-linux]
SIZE
1000000
MAXVAL
1000
user system total real
Ralph Shneiver: 0.370000 0.000000 0.370000 ( 0.369229)
Sigurd: 0.420000 0.000000 0.420000 ( 0.418634)
Meier: 0.270000 0.000000 0.270000 ( 0.274136)
Keinich #1 0.320000 0.000000 0.320000 ( 0.320962)
Keinich #2 0.380000 0.000000 0.380000 ( 0.372422)
Magnus Holm: 0.420000 0.000000 0.420000 ( 0.423316)
Abinoam #1: 0.600000 0.000000 0.600000 ( 0.597028)
And I also retested in the latest jruby-head (1.7.0.dev)
$ ruby -v
jruby 1.7.0.dev (ruby-1.8.7-p357) (2012-01-23 f80ab05) (Java HotSpot(TM)
Server VM 1.6.0_26) [linux-i386-java]
SIZE
1000000
MAXVAL
1000
user system total real
Ralph Shneiver: 0.492000 0.000000 0.492000 ( 0.476000)
Sigurd: 0.473000 0.000000 0.473000 ( 0.473000)
Meier: 0.287000 0.000000 0.287000 ( 0.287000)
Keinich #1 0.308000 0.000000 0.308000 ( 0.308000)
Keinich #2 7.374000 0.000000 7.374000 ( 7.374000)
Magnus Holm: NoMethodError: undefined method `each_with_object' for
#<Array:0x19c6163>
__file__ at sb.rb:30
times at org/jruby/RubyFixnum.java:261
...
So, at least for these 2 cases, your algorithm seems somewhat
faster.
As long as the array is not "sparsely" populated, this
approach certainly makes sense.
HTH,
Peter
On Mon, Jan 23, 2012 at 3:08 PM, Karsten Meier < developer@handylearn-projects.de> wrote:
Ok, I tried some benchmarks. We have now even more variables, as they
also depend on "maxval" from the dataset.
maxval = 1000
ar = [].tap{|a| 1_000_000.times {a << rand(maxval)}}
b.report("Meier:") {
n.times {
hist = Array.new(maxval+1, 0)
ar.each{|x| hist[x] += 1;}
result = Hash.new(0)
0.upto(maxval){|i| result[i] = hist[i] unless hist[i] == 0}
result
}
}
On my jruby and my Windows MRI 1.8.7 my algorithm was fastest for
maxval of 10, 100 or 10000, for example:
SIZE
1000000
MAXVAL
10000
user system total real
Ralph Shneiver: 0.533000 0.000000 0.533000 ( 0.518000)
Meier: 0.312000 0.000000 0.312000 ( 0.312000)
Keinich #1 0.814000 0.000000 0.814000 ( 0.814000)
(I have no 1.9.3 yet on my windows PC, so it may be different there)
Well it's one of the possible solutions.
Your example is not accurate though:
5 + [1, 2, 3, 4].reduce(&:+)
On Jan 16, 2012, at 6:04 PM, Adam Prescott wrote:
On Mon, Jan 16, 2012 at 16:00, Sigurd <cu9ypd@gmail.com> wrote:
[4,5,6,4,5,6,6,7].inject(Hash.new(0)) {|res, x| res[x] += 1; res }
I think this is a misuse of inject, personally, every time I see it. It's
harder to read and it doesn't give the feeling of actually "reducing"
(inject's alias) the array down to one thing. The required `; res` is a
sign of that. Compare:
[1, 2, 3, 4].inject(5) { |a, b| a + b }
There's always each_with_object, although it's a little long:
[4,5,6,4,5,6,6,7].each_with_object(Hash.new(0)) { |x, res| res[x] += 1 }
On Mon, Jan 16, 2012 at 17:04, Adam Prescott <adam@aprescott.com> wrote:
On Mon, Jan 16, 2012 at 16:00, Sigurd <cu9ypd@gmail.com> wrote:
[4,5,6,4,5,6,6,7].inject(Hash.new(0)) {|res, x| res[x] += 1; res }
I think this is a misuse of inject, personally, every time I see it. It's
harder to read and it doesn't give the feeling of actually "reducing"
(inject's alias) the array down to one thing. The required `; res` is a
sign of that. Compare:
[1, 2, 3, 4].inject(5) { |a, b| a + b }
Abinoam,
thank you for running the benchmarks.
I guess the speed difference comes from object creation.
Object#tap and Enumerable#each_with_object create only a minimal number of objects.
-------------------------------------------------------------------------------
Ralph,
I like #1 for its comprehensibility:
# reads as "construct a Hash instance from the inner expression"
Hash
# reads as "collect the values, grouped by themselves"
a.group_by{|n|n}
# counting
map{|k, v|[k, v.size]}
# (this would read even better if Hash had its own map)
-------------------------------------------------------------------------------
I like #2 for comprehensibility, a clean namespace, and speed:
# comprehensibility
# reads like a "list comprehension"
tap{|h|a.each{|n|h[n] += 1}}
# clean namespace
it creates no variables outside the block
# speed
see Abinoam's benchmarks
# (this would read even better if Enumerable had a method for this case)
2012/1/17 Ralph Shnelvar <ralphs@dos32.com>:
Kenichi,
Monday, January 16, 2012, 9:21:51 AM, you wrote:
> 2012/1/17 Ralph Shnelvar <ralphs@dos32.com>:
a = [4,5,6,4,5,6,6,7]
result = Hash.new(0)
a.each { |x| result[x] += 1 }
p result
The result I am getting
{4=>2, 5=>2, 6=>3, 7=>1}
is what I want.
Is there a better way; perhaps using uniq?
> I like this
> a = [4,5,6,4,5,6,6,7]
> # 1
> p Hash[a.group_by{|n|n}.map{|k, v|[k, v.size]}]
> # 2
> p Hash.new(0).tap{|h|a.each{|n|h[n] += 1}}
I like #2. I can understand it. I'm still having trouble wrapping my head around #1.
Having said that, is your #2 better than mine in any dimension (comprehensibility and/or speed of execution)?
--
Kenichi Kamiya
The full code for my recent tests is here:
https://gist.github.com/1663455
Peter
On Mon, Jan 23, 2012 at 3:38 PM, Peter Vandenabeele <peter@vandenabeele.com>wrote:
I added your algorithm to the list and tested on ruby 1.9.3
In what sense is that more "accurate"?
On Jan 16, 2012 4:09 PM, "Sigurd" <cu9ypd@gmail.com> wrote:
Your example is not accurate though:
5 + [1, 2, 3, 4].reduce(&:+)
I think Magnus Holm's is the clearest (IMHO, yes, it's just taste and
humble opinion).
[4,5,6,4,5,6,6,7].each_with_object(Hash.new(0)) {|num, hsh| hsh[num] += 1}
Another way (not better) I remember is...
Hash[ [4,5,6,4,5,6,6,7].sort.chunk {|n| n}.map {|ix, els| [ix, els.size] } ]
See: Module: Enumerable (Ruby 1.9.3)
It also can be... clearer?!?
Hash[ [4,5,6,4,5,6,6,7].group_by {|n| n}.map {|ix, els| [ix, els.size] } ]
Perhaps something like this (same as Magnus Holm) just hiding the
complexity in the method.
class Array
def totalize_to_hash
hsh = Hash.new(0)
self.each do |n|
hsh[n] += 1
end
hsh
end
end
[4,5,6,4,5,6,6,7].totalize_to_hash
Abinoam Jr.
On Mon, Jan 16, 2012 at 1:48 PM, Magnus Holm <judofyr@gmail.com> wrote:
On Mon, Jan 16, 2012 at 17:04, Adam Prescott <adam@aprescott.com> wrote:
On Mon, Jan 16, 2012 at 16:00, Sigurd <cu9ypd@gmail.com> wrote:
[4,5,6,4,5,6,6,7].inject(Hash.new(0)) {|res, x| res[x] += 1; res }
I think this is a misuse of inject, personally, every time I see it. It's
harder to read and it doesn't give the feeling of actually "reducing"
(inject's alias) the array down to one thing. The required `; res` is a
sign of that. Compare:
[1, 2, 3, 4].inject(5) { |a, b| a + b }
There's always each_with_object, although it's a little long:
[4,5,6,4,5,6,6,7].each_with_object(Hash.new(0)) { |x, res| res[x] += 1 }
I would like to have it.
We can discuss it better here at ruby talk to see the pros and cons.
If somebody is able to do the C code of it...
Perhaps we could issue a feature request.
Abinoam Jr.
On Tue, Jan 17, 2012 at 7:58 AM, Kenichi Kamiya <kachick1@gmail.com> wrote:
# if Enumerable has method for this case
an aproach to http://www.ruby-forum.com/topic/3446541 · GitHub
Well,
it seems not quite accurate to me because of the block. inject uses the convention that the last statement in the block is its return value. The nature of inject is to assign that last value to the memo, which is never actually used in your case. Therefore it's more natural to use the short inject forms: either a.inject(5, :+) or 5 + a.inject(:+). If returning the memo from the block were unnatural, inject would not pass it to the block explicitly.
On the other side, I'm not a proponent of crazy injects that can barely be understood. I think in this case inject can be used as easily as the other solutions provided.
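The two short forms mentioned give the same result as the block version:

```ruby
a = [1, 2, 3, 4]

# Block form, symbol form with an initial value, and adding afterwards
# are all equivalent here.
p a.inject(5) { |memo, x| memo + x }  # => 15
p a.inject(5, :+)                     # => 15
p 5 + a.inject(:+)                    # => 15
```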
On Jan 16, 2012, at 6:14 PM, Adam Prescott wrote:
On Jan 16, 2012 4:09 PM, "Sigurd" <cu9ypd@gmail.com> wrote:
Your example is not accurate though:
5 + [1, 2, 3, 4].reduce(&:+)
In what sense is that more "accurate"?
Some benchmark results...
n = 100_000
Benchmark.bm(15) do |b|
b.report("Ralph Shneiver:") { n.times { a = [4,5,6,4,5,6,6,7];
result = Hash.new(0); a.each { |x| result[x] += 1 }; result} }
b.report("Sigurd:") { n.times {
[4,5,6,4,5,6,6,7].inject(Hash.new(0)) {|res, x| res[x] += 1; res } } }
b.report("Keinich #1") { n.times { Hash[a.group_by{|n|n}.map{|k,
v|[k, v.size]}] } }
b.report("Keinich #2") { n.times {
Hash.new(0).tap{|h|a.each{|n|h[n] += 1}} } }
b.report("Magnus Holm:") { n.times {
[4,5,6,4,5,6,6,7].each_with_object(Hash.new(0)) { |x, res| res[x] += 1
} } }
b.report("Abinoam #1:") { n.times { Hash[
[4,5,6,4,5,6,6,7].sort.chunk {|n| n}.map {|ix, els| [ix, els.size] } ]
} }
end
user system total real
Ralph Shneiver: 0.290000 0.000000 0.290000 ( 0.259640)
Sigurd: 0.320000 0.000000 0.320000 ( 0.289873)
Keinich #1 0.560000 0.000000 0.560000 ( 0.497736)
Keinich #2 0.280000 0.000000 0.280000 ( 0.250843)
Magnus Holm: 0.310000 0.000000 0.310000 ( 0.283344)
Abinoam #1: 1.140000 0.000000 1.140000 ( 1.042744)
Abinoam Jr.
On Mon, Jan 16, 2012 at 9:22 PM, Abinoam Jr. <abinoam@gmail.com> wrote:
On Mon, Jan 16, 2012 at 1:48 PM, Magnus Holm <judofyr@gmail.com> wrote:
On Mon, Jan 16, 2012 at 17:04, Adam Prescott <adam@aprescott.com> wrote:
On Mon, Jan 16, 2012 at 16:00, Sigurd <cu9ypd@gmail.com> wrote:
[4,5,6,4,5,6,6,7].inject(Hash.new(0)) {|res, x| res[x] += 1; res }
I think this is a misuse of inject, personally, every time I see it. It's
harder to read and it doesn't give the feeling of actually "reducing"
(inject's alias) the array down to one thing. The required `; res` is a
sign of that. Compare:
[1, 2, 3, 4].inject(5) { |a, b| a + b }
There's always each_with_object, although it's a little long:
[4,5,6,4,5,6,6,7].each_with_object(Hash.new(0)) { |x, res| res[x] += 1 }
I think Magnus Holm's is the clearest (IMHO, yes, it's just taste and
humble opinion).
[4,5,6,4,5,6,6,7].each_with_object(Hash.new(0)) {|num, hsh| hsh[num] += 1}
Another way (not better) I remember is...
Hash[ [4,5,6,4,5,6,6,7].sort.chunk {|n| n}.map {|ix, els| [ix, els.size] } ]
See: Module: Enumerable (Ruby 1.9.3)
It also can be... clearer?!?
Hash[ [4,5,6,4,5,6,6,7].group_by {|n| n}.map {|ix, els| [ix, els.size] } ]
Perhaps something like this (same as Magnus Holm) just hiding the
complexity in the method.
class Array
def totalize_to_hash
hsh = Hash.new(0)
self.each do |n|
hsh[n] += 1
end
hsh
end
end
[4,5,6,4,5,6,6,7].totalize_to_hash
Abinoam Jr.
I found the discussion below in Rails (via Google).
It has the same name and aims at the same goal.
And it was judged "too specific".
I think so too, but this name doesn't suggest any other use case to me.
uhh...
2012/1/17 Abinoam Jr. <abinoam@gmail.com>:
On Tue, Jan 17, 2012 at 7:58 AM, Kenichi Kamiya <kachick1@gmail.com> wrote:
# if Enumerable has method for this case
an aproach to http://www.ruby-forum.com/topic/3446541 · GitHub
I would like to have it.
We can discuss it better here at ruby talk to see the pros and cons.
If somebody is able to do the C code of it...
Perhaps we could issue a feature request.
Abinoam Jr.
--
Kenichi Kamiya
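For what it's worth, the Enumerable method requested in this thread did later land in Ruby core: Enumerable#tally, available since Ruby 2.7, does exactly this:

```ruby
# Requires Ruby 2.7 or later.
p [4, 5, 6, 4, 5, 6, 6, 7].tally  # => {4=>2, 5=>2, 6=>3, 7=>1}
```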