Getting the smallest Items of an Array

Hello guys,

I have been working with ranking algorithm, wich collects the smallest
numbers chronologically. but i couldn't make it work.

Here is the code.

module Enumerable
def min_n(n, &block)
    block = Proc.new { |x,y| x <=> y} if block == nil
    stable = StortedArray.new(&block)
    each do |x|
      stable << x if stable.size < n or block.call(x, stable[-1]) == -1
      stable.pop until stable.size <=n
    end
    return stable

  end

end

m = [1,5,2,8,7,9,10,100,50]
m.min_n(4)

I would be more than glad to get your feedback/suggestions of how to
make it work.

···

--
Posted via http://www.ruby-forum.com/.

Hi,

so what does "not work" mean? I don't know your StortedArray class (or
rather SortedArray?), but if it works like I think it works, then your
method should indeed output the four smallest elements:

[1, 2, 5, 7]

But what's all this effort for? Why not simply use Ruby's built-in
methods for that?

smallest = m.sort.first 4

This might not be particularly efficient, but your method isn't either.

···

--
Posted via http://www.ruby-forum.com/.

[1,2,3,4,-1].min

=> -1

···

--
Posted via http://www.ruby-forum.com/\.

def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end

=> nil

n_min([1,2,-1,3,4],2)

=> [-1, 1]

···

--
Posted via http://www.ruby-forum.com/\.

Uhh... how about ary.sort.first(n)?

-- Matma Rex

···

2012/11/29 Regis d'Aubarede <lists@ruby-forum.com>:

def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end

=> nil

n_min([1,2,-1,3,4],2)

=> [-1, 1]

Without having measured it that's quite an inefficient way to do it IMHO:
- you are creating a lot temporary arrays (two for each step)
- you are traversing most elements n times

I think, sorting once and then taking the first n elements is more efficient.

Turns out, measurements confirm this: this approach is faster only for
n=1 but then one would use Enumerable#min anyway.

require 'benchmark'

REP = 1_000

puts 'Generating data...'

data = [0, 1, 10, 17].map do |i|
    (1 << i).times.map { rand 1000000 }
  end

puts 'done'

def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end

Benchmark.bm 30 do |x|
  [1, 10, 1000].each do |n|
    data.each do |array|
      x.report "#{n}/#{array.size} items sort" do
        REP.times do
          array.sort[0, n]
        end
      end

      x.report "#{n}/#{array.size} items n_min" do
        REP.times do
          n_min array, n
        end
      end
    end
  end
end

Test run (I interrupted it because it took too long):

Generating data...
done
                                     user system total real
1/1 items sort 0.000000 0.000000 0.000000 ( 0.000500)
1/1 items n_min 0.000000 0.000000 0.000000 ( 0.003000)
1/2 items sort 0.000000 0.000000 0.000000 ( 0.000500)
1/2 items n_min 0.000000 0.000000 0.000000 ( 0.002001)
1/1024 items sort 0.109000 0.000000 0.109000 ( 0.112014)
1/1024 items n_min 0.093000 0.000000 0.093000 ( 0.094012)
1/131072 items sort 25.709000 0.141000 25.850000 ( 25.836281)
1/131072 items n_min 11.685000 0.280000 11.965000 ( 11.967520)
10/1 items sort 0.000000 0.000000 0.000000 ( 0.000500)
10/1 items n_min 0.015000 0.000000 0.015000 ( 0.014002)
10/2 items sort 0.000000 0.000000 0.000000 ( 0.000000)
10/2 items n_min 0.016000 0.000000 0.016000 ( 0.014002)
10/1024 items sort 0.094000 0.016000 0.110000 ( 0.112514)
10/1024 items n_min 0.904000 0.000000 0.904000 ( 0.929618)
10/131072 items sort 25.850000 0.109000 25.959000 ( 25.948795)
10/131072 items n_min 117.609000 2.137000 119.746000 (119.802713)
1000/1 items sort 0.000000 0.000000 0.000000 ( 0.000500)
1000/1 items n_min 1.279000 0.000000 1.279000 ( 1.279162)
1000/2 items sort 0.000000 0.000000 0.000000 ( 0.001000)
1000/2 items n_min 1.279000 0.000000 1.279000 ( 1.276662)
1000/1024 items sort 0.109000 0.000000 0.109000 ( 0.112514)
1000/1024 items n_min 48.579000 0.000000 48.579000 ( 48.599671)
1000/131072 items sort 25.694000 0.188000 25.882000 ( 25.885787)
1000/131072 items n_min x.rb:14:in `block in n_min': Interrupt
        from x.rb:14:in `each'
        from x.rb:14:in `map'
        from x.rb:14:in `n_min'
        from x.rb:27:in `block (5 levels) in <main>'
        from x.rb:26:in `times'
        from x.rb:26:in `block (4 levels) in <main>'
        from /usr/lib/ruby/1.9.1/benchmark.rb:280:in `measure'
        from /usr/lib/ruby/1.9.1/benchmark.rb:362:in `item'
        from x.rb:25:in `block (3 levels) in <main>'
        from x.rb:18:in `each'
        from x.rb:18:in `block (2 levels) in <main>'
        from x.rb:17:in `each'
        from x.rb:17:in `block in <main>'
        from /usr/lib/ruby/1.9.1/benchmark.rb:174:in `benchmark'
        from /usr/lib/ruby/1.9.1/benchmark.rb:205:in `bm'
        from x.rb:16:in `<main>'

Kind regards

robert

···

On Thu, Nov 29, 2012 at 12:02 PM, Regis d'Aubarede <lists@ruby-forum.com> wrote:

def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end

=> nil

n_min([1,2,-1,3,4],2)

=> [-1, 1]

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

You lose the relative ordering of the chosen elements. Don't know if
it's important or not.

Jesus.

···

On Thu, Nov 29, 2012 at 1:37 PM, Bartosz Dziewoński <matma.rex@gmail.com> wrote:

2012/11/29 Regis d'Aubarede <lists@ruby-forum.com>:

def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end

=> nil

n_min([1,2,-1,3,4],2)

=> [-1, 1]

Uhh... how about ary.sort.first(n)?

Uhh... how about ary.sort.first(n)?

Of sort complexity id O(n*log(n), i thought that sorting the complete
array
was less smart then my solution which is should be
O(m*n) / m=number of minus item selected.

but it is not true:
nlog(n) with n=100_000 >> 1_115_000
m*n with n=100_000 and m=10 >> 1_000_000

perhaps there is a mistake somewhere ?

···

--
Posted via http://www.ruby-forum.com/\.

def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end

          array.sort[0, n]
          n_min array, n

You compar ruby implementation for n_min() with c implementation for
sort...

here result with a qsort from http://rosettacode.org, REP=10

class Array
  def qsort
    return self if length <= 1
    pivot = self[0]
    less, greatereq = self[1..-1].partition { |x| x < pivot }
    less.qsort +
      [pivot] +
      greatereq.qsort
  end
end

Generating data...
done
                                    user system total
real

4/1024 items sort 0.031000 0.000000 0.031000 (
0.029001)
4/1024 items n_min 0.000000 0.000000 0.000000 (
0.006001)

4/131072 items sort 6.022000 0.016000 6.038000 (
6.039345)
4/131072 items n_min 0.671000 0.016000 0.687000 (
0.701040)

20/1024 items sort 0.031000 0.000000 0.031000 (
0.028002)
20/1024 items n_min 0.031000 0.000000 0.031000 (
0.027001)

20/131072 items sort 6.069000 0.031000 6.100000 (
6.117349)
20/131072 items n_min 3.510000 0.078000 3.588000 (
3.595206)

100/1024 items sort 0.031000 0.000000 0.031000 (
0.030002)
100/1024 items n_min 0.125000 0.000000 0.125000 (
0.135008)

100/131072 items sort 6.286000 0.000000 6.286000 (
6.300360)
100/131072 items n_min 17.223000 0.265000 17.488000 (
17.496001)

···

--
Posted via http://www.ruby-forum.com/\.

Try this:
n = [4, 6, 7, 2, 15]

def min(n)
minimum = n[0]
n.each do |element|
if element < minimum
minimum = element
end
end
return minimum
end

puts min(n)

Is that forbidden? You are making use of C implementation as well.
With that argumentation of yours you would also need to reimplement
methods you use inside n_min in Ruby (notably l.min and l-[a]). Fact
remains that your algorithm will visit most of the elements of a 2*n
times. I find that inelegant but YMMV of course.

Kind regards

robert

···

On Thu, Nov 29, 2012 at 4:42 PM, Regis d'Aubarede <lists@ruby-forum.com> wrote:

def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end

          array.sort[0, n]
          n_min array, n

You compar ruby implementation for n_min() with c implementation for
sort...

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

"Иван Бишевац" <ivan.bisevac@gmail.com> wrote in post #1087106:

def min(n)
minimum = n[0]
n.each do |element|
if element < minimum
minimum = element
end
end
return minimum
end

Great, you've reimplemented Array#min.

Problem is, this has nothing to do with what the OP wants.

···

--
Posted via http://www.ruby-forum.com/\.

I agree with Robert, using #sort is probably the simplest approach and
reasonably fast in most situations. This is exactly what I would do.
But I also like playing around with algorithms :wink: So if you're really
keen on both, performance *and* doing as much as possible in ruby, you
can try the following implementation of "min_n", which uses a binary
heap. It has a running-time of O(n*log m) (n=array size, m=number of
small elements) and should by quite fast if m << n. Then it also seems
to be quite competetive with the sort approach.

···

On 2012-11-29, Robert Klemme <shortcutter@googlemail.com> wrote:

On Thu, Nov 29, 2012 at 4:42 PM, Regis d'Aubarede <lists@ruby-forum.com> wrote:

def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end

          array.sort[0, n]
          n_min array, n

You compar ruby implementation for n_min() with c implementation for
sort...

Is that forbidden? You are making use of C implementation as well.
With that argumentation of yours you would also need to reimplement
methods you use inside n_min in Ruby (notably l.min and l-[a]). Fact
remains that your algorithm will visit most of the elements of a 2*n
times. I find that inelegant but YMMV of course.

====
def hdown(a, i)
  x, n = a[i], a.size
  while true do
    l = 2*i + 1
    r = l + 1
    break if l >= n
    nxt = r >= n || a[l] >= a[r] ? l : r
    break if x >= a[nxt]
    a[i], i = a[nxt], nxt
  end
  a[i] = x
end

def hbuild(a)
  (a.size/2).downto(0) do |i| hdown(a, i) end
end

def min_n(a, n)
  if a.size <= n then
    return a.dup
  else
    h = a[0,n]
    hbuild(h)
    n.upto(a.size-1) do |i|
      if h[0] > a[i] then
        h[0] = a[i]
        hdown(h, 0)
      end
    end
    return h
  end
end

The following tests are from jruby-1.7.0

1/32768 items sort 2.700000 0.020000 2.720000 ( 2.685000)
1/32768 items min_n 1.400000 0.050000 1.450000 ( 1.246000)
1/131072 items sort 17.070000 0.010000 17.080000 ( 17.054000)
1/131072 items min_n 4.510000 0.010000 4.520000 ( 4.418000)
10/32768 items sort 2.660000 0.000000 2.660000 ( 2.655000)
10/32768 items min_n 1.170000 0.010000 1.180000 ( 1.142000)
10/131072 items sort 17.060000 0.000000 17.060000 ( 17.036000)
10/131072 items min_n 4.540000 0.010000 4.550000 ( 4.437000)
100/32768 items sort 2.660000 0.000000 2.660000 ( 2.656000)
100/32768 items min_n 1.440000 0.010000 1.450000 ( 1.405000)
100/131072 items sort 17.070000 0.000000 17.070000 ( 17.039000)
100/131072 items min_n 4.900000 0.010000 4.910000 ( 4.782000)
1000/32768 items sort 2.650000 0.000000 2.650000 ( 2.653000)
1000/32768 items min_n 3.850000 0.000000 3.850000 ( 3.794000)
1000/131072 items sort 17.080000 0.010000 17.090000 ( 17.042000)
1000/131072 items min_n 8.550000 0.010000 8.560000 ( 8.320000)

and with 1.9.3

1/32768 items sort 3.420000 0.030000 3.450000 ( 3.437513)
1/32768 items min_n 2.540000 0.000000 2.540000 ( 2.544122)
1/131072 items sort 15.840000 0.160000 16.000000 ( 16.007184)
1/131072 items min_n 10.050000 0.000000 10.050000 ( 10.037374)
10/32768 items sort 3.420000 0.000000 3.420000 ( 3.422531)
10/32768 items min_n 2.620000 0.000000 2.620000 ( 2.616946)
10/131072 items sort 15.910000 0.000000 15.910000 ( 15.913246)
10/131072 items min_n 10.250000 0.000000 10.250000 ( 10.246652)
100/32768 items sort 3.410000 0.010000 3.420000 ( 3.413907)
100/32768 items min_n 3.500000 0.000000 3.500000 ( 3.502954)
100/131072 items sort 15.910000 0.010000 15.920000 ( 15.923128)
100/131072 items min_n 11.340000 0.010000 11.350000 ( 11.353193)
1000/32768 items sort 3.420000 0.000000 3.420000 ( 3.417772)
1000/32768 items min_n 11.060000 0.000000 11.060000 ( 11.074263)
1000/131072 items sort 15.880000 0.000000 15.880000 ( 15.882260)
1000/131072 items min_n 22.170000 0.020000 22.190000 ( 22.197999)

Best regards,
Frank

I put a few more variants which traverse the original Enumerable just
once up at github to play around with:

Kind regards

robert

···

On Thu, Nov 29, 2012 at 10:14 PM, Frank Fischer <frank-fischer@shadow-soft.de> wrote:

I agree with Robert, using #sort is probably the simplest approach and
reasonably fast in most situations. This is exactly what I would do.
But I also like playing around with algorithms :wink:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/