Algorithm help

Hm, my alogorithm analysis is quite rusty but I believe you
would have to
get rid of m and replace it as an expression of n; reason
being that m is
part of the result and not part of the input. So you
probably have to
calculate something like an average number of range for the
set of input
range. Difficult without further knowledge of the data's
nature. My gut
guess is that for random numbers the number of ranges is
rather close to n
so that would make O(n*log n) which doesn't exactly look
better than O(n)...

true, this is optimized for the given case (m <= 2).

Anyway with the low numbers of n we're dealing with here
other effects are
likely more dramatic. So doing a benchmark of all solutions
provided might
yield surprising results.

>> Btw, here's another funny idea:
>>
>> -----------------
>> data = [3, 4, 5, 6, 7, 8, 9, 15, 38, 39, 40, 41, 6789, 6790,
>> 9998, 9999]
>> ranges =
>>
>> data.inject(0) {|s,x| s | (1 << x)}.to_s(2).reverse!.scan(/1+/) do
>> >m> s = $`.length
>> ranges << (s .. (s + m.length - 1))
>> end
>>
>> puts ranges
>> -----------------
>>
>> Kind regards
>>
>> robert
>
> Rofl, indead a funny idea. For sure the slowest one mentioned here

Did you benchmark them?

this creates a BigInt 1250 Bytes width, alocates 10 kByte of memory,
fills it, reverses it, and scans it with a regular expression. All to
look at 17 fixnums, it may be O(C * n) but with a realy huge C.

> d = [1, 3, 4, 5, 6, 7, 8, 9, 15, 38, 39, 40, 41, 6789, 6790, 9998,
> 9999] puts
d.inject([d[0]..d[0]]){|r,x|r<<(((r.last.end+1>=x)?r.pop.end: x)..x)}
>
> lets see if you can beat this :))

Nah, unlikely. ATM I don't feel like investing that much
time. Btw, two
points of your solution vs. the bit set approach (above):

- This solution breaks for the empty array.

>> [0] .. [0]
ArgumentError: bad value for range
        from (irb):5

- It needs a sorted array.

So checks to cope with these make your bit a bit longer. :-}

ok, I don't want to invest that much time either, but your code
doesn't cope with negativ numbers :slight_smile:

it was fun,

Simon

true, this is optimized for the given case (m <= 2).

which was exactly the tast at hand - this is 'always' true for my data.

this creates a BigInt 1250 Bytes width, alocates 10 kByte of memory, fills
it, reverses it, and scans it with a regular expression. All to look at 17
fixnums, it may be O(C * n) but with a realy huge C.

and i'm scanning rows with about 7324 pixels so this matters...

ok, I don't want to invest that much time either, but your code doesn't cope
with negativ numbers :slight_smile:

indeed. and the code (my code) needs to do exactly that - i searching for
pixels indicating that the sun is below the horizon using narray like

   below_horizon = (narray < -3).where

and then applying some data munging to only those areas of the scanline. the
area will always be zero, one, or two sections for pure daytime, crossing the
solar terminator away from the poles, and crossing the terminator at the poles
respectively. it seems like that couldn't be so - but latitutes follow curves
in the data to you can get two, but not three, ranges in a single scanline.

in addition, the dark area, when there are two, will always be on opposite
ends of the scanline to the 'divide and conquer' approach is particularly
suitable.

cheers.

-a

···

On Fri, 5 Aug 2005, Kroeger Simon (ext) wrote:
--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

To put some figures in here, I did a bit of benchmarking and indeed the bitset version is quite slooooouw:

$ ruby ranges-2.rb
                          user system total real
bitset 37.734000 0.000000 37.734000 ( 38.354000)
inject-range 0.313000 0.000000 0.313000 ( 0.318000)
inject-array 0.156000 0.000000 0.156000 ( 0.153000)
inject-array-2 0.140000 0.000000 0.140000 ( 0.140000)
inject-array-map 0.204000 0.000000 0.204000 ( 0.210000)

(Code attached)

Anyway, it was fun playing around with this. :slight_smile:

Kind regards

    robert

ranges-2.rb (1.29 KB)

···

Ara.T.Howard <Ara.T.Howard@noaa.gov> wrote:

On Fri, 5 Aug 2005, Kroeger Simon (ext) wrote:

true, this is optimized for the given case (m <= 2).

which was exactly the tast at hand - this is 'always' true for my
data.

this creates a BigInt 1250 Bytes width, alocates 10 kByte of memory,
fills it, reverses it, and scans it with a regular expression. All
to look at 17 fixnums, it may be O(C * n) but with a realy huge C.

and i'm scanning rows with about 7324 pixels so this matters...

ok, I don't want to invest that much time either, but your code
doesn't cope with negativ numbers :slight_smile:

indeed. and the code (my code) needs to do exactly that - i
searching for pixels indicating that the sun is below the horizon
using narray like
  below_horizon = (narray < -3).where

and then applying some data munging to only those areas of the
scanline. the area will always be zero, one, or two sections for
pure daytime, crossing the solar terminator away from the poles, and
crossing the terminator at the poles respectively. it seems like
that couldn't be so - but latitutes follow curves in the data to you
can get two, but not three, ranges in a single scanline.
in addition, the dark area, when there are two, will always be on
opposite ends of the scanline to the 'divide and conquer' approach is
particularly suitable.

cheers.

-a

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

To put some figures in here, I did a bit of benchmarking and indeed the bitset version is quite slooooouw:

$ ruby ranges-2.rb
                         user system total real
bitset 37.734000 0.000000 37.734000 ( 38.354000)
inject-range 0.313000 0.000000 0.313000 ( 0.318000)
inject-array 0.156000 0.000000 0.156000 ( 0.153000)
inject-array-2 0.140000 0.000000 0.140000 ( 0.140000)
inject-array-map 0.204000 0.000000 0.204000 ( 0.210000)

(Code attached)

Anyway, it was fun playing around with this. :slight_smile:

Kind regards

   robert

Nice one,

now we are getting some hard numbers to play with.

i added a larger range 1000..5000 to the test set which should come
closer to ara's problem at hand.

NUMBERS = [3, 4, 5, 6, 7, 8, 9, 15, 38, 39, 40, 41] +
  (1000..5000).to_a + [6789, 6790, 9998, 9999]

I get the following numbers:

                           user system total real
bitset 79.047000 0.094000 79.141000 ( 80.407000)
inject-range 41.609000 0.328000 41.937000 ( 43.000000)
inject-array 17.219000 0.047000 17.266000 ( 17.296000)
inject-array-2 17.687000 0.031000 17.718000 ( 18.688000)
inject-array-map 16.797000 0.078000 16.875000 ( 16.906000)
to_ranges 0.453000 0.000000 0.453000 ( 0.453000)

Interestingly this brings the bitset approach back in play, but it
also clearly shows what can be done with the right algorithm:

def to_ranges first, last
   if (NUMBERS[last] - NUMBERS[first]).abs==last-first
     return [[NUMBERS[first],NUMBERS[last]]]
   end
   r1 = to_ranges(first, (first+last)/2)
   r2 = to_ranges((first+last)/2+1, last)
   if (r1.last[1] - r2.first[0]).abs == 1
     r2[0] = [r1.last[0],r2.first[1]]
     r1.pop
   end
   r1 + r2
end

This would have been a nice quiz question, but i guess its worn
out by now.

cheers

Simon

To put some figures in here, I did a bit of benchmarking and indeed
the bitset version is quite slooooouw:

$ ruby ranges-2.rb
                         user system total real
bitset 37.734000 0.000000 37.734000 ( 38.354000)
inject-range 0.313000 0.000000 0.313000 ( 0.318000)
inject-array 0.156000 0.000000 0.156000 ( 0.153000)
inject-array-2 0.140000 0.000000 0.140000 ( 0.140000)
inject-array-map 0.204000 0.000000 0.204000 ( 0.210000)

(Code attached)

Anyway, it was fun playing around with this. :slight_smile:

Kind regards

   robert

Nice one,

now we are getting some hard numbers to play with.

i added a larger range 1000..5000 to the test set which should come
closer to ara's problem at hand.

NUMBERS = [3, 4, 5, 6, 7, 8, 9, 15, 38, 39, 40, 41] +
(1000..5000).to_a + [6789, 6790, 9998, 9999]

I get the following numbers:

                          user system total real
bitset 79.047000 0.094000 79.141000 ( 80.407000)
inject-range 41.609000 0.328000 41.937000 ( 43.000000)
inject-array 17.219000 0.047000 17.266000 ( 17.296000)
inject-array-2 17.687000 0.031000 17.718000 ( 18.688000)
inject-array-map 16.797000 0.078000 16.875000 ( 16.906000)
to_ranges 0.453000 0.000000 0.453000 ( 0.453000)

Interestingly this brings the bitset approach back in play, but it
also clearly shows what can be done with the right algorithm:

Especially if it's optimized for actual input data.

def to_ranges first, last
  if (NUMBERS[last] - NUMBERS[first]).abs==last-first
    return [[NUMBERS[first],NUMBERS[last]]]
  end
  r1 = to_ranges(first, (first+last)/2)
  r2 = to_ranges((first+last)/2+1, last)
  if (r1.last[1] - r2.first[0]).abs == 1
    r2[0] = [r1.last[0],r2.first[1]]
    r1.pop
  end
  r1 + r2
end

I think there is a slightly more efficient version of to_ranges: you can save the range recreation in case of the connection of the last and first ranges of r1 and r2:

def to_ranges_2 first, last
   if (NUMBERS[last] - NUMBERS[first]).abs==last-first
     return [[NUMBERS[first],NUMBERS[last]]]
   end
   r1 = to_ranges_2(first, (first+last)/2)
   r2 = to_ranges_2((first+last)/2+1, last)
   r2.first[0] = r1.pop[0] if (r1.last[1] - r2.first[0]).abs == 1
   r1 + r2
end

With the old set of data I get

$ ruby ranges-2.rb
                          user system total real
bitset 37.890000 0.000000 37.890000 ( 38.520000)
inject-range 0.313000 0.000000 0.313000 ( 0.323000)
inject-array 0.125000 0.000000 0.125000 ( 0.130000)
inject-array-2 0.140000 0.000000 0.140000 ( 0.155000)
inject-array-map 0.203000 0.000000 0.203000 ( 0.198000)
to_ranges 0.141000 0.000000 0.141000 ( 0.147000)
to_ranges_2 0.141000 0.000000 0.141000 ( 0.138000)
to_ranges-map 0.218000 0.000000 0.218000 ( 0.224000)
to_ranges_2-map 0.219000 0.000000 0.219000 ( 0.218000)

With your set of NUMBERS I get

$ ruby ranges-2.rb
                          user system total real
bitset 103.984000 0.000000 103.984000 (105.677000)
inject-range 77.735000 0.000000 77.735000 ( 78.927000)
inject-array 29.312000 0.015000 29.327000 ( 29.803000)
inject-array-2 28.625000 0.000000 28.625000 ( 29.301000)
inject-array-map 29.047000 0.000000 29.047000 ( 29.543000)
to_ranges 0.672000 0.000000 0.672000 ( 0.684000)
to_ranges_2 0.594000 0.000000 0.594000 ( 0.596000)
to_ranges-map 0.750000 0.000000 0.750000 ( 0.772000)
to_ranges_2-map 0.609000 0.000000 0.609000 ( 0.640000)

Your approach has a clear advantage if ranges are fairly large compared to the overall number of elements in the array. I wonder whether there is still room for optimization with regard to division of the array.

This would have been a nice quiz question, but i guess its worn
out by now.

Yeah, I guess so.

Cheers

    robert

ranges-2.rb (2.83 KB)

···

Simon Kröger <SimonKroeger@gmx.de> wrote:

Robert Klemme wrote:

With your set of NUMBERS I get

$ ruby ranges-2.rb
                         user system total real
bitset 103.984000 0.000000 103.984000 (105.677000)
inject-range 77.735000 0.000000 77.735000 ( 78.927000)
inject-array 29.312000 0.015000 29.327000 ( 29.803000)
inject-array-2 28.625000 0.000000 28.625000 ( 29.301000)
inject-array-map 29.047000 0.000000 29.047000 ( 29.543000)
to_ranges 0.672000 0.000000 0.672000 ( 0.684000)
to_ranges_2 0.594000 0.000000 0.594000 ( 0.596000)
to_ranges-map 0.750000 0.000000 0.750000 ( 0.772000)
to_ranges_2-map 0.609000 0.000000 0.609000 ( 0.640000)

Your approach has a clear advantage if ranges are fairly large compared to the overall number of elements in the array. I wonder whether there is still room for optimization with regard to division of the array.

Maybe there is, but your seek method 'inspired' me to do something else.
I wrote another linear approach (seek2) which is a faster (at least for
the given NUMBERS) and than reimplemented it in c.

Here are the results:

                           user system total real
bitset 78.203000 0.156000 78.359000 ( 78.937000)
inject-range 41.766000 0.375000 42.141000 ( 42.172000)
inject-array 17.046000 0.000000 17.046000 ( 17.062000)
inject-array-2 17.469000 0.047000 17.516000 ( 17.532000)
inject-array-map 16.938000 0.015000 16.953000 ( 17.093000)
to_ranges 0.437000 0.000000 0.437000 ( 0.438000)
to_ranges_2 0.391000 0.000000 0.391000 ( 0.391000)
to_ranges-map 0.484000 0.000000 0.484000 ( 0.484000)
to_ranges_2-map 0.453000 0.016000 0.469000 ( 0.469000)
seek 33.125000 0.016000 33.141000 ( 33.187000)
seek2 8.860000 0.000000 8.860000 ( 8.860000)
seek_c 0.078000 0.000000 0.078000 ( 0.078000)

And thats realy bad. It's five times faster than our best
algorithm in ruby - and its a realy braindead implementation:

ruby:

···

---------------------------------------------------------------
p1, p2, d1, last = 0, 0, NUMBERS[0], NUMBERS.length - 1
ranges =
while (p2+=1) <= last
   if (d2=NUMBERS[p2]) - d1 != p2 - p1
     ranges << (d1..NUMBERS[p2-1])
     p1, d1 = p2, d2
   end
end
ranges << (d1..NUMBERS[last])
---------------------------------------------------------------

and the same thing in c:

---------------------------------------------------------------
VALUE seek_c(VALUE self, VALUE arr)
{
   RArray* a = RARRAY(arr);
  
   int p1 = 0;
   int p2 = 0;
   int d1 = FIX2INT(a->ptr[0]);
   int d2 = 0;
   int last = a->len -1;

   VALUE ranges = rb_ary_new();

   while ((p2+=1) <= last)
   {
     if ((d2=FIX2INT(a->ptr[p2])) - d1 != p2 - p1)
     {
       rb_ary_push(ranges, rb_range_new(INT2FIX(d1), a->ptr[p2-1], 1));
       p1 = p2;
       d1 = d2;
     }
   }
   rb_ary_push(ranges, rb_range_new(INT2FIX(d1), a->ptr[last], 1));
   return ranges;
}
---------------------------------------------------------------

this is more than a hundred times faster, ruby needs a jit engine.

Simon

this doesn't compile for me with 1.8.2 or 1.9.0 - can you give the whole
source and extconf.rb?

cheers.

-a

···

On Mon, 8 Aug 2005, [ISO-8859-1] Simon Kröger wrote:

and the same thing in c:

---------------------------------------------------------------
VALUE seek_c(VALUE self, VALUE arr)
{
RArray* a = RARRAY(arr);
    int p1 = 0;
int p2 = 0;
int d1 = FIX2INT(a->ptr[0]);
int d2 = 0;
int last = a->len -1;

VALUE ranges = rb_ary_new();

while ((p2+=1) <= last)
{
   if ((d2=FIX2INT(a->ptr[p2])) - d1 != p2 - p1)
   {
     rb_ary_push(ranges, rb_range_new(INT2FIX(d1), a->ptr[p2-1], 1));
     p1 = p2;
     d1 = d2;
   }
}
rb_ary_push(ranges, rb_range_new(INT2FIX(d1), a->ptr[last], 1));
return ranges;
}
---------------------------------------------------------------

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
Your life dwells amoung the causes of death Like a lamp standing in a strong breeze. --Nagarjuna

===============================================================================

---------------------------------------------------------------
VALUE seek_c(VALUE self, VALUE arr)
{
RArray* a = RARRAY(arr);
    int p1 = 0;
int p2 = 0;
int d1 = FIX2INT(a->ptr[0]);
int d2 = 0;
int last = a->len -1;

VALUE ranges = rb_ary_new();

while ((p2+=1) <= last)
{
   if ((d2=FIX2INT(a->ptr[p2])) - d1 != p2 - p1)
   {
     rb_ary_push(ranges, rb_range_new(INT2FIX(d1), a->ptr[p2-1], 1));
     p1 = p2;
     d1 = d2;
   }
}
rb_ary_push(ranges, rb_range_new(INT2FIX(d1), a->ptr[last], 1));
return ranges;
}
---------------------------------------------------------------

o.k. - got it to compile - but you need

     rb_ary_push(ranges, rb_range_new(INT2FIX(d1), a->ptr[p2-1], 1));

                                                                    ^
                                                                    0

rb_ary_push(ranges, rb_range_new(INT2FIX(d1), a->ptr[last], 1));

                                                                ^
                                                                0

else the ranges exclude the ends - which should be included. however, if you
do that negative ranges break. i'd like to test this on actual data sizes,
which would be about 10,000 elements per array. i'm thinking anything
O(log(n)) will rise higher, even if pure ruby, for data this large... maybe
not though...

here's a set of arrays the code should work on:

   tests = [
     ,
     [0],
     [0,1],
     [0,1,2],
     [0,1,2,3],
     [0,3],
     [0,3,5],
     [0,3,5,7],
     [0,1,3,4],
     [0,1,3,4,6,7],
     [0,1,3,4,6,7,9,10],
     [0,1,3,4,6,7,10],
     [0,3,4,6,7,9,10],
     [0,1,3],
     [0,3,4],
     [0,-1],
     [0,-1,-2],
     [0,-1,-3,-4],
     [0,-1,-3,-4,-6],
     [0,-1,-3,-4,-6,-7],
     [-1,-3,-4,-6,-7],
     [-3,-4],
     [-3,-4,-6, *(-42 .. -8).to_a.reverse],
   ]

but, again, actual data will be about 10,000 element big.

cheers.

-a

···

On Mon, 8 Aug 2005, [ISO-8859-1] Simon Kröger wrote:
--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
Your life dwells amoung the causes of death Like a lamp standing in a strong breeze. --Nagarjuna

===============================================================================