Need help detecting overlapping ranges

Hello all,

I am writing some code where I create a bunch of ranges, then at the end
I want to create new ranges out of any ranges that overlap one another.
For example, say I have the following ranges:

(1..5) (7..11) (22..29) (5..8)

Given the ranges above, I want to end up with the following ranges:

(1..11) (22..29)

Here is the code I've come up with so far (ranges is an array of ranges
similar to what I described above in my example):

  ranges = @failed.outages
  changes = true
  while changes
    changes = false
    outages = ranges.collect { |range| range.to_a }
    ranges.clear
    while !outages.empty?
      outage = outages.shift
      outages.each do |n|
        unless (outage & n).empty?
          outage = (outage + n).uniq.sort
          outages.delete(n)
          changes = true
        end
      end
      ranges << (outage.first..outage.last)
    end
  end
  return ranges.sort { |a,b| a.first <=> b.first }

This code works, but it is *EXTREMELY* slow (my current array of ranges
is averaging out to ~24000 range elements). Anyone have an idea of how
to speed it up?

···

--
Thanks!
Bryan
--
Posted via http://www.ruby-forum.com/.

Bryan Richardson wrote:

Hello all,

I am writing some code where I create a bunch of ranges, then at the end
I want to create new ranges out of any ranges that overlap one another.
For example, say I have the following ranges:

(1..5) (7..11) (22..29) (5..8)

Given the ranges above, I want to end up with the following ranges:

(1..11) (22..29)

(...)

This code works, but it is *EXTREMELY* slow (my current array of ranges
is averaging out to ~24000 range elements). Anyone have an idea of how
to speed it up?

--
Thanks!
Bryan

Sets are usually much faster then large arrays and sets have a divide
method.

require 'set'
ranges=[(1..5),(7..11),(22..29),(5..8)]
set=Set.new
  ranges.each do|range|
  range.each do|el|
set<<el
end
end
sets = set.divide { |i,j| (i - j).abs == 1 } #straight from the
documentation!

p sets
#<Set: {#<Set: {27, 22, 28, 23, 29, 24, 25, 26}>, #<Set: {5, 11, 6, 1,
7, 2, 8, 3, 9, 4, 10}>}>

hth,

Siep

···

--
Posted via http://www.ruby-forum.com/\.

I guess it's slow because it's doing many Range to Array conversions.
Here's a version that does no such conversions:

    # Assumption: for every range: range.end >= range.begin
    def overlap(ranges)
      ranges = ranges.dup
      new_ranges =
      until ranges.empty?
        cur = ranges.pop
        overlap = ranges.find { |range|
          cur.member?(range.begin) || cur.member?(range.end) ||
            range.member?(cur.begin) || range.member?(cur.end)
        }
        if overlap
          new_begin = cur.begin < overlap.begin ? cur.begin : overlap.begin
          new_end = cur.end > overlap.end ? cur.end : overlap.end
          join = Range.new(new_begin, new_end)
          ranges.map! { |range| range.equal?(overlap) ? join : range }
        else
          new_ranges << cur
        end
      end
      new_ranges
    end

(And in case Gmail messes up the formating:
http://pastecode.com/?show=m69851851\)

So I've tried it only with the one example you've given and
I didn't actually benchmark it.

Stefan

···

2008/8/6 Bryan Richardson <btrichardson@gmail.com>:

Hello all,

I am writing some code where I create a bunch of ranges, then at the end
I want to create new ranges out of any ranges that overlap one another.
For example, say I have the following ranges:

(1..5) (7..11) (22..29) (5..8)

Given the ranges above, I want to end up with the following ranges:

(1..11) (22..29)

Here is the code I've come up with so far (ranges is an array of ranges
similar to what I described above in my example):

ranges = @failed.outages
changes = true
while changes
   changes = false
   outages = ranges.collect { |range| range.to_a }
   ranges.clear
   while !outages.empty?
     outage = outages.shift
     outages.each do |n|
       unless (outage & n).empty?
         outage = (outage + n).uniq.sort
         outages.delete(n)
         changes = true
       end
     end
     ranges << (outage.first..outage.last)
   end
end
return ranges.sort { |a,b| a.first <=> b.first }

This code works, but it is *EXTREMELY* slow (my current array of ranges
is averaging out to ~24000 range elements). Anyone have an idea of how
to speed it up?

this should be quite fast with a few constraints. the main thing is that you do not have to convert ranges to detect the overlap nor to do the merge.

cfp:~ > cat a.rb
def collapse_inclusive_integer *ranges
   ranges.flatten!
   ranges.compact!

   ranges.each do |r|
     raise ArgumentError, "exclusive range #{ range.inspect }" if
       r.exclude_end?
     raise ArgumentError, "non-integer range #{ range.inspect }" unless
       Integer === r.begin and Integer === r.end
   end

   overlaps = lambda do |i,j|
     a, b = ranges[i], ranges[j]
     a.begin <= b.end and b.begin <= a.end
   end

   merge = lambda do |i,j|
     a, b = ranges[i], ranges[j]
     values = a.begin, a.end, b.begin, b.end
     min, max = values.min, values.max
     range = min .. max
     src, dst = i < j ? [i,j] : [j,i]
     ranges[src] = range
     ranges.delete_at dst
     range
   end

   loop {
     catch('start over'){
       size = ranges.size
       size.times do |i|
         size.times do |j|
           next if i == j
           if overlaps[i,j]
             merge[i,j]
             throw 'start over'
           end
         end
       end
       return ranges
     }
   }
end

ranges = 1..5, 7..11, 22..29, 5..8

p ranges
p collapse_inclusive_integer(ranges)

cfp:~ > ruby a.rb
[1..5, 7..11, 22..29, 5..8]
[1..11, 22..29]

a @ http://codeforpeople.com/

···

On Aug 6, 2008, at 12:48 PM, Bryan Richardson wrote:

Hello all,

I am writing some code where I create a bunch of ranges, then at the end
I want to create new ranges out of any ranges that overlap one another.
For example, say I have the following ranges:

(1..5) (7..11) (22..29) (5..8)

Given the ranges above, I want to end up with the following ranges:

(1..11) (22..29)

Here is the code I've come up with so far (ranges is an array of ranges
similar to what I described above in my example):

ranges = @failed.outages
changes = true
while changes
   changes = false
   outages = ranges.collect { |range| range.to_a }
   ranges.clear
   while !outages.empty?
     outage = outages.shift
     outages.each do |n|
       unless (outage & n).empty?
         outage = (outage + n).uniq.sort
         outages.delete(n)
         changes = true
       end
     end
     ranges << (outage.first..outage.last)
   end
end
return ranges.sort { |a,b| a.first <=> b.first }

This code works, but it is *EXTREMELY* slow (my current array of ranges
is averaging out to ~24000 range elements). Anyone have an idea of how
to speed it up?

--
Thanks!
Bryan
--
Posted via http://www.ruby-forum.com/\.

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Bit busy at work, so I don't have time to play with code, but mostly
your algorithm is slow. This one should work better:

1. sort the array of ranges by startpoint
2. compare the first two ranges. if they overlap, merge them by
setting the second range equal to the merged range and the first to
nil
3. repeat with the next pair, and so on
4. compact the array

in action:

1..5, 5..8, 7..11, 22..29
nil, 1..8, 7..11, 22..29
nil, nil, 1..11, 22..29
no change
1..11, 22..29

Might be some corner cases I haven't taken care of but the basic
algorithm is O(n) and involves no range conversions.

martin

···

On Wed, Aug 6, 2008 at 11:48 AM, Bryan Richardson <btrichardson@gmail.com> wrote:

Hello all,

I am writing some code where I create a bunch of ranges, then at the end
I want to create new ranges out of any ranges that overlap one another.
For example, say I have the following ranges:

(1..5) (7..11) (22..29) (5..8)

Given the ranges above, I want to end up with the following ranges:

(1..11) (22..29)

Here is a module from our project earlier this year:

<pre>
# Module Spans and the associated Array "span" method implements an
array of numeric
# intervals with "merge", "complement", "fit", and other convenience
methods
# Copyright (C) 2008 GETCO LLC

···

On Aug 6, 1:48 pm, Bryan Richardson <btrichard...@gmail.com> wrote:

Hello all,

I am writing some code where I create a bunch of ranges, then at the end
I want to create new ranges out of any ranges that overlap one another.
For example, say I have the following ranges:

(1..5) (7..11) (22..29) (5..8)

Given the ranges above, I want to end up with the following ranges:

(1..11) (22..29)

#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# The full text of the license as of August 06, 2008, is available at
# http://www.gnu.org/licenses/gpl-3.0.txt
#
class Array

  # Module +Spans+ and the associated Array span method implements an
array numeric intervals - aka +spans+
  module Spans

    # Reduces the set of spans by combining the overlapping intervals
together to form larger intervals. Adjacent spans are not merged.
    # *IMPORTANT:* The return set is a set of *copies* of objects,
*not the references* to the original objects. The new object have some
of their boundaries changed, but it is random, which objects are
deleted, and which have new boundaries.
    #
    # >> [ [5,5], [10,20], [3,4], [5,6], [15,25], [3,7] ].spans.merge
    # => [[3, 7], [10, 25]]
    def merge
      result = deep_clone.position_sort!
      result.each_index do |idx|

        break if idx >= result.size - 1
        curr = result[ idx ]
        succ = result[ idx + 1 ]

        if overlap?( curr, succ )
          curr.send("#{@span_left}=", [curr.send(@span_left),
succ.send(@span_left) ].min )
          curr.send("#{@span_right}=", [curr.send(@span_right),
succ.send(@span_right)].max )
          result.delete_at( idx + 1)
          redo
        end
      end
      return result.spans( @span_left, @span_right )
    end

    # Provides the spans' complement to fill the specified interval.
The complement spans is a simple array of new arrays, responding to
the +first+ and +last+ methods.
    #
    # >> [ [5,5], [10,20], [3,4], [5,6], [15,25],
[3,7] ].spans.complement(-1,28)
    # => [[-1, 3], [7, 10], [25, 28]]
    def complement( left, right )
      complemented =
      merge.clean!.each do |span|
        break if right <= span.send(@span_left)
        next if span.send(@span_right) < left
        complemented << [ left, span.send(@span_left) ] if left <
span.send(@span_left)
        left = span.send(@span_right)
      end
      complemented << [ left, right ] if left < right
      complemented.spans
    end

    # Finds a first arbitrary span to fit the specified size. Returns
one of the original objects in the provided spans array.
    #
    # >> [ [5,5], [2,4], [3,7] ].spans.fit( 2 )
    # => [2, 4]
    def fit( requested_size )
      # see in-code comment to fit_all
      find do |a|
        delta = a.send(@span_right) - a.send(@span_left)
        (delta = requested_size) or (delta > requested_size)
      end
    end

    # Finds all spans to fit the specified size. Returns a new, but
filtered, array of the original objects. The array alreasy has +Spans+
mixed in with the original boundary names.
    #
    # >> [ [5,5], [2,4], [3,7] ].spans.fit_all( 2 )
    # => [[2, 4], [3, 7]]
    def fit_all( requested_size )
      self.select do |a|
        ( requested_size <= (a.send(@span_right) -
a.send(@span_left)) )
      end.spans( @span_left, @span_right )
    end

    # Removes all empty and invalid spans from the the calling
instance
    def clean!() delete_if{ |s| s.send(@span_left) >=
s.send(@span_right) } end

    protected

    def position_sort() sort { |a,b| position( a, b ) } end
    def position_sort!() sort! { |a,b| position( a, b ) } end

    def position( a, b )
      if a.send(@span_left) < b.send(@span_left)
        a.send(@span_right) > b.send(@span_left) ? 0 : -1 # -1 is
when "a" is at left of "b"
      else
         a.send(@span_left) < b.send(@span_right) ? 0 : 1 # 1 is when
"a" is at right of "b"
      end
    end

    def overlap?( a, b )
      position( a, b ) == 0
    end

    def deep_clone
        klone = self.clone
        klone.clear
        each { |v| klone << v.clone }
        klone
    end
  end

  # The setter method for an array's first element
  def first=( value) self[0] =value end

  # The setter method for an array's last element
  def last=( value) self[-1]=value end

  # Sets up the calling +Array+ instance to be processed as +Spans+ -
a collection of intervals. If arguments provided those should be names
of getter/setter methods for left and right sides on intervals
accordingly.
  def spans( left_name = nil, right_name = nil )
    # Validates, that all spans in the array have properly ordered
ends (beginning before or at the end)
    def valid?
      each { |span| return false if span.send(@span_left) >
span.send(@span_right) }
      return true
    end

    @span_left = ( left_name || "first" )
    @span_right = ( right_name || "last" )
    unless valid?
      remove_instance_variable(:@span_left)
      remove_instance_variable(:@span_right)
      raise ArgumentError, "Can not consider this array for spans -
invalid intervals."
    end
    self.extend Spans
  end
end
</pre>

Thanks to all who replied! I took the main advice of all of your posts
to be "don't convert ranges to arrays stupid!!!" and also "don't do this
recursively stupid!". :slight_smile:

With that in mind, here's what I came up with (I didn't want to just
copy someone's code... I don't learn anything that way!) -- it seems to
be much faster:

def merge_outages
  ranges = @failed.outages.sort { |a,b| a.first <=> b.first }
  outages = Array.new
  while !ranges.empty?
    range = ranges.shift
    loop do
      if ranges.empty?
        break
      else
        if (range.last + 1) >= ranges.first.first
          range = (range.first..ranges.first.last)
          ranges.shift
        else
          break
        end
      end
    end
    outages << range
  end
  return outages
end

What do you guys think of this approach? Am I overlooking something
that you guys suggested?

···

--
Thanks!
Bryan
--
Posted via http://www.ruby-forum.com/.

It's O(n), but only if you write a radix sort to sort the ranges :wink:

Stefan

···

2008/8/6 Martin DeMello <martindemello@gmail.com>:

On Wed, Aug 6, 2008 at 11:48 AM, Bryan Richardson > <btrichardson@gmail.com> wrote:

Hello all,

I am writing some code where I create a bunch of ranges, then at the end
I want to create new ranges out of any ranges that overlap one another.
For example, say I have the following ranges:

(1..5) (7..11) (22..29) (5..8)

Given the ranges above, I want to end up with the following ranges:

(1..11) (22..29)

Bit busy at work, so I don't have time to play with code, but mostly
your algorithm is slow. This one should work better:

1. sort the array of ranges by startpoint
2. compare the first two ranges. if they overlap, merge them by
setting the second range equal to the merged range and the first to
nil
3. repeat with the next pair, and so on
4. compact the array

in action:

1..5, 5..8, 7..11, 22..29
nil, 1..8, 7..11, 22..29
nil, nil, 1..11, 22..29
no change
1..11, 22..29

Might be some corner cases I haven't taken care of but the basic
algorithm is O(n) and involves no range conversions.

Hi --

Thanks to all who replied! I took the main advice of all of your posts
to be "don't convert ranges to arrays stupid!!!" and also "don't do this
recursively stupid!". :slight_smile:

With that in mind, here's what I came up with (I didn't want to just
copy someone's code... I don't learn anything that way!) -- it seems to
be much faster:

def merge_outages
ranges = @failed.outages.sort { |a,b| a.first <=> b.first }
outages = Array.new
while !ranges.empty?
   range = ranges.shift
   loop do
     if ranges.empty?
       break
     else
       if (range.last + 1) >= ranges.first.first
         range = (range.first..ranges.first.last)
         ranges.shift
       else
         break
       end
     end
   end
   outages << range
end
return outages
end

I did something similar in trying to implement Martin's algorithm. I
haven't tested it beyond eyeballing the results for this one run:

ranges = [(1..5), (7..11), (22..29), (5..8)].sort_by {|r| r.first }
outages = [ranges.shift]

ranges.each do |r|
   if outages[-1].include?(r.first)
     outages[-1] = Range.new(outages[-1].first, r.last)
   else
     outages.push(r)
   end
end

p outages

David

···

On Thu, 7 Aug 2008, Bryan Richardson wrote:

--
Rails training from David A. Black and Ruby Power and Light:
  * Advancing With Rails August 18-21 Edison, NJ
  * Co-taught by D.A. Black and Erik Kastner
See http://www.rubypal.com for details and updates!

Doh :slight_smile:

m.

···

On Wed, Aug 6, 2008 at 3:09 PM, Stefan Lang <perfectly.normal.hacker@gmail.com> wrote:

2008/8/6 Martin DeMello <martindemello@gmail.com>:

Might be some corner cases I haven't taken care of but the basic
algorithm is O(n) and involves no range conversions.

It's O(n), but only if you write a radix sort to sort the ranges :wink:

this_will_break_it = 0 ... 42

a @ http://codeforpeople.com/

···

On Aug 6, 2008, at 3:47 PM, Bryan Richardson wrote:

Am I overlooking something
that you guys suggested?

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Hi David,

I like the conciseness of your code, but I think it only works for when
the end of one range is the same as the beginning of another. I need to
also handle the times when ranges overlap more than that (i.e. 1..5,
3..6 -> 1..6). Thanks for thinking about this!

···

--
Thanks!
Bryan

David A. Black wrote:

Hi --

On Thu, 7 Aug 2008, Bryan Richardson wrote:

outages = Array.new
         break
       end
     end
   end
   outages << range
end
return outages
end

I did something similar in trying to implement Martin's algorithm. I
haven't tested it beyond eyeballing the results for this one run:

ranges = [(1..5), (7..11), (22..29), (5..8)].sort_by {|r| r.first }
outages = [ranges.shift]

ranges.each do |r|
   if outages[-1].include?(r.first)
     outages[-1] = Range.new(outages[-1].first, r.last)
   else
     outages.push(r)
   end
end

p outages

David

--
Posted via http://www.ruby-forum.com/\.

Hi Ara,

Can you elaborate at all?

···

--
Thanks!
Bryan
--
Posted via http://www.ruby-forum.com/.

Actually, never mind what I said about this code in a previous post... I
misunderstood it. I think it works perfectly... still testing!!! :slight_smile:

David A. Black wrote:

···

Hi --

I did something similar in trying to implement Martin's algorithm. I
haven't tested it beyond eyeballing the results for this one run:

ranges = [(1..5), (7..11), (22..29), (5..8)].sort_by {|r| r.first }
outages = [ranges.shift]

ranges.each do |r|
   if outages[-1].include?(r.first)
     outages[-1] = Range.new(outages[-1].first, r.last)
   else
     outages.push(r)
   end
end

p outages

David

--
Posted via http://www.ruby-forum.com/\.

It actually looks like the latest code I posted doesn't seem to work
either... I'm getting gaps where I shouldn't be. :frowning:

···

--
Posted via http://www.ruby-forum.com/.

cfp:~ > ruby -e' p ( 0 ... 1 ).to_a '
[0]

cfp:~ > ruby -e' p ( 0 .. 1 ).to_a '
[0, 1]

one range included the endpoint, one does not. with integer ranges you could guess the prior end point, but not with floats

cfp:~ > ruby -e' p ( 0 ... 1.0 ).to_a '
[0]

cfp:~ > ruby -e' p ( 0 ... 1.0 ).include?(0.999999999999999) '
true

a @ http://codeforpeople.com/

···

On Aug 6, 2008, at 4:36 PM, Bryan Richardson wrote:

Can you elaborate at all?

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Hi --

Hi David,

I like the conciseness of your code, but I think it only works for when
the end of one range is the same as the beginning of another. I need to
also handle the times when ranges overlap more than that (i.e. 1..5,
3..6 -> 1..6). Thanks for thinking about this!

Here's the heavy artillery :slight_smile:

def merge_ranges(ranges)
   ranges = ranges.sort_by {|r| r.first }
   *outages = ranges.shift
   ranges.each do |r|
     lastr = outages[-1]
     if lastr.last >= r.first - 1
       outages[-1] = lastr.first..[r.last, lastr.last].max
     else
       outages.push(r)
     end
   end
   outages
end

require 'test/unit'

class RangeTest < Test::Unit::TestCase
   def test_one
     ranges = [1..5, 20..20, 4..11, 40..45, 39..50]
     assert_equal([1..11, 20..20, 39..50], merge_ranges(ranges))
   end

   def test_two
     ranges = [1..5, 7..11, 22..29, 5..8]
     assert_equal([1..11, 22..29], merge_ranges(ranges))
   end

   def test_three
     ranges = [1..5, 7..11, 22..29, 6..8]
     assert_equal([1..11, 22..29], merge_ranges(ranges))
   end
end

David

···

On Thu, 7 Aug 2008, Bryan Richardson wrote:

--
Rails training from David A. Black and Ruby Power and Light:
  * Advancing With Rails August 18-21 Edison, NJ
  * Co-taught by D.A. Black and Erik Kastner
See http://www.rubypal.com for details and updates!

Crazy! I had *just* discovered the r.last lastr.last max problem when
you posted this. Thanks! :slight_smile:

David A. Black wrote:

···

Here's the heavy artillery :slight_smile:

def merge_ranges(ranges)
   ranges = ranges.sort_by {|r| r.first }
   *outages = ranges.shift
   ranges.each do |r|
     lastr = outages[-1]
     if lastr.last >= r.first - 1
       outages[-1] = lastr.first..[r.last, lastr.last].max
     else
       outages.push(r)
     end
   end
   outages
end

--
Posted via http://www.ruby-forum.com/\.