Searching through a sorted array

Hello,

I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range. I could use find_all

partition = my_array.find_all {|x| (x > start) && (x < end)}

My problem is that, as far as I know, find_all just iterates through
all the elements of the array and that's very inefficient, especially
in my case, in which I have a sorted array. Is there any standard way
to search efficiently through a sorted array? A binary search for
example?

Thanks
FireAphis

FireAphis wrote:

Hello,

I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range.

Then why not

my_array[start..end] ?

Cheers,
Peter

···

__
http://www.rubyrailways.com
http://scrubyt.org

FireAphis wrote:

My problem is that, as far as I know, find_all just iterates through
all the elements of the array and that's very inefficient, especially
in my case, in which I have a sorted array. Is there any standard way
to search efficiently through a sorted array? A binary search for
example?

You can also turn your array into a hash, which will work just as well
for unsorted arrays:

class Dog
  attr_reader :breed, :id

  def initialize(breed, id)
    @breed = breed
    @id = id
  end

end

#Create sorted array of Dog's:

···

#-----------------------------
dogs =
(1..2_000).each do |r|
  dogs << Dog.new("Akita", r*2)
end

=begin
dogs = (1..2000).inject() do |arr, r|
  arr << Dog.new("Akita", r*2)
  arr
end
=end
#----------------------------+

#Convert array to hash.
#key: dog.id
#val: Dog object
#---------------------
my_id_dog_hash = {}

class << my_id_dog_hash
  attr_accessor :highest_id
end

dogs.each do |dog|
  my_id_dog_hash[dog.id] = dog
end

my_id_dog_hash.highest_id = dogs[-1].id
----------------------+

#Find Dog's in range:
#-------------------
def get_dogs_in_range(id_dog_hash, range)

  high_id = id_dog_hash.highest_id
  if range.end > high_id
    range = range.begin..high_id
  end

  dogs_in_range =

  range.each do |r|
    dog = id_dog_hash[r]

    if dog
      dogs_in_range << id_dog_hash[r]
    end

  end
=begin
  dogs_in_range = range.inject() do |arr1, r|
    if dog = id_dog_hash[r]
      arr1 << dog
    end

    arr1
  end
=end

  dogs_in_range
end
------------------+

#Test it out:
------------
results = get_dogs_in_range(my_id_dog_hash, 1_990..2000)
results.each {|dog| p dog}
-----------+

--output:--
#<Dog:0x1a70c @id=1990, @breed="Akita">
#<Dog:0x1a6e4 @id=1992, @breed="Akita">
#<Dog:0x1a6bc @id=1994, @breed="Akita">
#<Dog:0x1a694 @id=1996, @breed="Akita">
#<Dog:0x1a66c @id=1998, @breed="Akita">
#<Dog:0x1a644 @id=2000, @breed="Akita">

--
Posted via http://www.ruby-forum.com/\.

Sorry if my explanation has been misleading. The array isn't
necessarily comprised of continuous or consecutive numbers. Moreover
they don't necessarily start from zero. For example:

my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]

now I want to get all the elements bigger than 1 and smaller than
1000. AFAIK my_array[1..1000] doesn't help in this case.

···

On Oct 3, 9:30 am, Peter Szinek <pe...@rubyrailways.com> wrote:

FireAphis wrote:
> Hello,

> I have a very big array of objects sorted by one of its numeric data
> members. During the flow of my application I need occasionally to get
> all the elements in a specific range.

Then why not

my_array[start..end] ?

Cheers,
Peter
__http://www.rubyrailways.comhttp://scrubyt.org

-------- Original-Nachricht --------

Datum: Wed, 3 Oct 2007 16:30:03 +0900
Von: Peter Szinek <peter@rubyrailways.com>
An: ruby-talk@ruby-lang.org
Betreff: Re: Searching through a sorted array

FireAphis wrote:
> Hello,
>
> I have a very big array of objects sorted by one of its numeric data
> members. During the flow of my application I need occasionally to get
> all the elements in a specific range.

Then why not

my_array[start..end] ?

Dear FireAphis,

to find the values 'start' and 'end', you could use something like
the twenty questions game to find a number between 1 and a million
(roughly 2^20, hence 20 questions), starting by :

1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly 500000)?

2.) If the answer was "yes", is the index of 'start' ('end') bigger or smaller than 2^19+2^18 (roughly 750000)?
If the answer was "no", is the index of 'start' ('end') bigger or smaller than 2^18 (roughly 250000)?

etc..

The range in which "start" lies reduces its size by half with each question, and you need 2*log(2) n questions to find the 'start' and 'end'
values.

I am actually not aware whether that's implemented in the Array#index
method.

Best regards,

Axel

···

--
Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
Ideal für Modem und ISDN: GMX Handytarife 2024 | jetzt wechseln & sparen

Neat. Not space efficient but undoubtfully more quick. But if the IDs
have large gaps ([10000, 20000, 30000]) do you think it still will be
more efficient compared to a simple iteration? Consider the fact that
a simple loop will have three iterations whereas your implementation
will have 30000 iteration.

Thanks,
FireAphis

···

On Oct 3, 1:45 pm, 7stud -- <dol...@excite.com> wrote:

FireAphis wrote:
> My problem is that, as far as I know, find_all just iterates through
> all the elements of the array and that's very inefficient, especially
> in my case, in which I have a sorted array. Is there any standard way
> to search efficiently through a sorted array? A binary search for
> example?

You can also turn your array into a hash, which will work just as well
for unsorted arrays:

class Dog
  attr_reader :breed, :id

  def initialize(breed, id)
    @breed = breed
    @id = id
  end

end

#Create sorted array of Dog's:
#-----------------------------
dogs =
(1..2_000).each do |r|
  dogs << Dog.new("Akita", r*2)
end

=begin
dogs = (1..2000).inject() do |arr, r|
  arr << Dog.new("Akita", r*2)
  arr
end
=end
#----------------------------+

#Convert array to hash.
#key: dog.id
#val: Dog object
#---------------------
my_id_dog_hash = {}

class << my_id_dog_hash
  attr_accessor :highest_id
end

dogs.each do |dog|
  my_id_dog_hash[dog.id] = dog
end

my_id_dog_hash.highest_id = dogs[-1].id
----------------------+

#Find Dog's in range:
#-------------------
def get_dogs_in_range(id_dog_hash, range)

  high_id = id_dog_hash.highest_id
  if range.end > high_id
    range = range.begin..high_id
  end

  dogs_in_range =

  range.each do |r|
    dog = id_dog_hash[r]

    if dog
      dogs_in_range << id_dog_hash[r]
    end

  end
=begin
  dogs_in_range = range.inject() do |arr1, r|
    if dog = id_dog_hash[r]
      arr1 << dog
    end

    arr1
  end
=end

  dogs_in_range
end
------------------+

#Test it out:
------------
results = get_dogs_in_range(my_id_dog_hash, 1_990..2000)
results.each {|dog| p dog}
-----------+

--output:--
#<Dog:0x1a70c @id=1990, @breed="Akita">
#<Dog:0x1a6e4 @id=1992, @breed="Akita">
#<Dog:0x1a6bc @id=1994, @breed="Akita">
#<Dog:0x1a694 @id=1996, @breed="Akita">
#<Dog:0x1a66c @id=1998, @breed="Akita">
#<Dog:0x1a644 @id=2000, @breed="Akita">

--
Posted viahttp://www.ruby-forum.com/.

-------- Original-Nachricht --------

Datum: Wed, 3 Oct 2007 16:50:03 +0900
Von: FireAphis <FireAphis@gmail.com>
An: ruby-talk@ruby-lang.org
Betreff: Re: Searching through a sorted array

> FireAphis wrote:
> > Hello,
>
> > I have a very big array of objects sorted by one of its numeric data
> > members. During the flow of my application I need occasionally to get
> > all the elements in a specific range.
>
> Then why not
>
> my_array[start..end] ?
>
> Cheers,
> Peter
> __http://www.rubyrailways.comhttp://scrubyt.org

Sorry if my explanation has been misleading. The array isn't
necessarily comprised of continuous or consecutive numbers. Moreover
they don't necessarily start from zero. For example:

my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]

now I want to get all the elements bigger than 1 and smaller than
1000. AFAIK my_array[1..1000] doesn't help in this case.

In that case, you can say:

interesting_array=my_array.delete_if{|entry| entry>1000 or entry<1}

Best regards,

Axel

···

On Oct 3, 9:30 am, Peter Szinek <pe...@rubyrailways.com> wrote:

--
Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
Ideal für Modem und ISDN: GMX Handytarife 2024 | jetzt wechseln & sparen

Of course that could be an excellent solution. Do you know if there is
already such a facility in Ruby that implements the algorithm? I'm
quite sure Array#index doesn't do that since in order to apply a
binary search the array has to be sorted.

···

On Oct 3, 9:56 am, "Axel Etzold" <AEtz...@gmx.de> wrote:

-------- Original-Nachricht --------

> Datum: Wed, 3 Oct 2007 16:30:03 +0900
> Von: Peter Szinek <pe...@rubyrailways.com>
> An: ruby-t...@ruby-lang.org
> Betreff: Re: Searching through a sorted array
> FireAphis wrote:
> > Hello,

> > I have a very big array of objects sorted by one of its numeric data
> > members. During the flow of my application I need occasionally to get
> > all the elements in a specific range.

> Then why not

> my_array[start..end] ?

Dear FireAphis,

to find the values 'start' and 'end', you could use something like
the twenty questions game to find a number between 1 and a million
(roughly 2^20, hence 20 questions), starting by :

1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly 500000)?

2.) If the answer was "yes", is the index of 'start' ('end') bigger or smaller than 2^19+2^18 (roughly 750000)?
If the answer was "no", is the index of 'start' ('end') bigger or smaller than 2^18 (roughly 250000)?

etc..

The range in which "start" lies reduces its size by half with each question, and you need 2*log(2) n questions to find the 'start' and 'end'
values.

I am actually not aware whether that's implemented in the Array#index
method.

Best regards,

Axel

--
Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
Ideal für Modem und ISDN:GMX Handytarife 2024 | jetzt wechseln & sparen

Here is my try:

class Array
  # assumes a sorted array whose elements respond appropriately to <=>
  def subrange(lower, higher)
    low = find_index_lower(lower)
    high = find_index_higher(higher)
    if (low >= length) || (high < 0)
      
    else
      self[low..high]
    end

  end

   # Finds the lowest index whose value is greater than the specified parameter
   # If all values in the array are lower, returns the index after the last one
   # If all values are higher, returns 0
   def find_index_lower(lower)
     return length if ((last <=> lower) == -1)
     return 0 if ((first <=> lower) == 1)
     low = 0
     high = self.length - 1
     while low < high
       index = low + (high - low) / 2
       element = self[index]
       test = element <=> lower
       case test
       when 0:
         return index + 1
       when -1:
         return (index + 1) if ((self[index+1] <=> lower) == 1)
         low = index + 1
       when 1:
         return (index) if ((self[index-1] <=> lower) <= 0)
         high = index - 1
       end
     end
   end

   # Finds the highest index whose value is lower than the specified parameter
   # If all values in the array are higher, returns -1
   # If all values are lower, returns length-1
   def find_index_higher(higher)
     return -1 if ((first <=> higher) >= 0)
     return length - 1 if ((last <=> higher) == -1)
     low = 0
     high = self.length - 1
     while low < high
       index = low + (high - low) / 2
       element = self[index]
       test = element <=> higher
       case test
       when 0:
         return index - 1
       when -1:
         return (index) if ((self[index+1] <=> higher) >= 0)
         low = index + 1
       when 1:
         return (index - 1) if ((self[index-1] <=> higher) == -1)
         high = index - 1
       end
     end
   end

end

a = [5, 6, 7, 8, 9, 10, 14]
p a
puts "6 - 13: #{a.subrange(6,13).inspect}"
puts "5 - 15: #{a.subrange(5,15).inspect}"
puts "5 - 100: #{a.subrange(5, 100).inspect}"
puts "2 - 100: #{a.subrange(2, 100).inspect}"
puts "-1 - 100: #{a.subrange(-1, 100).inspect}"
puts "100 - 150: #{a.subrange(100,150).inspect}"
puts "1 - 2: #{a.subrange(1,2).inspect}"

require 'benchmark'

n = 10_000
Benchmark.bmbm do |x|
  x.report("Array subrange") {
    n.times {a.subrange(6,13)}
  }
end

···

On 10/3/07, FireAphis <FireAphis@gmail.com> wrote:

> > I have a very big array of objects sorted by one of its numeric data
> > members. During the flow of my application I need occasionally to get
> > all the elements in a specific range.
Sorry if my explanation has been misleading. The array isn't
necessarily comprised of continuous or consecutive numbers. Moreover
they don't necessarily start from zero. For example:

my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]

now I want to get all the elements bigger than 1 and smaller than
1000. AFAIK my_array[1..1000] doesn't help in this case.

-----------------------------------------------------------

$ ruby array_subrange.rb
[5, 6, 7, 8, 9, 10, 14]
6 - 13: [7, 8, 9, 10]
5 - 15: [6, 7, 8, 9, 10, 14]
5 - 100: [6, 7, 8, 9, 10, 14]
2 - 100: [5, 6, 7, 8, 9, 10, 14]
-1 - 100: [5, 6, 7, 8, 9, 10, 14]
100 - 150:
1 - 2:
Rehearsal --------------------------------------------------
Array subrange 0.220000 0.020000 0.240000 ( 0.274937)
----------------------------------------- total: 0.240000sec

                     user system total real
Array subrange 0.190000 0.010000 0.200000 ( 0.227276)

Jesus.

FireAphis wrote:

Neat. Not space efficient

Yes, you're right. I wouldn't recommend it for really large arrays. The
original sorted array could also be abandoned in favor of a hash to
begin with.

But if the IDs
have large gaps ([10000, 20000, 30000]) do you think it still will be
more efficient compared to a simple iteration?

Depending on how you write the simple iteration, the hash method can be
a lot slower.

a simple loop will have three iterations whereas your implementation
will have 30000 iteration.

What if a simple iteration has 3 iterations x 3,000 range checks, e.g.:

  dogs_in_range =

  dogs.each do |dog|
    range.each do |r|
      if dog.id == r
        dogs_in_range << dog
      end
    end
  end

···

--
Posted via http://www.ruby-forum.com/\.

-------- Original-Nachricht --------

Datum: Wed, 3 Oct 2007 17:15:12 +0900
Von: FireAphis <FireAphis@gmail.com>
An: ruby-talk@ruby-lang.org
Betreff: Re: Searching through a sorted array

> -------- Original-Nachricht --------
>
> > Datum: Wed, 3 Oct 2007 16:30:03 +0900
> > Von: Peter Szinek <pe...@rubyrailways.com>
> > An: ruby-t...@ruby-lang.org
> > Betreff: Re: Searching through a sorted array
> > FireAphis wrote:
> > > Hello,
>
> > > I have a very big array of objects sorted by one of its numeric data
> > > members. During the flow of my application I need occasionally to
get
> > > all the elements in a specific range.
>
> > Then why not
>
> > my_array[start..end] ?
>
> Dear FireAphis,
>
> to find the values 'start' and 'end', you could use something like
> the twenty questions game to find a number between 1 and a million
> (roughly 2^20, hence 20 questions), starting by :
>
> 1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly
500000)?
>
> 2.) If the answer was "yes", is the index of 'start' ('end') bigger or
smaller than 2^19+2^18 (roughly 750000)?
> If the answer was "no", is the index of 'start' ('end') bigger or
smaller than 2^18 (roughly 250000)?
>
> etc..
>
> The range in which "start" lies reduces its size by half with each
question, and you need 2*log(2) n questions to find the 'start' and 'end'
> values.
>
> I am actually not aware whether that's implemented in the Array#index
> method.
>
> Best regards,
>
> Axel
>
> --
> Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
> Ideal für Modem und ISDN:GMX Handytarife 2024 | jetzt wechseln & sparen

Of course that could be an excellent solution. Do you know if there is
already such a facility in Ruby that implements the algorithm? I'm
quite sure Array#index doesn't do that since in order to apply a
binary search the array has to be sorted.

Dear FireAphis,

I don't know of an implemented solution, but I've implemented my own ... use at your own risk!

Best regards,

Axel

class Array
  def find_lower_index(cond)
    len=self.length
    upper=len
    max_nr=(Math.log(len)/Math.log(2)).floor
    lower_index=0
    pow=1
    while eval(cond)==false
      lower_index=2**pow
      pow=pow+1
    end
    pow=pow-2
    lower=2**pow
    (pow-1).downto(0) do |x|
      old_test=lower_index
      lower_index=lower+2**x
      if eval(cond)==false
      else
        lower_index=old_test
      end
    end
    return lower_index+1
  end
  
  def find_upper_index(cond)
    len=self.length
    upper=len
    max_nr=(Math.log(len)/Math.log(2)).floor
    upper_index=0
    pow=1
    while eval(cond)==true
      upper_index=2**pow
      pow=pow+1
    end
    pow=pow-2
    upper=2**pow
    (pow-1).downto(0) do |x|
      old_test=upper_index
      upper_index=upper+2**x
      if eval(cond)
      else
        upper_index=old_test
      end
      upper=upper_index
    end
    return upper_index-1
  end

end

# some example
len=10**8
a=(1..len).to_a
b=5*len/2
cond='lower_index>10'
r=a.find_lower_index(cond)
cond='upper_index<100'
s=a.find_upper_index(cond)
p r,s
p a[r]
p a[s]

···

On Oct 3, 9:56 am, "Axel Etzold" <AEtz...@gmx.de> wrote:

--
Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

A recent Ruby Quiz centered around binary search algorithms, so you can find several examples reading those solutions:

http://www.rubyquiz.com/quiz139.html

James Edward Gray II

···

On Oct 3, 2007, at 3:15 AM, FireAphis wrote:

Of course that could be an excellent solution. Do you know if there is
already such a facility in Ruby that implements the algorithm?

Dear Sir
   
  Can I please request you to stop this email services please? as I do not wish to have emails in my mail box.
   
  regards
  suchi

> > I have a very big array of objects sorted by one of its numeric data
> > members. During the flow of my application I need occasionally to get
> > all the elements in a specific range.
Sorry if my explanation has been misleading. The array isn't
necessarily comprised of continuous or consecutive numbers. Moreover
they don't necessarily start from zero. For example:

my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]

now I want to get all the elements bigger than 1 and smaller than
1000. AFAIK my_array[1..1000] doesn't help in this case.

Here is my try:

class Array
# assumes a sorted array whose elements respond appropriately to <=>
def subrange(lower, higher)
low = find_index_lower(lower)
high = find_index_higher(higher)
if (low >= length) || (high < 0)

else
self[low..high]
end

end

# Finds the lowest index whose value is greater than the specified parameter
# If all values in the array are lower, returns the index after the last one
# If all values are higher, returns 0
def find_index_lower(lower)
return length if ((last <=> lower) == -1)
return 0 if ((first <=> lower) == 1)
low = 0
high = self.length - 1
while low < high
index = low + (high - low) / 2
element = self[index]
test = element <=> lower
case test
when 0:
return index + 1
when -1:
return (index + 1) if ((self[index+1] <=> lower) == 1)
low = index + 1
when 1:
return (index) if ((self[index-1] <=> lower) <= 0)
high = index - 1
end
end
end

# Finds the highest index whose value is lower than the specified parameter
# If all values in the array are higher, returns -1
# If all values are lower, returns length-1
def find_index_higher(higher)
return -1 if ((first <=> higher) >= 0)
return length - 1 if ((last <=> higher) == -1)
low = 0
high = self.length - 1
while low < high
index = low + (high - low) / 2
element = self[index]
test = element <=> higher
case test
when 0:
return index - 1
when -1:
return (index) if ((self[index+1] <=> higher) >= 0)
low = index + 1
when 1:
return (index - 1) if ((self[index-1] <=> higher) == -1)
high = index - 1
end
end
end

end

a = [5, 6, 7, 8, 9, 10, 14]
p a
puts "6 - 13: #{a.subrange(6,13).inspect}"
puts "5 - 15: #{a.subrange(5,15).inspect}"
puts "5 - 100: #{a.subrange(5, 100).inspect}"
puts "2 - 100: #{a.subrange(2, 100).inspect}"
puts "-1 - 100: #{a.subrange(-1, 100).inspect}"
puts "100 - 150: #{a.subrange(100,150).inspect}"
puts "1 - 2: #{a.subrange(1,2).inspect}"

require 'benchmark'

n = 10_000
Benchmark.bmbm do |x|
x.report("Array subrange") {
n.times {a.subrange(6,13)}
}
end

···

Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:
  On 10/3/07, FireAphis wrote:

-----------------------------------------------------------

$ ruby array_subrange.rb
[5, 6, 7, 8, 9, 10, 14]
6 - 13: [7, 8, 9, 10]
5 - 15: [6, 7, 8, 9, 10, 14]
5 - 100: [6, 7, 8, 9, 10, 14]
2 - 100: [5, 6, 7, 8, 9, 10, 14]
-1 - 100: [5, 6, 7, 8, 9, 10, 14]
100 - 150:
1 - 2:
Rehearsal --------------------------------------------------
Array subrange 0.220000 0.020000 0.240000 ( 0.274937)
----------------------------------------- total: 0.240000sec

user system total real
Array subrange 0.190000 0.010000 0.200000 ( 0.227276)

Jesus.

---------------------------------
Forgot the famous last words? Access your message archive online. Click here.

http://raa.ruby-lang.org/search.rhtml?search=binary%20search

Kind regards

robert

···

2007/10/3, FireAphis <FireAphis@gmail.com>:

Of course that could be an excellent solution. Do you know if there is
already such a facility in Ruby that implements the algorithm? I'm
quite sure Array#index doesn't do that since in order to apply a
binary search the array has to be sorted.

Axel Etzold wrote:

# some example
len=10**8
a=(1..len).to_a
b=5*len/2
cond='lower_index>10'
r=a.find_lower_index(cond)
cond='upper_index<100'
s=a.find_upper_index(cond)
p r,s
p a[r]
p a[s]

When I use this code:

a = [2, 4, 6, 8]
cond='lower_index>0'
r=a.find_lower_index(cond)
cond='upper_index<2'
s=a.find_upper_index(cond)
p r,s
p a[r]
p a[s]

I get this output:

3
1
8
4

The output appears to be backwards, since the lower index would be 1 and
the upper index would be 3. And it's a little bit puzzling why a search
would be needed to find an index in an array that is greater than 0. I
can tell you the answer without searching: it's 1. The same goes for
the upper index: if the cond = "upper index < 2", then the upper index
is 1.

···

--
Posted via http://www.ruby-forum.com/\.

Axel Etzold wrote:

Can you explain this output:

a = [2, 4, 6, 8]
cond='lower_index>0'
r=a.find_lower_index(cond)
cond='upper_index<4'
s=a.find_upper_index(cond)
p r,s
p a[r]
p a[s]

--output:--
3
2
8
6

···

--
Posted via http://www.ruby-forum.com/\.