Delete elements in array, break, and keep changes?

I have an array of a lot elements that I need to cluster (they are
latitudes and longitudes).

As I iterate the array I want to remove elements that I have already
clustered.

Currently I've tried using Array::reject! and Array::delete_if. It
works. I can further optimize by breaking from the delete_if/reject!
block if certiain conditions are met.

Here is the issue, if I call break in the block, all of the changes to
the array are lost.

Is there a way to break from the block but still delete/reject elements
in the array?

···

--
Posted via http://www.ruby-forum.com/.

I have an array of a lot elements that I need to cluster (they are
latitudes and longitudes).

As I iterate the array I want to remove elements that I have already
clustered.

In this case, it looks like Array#shift might work, though I'm not sure I totally understand your requirements. Something like:

until long_lat_array.empty?
  long_lat = long_lat_array.shift
  cluster(long_lat)
end

Hope I'm not missing your issue.

- Ehsan

···

_________________________________________________________________
Hotmail: Powerful Free email with security by Microsoft.
http://clk.atdmt.com/GBL/go/171222986/direct/01/

Ehsan,

Thanks for the help. Unfortunately I don't think that will work as it
removes the first element in the array.

What I need is an Array::delete_if or Array::reject! call that allows me
to break from the block while still removing elements.

My solution is to use a second array to keep the indices of all removed
elements. I can then iterate through the second array after the first
loop and remove the elements. I'm guessing reject! and delete_if do this
as well. It could be more efficient but it works.

Thanks!

···

--
Posted via http://www.ruby-forum.com/.

def copy_and_shift_array array
   temp_array = array # or perform a deep copy, if you are worried about
                      # about the first array's integrity.
   temp_array.each do |element|
     return element if condition_met?
   end
end

Array#shift is useful here *because* it gives you the first Array element in an easily accessible format (the actual Object in the Array), so that you can perform your checks and other magic, before storing it someplace else.

Note: Above code is untested, but should work, at least in principle.

(I'm always a foggy when it comes to blocks and variable assignment in Ruby, despite many ruby-talk discussions about scope in Ruby).

···

On 01.01.2010 17:45, Joe Buck wrote:

Ehsan,

Thanks for the help. Unfortunately I don't think that will work as it
removes the first element in the array.

--
Phillip Gawlowski

Without knowing the point of what you're trying to do, this sounds
like what you have in mind:

array = # or whatev
process_array = true
array.reject!(x) do
  if process_array
    conditional_code_for x
  else
    false
end

···

On Jan 1, 11:45 am, Joe Buck <semle2...@hotmail.com> wrote:

Ehsan,

Thanks for the help. Unfortunately I don't think that will work as it
removes the first element in the array.

What I need is an Array::delete_if or Array::reject! call that allows me
to break from the block while still removing elements.

My solution is to use a second array to keep the indices of all removed
elements. I can then iterate through the second array after the first
loop and remove the elements. I'm guessing reject! and delete_if do this
as well. It could be more efficient but it works.

Thanks!
--
Posted viahttp://www.ruby-forum.com/.

Also, is the reject! really the bottleneck in your app? We obviously
don't want to intentionally write sub-optimal code, but do not
optimize what's not slowing you down.

···

On Jan 1, 2:18 pm, pharrington <xenogene...@gmail.com> wrote:

On Jan 1, 11:45 am, Joe Buck <semle2...@hotmail.com> wrote:

> Ehsan,

> Thanks for the help. Unfortunately I don't think that will work as it
> removes the first element in the array.

> What I need is an Array::delete_if or Array::reject! call that allows me
> to break from the block while still removing elements.

> My solution is to use a second array to keep the indices of all removed
> elements. I can then iterate through the second array after the first
> loop and remove the elements. I'm guessing reject! and delete_if do this
> as well. It could be more efficient but it works.

> Thanks!
> --
> Posted viahttp://www.ruby-forum.com/.

Without knowing the point of what you're trying to do, this sounds
like what you have in mind:

array = # or whatev
process_array = true
array.reject!(x) do
if process_array
conditional_code_for x
else
false
end

wow array.reject! do |x| i don't know basic ruby syntax today...

Or if you really want to avoid even iterating over each element:

original_array =
new_array =

original_array.each {|x| new_array << x unless conditional_code_for
(x)}

But again, what precisely are you doing? Is the deletion code really
your bottleneck? And if it is, I'm almost certain that using a more
appropriate structure than an array would be the way to go.

···

On Jan 1, 2:19 pm, pharrington <xenogene...@gmail.com> wrote:

On Jan 1, 2:18 pm, pharrington <xenogene...@gmail.com> wrote:

> On Jan 1, 11:45 am, Joe Buck <semle2...@hotmail.com> wrote:

> > Ehsan,

> > Thanks for the help. Unfortunately I don't think that will work as it
> > removes the first element in the array.

> > What I need is an Array::delete_if or Array::reject! call that allows me
> > to break from the block while still removing elements.

> > My solution is to use a second array to keep the indices of all removed
> > elements. I can then iterate through the second array after the first
> > loop and remove the elements. I'm guessing reject! and delete_if do this
> > as well. It could be more efficient but it works.

> > Thanks!
> > --
> > Posted viahttp://www.ruby-forum.com/.

> Without knowing the point of what you're trying to do, this sounds
> like what you have in mind:

> array = # or whatev
> process_array = true
> array.reject!(x) do
> if process_array
> conditional_code_for x
> else
> false
> end

Also, is the reject! really the bottleneck in your app? We obviously
don't want to intentionally write sub-optimal code, but do not
optimize what's not slowing you down.

Alright, I won't hide you from the details. I have a map (google maps
API) with data points (100,000+).

I'm doing server side clustering of the points to limit the amount of
data I send to the client (I'm using the Ruby On Rails framework).

I start out with an array of 100,000 spots. Each spot is a hash with a
:lat and :lng. I need to iterate through the array and group the spots
into clusters. I break the world up into 30/30 degree clusters, so I
have 72 clusters to put the spots in.

The spots are ordered by longitude.

Here is the code (do you use code snippet tags here?). Note that
'listings' is the array of spots ordered by longitude.

···

#---------------------------------------------------------------------
  for i in (0..72)

    # Move to the next block.
    bNextRow = (i%nRows == 0 && i!=0)
    cLatLngBox.topRightLat = bNextRow ? 90 : cLatLngBox.topRightLat-size
    cLatLngBox.topRightLng = bNextRow ? (cLatLngBox.topRightLng+size) :
cLatLngBox.topRightLng
    cLatLngBox.botLeftLat = bNextRow ? 90-size :
(cLatLngBox.botLeftLat-size)
    cLatLngBox.botLeftLng = bNextRow ? (cLatLngBox.botLeftLng+size) :
cLatLngBox.botLeftLng

    # Holds the index of each spot we added to a cluster. We will remove
these
    # from the listing when done.
    cRemove = []

    # Our new cluster.
    cluster = {:lat=>0, :lng=>0, :size=>0}

    # Iterate through all the spots, adding them to clusters.
    listings.each_with_index do |listing, h|
      if cLatLngBox.is_in_box?(listing[:lat], listing[:lng])

        # Add to the cluster.
        cluster[:size]+=1
        cluster[:lat]+=listing[:lat].to_f
        cluster[:lng]+=listing[:lng].to_f

        # Tag the spot for removal.
        cRemove << h
      else
        # We cheat here. Because the spots are ordered by longitude
(-180 to 180) we'll check
        # if the spot longitude exceeds our current max. If it does
we're done with this loop.
        if listing[:lng].to_f > cLatLngBox.topRightLng
          break
        end
      end
    end

    # Remove all the items that were clustered.
    cRemove.each do |index|
      listings.delete_at(index)
    end
#---------------------------------------------------------------------

This is faster than not removing the items. But I'd like to remove them
while I'm in the above loop, but still have the ability to break.
--
Posted via http://www.ruby-forum.com/.

Sorry for the terrible formatting. As another example, here is C++ code
that would do what I want using iterators.

typedef std::vector<cSpot> spots_t;
spots_t cSpots;

// Fill cSpots here ...

CLatLngBox cLatLngBox;

for(int i=0; i<72; i++)
  {
  // Move cLatLngBox to the next block ..

  // Our new cluster.
  CCluster cCluster;

  // Iterate points.
  spots_t::iterator itr = cSpots.begin ();
  for(itr; itr!=cSpots.end():wink:
    {
     if(cLatLngBox.is_in_box(*itr))
       {
       // Add to cluster.

       // This is where we remove the spot.
       itr = cSpots.erase(itr);
       }
     else
       {
       if((*itr).lng > cLatLngBox.lng)
         {
         // Done with the inner loop.
         break;
         }

       // Next spot.
       itr++;
       }
    }
  }

···

--
Posted via http://www.ruby-forum.com/.

I am not sure I fully understand your requirement (partly because the code seems incomplete, e.g. where does "size" come from?) but I believe you might have the wrong iteration approach: as far as I understand you have a large array and *every* element in that array falls into one of 72 cluster spots. Since the array is large you want to traverse it only once. So the natural thing would be to traverse the array, put every element into its corresponding spot and then use the clustered information. You could use #group_by for this when on 1.8.7 or newer.

robert@fussel:~$ ruby1.8 -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]
robert@fussel:~$ ruby1.8 -e 'p (1..10).group_by{|x|x%3}'
{0=>[3, 6, 9], 1=>[1, 4, 7, 10], 2=>[2, 5, 8]}
robert@fussel:~$

So I would end up doing something like this:

Spot = Struct.new :lat, :lng do
   include Comparable

   def <=>(spot)
     to_a <=> spot.to_a
   end

   def get_box(precision)
     # place realistic calculation here
     Spot.new lat.round(-1), lng.round(-1)
   end
end

listings = ... # filled with Spot instances

clustered = listings.group_by do |spot|
   spot.get_box
end

# a variant might be:
ClusterSpot = Struct.new :lat, :lng, :count do
   def initialize
     self.lat = self.lng = self.count = 0
   end

   def update(spot)
     c = (count + 1).to_f
     self.lat = ((lat * count) + spot.lat) / c
     self.lng = ((lng * count) + spot.lng) / c
     self.count += 1
   end
end

clustered = Hash.new do |h,k|
   # add something to the Hash whenever the key shows
   # up the first time
   h[k] = ClusterSpot.new
end

listings.each do |spot|
   clustered[spot.get_box].update(spot)
end

# output clustered information
clustered.keys.sort.each do |cl_spot|
   printf "%6d entries for %p\n", clustered[cl_spot].size, cl_spot
end

Of course you can pull this off without defining additional classes but with specialized classes the code becomes clearer and more reusable IMHO.

Kind regards

  robert

···

On 01/01/2010 09:00 PM, Joe Buck wrote:

Alright, I won't hide you from the details. I have a map (google maps API) with data points (100,000+).

I'm doing server side clustering of the points to limit the amount of data I send to the client (I'm using the Ruby On Rails framework).

I start out with an array of 100,000 spots. Each spot is a hash with a :lat and :lng. I need to iterate through the array and group the spots into clusters. I break the world up into 30/30 degree clusters, so I have 72 clusters to put the spots in.

The spots are ordered by longitude.

Here is the code (do you use code snippet tags here?). Note that 'listings' is the array of spots ordered by longitude.

#---------------------------------------------------------------------
  for i in (0..72)

    # Move to the next block.
    bNextRow = (i%nRows == 0 && i!=0)
    cLatLngBox.topRightLat = bNextRow ? 90 : cLatLngBox.topRightLat-size
    cLatLngBox.topRightLng = bNextRow ? (cLatLngBox.topRightLng+size) : cLatLngBox.topRightLng
    cLatLngBox.botLeftLat = bNextRow ? 90-size : (cLatLngBox.botLeftLat-size)
    cLatLngBox.botLeftLng = bNextRow ? (cLatLngBox.botLeftLng+size) : cLatLngBox.botLeftLng

    # Holds the index of each spot we added to a cluster. We will remove these
    # from the listing when done.
    cRemove =

    # Our new cluster.
    cluster = {:lat=>0, :lng=>0, :size=>0}

    # Iterate through all the spots, adding them to clusters.
    listings.each_with_index do |listing, h|
      if cLatLngBox.is_in_box?(listing[:lat], listing[:lng])

        # Add to the cluster.
        cluster[:size]+=1
        cluster[:lat]+=listing[:lat].to_f
        cluster[:lng]+=listing[:lng].to_f

        # Tag the spot for removal.
        cRemove << h
      else
        # We cheat here. Because the spots are ordered by longitude (-180 to 180) we'll check
        # if the spot longitude exceeds our current max. If it does we're done with this loop.
        if listing[:lng].to_f > cLatLngBox.topRightLng
          break
        end
      end
    end

    # Remove all the items that were clustered.
    cRemove.each do |index|
      listings.delete_at(index)
    end
#---------------------------------------------------------------------

This is faster than not removing the items. But I'd like to remove them while I'm in the above loop, but still have the ability to break.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Array#delete_if appears to be completely borked when used with 'break'.

a = [5,6,7,8,9,10]

=> [5, 6, 7, 8, 9, 10]

a.delete_if { |x| break if x > 8; x < 7 }

=> nil

a

=> [7, 8, 7, 8, 9, 10]

I have opened a ticket:
http://redmine.ruby-lang.org/issues/show/2545

In the mean time you could do something like this:

class Array
  def safe_delete_if
    i = 0
    while i < length
      if yield self[i]
        slice!(i,1)
      else
        i += 1
      end
    end
  end
end

···

--
Posted via http://www.ruby-forum.com/\.

Alright, I won't hide you from the details. I have a map (google maps API) with data points (100,000+).

I'm doing server side clustering of the points to limit the amount of data I send to the client (I'm using the Ruby On Rails framework).

I start out with an array of 100,000 spots. Each spot is a hash with a :lat and :lng. I need to iterate through the array and group the spots into clusters. I break the world up into 30/30 degree clusters, so I have 72 clusters to put the spots in.

The spots are ordered by longitude.

Here is the code (do you use code snippet tags here?). Note that 'listings' is the array of spots ordered by longitude.

#---------------------------------------------------------------------
  for i in (0..72)

    # Move to the next block.
    bNextRow = (i%nRows == 0 && i!=0)
    cLatLngBox.topRightLat = bNextRow ? 90 : cLatLngBox.topRightLat-size
    cLatLngBox.topRightLng = bNextRow ? (cLatLngBox.topRightLng+size) : cLatLngBox.topRightLng
    cLatLngBox.botLeftLat = bNextRow ? 90-size : (cLatLngBox.botLeftLat-size)
    cLatLngBox.botLeftLng = bNextRow ? (cLatLngBox.botLeftLng+size) : cLatLngBox.botLeftLng

    # Holds the index of each spot we added to a cluster. We will remove these
    # from the listing when done.
    cRemove =

    # Our new cluster.
    cluster = {:lat=>0, :lng=>0, :size=>0}

    # Iterate through all the spots, adding them to clusters.
    listings.each_with_index do |listing, h|
      if cLatLngBox.is_in_box?(listing[:lat], listing[:lng])

        # Add to the cluster.
        cluster[:size]+=1
        cluster[:lat]+=listing[:lat].to_f
        cluster[:lng]+=listing[:lng].to_f

        # Tag the spot for removal.
        cRemove << h
      else
        # We cheat here. Because the spots are ordered by longitude (-180 to 180) we'll check
        # if the spot longitude exceeds our current max. If it does we're done with this loop.
        if listing[:lng].to_f > cLatLngBox.topRightLng
          break
        end
      end
    end

    # Remove all the items that were clustered.
    cRemove.each do |index|
      listings.delete_at(index)
    end
#---------------------------------------------------------------------

This is faster than not removing the items. But I'd like to remove them while I'm in the above loop, but still have the ability to break.

I am not sure I fully understand your requirement (partly because the code seems incomplete, e.g. where does "size" come from?) but I believe you might have the wrong iteration approach: as far as I understand you have a large array and *every* element in that array falls into one of 72 cluster spots. Since the array is large you want to traverse it only once. So the natural thing would be to traverse the array, put every element into its corresponding spot and then use the clustered information. You could use #group_by for this when on 1.8.7 or newer.

robert@fussel:~$ ruby1.8 -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]
robert@fussel:~$ ruby1.8 -e 'p (1..10).group_by{|x|x%3}'
{0=>[3, 6, 9], 1=>[1, 4, 7, 10], 2=>[2, 5, 8]}
robert@fussel:~$

So I would end up doing something like this:

Spot = Struct.new :lat, :lng do
  include Comparable

  def <=>(spot)
    to_a <=> spot.to_a
  end

  def get_box(precision)
    # place realistic calculation here
    Spot.new lat.round(-1), lng.round(-1)
  end
end

listings = ... # filled with Spot instances

clustered = listings.group_by do |spot|
  spot.get_box
end

# a variant might be:
ClusterSpot = Struct.new :lat, :lng, :count do
  def initialize
    self.lat = self.lng = self.count = 0
  end

  def update(spot)
    c = (count + 1).to_f
    self.lat = ((lat * count) + spot.lat) / c
    self.lng = ((lng * count) + spot.lng) / c
    self.count += 1
  end
end

clustered = Hash.new do |h,k|
  # add something to the Hash whenever the key shows
  # up the first time
  h[k] = ClusterSpot.new
end

listings.each do |spot|
  clustered[spot.get_box].update(spot)
end

This output code belongs to the first variant. Sorry for mixing this up but my concentration suffers a bit from lying sick in bed. :frowning:

# output clustered information
clustered.keys.sort.each do |cl_spot|
  printf "%6d entries for %p\n", clustered[cl_spot].size, cl_spot
end

Output for the second alternative could look like

clustered.sort.each do |spot, cl_spot|
   printf "lat=%8.3f lng=%8.3f\n", cl_spot
end

The main advantage of both these approaches is that they do not need a sorted input. This makes them more flexible. The single pass approach is useful for arbitrary large sets of data - especially when using the second approach where the size of the collection never grows beyond what will be output in the end.

Cheers

  robert

···

On 01/02/2010 12:28 PM, Robert Klemme wrote:

On 01/01/2010 09:00 PM, Joe Buck wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/