I am not sure I fully understand your requirement (partly because the code seems incomplete, e.g. where does "size" come from?) but I believe you might have the wrong iteration approach: as far as I understand you have a large array and *every* element in that array falls into one of 72 cluster spots. Since the array is large you want to traverse it only once. So the natural thing would be to traverse the array, put every element into its corresponding spot and then use the clustered information. You could use #group_by for this when on 1.8.7 or newer.
robert@fussel:~$ ruby1.8 -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]
robert@fussel:~$ ruby1.8 -e 'p (1..10).group_by{|x|x%3}'
{0=>[3, 6, 9], 1=>[1, 4, 7, 10], 2=>[2, 5, 8]}
robert@fussel:~$
So I would end up doing something like this:
Spot = Struct.new :lat, :lng do
include Comparable
def <=>(spot)
to_a <=> spot.to_a
end
def get_box(precision)
# place realistic calculation here
Spot.new lat.round(-1), lng.round(-1)
end
end
listings = ... # filled with Spot instances
clustered = listings.group_by do |spot|
spot.get_box
end
# a variant might be:
ClusterSpot = Struct.new :lat, :lng, :count do
def initialize
self.lat = self.lng = self.count = 0
end
def update(spot)
c = (count + 1).to_f
self.lat = ((lat * count) + spot.lat) / c
self.lng = ((lng * count) + spot.lng) / c
self.count += 1
end
end
clustered = Hash.new do |h,k|
# add something to the Hash whenever the key shows
# up the first time
h[k] = ClusterSpot.new
end
listings.each do |spot|
clustered[spot.get_box].update(spot)
end
# output clustered information
clustered.keys.sort.each do |cl_spot|
printf "%6d entries for %p\n", clustered[cl_spot].size, cl_spot
end
Of course you can pull this off without defining additional classes but with specialized classes the code becomes clearer and more reusable IMHO.
Kind regards
robert
···
On 01/01/2010 09:00 PM, Joe Buck wrote:
Alright, I won't hide you from the details. I have a map (google maps API) with data points (100,000+).
I'm doing server side clustering of the points to limit the amount of data I send to the client (I'm using the Ruby On Rails framework).
I start out with an array of 100,000 spots. Each spot is a hash with a :lat and :lng. I need to iterate through the array and group the spots into clusters. I break the world up into 30/30 degree clusters, so I have 72 clusters to put the spots in.
The spots are ordered by longitude.
Here is the code (do you use code snippet tags here?). Note that 'listings' is the array of spots ordered by longitude.
#---------------------------------------------------------------------
for i in (0..72)
# Move to the next block.
bNextRow = (i%nRows == 0 && i!=0)
cLatLngBox.topRightLat = bNextRow ? 90 : cLatLngBox.topRightLat-size
cLatLngBox.topRightLng = bNextRow ? (cLatLngBox.topRightLng+size) : cLatLngBox.topRightLng
cLatLngBox.botLeftLat = bNextRow ? 90-size : (cLatLngBox.botLeftLat-size)
cLatLngBox.botLeftLng = bNextRow ? (cLatLngBox.botLeftLng+size) : cLatLngBox.botLeftLng
# Holds the index of each spot we added to a cluster. We will remove these
# from the listing when done.
cRemove =
# Our new cluster.
cluster = {:lat=>0, :lng=>0, :size=>0}
# Iterate through all the spots, adding them to clusters.
listings.each_with_index do |listing, h|
if cLatLngBox.is_in_box?(listing[:lat], listing[:lng])
# Add to the cluster.
cluster[:size]+=1
cluster[:lat]+=listing[:lat].to_f
cluster[:lng]+=listing[:lng].to_f
# Tag the spot for removal.
cRemove << h
else
# We cheat here. Because the spots are ordered by longitude (-180 to 180) we'll check
# if the spot longitude exceeds our current max. If it does we're done with this loop.
if listing[:lng].to_f > cLatLngBox.topRightLng
break
end
end
end
# Remove all the items that were clustered.
cRemove.each do |index|
listings.delete_at(index)
end
#---------------------------------------------------------------------
This is faster than not removing the items. But I'd like to remove them while I'm in the above loop, but still have the ability to break.
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/