Grouping elements of an array

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

- Steve W.

···

--
Posted via http://www.ruby-forum.com/.

What about
A 0
B 20
C 40

Does that become
[[A,B],[B,C]] or [[A,B,C]] or something else? The congruence class here is
unclear.

···

On Thu, Mar 18, 2010 at 4:50 PM, Steve Wilhelm <steve@studio831.com> wrote:

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

- Steve W.
--
Posted via http://www.ruby-forum.com/\.

Hi

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

How about this:

  1 arr = [0,15,35,100,205,300]
  2 arr2 = [0, 20, 40]
  3
  4 def group(array)
  5 array.map!{|e| [e]}
  6 array.inject() do |r, e|
  7 if r == or e[0] - r.last.last > 30 then
  8 r.push(e)
  9 else
10 r[-1].push(e[0])
11 end
12 r
13 end
14 end
15
16 puts arr.inspect
17 puts group(arr).inspect
18 puts arr2.inspect
19 puts group(arr2).inspect

···

On Thu, Mar 18, 2010 at 11:50 PM, Steve Wilhelm <steve@studio831.com> wrote:

--
Roger Braun
http://yononaka.de
roger.braun@student.uni-tuebingen.de

Here is my solution, it's conceptually similar to Roger's, though differs in
implementation gist:337195 · GitHub

···

On Thu, Mar 18, 2010 at 4:50 PM, Steve Wilhelm <steve@studio831.com> wrote:

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

- Steve W.
--
Posted via http://www.ruby-forum.com/\.

Hello, I am new to Ruby. Below is my solution:

a=[0, 15, 35, 100, 205, 215, 300]
b=[]
c=[]
d=a[0]
a.each do |i|
  if i - d < 30
    c << i
  else
    b << c
    c=[i]
  end
  d =i
end
if c.size > 0
  b << c
end

p b

Here's another way:

require 'pp'
require 'set'

Struct.new("Record", :value, :timestamp)

data = [
  Struct::Record.new('A', 0),
  Struct::Record.new('B', 15),
  Struct::Record.new('C', 35),
  Struct::Record.new('D', 100),
  Struct::Record.new('E', 205),
  Struct::Record.new('F', 215),
  Struct::Record.new('G', 300),
]

pp data

pp data.
     to_set.
     divide{|i, j| (i.timestamp - j.timestamp.abs) < 30}.
     map{|s| s.to_a}

$ ruby -v z.rb
ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]
[#<struct Struct::Record value="A", timestamp=0>,
#<struct Struct::Record value="B", timestamp=15>,
#<struct Struct::Record value="C", timestamp=35>,
#<struct Struct::Record value="D", timestamp=100>,
#<struct Struct::Record value="E", timestamp=205>,
#<struct Struct::Record value="F", timestamp=215>,
#<struct Struct::Record value="G", timestamp=300>]
[[#<struct Struct::Record value="D", timestamp=100>],
[#<struct Struct::Record value="G", timestamp=300>],
[#<struct Struct::Record value="F", timestamp=215>,
  #<struct Struct::Record value="E", timestamp=205>],
[#<struct Struct::Record value="A", timestamp=0>,
  #<struct Struct::Record value="C", timestamp=35>,
  #<struct Struct::Record value="B", timestamp=15>]]

(Depending on the amount of data converting from array to sets to
arrays may be expensive :slight_smile:

···

On Thu, Mar 18, 2010 at 6:50 PM, Steve Wilhelm <steve@studio831.com> wrote:

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0, B 15, C 35, D 100, E 205, F 215, G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

Assuming records are ordered already - otherwise you need a sort in between.

require 'pp'

dat = <<DDD.each_line.map {|l|r = l.split;r[1]=r[1].to_i;r}
A 0
B 15
C 35
D 100
E 205
F 215
G 300
DDD

pp dat

gr = dat.inject do |agg, rec|
   if agg.last && rec[1] - agg.last.last[1] <= 15
     agg.last << rec
     agg
   else
     agg << [rec]
   end
end

pp gr

Note: I don't claim that this is *the* Ruby way.

Kind regards

  robert

···

On 03/18/2010 11:50 PM, Steve Wilhelm wrote:

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Lots of verbose answers. It can be quite short:

    a = [['a',0],['b',15],['c',35],['d',100],['e',205],['f',215],['g',300]]

    a.group_by {|name,val| val/100} .
      collect do |key,list|
        list.collect {|name, time| name}
      end

    # => [["a", "b", "c"], ["d"], ["e", "f"], ["g"]]

···

At 2010-03-18 06:50PM, "Steve Wilhelm" wrote:

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

--
Glenn Jackman
    Write a wise saying and your name will live forever. -- Anonymous

I think this kind of problems which slices consecutive elements in an array
is not well supported in Ruby.

Enumerable#each_slice is not usable because it slices for each fixed number
of elements.

Ruby 1.9.2 has Enumerable#slice_before and it is usable but not so elegant
because it needs to maintain previous element.

% ruby -e '
a = [
  ["A", 0],
  ["B", 15],
  ["C", 35],
  ["D", 100],
  ["E", 205],
  ["F", 215],
  ["G", 300]
]
prev = nil
p a.slice_before {|s,t|
    tmp, prev = prev, t
    tmp && (t-tmp) > 30
  }.map {|es|
    es.map {|s,t| s }
  }
'
[["A", "B", "C"], ["D"], ["E", "F"], ["G"]]

We may need Enumerable#slice_between.

% ruby -e '
module Enumerable
  def slice_between(&b)
    Enumerator.new {|y|
      first = true
      buf =
      prev = nil
      self.each {|elt|
        if first
          first = false
          buf << elt
          prev = elt
        else
          if b.call(prev, elt)
            y << buf
            buf = [elt]
          else
            buf << elt
          end
          prev = elt
        end
      }
      if !buf.empty?
        y << buf
      end
    }
  end
end
a = [
  ["A", 0],
  ["B", 15],
  ["C", 35],
  ["D", 100],
  ["E", 205],
  ["F", 215],
  ["G", 300]
]
p a.slice_between {|(s1,t1),(s2,t2)|
    (t2-t1) < 30
  }.map {|es|
    es.map {|s,t| s }
  }
'
[["A"], ["B"], ["C", "D", "E"], ["F", "G"]]

···

2010/3/19 Steve Wilhelm <steve@studio831.com>:

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

--
Tanaka Akira

Josh Cheek wrote:

···

On Thu, Mar 18, 2010 at 4:50 PM, Steve Wilhelm <steve@studio831.com> > wrote:

A 0

Any help on how to do this in the "Ruby Way" would be appreciated.

- Steve W.
--
Posted via http://www.ruby-forum.com/\.

What about
A 0
B 20
C 40

Does that become
[[A,B],[B,C]] or [[A,B,C]] or something else? The congruence class here
is
unclear.

It would be [[A,B,C]].

- Steve W.
--
Posted via http://www.ruby-forum.com/\.

Roger Braun wrote:

How about this:

You should really know about Enumerable#group_by.

irb(main):001:0> [0,15,35,100,205,300].group_by {|i| i/100 }
=> {0=>[0, 15, 35], 1=>[100], 2=>[205], 3=>[300]}

After reviewing the suggestions, here is my code. records is the result
of a ActiveRecord::find with a member called timestamp containing a UNIX
timestamp.

I introduced an addition of a default value for the find_index call and
included a descending sort of the groups.

  edges = records.each_cons(2).group_by {|(record_x, record_y)|
record_y.timestamp - record_x.timestamp < 30 }[false].map{ |edge|
edge[0] }

  groups = records.group_by { |record| edges.find_index {|edge|
record.timestamp <= edge.timestamp } || edges.count }.values.sort_by {

e> -e.count }

Thanks again for everyone's help.

- Steve W.

···

--
Posted via http://www.ruby-forum.com/\.

Yeah, but as far as I can see that's not a solution to the problem the OP wanted to solve. He wants to build chains based on the delta and not based on the fact that they fall in the same interval. Am I missing something?

Kind regards

  robert

···

On 03/19/2010 03:33 PM, Glenn Jackman wrote:

At 2010-03-18 06:50PM, "Steve Wilhelm" wrote:

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.
  I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.
  Example (second column is timestamp in seconds starting from zero).
  A 0
B 15
C 35
D 100
E 205
F 215
G 300
  would result in
  [[A, B, C], [D], [E, F], [G]]
  Any help on how to do this in the "Ruby Way" would be appreciated.

Lots of verbose answers. It can be quite short:

    a = [['a',0],['b',15],['c',35],['d',100],['e',205],['f',215],['g',300]]

    a.group_by {|name,val| val/100} .
      collect do |key,list| list.collect {|name, time| name}
      end

    # => [["a", "b", "c"], ["d"], ["e", "f"], ["g"]]

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Tanaka Akira wrote:

I think this kind of problems which slices consecutive elements in an array
is not well supported in Ruby.

+1. There seems to be some real applications where that kind of tools are nifty.

I think this kind of problems which slices consecutive elements in an array
is not well supported in Ruby.

+1. There seems to be some real applications where that kind of tools are nifty.

Enumerable#each_slice is not usable because it slices for each fixed number
of elements.
Ruby 1.9.2 has Enumerable#slice_before and it is usable but not so elegant
because it needs to maintain previous element.

...

We may need Enumerable#slice_between.

Is a really elegant method possible for this sort of thing?
Some time ago I was trying to find duplicate files,
and set up an array of files with elements (paths, sizes, checksums),
and then wrote an Enumerable#each_group method.
The only way I could think of differentiating a yield for "comparison"
from a yield of the wanted array of "similar" objects
was to have the first item of each yield to be a boolean
with true meaning compare, and false meaning it's the wanted array.

I'd be interested in seeing a better Enumerable#each_group method,
if one is possible.

In the meantime, below is what I wrote for Enumerable#each_group,
with its application to Steve Wilhelm's problem.

module Enumerable
  # Deeming successive enumerable objects to be "equivalent"
  # using a block compare, yields an array of the equivalent items.
  def each_group_use_block_compare()
    arr = nil ; obj_g = nil
    self.each do | obj |
      if arr then
        # first item in yield is "true" indicating a yield for comparison
        if ( yield true, obj_g, obj ) == 0 then
          arr << obj
          obj_g = obj # group by adjacent objects, not by first in group
          next
        end
        # first item in yield is "false" indicating a yield of the group
        yield false, arr
      end
      obj_g = obj ; arr = [ obj ]
    end
    if arr then
      # first item in yield is "false" indicating a yield of the group
      yield false, arr
    end
    return self
  end
end

# for the problem of Steve Wilhelm
def group_for_sw( arr )
  arrg =
  arr.each_group_use_block_compare do | q_compare, aa, bb |
    if q_compare then
      if bb[1] - aa[1] < 30 then 0 else -1 end
    else
      aa.map! { |cc| cc[0] } # to preserve A, B, C objects, omit this
      arrg << aa
    end
  end
  arrg
end

arr = [ [ :A, 0 ], [ :B, 15 ], [ :C, 35 ],
        [ :D, 100 ],
        [ :E, 205 ], [ :F, 215 ],
        [ :G, 300 ]
       ]
arrg = group_for_sw( arr )
p arrg #=> [[:A, :B, :C], [:D], [:E, :F], [:G]]

···

On Sun, Mar 21, 2010 at 12:57 AM, Tanaka Akira wrote:
On Sun, Mar 21, 2010 at 6:42 AM, Urabe Shyouhei wrote:
On Sun, Mar 21, 2010 at 12:57 AM, Tanaka Akira also wrote:

> Example (second column is timestamp in seconds starting from zero).
>
> A 0
> B 15
> C 35
> D 100
> E 205
> F 215
> G 300
>
> would result in
>
> [[A, B, C], [D], [E, F], [G]]

I think this kind of problems which slices consecutive elements in an array
is not well supported in Ruby.

Enumerable#each_slice is not usable because it slices for each fixed number
of elements.

I guess that's why Enumerable#each_cons exists, to iterate over a
collection and look at the next n consecutive elements.

    a = [[:A,0],[:B,15],[:C,35],[:D,100],[:E,205],[:F,215],[:G,300]]

    all =
    current = [a[0][0]]

    a.each_cons(2) do |m, n|
      if n[1] - m[1] < 30
        current << n[0]
      else
        all << current
        current = [n[0]]
      end
    end

    all << current
    p all

···

At 2010-03-20 08:57PM, "Tanaka Akira" wrote:

2010/3/19 Steve Wilhelm <steve@studio831.com>:

Ruby 1.9.2 has Enumerable#slice_before and it is usable but not so elegant
because it needs to maintain previous element.

% ruby -e '
a = [
   ["A", 0],
   ["B", 15],
   ["C", 35],
   ["D", 100],
   ["E", 205],
   ["F", 215],
   ["G", 300]
]
prev = nil
p a.slice_before {|s,t|
     tmp, prev = prev, t
     tmp && (t-tmp) > 30
   }.map {|es|
     es.map {|s,t| s }
   }
'
[["A", "B", "C"], ["D"], ["E", "F"], ["G"]]

--
Glenn Jackman
    Write a wise saying and your name will live forever. -- Anonymous

Your results are correct only because of a happenstance of the data. Add 99
in there, it should group with 100, but it groups with the 0...100
congruence class

ruby-1.9.1-p378 > [0,15,35,99,100,205,300].group_by {|i| i/100 }
=> {0=>[0, 15, 35, 99], 1=>[100], 2=>[205], 3=>[300]}

Because these groups are relative to each other, I think you must do
something like Roger or I did, where you iterate through the list and
compare it to the groups.

···

On Thu, Mar 18, 2010 at 9:29 PM, Urabe Shyouhei <shyouhei@ruby-lang.org>wrote:

Roger Braun wrote:
> How about this:

You should really know about Enumerable#group_by.

irb(main):001:0> [0,15,35,100,205,300].group_by {|i| i/100 }
=> {0=>[0, 15, 35], 1=>[100], 2=>[205], 3=>[300]}

This does not solve the problem.

irb(main):011:0> [0,15,35,99,100,205,300].group_by{|i| i/100}
=> {0=>[0, 15, 35, 99], 1=>[100], 2=>[205], 3=>[300]}

but should be

[[0, 15, 35], [99, 100], [205], [300]]

at least if I understood the problem correctly.

···

On Fri, Mar 19, 2010 at 4:29 AM, Urabe Shyouhei <shyouhei@ruby-lang.org> wrote:

Roger Braun wrote:

How about this:

You should really know about Enumerable#group_by.

irb(main):001:0> [0,15,35,100,205,300].group_by {|i| i/100 }
=> {0=>[0, 15, 35], 1=>[100], 2=>[205], 3=>[300]}

--
Roger Braun
http://yononaka.de
roger.braun@student.uni-tuebingen.de

>> I would like to convert the array into an array of arrays; each subarray
>> would contain "grouped records." Grouping would occur if the timestamp
>> of the next element in the original array is within thirty seconds of
>> the current element.

[...]

Yeah, but as far as I can see that's not a solution to the problem the
OP wanted to solve. He wants to build chains based on the delta and not
based on the fact that they fall in the same interval. Am I missing
something?

Ah, I didn't read the words that closely -- I was looking at the input
and output data instead.

···

At 2010-03-19 01:52PM, "Robert Klemme" wrote:

On 03/19/2010 03:33 PM, Glenn Jackman wrote:
> At 2010-03-18 06:50PM, "Steve Wilhelm" wrote:

--
Glenn Jackman
    Write a wise saying and your name will live forever. -- Anonymous

We need two blocks.
One for slice condition and one for sliced array.
But Ruby's method call can take only one block at most.

Enumerable#slice_before solves this problem by returning enumerator.

enum.slice_before {|elt| condition }.each {|ary| ... }

slice_between presented in [ruby-talk:359384] is similar.

enum.slice_between {|elt1, elt2| condition }.each {|ary| ... }

Another possible idea is using Proc argument.
enum.foo(lambda {|elt| condition }) {|ary| ... }

···

2010/3/21 Colin Bartlett <colinb2r@googlemail.com>:

Is a really elegant method possible for this sort of thing?
Some time ago I was trying to find duplicate files,
and set up an array of files with elements (paths, sizes, checksums),
and then wrote an Enumerable#each_group method.
The only way I could think of differentiating a yield for "comparison"
from a yield of the wanted array of "similar" objects
was to have the first item of each yield to be a boolean
with true meaning compare, and false meaning it's the wanted array.

--
Tanaka Akira