ruby for bigdata and streaming

Hello

I am not very familiar with Ruby programming.
But I found that Ruby methods accept block arguments nicely. Does that mean we
can write in a good FP style with higher-order functions, as in other languages
such as Scala?

For instance, for this job in Apache Spark:

>>> rdd.collect()
[('a', 3), ('b', 4), ('a', 2), ('c', 4)]
>>> rdd.groupByKey().mapValues(len).sortBy(lambda x: x[1], ascending=False).collect()
[('a', 2), ('b', 1), ('c', 1)]

I can do it quite easily the Ruby way:

irb(main):037:0> li
=> [["a", 3], ["b", 4], ["a", 2], ["c", 4]]
irb(main):038:0>
irb(main):039:0> li.group_by {|x| x[0]}.map{|x,y| [x,y.size]}.sort_by {|_,x| -x}
=> [["a", 2], ["b", 1], ["c", 1]]
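For comparison, here is a minimal sketch of the same count-per-key job using Enumerable#tally, assuming Ruby 2.7 or newer. The tie-break on the key inside sort_by is an addition for illustration, so that keys with equal counts come out in a predictable order; it is not part of the original one-liner.

```ruby
li = [["a", 3], ["b", 4], ["a", 2], ["c", 4]]

counts = li.map(&:first)                            # keep only the keys
           .tally                                   # count occurrences per key
           .sort_by { |key, count| [-count, key] }  # most frequent first, ties by key

p counts
```

Here tally replaces the group_by plus size step, so the intermediate groups are never built.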

So, is Ruby also suitable for big data and streaming?
I would just like to hear your thoughts.

Thanks
Adriel P

Quoting Adriel Peng (peng.adriel@gmail.com):

   I am not that familiar with ruby programming.
   But I found that ruby objects have good block arguments, that means we can
   write good FP styles with the higher functions as in other languages such
   as scala?

Hello Adriel. I know I am showing my ignorance here, but what do you
mean by 'good FP styles with the higher functions?' Nothing to do with
floating-point, I am afraid...

Thanks

Carlo

···

Subject: ruby for bigdata and streaming
  Date: Tue 18 Jan 22 04:41:44PM +0800

--
  * Se la Strada e la sua Virtu' non fossero state messe da parte,
* K * Carlo E. Prelz - fluido@fluido.as che bisogno ci sarebbe
  * di parlare tanto di amore e di rettitudine? (Chuang-Tzu)

Hello

Sorry for my mistake.
FP: Functional programming - Wikipedia
Higher-order functions: Higher-order function - Wikipedia

Thanks.

Unsubscribe: <mailto:ruby-talk-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-talk>

Quoting Adriel Peng (peng.adriel@gmail.com):

   Sorry for my mistake.
   FP: Functional programming - Wikipedia
   Higher-order functions: Higher-order function - Wikipedia

Yours was not a mistake. Thanks a lot for the pointers.

You know, I studied Architecture a long time ago, not IT (there wasn't
even a specific faculty at the time), and in my several decades of
earning my bread with code I never needed to explore the field
of functional programming.

I hope other readers can help you with this.

Best

Carlo

···

Subject: Re: ruby for bigdata and streaming
  Date: Tue 18 Jan 22 04:49:18PM +0800

There are Ruby bindings for Apache Arrow:

which allow processing of some large datasets and are actively developed. Ruby should be good for such tasks. There is a view that many approaches are influenced too much by Python:

so input on nice ways to perform certain operations is helpful. If a large amount of RAM is required, some effort is still needed for distributed processing, but if the job is primarily reading data and checking it against some conditions, then some of the existing tools might be sufficient.

Hi Adriel,

Please check this GitHub repo for the topic

Kind regards,
Anton Kozik

Thanks for all the answers.
I took a look at Apache Arrow and its Ruby binding. It seems we can use Ruby
to implement a dataframe object that supports operations similar to those in
Spark. That looks interesting.

I also found Ruby's Array API to be very powerful. For instance, Array has no
reduceByKey() method like this one in Spark:

>>> rdd.collect()
[('a', 1), ('b', 2), ('a', 3), ('b', 5)]

>>> rdd.reduceByKey(lambda x,y: x+y).collect()
[('b', 7), ('a', 4)]

But it took me just two minutes to write one using Array's other
methods, though it is not perfect. For instance:

irb(main):010:0> li = [["a",1],["b",2],["a",3],["b",5]]
=> [["a", 1], ["b", 2], ["a", 3], ["b", 5]]

irb(main):011:0> li.group_by {|x| x[0]}.map {|x,y| [x,y.reduce{|x,y| x[1]+y[1]}]}
=> [["a", 4], ["b", 7]]
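A side note on why the one-liner above is "not that perfect": the inner reduce only gives the expected sum when a key occurs exactly twice. With a single occurrence it returns the whole pair, and with three or more the accumulator becomes an Integer, on which [1] reads a bit instead of a value. Below is a minimal sketch, assuming the values are numbers, that extracts the values before reducing them. The sample data is made up for illustration.

```ruby
li = [["a", 1], ["b", 2], ["a", 3], ["b", 5], ["a", 4], ["c", 9]]

# Group the pairs by key, then reduce only the values of each group.
sums = li.group_by { |key, _| key }
         .map { |key, pairs| [key, pairs.map { |_, value| value }.reduce(:+)] }

p sums
```

Because the values are pulled out first, the reduce step never has to index into its accumulator, so groups of any size work.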

This is so flexible. Even in Scala it is quite hard to write such a method;
its signature could look like:

List[A].groupMapReduce[K, B](key: (A) => K)(f: (A) => B)(reduce: (B, B) => B): Map[K, B]

Thank you, Ruby world!

Regards.
Adriel P

Another question: my reduceByKey() in Ruby is not perfect, but how can I make
it a method of Array, so that next time I can directly call
array_obj.reduceByKey?

Sorry, I am not yet familiar with Ruby's object system.

Thank you.

Here is the simplest example:

class Array
    def third
        size > 2 ? self[2] : nil
    end
end

a = [1, 2, 3, 4, 5]

puts a.third

Kind regards,
Anton Kozik

Quoting Adriel Peng (peng.adriel@gmail.com):

   Another question, for my this reduceByKey() in ruby though it's not that
   perfect, but how can I make it as an Array member? next time I can
   directly call array_obj.reduceByKey.

Ah, for this I may be of help. Write in a source file something like:

class Array
  def reduceByKey(*args)  # replace *args with the parameters you need
    # put your code here;
    # refer to the array itself as 'self'
  end
end

After that, load your file with 'require' or 'require_relative' (or
just add the code you need to execute in the same source file, after
the method definition) and method reduceByKey will be available for
all arrays.

Carlo

···

Subject: Re: ruby for bigdata and streaming
  Date: Tue 18 Jan 22 07:52:12PM +0800

Thanks a lot, Carlo.

I got the job done:

irb(main):001:0> class Array
irb(main):002:1> def reduceByKey
irb(main):003:2> group_by {|x| x[0]}.map{|x,y| y.reduce{|x,y| [ x[0], x[1]+y[1] ]} }
irb(main):004:2> end
irb(main):005:1> end
=> :reduceByKey
irb(main):006:0>
irb(main):007:0> li = [["a",1],["b",2],["a",3],["b",5],["a",4],["c",9],["a",11]]
=> [["a", 1], ["b", 2], ["a", 3], ["b", 5], ["a", 4], ["c", 9], ["a", 11]]
irb(main):008:0>
irb(main):009:0> li.reduceByKey
=> [["a", 19], ["b", 7], ["c", 9]]

I have another question related to this.
How can I let other people (my co-workers) who are using the Array class also
use the method?
Do they need to require the file that defines the method every time?

Thanks

Quoting Adriel Peng (peng.adriel@gmail.com):

   I have another question related to this.
   How can I let other people (my co-workers) who are using Array class also
   use the method?
   Does he need to require the file which has the method defined
   every time?

Yes. You can prepare a file with all your general-purpose methods
and give a copy to each of your co-workers. If the file is copied
into a specific directory, the interpreter will find it no matter
where in the filesystem you happen to be when you execute it.

This directory is created when the interpreter is installed on the
computer. For a generic Debian-like Linux distribution, it is

/usr/lib/ruby/vendor_ruby

If you copy the file there - say you called it 'adriel.rb' - any ruby
source file or irb instance that contains

require('adriel')

at the top will be able to access your methods.
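The lookup described above works because require searches the directories listed in $LOAD_PATH, and the vendor directory is normally one of them. Here is a minimal sketch of that mechanism, using a temporary directory as a stand-in for the shared directory so that nothing is written to the system. The file name adriel.rb comes from the message above; the body of reduceByKey is only an illustrative assumption.

```ruby
require "tmpdir"

Dir.mktmpdir do |dir|
  # The temporary directory stands in for a shared directory such as vendor_ruby.
  File.write(File.join(dir, "adriel.rb"), <<~RUBY)
    class Array
      def reduceByKey
        group_by { |pair| pair[0] }.map { |key, pairs| [key, pairs.sum { |_, v| v }] }
      end
    end
  RUBY

  $LOAD_PATH.unshift(dir)  # make the directory searchable by require
  require "adriel"         # found by name, wherever the script is run from
  $LOAD_PATH.delete(dir)   # the method stays defined after loading
end

result = [["a", 1], ["b", 2], ["a", 3]].reduceByKey
p result
```

Running ruby with -I some_directory has the same effect as the unshift line, which may be handy when co-workers cannot write to the system directories.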

Carlo

···

Subject: Re: ruby for bigdata and streaming
  Date: Wed 19 Jan 22 12:11:26PM +0800

When working with large data, it might be better not to iterate multiple
times; currently you iterate over your list about three times.

If your plan is to add the values anyway, you can just use
each_with_object and a Hash object.

> li.each_with_object(Hash.new(0)) {|(k, v), h| h[k]+=v }
=> {"a"=>19, "b"=>7, "c"=>9}
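The same single-pass idea carries over to input that arrives as a stream. This is a rough sketch, assuming one "key value" record per line, with a StringIO standing in for a large file or a socket (both the record format and the data are made up for illustration). Enumerator::Lazy keeps the lines from being collected into an intermediate array.

```ruby
require "stringio"

# Hypothetical input: one "key value" record per line.
io = StringIO.new("a 1\nb 2\na 3\nb 5\n")

# Lazily split each line as it is read: "a 1\n" becomes ["a", "1"].
pairs = io.each_line.lazy.map { |line| line.split }

# Hash.new(0) returns 0 for unseen keys, so += works on the first occurrence.
totals = pairs.each_with_object(Hash.new(0)) { |(key, value), sums| sums[key] += Integer(value) }

p totals
```

Only the running totals are kept in memory, one entry per distinct key.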

From what I know, on a distributed computing platform every computing job is split into map and reduce steps; this may be the OP's purpose.

Thanks

Also, I checked his implementation: it only implements the sum() operation, though this may be enough for his use case.

A good reduce() should let the user define the operation with an anonymous function, not only the sum.

Thanks

Hi Wes,

Yes, currently I only use sum(). The standard reduceByKey implementation (I
think) would be:

irb(main):001:0> class Array
irb(main):002:1> def reduceByKey
irb(main):003:2> if block_given?
irb(main):004:3> group_by {|x| x[0]}.map{|x,y| y.reduce{|x,y| [ x[0], yield(x[1],y[1]) ]}}
irb(main):005:3> else
irb(main):006:3> raise "no block"
irb(main):007:3> end
irb(main):008:2> end
irb(main):009:1> end
=> :reduceByKey
irb(main):010:0>
irb(main):011:0> li = [["a",1],["b",2],["a",3],["b",5],["a",4],["c",9],["a",11]]
=> [["a", 1], ["b", 2], ["a", 3], ["b", 5], ["a", 4], ["c", 9], ["a", 11]]
irb(main):012:0>
irb(main):013:0> li.reduceByKey{|x,y| x+y}
=> [["a", 19], ["b", 7], ["c", 9]]
irb(main):014:0>
irb(main):015:0>
irb(main):016:0> li.reduceByKey{|x,y| x*y}
=> [["a", 132], ["b", 10], ["c", 9]]
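The block-based version can also be combined with the single-pass each_with_object suggestion made earlier in the thread. The following is a sketch only: reduce_by_key is a hypothetical standalone helper, not something from the thread or from the Ruby standard library, and it returns a Hash rather than an array of pairs.

```ruby
# Hypothetical helper: reduce values per key in a single pass over the pairs.
def reduce_by_key(pairs)
  raise ArgumentError, "no block given" unless block_given?

  pairs.each_with_object({}) do |(key, value), acc|
    # The first value for a key is stored as-is; later ones are combined by the block.
    acc[key] = acc.key?(key) ? yield(acc[key], value) : value
  end
end

li = [["a", 1], ["b", 2], ["a", 3], ["b", 5], ["a", 4], ["c", 9], ["a", 11]]

sums     = reduce_by_key(li) { |x, y| x + y }
products = reduce_by_key(li) { |x, y| x * y }

p sums
p products
```

Calling to_a on the result gives the array-of-pairs shape shown in the irb session above.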

Thanks.
