How do you do this

George_George · 1 October 2009 12:23

Given an array of strings of different lengths, e.g.
x = ["abc","abcde","def","xyzwj"]
how can you efficiently create new arrays of strings which are of the
same length? For example, the above array can be transformed into

x1 = ["abc","def"]
x2 = ["abcde","xyzwj"]

Thank you.

--
Posted via http://www.ruby-forum.com/.

Ilan_Berci1 · 1 October 2009 12:34

George George wrote:

Given an array of strings of different lengths, e.g.
x = ["abc","abcde","def","xyzwj"]
how can you efficiently create new arrays of strings which are of the
same length? For example, the above array can be transformed into

x1 = ["abc","def"]
x2 = ["abcde","xyzwj"]

Thank you.

y = {}
x.each do |v|
  y[v.length] || = []
  y[v.length] << v
end
y.values

or if you prefer less lines..

x.inject({}) do |h, v|
  (y[v.length] || = []) << v
  h
end.values
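As posted, neither snippet above will parse: `|| =` with a space is a syntax error (Ryan points this out later in the thread). The inject version also appends to `y` instead of its accumulator `h`, so it would hand back an empty result. A corrected sketch of both, assuming `x` holds the four strings from the question:

```ruby
x = ["abc", "abcde", "def", "xyzwj"]

# each version: create an array for a length the first time it is seen, then append
y = {}
x.each do |v|
  y[v.length] ||= []
  y[v.length] << v
end
p y.values  # => [["abc", "def"], ["abcde", "xyzwj"]]

# inject version: write into the accumulator h and return it from the block
grouped = x.inject({}) do |h, v|
  (h[v.length] ||= []) << v
  h
end.values
p grouped   # => [["abc", "def"], ["abcde", "xyzwj"]]
```

The group order shown relies on Ruby 1.9+ hashes keeping insertion order; on 1.8 the groups may come back in any order.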


Paul_Smith1 · 1 October 2009 12:35

Given an array of strings of different lengths, e.g.
x = ["abc","abcde","def","xyzwj"]
how can you efficiently create new arrays of strings which are of the
same length? For example, the above array can be transformed into

Well, here's something close:

h = {}
x.each do |i|
  h[i.length] ||= []
  h[i.length] << i
end

h is now a hash: {3=>["abc", "def"], 5=>["abcde", "xyzwj"]}

That's close enough to what you want that I'm sure you can run with
it. Look in the "Group by unique entries of a hash" thread for more
ideas.
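To go from that hash to the two named arrays in the question, one option is `Hash#values_at`, which returns the values for the given keys in the order you ask for them. This is a sketch that assumes you already know which lengths you want; a length with no strings would come back as `nil`.

```ruby
x = ["abc", "abcde", "def", "xyzwj"]

h = {}
x.each do |i|
  h[i.length] ||= []
  h[i.length] << i
end

# Destructure the groups for lengths 3 and 5 into separate variables
x1, x2 = h.values_at(3, 5)
p x1  # => ["abc", "def"]
p x2  # => ["abcde", "xyzwj"]
```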

On Thu, Oct 1, 2009 at 1:23 PM, George George <george.githinji@gmail.com> wrote:

x1 = ["abc","def"]
x2 = ["abcde","xyzwj"]

Thank you.

--
Paul Smith
http://www.nomadicfun.co.uk

paul@pollyandpaul.co.uk

Jesus_Gabriel_y_Gala · 1 October 2009 12:36

You might want to look at group_by:

%w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}
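For the word list above, `group_by` returns a hash keyed by length. In Ruby 1.9 and later the keys appear in the order each length is first seen. Calling `.values` on the result gives just the arrays, which is what the question asks for:

```ruby
words = %w{abc ads adfdf adfdw fefm mfekmw fmdms}

# Hash of length => words with that length
by_length = words.group_by { |w| w.length }
p by_length.keys    # => [3, 5, 4, 6]

# Just the groups, without the keys
p by_length.values  # => [["abc", "ads"], ["adfdf", "adfdw", "fmdms"], ["fefm"], ["mfekmw"]]
```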

Jesus.

On Thu, Oct 1, 2009 at 2:23 PM, George George <george.githinji@gmail.com> wrote:

Given an array of strings of different lengths, e.g.
x = ["abc","abcde","def","xyzwj"]
how can you efficiently create new arrays of strings which are of the
same length? For example, the above array can be transformed into

x1 = ["abc","def"]
x2 = ["abcde","xyzwj"]

Bertram_Scharpf · 1 October 2009 12:40

Hi,

On Thursday, 01 Oct 2009, 21:23:30 +0900, George George wrote:

Given an array of strings of different lengths, e.g.
x = ["abc","abcde","def","xyzwj"]
how can you efficiently create new arrays of strings which are of the
same length? For example, the above array can be transformed into

x1 = ["abc","def"]
x2 = ["abcde","xyzwj"]

x = %w(abc abcde def xyzwj)
x.inject( Hash.new { |h,k| h[k] = [] }) { |h,e| h[e.length].push e ; h }
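The `Hash.new { |h,k| h[k] = [] }` form is what makes the one-liner work: the block runs for every missing key and stores a fresh array under it. The tempting shortcut `Hash.new([])` behaves differently. It hands back one shared default array and never stores it, so every `<<` mutates that same object and the hash stays empty. A small sketch of the difference:

```ruby
x = %w(abc abcde def xyzwj)

# Default proc: each new key gets its own array, stored in the hash
good = Hash.new { |h, k| h[k] = [] }
x.each { |e| good[e.length] << e }
p good.values   # => [["abc", "def"], ["abcde", "xyzwj"]]

# Default object: one shared array is returned for every missing key,
# and << mutates it without ever adding a key to the hash
bad = Hash.new([])
x.each { |e| bad[e.length] << e }
p bad.keys      # => []
p bad.default   # => ["abc", "abcde", "def", "xyzwj"]
```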

Bertram

--
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de

Harry3 · 1 October 2009 15:04

p x.map{|a| a.length}.uniq.map{|b| x.select{|c| c.length == b}}

#> [["abc", "def"], ["abcde", "xyzwj"]]
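Harry's one-liner walks `x` once to collect the distinct lengths, then once more per distinct length with `select`, so its work grows with the number of strings times the number of distinct lengths. `group_by(&:length).values` does a single pass. A quick check that the two agree on the question's data (on Ruby 1.9+, where both list groups in order of first appearance):

```ruby
x = ["abc", "abcde", "def", "xyzwj"]

# Harry's approach: distinct lengths, then one select per length
by_select = x.map { |a| a.length }.uniq.map { |b| x.select { |c| c.length == b } }

# Single pass with group_by
by_group = x.group_by(&:length).values

p by_select              # => [["abc", "def"], ["abcde", "xyzwj"]]
p by_select == by_group  # => true
```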

Harry

On Thu, Oct 1, 2009 at 9:23 PM, George George <george.githinji@gmail.com> wrote:

Given an array of strings of different lengths, e.g.
x = ["abc","abcde","def","xyzwj"]
how can you efficiently create new arrays of strings which are of the
same length? For example, the above array can be transformed into

x1 = ["abc","def"]
x2 = ["abcde","xyzwj"]

Thank you.

--
A Look into Japanese Ruby List in English

Paul_Smith1 · 1 October 2009 12:38

George George wrote:

Given an array of strings of different lengths, e.g.
x = ["abc","abcde","def","xyzwj"]
how can you efficiently create new arrays of strings which are of the
same length? For example, the above array can be transformed into

x1 = ["abc","def"]
x2 = ["abcde","xyzwj"]

Thank you.

y = {}
x.each do |v|
  y[v.length] || = []
  y[v.length] << v
end
y.values

LOL I love Ruby and Rubytalk

or if you prefer less lines..

x.inject({}) do |h, v|
  (y[v.length] || = []) << v
  h
end.values

Must.... master..... inject.....

On Thu, Oct 1, 2009 at 1:34 PM, Ilan Berci <ilan.berci@gmail.com> wrote:


Paul_Smith1 · 1 October 2009 12:39

I really can't believe Ruby sometimes. This is so freaking awesome!

Must get back to real work...

2009/10/1 Jesús Gabriel y Galán <jgabrielygalan@gmail.com>:

On Thu, Oct 1, 2009 at 2:23 PM, George George <george.githinji@gmail.com> wrote:

Given an array of strings of different lengths, e.g.
x = ["abc","abcde","def","xyzwj"]
how can you efficiently create new arrays of strings which are of the
same length? For example, the above array can be transformed into

x1 = ["abc","def"]
x2 = ["abcde","xyzwj"]

You might want to look at group_by:

%w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}


David_A_Black1 · 1 October 2009 15:20

Hi --

Given an array of strings of different lengths, e.g.
x = ["abc","abcde","def","xyzwj"]
how can you efficiently create new arrays of strings which are of the
same length? For example, the above array can be transformed into

x1 = ["abc","def"]
x2 = ["abcde","xyzwj"]

You might want to look at group_by:

%w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}

It's interesting how often the need for group_by without the keys
comes up. Meaning, in this case, to get the new arrays you'd
ultimately do:

arr.group_by(&:length).values

and I believe there was at least one similar case mentioned here
recently. I wonder whether it would be cool to have a method that did
this -- in effect:

   module Enumerable
     def group_by_without_keys(&block)
       group_by(&block).values
     end
   end
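Used on the question's data, that helper would behave like this. It is a sketch only: `group_by_without_keys` is David's proposed name, not a core Ruby method, and the groups come back in first-seen order on Ruby 1.9+.

```ruby
module Enumerable
  # Group elements by the block's result and return only the groups
  def group_by_without_keys(&block)
    group_by(&block).values
  end
end

x = ["abc", "abcde", "def", "xyzwj"]
p x.group_by_without_keys(&:length)      # => [["abc", "def"], ["abcde", "xyzwj"]]
p((1..6).group_by_without_keys(&:odd?))  # => [[1, 3, 5], [2, 4, 6]]
```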

I'm not sure what it should be called, though.

David

On Thu, 1 Oct 2009, Jesús Gabriel y Galán wrote:

On Thu, Oct 1, 2009 at 2:23 PM, George George <george.githinji@gmail.com> wrote:

--
David A. Black, Director
Ruby Power and Light, LLC (http://www.rubypal.com)
Ruby/Rails training, consulting, mentoring, code review
Book: The Well-Grounded Rubyist (http://www.manning.com/black2)

Ryan_Davis1 · 1 October 2009 19:21

Syntax error in both cases. It needs to be "||=", not "|| =".

Well... inject ALWAYS loses, but fanboys sure seem to like it for no good reason.

By using better names and the right tool for the job, this becomes a LOT more readable, maintainable, and faster all in one fell swoop:

by_length = Hash.new { |h,k| h[k] = [] }
strings.each do |string|
  by_length[string.length] << string
end
by_length.values # I think this part is a mistake, but I wanted to match

I think the readability is more important than speed by a long shot... But just in case you're not convinced, check out the benchmarks:

% ./blah.rb 10000
# of iterations = 10000
                  user     system      total        real
null_time     0.000000   0.000000   0.000000 (  0.001370)
mine          7.790000   0.050000   7.840000 (  7.869737)
yours-inject 15.170000   0.050000  15.220000 ( 15.554334)
yours-each   11.850000   0.100000  11.950000 ( 12.013553)

inject is twice as slow as mine. stop using it.
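The `blah.rb` script behind those numbers isn't shown in the thread. A hypothetical harness in the same spirit, using the standard `benchmark` library, might look like this; the method names and the default iteration count are illustrative assumptions, and timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

STRINGS = %w(abc abcde def xyzwj)

# Hash with a default proc, filled with each
def with_each(strings)
  by_length = Hash.new { |h, k| h[k] = [] }
  strings.each { |s| by_length[s.length] << s }
  by_length.values
end

# Same grouping, threading the hash through inject
def with_inject(strings)
  strings.inject(Hash.new { |h, k| h[k] = [] }) { |h, s| h[s.length] << s; h }.values
end

# Built-in grouping
def with_group_by(strings)
  strings.group_by(&:length).values
end

# Iteration count from the command line, as in "./blah.rb 10000"
n = (ARGV.first || 10_000).to_i
Benchmark.bm(12) do |b|
  b.report("each")     { n.times { with_each(STRINGS) } }
  b.report("inject")   { n.times { with_inject(STRINGS) } }
  b.report("group_by") { n.times { with_group_by(STRINGS) } }
end
```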

On Oct 1, 2009, at 05:34 , Ilan Berci wrote:

George George wrote:

Given an array of strings of different lengths, e.g.
x = ["abc","abcde","def","xyzwj"]
how can you efficiently create new arrays of strings which are of the
same length? For example, the above array can be transformed into

x1 = ["abc","def"]
x2 = ["abcde","xyzwj"]

Thank you.

y = {}
x.each do |v|
  y[v.length] || = []
  y[v.length] << v
end
y.values

or if you prefer less lines..

x.inject({}) do |h, v|
  (y[v.length] || = []) << v
  h
end.values

Jesus_Gabriel_y_Gala · 1 October 2009 15:42

Hi --

%w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}

It's interesting how often the need for group_by without the keys
comes up. Meaning, in this case, to get the new arrays you'd
ultimately do:

arr.group_by(&:length).values

Yup, although in this case, I'm going to guess that he will either

- Access the list of words of a specific number

desired_length = 5  # whichever length you are after
a = %w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}

a[desired_length]

- Sort the groups by length

a = %w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}
a.sort.map {|x| x[1]} # or something
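Both uses can be sketched against the hash from `group_by`. `Hash#fetch` with a default avoids getting `nil` back for a length that has no words (7 below is just an example of such a length), and `sort` on a hash yields `[length, words]` pairs ordered by key:

```ruby
a = %w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by { |x| x.length }

# A specific length; fall back to an empty array if no word has it
p a.fetch(5, [])  # => ["adfdf", "adfdw", "fmdms"]
p a.fetch(7, [])  # => []

# Groups ordered by length (sort yields [length, words] pairs)
p a.sort.map { |_length, words| words }
# => [["abc", "ads"], ["fefm"], ["adfdf", "adfdw", "fmdms"], ["mfekmw"]]
```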

and I believe there was at least one similar case mentioned here
recently. I wonder whether it would be cool to have a method that did
this -- in effect:

module Enumerable
  def group_by_without_keys(&block)
    group_by(&block).values
  end
end

I'm not sure what it should be called, though.

values_grouped_by?

Jesus.

On Thu, Oct 1, 2009 at 5:20 PM, David A. Black <dblack@rubypal.com> wrote:

On Thu, 1 Oct 2009, Jesús Gabriel y Galán wrote:

Josh_Cheek · 1 October 2009 21:11

I generalized yours, and made the returned groups sorted by the results from
the call. In this more comparable situation, inject is about 11% slower, not
twice as slow.

Inject Test
Rehearsal --------------------------------------------------
Without Inject  14.160000   0.100000  14.260000 ( 14.364824)
With Inject     15.950000   0.120000  16.070000 ( 16.258609)
---------------------------------------- total: 30.330000sec

                     user     system      total        real
Without Inject  14.200000   0.110000  14.310000 ( 14.553592)
With Inject     16.000000   0.120000  16.120000 ( 16.422186)

Inject is about 11.38% slower

Here is the code:

#!/usr/bin/env ruby
require 'benchmark'

class Symbol
  def to_proc
    Proc.new{|obj| obj.send self } # give 1.9ish syntax
  end
end

module Enumerable

  def group_by_without_inject( &get_key )
    groups = Hash.new { |h,k| h[k] = Array.new }
    each do |obj|
      groups[ get_key[obj] ] << obj
    end
    groups.keys.sort!.map!{|key| groups[key] }
  end

  def group_by_with_inject( &get_key )
    groups = inject Hash.new{ |h,k| h[k] = Array.new } do |groups,obj|
      groups[ get_key[obj] ] << obj
      groups
    end
    groups.keys.sort!.map!{|key| groups[key] }
  end

end

puts "Inject Test"
benchmarks = Benchmark.bmbm do |b|
  x = ["abc","abcde","def","xyzwj"]

  b.report("Without Inject") do
    500_000.times{ x.group_by_without_inject &:length }
  end

  b.report("With Inject") do
    500_000.times{ x.group_by_with_inject &:length }
  end
end

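benchmarks.map!{|b| b.real }
# Note: this is (with - without) / with, i.e. measured relative to the
# inject timing; (with - without) / without would give a larger figure.
percent_slower = sprintf( "%.2f" , 100 - 100 * benchmarks.first / benchmarks.last )
puts '' , "Inject is about #{ percent_slower }% slower"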

On Thu, Oct 1, 2009 at 2:21 PM, Ryan Davis <ryand-ruby@zenspider.com> wrote:

Well... inject ALWAYS loses, but fanboys sure seem to like it for no good
reason.

By using better names and the right tool for the job, this becomes a LOT
more readable, maintainable, and faster all in one fell swoop:

by_length = Hash.new { |h,k| h[k] = [] }
strings.each do |string|
  by_length[string.length] << string
end
by_length.values # I think this part is a mistake, but I wanted to match

I think the readability is more important than speed by a long shot... But
just in case you're not convinced, check out the benchmarks:

% ./blah.rb 10000
# of iterations = 10000
                  user     system      total        real
null_time     0.000000   0.000000   0.000000 (  0.001370)
mine          7.790000   0.050000   7.840000 (  7.869737)
yours-inject 15.170000   0.050000  15.220000 ( 15.554334)
yours-each   11.850000   0.100000  11.950000 ( 12.013553)

inject is twice as slow as mine. stop using it.

Tony_Arcieri6 · 1 October 2009 21:26

I think it's much more readable to build hashes with something like:

h = {}
blah.each do |v|
h[...] = ...
end

than:

blah.inject({}) do |h, v|
h[...] = ...
h
end

Gratuitous use of inject FTL. Ruby isn't an immutable state functional
language.

On Thu, Oct 1, 2009 at 6:38 AM, Paul Smith <paul@pollyandpaul.co.uk> wrote:

Must.... master..... inject.....

--
Tony Arcieri
Medioh/Nagravision

David_A_Black1 · 1 October 2009 17:05

Yes, that's pretty likely. I guess to make the non-keyed version
useful it would have to do some kind of automatic sorting, like you
did in your example:

   module Enumerable
     def my_group_by(&block)
       g = group_by(&block)
       g.sort.map(&:last)
     end
   end

or something. (I'm trying to be a good 1.9 citizen and use
Symbol#to_proc, even though I still find it a bit line-noisy.)

And that wouldn't handle the specific number case, of course. Maybe
it's not all that useful.

David

On Fri, 2 Oct 2009, Jesús Gabriel y Galán wrote:

On Thu, Oct 1, 2009 at 5:20 PM, David A. Black <dblack@rubypal.com> wrote:

Hi --

On Thu, 1 Oct 2009, Jesús Gabriel y Galán wrote:

%w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}

It's interesting how often the need for group_by without the keys
comes up. Meaning, in this case, to get the new arrays you'd
ultimately do:

arr.group_by(&:length).values

Yup, although in this case, I'm going to guess that he will either

- Access the list of words of a specific number
- Sort the groups by length


Ryan_Davis1 · 1 October 2009 23:38

This "more comparable" situation is full of bugs and isn't comparable.

Yes, I should have said "your [Ilan's] inject version is twice as slow as mine" instead of "inject is twice as slow as mine", but my numbers still stand. Use the right tool for the job and it'll pay off in both maintainability and speed.

Your version isn't maintainable, has bugs(*) and obfuscates a ton, missing my point entirely. Simpler code wins HANDS DOWN. As I said the first time: "I think the readability is more important than speed by a long shot". FWIW, my results running your code as-is were exactly 2x yours (22% slower, not 11% slower).

*) calling sort! within a law of demeter violation is ALWAYS a bug.
*) calling (almost) any bang method on a temporary value is usually a bug.
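One way to address both points, sketched on Josh's non-inject method: let `sort` return a new array instead of mutating a temporary one, and sort the hash's `[key, group]` pairs rather than sorting `keys` and reaching back into the hash. `group_by_sorted` is a hypothetical name chosen for this example.

```ruby
module Enumerable
  # Group by the block's result and return the groups ordered by key,
  # without calling bang methods on temporary arrays
  def group_by_sorted(&get_key)
    groups = Hash.new { |h, k| h[k] = [] }
    each { |obj| groups[get_key.call(obj)] << obj }
    groups.sort.map { |_key, members| members }
  end
end

p %w(xyzwj abc abcde def).group_by_sorted(&:length)
# => [["abc", "def"], ["xyzwj", "abcde"]]
```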

On Oct 1, 2009, at 14:11 , Josh Cheek wrote:

I generalized yours, and made the returned groups sorted by the results from
the call. In this more comparable situation, inject is about 11% slower, not
twice as slow.

David_A_Black1 · 2 October 2009 00:37

Hi --

On Fri, 2 Oct 2009, Tony Arcieri wrote:

On Thu, Oct 1, 2009 at 6:38 AM, Paul Smith <paul@pollyandpaul.co.uk> wrote:

Must.... master..... inject.....

I think it's much more readable to build hashes with something like:

h = {}
blah.each do |v|
h[...] = ...
end

than:

blah.inject({}) do |h, v|
h[...] = ...
h
end

I agree, and I think it's nice that 1.9 provides
Enumerator#with_object, which lets you avoid that explicit feeding of
the accumulator back into the loop.
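A sketch of the grouping in that style, using `each_with_object` (the `Enumerable` counterpart of `Enumerator#with_object`). The memo object is passed to every iteration and returned at the end, so there is no trailing `h` line. Note the block parameters arrive as (element, memo), the reverse of inject's order.

```ruby
x = ["abc", "abcde", "def", "xyzwj"]

# The memo hash is handed to each iteration and returned at the end
by_length = x.each_with_object(Hash.new { |h, k| h[k] = [] }) do |v, h|
  h[v.length] << v
end
p by_length.values   # => [["abc", "def"], ["abcde", "xyzwj"]]

# Equivalent spelling via an Enumerator
same = x.each.with_object(Hash.new { |h, k| h[k] = [] }) { |v, h| h[v.length] << v }
p same == by_length  # => true
```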

David


Josh_Cheek · 2 October 2009 01:27

Your version isn't maintainable, has bugs(*) and obfuscates a ton, missing
my point entirely. Simpler code wins HANDS DOWN. As I said the first time:
"I think the readability is more important than speed by a long shot".

Perhaps, but it is simpler because it is too specific to the given
problem. Once it must be rewritten in several different places, it is no
longer more maintainable. It will also clutter the code, making it less
readable.

*) calling sort! within a law of demeter violation is ALWAYS a bug.

That is fair.

*) calling (almost) any bang method on a temporary value is usually a bug.

Why is that?

On Thu, Oct 1, 2009 at 6:38 PM, Ryan Davis <ryand-ruby@zenspider.com> wrote:

Yossef_Mendelssohn · 2 October 2009 02:34

What's the point of calling a bang method there? You don't care at all
about the object you're mutating, and it seems like a blatant case of
premature optimization.

Now, consider the specific cases where the bang method doesn't return
the same value as the non-bang (viz. uniq!).
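For example, `uniq!` returns `nil` when it has nothing to remove, so code that chains off its return value works only for some inputs:

```ruby
words = %w(abc def abc)
p words.uniq!  # => ["abc", "def"]  (duplicate removed, receiver modified)
p words.uniq!  # => nil             (nothing left to remove)

# Chaining off the bang method therefore breaks on input without duplicates
begin
  %w(abc def).uniq!.size
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end

# The non-bang version always returns an array
p %w(abc def).uniq.size  # => 2
```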

On Oct 1, 8:27 pm, Josh Cheek <josh.ch...@gmail.com> wrote:

On Thu, Oct 1, 2009 at 6:38 PM, Ryan Davis <ryand-r...@zenspider.com> wrote:
> *) calling (almost) any bang method on a temporary value is usually a bug.

Why is that?

--
-yossef

Josh_Cheek · 2 October 2009 02:43

I see, thank you.

On Thu, Oct 1, 2009 at 9:34 PM, Yossef Mendelssohn <ymendel@pobox.com> wrote:

On Oct 1, 8:27 pm, Josh Cheek <josh.ch...@gmail.com> wrote:
> On Thu, Oct 1, 2009 at 6:38 PM, Ryan Davis <ryand-r...@zenspider.com> wrote:
> > *) calling (almost) any bang method on a temporary value is usually a bug.
>
> Why is that?

What's the point of calling a bang method there? You don't care at all
about the object you're mutating, and it seems like a blatant case of
premature optimization.

Now, consider the specific cases where the bang method doesn't return
the same value as the non-bang (viz. uniq!).


David_A_Black1 · 2 October 2009 12:00

Hi --

*) calling (almost) any bang method on a temporary value is usually a bug.

Why is that?

What's the point of calling a bang method there? You don't care at all
about the object you're mutating, and it seems like a blatant case of
premature optimization.

I disagree, at least on the second point. Using transparent language
constructs that happen to be more efficient in some dimension doesn't
mean you're prematurely optimizing.

For example, if I do this:

obj.meth

instead of this:

obj.send(:meth)

you could say I'm prematurely optimizing. Both exist in the language;
any non-beginning Rubyist knows about both; both fly from the fingers
quite readily. So if I choose the one that's faster, I'm "optimizing"
-- but still, unless I have no choice, I'll choose it.

Same thing with things like map and map!. Actually I don't
automatically reach for the in-place ones (I'm just not in the habit),
but they're fully visible and trivially usable at the language level,
so I don't think they can be thought of as sidetracking the programmer
into doing too much too soon about too little. In other words, all
else being equal, it isn't necessarily bad to choose one idiom over
another in a high level language on the grounds that the one you're
choosing is likely to shave a few cycles. If you forbid yourself the
slightly faster ones simply because they are slightly faster, that way
lies having to test everything you write to make sure it ISN'T
performing as well as it could, so that you can't be accused of having
prematurely optimized.
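The practical difference being weighed here: `map` allocates a new array and leaves the receiver alone, while `map!` rewrites the receiver in place, which is only safe when nothing else holds a reference to that array. A small illustration:

```ruby
names = %w(abc def)
alias_ref = names            # a second reference to the same array

upper = names.map(&:upcase)  # new array; names is untouched
p upper      # => ["ABC", "DEF"]
p names      # => ["abc", "def"]

names.map!(&:upcase)         # same array object, modified in place
p names      # => ["ABC", "DEF"]
p alias_ref  # => ["ABC", "DEF"]  (the other reference sees the change)
```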

What might be bad is if you get a hunch, based on no evidence, that it
would be efficient to divide every array into four subarrays before
doing a mapping, and then recombining them, and go around changing
your code so that every array is handled that way.... That, I think,
is the kind of thing where you're looking for trouble before you know
it's there.

Of course, a certain amount of how this plays out in Ruby has to do
with Ruby being a high-level language. We can make a certain number of
choices like map vs. map! trivially, even though if we were
implementing the methods themselves, we'd be faced with a whole new
round of optimization issues and a much more fine-grained code
texture.

I tend to agree too with the interpretations of the Hoare statement
that suggest that not all optimization during development is
"premature". It depends on what you're writing, and on how hard it is
likely to be to go back and deal with optimizations and bottlenecks
later. I think Ryan has said "Performance isn't an issue until it's an
issue", which I agree with, but I'd add that it's not necessarily
amiss to get into some habits that cost you nothing in developer time,
do not derail you from your basic workflow, but do buy you a bit of
efficiency here and there.

Now, consider the specific cases where the bang method doesn't return
the same value as the non-bang (viz. uniq!).

Of course you wouldn't want to risk calling a wrong method on nil, but
that's not an argument against using map! or sort!. Just remember to
read the documentation carefully. After all, ! means "dangerous", so
You Have Been Warned.

David

P.S. I've found this article very interesting on this score:

On Fri, 2 Oct 2009, Yossef Mendelssohn wrote:

On Oct 1, 8:27 pm, Josh Cheek <josh.ch...@gmail.com> wrote:

On Thu, Oct 1, 2009 at 6:38 PM, Ryan Davis <ryand-r...@zenspider.com> wrote:
