Elegant expression of sorting

GD1 · 19 January 2012 00:58

Hi all,

I've found that for the current project I am working on that I am frequently doing sorts based on multiple keys. The comparisons on each key tend to vary, ie. sometimes I want descending rather than ascending, and other times the comparison itself is complex.

I usually do something like this:

foo.sort! {|a,b|
   # Ascending, primary.
   rv = a.key0 <=> b.key0
   # Descending, secondary.
   rv = b.key1 <=> a.key1 if rv == 0
   # A tricky and expensive comparison, tertiary.
   rv = a.magic(b) if rv == 0
   rv
}

Does anyone have any suggestions as to a more elegant way to express this? I'm after something that is logically equivalent, but easier to write and clearer to read.

Garth

Yong_Li · 19 January 2012 06:05

A general way to perform a multi-key sort is:
# assuming sort() is a stable sort algorithm, e.g. merge sort
  foo.sort! {|a,b|
     # starting from the least important key
     a.magic(b)
  }
  foo.sort! {|a,b|
     # sort again using the second-least important key
     b.key1 <=> a.key1
  }
  # repeat until you are done with the most important key

I guess you can generalize this using an array of Lambdas.

One potential disadvantage of this for your particular case is that,
a.magic(b) is always performed even if a.key0 != b.key0. As you have
stated, this is quite expensive, and can make the above general scheme
much more expensive than your way.

···

On Thu, Jan 19, 2012 at 8:58 AM, Garthy D <garthy_lmkltybr@entropicsoftware.com> wrote:

Hi all,

I've found that for the current project I am working on that I am frequently
doing sorts based on multiple keys. The comparisons on each key tend to
vary, ie. sometimes I want descending rather than ascending, and other times
the comparison itself is complex.

I usually do something like this:

foo.sort! {|a,b|
# Ascending, primary.
rv = a.key0 <=> b.key0
# Descending, secondary.
rv = b.key1 <=> a.key1 if rv == 0
# A tricky and expensive comparison, tertiary.
rv = a.magic(b) if rv == 0
rv
}

Does anyone have any suggestions as to a more elegant way to express this?
I'm after something that is logically equivalent, but easier to write and
clearer to read.

Garth

botp1 · 19 January 2012 07:23

try,

foo.sort{|a,b|
((rv=a.key0<=>b.key0)==0) &&
((rv=b.key1<=>a.key1)==0) &&
(rv=a.magic(b))
rv
end

kind regards -botp

···

On Thu, Jan 19, 2012 at 8:58 AM, Garthy D <garthy_lmkltybr@entropicsoftware.com> wrote:

easier to write and clearer to read.
Garth

GD1 · 19 January 2012 09:09

Hi Yong Li,

···

On 19/01/12 16:35, Yong Li wrote:

On Thu, Jan 19, 2012 at 8:58 AM, Garthy D > <garthy_lmkltybr@entropicsoftware.com> wrote:

Hi all,

I've found that for the current project I am working on that I am frequently
doing sorts based on multiple keys. The comparisons on each key tend to
vary, ie. sometimes I want descending rather than ascending, and other times
the comparison itself is complex.

I usually do something like this:

foo.sort! {|a,b|
  # Ascending, primary.
  rv = a.key0<=> b.key0
  # Descending, secondary.
  rv = b.key1<=> a.key1 if rv == 0
  # A tricky and expensive comparison, tertiary.
  rv = a.magic(b) if rv == 0
  rv
}

Does anyone have any suggestions as to a more elegant way to express this?
I'm after something that is logically equivalent, but easier to write and
clearer to read.

Garth

A general way to perform a multi-key sort is:
# assuming sort() is a stable sort algorithm, e.g. merge sort
   foo.sort! {|a,b|
      # starting from the least important key
      a.magic(b)
   }
   foo.sort! {|a,b|
      # sort again using the second-least important key
      b.key1<=> a.key1
   }
   # repeat until you are done with the most important key

I guess you can generalize this using an array of Lambdas.

One potential disadvantage of this for your particular case is that,
a.magic(b) is always performed even if a.key0 != b.key0. As you have
stated, this is quite expensive, and can make the above general scheme
much more expensive than your way.

Cool- thanks for that. I won't be able to use it directly as the last key sort for most of the things I am doing is typically expensive, but I'm still quite interested in different possible approaches. Thankyou.

Garth

GD1 · 19 January 2012 09:15

Hi botp,

···

On 19/01/12 17:53, botp wrote:

On Thu, Jan 19, 2012 at 8:58 AM, Garthy D > <garthy_lmkltybr@entropicsoftware.com> wrote:

easier to write and clearer to read.
Garth

try,

foo.sort{|a,b|
  ((rv=a.key0<=>b.key0)==0)&&
  ((rv=b.key1<=>a.key1)==0)&&
  (rv=a.magic(b))
  rv
end

kind regards -botp

Very nice use of short-circuiting. It made me think of Bourne shell scripting.

Garth

Robert_K1 · 19 January 2012 09:22

Hi all,

I've found that for the current project I am working on that I am
frequently
doing sorts based on multiple keys. The comparisons on each key tend to
vary, ie. sometimes I want descending rather than ascending, and other
times
the comparison itself is complex.

I usually do something like this:

foo.sort! {|a,b|
# Ascending, primary.
rv = a.key0<=> b.key0
# Descending, secondary.
rv = b.key1<=> a.key1 if rv == 0
# A tricky and expensive comparison, tertiary.
rv = a.magic(b) if rv == 0
rv
}

Does anyone have any suggestions as to a more elegant way to express
this?
I'm after something that is logically equivalent, but easier to write and
clearer to read.

A general way to perform a multi-key sort is:
# assuming sort() is a stable sort algorithm, e.g. merge sort
foo.sort! {|a,b|
# starting from the least important key
a.magic(b)
}
foo.sort! {|a,b|
# sort again using the second-least important key
b.key1<=> a.key1
}
# repeat until you are done with the most important key

I guess you can generalize this using an array of Lambdas.
One potential disadvantage of this for your particular case is that,
a.magic(b) is always performed even if a.key0 != b.key0. As you have
stated, this is quite expensive, and can make the above general scheme
much more expensive than your way.

This is totally inefficient because it goes through the original data
set (which might be large) multiple times. The complexity of #magic
only adds to this.

Cool- thanks for that. I won't be able to use it directly as the last key
sort for most of the things I am doing is typically expensive, but I'm still
quite interested in different possible approaches. Thankyou.

Please don't, there are much better approaches (see botp's for
example). Here are more

foo.sort! {|a,b|
# Ascending, primary.
(a.key0 <=> b.key0).nonzer0? ||
# Descending, secondary.
(b.key1 <=> a.key1).nonzero? ||
# A tricky and expensive comparison, tertiary.
a.magic(b)
}

foo.sort_by {|a|
  [
  # Ascending, primary.
  a.key0,
  # Descending, secondary, only if numeric.
  -a.key1,
   # A tricky and expensive comparison, tertiary.
  a.create_magic_key
  ]
}

All these approaches can also be stored in a lambda and used from there

YourClass::SORT_FOO = lambda {|a| [a.key0, -a.key1, a.create_magic_key]}
YourClass::SORT_BAR = lambda {|a,b| (a.key0 <=> b.key0).nonzero? || ... }

enum.sort_by &YourClass::SORT_FOO
enum.sort &YourClass::SORT_BAR

Kind regards

robert

···

On Thu, Jan 19, 2012 at 10:09 AM, Garthy D <garthy_lmkltybr@entropicsoftware.com> wrote:

On 19/01/12 16:35, Yong Li wrote:

On Thu, Jan 19, 2012 at 8:58 AM, Garthy D >> <garthy_lmkltybr@entropicsoftware.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

GD1 · 19 January 2012 10:49

Hi Robert,

Hi all,

I've found that for the current project I am working on that I am
frequently
doing sorts based on multiple keys. The comparisons on each key tend to
vary, ie. sometimes I want descending rather than ascending, and other
times
the comparison itself is complex.

I usually do something like this:

foo.sort! {|a,b|
  # Ascending, primary.
  rv = a.key0<=> b.key0
  # Descending, secondary.
  rv = b.key1<=> a.key1 if rv == 0
  # A tricky and expensive comparison, tertiary.
  rv = a.magic(b) if rv == 0
  rv
}

Does anyone have any suggestions as to a more elegant way to express
this?
I'm after something that is logically equivalent, but easier to write and
clearer to read.

A general way to perform a multi-key sort is:
# assuming sort() is a stable sort algorithm, e.g. merge sort
   foo.sort! {|a,b|
      # starting from the least important key
      a.magic(b)
   }
   foo.sort! {|a,b|
      # sort again using the second-least important key
      b.key1<=> a.key1
   }
   # repeat until you are done with the most important key

I guess you can generalize this using an array of Lambdas.
One potential disadvantage of this for your particular case is that,
a.magic(b) is always performed even if a.key0 != b.key0. As you have
stated, this is quite expensive, and can make the above general scheme
much more expensive than your way.

This is totally inefficient because it goes through the original data
set (which might be large) multiple times. The complexity of #magic
only adds to this.

Very true.

Cool- thanks for that. I won't be able to use it directly as the last key
sort for most of the things I am doing is typically expensive, but I'm still
quite interested in different possible approaches. Thankyou.

Please don't, there are much better approaches (see botp's for
example). Here are more

Yes, it's not directly suitable for what I'm doing but the different approach was interesting- there are cases where I might be able to have the input come in already sorted against one of the keys without cost. It got me thinking, anyway.

foo.sort! {|a,b|
  # Ascending, primary.
  (a.key0<=> b.key0).nonzer0? ||
  # Descending, secondary.
  (b.key1<=> a.key1).nonzero? ||
  # A tricky and expensive comparison, tertiary.
  a.magic(b)
}

Very, very neat; and nice find on "nonzero?". My first thought was that "nonzero?" surely returns a boolean and this would break it- but it looks like "nonzero?" with numeric return was designed for this *exact* problem.

I was wondering if there was a nice way to drop the "rv" from my solution- this looks like the best way.

foo.sort_by {|a|
   [
   # Ascending, primary.
   a.key0,
   # Descending, secondary, only if numeric.
   -a.key1,
    # A tricky and expensive comparison, tertiary.
   a.create_magic_key
   ]
}

I'd stumbled on sort_by initially, I suspect I'll use it when I can precompute the comparisons easily (this is sometimes the case, sometimes not).

All these approaches can also be stored in a lambda and used from there

YourClass::SORT_FOO = lambda {|a| [a.key0, -a.key1, a.create_magic_key]}
YourClass::SORT_BAR = lambda {|a,b| (a.key0<=> b.key0).nonzero? || ... }

enum.sort_by&YourClass::SORT_FOO
enum.sort&YourClass::SORT_BAR

Excellent stuff.

I'm not at all familiar with the "enum.sort_by&YourClass::SORT_FOO" syntax at this point, so I might need to read up on that. I can at least partly guess from the context though.

Thankyou for your very detailed input on this one, excellent stuff!

Garth

···

On 19/01/12 19:52, Robert Klemme wrote:

On Thu, Jan 19, 2012 at 10:09 AM, Garthy D > <garthy_lmkltybr@entropicsoftware.com> wrote:

On 19/01/12 16:35, Yong Li wrote:

On Thu, Jan 19, 2012 at 8:58 AM, Garthy D >>> <garthy_lmkltybr@entropicsoftware.com> wrote:

Yong_Li · 20 January 2012 02:15

A general way to perform a multi-key sort is:
# assuming sort() is a stable sort algorithm, e.g. merge sort
foo.sort! {|a,b|
# starting from the least important key
a.magic(b)
}
foo.sort! {|a,b|
# sort again using the second-least important key
b.key1<=> a.key1
}
# repeat until you are done with the most important key

This is totally inefficient because it goes through the original data
set (which might be large) multiple times. The complexity of #magic
only adds to this.

Cool- thanks for that. I won't be able to use it directly as the last key
sort for most of the things I am doing is typically expensive, but I'm still
quite interested in different possible approaches. Thankyou.

Please don't, there are much better approaches (see botp's for
example). Here are more

Wow, big bow to Robert, I really like reading your posts on this
mailing list. They always teach me something.

I came across this 'generic' multi-key sorting algorithm from a
textbook. It looked interesting to me because I was wondering why I
had never seen it in codes. Now, it seems obvious why I had never seen
it - it is inefficient! lucky that I never used this algorithm in real
life. wheww..

···

On Thu, Jan 19, 2012 at 5:22 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

On Thu, Jan 19, 2012 at 10:09 AM, Garthy D > <garthy_lmkltybr@entropicsoftware.com> wrote:

On 19/01/12 16:35, Yong Li wrote:

foo.sort! {|a,b|
# Ascending, primary.
(a.key0 <=> b.key0).nonzer0? ||
# Descending, secondary.
(b.key1 <=> a.key1).nonzero? ||
# A tricky and expensive comparison, tertiary.
a.magic(b)
}

foo.sort_by {|a|
[
# Ascending, primary.
a.key0,
# Descending, secondary, only if numeric.
-a.key1,
# A tricky and expensive comparison, tertiary.
a.create_magic_key
]
}

All these approaches can also be stored in a lambda and used from there

YourClass::SORT_FOO = lambda {|a| [a.key0, -a.key1, a.create_magic_key]}
YourClass::SORT_BAR = lambda {|a,b| (a.key0 <=> b.key0).nonzero? || ... }

enum.sort_by &YourClass::SORT_FOO
enum.sort &YourClass::SORT_BAR

Kind regards

robert

Robert_K1 · 19 January 2012 12:56

foo.sort! {|a,b|
# Ascending, primary.
(a.key0<=> b.key0).nonzer0? ||
# Descending, secondary.
(b.key1<=> a.key1).nonzero? ||
# A tricky and expensive comparison, tertiary.
a.magic(b)
}

Very, very neat; and nice find on "nonzero?". My first thought was that
"nonzero?" surely returns a boolean and this would break it- but it looks
like "nonzero?" with numeric return was designed for this *exact* problem.

Right.

I was wondering if there was a nice way to drop the "rv" from my solution-
this looks like the best way.

foo.sort_by {|a|
[
# Ascending, primary.
a.key0,
# Descending, secondary, only if numeric.
-a.key1,
# A tricky and expensive comparison, tertiary.
a.create_magic_key
]
}

I'd stumbled on sort_by initially, I suspect I'll use it when I can
precompute the comparisons easily (this is sometimes the case, sometimes
not).

Note, with #sort_by you do not precompute comparisons but you compute
a comparison key! In the example it is an Array because Array#<=>
uses fields from the beginning to the end, i.e. sorts according to
fields with decreasing precedence (first pos 0, if that comparison
returns 0 field pos 1 etc.).

I'm not at all familiar with the "enum.sort_by&YourClass::SORT_FOO" syntax
at this point, so I might need to read up on that. I can at least partly
guess from the context though.

It just passes the lambda as a block as if it the block was passed
directly. Note, that technically behind the scenes the syntax foo(&x)
invokes x.to_proc which is also provided by Symbol:

irb(main):001:0> a = 5.times.to_a
=> [0, 1, 2, 3, 4]
irb(main):002:0> a.map {|x| x.to_s}
=> ["0", "1", "2", "3", "4"]
irb(main):003:0> a.map(&:to_s)
=> ["0", "1", "2", "3", "4"]

Thankyou for your very detailed input on this one, excellent stuff!

You're welcome!

Kind regards

robert

···

On Thu, Jan 19, 2012 at 11:49 AM, Garthy D <garthy_lmkltybr@entropicsoftware.com> wrote:

On 19/01/12 19:52, Robert Klemme wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

GD1 · 20 January 2012 06:04

Hi Robert,

foo.sort_by {|a|
   [
   # Ascending, primary.
   a.key0,
   # Descending, secondary, only if numeric.
   -a.key1,
    # A tricky and expensive comparison, tertiary.
   a.create_magic_key
   ]
}

I'd stumbled on sort_by initially, I suspect I'll use it when I can
precompute the comparisons easily (this is sometimes the case, sometimes
not).

Note, with #sort_by you do not precompute comparisons but you compute
a comparison key! In the example it is an Array because Array#<=>
uses fields from the beginning to the end, i.e. sorts according to
fields with decreasing precedence (first pos 0, if that comparison
returns 0 field pos 1 etc.).

I just reread my previous response and my explanation was really poor- sorry about that. Thankyou for the clarification, it matches my understanding, and expresses it far better than I did.

I'm not at all familiar with the "enum.sort_by&YourClass::SORT_FOO" syntax
at this point, so I might need to read up on that. I can at least partly
guess from the context though.

It just passes the lambda as a block as if it the block was passed
directly. Note, that technically behind the scenes the syntax foo(&x)
invokes x.to_proc which is also provided by Symbol:

irb(main):001:0> a = 5.times.to_a
=> [0, 1, 2, 3, 4]
irb(main):002:0> a.map {|x| x.to_s}
=> ["0", "1", "2", "3", "4"]
irb(main):003:0> a.map(&:to_s)
=> ["0", "1", "2", "3", "4"]

Excellent, thankyou. The additional example solidified my understanding of it too. Thankyou for taking the additional time to explain this, much appreciated.

Garth

···

On 19/01/12 23:26, Robert Klemme wrote:

Topic		Replies	Views
Cascading <=> comparisons ruby-talk	14	71	5 August 2005
Sort_by: multiple fields with reverse sort ruby-talk	15	148	15 October 2010
Sorting using many subitems ruby-talk	15	294	4 January 2017
#sort_by and #sort_obj ruby-talk	17	136	8 October 2007
A sort_by descending sort ruby-talk	8	113	9 October 2004

Elegant expression of sorting

Related topics