#sort_by and #sort_obj

I haven't seen this technique in the wild before.

If you have custom sorting using the spaceship:

class Something
   def <=>(other)
     [@name, @version] <=> [other.name, other.version]
   end
end

You can't easily use the more-efficient #sort_by because you'll need to know how to sort Somethings, which involves duplication:

somethings.sort_by { |s| [s.name, s.version] }

Instead you can add a method that returns the thing to sort by:

class Something
   def sort_obj
     [@name, @version]
   end

   def <=>(other)
     sort_obj <=> other.sort_obj
   end
end

So now using #sort_by is easy:

somethings.sort_by { |s| s.sort_obj }

I don't know if the name #sort_obj is appropriate, but I'm not coming up with something better.

See also:

http://blog.segment7.net/articles/2007/10/07/sort_by-and-sort_obj

···

--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars

Nice. In fact, it seems this could dethrone <=> as the primary method
Sortable depends on.

Perhaps a better name is #indice or #sort_index, or something like
that.

T.

···

On Oct 7, 2:21 am, Eric Hodel <drbr...@segment7.net> wrote:

I haven't seen this technique in the wild before.

If you have custom sorting using the spaceship:

class Something
   def <=>(other)
     [@name, @version] <=> [other.name, other.version]
   end
end

You can't easily use the more-efficient #sort_by because you'll need
to know how to sort Somethings, which involves duplication:

somethings.sort_by { |s| [s.name, s.version] }

Instead you can add a method that returns the thing to sort by:

class Something
   def sort_obj
     [@name, @version]
   end

   def <=>(other)
     sort_obj <=> other.sort_obj
   end
end

So now using #sort_by is easy:

somethings.sort_by { |s| s.sort_obj }

I don't know if the name #sort_obj is appropriate, but I'm not coming
up with something better.

See also:

http://blog.segment7.net/articles/2007/10/07/sort_by-and-sort_obj

Hi --

···

On Sun, 7 Oct 2007, Eric Hodel wrote:

I haven't seen this technique in the wild before.

If you have custom sorting using the spaceship:

class Something
def <=>(other)
  [@name, @version] <=> [other.name, other.version]
end

You can't easily use the more-efficient #sort_by because you'll need to know how to sort Somethings, which involves duplication:

somethings.sort_by { |s| [s.name, s.version] }

Instead you can add a method that returns the thing to sort by:

class Something
def sort_obj
  [@name, @version]
end

def <=>(other)
  sort_obj <=> other.sort_obj
end

So now using #sort_by is easy:

somethings.sort_by { |s| s.sort_obj }

I don't know if the name #sort_obj is appropriate, but I'm not coming up with something better.

Maybe #sort_key.

David

--
Upcoming training from Ruby Power and Light, LLC:
   * Intro to Ruby on Rails, Edison, NJ, October 23-26
   * Advancing with Rails, Edison, NJ, November 6-9
Both taught by David A. Black.
See http://www.rubypal.com for more info!

i prefer #order, #order_by since the concept is extends to much more that sorting, for instance iterating or returning elements from a collection such as a priority queue. in any case the idea is a good one.

cheers.

a @ http://codeforpeople.com/

···

On Oct 7, 2007, at 3:21 AM, Eric Hodel wrote:

I don't know if the name #sort_obj is appropriate, but I'm not coming up with something better.

--
share your knowledge. it's a way to achieve immortality.
h.h. the 14th dalai lama

Eric Hodel wrote:

somethings.sort_by { |s| s.sort_obj }

Since this is more efficient than #sort, maybe the implementation of #sort should first try to do this, and only if that fails (#sort_obj not defined, or two sort_obj not comparable), revert to the normal <=> algorithm.

This would make a difference when some library does a #sort on some collection, and you can't tell it to #sort_by instead (and you can't redefine #sort in the collection, either).

In the cases where you have a collection of objects that don't respond to #sort_obj, #sort will quickly hit a failure and use the fall-back. (The worst case would be if, say, all but the last object in the collection had #sort_obj.)

It should also fall back if <=> fails on the sort objs. I'm not sure how to detect that reliably though.

Anyway, here's a half-baked attempt:

class NoSortObjMethod < StandardError; end

class Object
   def sort_obj
     raise NoSortObjMethod
   end
end

class Array
   alias old_sort sort
   def sort
     if block_given?
       old_sort {|*args| yield(*args)}
     else
       begin
         rslt = sort_by {|s| s.sort_obj}
         puts "DEBUG: used sort_by with sort_obj"
         rslt
       rescue NoSortObjMethod
         puts "DEBUG: NoSortObjMethod, using old_sort"
         old_sort
       rescue TypeError => ex # ?
         puts "DEBUG: Comparison failed (#{ex}), using old_sort"
         old_sort
       end
     end
   end
end

a = [1, 3, 2]
p a
p a.sort

class C
   def initialize n
     @n = n
   end

   def sort_obj
     @n
   end
end

b = [C.new(1), C.new(3), C.new(2)]
p b
p b.sort

__END__

Output:

[1, 3, 2]
DEBUG: NoSortObjMethod, using old_sort
[1, 2, 3]
[#<C:0xb7c3d248 @n=1>, #<C:0xb7c3d234 @n=3>, #<C:0xb7c3d220 @n=2>]
DEBUG: used sort_by with sort_obj
[#<C:0xb7c3d248 @n=1>, #<C:0xb7c3d220 @n=2>, #<C:0xb7c3d234 @n=3>]

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Hi --

I haven't seen this technique in the wild before.

If you have custom sorting using the spaceship:

class Something
   def <=>(other)
     [@name, @version] <=> [other.name, other.version]
   end
end

You can't easily use the more-efficient #sort_by because you'll need
to know how to sort Somethings, which involves duplication:

somethings.sort_by { |s| [s.name, s.version] }

Instead you can add a method that returns the thing to sort by:

class Something
   def sort_obj
     [@name, @version]
   end

   def <=>(other)
     sort_obj <=> other.sort_obj
   end
end

So now using #sort_by is easy:

somethings.sort_by { |s| s.sort_obj }

I don't know if the name #sort_obj is appropriate, but I'm not coming
up with something better.

See also:

http://blog.segment7.net/articles/2007/10/07/sort_by-and-sort_obj

Nice. In fact, it seems this could dethrone <=> as the primary method
Sortable depends on.

I think Eric intends it to return a kind of sort key, not to actually
specify the sorting algorithm.

Perhaps a better name is #indice or #sort_index, or something like
that.

I'm afraid "indice" isn't a word :slight_smile: The singular of indices is index.

David

···

On Sun, 7 Oct 2007, Trans wrote:

On Oct 7, 2:21 am, Eric Hodel <drbr...@segment7.net> wrote:

--
Upcoming training from Ruby Power and Light, LLC:
   * Intro to Ruby on Rails, Edison, NJ, October 23-26
   * Advancing with Rails, Edison, NJ, November 6-9
Both taught by David A. Black.
See http://www.rubypal.com for more info!

David A. Black wrote:

Hi --

You can't easily use the more-efficient #sort_by because you'll need to know

something better.

Maybe #sort_key.

David

I'd support that. I wrote a nat_cmp for String and added
String#nat_cmp_key to use sort_by.
It would be nice if the object returned allowed #-@ to be called, to
enable reverse sorting by easily doing: enum.sort_by { |o| -o.sort_key }

Regards
Stefan

···

On Sun, 7 Oct 2007, Eric Hodel wrote:

--
Posted via http://www.ruby-forum.com/\.

its a great idea and i have been using it for years in sql by the
name "sortkey" although after reading ara's point i think i prefer the
term "orderkey" now.
another name in collation docs is "weight string". i think i like
weight_string the best because it is less verby and confrontational
than order_by and order. both of these are sort of "in your face"
sounding. cheers,
-r.

···

On 10/7/07, ara.t.howard <ara.t.howard@gmail.com> wrote:

On Oct 7, 2007, at 3:21 AM, Eric Hodel wrote:

> I don't know if the name #sort_obj is appropriate, but I'm not
> coming up with something better.

i prefer #order, #order_by since the concept is extends to much more
that sorting, for instance iterating or returning elements from a
collection such as a priority queue. in any case the idea is a good
one.

Yes, but sort_by is not always more efficient than sort:

$ qri sort_by
----------------------------------------------------- Enumerable#sort_by
     enum.sort_by {| obj | block } => array

···

On 10/7/07, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:

Eric Hodel wrote:
> somethings.sort_by { |s| s.sort_obj }

Since this is more efficient than #sort, maybe the implementation of
#sort should first try to do this, and only if that fails (#sort_obj not
defined, or two sort_obj not comparable), revert to the normal <=>
algorithm.

------------------------------------------------------------------------
     Sorts enum using a set of keys generated by mapping the values in
     enum through the given block.

        %w{ apple pear fig }.sort_by {|word| word.length}
                     #=> ["fig", "pear", "apple"]

     The current implementation of sort_by generates an array of tuples
     containing the original collection element and the mapped value.
     This makes sort_by fairly expensive when the keysets are simple

        require 'benchmark'
        include Benchmark

        a = (1..100000).map {rand(100000)}

        bm(10) do |b|
          b.report("Sort") { a.sort }
          b.report("Sort by") { a.sort_by {|a| a} }
        end

     produces:

        user system total real
        Sort 0.180000 0.000000 0.180000 ( 0.175469)
        Sort by 1.980000 0.040000 2.020000 ( 2.013586)

Also adding the startup overhead to check for an implementation of
sort_obj is an expense.

There's also the issue that you really need to check if ALL of the
elements being sorted respond to sort_obj.

It seems better to me to either reserve this technique as a pattern,
or at most add a new sort_by_sort_obj method or the like, and leave
sort as it is.

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

My primary goal was reducing the number of method calls and objects created, boosting speed. The downside to this is that you can't sort heterogeneous collections using #sort_obj. When sorting using #<=> this would be easier.

···

On Oct 7, 2007, at 03:34 , Trans wrote:

On Oct 7, 2:21 am, Eric Hodel <drbr...@segment7.net> wrote:

I haven't seen this technique in the wild before.

If you have custom sorting using the spaceship:

class Something
   def <=>(other)
     [@name, @version] <=> [other.name, other.version]
   end
end

You can't easily use the more-efficient #sort_by because you'll need
to know how to sort Somethings, which involves duplication:

somethings.sort_by { |s| [s.name, s.version] }

Instead you can add a method that returns the thing to sort by:

class Something
   def sort_obj
     [@name, @version]
   end

   def <=>(other)
     sort_obj <=> other.sort_obj
   end
end

So now using #sort_by is easy:

somethings.sort_by { |s| s.sort_obj }

I don't know if the name #sort_obj is appropriate, but I'm not coming
up with something better.

See also:

http://blog.segment7.net/articles/2007/10/07/sort_by-and-sort_obj

Nice. In fact, it seems this could dethrone <=> as the primary method
Sortable depends on.

--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars

I think Eric intends it to return a kind of sort key, not to actually
specify the sorting algorithm.

I realize. But clearly a standard definition of <=> would be based on
it. So the "coupling" method of Sortable could be #sort_key instead of
#<=>, though of course one can still override #<=> if need be.

> Perhaps a better name is #indice or #sort_index, or something like
> that.

I'm afraid "indice" isn't a word :slight_smile: The singular of indices is index.

Ah, well. I'm an American; dropping 's' is good enough for me :wink:
#sort_key is a perfectly good name.

T.

···

On Oct 7, 7:42 am, "David A. Black" <dbl...@rubypal.com> wrote:

Rick DeNatale wrote:

···

On 10/7/07, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:

Eric Hodel wrote:

somethings.sort_by { |s| s.sort_obj }

Since this is more efficient than #sort, maybe the implementation of
#sort should first try to do this, and only if that fails (#sort_obj not
defined, or two sort_obj not comparable), revert to the normal <=>
algorithm.

Yes, but sort_by is not always more efficient than sort:

Quite right, #sort should stay as it is, and #sort_by should only be called explicitly.

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

I like order_key too.

···

On Oct 7, 2007, at 11:22 , Rudi Cilibrasi wrote:

On 10/7/07, ara.t.howard <ara.t.howard@gmail.com> wrote:

On Oct 7, 2007, at 3:21 AM, Eric Hodel wrote:

I don't know if the name #sort_obj is appropriate, but I'm not
coming up with something better.

i prefer #order, #order_by since the concept is extends to much more
that sorting, for instance iterating or returning elements from a
collection such as a priority queue. in any case the idea is a good
one.

its a great idea and i have been using it for years in sql by the
name "sortkey" although after reading ara's point i think i prefer the
term "orderkey" now.
another name in collation docs is "weight string". i think i like
weight_string the best because it is less verby and confrontational
than order_by and order. both of these are sort of "in your face"
sounding. cheers,

--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars

>> I haven't seen this technique in the wild before.
>>
>> If you have custom sorting using the spaceship:
>>
>> class Something
>> def <=>(other)
>> [@name, @version] <=> [other.name, other.version]
>> end
>> end
>>
>> You can't easily use the more-efficient #sort_by because you'll need
>> to know how to sort Somethings, which involves duplication:
>>
>> somethings.sort_by { |s| [s.name, s.version] }
>>
>> Instead you can add a method that returns the thing to sort by:
>>
>> class Something
>> def sort_obj
>> [@name, @version]
>> end
>>
>> def <=>(other)
>> sort_obj <=> other.sort_obj
>> end
>> end
>>
>> So now using #sort_by is easy:
>>
>> somethings.sort_by { |s| s.sort_obj }
>>
>> I don't know if the name #sort_obj is appropriate, but I'm not coming
>> up with something better.
>>
>> See also:
>>
>> http://blog.segment7.net/articles/2007/10/07/sort_by-and-sort_obj
>
> Nice. In fact, it seems this could dethrone <=> as the primary method
> Sortable depends on.

My primary goal was reducing the number of method calls and objects
created, boosting speed.

I'm afraid the performance issue is not that easy: I can imagine where
the <=> approach is more efficient. Here's the reason: with #sort_by
you need to keep *all* keys in memory for the whole sorting process
which can cost a lot of memory. Creating those for pairwise
comparison (as with <=>) is more CPU intensive because more temp
arrays have to be created but the overall memory usage might be lower
because they are kept during a single comparison operation only (see
also Rick's benchmark). Allocating that much memory from the OS for
the #sort_by case might actually slow the sorting down dramatically.

The downside to this is that you can't sort
heterogeneous collections using #sort_obj. When sorting using #<=>
this would be easier.

Not necessarily because the <=> operators might not be able to work
with elements from the other types and thus sorting might fail as
miserably. :slight_smile:

Kind regards

robert

···

2007/10/8, Eric Hodel <drbrain@segment7.net>:

On Oct 7, 2007, at 03:34 , Trans wrote:
> On Oct 7, 2:21 am, Eric Hodel <drbr...@segment7.net> wrote:

I'm actually impressed #sort_by does that well. Array#sort cheats.

···

On Oct 7, 2007, at 14:20 , Rick DeNatale wrote:

On 10/7/07, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:

Eric Hodel wrote:

somethings.sort_by { |s| s.sort_obj }

Since this is more efficient than #sort, maybe the implementation of
#sort should first try to do this, and only if that fails (#sort_obj not
defined, or two sort_obj not comparable), revert to the normal <=>
algorithm.

Yes, but sort_by is not always more efficient than sort:

        require 'benchmark'
        include Benchmark

        a = (1..100000).map {rand(100000)}

        bm(10) do |b|
          b.report("Sort") { a.sort }
          b.report("Sort by") { a.sort_by {|a| a} }
        end

     produces:

        user system total real
        Sort 0.180000 0.000000 0.180000 ( 0.175469)
        Sort by 1.980000 0.040000 2.020000 ( 2.013586)

Also adding the startup overhead to check for an implementation of
sort_obj is an expense.

There's also the issue that you really need to check if ALL of the
elements being sorted respond to sort_obj.

It seems better to me to either reserve this technique as a pattern,
or at most add a new sort_by_sort_obj method or the like, and leave
sort as it is.

--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars

My primary goal was reducing the number of method calls and objects
created, boosting speed.

I'm afraid the performance issue is not that easy: I can imagine where
the <=> approach is more efficient. Here's the reason: with #sort_by
you need to keep *all* keys in memory for the whole sorting process
which can cost a lot of memory. Creating those for pairwise
comparison (as with <=>) is more CPU intensive because more temp
arrays have to be created but the overall memory usage might be lower
because they are kept during a single comparison operation only (see
also Rick's benchmark). Allocating that much memory from the OS for
the #sort_by case might actually slow the sorting down dramatically.

Profiling indicated a significant amount of time was spent in #<=> which created more objects and called more methods than #sort_by. Replacing #sort with #sort_by and #sort_obj reduced both. Creating fewer objects and using less memory aren't the same thing.

(Also, Rick's benchmark shows Array#sort cheating more than any other effect.)

The downside to this is that you can't sort heterogeneous collections using #sort_obj. When sorting using #<=> this would be easier.

Not necessarily because the <=> operators might not be able to work
with elements from the other types and thus sorting might fail as
miserably. :slight_smile:

#<=> usually does the right thing, raising ArgumentError or returning nil on incomparable objects, where if you use #sort_obj with incomparable objects you may get bogus results.

···

On Oct 8, 2007, at 02:51 , Robert Klemme wrote:

2007/10/8, Eric Hodel <drbrain@segment7.net>:

--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars

First of all it's not my benchmark, it comes directly from the ri doc
for sort_by

Second, I don't understand the concept of Array#sort "cheating" this
isn't a contest.

Some things perform better with sort, others with sort_by. The
performance of sort is generally determined by the cost of doing the
<=> comparison, while the performace of sort_by is generally
determined by the cost in time and space of creating the side array.

As the English say, Horse for courses.

···

On 10/8/07, Eric Hodel <drbrain@segment7.net> wrote:

Profiling indicated a significant amount of time was spent in #<=>
which created more objects and called more methods than #sort_by.
Replacing #sort with #sort_by and #sort_obj reduced both. Creating
fewer objects and using less memory aren't the same thing.

(Also, Rick's benchmark shows Array#sort cheating more than any other
effect.)

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Profiling indicated a significant amount of time was spent in #<=>
which created more objects and called more methods than #sort_by.
Replacing #sort with #sort_by and #sort_obj reduced both. Creating
fewer objects and using less memory aren't the same thing.

(Also, Rick's benchmark shows Array#sort cheating more than any other
effect.)

First of all it's not my benchmark, it comes directly from the ri doc
for sort_by

Second, I don't understand the concept of Array#sort "cheating" this
isn't a contest.

Array#sort doesn't call Fixnum#<=> when comparing fixnums, even if you redefine Fixnum#<=>. If you want a benchmark that shows when #sort is faster than #sort_by in several general cases you should use like implementations, not the special cases where Array#sort has been optimized.

Some things perform better with sort, others with sort_by. The
performance of sort is generally determined by the cost of doing the
<=> comparison, while the performace of sort_by is generally
determined by the cost in time and space of creating the side array.

For sorting Fixnum there is much lower cost since you never call a #<=>.

While I didn't mention it, I was profiling to speed up RubyGems. I don't blindly assume, and neither should anyone else.

···

On Oct 8, 2007, at 10:05 , Rick DeNatale wrote:

On 10/8/07, Eric Hodel <drbrain@segment7.net> wrote:

--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars