Sorting with the Swartzian transform

Hello all,

This email is simply for the benefit and/or reading pleasure of ruby
hackers. In particular those without a strong Perl background. I just
had to do a Swartzian transform in Ruby and it occurred to me that someone
might benefit from my experience.

If you already know how to do a Swartzian transform in Ruby, just skip.

The Swartzian transform is a method of speeding up certain expensive
sorts. It is one of the hallmarks of Perl, and it is also available in
Ruby (I actually like it better in Ruby).

Situation:

···

==========
Say I have an array of objects and want to sort them according to some
criteria. Suppose I have a function that assigns a numerical value for
each object. The simplest way to do the sort is:

array.sort { |a,b| myFunction(a) <=> my_function(b)}

Here, my_function() gets called O(n log n) times. If my_function() is
expensive, this sort can be very slow. This is where the Swartzian
transform comes in.

step 1:

Use Array#map to create a new array whose entries are arrays containing
both the original object and the corresponding my_function() value.

array.map { |obj| [ my_function(obj), obj] }

This is an O(n) operation.

step 2:

Sort this array

. sort { |a,b| a[0] <=> b[0] }

This is O(n log n), but the operations are cheap.

step 3:

Now use Array#map again to get rid of the myFunction() values.

. map { |arr| arr[1] }

The whole thing looks like this:

sorted = array.map { |obj| [ my_function(obj), obj ] }
.sort { |a,b| a[0] <=> b[0] }
.map { |arr| arr[1] }

This is actually much more readable than the Perl version (which has to be
written backwards).

A general case:

Suppose you can’t assign numerical values to the objects. All you have is
a compare() function. Normally you’d write:

sorted = array.sort { |a,b| compare(a,b) }

There might still be room for a Swartzian transform. You can try to do as
much work as possible on a sepparate function, and create a fast_compare()
function that does the rest of the work. The the transform would look
like:

sorted = array.map { |obj| [ prepare_object(obj), obj ] }
.sort { |a,b| fast_compare(a,b) }
.map { |arr| arr[1] }

The more work you can pull out of ‘fast_compare()’ and put into
’prepare_object()’ the faster the sort will be.

I hope someone benefits from this.

Cheers,
Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137

Even easier-to-use:

class Array
def ssort(&block)
map {{|i| [&block.call(i), i]}
.sort {|a,b| a[0] <=> b[0]}
.map {|a| a[1]}
end
end

martin

···

Daniel Carrera dcarrera@math.umd.edu wrote:

sorted = array.map { |obj| [ my_function(obj), obj ] }
.sort { |a,b| a[0] <=> b[0] }
.map { |arr| arr[1] }

Hi,

Daniel Carrera dcarrera@math.umd.edu writes:

This email is simply for the benefit and/or reading pleasure of ruby
hackers. In particular those without a strong Perl background. I just
had to do a Swartzian transform in Ruby and it occurred to me that someone
might benefit from my experience.

Ruby 1.8 has Enumerable#sort_by.
This method is equivalent to:

module Enumerable
def sort_by
ary = map { |i| [yield(i), i] }
ary.sort! { |a, b| a[0] <=> b[0] }
ary.map! { |i| i[1] }
end
end

···


eban

Ruby never ceases to amaze me. :slight_smile:

I still think it’s good to understand Swartzian transforms because
Enumerable#sort_by won’t help in the general case when you can’t easily
assign a number to ech object.

Still, it’s very cool to have that in the language.

Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137

···

On Sun, 12 Jan 2003, WATANABE Hirofumi wrote:

Hi,

Daniel Carrera dcarrera@math.umd.edu writes:

This email is simply for the benefit and/or reading pleasure of ruby
hackers. In particular those without a strong Perl background. I just
had to do a Swartzian transform in Ruby and it occurred to me that someone
might benefit from my experience.

Ruby 1.8 has Enumerable#sort_by.
This method is equivalent to:

module Enumerable
def sort_by
ary = map { |i| [yield(i), i] }
ary.sort! { |a, b| a[0] <=> b[0] }
ary.map! { |i| i[1] }
end
end


eban

I misspelled Randal’s name. It’s should have been “Schwartzian
transformed” as it is named after “Randal Schwartz”.

David, thanks for pointing that out.

In my defense, I did look up his name before writing the email. The page
I found with his name had it spelled wrong.

Cheers,
Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137

···

On Sun, 12 Jan 2003, WATANABE Hirofumi wrote:

Hi,

Daniel Carrera dcarrera@math.umd.edu writes:

This email is simply for the benefit and/or reading pleasure of ruby
hackers. In particular those without a strong Perl background. I just
had to do a Swartzian transform in Ruby and it occurred to me that someone
might benefit from my experience.

Ruby 1.8 has Enumerable#sort_by.
This method is equivalent to:

module Enumerable
def sort_by
ary = map { |i| [yield(i), i] }
ary.sort! { |a, b| a[0] <=> b[0] }
ary.map! { |i| i[1] }
end
end


eban

Hello Martin,

Sunday, January 12, 2003, 11:48:27 AM, you wrote:

class Array
def ssort(&block)
map {{|i| [&block.call(i), i]}
.sort {|a,b| a[0] <=> b[0]}\

just “.sort” will be faster (in most cases :slight_smile:

.map {|a| a[1]}

end
end

and it’s better to define standard (for 1.7) method
sort_by after checking that it doesn’t exist

···


Best regards,
Bulat mailto:bulatz@integ.ru

good stuff daniel; I only “got” the transform in my perl days about 30 seconds
after I’d read up on it. Every time. For whatever reason, it never “stuck”.
Seeing it in ruby makes much, much more sense.

Not to be a pedant, but as it came from Randal, it’s the “Schwartzian”
transform.

···

-----Original Message-----
From: Daniel Carrera [mailto:dcarrera@math.umd.edu]
Sent: Sunday, January 12, 2003 8:09 AM
To: ruby-talk ML
Subject: Re: sorting with the Swartzian transform

Ruby never ceases to amaze me. :slight_smile:

Good point! Yet another place where Ruby just does the right thing.

martin

···

Bulat Ziganshin bulatz@integ.ru wrote:

Hello Martin,

Sunday, January 12, 2003, 11:48:27 AM, you wrote:

class Array
def ssort(&block)
map {{|i| [&block.call(i), i]}
.sort {|a,b| a[0] <=> b[0]}\

just “.sort” will be faster (in most cases :slight_smile:

Yes, but it does something different. ‘.sort’ sorts in ASCIIbetical
order (so 10 comes before 2) , whereas <=> is a numerical sort.

Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137

···

On Mon, 13 Jan 2003, Bulat Ziganshin wrote:

Hello Martin,

Sunday, January 12, 2003, 11:48:27 AM, you wrote:

class Array
def ssort(&block)
map {{|i| [&block.call(i), i]}
.sort {|a,b| a[0] <=> b[0]}\

just “.sort” will be faster (in most cases :slight_smile:

What are you talking about? I think you’re
confused, but if not, please illustrate
with code.

Hal

···

----- Original Message -----
From: “Daniel Carrera” dcarrera@math.umd.edu
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Monday, January 13, 2003 10:30 AM
Subject: Re: sorting with the Swartzian transform

On Mon, 13 Jan 2003, Bulat Ziganshin wrote:

Hello Martin,

Sunday, January 12, 2003, 11:48:27 AM, you wrote:

class Array
def ssort(&block)
map {{|i| [&block.call(i), i]}
.sort {|a,b| a[0] <=> b[0]}\

just “.sort” will be faster (in most cases :slight_smile:

Yes, but it does something different. ‘.sort’ sorts in ASCIIbetical
order (so 10 comes before 2) , whereas <=> is a numerical sort.

Not under 1.7.3, at least:

a = [[10, 20], [3, 4], [1, “a”]]
a.sort # => [[1, “a”], [3, 4], [10, 20]]

martin

···

Daniel Carrera dcarrera@math.umd.edu wrote:

On Mon, 13 Jan 2003, Bulat Ziganshin wrote:

Hello Martin,

Sunday, January 12, 2003, 11:48:27 AM, you wrote:

class Array
def ssort(&block)
map {{|i| [&block.call(i), i]}
.sort {|a,b| a[0] <=> b[0]}\

just “.sort” will be faster (in most cases :slight_smile:

Yes, but it does something different. ‘.sort’ sorts in ASCIIbetical
order (so 10 comes before 2) , whereas <=> is a numerical sort.

Hi –

Hello Martin,

Sunday, January 12, 2003, 11:48:27 AM, you wrote:

class Array
def ssort(&block)
map {{|i| [&block.call(i), i]}
.sort {|a,b| a[0] <=> b[0]}\

just “.sort” will be faster (in most cases :slight_smile:

Yes, but it does something different. ‘.sort’ sorts in ASCIIbetical
order (so 10 comes before 2) , whereas <=> is a numerical sort.

“10” (string) comes before “2”, but 2 comes before 10:

irb(main):008:0> [10,2].sort
=> [2, 10]
irb(main):009:0> [“10”,“2”].sort
=> [“10”, “2”]

(both based on <=>)

David

···

On Tue, 14 Jan 2003, Daniel Carrera wrote:

On Mon, 13 Jan 2003, Bulat Ziganshin wrote:


David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

Hello Daniel,

Monday, January 13, 2003, 7:30:15 PM, you wrote:

.sort {|a,b| a[0] <=> b[0]}\

just “.sort” will be faster (in most cases :slight_smile:

Yes, but it does something different. ‘.sort’ sorts in ASCIIbetical
order (so 10 comes before 2) , whereas <=> is a numerical sort.

sorry, perl days is over :slight_smile:

···


Best regards,
Bulat mailto:bulatz@integ.ru

What are you talking about? I think you’re
confused, but if not, please illustrate
with code.

Well, Ruby’s ‘sort’ is smarter than I expected:

array = [10, 2]
=> [10, 2]
array.sort
=> [2, 10]
array = [‘10’, ‘2’]
=> [“10”, “2”]
array.sort
=> [“10”, “2”]

The latter example is what I was thinking of. I was influenced by my Perl
days and forgot that Ruby could make a smarter sort.

Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137