Sort_by is not stable?

sort_by is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.

_md

···

--
Posted via http://www.ruby-forum.com/.

sort_by is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.

Without code, I have to guess. If I understood your issue correctly,
then you need to specify both fields in sort_by block:

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b", 1]]

=> [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b", 1]]

a.sort_by {|e| [e.last, e.first]}

=> [["b", 1], ["e", 2], ["a", 3], ["c", 3], ["d", 3], ["f", 3], ["g", 3]]

Regards,
Ammar

···

On Sun, Nov 7, 2010 at 12:22 PM, Michel Demazure <michel@demazure.com> wrote:

I've noticed that it's unstable under 1.8.7:

%w(foo bar car war raw rat cat or bat is).sort.
sort_by{|s| s.size}
    ==>["is", "or", "bar", "cat", "foo", "car", "bat", "rat", "raw",
"war"]
VERSION
    ==>"1.8.7"

···

On Nov 7, 4:22 am, Michel Demazure <mic...@demazure.com> wrote:

sort_by is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.

Ammar Ali wrote in post #959889:

If I understood your issue correctly,

Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# = > [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# = > [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]

Without code, I have to guess

The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).
_md

···

--
Posted via http://www.ruby-forum.com/\.

Thanks for the clarification, now I understand what you mean. Based on
the results of your example, it appears that ruby's qsort is naive.
AFAIK, the majority of qsort implementations are.

Maybe this is a question/request for the core mailing list.

Regards,
Ammar

···

On Sun, Nov 7, 2010 at 1:15 PM, Michel Demazure <michel@demazure.com> wrote:

Ammar Ali wrote in post #959889:

If I understood your issue correctly,

Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# = > [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# = > [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]

Without code, I have to guess

The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).

IIRC quicksort is instable by definition. There may be stable variants though.

In this case it's not really needed IMHO. You can either do

x.sort_by {|let,dig| [dig, let]}

or, if you don't want to create all those temporary arrays

x.sort {|(la,da),(lb,db)| da == db ? la <=> lb : da <=> db}
x.sort {|(la,da),(lb,db)| c = da <=> db; c == 0 ? la <=> lb : c}

Kind regards

  robert

···

On 07.11.2010 12:39, Ammar Ali wrote:

On Sun, Nov 7, 2010 at 1:15 PM, Michel Demazure<michel@demazure.com> wrote:

Ammar Ali wrote in post #959889:

If I understood your issue correctly,

Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# => [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# => [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]

Without code, I have to guess

The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).

Thanks for the clarification, now I understand what you mean. Based on
the results of your example, it appears that ruby's qsort is naive.
AFAIK, the majority of qsort implementations are.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Please, people, trim your quotes.

···

On Sun, Nov 7, 2010 at 1:07 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

IIRC quicksort is instable by definition. There may be stable variants
though.

It's only unstable if the implementation is inefficient (says
Wikipedia, so apply your truthiness filters).

However, QSort is (again, so says Wikipedia), only maintains the
relative order of records, if those are equal.
["a",3] and ["b",3] aren't equal as far as a computer is concerned,
resulting in the OP's problem.

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

From a practical viewpoint, stable sort should be implemented as stable
sort.
Not a multiple keys sort... or something else.

···

On Sun, Nov 7, 2010 at 8:07 PM, Robert Klemme <shortcutter@googlemail.com>wrote:

On 07.11.2010 12:39, Ammar Ali wrote:

On Sun, Nov 7, 2010 at 1:15 PM, Michel Demazure<michel@demazure.com> >> wrote:

Ammar Ali wrote in post #959889:

If I understood your issue correctly,

Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# => [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# => [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]

Without code, I have to guess

The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).

Thanks for the clarification, now I understand what you mean. Based on
the results of your example, it appears that ruby's qsort is naive.
AFAIK, the majority of qsort implementations are.

IIRC quicksort is instable by definition. There may be stable variants
though.

In this case it's not really needed IMHO. You can either do

x.sort_by {|let,dig| [dig, let]}

or, if you don't want to create all those temporary arrays

x.sort {|(la,da),(lb,db)| da == db ? la <=> lb : da <=> db}
x.sort {|(la,da),(lb,db)| c = da <=> db; c == 0 ? la <=> lb : c}

Kind regards

       robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

--
Lai, Yu-Hsuan

Robert Klemme:

You can either do

x.sort_by {|let,dig| [dig, let]}

or, if you don't want to create all those temporary arrays

x.sort {|(la,da),(lb,db)| da == db ? la <=> lb : da <=> db}
x.sort {|(la,da),(lb,db)| c = da <=> db; c == 0 ? la <=> lb : c}

Or use #nonzero? and #<=> chaining like this:

x.sort { |(la, da), (lb, db)| (da <=> db).nonzero? or la <=> lb }

— Piotr Szotkowski

···

--
Language designers in general: You are IN THE WAY. Go here:
REBOL in One Line (R2 One-liners) then do us all a favor and don’t
resist the urge to fall on that wakizashi when your shame overcomes you.
                                                             [Jeff Bone]

Phillip Gawlowski wrote in post #959912:

However, QSort is (again, so says Wikipedia), only maintains the
relative order of records, if those are equal.
["a",3] and ["b",3] aren't equal as far as a computer is concerned,
resulting in the OP's problem.

@Phillip : I am not sure I agree. When sorting on the second half, the
couple (["a", 3], ["b", 3]) falls under the "equal" case of the choice.

Robert Klemme wrote :

In this case it's not really needed IMHO

@Robert : yes in this precise case with two fields, you can go around.
But in general, if you work with many fields and have to sort by one of
them, you should not have to remember the previous sorts and specify all
the unconcerned fields.
_md

···

--
Posted via http://www.ruby-forum.com/\.

"a" != "b" results in true, thus the couplet is *not* equal.

···

On Sun, Nov 7, 2010 at 2:08 PM, Michel Demazure <michel@demazure.com> wrote:

Phillip Gawlowski wrote in post #959912:
@Phillip : I am not sure I agree. When sorting on the second half, the
couple (["a", 3], ["b", 3]) falls under the "equal" case of the choice.

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

No, Michael is right. Every sorting algorithm only ever looks at *sort keys*. So since our sort key is the second array member (the number) both mentioned records are equivalent. If you include the other field a stable sort isn't really interesting here since then all fields are part of the sort key - stable and instable algorithms will return the same then.

Kind regards

  robert

···

On 07.11.2010 15:10, Phillip Gawlowski wrote:

On Sun, Nov 7, 2010 at 2:08 PM, Michel Demazure<michel@demazure.com> wrote:

Phillip Gawlowski wrote in post #959912:
@Phillip : I am not sure I agree. When sorting on the second half, the
couple (["a", 3], ["b", 3]) falls under the "equal" case of the choice.

"a" != "b" results in true, thus the couplet is *not* equal.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert Klemme wrote in post #959935:

No, Michael is right. Every sorting algorithm only ever looks at *sort
keys*. So since our sort key is the second array member (the number)
both mentioned records are equivalent.

Coming back to the beginning, quicksort correctly coded is stable (cut
recursively in three parts, with the middle part made by those elements
equivalent to the choosen pivot). Even the parallel version is stable.
In NESL :

function quicksort(a) =
if (#a < 2) then a
else
  let pivot = a[#a/2];
      lesser = {e in a| e < pivot};
      equal = {e in a| e == pivot};
      greater = {e in a| e > pivot};
      result = {quicksort(v): v in [lesser,greater]};
  in result[0] ++ equal ++ result[1];

Which means that ruby's is badly coded.
_md

···

--
Posted via http://www.ruby-forum.com/\.

Good Morning,

Coming back to the beginning, quicksort correctly coded is stable (cut
recursively in three parts, with the middle part made by those elements
equivalent to the choosen pivot).

...

Which means that ruby's is badly coded.

The rest of the sane world begs to differ.

Sorting algorithm - Wikipedia - "...Efficient
implementations of quicksort (with in-place partitioning) are typically
unstable sorts..."

sort - perl pragma to control sort() behaviour - Perldoc Browser - "...Mergesort is stable, quicksort is
not..."

http://www.codecogs.com/reference/c/stdlib.h/qsort.php - "...The algorithms
implemented by *qsort*, *qsort_r* and *heapsort* are not stable..."

Linux Man page for qsort - "....If two members compare as equal, their order
in the sorted array is undefined...."

Python in all fairness says they have stable sorts - but they really make no
mention of the algorithm used so I'm unclear as to whether they are using
quicksort at all.

I'd rather have the efficient implementation that one can easily make stable
by using the secondary comparison (as you can easily do in this case) than
an inefficient version that really accomplishes nothing that the efficient
version cannot.

John

P.S. There is nothing in the world stopping anyone from submitting a patch
to add a stable sort option to Ruby. But to pronounce that the hard work of
others is badly coded when it clearly conforms to the normal standard of
multiple mainstream languages is just plain ignorant at best.

···

On Sun, Nov 7, 2010 at 9:13 AM, Michel Demazure <michel@demazure.com> wrote:

John W Higgins wrote in post #959968:

... just plain ignorant at best.

Message received.
_md

···

--
Posted via http://www.ruby-forum.com/\.

In fact, python uses Timsort. Java SE and Android too.
(wikipedia::Timsort <http://en.wikipedia.org/wiki/Timsort&gt;\)
It's very efficient, like quicksort.
(btw, quicksort isn't most efficient compare-based sort. see
wikipedia::introsort. <http://en.wikipedia.org/wiki/Introsort&gt;\)

···

On Mon, Nov 8, 2010 at 2:34 AM, John W Higgins <wishdev@gmail.com> wrote:

Good Morning,

On Sun, Nov 7, 2010 at 9:13 AM, Michel Demazure <michel@demazure.com> > wrote:

>
> Coming back to the beginning, quicksort correctly coded is stable (cut
> recursively in three parts, with the middle part made by those elements
> equivalent to the choosen pivot).

...

>
> Which means that ruby's is badly coded.
>

The rest of the sane world begs to differ.

Sorting algorithm - Wikipedia - "...Efficient
implementations of quicksort (with in-place partitioning) are typically
unstable sorts..."

sort - perl pragma to control sort() behaviour - Perldoc Browser - "...Mergesort is stable, quicksort is
not..."

http://www.codecogs.com/reference/c/stdlib.h/qsort.php - "...The
algorithms
implemented by *qsort*, *qsort_r* and *heapsort* are not stable..."

Linux Man page for qsort - "....If two members compare as equal, their
order
in the sorted array is undefined...."

Python in all fairness says they have stable sorts - but they really make
no
mention of the algorithm used so I'm unclear as to whether they are using
quicksort at all.

I'd rather have the efficient implementation that one can easily make
stable
by using the secondary comparison (as you can easily do in this case) than
an inefficient version that really accomplishes nothing that the efficient
version cannot.

John

P.S. There is nothing in the world stopping anyone from submitting a patch
to add a stable sort option to Ruby. But to pronounce that the hard work of
others is badly coded when it clearly conforms to the normal standard of
multiple mainstream languages is just plain ignorant at best.

--
Lai, Yu-Hsuan

I'd rather have a stable and efficient sort. If Python can do it,
then there's no excuse for Ruby.

···

On Nov 7, 12:34 pm, John W. Higgins <wish...@gmail.com> wrote:

I'd rather have the efficient implementation that one can easily make stable
by using the secondary comparison (as you can easily do in this case) than
an inefficient version that really accomplishes nothing that the efficient
version cannot.

Good Afternoon

> I'd rather have the efficient implementation that one can easily make
stable
> by using the secondary comparison (as you can easily do in this case)
than
> an inefficient version that really accomplishes nothing that the
efficient
> version cannot.

I'd rather have a stable and efficient sort. If Python can do it,
then there's no excuse for Ruby.

First, there are two issues here - one is whether or not to use quicksort as
the default sort and then whether or not there is an expectation as to
whether or not quicksort should be stable.

The issue I took was the statement that Ruby's quicksort was "badly coded".
This is not a correct statement given the many other instances of quicksort
that follow the same principal as Ruby's version in that it's not stable. I
think Ruby is just fine with respect to its quicksort implementation.

As to the concept of whether or not another algorithm should be used - it's
rather interesting to look here -
http://redmine.ruby-lang.org/issues/show/1089 and especially the comments of
Mr. Nutter who while working on JRuby had an apparently clean canvas to work
with and still went with an unstable sort. It's sparse on details and I'm
not trying to put words in his mouth either.

Again, anyone is free to put a patch on the table and really start a good
conversation as to how it would fit in - and quite possibly a stable_sort
method would be an outstanding option here. But there just is a huge
difference between someone coming to the table patch in hand to begin a
conversation - and just saying Ruby is "badly coded". This is a sort
algorithm - it shouldn't be rocket science for someone to create a different
algorithm patch if they wanted to. There are plenty of C reference
implementations available as starting points.

John

···

On Sun, Nov 7, 2010 at 1:00 PM, w_a_x_man <w_a_x_man@yahoo.com> wrote:

On Nov 7, 12:34 pm, John W. Higgins <wish...@gmail.com> wrote:

I can clarify.

JRuby used to just use the JVM-provided mergesort, which was stable
but markedly slower than Ruby 1.8.6/7's unstable quicksort. Because we
got a couple reports about sort being slower, we opted to do the
"wrong thing" and have a contributor implement a hybrid sort that
performed better than Ruby 1.8.6/7 but was unstable.

About a month later, someone raised the issue about MRI not having a
stable sort, and it was decided to make it stable. AARGH. I have not
kept up with discussions, however, and I believe sorts are still
unstable in 1.9.2, correct?

At this point I almost want there to be both options, since our
contributor (qbproger ... sorry man, your name escapes me at the
moment!) went to a lot of effort to get our current sort running well.

In any case, it wasn't an arbitrary decision. We debate almost
endlessly over these sorts of things, and I think I've taken a good
ten years off my life due to the related stress :slight_smile:

- Charlie

···

On Sun, Nov 7, 2010 at 3:37 PM, John W Higgins <wishdev@gmail.com> wrote:

As to the concept of whether or not another algorithm should be used - it's
rather interesting to look here -
http://redmine.ruby-lang.org/issues/show/1089 and especially the comments of
Mr. Nutter who while working on JRuby had an apparently clean canvas to work
with and still went with an unstable sort. It's sparse on details and I'm
not trying to put words in his mouth either.

Charles Nutter wrote in post #962516:

In any case, it wasn't an arbitrary decision. We debate almost
endlessly over these sorts of things, and I think I've taken a good
ten years off my life due to the related stress :slight_smile:

Just to come back to my initial post : it was not about sort, but about
sort_by.

For sort_by, you need to compute the keys, and sort the indices
according to the keys. It is not in-place sorting. Yes, optimal in-place
quicksort is unstable. But for sort_by, one could use a stable version.

Am I wrong ?
_md

···

--
Posted via http://www.ruby-forum.com/\.