Ruby 1.8 vs 1.9

Hi,

How much longer will Ruby 1.8(.7) be maintained? Is it advisable to
dive into 1.9(.2)? What are the immediate advantages of using 1.9
over 1.8?

Thanks,
..
Pete Pincus

I believe the guys at EngineYard are in charge of backporting fixes to the 1.8.7 branch. I also heard there was a 1.8.8 coming at some point to be the final release in the 1.8 series.

I use 1.9.2p0 daily and find it to be extremely stable and fast. I would say the biggest reason to use it is to get a performance boost. Most of your code from 1.8 will "just work." My code sees a 2-5x speedup on 1.9.2 versus 1.8.7.

Why not try it on your code and see for yourself? With tools like rvm (for unix) and pik (for windows) it's a breeze to have multiple rubies installed simultaneously.
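
For a quick first impression, a throwaway micro-benchmark along these
lines (just a sketch; the workload is a placeholder, so substitute your
own hot code) runs unchanged under 1.8.7 and 1.9.2:

    require 'benchmark'

    # Placeholder workload -- swap in whatever your real code spends time on.
    puts RUBY_VERSION
    puts Benchmark.measure {
      1_000_000.times { "hello world".split.map { |w| w.upcase }.join(" ") }
    }

Run it once under each interpreter (e.g. "rvm use 1.8.7", then
"rvm use 1.9.2") and compare the totals.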

cr

···

On Nov 23, 2010, at 9:43 AM, Peter Pincus wrote:

Hi,

How much longer will Ruby 1.8(.7) be maintained? Is it advisable to
dive into 1.9(.2)? What are the immediate advantages of using 1.9
over 1.8?


That's really variable and depends on what you're doing.

All of my text processing code needed reworking, and text processing is (was?) noticeably slower in ruby 1.9 than it is in 1.8.

···

On Nov 23, 2010, at 13:25 , Chuck Remes wrote:

I use 1.9.2p0 daily and find it to be extremely stable and fast. I would say the biggest reason to use it is to get a performance boost. Most of your code from 1.8 will "just work." My code sees a 2-5x speedup on 1.9.2 versus 1.8.7.

Chuck Remes wrote in post #963430:

I use 1.9.2p0 daily and find it to be extremely stable and fast. I would
say the biggest reason to use it is to get a performance boost. Most of
your code from 1.8 will "just work." My code sees a 2-5x speedup on
1.9.2 versus 1.8.7.

And just to give some balance: the biggest reason not to use 1.9 is
because of the incredible complexity which has been added to the String
class, and the ability it gives you to make programs which crash under
unexpected circumstances.

For example, an expression like

   s1 = s2 + s3

where s2 and s3 are both Strings will always work and do the obvious
thing in 1.8, but in 1.9 it may raise an exception. Whether it does
depends not only on the encodings of s2 and s3 at that point, but also
their contents (properties "empty?" and "ascii_only?")
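
For instance, a minimal sketch (Ruby 1.9.x; the strings are invented
for illustration):

    # encoding: utf-8
    utf8     = "héllo"                                  # UTF-8, contains non-ASCII
    latin_hi = "caf\xE9".force_encoding("ISO-8859-1")   # ISO-8859-1, contains non-ASCII
    latin_lo = "cafe".force_encoding("ISO-8859-1")      # ISO-8859-1, but ascii_only?

    utf8 + latin_lo   # works: one operand is ASCII-only, result is tagged UTF-8
    utf8 + latin_hi   # raises Encoding::CompatibilityError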

The encodings of strings you read may also be affected by the locale set
from the environment, unless you explicitly code against that. This
means the same program with the same data may work on your machine, but
crash on someone else's.

···

--
Posted via http://www.ruby-forum.com/.

Who do I talk to get 1.9 RPMs produced for Fedora?

Thanks,

Phil.

···

On 2010-11-24 09:23, Ryan Davis wrote:

On Nov 23, 2010, at 13:25 , Chuck Remes wrote:

I use 1.9.2p0 daily and find it to be extremely stable and fast. I
would say the biggest reason to use it is to get a performance
boost. Most of your code from 1.8 will "just work." My code sees a
2-5x speedup on 1.9.2 versus 1.8.7.

That's really variable and depends on what you're doing.

All of my text processing code needed reworking, and text processing
is (was?) noticeably slower in ruby 1.9 than it is in 1.8.

--
Philip Rhoades

GPO Box 3411
Sydney NSW 2001
Australia
E-mail: phil@pricom.com.au

Definitely true. That's why I was careful to say "My code sees a 2-5x speedup..." because I have seen a few instances where 1.9 is a tad pokier. But clearly 1.9 is the future so sticking with 1.8 seems like a bad long-term bet.

cr

···

On Nov 23, 2010, at 4:23 PM, Ryan Davis wrote:

On Nov 23, 2010, at 13:25 , Chuck Remes wrote:

I use 1.9.2p0 daily and find it to be extremely stable and fast. I would say the biggest reason to use it is to get a performance boost. Most of your code from 1.8 will "just work." My code sees a 2-5x speedup on 1.9.2 versus 1.8.7.

That's really variable and depends on what you're doing.

All of my text processing code needed reworking, and text processing is (was?) noticeably slower in ruby 1.9 than it is in 1.8.

Brian Candler wrote:

And just to give some balance: the biggest reason not to use 1.9 is
because of the incredible complexity which has been added to the
String class, and the ability it gives you to make programs which
crash under unexpected circumstances.

Sounds great. :wink: Can somebody else confirm this?

Regards
Oli

···

--
It's fine to be intelligent; you just have to know how to help yourself.

Chuck Remes wrote in post #963430:

I use 1.9.2p0 daily and find it to be extremely stable and fast. I would
say the biggest reason to use it is to get a performance boost. Most of
your code from 1.8 will "just work." My code sees a 2-5x speedup on
1.9.2 versus 1.8.7.

And just to give some balance: the biggest reason not to use 1.9 is
because of the incredible complexity which has been added to the String
class, and the ability it gives you to make programs which crash under
unexpected circumstances.

For example, an expression like

s1 = s2 + s3

where s2 and s3 are both Strings will always work and do the obvious
thing in 1.8, but in 1.9 it may raise an exception. Whether it does
depends not only on the encodings of s2 and s3 at that point, but also
their contents (properties "empty?" and "ascii_only?")

The encodings of strings you read may also be affected by the locale set
from the environment, unless you explicitly code against that. This
means the same program with the same data may work on your machine, but
crash on someone else's.

And that's why I use and love 1.9.
The obvious thing isn't so obvious if you actually care about
encodings, and if you are mindful about what comes from where, it's
actually helpful to find otherwise hidden issues.
I hear nobody complain that 1 / 0 raises but 1.0 / 0.0 gives Infinity,
which I find pretty counter-intuitive, and makes me check for .nan?
and .infinite? (which also fails if I call it on Fixnum instead of
Float).
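
A quick illustration of that asymmetry under 1.9 (just a sketch):

    1 / 0                  # raises ZeroDivisionError
    1.0 / 0.0              # => Infinity (IEEE 754 float semantics)
    (1.0 / 0.0).infinite?  # => 1
    (0.0 / 0.0).nan?       # => true
    1.infinite?            # NoMethodError -- Fixnum doesn't respond to it here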

https://github.com/candlerb/string19/blob/master/string19.rb
https://github.com/candlerb/string19/blob/master/soapbox.rb

Many valid complaints there, but nothing that would make me long for
the everything-is-a-string-of-bytes approach of 1.8, which made
working with encodings very brittle.
I can see how this is just annoying to someone who has only dealt with
BINARY/ASCII/UTF-8 all their lives, but please consider that most of
the world actually still uses other encodings as well.
I also want to thank you for writing string19.rb, which is a very
helpful resource for me and others, along with the series from JEG II.

···

On Wed, Nov 24, 2010 at 8:14 PM, Brian Candler <b.candler@pobox.com> wrote:

--
Michael Fellinger
CTO, The Rubyists, LLC

For example, an expression like

   s1 = s2 + s3

where s2 and s3 are both Strings will always work and do the obvious
thing in 1.8, but in 1.9 it may raise an exception. Whether it does
depends not only on the encodings of s2 and s3 at that point, but also
their contents (properties "empty?" and "ascii_only?")

In 1.8, if those strings aren't in the same encoding, it will blindly
concatenate them as binary values, which may result in a corrupt and
nonsensical string.
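
A small sketch of what that silent corruption looks like (simulated here
with BINARY strings, since 1.8 effectively treats every String as a byte
array):

    # encoding: utf-8
    utf8  = "é"                              # two bytes in UTF-8: \xC3 \xA9
    latin = "\xE9".force_encoding("BINARY")  # "é" as its single ISO-8859-1 byte
    mixed = utf8.dup.force_encoding("BINARY") + latin
    mixed.bytes.to_a   # => [195, 169, 233] -- not valid UTF-8, mojibake as Latin-1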

It seems to me that the obvious thing is to raise an error when there's an
error, instead of silently corrupting your data.

This
means the same program with the same data may work on your machine, but
crash on someone else's.

Better, again, than working on my machine, but corrupting on someone else's.
At least if it crashes, hopefully there's a bug report and even a fix _before_
it corrupts someone's data, not after.

https://github.com/candlerb/string19/blob/master/string19.rb
https://github.com/candlerb/string19/blob/master/soapbox.rb

From your soapbox.rb:

* Whether or not you can reason about whether your program works, you will
  want to test it. 'Unit testing' is generally done by running the code with
  some representative inputs, and checking if the output is what you expect.
  
  Again, with 1.8 and the simple line above, this was easy. Give it any two
  strings and you will have sufficient test coverage.

Nope. All that proves is that you can get a string back. It says nothing about
whether the resultant string makes sense.

More relevantly:

* It solves a non-problem: how to write a program which can juggle multiple
  string segments all in different encodings simultaneously. How many
  programs do you write like that? And if you do, can't you just have
  a wrapper object which holds the string and its encoding?

Let's see... Pretty much every program, ever, particularly web apps. The end-
user submits something in the encoding of their choice. I may have to convert
it to store it in a database, at the very least. It may make more sense to
store it as whatever encoding it is, in which case, the simple act of
displaying two comments on a website involves exactly this sort of
concatenation.

Or maybe I pull from multiple web services. Something as simple and common as
a "trackback" would again involve concatenating multiple strings from
potentially different encodings.

* It's pretty much obsolete, given that the whole world is moving to UTF-8
  anyway. All a programming language needs is to let you handle UTF-8 and
  binary data, and for non-UTF-8 data you can transcode at the boundary.
  For stateful encodings you have to do this anyway.

Java at least did this sanely -- UTF16 is at least a fixed width. If you're
going to force a single encoding, why wouldn't you use fixed-width strings?

Oh, that's right -- UTF16 wastes half your RAM when dealing with mostly ASCII
characters. So UTF-8 makes the most sense... in the US.

The whole point of having multiple encodings in the first place is that other
encodings make much more sense when you're not in the US.

* It's ill-conceived. Knowing the encoding is sufficient to pick characters
  out of a string, but other operations (such as collation) depend on the
  locale. And in any case, the encoding and/or locale information is often
  carried out-of-band (think: HTTP; MIME E-mail; ASN1 tags), or within the
  string content (think: <?xml charset?>)

How does any of this help me once I've read the string?

* It's too stateful. If someone passes you a string, and you need to make
  it compatible with some other string (e.g. to concatenate it), then you
  need to force its encoding.

You only need to do this if the string was in the wrong encoding in the first
place. If I pass you a UTF-16 string, it's not polite at all (whether you dup
it first or not) to just stick your fingers in your ears, go "la la la", and
pretend it's UTF-8 so you can concatenate it. The resultant string will be
neither, and I can't imagine what it'd be useful for.

You do seem to have some legitimate complaints, but they are somewhat
undermined by the fact that you seem to want to pretend Unicode doesn't exist.
As you noted:

"However I am quite possibly alone in my opinion. Whenever this pops up on
ruby-talk, and I speak out against it, there are two or three others who
speak out equally vociferously in favour. They tell me I am doing the
community a disservice by warning people away from 1.9."

Warning people away from 1.9 entirely, and from character encoding in
particular, because of the problems you've pointed out, does seem incredibly
counterproductive. It'd make a lot more sense to try to fix the real problems
you've identified -- if it really is "buggy as hell", I imagine the ruby-core
people could use your help.

···

On Wednesday, November 24, 2010 05:14:15 am Brian Candler wrote:

Beats me.

···

On Nov 23, 2010, at 15:44 , Philip Rhoades wrote:

Who do I talk to get 1.9 RPMs produced for Fedora?

Just a guess: The Ruby (or Programming/Script language) maintainers of
the Fedora project.

···

On Wed, Nov 24, 2010 at 12:44 AM, Philip Rhoades <phil@pricom.com.au> wrote:

Who do I talk to get 1.9 RPMs produced for Fedora?

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

The wiki is a start; Fedora has Ruby packages in the repositories.

http://fedoraproject.org/wiki/Features/Ruby_1.9.1

Or build it yourself; it's simple on Linux, and there is a guide for Fedora.

···

On Nov 24, 10:44 am, Philip Rhoades <p...@pricom.com.au> wrote:

On 2010-11-24 09:23, Ryan Davis wrote:

> On Nov 23, 2010, at 13:25 , Chuck Remes wrote:

>> I use 1.9.2p0 daily and find it to be extremely stable and fast. I
>> would say the biggest reason to use it is to get a performance
>> boost. Most of your code from 1.8 will "just work." My code sees a
>> 2-5x speedup on 1.9.2 versus 1.8.7.

> That's really variable and depends on what you're doing.

> All of my text processing code needed reworking, and text processing
> is (was?) noticeably slower in ruby 1.9 than it is in 1.8.

Who do I talk to get 1.9 RPMs produced for Fedora?

Thanks,

Phil.
--
Philip Rhoades

GPO Box 3411
Sydney NSW 2001
Australia
E-mail: p...@pricom.com.au

iota ~ % echo ʘ | LC_ALL=ja_JP.UTF8 ruby -pe '$_[1,0] = "ʘ"'
ʘʘ
iota ~ % echo ʘ | LC_ALL=C ruby -pe '$_[1,0] = "ʘ"'
-e:1: invalid multibyte char (US-ASCII)
-e:1: invalid multibyte char (US-ASCII)

···

On Wed, Nov 24, 2010 at 9:25 PM, Oliver Schad <spam.entfernen.und.bring.gefaelligst.ein.bier.mit@oschad.de> wrote:

Brian Candler wrote:

And just to give some balance: the biggest reason not to use 1.9 is
because of the incredible complexity which has been added to the
String class, and the ability it gives you to make programs which
crash under unexpected circumstances.

Sounds great. :wink: Can somebody else confirm this?

--
Michael Fellinger
CTO, The Rubyists, LLC

Michael Fellinger wrote in post #963539:

from the environment, unless you explicitly code against that. This
means the same program with the same data may work on your machine, but
crash on someone else's.

And that's why I use and love 1.9.
The obvious thing isn't so obvious if you actually care about
encodings, and if you are mindful about what comes from where, it's
actually helpful to find otherwise hidden issues.

Y'know, I wouldn't mind so much if it *always* raised an exception.

For example, say I have s1 tagged UTF-8 and s2 tagged ISO-8859-1. If
"s1+s2" always raised an exception, it would be easy to find, and easy
to fix.

However the 'compatibility' rules mean that this is data-sensitive. In
many cases s1+s2 will work, if either s1 contains non-ASCII characters
but s2 doesn't, or vice-versa. It's really hard to get test coverage of
all the possible cases - rcov won't help you - or you just cross your
fingers and hope.

You also need test coverage for cases where the input data is invalid
for the given encoding. In fact s1+s2 won't raise an exception in that
case, nor will s1[i], but s1 =~ /./ will.
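
For example (a sketch under 1.9.2; the byte values are invented):

    s1 = "\xFFfoo".force_encoding("UTF-8")   # contains an invalid UTF-8 byte
    s2 = "bar"

    s1 + s2     # no exception -- only encoding compatibility is checked
    s1[0]       # no exception -- returns the (invalid) first "character"
    s1 =~ /./   # raises ArgumentError: invalid byte sequence in UTF-8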

I hear nobody complain that 1 / 0 raises but 1.0 / 0.0 gives Infinity,

Well, IEEE floating point is a well-established standard that has been
around for donkeys years, so I think it's reasonable to follow it.

And yes, if I see code like "c = a / b", I do think to myself "what if b
is zero?" It's easy to decide if it's expected, and whether I need to do
something other than the default behaviour. Then I move onto the next
line.

For "s3 = s1 + s2" in 1.9 I need to think to myself: "what if s1 has a
different encoding to s2, and s1 is not empty or s2 is not empty and
s1's encoding is not ASCII-compatible or s2's encoding is not
ASCII-compatible or s1 contains non-ASCII characters or s2 contains
non-ASCII characters? And what does that give as the encoding for s3 in
all those possible cases?" And then I have to carry the possible
encodings for s3 forward to the next point where it is used.

···

--
Posted via http://www.ruby-forum.com/.

David Masover wrote:

Java at least did this sanely -- UTF16 is at least a fixed width. If you're
going to force a single encoding, why wouldn't you use fixed-width strings?

Actually, it's not. It's simply mathematically impossible, given that
there are more than 65536 Unicode codepoints. AFAIK, you need (at the
moment) at least 21 bits to represent all Unicode codepoints. UTF-16
is *not* fixed-width, it encodes every Unicode codepoint as either one
or two UTF-16 "characters", just like UTF-8 encodes every Unicode
codepoint as 1, 2, 3 or 4 octets.

The only two Unicode encodings that are fixed-width are the obsolete
UCS-2 (which can only encode the lower 65536 codepoints) and UTF-32.

You can produce corrupt strings and slice into a half-character in
Java just as you can in Ruby 1.8.
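
One way to see this from Ruby itself (a sketch; any codepoint above
U+FFFF needs a surrogate pair in UTF-16):

    # encoding: utf-8
    bmp    = "A"           # U+0041, inside the Basic Multilingual Plane
    astral = "\u{1F600}"    # U+1F600, outside the BMP

    bmp.encode("UTF-16BE").bytesize      # => 2  (one 16-bit unit)
    astral.encode("UTF-16BE").bytesize   # => 4  (surrogate pair: two units)
    astral.encode("UTF-32BE").bytesize   # => 4  (UTF-32 really is fixed-width)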

Oh, that's right -- UTF16 wastes half your RAM when dealing with mostly ASCII
characters. So UTF-8 makes the most sense... in the US.

Of course, that problem is even more pronounced with UTF-32.

German text blows up about 5%-10% when encoded in UTF-8 instead of
ISO8859-15. Arabic, Persian, Indian, Asian text (which is, after all,
much more than European) is much worse. (E.g. Chinese blows up *at
least* 50% when encoded in UTF-8 instead of Big5 or GB2312.) Given that
the current tendency is that devices actually get *smaller*, bandwidth
gets *lower* and latency gets *higher*, that's simply not a price
everybody is willing to pay.

The whole point of having multiple encodings in the first place is that other
encodings make much more sense when you're not in the US.

There's also a lot of legacy data, even within the US. On IBM systems,
the standard encoding, even for greenfield systems that are being
written right now, is still pretty much EBCDIC all the way.

There simply does not exist a single encoding which would be
appropriate for every case, not even the majority of cases. In fact,
I'm not even sure that there is even a single encoding which is
appropriate for a significant minority of cases.

We tried that One Encoding To Rule Them All in Java, and it was a
failure. We tried it again with a different encoding in Java 5, and it
was a failure. We tried it in .NET, and it was a failure. The Python
community is currently in the process of realizing it was a failure. 5
years of work on PHP 6 were completely destroyed because of this. (At
least they realized it *before* releasing it into the wild.)

And now there's a push for a One Encoding To Rule Them All in Ruby 2.
That's *literally* insane! (One definition of insanity is repeating
behavior and expecting a different outcome.)

jwm

My uncle Carl is really good at IT.

···

On Wed, Nov 24, 2010 at 09:13:04AM +0900, Ryan Davis wrote:

On Nov 23, 2010, at 15:44 , Philip Rhoades wrote:

> Who do I talk to get 1.9 RPMs produced for Fedora?

Beats me.

--
Aaron Patterson
http://tenderlovemaking.com/

Michael Fellinger wrote:

···

On Wed, Nov 24, 2010 at 9:25 PM, Oliver Schad <spam.entfernen.und.bring.gefaelligst.ein.bier.mit@oschad.de> wrote:

Brian Candler wrote:

And just to give some balance: the biggest reason not to use 1.9 is
because of the incredible complexity which has been added to the
String class, and the ability it gives you to make programs which
crash under unexpected circumstances.

Sounds great. :wink: Can somebody else confirm this?

iota ~ % echo ʘ | LC_ALL=ja_JP.UTF8 ruby -pe '$_[1,0] = "ʘ"'
ʘʘ
iota ~ % echo ʘ | LC_ALL=C ruby -pe '$_[1,0] = "ʘ"'
-e:1: invalid multibyte char (US-ASCII)
-e:1: invalid multibyte char (US-ASCII)

So working with strings in ruby v1.9 is not supported, right?

Regards
Oli

--
It's fine to be intelligent; you just have to know how to help yourself.

For example, say I have s1 tagged UTF-8 and s2 tagged ISO-8859-1. If
"s1+s2" always raised an exception, it would be easy to find, and easy
to fix.

However the 'compatibility' rules mean that this is data-sensitive. In
many cases s1+s2 will work, if either s1 contains non-ASCII characters
but s2 doesn't, or vice-versa. It's really hard to get test coverage of
all the possible cases - rcov won't help you - or you just cross your
fingers and hope.

Convert your strings to UTF-8 at all times, and you are done. You have
to check for data integrity anyway, so you can do that in one go.
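
The boundary conversion itself is short; a minimal sketch (Ruby 1.9+;
the file name and the guessed source encoding are assumptions for the
example):

    raw  = File.open("legacy.txt", "rb") { |f| f.read }  # raw bytes, ASCII-8BIT
    text = raw.force_encoding("ISO-8859-15")             # declare what we believe it is
    utf8 = text.encode("UTF-8")                          # from here on, everything is UTF-8
    # For input that may contain junk, replace bad bytes instead of raising:
    safe = text.encode("UTF-8", :invalid => :replace, :undef => :replace)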

I hear nobody complain that 1 / 0 raises but 1.0 / 0.0 gives Infinity,

Well, IEEE floating point is a well-established standard that has been
around for donkeys years, so I think it's reasonable to follow it.

Every natural number is an element of the set of rational numbers. For
all intents and purposes, 0 == 0.0 in mathematics (unless you limit
the set of numbers you are working on to natural numbers only, and
let's just ignore irrational numbers for now). And since the 0 is
around for a bit longer than the IEEE, and the rules of math are
taught in elementary school (including "you must not and cannot divide
by zero"), Ruby exhibits inconsistent behavior for pretty much anyone
who has a little education in maths. The IEEE standards deal with
representing floating point numbers in an inherently integer-based
numerical system, but they don't supersede the rules of maths.

Ruby's behavior of returning *infinity* is the proverbial icing on the
cake, since dividing something finite by something infinitely small
results in something infinitely large (so x / 0.000000...[ad
infinitum]...1 tends to infinity; a trick used with limits and
integrals, too).

Thus, you have to exercise due diligence in this area if you want to
keep your results in the sphere of what's possible and sane.

And yes, if I see code like "c = a / b", I do think to myself "what if b
is zero?" It's easy to decide if it's expected, and whether I need to do
something other than the default behaviour. Then I move onto the next
line.

It's easy? Take a look at integrals, and infinitesimal[0] numbers.
Infinitesimals are at the same time zero and *not* zero.

For "s3 = s1 + s2" in 1.9 I need to think to myself: "what if s1 has a
different encoding to s2, and s1 is not empty or s2 is not empty and
s1's encoding is not ASCII-compatible or s2's encoding is not
ASCII-compatible or s1 contains non-ASCII characters or s2 contains
non-ASCII characters? And what does that give as the encoding for s3 in
all those possible cases?" And then I have to carry the possible
encodings for s3 forward to the next point where it is used.

Then, as I suggested above, enforce a standard encoding in your code.
Convert everything into UTF-8, and you are pretty much done.

[0] https://en.wikipedia.org/wiki/Infinitesimal

···

On Wed, Nov 24, 2010 at 4:15 PM, Brian Candler <b.candler@pobox.com> wrote:
--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

And even UTF-32 would have the complications of "combining characters."
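
For example (a sketch; the same "é" written two ways):

    # encoding: utf-8
    precomposed = "\u00E9"    # "é" as one codepoint
    combining   = "e\u0301"   # "e" plus U+0301 COMBINING ACUTE ACCENT

    precomposed.length         # => 1
    combining.length           # => 2 -- two codepoints, even in UTF-32
    precomposed == combining   # => false, unless you normalize first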

James Edward Gray II

···

On Nov 24, 2010, at 8:40 PM, Jörg W Mittag wrote:

The only two Unicode encodings that are fixed-width are the obsolete
UCS-2 (which can only encode the lower 65536 codepoints) and UTF-32.