Ruby 1.8 vs 1.9

David Masover wrote:
> Java at least did this sanely -- UTF16 is at least a fixed width. If
> you're going to force a single encoding, why wouldn't you use
> fixed-width strings?

Actually, it's not.

Whoops, my mistake. I guess now I'm confused as to why they went with UTF-16
-- I always assumed it simply truncated things which can't be represented in
16 bits.

You can produce corrupt strings and slice into a half-character in
Java just as you can in Ruby 1.8.

Wait, how?

I mean, yes, you can deliberately build strings out of corrupt data, but if
you actually work with complete strings and string concatenation, and you
aren't doing crazy JNI stuff, and you aren't digging into the actual bits of
the string, I don't see how you can create a truncated string.

> The whole point of having multiple encodings in the first place is that
> other encodings make much more sense when you're not in the US.

There's also a lot of legacy data, even within the US. On IBM systems,
the standard encoding, even for greenfield systems that are being
written right now, is still pretty much EBCDIC all the way.

I'm really curious why anyone would go with an IBM mainframe for a greenfield
system, let alone pick EBCDIC when ASCII is fully supported.

And now there's a push for a One Encoding To Rule Them All in Ruby 2.
That's *literally* insane! (One definition of insanity is repeating
behavior and expecting a different outcome.)

Wait, what?

I've been out of the loop for a while, so it's likely that I missed this, but
where are these plans?

···

On Wednesday, November 24, 2010 08:40:22 pm Jörg W Mittag wrote:

Thank you for being the voice of reason.

I've fought against Brian enough in the past over this issue that I try to stay out of it these days. However, his arguments always strike me as wanting to unlearn what we have learned about encodings.

We can't go back. Different encodings exist. At least Ruby 1.9 allows us to work with them.

James Edward Gray II

···

On Nov 24, 2010, at 9:47 AM, Phillip Gawlowski wrote:

On Wed, Nov 24, 2010 at 4:15 PM, Brian Candler <b.candler@pobox.com> wrote:

For example, say I have s1 tagged UTF-8 and s2 tagged ISO-8859-1. If
"s1+s2" always raised an exception, it would be easy to find, and easy
to fix.

However, the 'compatibility' rules mean that this is data-sensitive. In
many cases s1+s2 will work: it succeeds as long as at least one of the
two strings contains only ASCII characters. It's really hard to get test
coverage of all the possible cases - rcov won't help you - so you just
cross your fingers and hope.
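
For example (a minimal sketch of the 1.9 rules; the strings are made up):

s1 = "caf\u00e9"                              # UTF-8, contains a non-ASCII character
s2 = "world".encode("ISO-8859-1")             # ISO-8859-1, but ASCII-only
p s1 + s2                                     # works: s2 is ASCII-only

s2 = "w\xF6rld".force_encoding("ISO-8859-1")  # now contains a non-ASCII byte
p s1 + s2                                     # raises Encoding::CompatibilityError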

Convert your strings to UTF-8 at all times, and you are done. You have
to check for data integrity anyway, so you can do that in one go.

Phillip Gawlowski wrote in post #963602:

Convert your strings to UTF-8 at all times, and you are done.

But that basically is my point. In order to make your program
comprehensible, you have to add extra incantations so that strings are
tagged as UTF-8 everywhere (e.g. when opening files).

However this in turn adds *nothing* to your program or its logic, apart
from preventing Ruby from raising exceptions.

Well, IEEE floating point is a well-established standard that has been
around for donkey's years, so I think it's reasonable to follow it.

Every natural number is an element of the set of rational numbers. For
all intents and purposes, 0 == 0.0 in mathematics (unless you limit the
set of numbers you are working with to natural numbers only, and let's
just ignore irrational numbers for now). And since the 0 has been around
for a bit longer than the IEEE, and the rules of math are taught in
elementary school (including "you must not and cannot divide by zero"),
Ruby exhibits inconsistent behavior for pretty much anyone who has a
little education in maths.

Maths and computation are not the same thing. Is there anything in the
above which applies only to Ruby and not to floating point computation
in any other mainstream programming language?

Yes, there are gotchas in floating point computation, as explained at
http://docs.sun.com/source/806-3568/ncg_goldberg.html
These are (or should be) well understood by programmers who feel they
need to use floating point numbers.

If you don't like IEEE floating point, Ruby also offers BigDecimal and
Rational.

If Ruby were to implement floating point following some different set of
rules other than IEEE, that would be (IMO) horrendous. The point of a
standard is that you only have to learn the gotchas once.
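
A quick sketch of the trade-off (values as printed by 1.9; illustrative only):

0.1 + 0.2                              # => 0.30000000000000004 (IEEE 754 rounding)

require 'bigdecimal'
BigDecimal("0.1") + BigDecimal("0.2")  # exact; to_s gives "0.3E0"

Rational(1, 10) + Rational(2, 10)      # => (3/10), also exact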

···

--
Posted via http://www.ruby-forum.com/.

James Edward Gray II wrote:

···

On Nov 24, 2010, at 8:40 PM, Jörg W Mittag wrote:

The only two Unicode encodings that are fixed-width are the obsolete
UCS-2 (which can only encode the lower 65536 codepoints) and UTF-32.

And even UTF-32 would have the complications of "combining characters."

... and zero-width characters and different representations of the same
character and ...
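
For example, in Ruby 1.9 (illustrative):

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
combining   = "e\u0301"  # "e" plus COMBINING ACUTE ACCENT, two code points

precomposed == combining # => false, although both display as "é"
precomposed.length       # => 1
combining.length         # => 2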

But that is a whole different can of worms.

jwm

Because that's how the other applications written on the mainframe the
company bought 20, 30, 40 years ago expect their data, and the same
code *still runs*.

Legacy systems like that have so much money invested in them, with
code poorly understood (not necessarily because it's *bad* code, but
because the original author retired 20 years ago), and are so
mission-critical that a replacement in a more current design is out
of the question.

Want perpetual job security? Learn COBOL.

···

On Fri, Nov 26, 2010 at 1:42 AM, David Masover <ninja@slaphack.com> wrote:

I'm really curious why anyone would go with an IBM mainframe for a greenfield
system, let alone pick EBCDIC when ASCII is fully supported.

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

David Masover wrote:

Java at least did this sanely -- UTF16 is at least a fixed width. If
you're going to force a single encoding, why wouldn't you use
fixed-width strings?

Actually, it's not.

Whoops, my mistake. I guess now I'm confused as to why they went with UTF-16
-- I always assumed it simply truncated things which can't be represented in
16 bits.

The JLS is a bit difficult to read IMHO. Characters are 16 bits wide, and a single character covers the range of code points 0000 to FFFF.

http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1

Characters with code points greater than FFFF are called "supplementary characters" and while UTF-16 provides encodings for them as well, these need two code units (four bytes). They write "The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.":

http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#95413

IMHO this is not very precise: calculations based on char cannot directly represent the supplementary characters; they use just a subset of UTF-16. If you want to work with supplementary characters, things get really awful. Then you need methods like this one

http://download.oracle.com/javase/6/docs/api/java/lang/Character.html#toChars(int)

And if you stuff such a sequence into a String, all of a sudden String.length() no longer returns the length in characters, which is in line with what the JavaDoc states:

http://download.oracle.com/javase/6/docs/api/java/lang/String.html#length()

Unfortunately the majority of programs I have seen never take this into account and use String.length() as "length in characters". This awful mixture becomes apparent in the JavaDoc of class Character, which explicitly states that there are two ways to deal with characters:

1. type char (no supplementary supported)
2. type int (with supplementary)

http://download.oracle.com/javase/6/docs/api/java/lang/Character.html#unicode
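
For example (class name made up; standard API only):

public class Supplementary {
  public static void main(String[] args) {
    // U+1D11E (MUSICAL SYMBOL G CLEF) lies above FFFF
    String s = new String(Character.toChars(0x1D11E));
    System.out.println(s.length());                      // 2 -- UTF-16 code units
    System.out.println(s.codePointCount(0, s.length())); // 1 -- actual characters
  }
}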

You can produce corrupt strings and slice into a half-character in
Java just as you can in Ruby 1.8.

Wait, how?

You can convert a code point above FFFF via Character.toChars() (which returns a char array of length 2) and truncate it to length 1. But: the resulting sequence isn't actually invalid, since all values in the range 0000 to FFFF are valid char values. This isn't really robust. Even though the docs say that the longest matching sequence is to be considered during decoding, there is no reliable way to determine whether d80d dd53 represents a single character (code point 13553) or two separate characters (code points d80d and dd53).

If you like you can play around a bit with this:
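
(The original snippet did not survive; a minimal sketch of the same idea, class name made up:)

public class HalfCharacter {
  public static void main(String[] args) {
    // code point 13553 encodes as the surrogate pair d80d dd53
    String whole = new String(Character.toChars(0x13553));
    String cut = whole.substring(0, 1); // slices off the low surrogate
    System.out.println(whole.length()); // 2
    System.out.println(cut.length());   // 1 -- a lone high surrogate, yet still a "valid" String
  }
}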

I mean, yes, you can deliberately build strings out of corrupt data, but if
you actually work with complete strings and string concatenation, and you
aren't doing crazy JNI stuff, and you aren't digging into the actual bits of
the string, I don't see how you can create a truncated string.

Well, you can (see above) but unfortunately it is still valid. It just happens to represent a different sequence.

Kind regards

  robert

···

On 26.11.2010 01:42, David Masover wrote:

On Wednesday, November 24, 2010 08:40:22 pm Jörg W Mittag wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[snipped lots of arguments about string encodings that may or may not be relevant to the OP]

So... I am wondering if the original poster (Peter Pincus) has tried his code under 1.9 yet.

Peter?

cr

And if you don't like Ruby's strings, there's nothing stopping you from
rolling your own. There's certainly nothing stopping you from using binary
mode (whether it claims to be ASCII or not) for all strings.

···

On Wednesday, November 24, 2010 10:09:13 am Brian Candler wrote:

If you don't like IEEE floating point, Ruby also offers BigDecimal and
Rational.

Phillip Gawlowski wrote in post #963602:

Convert your strings to UTF-8 at all times, and you are done.

But that basically is my point. In order to make your program
comprehensible, you have to add extra incantations so that strings are
tagged as UTF-8 everywhere (e.g. when opening files).

However this in turn adds *nothing* to your program or its logic, apart
from preventing Ruby from raising exceptions.

s/apart from preventing Ruby from raising exceptions/but ensures
correctness of data across different systems/;

Maths and computation are not the same thing. Is there anything in the
above which applies only to Ruby and not to floating point computation
in another other mainstream programming language?

You conveniently left out that Ruby thinks dividing by 0.0 results in infinity.
That's not just wrong, but absurd in the extreme. So, we have to
safeguard against this, just as we have to take care of, say,
proper string encoding. If *anyone* is to blame, it's ANSI and the
IT industry for having a) an extremely US-centric view of the world,
and b) being too damn shortsighted to create an international, capable
standard 30 years ago.

Further, you can't do any computations without proper maths. In Ruby,
you can't do computations since it cannot divide by zero properly, or
at least *consistently*.
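
A quick illustration of the inconsistency:

1 / 0      # raises ZeroDivisionError
1.0 / 0.0  # => Infinity
0.0 / 0.0  # => NaN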

Yes, there are gotchas in floating point computation, as explained at
http://docs.sun.com/source/806-3568/ncg_goldberg.html
These are (or should be) well understood by programmers who feel they
need to use floating point numbers.

If you don't like IEEE floating point, Ruby also offers BigDecimal and
Rational.

Works really well with irrational numbers, which are neither large
decimals nor expressible as a fraction x/y.

In a nutshell, Ruby cannot deal with floating points at all, and the
IEEE standard is a means to *represent* floating point numbers in
bits. It does *not* supersede natural laws, much less rules that have
been in effect for hundreds of years.

And once the accuracy of an IEEE float isn't good enough anymore
(which happens as soon as you have to simulate a particle system), you
move away from scalar CPUs to vector CPUs / APUs (like the MMX and SSE
instruction sets on desktops, or a GPGPU via CUDA).

If Ruby were to implement floating point following some different set of
rules other than IEEE, that would be (IMO) horrendous. The point of a
standard is that you only have to learn the gotchas once.

Um, no. A standard is a means to avoid misunderstandings, and have a
well-defined system dealing with what the standard defines. You know,
like exchange text data in a standard that can cover as many of the
world's glyphs as possible.

And there is always room for improvement, otherwise I wonder why
engineers need Maple and mathematicians Mathematica.

···

On Wed, Nov 24, 2010 at 5:09 PM, Brian Candler <b.candler@pobox.com> wrote:

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

Phillip Gawlowski wrote in post #963602:

Convert your strings to UTF-8 at all times, and you are done.

This may be true for the western world, but I believe I remember one of
our Japanese friends stating that Unicode does not cover all Asian
character sets completely; it could have been a remark about Java's
implementation of Unicode though, I am not 100% sure.

But that basically is my point. In order to make your program
comprehensible, you have to add extra incantations so that strings are
tagged as UTF-8 everywhere (e.g. when opening files).

However this in turn adds *nothing* to your program or its logic, apart
from preventing Ruby from raising exceptions.

Checking input and ensuring that data reaches the program in proper
ways is generally good practice for robust software. IMHO dealing
explicitly with encodings falls into the same area as checking whether
an integer entered by a user is strictly positive or a string is not
empty.

And I don't think you have to do it for one-off scripts or when
working in your local environment only. So there is no extra effort
involved.

Brian, it seems you want to avoid the complex matter of i18n - by
ignoring it. But if you work in a situation where multiple encodings
are mixed you will be forced to deal with it - sooner or later. With
1.9 you get proper feedback while 1.8 may simply stop working at some
point - and you may not even notice it quickly enough to avoid damage.

Kind regards

robert

···

On Wed, Nov 24, 2010 at 5:09 PM, Brian Candler <b.candler@pobox.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

> I'm really curious why anyone would go with an IBM mainframe for a
> greenfield system, let alone pick EBCDIC when ASCII is fully supported.

Because that's how the other applications written on the mainframe the
company bought 20, 30, 40 years ago expect their data, and the same
code *still runs*.

In other words, not _quite_ greenfield, or at least, a somewhat different
sense of greenfield.

But I guess that explains why you're on a mainframe at all. Someone put their
data there 20, 30, 40 years ago, and you need to get at that data, right?

Legacy systems like that have so much money invested in them, with
code poorly understood (not necessarily because it's *bad* code, but
because the original author retired 20 years ago),

Which implies bad code, bad documentation, or both. Yes, having the original
author available tends to make things easier, but I'm not sure I'd know what
to do with the code I wrote 1 year ago, let alone 20, unless I document the
hell out of it.

Want perpetual job security? Learn COBOL.

I considered that...

It'd have to be job security plus a large enough paycheck I could either work
very part-time, or retire in under a decade. Neither of these seems likely, so
I'd rather work with something that gives me job satisfaction, which is why
I'm doing Ruby.

···

On Friday, November 26, 2010 05:51:38 am Phillip Gawlowski wrote:

On Fri, Nov 26, 2010 at 1:42 AM, David Masover <ninja@slaphack.com> wrote:

My experience with 1.9 so far is that some of my ruby scripts have
become much faster. I have other scripts which have needed to deal
with a much wider range of characters than "standard ascii". I got
those string-related scripts working fine in 1.8. They all seem to
break in 1.9.

In my own opinion, the problem isn't 1.9; it's that I wrote these
string-handling scripts in ruby before ruby really supported all the
characters I had to deal with. I look forward to getting my scripts
switched over to 1.9, but there's no question that *getting* to 1.9 is
going to require a bunch of work from me. That's just the way it is.
Not the fault of ruby 1.9, but it's still some work to fix the
scripts.

···

On Wed, Nov 24, 2010 at 11:07 AM, James Edward Gray II <james@graysoftinc.com> wrote:

On Nov 24, 2010, at 9:47 AM, Phillip Gawlowski wrote:

Convert your strings to UTF-8 at all times, and you are done. You have
to check for data integrity anyway, so you can do that in one go.

Thank you for being the voice of reason.

I've fought against Brian enough in the past over this issue that I try to stay out of it these days. However, his arguments always strike me as wanting to unlearn what we have learned about encodings.

We can't go back. Different encodings exist. At least Ruby 1.9 allows us to work with them.

--
Garance Alistair Drosehn = drosihn@gmail.com
Senior Systems Programmer
Rensselaer Polytechnic Institute; Troy, NY; USA

After reading RFC 2781 (UTF-16, an encoding of ISO 10646) I am not
sure any more whether the last statement still holds. It seems the
presented algorithm can only work reliably if certain code points are
unused. And indeed, checking the Character Name Index shows that D800
and DC00 are reserved. Interestingly enough, Java's
Character.isDefined() returns true for D800 and DC00:
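
(The original snippet did not survive; something along these lines, class name made up:)

public class Surrogates {
  public static void main(String[] args) {
    System.out.println(Character.isDefined(0xD800)); // true, despite D800 being reserved
    System.out.println(Character.isDefined(0xDC00)); // true
  }
}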

Cheers

robert

···

On Sun, Nov 28, 2010 at 6:20 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

On 26.11.2010 01:42, David Masover wrote:

On Wednesday, November 24, 2010 08:40:22 pm Jörg W Mittag wrote:

I mean, yes, you can deliberately build strings out of corrupt data, but
if
you actually work with complete strings and string concatenation, and you
aren't doing crazy JNI stuff, and you aren't digging into the actual bits
of
the string, I don't see how you can create a truncated string.

Well, you can (see above) but unfortunately it is still valid. It just
happens to represent a different sequence.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Its wrongness is an interpretation (I would also prefer that it just break,
but I can certainly see why some would say it should be infinity). And it
doesn't apply only to Ruby:

Java:
public class Infinity {
  public static void main(String[] args) {
    System.out.println(1.0/0.0); // prints "Infinity"
  }
}

JavaScript:
document.write(1.0/0.0) // prints "Infinity"

C:
#include <stdio.h>
int main(void) {
  printf("%f\n", 1.0/0.0); // prints "inf"
  return 0;
}

···

On Wed, Nov 24, 2010 at 12:20 PM, Phillip Gawlowski <cmdjackryan@googlemail.com> wrote:

> Maths and computation are not the same thing. Is there anything in the
> above which applies only to Ruby and not to floating point computation
> in another other mainstream programming language?

You conveniently left out that Ruby thinks dividing by 0.0 results in
infinity.
That's not just wrong, but absurd in the extreme.

Since UTF-8 is a subset of UTF-16, which in turn is a subset of
UTF-32, and Unicode is future-proofed (at least, ISO learned from the
mess created in the 1950s to 1960s) so that new glyphs won't ever
collide with existing glyphs, my point still stands. ;-)

···

On Thu, Nov 25, 2010 at 10:45 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

This may be true for the western world, but I believe I remember one of
our Japanese friends stating that Unicode does not cover all Asian
character sets completely; it could have been a remark about Java's
implementation of Unicode though, I am not 100% sure.

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

> I'm really curious why anyone would go with an IBM mainframe for a
> greenfield system, let alone pick EBCDIC when ASCII is fully supported.

Because that's how the other applications written on the mainframe the
company bought 20, 30, 40 years ago expect their data, and the same
code *still runs*.

In other words, not _quite_ greenfield, or at least, a somewhat different
sense of greenfield.

You don't expect anyone to throw their older mainframes away, do you? ;-)

But I guess that explains why you're on a mainframe at all. Someone put their
data there 20, 30, 40 years ago, and you need to get at that data, right?

Oh, don't discard mainframes. For a corporation the size of SAP (or
needing SAP software), a mainframe is still the ideal hardware to
manage the enormous databases collected over the years.

And mainframes with vector CPUs are ideal for all sorts of simulations
engineers have to do (like aerodynamics), or weather research.

Legacy systems like that have so much money invested in them, with
code poorly understood (not necessarily because it's *bad* code, but
because the original author retired 20 years ago),

Which implies bad code, bad documentation, or both. Yes, having the original
author available tends to make things easier, but I'm not sure I'd know what
to do with the code I wrote 1 year ago, let alone 20, unless I document the
hell out of it.

It gets worse 20 years down the line: the techniques used and the state
of the art back then are forgotten now (nobody uses GOTO any more, or
shouldn't, anyway), and error handling is done with exceptions these
days instead of error codes. And TDD didn't even *exist* as a
technique.

Together with a very, very conservative attitude, changes are
difficult to deal with, if they can be implemented at all.

Assuming the source code still exists, anyway.

···

On Sat, Nov 27, 2010 at 9:04 AM, David Masover <ninja@slaphack.com> wrote:

On Friday, November 26, 2010 05:51:38 am Phillip Gawlowski wrote:

On Fri, Nov 26, 2010 at 1:42 AM, David Masover <ninja@slaphack.com> wrote:

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

Robert Klemme wrote in post #963807:

But that basically is my point. In order to make your program
comprehensible, you have to add extra incantations so that strings are
tagged as UTF-8 everywhere (e.g. when opening files).

However this in turn adds *nothing* to your program or its logic, apart
from preventing Ruby from raising exceptions.

Checking input and ensuring that data reaches the program in proper
ways is generally good practice for robust software.

But that's not what Ruby does!

If you do
  s1 = File.open("foo","r:UTF-8").gets
it does *not* check that the data is UTF-8. It just adds a tag saying
that it is.
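
You can see the tag and the bytes disagree without any exception being raised (sketch):

s = "\xff\xfe".force_encoding("UTF-8") # same effect as reading those bytes via r:UTF-8
s.encoding                             # => #<Encoding:UTF-8>
s.valid_encoding?                      # => false -- and nothing was raised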

Then later, when you get s2 from somewhere else, and have a line like s3
= s1 + s2, it *might* raise an exception if the encodings are different.
Or it might not, depending on the actual content of the strings at that
time.

Say s2 is a string read from a template. It may work just fine, as long
as s2 contains only ASCII characters. But later, when you decide to
translate the program and add some non-ASCII characters into the
template, it may blow up.

If it blew up on the invalid data, I'd accept that. If it blew up
whenever two strings of different encodings meet, I'd accept that.
But to have your program work through sheer chance, only to blow up some
time later when it encounters a different input stream - no, that sucks.

In that case, I would much rather the program didn't crash, but at least
carried on working (even in the garbage-in-garbage-out sense).

Brian, it seems you want to avoid the complex matter of i18n - by
ignoring it. But if you work in a situation where multiple encodings
are mixed you will be forced to deal with it - sooner or later.

But you're never going to want to combine two strings of different
encodings without transcoding them to a common encoding, as that
wouldn't make sense.

So either:

1. Your program deals with the same encoding from input through to
output, in which case there's nothing to do

2. You transcode at the edges into and out of your desired common
encoding

Neither approach requires each individual string to carry its encoding
along with it.
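
Approach 2 is a one-liner per boundary anyway (sketch; file names made up):

text = File.open("legacy.txt", "r:ISO-8859-1") { |f| f.read }
utf8 = text.encode("UTF-8")                       # transcode on the way in
File.open("out.txt", "w:UTF-8") { |f| f << utf8 } # tag/transcode on the way out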

···

--
Posted via http://www.ruby-forum.com/.

It cannot be infinity. It does, quite literally, not compute. There's
no room for interpretation; it's a fact of (mathematical) life that
something divided by nothing has an undefined result. It doesn't
matter if it's 0, 0.0, or -0.0. Undefined is undefined.

That other languages have the same issue makes matters worse, not
better (but at least it is consistent, so there's that).

···

On Wed, Nov 24, 2010 at 8:02 PM, Josh Cheek <josh.cheek@gmail.com> wrote:

Its wrongness is an interpretation (I would also prefer that it just break,
but I can certainly see why some would say it should be infinity). And it
doesn't apply only to Ruby:

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

This may be true for the western world, but I believe I remember one of
our Japanese friends stating that Unicode does not cover all Asian
character sets completely; it could have been a remark about Java's
implementation of Unicode though, I am not 100% sure.

Since UTF-8 is a subset of UTF-16, which in turn is a subset of
UTF-32,

I tried to find a more precise statement about this but did not really
succeed. I thought all UTF-x were just different encoding forms of
the same universe of code points.

and Unicode is future-proofed

Oh, so the ISO committee actually has a time machine? Wow! ;-)

(at least, ISO learned from the
mess created in the 1950s to 1960s) so that new glyphs won't ever
collide with existing glyphs, my point still stands. ;-)

Well, I support your point anyway. That was just meant as a caveat so
people are watchful (and test rather than believe). :-) But as I
think about it, it more likely was a statement about Java's
implementation (because a char has only 16 bits, which is not
sufficient for all Unicode code points).

Kind regards

robert

···

On Thu, Nov 25, 2010 at 11:12 AM, Phillip Gawlowski <cmdjackryan@googlemail.com> wrote:

On Thu, Nov 25, 2010 at 10:45 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

>> > I'm really curious why anyone would go with an IBM mainframe for a
>> > greenfield system, let alone pick EBCDIC when ASCII is fully
>> > supported.
>>
>> Because that's how the other applications written on the mainframe the
>> company bought 20, 30, 40 years ago expect their data, and the same
>> code *still runs*.
>
> In other words, not _quite_ greenfield, or at least, a somewhat different
> sense of greenfield.

You don't expect anyone to throw their older mainframes away, do you? ;-)

I suppose I expected people to be developing modern Linux apps that just
happen to compile on that hardware.

> But I guess that explains why you're on a mainframe at all. Someone put
> their data there 20, 30, 40 years ago, and you need to get at that data,
> right?

Oh, don't discard mainframes. For a corporation the size of SAP (or
needing SAP software), a mainframe is still the ideal hardware to
manage the enormous databases collected over the years.

Well, now that it's been collected, sure -- migrations are painful.

But then, corporations the size of Google tend to store their information
distributed on cheap PC hardware.

And mainframes with vector CPUs are ideal for all sorts of simulations
engineers have to do (like aerodynamics), or weather research.

When you say "ideal", do you mean they actually beat out the cluster of
commodity hardware I could buy for the same price?

>> Legacy systems like that have so much money invested in them, with
>> code poorly understood (not necessarily because it's *bad* code, but
>> because the original author has retired 20 years ago),
>
> Which implies bad code, bad documentation, or both. Yes, having the
> original author available tends to make things easier, but I'm not sure
> I'd know what to do with the code I wrote 1 year ago, let alone 20,
> unless I document the hell out of it.

It gets worse 20 years down the line: the techniques used and the state
of the art back then are forgotten now (nobody uses GOTO any more, or
shouldn't, anyway), and error handling is done with exceptions these
days instead of error codes. And TDD didn't even *exist* as a
technique.

Together with a very, very conservative attitude, changes are
difficult to deal with, if they can be implemented at all.

Assuming the source code still exists, anyway.

All three of which suggest to me that in many cases, an actual greenfield
project would be worth it. IIRC, there was a change to the California minimum
wage that would take 6 months to implement and 9 months to revert because it
was written in COBOL -- but could the same team really write a new payroll
system in 15 months? Maybe, but doubtful.

But it's still absurdly wasteful. A rewrite would pay for itself after only a
few minor changes - changes that'd be trivial in a sane system, but are
major year-long projects with the legacy system.

So, yeah, job security. I'd just hate my job.

···

On Saturday, November 27, 2010 11:41:59 am Phillip Gawlowski wrote:

On Sat, Nov 27, 2010 at 9:04 AM, David Masover <ninja@slaphack.com> wrote:
> On Friday, November 26, 2010 05:51:38 am Phillip Gawlowski wrote:
>> On Fri, Nov 26, 2010 at 1:42 AM, David Masover <ninja@slaphack.com> wrote: