Ruby 1.8 vs 1.9

> Its wrongness is an interpretation (I would also prefer that it just
> break, but I can certainly see why some would say it should be infinity).
> And it doesn't apply only to Ruby:

It cannot be infinity. It does, quite literally not compute. There's
no room for interpretation, it's a fact of (mathematical) life that
something divided by nothing has an undefined result. It doesn't
matter if it's 0, 0.0, or -0.0. Undefined is undefined.

"by observing from the table of values and the graph of y = 1/x*²* in Figure
1, that the values of 1/x*²* can be made arbitrarily large by taking x close
enough to 0. Thus the values of f(x) do not approach a number, so lim_(x->0)
1/x*²* does not exist. To indicate this kind of behaviour we use the
notation lim_(x->0) 1/x*²* = ∞"

Since floats define infinity, regardless of its not being a number, it is
not "absurd to the extreme" for floating point math to produce that value.

That other languages have the same issue makes matters worse, not
better (but at least it is consistent, so there's that).

The question was "Is there anything in the above which applies only to Ruby
and not to floating point computation in another other mainstream
programming language?" the answer isn't "other languages have the same
issue", it's "no".

···

On Wed, Nov 24, 2010 at 1:16 PM, Phillip Gawlowski <cmdjackryan@googlemail.com> wrote:

On Wed, Nov 24, 2010 at 8:02 PM, Josh Cheek <josh.cheek@gmail.com> wrote:

From my Calculus book (goo.gl/D7PoI)

Phillip Gawlowski wrote in post #963658:

It cannot be infinity. It does, quite literally not compute. There's
no room for interpretation, it's a fact of (mathematical) life that
something divided by nothing has an undefined result. It doesn't
matter if it's 0, 0.0, or -0.0. Undefined is undefined.

It is perfectly reasonable, mathematically, to assign infinity to 1/0.
To geometers and topologists, infinity is just another point. Look up
the one-point compactification of R^n. If we join infinity to the real
line, we get a circle, topologically. Joining infinity to the real plane
gives a sphere, called the Riemann sphere. These are rigorous
definitions with useful results.

I'm glad that IEEE floating point has infinity included, otherwise I
would run into needless error handling. It's not an error to reach one
pole of a sphere (the other pole being zero).

Infinity is there for good reason; its presence was well-considered by
the quite knowledgeable IEEE designers.

···

--
Posted via http://www.ruby-forum.com/.

Phillip Gawlowski wrote in post #963658:

It cannot be infinity. It does, quite literally not compute. There's
no room for interpretation, it's a fact of (mathematical) life that
something divided by nothing has an undefined result. It doesn't
matter if it's 0, 0.0, or -0.0. Undefined is undefined.

That other languages have the same issue makes matters worse, not
better (but at least it is consistent, so there's that).

--
Phillip Gawlowski

This is not even wrong.

From the definitive source:
Division by zero - Wikipedia

The IEEE floating-point standard, supported by almost all modern
floating-point units, specifies that every floating point arithmetic
operation, including division by zero, has a well-defined result. The
standard supports signed zero, as well as infinity and NaN (not a
number). There are two zeroes, +0 (positive zero) and −0 (negative zero)
and this removes any ambiguity when dividing. In IEEE 754 arithmetic, a
÷ +0 is positive infinity when a is positive, negative infinity when a
is negative, and NaN when a = ±0. The infinity signs change when
dividing by −0 instead.
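
A quick irb session matches the rules quoted above (Ruby 1.9 with IEEE 754
doubles; shown only as an illustrative sketch):

   1.0 /  0.0                # =>  Infinity
   1.0 / -0.0                # => -Infinity
  -1.0 /  0.0                # => -Infinity
  (0.0 / 0.0).nan?           # => true
  (1.0 / 0.0) == Float::INFINITY   # => true (Float::INFINITY is in recent 1.9 releases)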

···

--
Posted via http://www.ruby-forum.com/.

Since UTF-8 is a subset of UTF-16, which in turn is a subset of
UTF-32,

I tried to find a more precise statement about this but did not really
succeed. I thought all UTF-x were just different encoding forms of
the same universe of code points.

It's an implicit feature, rather than an explicit one:
Western languages get the first 8 bits for encoding. Glyphs going
beyond the Latin alphabet get the next 8 bits. If that isn't enough, an
additional 16 bits are used for encoding purposes.

Thus, UTF-8 is a subset of UTF-16 is a subset of UTF-32. Thus, also,
the future-proofing, in case even more glyphs are needed.

(at least, ISO learned from the
mess created in the 1950s to 1960s) so that new glyphs won't ever
collide with existing glyphs, my point still stands. :wink:

Well, I support your point anyway. That was just meant as a caveat so
people are watchful (and test rather than believe). :slight_smile: But as I
think about it, it more likely was a statement about Java's
implementation (because a char has only 16 bits which is not
sufficient for all Unicode code points).

Of course, test your assumptions. But first, you need an assumption to
start from. :wink:

···

On Thu, Nov 25, 2010 at 12:56 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

I believe you are referring to the complaints the Asian cultures sometimes raise against Unicode. If so, I'll try to recap the issues, as I understand them.

First, Unicode is a bit larger than their native encodings. Typically they get everything they need into two bytes where Unicode requires more for their languages.

The Unicode team also made some controversial decisions that affected the Asian languages, like Han Unification (http://en.wikipedia.org/wiki/Han_unification).

Finally, they have a lot of legacy data in their native encodings and perfect conversion is sometimes tricky due to some context sensitive issues.

I think the Asian cultures have warmed a bit to Unicode over time (my opinion only), but it's important to remember that adopting it involved more challenges for them.

James Edward Gray II

···

On Nov 25, 2010, at 5:56 AM, Robert Klemme wrote:

But as I think about it, it more likely was a statement about Java's
implementation (because a char has only 16 bits which is not
sufficient for all Unicode code points).

Robert Klemme wrote:

This may be true for the western world but I believe I remember one
of our Japanese friends stating that Unicode does not cover all Asian
character sets completely; it could have been a remark about Java's
implementation of Unicode though, I am not 100% sure.

Since UTF-8 is a subset of UTF-16, which in turn is a subset of
UTF-32,

I tried to find a more precise statement about this but did not really
succeed. I thought all UTF-x were just different encoding forms of
the same universe of code points.

Yes this is correct. Many people don't get the difference between a
charset and the corresponding encoding.

Unicode is a charset not with one encoding but with many encodings. So
we talk about the same characters and different mappings of these
characters to bits and bytes. This mapping is a simple table which you
can write down on a sheet of paper, and the side with the characters
will always be the same for UTF-8, UTF-16 and UTF-32.

The encodings UTF-8, UTF-16 and UTF-32 were built for different
purposes. The number after UTF says nothing about the maximum length;
in the first place it says something about the shortest length (and
often about the usual length if you use the encoding in the situation
it was built for).

So if people coming from the ASCII world use UTF-8, many encodings
(mappings of a character to a sequence of bits and bytes) of characters
will fit inside one byte.

UTF-32 is a bit different: in this case every encoded character has a
static size of 32 bits.
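
For example, in Ruby 1.9 you can watch the same single code point come out
as different byte sequences under each encoding (an illustrative irb
sketch, assuming a UTF-8 source/terminal):

  s = "é"                            # one code point, U+00E9
  s.encode("UTF-8").bytes.to_a       # => [195, 169]        (2 octets)
  s.encode("UTF-16BE").bytes.to_a    # => [0, 233]          (2 octets)
  s.encode("UTF-32BE").bytes.to_a    # => [0, 0, 0, 233]    (4 octets)
  s.unpack("U*")                     # => [233] -- same code point in every case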

and Unicode is future-proofed

Oh, so then the ISO committee actually has a time machine? Wow! :wink:

Read it as: there is much encoding space left, and none of us knows how
you could fill the whole space. But humans tend to be wrong.

Regards
Oli

···

On Thu, Nov 25, 2010 at 11:12 AM, Phillip Gawlowski <cmdjackryan@googlemail.com> wrote:

On Thu, Nov 25, 2010 at 10:45 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

--
It's fine to be intelligent; you just have to know how to help yourself.

I suppose I expected people to be developing modern Linux apps that just
happen to compile on that hardware.

Linux is usually not the OS the vendor supports. Keep in mind, a day
of lost productivity on these kinds of systems means losses in the
millions of dollars.

But then, corporations the size of Google tend to store their information
distributed on cheap PC hardware.

If they were incorporated where there was such a thing as "cheap PC
hardware". Google is a young corporation, even in IT. And they need
loads of custom code to make their search engine and datacenters
perform and scale, too.

And mainframes with vector CPUs are ideal for all sorts of simulations
engineers have to do (like aerodynamics), or weather research.

When you say "ideal", do you mean they actually beat out the cluster of
commodity hardware I could buy for the same price?

Sure, if you can shell out for about 14 000 Xeon CPUs and 7 000 Tesla
GPGPUs (Source: Tianhe-1 - Wikipedia ).

All three of which suggest to me that in many cases, an actual greenfield
project would be worth it. IIRC, there was a change to the California minimum
wage that would take 6 months to implement and 9 months to revert because it
was written in COBOL -- but could the same team really write a new payroll
system in 15 months? Maybe, but doubtful.

So, you'd bet a corporation the size of Exxon Mobil, Johnson &
Johnson, General Electric and similar, just because you *think* it is
easier to do changes 40 years later in an unproven, unused, upstart
language?

The clocks in the sort of shops that still run mainframes tick very
differently from what you or I are used to.

But it's still absurdly wasteful. A rewrite would pay for itself with only a
few minor changes that'd be trivial in a sane system, but major year-long
projects with the legacy system.

If the rewrite would pay for itself in the short term, then why hasn't
it been done?

···

On Sat, Nov 27, 2010 at 7:50 PM, David Masover <ninja@slaphack.com> wrote:

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

> > Its wrongness is an interpretation (I would also prefer that it just
> > break, but I can certainly see why some would say it should be
> > infinity). And it doesn't apply only to Ruby:
>
> It cannot be infinity. It does, quite literally not compute. There's
> no room for interpretation, it's a fact of (mathematical) life that
> something divided by nothing has an undefined result. It doesn't
> matter if it's 0, 0.0, or -0.0. Undefined is undefined.

From my Calculus book (goo.gl/D7PoI)

"by observing from the table of values and the graph of y = 1/x*²* in
Figure 1, that the values of 1/x*²* can be made arbitrarily large by
taking x close enough to 0. Thus the values of f(x) do not approach a
number, so lim_(x->0) 1/x*²* does not exist. To indicate this kind of
behaviour we use the notation lim_(x->0) 1/x*²* = ∞"

Specifically, the _limit_ is denoted as infinity, which is not a real number.

Since floats define infinity, regardless of its not being a number, it is
not "absurd to the extreme" for floating point math to produce that value.

Ah, but it is, for two reasons:

First, floats represent real numbers. Having exceptions to that, like NaN or
Infinity, is pointless and confusing -- it would be like making nil an
integer. And having float math produce something which isn't a float doesn't
really make sense.

Second, 1/0 is just undefined, not infinity. It's the _limit_ of 1/x as x goes
to 0 which is infinity. This only has meaning in the context of limits,
because limits are just describing behavior -- all the limit says is that as x
gets arbitrarily close to 0, 1/x gets arbitrarily large, but you still can't
_actually_ divide x by 0.
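
Incidentally, Ruby already draws that line for you; a quick irb check
(Ruby 1.9 here, and as far as I know 1.8 behaves the same):

  1 / 0                 # raises ZeroDivisionError -- integer math has no value to return
  1.0 / 0               # => Infinity              -- float math follows IEEE 754
  (1.0 / 0).infinite?   # => 1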

They didn't teach me that in Calculus, they're teaching me that in proofs.

> That other languages have the same issue makes matters worse, not
> better (but at least it is consistent, so there's that).

The question was "Is there anything in the above which applies only to Ruby
and not to floating point computation in another other mainstream
programming language?" the answer isn't "other languages have the same
issue", it's "no".

I don't know that there's anything in the above that applies only to Ruby.
However, Ruby does a number of things differently, and arguably better, than
other languages -- for example, Ruby's integer types transmute into Bignum
rather than overflowing.
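
For instance (irb on MRI 1.9, 64-bit; the Fixnum/Bignum cutoff differs on
32-bit builds, so treat this as a sketch):

  (2 ** 30).class          # => Fixnum  (Bignum on a 32-bit build)
  (2 ** 62).class          # => Bignum  -- promoted instead of wrapping around
  (2 ** 62) * (2 ** 62)    # => 21267647932558653966460912964485513216, exact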

···

On Wednesday, November 24, 2010 01:35:12 pm Josh Cheek wrote:

On Wed, Nov 24, 2010 at 1:16 PM, Phillip Gawlowski <cmdjackryan@googlemail.com> wrote:
On Wed, Nov 24, 2010 at 8:02 PM, Josh Cheek <josh.cheek@gmail.com> wrote:

This is not even wrong.

From the definitive source:
Division by zero - Wikipedia

For certain values of "definitive", anyway.

The IEEE floating-point standard, supported by almost all modern
floating-point units, specifies that every floating point arithmetic
operation, including division by zero, has a well-defined result. The
standard supports signed zero, as well as infinity and NaN (not a
number). There are two zeroes, +0 (positive zero) and -0 (negative zero)
and this removes any ambiguity when dividing. In IEEE 754 arithmetic, a
÷ +0 is positive infinity when a is positive, negative infinity when a
is negative, and NaN when a = ±0. The infinity signs change when
dividing by -0 instead.

Yes, the IEEE 754 standard defines it that way.

The IEEE standard, however, does *not* define how mathematics work.
Mathematics does that. In math, x_0/0 is *undefined*. It is not
infinity (David kindly explained the difference between limits and
numbers), it is not negative infinity, it is undefined. Division by
zero *cannot* happen. If it could, we would be able to build, for
example, perpetual motion machines.

So, from a purely mathematical standpoint, the IEEE 754 standard is
wrong to treat the result of division by 0.0 any differently than
dividing by 0 (since floats are only different in their nature to
*computers* representing everything in binary [which cannot represent
floating point numbers at all, much less any given irrational
number]).

···

On Thu, Nov 25, 2010 at 2:05 AM, Adam Ms. <e148759@bsnow.net> wrote:

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

Since UTF-8 is a subset of UTF-16, which in turn is a subset of
UTF-32,

I tried to find a more precise statement about this but did not really
succeed. I thought all UTF-x were just different encoding forms of
the same universe of code points.

It's an implicit feature, rather than an explicit one:
Western languages get the first 8 bits for encoding. Glyphs going
beyond the Latin alphabet get the next 8 bits. If that isn't enough, an
additional 16 bits are used for encoding purposes.

What bits are you talking about here, bits of code points or bits in
the encoding? It seems you are talking about bits of code points.
However, how these are put into any UTF-x encoding is a different
story, and because UTF-8 uses multibyte sequences it is not
immediately clear that UTF-8 can only hold a subset of what UTF-16
can hold.

Thus, UTF-8 is a subset of UTF-16 is a subset of UTF-32. Thus, also,
the future-proofing, in case even more glyphs are needed.

Quoting from RFC 3629 - UTF-8, a transformation format of ISO 10646

Char. number range  | UTF-8 octet sequence
   (hexadecimal)    | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

···

On Thu, Nov 25, 2010 at 1:37 PM, Phillip Gawlowski <cmdjackryan@googlemail.com> wrote:

On Thu, Nov 25, 2010 at 12:56 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

So we have for code point encoding

7 bits
6 + 5 = 11 bits
2 * 6 + 4 = 16 bits
3 * 6 + 3 = 21 bits

This makes 2164864 (0x210880) possible code points in UTF-8. And the
pattern can be extended.

Looking at RFC 2781 - UTF-16, an encoding of ISO 10646 we see that
UTF-16 (at least this version) supports code points up to 0x10FFFF.
This is less than what UTF-8 can hold theoretically.

Coincidentally 0x10FFFF has 21 bits which is what fits into UTF-8.

I remain unconvinced that UTF-8 can handle only a subset of the code
points of the set UTF-16 can handle.

I also remain unconvinced that UTF-8 encodings are a subset of UTF-16
encodings. This cannot be true because in UTF-8 the encoding unit is
one octet, while in UTF-16 it is two octets. As a practical example,
the string "a" will have length 1 octet in UTF-8 (because it happens
to be an ASCII character) and length 2 octets in UTF-16.

"All standard UCS encoding forms except UTF-8 have an encoding unit
larger than one octet, [...]"
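
Ruby 1.9 will happily confirm the octet counts (irb, purely illustrative):

  "a".encode("UTF-8").bytesize      # => 1
  "a".encode("UTF-16BE").bytesize   # => 2
  "a".encode("UTF-32BE").bytesize   # => 4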

(at least, ISO learned from the
mess created in the 1950s to 1960s) so that new glyphs won't ever
collide with existing glyphs, my point still stands. :wink:

Well, I support your point anyway. That was just meant as a caveat so
people are watchful (and test rather than believe). :slight_smile: But as I
think about it, it more likely was a statement about Java's
implementation (because a char has only 16 bits which is not
sufficient for all Unicode code points).

Of course, test your assumptions. But first, you need an assumption to
start from. :wink:

:slight_smile:

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

You are confusing us.

UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points. They are all capable of representing all code points. Nothing in this discussion is a subset of anything else.

James Edward Gray II

···

On Nov 25, 2010, at 6:37 AM, Phillip Gawlowski wrote:

Thus, UTF-8 is a subset of UTF-16 is a subset of UTF-32. Thus, also,
the future-proofing, in case even more glyphs are needed.

But as I think about it, it more likely was a statement about Java's
implementation (because a char has only 16 bits which is not
sufficient for all Unicode code points).

I believe you are referring to the complaints the Asian cultures sometimes raise against Unicode. If so, I'll try to recap the issues, as I understand them.

First, Unicode is a bit larger than their native encodings. Typically they get everything they need into two bytes where Unicode requires more for their languages.

The Unicode team also made some controversial decisions that affected the Asian languages, like Han Unification (http://en.wikipedia.org/wiki/Han_unification).

Finally, they have a lot of legacy data in their native encodings and perfect conversion is sometimes tricky due to some context sensitive issues.

James, thanks for the summary. It is much appreciated.

I think the Asian cultures have warmed a bit to Unicode over time (my opinion only), but it's important to remember that adopting it involved more challenges for them.

I believe that is in part due to our western ignorance. If we dealt
with encodings properly we would probably feel a similar pain -
at least it would cause more pain for us. I have frequently seen i18n
aspects being ignored (my pet peeve is time zones). Usually this
breaks your neck as soon as people from other cultures start using
your application - or when something as simple as a change of a
database server's time zone leaves it differing from the application
server's. :slight_smile:

Kind regards

robert

···

On Thu, Nov 25, 2010 at 3:07 PM, James Edward Gray II <james@graysoftinc.com> wrote:

On Nov 25, 2010, at 5:56 AM, Robert Klemme wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert Klemme wrote:

This may be true for the western world but I believe I remember one
of our Japanese friends stating that Unicode does not cover all Asian
character sets completely; it could have been a remark about Java's
implementation of Unicode though, I am not 100% sure.

Since UTF-8 is a subset of UTF-16, which in turn is a subset of
UTF-32,

I tried to find a more precise statement about this but did not really
succeed. I thought all UTF-x were just different encoding forms of
the same universe of code points.

Yes this is correct. Many people don't get the difference between a
charset and the corresponding encoding.

Btw, this happens all the time: for example, people often do not grasp
the difference between "point in time" and "representation of a point
in time in a particular time zone and locale". This becomes an issue
if you want to calculate with timestamps at or near the DST change
time. :slight_smile:
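
A small irb illustration of the difference (Ruby 1.9.2, where Time#getlocal
accepts an offset; the offsets are just examples):

  t = Time.utc(2010, 11, 25, 12, 0, 0)   # one point in time
  t.getlocal("+09:00").to_s              # => "2010-11-25 21:00:00 +0900"
  t.getlocal("-06:00").to_s              # => "2010-11-25 06:00:00 -0600"
  t == t.getlocal("-06:00")              # => true -- same instant, different representation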

Unicode is a charset not with one encoding but with many encodings. So
we talk about the same characters and different mappings of these
characters to bits and bytes. This mapping is a simple table which you
can write down on a sheet of paper, and the side with the characters
will always be the same for UTF-8, UTF-16 and UTF-32.

The encodings UTF-8, UTF-16 and UTF-32 were built for different
purposes. The number after UTF says nothing about the maximum length;
in the first place it says something about the shortest length (and
often about the usual length if you use the encoding in the situation
it was built for).

More precisely the number indicates the "encoding unit" (see my quote
in an earlier posting). One could think up an encoding with encoding
unit of 1 octet (8 bits, 1 byte) where the shortest length would be 2
octets. Example

1st octet: number of octets to follow
2nd and subsequent octets: encoded character

The shortest length would be 2 octets, but lengths would grow in steps
of 1 octet, so the encoding unit is 1 octet.
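
A toy version of that hypothetical scheme in Ruby, purely to illustrate the
point (nothing real uses this, of course):

  # first octet = number of octets to follow; the rest = code point, big-endian
  def toy_encode(codepoint)
    payload = []
    begin
      payload.unshift(codepoint & 0xFF)
      codepoint >>= 8
    end while codepoint > 0
    [payload.size, *payload].pack("C*")
  end

  toy_encode(0x61).bytes.to_a       # => [1, 97]            -- 2 octets, the minimum
  toy_encode(0x10FFFF).bytes.to_a   # => [3, 16, 255, 255]  -- grows one octet at a time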

Cheers

robert

···

On Thu, Nov 25, 2010 at 6:05 PM, Oliver Schad <spam.entfernen.und.bring.gefaelligst.ein.bier.mit@oschad.de> wrote:

On Thu, Nov 25, 2010 at 11:12 AM, Phillip Gawlowski <cmdjackryan@googlemail.com> wrote:

On Thu, Nov 25, 2010 at 10:45 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

> I suppose I expected people to be developing modern Linux apps that just
> happen to compile on that hardware.

Linux is usually not the OS the vendor supports. Keep in mind, a day
of lost productivity on these kinds of systems means losses in the
millions of dollars.

In other words, you need someone who will support it, and maybe someone who'll
accept that kind of risk. None of the Linux vendors are solid enough? Or is it
that they don't support mainframes?

>> And mainframes with vector CPUs are ideal for all sorts of simulations
>> engineers have to do (like aerodynamics), or weather research.
>
> When you say "ideal", do you mean they actually beat out the cluster of
> commodity hardware I could buy for the same price?

Sure, if you can shell out for about 14 000 Xeon CPUs and 7 000 Tesla
GPGPUs (Source: Tianhe-1 - Wikipedia ).

From that page:

"Both the original Tianhe-1 and Tianhe-1A use a Linux-based operating
system... Each blade is composed of two compute nodes, with each compute node
containing two Xeon X5670 6-core processors and one Nvidia M2050 GPU
processor."

I'm not really seeing a difference in terms of hardware.

> All three of which suggest to me that in many cases, an actual greenfield
> project would be worth it. IIRC, there was a change to the California
> minimum wage that would take 6 months to implement and 9 months to
> revert because it was written in COBOL -- but could the same team really
> write a new payroll system in 15 months? Maybe, but doubtful.

So, you'd bet a corporation

Nope, which is why I said "doubtful."

just because you *think* it is
easier to do changes 40 years later in an unproven, unused, upstart
language?

Sorry, "unproven, unused, upstart"? Which language are you talking about?

> But it's still absurdly wasteful. A rewrite would pay for itself with
> only a few minor changes that'd be trivial in a sane system, but major
> year-long projects with the legacy system.

If the rewrite would pay for itself in the short term, then why hasn't
it been done?

The problem is that it doesn't. What happens is that those "few minor changes"
get written off as "too expensive", so they don't happen. Every now and then,
it's actually worth the expense to make a "drastic" change anyway, but at that
point, again, 15 months versus a greenfield rewrite -- the 15 months wins.

So it very likely does pay off in the long run -- being flexible makes good
business sense, and sooner or later, you're going to have to push another of
those 15-month changes. But it doesn't pay off in the short run, and it's hard
to predict how long it will be until it does pay off. The best you can do is
say that it's very likely to pay off someday, but modern CEOs get rewarded in
the short term, then take their pensions and let the next guy clean up the
mess, so there isn't nearly enough incentive for long-term thinking.

And I'm not sure I could make a solid case that it'd pay for itself
eventually. I certainly couldn't do so without looking at the individual
situation. Still wasteful, but maybe not worth fixing.

Also, think about the argument you're using here. Why hasn't it been done? I
can think of a few reasons, some saner than others, but sometimes the answer
to "Why hasn't it been done?" is "Everybody was wrong." Example: "If it was
possible to give people gigabytes of email storage for free, why hasn't it
been done?" Then Gmail did, and the question became "Clearly it's possible to
give people gigabytes of email storage for free. Why isn't Hotmail doing it?"

···

On Saturday, November 27, 2010 02:47:12 pm Phillip Gawlowski wrote:

On Sat, Nov 27, 2010 at 7:50 PM, David Masover <ninja@slaphack.com> wrote:

James,

···

On 2010-11-26 00:55, James Edward Gray II wrote:

On Nov 25, 2010, at 6:37 AM, Phillip Gawlowski wrote:

Thus, UTF-8 is a subset of UTF-16 is a subset of UTF-32. Thus,
also, the future-proofing, in case even more glyphs are needed.

You are confusing us.

UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points. They
are all capable of representing all code points. Nothing in this
discussion is a subset of anything else.

This is all really interesting but I don't understand what you mean by "code points" - is what you have said expressed diagrammatically somewhere?

Thanks,

Phil.
--
Philip Rhoades

GPO Box 3411
Sydney NSW 2001
Australia
E-mail: phil@pricom.com.au

Phillip Gawlowski wrote in post #963815:

The IEEE standard, however, does *not* define how mathematics work.
Mathematics does that. In math, x_0/0 is *undefined*. It is not
infinity...

What psychological anomaly causes creationists to keep saying that there
are no transitional fossils even after having been shown transitional
fossils? We might pass it off as mere cult indoctrination or
brainwashing, but the problem is a more general one.

We also see it happening here in Mr. Gawlowski who, after being given
mathematical facts about infinity, simply repeats his uninformed
opinion.

"The Dunning-Kruger effect is a cognitive bias in which an unskilled
person makes poor decisions and reaches erroneous conclusions, but
their incompetence denies them the metacognitive ability to realize
their mistakes." (http://en.wikipedia.org/wiki/Dunning-Kruger_effect\)

Here is my initial response to Mr. Gawlowski. Let's see if he ignores
it again (as a creationist ignores transitional fossils).

···

It is perfectly reasonable, mathematically, to assign infinity to
1/0. To geometers and topologists, infinity is just another
point. Look up the one-point compactification of R^n. If we join
infinity to the real line, we get a circle, topologically. Joining
infinity to the real plane gives a sphere, called the Riemann
sphere. These are rigorous definitions with useful results.

I'm glad that IEEE floating point has infinity included, otherwise I
would run into needless error handling. It's not an error to reach
one pole of a sphere (the other pole being zero).

Infinity is there for good reason; its presence was well-considered
by the quite knowledgeable IEEE designers.

--
Posted via http://www.ruby-forum.com/.

James Edward Gray II wrote:

UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points. They are all capable of representing all code points. Nothing in this discussion is a subset of anything else.

To add to this, Unicode 3 uses the codespace from 0 to 0x10FFFF (not 0xFFFFFFFF),
so it does cover all the Oriental characters (unlike Unicode 2 as implemented in
earlier Java versions, which only covers 0..0xFFFF). It even has codepoints for
Klingon and Elvish!

UTF-8 requires four bytes to encode a 21-bit number (enough to encode 0x10FFFF)
though if you extend the pattern (as many implementations do) it has a 31-bit gamut.

UTF-16 encodes the additional codespace using surrogate pairs, which is a pair of
16-bit numbers each carrying a 10-bit payload. Because it's still a variable length
encoding, it's just as painful to work with as UTF-8, but less space-efficient.
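
To make that concrete, here's an irb peek at a character outside the BMP
(Ruby 1.9; U+1D11E, the musical G clef, chosen arbitrarily):

  gclef = [0x1D11E].pack("U")           # build the character from its code point
  gclef.encode("UTF-8").bytes.to_a      # => [240, 157, 132, 158]  -- four octets
  gclef.encode("UTF-16BE").bytes.to_a   # => [216, 52, 221, 30]    -- surrogate pair D834 DD1E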

Both UTF-8 and UTF-16 encodings allow you to look at any location in a string and step
forward or back to the nearest character boundary - a very important property that
was missing from Shift-JIS and other earlier encodings.
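
That resynchronisation trick is easy to sketch for UTF-8, since continuation
octets always match the bit pattern 10xxxxxx (a hypothetical helper, not
production code; assumes a UTF-8 source):

  # step backwards from an arbitrary byte offset to the start of its character
  def char_start(bytes, i)
    i -= 1 while i > 0 && (bytes[i] & 0xC0) == 0x80
    i
  end

  bytes = "héllo".bytes.to_a   # => [104, 195, 169, 108, 108, 111]
  char_start(bytes, 2)         # => 1  -- offset 2 is the middle of the two-octet "é"
  char_start(bytes, 3)         # => 3  -- offset 3 already starts the "l"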

If you go back to 2003 in the archives, you'll see I engaged in a long and somewhat
heated discussion about this subject with Matz and others back then. I'm glad we
finally have a Ruby version that can at least do this stuff properly, even though
I think it's over-complicated.

Clifford Heath.

Robert Klemme wrote:

Btw, this happens all the time: for example, people often do not grasp
the difference between "point in time" and "representation of a point
in time in a particular time zone and locale".

... on a particular relativistic trajectory :wink: Seriously though,
time dilation effects are accounted for in every GPS unit, because
the relative motion of the satellites gives each one its own timeline
which affects the respective position fixes.

Clifford Heath

In other words, you need someone who will support it, and maybe someone who'll
accept that kind of risk. None of the Linux vendors are solid enough? Or is it
that they don't support mainframes?

Both, and the Linux variant you use has to be certified by the
hardware vendor, too. Essentially, a throwback to the UNIX
workstations of yore: if you run something uncertified, you don't get
the support you paid for in the first place.

"Both the original Tianhe-1 and Tianhe-1A use a Linux-based operating
system... Each blade is composed of two compute nodes, with each compute node
containing two Xeon X5670 6-core processors and one Nvidia M2050 GPU
processor."

I'm not really seeing a difference in terms of hardware.

We are probably talking at cross purposes here:
You *can* build a vector CPU cluster out of commodity hardware, but it
involves a) a lot of hardware and b) a lot of customization work to
get the nodes to play well with each other (like handling concurrency,
and avoiding bottlenecks that lead to hold-ups in several nodes of
your cluster).

Sorry, "unproven, unused, upstart"? Which language are you talking about?

Anything that isn't C, Ada or COBOL. Or even older. This is a very,
very conservative mindset, where not even Java has a chance.

So it very likely does pay off in the long run -- being flexible makes good
business sense, and sooner or later, you're going to have to push another of
those 15-month changes. But it doesn't pay off in the short run, and it's hard
to predict how long it will be until it does pay off. The best you can do is
say that it's very likely to pay off someday, but modern CEOs get rewarded in
the short term, then take their pensions and let the next guy clean up the
mess, so there isn't nearly enough incentive for long-term thinking.

Don't forget the engineering challenge. Doing the Great Rewrite for
software that's been in use for 20 years (or even longer) isn't
something that is done on a whim, or because this new-fangled "agile
movement" is something the programmers like.

Unless there is a very solid business case (something on the level of
"if we don't do this, we will go bankrupt in 10 days" or similarly
drastic), there is no incentive to fix what ain't broke (for certain
values of "ain't broke", anyway).

Also, think about the argument you're using here. Why hasn't it been done? I
can think of a few reasons, some saner than others, but sometimes the answer
to "Why hasn't it been done?" is "Everybody was wrong." Example: "If it was
possible to give people gigabytes of email storage for free, why hasn't it
been done?" Then Gmail did, and the question became "Clearly it's possible to
give people gigabytes of email storage for free. Why isn't Hotmail doing it?"

Google has a big incentive, and a big benefit going for it:
a) Google wants your data, so they can sell you more and better ads.
b) The per MB cost of hard drives came down *significantly* in the
last 10 years. For my external 1TB HD I paid about 50 bucks, and for
my internal 500GB 2.5" HD I paid about 50 bucks. For that kind of
money, you couldn't buy a 500 GB HD 5 years ago.

Without cheap storage, free email accounts with Gigabytes of storage
are pretty much impossible.

CUDA and GPGPUs have become available only in the last few years, and
only because GPUs have become insanely powerful and insanely cheap at
the same time.

If you were building the architecture that requires mainframes today,
I doubt anyone would buy a Cray without some very serious
considerations (power consumption, ease of maintenance, etc) in favor
of the Cray.

···

On Sun, Nov 28, 2010 at 1:56 AM, David Masover <ninja@slaphack.com> wrote:

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

A "code point" is basically a unique identifier of a special symbol.

http://www.unicode.org/
http://www.unicode.org/charts/About.html
http://www.unicode.org/charts/
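
In Ruby 1.9 terms (just an irb illustration):

  "A".ord                    # => 65     (U+0041)
  "€".ord                    # => 8364   (U+20AC)
  "Résumé".codepoints.to_a   # => [82, 233, 115, 117, 109, 233]
  8364.chr(Encoding::UTF_8)  # => "€"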

HTH

robert

···

On Thu, Nov 25, 2010 at 3:48 PM, Philip Rhoades <phil@pricom.com.au> wrote:

James,

On 2010-11-26 00:55, James Edward Gray II wrote:

On Nov 25, 2010, at 6:37 AM, Phillip Gawlowski wrote:

Thus, UTF-8 is a subset of UTF-16 is a subset of UTF-32. Thus,
also, the future-proofing, in case even more glyphs are needed.

You are confusing us.

UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points. They
are all capable of representing all code points. Nothing in this
discussion is a subset of anything else.

This is all really interesting but I don't understand what you mean by "code
points" - is what you have said expressed diagrammatically somewhere?

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/