Hello,
in Ruby 1.9 I get the following behaviour:
"aoueäöüé".upcase
=> "AOUEäöüé"
"AOUEÄÖÜÉ".downcase
=> "aoueÄÖÜÉ"
However, I can't find a bug report for this in the bug tracking system.
Doesn't this qualify as a bug?
Cheers, Stefan
Hi,
In message "Re: String#upcase/downcase with UTF-8 strings in Ruby 1.9" on Thu, 10 Jul 2008 07:09:29 +0900, "Stefan Schmidt" <Stefan.Schmidt@gmx.net> writes:
>in Ruby 1.9 I get the following behaviour:
>
>"aoueäöüé".upcase
>=> "AOUEäöüé"
>"AOUEÄÖÜÉ".downcase
>=> "aoueÄÖÜÉ"
>
>However, I can't find a bug report for this in the bug tracking system.
>Doesn't this qualify as a bug?
The document for String#upcase says:
call-seq:
str.upcase => new_str
Returns a copy of <i>str</i> with all lowercase letters replaced with their
uppercase counterparts. The operation is locale insensitive---only
characters ``a'' to ``z'' are affected.
Note: case replacement is effective only in ASCII region.
"hEllO".upcase #=> "HELLO"
See "Note:". Tim Bray has persuaded me to do so, since case
conversion outside of the ASCII region is highly dependent on country,
language, culture and script.
matz.
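To make the point about language dependence concrete, here is a minimal sketch; it is not anything from Ruby core. The helper names ascii_upcase and turkish_upcase are made up for illustration. The first mimics the documented ASCII-only behaviour using String#tr, the second applies the Turkish rule that a dotted "i" uppercases to U+0130 and a dotless U+0131 uppercases to "I". The same input then has two different results, and which one is right depends on the language of the text.

```ruby
# Hypothetical helper: ASCII-only upcase, mirroring what the String#upcase
# documentation describes (only "a" to "z" are affected).
def ascii_upcase(str)
  str.tr('a-z', 'A-Z')
end

# Hypothetical helper: Turkish casing rule for the two letters "i".
# U+0131 is the dotless lowercase i, U+0130 is the dotted capital I.
def turkish_upcase(str)
  ascii_upcase(str.tr("i\u0131", "\u0130I"))
end
```

With these definitions, ascii_upcase("istanbul") and turkish_upcase("istanbul") differ in their first character, although the input is plain ASCII in both cases.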
This leaves the perfect opening for people to contribute locale- or language-specific extensions to String.
It would make a great gem with a plug-in architecture.
Just add options for the language you want to use.
In any case it can get very tricky to do character conversions with different languages.
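As a rough idea of what such a plug-in style extension could look like, here is a small sketch. The module name LocaleCase, its register/upcase methods and the :de table are all invented for this example; no existing gem is implied. Each language registers its own mapping table, and anything not covered by the table falls back to the ASCII-only rule.

```ruby
# Hypothetical registry of language-specific upcase tables (illustration only).
module LocaleCase
  RULES = {}

  # Register a Hash mapping single characters to their uppercase form
  # for the given language symbol.
  def self.register(lang, upcase_table)
    RULES[lang] = upcase_table
  end

  # Upcase str using the table for lang, if one is registered;
  # every other character gets the ASCII-only treatment.
  def self.upcase(str, lang = nil)
    table = RULES.fetch(lang, {})
    str.each_char.map { |ch| table[ch] || ch.tr('a-z', 'A-Z') }.join
  end
end

# Assumed German table: umlauts map one-to-one, sharp s (U+00DF) becomes "SS".
LocaleCase.register(:de, {
  "\u00E4" => "\u00C4",
  "\u00F6" => "\u00D6",
  "\u00FC" => "\u00DC",
  "\u00DF" => "SS"
})
```

Calling LocaleCase.upcase(str, :de) uses the German table, while LocaleCase.upcase(str) without a language behaves like the ASCII-only core method. A real extension would also need downcase tables and context-dependent rules, which is where it gets tricky.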
On Jul 9, 2008, at 6:25 PM, Yukihiro Matsumoto wrote:
> Hi,
>
> In message "Re: String#upcase/downcase with UTF-8 strings in Ruby 1.9" on Thu, 10 Jul 2008 07:09:29 +0900, "Stefan Schmidt" <Stefan.Schmidt@gmx.net> writes:
> >in Ruby 1.9 I get the following behaviour:
> >
> >>> "aoueäöüé".upcase
> >=> "AOUEäöüé"
> >>> "AOUEÄÖÜÉ".downcase
> >=> "aoueÄÖÜÉ"
> >
> >However, I can't find a bug report for this in the bug tracking system.
> >Doesn't this qualify as a bug?
>
> The document for String#upcase says:
> call-seq:
> str.upcase => new_str
> Returns a copy of <i>str</i> with all lowercase letters replaced with their
> uppercase counterparts. The operation is locale insensitive---only
> characters ``a'' to ``z'' are affected.
> Note: case replacement is effective only in ASCII region.
> "hEllO".upcase #=> "HELLO"
> See "Note:". Tim Bray has persuaded me to do so, since case
> conversion outside of the ASCII region is highly dependent on country,
> language, culture and script.
> matz.
> The document for String#upcase says:
Yes, sorry, I should have read the documentation.
> See "Note:". Tim Bray has persuaded me to do so, since case
> conversion outside of the ASCII region is highly dependent on country,
> language, culture and script.
So basically the Python guys are going down a wrong route?
# -*- coding: utf-8 -*-
import string
print string.upper(u"aoueäöüé")
print string.lower(u"AOUEÄÖÜÉ")
works as expected.
Cheers, Stefan
No.
They're going down a different route.
Seriously, the language handling is something that could easily be handled by extensions. It does not need to be a core part of the language.
Even operating systems handle these things with proprietary and very sophisticated techniques based on the language in question.
In most cases, the upper-case characters you expect may well be 'correct', but that ultimately depends on the language and the context.
On Jul 9, 2008, at 8:17 PM, Stefan Schmidt wrote:
> > The document for String#upcase says:
> Yes, sorry, I should have read the documentation.
> > See "Note:". Tim Bray has persuaded me to do so, since case
> > conversion outside of the ASCII region is highly dependent on country,
> > language, culture and script.
> So basically the Python guys are going down a wrong route?
> # -*- coding: utf-8 -*-
> import string
> print string.upper(u"aoueäöüé")
> print string.lower(u"AOUEÄÖÜÉ")
> works as expected.
> Cheers, Stefan
> > So basically the Python guys are going down a wrong route?
> >
> > # -*- coding: utf-8 -*-
> > import string
> > print string.upper(u"aoueäöüé")
> > print string.lower(u"AOUEÄÖÜÉ")
> >
> > works as expected.
> >
> > Cheers, Stefan
> >
> No.
> They're going down a different route.
> Seriously, the language handling is something that could easily be
> handled by extensions. It does not need to be a core part of the
> language.
Is Nikolai Weibull's Ruby Character Encodings Library [1] currently the best way to go?
Stefan
> Seriously, the language handling is something that could easily be
> handled by extensions. It does not need to be a core part of the
> language.
Are there any working extensions for Ruby 1.9 that offer Unicode support for String#downcase/upcase and/or Array#sort?
Stefan
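For the Array#sort part of the question, one stopgap that needs no extension is sort_by with a hand-written key. The sketch below is only an assumption about what "sorted correctly" means: it folds German umlauts onto their base letters before comparing. The name german_sort_key is made up, sharp s is not handled, and real collation (for example the Unicode Collation Algorithm) is far more involved.

```ruby
# Hypothetical sort key: fold German umlauts onto their base letters,
# then downcase so that comparison ignores case.
def german_sort_key(word)
  word.tr("\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC", "aouAOU").downcase
end

# "\u00E4rger" is the German word written with a-umlaut as its first letter.
words = ["zebra", "\u00E4rger", "apfel", "banane"]

# Plain sort compares encoded bytes, so the umlaut word ends up after "zebra".
by_bytes = words.sort

# sort_by with the folded key places it between "apfel" and "banane".
by_key = words.sort_by { |w| german_sort_key(w) }
```

The key function is the only language-specific piece, so a different language would swap in a different key rather than change the sort call.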