Hi, I've added a method "multi_capitalize" to the String class. The
method is implemented in C and basically modifies the string:
"record-roUTE".multi_capitalize => "Record-Route"
The problem is that the String returned by the method has
ASCII-8BIT encoding, while the original string was UTF-8 (using Ruby
1.9.1).
--------------------------------------------------------------------------------
hname = "record-rouTE-€"
"record-rouTE-€"
hname.encoding
#<Encoding:UTF-8>
hname2 = hname.multi_capitalize
"Record-Route-\xE2\x82\xAC" <------- !!!
hname2.encoding
#<Encoding:ASCII-8BIT> <------- !!!
hname2.force_encoding("utf-8")
"Record-Route-€"
hname2.encoding
#<Encoding:UTF-8>
--------------------------------------------------------------------------------
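The session above suggests that the bytes of the result are fine and only the encoding tag is lost. A minimal Ruby sketch of that reading follows; the variable names are made up for illustration, and the binary-tagged copy merely stands in for whatever the C method returns:

```ruby
# "\u20AC" is the euro sign; the escape keeps this file ASCII-only.
original = "Record-Route-\u20AC"

# Stand-in for the C method's result: identical bytes, binary tag.
untagged = original.dup.force_encoding("ASCII-8BIT")

puts untagged.encoding                            # the binary encoding
puts untagged.bytes.to_a == original.bytes.to_a   # true: no byte differs

# force_encoding only swaps the tag; it does not transcode anything.
retagged = untagged.dup.force_encoding("UTF-8")
puts retagged.encoding
puts retagged.valid_encoding?
puts retagged == original
```

If that reading is right, the fix on the C side only needs to restore the tag, not rewrite the bytes.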
What should I add to my C method to keep the UTF-8 encoding
after the changes to the string?
Could I invoke "force_encoding" from the C code
before returning the modified string? How would I invoke it?
Thanks a lot.
--
Iñaki Baz Castillo
<ibc@aliax.net>
You can call it as (untested):
rb_funcall(str, rb_intern("force_encoding"), 1, rb_str_new2("utf-8"));
I'm not sure how to make your multi-capitalize method do the right
thing, but maybe reading the source of rb_str_capitalize_bang in
string.c helps.
Best,
Andre
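For readers more at home in Ruby than in the C API: the rb_funcall line is a dynamic method call, roughly what __send__ does in Ruby. The sketch below mirrors it in Ruby and, as a variation, takes the encoding from the original string instead of hard-coding "utf-8". The helper name retag_like is hypothetical, and the binary-tagged string only imitates the C method's output:

```ruby
# Hypothetical helper: tag `result` with the encoding of `source`.
# __send__ dispatches by method name, much like rb_funcall + rb_intern.
def retag_like(result, source)
  result.__send__(:force_encoding, source.encoding)
end

source = "record-rouTE-\u20AC"   # UTF-8 input ("\u20AC" is the euro sign)
raw    = "Record-Route-\xE2\x82\xAC".dup.force_encoding("ASCII-8BIT")

fixed = retag_like(raw, source)
puts fixed.encoding
puts fixed
```

Copying the source encoding keeps the method usable for inputs that are not UTF-8, which a hard-coded "utf-8" would mislabel.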
···
On Sat, 2009-04-04 at 01:39 +0900, Iñaki Baz Castillo wrote:
Could I invoke the C "force_encoding()" function from the C code
before returning the modified string? How to invoke it?
Thanks a lot, I will check it.
···
On Friday, 3 April 2009, Andre Nathan wrote:
> On Sat, 2009-04-04 at 01:39 +0900, Iñaki Baz Castillo wrote:
> > Could I invoke the C "force_encoding()" function from the C code
> > before returning the modified string? How to invoke it?
>
> You can call it as (untested):
>
> rb_funcall(str, rb_intern("force_encoding"), 1, rb_str_new2("utf-8"));
>
> I'm not sure how to make your multi-capitalize method do the right
> thing, but maybe reading the source of rb_str_capitalize_bang in
> string.c helps.
--
Iñaki Baz Castillo <ibc@aliax.net>
Yes, rb_str_capitalize_bang handles a lot of stuff related to encoding:
c = rb_enc_codepoint(s, send, enc);
if (rb_enc_islower(c, enc)) {
    rb_enc_mbcput(rb_enc_toupper(c, enc), s, enc);
    modify = 1;
}
s += rb_enc_codelen(c, enc);
So this is the way to go.
Thanks a lot.
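The loop quoted from string.c advances one character at a time rather than one byte at a time. A rough Ruby-level counterpart is each_char. The sketch below is only a pure-Ruby stand-in for the C method: the name multi_capitalize_sketch is hypothetical, and the rule "capitalize after every hyphen" is an assumption inferred from the "record-roUTE" example.

```ruby
# Hypothetical pure-Ruby stand-in for the C method; the hyphen rule is an
# assumption inferred from the "record-roUTE" example.
def multi_capitalize_sketch(str)
  out = "".dup.force_encoding(str.encoding)  # result starts with the source encoding
  at_word_start = true
  str.each_char do |ch|                      # whole characters, never partial bytes
    out << (at_word_start ? ch.upcase : ch.downcase)
    at_word_start = (ch == "-")
  end
  out
end

header = multi_capitalize_sketch("record-rouTE-\u20AC")
puts header
puts header.encoding
```

Because the result is built from encoding-aware characters and starts out with the source encoding, no force_encoding call is needed afterwards.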
--
Iñaki Baz Castillo <ibc@aliax.net>