Within C extension code, what are the appropriate C functions to use
that provide the equivalent functionality of String#force_encoding and
String#encode? I wasn't able to find anything in the README.EXT
(http://svn.ruby-lang.org/repos/ruby/tags/v1_9_3_194/README.EXT).
= String#force_encoding
Based on these articles [1] [2], it seems as though the suggested
approach for forcing an encoding change in a C extension is to use the
'rb_enc_associate_index' function defined in 'ruby/encoding.h'. Is
that accurate?
The implementation of String#force_encoding does a bit more than that:
static VALUE
rb_str_force_encoding(VALUE str, VALUE enc)
{
    str_modifiable(str);
    rb_enc_associate(str, rb_to_encoding(enc));
    ENC_CODERANGE_CLEAR(str);
    return str;
}
This implementation adds an invocation of the ENC_CODERANGE_CLEAR
macro and I'm not clear on what this does or when/why it would be
needed.
= String#encode
I wasn't able to find any documentation about String transcoding
within C extensions. Based on the implementation of String#encode,
there seems to be a function 'rb_str_encode' in 'ruby/encoding.h' that
might be appropriate, but there's no documentation for the method and
I couldn't completely reverse engineer how the 'ecflags' and 'ecopts'
arguments are used.
Welcome to ruby 1.9.x, where everything to do with string encodings is
completely undocumented.
I believe the str_modifiable/ENC_CODERANGE_CLEAR pair is essentially the
same as rb_str_modify(). ENC_CODERANGE_CLEAR clears the cached code range
that backs properties like 'ascii_only?' and 'valid_encoding?', so that
the next time someone queries them the whole string has to be scanned
again.
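
To illustrate why that cache has to be cleared when only the encoding label changes, here is a small standalone toy model in plain C. It is not Ruby's implementation: every toy_* name is hypothetical, and only the four states (unknown, 7bit, valid, broken) are borrowed from Ruby's ENC_CODERANGE_* constants.

```c
#include <stddef.h>

/* Toy model only: all toy_* names are hypothetical, not Ruby internals. */
enum toy_encoding { TOY_US_ASCII, TOY_LATIN1 };
enum toy_coderange { TOY_CR_UNKNOWN, TOY_CR_7BIT, TOY_CR_VALID, TOY_CR_BROKEN };

struct toy_string {
    const unsigned char *bytes;
    size_t len;
    enum toy_encoding enc;   /* the encoding label */
    enum toy_coderange cr;   /* cached result of the last full scan */
    unsigned scans;          /* number of full scans, for demonstration */
};

static struct toy_string
toy_new(const char *bytes, size_t len, enum toy_encoding enc)
{
    struct toy_string s;
    s.bytes = (const unsigned char *)bytes;
    s.len = len;
    s.enc = enc;
    s.cr = TOY_CR_UNKNOWN;
    s.scans = 0;
    return s;
}

/* Scan the bytes only when no cached answer is available. */
static enum toy_coderange
toy_scan(struct toy_string *s)
{
    size_t i;
    int high = 0;

    if (s->cr != TOY_CR_UNKNOWN)
        return s->cr;                     /* cache hit, no scan */
    s->scans++;
    for (i = 0; i < s->len; i++) {
        if (s->bytes[i] >= 0x80) { high = 1; break; }
    }
    if (!high)
        s->cr = TOY_CR_7BIT;              /* plain ASCII bytes */
    else if (s->enc == TOY_LATIN1)
        s->cr = TOY_CR_VALID;             /* every byte is valid Latin-1 */
    else
        s->cr = TOY_CR_BROKEN;            /* high byte under US-ASCII */
    return s->cr;
}

static int toy_valid_encoding(struct toy_string *s) { return toy_scan(s) != TOY_CR_BROKEN; }
static int toy_ascii_only(struct toy_string *s)     { return toy_scan(s) == TOY_CR_7BIT; }

/* Relabel without touching the bytes, then drop the cached scan result. */
static void
toy_force_encoding(struct toy_string *s, enum toy_encoding enc)
{
    s->enc = enc;
    s->cr = TOY_CR_UNKNOWN;               /* plays the role of ENC_CODERANGE_CLEAR */
}
```

If toy_force_encoding did not reset cr, a string scanned as valid Latin-1 would keep reporting itself as valid after being relabeled US-ASCII, even though it contains a high byte.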
I've noticed ...
Maybe easier just to rb_funcall String#encode?
That's what I've done, but I wanted to check if there was something
more appropriate I should be doing while in the C code.
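
For reference, a minimal sketch of both approaches discussed in this thread, written as a complete extension source. It assumes a Ruby 1.9-or-later build environment with ruby.h available. The module name Example and the example_* helpers are made up for illustration, and UTF-8 is an arbitrary choice of target encoding. This is one way to do it, not necessarily the recommended one.

```c
#include <ruby.h>
#include <ruby/encoding.h>

/* Hypothetical helper: relabel str as UTF-8 without changing its bytes,
 * mirroring the steps in rb_str_force_encoding quoted above. */
static VALUE
example_force_utf8(VALUE self, VALUE str)
{
    StringValue(str);
    rb_str_modify(str);                         /* raises if str is frozen */
    rb_enc_associate(str, rb_utf8_encoding());  /* change the label only */
    ENC_CODERANGE_CLEAR(str);                   /* drop the cached scan result */
    return str;
}

/* Hypothetical helper: transcode by calling String#encode via rb_funcall,
 * so conversion options and errors behave exactly as they do in Ruby code. */
static VALUE
example_encode_utf8(VALUE self, VALUE str)
{
    StringValue(str);
    return rb_funcall(str, rb_intern("encode"), 1,
                      rb_enc_from_encoding(rb_utf8_encoding()));
}

void
Init_example(void)
{
    VALUE mod = rb_define_module("Example");
    rb_define_module_function(mod, "force_utf8", example_force_utf8, 1);
    rb_define_module_function(mod, "encode_utf8", example_encode_utf8, 1);
}
```

Going through rb_funcall costs a method dispatch, but it avoids depending on the undocumented 'ecflags' and 'ecopts' arguments of rb_str_encode.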
On Thu, Jul 19, 2012 at 8:42 AM, Brian Candler <lists@ruby-forum.com> wrote: