What are the C extension analogs of String#force_encoding and String#encode?

Nathan_Beyer · 19 July 2012 01:08

Within C extension code, what are the appropriate C functions to use
that provide the equivalent functionality of String#force_encoding and
String#encode? I wasn't able to find anything in the README.EXT
(http://svn.ruby-lang.org/repos/ruby/tags/v1_9_3_194/README.EXT).

= String#force_encoding
Based on these articles [1] [2], it seems as though the suggested
approach for forcing an encoding change in a C extension is to use the
'rb_enc_associate_index' function defined in 'ruby/encoding.h'. Is
that accurate?

The implementation of String#force_encoding does a bit more than that:

static VALUE
rb_str_force_encoding(VALUE str, VALUE enc)
{
    str_modifiable(str);
    rb_enc_associate(str, rb_to_encoding(enc));
    ENC_CODERANGE_CLEAR(str);
    return str;
}

This implementation adds an invocation of the ENC_CODERANGE_CLEAR
macro and I'm not clear on what this does or when/why it would be
needed.

= String#encode
I wasn't able to find any documentation about String transcoding
within C extensions. Based on the implementation of String#encode,
there seems to be a function 'rb_str_encode' in 'ruby/encoding.h' that
might be appropriate, but there's no documentation for the method and
I couldn't completely reverse engineer how the 'ecflags' and 'ecopts'
arguments are used.

-Nathan

[1] http://yugui.jp/articles/838
[2] http://tenderlovemaking.com/2009/06/26/string-encoding-in-ruby-1-9-c-extensions.html

7stud2 · 19 July 2012 13:42

Nathan Beyer wrote in post #1069253:

I wasn't able to find anything in the README.EXT
(http://svn.ruby-lang.org/repos/ruby/tags/v1_9_3_194/README.EXT).

Welcome to ruby 1.9.x, where everything to do with string encodings is
completely undocumented.

The implementation of String#force_encoding does a bit more than that:

static VALUE
rb_str_force_encoding(VALUE str, VALUE enc)
{
    str_modifiable(str);
    rb_enc_associate(str, rb_to_encoding(enc));
    ENC_CODERANGE_CLEAR(str);
    return str;
}

This implementation adds an invocation of the ENC_CODERANGE_CLEAR
macro and I'm not clear on what this does or when/why it would be
needed.

I believe this is essentially the same as rb_str_modify(). It clears the
cache of properties like 'ascii_only?' and 'valid_encoding?', so that
next time someone queries them it has to scan the whole string.
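
The effect is easiest to see from the Ruby side. This is a minimal sketch, not anything taken from the MRI source: retag_report is a hypothetical helper name, and the sample bytes are made up. It retags the same bytes under several encodings and shows that the answers to valid_encoding? and ascii_only? change while the bytes do not. That is why a cached answer from before the retag cannot be trusted and has to be cleared.

```ruby
# Hypothetical helper: retag the same bytes under each encoding name and
# report what Ruby says about them. force_encoding changes only the
# encoding tag on the string, never the bytes themselves.
def retag_report(str, encoding_names)
  work = str.dup                      # leave the caller's string alone
  encoding_names.map do |name|
    work.force_encoding(name)
    {
      encoding:   name,
      valid:      work.valid_encoding?,  # depends on bytes AND tag
      ascii_only: work.ascii_only?,      # false here: two bytes are >= 0x80
      bytes:      work.bytes.to_a        # identical in every row
    }
  end
end

# "caf" followed by the two-byte UTF-8 sequence for e-acute.
report = retag_report("caf\xC3\xA9", ["UTF-8", "US-ASCII", "ISO-8859-1"])
report.each { |row| puts row.inspect }
```

The same five bytes are valid as UTF-8 and ISO-8859-1 but not as US-ASCII, so anything remembered about the string under the old tag would be wrong under the new one.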

= String#encode
I wasn't able to find any documentation about String transcoding
within C extensions. Based on the implementation of String#encode,
there seems to be a function 'rb_str_encode' in 'ruby/encoding.h' that
might be appropriate, but there's no documentation for the method and
I couldn't completely reverse engineer how the 'ecflags' and 'ecopts'
arguments are used.

Maybe easier just to rb_funcall it?
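
For reference, this is the Ruby-level call that an rb_funcall on :encode would end up dispatching to. It is only a sketch: to_utf8_lossy is a hypothetical helper name and the inputs are made up. My reading of transcode.c, which is an assumption and not documented API, is that String#encode turns options such as :invalid and :undef into the 'ecflags' bit flags and passes the rest (for example the :replace string) along as 'ecopts'.

```ruby
# Hypothetical helper: transcode str to UTF-8, treating its bytes as being
# in source_encoding. Unlike force_encoding, this produces new bytes.
def to_utf8_lossy(str, source_encoding)
  str.encode("UTF-8", source_encoding,
             invalid: :replace,   # bytes that are malformed in the source encoding
             undef:   :replace,   # characters with no mapping in the target
             replace: "?")        # what to substitute in either case
end

latin1 = "caf\xE9"   # "cafe" with e-acute as a single ISO-8859-1 byte

converted = to_utf8_lossy(latin1, "ISO-8859-1")
puts converted.bytes.to_a.inspect    # the 0xE9 byte becomes two UTF-8 bytes

replaced = to_utf8_lossy(latin1, "US-ASCII")
puts replaced.inspect                # 0xE9 is not valid US-ASCII, so it is replaced
```

Going through rb_funcall keeps the option handling in Ruby's hands, at the cost of a method dispatch.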

Good luck,

Brian.


Nathan_Beyer · 24 July 2012 23:12

Nathan Beyer wrote in post #1069253:

I wasn't able to find anything in the README.EXT
(http://svn.ruby-lang.org/repos/ruby/tags/v1_9_3_194/README.EXT).

Welcome to ruby 1.9.x, where everything to do with string encodings is
completely undocumented.

I've noticed ...

The implementation of String#force_encoding does a bit more than that:

static VALUE
rb_str_force_encoding(VALUE str, VALUE enc)
{
    str_modifiable(str);
    rb_enc_associate(str, rb_to_encoding(enc));
    ENC_CODERANGE_CLEAR(str);
    return str;
}

This implementation adds an invocation of the ENC_CODERANGE_CLEAR
macro and I'm not clear on what this does or when/why it would be
needed.

I believe this is essentially the same as rb_str_modify(). It clears the
cache of properties like 'ascii_only?' and 'valid_encoding?', so that
next time someone queries them it has to scan the whole string.

= String#encode
I wasn't able to find any documentation about String transcoding
within C extensions. Based on the implementation of String#encode,
there seems to be a function 'rb_str_encode' in 'ruby/encoding.h' that
might be appropriate, but there's no documentation for the method and
I couldn't completely reverse engineer how the 'ecflags' and 'ecopts'
arguments are used.

Maybe easier just to rb_funcall it?

That's what I've done, but I wanted to check if there was something
more appropriate I should be doing while in the C code.

On Thu, Jul 19, 2012 at 8:42 AM, Brian Candler <lists@ruby-forum.com> wrote:
