StringIO and encodings

This surprised me:

  $ ruby -v
  ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-linux]

  $ ruby -e 'p "".encoding'
  #<Encoding:ISO-8859-1>

  $ cat a.rb
  # encoding: utf-8

  require 'stringio'
  s = StringIO.new

  a = "abc"
  s.puts(a)

  p :string => a.encoding
  p :stringio => s.string.encoding

  $ ruby a.rb
  {:string=>#<Encoding:UTF-8>}
  {:stringio=>#<Encoding:ISO-8859-1>} # <- WTF?

  $ cat b.rb
  # encoding: utf-8

  require 'stringio'
  s = StringIO.new("") # <- note the constructor parameter

  a = "abc"
  s.puts(a)

  p :string => a.encoding
  p :stringio => s.string.encoding

  $ ruby b.rb
  {:string=>#<Encoding:UTF-8>}
  {:stringio=>#<Encoding:UTF-8>} # <- better!

I presume that what's going on here is a simple matter of defaults. If
you don't pass an initial string to StringIO.new, it constructs one for
itself using the default external encoding, which comes from the locale,
and the encoding coercion rules mean that later writes never change the
internal string's encoding. StringIO.new has no knowledge of the source
encoding of the file it's called from.
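
A minimal sketch of that presumption, assuming a reasonably recent Ruby
(the `encoding:` keyword on String.new needs 2.3 or later, so this is not
the 1.9.2 from the transcript above). The variable names are mine. It
compares a StringIO left to build its own buffer against one seeded with
an explicitly tagged empty string:

```ruby
require 'stringio'

# No argument: StringIO creates the backing string itself.
implicit = StringIO.new

# Seeded with an empty string that already carries the encoding we want.
explicit = StringIO.new(String.new("", encoding: Encoding::UTF_8))
explicit.puts("abc")

p :default_external => Encoding.default_external
p :implicit => implicit.string.encoding
p :explicit => explicit.string.encoding
```

On a box whose locale is not UTF-8, the first two lines printed should
agree with each other and the third should differ from them.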

This behaviour seems odd to me. I think a better behaviour would be
*either* to always require a string parameter, so that StringIO never
has to pick a default encoding itself, *or* to not create the internal
string in #new at all, and instead #dup the first string passed to
#write or #puts and use that.
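
To make the second option concrete, here is a rough sketch of a wrapper
that delays creating its buffer until the first write and borrows that
string's encoding. LazyStringIO is a hypothetical name, not anything in
the standard library, and it assumes Ruby 2.3 or later for the
`encoding:` keyword:

```ruby
require 'stringio'

# Hypothetical wrapper: not part of Ruby, only an illustration.
class LazyStringIO
  def initialize
    @io = nil # no buffer, and so no encoding choice, until the first write
  end

  def write(str)
    str = str.to_s
    # First write: create an empty buffer tagged with this string's encoding.
    @io ||= StringIO.new(String.new("", encoding: str.encoding))
    @io.write(str)
  end

  def puts(str = "")
    line = str.to_s
    line += "\n" unless line.end_with?("\n")
    write(line)
  end

  def string
    @io ? @io.string : ""
  end
end

s = LazyStringIO.new
s.puts("abc".encode(Encoding::ISO_8859_1))
p :lazy => s.string.encoding
```

The buffer ends up with whatever encoding the first string had, so the
locale never gets a say.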

Thoughts?

--
Alex

--
Posted via http://www.ruby-forum.com/.

Can it not be changed so that it knows the internal encoding, instead? That
would stop you having to break the argument-less constructor or doing any
#dup'ing, no?

On Tue, Sep 20, 2011 at 4:18 PM, Alex Young <alex@blackkettle.org> wrote:

StringIO.new has no knowledge of the file encoding at the
location it's called from.

Alex Young wrote in post #1022945:

This surprised me:

Nothing surprises me any more about encodings in ruby 1.9.

FWIW, there's a similar case with String.new. Whereas a string literal
gets its encoding from the source encoding of the file, String.new
doesn't.

brian@x100:~$ ruby192 -e 'p "".encoding'
#<Encoding:UTF-8>
brian@x100:~$ ruby192 -e 'p String.new.encoding'
#<Encoding:ASCII-8BIT>
brian@x100:~$ echo 'p "".encoding' | ruby192
#<Encoding:UTF-8>
brian@x100:~$ echo 'p String.new.encoding' | ruby192
#<Encoding:ASCII-8BIT>
brian@x100:~$ echo 'p "".encoding' > x.rb && ruby192 x.rb
#<Encoding:US-ASCII>
brian@x100:~$ echo 'p String.new.encoding' > x.rb && ruby192 x.rb
#<Encoding:ASCII-8BIT>

However, String.new doesn't seem to be getting its encoding from the
environment, which your program suggests StringIO.new does.
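
The same comparison can be put in a script rather than run through -e
or a pipe. A small sketch, assuming Ruby 2.3 or later for the
`encoding:` keyword; `__ENCODING__` is the source encoding of the file
being run, and the variable names are mine:

```ruby
source_enc = __ENCODING__

literal = ""                                 # tagged with the source encoding
built   = String.new                         # no argument: binary
tagged  = String.new(encoding: source_enc)   # ask for the source encoding explicitly

p :source  => source_enc
p :literal => literal.encoding
p :built   => built.encoding
p :tagged  => tagged.encoding
```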

All of this is completely undocumented, and therefore whatever behaviour
you get is what you get. Fine if you like stamp collecting though.

Regards,

Brian.

Adam Prescott wrote in post #1022947:

On Tue, Sep 20, 2011 at 4:18 PM, Alex Young <alex@blackkettle.org> wrote:

StringIO.new has no knowledge of the file encoding at the
location it's called from.

Can it not be changed so that it knows the internal encoding, instead?
That would stop you having to break the argument-less constructor or
doing any #dup'ing, no?

I don't know if there's an API for that, but I suspect there isn't. If
there were, then yes, that's the way to do it.

--
Alex

Adam Prescott wrote in post #1022947:

StringIO.new has no knowledge of the file encoding at the
location it's called from.

Can it not be changed so that it knows the internal encoding, instead?
That would stop you having to break the argument-less constructor or doing
any #dup'ing, no?

I don't know if there's an API for that, but I suspect there isn't.

It's not that hard to check:

$ ri StringIO | grep encoding
  external_encoding
  internal_encoding
  set_encoding

If there were, then yes, that's the way to do it.

$ ri StringIO.set_encoding
StringIO.set_encoding

(from ruby core)
------------------------------------------------------------------------------
  strio.set_encoding(ext_enc, [int_enc[, opt]]) => strio

------------------------------------------------------------------------------

Specify the encoding of the StringIO as ext_enc. Use the default
external encoding if ext_enc is nil. 2nd argument int_enc and
optional hash opt argument are ignored; they are for API compatibility
to IO.
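
Going by that description, something like the following ought to pin
the encoding up front without passing a seed string. This is a sketch
against a current Ruby rather than 1.9.2, and how 1.9.2 treats the
backing string here is not something I have checked:

```ruby
require 'stringio'

s = StringIO.new
# Only the first argument matters; per the ri text the rest are ignored.
s.set_encoding(Encoding::UTF_8)
s.puts("abc")

p :external => s.external_encoding
p :stringio => s.string.encoding
```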

Eric Hodel wrote in post #1022972:

On Sep 20, 2011, at 8:32 AM, Alex Young wrote:

I don't know if there's an API for that, but I suspect there isn't.

It's not that hard to check:

$ ri StringIO | grep encoding
  external_encoding
  internal_encoding
  set_encoding

  $ ri StringIO
  Nothing known about StringIO

is what I get. I never assume ri works.

--
Alex

soooo... instead of fixing it and empowering yourself... you choose... what exactly?

On Sep 23, 2011, at 00:07 , Alex Young wrote:

$ ri StringIO
Nothing known about StringIO

is what I get. I never assume ri works.

Ryan Davis wrote in post #1023415:

On Sep 23, 2011, at 00:07 , Alex Young wrote:

$ ri StringIO
Nothing known about StringIO

is what I get. I never assume ri works.

soooo... instead of fixing it and empowering yourself... you choose...
what exactly?

rubydoc.info, usually. Saves fixing it on every single box I ever
touch.

--
Alex

+1, ri has worked for me once before, but rarely does, and I don't enjoy the
format anyway. I used to build docs and host them with `gem server` but now
I turn off ri and rdoc and just use rdoc.info since it has not only core
docs, but also gems.

Occasionally I use ruby-doc.org, and for Rails I use
guides.rubyonrails.org and api.rubyonrails.org.

On Fri, Sep 23, 2011 at 4:05 AM, Alex Young <alex@blackkettle.org> wrote:

Ryan Davis wrote in post #1023415:
> On Sep 23, 2011, at 00:07 , Alex Young wrote:
>
>> $ ri StringIO
>> Nothing known about StringIO
>>
>> is what I get. I never assume ri works.
>
> soooo... instead of fixing it and empowering yourself... you choose...
> what exactly?

rubydoc.info, usually. Saves fixing it on every single box I ever
touch.