Encoding for system calls on Windows

I am trying to understand how encoding interacts with system calls,
particularly on Windows.
I suspect there may be a bug in Ruby in how it invokes system calls with
non-ASCII characters in the command.

I have a demonstration program below (also attached, in case the
encoding gets corrupted in the mailing list). It fails on Windows 7 and
Windows XP, even after I run "chcp 65001". I have my Command Prompt
configured to use the Lucida Console font (as the default Raster Fonts
have even more problems with non-ASCII characters).

# encoding: utf-8

def test(word)
  returned = `echo #{word}`.chomp
  puts "#{word} == #{returned}"
  raise "Cannot roundtrip #{word}" unless word == returned
end

test "good"

test "bÃd"

puts "Success"

# win7, cmd.exe font set to Lucida Console, chcp 65001
# good == good
# bÃd == bÃd

I suspect that something in here
ruby/win32/win32.c at d66c5768caaee16a0c2c2c6411858d23fb9f21a9 · ruby/ruby · GitHub
needs to make sure the string being passed to the cmd.exe process is
encoded properly, but I'm in over my head.
I also suspect that this response holds a clue about the change that may
need to be made:
windows - What encoding/code page is cmd.exe using? - Stack Overflow

I'd like to know:

1) Is this a bug in Ruby on Windows?

or

2) How can I change the program above so that it completes successfully
on Windows?

Attachments:
http://www.ruby-forum.com/attachment/8849/spawn_encode.rb

--
Posted via http://www.ruby-forum.com/.

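Have you tried something like `chcp 65001 >NUL && echo #{word}`? (cmd.exe chains commands with && rather than ; and the >NUL keeps chcp's status line out of the captured output.)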
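
For what it's worth, here is a minimal sketch of the two pieces that seem to be involved: switching the code page inside the same cmd.exe invocation, and telling Ruby which encoding the captured bytes are in. The helper names with_code_page and decode_console_output are hypothetical, the sample word and the choice of code page 437 are assumptions for illustration, and the round trip is only simulated so the snippet runs without cmd.exe. Whether this actually resolves the failure on Windows 7 or XP is untested here.

```ruby
# encoding: utf-8

# Hypothetical helper: prefix a command so that cmd.exe switches its code
# page before running it. cmd.exe chains commands with "&&"; ">NUL" hides
# the status line that chcp prints.
def with_code_page(command, code_page = 65001)
  "chcp #{code_page} >NUL && #{command}"
end

# Hypothetical helper: treat the raw bytes captured from a child process as
# text in the console's encoding, then transcode them to UTF-8 so they can
# be compared with UTF-8 string literals.
def decode_console_output(raw, console_encoding)
  raw.dup.force_encoding(console_encoding).encode(Encoding::UTF_8)
end

# Simulated round trip (no cmd.exe involved): pretend the console answered
# in code page 437 (an assumption) for a sample word with one non-ASCII letter.
word    = "b\u00E4d"
raw     = word.encode("IBM437").force_encoding(Encoding::BINARY)
decoded = decode_console_output(raw, "IBM437")

puts with_code_page("echo #{word}")
puts decoded == word
```

On a real Windows box the idea would be to run the string built by with_code_page through backticks and pass the result to decode_console_output with whatever encoding the console is really using.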

On Tuesday, 22 October 2013 at 5:31, Joshua F. wrote:

I suspect that something in here
ruby/win32/win32.c at d66c5768caaee16a0c2c2c6411858d23fb9f21a9 · ruby/ruby · GitHub
needs to make sure the string being passed to the cmd.exe process is
encoded properly, but I'm in over my head.
I also suspect that this response holds a clue about the change that may
need to be made:
windows - What encoding/code page is cmd.exe using? - Stack Overflow
