Output UTF-16LE BOM to file - 1.9

ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]

With this code:

File.open('zz.txt', 'w:UTF-16LE') do |f|
  f.print "Hello Uni-world"
end

...I get no BOM

guts = File.read('zz.txt')
puts guts.bytes.to_a.inspect

#=> [72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32, 0,...

...and my brain can't concoct a way to insert it myself, though I know
it must be simple...

--
Chris
http://clabs.org

Yeah, it's easy stuff.

A Unicode BOM is just the character U+FEFF encoded at the very beginning of the document. Ruby doesn't write one for you when you open a file with an explicit byte order such as UTF-16LE, but you can insert the character yourself with Ruby 1.9's Unicode escape. It will be transcoded into the proper byte order based on the external encoding of the file you are writing to:

$ cat utf16_bom.rb
# encoding: UTF-8
File.open("utf16_bom.txt", "w:UTF-16LE") do |f|
  f.puts "\uFEFFThis is UTF-16LE with a BOM."
end
$ ruby -v utf16_bom.rb
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-darwin9.6.0]
$ ruby -e 'p File.binread(ARGV.shift)[0..9]' utf16_bom.txt
"\xFF\xFET\x00h\x00i\x00s\x00"
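
If you'd rather wrap that up and check the result from Ruby instead of the shell, here is a minimal sketch of the same idea. The helper names (write_utf16le_with_bom, utf16le_bom?) and the temp-file path are made up for illustration, not part of any library:

```ruby
# encoding: UTF-8
require "tmpdir"

# U+FEFF as it appears on disk in UTF-16LE (low byte first).
UTF16LE_BOM = [0xFF, 0xFE].freeze

# Hypothetical helper: write text as UTF-16LE, prefixed with U+FEFF.
# The "\uFEFF" literal is UTF-8 in this source file; the IO transcodes
# it to the file's external encoding on the way out.
def write_utf16le_with_bom(path, text)
  File.open(path, "w:UTF-16LE") do |f|
    f.print "\uFEFF", text
  end
end

# Hypothetical helper: true if the file starts with the UTF-16LE BOM bytes.
# binread avoids any newline or encoding translation.
def utf16le_bom?(path)
  File.binread(path, 2).bytes.to_a == UTF16LE_BOM
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "zz.txt")   # throwaway file for the demo
  write_utf16le_with_bom(path, "Hello Uni-world")

  p utf16le_bom?(path)                        # => true
  p File.binread(path).bytes.to_a.first(6)    # => [255, 254, 72, 0, 101, 0]
end

# The same character comes out byte-swapped for the big-endian flavor:
p "\uFEFF".encode("UTF-16BE").bytes.to_a      # => [254, 255]
```

Reading the raw bytes back with File.binread is deliberate: it shows exactly what landed on disk, which is what a BOM-sniffing editor will see.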

Hope that helps.

James Edward Gray II

On Apr 9, 2009, at 3:31 PM, Chris Morris wrote:

[snip: original message, quoted in full above]