Karsten Meier wrote...
Now I'd like to create a file with "native Java strings", in this case
strings that can be read with the DataInputStream.readUTF() method.
I hope that this simplifies the code and improves the speed.
I don't need a complete Ruby/Java integration, I just want to write
these Java strings to a file (or to stdout as a CGI script).
Are there any libraries for this?
The way I read the Java API, the format is simply the UTF-8-encoded string
preceded by its length in bytes as a 16-bit integer.
Except that (from
* The null byte '\u0000' is encoded in 2-byte format rather than 1-byte, so
that the encoded strings never have embedded nulls.
* Only the 1-byte, 2-byte, and 3-byte formats are used.
* Supplementary characters are represented in the form of surrogate pairs.
So, in Ruby:
# untested code (I'm on Windows with no iconv)
require 'iconv'
class String
  def to_java_utf
    # Convert the string to UTF-8 (Iconv.new takes the target encoding first)
    result = Iconv.new('utf-8', 'iso-8859-1').iconv(self)
    # Re-encode null characters in two-byte format
    result = result.gsub("\0", "\300\200")
    # The string's size must fit in 2 bytes, so the string must be less than
    # 65536 bytes long
    fail if result.size >= 65536
    # Prepend string length as short integer in network (big-endian) byte order
    [result.size].pack('n') + result
  end
end

text = "Test string\n\0with embedded newline and 0"
p text.to_java_utf #=> "\000)Test string\n\300\200with embedded newline and 0"
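
The code above relies on iconv and assumes Latin-1 input, so it never meets a supplementary character. On a Ruby without iconv (1.9 or later), something along these lines might work instead. It is only a sketch: java_utf is a made-up helper name, not a library method, and it assumes the input string is in an encoding that Ruby can transcode to UTF-16 (e.g. UTF-8). It goes through UTF-16 code units so that supplementary characters come out as surrogate pairs, each in the 3-byte format.

```ruby
# Hypothetical helper (not part of any library): builds the byte layout
# that DataInputStream.readUTF() reads, i.e. a 2-byte big-endian length
# followed by "modified UTF-8" data.
def java_utf(str)
  body = String.new(encoding: Encoding::BINARY)
  # Work on UTF-16 code units, so a supplementary character arrives here
  # as two surrogate units.
  str.encode("UTF-16BE").unpack("n*").each do |unit|
    if unit >= 0x0001 && unit <= 0x007F
      # 1-byte format
      body << unit
    elsif unit <= 0x07FF
      # 2-byte format; U+0000 also lands here and becomes C0 80
      body << (0xC0 | (unit >> 6)) << (0x80 | (unit & 0x3F))
    else
      # 3-byte format (includes each half of a surrogate pair)
      body << (0xE0 | (unit >> 12))
      body << (0x80 | ((unit >> 6) & 0x3F))
      body << (0x80 | (unit & 0x3F))
    end
  end
  # The length prefix is an unsigned 16-bit count of the encoded bytes.
  raise ArgumentError, "encoded string too long" if body.bytesize > 0xFFFF
  [body.bytesize].pack("n") + body
end
```

To write the result out, open the file in binary mode, for example File.binwrite("strings.dat", java_utf("hello")), or call $stdout.binmode before writing to stdout from a CGI script.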