XMLRPC (REXML) incorrectly handles UTF-8 data

Hi,
I'm running ruby 1.9.2-p0 on Centos 5.5 x86_64 along with rails 2.3.8.

I have an XMLRPC server on another Windows machine (ruby 1.9.1) and an
XMLRPC client on the CentOS machine. I need to return UTF-8 encoded data
from the server to the client, and this is where I'm stuck.

The server seems to be sending correct UTF-8 encoded data, but the client
is unable to parse the XML. If the XML contains ASCII-only strings,
everything's OK, but once there is any multi-byte UTF-8 character, ruby
bails out and outputs this:

----------------------
REXML::ParseException (#<Encoding::CompatibilityError: incompatible
encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>
/usr/local/lib/ruby/1.9.1/rexml/source.rb:212:in `match'
/usr/local/lib/ruby/1.9.1/rexml/source.rb:212:in `match'
/usr/local/lib/ruby/1.9.1/rexml/parsers/baseparser.rb:425:in `pull'
/usr/local/lib/ruby/1.9.1/rexml/parsers/streamparser.rb:16:in `parse'
/usr/local/lib/ruby/1.9.1/rexml/document.rb:204:in `parse_stream'
/usr/local/lib/ruby/1.9.1/xmlrpc/parser.rb:717:in `parse'
/usr/local/lib/ruby/1.9.1/xmlrpc/parser.rb:460:in `parseMethodResponse'
/usr/local/lib/ruby/1.9.1/xmlrpc/client.rb:421:in `call2'
/usr/local/lib/ruby/1.9.1/xmlrpc/client.rb:410:in `call'
......
----------------------

There seems to be something wrong with REXML's parsing of non-ASCII data,
or maybe with its encoding detection. I've tracked it down to the "match"
method of the IOSource wrapper class in rexml/source.rb. The problem seems
to be that the @buffer the method matches against is sometimes an
ASCII-8BIT string. Strangely, this happens only when the buffer contains
some non-ASCII data. If there are only ASCII characters in @buffer, it
happily proceeds as UTF-8.
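
For anyone following along, the error itself can be reproduced outside
REXML. This is only a sketch, not REXML's code: it assumes the pattern has
a fixed UTF-8 encoding (which is what "UTF-8 regexp" in the message
suggests), and the helper try_match and the sample buffers are made up for
illustration.

```ruby
# The /u flag gives the regexp a fixed UTF-8 encoding.
PATTERN = /<string>/u

# Hypothetical helper: returns nil when the match is allowed,
# or the exception when Ruby refuses to run it.
def try_match(pattern, buffer)
  pattern.match(buffer)
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Both buffers are tagged ASCII-8BIT (binary), like the @buffer described above.
ascii_only    = "<string>ok</string>".dup.force_encoding("ASCII-8BIT")
# "\u0432\u0438\u0440\u0443\u0441" is the Cyrillic word from the server reply.
with_cyrillic = "<string>\u0432\u0438\u0440\u0443\u0441</string>".dup.force_encoding("ASCII-8BIT")

ascii_error    = try_match(PATTERN, ascii_only)
cyrillic_error = try_match(PATTERN, with_cyrillic)

puts ascii_error.inspect
puts cyrillic_error.class
```

A binary string that holds only 7-bit bytes is still compatible with a
UTF-8 regexp, which would explain why the ASCII-only responses never fail.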

BTW, my client script looks like this:
----------------------
module SubmitFilesHelper

  @rpc_server_url='http://172.16.1.2:3000'

  def self.sendToServer(filename,language)
    require 'xmlrpc/client'
    server = XMLRPC::Client.new2(@rpc_server_url)
    result = server.call('check', filename,language)
  end
end
----------------------

CentOS has its locale set to en_US.UTF-8.

Is there anything I'm doing wrong, or is this a Ruby bug?

Thanks,
Petr

--
Posted via http://www.ruby-forum.com/.

Hm, it's possible to encode the offending string to base64 before
handing it to xmlrpc, effectively bypassing any ruby 1.9 encoding
awareness. Not exactly what I would like to see...

Anyway, is there a correct solution to my problem? Base64 encoding is a
working solution, but not the correct one, as I'm manually bypassing a
language feature worth having.
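
A rough sketch of that Base64 detour, using only core pack/unpack so it
does not depend on the xmlrpc library. The helper names
encode_for_transport and decode_from_transport are made up here, and the
sample text is just an illustration.

```ruby
# Hypothetical helper: turn arbitrary text into a 7-bit ASCII payload.
def encode_for_transport(text)
  [text].pack("m0") # "m0" is Base64 without line breaks
end

# Hypothetical helper: decode the payload and re-tag the bytes as UTF-8.
# unpack returns a binary (ASCII-8BIT) string, so the tag has to be restored.
def decode_from_transport(payload)
  payload.unpack("m0").first.force_encoding("UTF-8")
end

original = "\u0432\u0438\u0440\u0443\u0441 EICAR_Test"
payload  = encode_for_transport(original)
restored = decode_from_transport(payload)

puts payload.ascii_only?
puts restored == original
```

The payload never contains a non-ASCII byte, so the parser has nothing to
trip over, but both ends have to agree on the extra encoding step.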

Cheers,
Petr

try,
  Encoding.default_internal = Encoding.default_external = "UTF-8"
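
A small sketch of what these two settings change for an ordinary file
read. Whether the XMLRPC/HTTP path is affected in the same way is an
assumption, not something confirmed in this thread; read_back is a made-up
helper.

```ruby
require "tempfile"

# Process-wide defaults: how data read from IO is tagged (external)
# and what it is transcoded to (internal).
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

# Hypothetical helper: write raw bytes to a temp file, then read them back
# in text mode so the default encodings apply.
def read_back(bytes)
  Tempfile.create("encoding-sample") do |file|
    file.binmode
    file.write(bytes)
    file.flush
    File.read(file.path)
  end
end

text = read_back("\u0432\u0438\u0440\u0443\u0441")
puts text.encoding
```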

best regards -botp

On Tue, Nov 16, 2010 at 10:37 PM, Petr Klima <petr.klima@avg.com> wrote:

REXML::ParseException (#<Encoding::CompatibilityError: incompatible
encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>

Hi,

In <77a0ce708f3cb00f79bcc1b13733c71b@ruby-forum.com>
  "XMLRPC (REXML) incorrectly handles UTF-8 data" on Tue, 16 Nov 2010 23:37:48 +0900,

Petr Klima <petr.klima@avg.com> wrote:

I have an XMLRPC server on another Windows machine (ruby 1.9.1) and an
XMLRPC client on the CentOS machine. I need to return UTF-8 encoded data
from the server to the client, and this is where I'm stuck.

The server seems to be sending correct UTF-8 encoded data, but the client
is unable to parse the XML. If the XML contains ASCII-only strings,
everything's OK, but once there is any multi-byte UTF-8 character, ruby
bails out and outputs this:

Could you show us a reproducible example? We need at least
the HTTP response header and the XML response from your
XML-RPC server.

Thanks,
--
kou

Hi,
here is the reply from XMLRPC server:

HTTP header:

---------------
HTTP/1.1 200: OK
Content-Length: 921
Content-Type: text/xml; charset=utf-8
Server: WEBrick/1.3.1 (Ruby/1.9.1/2010-01-10)
Date: Thu, 18 Nov 2010 07:57:17 GMT
Connection: Keep-Alive
---------------

XML response (should be one line):
---------------
<?xml version="1.0"
?><methodResponse><params><param><value><struct><member><name>result</name><value><string>ok
</string></value></member><member><name>program_ver</name><value><string>10.0.1153</string></value></member><member><na

engine_ver</name><value><string>10.0.424</string></value></member><member><name>virus_db_ver</name><value><string>42

4/3263
2010-11-1</string></value></member><member><name>threat_desc</name><value><string>Определен
вирус EICAR_Test </s

</value></member><member><name>infections_found</name><value><string>1</string></value></member><member><name>pup

s_found</name><value><string>0</string></value></member><member><name>infections_healed</name><value><string>0</string>
</value></member><member><name>pups_healed</name><value><string>0</string></value></member><member><name>warnings</name

<value><string>0</string></value></member></struct></value></param></params></methodResponse>

---------------
As you can see, there's a correct UTF-8 string in Cyrillic in the middle
of the XML.
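
One way to double-check that claim on the client side is to tag the raw
body with the charset announced in the Content-Type header and ask Ruby
whether the bytes are valid. This is a sketch only: charset_from and
tag_body are made-up helpers, and the sample body is a short stand-in for
the real response.

```ruby
# Hypothetical helper: pull the charset parameter out of a Content-Type value.
def charset_from(content_type)
  content_type[/charset=([^;\s]+)/i, 1]
end

# Hypothetical helper: return a copy of the body tagged with the announced
# charset. The bytes are not converted, only relabelled.
def tag_body(raw_body, content_type)
  charset = charset_from(content_type)
  body = raw_body.dup
  body.force_encoding(charset) if charset
  body
end

# Stand-in for the bytes as they arrive from the socket (binary).
raw  = "<string>\u0432\u0438\u0440\u0443\u0441</string>".dup.force_encoding("ASCII-8BIT")
body = tag_body(raw, "text/xml; charset=utf-8")

puts body.encoding
puts body.valid_encoding?
```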

BTW, botp's suggested solution (Encoding.default_internal =
Encoding.default_external = "UTF-8") doesn't work under the Apache module
Passenger 3.0.0.

botp wrote in post #961846:

try,
  Encoding.default_internal = Encoding.default_external = "UTF-8"

Damn, I have seen this before and I would swear I tried it and it didn't
help (I was using 1.9.1 at the time). Hm, probably somehow slipped
between my fingers. Thanks a lot, works now :)

Hi,

In <417d7f496f19d3bb0dd933bc74983483@ruby-forum.com>
  "Re: XMLRPC (REXML) incorrectly handles UTF-8 data" on Thu, 18 Nov 2010 17:21:45 +0900,

Petr Klima <petr.klima@avg.com> wrote:

Hi,
here is the reply from XMLRPC server:

HTTP header:

...

XML response (should be one line):

...

As you can see, there's correct UTF-8 string in cyrillic in the middle
of the XML.

Thanks. I can reproduce it.
This has already been fixed in trunk.

This is a problem in REXML, but maybe the following code will
work around it. (I haven't tried it. Sorry.)

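module SubmitFilesHelper
  # Works around REXML receiving the response body tagged as ASCII-8BIT.
  module XMLRPCWorkAround
    def do_rpc(request, async=false)
      data = super
      # Re-tag the response bytes as UTF-8; the bytes themselves are unchanged.
      data.force_encoding("UTF-8")
      data
    end
  end

  @rpc_server_url='http://172.16.1.2:3000'

  def self.sendToServer(filename,language)
    require 'xmlrpc/client'
    server = XMLRPC::Client.new2(@rpc_server_url)
    # Install the workaround on this client instance only.
    server.extend(XMLRPCWorkAround)
    server.call('check', filename,language)
  end
end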
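
The extend-and-super pattern used above can be tried without a server.
The sketch below is untested against the real library: StubClient is a
made-up stand-in for XMLRPC::Client (it only assumes that call goes
through a private do_rpc that returns the raw body), and ForceUtf8 is a
made-up name for the workaround module.

```ruby
# Hypothetical stand-in for XMLRPC::Client: "call" delegates to a private
# do_rpc that returns the response body as binary data.
class StubClient
  def call(method_name)
    do_rpc(method_name)
  end

  private

  def do_rpc(request, async = false)
    "<string>\u0432\u0438\u0440\u0443\u0441</string>".dup.force_encoding("ASCII-8BIT")
  end
end

# Same idea as the workaround module: let the original do_rpc run,
# then re-tag its result as UTF-8.
module ForceUtf8
  def do_rpc(request, async = false)
    super.force_encoding("UTF-8")
  end
end

plain   = StubClient.new
patched = StubClient.new.extend(ForceUtf8) # affects this one object only

plain_body   = plain.call("check")
patched_body = patched.call("check")

puts plain_body.encoding
puts patched_body.encoding
```

Because extend only touches the single client object, other clients in
the same process keep the stock behaviour.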

Thanks,

--
kou