I am doing a quick and dirty automatic translation from English to
Spanish of some text in an XML document.
However, the translation returns characters outside the 7-bit range,
which seems to create an invalid XML document. I need those strings
UTF-8 encoded before I set the text of an element, but I can't see how to
do this.
Thanks for any help
Regards
Ralph
A test doc looks like
<?xml version='1.0' encoding='UTF-8'?>Vehicle
Full code.
require 'net/http'
require 'cgi'
require 'rexml/document'

# Fetch the translation page for the given text and collect the response body.
def translate(text)
  puts "translating #{text}"
  ret = ""
  Net::HTTP.start('translate.google.com') { |session|
    session.get("/translate_t?langpair=en|es&hl=en&text=#{CGI.escape(text)}") {
      |result| ret << result
    }
  }
  # Pull the translated text out of the returned HTML.
  ret =~ /(name=q.*?>)(.*?)</
  $2
end
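
# Translate the text of a node, then recurse into its child elements.
def process(node)
  puts node.name
  # node.text is nil for elements that have no text of their own
  node.text = translate(node.text) if node.text && node.text.strip != ""
  node.elements.each { |x| process x }
end

doc = REXML::Document.new(File.new("lang_eng.xml"))
doc.elements.each { |x| process x }
doc.write(File.new("lang_spn.xml", "w"), 0)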
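
For reference, a minimal sketch of the kind of conversion being asked about. It assumes Ruby 1.9 or later (for String#encode) and assumes the translated text arrives as ISO-8859-1 bytes; the real response charset is not confirmed here and should be checked against the Content-Type header. The helper name to_utf8 is hypothetical.

```ruby
# Hypothetical helper: relabel the raw bytes with the encoding they are
# assumed to be in (ISO-8859-1 by default), then transcode to UTF-8.
def to_utf8(raw, source_encoding = 'ISO-8859-1')
  # dup so the caller's string is left untouched; force_encoding only
  # changes the label, encode does the actual byte conversion
  raw.dup.force_encoding(source_encoding).encode('UTF-8')
end

# Sample input: "Vehiculo" with an accented i as the single Latin-1 byte 0xED
latin1 = "Veh\xEDculo".b
utf8 = to_utf8(latin1)

puts utf8.encoding
puts utf8.valid_encoding?
```

With something like this, process could call node.text = to_utf8(translate(node.text)) so that only UTF-8 strings reach REXML.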