RTranslate Gem (Open-URI) and Encoding

I'm using the rtranslate gem (sishen-rtranslate) to handle translating
some text. It uses open-uri to scrape Google Translate. If I attempt
to translate text from English to Spanish it outputs some garbage
characters.

Translating the word "where" returns "dónde" instead of "dónde"

Any idea why this is happening and what I can do to fix this?

Thanks!

--
Posted via http://www.ruby-forum.com/.

You need to specify the encoding in your Ruby script. Ruby (1.8 at least; I am not certain about 1.9) will use your system encoding for strings by default.

Set this global variable in your script to make Ruby process strings as UTF-8, independently of your machine:
$KCODE = 'u'

There is more detail here:

Note that Ruby could be processing the Google Translate response correctly (i.e. you are doing everything above), but if you are writing the result to the console/standard output (via puts), your machine may still render the UTF-8 text according to the host system's encoding. This is a particularly annoying problem on Windows systems, in my experience. Write the output to a file instead if you are seeing this problem.

regards,
Richard.

The amusing part is that the first one looks fine to me.

I suspect this means that you're getting UTF8 when you expect something
in some particular encoding, so you should be specifying encodings...

-s

--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
| Seebs.Net <-- lawsuits, religion, and funny pictures
Fair game (Scientology) - Wikipedia <-- get educated!

Richard Conroy wrote:

You need to specify the encoding in your Ruby script. Ruby (1.8 at least; I am not certain about 1.9) will use your system encoding for strings by default.

Set this global variable in your script to make Ruby process strings as UTF-8, independently of your machine:
$KCODE = 'u'

Note that Ruby could be processing the Google Translate response correctly (i.e. you are doing everything above), but if you are writing the result to the console/standard output (via puts), your machine may still render the UTF-8 text according to the host system's encoding. This is a particularly annoying problem on Windows systems, in my experience. Write the output to a file instead if you are seeing this problem.

I already had the $KCODE variable set. It didn't seem to have any effect in this case. I wrote the translated text to a file to rule out a display issue with the console, and the text was still incorrect in the file.

Any other ideas?

Thanks.

Translating the word "where" returns "dónde" instead of "dónde"

The amusing part is that the first one looks fine to me.

Indeed. The first one is properly encoded in UTF-8, the second in ISO-8859-1.

-Jonathan Nielsen
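
For anyone trying to tell the two apart without trusting a terminal or a browser, here is a minimal sketch that dumps the raw bytes of a string. The helper name hex_dump and the two sample strings are made up for illustration; they are not taken from rtranslate. UTF-8 encodes the accented o as the two bytes C3 B3, while ISO-8859-1 uses the single byte F3.

```ruby
# Sample data (assumed, not captured from Google Translate):
utf8_bytes   = "d\xC3\xB3nde" # accented o stored as UTF-8 (C3 B3)
latin1_bytes = "d\xF3nde"     # accented o stored as ISO-8859-1 (F3)

# Hypothetical helper: show every byte of the string as two hex digits.
# unpack("C*") reads the string byte by byte, so the result does not
# depend on $KCODE, the string's encoding tag, or the terminal settings.
def hex_dump(str)
  str.unpack("C*").map { |byte| "%02X" % byte }.join(" ")
end

puts hex_dump(utf8_bytes)   # 64 C3 B3 6E 64 65
puts hex_dump(latin1_bytes) # 64 F3 6E 64 65
```

If the string coming back from the gem dumps like the first line, the data itself is UTF-8 and whatever displays it is probably assuming a different encoding.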

Seebs wrote:

The amusing part is that the first one looks fine to me.

I suspect this means that you're getting UTF8 when you expect something
in some particular encoding, so you should be specifying encodings...

Where would I specify the encoding to fix this problem? And yes, I just
noticed that in a reply it DOES look correct, but when I posted it, it
was not. I'm guessing somewhere (other than $KCODE) I need to set it as
UTF-8.

Thanks.

Are you using Ruby 1.8 or Ruby 1.9? In 1.9, you should call string.force_encoding("UTF-8") or string.force_encoding("ISO-8859-1"), as needed. In Ruby 1.8, I think it just works with the bits you provide it, and it's your terminal that determines what actually gets displayed.

-Jonathan Nielsen
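
A rough sketch of the Ruby 1.9+ side of this, assuming the bytes arrive untagged (binary) from the network. The variable names and the sample bytes are illustrative only. The point it tries to show: force_encoding only relabels the existing bytes, while encode actually converts them, so picking the wrong label before converting reproduces the garbage from the original post.

```ruby
# Assumed input: the UTF-8 bytes for the Spanish word, tagged as binary.
raw = "d\xC3\xB3nde".dup.force_encoding("BINARY")

# Relabel only: the bytes stay the same, Ruby now reads them as UTF-8.
utf8 = raw.dup.force_encoding("UTF-8")

# Wrong label, then a real conversion: each UTF-8 byte is treated as its
# own Latin-1 character, which yields the familiar two-character garbage.
garbled = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Real conversion in the other direction: UTF-8 text to Latin-1 bytes.
latin1 = utf8.encode("ISO-8859-1")

puts utf8.valid_encoding?  # true
puts utf8.length           # 5
puts garbled.length        # 6
puts latin1.bytes.inspect  # [100, 243, 110, 100, 101]
```

So in 1.9 the first thing to check is which encoding the string is tagged with (string.encoding) compared with what the bytes really are.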

There, I can't help you. I don't understand encodings at all.

-s

Jonathan Nielsen wrote:

Are you using ruby 1.8 or ruby 1.9? In 1.9, you should do
string.force_encoding("UTF-8") or
string.force_encoding("ISO-8859-1")... as needed. In ruby 1.8, I
think it just works with the bits you provide it and it's your
terminal that determines what actually gets displayed.

I'm using 1.8.7. I don't think it's the terminal, but I'm not entirely sure. I'm writing the translation to a text file, but technically I'm viewing that file in a terminal app (PuTTY), so it may be getting garbled there.

Thanks.

If you're using 1.8, you can transcode between ISO-8859-1 and UTF-8 with Iconv:
http://ruby-doc.org/stdlib/libdoc/iconv/rdoc/index.html

-Jonathan Nielsen
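
A minimal sketch of that transcoding step, assuming the input really is ISO-8859-1. The helper name latin1_to_utf8 is hypothetical. Iconv shipped with Ruby 1.8 and 1.9 but was removed from the standard library in 2.0, so the sketch falls back to String#encode when Iconv cannot be loaded.

```ruby
begin
  require 'iconv' # standard library on Ruby 1.8/1.9, a separate gem later
rescue LoadError
  # No Iconv available: the helper below uses String#encode instead.
end

# Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
def latin1_to_utf8(bytes)
  if defined?(Iconv)
    # Iconv.conv takes the target encoding first, then the source encoding.
    Iconv.conv('UTF-8', 'ISO-8859-1', bytes)
  else
    # Ruby 1.9+ path: tag the bytes as Latin-1, then transcode them.
    bytes.dup.force_encoding('ISO-8859-1').encode('UTF-8')
  end
end

converted = latin1_to_utf8("d\xF3nde")
puts converted.unpack("C*").inspect # [100, 195, 179, 110, 100, 101]
```

Converting in the other direction is the same call with the two encoding names swapped.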

Make sure PuTTY is set for UTF-8.

Eric Christopherson wrote:

Make sure PuTTY is set for UTF-8.

Aha! Well that fixed the problem of being able to see the correct
output in the terminal. It should greatly help the debugging process
now. I'm then taking the encoded string and transferring it with XML via
a socket connection. I'll have to look into the transfer to see if it's
breaking there.

Thanks for the help.