Hello,
is there any way, to detect text encoding?
For example, is it in utf8, or in win1251, or something else.
Thank you.
You can't detect one-byte-per-character encodings easily (i.e. without
statistical analysis) but you can easily tell if something's UTF-8 or
not:
    class String
      # True if the bytes decode as UTF-8; unpack('U*') raises
      # ArgumentError on a malformed sequence.
      def is_utf8?
        unpack('U*')
        true
      rescue ArgumentError
        false
      end
    end
    "foo".is_utf8? #=> true
    "foo\303".is_utf8? #=> false
Not the most efficient way, necessarily, but probably the easiest.
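For readers on Ruby 1.9 or later, the same check can be sketched with the built-in encoding support instead of unpack. The helper name utf8_bytes? is made up for illustration and is not from this thread. Note that passing the check only means the bytes are well-formed UTF-8; short text in another encoding can pass by coincidence.

```ruby
# Hypothetical helper (not from the thread): true when the bytes of +str+
# are well-formed UTF-8. Assumes Ruby 1.9 or later.
def utf8_bytes?(str)
  # dup so the caller's string keeps its original encoding tag
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

utf8_bytes?("foo")                          # true: ASCII is also valid UTF-8
utf8_bytes?("foo\xC3")                      # false: lead byte with no continuation
utf8_bytes?("\xCF\xF0\xE8\xE2\xE5\xF2".b)   # false: "Привет" as windows-1251 bytes
```

Unlike the unpack version, this does not build an array of code points, so it may be cheaper on large strings.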
Paul.
On 10/07/06, xTRiM <rtokarev@gmail.com> wrote:
is there any way, to detect text encoding?
For example, is it in utf8, or in win1251, or something else.
Hi,
2006/7/10, xTRiM <rtokarev@gmail.com>:
Hello,
is there any way, to detect text encoding?
For example, is it in utf8, or in win1251, or something else.
You can use the standard lib NKF's guess or guess2 (ruby 1.8.2 or
later) method for that. Look up the NKF section in
http://www.ruby-doc.org/stdlib/.
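A minimal sketch of calling NKF.guess, assuming Ruby 1.9 or later, where it returns an Encoding object rather than the integer constant of the 1.8 series. The wrapper name nkf_guess_name is made up for illustration. The LoadError guard is there because nkf may need to be installed as a gem on recent Rubies. Keep in mind that NKF is built for Japanese encodings (JIS, EUC-JP, Shift_JIS, UTF-8), so it will not identify win1251.

```ruby
begin
  require 'nkf'
  NKF_LOADED = true
rescue LoadError
  # nkf is not available on this installation
  NKF_LOADED = false
end

# Hypothetical wrapper: the name of the encoding NKF guesses for +bytes+,
# or nil when the nkf library could not be loaded.
def nkf_guess_name(bytes)
  return nil unless NKF_LOADED
  NKF.guess(bytes).to_s
end

puts nkf_guess_name("abc").inspect
```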
Takashi Sano
In the general case, there's *no safe way* to do this, unless the data is XML or comes with an HTTP header from a reliable server (ha ha ha, I'm sure there must be one somewhere). Probably the best auto-detector is Mark Pilgrim's, but it's in Python: http://chardet.feedparser.org/
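When the data does declare its encoding, reading the declaration can be sketched as below. Both helper names are made up for illustration, and the regular expressions are deliberately loose rather than full parsers. The XML helper only handles ASCII-compatible documents, so a UTF-16 file would need a byte-order-mark check first. Whether to trust the declared value is a separate question.

```ruby
# Hypothetical helper: the charset named in a Content-Type header value,
# or nil when none is declared.
def charset_from_content_type(value)
  m = value.to_s.match(/;\s*charset\s*=\s*"?([^";\s]+)"?/i)
  m && m[1]
end

# Hypothetical helper: the encoding named in an XML declaration at the very
# start of the document, or nil when there is none.
def encoding_from_xml_declaration(bytes)
  # Look only at the first bytes, as raw binary, so that invalid sequences
  # later in the data cannot interfere with the match.
  head = bytes.byteslice(0, 200).force_encoding(Encoding::BINARY)
  m = head.match(/\A<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/)
  m && m[1]
end

puts charset_from_content_type('text/html; charset=windows-1251')
puts encoding_from_xml_declaration(%(<?xml version="1.0" encoding="UTF-8"?><a/>))
```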
-Tim
On Jul 10, 2006, at 4:47 AM, Takashi Sano wrote:
is there any way, to detect text encoding?
For example, is it in utf8, or in win1251, or something else.
You can use the standard lib NKF's guess or guess2 (ruby 1.8.2 or
later) method for that. Look up the NKF section in
http://www.ruby-doc.org/stdlib/.
Nice pointer, Tim. I'll have to check that out. I did a quick web search
and found a Ruby port incidentally (I have not evaluated it in any way
though):
http://rubyforge.org/projects/chardet/ by Hui Zheng
gem name is "chardet"
Jake