Nkf #guess1 and #guess2 on html files

Pere_Noel1 · 23 March 2006 08:08

Maybe I'm not using NKF's #guess1 correctly, but it gives me return value 3
(supposed to be UTF-8) for ISO-8859-1 encoded files.

It also gives me 3 for UTF-8 encoded files???

My code is simply:

NKF.guess1(string)

with string=<whole file content>

Also, guess1 sometimes disagrees with guess2???

Where could I find a table mapping the returned values to encodings?

--
une bévue

YANAGAWA_Kazuhisa · 23 March 2006 11:16

In Message-Id: <1hcnady.1rrszh87n73rfN%pere.noel@laponie.com.invalid>
pere.noel@laponie.com.invalid (Une bévue) writes:

Maybe I'm not using NKF's #guess1 correctly, but it gives me return value 3
(supposed to be UTF-8) for ISO-8859-1 encoded files.

It also gives me 3 for UTF-8 encoded files???

Unfortunately NKF is a tool just for Japanese, so you can't use it for
general code conversion / guessing, I think.

--
kjana@dm4lab.to March 23, 2006
Out of sight, out of mind.

Pere_Noel1 · 23 March 2006 11:53

OK, fine. As a first step I just need a tool to discriminate between ISO-8859-1
and UTF-8 without relying on the meta content-type charset in the HTML file,
which isn't reliable. For example, a Ruby Cocoa site (rubycocoa.com) declares
ISO-8859-1 in its meta tag but is in fact UTF-8 (according to Firefox, a text
editor, and also the HTTP headers...).
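
For this two-way case, one common heuristic is to test whether the raw bytes
form valid UTF-8 and to fall back to ISO-8859-1 otherwise, since any byte
sequence is technically valid ISO-8859-1. The following is only a minimal
sketch under that assumption. It relies on the String encoding API of Ruby 1.9
and later, which did not exist when this thread was written, and the method
names guess_utf8_or_latin1 and to_utf8 are hypothetical, not part of NKF or any
library.

```ruby
# Heuristic sketch: decide between UTF-8 and ISO-8859-1 from the bytes alone.
# Assumption: the input is one of these two encodings and nothing else.
def guess_utf8_or_latin1(bytes)
  # Reinterpret a copy of the bytes as UTF-8 and check that they are well formed.
  candidate = bytes.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? Encoding::UTF_8 : Encoding::ISO_8859_1
end

# Hypothetical helper: return the content re-encoded as UTF-8.
def to_utf8(bytes)
  guessed = guess_utf8_or_latin1(bytes)
  bytes.dup.force_encoding(guessed).encode(Encoding::UTF_8)
end

latin1_bytes = "b\xE9vue".b      # "bévue" as ISO-8859-1 bytes
utf8_bytes   = "b\xC3\xA9vue".b  # "bévue" as UTF-8 bytes

puts guess_utf8_or_latin1(latin1_bytes)
puts guess_utf8_or_latin1(utf8_bytes)
puts to_utf8(latin1_bytes)
```

Pure ASCII content is reported as UTF-8, which is harmless because ASCII is
byte-identical in both encodings. An ISO-8859-1 file whose accented bytes
happen to form valid UTF-8 sequences would be misdetected, but that is rare in
real text.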

YANAGAWA Kazuhisa <kjana@dm4lab.to> wrote:

Unfortunately NKF is a tool just for Japanese, so you can't use it for
general code conversion / guessing, I think.

--
une bévue
