-------- Original Message --------
Date: Tue, 19 Aug 2008 00:23:17 +0900
From: Pavel Drobushevich <p.drobushevich@gmail.com>
To: ruby-talk@ruby-lang.org
Subject: Re: Ruby 1.8.* convert string to utf-8
Dear Axel,
Thank you for your answer.
But maybe I didn't explain my problem well; I have some trouble with
English.
> require 'iconv'
>
> s =IO.read('kknta10.txt')
>
> ic = Iconv.iconv('utf-8', 'cp1251',s)
> f=File.new("t.txt","w")
> f.puts ic
> f.close
>
It's a good idea, and I used it. But I have many files with different
encodings (utf8, cp1251, ucs-2le, ...) and I need to convert all of these
files to utf-8 with one piece of code. I need to identify the encoding of
each file at run time rather than hard-code a constant for every file,
because the files are generated by another system.
Thanks
--
Posted via http://www.ruby-forum.com/.
Dear Pavel,
maybe it's a good idea to ask this question on a specialised Russian-language
Ruby forum, but otherwise, I'd say that across (sufficiently long) Russian documents,
the most frequent letters will (most often) be the same.
You could count the frequencies of the letters in your documents like so:
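class Array
  # Count how often each element occurs; returns a hash element => count.
  def count
    k = Hash.new(0)
    self.each { |x| k[x] += 1 }
    k
  end
end
s = IO.read(input_file_name).split(//).count
p s
# Sort by ascending frequency, so the most frequent letters end up last.
freq = s.sort { |x, y| x[1] <=> y[1] }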
I'd then convert, say, the five most frequent
letters into each of the possible encodings.
For each of your many files, you count the frequencies as well,
and select the encoding whose converted letters share the greatest
number of keys with that file's "five most frequent letters" hash.
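A rough sketch of that selection step might look like the following. It is a
variation, not exactly the procedure above: instead of converting the
reference letters into each encoding, it decodes the file under each candidate
encoding and compares the most frequent letters it finds with the reference
letters. It assumes Ruby 1.9 or later (String#force_encoding and
String#encode); on 1.8, Iconv would have to play that role. The candidate
list, the five reference letters, and the helper names top_letters, score and
guess_encoding are all made up for illustration.

```ruby
# Candidate encodings to try (an assumed list; adjust to your files).
CANDIDATES = ['UTF-8', 'Windows-1251', 'UTF-16LE']

# Five letters assumed to be among the most frequent in Russian text:
# o, e, a, i, n (written as escapes so the source encoding does not matter).
COMMON_LETTERS = ["\u043E", "\u0435", "\u0430", "\u0438", "\u043D"]

# Return the n most frequent letters of a UTF-8 string.
def top_letters(text, n)
  counts = Hash.new(0)
  text.scan(/\p{L}/) { |ch| counts[ch] += 1 }
  counts.sort_by { |_, c| -c }.first(n).map { |ch, _| ch }
end

# Score one candidate: how many of the file's most frequent letters,
# when the bytes are read as `enc`, are in the reference list.
# Returns -1 if the bytes cannot be that encoding at all.
def score(bytes, enc, letters)
  decoded = bytes.dup.force_encoding(enc)
  return -1 unless decoded.valid_encoding?
  text = decoded.encode('UTF-8')
  (top_letters(text, letters.size) & letters).size
rescue EncodingError
  -1
end

# Pick the candidate encoding with the highest score.
def guess_encoding(bytes, candidates = CANDIDATES, letters = COMMON_LETTERS)
  candidates.max_by { |enc| score(bytes, enc, letters) }
end

# Possible usage (file names are placeholders):
#   bytes = File.binread('kknta10.txt')
#   enc   = guess_encoding(bytes)
#   File.open('t.txt', 'w:UTF-8') { |f| f.write(bytes.force_encoding(enc).encode('UTF-8')) }
```

This only works for sufficiently long texts, and a tie between candidates is
resolved in favour of the one listed first.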
Best regards,
Axel