What's going on here? This is on Mac OS X 10.4.4. It looks like Dir#entries returns strings in some encoding I didn't expect. How can I convert the string to UTF-8?
You have got a correct UTF-8 string. Unlike Windows XP, Mac OS X
decomposes character components as much as possible (Sorry, I forgot
the correct term for this policy). So what you got:
"HFS Plus stores strings fully decomposed and in canonical order. HFS
Plus compares strings in a case-insensitive fashion. Strings may
contain Unicode characters that must be ignored by this comparison.
For more details on these subtleties, see Unicode Subtleties."
-A
On 1/12/06, Yukihiro Matsumoto <matz@ruby-lang.org> wrote:
Hi,
On 1/13/06, Timo Hoepfner <th-dev@onlinehome.de> wrote:
> What's going on here? This is on Mac OS X 10.4.4. Looks like
> Dir#entries returns strings encoded with some encoding I didn't
> expect. How can I convert the string to UTF-8?
You have got a correct UTF-8 string. Unlike Windows XP, Mac OS X
decomposes character components as much as possible (Sorry, I forgot
the correct term for this policy). So what you got:
is the decomposed form of your string: a+umlaut, o+umlaut, etc.
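To make the composed/decomposed distinction concrete, here is a minimal sketch. It assumes a modern Ruby (2.2 or later), where `String#unicode_normalize` is available; that method did not exist in the Ruby 1.8 used in this thread. The `describe` helper is a hypothetical name introduced only for illustration.

```ruby
# Assumption: Ruby 2.2 or later (String#unicode_normalize is not in Ruby 1.8).
composed   = "\u00E4"    # "ä" as a single precomposed code point
decomposed = "a\u0308"   # "a" followed by U+0308 COMBINING DIAERESIS

# Hypothetical helper: summarize how a string is stored.
def describe(str)
  { chars: str.chars.length, bytes: str.bytesize, valid: str.valid_encoding? }
end

composed_info   = describe(composed)     # {:chars=>1, :bytes=>2, :valid=>true}
decomposed_info = describe(decomposed)   # {:chars=>2, :bytes=>3, :valid=>true}

# Both are valid UTF-8, but they are different byte sequences.
same_bytes = (composed == decomposed)                             # false
same_after_nfd = (composed.unicode_normalize(:nfd) == decomposed) # true
```

Both strings display as the same letter, which is why the decomposed form returned by the file system can look like a wrong encoding at first glance.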
Hi Matz, Austin and A.
Thanks for the clarification. Unicode is more complex than it seems at first...
Nevertheless, that doesn't solve my current problem. What I'm trying to do is organize the files within a directory into subfolders based on the first N characters of the file name. Here's my code (w/o error handling), which works fine for 8-bit characters but doesn't work for e.g. umlauts:
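$KCODE='UTF8'   # Ruby 1.8: treat strings as UTF-8
require 'jcode'
require 'pathname'
require 'fileutils'

# ARGV[0]: directory to organize, ARGV[1]: prefix length N
wd, len = Pathname.new(ARGV[0]), ARGV[1].to_i
# Skip entries that are directories
files = wd.children.reject { |f| f.directory? }
files.each do |f|
  # Target folder is named after the first len characters of the file name
  dir = wd + Pathname.new(f.basename.to_s.split(//)[0..len-1].join)
  dir.mkdir unless dir.exist?
  FileUtils.mv f, dir
end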
I guess I have to recompose the decomposed filename somehow. Are there any tools for that in the standard library or somewhere else?
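One possible way to do the recomposition, sketched under a loud assumption: it relies on `String#unicode_normalize`, which is part of Ruby 2.2 and later and was not available in the Ruby 1.8 of this thread. The helper name `prefix_for` and the sample file name are hypothetical, chosen only to illustrate the idea.

```ruby
# Assumption: Ruby 2.2 or later; String#unicode_normalize does not exist in 1.8.

# Hypothetical helper: recompose the name to NFC first, then take the
# first `len` characters, so a base letter is not split from its umlaut.
def prefix_for(name, len)
  name.unicode_normalize(:nfc).chars.first(len).join
end

# "Äpfel.txt" in decomposed form, as HFS Plus would store it.
decomposed_name = "A\u0308pfel.txt"

naive_prefix = decomposed_name.chars.first(1).join   # "A" (umlaut lost)
nfc_prefix   = prefix_for(decomposed_name, 1)        # "\u00C4", i.e. "Ä"
```

Not every base letter plus combining mark has a precomposed form, so NFC alone may still leave some characters as two code points. On Ruby 2.5 and later, `String#grapheme_clusters` is an alternative that keeps each base letter together with its combining marks without normalizing at all.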