I’m downloading information from a website in UTF-8, but I want to
place the text I receive into XML and XHTML output files. In a
nutshell, if I run into a u with an umlaut on top, like ü, I would
like Ruby to replace it with &amp;uuml; or &amp;#252; or &amp;#xFC;.
I made some headway trying to use unpack("U") when I ran into high-ASCII
characters, but it did not seem to handle my test case very well.
REXML’s normalize function did not help either. Here is the test case I
am trying to handle, from http://toadstool.se/temp/utf (you may want to
fetch the file instead of relying on what’s pasted below):
Es befinden sich 3 Streichholzschachteln im Cache, diese können gegen
Zündhölzer aus aller Welt getauscht werden.
Does anyone here have any recipes for replacing all non-ASCII UTF-8
characters with entities? If so, I would really appreciate the help. Thanks!
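
One possible recipe, as a minimal sketch rather than a definitive answer: unpack the whole string with "U*" (instead of a single "U") to get every code point, then emit a decimal numeric reference for anything above 127. The method name to_numeric_entities is made up for illustration, and the sketch assumes the input is valid UTF-8. It does not escape markup characters such as & or <, which would still need separate handling before the text goes into XML.

```ruby
# Hypothetical helper: replaces each non-ASCII character in a UTF-8
# string with a decimal numeric character reference (e.g. &#252;).
# Assumes the input is valid UTF-8.
def to_numeric_entities(text)
  text.unpack("U*").map do |codepoint|
    if codepoint < 128
      codepoint.chr                 # plain ASCII passes through unchanged
    else
      format("&#%d;", codepoint)    # everything else becomes &#NNN;
    end
  end.join
end

# "\u00FC" is ü and "\u00F6" is ö, written as escapes so the example
# does not depend on the encoding of this source file.
puts to_numeric_entities("Z\u00FCndh\u00F6lzer")  # prints Z&#252;ndh&#246;lzer
```

Numeric references are used here instead of named ones like &amp;uuml; because plain XML only predefines a handful of named entities, while numeric references are valid in both XML and XHTML.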