Thomas Strömberg thomasNOMORESPAM@stromberg.org wrote in message news:2004053113022116304%thomasNOMORESPAM@strombergorg…
I made some headway trying to use unpack("U") when I ran into high-ASCII
characters, but it did not seem to handle my test case very well.
REXML’s normalize function did not help either. Here is the test case I
am trying to handle, from Paljon tuotteita ja palveluita (you may want to
fetch the file instead of relying on what’s pasted below)
You have to tell REXML that you want ASCII output, and then write the
document out; the text is only normalized when it is written. XML is
UTF-8 by default unless you specify some other encoding in the XML
declaration.
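
To make "ASCII output" concrete: for characters outside ASCII, it comes down to writing XML numeric character references. The sketch below does that by hand with unpack("U*"), which decodes the whole string rather than just its first character. The helper name to_ascii_entities is made up for illustration; it is not a REXML method, and it only approximates what an ASCII-encoded document would contain.

```ruby
# Hypothetical helper (not part of REXML): rewrite a UTF-8 string so that
# it contains only ASCII, using XML numeric character references for
# everything above U+007F.
def to_ascii_entities(text)
  # unpack("U*") yields one integer code point per UTF-8 character;
  # a bare "U" would decode only the first character.
  text.unpack("U*").map do |codepoint|
    if codepoint <= 0x7F
      codepoint.chr                  # plain ASCII passes through unchanged
    else
      format("&#%d;", codepoint)     # e.g. U+00F6 becomes &#246;
    end
  end.join
end

puts to_ascii_entities("Str\u00F6mberg")
```

Note that this does not escape markup characters such as & or <; it assumes the input is already valid XML text.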
--- SER