Hi,
How can I transform the US-ASCII string s='Eintr\xE4ge:' (where \xE4 is
four individual chars: \,x,E,4) into correct UTF-8?
Thanks,
Andrew
The string you show is not valid US-ASCII, so I have to make some
assumptions about what your input actually contains.
If you know (or can fairly safely assume) that your input is valid
ISO8859-1 (which is compatible with Unicode for codepoints < 256), you can
do:
utf8string = s.encode('UTF-8', 'ISO8859-1')
Or use String#encode! to do it in-place.
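A minimal sketch of that call, assuming the input really holds the single raw byte 0xE4 (as a double-quoted literal would produce). The sample value and the printed checks are illustrative, not part of the original message:

```ruby
# Assumption: the input holds the raw byte 0xE4 (ISO-8859-1 "ä"),
# not the four characters \, x, E, 4.
s = "Eintr\xE4ge:".b                        # binary copy containing the byte 0xE4

# Interpret the bytes as ISO-8859-1 and transcode them to UTF-8.
utf8string = s.encode('UTF-8', 'ISO-8859-1')

puts utf8string            # Einträge:
puts utf8string.encoding   # UTF-8
puts utf8string.bytesize   # 10 ("ä" now takes two bytes)
```

The second argument names the source encoding, so the call does not depend on how the string happens to be tagged beforehand.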
Cheers,
Christer Jansson
Christer Jansson Datakonsult AB
+46 70 88 55 020
On Wed, 8 July 2020 at 10:31, Die Optimisten <inform@die-optimisten.net> wrote:
> Hi,
> How can I transform (US-ASCII) s='Eintr\xE4ge:' (individual chars!:
> \,x,E,4) to correct UTF-8 ?
> thanks
> Andrew
Unsubscribe: <mailto:ruby-talk-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-talk>
Hi,
thanks for trying.
The string is valid US-ASCII: each part of the escape sequence
(\, x, E, 4) is stored as its own 1-byte character.
So a similar question could be:
How can I interpret s='text' as if it had been written as s="..."?
(That is, change 'string' to "string" using the variable s, where s already
contains 'string'.)
Thanks, Andrew
Ah, sorry, I misread your question.
Then, again assuming you want to interpret '\xE4' as the codepoint for "ä",
you can do:
eval(%("#{s}")).encode('UTF-8', 'ISO8859-1')
But remember that eval is not safe, so don't use it if the text can contain
something like, say, '"; system %(sudo rm -rf /); "'.
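To illustrate the warning, here is a small sketch with a harmless stand-in for the malicious input. The global `$injected` is a hypothetical marker used only for this demonstration:

```ruby
# Hypothetical marker so the injected code has a visible but harmless effect.
$injected = false

# Input that closes the string literal early and smuggles in a statement.
s = '"; $injected = true; "'

# The evaluated source becomes:  ""; $injected = true; ""
result = eval(%("#{s}"))

puts $injected        # true: the input was executed as Ruby code
puts result.inspect   # ""
```

Anything that can reach `s` can run arbitrary Ruby this way, which is why eval is only an option for fully trusted text.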
Cheers,
Christer Jansson
Christer Jansson Datakonsult AB
+46 70 88 55 020
On Wed, 8 July 2020 at 12:27, Die Optimisten <inform@die-optimisten.net> wrote:
> Hi,
> thanks for your try
> the string is valid US-ASCII, EACH [part of the!] UTF-char coded, is
> stored in 1 byte.
> So a similar question could be:
> How to convert/interpret s='text', as it would be written like s="..."
> (change 'string' to "string" using the variable s, where s already
> contains 'string' ...
> thanks Andrew
Or, to handle only \xNN escape sequences (two uppercase hex digits) without eval:
s.gsub(/\\x([0-9A-F]{2})/) {|h| h[-2,2].to_i(16).chr }.encode('UTF-8', 'ISO8859-1')
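A variant of the same idea, sketched as a small helper. The method name `unescape_hex` is an assumption for illustration and does not come from the thread. Instead of building raw bytes and transcoding, it turns each hex value directly into a UTF-8 character, which gives the same result because ISO-8859-1 byte values match Unicode codepoints below 256. It also accepts lowercase hex digits:

```ruby
# Hypothetical helper: replaces each literal \xNN sequence (backslash, x,
# two hex digits) with the Unicode character whose codepoint is NN.
def unescape_hex(str)
  str.gsub(/\\x(\h{2})/) { $1.hex.chr(Encoding::UTF_8) }
end

s = 'Eintr\xE4ge:'            # single quotes: \, x, E, 4 are separate chars
converted = unescape_hex(s)

puts converted            # Einträge:
puts converted.encoding   # UTF-8
```

Only the `\xNN` pattern is touched, so no other part of the input is ever interpreted as code.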
On 8 Jul 2020, at 07:14, Christer Jansson <datakonsult@janssons.org> wrote:
> Ah, sorry, misreading.
> Then, again assuming you want to interpret '\xE4' as the codepoint for "ä",
> eval(%("#{s}")).encode('UTF-8', 'ISO8859-1')
> But remember, eval is not very safe so don't use it if the text can contain something like, say, '"; system %(sudo rm -rf /); "'.
On 7/9/20 at 1:48 AM, Rob Biedenharn wrote:
> Then, again assuming you want to interpret '\xE4' as the codepoint for
> "ä",
Thanks for both answers: each is just one line, but powerful!
Andrew
Just to be clear, I would have chosen Rob's solution if I were you.
Cheers,
Christer Jansson
Christer Jansson Datakonsult AB
+46 70 88 55 020
On Thu, 9 July 2020 at 11:58, Die Optimisten <inform@die-optimisten.net> wrote:
> On 7/9/20 at 1:48 AM, Rob Biedenharn wrote:
> > Then, again assuming you want to interpret '\xE4' as the codepoint for
> > "ä",
> Thanks for both these answers, just a line, but powerful!
> Andrew