string.b

Hello everybody!

What is the inverse method of .b?

(That is, how do I convert the output of string.b back to UTF-8? It does
not seem to be easy when the string includes 'umlauts'...)

thanks
Opti

There is no "reverse" without knowing the encoding.

When you turn a string into bytes, you are telling the system to stop
caring about the encoding. You turn it into what some people might call a
"c-string", and you lose all track of what the encoding, character width,
etc. were.
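To make that concrete, here is a minimal sketch of what String#b changes
and what it leaves alone. The helper name `describe` and the sample word
are made up for illustration; the string is written with \u escapes so the
example does not depend on the source file's encoding.

```ruby
# Hypothetical helper for this sketch: reports how Ruby currently sees a string.
def describe(str)
  { encoding: str.encoding, chars: str.length, bytes: str.bytesize }
end

text   = "Gr\u00FC\u00DFe"   # "Gruesse" spelled with u-umlaut and sharp s, UTF-8
binary = text.b              # copy with the same bytes, labelled ASCII-8BIT

p describe(text)             # UTF-8 label: 5 characters, 7 bytes
p describe(binary)           # binary label: every byte counts as one "character"
p text.bytes == binary.bytes # true, String#b does not touch the bytes themselves
```

So the bytes survive intact; what is dropped is only the label saying how
to interpret them.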

The only way to get it back into a higher-level string with a known
encoding is to try something like String#encode
(https://ruby-doc.org/core-2.5.3/String.html#method-i-encode), but beware:
this can (and will) fail in a number of ways.
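As one example of such a failure, the sketch below calls String#encode on a
binary copy. The wrapper `try_encode` is a hypothetical name used only
here. Because encode transcodes from the string's current label
(ASCII-8BIT), any byte outside the ASCII range has no defined conversion.

```ruby
# Hypothetical wrapper: returns the transcoded string, or a short failure note.
def try_encode(str)
  str.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  "failed: #{e.class}"
end

ascii_only = 'abc'.b       # only ASCII bytes, labelled ASCII-8BIT
umlaut     = "\u00FC".b    # u-umlaut as two raw bytes, labelled ASCII-8BIT

puts try_encode(ascii_only)  # works, plain ASCII converts cleanly
puts try_encode(umlaut)      # reports Encoding::UndefinedConversionError
```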

Whatever you are doing with your string in the first place, turning it into
bytes is probably _not_ what you want to do. Using "bytes" will give
different results depending on the encoding of your underlying string (the
umlaut may come before or after the letter, or be combined with it; read
here for more details:
<https://en.wikipedia.org/wiki/Diaeresis_(diacritic)#Character_encodings>).
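A small sketch of that point, assuming UTF-8 throughout: the same visible
"u with umlaut" can be stored as one precomposed code point or as a plain u
followed by a combining diaeresis, and the byte sequences differ. The
helper `same_text?` is a made-up name for this example.

```ruby
precomposed = "\u00FC"    # single code point U+00FC
decomposed  = "u\u0308"   # "u" followed by combining diaeresis U+0308

p precomposed.bytes       # [195, 188]
p decomposed.bytes        # [117, 204, 136]

# Hypothetical helper: compares two strings after Unicode NFC normalization.
def same_text?(a, b)
  a.unicode_normalize(:nfc) == b.unicode_normalize(:nfc)
end

p precomposed == decomposed            # false, the bytes differ
p same_text?(precomposed, decomposed)  # true once both are normalized
```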

String#each_char, which yields _characters_ rather than bytes, is probably
the thing you want to use to iterate over a string letter by letter.
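For illustration, a short sketch of the difference, assuming a UTF-8 string
(the sample word is arbitrary and written with a \u escape):

```ruby
word = "\u00FCber"   # "ueber" spelled with u-umlaut (U+00FC), UTF-8

word.each_char { |ch| puts ch }      # 4 iterations, one per character
word.each_byte { |byte| puts byte }  # 5 iterations, the umlaut takes 2 bytes

# On the binary copy, each_char can only hand back single bytes.
puts word.b.each_char.count          # 5
```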

Ahoy,

Lee Hambley
http://lee.hambley.name/
+49 (0) 170 298 5667

(In reply to the message from Die Optimisten <inform@die-optimisten.net>
of Mon, 21 Sep 2020 at 11:26.)

Unsubscribe: <mailto:ruby-talk-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-talk>

Hi,
Thanks for this answer (still the only one :( ...).
I ask because, as you say, there is no standard way to achieve this.
So I should add: the string is in UTF-8 beforehand.
But the encoding should be an input parameter, so that the encoding
methods work with every encoding.
I couldn't find any method to correctly transform it back. You could do
it "by hand" (maybe there is a table somewhere on the internet), but
I'm sure there is an easier way.

thanks
Opti

(In reply to the message from Lee Hambley of 21 Sep 2020 at 11:52 AM.)

Quoting Die Optimisten (inform@die-optimisten.net):

   I ask because, as you say, there is no standard way to achieve this.
   So I should add: the string is in UTF-8 beforehand.
   But the encoding should be an input parameter, so that the encoding
   methods work with every encoding.
   I couldn't find any method to correctly transform it back. You could do
   it "by hand" (maybe there is a table somewhere on the internet), but I'm
   sure there is an easier way.

What about 'force_encoding'?

--8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<--
s1 = 'àèìòù'
s2 = s1.b                        # binary (ASCII-8BIT) copy of s1
s3 = s2.force_encoding('UTF-8')  # relabels s2 in place; s3 is the same object

printf("s1: %s (%s)\ns3: %s (%s)\n", s1, s1.encoding, s3, s3.encoding)
--8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<--

The above prints (here on my machine):

s1: àèìòù (UTF-8)
s3: àèìòù (UTF-8)
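If the encoding should be an input parameter, the same idea can be wrapped
in a small helper. This is only a sketch: `from_binary` is a hypothetical
name, not part of Ruby, and it assumes the caller knows which encoding the
bytes were in before .b was applied.

```ruby
# Hypothetical helper: put an encoding label back on a binary copy.
# It works on a duplicate so the caller's string is left untouched.
def from_binary(bytes, encoding = Encoding::UTF_8)
  restored = bytes.dup.force_encoding(encoding)
  # force_encoding never checks the bytes, so validate explicitly.
  raise ArgumentError, "bytes are not valid #{encoding}" unless restored.valid_encoding?
  restored
end

utf8_bytes   = "Gr\u00FC\u00DFe".b   # bytes of a UTF-8 string
latin1_bytes = "\xFC".b              # u-umlaut as a single ISO-8859-1 byte

puts from_binary(utf8_bytes).encoding                     # UTF-8
puts from_binary(latin1_bytes, 'ISO-8859-1').encode('UTF-8') == "\u00FC"
```

Note that force_encoding only relabels the bytes, while encode actually
converts them, which is why the ISO-8859-1 case needs both steps to end up
as UTF-8.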

I am not completely sure about what you are trying to obtain...

HTH

Carlo


Subject: Re: string.b
  Date: Mon 21 Sep 20 02:08:51PM +0200

--
Carlo E. Prelz - fluido@fluido.as
If the Way and its Virtue had not been set aside, what need would there
be to talk so much of love and righteousness? (Chuang-Tzu)