UTF-8 string with reverse question

Hi everyone, long time lurker here. Even longer Ruby user.

I have a minor problem with a utf8 string.

In short I see this behavior:

"Stuhlu".sub(/u/,'ü')
=> "Stühlu"
"Stuhlu".reverse.sub(/u/,'ü').reverse
=> "Stuhl\274\303"
"Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
=> "Stuhlü"

The general goal is to sub the final "u" in that word with an umlauted version and not the first. I started irb with -Ku so that I get utf8 support in all things ruby. But the behavior of reverse on the substituted string is really baffling me.

Does anyone know the reason for the weirdness of reverse after the sub? The last version was a hack to get things to just work. Am I missing a Regexp option that would make the final match work? I don't normally look for a final match to substitute on, and reverse seemed the most logical choice for a solution.

Any help would be appreciated!

Thanks,
Mitch

Mitch Tishmack schrieb:

In short I see this behavior:

"Stuhlu".sub(/u/,'ü')
=> "Stühlu"
"Stuhlu".reverse.sub(/u/,'ü').reverse
=> "Stuhl\274\303"
"Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
=> "Stuhlü"

The general goal is to sub the final "u" in that word with an umlauted version and not the first.
...
Am I missing a Regexp option that would make the final match work?

Can't help with the reverse behaviour, but if you want to substitute single letters only the following regexp should work:

   "Stuhlu".sub(/u(?=[^u]*$)/,'ü')
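An alternative that does not rely on a lookahead is to locate the last occurrence with `String#rindex` and splice in the replacement. The sketch below is only an illustration under stated assumptions: `replace_last` is a hypothetical helper name, not a standard library method, and the code assumes a Ruby version with encoding-aware strings.

```ruby
# Hypothetical helper: replace only the last occurrence of `target` in `str`.
def replace_last(str, target, replacement)
  idx = str.rindex(target)          # index of the last occurrence, or nil
  return str.dup if idx.nil?        # nothing to replace
  result = str.dup                  # leave the caller's string untouched
  result[idx, target.length] = replacement
  result
end

replace_last("Stuhlu", "u", "ü")         #=> "Stuhlü"
replace_last("Kuhlstuhl", "uhl", "ühl")  #=> "Kuhlstühl"
```

Unlike the single-letter lookahead, this variant also copes with multi-character targets, at the cost of an extra method.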

Regards,
Pit

Hi everyone, long time lurker here. Even longer Ruby user.

I have a minor problem with a utf8 string.

In short I see this behavior:

"Stuhlu".sub(/u/,'ü')
=> "Stühlu"
"Stuhlu".reverse.sub(/u/,'ü').reverse
=> "Stuhl\274\303"

It seems like reverse is acting on the string as a byte array. That
means that when you reverse the string 'ülhutS', the two-byte character
ü = '\303\274' is turned into the non-character '\274\303'.
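To make the byte-level picture concrete, here is a small sketch that performs the byte-wise reversal by hand and compares it with a character-wise one. It assumes a modern Ruby (2.0 or later), where the built-in `reverse` is already character-aware, so the old behaviour has to be simulated through `bytes`.

```ruby
s = "ülhutS"   # "Stuhlu".reverse with the first "u" replaced by "ü"

# Simulate a byte-wise reverse: the two bytes of "ü" end up swapped.
byte_reversed = s.bytes.reverse.pack("C*").force_encoding("UTF-8")

# Reverse character by character instead, keeping "ü" intact.
char_reversed = s.chars.reverse.join

puts s.bytes.first(2).map { |b| "\\%o" % b }.join             # prints \303\274
puts byte_reversed.bytes.last(2).map { |b| "\\%o" % b }.join  # prints \274\303
puts byte_reversed.valid_encoding?                            # prints false
puts char_reversed                                            # prints Stuhlü
```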

Are you trying to build a German -> pig-türkisch translator? :wink:

regards,

Brian

On 15/08/05, Mitch Tishmack <idylls@gmail.com> wrote:

"Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
=> "Stuhlü"

The general goal is to sub the final "u" in that word with an
umlauted version and not the first. I started irb with -Ku so that I
get utf8 support in all things ruby. But the behavior of reverse on
the substituted string is really baffling me.

Does anyone know the reason for the weirdness of reverse after the
sub? The last version was a hack to get things to just work. Am I
missing a Regexp option that would make the final match work? I don't
normally look for a final match to substitute on. And reverse seemed
the most logical choice for a solution.

Any help would be appreciated!

Thanks,
Mitch

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/

You could do this:

   $KCODE = 'u'
   class String
     def reverse; self.scan(/./).reverse.join end
   end

   "Stuhlü".reverse #=> "ülhutS"

found on <http://redhanded.hobix.com/inspect/closingInOnUnicodeWithJcode.html>
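Reopening `String` changes `reverse` for every piece of code in the process. A narrower variant, offered here as an assumption and not taken from the linked article, keeps the same scan-and-join idea in a standalone method. `reverse_chars` is a hypothetical name; the `u` flag asks the regexp to treat the string as UTF-8 without relying on `$KCODE`, and the `m` flag lets `.` match newlines as well.

```ruby
# Hypothetical helper: reverse a UTF-8 string one character at a time
# without redefining String#reverse globally.
def reverse_chars(str)
  str.scan(/./mu).reverse.join
end

reverse_chars("Stuhlü")  #=> "ülhutS"
```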

-Levin

Mitch Tishmack <idylls@gmail.com> wrote:

I have a minor problem with a utf8 string.

In short I see this behavior:

"Stuhlu".sub(/u/,'ü')
=> "Stühlu"
"Stuhlu".reverse.sub(/u/,'ü').reverse
=> "Stuhl\274\303"
"Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
=> "Stuhlü"

It seems like reverse is acting on the string as a byte array. That
means that when you reverse the string 'ülhutS', the two-byte character
ü = '\303\274' is turned into the non-character '\274\303'.

That makes sense, but seems like incorrect behavior for this instance.

Are you trying to build a german -> pig-türkisch translator :wink:

Not quite :), I just picked a random German word and appended another u
to it for testing. I suppose my example could have been Kuhlstuhl. I
am actually working on a German noun/verb helper; all it will do is
inflect the verb or noun according to the proper grammatical rules, e.g.
der Fisch -> die Fische, der Stuhl -> die Stühle.

I was just worrying about compound nouns, where only the final noun is
inflected.
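For what it's worth, the compound-noun case can reuse the same "last match" regexp idea. The following is a rough sketch only, not the helper described above: the names `UMLAUTS`, `umlaut_last_vowel` and `umlaut_plural` are made up for illustration, uppercase vowels are ignored, and whether a given noun takes an umlaut or which suffix it gets would still have to come from a word list.

```ruby
# Illustrative only: umlaut the last a/o/u (or "au") of a noun, then add a suffix.
UMLAUTS = { "a" => "ä", "o" => "ö", "u" => "ü", "au" => "äu" }.freeze

def umlaut_last_vowel(word)
  # The lookahead requires that no further a, o or u follows the match,
  # so only the final candidate vowel (in the last part of a compound) changes.
  word.sub(/(?:au|[aou])(?=[^aou]*\z)/) { |vowel| UMLAUTS[vowel] }
end

def umlaut_plural(noun, suffix = "e")
  umlaut_last_vowel(noun) + suffix
end

umlaut_plural("Stuhl")       #=> "Stühle"
umlaut_plural("Kuhlstuhl")   #=> "Kuhlstühle"
umlaut_plural("Haus", "er")  #=> "Häuser"
```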

Yes I DO have too much time on my hands right now.

Cheers,
Mitch

On 8/15/05, Brian Schröder <ruby.brian@gmail.com> wrote:

Aha, I knew there was something I was missing in Regexp. Thanks for
confirming. :slight_smile:

I will get back to my program after work today.

Thanks,
Mitch

On 8/15/05, Pit Capitain <pit@capitain.de> wrote:

Mitch Tishmack schrieb:
> In short I see this behavior:
>
> "Stuhlu".sub(/u/,'ü')
> => "Stühlu"
> "Stuhlu".reverse.sub(/u/,'ü').reverse
> => "Stuhl\274\303"
> "Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
> => "Stuhlü"
>
> The general goal is to sub the final "u" in that word with an umlauted
> version and not the first.
> ...
> Am I missing a
> Regexp option that would make the final match work?

Can't help with the reverse behaviour, but if you want to substitute
single letters only the following regexp should work:

   "Stuhlu".sub(/u(?=[^u]*$)/,'ü')

Regards,
Pit