Hello,
Is there an easy way to chop (as in String#chop) a string that can
potentially contain UTF-8 in ruby 1.8? Or should I roll my own?
Thanks,
Ammar
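Some context on why String#chop falls short here: in Ruby 1.8 it removes the last byte, and any non-ASCII character takes two or more bytes in UTF-8. The sketch below reproduces that on a modern Ruby (2.0 or later) by viewing the string as raw bytes with String#b. Using that as a stand-in for 1.8's behavior is an assumption of the example; it does not come from the thread.

```ruby
word = "café"                   # "é" is encoded as two bytes, 0xC3 0xA9
byte_chop = word.b.chop         # chop on the raw bytes, like 1.8's String#chop
p byte_chop.bytes               # [99, 97, 102, 195]: a dangling UTF-8 lead byte
p byte_chop.dup.force_encoding("UTF-8").valid_encoding?  # false
p word.chop                     # "caf": on 1.9+ chop is character-aware
```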
Ended up making my own. Posting it here for the benefit of others, and
maybe some feedback.
https://gist.github.com/661217
Regards,
Ammar
Well, it should be this simple:
str.gsub(/.\z/mu, "")
James Edward Gray II
On Nov 3, 2010, at 9:08 AM, Ammar Ali wrote:
> Is there an easy way to chop (as in String#chop) a string that can
> potentially contain UTF-8 in ruby 1.8? Or should I roll my own?
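Here is a quick check of the regex approach. It assumes Ruby 1.9 or later with a UTF-8 source file (the default since Ruby 2.0). There the u flag still fixes the regexp's encoding to UTF-8, so "." consumes a whole character rather than a single byte. The wrapper name regex_chop is hypothetical, and it uses sub instead of gsub because .\z can only match once.

```ruby
# Hypothetical wrapper around the one-liner from this thread.
#   u  - the pattern is UTF-8, so "." matches one character, not one byte
#   m  - "." may also match "\n", so a trailing newline counts as a character
#   \z - anchors at the absolute end of the string
def regex_chop(str)
  str.sub(/.\z/mu, "")
end

p regex_chop("café")    # "caf"
p regex_chop("line\n")  # "line"
p regex_chop("")        # "": nothing to remove
```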
I was going to say
$KCODE="U"
=> "U"
s = "one two three"
=> "one two three"
s.gsub(/^(.+)./u) { $1 }
=> "one two thre"
I guess I overthought it, huh!
On Wed, Nov 3, 2010 at 3:38 PM, Ammar Ali <ammarabuali@gmail.com> wrote:
> Ended up making my own. Posting it here for the benefit of others, and
> maybe some feedback.
> https://gist.github.com/661217
> Regards,
> Ammar
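Two edge cases are worth checking before using the capture-group version as a general chop. First, ^ anchors at the start of every line, not only at the start of the string. Second, (.+). needs at least two characters to match. The sketch below runs on Ruby 1.9 or later, where $KCODE has no effect and the u flag alone sets the encoding. The method name capture_chop is hypothetical.

```ruby
# Adam's regex, wrapped in a method (hypothetical name).
def capture_chop(s)
  s.gsub(/^(.+)./u) { $1 }
end

p capture_chop("one two three")  # "one two thre", as in the post
p capture_chop("ab\ncd")         # "a\nc": each line loses its last character
p capture_chop("x")              # "x": too short to match, left unchanged

# The \z-anchored form only ever touches the end of the string:
p "ab\ncd".sub(/.\z/mu, "")      # "ab\nc"
p "x".sub(/.\z/mu, "")           # ""
```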
Beautiful. Thank you both.
It was a good exercise for me, so I don't necessarily feel that I
wasted 30 minutes of my life.
By the way, the m option seems superfluous in James' version. I get
Thanks again,
Ammar
On Wed, Nov 3, 2010 at 5:57 PM, James Edward Gray II <james@graysoftinc.com> wrote:
> Well, it should be this simple:
> str.gsub(/.\z/mu, "")
On Wed, Nov 3, 2010 at 6:04 PM, Adam Prescott <mentionuse@gmail.com> wrote:
> s.gsub(/^(.+)./u) { $1 }
> => "one two thre"
>> Well, it should be this simple:
>> str.gsub(/.\z/mu, "")
>> s.gsub(/^(.+)./u) { $1 }
>> => "one two thre"
> Beautiful. Thank you both.
> It was a good exercise for me, so I don't necessarily feel that I
> wasted 30 minutes of my life.
> By the way, the m option seems superfluous in James' version. I get
> the same results without it.
It's not:
"\n".sub(/.\z/u, "")
=> "\n"
"\n".sub(/.\z/mu, "")
=> ""
Using gsub() over sub() was a dumb mistake on my part though. sub() is all you need, since it can only match once.
James Edward Gray II
On Nov 3, 2010, at 11:33 AM, Ammar Ali wrote:
> On Wed, Nov 3, 2010 at 5:57 PM, James Edward Gray II <james@graysoftinc.com> wrote:
> On Wed, Nov 3, 2010 at 6:04 PM, Adam Prescott <mentionuse@gmail.com> wrote:
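James's newline example generalizes: the m flag is what lets "." match a trailing "\n". If the regex is meant as a drop-in for String#chop, there is one more difference. Ruby's chop treats a trailing "\r\n" as a unit and removes both characters, while .\z removes only the "\n". Here is a small comparison, assuming Ruby 1.9 or later, where chop itself is character-aware:

```ruby
# Compare String#chop with the regex, with and without the m flag.
samples = ["abc\n", "abc\r\n", "\n", "日本語"]

samples.each do |s|
  with_m    = s.sub(/.\z/mu, "")   # "." may match "\n"
  without_m = s.sub(/.\z/u, "")    # "." cannot match "\n", so a trailing newline blocks the match
  puts [s, s.chop, with_m, without_m].map(&:inspect).join("  ")
end
```

For "abc\r\n", chop gives "abc" while the m-flag regex gives "abc\r". Whether that matters depends on the input.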
Ammar Ali wrote in post #959047:
> By the way, the m option seems superfluous in James' version. I get
> the same results without it.
foo = "abc\n"
=> "abc\n"
foo.sub(/.\z/mu, '')
=> "abc"
foo.sub(/.\z/u, '')
=> "abc\n"
--
Posted via http://www.ruby-forum.com/.
Thanks for the clarification.
My method now looks like:
def chop_utf8(s)
  return unless s
  lead = s.sub(/.\z/mu, "")
  last = s.scan(/.\z/mu).first
  last = '' unless last
  [lead, last]
end
Short and sweet.
Cheers,
Ammar
On Wed, Nov 3, 2010 at 6:38 PM, James Edward Gray II <james@graysoftinc.com> wrote:
> On Nov 3, 2010, at 11:33 AM, Ammar Ali wrote:
>> By the way, the m option seems superfluous in James' version. I get
>> the same results without it.
> It's not:
> "\n".sub(/.\z/u, "")
> => "\n"
> "\n".sub(/.\z/mu, "")
> => ""
> Using gsub() over sub() was a dumb mistake on my part though. sub() is all you need, since it can only match once.
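Here is a quick usage check of Ammar's method as posted, run on a modern Ruby where the u flag is still accepted. The comments and sample strings are additions for illustration:

```ruby
# Ammar's method as posted, with comments added.
def chop_utf8(s)
  return unless s                 # nil in, nil out
  lead = s.sub(/.\z/mu, "")       # the string minus its last character
  last = s.scan(/.\z/mu).first    # the last character, or nil for ""
  last = '' unless last
  [lead, last]
end

lead, last = chop_utf8("café")
puts lead           # caf
puts last           # é
p chop_utf8("")     # ["", ""]
p chop_utf8(nil)    # nil
```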
James clarified this earlier. But thanks for chiming in nonetheless.
Cheers,
Ammar
On Thu, Nov 4, 2010 at 4:37 PM, Brian Candler <b.candler@pobox.com> wrote:
> Ammar Ali wrote in post #959047:
>> By the way, the m option seems superfluous in James' version. I get
>> the same results without it.
> foo = "abc\n"
> => "abc\n"
> foo.sub(/.\z/mu, '')
> => "abc"
> foo.sub(/.\z/u, '')
> => "abc\n"
> My method now looks like:
> def chop_utf8(s)
>   return unless s
>   lead = s.sub(/.\z/mu, "")
>   last = s.scan(/.\z/mu).first
>   last = '' unless last
The two lines above can be replaced with the more efficient:
last = s[/.\z/mu] || ''
>   [lead, last]
> end
James Edward Gray II
On Nov 3, 2010, at 11:56 AM, Ammar Ali wrote:
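With James's suggestion folded in, the method might read as below. String#[] with a regexp returns the matched text, or nil when nothing matches. For valid UTF-8 input that only happens with an empty string, and the || '' fallback covers it. This is a sketch of the combined version; the updated gist itself may differ:

```ruby
# Sketch: Ammar's method with James's String#[] suggestion applied.
def chop_utf8(s)
  return unless s
  lead = s.sub(/.\z/mu, "")   # everything before the last character
  last = s[/.\z/mu] || ""     # String#[] returns the match, or nil if there is none
  [lead, last]
end
```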
At this rate the method is going to disappear.
I updated the gist accordingly:
UTF-8 aware string chop (the first gist was posted as anonymous)
Thanks again,
Ammar
On Wed, Nov 3, 2010 at 7:00 PM, James Edward Gray II <james@graysoftinc.com> wrote:
> On Nov 3, 2010, at 11:56 AM, Ammar Ali wrote:
>> My method now looks like:
>> def chop_utf8(s)
>>   return unless s
>>   lead = s.sub(/.\z/mu, "")
>>   last = s.scan(/.\z/mu).first
>>   last = '' unless last
> The two lines above can be replaced with the more efficient:
> last = s[/.\z/mu] || ''
can we make that one pass?
str =~ /.\z/mu
[$`,$&]
best regards -botp
On Thu, Nov 4, 2010 at 1:25 AM, Ammar Ali <ammarabuali@gmail.com> wrote:
> On Wed, Nov 3, 2010 at 7:00 PM, James Edward Gray II
>> last = s[/.\z/mu] || ''
> I updated the gist accordingly:
> UTF-8 aware string chop (the first gist was posted as anonymous)
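botp's idea gets both the text before the match and the match itself from a single =~. There is one edge case. When the match fails, which for valid UTF-8 input only happens with an empty string, $` and $& are both nil. The earlier version returns ["", ""] in that case. The sketch below keeps that behavior by using MatchData#pre_match. Both method names are hypothetical:

```ruby
# botp's two lines, wrapped in a method for comparison (hypothetical name).
def botp_chop(str)
  str =~ /.\z/mu
  [$`, $&]   # text before the match, and the match itself
end

# Hypothetical one-pass variant that handles the no-match case explicitly.
def chop_utf8_one_pass(s)
  return unless s
  m = s.match(/.\z/mu)          # a single regexp pass
  return [s.dup, ""] unless m   # no match: nothing to chop
  [m.pre_match, m[0]]
end

p botp_chop("")             # [nil, nil]
p chop_utf8_one_pass("")    # ["", ""]
```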