String.strip with UTF-8

Hi

I can't strip the leading whitespace (or at least what looks like
whitespace) from a Ruby 1.9.2 string:

ruby-1.9.2-p0 :002 > d.entity
=> " United Arab Emirates"
ruby-1.9.2-p0 :003 > d.entity.strip
=> " United Arab Emirates"
ruby-1.9.2-p0 :004 > d.entity.class
=> String
ruby-1.9.2-p0 :005 > d.entity.encoding
=> #<Encoding:UTF-8>
ruby-1.9.2-p0 :006 >

It's inside the Rails 3.0.3 console.

Erik


--
Posted via http://www.ruby-forum.com/.

Try this:

d.entity[0].ord

I'm not sure how useful that will be, but you can compare it to that of a
space. It _seems_ to be Unicode-aware:

ruby-1.9.2-p136 :020 > '☃'.ord
=> 9731
ruby-1.9.2-p136 :021 > _.to_s 16
=> "2603"
ruby-1.9.2-p136 :022 > "\u2603"
=> "☃"

And for good measure:

ruby-1.9.2-p136 :023 > _.ord
=> 9731

(If you're wondering, that underscore means "The result of the last command I
entered into IRB." It's fantastically useful, though it gets annoying when you
want to repeat commands using up arrow, etc.)

So, if you get something other than:

ruby-1.9.2-p136 :024 > ' '.ord
=> 32

...then it's not a space. At that point, maybe report a bug, but maybe you'll
also be able to work around it with a regex or something.
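A minimal sketch of the ord comparison above, runnable as a plain script instead of inside IRB. It assumes the mystery leading character is U+00A0 NO-BREAK SPACE (which is what the thread eventually turns up); `entity` is only a stand-in for `d.entity`.

```ruby
# encoding: utf-8
# Assumption: the leading character is U+00A0 NO-BREAK SPACE;
# `entity` is a stand-in for d.entity.
entity = "\u00A0United Arab Emirates"

first = entity[0]
puts first.ord               # codepoint of the first character (160 here)
puts ' '.ord                 # codepoint of an ASCII space (32)
puts first.ord == ' '.ord    # false, so it is not an ASCII space

# String#strip does not treat U+00A0 as whitespace, so this prints true:
puts entity.strip == entity
```

If the first line prints 32, the character really is a space and the problem lies elsewhere.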

(The reply above was to Erik E.'s original post of Wednesday, January 12, 2011 03:28:38 pm.)

In reply to Erik E.'s original post (#974416):

Hi, I made a fresh install with rvm 1.9.2-p0 and Rails 3.0.3
and I cannot reproduce your problem. Maybe you could try to
replay what I did and see if you can still reproduce it?

Also, to examine that first character in detail, what is the
result when you try this:

009:0> d.entity.bytes.to_a[0..5]
=> [32, 85, 110, 105, 116, 101]

I see a "regular" space (character 32 in decimal notation)
as first character.
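A small sketch of the byte-level check, applied to two sample strings that look identical when printed. Both strings are assumptions standing in for `d.entity`: one starts with an ASCII space, the other with U+00A0 NO-BREAK SPACE. The character counts match, but the byte counts do not.

```ruby
# encoding: utf-8
# Stand-ins for d.entity (assumptions).
plain = " United Arab Emirates"        # leading ASCII space
nbsp  = "\u00A0United Arab Emirates"   # leading NO-BREAK SPACE

p plain.bytes.to_a[0..5]            # [32, 85, 110, 105, 116, 101]
p nbsp.bytes.to_a[0..5]             # [194, 160, 85, 110, 105, 116]

# Same number of characters, different number of bytes:
p [plain.length, plain.bytesize]    # [21, 21]
p [nbsp.length, nbsp.bytesize]      # [21, 22]
```

`.to_a` is kept to match the Ruby 1.9 idiom; in later Rubies `bytes` already returns an Array.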

HTH,

Peter

peterv@ASUS:~/ra/apps/trials$ rvm install 1.9.2-p0
/home/peterv/.rvm/rubies/ruby-1.9.2-p0, this may take a while depending
on your cpu(s)...

ruby-1.9.2-p0 - #fetching
...
Install of ruby-1.9.2-p0 - #complete

peterv@ASUS:~/ra/apps/trials$ rvm use 1.9.2-p0
Using /home/peterv/.rvm/gems/ruby-1.9.2-p0

peterv@ASUS:~/ra/apps/trials$ rvm gemset create rails3
'rails3' gemset created (/home/peterv/.rvm/gems/ruby-1.9.2-p0@rails3).

peterv@ASUS:~/ra/apps/trials$ rvm gemset use rails3
Now using gemset 'rails3'

peterv@ASUS:~/ra/apps/trials$ gem install rails --no-rdoc --no-ri
Successfully installed activesupport-3.0.3
Successfully installed builder-2.1.2
Successfully installed i18n-0.5.0
Successfully installed activemodel-3.0.3
Successfully installed rack-1.2.1
Successfully installed rack-test-0.5.7
Successfully installed rack-mount-0.6.13
Successfully installed tzinfo-0.3.23
Successfully installed abstract-1.0.0
Successfully installed erubis-2.6.6
Successfully installed actionpack-3.0.3
Successfully installed arel-2.0.6
Successfully installed activerecord-3.0.3
Successfully installed activeresource-3.0.3
Successfully installed mime-types-1.16
Successfully installed polyglot-0.3.1
Successfully installed treetop-1.4.9
Successfully installed mail-2.2.14
Successfully installed actionmailer-3.0.3
Successfully installed thor-0.14.6
Successfully installed railties-3.0.3
Successfully installed bundler-1.0.7
Successfully installed rails-3.0.3
23 gems installed

peterv@ASUS:~/ra/apps/trials$ rails new issue_with_strip
      create
...
      create vendor/plugins/.gitkeep
peterv@ASUS:~/ra/apps/trials$ cd issue_with_strip/
peterv@ASUS:~/ra/apps/trials/issue_with_strip$ bundle install
Fetching source index for http://rubygems.org/
Using rake (0.8.7)
...
Using rails (3.0.3)
Installing sqlite3-ruby (1.3.2) with native extensions
Your bundle is complete! Use `bundle show [gemname]` to see where a
bundled gem is installed.

peterv@ASUS:~/ra/apps/trials/issue_with_strip$ rails g model D entity:string
      invoke active_record
      create db/migrate/20110112222955_create_ds.rb
      create app/models/d.rb
      invoke test_unit
      create test/unit/d_test.rb
      create test/fixtures/ds.yml

peterv@ASUS:~/ra/apps/trials/issue_with_strip$ rake db:migrate
(in /home/peterv/data/back/rails-apps/apps/trials/issue_with_strip)
== CreateDs: migrating =======================================================
-- create_table(:ds)
   -> 0.0010s
== CreateDs: migrated (0.0011s)

peterv@ASUS:~/ra/apps/trials/issue_with_strip$ rails c
Loading development environment (Rails 3.0.3)
001:0> IRB.prompt_mode=:RVM # this is a local patch
=> :RVM
ruby-1.9.2-p0 :002 > d = D.create :entity => " United Arab Emirates"
=> #<D id: 1, entity: " United Arab Emirates", created_at: "2011-01-12
22:31:21", updated_at: "2011-01-12 22:31:21">
ruby-1.9.2-p0 :003 > d.entity
=> " United Arab Emirates"
ruby-1.9.2-p0 :004 > d.entity.strip
=> "United Arab Emirates"
ruby-1.9.2-p0 :005 > d.entity.class
=> String
ruby-1.9.2-p0 :006 > d.entity.encoding
=> #<Encoding:UTF-8>
ruby-1.9.2-p0 :007 > exit

peterv@ASUS:~/ra/apps/trials/issue_with_strip$ rails c
Loading development environment (Rails 3.0.3)
001:0> d = D.find :last
=> #<D id: 1, entity: " United Arab Emirates", created_at: "2011-01-12
22:31:21", updated_at: "2011-01-12 22:31:21">
002:0> d.entity
=> " United Arab Emirates"
003:0> d.entity.strip
=> "United Arab Emirates"


Thank you for the quick replies, David & Peter. I was upgrading Ruby to see
if it made a difference, but I can see now that it's not a space, which
explains why it didn't strip:

Loading development environment (Rails 3.0.3)
ruby-1.9.2-p136 :001 > d = Domain.last
=> #<Domain id: 2055, classification: "Internationalized Country Code
Top Level Domain", dns_name: "xn--mgbaam7a8h", idn_name: "امارات.",
entity: " United Arab Emirates", explanation: "imārāt", notes: nil,
related_id: 1795, idn: true, dnssec: false, created_at: "2011-01-12
19:04:54", updated_at: "2011-01-12 19:04:54">
ruby-1.9.2-p136 :002 > d.entity
=> " United Arab Emirates"
ruby-1.9.2-p136 :003 > d.entity.class
=> String
ruby-1.9.2-p136 :004 > d.entity.encoding
=> #<Encoding:UTF-8>
ruby-1.9.2-p136 :005 > d.entity[0].ord
=> 160
ruby-1.9.2-p136 :006 > d.entity.bytes.to_a
=> [194, 160, 85, 110, 105, 116, 101, 100, 32, 65, 114, 97, 98, 32, 69,
109, 105, 114, 97, 116, 101, 115]
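A short sketch of why the leading bytes 194, 160 and the `ord` value 160 describe the same character: they are the two-byte UTF-8 encoding of U+00A0 NO-BREAK SPACE. The byte values are taken from the output above; the rest is standard `Array#pack` and bit arithmetic.

```ruby
# encoding: utf-8
# The first two bytes reported for d.entity above.
lead = [194, 160]

# Reinterpret the raw bytes as a UTF-8 string.
char = lead.pack('C*').force_encoding(Encoding::UTF_8)
p char.valid_encoding?   # true: a well-formed 2-byte UTF-8 sequence
p char.ord               # 160, the value ord returned above
p char == "\u00A0"       # true: U+00A0 NO-BREAK SPACE

# The same result by hand: a 2-byte UTF-8 sequence is 110xxxxx 10yyyyyy,
# and the codepoint is the 5 x-bits followed by the 6 y-bits.
codepoint = ((lead[0] & 0x1F) << 6) | (lead[1] & 0x3F)
p codepoint              # 160
```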

(The reply above was to Peter Vandenabeele's post #974440.)

Yeah, it's the dreaded non-breaking space [1]. Unfortunately, somebody
thought it would be nice to map Alt+Space to this character on some
keymaps (like mine, which is Swiss-French). If you're on a Mac, see my
solution here:
http://0x2a.im/2009/04/16/terminal-unicode-problem-2.html

[1]: Non-breaking space - Wikipedia

(The reply above was to Erik E. <erik.eide@gmail.com>, 2011/1/12.)

Cool, thanks for that! I can just gsub/gsub! it out now that I know what
it is.
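A minimal sketch of the gsub approach, assuming U+00A0 NO-BREAK SPACE is the only unusual character in the data. The constant name `NBSP` is hypothetical. Note that this turns every NO-BREAK SPACE into an ordinary space, including any in the middle of the string, and it does nothing about other Unicode spaces.

```ruby
# encoding: utf-8
# Sketch assuming U+00A0 NO-BREAK SPACE is the only odd character present.
NBSP = "\u00A0"                          # hypothetical constant name

entity  = "\u00A0United Arab Emirates"   # stand-in for d.entity
cleaned = entity.gsub(NBSP, ' ').strip   # NBSP -> space, then ordinary strip
p cleaned                                # "United Arab Emirates"

# In-place variant on a copy. gsub! and strip! return nil when they change
# nothing, so they are called separately rather than chained.
in_place = entity.dup
in_place.gsub!(NBSP, ' ')
in_place.strip!
p in_place                               # "United Arab Emirates"
```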

(The reply above was to post #974462 from "zimbatm ...".)

That will work if NO-BREAK SPACE is the only space you'll encounter.

s.gsub(/\A[[:space:]]*(.*?)[[:space:]]*\z/) { $1 }

will remove:
Space_Separator | Line_Separator | Paragraph_Separator | 0009 | 000A | 000B | 000C | 000D | 0085

See section 6 of the Oniguruma regular expression syntax document.
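A sketch that wraps the regex above in a helper and compares it with `String#strip` on a few inputs. The helper name `unicode_strip` is hypothetical. One deliberate change from the posted regex: the `m` flag is added so that `.` also matches newlines; without it, a string with an interior newline fails to match `\A...\z` and comes back unchanged. The block form `{ $1 }` is kept as posted, rather than a `'\1'` replacement string.

```ruby
# encoding: utf-8
# Hypothetical helper built on the [[:space:]] regex from the post above.
# Assumption: the `m` flag is an addition so `.` can cross newlines.
def unicode_strip(s)
  s.gsub(/\A[[:space:]]*(.*?)[[:space:]]*\z/m) { $1 }
end

samples = [
  "\u00A0United Arab Emirates",          # leading NO-BREAK SPACE (U+00A0)
  "\u2003\tUnited Arab Emirates\u3000",  # EM SPACE + tab, IDEOGRAPHIC SPACE
  " line one\nline two \u00A0"           # interior newline must survive
]

samples.each do |s|
  # Left: what String#strip leaves; right: the [[:space:]]-based version.
  p [s.strip, unicode_strip(s)]
end
```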

PS: Note that s.gsub(/…(…)…/, '\1') may alter the encoding of the result string.

(The reply above was to Erik E., who wrote on Jan 12, 2011, at 16:43.)