Newbie Question: delete all non alphanumeric characters

Hi all,
How can I delete all non-alphanumeric characters in a string? Thanks.

···

--
Posted via http://www.ruby-forum.com/.

string.gsub(/[0-9a-z]+/i, '')

···

On Jul 21, 2006, at 1:53 PM, Theallnighter Theallnighter wrote:

Hi all,
how can i delete all non alphanumeric characters in a string ? thanks

--
Posted via http://www.ruby-forum.com/.

I've also just started to learn Ruby, so thought I'd reply for the practice -
Here's one solution:

------------------------------------------------------------------------
#!/usr/bin/ruby

x = "There are 2007 beans and 15234 grains of rice in this bag."
puts x
x.gsub!(/\W/, '')
puts x

------------------------------------------------------------------------

output:

There are 2007 beans and 15234 grains of rice in this bag.
Thereare2007beansand15234grainsofriceinthisbag

--

TMTOWTDI:

username.delete('^A-Za-z0-9')

...I just thought I'd add a little variety to this collection of
Regexp-centric solutions.
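
As a small sketch of the non-regexp route: String#delete accepts a character set in the same notation as String#tr, and a leading ^ negates the set. The sample value below is made up for illustration.

```ruby
# A leading ^ negates the set, so everything except A-Z, a-z and 0-9 is removed.
username = "j_doe 42!"   # hypothetical sample value, not from the thread
cleaned = username.delete('^A-Za-z0-9')
puts cleaned             # prints "jdoe42"

# delete returns a new string and leaves username untouched;
# delete! would modify the receiver in place instead.
```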


Logan Capaldo wrote:

string.gsub(/[0-9a-z]+/i, '')

That deletes all alphanumeric. To delete all non-alphanumeric:

string.gsub(/[^0-9a-z]/i, '')
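
To see the difference between the two patterns side by side, here is a quick sketch. The input string is a made-up example.

```ruby
string = "Hello, World! 123"   # hypothetical sample input

# Without the caret the class matches alphanumerics, so those are what get deleted.
only_punct = string.gsub(/[0-9a-z]+/i, '')
# With the caret inside the brackets the class is negated.
only_alnum = string.gsub(/[^0-9a-z]/i, '')

p only_punct   # => ", ! "
p only_alnum   # => "HelloWorld123"
```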

--
Tom Werner
Helmets to Hardhats
Software Developer
tom@helmetstohardhats.org
www.helmetstohardhats.org

Well the only "problem" with that is

x = '\w includes_under_scores_too'

···

On Jul 21, 2006, at 3:40 PM, Jim Cochrane wrote:

I've also just started to learn Ruby, so thought I'd reply for the practice -
Here's one solution:

------------------------------------------------------------------------
#!/usr/bin/ruby

x = "There are 2007 beans and 15234 grains of rice in this bag."
puts x
x.gsub!(/\W/, '')
puts x

------------------------------------------------------------------------

output:

There are 2007 beans and 15234 grains of rice in this bag.
Thereare2007beansand15234grainsofriceinthisbag

--

I think \W is non-perl-word, so underscores won't be stripped. If you want
those out too:

irb(main):006:0> str = "The $re34& __q!?"
=> "The $re34& __q!?"
irb(main):007:0> str.gsub( /\W/, '')
=> "There34__q"
irb(main):008:0> str.gsub( /\W|_/, '')
=> "There34q"
irb(main):009:0>

Jeff
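
A possible alternative to the \W|_ alternation is a single character class, either [\W_] or an explicit negated range. This is only a sketch using the same sample string as the irb session.

```ruby
str = "The $re34& __q!?"   # same sample string as in the irb session

keeps_underscores  = str.gsub(/\W/, '')            # \w includes _, so \W leaves it alone
strips_underscores = str.gsub(/[\W_]/, '')         # one class instead of an alternation
explicit_class     = str.gsub(/[^A-Za-z0-9]/, '')  # spells out exactly what is kept

p keeps_underscores    # => "There34__q"
p strips_underscores   # => "There34q"
p explicit_class       # => "There34q"
```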


Doh! I'm obviously not awake yet this ---err-- afternoon.

···

On Jul 21, 2006, at 2:05 PM, Tom Werner wrote:

Logan Capaldo wrote:

string.gsub(/[0-9a-z]+/i, '')

That deletes all alphanumeric. To delete all non-alphanumeric:

string.gsub(/[^0-9a-z]/i, '')


Whoa! Thanks for pointing that out. It looks like
http://www.ruby-doc.org/docs/ruby-doc-bundle/UsersGuide/rg/regexp.html
has a bug:

\w letter or digit; same as [0-9A-Za-z]

It's missing a _.

Here's a fixed version:

#!/usr/bin/ruby

x = "There are 2007 beans_and 15234 grains of rice in this bag."
puts x
x.gsub!(/\W/, '')
puts x
x.gsub!(/\W|_/, '')
puts "fixed:"
puts x

···

On 2006-07-21, Logan Capaldo <logancapaldo@gmail.com> wrote:

On Jul 21, 2006, at 3:40 PM, Jim Cochrane wrote:

...
#!/usr/bin/ruby

x = "There are 2007 beans and 15234 grains of rice in this bag."
puts x
x.gsub!(/\W/, '')
puts x
...

Well the only "problem" with that is

x = '\w includes_under_scores_too'

for fun, I started irb, then typed

"567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')

It returned

67576hgjhgjh&**)

The caret goes inside the brackets (it inverts the character class)

Tom

···

dominique.plante@gmail.com wrote:

for fun, I started irb, then typed

"567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')

It returned

67576hgjhgjh&**)


No wonder. The ^ outside the brackets anchors the match to the start of a line, and there was only one character at the beginning of the string.

Regards,
Rimantas
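
To make the two meanings of ^ concrete, here is a short sketch that runs both patterns on the string from the irb session.

```ruby
input = "567576hgjhgjh&**)"   # string from the irb session

anchored = input.gsub(/^[0-9a-z]/i, '')   # ^ outside the brackets: start-of-line anchor
negated  = input.gsub(/[^0-9a-z]/i, '')   # ^ inside the brackets: negated class

p anchored   # => "67576hgjhgjh&**)"
p negated    # => "567576hgjhgjh"
```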

···

--
http://rimantas.com/

Oops - the above has a bug (although it still "works"). Here's a fixed
version, with an opposite example further demonstrating the bug in the
ruby doc site:

#!/usr/bin/ruby

s = "There are 2007 beans_and 15234 grains of rice in this bag."
x = s.dup
y = s.dup
puts "original:"
puts x
x.gsub!(/\W/, '')
puts "\nbroken:"
puts x
y.gsub!(/\W|_/, '')
puts "\nfixed:"
puts y

puts "\nopposite:"
z = s.dup
z.gsub!(/\w/, '')
puts z

--

original:
There are 2007 beans_and 15234 grains of rice in this bag.

broken:
Thereare2007beans_and15234grainsofriceinthisbag

fixed:
Thereare2007beansand15234grainsofriceinthisbag

opposite:
          .

for fun, I started irb, then typed

"567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')

It returned

67576hgjhgjh&**)

The caret goes inside the brackets (it inverts the character class)

And it should look like this:

"567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')

Note the +

···

On 21-Jul-06, at 4:19 PM, Tom Werner wrote:

dominique.plante@gmail.com wrote:

Tom

--
Jeremy Tregunna
jtregunna@blurgle.ca

"One serious obstacle to the adoption of good programming languages is the notion that everything has to be sacrificed for speed. In computer languages as in life, speed kills." -- Mike Vanier

Jeremy Tregunna wrote:

And it should look like this:

"567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')

Note the +

#sub only does one replacement; adding a + will replace one chunk of non-alphas, but not any others in the string.

Tom
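
A quick sketch of the sub versus gsub difference described above. The input is a made-up string with two separate runs of non-alphanumerics.

```ruby
input = "ab!!cd??ef"   # hypothetical sample input with two runs of punctuation

first_only = input.sub(/[^0-9a-z]+/i, '')    # removes only the first run ("!!")
all_runs   = input.gsub(/[^0-9a-z]+/i, '')   # removes every run

p first_only   # => "abcd??ef"
p all_runs     # => "abcdef"
```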


Jeremy Tregunna wrote:

And it should look like this:

"567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')

Note the +

#sub only does one replacement; adding a + will replace one chunk of non-alphas, but not any others in the string.

typo, sorry.

···

On 21-Jul-06, at 4:44 PM, Tom Werner wrote:

Tom

--
Jeremy Tregunna
jtregunna@blurgle.ca

"One serious obstacle to the adoption of good programming languages is the notion that everything has to be sacrificed for speed. In computer languages as in life, speed kills." -- Mike Vanier

Jeremy Tregunna wrote:

And it should look like this:

"567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')

Note the +

#sub only does one replacement; adding a + will replace one chunk of non-alphas, but not any others in the string.

typo, sorry.

Speaking of typos: use either a-zA-Z or a-z with the /i flag; you don't need both <g>
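
For what it's worth, the two spellings can be checked against each other. This is a small sketch with a made-up input string.

```ruby
input = "Mixed CASE 123 & more"   # hypothetical sample input

with_flag  = input.gsub(/[^0-9a-z]/i, '')     # lowercase range plus /i
with_range = input.gsub(/[^0-9a-zA-Z]/, '')   # both ranges, no flag

p with_flag                 # => "MixedCASE123more"
p with_flag == with_range   # => true
```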

···

On Jul 21, 2006, at 6:15 PM, Jeremy Tregunna wrote:
