Remove all illegal chars from string

Ben_Nagy · 28 June 2006 17:18

Character sets with ranges: [a-z]
Negated sets: [^a-z]

"exam@p Le3|§".gsub(/[^a-zA-Z0-9]/,'') => "exampLe3"

ben

···

-----Original Message-----
From: thomas coopman [mailto:thomas.coopman@gmail.com]
Sent: Wednesday, June 28, 2006 8:10 PM
To: ruby-talk ML
Subject: remove all illegal chars from string

Hi,

Is there a simple way to remove all but the legal chars from
a string, where the legal chars are, for example, a-z A-Z 0-9?
Everything except these characters should be removed from the string.
"exam@p Le3|§" --> "exampLe"

I don't know very much about regular expressions, so I don't
know if it's possible with sub or gsub.

Robert_K1 · 28 June 2006 17:19

Hi,

Is there a simple way to remove all but the legal chars from a string, where the legal chars are, for example, a-z A-Z 0-9?
Everything except these characters should be removed from the string.
"exam@p Le3|§" --> "exampLe"

Some options:

s.gsub /[^a-zA-Z0-9]+/, ''
s.gsub /\W+/, ''

(or use gsub! if you want to change in place)
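Note that the two patterns are not quite equivalent: \W is the complement of \w, and \w includes the underscore, so an underscore survives the second version. A small sketch of the difference (the sample string and variable names are made up for illustration):

```ruby
s = "exam@p_Le3|"

# Explicit negated class: removes everything outside a-z, A-Z and 0-9.
strict = s.gsub(/[^a-zA-Z0-9]+/, '')

# \W only removes non-word characters, and "_" counts as a word character.
loose = s.gsub(/\W+/, '')

# gsub returns a new string and leaves the receiver alone;
# gsub! changes the receiver and returns nil when nothing was replaced.
copy = s.dup
copy.gsub!(/[^a-zA-Z0-9]+/, '')
no_change = "abc".dup.gsub!(/[^a-zA-Z0-9]+/, '')

puts strict            # exampLe3
puts loose             # examp_Le3
puts copy              # exampLe3
puts no_change.inspect # nil
```

Because gsub! can return nil, it is safer not to chain further calls onto its result.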

I don't know very much about regular expressions, so I don't know if it's possible with sub or gsub. My first idea was to loop over the string and check every character, but I wondered if there is something simpler or better.

Definitely.

Kind regards

robert

···

2006/6/28, thomas coopman <thomas.coopman@gmail.com>:

--
Have a look: Robert K. | Flickr

A_S_Bradbury · 28 June 2006 17:20

You were right to think along the lines of gsub. A possible approach is:
"exam@p Le3|§".gsub(/[^[:alnum:]]/, '') --> "exampLe3"
This replaces all characters that are not in the :alnum: POSIX character class
with an empty string.

If you just want alpha characters (not 0-9), then use [[:alpha:]] instead.
Your example contradicted what you stated you were looking for.

The only thing to be wary of here is that Regexps such as /[[:alpha:]]/ may
act differently depending on locale. I have no idea whether locale has any
effect in non-Oniguruma Ruby 1.8.x, but it certainly does on a Ruby compiled
with the Oniguruma regular expression engine (mine at least). With that in mind,
the best way to get what you want may be to use /[^a-zA-Z0-9]/ as your Regex.

As a side note, I didn't know that you could also do /[[:^alnum:]]/.
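To make the difference concrete, here is a rough sketch comparing the POSIX class with the explicit ASCII ranges. It assumes Ruby 1.9 or later, where the bracket classes are Unicode-aware on UTF-8 strings; the sample strings and variable names are only for illustration, and the non-ASCII characters are written as \u escapes so the result does not depend on the source file encoding.

```ruby
# "\u00a7" is the section sign, "\u00e9" is e with an acute accent.
sample   = "exam@p Le3|\u00a7"
accented = "caf\u00e9 #1"

# Both spellings of the negated POSIX class give the same result.
negated_set   = sample.gsub(/[^[:alnum:]]/, '')
negated_class = sample.gsub(/[[:^alnum:]]/, '')

# On a UTF-8 string, [[:alnum:]] also matches non-ASCII letters,
# while the explicit ranges only ever keep ASCII letters and digits.
posix_kept = accented.gsub(/[^[:alnum:]]/, '')
ascii_kept = accented.gsub(/[^a-zA-Z0-9]/, '')

puts negated_set    # exampLe3
puts negated_class  # exampLe3
puts posix_kept     # keeps the accented letter
puts ascii_kept     # caf1
```

So if "legal" really means only a-z, A-Z and 0-9, the explicit ranges are the safer choice.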

Hope this helps,
Alex

···

On Wednesday 28 June 2006 14:09, thomas coopman wrote:

Is there a simple way to remove all but the legal chars from a string,
where the legal chars are, for example, a-z A-Z 0-9? Everything except these
characters should be removed from the string. "exam@p Le3|§" --> "exampLe"

Troy_Denkinger · 28 June 2006 18:18

delete() - nice. As a Perler just coming to Ruby, it's hard not to fall
back on old habits (regex with gsub, for instance).

Troy

···

On 6/28/06, James Edward Gray II <james@grayproductions.net> wrote:

string.delete("^a-zA-Z0-9")

Pete1 · 28 June 2006 19:56

Troy Denkinger wrote:

···

On 6/28/06, James Edward Gray II <james@grayproductions.net> wrote:

string.delete("^a-zA-Z0-9")

delete() - nice. As a Perler just coming to Ruby, it's hard not to fall
back on old habits (regex with gsub, for instance).

Troy

just too bad the rdoc is inaccurate and says that delete takes a string as argument and not a regexp

A_S_Bradbury · 28 June 2006 20:10

That's actually correct. At first glance I thought it just constructed a regex
from the given string (due to its understanding of character ranges).
Instead, both String#count and String#delete take this kind of pseudo-regex.
Is there a reason it wouldn't make more sense for these methods to take a
regexp?
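For reference, a short sketch of how these character-set strings behave in String#delete and String#count. A leading ^ negates the set and a-z style ranges are expanded; the sample string and variable names are just for illustration.

```ruby
s = "exam@p Le3|\u00a7"   # "\u00a7" is the section sign

# Delete every character that is NOT in the listed ranges.
kept = s.delete("^a-zA-Z0-9")

# Count the characters inside and outside the same set.
legal_count   = s.count("a-zA-Z0-9")
illegal_count = s.count("^a-zA-Z0-9")

# With several arguments, only characters in the intersection are affected.
intersection = "hello".delete("l", "lo")

puts kept           # exampLe3
puts legal_count    # 8
puts illegal_count  # 4
puts intersection   # heo
```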

Alex

···

On Wednesday 28 June 2006 20:56, Pete wrote:

just too bad the rdoc is inaccurate and says that delete takes a string
as argument and not a regexp

Eric_Hodel1 · 28 June 2006 22:54

#delete and #count are restricted to character lists; a full regular expression is too much.

···

On Jun 28, 2006, at 1:10 PM, A. S. Bradbury wrote:

On Wednesday 28 June 2006 20:56, Pete wrote:

just too bad the rdoc is inaccurate and says that delete takes a string
as argument and not a regexp

That's actually correct. At first glance I thought it just constructed a regex
from the given string (due to its understanding of character ranges).
Instead, both String#count and String#delete take this kind of pseudo-regex.
Is there a reason it wouldn't make more sense for these methods to take a
regexp?

--
Eric Hodel - drbrain@segment7.net - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

Jason_Clinton · 29 June 2006 00:29

Performance is just slightly better with character lists than with
regular expressions.
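If you want to check this on your own machine, a minimal sketch using the standard Benchmark library could look like the following. The input size and iteration count are arbitrary choices for illustration, and the timings will vary with the Ruby version and the data.

```ruby
require 'benchmark'

s = "exam@p Le3|" * 1_000   # arbitrary test input
n = 200                     # arbitrary number of iterations

# Both approaches should produce the same cleaned string.
with_delete = s.delete("^a-zA-Z0-9")
with_gsub   = s.gsub(/[^a-zA-Z0-9]/, '')

Benchmark.bm(8) do |x|
  x.report("delete:") { n.times { s.delete("^a-zA-Z0-9") } }
  x.report("gsub:")   { n.times { s.gsub(/[^a-zA-Z0-9]/, '') } }
end
```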

···

On Thu, 2006-06-29 at 05:10 +0900, A. S. Bradbury wrote:

Is there a reason it wouldn't make more sense for these methods to take a
regexp?

Christian_Neukirche1 · 29 June 2006 21:57

Eric Hodel <drbrain@segment7.net> writes:

just too bad the rdoc is inaccurate and says that delete takes a
string
as argument and not a regexp

That's actually correct. At first glance I thought it just
constructed a regex
from the given string (due to its understanding of character ranges).
Instead, both String#count and String#delete take this kind of
pseudo-regex.
Is there a reason it wouldn't make more sense for these methods to
take a
regexp?

#delete and #count are restricted to character lists, a full regular
expression is too much.

+1 for adding these to a future Ruby.

···

On Jun 28, 2006, at 1:10 PM, A. S. Bradbury wrote:

On Wednesday 28 June 2006 20:56, Pete wrote:

Eric Hodel - drbrain@segment7.net - http://blog.segment7.net

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

James_Edward_Gray_II · 29 June 2006 22:10

We have those now. They are called gsub() and scan().
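A quick sketch of the correspondence, with a made-up sample string: scan plays the role of a regexp-aware count, and gsub the role of a regexp-aware delete.

```ruby
s = "exam@p Le3|\u00a7"   # "\u00a7" is the section sign

# count with a character list vs. scan with a regexp
list_count  = s.count("a-zA-Z0-9")
regex_count = s.scan(/[a-zA-Z0-9]/).length

# delete with a character list vs. gsub with a regexp
list_clean  = s.delete("^a-zA-Z0-9")
regex_clean = s.gsub(/[^a-zA-Z0-9]/, '')

# A regexp can also match things a character list cannot express,
# such as whole runs of digits.
digit_runs = "a12b345".scan(/\d+/)

puts list_count == regex_count   # true
puts list_clean == regex_clean   # true
puts digit_runs.inspect          # ["12", "345"]
```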

James Edward Gray II

···

On Jun 29, 2006, at 4:57 PM, Christian Neukirchen wrote:

Eric Hodel <drbrain@segment7.net> writes:

On Jun 28, 2006, at 1:10 PM, A. S. Bradbury wrote:

On Wednesday 28 June 2006 20:56, Pete wrote:

just too bad the rdoc is inaccurate and says that delete takes a
string
as argument and not a regexp

That's actually correct. At first glance I thought it just
constructed a regex
from the given string (due to its understanding of character ranges).
Instead, both String#count and String#delete take this kind of
pseudo-regex.
Is there a reason it wouldn't make more sense for these methods to
take a
regexp?

#delete and #count are restricted to character lists, a full regular
expression is too much.

+1 for adding these to a future Ruby.
