Remove all illegal chars from string

Character sets with ranges: [a-z]
Negated sets: [^a-z]

"exam@p Le3|§".gsub(/[^a-zA-Z0-9]/,'') => "exampLe3"

ben

···

-----Original Message-----
From: thomas coopman [mailto:thomas.coopman@gmail.com]
Sent: Wednesday, June 28, 2006 8:10 PM
To: ruby-talk ML
Subject: remove all illegal chars form string

Hi,

Is there a simple way to remove all but the legal chars from
a string, where the legal chars are, for example, a-z A-Z 0-9?
So everything should be removed from the string but these characters.
"exam@p Le3|§" --> "exampLe"

I don't know very much about regular expressions, so I don't
know if it's possible with sub or gsub.

Hi,

Is there a simple way to remove all but the legal chars from a string, where the legal chars are, for example, a-z A-Z 0-9?
So everything should be removed from the string but these characters.
"exam@p Le3|§" --> "exampLe"

Some options:

s.gsub(/[^a-zA-Z0-9]+/, '')
s.gsub(/\W+/, '')   # note: \w includes "_", so underscores are kept

(or use gsub! if you want to change in place)
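A minimal sketch of the two options above, plus the in-place variant. The helper names strip_illegal and strip_non_word are made up for illustration; only gsub and gsub! come from the thread. The two patterns are not quite equivalent, because \w counts the underscore as a word character.

```ruby
# Hypothetical helper: keep only ASCII letters and digits.
def strip_illegal(str)
  str.gsub(/[^a-zA-Z0-9]+/, '')
end

# Hypothetical helper: same idea with \W. Since \w includes "_",
# underscores survive this version.
def strip_non_word(str)
  str.gsub(/\W+/, '')
end

s = "exam@p Le3|§".dup             # dup so the in-place call below works on an unfrozen copy
puts strip_illegal(s)              # prints "exampLe3"
puts strip_illegal("a_b")          # prints "ab"
puts strip_non_word("a_b")         # prints "a_b"

# gsub! changes the receiver itself and returns nil when nothing matched.
s.gsub!(/[^a-zA-Z0-9]+/, '')
puts s                             # prints "exampLe3"
p "abc".dup.gsub!(/[^a-zA-Z0-9]+/, '')   # prints nil
```

Because gsub! can return nil, it is safer not to chain further calls onto its result.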

I don't know very much about regular expressions, so I don't know if it's possible with sub or gsub. My first idea was to loop over the string and check every character, but I wondered if there is something simpler or better.

Definitely.

Kind regards

robert

···

2006/6/28, thomas coopman <thomas.coopman@gmail.com>:

--
Have a look: Robert K. | Flickr

You were right to think along the lines of gsub. One possible approach is:
"exam@p Le3|§".gsub(/[^[:alnum:]]/, '') --> "exampLe3"
This replaces every character that is not in the :alnum: POSIX character class
with an empty string.

If you just want alpha characters (not 0-9), then use [[:alpha:]] instead.
Your example contradicts what you said you were looking for: it drops the 3,
even though 0-9 is listed as legal.

The only thing to be wary of here is that Regexps such as /[[:alpha:]]/ may
act differently depending on locale. I have no idea whether that has any effect
in non-Oniguruma Ruby 1.8.x, but it certainly does on a Ruby compiled with
the Oniguruma regular expression engine (mine at least). With that in mind,
the safest way to get what you want may be to spell out the class, as in
/[^a-zA-Z0-9]/.

As a side note, I didn't know that you could also do /[[:^alnum:]]/.
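To make the difference concrete, here is a small sketch comparing the POSIX classes with an explicit ASCII class. The helper names are hypothetical, and the comments assume a current Ruby (1.9 or later) with UTF-8 strings, where POSIX bracket classes are Unicode-aware; the 1.8.x behaviour discussed above may differ.

```ruby
# Hypothetical helpers, one per pattern discussed above.
def keep_alnum_posix(str)
  str.gsub(/[^[:alnum:]]/, '')     # negated set containing a POSIX class
end

def keep_alnum_ascii(str)
  str.gsub(/[^a-zA-Z0-9]/, '')     # explicit ASCII ranges only
end

def keep_alpha_posix(str)
  str.gsub(/[[:^alpha:]]/, '')     # negated POSIX class, letters only
end

puts keep_alnum_posix("exam@p Le3|§")   # prints "exampLe3"
puts keep_alpha_posix("exam@p Le3|§")   # prints "exampLe"

# With accented letters the two approaches diverge (assuming UTF-8 strings):
puts keep_alnum_posix("café 3")         # prints "café3"
puts keep_alnum_ascii("café 3")         # prints "caf3"
```

If "legal" strictly means a-z, A-Z and 0-9, the explicit class is the predictable choice.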

Hope this helps,
Alex

···

On Wednesday 28 June 2006 14:09, thomas coopman wrote:

Is there a simple way to remove all but the legal chars from a string,
where the legal chars are, for example, a-z A-Z 0-9? So everything should be
removed from the string but these characters. "exam@p Le3|§" --> "exampLe"

delete() - nice. As a Perler just coming to Ruby, it's hard not to fall
back on old habits (regex with gsub, for instance).

Troy

···

On 6/28/06, James Edward Gray II <james@grayproductions.net> wrote:

string.delete("^a-zA-Z0-9")

Troy Denkinger wrote:

···

On 6/28/06, James Edward Gray II <james@grayproductions.net> wrote:

string.delete("^a-zA-Z0-9")

delete() - nice. As a Perler just coming to Ruby, it's hard not to fall
back on old habits (regex with gsub, for instance).

Troy

just too bad the rdoc is inaccurate and says that delete takes a string as argument and not a regexp

The rdoc is actually correct. At first glance I thought it just constructed a regex
from the given string (due to its understanding of character ranges).
Instead, both String#count and String#delete take this kind of pseudo-regex.
Is there a reason it wouldn't make more sense for these methods to take a
regexp?
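A short sketch of that "pseudo-regex" (a character selector) as String#delete and String#count understand it. The helper names only_alnum and count_lower are made up for illustration.

```ruby
# Hypothetical helpers wrapping the character-selector methods.
def only_alnum(str)
  str.delete("^a-zA-Z0-9")       # a leading ^ negates the set
end

def count_lower(str)
  str.count("a-z")               # ranges work as in a regex bracket
end

s = "exam@p Le3|§"
puts only_alnum(s)                    # prints "exampLe3"
puts count_lower(s)                   # prints 6
puts "hello".delete("aeiou", "^e")    # prints "hell" (several selectors are intersected)

# The argument really has to be a String; a Regexp is rejected.
begin
  s.delete(/[^a-zA-Z0-9]/)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```

So the selector borrows range and negation syntax from bracket expressions, but no regular expression is ever compiled.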

Alex

···

On Wednesday 28 June 2006 20:56, Pete wrote:

just too bad the rdoc is inaccurate and says that delete takes a string
as argument and not a regexp

#delete and #count are restricted to character lists; a full regular expression would be too much.

···

On Jun 28, 2006, at 1:10 PM, A. S. Bradbury wrote:

On Wednesday 28 June 2006 20:56, Pete wrote:

just too bad the rdoc is inaccurate and says that delete takes a string
as argument and not a regexp

That's actually correct. At first glance I thought it just constructed a regex
from the given string (due to its understanding of character ranges).
Instead, both String#count and String#delete take this kind of pseudo-regex.
Is there a reason it wouldn't make more sense for these methods to take a
regexp?

--
Eric Hodel - drbrain@segment7.net - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

Performance is just slightly better with character lists than with
regular expressions.
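Anyone who wants to check this on their own machine could start from a rough sketch like the following. The input size and repetition count are arbitrary assumptions, the helper names are made up, and no timings are claimed here since results depend on the Ruby version and the data.

```ruby
require 'benchmark'

# Hypothetical helpers for the two approaches being compared.
def with_delete(str)
  str.delete("^a-zA-Z0-9")
end

def with_gsub(str)
  str.gsub(/[^a-zA-Z0-9]+/, '')
end

sample = "exam@p Le3|§" * 1_000   # arbitrary test input
runs   = 200                      # arbitrary repetition count

# Prints user/system/total/real times for each approach.
Benchmark.bm(8) do |x|
  x.report("delete:") { runs.times { with_delete(sample) } }
  x.report("gsub:")   { runs.times { with_gsub(sample) } }
end
```

Both helpers return the same string, so whatever difference shows up is purely in running time.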

···

On Thu, 2006-06-29 at 05:10 +0900, A. S. Bradbury wrote:

Is there a reason it wouldn't make more sense for these methods to take a
regexp?

Eric Hodel <drbrain@segment7.net> writes:

just too bad the rdoc is inaccurate and says that delete takes a
string
as argument and not a regexp

That's actually correct. At first glance I thought it just
constructed a regex
from the given string (due to its understanding of character ranges).
Instead, both String#count and String#delete take this kind of
pseudo-regex.
Is there a reason it wouldn't make more sense for these methods to
take a
regexp?

#delete and #count are restricted to character lists, a full regular
expression is too much.

+1 for adding these to a future Ruby.

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

We have those now. They are called gsub() and scan(). ;)
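Spelled out, the regexp counterparts of delete and count could look like this. It is only a sketch; delete_matching and count_matching are hypothetical names, not existing String methods.

```ruby
# Hypothetical regexp-based counterpart of String#delete.
def delete_matching(str, pattern)
  str.gsub(pattern, '')
end

# Hypothetical regexp-based counterpart of String#count:
# scan returns an array of all matches, so its size is the count.
def count_matching(str, pattern)
  str.scan(pattern).size
end

s = "exam@p Le3|§"
puts delete_matching(s, /[^a-zA-Z0-9]/)   # prints "exampLe3"
puts count_matching(s, /[a-z]/)           # prints 6
puts count_matching(s, /[^a-zA-Z0-9]/)    # prints 4

# Unlike count, a regexp can match more than one character at a time:
puts count_matching("a1b22c333", /\d+/)   # prints 3
```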

James Edward Gray II

···

On Jun 29, 2006, at 4:57 PM, Christian Neukirchen wrote:

+1 for adding these to a future Ruby.