-----Original Message-----
From: thomas coopman [mailto:thomas.coopman@gmail.com]
Sent: Wednesday, June 28, 2006 8:10 PM
To: ruby-talk ML
Subject: remove all illegal chars form string
Hi,
Is there a simple way to remove all but the legal chars from
a string, where the legal chars are for example: a-z A-Z 0-9?
So everything should be removed from the string but these characters.
"exam@p Le3|§" --> "exampLe"
I don't know very much about regular expressions, so I don't
know if it's possible with sub or gsub.
Some options:
s.gsub(/[^a-zA-Z0-9]+/, '')
s.gsub(/\W+/, '')
(note that \W treats the underscore as a word character, so the second form keeps "_";
use gsub! if you want to change the string in place)
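To make the difference between the two patterns concrete, here is a small sketch. The helper name keep_alnum and the sample string are made up for this illustration and are not from the original post.

```ruby
# Hypothetical helper, introduced only for this illustration.
def keep_alnum(s)
  # Negated character class: delete every run of characters that are
  # not ASCII letters or digits.
  s.gsub(/[^a-zA-Z0-9]+/, '')
end

sample = "exam@p_Le3|"

puts keep_alnum(sample)        # the underscore is removed too
puts sample.gsub(/\W+/, '')    # \W counts "_" as a word character, so it stays

# gsub! modifies the receiver in place (and returns nil if nothing changed).
copy = sample.dup
copy.gsub!(/[^a-zA-Z0-9]+/, '')
puts copy
```

If underscores count as illegal for you, the explicit negated class is the safer of the two.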
I don't know very much about regular expressions, so I don't know if it's possible with sub or gsub. My first idea was to loop over the string and check every character, but I wondered if there is something simpler or better.
Definitely.
Kind regards
robert
···
2006/6/28, thomas coopman <thomas.coopman@gmail.com>:
You were right to think along the lines of gsub. One possible approach is:
"exam@p Le3|§".gsub(/[^[:alnum:]]/, '') --> "exampLe3"
This replaces every character that is not in the :alnum: POSIX character class
with an empty string.
If you just want alpha characters (not 0-9), then use [[:alpha:]] instead.
Your example contradicted what you stated you were looking for.
The only thing to be wary of here is that Regexps such as /[[:alpha:]]/ may
act differently depending on locale. I have no idea whether that has any effect
in non-Oniguruma Ruby 1.8.x, but it certainly does on a Ruby compiled with
the Oniguruma regular expression engine (mine at least). With that in mind,
the best way to get what you want may be to spell the ranges out, as in
/[^a-zA-Z0-9]/.
As a side note, I didn't know that you could also do /[[:^alnum:]]/.
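A rough sketch of the difference, assuming a Ruby 1.9 or later interpreter (where POSIX bracket classes follow the string's encoding rather than the 1.8.x behaviour discussed here). The helper names and the sample word are invented for the example.

```ruby
# Hypothetical helpers, used only for this illustration.
def strip_posix(s)
  s.gsub(/[^[:alnum:]]/, '')     # negated set containing the POSIX class
end

def strip_posix_negated(s)
  s.gsub(/[[:^alnum:]]/, '')     # negation written inside the POSIX bracket
end

def strip_ascii(s)
  s.gsub(/[^a-zA-Z0-9]/, '')     # explicit ASCII ranges only
end

# "cafe" with an accented final letter, written as an escape so the
# source file itself stays plain ASCII.
word = "caf\u00e9 3!"

puts strip_posix(word)
puts strip_posix_negated(word)
puts strip_ascii(word)
```

On such a Ruby the two POSIX forms should keep the accented letter, while the explicit ASCII ranges drop it, which is why the explicit form is the predictable choice when "legal" really means a-z, A-Z and 0-9.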
Hope this helps,
Alex
···
That's actually correct. At first glance I thought it just constructed a regex
from the given string (due to its understanding of character ranges).
Instead, both String#count and String#delete take this kind of pseudo-regex.
Is there a reason it wouldn't make more sense for these methods to take a
regexp?
Alex
···
On Wednesday 28 June 2006 20:56, Pete wrote:
just too bad the rdoc is inaccurate and says that delete takes a string
as argument and not a regexp
#delete and #count are restricted to character lists, a full regular expression is too much.
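For reference, a short sketch of the character-list syntax that String#delete and String#count accept. The sample string is adapted from the original question (the trailing non-ASCII character is left out to keep the example encoding-neutral).

```ruby
s = "exam@p Le3|"

# A leading ^ negates the list, and ranges such as a-z are expanded.
puts s.delete("^a-zA-Z0-9")   # removes everything except ASCII letters and digits
puts s.count("a-z")           # number of lowercase ASCII letters
puts s.count("^a-zA-Z0-9")    # number of characters that delete would remove
```

The argument is an ordinary String, not a Regexp, so quantifiers, alternation and anchors have no special meaning in it.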
--
Eric Hodel - drbrain@segment7.net - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant
Is there a reason it wouldn't make more sense for these methods to take a regexp?
#delete and #count are restricted to character lists, a full regular expression is too much.
+1 for adding these to a future Ruby.
···
On Jun 28, 2006, at 1:10 PM, A. S. Bradbury wrote:
We have those now. They are called gsub() and scan().
James Edward Gray II
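A minimal sketch of how gsub and scan cover the regexp versions of delete and count. The helper names count_matches and delete_matches are hypothetical and exist only for this example.

```ruby
# Hypothetical helpers, used only for this illustration.
def count_matches(s, pattern)
  # scan returns an array of all matches; its size is the match count.
  s.scan(pattern).size
end

def delete_matches(s, pattern)
  # Replacing every match with the empty string deletes it.
  s.gsub(pattern, '')
end

s = "exam@p Le3|"

puts count_matches(s, /[^a-zA-Z0-9]/)    # single illegal characters
puts delete_matches(s, /[^a-zA-Z0-9]/)   # same result as s.delete("^a-zA-Z0-9")
puts count_matches(s, /[a-z]+/)          # runs of lowercase letters, which a character list cannot express
```

The last call shows what a full regular expression adds: it counts runs of characters rather than individual characters.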
···
On Jun 29, 2006, at 4:57 PM, Christian Neukirchen wrote:
Eric Hodel <drbrain@segment7.net> writes:
#delete and #count are restricted to character lists, a full regular
expression is too much.