Remove all illegal chars from string

Hi,

Is there a simple way to remove all but the legal chars from a string, where the legal chars are, for example, a-z A-Z 0-9?
So everything should be removed from the string except these characters.
"exam@p Le3|§" --> "exampLe"

I don't know very much about regular expressions, so I don't know if it's possible with sub or gsub. My first idea was to loop over the string and check every character, but I wondered if there is something simpler or better.

Thanks

str.gsub(/[^a-zA-Z]/, '') should do it.

···

--
Alex

Hi --

You can make a character class that contains what you want, and then
get rid of everything that doesn't match it, using a negated character
class:

   irb(main):004:0> str = "exam@p Le3|§"
   => "exam@p Le3|\302\247"
   irb(main):005:0> str.gsub(/[^A-Za-z0-9]/, '')
   => "exampLe3"

(The '3' disappeared in your example but assuming you want 0-9 it
would actually remain.)
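
As a minimal sketch of the negated character class described above, the call can be wrapped in a small helper. The method name keep_alnum is hypothetical and used only for illustration; it is not something from this thread or from Ruby itself.

```ruby
# Hypothetical helper (illustrative name): keep only ASCII letters and
# digits by deleting every character matched by the negated class.
def keep_alnum(str)
  str.gsub(/[^A-Za-z0-9]/, '')
end

puts keep_alnum("exam@p Le3|§")  # prints exampLe3
puts keep_alnum("no-change")     # prints nochange
```

Because gsub returns a new string, the original value is left untouched.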

David

···

On Wed, 28 Jun 2006, thomas coopman wrote:

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

See what the readers are saying about "Ruby for Rails"!

string.delete("^a-zA-Z0-9")

Hope that helps.

James Edward Gray II

···

On Jun 28, 2006, at 8:09 AM, thomas coopman wrote:

Well you can try with String#tr

moulon% ruby -e 'p "exam@p Le3|§".tr("^a-zA-Z0-9", "")'
"exampLe3"
moulon%

which means replace all characters, except a-z A-Z 0-9, with ""
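
A short sketch comparing this with String#delete may help: when the replacement string passed to tr is empty, the matched characters are simply dropped, so both calls should give the same result here. The two method names are hypothetical wrappers added only for the comparison.

```ruby
# String#tr with an empty replacement string deletes the matched
# characters, so for this task it behaves like String#delete.
def strip_with_tr(str)
  str.tr('^a-zA-Z0-9', '')
end

def strip_with_delete(str)
  str.delete('^a-zA-Z0-9')
end

input = "exam@p Le3|§"
p strip_with_tr(input)      # "exampLe3"
p strip_with_delete(input)  # "exampLe3"
```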

Guy Decoux

It's easy with gsub and a regex:

stringToCheck.gsub!(/[^a-zA-Z0-9]/, "")

The '^' inverts (negates) the list of characters.
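
One detail worth knowing when choosing between gsub and gsub!: the bang version changes the string in place and returns nil when nothing was replaced. The sketch below illustrates this; clean_copy and clean_in_place are hypothetical names used only for the example.

```ruby
# gsub returns a new string and leaves the receiver alone.
def clean_copy(str)
  str.gsub(/[^a-zA-Z0-9]/, '')
end

# gsub! modifies the receiver; it returns nil if no substitution was
# made, so return the string itself rather than the result of gsub!.
def clean_in_place(str)
  str.gsub!(/[^a-zA-Z0-9]/, '')
  str
end

dirty = "exam@p Le3|§".dup
p clean_copy(dirty)                        # "exampLe3"
p dirty                                    # "exam@p Le3|§" (unchanged)
p clean_in_place(dirty)                    # "exampLe3"
p "already".dup.gsub!(/[^a-zA-Z0-9]/, '')  # nil, nothing to replace
```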

Les

···

On 6/28/06, thomas coopman <thomas.coopman@gmail.com> wrote:

sender: "thomas coopman" date: "Wed, Jun 28, 2006 at 10:09:35PM +0900" <<<EOQ

Hi,

Hi,

Is there a simple way to remove all but the legal chars from a string.
where the legal chars are for example: a-z A-Z 0-9
So everything should be removed from the string but these characters.
"exam@p Le3|§" --> "exampLe"

Taking your example, if legal chars are a-z A-Z 0-9 then
the output of:
    "exam@p Le3|§"
should be:
    "exampLe3" and not "exampLe"...

I don't know very much about regular expressions, so I don't know if
it's possible with sub or gsub. My first Idea was to loop over the
string and check every character but I wondered if there is something
more simple or better.

Yes, this is why regexps were invented :)
# irb
irb(main):001:0> "exam@p Le3|§".gsub(/[^a-zA-Z0-9]/,'')
=> "exampLe3"

Thanks

You're welcome,
Alex

The regexp for such a thing would be:

"exam@p Le3|§".gsub(/[^a-zA-Z0-9]/, "")
  => "exampLe3" (you listed 3 as a legal character in the above email. /[^a-zA-Z]/ would remove numbers as well)

Probably about time to learn some regular expressions. Have a look at "Regular Expression Tutorial - Learn How to Use Regular Expressions".
You'll really start to like them once you learn even just the basic matching ideas.
-Mat

···

On Jun 28, 2006, at 9:09 AM, thomas coopman wrote:

This is definitely regex territory. And gsub() is the thing:

ex = "exam@p Le3|§"
puts ex.gsub( /[^A-Za-z0-9]/, '' )

exampLe3

Or

puts "exam@p Le3|§".gsub( /[^A-Za-z0-9]/, '' )

exampLe3

I assume you wanted the 3 in there, since you asked for numbers in your
range of characters. When doing something like this, in my opinion, it's
best to not try to roll your own.

Also, if you're willing to accept underscores in the accepted character
list, you could just use the \W character class, which is equal to
[^A-Za-z0-9_].
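
To make the underscore difference concrete, here is a small sketch. The helper name strip_non_word and the sample string with an added underscore are illustrative assumptions, not part of the original question.

```ruby
# \W is the negation of \w, i.e. [^A-Za-z0-9_], so underscores survive.
def strip_non_word(str)
  str.gsub(/\W/, '')
end

sample = "exam@p L_e3|§"
p strip_non_word(sample)              # "exampL_e3"
p sample.gsub(/[^A-Za-z0-9]/, '')     # "exampLe3"
```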

Yes, there is:

str = "exam@p Le3|§"
str.gsub(/[^a-z0-9]/i, '') # => "exampLe3"

Paul.

···

On 28/06/06, thomas coopman <thomas.coopman@gmail.com> wrote:

It's dead easy:

test = "exam@p Le3|§"
test.gsub(/[^A-Za-z0-9]/, '')
=> "exampLe3"

Quick explanation:

[] defines a group you want to treat as one character. If you only wanted vowels, ferinstance, you'd use [aeiou].

^ as the first character in a group means the opposite of that group. Everything that isn't a vowel would be like this: [^aeiou]

A-Z is a range of characters, and you can do smaller ranges like c-q or whatever; the ordering used to determine the range is the character encoding. This means A-z is also a valid range in ASCII and compatible encodings, but it is not the same as A-Za-z: it additionally covers the six characters that sit between Z and a in ASCII ([, \, ], ^, _ and `), so A-Za-z is the one to use. A range like a-Z is invalid because a > Z in ASCII.

There are also shortcuts for some classes of characters: \d is equivalent to [0-9], and \w is close to [A-Za-z0-9] but also includes the underscore character '_'.

So the regular expression says 'match any single character that is not in the ranges A-Z, a-z, or 0-9'. #gsub takes everything matched by the regular expression, and replaces it with nothing.
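
One caveat about ranges is easy to check in code: in ASCII the range A-z runs from code 65 to 122, which takes in the punctuation between Z and a. The sketch below shows the effect; both method names and the sample string are made up for the illustration.

```ruby
# In ASCII, A-z spans codes 65..122 and so includes the six characters
# between 'Z' (90) and 'a' (97): [ \ ] ^ _ `
def strip_wide_range(str)
  str.gsub(/[^A-z0-9]/, '')
end

def strip_explicit_ranges(str)
  str.gsub(/[^A-Za-z0-9]/, '')
end

sample = 'a_b[c]^d`e'
p strip_wide_range(sample)       # "a_b[c]^d`e" (punctuation kept)
p strip_explicit_ranges(sample)  # "abcde"
```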

matthew smillie.

···

On Jun 28, 2006, at 14:09, thomas coopman wrote:

p "exam@p Le3|§".gsub(/[^a-zA-Z0-9]/, "")

gegroet,
Erik V. - http://www.erikveen.dds.nl/

irb(main):006:0> s = "abcd??ABCD!!0123"
=> "abcd??ABCD!!0123"
irb(main):007:0> s
=> "abcd??ABCD!!0123"
irb(main):008:0> s.tr('^a-zA-Z0-9','')
=> "abcdABCD0123"

···

--
Posted via http://www.ruby-forum.com/.

An excerpt from my upcoming book, Ruby Phrasebook:

"""
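# chomp drops the trailing newline so it is not counted as a bad character
new_password = gets.chomp
# the parentheses matter: without them Ruby passes ('^A-Za-z._' != 0) to count
if new_password.count('^A-Za-z._') != 0 then
  puts "Bad Password"
else
  #do something
end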

This works by using a special syntax that's shared by .count, .tr,
.delete, and .squeeze. A parameter beginning with ^ negates the list; the
list consists of any valid characters in the active character set and
may contain ranges formed with -. If more than one parameter list is
given to .count, .delete, or .squeeze, the lists of characters are
intersected using set logic; that is, only characters in both lists are
used for filtering.

You might also want to simply replace all "evil" characters with _ (such
as perhaps from a CGI form post):

evil_input = '`cat /etc/passwd`'

evil_input.tr('./\`', '_')

#=> "_cat _etc_passwd_"
"""
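
The set syntax described in the excerpt can be tried out with a few one-liners. This is only a sketch: count_bad_chars is a hypothetical helper name, and the sample strings are invented for the illustration.

```ruby
# Count characters outside the allowed set (letters, '.', and '_').
def count_bad_chars(password)
  password.count('^A-Za-z._')
end

p count_bad_chars('good.pass_word')  # 0
p count_bad_chars('bad pass!')       # 2 (the space and the '!')

# With several sets, only characters present in every set are used.
p 'hello world'.count('lo', 'o')     # 2 (only 'o' is in both sets)
p 'hello'.delete('l', 'lo')          # "heo" (only 'l' is in both sets)
```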

In your specific question, you will want to use .delete:

'exam@p Le3|§'.delete '^A-Za-z'
  #=> "exampLe"

···

On Wed, 2006-06-28 at 22:09 +0900, thomas coopman wrote:

   str="exam@p L_e3|§"
   puts str.gsub(/[^a-zA-Z0-9]/, "")
   # yields:
   # exampLe3

a bit shorter would be

   puts str.gsub(/\W/, "")
   # but "word"-characters (\w) and "non-word"-characters (\W) also
   # contain the' _', so this would yield:
   # exampL_e3

Benedikt

   ALLIANCE, n. In international politics, the union of two thieves who
     have their hands so deeply inserted in each other's pockets that
     they cannot separately plunder a third.
       (Ambrose Bierce, The Devil's Dictionary)

"exam@p Le3|§".gsub(/\W/, '')

will return

"exampLe3"

I strongly suggest you learn about regular expressions. You can start with
the Wikipedia article "Regular expression", which links to many tutorials.
At http://www.rubycentral.com/book/tut_stdtypes.html you can find info on
Ruby regular expressions, though it might be best to get yourself a Ruby
book.

Anselm

···

On Wednesday 28 June 2006 14:09, thomas coopman wrote:

--
------------------------------
Netuxo Ltd
a workers' co-operative
providing low-cost IT solutions
for peace, environmental and social justice groups
and the radical NGO sector
------------------------------

The easiest way to find stuff is to search comp.lang.ruby through
Google groups:

http://groups.google.com/group/comp.lang.ruby/browse_frm/thread/c9b63420fe8f66a9?q=remove+non-ASCII&

ruby-talk-google was set up in late April and doesn't have much
searchable history

You've hit the nail on the head. Use gsub on the string.

"exam@p Le3|§".gsub(/\W/, '') # --> exampLe3

The \W in the regular expression matches every character that is not a
valid word character: i.e. [^a-zA-Z0-9_]

Blessings,
TwP

···

On 6/28/06, thomas coopman <thomas.coopman@gmail.com> wrote:

Thomas,

s = "exam@p Le3|§"
s.gsub(/[^a-zA-Z0-9]/, '') # => "exampLe3"

Thanks,

David

···

On 6/28/06, thomas coopman <thomas.coopman@gmail.com> wrote:

--
--------
David Pollak's Ruby Playground
http://dppruby.com