In search of a Regexp character class for "weird command characters"

Hello,

I have a regexp search problem.
I have written a text-correction program in Ruby which
reads in a text file, marks every word that's not
in a dictionary array in red in an RTF output file,
and saves that file (from Ruby's viewpoint, it is still a
plain text file, just with some commands specific to
Rich Text Format).
For instance, in a text I have the citation "(Fox , 1970)."
Now, "(Fox" is not a correct English word, so it should
be red and bold; the comma is all right, so it stays black; and
"1970)." is not a correct English word either, so it
should also be red and bold.
In RTF, you can achieve this by replacing

a="(Fox , 1970)."

by

b=" \cf1\b (Fox \cf0\b0 , \cf1\b 1970). \cf0\b0 "
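A minimal sketch of that marking step, assuming the input is split on single spaces and the dictionary is a plain array of accepted tokens. `mark_unknown`, `RED_BOLD_ON`, `RED_BOLD_OFF` and the one-entry dictionary are hypothetical names, not taken from the original program. The control words are kept in single-quoted strings so the backslashes reach the file literally, and `\cf1` only shows as red if the RTF header defines a color table whose entry 1 is red (also an assumption here).

```ruby
# Hypothetical constants: RTF control words for red (\cf1) + bold (\b),
# and for default color (\cf0) + plain weight (\b0).
# Single quotes keep the backslashes literal, so '\cf1\b ' is seven
# characters, not a control character followed by "1".
RED_BOLD_ON  = '\cf1\b '
RED_BOLD_OFF = ' \cf0\b0'

# Hypothetical helper: wraps every token that is not in the dictionary
# in the red/bold control words and leaves known tokens unchanged.
def mark_unknown(text, dictionary)
  text.split(" ").map { |token|
    if dictionary.include?(token)
      token
    else
      RED_BOLD_ON + token + RED_BOLD_OFF
    end
  }.join(" ")
end

# Hypothetical dictionary: only the comma is accepted here.
dictionary = [","]
puts mark_unknown("(Fox , 1970).", dictionary)
```

Running it prints `\cf1\b (Fox \cf0\b0 , \cf1\b 1970). \cf0\b0`, which is the string `b` above without its leading and trailing space, this time with real backslashes in it.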

Now, if you say

p b

Ruby gives the following output:

" \0061\010 (Fox \0060\0100 , \0061\010 1970). \0060\0100 "
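The `\006` and `\010` in that output are not RTF at all: inside a double-quoted Ruby literal, `\cf` is the control-character escape for Ctrl-F (byte 6) and `\b` is backspace (byte 8), which `p` printed in octal here (newer Ruby versions print them with different escapes). A short sketch, assuming the goal is to keep the RTF control words literally, comparing double quotes, single quotes and doubled backslashes; the variable names are illustrative only.

```ruby
# In double quotes, "\cf" is Ctrl-F (byte 6) and "\b" is backspace (byte 8),
# so the control-word names never make it into the string.
double_quoted = " \cf1\b (Fox "

# In single quotes, only \\ and \' are escapes, so the backslashes survive.
single_quoted = ' \cf1\b (Fox '

# Doubling each backslash inside double quotes has the same effect.
escaped = " \\cf1\\b (Fox "

p double_quoted.unpack("C*").first(4)  # => [32, 6, 49, 8]
p single_quoted == escaped             # => true
```

So a one-character `\010` is what you get when the source literal says `\b` inside double quotes; writing the RTF snippets with single quotes (or `\\`) avoids creating those bytes in the first place.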

However, I would like to remove all the characters of the form '\' + number
from the RTF file in a later step.
Is there a character class for Regexps (like \w, \S, etc.) that achieves
this?
I have learned so far that '\010' is one character, and not the same as
'backslash' + three digits.

Best regards,

Axel

Hi --


On Wed, 28 Sep 2005 Nuralanur@aol.com wrote:

However, I would like to remove all the characters of the form '\' + number
from the RTF file in a later step.
Is there a character class for Regexps (like \w, \S, etc.) that achieves
this?

Not a predefined one, but you can do:

   b.gsub!(/[\006-\010]/,"")

which leaves you with:

   " 1 (Fox 00 , 1 1970). 00 "

if you're sure that's what you want.
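A sketch of two ways to strip those bytes in current Ruby: an explicit octal range covering bytes 6 to 8, and the POSIX bracket expression `[[:cntrl:]]`, which Ruby regexps accept and which matches control characters in general. The sample string is `b` from the original question; the variable names are illustrative.

```ruby
b = " \cf1\b (Fox \cf0\b0 , \cf1\b 1970). \cf0\b0 "

# Octal escapes inside a character class: bytes 6 through 8,
# which covers Ctrl-F (\006) and backspace (\010).
by_range = b.gsub(/[\006-\010]/, "")

# POSIX bracket expression for control characters. Broader: it also
# matches tab, newline and carriage return.
by_class = b.gsub(/[[:cntrl:]]/, "")

p by_range              # => " 1 (Fox 00 , 1 1970). 00 "
p by_range == by_class  # => true
```

Either way the digits `1` and `0` remain, because in the double-quoted literal `\cf1` became byte 6 followed by a plain `1`; the control-word names are already gone before any regexp runs. On a whole RTF file, `[[:cntrl:]]` would also remove the line breaks.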

David

--
David A. Black
dblack@wobblini.net