In search of a Regexp character class for "weird command characters"

Nuralanur · 28 September 2005 14:46

Hello,

I have a regexp search problem.
I have written a text-correction program in Ruby which
reads in a text file, marks every word that's not in a
dictionary array in red, and saves the result as an RTF
output file (from Ruby's viewpoint, this is still a plain
text file, with some commands specific to Rich Text Format).
For instance, in a text, I have a citation "(Fox , 1970)."
Now, "(Fox" is not a correct English word, so it should
be red and bold, the comma is all right, so it stays black, and
"1970)." is not a correct English word, either, so it
should be red and bold, also.
In RTF, you can achieve this by replacing

a="(Fox , 1970)."

by

b=" \cf1\b (Fox \cf0\b0 , \cf1\b 1970). \cf0\b0 ".

Now, if you say

p b

Ruby will give the following output

" \0061\010 (Fox \0060\0100 , \0061\010 1970). \0060\0100 "

However, I would like to remove all the characters of the form '\' + number
from the RTF file in a later step.
Is there a character class for Regexps (like \w,\S etc.) that achieves
this?
I have learned so far that '\010' is one character, and not the same as
'backslash' + three digits.
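
To illustrate that last point, here is a minimal sketch, assuming a current Ruby where String#bytes returns an array. The names double and single are only illustrative. In a double-quoted string Ruby reads "\cf" as Ctrl-F (byte 6) and "\b" as backspace (byte 8, octal 010), while a single-quoted string keeps the backslashes that RTF expects:

```ruby
# Double quotes interpret escapes: "\cf" becomes Ctrl-F, "\b" becomes backspace.
double = "\cf1\b"

# Single quotes leave the backslashes alone, so the RTF commands survive.
single = '\cf1\b'

p double.bytes    # byte values of the three characters
p single.bytes    # byte values of the six characters
p double.length
p single.length
```

If that is what is happening here, building the RTF commands with single quotes might avoid the control characters in the first place.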

Best regards,

Axel

David_A_Black3 · 28 September 2005 15:01

Hi --

Not a predefined one, but you can do:

b.gsub!(/[\006-\010]/,"")

which leaves you with:

" 1 (Fox 00 , 1 1970). 00 "

if you're sure that's what you want.

David

--
David A. Black
dblack@wobblini.net
