Hi.
A simple one, again. How do I strip out all chars
except valid ASCII from a file?
while line = gets()
line.sub(/[^\w]/, " ")
puts line
end
leaves stuff, such as ESC sequences. Any ideas?
Is there a one-liner for this?
Regards,
-mark.
In article 5.1.0.14.2.20021002161707.030aecd8@zcard04k.ca.nortel.com,
Mark Probert wrote:
Hi.
A simple one, again. How do I strip out all chars
except valid ASCII from a file?
while line = gets()
line.sub(/[^\w]/, " ")
puts line
end
leaves stuff, such as ESC sequences. Any ideas?
Is there a one-liner for this?
Did you mean to say
line.gsub!(/\W/, ' ')
which will change all non-word characters in line to spaces and modifies
line?
What do you mean by “valid ASCII” here? Are you really after printable
characters < chr(128)? e.g.
line.gsub!(/[^ -~]/, ' ')
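To make the difference between sub, gsub and gsub! concrete, here is a small sketch. The sample string is made up for illustration (an ESC byte at the front and a DEL byte at the end); it is not taken from the original file.

```ruby
# Hypothetical sample line: ESC, some text, then a DEL byte.
sample = "\e[23DOPC Shutdown\x7f"

# sub returns a copy with only the FIRST match replaced;
# the receiver itself is left untouched.
first_only = sample.sub(/[^ -~]/, ' ')

# gsub returns a copy with EVERY match replaced.
all_replaced = sample.gsub(/[^ -~]/, ' ')

# gsub! replaces every match in place (and returns nil if nothing matched).
line = sample.dup
line.gsub!(/[^ -~]/, ' ')

p first_only
p all_replaced
p line
```

The original loop used sub without the bang and discarded its return value, so the line that was printed was never changed.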
Hope this helps,
Mike
–
mike@stok.co.uk | The “`Stok’ disclaimers” apply.
http://www.stok.co.uk/~mike/ | GPG PGP Key 1024D/059913DA
mike@exegenix.com | Fingerprint 0570 71CD 6790 7C28 3D60
http://www.exegenix.com/ | 75D2 9EC4 C1C0 0599 13DA
Hi Mark,
Depending on what you mean by “valid ASCII”, working at a lower
level is probably more appropriate than using a regular
expression. Just an example:
ARGF.each_byte do |byte|
if byte < 128
print byte.chr
end
end
I guess you can just change the criterion (byte < 128) to suit your
needs. You may even be able to turn the above code into a one-liner.
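One possible way to squeeze the byte filter into a single expression is sketched below. The helper name ascii_only and the file name input.txt are assumptions made for this example only.

```ruby
# Keep only bytes below 128 (7-bit ASCII); everything else is dropped.
# ascii_only is a hypothetical helper name, not part of Ruby.
def ascii_only(data)
  data.each_byte.select { |byte| byte < 128 }.pack("C*")
end

# The same idea from the command line (input.txt is a placeholder):
#   ruby -e 'ARGF.each_byte { |b| print b.chr if b < 128 }' input.txt
```

Note that this keeps control characters such as ESC (byte 27), since they are below 128. Tightening the test to something like (32..126) would keep only the printable range.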
Regards,
Bill
My apologies, I wasn’t clear.
The file is plain text that has curses detritus all the way
through it. A sample is:
[23DOPC Save and Restore [0m[m
[23DOPC Shutdown
[19DOPC Date [0;7m[m
[23DPort Configuration [0;7m[m
[23DIP Routing Admin
[C Unix Shell
[10DOPC PM Coll. Filter
[19DTL1 Configuration
[20DLog Archive
Your suggestion is excellent at getting rid of the \C-?
characters. Any ideas on the escape strings?
-mark.
Ick. Here you aren’t dealing with single characters, but with sequences.
For example, the (implied Esc)[23D is the ANSI escape sequence to move the
cursor back (left) 23 columns. There are a “metric buttload” of escape
sequences.
If this were a Perl newsgroup then I would concoct a complex regexp,
but you might be better off doing something like this and then refining it
once it works. First deal with the escape sequences:
line.gsub!(/\e\[\d+[ABCD]/, ' ') # cursor up, down, forward, back
line.gsub!(/\e\[2J/, ' ') # clear screen (could combine
# with previous regex)
line.gsub!(/\e\[\d+;\d+[Hf]/, ' ') # cursor positioning
line.gsub!(/\e\[[suK]/, ' ') # save / restore / erase line
line.gsub!(/\e\[\d+(?:;\d+)*m/, ' ') # set graphics mode
line.gsub!(/\e\[=\d+[hl]/, ' ') # set mode
# etc.
Then see what you have left in the line.
Some of the quantifiers might need to be * rather than + if you are
allowed zero or more digits (e.g. there seem to be some \e[m in your
example, so /\e\[\d+(?:;\d+)*m/ might need to be /\e\[\d*(?:;\d+)*m/,
and the bare [C in your sample would need /\e\[\d*[ABCD]/).
There are more codes than these; they are easy to find on the web via
Google.
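Since there are many more codes than the handful listed, one possible shortcut is to match the general shape of a CSI sequence instead of each code separately: ESC, "[", any parameter bytes (0x30-0x3F), any intermediate bytes (0x20-0x2F), and one final byte (0x40-0x7E). The sketch below assumes every escape in the file is a CSI sequence of that shape. The names CSI_SEQUENCE and strip_ansi are made up for this example, and it deletes the sequences rather than replacing them with a space.

```ruby
# One CSI sequence: ESC [ , parameter bytes, intermediate bytes, final byte.
CSI_SEQUENCE = /\e\[[0-?]*[ -\/]*[@-~]/

# strip_ansi is a hypothetical helper name used only for this sketch.
def strip_ansi(line)
  line.gsub(CSI_SEQUENCE, '')      # drop whole escape sequences first
      .gsub(/[^ -~\t\n]/, '')      # then drop leftover non-printable characters
end

# Sample lines modelled on the thread, with the ESC bytes written out.
puts strip_ansi("\e[23DOPC Save and Restore \e[0m\e[m")
puts strip_ansi("\e[C Unix Shell")
```

Removing the sequences first matters: if the non-printable characters were stripped first, the ESC would vanish and leave "[23D" behind as ordinary text. Other escape families (for example ones that do not start with ESC [) would need their own patterns.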
Hope this helps,
Mike