Finding non-printable characters using Regular Expressions

As part of a method I am playing with while learning Ruby, I need to be able to determine which characters in a string are non-printable. What is the "best" method for determining if a character is printable, such as an "A", or unprintable, such as a tab?
While I could create a list of printable characters using ranges, is this the best way to do this?

The POSIX character classes are for exactly this:

irb(main):001:0> "A \n B \t C".gsub(/[[:graph:]]/, '')
=> " \n  \t "
irb(main):002:0> "A \n B \t C".gsub(/[[:print:]]/, '')
=> "\n\t"
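As a rough illustration of the same idea applied per character, the sketch below sorts the characters of a string into printable and non-printable groups. The method name partition_printable is made up for this example and is not part of Ruby; String#match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper (not a Ruby built-in): split a string's characters
# into printable and non-printable groups using the POSIX [[:print:]] class.
def partition_printable(str)
  str.each_char.partition { |ch| ch.match?(/[[:print:]]/) }
end

printable, unprintable = partition_printable("A \n B \t C")
p printable    # => ["A", " ", " ", "B", " ", " ", "C"]
p unprintable  # => ["\n", "\t"]
```

Note that the space counts as printable under [[:print:]] but not under [[:graph:]], which is the only difference between the two classes that matters for this string.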

--
Alex

Alex Young wrote:

The POSIX character classes are for exactly this:

irb(main):001:0> "A \n B \t C".gsub(/[[:graph:]]/, '')
=> " \n  \t "
irb(main):002:0> "A \n B \t C".gsub(/[[:print:]]/, '')
=> "\n\t"

This is very close to what I am looking for. If I use
"A \n B \t C".gsub(/[^[:graph:]]/, '')
it returns "ABC", but I need to keep the spaces and have not been able to figure out how to include them in the output so that it shows "A B C".
Thank you for your assistance; it has given me a starting point, and I will have to spend some time experimenting and researching to reach the final step.

Michael W. Ryder wrote:

This is very close to what I am looking for. If I use
"A \n B \t C".gsub(/[^[:graph:]]/, '')
it returns "ABC", but I need to keep the spaces and have not been able to figure out how to include them in the output so that it shows "A B C".
Thank you for your assistance; it has given me a starting point, and I will have to spend some time experimenting and researching to reach the final step.

You're nearly there. Look a little closer at my suggestion, particularly the second regex.

--
Alex

Michael W. Ryder wrote:

"A \n B \t C".gsub(/[^[:graph:]]/, '')

I need to keep the spaces and have not been able to figure
out how to include them in the output so that it shows "A B C".

Hint: examine the second parameter of String#gsub
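One possible reading of this hint, sketched below, is to change what the matches are replaced with rather than which characters are matched. The variable names are only for this example, and the + quantifier in the last call is an addition of this sketch, not something the poster spelled out.

```ruby
text = "A \n B \t C"

# Empty replacement: every non-graphic character, spaces included, is deleted.
deleted = text.gsub(/[^[:graph:]]/, '')

# Single-space replacement: each matched character becomes one space.
spaced = text.gsub(/[^[:graph:]]/, ' ')

# With + each run of matched characters collapses into a single space.
collapsed = text.gsub(/[^[:graph:]]+/, ' ')

p deleted    # => "ABC"
p spaced     # => "A   B   C"
p collapsed  # => "A B C"
```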

--
Posted via http://www.ruby-forum.com/.

Alex Young wrote:

You're nearly there. Look a little closer at my suggestion, particularly the second regex.

Thank you very much for your assistance. Using "A \n B \t C".gsub(/[^[:print:]]/, '') gives me "A B C", which is what I was looking for.
Can you recommend a good reference on regular expressions so I can learn more?
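For reuse, the negated [[:print:]] pattern from this thread could be wrapped in a small method, roughly as sketched here. The name strip_nonprintable is hypothetical and not part of Ruby's String API. Strictly speaking, the spaces on either side of each removed character are kept, so the raw result contains doubled spaces; String#squeeze is one way to collapse them if that matters.

```ruby
# Hypothetical helper name; not part of Ruby's String API.
# Removes every character that is not in the POSIX [[:print:]] class.
def strip_nonprintable(str)
  str.gsub(/[^[:print:]]/, '')
end

cleaned = strip_nonprintable("A \n B \t C")
p cleaned               # => "A  B  C"
p cleaned.squeeze(' ')  # => "A B C"
```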

THE book on RegEx is "Mastering Regular Expressions" from O'Reilly.
It is a bit Perl-focused in the examples, but the book itself is all about regular expressions in use.

On Apr 21, 2007, at 7:55 AM, Michael W. Ryder wrote:

Thank you very much for your assistance. Using "A \n B \t C".gsub(/[^[:print:]]/, '') gives me "A B C", which is what I was looking for.
Can you recommend a good reference on regular expressions so I can learn more?

John Joyce wrote:

THE book on RegEx is "Mastering Regular Expressions" from O'Reilly.
It is a bit Perl-focused in the examples, but the book itself is all about regular expressions in use.

I will get a copy of the book, as trying to find the information on the web is very time-consuming and hit-or-miss. Thank you for the suggestion.
