I've got an app in which I use ASCII-8BIT as the default encoding to
avoid errors when matching strings against regexps, etc.
Unfortunately, this causes Psych to dump out my data structures
in binary format.
Would anyone be interested in a one-line patch to force dumping
in non-binary format?
------
--- /opt/local/lib/ruby/1.9.1/psych/visitors/yaml_tree.rb.orig 2012-05-02 16:49:41.246503805 -0400
+++ /opt/local/lib/ruby/1.9.1/psych/visitors/yaml_tree.rb 2012-05-02 16:35:21.586503943 -0400
@@ -230,7 +230,7 @@
         quote = false
         style = Nodes::Scalar::ANY
-        if binary?(o)
+        if binary?(o) && ! @options[:nobinary]
           str = [o].pack('m').chomp
           tag = '!binary' # FIXME: change to below when syck is removed
           #tag = 'tag:yaml.org,2002:binary'
------
--
Jim Hranicky
IT Security Engineer
Office of Information Security and Compliance
University of Florida
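The behaviour described in this message can be reproduced with a few lines. This is a minimal sketch, assuming the Psych bundled with a current Ruby; the sample log line and the "message" key are made up for illustration and are not from the original post.

```ruby
require 'yaml'

# A syslog-like line carrying two stray high bytes, held as ASCII-8BIT.
# (Illustrative data only.)
line = "login failed for user \xF0\xFF".b

yaml = YAML.dump("message" => line)
puts yaml

# The value is emitted under the !binary tag as Base64 text
# instead of as a readable scalar.
puts yaml.include?("!binary")
```

The readable text is still recoverable by decoding the Base64 payload, but the dumped file is no longer pleasant to read or grep.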
If you can, try US-ASCII encoding for 7-bit clean ASCII. Psych will
dump that as you expect.
On Wed, May 2, 2012 at 1:53 PM, Jim Hranicky <jfh@ufl.edu> wrote:
> I've got an app in which I use ASCII-8BIT as the default encoding to
> avoid errors when matching strings against regexps, etc.
> Unfortunately, this causes Psych to dump out my data structures
> in binary format.
Well, I'd rather avoid this when reading and parsing syslog
lines:
    ruby -Eus-ascii -e 'x = "foo"; x.force_encoding("US-ASCII"); puts x.encoding; x += "\xf0\xff"; x.force_encoding("US-ASCII"); puts x.match(/foo/); puts x' | m
    -e:1:in `match': invalid byte sequence in US-ASCII (ArgumentError)
            from -e:1:in `match'
            from -e:1:in `<main>'
    US-ASCII
You never know what'll be in there, and I'd rather not have to run
force_encoding on every processed line.
If there's a better way to handle strings with naughty characters
I'd be grateful for pointers.
On 05/02/2012 05:10 PM, Jeremy Kemper wrote:
> If you can, try US-ASCII encoding for 7-bit clean ASCII. Psych will
> dump that as you expect.
--
Jim Hranicky
IT Security Engineer
Office of Information Security and Compliance
University of Florida
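On the question of handling lines with stray bytes without retagging every one of them: one possible approach is to check String#valid_encoding? and fall back to a binary view only for the lines that need it. This is a sketch, not something proposed in the thread. The helper name safe_match is hypothetical, and it assumes Ruby 2.0 or later for String#b (on 1.9, dup.force_encoding("BINARY") plays the same role).

```ruby
# Hypothetical helper: match a pattern against a line that may contain
# bytes that are invalid in the line's declared encoding.
def safe_match(line, pattern)
  # Valid lines are matched as-is. Invalid ones are matched through an
  # ASCII-8BIT copy, so the original string is left untouched.
  # Assumption: the pattern itself is ASCII-only; a pattern with non-ASCII
  # characters would not be compatible with the binary copy.
  subject = line.valid_encoding? ? line : line.b
  subject.match(pattern)
end

bad = "foo\xF0\xFF".b.force_encoding("US-ASCII")  # high bytes under a US-ASCII label
puts bad.valid_encoding?
puts safe_match(bad, /foo/)[0]
puts bad.encoding                                  # the caller's string keeps its encoding
```

Only the lines that fail the validity check pay for a copy, which keeps the common case cheap.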
For now, you can use String#ascii_only?:
    def tag(string)
      string.force_encoding("US-ASCII") if string.ascii_only?
    end
    x = "foo"
    tag(x)
    puts x.encoding
    x += "\xf0\xff"
    tag(x)
    puts x.match(/foo/)
    puts x
I will push this up to Psych so that ASCII-only ASCII-8BIT strings are
dumped as UTF-8.
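To see what the helper above changes at dump time, here is a small sketch. It assumes the Psych bundled with a current Ruby, and it adds a return value to tag purely for convenience; that tweak and the sample strings are not from the original message.

```ruby
require 'yaml'

# Same idea as the helper above, returning the string for convenience.
def tag(string)
  string.force_encoding("US-ASCII") if string.ascii_only?
  string
end

clean = tag("foo".b)           # 7-bit clean, so retagged as US-ASCII
dirty = tag("foo\xF0\xFF".b)   # contains high bytes, so left as ASCII-8BIT

puts clean.encoding
puts dirty.encoding
puts dirty.match(/foo/)[0]     # the binary string still matches without raising

clean_yaml = YAML.dump(clean)  # dumped as a readable scalar
dirty_yaml = YAML.dump(dirty)  # still dumped under the !binary tag
puts clean_yaml, dirty_yaml
```

Strings that really do contain high bytes are still dumped as binary; only the 7-bit clean ones become readable.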
On Thu, May 03, 2012 at 06:41:57AM +0900, Jim Hranicky wrote:
> On 05/02/2012 05:10 PM, Jeremy Kemper wrote:
> > If you can, try US-ASCII encoding for 7-bit clean ASCII. Psych will
> > dump that as you expect.
> Well, I'd rather avoid this when reading and parsing syslog
> lines:
>     ruby -Eus-ascii -e 'x = "foo"; x.force_encoding("US-ASCII"); puts x.encoding; x += "\xf0\xff"; x.force_encoding("US-ASCII"); puts x.match(/foo/); puts x' | m
>     -e:1:in `match': invalid byte sequence in US-ASCII (ArgumentError)
>             from -e:1:in `match'
>             from -e:1:in `<main>'
>     US-ASCII
> You never know what'll be in there, and I'd rather not have to run
> force_encoding on every processed line.
> If there's a better way to handle strings with naughty characters
> I'd be grateful for pointers.
--
Aaron Patterson
http://tenderlovemaking.com/