Advice on ASCII representation of objects for testing using string compares?

I've written a quick test that checks an input file for an application.
I have a "known good" baseline file, and I want to check the new file
against this. I attempted to serialize each conf file into objects to
do string compares. If the strings are not equal, then we need to check
the file for changes. If they are equal, the input file hasn't changed.

I'm using Marshal to serialize each file, but I'm finding that my test
case has some false positives and false negatives using this method. If
I marshal the baseline file in one script and store the string in a
file for later use, then with another script read in the configuration
file from the new build and do a comparison, the two strings never
compare as equal. If I do all of the above in one script:
- read in the baseline file, marshal to a baseline file
- read in the new conf file, marshal to a temp file
- read in both as strings, do a comparison (string1 == string2)
it works most of the time, but sometimes it gives the wrong answer,
either when there is a change or when the files are equal. The same
thing happens with the checksum.

Any ideas on what might be going on? I'm wondering if with marshal
there are differences in the serialization operation which result in
slightly different strings in some cases.

Is there a better way of reading in these XML conf files and making a
simple ASCII representation for accurate string compares for a fast
failure kind of test? Are there library functions that would do this
reliably?

This is what the code looks like; I could very well be doing something
wrong here:
require 'rexml/document'

#serialize baseline file
xml = REXML::Document.new(File.open("xml/demo.xml"))

File.open("baseline/xml_baseline", "w") do | file |
    Marshal.dump(xml, file)
end

#serialize test file
xml2 = REXML::Document.new(File.open("xml/test_demo.xml"))

File.open("temp", "w") do | file |
    Marshal.dump(xml2, file)
end

line1 = IO.read("baseline/xml_baseline")
puts "line1.id.to_s" + line1.id.to_s

#line1 = file1.readline
puts "xml_baseline is: #{line1.size} characters."
puts "xml_baseline is: #{line1.sum.to_s} checksum."
#file1.close

line2 = IO.read("temp")

#line2 = file2.readline
puts "test_demo is: #{line2.size} characters."
puts "test_demo is: #{line2.sum.to_s} checksum."
#file2.close
puts "line2.id.to_s" + line2.id.to_s

#string compare section
if line1 == line2
   puts "EQUAL!"
else
   puts "NOT EQUAL! test_demo: " + line2 + "\n"
   puts "xml_baseline: " + line1 + "\n"
end
#end of string compare section

Thanks;

-Jonathan

Jonathan Kohl wrote:

I've written a quick test that checks an input file for an application.
I have a "known good" baseline file, and I want to check the new file
against this. I attempted to serialize each conf file into objects to
do string compares. If the strings are not equal, then we need to check
the file for changes. If they are equal, the input file hasn't changed.

This sounds like a job for YAML.

-----
require 'yaml'

a = {1 => 2, 3 => 4, 5 => 6}
b = {1 => 2, 3 => 4, 5 => 6}
aa = a.to_yaml
bb = b.to_yaml
puts "match" if aa == bb
-----

Kirk Haines

Thanks. I looked at YAML, but given that the input files are XML it
seemed like a lot of markup.

The two areas that are not causing failures are in XML tag contents of
the first two entries in the XML file. If I run a diff, it says the
binary files are different, but the string compare and checksums are
identical. If I change the contents between tags or the tags after the
first two, it will fail as expected. This seems really odd. Using REXML
appealed to me because if the tags are malformed it will catch it when
it reads the file.

In the meantime, I discovered "read(_filename_)", which reads the
contents of a file (_filename_) in as a string. I can read in both XML
files (the known good one, and the one from the test system) as strings
and do a compare: a == b, and it seems to work. If this is the case, was
my usage of Marshal to serialize the file contents into a string
complete overkill?
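
For the direct comparison described above, a minimal sketch might look like the following. It assumes the two files are expected to match byte for byte; the helper name same_config? and its arguments are hypothetical. REXML is used only to reject a malformed candidate before the comparison, not to produce the strings being compared.

```ruby
require 'rexml/document'

# Hypothetical helper: true when the candidate file is well-formed XML
# and byte-for-byte identical to the baseline file.
def same_config?(baseline_path, candidate_path)
  # Read both files in binary mode so no bytes are translated or dropped.
  baseline  = File.binread(baseline_path)
  candidate = File.binread(candidate_path)

  # Parsing raises REXML::ParseException if the candidate is malformed.
  REXML::Document.new(candidate)

  baseline == candidate
end
```

Note that a comparison like this also flags whitespace-only or attribute-order changes, which may or may not be what the test wants.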

Kirk Haines wrote:

Jonathan Kohl wrote:

I've written a quick test that checks an input file for an application.
I have a "known good" baseline file, and I want to check the new file
against this. I attempted to serialize each conf file into objects to
do string compares. If the strings are not equal, then we need to check
the file for changes. If they are equal, the input file hasn't changed.

This sounds like a job for YAML.

-----
require 'yaml'

a = {1 => 2, 3 => 4, 5 => 6}
b = {1 => 2, 3 => 4, 5 => 6}
aa = a.to_yaml
bb = b.to_yaml
puts "match" if aa == bb
-----

I've had problems with both Marshal and YAML when comparing loaded objects. YAML, for instance, does some funny things with floats. (Sorry, don't have specifics at hand.)

To be safe, it is best to add an extra cycle:

orig = MyObject.new...

copy_for_test = Marshal.load(Marshal.dump(orig))

copy_from_disk = Marshal.load(string_or_file)

if copy_for_test == copy_from_disk ...
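
A runnable sketch of that extra cycle, under some assumptions: Setting is a hypothetical stand-in for MyObject (a Struct, so that == compares field by field), and the baseline is written to a temporary file rather than a real baseline path. The approach only helps if the objects being compared define a meaningful ==.

```ruby
require 'tempfile'

# Hypothetical stand-in for MyObject; Struct supplies a field-wise ==.
Setting = Struct.new(:name, :value)

# Returns true when an object loaded back from disk equals a copy of the
# original that went through the same dump/load cycle in memory.
def matches_stored_copy?(orig)
  Tempfile.create("baseline") do |file|
    file.binmode                      # Marshal output is binary data
    file.write(Marshal.dump(orig))
    file.flush

    copy_for_test  = Marshal.load(Marshal.dump(orig))
    copy_from_disk = Marshal.load(File.binread(file.path))
    copy_for_test == copy_from_disk
  end
end

puts "match" if matches_stored_copy?(Setting.new("timeout", 1.5))
```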

Joel VanderWerf wrote:

Kirk Haines wrote:
> Jonathan Kohl wrote:
>
>
>>I've written a quick test that checks an input file for an application.
>>I have a "known good" baseline file, and I want to check the new file
>>against this. I attempted to serialize each conf file into objects to
>>do string compares. If the strings are not equal, then we need to check
>>the file for changes. If they are equal, the input file hasn't changed.

>
>
> This sounds like a job for YAML.
>
> -----
> require 'yaml'
>
> a = {1 => 2, 3 => 4, 5 => 6}
> b = {1 => 2, 3 => 4, 5 => 6}
> aa = a.to_yaml
> bb = b.to_yaml
> puts "match" if aa == bb
> -----

I've had problems with both Marshal and YAML when comparing loaded
objects. YAML, for instance, does some funny things with floats. (Sorry,
don't have specifics at hand.)

To be safe, it is best to add an extra cycle:

orig = MyObject.new...

copy_for_test = Marshal.load(Marshal.dump(orig))

copy_from_disk = Marshal.load(string_or_file)

if copy_for_test == copy_from_disk ...

I'm finding that on one machine (Ruby v 1.8.0-10), the string
comparison works perfectly after I marshal the XML file contents for
both the baseline and new test files. On another machine (Ruby v
1.8.1-11), it doesn't work if there are changes in the first two tags
of the test file, but catches differences anywhere else in the file. A
diff says there is a difference between the files, but the string
compare doesn't see a difference, so I get a false positive for a test
result. In the binary files of each XML file after they are marshalled,
character 71 of 923 total is a null byte. I'm wondering if when I read
it in, IO sees this as the end of the file. The data that is different
in 1.8.0 might be falling ahead of the null byte, and in 1.8.1 might be
past it, so the string compare doesn't catch it. Does this sound
familiar to anyone? I'm going to test this out further on my own as
well and get more information.

I'm wondering if REXML has changed, or if the IO.read or string
compares have changed from 1.8.0 to 1.8.1?
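
One thing that may be worth ruling out here is text-mode I/O. Marshal output is binary data, and on platforms that distinguish text mode from binary mode, writing with "w" and reading in text mode can translate or cut short some bytes. The sketch below is only an assumption about what might be going on, not a confirmed diagnosis; the helper names and the sample data are hypothetical. It writes and reads the marshalled data in binary mode and checks that nothing is lost, including null bytes.

```ruby
require 'tmpdir'

# Hypothetical helper: dump an object to a file opened in binary mode.
def dump_to_file(obj, path)
  File.open(path, "wb") { |file| Marshal.dump(obj, file) }
end

# Hypothetical helper: read the whole file back as raw bytes.
def read_bytes(path)
  File.open(path, "rb") { |file| file.read }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "xml_baseline")
  # Sample data containing a null byte and a 0x1A byte on purpose.
  data = { "name" => "demo", "payload" => "a\0b\x1Ac" }

  dump_to_file(data, path)
  bytes = read_bytes(path)

  puts "file is #{File.size(path)} bytes, read #{bytes.bytesize} bytes"
  puts "round trip ok" if Marshal.load(bytes) == data
end
```

If the number of bytes read is smaller than the file size when the same steps are done with "w" and a plain read, that would point at the I/O mode rather than at REXML or the string comparison.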