[QUIZ] NDiff (#46)

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Bill Kleb

This week's quiz is to write a version of the diff command that compares files
numerically. In Unix-speak,

  $ ndiff --help
  Usage: ndiff [options] file1 file2

  Numerically compare files line by line, numerical field by numerical field.

  -d INT --digits INT Maximum number of significant digits that
                           should match. (default: 0)
  -h --help Output this help.
  -q --quiet No output, just exit code.
  -s --statistics Provide comparison statistics only. (extra credit)
  -t DBL --tolerance DBL Tolerate <= DBL distance between numbers.
                           (default: 0.0)

For example, given fileA,

  1.000001
  2.00
  -3
  Cy=0.11278889E-01 Cx=-1.343e+02

And fileB,

  1.000000
  1.99
  -3.4
  Cy=0.11278890E-01 Cx=-1.343e+02

the following scenarios could play out:

  $ ndiff fileA fileB
  1,4c1,4
  < 1.000001
  < 2.00
  < -3
  < Cy=0.11278889E-01 Cx=-1.343e+02
  ---
  > 1.000000
  > 1.99
  > -3.4
  > Cy=0.11278890E-01 Cx=-1.343e+02
  
  $ ndiff -t 0.000001 fileA fileB
  2,3c2,3
  < 2.00
  < -3
  ---
  > 1.99
  > -3.4
  
  $ ndiff --tolerance 0.01 fileA fileB
  3c3
  < -3
  ---
  > -3.4
  
  $ ndiff --digits 1 fileA fileB (zero exit code)
  
  $ ndiff -q fileA fileB (non-zero exit code)

and, for extra credit,

  $ ndiff --statistics fileA fileB
  Numbers compared: 5
  Distance range: 0.0..0.4
  Average distance: 0.99987e-01 [guess]
  Mean distance: 1.0e-06 [guess]
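
A minimal sketch of how statistics like these might be gathered, assuming "distance" means the absolute difference between paired numbers. The helper name distance_statistics is hypothetical, and because the figures above are marked as guesses, the sketch reports only a count, a range, and a plain arithmetic mean.

```ruby
# Hypothetical helper: summarize the absolute differences between number pairs.
def distance_statistics(pairs)
  distances = pairs.map { |a, b| (a - b).abs }
  return { count: 0, range: nil, mean: nil } if distances.empty?

  { count: distances.size,
    range: distances.min..distances.max,
    mean:  distances.sum / distances.size }
end

# The five numeric fields of fileA and fileB, paired in order of appearance.
pairs = [[1.000001, 1.000000], [2.00, 1.99], [-3.0, -3.4],
         [0.011278889, 0.011278890], [-134.3, -134.3]]

stats = distance_statistics(pairs)
puts "Numbers compared: #{stats[:count]}"
puts format("Distance range: %.1f..%.1f", stats[:range].begin, stats[:range].end)
puts format("Mean distance: %.5e", stats[:mean])
```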

FWIW, the results of this quiz will be used by NASA's FUN3D
team for regression testing aerothermodynamic simulation
software.
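
The --digits option is only loosely specified above, so the following is one possible reading rather than the intended one: round both numbers to the requested number of significant digits and compare the results. The method names round_to_significant and same_to_digits? are hypothetical, and the sketch assumes digits is at least 1.

```ruby
# Hypothetical helper: round a Float to the given number of significant digits.
def round_to_significant(value, digits)
  return 0.0 if value.zero?

  magnitude = Math.log10(value.abs).floor   # position of the leading digit
  value.round(digits - 1 - magnitude)
end

# Two numbers "match" when they agree after rounding to `digits` digits.
def same_to_digits?(a, b, digits)
  round_to_significant(a, digits) == round_to_significant(b, digits)
end

pairs = [[1.000001, 1.000000], [2.00, 1.99], [-3.0, -3.4],
         [0.011278889, 0.011278890], [-134.3, -134.3]]

all_match = pairs.all? { |a, b| same_to_digits?(a, b, 1) }
puts all_match ? "no differences at 1 digit" : "differences at 1 digit"
```

A rounding-based test has a known weakness: two values that are very close but fall on opposite sides of a rounding boundary (such as 1.49 and 1.51 at one digit) are reported as different.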

It would seem useful to be able to produce a file of numerical differences
as well. Writing the subsequent patch program would open another
can of worms, but difference plots definitely have their worth.

         Hugh

Named fields are an interesting case: what should happen for
     Cy=0.11278889E-01 Cx=-1.343e+02

   And fileB,

     Cx=0.11278890E-01 Cy=-1.343e+02 Cz=2.9979e+08
?
Fields named in a different order, and an extra field introduced.

         Hugh

On Fri, 9 Sep 2005, Ruby Quiz wrote:

  Cy=0.11278889E-01 Cx=-1.343e+02

And fileB,

  Cy=0.11278890E-01 Cx=-1.343e+02

Hi

To help the quiz writers focus on 'the quiz' and not the administrative
part of the code and to help promote CommandLine, here is a little snippet
that some may find useful to get started.

#!/usr/bin/env ruby

require 'rubygems'
require 'commandline'

class NDiffApp < CommandLine::Application
  def initialize
    author "Jim Freeze"
    copyright "2005 Jim Freeze"
    synopsis "[-hqs] [-d INT] [-t DBL] file1 file2"
    short_description "Numerically compare files line by line, "+
                      "numerical field by numerical field."

    option :names => %w(--digits -d),
           :opt_description => "Maximum number of significant digits that "+
                               "should match. (default: 0)",
           :arg_description => "INT",
           :opt_found => get_arg,
           :opt_not_found => "0"

    option :names => %w(--tolerance -t),
           :opt_description => "Tolerate <= DBL distance between numbers. "+
                               "(default: 0.0)",
           :arg_description => "DBL",
           :opt_found => get_arg,
           :opt_not_found => "0.0"

    option :flag,
           :names => %w(--statistics -s),
           :opt_description => "Provide comparison statistics only. "+
                               "(extra credit)"

    option :flag,
           :names => %w(--quiet -q),
           :opt_description => "No output printed to screen. just exit code.??"

    option :help

    expected_args :file1, :file2
  end

  def main
    NDiff.new(@file1, @file2,
      @option_data["--digits"].to_i,
      @option_data["--tolerance"].to_f,
      @option_data["--statistics"],
      @option_data["--quiet"])
  end

end#class NDiffApp

class NDiff
  # Your code here
  def initialize(file1, file2,
    digits=0,
    tolerance=0.0,
    statistics=false,
    quiet=false)

    p file1
    p file2
    p digits
    p tolerance
    p statistics
    p quiet
  end

end#class NDiff

The printout is quite nice, and it takes no extra effort:

% ./ndiff
Usage: ndiff [-hqs] [-d INT] [-t DBL] file1 file2
% ./ndiff file1
ERROR: Missing expected arguments. Found 1 but expected 2.
Usage: ndiff [-hqs] [-d INT] [-t DBL] file1 file2
% ./ndiff -h
NAME

    ndiff - Numerically compare files line by line, numerical field by
    numerical field.

SYNOPSIS

    ndiff [-hqs] [-d INT] [-t DBL] file1 file2

OPTIONS

    --digits,-d INT
        Maximum number of significant digits that should match.
        (default: 0)

    --tolerance,-t DBL
        Tolerate <= DBL distance between numbers. (default: 0.0)

    --statistics,-s
        Provide comparison statistics only. (extra credit)

    --quiet,-q
        No output printed to screen. just exit code.??

    --help,-h
        Displays help page.

AUTHOR: Jim Freeze
COPYRIGHT (c) 2005 Jim Freeze

A little note: I did not understand "just exit code" for -q. What is
'-q' supposed to do? No output at all?

It also appears that the ability to type cast input to floats or ints or
whatever should be added to CommandLine. All comments are appreciated.
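
For comparison only, here is a sketch of the same option set using Ruby's standard-library OptionParser, which is a different library from the CommandLine gem above. It accepts a class such as Integer or Float in an option definition and converts the argument accordingly. The method name parse_ndiff_options and the options hash layout are hypothetical.

```ruby
require 'optparse'

# Hypothetical wrapper: returns [options_hash, remaining_arguments].
def parse_ndiff_options(argv)
  options = { digits: 0, tolerance: 0.0, statistics: false, quiet: false }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: ndiff [options] file1 file2"
    # Passing Integer / Float asks OptionParser to convert the argument.
    opts.on("-d", "--digits INT", Integer,
            "Maximum number of significant digits that should match.") do |v|
      options[:digits] = v
    end
    opts.on("-t", "--tolerance DBL", Float,
            "Tolerate <= DBL distance between numbers.") do |v|
      options[:tolerance] = v
    end
    opts.on("-s", "--statistics", "Provide comparison statistics only.") do
      options[:statistics] = true
    end
    opts.on("-q", "--quiet", "No output, just exit code.") do
      options[:quiet] = true
    end
  end

  files = parser.parse(argv)   # non-destructive; returns the non-option arguments
  [options, files]
end

options, files = parse_ndiff_options(%w[-d 3 --tolerance 0.01 -q fileA fileB])
p options
p files
```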

--
Jim Freeze

That's an interesting point. Maybe we can attract Bill's attention and he'll enlighten us about how NASA would use the data...

James Edward Gray II

On Sep 9, 2005, at 8:26 AM, Hugh Sasse wrote:

It would seem useful to be able to produce a file of numerical differences
as well. Writing the subsequent patch program would open another
can of worms, but difference plots definitely have their worth.

Hi

To help the quiz writers focus on 'the quiz' and not the administrative
part of the code and to help promote CommandLine, here is a little snippet
that some may find useful to get started.

<snip>

Thanks, Jim! I'd actually been trying to use CommandLine::Application,
but was having some problems. Comparing what I had against this let me
see where my problems were.

A little note: I did not understand "just exit code" for -q. What is
'-q' supposed to do? No output at all?

Yes, as in "no output to STDOUT". But there is output: the exit code.
This is a common feature of many unix tools (e.g. diff[1], grep). It
allows the tool to be used in a boolean environment during shell
scripting. An exit code of 0 means there was no difference (within
tolerance), and a non-zero exit code means there was a difference.
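
A small sketch of that convention, assuming the differences have already been collected as an array of output lines. The method name report and its keyword arguments are hypothetical. It returns the exit status instead of calling exit, so the caller decides when to terminate.

```ruby
# Hypothetical helper: print differences unless quiet, and return the
# exit status (0 = no differences, 1 = differences found).
def report(differences, quiet: false, out: $stdout)
  out.puts(differences) unless quiet || differences.empty?
  differences.empty? ? 0 : 1
end

status = report(["3c3", "< -3", "---", "> -3.4"], quiet: true)
puts "exit status would be #{status}"
```

In a script, the last step might be `exit report(differences, quiet: quiet_flag)`, which lets a shell test the result with `if ndiff -q fileA fileB; then ...`.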

Jacob Fugal

On 9/9/05, Jim Freeze <jim@freeze.org> wrote:

Hugh Sasse wrote:

Named fields are an interesting case: what should happen for
      Cy=0.11278889E-01 Cx=-1.343e+02

  And fileB,

      Cx=0.11278890E-01 Cy=-1.343e+02 Cz=2.9979e+08
?
Fields named in a different order, and an extra field introduced.

Non-zero exit code due to the different number of "numeric fields".

Ignore named fields.

Later,

--
Bil
http://fun3d.larc.nasa.gov

Jim Freeze wrote:

Hi

Hello.

To help the quiz writers focus on 'the quiz' and not the administrative
part of the code and to help promote CommandLine, here is a little snippet
that some may find useful to get started.

Awesome.

Thanks for accommodating my lame quiz-writing abilities.

Thanks again,

--
Bil
http://fun3d.larc.nasa.gov

Hugh Sasse wrote:

It would seem useful to be able to produce a file of numerical differences
as well.

Yes, but not to us at this point.

Regards,

--
Bil
http://fun3d.larc.nasa.gov

Thanks, Jim! I'd actually been trying to use CommandLine::Application,
but was having some problems. Comparing what I had against this let me
see where my problems were.

Great. Glad to have helped. Any feedback to improve CommandLine is
always appreciated.

> A little note, I did not understand "just exit code" for -q. What is
> '-q' supposed
> to do. No output at all?

Yes, as in "no output to STDOUT". But there is output: the exit code.
This is a common feature of many unix tools (e.g. diff[1], grep). It
allows the tool to be used in a boolean environment during shell
scripting. An exit code of 0 means there was no difference (within
tolerance), and a non-zero exit code means there was a difference.

Ok, thanks for that explanation. I probably would have called it
"--boolean" instead of "--quiet". I always think of quiet as no verbosity,
but I could be just uninformed on this one.

On 9/9/05, Jacob Fugal <lukfugl@gmail.com> wrote:

--
Jim Freeze

James Edward Gray II wrote:

It would seem useful to be able to produce a file of numerical differences
as well. Writing the subsequent patch program would open another
can of worms, but difference plots definitely have their worth.

That's an interesting point. Maybe we can attract Bill's attention and he'll enlighten us about how NASA would use the data...

Sorry, one of my "ignore thread" filters apparently
got out of hand...

We do regression testing as part of our continuous integration
and currently merely use the stock diff command to compare
output with a pre-existing "golden" output.

This is troublesome if you switch compiler versions, machine
architectures, order of operations, or other things that merely
tweak the answers in the 13th decimal place or something.

Our current hack is to truncate the output
before writing the data, so diff only winds up comparing
a certain number of decimals, but this is, of course,
not very flexible.

The patch style output really isn't important. In fact,
just a non-zero exit code is enough for starters.

So we need the ability to compare numerical fields line-by-line,
ignoring any text that might be surrounding them, but
make sure that for a given line, the same number of
"numeric fields" exist.
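
A rough sketch of that requirement, not a reference solution. The regular expression is one plausible definition of a "numeric field", and the names NUMERIC_FIELD, numeric_fields, lines_match?, and differing_lines are all hypothetical. Numbers are compared as Floats, so results exactly at the tolerance boundary depend on binary rounding. An exact decimal type may be preferable if that matters.

```ruby
# One possible pattern for a numeric field: optional sign, digits with an
# optional fraction, optional exponent. This is an assumption.
NUMERIC_FIELD = /[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/

# Extract the numbers from a line, ignoring any surrounding text.
def numeric_fields(line)
  line.scan(NUMERIC_FIELD).map(&:to_f)
end

# Lines match when they hold the same number of numeric fields and each
# pair of fields differs by no more than the tolerance.
def lines_match?(line_a, line_b, tolerance = 0.0)
  a = numeric_fields(line_a)
  b = numeric_fields(line_b)
  return false unless a.size == b.size

  a.zip(b).all? { |x, y| (x - y).abs <= tolerance }
end

# Return the 1-based numbers of the lines that differ.
def differing_lines(text_a, text_b, tolerance = 0.0)
  lines_a = text_a.lines
  lines_b = text_b.lines
  count = [lines_a.size, lines_b.size].max
  (0...count).select do |i|
    lines_a[i].nil? || lines_b[i].nil? ||
      !lines_match?(lines_a[i], lines_b[i], tolerance)
  end.map { |i| i + 1 }
end

file_a = "1.000001\n2.00\n-3\nCy=0.11278889E-01 Cx=-1.343e+02\n"
file_b = "1.000000\n1.99\n-3.4\nCy=0.11278890E-01 Cx=-1.343e+02\n"

p differing_lines(file_a, file_b, 0.001)
```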

Later,

On Sep 9, 2005, at 8:26 AM, Hugh Sasse wrote:

--
Bil
http://fun3d.larc.nasa.gov

Jacob Fugal wrote:

On 9/9/05, Jim Freeze <jim@freeze.org> wrote:

A little note: I did not understand "just exit code" for -q. What is
'-q' supposed to do? No output at all?

Yes, as in "no output to STDOUT". But there is output: the exit code.
This is a common feature of many unix tools (e.g. diff[1], grep). It
allows the tool to be used in a boolean environment during shell
scripting. An exit code of 0 means there was no difference (within
tolerance), and a non-zero exit code means there was a difference.

An exit code is much desired for our application.

Regards,
--
Bil
http://fun3d.larc.nasa.gov

So we need the ability to compare numerical fields line-by-line,
ignoring any text that might be surrounding them, but
make sure that for a given line, the same number of
"numeric fields" exist.

But shouldn't the surrounding text matter to some degree? Otherwise:
   a b 1.2345 c d
would match:
   d c 1.2345 b a
It seems to me much safer to at least ensure that the surrounding
text is the same in both files.
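
One way such a check could be sketched, purely as an illustration: replace every numeric field with a placeholder and compare what remains. The pattern and the names NUMBER_PATTERN, text_skeleton, and same_surrounding_text? are hypothetical.

```ruby
# Assumed definition of a numeric field (sign, digits, fraction, exponent).
NUMBER_PATTERN = /[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/

# Replace each number with "#" so only the surrounding text remains.
def text_skeleton(line)
  line.gsub(NUMBER_PATTERN, "#").strip
end

# True when two lines carry the same text around their numbers.
def same_surrounding_text?(line_a, line_b)
  text_skeleton(line_a) == text_skeleton(line_b)
end

p same_surrounding_text?("a b 1.2345 c d", "d c 1.2345 b a")
p same_surrounding_text?("Cy=0.11278889E-01 Cx=-1.343e+02",
                         "Cy=0.11278890E-01 Cx=-1.343e+02")
```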

Meador Inge wrote:

So we need the ability to compare numerical fields line-by-line,
ignoring any text that might be surrounding them, but
make sure that for a given line, the same number of
"numeric fields" exist.

But shouldn't the surrounding text matter to some degree? Otherwise:
   a b 1.2345 c d
would match:
   d c 1.2345 b a
It seems to me much safer to at least ensure that the surrounding
text is the same in both files.

That would be a good option, but at this point, all
we care about is the numerical tolerance aspect -- ndiff
should be primarily focused on numbers.

We can use diff for text-based changes if need be.

Thanks for the interest,

--
Bil
http://fun3d.larc.nasa.gov