Inverse scanf: finding format specifiers of existing fields

Hi,

I have files full of numbers that I need to twiddle,
but the format of the numbers cannot change[1], e.g.,

  '0.4577' -> '0.7728'

or

'-2.345e-02' -> ' 1.232e-03'

Using scanf for the output seems to be the solution to
the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?

Thanks,

···

--
Bil Kleb
http://fun3d.larc.nasa.gov

[1] Legacy formatted-Fortran data files.

Are there many different formats?

-- fxn

···

On May 2, 2007, at 12:50 PM, Bil Kleb wrote:

Hi,

I have files full of numbers that I need to twiddle,
but the format of the numbers cannot change[1], e.g.,

'0.4577' -> '0.7728'

or

'-2.345e-02' -> ' 1.232e-03'

Using scanf for the output seems to be the solution to
the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?

If there is a fixed number of formats you can probably use a cascade of RX matches. Otherwise it probably becomes a bit more complex like matching sequences of digits and measuring their lengths.

>> md = %r{^(\d+)\.(\d+)?$}.match('0.4577')
=> #<MatchData:0x7ef61250>
>> pa="%#{md[0].size}.#{md[2].size}f"
=> "%6.4f"
>> pa % 0.4577111
=> "0.4577"

HTH

  robert

···

On 02.05.2007 12:47, Bil Kleb wrote:

Hi --

Hi,

I have files full of numbers that I need to twiddle,
but the format of the numbers cannot change[1], e.g.,

  '0.4577' -> '0.7728'

or

'-2.345e-02' -> ' 1.232e-03'

Using scanf for the output seems to be the solution to
the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?

You could probably just do a gsub, like this:

require 'scanf'

re = /-?\d+\.\d+(e-\d+)?/

a = "'0.4577' -> '0.7728'"
b = "'-2.345e-02' -> ' 1.232e-03'"

as = a.gsub(re, "%f")
bs = b.gsub(re, "%f")

p a.scanf(as)
p b.scanf(bs)

Output:

[0.4577, 0.7728]
[-0.02345, 0.001232]

David

···

On 5/2/07, Bil Kleb <Bil.Kleb@nasa.gov> wrote:

Bil,

How's this for a start? I wrote it leaning towards clarity vs. conciseness.

rick@frodo:/public/rubyscripts$ cat number_format.rb
class String
  def to_number_format
    m = match(%r{^( *)([+-]?)(.*)$})
    leading_blanks, sign, rest = m[1], m[2], m[3]
    plus_flag = sign == '+' ? sign : ''
    case rest
    when %r{^([\d]\.([\d]+)([eE])([+-][\d]+))(.*)$}
      # exponentiated float
      entirety, frac_part, e_or_E, exponent, suffix = $1, $2, $3, $4, $5
      entirety = leading_blanks << entirety
      "%#{entirety.length}.#{frac_part.length}#{e_or_E}#{suffix}"
    when %r{^([\d]+\.([\d]*))(.*)$}
      # simple float
      entirety, frac_part, suffix = $1, $2, $3
      zero = frac_part.match(/00$/) ? '0' : ''
      "%#{zero}#{entirety.length}.#{frac_part.length}f#{suffix}"
    when %r{^(0[\d]+)([^e.]*)$}
      # zero padded integer
      digits, suffix = $1, $2
      "#{leading_blanks}%#{plus_flag}0#{digits.length}d#{suffix}"
    when %r{^([\d]+)([^e.]*)$}
      # whitespace padded integer
      digits, suffix = $1, $2
      digits = leading_blanks << digits
      "%#{digits.length}d#{suffix}"
    else
      nil
    end
  end
end

x = '0.4577'
puts x
puts x.to_number_format
puts x.to_number_format % x.to_f
puts(x.to_number_format % 0.7728)
puts (x.to_number_format % x.to_f) == x
puts

x = '-2.345e-02'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_f)
puts(x.to_number_format % 1.232e-03)
puts (x.to_number_format % x.to_f) == x
puts

x = '12345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_f) == x
puts

x = '  00012345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_i) == x
puts

x = '  12345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_i) == x

rick@frodo:/public/rubyscripts$ ruby number_format.rb
0.4577
%6.4f
0.4577
0.7728
true

-2.345e-02
%9.3e
-2.345e-02
1.232e-03
true

12345
%5d
12345
  765
true

  00012345
  %08d
  00012345
  00000765
true

  12345
%7d
  12345
    765
true

···

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Xavier Noria wrote:

Are there many different formats?

Yes, in that the field lengths are different.

No, in that there are really only three "types":
integers, vanilla floats, and exponentials.

Regards,

···

--
Bil Kleb
http://fun3d.larc.nasa.gov

David A. Black wrote:

Hi --

Hi.

Output:

[0.4577, 0.7728]
[-0.02345, 0.001232]

The second output indicates that I failed to express
my predicament clearly, as the numbers are no longer
in exponential format.

A brief re-cast:

The original file has numbers of the form

  5 0.4577 -2.345e-02

Something reads the numbers and spits out new numbers,
but in exactly the same format as the original file, e.g.,

  8 0.7728 1.232e-03

I.e., I can't write the last number out as 0.001232 --
it has to be in exponential format with the same field
lengths.
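
A minimal sketch of that re-cast, offered as an illustration rather than anyone's actual solution from this thread. The helper names format_for and rewrite_line are hypothetical, and the sketch assumes fields are whitespace-separated and that each field's width is simply the length of the existing token:

```ruby
# Hypothetical helper: derive a printf-style format from one numeric token.
# Returns nil for anything that does not look like a number.
def format_for(token)
  case token
  when /\A[+-]?\d+\z/                          # integer
    "%#{token.length}d"
  when /\A[+-]?\d*\.(\d*)([eE])[+-]?\d+\z/     # exponential
    "%#{token.length}.#{$1.length}#{$2}"
  when /\A[+-]?\d*\.(\d*)\z/                   # plain float
    "%#{token.length}.#{$1.length}f"
  end
end

# Hypothetical helper: replace each numeric field of a line with the next
# new value, written in the format derived from the field it replaces.
def rewrite_line(line, new_values)
  values = new_values.dup
  line.gsub(/\S+/) do |token|
    fmt = format_for(token)
    fmt ? fmt % values.shift : token
  end
end

puts rewrite_line('5 0.4577 -2.345e-02', [8, 0.7728, 1.232e-03])
```

Because the width of '-2.345e-02' includes its sign, the positive replacement is padded with a leading blank, as in the ' 1.232e-03' example from the original question.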

Regards,

···

--
Bil Kleb
http://fun3d.larc.nasa.gov

Robert Klemme wrote:

If there is a fixed number of formats you can probably use a cascade of RX matches.

Unfortunately not.

Otherwise it probably becomes a bit more complex like matching sequences of digits and measuring their lengths.

>> md = %r{^(\d+)\.(\d+)?$}.match('0.4577')
=> #<MatchData:0x7ef61250>
>> pa="%#{md[0].size}.#{md[2].size}f"

Hmmm, this looks like a viable path.

I hadn't thought of using MatchData groups, but as you say,
it may get ugly fast... I'm thinking of edge cases like
dealing with the leading space if positive numbers become
negative, or accommodating the number of digits needed for
exponentials or integers if the new number exceeds the
capacity of the existing format.
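
One way to catch the overflow cases mentioned above is to compare the formatted text against the original field width and refuse anything that does not fit. This is only a sketch; format_in_field is a hypothetical name, and raising an error is just one possible policy:

```ruby
# Hypothetical helper: format value with fmt, but refuse output that is
# wider than the original field.
def format_in_field(fmt, value, width)
  text = fmt % value
  if text.length > width
    raise ArgumentError, "#{text.inspect} does not fit in #{width} columns"
  end
  text
end

puts format_in_field('%6.4f', 0.7728, 6)    # fits in six columns

begin
  format_in_field('%6.4f', -0.7728, 6)      # the sign needs a seventh column
rescue ArgumentError => e
  puts e.message
end
```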

Thanks,

···

--
Bil Kleb
http://fun3d.larc.nasa.gov

Rick DeNatale wrote:

How's this for a start?

Excellent! Thanks.

All but my last test passed:

  require 'test/unit'
  require 'number_format'
  class TestNumberFormat < Test::Unit::TestCase
    def test_some_floats
      assert_equal( '%3.1f', '8.3'.to_number_format )
      assert_equal( '%05.3f', '0.500'.to_number_format )
      assert_equal( '%8.7f', '.0001170'.to_number_format )
      assert_equal( '%7.1f', '14000.0'.to_number_format )
      assert_equal( '%9.3E', '4.480E+09'.to_number_format )
      assert_equal( '%6.1e', '3.2e-5'.to_number_format )
      assert_equal( '%6.1f', '-254.2'.to_number_format )
    end
  end

   1) Failure:
  test_some_floats(TestNumberFormat) [-:11]:
  <"%6.1f"> expected but was
  <"%5.1f">.

Note: made the simple float leading digit match 0
or more to get the third test to pass.

Puzzling the minus sign part now...

Thanks again,

···

--
Bil Kleb
http://fun3d.larc.nasa.gov

Then I think you could base the solution on String#index/regexps depending on the existence of "e" and ".", since we can assume numbers are well-formed. The idea would be:

   if none
     %d
   elsif "e"
     %e
   else
     %f with computed widths
   end
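
The outline above might look like this in Ruby. It is a sketch only: guess_format is a hypothetical name, the token is assumed to be a well-formed number with no surrounding blanks, and the field width is taken to be the token's length:

```ruby
# Sketch of the three-way split using String#index.
def guess_format(token)
  width = token.length
  dot   = token.index('.')
  e_pos = token.index(/[eE]/)
  if dot.nil? && e_pos.nil?
    "%#{width}d"                               # integer
  elsif e_pos
    digits = dot ? e_pos - dot - 1 : 0         # digits between '.' and 'e'
    "%#{width}.#{digits}#{token[e_pos, 1]}"    # keeps 'e' or 'E' as written
  else
    "%#{width}.#{width - dot - 1}f"            # digits after the '.'
  end
end

%w[12345 0.4577 -2.345e-02 4.480E+09].each do |token|
  puts "#{token} -> #{guess_format(token)}"
end
```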

-- fxn

···

For floating point numbers you might even get away with a single regexp if that is crafted appropriately and group values are evaluated accordingly.

Kind regards

  robert

···

Bil Kleb wrote:

Puzzling the minus sign part now...

  "%#{zero}#{sign.length+entirety.length}.#{frac_part.length}f#{suffix}"
             ^^^^^^^^^^^^
Later,

···

--
Bil Kleb
http://fun3d.larc.nasa.gov

Rick DeNatale wrote:
>
> How's this for a start?

Excellent! Thanks.

All but my last test passed:

  require 'test/unit'
  require 'number_format'
  class TestNumberFormat < Test::Unit::TestCase
    def test_some_floats
      assert_equal( '%3.1f', '8.3'.to_number_format )
      assert_equal( '%05.3f', '0.500'.to_number_format )
      assert_equal( '%8.7f', '.0001170'.to_number_format )

Not sure how this one worked, it fails for me. As a matter of fact:
irb(main):001:0> '%8.7f' % 0.0001170
=> "0.0001170"

And I haven't been able to find an sprintf format string which
suppresses a leading zero on a float.

      assert_equal( '%7.1f', '14000.0'.to_number_format )
      assert_equal( '%9.3E', '4.480E+09'.to_number_format )
      assert_equal( '%6.1e', '3.2e-5'.to_number_format )
      assert_equal( '%6.1f', '-254.2'.to_number_format )
    end
  end

   1) Failure:
  test_some_floats(TestNumberFormat) [-:11]:
  <"%6.1f"> expected but was
  <"%5.1f">.

Note: made the simple float leading digit match 0
or more to get the third test to pass.

Puzzling the minus sign part now...

I see that you figured this out.

Another thing to test is that the values actually round trip. Here's my test:

rick@frodo:/public/rubyscripts$ cat test_number_format.rb
require 'test/unit'
require 'number_format'
class TestNumberFormat < Test::Unit::TestCase
   def test_some_floats
     assert_equal( '%3.1f', '8.3'.to_number_format )
     assert_nf('8.3')
     assert_equal( '%05.3f', '0.500'.to_number_format )
     assert_nf('0.500')
     assert_equal( '%8.7f', '.0001170'.to_number_format )
     assert_nf('.0001170')
     assert_equal( '%7.1f', '14000.0'.to_number_format )
     assert_nf('14000.0')
     assert_equal( '%9.3E', '4.480E+09'.to_number_format )
     assert_nf('4.480E+09')
     assert_equal( '%6.1e', '3.2e-5'.to_number_format )
     assert_nf('3.2e-5')
     assert_equal( '%6.1f', '-254.2'.to_number_format )
     assert_nf('-254.2')
   end

   private
   def assert_nf(str)
     # to_f instead of eval: '.0001170' is not a valid Ruby literal
     assert_equal(str, str.to_number_format % str.to_f)
   end
end

···

On 5/4/07, Bil Kleb <Bil.Kleb@nasa.gov> wrote:

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Xavier Noria wrote:

Then I think you could base the solution on String#index/regexps depending on the existence of "e" and ".", since we can assume numbers are well-formed. The idea would be:

  if none
    %d
  elsif "e"
    %e
  else
    %f with computed widths
  end

This, coupled with Robert's computed field lengths
is beginning to look tractable...

Thanks,

···

--
Bil Kleb
http://fun3d.larc.nasa.gov

By the way Bil, seeing who you seem to work for, I'd like to dedicate
whatever help I've given to you to the memory of Wally Schirra!

Are you a turtle? <G>

···

On 5/5/07, Rick DeNatale <rick.denatale@gmail.com> wrote:

On 5/4/07, Bil Kleb <Bil.Kleb@nasa.gov> wrote:
> Rick DeNatale wrote:
> >
> > How's this for a start?
>
> Excellent! Thanks.

--
Rick DeNatale

Visit the Project Mercury Wiki Site
http://www.mercuryspacecraft.com/

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Rick DeNatale wrote:

···

On 5/4/07, Bil Kleb <Bil.Kleb@nasa.gov> wrote:

      assert_equal( '%8.7f', '.0001170'.to_number_format )

Not sure how this one worked, it fails for me. As a matter of fact:
irb(main):001:0> '%8.7f' % 0.0001170
=> "0.0001170"

And I haven't been able to find an sprintf format string which
suppresses a leading zero on a float.

You're correct; as you wrote, I wasn't testing round-trip.
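
Since sprintf itself has no flag for this, one possible workaround is to format normally and then strip the zero in front of the decimal point. A sketch, assuming the caller knows the original field width; format_without_leading_zero is a hypothetical name:

```ruby
# Hypothetical helper: format with sprintf, drop the zero that precedes the
# decimal point, then right-justify to the original field width.
def format_without_leading_zero(fmt, value, width)
  text = (fmt % value).strip
  text.sub(/\A([+-]?)0\./, '\1.').rjust(width)
end

puts format_without_leading_zero('%8.7f', 0.000117, 8)    # eight columns
puts format_without_leading_zero('%9.7f', -0.000117, 9)   # sign is kept
```

Whether a given field was written without the leading zero still has to be detected from the input, for example by checking that the token starts with '.' or '-.'.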

Thanks,
--
Bil Kleb
http://fun3d.larc.nasa.gov

Rick DeNatale wrote:

By the way Bil, seeing who you seem to work for, I'd like to dedicate
whatever help I've given to you to the memory of Wally Schirra!

You helped me learn more Ruby; always a pure joy. Thank you.

I've since decided that I'm going to require that users
specify the format instead of trying to back it out --
there are cases for which you just can't back out the
correct format. Besides, the need is infrequent, and
I have no sympathy for code that employs formatted reads...
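
A small illustration of why the format cannot always be backed out: several distinct formats print the same text for one value and only diverge once the value changes. The candidate list here is chosen for illustration, not taken from any real data file:

```ruby
# Four different formats that all print '12345' for the value 12345.
candidates = ['%5d', '%d', '%05d', '%-5d']

puts candidates.map { |fmt| fmt % 12345 }.uniq.inspect

# With a shorter value, each format produces a different field.
candidates.each do |fmt|
  puts "#{fmt.ljust(5)} -> #{(fmt % 765).inspect}"
end
```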

Are you a turtle? <G>

You bet your sweet ass I am! :wink:

Regards,

···

--
Bil Kleb
http://fun3d.larc.nasa.gov