[OT] calculations on lists of numbers

for years i've felt that i should be able to pipe numerical output into some
unix command like so

   cat list | mean
   cat list | sum
   cat list | minmax

etc.

and have never found one. right now i'm building a ruby version - before i
continue, does anyone know a standard unix or ruby version of this?

cheers.

-a

···

--
if you want others to be happy, practice compassion.
if you want to be happy, practice compassion. -- the dalai lama

I wonder if something like bc could do something along those lines. The biggest problem is that you have to assume a lot of information about the format of the list.

When you get something implemented, I'd be interested in seeing it.

-Chris

···

On Dec 1, 2006, at 9:02 PM, ara.t.howard@noaa.gov wrote:

for years i've felt that i should be able to pipe numerical output into some
unix command like so

  cat list | mean
  cat list | sum
  cat list | minmax

etc.

and have never found one. right now i'm building a ruby version - before i
continue, does anyone know a standard unix or ruby version of this?

cheers.

-a

   cat list | mean

cat list | awk '{ s += $1; n += 1 } END { print s / n }'

   cat list | sum

cat list | awk '{ s += $1 } END { print s }'

   cat list | minmax

Hmmm..... need to study some more awk.

It is so easy to create in Ruby, a matter of minutes, that it is not
terribly important to do the search you are suggesting.

···

ara.t.howard@noaa.gov wrote:

for years i've felt that i should be able to pipe numerical output into
some unix command like so

   cat list | mean
   cat list | sum
   cat list | minmax

etc.

and have never found one. right now i'm building a ruby version - before
i continue, does anyone know a standard unix or ruby version of this?

--------------------------------

#!/usr/bin/ruby -w

array = []

STDIN.read.split(/\s+/).each do |item|
   if(v = item.to_f)
      array << v
   end
end

if(array.size > 0)
   sum = 0
   array.each { |v| sum += v }
   mean = sum / array.size
   puts sum.to_s + " " +
     mean.to_s + " " +
     array.min.to_s + " " +
     array.max.to_s
end

--------------------------------

$ echo 1 2 3 4 5 | (script_name)

Output: 15.0 3.0 1.0 5.0

--
Paul Lutus
http://www.arachnoid.com

for years i've felt that i should be able to pipe numerical output into some
unix command like so

   cat list | mean
   cat list | sum
   cat list | minmax

Why drag in the cat when it's utterly superfluous?

  mean <list
  sum <list
  minmax <list

etc.

and have never found one. right now i'm building a ruby version - before i
continue, does anyone know a standard unix or ruby version of this?

Matthew Moss wrote:

   cat list | mean

cat list | awk '{ s += $1; n += 1 } END { print s / n }'

Reading a file isn't a magical ability; awk can do it.

  awk '{s+=$0} END{print s/NR}' list

If there can be more than one number on a line:

  awk '{for(i=1;i<=NF;i++)s+=$i; n+=NF} END{print s/n}' file

Ruby:

ruby -nale 'BEGIN{$s=$n=0}; $s+=$F.inject(0){|x,y| x.to_f+y.to_f};
  $n+=$F.size; END{puts $s/$n}' file

If there's memory enough for the whole file:

ruby -e 'a=$<.read.split.map{|x|x.to_f};
  puts a.inject{|x,y|x+y}/a.size' file

···

ara.t.howard@noaa.gov wrote:

Hi,

  I'm currently coding a ruby program for processing images. One of the classes I
wrote is intended to compute some stats about the luminance of a channel, but
in fact it can be used on any set of numerical data. The stats are:
- a histogram
- the mean
- the variance
- the deviation
- the median
- the skewness
- the kurtosis

  It is very fast, since it uses constant memory: the values are not stored
internally, just the sub-results (so a list of 2 values will use the same
amount of memory as a list of a billion values), and also because the
method that adds a value is generated depending on which stats you want to
compute.

  It would be really easy to add min and max, but it would need one or two
modifications to get rid of the "image-specific" things. Then reading stdin
and adding the values would be no problem.

  If you want to base your work on my class then just tell me, I'll be happy to
share it if it can help.

-- Olivier

···

On Saturday 02 December 2006 05:02, ara.t.howard@noaa.gov wrote:

for years i've felt that i should be able to pipe numerical output into
some unix command like so

   cat list | mean
   cat list | sum
   cat list | minmax

etc.

and have never found one. right now i'm building a ruby version - before i
continue, does anyone know a standard unix or ruby version of this?

cheers.

-a

I am by no means an expert in numerical processing, but maybe bc or dc can be made to work that way. Other than that, my first line of defense would probably be awk, if you want to avoid Ruby. :-)

Kind regards

  robert

···

On 02.12.2006 05:02, ara.t.howard@noaa.gov wrote:

for years i've felt that i should be able to pipe numerical output into some
unix command like so

  cat list | mean
  cat list | sum
  cat list | minmax

etc.

and have never found one. right now i'm building a ruby version - before i
continue, does anyone know a standard unix or ruby version of this?

ara.t.howard@noaa.gov writes:

for years i've felt that i should be able to pipe numerical output into some
unix command like so

  cat list | mean
  cat list | sum
  cat list | minmax

etc.

and have never found one. right now i'm building a ruby version - before i
continue, does anyone know a standard unix or ruby version of this?

cheers.

-a

Sorry this doesn't really answer your question, but...

When doing science with piles of columns of numbers in text files, I
have got endless mileage out of JDB[1].

It's not only the collection of commands, it's how they painlessly
cause a group to standardize on a single format for all program
output.

% cat db
#h x y
1 4
2 5
3 6
% cat db | dbstats -q 4 x | dblistize
#L mean stddev pct_rsd conf_range conf_low conf_high conf_pct sum sum_squared min max n q1 q2 q3
mean: 2
stddev: 1
pct_rsd: 50
conf_range: 2.4843
conf_low: -0.48434
conf_high: 4.4843
conf_pct: 0.95
sum: 6
sum_squared: 14
min: 1
max: 3
n: 3
q1: 1
q2: 2
q3: 3

# | dbstats -q 4 x
# 0.95 confidence intervals assume normal distribution and small n.
# | dblistize

Steve

[1] JDB

ara.t.howard@noaa.gov writes:

for years i've felt that i should be able to pipe numerical output into some
unix command like so

  cat list | mean
  cat list | sum
  cat list | minmax

etc.

and have never found one. right now i'm building a ruby version - before i
continue, does anyone know a standard unix or ruby version of this?

What about hacking http://rubyforge.org/projects/sss/ ?

···

cheers.

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Oh... my samples assume each number is on its own line... dunno
offhand how to support an arbitrary amount of numbers on a line....

yup. that's where i broke into ruby too ;-)

-a

···

On Sat, 2 Dec 2006, Matthew Moss wrote:

   cat list | mean

cat list | awk '{ s += $1; n += 1 } END { print s / n }'

   cat list | sum

cat list | awk '{ s += $1 } END { print s }'

   cat list | minmax

Hmmm..... need to study some more awk.


Paul Lutus wrote:

for years i've felt that i should be able to pipe numerical output into
some unix command like so

   cat list | mean
   cat list | sum
   cat list | minmax

etc.

and have never found one. right now i'm building a ruby version - before
i continue, does anyone know a standard unix or ruby version of this?

It is so easy to create in Ruby, a matter of minutes, that it is not
terribly important to do the search you are suggesting.

Disagree. I would like to know if a unix version exists, since it will certainly be faster than ruby, run in less memory, and probably exist in environments where ruby doesn't. So I think the search is worthwhile.

Also, it _is_ terribly important to scrutinize code one finds in a newsgroup, so here we go...

#!/usr/bin/ruby -w

array = []

STDIN.read.split(/\s+/).each do |item|
   if(v = item.to_f)
      array << v
   end
end

- Use $stdin instead of STDIN, so the code plays well with reassignment of $stdin, in case this snippet ever becomes part of a library and someone wants to feed it input from another source.

- The code above can be simplified and improved so that files can be named on the command line:

array = ARGF.read.split.map {|s| Float(s)}

(Is #Float better than #to_f? It depends. If you want "3foobar" to be treated as 3.0 and "foobar3" to be treated as 0.0, stick with #to_f. If you want the program to die noisily on bad input, use #Float. Either way there are no nil values to deal with: #to_f never returns nil, so the if(v = item.to_f) test in the original is always true.)
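
The difference is easy to check in irb. A small sketch follows; the helper name parse_strict is made up for illustration and is not part of anyone's script in this thread:

```ruby
# String#to_f is lenient: it parses a leading number and ignores the rest,
# and it returns 0.0 (never nil) when there is no leading number.
lenient = ["3foobar", "foobar3", "2.5"].map { |s| s.to_f }
p lenient                  # => [3.0, 0.0, 2.5]

# Kernel#Float is strict: it raises ArgumentError on malformed input.
# parse_strict is a hypothetical helper that turns that exception into nil.
def parse_strict(s)
  Float(s)
rescue ArgumentError
  nil
end

p parse_strict("2.5")      # => 2.5
p parse_strict("3foobar")  # => nil
```

Which behavior is right depends on whether silently counting junk as 0.0 is acceptable for the data at hand.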

if(array.size > 0)
   sum = 0
   array.each { |v| sum += v }
   mean = sum / array.size
   puts sum.to_s + " " +
     mean.to_s + " " +
     array.min.to_s + " " +
     array.max.to_s
end

More idiomatically ruby, IMO, is the following:

unless array.empty?
   sum = array.inject {|s,x| s+x}
   mean = sum / array.size
   puts "#{sum} #{mean} #{array.min} #{array.max}"
end

Also, you might want an empty array to have a sum of 0, just so that the nice algebraic properties hold:

[1, 2, 3, 4].sum + [].sum == [1, 2].sum + [3, 4].sum

(And, it's fairly standard: http://wiki.r-project.org/rwiki/doku.php?id=tips:surprises:emptysetfuncs)

That only makes sense for the

   cat list | sum

invocation, of course.
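
To see why the seed value matters, compare inject with and without an initial value on an empty array. This is only a sketch; the method name total is invented here so that the example does not depend on an Array#sum being defined:

```ruby
# Without a seed, inject has nothing to start from and returns nil.
p [].inject { |s, x| s + x }   # => nil

# With a seed of 0, the empty sum is 0, the additive identity.
def total(list)
  list.inject(0) { |s, x| s + x }
end

p total([])                    # => 0

# The identity from the post then holds:
p total([1, 2, 3, 4]) + total([]) == total([1, 2]) + total([3, 4])  # => true
```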

Here's the implementation so far:

$ cat agr.rb
#!/usr/bin/env ruby

array = ARGF.read.split.map {|s| Float(s)}

sum = array.inject(0) {|s,x| s+x}

print sum
unless array.empty?
   mean = sum / array.size
   print " #{mean} #{array.min} #{array.max}"
end
puts

$ echo "1 2 3" | ./agr.rb
6.0 2.0 1.0 3.0
$ echo "1 2 3foo" | ./agr.rb
./agr.rb:3:in `Float': invalid value for Float(): "3foo" (ArgumentError)
         from ./agr.rb:3
$ echo "1 2 3" >data
$ ./agr.rb data data
12.0 2.0 1.0 3.0
$ echo "" >empty_data
$ ./agr.rb empty_data
0

--------------------------------

$ echo 1 2 3 4 5 | (script_name)

Output: 15.0 3.0 1.0 5.0

And then what do you do if you are piping this output somewhere else? Use cut to get the mean or whatever it was you wanted? The OP wanted three separate functions. It might be better to use an argument to the script to select which aggregate value is to be output.

There's not much point computing the min and max if only the mean was requested.
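
One way that suggestion could look is sketched below. The STATS table and the aggregate method are hypothetical names, and the input is passed as a string so the example stands alone; a real script would presumably call aggregate(ARGV.shift, ARGF.read).

```ruby
# Hypothetical table mapping a stat name to a lambda that computes only that stat.
STATS = {
  "sum"  => lambda { |a| a.inject(0) { |s, x| s + x } },
  "mean" => lambda { |a| a.empty? ? nil : a.inject(0) { |s, x| s + x } / a.size },
  "min"  => lambda { |a| a.min },
  "max"  => lambda { |a| a.max }
}

# Parse whitespace-separated numbers from text and compute the requested stat.
def aggregate(name, text)
  stat = STATS[name] or raise ArgumentError, "unknown stat: #{name}"
  stat.call(text.split.map { |s| Float(s) })
end

puts aggregate("mean", "1 2 3 4 5")   # prints 3.0
puts aggregate("max",  "1 2 3 4 5")   # prints 5.0
```

Each invocation prints a single value, so the output can be piped onward without cut.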

···

ara.t.howard@noaa.gov wrote:

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

heh, i've got something similar i use to compute stats on binary data all the
time, here's the entire code

   harp:~> cat a.rb
   #! /dmsp/reference/bin/ruby

   require 'narray'
   require 'yaml'

   list = ARGF.readlines.map{|line| line.strip.split(%r/\s+/).map{|f| Float f}}.flatten
   na = NArray.to_na list

   puts '---'
   %w( min max mean median stddev ).each{|stat| puts "#{ stat }: #{ na.send stat}"}

but it's not as complete as yours.

-a

···

On Sat, 2 Dec 2006, Olivier wrote:


William James wrote:

···

ara.t.howard@noaa.gov wrote:

for years i've felt that i should be able to pipe numerical output into
some unix command like so

   cat list | mean
   cat list | sum
   cat list | minmax

Why drag in the cat when it's utterly superfluous?

  mean <list
  sum <list
  minmax <list

Yes, true, but in a simple example like this, 'cat' is just a stand-in for
some other application that would stream the numbers. In such a case, the
pipe seems more appropriate.

--
Paul Lutus
http://www.arachnoid.com

Olivier wrote:
...

  I'm currently coding a ruby program for processing images. One of the classes I wrote is intended to compute some stats about the luminance of a channel, but in fact it can be used on any set of numerical data. The stats are:
- a histogram
- the mean
- the variance
- the deviation
- the median
- the skewness
- the kurtosis

  It is very fast, since it uses constant memory: the values are not stored internally, just the sub-results (so a list of 2 values will use the same amount of memory as a list of a billion values), and also because the method that adds a value is generated depending on which stats you want to compute.

What's the secret to computing stdev in bounded space? The formulas I know (I am not much of a statistician) require you to know the mean in advance.

Do you do it in two passes through the data, first getting the mean and then the stdev? (But this would not work if you are reading data from stdin and don't want to cache the data in memory.)
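
One standard single-pass answer, which may or may not be what Olivier's class does, is Welford's online algorithm: keep only a count, a running mean, and a running sum of squared deviations, and update all three as each value arrives. A minimal sketch, with the class name RunningStats and its method names invented for illustration:

```ruby
# Single-pass, constant-memory mean and standard deviation (Welford's method).
class RunningStats
  attr_reader :n, :mean

  def initialize
    @n = 0
    @mean = 0.0
    @m2 = 0.0   # running sum of squared deviations from the current mean
  end

  def add(x)
    @n += 1
    delta = x - @mean
    @mean += delta / @n
    @m2 += delta * (x - @mean)   # old deviation times updated deviation
    self
  end

  # Sample variance (divides by n - 1); nil when fewer than two values.
  def variance
    @n < 2 ? nil : @m2 / (@n - 1)
  end

  def stddev
    v = variance
    v && Math.sqrt(v)
  end
end

stats = RunningStats.new
[1, 2, 3].each { |x| stats.add(x) }
puts stats.mean     # prints 2.0
puts stats.stddev   # prints 1.0
```

To use it on a stream, one would call add for each number as it is read (for example inside ARGF.each_line), so nothing has to be cached.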

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

easy in ruby ;-)

-a

···

On Sat, 2 Dec 2006, Matthew Moss wrote:

Oh... my samples assume each number is on its own line... dunno
offhand how to support an arbitrary amount of numbers on a line....


Joel VanderWerf wrote:

Paul Lutus wrote:

for years i've felt that i should be able to pipe numerical output into
some unix command like so

   cat list | mean
   cat list | sum
   cat list | minmax

etc.

and have never found one. right now i'm building a ruby version -
before i continue, does anyone know a standard unix or ruby version of
this?

It is so easy to create in Ruby, a matter of minutes, that it is not
terribly important to do the search you are suggesting.

Disagree.

It's a bit too late to disagree, in the face of the evidence that I said it,
then I did it.

I would like to know if a unix version exists, since it will
certainly be faster than ruby, run in less memory, and probably exist in
environments where ruby doesn't. So I think the search is worthwhile.

Yes, all true, but that isn't what you disagreed with.

A number of *nix hands will probably put forth solutions that rely on awk or
bc (something I might have done in years past), and they will probably be
faster, and they certainly exist, and no need to write any Ruby code.

But writing something quick and serviceable in Ruby was extremely easy.

···

ara.t.howard@noaa.gov wrote:

--
Paul Lutus
http://www.arachnoid.com

Joel VanderWerf wrote:

   puts "#{sum} #{mean} #{array.min} #{array.max}"

I'm not much into golf, but, since we've long since left the clubhouse, and
because I am very lazy:

puts [ sum,mean,array.min,array.max ].map { |v| v.to_s }.join(' ')

This has the sole advantage that particular elements can be added and
removed without a lot of typing. If no one expects to change the program,
then there's no point in it.

···

--
Paul Lutus
http://www.arachnoid.com

  harp:~> cat a.rb

Man, how do you keep all your a.rb's straight?

  #! /dmsp/reference/bin/ruby

  require 'narray'
  require 'yaml'

Am I blind, or do you require 'yaml' and never use it?

  list = ARGF.readlines.map{|line| line.strip.split(%r/\s+/).map{|f| Float f}}.flatten

Doesn't this load the whole list of numbers into memory? (i.e. how does it fare on "a billion values"?)

  na = NArray.to_na list

  puts '---'
  %w( min max mean median stddev ).each{|stat| puts "#{ stat }: #{ na.send stat}"}
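
On the memory question: min, max, sum and mean can all be kept as running values while reading line by line, so nothing needs to be held in an array. The sketch below is not ara's script and does not use NArray; stream_stats is a made-up name, and StringIO stands in for ARGF so the example is self-contained. An exact median is a different matter, since it generally needs the values kept or re-read.

```ruby
require 'stringio'

# Stream numbers from any IO-like object, keeping only running totals.
def stream_stats(io)
  n = 0
  sum = 0.0
  min = max = nil
  io.each_line do |line|
    line.split.each do |field|
      v = Float(field)
      n += 1
      sum += v
      min = v if min.nil? || v < min
      max = v if max.nil? || v > max
    end
  end
  { :n => n, :sum => sum, :min => min, :max => max,
    :mean => (n.zero? ? nil : sum / n) }
end

result = stream_stats(StringIO.new("1 2\n3\n4 5\n"))
puts result[:min]    # prints 1.0
puts result[:max]    # prints 5.0
puts result[:mean]   # prints 3.0
```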

Devin

···

ara.t.howard@noaa.gov wrote:

I'll take a look at this NArray class, it seems pretty powerful! I don't
understand how the examples on their website work, but the 'image blur'
sample is exactly the kind of thing I have to do for my project. Too bad I
don't have time to restart from scratch... But I'll keep it in mind :-)

-- olivier