Performance improvement possible?

Ellie,

Eleanor McHugh wrote:

A few things:

- you left a line in the loop:

File.open( output_filename, 'w' ) do |fout|

which should be deleted

Paste in haste, repent at leisure :wink: I've corrected it to read the
way it appeared in my head when I was looking at it:
http://pastie.org/222765

- I originally used:

stats = lines = File.readlines(input_filename, 'r')

but found that reading the whole file (8871 lines) and then
processing the array was inefficient so I got rid of the array

- using:

stats << stats06

If you buffer it as a single read and then work through the file in memory, you guarantee that you minimise the IO costs of reading. I am
of course assuming that even at 8871 lines your file is much smaller
than your available RAM :slight_smile:
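
Something along these lines is what I have in mind - just a sketch, reusing the input_filename from your script:

  data = File.read(input_filename)   # one big read: the whole file buffered in memory
  data.each_line do |line|           # then walk the buffer - no further disk IO
    # ... your existing per-line processing ...
  end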

When I did the profile, the array processing was the biggest hit - when I got rid of the array, I almost halved the time! Ruby arrays are pretty cool but I think you pay for the convenience . .

and the file writing output of:

File.open(output_filename, "a") do |file|
  file.sync = false
  file.puts *stats
  file.fsync
end

looks interesting - why should that be faster?

Doing the file write this way offloads making it efficient to the
Ruby runtime. The file.fsync call will cost you in terms of runtime
performance, but it ensures that the data is flushed to disk before
moving on to the next file which for a large data processing job is
often desirable.

See my other note but it didn't make much difference.

Personally I wouldn't store the results in separate
files but combine them into a single file (possibly even a database),
however I don't know how that would fit with your use case.

There is more post-processing using R, and for casual inspection it is convenient to be able to find data according to its file name. It might still be possible to have fewer, larger files - I might ask another question about that (basically I have to paste the single-column output of this stuff into 32-column arrays). I have tried DBs for storing output from the main simulation program when it was all in C/C++ and it was quite slow, so I went back to text files . .

As to the file.puts *stats, there's no guarantee this approach will
be efficient but compared to doing something like:

File.open(output_filename, "a") do |file|
  stats.each { |stat| file.puts stat }
end

it feels more natural to the problem domain.

Yes, it was good to find out about this alternative.

Another alternative would be:

File.open(output_filename, "a") do |file| file.puts stats.join("\n") end

but that's likely to use more memory as first an in-memory string
will be created, then this will be passed to Ruby's IO code. For the
size of file you're working with that's not likely to be a problem.

I've a suspicion that your overall algorithm can also be greatly improved.

I'm sure you are right about that!

In particular the fact that you're forming a cubic array and then
manipulating it raises warning bells and suggests you'll have data sparsity issues which could be handled in a different way, but that would require a deeper understanding of your data.

The cubic array was just a direct translation of the C pointer setup I had - basically it is a rectangular grid of sub-populations each with an array of allele lengths.

Thanks again,

Regards,

Phil.

···

On 26 Jun 2008, at 20:47, Philip Rhoades wrote:

--
Philip Rhoades

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: phil@pricom.com.au

Is it only me or is that profiling tiresome and arduous?
:slight_smile:

When I did the profile, the array processing was the biggest hit - when
I got rid of the array, I almost halved the time! Ruby arrays are
pretty cool but I think you pay for the convenience . .

But it's interesting to read that arrays were one performance bottleneck.

···

--
Posted via http://www.ruby-forum.com/.

Eleanor McHugh wrote:

When I did the profile, the array processing was the biggest hit - when I got rid of the array, I almost halved the time! Ruby arrays are pretty cool but I think you pay for the convenience . .

I'm surprised at the behaviour you're seeing so I ran a quick benchmark on what I believe to be an equivalent array handling problem with my laptop (OS X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB RAM):

require 'benchmark'
include Benchmark
bm(6) do |y|
    y.report("appending") { x = []; (1..20000000).each { |i| x << i } }
    y.report("nested creation") { x = Array.new(1000) { Array.new(1000) { Array.new(20, 0) } } }
    y.report("unrolling") { x.flatten.flatten.length }
end

                      user    system      total        real
appending          7.350000  0.100000   7.450000 (  7.685458)
nested creation    0.850000  0.000000   0.850000 (  0.851341)
unrolling         15.240000  0.280000  15.520000 ( 20.927189)

which in each case is manipulating an array of 20,000,000 elements. This would be equivalent to processing 32000 files where each contained 625 occurrences of the 'k=' tag.
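(That is, 32,000 files x 625 'k=' values per file = 20,000,000 array elements.)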

and the file writing output of:
File.open(output_filename, "a") do |file|
  file.sync = false
  file.puts *stats
  file.fsync
end
looks interesting - why should that be faster?

Doing the file write this way offloads making it efficient to the
Ruby runtime. The file.fsync call will cost you in terms of runtime
performance, but it ensures that the data is flushed to disk before
moving on to the next file which for a large data processing job is
often desirable.

See my other note but it didn't make much difference.

Regardless of what the profiler says, any program that's opening 32000 files and writing to them is a prime candidate for IO optimisation...

Assuming I want to write 625 single-digit numbers to 32000 files the difference with file.puts *stats compared to individual writes is noticeable on my setup and would more than outweigh the cost of appending these numbers to an array. A slower processor might behave differently, as might a system with a higher-performance drive than my laptop's 5400RPM SATA HDD.

x = Array.new(675, 1)
bm(6) do |y|
   y.report("direct") { 32000.times { File.open("test.dat", "a") { |file| file.puts *x } } }
   y.report("enumerated") { 32000.times { File.open("test.dat", "a") { |file| x.each { |i| file.puts i } } } }
   y.report("direct+fsync") { 32000.times { File.open("test.dat", "a") { |file| file.sync = false; file.puts *x; file.fsync } } }
   y.report("enum+fsync") { 32000.times { File.open("test.dat", "a") { |file| file.sync = false; x.each { |i| file.puts i }; file.fsync } } }

end
                   user    system      total       real
direct        25.160000  2.100000  27.260000 ( 38.775792)
enumerated    34.200000  2.170000  36.370000 ( 55.614969)
direct+fsync  25.300000  2.250000  27.550000 ( 60.225069)
enum+fsync    34.520000  2.420000  36.940000 ( 98.627757)

I'd be interested in equivalent benchmarking figures for your configuration, and also in an estimate of how close this 32000x625 assumption is to your actual use case, as that would help me make sense of the doubled execution time you're seeing with my array modification.

The cubic array was just a direct translation of the C pointer setup I had - basically it is a rectangular grid of sub-populations each with an array of allele lengths.

One option is to use a Hash as a Sparse Matrix by using Arrays as keys for indexing:

  stats = Hash.new(0)
  lines.each do |line|
    ...
    v = stats06
    stats[[x,y]] = v if v != 0
    ...
  end
  File.open(output_filename, "a") do |file|
    file.puts *stats.values
  end

which should pay off if the number of interesting results is much less than the cubic search space. Obviously if you need the values to keep a specific ordering you'd need additional logic...
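
For example, if you needed the output ordered by grid position, the extra logic could be as simple as this sketch:

  File.open(output_filename, "a") do |file|
    stats.keys.sort.each { |key| file.puts stats[key] }   # [x, y] keys sort lexicographically
  end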

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

···

On 26 Jun 2008, at 22:51, Philip Rhoades wrote:
----
raise ArgumentError unless @reality.responds_to? :reason

Eleanor McHugh wrote:

Regardless of what the profiler says, any program that's opening 32000 files and writing to them is a prime candidate for IO optimisation...

Not to mention directory fragmentation in many filesystems. :slight_smile: I'd be looking at an ORM and SQLite rather than attempting something involving 32,000 files.

Ellie,

Eleanor McHugh wrote:

Eleanor McHugh wrote:

When I did the profile, the array processing was the biggest hit - when I got rid of the array, I almost halved the time! Ruby arrays are pretty cool but I think you pay for the convenience . .

I'm surprised at the behaviour you're seeing so I ran a quick benchmark on what I believe to be an equivalent array handling problem with my laptop (OS X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB RAM):

require 'benchmark'
include Benchmark
bm(6) do |y|
   y.report("appending") { x = []; (1..20000000).each { |i| x << i } }
   y.report("nested creation") { x = Array.new(1000) { Array.new(1000) { Array.new(20, 0) } } }
   y.report("unrolling") { x.flatten.flatten.length }
end

                      user    system      total        real
appending          7.350000  0.100000   7.450000 (  7.685458)
nested creation    0.850000  0.000000   0.850000 (  0.851341)
unrolling         15.240000  0.280000  15.520000 ( 20.927189)

which in each case is manipulating an array of 20,000,000 elements. This would be equivalent to processing 32000 files where each contained 625 occurrences of the 'k=' tag.

I haven't used BM yet but I can't see how doing a benchmark on an array helps? When I had an array it was slow, when I ditched the array it was fast?

and the file writing output of:
File.open(output_filename, "a") do |file|
  file.sync = false
  file.puts *stats
  file.fsync
end
looks interesting - why should that be faster?

Doing the file write this way offloads making it efficient to the
Ruby runtime. The file.fsync call will cost you in terms of runtime
performance, but it ensures that the data is flushed to disk before
moving on to the next file which for a large data processing job is
often desirable.

See my other note but it didn't make much difference.

Regardless of what the profiler says, any program that's opening 32000 files and writing to them is a prime candidate for IO optimisation...

Yes, you have started me thinking about this and the following processing scripts now . .

Assuming I want to write 625 single-digit numbers to 32000 files the difference with file.puts *stats compared to individual writes is noticeable on my setup and would more than outweigh the cost of appending these numbers to an array. A slower processor might behave differently, as might a system with a higher-performance drive than my laptop's 5400RPM SATA HDD.

x = Array.new(675, 1)
bm(6) do |y|
  y.report("direct") { 32000.times { File.open("test.dat", "a") { |file| file.puts *x } } }
  y.report("enumerated") { 32000.times { File.open("test.dat", "a") { |file| x.each { |i| file.puts i } } } }
  y.report("direct+fsync") { 32000.times { File.open("test.dat", "a") { |file| file.sync = false; file.puts *x; file.fsync } } }
  y.report("enum+fsync") { 32000.times { File.open("test.dat", "a") { |file| file.sync = false; x.each { |i| file.puts i }; file.fsync } } }

end
                   user    system      total       real
direct        25.160000  2.100000  27.260000 ( 38.775792)
enumerated    34.200000  2.170000  36.370000 ( 55.614969)
direct+fsync  25.300000  2.250000  27.550000 ( 60.225069)
enum+fsync    34.520000  2.420000  36.940000 ( 98.627757)

I'd be interested in equivalent benchmarking figures for your configuration, and also in an estimate of how close this 32000x625 assumption is to your actual use case, as that would help me make sense of the doubled execution time you're seeing with my array modification.

I think I am giving up on writing lots of small files to disk - see following post.

The cubic array was just a direct translation of the C pointer setup I had - basically it is a rectangular grid of sub-populations each with an array of allele lengths.

One option is to use a Hash as a Sparse Matrix by using Arrays as keys for indexing:

    stats = Hash.new(0)
    lines.each do |line|
        ...
        v = stats06
        stats[[x,y]] = v if v != 0
        ...
    end
    File.open(output_filename, "a") do |file|
        file.puts *stats.values
    end

which should pay off if the number of interesting results is much less than the cubic search space. Obviously if you need the values to keep a specific ordering you'd need additional logic...

The problem with using a hash is that I lose the mental picture of the situation - a two-dimensional array corresponds to the physical situation and another array in each of these cells corresponds to the numbers of organisms. If I have time I might look at this suggestion though.
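
I suppose a hash keyed by grid position would keep most of that picture - something like this sketch (x, y and allele_length are just illustrative names):

  grid = Hash.new { |hash, key| hash[key] = [] }   # each cell defaults to an empty list
  grid[[x, y]] << allele_length                    # (x, y) = sub-population position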

Thanks yet again!

Regards,

Phil.

···

On 26 Jun 2008, at 22:51, Philip Rhoades wrote:

--
Philip Rhoades

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: phil@pricom.com.au

I'm surprised at the behaviour you're seeing so I ran a quick benchmark on
what I believe to be an equivalent array handling problem with my laptop (OS
X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB RAM):

require 'benchmark'
include Benchmark
bm(6) do |y|
  y.report("appending") { x = []; (1..20000000).each { |i| x << i } }
  y.report("nested creation") { x = Array.new(1000) { Array.new(1000) {
Array.new(20, 0) } } }
  y.report("unrolling") { x.flatten.flatten.length }
end

This can't be the exact code you were benchmarking since "x" is
undefined in the "unrolling" test. Also, why do you flatten twice?
You can't beat an Array flatter than flatten. :slight_smile:

irb(main):009:0> Array.new(1000) { Array.new(1000) { Array.new(20, 0)
} }.flatten.select {|x| Array === x}.empty?
=> true

Regardless of what the profiler says, any program that's opening 32000 files
and writing to them is a prime candidate for IO optimisation...

... because IO in that case is also likely the slowest bit. Absolutely.

Kind regards

robert

···

2008/6/27 Eleanor McHugh <eleanor@games-with-brains.com>:

--
use.inject do |as, often| as.you_can - without end

Ed, Ellie et al,

M. Edward (Ed) Borasky wrote:

Eleanor McHugh wrote:

Regardless of what the profiler says, any program that's opening 32000 files and writing to them is a prime candidate for IO optimisation...

Not to mention directory fragmentation in many filesystems. :slight_smile: I'd be looking at an ORM and SQLite rather than attempting something involving 32,000 files.

The dir organisation is not so bad - 20 files in each dir in a tree of 50x32 dirs - however, this and previous comments from Ellie and others have caused me to think that more of the processing chain can be improved. Although I am incurring a 90% time increase on this current step, the following steps have lots of awks, seds, pastes etc and it is VERY slow. I think if I make use of DataMapper with SQLite (ie keep the DB in memory) and then massage the data with some more Ruby, I can improve this stage still further and dramatically improve the next stage as well! (and also keep it in one Ruby program). This is like a drug . .
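
Roughly what I have in mind is something like this sketch, using the raw sqlite3 gem with made-up table and column names (DataMapper would just sit on top of something similar):

  require 'sqlite3'

  db = SQLite3::Database.new(":memory:")   # keep the whole DB in RAM
  db.execute("CREATE TABLE stats (x INTEGER, y INTEGER, k INTEGER)")

  insert = db.prepare("INSERT INTO stats (x, y, k) VALUES (?, ?, ?)")
  db.transaction do                        # one transaction for the whole batch of inserts
    # ... insert.execute(x, y, k) for each parsed value ...
  end

  db.execute("SELECT x, y, k FROM stats ORDER BY x, y") do |row|
    # ... massage / write out with more Ruby instead of awk/sed/paste ...
  end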

Converting the first link in the chain - the main simulation C/C++ program - would be a VERY big job though, and I can't really justify the time for that at the moment, although I am thinking about it . .

Thanks to all for their help!

Regards,

Phil.

···

--
Philip Rhoades

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: phil@pricom.com.au

Your experience runs counter to all of my experience, both in terms of several years of Ruby hacking and more generally in working on performance-optimisation for large data manipulation problems. Therefore being of a curious disposition I'd love to figure out why :slight_smile:

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

···

On 27 Jun 2008, at 07:55, Philip Rhoades wrote:

Eleanor McHugh wrote:

I'm surprised at the behaviour you're seeing so I ran a quick benchmark on what I believe to be an equivalent array handling problem with my laptop (OS X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB RAM):
require 'benchmark'
include Benchmark
bm(6) do |y|
  y.report("appending") { x = []; (1..20000000).each { |i| x << i } }
  y.report("nested creation") { x = Array.new(1000) { Array.new(1000) { Array.new(20, 0) } } }
  y.report("unrolling") { x.flatten.flatten.length }
end
                      user    system      total        real
appending          7.350000  0.100000   7.450000 (  7.685458)
nested creation    0.850000  0.000000   0.850000 (  0.851341)
unrolling         15.240000  0.280000  15.520000 ( 20.927189)
which in each case is manipulating an array of 20,000,000 elements. This would be equivalent to processing 32000 files where each contained 625 occurrences of the 'k=' tag.

I haven't used BM yet but I can't see how doing a benchmark on an array helps? When I had an array it was slow, when I ditched the array it was fast?

----
raise ArgumentError unless @reality.responds_to? :reason

I'm surprised at the behaviour you're seeing so I ran a quick benchmark on
what I believe to be an equivalent array handling problem with my laptop (OS
X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB RAM):

require 'benchmark'
include Benchmark
bm(6) do |y|
y.report("appending") { x = []; (1..20000000).each { |i| x << i } }
y.report("nested creation") { x = Array.new(1000) { Array.new(1000) {
Array.new(20, 0) } } }
y.report("unrolling") { x.flatten.flatten.length }
end

This can't be the exact code you were benchmarking since "x" is
undefined in the "unrolling" test.

I was using IRB and x was already defined earlier in my session, so yes and no: for my session it was defined, but for this snippet it won't be.

Also, why do you flatten twice?
You can't beat an Array flatter than flatten. :slight_smile:

I was posting at 2am >;o

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

···

On 27 Jun 2008, at 12:41, Robert Klemme wrote:

2008/6/27 Eleanor McHugh <eleanor@games-with-brains.com>:

----
raise ArgumentError unless @reality.responds_to? :reason

Ellie,

Eleanor McHugh wrote:

Eleanor McHugh wrote:

I'm surprised at the behaviour you're seeing so I ran a quick benchmark on what I believe to be an equivalent array handling problem with my laptop (OS X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB RAM):
require 'benchmark'
include Benchmark
bm(6) do |y|
   y.report("appending") { x = []; (1..20000000).each { |i| x << i } }
  y.report("nested creation") { x = Array.new(1000) { Array.new(1000) { Array.new(20, 0) } } }
  y.report("unrolling") { x.flatten.flatten.length }
end
                      user    system      total        real
appending          7.350000  0.100000   7.450000 (  7.685458)
nested creation    0.850000  0.000000   0.850000 (  0.851341)
unrolling         15.240000  0.280000  15.520000 ( 20.927189)
which in each case is manipulating an array of 20,000,000 elements. This would be equivalent to processing 32000 files where each contained 625 occurrences of the 'k=' tag.

I haven't used BM yet but I can't see how doing a benchmark on an array helps? When I had an array it was slow, when I ditched the array it was fast?

Your experience runs counter to all of my experience, both in terms of several years of Ruby hacking and more generally in working on performance-optimisation for large data manipulation problems. Therefore being of a curious disposition I'd love to figure out why :slight_smile:

I don't think there is any great mystery here - in the first version I thought it would be efficient to read the entire text file into an array and then process the array, line by line. In the second version I eliminated the array and processed each line directly as I read it (the same processing as in the first case), relying on the Linux system buffers/cache to be efficient about reading the data from disk. I think the processing time saved (~50%) is just a function of not putting the data into an unnecessary structure and not manipulating that structure?
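
In other words it is roughly the difference between these two shapes (just a sketch, not the actual script - process stands for the per-line work):

  # first version: slurp everything into an array, then walk the array
  lines = File.readlines(input_filename)
  lines.each { |line| process(line) }

  # second version: handle each line as it is read - no intermediate array
  File.foreach(input_filename) { |line| process(line) }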

Regards,

Phil.

···

On 27 Jun 2008, at 07:55, Philip Rhoades wrote:

--
Philip Rhoades

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: phil@pricom.com.au

Ooops, right you are. I missed that bit completely. Hopefully you got your sleep back by now. :slight_smile:

Cheers

  robert

···

On 27.06.2008 13:59, Eleanor McHugh wrote:

I was posting at 2am >;o

Mental note to self: do _not_ start fiddling in IRB at bedtime :wink:

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

···

On 27 Jun 2008, at 16:42, Robert Klemme wrote:

On 27.06.2008 13:59, Eleanor McHugh wrote:

I was posting at 2am >;o

Ooops, right you are. I missed that bit completely. Hopefully you got your sleep back by now. :slight_smile:

----
raise ArgumentError unless @reality.responds_to? :reason

nonsense... just don't post your results. :wink:

···

On Jun 27, 2008, at 08:48 , Eleanor McHugh wrote:

Mental note to self: do _not_ start fiddling in IRB at bedtime :wink:

:slight_smile:

  robert

···

On 27.06.2008 17:48, Eleanor McHugh wrote:

On 27 Jun 2008, at 16:42, Robert Klemme wrote:

On 27.06.2008 13:59, Eleanor McHugh wrote:

I was posting at 2am >;o

Ooops, right you are. I missed that bit completely. Hopefully you got your sleep back by now. :slight_smile:

Mental note to self: do _not_ start fiddling in IRB at bedtime :wink:

lol
like common sense and late-night hacking have ever gone together :wink:

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

···

On 28 Jun 2008, at 10:15, Ryan Davis wrote:

On Jun 27, 2008, at 08:48 , Eleanor McHugh wrote:

Mental note to self: do _not_ start fiddling in IRB at bedtime :wink:

nonsense... just don't post your results. :wink:

----
raise ArgumentError unless @reality.responds_to? :reason