I noticed that Ruby's disk performance drops drastically when a large
array is allocated. I think it has to do with garbage collection, since
the performance recovers when I disable the garbage collector. I
created a small test program to illustrate the problem:
# begin of program
allocateBefore = true
useFileLoop = true
disableGC = false

GC.disable if disableGC

# create a file containing 100000 lines of 'test'
File.open('testfile', 'w') { |fo| 100000.times { fo.puts 'test' } }

largeArray = Array.new(20000000) if allocateBefore

if useFileLoop
  File.open('testfile') do |fi|
    fi.each { |line| }
  end
else
  1000000.times { |i| }
end

largeArray = Array.new(20000000) if !allocateBefore
# end of program
On my home PC, the above program takes 3.225 sec. If I allocate the
large array AFTER the fi.each loop instead, by setting
allocateBefore=false, it takes only 0.467 sec. I get the same good
performance when I disable the garbage collector by setting
disableGC=true. Unfortunately, disabling GC is not an option in my real
application, since my file is a lot larger and with GC off all my
memory gets consumed very quickly.
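One way to time just the read loop, using the standard Benchmark
library (this is how the numbers above can be reproduced):

  require 'benchmark'

  t = Benchmark.realtime do
    File.open('testfile') { |fi| fi.each { |line| } }
  end
  printf "read loop: %.3f sec\n", t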
If I play with allocateBefore and disableGC while the 1000000.times
loop is used instead (by setting useFileLoop=false), the difference
disappears.
Any idea what is going on here? How can I get good file performance
with a large array in memory?
If you run Unix, maybe you should consider using the mmap module?
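An untested sketch of what I mean; the Mmap constructor arguments and
the String-like methods used here are assumptions based on the
extension's documentation:

  require 'mmap'                   # third-party extension, assumed installed

  m = Mmap.new('testfile', 'r')    # map the file instead of buffered reads
  pos = 0
  while (nl = m.index("\n", pos))  # Mmap objects are meant to act like Strings
    line = m[pos...nl]             # one line, without the trailing newline
    pos = nl + 1
  end
  m.munmap                         # release the mapping (assumed method name)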
Cheers,
Kent.
···
On Oct 13, 2004, at 5:44 PM, Geert Fannes wrote:
[snip]
Definitely a performance hit. Pretty interesting.
-Charlie
···
On Oct 13, 2004, at 2:44 PM, Geert Fannes wrote:
[snip]
I played some more with the test program, and apparently it has nothing
to do with the file access. The program below is a simplified version,
which is more to the point. If I exchange the string allocation t="t"
with t=1, the performance drop disappears.
# begin of program
allocateBefore = true
disableGC = false

GC.disable if disableGC

largeArray = Array.new(20000000) if allocateBefore
100000.times { |i| t = "t" }
largeArray = Array.new(20000000) if !allocateBefore
# end of program
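For comparison, the variant without the slowdown only changes the block
body:

  100000.times { |i| t = 1 }   # a Fixnum is an immediate value, so no
                               # object is allocated per iteration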
Any idea why the string allocation (and possibly deallocation) takes
so much more time when there is a large array in memory? Can I destroy
an object manually? That could be helpful in combination with
disabling the garbage collection.
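The kind of manual control I have in mind would look roughly like this
(GC.disable, GC.enable and GC.start are the documented hooks; whether
this actually avoids the cost, I don't know):

  GC.disable                      # no automatic collection during the hot loop
  100000.times { |i| t = "t" }    # allocation-heavy work
  GC.enable
  GC.start                        # one explicitly timed collection
  GC.disable                      # off again for the next batch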
Greets,
Geert Fannes.
geert.fannes@gmail.com (Geert Fannes) wrote in message news:<bc64b7df.0410131344.16c4856e@posting.google.com>...
[snip]
My guess is that in the loop 100000.times{|i|t="t"}, the garbage
collector kicks in, maybe once, maybe even more often. And when it
runs, it has to mark/read/follow all the cells in "largeArray", and
since that array is very large, I think that is what causes the
slowdown.
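You can see the mark cost directly by timing a single forced collection
with and without the big array live; a rough sketch using only GC.start
and the standard Benchmark library (absolute numbers will vary):

  require 'benchmark'

  big = Array.new(20000000)             # 20 million slots the GC must mark
  puts Benchmark.realtime { GC.start }  # full mark pass while big is live
  big = nil                             # drop the only reference
  GC.start                              # this pass reclaims the array
  puts Benchmark.realtime { GC.start }  # marking is cheap again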
Try the following and you'll see it's faster:
···
At Thu, 14 Oct 2004 15:54:32 +0900, Geert Fannes wrote:
[snip]
# begin of program
allocateBefore = true
disableGC = false

GC.disable if disableGC

largeArray = Array.new(20000000) if allocateBefore
largeArray = 0   # explicitly setting it to 0 so that the GC will not need to mark it
100000.times { |i| t = "t" }
largeArray = Array.new(20000000) if !allocateBefore
# end of program
Sorry, disregard this. I somehow messed up the timings from some
tests.
Ruben
···
At Thu, 14 Oct 2004 16:13:22 +0900, Ruben wrote:
[snip]