How can I find lingering file descriptors?

Hello,

I seem to have an issue with file descriptors that aren't being closed when I attempt to put some parallelization into one of my scripts. I am trying to make use of the forkoff gem, but I guess that I am not using it correctly.

If it's useful, here is what the code looks like currently:

     def measure_n ( start_time, stop_time, direction, pattern, records )
       counters = [ :flows, :packets, :octets ]

       # single process
       stats = Hash.new
       counters.each { |c| stats[c] = 0 }

       puts "starting single process execution"
       single_start = Time.now.to_f
       pattern.each do |p|
         r = self.measure_1( start_time, stop_time, direction, "#{direction} AS #{p.to_s}", records)
         counters.each { |c| stats[c] += r[c] }
       end
       single_stop = Time.now.to_f
       puts stats.inspect
       puts "single execution time: #{single_stop - single_start}"

       # multiple processes
       stats = Hash.new
       counters.each { |c| stats[c] = 0 }

       puts "starting multi process execution"
       multi_start = Time.now.to_f
       asn_stats = pattern.forkoff! :processes => 4 do |asn|
         a = Netflow::Nfdump.new
         a.measure_1(start_time,stop_time,direction,"#{direction} AS #{asn}",records)
       end
       asn_stats.each do |asn|
         counters.each { |c| stats[c] += asn[c] }
       end

       multi_stop = Time.now.to_f
       puts stats.inspect
       puts "multi execution time: #{multi_stop - multi_start}"

       return stats
     end

The part that I find really odd is that I can replace the block I am attempting to parallelize with a simple:

  puts "#{asn}"

And the script dies in the same place -- at the 255th element in the array, which corresponds well to the number of file descriptors I can use:

  bradv:bvolz:$ ulimit -a | grep files
  open files (-n) 256

Since I see this problem with a simple 'puts', does that mean that the issue is not in my code and perhaps lies elsewhere? Or have I misunderstood how to use the forkoff gem? In either case, how can I figure out what these open file descriptors are?

Thanks,

Brad

Hello,

I seem to have an issue with file descriptors that aren't being closed when I attempt to put some parallelization into one of my scripts.

you can list them all with something like

   limit = 8192
   files = Array.new(limit){|i| IO.for_fd(i) rescue nil}.compact.map{|io| io.fileno}
   p files

I am trying to make use of the forkoff gem, but I guess that I am not using it correctly.

If it's useful, here is what the code looks like currently:

   def measure_n ( start_time, stop_time, direction, pattern, records )
     counters = [ :flows, :packets, :octets ]

     # single process
     stats = Hash.new
     counters.each { |c| stats[c] = 0 }

     puts "starting single process execution"
     single_start = Time.now.to_f
     pattern.each do |p|
       r = self.measure_1( start_time, stop_time, direction, "#{direction} AS #{p.to_s}", records)
       counters.each { |c| stats[c] += r[c] }
     end
     single_stop = Time.now.to_f
     puts stats.inspect
     puts "single execution time: #{single_stop - single_start}"

     # multiple processes
     stats = Hash.new
     counters.each { |c| stats[c] = 0 }

     puts "starting multi process execution"
     multi_start = Time.now.to_f
     asn_stats = pattern.forkoff! :processes => 4 do |asn|
       a = Netflow::Nfdump.new

this probably opens a file or pipe

       a.measure_1(start_time,stop_time,direction,"#{direction} AS #{asn}",records)

this too possibly

     end
     asn_stats.each do |asn|
       counters.each { |c| stats[c] += asn[c] }
     end

     multi_stop = Time.now.to_f
     puts stats.inspect
     puts "multi execution time: #{multi_stop - multi_start}"

     return stats
   end

The part that I find really odd is that I can replace the block I am attempting to parallelize with a simple:

  puts "#{asn}"

And the script dies in the same place -- at the 255th element in the array, which corresponds well to the number of file descriptors I can use:

  bradv:bvolz:$ ulimit -a | grep files
  open files (-n) 256

Since I see this problem with a simple 'puts', does that mean that the issue is not in my code and perhaps lies elsewhere? Or have I misunderstood how to use the forkoff gem? In either case, how can I figure out what these open file descriptors are?

Thanks,

Brad

you are using the gem properly. you should attempt to track down the open files with something like this in the forkoff block

     ios = Array.new(8192){|i| IO.for_fd(i) rescue nil}.compact
     filenos = ios.map{|io| io.fileno}
     paths = ios.map{|io| io.path rescue nil}

     STDERR.puts "child : #{ Process.pid }"
     STDERR.puts "filenos : #{ filenos.inspect }"
     STDERR.puts "paths : #{ paths.inspect }"
     STDIN.gets

i just released version 0.0.4 of forkoff. it should not affect your issue, but you might want to install it anyhow (just pushed to rubyforge - you might have to wait for it to propagate, or grab the gem from there manually)

cheers.

a @ http://codeforpeople.com/

···

On Sep 13, 2008, at 12:45 AM, Brad Volz wrote:
--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Thanks for the reply.

I'm going to trim the original message a bit.

Hello,

I seem to have an issue with file descriptors that aren't being closed when I attempt to put some parallelization into one of my scripts.

you can list them all with something like

limit = 8192
files = Array.new(limit){|i| IO.for_fd(i) rescue nil}.compact.map{|io| io.fileno}
p files

Excellent. Thanks.

    # multiple processes
    stats = Hash.new
    counters.each { |c| stats[c] = 0 }

    puts "starting multi process execution"
    multi_start = Time.now.to_f
    asn_stats = pattern.forkoff! :processes => 4 do |asn|
      a = Netflow::Nfdump.new

this probably opens a file or pipe

      a.measure_1(start_time,stop_time,direction,"#{direction} AS #{asn}",records)

this too possibly

Yes. It calls an external program to collect the actual data from the datastore. Having said that, I don't know that I can confidently say that the issue is in that particular block of code, as I can remove it and put in place something like: puts "hello !" and still experience the same problem.

you are using the gem properly. you should attempt to track down the open files with something like this in the forkoff block

   ios = Array.new(8192){|i| IO.for_fd(i) rescue nil}.compact
   filenos = ios.map{|io| io.fileno}
   paths = ios.map{|io| io.path rescue nil}

   STDERR.puts "child : #{ Process.pid }"
   STDERR.puts "filenos : #{ filenos.inspect }"
   STDERR.puts "paths : #{ paths.inspect }"
   STDIN.gets

i just released version 0.0.4 of forkoff. it should not affect your issue, but you might want to install it anyhow (just pushed to rubyforge - you might have to wait for it to propagate, or grab the gem from there manually)

I'll try this again later with the new forkoff, but I currently have 0.0.1.

Thanks for the test block. I ran it via -

require 'rubygems'
require 'forkoff'

(0..255).forkoff do |f|

   ios = Array.new(8192){|i| IO.for_fd(i) rescue nil}.compact
   filenos = ios.map{|io| io.fileno}
   paths = ios.map{|io| io.path rescue nil}

   STDERR.puts "child : #{ Process.pid }"
   STDERR.puts "filenos : #{ filenos.inspect }"
   STDERR.puts "paths : #{ paths.inspect }"
   STDIN.gets

end

I won't paste in the entire output, but here is the first and last child for comparison --

bradv:bvolz:$ ruby forkoff-test.rb
child : 47036
filenos : [0, 1, 2, 4]
paths : [nil, nil, nil, nil]

child : 47288
filenos : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 255]
paths : [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]

/opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:42:in `pipe': Too many open files (Errno::EMFILE)
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:42:in `forkoff'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:37:in `loop'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:37:in `forkoff'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:34:in `initialize'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:34:in `new'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:34:in `forkoff'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:32:in `times'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:32:in `forkoff'
  from forkoff-test.rb:5
bradv:bvolz:$ /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:53:in `write': Broken pipe (Errno::EPIPE)
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:53:in `forkoff'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:37:in `loop'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:37:in `forkoff'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:34:in `initialize'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:34:in `new'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:34:in `forkoff'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:32:in `times'
  from /opt/local/lib/ruby/gems/1.8/gems/forkoff-0.0.1/lib/forkoff.rb:32:in `forkoff'
  from forkoff-test.rb:5

If it's useful to know, I am using Ruby from MacPorts on OS X 10.5:

bradv:bvolz:$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-darwin9]

My current workaround is to use threadify instead of forkoff which is working beautifully.

bradv:bvolz:$ ruby netflow.rb
starting single process execution
{:flows=>176915, :packets=>90580480, :octets=>81664177152}
single execution time: 138.792369842529
starting threadify execution
{:flows=>176915, :packets=>90580480, :octets=>81664177152}
threadify execution time: 85.1209449768066

cheers!

Brad

···

On Sep 13, 2008, at 10:44 AM, ara.t.howard wrote:

On Sep 13, 2008, at 12:45 AM, Brad Volz wrote:

I just tested 0.0.4, and it seems to work just as well as threadify at keeping the file descriptors under control.

Using the same test code produces -

bradv:bvolz:$ ruby forkoff-test.rb
child : 47335
filenos : [0, 1, 2, 4]
paths : [nil, nil, nil, nil]

that was the first pass, and here is the last --

child : 47600
filenos : [0, 1, 2, 3, 5]
paths : [nil, nil, nil, nil, nil]
bradv:bvolz:$

Many thanks for your help and for releasing these gems to the public.

cheers!

Brad

···

On Sep 13, 2008, at 10:44 AM, ara.t.howard wrote:

i just released version 0.0.4 of forkoff. it should not affect your issue, but you might want to install it anyhow (just pushed to rubyforge - you might have to wait for it to propagate, or grab the gem from there manually)

great!

guess i left some pipes lying around in the initial version ;-(
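The classic shape of that kind of leak, as a hypothetical sketch rather than forkoff's actual code, is a pipe per child whose parent-side ends are never closed, so two descriptors pile up on every iteration until Errno::EMFILE:

```ruby
# Hypothetical leak pattern: parent creates a pipe per child and keeps
# both ends open forever. Each iteration costs the parent two fds.
leaked = []
3.times do
  rd, wr = IO.pipe
  pid = fork do
    rd.close            # child closes the end it doesn't use
    wr.write "done"
    wr.close
    exit!(0)
  end
  Process.wait(pid)
  leaked << [rd, wr]    # BUG: parent never closes its copies
end

# the fix: parent does wr.close right after the fork, and rd.close
# once it has read the child's result
p leaked.flatten.count { |io| !io.closed? }   # => 6
```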

say - which of the two, threadify and forkoff, was fastest for your use case?

a @ http://codeforpeople.com/

···

On Sep 13, 2008, at 6:25 PM, Brad Volz wrote:

I just tested 0.0.4, and it seems to work just as well as threadify at keeping the file descriptors under control.


It looks like threadify is.

The following test attempts to be "fair" by specifying 4 threads for threadify, and 4 processes for forkoff since their defaults are different.

bradv:bvolz:$ !!
ruby netflow.rb
2391 items in Enumerable
starting threadify execution
{:flows=>150148, :packets=>76875776, :octets=>68717894144}
threadify execution time: 108.343915224075
starting forkoff execution
{:flows=>150148, :packets=>76875776, :octets=>68717894144}
forkoff execution time: 143.863088130951
bradv:bvolz:$

In my case, I'm not doing the calculations in Ruby code. I'm calling an external program to fetch the data for me, so in both cases there is an external process that can be scheduled by the operating system onto whatever processor has cycles. Presumably, if that were not the case and I were doing a lot of CPU-intensive work in Ruby itself, the forkoff version would be faster by providing 'n' GIL instances instead of just one.
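That reasoning can be sketched with a rough benchmark (timings vary by machine and Ruby version; the lambda is just a stand-in for real CPU-bound work):

```ruby
# Same CPU-bound job run in 4 threads (sharing one GIL) versus
# 4 forked children (one interpreter, and one GIL, each).
require 'benchmark'

work = lambda { (1..2_000_000).reduce(:+) }

threaded = Benchmark.realtime do
  Array.new(4) { Thread.new { work.call } }.each(&:join)
end

forked = Benchmark.realtime do
  Array.new(4) { fork { work.call } }
  Process.waitall
end

# On a multi-core box the forked version usually wins for work like this;
# exact numbers depend on the machine, so none are claimed here.
puts format("threads: %.2fs  forks: %.2fs", threaded, forked)
```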

cheers!

Brad

···

On Sep 13, 2008, at 10:40 PM, ara.t.howard wrote:

On Sep 13, 2008, at 6:25 PM, Brad Volz wrote:

I just tested 0.0.4, and it seems to work just as well as threadify at keeping the file descriptors under control.

great!

guess i left some pipes lying around in the initial version ;-(

say - which of the two, threadify and forkoff, was fastest for your use case?

i think that's a correct assessment alright. time and time again i find that spawning a bunch of processes via threads is the easiest and most powerful method of achieving parallelism with ruby - and in general. i know it's not sexy, but being able to test computation units from the command line and then simply fan them out across cpus or even many nodes makes incremental development super easy and lets us use all the tools we're accustomed to (ps, free, top, ssh, ruby) to manage the computational units. it's interesting that you're also following this pattern.
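A minimal sketch of that pattern, with a trivial `echo` standing in for a real external tool like nfdump: the Ruby threads do nothing but shell out, so the external processes are what the OS actually schedules across CPUs.

```ruby
# Worker threads draining a job queue; each job is an external command.
jobs = Queue.new
(1..8).each { |n| jobs << n }

results = Queue.new
workers = Array.new(4) do
  Thread.new do
    loop do
      n = begin
        jobs.pop(true)          # non-blocking pop; raises when drained
      rescue ThreadError
        break                   # queue empty; this worker is done
      end
      results << `echo #{n * n}`.strip   # stand-in for the real tool
    end
  end
end
workers.each(&:join)

collected = []
collected << results.pop until results.empty?
p collected.map(&:to_i).sort   # => [1, 4, 9, 16, 25, 36, 49, 64]
```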

cheers.

a @ http://codeforpeople.com/

···

On Sep 14, 2008, at 1:48 AM, Brad Volz wrote:

In my case, I'm not doing the calculations in Ruby code. I'm calling an external program to fetch the data for me, so in both cases there is an external process that can be scheduled by the operating system onto whatever processor has cycles. Presumably, if that were not the case and I were doing a lot of CPU-intensive work in Ruby itself, the forkoff version would be faster by providing 'n' GIL instances instead of just one.
