[BUG] Segmentation fault with a threads/forks script

Hi,

I experience a reproducible crash (each time) with prunner.rb at
http://blop.info/bazaar/prunner.rb. The script starts several commands
at the same time. When running with a large number of commands, it
exits with :

./prunner.rb:51: [BUG] Segmentation fault
ruby 1.8.2 (2005-04-11) [i386-linux]

Aborted

To reproduce, create a large file with one command per line.
for i in $(seq 1 2000); do echo hostname; done > cmds
Then run prunner.rb like this :
cat cmds |head -n 1500 |./prunner.rb
1500 can be increased if it doesn't crash for you.

Of course, I expect it to go wrong at some time, but it could probably
do this in a cleaner way.

Can somebody confirm the bug ? Or better, fix it ? :)

--

Lucas Nussbaum
lucas@lucas-nussbaum.net http://www.lucas-nussbaum.net/ |
jabber: lucas@nussbaum.fr GPG: 1024D/023B3F4F |

With Ubuntu Breezy's ruby 1.9 package (version 1.9.0+20050623-2), it
crashes with :

*** glibc detected *** free(): invalid pointer: 0x08a9db38 ***
Aborted

On Thu, Jul 21, 2005 at 09:28:29PM +0900, Lucas Nussbaum <lucas@lucas-nussbaum.net> wrote:

--

Lucas Nussbaum
lucas@lucas-nussbaum.net http://www.lucas-nussbaum.net/ |
jabber: lucas@nussbaum.fr GPG: 1024D/023B3F4F |

seems to work on 1.8.2 for values around 1000:

   [ahoward@localhost ~]$ wget http://blop.info/bazaar/prunner.rb
   --07:25:29-- http://blop.info/bazaar/prunner.rb
              => `prunner.rb'
   Resolving blop.info... 85.68.8.93
   Connecting to blop.info[85.68.8.93]:80... connected.
   HTTP request sent, awaiting response... 200 OK
   Length: 2,125 [text/plain]

   100%[======================================================================================================================================>] 2,125 --.--K/s

   07:25:29 (42.22 MB/s) - `prunner.rb' saved [2,125/2,125]

   [ahoward@localhost ~]$ for i in $(seq 1 2000);do echo 'date'; done > cmds
   [ahoward@localhost ~]$ wc -l cmds
   2000 cmds
   [ahoward@localhost ~]$ ruby prunner.rb < cmds >/dev/null
   prunner.rb:47:in `popen': Too many open files - date 2>&1 (Errno::EMFILE)
           from prunner.rb:47
           from prunner.rb:47:in `each'
           from prunner.rb:47
   [ahoward@localhost ~]$ head -1000 cmds |ruby prunner.rb >/dev/null
   [ahoward@localhost ~]$ echo $?
   0
   [ahoward@localhost ~]$ ruby -v
   ruby 1.8.2 (2005-02-12) [i686-linux]
   [ahoward@localhost ~]$ uname -srm
   Linux 2.6.12-1.1372_FC3 i686
   [ahoward@localhost ~]$ cat /etc/redhat-release
   Fedora Core release 3 (Heidelberg)

and on 1.9:

   harp:~ > wget http://blop.info/bazaar/prunner.rb
   --07:42:37-- http://blop.info/bazaar/prunner.rb
              => `prunner.rb.1'
   Resolving blop.info... 85.68.8.93
   Connecting to blop.info[85.68.8.93]:80... connected.
   HTTP request sent, awaiting response... 200 OK
   Length: 2,125 [text/plain]

   100%[======================================================================================================================================>] 2,125 --.--K/s

   07:42:37 (343.01 KB/s) - `prunner.rb.1' saved [2,125/2,125]

   harp:~ > for i in $(seq 1 2000);do echo 'date'; done > cmds
   harp:~ > wc -l cmds
      2000 cmds
   harp:~ > ruby prunner.rb < cmds >/dev/null
   prunner.rb:47:in `popen': Too many open files - date 2>&1 (Errno::EMFILE)
           from prunner.rb:47
           from prunner.rb:47:in `each'
           from prunner.rb:47
   harp:~ > head -1000 cmds|ruby prunner.rb >/dev/null
   harp:~ > echo $?
   0
   harp:~ > ruby -v
   ruby 1.9.0 (2005-05-16) [i686-linux]
   harp:~ > uname -srm
   Linux 2.4.21-32.0.1.EL i686
   harp:~ > cat /etc/redhat-release
   Red Hat Enterprise Linux WS release 3 (Taroon Update 5)

did you compile ruby yourself or use some installer/package-manager?

cheers.

-a
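
For reference, the Errno::EMFILE in the transcripts above is the clean failure mode: the process has reached its per-process descriptor limit, so popen raises an exception instead of crashing. A minimal sketch that reproduces this in isolation, assuming a Ruby that provides Process.getrlimit and Process.setrlimit (the 1.8.2 builds discussed here may not). The limit of 64 and the use of /dev/null are arbitrary choices for illustration:

```ruby
# Lower the soft descriptor limit for this process only, then open files
# until the kernel refuses. 64 is an arbitrary small value (assumption).
soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
Process.setrlimit(Process::RLIMIT_NOFILE, [64, soft].min, hard)

handles = []
error = nil
begin
  200.times { handles << File.open('/dev/null') }
rescue Errno::EMFILE => e
  # This is the "Too many open files" error seen in the transcripts.
  error = e
ensure
  handles.each { |f| f.close }
  # Restore the original soft limit.
  Process.setrlimit(Process::RLIMIT_NOFILE, soft, hard)
end

puts "opened #{handles.size} files before: #{error.class}"
```

The exact count printed depends on how many descriptors the interpreter already holds.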

On Thu, 21 Jul 2005, Lucas Nussbaum wrote:

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

looks like you can work around it by just closing everything as you use it:

   harp:~ > curl http://fortytwo.merseine.nu/prunner.rb > prunner.rb

   harp:~ > for i in $(seq 1 2000);do echo 'date'; done > cmds

   harp:~ > wc -l cmds
      2000 cmds

   harp:~ > ruby prunner.rb < cmds > /dev/null

   harp:~ > echo $?
   0

   harp:~ > ls prunner.*out* | wc -l
      2000

   harp:~ > cat prunner.out-.0
   # 0 : date
   Thu Jul 21 10:49:07 MDT 2005

   harp:~ > cat prunner.out-.1999
   # 1999 : date
   Thu Jul 21 10:49:21 MDT 2005

(prunner.rb inlined below)

hth.

-a

On Thu, 21 Jul 2005, Lucas Nussbaum wrote:

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

===============================================================================
file: prunner.rb

#!/usr/bin/ruby -w

require 'optparse'
require 'thread'

#
# prunner : read commands from stdin, and execute each of them in parallel
#

class Hash
   def getopt k, default = nil
     return self[k] if self.has_key? k
     k = "#{ k }"
     return self[k] if self.has_key? k
     k = k.intern
     return self[k] if self.has_key? k
     return default
   end
end

class Command
   class << self
     def gen_cid
       @cid = defined?(@cid) ? (@cid + 1) : 0
     end
   end

   attr_accessor :command, :cid, :prefix, :path, :exit_status

   def initialize command, opts = {}
     @command = command.strip
     @cid = self.class.gen_cid
     @prefix = opts.getopt 'prefix', "#{ $$ }_command.out-"
     @path = "#{ @prefix }.#{ @cid }"
     @header = opts.getopt 'header'
     @mutex = Mutex::new
     @lines = []
     @update_idx = 0
     @thread = nil
     @exit_status = -1
   end
   def start
     @thread =
       Thread::new(@command, Thread::current) do |cmd, cur|
         begin
           IO::popen("{ #{ cmd } ;} 2>&1") do |pipe|
             File::open(@path, 'w') do |f|
               f << self if @header
               while((line = pipe.gets))
                 synchronize{ @lines << line }
                 f << line
               end
             end
           end
           @exit_status = $?.exitstatus
         rescue Exception => e
           cur.raise e
         end
       end
   end
   def synchronize(*a, &b)
     @mutex.synchronize(*a, &b)
   end
   def join(*a, &b)
     @thread.join(*a, &b)
   end
   def update
     report = nil
     synchronize do
       report = @lines[@update_idx .. -1]
       @update_idx = @lines.size
     end
     report
   end
   def update?
     @update_idx < @lines.size
   end
   def to_s
     "# #{ @cid } : #{ @command }\n"
   end
   alias label to_s
end

class Main
   def initialize env = ENV.to_hash, argv = ARGV.clone
     @env, @argv = env, argv
     @cmds = []
     @header = true
     @verbose = true
     @interval = 1
     @prefix = 'prunner.out-'
     @viewthread = nil
     parse_options
   end
   def parse_options
     OptionParser::new do |opts|
       opts.banner = "echo command | prunner.rb [options]"
       opts.separator ''
       opts.on('-h', '--suppress-header', 'suppress header in output files'){
         @header = false
       }
       opts.on('-q', '--quiet', 'run quietly'){
         @verbose = false
       }
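       # SECONDS makes OptionParser pass the value to the block;
       # without it the block receives true and Float(true) raises
       opts.on('-i', '--interval SECONDS', 'output interval'){|i|
         @interval = Float i
       }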
       opts.on('-p', '--prefix PREFIX', "prefix for output files (default #{ @prefix })"){|p|
         @prefix = p
       }
       # -h is already taken by --suppress-header, so only --help is registered
       opts.on_tail('--help', 'Show this message') {
         puts opts
         exit
       }
       opts.parse!(@argv)
     end
   end
   def main
     STDIN.each do |line|
       line.strip!
       next if line.empty?
       c =
         Command::new(line,
           :verbose => @verbose,
           :header => @header,
           :prefix => @prefix
         )
       c.start
       @cmds << c
     end

     if @verbose
       @viewthread =
         Thread::new do
           loop do
             reports = @cmds.map{|c| [c.label, c.update] if c.update?}.compact
             exit if reports.empty?
             reports.each do |label, report|
               print label
               report.each{|line| print line}
             end
             sleep @interval
           end
         end
     end

     @cmds.each{|c| c.join}
     @viewthread.join if @viewthread
     exit
   end
end

if $0 == __FILE__
   STDOUT.sync = true
   Main::new(ENV, ARGV).main
end
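
A different way to stay under the descriptor limit, not used in the script above, is to cap how many commands run at once with a small worker pool. The sketch below is only an illustration under that assumption: run_bounded and its limit parameter are hypothetical names, and it collects output in memory rather than writing the per-command files that prunner.rb produces.

```ruby
require 'thread'

# Hypothetical helper: run shell commands with at most `limit` running at once,
# so no more than `limit` pipes are open at any moment.
def run_bounded(commands, limit = 4)
  queue = Queue.new
  commands.each_with_index { |cmd, idx| queue << [idx, cmd] }
  results = Array.new(commands.size)

  workers = Array.new(limit) do
    Thread.new do
      loop do
        idx, cmd = begin
          queue.pop(true)   # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break             # no work left for this worker
        end
        # The block form closes the pipe as soon as the output has been read.
        results[idx] = IO.popen("{ #{cmd} ;} 2>&1") { |pipe| pipe.read }
      end
    end
  end

  workers.each { |t| t.join }
  results
end

outputs = run_bounded(['echo a', 'echo b', 'echo c'], 2)
outputs.each { |out| print out }
```

With a pool like this the number of open pipes depends on the limit, not on the number of input lines.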

I tested using Debian's and Ubuntu's packages.
What if you ulimit -n 16384 first ?
Does it still work ?
It worked for me with values around 1000 too. But after increasing the
ulimit for open files, it started crashing.
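
One plausible reading, consistent with the fix mentioned later in the thread: with the default limit, popen fails with EMFILE before any descriptor number reaches FD_SETSIZE, while a raised limit lets descriptors beyond FD_SETSIZE reach select(). A small sketch for checking which situation a process is in, assuming FD_SETSIZE is 1024 (the usual glibc value on Linux) and a Ruby that provides Process.getrlimit. The helper name beyond_fd_setsize? is made up for illustration:

```ruby
# Assumption: FD_SETSIZE is 1024, the usual glibc value on Linux.
FD_SETSIZE = 1024

soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)

# Descriptors numbered FD_SETSIZE or higher do not fit in a fixed-size fd_set,
# so a limit above FD_SETSIZE makes such descriptors possible.
def beyond_fd_setsize?(limit)
  limit > FD_SETSIZE
end

puts "soft limit: #{soft}, hard limit: #{hard}"
puts "descriptors >= #{FD_SETSIZE} possible: #{beyond_fd_setsize?(soft)}"
```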

On Thu, Jul 21, 2005 at 10:50:00PM +0900, "Ara.T.Howard" <Ara.T.Howard@noaa.gov> wrote:

On Thu, 21 Jul 2005, Lucas Nussbaum wrote:

seems to work on 1.8.2 for values around 1000:

[...]

did you compile ruby yourself or use some installer/package-manager?

--

Lucas Nussbaum
lucas@lucas-nussbaum.net http://www.lucas-nussbaum.net/ |
jabber: lucas@nussbaum.fr GPG: 1024D/023B3F4F |

o.k. - now it crashed on 1.8.2. i can't ulimit on the 1.9 box.

-a

On Thu, 21 Jul 2005, Lucas Nussbaum wrote:

On Thu, Jul 21, 2005 at 10:50:00PM +0900, "Ara.T.Howard" <Ara.T.Howard@noaa.gov> wrote:

On Thu, 21 Jul 2005, Lucas Nussbaum wrote:

seems to work on 1.8.2 for values around 1000:

[...]

did you compile ruby yourself or use some installer/package-manager?

I tested using Debian's and Ubuntu's packages.
What if you ulimit -n 16384 first ?
Does it still work ?
It worked for me with values around 1000 too. But after increasing the
ulimit for open files, it started crashing.

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

Hi,

At Thu, 21 Jul 2005 23:09:51 +0900,
Lucas Nussbaum wrote in [ruby-talk:149072]:

I tested using Debian's and Ubuntu's packages.
What if you ulimit -n 16384 first ?
Does it still work ?
It worked for me with values around 1000 too. But after increasing the
ulimit for open files, it started crashing.

I think it has been fixed in CVS trunk.

Fri Jun 3 23:23:02 2005 Nobuyoshi Nakada <nobu@ruby-lang.org>
  
    * intern.h (rb_fdset_t): deal with fd bit sets over FD_SETSIZE.
      fixed: [ruby-dev:26187]

--
Nobu Nakada

Could somebody confirm ? The Debian/Ubuntu package is versioned
1.9.0+20050623-2, so it *might* be based on the ruby 1.9 CVS after that
bug was fixed, and it still crashes for me. (is the CVS HEAD ruby1.9 ?
I'm not familiar with Ruby development)

On Thu, Jul 21, 2005 at 11:29:49PM +0900, nobu.nokada@softhome.net wrote:

--

Lucas Nussbaum
lucas@lucas-nussbaum.net http://www.lucas-nussbaum.net/ |
jabber: lucas@nussbaum.fr GPG: 1024D/023B3F4F |

Hi,

At Thu, 21 Jul 2005 23:45:34 +0900,
Lucas Nussbaum wrote in [ruby-talk:149077]:

Could somebody confirm ? The Debian/Ubuntu package is versioned
1.9.0+20050623-2, so it *might* be based on the ruby 1.9 CVS after that
bug was fixed, and it still crashes for me. (is the CVS HEAD ruby1.9 ?
I'm not familiar with Ruby development)

It seems to have new related bug(s), I'll investigate it more.

--
Nobu Nakada

In article <TYOMLEM041XvpFVjCRG000000c1@tyomlvem02.e2k.ad.ge.com>,
  nobuyoshi nakada <nobuyoshi.nakada@ge.com> writes:

It seems to have new related bug(s), I'll investigate it more.

All three fd_sets must be long enough for select.

Index: eval.c
===================================================================
RCS file: /src/ruby/eval.c,v
retrieving revision 1.803
diff -u -r1.803 eval.c
--- eval.c 19 Jul 2005 14:57:47 -0000 1.803
+++ eval.c 22 Jul 2005 15:49:31 -0000
@@ -9880,10 +9880,8 @@
     }
}

-void
-rb_fd_set(n, fds)
- int n;
- rb_fdset_t *fds;
+static void
+rb_fd_resize(int n, rb_fdset_t *fds)
{
     int m = howmany(n + 1, NFDBITS) * sizeof(fd_mask);
     int o = howmany(fds->maxfd, NFDBITS) * sizeof(fd_mask);
@@ -9896,6 +9894,14 @@
   memset((char *)fds->fdset + o, 0, m - o);
     }
     if (n >= fds->maxfd) fds->maxfd = n + 1;
+}
+
+void
+rb_fd_set(n, fds)
+ int n;
+ rb_fdset_t *fds;
+{
+ rb_fd_resize(n, fds);
     FD_SET(n, fds->fdset);
}

@@ -9931,6 +9937,15 @@
     memcpy(dst->fdset, src, size);
}

+int
+rb_fd_select(int n, rb_fdset_t *readfds, rb_fdset_t *writefds, rb_fdset_t *exceptfds, struct timeval *timeout)
+{
+ rb_fd_resize(n-1, readfds);
+ rb_fd_resize(n-1, writefds);
+ rb_fd_resize(n-1, exceptfds);
+ return select(n, rb_fd_ptr(readfds), rb_fd_ptr(writefds), rb_fd_ptr(exceptfds), timeout);
+}
+
#undef FD_ZERO
#undef FD_SET
#undef FD_CLR
@@ -10795,7 +10810,7 @@
       delay_ptr = &delay_tv;
   }

- n = select(max+1, rb_fd_ptr(&readfds), rb_fd_ptr(&writefds), rb_fd_ptr(&exceptfds), delay_ptr);
+ n = rb_fd_select(max+1, &readfds, &writefds, &exceptfds, delay_ptr);
   if (n < 0) {
       int e = errno;

Index: intern.h
===================================================================
RCS file: /src/ruby/intern.h,v
retrieving revision 1.172
diff -u -r1.172 intern.h
--- intern.h 14 Jul 2005 15:11:52 -0000 1.172
+++ intern.h 22 Jul 2005 15:49:32 -0000
@@ -162,6 +162,7 @@
void rb_fd_clr _((int, rb_fdset_t *));
int rb_fd_isset _((int, const rb_fdset_t *));
void rb_fd_copy _((rb_fdset_t *, const fd_set *, int));
+int rb_fd_select(int, rb_fdset_t *, rb_fdset_t *, rb_fdset_t *, struct timeval *);

#define rb_fd_ptr(f) ((f)->fdset)
#define rb_fd_max(f) ((f)->maxfd)
@@ -178,6 +179,7 @@
#define rb_fd_init(f) FD_ZERO(f)
#define rb_fd_term(f) (f)
#define rb_fd_max(f) FD_SETSIZE
+#define rb_fd_select(n, rfds, wfds, efds, timeout) select(n, rfds, wfds, efds, timeout)

#endif
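
The sizing arithmetic in rb_fd_resize can be checked by hand. select(n, ...) examines the first n bits of every set it is given, so each of the three sets needs at least the number of bytes computed below. This is a sketch in Ruby for illustration only, assuming NFDBITS is 32 (a 4-byte fd_mask, as on i386 Linux). The names howmany and fdset_bytes mirror the C code but are defined here, not taken from any Ruby API:

```ruby
# Assumption: NFDBITS is 32 (a 4-byte fd_mask, as on i386 Linux);
# other platforms may use 64.
NFDBITS = 32
FD_MASK_BYTES = NFDBITS / 8

# Integer ceiling division, like the C howmany() macro.
def howmany(x, y)
  (x + y - 1) / y
end

# Bytes an fd_set needs so that descriptor number `fd` fits in it,
# matching `howmany(n + 1, NFDBITS) * sizeof(fd_mask)` in the patch.
def fdset_bytes(fd)
  howmany(fd + 1, NFDBITS) * FD_MASK_BYTES
end

[1023, 1024, 1500].each do |fd|
  puts "fd #{fd}: #{fdset_bytes(fd)} bytes"
end
```

Under these assumptions a fixed fd_set of 1024 bits is 128 bytes, which is enough for descriptor 1023 but not for 1024 or above.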

--
Tanaka Akira

Hi,

At Sat, 23 Jul 2005 00:55:09 +0900,
Tanaka Akira wrote in [ruby-talk:149199]:

> It seems to have new related bug(s), I'll investigate it more.

All three fd_sets must be long enough for select.

Indeed.

--
Nobu Nakada

Hi,

At Sat, 23 Jul 2005 00:55:09 +0900,
Tanaka Akira wrote in [ruby-talk:149199]:

All three fd_sets must be long enough for select.

What about making them one struct? I guess it would also be nice
for absorbing the differences between select and poll.

--
Nobu Nakada

In article <200507231124.j6NBOY0Z028038@sharui.nakada.niregi.kanuma.tochigi.jp>,
  nobu.nokada@softhome.net writes:

All three fd_sets must be long enough for select.

What about making them one struct? I guess it would also be nice
for absorbing the differences between select and poll.

It may be a step toward supporting kqueue, epoll, /dev/poll, etc.

I think it's a good idea.
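
The "one struct" idea can be pictured at the Ruby level, although the proposal concerns the C internals. The sketch below is only an analogy under that assumption: WaitSet is a hypothetical name, and it wraps IO.select so that callers hand over a single object and never see how the three sets are passed to the underlying call.

```ruby
# Hypothetical container: the three interest sets travel together, so the
# waiting mechanism behind #wait could change without touching callers.
WaitSet = Struct.new(:readers, :writers, :excepts) do
  # Returns a WaitSet holding only the ready IO objects, or nil on timeout.
  def wait(timeout = nil)
    ready = IO.select(readers, writers, excepts, timeout)
    ready ? WaitSet.new(*ready) : nil
  end
end

r, w = IO.pipe
w.write('x')
w.flush

ready = WaitSet.new([r], [], []).wait(1)
puts "readable descriptors: #{ready.readers.size}"
```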

--
Tanaka Akira