DRb hangs with 64-bit Ruby

(Steven Lumos) #1

64-bit Ruby on Solaris seems to have a problem that causes DRb to
hang. As the output below shows, I had to interrupt the process twice
(search for ^C).

[/src/lang/ruby/ruby-1.8.2]0% uname -a
SunOS jimi 5.8 Generic_117350-05 sun4u sparc SUNW,Sun-Fire
[/src/lang/ruby/ruby-1.8.2]0% ruby -v -d test/runner.rb -v test/drb/test_drb.rb
ruby 1.8.2 (2004-12-25) [sparc-solaris2.8]
Loaded suite test_drb.rb
Started
test_01(TestDRbAry): ^CException `Interrupt' at /tmp/slumos/ruby/lib/ruby/1.8/drb/drb.rb:563 -
/src/lang/ruby/ruby-1.8.2/test/drb/drbtest.rb:252: warning: instance variable @ext not initialized
Exception `NoMethodError' at /src/lang/ruby/ruby-1.8.2/test/drb/drbtest.rb:252 - undefined method `stop_service' for nil:NilClass
Exception `Errno::ECONNRESET' at /tmp/slumos/ruby/lib/ruby/1.8/drb/drb.rb:563 - Connection reset by peer
Exception `DRb::DRbConnError' at /tmp/slumos/ruby/lib/ruby/1.8/drb/drb.rb:565 - Connection reset by peer
^CException `Interrupt' at /tmp/slumos/ruby/lib/ruby/1.8/test/unit/testcase.rb:78 -
/tmp/slumos/ruby/lib/ruby/1.8/test/unit/testcase.rb:78:in `run': Interrupt
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/testsuite.rb:32:in `run'
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/testsuite.rb:31:in `each'
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/testsuite.rb:31:in `run'
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/testsuite.rb:32:in `run'
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/testsuite.rb:31:in `each'
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/testsuite.rb:31:in `run'
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/ui/testrunnermediator.rb:44:in `run_suite'
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:65:in `start_mediator'
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:39:in `start'
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/ui/testrunnerutilities.rb:27:in `run'
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/autorunner.rb:194:in `run'
        from /tmp/slumos/ruby/lib/ruby/1.8/test/unit/autorunner.rb:14:in `run'
        from test/runner.rb:7

I first noticed this with my optimized Sun compiler build, but to
simplify things I reproduced it with gcc, running configure like
this:

  % export CFLAGS='-m64'
  % export DLDFLAGS='-m64'
  % export LDFLAGS='-m64'
  % /src/lang/ruby/ruby-1.8.2/configure --prefix=/tmp/slumos/ruby \
    CFLAGS='-m64' LDFLAGS='-m64' DLDFLAGS='-m64'
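To double-check that a build configured this way really produces a 64-bit interpreter (the platform string from ruby -v above, sparc-solaris2.8, does not say), one option is to ask Ruby for the size of a native C long. This is a minimal sketch, assuming that pack's 'l!' directive (native long) and Integer#size both report 8 bytes on an LP64 (-m64) build and 4 on a 32-bit one:

```ruby
# Report the word size of the running interpreter.
# Assumption: sizeof(long) is 8 on an LP64 (-m64) build and 4 on a 32-bit build.
long_bytes = [0].pack('l!').bytesize  # size of a native C long, in bytes
word_bytes = 1.size                   # bytes in the machine representation of 1

puts "sizeof(long) = #{long_bytes}, 1.size = #{word_bytes}"
puts(long_bytes == 8 ? '64-bit interpreter' : '32-bit interpreter')
```

Run it with the freshly installed ruby under the --prefix above. A 64-bit build should report sizeof(long) = 8.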

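Here's a partial syscall trace (truss output). The process accepts a
connection, reads a regist request containing a DRbObject for
druby://jimi.isri.unlv.edu:36643, connects to port 36643, sends
alive? over the new socket, and then blocks in read() until I
interrupt it.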

accept(5, 0xFFFFFFFF7FFDDD30, 0xFFFFFFFF7FFDDD2C, 1) = 6
        AF_INET name = 131.216.20.6 port = 36644
setsockopt(6, 6, 1, 0xFFFFFFFF7FFDA3CC, 4, 1) = 0
fcntl(6, F_SETFD, 0x00000001) = 0
brk(0x1006EE300) = 0
brk(0x10070E300) = 0
read(6, 0x100468CA4, 1024) = 124
  \0\0\00304\b 0\0\0\0\n04\b "\v r e g i s t\0\0\00404\b i07\0\0\0
  0F04\b "10 u t _ a r r a y . r b\0\0\0 A04\b u :13 D R b : : D R
   b O b j e c t 204\b [07 " % d r u b y : / / j i m i . i s r i .
   u n l v . e d u : 3 6 6 4 3 l +0718 F1F80\0\0\00304\b 0
so_socket(2, 2, 0, "", 1) = 7
fcntl(7, F_GETFL, 0x00000000) = 2
fstat(7, 0xFFFFFFFF7FFD60B0) = 0
    d=0x0000012D00000000 i=62970 m=0140666 l=0 u=0 g=0 sz=0
        at = Aug 19 16:48:12 PDT 2005 [ 1124495292 ]
        mt = Aug 19 16:48:12 PDT 2005 [ 1124495292 ]
        ct = Aug 19 16:48:12 PDT 2005 [ 1124495292 ]
    bsz=8192 blks=0 fs=ufs
getsockopt(7, 65535, 8192, 0xFFFFFFFF7FFD61E8, 0xFFFFFFFF7FFD61E0, -2147656320) = 0
fstat(7, 0xFFFFFFFF7FFD60B0) = 0
    d=0x0000012D00000000 i=62970 m=0140666 l=0 u=0 g=0 sz=0
        at = Aug 19 16:48:12 PDT 2005 [ 1124495292 ]
        mt = Aug 19 16:48:12 PDT 2005 [ 1124495292 ]
        ct = Aug 19 16:48:12 PDT 2005 [ 1124495292 ]
    bsz=8192 blks=0 fs=ufs
getsockopt(7, 65535, 8192, 0xFFFFFFFF7FFD61E8, 0xFFFFFFFF7FFD61E4, -2147656320) = 0
setsockopt(7, 65535, 8192, 0xFFFFFFFF7FFD61E8, 4, -2147656320) = 0
fcntl(7, F_SETFL, 0x00000006) = 0
connect(7, 0x1004029E0, 16, 1) = 0
        AF_INET name = 131.216.20.6 port = 36643
fstat(7, 0xFFFFFFFF7FFD60B0) = 0
    d=0x0000012D00000000 i=62970 m=0140666 l=0 u=0 g=0 sz=0
        at = Aug 19 16:48:12 PDT 2005 [ 1124495292 ]
        mt = Aug 19 16:48:12 PDT 2005 [ 1124495292 ]
        ct = Aug 19 16:48:12 PDT 2005 [ 1124495292 ]
    bsz=8192 blks=0 fs=ufs
getsockopt(7, 65535, 8192, 0xFFFFFFFF7FFD61E8, 0xFFFFFFFF7FFD61E4, 0) = 0
setsockopt(7, 65535, 8192, 0xFFFFFFFF7FFD61E8, 4, 0) = 0
fcntl(7, F_SETFL, 0x00000002) = 0
setsockopt(7, 6, 1, 0xFFFFFFFF7FFD376C, 4, 1) = 0
fcntl(7, F_SETFD, 0x00000001) = 0
poll(0xFFFFFFFF7FFD05E0, 2, 0) = 1
        fd=4 ev=POLLRDNORM rev=0
        fd=7 ev=POLLOUT rev=POLLOUT
brk(0x10071E300) = 0
brk(0x10074E300) = 0
poll(0xFFFFFFFF7FFD7930, 2, 0) = 0
        fd=4 ev=POLLRDNORM rev=0
        fd=5 ev=POLLRDNORM rev=0
poll(0xFFFFFFFF7FFD13B0, 3, 0) = 1
        fd=4 ev=POLLRDNORM rev=0
        fd=5 ev=POLLRDNORM rev=0
        fd=6 ev=POLLOUT rev=POLLOUT
poll(0xFFFFFFFF7FFD06B0, 3, 0) = 1
        fd=4 ev=POLLRDNORM rev=0
        fd=5 ev=POLLRDNORM rev=0
        fd=7 ev=POLLOUT rev=POLLOUT
poll(0xFFFFFFFF7FFD1480, 3, 0) = 1
        fd=4 ev=POLLRDNORM rev=0
        fd=5 ev=POLLRDNORM rev=0
        fd=6 ev=POLLOUT rev=POLLOUT
write(7, 0x100464EF0, 42) = 42
  \0\0\0\t04\b l +0718 F1F80\0\0\0\n04\b "\v a l i v e ?\0\0\00404
  \b i\0\0\0\00304\b 0
read(7, 0x100421ED4, 1024) (sleeping...)
    Received signal #2, SIGINT, in read() [caught]
read(7, 0x100421ED4, 1024) Err#4 EINTR
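To make the two DRb payloads in the trace easier to read, here is a small Ruby sketch that decodes them. It assumes the DRb 1.8 wire format (each field is a 4-byte big-endian length followed by a Marshal dump; a request is ref, method name, argument count, arguments, block) and relies on my own hand transcription of truss's byte notation into hex. drb_fields and the DRb::DRbObject stub are hypothetical helpers for this illustration, not part of drb:

```ruby
# Hypothetical stand-in for drb's DRbObject so Marshal can load the 'u'
# (user-defined _dump) record; it just returns the marshaled [uri, ref] pair.
module DRb
  class DRbObject
    def self._load(data)
      Marshal.load(data)
    end
  end
end

# read(6, ...) = 124: incoming request regist("ut_array.rb", <DRbObject>)
# (hex hand-transcribed from the truss dump; assumption, not captured bytes)
REGIST_HEX = [
  '00000003', '040830',                         # ref: nil
  '0000000a', '0408220b726567697374',           # method: "regist"
  '00000004', '04086907',                       # argc: 2
  '0000000f', '0408221075745f61727261792e7262', # arg 1: "ut_array.rb"
  '00000041',                                   # arg 2: 65-byte DRbObject
  '0408753a13', '4452623a3a4452624f626a656374', #   'u' + class "DRb::DRbObject"
  '32', '04085b072225',                         #   45-byte dump: [uri, ref]
  '64727562793a2f2f6a696d692e697372692e756e6c762e6564753a3336363433',
  '6c2b0718461f80',                             #   ref id (Marshal 'l')
  '00000003', '040830'                          # block: nil
].join

# write(7, ...) = 42: outgoing request <ref>.alive?
ALIVE_HEX = [
  '00000009', '04086c2b0718461f80',             # ref id (Marshal 'l')
  '0000000a', '0408220b616c6976653f',           # method: "alive?"
  '00000004', '04086900',                       # argc: 0
  '00000003', '040830'                          # block: nil
].join

# Split a DRb byte stream into its length-prefixed fields and unmarshal each.
def drb_fields(bytes)
  fields = []
  pos = 0
  while pos < bytes.bytesize
    len = bytes.byteslice(pos, 4).unpack('N').first  # 4-byte big-endian length
    fields << Marshal.load(bytes.byteslice(pos + 4, len))
    pos += 4 + len
  end
  fields
end

regist = [REGIST_HEX].pack('H*')
alive  = [ALIVE_HEX].pack('H*')
p regist.bytesize      # prints 124
p drb_fields(regist)   # prints [nil, "regist", 2, "ut_array.rb", ["druby://jimi.isri.unlv.edu:36643", 2149533208], nil]
p alive.bytesize       # prints 42
p drb_fields(alive)    # prints [2149533208, "alive?", 0, nil]
```

Decoded this way, the byte counts match the 124 and 42 reported by truss, and the alive? request that never gets an answer is addressed to the same ref id (2149533208) that arrived inside the regist request. That id is larger than 2**31 - 1, which is presumably why Marshal encodes it with the 'l' (Bignum) tag rather than 'i'.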

Any help greatly appreciated.

Steve