[ I've had this email open for two days straight now, so I should
probably fire something off. Sorry for this being long; I tried to
make this as detailed as possible. ]
> Yup, that's right. Now add -i 0 to the argument list and watch it
> core after a few min. This is a bug that comes out when you stress
> test a module. -sc
I can't reproduce, sorry
pigeon% time tests.rb -i 0 -n xml_parser4 >& /dev/null
real 40m55.139s
user 40m46.740s
sys 0m10.530s
pigeon%
Interesting. When I first woke up this morning (the day you sent this)
and read your email, I couldn't reproduce it either. Since then, I've
significantly improved rubytest, which has increased rubytest's memory
footprint, and I can now get it to crash with xml_parser4. I think
xml_parser4 wasn't crashing before because it stayed under some
memory threshold (sound plausible? I have some evidence to back this
up below). I just re-ran the tests with an updated version of
rubytest and am getting the following now:
$ time rubytest -F -i 0 -n xml_parser4
./tests/tc_xml_parser4.rb:12: [BUG] Segmentation fault
ruby 1.7.3 (2002-09-27) [i386-freebsd5]
Abort (core dumped)
68.013u 0.509s 1:23.42 82.1% 5+3350k 0+51io 0pf+0w
What's really strange is that if I let this send data to stdout, I
don't have any problems.
$ time rubytest -i 0 -n xml_parser4
[runs forever]
All -F does is sidestep IO::Tee and open /dev/null to use as
the IO handle for the unit tests. As an interesting test, I used -F
with a delay of 0.1 sec and noticed it still crashing. I'm not sure
why using IO::Tee would prevent xml_parser4 from crashing while
working around it and using only a single file handle would cause it
to crash. My best guess is that, without -F, the memory section that
gets stomped on is occupied by some fluff from IO::Tee and I'm
just getting lucky and not reading from the corrupted address
space(s).
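For anyone who hasn't looked at rubytest, the difference between the two output paths is roughly the following. This is only a sketch, not rubytest's code: SimpleTee and pick_output are names made up for illustration, and the real IO::Tee may well behave differently. The point is just that -F hands the tests one file handle on /dev/null, while the default path fans every write out to several handles.

```ruby
# Hypothetical stand-in for IO::Tee (not the real class):
# forwards each write to every target handle.
class SimpleTee
  def initialize(*targets)
    @targets = targets
  end

  # Write the string to every target and return its size in bytes.
  def write(str)
    @targets.each { |io| io.write(str) }
    str.to_s.bytesize
  end

  def puts(str = "")
    write(str.to_s + "\n")
    nil
  end
end

# Hypothetical helper, assuming -F behaves as described above:
# with force_null, return a single handle on the null device
# (File::NULL is /dev/null on Unix); otherwise tee to the targets.
def pick_output(force_null, *targets)
  if force_null
    File.open(File::NULL, "w")
  else
    SimpleTee.new(*targets)
  end
end
```

With force_null set there is exactly one IO object in play, which is the case that crashes for me; without it, every write goes through the extra object and its targets.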
I ran this through valgrind and came up with an updated report for
just xml_parser4.
http://www.rubynet.org/bugs/valgrind-libxml.txt
http://www.rubynet.org/bugs/valgrind-libxml.txt.bz2
I don't know enough of gc.c's workings to speculate, but it seems to
think there are problems on quite a few lines in gc.c. I don't know
if that's because it couldn't follow Ruby's execution path or if it's
because there are legitimate bugs that it's picked up. Valgrind's
good enough to run against mozilla and KDE so I'd think it's pretty
decent at picking up errors and avoiding the traps, but I'm not an
aficionado of the tool so I don't know its shortcomings.
Sorry to make this difficult, I'm really trying not to be. Here,
let me make things easier: the following tarball should have
everything needed to reproduce this for both libxml and Net::GeoIP,
though I think running any reasonably large .so would trigger this
under rubytest -i 0.
http://www.rubynet.org/bugs/ruby-bug.tar.gz
Installation:
*) Download and untar ruby-bug.tar.gz
*) cd bug && setenv BUGDIR $PWD
*) setenv RUBYLIB $BUGDIR
*) setenv LD_LIBRARY_PATH $BUGDIR/lib
*) set path = ($BUGDIR/rubytest $path)
For Net::GeoIP:
*) Install libGeoIP:
   *) cd $BUGDIR/GeoIP-1.0.5
   *) ./configure --prefix=$BUGDIR
   *) make && make install
*) Build Net::GeoIP:
   *) cd $BUGDIR/net/geoip
   *) ruby extconf.rb --with-geoip-dir=$BUGDIR
   *) make
*) time rubytest -i 0 -F
For XML:
*) Install libxml2 2.2.4
*) cd $BUGDIR/xml/libxml
*) ruby extconf.rb
*) make
*) time rubytest -i 0 -F -n xml_parser4
I've been running the above test under valgrind for over 70min now and
haven't gotten it to crash. If I run it from the CLI, I can get it to
core in about 5min. For what it's worth, here are two runs of
xml_parser4 under different testing circumstances (changed flags to
rubytest):
http://www.rubynet.org/bugs/libxml-out1.txt
http://www.rubynet.org/bugs/libxml-out2.txt
Grr... same code, no updates, and I can't get it to crash under
valgrind. I can always get it to crash when it's not under valgrind
though. :-/ Maybe valgrind just slows it down that much, but it only
takes 2min of operation on the CLI before it cores, so I'd think that
it'd core in under 70min under valgrind.
Just for curiosity's sake, I ran the CLI test several times in a row
and it seems to be intermittent even between runs of rubytest. I
ctrl+c'ed the 1st run, and the 2nd and 3rd runs dumped almost
instantaneously.
$ time rubytest -i 0 -F -n xml_parser4
/usr/lib/ruby/site_ruby/1.7/test/unit/ui/util/observable.rb:93:in `channels': Interrupt
from /usr/lib/ruby/site_ruby/1.7/test/unit/ui/util/observable.rb:77:in `notify_listeners'
[snip stack trace: have yet to install a signal handler]
from /home/sean/bin/rubytest:404
506.432u 9.254s 8:49.59 97.3% 0+0k 0+0io 469pf+0w
$ time rubytest -i 0 -F -n xml_parser4
./tests/tc_xml_parser4.rb:7: [BUG] Segmentation fault
ruby 1.7.3 (2002-09-13) [i686-linux]
Abort
3.027u 0.031s 0:03.12 97.7% 0+0k 0+0io 467pf+0w
$ time rubytest -i 0 -F -n xml_parser4
/usr/lib/ruby/site_ruby/1.7/test/unit/assertions.rb:93: [BUG] Segmentation fault
ruby 1.7.3 (2002-09-13) [i686-linux]
Abort (core dumped)
11.556u 0.100s 0:11.96 97.4% 0+0k 0+0io 467pf+0w
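Since the crash is this intermittent, re-running by hand gets old fast. Something like the sketch below would rerun a command and record how each run ended and how long it took. None of this is part of rubytest; run_once and run_repeatedly are names invented here, and the rubytest command line in the comment is just the one used in the runs above.

```ruby
require "rbconfig"

# Run cmd (an argv array) once with output discarded, and report
# how it ended and how long it took.
def run_once(cmd)
  started = Time.now
  system(*cmd, out: File::NULL, err: File::NULL)
  status = $?
  outcome =
    if status.nil? then :not_started
    elsif status.signaled? then :signaled   # killed by a signal, e.g. an abort
    elsif status.success? then :ok
    else :failed                            # exited with a non-zero status
    end
  { outcome: outcome,
    signal: status && status.termsig,       # nil unless killed by a signal
    seconds: Time.now - started }
end

# Run the same command several times and collect the results.
def run_repeatedly(cmd, runs)
  (1..runs).map { run_once(cmd) }
end

# Example (assumes rubytest is on the path, as in the runs above):
#   run_repeatedly(%w[rubytest -i 0 -F -n xml_parser4], 3).each { |r| p r }
```

The runs above end in "Abort", so I'd expect the crashing runs to show up as :signaled with the abort signal rather than SIGSEGV itself, but I haven't checked that.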
I'll let valgrind run overnight tonight and see if I can't get it to
crash there. Odd, huh? Something's obscure
someplace and I'm not sure where to poke next. I was talking to JD
last night and he said he was able to core a real nasty DB test
script... its backtraces look almost identical and can be found here:
http://www.rubynet.org/bugs/postgresql-dump_jd.txt
It looks like it's all incarnations of the same beast and it only
comes out to play when you really push Ruby in tight loops. It's hard
for me to imagine that all three pieces of software are buggy or
that every box I'm testing on has a RAM issue that's coming up because
of these tests _and_ is dumping in similar areas... stranger things
have happened though. -sc
--
Sean Chittenden