Matz, if you're reading, please scan this email

I’ve found a problem with the Ruby interpreter, wherein the
interpreter seg faults. I have not found a way to reliably reproduce
this; like the Great White Shark, I’ve only seen it reproduce in the
wild. I hesitate, therefore, to submit a formal bug report.

However, I have noticed that certain situations can increase the
possibility of these segfaults occurring. I believe it has to do with
the actual internal translation of source to execution. Hmm. That’s
pretty obscure. Try this:

In cases where Ruby code re-defines the same method several times in a
row, it is more likely that Ruby will segfault at some obscure point
in the program. It also seems to be aggravated by threaded code.

With the same Ruby VM version, and the same version of an application
running on two different machines, one may suddenly exhibit the
problem, and the other won’t. And then the one that was having the
problem may not have the problem the next morning.

I know how frustrating it is to have to track down no-see-ums. Is
this something you’re aware of? Have observed?

I’m curious, because I get occasional bug reports about seg-faults
that I can’t reproduce, and I don’t know what to tell users.

— SER

Hi,

In message “Matz, if you’re reading, please scan this email” on 02/09/18, Sean Russell ser@germane-software.com writes:

I’ve found a problem with the Ruby interpreter, wherein the
interpreter seg faults. I have not found a way to reliably reproduce
this; like the Great White Shark, I’ve only seen it reproduce in the
wild. I hesitate, therefore, to submit a formal bug report.

However, I have noticed that certain situations can increase the
possibility of these segfaults occurring. I believe it has to do with
the actual internal translation of source to execution. Hmm. That’s
pretty obscure. Try this:

In cases where Ruby code re-defines the same method several times in a
row, it is more likely that Ruby will segfault at some obscure point
in the program. It also seems to be aggravated by threaded code.

I will examine. Could you show us your ruby -v output?

						matz.

A really good way of trapping such bugs is valgrind. If you run your test
case under valgrind I bet you will spot some illicit behaviour.

http://developer.kde.org/~sewardj/

On Wed, 18 Sep 2002, Sean Russell wrote:

I’ve found a problem with the Ruby interpreter, wherein the
interpreter seg faults. I have not found a way to reliably reproduce
this; like the Great White Shark, I’ve only seen it reproduce in the
wild. I hesitate, therefore, to submit a formal bug report.

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

Good Ideas:
Ruby - http://www.ruby-lang-org - The best of perl,python,scheme without the pain.
Valgrind - http://developer.kde.org/~sewardj/ - memory debugger for x86-GNU/Linux
Free your books - http://www.bookcrossing.com

I think that, faced with that sort of code, my brain would segfault before Ruby
does.

Sorry :)

Gavin

----- Original Message -----
From: “Sean Russell” ser@germane-software.com

In cases where Ruby code re-defines the same method several times in a
row, it is more likely that Ruby will segfault at some obscure point
in the program.

I’ve found a problem with the Ruby interpreter, wherein the
interpreter seg faults. I have not found a way to reliably
reproduce this; like the Great White Shark, I’ve only seen it
reproduce in the wild. I hesitate, therefore, to submit a formal
bug report.

However, I have noticed that certain situations can increase the
possibility of these segfaults occurring. I believe it has to do
with the actual internal translation of source to execution. Hmm.
That’s pretty obscure. Try this:

In cases where Ruby code re-defines the same method several times in
a row, it is more likely that Ruby will segfault at some obscure
point in the program. It also seems to be aggravated by threaded
code.

With the same Ruby VM version, and the same version of an
application running on two different machines, one may suddenly
exhibit the problem, and the other won’t. And then the one that was
having the problem may not have the problem the next morning.

I know how frustrating it is to have to track down no-see-ums. Is
this something you’re aware of? Have observed?

I’m curious, because I get occasional bug reports about seg-faults
that I can’t reproduce, and I don’t know what to tell users.

In the unit tests for libxml, I think I’ve pushed things to SEGV land
with the do_hash() macro during cleanup when it can’t determine the
classname of an object. Does this seem plausible? I’m not sure if
I’m running across an instance of libxml treading on Ruby’s memory or
what the problem is, but, all of a sudden, I’m getting SEGVs in
my unit tests. I’m hoping that backing down the number of unit
tests will make the problem disappear, but no promises. Any chance
you can get a core file from your users? And do you know if this is
something that’s happening in cleanup or the GC of old objects? -sc


Sean Chittenden

In the unit tests for libxml, I think I've pushed things to SEGV land
with the do_hash() macro during cleanup when it can't determine the
classname of an object. Does this seem plausible?

Can you send me, in private email, the tests which crash ruby?

I have ruby-libxml-20020919

Guy Decoux

> In the unit tests for libxml, I think I've pushed things to SEGV land
> with the do_hash() macro during cleanup when it can't determine the
> classname of an object. Does this seem plausible?

Can you send me, in private email, the tests which crash ruby?

I have ruby-libxml-20020919

Just made a fresh snapshot for 'ya:

bzip2: http://www.rubynet.org/modules/xml/ruby-libxml/ruby-libxml-0.03-snapshot-20020926.tar.bz2
bzip2 MD5: http://www.rubynet.org/modules/xml/ruby-libxml/ruby-libxml-0.03-snapshot-20020926.tar.bz2.md5
gzip: http://www.rubynet.org/modules/xml/ruby-libxml/ruby-libxml-0.03-snapshot-20020926.tar.gz
gzip MD5: http://www.rubynet.org/modules/xml/ruby-libxml/ruby-libxml-0.03-snapshot-20020926.tar.gz.md5

Either of the following will crash the interpreter:

./test/tests.rb -i 0
./test/tests.rb -i 0 -n xml_parser5

xml_parser[2-4] should be stable, I had these running overnight
without any problems. Thoughts? -sc

--
Sean Chittenden

./test/tests.rb -i 0

Well, if I'm right the problem is in ruby_xml_parser_io_set()

  rxp->ctxt = ruby_xml_parser_context_new3();
  data = (rx_io_data *)rxp->data;
  data->io = io;
  GetOpenFile(io, fptr);
  rb_io_check_readable(fptr);
  f = GetWriteFile(fptr);

  Data_Get_Struct(rxp->ctxt, ruby_xml_parser_context, rxpc);
  rxpc->ctxt = xmlCreateIOParserCtxt(NULL, NULL,
                                     (xmlInputReadCallback) ctxtRead,
                                     (xmlInputCloseCallback) ctxtClose,
                                     f, XML_CHAR_ENCODING_NONE);

When the object is garbage collected, ruby calls

void ruby_xml_parser_context_free(ruby_xml_parser_context *rxpc) {
  if (rxpc->ctxt != NULL && rxpc->copy == RUBY_LIBXML_ORIG) {
    xmlFreeParserCtxt(rxpc->ctxt);
    ruby_xml_parser_count--;
  }

  if (ruby_xml_parser_count == 0)
    xmlCleanupParser();

  free(rxpc);
}

xmlFreeParserCtxt() calls ctxtClose(), but the original object (data->io)
was previously closed by ruby: either explicitly, or because the garbage
collector found it before it tried to release rxp->ctxt.

The segfault is just because you call close() on an already closed IO.

Guy Decoux
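
Guy's diagnosis is that two owners end up calling fclose() on the same FILE *: Ruby's own finalizer and the extension's ctxtClose(). As a rough sketch only, here is one way to make a second close harmless in plain C. The shared_stream type and shared_stream_close() are hypothetical names invented for this illustration; they are not part of ruby-libxml or of Ruby, and the sketch assumes both parties go through the same wrapper, which the code quoted above does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical wrapper (not from ruby-libxml): one FILE * that two
 * parties may both try to close, plus a flag recording whether it is
 * still open. */
typedef struct {
    FILE *fp;
    bool  open;
} shared_stream;

/* Close the stream at most once.
 * Returns 1 if this call closed it, 0 if it was already closed. */
static int shared_stream_close(shared_stream *s)
{
    assert(s != NULL);
    if (!s->open)
        return 0;        /* already closed: do not call fclose() again */
    s->open = false;
    fclose(s->fp);
    s->fp = NULL;        /* never keep a dangling FILE * around */
    return 1;
}
```

Calling shared_stream_close() twice on the same wrapper closes the stream once and returns 0 the second time, instead of handing a stale FILE * to fclose().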

> ./test/tests.rb -i 0

Well, if I'm right the problem is in ruby_xml_parser_io_set()

  rxp->ctxt = ruby_xml_parser_context_new3();
  data = (rx_io_data *)rxp->data;
  data->io = io;
  GetOpenFile(io, fptr);
  rb_io_check_readable(fptr);
  f = GetWriteFile(fptr);

  Data_Get_Struct(rxp->ctxt, ruby_xml_parser_context, rxpc);
  rxpc->ctxt = xmlCreateIOParserCtxt(NULL, NULL,
                                     (xmlInputReadCallback) ctxtRead,
                                     (xmlInputCloseCallback) ctxtClose,
                                     f, XML_CHAR_ENCODING_NONE);

When the object is garbage collected, ruby calls

void ruby_xml_parser_context_free(ruby_xml_parser_context *rxpc) {
  if (rxpc->ctxt != NULL && rxpc->copy == RUBY_LIBXML_ORIG) {
    xmlFreeParserCtxt(rxpc->ctxt);
    ruby_xml_parser_count--;
  }

  if (ruby_xml_parser_count == 0)
    xmlCleanupParser();

  free(rxpc);
}

xmlFreeParserCtxt() calls ctxtClose(), but the original object (data->io)
was previously closed by ruby: either explicitly, or because the garbage
collector found it before it tried to release rxp->ctxt.

The segfault is just because you call close() on an already closed IO.

:-/ You could be right, but, the IO context is created when reading
data from RAM, not from an actual IO socket. I've created a simpler
use case:

http://lists.ruby-support.com/pipermail/ruby-developers/2002-September/000044.html

http://lists.ruby-support.com/pipermail/ruby-developers/2002-September/000045.html

That's a much simpler example. -sc

--
Sean Chittenden

:-/ You could be right, but, the IO context is created when reading
data from RAM, not from an actual IO socket.

Here is the example

pigeon% gdb ruby
GNU gdb 2002-04-01-cvs
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux"...
(gdb) b dln_load
Breakpoint 1 at 0x80b576b: file dln.c, line 1282.
(gdb) r tests.rb
Starting program: /usr/local/bin/ruby tests.rb
Running all tests...

Breakpoint 1, dln_load (file=0x811b340 "../libxml.so") at dln.c:1282
1282 if ((handle = (void*)dlopen(file, RTLD_LAZY|RTLD_GLOBAL)) == NULL) {
(gdb) n
1287 if ((init_fct = (void(*)())dlsym(handle, buf)) == NULL) {
(gdb) n
1293 (*init_fct)();
(gdb) b ctxtClose
Breakpoint 2 at 0x401c5c9e: file ruby_xml_parser.c, line 15.
(gdb) b fptr_finalize
Breakpoint 3 at 0x806c982: file io.c, line 1009.
(gdb) c
Continuing.

Breakpoint 3, fptr_finalize (fptr=0x81240c0) at io.c:1009
1009 if (fptr->f) {
(gdb) cond 3 fptr->f == 0x81f9558
(gdb) c
Continuing.
Loaded suite libxml
Started...
.................
Breakpoint 3, fptr_finalize (fptr=0x81f9538) at io.c:1009
1009 if (fptr->f) {
(gdb) n
1010 fclose(fptr->f);
(gdb) p fptr->f
$1 = (FILE *) 0x81f9558
(gdb) c
Continuing.
...........................
Breakpoint 2, ctxtClose (f=0x81f9558) at ruby_xml_parser.c:15
15 if (f != stdin) {
(gdb) n
16 fclose(f);
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
0x400d93b9 in _IO_file_close_it () from /lib/libc.so.6
(gdb)

Guy Decoux

> :-/ You could be right, but, the IO context is created when reading
> data from RAM, not from an actual IO socket.

Here is the example

Good catch, I fixed this in the CVS version, however this is a
different problem than the one I am having. Use test case xml_parser4
as it doesn't touch any of the IO routines (though if you update and
try again, you should get the same bug). Check out the other post
about GeoIP crashing. The stack traces look identical for both
Net::GeoIP and for libxml. What's odd is that under normal loads,
Net::GeoIP never destructs, however if you really push it and toss
things into a tight loop, it combusts and leaves a nice core. Using
Net::GeoIP, reproducing the bug is much easier and smaller than
tracking stuff through libxml, it just takes longer with Net::GeoIP.
I can reproduce this on 1.6 and 1.7.... in both cases, I have to
really push Ruby to have this show up. Actually, to get this to show
up, I have to add the /etc/hosts entries otherwise the DNS is a
sufficient bottleneck (even on a 100Mbps network with the DNS cache
3ft from the server that's doing this test).

http://lists.ruby-support.com/pipermail/ruby-developers/2002-September/000044.html
http://lists.ruby-support.com/pipermail/ruby-developers/2002-September/000045.html

-sc

#0 0x28197e83 in kill () from /usr/lib/libc.so.5
#0 0x28197e83 in kill () from /usr/lib/libc.so.5
#1 0x281ea8ae in abort () from /usr/lib/libc.so.5
#2 0x280831d9 in rb_bug () at error.c:179
#3 0x280e2c12 in sigbus () at signal.c:402
#4 <signal handler called>
#5 0x2808ab51 in rb_eval (self=135302616, n=0x0) at ruby.h:618
#6 0x2808a95f in rb_eval (self=135302616, n=0x0) at eval.c:2725
#7 0x280896d8 in rb_eval (self=135302616, n=0x0) at eval.c:2303
#8 0x2809029a in rb_call0 (klass=135569056, recv=135302616, id=11361, oid=0, argc=0, argv=0x8343274,
    body=0x814acb0, nosuper=0) at eval.c:4640
#9 0x280965cc in method_call (argc=1, argv=0x8343270, method=135299216) at eval.c:6902
#10 0x28096e5c in bmcall (args=89546, method=0) at eval.c:7043
#11 0x2808dd9a in rb_yield_0 (val=134529664, self=134724204, klass=0, pcall=2) at eval.c:3802
#12 0x2809532b in proc_invoke (proc=135298656, args=134529424, pcall=2, self=6) at ruby.h:613
#13 0x280954b8 in proc_call (proc=0, args=0) at eval.c:6544
#14 0x2808f68c in call_cfunc (func=0x28095490 <proc_call>, recv=135298656, len=89546, argc=5, argv=0xbfbfb678)
    at eval.c:4386
#15 0x2808fdbd in rb_call0 (klass=134674044, recv=135298656, id=5665, oid=0, argc=1, argv=0xbfbfb678,
    body=0x806f618, nosuper=1) at eval.c:4514

--
Sean Chittenden

Good catch, I fixed this in the CVS version, however this is a
different problem than the one I am having. Use test case xml_parser4
as it doesn't touch any of the IO routines (though if you update and
try again, you should get the same bug). Check out the other post

pigeon% tests.rb -n xml_parser4
Running tests xml_parser4
Loaded suite libxml
Started...
..
Finished in 0.007966 seconds.
1 runs, 13 assertions, 0 failures, 0 errors
pigeon%

Guy Decoux

> Good catch, I fixed this in the CVS version, however this is a
> different problem than the one I am having. Use test case xml_parser4
> as it doesn't touch any of the IO routines (though if you update and
> try again, you should get the same bug). Check out the other post

pigeon% tests.rb -n xml_parser4
Running tests xml_parser4
Loaded suite libxml
Started...
..
Finished in 0.007966 seconds.
1 runs, 13 assertions, 0 failures, 0 errors
pigeon%

Yup, that's right. Now add -i 0 to the argument list and watch it
core after a few min. This is a bug that comes out when you stress
test a module. -sc

--
Sean Chittenden

Yup, that's right. Now add -i 0 to the argument list and watch it
core after a few min. This is a bug that comes out when you stress
test a module. -sc

I can't reproduce, sorry

pigeon% time tests.rb -i 0 -n xml_parser4 >& /dev/null

real 40m55.139s
user 40m46.740s
sys 0m10.530s
pigeon%

Guy Decoux

[ I've had this email open for two days straight now, I should probably
  fire something off though, sorry for this being long, I tried to
  make this as detailed as possible. ]

> Yup, that's right. Now add -i 0 to the argument list and watch it
> core after a few min. This is a bug that comes out when you stress
> test a module. -sc

I can't reproduce, sorry

pigeon% time tests.rb -i 0 -n xml_parser4 >& /dev/null

real 40m55.139s
user 40m46.740s
sys 0m10.530s
pigeon%

Interesting. When I first woke up this morning (day you sent this)
and read your email, I couldn't reproduce it either. Since then, I've
significantly improved rubytest and have increased rubytest's memory
footprint and can now get it to crash with xml_parser4. I think the
reason that xml_parser4 wasn't crashing was because it was under some
memory threshold (sound plausible? I have some evidence to back this
up below). I just re-ran the tests with an updated version of
rubytest and am getting the following now:

$ time rubytest -F -i 0 -n xml_parser4
./tests/tc_xml_parser4.rb:12: [BUG] Segmentation fault
ruby 1.7.3 (2002-09-27) [i386-freebsd5]
Abort (core dumped)
68.013u 0.509s 1:23.42 82.1% 5+3350k 0+51io 0pf+0w

What's really strange is that if I let this send data to stdout, I
don't have any problems.

$ time rubytest -i 0 -n xml_parser4
[runs forever]

All -F does is sidestep using IO::Tee and opens /dev/null to use as
the IO handle for the unit tests. As an interesting test, I used -F
with a delay of 0.1 sec and noticed it still crashing. I'm not sure
why using IO::Tee would prevent xml_parser4 from crashing while
working around it and using only a single file handle would cause this
to crash. My best guess would be that the memory section that gets
stomped on with -F has been replaced by some fluff from IO::Tee and I'm
just getting lucky and not reading from the corrupted address
space(s).

I ran this through valgrind and came up with an updated report for
just xml_parser4.

http://www.rubynet.org/bugs/valgrind-libxml.txt
http://www.rubynet.org/bugs/valgrind-libxml.txt.bz2

I don't know enough of gc.c's workings to speculate, but it seems to
think there are problems on quite a few lines in gc.c. I don't know
if that's because it couldn't follow Ruby's execution path or if it's
because there are legitimate bugs that it's picked up. Valgrind's
good enough to run against mozilla and KDE so I'd think it's pretty
decent at picking up errors and avoiding the traps, but I'm not an
aficionado of the tool so I don't know its shortcomings.

Sorry to make this difficult, I'm really trying not to be. :) Here,
let me make things easier, the following tarball should have
everything needed to reproduce this for both libxml and Net::GeoIP,
though I think running any reasonably large .so would trigger this
under rubytest -i0.

http://www.rubynet.org/bugs/ruby-bug.tar.gz

Installation:

*) Download and untar ruby-bug.tar.gz
*) cd bug && setenv BUGDIR $PWD
*) setenv RUBYLIB $BUGDIR
*) setenv LD_LIBRARY_PATH $BUGDIR/lib
*) set path = ($BUGDIR/rubytest $path)

For Net::GeoIP:
*) Install libGeoIP:
   *) cd $BUGDIR/GeoIP-1.0.5
   *) ./configure --prefix=$BUGDIR
   *) make && make install
*) build Net::GeoIP
   *) cd $BUGDIR/net/geoip
   *) ruby extconf.rb --with-geoip-dir=$BUGDIR
   *) make
   *) time rubytest -i 0 -F

For XML:
*) Install libxml2 2.2.4
*) cd $BUGDIR/xml/libxml
*) ruby extconf.rb
*) make
*) time rubytest -i 0 -F -n xml_parser4

I've been running the above test under valgrind for over 70min now and
haven't gotten it to crash. If I run it from the CLI, I can get it to
core in about 5min. For what it's worth, here are two runs of
xml_parser4 under different testing circumstances (changed flags to
rubytest):

http://www.rubynet.org/bugs/libxml-out1.txt
http://www.rubynet.org/bugs/libxml-out2.txt

Grr... same code, no updates, and I can't get it to crash under
valgrind. I can always get it to crash when it's not under valgrind
though. :-/ Maybe valgrind just slows it down that much, but it only
takes 2min of operation on the CLI before it cores, I'd think that
it'd core in under 70min under valgrind. :) Just for curiosity's sake,
I ran the CLI test several times in a row and it seems to be
intermittent even between runs of rubytest. I ctrl+c'ed the 1st run,
and the 2nd and 3rd run dumped almost instantaneously.

$ time rubytest -i 0 -F -n xml_parser4
/usr/lib/ruby/site_ruby/1.7/test/unit/ui/util/observable.rb:93:in `channels': Interrupt
        from /usr/lib/ruby/site_ruby/1.7/test/unit/ui/util/observable.rb:77:in `notify_listeners'
[snip stack trace: have yet to install a signal handler]
        from /home/sean/bin/rubytest:404
506.432u 9.254s 8:49.59 97.3% 0+0k 0+0io 469pf+0w
$ time rubytest -i 0 -F -n xml_parser4
./tests/tc_xml_parser4.rb:7: [BUG] Segmentation fault
ruby 1.7.3 (2002-09-13) [i686-linux]
Abort
3.027u 0.031s 0:03.12 97.7% 0+0k 0+0io 467pf+0w
$ time rubytest -i 0 -F -n xml_parser4
/usr/lib/ruby/site_ruby/1.7/test/unit/assertions.rb:93: [BUG] Segmentation fault
ruby 1.7.3 (2002-09-13) [i686-linux]
Abort (core dumped)
11.556u 0.100s 0:11.96 97.4% 0+0k 0+0io 467pf+0w

I'll let valgrind run overnight tonight and see if I can get it to
crash under valgrind. Odd, huh? Something's obscure
someplace and I'm not sure where to poke next. I was talking to JD
last night and he said he was able to core a real nasty DB test
script... its backtraces look almost identical and can be found here:

http://www.rubynet.org/bugs/postgresql-dump_jd.txt

It looks like it's all incarnations of the same beast and it only
comes out to play when you really push Ruby in tight loops. It's hard
for me to imagine that all three pieces of software are buggy or
that every box I'm testing on has a RAM issue that's coming up because
of these tests _and_ is dumping in similar areas... stranger things
have happened though. -sc

--
Sean Chittenden

Hi,

In message “Re: ruby bug in tight loops? (was: Re: Matz, if you’re reading, please scan this email)” on 02/10/01, Sean Chittenden sean@chittenden.org writes:

I can’t reproduce, sorry

$ time rubytest -F -i 0 -n xml_parser4
./tests/tc_xml_parser4.rb:12: [BUG] Segmentation fault
ruby 1.7.3 (2002-09-27) [i386-freebsd5]
Abort (core dumped)
68.013u 0.509s 1:23.42 82.1% 5+3350k 0+51io 0pf+0w

Unfortunately this runs forever on my machine without dumping core.
No clue.

						matz.

p.s.
Some reports from valgrind are due to Ruby’s conservative GC, which
touches the whole C stack region.
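
Matz's point can be illustrated with a toy scan. A conservative collector cannot tell which stack words are pointers, so it reads every word in the stack region and treats any value that falls inside the object heap as a possible reference. Some of those words were never initialized by the program, which is the kind of access valgrind flags. The sketch below is plain C with made-up names and address ranges; it is not Ruby's actual marking code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy conservative scan (hypothetical, not from gc.c): count the words in
 * words[0..n) whose value lies inside the heap range [heap_lo, heap_hi). */
static size_t count_possible_pointers(const uintptr_t *words, size_t n,
                                      uintptr_t heap_lo, uintptr_t heap_hi)
{
    size_t hits = 0;
    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Every word is read and compared, whether or not the program
         * ever stored a meaningful value there. */
        if (words[i] >= heap_lo && words[i] < heap_hi)
            hits++;
    }
    return hits;
}
```

A real collector would mark the object behind each hit rather than count it. The price of this approach is that an integer which happens to look like a heap address keeps an object alive, and that tools watching for uninitialized reads will complain about the scan itself.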

$ time rubytest -F -i 0 -n xml_parser4
./tests/tc_xml_parser4.rb:12: [BUG] Segmentation fault
ruby 1.7.3 (2002-09-27) [i386-freebsd5]
Abort (core dumped)

Not really convinced that this is a bug in ruby.

All the core dumps that I've seen actually seem to be related to one
of your objects, for example

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0809fefe in st_lookup (table=0x2861, key=0x1229 <Address 0x1229 out of bounds>,
    value=0xbfff6488) at st.c:253
253 hash_val = do_hash(key, table);
(gdb) up
#1 0x080aa62a in classname (klass=1075359772) at variable.c:153
153 if (ROBJECT(klass)->iv_tbl &&
(gdb) up
#2 0x080aa76a in rb_class_path (klass=1075359772) at variable.c:189
189 VALUE path = classname(klass);
(gdb) up
#3 0x080aaa51 in rb_class2name (klass=1075359772) at variable.c:288
288 return RSTRING(rb_class_path(klass))->ptr;
(gdb) up
#4 0x080b65d9 in rb_check_type (x=1075359792, t=34) at error.c:237
237 etype = rb_class2name(CLASS_OF(x));
(gdb) up
#5 0x40228c13 in ruby_xml_parser_str_set (self=1075361212, str=1075359812)
    at ruby_xml_parser.c:716
716 Data_Get_Struct(rxp->ctxt, ruby_xml_parser_context, rxpc);
(gdb)

Manifestly ruby sometimes tries to work with a GC'ed object, but why ???

3.027u 0.031s 0:03.12 97.7% 0+0k 0+0io 467pf+0w
$ time rubytest -i 0 -F -n xml_parser4
/usr/lib/ruby/site_ruby/1.7/test/unit/assertions.rb:93: [BUG] Segmentation fault
ruby 1.7.3 (2002-09-13) [i686-linux]
Abort (core dumped)

Same here; this is the code in question:

  _wrap_assertion {
     assert_equal(Class, klass.type, "assert_instance_of takes a Class as its first argument")
     full_message = build_message(message, object, klass, object.type) {
        |arg1, arg2, arg3|

It is the call to object#type which gives the coredump

Guy Decoux

$ time rubytest -F -i 0 -n xml_parser4
./tests/tc_xml_parser4.rb:12: [BUG] Segmentation fault
ruby 1.7.3 (2002-09-27) [i386-freebsd5]
Abort (core dumped)

Sorry to say it, but you have a bug in your extension.

Can you explain this to me ?

VALUE ruby_xml_parser_new(VALUE class) {
  ruby_xml_parser *rxp;

  ruby_xml_parser_count++;
  rxp = ALLOC(ruby_xml_parser);
  rxp->ctxt = Qnil;
  rxp->data_type = RUBY_LIBXML_SRC_TYPE_NULL;
  rxp->data = NULL;
  rxp->parsed = 0;

  return(Data_Wrap_Struct(class, 0, ruby_xml_parser_free, rxp));
}

Guy Decoux
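
One possible reading of Guy's question (he does not spell out the answer in this message): the second argument to Data_Wrap_Struct() is the mark function, and here it is 0, while rxp->ctxt holds a Ruby VALUE. With no mark function, the collector is never told that the parser still references its context object, so the context may be collected while the parser is alive. That would fit the backtrace earlier in the thread, which dies inside Data_Get_Struct(rxp->ctxt, ...). The toy collector below is a sketch of that effect in plain C. It is not Ruby's gc.c, and every toy_* name is made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: each object has a mark bit, an optional mark callback and
 * one slot that may reference another object. */
typedef struct toy_obj toy_obj;
typedef void (*toy_mark_fn)(toy_obj *self);

struct toy_obj {
    bool        marked;
    bool        alive;
    toy_mark_fn mark;   /* NULL plays the role of passing 0 as mark function */
    toy_obj    *child;  /* plays the role of rxp->ctxt */
};

/* Mark one object, then whatever its callback reports as reachable. */
static void toy_gc_mark(toy_obj *o)
{
    if (o == NULL || o->marked)
        return;
    o->marked = true;
    if (o->mark != NULL)
        o->mark(o);
}

/* A mark callback that tells the collector about the child slot. */
static void toy_mark_child(toy_obj *self)
{
    toy_gc_mark(self->child);
}

/* Mark from the roots, then "free" every object left unmarked. */
static void toy_gc_run(toy_obj **roots, size_t nroots,
                       toy_obj **heap, size_t nheap)
{
    assert(roots != NULL && heap != NULL);
    for (size_t i = 0; i < nheap; i++)
        heap[i]->marked = false;
    for (size_t i = 0; i < nroots; i++)
        toy_gc_mark(roots[i]);
    for (size_t i = 0; i < nheap; i++)
        if (!heap[i]->marked)
            heap[i]->alive = false;  /* a real collector would reuse the slot */
}
```

With mark left NULL, a collection that starts from the parent frees the child even though the parent still points at it; once toy_mark_child is installed, the child survives. In a real extension the analogous callback would call rb_gc_mark() on rxp->ctxt.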

In article 1033445508.174360.31838.nullmailer@picachu.netlab.jp,
matz@ruby-lang.org (Yukihiro Matsumoto) writes:

Some reports from valgrind are due to Ruby’s conservative GC, which
touches the whole C stack region.

I use following suppression file to suppress such reports.

{
memcpy/rb_thread_save_context(Value1)
Addr1
fun:memcpy
fun:rb_thread_save_context
}

{
memcpy/rb_thread_restore_context(Value1)
Addr1
fun:memcpy
fun:rb_thread_restore_context
}

{
strchr/_dl_catch_error(Cond)
Cond
fun:strchr
obj:/lib/libc-2.2.5.so
fun:_dl_catch_error
}

{
mark_locations_array(Cond)
Cond
fun:mark_locations_array
}

{
mark_locations_array(Value4)
Value4
fun:mark_locations_array
}

{
mark_locations_array(Value4)
Addr4
fun:mark_locations_array
}

{
rb_gc_mark(Cond)
Cond
fun:rb_gc_mark
}

{
rb_gc_mark(Value4)
Value4
fun:rb_gc_mark
}

{
rb_gc_mark_children(Value4)
Value4
fun:rb_gc_mark_children
}

{
rb_gc_mark_children(Cond)
Cond
fun:rb_gc_mark_children
}


Tanaka Akira

I can’t reproduce, sorry

$ time rubytest -F -i 0 -n xml_parser4
./tests/tc_xml_parser4.rb:12: [BUG] Segmentation fault
ruby 1.7.3 (2002-09-27) [i386-freebsd5]
Abort (core dumped)
68.013u 0.509s 1:23.42 82.1% 5+3350k 0+51io 0pf+0w

Unfortunately this runs forever on my machine without dumping core.
No clue.

I was on a gentoo system and it seemed to core about 50% of the time
on the linux box. :( Did you try running the command a few times? I
know it sounds hokey, but this really made a difference. Are you
running with a stripped ruby interpreter? From watching this crop up
and testing this on several systems, it looks like something that
crops up under memory constraints. xml_parser4 is the least demanding
of all of the libxml tests… I’m not convinced that if you just
run ‘rubytest -F -i 0’ that the core is a result of a ruby bug vs a
libxml bug, but there’s clearly something going on someplace and I’m
not sure where it’s happening. The current core dump I’m getting
right now, however, leads me to think that it’s a ruby
problem… though I’m not 100% sure of that.

$ rubytest -i 0

p.s.
Some reports from valgrind are due to Ruby’s conservative GC, which
touches the whole C stack region.

That’s what I figured. Ruby’s internals are far from simple so it
doesn’t surprise me that valgrind choked or reported false positives.

-sc


Sean Chittenden