Segmentation fault, proc, eval, long string

Hi,

I'm getting a 'Segmentation fault' in ruby 1.8.5 running on debian in a Xen VPS. The same code running on OS X and a different version of linux has no problems.

The process to get this is maybe a little strange.

1) read a large file into a string (1.3MB)
2) eval the string (the string is a single ruby proc definition that when called will build an object structure in memory)
3) call the proc --> Segmentation fault *very* soon after
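
For readers who want to follow along, the three steps above can be sketched roughly as follows. This is a minimal illustration in current Ruby syntax, not the original application: the `DEFINITION` string, `build_from_file` and `demo_build` are hypothetical names, and the tiny proc stands in for the 1.3MB generated file.

```ruby
require "tempfile"

# Hypothetical stand-in for the large generated file: a single proc
# definition that builds a small object structure when called.
DEFINITION = <<~SRC
  Proc.new {
    thing = []
    thing << [1]
    thing << [2]
    thing
  }
SRC

def build_from_file(path)
  source  = File.read(path)                       # 1) read the file into a string
  builder = eval(source, TOPLEVEL_BINDING, path)  # 2) eval the string, which yields a Proc
  builder.call                                    # 3) call the proc
end

# Writes the stand-in definition to a temporary file and runs the three steps.
def demo_build
  Tempfile.create("definition") do |file|
    file.write(DEFINITION)
    file.flush
    build_from_file(file.path)
  end
end

p demo_build
```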

The file was generated by the same program, but running on a different machine: in this case, the other linux box I mentioned above.

Knowing full well that there can be all kinds of differences between the linuxes, I'll claim that the only interesting difference I can find is in the architecture reported by ruby --version: the machine that works reports i686-linux, the machine that doesn't reports i386-linux. So I rebuilt a version that was also i686 and, of course, this made no difference. All that means is that I can't find the truly interesting difference.

If I edit the file that the string is read from and remove about 6000 assignments of one particular type of object (the objects are still created, just not assigned), then the problem disappears. There's nothing special about the objects I picked; it was just easy to use regular expressions to identify them and get rid of their assignments.

If I run ruby under gdb there is a SIGSEGV at eval.c:2890, which is in the unknown_node function, but I can't get a more complete stack trace (until I figure out how to build ruby with the debug information not stripped out). Poking around manually, though, method_call calls rb_call0, which calls unknown_node, so I'm betting on this. And so? Well, maybe the eval of the string produced an invalid proc object? What's the cause of this? Too long a string? Too many objects in the eval? Too big a proc object? But why work on one linux box and fail on the other?

I'm wondering if anyone has seen anything like this before, or has any experience debugging this kind of thing? Any suggestions are very much appreciated.

Thanks,
Bob

···

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>

Hrm. This looks similar to the problem reported here:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/80435

···

On 11/30/06, Bob Hutchison <hutch@recursive.ca> wrote:

Hi,

I'm getting a 'Segmentation fault' in ruby 1.8.5 running on debian in
a Xen VPS. The same code running on OS X and a different version of
linux has no problems.

The process to get this is maybe a little strange.

1) read a large file into a string (1.3MB)
2) eval the string (the string is a single ruby proc definition that
when called will build an object structure in memory)
3) call the proc --> Segmentation fault *very* soon after

A little more on this...

So I put some printf into the eval.c file and it turns out that rb_eval is called recursively 5301 times before seg faulting, while trying to handle a NODE_DASGN_CURR node. There are no other eval node types being evaluated when this begins, every node is a NODE_DASGN_CURR.

There is nothing that is anywhere that deep in the script that I am evaluating. So it looks as though the proc object is corrupt??

So maybe this is reproducible?? Well, so it is. If I run this script:

module SomeModule
   def initialize
     @@proc = nil
   end

   def SomeModule.build
     if @@proc then
       result = @@proc.call
       @@proc = nil
       return result
     end
   end
end

N = 5000

the_string = ""

the_string << "module SomeModule\n"
the_string << " @@proc = Proc.new {\n"
the_string << " thing = []\n"

N.times do | i |
   the_string << " v#{i} = [#{i}]\n"
end

N.times do | i |
   the_string << " thing << v#{i}\n"
end

the_string << " thing\n"
the_string << " } #proc\n"
the_string << "end\n"

puts("the_string length: #{the_string.length}")
eval(the_string, nil, "ruby_definition", 0)
SomeModule.build

It will fail on the one linux box, run on the other, and run on OS X. With a little binary search, the smallest N that causes the segfault is 3024 (3023 works).
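
The "little binary search" for the smallest failing N could look something like the sketch below. The `smallest_failing` helper is an illustrative name, and the `crashes` lambda is only a stand-in for "run the script with N = n and see whether it dies"; the 3024 threshold is taken from the figure reported above, not measured by this code.

```ruby
# Returns the smallest n in low..high for which the block returns true.
# Assumes the block is false below some threshold, true from it upward,
# and true for high.
def smallest_failing(low, high)
  while low < high
    mid = (low + high) / 2
    if yield(mid)
      high = mid        # mid fails, so the threshold is at or below mid
    else
      low = mid + 1     # mid works, so the threshold is above mid
    end
  end
  low
end

# Hypothetical predicate: pretends every N from 3024 upward crashes.
crashes = lambda { |n| n >= 3024 }

puts smallest_failing(1, 5000, &crashes)
```

In a real run the predicate would have to launch the script in a child process for each N, since a segfault takes the whole interpreter down.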

Does this help?

···

On 30-Nov-06, at 10:36 AM, Bob Hutchison wrote:

I'm wondering if anyone has seen anything like this before, or has any experience debugging this kind of thing? Any suggestions are very much appreciated.

Thanks,
Bob

Hrm. This looks similar to the problem reported here:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/80435

Thanks for the link.

Could be, but that thread kind of petered out. There were some others that I found that didn't seem to resolve. There was one in Japanese that I certainly could not follow :)

Cheers,
Bob

···

On 30-Nov-06, at 12:09 PM, Wilson Bilkovich wrote:

On 11/30/06, Bob Hutchison <hutch@recursive.ca> wrote:

A little more on this...

So, to increase the strangeness a bit... if I run this from within vim (i.e. using ":!ruby crash.rb") it works for some pretty big Ns.

Sigh.

Cheers,
Bob

···

On 30-Nov-06, at 12:50 PM, Bob Hutchison wrote:

On 30-Nov-06, at 10:36 AM, Bob Hutchison wrote:

Segfaults for me on my Debian box with ruby 1.8.4 (2005-12-24) [i386-linux]

···

On 11/30/06, Bob Hutchison <hutch@recursive.ca> wrote:

Bob Hutchison wrote:

So I put some printf into the eval.c file and it turns out that rb_eval is called recursively 5301 times before seg faulting, while trying to handle a NODE_DASGN_CURR node. There are no other eval node types being evaluated when this begins, every node is a NODE_DASGN_CURR.

There is nothing that is anywhere that deep in the script that I am evaluating. So it looks as though the proc object is corrupt??

So maybe this is reproducible?? Well, so it is. If I run this script:

(...)

It will fail on the one linux box, run on the other, and run on OS X. With a little binary search, the smallest N that causes the segfault is 3024 (3023 works).

Bob, you can use parsetree to dump the AST of the generated proc. I'm sure you'll see the deep nesting of the nodes.

Regards,
Pit
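
A rough way to look at the tree on a current interpreter is sketched below. Note that this does not use the ParseTree gem mentioned above: it swaps in `RubyVM::AbstractSyntaxTree`, which is an assumption about the reader's setup (MRI 2.6 or later), and a modern parser will not necessarily show the deep nesting that 1.8 produced. `proc_source` and `ast_depth` are illustrative names.

```ruby
# Builds proc source in the style of the reproduction script, with n
# block-local variables.
def proc_source(n)
  lines = ["Proc.new {", "  thing = []"]
  n.times { |i| lines << "  v#{i} = [#{i}]" }
  n.times { |i| lines << "  thing << v#{i}" }
  lines << "  thing" << "}"
  lines.join("\n")
end

# Rough nesting depth: the longest chain of nodes reachable through direct
# node children (non-node children such as symbols are ignored).
def ast_depth(node)
  children = node.children.grep(RubyVM::AbstractSyntaxTree::Node)
  1 + (children.map { |child| ast_depth(child) }.max || 0)
end

ast = RubyVM::AbstractSyntaxTree.parse(proc_source(3))
pp ast
puts "depth: #{ast_depth(ast)}"
```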

Hi,

···

In message "Re: Segmentation fault, proc, eval, long string [Reproduced]" on Fri, 1 Dec 2006 02:50:30 +0900, Bob Hutchison <hutch@recursive.ca> writes:

There is nothing that is anywhere that deep in the script that I am
evaluating. So it looks as though the proc object is corrupt??

So maybe this is reproducible?? Well, so it is. If I run this script:

Thank you for the report. Your script helped. Could you check whether
the attached patch works for you?

              matz.

Index: parse.y

RCS file: /var/cvs/src/ruby/parse.y,v
retrieving revision 1.307.2.47
diff -p -u -1 -r1.307.2.47 parse.y
--- parse.y 2 Nov 2006 06:45:50 -0000 1.307.2.47
+++ parse.y 2 Dec 2006 15:57:23 -0000
@@ -4863,2 +4863,4 @@ gettable(id)

+static VALUE dyna_var_lookup _((ID id));
+
static NODE*
@@ -4891,3 +4893,3 @@ assignable(id, val)
   }
- else if (rb_dvar_defined(id)) {
+ else if (dyna_var_lookup(id)) {
       return NEW_DASGN(id, val);
@@ -5733,2 +5735,18 @@ top_local_setup()

+static VALUE
+dyna_var_lookup(id)
+ ID id;
+{
+ struct RVarmap *vars = ruby_dyna_vars;
+
+ while (vars) {
+ if (vars->id == id) {
+ vars->val = Qtrue;
+ return Qtrue;
+ }
+ vars = vars->next;
+ }
+ return Qfalse;
+}
+
static struct RVarmap*
@@ -5767,3 +5786,5 @@ dyna_init(node, pre)
     for (var = 0; post != pre && post->id; post = post->next) {
- var = NEW_DASGN_CURR(post->id, var);
+ if (RTEST(post->val)) {
+ var = NEW_DASGN_CURR(post->id, var);
+ }
     }
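
As far as can be read from the patch, the dyna_init loop wraps each variable's node around the previous one, so N block-local variables give a chain N levels deep, and the patched loop only adds a node for variables that were flagged. The toy model below illustrates that shape in plain Ruby; `ToyNode`, `build_chain` and `chain_depth` are made-up names and have nothing to do with Ruby's real node structures.

```ruby
# Toy stand-in for a NODE_DASGN_CURR node: a variable id plus the node it wraps.
ToyNode = Struct.new(:id, :inner)

# Mirrors the shape of the loop: each new node wraps the previous one.
# When `used` is given, ids not in it are skipped, like the patched loop
# skipping variables whose flag is not set.
def build_chain(ids, used = nil)
  ids.inject(nil) do |var, id|
    (used.nil? || used.include?(id)) ? ToyNode.new(id, var) : var
  end
end

# Walks the chain recursively, as a recursive evaluator would, and returns
# the recursion depth reached.
def chain_depth(node)
  node.nil? ? 0 : 1 + chain_depth(node.inner)
end

ids = (0...1000).map { |i| :"v#{i}" }
puts chain_depth(build_chain(ids))              # one level per variable
puts chain_depth(build_chain(ids, [:v0, :v1]))  # only the flagged variables
```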

Can you get a full stack trace from gdb or something?
I found a pile of other links by googling for 'unknown node type' that
seem to suggest that maybe some of your objects are getting
prematurely garbage collected.

Maybe the size of that method hits a Ruby threshold that triggers GC
inappropriately?

Try turning GC off; if that fixes it, that might help narrow it down.
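
One way to try that experiment is to wrap the suspect call so the collector is off only while it runs. This is a small sketch; `without_gc` is a hypothetical helper name, while `GC.disable` and `GC.enable` are the standard calls.

```ruby
# Runs the block with the garbage collector switched off, then restores
# the previous state.
def without_gc
  was_disabled = GC.disable   # returns true if GC was already disabled
  yield
ensure
  GC.enable unless was_disabled
end

# Stand-in workload; in the thread this would be the SomeModule.build call.
result = without_gc { Array.new(1000) { |i| [i] }.length }
puts result
```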

···

On 11/30/06, Bob Hutchison <hutch@recursive.ca> wrote:

Oh dear. In some ways I was hoping for something unique to my machine. Thanks a lot for trying this.

Cheers,
Bob

···

On 30-Nov-06, at 1:56 PM, Wilson Bilkovich wrote:

Segfaults for me on my Debian box with ruby 1.8.4 (2005-12-24) [i386-linux]

Bob Hutchison wrote:

So I put some printf into the eval.c file and it turns out that rb_eval is called recursively 5301 times before seg faulting, while trying to handle a NODE_DASGN_CURR node. There are no other eval node types being evaluated when this begins, every node is a NODE_DASGN_CURR.
There is nothing that is anywhere that deep in the script that I am evaluating. So it looks as though the proc object is corrupt??
So maybe this is reproducible?? Well, so it is. If I run this script:
(...)
It will fail on the one linux box, run on the other, and run on OS X. With a little binary search, the smallest N that causes the segfault is 3024 (3023 works).

Bob, you can use parsetree to dump the AST of the generated proc. I'm sure you'll see the deep nesting of the nodes.

Okay, I'll do that. But a Segmentation Fault? Surely there's a more polite way to deal with the problem.

Cheers,
Bob

···

On 1-Dec-06, at 7:31 AM, Pit Capitain wrote:

Regards,
Pit

I just tried this, and here's what it gave me (for a smaller N, so the
whole process doesn't crash. Should show the same structure no matter
what N is, though)

[[:module,
  :SomeModule,
  [:defn,
   :initialize,
   [:scope, [:block, [:args], [:cvasgn, :@@proc, [:nil]]]]],
  [:defn,
   :"self.build",
   [:scope,
    [:block,
     [:args],
     [:if,
      [:cvar, :@@proc],
      [:block,
       [:lasgn, :result, [:call, [:cvar, :@@proc], :call]],
       [:cvasgn, :@@proc, [:nil]],
       [:return, [:lvar, :result]]],
      nil]]]]]]

···

On 12/1/06, Pit Capitain <pit@capitain.de> wrote:

Bob, you can use parsetree to dump the AST of the generated proc. I'm sure you'll see the deep nesting of the nodes.

Oh, very interesting. Yes indeed it gets kinda deep there :) Thanks!

So the 'solution' is to make the stack bigger (ulimit -s 20000 works up to a much larger number). But this 'solution' does not make me very happy. First, the stack size is the same on all the machines that I'm using, so while this fixes the machine showing the problem, I am not entirely convinced. Second, all this does is increase the depth at which it will fail (and it still does fail). It seems more of a workaround than a solution.
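
To compare the stack limit across machines from inside Ruby rather than from the shell, something like the following may help. It is a sketch for Unix-like systems only; `stack_limits` and `format_limit` are illustrative names, and the values are in bytes, whereas ulimit -s in most shells works in units of 1024 bytes.

```ruby
# Soft and hard stack limits for the current process, in bytes.
def stack_limits
  soft, hard = Process.getrlimit(Process::RLIMIT_STACK)
  { soft: soft, hard: hard }
end

# Human-readable form of a limit value.
def format_limit(limit)
  limit == Process::RLIM_INFINITY ? "unlimited" : "#{limit / 1024} KB"
end

limits = stack_limits
puts "soft stack limit: #{format_limit(limits[:soft])}"
puts "hard stack limit: #{format_limit(limits[:hard])}"
```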

Any better ideas?

What is actually going on anyway? Is Ruby creating a closure or something for each new local variable introduced?

Cheers,
Bob

···

On 1-Dec-06, at 7:31 AM, Pit Capitain wrote:

Regards,
Pit

Hi,

Hi,

>There is nothing that is anywhere that deep in the script that I am
>evaluating. So it looks as though the proc object is corrupt??
>
>So maybe this is reproducible?? Well, so it is. If I run this script:

Thank you for the report. Your script helped. Could you check if the
attached patch work for you?

Thanks for the patch. I applied it and tried a few things with it. The test program I provided does work now, so I think this is solving the problem. On the other hand, when I try to run the application that needs it, debian is killing it and I can't see why. The installation I'm using, as it happens, doesn't have any VM configured and this has been causing some difficulty recently. Could this patch cause a lot of memory to be allocated rapidly? If not then there's something else going on that I'll have to look into (and it is likely a completely different problem).

Thanks everyone for your help.

Cheers,
Bob

···

On 2-Dec-06, at 10:59 AM, Yukihiro Matsumoto wrote:

In message "Re: Segmentation fault, proc, eval, long string [Reproduced]" on Fri, 1 Dec 2006 02:50:30 +0900, Bob Hutchison <hutch@recursive.ca> writes:

              matz.

Oh, and what happens when you freeze the string before eval'ing it?

···

On 11/30/06, Wilson Bilkovich <wilsonb@gmail.com> wrote:

On 11/30/06, Bob Hutchison <hutch@recursive.ca> wrote:
>
> On 30-Nov-06, at 12:09 PM, Wilson Bilkovich wrote:
>
> > On 11/30/06, Bob Hutchison <hutch@recursive.ca> wrote:
> >> Hi,
> >>
> >> I'm getting a 'Segmentation fault' in ruby 1.8.5 running on debian in
> >> a Xen VPS. The same code running on OS X and a different version of
> >> linux has no problems.
> >>
> >> The process to get this is maybe a little strange.
> >>
> >> 1) read a large file into a string (1.3MB)
> >> 2) eval the string (the string is a single ruby proc definition that
> >> when called will build an object structure in memory)
> >> 3) call the proc --> Segmentation fault *very* soon after
> >>
> >
> > Hrm. This looks similar to the problem reported here:
> > http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/80435
>
> Thanks for the link.
>
> Could be, but that thread kind of petered out. There were some others
> that I found that didn't seem to resolve. There was one in Japanese
> that I certainly could not follow :)
>

Can you get a full stack trace from gdb or something?
I found a pile of other links by googling for 'unknown node type' that
seem to suggest that maybe some of your objects are getting
prematurely garbage collected.

Maybe the size of that method hits a Ruby threshold that triggers GC
inappropriately?

Try turning GC off; if that fixes it, that might help narrow it down.

That didn't make any difference. Nice idea though.

···

On 30-Nov-06, at 1:18 PM, Wilson Bilkovich wrote:

Can you get a full stack trace from gdb or something?
I found a pile of other links by googling for 'unknown node type' that
seem to suggest that maybe some of your objects are getting
prematurely garbage collected.

Maybe the size of that method hits a Ruby threshold that triggers GC
inappropriately?

Try turning GC off; if that fixes it, that might help narrow it down.

Wilson Bilkovich wrote:

···

On 12/1/06, Pit Capitain <pit@capitain.de> wrote:

Bob, you can use parsetree to dump the AST of the generated proc. I'm
sure you'll see the deep nesting of the nodes.

I just tried this, and here's what it gave me (for a smaller N, so the
whole process doesn't crash. Should show the same structure no matter
what N is, though)

(...)

Wilson, this is not the dump of the generated proc. You have to pass the contents of @@proc to ParseTree.

Regards,
Pit

Bob Hutchison wrote:

What is actually going on anyway? Is Ruby creating a closure or something for each new local variable introduced?

Bob, the whole proc is a closure, so there's a difference between executing "a = 1" inside or outside of a proc. But I'm not sure whether it's necessary to evaluate NODE_DASGN_CURR in the recursive way it is done now. Unfortunately I've not enough time to check this myself. Maybe someone else can take a look?

Regards,
Pit

Hi,

Thank you for the report. Your script helped. Could you check if the
attached patch work for you?

Thanks for the patch. I applied it and tried a few things with it.
The test program I provided does work now, so I think this is solving
the problem. On the other hand, when I try to run the application
that needs it, debian is killing it and I can't see why. The
installation I'm using, as it happens, doesn't have any VM configured
and this has been causing some difficulty recently. Could this patch
cause a lot of memory to be allocated rapidly? If not then there's
something else going on that I'll have to look into (and it is likely
a completely different problem).

If anything, the patch decreases the amount of memory that Ruby uses.
But the original program seems to use tens of thousands of in-block
variables, which themselves consume more memory than plain local
variables or arrays. So now that the segmentation fault is avoided,
the program goes on to trigger the Linux kernel's out-of-memory killer.

If possible, I'd recommend reducing the number of these local variables.

              matz.
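
One possible reading of that advice, applied to the reproduction script rather than to the real application, is to generate code that appends each value directly instead of naming it first. This is only a sketch: `definition_without_locals` is a hypothetical name, and it assumes the generated objects do not need to refer to each other by variable name, which may not hold for the real data.

```ruby
# Generates proc source that appends each value directly, so the proc body
# introduces a single block-local variable (thing) instead of n + 1.
def definition_without_locals(n)
  source = String.new
  source << "Proc.new {\n"
  source << "  thing = []\n"
  n.times { |i| source << "  thing << [#{i}]\n" }
  source << "  thing\n"
  source << "}\n"
  source
end

builder = eval(definition_without_locals(5000))
puts builder.call.length
```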

···

In message "Re: Segmentation fault, proc, eval, long string [Reproduced]" on Sun, 3 Dec 2006 22:51:38 +0900, Bob Hutchison <hutch@recursive.ca> writes:

Same thing. I tried freezing the proc too, with no change. Thanks again though.

Cheers,
Bob

···

On 30-Nov-06, at 1:20 PM, Wilson Bilkovich wrote:

Oh, and what happens when you freeze the string before eval'ing it?
