[BUG] unknown node type 0 - SERIOUS ENOUGH TO MIGRATE AWAY FROM RUBY?

This is a long standing bug in Ruby, and has been reported hundreds of times
by myself and many other people, but never addressed. Unfortunately, the
usual response is "Give a small code example reproducing the problem", which
is impossible (given the nature of the bug), so it gets overlooked.

Common themes seem to be

1) It's pretty random
2) Changing a source file, even by adding white space, often causes the
problem to appear/disappear
3) Test cases are worthless. It will undoubtedly work on your machine. This is
undoubtedly some sort of nasty random read/write scribble bug.

These are other reports (a quick search on Google Groups finds 330+)

http://www.talkaboutprogramming.com/group/comp.lang.ruby/messages/116486.html

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/63237

http://www.generic-nic.net/dyn/mon/detail?tld=.pr_Puerto_Rico

http://rubyforge.org/forum/forum.php?forum_id=1126

http://groups.google.co.uk/groups?q=ruby+[BUG]+unknown+node+type+0&hl=en&lr=&ie=UTF-8&selm=200409161610.24892.andrew%40walrond.org&rnum=1

http://groups.google.co.uk/groups?q=ruby+[BUG]+unknown+node+type+0&hl=en&lr=&ie=UTF-8&selm=200301302033.59645.uehli%40bluewin.ch&rnum=2

http://groups.google.co.uk/groups?q=ruby+[BUG]+unknown+node+type+0&hl=en&lr=&ie=UTF-8&selm=200304091208.00832.fg%40siamecommerce.com&rnum=3

http://groups.google.co.uk/groups?q=ruby+[BUG]+unknown+node+type+0&hl=en&lr=&ie=UTF-8&selm=1081211008.555546.15826.nullmailer%40picachu.netlab.jp&rnum=4

...

The problem is serious. It bites me, and many others, frequently. I use ruby
based init scripts for my Rubyx linux distro, which is about as mission
critical as it gets.

I have offered opportunities to debug a reproducible test case on one of my
servers (see my previous post to ruby-core), but nobody seems interested. My
time and resources are at your disposal; I am on record as having already
spent considerable time trying to solve this.

I love ruby and don't want to have to migrate away, but I _need_ reliability,
and it's looking like my only option.

Yours, having a bad day,

Andrew Walrond

[ Imagine; Rubyx becomes Pythix, or Perlix - Yuk! :) ]

FWIW and for other sufferers, (my) empirical evidence suggests that building
ruby with debug info (-g) makes the problem disappear ... mostly ...

Andrew Walrond wrote:

This is a long standing bug in Ruby, and has been reported hundreds of times by myself and many other people, but never addressed. Unfortunately, the usual response is "Give a small code example reproducing the problem", which is impossible (given the nature of the bug), so it gets overlooked.

Common themes seem to be

1) It's pretty random
2) Changing a source file, even by adding white space, often causes the problem to appear/disappear
3) Test cases are worthless. It will undoubtedly work on your machine. This is undoubtedly some sort of nasty random read/write scribble bug.

The problem is that if you cannot reproduce the bug on demand, then how on earth is anyone going to see what is happening so that they can fix it, and how are they going to test that their fix has indeed worked? Without a clue as to how the bug is generated, what exactly do you expect people to do?

The problem can only be addressed if you can show it to someone, thus "Give a small code example reproducing the problem"

Andrew Walrond wrote:

This is a long standing bug in Ruby, and has been reported hundreds of times by myself and many other people, but never addressed. Unfortunately, the usual response is "Give a small code example reproducing the problem", which is impossible (given the nature of the bug), so it gets overlooked.

No, the message alone doesn't say very much, because it's printed whenever a default case of the evaluator is reached. So it's perfectly possible that every one of those hundreds of reports was caused by a different bug. That's why you and everyone who encounters it should provide a code example. And BTW, whining still isn't a substitute for that...
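
To make the "default case" point concrete, here is a minimal Ruby sketch of a tree-walking evaluator. It is a toy and not the interpreter's actual C code: `Node`, `evaluate`, `message_for` and the node types are all made up for illustration. Two unrelated kinds of damage to a tree end in the same message, which is why the message alone cannot identify the underlying bug.

```ruby
# Toy syntax-tree node; the layout and type names are hypothetical.
Node = Struct.new(:type, :value, :left, :right)

def evaluate(node)
  case node.type
  when :lit then node.value
  when :add then evaluate(node.left) + evaluate(node.right)
  when :neg then -evaluate(node.left)
  else
    # Default branch: any node whose type tag is not recognised ends up here,
    # no matter what damaged it or when.
    raise "[BUG] unknown node type #{node.type.inspect}"
  end
end

# Evaluate a tree and return the error message, or nil if evaluation succeeds.
def message_for(tree)
  evaluate(tree)
  nil
rescue RuntimeError => e
  e.message
end

# Simulated bug A: the type tag of a literal was overwritten with 0.
tree_a = Node.new(:add, nil, Node.new(:lit, 1), Node.new(0, 2))
# Simulated bug B: a child node was recycled and left zeroed.
tree_b = Node.new(:neg, nil, Node.new(0))

puts message_for(tree_a)
puts message_for(tree_b)
```

Both calls print the same "[BUG] unknown node type 0" line, although the two trees were damaged in different ways.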

--
Florian Frank

Andrew Walrond wrote:

3) Test cases are worthless. It will undoubtedly work on your machine. This
is undoubtedly some sort of nasty random read/write scribble bug.

I have in the past been successful at fixing these kinds of problems (albeit
not in Ruby, and not on Linux) by getting a core dump from the user
(usually via FTP) and inspecting it under the debugger. Is this a possible
approach here?

The problem is that if you cannot reproduce the bug on demand, then how
on earth is anyone going to see what is happening so that they can fix
it, and how are they going to test that their fix has indeed worked?
Without a clue as to how the bug is generated, what exactly do you
expect people to do?

The problem can only be addressed if you can show it to someone, thus

1) It's not that kind of bug. It's not reproducible in that way. Programmers
are familiar with the "read random crap from uninitialised memory/variable",
or "scribble on the stack" type bugs which can exhibit this behaviour.

2) I can reproduce the bug on demand. I have a reproducible testcase on an
internet-connected server which I have offered to the ruby developers for
debugging purposes.

I'm not having a go at anybody; I merely wanted to emphasize that 330+ bug
reports have been ignored because, as I explained, it's impossible to

"Give a small code example reproducing the problem"

and to highlight my perception of the seriousness of this bug.

Andrew Walrond

On Wednesday 15 Dec 2004 13:55, Peter Hickman wrote:

Florian Frank wrote:

Andrew Walrond wrote:

This is a long standing bug in Ruby, and has been
reported hundreds of times by myself and many other
people, but never addressed. Unfortunately, the usual
response is "Give a small code example reproducing the
problem", which is impossible (given the nature of the
bug), so it gets overlooked.

No, the message alone doesn't say very much, because it's
printed whenever a default case of the evaluator is reached.
So it's perfectly possible that every one of those hundreds
of reports was caused by a different bug. That's why you
and everyone who encounters it should provide a code
example. And BTW, whining still isn't a substitute for
that...

But the code sample is worthless. What fails on my machine works on someone
else's machine. I posted a post like this a couple weeks ago and the
response I got was: "works on my machine". And the mere act of inserting a
newline or removing a newline can break or fix the problem.

Peter Hickman wrote:

The problem is that if you cannot reproduce the bug on demand, then how on earth is anyone going to see what is happening so that they can fix it, and how are they going to test that their fix has indeed worked? Without a clue as to how the bug is generated, what exactly do you expect people to do?

Andrew didn't say he couldn't reproduce it.

The problem can only be addressed if you can show it to someone, thus "Give a small code example reproducing the problem"

Twenty years ago, I fixed a bug in a data acquisition system caused by inappropriate use of a signed integer as a buffer pointer (PDP-11 assembler code). The bug was triggered only because the buffer happened to span the midpoint of address space, where the pointer changed sign. Any attempt to make a "small" example would have made the problem go away.

Some bugs simply do not yield to decimation. As I recall, I solved it by reading the code and thinking about it.
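
The class of failure described above can be sketched in Ruby by simulating a 16-bit signed comparison. This is only a guess at the shape of the bug, not the original PDP-11 assembler; `signed16`, `bytes_visited` and the addresses are assumptions chosen for illustration. A fill loop that compares its pointer as a signed value works for most buffer positions and silently does nothing when the buffer straddles 0x8000.

```ruby
# Interpret a 16-bit word the way a signed comparison would see it.
def signed16(word)
  word &= 0xFFFF
  word >= 0x8000 ? word - 0x10000 : word
end

# Count how many bytes a loop of the form "while ptr < buffer_end" visits
# when the comparison treats 16-bit addresses as signed numbers.
def bytes_visited(buffer_start, length)
  buffer_end = (buffer_start + length) & 0xFFFF
  ptr = buffer_start
  visited = 0
  while signed16(ptr) < signed16(buffer_end)
    ptr = (ptr + 1) & 0xFFFF
    visited += 1
  end
  visited
end

puts bytes_visited(0x1000, 16)  # buffer entirely in the lower half
puts bytes_visited(0x9000, 16)  # buffer entirely in the upper half
puts bytes_visited(0x7FF8, 16)  # buffer spans the midpoint 0x8000
```

The first two calls visit all 16 bytes; the third visits none, because the end address looks negative and therefore smaller than the start. Moving or shrinking the buffer, as a "small" test case would, makes the failure vanish.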

Steve

Florian Frank wrote:

Andrew Walrond wrote:

This is a long standing bug in Ruby, and has been reported hundreds of times by myself and many other people, but never addressed. Unfortunately, the usual response is "Give a small code example reproducing the problem", which is impossible (given the nature of the bug), so it gets overlooked.

No, the message alone doesn't say very much, because it's printed whenever a default case of the evaluator is reached. So it's perfectly possible that every one of those hundreds of reports was caused by a different bug. That's why you and everyone who encounters it should provide a code example.

I haven't seen this bug in a while, but it was a very strange one for
me...

1. It happened sometimes when I was not (to my knowledge) using *any*
C extension.

2. Frequently ANY change in the source would make it go away -- e.g.,
adding a blank line or a comment.

Hal

But then again, neither is a duplicated, useless, moaning reply, is it?
I hope the latest changes sorted the problem, Andrew.

Alex

On Dec 15, 2004, at 11:15 PM, Florian Frank wrote:

Andrew Walrond wrote:

This is a long standing bug in Ruby, and has been reported hundreds of times by myself and many other people, but never addressed. Unfortunately, the usual response is "Give a small code example reproducing the problem", which is impossible (given the nature of the bug), so it gets overlooked.

No, the message alone doesn't say very much, because it's printed whenever a default case of the evaluator is reached. So it's perfectly possible that every one of those hundreds of reports was caused by a different bug. That's why you and everyone who encounters it should provide a code example. And BTW, whining still isn't a substitute for that...

[...]

I'm not having a go at anybody; I merely wanted to emphasize that 330+ bug
reports have been ignored because, as I explained, it's impossible to

[...]

They weren't ignored. The '[BUG] unknown node type 0' error is the common
symptom of any bug in the Ruby interpreter that "screws things up" in a
certain way.

i.e. it's just a symptom of the real problem which is "somebody corrupted
things sometime earlier".

Many of the bugs that could cause the error have been fixed.

I suspect that the usual way they are found and fixed is that someone thinks
hard about the (Ruby interpreter) code and sees a potential problem.

This is why whenever there is a report "I got a [BUG] unknown node type 0",
the answer is usually "try the latest stable snapshot". It means that a bug
was fixed recently that could result in that error, and perhaps that was the
bug the person was seeing.

In article <200412151409.53542.andrew@walrond.org>, Andrew Walrond wrote:

Joe Van Dyk wrote:

But the code sample is worthless.

No, it's not. It's the only way to find out which code in the execution path might have led to the problem. Even if it cannot be reproduced directly on another machine, it's at least a hint where to look for the error, or a way to connect the problem to a recent change that could have gone wrong.

--
Florian Frank

Alexander Kellett wrote:

But then again, neither is a duplicated, useless, moaning reply, is it?

That's called a race condition. And believe me, I would rather have sent the code example, but some things just cannot be delegated.

--
Florian Frank

Tim Sutherland wrote:

In article <200412151409.53542.andrew@walrond.org>,
Andrew Walrond wrote: [...]

I'm not having a go at anybody; I merely wanted to
emphasize that 330+ bug reports have been ignored
because, as I explained, it's impossible to

[...]

They weren't ignored. The '[BUG] unknown node type 0'
error is the common symptom of any bug in the Ruby
interpreter that "screws things up" in a certain way.

i.e. it's just a symptom of the real problem which is
"somebody corrupted things sometime earlier".

Many of the bugs that could cause the error have been
fixed.

I suspect that the usual way they are found and fixed is
that someone thinks hard about the (Ruby interpreter)
code and sees a potential problem.

This is why whenever there is a report "I got a [BUG]
unknown node type 0", the answer is usually "try the
latest stable snapshot". It means that a bug was fixed
recently that could result in that error and perhaps that
was the bug the person was seeing.

What I don't get is how entering white space or newlines can cause or fix
the problem. That's just weird.

Hi,

In message "Re: [BUG] unknown node type 0 - SERIOUS ENOUGH TO MIGRATE AWAY FROM RUBY?" on Thu, 16 Dec 2004 08:52:16 +0900, "Joe Van Dyk" <joe.vandyk@boeing.com> writes:

What I don't get is how entering white space or newlines can cause or fix
the problem. That's just weird.

Because it's often caused by GC being invoked at a particular moment, even
mere spaces or newlines can change the situation.
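
A toy model may help show why whitespace matters. The sketch below is not Ruby's real collector: it simply assumes a collection runs on every `threshold`-th allocation, and that some object is left unprotected during a short "window" of allocations. `collection_hits_window?` and all of its parameters are hypothetical. Changing the number of earlier allocations by one (as a different parse of the source could) moves the collection in or out of the window.

```ruby
# Toy model: a collection runs on every `threshold`-th allocation.
# An object is unprotected during `window` allocations that follow
# `setup_allocs` earlier allocations (for example, ones made while parsing).
# Returns true if a collection falls inside the unprotected window.
def collection_hits_window?(setup_allocs, threshold: 10, window: 3)
  first = setup_allocs + 1
  last  = setup_allocs + window
  (first..last).any? { |n| (n % threshold).zero? }
end

puts collection_hits_window?(9)   # window covers allocations 10..12
puts collection_hits_window?(10)  # window covers allocations 11..13
```

With 9 earlier allocations the collection at allocation 10 lands inside the window and the model "crashes"; with 10 it has already happened before the window opens, so the problem is hidden rather than fixed.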

              matz.

In article <I8sEpB.E65@news.boeing.com>,

Joe Van Dyk <joe.vandyk@boeing.com> wrote:

Tim Sutherland wrote:

In article <200412151409.53542.andrew@walrond.org>,
Andrew Walrond wrote: [...]

I'm not having a go at anybody; I merely wanted to
emphasize that 330+ bug reports have been ignored
because, as I explained, it's impossible to

[...]

They weren't ignored. The '[BUG] unknown node type 0'
error is the common symptom of any bug in the Ruby
interpreter that "screws things up" in a certain way.

i.e. it's just a symptom of the real problem which is
"somebody corrupted things sometime earlier".

Many of the bugs that could cause the error have been
fixed.

I suspect that the usual way they are found and fixed is
that someone thinks hard about the (Ruby interpreter)
code and sees a potential problem.

This is why whenever there is a report "I got a [BUG]
unknown node type 0", the answer is usually "try the
latest stable snapshot". It means that a bug was fixed
recently that could result in that error and perhaps that
was the bug the person was seeing.

What I don't get is how entering white space or newlines can cause or fix
the problem. That's just weird.

_ It doesn't 'fix it', it just hides it. It may be weird, but that
kind of intermittent bug is inherent in every computer language
and is always the hardest to find and fix.

_ Booker C. Bense

In article <1103155193.940450.14567.nullmailer@x31.priv.netlab.jp>,
  Yukihiro Matsumoto <matz@ruby-lang.org> writes:

Because it's often caused by GC being invoked at a particular moment, even
mere spaces or newlines can change the situation.

I have some ideas to ease debugging such bugs.

1. "always GC" option
  It makes it easier to modify a script without losing the problem.

  However, since it makes ruby drastically slower, it should take an
  optional argument N which means "always GC after N GC opportunities".

2. record last GC backtrace.
  It shows the GC timing, which is hard to see now.

  It is not difficult with gcc if __builtin_return_address(level) and
  __builtin_frame_address(level) work well where level > 0.
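
The two ideas above are proposals for the interpreter itself, which is written in C. As a rough Ruby-level analogy only, the sketch below forces a collection on every N-th "opportunity" and records where the last forced collection was requested from. `PeriodicGC` and `opportunity` are hypothetical names, and the script has to call `opportunity` explicitly, which a real interpreter option would not need.

```ruby
# Hypothetical helper: force a collection on every `every`-th "GC opportunity"
# and remember where the most recent forced collection was requested from.
class PeriodicGC
  attr_reader :forced, :last_backtrace

  def initialize(every)
    @every = every
    @seen = 0
    @forced = 0
    @last_backtrace = nil
  end

  # Call this at points where the script wants to give GC a chance to run.
  def opportunity
    @seen += 1
    return unless (@seen % @every).zero?

    @last_backtrace = caller  # Ruby-level stand-in for a recorded C backtrace
    GC.start
    @forced += 1
  end
end

gc = PeriodicGC.new(3)
7.times { gc.opportunity }
puts gc.forced
puts gc.last_backtrace.first
```

With `every` set to 1 this behaves like the plain "always GC" idea; larger values trade coverage for speed. Current CRuby releases also expose `GC.stress = true`, which asks the collector to run at every opportunity.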

--
Tanaka Akira