Argument error with negative arity arguments?

I've been scratching my head at this one for a long long time. I've looked
pretty far and wide and can't find any reference to it happening elsewhere.

There's a set of background (Sidekiq) jobs that fail at a rate of about one
in a million with the following bewildering exception:

ArgumentError: wrong number of arguments (-2 for 2)

This confuses me in a couple of ways.

* This kind of exception normally happens when you supply too few or too
many arguments to a method, but fewer than zero arguments makes no sense.
I dug into the C code on GitHub, and as far as I can tell, the throwing
function is rb_check_arity, which just uses the value that is passed in
for argc from the callerinfo. I reach a dead end when I try to figure out
how that callerinfo is constructed and where it could be assigned this
negative value.

* Secondly, the line that calls the method that throws said exception is of
the form:

    method_that_raises not_nil_string, {
      :inline_hash => 'something' }

So it's particularly confusing because this is not a dynamic call. The
method is always called with two arguments and works as expected 99.9% of
the time. Perhaps the seemingly random failure is related to thread
reaping. Either way, perhaps this is an issue where the class is parsed
incorrectly? Could it have something to do with the hash being defined
inline with only the opening curly brace on the calling line, causing some
sort of parsing weirdness? Has anyone ever seen something nonsensical like
this before? Really, what I'm looking for is any kind of clue about what
the cause of this error might be and how to address it.

The ruby -v is MRI 2.2.5p319
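
For comparison, here is a minimal sketch of how this exception normally arises. The method name `takes_two` is hypothetical and only stands in for the real method. On MRI 2.2 the message has the form "(1 for 2)"; from Ruby 2.3 onwards it reads "(given 1, expected 2)". In both forms the first number is the count of arguments actually passed, which is why a negative value there is so strange.

```ruby
# Hypothetical two-argument method standing in for the real one.
def takes_two(first, second)
  [first, second]
end

# Calls the method with too few arguments and returns the error message.
def arity_error_message
  takes_two('only one')
rescue ArgumentError => e
  e.message
end

puts arity_error_message
```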

It sounds to me as if this error is generated because a signed integer variable is being driven to overflow. That is typical of the sort of unpredictable behaviour such events trigger. Not sure that helps with a solution in any way, though!

···

On 09/09/2016 8:22 PM, Jonathan Calvert wrote:



--
Patrick Bayford Tel : 020 8265 8376 E-mail : pbayford@talktalk.net

Hi,

There's a set of background (Sidekiq) jobs that fail at a rate of about one
in a million

This sounds to me like a race condition. Check if there is data shared
between multiple threads and properly guarded. (I don't know Sidekiq,
though, and for the rest of this answer I am going to assume it spawns
threads and not processes).

ArgumentError: wrong number of arguments (-2 for 2)

Now that looks interesting. It is possible on the Ruby C side to define
a method that takes -2 arguments (arity -2) using this call:

    rb_define_method(klass, "methodname", funcptr, -2);

-2 is treated as a special indicator rather than the expected number of
arguments; it is a way on the C side to define a splat (rest)
argument. As per the docs[1], -2 causes Ruby to call the C function with
the arguments passed as a Ruby Array instance, whereas -1 would pass
them as a C array of VALUE objects.

However, what you get is the inverse: the method has an arity of 2, but
you pass -2 arguments. That would indicate that it is possible to call
rb_funcall() with its `argc' parameter set to such a value, which to my
knowledge is not possible.

From this follows at least that your mail subject is misleading: your
method has a positive arity of 2 (which is correct), it is the argument
list passed to that method that has a negative length of -2 (which is
unexpected).
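
To illustrate the distinction between a method's arity and the number of arguments passed, here is a small sketch using `Method#arity` on plain Ruby methods. The class and method names are made up for the example. Note that this is the Ruby-level arity convention (a negative value of -n-1 means n required arguments plus a splat); it is related to, but not the same thing as, the argc value given to rb_define_method on the C side.

```ruby
# Hypothetical class used only to show how Ruby reports arity.
class ArityDemo
  def exactly_two(a, b); end      # two required arguments
  def any_number(*args); end      # splat only
  def one_then_rest(a, *rest); end # one required argument plus a splat
end

demo = ArityDemo.new
ARITIES = {
  :exactly_two   => demo.method(:exactly_two).arity,
  :any_number    => demo.method(:any_number).arity,
  :one_then_rest => demo.method(:one_then_rest).arity
}

ARITIES.each { |name, arity| puts "#{name}: #{arity}" }
```

A negative number is therefore meaningful on the "expected" side of the message, but never as a count of arguments actually supplied.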

* Secondly, the line that calls the method that throws said exception is of
the form:

    method_that_raises not_nil_string, {
      :inline_hash => 'something' }

So it's particularly confusing because this is not a dynamic call.

The exception you are getting implies that something on the C side of
Ruby went wrong. The two arguments (the String and Hash instances) were
passed to the C side, but before they could be counted for determining
the value of `argc', something happened that caused `argc' to be set to
this weird -2 value. So it is likely that after the Ruby side was left,
but before the C code has constructed the argument list for
rb_funcall(), the thread got descheduled and some other thread got
scheduled that interfered with the argument list. Double-check if your
`not_nil_string' is a shared resource.

If I am right and we are looking at a race condition, you can increase
the likelihood of this phenomenon by spawning more threads. That forces
the scheduler to switch tasks more often, which raises the chance of a
switch happening in just the moment before `argc' is assigned.

Still, even if there is a race condition, it should not be possible to
deschedule a thread that is doing the construction of the rb_funcall()
call. The exception you get looks like a Ruby bug to me.
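
A rough sketch of such a stress test follows. Everything in it is an assumption made for illustration: the method body is a stand-in, and the thread and iteration counts are arbitrary. It calls a two-argument method from many threads and collects any ArgumentError messages. On a healthy interpreter the list should stay empty.

```ruby
require 'thread'

# Hypothetical stand-in for the real method; takes exactly two arguments.
def method_that_raises(str, opts)
  str.length + opts.size
end

# Calls the method from many threads and returns the ArgumentError
# messages that were rescued (an empty array if none occurred).
def stress(thread_count, iterations)
  errors = Queue.new
  threads = Array.new(thread_count) do
    Thread.new do
      iterations.times do |i|
        begin
          method_that_raises("name_#{i}", { :inline_hash => 'something' })
        rescue ArgumentError => e
          errors << e.message
        end
      end
    end
  end
  threads.each(&:join)
  Array.new(errors.size) { errors.pop }
end

failures = stress(16, 10_000)
puts "ArgumentErrors seen: #{failures.size}"
failures.uniq.each { |msg| puts msg }
```

Raising the thread count or running the script in a loop may help if the failure really depends on scheduling, though there is no guarantee it reproduces at all.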

inline with only the curly on the calling line and it being some sort of
parsing weirdness?

I am pretty sure this is not the case. Your Ruby code gets parsed to an
Abstract Syntax Tree and then compiled to YARV bytecode before it is
executed. Any newlines in the original source file will have vanished by
the time the code is actually executed. If you want to verify, put the
brace on the same line and retry.
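
One way to check this without touching the production code is to compile both layouts and compare the argument count recorded in the bytecode. This is only a sketch: RubyVM::InstructionSequence is MRI-specific, and the regular expression assumes the disassembly prints the call as "mid:method_that_raises, argc:N", which is a display format rather than a stable API. Compiling does not execute the code, so the undefined names do not matter.

```ruby
# MRI-only: RubyVM::InstructionSequence is not available on other Rubies.
ONE_LINE  = "method_that_raises not_nil_string, { :inline_hash => 'something' }"
TWO_LINES = "method_that_raises not_nil_string, {\n\n  :inline_hash => 'something' }"

# Returns the argc recorded for the call to method_that_raises,
# or nil if the disassembly format is not the one assumed here.
def compiled_argc(source)
  disasm = RubyVM::InstructionSequence.compile(source).disasm
  match = disasm.match(/mid:method_that_raises, argc:(\d+)/)
  match && match[1].to_i
end

puts "one line:  argc=#{compiled_argc(ONE_LINE).inspect}"
puts "two lines: argc=#{compiled_argc(TWO_LINES).inspect}"
```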

Has anyone even seen something nonsensical like this
before? Really what I'm looking for is any kind of clue on what the cause
of this error might be and how to address it.

You'll need to try to create a minimal demonstration example we can
fiddle with. Given the low likelihood of the incident, it might be
difficult to create, but as I said, try throwing more threads at the code
and see if you can increase the failure rate.

Greetings
Marvin

···

On Fri, Sep 09, 2016 at 03:22:50PM -0400, Jonathan Calvert wrote:

--
Blog: http://www.guelkerdev.de
PGP/GPG ID: F1D8799FBCC8BC4F

So I managed to forget the docs[1] link. Here it is:

    https://svn.ruby-lang.org/cgi-bin/viewvc.cgi/tags/v2_3_1/doc/extension.rdoc?view=markup#l382

Greetings
Marvin

···

On Fri, Sep 09, 2016 at 11:03:51PM +0200, Marvin Gülker wrote:

-2 is treated as a special indicator rather than the expected number of
arguments; it is a way on the C side to define a splat (rest)
argument. As per the docs[1], -2 causes Ruby to call the C function with
the arguments passed as a Ruby Array instance, whereas -1 would pass
them as a C array of VALUE objects.

--
Blog: http://www.guelkerdev.de
PGP/GPG ID: F1D8799FBCC8BC4F

Thank you for the reply. It does seem like a race condition due to the
sporadic nature, but each process is being initialized with a primary key
as an argument and then gathers all data from the database independently.
You're also correct that I could have worded the subject a little better to
convey the weirdness that is going on. The 'not_nil_string' comes from the
calling method and is made via interpolation within an enumeration, e.g.:

    db_model = DbModel.find(model_id)
    mapped_stuff = ['first', 'last'].map { |element|
      Klass.method_that_raises(db_model.send(element), "#{element}_name") }

So at least within the code I can't really see anything that is a shared
resource to create a race condition. Unfortunately that doesn't seem
helpful. So far the issue is resistant to reproduction. If I was able to do
so, are there RVM debug options/builds that would output useful information?
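
Until the failure can be reproduced, one option is to capture more context at the moment it happens. The following is a minimal sketch, not a Sidekiq feature: the helper name `with_arity_diagnostics` and its parameters are hypothetical. It wraps a block, writes some details to an IO object when an ArgumentError escapes, and then re-raises so the job still fails as before.

```ruby
# Hypothetical helper: runs the block and, if an ArgumentError escapes,
# records extra context on +io+ before re-raising the same exception.
def with_arity_diagnostics(io = $stderr, context = {})
  yield
rescue ArgumentError => e
  io.puts "ArgumentError: #{e.message}"
  io.puts "ruby: #{RUBY_DESCRIPTION}"
  io.puts "live threads: #{Thread.list.size}"
  io.puts "context: #{context.inspect}"
  io.puts "backtrace:"
  (e.backtrace || []).first(10).each { |line| io.puts "  #{line}" }
  raise
end

# Example use around the suspicious call (names taken from the snippet above):
#   with_arity_diagnostics($stderr, :model_id => model_id) do
#     Klass.method_that_raises(db_model.send(element), "#{element}_name")
#   end
```

The backtrace in particular would show whether the exception really originates at the call site in question or somewhere deeper.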

I do not think that it can be found in your code. If there is
something wrong, then it must be in the interpreter's code. Ruby code
does not have access to the variable and only implicitly sets the
value via the number of arguments present. How do you pass -2
arguments? I don't see how that could happen.

Kind regards

robert

···

On Sat, Sep 10, 2016 at 6:33 PM, Jonathan Calvert <jonathan.calvert@kitcheck.com> wrote:

So at least within the code I can't really see anything that is a shared
resource to create a race condition. Unfortunately that doesn't seem
helpful. So far the issue is resistant to reproduction. If I was able to do
so, are there RVM debug options/builds that would output useful information?

--
[guy, jim, charlie].each {|him| remember.him do |as, often| as.you_can
- without end}
http://blog.rubybestpractices.com/