What is the fastest way to iterate over a hash in C?

Hello,

I'm working on the Fast JSON project, and have come up against a puzzling
performance quirk.

Most of the Hash#to_json functionality is implemented in C and performs much
better that way. However, there is one section that performs roughly 6 times
better when implemented in Ruby than in C.

I wrote a benchmark that calls to_json on 50000 hashes.
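
The benchmark has roughly the following shape. This is only a minimal sketch:
it assumes Ruby's standard Benchmark module, and both the sample hashes and
sketch_to_json are hypothetical stand-ins so that it runs without the Fast
JSON extension.

```ruby
require 'benchmark'

# Hypothetical sample data; the hashes used in the real benchmark are not shown.
HASHES = Array.new(50_000) { |i| { "id" => i, "name" => "item#{i}" } }

# Stand-in serializer (an assumption, not the Fast JSON code). It uses the
# same "delimiter before every pair except the first" loop.
def sketch_to_json(hash)
  json = String.new("{")
  first = true
  hash.each do |key, value|
    if first
      first = false
    else
      json << ","
    end
    json << key.inspect << ":" << value.inspect
  end
  json << "}"
end

# Time one pass over all 50000 hashes.
elapsed = Benchmark.realtime do
  HASHES.each { |h| sketch_to_json(h) }
end
puts format("50000 hashes: %.2f s", elapsed)
```

Swapping the body of the timed block for the real to_json call gives the
numbers quoted in this message.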

Here is the method in Ruby. The benchmark takes around 1.7 seconds:

  def process_internal_json(json, state, depth, delim)
    first = true
    each { |key,value|
      if first
        first = false
      else
        json << delim
      end
      generate_key_value_json(json, state, depth, key, value)
    }
    json
  end

Here is the method in C. The benchmark takes around 9.5 seconds:

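static VALUE process_internal_json(VALUE self, VALUE json, VALUE state,
VALUE depth, VALUE delim) {
  int first = 1;
  /* Copy the hash into an array of [key, value] pairs. */
  VALUE key_value_pairs = rb_funcall(self, rb_intern("to_a"), 0);

  VALUE key_value = Qnil;
  /* rb_ary_pop returns nil once the array is empty. Note that popping
     walks the pairs in reverse order. */
  while((key_value = rb_ary_pop(key_value_pairs)) != Qnil) {
    if(first == 1) {
      first = 0;
    }
    else {
      rb_str_concat(json, delim);
    }
    /* Each pair is [key, value], so the value comes off first. */
    VALUE value = rb_ary_pop(key_value);
    VALUE key = rb_ary_pop(key_value);
    generate_key_value_json(self, json, state, depth, key, value);
  }
  /* Return the buffer, as the Ruby version does. */
  return json;
}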

It seems like there is some optimization in the Hash#each method. I'm trying
to figure out how to get that same performance benefit using C. Perhaps it
is not worth it, though.

Does anybody know what is going on?

Thank you,
Brian Takita

I found a better solution in C. With this method the benchmark runs in
about 1.4 seconds.

static VALUE process_internal_json(VALUE self, VALUE json, VALUE state,
VALUE depth, VALUE delim) {
  /* Copy the hash into an array of [key, value] pairs. */
  VALUE key_value_pairs = rb_funcall(self, rb_intern("to_a"), 0);

  long i;
  /* Read by index instead of popping, so no array is modified.
     (Newer Ruby versions spell the length RARRAY_LEN(key_value_pairs).) */
  for(i = 0; i < RARRAY(key_value_pairs)->len; i++) {
    if(i > 0) {
      rb_str_concat(json, delim);
    }
    VALUE key_value = rb_ary_entry(key_value_pairs, i);
    VALUE key = rb_ary_entry(key_value, 0);
    VALUE value = rb_ary_entry(key_value, 1);
    generate_key_value_json(self, json, state, depth, key, value);
  }
  /* Return the buffer, as the Ruby version does. */
  return json;
}
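
Apart from speed, the two C versions also differ in the order they emit
pairs: the popping loop consumes the array from the end, while the indexed
loop reads it front to back. The Ruby sketch below mirrors the three loops to
make that visible. append_pair is a hypothetical stand-in for
generate_key_value_json, and the output shown assumes a Ruby where hashes keep
insertion order (1.9 and later).

```ruby
# Hypothetical stand-in for generate_key_value_json: appends "key:value".
def append_pair(json, key, value)
  json << key.to_s << ":" << value.to_s
end

# Mirrors the Ruby method: iterate with Hash#each.
def join_with_each(hash, delim)
  json = String.new("")
  first = true
  hash.each do |key, value|
    if first
      first = false
    else
      json << delim
    end
    append_pair(json, key, value)
  end
  json
end

# Mirrors the first C version: to_a, then pop pairs off the end.
def join_with_pop(hash, delim)
  json = String.new("")
  pairs = hash.to_a
  first = true
  while (pair = pairs.pop)
    if first
      first = false
    else
      json << delim
    end
    value = pair.pop   # pair is [key, value], so value comes off first
    key = pair.pop
    append_pair(json, key, value)
  end
  json
end

# Mirrors the second C version: to_a, then read each pair by index.
def join_with_index(hash, delim)
  json = String.new("")
  pairs = hash.to_a
  pairs.each_index do |i|
    json << delim if i > 0
    key, value = pairs[i]
    append_pair(json, key, value)
  end
  json
end

sample = { "a" => 1, "b" => 2, "c" => 3 }
puts join_with_each(sample, ",")   # a:1,b:2,c:3
puts join_with_pop(sample, ",")    # c:3,b:2,a:1
puts join_with_index(sample, ",")  # a:1,b:2,c:3
```

So the indexed version matches the output order of the Ruby method, while the
popping version reverses it.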

On 9/30/06, Brian Takita wrote the original message shown above.