Extension question

Hi,

I'm generating a fairly large image file (can be 10000x10000 pixels)
by reading some binary data and setting each pixel. As you can
imagine, pure Ruby is REALLY slow, I think because of the loops. I've
written the program in C and it's literally hundreds to thousands of
times faster (depending on image size).

So, I'd like to write my first C extension. I'm using the GD library
for image manipulation.

Are there any general rules of thumb that you use when writing extensions?

My first problem was trying to figure out what should remain Ruby code
and what should be moved out to C. For example, if I opened the file
that contained binary data in Ruby, how could I "get" to that data
using a FILE pointer in C? Or should I just pass the filename to the
C extension?

Also, I couldn't really figure out how to only have one function in C,
the documentation seemed to say that I needed to have a class?

Thanks,
Joe

> Hi,
>
> I'm generating a fairly large image file (can be 10000x10000 pixels)
> by reading some binary data and setting each pixel. As you can
> imagine, pure Ruby is REALLY slow, I think because of the loops. I've
> written the program in C and it's literally hundreds to thousands of
> times faster (depending on image size).
>
> So, I'd like to write my first C extension. I'm using the GD library
> for image manipulation.

You might be interested in this:

http://raa.ruby-lang.org/list.rhtml?name=ruby-gd

From the example scripts:

require "GD"
im = GD::Image.new(100,100)
red = im.colorAllocate(255,0,0)
im.rectangle(0,0,99,99,red)
im.png STDOUT

Yours,

tom

···

On Mon, 2005-04-04 at 23:19 +0900, Joe Van Dyk wrote:

You can attach the function to Object:

···

On Mon, 2005-04-04 at 23:19 +0900, Joe Van Dyk wrote:

> Also, I couldn't really figure out how to only have one function in C,
> the documentation seemed to say that I needed to have a class?

===============================
$ cat example.c
#include <stdio.h>
#include "ruby.h"
static VALUE test(VALUE self) {
        printf("HI!\n");
        return Qnil;
}
void Init_example() {
        rb_define_method(rb_cObject, "test", test, 0);
}
$ cat extconf.rb
require 'mkmf'
create_makefile("example")
$ ruby extconf.rb && make
[... some gcc output ...]
$ irb
irb(main):001:0> require 'example'
=> true
irb(main):002:0> test
HI!
=> nil
irb(main):003:0>

Yours,

Tom

I wonder about this as well. So far I've been sort of passing data
objects back and forth between C and Ruby, the same way that I would if
I were doing an out-of-process method call. But I think that's mostly
just because I'm an extensions newbie...

Yours,

Tom

···

On Mon, 2005-04-04 at 23:19 +0900, Joe Van Dyk wrote:

> My first problem was trying to figure out what should remain Ruby code
> and what should be moved out to C. For example, if I opened the file
> that contained binary data in Ruby, how could I "get" to that data
> using a FILE pointer in C? Or should I just pass the filename to the
> C extension?

> Are there any general rules of thumb that you use when writing
> extensions?

http://www.ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/47700

You shouldn't be able to cause your extension to seg fault from Ruby.
That may sound like a joke, but I just mean using StringValue,
NUM2LONG, etc. before you start using Ruby objects in C.

> My first problem was trying to figure out what should remain Ruby code
> and what should be moved out to C. For example, if I opened the file
> that contained binary data in Ruby, how could I "get" to that data
> using a FILE pointer in C? Or should I just pass the filename to the
> C extension?

You can do either. IMO it depends on the layout of the file. If you're
reading line by line you can do this easily using the rb_io_* functions
found in intern.h. If you're going to be moving the file position
around, using fgetc/ungetc, or reading into C structs, it may be easier
to use the stdio functions directly.

-Charlie
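
To make the "pass the filename and use stdio directly" option concrete, here is a minimal plain-C sketch (not code from the thread). It assumes a hypothetical file layout of one unsigned byte per pixel, stored row by row. The names render_from_file, set_pixel_fn, pixel_buffer and store_pixel are invented for illustration, and the Ruby glue (converting the path and dimensions from VALUEs) is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback type: the caller decides what "set a pixel" means. */
typedef void (*set_pixel_fn)(int x, int y, unsigned char value, void *ctx);

/* Stand-in for an image library: stores each value in a flat array. */
struct pixel_buffer {
    unsigned char *data;
    int width;
};

void store_pixel(int x, int y, unsigned char value, void *ctx)
{
    struct pixel_buffer *buf = ctx;
    buf->data[(size_t)y * (size_t)buf->width + (size_t)x] = value;
}

/* Reads width*height bytes from path, one row at a time, and reports each
 * byte through set_pixel. Returns 0 on success, -1 on any failure
 * (file missing, allocation failure, or file shorter than expected). */
int render_from_file(const char *path, int width, int height,
                     set_pixel_fn set_pixel, void *ctx)
{
    assert(width > 0 && height > 0);

    FILE *in = fopen(path, "rb");
    if (in == NULL) return -1;

    /* Only one row is held in memory at a time. */
    unsigned char *row = malloc((size_t)width);
    if (row == NULL) { fclose(in); return -1; }

    int status = 0;
    for (int y = 0; y < height; y++) {
        if (fread(row, 1, (size_t)width, in) != (size_t)width) {
            status = -1;  /* short file or read error */
            break;
        }
        for (int x = 0; x < width; x++)
            set_pixel(x, y, row[x], ctx);
    }

    free(row);
    fclose(in);
    return status;
}
```

With GD, the callback would presumably wrap gdImageSetPixel(im, x, y, color); how a data byte maps to a color is application-specific and not shown here.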

Joe Van Dyk wrote:

> Hi,
>
> I'm generating a fairly large image file (can be 10000x10000 pixels)
> by reading some binary data and setting each pixel. As you can
> imagine, pure Ruby is REALLY slow, I think because of the loops. I've
> written the program in C and it's literally hundreds to thousands of
> times faster (depending on image size).

Have you tried RMagick? http://rmagick.rubyforge.org.

The constitute method may be useful: RMagick 1.15.0: class Image (class and instance methods, part 1)

That method requires you to have all the pixel data in memory at once, though, which could be a problem unless you have a LOT of RAM.

You could use import_pixels to set just a subset of pixels at a time:
http://www.simplesystems.org/RMagick/doc/image2.html#import_pixels

That would minimize the memory requirements.

In both cases you'd have to convert your binary data to a Ruby numeric type. Don't know whether that would be too slow or not.

Hi,

I was using Ruby-GD before. The slowness is not the library, but the
looping 10 million times. I have to set each pixel, so I can't use
the rectangle function.

···

On Apr 4, 2005 7:46 AM, Tom Copeland <tom@infoether.com> wrote:


"Tom Copeland" <tom@infoether.com> wrote in message
news:1112626649.3468.41.camel@hal...

···

On Mon, 2005-04-04 at 23:19 +0900, Joe Van Dyk wrote:
> Also, I couldn't really figure out how to only have one function in C,
> the documentation seemed to say that I needed to have a class?

> You can attach the function to Object:

IMHO the appropriate place would be a private singleton method of Kernel,
wouldn't it? At least all other "functions" (gsub, gsub! etc.) are placed
there...

Kind regards

    robert

So, if I have an IO object in Ruby, how could I use C's stdio functions
on it? And how could I have figured it out without asking the mailing
list? :) Cuz I'm sure I'll have lots more questions.

Thanks,
Joe

···

On Apr 4, 2005 8:34 AM, Charles Mills <cmills@freeshell.org> wrote:


Ooh, that looks nice. The image can be up to (and probably more than)
10,000x10,000 pixels though... that's a heck of a lot of memory. And
converting the data in Ruby would take a long time, I believe. Too
many iterations.

···

On Apr 4, 2005 11:09 AM, Tim Hunter <sastph@sas.com> wrote:


How should errors be handled in C extensions? For example, my C
extension was expecting a Ruby string, but it was passed an IO object.

Or, what should I do when I try to open a file in C (the filename
being passed to the function) and the file doesn't exist?

···

On Apr 4, 2005 8:34 AM, Charles Mills <cmills@freeshell.org> wrote:


> > Also, I couldn't really figure out how to only have one function in C,
> > the documentation seemed to say that I needed to have a class?
>
> You can attach the function to Object:

Please note that :test is a kernel method. :)

I'd like to illustrate an alternative to Tom's suggestion:

> ===============================
> $ cat example.c
> #include <stdio.h>
> #include "ruby.h"
> static VALUE test(VALUE self) {
>         printf("HI!\n");
>         return Qnil;
> }
> void Init_example() {
>         rb_define_method(rb_cObject, "test", test, 0);
> }

$ cat example.rb

require 'inline'

class Example
   inline do |builder|
     builder.include "<stdio.h>"

     builder.c <<-"END"
       void do_it() {
         puts("HI!");
       }
     END
   end
end

> $ cat extconf.rb
> require 'mkmf'
> create_makefile("example")

Skip this file entirely.

> $ ruby extconf.rb && make
> [... some gcc output ...]

Skip this step entirely. I should add that just by running or requiring the code, everything is silently done for you as necessary, and very quickly.

> $ irb
> irb(main):001:0> require 'example'
> => true
> irb(main):002:0> test

Example.new.do_it # because test is rude. :)

> HI!
> => nil
> irb(main):003:0>

There is no middle step.

For developer time and convenience, ruby inline kicks ass!

···

On Apr 4, 2005, at 8:01 AM, Tom Copeland wrote:

On Mon, 2005-04-04 at 23:19 +0900, Joe Van Dyk wrote:

Ah, OK, so you were already familiar with that one, sorry.

Yours,

Tom

···

On Tue, 2005-04-05 at 00:17 +0900, Joe Van Dyk wrote:

> I was using Ruby-GD before.

Right you are, that's much better...

Yours,

Tom

···

On Tue, 2005-04-05 at 00:24 +0900, Robert Klemme wrote:

> > You can attach the function to Object:
>
> IMHO the appropriate place would be a private singleton method of Kernel,
> wouldn't it? At least all other "functions" (gsub, gsub! etc.) are placed
> there...

And to expand the question...

I'm using the GD library to create the image. I'm creating the image
by reading binary data from a file and looping through some
predetermined width and height and setting each pixel of the picture
based on the data from the file.

Which parts of this should go in the C extension and which parts
should stay Ruby? I'm really scratching my head over this one.

Thanks,
Joe

···

On Apr 4, 2005 10:35 AM, Joe Van Dyk <joevandyk@gmail.com> wrote:


Joe Van Dyk wrote:

> So, if I have an IO object in Ruby, how could I use C's stdio functions
> on it? And how could I have figured it out without asking the mailing
> list? :) Cuz I'm sure I'll have lots more questions.

Check out the GetOpenFile, GetWriteFile, etc. macros in rubyio.h, and the rb_io_xxxxx functions in io.h.

I found these by perusing the Ruby source code and asking questions on this list.
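
The macros above live in Ruby's own headers. A different route, not mentioned in the thread and assuming a POSIX system, is to pass the integer from IO#fileno to the extension and wrap it in a FILE* with fdopen. The sketch below shows only the plain-C half; stream_from_fd is a hypothetical helper name. One caveat to keep in mind: anything Ruby has already buffered on its side is not visible to the new stream, and both share one file position.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Wraps a duplicate of fd in a read-only FILE*. Returns NULL on failure.
 * Because the descriptor is duplicated, calling fclose() on the result
 * leaves the caller's original descriptor open. */
FILE *stream_from_fd(int fd)
{
    int copy = dup(fd);
    if (copy < 0) return NULL;

    FILE *stream = fdopen(copy, "rb");
    if (stream == NULL) close(copy);  /* do not leak the duplicate */
    return stream;
}
```

The resulting stream can then be used with fread, fgetc and friends, and closed with fclose when the extension is done with it.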

Joe Van Dyk wrote:

> Ooh, that looks nice. The image can be up to (and probably more than)
> 10,000x10,000 pixels though... that's a heck of a lot of memory. And
> converting the data in Ruby would take a long time, I believe. Too
> many iterations.

You're right, converting from binary data to Ruby objects would be time-consuming. You could reduce the memory requirements by building the image a piece at a time and then stitching the pieces together in a separate pass. You'd still have to be able to hold the final image in memory, but you wouldn't have to have the pixel data _and_ the final image in memory at the same time. I estimate that a 10000x10000 image would take about 105-110 MB of memory.

Of course doing it a piece at a time takes longer. Depends on what your tightest constraint is.
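
The trade-off can be put into numbers with a small arithmetic sketch. The thread does not say how many bytes each pixel occupies, so the bytes_per_pixel argument is an assumption, and the helper names buffer_bytes and strips_needed are made up for this example. The sketch does not attempt to reproduce the 105-110 figure above, whose basis is not given.

```c
#include <stddef.h>

/* Bytes needed to hold `rows` rows of an image `width` pixels wide. */
size_t buffer_bytes(size_t width, size_t rows, size_t bytes_per_pixel)
{
    return width * rows * bytes_per_pixel;
}

/* Number of strips needed to cover `height` rows, rounding up so a
 * final partial strip is counted. */
size_t strips_needed(size_t height, size_t rows_per_strip)
{
    return (height + rows_per_strip - 1) / rows_per_strip;
}
```

Under the assumed 1 byte per pixel, buffer_bytes(10000, 10000, 1) is 100,000,000 bytes for the whole image, while a 500-row strip needs buffer_bytes(10000, 500, 1) = 5,000,000 bytes and strips_needed(10000, 500) gives 20 passes.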

If you decide to try RMagick give me a shout off-line. We can strategize a bit before you go to the trouble of installing ImageMagick or GraphicsMagick and RMagick.

Joe Van Dyk wrote:

>
> > Are there any general rules of thumb that you use when writing
> extensions?
>
> http://www.ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/47700
>
> You shouldn't be able to cause your extension to seg fault from Ruby.
> That may sound like a joke, but I just mean using StringValue,
> NUM2LONG, etc. before you start using Ruby objects in C.

How should errors be handled in C extensions? For example, my C
extension was expecting a Ruby string, but it was passed an IO object.

static VALUE
expecting_a_string(VALUE self, VALUE str)
{
  StringValue(str);
  /* str is now definitely a String so I can
   * use RSTRING(str)->ptr, RSTRING(str)->len
   * and pass it to rb_str_* methods */
  return self;
}

Or, what should I do when I try to open a file in C (the filename
being passed to the function) and the file doesn't exist?

static VALUE
open_and_use_file(VALUE self, VALUE path_str)
{
  const char *path = StringValueCStr(path_str);
  FILE *stream = fopen(path, "r");
  if (stream == NULL) rb_sys_fail(path); /* raises, so it does not return */
  /* stream is now ready to use ... */
  fclose(stream);
  return self;
}

You can use rb_sys_fail() when fread, etc. fail as well.

-Charlie
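
rb_sys_fail() builds its exception from the C errno value, so it helps to see what fopen leaves there. The plain-C sketch below (no Ruby headers, hypothetical helper name try_open) assumes a POSIX system, where a failed fopen sets errno; for a missing file that is ENOENT, which rb_sys_fail would surface in Ruby as Errno::ENOENT.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Tries to open path for reading. Returns 0 on success, otherwise the
 * errno value left behind by fopen (for example ENOENT). */
int try_open(const char *path)
{
    errno = 0;
    FILE *stream = fopen(path, "rb");
    if (stream == NULL) {
        int saved = errno;  /* save it before any other call can change it */
        fprintf(stderr, "%s: %s\n", path, strerror(saved));
        return saved;
    }
    fclose(stream);
    return 0;
}
```

The same ordering matters in an extension: call rb_sys_fail() right after the failing call, before anything else has a chance to overwrite errno.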

···

On Apr 4, 2005 8:34 AM, Charles Mills <cmills@freeshell.org> wrote:

Joe Van Dyk wrote:

> How should errors be handled in C extensions? For example, my C
> extension was expecting a Ruby string, but it was passed an IO object.

Generally the Ruby functions that convert Ruby types to C types will raise an exception if they're given invalid input.

> Or, what should I do when I try to open a file in C (the filename
> being passed to the function) and the file doesn't exist?

Call rb_raise()

Check out chapter 21 in the new Pickaxe.

I was hoping Ara Howard would stick his head in here, but I'll just mention that he's doing some very performance intensive image processing using ruby/mmap and narray. I don't know the details, but I'd certainly give those two tools a look.

HTH,

Nathaniel

<:((><

···

On Apr 4, 2005, at 14:19, Joe Van Dyk wrote:

> Ooh, that looks nice. The image can be up to (and probably more than)
> 10,000x10,000 pixels though... that's a heck of a lot of memory. And
> converting the data in Ruby would take a long time, I believe. Too
> many iterations.