Efficiency advice needed

hi,

i have to interleave the alpha channel with an rgb image, both given as raw
files. please don’t laugh at me, but i found myself writing this very short
but slow code:

rgb = File.new rgbfile, 'rb'
# this raw file contains rgb bytes in interleaved order, so one pixel is 3 bytes
alpha = File.new alphafile, 'rb'
# contains 1 byte/pixel alpha channel

# desired result: a binary string, rgba interleaved, 4 byte/pixel, =
# 4*width*height bytes

data = ''
(width*height).times{
  data += rgb.read( 3) + alpha.read( 1)
}

i think alternately reading a few bytes from each file is slowing things
down a lot. loops operating on strings are slow too, i think. i know some
solutions, but i don’t know which one to use.
are there any suggestions how to do this most efficiently in ruby?

– thanks, mr

> data = ''
> (width*height).times{
> data += rgb.read( 3) + alpha.read( 1)
> }
>
> i think alternately reading some bytes is slowing down a lot. but also loops
> operating on strings are slow, i think. i know some solutions, but i don’t
> know which one to use.
> are there any suggestions how to do this most efficiently in ruby?

Read everything into an array or string and go from there… I don’t
think String operations are particularly slow, but I could be wrong.
You might read in the entire file as a string (or a given chunk, for
scalability’s sake), String#unpack it, and go from there, but that
sounds like a waste of memory for something like this.
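
A minimal sketch of the "read everything first" idea, not code from the thread. It assumes both inputs fit in memory, uses StringIO objects as stand-ins for the two raw files, and slices bytes with String#byteslice instead of String#unpack. The helper name `interleave_rgba` is made up for illustration.

```ruby
require 'stringio'

# Hypothetical helper: interleave 3-byte RGB pixels with 1-byte alpha values.
# Both arguments are binary strings that were read in one go.
def interleave_rgba(rgb_bytes, alpha_bytes)
  pixels = alpha_bytes.bytesize
  raise ArgumentError, "size mismatch" unless rgb_bytes.bytesize == 3 * pixels

  out = String.new(encoding: Encoding::BINARY)
  pixels.times do |i|
    # Append in place: 3 colour bytes, then the matching alpha byte.
    out << rgb_bytes.byteslice(3 * i, 3) << alpha_bytes.byteslice(i, 1)
  end
  out
end

# Stand-ins for File.new(rgbfile, 'rb') and File.new(alphafile, 'rb').
rgb   = StringIO.new("RGBrgb".b)
alpha = StringIO.new("Aa".b)

data = interleave_rgba(rgb.read, alpha.read)
p data        # => "RGBArgba"
p data.size   # => 8
```

Each file is read with a single call, so the loop only touches strings that are already in memory.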

···

On Mon, 12 May 2003 06:56:27 +0900 “meinrad.recheis” my.name.here@gmx.at wrote:


Ryan Pavlik rpav@users.sf.net

“Darn that uncertainty principle! It always gets me lost.” - 8BT

unpacking makes no sense to me.

i lately thought of inserting the alpha characters into the rgb string
using String#insert from back to front, so i don’t need to worry about
changing indices (due to insertion) and i can String#chop! off the used
alpha bytes.

there are several other ways to do it, but i always want to keep code
short and easy.

thanks for comments, mr.

···

On Mon, 12 May 2003 08:31:58 +0900, Ryan Pavlik rpav@nwlink.com wrote:

> On Mon, 12 May 2003 06:56:27 +0900
> “meinrad.recheis” my.name.here@gmx.at wrote:
>
> > data = ''
> > (width*height).times{
> > data += rgb.read( 3) + alpha.read( 1)
> > }
> >
> > i think alternately reading some bytes is slowing down alot. but also
> > loops operating on strings are slow, i think. i know some solutions,
> > but i don t know which one to use.
> > are there any suggestions how to do this most efficiently in ruby?
>
> Read everything into an array or string and go from there… I don’t
> thing String operations are particularly slow, but I could be wrong.
> You might read in the entire file as a string (or a given chunk, for
> scaleability’s sake), String#unpack it, and go from there, but that
> sounds like a waste of memory for something like this.


Using M2, Opera’s revolutionary e-mail client: http://www.opera.com/m2/

+= strikes me as extremely inefficient, as it creates a new, increasingly
big String object on each iteration!! Ruby will spend quite some time just
duplicating the string (i.e. copying data) and then GC’ing; in fact that’d
be O(N^2). Although nearly as bad (making heavy use of realloc), using
#<< should give a significant speedup, and you cannot get it much easier
than s/+=/<</ ;-)

See:

batsman@tux-chan:/tmp$ cat ae.rb
rgb = File.new "rgbfile", 'rb'
alpha = File.new "alphafile", 'rb'
width = height = ARGV[0].to_i

data = ''
(width*height).times{
  data << rgb.read(3) + alpha.read( 1)
}
p data.size

batsman@tux-chan:/tmp$ cat ae1.rb
rgb = File.new "rgbfile", 'rb'
alpha = File.new "alphafile", 'rb'
width = height = ARGV[0].to_i

data = ''
(width*height).times{
  data += rgb.read(3) + alpha.read( 1)
}
p data.size

batsman@tux-chan:/tmp$ time ruby ae.rb 100
40000

real 0m0.071s
user 0m0.070s
sys 0m0.000s
batsman@tux-chan:/tmp$ time ruby ae.rb 200
160000

real 0m0.483s
user 0m0.460s
sys 0m0.010s
batsman@tux-chan:/tmp$ time ruby ae.rb 300
360000

real 0m2.067s
user 0m2.060s
sys 0m0.010s
batsman@tux-chan:/tmp$ time ruby ae1.rb 100
40000

real 0m0.555s
user 0m0.530s
sys 0m0.020s
batsman@tux-chan:/tmp$ time ruby ae1.rb 200
160000

real 0m9.316s
user 0m8.420s
sys 0m0.730s
batsman@tux-chan:/tmp$ time ruby ae1.rb 300
360000

real 1m3.309s
user 0m51.880s
sys 0m10.480s

You can see that ae.rb’s execution time still grows faster than
linearly… A better solution is

batsman@tux-chan:/tmp$ cat ae2.rb
rgb = File.new "rgbfile", 'rb'
alpha = File.new "alphafile", 'rb'
width = height = ARGV[0].to_i
data = String.new(" " * 4*width*height)
# this sucks, would like to create a String w/ a given capa,
# IIRC that was considered for/possible in 1.8, but not sure
(width*height).times{ |idx|
  data[idx*4, 4] = rgb.read(3) + alpha.read( 1)
}

p data.size

Now:

batsman@tux-chan:/tmp$ time ruby ae2.rb 100
40000

real 0m0.053s
user 0m0.050s
sys 0m0.000s
batsman@tux-chan:/tmp$ time ruby ae2.rb 200
160000

real 0m0.186s
user 0m0.170s
sys 0m0.010s
batsman@tux-chan:/tmp$ time ruby ae2.rb 300
360000

real 0m0.391s
user 0m0.370s
sys 0m0.010s

batsman@tux-chan:/tmp$ ruby -v
ruby 1.6.8 (2003-02-28) [i386-linux]

Not bad, a >150-fold speedup from your original script :-)

You might consider using Inline for this stuff, that’d be pretty neat.

···

On Mon, May 12, 2003 at 08:31:58AM +0900, Ryan Pavlik wrote:

> On Mon, 12 May 2003 06:56:27 +0900
> “meinrad.recheis” my.name.here@gmx.at wrote:
>
> > data = ''
> > (width*height).times{
> > data += rgb.read( 3) + alpha.read( 1)
> > }


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

I did this 'cause Linux gives me a woody. It doesn’t generate revenue.
– Dave '-ddt->` Taylor, announcing DOOM for Linux

“Mauricio Fernández” batsman.geo@yahoo.com wrote in message
news:20030512071000.GA13884@student.ei.uni-stuttgart.de

> You can see that ae.rb’s execution time still grows faster than
> linearly… A better solution is
>
> batsman@tux-chan:/tmp$ cat ae2.rb
> rgb = File.new "rgbfile", 'rb'
> alpha = File.new "alphafile", 'rb'
> width = height = ARGV[0].to_i
> data = String.new(" " * 4*width*height)

You don’t need the String.new here - this only creates a superfluous copy.
Better do

data = " " * (4*width*height)

If you want to append to a string, using “<<” or “concat” is better than
“+=” since “+=” creates a new instance every time. You can verify it like
this:

$ cat concat.rb
TIMES = 10000

$st_add = 'x'

$st1 = ""
$st2 = ""

def test1(t)
  t.times { $st1 << $st_add }
end

def test2(t)
  t.times { $st2 += $st_add }
end

test1 TIMES
test2 TIMES

$ ruby -r profile concat.rb
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 42.32     1.64      1.64    10000     0.16     0.19  String#+
 39.38     3.17      1.53        2   763.00  1930.00  Integer#times
 10.99     3.59      0.43    10000     0.04     0.04  String#<<
  6.92     3.86      0.27    10003     0.03     0.03  String#allocate
  0.00     3.86      0.00        2     0.00     0.00  Module#method_added
  0.00     3.86      0.00        1     0.00  3875.00  #toplevel
  0.00     3.86      0.00        1     0.00  1125.00  Object#test1
  0.00     3.86      0.00        1     0.00  2735.00  Object#test2

$

Regards

robert

> i lately thought of inserting the alpha characters into the rgb string
> using String#insert from back to front, so i don’t need to worry about
> changing indices (due to insertion) and i can String#chop! off the used
> alpha bytes.

i implemented this, and it was a bad idea. that was extremely slow too.


> You don’t need the String.new here - this only creates a superfluous copy.
> Better do
>
> data = " " * (4*width*height)

Stupid of me, yes. I was thinking C there (init buffer with given capa.) :-)

> If you want to append to a string, using “<<” or “concat” is better than
> “+=” since “+=” creates a new instance every time. You can verify it like
> this:

That was the point of my post (see first example), but in this case you
can get more speed by initializing the string to the given size and then
replacing characters inside with #[]=; this way you don’t waste time
realloc()ating the string buffer (which would happen if you use
String#<<).
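
For readers on a newer Ruby: as far as I know, String.new has accepted a `capacity:` keyword since 2.4, which looks like the "create a String with a given capa" wish from the ae2.rb comments. The sketch below is an assumption-laden illustration, not code from the thread. It uses StringIO stand-ins for the two raw files and a made-up 2x1 image.

```ruby
require 'stringio'

# Stand-ins for the two raw files (2 pixels in total).
rgb   = StringIO.new("RGBrgb".b)
alpha = StringIO.new("Aa".b)
width, height = 2, 1

# Reserve room for 4 bytes per pixel up front; the string itself starts empty.
data = String.new(capacity: 4 * width * height, encoding: Encoding::BINARY)

(width * height).times do
  # Append in place; no intermediate "rgb + alpha" string is built.
  data << rgb.read(3) << alpha.read(1)
end

p data.bytesize   # => 8
```

Whether this beats the preallocate-and-#[]= version would have to be measured on the Ruby in use.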

···

On Mon, May 12, 2003 at 05:40:04PM +0900, Robert Klemme wrote:



Linux poses a real challenge for those with a taste for late-night
hacking (and/or conversations with God).
– Matt Welsh

“Mauricio Fernández” batsman.geo@yahoo.com wrote in message
news:20030512092410.GA22416@student.ei.uni-stuttgart.de

> > > width = height = ARGV[0].to_i
> > > data = String.new(" " * 4*width*height)
> >
> > You don’t need the String.new here - this only creates a superfluous
> > copy. Better do
> >
> > data = " " * (4*width*height)
>
> Stupid of me, yes. I was thinking C there (init buffer with given capa.) :-)

Don’t worry, such things happen to all of us all the time. :-)

> > If you want to append to a string, using “<<” or “concat” is better
> > than “+=” since “+=” creates a new instance every time. You can verify
> > it like this:
>
> That was the point of my post (see first example), but in this case you
> can get more speed by initializing the string to the given size and then
> replacing characters inside with #[]=; this way you don’t waste time
> realloc()ating the string buffer (which would happen if you use
> String#<<).

That’s true. I just wanted to point out that “<<” is better for cases
where you can’t do preallocation. Maybe I should’ve read the beginning of
your post more carefully and written my point more clearly.

Regards

robert

thank you robert and batsman! that was really helpful.

how come you both know so much about implementation of ruby?

meinrad


And even better is to preallocate the source string: i.e.

space = " "
… do
a << space * n
end

is faster than

a << " " * n

because the latter generates a new string object (containing one space) each
time round the loop.

Cheers,

Brian.

···

On Mon, May 12, 2003 at 06:40:06PM +0900, Robert Klemme wrote:

> If you want to append to a string, using “<<” or “concat” is better
> than “+=” since “+=” creates a new instance every time. You can verify
> it like this:

In this case it could be guessed (how would I implement #concat?, etc.)
but it never hurts to grep the sources (which I did, BTW :-)

Really, if you’re looking for some info on Xyz#method, it takes but a
minute to look at the end of xyz.c, locate the appropriate
rb_define_method call and then check the function. The source code is
really readable.

Now, if you are interested in how to reach the Ultimate Knowledge of
Ruby’s implementation, ask Guy Decoux :-)

···

On Mon, May 12, 2003 at 07:40:24PM +0900, meinrad.recheis wrote:

> thank you robert and batsman! that was really helpful.
>
> how come you both know so much about implementation of ruby?



…[Linux’s] capacity to talk via any medium except smoke signals.
– Dr. Greg Wettstein, Roger Maris Cancer Center

“Mauricio Fernández” batsman.geo@yahoo.com wrote in message
news:20030512111004.GA30107@student.ei.uni-stuttgart.de

> > thank you robert and batsman! that was really helpful.
> >
> > how come you both know so much about implementation of ruby?

I wouldn’t say that I know the implementation of ruby. I prefer reading
the docs, using irb and / or small test scripts like the one I posted to
gain knowledge.

Apart from that, “+=” is “+” (which creates a new instance) followed by
“=” (which assigns the new instance, possibly handing over the old
instance to garbage collection). “<<” on the other hand is an instance
method that does only append to the current string. You save the
allocation of the new instance and GC of the unused.
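
The difference can be checked directly by comparing object identity before and after each operation. This is a small illustrative sketch, not part of the original post, and the variable names are made up.

```ruby
# "+=" is "+" followed by assignment: the variable ends up pointing at a new String.
a = String.new("abc")
before_a = a
a += "d"
p a.equal?(before_a)   # => false
p before_a             # => "abc"  (the old object is untouched)

# "<<" modifies the receiver: same object, now longer.
b = String.new("abc")
before_b = b
b << "d"
p b.equal?(before_b)   # => true
p before_b             # => "abcd"
```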

Regards

robert