What is the fastest way to add many small Strings to a big buffer?
thanks!
Dominik
Use the << method.
Regards,
Michael
On Fri, Jun 13, 2003 at 09:11:30PM +0900, Dominik Werder wrote:
What is the fastest way to add many small Strings to a big buffer?
--
ruby -r complex -e 'c,m,w,h=Complex(-0.75,0.136),50,150,100;puts "P6\n#{w} #{h}\n255";(0...h).each{|j|(0...w).each{|i|
n,z=0,Complex(.9*i/w,.9*j/h);while n<=m&&(z-c).abs<=2;z=z*z+c;n+=1 end;print [10+n*15,0,rand*99].pack("C*")}}'|display
AFAIK, if you stay w/ Ruby (and don’t write extensions in C), the best
thing you can do is probably
buffer << str
You might want to see what speed you get if you create the buffer and
then replace its contents with #=. That way you avoid the realloc()s,
but I’m not sure it’s worth it.
Whatever you do, never use += in a loop: you'd be creating lots of garbage
of increasing size, and copying the data every time!
It'd be nice if there was a way to create a string with an arbitrary
capacity (capa), say, by exposing rb_str_buf_new to the Ruby side.
–
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com
Q: What’s the big deal about rm, I have been deleting stuff for years? And
never lost anything… oops!
A: …
– From the Frequently Unasked Questions
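The advice above (prefer << and avoid += in a loop) comes down to object identity: << changes the existing String, while += builds a new one and rebinds the variable. A minimal sketch that makes this visible; the variable names are made up for illustration and nothing here measures speed:

```ruby
pieces = %w[alpha beta gamma]

# String#<< mutates the receiver, so the buffer stays the same object.
appended = String.new
original = appended
pieces.each { |piece| appended << piece }
reused = appended.equal?(original)

# += builds a new String each time and rebinds the variable to it.
reassigned = String.new
always_fresh = true
pieces.each do |piece|
  previous = reassigned
  reassigned += piece
  always_fresh &&= !reassigned.equal?(previous)
end

puts "<< reused the buffer: #{reused}"
puts "+= allocated a new String every time: #{always_fresh}"
puts "same contents: #{appended == reassigned}"
```

Each intermediate String produced by += is garbage as soon as the next one exists, which is where the "garbage of increasing size" comes from.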
Dominik Werder wrote:
What is the fastest way to add many small Strings to a big buffer?
I don’t believe there’s a single fastest way.
Often ‘<<’ is fast, but not always.
In one application, where I was generating a CSV file from a largish SQL
result set, I made the program over 100 times faster by changing

    result = ""
    for r in results
      csv_string = to_csv(r)
      result << csv_string
    end

to

    result = []
    for r in results
      csv_string = to_csv(r)
      result << csv_string
    end
    result = result.join
Cheers
Dave
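Since the numbers depend heavily on the Ruby version and the data, it is probably best to measure on your own machine. A rough sketch using the standard Benchmark library; the to_csv helper and the sample rows are hypothetical stand-ins for the ones in the post above, and the row count is kept small so the += case finishes quickly:

```ruby
require 'benchmark'

# Hypothetical stand-in for the to_csv(r) helper mentioned above.
def to_csv(row)
  row.join(',') + "\n"
end

# Made-up sample data: 5,000 small rows.
results = Array.new(5_000) { |i| [i, "name#{i}", i * 2] }

def build_with_append(results)
  out = String.new
  results.each { |r| out << to_csv(r) }
  out
end

def build_with_join(results)
  parts = []
  results.each { |r| parts << to_csv(r) }
  parts.join
end

def build_with_plus_equals(results)
  out = String.new
  results.each { |r| out += to_csv(r) }
  out
end

Benchmark.bm(8) do |bm|
  bm.report('<<')   { build_with_append(results) }
  bm.report('join') { build_with_join(results) }
  bm.report('+=')   { build_with_plus_equals(results) }
end
```

All three variants produce the same string, so only the timings differ. Treat the output as a data point for your setup, not as a general ranking.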
Saluton!
This message contains a C implementation of a string buffer (see
below). It is an alpha version because I have no information about
the behavior of ALLOC_N and REALLOC_N in the case of a memory
allocation failure - I didn’t find any documentation on that topic.
Besides that it seems to be bullet-proof (I use it to concatenate
chunks of all mails I download from my POP3 accounts - which is the
reason why I want to avoid vulnerabilities).
As every C programmer knows, not checking the return value of 'malloc'
is a vulnerability waiting to be exploited. In the case of the Rubyish
routines ALLOC_N and REALLOC_N two different behaviors are equally
likely:
1. return a NULL buffer in the same way as plain C functions do
2. raise an exception
I assumed the latter, but if the former is true the code given below
is vulnerable to exploits!
What is the fastest way to add many small Strings to a big buffer?
Follows my C version for that problem: ‘new’ set size of buffer,
append does append to it (if capacity is too slow it is automagically
increased), get gets whole buffer.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#include <ruby.h>
#include <string.h>

VALUE cStorage;
char *m = NULL;    /* start of the buffer */
char *end = NULL;  /* points at the terminating '\0' */
int capacity = 0;  /* usable chars, not counting the terminator */

static VALUE resize(VALUE, VALUE);

static VALUE new(VALUE self, VALUE size) {
  resize(self, size);
  *m = '\0';
  end = m;
  return self;
}

static VALUE resize(VALUE self, VALUE size) {
  int chars, used;
  Check_Type(size, T_FIXNUM);
  chars = NUM2INT(size);
  if (!m) {
    m = ALLOC_N(char, chars + 1);
    *m = '\0';
    end = m;
    capacity = chars;
  } else if (capacity < chars) {
    used = end - m;  /* REALLOC_N may move the buffer ... */
    REALLOC_N(m, char, chars + 1);
    end = m + used;  /* ... so rebuild end from the saved offset */
    capacity = chars;
  }
  return self;
}

static VALUE append(VALUE self, VALUE text) {
  int len;
  char *str;
  Check_Type(text, T_STRING);
  str = STR2CSTR(text);
  len = strlen(str);
  /* resize() expects a Ruby Fixnum, so wrap the C integer */
  if (end - m + len > capacity) resize(self, INT2FIX(end - m + len));
  strcat(end, str);
  end += len;
  return self;
}

static VALUE get(VALUE self) {
  return rb_str_new2(m);
}

void Init_Storage() {
  cStorage = rb_define_module("Storage");
  rb_define_module_function(cStorage, "new", new, 1);
  rb_define_module_function(cStorage, "append", append, 1);
  rb_define_module_function(cStorage, "get", get, 0);
}
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Gis,
Josef ‘Jupp’ Schugt
For "Ruby Developer's Guide" I did some measurements on this, and join was
often significantly faster when there are many strings to
concatenate. Also, "<<" was somewhat but not much faster than "+=" (for many
strings; for fewer strings "+=" was sometimes faster). Performance
measurement is a hairy business though, and it's hard to generalise...
Regards,
Robert Feldt
On Fri, 13 Jun 2003, Dave Thomas wrote:
I don’t believe there’s a single fastest way.
Often ‘<<’ is fast, but not always.
Saluton!
[description of the alpha version snipped]
As every C programmer knows, not checking the return value of 'malloc'
is a vulnerability waiting to be exploited. In the case of the Rubyish
routines ALLOC_N and REALLOC_N two different behaviors are equally
likely:
1. return a NULL buffer in the same way as plain C functions do
2. raise an exception
It raises a NoMemoryError (see ruby_xmalloc in gc.c). So you don't
need to test whether it returns NULL or not.
I did assume the latter but if the former is true that will mean the
code given below is vulnerable to exploits!
- Dominik Werder; 2003-06-13, 13:18 UTC:
What is the fastest way to add many small Strings to a big buffer?
Follows my C version for that problem: ‘new’ set size of buffer,
append does append to it (if capacity is too slow it is automagically
increased), get gets whole buffer.
Why not make Storage a class, so you can have more than one? Your
current implementation does not allow this. You’d need to use
Data_Make_Struct and Data_Get_Struct.
This way, you wouldn’t need the global variables.
Regards,
Michael
On Sat, Jun 14, 2003 at 03:46:08AM +0900, Josef ‘Jupp’ Schugt wrote:
I also found something interesting:
"Pre#{i.to_s}Post" seems to me cheaper than 'Pre'<<i.to_s<<'Post'
although ruby has to parse the double quoted string?
bye!
Dominik
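The observation above is easy to check locally. A small sketch using the standard Benchmark library; the iteration count is arbitrary, and the unary + is only there so the 'Pre' literal is mutable even when string literals are frozen (an assumption about newer Ruby setups, not something from the original post):

```ruby
require 'benchmark'

n = 200_000  # arbitrary iteration count

Benchmark.bm(14) do |bm|
  # Interpolation builds the result in one expression.
  bm.report('interpolation') { n.times { |i| "Pre#{i}Post" } }
  # The chain appends twice to a fresh 'Pre' string.
  bm.report('<< chain')      { n.times { |i| +'Pre' << i.to_s << 'Post' } }
end
```

Both forms yield the same string; which one is faster may well differ between Ruby versions.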
“Michael Neumann” mneumann@ntecs.de wrote in message
[cool sig snip]
print [10+n*15,0,rand*99].pack("C*")}}' | display
                                          ^^^^^^^
This must be a *nix command... where can I get a Windows version of display?
Saluton!
Follows my C version for that problem: ‘new’ set size of buffer,
append does append to it (if capacity is too slow it is
                                             ^^^^
That's low by the way :->
automagically increased), get gets whole buffer.
Why not make Storage a class, so you can have more than one?
Einstein's principle: Make it as simple as possible but not simpler.
Only one buffer was required, and the need for additional buffers was
unexpected, so special relativity was created. Due to public interest I
will work on general relativity now >;->
And as was the case with Einstein, who needed to learn
differential geometry in order to be able to set up general
relativity, I will have to learn more about Ruby extensions.
Gis,
Josef ‘Jupp’ Schugt
Hi,
I found also something interesting:
"Pre#{i.to_s}Post" seems to me cheaper than 'Pre'<<i.to_s<<'Post'
to_s inside #{} isn't needed.
although ruby has to parse the double quoted string?
Yes for 1.6. In 1.8, "Pre#{i}Post" is equivalent to
'Pre'<<i.to_s<<'Post'.
At Mon, 16 Jun 2003 17:17:53 +0900, Dominik Werder wrote:
–
Nobu Nakada
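To illustrate the point that to_s inside #{} is redundant, here is a small sketch. The Label class is hypothetical and exists only to show that interpolation calls to_s on its own; the unary + just guarantees a mutable 'Pre' string:

```ruby
# A hypothetical class, used only to show that interpolation calls to_s.
class Label
  def initialize(id)
    @id = id
  end

  def to_s
    "item-#{@id}"
  end
end

label = Label.new(7)
interpolated = "Pre#{label}Post"                  # to_s is called implicitly
explicit     = +'Pre' << label.to_s << 'Post'     # the spelled-out equivalent

puts interpolated
puts interpolated == explicit
```

The two strings compare equal, so writing "Pre#{label.to_s}Post" only adds a second, unnecessary call.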
Saluton!
Why not make Storage a class, so you can have more than one? Your
current implementation does not allow this.
Done.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
EXAMPLE of using 'Storage':
require 'Storage'    # Require library
s = Storage.new(3)   # s is to hold 3 chars
s.append('foo')      # append 'foo' to s
s.append('bar')      # append 'bar' to s (s grows to hold text)
s.append('baz')      # append 'baz' to s (s grows to hold text)
puts s.to_s          # prints 'foobarbaz'
s.resize(6)          # shrink s to hold 6 chars
puts s.to_s          # prints 'foobar'
s.flush              # flushes content of s
puts s.to_s.length   # 0
puts s.capacity      # 6
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#include <ruby.h>
#include <string.h>

VALUE cStorage;

typedef struct {
  char *start, *end;  /* end points at the terminating '\0' */
  int capacity;       /* usable chars, not counting the terminator */
} mem;

/* Called by the GC; memory from ALLOC/ALLOC_N is released with xfree. */
static void freemem(mem *m) {
  xfree(m->start);
  xfree(m);
}

static VALUE new(VALUE class, VALUE size) {
  int n;
  mem *m;
  VALUE data;

  Check_Type(size, T_FIXNUM);
  n = NUM2INT(size);
  if (n < 0) rb_raise(rb_eArgError, "negative size");
  m = ALLOC(mem);
  m->start = m->end = ALLOC_N(char, n + 1);
  *m->start = '\0';
  m->capacity = n;
  data = Data_Wrap_Struct(class, NULL, freemem, m);
  rb_obj_call_init(data, 0, NULL);
  return data;
}

static VALUE append(VALUE self, VALUE text) {
  mem *m;
  char *s;
  int old, plus, need;

  Data_Get_Struct(self, mem, m);
  s = STR2CSTR(text);
  plus = strlen(s);
  old = m->end - m->start;
  need = old + plus;
  if (need > m->capacity) {
    /* REALLOC_N may move the buffer, so rebuild end from the offset. */
    REALLOC_N(m->start, char, need + 1);
    m->capacity = need;
    m->end = m->start + old;
  }
  strcpy(m->end, s);
  m->end += plus;
  return Qnil;
}

static VALUE resize(VALUE self, VALUE size) {
  mem *m;
  int n, len;

  Data_Get_Struct(self, mem, m);
  Check_Type(size, T_FIXNUM);
  n = NUM2INT(size);
  if (n < 0) rb_raise(rb_eArgError, "negative size");
  len = m->end - m->start;
  REALLOC_N(m->start, char, n + 1);
  m->capacity = n;
  /* Clamp the length when shrinking, whether or not the buffer moved. */
  m->end = m->start + ((n < len) ? n : len);
  *m->end = '\0';
  return Qnil;
}

static VALUE flush(VALUE self) {
  mem *m;
  Data_Get_Struct(self, mem, m);
  *(m->end = m->start) = '\0';
  return Qnil;
}

static VALUE to_s(VALUE self) {
  mem *m;
  Data_Get_Struct(self, mem, m);
  return rb_str_new2(m->start);
}

static VALUE capacity(VALUE self) {
  mem *m;
  Data_Get_Struct(self, mem, m);
  return INT2NUM(m->capacity);
}

void Init_Storage() {
  cStorage = rb_define_class("Storage", rb_cObject);
  rb_define_singleton_method(cStorage, "new", new, 1);
  rb_define_method(cStorage, "append", append, 1);
  rb_define_method(cStorage, "to_s", to_s, 0);
  rb_define_method(cStorage, "resize", resize, 1);
  rb_define_method(cStorage, "flush", flush, 0);
  rb_define_method(cStorage, "capacity", capacity, 0);
}
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Gis,
Josef ‘Jupp’ Schugt
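For anyone who wants to try the interface without compiling the extension, a pure-Ruby stand-in might look like the sketch below. RubyStorage is a hypothetical name and this is not the posted C code: it mirrors the same six methods on top of an ordinary String, counts characters instead of bytes, and makes no claim about speed.

```ruby
# Hypothetical pure-Ruby stand-in for the C extension above.
class RubyStorage
  attr_reader :capacity

  def initialize(size)
    @capacity = size
    @buffer = String.new
  end

  # Append text, growing the recorded capacity when the text no longer fits.
  def append(text)
    @buffer << text
    @capacity = @buffer.length if @buffer.length > @capacity
    nil
  end

  # Set a new capacity; contents beyond it are cut off.
  def resize(size)
    @capacity = size
    @buffer = @buffer[0, size] if @buffer.length > size
    nil
  end

  # Empty the buffer but keep the capacity.
  def flush
    @buffer.clear
    nil
  end

  def to_s
    @buffer.dup
  end
end

s = RubyStorage.new(3)
s.append('foo')
s.append('bar')
s.append('baz')
puts s.to_s          # prints 'foobarbaz'
s.resize(6)
puts s.to_s          # prints 'foobar'
s.flush
puts s.to_s.length   # 0
puts s.capacity      # 6
```

The usage lines follow the example in the post, so the two versions can be compared side by side.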
"display" probably refers to the ImageMagick package. Haven't tried it
though - if I executed any random sig from Usenet, I could as well use
Outlook. ImageMagick is probably available through Cygwin.
On Fri, 13 Jun 2003 18:48:35 -0500, Shashank Date wrote:
–
Best Regards, | Hi! I’m a .signature virus. Copy me into
Sebastian | your ~/.signature to help me spread!
To me Storage is too general a name. StringBuffer?
Regards,
Robert Feldt
On Tue, 17 Jun 2003, Josef ‘Jupp’ Schugt wrote:
EXAMPLE of using ‘Storage’:
Saluton!
- Michael Neumann; 2003-06-13, 23:05 UTC:
Why not make Storage a class, so you can have more than one? Your
current implementation does not allow this.
Done.
[usage example snipped]
[some 86 lines of code]
This is a lot of work. All you need to do is expose rb_str_buf_new to
get the capa thing; matz already did the work for you.
I am too tired to check whether rb_str_append takes into account the capa or
only the length (but it should, as the capa can be bigger than the size,
otherwise it is meaningless). If it doesn't, you need to redefine #<<
(as a singleton method of the returned string); otherwise (as I believe)
you can just stay w/ #<<.
(untested)

    #include <ruby.h>

    /* String.new_with_capa(n): empty String with room for n bytes */
    static VALUE new(VALUE class, VALUE size)
    {
      long n = NUM2LONG(size);
      return rb_str_buf_new(n);
    }

    void Init_Storage()
    {
      rb_define_singleton_method(rb_cString, "new_with_capa", new, 1);
    }

and then in Ruby

    buf = String.new_with_capa(1000)
    buf << somedata
    ...etc...
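Later Ruby releases offer a built-in way to ask for a preallocated buffer, so the extension above is not needed there. A short sketch, assuming Ruby 2.4 or newer, where String.new accepts a capacity: keyword; the sample chunks are made up:

```ruby
# Assumes Ruby 2.4 or newer, where String.new accepts a capacity: keyword.
buf = String.new(capacity: 1000)   # ask for room for about 1000 bytes
%w[some data etc].each { |chunk| buf << chunk }

puts buf            # the capacity is only a hint; the string starts empty
puts buf.bytesize
```

The capacity only affects how memory is reserved; the string still grows past it if more data is appended.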
On Tue, Jun 17, 2003 at 05:59:34AM +0900, Josef ‘Jupp’ Schugt wrote:
Saluton!
EXAMPLE of using ‘Storage’:
To me Storage is too general a name. StringBuffer?
ACK
Josef ‘Jupp’ Schugt