I ran the profiler on my program and it said that 51% of the time is spent
in Array#each. Is there any way to speed this up (like using
Array#each_index perhaps)?
And is there a quicker way to do a .gsub if I'm using just strings and not a
regexp?
By the way, the program compares 2 databases and determines which records
have been inserted and deleted, and each database has about 15,000 records
to compare.
Mike Steiner
Mike Steiner wrote:
I ran the profiler on my program and it said that 51% of the time is
spent
in Array#each. Is there any way to speed this up (like using
Array#each_index perhaps)?
And is there a quicker way to do a .gsub if I'm using just strings and
not a
regexp?
By the way, the program compares 2 databases and determines which
records
have been inserted and deleted, and each database has about 15,000
records
to compare.
Mike Steiner
No. I don't think you can speed up Array#each at all. What's far more
promising is improving your algorithm in a way that it has to run less
loops. You most likely have one or even several O(n^x) algorithms with x
= 2 in your app. Maybe you're not even aware. E.g. if you have an each
with a select in the loop you're already in O(n^2) realm.
The database comparison probably can be handled best with hashtables of
the primary keys.
Regards
Stefan
···
--
Posted via http://www.ruby-forum.com/\.
I ran the profiler on my program and it said that 51% of the time is spent
in Array#each. Is there any way to speed this up (like using
Array#each_index perhaps)?
The time spent on Array#each itself is minimal and can't really be sped up much. What you're really seeing is the time spent in the block's code. If you have a bunch of each calls all over, you can refactor each one to call out to a method:
ary.each { ... } => def x; ...; end; ary.each { x }
That way you can get better accounting on which block is taking up the most time.
And is there a quicker way to do a .gsub if I'm using just strings and not a
regexp?
Not by much. I will say (based on a subsequent email of yours on this thread) that you shouldn't have to be doing these gsubs just because you're on windows. IIRC, ruby should open files with either form of path separator.
By the way, the program compares 2 databases and determines which records
have been inserted and deleted, and each database has about 15,000 records
to compare.
15k records to compare isn't that much, honestly.
One thing I would suggest is for you to use an alternative profiler. shugo's prof or my zenprofile gives much more accurate profiles where lots and lots of method calls are involved. The beauty of the default profiler is how small and easy it is to write. The pain is that it is biased towards # of calls, not time spent.
···
On May 23, 2007, at 14:59 , Mike Steiner wrote:
I ran the profiler on my program and it said that 51% of the time is spent
in Array#each. Is there any way to speed this up (like using
Array#each_index perhaps)?
Mostly what the profiler is telling you is "don't do that".
Which sometimes isn't very helpful.
The thing to do rather than ask "is there something faster than
Array#each?" rather ask is there a way I can rewrite my algorithm so I
don't called Array#each so much?
I tend to not look at those entries in the profiler output and skip
downwards until I find the first method I wrote and optimize that.
And is there a quicker way to do a .gsub if I'm using just strings and not a
regexp?
I don't know exactly what you're doing but String = does a lot more
than I would naively think.
------------------------------------------------------------- String#=
str[fixnum] = fixnum
str[fixnum] = new_str
str[fixnum, fixnum] = new_str
str[range] = aString
str[regexp] = new_str
str[regexp, fixnum] = new_str
str[other_str] = new_str
···
On Thu, 24 May 2007, Mike Steiner wrote:
------------------------------------------------------------------------
Element Assignment---Replaces some or all of the content of _str_.
The portion of the string affected is determined using the same
criteria as +String#+. If the replacement string is not the same
length as the text it is replacing, the string will be adjusted
accordingly. If the regular expression or string is used as the
index doesn't match a position in the string, +IndexError+ is
raised. If the regular expression form is used, the optional second
+Fixnum+ allows you to specify which portion of the match to
replace (effectively using the +MatchData+ indexing rules. The
forms that take a +Fixnum+ will raise an +IndexError+ if the value
is out of range; the +Range+ form will raise a +RangeError+, and
the +Regexp+ and +String+ forms will silently ignore the
assignment.
By the way, the program compares 2 databases and determines which records
have been inserted and deleted, and each database has about 15,000 records
to compare.
Have a look at
http://rubygarden.org:3000/Ruby/page/show/RubyOptimization
for more info.
If you are still stuck, and it doesn't reveal any confidential
details, send the top 50 lines profiler output to the group and ask
for suggestions.
John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand
I was going to suggest something even more primitive -- query each
database table in some sorted fashion and dump it as a CSV file. Use
diff to compare CSV files. Obviously this won't work for tables with
blobs, but it might be a fun experiment.
Failing that, how can the database do the work for you? Let it find
the records you're interested in (it will do it faster). Put
modification timestamps in your tables to reduce the number of records
you need to search. Or use triggers to log all transactions to a
separate table so you can run those same transactions on the other
database.
Just some thoughts.
Blessings,
TwP
···
On 5/23/07, Stefan Rusterholz <apeiros@gmx.net> wrote:
No. I don't think you can speed up Array#each at all. What's far more
promising is improving your algorithm in a way that it has to run less
loops. You most likely have one or even several O(n^x) algorithms with x
>= 2 in your app. Maybe you're not even aware. E.g. if you have an each
with a select in the loop you're already in O(n^2) realm.
The database comparison probably can be handled best with hashtables of
the primary keys.
Mike Steiner wrote:
I ran the profiler on my program and it said that 51% of the time is spent
in Array#each. Is there any way to speed this up (like using
Array#each_index perhaps)?
Note that the profiler output is often irritating because time for each also includes all the time spend in the block - and while #each is invoked only once the block is invoked n times.
And is there a quicker way to do a .gsub if I'm using just strings and not a regexp?
Quicker than what? Are you using the block form of gsub? Are you using gsub! or gsub? Can you show some code?
By the way, the program compares 2 databases and determines which records
have been inserted and deleted, and each database has about 15,000 records
to compare.
15,000 does not sound like much. What is the runtime of the program without profile?
And another idea: could be that you are using inefficient methods to test for existence. Finding something in an Array is always O(n) unless you have additional constraints; i.e. array content is sorted in which case you can use binary search which is O(log(n)). It seems you want to be doing set operations - so why don't you try to use Set for which member tests are far more efficient (O(1) because it's using hashing internally).
Kind regards
robert
···
On 24.05.2007 00:15, Stefan Rusterholz wrote:
One of the things I'm doing is a lot of File.join's, and I have to do a gsub
( /\// , "\\" ) after each one (because I'm running under Windows).
I can't use sets because I have duplicate records. There is no unique record
key, so for each record (which I use as a key in a hash), I have an array
listing the files it appears in (which can include the same file more than
once).
Thanks for the help!
Mike Steiner
···
On 5/24/07, Robert Klemme <shortcutter@googlemail.com> wrote:
On 24.05.2007 00:15, Stefan Rusterholz wrote:
> Mike Steiner wrote:
>> I ran the profiler on my program and it said that 51% of the time is
>> spent
>> in Array#each. Is there any way to speed this up (like using
>> Array#each_index perhaps)?
Note that the profiler output is often irritating because time for each
also includes all the time spend in the block - and while #each is
invoked only once the block is invoked n times.
>> And is there a quicker way to do a .gsub if I'm using just strings and
>> not a regexp?
Quicker than what? Are you using the block form of gsub? Are you using
gsub! or gsub? Can you show some code?
>> By the way, the program compares 2 databases and determines which
>> records
>> have been inserted and deleted, and each database has about 15,000
>> records
>> to compare.
15,000 does not sound like much. What is the runtime of the program
without profile?
And another idea: could be that you are using inefficient methods to
test for existence. Finding something in an Array is always O(n) unless
you have additional constraints; i.e. array content is sorted in which
case you can use binary search which is O(log(n)). It seems you want to
be doing set operations - so why don't you try to use Set for which
member tests are far more efficient (O(1) because it's using hashing
internally).
Kind regards
robert
Mike Steiner wrote:
One of the things I'm doing is a lot of File.join's, and I have to do a
gsub
( /\// , "\\" ) after each one (because I'm running under Windows).
Are you using the strings generated by File.join outside of ruby?
If not you can simply use them without gsub.
Stefan
Can you be a bit more specific on what exactly you are doing?
robert
···
2007/5/24, Mike Steiner <mikejaysteiner@gmail.com>:
One of the things I'm doing is a lot of File.join's, and I have to do a gsub
( /\// , "\\" ) after each one (because I'm running under Windows).
I can't use sets because I have duplicate records. There is no unique record
key, so for each record (which I use as a key in a hash), I have an array
listing the files it appears in (which can include the same file more than
once).
--
Have a look: Robert K. | Flickr