"Randy Kramer" <rhkramer@gmail.com> schrieb im Newsbeitrag
news:200503221157.21361.rhkramer@gmail.com...
Robert,
I want to thank you for all your help, it's like having a personal
tutor!
You're welcome! I'm glad I could help by sharing my experiences.
Some feedback / observations below that don't really require any
response.
Well, some comments below nevertheless... 
> "Randy Kramer" <rhkramer@gmail.com> wrote in message
> news:200503210919.00242.rhkramer@gmail.com...
> > > Some remarks:
> > > - The comparison between 5 and 6 does not seem fair, as you iterate in 6
> > > but not in 5.
After some more testing, your remark seems more on target than I originally
thought: I can account for almost all of the 30x increase in required time
(for 6) by the additional iterations (repeated invocations of the RE engine).
It's as if the invocation is the expensive part, and whether it looks for a
pattern at one point or scans the remainder of the(se short) strings is
negligible. (See the results of tests 6d and 6e below.)
Well, that clearly shows that simple scanning with a RE is superior to
iterating and then scanning.
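The difference can be sketched roughly as follows. This is only an illustration under assumptions: the sample text and the simplified link pattern are made up for the example (they are not the pattern or data from the tests), and absolute timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Hypothetical sample text; the link patterns are simplified stand-ins,
# not the pattern used in the original tests.
text = ("some words [[Main.SomeTopic]] more words " * 200).freeze
LINK     = /\[\[[A-Z]\w*\.[A-Z]\w*\]\]/
ANCHORED = /\A\[\[[A-Z]\w*\.[A-Z]\w*\]\]/

# Variant 1: walk every index, slice, and run an anchored match on the slice.
def iterate_and_match(text)
  hits = 0
  text.length.times do |i|
    next unless text[i] == '['
    hits += 1 if text[i, text.length] =~ ANCHORED
  end
  hits
end

# Variant 2: let the regex engine scan the whole string in one call.
def single_scan(text)
  text.scan(LINK).size
end

loop_hits = iterate_and_match(text)
scan_hits = single_scan(text)

t_loop = Benchmark.realtime { 20.times { iterate_and_match(text) } }
t_scan = Benchmark.realtime { 20.times { single_scan(text) } }

puts "matches: loop=#{loop_hits} scan=#{scan_hits}"
puts format("loop: %.4fs  scan: %.4fs", t_loop, t_scan)
```

Both variants should report the same number of matches; only the way the regex engine is invoked differs.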
> A particular performance show stopper in test 6 is String#[], i.e. you
> create a new String object for each test; object creation is comparatively
> expensive even though Strings share their internal buffer. The GC has
> to be informed etc., and this is quite some overhead. If you want fast
> code, create as few instances as possible. The same holds for Java in 99%
> of all cases.
I'm surprised that Ruby creates a new String object for each test. I would
have hoped/thought that it was simply letting me "peek" at a portion of the
existing string (especially since it's only a test).
The internal buffer (the characters) is shared but there is a new Ruby
instance each time you invoke String#[]:
10.times { puts s1[2,4].id }
134979736
134979676
134979652
134979592
134979496
134979472
134979436
134979364
134979268
134979196
=> 10
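For readers on a current Ruby, where Object#id is no longer available and object_id is used instead, the same observation can be sketched like this. The string contents are a made-up example.

```ruby
s1 = "abcdefghij"   # hypothetical sample string

# Keep the slices alive so that their object ids can be compared safely.
slices = Array.new(10) { s1[2, 4] }
distinct = slices.map(&:object_id).uniq.size
puts "distinct objects: #{distinct}"

a = s1[2, 4]
b = s1[2, 4]
puts a == b        # true: same contents
puts a.equal?(b)   # false: two different String objects
```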
I presume that the
StringScanner behaves more sanely in that respect, but I guess I'll find
out. 
Never used that myself but it's sure worth a try.
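A minimal sketch of what a StringScanner-based loop might look like, using the standard strscan library. The input text and the simplified link pattern are assumptions made for this example; the sketch does not measure whether this approach allocates fewer objects.

```ruby
require 'strscan'

# Hypothetical input and a simplified link pattern, for illustration only.
text = "see [[Main.TopicOne]] and [[Main.TopicTwo]] here"
link = /\[\[([A-Z]\w*)\.([A-Z]\w*)\]\]/

scanner = StringScanner.new(text)
found = []

# scan_until advances the scan pointer past the next match, or returns nil
# when there is no further match.
while scanner.scan_until(link)
  found << [scanner[1], scanner[2]]   # capture groups of the last match
end

p found
```

The scanner keeps its own position in the string, so there is no explicit index variable and no slicing in the loop.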
Thanks for all of the following! I did substitute them in test 6 to see what
they would do.
# old (6): ~18 seconds
# with range (6a): didn't work, see below
# with upto (6b): ~14 seconds
# with times (6c): ~12 seconds
#6d: ~0.75 seconds (This is the test that convinced me the iterations are the
problem. I revised the (with times) program to only call the RE once,
although it still scans only from the start of the string. I guess I should
try test 6e with the RE not anchored.)
#6e: ~0.6 seconds (Same as 6d, except I removed the \A anchor, and now I'm
puzzled: how is this faster than the anchored version? Anyway, at this time
I don't care; I'll just "file it away" as a little anomaly to perhaps
understand some day. As I haven't run the test multiple times or similar
in an attempt to discount garbage collection, maybe that is the problem.)
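One way to reduce the influence of garbage collection and one-off noise on such small timings is to repeat each measurement and keep the fastest run. The following is only a sketch: best_of is a hypothetical helper name, and the sample data and patterns are invented for the example.

```ruby
require 'benchmark'

# Hypothetical helper: time a block several times and return the fastest run.
def best_of(runs)
  times = Array.new(runs) do
    GC.start                        # collect garbage before each timed run
    Benchmark.realtime { yield }
  end
  times.min
end

text       = "x [[Main.Topic]] y " * 1_000   # hypothetical sample data
anchored   = /\A\[\[\w+\.\w+\]\]/
unanchored = /\[\[\w+\.\w+\]\]/

t_anchored   = best_of(5) { 100.times { text =~ anchored } }
t_unanchored = best_of(5) { 100.times { text =~ unanchored } }

puts format("anchored: %.6fs  unanchored: %.6fs", t_anchored, t_unanchored)
```

If the two numbers stay close across repeated runs, a small difference like the one between 6d and 6e is probably measurement noise.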
I did create new test programs (6a, 6b, 6c) but I haven't uploaded them to the
TWiki. If you are really interested I can do that but, as I say below, I'm
not going to lose sleep over the problem with range.
For some reason that I haven't figured out (yet?), the "with range" option
didn't work. I'm not going to lose sleep over it. I did try some
troubleshooting, but it may be a rather subtle bug (or I have a very dense
head).
When I run it as part of a program (re_test_6a.rb), I get the following error
messages:
bash-2.05b$ re_test6a.rb
/re_test6a.rb:40: Invalid char `\240' in expression
/re_test6a.rb:41: Invalid char `\240' in expression
..
/re_test6a.rb:66: Invalid char `\240' in expression
bash-2.05b$
See comment below.
When I simply copy the "individual loop" part of the code (i.e., the portion
you show below under # with range) into IRB and run it (after defining
the appropriate strings), I get the following, and get kicked out of IRB.
BTW, this is the result of attempting to paste the five lines into IRB as a
group:
irb(main):021:0> (0...(s1.length-6)).each do |i|
irb(main):022:1* if s1[i] == ?[
SyntaxError: compile error
(irb):21: syntax error
from (irb):21
from (null):0
bash-2.05b$
s1="a"*10
=> "aaaaaaaaaa"
(0...(s1.length-6)).each do |i|
?> if s1[i] == ?[
puts "yes"
end
end
=> 0...4
I guess this and the other syntax error above are caused by copying and
pasting some characters outside the ASCII range. I have experienced
similar errors in the past. Sometimes they look like whitespace
characters so you don't recognize them on first sight.
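Octal 240 is byte 0xA0, which is a non-breaking space in Latin-1, so it would look like ordinary whitespace in an editor. A small sketch for locating such bytes follows. non_ascii_lines is a hypothetical helper name, and the pasted snippet is made up for the example.

```ruby
# Return [line_number, line] pairs for lines that contain bytes outside
# 7-bit ASCII.
def non_ascii_lines(source)
  source.each_line.with_index(1).select do |line, _n|
    line.bytes.any? { |b| b > 127 }
  end.map { |line, n| [n, line] }
end

# Hypothetical pasted snippet: the second line starts with byte 0xA0
# (octal 240) instead of a normal space.
pasted = "(0...4).each do |i|\n".b + "\xA0 puts i\n".b + "end\n".b

suspects = non_ascii_lines(pasted)
suspects.each do |n, line|
  puts "line #{n}: #{line.inspect}"
end
```

To check a script on disk, the same helper could be given the result of File.binread for that file.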
As I try to troubleshoot (by removing pieces from the loop), everything seems
to work OK (and I'm learning what some of those pieces do).
Anyway, since I went this far, I have uploaded programs 6a thru 6e to the
TWiki, but I am not requesting / suggesting that anyone try to spend time
debugging 6a.
RWP_RE_Tests < Wikilearn < TWiki?
> # old
> i = 0
> until i==s1.length-6 do
> if s1[i] == 91
> s1[i,s1.length] =~
> /\A\[(([A-Z]\w*)\.)?(.*)(#([A-Z]\w*))?\](\[(.*)\])?\]/
> end
> i += 1
> end
>
> # with range
> (0...(s1.length-6)).each do |i|
> if s1[i] == ?[
> s1[i,s1.length] =~
> /\A\[(([A-Z]\w*)\.)?(.*)(#([A-Z]\w*))?\](\[(.*)\])?\]/
> end
> end
>
> # with upto
> 0.upto(s1.length-7) do |i|
> if s1[i] == ?[
> s1[i,s1.length] =~
> /\A\[(([A-Z]\w*)\.)?(.*)(#([A-Z]\w*))?\](\[(.*)\])?\]/
> end
> end
>
> # with times
> (s1.length-6).times do |i|
> if s1[i] == ?[
> s1[i,s1.length] =~
> /\A\[(([A-Z]\w*)\.)?(.*)(#([A-Z]\w*))?\](\[(.*)\])?\]/
> end
> end
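The four loop forms above are meant to visit the same indices, which can be checked with a short sketch (the string is a made-up example). One side note for readers on newer Rubies: since Ruby 1.9, s1[i] and ?[ are both one-character Strings, so the comparison s1[i] == ?[ still works, whereas s1[i] == 91 only matches on Ruby 1.8, where both were integers.

```ruby
s1 = "a" * 10            # hypothetical sample string
limit = s1.length - 6

# "old" style: manual counter with until
via_until = []
i = 0
until i == limit
  via_until << i
  i += 1
end

via_range = (0...limit).to_a          # with range (exclusive end)
via_upto  = 0.upto(limit - 1).to_a    # with upto (inclusive end, hence -1)
via_times = limit.times.to_a          # with times

p via_range
same = [via_until, via_upto, via_times].all? { |list| list == via_range }
puts same
```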
For anyone following along, my next efforts are going to be focused on
StringScanner and then making the necessary substitutions. In parallel I
will probably try to refine the REs.
Please let me/us know how that works out.
The remainder of this looks useful as well!
regards,
Randy Kramer
> Hm, if you know that the size of files is limited (i.e. something like
> just a few KB) then it's usually worth slurping in the whole file with
> something like this
>
> contents = File.open(f){|io| io.read}
>
> and then iterate through the whole thing with #scan. You can still use ^
> to anchor at line beginnings.
>
> # get the initial sequence until the first non whitespace
> # just an example
> contents.scan(/^\s+\S/) do |m|
> p m
> end
Kind regards
robert
On Monday 21 March 2005 09:54 am, Robert Klemme wrote:
> > On Monday 21 March 2005 04:44 am, Robert Klemme wrote: