Help me understand why the Ruby block version is slower than the version without a block

I just wrote my first Ruby script. I'm an experienced C and Perl
programmer, so please let me know if it looks too much like those
languages and not enough like Ruby. I've got a 100K word list (the Linux
dictionary) on my Mac. I open it and look for any words that are exactly
10 letters long with no repeated letters (for example, 'profligate\n'
has length 11 including the newline, so it is a match). After I wrote my
first version I did some playing. I first saw that the Array class mixes
in Enumerable and that I could use the to_a call from there, but a quick
check using -r profile showed that my original call to split was a much
quicker way to convert a string to an array. I then tried putting the
File.open in a block and found that this was much slower, even if I
subtract out the time for the open, which I assume is an error in how
the profiler counts total time.

Here's the faster version:

f = File.open("./words")
begin
  while f.gets
    if $_.length == 11
      ar = $_.split(//)
      if ar.uniq! == nil
        print "#{ar.to_s}"
      end
    end
  end
rescue EOFError
  f.close
end

And here's the slower block version:

File.open("./words") { |f|
  while f.gets
    if $_.length == 11
      ar = $_.split(//)
      if ar.uniq! == nil
        print "#{ar.to_s}"
      end
    end
  end
}

Again, the words file is just a list of about 100K unique words from the
dict command or similar on *nix....

Any critique welcome and enlightenment is encouraged.
Thanks!

···

--
Posted via http://www.ruby-forum.com/.

File.open("wordlist") { |f|
  while w = f.gets
     puts w if w.size==11 && w.split(//).uniq.size == 11
  end
}

I'm guessing that

  print "#{ar.to_s}"

is what is slowing you down. It results in converting
each element of the array into a string (at least 11
extra method calls) and then concatenating the results.
Kind of a waste when you've got the result already sitting in $_.

Also, calling to_s to convert an object to a string within a string interpolation
block is redundant.

  print "#{ar}"

works and then you realize that you don't need the interpolation so

  print ar

is even better. Understanding this is what David Black called a
'Ruby right of passage'. At least I think it was David who said that
recently. I'm too lazy to google for the reference at the moment.

Gary Wright
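
To illustrate the point about the redundant to_s: string interpolation already calls to_s on whatever sits inside #{}. The sketch below uses a made-up Word class (not from the thread) so the result does not depend on how a given Ruby formats arrays. That matters because Array#to_s joined the elements in Ruby 1.8 but returns the inspect form in 1.9 and later, so print ar produces different output on a current interpreter.

```ruby
# Hypothetical class for illustration; only its to_s matters here.
class Word
  def initialize(text)
    @text = text
  end

  def to_s
    @text
  end
end

w = Word.new("profligate")

explicit     = "#{w.to_s}"  # to_s written out by hand inside the interpolation
interpolated = "#{w}"       # interpolation calls to_s for us
direct       = w.to_s       # no interpolation needed at all

puts explicit == interpolated && interpolated == direct
```

All three expressions build the same string, so the shortest one that reads well is usually the one to keep.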

···

On Mar 10, 2006, at 5:57 PM, Alan Burch wrote:

f = File.open("./words")
begin
  while f.gets
    if $_.length == 11
      ar = $_.split(//)
      if ar.uniq! == nil
        print "#{ar.to_s}"
      end
    end
  end
rescue EOFError
  f.close
end

This sounds like premature optimization. Remember, you start worrying about speed when the code gets too slow. Not before.

James Edward Gray II

···

On Mar 10, 2006, at 4:57 PM, Alan Burch wrote:

I first saw
that the array class mixed in enumerable and that I could use the to_a
call from there, but a quick check using -r profile showed that my
original call to split was a much quicker way to convert from a string
to an array.

IO.foreach('words'){|s|puts s if s=~/(?!.*(.).*\1)^.{10}$/}
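
The one-liner packs both tests into a single pattern. Here is a spelled-out sketch of the same regex using Ruby's extended (x) syntax so each part can carry a comment. The constant and method names are made up for illustration.

```ruby
TEN_DISTINCT = /
  (?!.*(.).*\1)   # negative lookahead: give up if any character occurs twice
  ^.{10}$         # exactly ten characters between line start and line end
/x

def ten_distinct?(line)
  !!(line =~ TEN_DISTINCT)
end

puts ten_distinct?("profligate\n")    # ten different letters
puts ten_distinct?("bookkeeper\n")    # ten letters, but some repeat
puts ten_distinct?("abcdefghijk\n")   # eleven different letters
```

Because . does not match a newline by default, the trailing "\n" from the word list does not count as a character in either half of the pattern.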

File.open("wordlist") { |f|
  while w = f.gets
     puts w if w.size==11 && w.split(//).uniq.size == 11
  end
}

Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
Others, any comments on the block slow down?
AB

···

Hi --

···

On Sat, 11 Mar 2006, gwtmp01@mac.com wrote:

Also, calling to_s to convert an object to a string within a string interpolation
block is redundant.

  print "#{ar}"

works and then you realize that you don't need the interpolation so

  print ar

is even better. Understanding this is what David Black called a
'Ruby right of passage'. At least I think it was David who said that
recently. I'm too lazy to google for the reference at the moment.

You're right but spelled rite wrong, Wright :)

David

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" chapters now available
from Manning Early Access Program! Ruby for Rails

That's what the foreach() iterator is for:

File.foreach("wordlist") do |word|
   puts word if word.chomp.split("").uniq.size == 10
end

James Edward Gray II

···

On Mar 10, 2006, at 5:28 PM, William James wrote:

File.open("wordlist") { |f|
  while w = f.gets
     puts w if w.size==11 && w.split(//).uniq.size == 11
  end
}

James Gray wrote:

I first saw
that the array class mixed in enumerable and that I could use the to_a
call from there, but a quick check using -r profile showed that my
original call to split was a much quicker way to convert from a string
to an array.

This sounds like premature optimization. Remember, you start
worrying about speed when the code gets too slow. Not before.

James Edward Gray II

James:
I'm going to have to respectfully disagree. I guess maybe I'm getting
too old to code, but I first learned assembler and then C. Assembler
served me well in that I knew how to write the fastest, least resource
intensive C. Back in the early 80s on a VAX running V7 UNIX that was
more important than maintainability. As I've continued my craft and
learned many other languages, I've found that truly understanding what's
happening "under the hood" of any language was the key to writing code
that didn't break, executed quickly, and kept clients happy.
Furthermore, there's something in me that makes me better love the
language when I completely master it. I don't believe the language is
mastered until one understands things such as why one construct executes
quicker than another. I can now picture how a mix-in works and why
calling the to_a mix-in is a slower construct. I don't understand all
the nuances of that yet, but I intend to and that will make Ruby that
much more enjoyable to me.
Thanks for a different insight,
Alan

···

On Mar 10, 2006, at 4:57 PM, Alan Burch wrote:

Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
Others, any comments on the block slow down?
AB

I mis-spoke. Not a factor of 10 faster, just marginally. I had
"wordlist" in my directory as a list of the unique 10 letter words.
I do like the code better still, but without the block, it's still much
faster. Also using uniq! rather than size is quicker than taking the
size twice.

My current fastest script:

f= File.open("./words")
begin
  while w = f.gets
    puts w if w.size == 11 && w.split(//).uniq! == nil
  end
rescue EOFError
  f.close
end

Not measurably faster than the first one, but seems better and more Ruby
like to me.
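
For anyone following along, the uniq! == nil test relies on documented behaviour: Array#uniq! returns nil when it removed nothing, and the modified array otherwise. A small sketch with illustrative variable names, using nil? as another way to spell == nil:

```ruby
no_dupes   = "profligate".split(//)
with_dupes = "bookkeeper".split(//)

first  = no_dupes.uniq!     # nothing removed, so the result is nil
second = with_dupes.uniq!   # duplicates removed, result is the array itself

puts first.nil?             # true
puts second.join            # bokepr
```

The non-bang uniq always returns an array, which is why the other versions in this thread compare sizes instead.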

···

Alan Burch <orotone@gmail.com> writes:

File.open("wordlist") { |f|
  while w = f.gets
     puts w if w.size==11 && w.split(//).uniq.size == 11
  end
}

Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
Others, any comments on the block slow down?

I don't see much of a slowdown.

···

----------------------------------------------------------------------

g@crash:~/tmp$ cat read-slow.rb
File.open("./words") { |f|
  while f.gets
    if $_.length == 11
      ar = $_.split(//)
      if ar.uniq! == nil
        print "#{ar.to_s}"
      end
    end
  end
}
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.56user 0.01system 0:02.64elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.55user 0.01system 0:02.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.54user 0.01system 0:02.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ cat read-fast.rb
f = File.open("./words")
begin
  while f.gets
    if $_.length == 11
      ar = $_.split(//)
      if ar.uniq! == nil
        print "#{ar.to_s}"
      end
    end
  end
rescue EOFError
  f.close
end
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.51user 0.01system 0:02.54elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.50user 0.01system 0:02.56elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.51user 0.01system 0:02.53elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps

----------------------------------------------------------------------

There's a bit of a slowdown, but note that in your "fast" algo, the
stream is never closed, since IO#gets never raises EOFError; it returns
nil at end of file. Do `ri IO#gets' for the method's documentation. :)
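
A minimal sketch of that point, using a throwaway Tempfile in place of the real word list (the file contents and variable names are assumptions for illustration). It shows gets returning nil at end of file, the rescue branch never running, and two ways to get the file closed anyway.

```ruby
require "tempfile"

# Throwaway stand-in for the real word list.
tmp = Tempfile.new("words")
tmp.write("one\ntwo\n")
tmp.close

# begin/rescue form from the thread: the rescue branch never runs.
f = File.open(tmp.path)
begin
  while f.gets
  end
rescue EOFError
  f.close
end
at_eof = f.gets                # nil, not an exception
open_after_loop = !f.closed?   # the handle is still open here
f.close

# Block form: File.open closes the file when the block returns.
handle = nil
File.open(tmp.path) { |g| handle = g; g.gets }
closed_after_block = handle.closed?

# ensure form: closes the file without needing a block.
h = File.open(tmp.path)
begin
  h.gets
ensure
  h.close
end

puts at_eof.inspect, open_after_loop, closed_after_block, h.closed?
tmp.unlink
```

So part of what the block version does extra is the close that the begin/rescue version silently skips.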

Another speedup: replace:

  w.split(//).uniq.size == 11

with:

  w !~ /(.).*\1/

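It's faster since there's less intermediate diddlage, but
theoretically it shouldn't scale as well: the regex may rescan the rest
of the word for every letter, while uniq can do its work in roughly one
pass. You'd have to increase your "11" quite a lot to notice it though,
I think.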

More shell dump.

----------------------------------------------------------------------

g@crash:~/tmp$ cat read-one.rb
File.open("words") { |f|
  while w = f.gets
     puts w if w.size==11 && w.split(//).uniq.size == 11
  end
}
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.54user 0.02system 0:02.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.54user 0.01system 0:02.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.55user 0.01system 0:02.58elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ cat read-two.rb
File.open("words") { |f|
  while w = f.gets
    puts w if w.size==11 && w !~ /(.).*\1/
  end
}
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.23user 0.01system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.27user 0.01system 0:01.29elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.27user 0.02system 0:01.30elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$
g@crash:~/tmp$
g@crash:~/tmp$ diff out-one out-two
g@crash:~/tmp$

----------------------------------------------------------------------

James Gray wrote:

File.open("wordlist") { |f|
  while w = f.gets
     puts w if w.size==11 && w.split(//).uniq.size == 11
  end
}

That's what the foreach() iterator is for:

File.foreach("wordlist") do |word|
   puts word if word.chomp.split("").uniq.size == 10
end

James Edward Gray II

James:
This code doesn't work on my Mac. I do have a version that uses the
file block and each/foreach above, but I'm suspecting that when the
string becomes an array after the split something's breaking down as I
get words of all sizes out???
Thanks,
Alan

···

On Mar 10, 2006, at 5:28 PM, William James wrote:

Well, I'm pretty darn sure you are in the minority on that one: ;)

http://www.google.com/search?q="premature+optimization"

James Edward Gray II

···

On Mar 11, 2006, at 9:58 AM, Alan Burch wrote:

James Gray wrote:

On Mar 10, 2006, at 4:57 PM, Alan Burch wrote:

I first saw
that the array class mixed in enumerable and that I could use the to_a
call from there, but a quick check using -r profile showed that my
original call to split was a much quicker way to convert from a string
to an array.

This sounds like premature optimization. Remember, you start
worrying about speed when the code gets too slow. Not before.

James Edward Gray II

James:
I'm going to have to respectfully disagree.

Alan Burch wrote:

I mis-spoke. Not a factor of 10 faster, just marginally. I had
"wordlist" in my directory as a list of the unique 10 letter words.
I do like the code better still, but with out the block, it's still much
faster. Also using uniq! rather than size is quicker than taking the
size twice.

Solely for my own amusement, since I'm still trying to teach myself Ruby...

File.open("./words").read.split.collect! {|x| x if x.length == 10 &&
x.split(//).uniq! == nil}.compact!.each {|x| puts x }
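
A possible variant of the chain above, as a sketch only. select keeps the matching words directly, which avoids the collect!/compact! pair (compact! returns nil when it has nothing to remove, so the chain would break if every word matched). The method name and sample text are made up; with the real file, File.read("./words") would supply the text and also close the file.

```ruby
# Returns the ten-letter words in text that have no repeated letter.
def ten_letter_isograms(text)
  text.split.select { |w| w.length == 10 && w.split(//).uniq!.nil? }
end

sample = "profligate\nbookkeeper\ncat\nbackground\n"
puts ten_letter_isograms(sample)
```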

···

George Ogata wrote:

Another speedup: replace:

  w.split(//).uniq.size == 11

with:

  w !~ /(.).*\1/

It's faster since there's less intermediate diddlage, but
theoretically it shouldn't scale as well. You'd have to increase your
"11" quite a lot to notice it though I think.

George:
Much thanks, I think that you've proved what I suspected, that Ruby is
counting the time wrong with the profile (ruby -r profile script.rb) as
when I subtract the profile time for the File.open block it's only a bit
slower than the faster call. I appreciate all the help and will try to
ask a more difficult question next time.
I've always been fairly strong with regexes, but I'd have never thought
to use one here. Thanks for that as well.

David:
Thanks for chiming in, I'll check out your links as well.

Alan

···

I'm curious why you see it so? Personally, seems less Ruby-like to me.

--Steve

···

On Mar 10, 2006, at 4:08 PM, Alan Burch wrote:

My current fastest script:

f= File.open("./words")
begin
  while w = f.gets
    puts w if w.size == 11 && w.split(//).uniq! == nil
  end
rescue EOFError
  f.close
end

Not measurably faster than the first one, but seems better and more Ruby
like to me.

!? :) How on earth does that work? Every time I think I've sort of got the hang of regexps, they spring something new on me.

I was also going to ask why everyone was doing "split( // )" instead of "split( '' )"?
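
On both questions, a small sketch (the constant and method names are made up here, not from the thread). The pattern captures one character, skips ahead, and then demands the same character again, so it only matches when some letter repeats; !~ flips that into "no letter repeats". As for split, an empty regex and an empty string both break a string into single characters.

```ruby
# (.)  captures any single character (not a newline, by default)
# .*   skips over zero or more characters
# \1   must match the captured character again
REPEATED_CHAR = /(.).*\1/

def all_distinct?(word)
  word !~ REPEATED_CHAR   # !~ is true when the pattern does not match
end

puts all_distinct?("profligate\n")   # true
puts all_distinct?("bookkeeper\n")   # false

word = "profligate"
puts word.split(//) == word.split("")   # true
```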

- oooh, coffee's ready...

Cheers,
  Benjohn

···

On 11 Mar 2006, at 02:13, George Ogata wrote:

Another speedup: replace:

  w.split(//).uniq.size == 11

with:

  w !~ /(.).*\1/

Something is fishy there, for it works just fine on my own Mac:

Neo:~/Desktop$ ls
tens.rb wordlist
Neo:~/Desktop$ cat wordlist
one
two
three
0123456789
five
0123456789
Neo:~/Desktop$ cat tens.rb
#!/usr/local/bin/ruby -w

File.foreach("wordlist") do |word|
    puts word if word.chomp.split("").uniq.size == 10
end

__END__
Neo:~/Desktop$ ruby tens.rb
0123456789

James Edward Gray II

···

On Mar 11, 2006, at 10:09 AM, Alan Burch wrote:

This code doesn't work on my Mac.

Doesn't James Gray's code print out every word that contains exactly 10 different letters, whatever its length? For example:

abbreviations - 13 characters + \n, but because the size wasn't checked before splitting, this boils down to 10 different characters.

irb(main):001:0> s = 'abbreviations'
=> "abbreviations"
irb(main):002:0> s.split('').uniq
=> ["a", "b", "r", "e", "v", "i", "t", "o", "n", "s"]
irb(main):003:0> s.split('').uniq.size
=> 10

Interesting. I crudely benchmarked this (using time on my mac):

#!/usr/bin/env ruby

File.foreach("K6wordlist.txt") do |word|
    # puts word if word.size==11 && word.split(//).uniq.size == 11
      puts word if word.length == 11 and word.chomp.split(//).uniq.size == 10
    # puts word if word.length == 11 and not word =~ /(.).*\1/
end

and then ran each of the three, sending output to /dev/null (after checking that they all worked the same on my test file). In order:

real 0m0.347s
user 0m0.294s
sys 0m0.017s

real 0m0.334s
user 0m0.288s
sys 0m0.018s

real 0m0.177s
user 0m0.137s
sys 0m0.015s
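
For anyone who wants to repeat this kind of crude comparison inside one script, here is a sketch using the monotonic clock. The word list is synthetic and the labels are made up, so the timings it prints say nothing about the numbers above; the useful part is that it also checks the three tests agree.

```ruby
# Synthetic stand-in for a word list: five words repeated many times.
WORDS = (%w[profligate bookkeeper abbreviations cat background] * 20_000)
        .map { |w| w + "\n" }

CHECKS = {
  "split/uniq, size 11" => ->(w) { w.size == 11 && w.split(//).uniq.size == 11 },
  "chomp/split/uniq"    => ->(w) { w.length == 11 && w.chomp.split(//).uniq.size == 10 },
  "regex"               => ->(w) { w.length == 11 && w !~ /(.).*\1/ },
}

# Runs the block and returns [block result, elapsed seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

results = CHECKS.map do |name, check|
  matches, seconds = timed { WORDS.select(&check) }
  puts format("%-20s %6d matches  %.3fs", name, matches.size, seconds)
  matches
end

puts "all three agree: #{results.uniq.size == 1}"
```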

There may be interesting behaviour if the last line in the file doesn't have a trailing \n, I would probably go for something more like

File.foreach("K6wordlist.txt") do |word|
    word.chomp!
    puts word if word.length == 10 and not word =~ /(.).*\1/
end

(timing intentionally omitted :)
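
A quick sketch of the trailing-newline case, using StringIO in place of a real file (the sample data is made up). The last word has no "\n", so a length test of 11 on the raw line misses it, while chomping first and testing for 10 finds it.

```ruby
require "stringio"

data = "abbreviations\nprofligate"   # note: no "\n" after the final word

# Test the raw line, newline included.
raw = StringIO.new(data).each_line.select do |w|
  w.length == 11 && w !~ /(.).*\1/
end

# Strip the line ending first, then test the bare word.
chomped = StringIO.new(data).each_line.map(&:chomp).select do |w|
  w.length == 10 && w !~ /(.).*\1/
end

p raw       # []
p chomped   # ["profligate"]
```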

Mike

···

On 11-Mar-06, at 11:09 AM, Alan Burch wrote:

James Gray wrote:

On Mar 10, 2006, at 5:28 PM, William James wrote:

File.open("wordlist") { |f|
  while w = f.gets
     puts w if w.size==11 && w.split(//).uniq.size == 11
  end
}

That's what the foreach() iterator is for:

File.foreach("wordlist") do |word|
   puts word if word.chomp.split("").uniq.size == 10
end

James Edward Gray II

James:
This code doesn't work on my Mac. I do have a version that uses the
file block and each/foreach above, but I'm suspecting that when the
string becomes an array after the split something's breaking down as I
get words of all sizes out???

--

Mike Stok <mike@stok.ca>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.

Well, in this case, being in the majority doesn't necessarily make you right. Like many things, I think we've got several shades of gray here... err... Gray? :) I'm all for not prematurely optimizing. But in this case, Alan is attempting to better understand Ruby's inner workings, which is a perfectly fine reason for playing with performance.

Additionally, the "no premature optimization ideal" is often taken a little too far. I intentionally call it an "ideal". I work on video games. A good portion of our job is optimization. If we didn't do *some* premature optimization, we'd be in bad shape.

--Steve

···

On Mar 11, 2006, at 8:47 AM, James Edward Gray II wrote:

This sounds like premature optimization. Remember, you start
worrying about speed when the code gets too slow. Not before.

I'm going to have to respectfully disagree.

Well, I'm pretty darn sure you are in the minority on that one: :wink: