Fastest way to reverse-process a string?

Mehr_Assaph_Assaph · 4 August 2004 00:00

In a ruby program I'm writing, I want to parse a string in reverse
and feed each of its individual characters through a state machine.
A "hammer and tongs" solution looks like this:
string.reverse.split(//).each {
>ch>
feedIntoStateMachine(ch)
}
However, this seems rather inefficient. Does anyone know of a faster
algorithm for this in ruby?

Have you tried benchmark?

require 'benchmark'
include Benchmark

str = "Hammer and tongs"

bm(7) do |x|
  x.report("1") { 100000.times { str.reverse.split(//).each { |ch| ch }
} }
  x.report("2") { 100000.times { str.reverse.split('').each { |ch| ch }
} }
  x.report("3") { 100000.times { str.reverse.each_byte { |ch| ch.chr } }
}
end

user system total real
1 4.094000 0.000000 4.094000 ( 4.109000)
2 4.125000 0.015000 4.140000 ( 4.141000)
3 2.375000 0.000000 2.375000 ( 2.391000)

Lloyd_Zusman · 4 August 2004 00:31

"Mehr, Assaph (Assaph)" <assaph@avaya.com> writes:

In a ruby program I'm writing, I want to parse a string in reverse
and feed each of its individual characters through a state machine.
A "hammer and tongs" solution looks like this:
string.reverse.split(//).each {
>ch>
feedIntoStateMachine(ch)
}
However, this seems rather inefficient. Does anyone know of a faster
algorithm for this in ruby?

Have you tried benchmark?

I can only benchmark algorithms when I have more than one algorithm to
compare. I only thought of the one algorithm that I mentioned in my
original post, and so I had nothing to benchmark it against.

I can easily do the benchmarking, and I was not asking anyone to do it
for me. I wrote my email because I was looking for algorithm
suggestions, such as the third one that you mentioned. I was previously
unaware of 'each_byte'. I also didn't think of trying the split('')
version.

Thank you for these suggestions.

Can anyone think of other ways to do this? Is there something really
clever that doesn't require 'reverse', perhaps?

I'll benchmark any and all things that I come across. And I'll be
digging around myself, as well.

Thanks again to all.

···

require 'benchmark'
include Benchmark

str = "Hammer and tongs"

bm(7) do |x|
  x.report("1") { 100000.times { str.reverse.split(//).each { |ch| ch }
} }
  x.report("2") { 100000.times { str.reverse.split('').each { |ch| ch }
} }
  x.report("3") { 100000.times { str.reverse.each_byte { |ch| ch.chr } }
}
end

             user system total real
1 4.094000 0.000000 4.094000 ( 4.109000)
2 4.125000 0.015000 4.140000 ( 4.141000)
3 2.375000 0.000000 2.375000 ( 2.391000)

--
Lloyd Zusman
ljz@asfast.com
God bless you.

Robert · 4 August 2004 07:36

"Lloyd Zusman" <ljz@asfast.com> schrieb im Newsbeitrag
news:m3wu0f690d.fsf@asfast.com...

"Mehr, Assaph (Assaph)" <assaph@avaya.com> writes:

>> In a ruby program I'm writing, I want to parse a string in reverse
>> and feed each of its individual characters through a state machine.
>> A "hammer and tongs" solution looks like this:
>> string.reverse.split(//).each {
>> >ch>
>> feedIntoStateMachine(ch)
>> }
>> However, this seems rather inefficient. Does anyone know of a faster
>> algorithm for this in ruby?
>
> Have you tried benchmark?

I can only benchmark algorithms when I have more than one algorithm to
compare. I only thought of the one algorithm that I mentioned in my
original post, and so I had nothing to benchmark it against.

I can easily do the benchmarking, and I was not asking anyone to do it
for me. I wrote my email because I was looking for algorithm
suggestions, such as the third one that you mentioned. I was previously
unaware of 'each_byte'. I also didn't think of trying the split('')
version.

Thank you for these suggestions.

Can anyone think of other ways to do this? Is there something really
clever that doesn't require 'reverse', perhaps?

require 'benchmark'
include Benchmark

str = "Hammer and tongs"

bm(7) do |x|
  x.report("1") { 100000.times { str.reverse.split(//).each { |ch|
ch } } }
  x.report("2") { 100000.times { str.reverse.split('').each { |ch|
ch } } }
  x.report("3") { 100000.times { str.reverse.each_byte { |ch| ch.chr } } }
  x.report("4") { 100000.times { (str.length-1).downto(0) {|idx| ch =
str[idx]} } }
  x.report("5") { 100000.times { len = str.size; for idx in 1..len ; ch =
str[len-idx] end } }
end

user system total real
1 6.828000 0.000000 6.828000 ( 6.907000)
2 6.891000 0.000000 6.891000 ( 6.936000)
3 5.078000 0.000000 5.078000 ( 5.098000)
4 2.281000 0.000000 2.281000 ( 2.287000)
5 3.391000 0.000000 3.391000 ( 3.395000)

=> 4

robert

Lloyd_Zusman · 4 August 2004 09:31

First of all, I very much thank all of you who responded to my query. I've
read all of your useful and helpful replies, but I am only responding to
this final one, so as to minimize bandwidth.

"Robert Klemme" <bob.news@gmx.net> writes:

[ ... ]

require 'benchmark'
include Benchmark

str = "Hammer and tongs"

bm(7) do |x|
  x.report("1") { 100000.times { str.reverse.split(//).each { |ch|
ch } } }
  x.report("2") { 100000.times { str.reverse.split('').each { |ch|
ch } } }
  x.report("3") { 100000.times { str.reverse.each_byte { |ch| ch.chr } } }
  x.report("4") { 100000.times { (str.length-1).downto(0) {|idx| ch =
str[idx]} } }
  x.report("5") { 100000.times { len = str.size; for idx in 1..len ; ch =
str[len-idx] end } }
end
             user system total real
1 6.828000 0.000000 6.828000 ( 6.907000)
2 6.891000 0.000000 6.891000 ( 6.936000)
3 5.078000 0.000000 5.078000 ( 5.098000)
4 2.281000 0.000000 2.281000 ( 2.287000)
5 3.391000 0.000000 3.391000 ( 3.395000)

I have been so steeped in Ruby-ish thinking that I had completely
forgotten about array indexing. Duh!

And being so reminded, at first I would not have suspected that the
algorithms based on indexing would be faster than the others. But I
suspect that the absence of 'reverse' is what accounts for the speed
differences.

By the way, I tried this with longer strings: between 40 and 8192 bytes,
which are the most likely lengths for the strings I will be dealing
with. As the lengths increase beyond something like 50-60 bytes,
algorithm number 5 becomes faster than algorithm 4, and the difference
in speeds keeps increasing with longer strings.

But if I replace "ch.chr" in algorithm 3 and with just "ch" by itself,
that algorithm suddenly becomes the fastest. The state machine to which
I want to feed the characters can deal with integer representations as
easily as their character equivalents. So, it looks like the following
would be the fastest for my purposes:

  string.reverse.each_byte {
    >c>
    feedToMyStateMachine(c)
  }

Thanks again to all of you!

···

--
Lloyd Zusman
ljz@asfast.com
God bless you.

Robert · 4 August 2004 09:51

"Lloyd Zusman" <ljz@asfast.com> schrieb im Newsbeitrag
news:m3y8kv2qx2.fsf@asfast.com...

First of all, I very much thank all of you who responded to my query.

I've

read all of your useful and helpful replies, but I am only responding to
this final one, so as to minimize bandwidth.

"Robert Klemme" <bob.news@gmx.net> writes:

> [ ... ]
>
> require 'benchmark'
> include Benchmark
>
> str = "Hammer and tongs"
>
> bm(7) do |x|
> x.report("1") { 100000.times { str.reverse.split(//).each { |ch|
> ch } } }
> x.report("2") { 100000.times { str.reverse.split('').each { |ch|
> ch } } }
> x.report("3") { 100000.times { str.reverse.each_byte { |ch|

ch.chr } } }

> x.report("4") { 100000.times { (str.length-1).downto(0) {|idx| ch =
> str[idx]} } }
> x.report("5") { 100000.times { len = str.size; for idx in 1..len ;

ch =

> str[len-idx] end } }
> end
> user system total real
> 1 6.828000 0.000000 6.828000 ( 6.907000)
> 2 6.891000 0.000000 6.891000 ( 6.936000)
> 3 5.078000 0.000000 5.078000 ( 5.098000)
> 4 2.281000 0.000000 2.281000 ( 2.287000)
> 5 3.391000 0.000000 3.391000 ( 3.395000)

I have been so steeped in Ruby-ish thinking that I had completely
forgotten about array indexing. Duh!

And being so reminded, at first I would not have suspected that the
algorithms based on indexing would be faster than the others. But I
suspect that the absence of 'reverse' is what accounts for the speed
differences.

The most costly operations are object creations (apart from Fixnums which
AFAIK are handled a bit differently). From that it's clear that
reverse.split is extremely costly since there are n + 2 instances created
(1 Array, 1 reverse String, n Strings with length 1).

By the way, I tried this with longer strings: between 40 and 8192 bytes,
which are the most likely lengths for the strings I will be dealing
with. As the lengths increase beyond something like 50-60 bytes,
algorithm number 5 becomes faster than algorithm 4, and the difference
in speeds keeps increasing with longer strings.

But if I replace "ch.chr" in algorithm 3 and with just "ch" by itself,
that algorithm suddenly becomes the fastest.

Oh, that can happen if you just copy and past. Of course ".chr" must be
removed in order to get comparable results.

The state machine to which
I want to feed the characters can deal with integer representations as
easily as their character equivalents.

Note that there is no character type in Ruby:

"foo"[0].class

=> Fixnum

77.chr.class

=> String

So, it looks like the following
would be the fastest for my purposes:

  string.reverse.each_byte {
    >c>
    feedToMyStateMachine(c)
  }

Yep.

Regards

robert

#!/usr/bin/ruby

require 'benchmark'
include Benchmark

str = "Hammer and tongs"

bm(7) do |x|
  x.report("1") { 100000.times { str.reverse.split(//).each { |ch|
ch } } }
  x.report("2") { 100000.times { str.reverse.split('').each { |ch|
ch } } }
  x.report("3") { 100000.times { str.reverse.each_byte { |ch| ch } } }
  x.report("4") { 100000.times { (str.length-1).downto(0) {|idx| ch =
str[idx]} } }
  x.report("5") { 100000.times { len = str.size; for idx in 1..len ; ch =
str[len-idx] end } }
end

str = "Hammer and tongs" * 10

bm(7) do |x|
  x.report("1") { 100000.times { str.reverse.split(//).each { |ch|
ch } } }
  x.report("2") { 100000.times { str.reverse.split('').each { |ch|
ch } } }
  x.report("3") { 100000.times { str.reverse.each_byte { |ch| ch } } }
  x.report("4") { 100000.times { (str.length-1).downto(0) {|idx| ch =
str[idx]} } }
  x.report("5") { 100000.times { len = str.size; for idx in 1..len ; ch =
str[len-idx] end } }
end

11:40:07 [ruby]: ./str-reverse.rb
user system total real
1 6.828000 0.000000 6.828000 ( 6.864000)
2 6.937000 0.000000 6.937000 ( 6.926000)
3 1.375000 0.000000 1.375000 ( 1.384000)
4 2.282000 0.000000 2.282000 ( 2.279000)
5 3.422000 0.000000 3.422000 ( 3.412000)
user system total real
1 59.906000 0.000000 59.906000 ( 60.014000)
2 60.250000 0.000000 60.250000 ( 60.420000)
3 10.625000 0.000000 10.625000 ( 10.626000)
4 21.687000 0.000000 21.687000 ( 21.714000)
5 22.422000 0.000000 22.422000 ( 22.455000)

Topic		Replies	Views
Fastest way to reverse-process a string? ruby-talk	6	118	4 August 2004
Fastest way to reverse-process a string? ruby-talk	1	114	4 August 2004
Iterating through a string and removing leading characters ruby-talk	43	231	3 April 2005
Is there a better string.each? ruby-talk	44	255	9 July 2002
Possibly improvable code ruby-talk	8	65	18 June 2004

Fastest way to reverse-process a string?

Related topics