I wrote the method below by copying the algorithm from http://en.wikipedia.org/wiki/Damerau-Levenshtein_distance (and Matrix is a really simple 2D array implementation). But the problem is that it slows way down as the string size gets bigger. At a string length of about 150 it takes 1s, at 500 it takes 10s. Is there any way to recode this to get better performance without rewriting it in C, and would rewriting it in C even help, or is this just a slow algorithm?
class Matrix
  def initialize(columns, rows)
    # one independent, zero-filled array per row
    @am = Array.new(rows) { Array.new(columns, 0) }
  end
  def [](c, r)
    @am[r][c]
  end
  def []=(c, r, value)
    @am[r][c] = value
  end
  def inspect
    @am.collect { |a| a.inspect }.join("\n")
  end
  def to_s
    @am.to_s
  end
end
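A side note on Matrix#initialize: it has to give each row its own array. The snippet below illustrates why with plain Ruby arrays (the variable names are just for illustration): Array.new(n, obj) stores the same object in every slot, while the block form builds a fresh one each time.

```ruby
# Array.new(n, obj) puts the *same* obj in every slot, so all rows alias.
shared = Array.new(2, Array.new(3, 0))
shared[0][1] = 7
p shared     # => [[0, 7, 0], [0, 7, 0]]

# The block form builds a fresh row for each index, so rows are independent.
separate = Array.new(2) { Array.new(3, 0) }
separate[0][1] = 7
p separate   # => [[0, 7, 0], [0, 0, 0]]
```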
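Well, the algorithm itself is O(n*m), where n and m are the lengths of
the two strings, so on long strings it's going to get slow. Going from
150 to 500 characters multiplies n*m by about (500/150)^2, roughly 11,
which lines up with the jump from 1s to 10s. A C rewrite would make each
step cheaper, but it wouldn't change how the time grows.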
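Since the recurrence only ever looks back two rows, one variant that keeps the O(n*m) time but cuts memory from O(n*m) to O(m) is to keep just three rows of the table. This is a sketch of that swapped-in technique, not the method from this thread; the name osa_distance is hypothetical, and it hasn't been benchmarked against the version here.

```ruby
# Optimal string alignment (restricted Damerau-Levenshtein) distance,
# keeping only rows i-2, i-1 and i of the DP table.
def osa_distance(s1, s2)
  a = s1.chars
  b = s2.chars
  n, m = a.size, b.size
  prev2 = nil                 # row i-2 (set once i >= 2)
  prev  = (0..m).to_a         # row i-1; row 0 is the distance from "" to b[0...j]
  (1..n).each do |i|
    cur = Array.new(m + 1, 0)
    cur[0] = i                # distance from a[0...i] to ""
    (1..m).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      d = [prev[j] + 1,             # deletion
           cur[j - 1] + 1,          # insertion
           prev[j - 1] + cost].min  # substitution
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        d = [d, prev2[j - 2] + 1].min # transposition of adjacent chars
      end
      cur[j] = d
    end
    prev2, prev = prev, cur   # slide the window down one row
  end
  prev[m]
end

p osa_distance("ab", "ba")          # => 1
p osa_distance("kitten", "sitting") # => 3
```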
I was able to shave about 40% off the time for your method, and fix a
bug.
The bug was caused because the Wikipedia article indexes the strings
starting from 1, but indexes the array starting from 0. In Ruby both
start at 0, of course. To fix this, I added a fake element at the
start of both strings.
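Only the inner loop of the fixed method survives in this post, so here is a self-contained sketch of how the whole thing might fit together with the padding trick. The method name damerau_levenshtein and the table setup are assumptions filled in for illustration, not the original code; it computes the optimal-string-alignment variant described in the Wikipedia article.

```ruby
# Hypothetical reconstruction: a dummy character is prepended to each string
# so the 1-based indices from the Wikipedia pseudocode work on Ruby strings.
def damerau_levenshtein(s1, s2)
  string1 = " " + s1   # string1[1] is now the first real character
  string2 = " " + s2
  s1n = s1.length
  s2n = s2.length
  # m[i][j] = distance between the first i chars of s1 and first j chars of s2
  m = Array.new(s1n + 1) { Array.new(s2n + 1, 0) }
  (0..s1n).each { |i| m[i][0] = i }
  (0..s2n).each { |j| m[0][j] = j }
  # b must already be a known local before the `a = b if ((b = ...) < a)`
  # lines are parsed; otherwise Ruby treats that `b` as a method call.
  b = nil
  (1..s1n).each do |i|
    (1..s2n).each do |j|
      cost = string1[i] == string2[j] ? 0 : 1
      a = m[i-1][j] + 1                          # deletion
      a = b if ((b = m[i][j-1] + 1) < a)         # insertion
      a = b if ((b = m[i-1][j-1] + cost) < a)    # substitution
      if i > 1 && j > 1 &&
         string1[i] == string2[j-1] &&
         string1[i-1] == string2[j]
        a = b if ((b = m[i-2][j-2] + 1) < a)     # transposition
      end
      m[i][j] = a
    end
  end
  m[s1n][s2n]
end

p damerau_levenshtein("ab", "ba")   # => 1
```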
      a = m[i-1][j] + 1
      a = b if ((b = m[i][j-1]+1) < a)
      a = b if ((b = m[i-1][j-1]+cost) < a)
      if(i > 1 && j > 1 &&
         string1[i] == string2[j-1] &&
         string1[i-1] == string2[j]) then
        a = b if ((b = m[i-2][j-2] + 1) < a)
      end
      m[i][j] = a
    end
  end
  m[s1n][s2n]
end
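The loop body above picks each minimum with a chain of `a = b if ((b = ...) < a)` rather than something like `[x, y, z].min`, which would allocate a throwaway array for every cell. How much of the 40% that accounts for isn't broken out here; the sketch below just isolates the comparison-only pattern as a helper (the name min3 is hypothetical, not from the post).

```ruby
# Minimum of three values using only comparisons; unlike [x, y, z].min,
# no temporary array is created on each call.
def min3(x, y, z)
  m = x
  m = y if y < m
  m = z if z < m
  m
end

p min3(3, 1, 2)   # => 1
```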
--
s=%q( Daniel Martin -- martin@snowplow.org
puts "s=%q(#{s})",s.to_a.last )
puts "s=%q(#{s})",s.to_a.last