> > I wouldn't be surprised if the idea of searching only 1/2
> > of the second string to prevent overlaps is wrong..
> I think you're right in that it's wrong.
...snip
> I'll post my solution in a reply, which is very similar to your
> except in the overlap prevention code, which, I have to admit, is
> pretty ugly. And I'm not even convinced that I got it right!
Dave's code can be corrected by realizing that since all suffix
strings end at the same place (the end of the string), then of the two
adjacent strings being tested, one is a suffix of the other.
This means that to detect overlap, the following test can be used:
if prefix.length + s1.length > s2.length then
# Overlap
end
where "prefix" is the current prefix being checked in the two adjacent
suffix strings.
...
Adding this test (instead of the s2.length / 2 test) and also testing
adjacent strings that start with the prefix currently being searched
(to find later matches if earlier ones overlap) would correct Dave's
solution and shouldn't be much more complicated.
That's very close to what I do, but I go in the opposite direction. I
(try to) track the size window of how many previous sorted suffixes
must be checked against the last in the sequence.
So if I'm comparing substrings a and b in [..., a, b, ...], and if the
amount of non-overlap is smaller than the longest found substring so
far, then I enlarge the window. So now when I'm focusing on c in
[..., a, b, c, ...] I'll compare it to both a and b.
There needs to be a way to shrink the window as well, so it doesn't
grow monotonically, and so if the a and c don't have the same prefix
which exceeds the length of the longest found substring so far, the
window shrinks.
Part of my complexity comes from trying to detect cases where I can
avoid the full comparison early, and so every time I try to do the
avoid (by using "next" to start the next iteration), I must also do
some window management.
Maybe re-orienting the code in your way would simplify things. If you
have the time, I'd be very curious to see what you would come up with.
Eric
···
On Jan 20, 5:18 pm, JJ <jjnoa...@gmail.com> wrote:
On Jan 20, 2:23 pm, "Eric I." <rubytrain...@gmail.com> wrote:
> On Jan 20, 12:04 pm, Dave Thomas <d...@pragprog.com> wrote: