[QUIZ] Longest Repeated Substring (#153)

My solution finds substrings of length n, groups the identical ones together in a hash of their starting indices, prunes out cases where two of the "repeated" substring overlap each other, and then looks in the same places for substrings of length n+1.

The aggressive pruning does, however, cause it to fail in the case where a substring that overlaps at one length does not overlap at a higher length. For example, take the string "aaaaaa". The substring "aa" exists at indicies [0,1,2,3,4]. My program will prune that down to [0,2,4]. It will then record the substring of length 3 that starts at each of [0,2,4], prune out the ones that start at 2 (overlaps with the one startign at 0) and 4 (goes out of bounds), leaving only an unrepeated substring, and would thus fall back and print "aa," even though the correct answer is "aaa," because the index of 3 had already been pruned.

On the other hand, the aggressive pruning does make my program somewhat fast. It takes about 45s on Also Sprach Zarathustra.

text = ARGF.read

substr_locs = {}

text.split(//).each_with_index do |c,idx|
  if substr_locs[c]
    substr_locs[c] << idx
  else
    substr_locs[c] = [idx]
  end
end

substr_locs.each_pair do |substr,locs|
  substr_locs.delete(substr) if locs.length == 1
end

len = 1
done = false
until done
  len += 1
  new_substr_locs = {}
  substr_locs.each_pair do |substr,locs|
    locs.each do |idx|
      s = text[idx,len]
      next if idx+len>text.length
      if new_substr_locs[s]
        new_substr_locs[s] << idx
      else
        new_substr_locs[s] = [idx]
      end
    end
  end
  ##Must reduce for the case where a substring overlaps itself
  new_substr_locs.each_key do |s|
    locs = new_substr_locs[s]
    idx = 1
    while idx < locs.length
      if locs[idx]-locs[idx-1] < len
        locs.delete_at(idx)
      else
        idx += 1
      end
    end
    new_substr_locs.delete(s) if locs.length == 1
  end
  unless new_substr_locs.keys.empty?
    substr_locs = new_substr_locs
  else
    done = true
  end
end
puts substr_locs.keys.first

···

____________________________________________________________________________________
Looking for last minute shopping deals?
Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?category=shopping

cannot use regex, i suppose:

(.{aNumber}).*\1

where aNumber = 2..x (x=text.length/2), until regex 'response' is
'nil'

You can use whatever you like.

Try it out on War & Peace though and see how it does. :slight_smile:

James Edward Gray II

···

On Jan 23, 2008, at 11:35 AM, Raffa wrote:

cannot use regex, i suppose:

(.{aNumber}).*\1

where aNumber = 2..x (x=text.length/2), until regex 'response' is
'nil'

Why not?
The simplest solution could be:

def longest_repeated_substring str
    (str.size/2).downto(1) { |i|
        /(.{#{i}}).*\1/m =~ str and return $1
    }
    nil
end

but it's too slow;

···

On Jan 23, 2008 12:35 PM, Raffa <raffaboss@gmail.com> wrote:

cannot use regex, i suppose:

(.{aNumber}).*\1

where aNumber = 2..x (x=text.length/2), until regex 'response' is
'nil'

Well, one would think that the absolute simplest solution is /(.*+).*\1/,
but that's what I was testing when I invented the "your banana my banana"
test case. (Which it failed, returning only "y" as the repeated
substring).

Yours is a nice extension of this basic idea.

--Ken

···

On Wed, 23 Jan 2008 13:21:31 -0500, SERGEY VOLKOV wrote:

Why not?
The simplest solution could be:

def longest_repeated_substring str
    (str.size/2).downto(1) { |i|
        /(.{#{i}}).*\1/m =~ str and return $1
    }
    nil
end

but it's too slow;

On Jan 23, 2008 12:35 PM, Raffa <raffaboss@gmail.com> wrote:

cannot use regex, i suppose:

(.{aNumber}).*\1

where aNumber = 2..x (x=text.length/2), until regex 'response' is 'nil'

--
Ken (Chanoch) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/

It's linear search for repeated substring length,
I've tried binary search, too slow too

···

On Jan 24, 2008 12:39 AM, Ken Bloom <kbloom@gmail.com> wrote:

On Wed, 23 Jan 2008 13:21:31 -0500, SERGEY VOLKOV wrote:

> Why not?
> The simplest solution could be:
>
> def longest_repeated_substring str
> (str.size/2).downto(1) { |i|
> /(.{#{i}}).*\1/m =~ str and return $1
> }
> nil
> end
>
> but it's too slow;
>
> On Jan 23, 2008 12:35 PM, Raffa <raffaboss@gmail.com> wrote:
>> cannot use regex, i suppose:
>>
>> (.{aNumber}).*\1
>>
>>
>>
>> where aNumber = 2..x (x=text.length/2), until regex 'response' is 'nil'

Well, one would think that the absolute simplest solution is /(.*+).*\1/,
but that's what I was testing when I invented the "your banana my banana"
test case. (Which it failed, returning only "y" as the repeated
substring).

Yours is a nice extension of this basic idea.

--Ken

--
Ken (Chanoch) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/