[QUIZ] Reverse the Polarity (#143)

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

···

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Benjohn Barnes

Usually, you write a regular expression and then search for it in a text string.
How about doing the opposite? Given a regular expression, generate all of the
strings that it will match.

Rather amazingly, this has independently come up a couple of times in
conversations with friends, so perhaps it's actually useful. I wanted to use it
to specify possible domain names for a potential new company...

  /(lovely|delicious|splendid)(food|snacks|munchies)/.generate
  => [lovelyfood, deliciousfood, splendidfood,
      lovelysnacks, delicioussnacks, splendidsnacks,
      lovelymunchies, deliciousmunchies, splendidmunchies]

The full regular expression grammar is pretty complex, so maybe you'll want to
just go with a small subset. :slight_smile:

Looks like its been 2 days...
Did not have a lot of time this weekend, so my solution just implements
operators used by the author:

require 'strscan'

class Regexp
  # Create a list of all values matching the regex.
  # Currently only supports groupings and '|'
  def generate
    regex = self.inspect
    regex = regex.slice(1, regex.size - 2) # Remove leading/trailing '/''
    s = StringScanner.new(regex)

    # Build a list containing each grouping, and blocks in between
    groups =
    result = ''
    while (result = s.scan_until(/\(|\)/)) != nil
      result = result.sub("(", "") # Does not support '\' escape chars

      if result.size > 0
        if result[-1].chr == ")"
          groups << result.split("|")
          groups[-1][-1] = groups[-1][-1].sub(")", "")
        else
          groups << [result]
        end
      end
    end

    # Create all combinations of those groups and return them
    find_list_combinations(groups)
  end

  # Return an array of all combinations of values in given list of lists
  def find_list_combinations(lists)
    lists.reverse!
    results = lists.pop
    list = lists.pop

    while list != nil
      new_results =
      for result in results
        for item in list
          new_result = Array.new([result])
          new_result << item
          new_results << new_result
        end
      end

      results = new_results
      list = lists.pop
    end

    new_results.map{|list| list.flatten.join }
  end
end

Examples of usage:

/(lovely|delicious|splendid)(food|snacks|munchies)/.generate
/Hello, my name is (Linus|Yukihiro|Dennis)
(Torvalds|Matsumoto|Ritchie)/.generate

Thanks,

Justin

···

On 10/12/07, Ruby Quiz <james@grayproductions.net> wrote:

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz
until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps
everyone
on Ruby Talk follow the discussion. Please reply to the original quiz
message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Benjohn Barnes

Usually, you write a regular expression and then search for it in a text
string.
How about doing the opposite? Given a regular expression, generate all of
the
strings that it will match.

Rather amazingly, this has independently come up a couple of times in
conversations with friends, so perhaps it's actually useful. I wanted to
use it
to specify possible domain names for a potential new company...

        /(lovely|delicious|splendid)(food|snacks|munchies)/.generate
        => [lovelyfood, deliciousfood, splendidfood,
            lovelysnacks, delicioussnacks, splendidsnacks,
            lovelymunchies, deliciousmunchies, splendidmunchies]

The full regular expression grammar is pretty complex, so maybe you'll
want to
just go with a small subset. :slight_smile:

Hi, this is my solution. For the design I took the following decisions:

- I will limit the number of repetitions provided by * and + to a
constant. In my examples I've set it to 2

- I will limit the set of characters generated by . and [^...]
constructs to a fixed list of chars. For my tests I used CHARS = %w{a
b c d e}, but also tested with
#CHARS = [("a".."z").to_a, ("A".."Z").to_a, ".", ",", ";"].flatten
but the results can be big !!

I have the concept of repeaters and groups. A repeteater is a class
that knows how to repeat the result of the groups it contains. I've
implemented OneTimeRepeater for the groups without repetition
modifier, StarRepeater, PlusRepeater and QuestionMarkRepeater for the
appropriate modifiers. RangeRepeater takes care of the {a,b} modifier.

Groups: a group represents a piece of the string for which we can
geneate a set of possible values. The groups I have implemented are
SingleChar, CharGroup (don't support ranges yet, just a list of chars)
and Dot. I haven't implemented escaped characters like \s, \d, etc,
but I think they are trivial to implement. I have a couple of special
groups to cope with parens and |. MultiGroup takes care of the parens
nesting by combining the result of the contained repeaters, and
OrGroup adds together the result of the contained groups.

Things I don't support:
- Ranges in char groups [a-zA-Z]
- non-greedy repetitions: I don't even know how to do this, cause I
think you have to take into account what you are going to generate
after the group
- Character classes: \s, \d, etc and [:alpha:], etc, but I think they
are easy to implement
- Backreferences: /(abc)\1/, didn't even thought about them, but
probably a nightmare.
- Other things I don't even know :-).

Most of the interesting stuff is in the parse and combine methods. The
first one understands the syntax and creates groups and repeaters.
Uses recursive calls for parens and |. The combine method is able to
combine an array of arrays to produce all the possible combinations:

[["", "a"], ["b", "bb"]] --> ["b", "bb", "ab", "abb"]

Here it is:

# Number of times to repeat for Star and Plus repeaters
TIMES = 2

# Set of chars for Dot and negated [^] char groups
#CHARS = [("a".."z").to_a, ("A".."Z").to_a, ".", ",", ";"].flatten
CHARS = %w{a b c d e}

class OneTimeRepeater
  def initialize(group)
    @group = group
  end

  def result
    @group.result
  end
end

class StarRepeater
  def initialize(group)
    @group = group
  end

  def result
    r =
    group_res = @group.result
    group_res.unshift("")
    TIMES.times do
      r << group_res
    end
    combine(r).uniq
  end
end

class PlusRepeater
  def initialize(group)
    @group = group
  end

  def result
    group_res = @group.result
    r = [group_res]
    temp = [""].concat(group_res)
    (TIMES - 1).times do
      r << temp
    end
    combine(r).uniq
  end
end

class QuestionMarkRepeater
  def initialize(group)
    @group = group
  end

  def result
    @group.result.unshift("")
  end
end

class RangeRepeater
  def initialize(group,min,max)
    @group = group
    @min = min
    @max = max
  end

  def result
    result = @group.result
    r =
    r << [""] if @min == 0
    @min.times {r << result}
    temp = result.dup.unshift("")
    (@max - @min).times do
      r << temp
    end
    combine(r).uniq
  end
end

class SingleChar
  def initialize(c)
    @c = c
  end
  def result
    [@c]
  end
end

# TODO: support ranges [a-zA-Z]
class CharGroup
  def initialize(chars)
    @negative = chars[0] == "^"
    @chars = chars
    @chars = @chars[1..-1] if @negative
  end
  def result
    if @negative
      CHARS - @chars
    else
      @chars
    end
  end
end

class Dot
  def result
    CHARS
  end
end

class MultiGroup
  def initialize(groups)
    @groups = groups
  end

  def result
    strings = @groups.map {|x| x.result}
    combine(strings)
  end
end

class OrGroup
  def initialize(first_groupset, second_groupset)
    @first = first_groupset
    @second = second_groupset
  end

  def result
    strings = @first.map {|x| x.result}
    s = combine(strings)
    strings = @second.map {|x| x.result}
    s.concat(combine(strings))
  end
end

# Combines arrays, calling + on each possible pair
# Starts from the first two arrays, then goes on
# combining another array to the result
def combine(arrays)
  string = arrays.inject do |r, rep|
    temp =
    r.each {|aa| rep.each {|bb| temp << (aa + bb)}}
    temp
  end
  string
end

def parse(s, i = 0)
  repeaters =
  group = nil
  while i < s.length
    char = s[i].chr
    case char
    when '('
      groups, i = parse(s, i+1)
      group = MultiGroup.new(groups)
    when ')'
      return repeaters,i
    when '['
      chars =
      i += 1
      until s[i].chr == ']'
        chars << s[i].chr
        i += 1
      end
      group = CharGroup.new(chars)
    when '.'
      group = Dot.new
    when '|'
      groups, i = parse(s, i + 1)
      group = OrGroup.new(repeaters, groups)
      return [group], i
    else
      group = SingleChar.new(char)
    end

    repeater = nil
    i += 1
    if i < s.length
      case s[i].chr
      when '*'
        repeater = StarRepeater.new(group)
      when '+'
        repeater = PlusRepeater.new(group)
      when '?'
        repeater = QuestionMarkRepeater.new(group)
      when '{'
        first = ""
        i += 1
        while s[i].chr != ","
          first << s[i].chr
          i += 1
        end
        i += 1
        second = ""
        while s[i].chr != "}"
          second << s[i].chr
          i += 1
        end
        repeater = RangeRepeater.new(group, first.to_i, second.to_i)
      else
        repeater = OneTimeRepeater.new(group)
        i -= 1
      end
      i += 1
    else
      repeater = OneTimeRepeater.new(group)
    end
    repeaters << repeater
  end
  return repeaters, i
end

class Regexp
  def generate
    r = self.inspect[1..-2]
    repeaters, _ = parse(r)
    strings = repeaters.map {|x| x.result}
    s = combine(strings)
    s
  end
end

def show(regexp)
  s = regexp.generate
  puts "#{regexp.inspect} --> #{s.inspect}"
  #puts "Checking..."
  #errors = s.reject {|string| string =~ regexp}
  #if errors.size == 0
  # puts "All strings match"
  #else
  # puts "These don't match: #{errors.inspect}"
  #end

···

On 10/12/07, Ruby Quiz <james@grayproductions.net> wrote:

by Benjohn Barnes

Usually, you write a regular expression and then search for it in a text string.
How about doing the opposite? Given a regular expression, generate all of the
strings that it will match.

  #
end

Some tests with TIMES=2 and CHARS = %w{a b c d e}:

show(/ab+[def]?[ghi]j/)
show(/ab*c+/)
show(/ab(c+(de)*)?f/)
show(/a{0,3}/)
show(/a|b|c/)
show(/ab(c)+|xy*|jjjk+[^jk]/)
show(/(lovely|delicious|splendid)(food|snacks|munchies)/)

/ab+[def]?[ghi]j/ --> ["abgj", "abhj", "abij", "abdgj", "abdhj",
"abdij", "abegj", "abehj", "abeij", "abfgj", "abfhj", "abfij",
"abbgj", "abbhj", "abbij", "abbdgj", "abbdhj", "abbdij", "abbegj",
"abbehj", "abbeij", "abbfgj", "abbfhj", "abbfij"]

/ab*c+/ --> ["ac", "acc", "abc", "abcc", "abbc", "abbcc"]

/ab(c+(de)*)?f/ --> ["abf", "abcf", "abcdef", "abcdedef", "abccf",
"abccdef", "abccdedef"]

/a{0,3}/ --> ["", "a", "aa", "aaa"]

/a|b|c/ --> ["a", "b", "c"]

/ab(c)+|xy*|jjjk+[^jk]/ --> ["abc", "abcc", "x", "xy", "xyy", "jjjka",
"jjjkb", "jjjkc", "jjjkd", "jjjke", "jjjkka", "jjjkkb", "jjjkkc",
"jjjkkd", "jjjkke"]

/(lovely|delicious|splendid)(food|snacks|munchies)/ --> ["lovelyfood",
"lovelysnacks", "lovelymunchies", "deliciousfood", "delicioussnacks",
"deliciousmunchies", "splendidfood", "splendidsnacks",
"splendidmunchies"]

It was fun.

Regards,

Jesus.

I've had no time to implement the actual quiz. (The quiz is very
interesting, though; I hope to find time to tackle it soon.)

However, I thought I'd post a piece of code I wrote for Quiz 48, since
it is related:

class Array
  def random
    self[ rand( self.length ) ]
  end
end

class String
  def variation( values={} )
    out = self.dup
    while out.gsub!( /\(([^())?]+)\)(\?)?/ ){
      ( $2 && ( rand > 0.5 ) ) ? '' : $1.split( '|' ).random
    }; end
    out.gsub!( /:(#{values.keys.join('|')})\b/ ){ values[$1.intern] }
    out.gsub!( /\s{2,}/, ' ' )
    out
  end
end

Used like so:

irb(main):023:0> str = "(lovely|delicious|splendid)(food|snacks|
munchies)"
irb(main):024:0> 10.times{ puts str.variation }
deliciousmunchies
delicioussnacks
splendidsnacks
splendidmunchies
splendidsnacks
deliciousmunchies
lovelysnacks
splendidsnacks
lovelyfood
delicioussnacks

irb(main):025:0> str = "(How (much|many)|What) is (the (value|result)
of)? 6*7?"
irb(main):026:0> 10.times{ puts str.variation }
How many is 6*7?
How much is the value of 6*7?
How much is 6*7?
What is 6*7?
How many is the result of 6*7?
What is the value of 6*7?
What is the value of 6*7?
How many is 6*7?
What is 6*7?
What is 6*7?

···

On Oct 12, 7:17 am, Ruby Quiz <ja...@grayproductions.net> wrote:

Usually, you write a regular expression and then search for it in a text string.
How about doing the opposite? Given a regular expression, generate all of the
strings that it will match.

> by Benjohn Barnes
>
> Usually, you write a regular expression and then search for it in a text string.
> How about doing the opposite? Given a regular expression, generate all of the
> strings that it will match.

I made a second version, supporting:

- Ranges in char groups [a-zA-Z]
- Backreferences: /(abc)\1/, didn't even thought about them, but
probably a nightmare.

I also support the {<number>} modifier, previously I only supported
{<num1>,<num2>}.

Backreferences were indeed a nightmare, but I managed a solution. I
tried first to integrate them seemlessly in my architecture with no
luck, since the generation of the groups is isolated from one another,
and there was no easy way. So I settled for the following:

Multigroups will carry a number depending on appearance.
When I generate the result for a multigroup, I add the result of that
group to each string.
When I generate a backreference, I generate a special syntax
(__<num>__) to gsub in a second step, when all groups have been
generated and stored. This can of course fail if the regexp matches
that syntax, since I will later on try to sub things that I shouldn't
(any ideas here are welcome).
In the second step I gsub every occurrence of __<num>__ with the
appropriate group value for that string.

Here's the code and some examples of the new things:

# Number of times to repeat for Star and Plus repeaters
TIMES = 2

# Set of chars for Dot and negated [^] char groups
#CHARS = [("a".."z").to_a, ("A".."Z").to_a, ".", ",", ";"].flatten
CHARS = %w{a b c d e}

class OneTimeRepeater
  def initialize(group)
    @group = group
  end

  def result
    @group.result
  end
end

class StarRepeater
  def initialize(group)
    @group = group
  end

  def result
    r =
    group_res = @group.result
    group_res.unshift("")
    TIMES.times do
      r << group_res
    end
    combine(r).uniq
  end
end

class PlusRepeater
  def initialize(group)
    @group = group
  end

  def result
    group_res = @group.result
    r = [group_res]
    temp = [""].concat(group_res)
    (TIMES - 1).times do
      r << temp
    end
    combine(r).uniq
  end
end

class QuestionMarkRepeater
  def initialize(group)
    @group = group
  end

  def result
    @group.result.unshift("")
  end
end

class RangeRepeater
  def initialize(group,min,max)
    @group = group
    @min = min
    @max = max
  end

  def result
    result = @group.result
    r =
    r << [""] if @min == 0
    @min.times {r << result}
    temp = result.dup.unshift("")
    if @max
      (@max - @min).times do
        r << temp
      end
    end
    combine(r).uniq
  end
end

class SingleChar
  def initialize(c)
    @c = c
  end
  def result
    [@c]
  end
end

class CharGroup
  def initialize(chars)
    @chars = chars
    if chars[0] == "^"
      @negative = true
      @chars = @chars[1..-1]
    else
      @negative = false
    end

    # Ranges a-b
    # save first and last "-" if present
    first = nil
    last = nil
    first = @chars.shift if @chars.first == "-"
    last = @chars.pop if @chars.last == "-"
    while i = @chars.index("-")
      @chars[i-1..i+1] = (@chars[i-1]..@chars[i+1]).to_a
    end
    # restore them back
    @chars.unshift(first) if first
    @chars.push(last) if last
  end
  def result
    if @negative
      CHARS - @chars
    else
      @chars
    end
  end
end

class Dot
  def result
    CHARS
  end
end

class MultiGroup
  attr_reader :group_num
  def initialize(groups, group_num)
    @groups = groups
    @group_num = group_num
  end

  # Generates the result of each contained group
  # and adds the filled group of each result to
  # itself
  def result
    strings = @groups.map {|x| x.result}
    result = combine(strings)
    result.each {|x| x.add_filled_group(@group_num, x)}
    result
  end
end

class OrGroup
  def initialize(first_groupset, second_groupset)
    @first = first_groupset
    @second = second_groupset
  end

  def result
    strings = @first.map {|x| x.result}
    s = combine(strings)
    strings = @second.map {|x| x.result}
    s.concat(combine(strings))
  end
end

class BackReference
  attr_reader :num
  def initialize(num)
    @num = num
  end

  def result
    ["__#{@num}__"]
  end
end

# Combines arrays, concatenating each string
# merging the possible groups they have
# Starts combining the first two arrays, then goes on
# combining each other array to the result of the
# previous combination
def combine(arrays)
string = arrays.inject do |r, rep|
   temp =
   r.each {|aa| rep.each {|bb| temp << (aa.concat_and_merge_groups(bb))}}
   temp
end
string
end

class String
  attr_accessor :filled_groups

  def add_filled_group(num, group)
    @filled_groups ||= {}
    @filled_groups[num] = group
  end

  def concat_and_merge_groups(other)
    temp = self + other
    groups = {}
    groups.merge!(self.filled_groups) if self.filled_groups
    groups.merge!(other.filled_groups) if other.filled_groups
    temp.filled_groups = groups
    temp
  end

end

class Regexp
  attr_reader :num_groups
  def parse(s, i = 0)
    repeaters =
    group = nil
    while i < s.length
      char = s[i].chr
      case char
      when '('
        num = @num_groups + 1
        @num_groups += 1
        groups, i = parse(s, i+1)
        group = MultiGroup.new(groups, num)
      when ')'
        return repeaters,i
      when '['
        chars =
        i += 1
        until s[i].chr == ']'
          chars << s[i].chr
          i += 1
        end
        group = CharGroup.new(chars)
      when '.'
        group = Dot.new
      when '|'
        groups, i = parse(s, i + 1)
        group = OrGroup.new(repeaters, groups)
        return [group], i
      when '\\'
        i += 1
        p s[i..-1]
        m = s[i..-1].match(/^(\d+)/)
        if m
          group = BackReference.new(m[0].to_i)
          i += m[0].size - 1
        end
      else
        group = SingleChar.new(char)
      end

      repeater = nil
      i += 1
      if i < s.length
        case s[i].chr
        when '*'
          repeater = StarRepeater.new(group)
        when '+'
          repeater = PlusRepeater.new(group)
        when '?'
          repeater = QuestionMarkRepeater.new(group)
        when '{'
          m = s[i..-1].match(/\{(\d+)(,(\d+))?\}/)
          first = m[1].to_i if m[1]
          second = m[3].to_i if m[3]
          repeater = RangeRepeater.new(group, first, second)
          i += m[0].size - 1
        else
          repeater = OneTimeRepeater.new(group)
          i -= 1
        end
        i += 1
      else
        repeater = OneTimeRepeater.new(group)
      end
      repeaters << repeater
    end
    return repeaters, i
  end

  def generate
    @num_groups = 0
    r = self.inspect[1..-2]
    repeaters, _ = self.parse(r)
    strings = repeaters.map {|x| x.result}
    s = combine(strings)
    # Makes a pass for the backreferences
    s.each do |string|
      string.gsub!(/__(\d+)__/) do |match|
        string.filled_groups[$1.to_i]
      end
    end
    s
  end
end

def show(regexp)
  s = regexp.generate
  puts "#{regexp.inspect} --> #{s.inspect}"
  puts "Checking..."
  errors = s.reject {|string| string =~ regexp}
  if errors.size == 0
    puts "All strings match"
  else
    puts "These don't match: #{errors.inspect}"
  end
end

Samples:

show(/a{3}bcd/)
show(/a[a-d1-9-]/)
show(/a[-a-c1-3]/)
show(/a(b|c)d\1xyz/)
show(/(a)(b)(c)(d)(e)(f)(g)(h)(i)(jjjj|xxxx)(k)\10/)

/a{3}bcd/ --> ["aaabcd"]

/a[a-d1-9-]/ --> ["aa", "ab", "ac", "ad", "a1", "a2", "a3", "a4",
"a5", "a6", "a7", "a8", "a9", "a-"]

/a[-a-c1-3]/ --> ["a-", "aa", "ab", "ac", "a1", "a2", "a3"]

/a(b|c)d\1xyz/ --> ["abdbxyz", "acdcxyz"]

/(a)(b)(c)(d)(e)(f)(g)(h)(i)(jjjj|xxxx)(k)\10/ -->
["abcdefghijjjjkjjjj", "abcdefghixxxxkxxxx"]

Regards,

Jesus.

···

On 10/15/07, Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:

On 10/12/07, Ruby Quiz <james@grayproductions.net> wrote: