[QUIZ] Statistician I (#167)

Here's my own submission for this problem. Once you wrap your head
around a few bits of the regular expression, it's pretty simple to
understand.

class Rule
  attr_reader :fields

  def initialize(str)
    patt = str.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
    @pattern = Regexp.new('^' + patt + '$')
    @fields = nil
  end

  def match(str)
    if md = @pattern.match(str)
      @fields = md.captures
    else
      @fields = nil
    end
  end
end

rules = []
File.open(ARGV[0]).each do |line|
  line.strip!
  next if line.empty?
  rules << Rule.new(line)
end

unknown = []
File.open(ARGV[1]).each do |line|
  line.strip!
  if line.empty?
    puts
    next
  end

  if rule = rules.find { |rule| rule.match(line) }
    indx, data = rules.index(rule), rule.fields.reject { |f| f.nil? }
    puts "Rule #{indx}: #{data.join(', ')}"
  else
    unknown << line
    puts "# No Match"
  end
end

puts "\nUnmatched input:"
puts unknown.join("\n")
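
To make the regular expression bits a little more concrete, here is a small sketch of what the two gsub calls in Rule#initialize produce for one of the sample rules. The helper name rule_to_pattern is hypothetical, introduced only for this illustration.

```ruby
# Same two substitutions as in Rule#initialize, pulled into a helper.
# (rule_to_pattern is a hypothetical name, not part of the submission.)
def rule_to_pattern(rule)
  rule.gsub(/\[(.+?)\]/, '(?:\1)?').  # [text]  -> optional non-capturing group
       gsub(/<(.+?)>/, '(.+?)')       # <field> -> lazy capturing group
end

rule = "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage]."
patt = rule_to_pattern(rule)
puts patt
# → You wound(?: the)? (.+?)(?: with (.+?))? for (.+?) point(?:s)? of (.+?)(?: damage)?.

regexp = Regexp.new('^' + patt + '$')
p regexp.match("You wound Perl for 15 points of Readability damage.").captures
# → ["Perl", nil, "15", "Readability"]
```

The nil is the <attack> field inside the optional group that did not take part in the match, which is why the script rejects nil fields before printing.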

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
## Statistician I (#167)

This is my first quiz. The code is very rough, but it works most of the
time.

I'm probably (re)implementing a very limited form of regular
expressions. While writing it I discovered several ways it could fail,
though with the test cases only the one noted in the comments actually
occurs.

Here is the code: http://pastie.org/224463
And the rules that catch most of the samples: http://pastie.org/224464

Lucas.

Here is my submission. I hope it's flexible enough for the followup
quizzes. I sensed there might be a need to access the fields of a match
by name, which is why I added the RuleMatch#fields method. It returns a
hash that allows code like

puts Rule.match(line).fields['amount'] # prints the value of the <amount> field

This method isn't used in the current code however. But who knows, it
might come in handy later on.
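
For readers who don't want to follow the link, here is a minimal sketch of how a name-to-value hash like that could be built. It is not the code from the pastie: the class name NamedRule and its fields method are hypothetical, and literal text in the rule is not regexp-escaped here.

```ruby
# Hypothetical sketch, not the linked submission.
class NamedRule
  def initialize(rule)
    # Field names in order of appearance, e.g. ["name", "attack", ...].
    @names = rule.scan(/<(.+?)>/).flatten
    patt = rule.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
    @pattern = Regexp.new('^' + patt + '$')
  end

  # Returns a Hash of field name => captured value, or nil if no match.
  def fields(line)
    md = @pattern.match(line)
    return nil unless md
    Hash[@names.zip(md.captures)]
  end
end

rule = NamedRule.new(
  "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].")
puts rule.fields("You wound Perl for 15 points of Readability damage.")['amount']
# → 15
```

Fields inside an optional part that did not match show up in the hash with a nil value.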

You can find my submission at http://www.pastie.org/224480

- Matthias

···

--
Posted via http://www.ruby-forum.com/.

## Statistician I (#167)

Here is my solution:
http://www.pastie.org/224576

It's for ruby19 only.

Hi,

This is my try at this quiz. I thought it would be cool to store the
field "names" too, for each match. I also added a verbose output mode
that shows each field name with its value. Since flexibility was also a
goal, I made some classes to encapsulate everything and prepare for the
future:

class Match
  attr_accessor :captures, :mappings, :rule
  
  def initialize captures, mappings, rule
    @captures = captures
    @mappings = mappings
    @rule = rule
  end

  def to_s verbose=false
    s = "Rule #{@rule.id}: "
    if verbose
      @rule.names.each_with_index {|n,i| s << "[#{n} => #{@mappings[n]}]" if @captures[i]}
      s
    else
      s + "#{@captures.compact.join(",")}"
    end
  end
end

class Rule
  attr_accessor :names, :id
  
  # Translate rules to regexps, specifying if the first captured group
  # has to be remembered
  RULE_MAPPINGS = {
    "[" => ["(?:", false],
    "]" => [")?", false],
    /<(.*?)>/ => ["(.*?)", true],
  }
  def initialize id, line
    @id = id
    @names = []
    escaped = escape(line)
    reg = RULE_MAPPINGS.inject(escaped) do |line, (tag, value)|
      replace, remember = *value
      line.gsub(tag) do |m|
        @names << $1 if remember
        replace
      end
    end
    @reg = Regexp.new(reg)
  end
  
  def escape line
    # From the mappings, change the regexp sensitive chars with non-sensitive ones
    # so that we can Regexp.escape the line, then sub them back
    escaped = line.gsub("[", "____").gsub("]", "_____")
    escaped = Regexp.escape(escaped)
    escaped.gsub("_____", "]").gsub("____", "[")
  end
  
  def match data
    m = @reg.match data
    return nil unless m
    map = Hash[*@names.zip(m.captures).flatten]
    Match.new m.captures, map, self
  end
end

class RuleSet
  def initialize file
    @rules = []
    File.open(file) do |f|
      f.each_with_index {|line, i| @rules << Rule.new(i, line.chomp)}
    end
    p @rules
  end
  
  def apply data
    match = nil
    @rules.find {|r| match = r.match data}
    match
  end
end

rules_file = ARGV[0] || "rules.txt"
data_file = ARGV[1] || "data.txt"

rule_set = RuleSet.new rules_file

matches = nil
unmatched = []
File.open(data_file) do |f|
  matches = f.map do |line|
    m = rule_set.apply line.chomp
    unmatched << line unless m
    m
  end
end

matches.each do |m|
  if m
    puts m
  else
    puts "#No match"
  end
end

unless unmatched.empty?
  puts "Unmatched input: "
  puts unmatched
end

#~ puts "Verbose output:"
#~ matches.each do |m|
  #~ if m
    #~ puts (m.to_s(true))
  #~ else
    #~ puts "#No match"
  #~ end
#~ end

···

On Fri, Jun 27, 2008 at 5:56 PM, Matthew Moss <matthew.moss@gmail.com> wrote:

## Statistician I (#167)

This week begins a three-part quiz, the final goal to provide a little
library for parsing and analyzing line-based data. Hopefully, each portion
of the larger problem is interesting enough on its own, without being too
difficult to attempt. The first part -- this week's quiz -- will focus on
the pattern matching.

Let's look at a bit of example input:

   You wound Perl for 15 points of Readability damage.
   You wound Perl with Metaprogramming for 23 points of Usability damage.
   Your mighty blow defeated Perl.
   C++ walks into the arena.
   C++ wounds you with Compiled Code for 37 points of Speed damage.
   You wound C++ for 52 points of Usability damage.

Okay, it's silly, but it is similar to a much larger data file I'll provide
at the end for testing.

You should definitely note the repetitiveness: just the sort of thing that
we can automate. In fact, I've examined the input above and created three
rules (a.k.a. patterns) that match (most of) the data:

   [The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
   You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
   Your mighty blow defeated[ the] <name>.

There are a few guidelines about these rules:

1. Text contained within square brackets is optional.
2. A word contained in angle brackets represents a field; not a literal
match, but data to be remembered.
3. Fields are valid within optional portions.
4. You may assume that both the rules and the input lines are stripped of
excess whitespace on both ends.
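
As a rough illustration of how these guidelines might map onto a regular expression, here is one possible sketch. The names compile_rule, RULES and report_line are hypothetical, and this is only one of several ways to do the translation: it splits the rule into tokens so that literal text can be passed through Regexp.escape. Guideline 4 is assumed to be handled by the caller (for example with strip).

```ruby
# Hypothetical helper names, for illustration only.
def compile_rule(rule)
  # Tokens: "[", "]", "<field>", or a run of literal text.
  tokens = rule.scan(/\[|\]|<[^>]+>|[^\[\]<]+/)
  pieces = tokens.map do |token|
    case token
    when '['        then '(?:'                 # guideline 1: optional text opens
    when ']'        then ')?'                  # guideline 1: optional text closes
    when /\A<.+>\z/ then '(.+?)'               # guidelines 2 and 3: a field
    else                 Regexp.escape(token)  # anything else is literal text
    end
  end
  Regexp.new('\A' + pieces.join + '\z')
end

RULES = [
  "[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].",
  "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].",
  "Your mighty blow defeated[ the] <name>."
].map { |rule| compile_rule(rule) }

# Builds the report line for one input line; the first matching rule wins.
def report_line(line)
  RULES.each_with_index do |regexp, index|
    md = regexp.match(line)
    return "Rule #{index}: #{md.captures.compact.join(', ')}" if md
  end
  "# No Match"
end

puts report_line("C++ wounds you with Compiled Code for 37 points of Speed damage.")
# → Rule 0: C++, Compiled Code, 37, Speed
puts report_line("C++ walks into the arena.")
# → # No Match
```

Nested or unbalanced brackets are not checked in this sketch, and a stray "<" with no closing ">" is silently dropped by the tokenizer.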

Assuming the rules are in `rules.txt` and the input is in `data.txt`,
running your Ruby script as such:

   > ruby reporter.rb rules.txt data.txt

Should generate the following output:

   Rule 1: Perl, 15, Readability
   Rule 1: Perl, Metaprogramming, 23, Usability
   Rule 2: Perl
   # No Match
   Rule 0: C++, Compiled Code, 37, Speed
   Rule 1: C++, 52, Usability

   Unmatched input:
   C++ walks into the arena.

Here is my solution to this week's quiz. It's also my first RubyQuiz.

http://www.pastie.org/224754

Matthew Moss wrote:

  def initialize(str)
    patt = str.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
    @pattern = Regexp.new('^' + patt + '$')
    @fields = nil
  end

does the rule string not need to be regexp escaped somehow if it's
gonna be directly Regexp.new'ed?

I fear a rule with something like "You run away[ from <name>] (you
coward)" would break this approach.
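
A quick sketch of that concern, reusing the two gsub calls quoted above. The constant name UNESCAPED and the input lines are made up for illustration.

```ruby
rule = "You run away[ from <name>] (you coward)"
patt = rule.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
UNESCAPED = Regexp.new('^' + patt + '$')

# The literal parentheses in the rule have become a capturing group,
# so a line that really contains them no longer matches...
p UNESCAPED.match("You run away from Bob (you coward)")
# → nil

# ...while a line without them matches and yields a bogus extra field.
p UNESCAPED.match("You run away from Bob you coward").captures
# → ["Bob", "you coward"]
```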

Matthew Rudy


So I thought I could use pastie for a change, so that I could still
make minor modifications without having to repost the code. But ...
wrong URL. Sorry, let's hope this is the right one:

http://www.pastie.org/224585

Regards,
Thomas.

···

On Jun 30, 7:48 am, ThoML <micat...@gmail.com> wrote:

> ## Statistician I (#167)

Here is my solution: http://www.pastie.org/224576

Added some lines to show the unmatched input.

http://www.pastie.org/225875

Perhaps... My solution is likely not safe from all input sets. While I
hadn't considered literal parentheses as part of the rule set, I
should have at the least considered the period (match any char).

For the current purposes, it is sufficient if your solution supports
the provided example ruleset, though any additional work towards
escaping parts/preventing breakage is certainly acceptable.
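
To illustrate the period issue, here is a small sketch. The two patterns are written out by hand (the first is what the unescaped translation of "Your mighty blow defeated[ the] <name>." would produce), and the constant names and the input line are hypothetical.

```ruby
# Unescaped: the trailing "." matches any character.
UNESCAPED_DOT = Regexp.new('^Your mighty blow defeated(?: the)? (.+?).$')
# Same pattern with the final period escaped by hand.
ESCAPED_DOT   = Regexp.new('^Your mighty blow defeated(?: the)? (.+?)\.$')

line = "Your mighty blow defeated Perl"   # note: no trailing period

p UNESCAPED_DOT.match(line).captures   # → ["Per"]
p ESCAPED_DOT.match(line)              # → nil
```

With the unescaped pattern the last letter of the name is swallowed by the "." and the field is silently truncated. The provided example data always ends such lines with a period, so the sample ruleset is unaffected.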


Here's my solution (http://pastie.org/226949):

class Parse
  def initialize(rules)
    @rules = create_rules(rules)
  end

  # Read the rules and transform them into regexps
  def create_rules(rules)
    rules.collect do |r|
      vars = []
      r = Regexp.escape(r.chomp).gsub("\\\[", "[").gsub("\\\]", "]").
            gsub(/\[([^\]]+)\]/, '(?:\1)?')
      r.gsub!(/<([^>]+)>/) do vars << $1; '(.*?)' end
      [Regexp.new(r),vars]
    end
  end

  # Parse the given file upon the rules created
  def parse(data)
    @match = []; @exceptions = []; data.each do |l|
      mdata = nil; @rules.each_with_index{|(r,d),i| break if !((mdata = [i,r.match(l)]) == [i,nil]) }
      if !mdata[1].nil?
        @match << ["Rule #{mdata[0]+1}:",*mdata[1].to_a[1..-1]]
      else
        @match << ["# No Match"]; @exceptions << l
      end
    end; self
  end

  #Print results
  def to_s
    "#{@match.collect{|m| m.join(" ")}.join("\n")}" +
    (@exceptions.empty? ? "" : "\n\nUnmatched input:\n#{@exceptions.join("")}")
  end
end

# Example of usage
puts "#{Parse.new(File.read("rules.txt")).parse(File.read("guardian.txt"))}"


I had to escape the string in order to make my solution work due to
the final dot...

Jesus.
