Hi,
This is my attempt at this quiz. I thought it would be cool to also store the
field "names" for each match, and I added a verbose output that shows each
field name alongside its value. Since the goal was also to be flexible, I
wrapped everything in a few classes to prepare for the future:
class Match
  attr_accessor :captures, :mappings, :rule

  def initialize captures, mappings, rule
    @captures = captures
    @mappings = mappings
    @rule = rule
  end

  def to_s verbose=false
    s = "Rule #{@rule.id}: "
    if verbose
      # Show "[field => value]" for every field that captured something
      @rule.names.each_with_index {|n,i| s << "[#{n} => #{@mappings[n]}]" if @captures[i]}
      s
    else
      # Optional fields that did not match are nil, so drop them
      s + @captures.compact.join(", ")
    end
  end
end
class Rule
  attr_accessor :names, :id

  # Map rule syntax to regexp syntax; the boolean says whether the tag's
  # first captured group (the field name) has to be remembered
  RULE_MAPPINGS = {
    "[" => ["(?:", false],
    "]" => [")?", false],
    /<(.*?)>/ => ["(.*?)", true],
  }

  def initialize id, line
    @id = id
    @names = []
    escaped = escape(line)
    reg = RULE_MAPPINGS.inject(escaped) do |str, (tag, value)|
      replace, remember = *value
      str.gsub(tag) do |m|
        @names << $1 if remember
        replace
      end
    end
    # Anchor the pattern so that a rule has to match the whole line
    @reg = Regexp.new("\\A#{reg}\\z")
  end

  def escape line
    # Swap the rule's bracket chars for placeholders so that we can
    # Regexp.escape the line, then swap them back
    escaped = line.gsub("[", "____").gsub("]", "_____")
    escaped = Regexp.escape(escaped)
    escaped.gsub("_____", "]").gsub("____", "[")
  end

  def match data
    m = @reg.match data
    return nil unless m
    map = Hash[*@names.zip(m.captures).flatten]
    Match.new m.captures, map, self
  end
end
class RuleSet
  def initialize file
    @rules = []
    File.open(file) do |f|
      f.each_with_index {|line, i| @rules << Rule.new(i, line.chomp)}
    end
  end

  # Return the Match of the first rule that matches, or nil
  def apply data
    match = nil
    @rules.find {|r| match = r.match data}
    match
  end
end
rules_file = ARGV[0] || "rules.txt"
data_file = ARGV[1] || "data.txt"
rule_set = RuleSet.new rules_file
matches = nil
unmatched = []
File.open(data_file) do |f|
  matches = f.map do |line|
    m = rule_set.apply line.chomp
    unmatched << line.chomp unless m
    m
  end
end

matches.each do |m|
  if m
    puts m
  else
    puts "# No Match"
  end
end

unless unmatched.empty?
  puts "Unmatched input:"
  puts unmatched
end
#~ puts "Verbose output:"
#~ matches.each do |m|
#~   if m
#~     puts m.to_s(true)
#~   else
#~     puts "# No Match"
#~   end
#~ end
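The `escape` method swaps `[` and `]` for runs of underscores before calling `Regexp.escape`, so a rule that itself contains underscores could be mangled. One possible alternative, sketched below under the assumption that `[`, `]` and `<field>` are the only special tokens in a rule, is to split the rule on those tokens and escape only the literal text between them. `compile_rule` is a hypothetical helper, not part of the solution above.

```ruby
# Hypothetical helper: translate a quiz rule into an anchored Regexp.
# Assumes "[", "]" and "<field>" are the only special tokens in a rule.
# Returns the Regexp and the field names, in capture-group order.
def compile_rule(rule)
  names = []
  # Splitting on a capturing group keeps the delimiters in the result
  pieces = rule.split(/(\[|\]|<\w+>)/).map do |piece|
    case piece
    when "[" then "(?:"          # start of an optional part
    when "]" then ")?"           # end of an optional part
    when /\A<(\w+)>\z/
      names << $1                # remember the field name
      "(.*?)"                    # lazily capture the field value
    else
      Regexp.escape(piece)       # literal text, escaped as a whole
    end
  end
  [Regexp.new("\\A#{pieces.join}\\z"), names]
end

regexp, names = compile_rule("Your mighty blow defeated[ the] <name>.")
m = regexp.match("Your mighty blow defeated Perl.")
p names
p m.captures
```

Because the literal pieces never pass through a placeholder, underscores (or any other character) in a rule are escaped like ordinary text.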
On Fri, Jun 27, 2008 at 5:56 PM, Matthew Moss <matthew.moss@gmail.com> wrote:
## Statistician I (#167)
This week begins a three-part quiz, the final goal being to provide a little
library for parsing and analyzing line-based data. Hopefully, each portion
of the larger problem is interesting enough on its own, without being too
difficult to attempt. The first part -- this week's quiz -- will focus on
the pattern matching.
Let's look at a bit of example input:
You wound Perl for 15 points of Readability damage.
You wound Perl with Metaprogramming for 23 points of Usability damage.
Your mighty blow defeated Perl.
C++ walks into the arena.
C++ wounds you with Compiled Code for 37 points of Speed damage.
You wound C++ for 52 points of Usability damage.
Okay, it's silly, but it is similar to a much larger data file I'll provide
at the end for testing.
You should definitely note the repetitiveness: just the sort of thing that
we can automate. In fact, I've examined the input above and created three
rules (a.k.a. patterns) that match (most of) the data:
[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
Your mighty blow defeated[ the] <name>.
There are a few guidelines about these rules:
1. Text contained within square brackets is optional.
2. A word contained in angle brackets represents a field: not a literal match, but data to be remembered.
3. Fields are valid within optional portions.
4. You may assume that both the rules and the input lines are stripped of excess whitespace on both ends.
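To make the guidelines concrete, one common reading maps optional text to a non-capturing optional group `(?:...)?` and each field to a capture group, with the pattern anchored to the whole line. The hand translation of the second rule below is an illustrative sketch, not part of the quiz statement; it uses Ruby 1.9+ named groups (`(?<name>...)`) so each value can be looked up by its field name. A field inside an optional part that did not match (guideline 3) comes back as `nil`.

```ruby
# Hand translation of:
#   You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
# [...] becomes (?:...)?, and <field> becomes a lazy named capture group.
RULE = /\AYou wound(?: the)? (?<name>.*?)(?: with (?<attack>.*?))? for (?<amount>.*?) point(?:s)? of (?<kind>.*?)(?: damage)?\.\z/

["You wound Perl for 15 points of Readability damage.",
 "You wound Perl with Metaprogramming for 23 points of Usability damage.",
 "Your mighty blow defeated Perl."].each do |line|
  m = RULE.match(line)
  if m
    # Pair each field name with its value; unmatched optional fields are nil
    p m.names.zip(m.captures)
  else
    puts "no match: #{line}"
  end
end
```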
Assuming the rules are in `rules.txt` and the input is in `data.txt`,
running your Ruby script as such:
> ruby reporter.rb rules.txt data.txt
should generate the following output:
Rule 1: Perl, 15, Readability
Rule 1: Perl, Metaprogramming, 23, Usability
Rule 2: Perl
# No Match
Rule 0: C++, Compiled Code, 37, Speed
Rule 1: C++, 52, Usability
Unmatched input:
C++ walks into the arena.
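The expected output pins down the report format: a matched line prints as `Rule N: ` followed by its non-nil field values joined with `, `, an unmatched line prints `# No Match`, and the unmatched lines are listed again at the end. Below is a small sketch of just that formatting step, assuming each result is either `nil` or a hypothetical `[rule_index, captures]` pair in which `captures` may hold `nil` for optional fields that did not match.

```ruby
# Hypothetical formatter for the report.
# results: one entry per input line, either nil (no rule matched)
#          or [rule_index, captures].
def report(lines, results)
  out = []
  unmatched = []
  lines.zip(results).each do |line, result|
    if result
      index, captures = result
      # Drop optional fields that did not match
      out << "Rule #{index}: #{captures.compact.join(', ')}"
    else
      out << "# No Match"
      unmatched << line
    end
  end
  unless unmatched.empty?
    out << "Unmatched input:"
    out.concat(unmatched)
  end
  out
end

puts report(["Your mighty blow defeated Perl.", "C++ walks into the arena."],
            [[2, ["Perl"]], nil])
```

Returning an array of lines rather than printing directly keeps the formatting easy to test separately from the matching.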