Statistician II
Last week's quiz started the creation of a line-based pattern-matching system: our statistician. This week, your task is to further develop a solution from last week: organize the code and provide a more interesting interface.
The first thing is organization. This little library should be reusable and not tied to any particular parsing need. So we want to separate out the "Statistician" from the client. To do this means moving the appropriate code into a separate file called statistician.rb, containing:
# statistician.rb
module Statistician
  # This module is your task! Your code goes here...
end
Meanwhile, the client code will now begin with:
# client.rb
require 'statistician'
Simple, eh?
Next, we will move the rules from their own data file and bring them into the code. Admittedly, moving data into code usually is not a wise thing to do, but as the primary data is that which the rules parse, we're going to do it anyway. Besides, this is Ruby Quiz, so why not?
Simultaneously, we're going to group rules together: rules that, while they may differ somewhat in appearance, essentially represent the same kind or category of data. As the rules and categories are client data, they will go into the client's code. Here's an example to begin, borrowing the LotRO rules used last week.
# client.rb
class Offense < Statistician::Reportable
  rule "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage]."
  rule "You reflect <amount> point[s] of <kind> damage to[ the] <name>."
end
class Victory < Statistician::Reportable
  rule "Your mighty blow defeated[ the] <name>."
end
Next, we need a parser (or Reporter, as I like to call it) that can manage these rules and classes, read the input data and process it all line by line. Such client code looks like this:
# client.rb
lotro = Statistician::Reporter.new(Offense, Victory)
lotro.parse(File.read(ARGV[0]))
Finally, we need to begin getting useful information out of all the records that have been read and parsed by the Reporter. After the data is parsed, the final bit will be to support code such as this:
# client.rb
num = Offense.records.size
dmg = Offense.records.inject(0) { |s, x| s + x.amount.to_i }
puts "Average damage inflicted: #{dmg.to_f / num}"
puts Offense.records[0].class # outputs "Offense"
What is going on here? The class Offense serves three purposes.
- Its declaration contains the rules for offensive related records.
- After parsing, the class method records returns an array of records that matched those rules.
- Those records are instances of the class, and instance methods that match the field names (extracted from the rules) provide access to a record's data.
Hopefully this isn't too confusing. I could have broken up some of these responsibilities into other classes or sections of code, but since the three tasks are rather related, I thought it convenient and pleasing to group them all into the client's declared class.
Below I'll give the full, sample client file I'm using, as well as the output it generates when run over the hunter.txt file we used last week. A few hints, first...
- You are welcome to make statistician.rb depend on other Ruby modules. I personally found OpenStruct to be quite useful here.
- Personally, I found making Offense inherit from Reportable to be the cleanest method. At least, it is in my own code. There may be other ways to accomplish this goal: by include or extend methods. If you find those techniques more appealing, please go ahead, but make a note of it in your submission, since it does require changing how client code is written.
- Metaprogramming can get a bit tricky to explain in a couple sentences, so I'll leave such hints and discussion for the mailing list. Aside from that, there are some good examples of metaprogramming looking back through past Ruby Quizzes. Of particular interest would be the metakoans.rb quiz.
Finally, my own solution for this week's quiz is just under 80 lines long, so it need not be overly complex to support the client file below.
Here is the complete, sample client file:
require 'statistician'
class Defense < Statistician::Reportable
  rule "[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage]."
  rule "You are wounded for <amount> point[s] of <kind> damage."
end
class Offense < Statistician::Reportable
  rule "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage]."
  rule "You reflect <amount> point[s] of <kind> damage to[ the] <name>."
end
class Defeat < Statistician::Reportable
  rule "You succumb to your wounds."
end
class Victory < Statistician::Reportable
  rule "Your mighty blow defeated[ the] <name>."
end
class Healing < Statistician::Reportable
  rule "You heal <amount> points of your wounds."
  rule "<player> heals you for <amount> of wound damagepoints."
end
class Regen < Statistician::Reportable
  rule "You heal yourself for <amount> Power points."
  rule "<player> heals you for <amount> Power points."
end
class Comment < Statistician::Reportable
  rule "### <comment> ###"
end
class Ignored < Statistician::Reportable
  rule "<player> defeated[ the] <name>."
  rule "<player> has succumbed to his wounds."
  rule "You have spotted a creature attempting to move stealthily about."
  rule "You sense that a creature is nearby but hidden from your sight."
  rule "[The ]<name> incapacitated you."
end
if __FILE__ == $0
  lotro = Statistician::Reporter.new(Defense, Offense, Defeat, Victory,
                                     Healing, Regen, Comment, Ignored)
  lotro.parse(File.read(ARGV[0]))
  num = Offense.records.size
  dmg = Offense.records.inject(0) { |sum, off| sum + Integer(off.amount.gsub(',', '_')) }
  d = Defense.records[3]
  puts <<-EOT
Number of Offense records: #{num}
Total damage inflicted: #{dmg}
Average damage per Offense: #{(100.0 * dmg / num).round / 100.0}
Defense record 3 indicates that a #{d.name} attacked me
using #{d.attack}, doing #{d.amount} points of damage.
Unmatched rules:
#{lotro.unmatched.join("\n")}
Comments:
#{Comment.records.map { |c| c.comment }.join("\n")}
  EOT
end
And here is the output it generates, using the hunter.txt data file:
Number of Offense records: 1300
Total damage inflicted: 127995
Average damage per Offense: 98.46
Defense record 3 indicates that a Tempest Warg attacked me
using Melee Double, doing 108 points of damage.
Unmatched rules:
The Trap wounds Goblin-town Guard for 128 points of Common damage.
Nothing to cure.
Comments:
Chat Log: Combat 04/04 00:34 AM
Summary
I don't know if it was the metaprogramming that scared people away this week, or perhaps folks are away on summer vacations. In any case, I'm going to summarize this week's quiz by looking at the submission from Matthias Reitinger. The solution is, as Matthias indicates, unexpectedly concise. "I guess that's just the way Ruby works."
Matthias' code implements the Statistician module in three parts, each a class. Here is the first class, Rule:
class Rule
  def initialize(pattern)
    @fields = []
    # Regexp.escape turns "[" and "]" into "\[" and "\]", so the
    # optional-text substitution has to match the escaped brackets.
    pattern = Regexp.escape(pattern).gsub(/\\\[(.+?)\\\]/, '(?:\1)?').
      gsub(/<(.+?)>/) { @fields << $1; '(.+?)' }
    @regexp = Regexp.new('^' + pattern + '$')
  end
  def match(line)
    @result = if md = @regexp.match(line)
      Hash[*@fields.zip(md.captures).flatten]
    end
  end
  def result
    @result
  end
end
Rule makes use of regular expressions built up as discussed in the previous quiz, so I'm not going to discuss that here. I will point out, though, the initialization of the @fields member in the initializer. Note the last gsub call: it uses the block form of gsub.
gsub(/<(.+?)>/) { @fields << $1; '(.+?)' }
As the (.+?) string is the last expression evaluated in the block, it provides the expected replacement in the string. However, Matthias also makes use of the just-matched expression ($1) to collect the field names. This avoids a second pass over the source string to get those field names, and is arguably simpler.
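To see the block form of gsub in isolation, here is a minimal sketch. The helper name extract_fields and the sample pattern are made up for illustration and are not part of Matthias' code; the point is only that one pass both rewrites the string and records the names.

```ruby
# Hypothetical helper (not from the solution): replace each <field>
# placeholder with a capture group and collect the field names as we go.
def extract_fields(pattern)
  fields = []
  # The block runs once per match. $1 holds the text captured by the
  # current match; the block's last expression is the replacement.
  source = pattern.gsub(/<(.+?)>/) { fields << $1; '(.+?)' }
  [source, fields]
end

source, fields = extract_fields('You heal <amount> points for <player>')
p source  # "You heal (.+?) points for (.+?)"
p fields  # ["amount", "player"]
```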
The match method matches input lines against the regular expression, returning nil if the input didn't match, or a hash if it did. Field names (@fields) are first paired (zip) with the matched values (md.captures), then flatten-ed into a single array, finally expanded (*) and passed to a Hash initializer that treats alternate items as keys and values. The end result of Rule#match, when the input matches, is a hash that looks like this:
{ 'amount' => '108', 'name' => 'Tempest Warg' }
That hash is returned, but also stored internally into member @result for future reference, accessed by the last method, result.
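The zip, flatten and Hash[] steps can be tried on their own. The following is a small sketch with a hypothetical helper name (build_record) and hand-written captures, including a nil of the kind an unmatched optional group would produce.

```ruby
# Hypothetical helper (illustration only): pair field names with captured
# values the same way Rule#match does.
def build_record(fields, captures)
  pairs = fields.zip(captures)   # [["name", "Tempest Warg"], ["attack", nil], ...]
  flat  = pairs.flatten          # ["name", "Tempest Warg", "attack", nil, ...]
  Hash[*flat]                    # alternate items become keys and values
end

fields   = ['name', 'attack', 'amount']
captures = ['Tempest Warg', nil, '108']  # nil: optional text that was absent

p build_record(fields, captures)
# {"name"=>"Tempest Warg", "attack"=>nil, "amount"=>"108"}
```

On Ruby 2.1 and later, fields.zip(captures).to_h builds the same hash without the flatten and splat.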
The next class is Reportable:
require 'ostruct' # OpenStruct lives in the standard library
class Reportable < OpenStruct
  class << self
    attr_reader :records
    def inherited(klass)
      klass.class_eval do
        @rules, @records = [], []
      end
      super
    end
    def rule(pattern)
      @rules << Rule.new(pattern)
    end
    def match(line)
      if rule = @rules.find { |r| r.match(line) }
        @records << self.new(rule.result)
      end
    end
  end
end
This small class is the extent of the metaprogramming going on in the solution, and it's not much, though perhaps unfamiliar to some. Let's get into some of it. We'll ignore the OpenStruct inheritance for the moment, coming back to it later.
Everything inside the Reportable class is surrounded by a block that opens with class << self. There is a good summary on the Ruby Talk mailing list, but its use here can be summed up in two words: class methods. The class << self mechanism is not strictly about class methods (it opens the object's singleton class), but in this context it has that effect: the methods defined inside become class methods of Reportable. Alternatively, these methods could have been defined in this manner:
class Reportable < OpenStruct
  def Reportable.rule(pattern)
    # etc.
  end
  def Reportable.match(line)
    # etc.
  end
  # etc.
end
In the end, the class << self mechanism is cleaner looking, and also allows for use of attr_reader in a natural way.
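As a side-by-side illustration, here is a small sketch with two hypothetical classes (Counter and Tally, not part of the solution). Both end up with the same class-level interface; the first gets its reader from attr_reader inside class << self, while the second has to spell it out.

```ruby
class Counter
  class << self
    attr_reader :count        # defines Counter.count, reading Counter's own @count
    def increment
      @count = (@count || 0) + 1
    end
  end
end

class Tally
  def self.count              # the hand-written equivalent of attr_reader above
    @count
  end
  def self.increment
    @count = (@count || 0) + 1
  end
end

Counter.increment
Counter.increment
Tally.increment
p Counter.count  # 2
p Tally.count    # 1
```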
The next interesting bit is the inherited method. This is a class method, here implemented on Reportable, that is called whenever Reportable is subclassed (which happens repeatedly in the client code). It's a convenient hook that allows the other bit of metaprogramming to happen.
klass.class_eval do
  @rules, @records = [], []
end
klass is the class derived from Reportable (i.e. our client's classes for future statistical analysis). Here, Matthias initializes two members, both to empty arrays, in the scope of class klass. This serves to ensure that every class derived from Reportable gets its own, separate members, not shared with other Reportable subclasses.
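Stripped of the parsing, the hook can be sketched like this. Base, First and Second are hypothetical names chosen for the example; the mechanism (inherited plus class_eval) is the one described above.

```ruby
class Base
  class << self
    attr_reader :items
    # Called by Ruby each time a class inherits from Base.
    def inherited(klass)
      klass.class_eval { @items = [] }  # self is klass inside the block
      super
    end
    def add(item)
      @items << item
    end
  end
end

class First < Base
  add :a
  add :b
end

class Second < Base
  add :c
end

p First.items   # [:a, :b]
p Second.items  # [:c]
p Base.items    # nil (inherited never ran for Base itself)
```

Each subclass has its own array, and the client never had to initialize anything.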
This could be done without metaprogramming, but would require effort from the client.
class Reportable
  # class methods here
end
class Offense < Reportable
  @rules, @records = [], []
  # rules, etc.
end
class Defense < Reportable
  @rules, @records = [], []
  # rules, etc.
end
If the client forgot to initialize those two members, or got the names wrong, the class wouldn't work, exceptions would be thrown, cats and dogs living together... you get the idea.
You might consider defining those data members in the Reportable class itself, like so:
class Reportable
  @rules, @records = [], []
  # class methods, without inherited
end
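The problem with this is that those instance variables would belong to the Reportable class object alone. Each subclass is a separate object with its own @rules and @records, both of which would be nil, so the first rule declaration would raise a NoMethodError. Switching to class variables (@@rules, @@records) would cure the nil, but then every Reportable subclass would share the same rules and records arrays: not the desired outcome either.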
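For comparison, here is a minimal sketch of how the two kinds of class-level state behave under inheritance in Ruby. All class names are hypothetical and exist only for this demonstration.

```ruby
# Class-level instance variable: belongs to IvarBase only.
class IvarBase
  @rules = []
  class << self
    attr_reader :rules
  end
end
class IvarChild < IvarBase; end

# Class variable: one array shared across the whole hierarchy.
class CvarBase
  @@rules = []
  def self.rules
    @@rules
  end
end
class CvarA < CvarBase; end
class CvarB < CvarBase; end

p IvarBase.rules   # []
p IvarChild.rules  # nil (the subclass has its own, unset, @rules)

CvarA.rules << 'from A'
p CvarB.rules      # ["from A"] (CvarA and CvarB see the same array)
```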
In the end, the class_eval used here, called from inherited, is the right way to do things. It provides a way for the superclass to inject functionality into the subclass.
Getting back to functionality, Reportable.match (a class method) is straightforward, but let me highlight one line:
@records << self.new(rule.result)
If you recall, result returns a hash of field names to values. And Reportable is attempting to pass that hash to its own initializer, of which none is defined. This is where OpenStruct comes in.
OpenStruct "allows you to create data objects and set arbitrary attributes." And OpenStruct provides an initializer that takes the hash Matthias provides, and does the expected.
require 'ostruct'
data = OpenStruct.new( {'amount' => '108', 'name' => 'Tempest Warg'} )
p data.amount # -> "108"
p data.name # -> "Tempest Warg"
By subclassing Reportable from OpenStruct, all of the client's classes will inherit the same behavior, which fulfills many of the requirements provided in the class specification.
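A quick sketch of what the subclassing buys the client. The Record class here is a hypothetical stand-in for one of the client's Reportable subclasses.

```ruby
require 'ostruct'

# Hypothetical stand-in for a client class such as Offense.
class Record < OpenStruct; end

rec = Record.new('amount' => '108', 'name' => 'Tempest Warg')
p rec.class        # Record
p rec.name         # "Tempest Warg"
p rec.amount       # "108" (still a string, hence the to_i in the client code)
p rec.amount.to_i  # 108
p rec.attack       # nil (a field that was never set)
```

Records keep the class of the client's declaration, and fields that did not appear in the matched line simply read as nil.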
The final class, Reporter, is trivial.
class Reporter
  attr_reader :unmatched
  def initialize(*args)
    @reportables = args
    @unmatched = []
  end
  def parse(data)
    data.each_line do |line|
      line.strip!
      @reportables.find { |rep| rep.match(line) } or @unmatched << line
    end
  end
end
It reads through a data source a line at a time, either finding a matching rule (and creating the appropriate record in the process) or adding the input line to @unmatched, which the client can query later.
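The find ... or idiom at the heart of parse can be sketched without the rest of the library. In this illustration the helper collect_unmatched and the lambda matchers are assumptions standing in for Reporter#parse and the Reportable subclasses.

```ruby
# Hypothetical helper: offer each stripped line to the matchers in order
# and return the lines that nobody claimed.
def collect_unmatched(data, matchers)
  unmatched = []
  data.each_line do |line|
    line = line.strip
    # find stops at the first matcher returning a truthy value; if none
    # does, it returns nil and the right-hand side of `or` runs.
    matchers.find { |m| m.call(line) } or unmatched << line
  end
  unmatched
end

heals = []
hits  = []
matchers = [
  ->(line) { heals << line if line.start_with?('You heal ') },
  ->(line) { hits << line if line.start_with?('You wound ') }
]

log = "You heal 10 points.\nNothing to cure.\n  You wound the Warg.  \n"
p collect_unmatched(log, matchers)  # ["Nothing to cure."]
p heals                             # ["You heal 10 points."]
p hits                              # ["You wound the Warg."]
```

Because find short-circuits, the order in which classes are passed to Reporter.new decides which class claims a line that more than one rule could match.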