Newbie converting brain from perl

William_Pietri · 28 August 2002 20:31

Howdy! As a long time Perl, Java, and Objective-C developer, I’m quite taken
with Ruby, and I’m busily converting my brain over. But there’s a Perl
idiom that I’m having trouble translating. In Perl it goes like this:

$date = "28 Aug 2002";
($day, $month, $year) = $date =~ /(\d+) (\w+) (\d+)/;

In Ruby the best I’ve been able to do is this:

date = "28 Aug 2002"
day, month, year = /(\d+) (\w+) (\d+)/.match(date)[1..3]

But that final [1..3] thing seems inelegant, which makes me think I’m
barking up the wrong tree. What’s the best Ruby-style way to do this?

Many thanks,

William

David_Alan_Black1 · 28 August 2002 20:48

Hello –

Howdy! As a long time Perl, Java, and Objective-C developer, I’m quite taken
with Ruby, and I’m busily converting my brain over. But there’s a Perl
idiom that I’m having trouble translating. In Perl it goes like this:
$date = "28 Aug 2002";
($day, $month, $year) = $date =~ /(\d+) (\w+) (\d+)/;
In Ruby the best I’ve been able to do is this:
date = "28 Aug 2002"
day, month, year = /(\d+) (\w+) (\d+)/.match(date)[1..3]
But that final [1..3] thing seems inelegant, which makes me think I’m
barking up the wrong tree. What’s the best Ruby-style way to do this?

Well, my response to this problem was to participate in a coding
retreat where we implemented scanf. (See the earlier thread on this.)

require 'scanf'

d = "28 August 2002"
day,month,year = d.scanf("%d%s%d")
p day, month, year

=> 28
"August"
2002

(You can get scanf at http://www.rubyhacker.com/code/scanf.)

You could also do:

day,month,year = d.scan(/(\d+) (\w+) (\d+)/).flatten

You need to flatten the scan results because scan returns an array of
()-captured substrings for every successful match in the string,
so otherwise day will be a three-element array and month and year
will be nil.

(Note that scanf gives you the %d fields as numbers (in case that’s a
consideration), whereas match and scan give you strings.)
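
To make the flatten point concrete, here is a minimal sketch using only the built-in String#scan (no scanf needed). The variable names rows, first, second and third are made up for the illustration.

```ruby
d = "28 August 2002"

# scan returns one inner array of captures per match found in the string.
rows = d.scan(/(\d+) (\w+) (\d+)/)
p rows

# Without flatten, the whole inner array lands in the first variable
# and the remaining variables are nil.
first, second, third = rows
p first, second, third

# With flatten, each capture gets its own variable (all of them strings).
day, month, year = rows.flatten
p day, month, year
```

Note that this only behaves nicely when the pattern matches exactly once; with several matches, flatten would run all the captures together.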

David

···

On Thu, 29 Aug 2002, William Pietri wrote:

–
David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

James_F_Hranicky · 28 August 2002 21:18

In 1.7, you can use String#match as well:

all, day, month, year = date.match(/(\d+) (\w+) (\d+)/).to_a

Jim

···

On Thu, 29 Aug 2002 05:31:23 +0900 “William Pietri” william-news-383910@scissor.com wrote:

date = "28 Aug 2002"
day, month, year = /(\d+) (\w+) (\d+)/.match(date)[1..3]
But that final [1..3] thing seems inelegant, which makes me think I’m
barking up the wrong tree. What’s the best Ruby-style way to do this?

Joel_VanderWerf1 · 28 August 2002 21:24

William Pietri wrote:
…

In Ruby the best I’ve been able to do is this:
date = "28 Aug 2002"
day, month, year = /(\d+) (\w+) (\d+)/.match(date)[1..3]

How about:

   day, month, year = /(\d+) (\w+) (\d+)/.match(date)[1..-1]

or

_, day, month, year = /(\d+) (\w+) (\d+)/.match(date).to_a

Still not quite elegant, but at least the 3 isn’t hard-coded.

···

But that final [1..3] thing seems inelegant, which makes me think I’m
barking up the wrong tree. What’s the best Ruby-style way to do this?

HAL_9000 · 28 August 2002 21:33

Howdy! As a long time Perl, Java, and Objective-C developer, I’m quite taken
with Ruby, and I’m busily converting my brain over. But there’s a Perl
idiom that I’m having trouble translating. In Perl it goes like this:
$date = "28 Aug 2002";
($day, $month, $year) = $date =~ /(\d+) (\w+) (\d+)/;
In Ruby the best I’ve been able to do is this:
date = "28 Aug 2002"
day, month, year = /(\d+) (\w+) (\d+)/.match(date)[1..3]
But that final [1..3] thing seems inelegant, which makes me think I’m
barking up the wrong tree. What’s the best Ruby-style way to do this?

Hmm. It does seem inelegant, but I’m not sure
how to improve on it.

Things like \1 and \2 are only useful in doing substitutions,
I suppose.

One way, if you don’t mind (the new) scanf, would be:

day, month, year = date.scanf("%d %s %d")

HTH,
Hal

···

----- Original Message -----
From: “William Pietri” william-news-383910@scissor.com
Newsgroups: comp.lang.ruby
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Wednesday, August 28, 2002 3:31 PM
Subject: Newbie converting brain from perl

Peter_Suschlik · 29 August 2002 00:49

Hi,

[…]

date = "28 Aug 2002"
day, month, year = /(\d+) (\w+) (\d+)/.match(date)[1..3]

day, month, year = date.scan(/[\d\w]+/) # => ["28", "Aug", "2002"]

Regards
Peter

Florian_Frank2 · 29 August 2002 13:46

I stumbled over this problem before. I think there isn’t a really
elegant solution for this right now.

Adding MatchData#captures to Ruby would be possible, though, without
breaking much existing code:

/(\d+) (\w+) (\d+)/.match("28 Aug 2002").captures
==> ["28", "Aug", "2002"]

I’ve made a little patch for this behaviour (attachment).

re-captures.patch (903 Bytes)
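
For readers on a later Ruby: releases after this thread do ship a MatchData#captures method, so the idiom asked about at the top can be sketched as below, assuming a Ruby that has it. The nil guard is an addition for the illustration, since match returns nil when the pattern does not match.

```ruby
date = "28 Aug 2002"

# match returns a MatchData object, or nil if the pattern does not match.
match = /(\d+) (\w+) (\d+)/.match(date)

# captures returns only the parenthesised groups, without the full match,
# so no [1..3] slice is needed.
day, month, year = match.captures if match

p day, month, year
```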

···

On Wed, 2002-08-28 at 22:31, William Pietri wrote:

Howdy! As a long time Perl, Java, and Objective-C developer, I’m quite taken
with Ruby, and I’m busily converting my brain over. But there’s a Perl
idiom that I’m having trouble translating. In Perl it goes like this:
$date = "28 Aug 2002";
($day, $month, $year) = $date =~ /(\d+) (\w+) (\d+)/;
In Ruby the best I’ve been able to do is this:
date = "28 Aug 2002"
day, month, year = /(\d+) (\w+) (\d+)/.match(date)[1..3]
But that final [1..3] thing seems inelegant, which makes me think I’m
barking up the wrong tree. What’s the best Ruby-style way to do this?

–
There is no God, and Murphy is his prophet.

William_Pietri · 29 August 2002 16:33

Thanks to all who gave advice about the best way to get matching regexp
items out; I found it very helpful. After years of doing most of my OO work
in Java, it’s exciting to be reminded that an OO language can be as
programmer-friendly as Perl.

Here’s the next thing that I’m puzzled about. I’m looking for the most
efficient way to do a Factory pattern.

When parsing a system log file, there are lots of lines of different sorts.
It would make sense that one should have a base class, LogLine, and then a
bunch of subclasses for particular entries (SendmailLogLine, NamedLogLine,
etc.).

So one might start out with

class LogLine
    attr_reader :month, :day, :hour, :minute, :second, 
            :host, :program, :pid, :message
    [...]
end

But from there, I have two questions:

Is it possible (or even sensible) to override LogLine.new so that it
returns an instance of the appropriate subclass? I’ve never liked the
Java-style LogLine.getInstance(args); it seems unfair to make programmers
keep track of whether you’re using a Factory internally or not.
What’s the most efficient way to give each subclass a chance to say “Yes,
that line is best represented by me,” without doing a lot of repetitive
parsing?

The obvious OO way seems to be that each subclass would, on loading,
register itself with LogLine. As each new LogLine is created, the
superclass would show the line to each of the subclasses, and whichever
subclass liked it the best would be instantiated with the data from the
line.

Done naively, this would mean that each subclass would parse the line, but
that would be wasteful. So obviously, the superclass should do the basic
parsing once and pass the details down. Passing as an Array is brittle and
not very OO. Passing as a Hash is better, but still not right.

Really, the parsed data is best represented as a LogLine, no? So it would
make sense to instantiate a LogLine with the base data, and then pass that
to the subclasses. But when instantiating the subclass, the subclass has to
copy over all the data and throw out the copy of the superclass, which
seems a little wasteful.

So is it possible for a subclass to somehow replace an instance of its
superclass with one of its own, stealing its data without explicitly
copying each field?

I suspect the answer to this is “no”, but Ruby’s OO is so much more flexible
than what I’m used to that I thought it would be good to ask.

Many thanks,

William

Ryan_King · 29 August 2002 01:43

Hrmm… if you don’t have to match /\d+ \w+ \d+/, you might as
well do this:

day, month, year = "28 Aug 2002".split

Ryan King

···

On 2002.08.29, Peter Suschlik peter@zilium.de wrote:

date = "28 Aug 2002"
day, month, year = /(\d+) (\w+) (\d+)/.match(date)[1..3]
day, month, year = date.scan(/[\d\w]+/) # => ["28", "Aug", "2002"]

Massimiliano_Mirra4 · 29 August 2002 20:41

I ran into a similar situation when I wanted to initialize packages
kept on disk in a directory (modelled by PackageDir) and packages kept
in a file (modelled by PackageFile) through a call to
Package.new(path) in both cases.

PackageDir and PackageFile were separated in the first place because
the former supports #pack (returning a PackageFile instance) and the
latter supports #unpack (returning a PackageDir instance).

Here’s how I did it (irrelevant parts omitted):

class Package
  def Package.new(path)
    if self == Package
      # Called on the base class: pick the subclass from the file type.
      case File.ftype(path)
      when 'file' then PackageFile.new(path)
      when 'directory' then PackageDir.new(path)
      end
    else
      # Called on a subclass: fall through to the normal constructor.
      log 1, "initializing a package object from #{File.expand_path(path)}"
      super(path)
    end
  end
end

class PackageDir < Package
  def pack
    …
  end
end

class PackageFile < Package
  def unpack
    …
  end
end

pkg = Package.new('foodir')
pkg.type => PackageDir
pkg = Package.new('foo.rpk')
pkg.type => PackageFile

Hope this helps.

Massimiliano
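
Applied to the LogLine question, the same trick might look like the sketch below. This is only an illustration: the "sendmail[" test and the single SendmailLogLine subclass are hypothetical stand-ins, not anything from William's code.

```ruby
class LogLine
  attr_reader :line

  # Overridden class-level new: when called on LogLine itself it may hand
  # back a subclass instance; when called on a subclass it behaves normally.
  def self.new(line)
    if self == LogLine && line.include?("sendmail[")
      SendmailLogLine.new(line)
    else
      super   # Class#new: allocate an instance of self, then call initialize
    end
  end

  def initialize(line)
    @line = line
  end
end

class SendmailLogLine < LogLine
end

mail  = LogLine.new("Aug 24 12:31:01 myhost sendmail[12345]: foo")
other = LogLine.new("Aug 24 12:31:02 myhost cron[77]: bar")
p mail.class, other.class
```

Either way the caller gets something that is_a? LogLine, which is the property William cares about.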

···

On Fri, Aug 30, 2002 at 01:33:01AM +0900, William Pietri wrote:

So one might start out with
class LogLine
    attr_reader :month, :day, :hour, :minute, :second, 
            :host, :program, :pid, :message
    [...]
end
But from there, I have two questions:

Is it possible (or even sensible) to override LogLine.new so that it
returns an instance of the appropriate subclass?

Gavin_Sinclair · 30 August 2002 06:29

Thanks to all who gave advice about the best way to get matching regexp
items out; I found it very helpful. After years of doing most of my OO
work in Java, it’s exciting to be reminded that an OO language can be
as programmer-friendly as Perl.

Here’s the next thing that I’m puzzled about. I’m looking for the most
efficient way to do a Factory pattern.

When parsing a system log file, there are lots of lines of different
sorts. It would make sense that one should have a base class, LogLine,
and then a bunch of subclasses for particular entries (SendmailLogLine,
NamedLogLine, etc.).

So one might start out with
class LogLine
    attr_reader :month, :day, :hour, :minute, :second,
            :host, :program, :pid, :message
    [...]
end

I’d use a Time object instead of :month, :day, etc.

But from there, I have two questions:

Is it possible (or even sensible) to override LogLine.new so that it
returns an instance of the appropriate subclass? I’ve never liked the
Java-style LogLine.getInstance(args); it seems unfair to make
programmers keep track of whether you’re using a Factory internally or
not.

There’s no such method as LogLine.new, I’m afraid. When you define the
class LogLine, you are creating a constant called LogLine which is of type
class. When you call LogLine.new, it is the new method in class Class
that you are calling. It, of course, calls LogLine.init, and the rest is
history.

So creating a (static) method LogLine.create or something like that is
probably the way to go.

I wouldn’t try to make subclasses of LogLine straight away, unless you
already know what you want to do with them.

What’s the most efficient way to give each subclass a chance to say
“Yes, that line is best represented by me,” without doing a lot of
repetitive parsing?

Passing the string (minus the date) is all I can think you can do. You
can’t slice the string in a way that’s meaningful to all subclasses.

The obvious OO way seems to be that each subclass would, on loading,
register itself with LogLine.

Don’t know what you mean here.

As each new LogLine is created, the
superclass would show the line to each of the subclasses, and whichever
subclass liked it the best would be instantiated with the data from the
line.

What does “like it the best” mean?

Done naively, this would mean that each subclass would parse the line,
but that would be wasteful. So obviously, the superclass should do the
basic parsing once and pass the details down. Passing as an Array is
brittle and not very OO. Passing as a Hash is better, but still not
right.

It of course depends on your app, but as I said above, you probably can’t
produce something at the base level that is meaningful to all subclasses.

BTW There’s nothing non-OO about arrays in Ruby. They’re not like Java!

Really, the parsed data is best represented as a LogLine, no? So it
would make sense to instantiate a LogLine with the base data, and then
pass that to the subclasses. But when instantiating the subclass, the
subclass has to copy over all the data and throw out the copy of the
superclass, which seems a little wasteful.

This doesn’t seem meaningful to me. All my comments are pointing to the
conclusion that you’re on the wrong track.

So is it possible for a subclass to somehow replace an instance of its
superclass with one of its own, stealing its data without explicitly
copying each field?

I suspect the answer to this is “no”, but Ruby’s OO is so much more
flexible than what I’m used to that I thought it would be good to ask.

Basically, I think you should put all the logic of what subclass best
represents the line into LogLine, like:

class LogLine
  attr_reader :time

  def init; raise "Can't instantiate LogLine"; end

  # Class-level factory method, so it can be called as LogLine.create(line).
  def LogLine.create(line)
    index = line[':']
    time = parse_time(line.slice!(index))

    case line
    when SENDMAIL_REGEX
      return SendMailLogLine.new(...)
    when NAMED_REGEX
      return NamedLogLine.new(...)
    else
      return BasicLogLine.new(line)
    end
  end
end

It may be a bit awkward for the client of LogLine to know what to do with
the return value - it’ll probably have to type-check it - but that’s
because of the subclassing, not because of my approach.

Anyway, you asked for a factory, I gave you a factory!

Many thanks,

William

Cheers,
Gavin

Tobias1 · 30 August 2002 10:54

class LogLine
    attr_reader :month, :day, :hour, :minute, :second,
            :host, :program, :pid, :message
    [...]
end

I’d use a Time object instead of :month, :day, etc.

of course

But from there, I have two questions:

Is it possible (or even sensible) to override LogLine.new so that it
returns an instance of the appropriate subclass? I’ve never liked the
Java-style LogLine.getInstance(args); it seems unfair to make
programmers keep track of whether you’re using a Factory internally or
not.

There’s no such method as LogLine.new, I’m afraid. When you define the
class LogLine, you are creating a constant called LogLine which is of type
class. When you call LogLine.new, it is the new method in class Class
that you are calling. It, of course, calls LogLine.init, and the rest is
history.

(That should read initialize, not init.)

He can add a singleton method:

def LogLine.new(logline_string)
  # return subclass instance if appropriate
  # if not, create a plain LogLine:
  super(logline_string)
end

So creating a (static) method LogLine.create or something like that is
probably the way to go.

This is better than overriding new because programmers expect
SomeClass.new.class == SomeClass

The obvious OO way seems to be that each subclass would, on loading,
register itself with LogLine.

Sounds sensible. Just add to your logline class:
class LogLine
  @@subclasses = [LogLine]

  def LogLine.inherited(subclass)
    @@subclasses << subclass
  end

  def LogLine.create(logline_string)
    # parse the date
    ratings = @@subclasses.collect { |subclass|
      subclass.rate(string_after_date)
    }
    best_subclass = @@subclasses[ratings.index(ratings.max)]
    best_subclass.new(date, string_after_date)
  end
end

and implement the class method rate for LogLine and all its subclasses to
return a number that indicates the likelihood that it is responsible for
this logline. All classes that inherit from LogLine will automatically be
added to the class variable @@subclasses.
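
A runnable version of this registration idea could look roughly like the following. Everything not named above is an assumption made for the sketch: the fixed 15-character timestamp split, the rating numbers, and the SendmailLogLine pattern.

```ruby
class LogLine
  @@subclasses = [LogLine]

  attr_reader :date, :message

  # Ruby calls this hook every time a class inherits from LogLine.
  def LogLine.inherited(subclass)
    super
    @@subclasses << subclass
  end

  # Default rating: a plain LogLine can represent any line, but only just.
  def LogLine.rate(message)
    1
  end

  def LogLine.create(logline_string)
    # Hypothetical split: assume a 15-character syslog timestamp up front.
    date = logline_string[0, 15]
    message = logline_string[16..-1]
    # Ask every registered class for a rating and pick the highest.
    best_subclass = @@subclasses.max_by { |subclass| subclass.rate(message) }
    best_subclass.new(date, message)
  end

  def initialize(date, message)
    @date = date
    @message = message
  end
end

class SendmailLogLine < LogLine
  def SendmailLogLine.rate(message)
    message =~ /\bsendmail\[/ ? 10 : 0
  end
end

mail  = LogLine.create("Aug 24 12:31:01 myhost sendmail[12345]: foo")
other = LogLine.create("Aug 24 12:31:02 myhost cron[77]: bar")
p mail.class, other.class
```

New subclasses only need to define rate; LogLine.create never has to change.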

Really, the parsed data is best represented as a LogLine, no?
So it
would make sense to instantiate a LogLine with the base data, and then
pass that to the subclasses. But when instantiating the subclass, the
subclass has to copy over all the data and throw out the copy of the
superclass, which seems a little wasteful.

You want a “become” method:

d = LogLine.new(...)
old_id = d.id
puts d.class            # => LogLine
d.become(LogLine_Subclass)
puts d.class            # => LogLine_Subclass
puts(d.id == old_id)    # => true

To my knowledge, ruby does not have this functionality. I’d like to be
proven wrong on this. Maybe it’s possible to implement this in a C
extension that does some tricks with ruby’s object representation?

conclusion that you’re on the wrong track.

No. Apart from wanting to use the new method, his approach looks fairly
sensible to me. He presented refreshing ideas and showed that he
understands OO programming fairly well. He’s just new to Ruby
and needs help in mapping his ideas to the language.

Basically, I think you should put all the logic of what subclass best
represents the line into LogLine, like:

The logic of the subclass should be part of the subclass. Otherwise:
maintenance nightmare.

It may be a bit awkward for the client of LogLine to know what to do with
the return value - it’ll probably have to type-check it

Why? It is_a LogLine, that’s for sure. The client got what it asked for.

Tobias

···

On Fri, 30 Aug 2002, Gavin Sinclair wrote:

William_Pietri · 30 August 2002 16:35

Howdy, Gavin. Thanks for taking the time to answer!

Tobias responded to a lot of your comments already; I just wanted to add a
little more:

Gavin Sinclair wrote:

So one might start out with
class LogLine
    attr_reader :month, :day, :hour, :minute, :second,
            :host, :program, :pid, :message
    [...]
end
I’d use a Time object instead of :month, :day, etc.

We’re thinking along the same lines; part of that […] is a method that
returns a Time object. But I did it this way for a couple of reasons. One
is that because time in the standard syslog file is ambiguous (you’ll note
there’s no year or tz field), it’s important to offer the uninterpreted
data. Another issue is one of speed: I’d rather defer creation of the Time
object until somebody actually needs it.

I wouldn’t try to make subclasses of LogLine straight away, unless you
already know what you want to do with them.

Happily, I do. I need to process my mail logs to see which DNS-based
blacklists would cut spam without yielding false positives. This struck me
as a fine project to learn Ruby with. But once I have some decent
classes, they will undoubtedly be reused for other log analysis.

Passing the string (minus the date) is all I can think you can do. You
can’t slice the string in a way that’s meaningful to all subclasses.

Well, that’s not entirely true. All LogLine subclasses agree on the basic
fields in the LogLine class, so I want to slice those up first. Other
subclasses may be able to ferret out additional data from the message, so
each subclass may do further slicing.

Done naively, this would mean that each subclass would parse the line,
but that would be wasteful. So obviously, the superclass should do the
basic parsing once and pass the details down. Passing as an Array is
brittle and not very OO. Passing as a Hash is better, but still not
right.

BTW There’s nothing non-OO about arrays in Ruby. They’re not like Java!

Ah, I should have explained more what I meant.

In OO analysis and design, you find groups of data that travel together and
call them objects. If I break a log line up into an array of fields

["Aug", "24",  "12", "31", "01", "myhost", "sendmail", "12345", "foo"]

and pass that around, that’s not very OO because there is an implicit
structure that isn’t expressed in my representation. Arrays are just a row
of things, and I know more about the log line than that. So I could use a
hash to represent the structure and the data together:

{ "month"=>"Aug", "day"=>"24", "hour"=>"12", [...] }

Which is better, right? With the array if I added a field in the middle (say
syslog starts logging microseconds) then everything breaks. But with the
hash, it’s more flexible.

But then I’ve got to say to myself, “Gosh, that hash looks a lot like an
object.” And if I turn it into a full-fledged object, I can add behavior,
like a call to get a Time object.

So that’s what I meant when I said that passing arrays like this around
isn’t such good OO design.

Basically, I think you should put all the logic of what subclass best
represents the line into LogLine, like: […]
case line
  when SENDMAIL_REGEX
    return SendMailLogLine.new(...)
  when NAMED_REGEX
    return NamedLogLine.new(...)
  else
    return BasicLogLine.new(line)
end
end
end

That’s a good start, but you will run into trouble with that over time.

The case statement approach works for a few LogLine types, but is brittle.
If you only have a couple of subclasses, then keeping the superclass in
sync with the subclasses is easy. But with 20, it’s a big pain, exactly the
kind of pain that OO is supposed to save us from.

And if you want other people to be able to write dynamically loadable log
parsing modules that extend your current functionality, the case statement
approach breaks down entirely.

It’s been my experience that often when I write a complicated if or case
statement, there’s a set of subclasses waiting to be born. Or if they
already exist, then there’s an opportunity to move the code into the
subclasses.

It may be a bit awkward for the client of LogLine to know what to do with
the return value - it’ll probably have to type-check it - but that’s
because of the subclassing, not because of my approach.

I don’t think awkward is quite the right word; one of the points of object
orientation is that you have typing built in to the language.

In my case, some parts of the code are interested only in certain types of
LogLines; they will indeed check for type. But many parts of the code will
work with any LogLine at all. So a ReportGenerator object would accept a
collection of LogLine objects, but the SendmailReportGenerator would only
choose to process things of the SendmailLogLine type.
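
As a rough sketch of that split (class bodies and method names here are invented for the illustration), the general generator takes any LogLine while the sendmail one filters by type with Array#grep, which matches elements using the class's === test:

```ruby
class LogLine
  attr_reader :message

  def initialize(message)
    @message = message
  end
end

class SendmailLogLine < LogLine
end

class ReportGenerator
  def initialize(lines)
    @lines = lines
  end

  # The generic generator is happy with any LogLine.
  def relevant_lines
    @lines
  end

  def report
    "#{relevant_lines.size} line(s)"
  end
end

class SendmailReportGenerator < ReportGenerator
  # SendmailLogLine === obj is true for its instances, so grep keeps
  # only the sendmail entries.
  def relevant_lines
    @lines.grep(SendmailLogLine)
  end
end

lines = [LogLine.new("cron job ran"), SendmailLogLine.new("mail accepted")]
puts ReportGenerator.new(lines).report
puts SendmailReportGenerator.new(lines).report
```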

Thanks again for taking the time to lend me a hand!

William

William_Pietri · 30 August 2002 20:15

Many thanks for your thoughtful answers, Tobias. I’m going to split my
follow-up questions into a couple of messages, so that the different issues
are in reasonably short posts.

Tobias Peters wrote:

Is it possible (or even sensible) to override LogLine.new so that it
returns an instance of the appropriate subclass? I’ve never liked the
Java-style LogLine.getInstance(args); it seems unfair to make
programmers keep track of whether you’re using a Factory internally or
not.

He can add a singleton method:

def LogLine.new(logline_string)
  # return subclass instance if appropriate
  # if not, create a plain LogLine:
  super(logline_string)
end

So creating a (static) method LogLine.create or something like that is
probably the way to go.

This is better than overriding new because programmers expect
SomeClass.new.class == SomeClass

Should programmers really depend on that? It would seem that if
SomeClass.new returns a subclass of SomeClass, then it should always behave
as a SomeClass. Ergo, no harm done.

Could you give me an example of a situation where overriding new to return a
subclass wouldn’t work? Every example I can think of turns out to be either
bad subclass design or an encapsulation violation on the part of the user
of SomeClass.

That’s not to say I think that overriding new is a good idea yet. I’d never
suggest this in Java, for example; it would cause heart attacks on the part
of the language designers and brain aneurysms for many Java users. But
since Ruby is so much more flexible, I’m trying to see how far I can bend
it.

William

HAL_9000 · 30 August 2002 20:30

BTW There’s nothing non-OO about arrays in Ruby. They’re not like Java!

Ah, I should have explained more what I meant.

In OO analysis and design, you find groups of data that travel together and
call them objects. If I break a log line up into an array of fields
["Aug", "24",  "12", "31", "01", "myhost", "sendmail", "12345", "foo"]
and pass that around, that’s not very OO because there is an implicit
structure that isn’t expressed in my representation. Arrays are just a row
of things, and I know more about the log line than that. So I could use a
hash to represent the structure and the data together:
{ "month"=>"Aug", "day"=>"24", "hour"=>"12", [...] }
Which is better, right? With the array if I added a field in the middle (say
syslog starts logging microseconds) then everything breaks. But with the
hash, it’s more flexible.

If you’re storing only data (not methods),
consider a Struct for this.

Hal
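
A small sketch of the Struct suggestion, using the field names from William's attr_reader list. The name LogFields and the clock helper are made up here; the block form just shows that behavior can still be added later if needed.

```ruby
# Struct.new builds a class with named accessors for each field.
LogFields = Struct.new(:month, :day, :hour, :minute, :second,
                       :host, :program, :pid, :message) do
  # Optional behavior, e.g. the clock time as one string.
  def clock
    "#{hour}:#{minute}:#{second}"
  end
end

fields = LogFields.new("Aug", "24", "12", "31", "01",
                       "myhost", "sendmail", "12345", "foo")

puts fields.host    # access by name, as with the hash
puts fields.clock
p fields.to_a       # still available as a plain array when wanted
```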

···

----- Original Message -----
From: “William Pietri” william-news-383910@scissor.com
Newsgroups: comp.lang.ruby
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Friday, August 30, 2002 11:35 AM
Subject: Re: Newbie converting brain from perl

Tobias1 · 30 August 2002 21:15

To clarify what I wrote:

You want a “become” method:

d = LogLine.new(...)
old_id = d.id
puts d.class            # => LogLine
d.become(LogLine_Subclass)
puts d.class            # => LogLine_Subclass
puts(d.id == old_id)    # => true

This idea of a become method was inspired by, but is not identical to
Smalltalk’s become: anObject method. The proposed become method just
changes the class of its receiver, it does not replace all existing
references to the receiver with references to the argument object, as the
Smalltalk version does.

Tobias

William_Pietri · 30 August 2002 21:15

Tobias Peters wrote:

The obvious OO way seems to be that each subclass would, on loading,
register itself with LogLine.

Sounds sensible. Just add to your logline class:
class LogLine
  @@subclasses = [LogLine]

  def LogLine.inherited(subclass)
    @@subclasses << subclass
  end

  def LogLine.create(logline_string)
    # parse the date
    ratings = @@subclasses.collect { |subclass|
      subclass.rate(string_after_date)
    }
    best_subclass = @@subclasses[ratings.index(ratings.max)]
    best_subclass.new(date, string_after_date)
  end
end

Awesome! The Class.inherited method is just what I needed. Sorry I didn’t
notice that the first time I read the docs!

Regarding the factory method, would something like this be just as
Ruby-appropriate?

def LogLine.create(logline_string)
    base_line = LogLine.new(logline_string)
    ratings = @@subclasses.collect{|subclass|
       subclass.rate(base_line)
    }
    best_subclass = @@subclasses[ratings.index(ratings.max)]
    best_subclass.new(base_line)
end

My notion is that rather than reparsing string_after_date over and over, the
common features would be parsed exactly once in LogLine.initialize, and the
subclasses would only parse the log message content. (Further progress
along this path could be made by LogLine only getting ratings from direct
subclasses, letting the subsubclasses poll their children as needed, but
that’s more an implementation detail.)

LogLine.new would end up being polymorphic, of course. Speaking of which,
what’s the best Ruby idiom for that? In Java, one does this as something
like

LogLine(String logString) {
    parse(logString);
}

LogLine(LogLine logLine) {
    copy(logLine);
}

Is the best way to do that in Ruby a case statement in the
LogLine.initialize method that checks the type of the argument?
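
For reference, the case-statement version of that idea would look something like this. It is a sketch only: the two-field "host message" parser is a deliberately tiny stand-in for real syslog parsing, and parse and copy simply mirror the names in the Java fragment.

```ruby
class LogLine
  attr_reader :host, :message

  # Ruby has no constructor overloading, so one initialize inspects
  # its argument; case/when uses Class#=== to test the type.
  def initialize(source)
    case source
    when String
      parse(source)
    when LogLine
      copy(source)
    else
      raise ArgumentError, "expected a String or a LogLine, got #{source.class}"
    end
  end

  private

  # Hypothetical parser: everything before the first space is the host.
  def parse(log_string)
    @host, @message = log_string.split(" ", 2)
  end

  def copy(log_line)
    @host = log_line.host
    @message = log_line.message
  end
end

original = LogLine.new("myhost sendmail[12345]: foo")
duplicate = LogLine.new(original)
p duplicate.host, duplicate.message
```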

Thanks,

William

···

On Fri, 30 Aug 2002, Gavin Sinclair wrote:

William_Pietri · 30 August 2002 21:35

Tobias Peters wrote:

So it
would make sense to instantiate a LogLine with the base data, and then
pass that to the subclasses. But when instantiating the subclass, the
subclass has to copy over all the data and throw out the copy of the
superclass, which seems a little wasteful.

You want a “become” method:

d = LogLine.new(...)
old_id = d.id
puts d.class            # => LogLine
d.become(LogLine_Subclass)
puts d.class            # => LogLine_Subclass
puts(d.id == old_id)    # => true

Exactly! Although I’d be nervous about doing it in this fashion. Instead,
I’d only use it in constructors, as it would scare me to have the type
changing at runtime. And I think I’d only do it with the direct
superclass. So I’d use it something like this:

class LogLine_Subclass < LogLine
    def initialize(arg)
        if arg.class == self.class.superclass
            become(arg)
        else 
            super(arg)
        end
        # do subclass initialization here
    end
end

That still seems subtly weird to me, but I think y’all can see where I’m
headed.

To my knowledge, ruby does not have this functionality. I’d like to be
proven wrong on this. Maybe it’s possible to implement this in a C
extension that does some tricks with ruby’s object representation?

Hmmm… That’s a little more than I want to tackle during my first week
using Ruby. But if there’s a less hairy way to do this, that’s swell.
Otherwise, I can just copy all the data from the superclass to the
subclass.
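
One possibly less hairy way to do that copy without naming each field is to loop over the instance variables. The sketch below assumes a toy "host message" format, and the queue_id field is hypothetical; note that the copied values are shared references, not deep copies.

```ruby
class LogLine
  attr_reader :host, :message

  def initialize(arg)
    if arg.is_a?(LogLine)
      # Take over every instance variable of the already-parsed line.
      arg.instance_variables.each do |name|
        instance_variable_set(name, arg.instance_variable_get(name))
      end
    else
      # Hypothetical minimal parse of a raw string.
      @host, @message = arg.split(" ", 2)
    end
  end
end

class SendmailLogLine < LogLine
  attr_reader :queue_id

  def initialize(arg)
    super
    # Subclass-only parsing, working on the message parsed by the base class.
    @queue_id = message[/\A(\w+):/, 1]
  end
end

base = LogLine.new("myhost g7UKZ1x01234: from=<a@example.com>")
mail = SendmailLogLine.new(base)
p mail.host, mail.queue_id
```

This is not the become method discussed above: mail is a new object, and base is simply left for the garbage collector.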

Thanks,

William

P.S. Thanks again to everybody for putting up with newbie questions like
this. If the FAQ maintainer would like any of these questions in the FAQ,
just let me know and I’ll summarize and rewrite things to be in FAQ form.

Gavin_Sinclair · 31 August 2002 07:23

Howdy, Gavin. Thanks for taking the time to answer!

Tobias responded to a lot of your comments already; I just wanted to add a
little more:

Your problem and solution, with its variants, have made interesting reading.
I’ve ruthlessly snipped the rest for brevity.

Basically, I think you should put all the logic of what subclass best
represents the line into LogLine, like: […]
case line
  when SENDMAIL_REGEX
    return SendMailLogLine.new(...)
  when NAMED_REGEX
    return NamedLogLine.new(...)
  else
    return BasicLogLine.new(line)
end
end
end
That’s a good start, but you will run into trouble with that over time.

The case statement approach works for a few LogLine types, but is brittle.
If you only have a couple of subclasses, then keeping the superclass in
sync with the subclasses is easy. But with 20, it’s a big pain, exactly the
kind of pain that OO is supposed to save us from.

And if you want other people to be able to write dynamically loadable log
parsing modules that extend your current functionality, the case statement
approach breaks down entirely.

It’s been my experience that often when I write a complicated if or case
statement, there’s a set of subclasses waiting to be born. Or if they
already exist, then there’s an opportunity to move the code into the
subclasses.

I agree with all of the above. I think the issue is one of scalability.
Decomposing if/case statements into polymorphism is a good practice, but you
never get rid of them all - you just factor them into the single if/case
statement that decides what type of object it’s going to be: like my code
above.

When I think “factory”, I think “assembly line that spits out objects”, not “a
communications link for objects to decide what type they want to be”.

20 subclasses doesn’t necessarily make it a pain: those subclasses will be
arranged in a hierarchy, so the “factory” logic can be spread out a bit.

Keeping the base class in sync, and allowing new modules to fit automatically,
is a great point. I hope your approach works well.

It may be a bit awkward for the client of LogLine to know what to do with
the return value - it’ll probably have to type-check it - but that’s
because of the subclassing, not because of my approach.

I don’t think awkward is quite the right word; one of the points of object
orientation is that you have typing built in to the language.

In my case, some parts of the code are interested only in certain types of
LogLines; they will indeed check for type. But many parts of the code will
work with any LogLine at all. So a ReportGenerator object would accept a
collection of LogLine objects, but the SendmailReportGenerator would only
choose to process things of the SendmailLogLine type.

The reason I think it may be awkward is because I can’t imagine the various
types of LogLine having much in common. So you can’t write client code that
sends generic methods to specific objects. I may be wrong here, and you have
pointed out that there is some domain-specific similarity in your project, but
I still envisage a lot of case statements in the client code.

Thanks again for taking the time to lend me a hand!

William

Pleasure. If there are no issues, I’d like to see the source code when you’re
done, out of curiosity. Good luck.

Gavin

···

----- Original Message -----
From: “William Pietri” william-news-383910@scissor.com

Tobias1 · 30 August 2002 22:35

So creating a (static) method LogLine.create or something like that is
probably the way to go.

This is better than overriding new because programmers expect
SomeClass.new.class == SomeClass

Should programmers really depend on that?
[…]
But
since Ruby is so much more flexible, I’m trying to see how far I can bend
it.

You know that you can “bend” ruby that far. I’m more concerned with the
programmers having to read that sort of code. When I read
“SomeClass.new”, I think “constructor”. When the code says
“SomeClass.create” I think “factory”. There is no problem for the
interpreter. Just with the people (erroneously) thinking “just why do we
call SomeClass’s constructor here? I thought we would need an instance
of SomeSubclass”. It’s about readability, not functionality.

Tobias

···

On Fri, 30 Aug 2002, William Pietri wrote:
