Standard type conversion mechanism

OK, I’m posting this because a few people on #ruby-lang thought it
should be posted, so here it is.

A number of us desire a more standard, more scalable type conversion
mechanism. By “type conversion”, I mean you’ve got one thing (say, an
Integer), and you want it to be another (say, a String).

Currently we do this:

x = 42
s = x.to_s

The problem is this is both a loose convention and it doesn’t scale well
at all. A case-in-point of the former:

x  = nil
s  = ""
s << x

=> TypeError: cannot convert nil into String

Yet, there’s a NilClass#to_s… but this looks for NilClass#to_str.
There’s not even internal consistency. This leads us to the second
point; it doesn’t scale. Providing a conversion from every type to
every other requires on the order of (types^2) methods overall, one per
target type in each class. (Granted, you’re not likely to have a
conversion from every type to every other, but this is a worst-case.)

There isn’t a good way of naming these methods, either. If I have
A::b::C and want it as an A::b::D, should I have A::b::C#to_a_b_d?
Should I start adding methods to classes that have little relation to
other classes, just for quick conversions? This breaks a form of
containership, and requires one class know about another one, even
completely unrelated.

To summarize, here are the cons of the current system:

*  No standardization.  (#to_i, #to_a, #to_s, #to_str, what about
   other classes?)

*  Poor scaling. (Lots of #to_* methods)

*  Inelegant.  (Requires one class "know" about another class)

OK, enough complaining. I have a working solution: a ConversionTable.
Currently, you can find the implementation here:

http://mephle.org/conversion.rb

Here’s an example:

require 'conversion'

ConversionTable.add(Integer, String)  { |i| i.to_s }
ConversionTable.add(Integer, Boolean) { |i| i != 0 }
ConversionTable.add(String,  Integer) { |s| s.to_i }

x = 42
p x.as(String)
p x.as(Boolean)

s = "42"
p s.as(Integer)
p s.can_convert?(Integer)
p s.can_convert?(Array)
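
The linked implementation isn't reproduced in this thread, so what follows is only a minimal sketch of how the three calls in the example (ConversionTable.add, #as, #can_convert?) could be wired together. Everything in it is an assumption made for illustration: the hash keyed on [from, to], the exact-class lookup, the TypeError raised for a missing entry, and the empty Boolean module (Ruby has no built-in Boolean class, so the example needs some constant to use as a key). It also assumes Ruby 2.4 or later, where 42.class is Integer rather than Fixnum.

```ruby
# Hypothetical sketch, not the actual conversion.rb.

# Ruby has no Boolean class; an empty module serves as a table key here.
module Boolean; end

class ConversionTable
  @table = {}   # maps [from_class, to_type] => conversion block

  # Register a block that converts instances of `from` into `to`.
  def self.add(from, to, &block)
    @table[[from, to]] = block
  end

  # Return the registered block, or nil if there is none.
  def self.lookup(from, to)
    @table[[from, to]]
  end
end

class Object
  # Convert the receiver using whatever block is registered for its class.
  def as(type)
    conv = ConversionTable.lookup(self.class, type)
    raise TypeError, "no conversion from #{self.class} to #{type}" unless conv
    conv.call(self)
  end

  # Query the table without performing the conversion.
  def can_convert?(type)
    !ConversionTable.lookup(self.class, type).nil?
  end
end

ConversionTable.add(Integer, String)  { |i| i.to_s }
ConversionTable.add(Integer, Boolean) { |i| i != 0 }
ConversionTable.add(String,  Integer) { |s| s.to_i }

p 42.as(String)              # => "42"
p 42.as(Boolean)             # => true
p "42".as(Integer)           # => 42
p "42".can_convert?(Array)   # => false
```

Because the lookup here matches the exact class only, a subclass of String would not pick up the String conversions; walking self.class.ancestors would be one way to relax that.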

This provides the following advantages:

*  Standard.  There's no ambiguity about which method to call;
   things are based entirely on the class/module itself.

*  Scalable.  Adding conversions adds them in one place; no "method
   pollution".

*  Elegant.  Conversion "glue" is outside the class itself,
   requiring neither class to know about the others.

*  Easy to add new conversions.

*  Scalable part 2.  Theoretically, "conversion inferencing" can be
   done.  That is, given A => B, and B => C, you can ask for A => C
   and it'll figure out how.  (Note the current implementation does
   not handle this.)

*  Conversions can be queried.

*  Fits over existing methods and doesn't break backward
   compatibility.

I have an interest in developing some further dimensions to this, as I’m
contemplating semantics/unit and context-sensitive programming in Ruby.
(So you can say, “this is an integer in bytes, give me a string
representation in kilobytes” and it does the magic for you.) I don’t
have any immediate plans to formalize this as a module (since it lacks
the further functionality I’d like, and I don’t have the time to finish
it), but I know a number of people who are pressing to have (at least)
the basic conversion mechanism as a more built-in standard.

Thoughts, comments, etc, encouraged.

···

A::b::C and want it as an A::b::D, should I have A::b::C#to_a_b_d?


Ryan Pavlik rpav@users.sf.net

“Another perfectly calculated space-time
splice-n-splice. Now to get back to… wait a second.
I forgot to carry the TWO!” - 8BT

A number of us desire a more standard, more scalable type conversion
mechanism. By “type conversion”, I mean you’ve got one thing (say, an
Integer), and you want it to be another (say, a String).

Currently we do this:

x = 42
s = x.to_s

The problem is this is both a loose convention and it doesn’t scale well
at all. A case-in-point of the former:

x  = nil
s  = ""
s << x

=> TypeError: cannot convert nil into String

Yet, there’s a NilClass#to_s… but this looks for NilClass#to_str.
There’s not even internal consistency.

It is consistent:

  • to_int, to_str, to_ary are called implicitly
  • to_i, to_s, to_a are called explicitly

In other words, if an operator decides to do an automatic cast of one of its
operands, it calls the long version. If you want to force a conversion
yourself, you use the short version.

This is done because there is loss of data in the second case, and allowing
automatic conversion would hide bugs which would be very hard to detect. For
example, the value ‘nil’ is not the same as an empty string. If I do
mystr << myval

where myval is ‘nil’ then in general I want it to raise an exception,
because ‘nil’ is not a string, so I probably forgot to initialise it.
However, if I know that myval may contain nil, and want it treated as an
empty string, then I do
mystr << myval.to_s

The same applies for strings containing numeric values, and vice versa.
a = "1"
b = 2
c = a + b   # String "12"? Fixnum 3? Or something else?

What should c contain? The answer is ‘it depends’. Ruby can’t know what your
intention is, so it fails. You tell it what you want by using either ‘to_s’
or ‘to_i’ to convert.
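
The implicit/explicit split described above can be seen directly in irb. This is a small sketch, not anything from the proposal; the Label class is hypothetical and exists only to show an object opting in to implicit conversion by defining to_str.

```ruby
# Hypothetical class that declares itself string-like by defining to_str.
class Label
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end
end

# String#+ calls to_str on its argument, so this works without a cast.
greeting = "Hello, " + Label.new("world")

# nil defines to_s but not to_str, so the implicit path refuses it.
implicit_failed =
  begin
    "abc" + nil
    false
  rescue TypeError
    true
  end

# The explicit short form says "yes, I really do want nil as a string".
explicit = "abc" + nil.to_s

# Mixing a String and an Integer is ambiguous, so Ruby raises
# instead of guessing which of the two results was intended.
mixed_failed =
  begin
    "1" + 2
    false
  rescue TypeError
    true
  end

sum = "1".to_i + 2   # numeric addition
cat = "1" + 2.to_s   # string concatenation

p greeting          # => "Hello, world"
p implicit_failed   # => true
p explicit          # => "abc"
p mixed_failed      # => true
p sum               # => 3
p cat               # => "12"
```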

Your solution, a.as(Fixnum) + b.as(Fixnum), seems to me to be the same as
a.to_i + b.to_i, but more verbose and less efficient.

This leads us to the second
point; it doesn’t scale. Providing a conversion from every type to
every other requires on the order of (types^2) methods overall.

Sure, but in the general case it does not make sense to attempt to convert
type X to type Y. Object X has instance variables @a,@b,@c and Object Y has
instance variables @d, @e; how are you going to convert an instance of X
into an instance of Y?

It probably makes sense for most objects to have a ‘printable’ form of their
contents for display purposes - hence X.to_s - and also a detailed
printable version for debugging, hence X.inspect. But I don’t see any need
for generic ‘X to Y’.

Regards,

Brian.

···

On Tue, Jun 24, 2003 at 04:45:05AM +0900, Ryan Pavlik wrote:

Kent Beck’s book Smalltalk Best Practice Patterns covers this topic. It
discusses two gotchas to the practice of adding conversion methods to
the source class:

  • No limit to number of methods to be added – the protocol tends to
    grow and grow.

  • It couples the source class to another class it wouldn’t otherwise
    have knowledge of.

He says there should only be a conversion method when either 1) the
source and destination classes have the same protocol, or 2) there’s
only one reasonable way to implement the conversion. His solution for
conversions when neither of these conditions is true is to add a
“Converter Constructor Method” on the destination class. So instead of
string.as(Date), you’d say Date.from_string(string). This keeps the
logic on Date, which is where it belongs.
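
For readers who haven't seen the pattern, here is a rough sketch of a Converter Constructor Method in Ruby. SimpleDate and its YYYY-MM-DD format are made up for the illustration (the standard Date class has no from_string method), so treat the names as assumptions rather than as Beck's code or any library's API.

```ruby
# Hypothetical destination class; it owns the knowledge of how to be
# built from a String, so String never needs to hear about it.
SimpleDate = Struct.new(:year, :month, :day) do
  # Converter Constructor Method: parse "YYYY-MM-DD" into a SimpleDate.
  def self.from_string(text)
    m = /\A(\d{4})-(\d{2})-(\d{2})\z/.match(text)
    raise ArgumentError, "expected YYYY-MM-DD, got #{text.inspect}" unless m
    new(m[1].to_i, m[2].to_i, m[3].to_i)
  end
end

d = SimpleDate.from_string("2003-06-24")
p d.year    # => 2003
p d.month   # => 6
p d.day     # => 24
```

The trade-off is the one the thread is arguing about: the parsing logic lives with the destination class, but that class now has to know about every source type it accepts.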

As for the proposed code, what would a conversion table buy me?
Wouldn’t it just create a third place to put code that knows about my
class? IMHO, the problem Ryan mentioned with NilClass#to_str not
existing should be resolved in accordance with the POLS, but a
conversion table smells like bad design to me. If I’m wrong, I’d like
to better understand the benefits of the conversion table proposal.

···

On Tuesday, Jun 24, 2003, at 02:50 America/Chicago, Brian Candler wrote:

It probably makes sense for most objects to have a ‘printable’ form of
their contents for display purposes - hence X.to_s - and also a
detailed printable version for debugging, hence X.inspect. But I
don’t see any need for generic ‘X to Y’.


John Platte
Principal Consultant, NIKA Consulting
http://nikaconsulting.com/

Kent Beck’s book Smalltalk Best Practice Patterns covers this topic. It
discusses two gotchas to the practice of adding conversion methods to
the source class:

  • No limit to number of methods to be added – the protocol tends to
    grow and grow.

  • It couples the source class to another class it wouldn’t otherwise
    have knowledge of.

These are exactly the two problems this design solves. Having a separate
table means 1) no additional methods are added to the class, and 2) neither
class needs to know about the other.

He says there should only be a conversion method when either 1) the
source and destination classes have the same protocol, or 2) there’s
only one reasonable way to implement the conversion. His solution for
conversions when neither of these conditions is true is to add a
“Converter Constructor Method” on the destination class. So instead of
string.as(Date), you’d say Date.from_string(string). This keeps the
logic on Date, which is where it belongs.

This is a bit of a grey area. I’d still tend toward the side which says
two types shouldn’t have to know about each other. Any parsers or the
like should be external to both, because otherwise, as above, the number
of parsers and methods in your class can grow without bound.

As for the proposed code, what would a conversion table buy me?
Wouldn’t it just create a third place to put code that knows about my
class? IMHO, the problem Ryan mentioned with NilClass#to_str not
existing should be resolved in accordance with the POLS, but a
conversion table smells like bad design to me. If I’m wrong, I’d like
to better understand the benefits of the conversion table proposal.

It is a third place to put code. It just so happens that this is
where the code belongs, due to both of the above problems. Given types
A and B, there is no reason A or B necessarily know about each other, or
are even in the same package/module/etc. They may be totally unrelated,
yet it may become desirable for one type to be converted to the other.

Thus you need a place for the “intermediary” or “glue” code to be
placed. Should it go in A? Or B? It’s best that it be in neither A
nor B, but in the middle somewhere. That’s what the ConversionTable is:
the middle.

In addition, you get further advantages, in that you can query whether
one type can be converted to another, you can make allowances for
the class hierarchy (allowing subtypes to be converted to other subtypes
when only supertype conversions are available), and you can have the
table figure out how to go from A → D when only A → B, B → C, and
C → D conversions exist. (The current implementation doesn’t handle
that yet, but it could be made to with a little work. In fact, once it
figures out a conversion, it could write code and drop it in A → D that
follows the process so it doesn’t have to figure it out again.)
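
Since the current implementation doesn't do chaining, here is one way it might look: treat the registered conversions as edges in a graph and do a breadth-first search from the source type to the target. This is a sketch under assumptions, not the real ConversionTable; the ChainTable class and its method names are invented for the example, and it leaves out both the class-hierarchy lookup and the caching of discovered chains.

```ruby
# Hypothetical chaining table: conversions are edges, types are nodes.
class ChainTable
  def initialize
    @edges = Hash.new { |hash, key| hash[key] = {} }   # from => { to => block }
  end

  # Register a direct conversion from one type to another.
  def add(from, to, &block)
    @edges[from][to] = block
  end

  # Breadth-first search; returns the shortest list of blocks leading
  # from `from` to `to`, or nil when no chain exists.
  def path(from, to)
    queue = [[from, []]]
    seen  = { from => true }
    until queue.empty?
      type, steps = queue.shift
      return steps if type == to
      @edges.fetch(type, {}).each do |next_type, conv|
        next if seen[next_type]
        seen[next_type] = true
        queue << [next_type, steps + [conv]]
      end
    end
    nil
  end

  # Apply each conversion in the chain in turn.
  def convert(value, from, to)
    steps = path(from, to)
    raise TypeError, "no conversion chain from #{from} to #{to}" if steps.nil?
    steps.inject(value) { |v, conv| conv.call(v) }
  end
end

table = ChainTable.new
table.add(Integer, String) { |i| i.to_s }
table.add(String,  Symbol) { |s| s.to_sym }

# No direct Integer => Symbol entry exists; the table chains the two above.
p table.convert(42, Integer, Symbol)    # => :"42"
p table.path(Symbol, Integer)           # => nil
```

Breadth-first search picks the chain with the fewest steps, which is only a stand-in for "best": with lossy conversions the shortest chain is not necessarily the one you want, so a real version would probably need some notion of cost or preference.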

Further but more abstract discussion:

You always have a major advantage when things are programmatic and not
purely a matter of semantics or convention. For instance, the method #to_s means
nothing to ruby, really, other than “a method called to_s”. Any further
meaning is purely convention… typically it converts to a string.

If you have actual programmatic meaning… “this is a ConversionTable,
it does this thing”, then you gain the ability for your code to do more
work for you. This is where the chain conversion mechanism comes in.
Instead of writing every conversion yourself (adding a method to every
class to go to every other class it needs), ruby can fill in the rest.

It also allows your code to do things like ask “what can this object be
treated as?” Your code can pick the preferable form, or be satisfied
with a less-preferable form (but still be able to function). For
instance, if you’re printing a result, you may want a String, but you’ll
be happy with an Integer, too.

This opens the door for further semantic processing of values… making
semantics “programmatic” themselves. Adding metadata about what an
object is ties right into the conversion system. Tagging things with
semantics would deepen the conversion chaining greatly. For instance,
you could have an Integer tagged as “Unix Time”, and ask for an “ISO
Date” String.
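
To make the tagging idea a little more concrete, here is a toy sketch. It assumes the "tags" are simply small wrapper types (UnixTime and IsoDate, both invented here) and that conversions are looked up by the pair of tag types; none of this is part of the posted code.

```ruby
# Hypothetical tag types: thin wrappers that say what a raw value means.
UnixTime = Struct.new(:seconds)   # an Integer counted from the Unix epoch
IsoDate  = Struct.new(:text)      # a String in YYYY-MM-DD form

# Conversions keyed on [source tag, target tag], kept outside both types.
CONVERSIONS = {
  [UnixTime, IsoDate] => lambda do |t|
    IsoDate.new(Time.at(t.seconds).utc.strftime("%Y-%m-%d"))
  end
}

# Look up and apply the conversion for the value's tag type.
def convert(value, target)
  conv = CONVERSIONS.fetch([value.class, target]) do
    raise TypeError, "no conversion from #{value.class} to #{target}"
  end
  conv.call(value)
end

stamp = UnixTime.new(0)
p convert(stamp, IsoDate).text   # => "1970-01-01"
```

The bare Integer 0 could mean anything; once it is tagged as UnixTime, the request "give me an ISO date" has exactly one sensible answer.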

The idea of well-defined semantics, along with context-sensitivity, is
an area I’ve been interested in… the ConversionTable is a (very)
simple beginning, but it’s a fundamental piece of functionality that
solves the problems you’ve listed above in a fairly elegant manner (even
if my code isn’t the most elegant implementation).

···

On Wed, 25 Jun 2003 00:27:40 +0900 John Platte john.platte@nikaconsulting.com wrote:


Ryan Pavlik rpav@users.sf.net

“And I’m 10,000 feet up in the air. Dang it.” - 8BT