The problem with run-time type checking

One problem that I find crops up in Ruby, but doesn’t really happen in
Perl (since Perl has only one kind of scalar) or in compiled languages
like Java/C (since types are checked at compile time), is that many
libraries (e.g. from RAA) will crash if you pass an object of the wrong
type to a method. Rather than telling you that it was your fault, the
stack dump just shows an error occurring on a line inside the library.
You then have to read the library’s source code to find out what you did
wrong (IMHO, libraries should work like black boxes).

For example, in the ‘cgi’ library, CGI.escapeHTML(str) calls str.gsub. If
you pass an array to it by accident, you’ll get a crash:

NameError: private method `gsub' called for [1, 2, 3]:Array
from /usr/local/lib/ruby/1.6/cgi.rb:260:in `escapeHTML'
from (irb):7

rather than seeing a more friendly error message like “expected String,
got Array”. A programmer faced with the above error message probably has
to go into cgi.rb, look at line 260, and then figure out that he was
supposed to pass a String to CGI.escapeHTML but passed something else.

Regarding this, I have two questions:

(1) Is it bad programming practice to write a library that can crash if
garbage is passed in?

(2) How do I program libraries in a friendly fashion? Should I do stuff
like:

raise "Expected String, got #{str.class}" unless str.kind_of?(String)

Then again, doing (2) seems to add a significant amount of extra code I
have to write. In typed languages, I just need to specify what type a
variable should be; not have to write a whole line to raise an exception
if the variable isn’t the right type.

Does anyone have a better idea?

NameError: private method `gsub' called for [1, 2, 3]:Array
from /usr/local/lib/ruby/1.6/cgi.rb:260:in `escapeHTML'
from (irb):7

The beauty of Ruby (and OO in general) is being able to make “objects”
… what if you wanted to pass in your own object that wasn’t inherited
from String (and thus, is not a #kind_of? String) but implements the
#gsub method?
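To make that objection concrete, here is a minimal sketch. `Markup`, `strict_escape` and `duck_escape` are made-up names for illustration and are not part of the cgi library; the only point is that a kind_of?(String) guard turns away an object that would otherwise have worked.

```ruby
# Hypothetical wrapper class: not a String subclass, but it implements #gsub.
class Markup
  def initialize(text)
    @text = text
  end

  # Delegate to the wrapped String so callers can treat us like one.
  def gsub(*args, &block)
    @text.gsub(*args, &block)
  end
end

# Hypothetical library method with the strict check from the question.
def strict_escape(str)
  raise ArgumentError, "Expected String, got #{str.class}" unless str.kind_of?(String)
  str.gsub(/&/, "&amp;")
end

# The same method without the check, relying only on #gsub being there.
def duck_escape(str)
  str.gsub(/&/, "&amp;")
end

doc = Markup.new("fish & chips")
duck_escape(doc)      # => "fish &amp; chips"
# strict_escape(doc)  # would raise ArgumentError: Expected String, got Markup
```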

Regarding this, I have two questions:

(1) Is it bad programming practice to write a library that can crash if
garbage is passed in?

Yes and no. It’s just as bad programming practice as not having unit
tests and it seems programmers don’t mind not having those …

(2) How do I program libraries in a friendly fashion? Should I do stuff
like:

raise "Expected String, got #{str.class}" unless str.kind_of?(String)

Definitely not. See my comment above.

Even testing using #respond_to? isn’t foolproof because of
#method_missing
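A small sketch of why respond_to? can give the wrong answer. `LazyProxy` is a hypothetical class written for this example; it forwards unknown messages through method_missing and deliberately does not define respond_to_missing?, so respond_to? has no idea what it can handle.

```ruby
# Hypothetical proxy that forwards every unknown message to a wrapped object.
class LazyProxy
  def initialize(target)
    @target = target
  end

  # Forwarding happens here, but respond_to_missing? is deliberately left
  # undefined, so respond_to? knows nothing about the forwarded methods.
  def method_missing(name, *args, &block)
    @target.send(name, *args, &block)
  end
end

proxy = LazyProxy.new("a < b")
proxy.respond_to?(:gsub)   # => false
proxy.gsub("<", "&lt;")    # => "a &lt; b"
```

A respond_to? guard in front of the gsub call would have rejected this object even though the call itself succeeds.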

Then again, doing (2) seems to add a significant amount of extra code I
have to write. In typed languages, I just need to specify what type a
variable should be; not have to write a whole line to raise an exception
if the variable isn’t the right type.

Does anyone have a better idea?

Automated unit tests. You get test coverage and peace of mind with
dynamically typed languages at the same time.

– Dossy

···

On 2002.09.22, Philip Mak mfraser@seas.gwu.edu wrote:


Dossy Shiobara mail: dossy@panoptic.com
Panoptic Computer Network web: http://www.panoptic.com/
“He realized the fastest way to change is to laugh at your own
folly – then you can let go and quickly move on.” (p. 70)

This is another way of saying that Array doesn’t have a gsub method.
There is a gsub method in the Kernel module; all methods in the Kernel
module are mixed in to Object as private methods, hence this error
message.
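The same mechanism can still be seen with other Kernel methods. In current Ruby releases Kernel#gsub is, as far as I know, only defined when ruby runs with -n or -p, so this sketch uses Kernel#puts as a stand-in; `private_call_error` is a helper name invented for the example.

```ruby
# Kernel#puts is a private instance method, so calling it with an
# explicit receiver fails even though every object "has" it.
def private_call_error
  [1, 2, 3].puts
rescue NoMethodError => e
  e
end

err = private_call_error
err.class                                        # NoMethodError, a subclass of NameError
err.message                                      # mentions a private method being called
Kernel.private_instance_methods.include?(:puts)  # => true
```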

The author could, as you suggested, have enforced that the object be a
String by raising an exception if it were not a String. With this
particular function, I can’t see anyone wanting to pass in anything
other than a String. But since the method should work with any object
that has a gsub method that works like String’s gsub method, there isn’t
any reason to necessarily limit the method to only Strings.

If I had added a gsub method to the Array object I passed in, then this
might have worked. I’m not sure what gsub on an Array would do, but it
might make sense for other types of objects.

The author could also have used to_s or to_str to coerce the object that
was passed in into the type of object that he wanted.
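A rough sketch of what such coercion could look like. `escape_html` and `HTML_ESCAPES` are hypothetical stand-ins written for this example, not the code in cgi.rb.

```ruby
# Hypothetical stand-in for an escaping method; not the cgi.rb source.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_html(value)
  # Prefer to_str (the object claims to be string-like),
  # fall back to to_s, which every object has.
  str = value.respond_to?(:to_str) ? value.to_str : value.to_s
  str.gsub(/[&<>"]/) { |ch| HTML_ESCAPES[ch] }
end

escape_html("a < b")     # => "a &lt; b"
escape_html([1, 2, 3])   # => "[1, 2, 3]"
```

The trade-off is that to_s accepts anything, so an Array passed by accident is quietly turned into text instead of being reported.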

Paul

···

On Sun, Sep 22, 2002 at 03:34:39PM +0900, Philip Mak wrote:

NameError: private method `gsub' called for [1, 2, 3]:Array
from /usr/local/lib/ruby/1.6/cgi.rb:260:in `escapeHTML'
from (irb):7

Regarding question 1, it seems to me that libraries must “crash” when
garbage is passed in. At least I’ve never seen a library that didn’t. The
task for the library developer is to crash in a way that makes it
easy - or at least possible - for the library developer to fix the problem.

Regarding (2), It seems to me that the problem isn’t that the method
got an Array instead of a String, the problem is that the method got
an object that doesn’t have the expected type. That is, the object
doesn’t respond to the set of messages (gsub, in this case) the
method needs to send it. Presumably any object that responds to
gsub would be acceptable to the method.

Testing respond_to? is klunky, too. From the library user’s
point-of-view, the fact that you can’t send gsub to str is not very
useful. Such a message is only useful to the library developer. A
useful message for the user explains the error in terms the
user can understand (with perhaps detailed background information
along with the primary message).

I don’t know what this method does, but I’m guessing that it adds escape
characters to its string argument. Assuming that, what’s wrong with

begin
  str.gsub…
rescue
  raise ArgumentError, "Can't add escape characters to #{str}: #{$!}"
end
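Filled out into something runnable, the idea might look like the sketch below. `escape_amp` and its regexp are placeholders chosen for the example, and rescuing only NoMethodError (rather than everything) is an assumption about what should count as "garbage passed in".

```ruby
# Hypothetical escaping method; the regexp and replacement are placeholders.
def escape_amp(str)
  str.gsub(/&/, "&amp;")
rescue NoMethodError => e
  # Re-raise in the caller's vocabulary, keeping the original message as detail.
  raise ArgumentError, "Can't add escape characters to #{str.inspect}: #{e.message}"
end

begin
  escape_amp([1, 2, 3])
rescue ArgumentError => e
  puts e.message   # starts with: Can't add escape characters to [1, 2, 3]
end
```

The user sees an ArgumentError phrased in terms of what the method does, while the underlying NoMethodError text stays available as background detail.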

···

Aargh! Of course I meant to say “for the library USER to fix the
problem.” Note to self: Don’t post at 9:00AM on Sundays.

···

On Sun, 22 Sep 2002 08:56:26 -0400, Tim Hunter wrote:

Regarding question 1, it seems to me that libraries must “crash” when
garbage is passed in. At least I’ve never seen a library that didn’t.
The task for the library developer is to crash in a way that makes it
easy - or at least possible - for the library developer to fix the
problem.

Regarding (2), It seems to me that the problem isn’t that the method
got an Array instead of a String, the problem is that the method got
an object that doesn’t have the expected type. That is, the object
doesn’t respond to the set of messages (gsub, in this case) the
method needs to send it. Presumably any object that responds to
gsub would be acceptable to the method.

people keep saying that the type of an object is just the set of methods
that the object responds to, but i don’t buy it.

def my_join(foobar, sep)
  result = ""
  foobar.each_with_index do |obj, idx|
    result << obj.to_s.chomp   # to_s for non-strings, chomp drops the line ending
    result << sep unless idx == (foobar.length - 1)
  end
  result                       # return the joined string, not each_with_index's value
end

my_join([1, 2, 3], ',')            # => "1,2,3"
my_join("foo\nbar\nbaz\n", ',')    # => "foo,bar,baz,"  (note trailing comma)

String and Array are both of type Enumerable, and they both have a “length”
method. however, the statement

“String and Array both conform to an (unnamed) type ‘t’, defined as the set
of those methods which both String and Array objects respond to”

is false, IMO, because the semantics of “each_with_index” are not the same,
in relation to the semantics of “length”, in class String as they are in
class Array.

if “length” were part of the Enumerable interface, i would consider the
above behavior to be a defect in the library. but it’s not. the point is
that i, as the user of a library, can expect to be able to make certain
assumptions when i know that an object is of a certain type, as defined by a
class or an interface/module, that i cannot make if i just know that they
both respond_to? the same symbols.
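On current Ruby, where String no longer mixes in Enumerable, the same mismatch can be reproduced with a small stand-in. `LineText` and `join_with` are hypothetical names for this sketch; `LineText` imitates the behavior described above by iterating over lines while reporting its length in characters.

```ruby
# Hypothetical class that imitates the 2002-era String: it iterates over
# lines but reports its length in characters.
class LineText
  include Enumerable

  def initialize(text)
    @text = text
  end

  def each(&block)
    @text.each_line(&block)
  end

  def length
    @text.length
  end
end

# Assumes idx runs from 0 to length - 1, which holds for Array
# but not for LineText.
def join_with(items, sep)
  result = +""
  items.each_with_index do |obj, idx|
    result << obj.to_s.chomp
    result << sep unless idx == items.length - 1
  end
  result
end

join_with([1, 2, 3], ",")                         # => "1,2,3"
join_with(LineText.new("foo\nbar\nbaz\n"), ",")   # => "foo,bar,baz,"
```

Both arguments respond to each_with_index and length, yet only one of them satisfies the assumption the method silently makes about how the two relate.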

-brian

Hi –

String and Array are both of type Enumerable, and they both have a “length”

They mix in Enumerable – which isn’t just a quibble about
terminology: the whole mixin process pertains to interface-building
rather than type.

method. however, the statement

“String and Array both conform to an (unnamed) type ‘t’, defined as the set
of those methods which both String and Array objects respond to”

is false, IMO, because the semantics of “each_with_index” are not the same,
in relation to the semantics of “length”, in class String as they are in
class Array.

They aren’t necessarily the same in class String and class String
either :-)

irb(main):039:0> t
"hi"
irb(main):040:0> t.type
String
irb(main):041:0> s
"hi"
irb(main):042:0> s.type
String
irb(main):043:0> t.each_with_index {|x| p x}
["hi", 0]
nil
irb(main):044:0> s.each_with_index {|x| p x}
Hello

The dynamic nature of Ruby objects is such that there’s really no
guarantee of what they do except… what they do.
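One way to end up with two such strings is a singleton method. This is only a guess at the kind of setup behind the session above, using upcase and the invented helpers `plain_string` and `rigged_string` so that it runs on current Ruby.

```ruby
def plain_string
  +"hi"
end

def rigged_string
  s = +"hi"
  # Singleton method: redefines upcase for this one object only.
  def s.upcase
    "Hello"
  end
  s
end

t = plain_string
s = rigged_string
t.class == s.class   # => true, both are String
t == s               # => true, same contents
t.upcase             # => "HI"
s.upcase             # => "Hello"
```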

That’s more of a starting point than an ending point, but I’m almost
late for a class…

David

···

On Tue, 24 Sep 2002, Brian Denny wrote:


David Alan Black | Register for RubyConf 2002!
home: dblack@candle.superlink.net | November 1-3
work: blackdav@shu.edu | Seattle, WA, USA
Web: http://pirate.shu.edu/~blackdav | http://www.rubyconf.com

Enumerable is a mixin, not an interface. It’s used to add additional
methods to a class that already has an each() method. The fact that a
class includes Enumerable doesn’t mean that an instance of that class
will have all the methods defined in Enumerable (I can remove or undefine
them), nor does it mean that a class that doesn’t include Enumerable
lacks an each_with_index() method that has the same semantics as the one
in Enumerable. In Ruby, an object’s class just isn’t a very good
measure of an object’s type.
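Both halves of that claim can be shown in a few lines. `Trimmed` and `Handmade` are hypothetical classes made up for this sketch.

```ruby
# Includes Enumerable, then removes one of the methods it would provide.
class Trimmed
  include Enumerable

  def each
    yield 1
    yield 2
  end

  undef_method :each_with_index
end

# Never mentions Enumerable, but has a compatible each_with_index.
class Handmade
  def initialize(*items)
    @items = items
  end

  def each_with_index
    @items.length.times { |i| yield @items[i], i }
  end
end

trimmed = Trimmed.new
trimmed.kind_of?(Enumerable)            # => true
trimmed.respond_to?(:each_with_index)   # => false

handmade = Handmade.new(:a, :b)
handmade.kind_of?(Enumerable)           # => false
handmade.each_with_index { |item, i| p [item, i] }
```

A kind_of?(Enumerable) check would accept the object that cannot do the job and reject the one that can.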

As you pointed out, the set of method names an object responds to is
also not a good measure of the object’s type, because it’s possible to
have two objects that both respond to the same method, but to give the
method different semantics for each object. This is why using
respond_to? to check an object’s “type” is frowned upon.

I know of no good way to check for the semantics of an object’s methods
except by calling those methods; this is where unit tests come in.

An alternative is to use modules as if they were interfaces, but I have
not yet seen an implementation that does this well. I’m also not sure
it would be widely accepted in the Ruby community.

In the case of String#length and String#each, the problem is that each
iterates over the lines in the String, while length tells you how many
characters are in the String. IMO, this is a bug in String’s interface;
as a result, I rarely call String#each directly, but instead call
String#each_byte or String#each_line.
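The mismatch is easy to see by counting. This sketch uses the explicit iterators, which also work on later Ruby versions, where String#each was removed.

```ruby
text = "foo\nbar\nbaz\n"

text.length            # => 12 (characters)
text.each_line.count   # => 3  (lines)
text.each_byte.count   # => 12 (bytes; equals length here because the text is ASCII)
```

Naming the unit of iteration explicitly removes the ambiguity about whether "each" means lines, bytes or characters.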

Paul

···

On Tue, Sep 24, 2002 at 01:38:02AM +0900, Brian Denny wrote:

if “length” were part of the Enumerable interface, i would consider the
above behavior to be a defect in the library. but it’s not. the point is
that i, as the user of a library, can expect to be able to make certain
assumptions when i know that an object is of a certain type, as defined by a
class or an interface/module, that i cannot make if i just know that they
both respond_to? the same symbols.