[BUG] string range membership

Dave,

Relatively new Ruby user.

    Welcome to Ruby!

Let's see if I've got this straight. Somebody
complained because

('1'..'10').member?('2')
=> false

    That is the tip of the iceberg, yes.

Good! The fact that Ruby will get incredibly clever
with strings and fabricate arbitrary sequences with
them is a charming trick, but they are arbitrary, and
it is a trick.

    Why is this good? The Range '1'..'10' is a member of Enumerable.
As such, it has a finite number of elements and those elements can be
enumerated (for lack of a better word) one at a time using #each. In
particular, this Range is enumerated as '1', '2', '3', '4', '5', '6',
'7', '8', '9', '10'. The method Enumerable#member? will return true if
one of the enumerated elements is equal to the parameter. However, for
Ranges, the behavior of #member? is different. So different, in fact,
that for this Range, #member?('2') returns false. Many people see this
as a bad thing, not a good thing.
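
    To make the two behaviors concrete, here is a minimal sketch that
spells out both tests by hand. The helper names set_member? and
interval_member? are made up for illustration and are not part of Ruby.
The built-in #member? may behave differently depending on the Ruby
version, so the sketch avoids calling it.

```ruby
# Set membership: enumerate the Range and look for an equal element.
# (Hypothetical helper, roughly what Enumerable#member? does.)
def set_member?(range, value)
  range.to_a.include?(value)
end

# Interval test: compare against the end points only.
# (Hypothetical helper; assumes the range includes its end point.)
def interval_member?(range, value)
  range.begin <= value && value <= range.end
end

DIGITS = ('1'..'10')

p set_member?(DIGITS, '2')       # => true
p interval_member?(DIGITS, '2')  # => false, because '2' > '10' as strings
```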

The fact that '1', '2', ... '9','10' is obvious
doesn't make it any less arbitrary.

    However, that sequence isn't arbitrary; all others are. A Range can
be defined on any object that supports #succ and #<=>. The #succ method
defines the *one and only* sequence that a Range cares about, in
relation to Enumerable. For strings, '1', '2', '3', '4', '5', '6', '7',
'8', '9', '10' is the *one and only* sequence that #succ generates, and
that is the sequence that Enumerable#member? would use if Range didn't
override the #member? method.
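
    As an illustration of the #succ and #<=> requirement, here is a
small sketch with a made-up class. Tick is purely hypothetical; any
class that defines these two methods should behave the same way.

```ruby
# Hypothetical class wrapping an Integer, used only for illustration.
class Tick
  include Comparable
  attr_reader :n

  def initialize(n)
    @n = n
  end

  # Range#each walks the sequence by calling #succ repeatedly.
  def succ
    Tick.new(n + 1)
  end

  # Range uses #<=> to decide when the end has been reached.
  def <=>(other)
    n <=> other.n
  end

  def to_s
    "t#{n}"
  end
end

# Because Range mixes in Enumerable, #map works on the custom Range.
TICKS = (Tick.new(1)..Tick.new(4)).map(&:to_s)
p TICKS  # => ["t1", "t2", "t3", "t4"]
```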

'1'..'100'

Is that supposed to be 1, 2, 3, ... 99, 100 or 1, 10,
11, 100? Ruby arbitrarily decided to interpret those
strings as base 10 integers.

    No, it's supposed to be '1', '2', '3', ... '99', '100'. There is
nothing arbitrary about the sequence, and it's not a trick. It is the
sequence defined by String#succ. You are welcome to write your own
version of String#succ, but it won't change anything. Range#member?
will still ignore it.

'a.1'..'c.3'

Quite honestly, I have absolutely no idea how Ruby
would count that. Will I get 'a.1', 'a.2', 'a.3',
'b.1' ... or is it going to go all the way to 'a.9'
and then start over with 'b.1'?

    Well, let's see:

ruby -e "p ('a.1'..'c.3').to_a"

["a.1", "a.2", "a.3", "a.4", "a.5", "a.6", "a.7", "a.8", "a.9", "b.0",
"b.1", "b.2", "b.3", "b.4", "b.5", "b.6", "b.7", "b.8", "b.9", "c.0",
"c.1", "c.2", "c.3"]

(The rest of the message was less relevant, so I snipped it along with
my irrelevant smart-ass replies :o) Instead, I present the current
state of affairs on this issue, as there seems to be a lot of confusion
about it:

    There are two core issues involved in this problem. The first is
the dual nature of Ranges. Since Ranges implement the #each method,
they can be viewed as a set of elements, which is how Enumerable views
them. Therefore (1..10).to_a works, along with all of the other
wonderful methods that Enumerable provides.

    Ranges can also be viewed as intervals. The best example here is
(1.0..10.0). This Range can *not* be enumerated, since Float does not
(and cannot) implement the #succ method. However, it is still useful to ask
if a number falls within the boundaries of a Range. Therefore, the <=>
operator is used to test for Range.begin <= value <= Range.end. This is
the functionality that is currently implemented by Range#member?, and
its alias, Range#include?. This was mainly done as an optimization,
since checking 1 <= x <= 1000000 is a whole lot faster than Enumerating
all 1000000 elements. It also allowed Float Ranges to work as well.
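
    A rough sketch of the interval view follows. The helper
within_bounds? is hypothetical and assumes a Range that includes its
end point; it is meant to show the idea, not the actual implementation
inside Range.

```ruby
FLOATS = (1.0..10.0)

# Interval test written out with #<=> (hypothetical helper).
def within_bounds?(range, value)
  (range.begin <=> value) <= 0 && (value <=> range.end) <= 0
end

p within_bounds?(FLOATS, 3.7)   # => true
p within_bounds?(FLOATS, 10.5)  # => false

# Enumerating a Float Range fails, because Float has no #succ to step with.
FLOAT_EACH_ERROR =
  begin
    FLOATS.each { |x| x }
    nil
  rescue TypeError => e
    e.class
  end
p FLOAT_EACH_ERROR  # => TypeError

# For Integers both views agree, but the interval test needs only two
# comparisons instead of walking up to a million elements.
p within_bounds?(1..1_000_000, 999_999)  # => true
```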

    The other core issue is that the method String#succ is implemented
in such a way that it is possible for (x > x.succ) to be true (e.g. 'z' >
'z'.succ). This is what makes the view of a Range as a set and the
view of a Range as an interval incompatible, and why
('1'..'10').include?('2') can be viewed as either right or wrong
depending on how you are looking at the Range. Certainly, '2' is in the
set ('1', '2', '3', ... , '10'), but '1' <= '2' <= '10' is *not* true
since strings are compared, well, as strings.
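
    The mismatch between the two orderings is easy to see in a few
lines. This is just a sketch; the constant names are arbitrary.

```ruby
Z_NEXT = 'z'.succ
p Z_NEXT          # => "aa"
p 'z' > Z_NEXT    # => true: the successor sorts *before* its predecessor

# Same effect in the digits example: '9'.succ is '10', yet '9' > '10'.
p '9' > '9'.succ  # => true

# So the #succ order and the #<=> order disagree about where '10' belongs.
SUCC_ORDER = %w[1 2 3 4 5 6 7 8 9 10]
p SUCC_ORDER.sort  # => ["1", "10", "2", "3", "4", "5", "6", "7", "8", "9"]
```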

    So, we are currently in a situation where Enumerable#member? (and
its alias Enumerable#include?) tests for set membership by enumerating
the set through the #each method, but Range#member? and Range#include?
test for interval coverage and *not* set membership. This is the main
inconsistency that we are trying to get rid of.

    Matz is currently considering changing the functionality of
Range#member? from an interval coverage test back to the set membership
test, which interestingly enough, is actually how it started life (it
was later changed to be the same as #include?). Range would still
override the method and optimize the test for Integer Ranges, but
non-Integer ranges (including String Ranges) would revert back to the
Enumerable#member? method (or at least that method's functionality).
Matz hasn't decided whether he would change the Range#include? method to
be a test for set membership too, or to leave it as an interval coverage
test. My guess is that it will remain an alias for #member?, since the
two are aliases in Enumerable.
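
    The behavior under consideration might look roughly like the sketch
below. This is only my reading of the idea, not anything Matz has
written; proposed_member? is a hypothetical name, and the Integer
branch is simplified to accept Integer values only.

```ruby
# Hypothetical sketch of the proposed #member? semantics.
def proposed_member?(range, value)
  if range.begin.is_a?(Integer) && range.end.is_a?(Integer)
    # Integer Ranges: #succ and #<=> agree, so an end-point check gives
    # the same answer as enumeration. (Simplification: a non-Integer
    # value such as 2.0 is rejected here.)
    return false unless value.is_a?(Integer)
    upper_ok = range.exclude_end? ? value < range.end : value <= range.end
    range.begin <= value && upper_ok
  else
    # Everything else: fall back to enumerating, as Enumerable#member? does.
    range.any? { |element| element == value }
  end
end

p proposed_member?(1..1_000_000, 999_999)  # => true
p proposed_member?('1'..'10', '2')         # => true
p proposed_member?('1'..'10', '11')        # => false
```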

    However, since Range#member? would no longer be an interval coverage
test, Matz would want to add a new method to Range to take its place, so
he is currently trying to find a good name for that method. Current
suggestions for the name include (no pun intended):

#between?
#betwixt?
#bound?
#cover?
#enclose?
#encompass?
#in?
#in_interval?
#in_range?
#inside?
#surround?
#within?

    Matz is also seeking comments from other people on these suggested
names along with any other names that might be appropriate.

    David A. Black also suggested (along with the wonderfully apt name
#encompass?) that this new function could also accept a Range as the
parameter and test for interval over interval coverage as well. This
sounds like a great suggestion and would make the new function even more
useful.
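
    One way the interval over interval test might work is sketched
below. The name interval_covers_range? is hypothetical, and the sketch
assumes both Ranges are non-empty and include their end points.

```ruby
# Hypothetical helper: does `outer` cover every point of `inner`?
def interval_covers_range?(outer, inner)
  outer.begin <= inner.begin && inner.end <= outer.end
end

p interval_covers_range?(1..10, 3..7)        # => true
p interval_covers_range?(1..10, 5..12)       # => false
p interval_covers_range?(1.0..10.0, 2.5..3)  # => true
```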

    So, that's where we are. I hope this clears up a lot of the
misconceptions that seem to have plagued this discussion.

    - Warren Brown

   No, it's supposed to be '1', '2', '3', ... '99', '100'. There is
nothing arbitrary about the sequence, and it's not a trick.

I don't think you understand my use of "arbitrary." Ordering strings to correspond to their integers is absolutely arbitrary. As arbitrary as ordering them by ASCII value, or (as libraries do it) by the spelling of their pronounced forms. Integers have an inherent order. Strings do not.

    Matz is also seeking comments from other people on these suggested
names along with any other names that might be appropriate.

Yes. And my comment is that their very existence is detrimental to the language, promoting obscurity and the opportunity for confusion, and they should be scrapped, at least from the core. Ranges aren't arrays or integers; no small part of the problem stems from wanting to treat them as if they are.

Oh, well, I don't have to use them whatever they're called.

On Nov 30, 2005, at 22:59, Warren Brown wrote:

Nice summary Warren.

There's still a little bit more to it though. If one searches ruby-talk
one finds that there are also other less obvious peculiarities about
Range --not that they are all that significant, but they are there.

The problem I see is that if #member? goes back to being essentially
equivalent to #to_a.include?, we're right back to the original problem,
exactly as you point out:

This was mainly done as an optimization,
since checking 1 <= x <= 1000000 is a whole lot faster than Enumerating
all 1000000 elements. It also allowed Float Ranges to work as well.

How could one optimize a _custom_ membership for a Range then? You
can't, so our choices for #member? trap us between inconsistent
functionality or significant inefficiency. And it still does not address
the underlying causes: #succ and #<=> are incompatible in the String
class, and might also be so for other classes.

I've offered the best solution generally possible for this issue: It
corrects the underlying cause, fixes the inconsistent functionality and
maintains efficiency. What more can one ask? Nonetheless no one seems
interested in it. I tend to think the reason is because it introduces a
new method (#cmp), but since no one has even touched on it, how do I
know? I'm at a loss. Do people just not get it? Did I not explain it
well enough? Did I miss something? Or is it that people just prefer to
stew around in their own preconceptions?

T.