[BUG] string range membership

Ara,

>> ruby -v -e "p(('1'..'10').to_a)"
>> ruby 1.8.2 (2004-12-25) [i386-mswin32]
>> ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
>>
>> This shows a clear and unique mapping of the range
>> '1'..'10' into a set of strings.
>
> but where do '01', '001', and '0001' go? they too,
> are in the set of strings.

    You completely lost me there. '01' doesn't *go* anywhere. That
string is not in the range '1'..'10', in the same way the 'x' is not in
the range 'a'..'n'.

    Don't let the fact that my example used strings that look like
numbers confuse the issue. The issue is that a range of strings that
can be converted into a finite set, has a method to test for membership
in that range, that doesn't match values that are in the set. Wow, that
sentence is even hard for *me* to follow.

    OK, let's take a different example to avoid all discussion of
integers and various string representations of them.

ruby -v -e "p(('a'..'aa').to_a)"

ruby 1.8.2 (2004-12-25) [i386-mswin32]
["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
"o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "aa"]

    Here we have a string range that has 27 "members". Now:

ruby -e "p(('a'..'aa').member?('a'))"

true

ruby -e "p(('a'..'aa').member?('b'))"

false
...

ruby -e "p(('a'..'aa').member?('z'))"

false

ruby -e "p(('a'..'aa').member?('aa'))"

true

    Can this really be called correct behavior of the member?() method?
I can't see any tenable argument to say that it is.
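
    The mismatch can be reproduced in one short script. This is only a
sketch: it uses Range#cover? (an endpoint-only test, which I am assuming
is available; it is not in 1.8.2) so that the result does not depend on
how member?() is implemented in a given Ruby version. The variable names
are mine.

```ruby
# Compare what a string range enumerates with what an endpoint-only
# membership test accepts.
range = 'a'..'aa'

elements = range.to_a                              # built by repeated String#succ
covered  = elements.select { |s| range.cover?(s) } # endpoint comparison only
rejected = elements - covered

puts "enumerated: #{elements.size}"
puts "covered:    #{covered.inspect}"
puts "rejected:   #{rejected.size}"
```

    Only the two endpoints pass the endpoint test; the other 25
enumerated strings are rejected.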

    - Warren Brown

>>> ruby -v -e "p(('1'..'10').to_a)"
>>> ruby 1.8.2 (2004-12-25) [i386-mswin32]
>>> ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
>>>
>>> This shows a clear and unique mapping of the range
>>> '1'..'10' into a set of strings.
>>
>> but where do '01', '001', and '0001' go? they too,
>> are in the set of strings.
>
> You completely lost me there. '01' doesn't *go* anywhere. That
> string is not in the range '1'..'10', in the same way the 'x' is not in
> the range 'a'..'n'.

says who? :wink: i may choose to define String#succ to do whatever i like -
including producing the values '01', '001', and '0001'.

my point is simply that you seem to be merging the notions of ranges and sets.
what a range's to_a returns is determined by only a few things

   - the start and end points

   - the succ method of the start value and each successive succ value
     remember one could do this

       irb(main):003:0> class String; def succ; self == "1" ? 42 : super; end; end
       => nil
       irb(main):004:0> "1".succ
       => 42

   - the spaceship operator for each succ value called against the endpoint
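
the three ingredients above can be seen in isolation with a small class of
its own, which avoids patching String. this is only a sketch: the class name
Step and its step size of 3 are made up for illustration.

```ruby
# A hypothetical value type: the range only needs succ (to walk forward)
# and <=> (to know when the end point has been reached or passed).
class Step
  include Comparable
  attr_reader :n

  def initialize(n)
    @n = n
  end

  # the next value is whatever succ says it is
  def succ
    Step.new(n + 3)
  end

  # each successor is compared against the end point with <=>
  def <=>(other)
    n <=> other.n
  end
end

values = (Step.new(1)..Step.new(10)).to_a.map { |s| s.n }
p values
```

change succ and the same two endpoints produce a different array; the range
object itself stores nothing but the endpoints.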

because of this we cannot even safely call to_a on an arbitrary range. for instance:

   irb(main):002:0> (42.0 .. 1.0).to_a
   TypeError: can't iterate from Float
    from (irb):2:in `each'
    from (irb):2:in `to_a'
    from (irb):2
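
one way to guard against that, sketched under the assumption that checking
for succ on the start point is enough for your purposes. the helper name
enumerable_range? is hypothetical, not part of Ruby.

```ruby
# Hypothetical helper: a range can only be expanded with to_a if its
# start point knows how to step forward.
def enumerable_range?(range)
  range.begin.respond_to?(:succ)
end

[(1..3), ('a'..'c'), (1.0..3.0)].each do |r|
  if enumerable_range?(r)
    puts "#{r.inspect} => #{r.to_a.inspect}"
  else
    puts "#{r.inspect} cannot be enumerated (no succ on #{r.begin.class})"
  end
end
```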

in summary a range is nothing but a set of endpoints with some
abstract/duck-type-like methods that may or may not produce a set as a
__process__. note that the set produced is not part of the range itself and
can be dynamically altered or even be made to produce a different set each
time:

   harp:~ > cat a.rb
   class Float
     def succ
       self + rand
     end
   end

   p((4.2 ... 42.0).to_a)

   harp:~ > ruby a.rb
   [4.2, 4.60303889967309, 5.57983848378295, 6.19446672151043, 6.92731328072508, 7.40446684874589, 7.79202463038348, 8.67552806421286, 9.42821837951244, 10.1988047216007, 11.1116769865281, 11.6169205995556, 11.9975653524073, 12.2256247650959, 12.8874200335378, 13.1557666607712, 13.6470070004444, 14.2172959192607, 15.0882979655236, 15.3487930162798, 15.9791460692026, 16.4321713791994, 17.0903318945661, 17.2967949864209, 18.2400722395741, 18.7286500286255, 19.7174743954199, 20.4528553779707, 20.953553149678, 21.0415866875269, 21.2924876748544, 22.2378099442685, 23.0076932295775, 23.0941582708386, 23.4748092012559, 23.5515124737304, 24.3463511761819, 24.6901201768951, 25.2541406207396, 26.0256212044938, 26.843159468986, 26.9579528629072, 27.01297383827, 27.7250436963749, 27.9017308958297, 28.1100643283236, 28.4480522935525, 28.6197629801695, 29.3756706791326, 29.9897540116082, 30.0057580759777, 30.7085039121469, 30.7510332074171, 30.9096299847723, 30.9314941316772, 31.3964098461468, 31.7312966347497, 32.2153802510432, 32.619498970957, 32.9731525439908, 33.3765950052407, 34.3397676884718, 35.1641816525327, 35.4891756054474, 36.2408178073905, 36.8733362068042, 37.6251560883057, 37.8047618263845, 37.8828752584342, 38.2001976403303, 38.9255502197319, 39.8027872575378, 40.0416710479264, 40.9954826039753, 41.4534375661544]

> Don't let the fact that my example used strings that look like numbers
> confuse the issue. The issue is that a range of strings that can be
> converted into a finite set, has a method to test for membership in that
> range, that doesn't match values that are in the set. Wow, that sentence
> is even hard for *me* to follow.
>
> OK, let's take a different example to avoid all discussion of integers
> and various string representations of them.
>
> ruby -v -e "p(('a'..'aa').to_a)"
>
> ruby 1.8.2 (2004-12-25) [i386-mswin32]
> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
> "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "aa"]
>
> Here we have a string range that has 27 "members". Now:

not quite - we have a string range that __produces__ 27 elements. it does not
'have' or 'contain' them. it merely suggests this set as its current thought
on what that set might be. this set definition may change - unlike the
endpoints of the range - and it is therefore not a property of the range.

> ruby -e "p(('a'..'aa').member?('a'))"
> true
>
> ruby -e "p(('a'..'aa').member?('b'))"
> false
> ...
>
> ruby -e "p(('a'..'aa').member?('z'))"
> false
>
> ruby -e "p(('a'..'aa').member?('aa'))"
> true
>
> Can this really be called correct behavior of the member?() method? I
> can't see any tenable argument to say that it is.

the definition of membership may rely on endpoints only. that explains it
perfectly.

   harp:~ > irb
   irb(main):001:0> 'z' < 'aa'
   => false

ergo - not in the set. the confusion here is caused by exactly the reasons
i'm explaining - String#succ has been defined not to create a monotonically
increasing (<=>) sequence - but to produce the "next" string in an english
sense. this is very useful for auto-generating names

   irb(main):004:0> "z99".succ
   => "aa00"

if this were a monotonically increasing set the output would be

   => "z9:"

but that sure isn't that useful - unless you want to try to use ranges as
sets.
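
both halves of the argument fit in a few lines. this is a sketch, not how
Range is actually implemented: endpoint_member? is a hypothetical helper that
only mirrors an endpoint-based test.

```ruby
# Hypothetical helper: membership decided by comparing against the two
# endpoints, nothing else.
def endpoint_member?(range, value)
  return false unless range.begin <= value
  range.exclude_end? ? value < range.end : value <= range.end
end

range = 'a'..'aa'
puts endpoint_member?(range, 'z')    # false, because 'z' > 'aa'
puts endpoint_member?(range, 'aa')   # true

# Adjacent pairs in the enumeration where String#succ steps backwards
# as far as <=> is concerned.
backward = range.to_a.each_cons(2).select { |x, y| (x <=> y) > 0 }
p backward                            # [["z", "aa"]]
```

the single backward step from "z" to "aa" is exactly where enumeration and
comparison stop agreeing.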

the secret here is simply to re-define String#succ rather than
Range#member? - if String#succ did simple addition using base 256
arithmetic you'd be set.
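
a minimal sketch of such a byte-wise successor, written as a standalone
method rather than a patch to String. byte_succ is a hypothetical name, and
the sketch assumes a Ruby with String#bytes and String#force_encoding.

```ruby
# Hypothetical byte-wise successor: treat the string as a number with one
# base-256 digit per byte and add one, carrying to the left on overflow.
def byte_succ(str)
  bytes = str.bytes.to_a
  index = bytes.size - 1
  while index >= 0
    if bytes[index] < 255
      bytes[index] += 1
      return bytes.pack('C*').force_encoding(str.encoding)
    end
    bytes[index] = 0          # carry into the next byte to the left
    index -= 1
  end
  # every byte was 0xFF (or the string was empty): grow by one byte
  ([1] + bytes).pack('C*').force_encoding(str.encoding)
end

steps = ['z98']
3.times { steps << byte_succ(steps.last) }
p steps                        # ["z98", "z99", "z9:", "z9;"]
p steps == steps.sort          # true
```

note the caveat: the sequence only stays increasing under <=> while the
string keeps its length. once the carry adds a byte on the left, the longer
string compares lower than its predecessor again.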

kind regards.

-a


On Thu, 24 Nov 2005, Warren Brown wrote:
--

ara [dot] t [dot] howard [at] noaa [dot] gov
all happiness comes from the desire for others to be happy. all misery
comes from the desire for oneself to be happy.
-- bodhicaryavatara

===============================================================================

> the secret here is simply to re-define String#succ rather than
> Range#member? - if String#succ did simple addition using base 256
> arithmetic you'd be set.

Or, perhaps, re-define String#<=>. Who says Strings have to be
compared as if they were an array of bytes? It might be more in line
with people's expectations if strings compared this way:

class String
  def <=>(other)
    # chunk patterns: runs of digits, uppercase letters, lowercase letters
    dig, up, low = *%w/ \d+ [[:upper:]]+ [[:lower:]]+ /.map{|r|/#{r}/}
    re = /#{up}|#{low}|#{dig}/

    me, you = scan(re), other.scan(re)
    # uncomparable unless same format: same number of chunks, and every
    # chunk after the first has the same width
    return nil unless me.size == you.size
    me_rest, you_rest = me[1..-1] || [], you[1..-1] || []
    return nil unless me_rest.zip(you_rest).all?{|a,b|a.size == b.size}

    # test starting with most significant chunks
    me.zip(you) do |us, them|
      res = if us =~ dig and them =~ dig
        # digit runs compare as integers
        us.to_i <=> them.to_i
      elsif (us =~ up and them =~ up) or (us =~ low and them =~ low )
        # letter runs of the same case compare as base-36 numbers
        us.to_i(36) <=> them.to_i(36)
      else
        # uncomparable
        nil
      end
      # if res.nil?, this chunk was uncomparable.
      # if res.zero?, these chunks were equal.
      return res if res.nil? or not res.zero?
    end
    return 0
  end
end
    ==>nil
('0a'..'10z').member? '5b2'
    ==>false
('0a0'..'10z9').member? '5b2'
    ==>true
('0a0'..'10z9').member? '5aa1'
    ==>false

I'm not saying this is the way it *should* be, just proposing another
possibility, for the sake of argument. Since Strings are text, some
people might expect them to be compared as text, instead of as a
series of byte values.
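
If redefining String#<=> globally feels too drastic, the same idea can be
tried on a wrapper object instead. This is only a sketch and a simplification
of the code above: NaturalKey is a made-up class, it splits into digit and
non-digit chunks only, and it does not reproduce the case handling or the
"uncomparable" results.

```ruby
# Hypothetical wrapper: compares text chunk by chunk, with digit runs
# compared as integers instead of byte by byte.
class NaturalKey
  include Comparable
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # Tag each chunk so an Integer is never compared with a String:
  # [0, number] for digit runs, [1, string] for everything else.
  def chunks
    text.scan(/\d+|\D+/).map { |c| c =~ /\A\d/ ? [0, c.to_i] : [1, c] }
  end

  def <=>(other)
    chunks <=> other.chunks
  end
end

plain   = ('1'..'10').cover?('9')
natural = (NaturalKey.new('1')..NaturalKey.new('10')).cover?(NaturalKey.new('9'))
p plain     # false: '9' sorts after '10' byte-wise
p natural   # true:  9 lies between 1 and 10
```

Because only the wrapper changes how comparison works, every other String in
the program keeps its usual ordering.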

That was almost as fun as a RubyQuiz! :slight_smile:

cheers,
Mark


On 11/23/05, Ara.T.Howard <ara.t.howard@noaa.gov> wrote:

On Thu, 24 Nov 2005, Warren Brown wrote:
