Assert "foo"[3] != "foo"[3,1]...revisited

Consider the following irb session:

"foo"[0]

=> "f"

"foo"[1]

=> "o"

"foo"[2]

=> "o"

"foo"[3]

=> nil

"foo"[4]

=> nil

"foo"[0,3]

=> "foo"

"foo"[1,3]

=> "oo"

"foo"[2,3]

=> "o"
#note the following weird case!!!

"foo"[3,3]

=> "" #this should be nil in my mind, "foo"[3] is nil, and taking
      # three characters is still nil.

"foo"[4,3]

=> nil

So, to summarize, when indexing a position beyond the length of a
string, ruby returns nil. But when indexing a slice beyond the length
of a string, ruby returns an empty string "" for the first index
beyond and then nil.
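
The summary above can be reproduced with a short script. This is only a sketch, assuming Ruby 1.9 or later (where "foo"[0] returns a one-character string, as in the session above); the helper name index_vs_slice is made up for illustration.

```ruby
# Hypothetical helper: for every start position from 0 to one past
# str.length, pair the single-index result with the two-argument slice.
def index_vs_slice(str, length)
  (0..str.length + 1).map { |i| [i, str[i], str[i, length]] }
end

index_vs_slice("foo", 3).each do |i, single, slice|
  puts "start #{i}: [i] => #{single.inspect}, [i, 3] => #{slice.inspect}"
end
```

The row for start 3 is the one where the two forms disagree.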

I don't like that this passes
assert "foo"[3] != "foo"[3,1]

Matz, you are so smart, but this does not follow the principle of
least surprise!!! Isn't this kind of thing strange?

I would appreciate it if anyone can explain how this might make
sense...but please only try if you really believe it is a defensible
behavior for the language.
Thanks,
Tim

Hi,

>I don't like that this passes
>assert "foo"[3] != "foo"[3,1]
>
>Matz, you are so smart, but this does not follow the principle of
>least surprise!!! Isn't this kind of thing strange?

"foo"[3,1] is "" since when index is within the string, the sought
length will be rounded to fit in the size. And 3 (which equals the
length of the string) is considered as touching the end of the string,
so the result length is zero.
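
One way to read this rule is as a small model of the two-argument form. The sketch below is not Ruby's actual implementation; model_slice is a hypothetical helper, and it assumes Ruby 1.9 or later so that str[i] yields one-character strings.

```ruby
# Hypothetical model of str[start, length], written only to illustrate
# the rule described above; it is not how Ruby implements String#[].
def model_slice(str, start, length)
  size = str.length
  start += size if start < 0               # negative starts count from the end
  return nil if start < 0 || start > size  # start == size is still allowed
  return nil if length < 0
  length = [length, size - start].min      # round the length to fit the size
  (start...(start + length)).map { |i| str[i] }.join
end

p model_slice("foo", 2, 3)  # one character is left before the end
p model_slice("foo", 3, 1)  # touching the end: zero characters fit
p model_slice("foo", 4, 1)  # start is past the end
```

In this reading a start of 3 names the position just after the last character, which is still a valid place to begin a slice, while 4 names no position at all.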

And a tip for you; never mention PoLS again to persuade me. It's no
use. If you have real trouble besides misunderstanding, let me know.

              matz.

···

In message "Re: assert "foo"[3] != "foo"[3,1]...revisited" on Mon, 23 Aug 2010 17:51:02 +0900, timr <timrandg@gmail.com> writes:

...In message "Re: assert "foo"[3] != "foo"[3,1]...revisited"

>I don't like that this passes
>assert "foo"[3] != "foo"[3,1]
>...

"foo"[3,1] is "" since when index is within the string, the sought
length will be rounded to fit in the size. And 3 (which equals the
length of the string) is considered as touching the end of the string,
so the result length is zero.
...
                                                        matz.

I'm grateful for the reply, and for the question that prompted it, as that
is something that has bothered/intrigued me for some time, and I had
wondered whether it was something to do with string (or array) processing
near the end of the string. I think it's worth adding something like Matz's
wording to
  class String - RDoc Documentation
  class Array - RDoc Documentation
because reasons for why things work a particular way do help people (well,
at least me) remember behaviour.

I first noticed this when I was doing some complicated string processing,
and was using (or trying to use) something like str[i, 1] being nil to
indicate that the end of the string had been reached. (I could perhaps have
used regular expressions, but I didn't - and sometimes still don't - trust
my understanding of them, and I was very wary of an apparent regular
expression match or non-match being not quite what I'd intended it to be.) I
then found that arrays worked the same way.
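
The end-of-string test described above can be sketched in two ways. Both helpers are hypothetical and assume Ruby 1.9 or later, where str[i] is a one-character string or nil.

```ruby
# Single-index access returns nil one past the last character,
# so nil alone works as the end marker.
def chars_by_index(str)
  out = []
  i = 0
  while (ch = str[i])
    out << ch
    i += 1
  end
  out
end

# The two-argument form returns "" at i == str.length, so the loop
# has to treat an empty piece as the end as well.
def chars_by_slice(str)
  out = []
  i = 0
  loop do
    piece = str[i, 1]
    break if piece.nil? || piece.empty?
    out << piece
    i += 1
  end
  out
end

p chars_by_index("foo")
p chars_by_slice("foo")
```

A loop that checked only for nil from str[i, 1] would collect one extra empty string before stopping.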

At the time I wondered about asking whether it might be of more general use
if String and Array also had a variation of slice having the behaviour I
wanted for what I was doing (that is working like slice, except when the
index is just past the end of the string or array when it would return nil),
but didn't pursue it, partly from assuming that since [] / slice were fairly
fundamental parts of String and Array it was very likely this had been
debated before, even if I couldn't find the debate.

I'm tempted to suggest that now, partly because I would find it useful (not
sure what to call it, but it should start with "slice") - yes, I know I
could write something to do that for my own use! - and partly because if you
(well I) have two similar but a little differently named methods it might
help you (well me) remember that there is a behaviour difference to what -
choosing my words very carefully - I am expecting. (And being well aware
that *I* didn't invent the Ruby computer language.)

*** off the topic of the thread, but on the topic alluded to by my last
comments, and prompted by Matz's penultimate sentence: I'm rather
distrustful of slogans - I think there is a danger that they start being
used as a substitute for really thinking about things, so I was pleased when
the use of a certain phrase in Ruby discussions became discouraged (quite a
long time ago now). I don't use Python, but I do sometimes look at the
discussion groups, and I get the impression that "there should only be one
obvious way to do it" sometimes (frequently???) gets misused, not least by
the omission of "obvious". It also illustrates nicely how slogans can get
corrupted: looking here
  PEP 20 – The Zen of Python | peps.python.org
I find that (a) the original (?) version is more complex, and (b) more than
somewhat self-deprecating, which I like, because it at least hints at the
possibility that different people can quite reasonably take different views
on things.
  There should be one -- and preferably only one -- obvious way to do it.
  Although that way may not be obvious at first unless you're Dutch.

···

2010/8/23 Yukihiro Matsumoto <matz@ruby-lang.org>

    on Mon, 23 Aug 2010 17:51:02 +0900, timr <timrandg@gmail.com> writes:

Hi,

···

In message "Re: assert "foo"[3] != "foo"[3,1]...revisited" on Mon, 23 Aug 2010 20:10:23 +0900, Colin Bartlett <colinb2r@googlemail.com> writes:

>I think it's worth adding something like Matz's
>wording to
> class String - RDoc Documentation
> class Array - RDoc Documentation
>because reasons for why things work a particular way do help people (well,
>at least me) remember behaviour.

Yep, description proposal is welcome.

              matz.

Hi again,
Thanks for responding to my question--it is something akin to getting
a return letter from the President. And sorry for the previous
reference to PoLS. Though I understand the current function of
String#[] vs. String#[,] and can predict their output, I don't concede
that anything is gained by having a divergence in behavior at the end of
a string. In fact, without the divergent behavior at the end of a
string, the documentation could simply read:

Element Reference—If passed a single Fixnum, returns the code of the
character at that position. If passed two Fixnum objects, returns a
substring starting at the offset given by the first, and a length
given by the second. If given a range, a substring containing
characters at offsets given by the range is returned. In all three
cases, if an offset is negative, it is counted from the end of str.
Returns nil if the initial offset falls outside the string, the length
is negative, or the beginning of the range is greater than the end.

(This is in fact how the documentation currently reads. If there were
no edge cases to account for, one could leave it implied that the
String#[] and String#[,] methods are predictably related with
String#[] being equivalent to String#[,] with an implied second
argument of 1.)

However, as element referencing is currently implemented, the
documentation may benefit from highlighting the edge-case at the end
of strings. See below for an attempt...

Please note edge-case when the index position is the same as the
length of the string:
"foo"[0] == "foo"[0,1]
"foo"[1] == "foo"[1,1]
"foo"[2] == "foo"[2,1]
"foo"[3] != "foo"[3,1] #These are different (nil on the left and "" on
                       # the right). Just memorize this edge case.

"foo"[3,1] = "d" #=> "food"
"foo"[3] = "d" #=> IndexError. #And memorize also for good measure.

And here is a proposal for the updated change to the documentation:

Element Reference—If passed a single Fixnum, returns the code of the
character at that position. At the end of the string, single parameter
element referencing will return nil. If passed two Fixnum objects,
returns a substring starting at the offset given by the first, and a
length given by the second. At the end of the string, two-argument
element referencing returns an empty string (""). If given a range, a
substring containing characters at offsets given by the range is
returned. In all three cases, if an offset is negative, it is counted
from the end of str. Returns nil if the initial offset falls outside
the string, the length is negative, or the beginning of the range is
greater than the end.

Of course, simplicity/elegance is in the mind of the beholder. But, I
hope that you might consider the option of eliminating the divergent
behavior of String#[] and String#[,] when the index == str.length.
Thanks,
Tim

···

On Aug 23, 5:40 am, Yukihiro Matsumoto <m...@ruby-lang.org> wrote:

Hi,

In message "Re: assert "foo"[3] != "foo"[3,1]...revisited" on Mon, 23 Aug 2010 20:10:23 +0900, Colin Bartlett <colin...@googlemail.com> writes:

>I think it's worth adding something like Matz's
>wording to
> http://www.ruby-doc.org/core/classes/String.html#M000771
> class Array - RDoc Documentation
>because reasons for why things work a particular way do help people (well,
>at least me) remember behaviour.

Yep, description proposal is welcome.

                                                    matz.

At the end of this post I've put examples of what I had in mind for slice
methods which behave that way.

The following is a try at modifying the documentation for Array. (I'm using
Array because that doesn't have the complication of the change in behaviour
of string[index] from 1.8 to 1.9, and because the current documentation for
Array does have the special cases, albeit I think it could perhaps be more
precise. Adaptation to String should be straightforward.) As much as
possible the try uses the existing documentation with minimal changes, and
I've included Matz's explanation with what I hope are appropriate changes
for array.

Comments (not intended to be included in the documentation) are /* comment
*/.
(Apologies in advance if the formatting is weird: Gmail sometimes deletes
leading spaces (and others?) when it thinks it knows better than me.)

array[index] --> obj or nil
array[start, length] --> an_array or nil
array[range] --> an_array or nil
array.slice(index) --> obj or nil
array.slice(start, length) --> an_array or nil
array.slice(range) --> an_array or nil

Element Reference—Returns the element at index, or returns a subarray
starting at start and continuing for length elements, or returns a subarray
specified by range. Negative indices count backward from the end of the
array (-1 is the last element).
/* start a new line to highlight that an out of range index does not always
return nil */
Returns nil if the start (or starting index) is out of range:
/* suggested additional documentation */
/*new line*/ *unless* there is a length *and* Integer(start) == length;
/*new line*/ *or* the argument is a range *and* Integer(range.begin) ==
length.
/*new line*/ For these special cases (see the table of examples) the
return value is an empty array [].
/*new line*/ The reason for this special behaviour is that when the start
is within the array the sought length is rounded /* or use "truncated"? */
to fit in the size. In the special cases examples 5 (which is the length of
the array) is considered as touching the end of the array, so the returned
value is a subarray with length zero.

/*back to current documentation */
a = [ "a", "b", "c", "d", "e" ] # a.length == 5
a[2] + a[0] + a[1] #=> "cab"
a[6] #=> nil
a[1, 2] #=> [ "b", "c" ]
a[1..3] #=> [ "b", "c", "d" ]
a[4..7] #=> [ "e" ]
a[6..10] #=> nil
a[-3, 3] #=> [ "c", "d", "e" ]

/* suggested additional documentation */
The following table shows the special cases behaviour
when the start position is just past the end of the array.

index/
start  a[index]  a[start, 2]   a[start..7]
-----  --------  ------------  ------------
  3    "d"       [ "d", "e" ]  [ "d", "e" ]
  4    "e"       [ "e" ]       [ "e" ]
  5    nil       []            []            # <-- special cases
  6    nil       nil           nil
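
The rows of the table can be recomputed with a few lines of Ruby. This is a sketch only; slice_table is a hypothetical helper, and the array is the one from the examples above.

```ruby
# Hypothetical helper that recomputes one table row per start position:
# [start, a[start], a[start, 2], a[start..7]]
def slice_table(array, starts)
  starts.map { |i| [i, array[i], array[i, 2], array[i..7]] }
end

a = [ "a", "b", "c", "d", "e" ]   # a.length == 5
slice_table(a, 3..6).each do |row|
  puts row.map { |cell| cell.inspect }.join("   ")
end
```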

*** *** example additional slice methods which return nil
*** *** if the start position is outside the array, even if
*** *** the start position is only just after the end of the array

module Array_String_at_slice
  # Intended for Array and String: behave like #[], #slice and #slice!
  # except when the arguments are not just an index,
  # that is the arguments are a range or an index _and_ a length,
  # *and* Integer(range.begin) or Integer(index) == array_string.length,
  # when #[], #slice and #slice! return an empty array or string "",
  # but #at_slice and #at_slice! return nil.
  # In other words, if #at(index) or #at(range.begin) would return nil
  # then #at_slice(index, arg), #at_slice!(index, arg),
  # #at_slice(range) and #at_slice!(range) also return nil.

  # The method names are intended to convey that the slice behaviour
  # is similar to #at. Using the name #slice_at was considered,
  # but rejected (wrongly?) as possibly being capable of being assumed
  # to be a synonym for #slice.

  def at_slice( *args )
    unless Numeric === (ii = args[0]) then ii = ii.begin end
    if ii >= self.size then nil else slice( *args ) end
  end

  def at_slice!( *args )
    unless Numeric === (ii = args[0]) then ii = ii.begin end
    if ii >= self.size then nil else slice!( *args ) end
  end
end

class Array
  include Array_String_at_slice
end
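
A usage sketch may make the intended difference concrete. So that it runs on its own, the block repeats a compact version of at_slice. It also includes the module in String, which the comments above describe as intended but which the code above does not do, so treat that part as an assumption.

```ruby
module Array_String_at_slice
  # Compact restatement of at_slice from the module above.
  def at_slice(*args)
    ii = args[0]
    ii = ii.begin unless Numeric === ii
    ii >= size ? nil : slice(*args)
  end
end

class Array
  include Array_String_at_slice
end

class String   # assumption: only the Array include appears in the post
  include Array_String_at_slice
end

a = [ "a", "b", "c", "d", "e" ]
p a.at_slice(4, 2)      # start inside the array: same result as a[4, 2]
p a.at_slice(5, 2)      # start == a.size: nil, where a[5, 2] gives []
p a.at_slice(5..7)      # the same for a range argument
p "foo".at_slice(3, 1)  # nil, where "foo"[3, 1] gives ""
```

Negative starts are passed straight through to slice, since they are never greater than or equal to the size.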

···

On Tue, Aug 24, 2010 at 2:55 AM, timr <timrandg@gmail.com> wrote:

On Aug 23, 5:40 am, Yukihiro Matsumoto <m...@ruby-lang.org> wrote:

on 23 Aug 2010 20:10:23 +0900, Colin Bartlett <colin...@googlemail.com> wrote:

I think it's worth adding something like Matz's wording to
http://www.ruby-doc.org/core/classes/String.html#M000771
class Array - RDoc Documentation
because reasons for why things work a particular way do help people
(well, at least me) remember behaviour.

Yep, description proposal is welcome.
matz.

... And here is a proposal for the updated change to the documentation:
Of course, simplicity/elegance is in the mind of the beholder. But,
I hope that you might consider the option of eliminating the divergent
behavior of String#[] and String#[,] when the index == str.length.

Correcting my own post to remove confusion between length as in array.size
and length as in slice(index, length)

Element Reference—Returns the element at index, or returns a subarray
starting at start and continuing for length elements, or returns a subarray
specified by range. Negative indices count backward from the end of the
array (-1 is the last element).
/*new line to highlight that an out of range index might not return nil*/
Returns nil if the start (or starting index) is out of range:
/* suggested additional documentation */
/*new line*/ *unless* there is a length *and* Integer(start) == array.size;
/*new line*/ *or* the argument is a range *and* Integer(range.begin) ==
array.size.
/*new line*/ For these special cases (see the table of examples) the return
value is an empty array [].
/*new line*/ The reason for this special behaviour is that when the start is
within the array the sought length is rounded /* or use "truncated"? */ to
fit in the size. In the special cases examples 5 (which is the size of the
array) is considered as touching the end of the array, so the returned value
is a subarray with size zero.
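
The corrected wording can be checked mechanically for non-negative starts. The sketch below is illustrative only; expected_kind, actual_kind and wording_holds? are hypothetical names, and the check covers small arrays rather than proving the rule in general.

```ruby
# What the wording predicts for a given array size and start position.
def expected_kind(size, start)
  if start < size then :non_empty
  elsif start == size then :empty
  else :nil
  end
end

# Classify what Array#[] actually returned.
def actual_kind(result)
  return :nil if result.nil?
  result.empty? ? :empty : :non_empty
end

# Compare prediction and behaviour for array[start, 2] and
# array[start..start + 1] over all sizes up to max_size.
def wording_holds?(max_size)
  (0..max_size).all? do |size|
    array = Array.new(size) { |i| i }
    (0..size + 2).all? do |start|
      want = expected_kind(size, start)
      actual_kind(array[start, 2]) == want &&
        actual_kind(array[start..start + 1]) == want
    end
  end
end

puts wording_holds?(5)
```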