The docs certainly could be clearer, but the actual behavior is self-consistent and useful.
Note: I'm assuming 1.9.X version of String.
On Mar 30, 2011, at 2:08 PM, 7stud -- wrote:
Patrick Tyler wrote in post #990031:
Hello,
I know that this has been covered a bit here:
Ruby string slice/[] w/ range, weird end behavior - Ruby - Ruby-Forum but I'm still not certain that I
understand.
s = "foo"
s[3] is nil, like I would expect.
s[3,0] is "", instead of nil.
That behaviour is contrary to the description in the 1.9.2 docs here:
class Array - RDoc Documentation
 +---+---+---+---+
 | a | b | c | d |
 +---+---+---+---+
 0   1   2   3   4   <-- numbering for two argument indexing or start of range
-4  -3  -2  -1
The common (and understandable) mistake is to assume that the semantics of the single argument index are the same as the semantics of the *first* argument in the two argument scenario (or range). They are not the same thing in practice, and the documentation doesn't reflect this. The error, though, is definitely in the documentation and not in the implementation:
single argument: the index represents a single character position within the string. The result is either the single character string found at the index or nil because there is no character at the given index.
s = ""
s[0] # nil because no character at that position
s = "abcd"
s[0] # "a"
s[-4] # "a"
s[-5] # nil, no characters before the first one
two integer arguments: the arguments identify a portion of the string to extract or to replace. In particular, zero-width portions of the string can also be identified so that text can be inserted before or after existing characters including at the front or end of the string. In this case, the first argument does *not* identify a character position but instead identifies the space between characters as shown in the diagram above. The second argument is the length, which can be 0.
s = "abcd" # each example below assumes s is reset to "abcd"
To insert text before 'a': s[0,0] = "X" # "Xabcd"
To insert text after 'd': s[4,0] = "Z" # "abcdZ"
To replace first two characters: s[0,2] = "AB" # "ABcd"
To replace last two characters: s[-2,2] = "CD" # "abCD"
To replace middle two characters: s[1,2] = "XX" # "aXXd"
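Applied to the original question, the edge numbering explains why s[3] and s[3,0] differ for "foo". A small sketch, assuming Ruby 1.9 or later (the variable names are only for illustration):

```ruby
s = "foo"            # edges: 0 f 1 o 2 o 3

at_index = s[3]      # single index: there is no character at position 3
at_edge  = s[3, 0]   # two arguments: edge 3 is the end of the string
past_end = s[4, 0]   # there is no edge 4

p at_index   # nil
p at_edge    # ""
p past_end   # nil
p s[3, 2]    # "" (only as many characters as exist after edge 3)
```

So the start may equal the length of the string (the last edge) but not exceed it.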
The behavior of a range is pretty interesting. The starting point is the same as the first argument when two arguments are provided (as described above) but the end point of the range can be the 'character position' as with single indexing or the "edge position" as with two integer arguments. The difference is determined by whether the double-dot range or triple-dot range is used:
s = "abcd"
s[1..1] # "b"
s[1..1] = "X" # "aXcd"
s[1...1] # ""
s[1...1] = "X" # "aXbcd", the range specifies a zero-width portion of the string
s[1..3] # "bcd"
s[1..3] = "X" # "aX", positions 1, 2, and 3 are replaced.
s[1...3] # "bc"
s[1...3] = "X" # "aXd", positions 1, 2, but not quite 3 are replaced.
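One way to keep the two range forms straight is to translate each into a start/length pair. A sketch, assuming Ruby 1.9 or later and a non-negative start that is no greater than the end:

```ruby
s = "abcd"

# For 0 <= a <= b within the string:
#   s[a..b]  covers b - a + 1 characters starting at edge a
#   s[a...b] covers b - a     characters starting at edge a
inclusive = s[1..3]
exclusive = s[1...3]

p inclusive == s[1, 3]   # true, both are "bcd"
p exclusive == s[1, 2]   # true, both are "bc"
p s[4..4]                # "" (edge 4 is a valid start, like s[4, 0])
p s[5..5]                # nil (there is no edge 5)
```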
If you go back through these examples and insist on using the single index semantics for the two argument or range indexing examples, you'll just get confused. You've got to use the alternate numbering I show in the ASCII diagram to model the actual behavior.
Gary Wright