String#[] behaviour

DNNX · 18 December 2007 13:40

'asd'[0...10] returns 'asd' while 'asd'[-10..-1] returns nil.

As far as I understand, such behaviour fully conforms to the Ruby
documentation (http://www.ruby-doc.org/core/classes/String.html), but
it seems inconsistent to me.
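Here is a minimal reproduction of the two calls, plus a few probes around the documented rule that String#[] returns nil only when the initial offset falls outside the string. The string has length 3. The commented results are what that rule predicts, and later Ruby versions are assumed to behave the same way.

```ruby
s = 'asd'  # length 3

# The two calls from the post.
p s[0...10]   # => "asd"  start 0 is inside; the end is clamped to the length
p s[-10..-1]  # => nil    start -10 resolves to 3 - 10 = -7, outside the string

# Probing the start offset: only its position decides between a string and nil.
p s[3..10]    # => ""     start == length is still accepted
p s[4..10]    # => nil    start past the end
p s[-3..-1]   # => "asd"  -3 resolves to 0
p s[-4..-1]   # => nil    -4 resolves to -1
```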

Any thoughts?

Thanks

yermej · 18 December 2007 15:09

From the docs for String#[] at that link:

"Returns nil if the initial offset falls outside the string..."


Robert_K1 · 18 December 2007 15:54

On one hand you are right. On the other hand, begin and end indexes
are asymmetric anyway: you know that the starting index is always 0
but the ending index can have arbitrary values. I can't say I have
come across this so far, so for me personally this is a non-issue. On
a larger scale it is probably a minor issue. Let's hear what others
say.

Kind regards

robert


yermej · 18 December 2007 16:06

Never mind what I said. Some days I don't read so good and stuff.


DNNX · 18 December 2007 16:25

Hm... On the other hand, end and begin indexes are asymmetric anyway:
you know that the ending index is always -1 but the starting index can
have arbitrary values.

Isn't this a symmetry?

Best regards,
Viktar


Michal_hramrach_Such · 18 December 2007 20:12

The asymmetry is that you can chop off "at most 10 characters from
the start" with 0...10 but not "at most 10 characters from the end"
with -10..-1, because the start, which has to be inside the string, is
the bound you cannot be sure of. You cannot swap the bounds either,
because then you get an empty string.
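A short sketch of this point for 'asd' (length 3): swapping the bounds yields an empty string, and one way to get "at most n characters from the end" is to compute the start explicitly. `last_chars` is a hypothetical helper, not part of Ruby, and it assumes n is non-negative.

```ruby
s = 'asd'

# Swapped bounds resolve to 2..-7: the start is inside, but the range is empty.
p s[-1..-10]   # => ""

# Hypothetical helper: take at most n characters from the end by clamping
# the count to the string's length and using the (start, length) form.
def last_chars(str, n)
  take = [n, str.length].min
  str[str.length - take, take]
end

p last_chars('asd', 10)            # => "asd"
p last_chars('abcdefghijkl', 10)   # => "cdefghijkl"
```

The (start, length) form avoids a pitfall of writing str[-take..-1]: when take is 0, -0 is just 0, and that range would return the whole string.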

So the symmetric rule for range indexing would be something like this:

a) both ends of the range have the same sign -> the one with the lower
absolute value must be inside the string. In other words, the range
must intersect 0...string.length. This is the only case that can
produce a valid range lying completely outside the string (when the
condition is not met).

b) they have different signs and the start is non-negative -> simple.
They give either a range inside the string or a range where the start
is higher than the end (a..-b => a..length-b), so a string, sometimes
empty, can always be returned.

c) the start is negative and the end is positive -> ideally you get
something inside the string, but you can also get a range whose start
and end are both outside the string, one on each side. Either way it
makes sense: the range contains part of the string, or its start is
higher than its end after evaluation (-a..b => length-a..b).
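Here is a rough Ruby sketch of the rule above, written as a standalone method rather than a change to String#[]. The name `symmetric_slice` is hypothetical, and the method encodes one reading of the rule. When both bounds share a sign, the range must overlap 0...length, otherwise the result is nil. When the signs differ, the overlap is always returned, possibly empty. The method assumes both bounds are Integers and treats 0 as non-negative.

```ruby
# Hypothetical helper illustrating the proposed symmetric rule; it is not
# part of Ruby's String API.
def symmetric_slice(str, range)
  len   = str.length
  first = range.begin
  last  = range.end
  same_sign = first.negative? == last.negative?

  # De-sugar negative indexes into positions counted from the start.
  first += len if first.negative?
  last  += len if last.negative?
  last  -= 1 if range.exclude_end?   # make the end bound inclusive

  # Intersect the resolved range with the valid positions 0..len-1.
  lo = [first, 0].max
  hi = [last, len - 1].min

  return str[lo..hi] if lo <= hi   # non-empty overlap
  same_sign ? nil : ''             # rule (a) gives nil, rules (b)/(c) give ''
end

p symmetric_slice('asd', 0...10)    # => "asd"
p symmetric_slice('asd', -10..-1)   # => "asd"
p symmetric_slice('asd', -10..-5)   # => nil
p symmetric_slice('asd', 2..-2)     # => ""
```

Under this reading, an empty same-sign range such as 1...1 also yields nil, unlike String#[]. That is a choice made by the sketch, not something the rule above spells out.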

Thanks

Michal


Jordan_Callicoat · 18 December 2007 23:45

No, because "-1" is a special value...it's got magic in it. It can
magically mean 5, or 10, or even 12 (because it's magic). "-1" is
just sugar for #length, and #length is always a side-effect of a
container, whereas '0' is a constant entry point.

Regards,
Jordan


DNNX · 19 December 2007 08:15

-1 is more special and magic than 0? Hm... 0 can also magically mean
-6, -11, or even -13 (because it's magic too).

-1 is sugar for #length? Not sure I understand correctly. I've never
heard such an interpretation of -1 before. Why #length and not
#length-1? Why isn't 0 sugar for -#length? What do you mean by saying
-1 is sugar for something?

0 is a constant entry point? Great, -1 is a constant exit point.

Anyway, whether there is any symmetry or not, I still believe that
returning 'asd' in one case and nil in the other is not consistent
(please see my example in the first message).

Regards,
Viktar


Michal_hramrach_Such · 19 December 2007 10:20

-1 is as constant as 0. Because of its magic, when used as a
container index it always means the end, and really length - 1, not
just length. At that position there is always the last object unless
the container is empty, in the same way that the first object is at 0.
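A quick check of this point: a negative index -k addresses the same element as length - k. So -1 is the last element (length - 1), and -length is the first (0). The example uses a String, and Array indexing follows the same convention. In Ruby 1.8, indexing a String with a single Integer returns a character code rather than a one-character string, but the equalities below hold either way.

```ruby
s = 'asd'
len = s.length

# -1 is the last character, i.e. position length - 1 (not length).
p s[-1] == s[len - 1]   # => true
p s[len]                # => nil, there is no character at position length

# Mirror image: -length is the first character, i.e. position 0.
p s[-len] == s[0]       # => true
```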

Thanks

Michal


Pasha_Nigerish · 19 December 2007 11:49

Michal Suchanek wrote:

> The asymmetry is in that you can chop off "at most 10 characters from
> the start" with 0...10 but not "at most 10 characters from the end"
> with -10..-1 because the start that has to be inside the string is the
> one of which you cannot be sure. You cannot swap the bounds because
> you get an empty string then.
>
> So the symmetric rule for range indexing would be something like this:
> ...skipped...

So we need to be clear about range indexing: I think it would be
perfectly valid to return the intersection of the array/string with the
range instead of nil when the start is negative.
This can be done with a one-line patch at range.c:615 (as in trunk): just
set `beg = 0` instead of `goto out_of_range`.
That would give us at least more Perl-compatible behavior =) i.e. just as
'abc'[0..6] is 'abc' now, 'abc'[-6..-1] would be 'abc' as well.

One problem I see with this approach: 'abc'[4..6] and 'abc'[-6..-4] will
return '' instead of nil.
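The proposed patch can be approximated in plain Ruby without touching range.c. In this sketch, `clamped_slice` is a hypothetical helper. It de-sugars a negative start and, if the result is still negative, clamps it to 0 before delegating to the existing String#[]. It assumes the range bounds are Integers and does not claim to reproduce the C patch exactly.

```ruby
# Hypothetical helper approximating the proposed "beg = 0" patch.
def clamped_slice(str, range)
  first = range.begin
  first += str.length if first.negative?  # de-sugar a negative start
  first = 0 if first.negative?            # still before the string: clamp to 0
  str[Range.new(first, range.end, range.exclude_end?)]
end

p clamped_slice('abc', 0..6)     # => "abc"
p clamped_slice('abc', -6..-1)   # => "abc"
p clamped_slice('abc', -6..-4)   # => ""  (plain 'abc'[-6..-4] is nil)
```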


Michal_hramrach_Such · 19 December 2007 12:00

You can still test that the lower bound (the one with the lower
absolute value) is inside the string. It's just that with negative
ranges the lower bound is the second number, not the first.

Thanks

Michal


Jordan_Callicoat · 19 December 2007 14:40

I think you (and Michal) missed my point. And yes, I should have said
#length-1. The point is that there is *no such thing* as a negative
index: 0 is the *first* index, and "-1" (or -anynumber) is just sugar
(i.e., just a more convenient syntax for writing #length-whatever). So
what you're asking is for ranges such as [-7..2] and [1..0] to be
meaningful. Taking your example, "'asd'[-10..-1]", this means
'asd'[-7..2] when you de-sugar it. Now in the other case,
"'asd'[0...10]", once you reach #length-1, you can stop and return
0..#length-1. But with 'asd'[-7..2], what are you supposed to do when
the start index is less than the first index (0)? Well, you could skip
ahead to the first index, sure, but it makes just as much sense (if
not more) to return nil/empty string. The same goes for cases such as
'asd'[-2..-3] (i.e., 'asd'[1..0]), where the start index is greater
than the end index.
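The de-sugaring described here can be written out explicitly. `desugar` is a hypothetical helper that only rewrites negative bounds as length + bound. It does not decide what slicing should return.

```ruby
# Hypothetical helper: rewrite negative range bounds as offsets from the start.
def desugar(range, length)
  fix = ->(i) { i.negative? ? length + i : i }
  Range.new(fix.(range.begin), fix.(range.end), range.exclude_end?)
end

p desugar(-10..-1, 'asd'.length)  # => -7..2
p desugar(-2..-3, 'asd'.length)   # => 1..0
p desugar(0...10, 'asd'.length)   # => 0...10
```

Once the bounds are resolved, the open question is only what slicing should do when the start (here -7) is still below 0.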

Regards,
Jordan


Pasha_Nigerish · 19 December 2007 15:01

Jordan Callicoat wrote:

> Taking your example, "'asd'[-10..-1]", this means
> 'asd'[-7..2] when you de-sugar it. Now in the other case,
> "'asd'[0...10]", once you reach #length-1, you can stop and return
> 0..#length-1. But with 'asd'[-7..2], what are you supposed to do when
> the start index is less than the first index (0)? Well, you could skip
> ahead to the first index, sure, but it makes just as much sense (if
> not more) to return nil/empty string. Same goes for cases such as
> 'asd'[-2..-3] (i.e., 'asd'[1..0]), where the start index is greater
> than the end index.

IMHO, the main purpose of such a construct (some_string[-10..-1]) is to
return the last 10 chars of some_string. In that case, returning
'asd' for 'asd'[-10..-1] seems just as logical as returning 'asd' for
'asd'[0..10] (as implemented now).

Right now (1.8.6) we have:
1) 'asd'[0..10] => 'asd'
2) 'asd'[2..1] => ''
3) 'asd'[-1..-2] => ''
-BUT-
4) 'asd'[-10..-1] => nil

I think that, by the "Principle of Least Astonishment" (c), we can unify
these cases, i.e. return either 'asd' or nil in both cases 1) and 4). All
we need is to adjust the start index of the range to 0 if it is negative
right after de-sugaring.


Jordan_Callicoat · 20 December 2007 10:30

Thinking about it, I guess that does make some sense. Unless one
were to assume that negative starting indexes were more likely to be
programmer errors than larger-than-#length-1 end indexes (does anyone
claim this?), it seems to me that setting negative starting indexes to 0
would be consistent with setting larger-than-#length-1 indexes to
#length-1. Maybe you should start an RCR for this.

Regards,
Jordan

