Ruby string slice/[] w/ range, weird end behavior

First the docs:

...If passed two Fixnum objects, returns a substring starting at the
offset given by the first, and a length given by the second. If given
a range, a substring containing characters at offsets given by the
range is returned... Returns nil if the initial offset falls outside
the string, the length is negative, or the beginning of the range is
greater than the end.

Now from irb (1.8):

"foo"[2..2]

=> "o"

"foo"[3..3]

=> "" # ???

"foo"[4..4]

=> nil

"foo"[2,1]

=> "o"

"foo"[3,1]

=> "" # ???

"foo"[4,1]

=> nil

"foo"[2]

=> 111 # (the 'o' char)

"foo"[3]

=> nil # This makes sense to me, but seems inconsistent wrt the above

Seems to me like the null terminator of the string is somehow getting
muddled into all of this.

Is there any meaning/purpose behind this behavior?

Thanks,
Gary

String indices start at zero, so:

"foo"[0..0] => 'f'
"foo"[1..1] => 'o'
"foo"[2..2] => 'o'
"foo"[3..3] => nil

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

···

On 9 May 2009, at 00:26, Gary Yngve wrote:

----
raise ArgumentError unless @reality.responds_to? :reason

Here's an even simpler case:

""[0]

=> nil

""[0..0]

=> ""

""[0..1]

=> ""

""[1..1]

=> nil

Why differentiate between returning "" and nil?
Why isn't this explained in the docs?

You sure? I get "", same as the op. If it would return nil, I don't think the
op would have asked his question.

···

Am Samstag 09 Mai 2009 01:32:06 schrieb Eleanor McHugh:

"foo"[3..3] => nil

Are replies to this group always like this?

someone blindly posts the doc without reading that I already posted
the doc or reading my example (like an implicit RTFM)
other folks automatically assume I'm a newb and respond in five
seconds without fully reading my post or thinking about their own post

do i need to be a kool kid or have a secret code word for this noise
to go away?

i've found the more niche ruby groups to be much more signal.. but
this didn't seem to fit into a niche

Element Reference—If passed a single Fixnum, returns the code of the
character at that position. If passed two Fixnum objects, returns a
substring starting at the offset given by the first, and a length
given by the second. If given a range, a substring containing
characters at offsets given by the range is returned. In all three
cases, if an offset is negative, it is counted from the end of str.
Returns nil if the initial offset falls outside the string, the length
is negative, or the beginning of the range is greater than the end.

two fixed numbers returns a substring.

···

On Fri, May 8, 2009 at 4:39 PM, Sebastian Hungerecker <sepp2k@googlemail.com> wrote:

Am Samstag 09 Mai 2009 01:32:06 schrieb Eleanor McHugh:

"foo"[3..3] => nil

You sure? I get "", same as the op. If it would return nil, I don't think the
op would have asked his question.

Sorry, typo on my part (I'm not having a good week for these it seems). The point I was trying to make was that

"foo"[3..3] => ""

is a thoroughly valid range, but I guess that would have been clearer with a fuller explanation. Consider

"foo"[2..3] => "o"

index [3] is actually the end of the string so whilst the range accesses the string after any characters in it, it's still accessing the string in a valid range.

Ellie


···

On 9 May 2009, at 00:39, Sebastian Hungerecker wrote:


Are replies to this group always like this?

someone blindly posts the doc without reading that I already posted
the doc or reading my example (like an implicit RTFM)
other folks automatically assume I'm a newb and respond in five
seconds without fully reading my post or thinking about their own post

Yes. We all have our moron moments. After all, the String#[] documentation clearly states:

      Element Reference---If passed a single +Fixnum+, returns a
      substring of one character at that position.

which is precisely what I've just tried to explain in my other message and confirms that the behaviour you're querying is completely
consistent with the conceptual model of a string of characters.

A Ruby string is not a *char and the index points are interstices _between_ the characters in an array of characters, not the addresses of those characters.
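
A rough sketch of that model, assuming nothing beyond plain String#[] (the helper name slices_at_each_offset is made up for illustration and is not part of Ruby). It asks for a one-character slice at every offset, including the gap after the last character and one offset beyond it:

```ruby
# Hypothetical helper (not part of Ruby): take a one-character slice at
# every offset from 0 up to one past the final gap.
def slices_at_each_offset(str)
  (0..str.length + 1).map { |i| [i, str[i, 1]] }
end

slices_at_each_offset("foo").each do |offset, slice|
  puts "#{offset}: #{slice.inspect}"
end
# 0: "f"
# 1: "o"
# 2: "o"
# 3: ""    (offset 3 is the gap after the last character)
# 4: nil   (offset 4 is past the last gap)
```

Only the offset that lies beyond the final gap yields nil; the final gap itself is a valid starting point with nothing after it.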

do i need to be a kool kid or have a secret code word for this noise
to go away?

i've found the more niche ruby groups to be much more signal.. but
this didn't seem to fit into a niche

Ellie


···

On 9 May 2009, at 01:23, Gary Yngve wrote:

The same result would occur with

"foo"[3,1] => ""
"foo"[4,1] => nil

Ellie


···

On 9 May 2009, at 00:44, RubyTalk@gmail.com wrote:


"foo"[3..3] => ""

...

index [3] is actually the end of the string so whilst the range
accesses the string after any characters in it, it's still accessing
the string in a valid range.

Maybe in the native C. But why should that be exposed?
And why shouldn't index[3] return "" if you are correct?

After all:

a=[]; "asd".each_byte{|x| a << x}; a

=> [97, 115, 100]

If I ask for a substring entirely out of bounds, I should consistently
be returned nil or "", not one of the two.

···

On May 8, 4:47 pm, Eleanor McHugh <elea...@games-with-brains.com> wrote:

Are replies to this group always like this?

someone blindly posts the doc without reading that I already posted
the doc or reading my example (like an implicit RTFM)
other folks automatically assume I'm a newb and respond in five
seconds without fully reading my post or thinking about their own post

Yes. We all have our moron moments. After all, the String#[] documentation

Speak for yourself, I am *always* a moron (luckily I have no idea what
that means).

clearly states:

Element Reference---If passed a single +Fixnum+, returns a
substring of one character at that position.

Does it?
Well I guess so, for Ruby1.8.* :wink:
OP will be pleased with Ruby1.9 I guess.
Cheers
Robert
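
For reference, a small sketch of what a single integer index gives on Ruby 1.9 and later (on 1.8, as the original post shows, "foo"[2] returns the character code 111 instead). The variable names are only for illustration:

```ruby
s = "foo"

char = s[2]          # one-character String on Ruby 1.9+ (an Integer code on 1.8)
code = s[2].ord      # character code of that one-character String
byte = s.getbyte(2)  # raw byte value at byte offset 2
past = s[3]          # no character lives at offset 3

p char  # "o"
p code  # 111
p byte  # 111
p past  # nil
```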

···

On Sat, May 9, 2009 at 3:12 AM, Eleanor McHugh <eleanor@games-with-brains.com> wrote:

On 9 May 2009, at 01:23, Gary Yngve wrote:

"foo"[3..3] => ""

...

index [3] is actually the end of the string so whilst the range
accesses the string after any characters in it, it's still accessing
the string in a valid range.

Maybe in the native C. But why should that be exposed?

It's not the C implementation, it's the conceptual model of what a string is: i.e. an array of characters addressable by index and range.

And why shouldn't index[3] return "" if you are correct?

Because in this case the question you're asking isn't "What substring occupies the given segment of the string" but "Which character is stored at the given index in the string". If no character is stored there (as the case for "foo"[3]) then nil is the only meaningful answer.

"foo"[3] => nil
nil.to_s => ""
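
Put side by side, the two kinds of question look roughly like this (a sketch using only String#[]; the results hash is just a convenient way to print them):

```ruby
s = "foo"

results = {
  "s[3]"    => s[3],     # asks for the character at offset 3: there is none
  "s[3, 0]" => s[3, 0],  # asks for a zero-length substring at the end
  "s[3, 1]" => s[3, 1],  # requested length is clamped to what is available
  "s[3..3]" => s[3..3],  # range that starts at the end of the string
  "s[4, 1]" => s[4, 1]   # starting offset is past the end
}

results.each { |expr, value| puts "#{expr} => #{value.inspect}" }
# s[3] => nil
# s[3, 0] => ""
# s[3, 1] => ""
# s[3..3] => ""
# s[4, 1] => nil
```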

After all:

a=[]; "asd".each_byte{|x| a << x}; a

=> [97, 115, 100]

If I ask for a substring entirely out of bounds, I should consistently
be returned nil or "", not one of the two.

And the substring "foo"[3..3] is in bounds because conceptually you're dealing with:

  f   o   o
0   1   2   3

so [3..3] equals the slice at the end of the string but not containing any characters.
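
One way to try the picture out is with an exclusive range, which reads naturally as "the text between post a and post b". This is only a sketch, and between is a made-up helper name rather than anything Ruby provides:

```ruby
# Hypothetical helper: the text between fencepost `from` and fencepost `to`.
def between(str, from, to)
  str[from...to]
end

p between("foo", 0, 3)  # "foo"
p between("foo", 2, 3)  # "o"
p between("foo", 3, 3)  # ""   (nothing lies between post 3 and itself)
p between("foo", 4, 4)  # nil  (there is no post 4)
```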

And yes, I know this is probably about as clear as mud - my ability to write English seems to be inversely proportional to the difficulty of the code I'm working on at any given time, and currently I'm buried in research so the code is very hairy indeed :frowning:

Whilst it's not directly relevant, being a different language and all, I recommend "Chapter 3: String Processing" in Programming with Unicon (http://unicon.sourceforge.net/ubooks.html) as it's the same basic model.

Ellie


···

On 9 May 2009, at 01:19, Gary Yngve wrote:

On May 8, 4:47 pm, Eleanor McHugh <elea...@games-with-brains.com> wrote:


Yes. We all have our moron moments. After all, the String#[] documentation

Speak for yourself, I am *always* a moron (luckily I have no idea what
that means).

I know exactly what you mean - let those who never write a bug throw the first rant :slight_smile:

clearly states:

    Element Reference---If passed a single +Fixnum+, returns a
    substring of one character at that position.

Does it?
Well I guess so, for Ruby1.8.* :wink:

I pulled that straight from ri in my 1.9.1 install...

      From Ruby 1.9.1

···

On 9 May 2009, at 09:22, Robert Dober wrote:

On Sat, May 9, 2009 at 3:12 AM, Eleanor McHugh <eleanor@games-with-brains.com> wrote:

------------------------------------------------------------------------
      Element Reference---If passed a single +Fixnum+, returns a
      substring of one character at that position.

Ellie


If we change perspective a bit, the behavior seems pretty natural to me: if you execute this:

s = "foo"
l = s.length
(l + 2).times do |i|
   p i, s[i,l - i], s[i, 1 + l - i]
end

you get this:

0
"foo"
"foo"
1
"oo"
"oo"
2
"o"
"o"
3
""
""
4
nil
nil

In this context, returning the empty string for s[3, 0] seems OK, especially if you consider that s[a, b] is truncated at the end of the string if a + b > s.length.
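
A short sketch of the same idea from another angle, assuming only String#[] with an offset and a length: over-long lengths are cut off at the end, and splitting the string at any offset from 0 to its length and gluing the halves back together returns the original.

```ruby
s = "foo"
l = s.length

# Over-long lengths are truncated at the end of the string.
p s[1, 100]  # "oo"

# Split at every offset 0..l and re-join the two halves. This only works
# at offset l because s[l, 0] is "" rather than nil.
rejoined = (0..l).map { |i| s[0, i] + s[i, l - i] }
p rejoined   # ["foo", "foo", "foo", "foo"]
```

If s[3, 0] were nil, the last split would raise a TypeError on the + instead of producing "foo".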

Kind regards

  robert

···

On 09.05.2009 02:19, Gary Yngve wrote:

On May 8, 4:47 pm, Eleanor McHugh <elea...@games-with-brains.com> wrote:

"foo"[3..3] => ""

..

index [3] is actually the end of the string so whilst the range accesses the string after any characters in it, it's still accessing the string in a valid range.

Maybe in the native C. But why should that be exposed?
And why shouldn't index[3] return "" if you are correct?

oh yes that is what it does, I cannot read, sorry (but I just proved
my statement above :wink:)

···

On Sat, May 9, 2009 at 1:41 PM, Eleanor McHugh <eleanor@games-with-brains.com> wrote:


Robert Klemme wrote:

···

On 09.05.2009 02:19, Gary Yngve wrote:

On May 8, 4:47 pm, Eleanor McHugh <elea...@games-with-brains.com> wrote:

"foo"[3..3] => ""

..

index [3] is actually the end of the string so whilst the range
accesses the string after any characters in it, it's still accessing
the string in a valid range.

Maybe in the native C. But why should that be exposed?
And why shouldn't index[3] return "" if you are correct?

If we change perspective a bit, the behavior seems pretty natural to
me: if you execute this:

It doesn't to me. I'll throw in with the op: that is stupid behavior
and whoever wrote the docs had no idea how the end of a string is
handled in ruby. Typical crappy ruby documentation.
--
Posted via http://www.ruby-forum.com/.

The ruby array docs say:
a = [ "a", "b", "c", "d", "e" ]
...
# special cases
   a[5] #=> nil
   a[5, 1] #=> []
   a[5..10] #=> []

As one would expect, slice behavior for an array and a string are
consistent, even if not consistently documented.
I wish the docs would have also expressed the language designer's
intent, rather than just enumerate special cases.
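
To check the parity claim, here is a quick side-by-side sketch on Ruby 1.9 or later (the five-element array is the one from the docs; the matching string is my own):

```ruby
a = %w[a b c d e]
s = "abcde"

p a[5, 1],  s[5, 1]   # [] and ""   (start at the end: empty result)
p a[5..10], s[5..10]  # [] and ""
p a[6, 1],  s[6, 1]   # nil and nil (start past the end)
p a[5],     s[5]      # nil and nil (no element at index 5)
```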

An abstraction of half-steps (brings me back to my days of
computational fluid dynamics research -- pressure/density on whole
steps, velocity/flow on half steps) somewhat explains the slice
behavior -- n elements have n+1 fenceposts.

Though this explains the behavior, it doesn't explain why it is *good*
behavior. In fact, I find it highly annoying because you need to
compare against two possible values (nil and empty?, unless I monkey-
patch nil.empty? #=> true) to see if you have valid elements or not,
and that an empty array/string evaluates successfully with <=> rather
than raising a nil ptr exception.
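
Short of monkey-patching nil, the double check can at least be tucked into one place. A minimal sketch, where empty_slice? is a made-up helper name and not anything from the standard library:

```ruby
# Hypothetical helper: true when the slice has no elements, whether the
# underlying call returned nil or an empty String/Array.
def empty_slice?(seq, start, length)
  slice = seq[start, length]
  slice.nil? || slice.empty?
end

p empty_slice?("foo", 2, 1)      # false
p empty_slice?("foo", 3, 1)      # true  (slice is "")
p empty_slice?("foo", 4, 1)      # true  (slice is nil)
p empty_slice?([1, 2, 3], 3, 1)  # true  (slice is [])
```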

The software engineer in me thinks [-1] is similarly dangerous because
it doesn't catch an off-by-one bug, but I do find the negative indices
to be elegant enough to more than offset the detraction.
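
To make the off-by-one worry concrete, a sketch on Ruby 1.9 or later. strict_char_at is a made-up helper, shown only as one possible guard:

```ruby
# Hypothetical helper: like str[index], but refuses negative offsets so an
# accidental "index - 1" underflow raises instead of wrapping around.
def strict_char_at(str, index)
  raise IndexError, "negative index #{index}" if index < 0
  str[index]
end

s = "foo"
first = 0
p s[first - 1]  # "o": silently wraps around to the last character

begin
  strict_char_at(s, first - 1)
rescue IndexError => e
  puts "caught: #{e.message}"  # caught: negative index -1
end
```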

I would love to see someone cogently defend the behavior of slice
here. I think returning nil is safer from the software engineering
perspective. Do you have a good use case where something can be done
elegantly using the special cases documented by array?