[rcr] String#split behaves odd

// on Tue, 7 Dec 2004 04:20:37 +0900, Simon Strandgaard
//|Maybe the return value of String#split is wrong.
//|If I invoke split on an empty string, then it
//|results in an empty Array (which I think is odd).
//
//Feeling odd is subjective. Could you tell me why you felt
//String#split is "wrong"?

imho, I think he meant

[] != [""]

I myself thought that string#split would return an array of strings w a
minimum element of [""]

//
// matz.

kind regards -botp

···

Yukihiro Matsumoto [mailto:matz@ruby-lang.org] wrote:

Hi,

···

In message "Re: [rcr] String#split behaves odd" on Tue, 7 Dec 2004 14:16:06 +0900, "Peña, Botp" <botp@delmonte-phil.com> writes:

imho, I think he meant

[] != [""]

I myself thought that string#split would return an array of strings w a
minimum element of [""]

I don't get it. [] is an array of strings with zero elements. :wink:

              matz.

In the past I had the impression that, as long as there are no newlines
in the string, split would always return an array with one string.

"a".split(/\n/) #=> ["a"]
"a\nb".split(/\n/) #=> ["a", "b"]

However, yesterday the string I was about to split happened to be empty,
and I had to add a special case that only deals with the empty string.

I think many people would not have to write special cases for the empty string
if split just returned an array with at least one String element.

Maybe the title of this RCR should have been: change split result to
reduce special cases.
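A minimal sketch of the kind of special case described above, assuming the goal is simply that an empty input yields `[""]` instead of `[]`. The helper name `split_nonempty` is hypothetical and not part of Ruby.

```ruby
# Hypothetical helper: behaves like String#split, except that an empty
# string yields [""] instead of [] (the special case described above).
def split_nonempty(str, pattern)
  return [""] if str.empty?

  str.split(pattern)
end

p split_nonempty("a", /\n/)
p split_nonempty("a\nb", /\n/)
p split_nonempty("", /\n/)
```

Note that this only special-cases the empty string: a string made up solely of delimiters, such as `"\n"`, still splits to `[]` because the default split drops trailing empty fields.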

···

On Tue, 7 Dec 2004 14:31:27 +0900, Yukihiro Matsumoto <matz@ruby-lang.org> wrote:

In message "Re: [rcr] String#split behaves odd" on Tue, 7 Dec 2004 14:16:06 +0900, "Peña, Botp" <botp@delmonte-phil.com> writes:

>imho, I think he meant
>
> [] != [""]
>
>I myself thought that string#split would return an array of strings w a
>minimum element of [""]

I don't get it. [] is an array of strings with zero elements. :wink:

--
Simon Strandgaard

Yukihiro Matsumoto wrote:

Hi,

>imho, I think he meant
>
> [] != [""]
>
>I myself thought that string#split would return an array of strings w a
>minimum element of [""]

I don't get it. [] is an array of strings with zero elements. :wink:

There is misleading behavior though with the current implementation. For example:

Example1: "aaaab".split( /a/ ) => [ "", "", "", "", "b" ]
Example2: "a".split( /a/ ) => []
Example3: "aaaa".split( /a/ ) => []

You would think all three cases would respond the same, but the last two examples respond very differently than the first. Should the behavior not be consistent?
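The difference comes from split dropping trailing empty fields by default; in the last two examples every field is empty, so nothing survives. A quick check, using the negative limit that `split(/a/, -1)` later in this thread relies on: with a negative limit, trailing empty fields are kept, and all three inputs behave alike.

```ruby
# Compare the default split with a negative limit, which keeps trailing
# empty fields. For these non-empty inputs, the limit -1 result always has
# one more field than there are "a" delimiters.
["aaaab", "a", "aaaa"].each do |s|
  default_result = s.split(/a/)
  keep_all = s.split(/a/, -1)
  puts "#{s.inspect}: default #{default_result.inspect}, limit -1 #{keep_all.inspect}"
end
```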

Zach

···

In message "Re: [rcr] String#split behaves odd" on Tue, 7 Dec 2004 14:16:06 +0900, "Peña, Botp" <botp@delmonte-phil.com> writes:

This is a tough call, IMHO. It all depends on your mental description of
split. If you think of split as constructing an array of elements found in a
string, separated by a delimiter, then returning [] makes sense because there
are no elements found. This progression makes a lot of sense ...

   "a,b".split(',') => ['a', 'b'] # two elements found
   "a".split(',') => ['a'] # one element found
   "".split(',') => [] # zero elements found

However, if your mental model of split is that it starts with the original
string (well, a copy thereof) and breaks it apart wherever it finds a
delimiter, then this sequence makes sense...

  "a,b".split(',') => ['a', 'b'] # Split between a and b
  "a".split(',') => ['a'] # No delimiter found
  "".split(',') => [''] # Again, no delimiter found

So when no delimiter is found, a list containing just the original string with
no splits makes sense in this model.

I will confess to finding myself in the first camp. It took some
experimentation before I saw the viewpoint of the second camp.
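The second model can be written out directly. The sketch below is a hypothetical helper (`split_keeping_all`, not part of Ruby and not how `String#split` is implemented), restricted to a non-empty literal string delimiter: it cuts the string at every occurrence of the delimiter and discards nothing, so an input with no delimiter, including the empty string, comes back as a one-element array.

```ruby
# Hypothetical helper for the "break the original apart" model:
# cut at every occurrence of a literal delimiter and keep every piece,
# including empty ones.
def split_keeping_all(str, delim)
  raise ArgumentError, "delimiter must not be empty" if delim.empty?

  parts = []
  start = 0
  while (i = str.index(delim, start))
    parts << str[start...i]
    start = i + delim.length
  end
  parts << str[start..-1] # piece after the last delimiter (or the whole string)
end

p split_keeping_all("a,b", ",")
p split_keeping_all("a", ",")
p split_keeping_all("", ",")
```

Because nothing is discarded, joining the pieces with the same delimiter always gives back the original string under this model.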

···

On Tuesday 07 December 2004 12:31 am, Yukihiro Matsumoto wrote:

Hi,

In message "Re: [rcr] String#split behaves odd" on Tue, 7 Dec 2004 14:16:06 +0900, "Peña, Botp" <botp@delmonte-phil.com> writes:
>imho, I think he meant
>
> [] != [""]
>
>I myself thought that string#split would return an array of strings w a
>minimum element of [""]

I don't get it. [] is an array of strings with zero elements. :wink:

--
-- Jim Weirich jim@weirichhouse.org http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)

Ah I had forgotten this..
Ruby wipes trailing empty strings

irb(main):001:0> "aaabbb".split(/b/)
=> ["aaa"]
irb(main):002:0> "aaabbbc".split(/b/)
=> ["aaa", "", "", "c"]
irb(main):003:0> "aaabbbcbb".split(/b/)
=> ["aaa", "", "", "c"]
irb(main):004:0> "aaabbbcbbc".split(/b/)
=> ["aaa", "", "", "c", "", "c"]
irb(main):005:0>

Why is it smart to remove trailing empty strings?
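One way to see exactly what gets dropped: the sketch below takes the negative-limit result, which keeps every field, and pops trailing empty strings. For the strings in the irb session above it reproduces the default `split(/b/)` result. The helper name `split_like_default` is hypothetical, and this is an illustration of the behavior, not Ruby's implementation.

```ruby
# Illustration only: default split behaves like split with limit -1
# followed by removing the trailing empty strings.
def split_like_default(str, pattern)
  parts = str.split(pattern, -1)   # keep every field, including trailing ""
  parts.pop while parts.last == "" # then drop the trailing empty strings
  parts
end

%w[aaabbb aaabbbc aaabbbcbb aaabbbcbbc].each do |s|
  p s.split(/b/, -1)
  p split_like_default(s, /b/) == s.split(/b/)
end
```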

···

--
Simon Strandgaard

To weigh in, I think the behavior of split should necessarily be compared to splits in other languages, as long as our split acts in a consistent and well behaved way. For me, that roughly means that there should be a 1:1 correlation between join and split. Namely, anything split (with a constant string/pattern) should be able to be joined back to the original using that constant string. This is not currently the case in ruby:

irb(main):008:0> %w( aabb bbaa ).map do |s| s.split(/a/).join('a') == s; end
=> [true, false]

argh... should _NOT_ be compared... not not... stupid brain.

···

On Dec 7, 2004, at 2:55 PM, Ryan Davis wrote:

To weigh in, I think the behavior of split should necessarily be compared to splits in other

This is good thinkings.

T.

···

On Tuesday 07 December 2004 05:55 pm, Ryan Davis wrote:

To weigh in, I think the behavior of split should necessarily be
compared to splits in other languages, as long as our split acts in a
consistent and well behaved way. For me, that roughly means that there
should be a 1:1 correlation between join and split. Namely, anything
split (with a constant string/pattern) should be able to be joined back
to the original using that constant string. This is not currently the
case in ruby:

irb(main):008:0> %w( aabb bbaa ).map do |s| s.split(/a/).join('a') == s; end
=> [true, false]

%w( aabb bbaa ).map do |s| s.split(/a/,-1).join('a') == s; end
  # => [true, true]

Don't drop trailing splits.
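A small round-trip check along the lines of Ryan's criterion, comparing the default call (limit 0, which behaves like omitting the limit) with the negative limit suggested here. The helper name `round_trips?` is hypothetical.

```ruby
# Hypothetical helper: does splitting on `pattern` and joining with `sep`
# give back the original string?
def round_trips?(str, pattern, sep, limit = 0)
  str.split(pattern, limit).join(sep) == str
end

%w[aabb bbaa].each do |s|
  p [s, round_trips?(s, /a/, "a"), round_trips?(s, /a/, "a", -1)]
end
```

The criterion assumes the pattern always matches the same constant text as the join string. A pattern such as `/a+/` can match runs of different lengths, so `"baab".split(/a+/, -1).join("a")` gives `"bab"`, and no limit value restores the original.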

···


--
-- Jim Weirich jim@weirichhouse.org http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)