[bug] String#split returns extra empty string

Simon_Strandgaard1 · 31 May 2004 07:44

While extending my own regexp-engine with a split method,
I discovered something odd about Ruby’s split.

irb(main):001:0> 'ab1ab'.split(/\D+/)
=> ["", "1"]

It's asymmetric: it has a special case for eliminating
the last empty string… but apparently not the first empty string.

I would have expected the above to be symmetric, and output:
=> ["1"]

–
Simon Strandgaard

Simon_Strandgaard1 · 31 May 2004 08:46

[10 minutes of experimenting later]
I wasn’t aware that Ruby inserts subcaptures this way.

irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
=> ["", "ab", "2cd3"]

Because of subcapture insertion, it makes sense to keep the
first empty string.

I withdraw this bug-report.
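
A small sketch of the subcapture behaviour mentioned above, using the same input string. The variable names are only for illustration; the calls are plain String#split.

```ruby
# Without a capture group the separators are discarded; the leading ""
# is the (empty) text before the first separator "ab".
plain = "ab2cd3".split(/\D+/)

# With a capture group each matched separator is inserted into the result,
# so the leading "" marks what came before the captured "ab".
captured = "ab2cd3".split(/(\D+)/)

# A limit of 2 stops after the first separator; the rest stays unsplit.
limited = "ab2cd3".split(/(\D+)/, 2)

p plain     # => ["", "2", "3"]
p captured  # => ["", "ab", "2", "cd", "3"]
p limited   # => ["", "ab", "2cd3"]
```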

–
Simon Strandgaard

Robert · 31 May 2004 12:03

But what about:

'ab'.split(/\D+/)
=> []

You would at least expect one empty string in the result since there is at
least one separator. This strikes me as odd.

robert

Simon_Strandgaard1 · 31 May 2004 12:09

Guy Decoux very recently explained that to me.

When split has no limit, it wipes trailing empty strings.

In your case you would have expected it to output [""]… but
because it's an empty string in the tail… it gets wiped.

def split(pattern, limit=0)
  …
  if limit == 0 # no limit given: wipe trailing elements which are empty
    result.pop while result.size > 0 and result.last.empty?
  end
  result
end
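
For anyone who wants to run the idea, here is a minimal sketch. It is not Ruby's actual implementation: `simple_split` is a hypothetical helper that only handles a Regexp which never matches the empty string, ignores capture groups, and does not special-case empty input.

```ruby
# Hypothetical helper, an assumption for illustration only.
def simple_split(str, pattern, limit = 0)
  result = []
  pos = 0
  # Keep splitting until the limit (if positive) leaves room for one last field.
  while (limit <= 0 || result.size < limit - 1) && (m = pattern.match(str, pos))
    result << str[pos...m.begin(0)]   # text before the separator, may be ""
    pos = m.end(0)
  end
  result << str[pos..-1]              # remainder after the last separator
  if limit == 0                       # no limit: wipe trailing empty strings
    result.pop while !result.empty? && result.last.empty?
  end
  result
end

p simple_split('ab1ab', /\D+/)      # => ["", "1"]
p simple_split('ab', /\D+/)         # => []
p simple_split('ab', /\D+/, 10)     # => ["", ""]
```

The leading empty string survives because only the tail is popped; when every field is empty, popping from the tail empties the whole array.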

–
Simon Strandgaard

Robert · 31 May 2004 20:58

But I thought it would strip only trailing empty strings - what about the leading
empty string in my example? I'd expect that to be preserved.

Hm…

robert

Simon_Strandgaard1 · 31 May 2004 22:48

Robert Klemme wrote:

But I thought it would strip only trailing empty strings - what about the leading
empty string in my example? I'd expect that to be preserved.

Let's take another example, with both leading and trailing empty strings.

irb(main):005:0> '34ab34'.split(/\d+/, 10)
=> ["", "ab", ""]
irb(main):006:0> '34ab34'.split(/\d+/)
=> ["", "ab"]

When no limit is specified, Ruby wipes the trailing empty strings
until it reaches a non-empty string.

In your case there are zero non-empty strings… so Ruby wipes everything.

irb(main):001:0> 'ab'.split(/\D+/)
=> []
irb(main):002:0> 'ab'.split(/\D+/, 10)
=> ["", ""]
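
The effect of the limit argument can be compared side by side. This sketch adds one case not shown in the thread: a negative limit, which Ruby documents as "no limit, and trailing empty fields are kept". The variable names are just for illustration.

```ruby
s = '34ab34'

no_limit = s.split(/\d+/)       # limit omitted: trailing "" is wiped
negative = s.split(/\d+/, -1)   # negative limit: nothing is wiped
two      = s.split(/\d+/, 2)    # at most 2 fields; the rest stays unsplit

p no_limit  # => ["", "ab"]
p negative  # => ["", "ab", ""]
p two       # => ["", "ab34"]

# The all-separator case from above:
all_sep = 'ab'.split(/\D+/, -1)
p all_sep   # => ["", ""]
```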

FYI: I have no idea when this wiping of empty tail elements is useful.
Any ideas?

–
Simon Strandgaard

David_A_Black3 · 1 June 2004 03:23

Hi –

Simon Strandgaard neoneye@adslhome.dk writes:

FYI: I have no idea when this wiping of empty tail elements is useful.
Any ideas?

Maybe a case like:

irb(main):006:0> "one two three ".split(" ")
=> ["one", "two", "three"]

(though there you don’t need an argument to split at all I guess) or
something like:

irb(main):016:0> "one!two!three!".split("!")
=> ["one", "two", "three"]
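
To illustrate the remark about not needing an argument: a single-space string pattern is treated specially (leading whitespace and runs of whitespace are ignored), which is also what a bare split does as long as `$;` has not been set. A regexp matching one space behaves differently. The input string here is made up for the comparison.

```ruby
s = " one  two three "

awk_style = s.split(" ")   # special whitespace mode
bare      = s.split        # same result, assuming $; is nil (the default)
regexp    = s.split(/ /)   # every single space is a separator

p awk_style  # => ["one", "two", "three"]
p bare       # => ["one", "two", "three"]
p regexp     # => ["", "one", "", "two", "three"]
```

In the regexp case the leading and inner empty strings stay and only the trailing one is wiped.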

David

–
David A. Black
dblack@wobblini.net

Florian_Gross · 1 June 2004 13:28

David Alan Black wrote:

Hi –

Moin!

FYI: I have no idea when this wiping of empty tail elements is useful.
Any ideas?

Maybe a case like:

irb(main):006:0> "one two three ".split(" ")
=> ["one", "two", "three"]

(though there you don't need an argument to split at all I guess) or
something like:

irb(main):016:0> "one!two!three!".split("!")
=> ["one", "two", "three"]

Hm, I think that it causes more trouble than it’s worth. It’s very easy
to remove empty elements anyway:

"one!two!three!".split("!").reject { |item| item.empty? }

Maybe it would be better to create a reject_at_end/at_start or something
similar?
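
A rough sketch of what such helpers might look like. Both method names are hypothetical (nothing like them exists in Ruby's core), and they are written as plain functions over an array rather than as Array methods.

```ruby
# Hypothetical helper: drop empty strings from the end only.
def reject_empty_at_end(fields)
  fields = fields.dup                       # leave the caller's array alone
  fields.pop while !fields.empty? && fields.last.empty?
  fields
end

# Hypothetical helper: drop empty strings from the start only.
def reject_empty_at_start(fields)
  fields.drop_while { |item| item.empty? }
end

# A negative limit keeps every field, so the helpers decide what to drop.
parts = "!one!!two!!".split("!", -1)
p parts                         # => ["", "one", "", "two", "", ""]
p reject_empty_at_end(parts)    # => ["", "one", "", "two"]
p reject_empty_at_start(parts)  # => ["one", "", "two", "", ""]
```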

Regards,
Florian Gross

David_A_Black3 · 1 June 2004 13:52

Hi –

On Tue, 1 Jun 2004, Florian Gross wrote:

Hm, I think that it causes more trouble than it’s worth.

I’m not sure what you mean; what trouble does it cause?

It’s very easy to remove empty elements anyway:

"one!two!three!".split("!").reject { |item| item.empty? }

It's even easier than that:

"one!two!three!".split("!").grep(/\S/)

though I'm still not sure what's undesirable about having split do
different things.
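
One caveat worth seeing in action: the two one-liners are not quite equivalent. reject with empty? drops only zero-length strings, while grep(/\S/) also drops strings made up entirely of whitespace. The input below is made up to show the difference.

```ruby
fields = "!one! !two!".split("!")
p fields                                  # => ["", "one", " ", "two"]

only_empty_removed = fields.reject { |item| item.empty? }
blank_removed      = fields.grep(/\S/)

p only_empty_removed  # => ["one", " ", "two"]
p blank_removed       # => ["one", "two"]
```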

Maybe it would be better to create a reject_at_end/at_start or something
similar?

That seems like an awfully specific case for a whole separate method.
(I admit, though, that I'm somewhat conservative about proliferation
of methods.)

David

–
David A. Black
dblack@wobblini.net
