While extending my own regexp-engine with a split method,
I discovered something odd about Ruby’s split.
irb(main):001:0> 'ab1ab'.split(/\D+/)
=> ["", "1"]
It's asymmetric: it has a special case for eliminating
the last empty string, but apparently not the first one.
I would have expected the above to be symmetric and to output:
=> ["1"]
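You can see the asymmetry directly by comparing the default call with a negative limit, which Ruby documents as returning all fields without suppressing trailing empty ones. Here is a small check in plain Ruby:

```ruby
s = 'ab1ab'

# Limit omitted (same as 0): trailing empty fields are suppressed,
# but the empty field before the match at position 0 stays.
default_fields = s.split(/\D+/)

# Negative limit: no cap on the number of fields, and trailing
# empty fields are kept, so an empty string shows up at both ends.
all_fields = s.split(/\D+/, -1)

p default_fields  # => ["", "1"]
p all_fields      # => ["", "1", ""]
```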
–
Simon Strandgaard
Simon Strandgaard wrote:
[10 minutes of experimenting later]
I wasn't aware that Ruby inserts subcaptures (capture groups) into the result this way.
irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
=> ["", "ab", "2cd3"]
Because of subcapture insertion, it makes sense to keep the
first empty string.
I withdraw this bug report.
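This sketch shows why the leading empty string earns its place. When the whole separator is wrapped in a capture group, the result alternates field, separator, field, and so on. The fields therefore sit at even indices, and in this example join rebuilds the original string. The variable names are only illustrative:

```ruby
parts = "ab2cd3".split(/(\D+)/)  # ["", "ab", "2", "cd", "3"]

# Fields land at even indices, captured separators at odd indices.
# Without the leading "", this alternation would be shifted by one.
fields     = parts.select.with_index { |_, i| i.even? }
separators = parts.select.with_index { |_, i| i.odd? }

p fields                  # => ["", "2", "3"]
p separators              # => ["ab", "cd"]
p parts.join == "ab2cd3"  # => true
```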
–
Simon Strandgaard
Robert, 31 May 2004 12:03
"Simon Strandgaard" neoneye@adslhome.dk wrote in news message
news:20040531104155.074a42b0.neoneye@adslhome.dk…
But what about:
'ab'.split(/\D+/)
=> []
You would at least expect one empty string in the result, since there is at
least one separator. This strikes me as odd.
robert
Guy Decoux very recently explained that to me.
When split has no limit, it wipes trailing empty strings.
In your case you would have expected it to output [""], but
because it's an empty string at the tail, it gets wiped.
def split(pattern, limit=0)
  # … (the splitting itself, which builds result, is elided)
  if limit == 0 # wipe trailing elements which are empty (0 is truthy in Ruby, so test it explicitly)
    result.pop while !result.empty? && result.last.empty?
  end
  result
end
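You can sketch the tail-wiping step on its own and check it against the built-in split. This assumes Ruby's documented rule: a zero limit (the default) suppresses trailing empty fields, and a negative limit returns every field. wipe_trailing_empties is a hypothetical helper, not code from Simon's engine. A guard written as "unless limit" would never fire with a default of 0, because 0 is truthy in Ruby:

```ruby
# Hypothetical helper: only wipes when limit is zero (the default).
def wipe_trailing_empties(fields, limit = 0)
  return fields unless limit.zero?
  result = fields.dup
  result.pop while !result.empty? && result.last.empty?
  result
end

# split(..., -1) returns every field, so wiping its tail should
# reproduce the default (no limit) split for these inputs.
samples = [['ab1ab', /\D+/], ['34ab34', /\d+/], ['ab', /\D+/]]
samples.each do |str, re|
  emulated = wipe_trailing_empties(str.split(re, -1))
  p [str, emulated, emulated == str.split(re)]
end
```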
–
Simon Strandgaard
Robert, 31 May 2004 20:58
"Simon Strandgaard" neoneye@adslhome.dk wrote in news message
news:20040531140451.3abb4fb2.neoneye@adslhome.dk…
But I thought it would strip trailing empty strings - what about the leading
empty string in my example? I'd expect that to be preserved.
Hm…
robert
“Robert Klemme” bob.news@gmx.net wrote:
Robert Klemme wrote:
But I thought it would strip trailing empty strings - what about the leading
empty string in my example? I'd expect that to be preserved.
Let's take another example, with both leading and trailing empty strings.
irb(main):005:0> '34ab34'.split(/\d+/, 10)
=> ["", "ab", ""]
irb(main):006:0> '34ab34'.split(/\d+/)
=> ["", "ab"]
When no limit is specified, Ruby wipes the trailing empty strings
until it reaches a non-empty string.
In your case there are zero non-empty strings, so Ruby wipes everything.
irb(main):001:0> 'ab'.split(/\D+/)
=> []
irb(main):002:0> 'ab'.split(/\D+/, 10)
=> ["", ""]
FYI: I have no idea when this wiping of empty tail elements is useful.
Any ideas?
–
Simon Strandgaard
Hi –
Simon Strandgaard neoneye@adslhome.dk writes:
FYI: I have no idea when this wiping of empty tail elements is useful.
Any ideas?
Maybe a case like:
irb(main):006:0> "one two three ".split(" ")
=> ["one", "two", "three"]
(though there you don't need an argument to split at all, I guess) or
something like:
irb(main):016:0> "one!two!three!".split("!")
=> ["one", "two", "three"]
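The flip side of these examples is input where trailing empty fields carry meaning, for instance delimiter-separated records with a fixed number of columns. The record below is made up for this sketch. It shows what the default wiping loses and how a negative limit keeps it:

```ruby
# Illustrative record: four comma-separated columns, the last two empty.
record = "1,2,,"

without_limit  = record.split(",")      # trailing empty columns wiped
with_neg_limit = record.split(",", -1)  # every column kept

p without_limit   # => ["1", "2"]
p with_neg_limit  # => ["1", "2", "", ""]
```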
David
–
David A. Black
dblack@wobblini.net
David Alan Black wrote:
Hi!
Hm, I think that it causes more trouble than it’s worth. It’s very easy
to remove empty elements anyway:
"one!two!three!".split("!").reject { |item| item.empty? }
Maybe it would be better to create a reject_at_end/at_start or something
similar?
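Neither reject_at_start nor reject_at_end exists in Ruby's core library. The names come from Florian's suggestion, and the behaviour below is an assumption. Here is a minimal sketch as plain methods over an array of strings:

```ruby
# Hypothetical: drop empty strings from the front only.
def reject_at_start(items)
  items.drop_while(&:empty?)
end

# Hypothetical: drop empty strings from the end only
# (what split already does when no limit is given).
def reject_at_end(items)
  items.reverse.drop_while(&:empty?).reverse
end

fields = '34ab34'.split(/\d+/, -1)  # ["", "ab", ""]
p reject_at_start(fields)                 # => ["ab", ""]
p reject_at_end(fields)                   # => ["", "ab"]
p reject_at_end(reject_at_start(fields))  # => ["ab"]
```

Empty strings in the middle of the array are left alone by both helpers. This is the difference from a blanket reject.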
Regards,
Florian Gross
Hi –
Hm, I think that it causes more trouble than it’s worth.
I’m not sure what you mean; what trouble does it cause?
It’s very easy to remove empty elements anyway:
"one!two!three!".split("!").reject { |item| item.empty? }
It's even easier than that:
"one!two!three!".split("!").grep(/\S/)
though I'm still not sure what's undesirable about having split do
different things.
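The two one-liners are not quite interchangeable. grep(/\S/) keeps only strings that contain a non-whitespace character, so it also drops whitespace-only fields. Rejecting with empty? drops only truly empty ones. A quick comparison:

```ruby
fields = "a!!b! !c".split("!")  # ["a", "", "b", " ", "c"]

# reject drops only truly empty strings; the whitespace-only " " survives.
no_empty = fields.reject { |item| item.empty? }

# grep(/\S/) keeps only strings containing a non-whitespace character,
# so it drops "" and " " alike.
no_blank = fields.grep(/\S/)

p no_empty  # => ["a", "b", " ", "c"]
p no_blank  # => ["a", "b", "c"]
```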
Maybe it would be better to create a reject_at_end/at_start or something
similar?
That seems like an awfully specific case for a whole separate method.
(I admit, though, that I'm somewhat conservative about proliferation
of methods.)
David
On Tue, 1 Jun 2004, Florian Gross wrote:
–
David A. Black
dblack@wobblini.net