Splitting a sentence with delimiter preserved

Gavin_Kistner · 17 October 2006 20:37

Ajithkumar Warrier wrote:
> I am a newbie and the answer to this might be too simple.
> How do I improve the example below and reduce the number of passes
> over the string?
>
> string = "This is an example string. The purpose is to save the
> delimiter during split. Does this work. Great!!!."
>
> re = /(\.\s+)(\D)/
> string.gsub!(re,'\1'+'#'+'\2')
> b = string.split('#')
> puts b

p string.scan( /\w[^.!?]+\S+/ )

#=> ["This is an example string.", "The purpose is to save the delimiter
during split.", "Does this work.", "Great!!!."]
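
For comparison, here is a minimal sketch that runs the original mark-and-split approach next to the single scan call. It assumes the sample string sits on one line; the names text, two_pass and one_pass are made up for this illustration.

```ruby
text = "This is an example string. The purpose is to save the " \
       "delimiter during split. Does this work. Great!!!."

# Original approach: mark each sentence boundary with '#', then split on the marker.
marked   = text.gsub(/(\.\s+)(\D)/, '\1#\2')
two_pass = marked.split('#')

# Single call: grab each run that starts at a word character and
# ends with the punctuation that follows it.
one_pass = text.scan(/\w[^.!?]+\S+/)

p two_pass
p one_pass
```

The two-pass pieces keep the whitespace that follows each period, so they only equal the scan result after a strip.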


Ajithkumar_Warrier · 17 October 2006 21:29

That was very quick.

Thank you.


David_A_Black3 · 18 October 2006 00:11

Hi --

On Wed, 18 Oct 2006, Gavin Kistner wrote:

> p string.scan( /\w[^.!?]+\S+/ )

Also, in 1.9, with oniguruma, you can do:

string.split(/(?<=\.)\s+/)

(negative lookbehind).

David


Robert_K1 · 18 October 2006 08:35

On 17.10.2006 22:37, Gavin Kistner wrote:

> p string.scan( /\w[^.!?]+\S+/ )
>
> #=> ["This is an example string.", "The purpose is to save the delimiter
> during split.", "Does this work.", "Great!!!."]

>> string = "This is an example string. The purpose is to save the
delimiter during split. Does this work. Great!!!."
>> string = string + " It costs 0.1 dollars."
=> "This is an example string. The purpose is to save the\ndelimiter during split. Does this work. Great!!!. It costs 0.1 dollars."
>> string.scan( /\w[^.!?]+\S+/ )
=> ["This is an example string.", "The purpose is to save the\ndelimiter during split.", "Does this work.", "Great!!!.", "It costs 0.1", "dollars."]

Hm...

robert


Matt9 · 18 October 2006 02:55

<dblack@wobblini.net> wrote:

> Also, in 1.9, with oniguruma

Is the current (1.8.5) regex engine some (other) well-known engine? For
example, it is very much like PCRE, but I take it that it is not PCRE. Just
curious.

m.


Gavin_Kistner2 · 18 October 2006 03:05

dblack@wobblini.net wrote:
> Also, in 1.9, with oniguruma, you can do:
>
> string.split(/(?<=\.)\s+/)
>
> (negative lookbehind).

Er, positive lookbehind, I believe you mean.

For completeness, if you wanted to use this form and also wanted to
allow exclamation points and question marks as sentence delimiters in
addition to periods, you could use:

string.split( /(?<=[.!?])\s+/ )
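
A small sketch of that split, assuming Ruby 1.9 or later, where lookbehind is available as noted earlier in the thread. The sample text is invented for illustration. `(?<=...)` is the positive form; `(?<!...)` would be the negative one.

```ruby
text = "Does this work? Great!!! It costs 0.1 dollars. Done."

# Split on whitespace only when it directly follows sentence punctuation.
# The lookbehind consumes nothing, so the punctuation stays with its sentence.
sentences = text.split(/(?<=[.!?])\s+/)

p sentences
#=> ["Does this work?", "Great!!!", "It costs 0.1 dollars.", "Done."]
```

The period in 0.1 is followed by a digit rather than whitespace, so no split happens there.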

Gavin_Kistner2 · 18 October 2006 13:40

Robert Klemme wrote:

> p string.scan( /\w[^.!?]+\S+/ )
>
> #=> ["This is an example string.", "The purpose is to save the delimiter
> during split.", "Does this work.", "Great!!!."]

>> string = "This is an example string. The purpose is to save the
delimiter during split. Does this work. Great!!!."
>> string = string + " It costs 0.1 dollars."
=> "This is an example string. The purpose is to save the\ndelimiter during split. Does this work. Great!!!. It costs 0.1 dollars."
>> string.scan( /\w[^.!?]+\S+/ )
=> ["This is an example string.", "The purpose is to save the\ndelimiter during split.", "Does this work.", "Great!!!.", "It costs 0.1", "dollars."]

Down this path lies the madness of trying to use a simple regexp to
parse something as complex as English grammar. That said, here's
another regexp that still works and fixes that particular case:

string = "This is an example string. The purpose is to save the
delimiter during split. Does this work. Great!!!."
string = string + " It costs 0.1 dollars."
p string.scan( /\w.+?[.!?]+(?=\s|\Z)/ )
#=> ["This is an example string.", "The purpose is to save the
delimiter during split.", "Does this work.", "Great!!!.", "It costs 0.1
dollars."]

It'll still fail on sentences with embedded quotes that have
sub-sentences within them.
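
Two caveats can be checked directly. The sketch below assumes the sample string really contains the line break that Robert's irb session shows ("\n"); in that case `.` does not cross the newline unless the /m flag is added, which is a change to the pattern as posted. The quoted sentence at the end is an invented example of the embedded-quote failure.

```ruby
text = "This is an example string. The purpose is to save the\n" \
       "delimiter during split. Does this work. Great!!!. It costs 0.1 dollars."

re = /\w.+?[.!?]+(?=\s|\Z)/

# Without /m, '.' stops at the newline, so the second sentence loses its first line.
without_m = text.scan(re)
# With /m, '.' also matches "\n" and the sentence comes back whole.
with_m = text.scan(/\w.+?[.!?]+(?=\s|\Z)/m)

p without_m[1]  #=> "delimiter during split."
p with_m[1]     #=> "The purpose is to save the\ndelimiter during split."

# Embedded quote with its own sub-sentences: the split lands inside the quote.
quoted = 'He said "Stop. Now." and left.'
quoted_parts = quoted.scan(re)
p quoted_parts
#=> ["He said \"Stop.", "Now.\" and left."]
```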


James_Edward_Gray_II · 18 October 2006 12:54

On Oct 17, 2006, at 9:55 PM, matt neuburg wrote:

> <dblack@wobblini.net> wrote:
> > Also, in 1.9, with oniguruma
>
> Is the current (1.8.5) regex engine some (other) well-known engine? For
> example it is very like PCRE, but I take it that it is not PCRE.

Ruby's current regex engine is pretty limited compared to PCRE or Oniguruma. I'm not aware of the name for the current engine.

James Edward Gray II
