Curiously, someone asked exactly that on freenode#perl tonight.
If the input is that simple and is assumed to be well-formed, this is enough:
irb(main):005:0> %q{some words "some quoted text" some "" more words}.scan(/"[^"]*"|\S+/)
=> ["some", "words", "\"some quoted text\"", "some", "\"\"", "more", "words"]
Since nothing was said about them, this does not handle escaped quotes. It also assumes quotes are always balanced, so a field cannot be %q{"foo}, for example.
-- fxn
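The limitation noted above (no escaped quotes) can be relaxed by letting the quoted alternative accept backslash escapes. What follows is only a minimal sketch: it assumes backslash is the escape character and that quotes are still balanced, and split_quoted is a hypothetical helper name, not something from the thread.

```ruby
# Hypothetical helper: split on whitespace, keep "double quoted" runs
# together, and allow \" (or any backslash-escaped character) inside quotes.
def split_quoted(str)
  str.scan(/"(?:\\.|[^"\\])*"|\S+/).map do |tok|
    if tok.length >= 2 && tok.start_with?('"') && tok.end_with?('"')
      # Drop the surrounding quotes, then remove the escaping backslashes.
      tok[1..-2].gsub(/\\(.)/) { $1 }
    else
      tok
    end
  end
end

p split_quoted('some words "some quoted text" some "" more words')
p split_quoted('say "a \"b\" c" end')
```

Unlike the one-liner, this version strips the surrounding quotes from each quoted field. Unbalanced quotes are still not diagnosed; they simply fall through to the \S+ branch.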
···
On Jan 7, 2006, at 1:08, Richard Livsey wrote:
I want to split a string into words, but group quoted words together such that...
How about the csv module? Despite the name, you don't have to use
commas.
require 'csv'
CSV::parse_line('some words "some quoted text" some more words', ' ')
I hope this helps,
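For readers on a current Ruby: the call above passes the separator positionally, as the 2006-era csv library allowed. In the csv library shipped with later Ruby versions the separator is given as the col_sep keyword instead. A small sketch under that assumption (the variable names are illustrative only):

```ruby
require 'csv'

line = 'some words "some quoted text" some more words'

# col_sep sets the field separator; a single space replaces the default comma.
fields = CSV.parse_line(line, col_sep: ' ')
p fields
```

Every single space counts as a separator here, so it is worth checking how runs of consecutive spaces behave on your input before relying on this.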
--
ara [dot] t [dot] howard [at] noaa [dot] gov
all happiness comes from the desire for others to be happy. all misery
comes from the desire for oneself to be happy.
-- bodhicaryavatara
s='one two" "\'with quotes\' "three "'
require 'shellwords'
Shellwords.shellwords(s)
=> ["one", "two with quotes", "three "]
···
On Mon, 2006-01-09 at 18:13 +0900, William James wrote:
Richard Livsey wrote:
> I want to split a string into words, but group quoted words together
> such that...
>
> some words "some quoted text" some more words
>
> would get split up into:
>
> ["some", "words", "some quoted text", "some", "more", "words"]
s = 'some words "some quoted text" some more words'
p s.split( / *"(.*?)" *| / )
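One thing to watch with the split approach: String#split includes the capture group in its result, which is what makes this work, but it also returns an empty string when the input starts with a quoted segment or contains two spaces in a row. A hedged sketch that filters those out (split_words is a made-up name, and dropping empties also discards a deliberate "" field):

```ruby
# Hypothetical wrapper around the split shown above.
def split_words(str)
  # The capture group makes split return the quoted text itself;
  # reject(&:empty?) removes the empty strings split leaves behind.
  str.split(/ *"(.*?)" *| /).reject(&:empty?)
end

p split_words('some words "some quoted text" some more words')
p split_words('"some quoted text" some words')
```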
(Or am I overlooking some reason you'd want to capture sequences of
spaces?)
I changed the \w+ to \S+ (and moved it after the | to avoid having it
sponge up too much) in case the words included non-\w characters.
You're right, that's better all around.
I guess with zero-width positive lookbehind/ahead one could do it
without the map operation.
You can drop the map(), if you're willing to replace it with two other calls:
>> example = %Q{some words "some quoted text" some more words}
=> "some words \"some quoted text\" some more words"
>> example.scan(/"([^"]+)"|(\S+)/).flatten.compact
=> ["some", "words", "some quoted text", "some", "more", "words"]
James Edward Gray II
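To see why flatten.compact is needed in the version above: with two capture groups, scan returns one [quoted, bare] pair per match, and only one slot of each pair is filled. A short illustration (the variable names are just for this sketch):

```ruby
example = %Q{some words "some quoted text" some more words}

# Each match yields a two-element array: [text inside quotes, bare word].
# Whichever group did not take part in the match is nil.
pairs = example.scan(/"([^"]+)"|(\S+)/)
p pairs.first(3)

# flatten merges the pairs into one list; compact drops the nil placeholders.
words = pairs.flatten.compact
p words
```

Note that [^"]+ requires at least one character, so an empty "" in the input would be picked up by the \S+ branch and come back with its quotes intact.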
···
On Jan 6, 2006, at 8:33 PM, dblack@wobblini.net wrote:
> Richard Livsey wrote:
> > I want to split a string into words, but group quoted words together
> > such that...
> >
> > some words "some quoted text" some more words
> >
> > would get split up into:
> >
> > ["some", "words", "some quoted text", "some", "more", "words"]
>
> s = 'some words "some quoted text" some more words'
> p s.split( / *"(.*?)" *| / )
>
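Which, along with the CSV solution, can't handle complex cases:
s='one two" "\'with quotes\' "three "'
s.split( / *"(.*?)" *| / )
=> ["one", "two", " ", "'with", "quotes'", "three "]
but Shellwords can:
require 'shellwords'
Shellwords.shellwords(s)
=> ["one", "two with quotes", "three "]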
···
On Mon, 2006-01-09 at 18:13 +0900, William James wrote:
Geoff Jacobsen wrote:
> > Richard Livsey wrote:
> > > I want to split a string into words, but group quoted words together
> > > such that...
> > >
> > > some words "some quoted text" some more words
> > >
> > > would get split up into:
> > >
> > > ["some", "words", "some quoted text", "some", "more", "words"]
> >
> > s = 'some words "some quoted text" some more words'
> > p s.split( / *"(.*?)" *| / )
>
> Which along with the CSV solution can't handle complex cases:
>
> s='one two" "\'with quotes\' "three "'
>
> s.split( / *"(.*?)" *| / )
> => ["one", "two", " ", "'with", "quotes'", "three "]
...
> but Shellwords can:
>
> require 'shellwords'
> Shellwords.shellwords(s)
> => ["one", "two with quotes", "three "]
This is not a "more complex case"; it is an invalid case.
The original poster simply wanted to avoid splitting on spaces
within double quotes, not within single quotes.
The shellwords "solution" is a solution to a different problem, not
to this one. It can't even handle a simple case:
require 'shellwords'
s = "why can't you think?"
Shellwords.shellwords(s)
ArgumentError: Unmatched single quote: 't you think?
I agree my example doesn't match the originator's request, but *I think*
there is enough ambiguity in the post to postulate that they may want
more real-world cases such as:
s='symbol "William said: \"why can\'t you think?\"" 123 "<xml>foo</xml>"'
Shellwords.shellwords(s)
=> ["symbol", "William said: \"why can't you think?\"", "123",
"<xml>foo</xml>"]
So Shellwords may indeed be a solution to this problem, but the problem
is not stated precisely enough to know.
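The disagreement comes down to which quoting rules the input follows. A rough side-by-side sketch, assuming "double quotes only" for the regex version and shell-style rules for Shellwords; quoted_split and shell_split_or_nil are hypothetical names used only for this comparison:

```ruby
require 'shellwords'

# Double quotes group words; every other character, apostrophes included,
# is ordinary text.
def quoted_split(str)
  str.scan(/"([^"]*)"|(\S+)/).flatten.compact
end

# Shell rules: single quotes, double quotes and backslashes are all special.
# Returns nil when Shellwords rejects the input.
def shell_split_or_nil(str)
  Shellwords.shellsplit(str)
rescue ArgumentError
  nil
end

plain  = "why can't you think?"
quoted = 'some words "some quoted text" some more words'

p quoted_split(plain)         # the apostrophe is just a character
p shell_split_or_nil(plain)   # the apostrophe opens a quote that never closes
p quoted_split(quoted)
p shell_split_or_nil(quoted)
```

On input that only uses balanced double quotes the two agree; they diverge once apostrophes or backslashes appear, which is the ambiguity discussed above.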
···
On Tue, 2006-01-10 at 04:23 +0900, William James wrote:
> On Mon, 2006-01-09 at 18:13 +0900, William James wrote: