Interesting solution! I also tried
("|')([^\1]*)\1
which looked fine initially
irb(main):025:0> "foo 'bar' \"baz\" buz".scan(/("|')([^\1]*)\1/).map(&:last)
=> ["bar", "baz"]
but broke later:
irb(main):030:0> "foo 'bar' \"baz\" buz \"bongo's kongo\"".scan(/("|')([^\1]*)\1/)
=> [["'", "bar' \"baz\" buz \"bongo"]]
where your solution still works:
irb(main):031:0> "foo 'bar' \"baz\" buz \"bongo's kongo\"".scan(/("|')((?:(?!\1).)*)\1/)
=> [["'", "bar"], ["\"", "baz"], ["\"", "bongo's kongo"]]
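Why the [^\1] attempt fails: a character class can only list characters, so inside [^...] the \1 is not a backreference to the opening quote. [^\1]* therefore runs straight over quotes, and the greedy match ends at the last matching quote it can find. A minimal comparison, using a made-up sample string:

```ruby
# Made-up sample: two separately quoted words.
text = "'a' 'b'"

# [^\1] is a plain character class, not "anything but group 1",
# so the greedy [^\1]* swallows the inner quotes.
broken = text.scan(/("|')([^\1]*)\1/)

# The lookahead checks the actually captured quote before consuming
# each character, so the match stops at the first closing quote.
fixed = text.scan(/("|')((?:(?!\1).)*)\1/)

p broken # => [["'", "a' 'b"]]
p fixed  # => [["'", "a"], ["'", "b"]]
```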
However, we can also use a non-greedy quantifier to achieve the same:
irb(main):032:0> "foo 'bar' \"baz\" buz \"bongo's kongo\"".scan(/("|')(.*?)\1/)
=> [["'", "bar"], ["\"", "baz"], ["\"", "bongo's kongo"]]
irb(main):033:0> "foo 'bar' \"baz\" buz \"bongo's kongo\"".scan(/("|')(.*?)\1/).map(&:last)
=> ["bar", "baz", "bongo's kongo"]
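One caveat that applies to both the lookahead and the lazy version: in Ruby, . does not match a newline unless the regexp has the m flag, so a quoted string that really contains a line break is not found. A small sketch with a made-up input:

```ruby
# Made-up input: the double-quoted part contains a real line break.
text = "say \"line one\nline two\" now"

# Without m, . stops at the newline, so no closing quote is reachable.
single_line = text.scan(/("|')(.*?)\1/).map(&:last)

# With m, . also matches "\n" and the quoted text spans both lines.
multi_line = text.scan(/("|')(.*?)\1/m).map(&:last)

p single_line # => []
p multi_line  # => ["line one\nline two"]
```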
Adding support for backslash escapes, we get ("|')((?:\\.|(?!\1).)*)\1
irb(main):038:0> "foo 'bar' \"baz\" buz \"bongo's kongo\" gingo said \"foo \\\" bar\" yes".scan(/("|')((?:\\.|(?!\1).)*)\1/).map(&:last)
=> ["bar", "baz", "bongo's kongo", "foo \\\" bar"]
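A possible helper built on the escape-aware regexp, under two assumptions of mine: a backslash escapes exactly the next character, and the caller wants the contents with those backslashes removed (extract_quoted is a hypothetical name). It also makes the repetition possessive (*+, which Ruby's regexp engine supports): with a plain *, an unterminated string such as "abc\" lets the engine backtrack and treat the escaped quote as the closing one.

```ruby
# Escape-aware pattern with a possessive *+ so the engine cannot
# backtrack and reuse an escaped quote as the closing quote.
QUOTED_WITH_ESCAPES = /("|')((?:\\.|(?!\1).)*+)\1/

# Hypothetical helper: returns the contents of '...' and "..." strings,
# assuming a backslash escapes exactly the next character.
def extract_quoted(text)
  text.scan(QUOTED_WITH_ESCAPES).map do |_quote, body|
    body.gsub(/\\(.)/) { $1 } # drop each escaping backslash
  end
end

p extract_quoted('gingo said "foo \" bar" yes') # => ["foo \" bar"]
p extract_quoted("'it\\'s' fine")               # => ["it's"]
p extract_quoted('x "abc\"')                    # => [] (unterminated)
```

Note that the Ruby literals in the first and last calls are single-quoted, so the backslash really is part of the input. With a plain * instead of *+, the last call would return ["abc\\"].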
Kind regards
robert
On Wed, Dec 11, 2013 at 10:58 AM, Xavier Noria <fxn@hashref.com> wrote:
Doing this is tricky: the robustness of a regexp approach depends on what
you can assume about the input. For example, in a programming language an
escaped quote \" would be valid but unsupported, and in English text
apostrophes could be taken for single quotes.
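Both failure modes are easy to reproduce with the lookahead regexp discussed in this thread; the inputs below are made up for illustration.

```ruby
LOOKAHEAD_QUOTED = /("|')((?:(?!\1).)*)\1/

# English text: the apostrophes are taken for single quotes.
p "it's Bob's car".scan(LOOKAHEAD_QUOTED).map(&:last)
# => ["s Bob"]

# Code with an escaped quote. The single-quoted Ruby literal keeps the
# backslash, so the text is  x = "a \"b\" c"  and the match stops early.
p 'x = "a \"b\" c"'.scan(LOOKAHEAD_QUOTED).map(&:last)
# => ["a \\", " c"]
```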
A regexp solution that is broken in those scenarios but works for the easy
cases is:
("|')((?:(?!\1).)*)\1
The regexp says: if you match either " or ', then continue matching as long
as you do not find the matched quote, and until you find the closing quote
(the closing quote is required because you could reach the end of the file
with an unbalanced quote).
The second group has the string without quotes.
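The explanation above can be written into the regexp itself with Ruby's x (free-spacing) flag, where whitespace is ignored and # starts a comment. A sketch; the comments paraphrase the description above:

```ruby
# Same regexp as ("|')((?:(?!\1).)*)\1, written in free-spacing mode.
QUOTED = /
  ("|')        # group 1: the opening quote, either " or '
  (            # group 2: the text between the quotes
    (?:        # repeated one character at a time:
      (?!\1)   #   stop if the next character is the quote from group 1
      .        #   otherwise consume it; no newlines without the m flag
    )*
  )
  \1           # the same quote again closes the string
/x

p "foo 'bar' \"baz\" buz".scan(QUOTED).map(&:last)
# => ["bar", "baz"]
```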
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/