Patrick,
what i got:
lines (strings) of the following form
---- start of example-lines ----
foo "foobar";
bar "foo" "bar";
fob "foo \"bar\"";
---- end of example-lines ----
i want a regexp, which returns me the following
(applied with String::scan) from the given lines:
["foobar"]
["foo","bar"]
["foo \"bar\""]
i thought about something like: /"([^"]|\")+"/
You were close, but there are a few problems with your regular
expression:
1) Your parentheses would affect what String#scan returns: when the
pattern contains a capturing group, scan returns the group's contents
instead of the whole match. You could change the opening parenthesis to
"(?:" to prevent this.
2) This one is more subtle. The problem is that the /[^"]/ is going to
match your '\' before the /\"/ ever sees it. You could rework your logic
from (anything which is not " or which is \") to ((anything that is not " or
\) or is \ followed by anything). This works out to /([^"\\]|\\.)/.
3) Since you are using scan, everything before the first double-quote is
ignored. This would include a backslash. To get around this you would have
to reverse the strings, then scan for /"(?:[^"]|"\\)+"(?!\\)/ (i.e. a
double-quote followed by one or more (non double-quotes or a double-quote
followed by a backslash) followed by a double-quote not followed by a
backslash), then reverse the results.
4) As ahoward pointed out, you will not match the "bar" in
'foo \\"bar"' even though normal quoting rules could be interpreted to mean
that the first backslash cancels the meaning of the second backslash,
leaving the double-quote unescaped. To get around this you would have to
add tests for an even number of backslashes.
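Points 1) and 2) can be checked with a short sketch. The constant and
method names here (NAIVE, CAPTURED, FIXED, quoted_strings) are made up for
illustration, and the patterns are only the ones discussed above, not a
complete solution to points 3) and 4):

```ruby
# Hypothetical names, for illustration only.
NAIVE    = /"(?:[^"]|\\")+"/     # [^"] consumes the backslash before \\" can
CAPTURED = /"([^"\\]|\\.)+"/     # capturing group: scan returns group contents
FIXED    = /"(?:[^"\\]|\\.)+"/   # non-capturing, backslash pairs handled first

# Returns every double-quoted string in the line, quotes included.
def quoted_strings(line)
  line.scan(FIXED)
end

escaped = 'fob "foo \"bar\"";'   # single quotes keep the backslashes literal

p escaped.scan(NAIVE)            # stops early, at the first escaped quote
p 'foo "foobar";'.scan(CAPTURED) # nested arrays holding only the last capture
p quoted_strings(escaped)        # one match covering "foo \"bar\""
```

With the capturing version, scan hands back one array per match containing
the group's last capture rather than the whole quoted string, which is why
"(?:" matters here.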
Putting all of this together would give you:
line.reverse.scan(/"(?:[^"]|"\\(?:\\\\)*(?!\\))+"(?:\\\\)*(?!\\)/).map { |line| line.reverse }
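As a rough sketch of the reverse-and-scan idea, the following wraps it in a
method. The names EVEN_BACKSLASH_REVERSED and scan_reversed are invented
for this example, and the pattern is my reading of the description in
points 3) and 4). One thing to watch: scanning the reversed line finds the
matches from right to left, so the sketch also reverses the resulting array
to restore the original order.

```ruby
# Hypothetical names; the pattern is an assumption based on points 3) and 4).
# It is applied to the REVERSED line, so an escaping backslash appears
# after the quote it escapes.
EVEN_BACKSLASH_REVERSED = /"(?:[^"]|"\\(?:\\\\)*(?!\\))+"(?:\\\\)*(?!\\)/

def scan_reversed(line)
  line.reverse                      # scan from the end of the line
      .scan(EVEN_BACKSLASH_REVERSED)
      .map(&:reverse)               # turn each match the right way round
      .reverse                      # restore left-to-right order
end

p scan_reversed('bar "foo" "bar";')
p scan_reversed('fob "foo \"bar\"";')
```

Note that for input such as the one in point 4), a match found this way may
carry the leading backslashes along with it, so some post-processing could
still be needed.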
That would work, but it's pretty ugly and not very readable. However
there is a much simpler method of doing this if you don’t care about
problem 4) and there exists a string that your strings are guaranteed not to
contain. A good candidate would be "\0". If your strings will never
contain a "\0" (ASCII 0), you could simply replace each '\"' with "\0",
scan on /"[^"]+"/, then change each "\0" back to '\"':
line.gsub(/\\"/,"\0").scan(/"[^"]+"/).map { |line| line.gsub(/\0/,'\"') }
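Here is the placeholder approach as a small self-contained sketch. The
method name scan_with_placeholder is made up, and the code assumes, as
stated above, that the input never contains a NUL character. It uses the
block form of gsub for the restore step so that the backslash in the
replacement is taken literally:

```ruby
# Hypothetical helper; assumes the line never contains "\0" (ASCII 0).
def scan_with_placeholder(line)
  line.gsub('\\"', "\0")                    # hide each escaped quote (\")
      .scan(/"[^"]+"/)                      # plain scan, no escapes left
      .map { |s| s.gsub("\0") { '\\"' } }   # put the escaped quotes back
end

p scan_with_placeholder('foo "foobar";')
p scan_with_placeholder('bar "foo" "bar";')
p scan_with_placeholder('fob "foo \"bar\"";')
```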
One last note: you are not going to match the empty string in 'foo ""'.
To match the empty string, change the '+' to '*'.
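A quick check of the difference, using the simple pattern from the
placeholder approach (the constant names are made up for this example):

```ruby
# Hypothetical constant names, for illustration only.
ONE_OR_MORE  = /"[^"]+"/   # requires at least one character between quotes
ZERO_OR_MORE = /"[^"]*"/   # also accepts the empty string ""

line = 'foo ""'

p line.scan(ONE_OR_MORE)   # nothing is found
p line.scan(ZERO_OR_MORE)  # finds the empty quoted string
```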
I hope this helps!
- Warren Brown