Regular expression question

Typing straight from Mr Friedl, this is the java.util.regex version for
CSV…

Pattern pCSVMain = Pattern.compile(
    "\\G(?:^|,) \n"+
    "(?: \n"+
    " # Either a double-quoted field... \n"+
    " \" # field's opening quote \n"+
    " ( (?> [^\"]*+ ) (?> \"\" [^\"]*+ )*+ ) \n"+
    " \" # field's closing quote \n"+
    " # ... or ... \n"+
    " | \n"+
    " # ... some non-quote/non-comma text ... \n"+
    " ( [^\",]*+ ) \n"+
    ") \n",
    Pattern.COMMENTS);

I hope I haven’t made any transcription errors; I’m not very
Java-literate…
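
For anyone wanting to try the pattern, here is a minimal sketch of how it might be driven from a Matcher loop. The class and method names (CsvFields, parse) are made up for illustration, and the sketch assumes the possessive *+ quantifiers and that a doubled "" inside a quoted field stands for one literal quote. Group 1 holds a quoted field's content and group 2 an unquoted field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvFields {
    // The CSV pattern from this message; possessive quantifiers are assumed.
    static final Pattern CSV_MAIN = Pattern.compile(
        "\\G(?:^|,)                                  \n" +
        "(?:                                         \n" +
        "   # Either a double-quoted field...        \n" +
        "   \" # field's opening quote               \n" +
        "    ( (?> [^\"]*+ ) (?> \"\" [^\"]*+ )*+ )  \n" +
        "   \" # field's closing quote               \n" +
        "   # ... or ...                             \n" +
        " |                                          \n" +
        "   # ... some non-quote/non-comma text ...  \n" +
        "   ( [^\",]*+ )                             \n" +
        ")                                           \n",
        Pattern.COMMENTS);

    // Hypothetical helper: split one CSV line into its fields.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = CSV_MAIN.matcher(line);
        while (m.find()) {
            String unquoted = m.group(2);
            if (unquoted != null) {
                // Unquoted field: use it as-is.
                fields.add(unquoted);
            } else {
                // Quoted field: collapse each doubled quote to a single one.
                fields.add(m.group(1).replace("\"\"", "\""));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints [Guido van Rossum, Larry Wall, Matz]
        System.out.println(parse("\"Guido van Rossum\",\"Larry Wall\",\"Matz\""));
    }
}
```

Note that the \G(?:^|,) prefix expects each field to start directly after a comma, so input with a space after each comma would need trimming first or a looser pattern.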

While I love the book, I sometimes think some kind of “Regexp Cookbook”
might be a big timesaver as I hunt for the thing I need, not being enough of
a regular Regexp user or having enough brainpower to keep stuff in my head.

HTH,

Mike


-----Original Message-----
From: Harry Ohlsen [mailto:harryo@qiqsolutions.com]
Sent: 21 August 2003 08:54
To: ruby-talk@ruby-lang.org
Subject: Regular expression question

I’m sure this is trivial, but I don’t have my Mastering
Regular Expressions handy (and I haven’t put sufficient
effort into getting through it!).

I have a string that looks something like …

>>"Guido van Rossum", "Larry Wall", "Matz"<<

(ie, it’s the set of quoted strings between the markers).
What I’d like to do is split the string up into an array
containing the three quoted items (with or without the quote marks).

At some point, I’d probably also like to use something like
\" to represent a quote within a string.

I’m sorry to say that I’m actually having to do this in Java
(it’s a work thing), so I’ll have to do some mangling of
whatever the “correct” regex is, but that’s OK … well, it’s
not, but I’ll have to live with it :-).
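
For the exact input above, a simpler approach than a full CSV parser may be enough: find each quoted run and ignore whatever sits between them (markers, commas, spaces). The following is only a sketch under assumptions. The names QuotedItems and extract are invented for illustration, and it assumes a backslash before a quote is the escape wanted for a quote within a string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedItems {
    // An opening quote, then (captured) any mix of ordinary characters and
    // backslash-escaped characters, then the closing quote.
    static final Pattern QUOTED =
        Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Hypothetical helper: collect the contents of every quoted item.
    static List<String> extract(String input) {
        List<String> items = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            // Drop the escaping backslash, keeping the character after it.
            items.add(m.group(1).replaceAll("\\\\(.)", "$1"));
        }
        return items;
    }

    public static void main(String[] args) {
        String input = ">>\"Guido van Rossum\", \"Larry Wall\", \"Matz\"<<";
        // prints [Guido van Rossum, Larry Wall, Matz]
        System.out.println(extract(input));
    }
}
```

Because find() skips anything that is not part of a quoted item, this version does not validate the separators; if malformed input must be rejected, an anchored pattern in the style of the CSV one is the safer choice.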

Cheers,

H.




Woodhouse, Mike (ANTS) wrote:

Typing straight from Mr Friedl, this is the java.util.regex version for
CSV…

Pattern pCSVMain = Pattern.compile(
    "\\G(?:^|,) \n"+
    "(?: \n"+
    " # Either a double-quoted field... \n"+
    " \" # field's opening quote \n"+
    " ( (?> [^\"]*+ ) (?> \"\" [^\"]*+ )*+ ) \n"+
    " \" # field's closing quote \n"+
    " # ... or ... \n"+
    " | \n"+
    " # ... some non-quote/non-comma text ... \n"+
    " ( [^\",]*+ ) \n"+
    ") \n",
    Pattern.COMMENTS);

Thanks.

Whew! I remember reading the (probably perl) equivalent of that when I
made my first pass through the book, but had forgotten how tricky it was
:-).

Thanks for that. I’ll give it a go when I get to work this morning.

I hope I haven’t made any transcription errors; I’m not very
Java-literate…

Not to worry, I’ll hunt out my copy of the book over the weekend (it’s
in one of the many boxes of books in the garage since my last move).

While I love the book, I sometimes think some kind of “Regexp Cookbook”
might be a big timesaver as I hunt for the thing I need, not being enough of
a regular Regexp user or having enough brainpower to keep stuff in my head.

Yes, regexes are a classic case where a cookbook would probably be
invaluable. I must admit, though, that I can generally get a regex for
most things with minimal effort. It’s only for things like this, where
the alternative is splitting the problem into a bunch of regexes, that
having a recipe would be very, very handy.

Cheers,

Harry O.