Regexp to replace comma in text

Kev,

given the following input:

"INSERT INTO PROJECT ( PROJECT_ID, PROJECTTITLE,
PROJECTNO, PROJECTDATE, CATEGORY, SECTOR, SECTION,
DEPARTMENT, LOCATION, STATUS, COMPLETIONSTATUS,
CLIENT, TYPE ) VALUES ( 2319, 'SKIM PERPINDAHAN DAN
PERUMAHAN KAMPONG RATAIE, PERUMAHAN, KAMPONG RATAIE
HOUSING CONTRACT H-3 (64 HOUSES)', 'AD520/0105',
TO_Date( '01/01/1981 12:00:00 AM', 'MM/DD/YYYY
HH:MI:SS AM'), '2', '2', 'T', 'G', 'D', 'D', NULL,
NULL, NULL, 3000000, 'Y', 2);"

How would you replace the commas embedded in the
strings in the SQL?

    First of all, I am assuming that in your database, single quotes are
not escaped by a backslash (\'). If they are, you need to handle that
case separately. Most databases escape a single quote with another
single quote (''), guaranteeing an even number of single quotes.

    I would approach this problem by splitting the string apart and then
reassembling it. You can split it apart in such a way that all of the
quoted strings are separate elements (using a shorter example string):

irb(main):001:0> s = "a, 'b', 'c,d', 'e', 'f,g,h', i"
=> "a, 'b', 'c,d', 'e', 'f,g,h', i"
irb(main):002:0> s.scan(/[^']+|'[^']*'/)
=> ["a, ", "'b'", ", ", "'c,d'", ", ", "'e'", ", ", "'f,g,h'", ", i"]

    Now you can take these individual elements and replace all the
commas in the ones that are quoted:

irb(main):003:0> s.scan(/[^']+|'[^']*'/).map {|e| if e =~ /'/ then
e.gsub(/,/,';') else e end }
=> ["a, ", "'b'", ", ", "'c;d'", ", ", "'e'", ", ", "'f;g;h'", ", i"]

    Now, all you have to do is reassemble the string:

irb(main):004:0> s.scan(/[^']+|'[^']*'/).map {|e| if e =~ /'/ then
e.gsub(/,/,';') else e end }.join
=> "a, 'b', 'c;d', 'e', 'f;g;h', i"

    Using your original string (sorry for all the wrapping):

irb(main):005:0> s = "INSERT INTO PROJECT ( PROJECT_ID, PROJECTTITLE,
PROJECTNO, PROJECTDATE, CATEGORY, SECTOR, SECTION, DEPARTMENT, LOCATIO
N, STATUS, COMPLETIONSTATUS, CLIENT, TYPE ) VALUES ( 2319, 'SKIM
PERPINDAHAN DAN PERUMAHAN KAMPONG RATAIE, PERUMAHAN, KAMPONG RATAIE
HOUSING
CONTRACT H-3
irb(main):006:0" (64 HOUSES)', 'AD520/0105', TO_Date( '01/01/1981
12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), '2', '2', 'T', 'G', 'D', 'D',
NU
LL, NULL, NULL, 3000000, 'Y', 2);"
irb(main):006:0> s.scan(/[^']+|'[^']*'/).map {|e| if e =~ /'/ then
e.gsub(/,/,';') else e end }.join
=> "INSERT INTO PROJECT ( PROJECT_ID, PROJECTTITLE, PROJECTNO,
PROJECTDATE, CATEGORY, SECTOR, SECTION, DEPARTMENT, LOCATION, STATUS,
COMPLET
IONSTATUS, CLIENT, TYPE ) VALUES ( 2319, 'SKIM PERPINDAHAN DAN PERUMAHAN
KAMPONG RATAIE; PERUMAHAN; KAMPONG RATAIE HOUSING CONTRACT H-3\n(64
HOUSES)', 'AD520/0105', TO_Date( '01/01/1981 12:00:00 AM', 'MM/DD/YYYY
HH:MI:SS AM'), '2', '2', 'T', 'G', 'D', 'D', NULL, NULL, NULL, 3000
000, 'Y', 2);"

    You could also replace the if/then/else with the ternary operator if
you wanted to make the line a little shorter, if somewhat less readable:

s.scan(/[^']+|'[^']*'/).map {|e| e =~ /'/ ? e.gsub(/,/,';') : e }.join

    I hope this helps.

    - Warren Brown

Thanks to all who helped me with this.

All suggested solutions worked perfectly (even if I did claim I was still getting incorrect line lengths). It was a combination of embedded commas (the problem I'd asked about) and embedded SQL terminations ";" inside comments.

My original code did

input.each(";") do |line| ...

now I'm using the more robust

input.each(");") do |line|

and I avoid these nasty ";" embedded in comments

Again all help was much appreciated - FYI I'm using the first solution offered to remove the commas.

Kev