Regexp to replace comma in text

given the following input:

"INSERT INTO PROJECT ( PROJECT_ID, PROJECTTITLE, PROJECTNO, PROJECTDATE, CATEGORY, SECTOR, SECTION, DEPARTMENT, LOCATION, STATUS, COMPLETIONSTATUS, CLIENT, TYPE ) VALUES ( 2319, 'SKIM PERPINDAHAN DAN PERUMAHAN KAMPONG RATAIE, PERUMAHAN, KAMPONG RATAIE HOUSING CONTRACT H-3 (64 HOUSES)', 'AD520/0105', TO_Date( '01/01/1981 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), '2', '2', 'T', 'G', 'D', 'D', NULL, NULL, NULL, 3000000, 'Y', 2);"

How would you replace the commas embedded in the strings in the SQL?

I'm struggling with something like:
line.sub!(/(VALUES \(\s*\d+,\s*\'[\w\&\-\s]+)\',/,'\1')

to find the first 'string' after the VALUES and to attempt to remove any commas. The problem is that the input can contain superfluous commas in many places and I need to remove all commas in a quoted string (liek gsub) before I can split the string.

I can't think of an easy way to achieve this, can any regexp/ruby expert help me out?

Kev

Hi,

At Thu, 6 Oct 2005 19:54:12 +0900,
Kev Jackson wrote in [ruby-talk:159336]:

How would you replace the commas embedded in the strings in the SQL?

line.gsub(/'(?:[^\']|'')*'/) {$&.delete(",")}

ยทยทยท

--
Nobu Nakada

nobuyoshi nakada wrote:

Hi,

At Thu, 6 Oct 2005 19:54:12 +0900,
Kev Jackson wrote in [ruby-talk:159336]:

How would you replace the commas embedded in the strings in the SQL?
   
line.gsub(/'(?:[^\']|'')*'/) {$&.delete(",")}

Thanks, it's almost there...

Here's my code

def munge_data(line)
    unless line == nil
      # remove star value
      line.sub!(/'\*', /,'')
      # replace client
      line.sub!(/'([A-Z]{3})'/) {
        val = @client_pks["'"+$1.dup+"'"]
        val.to_s
      }
      # remove commas (",") from inside descriptions/titles/briefs etc
      line.gsub!(/'(?:[^\']|'')*'/) {$&.delete(",")}
      # remove TO_Date
      line.sub!(/TO_Date\(\s(\'\d*\/\d*\/\d*)\s[\s\w\/\:]+\',\s\'MM\/DD\/YYYY\sHH\:MI\:SS\sAM\'\)/,'\1\'')
           data = line.split(",")
      p data.length
      #p data[16]
      line = data.join(",")
    end
  end

The regexp seems to work (on visual inspection), but when I split on "," I still get varying line lengths, if the only commas where between fields I should have all lines.split.length equal. Unless I'm being stupid.

Here's another line,

INSERT INTO ADMUSER.TBLPROJ ( TBLPROJ_ID, PROJ_TITLE, PROJ_NO, STAR, PROJ_DATE, PROJ_CATEGORY, PROJ_SECTOR, PROJ_SECTION_CODE, PROJ_DEPT_CODE, PROJ_LOC_CODE, PROJ_STATUS, PROJ_STATC, PROJ_BRIEF, PROJ_REMARKS, SCHEME_VAL, NEWPROJ, PROJ_CLIENT ) VALUES ( 2225, 'TANGGUNGAN TANGGUNGAN BAGI PROJEK PROJEK RKN 7 YANG TELAH SIAP - Kerja Membaiki Kerosakan, Tempat Letak Kerita Bertingkat"', 'AD418/0501', '*', TO_Date( '01/01/1986 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), '3', '4', 'D', 'A', 'A', 'E', 'C3', NULL, 'Completed, Finalised and Retention Money Released on 21/11/2001.', 11328, 'Y', 'JKS');

Not sure how I'm going to do this as I thought that split would be the way to go, but these extra commas are scuppering my plans completely :slight_smile:

Kev

Kev Jackson wrote:

nobuyoshi nakada wrote:

Hi,

At Thu, 6 Oct 2005 19:54:12 +0900,
Kev Jackson wrote in [ruby-talk:159336]:

How would you replace the commas embedded in the strings in the SQL?

line.gsub(/'(?:[^\']|'')*'/) {$&.delete(",")}

Thanks, it's almost there...

Here's my code

def munge_data(line)
    unless line == nil
      # remove star value
      line.sub!(/'\*', /,'')
      # replace client
      line.sub!(/'([A-Z]{3})'/) {
        val = @client_pks["'"+$1.dup+"'"]
        val.to_s
      }
      # remove commas (",") from inside descriptions/titles/briefs etc
      line.gsub!(/'(?:[^\']|'')*'/) {$&.delete(",")}
      # remove TO_Date

line.sub!(/TO_Date\(\s(\'\d*\/\d*\/\d*)\s[\s\w\/\:]+\',\s\'MM\/DD\/YYYY\sH
H\:MI\:SS\sAM\'\)/,'\1\'')

      data = line.split(",")
      p data.length
      #p data[16]
      line = data.join(",")
    end
  end

The regexp seems to work (on visual inspection), but when I split on
"," I still get varying line lengths, if the only commas where between
fields I should have all lines.split.length equal. Unless I'm being
stupid.

Here's another line,

INSERT INTO ADMUSER.TBLPROJ ( TBLPROJ_ID, PROJ_TITLE, PROJ_NO, STAR,
PROJ_DATE, PROJ_CATEGORY, PROJ_SECTOR, PROJ_SECTION_CODE,
PROJ_DEPT_CODE, PROJ_LOC_CODE, PROJ_STATUS, PROJ_STATC, PROJ_BRIEF,
PROJ_REMARKS, SCHEME_VAL, NEWPROJ, PROJ_CLIENT ) VALUES ( 2225,
'TANGGUNGAN TANGGUNGAN BAGI PROJEK PROJEK RKN 7 YANG TELAH SIAP -
Kerja Membaiki Kerosakan, Tempat Letak Kerita Bertingkat"',
'AD418/0501', '*', TO_Date( '01/01/1986 12:00:00 AM', 'MM/DD/YYYY
HH:MI:SS AM'), '3', '4', 'D', 'A', 'A', 'E', 'C3', NULL, 'Completed,
Finalised and Retention Money Released on 21/11/2001.', 11328, 'Y',
'JKS');

Not sure how I'm going to do this as I thought that split would be the
way to go, but these extra commas are scuppering my plans completely
:slight_smile:

Kev

You could split along strings with grouping. Something like

s.split(%r{('(?:[^\']|'')*')})

=> "'aaa', 'bbbb'"

s.split(%r{('(?:[^\']|'')*')})

=> ["", "'aaa'", ", ", "'bbbb'"]

Or use #scan

s.scan(%r{'(?:[^\']|'')*'})

=> ["'aaa'", "'bbbb'"]

Kind regards

    robert