Regex question (how easy/hard is it to do in Ruby?)

I have text in a comma-delimited file with the following
characteristics:

ccc-123456, ,

Field number:

1a - it always begins with 1 to 3 characters followed by
a dash, e.g. JKL-, A-, NM-, PQ-

1b - after the dash comes a number, ranging from
1 to 99999

2 - multiline data containing newline chars (\n),
carriage-return chars (\r), or both (\r\n). This field
might include special characters such as a
single (') or double (") quote, a space, or characters
with an ASCII code > 127 (accented characters,
umlauts, etc.)

3 - this field contains at least 2 and at most 5 lines of
data, where each line might look like this:
it begins with 2-3 chars,
followed by an “@”, then 1-7 chars, and then
1-4 digits, e.g. GH@OPRJGPF1234
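
The rules for field 1 and for a single line of field 3 could be written as patterns roughly like the ones below. This is only a sketch under assumptions: "characters" is taken to mean ASCII letters, the number is allowed up to six digits because the examples use six, and the constant names `FIELD1` and `FIELD3_LINE` are made up for illustration.

```ruby
# Field 1: 1-3 letters, a dash, then the number.
# Assumption: "characters" means ASCII letters; up to six digits are accepted.
FIELD1 = /\A([A-Za-z]{1,3})-(\d{1,6})\z/

# One line of field 3: 2-3 letters, "@", 1-7 letters, 1-4 digits.
FIELD3_LINE = /\A([A-Za-z]{2,3})@([A-Za-z]{1,7})(\d{1,4})\z/

p FIELD1.match("ccc-123456").captures          # => ["ccc", "123456"]
p FIELD3_LINE.match("GH@OPRJGPF1234").captures # => ["GH", "OPRJGPF", "1234"]
p FIELD1.match("ABCD-1")                       # => nil (label too long)
```

Capturing each part in its own group means field 3 lines can be taken apart the same way as field 1.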

My questions are:

1a. How do I parse the first field (field 1a) so I can manipulate/rename
it to
a new label depending on what label it currently has?

1b. In field 1b, instead of the bare number, I’d like to pad
it with leading zeros, so 1 → 000001,
1494 → 001494, 560987 → 560987 (no change).

  2. Capture the 2nd field and escape the special characters with their ASCII
    numbers.

  3. Capture the 3rd field and parse it as well, just like field 1.
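
The renaming, padding and escaping steps asked about above could be sketched roughly as follows. Everything named here is an assumption for illustration: the `LABEL_MAP` table, the helper names, and especially the `&#NNN;` escape format, since "escape with ASCII number" could mean several notations. The sketch also assumes the text is UTF-8, so `ord` returns the code point of an accented character.

```ruby
# Hypothetical old-label => new-label table; the real names would go here.
LABEL_MAP = { "JKL" => "newJKL", "AN" => "newAN" }

# Rename a label; labels missing from the table are returned unchanged.
def rename_label(label)
  LABEL_MAP.fetch(label, label)
end

# Left-pad the number with zeros to six digits.
# to_i avoids surprises with input that already has leading zeros.
def pad_number(digits)
  format("%06d", digits.to_i)
end

# Replace quotes, CR, LF and every non-ASCII character by "&#NNN;",
# where NNN is the character's numeric code (assumed escape format).
def escape_special(text)
  text.gsub(/['"\r\n]|[^\x00-\x7F]/) { |ch| format("&#%d;", ch.ord) }
end

puts "#{rename_label('JKL')}-#{pad_number('1')}"  # prints newJKL-000001
puts escape_special("it's")                       # prints it&#39;s
```

Spaces are left alone in this sketch; adding a space to the character class would escape them too.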

Untested code:


=============

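rex = %r|(\w{1,3})-(\d+),(.*),((\w{2,3}@\w{1,7}\d{1,4}){2,5})|

new_text = old_text.gsub(rex) {

  # rename label
  label = case $1
          when 'JKL' then 'newJKL'
          when 'AN' then 'newAN'
          else $1 # keep any other label unchanged
          end

  # number padding (six digits, e.g. 1 -> 000001)
  num = sprintf("%06d", $2.to_i)

  # handle escaping for $3 (not written yet, passed through unchanged)
  new_3_field = $3

  # parse field $4 (not written yet, passed through unchanged)
  new_4_field = $4

  # return new construct:
  "#{label}-#{num},#{new_3_field},#{new_4_field}"
}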
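
One thing to watch in a pattern like the one above: in Ruby, `.` does not match a newline unless the `/m` flag is set, so a multiline field 2 needs that flag, and field 3 needs an explicit line terminator between its lines. A possible record-level pattern, assuming the layout `field1,field2,field3` and using made-up names (`EOL`, `LINE3`, `RECORD`, `sample`), might look like this:

```ruby
# Any of the three line terminators mentioned for the file.
EOL   = /\r\n|\r|\n/
# One line of field 3 (assuming "chars" means ASCII letters).
LINE3 = /[A-Za-z]{2,3}@[A-Za-z]{1,7}\d{1,4}/
# /m lets "." match line terminators; the lazy ".*?" stops field 2 at the
# first comma that is followed by 2-5 valid field 3 lines.
RECORD = /([A-Za-z]{1,3})-(\d+),(.*?),((?:#{LINE3}#{EOL}?){2,5})/m

# Made-up sample record for illustration.
sample = "JKL-1,it's a\r\ntest,GH@OPRJGPF1234\nAB@XY12\n"

if (m = RECORD.match(sample))
  label, number, field2, field3 = m.captures
  p label              # => "JKL"
  p number             # => "1"
  p field2             # => "it's a\r\ntest"
  p field3.split(EOL)  # => ["GH@OPRJGPF1234", "AB@XY12"]
end
```

The same `RECORD` pattern could be passed to `gsub` with a block, using `$1` to `$4` as in the code above.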