I have this text in a comma-delimited file with the following
characteristics:

ccc-123456, ,

Field number:
1a - it always begins with 1 to 3 characters followed by
a dash, e.g. JKL-, A-, NM-, PQ-
1b - after the dash come numbers ranging from
1 to 99999
2 - multiline data with newline chars (\n), carriage-return
chars (\r), or both (\r\n). This field
might include special characters such as a
single (') or double (") quote, a space, or characters
with ASCII number > 127 (accented characters,
umlauts, etc.)
3 - this field contains at least 2 and at most 5 lines of
data, where each line
begins with 2-3 chars,
followed by an "@", 1-7 chars, and then
1-4 numbers, e.g. GH@OPRJGPF1234

My question is:
1a. how to parse the first field (field 1a) so I can manipulate/rename
it to
a new label depending on what label it currently has
1b. in field 1b, instead of the bare number, I'd like to pad
it with leading zeros, so 1 → 000001,
1494 → 001494, 560987 → 560987 (no change).
2. capture the 2nd field and escape the special characters with their ASCII
numbers
3. capture the 3rd field and parse it as well, just like field 1.
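
A minimal sketch of the transformations asked about in 1a, 1b and 2 might look like the following. The helper names (rename_label, pad_number, escape_special) and the LABELS table are made up for illustration, and the `&#NNN;` output format is only an assumption about what "escape with ASCII number" should produce. It also assumes Ruby 1.9 or later with UTF-8 strings, so characters above 127 are escaped by code point rather than byte by byte.

```ruby
# Hypothetical label table; only JKL and AN are known from the question.
LABELS = { 'JKL' => 'newJKL', 'AN' => 'newAN' }.freeze

# 1a: look up the new label, keeping unknown labels unchanged.
def rename_label(label)
  LABELS.fetch(label, label)
end

# 1b: left-pad the number with zeros to 6 digits.
# to_i converts the captured digit string to an integer first.
def pad_number(digits)
  format('%06d', digits.to_i)
end

# 2: replace quotes, spaces, CR/LF and non-ASCII characters with
# a numeric reference such as &#39; (assumed output format).
def escape_special(text)
  text.gsub(/['" \r\n]|[^\x00-\x7F]/) { |ch| format('&#%d;', ch.ord) }
end

puts rename_label('JKL')    # newJKL
puts pad_number('1494')     # 001494
puts escape_special("it's") # it&#39;s
```

Using a hash with fetch instead of a case statement keeps the label mapping in one place, which may be easier to extend if more labels turn up.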
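Untested code:

=============
# /m lets field 2 span several lines; field 3 lines are separated by line breaks
rex = %r|^(\w{1,3})-(\d+),(.*?),((?:\w{2,3}@\w{1,7}\d{1,4}\s*){2,5})|m
new_text = old_text.gsub(rex) {
  # rename label (unknown labels are kept as they are)
  label = case $1
          when 'JKL' then 'newJKL'
          when 'AN' then 'newAN'
          else $1
          end
  # number padding: 6 digits, e.g. 1 → 000001
  num = sprintf("%06d", $2.to_i)
  # handle escaping for $3
  new_3_field = $3 # …
  # parse field $4
  new_4_field = $4 # …
  # new construct:
  "#{label}-#{num},#{new_3_field},#{new_4_field}"
}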
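
For the field 3 step that the code above leaves open, a minimal sketch might look like this. It assumes "chars" means ASCII letters, that the lines are separated by \r\n, \r or \n, and that lines not matching the pattern can simply be skipped; parse_field3 and LINE_RE are hypothetical names, and the second sample line is made up. The 2-to-5-line limit is not checked here.

```ruby
# Assumed line format: 2-3 letters, "@", 1-7 letters, 1-4 digits.
LINE_RE = /\A([A-Za-z]{2,3})@([A-Za-z]{1,7})(\d{1,4})\z/

# Split field 3 on \r\n, \r or \n and return [prefix, name, number]
# for every line that matches; non-matching lines are skipped.
def parse_field3(text)
  text.split(/\r\n?|\n/).map { |line|
    m = LINE_RE.match(line)
    m && m.captures
  }.compact
end

p parse_field3("GH@OPRJGPF1234\r\nAB@X1\n")
# [["GH", "OPRJGPF", "1234"], ["AB", "X", "1"]]
```

Each triple can then be renamed and zero-padded in the same way as fields 1a and 1b before the line is put back together.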