Regex question (how easy/hard to do it in Ruby)

Mehr_Assaph_Assaph · 4 May 2004 03:43

I have text in a comma-delimited file with the following
characteristics:

ccc-123456, ,

Field number:

1a - it always begins with 1 to 3 characters followed by
a dash, e.g. JKL-, A-, NM-, PQ-

1b - the dash is followed by a number from
1 to 99999

2 - multiline data containing newline chars (\n),
carriage-return chars (\r), or both (\r\n). This field
might include special characters such as a
single (') or double (") quote, a space, or characters
with ASCII number > 127 (accented characters,
umlauts, etc.)

3 - this field contains at least 2 and at most 5 lines of
data, where each line might begin with 2-3 chars,
followed by an “@”, 1-7 chars, and then
1-4 numbers, e.g. GH@OPRJGPF1234

My questions are:

1a. how to parse the first field (field 1a) so I can rename
it to
a new label depending on what label it currently has

1b. in field 1b, instead of the bare number, I’d like to pad
it with leading zeros to six digits, so 1 → 000001,
1494 → 001494, 560987 → 560987 (no change).
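
For 1a and 1b together, a minimal sketch might look like the following. It assumes the 1 to 3 "characters" are ASCII letters and that the number has at most six digits. `LABEL_MAP`, `FIELD1`, and `rewrite_field1` are hypothetical names, and the mapping entries are placeholders since the real label names are not listed here.

```ruby
# Hypothetical old-label => new-label table; the entries are placeholders.
LABEL_MAP = { "JKL" => "newJKL", "AN" => "newAN" }.freeze

# Field 1: 1-3 letters (assumed ASCII), a dash, then 1-6 digits.
FIELD1 = /\A([A-Za-z]{1,3})-(\d{1,6})\z/

def rewrite_field1(field)
  m = FIELD1.match(field)
  return field unless m                 # leave unrecognised input untouched

  label = LABEL_MAP.fetch(m[1], m[1])   # keep the old label if it is unmapped
  num   = format("%06d", m[2].to_i)     # left-pad with zeros to six digits
  "#{label}-#{num}"
end

puts rewrite_field1("JKL-1")
puts rewrite_field1("PQ-560987")
```

Converting with `to_i` before formatting avoids surprises with digit strings that already carry leading zeros.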

2. capture the 2nd field and escape the special characters with their ASCII
numbers
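
One possible reading of this step, sketched under assumptions: the input is a UTF-8 string (Ruby 1.9 or later), spaces are left alone, and each quote, CR, LF, or non-ASCII character is replaced by its code number in a `&#NNN;` form. That output format is an assumption, since no escape syntax is specified; `SPECIAL` and `escape_special` are hypothetical names.

```ruby
# Characters to escape: quotes, CR, LF, and anything outside 7-bit ASCII.
SPECIAL = /['"\r\n]|[^\x00-\x7F]/

def escape_special(text)
  # Replace each matched character with its numeric code (assumed format).
  text.gsub(SPECIAL) { |ch| format("&#%d;", ch.ord) }
end

puts escape_special("caf\u00e9 \"x\"\r\n")
```

Swapping the format string inside the block is enough to switch to another escape style such as `\xNN`.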

3. capture the 3rd field and parse it as well, just like field 1.
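
A rough sketch of splitting the 3rd field into its parts, assuming the "chars" before and after the "@" are ASCII letters and that lines are separated by \r\n, \r, or \n. `LINE3` and `parse_field3` are hypothetical names, and the hash keys are made up for illustration.

```ruby
# One line of field 3: 2-3 letters, "@", 1-7 letters, then 1-4 digits.
LINE3 = /\A([A-Za-z]{2,3})@([A-Za-z]{1,7})(\d{1,4})\z/

def parse_field3(field)
  # Split on any of the three line-ending styles and drop empty lines.
  lines = field.split(/\r\n|\r|\n/).reject(&:empty?)
  unless (2..5).cover?(lines.size)
    raise ArgumentError, "expected 2 to 5 lines, got #{lines.size}"
  end

  lines.map do |line|
    m = LINE3.match(line)
    raise ArgumentError, "bad line: #{line.inspect}" unless m
    { prefix: m[1], name: m[2], number: m[3].to_i }
  end
end

p parse_field3("GH@OPRJGPF1234\r\nABC@X1")
```

Once each line is broken into parts like this, the same rename and padding steps used for field 1 can be applied to the pieces.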

Untested code:

=============

rex = %r|(\w{1,3})-(\d+),(.*),((\w{2,3}@\w{1,7}\d{1,4}){2,5})|

new_text = old_text.gsub(rex) {
  # rename label
  label = case $1
          when 'JKL' then 'newJKL'
          when 'AN' then 'newAN'
          else $1 # keep labels that have no new name
          end

  # number padding (six digits, leading zeros)
  num = sprintf("%06d", $2.to_i)

  # handle escaping for $3
  # …

  # parse field $4

  # return new construct:
  "#{label}-#{num},#{new_3_field}, #{new_4_field}"
}
