Regex question (how easy/hard to do it in Ruby)

Pointers, please…

I have this text in a comma-delimited file with the following
characteristics:

ccc-123456, ,

Field number:

1a - it always begins with 1 to 3 characters followed by
a dash, e.g. JKL-, A-, NM-, PQ-

1b - after the dash comes a number from
1 to 99999

2 - multiline data containing newline chars (\n),
carriage-return chars (\r), or both (\r\n). This field
might include special characters such as a
single (') or double (") quote, a space, or characters
with ascii number > 127 (accented characters,
umlauts, etc.)

3 - this field contains at least 2 and at most 5 lines of
data, where each line might begin with 2-3 chars,
followed by an "@", 1-7 chars, and then
1-4 numbers, e.g. GH@OPRJGPF1234

My question is:

1a. how to parse the first field (field 1a) so I can manipulate/rename it to
a new label depending on what label it currently has

1b. in field 1b, instead of just the bare number, I'd like to pad
it with leading zeros, so 1 -> 000001,
1494 -> 001494, 560987 -> 560987 (no change).

2. capture the 2nd field and escape the special characters with ascii number > 127

3. capture the 3rd field and parse it as well, just like field 1.

Thanks

My question is:

1a. how to parse the first field (field 1a) so I can manipulate/rename it to
a new label depending on what label it currently has

what exactly do you mean by this? if you want to parse the fields themselves
out, use the 'csv' library included with ruby…
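
a rough sketch of how that could look, assuming the file parses cleanly as csv (multiline fields quoted), that the 1 to 3 "characters" of the label are letters, and that new labels come from a lookup table. the `renames` table, the `rename_label` helper and the sample row are all made up for illustration:

```ruby
require 'csv'

# split field 1 into its label (1a) and number (1b) and swap the label for
# its replacement when the (hypothetical) renames table has one
def rename_label(field_1, renames)
  m = /\A([A-Za-z]{1,3})-(\d+)\z/.match(field_1.strip)
  return field_1 unless m   # not in the expected shape, leave it alone
  "#{renames.fetch(m[1], m[1])}-#{m[2]}"
end

# made-up mapping and sample row; field 2 is quoted because it spans lines
renames = { 'JKL' => 'XYZ', 'A' => 'B' }
data = %Q(JKL-42,"line one\nline two",GH@OPRJGPF1234\n)

rows = CSV.parse(data)
rows.each { |row| row[0] = rename_label(row[0], renames) }
p rows
```

labels without an entry in the table pass through unchanged, which seems the safer default until the real mapping is known.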

1b. in field 1b, instead of just the bare number, I'd like to pad
it with leading zeros, so 1 -> 000001,
1494 -> 001494, 560987 -> 560987 (no change).

~ > ruby -e 'p(sprintf("%06.6d", 42))'
"000042"

~ > man 3 printf
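
applied to the whole of field 1, that might look something like the sketch below. `pad_number` is a made-up helper name, and the pattern assumes the label is 1 to 3 letters followed by a dash and digits only:

```ruby
# zero-pad the numeric part of field 1 to six digits, keeping the label;
# fields that do not match the pattern are returned unchanged
def pad_number(field_1)
  field_1.sub(/\A([A-Za-z]{1,3})-(\d+)\z/) { format('%s-%06d', $1, $2.to_i) }
end

p pad_number('A-1')         # => "A-000001"
p pad_number('NM-1494')     # => "NM-001494"
p pad_number('PQ-560987')   # => "PQ-560987"
```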

2. capture 2nd field and escape the special characters with ascii number > 127

esc = '\\'
munged = ''.b   # binary buffer, so bytes > 127 are appended as-is (ruby 1.9+)
field_2.each_byte{|c| munged << esc if c > 127; munged << c.chr}
field_2 = munged

you could also use a regex to do this…

special = %r/([\x80-\xff])/n   # bytes 128..255, matched bytewise
field_2 = field_2.b.gsub(special){|match| "\\#{ match }"}

3. capture 3rd field and parse them as well just as field 1.
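
without sample data this is only a guess, but going by the description of field 3 alone (2 to 5 lines, each being 2-3 chars, an "@", 1-7 chars, then 1-4 digits), a sketch might look like this. it assumes "chars" means letters, and `parse_field_3` is a made-up helper name:

```ruby
# returns an array of [prefix, name, number] triples, or nil when the field
# does not have 2..5 lines or any line does not fit the assumed pattern
def parse_field_3(field_3)
  # one line: 2-3 letters, "@", 1-7 letters, then 1-4 digits
  line_re = /\A([A-Za-z]{2,3})@([A-Za-z]{1,7})(\d{1,4})\z/
  lines = field_3.split(/\r\n|\r|\n/)
  return nil unless (2..5).cover?(lines.size)
  parsed = lines.map { |line| line_re.match(line.strip) }
  return nil if parsed.any?(&:nil?)
  parsed.map { |m| [m[1], m[2], m[3].to_i] }
end

p parse_field_3("GH@OPRJGPF1234\r\nABC@XY7")
```

rejecting the whole field when a single line does not match is just one choice; since the description says each line "might" have this shape, collecting the non-matching lines separately could fit the real data better.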

Thanks

can you post some sample data? we could probably say more then…

-a

On Mon, 3 May 2004, Sarah Tanembaum wrote:

===============================================================================

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
URL :: Solar-Terrestrial Physics Data | NCEI
TRY :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done
===============================================================================