String.scan (Regexp again...)

Shannon_Fang · 11 December 2002 11:38

Hi gurus,

I used the following regexp to parse a text file:

p=/.{4}((.{3})(.{6})(.{3})).{17}(.{16}).{13}(.).{33}(.{7}).{17}((.{3})(.{6})(.{3}))/

line="PFPT0YH100010 NUT-SPRG-EXPN 980101G A
00001000010001WA100001050000000 OYH100010
"
result=line.match(p)
p result >>

[["0YH100010 ", "0YH", "100010", " ", "NUT-SPRG-EXPN ", "G",
"0000105", "
", " ", " ", " "]]

It seems that the result is flattened. I expected a result like this:
[[a,b,c], d,e,f,[g,h,i]]

Could anyone tell me how I can achieve that? Thanks a lot!
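Here is a minimal sketch of why the result comes back flat. It assumes a hypothetical 132-character record built to the widths in the regexp (the sample line above lost its spacing in transit). It also uses a hypothetical constant name, PATTERN, in place of p. Ruby numbers capture groups by the position of their opening parenthesis, and MatchData#captures returns them as one flat array, so any nesting has to be rebuilt by index:

```ruby
# Same pattern as above, stored under a hypothetical name instead of `p`.
PATTERN = /.{4}((.{3})(.{6})(.{3})).{17}(.{16}).{13}(.).{33}(.{7}).{17}((.{3})(.{6})(.{3}))/

# Hypothetical record: 4 + 12 + 17 + 16 + 13 + 1 + 33 + 7 + 17 + 12 = 132 chars.
line = "PFPT" + "0YH100010ABC" + "-" * 17 + "NUT-SPRG-EXPN".ljust(16) +
       "-" * 13 + "G" + "-" * 33 + "0000105" + "-" * 17 + "OYH100010XYZ"

# captures is flat: 11 strings numbered by opening parenthesis, so outer
# group 1 precedes its inner groups 2-4, and group 8 precedes 9-11.
caps = line.match(PATTERN).captures

# Rebuild [[a, b, c], d, e, f, [g, h, i]] by index, dropping the two
# outer groups (captures[0] and captures[7]).
nested = [caps[1, 3], caps[4], caps[5], caps[6], caps[8, 3]]
p nested
```

Writing the two outer groups as non-capturing (?:...) groups would leave only the nine inner captures, at the cost of losing the combined 12-character fields.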

Shannon


The new MSN 8: advanced junk mail protection and 2 months FREE*
http://join.msn.com/?page=features/junkmail

Robert · 11 December 2002 14:38

A regexp per se cannot match recursive structures. For that you need a
context-free grammar and an appropriate parser. I think there are parser
generators for Ruby.

robert


Mike_Stok2 · 12 December 2002 13:05

One thing you might consider, as you seem to be matching fixed-width chunks
with '.', is to use String#unpack to dismember the string and its
subcomponents. After that you could do some manual packing.

If this happens more than once, you might consider a more sophisticated
approach with some higher-level tools, as another poster has suggested.

Unpack is usually easier to read (x means skip a byte, a means take a field
of the given width as-is; if you wanted the trailing spaces trimmed from the
NUT-SPRG-EXPN field, you could use A rather than a). E.g.

result = line.unpack('x4 a12 x17 a16 x13 a x33 a7 x17 a12')
[0, -1].each { |i| result[i] = result[i].unpack('a3a6a3') }
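Here are the two lines above run against a hypothetical record laid out to the same widths (the real sample's spacing was lost), plus the A variant for the name field. The sample data is an assumption; only the templates come from the post:

```ruby
# Hypothetical 132-byte record with the layout used in the original regexp.
line = "PFPT" + "0YH100010ABC" + "-" * 17 + "NUT-SPRG-EXPN".ljust(16) +
       "-" * 13 + "G" + "-" * 33 + "0000105" + "-" * 17 + "OYH100010XYZ"

# x<n> skips n bytes; a<n> returns the next n bytes as-is (a alone = 1 byte).
result = line.unpack('x4 a12 x17 a16 x13 a x33 a7 x17 a12')

# Split the first and last 12-byte fields into 3/6/3-byte sub-fields.
[0, -1].each { |i| result[i] = result[i].unpack('a3a6a3') }

# A<n> works like a<n> but strips trailing spaces (and nulls) on unpack.
name = line.unpack('x33 A16').first
```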

It isn't hard to imagine writing a routine that lets you use a modified pack
specifier to show some structure, e.g.

result = line.my_unpack('x4 [a3 a6 a3] x17 a16 x13 a x33 a7 x17 [a3 a6 a3]')

Just a thought.
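The my_unpack routine above is only imagined; Ruby has no such method. Below is a minimal sketch of one way it could work. It assumes a hypothetical String#my_unpack that supports only the a, A, Z, x, X and @ directives, with no * counts. The method unpacks the template with the brackets blanked out, then walks the template again and collects each bracketed run's values into a sub-array:

```ruby
class String
  # Hypothetical helper, not a Ruby built-in: like String#unpack, but each
  # bracketed run such as "[a3 a6 a3]" is returned as a nested array.
  # Assumes only a/A/Z (one value each) and x/X/@ (no value) directives.
  def my_unpack(template)
    flat = unpack(template.gsub(/[\[\]]/, ' '))  # plain unpack, brackets blanked
    result = []
    group = nil                                   # open sub-array while inside [...]
    template.scan(/\[|\]|[aAZxX@]\d*/) do |tok|
      case tok
      when '['
        group = []
      when ']'
        result << group
        group = nil
      when /\A[xX@]/
        next                                      # positioning directives yield no value
      else
        (group || result) << flat.shift           # one value per string directive
      end
    end
    result
  end
end

# Hypothetical 132-byte record with the layout used in the original regexp.
line = "PFPT" + "0YH100010ABC" + "-" * 17 + "NUT-SPRG-EXPN".ljust(16) +
       "-" * 13 + "G" + "-" * 33 + "0000105" + "-" * 17 + "OYH100010XYZ"
result = line.my_unpack('x4 [a3 a6 a3] x17 a16 x13 a x33 a7 x17 [a3 a6 a3]')
```

Reopening String matches the line.my_unpack call written above. A refinement or a standalone method would avoid patching a core class.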

Hope this helps,

Mike

–
mike@stok.co.uk | The “`Stok’ disclaimers” apply.
http://www.stok.co.uk/~mike/ | GPG PGP Key 1024D/059913DA
mike@exegenix.com | Fingerprint 0570 71CD 6790 7C28 3D60
http://www.exegenix.com/ | 75D2 9EC4 C1C0 0599 13DA

Shannon_Fang · 12 December 2002 20:59

Hi,

Thanks a lot! That makes my program easier, because the input file
specification describes each field as an offset and a length, so I used
unpack("@4a3a6a3 @17a16…").
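For illustration, here is an offset-based template of this kind applied to a hypothetical 132-byte record laid out like the regexp in the first post. The offsets below are derived from that regexp, not from the real file specification, whose template is elided above. In the template, @n moves to absolute byte offset n before the next field is read:

```ruby
# Hypothetical record laid out like the regexp in the first post.
line = "PFPT" + "0YH100010ABC" + "-" * 17 + "NUT-SPRG-EXPN".ljust(16) +
       "-" * 13 + "G" + "-" * 33 + "0000105" + "-" * 17 + "OYH100010XYZ"

# @n jumps to absolute byte offset n, so each field is written as offset + length.
fields = line.unpack('@4a3a6a3 @33A16 @62a @96a7 @120a3a6a3')

# Regroup the nine flat values into [[a, b, c], d, e, f, [g, h, i]].
nested = [fields[0, 3], *fields[3, 3], fields[6, 3]]
```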

Shannon

