Individual elements of regular expressions

Guys,

I have a file like this:

1182001 G -5.862926E-06 -4.551246E-04 -8.286275E-07
876
-CONT- -8.112655E-06 -3.389444E-08 5.149248E-06
877
1182002 G -8.318727E-06 -6.311623E-04 -1.682066E-06
878
-CONT- -1.082094E-05 -3.322333E-08 5.278418E-06
879
1182003 G -1.142483E-05 -8.031433E-04 -4.946876E-06
880
-CONT- -1.385842E-05 -8.260204E-08 5.663920E-06
881

How do I say: in a set of (number G number number1 number number
end-of-line -CONT- number number2 number number end-of-line), I want to
know number1 and number2?

Thanks,
Maurício

Hi –

(Performing some joins on the above, as per your verbal description of
where the newlines are…)

This little demo doesn’t do much with the numbers, other than put them
all into an array, and you might want to do more checks… but the
principle (determining an index and then splitting the line and
grabbing the element at the index) might be useful:

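res = DATA.readlines.map do |line|
  # Pick the whitespace-separated field that holds the wanted number.
  index = case line
          when /G/    then 3   # id G x number1 y count
          when /CONT/ then 2   # -CONT- x number2 y z count
          else raise "Bad line: #{line}"
          end
  line.split[index]
end

p res

__END__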
1182001 G -5.862926E-06 -4.551246E-04 -8.286275E-07 876
-CONT- -8.112655E-06 -3.389444E-08 5.149248E-06 877
1182002 G -8.318727E-06 -6.311623E-04 -1.682066E-06 878
-CONT- -1.082094E-05 -3.322333E-08 5.278418E-06 879
1182003 G -1.142483E-05 -8.031433E-04 -4.946876E-06 880
-CONT- -1.385842E-05 -8.260204E-08 5.663920E-06 881
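If the goal is number1 and number2 side by side rather than one flat array, the same split-and-index idea can be applied two lines at a time. This is a minimal sketch, not David's code: it assumes each G line is immediately followed by exactly one -CONT- line (with the trailing counter already joined on, as above), swaps the case dispatch for each_slice(2), converts the fields with Float(), and reads from an inline <<~ heredoc (Ruby 2.3 or later) instead of DATA so it runs standalone.

```ruby
# Sample data, assumed already joined so each record is a G line
# followed by exactly one -CONT- line.
data = <<~TEXT
  1182001 G -5.862926E-06 -4.551246E-04 -8.286275E-07 876
  -CONT- -8.112655E-06 -3.389444E-08 5.149248E-06 877
  1182002 G -8.318727E-06 -6.311623E-04 -1.682066E-06 878
  -CONT- -1.082094E-05 -3.322333E-08 5.278418E-06 879
  1182003 G -1.142483E-05 -8.031433E-04 -4.946876E-06 880
  -CONT- -1.385842E-05 -8.260204E-08 5.663920E-06 881
TEXT

pairs = data.lines.each_slice(2).map do |g_line, cont_line|
  # Fail loudly if the G / -CONT- alternation is broken
  # (cont_line is nil when the file has an odd number of lines).
  unless g_line =~ / G / && cont_line.to_s.start_with?("-CONT-")
    raise "Unexpected record: #{g_line.inspect} / #{cont_line.inspect}"
  end
  number1 = Float(g_line.split[3])     # 4th field of the G line
  number2 = Float(cont_line.split[2])  # 3rd field of the -CONT- line
  [number1, number2]
end

pairs.each { |n1, n2| puts "#{n1} #{n2}" }
```

Because the check raises instead of silently skipping, a missing or extra -CONT- line shows up immediately rather than shifting every later pair.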

David

On Wed, 31 Jul 2002, Maurício wrote:


David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

I think David perhaps answered half your question…
I’ll address the other half.

I think you want backreferences. If you parenthesize
part of a regex, then you can retrieve it afterward
as a separate entity. You can use the Perlish
shorthand \1, \2, etc. in some cases or you can use
the array-like MatchData object.
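A short sketch of where the numbered forms show up in Ruby, as I understand it: $1, $2, ... hold the groups after a successful =~; \1 inside the pattern refers back to what group 1 already matched; and '\1' in a sub/gsub replacement string inserts the captured text. The sample line comes from Maurício's file; the variable names are illustrative only.

```ruby
line = "1182001 G -5.862926E-06 -4.551246E-04 -8.286275E-07 876"

# 1. After a successful =~, the groups are in $1, $2, ...
if line =~ /\A(\d+) G \S+ (\S+)/
  id      = $1   # "1182001"
  number1 = $2   # "-4.551246E-04"
end

# 2. Inside the pattern, \1 means "the same text group 1 matched":
#    (\d)\1 finds a digit immediately repeated, "11" at the start here.
doubled = line[/(\d)\1/]                      # "11"

# 3. In a sub/gsub replacement, '\1' and '\2' insert the captures.
#    Single quotes keep the backslashes intact.
swapped = line.sub(/\A(\d+) (G)/, '\2 \1')    # "G 1182001 -5.862926E-06 ..."

p [id, number1, doubled, swapped]
```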

Look it up in the Pickaxe Book or in The Ruby Way.

A MatchData example (doing this from memory, so it
may be wrong):

reg = /(.)2(.)2/
str = "R2D2"
md = reg.match(str)
puts md[0] # "R2D2" (the entire match)
puts md[1] # "R"
puts md[2] # "D"
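The same MatchData calls applied to one of Maurício's G lines. A sketch under the assumption that fields are separated by single spaces; \S+ accepts any run of non-space characters rather than checking number syntax. Regexp#match returns nil when there is no match, so check the result before indexing it.

```ruby
g_line = "1182001 G -5.862926E-06 -4.551246E-04 -8.286275E-07 876"

# Group 1 captures the 4th space-separated field, i.e. number1.
md = /\A\S+ G \S+ (\S+)/.match(g_line)

if md                  # Regexp#match returns a MatchData, or nil on failure
  p md[0]              # whole match: "1182001 G -5.862926E-06 -4.551246E-04"
  p md[1]              # number1: "-4.551246E-04"
  p md.captures        # all groups as an array: ["-4.551246E-04"]
  p md.post_match      # text after the match: " -8.286275E-07 876"
end

# A -CONT- line has no " G " field, so this returns nil; indexing the
# result without checking (no_md[1]) would raise NoMethodError.
no_md = /\A\S+ G \S+ (\S+)/.match("-CONT- -8.112655E-06 -3.389444E-08")
p no_md                # nil
```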

Hope this helps.

Hal

----- Original Message -----
From: “Maurício” briqueabraque@yahoo.com
Newsgroups: gmane.comp.lang.ruby.general
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Wednesday, July 31, 2002 9:10 AM
Subject: Individual elements of regular expressions

(…)
I think you want backreferences. If you parenthesize
part of a regex, then you can retrieve it afterward
(…)

Great! That's exactly what I want. There's only one missing point: how
to insert a line break in a regular expression. I tried /.*$-CONT-/, but it
didn't work.

Maurício

Hi –

You can always use “\n”, and if you use the /m modifier:

/regexp/m

then the wildcard dot will match \n’s.
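To see the difference /m makes, and why /.*$-CONT-/ fails: $ is an anchor that asserts an end of line without consuming the newline character, so something still has to match the \n itself. In Ruby, ^ and $ match at every line boundary even without /m; /m only lets the dot cross newlines (roughly Perl's /s). A small check, using a two-line sample string built for illustration:

```ruby
# Illustrative two-line sample (record 1182001 with its counters joined on).
text = "1182001 G -5.862926E-06 -4.551246E-04 -8.286275E-07 876\n" \
       "-CONT- -8.112655E-06 -3.389444E-08 5.149248E-06 877\n"

# Without /m the dot stops at "\n", so G can't reach CONT on the next line.
without_m = text =~ /G.*CONT/     # nil
# With /m the dot also matches "\n", so the match spans both lines.
with_m    = text =~ /G.*CONT/m    # 8 (index of the "G")

# $ only asserts "end of line here" and consumes nothing, so the "\n"
# is still between it and "-CONT-": this attempt never matches.
anchor_only = text =~ /.*$-CONT-/    # nil
# Matching the newline character explicitly after $ works.
anchor_nl   = text =~ /.*$\n-CONT-/  # 0

p [without_m, with_m, anchor_only, anchor_nl]
```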

So, given a two-line string:
1182001 G -5.862926E-06 -4.551246E-04 -8.286275E-07 876
-CONT- -8.112655E-06 -3.389444E-08 5.149248E-06 877

these will both succeed:

/.+\n-CONT-/.match(line)
/.+-CONT-/m.match(line)
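Putting the explicit \n together with Hal's capture groups gives number1 and number2 in one pass over the text. A sketch, assuming the file has been read into one string with each counter joined onto its line; RECORD is a made-up name and the field layout is inferred from the sample. String#scan returns one array of captures per match when the pattern has groups.

```ruby
# Illustrative input: two records, counters joined onto their lines.
data = <<~TEXT
  1182001 G -5.862926E-06 -4.551246E-04 -8.286275E-07 876
  -CONT- -8.112655E-06 -3.389444E-08 5.149248E-06 877
  1182002 G -8.318727E-06 -6.311623E-04 -1.682066E-06 878
  -CONT- -1.082094E-05 -3.322333E-08 5.278418E-06 879
TEXT

# One record = a G line, an explicit "\n", then a -CONT- line.
# Group 1 is number1 (4th field of the G line); group 2 is number2
# (3rd field of the -CONT- line). [ \t]+ keeps field separators on one
# line, so only the explicit \n crosses the line break.
RECORD = /^\S+[ \t]+G[ \t]+\S+[ \t]+(\S+)[ \t]+\S+[ \t]+\S+[ \t]*\n-CONT-[ \t]+\S+[ \t]+(\S+)/

pairs = data.scan(RECORD)
pairs.each { |number1, number2| puts "#{number1} #{number2}" }
```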

David

On Thu, 1 Aug 2002, Maurício wrote:


Does \n understand/bypass the Windows/Unix end-of-line conversion
problems?

Maurício
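On the line-ending question, a hedged answer: as far as I know, Ruby on Windows opens files in text mode by default and translates \r\n to \n when reading, so \n alone usually works there. The trouble case is a file with Windows line endings read on Unix (or in binary mode): each line then ends in \r\n, and a pattern that expects \n right after the last field fails on the \r. Two workarounds are shown here: allow an optional \r in the pattern (\r?\n), or normalise the text with gsub before matching. The sample strings are built for illustration.

```ruby
unix_text    = "1182001 G -5.862926E-06 -4.551246E-04 -8.286275E-07 876\n" \
               "-CONT- -8.112655E-06 -3.389444E-08 5.149248E-06 877\n"
windows_text = unix_text.gsub("\n", "\r\n")  # same data with CRLF endings

# Expects a bare "\n" after the last field of the G line.
strict   = /^\S+[ \t]+G[ \t]+\S+[ \t]+(\S+)[ \t]+\S+[ \t]+\S+[ \t]*\n-CONT-[ \t]+\S+[ \t]+(\S+)/
# Also accepts "\r\n" there.
tolerant = /^\S+[ \t]+G[ \t]+\S+[ \t]+(\S+)[ \t]+\S+[ \t]+\S+[ \t]*\r?\n-CONT-[ \t]+\S+[ \t]+(\S+)/

strict_result     = strict.match(windows_text)                           # nil: "\r" precedes "\n"
tolerant_result   = tolerant.match(windows_text)                         # matches
normalised_result = strict.match(windows_text.gsub("\r\n", "\n"))        # matches after normalising

p strict_result
p tolerant_result.captures
p normalised_result.captures
```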