Problem modifying captured regexp results

Hello,

I'm using ruby to automatically generate Fortran95 code and I'm using a regular expression to parse the following type of definition line:

   REAL(fp), DIMENSION(Dim1,Dim2) :: Arr2 ! Description of Arr2

The regexp I'm using works fine, and I build an array of hashes, one per definition, i.e.

     if line =~ componentRegexp
       # We have matched an array component definition
       arrayList<<{"type"=>$1,
                   "param"=>$2,
                   "dimlist"=>$3,
                   "name"=>$4,
                   "description"=>$5}
       puts(arrayList.last.inspect)
     else
       # No match, so raise an error ($~ is nil here, so report the line itself)
       raise StandardError, "Invalid array definition, #{line}"
     end

which works fine. The inspect output gives me:

{"name"=>"Arr2", "type"=>"REAL", "description"=>"Description of Arr2", "param"=>"fp", "dimlist"=>"Dim1,Dim2"}
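The post doesn't show componentRegexp itself, so here is a minimal, self-contained sketch of the working (unsplit) version. The pattern below is a hypothetical stand-in that happens to match the sample definition line; it is not Paul's actual regexp.

```ruby
# Hypothetical stand-in for the (unshown) componentRegexp.
# Groups: 1 type, 2 kind parameter, 3 dimension list, 4 name, 5 description.
componentRegexp = /^\s*(\w+)\((\w+)\)\s*,\s*DIMENSION\(([^)]*)\)\s*::\s*(\w+)\s*!\s*(.*?)\s*$/i

line = "   REAL(fp), DIMENSION(Dim1,Dim2) :: Arr2 ! Description of Arr2"
arrayList = []

if line =~ componentRegexp
  # We have matched an array component definition
  arrayList << { "type"        => $1,
                 "param"       => $2,
                 "dimlist"     => $3,
                 "name"        => $4,
                 "description" => $5 }
  puts(arrayList.last.inspect)
else
  # No match: $~ is nil here, so report the offending line instead
  raise StandardError, "Invalid array definition, #{line}"
end
```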

However, what I want to do is modify the dimlist in the hash so it is a string array
   "dimlist"=>["Dim1","Dim2"]
rather than a single string,
   "dimlist"=>"Dim1,Dim2"

Because the number of dimensions in the dimlist can vary from 1 to 7, rather than do the splitting in the regexp, I tried doing it while appending to arrayList, using the split method like so,

       arrayList<<{"type"=>$1,
                   "param"=>$2,
                   "dimlist"=>$3.split(/\s*,\s*/), # <--- split dimlist on ","
                   "name"=>$4,
                   "description"=>$5}

but I've found that the above operation on the $3 captured result appears to "wipe" the subsequent entries $4 (name) and $5 (description). For example, the output of
       puts(arrayList.last.inspect)
on the above gives me,

{"name"=>nil, "type"=>"REAL", "description"=>nil, "param"=>"fp", "dimlist"=>["Dim1", "Dim2"]}

Note that the "dimlist" is how I want it, but "name" and "description" entries are now nil.

So can someone elaborate on why the above split operation on captured regexp results seems to bugger up the other captured results? Does this issue extend to *any* operation on captured regexp results?

I've looked through the pickaxe and cookbook, but no information on this was immediately apparent.

Thanks for any info.

cheers,

paulv

--
Paul van Delst Ride lots.
CIMSS @ NOAA/NCEP/EMC Eddy Merckx
Ph: (301)763-8000 x7748
Fax:(301)763-8545

Paul van Delst wrote:

(...)

    if line =~ componentRegexp
      # We have matched an array component definition
      arrayList<<{"type"=>$1,
                  "param"=>$2,
                  "dimlist"=>$3.split(/\s*,\s*/), # <--- split dimlist
                  "name"=>$4,
                  "description"=>$5}

but I've found that the above operation on the $3 captured result appears to "wipe" the subsequent entries $4 (name) and $5 (description).

Paul, the problem is that #split with a Regexp internally executes some Regexp matches which change $1, $2 etc. You have to capture the results of the first match before executing the split.

Regards,
Pit
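One way to follow Pit's advice is to keep the MatchData object returned by Regexp#match in a local variable and read the captures from it, so a later split cannot affect them. This is a sketch, not the thread's own code; it reuses the same hypothetical stand-in pattern, since the real componentRegexp isn't shown.

```ruby
# Hypothetical stand-in for the (unshown) componentRegexp.
componentRegexp = /^\s*(\w+)\((\w+)\)\s*,\s*DIMENSION\(([^)]*)\)\s*::\s*(\w+)\s*!\s*(.*?)\s*$/i

line = "   REAL(fp), DIMENSION(Dim1,Dim2) :: Arr2 ! Description of Arr2"
arrayList = []

# Keep the MatchData in a local variable. A later regexp operation
# (such as split with a Regexp) may reset $~ and $1..$9, but not m.
m = componentRegexp.match(line)
raise StandardError, "Invalid array definition, #{line}" if m.nil?

type, param, dims, name, description = m.captures
arrayList << { "type"        => type,
               "param"       => param,
               "dimlist"     => dims.split(/\s*,\s*/),  # "Dim1,Dim2" -> ["Dim1", "Dim2"]
               "name"        => name,
               "description" => description }
puts(arrayList.last.inspect)
```

Because every field comes from m rather than the $n globals, this does not depend on whether a given Ruby version's split touches $~, and it still works if several fields need splitting.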

Pit Capitain wrote:

Paul van Delst wrote:

(...)

    if line =~ componentRegexp
      # We have matched an array component definition
      arrayList<<{"type"=>$1,
                  "param"=>$2,
                  "dimlist"=>$3.split(/\s*,\s*/), # <--- split dimlist
                  "name"=>$4,
                  "description"=>$5}

but I've found that the above operation on the $3 captured result appears to "wipe" the subsequent entries $4 (name) and $5 (description).

Paul, the problem is that #split with a Regexp internally executes some Regexp matches which change $1, $2 etc. You have to capture the results of the first match before executing the split.

Aha! That is the answer to the question (see my other post).

Bewdy. Thanks Pit and Gavin.

cheers,

paulv

In this case it's really easy: just reorder the entries so that the one
containing split comes last (a Ruby 1.8 Hash doesn't keep insertion order anyway):

       arrayList<<{"type"=>$1,
                   "param"=>$2,
                   "name"=>$4,
                   "description"=>$5,
                   "dimlist"=>$3.split(/\s*,\s*/)}

This works because by the time split clobbers the $n variables, you
no longer need them. Obviously this would not work if more than one
field needed splitting.
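A runnable sketch of the reordered version, again assuming the same hypothetical stand-in for componentRegexp. It relies on Ruby evaluating hash-literal entries left to right, so $1, $2, $4 and $5 are read before split runs; $3 itself is read as the receiver before split is called.

```ruby
# Hypothetical stand-in for the (unshown) componentRegexp.
componentRegexp = /^\s*(\w+)\((\w+)\)\s*,\s*DIMENSION\(([^)]*)\)\s*::\s*(\w+)\s*!\s*(.*?)\s*$/i

line = "   REAL(fp), DIMENSION(Dim1,Dim2) :: Arr2 ! Description of Arr2"
arrayList = []

if line =~ componentRegexp
  # Entries are evaluated left to right: the plain $n reads happen first,
  # and the split (which may reset $~ and $1..$9) runs last.
  arrayList << { "type"        => $1,
                 "param"       => $2,
                 "name"        => $4,
                 "description" => $5,
                 "dimlist"     => $3.split(/\s*,\s*/) }
  puts(arrayList.last.inspect)
end
```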

On 10/20/06, Pit Capitain <pit@capitain.de> wrote:

Paul van Delst wrote:
> (...)
>
>     if line =~ componentRegexp
>       # We have matched an array component definition
>       arrayList<<{"type"=>$1,
>                   "param"=>$2,
>                   "dimlist"=>$3.split(/\s*,\s*/), # <--- split dimlist
>                   "name"=>$4,
>                   "description"=>$5}
>
> but I've found that the above operation on the $3 captured result
> appears to "wipe" the subsequent entries $4 (name) and $5 (description).

Paul, the problem is that #split with a Regexp internally executes some
Regexp matches which change $1, $2 etc. You have to capture the results
of the first match before executing the split.