See how the unexpected stuff comes at you one character at a time. This may
or may not be a problem.
For example, if all you are interested in are patterns
like “some_var = some_thing_else”, then end your token list with
something like /[^a-zA-Z0-9_=]*/.
Again, I’d suggest considering “+”, since the empty
sequence is rarely interesting as a token.
Yes, “+” is a better option than “*”.
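A small sketch of why the empty match matters, assuming a hypothetical input line and a token list of identifiers, `=`, and a catch-all. With `*`, the catch-all also matches the empty string at the end of the input, so `scan` returns a trailing `""` token. With `+`, it does not.

```ruby
# Hypothetical input and token patterns, for illustration only.
line = "some_var = some_thing_else;"

# Identifiers, "=", then a catch-all that uses "*".
with_star = line.scan(/[a-zA-Z0-9_]+|=|[^a-zA-Z0-9_=]*/)
p with_star
# prints ["some_var", " ", "=", " ", "some_thing_else", ";", ""]

# Same token list, but the catch-all uses "+" and never matches "".
with_plus = line.scan(/[a-zA-Z0-9_]+|=|[^a-zA-Z0-9_=]+/)
p with_plus
# prints ["some_var", " ", "=", " ", "some_thing_else", ";"]
```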
Another annoying feature of scan is this: if you have parentheses in your RE,
then it starts returning arrays of the captured groups instead of the matched strings.
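A quick illustration of that change in return shape, using a hypothetical `key=value` string. Note that even a single group produces one-element arrays rather than strings, and text outside the groups is dropped.

```ruby
text = "width=10 height=20"

# No groups: each element is the whole matched string.
flat = text.scan(/\w+=\d+/)
p flat     # prints ["width=10", "height=20"]

# Two capturing groups: each element is an array of the captures,
# and the "=" between them is not returned at all.
grouped = text.scan(/(\w+)=(\d+)/)
p grouped  # prints [["width", "10"], ["height", "20"]]

# Even a single group yields one-element arrays rather than strings.
single = text.scan(/(\w+)=\d+/)
p single   # prints [["width"], ["height"]]
```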
Yeah, that’s true. It’s usually not a problem since you control the regexp
(you can either use (?:…) or extract the appropriate element). If you
don’t control it, you have to decide which group you want. You’ll
probably end up doing:
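A minimal sketch of both options, assuming a hypothetical `width=10 height=20` input: a non-capturing `(?:…)` group keeps `scan` returning plain strings, and when the regexp is fixed you can pick one element out of each match array.

```ruby
text = "width=10 height=20"

# Option 1: you control the regexp, so group with (?:...) instead of (...).
# scan goes back to returning whole-match strings.
whole = text.scan(/(?:width|height)=\d+/)
p whole   # prints ["width=10", "height=20"]

# Option 2: the regexp comes from elsewhere and has capturing groups,
# so pick the group you want out of each match array.
re = /(width|height)=(\d+)/
values = text.scan(re).map { |groups| groups[1] }
p values  # prints ["10", "20"]
```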
But I wondered why it’s not a MatchData instance like you get from
Regexp#match. That would provide more information. Any ideas about the
reasoning behind this?
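Whatever the design reasoning, the MatchData is still reachable: in its block form, `scan` sets `$~` for each match, so `Regexp.last_match` can be collected inside the block. A minimal sketch, using a hypothetical `key=value` pattern with named groups:

```ruby
text = "width=10 height=20"
re = /(?<key>\w+)=(?<value>\d+)/

# In the block form, scan sets $~ (Regexp.last_match) for each match,
# so the full MatchData object can be collected per match.
matches = []
text.scan(re) { matches << Regexp.last_match }

matches.each do |m|
  # Named captures plus the match offset, which plain scan results lack.
  puts "#{m[:key]} -> #{m[:value]} at offset #{m.begin(0)}"
end
# prints:
#   width -> 10 at offset 0
#   height -> 20 at offset 9
```

This keeps `scan` doing the iteration while giving you everything `Regexp#match` would, including offsets and `pre_match`/`post_match`.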