Converting a string to an array of tokens (TestCase Attached)

Weirich_James · 14 January 2004 18:25

From: Robert Klemme [mailto:bob.news@gmx.net]

Try /./ instead of /.*/. Unexpected stuff will come at you one
character at a time, which may or may not be ok.

Looks like “|.*” is already present above. Or did you want
to point to something else?

Ummm … I was suggesting “|.” instead of “|.*”

irb(main):001:0> "foo bar baz".scan(/foo|bar|./)
=> ["foo", " ", "bar", " ", "b", "a", "z"]

See how the unexpected stuff comes at you one character at a time. This may
or may not be a problem.

For example, if all you are interested in are patterns
like “some_var = some_thing_else”, then end your token list with
something like /[^a-zA-Z0-9_=]*/.

Again, I’d suggest considering “+”, since the empty
sequence is rarely interesting as a token.

Yes, “+” is a better option than “*”.
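
To make the difference concrete, here is a minimal sketch. The token list (\w+ and = ahead of the catch-all) is only an assumption for illustration, not the pattern from the original question.

```ruby
# Illustrative token list: identifiers, "=", then a catch-all for the rest.
line = "some_var = some_thing_else"

# Catch-all with "*": it can match the empty string, so scan may
# hand back empty tokens (here, at the end of the input).
star_tokens = line.scan(/\w+|=|[^a-zA-Z0-9_=]*/)

# Catch-all with "+": it must consume at least one character,
# so no empty tokens can appear.
plus_tokens = line.scan(/\w+|=|[^a-zA-Z0-9_=]+/)

p star_tokens
p plus_tokens  # => ["some_var", " ", "=", " ", "some_thing_else"]
```

With “*” you would have to filter the empty strings out afterwards; with “+” they never show up.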

Another annoying feature of scan is this: if you have parentheses in your RE,
then it starts returning lists of matches.
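
A small sketch of that behaviour, reusing the string from the irb example above (the variable names are mine):

```ruby
str = "foo bar baz"

# No groups: every match comes back as a plain String.
flat = str.scan(/foo|bar|./)

# Capturing groups: every match comes back as an Array with one
# entry per group, nil for groups that did not take part in the match.
grouped = str.scan(/(foo)|(bar)|./)

p flat        # => ["foo", " ", "bar", " ", "b", "a", "z"]
p grouped[0]  # => ["foo", nil]
p grouped[1]  # => [nil, nil]
p grouped[2]  # => [nil, "bar"]
```

Note that the text matched by the bare “.” alternative is lost entirely in the grouped form, since it is not inside any group.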


–
– Jim Weirich / Compuware
– FWP Capture Services
– Phone: 859-386-8855

Robert · 14 January 2004 21:31

“Weirich, James” James.Weirich@FMR.COM wrote in message
news:1C8557C418C561429998C1F8FBB283A728BA94@MSGDALCLB2WIN.DMN1.FMR.COM…

From: Robert Klemme [mailto:bob.news@gmx.net]

Try /./ instead of /.*/. Unexpected stuff will come at you one
character at a time, which may or may not be ok.

Looks like “|.*” is already present above. Or did you want
to point to something else?

Ummm … I was suggesting “|.” instead of “|.*”

Ooops, sorry. Must’ve got the logic the other way round.

irb(main):001:0> "foo bar baz".scan(/foo|bar|./)
=> ["foo", " ", "bar", " ", "b", "a", "z"]

See how the unexpected stuff comes at you one character at a time. This may
or may not be a problem.

For example, if all you are interested in are patterns
like “some_var = some_thing_else”, then end your token list with
something like /[^a-zA-Z0-9_=]*/.

Again, I’d suggest considering “+”, since the empty
sequence is rarely interesting as a token.

Yes, “+” is a better option than “*”.

Another annoying feature of scan is this: if you have parentheses in your RE,
then it starts returning lists of matches.

Yeah, that’s true. It’s usually not a problem since you control the regexp
(you can either use (?:…) or extract the appropriate element). If you
don’t control it you have to decide which of the groups you choose. You’ll
probably end up doing:

str.scan( rx ).map {|m| m.kind_of?( String ) ? m : m.find{|x|x} }

But I wondered why it’s not a MatchData instance like you get from
Regexp#match. That would provide more information. Any ideas about the
reasoning behind this?
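
For what it’s worth, one way to get at MatchData while scanning, at least in current Ruby versions, is to read Regexp.last_match (or $~) inside the block form of scan. This is only a sketch of a workaround, not an explanation of the design; the string and regexp are the ones from the example above.

```ruby
str = "foo bar baz"
rx  = /(foo)|(bar)|./

# Inside the block, Regexp.last_match is the MatchData for the
# match currently being yielded, so we can collect those instead
# of the Strings/Arrays that scan itself would return.
matches = []
str.scan(rx) { matches << Regexp.last_match }

p matches.map { |m| m[0] }        # => ["foo", " ", "bar", " ", "b", "a", "z"]
p matches.map { |m| m.begin(0) }  # => [0, 3, 4, 7, 8, 9, 10]
p matches[2][2]                   # => "bar"
```

That gives the whole match, the individual groups, and the offsets in one object, regardless of whether the regexp contains groups.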

Regards

robert
