class String
  # Return every token (delimiters and the text between them) as an array.
  def tokenize(*tokens)
    array = []
    each_token(*tokens) { |tk| array << tk }
    array
  end

  # Yield each delimiter match and each run of text between matches, in order.
  def each_token(*tokens)
    regex = Regexp.new(tokens.map { |t| t.kind_of?(Regexp) ? t : Regexp::escape(t) }.join("|"))
    string = self
    while (match = regex.match(string))
      yield match.pre_match if match.pre_match.length > 0
      yield match[0] if match[0].length > 0
      string = match.post_match
    end
    yield string if string.length > 0
    self
  end
end

Question 1:
Except for the pre_match handling, isn’t this just doing the same thing as
scan? And if that’s the case, why not just include the expected pre_match
patterns in the list of regexps and use scan directly?
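
To make the scan idea concrete, here is a minimal sketch. The method name tokenize_with_scan is hypothetical and not part of the code above. It assumes the delimiters contain no capture groups (scan would then return arrays of groups instead of whole matches) and never match the empty string. The second alternative stands in for the "expected pre_match patterns": it matches any run of characters that does not begin a delimiter.

```ruby
# Hypothetical scan-based equivalent of String#tokenize (illustration only).
# Assumes the delimiters have no capture groups and never match "".
def tokenize_with_scan(string, *tokens)
  # Regexp.union escapes plain strings and keeps Regexp arguments as they are.
  delim = Regexp.union(*tokens)
  # Either a delimiter, or a run of characters none of which starts a delimiter.
  string.scan(/#{delim}|(?:(?!#{delim}).)+/m)
end

p tokenize_with_scan("a+b*c", "+", "*")
```

The negative lookahead is what replaces pre_match here, so the caller does not have to enumerate the non-delimiter patterns by hand.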
Question 2:
Ben Tilly pointed out (a long time ago) that the “string =
string.post_match” type of statement is enormously inefficient for large
strings because of the amount of string copying involved. In
ruby-talk:89747, Nobu Nakada indicated that string tails could be shared
using copy-on-write (COW). In current Ruby, are strings shared with COW
semantics, or was Nobu just speculating on possible implementations?
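
Whatever the answer on COW, one way to sidestep the concern is to never build the tail string at all and instead track an offset into the original. The sketch below is only an illustration: each_token_by_offset is a hypothetical name, and it assumes a Ruby version in which Regexp#match accepts a start position as its second argument (1.9 and later). Note that anchors such as \A behave differently when matching from an offset than when matching against a fresh post_match substring.

```ruby
# Hypothetical offset-based variant of each_token (illustration only).
# Assumes Regexp#match(str, pos) is available (Ruby 1.9+).
def each_token_by_offset(string, *tokens)
  regex = Regexp.union(*tokens)
  pos = 0
  while (match = regex.match(string, pos))
    start, finish = match.begin(0), match.end(0)
    break if finish == start                  # an empty match would never advance
    yield string[pos...start] if start > pos  # text before the delimiter
    yield match[0]                            # the delimiter itself
    pos = finish
  end
  yield string[pos..-1] if pos < string.length  # whatever follows the last delimiter
  string
end

tokens = []
each_token_by_offset("a+b*c", "+", "*") { |tk| tokens << tk }
p tokens
```

Each yielded token is still a new small string, but the remainder of the input is never copied or reassigned, so the cost no longer depends on how post_match is implemented.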
--
-- Jim Weirich / Compuware
-- FWP Capture Services
-- Phone: 859-386-8855