Whenever I find myself about to do something like the above, I say to myself:
"Hey, buddy, pre-allocating an array and shoving stuff onto it in a block is neat as an exercise of the closure, but you should be using something like #map."
Unfortunately, it would appear that #scan doesn't automagically collect the value returned by each iteration of the block into an array. Man, wouldn't that be nice?
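That said, #scan called *without* a block does return an array of its matches. When the pattern has capture groups, each element is an array of that match's captures, so the collecting can be handed to #map afterwards. Here is a minimal sketch that reuses the comma-splitting pattern from this post and assumes `,,` is how a literal comma is escaped in the input:

```ruby
# String#scan without a block returns an array instead of the receiver.
# With a capture group, each element is an array of that match's captures.
str = 'foo,bar ,, baz,qux,,,jorb,jing,,blat'  # assumption: ",," escapes a literal comma
re  = /(.+?[^,],{2}*)(?:,(?!,)|$)/

# |(saved)| destructures the one-element capture array of each match
fields = str.scan( re ).map { |(saved)| saved.gsub( ',,', ',' ) }
p fields  # => ["foo", "bar , baz", "qux,", "jorb", "jing,blat"]
```

The trade-off is that the whole array of captures is built before #map runs, rather than the block being called as each match is found.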
Following is my hackish attempt at a String#scan_and_map method that does just that.
A few questions for the gurus:
a) Is there a better way to handle #bol? (beginning-of-line anchors) when using StringScanner? (Boy, it'd be nice if there were a Regexp#uses_bol_at_start_of_match? method.)
b) Is there a clean way to tell the 'arity' of a regexp (how many capture groups it has, at most)? (Boy, it'd be nice if there were a Regexp#arity method.)
c) Without knowing the arity, is there a clean, fast way to gather all the submatches 1..n held by a StringScanner? (Boy, it'd be nice if StringScanner exposed the captures as a single array property, and if it set the $1..$9 variables.)
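For (b) and (c), one trick that seems to work is to union the pattern with an empty regexp. Matching the union against "" always succeeds through the empty branch, and the resulting MatchData still reserves one (nil) slot per capturing group of the original pattern. This is a minimal sketch; `regexp_arity` and `scanner_captures` are hypothetical helper names, and the example pattern is made up:

```ruby
require 'strscan'

# Hypothetical helper: number of capturing groups in a regexp (its "arity").
# Regexp.union adds an empty alternative that always matches "", so the
# MatchData has one nil slot per capturing group of the original pattern.
def regexp_arity( regexp )
  Regexp.union( regexp, // ).match( '' ).captures.size
end

# Hypothetical helper: with the arity known, collect every submatch of the
# scanner's last match instead of hard-coding the range 1..9.
def scanner_captures( scanner, regexp )
  (1..regexp_arity( regexp )).map { |i| scanner[i] }
end

re = /(\w+)=(\w+)(;)?/
ss = StringScanner.new( 'a=1;b=2' )
ss.scan( re )
p regexp_arity( re )          # => 3
p scanner_captures( ss, re )  # => ["a", "1", ";"]
```

Non-capturing (?:...) groups are not counted. In scan_and_map, something like scanner_captures could stand in for the hard-coded (1..9) range, so patterns with more than nine groups are not truncated.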
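require 'strscan'
class String
  # Like #scan with a block, but collects the block's return values into an array
  def scan_and_map( regexp )
    # A naive check for beginning of line (^ at the very start of the pattern)
    use_bol = regexp.inspect =~ /\/(?:\((?:\?:)?)*\^/
    # A naive check for sub-expression groups
    # Will fail for an unescaped ( inside [], for example
    use_groups = regexp.inspect =~ /(\^|[^\\])\\{2}*\(/
    results = []
    ss = StringScanner.new( self )
    until ss.eos?
      unless ss.match?( regexp )
        # Move to the start of the next match WITHOUT consuming it
        # (scan_until would swallow the match we want to map)
        skipped = ss.check_until( regexp )
        break unless skipped                       # no further matches
        ss.pos += skipped.bytesize - ss.matched_size
      end
      if use_bol and not ss.bol?
        # StringScanner lets ^ match at the scan pointer; skip this false hit
        ss.getch
      else
        matched = ss.scan( regexp )
        if matched.nil?
          ss.getch                                 # no match here after all; step on
          next
        end
        result = use_groups ? (1..9).map { |i| ss[i] } : matched
        results << yield( result )
        ss.getch if matched.empty?                 # never loop forever on an empty match
      end
    end
    results
  end
end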
str = 'foo,bar ,, baz,qux,,,jorb,jing,,blat'
p str.scan_and_map( /(.+?[^,],{2}*)(?:,(?!,)|$)/ ){ |saved,others|
  saved.gsub( ',,', ',' )
}
#=> ["foo", "bar , baz", "qux,", "jorb", "jing,blat"]
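For (a) and (c) together, another option is to skip StringScanner entirely. String#scan searches the whole string, so ^ only matches at real line starts, and it sets Regexp.last_match before each yield; wrapping it in an Enumerator therefore yields one MatchData per match. This is a sketch assuming Ruby 1.9 or later (for #to_enum), and `scan_matches` is a hypothetical method name:

```ruby
class String
  # Hypothetical helper: returns the full MatchData for every match of regexp.
  # String#scan sets Regexp.last_match before each yield, so mapping over an
  # Enumerator of #scan captures each match object in turn.
  def scan_matches( regexp )
    to_enum( :scan, regexp ).map { Regexp.last_match }
  end
end

text = "a1\nb2 c3\nd4"
# ^ only matches at real line starts here, so "c3" (mid-line) is skipped
p text.scan_matches( /^([a-z])(\d)/ ).map { |m| m.captures }
# => [["a", "1"], ["b", "2"], ["d", "4"]]
```

Each MatchData offers #captures with all submatches, whatever the arity, and the result is an ordinary array that can be mapped like any other.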
On Oct 3, 2005, at 7:01 AM, Gavin Kistner wrote:
str = 'foo,bar ,, baz,qux,,,jorb,jing,,blat'
out = []
str.scan( /(.+?[^,],{2}*)(?:,(?!,)|$)/ ){ |a,b|
  out << a.gsub( ',,', ',' )
}
p out
#=> ["foo", "bar , baz", "qux,", "jorb", "jing,blat"]