nobu.nokada@softhome.net writes:
#scan doesn’t solve the ultimate problem – being able to
backtrack and rescan from an earlier position (a la Perl’s
pos() function).
[…]
String#index takes an optional argument that specifies the
position to start searching from.
  pos = 0
  while foostr.index(/foo/, pos)
    puts $&
    pos = $~.end(0)
  end
The problems with String#index are:
(a) if ruby is not run in ASCII mode, ruby must scan the whole
    string up to 'pos' to find the correct byte offset
    (e.g. utf8_startpos() in regex.c)
(b) there is no way to anchor the regex at 'pos':
      "abcd".index(/\Abc/, 1) -> nil
      "abcd".index(/^bc/, 1) -> nil
This makes String#index not suitable for answering simple questions
efficiently – e.g. does a given regexp match at a given offset into
the string. So String#index is not good for lexical analysis
applications.
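To make the "does this regexp match at this offset" question concrete, here is a rough sketch of the usual workaround: slice off the tail of the string and anchor the pattern with \A. The helper name match_at is hypothetical (it is not part of String or Regexp), and the slice copies the tail and still has to locate 'pos' by character index, so it does nothing about problem (a).

```ruby
# Hypothetical helper, for illustration only: does `re` match starting
# exactly at character index `pos` of `str`?
def match_at(str, re, pos)
  rest = str[pos..-1]            # copies the tail; nil if pos is past the end
  return nil if rest.nil?
  # Re-anchor the pattern at the start of the sliced tail.
  anchored = Regexp.new("\\A(?:#{re.source})", re.options)
  anchored.match(rest)           # MatchData or nil; offsets are relative to `rest`
end

p match_at("abcd", /bc/, 1)      # a MatchData for "bc"
p match_at("abcd", /bc/, 0)      # nil, because "bc" does not start at 0
```

Note that the offsets in the returned MatchData are relative to the slice, not to the original string, which is one more reason this is awkward for a lexer.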
The optional argument to Regexp#match that somebody proposed
would be nice too.
  pos = 0
  while md = /foo/.match(foostr, pos)
    puts md.to_s
    pos = md.end(0)
  end
Yes, I proposed this before I understood point (a) above. The
proposed change to Regexp#match would make it and String#index almost
the same thing, with the same problems.
Because of these problems, an API like Perl’s pos() and \G is
desirable. For example:
(1) Have the string remember its last end-of-match position (both
byte offset and character offset). This fixes problem (a) above.
(2) In regexps, \G matches this position. This fixes problem (b).
(3) Have String#gpos (or better name) set/get the end-of-match
position based on a character index, for convenience.
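As a rough illustration of what (2) buys a lexer, here is a sketch that assumes a later Ruby in which Regexp#match accepts a start position and \G anchors at that position. That is an assumption about the interpreter version, not something available at the time of this thread. The local variable pos stands in for the proposed String#gpos, and tokenize and RULES are hypothetical names.

```ruby
# Assumption: a Ruby where Regexp#match(str, pos) exists and \G anchors
# at the position the search starts from.
RULES = [/\G[a-z]+/, /\G\d+/]    # hypothetical token patterns

# Returns the tokens found and the position where scanning stopped.
def tokenize(input, rules = RULES)
  tokens = []
  pos = 0                        # stands in for the proposed String#gpos
  while pos < input.length
    md = nil
    rules.each do |re|
      md = re.match(input, pos)  # \G means "only at pos", never later
      break if md
    end
    break unless md              # no rule matches here: stop
    tokens << md[0]
    pos = md.end(0)              # offset into the whole string
  end
  [tokens, pos]
end

p tokenize("foo123bar")          # => [["foo", "123", "bar"], 9]
p tokenize("foo 123")            # => [["foo"], 3]  (stops at the space)
```

Without \G the second call would skip over the space and find "123" further along, which is exactly the unanchored behaviour described in (b).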
This begins to look a lot like strscan, which will be part of ruby
1.8. However, because strscan is not part of String, it cannot know
when the string is modified and must freeze the string before
operating on it (otherwise it risks having its byte offsets be
incorrect when the string is modified).
Freezing the string is inconvenient in my application. I examine a
string in detail before deciding whether to append more data to it
(String#<<) from a file or start a new string.
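For comparison, here is a small sketch of the examine-then-append pattern written against strscan. It assumes a current strscan release, where StringScanner#<< appends to the string being scanned; the frozen-string behaviour described above may not apply there, so treat this as an illustration rather than a description of the 1.8-era library. The input text is made up.

```ruby
require 'strscan'

buffer  = String.new("count = ")      # explicitly mutable string
scanner = StringScanner.new(buffer)

name = scanner.scan(/\w+/)            # "count"
scanner.scan(/\s*=\s*/)               # consume " = "
mark = scanner.pos                    # remember where the value should start

value = scanner.scan(/\d+/)           # nil: nothing left to examine
scanner << "42 rest"                  # append more data (e.g. read from a file)
value = scanner.scan(/\d+/)           # "42"

scanner.pos = mark                    # backtrack to the saved position
again = scanner.scan(/\d+/)           # "42" again, rescanned from mark

p [name, mark, value, again]          # => ["count", 8, "42", "42"]
```

StringScanner#scan only matches at the current scan position, so it behaves like an implicit \G, and assigning to pos gives the rescan-from-an-earlier-position behaviour asked for at the top of this message.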
At Fri, 13 Dec 2002 04:31:12 +0900, Austin Ziegler wrote:
--
Don’t send mail to Donald_Schaefer@hole.lickey.com
The address is there for spammers to harvest.