"YANAGAWA Kazuhisa" <kjana@dm4lab.to> wrote in message
news:20030312112248.DC4161EE12@milestones.dm4lab.to…
> > Nope, split will not give me the text where the split takes place
> > and it will also not give me a match object such that I can use
> > subexpressions.
>
> What did you test?
>
>   >ruby -ve 'p "abc def ghi".split(/(\s+)/)'
>   ruby 1.7.2 (2002-05-07) [i386-freebsd]
>   ["abc", " ", "def", " ", "ghi"]
I used 1.7.3, but mostly relied on the information provided by ri.
I assumed that the matched string was always removed, but apparently it is
kept when the pattern contains a group. The ri examples only show the
separator being dropped:
From ri String#split

    " now's  the time".split(/ /)  #=> ["", "now's", "", "the", "time"]
    "1, 2.34,56, 7".split(/,\s*/)  #=> ["1", "2.34", "56", "7"]

My own test in 1.7.3:

    "abc def ghi".split(/\s+/)     #=> ["abc", "def", "ghi"]
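
For comparison, here is a small sketch of the two behaviours side by side. The sample string and variable names are made up for illustration. The only difference between the two calls is the parentheses around the pattern.

```ruby
text = "abc def  ghi"

# Without a group, the separators are discarded.
plain = text.split(/\s+/)

# With a group, each matched separator is returned between the fields.
with_seps = text.split(/(\s+)/)

p plain      # ["abc", "def", "ghi"]
p with_seps  # ["abc", " ", "def", "  ", "ghi"]
```

Since the separators are kept, joining the second array gives back the original string. What split still does not provide is a match object for each separator.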
I think my problem is important because it is typical for scanning tags
embedded in text.
On a related note: in the end I solved my problem using a while loop.
But since match does not take an offset as a parameter, I end up
reallocating strings all the time.
Based on memory, my solution was something like
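    s = "my string to scan"
    RE = /\s+/  # actually somewhat more complicated than this
    while s.length > 0
      m = RE.match(s)
      if m
        process_text m.pre_match   # text before the match
        process_space m[0]         # the matched text itself
        s = m.post_match           # continue with the remainder
      else
        process_text s             # no more matches: the rest is text
        break
      end
    end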
This is lengthy and not very efficient. The s = m.post_match assignment
creates a new string on every iteration, which could mean a lot of allocation
on large strings, and you can't operate directly on a file.
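
One possible way around the missing offset, sketched here under some assumptions: String#index does accept a start position and sets $~ on success, so the loop can keep a position instead of cutting the string down. The helper name scan_tokens and the :text / :match labels are hypothetical, and the sketch assumes the pattern never matches an empty string.

```ruby
# Hypothetical helper: walks str and yields [:text, String] for the text
# between matches and [:match, MatchData] for each match of re.
def scan_tokens(str, re)
  pos = 0
  while pos < str.length
    # String#index takes a start offset and sets $~ when it finds a match.
    unless str.index(re, pos)
      yield :text, str[pos..-1]   # no further match: the rest is text
      break
    end
    m = Regexp.last_match
    # Assumption: an empty match would never advance pos, so reject it.
    raise ArgumentError, "pattern matched an empty string" if m.end(0) == m.begin(0)
    yield :text, str[pos...m.begin(0)] if m.begin(0) > pos
    yield :match, m               # full MatchData, subexpressions included
    pos = m.end(0)
  end
end

tokens = []
scan_tokens("abc def  ghi", /(\s)\s*/) do |kind, value|
  tokens << [kind, kind == :match ? value[0] : value]
end
p tokens
```

This still allocates a substring for each piece of text, but not a copy of the whole remainder on every iteration, and the block receives a real match object. It does not help with reading directly from a file, since the whole string has to be in memory. The StringScanner class from the strscan library may be another option worth a look.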
Mikkel