I'm probably just missing something obvious, but I haven't found a way to match a regular expression against only part of a string, in particular only past a certain point of a string, as a way of finding successive matches. Of course, one could do a match against a string, take the substring past that match and do a match against the substring, and so on, to find all of the matches for the string, but that could be very expensive for very large strings.
I'm aware of the String#scan method, but that doesn't work for me because it doesn't return MatchData instances.
What I want is just something like regexp.match(string, n), where the regexp starts looking for a match at or after position n in the string.
I don't know of anything obvious, but I would probably do something a
little more like:
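class String
  # Yield a MatchData for each successive match of exp.
  # Caveat: this loops forever if exp can match the empty string.
  def match_each(exp)
    str = self
    while (md = str.match(exp))
      yield md
      str = md.post_match
    end
  end
end

foo = "foo bar foo bar foo"
foo.match_each(/[oa][or]/) do |md|
  puts "Found: #{md}"
end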
# pth
How about this?
def match(s, re, n)
  /(?:.{#{n}})(#{re})/.match(s)
end
irb(main):043:0> p s
"abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh "
irb(main):044:0> p match(s, /abd/, 10).begin(1)
16
irb(main):045:0> p match(s, /abd/, 20).begin(1)
24
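require 'strscan'
scanner = StringScanner.new(string)
scanner.pos = n
# scan matches only at the current position; scan_until searches forward from it.
if scanner.scan(regexp)
  p scanner[1]       # first capture group
  p scanner.matched  # the matched text
  p scanner.pos      # position just past the match
end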
It's in the stdlib. (Note: it doesn't actually give you a MatchData or
set $~, but off the top of my head I can't think of anything that a
MatchData can do that the StringScanner can't.)
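Since the original question asks for matches "at or after" position n, a variant with scan_until may be closer to what was wanted. This is only a sketch with a made-up sample string and pattern; note that scanner.pos is a byte offset, which equals the character offset only for single-byte text like this.

```ruby
require 'strscan'

string  = "abdefgh abdefgh abdefgh"   # sample data, not from the thread
scanner = StringScanner.new(string)
scanner.pos = 10                      # begin searching at offset 10

found = []
# scan_until moves the scanner to the end of the next match, wherever it starts.
while scanner.scan_until(/ab(d)/)
  start = scanner.pos - scanner.matched_size   # where the match began
  found << [scanner.matched, start, scanner[1]]
end
p found
```

Each entry holds the matched text, its start offset in the original string, and the first capture group.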
Hmm, apart from using #scan and #index with $~ as indicated, I do not
think that there is a performance penalty if you do
  rg.match(string[n..-1])
Cheers
Robert
--
You see things; and you say Why?
But I dream things that never were; and I say Why not?
-- George Bernard Shaw
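Whatever the cost of the slice, one practical detail of rg.match(string[n..-1]) is that the resulting MatchData reports offsets relative to the slice, not the original string. A small sketch with a sample string of my own choosing:

```ruby
string = "abdefgh abdefgh abdefgh"   # sample data, not from the thread
n  = 10
md = /abd/.match(string[n..-1])      # match against the tail only

slice_offset    = md.begin(0)        # position inside the slice
original_offset = md.begin(0) + n    # position inside the original string
p slice_offset
p original_offset
```

So n has to be added back whenever the position in the full string matters.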
I think he wanted MatchData objects. The String#index method returns
the index (numeric position of the match). But if all you want are
captures, then index is a good solution.
pth
···
On 6/3/07, Nobuyoshi Nakada <nobu@ruby-lang.org> wrote:
Hi,
At Sun, 3 Jun 2007 12:59:24 +0900,
Kenneth McDonald wrote in [ruby-talk:254054]:
> What I want is just something like regexp.match(string, n), where the
> regexp starts looking for a match at or after position n in the string.
If you want to specify the point in the string by number, you could do this.
str = "abcdefghabcehjjjuabcfjkiabcgdfg"
str =~ /.{10}(abc.).*(abc.)/
p $1 #abcf
p $2 #abcg
Harry
···
On 6/3/07, Harry Kakueki <list.push@gmail.com> wrote:
You could match the string but ignore the first part of the match.
str = "abcdefghabcehjjjuabcfjkiabcgdfg"
str =~ /(abc.)/
p $1 # abcd
str =~ /a.*ju(abc.)/
p $1 #abcf
That's clever. Obscure, but clever :-). I wonder if the regexp engine is clever enough to turn a match like .{n} into a constant time operation?
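One way to look into that is simply to time it. The sketch below compares the .{n} trick with passing a start position to Regexp#match, which assumes Ruby 1.9 or later; the string size and n are arbitrary, and n is kept under 100,000 because the regexp engine, as far as I know, rejects larger repeat counts. It makes no claim about which is faster; it only checks that both find the same spot.

```ruby
require 'benchmark'

s  = "abdefgh " * 20_000   # 160,000 characters of sample data
n  = 80_000
re = /abd/

trick  = nil
direct = nil
Benchmark.bm(10) do |x|
  # Skip n characters inside the pattern itself.
  x.report("skip .{n}") { trick = /(?:.{#{n}})(#{re})/.match(s) }
  # Ask the engine to start searching at offset n.
  x.report("match pos") { direct = re.match(s, n) }
end

p trick.begin(1) == direct.begin(0)   # both should report the same offset
```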
How can that be? You have to create a whole new String. If that can be
avoided in the internal implementation then adding an optional offset
index to #match is not an unreasonable idea.
T.
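For what it's worth, later Ruby versions (1.9 onward, as far as I know) did add this optional offset: Regexp#match accepts a second argument giving the position to start searching from. A minimal sketch of walking successive matches that way; each_match_from is a hypothetical helper name, not a core method, and the sample string is made up.

```ruby
# Yield a MatchData for every match at or after `start`.
def each_match_from(string, regexp, start = 0)
  pos = start
  while pos <= string.length && (md = regexp.match(string, pos))
    yield md
    # Continue after this match; step one further on an empty match
    # so the loop always makes progress.
    pos = md.end(0) == md.begin(0) ? md.end(0) + 1 : md.end(0)
  end
end

s = "abdefgh abdefgh abdefgh"
each_match_from(s, /abd/, 10) { |md| puts "#{md[0]} at #{md.begin(0)}" }
```

The offsets in each MatchData refer to the original string, and no substring is created along the way.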
At Sun, 3 Jun 2007 13:56:05 +0900,
Patrick Hurley wrote in [ruby-talk:254059]:
I think he wanted MatchData objects. The String#index method returns
the index (numeric position of the match). But if all you want are
captures, then index is a good solution.
> How can that be? You have to create a whole new String.
Beating a dead man, Tom? As mentioned, I had a terrible slip into C in my
reasoning, no idea why.
I should have known never to question Nobu Nakada :-). I always forget
about those variables.
Thanks
pth
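To spell out the point about "those variables": String#index takes an optional start position, and after a successful regexp search it sets $~ (also available as Regexp.last_match), so a MatchData is there for the taking. A small sketch; match_from is a hypothetical helper name, and the sample string is the one used elsewhere in this thread.

```ruby
# Return a MatchData for the first match at or after position n, or nil.
def match_from(string, regexp, n)
  # String#index sets $~, which Regexp.last_match reads back.
  string.index(regexp, n) ? Regexp.last_match : nil
end

str = "abcdefghabcehjjjuabcfjkiabcgdfg"
md  = match_from(str, /(abc.)/, 10)
p md[1]         # => "abcf"
p md.begin(0)   # => 17
```

The offsets are relative to the original string, since no substring is taken.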
Robert, actually string[n..-1] is cheaper than you might assume: I believe the new string shares the char buffer with the old string, so you basically just get a new String object with a different offset - the large bit (the char data) is not copied.
Kind regards
robert
I am afraid that this is no longer true when the slice is passed as
a formal parameter: the data has to be copied.
irb(main):011:0> def change(x)
irb(main):012:1> x << "changed"
irb(main):013:1> end
=> nil
irb(main):014:0> a="abcdef"
=> "abcdef"
irb(main):015:0> change(a[1..2])
=> "bcchanged"
irb(main):016:0> a
=> "abcdef"
Cheers
Robert