Very inefficient regular expression match

Hi,

The following code is supposed to print multiple regular expression
matches on a given string:

class Myregexp < Regexp
  def each(str)
    pos = 0
    while ((pos < str.size) && (m = match(str[pos...str.size])))
      yield m
      pos += m.end(0)
    end
  end
end

Myregexp.new(/pattern/).each("string") { |m| $stdout << m[0] }

It works. However, it takes a lot of time and memory when "string" is
big. In my case, with a 10 MB string, it takes 40 MB of RAM. The problem
comes from the str[pos...str.size] (I solved the problem for my situation
with a hack where I replaced it with str[pos,300], and everything runs fast
and with low memory). Since Ruby uses garbage collection and strings are, I
believe, copy-on-write, why does this occur?
I’m using the Pragmatic Programmers Ruby 1.6.6 on Windows XP.
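
A minimal sketch of one way to walk the matches without slicing the string,
assuming the cost does come from building a new substring on every
iteration, as suspected above. The name each_match is hypothetical and not
part of Ruby; it relies on String#index accepting a start offset and on
Regexp.last_match returning the MatchData of that search. Unlike the slicing
version, offsets are relative to the whole string, anchors such as ^ see the
full string rather than a slice, and an empty match advances by one
character so the loop cannot stall.

```ruby
# Hypothetical helper (not part of Ruby): yields a MatchData for each
# non-overlapping match of re in str, searching from an offset instead
# of creating a substring for every iteration.
def each_match(re, str)
  pos = 0
  while pos <= str.size && str.index(re, pos)
    m = Regexp.last_match
    yield m
    # Continue after the match; step one extra character after an empty
    # match so the loop always makes progress.
    pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
  end
end

each_match(/\w+/, "foo bar baz") { |m| puts m[0] }
```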

Thanks,
Maurício

Maurício wrote:

Hi,

The following code is supposed to print multiple regular expression
matches on a given string:

Why not use scan?

"foo bar baz".scan(/\w\w\w/) { |str| puts str }

   # ==> foo
   #     bar
   #     baz
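
The block form of scan yields strings (or arrays of groups), not MatchData
objects like the each method in the original post. A small sketch, assuming
you still want the MatchData for each hit: inside the block,
Regexp.last_match refers to the match currently being yielded. The variable
names here are illustrative only.

```ruby
text = "foo bar baz"
matches = []

# Collect the MatchData of each match while scan walks the string.
text.scan(/\w\w\w/) { matches << Regexp.last_match }

matches.each { |m| puts "#{m[0]} at #{m.begin(0)}" }
```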

You’re right, I didn’t know it.
Anyway, just for educational purposes, I’m still curious about why my
class didn’t work well.

Thanks,
Maurício
----- Original Message -----
From: “Joel VanderWerf” vjoel@PATH.Berkeley.EDU
Newsgroups: gmane.comp.lang.ruby.general
Sent: Wednesday, August 07, 2002 1:35 AM
Subject: Re: Very inefficient regular expression match

Maurício wrote:

Hi,

The following code is supposed to print multiple regular expression
matches on a given string:

Why not use scan?

"foo bar baz".scan(/\w\w\w/) { |str| puts str }

   # ==> foo
   #     bar
   #     baz

Hi,

At Wed, 7 Aug 2002 20:10:14 +0900, Maurício Antunes wrote:

Anyway, just for educational purposes, I’m still curious about why my
class didn’t work well.

See the thread from [ruby-talk:42596].


Nobu Nakada