Slow regular expressions :(

Alexandru E. Ungur wrote:

sender: "Just Another Victim of the Ambient Morality" date: "Fri, Jul 28, 2006 at 02:55:05PM +0900" <<<EOQ

    It is not clear cut to me, either, and yes, it is worth noting that there are performance trade-offs. However, the poster to whom I was responding dismissed it, out of hand, by blaming the programmer, much like the 60's auto industry blaming accidents on drivers as an excuse for not bothering with safety features...

    If the performance hit is great and the regex pattern uncommon, it may very well not be worth it. This is, of course, debatable and that's my point. Personally, I don't use Ruby to write fast programs, I use it to write (correct) programs fast.
    At the very least, perhaps the PERL Regexp engine can be written as a Ruby extension and used as another Regexp class that we "require" in our code when we feel it's more appropriate? I'm a big believer in the Best of Both Worlds solution...

No offence, but who stops you from using Perl :) ?!?
Just use perl when you're too lazy to think about a regex and hope that
the engine will fix the crap you put in there, and use Ruby, when you do
have the time and the interest to actually think. How about that?
Wouldn't this be "the best of both worlds"... ?

Of course you and all the other wise "haha I found one more bug in Ruby"
guys could actually follow the very good advice of reading 'Mastering
Regular Expressions', and stop dumping crap on this list, but hey I know
it's not a perfect world :)

This is just disgusting, to take pride in your own stupidity and have
the audacity to expect that others should turn it into wisdom on their time and
money. How about you two guys get together and write that extension you
so badly need and make yourself happy with your own hands? It's a little
harder than whining on a list, but hey, you're smart guys, after all you
"found where Ruby sucks" and others do a great job... and nobody else
from all the people on the list discovered this! Gosh, you must be
really smart... :)

I'd say "Good luck" but you're probably just too smart to need it
anyway...

To all the other decent readers on the list:
I apologise for my post, I may be breaking the netiquette here, but this was just too much... :( I'm a Ruby nuby too, I am not the Grand Master of regular expressions,
but I love Ruby just the way it is, and it really hurts me to see this
crap attitude about Ruby's "limitations" coming from people not able
to understand their own limitations...

What's wrong with people wanting to improve a language? Fortran definitely has changed over its 50+ years, why can't Ruby or any other language? Languages (or operating systems, or any other program) benefit from the input of many people. No one can create a "perfect" language on their first try. Ruby obviously wasn't or there wouldn't be all the work on new versions.
I have been programming for over 25 years in Business Basic and find that "old" language has many features that I would love to see in Ruby, but from the looks of it probably won't see. Some of them I could probably write methods to duplicate but they won't be the same and I will continue to think first of the way I used to write a program before trying to figure out the Ruby way. My results probably won't be optimal and it will take a long time to fix some of the problems.
The other posters were showing that the Ruby way of doing some things wasn't the best way to do them and asking why there couldn't be changes made to improve the current way. I see no problem in them preferring this other method and maybe it would benefit more people than just them.

···

All the best to all the decent people,
Alex

If regexp stuff matters so much in the course of your programming for a particular case, then using perl is a valid suggestion, as it is rightly seen as pretty much the best regexp engine so far. (Oniguruma [1]<sp?> has made some waves and may be able to best perl in the end, but it's not widely available and is an extension for now; perhaps for Ruby2/Rite it will be the built-in regexp engine, who knows or dares to dream.)

So I don't find it unreasonable to say to people who need to do a lot of regexp work - "use perl", it was practically designed for it. But most of the time I find using regexp complicates things unnecessarily [2].

Most of the time Ruby is good enough. If you are generating regexps automatically, perhaps perl is the better choice, as it will prevent this exponential slowdown for you. To say that Ruby must be 'fixed' to allow you to write crappy regexps and have the engine take care of it for you is, in my opinion, not the answer. If you're going to use a regexp, you should be aware of what you are doing. If for some reason you generate a load of them automatically, then I think there could possibly be a better way of achieving your goals in ruby (perhaps a dsl instead of treating everything as raw text?). And finally, Ruby might not be the right language to choose, and you'd be better off throwing your funky generated regexps at perl to handle.
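To make the "exponential slowdown" concrete, here is a minimal sketch. The two patterns are hypothetical illustrations, not the expressions from the original report. Both accept exactly the same strings, but the nested quantifier gives a backtracking engine many ways to split a run of "a"s, which is the kind of pattern that can blow up on a failing input. The inputs are kept tiny on purpose; how bad the slowdown gets depends on the engine and version.

```ruby
# Hypothetical patterns for illustration only.
NESTED = /\A(a+)+\z/  # nested quantifier: a run of "a"s can be split many ways
FLAT   = /\Aa+\z/     # same set of accepted strings, only one way to match

# True when both patterns agree (match / no match) on every input.
def same_answers?(inputs)
  inputs.all? { |s| !!(NESTED =~ s) == !!(FLAT =~ s) }
end

# Small inputs only. On a backtracking engine, a long run of "a"
# followed by "!" is the shape of input that can make NESTED slow.
SAMPLES = ["a", "aaaa", "", "aaab", "baaa", "aaaa!"]

puts same_answers?(SAMPLES)
```

Rewriting NESTED as FLAT is the sort of hand fix being argued about in this thread: the result is unchanged, only the amount of backtracking differs.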

All this may change as soon as Ruby2/Rite appears with Oniguruma, but for now Ruby isn't the best tool for every possible problem - and if it's a major worry for you, perhaps lend a hand with Onig... or indeed write a perl-alike regexp engine extension - competition is good :)

Kev

[1] http://www.geocities.jp/kosako3/oniguruma/
[2] Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.—Jamie Zawinski, in comp.lang.emacs

···

--
"Society in every state is a blessing, but government even in its best state is but a necessary evil; in its worst state an intolerable one" - Thomas Paine

sender: "Michael W. Ryder" date: "Fri, Jul 28, 2006 at 07:30:11PM +0900" <<<EOQ

What's wrong with people wanting to improve a language?

Nothing. In this case however we are not talking about improving,
but about making the regexp engine to be more forgiving, so that
we can be less careful when crafting a regex.
I wouldn't prefer Ruby to think for me, and I very much like when
it says: " - You moron, look what you did: I'm gonna stay stuck on
this regex for 56783.99 years, and all because of you!! Can you
even wait that long?!?" so that slowly I become less of a moron...
I would definitely not like it to say: " - All is good, feed me crap,
I can take it. I have a good stomach.", and let me go away with it.
But maybe that's just me...

I'm not against evolution, I just see it differently.

Alex

I don't talk to my programming language nearly as much, though perhaps
I should :). But I agree with this. I've had regexes that degenerated
into such pathological cases once or twice and I've just fixed them.
Fixing ruby would hide the bug.

That being said this is just one more reason I'm afraid of regexes.
When they grow beyond very short I start to get nervous. That's
probably how AI will develop and enslave us all, by some random noise
being interpreted as a regex... Which is one more reason not to fix
this, to slow down our evil overlords. :)

Pedro.

···

On 7/28/06, Alexandru E. Ungur <alexandru@globalterrasoft.ro> wrote:

I wouldn't prefer Ruby to think for me, and I very much like when
it says: " - You moron, look what you did: I'm gonna stay stuck on
this regex for 56783.99 years, and all because of you!! Can you
even wait that long?!?" so that slowly I become less of a moron...
I would definitely not like it to say: " - All is good, feed me crap,
I can take it. I have a good stomach.", and let me go away with it.
But maybe that's just me...

Alexandru E. Ungur wrote:

sender: "Michael W. Ryder" date: "Fri, Jul 28, 2006 at 07:30:11PM +0900" <<<EOQ

What's wrong with people wanting to improve a language?

Nothing. In this case however we are not talking about improving,
but about making the regexp engine to be more forgiving, so that
we can be less careful when crafting a regex.
I wouldn't prefer Ruby to think for me, and I very much like when
it says: " - You moron, look what you did: I'm gonna stay stuck on
this regex for 56783.99 years, and all because of you!! Can you
even wait that long?!?" so that slowly I become less of a moron...
I would definitely not like it to say: " - All is good, feed me crap,
I can take it. I have a good stomach.", and let me go away with it.
But maybe that's just me...

First, you are misunderstanding the extent of the problem if you think
this is just about RegExps that are not "carefully crafted".
As I have pointed out, with very complex expressions or expressions that
get constructed automatically (which happens quite often) it is nearly
impossible to avoid this, even if you are an expert with RegExps.
Why would you prefer an expression engine that
stupidly does *the wrong* thing?

Second, the whole purpose of programming languages is to think for
humans when the kind of thinking required is not actually helping to
solve a problem but rather a technicality. Your argument would prevent
optimizing compilers and a lot of other things where indeed a language
(or its compiler/interpreter) is doing a lot of thinking for humans.
We are talking about Ruby here - a high level language with a design
that makes it easy to learn and use. Not assembler or C.

Lastly, what you want Ruby to do here is even *more* out of the scope
of a RegExp engine than simply using optimization tricks for speeding
up some pathological cases: if you want it to *tell you in advance*
that it is going to get into exponential processing for a certain case
you are wishing for the impossible. And if you want it to *teach* you
how to make better expressions you are asking for something nearly
as complicated.
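For the automatically constructed expressions mentioned above, one partial hedge is sketched below. It assumes the generated pieces are plain literal strings (the word list is hypothetical). `Regexp.union` escapes string arguments, so the result is a flat alternation with no quantifiers introduced by the input. It does not help when the generated pieces are themselves patterns with quantifiers.

```ruby
# Build an anchored matcher from literal strings.
# Regexp.union escapes each string, so "." and "+" are matched literally.
def literal_matcher(words)
  /\A(?:#{Regexp.union(words).source})\z/
end

# Hypothetical generated input containing regexp metacharacters.
matcher = literal_matcher(["a.b", "c+"])

puts matcher.match?("a.b")  # the literal string itself
puts matcher.match?("axb")  # "." is not treated as a wildcard
```

The design choice here is to keep generated input out of the pattern syntax entirely, so the only structure in the final expression is what the generator itself adds.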

···

--
Posted via http://www.ruby-forum.com/.

Alexandru E. Ungur wrote:

sender: "Michael W. Ryder" date: "Fri, Jul 28, 2006 at 07:30:11PM +0900" <<<EOQ

What's wrong with people wanting to improve a language?

Nothing. In this case however we are not talking about improving,
but about making the regexp engine to be more forgiving, so that
we can be less careful when crafting a regex.
I wouldn't prefer Ruby to think for me, and I very much like when
it says: " - You moron, look what you did: I'm gonna stay stuck on this regex for 56783.99 years, and all because of you!! Can you
even wait that long?!?" so that slowly I become less of a moron...
I would definitely not like it to say: " - All is good, feed me crap,
I can take it. I have a good stomach.", and let me go away with it.
But maybe that's just me...

I'm not against evolution, I just see it differently.

Alex

So you are saying that Ruby should only be used by those who can craft a perfect Regexp, all others need not bother? Personally, I choose a language by how well it will do the job, not how arcane it is. Maybe you would be happier with APL where you can write an entire program in a single line. Very few other people will ever be able to read it, much less maintain it.
Computer Languages are supposed to make it easier to accomplish a purpose, not force you to think about the details. If I wanted to have to spend hours fine tuning a single line of code so that it would run in a reasonable amount of time I would use Assembly. People should not have to spend a large amount of time trying to learn "features" in a language to become productive. They should be able to create a "reasonable" program quickly and then spend time later, if they have it which most of us don't, to make it better.

"Alexandru E. Ungur" <alexandru@globalterrasoft.ro> wrote in message
news:20060728124227.GA8607@globalterrasoft.ro...

sender: "Michael W. Ryder" date: "Fri, Jul 28, 2006 at 07:30:11PM +0900" <<<EOQ

What's wrong with people wanting to improve a language?

Nothing. In this case however we are not talking about improving,
but about making the regexp engine to be more forgiving, so that
we can be less careful when crafting a regex.

    So, you're against "defensive programming?" Routines that check their
input for bad values are not an improvement over their earlier counterparts
that would just crash, because forgiveness won't teach you a lesson?
    Are you against "memory protection" because a program should crash and
take the whole system down with it instead of being forgiving of people's
mistakes?
    Are you against "microkernels" because drivers should crash and halt the
system and automatically restarting such sub-systems is just too forgiving
and would hide the problem?

I wouldn't prefer Ruby to think for me, and I very much like when
it says: " - You moron, look what you did: I'm gonna stay stuck on
this regex for 56783.99 years, and all because of you!! Can you
even wait that long?!?" so that slowly I become less of a moron...
I would definitely not like it to say: " - All is good, feed me crap,
I can take it. I have a good stomach.", and let me go away with it.
But maybe that's just me...

    I think you misunderstand the purpose of programming. Programming
languages are not a yard stick to measure the length of your member. This
is not a contest and it's not a tutorial on regular expressions and how they
work. Why don't you want Ruby to think for you? That's the role of a
computer... to do work for you! I wish I could just talk to Hal, my
computer, describe to it what work I want done and then have it do that
work! But no, computers are still primitive, and we must do most of the
thinking for it. However, we do try to get it to do as much work as
possible, including catching our mistakes and optimising our code! Believe
me, removing an exponential search pattern from my regular expression counts
as an optimisation!
    Ruby halting is not Ruby saying "You moron, look what you did." That's
Ruby sulking in a corner, not doing the work I (thought I) asked it to, and
not telling me what's wrong.
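    One way to get an answer instead of a silent hang is a time limit around the match. This is only a sketch using the standard `timeout` library; `with_limit` is a made-up helper name, and whether a match already in progress can actually be interrupted depends on the interpreter and version, so treat that part as an assumption.

```ruby
require "timeout"

# Run the block, giving up after `seconds`.
# Returns the block's value, or :timed_out if the limit was hit.
def with_limit(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

result = with_limit(1) { /b+/.match("abbbc") }
puts result == :timed_out ? "gave up" : result[0]
```

    This does not fix a pathological expression; it only turns "sulking in a corner" into a visible failure that the caller can report.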

I'm not against evolution, I just see it differently.

    I don't think you see it at all. Pretend, just for a moment, that you
don't care if you're "right" or "wrong" and think about both sides of the
issue. Think about the purpose of computers and think about the pros and
cons...

    A little off topic but here's another property of Ruby I'd like to
change. I didn't realize I wanted this until just the other day, so it
remains interesting to me...

array.each { |i| new_var = i if i.some_test }
a_method.do_thing new_var # variables leak out of blocks!
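Read as a wish, the snippet above does not run in Ruby as it stands: a variable first assigned inside a block is local to that block, so the second line would raise NameError. A minimal sketch of how the same thing is usually written today follows; `Item`, `some_test` and the method names are hypothetical stand-ins for the poster's objects.

```ruby
# Hypothetical stand-in for the poster's elements.
Item = Struct.new(:value) do
  def some_test
    value > 2
  end
end

# Declare the variable before the block so the block assigns to it.
def last_matching(items)
  new_var = nil
  items.each { |i| new_var = i if i.some_test }
  new_var
end

# A variable first assigned inside a block stays inside the block.
def block_local_leaks?
  [1].each { |i| inner_only = i }
  defined?(inner_only) ? true : false
end

items = [Item.new(1), Item.new(3), Item.new(5)]
puts last_matching(items).value
puts block_local_leaks?
```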

If you really need to create such complicated regexps, then maybe regexps
aren't the right tool. You would be better off using a real parser or
parser generator.

Kristof

···

On Sat, 29 Jul 2006 00:14:22 +0900, Roman Hausner wrote:

<snip>
First, you are misunderstanding the extent of the problem if you think
this is just about RegExps that are not "carefully crafted".
As I have pointed out, with very complex expressions or expressions that
get constructed automatically (which happens quite often) it is nearly
impossible to avoid this, even if you are an expert with RegExps.
Why would you prefer an expression engine that
stupidly does *the wrong* thing?
<snip>

Well, that's not exactly fair to languages in which you can write the
entire (presumably nontrivial) program on a single line. For instance,
Logo allows you to write an entire (nontrivial) program on a single
line, and it's eminently readable -- at least in the same class as Ruby.

Don't allow this digression to distract you from the point you're
making, though.

···

On Sat, Jul 29, 2006 at 04:35:05AM +0900, Michael W. Ryder wrote:

So you are saying that Ruby should only be used by those who can craft a
perfect Regexp, all others need not bother? Personally, I choose a
language by how well it will do the job, not how arcane it is. Maybe
you would be happier with APL where you can write an entire program in a
single line. Very few other people will ever be able to read it, much
less maintain it.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
"The measure on a man's real character is what he would do
if he knew he would never be found out." - Thomas McCauley