redparse
Last time I looked at redparse I couldn't even get the tests to run. Only
I wish you had/would let me know about whatever problem(s) it was that
you had, so I could/can fix it. I've definitely fixed a great many
problems since the first release, but I can't fix bugs that I don't
know about.
through lots of hacking did I get it executing and that was just to see how
fast it was.
I'll be the last to claim that redparse is (right now) a fast parser.
It's probably (still) the slowest ruby parser in the universe, tho it
should be somewhat faster than it was originally.
Caleb found a _horrible_ ruby package that both redparse and
ruby_parser choke on (think: it'll finish right around the time of heat
death of the universe... and will probably contribute greatly to it).
Ah, yes, japanese/zipcodes.rb. A 65MB ruby file.... it's an
interesting test case, since any reasonable parser ought to be able to
parse it all before the cows come home. It wasn't until I talked to
you about it that I realized that my parser (and yours too? still?)
slows down as N**2 (or is it e**N), where N is the number of tokens in
input. Fundamentally, both parsing algorithms are O(K*N) (for a fairly
large K), but since both build up a parse tree as they go, and
sometimes the garbage collector has to run in the background, which
has to visit everything in that partial parse tree as it goes, the
time for garbage collection runs in the background increases as the
length of the input. Overall this makes for unreasonably super-linear
slowdowns of the algorithm. I had never thought that just adding
garbage collection to an algorithm could take it from O(N) to O(N**2).
Usually, this isn't an issue since the parser finishes for any
non-ridiculously sized input (eg <1MB) before the garbage collector
needs to run. I suppose that using a generational garbage collector
would return things to O(N) land. Reducing the garbage production rate
of the parser might help a lot too.
On the other hand, why does this particular data set need to be
expressed as executable ruby code, when something like CSV seems so
much more reasonable....
Otherwise, it didn't seem to be faster or better in any area (esp given how
much work it was to get it to work at all).
I will readily admit that your parser has (for now) the advantage in
terms of speed, ability to run in MRI 1.9, and ability to parse 1.9
expressions. However, it also presents a fairly ugly interface to the
user. parse_tree/ruby_parser trees are these yucky lisp-like things
which behave in various unexpected ways. Whereas redparse trees are a
great deal nicer, IMNSHO. I specifically designed redparse to have
pleasant parsetrees as its output; as I see it, redparse's tree format
has a number of advantages over the lisp-like output
parse_tree/ruby_parser or the smalltalk-like output of RubyNode:
1) it's object oriented
2) it closely mirrors structure of original source code
3) it has better positioning information (line #/byte offset)
4) it behaves in predicable, expectable ways in most cases
5) things like begin..end and def..end have action unified in one
node, instead of spread out over multiple nodes nested in strange ways
6) operators are actually a separate type of node, instead of being a
weirdly named method call
The first of those points is really the key one. Having an
object-oriented interface makes the parse trees much nicer to deal
with programatically than anything else. However, the user need not be
limited to that; he can use redparse's parse trees in list-oriented or
hash-oriented if for some reason those prove to be better. So really,
redparse presents the best of all 3 worlds; its interface is
object-oriented, list-oriented, or hash-oriented depending on what
best suits the user.
Now, all of these things have been true about redparse basically from
the beginning, altho the interface has changed slightly (I think for
the better) over time. Perhaps you prefer the parse_tree style
interface; in fact I'm certain that you do. But when I'm doing the
kind of deep metaprogramming which requires access to a parse tree, I
want a tree format that's going to be as simple, regular, and
predictable as possible. Ruby's syntax is fairly complex, and any type
of parse tree representing that syntax is going to inevitably reflect
a fair amount of that complexity as a result. But there is a virtue to
be had in making the resulting trees as simple as it's possible for
them to be, rather than squirrelly and convoluted in unnecessary extra
ways.
···
On 11/30/09, Ryan Davis <ryand-ruby@zenspider.com> wrote:
On Nov 30, 2009, at 10:17 , Roger Pack wrote:
Things may have changed since then.