I am very interested in Cursor for use in Reg.
Thanks for the interest. If you want to help out, let me know.
I think the API is stabilizing. The only functional change in
my mind right now is having positions shift with insertions
and deletions before them. A lot of optimization is needed
(e.g., derived classes handle one element at a time, and
#pos/#pos= is typically O(n) instead of O(1)). I'm moving on
to Grammar right now.
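To make the "shifting positions" idea concrete, here is a minimal sketch of
what that behavior could look like for an array-backed cursor; the class and
method names are made up for illustration and are not Cursor's actual API:

# Hypothetical toy class, not Cursor's real API: saved positions shift
# right when something is inserted before them, so they keep pointing
# at the same element.
class ToyCursor
  def initialize(elements)
    @elements  = elements
    @positions = []            # saved positions we promise to keep valid
  end

  def save(index)              # remember a position for later restore
    @positions << index
    index
  end

  def insert(at, element)      # insert and shift saved positions after it
    @elements.insert(at, element)
    @positions.map! { |p| p >= at ? p + 1 : p }
  end

  attr_reader :elements, :positions
end

c = ToyCursor.new(%w[a b c d])
c.save(2)                         # points at "c"
c.insert(0, "X")                  # insert before the saved position
p c.elements[c.positions.first]   # => "c" -- the position followed its element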
It seems like the sort of thing to allow Reg's Array-based
backtracking pattern matcher to operate on Strings or Files or
whatever else. (Did you know that I needed this by ESP? Since
you've got your own lexer/parser/pattern matcher thing
(Grammar?), maybe you don't want to help the competition...)
Competition is good for the consumer. The competitors enhance
their own products and use each other's ideas. Since we are
both doing parsing-type things, I can see how this might be
useful to you.
Anyway, Cursor isn't quite functional enough for my purposes.
Passing patterns for the length parameter to #get and #set is
great, but there aren't enough pattern types. I think I need
the equivalent of Cursor#===(regexp), which is to say, the
ability to compare a Regexp to a position within a String. Or,
if the backing store of the cursor is a file, you get the
ability to compare a Regexp to a File. Someone was asking for
that recently; it's not easy, but it would be great if your
lib did it... Just the ability to regexp against the middle of
a String would make me very happy, though. I've hacked this up
before, but I think that a reasonably efficient version would
require a bit of C extension.
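For the String-backed case being asked for above, the standard strscan
library already gets close to "regexp against the middle of a String": a
StringScanner anchors a pattern at an arbitrary offset. This is only a sketch
of the idea, not Cursor#===:

require 'strscan'

str     = "foo bar baz"
scanner = StringScanner.new(str)
scanner.pos = 4                  # a "position" inside the string
p scanner.scan(/\w+/)            # => "bar"  (match anchored at offset 4)
scanner.pos = 4
p scanner.scan(/\d+/)            # => nil    (no digits at that position)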
Funny you ask for that - cursor===regexp. Instead, my Grammar
package will support grammar===cursor. I think the only case
where I could fully implement cursor===regexp is when the data
the cursor is accessing is a String. I can't do it for the
case where the cursor is over an IO (unless I read the whole
file into a String); Regexp is too tied to String. Also, right
now, nothing in Cursor is tied specifically to Strings (or
characters). I want to keep it that way - dealing with
elements and element sequences.
This === pattern matching would be better placed in the
pattern-matching class if possible - regexp===cursor,
grammar===cursor, or reg===cursor.
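One reason pattern===cursor fits Ruby well is that putting #=== on the
pattern side lets any pattern (regexp, grammar, reg) drive case/when. The
classes below are hypothetical stand-ins, not the real Grammar or Reg:

# A pattern that matches if the cursor's current element equals a value.
class LiteralPattern
  def initialize(expected)
    @expected = expected
  end

  def ===(cursor)                # pattern===cursor, so case/when works
    cursor.peek == @expected
  end
end

# A stand-in cursor that just peeks at an element of an Array.
MiniCursor = Struct.new(:elements, :pos) do
  def peek
    elements[pos]
  end
end

cursor = MiniCursor.new([:if, :x, :then], 0)
case cursor
when LiteralPattern.new(:if)    then puts "starts with :if"
when LiteralPattern.new(:while) then puts "starts with :while"
end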
Ideally, of course, I want to see the ability to compare
against a Reg, particularly the Regs that match multiple array
elements. If you made it work with whatever the equivalent in
your system is, that would be almost as good.
How does Grammar work? I would appreciate it if you could
write a couple of paragraphs of high-level overview of Cursor
and Grammar, their design and capabilities, and how they're
intended to work together.
Cursor is a unification of many external iterator ideas -
iterators from C++, streams from Java, iterators from Java,
Ruby IO, and a text editor "cursor". Compared to most other
general external iterators, Cursor offers two interesting
features - the ability to insert/delete and the ability to
save/restore a position.
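The save/restore-a-position feature is the same move a text editor or an IO
makes; plain StringIO already shows the flavor of it. Cursor generalizes this
beyond IO and characters, so this is just an illustration, not Cursor's
interface:

require 'stringio'

io    = StringIO.new("alpha beta gamma")
saved = io.pos                 # remember where we are
first = io.read(5)             # => "alpha"
io.pos = saved                 # backtrack: restore the saved position
again = io.read(5)             # => "alpha" again
p [first, again]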
Since I haven't released grammar yet, its predecessor will give
you an idea:
http://www.ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/138450
In this, I'm using RandomAccessStream instead of Cursor. And
Syntax is equivalent to the Grammar I'm working on.
A Grammar is used to match what's next in a Cursor. Like
Cursor, it can deal with any objects, not just characters and
Strings. I provide some operator overloading to make
specifying a Grammar read like BNF - "|" (Grammar::Alternation),
"+" (Grammar::Sequence), "*" (Grammar::Repeat), etc. I also put
some of these in built-in classes (String, Range) to make it
even easier (e.g., "hi"|"hello" makes a Grammar that matches
either "hi" or "hello").
With this you can make a parser that works directly on the
Cursor (from an IO, String, etc.), or you can split it up into
a lexer and a parser. In that case, the parser (a Grammar)
would deal with a Cursor that spits out tokens. You can make
your own lexer for that or use Grammar again. To use Grammar
to make a lexer, you'd first need the Grammar for a token,
which should be a big Alternation (or a table lookup). This
token Grammar would also need to tag its matches appropriately
to distinguish token types for the parser. The lexer would
then be a Cursor which returns matches from the token Grammar
against the original Cursor. Here is some pseudo-code to give
an idea of what was just said:
token = (tokenGrammar===ioCursor)
tokenCursor = Cursor of tokens (lexer)
parsetree = (parserGrammar===tokenCursor)
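And, to make the same pipeline concrete with nothing but the standard
library, here is a rough self-contained sketch of the lexing half: a scanner
that turns raw characters into tagged tokens a parser layer would then
consume. The tag names and token shapes are placeholders, not the Grammar
API:

require 'strscan'

def lex(source)
  s      = StringScanner.new(source)
  tokens = []
  until s.eos?
    if s.scan(/\s+/)              then next        # skip whitespace
    elsif (t = s.scan(/\d+/))     then tokens << [:number, t]
    elsif (t = s.scan(/[a-z]+/))  then tokens << [:ident,  t]
    elsif (t = s.scan(/[+*]/))    then tokens << [:op,     t]
    else  raise "lex error at #{s.pos}"
    end
  end
  tokens
end

tokens = lex("foo + 42")
p tokens    # => [[:ident, "foo"], [:op, "+"], [:number, "42"]]
# A parser (a Grammar in the real library) would match against a Cursor
# over this token list instead of the raw characters.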
Like the ideas?
--- Caleb Clausen <vikkous@gmail.com> wrote: