Rubynet-announce Digest, Vol 9, Issue 1

Send rubynet-announce mailing list submissions to
rubynet-announce@lists.rubynet.org

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.rubynet.org/lists/listinfo/rubynet-announce
or, via email, send a message with subject or body ‘help’ to
rubynet-announce-request@lists.rubynet.org

You can reach the person managing the list at
rubynet-announce-owner@lists.rubynet.org

When replying, please edit your Subject line so it is more specific
than “Re: Contents of rubynet-announce digest…”

Today’s Topics:

  1. regexp-engine v0.2 (Simon Strandgaard)
···

Message: 1
Date: Thu, 13 Nov 2003 12:39:35 +0100
From: Simon Strandgaard neoneye@adslhome.dk
Subject: [ruby-announce] regexp-engine v0.2
To: rubynet-announce@lists.rubynet.org
Message-ID:
20031113113936.ENDG29690.fepE.post.tele.dk@localhost.localdomain
Content-Type: text/plain; charset=iso-8859-1

I am proud to present version 0.2 of my regexp engine.
It takes up only 785 lines of Ruby code and can do the most
fundamental operations. In my opinion it’s good OO.

Please feel free to ask me questions.

download:
http://rubyforge.org/download.php/200/regexp-engine-0.2.tar.gz

this project’s RAA entry:
http://raa.ruby-lang.org/list.rhtml?name=regexp

browse CVS:
http://rubyforge.org/cgi-bin/viewcvs/cgi/viewcvs.cgi/projects/regexp_engine/source/?cvsroot=aeditor

Regular Expressions engine, a subproject of AEditor
2003, Copyright by Simon Strandgaard
http://aeditor.rubyforge.org/

About

AEditor needs a regexp engine. You probably think: why not
rely on an existing engine (for instance Ruby’s regexp engine)?
Existing engines are not flexible enough. The iterator pattern
provides the needed flexibility, so it should not matter
whether the engine operates on UCS-4, UTF-8 or ASCII.
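
To illustrate the iterator idea, here is a minimal sketch (not code
from regexp-engine). The class names AsciiIterator and Utf8Iterator,
the methods has_next? and next_symbol, and the helper collect_symbols
are all hypothetical. The point is only that a consumer sees a stream
of symbols and never the underlying encoding.

```ruby
# Hypothetical iterator interface: a consumer only calls has_next?
# and next_symbol, and never looks at the raw bytes itself.
class AsciiIterator
  def initialize(bytes)
    @bytes = bytes
    @pos = 0
  end

  def has_next?
    @pos < @bytes.size
  end

  # In ASCII every byte is one symbol.
  def next_symbol
    sym = @bytes[@pos]
    @pos += 1
    sym
  end
end

class Utf8Iterator
  def initialize(bytes)
    @bytes = bytes
    @pos = 0
  end

  def has_next?
    @pos < @bytes.size
  end

  # Decode one UTF-8 sequence (1 to 4 bytes) into a code point.
  # Malformed input is not validated; this is only a sketch.
  def next_symbol
    first = @bytes[@pos]
    if first < 0x80
      extra, code = 0, first
    elsif first < 0xE0
      extra, code = 1, first & 0x1F
    elsif first < 0xF0
      extra, code = 2, first & 0x0F
    else
      extra, code = 3, first & 0x07
    end
    extra.times do |i|
      code = (code << 6) | (@bytes[@pos + 1 + i] & 0x3F)
    end
    @pos += 1 + extra
    code
  end
end

# Encoding-agnostic consumer: drains any iterator into an array.
def collect_symbols(iterator)
  symbols = []
  symbols << iterator.next_symbol while iterator.has_next?
  symbols
end

p collect_symbols(AsciiIterator.new("abc".bytes))              # [97, 98, 99]
p collect_symbols(Utf8Iterator.new([0x61, 0xE2, 0x82, 0xAC]))  # [97, 8364]
```

A matcher written against such an interface would compare symbols
(integers) and could be reused unchanged for each encoding.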

The goal is to build an engine that is fully compatible with Ruby’s
regexp syntax and that can work with iterators.

Eventually extend the regexp syntax with some editor features.
For instance: mark the point where the cursor should be placed,
match text which is legal Ruby code, execute a regexp within a
rectangular selection, etc. I am open to other suggestions.

Eventually re-implement in C++ to gain performance.

Status

The data structure has stabilized and the fundamental operations
are working quite well (they were difficult to implement).
Iterators are not yet implemented, so only ASCII is supported right now.
Performance is not impressive.
What is left is all the easy stuff.

  • features of the scanner so far:
    a>b>c     alternation
    *         repeat(0..infinity)  greedy
    +         repeat(1..infinity)  greedy
    {n,}      repeat(n..infinity)  greedy
    ( ... )   grouping -> register; nested repeat also works
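
How greedy repetition, alternation and nested repeats interact can be
shown with a toy backtracking matcher. This is a sketch written for
illustration only and is not taken from regexp-engine; the classes
Lit, Alt, Seq and Repeat and the helper match_prefix are hypothetical
names, and registers are left out.

```ruby
# Every node implements match(str, pos) and yields each position at
# which it can end, best candidate first. If the block returns nil the
# node tries its next candidate: that is the backtracking step.
class Lit
  def initialize(ch)
    @ch = ch
  end

  def match(str, pos)
    str[pos] == @ch ? yield(pos + 1) : nil
  end
end

class Alt
  def initialize(*choices)
    @choices = choices
  end

  # Try the alternatives from left to right.
  def match(str, pos, &k)
    @choices.each do |choice|
      result = choice.match(str, pos, &k)
      return result if result
    end
    nil
  end
end

class Seq
  def initialize(*parts)
    @parts = parts
  end

  def match(str, pos, i = 0, &k)
    return k.call(pos) if i == @parts.size
    @parts[i].match(str, pos) { |p| match(str, p, i + 1, &k) }
  end
end

# repeat(min..infinity), greedy
class Repeat
  def initialize(node, min)
    @node = node
    @min = min
  end

  def match(str, pos, count = 0, &k)
    # Greedy: first try one more iteration ...
    result = @node.match(str, pos) do |p|
      # Refuse iterations that consume nothing, to avoid endless loops.
      p > pos ? match(str, p, count + 1, &k) : nil
    end
    return result if result
    # ... and only then try to stop here.
    count >= @min ? k.call(pos) : nil
  end
end

# Returns the end position of a match anchored at index 0, or nil.
def match_prefix(node, str)
  node.match(str, 0) { |end_pos| end_pos }
end

a, b, c = Lit.new("a"), Lit.new("b"), Lit.new("c")

# a*ab : the star must give back one "a" so that the rest can match.
p match_prefix(Seq.new(Repeat.new(a, 0), a, b), "aaab")   # 4

# (a|bc+)* : a repeat nested inside an alternation inside a repeat.
nested = Repeat.new(Alt.new(a, Seq.new(b, Repeat.new(c, 1))), 0)
p match_prefix(nested, "abccab")                          # 5
```

In this sketch the lower bound of Repeat covers `*` (min 0), `+` (min 1)
and `{n,}` (min n) with one class.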

  • features of the parser so far:
    ( ... )   group -> register
    |         alternation
    \1 .. \9  backreferences
    \         escape
    .         match anything except newline
    *         repeat(0..infinity)  greedy
    *?        repeat(0..infinity)  lazy
    +         repeat(1..infinity)  greedy
    +?        repeat(1..infinity)  lazy
    {n,m}     repeat(n..m)         greedy  constraint(n <= m)
    {n,m}?    repeat(n..m)         lazy    constraint(n <= m)
    {n,}      repeat(n..infinity)  greedy
    {n,}?     repeat(n..infinity)  lazy
    {m}       repeat(m..m)         greedy
    {m}?      repeat(m..m)         lazy    (does this one make sense?)
    special case: illegal ranges are treated as ordinary literals.
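
Since the goal is compatibility with Ruby’s regexp syntax, the
difference between the greedy and lazy forms above can be tried out
with Ruby’s built-in Regexp. Note that this sketch does not use
regexp-engine itself, whose behavior at version 0.2 may differ; it
only shows what the syntax means in Ruby.

```ruby
# All patterns below run on Ruby's built-in Regexp, not on regexp-engine.
html = "<b>bold</b>"

p html[/<.*>/]          # greedy star:  "<b>bold</b>"
p html[/<.*?>/]         # lazy star:    "<b>"

p "aaa"[/a+/]           # greedy plus:  "aaa"
p "aaa"[/a+?/]          # lazy plus:    "a"

p "aaaaa"[/a{2,4}/]     # greedy range: "aaaa"
p "aaaaa"[/a{2,4}?/]    # lazy range:   "aa"
p "aaaaa"[/a{2,}?/]     # lazy, open upper bound: "aa"

# \1 must repeat exactly what group 1 captured.
p "xabab"[/(ab)\1/]     # "abab"

# A brace that does not form a valid range is matched literally.
p "a{x}"[/a{x}/]        # "a{x}"
```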

License

Ruby’s license.

Comments

I hope you like it.


Simon Strandgaard




End of rubynet-announce Digest, Vol 9, Issue 1