[ann] regexp-engine-0.8, perl5 + some perl6

download:
http://rubyforge.org/frs/?group_id=18&release_id=422

play with regexp online:
http://neoneye.dk/regexp.rbx

RAA entry:
http://raa.ruby-lang.org/list.rhtml?name=regexp

About

Here is a regexp engine written entirely in Ruby.
It allows you to search in text with advanced search patterns.
It supports Perl5 syntax… plus some perl6 syntax (more to
come in the future). It’s fairly compatible with Ruby’s native
regexp engine (GNU), and when running against the Rubicon
testsuite, it passes 96.025% of 1560 total tests.

The implementation is simple and has no optimizations yet,
so it is slow… At some point, when optimizations
are in place, I plan to do a re-implementation in C++.
Because of the simplicity, the code should be easy to grasp
and extend with your own custom code.

Goals

Be compatible with Ruby’s GNU regexp engine (perl5 syntax).
DONE. This goal is fulfilled.

Support Perl6 regexp-syntax (even before perl6 gets finished).
This new syntax is less obfuscated than old perl5 syntax.
Perhaps also make a converter between perl5 <-> perl6 syntax.
Not fulfilled yet, but I am working on it.

The AEditor project needs a flexible regexp-engine for doing
lexing, so that text can get syntax-colored.
Future.

The Ruby-in-Ruby project needs a regexp-engine… this engine
will hopefully become suitable.
Optional.

Explain-regexp… output a verbose overview of what each
opcode in the regexp does.
Optional.
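
To make the explain-regexp idea concrete, here is a minimal sketch of what such output could look like. It handles only a tiny subset (literal characters and the . * + ? metacharacters), walks the pattern character by character, and ignores escapes, groups, and character classes. The method names describe_token and explain are hypothetical; the package's real opcodes are not shown in this thread.

```ruby
# Hypothetical sketch: not the regexp-engine package's opcode model.

# Describe one character of a very small regexp subset.
def describe_token(ch)
  case ch
  when "." then "match any single character"
  when "*" then "repeat the previous item zero or more times"
  when "+" then "repeat the previous item one or more times"
  when "?" then "make the previous item optional"
  else "match the literal character #{ch.inspect}"
  end
end

# Return one explanatory line per character of the pattern.
def explain(pattern)
  pattern.each_char.map { |ch| "#{ch}  #{describe_token(ch)}" }
end

puts explain("ab*.")
```

A real explainer would work on the parsed opcode tree rather than on raw characters, so that constructs such as groups and escapes are reported as single units.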

Status

The project has completed the ‘make it work’ phase, and has
entered the ‘make it right’ phase, where I will focus on
optimization, so that decent speed can be achieved.

Running the engine against the Rubicon testsuite yields
pass=1498, fail=62, pass/total=96.025%
The failing tests are mostly obscurities in GNU.
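
The reported figures are consistent with each other: 1498 + 62 = 1560, and 1498/1560 is roughly 96.0256%, which matches the reported 96.025% if the value is truncated rather than rounded. A quick check in Ruby (the helper name pass_rate is hypothetical):

```ruby
# Percentage of passing tests, truncated (not rounded) to three decimals.
def pass_rate(passed, failed)
  total = passed + failed
  rate = 100.0 * passed / total
  (rate * 1000).floor / 1000.0
end

puts pass_rate(1498, 62)  # counts as reported above
```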

Besides that, there are 402 tests, which do both whitebox
and blackbox testing. However, in order to run the tests
it’s necessary to fetch Michael Granger’s Test::Unit::Mock
package.

License

Ruby’s license.

Acknowledgements

Mark Sparshatt

  • Got the initial idea of extending with perl6.
  • NewMatchData class, NewRegexp class.

Guy Decoux/Dave Thomas

  • stolen part of rubicon testsuite which exercises regex.

Contact

In case you find a bug or have suggestions for improvements,
then feel free to mail me.

Simon Strandgaard neoneye@adslhome.dk

Thanks for your patience.

Does it include the perl6 embedded grammars in regex stuff?
Charlie

Questions for those who have tried the package out… or played with the homepage,
or are just following the discussion :wink:

What do you think about this perl6 regexp thing? Is it something Ruby needs?
What do you want to use perl6 syntax for?

The engine is going to support inline code inside regexp.
Is this a feature you would use?

Why did you play with regexp on the demo site?

Who has tried out this package? Why did you choose to do that?
Did it install itself correctly?

Did you browse the source code?

BTW: What happened to the Ruby-in-Ruby project?

···

On Thu, 22 Apr 2004 10:37:28 +0900 Simon Strandgaard neoneye@adslhome.dk wrote:

download:
http://rubyforge.org/frs/?group_id=18&release_id=422

play with regexp online:
http://neoneye.dk/regexp.rbx

RAA entry:
http://raa.ruby-lang.org/list.rhtml?name=regexp


Simon Strandgaard

Not yet. There is only a little perl6 support.
I plan to support perl6 fully.

However, there are many tasks for me to do. You are very welcome to
contribute to the project :wink:

···

Charles Comstock cc1@cec.wustl.edu wrote:

Does it include the perl6 embedded grammars in regex stuff?
Charlie


Simon Strandgaard

The engine is going to support inline code inside regexp.
Is this a feature you would use?

Don't forget this

      "(?{ code })"

[...]

                 For reasons of security, this construct is forbidden if the
                 regular expression involves run-time interpolation of
                 variables, unless the perilous "use re 'eval'" pragma has been
                 used (see re), or the variables contain results of "qr//"
                 operator (see "qr/STRING/imosx" in perlop).
[...]

and they are not really paranoid on p5p

Guy Decoux

Yes, security is an issue here. I think that one must supply an option when
one wishes inline code to be executed. Or perhaps rely on the $SAFE level?

code = "remember position; puts 'hello world'"
re = NewRegexp.new("xy(?{#{code}}).{42}z", INLINE_CODE)
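
One way to read the "supply an option" idea, as a minimal sketch: refuse any pattern that contains a (?{ ... }) block unless the caller explicitly opts in. The helper names inline_code? and check_pattern and the allow_inline_code: keyword are assumptions made for illustration; they are not the package's API, and nothing here compiles the pattern or runs the embedded code.

```ruby
# Hypothetical sketch: these helper names are illustrative and are not
# part of the regexp-engine package.

# True when the pattern source contains a Perl-style (?{ ... }) block.
# Simplification: this is a plain substring test, so it may also flag
# patterns where the "(" was meant as an escaped literal.
def inline_code?(source)
  source.include?("(?{")
end

# Return the source unchanged when it is allowed to be compiled;
# raise SecurityError when it embeds code without explicit opt-in.
def check_pattern(source, allow_inline_code: false)
  if inline_code?(source) && !allow_inline_code
    raise SecurityError, "inline code needs allow_inline_code: true"
  end
  source
end

check_pattern("xy.{42}z")                                  # accepted
check_pattern("xy(?{ puts 'hi' }).{42}z",
              allow_inline_code: true)                     # accepted
begin
  check_pattern("xy(?{ puts 'hi' }).{42}z")                # no opt-in
rescue SecurityError => e
  puts "rejected: #{e.message}"
end
```

Making the safe behaviour the default means that a pattern built from untrusted, interpolated input cannot execute code unless the caller has asked for it, which is the same concern the perlre excerpt above describes.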

Simon Strandgaard