Lexer

There has been some talk about Regular Expression performance.

What I would really like is a proper Lexer class in Ruby.

The primary differences between a lexer and regex are:

Multiple expressions are active at the same time. You can deal with this
using ((expr1)|(expr2)|(expr3)), but it gets rather tedious, and my guess is
that it is much slower than what a lexer could do. A lexer would identify
the matched expression; with a regex you have to inspect MatchObject[2],
MatchObject[3] etc. to find out which alternative matched. A Lexer could
tell you exactly which expression was matched and give you access to the
matched text (possibly with submatches).
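
To make the alternation workaround concrete, here is a minimal sketch. It
uses StringScanner from the standard strscan library to keep the match
anchored at the current position. The token set (numbers, identifiers,
whitespace) and the names TOKEN_RE, KINDS and tokenize_with_alternation are
made up for illustration only.

```ruby
require 'strscan'

# Hypothetical token set; the group order must match KINDS.
TOKEN_RE = /(\d+)|([A-Za-z_]\w*)|(\s+)/
KINDS = [:number, :ident, :space].freeze

def tokenize_with_alternation(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    unless scanner.scan(TOKEN_RE)
      raise ArgumentError, "no alternative matches at offset #{scanner.pos}"
    end
    # The tedious part: probe each group to learn which alternative matched.
    group = (1..KINDS.size).find { |i| scanner[i] }
    tokens << [KINDS[group - 1], scanner.matched]
  end
  tokens
end

tokenize_with_alternation("foo 42")
# => [[:ident, "foo"], [:space, " "], [:number, "42"]]
```

Every new token kind means another group in the pattern and another entry
in the lookup table, and the two have to be kept in step by hand.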

You get buffer management. Today you can use File.each to access a buffer a
line at a time, or you can break the buffer at a character sequence, or you
can read a certain number of bytes.
You cannot break the buffer at a given regular expression. The lexer would
handle this seamlessly.
A Lexer could also keep track of line numbers and start and end positions of
a match (relative to start of stream), and it could also give the position
of each line.
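
A rough sketch of what the position tracking could look like from the Ruby
side. This is not a proposed API: SketchLexer, Token and each_token are
hypothetical names, and the sketch works on an in-memory String rather than
a managed stream buffer, so it only illustrates the bookkeeping (which rule
matched, start and end byte offsets, line number).

```ruby
require 'strscan'

# Hypothetical token record: the rule that matched, the matched text, byte
# offsets from the start of the input, and the 1-based line of the start.
Token = Struct.new(:kind, :text, :start_pos, :end_pos, :line)

# Hypothetical class; rules is an ordered list of [name, Regexp] pairs.
class SketchLexer
  def initialize(rules)
    @rules = rules
  end

  def each_token(input)
    return enum_for(:each_token, input) unless block_given?

    scanner = StringScanner.new(input)
    line = 1
    until scanner.eos?
      start_pos = scanner.pos
      # The first rule that matches at the current position wins.
      kind, _pattern = @rules.find { |_name, pattern| scanner.scan(pattern) }
      unless kind && scanner.pos > start_pos
        raise ArgumentError, "no rule consumes input at offset #{start_pos}"
      end
      text = scanner.matched
      yield Token.new(kind, text, start_pos, scanner.pos, line)
      line += text.count("\n")
    end
  end
end

rules = [[:number, /\d+/], [:ident, /[a-z]+/], [:space, /\s+/]]
SketchLexer.new(rules).each_token("ab 1\ncd") do |t|
  puts "#{t.kind} #{t.text.inspect} #{t.start_pos}-#{t.end_pos} line #{t.line}"
end
```

A real implementation would read from an IO and refill its buffer as
needed; that part is deliberately left out here.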

With a proper lexer object implemented in C, Ruby’s ability to process
textual input would be greatly enhanced. Applications such as FastCGI and
REXML would probably speed up significantly.

There are two options for lexers: a built-in lexer object, and a lexer
generator like Lex or Flex. I think a built-in lexer object is the most
useful, but you could have a lex-style script to initialize the lexer
object.
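
To illustrate the last point, a lex-style script could be turned into a
rule table that a lexer object is initialized with. The script format below
(one "NAME PATTERN" rule per line, '#' for comments) and the method name
rules_from_script are assumptions made up for this sketch; this is not the
Lex or Flex input format.

```ruby
# Hypothetical script format: one rule per line, "NAME PATTERN".
# Blank lines and lines starting with '#' are ignored.
def rules_from_script(script)
  rules = []
  script.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    name, pattern = line.split(/\s+/, 2)
    raise ArgumentError, "missing pattern for rule #{name}" if pattern.nil?

    # Rule order is preserved so that earlier rules take priority.
    rules << [name.downcase.to_sym, Regexp.new(pattern)]
  end
  rules
end

script = "# tokens\nNUMBER \\d+\nIDENT [a-z]+\n"
rules_from_script(script)
# => [[:number, /\d+/], [:ident, /[a-z]+/]]
```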

Mikkel