[ANN] Syntax 0.7.0

Syntax is a pure-Ruby framework for doing lexical analysis (and, in particular, syntax highlighting) of text. It currently sports lexers for Ruby, XML, and YAML, and an HTML convertor (for colorizing texts in those languages to HTML).

Links:

   Download: http://rubyforge.org/frs/?group_id=505
   User Manual: http://docs.jamisbuck.org/read/book/4

This release is much improved in accuracy and robustness (at least, for the Ruby lexer--the XML and YAML lexers were not changed). The Ruby lexer now deals better with many ambiguous cases, and even supports multiple heredocs on a single line. It accurately colorizes cgi.rb and mkmf.rb from the standard lib, if that means anything at all to you.

The Syntax framework also supports "regions" now (thanks to flgr for the suggestion) and sports many bug fixes (thanks to Carl Drinkwater for discovering most of them). Syntax regions just allow one group to span (and include) multiple groups--like a string that includes interpolated expressions and escape sequences.

For a pretty example (mkmf.rb fully syntax highlighted) see http://ruby.jamisbuck.org/mkmf.html.

The next release will include robustness fixes for the XML and YAML lexers, as well as a lexer for C. Lexers for Perl, Python, Java, HTML, and RHTML would be nice as well, if I can get to them. Community submissions will be gladly accepted, as long as you are okay with your contributed code being distributed under the BSD license.

Enjoy!

- Jamis

Jamis Buck wrote:

Syntax is a pure-Ruby framework for doing lexical analysis (and, in particular, syntax highlighting) of text. It currently sports lexers for Ruby, XML, and YAML, and an HTML convertor (for colorizing texts in those languages to HTML).

And is indeed a wonderful Ruby library. It's just so very cool to have a library that marks up Ruby properly with <span> classes. It allows you to do quite a lot to Ruby code.

Thanks a lot, Jamis, for this very nice library!

For a pretty example (mkmf.rb fully syntax highlighted) see http://ruby.jamisbuck.org/mkmf.html.

Another one (lots of new CSS) can be seen here:

http://flgr.0x42.net/highlighting.png

I'll be using the Syntax library for dissecting the submissions of the IORCC and it is a wonderful help.

If you recognize your own code in the above screenshot, then let me tell you that IMHO you did a very nice job with your obfuscation.

The next release will include robustness fixes for the XML and YAML lexers, as well as a lexer for C. Lexers for Perl, Python, Java, HTML, and RHTML would be nice as well, if I can get to them. Community submissions will be gladly accepted, as long as you are okay with your contributed code being distributed under the BSD license.

Having a C lexer will be wonderful as that is exactly something that I'm currently finding myself needing as well.

I think I'll be able to submit lexers for a few simple languages -- Befunge would be an easy one. But your framework seems to make lexing more complex languages easy as well, so I might as well try that. Guess we'll see. :)

Quoting jamis@37signals.com, on Thu, Mar 24, 2005 at 02:54:20PM +0900:

Syntax is a pure-Ruby framework for doing lexical analysis (and, in
particular, syntax highlighting) of text. It currently sports lexers
for Ruby, XML, and YAML, and an HTML convertor (for colorizing texts in
those languages to HTML).

Would this be an appropriate tool for parsing ruby to generate ctags?

To write a tags file I need to know where I am in ruby's terms (in what
class, module), what was found (method, attribute, constant, class,
...), AND I need to generate a regex that will find this place in the
file. For repeated names this can mean knowing what the entire line
looks like, so that I can put leading whitespace into the regex.

Is Syntax something I should be looking at? It seems there are some
similarities.. if you know enough to hilight, maybe you know enough to
generate a ctag?

I'm using rdoc right now, but it is a very large tool, and I would like
something smaller and more malleable, if possible.

Thanks,
Sam

Thank you so much for updating this wonderful library of yours.

···

On Thu, 24 Mar 2005 14:54:20 +0900, Jamis Buck <jamis@37signals.com> wrote:

Links:

   Download: http://rubyforge.org/frs/?group_id=505
   User Manual: http://docs.jamisbuck.org/read/book/4

--
Tobi
http://www.snowdevil.ca - Snowboards that don't suck
http://www.hieraki.org - Open source book authoring
http://blog.leetsoft.com - Technical weblog

Quoting jamis@37signals.com, on Thu, Mar 24, 2005 at 02:54:20PM +0900:

Syntax is a pure-Ruby framework for doing lexical analysis (and, in
particular, syntax highlighting) of text. It currently sports lexers
for Ruby, XML, and YAML, and an HTML convertor (for colorizing texts in
those languages to HTML).

Would this be an appropriate tool for parsing ruby to generate ctags?

Hmmm, maybe. Not in its current incarnation, though. One thing the lexer doesn't give you right now is the location of each token in the file. That would be a good addition, though. I'll see about adding that to the next version.

To write a tags file I need to know where I am in ruby's terms (in what
class, module), what was found (method, attribute, constant, class,
...), AND I need to generate a regex that will find this place in the
file. For repeated names this can mean knowing what the entire line
looks like, so that I can put leading whitespace into the regex.

The lexers that come with Syntax are optimized for syntax highlighting. You could conceivably write a different lexer module that was optimized for tag extraction, using the Syntax framework. You'd probably do just as well to use strscan directly, though.
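As a rough illustration of the "use strscan directly" route, here is a minimal sketch. The helper name `scan_definitions` is made up for this example and is not part of Syntax or of any tags tool. It only recognizes `class`, `module`, and `def` at the start of a line, and it keeps the whole original line (leading whitespace included), since that is what a tags pattern would be built from.

```ruby
require 'strscan'

# Hypothetical helper (not part of Syntax): find definition lines and keep
# the full original text of each one.
def scan_definitions(source)
  found = []
  source.each_line do |raw|
    line = raw.chomp
    scanner = StringScanner.new(line)
    scanner.skip(/\s*/)                 # skip indentation; `line` still has it
    next unless scanner.scan(/(class|module|def)\s+/)
    kind = scanner[1]                   # "class", "module" or "def"
    name = scanner.scan(/[\w:.?!]+/)    # e.g. Foo, Foo::Bar, self.build, empty?
    next unless name                    # skips `class << self`, operator methods
    found << { kind: kind, name: name, line: line }
  end
  found
end

source = "module Foo\n  class Bar\n    def to_s; end\n  end\nend\n"
scan_definitions(source).each { |d| puts "#{d[:kind]} #{d[:name]} <- #{d[:line].inspect}" }
```

This deliberately ignores nesting, so it cannot produce qualified names, and it misses a `class` keyword whose name sits on the following line.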

- Jamis

···

On Mar 24, 2005, at 7:44 AM, Sam Roberts wrote:

Is Syntax something I should be looking at? It seems there are some
similarities.. if you know enough to hilight, maybe you know enough to
generate a ctag?

I'm using rdoc right now, but it is a very large tool, and I would like
something smaller and more malleable, if possible.

Thanks,
Sam

Sam Roberts wrote:

I'm using rdoc right now, but it is a very large tool, and I would like
something smaller and more malleable, if possible.

why not ParseTree or ripper ?

Speaking of RDoc. Did anyone take up the call for a new maintainer? I
would love to see syntax highlighting in RDoc.

T.

Quoting jamis@37signals.com, on Fri, Mar 25, 2005 at 01:27:37AM +0900:

>Quoting jamis@37signals.com, on Thu, Mar 24, 2005 at 02:54:20PM +0900:
>>Syntax is a pure-Ruby framework for doing lexical analysis (and, in
>>particular, syntax highlighting) of text. It currently sports lexers
>>for Ruby, XML, and YAML, and an HTML convertor (for colorizing texts
>>in
>>those languages to HTML).
>
>Would this be an appropriate tool for parsing ruby to generate ctags?
>

Hmmm, maybe. Not in its current incarnation, though. One thing the
lexer doesn't give you right now is the location of each token in the
file. That would be a good addition, though. I'll see about adding that
to the next version.

I don't need location in file, I just need the text of the line:

  module Foo
    class Bar
      class Bar
    end

The tag would be
  Bar-> regex / class Bar/
  Bar-> regex / class Bar/
  Foo.Bar -> regex / class Bar/
  Foo.Bar.Bar -> regex / class Bar/

I don't need line no.

For this
  module Foo
  end
  class Foo::Bar
  end

The tags would be different:
  Bar -> /class Foo::Bar/

And for
  class
    Foo
  end

Different again.
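To make the idea concrete, here is a small sketch of how one tag entry could be written out. It assumes the usual vi-style tags layout (name, file, and search pattern separated by tabs, with the pattern anchored by ^ and $); the helper name `tag_line` and the file name foo.rb are invented for the example. The same source line can be emitted under both its short and its qualified name.

```ruby
# Hypothetical helper: format one entry of a vi-style tags file.
# Backslashes and slashes in the source line are escaped so they do not
# end the /.../ search pattern early.
def tag_line(name, file, source_line)
  escaped = source_line.gsub(/[\\\/]/) { |ch| "\\#{ch}" }
  "#{name}\t#{file}\t/^#{escaped}$/"
end

# Emit the nested class under its short and its qualified name.
entries = %w[Bar Foo.Bar].map { |name| tag_line(name, "foo.rb", "    class Bar") }
puts entries.sort
```

Because the pattern is the literal text of the line, the leading whitespace is what tells two `class Bar` lines at different nesting depths apart.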

Quoting surrender_it@remove-yahoo.it, on Fri, Mar 25, 2005 at 01:49:52AM +0900:

Sam Roberts wrote:

>I'm using rdoc right now, but it is a very large tool, and I would like
>something smaller and more malleable, if possible.
>

why not ParseTree or ripper ?

I have no idea what ripper does, but parse tree just gives symbols, it
doesn't have enough information for me to build a regex, as above, does
it?

Making tags is an odd problem. It involves semantic analysis, when you
see class Foo, you need to know if it is in module Bar, or inside class
Joe. But, to generate the tag you need access to the original text so
that you can build a regex, which is sensitive to HOW you wrote the
code, not just what the code means. Most tokenizers' goal in life is to
abstract you away from the text, so you just see a stream of syntactic
elements.

Rdoc is useful, because it does the analysis, but it also maintains
original text in a way it can (in some cases) be regenerated to form
regexes.

I think it's not a bad place to put it, since tags as another output
format is a reasonable extension of its model.

But... it's really slow (I think it's how much data it keeps in memory).
It also doesn't quite give me access to everything I want. I can hack
it, but I'm balking at the chore. Adding an output formatter was easy
and standalone. Hacking its internals... that's another story.

I'm totally open to suggestions. I NEED tags to read code effectively.

I'm faster writing in ruby than in C, but I read C code way, way, way
faster due to the tool support I have (vim+tags) (I debug C faster, too,
because I have a great debugger - gdb.) I'm not happy about this
situation.

Maybe I should suggest this as one of those ruby weekly challenges...
Document the tags format, the goals, and let people choose - rules are
that there are no rules, you can use any tool/library you want, even
non-ruby, and let the best code win. If it's non-ruby, well, that would
point out an area where ruby could use some work.

Btw, syntax highlighting with rdoc should be easy, since it tokenizes the input.

Cheers,
Sam

···

On Mar 24, 2005, at 7:44 AM, Sam Roberts wrote:

Sam Roberts wrote:

why not ParseTree or ripper ?

I have no idea what ripper does, but parse tree just gives symbols, it
doesn't have enough information for me to build a regex, as above, does
it?

Ripper basically is Ruby's integrated Ruby parser. It will invoke callbacks for every kind of construct it encounters.

This code snippet ought to get you started with it:

irb(main):017:0> class MyParser < Ripper
irb(main):018:1> def method_missing(name, *args)
irb(main):019:2> puts "#{name}: #{args.inspect}"
irb(main):020:2> end
irb(main):021:1> end
=> nil
irb(main):022:0> MyParser.new.parse("puts 'Hello World!' if true")
on__scan: ["puts"]
on__IDENTIFIER: ["puts"]
on__scan: [" "]
on__space: [" "]
on__scan: ["'"]
on__new_string: ["'"]
on__scan: ["Hello World!"]
on__add_string: [nil, "Hello World!"]
on__scan: ["'"]
on__string_end: [nil, "'"]
on__scan: [" "]
on__space: [" "]
on__scan: ["if"]
on__KEYWORD: ["if"]
on__argstart: ["Hello World!"]
on__fcall: [:puts, nil]
on__scan: [" "]
on__space: [" "]
on__scan: ["true"]
on__KEYWORD: ["true"]
on__varref: [:true]
on__if_mod: [nil, nil]
=> nil
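For readers on a current Ruby: the Ripper that ships in the standard library (Ruby 1.9 and later) names its events differently from the transcript above, so that session will not reproduce as-is. A rough present-day equivalent, offered as a sketch rather than as what was run in this thread, is `Ripper.lex`, which also reports where each token starts.

```ruby
require 'ripper'

source = "puts 'Hello World!' if true"

# Each entry begins with [line, column], the event name and the token text.
# Newer Rubies append a lexer state, which the trailing splat ignores.
tokens = Ripper.lex(source).map do |(line, column), event, text, *_rest|
  { line: line, column: column, event: event, text: text }
end

tokens.each do |tok|
  printf("%d:%-3d %-20s %p\n", tok[:line], tok[:column], tok[:event], tok[:text])
end
```

Whitespace comes back as tokens too, so joining the token texts gives back the original source.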

Maybe I don't understand what you need exactly, but exuberant ctags
supports both ruby and vi:
$ ctags --version
Exuberant Ctags 5.5.4, Copyright (C) 1996-2003 Darren Hiebert
  Compiled: May 12 2004, 14:32:50
  Addresses: <dhiebert@users.sourceforge.net>, http://ctags.sourceforge.net
  Optional compiled features: +wildcards, +regex

$ ctags --list-languages | grep -i ruby
Ruby

It works for me with emacs...

Tell me if I am completely off base.

Cheers,
Guillaume.

···

On Fri, 2005-03-25 at 02:21 +0900, Sam Roberts wrote:

I'm totally open to suggestions. I NEED tags to read code effectively.

I'm faster writing in ruby than in C, but I read C code way, way, way
faster due to the tool support I have (vim+tags) (I debug C faster, too,
because I have a great debugger - gdb.) I'm not happy about this
situation.

Florian Gross wrote:

Ripper basically is Ruby's integrated Ruby parser. It will invoke callbacks for every kind of construct it encounters.

This code snippet ought to get you started with it:

Oh, and you need to do require 'ripper' before you can use it, of course.

Quoting guslist@free.fr, on Fri, Mar 25, 2005 at 03:11:15AM +0900:

> I'm totally open to suggestions. I NEED tags to read code effectively.
>
> I'm faster writing in ruby than in C, but I read C code way, way, way
> faster due to the tool support I have (vim+tags) (I debug C faster, too,
> because I have a great debugger - gdb.) I'm not happy about this
> situation.

Maybe I don't understand what you need exactly, but exuberant ctags
supports both ruby and vi:

$ ctags --version
Exuberant Ctags 5.5.4, Copyright (C) 1996-2003 Darren Hiebert
  Compiled: May 12 2004, 14:32:50
  Addresses: <dhiebert@users.sourceforge.net>, http://ctags.sourceforge.net
  Optional compiled features: +wildcards, +regex

$ ctags --list-languages | grep -i ruby
Ruby

"Support", and "supports well" aren't the same thing.

It works for me with emacs...

Pico supports editing text, but it doesn't really compare to emacs, does
it? :)

Tell me if I am completely off base.

Half on, half off.

I think you've internalized the limitations, or don't realize how good
it could be.

It doesn't tag constants, and it doesn't support qualified tags.

Tags are downright useless (IMNSHO) if they aren't qualified in an OO
language. C only has one function per name (ignoring static functions).

Tag a large code-base, now jump to tag "new" (trick question, exctags
doesn't understand that "initialize" is called as "new").

Ok, now jump to "each", is it the right tag? No way, you've got one for
almost every class, because it's the Ruby Way, and you've even more
definitions of #to_s. How much fun do you have walking them all to find
the one you wanted?

In well-supported languages, you would use --extra=+q, and get qualified
tags, so you could do:

  <tag-cmd>Vc<TAB-complete name>ard.t<TAB-complete>o_s

And in about 5 keystrokes, you'd be at the definition of the method you
wanted, Vcard.to_s

It's also a cheap and fast class browser:

  <tag-cmd>Vp<TAB><TAB>

would give you a list of all methods, classes, modules, and constants in
the module Vpim, and you can keep drilling down, exploring what's there.
Ah... heaven.

Doesn't work with exuberant ctags. I looked at adding it, but it's
awful. You need to maintain a stack of class module names so you know
where you are. How bad can that be, you ask? Terrible. You don't know
where they end. An "end" means all kinds of things in ruby. Maybe I'll
give it another shot, but it looked hard to me.

It also has minor bugs, like it doesn't grok this:

  class SomeModule::Foo
  end

Maybe if you think exctags is OK, you've never felt the intoxicating
power of a fully operational Battle Star^w^w tagging system...

Cheers,
Sam

···

On Fri, 2005-03-25 at 02:21 +0900, Sam Roberts wrote:

> I'm totally open to suggestions. I NEED tags to read code effectively.

$ ctags --list-languages | grep -i ruby
Ruby

It works for me with emacs...

And, well, both versions of the pickaxe talk about 'rtags' in the irb section:

  http://www.rubycentral.com/book/irb.html

But I'm with Guillaume, perhaps I'm missing something.

Cameron

Quoting flgr@ccan.de, on Fri, Mar 25, 2005 at 02:34:48AM +0900:

Florian Gross wrote:

>Ripper basically is Ruby's integrated Ruby parser. It will invoke
>callbacks for every kind of construct it encounters.
>
>This code snippet ought to get you started with it:

Oh, and you need to do require 'ripper' before you can use it, of course.

There are no files released, and the cvs is not building for me, I'll
have to try later.

[ensemble] ~/p/ruby/rtags/other/ripper/ripper $ make
make: Entering directory `/Users/sam/p/ruby/rtags/other/ripper/ripper'
touch parse.y
/opt/local/bin/ruby tools/preproc.rb parse.y > ripper.y
bison -t -v -oripper.c ripper.y
gperf -p -j1 -i 1 -g -o -t -N rb_reserved_word -k'1,3,$' keywords > lex.c
/opt/local/bin/ruby tools/list-parse-event-ids.rb parse.y | /opt/local/bin/ruby tools/generate-eventids1.rb > eventids1.c
gcc -fno-common -O -pipe -I/opt/local/include -fno-common -pipe -fno-common -I. -I/opt/local/lib/ruby/1.8/powerpc-darwin6.8 -I/opt/local/lib/ruby/1.8/powerpc-darwin6.8 -I. -O -pipe -I/opt/local/include -DRIPPER -c ripper.c -o ripper.o
cc -dynamic -bundle -undefined suppress -flat_namespace -L"/opt/local/lib" -o ripper.bundle ripper.o -lruby -ldl -lobjc
ld: multiple definitions of symbol _rb_reserved_word
ripper.o definition of _rb_reserved_word in section (__TEXT,__text)
/opt/local/lib/libruby.dylib(parse.o) definition of _rb_reserved_word
make: *** [ripper.bundle] Error 1

Sam

Quoting flgr@ccan.de, on Fri, Mar 25, 2005 at 02:34:48AM +0900:

Sam Roberts wrote:

>>why not ParseTree or ripper ?
>
>I have no idea what ripper does, but parse tree just gives symbols, it
>doesn't have enough information for me to build a regex, as above, does
>it?

Ripper basically is Ruby's integrated Ruby parser. It will invoke
callbacks for every kind of construct it encounters.

Hm, looks like it returns whitespace, and other non-syntactic elements.
Good. Does it return end-of-line, and is it just a lexer, or is it a
parser, too?

I.e., does

  MyParser.new.parse("class Foo; Bar = 4; end;")

tell me that the Foo is a class name, and Bar is a constant name, or do
I have to deduce that?

If so, maybe I'll try.

The rubyforge page makes it look as if it may be written in C based on
ruby's parser, using lex&yacc. If so, that would be sweet, because it
might be fast.

Thanks,
Sam

Maybe if you think exctags is OK, you've never felt the intoxicating
power of a fully operational Battle Star^w^w tagging system...

oops. time overlap.

Obviously, I'm not a tags poweruser. So what is an example of a fully
operational tagging system?

Cameron

Well, I think you have been addicted to tags way more than I have. I use
them on a couple of occasions, but because of their limited usefulness in
Ruby, I never thought too much of them. Now, what you told me about the
potential power of well integrated tags does appeal to me a lot.
Consider me as a beta tester if you get anything done on the subject.

Thanks for the explanation,
Guillaume.

···

On Fri, 2005-03-25 at 03:59 +0900, Sam Roberts wrote:

Half on, half off.

I think you've internalized the limitations, or don't realize how good
it could be.

Sam Roberts wrote:

Quoting flgr@ccan.de, on Fri, Mar 25, 2005 at 02:34:48AM +0900:

Florian Gross wrote:

Ripper basically is Ruby's integrated Ruby parser. It will invoke callbacks for every kind of construct it encounters.

This code snippet ought to get you started with it:

Oh, and you need to do require 'ripper' before you can use it, of course.

There are no files released, and the cvs is not building for me, I'll
have to try later.

Odd, doesn't it come bundled with Ruby already?

Just doing require 'ripper' worked for me. (I'm on the win32 one-click installer.)

Sam Roberts wrote:

Ripper basically is Ruby's integrated Ruby parser. It will invoke callbacks for every kind of construct it encounters.

Hm, looks like it returns whitespace, and other non-syntactic elements.
Good. Does it return end-of-line, and is it just a lexer, or is it a
parser, too?

I.e., does

  MyParser.new.parse("class Foo; Bar = 4; end;")

tell me that the Foo is a class name, and Bar is a constant name, or do
I have to deduce that?

If so, maybe I'll try.

The above produces quite a few events. Here's a few which seem to be relevant to you:

on__ASSIGN: ["="]
on__assignable: [:Bar, nil]
[...]
on__assign: [nil, 4]
[...]
on__class: [:Foo, nil, nil]
on__set_line: [nil, 1]

It's probably best to try this out on your own.
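As a sketch of how this could feed a tagger, the following walks the tree returned by `Ripper.sexp` and keeps a stack of enclosing class and module names. It assumes the Ripper bundled with Ruby 1.9 or later, whose API differs from the version discussed in this thread. The class name `TagWalker` and the dotted `Foo.Bar` naming are choices made for this example only, and `def self.x` or `class << self` are not treated specially.

```ruby
require 'ripper'

# Hypothetical sketch: collect qualified tags plus the original source line.
class TagWalker
  attr_reader :tags

  def initialize(source)
    @lines = source.lines.map(&:chomp)
    @scope = []                       # enclosing class/module names
    @tags  = []
    walk(Ripper.sexp(source))         # sexp is nil on a syntax error
  end

  private

  def walk(node)
    return unless node.is_a?(Array)
    case node.first
    when :class, :module
      consts = const_tokens(node[1])  # one token for Foo, two for Foo::Bar
      return if consts.empty?
      name = consts.map { |tok| tok[1] }.join('.')
      record(name, consts.first[2][0])
      @scope.push(name)
      node.drop(2).each { |child| walk(child) }
      @scope.pop                      # the tree tells us where the body ends
    when :def
      record(node[1][1], node[1][2][0])
      node.drop(2).each { |child| walk(child) }
    when :assign
      target = node[1]
      if target.is_a?(Array) && target[0] == :var_field &&
         target[1].is_a?(Array) && target[1][0] == :@const
        record(target[1][1], target[1][2][0])   # constant assignment
      end
      node.drop(2).each { |child| walk(child) }
    else
      node.each { |child| walk(child) }
    end
  end

  # Gather every [:@const, name, [line, column]] token below a node.
  def const_tokens(node)
    return [] unless node.is_a?(Array)
    return [node] if node.first == :@const
    node.flat_map { |child| const_tokens(child) }
  end

  def record(name, line_no)
    @tags << { tag: (@scope + [name]).join('.'), line: @lines[line_no - 1] }
  end
end

source = "module Foo\n  class Bar\n    LIMIT = 4\n    def to_s\n    end\n  end\nend\nclass Foo::Baz\nend\n"
TagWalker.new(source).tags.each { |t| puts "#{t[:tag]}\t#{t[:line].inspect}" }
```

Working from the tree sidesteps the question of which "end" closes which construct, because each class or module node already contains its own body.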

The rubyforge page makes it look as if it may be written in C based on
ruby's parser, using lex&yacc. If so, that would be sweet, because it
might be fast.

I still think that it is built directly into Ruby and that it also comes with it. It reuses the same parser as Ruby and thus ought to be very fast.

I'm on MacOSX and built Ruby myself, and there is no 'ripper' lib...

ruby -v --> ruby 1.8.2 (2004-12-25) [powerpc-darwin7.8.0]

- Jamis

···

On Mar 24, 2005, at 1:34 PM, Florian Gross wrote:

Sam Roberts wrote:

Quoting flgr@ccan.de, on Fri, Mar 25, 2005 at 02:34:48AM +0900:

Florian Gross wrote:

Ripper basically is Ruby's integrated Ruby parser. It will invoke callbacks for every kind of construct it encounters.

This code snippet ought to get you started with it:

Oh, and you need to do require 'ripper' before you can use it, of course.

There are no files released, and the cvs is not building for me, I'll
have to try later.

Odd, doesn't it come bundled with Ruby already?

Just doing require 'ripper' worked for me. (I'm on the win32 one-click installer.)