I want to write my own wiki markup language. Pure regular expressions aren't enough; I need a proper parser that keeps track of state (such as whether I'm inside a heading or a bold run).
I thought I'd give the Syntax gem a try, but I'm confused about some of the specifics.
1) What is a 'region', and how do I use the start_region method? It isn't documented in the API docs or in the source. (I think regions are what I want for nesting tags.)
2) Do I have to close groups and regions explicitly, or does that happen automatically under certain circumstances? Does starting one group close the previous one? Do repeated calls that open the same group get aggregated into a single token? (Is that how text accumulates in :normal groups?)
3) How do I keep track of state across successive calls to #step? I tried an instance variable, but it didn't seem to persist between calls, which is why the code below uses globals.
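To pin down what I'm asking in 1-3, here is a toy model, using only Ruby's standard StringScanner, of how I guess groups and regions behave. It is not Syntax's source, and the ToyTokenizer class and its Token struct are made up for illustration. In the toy: consecutive start_group calls with the same group accumulate into one token, switching groups flushes the pending token, start_region/end_region flush pending text and emit an empty marker token tagged :region_open or :region_close, and state lives in an instance variable initialised in setup, because one object handles every call to step. If Syntax::Tokenizer doesn't work like this, that's what I'd like to know.

```ruby
require 'strscan'

# Hypothetical toy tokenizer; NOT the Syntax gem's source. It models my
# guess at the semantics of start_group, start_region and end_region.
Token = Struct.new( :text, :group, :instruction )

class ToyTokenizer
  def tokenize( text )
    @scanner = StringScanner.new( text )
    @chunk   = +""
    @group   = :normal
    @tokens  = []
    setup
    step until @scanner.eos?
    flush_chunk                      # emit whatever is still accumulated
    @tokens
  end

  private

  # State lives in instance variables: the same object handles every step.
  def setup
    @in_bold = false
  end

  # "**" toggles a :bold region; every other character is :normal text.
  def step
    if @scanner.scan( /\*\*/ )
      @in_bold ? end_region( :bold ) : start_region( :bold )
      @in_bold = !@in_bold
    else
      start_group( :normal, @scanner.getch )
    end
  end

  # Same group: append to the current chunk. New group: flush it first.
  def start_group( group, data = nil )
    flush_chunk if group != @group
    @group = group
    @chunk << data if data
  end

  # Regions flush pending text, then emit an empty marker token.
  def start_region( group )
    flush_chunk
    @tokens << Token.new( "", group, :region_open )
  end

  def end_region( group )
    flush_chunk
    @tokens << Token.new( "", group, :region_close )
  end

  def flush_chunk
    @tokens << Token.new( @chunk, @group, :none ) unless @chunk.empty?
    @chunk = +""
  end
end

ToyTokenizer.new.tokenize( "a **b** c" ).each do |t|
  puts "#{t.group} (#{t.instruction}) #{t.text.inspect}"
end
```

In this model nothing ever needs an explicit "close group" call: a group ends implicitly when a different group or a region marker starts, or when tokenizing finishes.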
Here is my terrible, broken attempt at the basics of what I'm after. Am I totally misunderstanding how to use Syntax?
require 'rubygems'
require 'syntax'    # older RubyGems used require_gem 'syntax'

class OWLScribble < Syntax::Tokenizer
  def step
    if heading = scan( /^={1,6}/ )
      # Remember the level so the closing marker ends the same region
      $heading_level = heading.length
      start_region "heading level #{$heading_level}".intern
      $heading_end = Regexp.new( heading + "\\s*" )
    elsif $heading_end && scan( $heading_end )
      # The closing match includes trailing whitespace, so its length is
      # not the heading level; use the level stored at the opening marker
      end_region "heading level #{$heading_level}".intern
      $heading_end = nil
    elsif char = scan( /^[\r\n]/ )
      start_group :paragraph, char
    elsif scan( /\*\*/ )
      # "**" toggles bold on and off
      if $inbold
        end_region :bold
        $inbold = nil
      else
        start_region :bold
        $inbold = true
      end
    elsif char = scan( /./ )
      start_group :normal, char
    else
      # Only "\n" gets here (/./ doesn't match it); consume it silently
      scan( /[\r\n]/ )
    end
  end
end

Syntax::SYNTAX[ 'owlscribble' ] = OWLScribble

str = <<END
Intro paragraph
= Heading 1 =
First **paragraph** under the heading.
== Second **Heading** = very yes ==
Another paragraph.
END

tokenizer = Syntax.load( "owlscribble" )
tokenizer.tokenize( str ) do |token|
  puts "#{token.group} (#{token.instruction}) #{token}"
end
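For comparison, here is roughly the behaviour I'm after, written directly against Ruby's standard StringScanner instead of Syntax. It's only a sketch under my own assumptions: WikiScanner and its [group, instruction, text] triples are made up, a heading opens only with 1-6 '=' at the start of a line, and it closes only at the same number of '=' at the end of that line, so the lone '=' in "== Second **Heading** = very yes ==" stays plain text. Heading level and bold state are ordinary instance variables.

```ruby
require 'strscan'

# Hypothetical sketch, not Syntax's API: turns the wiki markup into
# [group, instruction, text] triples using Ruby's StringScanner.
class WikiScanner
  def initialize( source )
    @s = StringScanner.new( source )
    @heading_level = nil   # level of the heading currently open, if any
    @in_bold = false       # whether a "**" run is currently open
    @text = +""            # plain text not yet emitted
    @out = []
  end

  def tokens
    until @s.eos?
      if @s.beginning_of_line? && ( marker = @s.scan( /={1,6}/ ) )
        @heading_level = marker.length
        region( :"heading level #{@heading_level}", :region_open )
      elsif @heading_level && @s.scan( /={#{@heading_level}}[ \t]*$/ )
        # Close only on the same number of '=' at the end of the line
        region( :"heading level #{@heading_level}", :region_close )
        @heading_level = nil
      elsif @s.scan( /\*\*/ )
        region( :bold, @in_bold ? :region_close : :region_open )
        @in_bold = !@in_bold
      else
        @text << @s.getch
      end
    end
    flush
    @out
  end

  private

  # Emit any pending text, then a zero-length region marker.
  def region( group, instruction )
    flush
    @out << [ group, instruction, "" ]
  end

  def flush
    @out << [ :normal, :none, @text ] unless @text.empty?
    @text = +""
  end
end

source = "= Heading 1 =\nSome **bold** text.\n"
WikiScanner.new( source ).tokens.each do |group, instruction, text|
  puts "#{group} (#{instruction}) #{text.inspect}"
end
```

Storing the open heading's level, rather than measuring the closing match, is what keeps each close marker paired with its matching open marker.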
--
(-, /\ \/ / /\/