I’ve been looking at lex.c and parse.y and parse.c, …
Unless someone corrects me, lex.c is an unused remnant.
parse.c can be ignored (it is generated by Yacc from parse.y).
The real Ruby lexer is in parse.y (function yylex).
How might one simply break Ruby code into tokens?
Hal
While writing IRB, Keiju ISHITSUKA seems to have taken
the trouble to expose his lexer to other callers.
Thank you.
ruby-lex is a Ruby emulation of the interpreter's lexer.
(It may have slight differences.)
As part of IRB, it's in the standard distribution.
I haven't seen any examples. The script below tokenizes itself,
but you can change the target to a script file.
···
"Hal Fulton" <hal9000@hypermetrics.com> wrote:
#------------------------------------
require 'irb/ruby-lex'
include RubyToken

#File.open('testfile.rb') do |infile|   # see: lex.set_input
  tree = []                             # one hash per token
  ikeys = [:name, :op, :value, :node]
  lex = RubyLex.new
  DATA.rewind                           # back to the top, so the script reads itself
  lex.set_input(DATA)                   # (DATA) or (infile)
  line = lex.get_readed                 # read (past tense;)
  while tk = lex.token
    tkc = tk.class.to_s.sub(/\ARubyToken::/, '')
    tkih = { :tk      => tkc,
             :line    => tk.line_no,
             :seek    => tk.seek,
             :char_no => tk.char_no }
    # some tokens have extra attributes.
    ikeys.each do |tkk|
      tkih[tkk.to_sym] = tk.respond_to?(tkk) && tk.send(tkk)
    end
    tree << tkih
    if tkc == 'TkNL'
      puts line unless line =~ /\A\s*\Z/   # line sep
      line = lex.get_readed                # next line
      # Note: read line left here otherwise
      #       position of NL is mis-reported [BUG?].
    end
  end
  tree.each do |tkh|
    printf("line %-3d @%3d: %-12s", tkh[:line], tkh[:char_no], tkh[:tk])
    printf(" [%s]", tkh[:name]) if tkh[:name]
    tkh.each do |k, v|
      next unless (ikeys - [:name]).include?(k)
      printf(" %s(%s)", k, v) if v
    end
    puts
    puts if tkh[:tk] == 'TkNL'
  end
#end # File.open
__END__
#------------------------------------
There may be other methods of interest in:
lib\ruby\1.8\irb\slex.rb
lib\ruby\1.8\irb\ruby-lex.rb
lib\ruby\1.8\irb\ruby-token.rb
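
For anyone on Ruby 1.9 or later: the standard library also ships Ripper, and Ripper.lex may be a simpler way to break source into tokens. It is a different tool from ruby-lex, not the lexer IRB used in 1.8, so treat this as a rough sketch. The tokenize helper and the sample input are made up for illustration.

```ruby
require 'ripper'

# Hypothetical helper (not part of IRB or ruby-lex): returns one hash
# per token, with fields loosely mirroring the ones printed above.
def tokenize(source)
  # Each Ripper.lex entry is [[line, column], type, text, ...];
  # newer Rubies append a lexer state, which is ignored here.
  Ripper.lex(source).map do |(line, col), type, text, _state|
    { :line => line, :char_no => col, :tk => type, :text => text }
  end
end

tokens = tokenize("x = 1 + 2\n")
tokens.each do |t|
  printf("line %-3d @%3d: %-12s [%s]\n",
         t[:line], t[:char_no], t[:tk], t[:text].inspect)
end
```

Unlike ruby-lex, Ripper reports whitespace as tokens too (type :on_sp), so joining the :text fields gives back the original source.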
daz