Problems with racc: $end token

Hello,

I’m trying to write a simple parser using racc, and I’m apparently
retarded. I’ve looked at all of the example code I can find, including
rdtool, and I cannot seem to resolve this problem. I am convinced there
is something small I am missing, because it’s apparently small enough that
I can’t see the difference between my grammar/parser and everyone else’s.

I always get the following error when I try to run my parser:

parse error on token ‘$end’ => ‘false’

The basic text is mine, but the error is kicked out by racc. For some
reason, racc is considering the $end token to be a parse error, rather
than considering it the end of parsing. I’ve tried understanding all the
code involved, including racc’s parser.rb, the racc script itself, the
generated parser.rb file, and a good bit more. I just can’t get it.

Here are the pertinent portions of my grammar file:

class Cricket::Parser

token DEFINE NAME STRING PARAM LCURLY RCURLY VALUE

rule
file: objects
;

objects: object { [val[0]] }
| objects object { [val[0], val[1]].flatten }
;

object: DEFINE NAME LCURLY vars RCURLY {
Cricket::Object.create(val[1],val[3]) }
;

vars: var
| vars var
;

var: PARAM VALUE { [val[0],val[1]] }
;

end

----inner

def parse(src)
#puts "src is " + BLUE + src + RESET
@src = src

$invar = false
$inobject = false
$done = false

begin
    do_parse
rescue SyntaxError
    $stderr.print "Got a syntax error: #{$!}\n"
    exit
end

end

def next_token

if @src.length == 0
puts "returning end"
#return [false, 0]
#return [false, '$']
return [false, false]
end

end

As you can see, I’ve tried returning different kinds of things, to no effect
(although the value of the $end token changes, the error still gets kicked
up).

The really strange thing is that the parser I wrote earlier had the same
problem unless I passed a file (not a string) to yylex. This seems to
imply that the EOF from the file somehow avoids this error. I’m not using
yylex in this case (as my tokens are quite easy), and I’d really like to
just understand what the problem is.

Any pointers would be greatly appreciated, but apparently pointing me to
further example code is not helpful, unless you can point out how this
sample code avoids this error. Again, I expect it’s something small, but
it’s small enough that I’ve missed it twice now, both times having written
the grammar from scratch.

Thanks,
Luke

···


First they came for the hackers. But I never did anything
illegal with my computer, so I didn’t speak up.
Then they came for the pornographers. But I thought there was
too much smut on the Internet anyway, so I didn’t speak up.
Then they came for the anonymous remailers. But a lot of nasty
stuff gets sent from anon.penet.fi, so I didn’t speak up.
Then they came for the encryption users. But I could never
figure out how to work PGP anyway, so I didn’t speak up.
Then they came for me. And by that time there was no one left
to speak up.
– Alara Rogers, Aleph Press

I apologize for not being able to dig more into this, but for
my next_token method I have:

def next_token
@q.shift
end

Are there two different ways to set up the tokenizing (i.e. next_token and
parse method)? I vaguely recall that there might be, but can’t
look it up right now.

···

On Friday, 12 December 2003 at 0:42:30 +0900, Luke A. Kanies wrote:

Hello,

def next_token

if @src.length == 0
puts "returning end"
#return [false, 0]
#return [false, '$']
return [false, false]
end

end


Jim Freeze

Well, kind of, but we’re both doing the same thing. Most of the examples
I’ve seen preparse the entire source string into an array, and then just
pop tokens off the stack. I’m using the ‘next_token’ routine to actually
collect the tokens and return them.

So what I’m doing is (theoretically) functionally equivalent, I’m just
doing the split-into-tokens inside next_token instead of inside parse,
which seems to make a bit more sense to me.

However, I’ll try converting to stacking the text into an array of tokens
and see what happens. I don’t understand how that could solve the
problem, but that doesn’t mean it won’t.

Luke
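
For comparison, here is a minimal sketch of the collect-inside-next_token approach described above, assuming a StringScanner-based lexer. The class name LazyLexer and its regexes are hypothetical; only the token names come from the grammar earlier in the thread. It runs without racc, so it shows just the token stream, not the parse.

```ruby
require "strscan"

# Hypothetical lazy tokenizer: hands out one [symbol, value] pair per call,
# then [false, false] once the input is exhausted.
class LazyLexer
  def initialize(src)
    @ss = StringScanner.new(src)
  end

  def next_token
    @ss.skip(/\s+/)                      # drop whitespace between tokens
    return [false, false] if @ss.eos?    # end-of-input marker

    if    @ss.scan(/define\b/) then [:DEFINE, @ss.matched]
    elsif @ss.scan(/\{/)       then [:LCURLY, @ss.matched]
    elsif @ss.scan(/\}/)       then [:RCURLY, @ss.matched]
    elsif @ss.scan(/\w+/)      then [:NAME,   @ss.matched]
    else  raise "unexpected input at offset #{@ss.pos}"
    end
  end
end

lexer  = LazyLexer.new("define contact { }")
tokens = []
while (tok = lexer.next_token).first     # stop at the [false, false] marker
  tokens << tok
end
p tokens
```

If the token stream printed here looks right and the parser still fails on $end, that would point at the grammar rather than at the tokenizer.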

···

On Fri, 12 Dec 2003, Jim Freeze wrote:


A great many people think they are thinking when they are merely
rearranging their prejudices.
– William James

Hello,

def next_token

if @src.length == 0
puts "returning end"
#return [false, 0]
#return [false, '$']
return [false, false]

What if you do: return ["", ""] ?

···

On Friday, 12 December 2003 at 1:28:57 +0900, Luke A. Kanies wrote:



Jim Freeze

After the last of 16 mounting screws has been removed from an access
cover, it will be discovered that the wrong access cover has been
removed.

I still get a syntax error, but this time the apparently-magical token
‘$end’ is not used.

I know I’m supposed to return false as the token, and that somehow racc
converts that into the $end token. I just don’t know how it does that,
nor do I know why racc doesn’t then gracefully cease trying to parse,
rather than continuing on and hitting a syntax error.

I’ve tried looking through the source code, and I’m extremely confused. I
can find what I am pretty sure is all of the parsing stuff, but I’ve tried
adding puts statements to see what the heck is going on, and they never
get called. Or at least, I don’t see their output. I even tried
deleting all references to external modules to make sure that I wasn’t
loading an unmodified library, and that didn’t seem to work.

So, I guess I’ll continue trying to understand the code without knowing
how to turn on debugging (even though it appears to be there) and without
being able to add my own debug statements. Yay.

Thanks.

Luke

···

On Fri, 12 Dec 2003, Jim Freeze wrote:

What if you do: return ["", ""] ?


That was just a drill of the emergency y2k system. Had this been a
real emergency, we would’ve also dumped a bucket of spiders on you and
yelled out “civilization is collapsing!”

Well, in the sample code I see:

@q.push [false, '$']   # optional from 1.3.7

I have successfully left that out of my parsers.
Essentially, the samples just tokenize the file
(and store the tokens in @q) then call do_parse.

do_parse apparently gets tokens from the queue by
calling next_token, which returns the tokens one
at a time by calling @q.shift. This would suggest
that when you are done all you need to do is return
the same value that .shift gives on an empty array: nil.
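
A minimal sketch of that queue-based pattern in plain Ruby, with no racc involved. QueueLexer is a hypothetical stand-in for the inner section of a parser, and the token values are made up. It only shows what next_token hands back once the queue runs dry; how racc reacts to that value is as described in this thread, not verified here.

```ruby
# Hypothetical stand-in for the "---- inner" section of a racc parser:
# tokens are collected up front and handed out one at a time.
class QueueLexer
  def initialize(tokens)
    @q = tokens.dup
    @q.push [false, '$']   # explicit end marker, as in the sample code
  end

  def next_token
    @q.shift               # Array#shift returns nil once the queue is empty
  end
end

lexer = QueueLexer.new([[:PARAM, "alias"], [:VALUE, "Lawrence Hubenak"]])
seen  = 4.times.map { lexer.next_token }
p seen
```

The last element of seen is nil, which is what a parser would receive if the explicit end marker were left out.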

···

On Friday, 12 December 2003 at 2:16:53 +0900, Luke A. Kanies wrote:



Jim Freeze

Okay, I may have actually tracked this down to my apparent ignorance of
ruby’s regexes.

The following code does not behave as I expect at all:

string = "\nalias Jamie Dowdy\n"
string.sub!(/^./,"")

print "[#{string}]\n"

This code strips out the ‘a’ in ‘alias’. In other words, the anchor '^'
is anchoring against the beginning of a line, rather than the beginning of
the string.

Not surprisingly, this, um, really screws up my pattern matching.

How do I specifically anchor against the beginning of a string in ruby,
not the beginning of a line in a string?

Getting that fixed may solve my problem here (and with my other parser,
since I obviously expected this behaviour and likely made the same mistake
in my other parser).

Thanks,
Luke

···


Due to circumstances beyond your control, you are master of your fate
and captain of your soul.

This code strips out the ‘a’ in ‘alias’. In other words, the anchor ‘^’
is anchoring against the beginning of a line, rather than the beginning of
the string.

I had that problem a day or two ago…

How do I specifically anchor against the beginning of a string in ruby,
not the beginning of a line in a string?

Use \A for the beginning, and \Z for the end. I believe.
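
A short sketch of the difference, reusing the string from the question. The variable names are only for illustration.

```ruby
string = "\nalias Jamie Dowdy\n"

# ^ matches at the start of every line, so it finds the "a" after the newline.
line_anchored = string.sub(/^./, "")

# \A matches only at the very start of the string; "." does not match the
# leading newline, so nothing is replaced.
string_anchored = string.sub(/\A./, "")

# To strip leading whitespace from the string itself, anchor with \A.
leading_ws = string.sub(/\A\s+/, "")

p line_anchored
p string_anchored
p leading_ws
```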

···

On Fri, Dec 12, 2003 at 03:17:51AM +0900, Luke A. Kanies wrote:

Ceri Storey cez@necrofish.org.uk

Well, in the sample code I see:

@q.push [false, '$']   # optional from 1.3.7

Yep, I’ve tried that, along with about nine other variations of having an
end token.

I have successfully left that out of my parsers.
Essentially, the samples just tokenize the file
(and store the tokens in @q) then call do_parse.

do_parse apparently gets tokens from the queue by
calling next_token, which returns the tokens one
at a time by calling @q.shift. This would suggest
that when you are done all you need to do is return
the same value that .shift gives on an empty array: nil.

Yeah, all the code does it that way, but there shouldn’t be a functional
difference between collecting the tokens in parse() and returning them in
next_token(), and just collecting and returning them in next_token().
Either way, I’ve switched to a method like the examples, and I’ve
corrected my regex problems, and I still get an error.

At this point I think it’s a problem with my grammar, that I’m somehow not
correctly specifying the end of the parsing. Obviously, though, I don’t
know how to say “hey, the file is over, stop looking” or whatever the
magic words are. I know that the false token is supposed to do that,
but for some reason racc thinks it shouldn’t be expecting that token yet.

In case anyone feels like pointing out my idiocy, here’s my grammar as it
stands now:

token DEFINE NAME STRING PARAM LCURLY RCURLY VALUE RETURN COMMENT
INLINECOMMENT EOF

rule
file: objects EOF
;

objects: object { [val[0]] }
| objects object { [val[0], val[1]].flatten }
;

object: DEFINE NAME LCURLY RETURN vars RCURLY returns {
Cricket::Object.create(val[1],val[3]) }
;

vars: var { [val[0]] }
| vars var { [val[0], val[1]].flatten }
;

var: PARAM VALUE returns { [val[0],val[1]] }
;

returns: return
| returns return
;

return: comment RETURN
;

comment: # nothing
| COMMENT
| INLINECOMMENT
;

end

It’s for parsing text like this:

# a comment

define contact {
contact_name vwf1607 ; inline comment
alias Lawrence Hubenak
host_notification_period none
host_notification_commands host-notify-by-email
service_notification_period none
service_notification_commands notify-by-email
email lawrence.hubenak@hcahealthcare.com
pager lawrence.hubenak@my2way.com
}

I.e., nagios configs.
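
One thing that may be worth checking in the vars rule, separate from the $end error: if each var really evaluates to a [param, value] pair, then [val[0], val[1]].flatten merges the pairs into one flat list. A small sketch in plain Ruby with values taken from the sample config. Whether this matters depends on what Cricket::Object.create expects, which the thread does not show.

```ruby
vars     = [["contact_name", "vwf1607"]]     # value of `vars` so far
next_var = ["alias", "Lawrence Hubenak"]     # value of the next `var`

flattened = [vars, next_var].flatten         # what the action in the grammar does
appended  = vars + [next_var]                # keeps each pair intact

p flattened
p appended
```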

Well, I guess I’ll figure it out eventually; I was just hoping not to spend
much more than the 8 or so hours I’ve already wasted on it.

Thanks,
Luke

···

On Fri, 12 Dec 2003, Jim Freeze wrote:


I have an answering machine in my car. It says, “I’m home now. But
leave a message and I’ll call when I’m out.” – Stephen Wright


Yeah, all the code does it that way, but there shouldn’t be a functional
difference between collecting the tokens in parse() and returning them in
next_token(), and just collecting and returning them in next_token().
Either way, I’ve switched to a method like the examples, and I’ve
corrected my regex problems, and I still get an error.

I agree. Could be a grammar/file syntax mismatch.

At this point I think it’s a problem with my grammar, that I’m somehow not
correctly specifying the end of the parsing. Obviously, though, I don’t
know how to say “hey, the file is over, stop looking” or whatever the
magic words are. I know that the false token is supposed to do that,
but for some reason racc thinks it shouldn’t be expecting that token yet.

In case anyone feels like pointing out my idiocy, here’s my grammar as it
stands now:

Ok, I’ll take a look. At your grammar that is, not your idiocy. :)

···

On Friday, 12 December 2003 at 3:55:31 +0900, Luke A. Kanies wrote:

On Fri, 12 Dec 2003, Jim Freeze wrote:


Jim Freeze

This is the ____LAST time I take travel suggestions from Ray Bradbury!


Use \A for the beginning, and \Z for the end. I believe.

Use ‘\z’ (lowercase) if you want to match the end.

irb
irb(main):001:0> /x\Z/.match("ax\n").to_a
=> ["x"]
irb(main):002:0> /x\z/.match("ax\n").to_a
=> []
irb(main):003:0> /x\z/.match("ax").to_a
=> ["x"]
irb(main):004:0>

···

On Fri, 12 Dec 2003 03:31:11 +0900, Ceri Storey wrote:

On Fri, Dec 12, 2003 at 03:17:51AM +0900, Luke A. Kanies wrote:


Simon Strandgaard

Hi Luke

I took your code and repeated the $end problem.
I removed the problem by putting an optional return
after your define { } block.

See code attached:

gram.y (2.62 KB)

···

On Friday, 12 December 2003 at 3:55:31 +0900, Luke A. Kanies wrote:


Jim Freeze

Bubble Memory, n.:
A derogatory term, usually referring to a person’s
intelligence. See also “vacuum tube”.

Um, wow, thank you!

I ended up finally figuring out how to turn on debugging in racc (yep, a
hack: I had to use -E and then edit the parser.rb file manually, setting
@yydebug = true), and this enabled me to figure out that, well, my
grammar didn’t do anything like I expected.

So, I ended up basically rewriting the grammar itself. In doing so, I was
finally able to avoid the $end error. In other words, it was definitely a
grammar problem, and I probably would have caught it much sooner if I had
figured out earlier how to turn debugging on. The silly thing is, I know
there’s an API for turning it on, but I haven’t been able to extract it
yet.

The problem here is that I have used perl’s Parse::Yapp, which behaves
quite differently in many ways, and most especially in how it deals with
syntax errors. It is totally my fault, because I was unconsciously
expecting racc to behave a certain way, and when it didn’t I got very
confused. With the advent of debugging, I figured it out relatively
quickly.

As a side note, the grammar rules must set the value of ‘result’. By
default, racc does not use the value of the action block as the rule’s value;
instead you have to assign to ‘result’ and racc returns that for you.
This also caused a bunch of problems for me.

To summarize:

There is debugging in racc, and using racc -E to embed the parser and then
manually setting @yydebug = true can turn it on. I’m sure there’s a
better way.

Also, you must set ‘result’ manually, although it can hold any type of
value.

Thanks for all your help, Jim.

Luke

···

On Fri, 12 Dec 2003, Jim Freeze wrote:



Today I dialed a wrong number…The other person said, “Hello?” and
I said, “Hello, could I speak to Joey?”…
They said, “Uh…I don’t think so…he’s only 2 months old.”
I said, “I’ll wait.” – Steven Wright

Hi,

In mail “Re: problems with racc: $end token”

There is debugging in racc, and using racc -E to embed the parser and then
manually setting @yydebug = true can turn it on. I’m sure there’s a
better way.

Set @yydebug=true in your “inner” and use racc -g.

% cat t.y
class MyParser
  options no_result_var
rule
  program: list
  list   :           { [] }
         | list ITEM { val[0].push val[1]; val[0] }
end

---- inner
  def parse
    @tokens = [
      [:ITEM, '1'],
      [:ITEM, '2'],
      [:ITEM, '3']
    ]
    @yydebug = true   #####
    do_parse
  end

  def next_token
    @tokens.shift
  end

---- footer
p MyParser.new.parse

~/tmp % racc -ot.rb t.y
~/tmp % ruby t.rb
["1", "2", "3"]

~/tmp % racc -g -ot.rb t.y
~/tmp % ruby t.rb
reduce <none> --> list
[ (list []) ]

goto 2
[ 0 2 ]

read :ITEM(ITEM) "1"

shift ITEM

  (snip)

Also, you must set ‘result’ manually, although it can hold any type of
value.

Try this:

class MyParser
options no_result_var #### this line
rule

Regards,
Minero Aoki

···

“Luke A. Kanies” luke@madstop.com wrote: