Quality of error messages

>> % ./script.rb
>> script.rb:6: warning: inconsistent indentation level.
>> script.rb:13: parse error
>>
>> This should make it easy to find such errors.

>That seems like a nice idea to me. This would be nearly the same as the
>autoindenter, only that it does not indent but spits out warnings.

I'm afraid that it might cause a tab-space indentation war like
in the Python community. The issue is much smaller though,
since it is not mandatory.

Can we extend the -c (check syntax) option so that it also checks for
missing/broken def/end pairs? Right now ruby -c gives only a syntax error
message, which is not so helpful (especially for Python newcomers).

              matz.

kind regards -botp

···

Yukihiro Matsumoto [mailto:matz@ruby-lang.org] wrote:

Hi,

···

In message "Re: quality of error messages" on Fri, 8 Oct 2004 12:45:11 +0900, "Peña, Botp" <botp@delmonte-phil.com> writes:

can we extend -c (check syntax) command so that it also checks for
missing/broken def/end pairs?

We can. But how we check for missing/broken def/end pairs, more than
just syntax error?

              matz.

Yukihiro Matsumoto wrote:

Hi,

>can we extend -c (check syntax) command so that it also checks for
>missing/broken def/end pairs?

We can. But how we check for missing/broken def/end pairs, more than
just syntax error?

              matz.

I believe what is being asked for is more than just a "syntax error" message. If the error could be more specific, like "missing 'end' on line x", it would greatly increase the usefulness of the -c option.

That said, I think I have some inkling as to how tricky such a request would be to implement. I'm nowhere near the language wizard you are, matz, but I've tinkered, and by tinkering, I've learned. :slight_smile:

- Jamis

···

In message "Re: quality of error messages" on Fri, 8 Oct 2004 12:45:11 +0900, "Peña, Botp" <botp@delmonte-phil.com> writes:

--
Jamis Buck
jgb3@email.byu.edu
http://www.jamisbuck.org/jamis

Hi,

···

In message "Re: quality of error messages" on Fri, 8 Oct 2004 12:58:26 +0900, Jamis Buck <jgb3@email.byu.edu> writes:

We can. But how we check for missing/broken def/end pairs, more than
just syntax error?

I believe what is being asked for is more than just a "syntax error"
message. If the error could be more specific, like "missing 'end' on
line x", it would greatly increase the usefulness of the -c option.

I know what he wants. I am not refusing his idea. The point is I'm
not yet sure how to detect missing pairs.

              matz.

Having spent 12 of the last 48 hours or so hacking away on ruby's
parse.y, I think I've got a pretty clear idea what the problem is.
Unless (as some have suggested) you add a second source of information
(such as indentation or an explicit statement of intent such as 'enddef'
or 'method_delimiter') it simply isn't possible in general to tell which
end is missing. Consider:

    ((1+2)+3+4/5

There is clearly a ')' missing, but should it be:

    ((1)+2)+3+4/5 which equals 6.8

or
   
    ((1+2))+3+4/5 which likewise equals 6.8

or

    ((1+2)+3)+4/5 which is also 6.8

or

    ((1+2)+3+4)/5 which is 2

or

    ((1+2)+3+4/5) which is 6.8 again

Without an external source of information, it is impossible to decide
this. In a simple ruby program, there might be a reasonably small
number of possibilities, but those are the times it's easy to spot "by
hand." In a more complex (say, over 50 lines or so) program it would be
more work to weed through the warnings than to find it by other means.
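The ambiguity can be checked mechanically. What follows is a minimal sketch, not something posted in the thread: TinyExpr and repairs are hypothetical names, and the evaluator uses exact rational arithmetic (rather than Ruby's integer division) so that 4/5 counts as 0.8, as in the figures above. It tries a ')' at every position and keeps the candidates that are well-formed.

```ruby
# Hypothetical helper: a tiny recursive-descent evaluator for integers,
# + - * / and parentheses. Raises ArgumentError on malformed input.
class TinyExpr
  def self.evaluate(src)
    parser = new(src)
    value = parser.expr
    raise ArgumentError, "trailing input" unless parser.done?
    value
  end

  def initialize(src)
    @tokens = src.scan(%r{\d+|[-+*/()]})
    @pos = 0
  end

  def done?
    @pos == @tokens.size
  end

  def expr
    value = term
    while ["+", "-"].include?(peek)
      op = advance
      rhs = term
      value = op == "+" ? value + rhs : value - rhs
    end
    value
  end

  def term
    value = factor
    while ["*", "/"].include?(peek)
      op = advance
      rhs = factor
      value = op == "*" ? value * rhs : value / rhs
    end
    value
  end

  def factor
    tok = advance
    case tok
    when /\A\d+\z/
      Rational(tok.to_i)          # exact arithmetic, so 4/5 stays 4/5
    when "("
      value = expr
      raise ArgumentError, "expected )" unless advance == ")"
      value
    else
      raise ArgumentError, "unexpected #{tok.inspect}"
    end
  end

  def peek
    @tokens[@pos]
  end

  def advance
    tok = @tokens[@pos]
    @pos += 1
    tok
  end
end

# Insert one ")" at every position and keep the well-formed results.
def repairs(broken)
  candidates = (0..broken.size).map { |i| broken.dup.insert(i, ")") }.uniq
  candidates.each_with_object([]) do |candidate, found|
    begin
      found << [candidate, TinyExpr.evaluate(candidate)]
    rescue ArgumentError
      # not a well-formed expression; skip it
    end
  end
end

repairs("((1+2)+3+4/5").each do |text, value|
  puts "#{text}  =>  #{value.to_f}"
end
```

Every surviving candidate is an equally legal repair; nothing in the token stream alone prefers one over another.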

-- Markus

P.S. There may be heuristics to get a reasonable "hint" by making some
assumptions; e.g., warn if there is a line less indented than the first
line of an outstanding (open) construct, excluding here-docs, %_{
constructs, etc., if (and only if) there is a missing end at eof. This
could (I think) be implemented fairly easily by

      * caching the location and indentation of each class, def, etc.
        on a stack

      * popping from the stack on end

      * noting when the first token is lexed from a line if it was less
        indented than the most recent outstanding def/class, etc., and
        if so noting the fact in a global

      * including the information in the global (if any) when generating
        the missing end message

But this is only a heuristic, based on the observation that even people
who don't like salient structure tend to use it to some extent. It
would not solve the problem in general, and perhaps not even in a
typical case, for anyone but me and the python expatriates.
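The four steps above can be sketched outside the parser as a line scanner. This is only an illustration under loud assumptions: it is not the parse.y change being discussed, missing_end_hint and OPENERS are hypothetical names, indentation is counted in spaces only, and constructs are recognised solely by a keyword at the start of a line (so do-blocks, modifier forms, here-docs and strings are ignored).

```ruby
# Assumption: a construct opens only when one of these keywords starts a line.
OPENERS = /\A(class|module|def|if|unless|while|until|case|begin)\b/

# Returns nil when everything is closed at end of input; otherwise a hash
# naming the line of the outermost construct still open and the first line
# that was indented less than the innermost construct open at that point.
def missing_end_hint(source)
  stack = []       # [keyword, line number, indentation] per open construct
  suspect = nil    # the "global" holding the first odd-looking line

  source.each_line.with_index(1) do |line, lineno|
    text = line.strip
    next if text.empty? || text.start_with?("#")
    indent = line[/\A */].size

    # Check the first token of the line before acting on it.
    if suspect.nil? && !stack.empty? && indent < stack.last[2]
      suspect = lineno
    end

    if text.match?(/\Aend\b/)
      stack.pop
    elsif (m = OPENERS.match(text))
      stack.push([m[1], lineno, indent])
    end
  end

  return nil if stack.empty?     # only report when an end is missing at eof
  { open_since: stack.first[1], first_odd_line: suspect }
end

sample = <<~RUBY
  class Foo
    def bar
      if x
        puts "x"
    end
  end
RUBY

p missing_end_hint(sample)
```

Because the comparison is a strict "less than", an end placed at the same indentation as the line that opened its construct never counts as odd.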

···

On Thu, 2004-10-07 at 21:19, Yukihiro Matsumoto wrote:


Yukihiro Matsumoto wrote:

Hi,

>> We can. But how we check for missing/broken def/end pairs, more than
>> just syntax error?

>I believe what is being asked for is more than just a "syntax error"
>message. If the error could be more specific, like "missing 'end' on
>line x", it would greatly increase the usefulness of the -c option.

I know what he wants. I am not refusing his idea. The point is I'm
not yet sure how to detect missing pairs.

Well, I had an idea once, but I considered it too dumb to share.

I am opposed to the use of "significant whitespace" as in Python. But
having said that, it is always wise to indent intelligently, most of
us do it in one way or another.

It would be theoretically possible to match by indentation as well as
keyword (solely for these purposes, and only with -c).

For example:

1| class Foo
2|   class Bar
3|     def mymeth(x)
4|       if x.nil?
5|         puts "x is nil"
6|     end
7|   end
8| end

If we do this kind of guessing, we will guess that there is a missing end
after line 5. (Because the end doesn't match the indentation of the 'if'
to which it corresponds.)

If we don't, we will go past line 8 until we hit something we can't handle.
It may be line 10 or line 999.

This is a very common error for me, and I have thought of writing a little tool
just to find missing ends (since my own indentation is always consistent enough
to allow it).
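A rough sketch of such a tool, assuming consistent space indentation and that every construct starts with a keyword at the beginning of a line (do-blocks, modifier if, strings and here-docs are ignored). The names guess_missing_ends and OPENER are hypothetical; this is an illustration of the idea, not an existing utility.

```ruby
OPENER = /\A(class|module|def|if|unless|while|until|case|begin)\b/

# Pair each `end` with the innermost open construct at the same indentation.
# Anything opened inside that construct and still open is guessed to be
# missing its `end` just before the current line.
def guess_missing_ends(source)
  stack = []     # open constructs, innermost last
  guesses = []

  source.each_line.with_index(1) do |line, lineno|
    text = line.strip
    next if text.empty?
    indent = line[/\A */].size

    if text.match?(/\Aend\b/)
      match = stack.rindex { |entry| entry[:indent] == indent }
      if match.nil?
        stack.pop                          # no indentation match: plain pairing
      else
        closed = stack.slice!(match..-1)   # matched construct plus what it contains
        closed.drop(1).reverse_each do |entry|
          guesses << { keyword: entry[:keyword], opened_at: entry[:line],
                       missing_before: lineno }
        end
      end
    elsif (m = OPENER.match(text))
      stack << { keyword: m[1], line: lineno, indent: indent }
    end
  end

  # Whatever is still open at end of input has no `end` at all.
  stack.reverse_each do |entry|
    guesses << { keyword: entry[:keyword], opened_at: entry[:line],
                 missing_before: nil }
  end
  guesses
end

sample = <<~RUBY
  class Foo
    class Bar
      def mymeth(x)
        if x.nil?
          puts "x is nil"
      end
    end
  end
RUBY

guess_missing_ends(sample).each do |g|
  puts "#{g[:keyword]} opened on line #{g[:opened_at]} seems to lack an end " \
       "before line #{g[:missing_before]}"
end
```

On the example above, the guess points at the if on line 4 and at line 6 as the place where its end should have appeared, instead of running on to the end of the file.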

Hal

···

In message "Re: quality of error messages" on Fri, 8 Oct 2004 12:58:26 +0900, Jamis Buck <jgb3@email.byu.edu> writes:

Yukihiro Matsumoto wrote:

Hi,

We can. But how we check for missing/broken def/end pairs, more than
just syntax error?

I believe what is being asked for is more than just a "syntax error" message. If the error could be more specific, like "missing 'end' on line x", it would greatly increase the usefulness of the -c option.

I know what he wants. I am not refusing his idea. The point is I'm
not yet sure how to detect missing pairs.

            matz.

The best suggestion that I've seen was from B.Candler@pobox.com (above). I'll quote a bit:
<<EOM

I think that you don't have to enforce any particular indentation style or
amount of space on each line - only that it is consistent between begin and
end.

If we define the 'nesting depth' as the number of module / class / def / do
/ if sections we are within (i.e. the number of matching 'end's we expect to
see), then:

- at the start of each line, count the number of spaces. Ignore lines which
consist entirely of whitespace.

R1: if the nesting depth is the same as the previous line, then raise a
warning if the number of spaces is not the same as the previous line

R2: if the nesting depth is greater than the previous line, then remember
the indentation of this line associated with this nesting depth (e.g. on a
stack)

R3: if the nesting depth is less than the previous line, then raise a
warning if the number of spaces is not the same as the last line with the
same nesting depth

EOM

He goes on to explain further, and give examples. I don't know how difficult this would be to implement, but it looks like a good answer -- if feasible. I don't think that this should be a syntax rule, but purely used as an (optional?) heuristic for error detection. I.e., take it as a description of how people DO tend to write code rather than a rule of how they MUST.
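For a feel of how R1 to R3 behave, here is a small sketch. Several things in it are assumptions rather than part of the quoted proposal: the names indentation_warnings and OPENER, the keyword-at-start-of-line test used to compute nesting depth, and the reading that a line's nesting depth is the depth in effect where the line begins. The ends_close_first switch is a variant that counts a leading end before measuring the line.

```ruby
# Assumption: nesting changes only via a keyword at the start of a line.
OPENER = /\A(module|class|def|if|unless|while|until|case|begin)\b/

# Returns the line numbers that would draw a warning under R1..R3.
def indentation_warnings(source, ends_close_first: false)
  depth = 0
  prev_depth = nil
  prev_indent = nil
  indent_at = {}    # nesting depth => indentation of the last line at that depth
  warnings = []

  source.each_line.with_index(1) do |line, lineno|
    text = line.strip
    next if text.empty?                 # ignore whitespace-only lines
    indent = line[/\A */].size
    is_end = text.match?(/\Aend\b/)
    line_depth = (is_end && ends_close_first) ? depth - 1 : depth

    if prev_depth.nil? || line_depth > prev_depth
      # R2: deeper than the previous line, nothing to compare against yet
    elsif line_depth == prev_depth
      warnings << lineno unless indent == prev_indent              # R1
    else
      warnings << lineno unless indent == indent_at[line_depth]    # R3
    end
    indent_at[line_depth] = indent      # remember for later R3 checks

    if is_end
      depth -= 1
    elsif text.match?(OPENER)
      depth += 1
    end
    prev_depth = line_depth
    prev_indent = indent
  end
  warnings
end

pair_matched = <<~RUBY
  def foo
    if x
      puts 1
    end
  end
RUBY

p indentation_warnings(pair_matched)
p indentation_warnings(pair_matched, ends_close_first: true)
```

Under the literal reading, an end lined up with the keyword that opened its construct is flagged; with ends_close_first the same code passes, and a warning appears only where an end lines up with something other than its own opener.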

···

In message "Re: quality of error messages" on Fri, 8 Oct 2004 12:58:26 +0900, Jamis Buck <jgb3@email.byu.edu> writes:

Interesting that you haven't written the tool yet :slight_smile:

Gavin

···

On Friday, October 8, 2004, 3:38:38 PM, Hal wrote:

It would be theoretically possible to match by indentation as well as
keyword (solely for these purposes, and only with -c).

For example:

1| class Foo
2|   class Bar
3|     def mymeth(x)
4|       if x.nil?
5|         puts "x is nil"
6|     end
7|   end
8| end

If we do this kind of guessing, we will guess that there is a missing end
after line 5. (Because the end doesn't match the indentation of the 'if'
to which it corresponds.)

If we don't, we will go past line 8 until we hit something we can't handle.
It may be line 10 or line 999.

This is a very common error for me, and I have thought of writing a little tool
just to find missing ends (since my own indentation is always consistent enough
to allow it).

The problem (as I responded before) is that this would complain
about almost everyone's code since (*smile* I presume because you are
all unwashed heathens, but there may be other reasons) people tend to
follow the "match pairs" rule instead of tucking the "end"s in as the
last item of the block.

     The rule as everyone states it (including here) would object to the
ends being at the same indentation as the "containing" structure, which
is where pair-matchers put them. People _say_ that they indent
according to salience, but in fact they pair match.

     So, in summary, I suspect that I would be happy with this rule but
almost everyone else would be happier with the modified (< vs. <=)
version I suggested earlier on this thread, which would allow:

      starting thing
           guts
           intestines
           gizzards
      end

instead of the salient form:

      starting thing
           guts
           intestines
           gizzards
           end

which would be expected by the rule as stated.

     -- Markus

···

On Fri, 2004-10-08 at 12:26, Charles Hixson wrote:


Maybe an idea for the quiz?

KB

···

On Fri, 08 Oct 2004 17:36:51 +0900, Gavin Sinclair wrote:

This is a very common error for me, and I have thought of writing a
little tool just to find missing ends (since my own indentation is
always consistent enough to allow it).

Interesting that you haven't written the tool yet :slight_smile:

Markus wrote:

    The problem (as I responded before) is that this would complain
about almost everyone's code since (*smile* I presume because you are
all unwashed heathens, but there may be other reasons) people tend to
follow the "match pairs" rule instead of tucking the "end"s in as the
last item of the block.

    The rule as everyone states it (including here) would object to the
ends being at the same indentation as the "containing" structure, which
is where pair-matchers put them. People _say_ that they indent
according to salience, but in fact they pair match.

    So, in summary, I suspect that I would be happy with this rule but
almost everyone else would be happier with the modified (< vs. <=)
version I suggested earlier on this thread, which would allow:

     starting thing
          guts
          intestines
          gizzards
     end

instead of the salient form:

     starting thing
          guts
          intestines
          gizzards
          end

which would be expected by the rule as stated.

    -- Markus

OK. You're clearly right in my case, as I tend to indent with this pattern (in C, etc.):

    if (something)
    {   a block
        of stuff
    }
    else
    {   a different block
        of different stuff
    }

which few seem to use. For me one of the disfeatures of Ruby is that the block start must be on the same line as the test for block start (unless one takes extra steps). I find this a continuing irritation, even while I understand that the parser needs to know that there is a block coming. So Ruby already violates my desired indentation pattern. An additional violation would probably not be much additional irritation.

Yeah, I know. That was something I glossed over, because I didn't want to
think it through :slight_smile:

I think that: if a line begins with 'end', then you process it (reduce the
nesting level) *before* checking whether the line's indentation level
matches what you expect. And if there's more than one, I suppose you have to
process them all, in case anyone writes something silly like

   class Foo; def bar; begin
         puts "hello"
   end; end; end
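One way to picture "process them all" is to split each line on ';' and count the end statements that lead it before the indentation check runs. The sketch below is hypothetical (statements, leading_ends, depth_delta and check_depths are made-up names) and assumes that no ';' appears inside strings or comments and that every statement opening a construct begins with its keyword.

```ruby
OPENER = /\A(module|class|def|if|unless|while|until|case|begin)\b/
CLOSER = /\Aend\b/

# Split a physical line into its ';'-separated statements.
def statements(line)
  line.split(";").map(&:strip).reject(&:empty?)
end

# Number of `end` statements at the front of the line.
def leading_ends(line)
  statements(line).take_while { |stmt| stmt.match?(CLOSER) }.size
end

# Net change in nesting depth caused by the whole line.
def depth_delta(line)
  statements(line).sum do |stmt|
    if stmt.match?(OPENER)
      1
    elsif stmt.match?(CLOSER)
      -1
    else
      0
    end
  end
end

# Depth each line should be checked against: leading ends are applied first.
def check_depths(source)
  depth = 0
  source.each_line.map do |line|
    at_check = depth - leading_ends(line)
    depth += depth_delta(line)
    at_check
  end
end

silly = <<~RUBY
  class Foo; def bar; begin
        puts "hello"
  end; end; end
RUBY

p check_depths(silly)
```

With the leading ends applied first, the last line of the example is compared against depth 0, the same depth as the line that opened all three constructs.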

Another construct which may cause difficulty is

case foo
when /bar/
  ...
else
  ...
end

since I've seen other ways of laying this out.

Regards,

Brian.

···

On Sat, Oct 09, 2004 at 04:37:23AM +0900, Markus wrote:


Write it up and send it in. I will run it.

James Edward Gray II

P.S. I've heard from three of you now, so the other 1,097 active members sure better be watching for their idea to come along, I say... :smiley:

···

On Oct 8, 2004, at 5:25 AM, Kristof Bastiaensen wrote:

On Fri, 08 Oct 2004 17:36:51 +0900, Gavin Sinclair wrote:

This is a very common error for me, and I have thought of writing a
little tool just to find missing ends (since my own indentation is
always consistent enough to allow it).

Interesting that you haven't written the tool yet :slight_smile:

Maybe an idea for the quiz?

>      So, in summary, I suspect that I would be happy with this rule but
> almost everyone else would be happier with the modified (< vs. <=)
> version I suggested earlier on this thread, which would allow:
>
>       starting thing
>            guts
>            intestines
>            gizzards
>       end
>
> instead of the salient form:
>
>       starting thing
>            guts
>            intestines
>            gizzards
>            end
>
> which would be expected by the rule as stated.

Yeah, I know. That was something I glossed over, because I didn't want to
think it through :slight_smile:

I think that: if a line begins with 'end', then you process it (reduce the
nesting level) *before* checking whether the line's indentation level
matches what you expect. And if there's more than one, I suppose you have to
process them all, in case anyone writes something silly like

   class Foo; def bar; begin
         puts "hello"
   end; end; end

     Looking ahead adds whole gobs of complexity, and may not help as
much as you think (just my intuition, but I've been led down paths that
looked a lot like that before).

     There is a simpler (and more general) solution in my opinion.
Don't try to codify what people do, because they don't all do the same
thing, they aren't always consistent, and it is fairly complicated to
define exactly in most cases (salience/pair-matching hybrids).

     Instead, try to capture what they DON'T do, and store the location
to use in the message if there turns out to be a problem. What don't
they do?

      * They don't 'exdent' beyond the indentation of the line that
        started the present construct, except for comments & here docs,
        etc. which are already distinguished by the compiler.
      * They don't start a nested construct at the same indentation
        level of the present construct.

I suspect that this would be dirt simple to implement, especially
compared to anything that required lookahead.
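As a rough illustration of the two "don't" tests, the following sketch records odd lines while scanning and reports them only if something is still open at the end. It is a line-based stand-in, not parser code: odd_lines and OPENER are hypothetical names, indentation is measured in spaces, and constructs are detected only by a keyword at the start of a line.

```ruby
OPENER = /\A(module|class|def|if|unless|while|until|case|begin)\b/

# Returns [line number, reason] pairs, but only when an `end` is missing
# at end of input; otherwise the notes are discarded.
def odd_lines(source)
  stack = []    # indentation of the first line of each open construct
  notes = []

  source.each_line.with_index(1) do |line, lineno|
    text = line.strip
    next if text.empty? || text.start_with?("#")
    indent = line[/\A */].size
    opener = text.match?(OPENER)

    unless stack.empty?
      if indent < stack.last
        notes << [lineno, :exdent]                  # first "don't"
      elsif opener && indent == stack.last
        notes << [lineno, :nested_at_same_level]    # second "don't"
      end
    end

    if text.match?(/\Aend\b/)
      stack.pop
    elsif opener
      stack.push(indent)
    end
  end

  stack.empty? ? [] : notes
end

sample = <<~RUBY
  def foo
    if x
      puts 1
  end
  def bar
  end
RUBY

p odd_lines(sample)
```

No lookahead is involved: each line is judged only against the construct that is open when the line starts, and the stored locations are used only if the final count of ends comes up short.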

Another construct which may cause difficulty is

case foo
when /bar/
  ...
else
  ...
end

since I've seen other ways of laying this out.

     All the ways I've seen would work fine with the simple "oddness"
test above. They may not be the way I'd do it, but they weren't 'odd'.

-- Markus

···

On Sat, 2004-10-09 at 02:11, Brian Candler wrote:

On Sat, Oct 09, 2004 at 04:37:23AM +0900, Markus wrote:

     Instead, try to capture what they DON'T do, and store the location
to use in the message if there turns out to be a problem. What don't
they do?

      * They don't 'exdent' beyond the indentation of the line that
        started the present construct, except for comments & here docs,
        etc. which are already distinguished by the compiler.

I think you're saying: warn (or note the location of) any line which has
fewer spaces at the front than the number of spaces which were in front of
the line which started the current (most deeply nested) construct.

That seems OK to me; you won't get warnings for
   def foo
   line1
        line2
     line3
     end

but we don't really care, as we're unlikely to have a missing 'end' here.

      * They don't start a nested construct at the same indentation
        level of the present construct.

I do :frowning: Especially for deeply-nested modules: to keep listings not too
deeply indented, I write

module Foo
module Bar
class C1
  def m1
  end
end # class C1
class C2
  def m2
  end
end # class C2
end # module Bar
end # module Foo

I suspect that this would be dirt simple to implement, especially
compared to anything that required lookahead.

You're probably right there. Good enough would do. After all, it's only a
hint when you get a 'syntax error' at the end of the file.

Since you've been playing with the parser anyway, perhaps you can have a
look at it? :slight_smile:

Brian.

···

On Sun, Oct 10, 2004 at 01:45:23AM +0900, Markus wrote:

> Instead, try to capture what they DON'T do, and store the location
> to use in the message if there turns out to be a problem. What don't
> they do?
>
> * They don't 'exdent' beyond the indentation of the line that
> started the present construct, except for comments & here docs,
> etc. which are already distinguished by the compiler.

I think you're saying: warn (or note the location of) any line which has
fewer spaces at the front than the number of spaces which were in front of
the line which started the current (most deeply nested) construct.

That seems OK to me; you won't get warnings for
   def foo
   line1
        line2
     line3
     end

but we don't really care, as we're unlikely to have a missing 'end' here.

    Exactly.

> * They don't start a nested construct at the same indentation
> level of the present construct.

I do :frowning: Especially for deeply-nested modules: to keep listings not too
deeply indented, I write

module Foo
module Bar
class C1
  def m1
  end
end # class C1
class C2
  def m2
  end
end # class C2
end # module Bar
end # module Foo

     Good point. I have seen stuff like that.

> I suspect that this would be dirt simple to implement, especially
> compared to anything that required lookahead.

You're probably right there. Good enough would do. After all, it's only a
hint when you get a 'syntax error' at the end of the file.

Since you've been playing with the parser anyway, perhaps you can have a
look at it? :slight_smile:

     I might. I'm a little overcommitted at the moment, but I think I
could sell my boss on the utility of it, in which case, *evil grin* ah,
what to delegate....

     It probably wouldn't be for a few weeks in any case though.

-- Markus

···

On Sat, 2004-10-09 at 10:06, Brian Candler wrote:

On Sun, Oct 10, 2004 at 01:45:23AM +0900, Markus wrote: