[ANN] testy.rb - ruby testing that's mad at the world

well, for one that won't run :wink: assert takes an arg and not a block... but that aside

cfp:~ > cat a.rb
require 'test/unit'

class TC < Test::Unit::TestCase
   def test_with_shitty_error_reporting
     name = 42
     assert name == 'forty-two'
   end
end

cfp:~ > ruby a.rb
Loaded suite a
Started
F
Finished in 0.007649 seconds.

   1) Failure:
test_with_shitty_error_reporting(TC) [a.rb:6]:
<false> is not true.

1 tests, 1 assertions, 1 failures, 0 errors

the error message '<false> is not true', on the 8th line of output (out of possibly thousands), makes me want to hunt down the programmer that wrote that and club him to death with a 2x4. the assert api facilitates insanity for the code maintainer

now, in testy

cfp:~/src/git/testy > ruby -I lib a.rb

···

On Mar 29, 2009, at 2:09 PM, Phlip wrote:

Or even less:

name = 42
assert{ name == ultimate.answer }

---
my lib:
   name compare:
     failure:
       expect:
         name: forty-two
       actual:
         name: 42
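
for reference, a guess at the testy version of a.rb that would produce output shaped like the above - assembled from the Testy.testing / result.check calls quoted later in this thread, so treat the block arg and check signature as assumptions:

   # a guess at testy's api, pieced together from calls quoted elsewhere
   # in this thread -- not verified against testy itself
   require 'testy'

   Testy.testing 'my lib' do
     test 'name compare' do |result|
       name = 42
       result.check 'name', :expected => 'forty-two', :actual => name
     end
   end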

my thinking, currently, is that the test writer should be forced to

   . name the test suite
     . name the test
       . name the check

because doing these three things allows for informative error reporting. testing just shouldn't facilitate obfuscating what went wrong - imho.

cheers.

a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

This, of course, just means you are not writing big enough rails apps :-).

···

On Mon, Mar 30, 2009 at 07:22:35AM +0900, ara.t.howard wrote:

if we accept the research and assume that bugs scale linearly with the #
of lines of code this is not good for robustness. this is one of my main
gripes with current ruby testing - my current rails app has about 1000
lines of code and 25,000 lines of testing framework!

--

Jeremy Hinegardner jeremy@hinegardner.org

Ara Howard wrote:

assert_equal 42, ultimate.answer, "name"

you can basically do that too, but i continually forget which is
expected and which is actual and, as you know, that's a slippery error
to track down at times.

Perhaps - but it's one rule that only needs to be learned once.

I notice that testy supports check <name>, <expected>, <actual> too.

Testy does (intentionally) force you to name your tests, whereas
Test::Unit will happily let you write

  check <expected>, <actual>

I really don't like having to name each assertion, maybe because I'm
lazy or maybe because it feels like a DRY violation. I've already said
what I want to compare, why say it again?

because littering the example code with esoteric testing framework
voodoo turns it into code in the testing language that does not
resemble how people might actually use the code

I agree with this. This is why I absolutely prefer Test::Unit (and
Shoulda on top of that) over RSpec.

i always end up writing
both samples and tests - one of the goals of testy is that, by having
a really simple interface and really simple human friendly output we
can just write examples that double as tests.

Hmm, this is probably an argument *for* having a DSL for assertions - to
make the assertions read as much like example code ("after running this
example, you should see that A == B and C < D")

Neither

  result.check "bar attribute", :expected => 123, :actual => res.bar

nor

  assert_equal 123, res.bar, "bar attribute"

reads particularly well here, I think. Ideally it should be as simple as
possible to write these statements of expectation. How about some eval
magic?

  expect[
    "res.foo == 456",
    "res.bar == 123",
    "res.baz =~ /wibble/"
  ]

Maybe need to pass a binding here, but you get the idea. (Before someone
else points it out, this is clearly a case which LISP would be very well
suited to handling - the same code to execute can also be displayed in
the results)

The problem here is reporting on expected versus actual, but perhaps you
could split on space and report the value of the first item.

  expected:
    - res.foo == 456
    - res.bar == 123
  unexpected:
    - test: res.baz =~ /wibble/
      term: res.baz
      value: "unexpected result"

Going too far down this path ends up with rspec, I think.

In fact, I don't really have a problem with writing

  res.foo.should == 456

The trouble is the hundreds of arcane variations on this.

You solve this problem by only having a single test (Result#check), and
indeed if rspec only had a single method (should_equal) that would be
fairly clean too. However this is going to lead to awkwardness when you
want to test for something other than equality: e.g.

   res = (foo =~ /error/) ? true : false
   result.check "foo should contain 'error'", :expected => true, :actual => res

Apart from being hard to write and read, that also doesn't show you the
actual value of 'foo' when the test fails.

Is it worth passing the comparison method?

   result.check "foo should contain 'error'", foo, :=~, /error/

But again this is getting away from real ruby for the assertions, in
which case it isn't much better than

   assert_match /error/, foo, "foo should contain 'error'"

   assert_match /error/, foo # lazy/DRY version
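
For what it's worth, the operator-passing version is easy to sketch (a hypothetical check method; send does the dispatch):

   # hypothetical single check taking the comparison as a symbol;
   # actual.send(op, expected) performs the test, so the failure report
   # can show name, operator, expected and actual together
   def check(name, actual, op, expected)
     if actual.send(op, expected)
       puts "pass: #{name}"
     else
       puts "fail: #{name} (#{op})"
       puts "  expect: #{expected.inspect}"
       puts "  actual: #{actual.inspect}"
     end
   end

   check "foo should contain 'error'", "an error occurred", :=~, /error/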

get a listing of which tests/examples i can run

Yes, parseable results and test management are extremely beneficial.
Those could be retro-fitted to Test::Unit though (or whatever its
replacement in ruby 1.9 is called)

Getting rid of the at_exit magic is also worth doing.

you can also do something like this (experimental) to just make a
simple example

cfp:~/src/git/testy > cat a.rb
require 'testy'

Testy.testing 'my lib' do

   test 'just an example of summing an array using inject' do
     a = 1,2
     a.push 3
     sum = a.inject(0){|n,i| n += i}
   end

end

Nice, could perhaps show the (expected) result inline too?

  test 'an example of summing an array using inject' do
    a = 1,2
    a.push 3
    sum = a.inject(0){|n,i| n += i}
  end.<< 6

A bit magical though. Also, we can only test the result of the entire
block, whereas a more complex example will want to create multiple
values and test them all.

so the goal is making it even easier to have a user play with your
tests/examples to see how they work, and even to allow simple examples
to be integrated with your test suite so you make sure your samples
still run without error too. of course you can do this with test/unit
or rspec but the output isn't friendly in the least - not from the
perspective of a user trying to learn a library, nor is it useful to
computers because it cannot be parsed - basically it's just vomiting
stats and backtraces to the console that are hard for people to read
and hard for computers to read. surely i am not the only one that
sometimes resorts to factoring out a failing test into a separate
program because test/unit and rspec output is too messy to play nice
with instrumenting code?

I agree. Part of the problem is that when one thing is wrong making 20
tests fail, all with their respective backtraces, it can be very hard to
see the wood for the trees. What would be nice would be a folding-type
display with perhaps one line for each failed assertion, and a [+] you
can click on to get the detail for that particular one.

yeah that's on deck for sure. i *do* really like contexts with
shoulda. but still

cfp:/opt/local/lib/ruby/gems/1.8/gems/thoughtbot-shoulda-2.9.1 > find
lib/ -type f|xargs -n1 cat|wc -l
     3910

if we accept the research and assume that bugs scale linearly with
the # of lines of code this is not good for robustness.

I disagree there - not with the research, but the implied conclusion
that you should never use a large codebase. Shoulda works well, and I've
not once found a bizarre behaviour in the testing framework itself that
I've had to debug, so I trust it.

(This is not true of other frameworks though. e.g. I spent a while
tracking down one such bug, written up in a ticket on Lighthouse.)

this is one
of my main gripes with current ruby testing - my current rails app has
about 1000 lines of code and 25,000 lines of testing framework!

Yeah, but how many lines of Rails framework? :slight_smile:

Cheers,

Brian.
--
Posted via http://www.ruby-forum.com/.

Ara Howard wrote:

i always end up writing both samples and tests

Just a thought: I've found myself doing that too, but in particular I've
run an example externally to be sure it works, then ended up pasting it
into the rdoc. So it would be really cool if rdoc could integrate the
examples directly in the appropriate place(s). I want to be sure that my
examples actually run as advertised, and at the moment I risk them
becoming out of date, so I'm definitely with you on that point.

However, one problem with having runnable examples/specs is that you
often have a lot of setup (and/or mock) code to make something run. This
risks adding too much noise to the examples, so structuring the code so
as to be able to filter that out would be a bonus - i.e. show me just
the example, not the setup or post-conditions. Not easy to do without
ripping the Ruby though.

I find the other kill-yourself part of testing is behavioural testing
with mock objects, but that's out of scope here I think :slight_smile:

Regards,

Brian.

···

--
Posted via http://www.ruby-forum.com/.

Quite a bit of discussion since I looked at this last. It's almost as
if people care about testing.

i'm open to suggestion on format though. requirements are

    . readable by humans
    . easily parsed by computers

basically that means some yaml format. honestly open to suggestion
here...

Agreed on YAML. That's just a really simple way to go if you're going
to stick to those two (very sensible) requirements. Maybe JSON would
work as well. I think the trouble I had with the specific example is
that it's listed as a failure, yet shows 'a' matching. Maybe instead
of

    returning unexpected results:
      failure:
        expect:
          b: forty-two
        actual:
          b: 42.0

you could have something like

    returning unexpected results:
      status: failure
      vars:
        matched:
        - a
        unmatched:
          expect:
            b: forty-two
          actual:
            b: 42.0

It doesn't have to be exactly like that, of course, but I'd call
attention to specifically the status being something you can easily
check by using the 'status' key instead of looking for 'failure'. Also
that the variables are split into good and bad.
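
To make the 'status' point concrete, here is a sketch of how a CI script might consume that format (the report layout follows the proposal above; the file name is hypothetical):

  require 'yaml'

  # with an explicit 'status' key, pass/fail is a hash lookup rather
  # than a grep for the word 'failure' ('report.yml' is hypothetical)
  report = YAML.load(File.read('report.yml'))
  report.each do |test_name, body|
    next unless body['status'] == 'failure'
    unmatched = body['vars']['unmatched']
    puts "#{test_name}: expected #{unmatched['expect'].inspect}, " +
         "got #{unmatched['actual'].inspect}"
  end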

The status thing, admittedly, is more of a point if you're going to
have YAML reports of tests that passed as well. One of the things I
like about RSpec and bacon is the specdoc format, so if you have
sensibly-named contexts and examples, you can get a document
explaining the behavior of the code under test. In fact, I added this
output format to shoulda in the pre-git hullabaloo. I don't think it
ever made it in.

i have major issues with points two and three wrt most ruby
testing frameworks. one of the main points of testy is to combine
examples with testing. rspec and all the others do not serve as
examples unless you are a ruby master. that is to say they introduce
too many additions to the code that's supposed to be an example to
really preserve its 'exampleness'. and of course the output is
utterly useless to normal humans. if a framework provides 1000
assert_xxxxxxxx methods ad nauseam then the point of the code - its
level of example-good-ness - is lost to mere mortals

The examples I tend to give are just very high-level overviews, mostly
showing the API and drawing the reader into other documentation, the
specs, or finally the code if they really want to see everything my
lib/module/gem/whathaveyou can do. As such, they're usually not very
helpful as far as specs go.

I agree that having a glut of assertions/checks isn't useful, and is
in fact hiding a very important fact when it comes to BDD (or really
testing in general): If something is painful to test, it should be
changed. What I love about BDD is having something that uses my code
and defines its API as I go, so I don't get stuck in nasty test-land
with shitty implementation-specific tests, or at least not as easily.
So that pain, like any pain, is a signal that something is wrong, and
it should be fixed. It shouldn't be hidden behind a special assertion
or matcher or any other rug-sweeping activity unless there's good
reason, like the code that's hurting you to test isn't available to
you to fix.

> I used RSpec for a long time, and still do with some projects. I've
> switched bacon for my personal projects, and I love it. As for
> mocking, which is necessary in some cases if you want to test without
> turning into a serial killer, mocha with RSpec, facon with bacon.

this will summarize where my thoughts are on that

cfp:~/redfission > find vendor/gems/{faker,mocha,thoughtbot}* -type f|
xargs -n1 cat|wc -l
24255

cfp:~/redfission > find app -type f|xargs -n1 cat|wc -l
1828

rspec and co might be fine but seriously, the above is insane right?

It's bordering on nuts. It'd be better to show LoC than just wc -l, of
course, and are all the thoughtbot gems for your tests? Also, someone
else mentioned flay/flog/reek output, which could be illuminating. And
someone also brought up the app framework. After all, what you're
doing here is comparing your app code to your test framework. Seems
like you should be comparing app to test, or app framework to test
framework, or app to test with their respective frameworks. (And don't
forget to include Test::Unit for the testing frameworks that are
simply a layer on top of that.)

And I'd like to once again bring up bacon. I say I live with a few
methods on Object and Kernel to be able to test without going crazy,
and I mean it. I don't care how much you say assert_equal
makes sense or result.check 'some name', :expect => x, :actual => y is
great, I'm never going to get confused about some_value.should == 5.
It's readable, and it's close to English. Where I think bacon wins out
huge over RSpec is that bacon keeps all those methods on the Should
object instead of making them global. It's a small change to syntax
(some_obj.should.respond_to(:meth) vs. some_obj.should respond_to(:meth)),
but it makes a world of difference as far as sensibility goes.

Oh, and bacon is < 300 LoC, with facon at 365 (a line of code for
every day in the year!).
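
The proxy idea in miniature - a generic sketch rather than bacon's actual source - looks like this:

  # a generic sketch of the proxy approach (not bacon's real source):
  # the assertion helpers live on Should, and Object only gains #should
  class Should
    def initialize(object)
      @object = object
    end
    def ==(other)
      report(@object == other, "== #{other.inspect}")
    end
    def match(pattern)
      report(pattern === @object, "match #{pattern.inspect}")
    end
    private
    def report(ok, description)
      puts "#{@object.inspect} should #{description}: #{ok ? 'ok' : 'FAILED'}"
      ok
    end
  end

  class Object
    def should
      Should.new(self)
    end
  end

  5.should == 5               # prints: 5 should == 5: ok
  'hello'.should.match(/ell/) # prints: "hello" should match /ell/: ok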

···

On Mar 28, 11:27 pm, "ara.t.howard" <ara.t.how...@gmail.com> wrote:
          a: 42
          a: 42

--
-yossef

James Gray wrote:

I really feel counting the errors and reading the output are both things better handled by defining a good interface for the results writer. If I could just define some trivial class with methods like test_passed(), test_failed(), test_errored_out(), and tests_finished() then just plug that in, I could easily do anything I want.

What, besides instantly fix it (or revert) do you want to do with an error message from a broken test?

Phlip is referring to his own assert2, not the Test::Unit one. You
should check it out - it's really a great idea.

···

On Sun, Mar 29, 2009 at 11:33 PM, ara.t.howard <ara.t.howard@gmail.com> wrote:

On Mar 29, 2009, at 2:09 PM, Phlip wrote:

Or even less:

name = 42
assert{ name == ultimate.answer }

well, for one that won't run :wink: assert takes an arg and not a block... but
that aside

doh! i knew i was doing it wrong!

a @ http://codeforpeople.com/

···

On Mar 29, 2009, at 5:47 PM, Jeremy Hinegardner wrote:

This, of course, just means you are not writing big enough rails apps :-).

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Jeremy Hinegardner wrote:

···

On Mon, Mar 30, 2009 at 07:22:35AM +0900, ara.t.howard wrote:

if we accept the research and assume that bugs scale linearly with the # of lines of code this is not good for robustness. this is one of my main gripes with current ruby testing - my current rails app has about 1000 lines of code and 25,000 lines of testing framework!

I incredibly don't understand that. Our apps are big by Rails standards, yet our test:code ratio is only 2.5:1. That's mostly because we can't "refactor" the tests too much, or they are hard to read.

Sounds like you're pining for Python's doctest?

  doctest — Test interactive Python examples (Python documentation)

Later,

···

On Mar 30, 4:27 am, Brian Candler <b.cand...@pobox.com> wrote:

Just a thought: I've found myself doing that too, but in particular I've
run an example externally to be sure it works, then ended up pasting it
into the rdoc. So it would be really cool if rdoc could integrate the
examples directly in the appropriate place(s). I want to be sure that my
examples actually run as advertised, and at the moment I risk them
becoming out of date, so I'm definitely with you on that point.

--
http://twitter.com/bil_kleb

(a LOT of good stuff to which i'll reply selectively here)

Perhaps - but it's one rule that only needs to be learned once.

I notice that testy supports check <name>, <expected>, <actual> too.

Testy does (intentionally) force you to name your tests, whereas
Test::Unit will happily let you write

check <expected>, <actual>

I really don't like having to name each assertion, maybe because I'm
lazy or maybe because it feels like a DRY violation. I've already said
what I want to compare, why say it again?

hmmm. yeah i see that, but disagree that the effort isn't worth it for the *next* programmer.

Hmm, this is probably an argument *for* having a DSL for assertions - to
make the assertions read as much like example code ("after running this
example, you should see that A == B and C < D")

Neither

result.check "bar attribute", :expected => 123, :actual => res.bar

nor

assert_equal 123, res.bar, "bar attribute"

reads particularly well here, I think.

yeah i agree. i'm open to suggestion, just has to be very very simple.

Going too far down this path ends up with rspec, I think.

In fact, I don't really have a problem with writing

res.foo.should == 456

The trouble is the hundreds of arcane variations on this.

bingo! i really think the key is having *one* assertion method.

You solve this problem by only having a single test (Result#check), and
indeed if rspec only had a single method (should_equal) that would be
fairly clean too. However this is going to lead to awkwardness when you
want to test for something other than equality: e.g.

i dunno - ruby is pretty good at this

   value = begin; object.call; rescue => e; e.class; end

   result.check :error, SomeError, value

that seems perfectly fine to me. gives me an idea though - maybe check should take a block

   result.check(:error, SomeError){ something that raises an error }

and use the block to get the actual value as in

   value = begin; block.call; rescue Object => e; e; end

  result.check "foo should contain 'error'", foo, :=~, /error/

But again this is getting away from real ruby for the assertions, in
which case it isn't much better than

  assert_match /error/, foo, "foo should contain 'error'"

  assert_match /error/, foo # lazy/DRY version

check actually uses === for the comparison so you can do

   result.check :instance_of, SomeClass, object

   result.check :matches, /pattern/, string

i need to nail that down though.
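
pulling the two ideas together, a runnable sketch (not testy's actual source) of a single === -based check with the proposed block form:

   # a sketch (not testy's source): one === -based check, plus an
   # optional block whose raised exception becomes the actual value
   def check(name, expected, actual = nil, &block)
     actual = begin; block.call; rescue Object => e; e; end if block
     status = (expected === actual) ? 'success' : 'failure'
     puts "#{name}: #{status} (expect #{expected.inspect}, actual #{actual.inspect})"
   end

   check :matches,     /pattern/, 'a string with pattern inside'
   check :instance_of, Array,     []
   check(:error, RuntimeError){ raise 'boom' }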

get a listing of which tests/examples i can run

Yes, parseable results and test management are extremely beneficial.
Those could be retro-fitted to Test::Unit though (or whatever its
replacement in ruby 1.9 is called)

Getting rid of the at_exit magic is also worth doing.

i actually thought of simply patching test/unit... but then there are good test names, contexts, etc.

you can also do something like this (experimental) to just make a
simple example

cfp:~/src/git/testy > cat a.rb
require 'testy'

Testy.testing 'my lib' do

  test 'just an example of summing an array using inject' do
    a = 1,2
    a.push 3
    sum = a.inject(0){|n,i| n += i}
  end

end

Nice, could perhaps show the (expected) result inline too?

well - here actual always === expected with my current impl. it's essentially example code wrapped in assert_nothing_raised :wink:

I agree. Part of the problem is that when one thing is wrong making 20
tests fail, all with their respective backtraces, it can be very hard to
see the wood for the trees. What would be nice would be a folding-type
display with perhaps one line for each failed assertion, and a [+] you
can click on to get the detail for that particular one.

funny you mention that as i also hate that. my first version of testy actually just failed fast - if one test failed the code reported and aborted. maybe i should consider going back to that? i am finding that good output makes a ton of failures much easier - even using less and searching is easier with testy than anything else.

if we accept the research and assume that bugs scale linearly with
the # of lines of code this is not good for robustness.

I disagree there - not with the research, but the implied conclusion
that you should never use a large codebase. Shoulda works well, and I've
not once found a bizarre behaviour in the testing framework itself that
I've had to debug, so I trust it.

shoulda does work well - i stole its context concept just yesterday :wink:

about 1000 lines of code and 25,000 lines of testing framework!

Yeah, but how many lines of Rails framework? :slight_smile:

i used to be on the ramaze list a lot too you know :wink:

a @ http://codeforpeople.com/

···

On Mar 30, 2009, at 2:12 AM, Brian Candler wrote:
--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

I disagree. I 'learn' it each time I look at it, and then I forget it.
I think possibly because I think it's backwards.

The same goes for alias_method for me. I cannot tell you how many
times I've had to look up the order of old_name/new_name. And with
this, it's certainly because I think the values are backwards. (I just
had to look up the order to be sure.)
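
For the record, alias_method takes the new name first:

  # alias_method takes the *new* name first, then the existing one
  class Greeter
    def hello; 'hi'; end
    alias_method :greet, :hello   # greet becomes an alias for hello
  end

  Greeter.new.greet   # => "hi"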

···

On Mar 30, 2:12 am, Brian Candler <b.cand...@pobox.com> wrote:

Ara Howard wrote:
>> assert_equal 42, ultimate.answer, "name"

> you can basically do that too, but i continually forget which is
> expected and which is actual and, as you know, that's a slippery error
> to track down at times.

Perhaps - but it's one rule that only needs to be learned once.

The default writer wants to write them out to the console for the user to see.

In TextMate, I would override that behavior to color the error messages red in our command output window and hyperlink the stack trace back into TextMate documents.

James Edward Gray II

···

On Mar 29, 2009, at 2:29 PM, Phlip wrote:

James Gray wrote:

I really feel counting the errors and reading the output are both things better handled by defining a good interface for the results writer. If I could just define some trivial class with methods like test_passed(), test_failed(), test_errored_out(), and tests_finished() then just plug that in, I could easily do anything I want.

What, besides instantly fix it (or revert) do you want to do with an error message from a broken test?

report it in your ci tool

a @ http://codeforpeople.com/

···

On Mar 29, 2009, at 1:29 PM, Phlip wrote:

What, besides instantly fix it (or revert) do you want to do with an error message from a broken test?

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Sean O'Halpin wrote:

Phlip is referring to his own assert2, not the Test::Unit one. You
should check it out - it's really a great idea.

Tx, but it sucks! Here's the ideal test case (regardless of its .should or it{} syntax):

   def test_activate
      x = assemble()
      g = x.activate()
      g == 42
   end

The test should simply reflect the variables and values of everything after the activate line. Anything less is overhead. I _only_ want to spend my time setting up situations and equating their results. No DSLs or reflection or anything!

Now, why can't our language just /do/ that for us? It knows everything it needs...

ah - reports via the binding i assume - that *is* a nice idea.
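
a quick sketch of the binding idea - eval the assertion in the caller's scope and, on failure, dump every local (names here are hypothetical):

   # hypothetical: evaluate the expression in the caller's binding and,
   # when it is false, report every local variable in that scope
   def assert_locals(expr, b)
     return true if eval(expr, b)
     puts "FAILED: #{expr}"
     eval('local_variables', b).each do |name|
       puts "  #{name} = #{eval(name.to_s, b).inspect}"
     end
     false
   end

   x = 21
   g = x * 2
   assert_locals('g == 42', binding)   # passes; change 42 to see the dump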

a @ http://codeforpeople.com/

···

On Mar 29, 2009, at 5:01 PM, Sean O'Halpin wrote:

Phlip is referring to his own assert2, not the Test::Unit one. You
should check it out - it's really a great idea.

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

i'm not talking about code to test ratios, but code to test *framework* ratios. of course these numbers are a little bit high (whitespace, comments, etc) but

cfp:/opt/local/lib/ruby/gems/1.8/gems > for gem in thoughtbot-* rspec* faker* mocha*;do echo $gem;find $gem/lib -type f|grep .rb|xargs -n1 cat|wc -l;done
thoughtbot-factory_girl-1.2.0
      937
thoughtbot-shoulda-2.9.1
     3854
rspec-1.1.12
     8773
rspec-1.1.3
     7785
rspec-1.1.4
     8083
faker-0.3.1
      299
mocha-0.9.5
     3294

most people would consider an 8000 line rails app 'large'.

a @ http://codeforpeople.com/

···

On Mar 29, 2009, at 6:09 PM, Phlip wrote:

I incredibly don't understand that. Our apps are big by Rails standards, yet our test:code ratio is only 2.5:1. That's mostly because we can't "refactor" the tests too much, or they are hard to read.

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Bil Kleb wrote:

Sounds like you're pining for Python's doctest?

  doctest — Test interactive Python examples (Python documentation)

Ah, now that's a really interesting way of thinking about
examples/testing: in the form of an irb session. You can write your
tests just by mucking about in irb, and when it makes sense, just paste
the output somewhere.

  > foo = generate_foo
  => #<Foo:0xb7cd041c @attr1="hello", @attr2="world">
  > foo.attr1
  => "hello"
  > foo.attr2
  => "world"

Presumably you could avoid the fragile comparison on Object#inspect
output by deleting it from the transcript.

  > foo = generate_foo
  ...
  > foo.attr1
  => "hello"
  > foo.attr2
  => "world"

Writing something which parses that and (re)runs it to verify the output
should be pretty straightforward.

It can also handle the 'assert_raises' case nicely.

  > f.attr3
  NoMethodError: undefined method `attr3' for ...
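
A minimal checker along those lines is only a few lines (a sketch, not an existing tool; it handles the '=> value' case, and error lines could be matched the same way):

  # a sketch: eval each '> expr' line and compare its inspect output
  # against the following '=> ...' line
  def check_transcript(text, b = TOPLEVEL_BINDING)
    text.lines.map { |l| l.strip }.each_cons(2) do |line, next_line|
      next unless line[0, 2] == '> ' && next_line[0, 3] == '=> '
      expr, expected = line[2..-1], next_line[3..-1]
      actual = eval(expr, b).inspect
      puts(actual == expected ? "ok: #{expr}" :
           "FAIL: #{expr} gave #{actual}, expected #{expected}")
    end
  end

  check_transcript <<-'EOT'
    > 1 + 2
    => 3
    > 'hello'.upcase
    => "HELLO"
  EOT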

Ara, how about it? :slight_smile:

···

--
Posted via http://www.ruby-forum.com/.

100% agree on both counts. i personally would always use the options approach (:expected => ..., :actual => ...) because of that.

a @ http://codeforpeople.com/

···

On Mar 30, 2009, at 12:20 PM, Phrogz wrote:

I disagree. I 'learn' it each time I look at it, and then I forget it.
I think possibly because I think it's backwards.

The same goes for alias_method for me. I cannot tell you how many
times I've had to look up the order of old_name/new_name. And with
this, it's certainly because I think the values are backwards. (I just
had to look up the order to be sure.)

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Ara Howard wrote:

shoulda does work well - i stole its context concept just yesterday :wink:

Looks like you're not the only one:

A nice thing about tiny test frameworks like this (and yours) is that
they are easily vendorized, so the app is no longer dependent on a
specific test framework and/or version being available.

···

--
Posted via http://www.ruby-forum.com/.