Minitest randomization

I guess this comes down to idempotency. I expect that if I do
something twice in a row I will get the same result. Randomizing by
default breaks this expectation. It's astonishing, therefore bad, no
matter how good from a theoretical standpoint, and especially
astonishing when people have 10+ years of xUnit and its heirs building
these expectations.

Thanks for your detailed reply, but I should point out that your
responses didn't address either my idempotency or astonishment
arguments.

> Why aren't you using --seed when you rerun your specs?

Because I say "rake" to run my tests, and the seed is not preserved. How
are you running yours?
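
(For context, the typical Rakefile arrangement -- nothing here pins a
seed between runs, so each `rake` invocation gets a fresh random order;
the pattern is just an example:)

    # Rakefile
    require 'rake/testtask'

    Rake::TestTask.new do |t|
      t.pattern = "test/**/test_*.rb"
    end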

> If you use --seed with the previous value, all of your complaints about having to "squint" to find your previous failure go away.

Well, no, not *all* of them :)

And I wasn't being metaphorical. When I'm poring over a console full
of fail, I squint. And sometimes I sigh.

>> * make test randomization an option ("randomize")

> It is an option on a class by class basis. See `ri MiniTest::Unit::TestCase::test_order`.

Yeah, I saw that before, but that doesn't fix my core complaint unless
I hack minitest to always run in consistent order. Hence my request
for a patch.

And you're right, it seems that that patch is trivial for
MiniTest::Spec, and probably not too hard for
MiniTest::Unit::TestCase.
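
(For concreteness, the per-class override Ryan points at looks roughly
like this -- a minimal sketch against the 1.9-era MiniTest API, so
treat the exact spelling as an assumption:)

    require 'minitest/autorun'

    class TestSomething < MiniTest::Unit::TestCase
      # Opt this one class out of randomization; its tests run in
      # alphabetical order instead of the default random order.
      def self.test_order
        :alpha
      end

      def test_first_step;  assert true; end
      def test_second_step; assert true; end
    end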

BTW your docs should reflect that for MiniTest::Spec, :alpha and
:sorted both really mean :defined -- or whatever you want to call the
traditional "order in which they occur in the file" (which is often
roughly in order of complexity, so earlier failures are often more
fundamental and should be fixed first). (And for
MiniTest::Unit::TestCase, too, but only in Ruby 1.9.)

> Anecdotally (unfortunately, nothing I can go into in great detail), I've seen far too many projects with test order dependencies, which is the reason that feature went into the library in the first place.

I'm sure you thought you had a good reason for being inconsistent. But
in this case, where the library ships with core Ruby and is supposed
to be an invisible drop-in replacement for Test::Unit, it's a bridge
too far.

And I'm not even disagreeing with your observation! I agree that there
should be a randomizing mode, and that people should run it fairly
often. Just not all the time and not without a config or command-line
option to turn it off.

> Unfortunately, we need to output the seed value at the beginning in the case that your tests not only fail, but crash (like when you're Aaron Patterson and you're working on C extensions instead of writing ruby like a good person).

Wait, are you saying the reason we all have to look at console spam is
so that Aaron doesn't have to type "--verbose" when he's writing C in
Ruby? Does he have compromising photos of you with the maid or
something? :)

> I still think that random tests/specs are stronger tests/specs and completely disagree with you that "most well-factored OO [test] code these days does not exhibit isolation problems" on the basis that most OO [test] code is not well-factored. minitest's test dir flays at 535. Wrong's tests flay at 1150.

That's a nice debate trick -- insert a qualifier and then disagree
with it, not with what I actually said :)

Anecdotally, when you see test order dependency problems, are they
because the tests are not isolated or because the production code
isn't? I was talking about production OO code, not test code (which is
more like a bunch of functions than like an object anyway).

Flay's a nice tool, but the threshold for DRY in test code is higher
than in production code. Test code needs to be understandable above
all else, so you can home in on the scenario leading up to a
file-and-line failure, and that can mean some duplication is
desirable. That doesn't mean the tests aren't isolated.

- A

···

--
Alex Chaffee - alex@stinky.com - http://alexch.github.com
Stalk me: http://friendfeed.com/alexch | http://twitter.com/alexch |
http://alexch.tumblr.com

Idempotency is a red herring. There is nothing about the xUnit family/philosophy of tools (or any other test tool that I have used or studied -- except rspec) that suggests that tests must be run in the order defined (or in any particular order at all). Just look at the new tools coming out that distribute and multithread/multiprocess your tests: right there, the notion of a fixed order has to be thrown out the window by design.

As for your astonishment, I thought it was pretty well addressed in the first line of my reply: "Really? I think preventing test order dependency has a very practical effect". If you're still astonished after that, then you're probably misusing the word.

···

On Oct 9, 2010, at 16:50 , Alex Chaffee wrote:

> I guess this comes down to idempotency. I expect that if I do
> something twice in a row I will get the same result. Randomizing by
> default breaks this expectation. It's astonishing, therefore bad, no
> matter how good from a theoretical standpoint, and especially
> astonishing when people have 10+ years of xUnit and its heirs building
> these expectations.

-----

At this point I'm going to cut much of your reply and everything I've written so far in response and cut to the chase:

> And I'm not even disagreeing with your observation! I agree that there
> should be a randomizing mode, and that people should run it fairly
> often. Just not all the time and not without a config or command-line
> option to turn it off.

Apparently this is the crux of our disagreement:

I __do__ think that people should randomize their tests a MAJORITY of the time and turn it off TEMPORARILY when they need to sort out an issue. If it wasn't random by default, it wouldn't happen at all.

If you disagree with that (and still want to use minitest), you ALREADY have multiple ways to fix it for your own tests:

1) define ::test_order to return :sorted. That'd be your "config" suggestion above.
2) use --seed when you want the order to be fixed, via TESTOPTS if you're using rake. And that'd be your command line option...
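
(Concretely, something like this at the shell -- the seed value and
file name are made up, and this assumes rake's standard Rake::TestTask,
which passes TESTOPTS through to the runner:)

    # pin the order for this run (repeat with the same seed to reproduce):
    rake test TESTOPTS="--seed=1234"

    # or, bypassing rake:
    ruby test/my_test.rb --seed=1234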

If you want a third option available, feel free to propose it and I'll gladly consider it.

P.S. I'm still mulling over Steve Klabnik's suggestion that the output be sorted. I think it could be very confusing when you do have test dependency errors but that there might be some way to mitigate the confusion. I'd like to hear what you think about his suggestion.

(Apologies if the quotes don't come out right in plain text -- I'm using
both Apple Mail and GMail and they're playing crazy HTML games with my
draft.)

>> I guess this comes down to idempotency. I expect that if I do
>> something twice in a row I will get the same result. Randomizing by
>> default breaks this expectation. It's astonishing, therefore bad, no
>> matter how good from a theoretical standpoint, and especially
>> astonishing when people have 10+ years of xUnit and its heirs building
>> these expectations.

> Idempotency is a red herring. There is nothing about the xUnit
> family/philosophy of tools (or any other test tool that I have used or
> studied -- except rspec) that suggests that tests must be run in the
> order defined (or in any particular order at all). Just look at the new
> tools coming out that distribute and multithread/multiprocess your tests:
> right there, the notion of a fixed order has to be thrown out the window
> by design.

That's a fair point. The idempotency I was referring to was that running
"rake test" twice on a failing suite gets different results -- if not
different failures, then the same failures in a different order.

> As for your astonishment, I thought it was pretty well addressed in the
> first line of my reply: "Really? I think preventing test order dependency
> has a very practical effect". If you're still astonished after that, then
> you're probably misusing the word.

I'm using it in its technical sense:

And I stand by what I wrote: if your tests are all passing, and they're well
isolated, then randomizing them has no practical effect. It's just shuffling
a deck full of aces.

> At this point I'm going to cut much of your reply and everything I've
> written so far in response and cut to the chase:

>> And I'm not even disagreeing with your observation! I agree that there
>> should be a randomizing mode, and that people should run it fairly
>> often. Just not all the time and not without a config or command-line
>> option to turn it off.

> Apparently this is the crux of our disagreement:

> I __do__ think that people should randomize their tests a MAJORITY of the
> time and turn it off TEMPORARILY when they need to sort out an issue. If it
> wasn't random by default, it wouldn't happen at all.

This is a noble position, as I said before. You're the self-appointed
isolation vigilante, crusading against a problem you abhor. But I've rarely
encountered it. I feel that my tests don't need randomization, and the extra
output clutters my console (*), and the shuffling cramps my debugging style,
so I want it off unless I ask for it. If you're Batman, I feel like I'm the
Lorax. I speak for the trees whose pristine consoles are being polluted, but
who haven't spoken out. (I haven't really heard a chorus of protestors in
favor of randomization either, fwiw.)

You're the library author, so you have the privilege of deciding what mode
is the default. I'm hoping to convince you of a few things, but if I don't,
I won't take it personally.

Sounds like we're approaching a compromise, though: an option for me,
defaulting to off for you. (And option != monkey patch -- it's a clear API
like a named switch on the command line and/or a value on some Minitest
object, e.g. "Minitest::Config.randomize = false".)

I'd also be happy with just a verbosity setting, maybe with several levels
like you suggest.

> 2) use --seed when you want the order to be fixed, via TESTOPTS if you're
> using rake. And that'd be your command line option...

TESTOPTS. Roger that. Never used it before. Maybe the Minitest README should
say something about that when it talks about --seed. (Oh, looks like it
doesn't talk about --seed either.)

> If you want a third option available, feel free to propose it and I'll
> gladly consider it.

Had a weird thought while doing the dishes... what if you write out the
seed somewhere persistent, like .minitest_seed, then erase it after the
run, but only if the run was successful. Then when a run starts, if
.minitest_seed exists, it uses it (and says so) instead of rolling a new
one. That way you don't have to print anything for successful runs, the
user doesn't have to remember anything, and idempotency is preserved: if
it fails once, it'll fail the next time in exactly the same way, and it'll
keep failing consistently until you fix the problem. It also works for C
hackers, since a crash means the cached seed won't be erased.
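
(A rough sketch of that idea; run_tests and the file name are
hypothetical stand-ins, not real minitest API:)

    SEED_FILE = ".minitest_seed"

    # Reuse a cached seed from a previous failing/crashed run, if any.
    if File.exist?(SEED_FILE)
      seed = File.read(SEED_FILE).to_i
      puts "Reusing seed #{seed} from #{SEED_FILE}"
    else
      seed = rand(0xFFFF)
    end

    # Persist the seed *before* running, so a crash leaves it behind.
    File.open(SEED_FILE, "w") { |f| f.write(seed.to_s) }

    passed = run_tests(seed)   # hypothetical stand-in for the runner call

    # Only a fully green run clears the cache; failures keep the seed,
    # so the next run repeats the exact same order.
    File.delete(SEED_FILE) if passed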

(And hey, also, for Aaron's sake, can't you trap SIGSEGV and print the seed
then? Not a rhetorical question, since I haven't done any C+Ruby stuff and I
know signals are sometimes flaky.)

Since --seed freezes in-test randomization too, I think there
should be separate options for all three (--seed, --randomize, and
--verbose).

> P.S. I'm still mulling over Steve Klabnik's suggestion that the output be
> sorted. I think it could be very confusing when you do have test dependency
> errors but that there might be some way to mitigate the confusion. I'd like
> to hear what you think about his suggestion.

I like it. It's pretty weird, though. It's a very pleasant dream; I'm not
sure it will survive in the cold light of day.

- A

(*) My console is already way cluttered even with the minimum verbosity --
my collaborator Steve wrote some code that runs each of our tests in its own
VM process, to ensure isolation of dependencies and other stuff, so I get a
big long scroll of test runs, each of which is now 2 lines longer because of
"test run output" cruft. git clone wrong and run "rake rvm:test" to see what
I mean. Every line of output I save is multiplied by (N tests) x (M Ruby
versions). Since it's slow, I only run it before checkin.

> P.S. I'm still mulling over Steve Klabnik's suggestion that the output be
> sorted.

I haven't run any psychological tests on this, but I suspect that if I'm
looking at a largish printout, it might be easier to notice something
unexpected if the printout is always in the same order. So that seems a
useful option to me.

> I think it could be very confusing when you do have test dependency errors

Good point.

> but that there might be some way to mitigate the confusion.

I hesitate to suggest this, because it seems too obvious, so there might be
something wrong with it that I'm missing. If the output is going to be
sorted, the whole of the unsorted output will be available (?), so there
could be an option to write the unsorted output to a file, as well as
printing the sorted output?
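
(A sketch of that idea, assuming the runner buffers its report strings
before printing them -- an assumption about internals, not a description
of what minitest actually does:)

    # Print reports sorted for stable scanning; keep run order on disk
    # so test-order dependencies can still be diagnosed afterwards.
    def emit_reports(reports)   # reports: Array of report Strings
      File.open("unsorted_output.log", "w") do |f|
        reports.each { |r| f.puts r }
      end
      reports.sort.each { |r| puts r }
    end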

···

On Sun, Oct 10, 2010 at 11:25 AM, Ryan Davis <ryand-ruby@zenspider.com> wrote:

When running in a console, is it possible to highlight test failures/errors
in a different (red) color? Most modern terminals support colored text. (A
pity that I'm stuck with Windows due to something.)
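
(On an ANSI-capable terminal this is nearly a one-liner; a minimal
sketch, independent of minitest's internals -- the failure string is
invented:)

    # Wrap text in the ANSI escape codes for red. Most Unix terminals
    # honor these; plain cmd.exe on Windows does not, which is the
    # problem mentioned above.
    def red(text)
      "\e[31m#{text}\e[0m"
    end

    puts red("1) Failure: test_something(TC_Foo) [foo_test.rb:42]")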

Once the test run finishes, maybe an additional report could be provided,
with only the highlighted stuff collected in order, and with some
reporting-related options/command-line arguments added to activate this
behavior. Furthermore, as it's an additional report, it needn't be just
text; maybe a temporary HTML file that automatically opens in my browser?

···

On Sun, Oct 10, 2010 at 6:25 PM, Ryan Davis <ryand-ruby@zenspider.com> wrote:

> If you want a third option available, feel free to propose it and I'll
> gladly consider it.

> P.S. I'm still mulling over Steve Klabnik's suggestion that the output be
> sorted. I think it could be very confusing when you do have test dependency
> errors but that there might be some way to mitigate the confusion. I'd like
> to hear what you think about his suggestion.

Well, shouldn't any failure/error be unexpected (beyond the regular test-first pattern)?

···

On Oct 10, 2010, at 08:04 , Colin Bartlett wrote:

> On Sun, Oct 10, 2010 at 11:25 AM, Ryan Davis <ryand-ruby@zenspider.com> wrote:

>> P.S. I'm still mulling over Steve Klabnik's suggestion that the output be
>> sorted.

> I haven't run any psychological tests on this, but I suspect that if I'm
> looking at a largish printout, it might be easier to notice something
> unexpected if the printout is always in the same order. So that seems a
> useful option to me.

Neither of these suggestions is something that I think belongs in minitest. "mini" is the prefix for a reason. I am working on an extension system right now that should help you write them as plugins. It will be part of the 2.0 release.

···

On Oct 10, 2010, at 20:15 , redstun wrote:

> When running in a console, is it possible to highlight test failures/errors
> in a different (red) color? Most modern terminals support colored text. (A
> pity that I'm stuck with Windows due to something.)

> Once the test run finishes, maybe an additional report could be provided,
> with only the highlighted stuff collected in order, and with some
> reporting-related options/command-line arguments added to activate this
> behavior. Furthermore, as it's an additional report, it needn't be just
> text; maybe a temporary HTML file that automatically opens in my browser?