The best kittens, technology, and video games blog in the world.

Monday, April 06, 2015

The Rise and Fall of Unit Testing

DONUT WANT by courtneyp from flickr (CC-ND)

It's hard to discuss tests, as there's no agreement on what "X test" means for any value of X. In this post I'll use "unit test" to refer to low-level tests of various internal subcomponents, and "system test" to refer to high-level, largely black-box tests of the whole system or a major part of it.

Once upon a time, before testing became widespread, I believed that unit testing greatly improves software quality. It was extremely obvious how awful the kind of code that had no unit tests usually was, and code with good unit test coverage was usually much more sensibly organized.

Many years later, once testing became the new orthodoxy, I started noticing something new - code that was reasonably covered by unit tests, but which internally was a total mess.

Was I wrong all along? Was it simply that early adopters of such practices as unit testing just write better code anyway, and nothing can help the mediocre masses? There's probably some amount of truth to that, but that's a really depressing thought, as it'd basically mean that unit testing is a pointless ritual, and we're doomed to suffer crappy code forever.

Then I understood.

Why do we bother unit testing?

The obvious answer is that automated testing protects from bugs, but that's not really true. Testing follows the typical 80:20 pattern - 20% of tests catch 80% of problems - so if you write a few system tests covering common cases, and then a handful of unit tests and regression tests to deal with the parts of your codebase that need them the most, you're nearly as well covered as someone who unit tests absolutely everything, and you'll spend a fraction of the time writing and maintaining your tests.

Adding more tests beyond that helps a little, but returns on effort are usually not impressive. If you don't believe me, check the CI logs of just about any codebase for tests that always pass, only fail as false positives, or only fail in groups where you could keep one test and delete the rest.

What's worse, there are whole important categories of serious bugs - such as security vulnerabilities - which unit tests are very unlikely to discover.

So what is the point then? The most important one is that unit testing forces individual components to be able to work in isolation. If a component can only be used as part of the whole system, it's going to be horribly difficult to test, so you'll end up rewriting it in a way that disentangles it, and this process, iterated for every component, ends up greatly improving the structure of the code.
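To make that concrete, here's a minimal, invented sketch (none of these names come from a real codebase): a component that takes plain values in and gives plain values back can be tested with no setup at all, and getting code into that shape is exactly the disentangling described above.

    import java.util.List;

    // Invented example: once the formatter takes plain values instead of
    // reaching into global state, it can be exercised completely on its own.
    class InvoiceFormatter {
        String format(String customer, List<Integer> amountsInCents) {
            int totalCents = amountsInCents.stream().mapToInt(Integer::intValue).sum();
            return customer + ": " + amountsInCents.size() + " items, total " + (totalCents / 100.0);
        }
    }

    class InvoiceFormatterTest {
        public static void main(String[] args) {
            String out = new InvoiceFormatter().format("ACME", List.of(1000, 250));
            if (!out.equals("ACME: 2 items, total 12.5")) throw new AssertionError(out);
            System.out.println("ok");
        }
    }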

And then someone invented mocking and broke everything.

Don't use mocks. Ever.

Extensive mocking makes unit testing worse than pointless.

On one hand, you no longer need to care if something is hopelessly entangled with the rest of the system - you'll just stub, mock, and double all these connections, and you can write something that looks sort of like a real unit test, while your code is fragile spaghetti.

On the other hand, tests that mock a lot are hilariously likely to pass or fail simply because the mock is necessarily different from the real thing, so they don't test much, mostly just giving you false confidence.
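As a tiny, invented illustration of that failure mode, the hand-rolled mock below accepts input the real implementation would reject, so the green test proves nothing about production:

    interface PriceService {
        int priceInCents(String sku);
    }

    class RealPriceService implements PriceService {
        public int priceInCents(String sku) {
            // The real service validates its input and hits a real catalogue.
            if (!sku.startsWith("SKU-")) throw new IllegalArgumentException(sku);
            return 999;
        }
    }

    class Cart {
        int totalCents(PriceService prices, String... skus) {
            int sum = 0;
            for (String sku : skus) sum += prices.priceInCents(sku);
            return sum;
        }
    }

    class CartMockTest {
        public static void main(String[] args) {
            PriceService mock = sku -> 100;                    // accepts anything, unlike the real thing
            int total = new Cart().totalCents(mock, "banana"); // "banana" would blow up in production
            if (total != 100) throw new AssertionError(total);
            System.out.println("test passed - and told us nothing");
        }
    }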

What is even the point?

Mocks are the C++ of testing

I'm being unfair here. There are reasonable use cases for pretty much every mocking feature, and for most fancy testing framework features beyond basic test/unit style. People who came up with all that mostly had good reasons for them.

But the same can be said of C++. Pretty much every single feature added was there for a reason. It didn't matter - the end result of adding them all and using them indiscriminately was a total disaster.

A tiny bit of mocking here and there when necessary probably won't hurt you, but it takes a lot of discipline to avoid this deteriorating from "tiny bit when necessary" into "well, I could test that properly, but mocking is easier, so I'll just write some fake tests" under pressure.

Donuts for mocks

I thought about some easy guidelines for when mocks are excusable, but I can't really come up with anything convincing, and I'm not terribly convinced by the guidelines given by other people either.

So to avoid negativity, here's an untested idea - just like swear jars can discourage swearing without completely preventing it when really justified, why not put a similar fine on mocking?

If you're absolutely sure there's no other way, bribe your way out of the situation, and buy the whole team some donuts (or other appropriate sweets). If you don't, no mocks for you.

The next time you are tempted to mock it, ask yourself if it's worth a trip to Krispy Kreme.

9 comments:

Anonymous said...

I guess you are missing something there.

First of all, if you need to check security, a feature, or other behavior of your system, then you have to write integration tests (which are not unit tests), or you can write ACC tests as well. But if you only have to check the behavior of a method, you are for sure not interested in knowing anything else about your system, and mocks are very helpful there.

So if you want to have a very stable product, then you need at least two types of automated tests on the project.

I guess you'll find it useful to know that some projects use BDD (which covers features/integration) and TDD (with unit tests) together.
They're just two different kinds of testing which complement each other.

In general, testing can be done without mocks, but a 'unit test' without mocks becomes not so 'unit' anymore.

Shaun Finglas said...

I agree with what you said, Tomasz, about code with tests that is internally a complete mess.

I've blogged about this topic and came up with a few ways to limit this going forward; it may be of some use.

Nicolas Dorier said...

I agree with you on the point that mocking everything is a waste of time.

But testing is not. A good test makes it easy to set a breakpoint on and run any line of your code.

On my side, my tests are more integration ones: instead of mocking a web server, I initialize its state (databases) to a known state and run the test with all dependencies on.

Jason W. Thompson said...

I'm not sure I can agree with you on the "no mocking rule". I would suggest instead that one should not write any "bad" mocks. That is to say, a mock should generally work similarly to the actual layer. If I pass a bad argument into a mock, then I should get an exception back, and so on.

The main reason for mocks is to have the ability to test components without relying on external resources. For example, if I want to test that my app will allow the end user to change their name after 6 months have passed, I can either not test the scenario, rely on the system clock resource and wait 6 months, or mock the clock implementation using the java.time API and change the clock to read 6 months into the future.
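For instance, something along these lines with the standard java.time Clock (the NamePolicy class here is just an invented illustration):

    import java.time.Clock;
    import java.time.Instant;
    import java.time.ZoneOffset;
    import java.time.temporal.ChronoUnit;

    class NamePolicy {
        private final Clock clock;
        NamePolicy(Clock clock) { this.clock = clock; }

        boolean mayChangeName(Instant registeredAt) {
            // Allowed once roughly six months (183 days) have passed.
            return Instant.now(clock).isAfter(registeredAt.plus(183, ChronoUnit.DAYS));
        }
    }

    class NamePolicyTest {
        public static void main(String[] args) {
            Instant registeredAt = Instant.parse("2015-01-01T00:00:00Z");
            // Pin "now" seven months after registration instead of waiting in real time.
            Clock later = Clock.fixed(Instant.parse("2015-08-01T00:00:00Z"), ZoneOffset.UTC);
            if (!new NamePolicy(later).mayChangeName(registeredAt))
                throw new AssertionError("expected the name change to be allowed");
            System.out.println("ok");
        }
    }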

Similarly, a layer that talks to a database might be a lot slower. So one should just mock that layer to hold all of the data in memory. Writing slow tests makes for slow development.
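In the same spirit, the database layer can hide behind an interface with an in-memory implementation for tests (UserStore here is invented for illustration):

    import java.util.HashMap;
    import java.util.Map;

    interface UserStore {
        void save(String id, String name);
        String findName(String id);
    }

    // In production this interface would be backed by the real database;
    // in tests the same interface is backed by a plain map, so nothing is slow.
    class InMemoryUserStore implements UserStore {
        private final Map<String, String> users = new HashMap<>();
        public void save(String id, String name) { users.put(id, name); }
        public String findName(String id) { return users.get(id); }
    }

    class InMemoryUserStoreTest {
        public static void main(String[] args) {
            UserStore store = new InMemoryUserStore();
            store.save("42", "Ada");
            if (!"Ada".equals(store.findName("42"))) throw new AssertionError();
            System.out.println("ok");
        }
    }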

Greg said...

Agree 100%. TDD without mocks is rare.

Jon said...

It is probably worth noting the difference between mocking and faking, as well as between mocking nasty dependencies and faking emergent dependencies. The former type of dependency is often solved by asking the question "Why do I need this dependency?" The latter type of dependency should be obviously necessary based on the test (BDD or TDD).

Anonymous said...

In my experience, unit tests are necessary in two situations. First, during development when the code under test is changing rapidly, a good test suite is a very efficient way of checking that a change or bug fix did not break something. Secondly, in the absence of good test coverage, refactoring is like sky diving without a parachute.

Anonymous said...

What you suggested at the top is correct: Good, experienced developers know how to write good tests and need to write fewer of them. That's one difference between mere "coding" and building useful, reliable stuff. There's no free lunch.

Anonymous said...

In addition to using mocks sparingly so that the need to test can inform design, the other rule I have is to audit for coverage twice:

First, I make sure that I'm testing (not merely exercising, as automated coverage analysis reports on) all of the common and edge cases in my IMPLEMENTATION.

Second, I repeat the process, but looking for edge cases in the algorithm or theoretical model which should be tested but aren't yet because some quirk of the implementation allows them to be handled by the common code path.

(i.e. audit for what the edge cases are now, then audit for what might become edge cases later)
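A small invented example of the distinction: an empty order happens to be handled by the common loop today, so coverage tools already count it as exercised, but it's an edge case of the model and earns its own explicit test before any rewrite.

    import java.util.List;

    class OrderTotal {
        static int totalCents(List<Integer> lineItems) {
            int sum = 0;
            for (int cents : lineItems) sum += cents; // an empty list simply skips the loop
            return sum;
        }
    }

    class OrderTotalTest {
        public static void main(String[] args) {
            // First audit: common case, clearly an implementation path worth testing.
            if (OrderTotal.totalCents(List.of(100, 250)) != 350) throw new AssertionError();
            // Second audit: edge case of the model, currently handled "for free"
            // by the common path, pinned down so a later rewrite can't break it.
            if (OrderTotal.totalCents(List.of()) != 0) throw new AssertionError();
            System.out.println("ok");
        }
    }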