The Differences Between Test-First Programming and Test-Driven Development

There seems to be some confusion between Test-First Programming and Test-Driven Development (TDD).

This post explains that merely writing the tests before the code doesn’t necessarily make it TDD.

Similarities Between Test-First Programming and Test-Driven Development

It’s not hard to see why people would confuse the two, since they have many things in common.

My classification of tests distinguishes six dimensions: who, what, when, where, why, and how.

Test-First Programming and Test-Driven Development score the same in five of those six dimensions: they are both automated (how), functional (what), programmer (who) tests at the unit level (where), written before the code (when).

The only difference is in why they are written.

Differences Between Test-First Programming and Test-Driven Development

Test-First Programming mandates that tests be written before the code, so that the code will always be testable. This is more efficient than having to change already written code to make it testable.

Test-First Programming doesn’t say anything about other activities in the development cycle, like requirements analysis and design.

This is a big difference from Test-Driven Development (TDD), since in TDD, the tests drive the design. Let’s take a detailed look at the TDD process of Red/Green/Refactor, to find out exactly how that differs from Test-First Programming.

Red

In the first TDD phase we write a test. Since there is no code yet to make the test pass, this test will fail.

Unit testing frameworks like JUnit will show the result in red to indicate failure.

In both Test-First Programming and Test-Driven Development, we use this phase to record a requirement as a test.

TDD, however, goes a step further: we also explicitly design the client API. Test-First Programming is silent on how and when we should do that.

Green

In the next phase, we write code to make the test pass. Unit testing frameworks show passing tests in green.

In Test-Driven Development, we always write the simplest possible code that makes the test pass. This allows us to keep our options open and evolve the design.

We may evolve our code using simple transformations to increase the complexity of the code enough to satisfy the requirements that are expressed in the tests.

Test-First Programming is silent on what sort of code you write in this phase and how you do it, as long as the test will pass.
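As a minimal sketch of the Green phase, assume we are converting numbers to Roman numerals and have so far written tests for 1, 2, and 3 only. The names `RomanNumerals` and `toRoman` are hypothetical, and a small hand-written check stands in for JUnit assertions to keep the example free of dependencies:

```java
class RomanNumeralsTest {
    // Stand-in for a JUnit assertion, so the sketch compiles on its own.
    static void assertRoman(String expected, int number) {
        String actual = RomanNumerals.toRoman(number);
        if (!expected.equals(actual)) {
            throw new AssertionError(number + ": expected " + expected + " but was " + actual);
        }
    }

    public static void main(String[] args) {
        // The requirements recorded as tests so far.
        assertRoman("I", 1);
        assertRoman("II", 2);
        assertRoman("III", 3);
    }
}

// Hypothetical production code; the API was chosen while writing the tests.
class RomanNumerals {
    // Simplest code that makes the current tests pass. It deliberately
    // ignores 4 and higher, because no test demands those yet.
    static String toRoman(int number) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < number; i++) {
            result.append('I');
        }
        return result.toString();
    }
}
```

The next failing test, for instance for 4, would force the next small transformation of this code.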

Refactor

In the final TDD phase, the code is refactored to improve the design of the implementation.

This phase is completely absent in Test-First Programming.
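To illustrate, here is a sketch of what a Refactor step could look like, assuming a Roman numerals exercise in which the Green phase left us with working but repetitive code for the numbers 1 to 39. All names are hypothetical. The behavior stays the same; only the internal organization changes:

```java
class RomanNumeralConverterTest {
    public static void main(String[] args) {
        // The tests are not changed by refactoring: both versions must agree.
        for (int n = 1; n <= 39; n++) {
            String before = RomanNumeralConverter.toRomanBeforeRefactoring(n);
            String after = RomanNumeralConverter.toRoman(n);
            if (!before.equals(after)) {
                throw new AssertionError(n + ": " + before + " vs " + after);
            }
        }
    }
}

class RomanNumeralConverter {
    // State of the code after Green: it passes, but every symbol is a special case.
    static String toRomanBeforeRefactoring(int number) {
        StringBuilder result = new StringBuilder();
        int remaining = number;
        while (remaining >= 10) { result.append("X"); remaining -= 10; }
        if (remaining >= 9) { result.append("IX"); remaining -= 9; }
        if (remaining >= 5) { result.append("V"); remaining -= 5; }
        if (remaining >= 4) { result.append("IV"); remaining -= 4; }
        while (remaining >= 1) { result.append("I"); remaining -= 1; }
        return result.toString();
    }

    // After Refactor: the special cases are expressed as data.
    private static final int[] VALUES = {10, 9, 5, 4, 1};
    private static final String[] SYMBOLS = {"X", "IX", "V", "IV", "I"};

    static String toRoman(int number) {
        StringBuilder result = new StringBuilder();
        int remaining = number;
        for (int i = 0; i < VALUES.length; i++) {
            while (remaining >= VALUES[i]) {
                result.append(SYMBOLS[i]);
                remaining -= VALUES[i];
            }
        }
        return result.toString();
    }
}
```

In a real code base the old method would simply be replaced; it is kept here only to show that the tests pass both before and after the change.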

Summary of Differences

So we’ve uncovered two differences that distinguish Test-First Programming from Test-Driven Development:

  1. Test-Driven Development uses the Red phase to design the client API. Test-First Programming is silent on when and how you arrive at a good client API.
  2. Where Test-First Programming has a single coding phase, Test-Driven Development splits it into two. In the first sub-phase (Green), the focus is on meeting the requirements. In the second sub-phase (Refactor), the focus is on creating a good design.

I think there is a lot of value in the second point. Many developers focus too much on getting the requirements implemented and forget to clean up their code. The result is an accumulation of technical debt that will slow development down over time.

TDD also splits the design activity into two. First we design the external face of the code, i.e. the API. Then we design the internal organization of the code.

This is a useful distinction as well, because the heuristics you would use to tell a good API from a bad one are different from those for good internal design.

Try Before You Buy

All in all, I think Test-Driven Development provides sufficient value over Test-First Programming to give it a try.

All new things are hard, however, so be sure to practice TDD before you start applying it in the wild.

There are numerous katas that can help you with that, like the Roman Numerals Kata.

On Measuring Code Coverage

In a previous post, I explained how to visualize what part of your code is covered by your tests.

This post explores two questions that are perhaps more important: why and what code coverage to measure.

Why We Measure Code Coverage

What does it mean for a statement to be covered by tests? Well, it means that the statement was executed while the tests ran, nothing more, nothing less.

We can’t automatically assume that the statement is tested, since the bare fact that a statement was executed doesn’t imply that the effects of that execution were verified by the tests.
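A small sketch may make the difference between executed and tested concrete. The class and method names below are made up for illustration, and the bug is deliberate. Both "tests" execute every statement of `discountedPriceInCents`, so both would yield full line coverage, but only one of them verifies the result:

```java
class CoverageWithoutVerification {
    // Deliberately buggy: meant to apply a 10% discount, but applies 1%.
    static long discountedPriceInCents(long priceInCents) {
        return priceInCents * 99 / 100;
    }

    // Executes the statement above, so it counts as covered,
    // but checks nothing: this test passes despite the bug.
    static void testThatOnlyExecutes() {
        discountedPriceInCents(10_000);
    }

    // Executes the same statement and verifies its effect: this test fails.
    static void testThatVerifies() {
        long actual = discountedPriceInCents(10_000);
        if (actual != 9_000) {
            throw new AssertionError("expected 9000 but was " + actual);
        }
    }

    public static void main(String[] args) {
        testThatOnlyExecutes();
        System.out.println("executing test passed");
        try {
            testThatVerifies();
            System.out.println("verifying test passed");
        } catch (AssertionError e) {
            System.out.println("verifying test failed: " + e.getMessage());
        }
    }
}
```

A coverage tool cannot tell these two tests apart, since it only records which statements ran.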

If you practice Test-First Programming, then the tests are written before the code. A new statement is added to the code only to make a failing test pass. So with Test-First Programming, you know that each executed statement is also a tested statement.

If you don’t write your tests first, then all bets are off. Since Test-First Programming isn’t as popular as I think it should be, let’s assume for the remainder of this post that you’re not practicing it.

Then what good does it do us to know that a statement is executed?

Well, if the next statement is also executed, then we know that the first statement didn’t throw an exception.

That doesn’t help us much either, however. Most statements should not throw an exception, but some statements clearly should. So in general, we still don’t get a lot of value out of knowing that a statement is executed.

The true value of measuring code coverage is therefore not in the statements that are covered, but in the statements that are not covered! Any statement that is not executed while running the tests is surely not tested.

Uncovered code indicates that we’re missing tests.

What Code Coverage We Should Measure

Our next job is to figure out what tests are missing, so we can add them. How can we do that?

Since we’re measuring code coverage, we know the target of the missing tests, namely the statements that were not executed.

If some of those statements are in a single class, and you have unit tests for that class, it’s easy to see that those unit tests are incomplete.

Unit tests can definitely benefit from measuring code coverage.

What about acceptance tests? Some code can easily be related to a single feature, so in those cases we could add an acceptance test.

In general, however, the relationship between a single line of code and a feature is weak. Just think of all the code we re-use between features. So we shouldn’t expect to always be able to tell by looking at the code what acceptance test we’re missing.

It makes sense to measure code coverage for unit tests, but not so much for acceptance tests.

Code Coverage on Acceptance Tests Can Reveal Dead Code

One thing we can do by measuring code coverage on acceptance tests is find dead code.

Dead code is code that is not executed, except perhaps by unit tests. It lives on in the code base like a zombie.

Dead code takes up space, but that’s not usually a big problem.

Some dead code can be detected by other means, like by your IDE. So all in all, it seems that we’re not gaining much by measuring code coverage for acceptance tests.

Code Coverage on Acceptance Tests May Be Dangerous

OK, so we don’t gain much by measuring coverage on acceptance tests. But no harm, no foul, right?

Well, that remains to be seen.

Some organizations impose targets for code coverage. Mindlessly following a rule is not a good idea, but, alas, such is often the way of big organizations. Anyway, an imposed number of, say, 75% line coverage may be achievable by executing only the acceptance tests.

So developers may have an incentive to focus their tests exclusively on acceptance tests.

This is not as it should be according to the Test Pyramid.

Acceptance tests are slower, and, especially when working through a GUI, may also be more brittle than unit tests.

Therefore, they usually don’t go much further than testing the happy path. While it’s great to know that all the units integrate well, the happy path is not where most bugs hide.

Some edge and error cases are very hard to write as automated acceptance tests. For instance, how do you test what happens when the network connection drops out?

These types of failures are much easier to explore with unit tests, since you can use mock objects there.
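As a rough sketch of that idea, the unit test below simulates a dropped connection with a hand-written stub, which plays the role a mock object from a mocking library would play. The `RateSource` interface and `RateClient` class are hypothetical, as is the assumed requirement that the client falls back to the last known value when the network fails:

```java
import java.io.IOException;

class NetworkFailureTest {
    public static void main(String[] args) {
        // Stub that behaves as if the connection dropped out.
        RateSource droppedConnection = () -> {
            throw new IOException("connection reset");
        };
        RateClient client = new RateClient(droppedConnection, 1.10);

        // Assumed requirement: on failure, keep serving the last known rate.
        if (client.currentRate() != 1.10) {
            throw new AssertionError("expected fallback to last known rate");
        }
    }
}

// Hypothetical abstraction over the network, so tests can replace it.
interface RateSource {
    double fetchRate() throws IOException;
}

class RateClient {
    private final RateSource source;
    private double lastKnownRate;

    RateClient(RateSource source, double initialRate) {
        this.source = source;
        this.lastKnownRate = initialRate;
    }

    double currentRate() {
        try {
            lastKnownRate = source.fetchRate();
        } catch (IOException e) {
            // Network failure: fall through and return the last known rate.
        }
        return lastKnownRate;
    }
}
```

Reproducing the same failure in an acceptance test would require actually interrupting the network at the right moment.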

The path of least resistance in your development process should lead developers to do the right thing. The right thing is to have most of the tests in the form of unit tests.

If you enforce a certain amount of code coverage, be sure to measure that coverage on unit tests only.

A Classification of Tests

There are many ways of testing software. This post uses the five Ws (plus how) to classify the different types of tests and shows how to use this classification.

Programmer vs Customer (Who)

Tests exist to give confidence that the software works as expected.

But whose expectations are we talking about? Developers have different types of expectations about their code than users have about the application. Each audience deserves its own set of tests to remain confident enough to keep going.

Functionality vs Performance vs Load vs Security (What)

When not specified, it’s assumed that what is being tested is whether the application functions the way it’s supposed to. However, we can also test non-functional aspects of an application, like security.

Before Writing Code vs After (When)

Tests can be written after the code is complete to verify that it works (test-last), or they can be written first to specify how the code should work (test-first). Writing the test first may seem counter-intuitive or unnatural, but there are some advantages:

  • When you write the tests first, you’ll guarantee that the code you later write will be testable (duh). Anybody who’s written tests for legacy code will surely acknowledge that that’s not a given if you write the code first
  • Writing the tests first can prevent defects from entering the code and that is more efficient than introducing, finding, and then fixing bugs
  • Writing the tests first makes it possible for the tests to drive the design. By formulating your test, in code, in a way that looks natural, you design an API that is convenient to use. You can even design the implementation

Unit vs Integration vs System (Where)


Tests can be written at different levels of abstraction. Unit tests test a single unit (e.g. class) in isolation.

Integration tests focus on how the units work together. System tests look at the application as a whole.

As you move up the abstraction level from unit to system, you require fewer tests.

Verification vs Specification vs Design (Why)

There can be different reasons for writing tests. All tests verify that the code works as expected, but some tests can start their lives as specifications of how yet-to-be-written code should work. In the latter situation, the tests can be an important tool for communicating how the application should behave.

We can even go a step further and let the tests also drive how the code should be organized. This is called Test-Driven Development (TDD), sometimes also referred to as Test-Driven Design.

Manual vs Automated Tests (How)


Tests can be performed by a human or by a computer program. Manual testing is most useful in the form of exploratory testing.

When you ship the same application multiple times, like with releases of a product or sprints of an Agile project, you should automate your tests to catch regressions. The amount of software you ship will continue to grow as you add features and your testing effort will do so as well. If you don’t automate your tests, you will eventually run out of time to perform all of them.

Specifying Tests Using the Classification

With the above classifications we can be very specific about our tests. For instance:

  • Tests in TDD are automated (how) programmer (who) tests that design (why) functionality (what) at the unit or integration level (where) before the code is written (when)
  • BDD scenarios are automated (how) customer (who) tests that specify (why) functionality (what) at the system level (where) before the code is written (when)
  • Exploratory tests are manual (how) customer (who) tests that verify (why) functionality (what) at the system level (where) after the code is written (when)
  • Security tests are automated (how) customer (who) tests that verify (why) security (what) at the system level (where) after the code is written (when)

By being specific, we can avoid semantic diffusion, like when people claim that “tests in TDD do not necessarily need to be written before the code”.
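One way to make such a specification explicit is to write the six dimensions down as data. The sketch below is only an illustration: the type names are invented, and each dimension holds a single value, so "unit or integration level" is simplified to unit level for TDD.

```java
import java.util.ArrayList;
import java.util.List;

class TestClassificationDemo {
    public static void main(String[] args) {
        // Prints [who, why, where]
        System.out.println(
                TestClassification.TDD.differingDimensions(TestClassification.BDD_SCENARIO));
    }
}

enum How { MANUAL, AUTOMATED }
enum Who { PROGRAMMER, CUSTOMER }
enum Why { VERIFY, SPECIFY, DESIGN }
enum What { FUNCTIONALITY, PERFORMANCE, LOAD, SECURITY }
enum Where { UNIT, INTEGRATION, SYSTEM }
enum When { BEFORE_CODE, AFTER_CODE }

final class TestClassification {
    // Simplification: TDD is recorded at the unit level only.
    static final TestClassification TDD = new TestClassification(
            How.AUTOMATED, Who.PROGRAMMER, Why.DESIGN,
            What.FUNCTIONALITY, Where.UNIT, When.BEFORE_CODE);
    static final TestClassification BDD_SCENARIO = new TestClassification(
            How.AUTOMATED, Who.CUSTOMER, Why.SPECIFY,
            What.FUNCTIONALITY, Where.SYSTEM, When.BEFORE_CODE);

    final How how;
    final Who who;
    final Why why;
    final What what;
    final Where where;
    final When when;

    TestClassification(How how, Who who, Why why, What what, Where where, When when) {
        this.how = how;
        this.who = who;
        this.why = why;
        this.what = what;
        this.where = where;
        this.when = when;
    }

    // Names the dimensions on which two types of tests differ.
    List<String> differingDimensions(TestClassification other) {
        List<String> result = new ArrayList<>();
        if (how != other.how) result.add("how");
        if (who != other.who) result.add("who");
        if (why != other.why) result.add("why");
        if (what != other.what) result.add("what");
        if (where != other.where) result.add("where");
        if (when != other.when) result.add("when");
        return result;
    }
}
```

Comparing two types of tests dimension by dimension shows exactly where they differ, which leaves less room for vague claims about what a given kind of test is.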

Reducing Risk Using the Classification

Sometimes you can select a single alternative along a dimension. For instance, you could perform all your testing manually, or you could use tests exclusively to verify.

For other dimensions, you really need to cover all the options. For instance, you need tests at the unit and integration and system level and you need to test for functionality and performance and security. If you don’t, you are at risk of not knowing that your application is flawed.

Proper risk management, therefore, mandates that you shouldn’t exclusively rely on one type of tests. For instance, TDD is great, but it doesn’t give the customer any confidence. You should carefully select a range of test types to cover all aspects that are relevant for your situation.

Visualizing Code Coverage in Eclipse with EclEmma

Last time, we saw how Behavior-Driven Development (BDD) allows us to work towards a concrete goal in a very focused way.

In this post, we’ll look at how the big BDD and the smaller TDD feedback loops eliminate waste and how you can visualize that waste using code coverage tools like EclEmma to see whether you execute your process well.

The Relation Between BDD and TDD

Depending on your situation, running BDD scenarios may take a lot of time. For instance, you may need to first create a Web Application Archive (WAR), then start a web server, deploy your WAR, and finally run your automated acceptance tests using Selenium.

This is not a convenient feedback cycle to run for every single line of code you write.

So chances are that you’ll write bigger chunks of code. That increases the risk of introducing mistakes, however. Baby steps can mitigate that risk. In this case, that means moving to Test-First Programming, preferably Test-Driven Development (TDD).

The link between a BDD scenario and a bunch of unit tests is the top-down test. The top-down test is a translation of the BDD scenario into test code. From there, you descend further down into unit tests using regular TDD.
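The following sketch shows one possible reading of such a translation. The scenario, the `ShoppingCart` class, and its methods are all invented for illustration, and a plain check replaces JUnit assertions so the example has no dependencies. The point is that the test exercises the same behavior as the scenario, but directly in code, without a WAR, a web server, or Selenium:

```java
class ShoppingCartTopDownTest {
    // Hypothetical BDD scenario being translated:
    //   Given an empty shopping cart
    //   When the customer adds a book that costs 10 and a pen that costs 2
    //   Then the cart total is 12
    public static void main(String[] args) {
        ShoppingCart cart = new ShoppingCart();   // Given
        cart.add("book", 10);                     // When
        cart.add("pen", 2);
        if (cart.total() != 12) {                 // Then
            throw new AssertionError("expected total 12 but was " + cart.total());
        }
    }
}

// Hypothetical production code that the top-down test drives into existence.
// Further unit tests (not shown) would cover its details using regular TDD.
class ShoppingCart {
    private int total;

    void add(String item, int price) {
        total += price;
    }

    int total() {
        return total;
    }
}
```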

This translation of BDD scenarios into top-down tests may seem wasteful, but it’s not.

Top-down tests only serve to give the developer a shorter feedback cycle. You should never have to leave your IDE to determine whether you’re done. The waste of the translation is more than made up for by the gains of not having to constantly switch to the larger BDD feedback cycle. By doing a little bit more work, you end up going faster!

If you’re worried about your build time increasing because of these top-down tests, you may even consider removing them after you’ve made them pass, since their risk-reducing job is then done.

Both BDD and TDD Eliminate Waste Using JIT Programming

Both BDD and TDD operate on the idea of Just-In-Time (JIT) coding. JIT is a Lean principle for eliminating waste; in this case, the waste of writing unnecessary code.

There are many reasons why you’d want to eliminate unnecessary code:

  • Since it takes time to write code, writing less code means you’ll be more productive (finish more stories per iteration)
  • More code means more bugs
  • In particular, more code means more opportunities for security vulnerabilities
  • More code means more things a future maintainer must understand, and thus a higher risk of bugs introduced during maintenance due to misunderstandings

Code Coverage Can Visualize Waste

With BDD and TDD in your software development process, you expect less waste. That’s the theory, at least. How do we prove this in practice?

Well, let’s look at the process:

  1. BDD scenarios define the acceptance criteria for the user stories
  2. Those BDD scenarios are translated into top-down tests
  3. Those top-down tests lead to unit tests
  4. Finally, those unit tests lead to production code

The last step is easiest to verify: no code should have been written that wasn’t necessary for making some unit test pass. We can prove that by measuring code coverage while we execute the unit tests. Any code that is not covered is by definition waste.

EclEmma Shows Code Coverage in Eclipse

We use Cobertura in our Continuous Integration build to measure code coverage. But that’s a long feedback cycle again.

Therefore, I like to use EclEmma to measure code coverage while I’m in the zone in Eclipse.

EclEmma turns covered lines green, uncovered lines red, and partially covered lines yellow.
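As a small illustration, consider the hypothetical class below together with a single test that only exercises the non-negative case. The colors in the comments are what I would expect with EclEmma's default settings; they are annotations for this sketch, not output copied from the tool:

```java
class SignTest {
    public static void main(String[] args) {
        // The only test: it never passes a negative number.
        if (!"non-negative".equals(Sign.describe(5))) {
            throw new AssertionError("unexpected description for 5");
        }
    }
}

// Hypothetical class under test.
class Sign {
    static String describe(int n) {
        if (n < 0) {               // yellow: only one of the two branches is taken
            return "negative";     // red: never executed by the test
        }
        return "non-negative";     // green: executed by the test
    }
}
```

Adding a second test that calls `Sign.describe` with a negative number should turn all three lines green.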

You can change these colors using Window|Preferences|Java|Code coverage. For instance, you could change Full Coverage to white, so that the normal case doesn’t introduce visual clutter and only the exceptions stand out.

The great thing about EclEmma is that it lets you measure code coverage without making you change the way you work.

The only difference is that instead of choosing Run As|JUnit Test (or Alt+Shift+X, T), you now choose Coverage As|JUnit Test (or Alt+Shift+E, T). To re-run the last coverage, use Ctrl+Shift+F11 (instead of Ctrl+F11 to re-run the last launch).

If your fingers are conditioned to use Alt+Shift+X, T and/or Ctrl+F11, you can always change the key bindings using Window|Preferences|General|Keys.

In my experience, the performance overhead of EclEmma is low enough that you can use it all the time.

EclEmma Helps You Monitor Your Agile Process

The feedback from EclEmma allows you to immediately see any waste in the form of unnecessary code. Since there shouldn’t be any such waste if you do BDD and TDD well, the feedback from EclEmma is really feedback on how well you execute your BDD/TDD process. You can use this feedback to hone your skills and become the best developer you can be.