On Measuring Code Coverage

In a previous post, I explained how to visualize what part of your code is covered by your tests.

This post explores two questions that are perhaps more important: why and what code coverage to measure.

Why We Measure Code Coverage

What does it mean for a statement to be covered by tests? Well, it means that the statement was executed while the tests ran, nothing more, nothing less.

We can’t automatically assume that the statement is tested, since the bare fact that a statement was executed doesn’t imply that the effects of that execution were verified by the tests.

If you practice Test-First Programming, then the tests are written before the code. A new statement is added to the code only to make a failing test pass. So with Test-First Programming, you know that each executed statement is also a tested statement.

If you don’t write your tests first, then all bets are off. Since Test-First Programming isn’t as popular as I think it should be, let’s assume for the remainder of this post that you’re not practicing it.

Then what good does it do us to know that a statement is executed?

Well, if the next statement is also executed, then we know that the first statement didn’t throw an exception.

That doesn’t help us much either, however. Most statements should not throw an exception, but some statements clearly should. So in general, we still don’t get a lot of value out of knowing that a statement is executed.

The true value of measuring code coverage is therefore not in the statements that are covered, but in the statements that are not covered! Any statement that is not executed while running the tests is surely not tested.

Uncovered code indicates that we’re missing tests.

What Code Coverage We Should Measure

Our next job is to figure out what tests are missing, so we can add them. How can we do that?

Since we’re measuring code coverage, we know the target of the missing tests, namely the statements that were not executed.

If some of those statements are in a single class, and you have unit tests for that class, it’s easy to see that those unit tests are incomplete.

Unit tests can definitely benefit from measuring code coverage.

What about acceptance tests? Some code can easily be related to a single feature, so in those cases we could add an acceptance test.

In general, however, the relationship between a single line of code and a feature is weak. Just think of all the code we re-use between features. So we shouldn’t expect to always be able to tell by looking at the code what acceptance test we’re missing.

It makes sense to measure code coverage for unit tests, but not so much for acceptance tests.

Code Coverage on Acceptance Tests Can Reveal Dead Code

One thing we can do by measuring code coverage on acceptance tests, is find dead code.

Dead code is code that is not executed, except perhaps by unit tests. It lives on in the code base like a zombie.

Dead code takes up space, but that’s not usually a big problem.

Some dead code can be detected by other means, like by your IDE. So all in all, it seems that we’re not gaining much by measuring code coverage for acceptance tests.

Code Coverage on Acceptance Tests May Be Dangerous

OK, so we don’t gain much by measuring coverage on acceptance tests. But no harm, no foul, right?

Well, that remains to be seen.

Some organizations impose targets for code coverage. Mindlessly following a rule is not a good idea, but, alas, such is often the way of big organizations. Anyway, an imposed number of, say, 75% line coverage may be achievable by executing only the acceptance tests.

So developers may have an incentive to focus their tests exclusively on acceptance tests.

This is not as it should be according to the Test Pyramid.

Acceptance tests are slower, and, especially when working through a GUI, may also be more brittle than unit tests.

Therefore, they usually don’t go much further than testing the happy path. While it’s great to know that all the units integrate well, the happy path is not where most bugs hide.

Some edge and error cases are very hard to write as automated acceptance tests. For instance, how do you test what happens when the network connection drops out?

These types of failures are much easier explored by unit tests, since you can use mock objects there.

The path of least resistance in your development process should lead developers to do the right thing. The right thing is to have most of the tests in the form of unit tests.

If you enforce a certain amount of code coverage, be sure to measure that coverage on unit tests only.

Visualizing Code Coverage in Eclipse with EclEmma

Last time, we saw how Behavior-Driven Development (BDD) allows us to work towards a concrete goal in a very focused way.

In this post, we’ll look at how the big BDD and the smaller TDD feedback loops eliminate waste and how you can visualize that waste using code coverage tools like EclEmma to see whether you execute your process well.

The Relation Between BDD and TDD

Depending on your situation, running BDD scenarios may take a lot of time. For instance, you may need to first create a Web Application Archive (WAR), then start a web server, deploy your WAR, and finally run your automated acceptance tests using Selenium.

This is not a convenient feedback cycle to run for every single line of code you write.

So chances are that you’ll write bigger chunks of code. That increases the risk of introducing mistakes, however. Baby steps can mitigate that risk. In this case, that means moving to Test-First programming, preferably Test-Driven Development (TDD).

The link between a BDD scenario and a bunch of unit tests is the top-down test. The top-down test is a translation of the BDD scenario into test code. From there, you descend further down into unit tests using regular TDD.

This translation of BDD scenarios into top-down tests may seem wasteful, but it’s not.

Top-down tests only serve to give the developer a shorter feedback cycle. You should never have to leave your IDE to determine whether you’re done. The waste of the translation is more than made up for by the gains of not having to constantly switch to the larger BDD feedback cycle. By doing a little bit more work, you end up going faster!

If you’re worried about your build time increasing because of these top-down tests, you may even consider removing them after you’ve made them pass, since their risk-reducing job is then done.

Both BDD and TDD Eliminate Waste Using JIT Programming

Both BDD and TDD operate on the idea of Just-In-Time (JIT) coding. JIT is a Lean principle for eliminating waste; in this case of writing unnecessary code.

There are many reasons why you’d want to eliminate unnecessary code:

  • Since it takes time to write code, writing less code means you’ll be more productive (finish more stories per iteration)
  • More code means more bugs
  • In particular, more code means more opportunities for security vulnerabilities
  • More code means more things a future maintainer must understand, and thus a higher risk of bugs introduced during maintenance due to misunderstandings

Code Coverage Can Visualize Waste

With BDD and TDD in your software development process, you expect less waste. That’s the theory, at least. How do we prove this in practice?

Well, let’s look at the process:

  1. BDD scenarios define the acceptance criteria for the user stories
  2. Those BDD scenarios are translated into top-down tests
  3. Those top-down tests lead to unit tests
  4. Finally, those unit tests lead to production code

The last step is easiest to verify: no code should have been written that wasn’t necessary for making some unit test pass. We can prove that by measuring code coverage while we execute the unit tests. Any code that is not covered is by definition waste.

EclEmma Shows Code Coverage in Eclipse

We use Cobertura in our Continuous Integration build to measure code coverage. But that’s a long feedback cycle again.

Therefore, I like to use EclEmma to measure code coverage while I’m in the zone in Eclipse.

EclEmma turns covered lines green, uncovered lines red, and partially covered lines yellow.

You can change these colors using Window|Preferences|Java|Code coverage. For instance, you could change Full Coverage to white, so that the normal case doesn’t introduce visual clutter and only the exceptions stand out.

The great thing about EclEmma is that it let’s you measure code coverage without making you change the way you work.

The only difference is that instead of choosing Run As|JUnit Test (or Alt+Shift+X, T), you now choose Coverage As|JUnit test (or Alt+Shift+E, T). To re-run the last coverage, use Ctrl+Shift+F11 (instead of Ctrl+F11 to re-run the last launch).

If your fingers are conditioned to use Alt+Shift+X, T and/or Ctrl+F11, you can always change the key bindings using Window|Preferences|General|Keys.

In my experience, the performance overhead of EclEmma is low enough that you can use it all the time.

EclEmma Helps You Monitor Your Agile Process

The feedback from EclEmma allows you to immediately see any waste in the form of unnecessary code. Since there shouldn’t be any such waste if you do BDD and TDD well, the feedback from EclEmma is really feedback on how well you execute your BDD/TDD process. You can use this feedback to hone your skills and become the best developer you can be.