On Measuring Code Coverage

In a previous post, I explained how to visualize what part of your code is covered by your tests.

This post explores two questions that are perhaps more important: why we measure code coverage, and what coverage we should measure.

Why We Measure Code Coverage

What does it mean for a statement to be covered by tests? Well, it means that the statement was executed while the tests ran, nothing more, nothing less.

We can’t automatically assume that the statement is tested, since the bare fact that a statement was executed doesn’t imply that the effects of that execution were verified by the tests.
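To make that concrete, here's a minimal, hypothetical example (the PriceCalculator class and its bug are invented for illustration). The test executes every statement of discount(), so a coverage tool reports the method as fully covered, yet nothing is verified:

import org.junit.jupiter.api.Test;

// Hypothetical class for illustration: the bug (subtracting 15 instead of 10)
// is never caught by the test below.
class PriceCalculator {
  double discount(final double price) {
    return price - 15.0; // wrong: the discount should be 10.0
  }
}

class PriceCalculatorTest {

  @Test
  void coveredButNotVerified() {
    // This executes every line of discount(), so coverage reports 100%...
    new PriceCalculator().discount(100.0);
    // ...but nothing is asserted, so the wrong result goes unnoticed.
  }
}

Coverage says 100%; the bug ships anyway.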

If you practice Test-First Programming, then the tests are written before the code. A new statement is added to the code only to make a failing test pass. So with Test-First Programming, you know that each executed statement is also a tested statement.

If you don’t write your tests first, then all bets are off. Since Test-First Programming isn’t as popular as I think it should be, let’s assume for the remainder of this post that you’re not practicing it.

Then what good does it do us to know that a statement is executed?

Well, if the next statement is also executed, then we know that the first statement didn’t throw an exception.

That doesn’t help us much either, however. Most statements should not throw an exception, but some statements clearly should. So in general, we still don’t get a lot of value out of knowing that a statement is executed.

The true value of measuring code coverage is therefore not in the statements that are covered, but in the statements that are not covered! Any statement that is not executed while running the tests is surely not tested.

Uncovered code indicates that we’re missing tests.

What Code Coverage We Should Measure

Our next job is to figure out what tests are missing, so we can add them. How can we do that?

Since we’re measuring code coverage, we know the target of the missing tests, namely the statements that were not executed.

If some of those statements are in a single class, and you have unit tests for that class, it’s easy to see that those unit tests are incomplete.
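As a hypothetical illustration (the Account class below is invented), suppose the only unit test exercises the happy path. The coverage report then flags the validation branch as uncovered, which tells us exactly which test is missing:

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class Account {

  private double balance;

  void deposit(final double amount) {
    if (amount < 0) {
      throw new IllegalArgumentException("amount must not be negative"); // uncovered
    }
    balance += amount; // covered by the happy-path test below
  }

  double balance() {
    return balance;
  }
}

class AccountTest {

  @Test
  void depositIncreasesTheBalance() {
    final Account account = new Account();
    account.deposit(10.0);
    assertEquals(10.0, account.balance());
  }

  // Missing: a test that deposits a negative amount and expects the exception.
}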

Unit tests can definitely benefit from measuring code coverage.

What about acceptance tests? Some code can easily be related to a single feature, so in those cases we could add an acceptance test.

In general, however, the relationship between a single line of code and a feature is weak. Just think of all the code we re-use between features. So we shouldn’t expect to always be able to tell by looking at the code what acceptance test we’re missing.

It makes sense to measure code coverage for unit tests, but not so much for acceptance tests.

Code Coverage on Acceptance Tests Can Reveal Dead Code

One thing we can do by measuring code coverage on acceptance tests is find dead code.

Dead code is code that is not executed, except perhaps by unit tests. It lives on in the code base like a zombie.
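A hypothetical example of such a zombie (the class is invented): legacyFormat() is still executed by its unit test, but no feature calls it anymore, so only coverage measured on acceptance tests would expose it as dead.

class ReportFormatter {

  String format(final String text) {
    return "<p>" + text + "</p>";
  }

  // Once used by a long-removed export feature; no acceptance test reaches it,
  // but its unit test keeps it looking alive.
  String legacyFormat(final String text) {
    return "** " + text + " **";
  }
}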

Dead code takes up space, but that’s not usually a big problem.

Some dead code can be detected by other means, like by your IDE. So all in all, it seems that we’re not gaining much by measuring code coverage for acceptance tests.

Code Coverage on Acceptance Tests May Be Dangerous

OK, so we don’t gain much by measuring coverage on acceptance tests. But no harm, no foul, right?

Well, that remains to be seen.

Some organizations impose targets for code coverage. Mindlessly following a rule is not a good idea, but, alas, such is often the way of big organizations. Anyway, an imposed number of, say, 75% line coverage may be achievable by executing only the acceptance tests.

So developers may have an incentive to focus exclusively on acceptance tests.

This is not as it should be according to the Test Pyramid.

Acceptance tests are slower, and, especially when working through a GUI, may also be more brittle than unit tests.

Therefore, they usually don’t go much further than testing the happy path. While it’s great to know that all the units integrate well, the happy path is not where most bugs hide.

Some edge and error cases are very hard to write as automated acceptance tests. For instance, how do you test what happens when the network connection drops out?

These types of failures are much more easily explored with unit tests, since there you can use mock objects.
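As a sketch of what that looks like (the classes below are hypothetical, and Mockito merely stands in for whichever mocking library you use), the unit test simply tells the mock to throw, and the dropped connection becomes trivial to simulate:

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.io.IOException;
import org.junit.jupiter.api.Test;

class CustomerServiceTest {

  // Hypothetical collaborator that talks to the network.
  interface CustomerGateway {
    String fetchName(long id) throws IOException;
  }

  // Hypothetical class under test: it should fall back to a placeholder
  // when the connection drops.
  static class CustomerService {

    private final CustomerGateway gateway;

    CustomerService(final CustomerGateway gateway) {
      this.gateway = gateway;
    }

    String displayName(final long id) {
      try {
        return gateway.fetchName(id);
      } catch (final IOException e) {
        return "<unavailable>";
      }
    }
  }

  @Test
  void fallsBackWhenTheConnectionDrops() throws IOException {
    final CustomerGateway gateway = mock(CustomerGateway.class);
    when(gateway.fetchName(42L)).thenThrow(new IOException("connection dropped"));

    assertEquals("<unavailable>", new CustomerService(gateway).displayName(42L));
  }
}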

The path of least resistance in your development process should lead developers to do the right thing. The right thing is to have most of the tests in the form of unit tests.

If you enforce a certain amount of code coverage, be sure to measure that coverage on unit tests only.

Unit testing a user interface

So I’m on this new cool Google Web Toolkit (GWT) project of ours now. Part of the UI consists of a series of HTML labels that together form the view of our model. Since our model has a tree structure, we use the visitor pattern to create this UI.
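To give an idea of the shape of that code (the node types below are hypothetical stand-ins, not our actual model), a visitor over such a tree looks roughly like this:

import java.util.ArrayList;
import java.util.List;

interface ModelVisitor {
  void visit(SectionNode section);
  void visit(TextNode text);
}

interface ModelNode {
  void accept(ModelVisitor visitor);
}

class TextNode implements ModelNode {

  final String text;

  TextNode(final String text) {
    this.text = text;
  }

  public void accept(final ModelVisitor visitor) {
    visitor.visit(this);
  }
}

class SectionNode implements ModelNode {

  final List<ModelNode> children = new ArrayList<>();

  public void accept(final ModelVisitor visitor) {
    visitor.visit(this); // a concrete visitor then recurses into the children
  }
}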

This all works beautifully, except when it doesn’t, i.e. when there is a bug. With all the recursion and the sometimes convoluted HTML that is required to achieve the desired effect, hunting down bugs isn’t always easy.

So it was about time to write some unit tests. Normally, I write the tests first. But I was added to this project, and there weren’t any tests yet. So I decided to introduce them. Now, it is often easier to start with tests than to add them later. If you don’t start out by writing tests, you usually end up with code that is not easily testable. That was also the case here.

Making the code testable

So my first job was to make sure the code became testable. I started out by separating the creation of the HTML labels from the visitor code, since I didn’t want my tests to depend on GWT. So I introduced a simple TextStream:

public interface TextStream {
  void add(String text);
}

The visitor code is injected with this text stream and calls the add() method with some HTML markup. Normally, this TextStream is an HtmlLabelStream:

import com.google.gwt.user.client.ui.FlowPanel;
import com.google.gwt.user.client.ui.HTML;

public class HtmlLabelStream implements TextStream {

  private final FlowPanel parent;

  public HtmlLabelStream(final FlowPanel parent) {
    this.parent = parent;
  }

  // Wraps each chunk of markup in an HTML widget and adds it to the panel.
  public void add(final String text) {
    parent.add(new HTML(text));
  }

}

But my test case also implements the TextStream interface, and it injects itself into the visitor. That way, it can collect the output from the visitor and compare it with the expected output.
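A sketch of the idea (GreetingRenderer below is a made-up stand-in for our real visitor): the test collects everything the visitor emits and asserts against it, without ever creating a GWT widget.

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.ArrayList;
import java.util.List;
import org.junit.jupiter.api.Test;

class GreetingRendererTest implements TextStream {

  // Stand-in for the production visitor: it writes markup to whatever
  // TextStream it was given.
  static class GreetingRenderer {

    private final TextStream out;

    GreetingRenderer(final TextStream out) {
      this.out = out;
    }

    void render(final String name) {
      out.add("<b>Hello, " + name + "</b>");
    }
  }

  private final List<String> collected = new ArrayList<>();

  @Override
  public void add(final String text) {
    collected.add(text);
  }

  @Test
  void rendersAGreetingLabel() {
    new GreetingRenderer(this).render("world");

    assertEquals(List.of("<b>Hello, world</b>"), collected);
  }
}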

Simplifying testing

Now I had a whole lot of HTML code that I needed to compare to the desired output. It was hard to find deviations in it, since the required HTML is rather verbose.

So I decided to invest some time in transforming the incoming HTML into something more readable. For instance, to indent some piece of text, the HTML code contains something like this:

<div style="padding-left: 20px">...</div>

Each indentation level adds 10px of padding, so I decided to strip the div and replace it with a number of spaces corresponding to the indentation level. I repeated the same trick for a couple of other styles. In the end, the output became much easier to read, and thus much easier to spot errors in. In fact, it now looks remarkably similar to what the browser renders.
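To sketch the kind of rewrite involved (the class and regular expression below are invented for illustration), each padding-left div becomes leading spaces, one space per 10px of padding:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlSimplifier {

  private static final Pattern INDENTED_DIV =
      Pattern.compile("<div style=\"padding-left: (\\d+)px\">(.*?)</div>");

  static String simplify(final String html) {
    final Matcher matcher = INDENTED_DIV.matcher(html);
    final StringBuilder result = new StringBuilder();
    while (matcher.find()) {
      final int pixels = Integer.parseInt(matcher.group(1));
      final String indent = " ".repeat(pixels / 10); // one space per level
      matcher.appendReplacement(result,
          Matcher.quoteReplacement(indent + matcher.group(2)));
    }
    matcher.appendTail(result);
    return result.toString();
  }
}

With something like this in place, the 20px example above comes out as the word prefixed by two spaces, which is far easier to eyeball in a failing test than the raw markup.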

This transformation certainly cost me time to build, but it has also saved me time when comparing actual and expected output. And adding a new test is now a breeze, which is just the sort of incentive I need for my fellow teammates to start writing tests too…