On Measuring Code Coverage

In a previous post, I explained how to visualize what part of your code is covered by your tests.

This post explores two questions that are perhaps more important: why we measure code coverage, and what code coverage we should measure.

Why We Measure Code Coverage

What does it mean for a statement to be covered by tests? Well, it means that the statement was executed while the tests ran, nothing more, nothing less.

We can’t automatically assume that the statement is tested, since the bare fact that a statement was executed doesn’t imply that the effects of that execution were verified by the tests.
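To make the difference concrete, here is a minimal sketch in Python. The function `apply_discount` and both checks are hypothetical names made up for illustration. Both checks execute exactly the same statements, so a coverage tool would report them identically, yet only the second one verifies anything.

```python
def apply_discount(price, percent):
    # Deliberate bug for illustration: the discount is added, not subtracted.
    return price + price * percent / 100


def check_without_assertion():
    # Executes every statement of apply_discount, so the function counts
    # as fully covered, but the result is never inspected.
    apply_discount(200, 10)


def check_with_assertion():
    # Executes the same statements, and also verifies the effect.
    assert apply_discount(200, 10) == 180
```

The first check passes despite the bug; the second fails with an `AssertionError`. Coverage alone can't tell the two apart.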

If you practice Test-First Programming, then the tests are written before the code. A new statement is added to the code only to make a failing test pass. So with Test-First Programming, you know that each executed statement is also a tested statement.

If you don’t write your tests first, then all bets are off. Since Test-First Programming isn’t as popular as I think it should be, let’s assume for the remainder of this post that you’re not practicing it.

Then what good does it do us to know that a statement is executed?

Well, if the next statement is also executed, then we know that the first statement didn’t throw an exception.

That doesn’t help us much either, however. Most statements should not throw an exception, but some statements clearly should. So in general, we still don’t get a lot of value out of knowing that a statement is executed.

The true value of measuring code coverage is therefore not in the statements that are covered, but in the statements that are not covered! Any statement that is not executed while running the tests is surely not tested.

Uncovered code indicates that we’re missing tests.
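As a rough illustration of what a coverage tool reports, the sketch below records which lines of a function run during a single call, using Python's `sys.settrace`. The names `classify` and `lines_executed` are made up for this example, and real coverage tools are far more thorough; this only shows the idea of finding the lines that never ran.

```python
import sys


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


def lines_executed(func, *args):
    """Return the line offsets (relative to the def line) that ran in func."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        # Only record line events that belong to the function under test.
        if event == "line" and frame.f_code is code:
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        # Restore whatever tracer was active before.
        sys.settrace(previous)
    return executed


# The body of classify spans offsets 1, 2 and 3 from its def line.
body_offsets = {1, 2, 3}
covered = lines_executed(classify, 5)
uncovered = sorted(body_offsets - covered)
print("uncovered line offsets:", uncovered)
```

A "test" that only calls `classify(5)` never reaches the `return "negative"` line. That uncovered line points directly at the missing test: one with a negative input.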

What Code Coverage We Should Measure

Our next job is to figure out what tests are missing, so we can add them. How can we do that?

Since we’re measuring code coverage, we know the target of the missing tests, namely the statements that were not executed.

If some of those statements are in a single class, and you have unit tests for that class, it’s easy to see that those unit tests are incomplete.

Unit tests can definitely benefit from measuring code coverage.

What about acceptance tests? Some code can easily be related to a single feature, so in those cases we could add an acceptance test.

In general, however, the relationship between a single line of code and a feature is weak. Just think of all the code we re-use between features. So we shouldn’t expect to always be able to tell by looking at the code what acceptance test we’re missing.

It makes sense to measure code coverage for unit tests, but not so much for acceptance tests.

Code Coverage on Acceptance Tests Can Reveal Dead Code

One thing we can do by measuring code coverage on acceptance tests is find dead code.

Dead code is code that is not executed, except perhaps by unit tests. It lives on in the code base like a zombie.

Dead code takes up space, but that’s not usually a big problem.

Some dead code can be detected by other means, for example by your IDE. So all in all, it seems that we’re not gaining much by measuring code coverage for acceptance tests.

Code Coverage on Acceptance Tests May Be Dangerous

OK, so we don’t gain much by measuring coverage on acceptance tests. But no harm, no foul, right?

Well, that remains to be seen.

Some organizations impose targets for code coverage. Mindlessly following a rule is not a good idea, but, alas, such is often the way of big organizations. Anyway, an imposed number of, say, 75% line coverage may be achievable by executing only the acceptance tests.

So developers may have an incentive to focus their tests exclusively on acceptance tests.

This is not as it should be according to the Test Pyramid.

Acceptance tests are slower, and, especially when working through a GUI, may also be more brittle than unit tests.

Therefore, they usually don’t go much further than testing the happy path. While it’s great to know that all the units integrate well, the happy path is not where most bugs hide.

Some edge and error cases are very hard to write as automated acceptance tests. For instance, how do you test what happens when the network connection drops out?

These types of failures are much easier to explore with unit tests, since you can use mock objects there.
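For instance, here is a minimal sketch of simulating a dropped connection with Python's standard `unittest.mock`. The `ProfileService` class, its `client.fetch` collaborator, and the "(offline)" fallback are all assumptions invented for this example, not code from a real system.

```python
from unittest import mock


class ProfileService:
    """Hypothetical unit under test: loads a profile via an injected client."""

    def __init__(self, client):
        self.client = client

    def display_name(self, user_id):
        try:
            return self.client.fetch(user_id)["name"]
        except ConnectionError:
            # Fall back to a placeholder when the network is unavailable.
            return "(offline)"


def check_fallback_when_connection_drops():
    client = mock.Mock()
    # side_effect makes the mocked call raise instead of returning a value.
    client.fetch.side_effect = ConnectionError("connection dropped")
    service = ProfileService(client)

    assert service.display_name(42) == "(offline)"
    client.fetch.assert_called_once_with(42)
```

No real network is involved, so the error path runs in milliseconds and behaves the same on every run. Triggering the same failure from an acceptance test would mean somehow cutting the connection of a running system.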

The path of least resistance in your development process should lead developers to do the right thing. The right thing is to have most of the tests in the form of unit tests.

If you enforce a certain amount of code coverage, be sure to measure that coverage on unit tests only.


5 thoughts on “On Measuring Code Coverage”

  1. Have you looked at CRAP score as a metric? It will help you prioritize coverage so that you have better code coverage for the code that is most complex (and most likely to contain bugs).

    I’m one of the creators of NCover, a code coverage tool for .NET development and we use CRAP score in conjunction with code coverage to help prioritize.

    1. Thanks for your comments, Peter.

      Yes, CRAP is certainly an interesting approach. Since I use Test-First Programming myself, I don’t really need to prioritize where I add tests.

      For those readers who are unfamiliar with CRAP, read about it on the CRAP4J FAQ.

  2. One thing you can say about the code coverage metric is that it is, in every sense, a negative metric. You might have line coverage, but not cover all branches. However, if you do not hit a line at all, line coverage at least tells you what code you’re certainly not testing.

    Otherwise, unit tests should cover 100%, or at least always aim for 100%. There is no sense in saying 80% coverage is enough for unit tests. It’s like saying you simply don’t test 1/5th of the system.

    About acceptance tests, I don’t have enough experience with them to talk about sensible code coverage for them. I am not sure yet why acceptance tests are using the GUI. These are the top layer of the testing pyramid (GUI tests); imo acceptance tests should be just beneath the GUI.

    1. Thanks for your comments, Stefan.

      There are reasons not to mandate 100% code coverage for unit tests. For instance, the Java language has features like checked exceptions and private constructors that make testing harder than it should be. But in general, I agree that the goal should be as close to 100% as possible. With Test-First Programming or TDD that comes naturally.

      Acceptance tests don’t have to use the GUI; I’m sorry if the text suggests that.

      Acceptance tests will always be slower than unit tests, since they test more than one unit. Also, they’re often executed against a running system that must first be assembled, etc, all of which takes time. Finally, acceptance tests often use different technology than unit tests. For instance, when exercising a RESTful interface, it takes time to set up an HTTP connection.
