[[!tag coverage]]

This is the approximate, expanded transcript of the quick talk I gave at the December 2008 Ubuntu Developer Summit.

This talk is about how to figure out how much of your code your tests actually test. This is called test coverage.

I will assume you know Python and the Python unittest framework. There is unfortunately no time to introduce those as well.

coverage.py, a tool written by Ned Batchelder, records and reports which lines of a Python program are executed. Let me show you an example:

$ rm .coverage
$ python-coverage -x foo.py linux windows
linux is smaller
$ python-coverage -rm -o /usr
Name    Stmts   Exec  Cover   Missing
-------------------------------------
foo         8      7    87%   7
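
The foo.py in this example is not shown in the talk; a minimal reconstruction that is consistent with the reports here (eight statements, with line 7 being the branch for the case where the second argument is smaller, and lines 11-12 forming the script's main block) could look like this:

import sys

def smaller(a, b):
    if a < b:
        return a
    else:
        return b  # line 7: runs only when the second argument is smaller


if __name__ == "__main__":
    winner = smaller(sys.argv[1], sys.argv[2])   # lines 11-12 run only when
    print("%s is smaller" % winner)              # foo.py is executed as a script

When it runs with the arguments linux and windows, the first argument is smaller, so line 7 is never executed, which is exactly what the report says is missing.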

coverage.py is a generic tool: it only reports which lines are executed and which are not. To make it useful for measuring test coverage, we report which lines are not executed during a test run.

$ rm .coverage
$ python-coverage -x foo_tests.py
.
---------------------------------------
Ran 1 test in 0.000s

OK
$ python-coverage -rm -o /usr 
Name        Stmts   Exec  Cover   Missing
-----------------------------------------
foo             8      5    62%   7, 11-12
foo_tests       9      9   100%   
-----------------------------------------
TOTAL          17     14    82%   
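
foo_tests.py is not shown either; a minimal test module consistent with the report (a single test that only exercises the case where the first argument is smaller) might look like this, though the real file evidently had a couple more statements, since the report counts nine:

import unittest

import foo


class FooTests(unittest.TestCase):

    def testReturnsLinuxAsFirstIsSmaller(self):
        # Only the "first argument is smaller" case is exercised, so the
        # other branch in foo.smaller (line 7) never runs, and neither does
        # foo's main block (lines 11-12), because foo is only imported.
        self.assertEqual(foo.smaller("linux", "windows"), "linux")


if __name__ == "__main__":
    unittest.main()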

As you can see from this output, the test case is missing something. Now that we know which lines are not tested, we can add more test cases to test everything. Or you might decide that some pieces of the code are not worth testing. With test coverage measurement, you can make that decision explicitly, rather than by oversight.

I must stress that coverage.py measures things at the statement level. Thus, even if you get 100% coverage, your tests still don't exercise everything. For example, you might test only the first part of an or condition, never the second part, and the second part might be buggy and only show up in real use. However, statement coverage is still pretty good.
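
For example (this is a made-up function, not one from the talk), both conditions below live on a single statement, so tests that only ever trigger the first condition still give 100% statement coverage:

def can_delete(user, path):
    # A test suite that only ever calls this with user "root" never
    # evaluates the second condition, so a bug in it would go unnoticed,
    # even though every statement here is reported as covered.
    return user == "root" or path.startswith("/tmp/")

# This passes, and afterwards every statement in can_delete has been executed:
assert can_delete("root", "/etc/passwd")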

Measuring test coverage over a run of the whole unit test suite is not necessarily a good idea. When you measure the whole run, some lines get marked as tested even though they merely happen to get called, and only in some particular way.

On the whole, unit tests are clearer and more reliable if they try to test only a single class at a time. Testing each class separately is hard to achieve, but testing a module at a time is easy, and for most projects that is pretty close to testing each class separately.

If you write your code so that each code module has a corresponding test module, then you can run that test module and verify that every line in the code module gets tested.

The code module might use other modules while it is being tested, but only the lines in the code module itself are included in the coverage measurement.
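
You can do this measurement by hand with python-coverage by naming the code module when asking for the report; something like the following should work:

$ rm .coverage
$ python-coverage -x foo_tests.py
$ python-coverage -rm foo.py

The report then contains only the line for foo, the same 62% line as in the combined report above.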

There are at least two tools for automating this pairwise running of tests. The better known one is called nose, packaged in Ubuntu as python-nose. I did not know about it until it was too late, so I wrote my own, packaged in Ubuntu as python-coverage-test-runner.

nose (python-nose.deb):
nosetests --with-coverage

CoverageTestRunner (python-coverage-test-runner.deb):
python -m CoverageTestRunner

$ python -m CoverageTestRunner
Running test 1/1: testReturnsLinuxAsFirstIsSmaller (foo_tests.FooTes

FAILED

Statements missed by per-module tests:
Module       Missed statements
.../foo.py   7, 11-12

0 failures, 0 errors
Time: 0.0 s

Both of these tools make some assumptions about how code and test modules are laid out, although nose is more flexible. In a large, old project this might be inconvenient, but if you're starting a new project and apply test-driven development, then writing code and test modules next to each other should work very well.
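
As the examples above suggest, the layout is simply to pair each code module with a test module next to it:

foo.py
foo_tests.py
bar.py
bar_tests.py

That is the naming scheme CoverageTestRunner assumes; nose can be configured to find tests laid out in other ways.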