Test driven distribution development

[[!tag debian tddd]]

Disclaimer: There is a huge discussion going on right now in the debian-devel mailing list, about different ways to rearrange how Debian development happens, and maybe to provide Debian users with an additional release. This blog entry is not a comment on that discussion, which I am ignoring. The discussion did, however, prompt a dent about "test driven development". That prompted a short exchange of thoughts with Tom Marble, and this blog post is a cleaned-up version of my part in that discussion.

Currently Debian development happens roughly like this: all packages are uploaded to a part of the Debian archive called unstable. Once they've been in use for a while without serious problems, they get automatically copied into another part called testing. Every year or two, testing is frozen and any remaining problems are fixed, after which all of testing gets copied into yet another part of the archive called stable, and that's the actual release.

There is a little bit of automatic testing, but only a little. Almost all testing is done by people who run unstable or testing on their usual computers. If they have problems, they report them.

Contrast this with development using Test Driven Development, or TDD, and other modern development methodologies. Here's a rough summary of what I do when I write (much of) my software.

first write one or more automatic tests (unit or functional ones)
write the actual code, enough to make all tests pass
add more tests
write more code, or change existing code
repeat this until tests describe all the behavior desired of the code
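
To make the cycle above concrete, here is a minimal sketch in Python using the standard unittest module. The `word_count` function and its tests are invented purely for illustration. In the cycle described above, the tests would be written first and would fail until the function was filled in.

```python
import unittest


def word_count(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


class WordCountTests(unittest.TestCase):
    # In the TDD cycle these tests come first, and fail
    # until word_count is implemented.

    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)

    def test_counts_words_separated_by_spaces(self):
        self.assertEqual(word_count("hello big world"), 3)

    def test_ignores_extra_whitespace(self):
        self.assertEqual(word_count("  hello \n world  "), 2)


if __name__ == "__main__":
    unittest.main()
```

Each new behavior gets a new test method before the code changes, so the test class grows into a description of what the code is supposed to do.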

In addition, I measure coverage to make sure all parts of the code get tested. I usually aim for 100% coverage, except for those parts that are very hard or quite pointless to test. (That's easier to achieve than you'd think, but that's the topic of another blog post.)

This all sounds like a lot of bureaucratic nonsense, but here's what I get out of it: once all tests pass, I have strong confidence that the software works. As soon as I've added all the features I want to have for a release, I can push a button to push it out. (Well, actually there's a little bit more to it.)

This does not mean I never have bugs in my software. Of course I do. However, there are a lot fewer of them. In fact, there are so few of them, after tests pass, that I would almost be happy to make an automatic release after every successful run of "make check".

Another aspect of the way I do development is distributed version control. The relevant feature is powerful branching and merging. Most things I develop happen in short-lived single-purpose branches: whenever I start work on a feature or bugfix, I create a branch for it. (Unless I'm feeling particularly lazy, which happens more often than you'd think.)

When I've finished the feature, or fixed the bug, I merge the branch back into the trunk branch.

This way, the trunk is always in a releasable state. It might not have all the features I want for the next release, but the features that are there do work. The branch probably has bugs, but if I've written tests that are good enough, I know the software works well enough.

Or that's the idea. Sometimes things go wrong. Then I write more tests and the next time goes better. (It doesn't have to be perfect, as long as it gets better every time.)

See the contrast between automatic tests with good coverage, and the Debian style of relying on user feedback? There's no need to wait for user feedback with automatic tests. This speeds up development and makes releasing easier, but most importantly it takes away any reason to fear making changes, even big changes.

Automatic tests and good test coverage are easy to achieve in small projects. For a system as huge as Debian, good test coverage is quite hard to achieve.

The thing about automatic tests, though, is that even a few of them are helpful, and after you have a few tests, it gets easier to add more. The first test is the hardest to get done, since you need to set up all the infrastructure for it.

Debian does do a bit of automatic testing already. (See lintian, piuparts, autopkgtest, edos, etc.) I don't want to belittle that, but I think we could do better.

Here's what I would love to see:

we have a part of the archive that corresponds to a trunk branch, i.e., it is always in a releasable state (the "testing" area was originally meant to be that; we could make it so now)
releasable state is determined by two things:
- an automatic test suite
- user feedback, particularly in the form of bug reports (as now)
whenever changes are made, they happen in "branches"
- the branch is not affected by changes elsewhere in the archive, except by manual synching ("merge from trunk")
- individual package uploads, as well as groups of packages such as for transitions, are each in their own branch
- this is sort of similar to a PPA, but more powerful
when a branch is to be merged into trunk, the automatic tests must first pass (or at least no new failures in them can be introduced)
- tests can also be run for any other branch, of course, so that those developing the branch know if they're ready to push their changes into trunk
there's a culture of writing tests for bugs (whenever that is feasible), and for new features
- particularly release goals should be expressed as automatic test suites
there's a culture of sharing tests with upstreams, and with other distributions
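
The merge rule in the list above ("no new failures can be introduced") could be sketched roughly like this. It assumes, hypothetically, that a test run is summarized as a dict mapping a test name to True (passed) or False (failed). The function names and the example test names are made up, and no such tool exists in Debian as far as this post is concerned.

```python
def new_failures(trunk_results, branch_results):
    """Return names of tests that fail on the branch but not on trunk.

    Each argument maps a test name to True (passed) or False (failed).
    A test that exists only on the branch counts as new if it fails.
    Tests that exist only on trunk are ignored in this sketch.
    """
    failures = set()
    for name, passed in branch_results.items():
        if not passed and trunk_results.get(name, True):
            failures.add(name)
    return failures


def may_merge(trunk_results, branch_results):
    """A branch may be merged if it introduces no new failures."""
    return not new_failures(trunk_results, branch_results)


if __name__ == "__main__":
    # Hypothetical results: "upgrade" already fails on trunk,
    # "ssh" is broken only by the branch.
    trunk = {"install": True, "upgrade": False, "ssh": True}
    branch = {"install": True, "upgrade": False, "ssh": False}
    print(sorted(new_failures(trunk, branch)))
    print(may_merge(trunk, branch))
```

Comparing against trunk rather than demanding a fully green run is what lets a branch be merged even while trunk still has known failures of its own.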

Since full test coverage is going to be impossible for Debian, some subset should be targeted. Perhaps something like this would suffice to start with:

a version of debian-installer is generated from a branch
test installation of a new system from scratch
- with a large set of tasks selected
test upgrade from previous stable release to the branch
- with a large set of packages installed (as many as possible, actually)
test upgrade from trunk branch to branch to be merged
- ditto large set of packages
test with lintian, reject on specific tests failing
test with piuparts
test the whole system for specific functionality
- ssh access from outside
- sudo access for a logged in user
- sending mail with SMTP
- web server
- possibly test specific web applications
- possibly test Samba and NFS services
- possibly test printing services (CUPS)
- possibly test essential desktop functionality (automatic login, at least)
if package comes with package specific tests, run those too
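
As a rough idea of how small the "specific functionality" tests could start out, here is a sketch that only checks whether a freshly installed machine accepts TCP connections on a few service ports. The service list is an assumption (conventional default ports), and the function names are made up. A real test would go further and actually log in over ssh, send a mail, or fetch a web page.

```python
import socket


# Hypothetical service list: the ports are the conventional defaults,
# but a real test suite would take them from configuration.
SERVICES = {"ssh": 22, "smtp": 25, "http": 80}


def port_accepts_connections(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host, services=SERVICES):
    """Return a dict mapping each service name to True or False."""
    return {
        name: port_accepts_connections(host, port)
        for name, port in services.items()
    }
```

Calling `check_services` with the address of the test machine gives one pass/fail result per service, which is the kind of summary a merge check could compare between trunk and a branch.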

These tests would not guarantee that a set of changes would not break Debian, but they would give a high confidence that at least basic stuff would still work after the changes.

Now, obviously implementing everything I'm dreaming of (it is just past midnight, after all) is going to be impossible. There's way too much work to do, there aren't enough tools for writing tests, it would require too much computing power to run, and so on and so forth. But it's late, and I've had a bad day, and I might as well dream of something nice.

Anyway, anyone interested in these things should perhaps help drive DEP8.