TDDD, more thoughts

[[!tag debian tddd]]

My previous blog entry about Test Driven Distro Development (TDDD) was rather rambling. Here is the condensed version, plus some further thoughts.

I would like to automatically determine whether "testing" is minimally ready to be released, at any given point in time.

To implement this, I would like to see Debian get tools that would allow us to define specific minimal functionality that should work in a Debian release, and automatically verify that it does. I would like us to use these tools to decide whether "testing" is ready to be released, with manual override from the release team. Ideally, we would also use the tools to prevent breakages from moving into "testing", so that it gets closer to always being releasable.

Getting test coverage high enough to replace the release team's judgement about releasability will never happen: Debian is rather too large and complex for that. However, we can reap much benefit from even relatively small coverage.

Here is my initial suggestion for the minimal criteria that should be satisfied, at all times, in "testing". They are aimed at making sure that even if something else breaks, at least the basic stuff works well enough that one can log in and fix things.

- there are no open ports in a base install
- logging in via getty works
- logging in via ssh works, if openssh-server is installed
- su, sudo, ls, cp, mv, rm, and vi work in at least some typical use cases
- web pages can be accessed from outside the system, given a suitably configured apache2
- after automatic login into a graphical desktop environment, all the crucial per-session programs start up, as well as web browsers and other GUI programs that almost everyone expects to be part of a desktop environment
    - if there are good GUI testing tools, we may want to test in more detail, but that may have to wait until later
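
As an illustration of how mechanical the first criterion could be, here is a minimal sketch that reads the Linux `/proc/net/tcp` table from inside the test system and reports TCP ports in the LISTEN state. It assumes that "open" means reachable from outside, so listeners bound to a loopback address are ignored; it also assumes an IPv4-only check on a little-endian host (IPv6 would need `/proc/net/tcp6` as well). The function name `exposed_tcp_ports` is hypothetical.

```python
import ipaddress

# In /proc/net/tcp, the "st" column is a hex TCP state; 0A is LISTEN.
TCP_LISTEN = "0A"


def exposed_tcp_ports(proc_net_tcp: str) -> set[int]:
    """Return TCP ports in LISTEN state that are not bound to loopback.

    Assumes the IPv4 /proc/net/tcp format on a little-endian host, where
    the local address is printed as hex with its bytes reversed.
    """
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is a header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, state = fields[1], fields[3]
        if state != TCP_LISTEN:
            continue
        addr_hex, port_hex = local.split(":")
        # Reverse the bytes to get dotted-quad order (little-endian assumption).
        addr = ipaddress.IPv4Address(bytes.fromhex(addr_hex)[::-1])
        if not addr.is_loopback:
            ports.add(int(port_hex, 16))
    return ports
```

In a base install, `exposed_tcp_ports(open("/proc/net/tcp").read())` would be expected to return an empty set. A service listening only on 127.0.0.1 passes under this interpretation; a stricter reading of "no open ports" would drop the loopback filter.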

These are minimal criteria. They're not meant to verify that everything in Debian works. After these work, adding more coverage is not just possible, but reasonably easy in many cases. Want to make sure DNS gets configured right? Write a test that logs in and does a lookup inside the test system. After the tools have evolved, writing a simple test like that should be only a line or two.
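
To make the "a line or two" claim concrete, here is a hedged sketch of what writing such a test might look like, assuming a hypothetical harness where each test receives a `Guest` object that can run shell commands inside the test system. `Guest`, `system_test`, `TESTS`, and `run_all` are all invented for illustration; only `getent hosts` is a real command.

```python
from typing import Callable, Protocol


class Guest(Protocol):
    """Hypothetical handle on the system under test, e.g. a VM reached over ssh."""

    def run(self, command: str) -> tuple[int, str]:
        """Run a shell command inside the guest; return (exit status, output)."""
        ...


# Registry of system tests, keyed by function name.
TESTS: dict[str, Callable[[Guest], bool]] = {}


def system_test(fn: Callable[[Guest], bool]) -> Callable[[Guest], bool]:
    """Decorator that registers a function as a system test."""
    TESTS[fn.__name__] = fn
    return fn


@system_test
def dns_lookup_works(guest: Guest) -> bool:
    # The "line or two" a test author writes: resolve a name inside the guest.
    status, _ = guest.run("getent hosts deb.debian.org")
    return status == 0


def run_all(guest: Guest) -> dict[str, bool]:
    """Run every registered test against one guest and collect pass/fail."""
    return {name: test(guest) for name, test in TESTS.items()}
```

A real `Guest` might wrap `subprocess.run(["ssh", host, command])`; keeping that detail behind an interface means the same tests could run against a VM, real hardware, or a fake in the harness's own test suite.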

The tests I am talking about here are system tests, or acceptance tests for the whole system, not just for individual packages. Unit tests test individual functions, methods, or classes; integration tests test groups of them, or possibly entire programs; system tests test groups of programs/packages.

The point is not to verify that every package works in every possible case, but that the system as a whole works. Most package-specific functional tests should be part of the package's own test suite. We should also run those, but that's somewhat orthogonal to the vision I'm trying to articulate here.

At least the following scenarios should be tested (the above tests should be run after each scenario):

- install "stable" and upgrade to "testing"
- install "testing" using the latest d-i release, and upgrade to current "testing"
- use an install from the previous successful test run, and upgrade to current "testing"
    - this answers the question "if testing worked yesterday, does it still work if we add these new packages?"
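
The third scenario needs to remember which run last succeeded. A minimal sketch, assuming a hypothetical `Run` record for each past run, of how the scenarios might be paired with every test while skipping the "reuse" scenario until there is a successful run to start from:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """One past test run: a saved disk image and whether all tests passed."""
    image: str  # hypothetical identifier, e.g. a path to an archived image
    passed: bool


# (scenario name, needs an image from the last successful run)
SCENARIOS = [
    ("install stable, upgrade to testing", False),
    ("install testing with latest d-i, upgrade to current testing", False),
    ("reuse last good image, upgrade to current testing", True),
]


def last_good_image(history: list[Run]) -> Optional[str]:
    """Return the image of the most recent run in which every test passed."""
    for run in reversed(history):
        if run.passed:
            return run.image
    return None


def build_plan(tests: list[str], history: list[Run]) -> list[tuple[str, str]]:
    """Pair every applicable scenario with every system test."""
    baseline = last_good_image(history)
    plan = []
    for scenario, needs_baseline in SCENARIOS:
        if needs_baseline and baseline is None:
            continue  # no successful run yet, so nothing to upgrade from
        plan.extend((scenario, test) for test in tests)
    return plan
```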

It may be useful to run the tests against "unstable" as well as "testing". However, rejecting an upload to "unstable" may be less wise than preventing packages from being copied from "unstable" to "testing". Exactly how the development processes should adopt TDDD is an interesting question, but a reasonable toolset should be adaptable to anything we want to do.
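
Whichever point in the process gets the check, the decision itself can stay small: migrate if all system tests pass, otherwise block unless someone (such as the release team) explicitly overrides. A sketch under that assumption; `migration_gate` and `GateDecision` are hypothetical names, not part of any existing Debian tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateDecision:
    allowed: bool
    reason: str


def migration_gate(results: dict[str, bool], override_by: Optional[str] = None) -> GateDecision:
    """Decide whether a candidate set of packages may migrate to "testing".

    `results` maps system-test names to pass/fail for the candidate;
    `override_by` names whoever manually overrides a failure, if anyone.
    """
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        return GateDecision(True, "all system tests passed")
    summary = ", ".join(failed)
    if override_by:
        # A manual override lets the migration through but records what failed.
        return GateDecision(True, f"failing ({summary}) but overridden by {override_by}")
    return GateDecision(False, f"blocked, failing: {summary}")
```

Recording the failing tests in the override reason keeps a manual override visible rather than silent. The same check could equally guard uploads to "unstable" or the branch pushes discussed later.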

(Indeed, it would be possible to structure the toolset so that it isn't specific to developing a distro: a "system-test" tool that a sysadmin could use to verify that a server works would probably be useful to some people. Perhaps such a tool exists and could be adapted for TDDD?)

Running these tests for security updates to "stable" is also a possibility that should perhaps be considered.

Running these tests would happen something like this:

- generate a suitable virtual machine disk image, configured in the proper way for the test
- run the image under virtualization (or on real hardware, if that can be remote-controlled suitably)
- run the tests against the image
- if tests fail, archive the image (or sufficient information to reproduce the starting image)
    - also, automatically file bugs against all packages involved in the test, and e-mail debian-devel-announce with the names of the people who uploaded new versions of packages since the previous successful test run, Cc'ing their spouses and parents for further shaming, and also Cc'ing the Debconf organization team so that the uploaders can be stood up before an audience armed with rotten tomatoes and eggs (this bullet point is here only to see if anyone actually reads my blog posts)
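
The loop above might be sketched as follows, with every step passed in as a callable because how images are built and booted is left open. All names here are hypothetical. The `changed_packages` helper covers the realistic part of the last bullet: working out which packages changed since the previous successful run, assuming each run records a package-to-version map (for example, parsed from `dpkg-query -W` output inside the guest).

```python
from typing import Any, Callable, Optional


def changed_packages(
    previous: dict[str, str], current: dict[str, str]
) -> dict[str, tuple[Optional[str], str]]:
    """Map each package whose version differs from the last good run to (old, new).

    New packages get None as the old version; removed packages are not reported.
    """
    return {
        name: (previous.get(name), version)
        for name, version in current.items()
        if previous.get(name) != version
    }


def run_cycle(
    build_image: Callable[[], str],
    boot: Callable[[str], Any],
    tests: dict[str, Callable[[Any], bool]],
    archive: Callable[[str, list[str]], None],
) -> bool:
    """One pass of the loop: build an image, boot it, test it, archive on failure."""
    image = build_image()  # a disk image configured for this scenario
    guest = boot(image)  # under virtualization, or on remote-controlled hardware
    failures = [name for name, test in tests.items() if not test(guest)]
    if failures:
        archive(image, failures)  # keep enough to reproduce the starting image
    return not failures
```

When `run_cycle` returns False, the output of `changed_packages` is what bug filing or notification would be driven from.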

I would further like to see Debian make it possible to have isolated branches for development. This would help with, for example, large transitions that take a long time. The transition could be prepared in a branch of its own, and when it's ready to be pushed into unstable, the automatic tests outlined above could be used to verify that the transition doesn't break anything too critical. (If the tests fail, the push fails; if necessary, that can be overridden manually, but that should be rare.)

In my view, branches would greatly benefit Debian development, but they're not necessary for starting to implement TDDD.

My condensed versions are still too long.