Distributed bug tracking

[[!tag programming idea]]

Distributed bug tracking has been on my mind lately, usually when I'm going to sleep. I've never used distributed bug tracking, but the idea fascinates me.

Here are a few thoughts I keep returning to.

Distributed bug tracking should be useful for sharing bugs between projects, such as upstreams and various Linux distributions. It should also be useful wherever distributed version control is used.
A specification for a bug interchange format is more interesting at this stage than any one implementation. Compare RFC 2822 with /usr/bin/sendmail: the message format matters more than any single program that handles it.
Each bug should have a universally unique id. The id should be human friendly: a UUID (a long string of hexadecimal digits) is not. I'd rather see something like Message-ID in e-mail: local@domain, where domain refers to me, and I get to decide what local is. It would be helpful if ids had a regular form, so that they can be recognized automatically in text.

Suggestion: bug://domain/local. Example: bug://liw.fi/1243.
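A regular form like this is easy to pick out of running text. Here is a minimal sketch in Python, assuming a grammar I made up for illustration: the domain and the local part both start and end with a letter or digit, so trailing punctuation is not swallowed. The name `find_bug_ids` and the exact character classes are my assumptions, not part of any specification.

```python
import re

# Assumed grammar: bug://<domain>/<local>. The allowed characters are a guess;
# a real specification would have to pin them down.
BUG_ID_RE = re.compile(
    r"bug://"
    r"(?P<domain>[A-Za-z0-9](?:[A-Za-z0-9.-]*[A-Za-z0-9])?)"
    r"/"
    r"(?P<local>[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)"
)


def find_bug_ids(text):
    """Return a list of (domain, local) pairs for every bug id in text."""
    return [(m.group("domain"), m.group("local")) for m in BUG_ID_RE.finditer(text)]


if __name__ == "__main__":
    sample = "See bug://liw.fi/1243, which may duplicate bug://example.org/foo-7."
    for domain, local in find_bug_ids(sample):
        print(domain, local)
```

Requiring the local part to end in a letter or digit is a design choice: it lets ids sit at the end of a sentence without the full stop becoming part of the id.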
Even though this is very similar to a URL, it does not have to be an actual URL. Indeed, requiring that may be very inconvenient. Rather, there could be a bug finding service. Something like this: liw.fi can publish a TXT record in DNS to specify where its bugs are. Anyone wanting to find the bug on the web could then use this to find out that bug://liw.fi/1243 can be found at http://www.example.com/codehosting/bug-tracking?bugid=1243.
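The lookup could work roughly as follows. This is only a sketch under assumptions: the TXT record syntax (`bugs=` followed by a URL template containing `{local}`) is something I invented for the example, and the DNS query itself is replaced by a caller-supplied function `txt_lookup`, so nothing here touches the network.

```python
from urllib.parse import quote

# Hypothetical TXT record syntax: "bugs=<URL template with {local}>".
TXT_PREFIX = "bugs="


def parse_bug_id(bug_id):
    """Split 'bug://domain/local' into (domain, local)."""
    scheme = "bug://"
    if not bug_id.startswith(scheme):
        raise ValueError(f"not a bug id: {bug_id!r}")
    domain, sep, local = bug_id[len(scheme):].partition("/")
    if not domain or not sep or not local:
        raise ValueError(f"malformed bug id: {bug_id!r}")
    return domain, local


def resolve_bug_url(bug_id, txt_lookup):
    """Return the web location of a bug, or None if the domain publishes none.

    txt_lookup(domain) must return the domain's TXT records as strings;
    it stands in for a real DNS query.
    """
    domain, local = parse_bug_id(bug_id)
    for record in txt_lookup(domain):
        if record.startswith(TXT_PREFIX):
            template = record[len(TXT_PREFIX):]
            return template.replace("{local}", quote(local, safe=""))
    return None


if __name__ == "__main__":
    def fake_lookup(domain):
        # Canned answer in place of DNS, for demonstration only.
        records = {
            "liw.fi": ["bugs=http://www.example.com/codehosting/bug-tracking?bugid={local}"],
        }
        return records.get(domain, [])

    print(resolve_bug_url("bug://liw.fi/1243", fake_lookup))
```

Returning `None` when no record exists fits the idea that a bug need not be on the web at all.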

However, due to the nature of distributed bug tracking, it is not necessary for a bug to be on the web. It might instead just be a file on a disk somewhere, shared with others over e-mail, or via a version control system.
Once the bug has been located, it should be in a suitable format. I favor an mbox file with some metadata. The metadata could be stored as mails in the mbox. The benefit of the mbox is that it's a well-known format with lots of support in all sorts of languages.
The metadata should include things like the id, the current title, the current summary, the severity level, which software, products, services, etc., the bug affects, and so on. Any modern bug tracking system has a set of metadata that can be used as a basis for this.

Metadata should be flexible and extensible. No fixed set of severities, for example, works for all users.
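To make the mbox idea concrete, here is a small sketch using Python's standard `mailbox` module. The conventions are all assumptions of mine: metadata lives in `X-Bug-*` headers of ordinary messages, any header name is allowed (so the set of fields stays extensible), and when several messages set the same field, the one later in the mbox wins.

```python
import mailbox
from email.message import EmailMessage


def metadata_message(bug_id, fields):
    """Build a mail whose X-Bug-* headers carry metadata (assumed convention)."""
    msg = EmailMessage()
    msg["From"] = "bug-metadata@example.invalid"  # placeholder sender
    msg["Subject"] = f"Metadata for {bug_id}"
    msg["X-Bug-Id"] = bug_id
    for name, value in fields.items():
        msg[f"X-Bug-{name}"] = value
    msg.set_content("Bug metadata; see the X-Bug-* headers.")
    return msg


def append_to_bug(path, message):
    """Append one message to the bug's mbox file, creating it if needed."""
    box = mailbox.mbox(path)
    try:
        box.add(message)
        box.flush()
    finally:
        box.close()


def read_metadata(path):
    """Collect X-Bug-* headers from all messages; later messages override."""
    prefix = "x-bug-"
    current = {}
    box = mailbox.mbox(path)
    try:
        for msg in box:
            for header, value in msg.items():
                if header.lower().startswith(prefix):
                    current[header[len(prefix):]] = value
    finally:
        box.close()
    return current
```

Because the history of metadata changes stays in the file as separate mails, a reader can reconstruct both the current values and how they got that way.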
The interesting part of distributed bug tracking happens when bugs are filed and modified all over the place. Some kind of centralization needs to happen, just as with distributed version control, for bug reports to be useful to the projects they are filed against. I imagine a typical scenario would go like this:
1. Amy creates a bug report on the liwc package in Debian, using a new implementation of the reportbug tool. The tool collects all sorts of useful information about liwc, as installed on Amy's computer.
2. After the bug report has been created, it is sent to Debian's bug tracking system.
3. Lars, the Debian maintainer of the liwc package, looks at the bug report, and asks Amy to provide some additional information, by sending e-mail to both the bug tracking system and Amy.
4. The BTS modifies its copy of the bug report to include Lars's question. It also automatically marks the bug report as "waiting for more information".
5. Meanwhile, Ivar, the upstream maintainer of the liwc package, happens to see Amy's original bug report, since she forwarded it to him, without going through Debian's bug tracker. Ivar realizes that it is a very severe bug, and marks his copy of the bug report accordingly. The bug report now exists in two locations, with different information: in the Debian bug tracker, and on Ivar's computer.
6. Amy responds to Lars, sending e-mail to him and the Debian BTS. The BTS includes Amy's response and unmarks the bug report as "waiting for more information".
7. Meanwhile, Ivar develops a fix for the problem. He makes a new release with the fix, and marks his copy of the bug report as closed.
8. Ivar's release tarball includes a list of bugs it fixes. When Lars next updates the Debian package, the Debian BTS sees that the bug is fixed, and modifies the bug status accordingly. The BTS also finds out that Ivar has published his copy of the bug report, and merges that with its own copy.
Merging different instances of bug reports is going to be an interesting problem to solve. If two people have changed the status of the bug independently, in their own copies, and the copies get merged, whose status should apply?
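One possible policy, offered only as a sketch and not as an answer to that question: keep every status change from both copies, and let the most recent one win. The data model below is my own assumption (messages keyed by Message-ID, status changes as timestamped records with comparable clocks), and "latest change wins" is just one of several defensible rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusChange:
    timestamp: int  # seconds since the epoch; assumes comparable clocks
    author: str
    status: str


def merge_copies(a, b):
    """Merge two copies of a bug report.

    Each copy is a dict with 'messages' (Message-ID -> text) and
    'changes' (an iterable of StatusChange). Messages are unioned by id;
    the status is taken from the latest change across both copies.
    """
    messages = {**a["messages"], **b["messages"]}
    changes = sorted(
        set(a["changes"]) | set(b["changes"]),
        key=lambda c: (c.timestamp, c.author, c.status),  # deterministic ties
    )
    status = changes[-1].status if changes else None
    return {"messages": messages, "changes": changes, "status": status}


if __name__ == "__main__":
    debian = {
        "messages": {"<1@amy>": "report", "<2@lars>": "question", "<3@amy>": "answer"},
        "changes": [
            StatusChange(100, "lars", "waiting for more information"),
            StatusChange(300, "amy", "open"),
        ],
    }
    ivar = {
        "messages": {"<1@amy>": "report"},
        "changes": [StatusChange(400, "ivar", "closed")],
    }
    print(merge_copies(debian, ivar)["status"])
```

Keeping the full list of changes, rather than only the winner, means a different policy can be applied later without losing information.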
Besides sharing entire bug reports, it is also necessary to share shorter versions: the id, together with the bug title and status, would probably be useful.

My next step should probably be to re-read everything on http://dist-bugs.kitenet.net/ and research existing distributed bug tracking systems.

Edit: Old discussion page.