[[!tag debian idea]]

Note: This is an idea I had, not something I'm working on. If it interests you, please work on it. I may come back to this some day, but not anytime soon, so I thought I'd share it anyway.

When downloading an updated Debian package, much of the data is often identical to the previous version, or nearly so. Because of this, the debdelta service is a wonderful idea (and Debian should adopt it officially).

However, a different way of achieving the same goal might work like this (a rough code sketch follows the list):

  • split up contents of Debian binary packages into chunks of suitable size (which may vary depending on the contents)
  • each chunk has an identifier
  • there is a mapping from checksums of chunks to their identifiers, and vice versa
  • a .deb consists of a list of chunk ids, instead of the chunks themselves
  • there are also "patch chunks", which say how to modify another chunk (e.g., modify offsets in compiled code when it's been moved in a new version, but is otherwise the same)
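
To make the idea concrete, here is a minimal sketch, assuming simple fixed-size chunking and SHA-256 checksums as chunk identifiers. The chunk size, the function names, and the manifest format are all illustrative assumptions, not an actual dpkg or apt interface; a real scheme would probably want content-defined chunk boundaries.

    import hashlib

    CHUNK_SIZE = 64 * 1024  # illustrative; the right size needs experimentation

    def split_into_chunks(data, chunk_size=CHUNK_SIZE):
        """Split file contents into fixed-size chunks (a real scheme might
        use content-defined chunk boundaries instead)."""
        return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    def chunk_id(chunk):
        """Use the checksum of a chunk as its identifier."""
        return hashlib.sha256(chunk).hexdigest()

    def package_manifest(files):
        """Describe a package as file names mapped to lists of chunk ids,
        instead of shipping the chunk contents themselves."""
        manifest = {}
        chunk_store = {}  # chunk id -> chunk contents (the mapping both ways)
        for name, data in files.items():
            ids = []
            for chunk in split_into_chunks(data):
                cid = chunk_id(chunk)
                chunk_store[cid] = chunk
                ids.append(cid)
            manifest[name] = ids
        return manifest, chunk_store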

When downloading updates, apt would look up the chunk ids and checksums, and see if it already has such chunks installed locally. If so, it can reuse those chunks. (Note that apt can look for chunks in the installed files, and does not need to cache things separately.)
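
A minimal sketch of that client side, reusing the split_into_chunks and chunk_id helpers from the sketch above (again, all names are illustrative): index the files already on disk by chunk checksum, fetch only the chunks that are missing, and reassemble the new files from the two sources.

    def index_installed_files(installed_files):
        """Build a checksum -> chunk index from files already on disk,
        so nothing needs to be cached separately."""
        index = {}
        for data in installed_files.values():
            for chunk in split_into_chunks(data):
                index[chunk_id(chunk)] = chunk
        return index

    def chunks_to_download(manifest, local_index):
        """Return the set of chunk ids not available locally."""
        needed = set()
        for ids in manifest.values():
            needed.update(cid for cid in ids if cid not in local_index)
        return needed

    def reassemble(manifest, local_index, downloaded):
        """Rebuild each file from local and downloaded chunks."""
        return {
            name: b"".join(local_index[cid] if cid in local_index
                           else downloaded[cid]
                           for cid in ids)
            for name, ids in manifest.items()
        }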

Now, I have not done the research to verify that this is actually worthwhile. Someone should experiment with different chunk sizes to see how much bandwidth could be saved in an upgrade from one Debian release to another (lenny to squeeze, for example).
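
As a starting point for such an experiment, one could chunk the unpacked old and new versions of a package and measure what fraction of the new version's chunks already exist in the old one. The sketch below does that using the helpers from the first sketch; the directory paths and the fixed chunk size are illustrative assumptions.

    import os

    def all_chunk_ids(root):
        """Collect chunk ids for every regular file under a directory tree
        (e.g. an unpacked old or new version of a package)."""
        ids = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                with open(os.path.join(dirpath, name), "rb") as f:
                    for chunk in split_into_chunks(f.read()):
                        ids.add(chunk_id(chunk))
        return ids

    old_ids = all_chunk_ids("old-version-unpacked")  # illustrative paths
    new_ids = all_chunk_ids("new-version-unpacked")
    reused = len(new_ids & old_ids)
    print("chunks reusable from old version: %d of %d (%.1f%%)"
          % (reused, len(new_ids), 100.0 * reused / max(len(new_ids), 1)))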

Any takers?