[[!tag linux debian]]

I got into a discussion recently about what a Linux distribution actually is. To some people, a distro is just a convenient source of binaries from which to pick what you install, either on your own computer or when creating an embedded system or some other specific kind of system. To others it's a way of life, and the realization of the ideals of software freedom. To yet others, a distro is an outdated concept that will go away in the near-ish future, to be replaced by ... something.

Here's my take: a Linux distribution is defined by the following things:

  • It's a specific, chosen set of upstream projects. The selection criteria vary wildly between distros, and may be based on things like purpose, quality, licenses, etc. Debian, for example, chooses pretty much everything that is free software, but a distro targeted at providing NAS devices would choose a much smaller set.
  • It's a set of changes made to the upstream code, and specific configurations of the upstream code, in order to make all the software work well together. For example, if the distro supports multiple mail transport agents, they might all need to be patched and configured to store incoming mail the same way by default.
  • It's the tools, processes, policies, and workflows used to develop the distribution.
  • It's the tools provided to the people using the distribution to install and manage their systems.
  • It's the people and companies who develop the distribution and support its users, and their shared values and purpose. This may be a real community, or just the kind of community that exists as a web page.

Historically, there has been a lot of variation in all the aspects mentioned above. Some of this variation may be unnecessary today. For example, it might be helpful for more distributions to share work on common tools, so that not everyone needs to re-invent and re-implement the same kinds of tools. However, my gut feeling is that due to the inherent interconnectedness of all things, it may be very difficult for one set of tools to support all workflows, policies, processes, and communities well.

The most important differentiating factors between the various distros are not the tools, but the community values, the perceived purpose of the distro, and the resultant criteria for choosing what to include and the workflow for developing the distro.

Even when two distros use the same tools, there can be a lot of difference in the end result. Fedora and RHEL, for example, use a lot of the same infrastructure, but their purposes are entirely different. Debian and Ubuntu even use mostly the same packages, and still there's a huge difference between the two distros: one aims at a high quality system based on free software, the other aims to support the business of one particular company.

For this reason, there will always be a lot of Linux distributions around, and that is good.