[[!tag backups]]

... backups? did someone talk about backups? I'm sure I heard someone mention backups here somewhere. Backups! BACKUPS! BACKUPS ARE AWESOME!

That's a direct quote from my recent IRC history. I find backups quite interesting, particularly from an implementation point of view, and I may sometimes obsess about them a little bit. This is why I've written my own backup software.

I'm unusual: most people find backups boring at best, and tedious most of the time. When I talk with people about backups, the usual reaction is "um, I know I should". There are a lot of reasons for this. One is that backups are a lot like insurance: you have to spend time, effort, money, up front, to have any use for them. Another is that the whole topic is scary: you have to think about when things go wrong, and that puts people off. A third reason is that while there are lots of backup tools and methods, it's not always easy. After all, backups are about answering the question, "what can I do to keep my data safe, whatever happens in the future?".

I've spent a fair bit of the past several years thinking about backups. This is the first in a series of blog posts about backups, where I share my thinking about the topic. Perhaps it can be of some use to others. At least people will poke holes in my delusions.

In this post, I'll define some terminology. I am not a backup scholar, and I may have invented some of these terms myself, and they may be different from what real sysadmins use. I'll define the words I use, so we can understand each other.

Live data is the data you work with or keep. It's the files on your hard drive: the documents you write, the photos you save, the unfinished novels you wish you'd finish.

A backup is a spare copy of your live data. If you lose some or all of your live data, you can get it back ("restore") from your backup. The backup copy is, by practical necessity, older than your live data, but if you made the backup recently enough, you won't lose much.

Sometimes it's useful to have more than one old backup copy of your live data. You can have a sequence of backups, made at different times, giving you a backup history. Each copy of your live data in your backup history is a generation. This lets you retrieve a file you deleted a long time ago, but didn't realise you needed until now. If you only keep one backup version, you can't get it back, but if you keep, say, a daily backup for a month, you have a month to realise you need it, before it's lost forever.

The place your backups are stored is the backup repository. You can use many kinds of backup media for backup storage: hard drives, tapes, optical disks (DVD-R, DVD-RW, etc), USB flash drives, online storage, etc. Each type of medium has different characteristics: size, speed, convenicence, reliability, price, which you'll need to balance for a backup solution that's reasonable for you.

You may need multiple backup repositories or media, with one of them located off-site, away from where your computers normally live. Otherwise, if you house burns down, you'll lose all your backups too.

You need to verify that your backups work. It would be awkward to go to the effort and expense of making backups and then not be able to restore your data when you need to. You may even want to test your disaster recovery by pretending that all your computer stuff is gone, except for the backup media. Can you still recover? You'll want to do this periodically, to make sure your backup system keeps working.

Most live data is precious in that you'll be upset if you lose it. Some live data is not precious: your web browser cache probably isn't, for example. This distinction can let you limit the amount of data you need to back up, which can significantly reduce your backup costs.

There is a very large variety of backup tools. They can be very simple and manual: you can copy files to a USB drive using your file manager, once a blue moon. They can also be very complex: enterprise backup products that cost huge amounts of money and come with a multi-day training package for your sysadmin team, and which require that team to function properly.

You'll need to define a backup strategy to tie everything together: what live data to back up, to what medium, using what tools, what kind of backup history to keep, and how to verify that they work.

That's the groundwork. In the next episode, I'll blather about what is a backup, and what isn't.