[[!tag backups]]

A backup run takes time. During a backup run the live data may change. This means the snapshot of the live data that ends up in the backup may not be internally consistent.

For example, suppose you're writing an e-mail when the backup starts automatically. If the timing goes wrong in exactly the right way, the backup might capture both the draft you were writing when the backup started and the sent e-mail, because you finished the draft and sent it before the backup ended.

Or, worse, the backup might miss both the draft and the sent e-mail. The next backup run will certainly get the sent e-mail, but what if you have a disaster before that happens?
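To make the timing concrete, here is a toy simulation in Python. It assumes, purely for illustration, that the mailbox is two folders (`drafts` and `sent`) that the backup copies one after the other, and that you press send between the two copies. The names `backup`, `send_draft`, `fresh_mailbox` and `letter` are made up for this sketch. Depending on which folder the backup reads first, it ends up with two copies of the mail or none.

```python
from __future__ import annotations


def send_draft(mailbox: dict[str, set[str]], message: str) -> None:
    """Pressing send: the message moves from the drafts folder to the sent folder."""
    mailbox["drafts"].discard(message)
    mailbox["sent"].add(message)


def backup(mailbox: dict[str, set[str]], folder_order: list[str],
           send_after: int) -> dict[str, set[str]]:
    """Copy the folders one at a time, in folder_order.

    The draft called "letter" is sent after `send_after` folders have been
    copied, so the folder copies are taken at different moments in time.
    """
    copied: dict[str, set[str]] = {}
    for i, folder in enumerate(folder_order):
        if i == send_after:
            send_draft(mailbox, "letter")
        copied[folder] = set(mailbox[folder])
    return copied


def fresh_mailbox() -> dict[str, set[str]]:
    """A mailbox with one unsent draft and an empty sent folder."""
    return {"drafts": {"letter"}, "sent": set()}


# Drafts copied, then the mail is sent, then sent copied: both are captured.
print(backup(fresh_mailbox(), ["drafts", "sent"], send_after=1))
# {'drafts': {'letter'}, 'sent': {'letter'}}

# Sent copied, then the mail is sent, then drafts copied: neither is captured.
print(backup(fresh_mailbox(), ["sent", "drafts"], send_after=1))
# {'sent': set(), 'drafts': set()}
```

Neither folder copy is wrong on its own; the problem is that the two copies were taken at different moments.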

Internal consistency of data can be quite difficult to achieve. Some programs, such as MySQL, don't do a good job of keeping their data consistent on disk, making it difficult for the backup program to get a usable copy. (In that case, a standard trick is to "dump" or "export" the data in a consistent form, and back that up instead.)
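MySQL's usual tool for this is `mysqldump`. As a self-contained illustration of the same dump-then-back-up trick, here is a minimal sketch that uses SQLite instead (a swapped-in example, not MySQL), via the online backup API that Python's standard `sqlite3` module exposes as `Connection.backup()` (Python 3.7 or later). The function name and file paths are placeholders.

```python
import os
import sqlite3
import tempfile


def export_consistent_copy(live_db: str, export_db: str) -> None:
    """Write a consistent copy of the database at live_db to export_db.

    Connection.backup() uses SQLite's online backup API, which yields a copy
    of the whole database as of one point in time, even while other
    connections are using it. The file-level backup should then pick up
    export_db instead of the live database file.
    """
    src = sqlite3.connect(live_db)
    dst = sqlite3.connect(export_db)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


# Usage: create a small live database, export it, and read the export back.
with tempfile.TemporaryDirectory() as tmp:
    live = os.path.join(tmp, "live.db")
    conn = sqlite3.connect(live)
    conn.execute("CREATE TABLE mail (folder TEXT, subject TEXT)")
    conn.execute("INSERT INTO mail VALUES ('sent', 'hello')")
    conn.commit()
    conn.close()

    export = os.path.join(tmp, "export.db")
    export_consistent_copy(live, export)

    exported = sqlite3.connect(export)
    print(exported.execute("SELECT COUNT(*) FROM mail").fetchone()[0])  # 1
    exported.close()
```

The backup program then copies the exported file, and can skip the live database file entirely, so it never reads the database while it is being written.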

In an ideal situation, when the backup starts, the system would have a way to tell all running programs to store all their state on disk in a recoverable manner (save all open documents, dump database content, etc.), and once that is done, the backup program would create a snapshot of the filesystem. Unfortunately, I don't know of any operating system that can request consistent state from running applications.
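As a thought experiment, the ideal procedure could look like the following Python sketch. The `BackupAware` interface, with its `checkpoint()` and `resume()` methods, is entirely hypothetical: as noted, no operating system I know of provides anything like it, so every application would have to implement it itself. Each application saves its state and stops changing its data, the snapshot is taken, and then everyone carries on.

```python
from __future__ import annotations

from typing import Callable, Protocol


class BackupAware(Protocol):
    """Hypothetical interface for applications; no real OS provides this."""

    def checkpoint(self) -> None:
        """Save state recoverably and stop changing data until resume()."""
        ...

    def resume(self) -> None:
        """Carry on changing data as normal."""
        ...


def consistent_backup(apps: list[BackupAware], snapshot: Callable[[], None]) -> None:
    """Checkpoint every app, take the snapshot, then resume the apps."""
    checkpointed: list[BackupAware] = []
    try:
        for app in apps:
            app.checkpoint()  # e.g. save open documents, dump database content
            checkpointed.append(app)
        snapshot()  # e.g. a btrfs or LVM snapshot, taken while nothing changes
    finally:
        # Resume even if something failed, most recently paused app first.
        for app in reversed(checkpointed):
            app.resume()


class DemoApp:
    """A stand-in application that just records what it was asked to do."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name = name
        self.log = log

    def checkpoint(self) -> None:
        self.log.append(f"{self.name}: saved and paused")

    def resume(self) -> None:
        self.log.append(f"{self.name}: resumed")


events: list[str] = []
consistent_backup([DemoApp("mail", events), DemoApp("db", events)],
                  lambda: events.append("snapshot"))
print(events)
# ['mail: saved and paused', 'db: saved and paused', 'snapshot', 'db: resumed', 'mail: resumed']
```

Resuming the applications right after the snapshot, rather than after the whole backup run, keeps the pause short: copying data out of the snapshot can then take as long as it likes.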

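Snapshotting, however, is easy-peasy. It can be done by the filesystem (e.g., btrfs) or with LVM. Taking snapshots requires root privileges, but that can probably be arranged. However, a snapshot taken like this, without first getting all programs to commit their state to disk, is not a guarantee of consistency. At best it captures roughly what you would find on disk after a sudden power cut: anything a program had not yet written out is missing, and a file may be caught halfway through an update.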
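A minimal sketch of how a backup script might take such a snapshot, assuming Python 3 and the standard `btrfs` and `lvcreate` command-line tools. The subvolume path, volume group and logical volume names, snapshot size, and the naming scheme are all placeholders. By default the script only prints the commands; actually running them needs root.

```python
from __future__ import annotations

import subprocess
from datetime import datetime, timezone


def snapshot_name(when: datetime) -> str:
    """A timestamped name such as backup-20240102T030405Z (in UTC)."""
    return when.astimezone(timezone.utc).strftime("backup-%Y%m%dT%H%M%SZ")


def btrfs_snapshot_cmd(subvolume: str, snapshot_dir: str, when: datetime) -> list[str]:
    # -r makes the snapshot read-only, which suits a backup source.
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume,
            f"{snapshot_dir}/{snapshot_name(when)}"]


def lvm_snapshot_cmd(vg: str, lv: str, size: str, when: datetime) -> list[str]:
    # The size reserves room for the original contents of blocks that
    # change on the origin volume while the snapshot exists.
    return ["lvcreate", "--snapshot", "--size", size,
            "--name", snapshot_name(when), f"{vg}/{lv}"]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command, and execute it (as root) only if dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


now = datetime.now(timezone.utc)
run(btrfs_snapshot_cmd("/home", "/snapshots", now))
run(lvm_snapshot_cmd("vg0", "home", "5G", now))
```

The backup program would then read from the snapshot (for LVM, after mounting the snapshot volume) instead of the live filesystem, and the snapshot is deleted afterwards.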

A thorough way to get a consistent backup is to shut down the computer and then reboot it into a special backup mode (similar to single-user mode), where no services or applications are running. However, for most people this is much too heavyweight a procedure to be worth it.

Does consistency really matter? It matters a lot when the live data is being changed actively and consistency of the data is critical. For example, a bank's database of the amount of money in each account is quite critical: if data is not backed up consistently, money can be lost or created spontaneously, and that's probably a bad thing.
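A toy version of the bank example, in Python, with made-up account names and amounts. A transfer of 100 from `alice` to `bob` happens while the backup is reading the accounts one at a time. Depending on the reading order, the backup shows money created or lost. Reading from a copy frozen before the backup starts keeps the total intact; in this toy the transfer is never half-done at the moment of freezing, which, as discussed above, a real snapshot cannot promise.

```python
from __future__ import annotations


def transfer(accounts: dict[str, int], src: str, dst: str, amount: int) -> None:
    """Move money between two accounts; the total stays the same."""
    accounts[src] -= amount
    accounts[dst] += amount


def naive_backup(accounts: dict[str, int], order: list[str],
                 transfer_after: int) -> dict[str, int]:
    """Read live accounts one by one; a transfer happens partway through."""
    saved: dict[str, int] = {}
    for i, name in enumerate(order):
        if i == transfer_after:
            transfer(accounts, "alice", "bob", 100)
        saved[name] = accounts[name]
    return saved


def snapshot_backup(accounts: dict[str, int], order: list[str],
                    transfer_after: int) -> dict[str, int]:
    """Freeze a copy first, then read only from the frozen copy."""
    frozen = dict(accounts)
    saved: dict[str, int] = {}
    for i, name in enumerate(order):
        if i == transfer_after:
            transfer(accounts, "alice", "bob", 100)  # changes live data only
        saved[name] = frozen[name]
    return saved


live = {"alice": 500, "bob": 500}  # total: 1000
print(sum(naive_backup(dict(live), ["alice", "bob"], 1).values()))     # 1100: money created
print(sum(naive_backup(dict(live), ["bob", "alice"], 1).values()))     # 900: money lost
print(sum(snapshot_backup(dict(live), ["alice", "bob"], 1).values()))  # 1000
```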

Most people, however, don't need to worry much. They merely need to be aware of the issue, and perhaps mitigate it by backing up when the computer is mostly idle and by running backups often enough.
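One way to act on the "mostly idle" advice is to have the backup script check the system load before starting. A minimal sketch, assuming a Unix-like system (Python's `os.getloadavg()` is not available on Windows); the function name and the threshold of 0.5 are arbitrary illustrative choices, not recommendations. The load average is only a rough proxy for idleness: a machine can show low load while you are in the middle of writing an e-mail.

```python
from __future__ import annotations

import os


def machine_is_idle(threshold: float = 0.5, load: float | None = None) -> bool:
    """Return True if the 1-minute load average is below threshold.

    If load is not given, it is read with os.getloadavg(), which only
    exists on Unix-like systems.
    """
    if load is None:
        load = os.getloadavg()[0]
    return load < threshold


if machine_is_idle():
    print("quiet enough: start the backup now")
else:
    print("busy: try again later")
```

A scheduler (cron, a systemd timer, or similar) can run such a check often, which also takes care of running backups often enough.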

Oh dear. This blog post contains too few jokes. I seem to have run out of jokes to make, so that will conclude this series of blog posts. I hope it was useful to some.