[[!tag idea unix programming]]

The Linux world has been discussing data loss due to ext4, except really because of applications. lwn.net has a pretty good summary (and further articles, which are not yet available to non-subscribers, so I won't link to them now).

Very rough and short summary: applications that don't do the right thing may cause files to be of zero length if the system crashes at just the right moment, and ext4 makes this a bit more likely than ext3, but it can happen with either filesystem.

The one thing that most attracts my attention in this discussion is that it's too difficult to reliably save files under Unix.

Really. It should be very simple, but the world is very complicated, so there's quite a number of steps you need to do to save files reliably.

That complexity should go into libraries. There should be a movement to implement reliable file saving in all popular programming languages.

Here's a very quick first approximation in Python:

import os
import tempfile

def save_file(filename, save_callback, backupname=None, backup=True):
    """Save data to a file."""

    # First, open the directory, so we can fsync it later. It is enough
    # to open it for reading, since that's everything Linux allows.
    dirname = os.path.dirname(filename) or "."
    dirfd = os.open(dirname, os.O_RDONLY)

    # Then, create a temporary file, and ahve the callback write the
    # file contents to that file.
    fd, tempname = tempfile.mkstemp(dir=dirname)
    f = os.fdopen(fd, "w")
    save_callback(f)

    # Close and flush data to the disk.
    f.flush()
    os.fsync(fd)
    f.close()

    # If the file exists, make a backup of it. But only if we should.
    if backup and os.path.exists(filename):
        if backupname is None:
            backupname = filename + ".bak"
        if os.path.exists(backupname):
            os.remove(backupname)
        os.link(filename, backupname)

    # Rename the temporary file to the new file.
    os.rename(tempname, filename)

    # Finally, flush the directory to disk, so that we're sure everything
    # is there.
    os.fsync(dirfd)
    os.close(dirfd)


if __name__ == "__main__":
    import time
    def callback(f):
        f.write("hello %s\n" % time.ctime())
    save_file("test.dat", callback)

I hasten to point out that this is NOT PRODUCTION CODE. Do not use it. Instead, take it as inspiration to write a really, really good function for doing that.