
Atomic Publishing a Static Website

Yes, yes: Static Site Generators are all the rage these days. It seems like there are (multiple!) options for every language out there (including homegrown options from back in the day).

There are a bunch of benefits to using them, and once you get past thinking you need a “dynamic” site, they make perfect sense. I use Metalsmith (NodeJS) for this site and pixls.us. I used Pelican (Python) for gimp.org, and I just got my feet wet with Hugo (Go) for the new digiKam website.

Whichever system you use, the build normally ends with your website generated into a directory. To publish the site, you need only transfer that directory of files to your web server. In my case I use rsync to transfer only the files (or parts of files) that have changed.

Care should be taken with how the site is updated on the server, though.

Atomicity

The web server has one task: serve requests for files from a directory.

If you simply transfer your files into this directory on the server, there’s a chance that someone will request a file that hasn’t been uploaded yet, or get a page that references a file that no longer exists. The longer the transfer takes, the wider that window.

One way to mitigate this is to upload the files to a temporary directory on the server, then move them into place. This is much better than transferring directly into the web directory, but there is still a window during the move operation where files can be out of sync (particularly for large websites).
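To make that window concrete, here is a small sketch run in a scratch directory (the names live and staging are hypothetical, and it assumes GNU mv). A single rename cannot replace one non-empty directory with another, so a directory-level "move into place" ends up as remove-then-move, with a moment in between where no site exists at all.

```shell
#!/bin/sh
set -eu

# Scratch directory; "live" is the served directory, "staging" the upload (hypothetical names)
root=$(mktemp -d)
cd "$root"
mkdir live staging
echo 'old' > live/index.html
echo 'new' > staging/index.html

# rename(2) refuses to replace a non-empty directory, so GNU mv -T fails here.
# (Without -T, mv would instead move staging *inside* live.)
if mv -T staging live 2>/dev/null; then
  echo "rename succeeded"
else
  echo "rename refused: live is not empty"
fi

# The fallback is two steps, with a gap where nothing is being served
rm -r live
mv staging live
cat live/index.html
```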

We need the operation to be atomic. That is, as far as the web server is concerned, the site should switch from the old version to the new one instantaneously.

Setup

It’s possible to do this, but it does require a (just a) little bit of work. I’ll explain how I do my deployments as an example.

I have my web server serving files from a symlink, public_html. This symlink points to a directory named website-YYYYMMDD/ where the actual files are located.

/home/webserver/
 ├── public_html → website-20170311/
 └── website-20170311/
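If you want to experiment with this layout before touching a real server, the sketch below recreates it in a scratch directory. The mktemp directory stands in for /home/webserver/, and the index.html content is a placeholder. The symlink uses a relative target, so it keeps resolving correctly even if the parent directory is moved.

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for /home/webserver/ (assumption for the demo)
root=$(mktemp -d)
cd "$root"

# The real files live in a dated directory...
mkdir website-20170311
echo '<h1>March</h1>' > website-20170311/index.html

# ...and the web server is pointed at a relative symlink to it
ln -s website-20170311 public_html

ls -l public_html            # listing ends with: public_html -> website-20170311
cat public_html/index.html   # served through the link
```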

Updating

Now, I have updated my website and generated a new directory of updated files. I need to get them on the server to update the website!

I want to take advantage of rsync’s delta-transfer capabilities, so I need a directory on the server to compare against. As I already mentioned, I don’t want to do this against the live directory, so I will create a copy of it.

Unfortunately, if the site is large, it may be prohibitive to create a complete second copy on the server. Luckily for me, the copy command (cp) has two options that are really useful here: -a, --archive and -l, --link.

The --archive option recursively copies all of the SOURCE, preserving permissions, timestamps, and symlinks.

The --link option creates hard links to the files instead of copying their contents (directories are still created as new directories).

This is awesome, because not only does it minimize actual disk usage (you’re really just creating a new link to the same inode) but it’s also really, really fast.
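A quick way to convince yourself that cp -al shares inodes rather than copying data is to compare the two trees afterwards. This is a sketch in a scratch directory with made-up file names, assuming GNU cp; -ef is the shell test operator for "same device and inode".

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for /home/webserver/ (hypothetical files)
root=$(mktemp -d)
cd "$root"
mkdir -p website-20170311/css
echo '<h1>March</h1>' > website-20170311/index.html
echo 'body {}'        > website-20170311/css/site.css

# -a: recursive, preserve attributes; -l: hard-link files instead of copying (GNU cp)
cp -al website-20170311 website-20170401

# The same inode number is printed for both names: no file data was duplicated
ls -i website-20170311/index.html website-20170401/index.html

# Files are shared, directories are not
if [ website-20170311/index.html -ef website-20170401/index.html ]; then
  echo "index.html: same inode"
fi
if [ ! website-20170311/css -ef website-20170401/css ]; then
  echo "css/: separate directory"
fi
```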

So now there’s a (hard link) copy of the site directory with a new name (website-20170401 in my example):

/home/webserver/
 ├── public_html → website-20170311/
 ├── website-20170311/
 └── website-20170401/

Now I can rsync to the newly copied directory on the server.

“But wait!” you’ll say, “If you change a hard linked file, it will update everywhere!”

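This is true. Normally, if you modify a file that is hard linked, then all other names that point to that same inode will see the update. However, rsync doesn’t modify the file in place: by default it writes the new data to a temporary file and then renames it over the old name, which breaks the hard link and leaves the original file (still used by the live site) untouched. (If a file hasn’t changed, nothing is done.) This does not hold if you use rsync’s --inplace option, so please test this on your own system.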
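The difference between writing a file in place and replacing it with a rename is easy to see with plain shell tools. This sketch does not run rsync; it imitates rsync’s default write-to-a-temporary-file-then-rename strategy with echo and mv. The directory names and the temporary name .page.html.tmp are hypothetical.

```shell
#!/bin/sh
set -eu

root=$(mktemp -d)
cd "$root"
mkdir live
echo 'old' > live/page.html
cp -al live staging              # staging/page.html is a hard link to live/page.html

# 1. In-place write: the shared inode changes, so BOTH names see it
echo 'edited in place' > staging/page.html
cat live/page.html               # edited in place

# Reset the shared content
echo 'old' > staging/page.html

# 2. Temp file + rename (the rsync default): staging gets a new inode
echo 'new' > staging/.page.html.tmp
mv -f staging/.page.html.tmp staging/page.html

cat live/page.html               # old
cat staging/page.html            # new
```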

Now I have a directory on my web server that contains all of my new changes and updates. To make the new changes live on the website I just need to point the symlink public_html to the new directory.

This can be accomplished atomically by creating a new symlink pointing to the new directory and then renaming it over the old symlink. (Depending on the version, ln -snf website-20170401 public_html may remove the old link and then create a new one, which is two separate steps and therefore not atomic.)

So first we create a new symlink pointing to the new directory:

$ ln -s website-20170401 public_html-tmp

Then moving the new symlink over the old symlink is a rename, which is atomic:

$ mv -Tf public_html-tmp public_html

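The web server will now be serving files from the new location, without any hiccups along the way. Note the -T (--no-target-directory) option to GNU mv: without it, mv would follow public_html to the directory it points to and move public_html-tmp inside that directory, instead of replacing the link.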
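Here is the same two-step swap run end to end in a scratch directory, checking where public_html points afterwards. It assumes GNU mv for -T; the directory contents are placeholders.

```shell
#!/bin/sh
set -eu

root=$(mktemp -d)
cd "$root"
mkdir website-20170311 website-20170401
echo 'March' > website-20170311/index.html
echo 'April' > website-20170401/index.html
ln -s website-20170311 public_html

# Step 1: a new symlink under a temporary name
ln -s website-20170401 public_html-tmp

# Step 2: a single rename(2) replaces the old link;
# -T keeps mv from following public_html into website-20170311/
mv -Tf public_html-tmp public_html

readlink public_html             # website-20170401
cat public_html/index.html       # April
```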

In Summary

Assuming your web server is serving from a symlink, public_html, that points to your current site directory (website-20170311), like this:

/home/webserver/
 ├── public_html → website-20170311/
 └── website-20170311/

To update the site atomically:

  1. Copy the current site directory to a new directory (with hard links).

    $ cp -al website-20170311 website-20170401

    /home/webserver/
     ├── public_html → website-20170311/
     ├── website-20170311/
     └── website-20170401/

  2. rsync your new site build to the new directory (--delete removes files that are no longer part of the build).

    $ rsync -PSauv --delete local_dir/ website-20170401/

  3. Create a temporary symlink to the new directory.

    $ ln -s website-20170401 public_html-tmp
    
  4. Rename the new symlink over the old one.

    $ mv -Tf public_html-tmp public_html
    

Other Uses

Astute readers may notice that combining this hard-link copy of a directory tree with rsync is very similar to how incremental backups can be made without exploding disk space requirements.

Mike Rubel covers something like this in an old post of his.