Simple server backups with Amazon S3 and Glacier

June 7, 2013

Amazon recently announced a new feature in Amazon S3 called Lifecycle.  It allows you to create a rule that does one or both of the following, based on the age of the objects in the S3 bucket:

  • Move the object to Glacier (super-cheap offline storage)
  • Delete the object permanently from the bucket

I decided to use this to simplify backups of some of the servers I maintain.  Here’s my script (without error checking and other non-relevant code):

#!/bin/bash
# Build a date-stamped archive, upload it to S3, then remove the local copy.
BACKUP_FILE="backup_$(date +%Y%m%d).tgz"
tar -czf "$BACKUP_FILE" myStuffToBackupFolder/
s3cmd --reduced-redundancy put "$BACKUP_FILE" s3://mys3backupbucket
rm "$BACKUP_FILE"

Basically, my script creates a .tgz file that contains all of the files I want to back up.  The script names the tgz file using the current date (e.g. backup_20130607.tgz), copies it to my Amazon S3 bucket using s3cmd, and then removes it from my local filesystem.
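
To run this on a schedule, a single crontab entry is enough.  Here's a minimal sketch; the script path, log path, and 2:00 a.m. run time are placeholders I made up, not anything from my actual setup:

# Sample crontab entry (add it with "crontab -e"): run the backup nightly at 2:00 a.m.
# The script and log paths are placeholders; adjust them for your own system.
0 2 * * * /path/to/backup.sh >> /path/to/backup.log 2>&1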

If you haven’t used s3cmd before, you should check it out. It gives you a very simple command-line interface to Amazon S3 buckets. You can read more at http://s3tools.org/s3cmd.  You’ll find versions for Linux and Mac OS.
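
If you're trying it for the first time, the basic workflow looks roughly like this (the bucket and file names below are placeholders, not from my setup):

s3cmd --configure                               # one-time setup: prompts for your AWS keys
s3cmd mb s3://my-example-bucket                 # make a new bucket
s3cmd put somefile.tgz s3://my-example-bucket   # upload a file
s3cmd ls s3://my-example-bucket                 # list the bucket's contents
s3cmd get s3://my-example-bucket/somefile.tgz   # download it again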

Typically, I would add some code to the backup script that keeps x backups and deletes anything older than y days.  Amazon’s new lifecycle feature really makes this super simple.  If I go to the properties of my S3 bucket on the Amazon Web Services console, I can create the following lifecycle rule:

[Screenshot: the S3 bucket lifecycle rule dialog in the AWS console]

My rule specifies that each backup file is available from the S3 bucket for the first 10 days.  Then, after 10 days, the file is moved to Amazon Glacier (to reduce cost dramatically: close to 90% cheaper than leaving the file in an S3 bucket, at the prices quoted below).  Then, after a year, the file is automatically deleted.
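
If you'd rather not click through the console, the same rule can be expressed as an S3 lifecycle configuration document.  Here's a rough sketch of what mine would look like; the rule ID and the backup_ prefix are placeholders you'd adjust for your own bucket:

# Write the lifecycle rule to a local file (the rule ID and prefix are placeholders).
cat > lifecycle.xml <<'EOF'
<LifecycleConfiguration>
  <Rule>
    <ID>archive-then-expire-backups</ID>
    <Prefix>backup_</Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>10</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>365</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
EOF
# Recent s3cmd releases can apply it from the command line (check your version's help):
# s3cmd setlifecycle lifecycle.xml s3://mys3backupbucket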

If you are not familiar with Amazon Glacier, here are a few facts you need to know:

  • It’s crazy cheap storage – currently it’s $0.01/GB/month (compared to $0.09/GB/month for S3 storage), so a terabyte would only cost $10/month in Glacier.
  • Files in Glacier are not immediately accessible.  It takes 3-5 hours for a restore request to be completed.  Just think of it as a tape backup (I have no idea what type of media Amazon is actually using for Glacier storage).
  • There is a minimum three-month storage policy on Glacier, and there is a restore fee for anything beyond 5% of your stored data.  A lot of people miss this fine print.  Amazon words it like this: “Glacier is designed with the expectation that restores are infrequent and unusual, and data will be stored for extended periods of time. You can restore up to 5% of your average monthly Glacier storage (pro-rated daily) for free each month. If you choose to restore more than this amount of data in a month, you are charged a restore fee starting at $0.01 per gigabyte.”  We’re not talking a lot of money, so it’s not really a factor for my use case unless I suddenly need to restore some very large files (which typically means something bad has happened).
  • Files moved to Glacier still show up in your S3 bucket, but with a storage type of “Glacier”.  I like this because I get a complete picture of how all of my files are stored.  Even tools I use like Transmit, Cyberduck, and other FTP-like clients that support Amazon S3 buckets still show the files.  However, if you try to download Glacier-stored files using these tools, you’ll get a permission error until you do a restore via the Amazon Web Services console (or from the command line; see the sketch after this list).
  • You’ll see other pricing parameters on the S3 pricing page (charges for PUT, POST, LIST, GET, etc.), but when you’re only archiving a handful of files, it’s a rounding error.  For example, if I ran my backup daily, that would be 365 PUTs a year, which doesn’t even add up to one cent!  So basically, you can ignore these unless you are backing up 100k+ files.  If you are really nervous about what your real cost is going to be at the end of the month, simply check the Amazon Web Services account activity page after a backup or two and extrapolate your monthly cost.
  • You can learn more about S3 and Glacier at http://aws.amazon.com/s3/.
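
On the restore point above: the Amazon Web Services console works fine, but if your s3cmd is recent enough it can also issue the restore request from the command line.  This is a sketch under that assumption, reusing my bucket name and an example backup file from earlier:

# Ask Glacier to stage the object back into S3 for 7 days (asynchronous;
# expect the usual 3-5 hour wait before the download succeeds).
s3cmd restore --restore-days=7 s3://mys3backupbucket/backup_20130607.tgz

# Once the restore completes, download it as usual.
s3cmd get s3://mys3backupbucket/backup_20130607.tgz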

One other note: you’ll see that my script uses the s3cmd parameter “--reduced-redundancy”.  Amazon has two classes of storage (other than Glacier): standard and reduced redundancy.  Reduced redundancy is a bit cheaper, but offers less durability.  Their site claims 99.99% durability vs the 99.999999999% durability of standard S3 storage.  For my particular needs, 99.99% is fine, especially since I’m quickly moving the files to Glacier.

Filed in: Amazon AWS, Amazon S3, Linux

About the Author

Greg is a developer advocate for the Google Cloud Platform, based in San Francisco, California.
