The Amazon folks have gone public today with the next EC2 feature: persistent storage. The official information is found in Jeff Barr’s blog entry and in Matt’s forum post. Calling the persistent storage a “feature” is actually quite an understatement: it really revolutionizes EC2 and enables usage patterns that any big-iron SAN user would die for.
What does this persistent storage look like? We’ve been testing it for a while and are thoroughly impressed. The Amazon folks are clearly still fine-tuning a lot of the details, but basically you can create storage volumes in the cloud right next to the server instances you launch in the cloud. Think of having a really big SAN in the cloud in which you can create volumes of up to 1TB each with a single API call, or with a simple click in the RightScale UI (yes, of course we’ll have nice support for the storage volumes on our site, coupled with some neat automation and an array of pre-packaged solutions). You can mount one or multiple volumes on an instance and they appear just like the other local drives, so you can format them as you like, set up striping, and do other useful things.
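For the curious, here is a rough sketch of what that workflow looks like with Amazon’s command-line API tools. The volume and instance IDs are made up for illustration, and the sketch only prints the commands instead of executing them, since the real ones need AWS credentials:

```shell
# Sketch of creating, attaching, and formatting a storage volume.
# All IDs (vol-..., i-...) are hypothetical; 'run' just echoes each
# command so the flow is visible without touching a real account.
run() { echo "$@"; }

run ec2-create-volume --size 100 -z us-east-1a   # 100 GB volume in one zone
run ec2-attach-volume vol-12345678 -i i-87654321 -d /dev/sdh
# On the instance the volume appears like a local drive:
run mkfs -t ext3 /dev/sdh                        # format it as you like
run mount /dev/sdh /mnt/data
```

Striping across several such volumes (e.g. with Linux software RAID) works the same way, just with more attached devices.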
The feature that really makes the storage volumes sizzle is the ability to snapshot them to S3 and then create new volumes from the snapshots. The snapshots are great for durability: once a snapshot is taken it is stored in S3 with all the reliability attributes of S3, namely redundant storage in multiple availability zones. This essentially solves the whole backup issue with one simple API call or click in the RightScale UI. You can also easily restore a snapshot by creating a fresh volume from it. This feature is useful beyond just restoring a backup: you may restore to another instance where you now have a clone of the data and can do whatever you want to it. Wow!
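Under the hood, the snapshot-and-restore cycle boils down to something like the following sketch (again with made-up IDs, printing the commands rather than running them):

```shell
# Sketch: snapshot a volume to S3, then restore it as a brand-new volume,
# possibly in a different availability zone and for a different instance.
# IDs are hypothetical; 'run' echoes instead of executing.
run() { echo "$@"; }

run ec2-create-snapshot vol-12345678                           # one call: a durable copy lands in S3
run ec2-create-volume --snapshot snap-9abcdef0 -z us-east-1b   # fresh volume from the snapshot
run ec2-attach-volume vol-0fedcba9 -i i-11111111 -d /dev/sdh   # attach the clone to any instance
```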
The cool stuff
There are so many great uses for the storage volumes that it’s impossible to write them all up in a single blog post, and we obviously haven’t thought of them all (or even close). The first usage scenario we looked into is running a database. Up to today, the only setup we recommended for a mission-critical database was two instances with real-time database replication and frequent backups to S3. We’ve now installed our Manager for MySQL replicated setup for many, many customers and it works very well. In short, we use MySQL replication for redundancy, plus frequent (every 10 minutes or so) backups to S3 from the slave to guard against the unlikely event of simultaneous failure of both instances located in different availability zones.
With the storage volumes the Manager for MySQL setup works even better. Instead of having to tar up the database files and upload them to S3, we can just take a snapshot. And to initialize a slave we simply create a volume for it from the last snapshot of the master and launch the replication: no more rsync of the data is necessary. It’s really nice to see how all the automation we’ve built stays in place with the new Amazon capabilities and saves just as many headaches as before; it just gets turbocharged by the storage volumes!
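As a hedged sketch (the command names are real, but the IDs are made up and the replication coordinates are left out; ‘run’ prints instead of executing), initializing a fresh slave now looks something like:

```shell
# Sketch: initialize a MySQL slave from the master's latest volume snapshot
# instead of rsync'ing the data over. All IDs are hypothetical.
run() { echo "$@"; }

# On the master: briefly quiesce writes, snapshot the data volume, resume.
run mysql -e "FLUSH TABLES WITH READ LOCK"
run ec2-create-snapshot vol-12345678
run mysql -e "UNLOCK TABLES"

# For the new slave: clone the snapshot, attach it, and start replicating.
run ec2-create-volume --snapshot snap-9abcdef0 -z us-east-1a
run ec2-attach-volume vol-0fedcba9 -i i-22222222 -d /dev/sdh
run mysql -e "CHANGE MASTER TO ..."   # fill in the master host and log position
run mysql -e "START SLAVE"
```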
In addition, the storage volumes enable slightly lower-end database offerings. Since the storage volumes are more durable than local instance storage, much of the risk of losing everything when the instance dies goes away. It is now possible to run a single instance with the database data living on a storage volume and to take frequent snapshots for backups onto S3. Should the instance die, it is very simple to launch a fresh one using the same storage volume. Typically it takes only a few minutes for the new instance to come up and pick up where the old one left off! Of course this setup has more downtime than the redundant database setup, and one has to be really careful to minimize the time it takes to mount the volume and to ensure a successful database recovery.
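A minimal sketch of that recovery path, under the same caveats as before (hypothetical IDs, commands printed rather than executed): the replacement instance reattaches the surviving volume and lets the database run its normal crash recovery.

```shell
# Sketch: bring a replacement instance up on the old instance's data volume.
# IDs are hypothetical; 'run' echoes instead of executing.
run() { echo "$@"; }

run ec2-attach-volume vol-12345678 -i i-33333333 -d /dev/sdh  # same volume, new instance
run mount /dev/sdh /var/lib/mysql       # the data outlived the dead instance
run /etc/init.d/mysql start             # database crash recovery happens here
```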
Just as the storage volumes enable the reliable use of single-instance databases they also enable single-tenant appliances in EC2. It is now possible to host the data for a single-tenant virtual appliance on a storage volume and mount it on an instance. What’s really cool is the decoupling of the data from the instance. It means that you can start a customer on a small instance and if they outgrow it, you can migrate them almost seamlessly to a large and later an x-large instance, all using the same storage volume. Beyond an x-large a couple of interesting options are possible to increase performance further, such as striping multiple storage volumes. EC2 really brings virtual appliances to the next level!
The S3 snapshots enable some completely different and very intriguing usage scenarios. Suppose you’re doing some DNA matching against a genome data set on 1,000 instances. In addition to firing up 1,000 instances on a whim you can, also on a whim, clone a nicely prepared snapshot of the data set 1,000 times to create 1,000 volumes, one for each instance. BANG! Now they can all independently crawl over the data set. This type of massive (essentially read-only) cloning really opens up new possibilities for running such large computations in a cost-effective manner.
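The fan-out itself is just a loop over the same create-volume call. A sketch (the snapshot ID is hypothetical, and the commands are printed rather than executed):

```shell
# Sketch: clone one prepared snapshot into 1000 independent volumes,
# one per worker instance. The snapshot ID is hypothetical.
run() { echo "$@"; }

make_clones() {
  for i in $(seq 1 1000); do
    run ec2-create-volume --snapshot snap-9abcdef0 -z us-east-1a
  done
}
make_clones | wc -l   # counts the create-volume calls
```

Because each worker gets its own volume created from the shared snapshot, there is no contention between instances while they crawl the data.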
Summing it up
I’ll stop here, but clearly the cloud has just squared in size! Two years ago, when I started on EC2, there were only small instances available and the sentiment was that in order to get the horizontal scalability and pricing of the cloud you had to accept inferior features. In the meantime we’ve gotten multiple instance sizes, plus recently the remappable IP addresses and availability zones. That already indicated that computing in the cloud would soon surpass computing in traditional colos or in your own datacenter not just in scale and price, but also in feature set. With the addition of the storage volumes and all the cool snapshot features, it’s now a fait accompli: cloud adopters will have much more computing horsepower and flexibility at their fingertips than those who are still racking their own machines. It’s going to be like agile software development: if you want to survive as an internet/web service you will have to compute in the cloud, or your competitors will leave you in the dust by deploying faster, better, and cheaper.
Update: Werner Vogels, Amazon’s CTO, also blogs about the storage volumes on All Things Distributed with a little more background perspective. The Amazon folks are getting pretty coordinated, with news appearing at the same time on their blogs and the forums. Maybe I missed it, but I don’t think they even put out a press release for this stuff…