The Amazon folks have gone public today with the next EC2 feature: persistent storage. The official information is found in Jeff Barr’s blog entry and in Matt’s forum post. Calling the persistent storage a “feature” is actually quite an understatement: it really revolutionizes EC2 and enables usage patterns that any big-iron SAN user would die for.
The basics
What does this persistent storage look like? We’ve been testing it for a while and are thoroughly impressed. The Amazon folks are clearly still fine-tuning a lot of the details, but basically you can create storage volumes in the cloud next to the server instances you launch in the cloud. Think of having a really big SAN in the cloud in which you can create volumes of up to 1TB each with a single API call, or with a simple click in the RightScale UI (yes, of course we’ll have nice support for the storage volumes on our site coupled with some neat automation and an array of pre-packaged solutions). You can mount one or multiple volumes on an instance and they appear just like the other local drives, so you can format them as you like, set up striping and do other useful things.
The feature that really makes the storage volumes sizzle is the ability to snapshot them to S3 and then create new volumes from the snapshots. The snapshots are great for durability: once a snapshot is taken it is stored in S3 with all the reliability attributes of S3, namely redundant storage in multiple availability zones. This essentially solves the whole backup issue with one simple API call or click in the RightScale UI. You can also easily restore a snapshot by creating a fresh volume from it. This feature is useful beyond just restoring a backup: you may restore to another instance where you now have a clone of the data and can do whatever you want to it. Wow!
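To make the snapshot semantics concrete, here is a minimal toy model in Python. It is not the Amazon API: the names `Volume`, `Snapshot`, `create_volume` and `create_snapshot` are made up for illustration, and the "blocks" are just a dictionary held in memory. The only things it tries to capture are the behaviors described above: a snapshot is a point-in-time copy, and a volume created from a snapshot is an independent clone.

```python
import copy
import itertools
from dataclasses import dataclass, field

# Hypothetical ID generator; real volume and snapshot IDs come from the service.
_ids = itertools.count(1)


@dataclass
class Snapshot:
    """Point-in-time copy of a volume's blocks (stands in for the copy kept in S3)."""
    snapshot_id: str
    size_gb: int
    blocks: dict


@dataclass
class Volume:
    """A block device that exists independently of any instance."""
    volume_id: str
    size_gb: int
    blocks: dict = field(default_factory=dict)

    def write(self, block_no, data):
        self.blocks[block_no] = data

    def read(self, block_no):
        return self.blocks.get(block_no)


def create_volume(size_gb, snapshot=None):
    """Create an empty volume, or one pre-populated from a snapshot."""
    if size_gb > 1024:  # assumption: 1 TB counted as 1024 GB
        raise ValueError("volumes are limited to 1 TB each")
    blocks = copy.deepcopy(snapshot.blocks) if snapshot else {}
    return Volume(f"vol-{next(_ids)}", size_gb, blocks)


def create_snapshot(volume):
    """Freeze the current contents; later writes to the volume do not affect it."""
    return Snapshot(f"snap-{next(_ids)}", volume.size_gb,
                    copy.deepcopy(volume.blocks))


# Usage: back up, keep writing, then restore the backup as a separate clone.
master = create_volume(100)
master.write(0, "orders v1")
backup = create_snapshot(master)
master.write(0, "orders v2")                  # happens after the snapshot
clone = create_volume(100, snapshot=backup)   # e.g. for another instance
clone.write(1, "scratch data")                # does not touch the master
```

In this sketch the clone still sees "orders v1" while the master has moved on to "orders v2", which is exactly what makes a snapshot usable both as a backup and as a starting point for a copy you can experiment on.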
The cool stuff
There are so many great uses for the storage volumes that it’s impossible to write them all up in a single blog post, and we obviously haven’t thought of them all (or even close). The first usage scenario we looked into is running a database. Up to today the only setup for a mission critical database we recommend is using two instances with real-time database replication and frequent backups to S3. We’ve now installed our Manager for MySQL replicated set-up for many, many customers and it works very well. In short, we use MySQL replication for redundancy and frequent (like every 10 minutes) backups to S3 on the slave to guard against the unlikely event of simultaneous failure of both instances located in different availability zones.
With the storage volumes the Manager for MySQL set-up works even better. Instead of having to tar up the database files and upload them to S3 we can just take a snapshot. And in order to initialize a slave we simply create a volume for it from the last snapshot of the master and launch the replication: no more rsync of the data is necessary. It’s really nice to see how all the automation we’ve built stays in place with the new Amazon capabilities and saves just as many headaches as before; it just gets turbocharged by the storage volumes!
In addition, the storage volumes enable slightly lower-end database offerings. Since the storage volumes are more durable than local instance storage a lot of the risk of losing it all if the instance dies goes away. It is now possible to run a single instance with the database data living on a storage volume and to take frequent snapshots to get backups onto S3. Should the instance die, it is very simple to launch a fresh one using the same storage volume. Typically it would take only a few minutes for the new instance to come up and take off where the old one stopped! Of course this set-up has more downtime when compared to the redundant database set-up, and one has to be really careful in setting everything up to minimize the time it takes to mount the volume and to ensure a successful database recovery.
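The single-instance recovery described above can be sketched as a small toy model. Everything here is hypothetical (the `Instance` and `Volume` classes, `attach`, `fail_over`, and the `launch_instance` callback are invented for illustration and are not Amazon or RightScale calls). The model assumes two constraints: a volume can be attached to only one instance at a time, and only to an instance in the same availability zone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    instance_id: str
    zone: str
    alive: bool = True


@dataclass
class Volume:
    volume_id: str
    zone: str
    attached_to: Optional[Instance] = None


def attach(volume, instance):
    """Attach a volume, enforcing the assumed same-zone and single-attach rules."""
    if not instance.alive:
        raise RuntimeError("cannot attach to a dead instance")
    if volume.zone != instance.zone:
        raise RuntimeError("volume and instance must share an availability zone")
    if volume.attached_to is not None and volume.attached_to.alive:
        raise RuntimeError("volume is already attached to a running instance")
    volume.attached_to = instance


def fail_over(volume, launch_instance):
    """Replace a dead database instance, reusing the surviving volume.

    launch_instance is a hypothetical callback that takes a zone name and
    returns a fresh Instance in that zone.
    """
    current = volume.attached_to
    if current is not None and current.alive:
        return current  # nothing to do
    replacement = launch_instance(volume.zone)
    attach(volume, replacement)
    return replacement


# Usage: the instance dies, the data volume survives and moves to a new one.
db = Instance("i-1", "us-east-1a")
data = Volume("vol-1", "us-east-1a")
attach(data, db)
db.alive = False
new_db = fail_over(data, lambda zone: Instance("i-2", zone))
```

What the sketch leaves out is the part that needs the most care in practice: mounting the file system on the new instance and letting the database run its crash recovery before it accepts traffic again.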
Just as the storage volumes enable the reliable use of single-instance databases they also enable single-tenant appliances in EC2. It is now possible to host the data for a single-tenant virtual appliance on a storage volume and mount it on an instance. What’s really cool is the decoupling of the data from the instance. It means that you can start a customer on a small instance and if they outgrow it, you can migrate them almost seamlessly to a large and later an x-large instance, all using the same storage volume. Beyond an x-large a couple of interesting options are possible to increase performance further, such as striping multiple storage volumes. EC2 really brings virtual appliances to the next level!
The S3 snapshots enable some completely different and very intriguing usage scenarios. Suppose you’re doing some DNA matching against a Genome data set on 1000 instances. In addition to firing up 1000 instances on a whim you can, also on a whim, clone a nicely prepared snapshot of the data set 1000 times to create 1000 volumes, one for each instance. BANG! This way they can all independently crawl over the data set. This type of massive (essentially read-only) cloning really opens up new possibilities in running such large computations in a cost-effective manner.
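As a rough sketch of that fan-out, the loop below creates one volume per instance from a single snapshot and attaches it, with the calls issued in parallel. `FakeCloud` is an in-memory stand-in invented so the example runs anywhere; its `create_volume` and `attach_volume` methods are assumptions about the shape of such an API, not the real calls, and the snapshot name is made up.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeCloud:
    """In-memory stand-in for a storage API (hypothetical, for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        self.attachments = {}  # volume_id -> instance_id

    def create_volume(self, snapshot_id):
        # A real call would create a volume populated from the snapshot.
        with self._lock:
            return f"vol-{next(self._ids)}"

    def attach_volume(self, volume_id, instance_id):
        with self._lock:
            self.attachments[volume_id] = instance_id


def clone_for_fleet(cloud, snapshot_id, instance_ids, max_workers=32):
    """Give every instance its own volume cloned from the same snapshot."""

    def provision(instance_id):
        volume_id = cloud.create_volume(snapshot_id)
        cloud.attach_volume(volume_id, instance_id)
        return instance_id, volume_id

    # The calls are independent of each other, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(provision, instance_ids))


# Usage: 1000 instances, 1000 independent clones of one prepared data set.
cloud = FakeCloud()
fleet = [f"i-{n}" for n in range(1000)]
volume_for = clone_for_fleet(cloud, "snap-genome", fleet)
```

Since each instance gets its own volume, the clones do not have to be shared between instances, which fits the essentially read-only workload described above.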
Summing it up
I’ll stop here, but clearly the cloud has just squared in size! Two years ago, when I started on EC2, there were only small instances available and the sentiment was that in order to get the horizontal scalability and pricing of the cloud you had to accept inferior features. In the meantime we’ve gotten multiple instance sizes plus recently the remappable IP addresses and availability zones. That already indicated that computing in the cloud would soon surpass computing in traditional colos or in your own datacenter not just in scale and price, but also in feature set. With the addition of the storage volumes with all the cool snapshot features it’s now a fait accompli: the cloud adopters will have much more computing horsepower and flexibility at their fingertips than those who are still racking their own machines. It’s going to be like agile software development: if you want to survive as an internet/web service you will have to compute in the cloud or your competitors will leave you in the dust by being able to deploy faster, better, and cheaper.
Update: Werner Vogels, Amazon’s CTO also blogs about the storage volumes in all-things-distributed with a little more background perspective. The Amazon folks are getting pretty coordinated with news appearing at the same time on their blogs and the forums. Maybe I missed it, but I don’t think they even press release this stuff…
Jason Rakowski said
Good layout and design. I like your blog. I just added your RSS feed to my Google News Reader.
Jason Rakowski
Ron Kass said
It would make a lot of sense to publish your test data regarding speed, latency, etc.
Just my 20 cents..
TvE said
Ron: I’d love to, but the Amazon folks prefer to keep control over which details are made public at this stage. Also, until it’s all released, performance may still change. All I can say is that I’d love to use it in production as it is today.
TvE said
Jason: thanks! We’re using one of the standard wordpress layouts with a few custom tweaks. Love wordpress!
Jorge said
This is a major feature for everybody, but especially for those running database-dependent applications like me.
Static IP and now persistent storage – this is huge!
Tim Anderson’s ITWriting - Tech writing blog » Amazon Elastic Compute Cloud gets persistent storage said
[...] since the initial launch is static IP numbers. Early tester (and reseller) Thorsten von Eicken is enthusiastic: The feature that really makes the storage volumes sizzle is the ability to snapshot them to S3 and [...]
fiidgets said
Widget developers need to give EC2 a serious look. Those without sys admin experience will appreciate service of RightScale.
mmc said
Actually, for a true “fait accompli” we need all these nice features to become stable (non-beta), and we in Europe need a local data center to host our EC2 images for lower latency. Besides, it would be nice if Amazon improved security (and no, I am not going to list all the security problems. Contact me privately if you want to know the problems).
Amazon announces persistent storage for EC2 : business|bytes|genes|molecules said
[...] Further reading Phil801 EC2 Forums RightScale blog [...]
TvE said
fiidgets: thanks for the note. We actually have a lot of customers who *do* have sysadmin experience. They really understand how much time and headache RightScale is saving them, so they tend to be our most vocal supporters.
mmc: ahhh, there’s always something more to look forward to… I don’t know that I would couple “become stable” with “non-beta”. When S3 removed the beta tag we didn’t really see any difference, did we? It’s not like Amazon is currently saying “oops, we goofed, but hey, it’s still beta”! They are dead serious, beta or not. And “stable”? Have you purchased colo or hosting at a larger scale (say several racks plus associated bandwidth)? So far Amazon is as stable as anything I’ve seen. I’m totally with you on datacenters in Europe, though.
Wide Awake Developers said
Amazon Blows Away Objections…
Amazon must have been burning more midnight oil than usual lately. Within the last two weeks, they’ve announced three new features that basically eliminate any remaining objections to their AWS computing platform. Elastic IP Addresses Elastic IP ad…
Laurent’s blog » Blog Archive » Private beta for Amazon EC2 with persistent storage said
[...] Read RightScale’s experience with EC2 and persistent storage: “it’s now a fait accomplit: the cloud adopters will have much more computing horsepower and flexibility at their fingertips than those who are still racking their own machines.” [...]
Amazon adds persistent storage to cloud computing service « IT Spot said
[...] Thorsten von Eicken at RightScale, who has been testing the service, talks about the implications of this feature and says his company is making tools to make it easier to use these services. [...]
SearchAllDeals.com said
Great post! I’ve migrated out of ‘degraded instances’ one too many times to understand the importance of persistent storage, and decoupling data from instance. Do you know if multiple instances can mount the same volume? That would solve another set of problems and enable more cool stuff to happen. Btw, keep up the good work at Rightscale!
TvE said
SearchAllDeals: thanks for the kind words! Werner’s blog states “As to be expected with a volume abstraction only one instance can have the volume mounted at any given time.” I don’t know that I would agree with the “as to be expected” piece, I would have expected to be able to mount a volume on multiple instances such that I could use a cluster filesystem like GFS to access it, or at least to mount it read-only on multiple instances. Hopefully that’ll be high on Amazon’s list for V2…
Persistent Storage Boosts Amazon Web Services; Enterprise Ambitions - GigaOM said
[...] thus making it a shared drive. What it all means is that AWS/EC2 has gone up a few notches in terms of reliability. This reliability will go a long way towards the company offering service-level agreements to [...]
Service-Oriented Architecture mobile edition said
[...] April 14: Amazon just added another new feature, in which persistent storage “volumes” can be added to EC2 implementations. Thanks to [...]
EC2 Loses its Last Limit « Jason Watkins’s Weblog said
[...] chance of losing ~10 minutes of data were the price you had to pay for hosting databases on ec2. But now that’s gone. Once again Amazon focuses on a simple tool that can be used in a variety of ways. The only [...]
Rod Boothby said
At Joyent, we already give you access to “real” storage. Each accelerator comes with a dedicated static IP address and a dedicated drive space. Our new accelerators make sure that you have access to all that drive space locally.
You had mentioned that Amazon was not yet disclosing the speed of this new offering. It is likely that developers will experience degraded I/O speeds as multiple users try to run high-traffic DBs on the same storage volume. It’s a basic issue of physics. You can only run so much through a single storage device.
At Joyent, we have solved this problem by associating large local drives with each Joyent Accelerator.
The other question that has not been covered yet is how much this will cost. Amazon charges extra for bandwidth, extra for static IPs, extra for S3 backup, extra for S3 requests and now, it looks like they are going to charge extra for persistent storage.
BTW, @SearchAllDeals.com , at Joyent, you can mount shared drives across multiple instances.
There is no question that RightScale has a tremendous product. But you guys should consider giving your users alternatives. Joyent provides just such an alternative.
TvE said
Rod, thanks for leaving me a Joyent advertising comment!
I like what you offer and think that your service is excellent for many users. Also, if there’s a way to work together we’d love to talk!
Regarding the specifics of your comments, the fundamental difference is on-demand. With AWS I can acquire or drop resources on a whim. At first, being able to just launch a bunch of servers or create 10 volumes from a snapshot “just like that” is a wow-experience and feels like a luxury, but after using this stuff for almost two years I just can’t go back. If you’re familiar with software development, when version control first became widely used it was a revelation to be able to tell a developer who was working on a special feature to “just branch the project and commit your changes to the branch”. Well, now we have gotten used to “just clone the staging system and test your changes there”, where the staging system is really a multi-server set-up and cloning that takes 5 minutes of clicking and editing a few web forms. We do this so routinely now that I can’t go back. Some scientific computing people we’re talking to are drooling over being able to create tens to hundreds of clones of a terabyte volume “just like that” because it allows them to get their job done quicker and better. Looking at Amazon’s storage volumes just feature by feature without considering the on-demand scaling is missing the big picture.
amazon block storage — award tour said
[...] Frontier. Amazon will be launching mountable volume storage to EC2 soon enough. rightscale has a nice post of why this is [...]
SB said
Thanks for the excellent report. We’ve been using EC2 for a year and this was great news.
Weekly linkdump #122 - max - developers’ blog said
[...] Amazon has updated its EC2 service; backing up to S3 is now possible, Amazon takes EC2 to the next level with persistent storage volumes « RightScale Blog [...]
A few items that clog up my bookmarks | Oliver Thylmann's Thoughts said
[...] And this brings us back to one of my favorite topics, Amazon, who announced a persistent storage feature for EC2. Before I blabber along on how cool that is, just read this from RightScale’s Thorsten von Eicken. [...]
Amazon Web Services links for 2008-04-22 | Elastic Grid Blog said
[...] Persistent storage for EC2: as announced on AWS blog, but also here and here! [...]
Matthew Lanham said
Great article, I’m looking forward to this feature. I have linked back to this article from my blog…
Amazon to offer Persistent Storage for Amazon EC2 said
[...] Amazon takes EC2 to the next level with persistent storage volumes [...]
Babble On » Blog Archive » Babble On EC2 said
[...] mitigating any problems. If it really bothers you then hang on a few months for Amazon’s new persistent storage volumes, which are probably exactly what you are looking [...]
Marcus Cake » Online network building blocks: automatic scaling of web servers, persistent storage and MySQL management said
[...] now offers essential features not previously available – automatic scaling , Manager for MySQL and Persistent Storage. The Amazon Web Services platform empowered the entrepreneur, but advanced technical skills were [...]
chirp.syxyz.net » Blog Archive » cloudy with chance of persistent elasticity said
[...] update: beta-tester Thorsten from RightScale also got enthusiastic. [...]
TJH said
1). Congratulations to the entire RightScale team!
2). I was able to create a volume and a snapshot successfully, but I was not able to attach the volume to my existing server (image).
I clicked on the ‘Add server for boot attachment’ link and got a ‘No servers available’ message.
Thanks for your help.
Thorsten said
TJH: you can only attach the volume to instances in the same availability zone. We may not have displayed things well enough to make this obvious, sorry for that.