Archive for October, 2009

RightScale ServerTemplate Library and Machine Tags

Yesterday’s release of the RightScale platform introduced two new features that I’m really excited about: the ServerTemplate Library and the use of Machine Tags on servers. (Oops, I shouldn’t forget the new features for RackSpace, but I’ll talk about those next week.)

We’ve had rather sophisticated sharing of ServerTemplates in RightScale for over a year now, allowing certain users to share ServerTemplates, RightScripts, and other design artifacts with other RightScale users. This enables us to publish free ServerTemplates to all our users and premium ones to our customers, and it also lets ISVs on our platform publish ServerTemplates for free or for pay to their users and customers. In addition, each of the design artifacts is versioned, so users who launched servers with a ServerTemplate last year can still launch new servers with exactly the same version of that ServerTemplate.

A result of all this publishing, sharing, and versioning is that there’s a lot to choose from. So much, in fact, that drop-down menus have become really unwieldy, and this is where the new library comes into play. In the past, when adding a server to a deployment, one had to find the correct ServerTemplate in the list of all available templates in the RightScale system. Now this has become a two-step process: you first import the ServerTemplates of interest from the library into your account, and then only the imported templates are shown in the drop-down selection menus. Separating out the library import/export step will also allow us to significantly upgrade the experience of browsing all the design artifacts in the library over the coming releases. Stay tuned…

We introduced Flickr style machine tags recently and we’re expanding their use with this release. One of the really exciting new features is that servers now have tags and we’ve integrated the tags with the routing of messages between servers, with Chef (via the RightLink agents) and with the UI. All this is still in alpha but it’s starting to take shape. Our first real use-case is the registration of application servers with load balancers. The way it works is that when a load balancer comes up and is ready for operation it adds a “loadbalancer:lb=www” tag to say “I’m a load balancer for the www vhost”. When an app server starts up, it requests all servers in the deployment with a “loadbalancer:lb=www” tag to run a Chef recipe that adds the app server to the load balancer rotation. This way, the app server doesn’t need to know which or how many load balancers there are. The tag matching, communication, and running of the Chef recipe are all done by the RightLink agents.

To let new load balancers come up when app servers are already running, we can do the same tag lookup in reverse: app servers announce “loadbalancer:app=www” to say “I’m an app server serving vhost www”, and load balancers on start-up can add all app servers to their config by querying for all servers with that tag. For overall resiliency it’s a good idea for load balancers to periodically re-query the set of app servers and to update their config accordingly. This catches race conditions as well as issues where portions of the app servers may be temporarily invisible due to network partitions. The theme here is “eventual consistency” and we’re still evaluating what the best primitives are to support high availability.
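To make the two directions of the lookup concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the idea, not the RightLink agent or any RightScale API: the `TagRegistry` class, the `LoadBalancer` class, and the `start_app_server` helper are all hypothetical names, and a plain dictionary stands in for the tag routing that the agents do across the network.

```python
class TagRegistry:
    """Hypothetical stand-in for the deployment-wide tag lookup."""

    def __init__(self):
        self._servers_by_tag = {}

    def announce(self, server_name, tag):
        # A server adds a tag to itself.
        self._servers_by_tag.setdefault(tag, set()).add(server_name)

    def query(self, tag):
        # Return the names of all servers currently carrying the tag.
        return sorted(self._servers_by_tag.get(tag, set()))


class LoadBalancer:
    def __init__(self, name, vhost, registry):
        self.name = name
        self.vhost = vhost
        self.registry = registry
        self.backends = set()

    def start(self):
        # "I'm a load balancer for this vhost", then pick up existing app servers.
        self.registry.announce(self.name, f"loadbalancer:lb={self.vhost}")
        self.resync()

    def attach(self, app_name):
        # Stands in for the recipe that adds an app server to the rotation.
        self.backends.add(app_name)

    def resync(self):
        # Re-query the app servers; calling this again later catches missed updates.
        self.backends = set(self.registry.query(f"loadbalancer:app={self.vhost}"))


def start_app_server(name, vhost, registry, balancers):
    # "I'm an app server for this vhost", then ask every matching balancer to add us.
    registry.announce(name, f"loadbalancer:app={vhost}")
    for lb_name in registry.query(f"loadbalancer:lb={vhost}"):
        balancers[lb_name].attach(name)


registry = TagRegistry()
lb1 = LoadBalancer("lb-1", "www", registry)
balancers = {lb1.name: lb1}
lb1.start()                                            # balancer first
start_app_server("app-1", "www", registry, balancers)  # app server registers itself
lb2 = LoadBalancer("lb-2", "www", registry)
balancers[lb2.name] = lb2
lb2.start()                                            # late balancer finds app-1 by tag
print(sorted(lb1.backends), sorted(lb2.backends))
```

Neither side needs a list of its peers: the app server finds balancers by tag, and a balancer that starts late finds the app servers the same way.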

You may wonder why the examples above use such long tags and that’s really where machine tags come in. The “loadbalancer:” prefix helps isolate the tags to coordinate the load balancer registration from other tags. Think of “loadbalancer” as being the name of the application or feature that uses these tags, e.g. the load balancer registration. The “lb=www” and “app=www” tag predicate and value can be used to support multiple vhosts. So a load balancer could announce “loadbalancer:lb=www” and “loadbalancer:lb=api” to indicate that it’s load balancing the www and api vhosts. And an api app server then would only query for the “lb=api” tag and it would only announce the “app=api” counterpart.
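The structure of these tags is easy to see in code. The following Python sketch splits a tag of the form namespace:predicate=value into its three parts and picks out the vhosts a load balancer has announced. The `MachineTag` type and both function names are my own illustrative choices, not part of any RightScale library, and the parsing rules are a simplification.

```python
from typing import NamedTuple


class MachineTag(NamedTuple):
    namespace: str   # the application or feature, e.g. "loadbalancer"
    predicate: str   # e.g. "lb" or "app"
    value: str       # e.g. the vhost name "www"


def parse_machine_tag(tag: str) -> MachineTag:
    """Split 'namespace:predicate=value' into its parts (simplified rules)."""
    namespace, colon, rest = tag.partition(":")
    predicate, equals, value = rest.partition("=")
    if not (colon and equals and namespace and predicate):
        raise ValueError(f"not a machine tag: {tag!r}")
    return MachineTag(namespace, predicate, value)


def balanced_vhosts(tags):
    """Return the vhosts announced via 'loadbalancer:lb=...' tags."""
    parsed = [parse_machine_tag(t) for t in tags]
    return sorted(t.value for t in parsed
                  if t.namespace == "loadbalancer" and t.predicate == "lb")


print(parse_machine_tag("loadbalancer:lb=www"))
print(balanced_vhosts(["loadbalancer:lb=www", "loadbalancer:lb=api"]))
```

Because the namespace is part of every tag, another feature can use its own "lb" or "app" predicates without colliding with the load balancer registration.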

While all this is happening amongst the servers, the RightScale UI provides access to all the tags, so one can see the servers announce the various tags and one can even intervene and manually modify these tags. We might provide a “don’t touch” notion for some tags, but right now it’s much more important to us to be able to expose all this machinery. As an ops guy there are few things I loathe more than hidden automation that I can’t inspect and override when I need to.

Of course there’s more in the new release than just these two features: more support for RackSpace (monitoring in particular), improved support for Chef, support for new AWS features, and more.


Amazon launches Relational Database Service and larger server sizes

Today is another big AWS launch day with two important new features available for EC2: a Relational Database Service (RDS) and larger servers. Plus a 15% price reduction on compute cycles: yay!

Relational Database Service

With the Relational Database Service AWS fulfills a long-standing request from a large number of its users, namely to provide a full relational database as a service. What Amazon is introducing today is slightly different from what most people might have expected: it’s really MySQL 5.1 as a service. The RDS product page has the low-down on how it works, but the short version is that with an API call you can launch a MySQL 5.1 server that is fully managed by AWS. You can’t SSH into the server; instead, you point your MySQL client library (or command line tool) at the database across the network. Almost anything you can do via the MySQL network protocol you can do against your RDS instance. It’s pretty simple, and the bottom line is that businesses that don’t want to manage MySQL themselves can outsource much of that to AWS. For background on RDS I’d also recommend reading Jeff Barr’s write-up and Werner’s blog, which recaps the data storage options on AWS.

What AWS does is keep your RDS instance backed up and running, plus give you automation to up-size (and down-size). You can create snapshot backups on-demand from which you can launch other RDS instances and AWS automatically performs a nightly backup and keeps transaction logs that allow you to do a point-in-time restore.

The way I think of an RDS instance is as a virtual appliance or a special-purpose server. You really get an EC2 instance with an EBS volume running a specific version of MySQL plus automation for backups and resizing the storage volume. The API is designed such that additional versions of MySQL and other databases can easily be added in the future. Just like a regular server, each RDS instance lives within an availability zone and access is controlled through a security group (plus the MySQL authentication). I haven’t had the opportunity to run some performance tests, but I would surmise that it’s not too different from DIY running MySQL on a regular instance.

One of the current shortcomings of RDS is the lack of replication. This means you’re dependent on one server and it’s impossible to add slave MySQL servers to an RDS instance in order to increase read performance. It’s also impossible to use MySQL replication to replicate from a MySQL server located in your datacenter to an RDS instance. But replication is in the works according to the RDS product page.

In terms of cost RDS is priced at 30% above the same raw EC2 instance (after the Nov 1st price reduction) but the comparison is a little tricky because some backup storage is included as well. Of course I quickly compared to the cost of RightScale: if you run three XL RDS instances the extra cost is already more than a RightScale subscription which (just on the DB end) gives you replication, read-scaling, full control, plus real live support. Interesting to see how the per-hour price surcharge compares with a more flat-fee subscription to a broad management service.

But our core conviction is that we want to offer our customers the broadest choice possible and we’ll support RDS instances in the RightScale dashboard within a day or two when we complete our next release!

Larger Instance Sizes

EC2 now sports larger and faster servers: XXL and XXXXL sizes, properly called m2.2xlarge and m2.4xlarge. These new server sizes are particularly important for large database users and we’ve been awaiting them ourselves. We haven’t had an opportunity to play with them yet but we’ll update our MySQL ServerTemplates as soon as we have a chance. The fact that the new instance size names start with m2 reflects that the speed of each core is significantly higher than that of the m1 series. With the prices being less than 2x and 4x that of a current m1.xlarge instance there’s no reason not to keep scaling up in machine size!

Cloud Computing Keeps Getting Better

Amazon shows it again and again: listen to your customers, implement new features accordingly, and iterate. Tonight’s release adds important new capabilities to the AWS cloud offering and we’re sure many of our customers will rapidly adopt them. I remain a little reserved about the database service because it does not currently support replication, which I wouldn’t want to live without, but Amazon is definitely on the right track.

The 2XL and 4XL servers will be gobbled up real fast by many of our larger customers. We’ve seen a trend towards more and larger servers over the past year and I’m sure that will continue. By the way, how fast can you launch 10 68GB servers in your datacenter? ;-)

RightScale User Meetup

In case you’ve missed it: we’re hosting a RightScale User Meetup next Monday (11/2) in Santa Clara collocated with the Cloud Computing Conference & Expo and we’d love to see you there! We’ll be discussing trends in cloud computing that we see in our user base, our current and future product roadmap, and some “from the trenches” stories from several RightScale customers. It’s easy to register, and free. If you know anyone who might be interested send the link along. Hope to see you there!


Amazon Usage Estimates

Two weeks ago Guy Rosen posted a very interesting analysis of the EC2 instance IDs which reveals how many instances (virtual machines) have been launched on EC2 since its beginning in 2006. We’ve also been digging in our records and I can share some interesting findings.

First of all, Guy’s analysis contains one significant error which is due to the limited data set he had access to. Before May 2009 EC2 issued even and odd instance IDs, not just even ones as he mentions. Since that date EC2 issued only even IDs until it switched to only odd ones in early September. The even/odd switches don’t seem to correlate with ID boundaries. Perhaps Amazon switches between two active/standby reservation systems, or something else is going on.

The formula to convert an EC2 ID into a sequential launch number, as far as we can tell, is:

Given an aws id as i-11223333
Assign p1 the 1's, p2 the 2's and p3 the 3's
Also assign p31 the first two 3's and p32 the last two 3's
Compute:
  c1 = (p1 ^ p32) ^ 0x69
  c2 = (p2 ^ p31) ^ 0xe5
  c3 =  p3 ^ 0x4000
And finally concatenate c1-c2-c3. (This does not include the even/odd adjustments)
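
For anyone who wants to try this out, here is the same formula as a short Python sketch. The function name `launch_number` is just an illustrative choice, and I’m assuming that “concatenate c1-c2-c3” means packing the one-byte c1, the one-byte c2, and the two-byte c3 into a single 32-bit number with c1 as the most significant byte. Like the formula above, it leaves out the even/odd adjustments.

```python
def launch_number(instance_id: str) -> int:
    """Convert an ID like 'i-11223333' into its apparent sequence number."""
    if not instance_id.startswith("i-") or len(instance_id) != 10:
        raise ValueError(f"unexpected instance ID format: {instance_id!r}")
    hex_part = instance_id[2:]

    p1 = int(hex_part[0:2], 16)   # the "1's": first byte
    p2 = int(hex_part[2:4], 16)   # the "2's": second byte
    p3 = int(hex_part[4:8], 16)   # the "3's": last two bytes
    p31 = p3 >> 8                 # first two 3's
    p32 = p3 & 0xFF               # last two 3's

    c1 = (p1 ^ p32) ^ 0x69
    c2 = (p2 ^ p31) ^ 0xE5
    c3 = p3 ^ 0x4000

    # Assumed byte order: c1 is the high byte, c3 the low 16 bits.
    return (c1 << 24) | (c2 << 16) | c3


print(hex(launch_number("i-12345678")))  # prints 0x3871678
```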

The upshot of Guy’s error is that he underestimates the launches by almost 2x! Here is a graph showing the instances launched daily since late 2006 that we would postulate based on his formula for instance IDs and what we’ve observed. We compute a total of 15.5 million instances (!) launched to date:

[Graph: EC2 instances launched per day since late 2006]

You can see that EC2 has been growing very steadily, except for dips during the holidays and a spike in activity in April 2008. That spike was due to Animoto’s scaling to several thousand servers within a few days. We’re a little puzzled about this spike, however, because the instance ID analysis shows about 2x more servers launched than Animoto actually launched (we launched them so we know). We believe this discrepancy to be temporary, but there remain some mysteries in the instance ID allocation…

It’s also important to be clear about what an instance launch means: namely, the launch of a virtual server. It says nothing about what size server is launched (and therefore its cost per hour) or how long that server runs (and therefore how many servers are running concurrently). As a result, an “instance launch” might mean as little as 10 cents in EC2 revenue (1 small instance for 1 hr) or, for example, $7008 in EC2 revenue (1 XL instance run for 365 days), or even more. That’s quite a difference, and makes it challenging to calculate revenues based solely on total instance launch statistics.
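The arithmetic behind those two numbers can be checked with a few lines of Python. The 10 cents per hour for a small instance comes straight from the example above; the 80 cents per hour for an XL instance is not stated in the text but is the rate implied by $7008 over 365 days, so treat it as an assumption. Amounts are kept in whole cents to avoid floating-point rounding.

```python
HOURS_PER_DAY = 24

SMALL_CENTS_PER_HOUR = 10   # from the "10 cents" example
XL_CENTS_PER_HOUR = 80      # assumption: the rate implied by $7008 per year


def revenue_cents(cents_per_hour: int, hours: int) -> int:
    """Revenue from one instance launch, in cents."""
    return cents_per_hour * hours


short_lived = revenue_cents(SMALL_CENTS_PER_HOUR, 1)
year_long = revenue_cents(XL_CENTS_PER_HOUR, 365 * HOURS_PER_DAY)

print(f"1 small instance for 1 hour: ${short_lived / 100:.2f}")
print(f"1 XL instance for 365 days:  ${year_long / 100:.2f}")
print(f"ratio: {year_long // short_lived}x")
```

Both cases count as exactly one launch in the instance ID statistics, yet the revenue differs by a factor of about 70,000.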

Another interesting fact that we have observed is that during 2009 many of the larger EC2 customers have been migrating to the larger instance sizes. In earlier days the predominant method of scaling was by launching more servers, but we are now seeing a lot more scaling by replacing smaller servers with larger ones. Those XL servers are going like hotcakes! In addition we see a clear rule where the larger the server the longer it runs. A lot of the small servers go as quickly as they came: they’re used for experimentation, development, and testing. Once you launch a large server and fill it up with data, chances are you’ll keep it running for a while. Hold onto your wallet!

Another interesting trend we’ve seen is the improvement in sysadmin-to-server ratio. Our customers who grok the RightScale platform become very effective at managing lots of servers with few people: hundreds to thousands of servers per sysadmin. As a result they use servers aggressively to solve business needs, whether to keep up with exponential traffic or simply to gain flexibility during dev & test.

Overall, in terms of all cloud spending, in the last 12 months we’ve observed:

  • Cloud infrastructure spending grew 380% – i.e. $$ spent on cloud provider resources
  • Average cloud costs per customer grew 140% – i.e. cloud users on average are spending 2.5X more than a year ago
  • RightScale’s own cloud infrastructure consumption grew 440%

That’s phenomenal growth – and testimony to the value of managed cloud computing.

Meanwhile, the beat goes on, and we’re all consuming more and more cloud resources as each day passes. If you have a story about your own cloud usage, or trends and patterns you’re seeing in cloud usage in general, please post a comment or send it in.
