RightScale ServerTemplate Library and Machine Tags

Yesterday’s release of the RightScale platform introduced two new features that I’m really excited about: the ServerTemplate Library and the use of Machine Tags on servers. (Ooops, I shouldn’t forget the new features for RackSpace, but I’ll talk about those next week.)

We’ve had rather sophisticated sharing of ServerTemplates in RightScale for over a year now allowing certain users to share ServerTemplates, RightScripts and other design artifacts with other RightScale users. This enables us to publish free ServerTemplates to all our users, premium ones to our customers and it also lets ISVs on our platform publish ServerTemplates for free or for pay to their users and customers. In addition, each of the design artifacts is versioned such that users who have launched servers with a ServerTemplate last year can still launch new servers with exactly the same version of that ServerTemplate.

A result of all this publishing, sharing and versioning is that there’s a lot to choose from. So much that drop-down menus have become really unwieldy and this is where the new library comes into play. In the past, when adding a server to a deployment one had to find the correct ServerTemplate from the list of all available templates in the RightScale system. Now this has become a two-step process where you first import the ServerTemplates of interest from the library into your account and then only the imported templates are shown in all the drop-down selection menus. Separating the library import/export step will also allow us to significantly upgrade the experience browsing all the design artifacts in the library over the coming releases, stay tuned…

We introduced Flickr style machine tags recently and we’re expanding their use with this release. One of the really exciting new features is that servers now have tags and we’ve integrated the tags with the routing of messages between servers, with Chef (via the RightLink agents) and with the UI. All this is still in alpha but it’s starting to take shape. Our first real use-case is the registration of application servers with load balancers. The way it works is that when a load balancer comes up and is ready for operation it adds a “loadbalancer:lb=www” tag to say “I’m a load balancer for the www vhost”. When an app server starts up, it requests all servers in the deployment with a “loadbalancer:lb=www” tag to run a Chef recipe that adds the app server to the load balancer rotation. This way, the app server doesn’t need to know which or how many load balancers there are. The tag matching, communication, and running of the Chef recipe are all done by the RightLink agents.

In order to let new load balancers come up when app servers are already running we can do the same tag-location in reverse: app servers announce “loadbalancer:app=www” to say “I’m an app server serving vhost www” and load balancers on start-up can add all app servers to their config by querying for all servers with that tag. For overall resiliency it’s a good idea for load balancers to re-query the set of app servers and to update their config accordingly. This catches race conditions as well as issues where portions of the app servers may be temporarily invisible due to network partitions. The theme here is “eventual consistency” and we’re still evaluating what the best primitives are to support high availability.
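The periodic re-query boils down to reconciling two sets: what the load balancer currently has configured and what the latest tag query returned. Here is a minimal Python sketch of that idea. The `reconcile` helper and the server names are made up for illustration and this is not RightLink code.

```python
def reconcile(configured, discovered):
    """Compare the configured app servers with the latest tag query result.

    Returns (to_add, to_remove): servers missing from the config and
    servers in the config that the query no longer reports.
    """
    configured = set(configured)
    discovered = set(discovered)
    to_add = discovered - configured
    to_remove = configured - discovered
    return to_add, to_remove


# Hypothetical state: the config knows app-1 and app-2, the query sees app-2 and app-3.
to_add, to_remove = reconcile({"app-1", "app-2"}, {"app-2", "app-3"})
print(sorted(to_add), sorted(to_remove))
```

Whether a server that drops out of a single query should be removed right away is a policy choice. With network partitions in play, a load balancer may prefer to wait for a few consecutive misses first.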

You may wonder why the examples above use such long tags and that’s really where machine tags come in. The “loadbalancer:” prefix helps isolate the tags to coordinate the load balancer registration from other tags. Think of “loadbalancer” as being the name of the application or feature that uses these tags, e.g. the load balancer registration. The “lb=www” and “app=www” tag predicate and value can be used to support multiple vhosts. So a load balancer could announce “loadbalancer:lb=www” and “loadbalancer:lb=api” to indicate that it’s load balancing the www and api vhosts. And an api app server then would only query for the “lb=api” tag and it would only announce the “app=api” counterpart.
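To make the announce-and-query pattern concrete, here is a toy in-memory sketch in Python. The `TagDirectory` class and the server names are assumptions for illustration only. In the real system the matching and messaging are done by the RightLink agents, whose API is not shown here.

```python
from collections import defaultdict


class TagDirectory:
    """Toy stand-in for the deployment-wide tag lookup (hypothetical)."""

    def __init__(self):
        self._tags = defaultdict(set)  # server name -> set of tags

    def announce(self, server, tag):
        """Record that a server carries the given machine tag."""
        self._tags[server].add(tag)

    def query(self, tag):
        """Return the sorted names of all servers carrying exactly this tag."""
        return sorted(s for s, tags in self._tags.items() if tag in tags)


directory = TagDirectory()
# One load balancer fronts both vhosts, another only www.
directory.announce("lb-1", "loadbalancer:lb=www")
directory.announce("lb-1", "loadbalancer:lb=api")
directory.announce("lb-2", "loadbalancer:lb=www")
# An api app server announces its counterpart tag...
directory.announce("app-7", "loadbalancer:app=api")
# ...and asks only for the load balancers of its own vhost.
print(directory.query("loadbalancer:lb=api"))
```

The app server never needs a list of load balancers: it only needs to know the tag for its vhost.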

While all this is happening amongst the servers, the RightScale UI provides access to all the tags, so one can see the servers announce the various tags and one can even intervene and manually modify these tags. We might provide a “don’t touch” notion for some tags, but right now it’s much more important to us to be able to expose all this machinery. As an ops guy there are few things I loathe more than hidden automation that I can’t inspect and override when I need to.

Of course there’s more in the new release than just these two features: more support for RackSpace (monitoring in particular), improved support for Chef, support for new AWS features, and more.

Amazon launches Relational Database Service and larger server sizes

Today is another big AWS launch day with two important new features available for EC2: a Relational Database Service (RDS) and larger servers. Plus a 15% price reduction on compute cycles: yay!

Relational Database Service

With the Relational Database Service AWS fulfills a long standing request from a large number of its users, namely to provide a full relational database as a service. What Amazon is introducing today is slightly different from what most people might have expected: it’s really MySQL 5.1 as a service. The RDS product page has the low-down on how it works, but the short version is that with an API call you can launch a MySQL 5.1 server that is fully managed by AWS. You can’t SSH into the server; instead you point your MySQL client library (or command line tool) at the database across the network. Almost anything you can do via the MySQL network protocol you can do against your RDS instance. Pretty simple, and the bottom line is that businesses that don’t want to manage MySQL themselves can outsource much of that to AWS. For background on RDS I’d also recommend reading Jeff Barr’s write-up and Werner’s blog, which recaps the data storage options on AWS.

What AWS does is keep your RDS instance backed up and running, plus give you automation to up-size (and down-size). You can create snapshot backups on-demand from which you can launch other RDS instances and AWS automatically performs a nightly backup and keeps transaction logs that allow you to do a point-in-time restore.

The way I think of an RDS instance is as a virtual appliance or a special-purpose server. You really get an EC2 instance with an EBS volume running a specific version of MySQL plus automation for backups and resizing the storage volume. The API is designed such that additional versions of MySQL and other databases can easily be added in the future. Just like a regular server, each RDS instance lives within an availability zone and access is controlled through a security group (plus the MySQL authentication). I haven’t had the opportunity to run some performance tests, but I would surmise that it’s not too different from DIY running MySQL on a regular instance.

One of the current shortcomings of RDS is the lack of replication. This means you’re dependent on one server and it’s impossible to add slave MySQL servers to an RDS instance in order to increase read performance. It’s also impossible to use MySQL replication to replicate from a MySQL server located in your datacenter to an RDS instance. But replication is in the works according to the RDS product page.

In terms of cost RDS is priced at 30% above the same raw EC2 instance (after the Nov 1st price reduction) but the comparison is a little tricky because some backup storage is included as well. Of course I quickly compared to the cost of RightScale: if you run three XL RDS instances the extra cost is already more than a RightScale subscription which (just on the DB end) gives you replication, read-scaling, full control, plus real live support. Interesting to see how the per-hour price surcharge compares with a more flat-fee subscription to a broad management service.

But our core conviction is that we want to offer our customers the broadest choice possible and we’ll support RDS instances in the RightScale dashboard within a day or two when we complete our next release!

Larger Instance Sizes

EC2 now sports larger and faster servers: XXL and XXXXL sizes, properly called m2.2xlarge and m2.4xlarge. These new server sizes are particularly important for large database users and we’ve been awaiting them ourselves. We haven’t had an opportunity to play with them yet but we’ll update our MySQL ServerTemplates as soon as we have a chance. The fact that the new instance size names start with m2 reflects that the speed of each core is significantly higher than that of the m1 series. With the prices being less than 2x and 4x that of a current m1.xlarge instance there’s no reason not to keep scaling up in machine size!

Cloud Computing Keeps Getting Better

Amazon shows it again and again: listen to your customers, implement new features accordingly, and iterate. Tonight’s release adds important new capabilities to the AWS cloud offering and we’re sure many of our customers will rapidly adopt them. I remain a little reserved about the database service because it does not currently support replication, which I wouldn’t want to live without, but Amazon is definitely on the right track.

The 2XL and 4XL servers will be gobbled up real fast by many of our larger customers. We’ve seen a trend towards more and larger servers over the past year and I’m sure that will continue. By the way, how fast can you launch 10 68GB servers in your datacenter? ;-)

RightScale User Meetup

In case you’ve missed it: we’re hosting a RightScale User Meetup next Monday (11/2) in Santa Clara collocated with the Cloud Computing Conference & Expo and we’d love to see you there! We’ll be discussing trends in cloud computing that we see in our user base, our current and future product roadmap, and some “from the trenches” stories from several RightScale customers. It’s easy to register, and free. If you know anyone who might be interested send the link along. Hope to see you there!

Amazon Usage Estimates

Two weeks ago Guy Rosen posted a very interesting analysis of the EC2 instance IDs which reveals how many instances (virtual machines) have been launched on EC2 since its beginning in 2006. We’ve also been digging in our records and I can share some interesting findings.

First of all, Guy’s analysis contains one significant error which is due to the limited data set he had access to. Before May 2009 EC2 issued even and odd instance IDs, not just even ones as he mentions. Since that date EC2 issued only even IDs until it switched to only odd ones in early September. The even/odd switches don’t seem to correlate with ID boundaries, perhaps Amazon switches between two active/standby reservation systems or something else is going on.

The formula to convert an EC2 ID into a sequential launch number as far as we can tell is:

Given an aws id as i-11223333
Assign p1 the 1's, p2 the 2's and p3 the 3's
Also assign p31 the first two 3's and p32 the last two 3's
Compute:
  c1 = (p1 ^ p32) ^ 0x69
  c2 = (p2 ^ p31) ^ 0xe5
  c3 =  p3 ^ 0x4000
And finally concatenate c1-c2-c3. (This does not include the even/odd adjustments)
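
For those who want to play with the numbers, here is the same formula as a small Python sketch. It assumes that p1, p2 and p3 are the first byte, second byte and last two bytes of the hex ID, and that “concatenate” means packing c1, c2 and c3 back into one 32-bit number. The function name is made up, and like the formula above it ignores the even/odd adjustments.

```python
def ec2_launch_number(instance_id):
    """Apply the c1/c2/c3 formula to an ID such as 'i-11223333'."""
    hex_digits = instance_id[2:] if instance_id.startswith("i-") else instance_id
    if len(hex_digits) != 8:
        raise ValueError("expected 8 hex digits, got %r" % instance_id)
    raw = int(hex_digits, 16)

    p1 = (raw >> 24) & 0xFF   # the 1's
    p2 = (raw >> 16) & 0xFF   # the 2's
    p3 = raw & 0xFFFF         # the 3's
    p31 = (p3 >> 8) & 0xFF    # first two 3's
    p32 = p3 & 0xFF           # last two 3's

    c1 = (p1 ^ p32) ^ 0x69
    c2 = (p2 ^ p31) ^ 0xE5
    c3 = p3 ^ 0x4000

    # Concatenate c1-c2-c3 into a single number (assumed interpretation).
    return (c1 << 24) | (c2 << 16) | c3


print(hex(ec2_launch_number("i-11223333")))  # 0x4bf47333
```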

The upshot of Guy’s error is that he underestimates the launches by almost 2x! Here is a graph showing the instances launched daily since late 2006 that we would postulate based on his formula for instance IDs and what we’ve observed. We compute a total of 15.5 million instances (!) launched to date:

[Graph: EC2 instances launched daily since late 2006]

You can see that EC2 has been growing very steadily, except for dips during the holidays and a spike in activity in April 2008. That spike was due to Animoto’s scaling to several thousand servers within a few days. We’re a little puzzled about this spike, however, because the instance ID analysis shows about 2x more servers launched than Animoto actually launched (we launched them so we know). We believe this discrepancy to be temporary, but there remain some mysteries in the instance ID allocation…

It’s also important to be clear about what an instance launch means: namely, the launch of a virtual server. It says nothing about what size server is launched (and therefore its cost per hour) or how long that server runs (and therefore how many servers are running concurrently). As a result, an “instance launch” might mean as little as 10 cents in EC2 revenue (1 small instance for 1 hr) or, for example, $7008 in EC2 revenue (1 XL instance run for 365 days), or even more. That’s quite a difference, and makes it challenging to calculate revenues based solely on total instance launch statistics.
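
The arithmetic behind those two extremes fits in a few lines of Python. The 10 cents per hour for a small instance comes from the example above. The 80 cents per hour for an XL instance is not stated above: it is inferred from the $7008 figure, so treat it as an assumption.

```python
def launch_revenue_cents(cents_per_hour, hours):
    """Revenue of a single instance launch, in cents (integers avoid float rounding)."""
    return cents_per_hour * hours


small_one_hour = launch_revenue_cents(10, 1)         # 1 small instance for 1 hr
xl_one_year = launch_revenue_cents(80, 24 * 365)     # 1 XL instance for 365 days (rate inferred)

print(small_one_hour / 100, xl_one_year / 100)
print(xl_one_year // small_one_hour)  # how many times larger the second launch is
```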

Another interesting fact that we have observed is that during 2009 many of the larger EC2 customers have been migrating to the larger instance sizes. In earlier days the predominant method of scaling was by launching more servers, but we are now seeing a lot more scaling by replacing smaller servers with larger ones. Those XL servers are going like hotcakes! In addition we see a clear rule where the larger the server the longer it runs. A lot of the small servers go as quickly as they came: they’re used for experimentation, development, and testing. Once you launch a large server and fill it up with data chances are you’ll keep it running for a while. Hold onto your wallet!

Another interesting trend we’ve seen is the improvement in sysadmin-to-server ratio. Our customers who grok the RightScale platform become very effective at managing lots of servers with few people. Hundreds to thousands per sysadmin. As a result they use servers aggressively to solve business needs — whether to keep up with exponential traffic or simply flexibility during dev & test.

Overall, in terms of all cloud spending, in the last 12 months we’ve observed:

  • Cloud infrastructure spending grew 380% – i.e. $$ spent on cloud provider resources
  • Average cloud costs per customer grew 140% – i.e. cloud users on average are spending about 2.4X what they spent a year ago
  • RightScale’s own cloud infrastructure consumption grew 440%

That’s phenomenal growth – and testimony to the value of managed cloud computing.

Meanwhile, the beat goes on, and we’re all consuming more and more cloud resources as each day passes. If you have a story about your own cloud usage, or trends and patterns you’re seeing in cloud usage in general, please post a comment or send it in.

RightScale Release: RackSpace, RightLink, Chef, Machine Tags, VPC, and more!

Yesterday’s release included a number of features that I’ve been itching to get into RightScale for a long time. This stuff is fresh off the press in alpha-release form so we’re hoping for your feedback so we can evolve it to suit your needs. Here are the highlights and some background on where we’re headed.

First off we’re adding RackSpace CloudServers to the set of clouds in RightScale and it’s available to everyone as of today! All you need to do is to get a CloudServers account and enter your credentials into RightScale. Please refer to our tutorial for the details. What we’re releasing today is full support for our ServerTemplate machinery which is the foundation for building cloud portable systems. The ServerTemplates are built using our new RightLink agent and support Chef cookbooks as well as our standard RightScripts (see below for more info on this). While we don’t have a RightImage available for RackSpace quite yet it turns out that we’ve implemented enough magic to make the “Ubuntu 8.10 (intrepid)” image provided by RackSpace work as if it were a RightImage.

Some of the features we’re missing for RackSpace are a full set of the core RightScale production ServerTemplates and the support for monitoring, alerts and automation, such as auto-scaling arrays. We’re working hard to release all this as soon as we can and that’s one reason the current RackSpace support is still labeled alpha.

The second major new feature is the RightLink agent which supports not only RightScripts but also Opscode’s Chef cookbooks. The RightLink agent connects each server with the RightScale core as well as other servers around it. Boot scripts and operational scripts are launched via RightLink and we’ll fully support direct server-to-server communication in an upcoming release. RightLink uses Nanite for the communication, includes the Chef-client for running cookbook recipes, and can run RightScripts as well. We’ll be enhancing the whole communication infrastructure so servers can communicate with each other efficiently but in a secure and controlled manner, for example to enable application servers to register with load balancers and to locate the currently active database master.

I’m also very excited that we are now supporting the Chef server configuration system. When I started RightScale almost three years ago I wanted to include something like Chef but couldn’t get myself to pick among the available options. When I dug into Chef earlier this year and started talking to Jesse and Adam at Opscode it became clear to me that this is the right technology for configuring servers in the cloud. Chef cookbooks are the next level beyond OS distributions like RedHat or Ubuntu: a cookbook leverages the distro for getting the right bits onto the machine and then layers the operational know-how on top: how to configure everything and perform operational tasks. RightScale’s ServerTemplates then combine all the cookbooks needed on a server into a portable package and add the coordination between servers. After all, no server operates alone in the cloud…

A nice side-effect of using Chef is that we’ve been able to fully embrace git for developing cookbooks (svn is also supported). We publish our cookbooks on github where you can fork and change what we offer to suit your needs. The RightScale web site pulls metadata information about each cookbook directly from github or any accessible git repo and servers also get everything directly from git. This means that all of git’s (or svn’s) software development goodness (branching, merging, tracking, etc) is now fully integrated with RightScale ServerTemplates!

We still have to put together a getting started tutorial for Chef but we have published a sample ServerTemplate called “Rails all-in-one (EC2 Chef Alpha)”. It launches and comes up running the Rails Mephisto blogging app. You’ll notice that it’s a bit on the slow side to boot — we have a number of things to optimize — but it does pull from the public Opscode and RightScale cookbooks on github. Look into the Server Template under the Repos tab and you’ll see the definitions for the repositories.

But there’s more! We’ve started to add Flickr style machine tags to RightScale resources. A machine tag is a tag that follows a special triplet syntax of namespace:predicate=value and the purpose of machine tags is to allow anyone or any external application to attach metadata to RightScale resources. Right now tags are only available for Servers, Images, and EBS Snapshots. Rather than start attaching tags everywhere we preferred to start using tags ourselves for something concrete so we can ensure we have a good feature set. We’re using tags now for snapshots to control the rotation of backup snapshots and to organize snapshots of multi-volume stripes. We’ll soon use tags to encode the features provided by images, e.g. whether they’re RightImages, support RightLink, support the freezing of repositories, etc. But most importantly we’ll add API access to tags so you can attach your automation to tags. We’d love to hear from you what exactly we need to provide. But in the meantime you can at least add tags to servers and use that in the UI to filter the list of servers you see.
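
If you want to experiment with the namespace:predicate=value triplet in your own scripts, a parser can be sketched in a few lines of Python. The character rules in the regular expression below are my assumption (no colons, equals signs or whitespace in the namespace and predicate), not a statement of what RightScale or Flickr actually accept, and the tag used is just an illustrative one.

```python
import re
from typing import NamedTuple


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


# Assumed syntax: namespace and predicate contain no ':', '=' or whitespace.
_TAG_RE = re.compile(r"^([^:=\s]+):([^:=\s]+)=(.*)$")


def parse_machine_tag(tag):
    """Split 'namespace:predicate=value' into its three parts."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError("not a machine tag: %r" % tag)
    return MachineTag(*match.groups())


tag = parse_machine_tag("loadbalancer:lb=www")
print(tag.namespace, tag.predicate, tag.value)
```

Keeping the three parts separate makes it easy to filter by namespace only, which is what isolates one application’s tags from another’s.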

Amazon has been on a tear lately with few weeks going by without a new feature announcement. The most important news to come along in a long time has been the introduction of Virtual Private Clouds (VPC) and we’re pleased to support them in this release, which means that you can create subnets in your VPC and launch servers into them. We’re also now supporting the purchase of reserved instances straight from the RightScale dashboard, plus we’ll show what you’ve purchased.

Finally we’ve improved the speed of the site across the board, especially for larger accounts with lots and lots of servers. We continue to appreciate feedback on anything that doesn’t work well or that we should enhance: use the feedback link on our site or email feedback@rightscale.com directly.

I hope you’ll enjoy the new features as much as we do — yes, we eat our own dog food and manage RightScale using RightScale!

Internal external private public hybrid virtual cloud

I’d like an external private hybrid cloud, dry, with whole milk, please!

Enterprises rise to the cloud, terminology takes off… As if we didn’t have enough cloud confusion already. But after some thinking it’s not all bad news, some of the terms do make sense. While many of the benefits associated with the cloud are independent of cloud type – internal, external, private, public – the type of cloud does determine regulatory compliance, security and financial benefits. The cloud end-user mostly shouldn’t have to care, but to IT these are important considerations.

Note that I’m exclusively talking about infrastructure clouds (IaaS) here, like Amazon EC2, so all this is orthogonal to the SaaS vs. Platform cloud (PaaS) vs. Infrastructure Cloud (IaaS) terminology axis.

Many of the benefits of the cloud to central IT are independent of the exact nature of the cloud:

  • Automation increases reliability and system administrators’ efficiency
  • Self provisioning by end users reduces IT menial labor
  • Cost reduction by homogenizing and simplifying the infrastructure

But when we get to regulatory, security and financial benefits internal/external and public/private cloud types come into play. Let me try to define:

  • An internal cloud is located in the enterprise’s datacenter; the enterprise owns the assets, which are capitalized
  • An external cloud is located at a service provider and charges are expensed
  • A private cloud is dedicated to an organization, it’s “single tenant” in that sense (but that’s a tricky nomenclature because a private cloud may be used by many internal tenants within the organization)
  • A public cloud is shared across many organizations that don’t even know about each other

Several combinations of the above make sense and here are some examples:

  • An internal private cloud could be a Eucalyptus or (future) vCloud implementation in the datacenter of a large enterprise
  • An external private cloud could be a service provider, like perhaps IBM dedicating a number of racks in their facilities for a cloud they operate on an enterprise’s behalf
  • An external public cloud is what the cloud started as with Amazon EC2 and now emulated by others like RackSpace
  • An internal public cloud doesn’t make much sense to me, but I’m sure we’ll see some, perhaps it can make sense for renting out unused capacity, who knows…

This nomenclature turns out to be useful in teasing out the benefits of these various types of clouds. For public vs. private clouds the two main distinguishing factors are isolation and elasticity. In a private cloud it is easier to draw a hard boundary around the servers, the storage, and the network used by an organization’s cloud resources. This may have advantages from a security compliance and audit point of view. On the flip side, public clouds will tend to have more elasticity than private clouds because of the increased scale and ability to balance across more disparate types of uses. The elasticity is a very important cloud characteristic because it underlies a number of the end-user benefits.

Amazon’s Virtual Private Cloud (VPC) is an interesting in-between of the strict public and private definitions. The VPC provides increased isolation between a VPC’s resources and those of other users, but Amazon isn’t very clear on the exact nature of this increased isolation. At the same time the VPC does not compromise elasticity and cost-effectiveness, which is very important. Werner Vogels argues that without the elasticity it’s not a cloud.

The three main distinguishing benefits of internal vs. external clouds are about control, the nature of the costs and cloud locations. By outsourcing the cloud infrastructure to a service provider the typical cap-ex costs of computing infrastructure can be turned into variable costs that scale relative to the actual use of resources. As more and more service providers offer clouds across the globe it is also increasingly easy to place compute resources where they are needed, whether for latency reasons or for regulatory purposes. Internal clouds are bound to where the enterprise has or can summon physical resources.

That leaves the word ‘hybrid’. At RightScale we’ve been using it to denote hybrid cloud uses where an organization makes use of different types of clouds, which is something we believe will be very common. Given the large application portfolios in many enterprises some will undoubtedly be good candidates for credit-card based self-provisioning in external public clouds while others will remain under close scrutiny of IT in internal private clouds for a long time. This type of hybrid use is where the RightScale service is very effective at providing a seamless experience across the many clouds.

While all the concerns around the internal / external / private / public nature of a cloud are interesting, it is important not to lose track of the fact that a cloud is a means, not an end. The most important thing is to deliver the benefits of the cloud to its end users, those who will launch servers in the cloud and use the cloud on a daily basis. In the enterprise space this includes many constituencies across the organization outside of central IT thanks to the fact that the cloud moves the provisioning closer to the end user. Developers can launch dev servers in the cloud when they need them and shut them down again when they’re done. Test engineers can launch whole clusters for test runs and they go away automatically at the end of the run. Operations engineers can set up staging systems for short periods to engineer the roll out of the next release. Marketing support engineers can launch demo systems for events or important prospects, and in general the various business units are in more direct control of their compute resources. All these users are outside of central IT.

The cloud end user benefits I see in the enterprise settings:

  • Self-provisioning by end users so they can decide when, what, and how much.
  • Increased flexibility and reduced planning thanks to the on-demand nature of the cloud
  • Reduced costs thanks to fewer idle servers and economies of scale and commoditization
  • Increased operational efficiency thanks to more automation from management platforms like RightScale

It’s important to note that none of the end-user benefits are directly related to whether it’s a private, public, internal, or external cloud. End users should care about the elasticity and on-demand nature of the cloud as well as the automation offered by cloud management services like RightScale.

Well, while writing this rather long blog entry the different terms have actually started to grow on me. They do make sense in the right context. But what I am left with is the worry that everything cloud is becoming yet more complex when one of the fundamental benefits of the cloud is simplicity and standardization. The need to simplify IT was also one of the top messages delivered by VMware CEO Paul Maritz at VMworld this year. We have to continue simplifying and standardizing clouds and cloud application architectures at the same time as the forces of enterprise IT try to pull it all in a thousand different directions.

Amazon launches virtual private clouds

This evening Amazon launched a new service called “VPC”, which stands for Virtual Private Cloud. Read the details on the product page and the AWS blog, plus a nice backgrounder on Werner Vogels’ blog. The short story is that it allows anyone to spin up a private enclave within Amazon’s infrastructure. This allows VPC users to segregate their EC2 instances from “the masses” and get a VPN connection from their own data center to their VPC, which then looks like a part of their internal network. Exciting stuff and we’ll have support for VPCs in RightScale real soon.

When I look back, in 2006 when EC2 first launched it was for lunatics (ok, I plead guilty). In 2007 startups began to really notice and hop onto the bandwagon. Stories of really cool stuff happening in EC2 started to spread. But by and large it was still a somewhat limited environment and a very ‘early adopter’ product. In 2008 we saw more mature companies starting to adopt the cloud and utilize it where it made sense in their operations. Also, the first enterprise customers started to show up to learn about the cloud, try things out, and voice their concerns. Now that we’re well into 2009 the enterprise interest has really picked up, and Amazon’s new offering comes just at the right time. It’s targeted at addressing a number of the practical networking and security considerations that enterprises have to deal with throughout their IT infrastructure.

The best way I’ve found to describe a VPC is a datacenter on a stick: you launch your servers into a balloon within Amazon’s infrastructure and you get a VPN link to tie them all back into your datacenter. Let’s take this step-by-step and see how it works.

  • In your existing EC2 account you create a VPC, that’s the container for all your instances
  • In that VPC, you define one or multiple subnets (e.g. 10.34.1.0/24) chosen so they integrate well into your enterprise-wide internal addressing structure
  • You now set up your IPsec VPN device (preferably a major-brand router) and connect to a VPN endpoint you create within your VPC
  • Finally, you launch your first VPC instance almost the same way as you would launch a public instance, the only difference being that you specify to which of your VPC subnets it should be attached
  • You now have a server in your VPN that, with a small amount of router config, is indistinguishable from any of your other servers in your datacenter, except that you didn’t have to buy it, rack it, or hook it up!
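
Picking subnets that “integrate well” with your internal addressing mostly means checking that a candidate range sits inside the block you set aside for the VPC and doesn’t collide with ranges already in use. Here is a rough Python sketch using the standard `ipaddress` module. The `subnet_fits` helper, the 10.34.0.0/16 VPC block and the datacenter ranges are all hypothetical; only 10.34.1.0/24 comes from the example above.

```python
import ipaddress


def subnet_fits(vpc_block, candidate, ranges_in_use):
    """True if candidate lies inside vpc_block and overlaps none of ranges_in_use."""
    vpc = ipaddress.ip_network(vpc_block)
    new = ipaddress.ip_network(candidate)
    if not new.subnet_of(vpc):
        return False
    return not any(new.overlaps(ipaddress.ip_network(r)) for r in ranges_in_use)


# Hypothetical addressing plan: the datacenters use 10.1/16 and 10.2/16.
datacenter_ranges = ["10.1.0.0/16", "10.2.0.0/16"]
print(subnet_fits("10.34.0.0/16", "10.34.1.0/24", datacenter_ranges))
print(subnet_fits("10.34.0.0/16", "10.2.5.0/24", datacenter_ranges))
```

The first candidate passes, while the second one falls outside the VPC block and would also collide with an existing datacenter range.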

So what is a VPC really? It really is what it says: a virtual private cloud. One key ingredient here is that a VPC is a logical concept, not a physical one, meaning that the boundary around your instances in your VPC is at the network level, there is no separate room with your servers! What that means is that a VPC is truly a cloud with all the attributes we expect: virtually infinite, on-demand resource availability, pay-per-use pricing, etc. You’re not forking out $$$ to have someone build you a finite cloud-like datacenter, that takes months to build, and is charged up-front. I’m sure Amazon got requests to build private physical clouds in some large enterprise datacenters and I’m glad they opted for the virtual cloud variant. The one that really is a cloud.

Something that initially puzzled me is what the benefits of a VPC are when all the marketing fluff dissipates. Here is what I’ve learned. First, instances in the VPC are separated from non-VPC instances at a deeper network level than instances in different security groups or belonging to different users. As is typical, Amazon doesn’t say anything of substance about the nature of this isolation. Let’s see how soon that will have to change to actually attract enterprises… Second, instances in the VPC can seamlessly integrate into a company’s internal network routing. This is significant because it means that tools used to inventory, secure, audit, manage, and access all servers in the IT infrastructure can now be brought to bear on instances in the cloud as well.

What is really nice about the VPC is that everything works (almost) as usual. Launching instances is only slightly different from before in that one additional parameter specifies the subnet to launch the instance into. Most everything else is unchanged. So all the goodness of RightScale will continue to work. Well, actually, there is one fly in the ointment in this initial release that the docs are quiet about: instances in a VPC have no external network connectivity whatsoever. All traffic in/out of the VPC has to go through the VPN, at the far end of which it may be routed to the internet. This includes traffic to other AWS services, such as S3, SQS, and SimpleDB, as well as any general internet traffic. To me this sounds like the #1 limitation to fix, from Amazon’s point of view as well…

Last but not least, the killer feature in my opinion is the price: it’s virtually free! The only extra cost of having a VPC over using standard EC2 instances is the VPN charge, which is 5 cents an hour, a charge that doesn’t even register with most folks who need a VPC. (The charge is per VPN, so in principle it can add up a little if you have 20 datacenters each with a VPN to your VPC, but it’s still peanuts.)
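
To put the “peanuts” claim in numbers, here is a quick back-of-the-envelope calculation. It uses the 5 cents per VPN hour quoted above; the 30-day month and the function name are assumptions made for illustration.

```python
# Back-of-the-envelope VPN cost, using the $0.05 per VPN-hour quoted above.
# The 30-day month is an assumption made for illustration.
VPN_RATE_CENTS_PER_HOUR = 5
HOURS_PER_MONTH = 24 * 30


def monthly_vpn_cost_dollars(num_vpns: int) -> float:
    """Monthly charge in dollars for the given number of VPN connections."""
    return num_vpns * VPN_RATE_CENTS_PER_HOUR * HOURS_PER_MONTH / 100


print(monthly_vpn_cost_dollars(1))   # a single VPN
print(monthly_vpn_cost_dollars(20))  # 20 datacenters, each with its own VPN
```

Under those assumptions a single VPN comes to about $36 a month, and even the 20-datacenter case stays around $720 a month.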

Mark your history books: 2009, the year that the cloud became enterprise ready. I believe this is the most compelling feature/service AWS could have added at this stage of the cloud market from an enterprise point of view. While we’re busy finishing the support for VPCs in the RightScale enterprise edition, don’t hesitate to give us a call to find out more about our early experience program for RightScale VPC management.


More RightScale open source goodness

We’ve always been strong supporters of open source: we use it a lot and we’ve been contributing many things to the community as well. In particular, since early 2007 we’ve been publishing the recipes to our RightImages and we’ve also published the AWS Ruby gems that we use to interface to the various AWS services. We’ve recently taken the next step, which is to develop in public and not just periodically throw the source over the fence. As a result we now have the following public repos:

  • https://github.com/rightscale/right_aws – Ruby AWS library including EC2, S3, SQS, SDB, LBS, and more
  • https://github.com/rightscale/right_flexiscale – Ruby library for FlexiScale cloud API
  • https://github.com/rightscale/right_gogrid – Ruby library for GoGrid cloud API
  • https://github.com/rightscale/right_slicehost – Ruby library for Slicehost cloud API
  • https://github.com/rightscale/right_link – New agent to support Chef
  • https://github.com/rightscale/right_rackspace – Ruby library for Rackspace’s Cloud Servers cloud API (coming soon)

In addition to using the github repos directly ourselves we’re also using the issue tracking built into github for these projects to make it easy for anyone to submit feature requests, bug reports, or, best of all, patches. Writing this, I just noticed that our Rackspace repo is not public, which it really should be since the API is public at this point. We’ll get that fixed asap.

The reason we’re open sourcing more and more of what we do is that we fundamentally believe that RightScale must be a transparent service: our users must be able to understand what is going on with their cloud resources and must be able to control them to whatever degree they wish. In some cases this means that our platform has to provide the hooks and UI to dive down and see all the details of things; in other cases it means that the code we run or ask our users to run must be inspectable and modifiable, and there’s no better way to do this than through open source. You can expect significantly more of our codebase to appear soon in our public repos and if there’s something you believe we should be open-sourcing, please let me know!


Enterprise-class Software becoming available in the Cloud through RightScale

More and more enterprise-class software vendors are making their software available in the cloud and doing it through RightScale. Over the past two weeks the IBM DB2 team made DB2 Express-C v9.7 available, SpringSource published Hyperic HQ, and CohesiveFT published VPN-Cubed, all on the RightScale platform. Publishing software to the cloud is still a somewhat mysterious activity. While almost all software runs in infrastructure cloud environments such as Amazon EC2, publishing to the cloud creates new expectations and opportunities. Over the last year, we’ve been adding features to our platform to help ISVs publish to the cloud and are excited that the DB2 team found it easier to get the next version out using a RightScale ServerTemplate than an Amazon AMI.

I thought it would be helpful to write down how publishing software into the cloud is different from the more traditional software delivery:

Server templates, not software packages: Users expect to point and launch, not download, install, and configure. Of course some software is meant to be embedded or adapted, but in those cases there still is the opportunity to deliver a ready-to-go sample from which the embedding or adaptation can start.

In the cloud, you can launch IBM DB2 and have it running in a couple of minutes. That makes it much easier to get going and then later to start modifying configuration details. I’m sure most users will want to change the ServerTemplate published by IBM, but few will start there. Using the ServerTemplate not only gives you a server with the software already loaded, installed, and configured, it also has all the right software versions, is set up with monitoring and alerts, has logging prepped correctly, plus offers other goodies.

From a vendor’s point of view the great opportunity of the cloud is to control the software environment from A-Z. You don’t need to have a long list of required software packages and compatible versions, you just provide it in a neatly wrapped-up ServerTemplate that automatically installs all the right components.

Free one-click trials: They don’t need to be literally one click, but the cloud offers tremendous opportunities for users to try before they buy. It’s so much easier to try out software if you can just spin-up a server in the cloud, possibly with some live demo data already loaded. It’s even better when the server is running on the vendor’s dime!

From the vendor’s perspective the cost is really close to just the cloud infrastructure cost. We’ve offered $1 of free EC2 time in our trial sign-up for years now. The $1 is good for about 10 hours of a small EC2 instance and really lets people get a first touch onto RightScale. Who wouldn’t pay $1 to get a prospect to try their product?

No lonely servers: Whose software is designed to run on a lonely server these days? What use is, for example, Hyperic HQ on its own? Its purpose is to monitor other servers so you really want to embed it into multi-server deployments. CohesiveFT’s VPN-Cubed product is similarly targeted to making life easier when you have lots of servers to connect back to the main office or datacenter.

Using RightScale simplifies the configuration of multiple machines because configuration inputs can be defined across many machines at once, and it’s much easier for a vendor to also provide scripts that install client plugins or agents on a customer’s other servers.

Pay-per-use: Users have come to expect more flexible billing methods in the cloud, such as pay per use. This is good and bad. The good is that it really is a requirement for enabling the scaling of resources on demand or for supporting flexible usage models. Use cases range from the famous scaling up in response to a traffic surge, to being able to add a database slave server on a whim to test the performance impact of some schema transformations. Pay-per-use really makes the cloud unique and this tells me that, like it or not, pay per use is here to stay. However, other models will likely co-exist.

From a vendor perspective pay-per-use introduces new challenges. On the execution end it suddenly means that vendors need to meter the usage of their customers. I’m distinguishing metering from billing: the former is about measuring the usage and producing the data that can be used to compute per-use charges, the latter is about sending the customer a bill and getting it paid. We’ve been adding metering support to the RightScale platform for ourselves and we’re starting to make the data available to ISVs to feed into their billing.
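
The split between metering and billing can be sketched in a few lines of Python. This is only an illustration of the distinction drawn above: the record type, the function names, the customers, and the rate are all hypothetical, and nothing here reflects how the RightScale platform actually stores or exposes its metering data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One metered observation: a customer ran a server for some hours."""
    customer: str
    server_hours: int


def meter(records):
    """Metering: sum up measured usage per customer. No prices involved."""
    totals = defaultdict(int)
    for record in records:
        totals[record.customer] += record.server_hours
    return dict(totals)


def bill(usage, cents_per_hour):
    """Billing: turn metered usage into per-customer charges in cents."""
    return {customer: hours * cents_per_hour for customer, hours in usage.items()}


# Hypothetical usage data and a hypothetical vendor rate of 25 cents/hour.
records = [
    UsageRecord("acme", 10),
    UsageRecord("globex", 3),
    UsageRecord("acme", 5),
]
usage = meter(records)
invoices = bill(usage, cents_per_hour=25)
print(usage)
print(invoices)
```

Keeping the two steps separate means the metering output can feed whatever billing system a vendor already has.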

On-site support: The servers in the cloud are very easy to access by the vendor’s support engineers and users will soon start to expect such “on-site” support. This is a true win-win proposition because it can reduce problem resolution time and increase customer satisfaction. Of course this means that the support reps need to have the skill to actually fix something and not just to dig in the knowledgebase and send an email reply.

One of the required underpinnings to enable vendor access in a controlled manner is access control. After all, the server’s user needs to be able to selectively grant access to the vendor when help is needed. What we’ve found is that the RightScale dashboard not only offers the ability to do just that, but it also gives the support engineer a lot of context and history information that can help get to the bottom of the problem quickly. As an extreme case, our support guys have responded to a number of “help, our site is down and we can’t reach our IT guy” calls and were able to get things back up without having prior knowledge of the site. (In case you’re wondering, this is not what our standard support covers, but we also don’t just let customers fall off the cliff in a situation like that.)

Delivered as a service: “And can you run it for me?” is a question prospects ask more and more. I know for myself that many times I’d rather pay the vendor to run it and sell it to me in SaaS form than go and figure it all out myself. Of course the cloud makes this much easier than ever before because the whole provisioning planning is largely taken out of the equation. When more customers sign up, the vendor can just launch more servers. A good number of our customers do just that and utilize RightScale to manage what one could call virtual appliances for their customers. At the more sophisticated end, companies such as StarCut use RightScale to provision multi-tenant clusters to host many small users and they then move larger users to private clusters and even set them up with their own fully-managed auto-scaled deployment.

Runs everywhere: The final consideration is that “publishing to the cloud” is a rather deceptive term because there isn’t just one cloud. I hate to borrow the “write once, run anywhere” slogan but it really describes what vendors are looking for. It’s too early in the industry to have a clear picture of what the solution should look like, but we’ve certainly made significant strides towards enabling multi-cloud ServerTemplates in RightScale and we’ll have more coming out shortly.

To give credit where credit is due, Amazon has done a great job in preparing the runway for software vendors to make their software available in the cloud. First, the fact that EC2 is based on immutable machine images, which are not a snapshot of a server but rather a template from which new servers can be spun up, really enabled the first catalog of ready-to-launch servers. Second, the pay-per-use pricing has gotten everyone to rethink how flexible computing could be if the licensing models allowed it. Somewhat to my surprise, vendors with a lot of legacy pricing, such as IBM, have jumped into this new opportunity and decided to adopt it. Third, Amazon’s DevPay service, which allows vendors to add a charge on top of Amazon’s hourly server fee, was the first offering that closed the metering and billing loop so vendors don’t have to reinvent the wheel. All this has really created a tremendous level of awareness and interest in the new ways software can be delivered in the cloud. We’re now leveraging this to introduce what we believe to be a more multi-cloud friendly and more flexible way to publish software in the cloud.


RackSpace releases draft Cloud Servers API

In case you missed it, the “cloud without an API” is about to become a real cloud with an API! (Sorry RackSpace guys, I just couldn’t resist!) Bret at RackSpace posted a blog entry asking for feedback a little over a week ago and it’s looking pretty good! If you haven’t looked at it, now is a good time. We’ve been in touch with Bret for a while and it’s good to see everything progressing. One item they solved nicely is passing “personalization” data into a new server. In the API you get to tell it to put some arbitrary data into any file you want on the root partition. This way it’s possible, for example, to set some environment variables in /etc that get picked up by various programs on the server. Nice!
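
As a rough illustration of the personalization idea, here is a Python sketch of what a server-creation request might carry. The field names (`server`, `name`, `personality`, `path`, `contents`), the base64 encoding, and the file path are assumptions made for illustration, not details taken from the draft API.

```python
import base64
import json


def personalization_entry(path: str, contents: str) -> dict:
    """Describe one file to inject; contents are base64-encoded for transport."""
    encoded = base64.b64encode(contents.encode("utf-8")).decode("ascii")
    return {"path": path, "contents": encoded}


# Hypothetical request body: all field names and values are illustrative.
request_body = {
    "server": {
        "name": "app-1",
        "personality": [
            personalization_entry(
                "/etc/profile.d/site.sh", "export SITE_ENV=staging\n"
            ),
        ],
    }
}
print(json.dumps(request_body, indent=2))
```

A file dropped under /etc this way could then be sourced at login or at boot, which is how environment variables set at launch time would reach programs on the server.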


Amazon adds Load balancing, Monitoring, and Auto-Scaling (updated)

[This post was updated with sections on the monitoring and auto-scaling services]

Announced late last year, Amazon tonight launched load balancing, monitoring, and auto-scaling for the Elastic Compute Cloud (EC2). These features have been requested many times by EC2 users and with this release Amazon continues to show that it listens and responds to feedback. Read Jeff Barr’s description on the AWS blog and Werner Vogels’ backgrounder on his blog.

At RightScale we’ve been experimenting with a preview version of the new services for a while. We’re pretty excited because they allow us to offer new features and more choices to our customers. In particular, we’ll integrate the load balancing with what we already have. It can be used as an alternative to our haproxy based set-up or in combination with it for more flexibility. For example, for more complex web sites and for SSL sites a more application-specific load balancing layer behind Amazon’s will usually be required.

The new monitoring service will provide additional data sources to our users as well as the ability to aggregate across many servers. The service introduced by Amazon can collect data at the hypervisor level and provides a very versatile storage back-end. We’re planning on sourcing data from the service in our graphing front-end and also integrating the data into our alerting and escalation system. At scale, the use of Amazon’s monitoring service by itself actually costs half of what all of RightScale costs, monitoring included, so we’re offering great value as an integrated solution.

Finally, the auto-scaling service is something that many users felt was missing from EC2: we’ve often heard from people looking at EC2 for the first time “you mean Amazon doesn’t automatically launch more instances when my app is overloaded?” Amazon now has an answer for those questions, which was badly needed. However, unless we’re missing something, it adds nothing beyond our current offering, but we’ll keep listening to what our customers tell us. We believe that the most difficult part of auto-scaling isn’t the actual launching of servers but lining up all the configuration management and lifecycle management so the new servers go into production successfully, and that dynamic runtime self-configuration is where RightScale really shines.

Load balancing

Let’s take a closer look at the new features introduced tonight starting with the load balancing. It is now possible to allocate a load balancer and have it distribute requests across multiple EC2 instances running in multiple availability zones within one region. (An availability zone is roughly equivalent to a datacenter and the two current regions are US east and EU west.) The interface to the service is pretty simple. You create a load balancer and define for which ports and protocols it should process requests. Then you launch instances and add them to the load balancer. You also define a health check that the load balancer uses to probe the instances to ensure that they’re operating. With that the load balancer is in operation and starts passing incoming requests through to healthy instances.
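
The workflow just described (create a load balancer with listeners, add instances, define a health check, route only to healthy instances) can be modeled with a toy class. This is a sketch of the concept only, not Amazon’s API: the class, its method names, the instance ids, and the round-robin selection are all assumptions made for illustration.

```python
class LoadBalancer:
    """Toy model: listeners, a health check, and a rotation of instances."""

    def __init__(self, listeners, health_check):
        self.listeners = listeners        # e.g. [("HTTP", 80)]
        self.health_check = health_check  # callable: instance id -> bool
        self.instances = []
        self._next = 0

    def register(self, instance_id):
        """Add an instance to the rotation."""
        self.instances.append(instance_id)

    def deregister(self, instance_id):
        """Remove an instance from the rotation."""
        self.instances.remove(instance_id)

    def healthy_instances(self):
        """Instances that currently pass the health check."""
        return [i for i in self.instances if self.health_check(i)]

    def route(self):
        """Pick the next healthy instance (round-robin is an assumption)."""
        healthy = self.healthy_instances()
        if not healthy:
            raise RuntimeError("no healthy instances")
        choice = healthy[self._next % len(healthy)]
        self._next += 1
        return choice


# Hypothetical instances; "i-b" fails its health check and is skipped.
down = {"i-b"}
lb = LoadBalancer([("HTTP", 80)], health_check=lambda i: i not in down)
for instance in ("i-a", "i-b", "i-c"):
    lb.register(instance)
picks = [lb.route() for _ in range(4)]
print(picks)
```

In this model the unhealthy instance stays registered but simply receives no requests until its health check passes again.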

The load balancing service is designed to serve as a first level of distributing load across a number of instances, dealing specifically with DNS and handling the failure of an availability zone. Most sophisticated web sites will need an additional level of load balancing that is more customizable and more application specific, for example to map portions of the URL space to different back-end services or to optimize the handling of persistent sessions.

Some of the feature details of the load balancing service are:

  • It supports HTTP and TCP meaning that it load balances HTTP at the request level and provides TCP switching for all other protocols. This in particular means that it does not terminate HTTPS, instead HTTPS must be balanced at the TCP level and terminated on the user’s instances.
  • It can listen on ports 80, 443, and 1024 thru 65535, which means it cannot be used to load balance many standard protocols that use ports below 1024. I’m not sure why this restriction exists, but it’s perhaps an indication that the service is primarily geared towards web sites.
  • The health checks can either issue an HTTP GET request to a specific URL and check for a 200 OK response, or they can open a TCP connection to an arbitrary port and check that the connection is accepted.
  • Servers can be added to or removed from the load balancer rotation without interrupting operation, and the load balancer can be queried for the status of instances according to the health check.
  • The load balancing occurs in two stages: first a client is directed to a specific availability zone using DNS, and then it is directed to an available instance in that zone. The zone selection is equal-weight, which means that you’d better run an equal number of instances in each zone or instances in one zone will end up with a higher load than those in the other.
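
Two of the details in the list above lend themselves to a quick sketch: the listener port restriction and the effect of equal-weight zone selection. The function names and zone names are hypothetical, and the even split of requests among the instances within a zone is an assumption made to keep the arithmetic simple.

```python
def is_allowed_listener_port(port: int) -> bool:
    """Ports the service can listen on, per the list above."""
    return port in (80, 443) or 1024 <= port <= 65535


def requests_per_instance(total_requests: int, instances_per_zone: dict) -> dict:
    """Equal-weight zone selection: each zone gets the same share of requests,
    which is then split evenly across the instances in that zone (assumed)."""
    per_zone = total_requests / len(instances_per_zone)
    return {zone: per_zone / count for zone, count in instances_per_zone.items()}


# Hypothetical deployment: 2 instances in one zone, 4 in the other.
load = requests_per_instance(1200, {"zone-a": 2, "zone-b": 4})
print(load)
print(is_allowed_listener_port(25))  # a standard port below 1024, e.g. SMTP
```

With 1200 requests split equally between the two zones, each instance in the smaller zone handles twice as many requests as each instance in the larger one, which is the imbalance the last bullet warns about.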

We’re currently planning to support the load balancing service at multiple levels. We’ll enable our server templates to use Amazon’s load balancing both instead of and in addition to our own. For simple highly scalable HTTP services Amazon’s can be used on its own, but for more complex configurations a second level of load balancing is needed. In particular for SSL sites, a back-end load balancing after the SSL termination is often required.

Monitoring

The CloudWatch monitoring service is really a special storage engine that is designed for time series data. On one end data collected periodically from servers and from other services is pumped into the monitoring store, and at the other end clients can run queries against the store to extract data from it. What this means is that while not being a complete monitoring system, it is the central storage piece to which all the others would interface.

On the data input side CloudWatch is very limited at the moment. There are seven metrics per server that the virtual machine host injects into CloudWatch, and there are four metrics for each Load Balancer instance. Not very exciting yet, but don’t be fooled, this is just the beginning. Amazon will add more and more metrics and also provide an API for inputting custom metrics. At that point it becomes really interesting!

On the data output side the store offers a number of ways to query the data. The result of a query is always an array of data points over time. What’s interesting is that one can get much more than just the raw data points back out and that’s where CloudWatch shines. For example, it is possible to retrieve the max cpu utilization across a number of servers as a time series. It’s unclear how flexible this aggregation will end up because initially the way to name the servers of interest is somewhat limited, but we’ll find out.
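
The kind of aggregate query described here, such as the max cpu utilization across a number of servers as a time series, can be sketched in plain Python. This is not the CloudWatch API: the function, the server names, and the sample values are made up for illustration.

```python
def max_across_servers(series_by_server: dict) -> list:
    """Combine per-server time series into one series holding, for each
    timestamp, the maximum value reported by any server."""
    combined = {}
    for series in series_by_server.values():
        for timestamp, value in series:
            combined[timestamp] = max(value, combined.get(timestamp, value))
    return sorted(combined.items())


# Hypothetical cpu utilization samples as (seconds, percent) pairs.
cpu = {
    "server-1": [(0, 20.0), (60, 35.0), (120, 30.0)],
    "server-2": [(0, 55.0), (60, 25.0)],
}
peak = max_across_servers(cpu)
print(peak)
```

The result is itself an array of data points over time, so it can be graphed or alerted on just like a single server’s metric.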

Some other characteristics of the service:

  • data is retained for 2 weeks, so one better extract it and save it somewhere else for longer term comparison and trending
  • the smallest data resolution is 1 minute, anything input more frequently gets aggregated automatically at minute boundaries
  • the service costs $0.015 per server hour and there is no per-query charge
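
The one-minute resolution in the list above amounts to bucketing samples at minute boundaries. Here is a minimal sketch of that idea; which statistics the service actually keeps per bucket is not stated above, so the count/min/max/sum summary is an assumption, as are the function name and the sample data.

```python
from collections import defaultdict


def bucket_by_minute(samples):
    """Group (unix_seconds, value) samples into one-minute buckets and
    summarize each bucket. The bucket key is the start of its minute."""
    buckets = defaultdict(list)
    for seconds, value in samples:
        buckets[seconds - seconds % 60].append(value)
    return {
        start: {"count": len(v), "min": min(v), "max": max(v), "sum": sum(v)}
        for start, v in sorted(buckets.items())
    }


# Hypothetical samples arriving more often than once a minute.
samples = [(0, 10), (20, 30), (59, 20), (60, 50), (119, 70)]
per_minute = bucket_by_minute(samples)
print(per_minute)
```

Anything finer than a minute is lost once the samples are summarized this way, which is worth keeping in mind when pushing data in more frequently.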

Overall CloudWatch looks like a very promising service that will really gain momentum when many more metrics can be input. We’re still on the fence whether we should modify our graphing and alerting to be able to pull data from CloudWatch directly or whether we should pull data from CloudWatch on a continuous basis and re-store it in our monitoring system. In either case, we’ll make the data available to our customers.

Auto-scaling

The auto-scaling service is something that a lot of first time users of EC2 have been missing. Everyone expects Amazon to automatically scale the resources of a user since this auto-scaling is what is most often quoted in connection with the cloud. Never mind that it doesn’t really make much sense since Amazon just provides compute boxes and doesn’t have any info about the application a user is running and how one would make it scale. It’s like the UPS guys arriving at your doorstep with a bunch of Dell boxes: “here, they noticed you need more, I’ll unbox and plug them in for you”. I wish auto-scaling were that simple!

The Auto-Scaling API reflects the fact that a lot of set-up is required for auto-scaling to work. You have to define not just the auto-scaling behavior but also what to launch, captured in a “LaunchConfiguration” data structure which includes the image to launch, security group, SSH key, instance size, kernel id, ramdisk id, block device mappings and user data. If I counted correctly, you get to specify a total of 27 parameters to make auto-scaling work.
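
To make the shape of that data structure concrete, here is a Python dataclass holding the items listed above. The field names, types, and defaults are illustrative assumptions rather than the actual parameter names of the Auto-Scaling API, and the values are placeholders.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class LaunchConfiguration:
    """What to launch when scaling up. Field names are illustrative."""
    image_id: str
    instance_size: str
    security_groups: list = field(default_factory=list)
    ssh_key_name: Optional[str] = None
    kernel_id: Optional[str] = None
    ramdisk_id: Optional[str] = None
    block_device_mappings: list = field(default_factory=list)
    user_data: str = ""


config = LaunchConfiguration(
    image_id="ami-00000000",   # placeholder, not a real image
    instance_size="m1.small",
    security_groups=["web"],
    ssh_key_name="deploy-key",  # hypothetical key name
    user_data="ROLE=app",       # hypothetical user data
)
field_names = [f.name for f in fields(config)]
print(field_names)
```

Even this stripped-down version has eight fields to get right before any scaling behavior is defined.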

What’s really missing for the auto-scaling to fit in is all the context information that would provide a lot of the configuration information. The servers being auto-scaled don’t operate in a vacuum; they work in concert with other servers. For example, most of our customers have two “base servers” that are not auto-scaled. They often have some extra functions, like acting as front-end www servers, but otherwise they’re configured just like the additional auto-scaled servers. Well, the config of those base servers and the auto-scaled ones share a lot in common, so it’s nice to be able to set all this up in one place for the whole deployment, and not individually for each server and the auto-scaling too. The same goes for other deployment-wide information, from the web site name to the credentials for accessing the database: all that is context that can be shared across all servers in a RightScale deployment.
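
The idea of setting shared configuration once per deployment and overriding it per server can be sketched as a simple dictionary merge. This is a conceptual illustration only: the function, the input names, and the override rule are assumptions, not a description of how RightScale resolves inputs.

```python
def effective_inputs(deployment_inputs: dict, server_inputs: dict) -> dict:
    """Server-level values override deployment-level ones; everything else
    is inherited from the deployment."""
    return {**deployment_inputs, **server_inputs}


# Hypothetical deployment-wide settings shared by all servers.
deployment = {"SITE_NAME": "www.example.com", "DB_USER": "app", "ROLE": "app"}

# A base server with an extra front-end function, and an auto-scaled server
# that inherits everything unchanged.
base_server = effective_inputs(deployment, {"ROLE": "frontend+app"})
scaled_server = effective_inputs(deployment, {})
print(base_server)
print(scaled_server)
```

The point of the design is that a change to the shared settings reaches base servers and auto-scaled servers alike without touching each definition.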

What becomes even more painful is that any change to what’s running on the servers, like a bug fix, requires creating a new image and relaunching the servers. This is where our ServerTemplates, which build the server at boot time from a base image, are a lot more flexible. You can make small changes to the server template, test them on an individual server on the side, and then slide the new version into the scaling. A neat alternative is being able to run a script that updates existing servers on the fly. In some cases you want a new server array that gradually scales up and takes the load over while the other scales down; other times you just want to run a quick script on the existing servers to patch up a minor config detail. You really do want the whole toolbox so you can pick the right tool for the job.

The good news is that the auto-scaling service itself is free. However, it requires that all launched instances use the CloudWatch monitoring service which costs $0.015 per instance hour.
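
Since the monitoring charge is per instance hour, the cost of an auto-scaled array follows the number of running instances. A small sketch of that arithmetic, using the $0.015 rate quoted above; the hourly instance counts and the function name are made up for illustration.

```python
# $0.015 per instance hour, kept as an integer number of thousandths of a
# dollar to avoid floating-point drift while summing.
CLOUDWATCH_MILLS_PER_INSTANCE_HOUR = 15


def monitoring_cost_dollars(instances_per_hour: list) -> float:
    """Monitoring charge for an array whose size changes hour by hour."""
    instance_hours = sum(instances_per_hour)
    return instance_hours * CLOUDWATCH_MILLS_PER_INSTANCE_HOUR / 1000


# Hypothetical five hours: 2 base servers, scaling up to 6 during a surge.
surge = [2, 2, 6, 6, 2]
print(monitoring_cost_dollars(surge))

# One instance running for a 30-day month (an assumed month length).
print(monitoring_cost_dollars([1] * 24 * 30))
```

Under these assumptions the surge example costs 27 cents in monitoring, and one always-on instance costs $10.80 a month.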

Summary

I won’t repeat what I wrote at the beginning of the blog entry, but it’s great to see Amazon continue innovating at a break-neck pace! At a feature level what they’ve introduced tonight overlaps more with some features of RightScale than most other parts of their offering. Our focus is to provide an integrated solution on top of their array of infrastructure services. In particular, we have long had support for dynamic configuration management and advanced automation. More recently we have embraced portability across other cloud providers and even hybrid public/private clouds.
