Archive for February, 2009

RightScale Ruby Cloud Gems released

We’ve just re-released all the Ruby gems (libraries) we use to interface to the various cloud providers. These gems are what we use in our RightScale platform in production, not some stripped-down version. They have performance optimizations and extensive error checking as well as retries for failing operations.
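To give a flavor of what "retries for failing operations" can look like, here is a minimal sketch of a retry wrapper with exponential backoff. It is not the code in our gems: the `with_retries` helper, the `TransientError` class, and the attempt and delay values are all hypothetical names and numbers chosen for illustration.

```ruby
# Hypothetical error class standing in for "this failure is worth retrying"
# (timeouts, HTTP 5xx responses, and so on). Not part of any RightScale gem.
class TransientError < StandardError; end

# Run the block, retrying on TransientError with exponentially growing delays.
# The sleeper is injectable so the waiting can be stubbed out in tests.
def with_retries(max_attempts: 4, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientError
    raise if attempt >= max_attempts              # give up: re-raise the last error
    sleeper.call(base_delay * (2**(attempt - 1))) # 1x, 2x, 4x, ... the base delay
    retry
  end
end

# Usage: the first two attempts fail, the third succeeds.
result = with_retries(base_delay: 0.01) do |attempt|
  raise TransientError, "simulated timeout" if attempt < 3
  "ok after #{attempt} attempts"
end
puts result
```

Only errors explicitly marked as transient are retried here; anything else propagates immediately, which is usually what you want for authentication or validation failures.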

As part of the current wave, we’ve released our Amazon Web Services interface, RightAws 1.10.0. The big new features are support for SDB’s SQL-like query and query_with_attributes, as well as signature v2 support for all services. There are also numerous bug fixes, many of them reported and patched by users and customers. Thank you!
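For readers curious about what signature v2 involves, here is a rough sketch of the AWS Signature Version 2 computation using only the Ruby standard library. It is not the RightAws implementation. The helper names (`rfc3986_encode`, `canonical_query`, `string_to_sign`, `sign_v2`) are hypothetical, and the access key, secret, query expression, and API version in the example are made-up illustrative values.

```ruby
require "openssl"

# Percent-encode everything except RFC 3986 unreserved characters
# (a space becomes %20, never "+").
def rfc3986_encode(value)
  value.to_s.gsub(/[^A-Za-z0-9\-_.~]/) do |ch|
    ch.bytes.map { |b| format("%%%02X", b) }.join
  end
end

# Sort parameters by name (byte order), then join as name=value pairs.
def canonical_query(params)
  params.map { |k, v| [k.to_s, v.to_s] }
        .sort
        .map { |k, v| "#{rfc3986_encode(k)}=#{rfc3986_encode(v)}" }
        .join("&")
end

# Verb, lowercase host, path, and canonical query, separated by newlines.
def string_to_sign(verb, host, path, params)
  [verb.upcase, host.downcase, path, canonical_query(params)].join("\n")
end

# HMAC-SHA256 over the string to sign, Base64-encoded without line breaks.
def sign_v2(secret_key, verb, host, path, params)
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), secret_key,
                                string_to_sign(verb, host, path, params))
  [digest].pack("m0")
end

# Illustrative request parameters; every value below is an assumption.
params = {
  "Action"           => "Select",
  "SelectExpression" => "select * from audit where level = 'warn'",
  "AWSAccessKeyId"   => "AKIDEXAMPLE",
  "SignatureVersion" => "2",
  "SignatureMethod"  => "HmacSHA256",
  "Timestamp"        => "2009-02-01T12:00:00Z",
  "Version"          => "2007-11-07"
}
signature = sign_v2("not-a-real-secret", "GET", "sdb.amazonaws.com", "/", params)
puts "Signature=#{rfc3986_encode(signature)}"
```

The resulting signature is appended to the query string as the `Signature` parameter. The service recomputes it from the same inputs, so any difference in encoding or parameter ordering makes the request fail.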

We’ve also released alpha versions of the GoGrid, FlexiScale, and Slicehost gems. These have seen less production use and the APIs at these providers are still changing, so we expect we’ll have some fixing to do. Please report any bugs to us and we’ll fix ’em!

We remain committed to contributing open source libraries to the cloud community, and more are coming soon! Also, by popular demand, we will be moving the development of these gems to a public git repository soon. It’s a bit trickier than you might expect, as we’re often ahead on a private branch with non-public cloud features, so we need to make it all work correctly… But stay tuned!

Oops: I almost forgot to add a link to the gems: http://rightscale.rubyforge.org/ and http://rubyforge.org/projects/rightscale


The Skinny on Cloud Lock-in

The topic of cloud lock-in has been getting quite a bit of attention lately, and it definitely needs to be a primary concern for anyone planning to move business-critical applications to the cloud. (And who isn’t planning on that these days?) Given all the different layers of cloud computing, the conversation can quickly become confusing. At Cloud Connect a few weeks ago the lock-in discussion bounced from Salesforce.com to Google App Engine, and then to Amazon Web Services within a single argument, which just makes no sense. To put it simply, different layers of cloud offerings vary widely when it comes to the dangers of lock-in.

Lock-in hypothesis

Let me state Thorsten’s Lock-in Hypothesis:

The higher the cloud layer you operate in, the greater the lock-in.


[Figure: lock-in increases with the cloud layer]

This means that if you use an application in the cloud, such as an all-in-one CRM package, you have the highest chance of getting locked-in. Move one level down to a platform in the cloud and you are somewhat less likely to get locked-in. Google App Engine is one example: you can move a simple Python app off that platform fairly easily, but anything of substance that uses its BigTable storage and other services will end up relying on a lot of proprietary technology. This “black box” effect locks you in more than, for example, a platform like Heroku, where apps follow more of a standard Rails code base. When you move down to an infrastructure cloud, such as Amazon Web Services, it becomes even easier to see how you can move your application stack from one provider to another. After all, there’s not much distinguishing the Linux box you get in EC2 from the Linux box you get at GoGrid. But even here, lock-in needs to be thought through because the system behavior, from storage persistence to networking details and on and on, is far from identical.

So where does this leave us? I’ve been talking about lock-in, but what does that really mean? Well, with cloud computing you outsource the operation of compute resources to a cloud vendor who “runs” your application and who “stores” your data. Lock-in occurs with this vendor to the extent it is prohibitively expensive or time-consuming to run your application elsewhere or move your data elsewhere. Whether this “elsewhere” is another vendor or whether it is your own infrastructure is not important: if you can’t move, or it costs a lot or takes a long time to do so, you’re locked-in. We recently asked our customers and prospects what concerned them most about lock-in. Here are the results:

[Figure: survey results on lock-in concerns]

The layer cake

Lock-in can actually occur at many levels in the stack, and that’s why the cloud layers differ in their effective lock-in risk. The more code that is controlled “behind the curtain” by the cloud, the more you tend to lose freedom. Conversely, the more that is under your control, the easier it is to replicate it elsewhere and retain freedom. Here are a number of different layers at which you could find yourself locked-in:

  • Application: do you own the application that manages your data or do you need to find/write another one to move?
  • Web services: does your app make use of 3rd party web services that you would have to find or build alternatives to (e.g. storage, search, billing, accounting, …)?
  • Development & run-time environment: does your app run in a proprietary run-time environment and/or is it coded in a proprietary development environment? Would you need to retrain programmers and rewrite your app to move to a different cloud?
  • Programming language: does your app make use of a proprietary language, or language version? Would you need to look for new programmers to rewrite your app to move?
  • Data model: is your data stored in a proprietary or hard to reproduce data model or storage system? Can you continue to use the same type of database or data storage organization if you moved or do you need to transform all your data (and the code accessing it)?
  • Data: can you actually bring your data with you and if so, in what form? Can you get everything exported raw, or only certain slices or views?
  • Log files and analytics: do you own your history and/or metrics and can you move it to a new cloud or do you have to start from scratch?
  • Operating system and system software: do your sysadmins control the operating system platform, the versions of libraries and tools so you can move the know-how and operational procedures from one cloud to another?

All these issues become pertinent when you face questions such as: “How can I move my Force.com application or my web site running in Google App Engine to my own datacenter?” Or “Can I get the click-stream data for my site out of the platform so I can analyze, for example, last year’s traffic compared to this year’s?” Or “Can I easily move an application between my datacenter and EC2?”

Altitude increases lock-in

The value proposition of the higher cloud layers is appealing and I predict more and more movement in that direction. But lock-in is one of the issues that really gives me pause and that has kept me in the past from adopting some of the services that otherwise have looked compelling.

Let me pick on Google App Engine for a minute. Suppose you develop your site on App Engine and you find yourself having to move away for whatever reason. I don’t know of a good solution for you at that point. While there are ways to port an app from App Engine to Django, it’s not clear this is really an answer if you’re running a high-volume production app. It’s going to be interesting to see whether we will end up with commercial or perhaps open-source App Engine clones that are “industrial strength” to the point where one can really contemplate moving a big app from one App Engine vendor to another. (Well, first Google App Engine needs to be complete enough to host the types of apps where this is a real concern.)

An example closer to home is Amazon’s SimpleDB. I’ve been interested in SimpleDB since I first heard about it, but we have yet to use it as part of the RightScale service and the #1 reason is lock-in. For example, we store audit entries for everything that happens with our users’ servers and I’d love to get those out of the SQL database they’re in. SimpleDB may be a good solution to the problem from a technical point of view, but we don’t see how we’d be able to move that data out of Amazon without major headaches. In addition, we need to be able to run all pieces of the RightScale service in other clouds and we’d have to build an alternate storage solution there. By the time we do that we might as well only use this alternate solution and forgo SimpleDB altogether.

At the level of infrastructure clouds like Amazon EC2 the questions around lock-in are somewhat different but still pertinent. The cloud vendor provides what I like to think of as the “atoms of computing,” namely processing, storage, and networking. You get to build your infrastructure using virtual machines (EC2), disk block devices (EBS), hashed storage buckets (S3), security groups, etc. This means that the choices of programming language, development environment, runtime environment, database storage and so forth are all yours and can all at least in principle be duplicated in another cloud, at a traditional hosting provider, or in your own datacenter. Where lock-in starts to creep in is in the system architecture and in the operations infrastructure (automation, scripts, procedures) that your sysadmins put in place to manage everything.

Maintaining freedom of choice

One of the principles that I’ve upheld in the design of the RightScale system from the beginning is transparency. Everything happening on your systems should be visible to you. This not only means that you can find out why something happened and who did it, but also that you can replicate it elsewhere. There’s no magic happening behind the curtain to which you’re held hostage. I love it when others can do magic for me and save me a lot of time and effort by providing a pre-built platform. But there are solid reasons — both business and technology-related — to demand the ability to look into the “secret sauce.” That way, I can be enchanted by the magic but not locked-in to the magician. Our users need to be able to enjoy the same capability.

A second principle we follow is to focus as much as possible on standard software, architectures and configurations. This means that our solutions can easily be replicated elsewhere, such as in your own datacenter. This can present more of a challenge when designing for a cloud environment, which is why we provide cloud-ready solutions for various types of scalability, but it also frees you from being tied to a particular cloud.

[Figure: lock-in details]

In the end, there may not exist a zero lock-in option. In fact, certain kinds and degrees of lock-in are probably unavoidable and are actually tolerable. The point is that the lock-in question is an important consideration to take into account when choosing among different cloud computing alternatives, and it’s equally important to keep the differences among cloud layers in mind when you decide what you’re willing to live with. All clouds are not created equal, and all clouds do not create equal lock-in. The key is to know the implications of your cloud choices.


RightScale supports Amazon EC2 Europe

Our platform now supports Amazon EC2 in Europe! Several of our customers have already noticed it and are running servers there using RightScale. This brings the cloud offering in Europe practically up to par with what’s available in the US. After operating production servers in the EU for about a month now, I must say that it’s been pleasantly uneventful! It actually took me a few server launches for all this to sink in: in a previous life, I had to send employees on overseas trips to scout out hosting facilities before we could ship servers there. And then it cost a ton of “remote hands” to have them racked and wired to spec. Now it’s all just a drop-down menu!

The big benefit of Amazon EC2 in Europe is the reduced latency to European users, and it should help companies adhere to EU regulatory requirements for data storage and processing. For companies operating globally it also supports an additional level of redundancy and disaster recovery. The main difference between EC2 US and EU is the absence of SQS and SDB in the EU. I sure hope SQS in particular will be available soon, since our many RightGrid users need it to operate their deployments. While the latency to the US isn’t a huge deal for SQS, it sure would be nice to have all pieces of the puzzle within the same region.

The way this looks in the RightScale platform is deceptively simple: you can now place servers in different regions within the same RightScale deployment and you can manage them as a unit. This means that configuration inputs can apply across regions, that monitoring and alerting are in one place, etc. Below is a screenshot of one of our own deployments where an EU server sits side by side with peer servers in several US availability zones.

[Screenshot: a RightScale deployment with an EU server alongside US servers]

The big surprise when Amazon announced the EU region was that they decided to offer what I would describe as a separate cloud, disconnected from EC2 US. As I mentioned in my previous blog post, I am convinced that this was the right decision because it really isolates the regions from one another from a failure perspective. Before this, I always kept wondering how they would convince us that EC2 couldn’t go down worldwide at the same time due to a software bug in the front-end API servers. Now the answer is pretty clear. That’s a really good thing.

To help our users operate across these two clouds we added some features to replicate images and server templates from one EC2 region to the other. If you have an AMI in the US and want to launch it in the EU, you can simply press the “replicate” button and we’ll make a copy of it in the EU.

[Screenshot: replicating an image to the EU]

The same applies to server templates, which you can replicate to the other region. This automatically attaches the right image, kernel, and ramdisk underneath. Having to replicate the images is something required by the EC2 architecture and we’ve already replicated all our RightImages, so the majority of our customers don’t need to deal with this at all. Replicating the server templates creates an additional level of duplication which we’ll eliminate in the next release, making it even easier to operate in both regions!

It’s no secret that getting the EU support into all parts of our system took a little longer than we had hoped. The primary difficulty was that we hadn’t upgraded our EC2 code base to our new multi-cloud structure. And to be frank, the way Amazon decided to separate the US and EU clouds didn’t help: it’s one thing to require the use of a different API front-end to access each region, it’s another not to keep a global object name space, e.g., instance id i-123456 can exist in both the US and EU! But now all this code is refactored and we’re off to the next set of features!
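To make the name-space problem concrete, here is a small sketch of one way to cope with ids that are only unique within a region: key every lookup by the pair of region and id. This is an illustration, not our actual code base. The `Instance` struct and `InstanceRegistry` class are hypothetical names, and the instance states are made up.

```ruby
# Hypothetical record for an EC2 instance as seen by a management layer.
Instance = Struct.new(:region, :id, :state)

# Registry keyed by [region, id], because the same id (say i-123456)
# can exist independently in more than one region.
class InstanceRegistry
  def initialize
    @by_key = {}
  end

  def add(instance)
    @by_key[[instance.region, instance.id]] = instance
  end

  def find(region, id)
    @by_key[[region, id]]
  end

  # Per-region list, matching how the EC2 API itself is split.
  def in_region(region)
    @by_key.values.select { |instance| instance.region == region }
  end

  # Combined list, for a deployment view that spans regions.
  def all
    @by_key.values
  end
end

registry = InstanceRegistry.new
registry.add(Instance.new("us-east-1", "i-123456", "running"))
registry.add(Instance.new("eu-west-1", "i-123456", "pending"))

puts registry.find("us-east-1", "i-123456").state
puts registry.find("eu-west-1", "i-123456").state
puts registry.all.size
```

A registry keyed by id alone would silently overwrite one of the two instances, which is exactly the kind of assumption that had to be refactored out.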

Since we’re talking about name spaces, I might as well comment on an oddity that has crept into the AWS services. There are two different strategies within AWS for handling regions: S3 is handled globally while EC2 is split per region. If you look at S3, there is a global namespace for buckets (the top-level containers in S3). If I point you to the rightscale-test bucket, you can’t tell from the name where it’s located until you access it. And there’s a somewhat elaborate DNS and redirect scheme to ensure that your access “bounces” to the correct region. As a result, our UI has a single “S3 browser.” It wouldn’t make sense to have an “S3 US browser” and an “S3 EU browser”. For EC2 resources, however, everything is duplicated: there is a list of EC2 EU instances and a separate list of EC2 US instances, and the same goes for EBS volumes, etc. We then stitch this back together when you look at a deployment, which can span clouds. The big question now is what we’ll see for SQS and SDB: which of these two schemes will they follow? Only time will tell… In the meantime: enjoy EC2 in Europe!


Gartner prediction misses today’s enterprise cloud action

John Foley blogs about an interesting-sounding Gartner report that gives cloud computing up to 2015 for “mainstream enterprise adoption”. He describes it as “a surprisingly conservative forecast for business adoption of cloud computing services”. I actually thought the timelines were quite rapid; we’re just not sure what Gartner means by “mainstream”.

The bottom line is that I think the Gartner report misses a lot of the interesting enterprise movement that has gotten rolling in 2008. Smart enterprises are experimenting with the cloud today. They are cherry picking apps that are appropriate for today’s cloud offerings so they have the experience necessary to move more involved and sensitive ones tomorrow. That’s where the excitement is (at least for us)!

But one step at a time, let’s try the time line from the Gartner report on for size:

2009 is for “pioneers and trailblazers”: I can’t argue that enterprise adoption will go beyond trailblazers this year without having multiple vendors offer clouds with solid security certifications that enterprises can insert into their sarbox, PCI, HIPAA, etc. audits. There are a lot of really interesting, solid, cost-saving, time-saving things enterprises can do in the cloud in 2009, but they have to cherry pick the right projects. Will we get to these security certs for cloud offerings and all that in 2009? Probably, but not early enough to move the mainstream needle.

Beginning in 2010 “cloud computing will appeal to a broader range of companies, resulting in a more mainstream user base”: I would predict that it will take until next year for enterprises to have the red carpet rolled out in front of them, leading them to the finish line with all the i’s dotted and t’s crossed. From books about cloud computing, to cloud consultants, success stories, ROI calculations, security certifications, enterprise offerings, yadda, yadda. This is a lot of moving parts, most of which are not about whether EC2 works and is secure but about leading the horse to the trough.

It’s between 2012 and 2015 that “cloud services reach mainstream critical mass and commoditization”: I’m wondering how long it is going to take AT&T, EDS, IBM Global Services, Savvis, and everyone else to have an enterprise cloud offering in full production, with all the services and consultants it takes to reel in the “mainstream”. Can I picture earlier than 2012? Mhhh.

What I really think is that the Gartner report misses a lot of the interesting action that has started to get into full swing. The moment sometime between 2012 and 2015 when the last enterprise moves into the cloud may be interesting, but it’s far more interesting when significant movement into the cloud starts to occur. And that’s definitely 2009 and more deeply so 2010.

Large companies don’t run one app. They aren’t sitting there waiting to decide when to move that app into the cloud. They have many apps. They all have different purposes, different stakeholders, different risk profiles, different security requirements, different development timescales, and on and on. The first ones could be moved into the cloud in 2008; more can be moved this year; yet more next year; and the last ones sometime in 2012-2015 if we believe Gartner. Great. Let’s focus on those that can move now!

We see many interesting examples. There are marketing sites that don’t have much security attached, are often under tremendous last-minute time pressure to add compute resources, and whose production is often outsourced. Cloud this!

Or batch computations that operate on public data using public algorithms. Sounds impossible to find? Well, a major pharmaceutical company has a good deal of those. The next step up is massive compute jobs that use proprietary algorithms that are not super-sensitive and operate on public data. We’re working on some interesting architecture to enable these in the cloud. They have more steps leading to super secret data and very sensitive algorithms, and we pushed those to the end of the queue. We’ll get there, perhaps not in 2009 or maybe even 2010, although cloud time seems to be moving even faster than internet time…

I could go on, but the point is that it will take enterprise IT shops a long time to build the internal know-how about the cloud to the point where they will be comfortable moving more and more mission critical applications over to reap the benefits. Now is the time to start building the experience and understanding of what works and what doesn’t. The smart ones clearly have started.
