Archive for July, 2008

Cloud Computing wouldn’t exist without Open Source

I’m at OSCON this week drinking from the open source that made RightScale possible. In talking to Tim O’Reilly I noticed that he hadn’t realized how integral Open Source is to the cloud. So maybe this isn’t as obvious as I thought, and it’s worth writing a blog entry about.

Cloud Computing is all about the flexibility to launch and terminate servers on demand, or more generally, to acquire and release resources on demand. This can help solve many tricky problems, from reliability, scaling, development, and testing to business flexibility needs. Where open source comes into the picture is when you think about the licenses for the software stacks you’re running on all the servers you’re launching. If you are normally running 2 servers but today you need 10, did you consider whether you have licenses for all the software on the additional 8 servers? Most commercial software seems to be licensed by the server or by the CPU, and obviously this just doesn’t cut it in the cloud. If it weren’t for open source stacks, no production service would be operating in the cloud today; everyone would still be waiting for software vendors to ‘get it’ and change their licenses to enable efficient use in the cloud (yeah, right…).
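To make the 2-versus-10 arithmetic concrete, here is a minimal sketch. It assumes the simplest possible model, one license per running server, and the function name `license_shortfall` is made up for illustration; real license terms vary by vendor.

```python
def license_shortfall(licensed_servers, running_servers):
    """Number of running servers not covered by a license.

    Assumes a hypothetical per-server licensing model where each
    running server needs exactly one license.
    """
    return max(0, running_servers - licensed_servers)


# Normally 2 licensed servers, but today's load needs 10.
print(license_shortfall(2, 10))  # 8
```

With an open source stack there is no such count to track, so the shortfall never becomes a blocker when you scale up.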

But all this is starting to change. The vast majority of software vendors we talk to are in the process of trying to figure out how they can sell their software in the cloud. They are asking what technical changes are necessary to enable their customers to deploy their software into the cloud environment, and what business model changes are necessary to offer frictionless sales into the cloud. Of course deploying software on the RightScale platform offers a number of benefits, including some new features we’re currently adding to support publishing and charging by use. But the bottom line really is that without open source we wouldn’t have cloud computing today.


Cloud Computing vs. Grid Computing

Recently Rich Wolski (UCSB Eucalyptus project) and I were discussing grid computing vs. cloud computing. An observation he made makes a lot of sense to me. Since he doesn’t blog [...], let me repeat here what he said. Grid computing has been used in environments where users make few but large allocation requests. For example, a lab may have a 1000-node cluster and users make allocations for all 1000, or 500, or 200, etc. So only a few of these allocations can be serviced at a time, and the others need to be scheduled for when resources are released. This has led to sophisticated batch job scheduling algorithms for parallel computations.

Cloud computing really is about lots of small allocation requests. Amazon EC2 accounts are limited to 20 servers each by default, and lots and lots of users allocate up to 20 servers out of the pool of many thousands of servers at Amazon. The allocations are real-time, and in fact there is no provision for queueing allocations until someone else releases resources. This is a completely different resource allocation paradigm and a completely different usage pattern, and all this results in a completely different method of using compute resources.
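The contrast between the two allocation styles can be sketched in a few lines of Python. This is a toy model, not how any real grid scheduler or EC2 works: the class names `GridScheduler` and `CloudPool`, the strict first-come-first-served queue, and the pool sizes are all assumptions chosen for illustration. Only the 20-server default limit comes from the text above.

```python
from collections import deque


class GridScheduler:
    """Batch-style pool: requests that do not fit wait in a FIFO queue."""

    def __init__(self, total_nodes):
        self.free = total_nodes
        self.queue = deque()  # pending (job, nodes) requests

    def request(self, job, nodes):
        # Start right away only if nothing is waiting and the job fits.
        if not self.queue and nodes <= self.free:
            self.free -= nodes
            return "running"
        self.queue.append((job, nodes))
        return "queued"

    def release(self, nodes):
        # Freed nodes go to queued jobs in arrival order.
        self.free += nodes
        started = []
        while self.queue and self.queue[0][1] <= self.free:
            job, wanted = self.queue.popleft()
            self.free -= wanted
            started.append(job)
        return started


class CloudPool:
    """Cloud-style pool: each request is granted or refused immediately."""

    def __init__(self, total_servers, per_account_limit=20):
        self.free = total_servers
        self.limit = per_account_limit
        self.in_use = {}  # account -> servers currently held

    def request(self, account, servers):
        used = self.in_use.get(account, 0)
        if used + servers > self.limit or servers > self.free:
            return False  # no queue: the caller simply gets a refusal
        self.in_use[account] = used + servers
        self.free -= servers
        return True

    def release(self, account, servers):
        servers = min(servers, self.in_use.get(account, 0))
        self.in_use[account] = self.in_use.get(account, 0) - servers
        self.free += servers
```

In the grid model a 1000-node request behind a running 500-node job sits in the queue until `release` is called. In the cloud model a request either succeeds on the spot or fails, which is why the pool has to be large relative to any single request.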

I always come back to this distinction between cloud and grid computing when people talk about “in-house clouds.” It’s easy to say “ah, we’ll just run some cloud management software on a bunch of machines,” but it’s a completely different matter to uphold the premise of real-time resource availability. If you fail to provide resources when they are needed, the whole paradigm falls apart and users will start hoarding servers, allocating for peak usage instead of current usage, and so forth.
