Archive for August, 2007

Surviving a MySQL master DB crash

Nothing is more heart-arresting than to find out that your database machine has died. Site down. Data gone. Life s…

That’s what happened to one of our customers yesterday morning, right when they were featured on some prominent sites. The Amazon EC2 instance hosting their master DB died. Fortunately they had tested the master-slave set-up using our Manager for MySQL, so everything was set up to recover quickly. They IM’d me so I could help should things go wrong. We waited a couple of minutes to see whether the machine was just rebooting, but to no avail. So we hit the “promote to master” button on the slave instance, and here’s the log of what happened:


[2007-08-21 16:24:45] [ServerActionsWorker] : Executing: 'Executing action: DB promote to master'

[2007-08-21 16:24:46] [ServerActionsWorker] : Using MasterDB DNS ID: 2577432 .

[2007-08-21 16:24:46] [ServerActionsWorker] : Using SlaveDB DNS ID: 2577433 .

[2007-08-21 16:24:54] [ServerActionsWorker] : No slave argument given...assuming localhost

Using C interface for mysql, client version 5.0.22

Server doesn't appear to be logging binary logs, configuring and restarting server with binary logging

Locking slave (and enabling writes)

[2007-08-21 16:28:04] [ServerActionsWorker] : Process 7927 has the lock. terminating others.

Written read_only changes to new master conf file

Stopping master (if alive), noting position, making RO, stopping and unconfiguring replication

Previously connected master db-p-master.company.com not reachable...

...Warning: assuming old master is dead and that the current contents of the Slave is the latest and best we can get.

Promoting slave...

Waiting until it catches up (if alive), stopping and unconfiguring replication,

unlocking tables and setting up replication privileges

Retrieved new master info...File: mysql-bin.000001 position: 98

Stopping slave and misconfiguring master

granting rep rights...

done with rights...

Unlocking tables

Demoting old master...

Changing Master DB DNS...

OK. Result: DNSID 2577432 set to this instance IP: 10.255.47.70

Mission accomplished.

[2007-08-21 16:28:04] [ServerActionsWorker] : Server action successully completed

Woot! The slave promoted to master just fine. At that point we had to bounce the Mongrel servers because, as far as we can tell, ActiveRecord just doesn’t switch to the new DNS entry for the DB in any reasonable amount of time. After verifying that the site was back up and fixing an ancillary server that wasn’t pointing to the proper database DNS entry, we launched a fresh slave with another button press.

[Figure: MySQL after Failure]

Phew, all this within about a half hour, including initial reaction and troubleshooting time and follow-up cleanup work. Everything we put in place with Manager for MySQL worked like a charm!


Archived Comments

Lox
“Stopping slave and misconfiguring master” ?? :)

Thorsten
Yes, “Stopping slave and misconfiguring master” Don’t you love the comments developers put into their code :-) . This refers to stopping the replication on the slave so it can become the master, and mis-configuring the master to ensure nothing continues to talk to it.

Ian
All the notes in there mentioning DNS: I assume you have a facility setup that ensures when a new DB master goes live that all the app servers are automagically updated with the new settings? I’d love to learn more about how you do that!

I’m very interested in using your Manager for MySQL for my company, once we’re ready to release. I’ve been using the free version of RightScale and love it!

Thorsten
Ian, thanks for the interest! We use DNSmadeeasy.com for our DNS services. They make it easy to create dynamic DNS entries that can be updated automatically when an instance boots. So when the master DB boots or a slave DB promotes to master it updates the appropriate DNS entry (e.g. db-master.rightscale.com). We set the TTL on that DNS entry to 75 seconds, which is the lowest supported by dnsmadeeasy.com for the type of account we have with them. We have found no problems in the propagation of such a change, and in the case of the DB this switch-over time is fine considering everything else that needs to happen. We are having trouble, however, in convincing Rails’ ActiveRecord to switch.  We’re still tracking down some scenarios where restarting the Rails app is required.


Redundant MySQL set-up for Amazon EC2

In order to deploy web sites/services onto Amazon EC2 everyone needs the same components, and so we’re building them! One of the most requested and most critical pieces is a good database set-up, and mysql is clearly the highest in demand. Not that a good postgresql or oracle set-up wouldn’t be of interest or wouldn’t be equally possible, just that more people are asking us (and paying us) for mysql…

What we’ve built is a mysql master/slave set-up with backup to Amazon S3. The set-up consists of one mysql master server which is the primary database server used by the application. We assume it runs on its own EC2 instance but it could probably share the instance with the application. We install LVM (the Linux Logical Volume Manager) on the /mnt partition and place the database files there. We use LVM snapshots to back up the database to S3, which means that we get a consistent backup of the database files with only a sub-second hiccup to the database.

[Figure: MySQL Master and Slave Setup]

Well, the snapshots for backup are actually quite a bit more complicated than that. We have to acquire a read-lock on all tables and this could block things if there is a long running query ahead of us. So there’s a timeout and retry loop which needs to balance off locking up the database and getting the backup done.
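The timeout-and-retry idea can be sketched in shell roughly as follows. This is not the RightScale implementation, only an illustration under assumptions: `retry_with_timeout` and `lock_and_snapshot` are made-up names, the volume `/dev/vg0/mysql` and the snapshot name `dbsnap` are hypothetical, GNU coreutils `timeout` is assumed to be installed, and the mysql client is assumed to accept its `system` command in a piped script.

```shell
# retry_with_timeout ATTEMPTS LIMIT_SECS PAUSE_SECS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, giving up on each try after LIMIT_SECS.
retry_with_timeout() {
  local attempts=$1 limit=$2 pause=$3 i=1
  shift 3
  while [ "$i" -le "$attempts" ]; do
    # timeout(1) kills the command when it runs too long; for a mysql
    # session that also drops the pending or held read lock.
    if timeout "$limit" "$@"; then
      return 0
    fi
    echo "attempt $i of $attempts did not finish cleanly" >&2
    i=$((i + 1))
    sleep "$pause"
  done
  return 1
}

# Hypothetical lock-and-snapshot step: the lock, the snapshot and the unlock
# all happen inside one mysql session, because the read lock only lives as
# long as the session that took it.
lock_and_snapshot() {
  mysql <<'SQL'
FLUSH TABLES WITH READ LOCK;
system lvcreate --snapshot --size 1G --name dbsnap /dev/vg0/mysql
UNLOCK TABLES;
SQL
}
```

Since `timeout` can only run executables, the body of `lock_and_snapshot` would have to live in its own script (say a hypothetical `/usr/local/bin/lock_and_snapshot.sh`) and be invoked as `retry_with_timeout 5 10 30 /usr/local/bin/lock_and_snapshot.sh`. A short limit keeps queued queries from piling up behind the lock request; the pause gives a long-running query time to finish before the next try.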

Using the snapshot backup we set up a slave instance which then starts replicating in real-time from the master. This means that all changes to the master are propagated with milliseconds of delay to the slave, so should the master instance fail, there’s an up-to-date backup. On a master failure we promote the slave to master and set up a fresh slave. Note that in most databases the slave lags extremely little behind the master. The main situation where the slave starts lagging is when there is a lot of write activity going on in the master. Under heavy write load the slave is slower at applying the replication to its copy than the master on the same hardware because the slave uses only a single thread to apply all changes while the master has one thread per client connection, so it can overlap network communication, cpu processing, and disk I/O using multiple threads, which the slave can’t.
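One way to keep an eye on that lag is the Seconds_Behind_Master field that MySQL reports in SHOW SLAVE STATUS. The helper below is a small sketch (the name `slave_lag` is made up for this example) that pulls the value out of the vertical `\G` output.

```shell
# Reads "SHOW SLAVE STATUS\G" output on stdin and prints the value of
# Seconds_Behind_Master (a number, or NULL when replication is not running).
# Exits non-zero if the field is not present at all.
slave_lag() {
  awk -F': ' '/Seconds_Behind_Master/ { print $2; found = 1 }
              END { exit !found }'
}
```

On a slave it could be used as `mysql -e 'SHOW SLAVE STATUS\G' | slave_lag`, for instance from a cron job that raises an alert when the number keeps growing.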

Periodic backups are taken off both the slave and master instances. There is very little penalty for acquiring a read lock on the slave and performing the snapshot and subsequent back-up, so it can be done every few minutes without any real impact (unless the slave has trouble keeping up as described above, in which case it’s probably time to move to multiple slaves). We also take infrequent backups on the master, say once a day, in order to guard against any problems introduced by replication.

While the mysql replication is well proven and used by many large sites in heavy production, there are failure scenarios. First of all, the application should use InnoDB tables exclusively because MyISAM tables are not transactional and have a number of scenarios where replication fails. Even with InnoDB tables there are failures possible. For example, it is possible to write non-deterministic queries in SQL and since mysql uses logical replication the slave re-executes the query, and it may end up using a different execution order than the master, resulting in different data in the database. Ouch. One example is a create table with an auto-increment key using a select from an existing table. The insertion order and hence the keys in the new table depend on the order in which the select is executed, and if it’s executed in a different order in the slave from the master you will end up with an unusable slave DB! (Been there, done that, it still hurts.) Thus: do back up your master every now and then to be able to recover from such problems. (If you’re paranoid, fire up an instance every few hours, load up a back-up, and run a few consistency checks. It’ll cost you less than a buck a day to ensure the DB backup is good, and that’s cheap insurance.)
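As an illustration of what such a consistency check could look like, the sketch below compares MySQL's CHECKSUM TABLE result for one table on two servers. The function names and the `MYSQL` override variable are inventions of this example, and the comparison is only meaningful when both servers are at the same replication position and run comparable MySQL versions.

```shell
# table_checksum HOST DB.TABLE
# Prints the checksum MySQL reports for the table on HOST.
# Set MYSQL to use a different client command than "mysql".
table_checksum() {
  "${MYSQL:-mysql}" -h "$1" -N -B -e "CHECKSUM TABLE $2" | awk '{ print $2 }'
}

# tables_match HOST_A HOST_B DB.TABLE
# Succeeds only if both hosts return the same non-empty checksum.
tables_match() {
  local a b
  a=$(table_checksum "$1" "$3") || return 2
  b=$(table_checksum "$2" "$3") || return 2
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

A call such as `tables_match db-master db-slave app.users` (host and table names are placeholders) could be run for each important table, preferably at a quiet moment so that the slave is not mid-way through applying changes.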

The best of all is that all the goodies described above are controlled through the RightScale web interface. Want a new slave? Just press the “set-up slave” button! Want a back-up? Just press “backup” on the master or on the slave. The list of functions we have now is:

  • launch database instance
  • restore from S3 backup and configure as master
  • configure as slave, using DB transfer from master for initial state
  • promote slave to master
  • backup to S3
  • daily backups to S3 from master
  • 10-minute backups to S3 from slave

We obviously still have a lot of work ahead of us to improve the flexibility of the set-up. One thing to note is that you are in control of what is executed on the database servers, so they are not opaque virtual appliances. If you need to tweak our database install, slave set-up, backup, or other code, it’s all available in scripts that you can modify. (Of course the more you modify the less we can help when things go wrong.) Also, currently all these functions are “automated” in the sense that you make a decision, push the right button, and things happen. We are adding monitoring and we will add triggers that will cause master-slave failovers automatically.

If you are interested in using our mysql master/slave set-up, please contact us at sales@rightscale.com. This stuff is not available with the free RightScale accounts.


Archived Comments

Pete
Any plans on making this available as a feature you can pay for without signing up for one of those expensive packages?

Thorsten
Pete, thanks for the interest. Obviously “expensive” is in the eye of the beholder. After having spent the days putting all this together, I would rate the price as really low. Together with the help you get to put it all into operation, it’s probably too low to make a profit. Maybe a bit down the line when we have it fully automated and cookie cutter we can reduce the price.

When I started using EC2 I realized that it’s really easy to blow any savings away in sysadmin time. You end up launching and installing servers at an insane rate, because you can, and because it brings sooo many side-benefits. You can do things you didn’t even conceive of before. But someone or something has to install and manage all these servers. If it’s “someone” then that’s really expensive really fast. Hence “something” = RightScale. I hope that when you sum it up at the end of the month the sysadmin you didn’t need to hire thanks to RightScale saves you much more money than the check you write us. YMMV of course.

dennis
I think the prices are great for established businesses, but I think that the smaller startups (who are doing the sysadmin stuff on their own time) would love to be able to launch some of the database stuff as part of the free account…

Paul
Nice use of LVM for the snapshots, just be mindful that long running queries can cause the read lock to take ages to be in the place and have the database in a consistent state. Pete Zaitsev mentioned in one of his cookbook presentations that using FLUSH TABLE is a way to speed the process up. I am also reviewing the mysql-table-checksum from the MySQLtoolkit as a way to make sure the master and slave remain completely in sync. Interesting set of tools as well.
Have Fun
Paul

Thorsten
Paul, thanks for the pointers. Yes, we understand that long running queries can delay the read lock. And if you’re not careful, your read lock request can block a pile of other queries behind it, so it’s not good to just sit there and wait for the read lock to go through. We use a relatively short timeout on the read lock and try again, and again.


Recompiling kernel modules for EC2 instances

Some time ago, I discovered that the version of the kernel that Amazon uses for its current infrastructure (linux 2.6.16) contains a bug in the lvm modules. This was a bummer to see since we are using LVM snapshotting facilities to realize sub-second database backups. The bug was only triggered under specific load conditions, but when we’re talking database backups nobody likes getting kernel panics at the most inappropriate times. The interesting thing is that this particular bug has been fixed for a while now in newer kernel versions but we (i.e., all EC2 users) cannot benefit from these kernel sub-release patches since we depend on the kernel version that Amazon installs in all instances.

After some research it became clear that in order to successfully use our fast snapshotting facilities in EC2 (or, for that matter, for anybody to use LVM-related tools on EC2), patching the lvm kernel modules became a requirement.

The first thing to do was to find out the exact version of the kernel that Amazon is installing, the sub-release version of the kernel that contains the required patch, and the version of Xen that Amazon uses to patch their kernel. At the time of writing, Amazon’s kernel is based on a vanilla 2.6.16 kernel, patched with an unknown version of Xen (at least I couldn’t really find what version it was or perhaps it’s customized by Amazon). It turns out that the fix for the LVM bug I was triggering was applied at 2.6.16.12. Therefore the task was to recompile the kernel modules for a Xen-patched kernel of version >= 2.6.16.12. There doesn’t seem to be much information at all out there on how to do it, so at first I feared this might be an ugly or esoteric process, but fortunately it turns out to be quite simple!

The next paragraphs describe the rationale and steps on how to recompile kernel modules that are ready to be used for EC2 instances.

Preparing the sources and compiler setup

The first thing to know is that kernel modules must be compiled with the same gcc version as the kernel they will run on. Since it is Amazon that originally compiled the kernel we need to determine the gcc version. Luckily, that is a simple task since this information is saved in the compiled modules. Therefore we can find out by issuing the following command on an unmodified running EC2 instance:

[root@ src]# modinfo dm_mod
filename:       /lib/modules/2.6.16-xenU/kernel/drivers/md/dm-mod.ko
license:        GPL
author:         Joe Thornber < dm-devel@redhat.com >
description:    device-mapper driver
depends:
vermagic:       2.6.16-xenU SMP 686 gcc-4.0
parm:           major:The major number of the device mapper (uint)

It turns out that the kernel (and modules) were compiled with gcc-4.0 for the 686 architecture. Now, we must bring up an instance that has that version of gcc installed. In my case, I think I booted Amazon’s developer image (ami-26b6534f) but you can pick any other that comes with gcc 4.0.
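To avoid reading the version off by eye, the gcc part of the vermagic string can be extracted with a few lines of shell. This is only a convenience sketch; the function name `vermagic_gcc` is made up here.

```shell
# vermagic_gcc "VERMAGIC STRING"
# Prints the gcc version embedded in a vermagic string, e.g. "4.0".
# Exits non-zero if the string has no gcc-* field.
vermagic_gcc() {
  local field
  for field in $1; do            # unquoted on purpose: split on spaces
    case $field in
      gcc-*) echo "${field#gcc-}"; return 0 ;;
    esac
  done
  return 1
}
```

On a stock instance, `vermagic_gcc "$(modinfo -F vermagic dm_mod)"` prints the compiler version the modules were built with, and `gcc -dumpversion` on the build instance shows what is installed there. The two should agree at least in their major and minor numbers.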

Once the instance with the right compiler is up, we need to copy the kernel sources and patches. The Amazon kernel sources (patched with Xen) can be found at http://s3.amazonaws.com/ec2-downloads/linux-2.6.16-ec2.tgz and patches for a given sub-release version can be found at http://www.kernel.org/pub/linux/kernel/v2.6/.

Our latest CentOS RightImages already provide an untarred copy of the Amazon kernel sources in /usr/src/linux-2.6.16-xenU, so there’s no need to download it. Therefore, the only thing I needed to download was the latest existing linux patch, which happened to be 2.6.16.53.

Once we have these files on the ec2 instance, we are ready to configure and patch the kernel, and then recompile the modules.

Configuring, patching and compiling the new kernel and modules

To configure the kernel, we can use the built-in config facility of the running Amazon kernel. For that, simply uncompress the original Amazon sources and construct the “.config” file from the instance’s /proc filesystem. For example:

cd /usr/src/linux-2.6.16-xenU/
gunzip < /proc/config.gz > .config

Then apply the latest kernel patch on top of that. Here, the tricky part is that we’ll be trying to apply a patch prepared for the vanilla kernel version, but on top of a Xen-modified version. Therefore, this will likely result in conflicts when applied as is. While the patch I applied didn’t result in any conflict I couldn’t easily resolve, this might not always be the case. If you know what you are doing and the extent of the code you want to fix (or upgrade), you should just patch the affected files (usually only modules) and forget about any core kernel fixes. Remember that any kernel upgrades/fixes outside a loadable module won’t be visible anyway, since Amazon will always replace the kernel of an instance before booting.

For example, applying a complete patch to the amazon kernel will look something like:

bzip2 -d /tmp/patch-2.6.16.53.bz2
cd /usr/src/linux-2.6.16-xenU/
patch  -p1 < /tmp/patch-2.6.16.53
find . -name '*.rej'
./arch/x86_64/ia32/Makefile.rej
./arch/i386/kernel/vm86.c.rej
./net/core/skbuff.c.rej
./Makefile.rej
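If you would rather know about the conflicts before any file is touched, GNU patch has a `--dry-run` option. The wrapper below is a sketch with a made-up name, `preview_patch`; it expects the patch file as an absolute path because it changes directory first.

```shell
# preview_patch SOURCE_DIR PATCH_FILE
# Reports what "patch -p1" would do in SOURCE_DIR without modifying anything.
# Returns non-zero if at least one hunk would be rejected.
preview_patch() {
  local src=$1 patchfile=$2
  (cd "$src" && patch -p1 --dry-run < "$patchfile")
}
```

For the case above this would be `preview_patch /usr/src/linux-2.6.16-xenU /tmp/patch-2.6.16.53`. Hunks reported as failed in the output are the ones that would otherwise end up in `.rej` files.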

Once you’ve resolved the conflicts we’re ready to compile and install:

make
make modules_install

If something broke, go back to the conflicts and fix whatever is broken. Once it all compiles, you should have the brand new modules installed in the “/lib/modules/2.6.16-xenU” directory! At that point you can take them for a spin and see if they can be loaded correctly. In the case of lvm, we can try to unload the existing ‘dm’ modules first (if any was loaded) and then load our new ones. If they load correctly we’ll have brand new, bug-fixed kernel modules at work for us.
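A module built with the wrong compiler or kernel version is refused at load time, so it can be worth comparing vermagic strings first. The helper below is a sketch (the name `check_vermagic` is made up); it compares the string the stock kernel expects with the one in a freshly built module.

```shell
# check_vermagic "EXPECTED" "ACTUAL"
# Succeeds if both vermagic strings are equal, ignoring extra whitespace.
check_vermagic() {
  local expected actual
  expected=$(echo $1)   # unquoted on purpose: collapses runs of spaces
  actual=$(echo $2)
  if [ "$expected" = "$actual" ]; then
    echo "vermagic matches: $actual"
  else
    echo "vermagic mismatch: built '$actual', kernel expects '$expected'" >&2
    return 1
  fi
}
```

For example, with the stock value shown earlier: `check_vermagic "2.6.16-xenU SMP 686 gcc-4.0" "$(modinfo -F vermagic drivers/md/dm-mod.ko)"`, run from the kernel source directory after the build. The expected string is best captured from an unmodified instance, since `make modules_install` overwrites the stock modules.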

Packaging the new modules to use for any future ec2 instance

The next step is to get these newly compiled modules and package them properly so we can use them in any of the ec2 instances that we wish. In my case, I used our RightScript infrastructure which allowed me to upgrade any of the templates that use LVM tools within minutes.

All I had to do is to package the kernel modules in a .tgz file and attach it to a new boot RightScript. This boot RightScript installs (i.e., replaces) the modules upon boot, removes any pre-loaded ‘dm’ modules, and loads up the newly installed ones. Here is the complete script:

#!/bin/bash -e
# Copyright (c) 2007 by RightScale Inc., all rights reserved

# First upgrade the kernel modules with some lvm fixes
# Try to unload the dm modules if any are loaded (hopefully none will be in use)
echo "Unloading DM modules:"
for m in $(grep '^dm_' /proc/modules | cut -d' ' -f1); do echo "$m"; modprobe -q -r "$m"; done

echo "Installing new/custom kernel modules..."
(cd /lib/modules/ && tar xzf "$ATTACH_DIR/modules-2.6.16.53-xenU.tgz")
echo "Loading the device mapper driver..."
modprobe dm_mod

If you are not familiar with our scripts, any attachment uploaded to the web site will automatically be sent to the booting instance and the ATTACH_DIR environment variable is automatically set to reflect that temporary directory such that the RightScripts can locate it. In this case, only a single tgz file containing the modules was attached.
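The boot script unpacks the archive from inside /lib/modules, so the paths stored in the tgz need to start with the kernel release directory. One way to build such an archive is sketched below; the function name `package_modules` is made up, and the exact command used for the original attachment is not recorded here.

```shell
# package_modules MODULES_ROOT KERNEL_RELEASE OUTPUT_TGZ
# Archives MODULES_ROOT/KERNEL_RELEASE so that the stored paths start with
# KERNEL_RELEASE/, which is what "cd /lib/modules && tar xzf ..." expects.
package_modules() {
  local root=$1 release=$2 out=$3
  tar czf "$out" -C "$root" "$release"
}
```

For the modules built above this would be `package_modules /lib/modules 2.6.16-xenU /tmp/modules-2.6.16.53-xenU.tgz`, and `tar tzf` on the result lists what will land on the instance.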

Now that I have this RightScript (I called it “upgrade LVM kernel modules”) I can seamlessly patch all my server templates by adding it to the list of boot scripts. Voila! Without any other changes, I ensured that the next time any of these templates are instantiated they will use the latest kernel modules and all the nice enhancements and bug fixes that come with them. My database backups are a lot happier now without kernel panics!


Archived Comments

keving

Firstly, thanks for collecting all this and writing it all down. However, I can’t find gcc 4.0 on ami-9a9e7bf3:

[root@domU-12-31-36-00-29-81:] gcc --version
gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[root@domU-12-31-36-00-29-81:] gcc<tab>
gcc gcc34 gccmakedep
[root@domU-12-31-36-00-29-81:~] gcc

Also shouldn’t ‘gunzip < /proc/config.gz .config’ be ‘gunzip < /proc/config.gz > .config’ ??

thanks for your help

-k

blanquer

Kevin,

I’ve corrected the entry. You’re right, the image I had listed comes with 4.1 and not 4.0. I guess my memory was wrong and I booted another of our RightImages. But anyway, I’ve updated the entry to point to the basic Amazon developer image which does come with 4.0. Also, the redirection for the .config file was a formatting typo. I corrected it. Thanks for the feedback!

Josep

sevmax
I have trouble compiling the new kernel. When I run “make” I get this error:

CC arch/i386/kernel/cpu/amd.o
arch/i386/kernel/cpu/amd.c: In function ‘init_amd’:
arch/i386/kernel/cpu/amd.c:211: error: ‘X86_FEATURE_FXSAVE_LEAK’ undeclared (first use in this function)
arch/i386/kernel/cpu/amd.c:211: error: (Each undeclared identifier is reported only once
arch/i386/kernel/cpu/amd.c:211: error: for each function it appears in.)
make[2]: *** [arch/i386/kernel/cpu/amd.o] Error 1
make[1]: *** [arch/i386/kernel/cpu] Error 2
make: *** [arch/i386/kernel] Error 2

How can I correct this error? Thanx.

blanquer

Sevmax,

If you want to recompile the whole patched kernel you’ll need to resolve the conflicts manually. In this particular case, ‘X86_FEATURE_FXSAVE_LEAK’ is a new processor feature that doesn’t seem to be compatible with the version of Xen (i.e., this feature doesn’t exist in the Xen patch that is applied to the kernel).

To fix it, just comment out or remove the lines that set the bit. In my patched version these are lines 210-211 of ‘arch/i386/kernel/cpu/amd.c’. They look like:

       if (c->x86 >= 6)
                set_bit(X86_FEATURE_FXSAVE_LEAK, c->x86_capability);

Good luck,

Josep M.

sevmax

blanquer, thanx for the answer. But when I deleted these lines, I got new errors:

arch/i386/kernel/vm86.c: In function ‘do_sys_vm86’:
arch/i386/kernel/vm86.c:318: error: ‘eax’ undeclared (first use in this function)
arch/i386/kernel/vm86.c:318: error: (Each undeclared identifier is reported only once
arch/i386/kernel/vm86.c:318: error: for each function it appears in.)
arch/i386/kernel/vm86.c:318: error: invalid lvalue in asm output 0
make[1]: *** [arch/i386/kernel/vm86.o] Error 1
make: *** [arch/i386/kernel] Error 2

When I looked at the file vm86.c, I saw that the function do_sys_vm86() appears several times, i.e. this function cannot simply be removed.

Thanks for attention.

sevmax

I can’t compile the kernel on my system following this guide. On both your CentOS 5 and the Amazon Developer AMIs I have trouble with arch/i386/kernel/vm86.c. If you have compiled the kernel successfully, please help me compile it.

Thanks for your help
sevmax

sevmax
Without patching the kernel I was able to install the modules, and the necessary modules load. But after a reboot the modules were not loaded. Maybe I can run a script like the “upgrade LVM kernel modules” script to load nbd correctly? =)

blanquer

OK, I thought that the conflict resolution of the patch was almost trivial, but since there are questions about it I’ll detail the changes necessary to do it, so it is on the record.

Here’s the list of changes to resolve the conflicts and have a successful compilation (this is based only on the versions I mention in the blog post):

1) For “./arch/x86_64/ia32/Makefile”: add (-Wa,-32) to the FLAGS in lines 31 and 32
2) For “./arch/i386/kernel/vm86.c”: add a “long eax;” line after the “#endif” in line 261
3) “./net/core/skbuff.c” is a conflict but it doesn’t need to be modified (the Xen patch already fixed/changed it)
4) “./Makefile” (I believe this was only the version name or something trivial like that)
5) “./arch/i386/kernel/cpu/amd.c”: there is a new processor feature that is not compatible with the code patched by Xen. Remove it or comment it out, i.e. lines 210 and 211:

       if (c->x86 >= 6)
                set_bit(X86_FEATURE_FXSAVE_LEAK, c->x86_capability);

That’s all it really takes (or took) for the 2.6.16.53 patch.

A couple more things, sevmax:

1- The new modules will not be loaded upon booting a new instance until you copy them and load them yourself (which might require unloading the old ones first).
2- If you haven’t been able to resolve these small compilation problems by yourself, I would strongly suggest reconsidering whether you really need to recompile kernel modules yourself. Although it looks like any other “config->make->install” type of operation, it is actually a little more serious than that. If this is just a learning process for you, then, by all means hack away and experiment!

Let us know if you’ve successfully completed the process. Good luck!

Josep M.

sevmax

Hello Josep M. aka blanquer! Thank you very much for help.

I had compiled and installed kernel modules on Fedora Core 6

But now I have another “small” trouble:

# modprobe nbd
FATAL: Error inserting nbd (/lib/modules/2.6.16-xenU/kernel/drivers/block/nbd.ko): Invalid module format

In instance log: “nbd: version magic ‘2.6.16-xenU SMP 686 gcc-4.1’ should be ‘2.6.16-xenU SMP 686 gcc-4.0’”

So, I must replace gcc 4.1 with gcc 4.0. I can’t find how to replace gcc via yum. Must I remove gcc 4.1 via yum and install gcc 4.0 from sources? Thanks!

joka

Are you still able to use xfs or reiserfs when using the patched modules? I get the following errors:

reiserfs: disagrees about version of symbol is_bad_inode
reiserfs: Unknown symbol is_bad_inode
reiserfs: disagrees about version of symbol make_bad_inode
reiserfs: Unknown symbol make_bad_inode

or

exportfs: disagrees about version of symbol is_bad_inode
exportfs: Unknown symbol is_bad_inode
xfs: Unknown symbol find_exported_dentry
xfs: disagrees about version of symbol is_bad_inode
xfs: Unknown symbol is_bad_inode
xfs: disagrees about version of symbol make_bad_inode
xfs: Unknown symbol make_bad_inode

I’m hoping I just messed something up along the way.


Configuring servers with RightScripts

With EC2 the way one configures a server is by creating a machine image (AMI). But there has to be a better way! The first time I created an AMI almost a year ago it was a lot of fun. Hitting “launch” and seeing a new instance pop up was really cool. But a few weeks later reality started to set in as I was coming up with little tweaks almost on a daily basis: anything from changing an SSHD option to installing additional software required re-imaging, and that began to be painful. I pretty quickly settled into the following procedure:

  1. make the change on a current running instance to make sure it works
  2. launch the original image so I have a clean instance to modify
  3. have a cup of coffee while the instance starts up
  4. make the change again, test again
  5. bundle the instance, have lunch waiting for the bundling to complete
  6. launch the new image
  7. have another cup of coffee waiting for the instance to launch
  8. test again that everything is good
  9. launch additional instances for all the servers that use the modified image
  10. move data from the old instances to the new ones and switch service over

Well, in truth, most of the time after step 7 I discovered some little problem forcing me to circle back to step 2. Usually, by the time I was done the better part of a day had gone by and I didn’t exactly feel productive! Pretty quickly I stopped revving the images and instead kept a set of notes of the fixes I would apply manually to a fresh instance before it would be ready and I kept pushing off the moment of rolling those fixes into a fresh image. You can imagine that this only meant that the config of my servers quickly started to drift apart as I applied various fixes to different parts of the fleet. Not a good way to run a reliable service!

Another problem I faced is that a machine image is very static and I wanted a more dynamic instance configuration system. Images are static and in the end each image really represents one single server. Yet a lot of the power of EC2 lies in multiplying servers, either to handle load (scaling up) or to provide flexibility for testing, development, new projects, additional developers, etc, etc, etc. In all these cases I want a more modular system where I have a base image with software that I need pretty much everywhere and that doesn’t need frequent updates. On top of that I want to layer software modules—think RPM package plus customization, like Apache configured as reverse proxy for Rails just to take one example. Finally, I want to tie the server into a single- or multi-server deployment so it can talk to the right database server, have the right hostname, download the right data dump from S3, etc.

For the past months we’ve been developing a server configuration system called RightScripts that supports our own and our customers’ needs. We need to be able to define servers out of building blocks and then gang them together into multi-server deployments. When we have a multi-server config set-up we need to be able to easily clone it, change some deployment specific environment variables, and launch a second deployment. Or instead of having just one app server, we want to be able to define an array of app servers and have RightScale launch more as the load goes up (and reap some as the load reduces to save money). All this requires thinking about servers as objects that we multiply and dynamically configure, not as something we freeze and thaw.

Enough theory, how do RightScripts look in practice? We start with a base image, which typically is the CentOS 5 RightImage we made available a few months ago. It contains the stuff that we expect to find on all servers: the various EC2 command line tools, perl, ruby, java, gcc, crypto stuff, syslog-ng, a specific sshd config, etc. On top of this base we defined a number of software modules, each one really being a shell script that installs and configures a piece of software for a specific purpose. The script usually needs one or several packages installed and it may need some file attachments. Let’s take an example: our Apache base install starts by installing the CentOS httpd, httpd-devel, and mod_ssl packages (i.e. `yum install httpd httpd-devel mod_ssl`) and then consists of the following bash script:

service httpd stop
if [ -d /mnt/www ]; then
  echo "Apache Base Files Exist"
else
  ## Move Apache
  mv /var/www /mnt
  ln -nsf /mnt/www /var/www
fi
## Move Apache Logs
rm -fr /var/log/httpd
mkdir -p /mnt/log/httpd
ln -s /mnt/log/httpd /var/log/httpd
## Set Admin Email (sed, since perl would treat the "@" in the pattern and in the address as an array sigil)
sed -i "s/root@localhost/$ADMIN_EMAIL/" /etc/httpd/conf/httpd.conf
## Change to threaded worker
perl -p -i -e 's/#HTTPD=\/usr\/sbin\/httpd.worker/HTTPD=\/usr\/sbin\/httpd.worker/' /etc/sysconfig/httpd
## Modify Server Signature and enable extended status
perl -p -i -e 's/ServerSignature On/ServerSignature Off/' /etc/httpd/conf/httpd.conf
perl -p -i -e 's/#ExtendedStatus On/ExtendedStatus On/' /etc/httpd/conf/httpd.conf
## Disable php for now
mv /etc/httpd/conf.d/php.conf /etc/httpd/conf.d/php.disabled

As you can see, this is really simple, and it should be! With the above definition entered as a boot script into RightScale we can associate it with any server where it’s needed and RightScale will take care of installing the packages and executing the script at boot time. The nice thing is that the script doesn’t bake the apache package version into an image: we get the latest one or we could select a specific one in the script. Also, it’s easy to see what is changed from the distribution’s installation and tweak it incrementally to different needs so when we want a somewhat different set-up we’re not diffing installations, we’re just looking at the script for what was changed. Finally, since we’ve configured a number of different servers using these scripts we’ve had to update the base CentOS image and it was a breeze to restart all the servers with the new base image plus the specific set of scripts needed for each server. Before, we would have had to apply the same base CentOS change to each image of each server!

There are two additional levels of functionality that I’ll leave for future posts. These RightScripts don’t have to be executed at boot time, they can also be executed later as “pre-configured” actions that can be run when something changes. For example to add a new app server instance to a load balancer instance config. Also, when servers are tied into multi-server deployments, the scripts can refer to variables which provide environment-specific values, such as the IP address of the database server to connect to, or the number of apache processes to run. But more of that in another post!

If you’re interested in learning more about RightScripts, please contact us at sales@rightscale.com. RightScripts are available with RightScale’s free accounts, but more advanced features are reserved for the premium accounts.
