Elastic Load Balancing on EC2 redux

A few months back I wrote about how we switched the Galaxy Zoo HAProxy load balancers to Amazon Web Services (AWS) Elastic Load Balancers (ELB). At that point we had basically just swapped out HAProxy (running on its own EC2 small instance) for an ELB but weren't making any use of the auto-scaling features also on offer. For the past few days I've been playing around with auto-scaling our API layer with the ELB that's already in place and this morning I pushed the changes into production.

» Getting started

As I mentioned earlier, we already had an ELB in place so we didn't need to create a new one - instead we're adding here auto-scaling to an ELB that's already in place. For completeness however, this is the command used to create the original ELB:

>> elb-create-lb ApiLoadBalancer --zones us-east-1b --listener "lb-port=80, instance-port=80, protocol=TCP" --listener "lb-port=443, instance-port=8443, protocol=TCP"

» De-register existing ELB instances

As we already had a couple of instances registered with the ELB I found the easiest way to get auto-scaling up and running was to remove the existing instances before proceeding:

>> elb-describe-instance-health ApiLoadBalancer
INSTANCE i-abcdefgh InService
INSTANCE i-ijklmnop InService
>> elb-deregister-instances-from-lb -lb ApiLoadBalancer --instances i-abcdefgh i-ijklmnop No Instances currently registered to LoadBalancer

» Create a launch configuration

Before you can introduce auto-scaling you need to have a couple of things in place - an Amazon Machine Image (AMI) that upon boot is immediately ready to serve your application and a launch configuration compatible with your currently ELB-scaled nodes (security groups etc.). Depending upon your setup, always having an AMI ready to launch with the latest version of your production codebase is probably the hardest thing to achieve here. Once you have your AMI in place and your security group and key-pair settings to hand you're ready to create your launch configuration:

>> as-create-launch-config ApiLaunchConfig --image-id ami-myamiid --instance-type m1.small --key ssh_keypair --group "elb security group name"
OK-Created launch config

» Create an auto-scaling group

Once you have a launch configuration in place it's time to create an auto-scaling group. Auto-scaling groups need as a minimum to know what launch configuration, which load-balancer to use, which availability zone and the minimum and maximum to scale to. We never run the Galaxy Zoo API on anything less than 2 nodes and so to create our auto-scaling group I issued a command something like this:

>> as-create-auto-scaling-group ApiScalingGroup --launch-configuration ApiLaunchConfig --availability-zones us-east-1b --min-size 2 --max-size 6 --load-balancers ApiLoadBalancer
OK-Created AutoScalingGroup

At this point it's worth noting that although we'd removed all of the instances being load balanced by the ApiLoadBalancer ELB, because the auto-scaling group set a minimum number of instances of 2 checking the status of the auto-scaling group showed that 2 new instances were spinning up:

>> as-describe-scaling-activities ApiScalingGroup
ACTIVITY 78bf4e0d-f72b-4b5b-a044-6b99942088ed 2009-08-24T07:19:28Z Successful "At 2009-08-24 07:16:12Z a user request created an AutoScalingGroup changing the desired capacity from 0 to 2. At 2009-08-24 07:17:17Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 2."

I don't know about you but I think that's pretty AWESOME!

» Create some launch triggers

To complete the auto-scaling configuration, you need to define the rules that increase and decrease the number of load-balanced instances. Currently we have a very simple rule based upon CPU load - if the average CPU load over the past 120 seconds is greater than 60% we introduce a new instance, if the CPU average drops below 20% then we remove an instance:

>> as-create-or-update-trigger ApiCPUTrigger --auto-scaling-group ApiScalingGroup --namespace "AWS/EC2" --measure CPUUtilization --statistic Average --dimensions "AutoScalingGroupName=ApiScalingGroup" --period 60 --lower-threshold 20 --upper-threshold 60 --lower-breach-increment=-1 --upper-breach-increment 1 --breach-duration 120
OK-Created/Updated trigger

These triggers will almost certainly require refinement but helpfully the as-create-or-update-trigger command will create a new trigger if one doesn't exist or update an existing trigger based upon the new parameters.

» That's it!

Once again it's been a breeze to introduce the latest AWS features into our production stack. Moving Galaxy Zoo to AWS has completely changed the way we think about running our web applications - we've gone from having a group of 'pet' servers we each know the name of to having a fault-tollerant, auto-scaled web-stack ready for the future.