This morning Jeff Bar announced a new service offering from Amazon Web Services (AWS): Amazon Relational Database Service. The Amazon Relational Database Service (or RDS for short) is a new web service that allows users to create, manage and scale relational databases without having to configure the server instance, database software and storage that the database runs on. In short, this is a service that has the potential to take much of the headache out of database management.
At Galaxy Zoo we run our database on a combination of MySQL 5.1, EC2/Ubuntu Hardy and XFS/EBS storage. While there are some excellent guides on how best to configure a database running on AWS, operating a database in a virtualised environment requires that you plan for the worst case scenario of the virtualised server failing and the filesystem disappearing along with it. Because of this the steps required to configure a new database using 'persistent' storage (i.e. on an Elastic Block Store volume) are numerous:
Launching a new EC2 instance and installing the database software is pretty simple however for convenience we have the Galaxy Zoo database image saved out as a custom AMI.
Launching a new EC2 image without a database engine installed means that you probably need to update the server software (e.g. apt-get update on Ubuntu) and then install MySQL or your database of choice.
Next up you need to create a new EBS volume, attach it to the EC2 instance and format the filesystem on the EBS volume.
Setting up a new database instance on EC2 is clearly non-trivial and requires knowledge of EBS, mount points, filesystems, not to mention configuring the MySQL settings for the chosen size of EC2 instance that you have.
With the introduction of RDS, Amazon has removed almost all of the difficulty in setting up and configuring a new MySQL database that is both scalable and reliable. Creating a new database instance now is as simple as issuing a single command:
>> rds-create-db-instance --db-instance-identifier mydatabase --allocated-storage 20 --db-instance-class db.m1.small --engine MySQL5.1 --master-username root --master-user-password password
With this command I have created a new m1.small MySQL 5.1 database server with 20Gb of storage and configured the master username and password. Provisioning a new RDS instance took a few minutes and during the provisioning you can check on the progress with the command:
>> rds-describe-db-instances Once available, your new RDS instance is given a hostname that you can then use to connect with the standard MySQL port of 3306. Actually, it's not quite that simple - before you can connect you need to assign which AWS security groups are allowed to connect to your RDS instances. I found this step a little confusing but essentially you need to configure is which EC2 instances running under their respective security groups are allowed to connect. For Galaxy Zoo, we have a default security group for all of our web servers called 'web' and so to allow access from these servers I had to add this 'web' security group to the defaults for the RDS servers:
>> rds-authorize-db-security-group-ingress default --ec2-security-group-name web --ec2-security-group-owner-id 1234567789
At this point you have a RDS instance running MySQL 5.1 ready and waiting to serve up your databases. That's not where the benefits stop though - not only do you get the ease of creating new database instances but there are some very nice extras you also get by using the service.
At Galaxy Zoo, we usually our main 'classifications' database on a single EC2 small instance. In the last 8 months we've received something close of 45 million classifications and while the database has started to get a little sluggish, by writing user classifications to SQS and processing them asynchronously we are able to keep the site feeling nippy. Each month however, we try to send a newsletter to our 250,000 strong community and the increased load that this causes on the database means that for a couple of days we switch to a m1.large instance. The overhead of switching database servers is pretty annoying - place the site into maintenance mode, stop the MySQL server, detach the EBS volume, launch a new EC2 instance, attach the EBS volume to the new server... the list goes on. With RDS not only can you change the amount of disk space available to your RDS instance but you can also dynamically resize the server size (i.e. RAM/CPU). I can see that this is going to be a real win for us.
If there's on thing that my time at The Sanger Institute taught me, it's that managing and scaling large databases is a dark art. For the majority of small web applications it's not crucial whether the MySQL server configuration you're running is absolutely optimised for your hardware however now that we're reaching the limits of our current instance size, making sure the MySQL server is well configured is becoming important. Deciding how large your innodb_buffer_pool or key_buffer size should be is not obvious for most of us and so having a MySQL server configured to work well for the resources available to it is very comforting. Over the next couple of days I'm going to be benchmarking our standard MySQL setup to see how it compares against a RDS instance with the same resources. Watch this space!