This was originally posted on the Zooniverse blogs here.
In my last post I described at length the domain model that we use to describe conceptually what the Zooniverse does. That wouldn't mean much without an implementation of that model and so in this post I’m going to describe some of the tools and technologies that we use to actually run our citizen science projects.
The lifecycle of a Zooniverse project
Let's think a little more about what happens when you visit a project such as Snapshot Serengeti. Ignoring all of the to-and-fro that your web browser does to work out where the domain name 'snapshotserengeti.org' points to, once it has figured out this and a few other details you basically get sent a website that your browser renders for you. For the website to function as a Zooniverse project a few things are essential:
- You need to be able to view images (or listen to audio or watch a video) that we and the science team need your help analysing.
- You need to be able to log in with your Zooniverse account.
- We need to capture what you said when doing the citizen science analysis task.
- You need to be able to save favourite images to your profile.
- You need to be able to view the images you've recently seen in your profile.
- You need to be able to discuss these images with the community.
It turns out that pretty much all of the functionality mentioned above is delivered by two pieces: an application we call Ouroboros, which acts as an API layer, and a website (such as Snapshot Serengeti) that talks to it.
Ouroboros – or 'why the simplest API that works is probably all you need'.
So what is Ouroboros? It provides an API (REST/JSON) that allows you to build a Zooniverse project that has all of the core components (1-6) listed above. Technology-wise it's a custom Ruby on Rails application (Rails 3.2) that uses MongoDB to store data and Redis as a query cache all running on Amazon Web Services. It's probably utterly useless to anyone but us but for our needs it's just about perfect.
At the Zooniverse we're optimised for a few different things. In no particular order of priority they are:
- Volume – we want to be able to build lots of projects.
- Science – we want it to be easy to do science with the efforts of our community.
- Scale/performance – we want millions of people to be able to come to our projects and for the projects to stay up.
- Availability – we'd prefer our websites to be 'up' and not 'down'.
- Cost – we want to keep costs at a manageable level.
Pretty much all of these requirements point to having a shared API (Ouroboros) that serves a large number of projects (I’ll argue #4 in the pub with anyone who really wants to push me on it).
Running a core API that serves many projects makes you take the maintenance and health of that application pretty seriously. Should Ouroboros throw a wobbly then we'd currently take out about 10 Zooniverse projects at once and this is only set to increase. This means we've thought a lot about how to scale the application for times when we're busy and we also spend significant amounts of time monitoring the application performance and tuning code where necessary. I mentioned that cost is a factor – running a central API means that when the Zooniverse is quiet and there aren't many people about we can scale back the number of servers we're running (automagically on Amazon Web Services) to a minimal level.
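To make the 'scale back when it's quiet' idea concrete, here is a minimal sketch of the arithmetic behind sizing a pool of servers to the current load. It is an illustration only: the function name, the per-server capacity and the minimum and maximum pool sizes are all assumptions made up for this example, and it is not how Ouroboros or Amazon Web Services actually make the decision.

```python
import math


def desired_servers(requests_per_second, capacity_per_server,
                    min_servers=1, max_servers=20):
    """Return how many servers a pool would need for the given load.

    All parameters are hypothetical: capacity_per_server is the number of
    requests per second one server is assumed to handle comfortably.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    # Round up so that a partially loaded server still counts as a whole one.
    needed = math.ceil(requests_per_second / capacity_per_server)
    # Never drop below the minimal level, never grow past the ceiling.
    return max(min_servers, min(max_servers, needed))
```

With these made-up numbers, a quiet period with no traffic still keeps the minimum number of servers running, while a busy period grows the pool only as far as the configured maximum.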
We've not always built our projects this way. The original Galaxy Zoo (2007) was an ASP/web forms application, and the projects between Galaxy Zoo 2 and SETI Live were all separate web applications, many of them built using an application called The Juggernaut. Building standalone applications every time not only made our projects difficult to maintain, but we also found ourselves writing very similar (but subtly different) code many times over for things like choosing which Subject to show next.
Ouroboros is an evolution of our thinking about how to build projects: what's important and generalisable and what isn't. At its most basic it's a really fast Subject allocator and Classification collector. Our realisation over the last few years was that the vast majority of what’s different about each project is the user experience and classification interface, and this has nothing to do with the API.
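As a rough picture of what 'Subject allocator and Classification collector' means, here is a minimal in-memory sketch. It is not the Ouroboros code (which is a Rails application backed by MongoDB and Redis); the class name, method names and record fields are all hypothetical, and the allocation rule (hand out the first Subject this user hasn't classified yet) is just the simplest one that works for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SubjectQueue:
    """Hypothetical stand-in for a Subject allocator and Classification collector."""

    subject_ids: list
    # Maps user_id -> set of subject ids that user has already classified.
    seen: dict = field(default_factory=dict)
    classifications: list = field(default_factory=list)

    def next_subject(self, user_id):
        """Return the first Subject this user has not classified, or None."""
        already_seen = self.seen.get(user_id, set())
        for subject_id in self.subject_ids:
            if subject_id not in already_seen:
                return subject_id
        return None

    def collect(self, user_id, subject_id, annotations):
        """Store a Classification and return it serialised as JSON."""
        record = {
            "user_id": user_id,
            "subject_id": subject_id,
            "annotations": annotations,
        }
        self.classifications.append(record)
        self.seen.setdefault(user_id, set()).add(subject_id)
        return json.dumps(record, sort_keys=True)
```

A project front end would only ever need the two calls: ask for the next Subject, then send back a Classification. Everything else (the drawing tools, the questions asked, the look and feel) lives in the website rather than behind the API.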
The actual projects
Having a standard API and client library for talking to it meant that we built the Zooniverse project Planet Four in less than 1 week! That’s not to say it's trivial to build projects, it's definitely not, but it is getting easier. And having this standardised way of communicating with the core Zooniverse means that the bulk of the effort when building Planet Four was exactly where it should be – the fan drawing tools – the bit that's different from any of our other projects.
Currently the majority of our projects are hosted using the Amazon S3 static website hosting service. The benefits of this are numerous but key ones for us are:
- There's no webserver serving the site content; that is, http://www.galaxyzoo.org resolves to an S3 bucket. When you access the Galaxy Zoo site S3 does all of the hard work and we just pay for the bandwidth from S3 to your computer.
- Deploying is easy. When we want to put out a new version of any of our sites we just upload new timestamped versions of the files and your browser starts using them instead.
- It's S3! – Amazon S3 is quite a remarkable service, and a significant fraction of the web uses it. Currently hosting more than 2 trillion (yes, that's 12 zeroes) objects and regularly serving more than 1 million requests for data per second, the S3 service is built to scale and we get to use it (and so can you).
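The 'timestamped versions' approach to deploying can be sketched roughly as follows. This is an assumption about the general technique rather than a description of our actual deploy scripts: the function names, the filename pattern and the idea of rewriting the index page to point at the new files are all hypothetical, and the upload to S3 itself is left out.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def timestamped_name(path, deployed_at):
    """Insert a UTC timestamp before the file extension of an asset path."""
    p = PurePosixPath(path)
    stamp = deployed_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return str(p.with_name(f"{p.stem}-{stamp}{p.suffix}"))


def rewrite_index(index_html, asset_paths, deployed_at):
    """Point the index page at the timestamped copy of each asset."""
    for path in asset_paths:
        index_html = index_html.replace(path, timestamped_name(path, deployed_at))
    return index_html
```

Because each deploy produces new filenames, a browser holding a cached copy of an old file never confuses it with the new one; it simply requests the new name the next time it loads the index page.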
If you're like me then when you read something you read the opening, look at the pictures and then skip to the conclusions. I'll summarise here just in case you're doing that too:
In the Zooniverse there's a clear separation between the API (Ouroboros) and the citizen science projects that the community interact with. Ouroboros is a custom-built, highly scalable application built in Ruby on Rails, that runs on Amazon Web Services and uses MongoDB, Redis and a few other fancy technologies to do its thing.
What I didn't talk about in this post are the hardest bits we've solved in Ouroboros – namely all of the logic about how to find Subjects for people quickly, and other 'smart stuff'. That's coming up next.