There's an idea in the technical world called Service Oriented Architecture. It's widely recognized as a good way of building big, complex systems reliably and, crucially, with reusable components. Famously, Amazon's Jeff Bezos realised the value of this approach, and as a consequence something like 100 separate services are used to render a typical product page on Amazon.com.
Wikipedia defines SOA as:
A service-oriented architecture (SOA) is an architectural pattern in computer software design in which application components provide services to other components via a communications protocol, typically over a network. The principles of service-orientation are independent of any vendor, product or technology.
Over the past few months I've been noodling on the idea of Service Oriented Astronomy. My working definition is something like this:
Service oriented astronomy (SOA) is an approach whereby researchers develop novel methods for common (or specialist) data analysis tasks that are shared as hosted services with their peers.
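To make the definition a little more concrete, here is a minimal sketch of what sharing a method "as a hosted service" could look like, using only Python's standard library. Everything in it is an assumption for illustration: the `/classify` endpoint, the alert fields (`alert_id`, `mag`, `prev_mag`) and the toy `classify_alert` rule are hypothetical stand-ins for a research group's real method, not any LSST interface.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify_alert(alert: dict) -> dict:
    """Hypothetical toy classifier standing in for a group's real method.

    Flags alerts that brightened by more than 0.5 mag since the previous
    observation (smaller magnitude = brighter). The rule and the cut are
    illustrative assumptions only.
    """
    delta_mag = alert["prev_mag"] - alert["mag"]
    label = "candidate" if delta_mag > 0.5 else "uninteresting"
    return {"alert_id": alert["alert_id"], "label": label}


class ClassifyHandler(BaseHTTPRequestHandler):
    """Serves POST /classify: JSON alert in, JSON classification out."""

    def do_POST(self) -> None:
        if self.path != "/classify":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        alert = json.loads(self.rfile.read(length))
        body = json.dumps(classify_alert(alert)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the demo


def start_service(port: int = 0) -> HTTPServer:
    """Run the service in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ClassifyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(port: int, alert: dict) -> dict:
    """What a peer's code does: POST an alert, read back the result."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", "/classify", body=json.dumps(alert),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_service()
    port = server.server_address[1]
    print(call_service(port, {"alert_id": 1, "mag": 19.0, "prev_mag": 20.0}))
    # prints {'alert_id': 1, 'label': 'candidate'}
    server.shutdown()
```

A peer never installs or re-implements the method; they just send alerts to the endpoint. Swapping the toy rule for a better model changes nothing on the caller's side, which is the reuse argument for SOA in miniature.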
In this post I'm going to try to convince you that SOA (of the Astronomy kind) might be a good idea.
In 2020, following more than two decades of planning and construction, the Large Synoptic Survey Telescope (LSST) will see first light. If all goes to plan, it will enter survey mode shortly afterwards and begin producing vast quantities of data[1].
Most of the headlines about LSST focus on the data products: a continual 'Level 1' alert stream of things (transients) that have changed since that area of the sky was last observed, and annual 'Level 2' data releases of calibrated images. Because it exhibits at least three of the 'Vs' (Volume, Velocity and Variety) often considered defining characteristics of Big Data, LSST is generally considered a 'Big Data' project[2]. For me though, the most exciting part of LSST isn't that it's big data; it's a combination of the following factors:
Doing things live is hard, especially if you want to be responsive (and timely) to potentially interesting events.
A large number of the events being produced are likely to be of little interest to any single individual/research group.
It's big enough that most of us are woefully unprepared to deal with the data volume.
LSST isn't Facebook/Google 'big', but it's certainly big enough to present difficulties, given that the vast majority of us have never received any formal training in software development. Building fast, scalable and reliable tools for processing LSST datasets is not something we're likely to find easy.
Open is a prerequisite for large-scale innovation. If everyone[3] has access to LSST data, there's an opportunity to create secondary data products and a marketplace for their consumption.
Although each Level 1 alert is going to be pretty small (some positional information, basic photometry and a small thumbnail image), there are going to be lots of them (Volume), arriving at a very high rate[4] (Velocity), and given that they could be low-flying rocks, variable stars, supernovae or signals of currently unknown astrophysics, they're certainly going to be varied (Variety).
So while ingesting Level 1 data might be OK, doing something intelligent with it, i.e. making decisions on the fly, is going to be much, much harder. To add a further complication, many of the research efforts interested in consuming the alert stream are time-sensitive.
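One way to picture "making decisions on the fly" is a lazy filter that scores each alert as it arrives and forwards only those that clear a threshold, instead of waiting to process a whole night's data in bulk. This is a minimal sketch assuming a hypothetical `Alert` record and a toy `brightening_score`; a real scorer would be a proper classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Alert:
    """Hypothetical minimal alert record (real Level 1 alerts carry more)."""
    alert_id: int
    mag: float       # current magnitude
    prev_mag: float  # magnitude at the previous visit


def brightening_score(alert: Alert) -> float:
    """Toy score: how much the source brightened (positive = brighter)."""
    return alert.prev_mag - alert.mag


def filter_stream(
    alerts: Iterable[Alert],
    score: Callable[[Alert], float],
    threshold: float,
) -> Iterator[Tuple[Alert, float]]:
    """Lazily yield (alert, score) pairs whose score clears the threshold.

    Each alert is scored and forwarded or dropped as soon as it arrives,
    so downstream consumers never wait for the whole night's data.
    """
    for alert in alerts:
        s = score(alert)
        if s >= threshold:
            yield alert, s


if __name__ == "__main__":
    night = [Alert(1, 19.0, 20.0), Alert(2, 21.0, 21.0), Alert(3, 18.0, 19.5)]
    for alert, s in filter_stream(night, brightening_score, threshold=0.5):
        print(alert.alert_id, s)
```

Because `filter_stream` is a generator, it behaves the same on a finite list or an unbounded live feed, which is the property a time-sensitive consumer cares about.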
If, for example, we're interested in capturing high-resolution spectra of distant Type Ia supernovae, then we have the additional compounding factors of being very time-sensitive and highly intolerant of false positives (because followup is expensive).
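A little arithmetic shows why false positives dominate when followup is expensive. The inputs below are invented purely for illustration (10 million alerts in a night, 1 in 10,000 being a Type Ia supernova worth following up, a classifier that catches 90% of them and wrongly flags 0.1% of everything else); they are not LSST figures, and `followup_outcomes` is a hypothetical helper.

```python
from typing import Tuple


def followup_outcomes(
    n_alerts: float, base_rate: float, tpr: float, fpr: float
) -> Tuple[float, float, float]:
    """Expected (true positives, false positives, precision) for one night.

    n_alerts:  alerts classified in the night
    base_rate: fraction of alerts genuinely worth following up
    tpr:       fraction of genuine targets the classifier flags
    fpr:       fraction of non-targets the classifier wrongly flags
    """
    n_targets = n_alerts * base_rate
    n_others = n_alerts - n_targets
    true_pos = tpr * n_targets
    false_pos = fpr * n_others
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged > 0 else 0.0
    return true_pos, false_pos, precision


if __name__ == "__main__":
    # Illustrative, made-up inputs; not LSST numbers.
    tp, fp, prec = followup_outcomes(10_000_000, 1e-4, tpr=0.9, fpr=0.001)
    print(f"true positives ~{tp:.0f}, false positives ~{fp:.0f}, precision ~{prec:.1%}")
```

Under those assumptions the function gives about 900 true positives against about 9,999 false positives, a precision of roughly 8%: a classifier that is right about 99.9% of the uninteresting alerts would still send more than nine out of ten followup observations after the wrong target.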
Supernova science is just one of many use cases where a large volume of alerts needs to be turned into a much smaller stream of high-quality candidates for (timely) followup observation[5].
Continuing with Type Ia supernova discovery as our use case, what does a typical research and discovery process look like[6]?
Individually, none of these steps presents an insurmountable problem. It's the combination of a noisy (and high-volume) alert stream that requires a sophisticated (fast and accurate) first-level classification, followed by an expensive followup observation, that means we've got problems.
Open source is modular; your software is the value-add on top of a rich ecosystem of reusable components. In many ways, Service Oriented Architecture is an implementation of the same ideas: let's build the thing we need once and then have everyone use the reference (best?) version. Of course, people might still choose to re-implement a component, but in open source that's usually out of preference rather than necessity.
Across astronomy, there are a small number of teams combining a deep theoretical grounding in novel methods with high-quality software implementations that solve real astrophysical problems. Hell, some of them even have Bay Area startups.
That means there are people who are getting really, really good at solving some of the very hard problems that LSST is going to present us with. Why wouldn't you want to incorporate their knowledge into your analysis?
What if there were a way to easily plug the premier Type Ia detection algorithm into your research and subscribe to its alert stream rather than the raw LSST Level 1 alerts?
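The "subscribe to a classified stream" idea maps naturally onto publish/subscribe. Here is a minimal in-memory sketch, assuming hypothetical topic names (`raw-alerts`, `snia-candidates`) and a toy `score` field on each alert; a production version would sit on a real message broker and network transport, which this deliberately leaves out.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class Broker:
    """Minimal in-memory publish/subscribe broker keyed by topic name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


def attach_classifier(
    broker: Broker,
    source_topic: str,
    output_topic: str,
    is_candidate: Callable[[dict], bool],
) -> None:
    """Hypothetical classifier service: republish only the alerts it accepts."""

    def on_alert(alert: dict) -> None:
        if is_candidate(alert):
            broker.publish(output_topic, alert)

    broker.subscribe(source_topic, on_alert)


if __name__ == "__main__":
    broker = Broker()
    # One group runs the (toy) classifier once for everyone...
    attach_classifier(broker, "raw-alerts", "snia-candidates",
                      lambda a: a["score"] > 0.9)
    # ...and any number of researchers subscribe to its output.
    broker.subscribe("snia-candidates", lambda a: print("follow up", a["alert_id"]))
    for i, score in enumerate([0.2, 0.95, 0.5, 0.99]):
        broker.publish("raw-alerts", {"alert_id": i, "score": score})
```

The design point is that subscribers only ever see `snia-candidates`; improving the classifier upstream benefits every subscriber without any of them changing their code.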
This is what I think Service Oriented Astronomy could be.
Not everything needs to be a service. Here are some ways to identify potentially good candidates:
It would also be disingenuous of me not to talk about some significant barriers to making this happen. Namely, even if someone built an incredible piece of software and ran it as a service that a large fraction of the community used, they wouldn't be guaranteed a good career in academia, because of our community's obsession with papers and citations. So, some things to address:
Of course, none of this discussion is unique to astronomy. LSST just feels like a good opportunity (through necessity) to drive significant change in the way we do our research. The same could be said for many other areas of science where there is a large source of open data. Perhaps we should just be calling this Service Oriented Science (SOS).
Service Oriented Architecture (and its close relative Microservices) has dramatically changed the way that companies deliver software and data services. How long before we might say the same for Service Oriented Science?
1. Exactly how much still remains to be seen.
2. Some folks disagree with this statement. Some people also seem to have more Vs for you.
3. Those researchers not in the US or a partner country presumably just need to find themselves a suitably geographically-located collaborator.
4. Nobody seems to know exactly how many, but there are likely to be tens of millions, and perhaps even hundreds of millions, per night.
5. Of course we could just build a giant army of robotic telescopes to observe a larger fraction of the candidates, but this seems like a losing battle.
6. Full disclosure: I know literally nothing about finding supernovae, which is somewhat ironic given this
7. I'd love your thoughts on some related efforts I've been working on here