Galaxy Zoo / Zooniverse
Citizen Science Case Study: Galaxy Zoo / Zooniverse

Nathan R. Prestopnik
Syracuse University

Abstract

Galaxy Zoo and other “Zooniverse” websites are citizen science projects designed to allow individuals to aid with scientific inquiry through annotating various astronomical photographs or other assets online. Galaxy Zoo is specifically designed to have individuals from around the world classify galaxies photographed by various space telescope platforms. The classifications submitted are used to paint a more detailed picture of the universe we live in. This case study is an examination of Galaxy Zoo and its sister citizen science websites as interactive technology artifacts, with emphasis placed on design decisions, technical implementation, and how these two antecedents affect users, participants, and the overall success of these projects. In addition to providing a clear view of how the Galaxy Zoo platform is designed and functions, our purpose is to inform future citizen science technical implementations. It is hoped that by studying a successful implementation such as Galaxy Zoo and the Zooniverse, we may begin to suggest best practices for the deployment of web or mobile-based platforms for a variety of citizen science dependent investigations.

Associated Links

The following links are to projects and related work mentioned in this case study:

Galaxy Zoo: http://www.galaxyzoo.org
Moon Zoo: http://www.moonzoo.org
Old Weather: http://www.oldweather.org
Zooniverse: http://www.zooniverse.org

1. Introduction

Galaxy Zoo is a citizen science project originally developed by researchers at the University of Oxford to assist with the classification of galaxies. Ground and space-based observation platforms such as the Sloan Digital Sky Survey and the Hubble Space Telescope have produced large quantities of sky images featuring galaxies and other astronomical objects of interest.
Because of the large numbers of images involved (typically numbering in the millions), it is difficult for any one scientist, or even a small group of scientists, to thoroughly review these images and classify the astronomical objects they contain. Galaxy Zoo, launched in July 2007, was intended to assist with this problem by displaying galaxy images to registered users and giving them the opportunity to classify galaxy features (spiral arms, elliptical shape, etc.) within those photographs.

It was expected that over the course of several years, a relatively small number of highly interested participants from the astronomy community would eventually be able to classify all of the roughly one million photographs of the Sloan Digital Sky Survey. Participants would be recruited at scientific conferences; personal or professional contact with participants was expected to be necessary in order to raise awareness of and interest in the Galaxy Zoo classification project.

In retrospect, however, as the principal investigator for Galaxy Zoo put it, “that’s not how the internet works.” Instead of the expected trickle of classifications from like-minded scientists, within 24 hours of its launch the Galaxy Zoo website was receiving 70,000 classifications per hour. Galaxy Zoo collected 50 million classifications within its first year of operation. Although the Sloan Digital Sky Survey contained only about 1 million galaxy images to classify, each image needed to be classified multiple times so that validity could be assessed and data quality improved.

The success of the original Galaxy Zoo project prompted project investigators to “professionalize” the project. Steps were taken in late 2008 and early 2009 to hire full-time development staff and begin a more ambitious program to expand Galaxy Zoo’s original goals as well as venture into other areas of citizen-based scientific study.
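The redundancy described above (about 50 million classifications for roughly 1 million images) is what makes it possible to compare independent answers for the same image. The following is a minimal sketch of that idea, not the project's documented method: it assumes a hypothetical list of (image_id, label) records, uses simple unweighted vote fractions, and applies an arbitrary agreement threshold of 0.8. All names, labels, and values are illustrative assumptions.

```python
from collections import Counter, defaultdict


def vote_fractions(classifications):
    """Return {image_id: {label: fraction of votes}} from (image_id, label) pairs.

    The record layout and label names are hypothetical, for illustration only.
    """
    counts = defaultdict(Counter)
    for image_id, label in classifications:
        counts[image_id][label] += 1
    return {
        image_id: {label: n / sum(votes.values()) for label, n in votes.items()}
        for image_id, votes in counts.items()
    }


def consensus(fractions, threshold=0.8):
    """Keep only images whose leading label reaches the assumed threshold."""
    result = {}
    for image_id, labels in fractions.items():
        label, share = max(labels.items(), key=lambda item: item[1])
        if share >= threshold:
            result[image_id] = label
    return result


# Hypothetical sample data: five volunteers classify each of two images.
sample = [
    ("img-1", "spiral"), ("img-1", "spiral"), ("img-1", "spiral"),
    ("img-1", "spiral"), ("img-1", "elliptical"),
    ("img-2", "spiral"), ("img-2", "elliptical"), ("img-2", "elliptical"),
    ("img-2", "spiral"), ("img-2", "merger"),
]
fractions = vote_fractions(sample)
agreed = consensus(fractions)
```

In this sample, the first image reaches the threshold and is kept, while the second is left out because no single label dominates. Images like the second are the ones that would benefit from further classifications or closer review.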
The original Galaxy Zoo was a simple but “elegant” form-based system where users submitted a set of relatively basic information about the observed galaxy. The second iteration of Galaxy Zoo features a more robust and complex “decision tree” that walks users through eight overall questions with about 40 possible responses in total. The technical development of this decision tree required more than the volunteer development time that had generated the original Galaxy Zoo. As a result, full-time developers were brought into the project.

As part of the development of Galaxy Zoo 2, care and attention were also given to ways of generalizing the type of information presented to and provided by system users; an internal schema or vocabulary was developed, whereby “assets” (the images presented to users of the website) were “annotated” by “users.” This simple schema has made it possible to develop several very different citizen science websites that all have the same basic core: assets annotated by users. Currently, eight citizen science projects (Galaxy Zoo, Moon Zoo, Old Weather, Solar Storm Watch, Galaxy Zoo: Understanding Cosmic Mergers, Planet Hunters, the Milky Way Project, and Galaxy Zoo: The Hunt for Supernovae) are underway, grouped under the overall umbrella of the “Zooniverse.” These projects are all citizen science-based, and have at their core groups of volunteers who provide data about various artifacts of interest.

The following case study is an in-depth look at the current iteration of Galaxy Zoo, based on interviews with project investigators and developers, as well as analysis of the public portions of the website itself. Because Galaxy Zoo is just one site within a constellation of projects called the “Zooniverse,” discussion of it must necessarily include mention of other projects. While this case study has the Galaxy Zoo project as its primary focus, related citizen science websites also receive attention where needed.
The goal is to better understand the conceptual and technological underpinnings of these highly successful citizen science projects.

2. Galaxy Zoo Design

Galaxy Zoo is a citizen science project designed to generate data about galaxies photographed via ground- and space-based telescope systems. Images from both the Sloan Digital Sky Survey and the Hubble Space Telescope have been classified using the Galaxy Zoo system. Galaxy images are presented to users of the Galaxy Zoo website one at a time, and are classified via a "decision tree" that walks users through a series of multiple choice and open-ended questions about each image. The decision tree includes questions such as, "Is the galaxy smooth and rounded, with no sign of a disk?" and, "Does the galaxy have a bulge at its center and if so, what shape?" Each question may prompt the user for further information, such as, "How rounded is it?" or, "Could this be a disk viewed edge on?"

Questions are answered with a simple interface; icons and a small amount of explanatory text are used to represent possible answers. The icons visually show how the galaxy image should appear in order for that answer to be correct. Using both short text answers (e.g. "yes," "no," "spiral," "ring," etc.) and icons provides the user with an information-rich yet simple interface that explains what each question means as it is asked. This has the effect of reducing error and improving the quality of the data generated by the classification process. An example of the Galaxy Zoo classification interface is shown in Figure 1.

Figure 1: Galaxy Zoo Classification Interface.

To further simplify the classification process, detailed instructions for each question in the decision tree are provided on a separate page. In addition to providing an overview for each question, the page is set up so that users can practice with several sample galaxy images.
This helps users to determine how well they understand what is being asked and how well they are doing.

Interestingly, the live Galaxy Zoo classification system used to generate data does not provide feedback to the user about how well they are doing. This is surprising, since one of the more common questions asked by site users is, "How do I know if I am doing this right?" Some of the Galaxy Zoo investigators have suggested that more formal testing procedures (as opposed to general information about how to classify galaxies) would be an appropriate way to address this issue. However, it is a deliberate decision not to implement such features. According to one investigator involved in the project, “The issue of testing users to establish a baseline of competence comes up a lot, but at least from my perspective it's a deliberate decision not to implement it. We did have a test for Zoo 1 (http://zoo1.galaxyzoo.org) but it was one at an extremely low level. The problems with a harder test are twofold - firstly, people find instant engagement with real data more interesting than a test, and so insisting on a test loses us a lot of classifications, both because people leave without completing the test and because they have had enough after completing the test. We see both of these behaviors in Solar Stormwatch, which has a compulsory tutorial mode. Secondly, and probably more importantly, insisting on a test locks you into data that reflects what your test teaches.
By searching the database for 'good' users after the fact, and weighting their contributions accordingly, we can make these decisions post hoc - so, for example, we can produce catalogues of merging galaxies that match those that would be produced by two scientists who fundamentally disagree on what should be classified as a merger.”

The Galaxy Zoo interface is designed with inexperienced users in mind. By necessity, the interfaces are targeted toward people with no experience classifying or identifying galaxies, since there are very few people in the world who are highly knowledgeable in this area. This is why the site includes both instruction pages and a step-by-step “decision tree” interface that asks specific questions and seeks specific and relatively easy-to-provide answers.
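The post hoc weighting described in the quotation above can be illustrated with a small sketch. It is an assumption-laden illustration, not Galaxy Zoo's actual procedure: it assumes a hypothetical list of (user, image_id, label) records, takes each user's weight to be their agreement rate with a reference labelling supplied by a scientist, and picks the label with the largest total weight for each image. The user names, image identifiers, labels, and the agreement-rate weighting are all invented for illustration.

```python
from collections import defaultdict


def user_weights(records, reference):
    """Weight each user by their agreement rate with a reference labelling.

    records: list of (user, image_id, label); reference: {image_id: label}.
    Users with no overlap with the reference get weight 0.0 (an assumption).
    """
    hits = defaultdict(int)
    seen = defaultdict(int)
    for user, image_id, label in records:
        if image_id in reference:
            seen[user] += 1
            hits[user] += int(label == reference[image_id])
    users = {user for user, _, _ in records}
    return {u: (hits[u] / seen[u] if seen[u] else 0.0) for u in users}


def weighted_catalogue(records, weights):
    """Pick, for each image, the label with the largest total user weight."""
    totals = defaultdict(lambda: defaultdict(float))
    for user, image_id, label in records:
        totals[image_id][label] += weights.get(user, 0.0)
    return {
        image_id: max(labels.items(), key=lambda item: item[1])[0]
        for image_id, labels in totals.items()
    }


# Hypothetical classifications from three volunteers.
records = [
    ("ann", "g1", "merger"), ("ann", "g2", "merger"), ("ann", "g3", "merger"),
    ("bob", "g1", "not_merger"), ("bob", "g2", "not_merger"),
    ("bob", "g3", "not_merger"),
    ("cat", "g1", "merger"), ("cat", "g2", "not_merger"), ("cat", "g3", "merger"),
]

# Two hypothetical scientists who disagree about what counts as a merger.
scientist_a = {"g1": "merger", "g2": "merger"}
scientist_b = {"g1": "not_merger", "g2": "not_merger"}

weights_a = user_weights(records, scientist_a)
weights_b = user_weights(records, scientist_b)
catalogue_a = weighted_catalogue(records, weights_a)
catalogue_b = weighted_catalogue(records, weights_b)
```

The same stored classifications yield two different catalogues here: image g3, which neither scientist labelled, is a merger in the catalogue weighted toward the first scientist and not a merger in the one weighted toward the second. Because the weights are computed after collection, no entry test has to commit the data to one definition in advance.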