
Citizen Science Case Study: Galaxy Zoo / Zooniverse

Nathan R. Prestopnik Syracuse University [email protected]

Abstract

Galaxy Zoo and other “Zooniverse” websites are projects designed to allow individuals to aid scientific inquiry by annotating astronomical photographs or other assets online. Galaxy Zoo is specifically designed to have individuals from around the world classify galaxies photographed by various space telescope platforms. The classifications submitted are used to paint a more detailed picture of the universe we live in.

This case study is an examination of Galaxy Zoo and its sister citizen science websites as interactive technology artifacts, with emphasis placed on design decisions, technical implementation, and how these two antecedents affect users, participants, and the overall success of these projects. In addition to providing a clear view of how the Galaxy Zoo platform is designed and functions, our purpose is to inform future citizen science technical implementations. It is hoped that by studying a successful implementation such as Galaxy Zoo and the Zooniverse, we may begin to suggest best practices for the deployment of web or mobile-based platforms for a variety of citizen science dependent investigations.

Associated Links

The following links are to projects and related work mentioned in this case study:

Galaxy Zoo: http://www.galaxyzoo.org
Moon Zoo: http://www.moonzoo.org
Old Weather: http://www.oldweather.org
Zooniverse: http://www.zooniverse.org

1. Introduction

Galaxy Zoo is a citizen science project originally developed by researchers at the University of Oxford to assist with the classification of galaxies. Ground- and space-based observation platforms such as the Sloan Digital Sky Survey and the Hubble Space Telescope have produced large quantities of sky images featuring galaxies and other astronomical objects of interest. Because of the large numbers of images involved (typically numbering in the millions), it is difficult for any one scientist – or even small groups of scientists – to thoroughly review these images and classify the astronomical objects they contain. Galaxy Zoo, launched in July 2007, was intended to assist with this problem by displaying galaxy images to registered users and giving them the opportunity to classify galaxy features – spiral arms, elliptical shape, etc. – within those photographs.

It was expected that over the course of several years, a relatively small number of highly interested participants from the scientific community would eventually be able to classify all of the roughly 1 million photographs of the Sloan Digital Sky Survey. Participants in this endeavor would be recruited from scientific conferences; personal or professional contact with participants was expected to be necessary in order to raise awareness of and interest in the Galaxy Zoo classification project.

In retrospect, however, as the principal investigator for Galaxy Zoo put it, “that’s not how the internet works.” Instead of the expected trickle of classifications from like-minded scientists, within 24 hours of its launch, the Galaxy Zoo website was receiving 70,000 classifications per hour. Galaxy Zoo was able to collect 50 million classifications within its first year of operation; though the Sloan Digital Sky Survey contained only about 1 million galaxy images to classify, multiple classifications of each image were necessary in order to assess validity and improve data quality.

The success of the original Galaxy Zoo project prompted project investigators to “professionalize” the project. Steps were taken in late 2008 and early 2009 to hire full-time development staff and begin a more ambitious program to expand Galaxy Zoo’s original goals as well as venture into other areas of citizen-based scientific study. The original Galaxy Zoo was a simple but “elegant” form-based system where users submitted a set of relatively basic information about the observed galaxy. The second iteration of Galaxy Zoo features a more robust and complex “decision tree” that walks users through eight overall questions with about 40 possible responses in total. The technical development of this decision tree required more than the volunteer development time that had produced the original Galaxy Zoo. As a result, full-time developers were brought into the project. As part of the development of Galaxy Zoo 2, care and attention were also given to ways of generalizing the type of information presented to and provided by system users; an internal schema or vocabulary was developed, whereby “assets” (the images presented to users of the website) were “annotated” by “users.” This simple schema has made it possible to develop several very different citizen science websites that all have the same basic core – assets annotated by users.

Currently, eight citizen science projects (Galaxy Zoo, Moon Zoo, Old Weather, Solar Storm Watch, Galaxy Zoo: Understanding Cosmic Mergers, Planet Hunters, the Milky Way Project, and Galaxy Zoo: The Hunt for Supernovae) are underway, grouped under the overall umbrella of the “Zooniverse.” These projects are all citizen science-based, and have at their core groups of volunteers who provide data about various artifacts of interest.

The following case study is an in-depth look at the current iteration of Galaxy Zoo, based on interviews with project investigators and developers, as well as analysis of the public portions of the website itself. Because Galaxy Zoo is just one site within a constellation of projects called the “Zooniverse,” discussion of it must necessarily include mention of other projects. While this case study has the Galaxy Zoo project as its primary focus, related citizen science websites also receive attention where needed. The goal is to better understand the conceptual and technological underpinnings of these highly successful citizen science projects.

2. Galaxy Zoo Design

Galaxy Zoo is a citizen science project designed to generate data about galaxies photographed via ground- and space-based telescope systems. Images from both the Sloan Digital Sky Survey and the Hubble Space Telescope have been classified using the Galaxy Zoo system. Galaxy images are presented to users of the Galaxy Zoo website one at a time, and are classified via a "decision tree" that walks users through a series of multiple choice and open-ended questions about each image. The decision tree includes questions such as, "Is the galaxy smooth and rounded, with no sign of a disk?" and, "Does the galaxy have a bulge at its center and if so, what shape?" Each question may prompt the user for further information, such as, "How rounded is it?" or, "Could this be a disk viewed edge on?"
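As a rough illustration of the decision tree described above, the following Python sketch models each question as a node whose answers point to a follow-up question. The question wording comes from the examples quoted above; the answer sets, the branching between questions, and all names (`Question`, `TREE`, `classify`) are assumptions made for illustration and do not reflect the actual Galaxy Zoo implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Question:
    """One node of a hypothetical decision tree."""
    text: str
    # Maps each allowed answer to the key of the follow-up question,
    # or to None when that answer ends the branch.
    answers: Dict[str, Optional[str]] = field(default_factory=dict)


# Question text is quoted from the case study; the answers and the
# branching are invented for illustration only.
TREE: Dict[str, Question] = {
    "smooth": Question(
        "Is the galaxy smooth and rounded, with no sign of a disk?",
        {"yes": "roundness", "no": "edge_on"},
    ),
    "roundness": Question(
        "How rounded is it?",
        {"completely round": None, "in between": None, "cigar shaped": None},
    ),
    "edge_on": Question(
        "Could this be a disk viewed edge on?",
        {"yes": None, "no": None},
    ),
}


def classify(
    tree: Dict[str, Question], start: str, responses: List[str]
) -> List[Tuple[str, str]]:
    """Walk the tree, recording one (question key, answer) pair per step."""
    recorded: List[Tuple[str, str]] = []
    key: Optional[str] = start
    for answer in responses:
        if key is None:
            raise ValueError("more responses than questions on this path")
        question = tree[key]
        if answer not in question.answers:
            raise ValueError(f"{answer!r} is not a valid answer to {key!r}")
        recorded.append((key, answer))
        key = question.answers[answer]  # follow the branch for this answer
    return recorded
```

Calling `classify(TREE, "smooth", ["yes", "cigar shaped"])` walks two nodes and returns one recorded pair per question answered, which is the shape of data the later discussion of "annotations" relies on.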

Questions are answered with a simple interface; icons and a small amount of explanatory text are used to represent possible answers. The icons show how the galaxy image should appear in order for that answer to be correct. Using both short text answers (e.g. "yes," "no," "spiral," "ring") and icons provides the user with an information-rich yet simple interface that explains what is meant by a question even as it is asked. This has the effect of reducing error and improving the quality of the data generated by the classification process.

An example of the Galaxy Zoo classification interface is shown below:

Figure 1: Galaxy Zoo Classification Interface.

To further simplify the classification process, detailed instructions for each question in the decision tree are provided on a separate page. In addition to providing an overview for each question, the page is set up so that users can practice with several sample galaxy images. This helps users to determine how well they understand what is being asked and how well they are doing. Interestingly, the live Galaxy Zoo classification system used to generate data does not provide feedback to the user about how well they are doing. This is surprising, since one of the more common questions asked by site users is, "How do I know if I am doing this right?"

Some of the Galaxy Zoo investigators have suggested that more formal testing procedures (as opposed to general information about how to classify galaxies) would be an appropriate way to address this issue. However, it is a deliberate decision not to implement such features. According to one investigator involved in the project, “The issue of testing users to establish a baseline of competence comes up a lot, but at least from my perspective it's a deliberate decision not to implement it. We did have a test for Zoo 1 (http://zoo1.galaxyzoo.org) but it was one at an extremely low level. The problems with a harder test are twofold - firstly, people find instant engagement with real data more interesting than a test, and so insisting on a test loses us a lot of classifications, both because people leave without completing the test and because they have had enough after completing the test. We see both of these behaviors in Solar Stormwatch, which has a compulsory tutorial mode. Secondly, and probably more importantly, insisting on a test locks you into data that reflects what your test teaches. By searching the database for 'good' users after the fact, and weighting their contributions accordingly, we can make these decisions post hoc - so, for example, we can produce catalogues of merging galaxies that match those that would be produced by two scientists who fundamentally disagree on what should be classified as a merger.”
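The post hoc weighting idea in the quotation above can be sketched in a few lines of Python. This is a minimal illustration, not the Galaxy Zoo method: the function name, the tuple layout, and the way weights are supplied are all assumptions. The point it shows is that raw classifications are stored unweighted, and different weight tables can be applied afterward to produce different catalogues from the same data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def weighted_vote_fractions(
    classifications: Iterable[Tuple[str, str, str]],
    user_weights: Dict[str, float],
    default_weight: float = 1.0,
) -> Dict[str, Dict[str, float]]:
    """Return, per asset, the weighted fraction of votes for each answer.

    classifications: (user_id, asset_id, answer) tuples (hypothetical layout).
    user_weights: weight chosen after the fact for each user; users not
        listed fall back to default_weight.
    """
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for user_id, asset_id, answer in classifications:
        totals[asset_id][answer] += user_weights.get(user_id, default_weight)

    fractions: Dict[str, Dict[str, float]] = {}
    for asset_id, by_answer in totals.items():
        weight_sum = sum(by_answer.values())
        if weight_sum == 0:
            continue  # every contributing user was weighted to zero
        fractions[asset_id] = {a: w / weight_sum for a, w in by_answer.items()}
    return fractions
```

Because the weights are an argument rather than something baked into data collection, two scientists who disagree about what counts as a merger could each supply their own weight table and obtain their own catalogue from the same stored classifications.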

The Galaxy Zoo interface is designed with inexperienced users in mind; by necessity, the interfaces are targeted toward people with no experience classifying or identifying galaxies, since there are very few people in the world who are highly knowledgeable in this area. Hence the need to include both instruction pages and a step-by-step “decision tree” interface that asks specific questions and seeks specific and relatively easy-to-provide answers. As with all Zooniverse projects, simplicity of design is considered key.

3. Zooniverse Design Approach

After the success of Galaxy Zoo, the Zooniverse team was in a position to begin working on other citizen science projects proposed by scientists from various fields. In general, any potential project will receive its own website and custom interface. To design interfaces for new projects, however, the Zooniverse team often has to work closely with scientists to determine project goals and needs, as well as to set expectations for what the citizen science approach is and is not appropriate for. According to the project’s technical lead, “Quite quickly, we try to identify what it is we want to do with the data. Is it marking craters? Is it answering some simple questions? Is it transcribing text?”

Often, scientists approach the Zooniverse team with project ideas that must be refined or modified before they are likely to be successful. For example, some groups have approached the team with visions of web users spending the time to answer 80 distinct scientific questions, when, in fact, this is an unrealistic expectation. The Zooniverse team has worked with scientists in cases like these to refine scientific goals down to, for example, five key areas of interest for further study.

In other cases, scientists approach the team with much lower expectations than necessary. According to the project lead, “People typically think far more conservatively than what you might imagine.” Whether expectations are initially too high or too low, it is important for the Zooniverse team to match scientist expectations with technical possibilities. To do this, a questionnaire was developed. This questionnaire elicits important information from scientists about their project ideas, asking questions such as, “What would the minimum achievements for success be, and what extra might you hope to get?” and, “What specific tasks do you envisage citizen science participants performing with this data in order to address your science requirements?” Additionally, the questionnaire is prefaced by a two-page guiding statement intended to set scientist expectations. Some of this advice is counterintuitive, but still highly helpful in constraining a scientific team’s plans to the achievable while encouraging them to expand their thinking about what is possible. For example:

1. “An impression that the data will not be sufficiently appealing to the public should not be a concern, as there are various ways in which the presentation of data and the user experience can likely be enhanced to attract sufficient participants, particularly if there is a compelling science case.”
2. “Much of the power of the (citizen science) method lies in the ability of advanced volunteers to follow up themselves on interesting or unusual data. This necessitates the provision of access to as much additional data as possible.”
3. “Following the successful operation of the project, a certain amount of data verification and reduction is necessary. This is normally the responsibility of the project’s science team.”
4. “Where the task is simple enough, a tutorial can be optional, but where a higher level of understanding is necessary, it may, following the results of the beta, be found necessary to make the tutorial compulsory.”
5. “Our experience suggests that a large proportion of a project’s classifications may come from the initial spike in interest, and so it is important that the interface is correct from the start.”

In general, the first phase of the Zooniverse design process is to match scientist expectations with design practicalities. As with any development project, time constraints and budgets are also factors which must be considered.

Once project needs are established and the developers and scientists have a good idea of what it is that should be done, the development team begins a prototyping process. This process emphasizes technical over visual development, and is ad hoc. Though some rough sketches or design ideas may be generated, for the most part, prototyping emphasizes functionality; an outside design firm is brought in on some projects to assist with interface design or visual/aesthetic treatments when necessary. Other design work is conducted in-house.

The prototyping phase is marked by discussions over resources and technology choices. The level of resources dedicated to a project often determines what will be possible: if a project is well-funded, much can be done from scratch, whereas cheaper projects may require developers to recycle code or plan simpler user interactions. In addition, several technologies are “favored” by the Zooniverse development team. JavaScript is typically favored over Flash or Java, though Flash interfaces have been used in some cases where the level of visual interactivity is high. HTML5 and the canvas tag are generally preferred over Flash. In general, technology is evaluated in light of how appropriate it will be for the specific scientific goals of a project.

Once prototyping is complete, the project is tested using a combination of resources. Internal project development and science staff are approached for testing, and when major issues are worked out, the project is also released to select members of the Zooniverse community for further review. Because the Zooniverse team supports a wide variety of citizen science projects, it is relatively simple to approach users on the forums or via email and request 10 or 20 participants for beta testing of a new interface. Response to these requests is usually quite high. In some cases, test subjects may also be approached in person at physical locations associated with the Zooniverse (e.g. the Adler Planetarium or museums and installations where the science team members work). Watching a user interact with the system in person can be very revealing about usability and how well a website is achieving the project’s scientific goals. Assuming beta testing goes well, a project is launched once the results from the test period have been checked for validity and everything has been confirmed to work correctly.

4. Data

Galaxy Zoo is the first of several citizen science-based websites that make up the “Zooniverse.” As such, it served as the developmental test-bed for data collection, storage, and management across all other Zooniverse websites. During the upgrade from Galaxy Zoo 1 to Galaxy Zoo 2, the Galaxy Zoo team spent time considering how best to generalize and define data within the site so that future projects could be developed under the same or similar architectures.

The Galaxy Zoo 2 development process resulted in a relatively simple data structure that has been useful for many subsequent Zooniverse projects. This data structure is oriented around “assets,” “annotations,” and “users.” In the Galaxy Zoo project, this structure is as follows:

1. Galaxy images are considered to be “assets.”
2. Site “users” are given access to these assets via the galaxy classification interface.
3. “Users” provide information about “assets” in the form of “annotations.”

The Galaxy Zoo database stores data about assets, users, and annotations, as well as the relationships between the three. According to the lead developer for Galaxy Zoo, “An annotation is essentially a user interaction with an asset.” Beyond this, a classification is a series of such interactions, and workflow is an abstracted sequence of interaction tasks. This data formulation applies to Zooniverse projects beyond Galaxy Zoo. For example, the Old Weather project has users annotating ship’s log pages (assets) from WWI-era warships. In Moon Zoo, users annotate photographs of the moon’s surface by outlining craters.
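The relationships described above (users, assets, annotations, with a classification as a series of annotations and a workflow as an abstract sequence of tasks) might be modeled along the following lines. This is only a sketch in Python; the class and field names are assumptions chosen for illustration and are not the actual Zooniverse schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Asset:
    """Something shown to a user, e.g. a galaxy image or a log page."""
    asset_id: int
    location: str  # hypothetical path or URL of the image


@dataclass(frozen=True)
class User:
    user_id: int
    login: str


@dataclass(frozen=True)
class Annotation:
    """A single user interaction with an asset."""
    task: str
    value: str


@dataclass(frozen=True)
class Workflow:
    """An abstract sequence of interaction tasks."""
    name: str
    tasks: Tuple[str, ...]


@dataclass
class Classification:
    """A series of annotations made by one user on one asset."""
    user: User
    asset: Asset
    workflow: Workflow
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, task: str, value: str) -> None:
        # Only accept tasks that belong to this classification's workflow.
        if task not in self.workflow.tasks:
            raise ValueError(
                f"{task!r} is not part of workflow {self.workflow.name!r}"
            )
        self.annotations.append(Annotation(task, value))
```

Nothing in this structure is specific to galaxies: the same classes would describe a transcribed log entry in Old Weather or an outlined crater in Moon Zoo, with only the `task` and `value` contents changing.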

While the data supplied by participants on various Zooniverse projects may often be generalized to the “annotation” level, the types of annotation and the ways that annotations are generated and stored can be very different. For example, Galaxy Zoo walks users through a decision tree to collect annotations – each node in the tree asks a question about the galaxy (asset) shown, and user answers are recorded as annotations. Moon Zoo, on the other hand, uses a different kind of interface to collect annotations. Users are asked to outline craters using a simple set of drawing tools, and the crater positions and sizes (represented numerically) are then recorded. Finally, the Old Weather project uses yet a third form of interface to collect annotations. Users are shown a scanned image of a ship’s log (asset) and use a cleverly designed overlay tool to find and enter (by typing) weather and other information from the log. According to the project lead, “There’s always this interaction between the person and the asset. What we’re going to want to collect is a sequence of interactions… Whatever that interaction is, it doesn’t really matter. You just need some characteristic information about it. Maybe it’s clicking a button. Maybe it’s drawing a crater. Maybe it’s marking a timestamp in a video… it’s still just an annotation.”

Galaxy Zoo adheres most closely to the user, asset, annotation paradigm in data, due to its decision-tree structure and the fact that it was the project around which these constructs were developed. Subsequent projects have departed from these principles to varying degrees. The Moon Zoo project is based on drawing abstract shapes as representations of crater data – circumference, shape, etc. The notion of what constituted an annotation in this project was left deliberately vague, and, in fact, the underlying database for Moon Zoo does not specifically contain a field for “craters.” Rather, there is a single “value” field which houses all data for a given crater observation, formatted into a serialized string value using Ruby hashes. This string value can be easily reconverted to a useful series of data points when required. While this is a non-traditional approach (typically, each individual crater-related value would be stored in its own data field), it allows the Moon Zoo database to reflect a similar structure to the Galaxy Zoo database, allowing for a great deal of code and structure reuse.
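The single “value” field approach can be illustrated with a short sketch. Moon Zoo is described as serializing Ruby hashes; the Python version below substitutes JSON for that serialization format, and the field names (`x`, `y`, `radius`) and helper names are hypothetical. It shows only the general pattern: structured crater data is packed into one string column and unpacked again when needed.

```python
import json
from typing import Any, Dict


def pack_annotation(asset_id: int, user_id: int, data: Dict[str, Any]) -> Dict[str, Any]:
    """Build a generic annotation row whose 'value' column is one string.

    JSON stands in here for the serialized Ruby hash described in the text.
    """
    return {
        "asset_id": asset_id,
        "user_id": user_id,
        "value": json.dumps(data, sort_keys=True),
    }


def unpack_value(row: Dict[str, Any]) -> Dict[str, Any]:
    """Convert the serialized 'value' string back into data points."""
    return json.loads(row["value"])


# Hypothetical crater observation: pixel position and radius.
row = pack_annotation(asset_id=7, user_id=3, data={"x": 120.5, "y": 88.0, "radius": 14.2})
crater = unpack_value(row)
```

The trade-off is the one the text notes: the database cannot query or index individual crater properties directly, but the table layout stays identical across projects, so code written for one project's annotations can be reused for another's.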

In other, more recent projects, Zooniverse developers have begun shifting their emphasis away from the user, asset, annotation paradigm toward more “bespoke” or custom development. This shift has been largely for pragmatic and technical reasons; while each Zooniverse project may be usefully framed by the user, asset, annotation paradigm, during development it is often difficult to manifest this framework in an identical way to previous projects. This is especially true now that multiple developers are involved in Zooniverse development projects. Early on, when one developer handled all aspects of implementation, the Galaxy Zoo code library – created by this one developer – could be easily replicated and used on subsequent projects.

Currently, at least four software developers are involved with Zooniverse projects, but only one was directly involved in forming the Galaxy Zoo code library. This creates additional challenges for the other developers, who may need to spend just as much time learning a library as developing and programming new solutions to data problems. Additionally, the existing code library may actually support more functionality than is required by a given project, and adding or modifying functionality can be difficult in a shared library, whereas it is relatively simple in custom code developed for a specific application.

For these reasons, much Zooniverse development work has shifted from relatively generalized data structures and code libraries (e.g. Galaxy Zoo and Moon Zoo) to custom development for each project, but with principles of user, asset, and annotation in mind (e.g. Old Weather and Planet Hunters). This has the additional advantage of giving developers more ownership over these projects, keeping morale and interest in the projects high. However, for code that is reused regularly, an effort has been underway to “re-generalize” and ensure that this code is suitable for a wide variety of future projects.

The following screen capture images show various unique interface tools designed to collect data within the asset-annotation-user data framework:

Figure 2: Moon Zoo Crater Survey Interface.

Figure 3: Old Weather Transcription Interface.

Interface design is one of the key development tasks on any new project, and the Zooniverse development team spends a great deal of time considering how best to facilitate user interactions with the assets for a particular project. This is accomplished through thoroughly understanding the needs of a particular project, as well as by simplifying interfaces as much as possible to make the user experience straightforward and easy.

5. Scientific Outreach

In keeping with the notion of targeting an inexpert set of users, the Galaxy Zoo website is also aimed at scientific outreach, but not necessarily in the same way that many other citizen science websites attempt. The typical approach to citizen science outreach is to acquaint inexperienced users with a scientific subject area – for example, astronomy, ornithology, geology, etc. This is most often done by including large amounts of subject information either alongside or within the data submission and analysis interface, rather like an encyclopedia of information about that one specific scientific domain (e.g. http://www.mos.org/fireflywatch/ which provides information about fireflies, http://www.fold.it for information about protein strings, or http://www.greatsunflower.org/ with information about bees and pollination).

Galaxy Zoo and the other Zooniverse projects take a different approach to scientific outreach. Project investigators are not as concerned with acquainting individual participants with specific subject areas; indeed, the original Galaxy Zoo interface never even mentioned that galaxies are astronomical objects made up of stars! When it was suggested to the principal investigator for Galaxy Zoo that excluding this kind of information could be detrimental to outreach efforts, however, his response made Galaxy Zoo’s outreach goals clear: “It depends on what your outreach goals are. If you’re trying to teach people astronomy, I agree with you, but… I would argue that we’re trying to do a different kind of outreach, which is science process. The kind of things I’d like Galaxy Zoo to change: the perception that science is done by other people, people’s perception of science as remote, people’s lack of understanding of basic things like the role of statistics.” The notion of scientific outreach at Zooniverse is not to teach a scientific subject (e.g. astronomy), but to help citizen scientists better understand what science is, and to make them more comfortable with the scientific process. This kind of outreach is addressed in a variety of ways in Galaxy Zoo.

Most directly, the Galaxy Zoo website includes a section entitled “The Science” with three subsections: “What We Want to Know,” “Rare Objects,” and “Data Releases.” These pages explain the scientific goals of the Galaxy Zoo project, and how participation in the project will help. For example, from the page itself: “With Galaxy Zoo: Hubble, we want to see how the mix of galaxies has changed over time. More stars were forming back then, so does that mean we should expect more spirals? Or does the proportion of blue ellipticals increase as we travel back in time? Only you can tell us.”

Rare objects are one key piece to helping citizen participants become more versed in the scientific process, and “The Science” page describes how previously identified rare objects have helped with scientific efforts, and also how to handle unusual objects when they are spotted. Several success stories are mentioned, including Hanny’s Voorwerp (a previously unknown object discovered by Hanny Van Arkel, a Galaxy Zoo participant). One way Galaxy Zoo promotes scientific outreach is through a forum, which allows users to communicate with each other and project investigators, as well as to start their own mini-science projects based on Galaxy Zoo data. Forum members are close-knit, even meeting in person from time to time, and growing frustrated when project investigators are unable to attend these meetings. There are about 100 forum users active at any given time, and about 500 regular users. These constitute a small percentage of the total users of the Galaxy Zoo system, but they are a very important part.

Forum users have established collections of objects based on scientific interest and also for non-scientific reasons (e.g. galaxies that are shaped like each letter of the alphabet, with the goal of creating a “galaxy font”). In addition, other astronomers have approached the Galaxy Zoo forum community for help with their own scientific projects – for example, identifying particular galaxy features within the Hubble images that are not specifically asked about in the Galaxy Zoo decision tree itself.

Unusual astronomical objects and the forum itself are key components in the process whereby Galaxy Zoo users gain a better understanding of the scientific process. One specific set of objects, the “Green Peas,” is an excellent example of how this kind of learning occurs. The “Green Peas” are a set of unusual objects found in a number of the Sloan Digital Sky Survey images presented to Galaxy Zoo users. They appeared as small, round, green-colored objects that seemed unusual and worth investigating further.

The “Green Peas” became a topic of interest on the Galaxy Zoo forum, but the participants involved in this topic didn’t want to simply bring these objects to the Galaxy Zoo investigators for identification. They had established a scientific question (What are these objects?), and wanted to try to answer at least part of that question before asking project investigators for input. Accordingly, using the forum and the Galaxy Zoo system to support their work, they began an organized effort to identify more of the “Green Pea” objects in the Galaxy Zoo images. First, this meant identifying common characteristics – color, shape, size, etc. Color was an especially important characteristic, since the “Green Peas” were all the same color. One participant wrote a query designed to crawl through the Galaxy Zoo data and collect images containing that specific color. The participants in this mini-project then built their own mini-version of Galaxy Zoo to go through the collected images and identify “Green Peas” within them. Using a downloaded set of spectral data, they generated their own signal-to-noise measure, and established the spectra (and thus chemical composition) of the “Green Peas” (which turned out to be primarily ionized oxygen). With this data in hand, they then approached the Galaxy Zoo investigators for further assistance. Instead of simply asking, “what are these?” the participants involved in the “Green Peas” project could say, “we have identified X number of objects with the following characteristics and chemical composition. Can you help us identify them?” Based on the collected data and further investigation, it turned out that the “Green Peas” were dwarf galaxies that form stars incredibly quickly, though no one is yet sure why.
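The color-based search described above is not documented in detail, so the following Python sketch is purely illustrative. It assumes each object record carries a representative RGB color (a hypothetical field) and selects the objects whose color lies within a chosen distance of a target color; the actual query written by the Galaxy Zoo participant may have worked quite differently.

```python
from math import dist
from typing import Dict, List, Sequence


def select_by_color(
    objects: List[Dict],
    target_rgb: Sequence[float],
    tolerance: float,
) -> List[str]:
    """Return ids of objects whose color is close to target_rgb.

    objects: records with hypothetical 'id' and 'rgb' fields.
    tolerance: maximum Euclidean distance in RGB space to count as a match.
    """
    return [
        obj["id"]
        for obj in objects
        if dist(obj["rgb"], target_rgb) <= tolerance
    ]


# Made-up sample records for illustration only.
catalog = [
    {"id": "obj-a", "rgb": (60, 200, 80)},
    {"id": "obj-b", "rgb": (200, 60, 60)},
    {"id": "obj-c", "rgb": (70, 190, 90)},
]
candidates = select_by_color(catalog, target_rgb=(64, 196, 84), tolerance=20)
```

A filter like this only narrows the pool; as in the account above, the resulting candidates would still need human review to decide which are genuine “Green Peas.”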

The “Green Peas” are a great example of Galaxy Zoo’s ability to teach participants what science is, rather than facts and knowledge that have already been established through science. Using the forum as a communication tool and the Galaxy Zoo system as a source of data, Galaxy Zoo participants come to the basic understanding that science is about answering questions, not absorbing facts, and that answering questions requires data, logic, and a variety of other tools and skills. The perception that science is for “other people” rapidly goes away when people begin doing science themselves, as many in the Galaxy Zoo community have done, by:

1. Finding interesting objects.
2. Figuring out if there are more of these objects.
3. Characterizing found objects.
4. Learning about how data is used.
5. Discussing and learning about the peer review process.

Hanny’s Voorwerp and the “Green Peas” have motivated Zooniverse investigators and developers to design better tools for scientific collaboration amongst participants. For example, the Moon Zoo website, a project designed so that citizen scientists can identify and outline craters photographed by the Lunar Reconnaissance Orbiter, is undergoing a design change so that users will be able to more effectively collaborate during the crater identification process. This new tool gives users the opportunity to tag photos with information about craters and other features of interest, and also allows users to write comments for each photo. These comments and tags will link photos with similar features, and will serve as a mini-forum based around interesting photos and objects. Instead of using a separate forum to collaborate on objects of scientific interest, the new Moon Zoo system will allow users to collaborate as part of the classification process, better integrating data collection and scientific outreach goals.

The Moon Zoo collaboration interface (Beta) is shown below:

Figure 4: Moon Zoo Collaboration Interface.

6. Participation and Citizen Science Philosophy

Unlike many citizen science projects that are motivated primarily by a need to collect large amounts of data in a timely way, Galaxy Zoo and the other Zooniverse projects are developed by a scientific team with a more holistic and innovative attitude toward the citizen science experience. This attitude is encapsulated by three “ethical rules of citizen science”:

1. Don’t waste people’s time.
2. Treat participants as collaborators.
3. Don’t ask humans to do anything that can be done computationally.

The first of these rules is relatively self-explanatory, and it is based on an assumption: that people are participating in projects like Galaxy Zoo because they are interested in being involved in research, not simply because they like clicking through a decision tree about galaxies. Successful adherence to this rule involves making sure that the data people are contributing will be useful, and making sure that papers are published from it. Furthermore, it requires that data stop being collected when it is no longer necessary; as an example, Zooniverse’s Supernovae project works with a somewhat limited amount of data, and once that data is adequately classified, there is no need for participants to keep working with it. Thus, this project is turned on and off as needed, rather than left running to waste people’s time.

Treating participants as collaborators is another important approach to citizen science. Some projects (though certainly not all) have a tendency to view citizen scientists as computational resources, rather than as co-investigators. This can sometimes be an effective way to address research problems, but the Galaxy Zoo investigators are more interested in treating participants as collaborators. This approach has resulted in several interesting and important discoveries, notably Hanny’s Voorwerp and the “Green Peas,” both of which saw citizen scientists making real and meaningful scientific contributions in collaboration with Galaxy Zoo investigators. Notably, both Hanny Van Arkel and those involved in the “Green Peas” investigation were credited for their work with acknowledgements and co-authorships. This is an important aspect of collaboration, and the Zooniverse projects are designed to give participants credit for their help. In fact, many publications resulting from Galaxy Zoo come with a lengthy list of contributors; Galaxy Zoo participants are asked to provide a real name that they would like to be identified by in any published research.

Finally, it is important not to ask participants to do tasks that can be handled by existing computational hardware or software. This is a nuanced ethical rule to adhere to, as there are some tasks that computers could handle in principle, but due to restrictions (cost, hardware, software, etc.) it may not be feasible to have computers do the work. For example, identifying galaxy characteristics is, theoretically, a task that computers should eventually be able to do, but at the current state of the art, it is more reliable and effective to have humans do this task. In some cases, this line is not as clear. Nonetheless, it is important to keep in mind that if a computer could effectively do a task that humans are being asked to do, the first ethical rule of citizen science is violated. It is important to have people working on scientific projects because they are needed, and not simply for “outreach.” This principle is reiterated in the questionnaire distributed to scientists on potential projects to make them begin thinking about how to best conduct a citizen science study.

The three ethical rules of citizen science are factored into decision-making at Zooniverse. With the popularity of Galaxy Zoo, the Zooniverse developers are approached regularly about deploying new citizen science-based projects. The ethical rules are used as a test to ensure that potential projects will be of value to both project investigators and project participants.

The ethical rules also touch on a broader point related to participation: unlike some projects, the Zooniverse projects are participant-centered. This has resulted in a number of interesting possibilities for future project development. One such possibility is the notion of a “citizen science career,” whereby an individual might contribute over time to multiple projects of varying difficulty, advancing from novice to expert rank and achieving more and more involvement in the scientific process. The “Green Peas” project is an example of this, where participants expanded their role in the Galaxy Zoo project and took on more than simple human-computation tasks. Taken further, the idea of a citizen science career could see citizen scientists starting small – with simple classification tasks, for example – then “graduating” to larger and more complex projects, and finally taking on fully investigative roles under the guidance of a project investigator… or even taking on project leadership themselves. In a sense, this is already how science education takes place for those who intend careers in academia or scientific research, and providing an avenue for this scholarship in a citizen science context is both novel and highly interesting.

The Zooniverse is already tackling some of this challenge through the deployment of projects that vary in difficulty, thereby creating a path for citizen scientists to “graduate” from one project to another. For example, while Galaxy Zoo involves relatively simple classification tasks, other Zooniverse projects, such as the Galaxy Mergers project, are much more involved. This kind of “graduation” from project to project is definitely possible, but project investigators actually see the situation as more nuanced, suggesting that participation across projects may also vary according to mood and interest: “If you log in on Saturday, you’re having a day off, and you’re feeling relaxed, you want a bit of a challenge, you might choose to do something complicated, like our mergers project, where you control a model of a merging galaxy. And it’s really quite involved and you might spend 45 minutes on a single object. But if you’re in front of the telly on a Thursday night, you might just classify galaxies.”

One potential pitfall to offering multiple projects of varying complexity and difficulty is that participants will become diluted across projects. That is, as new projects are brought online, a certain percentage of participants will leave one project to work with the new one, and overall participation on each project will drop. This was a real enough concern that there was serious discussion of creating a committee to address the issue, but according to site traffic data from the Zooniverse, this dilution effect doesn’t actually occur. Projects appear to find their own audience when they are launched, and while a few people will shift their interest to new projects, for the most part, each project finds its own stable core of users.

7. Galaxy Zoo Website Structure

An analysis of the Galaxy Zoo website, along with interviews conducted with project staff, shows how the site is organized and structured. The Galaxy Zoo website has a relatively wide and flat structure, with a series of “top level” links located across the top of each page, and only a few secondary links located within those main sections. This wide, flat structure makes the website ideal for inexperienced users who may not be familiar with the site or its various sections, since it is relatively easy to see at a glance what the site contains.

Most of the top-level links in the Galaxy Zoo website are references to internal sections of the website itself: for example, “Home,” “The Story So Far,” “The Science,” “How to Take Part,” “Classify Galaxies,” “Zoo Media,” “FAQ,” and “Contact Us.” A few links take users to external but related site sections: for example, “Forum,” and “Blogs.” Because the Galaxy Zoo website is developed by an in-house team using Ruby on Rails and a MySQL database, rather than using a CMS or other off-the-shelf platform, it is possible to achieve a high level of cohesiveness between the various sections of the website, something that other projects have occasionally struggled with. In a few places, however, it seems that an off-the-shelf solution is still preferable (e.g. the Forum). This may be changing, however, as at least one Zooniverse project (Moon Zoo) will be deploying a customized comment and discussion tool directly within the data collection mechanism.

In addition to the primary top-level links, the Galaxy Zoo website also contains several varieties of sub-navigation. A series of links located near the bottom of every page, in a large toolbar, affords access to a variety of key resources, as well as external links. This sub-menu is similar to large-format footers found on some websites, though, because of its position on the page, it is not a true footer per se. Links in this area include “Galaxy Zoo Quick Links” (“Classify,” “How To Take Part,” “Galaxy Zoo Forum,” “Galaxy Zoo Blog,” “Galaxy Zoo Twitter”) and “Astronomy Links” (“Sloan Digital Sky Survey,” “SDSS Database Access,” “Oxford University,” “University of Nottingham,” “University of Portsmouth,” “Yale University,” “”). A true footer, located below the sub-menu, contains a few typical footer links: “The team,” “Privacy Statement,” and “Copyright.”

The following screen shot shows a typical page of the Galaxy Zoo website, in this case the “Home” page, which provides access to most of the rest of the website:

Figure 5: The Galaxy Zoo Website (Galaxy Zoo Website: Home, 2010).

The overall structure of the Galaxy Zoo website is clearly visible in this screen shot. More detailed structural views of each section of the Galaxy Zoo website are included in Appendix A.

8. Comparison of Galaxy Zoo to Other Citizen Science Projects

Without detailed knowledge of the behind-the-scenes workings of other citizen science platforms, it is difficult to make comparisons between them. Nonetheless, the front-end systems, functions, and features of other projects do offer valuable clues as to their inner workings and the motivations of their designers. Thus it is possible and appropriate to compare at least the public portions of Galaxy Zoo to some of its peer citizen science platforms.

A review of 27 citizen science websites was conducted. The websites examined contained the following key functionalities (Figure 6), which are ordered based on the total count for each feature. Features were identified through an iterative, exploratory process; features identified later in the review process were then re-checked in sites that had already been reviewed. In all, 33 key features of citizen science websites were identified, and, across all the websites reviewed, 354 total instances of these features were found (for a more detailed analysis, see related documents pertaining specifically to this review). Features found in the Galaxy Zoo platform are noted in the rightmost column.

Functionality Type              Count   Galaxy Zoo
Project Information             26      X
Instructions                    26      X
Contact Information             23      X
Scientific Information          21      X
Collect User Information        18      X
Education                       18
Project Data So Far             17      X
Registration                    16      X
Team/Staff Information          16      X
Affiliates and Sponsors         16
FAQ                             15      X
Sign In                         13      X
Links                           13      X
Submit Text Data                11
News Feed                       11
Blogs                           11      X
Submit Other Media              10
Forum                           9       X
Alerts                          9       X
Donate                          8
Practice and Testing            7       X
Analyze Data                    7       X
Published Papers                5       X
Participant Scores and Stats    5
Email List                      5
Photo/Image Gallery             5
Offline Registration            4
Customized User Experience      2
Page Translation                2
Calendar                        2
Reward for Participants         1
Contests                        1
Sales / Store                   1
Total Features                  354

Figure 6: Key Citizen Science Website Functionalities

Like other large-scale citizen science projects, the Galaxy Zoo website contains most of the core features one would expect: static information pages about the project and investigators, registration and sign-in systems, data analysis functionality, and detailed information about participation. Of those sections absent from Galaxy Zoo, few are surprising. For the most part, the Galaxy Zoo website appears to contain the sections it needs, while excluding other extraneous functionality and information. This supports the acknowledged design philosophy behind the site, which is to reduce complexity wherever possible. It also supports the stated reasoning behind developing sites in a custom, bespoke manner: custom sites can contain the functionality they need while excluding functionality that is unnecessary.
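The tallies in the feature table can be checked with a few lines of code. The sketch below transcribes the counts and the Galaxy Zoo column from Figure 6 and recomputes the totals; the variable names are illustrative, and the Galaxy Zoo flags are only as reliable as the transcription of the table.

```python
# (feature, number of reviewed sites with the feature, present on Galaxy Zoo)
# Values transcribed from Figure 6 of this case study.
FEATURES = [
    ("Project Information", 26, True),
    ("Instructions", 26, True),
    ("Contact Information", 23, True),
    ("Scientific Information", 21, True),
    ("Collect User Information", 18, True),
    ("Education", 18, False),
    ("Project Data So Far", 17, True),
    ("Registration", 16, True),
    ("Team/Staff Information", 16, True),
    ("Affiliates and Sponsors", 16, False),
    ("FAQ", 15, True),
    ("Sign In", 13, True),
    ("Links", 13, True),
    ("Submit Text Data", 11, False),
    ("News Feed", 11, False),
    ("Blogs", 11, True),
    ("Submit Other Media", 10, False),
    ("Forum", 9, True),
    ("Alerts", 9, True),
    ("Donate", 8, False),
    ("Practice and Testing", 7, True),
    ("Analyze Data", 7, True),
    ("Published Papers", 5, True),
    ("Participant Scores and Stats", 5, False),
    ("Email List", 5, False),
    ("Photo/Image Gallery", 5, False),
    ("Offline Registration", 4, False),
    ("Customized User Experience", 2, False),
    ("Page Translation", 2, False),
    ("Calendar", 2, False),
    ("Reward for Participants", 1, False),
    ("Contests", 1, False),
    ("Sales / Store", 1, False),
]

# Total feature instances across all reviewed sites.
total_instances = sum(count for _, count, _ in FEATURES)

# Features marked as present on Galaxy Zoo.
galaxy_zoo_features = [name for name, _, present in FEATURES if present]

print(len(FEATURES))             # 33 distinct features
print(total_instances)           # 354 instances in total
print(len(galaxy_zoo_features))  # 17 features found on Galaxy Zoo
```

As transcribed here, Galaxy Zoo carries 17 of the 33 features, and its marked features are concentrated among those most common across the reviewed sites.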

One surprising absence is an education page. This may be attributed to the Zooniverse philosophy of scientific outreach: rather than educating people on astronomy subjects per se, Galaxy Zoo and other Zooniverse projects are educating users on scientific process. Even so, a section on peer review and publication, or information about the scientific method, could be valuable for users who are interested in going beyond simple classification of galaxies to more advanced scientific inquiry. In general, there appear to be few tools designed specifically to support process-oriented scientific learning, though this is changing somewhat with the inclusion of discussion and sharing tools connected to specific classification tasks.

Unlike many other citizen science websites, Galaxy Zoo does post published research papers. This relatively unusual practice is likely the result of project investigators’ philosophies toward citizen science, where participants are viewed as collaborators rather than as distributed “computing” resources. Posting the published material that has come out of the Galaxy Zoo project is a way of showing participants how their work is being used, and also a way to give credit to participants. In general, participants are acknowledged in these published works, especially when they have made greater-than-usual contributions to the research. In at least a few cases, citizen scientists have been included as co-authors on Galaxy Zoo papers oriented around their findings.

Another key difference between Galaxy Zoo and other citizen science projects is the emphasis on annotation tasks over data collection tasks. The distinction between annotation and collection is an important defining characteristic of citizen science projects, and most fall into one or the other of these categories. Annotation tasks ask citizen scientists to generate new information out of existing data sources, for example by classifying a galaxy from a photograph, defining the physical measurements of a crater, or transcribing an old document. Collection tasks require citizen scientists to produce new data through observation, for example by counting insects, measuring rainfall, or registering earthquake events.

At times, the line between annotation and collection can become blurred. For example, some projects ask participants to go into the field and return observations of various flora and fauna, seemingly a relatively clear-cut collection task. The eBird project is an excellent example of this. Many of these projects, including eBird, also request information beyond simple counts – for example, species type, weather conditions, time of year, time of day, location, and more. This hybrid task adds a layer of annotation to the collection task and requires more effort by participants, but potentially results in greater scientific gain overall.

Galaxy Zoo, where galaxy photographs have already been collected by various space telescopes and other scientific instrumentation, is not necessarily unique in its emphasis on annotation. Other projects also emphasize annotation over collection, including many of the other Zooniverse projects, fossil identification and observation projects, and the Stardust@Home project (where participants use a unique web interface to identify microscopic space dust collected by a specialized space capsule). Nonetheless, the differences between pure annotation, pure collection, and hybrid tasks are important and worthy of mention.

Finally, as has been seen in other citizen science platforms, there appear to be two typical routes for technical implementation in the citizen science community, with some programs opting for custom development, and others leveraging a content management system (CMS) or other off-the-shelf software. Galaxy Zoo and the Zooniverse sites are all developed in a custom way, as previously discussed. To compare this to other citizen science websites, the same 27 citizen science websites were reviewed specifically for evidence of custom, CMS, or hybrid (a combination of custom and CMS) development techniques. To make this determination, both the website as presented to the user and its public HTML code (viewed using “view page source”) were examined. The following table shows how custom development compares to CMS development across the citizen science websites reviewed.

Development Technique   Count
Custom                  20
CMS                     6
Hybrid                  1

Figure 6: Citizen Science Website Development Approaches

As shown, custom development appears to be the more common approach for development of citizen science supporting web technologies; 20 of the 27 sites reviewed had custom websites as opposed to websites built through the aegis of a content management system. It should be noted, however, that a “custom” web implementation does not necessarily imply a “better” implementation. Many of the custom sites examined in this review were less elegant or less professionally done than those based on a CMS. That is, though a site may be “custom,” its quality may still fall anywhere from poor to excellent. Indeed, it is probably true that custom sites span a wider range of quality than do CMS-based sites, which must all conform to at least a minimum standard of pre-established quality. In this review, Galaxy Zoo stood out as an example of very high quality custom development.
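The page-source inspection used to separate custom from CMS sites can be partly automated. The following is a minimal sketch, not the procedure used in this review: it assumes that a CMS-built page either declares itself in a `<meta name="generator">` tag or references path fragments typical of a default WordPress, Drupal, or Joomla install. The fingerprint list, the function name `guess_platform`, and the result strings are illustrative assumptions.

```python
from html.parser import HTMLParser

# Illustrative fingerprints only (an assumption of this sketch): path fragments
# commonly seen in default installs of each CMS. The criteria actually used in
# the review are not documented at this level of detail.
CMS_PATH_HINTS = {
    "WordPress": ("/wp-content/", "/wp-includes/"),
    "Drupal": ("/sites/default/files/", "/sites/all/"),
    "Joomla": ("/components/com_", "/media/system/js/"),
}


class GeneratorFinder(HTMLParser):
    """Collects the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name == "generator" and attr.get("content"):
            self.generators.append(attr["content"])


def guess_platform(html):
    """Return a rough label for the evidence of CMS use found in page source."""
    finder = GeneratorFinder()
    finder.feed(html)
    if finder.generators:
        # The generator tag names whatever tool produced the page,
        # which may or may not be a CMS.
        return "generator tag: " + finder.generators[0]
    lowered = html.lower()
    for cms, hints in CMS_PATH_HINTS.items():
        if any(hint in lowered for hint in hints):
            return "path hint: " + cms
    return "no CMS evidence"


sample = '<html><head><meta name="generator" content="WordPress 3.0"></head></html>'
print(guess_platform(sample))  # generator tag: WordPress 3.0
```

A result of "no CMS evidence" does not prove custom development, since a CMS can be configured to hide these markers. This is one reason the review also considered the site as presented to the user.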

9. Discussion

This comprehensive study of Galaxy Zoo and the sister websites that comprise the Zooniverse suggests a variety of important lessons regarding citizen science and citizen science-supporting technologies. These appear to be clustered around three broad categories: design and development lessons, lessons for planning a citizen science project, and lessons oriented on the phenomenon of citizen science itself.

9.1 Design & Development

The Galaxy Zoo project revealed that developing technology for citizen science often requires balances and tradeoffs. This was seen in several of its design-related aspects, as well as in the design and development of other Zooniverse websites. For example, Galaxy Zoo was developed with a relatively simple data paradigm in mind: users annotating assets. This simple phrase established a structure for Galaxy Zoo’s underlying data and says much about how information is related within the system.

Traditional programming practice suggests that underlying data should be generalized, thus making it more reusable. While the user-annotation-asset paradigm is general and has been applied to a variety of Zooniverse projects, manifesting a unified data structure for each website that follows this principle has not been possible. Rather, due to practical constraints and development preferences, each Zooniverse project is approached like a stand-alone development task. While the user-annotation-asset paradigm holds relatively true at a conceptual level, at the data level, each project, especially new ones, is unique.
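The user-annotation-asset paradigm can be illustrated with a small data model. The sketch below is conceptual only and does not reflect the actual Zooniverse schema, which this case study does not document; all class names, field names, and sample values are assumptions. The free-form `payload` field stands in for the point made above: the three-way relationship is shared, while the annotation details differ per project.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class User:
    user_id: int
    display_name: str


@dataclass(frozen=True)
class Asset:
    asset_id: int
    project: str    # e.g. "Galaxy Zoo" or "Moon Zoo"
    location: str   # where the image or document is stored (hypothetical path)


@dataclass
class Annotation:
    """One user's annotation of one asset."""
    user: User
    asset: Asset
    payload: dict   # project-specific answers; the structure varies by project
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def annotations_for(asset, annotations):
    """Return every annotation recorded against the given asset."""
    return [a for a in annotations if a.asset == asset]


alice = User(1, "alice")
galaxy = Asset(10, "Galaxy Zoo", "images/galaxy_10.jpg")
crater = Asset(20, "Moon Zoo", "images/moon_20.jpg")

records = [
    Annotation(alice, galaxy, {"shape": "spiral"}),
    Annotation(alice, crater, {"crater_diameter_px": 42}),
]

print([a.payload for a in annotations_for(galaxy, records)])
# [{'shape': 'spiral'}]
```

Keeping the payload untyped makes the model reusable across projects, but it moves validation into project-specific code, which is consistent with each project being unique at the data level.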

Another developmental trade-off is found in the reuse of previously developed code, so-called “code libraries.” Again, traditional programming practice dictates that code be reused as much as possible, to save time on new development, i.e. to avoid “reinventing the wheel” each time a new project is started. On the Zooniverse project, an extensive code library was developed during the Galaxy Zoo implementation. It was used heavily thereafter, at least until additional developers were added to the team. With new programmers, however, the code library became a source of tradeoffs: though the code library contained previously developed code to handle a wide variety of citizen science-related tasks, it was complex and time-consuming for recently hired programmers to sift through and use effectively. Though the code library offered a great deal of functionality, sometimes it offered more than was needed for a given project, exacerbating the challenges surrounding its use. Finally, the code library had been written largely by one developer, and many new members spent their time learning the library, rather than building new systems of their own for Zooniverse; it was difficult to gain a sense of ownership over the Zooniverse projects when their code was largely the product of one developer.

Such challenges resulted in a new development paradigm, where some previous code is retained or centralized, especially for highly repetitive tasks (e.g. user management), but each Zooniverse site is thereafter approached like an independent design project. This gives developers greater ownership over the project and diverts developer time toward new and customized solutions for each project, rather than toward recycling previously developed code. New developments are more targeted and ultimately easier to modify should the need arise, while development time has not been particularly adversely affected by moving to custom development and away from code libraries.

A third balance found in the Zooniverse is that of resources. Any developer must deal with limitations and restrictions centered on quality, cost, and build time. While many citizen science projects rely on volunteer web development efforts to reduce cost, this can sometimes have an effect on either quality or development timelines. At Zooniverse, the first iteration of Galaxy Zoo (a highly successful implementation in its own right) was developed largely with volunteer developer time. Though this site performed well above expectations, it was not without its trouble points. When the site was upgraded to a second version, the team realized that volunteer effort was no longer enough; there would have to be an attempt to “professionalize” development. This is not uncommon in the world of business, where a beginning company finds success and must formalize its plans and acquire new expertise to keep pace. In the world of science, however, it can be a serious challenge, as funding and resources are not dictated by the sale of a successful product (i.e. more success equates to more resources), but by grants and faculty research time (i.e. the moment when a project achieves its greatest successes may also be when faculty time is most limited and grant funding is due to expire). Professionalization is important, but it is a difficult challenge to overcome. While funding details are limited, at Zooniverse, multiple projects spread development costs across a variety of funding streams from various scientific projects.

Beyond tradeoffs, matters of interface design contain additional key lessons for citizen science development. At Zooniverse, each project receives a customized interface, suitable for use in just that one project. While these interfaces may generally adhere to the user-annotation-asset paradigm on a conceptual level, they can be quite diverse in their details. Across the various citizen science websites reviewed, interfaces all appear to be rather distinct. Single interfaces supporting multiple projects are rare, and usually only found when two different scientific projects can be supported by one interface developed to collect data for both. This suggests that developing an interface to support multiple citizen science projects with varying scientific goals will be highly challenging.

As with any interface design, this review of Galaxy Zoo found that simplicity was of high importance to developers and users alike. In addition, principles of good usability are generally important; Galaxy Zoo’s greatest technical successes are found in areas where usability is high.

One final development-related lesson is that of content management systems vs. custom development. Zooniverse projects are developed in an entirely custom way, but other citizen science programs opt for CMS development. Each method brings strengths and weaknesses, and the following table illustrates some of these:

CMS: Pros
• Includes open APIs for many common tasks (user management, form submission, etc.)
• Often comes with a support network
• Often free or inexpensive
• May be easy for non-technical staff to work with once running
• Market penetration
• Often comes with stock templates for attractive design “out of the box”

CMS: Cons
• May require specific technical backbone (servers, etc.)
• Sometimes difficult to customize
• Limited to handling data tasks it was designed for
• May be complex to install and configure
• Often too many features, or doesn’t include especially desirable features

Proprietary App: Pros
• Greater choice of development technologies
• Better control over data submission tools
• Better control over data view/visualization tools
• Ability to integrate data tools with site content
• Better control over design and implementation

Proprietary App: Cons
• Can be expensive / difficult to implement
• Requires high level of technical expertise to implement
• Multiple required areas for development (data, UI, coding, etc.)
• May require full-time support staff and/or a non-full-time support arrangement with developers

Figure 7: Pros and Cons for Citizen Science Web Application Frameworks.

The reasoning behind a decision for one or the other of these development options is varied and complex, dependent on a variety of factors. For Zooniverse, the answer was predicated on vision: Galaxy Zoo was an ambitious project, and a CMS was not the most ambitious route to completion. As Galaxy Zoo’s lead developer put it, “We had pretty grand designs when I joined, and we still do.... Galaxy Zoo 2 was the first of a new breed for us. We’re now eight or nine projects down since then. I think going the bespoke route allowed us to really be pretty opinioned about what we thought was the right approach.” For other projects, a CMS might be a more satisfying or productive option. The lesson is that each citizen science project will eventually reach this point of decision: whether to adapt an existing platform to its needs, or to build something entirely new.

9.2 Planning a Citizen Science Project

In addition to design-focused lessons, a variety of lessons for planning and preparing a citizen science project also came to light during this case study. One key lesson is the need for technical developers to work closely with scientists to set expectations and plan the technology that will support a citizen science effort. Many scientists, while experts in their own fields, are not well-versed in the fields of web development or web technology. At the same time, web technical experts are typically not knowledgeable about scientific processes or specific fields of inquiry. To improve a citizen science project’s chances of success, it is important for these groups to meet and understand each other’s needs and capabilities.

For scientists, this often requires revision of expectations, either down to more practical levels, or up, in order to see possibilities that might at first have seemed impossible. For example, it is common for scientists to approach a citizen science project with a great many scientific tasks in mind, only to find that web users typically will not spend the kind of time that they are expecting. Another possibility is for scientists to approach a project assuming that many desirable tasks will either be impossible to implement, or boring for users. The power of the web is double-edged. For example, on the Galaxy Zoo project, more classifications have been provided than are strictly needed because of high user interest, the long-term nature of the project, and the fact that (though the number is quite high) there are a finite number of space telescope pictures available for participant classification. This has been addressed by placing new image sets into the Galaxy Zoo system at periodic intervals. At the same time, some projects are subject to a falloff effect: high user interest early in a project due to publicity and word of mouth, but a rapid fade of this interest over time. Web technologies can be a great benefit to citizen science projects, but for scientists it is important to understand the possibilities, while also keeping practicalities in mind.

For web experts, the opposite is often true. Knowledge both of the possible and the practical is usually quite good. The challenge is to develop an understanding of scientific goals and needs. For example, data validation, useful data formats, reliability, and reproducibility are all key goals of scientific exploration. A web developer for citizen science should understand these goals and have a sense of how science is done, in order to most effectively support the underlying science of a project. At Zooniverse, many of the developers are scientists in their own right, and thus the marriage of web and scientific expertise is quite powerful. On other projects, this union may be weaker, and projects, even if highly successful, are correspondingly more subject to challenges and difficulties stemming from inappropriate or imperfect web implementation.

9.3 Citizen Science Phenomenon

This case study identified several key lessons on the phenomenon of citizen science itself. First and foremost of these are the “rules of ethical citizen science” developed by the Zooniverse team. These rules suggest that citizen science participants be treated as collaborators, not have their time wasted, and not be asked to undertake tasks that could be better undertaken by a computer. These rules have motivated a variety of design decisions on Zooniverse projects, including the addition of tools for collaboration and the release of scientific data and papers to the Zooniverse community at large. Unlike many citizen science projects, participants are credited, either as a group, or, in some cases, individually. At least one participant has been included as a co-author on a published research paper.

The rules for ethical citizen science seem to stem from a desire for scientific outreach and education. Beyond collecting scientific data, Galaxy Zoo and the other Zooniverse projects have an auxiliary goal: teaching participants what it is to do science. This outreach effort emphasizes scientific process, that is, how scientific inquiry, regardless of topic, is conducted. Ultimately, this form of outreach and education generates more knowledgeable participants, and episodes such as the “Green Peas” show how citizen scientists may graduate from simple tasks to complex, self-guided research.

Providing feedback to participants is an important way to foster a sense of community and to encourage continued participation. Feedback can be immediate, for example, letting a participant know that he or she has classified a galaxy correctly or is doing a task well. In addition, feedback in the form of credit for work completed, acknowledgement in published research, and praise for continued participation is also important. In general, citizen scientists crave feedback. One downside to immediate performance feedback is the potential bias it can introduce. A participant who is told whether or not a classification decision agrees with the decisions made by their peers may begin to classify according to this standard. In a project like Galaxy Zoo, this could manifest as follows: if enough people are wrong about a classification, the incorrect decision might come to be seen as “correct,” and soon the data become badly biased as individuals attempt to “be correct” rather than classify galaxies objectively. The desire for feedback is strong, but providing it can create new kinds of challenges.
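The bias mechanism described above can be made concrete with a small example. The sketch below is purely illustrative and is not how Galaxy Zoo computes feedback: it assumes that feedback is a simple yes/no on whether a participant’s label matches the most common label given by earlier participants. The function names, labels, and votes are all hypothetical.

```python
from collections import Counter


def peer_consensus(peer_votes):
    """Return the most common label among earlier votes, or None if there are none."""
    if not peer_votes:
        return None
    return Counter(peer_votes).most_common(1)[0][0]


def agreement_feedback(user_vote, peer_votes):
    """Tell the user whether their label matches the current peer consensus."""
    return user_vote == peer_consensus(peer_votes)


# Hypothetical case: the galaxy really is a spiral, but most earlier
# participants mistook it for an elliptical.
peer_votes = ["elliptical", "elliptical", "spiral", "elliptical"]

print(agreement_feedback("spiral", peer_votes))      # False: the accurate label is flagged
print(agreement_feedback("elliptical", peer_votes))  # True: the mistaken label is rewarded
```

Because the feedback measures agreement rather than accuracy, a participant who wants to “be correct” is nudged toward the majority label, which reinforces the original error.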

A final lesson is the distinction between annotation and collection tasks in citizen science. Galaxy Zoo and most other Zooniverse projects are annotation based. Participants are presented with an asset (a photograph, an old document, etc.) and asked to generate metadata about the asset by, for example, providing information about the galaxy shown in the photo or transcribing some of the written material in the document. Not all citizen science projects are annotation-based. Some require the collection of new data in the field, for example, spotting stars at night, counting bees, or measuring precipitation. Some projects are a hybrid of these methods, requiring both collection and annotation tasks.

In any good citizen science implementation, careful study of the task type must be undertaken to ensure that the technical implementation will adequately support the scientific need. If the task is annotation-oriented, certain functionality will be required to present the user with an asset and capture the resulting metadata. If collection-based, the system will be oriented more toward accepting various measurements and data points. Similarities will exist between these versions, but each has its own particular set of technical requirements.
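The difference in technical requirements between the two task types can be sketched as two submission records with different validity checks. This is a minimal illustration under stated assumptions, not a description of any existing platform: the field names, the use of latitude and longitude for field observations, and the specific checks are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AnnotationSubmission:
    asset_id: int        # must refer to an asset the system already holds
    metadata: dict       # e.g. {"shape": "spiral"}


@dataclass
class CollectionSubmission:
    observed_at: datetime
    latitude: float
    longitude: float
    measurements: dict   # e.g. {"rainfall_mm": 12.5}


def check_annotation(sub, known_asset_ids):
    """Annotation tasks depend on an existing asset plus new metadata."""
    problems = []
    if sub.asset_id not in known_asset_ids:
        problems.append("unknown asset")
    if not sub.metadata:
        problems.append("no metadata supplied")
    return problems


def check_collection(sub):
    """Collection tasks depend on where and what was observed."""
    problems = []
    if not -90.0 <= sub.latitude <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= sub.longitude <= 180.0:
        problems.append("longitude out of range")
    if not sub.measurements:
        problems.append("no measurements supplied")
    return problems


known_assets = {10, 20}
print(check_annotation(AnnotationSubmission(30, {"shape": "spiral"}), known_assets))
# ['unknown asset']
print(check_collection(CollectionSubmission(datetime(2010, 6, 1, 9, 0), 43.0, -76.1, {"bees": 7})))
# []
```

A hybrid project would need both kinds of check, which is one way to see why such projects ask more of participants and of the supporting system.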

10. Conclusion

The Zooniverse websites, Galaxy Zoo among them, are an interesting lesson in supporting scientific needs with technology-supported citizen participation. Lessons learned revolve around both the technologies themselves (development techniques and approaches, usability, and design), as well as the phenomenon of citizen science itself (the notion of ethics for citizen science, the perception of participants as collaborators).

Galaxy Zoo and the Zooniverse are, in many ways, unlike many other citizen science projects, which have scientific outcomes as their only true goal. While scientific outcomes are important, the Zooniverse developers have revised their thinking to a more “foundational” point of view. By fostering scientific process education and true collaboration with participants, the Zooniverse team has built a solid foundation of “home scientists” who are truly able to participate meaningfully in scientific inquiry. With this foundation in place and growing, the potential for Zooniverse projects to take on new and interesting scientific problems is nearly limitless.

Appendix A: Website Structure