Open Science – Towards Reproducible Research
Information Services & Use 37 (2017) 361–367, DOI 10.3233/ISU-170846, IOS Press
Julien Jomier
CEO, Kitware, 26 rue Louis Guérin, 69100 Villeurbanne, France
Tel.: +33 (0)4 37450415; E-mail: [email protected]
Abstract. This paper presents an overview of several efforts towards reproducible research in the field of medical imaging and visualization. In the first section, the components of Open Science are presented: open access, open data and open source. In the second section, the challenges of open science are described and potential solutions are mentioned. Finally, a discussion on the potential future of open science and reproducible research is introduced.
Keywords: Open science, open access, open source, open data, reproducibility
1. Introduction
For centuries, scientific publishing has been the driving mechanism for disseminating knowledge to scientific communities. Many scientists have seen the main role of publishing as a way to “stand on the shoulders of giants” by reusing the methods and knowledge developed by peers, with the common goal of advancing Science. While publishing methods, results, and findings is critical to science, it has slowly become more of a requirement than a dissemination effort. For instance, in academia, researchers are primarily evaluated on the number and quality of their publications. Furthermore, in recent years, the overall cost of publishing has raised the barrier to entry for several institutions around the world. Finally, both the content of published materials and access to them have been criticized by researchers who tried, and failed, to reproduce published methods because of missing information and data.
For all these reasons, the Open Science movement has emerged to overcome these issues with “traditional scientific publishing.” This paper presents the key concepts of Open Science, the challenges associated with the movement, and the direction in which Open Science is moving.
2. Open Science
To understand how Open Science came to be, one must look at the concerns with traditional publishing. First, publishing tends to be seen as a competition more than a collaboration, especially in academia where the need to publish is critical. Second, traditional publishing is usually limited in content: often only written content and figures can be associated with a given publication. This limitation is a source of frustration for many researchers who have tried, without success, to reproduce published results. Third and last, traditional publishing is still a slow process from submission to publication, with a long history of peer reviews and publisher establishment.
0167-5265/17/$35.00 © 2017 – IOS Press and the authors. This article is published online with Open Access and distributed under the terms of the Creative Commons Attribution Non-Commercial License (CC BY-NC 4.0).
In the modern days of agile management, Open Science embraces the concept of agile publishing, in which the maximum amount of information is disseminated quickly to the masses. Open Science currently encompasses three main concepts: Open Access, Open Data, and Open Source, as illustrated in Fig. 1. These concepts are described next.
Fig. 1. Components of Open Science.
3. Open Access
Open Access publishing is mainly done online as a way to disseminate research free of all restrictions on access. Most Open Access journals are also free of many restrictions on use; however, some journals restrict usage, for instance, to non-commercial use.
Open Access requires a new business model for publishers, as the revenue from publishing is not directly generated from readers. On the one hand, open access publishers have found creative ways to sustain these journals, via advertisement, pay-per-use services around publications, or by generating revenue exclusively from the authors. On the other hand, some Open Access journals, known as “delayed journals,” provide publications to their readers for free only after a period of time, meaning that recent publications have to be purchased.
4. Open Data
Another main component of Open Science is the notion of Open Data. As one can guess, the open data movement enables the dissemination of data in a free and open manner. The notion of open data was enabled by the emergence of (very) high speed internet access, which makes it easy to disseminate large datasets from the comfort of a personal computer. Scientists know that data is critical for any experiment, and sharing data can significantly reduce the time an experiment takes. Moreover, by making data collection a collective effort, the quality as well as the quantity of the data is often improved.
Fig. 2. Slice of the Visible Human project.
However, the potential issues with disseminating data are numerous. First, the nature of the data is often diverse and usually depends on the scientific field; the datasets are therefore often very heterogeneous, and their format, if not standardized, can be a bottleneck for reuse by other researchers. Second, datasets are becoming more and more massive, which implies that either the data must be down-sampled in order to be shared, or the infrastructure, in terms of storage and bandwidth, must be upgraded to support a large amount of data.
Third, the scientific communities are interested not only in the input data and final results but also in intermediate data, so that they can build upon already-processed datasets. And fourth, data sharing licenses have required some effort to be accepted and, like open source licenses, they are critical to regulate the dissemination and usage of such data.
One example of Open Data is the Visible Human Project [2], initiated by the National Library of Medicine (NLM) in the USA. This project openly disseminates the color cryosections, CT scans, and MRI scans (see Fig. 2) of a former inmate who gave his body to science. Upon his death, high dose CT scans of the whole body as well as high resolution (at the time) MRI scans were acquired. On top of these datasets, color cryosections (photographs) were generated by cutting the frozen body into thin slices. These datasets have become a standard for medical imaging research and have tremendous value.
A more recent example of Open Data for lung cancer research is the Give a Scan [6] project, initiated by the Lung Cancer Alliance in the USA. Give a Scan is the world’s first patient-powered open database for lung cancer research. The project currently hosts data from over seventy-six patients and provides the imaging data (CT scans) along with metadata, which is critical for longitudinal studies and statistics. The datasets are freely accessible and are currently used in several research studies [5].
5. Open Source
The Open Source movement started back in the 1980s with the creation of the Free Software Foundation (FSF). The movement was strengthened in 1998 by the creation of the Open Source Initiative (OSI). The pillars of Open Source can be described by seven values: security, affordability, transparency, perpetuity, interoperability, flexibility, and localization.
As the years have progressed, a variety of open source licenses have emerged, making it somewhat complicated to understand the full extent of each license. Overall, however, permissive and non-permissive licenses together allow both academia and industry to make the most of open source software. In the past decade, with more and more companies embracing and releasing open source software, the infrastructure for hosting, testing, and deploying open source tools has flourished. Well-known pieces of this infrastructure include GitHub, GitLab, and IPython notebooks, among others.
In the medical imaging field, a well-known open source project, the Insight Toolkit (ITK) [3,7], was initiated in 2000 by the National Library of Medicine (NLM). After the Visible Human project was successfully launched, the question of processing the data was raised; in particular, several universities and groups around the world started to implement their own, regrettably often non-interoperable, image processing software. NLM decided to fund the development of the Insight Toolkit in order to “standardize” the implementation and use of image processing in the medical field. ITK is an open source (BSD license) toolkit written in C++, with wrapping for other languages. It has been developed by over one hundred and fifty developers from around the world and has numerous external users and contributors. The project has been a success and is currently used by academia and industry around the globe.
6. Open Science examples
Many open data, open source and open access projects illustrate the concept of open science. In this section, two projects are presented.
The Insight Journal [4] was created in 2005 as an open access journal companion to the Insight Toolkit. The main idea of the journal is to bring agile programming to the publishing world.
Agile publishing allows authors to publish their findings instantaneously while allowing reviewers to comment directly, without restrictions. Furthermore, based on open source and open data, the Insight Journal enforces reproducible science by running automatic testing upon submission. The result is shown as an automatic review by the testing system, letting users, developers and readers know whether the submission is usable as is. The Journal currently hosts over six hundred papers and has more than four thousand registered readers.
Elsevier’s Science Direct 3D data visualization project [1] was initiated in 2010 in order to bring 3D visualization to scientific publications.