EDITORIAL OPEN

Data sharing and the future of science

Who benefits from sharing data? The scientists of the future do, as data sharing today enables new science tomorrow. Evidence shows that studies based on analyses of previously published data, far from being mere rehashes of old datasets, can achieve just as much impact as original projects.

NATURE COMMUNICATIONS | (2018) 9:2817 | DOI: 10.1038/s41467-018-05227-z | www.nature.com/naturecommunications

Data sharing has a long history in many areas of research. Although the push to encourage social and biological scientists to share and pool their results is a recent one¹, in other fields the use of shared data has been the norm for some time. For over a century, much of economics and meteorology has been based on publicly shared data, for example. However, trepidation in relation to data sharing is still prevalent in the scientific community, particularly in certain disciplines. The issues that make some researchers reluctant to share their own data have been much discussed², but researchers considering using shared data as a basis for their own research also have concerns: if I want to publish high-impact work, don’t I need to collect new data? Is it the act of collecting original data that makes a study novel?

The benefits of data sharing may seem difficult to quantify. But the work of Michael P. Milham and colleagues³ provides direct evidence that, in the field of neuroimaging, published papers based on shared data are just as likely to appear in high-impact journals, and are just as well-cited, compared with papers presenting original data. Although citations of a manuscript and the prestige of the journal in which it appears are not direct measures of the quality or novelty of scientific output, Milham et al.’s results are likely to be reassuring for cognitive neuroscientists concerned about whether the lack of original data collection would reduce the impact of their work.

Indeed, far from being an impediment to carrying out novel science, data sharing makes new types of research possible. Consider, for instance, research using the Human Connectome Project (HCP) dataset, one of the data sharing initiatives included in the Milham et al. study. The HCP currently contains extensive fMRI, structural MRI and behavioural data from 1200 healthy young adult volunteers (https://www.humanconnectome.org/study/hcp-young-adult), and is expanding to encompass child, adolescent and older adult brains. These data are made available to any interested researcher.

While data sharing had a somewhat rocky start in the world of cognitive neuroscience⁴, the success of the HCP and the many influential studies based on it shows that its time has come. Without data sharing, it would be all but impossible for a single research group to scan 1200 people. MRI scans are expensive, and neuroimaging studies using original data typically consist of 20–50 participants. These sample sizes were sufficient to support the kinds of studies that were cutting-edge a decade ago, but today, more advanced methods require much more data.

It’s not just in neuroscience that data sharing has already transformed the kinds of studies that researchers are able to carry out. In genetics, genomics and structural biology, large shared datasets are common (e.g., ref. 5) and many researchers have used and re-used previously published datasets to enable new discovery in these areas⁶.

In the physical sciences, data sharing is also increasingly practiced. In astronomy and astrophysics, for example, telescope data is typically open⁷; without such sharing, most research groups, lacking the funds to construct the kinds of large telescopes required for modern astronomy research, would be unable to reach the cutting edge of discovery. Astronomy data sharing has even expanded to encompass personal computers with the UC Berkeley-based SETI@home program, enabling citizen science participation in data analysis⁸.

The field of ecology has made tremendous strides thanks to data sharing under the USA’s Long-Term Ecological Research (LTER) Network⁹. This network, a set of long-running observations across different ecosystems, has allowed ecologists to detect important patterns playing out over timescales exceeding the length of research appointments or funding cycles. The extent of data sharing in the field more broadly has evolved over time¹⁰, but influential publications are now arising more than ever from databases supported by large networks of researchers¹¹.

These examples demonstrate one clear benefit of data sharing, in that it enables individual researchers to punch above their financial weight by making large, or expensive-to-collect, datasets available to all. In this way, data sharing opens hitherto unforeseen avenues of research. This is not just true of large-scale data sharing initiatives: even relatively small datasets, if shared, can contribute to big data and fuel future scientific discoveries in unexpected ways. In medicine, for example, the patient-level meta-analysis of a large number of past clinical trials has revealed numerous novel findings that go well beyond the original purpose of the studies that generated the data (e.g., ref. 12).

Sharing data, then, is not only a way to improve the reproducibility and robustness of the science that is taking place today¹³, but can drive new science for tomorrow. Given that we cannot predict today how valuable a given set of data will one day prove to be, there is a strong argument to be made that leaving data unshared is an impediment to the scientists of the future. Indeed, we can envision a time in which, far from being a disruptive innovation, data sharing is seen as a normal and essential part of the scientific process, much the way we see peer review.

While SETI@home hasn’t found any alien intelligence just yet, there are billions of stars in our galaxy: how else would we reach for the stars unless we aim together rather than alone? While neuroscientists haven’t yet solved the mysteries of the human brain even using shared data, with some 86 billion neurons¹⁴ in a single brain, they will need to work together to cover them all.

References
1. Gewin, V. Data sharing: an open mind on open data. Nature 529, 117–119 (2016).
2. Tenopir, C. et al. Changes in data sharing and data reuse practices and perceptions among scientists worldwide. PLoS ONE 10, e0134826 (2015).
3. Milham, M. P. et al. Assessment of the impact of shared brain imaging data on the scientific literature. Nat. Commun. 9 (2018). https://doi.org/10.1038/s41467-018-04976-1.
4. Van Horn, J. D. & Gazzaniga, M. S. Why share data? Lessons learned from the fMRIDC. Neuroimage 82, 677–682 (2013).
5. Genome Aggregation Database (gnomAD). http://gnomad.broadinstitute.org/.
6. Bonàs-Guarch, S. et al. Re-analysis of public genetic data reveals a rare X-chromosomal variant associated with type 2 diabetes. Nat. Commun. 9, 321 (2018).
7. How big data advances physics. Marc Chahin June 27, 2017 (blog post). https://www.elsevier.com/connect/how-big-data-advances-physics.
8. SETI@home. https://setiathome.berkeley.edu/.
9. Long-Term Ecological Research Network (LTER). https://lternet.edu/.
10. Michener, W. K. Ecological data sharing. Ecol. Inform. 29, 33–44 (2015).
11. The Earth Microbiome Project. http://www.earthmicrobiome.org/.
12. Fournier, J. C. et al. Antidepressant drug effects and depression severity: a patient-level meta-analysis. JAMA 303, 47–53 (2010).
13. On data availability, reproducibility and reuse. Nat. Cell Biol. 19, 259 (2017).
14. Azevedo, F. A. et al. Equal numbers of neuronal and nonneuronal cells make the human brain an isometrically scaled-up primate brain. J. Comp. Neurol. 513, 532–541 (2009).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© Macmillan Publishers Ltd, Part of Springer Nature 2018