Biomedical Knowledge Production in the Age of Big Data

Exploratory study 2/2017 Biomedical knowledge production in the age of big data Analysis conducted on behalf of the Swiss Science and Innovation Council SSIC — Professor Sabina Leonelli, Philosophy of Science, University of Exeter, UK Exploratory study 2/2017 Biomedical knowledge production in the age of big data The Swiss Der Schweizerische Science and Wissenschafts- Innovation Council und Innovationsrat The Swiss Science and Innovation Council SSIC is the advisory Der Schweizerische Wissenschafts- und Innovationsrat SWIR body to the Federal Council for issues related to science, higher berät den Bund in allen Fragen der Wissenschafts-, Hochschul-, education, research and innovation policy. The goal of the SSIC, Forschungs- und Innovationspolitik. Ziel seiner Arbeit ist die in conformity with its role as an independent consultative body, kontinuierliche Optimierung der Rahmenbedingungen für die is to promote the framework for the successful development of gedeihliche Entwicklung der Schweizer Bildungs-, Forschungs- the Swiss higher education, research and innovation system. As und Innovationslandschaft. Als unabhängiges Beratungsorgan an independent advisory body to the Federal Council, the SSIC des Bundesrates nimmt der SWIR eine Langzeitperspektive auf pursues the Swiss higher education, research and innovation das gesamte BFI-System ein. landscape from a long-term perspective. Le Conseil suisse Il Consiglio svizzero de la science della scienza et de l'innovation e dell'innovazione Le Conseil suisse de la science et de l’innovation CSSI est l’or- Il Consiglio svizzero della scienza e dell’innovazione CSSI è gane consultatif du Conseil fédéral pour les questions relevant l’organo consultivo del Consiglio federale per le questioni ri- de la politique de la science, des hautes écoles, de la recherche guardanti la politica in materia di scienza, scuole universitarie, et de l’innovation. Le but de son travail est l’amélioration ricerca e innovazione. L’obiettivo del suo lavoro è migliorare cons tante des conditions-cadre de l’espace suisse de la forma- le condizioni quadro per lo spazio svizzero della formazione, tion, de la recherche et de l’innovation en vue de son dévelop- della ricerca e dell’innovazione affinché possa svilupparsi in pement optimal. En tant qu’organe consultatif indépendant, le modo armonioso. In qualità di organo consultivo indipendente CSSI prend position dans une perspective à long terme sur le del Consiglio federale il CSSI guarda al sistema svizzero della système suisse de formation, de recherche et d’innovation. formazio ne, della ricerca e dell’innovazione in una prospettiva globale e a lungo termine. Exploratory study 2/2017 2 About the author Biomedical knowledge production in the age of big data Sabina Leonelli is Professor in the Philosophy and History of Science at the University of Exeter, where she co-directs the Exeter Centre for the Study of the Life Sciences and leads the “Data Studies” research strand. Her research, currently funded by a European Research Council award, focuses on the philosophy of data-intensive science, especially the methods, outputs and social embedding of open and big data, and the ways in which they are redefining what counts as research and knowledge. She published over fifty papers in high-ranking journals within biology as well as the philosophy, social studies and history of science, and is the author of Data-Centric Bio logy: A Philosophical Study (2016, Chicago University Press). She has been invited to present her work to scholars and policy-makers around the world, most recently in venues such as the American Association for the Advancement of Science, the World Conference on the Future of Science, the Open Science EU Presidency Conference and the World Science Forum. The research underpinning this report was funded by the European Research Council (grant award 335925), the Lever- hulme Trust (award RPG-2013-153), the Australian Research Council (award DP160102989), the UK Medical Research Coun- cil and Natural Environment Research Council (award MR/ K019341/1) and the UK Economic and Social Research Coun- cil (award ES/P011489/1). The author is very grateful to Lora E. Fleming and Niccolo Tempini for helpful comments on the manuscript, and Rachel Ankeny, Brian Rappert and Barbara Prainsack for relevant discussions. The author contracted by the Swiss Science and Innovation Council to produce the present paper bears full responsibility for its contents. Exploratory study 2/2017 3 Table of contents Biomedical knowledge production in the age of big data Preface by the SSIC 4 Preface by the SSIC 4 Vorwort des SWIR 4 Préface du CSSI 5 Prefazione del CSSI 5 Executive summary 6 Executive summary 6 Executive Summary 7 Résumé 8 Riassunto 9 1 Introduction 12 2 Relevant background 16 2.1 Big data and the rise of data-centric science 16 2.2 Big data in biomedical research 19 3 Current challenges 24 3.1 Scientific and technical concerns: tackling diversity 24 3.1.1 Data infrastructures and information technology: interoperability, standardisation and maintenance 25 3.1.2 Data re-use: the role of theories, metadata and materials 26 3.2 Ethical concerns: making knowledge production sustainable 28 3.2.1 Ownership and the value of data 28 3.2.2 Security, privacy and confidentiality 29 3.2.3 Bias, inequality and the digital divide 30 3.3 Institutional and management issues 31 3.3.1 Division of labour and shifts in roles 31 3.3.2 Reward system and open data enforcement 32 3.4 Financial concerns: which business models for data infrastructures? 33 4 Conclusions 36 4.1 Two key challenges: managing infrastructures and engaging stakeholders 36 4.2 Towards a federated approach 37 4.3 Using big data to produce biomedical knowledge: five principles 38 4.3.1 Principle 1: ethics and security concerns are an integral part of data science 38 4.3.2 Principle 2: public engagement and trust are crucial to the successful dissemination and use of big data in biomedicine 38 4.3.3 Principle 3: effective public engagement depends on big data literacy across society as a whole 38 4.3.4 Principle 4: biomedical research needs to build on multiple sources of evidence, taking account of the novel types of dis crimination created by big data dissemination and re-use 39 4.3.5 Principle 5: data infrastructures and related data management skills are essential to ex tracting biomedical knowledge from big data 39 References 40 Abbreviations 46 Exploratory study 2/2017 4 Preface by the SSIC Biomedical knowledge production in the age of big data Preface by the SSIC Vorwort des SWIR Investigating the notions of health and disease is an important Für den Schweizerischen Wissenschafts- und Innovationsrat topic of the Swiss Science and Innovation Council’s 2016–2019 (SWIR) ist die Auseinandersetzung mit dem Verständnis von working programme. This includes asking under what condi- Gesundheit und Krankheit ein wichtiger Themenbereich des tions data-centric research can be a basis for statements about Arbeitsprogramms 2016–2019. Dazu gehört die Frage nach den individuals and their background. In her analysis for the Swiss Voraussetzungen, unter denen es möglich ist, mittels datenzen- Science and Innovation Council (SSIC) on “Biomedical know- trierter Forschungsansätze Aussagen über ein Individuum in ledge production in the age of big data”, Prof. Sabina Leonelli seinem Lebenszusammenhang zu machen. Die Analyse «Wis- looks at the extent to which the emergence of big data is alter- sensproduktion in der Biomedizin im Zeitalter von Big Data», ing biomedical research practices and findings, and highlights die Prof. Dr. Sabina Leonelli für den SWIR verfasst hat, bietet ei- the range of challenges confronting researchers. In Leonelli’s nen Überblick darüber, inwiefern das Aufkommen von Big Data view, scientific approaches are influenced by ethical, institu- biomedizinische Forschungspraktiken und -ergebnisse verän- tional, financial and technical problems. dert, und zeigt die Vielfalt der Herausforderungen, mit denen Building on these considerations, which the SSIC is Forschende konfrontiert sind. Dabei sind für Leonelli wissen- pleased to present here to a wider readership, the Council de- schaftliche Betrachtungsweisen auch von ethischen, instituti- cided on 26 June 2017 to discuss certain key issues, focussing onellen, finanziellen und technischen Problemfeldern geprägt. on the Swiss context. It will pursue this perspective over the Ausgehend von diesen Überlegungen, die der SWIR mit coming two years, consulting with Swiss players and continu- der vorliegenden Publikation gerne weiteren interessierten ing to bring in international expert knowledge. It does so with Leserinnen und Lesern zur Verfügung stellt, hat der Rat am the view that there are challenges to face as well as opportu- 26. Juni 2017 die Diskussion über zentrale Fragen im schwei- nities to grasp. Technological developments such as the cre- zerischen Kontext begonnen. Er wird seine Arbeit in den kom- ation of interoperable databases and machine learning can be menden zwei Jahren vertiefen, sich mit Schweizer Akteuren of great value, but the main determinant of a successful and auseinandersetzen und weiterhin auch internationales Wissen sustainable development for biomedicine is ultimately the so- einbeziehen. Er tut dies mit der Haltung, dass Herausforderun- cial environment, in particular institutional, financial and le- gen zu bewältigen, aber auch Chancen zu ergreifen sind. Tech- gal factors, and, not least, ethical aspects. The Council will

Biomedical Knowledge Production in the Age of Big Data

Genomic Response to the Social Environment: Implications for Health Outcomes

A Supergene-Linked Estrogen Receptor Drives Alternative Phenotypes in a Polymorphic Songbird

Genomic and Transcriptomic Analyses of the Subterranean Termite

How Genome-Wide Association Studies on Educational Attainment Reify Eugenic Ideologies

Sociogenomics on the Wings of Social Insects Sainath Suryanarayanan

The Sociogenomics of Sexual and Reproductive Behaviour

Densely Packed Yeast: a Tool to Study Evolution of Group Dynamics and to Enhance Biomanufacturing

Genomic Tools for Behavioural Ecologists to Understand Repeatable Individual Differences in Behaviour

Genes, Cognition, and Social Behavior: Next Steps for Foundations and Researchers

Sociogenomics Takes Flight Showed the Expected Expression Patterns in All 61 Four Species

New Frontiers for the Integrative Study of Animal Behavior

Gene Expression and the Evolution of Behavioral Plasticity