D8.1 Data Management Plan

Total Page:16

File Type:pdf, Size:1020Kb

D8.1 Data Management Plan D8.1 Data Management Plan Project ref. no. H2020-MG-2018-SingleStage-INEA Nº 824326 Project title Revealing fair and actionable knowledge from data to support women’s inclusion in transport systems, Project duration 1st November 2018 – 31st October 2021 (36 months) Website www.diamond-project.eu Related WP/Task WP8 / T8.1 Dissemination level PUBLIC Document due date 30/04/2019 (M6) Actual delivery date 30/04/2019 (M6) Deliverable leader Systematica Srl (SYS) Document status Submitted This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 824326 Revision History Version Date Author Document history/approvals 0.1 03/04/2019 SYS Draft version circulated to partners 0.2 15/04/2019 STIR Draft version circulated to partners 1.0 26/04/2019 Anahita Rezaallah (SYS) Final complete version Andrea Gorrini (SYS) 1.0 29/04/2019 EURECAT Validation 1.0 30/04/2019 EURECAT Submission Executive Summary The following document is the Deliverable 8.1_Data Management Plan (DMP) of the DIAMOND Project, funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement Number 824326. The main aim of DIAMOND Project is to turn data into actionable knowledge with notions of fairness, in order to progress towards an inclusive and efficient transport system, by the development of a methodology based on the collection and analysis of disaggregated data, including new sources, analytics and management techniques. This document is the first version of the DMP, consisting of preliminary information regarding the type and format of data that will be collected and generated, its origin, data utility and how DIAMOND project research data will be findable, accessible, interoperable and reusable (i.e. FAIR). The purpose of DMP is to provide the members of the consortium with an analysis of the main elements of the data management policy regarding all the datasets generated by the project. The DMP is a living document that will evolve during the lifespan of the project, the second version D8.1.2_Data Management Plan, revision period 1 will be published on May 31, 2020 and the third and final version D8.1.3_Data Management Plan, revision period 2 will be published by the end of project on November 30, 2021. D8.1 Data Management Plan Page 2 of 20 Table of Contents REVISION HISTORY ............................................................................................................................. 2 EXECUTIVE SUMMARY ...................................................................................................................... 2 TABLE OF CONTENTS ....................................................................................................................... 3 LIST OF FIGURES ................................................................................................................................. 4 LIST OF TABLES ................................................................................................................................... 5 TERMINOLOGY AND ACRONYMS ................................................................................................... 6 INTRODUCTION ......................................................................................................................... 7 DATA SUMMARY ......................................................................................................................... 8 2.1 WHAT IS THE PURPOSE OF THE DATA COLLECTION/GENERATION AND ITS RELATION TO THE OBJECTIVES OF THE PROJECT? ................................................................................................................................................................................. 8 2.2 WHAT TYPES AND FORMATS OF DATA WILL THE PROJECT GENERATE/COLLECT? ................................................ 9 2.2.1 Type of data .................................................................................................................................................................................. 9 2.2.2 Format of data........................................................................................................................................................................... 10 2.3 WILL YOU RE-USE ANY EXISTING DATA AND HOW? ............................................................................................... 12 2.4 WHAT IS THE ORIGIN OF THE DATA? ........................................................................................................................ 12 2.5 WHAT IS THE EXPECTED SIZE OF THE DATA? ........................................................................................................... 12 2.6 TO WHOM MIGHT IT BE USEFUL ('DATA UTILITY')? .................................................................................................. 13 FAIR DATA .................................................................................................................................. 14 3.1 MAKING DATA FINDABLE, INCLUDING PROVISIONS FOR METADATA .................................................................... 14 3.1.1 Metadata provision .................................................................................................................................................................. 14 3.1.2 Standards for metadata creation ........................................................................................................................................ 14 3.2 MAKING DATA OPENLY ACCESSIBLE .......................................................................................................................... 14 3.3 MAKING DATA INTEROPERABLE ................................................................................................................................. 18 ALLOCATION OF RESOURCES .............................................................................................. 18 DATA SECURITY ....................................................................................................................... 18 ETHICAL ISSUES ........................................................................................................................ 18 6.1 GENDER ANALYSIS AND COMMITMENT TO THE RESPONSIBLE RESEARCH INNOVATION (RRI) PRINCIPLES ... 19 D8.1 Data Management Plan Page 3 of 20 List of Figures Figure 1. Data Life Cycle ................................................................................................................................... 7 D8.1 Data Management Plan Page 4 of 20 List of Tables Table 1. List of DIAMOND project data sets for each use-case ........................................................... 10 Table 2. File formats recommended and accepted by the UK Data Service for data sharing, reuse and preservation ............................................................................................................................................... 12 Table 3. Detail of targeted users benefiting from DIAMOND project ................................................ 14 Table 4. List of DIAMOND project deliverables and their dissemination level................................. 18 D8.1 Data Management Plan Page 5 of 20 Terminology and Acronyms CCTV Closed-Circuit Television DMP Data Management Plan EC European Commission EU European Union EUT FUNDACIO EURECAT FAIR Findable, Accessible, Interoperable, Reusable FGC Ferrocarrils de la Generalitat de Catalunya G&V GENRE ET VILLE H2020 Horizon 2020 ORDP Open Research Data Pilot SYS Systematica S.r.l. VELIB SYNDICAT MIXTE AUTOLIB ET VELIB MÉTROPOLE WAVE WAVE WOMEN AND VEHICLES IN EUROPE WP Work Package ZTM MIASTO STOLECZNE WARSZAWA D8.1 Data Management Plan Page 6 of 20 Introduction The main purpose of the DMP is to provide the members of the consortium with an analysis of the main elements of the data management policy to be used with regard to the project research data. The DMP is produced using the European Commission (EC) guideline of "Horizon 2020 FAIR Data Management Plan (DMP) template", Version: 26 July 2016. It covers the entire research data life cycle as it is shown in the following figure that illustrates the different phases and aspects of research data management and their relation. Figure 1. Data Life Cycle1 It is noteworthy that in reality, research processes are rarely as linear as shown here but the model summarizes the most important aspects of data management at different phases of the project. The DMP provides information regarding purpose of data collection, type and format of data collection and generation, methods of re-using existing data, origin of data, expected size of data, data utility, how DIAMOND project research data will be findable, accessible, interoperable and reusable (FAIR), data security and ethical issues. As this document is the first version of the DMP, delivered in Month 6 of the project, it only provides a general overview of data collection. The upcoming versions of the DMP will be more detailed and describe the practical data management procedures implemented by the DIAMOND project. The DMP will be updated in Month 18 (D8.1.2) and Month 36 (D8.1.3) respectively. 1 SOURCE: HÜSER, FALCO JONAS; ELBÆK, MIKAEL K.; MARTINEZ LAVANCHY, PAULA (2016): DTU RESEARCH DATA LIFE CYCLE. FIGSHARE. FIGURE. D8.1 Data Management Plan Page 7 of 20 Data summary 2.1 What is the purpose of the data collection/generation and its relation to the objectives of the project? The purpose of data collection campaigns
Recommended publications
  • Etriks -Standards Starter Pack Standards Guidelines
    eTRIKS -Standards Starter Pack Standards Guidelines Release 1.1 - 25th April 2016 Authors (by alphabetical order) Bratfalean, Dorina – CDISC Europe Foundation, Braxenthaler, Michael – Roche Innovation Center New York, Houston, Paul – CDISC Europe Foundation (*), Munro, Robin – ID Business Solutions Limited, Richard, Fabien – Centre National de la Recherche Scientifique, Rocca-Serra, Philippe – Oxford e-Research Centre, University of Oxford (*), Romacker, Martin – Roche Innovation Center Basel, Sansone, Susanna-Assunta – Oxford e-Research Centre, University of Oxford. (*) Correspondence to [email protected] or [email protected] Version History Version Date Who Role Notes 1.0 11 Philippe Rocca- Draft creator and Draft for internal use February Serra, Fabien maintainer 2015 Richard, Dorina Bratfalean 1.1 25 April Philippe Rocca- maintainer Public release 2016 Serra Licence: https://creativecommons.org/licenses/by-sa/4.0/ 1 A Business Case for Standards in eTRIKS Part 1. Introduction 1.1 eTRIKS mission and objectives 1.2 Document objective 1.3 Intended Audience 1.4 Standard Definition and Typology 1.4.1 Definition of Standards: 1.4.2 Typology of Standards 1.5 Purpose of Standards Part 2. Procedure for standards selection and recommendation 2.1 Procedure outline 2.2 Attributes of standards 2.3 Versioning of Standards 2.4. Standardization Bodies and Service Providers 2.4. Gaps in Standards 2.4.1 Coverage gap in a domain covered by an existing standard 2.4.2 Coverage gap in a domain not covered by standards 2.5 Changes, maintenance and updates to eTRIKS Standard Starter Pack Part 3. Standards in data management 3.1 Standards for Data Security, Data Privacy and Compliance with Ethical Guidelines.
    [Show full text]
  • Turning FAIR Data Into Reality
    Final Report and Action Plan from the European Commission Expert Group on FAIR Data TURNING FAIR INTO REALITY Research and Innovation 2018 Turning FAIR into reality European Commission Directorate General for Research and Innovation Directorate B – Open Innovation and Open Science Unit B2 – Open Science Contact Athanasios Karalopoulos E-mail [email protected] [email protected] European Commission B-1049 Brussels Manuscript completed in November 2018. This document has been prepared for the European Commission however it reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein. More information on the European Union is available on the internet (http://europa.eu). Luxembourg: Publications Office of the European Union, 2018 Print ISBN 978-92-79-96547-0 doi:10.2777/54599 KI-06-18-206-EN-C PDF ISBN 978-92-79-96546-3 doi: 10.2777/1524 KI-06-18-206-EN-N © European Union, 2018. Reuse is authorised provided the source is acknowledged. The reuse policy of European Commission documents is regulated by Decision 2011/833/EU (OJ L 330, 14.12.2011, p. 39). For any use or reproduction of photos or other material that is not under the EU copyright, permission must be sought directly from the copyright holders. The Expert Group operates in full autonomy and transparency. The views and recommendations in this report are those of the Expert Group members acting in their personal capacities and do not necessarily represent the opinions of the European Commission or any other body; nor do they commit the Commission to implement them.
    [Show full text]
  • White Paper on Implementing the FAIR Principles for Data in the Social, Behavioural, and Economic Sciences
    A Service of Leibniz-Informationszentrum econstor Wirtschaft Leibniz Information Centre Make Your Publications Visible. zbw for Economics Betancort Cabrera, Noemi et al. Working Paper White Paper on implementing the FAIR principles for data in the social, behavioural, and economic sciences RatSWD Working Paper, No. 274 Provided in Cooperation with: German Data Forum (RatSWD) Suggested Citation: Betancort Cabrera, Noemi et al. (2020) : White Paper on implementing the FAIR principles for data in the social, behavioural, and economic sciences, RatSWD Working Paper, No. 274, Rat für Sozial- und Wirtschaftsdaten (RatSWD), Berlin, http://dx.doi.org/10.17620/02671.60 This Version is available at: http://hdl.handle.net/10419/229719 Standard-Nutzungsbedingungen: Terms of use: Die Dokumente auf EconStor dürfen zu eigenen wissenschaftlichen Documents in EconStor may be saved and copied for your Zwecken und zum Privatgebrauch gespeichert und kopiert werden. personal and scholarly purposes. Sie dürfen die Dokumente nicht für öffentliche oder kommerzielle You are not to copy documents for public or commercial Zwecke vervielfältigen, öffentlich ausstellen, öffentlich zugänglich purposes, to exhibit the documents publicly, to make them machen, vertreiben oder anderweitig nutzen. publicly available on the internet, or to distribute or otherwise use the documents in public. Sofern die Verfasser die Dokumente unter Open-Content-Lizenzen (insbesondere CC-Lizenzen) zur Verfügung gestellt haben sollten, If the documents have been made available under an Open
    [Show full text]
  • Top 10 FAIR Data & Software Things
    Top 10 FAIR Data & Software Things 15 September 2019 Sprinters: Albert Meroño Peñuela, Andrea Scharnhorst, Andrew Mehnert, Aswin Narayanan, Chris Erdmann, Danail Hristozov, Daniel Bangert, Eliane Fankhauser, Ellen Leenarts, Elli Papadopoulou, Emma Lazzeri, Enrico Daga, Evert Rol, Francoise Genova, Frans Huigen, Gerard Coen, Gerry Ryder, Iryna Kuchma, Joanne Yeomans, Jose Manzano Patron, Juande Santander-Vela, Katerina Lenaki, Kathleen Gregory, Kristina Hettne, Leonidas Mouchliadis, Maria Cruz, Marjan Grootveld, Matthew Kenworthy, Matthias Liffers, Natalie Meyers, Paula Andrea Martinez, Ronald Siebes, Spyros Zoupanos, and Stella Stoycheva. Organisations: Athena Research Center, Australian Research Data Commons (ARDC), California Digital Library, Centre de Données astronomiques de Strasbourg, Centre for Advanced Imaging, Centre for Microscopy Characterisation and Analysis, Data Archiving & Networked Services (DANS), EIFL, ELIXIR-Europe, EPFL, FORTH-IESL, FOSTER Open Science, GRACIOUS, Greek Ministry of Education, Greendecision, Göttingen State and University Library, Leiden Observatory, Leiden University, Library Carpentry, National Imaging Facility (NIF) and Characterisation Virtual Laboratory (CVL), National Imaging Facility and Centre for Advanced Imaging,National Research Council of Italy, OpenAire, SKA Organisation: Jodrell Bank Observatory, The Open University, The University of Western Australia, Universitaire Bibliotheken Leiden, University of Notre Dame, University of Nottingham, Vrije Universiteit Amsterdam, and Yordas Group. i About The Top 10 FAIR Data & Software Global Sprint was held online over the course of two- days (29-30 November 2018), where participants from around the world were invited to develop brief guides (stand alone, self-paced training materials), called “Things”, that can be used by the research community to understand FAIR in different contexts but also as starting points for conversations around FAIR.
    [Show full text]
  • Guidelines on FAIR Data Management in Horizon 2020
    EUROPEAN COMMISSION Directorate-General for Research & Innovation H2020 Programme Guidelines on FAIR Data Management in Horizon 2020 Version 3.0 26 July 2016 History of changes Version Date Change Page 2.1 15.02.2016 . the guide was also published as part of the Online Manual all with updated and simplified content 3.0 26.07.2016 . This version has been updated in the context of the all extension of the Open Research Data Pilot and related data management issues . New DMP template included 6 2 1. Background – Extension of the Open Research Data Pilot in Horizon 2020 Please note the distinction between open access to scientific peer-reviewed publications and open access to research data: publications – open access is an obligation in Horizon 2020. data – the Commission is running a flexible pilot which has been extended and is described below. See also the guideline: Open access to publications and research data in Horizon 2020. This document helps Horizon 2020 beneficiaries make their research data findable, accessible, interoperable and reusable (FAIR), to ensure it is soundly managed. Good research data management is not a goal in itself, but rather the key conduit leading to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse. Note that these guidelines do not apply to their full extent to actions funded by the ERC. For information and guidance concerning Open Access and the Open Research Data Pilot at the ERC, please read the Guidelines on the Implementation of Open Access to Scientific Publications and Research Data in projects supported by the European Research Council under Horizon 2020.
    [Show full text]
  • Pushing the Boundaries of Open Science at CERN: Submission to the UNESCO Open Science Consultation July 2020
    Pushing the Boundaries of Open Science at CERN: Submission to the UNESCO Open Science Consultation July 2020 Kamran Naim∗ Tullio Basaglia Jelena Brankovic Pamfilos Fokianos Jose Benito Gonzalez Lopez Maria Grazia Pia Alexander Kohls Artemis Lavasa Lars Holm Nielsen Stephanie van de Sandt Javier Serrano Tim Smith CERN, the European Organization for Nuclear Re- around the world, who collectively publish almost 1,000 search, is the world’s largest high-energy physics (HEP) research articles per year. These scientists are engaged in laboratory. Since its founding in 1954, the Laboratory has multiple experimental collaborations that operate largely made significant contributions to our understanding of the independently of each other and have different tools and world and the universe. The mission of the Organization is practices. The high-energy physics research conducted to: provide a unique range of particle accelerator facilities at CERN is one of the most data-intensive branches of that enable research at the forefront of human knowledge, science: particle collisions at CERN produce about 90 perform world-class research in fundamental physics, and petabytes of collision data per year. The complexity, scale unite people from all over the world to push the frontiers and value of these data presents unique and unprecedented of science and technology, for the benefit of all. Supported challenges for the research community at CERN. The through a global partnership of 23 member states, CERN is responsible stewardship, analysis and preservation of these home to the world’s largest scientific instrument, the Large data requires a supporting organizational research culture, Hadron Collider (LHC), and hosts over 12,000 scientists but also a range of tools and services to optimize how data and engineers from across the world.
    [Show full text]
  • Fair Data Advanced Use Cases
    FAIR DATA ADVANCED USE CASES FROM PRINCIPLES TO PRACTICE IN THE NETHERLANDS REPORT MAY 2018 WWW.SURF.NL 2 EXECUTIVE SUMMARY The idea that data needs to be Findable, Accessible, Interoperable and Reusable is a simple message which appeals to many. The 15 international FAIR principles were published in 2016. Various organisations and disciplines have since developed standards, tools and training based on their own interpretation of the FAIR principles. The six use cases included in this report describe different FAIR data approaches taken within different domains. For SURF, it is important to gain a better picture of the best way to support researchers who want to make their data FAIR. Conclusions that can be drawn from this report are: 1. Implementing FAIR is seen as a series of improvements ‘Going for FAIR’ is done one step at a time. There are always steps ahead that can improve FAIR-ness even further. Full compliancy with the FAIR principles is not seen as easy to achieve, if possible at all. 2. FAIR is seen as part of a larger culture change FAIR is seen as part of a larger culture change towards more openness in research and interdisciplinary cooperation. It raises awareness, opens up new innovative ways of working and boosts transfer of knowledge between different domains. 3. FAIR should go hand in hand with privacy regulations FAIR highlights the need to update policies and to invest in support and awareness activities, new infrastructures, software and tools. In most use cases this is taken up hand in hand with new national and international privacy regulations and policies.
    [Show full text]
  • Top 10 FAIR Data & Software Things
    Top 10 FAIR Data & Software Things February 1, 2019 Sprinters: Reid Otsuji, Stephanie Labou, Ryan Johnson, Guilherme Castelao, Bia Villas Boas, Anna- Lena Lamprecht, Carlos Martinez Ortiz, Chris Erdmann, Leyla Garcia, Mateusz Kuzak, Paula Andrea Martinez, Liz Stokes, Natasha Simons, Tom Honeyman, Chris Erdmann, Sharyn Wise, Josh Quan, Scott Peterson, Amy Neeser, Lena Karvovskaya, Otto Lange, Iza Witkowska, Jacques Flores, Fiona Bradley, Kristina Hettne, Peter Verhaar, Ben Companjen, Laurents Sesink, Fieke Schoots, Erik Schultes, Rajaram Kaliyaperumal, Erzsebet Toth-Czifra, Ricardo de Miranda Azevedo, Sanne Muurling, John Brown, Janice Chan, Lisa Federer, Douglas Joubert, Allissa Dillman, Kenneth Wilkins, Ishwar Chandramouliswaran, Vivek Navale, Susan Wright, Silvia Di Giorgio, Akinyemi Mandela Fasemore, Konrad Förstner, Till Sauerwein, Eva Seidlmayer, Ilja Zeitlin, Susannah Bacon, Chris Erdmann, Katie Hannan, Richard Ferrers, Keith Russell, Deidre Whitmore, and Tim Dennis. Organisations: Library Carpentry/The Carpentries, Australian Research Data Commons, Research Data Alliance Libraries for Research Data Interest Group, FOSTER Open Science, OpenAire, Research Data Alliance Europe, Data Management Training Clearinghouse, California Digital Library, Dryad, AARNet, Center for Digital Scholarship at the Leiden University, DANS, The Netherlands eScience Center, University Utrecht, UC San Diego, Dutch Techcentre for Life Sciences, EMBL, University of Technology, Sydney, UC Berkeley, University of Western Australia, Leiden University, GO FAIR, DARIAH, Maastricht University, Curtin University, NIH, NLM, NCBI, ZB MED, CSIRO, and UCLA. TOP 10 FAIR DATA & SOFTWARE THINGS: Table of Contents About, p. 3 Oceanography, p. 4 Research Software, p. 15 Research Libraries, p. 20 Research Data Management Support, p. 25 International Relations, p. 30 Humanities: Historical Research, p. 34 Geoscience, p. 42 Biomedical Data Producers, Stewards, and Funders, p.
    [Show full text]
  • Towards a National Research Data Infrastructure for Chemistry in Germany
    Research Ideas and Outcomes 6: e55852 doi: 10.3897/rio.6.e55852 Grant Proposal NFDI4Chem - Towards a National Research Data Infrastructure for Chemistry in Germany Christoph Steinbeck‡§, Oliver Koepler , Felix Bach|, Sonja Herres-Pawlis¶, Nicole Jung |, Johannes C. Liermann#, Steffen Neumann¤, Matthias Razum«, Carsten Baldauf», Frank Biedermann|, Thomas W. Bocklitz˄, Franziska Boehm«, Frank Broda ¤, Paul Czodrowski˅, Thomas Engel¦, Martin G. Hicksˀ, Stefan M. Kast˅, Carsten Kettnerˀ, Wolfram Koch ˁ, Giacomo Lanza₵,ℓ Andreas Link , Ricardo A. Mata₰, Wolfgang E. Nagel₱,¤ ₳Andrea Porzel , Nils Schlörer , Tobias Schulze₴, Hans-Georg Weinigˁ, Wolfgang Wenzel|, Ludger A. Wessjohann¤,₣ Stefan Wulle ‡ Friedrich-Schiller-University, Jena, Germany § TIB Leibniz Information Centre for Science and Technology, Hannover, Germany | Karlsruhe Institute of Technology, Karlsruhe, Germany ¶ RWTH Aachen University, Aachen, Germany # Johannes Gutenberg University Mainz, Mainz, Germany ¤ Leibniz Institute of Plant Biochemistry, Halle, Germany « FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Karlsruhe, Germany » Fritz-Haber-Institut der MPG, Berlin, Germany ˄ Leibniz Institute of Photonic Technology, Jena, Germany ˅ TU Dortmund, Dortmund, Germany ¦ Ludwig-Maximilians-Universität München, Munich, Germany ˀ Beilstein-Institut, Frankfurt am Main, Germany ˁ Gesellschaft Deutscher Chemiker e.V., Frankfurt am Main, Germany ₵ Physikalisch-Technische Bundesanstalt, Braunschweig, Germany ℓ Deutsche Pharmazeutische Gesellschaft, Frankfurt am Main,
    [Show full text]
  • Open/FAIR Data, Software and Infrastructure in European Transport Research Final
    Ref. Ares(2019)4586061 - 16/07/2019 This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824323. This document reflects only the views of the author(s). Neither the Innovation and Networks Executive Agency (INEA) nor the European Commission is in any way responsible for any use that may be made of the information it contains. European forum and oBsErvatory for OPEN science in transport Project Acronym: BE OPEN Project Title: European forum and oBsErvatory for OPEN science in transport Project Number: 824323 Topic: MG-4-2-2018 – Building Open Science platforms in transport research Type of Action: Coordination and support action (CSA) D2.2 Open/FAIR data, software and infrastructure in European transport research Final European forum and oBsErvatory D2.2 Open/FAIR data, software and for OPEN science in transport infrastructure in European transport research Deliverable Title: Open/FAIR data, software and infrastructure in European transport research Work Package: WP2 Due Date: 2019.06.30 Submission Date: 2019.07.15 Start Date of Project: 1st January 2019 Duration of Project: 30 months Organisation Responsible of Deliverable: FEHRL Version: Final Status: Final Adewole Adesiyun (FEHRL), Alkiviadis Tromaras (CERTH), Author name(s): Caroline Almeras (ECTRI), António Antunes (LNEC), Maria de Lurdes Antunes (LNEC), José Barateiro (LNEC), Isabela Erdelean (AIT), Vilma Jasiūnienė (VGTU), Natalia Manola (ATHENA), Eleni Mavropoulou (CERTH), Milos Milenkovic (FTTE), Anja
    [Show full text]
  • A FAIR and AI-Ready Higgs Boson Decay Dataset
    A FAIR and AI-ready Higgs Boson Decay Dataset Yifan Chen1,2, E. A. Huerta2,3,*, Javier Duarte4, Philip Harris5, Daniel S. Katz1, Mark S. Neubauer1, Daniel Diaz4, Farouk Mokhtar4, Raghav Kansal4, Sang Eon Park5, Volodymyr V. Kindratenko1, Zhizhen Zhao1, and Roger Rusack6 1University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA 2Argonne National Laboratory, Lemont, Illinois 60439, USA 3University of Chicago, Chicago, Illinois 60637, USA 4University of California San Diego, La Jolla, California 92093, USA 5Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA 6The University of Minnesota, Minneapolis, Minnesota, 55405, USA *[email protected] ABSTRACT To enable the reusability of massive scientific datasets by humans and machines, researchers aim to create scientific datasets that adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets each FAIR principle. We then demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We also use other available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to facilitate an understanding and exploration of the dataset, including visualization of its elements. This study marks the first in a planned series of articles that will guide scientists in the creation and quantification of FAIRness in high energy particle physics datasets and AI models.
    [Show full text]
  • Fair Data Through a Federated Cloud Infrastructure: Exploring the Science Mesh
    FAIR DATA THROUGH A FEDERATED CLOUD INFRASTRUCTURE: EXPLORING THE SCIENCE MESH Research in Progress Angelo Romasanta, ESADE Business School, Barcelona, Spain, [email protected] Jonathan Wareham, ESADE Business School, Barcelona, Spain, [email protected] Abstract Despite the many promises of cloud computing in science, its full potential is yet to be unlocked due to the lack of findable, accessible, interoperable and reusable (FAIR) data. To investigate the barriers and promises of FAIR data, we explore the Science Mesh, a federated mesh infrastructure enabling interoperability across research cloud service providers. As a starting point, the project will connect 300,000+ researchers from eight providers across Europe, Australia and beyond. Through our analysis of the Science Mesh, we frame FAIR data as a collective action problem permeating across two levels: researchers downstream and cloud service providers upstream. On one hand, the scientific community would be better off if researchers made their data FAIR, yet misaligned incentives hinder this. On the other hand, cloud providers are also not incentivized to pursue interoperability with other providers, despite its benefits to users. By addressing these two dilemmas towards FAIR data, the Science Mesh promises to unlock new ways of collaborating through frictionless sync and share, remote data analysis and collaborative applications. Keywords: FAIR data, Interoperability, Digital infrastructure, Federated cloud, Open Science 1 Romasanta et al. / FAIR Data through Science Mesh 1 Introduction Cloud computing has enabled distributed researchers to conduct increasingly ambitious and complex experiments (Hoffa et al., 2008; Berriman et al., 2013; Yang et al., 2014). Despite the advances it has brought to research collaborations, the cloud still has not fully realized its impact due to data not following FAIR principles – that the data and the tools to analyse them are findable, accessible, interoperable and reusable (Wilkinson et al., 2016).
    [Show full text]