Proceedings of the 2018 Conference on Adding Value and Preserving Data
Total Page:16
File Type:pdf, Size:1020Kb
Conference Proceedings RAL-CONF-2018-001 PV2018: Proceedings of the 2018 conference on adding value and preserving data Harwell, UK 15th-17th May, 2018 Esther Conway (editor), Kate Winfield (editorial assistant) May 2018 ©2018 Science and Technology Facilities Council This work is licensed under a Creative Commons Attribution 4.0 Unported License. Enquiries concerning this report should be addressed to: RAL Library STFC Rutherford Appleton Laboratory Harwell Oxford Didcot OX11 0QX Tel: +44(0)1235 445384 Fax: +44(0)1235 446403 email: [email protected] Science and Technology Facilities Council reports are available online at: http://epubs.stfc.ac.uk ISSN 1362-0231 Neither the Council nor the Laboratory accept any responsibility for loss or damage arising from the use of information contained in any of their reports or in any communication about their tests or investigations. Proceedings of the 2018 conference on adding value and preserving data This publication is a Conference report published by the This publication is a Conference report published by the Science and Technology (STFC) Library and Information Service. The scientific output expressed does not imply a policy position of STFC. Neither STFC nor any person acting on behalf of the Commission is responsible for the use that might be made of this publication. Contact information Name: Esther Conway Address: STFC, Rutherford Appleton Laboratory Harwell, Oxon, UK Email: [email protected] Tel.: +44 01235 446367 STFC https://www.stfc.ac.uk RAL-CONF-2018-001 ISSN- 1362-0231. Preface The PV2018 Conference welcomes you to its 9th edition, to be held 15th – 17th May 2018 at the Rutherford Appleton Laboratory, Harwell Space Cluster (UK), hosted by the UK Space Agency and jointly organised by STFC, NCEO and the Satellite Applications Catapult. For its ninth edition, the conference series moves to its hosts the UK Space Agency and is located at the Rutherford Appleton Laboratory part of the Harwell Space Cluster in the UK to continue addressing prospects in the domain of data preservation, stewardship and value adding of scientific data and research related information. As we enter the era of big data; for this conference year, we extended a special invite to large- scale scientific archives so we can facilitate discussion of emergent issues across scientific domains. This conference explores a new and exciting technology age, where are seeing large- scale collaborations occurring on state of art virtual research environments and novel collaborative infrastructures. The PV2018 conference consisted of four session/theme areas had the following objectives • Facilitate Science Archives and Data Service Providers sharing knowledge, experiences, and lessons learnt and best practices. In addition to fostering cooperation in the areas of Data Exploitation, Preservation and archived Data Stewardship. • Address key emerging issues for science archives including but not limited to Open Data, Big Data, Managing Heterogeneity, Data Management Planning, Data Usability, Exploitation and Impact. • Provide a forum for organisations dealing with preservation of own data and value adding to present the status of their activities, plans and expectations. In PV2018 we particularly welcome input from a broad range of science archives and data providers. In addition to space data archives we would like to extend a special invitation to. • Large science facilities from different domains to facilitate discussion of our common challenges. • Specialist science archives and data service providers who are integrating data with space- based observation to produce innovative data services. Session 1: Data stewardship approaches to ensure long-term data and knowledge preservation and data standards. In this session, we consider the best practises for the long-term preservation of the data and other results associated with research across the preservation lifecycle, from the submission of data packages for preservation, to the access of data products. This includes the organisational structures, policies and standards adopted by data centres and archives to assure cost-effective preservation, together with risk management, uncertainty quantification, quality assessment and the evaluation of preservation capabilities. Further, we will consider novel architectures and tools used to realise different preservation strategies, and standards, tools and languages to capture the preservation context, including the preservation of data formats, the use of identifiers, metadata, semantics, data provenance, quality and uncertainty. Session 2: Adding value to data and facilitation of data use: In this session, we consider activities that add value to archived data, facilitate their use or produce novel data services. Data archivists often focus most of their energy on creating well-formed, well-documented archives with the expectation that they will be available for the next 50 to 100 years. However, archived data are meaningless if they cannot be easily retrieved, understood, and used. As a result we would like to invite submissions from projects or archives who rising to the challenge of enhancing data in order to facilitate exploitation of data assets. Session 3: Virtual Research Environments for science data exploitation and value adding: This session will consider new challenges, activities and research related to Virtual Research Environments or Collaborative Environments. While Massive data growth is calling for a new paradigm, with a shift towards “bring the user to the data", where scientists can bring their own code and run it where the data actually reside, instead of downloading the data and run their analysis on their computer. There is also an increased need for data, associated documentation and software long-term preservation and accessibility, for scientists to be able to re-run data analysis that was initially applied on the data. Last, scientists are now expecting to share not only their data, but also their software and the results of their research activities, and to work with their collaborators in an easy and effective manner, regardless of their location. Session 4: Data preservation in practice: past (present) and future: The purpose of this session is to examine existing practices and systems and highlight what has been learnt, including how to best benefit from collaboration between projects and/or disciplines. It will also look forward and attempt to understand how new developments and/or technologies and/or tomorrow's data volumes might influence or even constrain how things will be done in the future. Data preservation is not a static field: we wish to use this session to explore what we have learnt from previous migrations and to consider how to best prepare for the future, including potentially disruptive scenarios. We would also like to facilitate discussion on how different services involved in LTDP interplay and to use this opportunity to consider how we measure success and respond to requirements from funding agencies, such as those for F.A.I.R. data management. A total of 77 abstracts were submitted for presentation at the conference. Following the peer- review process by members of the conference programme committee, 48 papers were selected for oral presentation. They are complemented by 22 poster presentations (for a total of 310 distinct co-authors with affiliations distributed over different countries from all continents). The conference introductory welcome, event and dinner talks was given by was given by Tony Hey (STFC), Esther Conway (Centre for Environmental Data Analysis), Chris Mutlow (RAL Space), Beth Greenaway(UKSA), John Remedios (NCEO), Stuart Martin (Satellite Applications Catapult), Sue O’Hare (ESA Business Incubation Centre), Michael Gleaves (Hartree Centre) and Hugh Mortimer(RAL Space) These proceedings consist of a collection of 43 short papers and 20 abstracts corresponding to the oral and poster presentations delivered at the conference. They are organized in sections according to conference sessions followed by the contributions that were presented during the poster session. Further to the oral and poster contributions, the conference has been fortunate to receive keynote lectures and discussion panel addressing a variety of data science topics of interest to PV2018 1. The work of the CEOS WGISS group by Mirko Albani (ESA ESRIN) 2. The EVER-EST project by Rosemarie Leone (ESA ESRIN) 3. Data value and curation across countries, across domains discussion lead by Katrin Molch (DLR) 4. The European Open Science Cloud by Juan Bicarregui (STFC) 5. Copernicus C3S planned service by Carlo Buontempo (ECMWF) 6. Open science data management by Rachel Bruce (JISC) We enjoy participation from projects, organisations and individuals developing novel data services within and as a result of these environments. Participation includes a broad range of scientific disciplines seeking to preserve and derive the maximum value from their data. This edition of the PV conference series is deeply grateful to UKSA and it’s organising partners NCEO, STFC and the Satellite Applications Catapult. Conference Chairs Conference Chair: Tony Hey - Chief Data Scientist STFC Conference Co- chair: Harald Rothfuss - EUMETSAT Organising Committee Head of Organising Committee: Caroline Callard – STFC/RAL Space Esther Conway – STFC/CEDA Brian Matthews – STFC/SCD Poppy Townsend – STFC/CEDA Richard Hilton – Satellite Applications Catapult Anastasia Bolton – Satellite Applications Catapult