Materials Data Centre
Research data introduction
Mark Scott, Nicki Clarkson, Alison Knight
Guide authors: Mark Scott, Richard Boardman, Philippa Reed, Simon Cox – FEPS; Dorothy Byatt and Isobel Stark – Library
Accompanying guide available at: https://eprints.soton.ac.uk/403440/
Research data management web site: http://library.soton.ac.uk/researchdata

DATA CREATION IN THE ‘GLOBAL DATASPHERE’
[Chart: data creation in zettabytes (ZB) by year, 2007 to 2016; labelled values: 0.281, 1.0, 1.8, 2.8, 4.4 and 16.1 ZB]
Source: Estimates from IDC 2008, 2011, 2012, 2014, 2017

‘By 2025, embedded data will constitute nearly 20% of all data created.’
• There will be a massive increase in data generated by Mobile and Real-time applications (e.g. automated machines).
• Internet of Things is driving real-time data.
Source: IDC 2017

‘By 2025, an average connected person anywhere in the world will interact with connected devices nearly 4,800 times per day – basically one interaction every 18 seconds.’
• Driven by embedded devices and Internet of Things
Source: IDC 2017

RESEARCH DATA MANAGEMENT AT THE UNIVERSITY OF SOUTHAMPTON
http://library.soton.ac.uk/researchdata
• Data management planning
• Describing your data for effective reuse
• Sharing your data
• Securing your data
• Storing your data
• Destruction of data
• Guidance on retention periods
• Data access statements
• FAQs
• Useful links
• Advice via email: [email protected]

TALK OUTLINE
1. Five ways to think about research data
2. Why data management is important to you
3. Data management best practices

FIVE WAYS TO THINK ABOUT RESEARCH DATA: 1. CREATION
• Scientific experiment
• Models or simulation
• Derived data
• Reference data

FIVE WAYS TO THINK ABOUT RESEARCH DATA: 2. THE RESEARCH
• Electronic text documents
• Notebooks and diaries
• Questionnaires, transcripts and codebooks
• Spreadsheets
• Digital objects, e.g. figures, videos
• Audiotapes and videotapes
• Database schemas
• Photographs and films
• Database contents
• Specimens, samples and artefacts
• Models, algorithms and scripts
• Methodologies, workflows, procedures and protocols
• Software configuration
• Experimental results
• Software input and output files (pre- and post-process)
• Metadata (data describing data)

FIVE WAYS TO THINK ABOUT RESEARCH DATA: 3. ELECTRONIC REPRESENTATION
• Textual: Text files, Microsoft Word, PDF, RTF
• Numerical: Excel, CSV (Comma separated)
• Multimedia: TIFF image, AVI movie, MP3 audio
• Structured: CSV, database, multi-purpose (XML)
• Software code: Java, C, Matlab
• Software specific: 3D CAD, statistical model
• Discipline specific: Chemistry’s CIF (for crystallography)
• Instrument specific: Archaeology’s laser scanner files

FIVE WAYS TO THINK ABOUT RESEARCH DATA: 4. WHAT FILES MAKE UP YOUR DATA SET?
Data sets come in all sizes and complexities
Complexity      | Type of data                 | Size
Individual file | Raw CT data                  | 10–100s of gigabytes
Individual file | Video                        | Gigabytes
Individual file | Photograph                   | Megabytes
Set of files    | Individual frames of a movie | Gigabytes
Set of files    | Source code files            | Kilobytes/megabytes

FIVE WAYS TO THINK ABOUT RESEARCH DATA: 5. DATA LIFE CYCLE
[Figure: data life cycle, showing categories and stages]

WHY IS RESEARCH DATA MANAGEMENT IMPORTANT?
‘Data sharing and management snafu in 3 short acts’ by NYU Health Sciences Library (2012)

FIVE REASONS TO BOTHER WITH RESEARCH DATA MANAGEMENT
1. You may be required to by your funding body
2. The University expects you to (e.g. backups)
3. You may want to use the dataset again
4. Others may want to use your data
5. Others may want to cite your data – datasets can be cited just like journals and papers which can help your research standing

Data management best practices

DATA MANAGEMENT PLAN
• What is a DMP?
“just a tool for thinking systematically through the kinds of material your work will produce, how you will work with it and ensure its integrity during the project, the possible reuse value of this material, and how it will be made safe and available into the future. A DMP is like an insurance policy for sustainability, ensuring you will maximise research value and have no unpleasant surprises at the close of your project.”
http://training.parthenos-project.eu/sample-page/manage-improve-and-open-up-your-research-and-data/data-management-planning/
• Compulsory for all new doctoral students
• http://library.soton.ac.uk/researchdata/phd
• Must complete & upload a DMP as part of your 12-month review
• Information, templates and guidance on the Library website

OPEN ACCESS TO RESEARCH DATA
• Open Research requires data to be FAIR:
  • Findable
  • Accessible
  • Interoperable
  • Reusable
• “As open as possible, as closed as necessary”
• There may be moral, ethical, commercial or legal reasons for not sharing data or for restricting access
• ORCiD can link your datasets and associated papers
• Sharing your data can boost citation rates

Steps to take:
• Create a metadata record in a data repository within 12 months of the end of data collection or when publishing a paper. Describe what the data is, and why, when and how it was generated. The data repository can be discipline-specific or EPrints via https://pure.soton.ac.uk
• Obtain a Digital Object Identifier (DOI) for the record, e.g. 10.5258/SOTON/393614
• If you can, upload the data to the data repository
• Include a data access statement in any published work. This is now a requirement of many funders.

DATA ACCESS STATEMENTS – EXAMPLES
Openly available data
‘Data published in this paper are available from the University of Southampton repository at 10.5258/SOTON/379558.’ (G. Squicciarini, M.G.R. Toward and D.J. Thompson 2015)

Restricted access – ethical, legal, commercial
‘The study data are not freely available due to legal restrictions, and Government of India’s Health Ministry Screening Committee (HMSC) assessment is required to obtain the data. The Parthenon Cohort team will provide the data on request subject to HMSC approval. For further information contact the corresponding author.’ (Ghattu V. Krishnaveni et al. 2015)

Secondary analysis of existing data
‘This study was a re-analysis of an existing dataset that is publicly available from [organisation] at [web address]’

No new data created
‘No new datasets were created during this study’

EU General Data Protection Regulation: ‘GDPR’
Came into force on 25 May 2018
Dr. Alison Knight, Legal Services & Data Governance

Background
• GDPR aims to strengthen data protection laws to make them fit for the digital age by giving people more control over their own data
• The current UK Data Protection Act 1998 was derived from EU law (the Data Protection Directive) which is being replaced by the GDPR
• The text of GDPR applies throughout EU Member States, directly embedded into new national data protection legislation (incl. a new UK Data Protection Act 2018 to come)
• Brexit does not affect the implementation of GDPR
• GDPR also has indirect effects on privacy standards outside the EU

Why comply? – New fine levels
Major breaches of data protection are subject to administrative fines of whichever is the higher of the following:
• up to 20,000,000 EUR, OR
• up to 4% of the total worldwide annual turnover of the preceding financial year (in the case of an undertaking)
• Focused on incidents which are likely to cause damage and distress

Medium breaches of data protection are subject to administrative fines of whichever is the higher of the following:
• up to 10,000,000 EUR, OR
• up to 2% of the total worldwide annual turnover of the preceding financial year (in the case of an undertaking)
• Focused on process failures, for example a failure to report ‘high risk’ breaches to the ICO and the relevant data subjects within 72 hours, or a failure to carry out a DPIA

Personal data
“Any information relating to an identified or identifiable natural person. An identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to:
• an identifier such as a name, an identification number, location data, an online identifier or
• to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”
Compare the concept of ‘sensitive’ personal data – delimited categories
Personal data – wide definition, but ultimately context focused

The main data protection principles
Key message: Establishing data processing purpose is fundamental to GDPR compliance

Data protection impact assessments – ‘DPIAs’ (i)
• Used and recommended for some time – now a firm requirement under GDPR along with ‘Privacy by Design’ and ‘Privacy by Default’
• Purpose: an exercise to identify and mitigate data processing that is ‘high-risk’ “to the rights and freedoms of natural persons”, taking into account the nature, scope, context and purposes of the processing (‘the Data Environment’)
• You must carry out a DPIA if you plan a “high risk” processing activity, e.g.
  • Sensitive data or criminal convictions/offences in high volume
  • Systematic profiling of individuals leading to automated decisions, e.g. credit checks
  • Large-scale systematic monitoring in a public area, e.g. CCTV
  • Other? To be assessed context by context…
• What's included in a DPIA?
  • Details of processing including purposes
  • Assessment of necessity and proportionality of processing
  • Assessment of risks to data subjects
  • How risks will be reduced to a ‘non-high’ level

Data protection impact assessments – DPIAs (ii)
• Identify the need for a DPIA using the initial form (are you planning to process information relating to living people in your research project?)
• If yes and the triggers are met, a DPIA form must be completed and undergo panel review before ethics clearance. This could delay new projects significantly ⇨ build in to timeline
• "Where appropriate" you must consult data subjects. This also needs to be factored in to timescales.
• Follow up recommendations. Rarely, if the DPIA identifies high-risk processing that cannot be properly mitigated, the data controller (i.e. the University) must consult the supervisory authority (e.g. the ICO).
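
Worked example: fine ceilings
Returning to the "New fine levels" slide, the "whichever is higher" rule can be worked through numerically. The sketch below is illustrative only: the names `FINE_TIERS` and `max_fine_eur` are hypothetical, the turnover figures are invented, and the result is the upper limit of a fine for an undertaking, not the amount a regulator would actually impose.

```python
from decimal import Decimal

# Ceilings as stated on the slide: (fixed cap in EUR, share of total
# worldwide annual turnover of the preceding financial year).
# The tier labels "major" and "medium" follow the slide's wording.
FINE_TIERS = {
    "major": (Decimal("20000000"), Decimal("0.04")),
    "medium": (Decimal("10000000"), Decimal("0.02")),
}


def max_fine_eur(annual_turnover_eur, tier):
    """Return the fine ceiling: the higher of the fixed cap and the turnover share."""
    fixed_cap, turnover_share = FINE_TIERS[tier]
    turnover_cap = Decimal(str(annual_turnover_eur)) * turnover_share
    return max(fixed_cap, turnover_cap)


if __name__ == "__main__":
    # Invented turnover figures, chosen so that each rule wins once.
    for turnover in (100_000_000, 1_000_000_000):
        for tier in ("major", "medium"):
            ceiling = max_fine_eur(turnover, tier)
            print(f"turnover {turnover:,} EUR, {tier} breach: ceiling {ceiling:,.0f} EUR")
```

For smaller turnovers the fixed cap is the binding limit; the percentage only takes over once turnover exceeds 500,000,000 EUR (20,000,000 / 0.04 for major breaches, 10,000,000 / 0.02 for medium breaches).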