University of Southampton Research Repository
ePrints Soton

Copyright © and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.

When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given e.g.

AUTHOR (year of submission) "Full thesis title", University of Southampton, name of the University School or Department, PhD Thesis, pagination

http://eprints.soton.ac.uk

Faculty of Engineering and the Environment
University of Southampton
United Kingdom

Research data management

Thesis submitted for the degree of Doctor of Philosophy
by Mark Scott
May 2014

UNIVERSITY OF SOUTHAMPTON
ABSTRACT
FACULTY OF ENGINEERING AND THE ENVIRONMENT
Computational Engineering and Design
Doctor of Philosophy
RESEARCH DATA MANAGEMENT
by Mark Scott

Scientists within the materials engineering community produce a wide variety of data, ranging from large 3D volume densitometry files (voxel) generated by microfocus computer tomography (µCT) to simple text files containing results from tensile tests. Increasingly they need to share this data as part of international collaborations. The design of a suitable database schema and the architecture of a flexible system that can cope with the varying information is a continuing problem in the management of heterogeneous data. We discuss the issues with managing such varying data, and present a model flexible enough to meet users’ diverse requirements. Metadata is held using a database and its design allows users to control their own data structures.
Data is held in a file store which, in combination with the metadata, gives huge flexibility and means the model is limited only by the file system. Using examples from materials engineering and medicine we illustrate how the model can be applied. We will also discuss how this data model can be used to support an institutional document repository, showing how data can be published in a remote data repository at the same time as a publication is deposited in a document repository.

Finally, we present educational material used to introduce the concepts of research data management. Educating students about the challenges and opportunities of data management is a key part of the solution and helps the researchers of the future to start to think about the relevant issues early on in their careers. We have compiled a set of case studies to show the similarities and differences in data between disciplines, and produced documentation for students containing the case studies and an introduction to the data lifecycle and other data management practices.

Managing in-use data and metadata is just as important to users as published data. Appropriate education of users and a data staging repository with a flexible and extensible data model supports this without precluding the ability to publish the data at a later date.

Research data management
© Mark Scott, University of Southampton 2014

Copyright and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.
When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given, i.e. Mark Scott. 2014. ‘Research data management’. PhD diss., University of Southampton.

To Yeyang and my wonderful family who kept me going during the low times and celebrated with me during the high times, and to my father who finally lost his ten year battle with malignant mesothelioma on his 75th birthday just before the viva and so didn’t quite see the end of this journey.

A rose blooms and then fades, but the beauty and the fragrance are remembered always. (Eddings 1987)

Contents

List of figures
List of tables
Declaration
Acknowledgements

Chapter 1 Introduction
  1.1 A historical warning
  1.2 ‘Standing on the shoulders of giants’
  1.3 Data sharing
  1.4 Research data management
  1.5 Data capture
  1.6 Materials engineering data
  1.7 Thesis outline
    1.7.1 Main research questions and thesis objectives
    1.7.2 Structure

Chapter 2 Computing data and associated technologies
  2.1 Introduction
  2.2 Terminology
  2.3 Computing data
    2.3.1 File data
    2.3.2 Metadata
  2.4 Data storage
    2.4.1 Storage hardware
    2.4.2 Local file systems
    2.4.3 Network file systems
    2.4.4 Distributed file systems
    2.4.5 Cloud technologies
    2.4.6 Data transfer
  2.5 User requirements collection
    2.5.1 User requirements report and questionnaire
    2.5.2 Synchrotron data interview
    2.5.3 MatDB schema interview
  2.6 Materials engineering data
    2.6.1 Tensile test
    2.6.2 Long crack growth fatigue test
    2.6.3 Plain bend bar testing
    2.6.4 Fractography
    2.6.5 Microfocus X-ray computed tomography
    2.6.6 Material information
    2.6.7 Relationships between tests
    2.6.8 Summary
  2.7 Database technologies
    2.7.1 Relational database systems
    2.7.2 Normalisation
    2.7.3 Entity-Attribute-Value
    2.7.4 Federated databases and GaianDB
  2.8 Structured storage
    2.8.1 The Content Repository for Java Technology API specification
  2.9 Document repositories
    2.9.1 Institutional document repositories
    2.9.2 Microsoft SharePoint
  2.10 Protocols and frameworks
    2.10.1 RDF (Resource Description Framework)
    2.10.2 Atom Publishing Protocol and SWORD
    2.10.3 OAI-ORE (Open Archives Initiative Object Reuse and Exchange)
    2.10.4 OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
  2.11 Accurate data capture
    2.11.1 Semantic web
    2.11.2 Discussion
  2.12 Data management projects
    2.12.1 The Materials Atlas (materials data repository)
    2.12.2 Human Genome Project
    2.12.3 Large Hadron Collider
  2.13 Summary

Chapter 3 Data publishing with EPrints
  3.1 Introduction
  3.2 Architecture
    3.2.1 Early MDC prototype
    3.2.2 Service layer
    3.2.3 EPrints plug-in
  3.3 Results
  3.4 Summary

Chapter 4 A model for managing materials data
  4.1 Introduction
  4.2 Architecture
    4.2.1 File system
    4.2.2 Metadata database
    4.2.3 Logic layer
    4.2.4 File system monitor
    4.2.5 Interface
  4.3 Materials engineering use cases
    4.3.1 Material information
    4.3.2 Tensile test
    4.3.3 Long crack growth fatigue test
    4.3.4 Microfocus computed tomography data
  4.4 Reliability tests
    4.4.1 Tests of the synchronisation service
    4.4.2 Testing the metadata database
  4.5 Performance tests
  4.6 Working with data in the system
    4.6.1 Searching using the interface tools
    4.6.2 Data set suggestions
    4.6.3 Data access