University of Southampton Research Repository Eprints Soton
Total Page:16
File Type:pdf, Size:1020Kb
University of Southampton Research Repository ePrints Soton Copyright © and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders. When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given e.g. AUTHOR (year of submission) "Full thesis title", University of Southampton, name of the University School or Department, PhD Thesis, pagination http://eprints.soton.ac.uk UNIVERSITY OF SOUTHAMPTON FACULTY OF NATURAL AND ENVIRONMENTAL SCIENCES Chemistry Facilitating Chemical Discovery: An e-Science Approach by Andrew J. Milsted Thesis for the degree of Doctor of Philosophy February 2015 UNIVERSITY OF SOUTHAMPTON ABSTRACT FACULTY OF NATURAL AND ENVIRONMENTAL SCIENCES Chemistry Doctor of Philosophy FACILITATING CHEMICAL DISCOVERY: AN E-SCIENCE APPROACH by Andrew J. Milsted e-Science technologies and tools have been applied to the facilitating of the accumulation, validation, analysis, computation, correlation and dissemination of chemical information and its transformation into accepted chemical knowledge. In this work a number of approaches have been investigated to address the different issues with recording and preserving the scientific record, mainly the laboratory notebook. The electronic laboratory notebook (ELN) has the potential to replace the paper note- book with a marked-up digital record that can be searched and shared. However it is a challenge to achieve these benefits without losing the usability and flexibility of tra- ditional paper notebooks. Therefore using a blog-based platform will be investigated to try and address the issues associated with the development of a flexible system for recording scientific research. Contents Declaration of Authorship xiii Acknowledgements xv 1 Introduction1 1.1 Electronic Lab Notebooks...........................3 1.1.1 Social Media..............................6 1.1.2 Open Science..............................6 2 Repositories9 2.1 eBank and eCrystals..............................9 2.1.1 An Open Access Crystal Structure Report Archive......... 10 2.1.2 Upgrading to Eprints 3........................ 15 2.2 Repository for the Laboratory, R4L...................... 17 2.2.1 Loss of Data.............................. 17 2.2.2 Integrity................................. 20 2.3 Conclusions................................... 21 3 chemTools 23 3.1 Overview.................................... 23 3.1.1 Single SignOn.............................. 24 3.1.2 Marvin................................. 25 3.1.3 Web Services.............................. 25 3.2 eLearning Tutorial on Regression Methods.................. 26 3.3 Proflocate.................................... 27 3.4 eMalaria..................................... 29 3.4.1 Optimization.............................. 30 3.4.1.1 Molopt And Mopac..................... 30 3.4.1.2 Marvin and Smiles...................... 30 3.4.2 Testing the Interface.......................... 32 3.4.3 Distributed vs Dedicated Computing................. 33 3.4.4 Undergraduate Studies......................... 33 3.4.5 Computing Libraries.......................... 34 3.5 Conclusions................................... 35 4 The Problem 37 4.1 Issues...................................... 37 4.1.1 Backup................................. 37 v vi CONTENTS 4.1.2 Competitiveness............................ 38 4.1.3 Collaboration.............................. 39 4.2 Open Research................................. 39 4.2.1 Publication at source.......................... 39 4.3 Provenance................................... 40 5 Electronic Lab Notebooks 41 5.1 What is an ELN?................................ 41 5.2 Example ELN's................................. 42 5.3 Semantic Importance.............................. 44 5.4 The `Un' Semantic ELN............................ 45 6 LabTrove 47 6.1 Architecture................................... 47 6.1.1 Versions................................. 48 6.1.2 LabTrove objects............................ 50 6.1.3 LabTrove Components......................... 53 6.1.4 Database................................ 55 6.1.5 Security................................. 58 6.2 Features..................................... 61 6.2.1 The User interface........................... 61 6.2.1.1 Commenting......................... 66 6.2.2 Pictorial Comments.......................... 67 6.2.2.1 Revisions........................... 68 6.2.3 Provenance............................... 69 6.2.4 RSS Feeds................................ 71 6.2.5 Linking................................. 71 6.3 Handling Data................................. 72 6.4 Templates.................................... 74 6.5 API....................................... 75 6.5.1 The REST API............................. 75 6.5.2 Auto Posting.............................. 77 7 LabTrove Evaluation 79 7.1 Implementations of LabTrove......................... 79 7.2 User Experience:The Biochemist: Jennifer Hale............... 83 7.2.1 Using LabTrove as a simple journal notebook............ 83 7.2.1.1 What makes a post..................... 84 7.2.1.2 Developing a metadata framework............. 86 7.2.1.3 Conclusions from the initial investigation: Specific re- quirements.......................... 87 7.2.2 Using the Blog as an ELN....................... 88 7.2.2.1 The one-item one-post system............... 88 7.2.2.2 What merits its own post?................. 89 7.2.2.3 Consequences and applications of the one-item one-post system............................ 90 7.2.2.4 Metadata organisation in the one item-one post system. 92 7.2.3 Using the Templates: Approaches to metadata frameworks.... 92 CONTENTS vii 7.3 User Experience: the Research Group, the ORC Xray Group....... 94 7.3.1 Integration of the Laboratory..................... 95 7.3.1.1 Auto Posting......................... 95 7.3.1.2 Laboratory Environment.................. 96 7.3.1.3 MatLab............................ 98 7.3.2 Using LabTrove For Open Drug Discovery.............. 98 7.3.3 A Commercial Viewpoint....................... 100 7.4 Blog My Data.................................. 103 7.4.1 Using LabTrove as a possible solution................ 103 7.5 School of Chemistry, University of New South Wales............ 106 7.6 Conclusions................................... 107 8 After the ELN 109 8.1 Where should the research reside....................... 109 8.1.1 Dryad.................................. 110 8.1.2 Figshare................................. 111 8.1.3 DataCite................................ 111 8.2 Credit and Impact............................... 112 9 Conclusions 113 9.1 The scientific record.............................. 113 9.2 Collaboration.................................. 114 9.3 Open Science.................................. 114 9.4 LabTrove and the future............................ 114 A Abbreviations/Definitions 117 A.1 Acronyms/Abreaviations............................ 117 A.2 Data Sizes.................................... 119 A.3 File Formats.................................. 119 B Supporting Data 121 B.1 Thesis...................................... 121 B.2 LabTrove Software............................... 122 B.3 LabTrove Manual................................ 123 References 125 List of Figures 2.1 An archive entry for one dataset....................... 14 2.2 An example of data loss in a journal article................. 18 2.3 The R4L data capture model.......................... 19 2.4 A schematic of the Probity service...................... 21 3.1 A Screen shot of the chemTools project page................ 24 3.2 An example of how to call a soap wrapped web service........... 26 3.3 This shows the interactive part of the stats site................ 27 3.4 An example output page for proflocate................... 28 3.5 The Optimisation process going from smile to 2D then to 3D also showing the systematic construction of peptides.................... 31 3.6 Online peptide interface............................ 31 3.7 Simplified Interface for eMalaria....................... 32 4.1 An example of the degradation of information content with metadata and original data of time; information entropy. Accidents or changes in storage technology (dashed line) may eliminate access to remaining raw data and metadata at any time[1]............................. 38 5.1 An example commercial Electronic laboratory notebook (ELN), iLabber, being used to record the method for an biochemistry procedure. Showing the use of the free text entry and tables................... 43 6.1 Showing the flow of data with in a classic LAMP installation........ 49 6.2 Showing the resultant request, http://blogs.chem.soton.ac.uk/, to the user[Accessed: 01/11/12]............................ 50 6.3 The principal LabTrove objects........................ 51 6.4 Schematic diagram illustrating the principal components of the PHP server process and the main flows of control between components..... 53 6.5 An example plugin hat demonstrats the user of the plugin system, in this example it will alter the HTML content of the post by adding the post idenitfier to the bottom...........................