WHAT INFLUENCE WOULD a CLOUD BASED SEMANTIC LABORATORY NOTEBOOK HAVE on the DIGITISATION and MANAGEMENT of SCIENTIFIC RESEARCH? by Samantha Kanza

UNIVERSITY OF SOUTHAMPTON
Faculty of Physical Sciences and Engineering
School of Electronics and Computer Science

What Influence would a Cloud Based Semantic Laboratory Notebook have on the Digitisation and Management of Scientific Research?
by Samantha Kanza

Thesis for the degree of Doctor of Philosophy
25th April 2018

ABSTRACT

Electronic laboratory notebooks (ELNs) have been studied by the chemistry research community over the last two decades as a step towards a paper-free laboratory; similar work has also taken place in other laboratory science domains. However, despite the many available ELN platforms, their uptake in both the academic and commercial worlds remains limited. This thesis describes an investigation into the current ELN landscape, and its relationship with the requirements of laboratory scientists. Market and literature research was conducted around available ELN offerings to characterise their commonly incorporated features. Previous studies of laboratory scientists examined note-taking and record-keeping behaviours in laboratory environments; to complement and extend this, a series of user studies were conducted as part of this thesis, drawing upon the techniques of user-centred design, ethnography, and collaboration with domain experts. These user studies, combined with the characterisation of existing ELN features, informed the requirements and design of a proposed ELN environment which aims to bridge the gap between scientists' current practice using paper lab notebooks, and the necessity of publishing their results electronically, at any stage of the experiment life cycle.
The proposed ELN environment uses a three-layered approach: a notebook layer consisting of an existing cloud based notebook; a domain specific layer with the appropriate knowledge; and a semantic layer that tags and marks up documents. A prototype of the semantic layer (Semanti-Cat) was created for this thesis, and evaluated with respect to the sociological techniques: Actor Network Theory and the Unified Theory of Acceptance and Use of Technology. This thesis concludes by considering the implications of this ELN environment on broader laboratory practice. The results of the user studies in this thesis have underscored laboratory scientists' attachment to paper lab notebooks; however, even though paper lab notebooks are currently unlikely to be replaced by a system of digitised experimental records, laboratory scientists are not opposed to using technology that facilitates high-level integration, management and organisation of their records. This thesis therefore identifies areas of improvement in current laboratory data management software.

Contents

Declaration of Authorship
Acknowledgements

1 Introduction
  1.1 Motivation
  1.2 Research Question and Objectives
  1.3 Contribution and Scope
  1.4 Report Structure

2 Scientific Record Keeping
  2.1 The Scientific Record
    2.1.1 The What
    2.1.2 The Why
    2.1.3 The Where
    2.1.4 The Who
    2.1.5 The How
    2.1.6 Limitations and Considerations of Current Literature
  2.2 Computer Supported Cooperative Work (CSCW)
  2.3 From Paper Lab Notebooks to Electronic Lab Notebooks
  2.4 Existing ELN Tools
  2.5 Building ELNs
    2.5.1 User Centred Design
    2.5.2 Ethnographic Studies
    2.5.3 Collaborative Design with Scientists
  2.6 Features of ELNs
    2.6.1 Collaborative
    2.6.2 Platform Independent and Cloud Based
    2.6.3 Rich Semantics
    2.6.4 Domain Specific
    2.6.5 Build upon an existing Electronic Notebook
  2.7 Desk Based Research
    2.7.1 ELN Usage and Adoption Barrier Surveys
      2.7.1.1 Cost
      2.7.1.2 ELN Attitude
      2.7.1.3 Ease of Use
      2.7.1.4 ELN Access
      2.7.1.5 Data Compatibility
    2.7.2 ELN Market Survey
      2.7.2.1 Methodology
      2.7.2.2 Results
  2.8 ELN Survey Discussion & Summary

3 Research Methodology and Design
  3.1 Theories
    3.1.1 Actor Network Theory (ANT)
    3.1.2 Unified Theory of Acceptance and Use of Technology (UTAUT)
    3.1.3 Using ANT and UTAUT
  3.2 Methodological Approach
  3.3 Research Design
  3.4 Research Analysis
  3.5 Research Structure

4 Initial User Studies
  4.1 Survey Design
  4.2 Focus Groups
  4.3 Participant Observation
  4.4 Results
    4.4.1 Survey Results
      4.4.1.1 Comparing Usage of Different Types of Chemistry Software
      4.4.1.2 Comparing Type of Chemist to Software Used
      4.4.1.3 Comparing Number of Years Experience vs Software Usage
      4.4.1.4 Molecular Modelling & Simulation Software Usage
      4.4.1.5 Molecular Editing Software Usage
      4.4.1.6 Quantum Chemistry Software Usage
      4.4.1.7 Database Informatics, Bibliographic Database & Semantic Web Software Usage
      4.4.1.8 Organic Synthesis, Nanostructure Modelling & Chemical Kinetics & Process Software
      4.4.1.9 What Piece of Software Would you Most Like to be Created?
    4.4.2 Focus Group Results
      4.4.2.1 Recording Notes
      4.4.2.2 Organising Notes
      4.4.2.3 Use of Technology in Note taking & Lab Equipment
      4.4.2.4 Data Storage, Backup and Linkage
      4.4.2.5 Intellectual Property
      4.4.2.6 Collaboration and Sharing
      4.4.2.7 Reference Management
      4.4.2.8 Scenarios
      4.4.2.9 Current usage of Paper and Electronic Devices
      4.4.2.10 ELN Opinions & Experiences
    4.4.3 Dial-a-Molecule Surveys
    4.4.4 Participant Observation Results
      4.4.4.1 Crystallographers
      4.4.4.2 Molecular Chemists
      4.4.4.3 Inorganic Chemists
      4.4.4.4 Organic Chemists
  4.5 Discussion
  4.6 Key Findings
    4.6.1 KF1: How scientists take notes and organise their work is a highly personal endeavour
    4.6.2 KF2: Scientists do use technology to digitise portions of their work despite their continued use of paper
    4.6.3 KF3: Some of the elements of the scientific record that do not get digitised are the things that do not work
    4.6.4 KF4: Scientists are much more concerned with backing up electronic work than paper based work
    4.6.5 KF5: Many lab setups aren't conducive to using electronic devices
    4.6.6 KF6: Scientists seemed more in favour of ELNs if they had supervisors who thought they were a good idea or encouraged their usage
    4.6.7 KF7: Scientists still seem very attached to their paper notebooks, and do not believe that an ELN would fit into their current workflow in a way that would make things easier
    4.6.8 KF8: The software needs identified by scientists weren't necessarily what they would attribute to an ELN, rather than an improvement to their current software offerings
  4.7 Initial Conclusions

5 A Semantic Cloud Based ELN
  5.1 System Design and Requirements
    5.1.1 Non-Functional Requirements
    5.1.2 Functional Requirements Notebook Layer
    5.1.3 Functional Requirements Domain Layer
    5.1.4 Functional Requirements Semantic Layer
  5.2 Choosing a Cloud Based Service
  5.3 Development Environment
  5.4 Developing the Semantic Layer
    5.4.1 Implemented Requirements
    5.4.2 Must Have Requirements
    5.4.3 Should Have Requirements
    5.4.4 Could Have Requirements
    5.4.5 Tools & Technologies
      5.4.5.1 Tagging Services
      5.4.5.2 Ontologies
      5.4.5.3 Automatic Chemical Recognition
      5.4.5.4 Searching
  5.5 Semanti-Cat
    5.5.1 Architecture & Implementation
      5.5.1.1 OpenCalais API
      5.5.1.2 GATE
      5.5.1.3 ChemicalTagger
      5.5.1.4 Ontologies
    5.5.2 Semanti-Cat Walkthrough
