University of Southampton Research Repository ePrints Soton

Copyright © and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.

When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given e.g.

AUTHOR (year of submission) "Full thesis title", University of Southampton, name of the University School or Department, PhD Thesis, pagination

http://eprints.soton.ac.uk

UNIVERSITY OF SOUTHAMPTON

The Building and Application of a Semantic Platform for an e-Research Society

by David R. Newman

A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy

in the Faculty of Physical and Applied Sciences, Electronics and Computer Science

October 2011

UNIVERSITY OF SOUTHAMPTON

ABSTRACT

FACULTY OF PHYSICAL AND APPLIED SCIENCES
ELECTRONICS AND COMPUTER SCIENCE

Doctor of Philosophy

by David R. Newman

This thesis reviews the area of e-Research (the use of electronic infrastructure to support research) and considers how the insight gained from the development of social networking sites in the early 21st century might assist researchers in using this infrastructure. In particular it examines the myExperiment project, a website for e-Research that allows users to upload, share and annotate workflows and associated files, using a social networking framework. This Virtual Organisation (VO) supports many of the attributes required to allow a community of users to come together to build an e-Research society.

The main focus of the thesis is how the emerging society that is developing out of myExperiment could use Semantic Web technologies to provide users with a significantly richer representation of their research and research processes, to better support reproducible research. One of the initial major contributions was building an ontology for myExperiment. Through this it became possible to build an API for generating and delivering this richer representation and an interface for querying it.

With this richer representation it has been possible to follow Linked Data principles and link up with other projects that have this type of representation. Doing this has allowed additional data to be provided to the user and has begun to set in context the data produced by myExperiment. The way that the myExperiment project has gone about this task, and its consideration of how changes may affect existing users, is another major contribution of this thesis.

Adding a semantic representation to an emergent e-Research society like myExperiment has given it the potential to provide additional applications, in particular the capability to support Research Objects: an encapsulation of a scientist's research or research process to support reproducibility. The insight gained by adding a semantic representation to myExperiment has allowed this thesis to contribute towards the design of the architecture for these Research Objects, which use similar Semantic Web technologies.

The myExperiment ontology has been designed such that it can be aligned with other ontologies. Scientific Discourse, the collaborative argumentation of different claims and hypotheses, with the support of evidence from experiments, to construct, confirm or disprove theories, requires the capability to represent experiments carried out in silico. This thesis discusses how, as part of the HCLS Scientific Discourse subtask group, the myExperiment ontology has begun to be aligned with other scientific discourse ontologies to provide this capability. It also compares this alignment of ontologies with the architecture for Research Objects.

This thesis also examines how myExperiment's Linked Data and that of other projects can be used in the design of novel interfaces. As a theoretical exercise, it considers how this Linked Data might be used to support a Question-Answering system that would allow users to query myExperiment's data in a more efficient and user-friendly way.

It concludes by reviewing all the steps undertaken to provide a semantic platform for an emergent e-Research society, to facilitate the sharing of research and its processes in support of reproducible research. It assesses their contribution to enhancing the features provided by myExperiment, as well as e-Research as a whole. It considers how the contributions provided by this thesis could be extended to produce additional tools that will allow researchers to make greater use of the rich data that is now available, in a way that enhances their research process rather than significantly changing it or adding extra workload.

Contents

Declaration of Authorship

Acknowledgements

1 Introduction
  1.1 Research Contributions

2 An e-Research Society
  2.1 e-Research
    2.1.1 CombeChem - Structure-Property Mapping
      2.1.1.1 eCrystals Archive
      2.1.1.2 Electronic Lab Notebook
      2.1.1.3 The SmartTea Project
      2.1.1.4 CombeChem and the AKT Project
      2.1.1.5 Other CombeChem Projects
    2.1.2 myGrid: Directly Supporting the e-Scientist
      2.1.2.1 Transparent Access to Multiple Bioinformatics Information Sources (TAMBIS)
      2.1.2.2 The myTea Project
      2.1.2.3 Taverna Workbench
    2.1.3 e-Research and the Semantic Web
    2.1.4 Socializing e-Research
  2.2 Social Networking
    2.2.1 Social Networking on the Web
    2.2.2 Social Networking Websites
    2.2.3 The Pros and Cons of Social Networking Websites
    2.2.4 Social Networking for e-Research Community
  2.3 The myExperiment Project
    2.3.1 The myExperiment Model
    2.3.2 myExperiment's REST API
    2.3.3 Comparisons to Drupal
    2.3.4 myExperiment into the Future

3 A Society with Semantics
  3.1 Semantic Web Technologies
    3.1.1 RDF and RDF Schema
    3.1.2 OWL
    3.1.3 Data, Information and Knowledge
      3.1.3.1 How RDF turns Data into Information
      3.1.3.2 How OWL turns Information into Knowledge
    3.1.4 Semantic Web Rule Language (SWRL)
    3.1.5 RDF Triplestores
      3.1.5.1 Triplestore Applications
      3.1.5.2 Triplestore Querying
    3.1.6 Schemas and Ontologies for an e-Research Society
      3.1.6.1 Dublin Core
      3.1.6.2 Simple Knowledge Organisation System (SKOS)
      3.1.6.3 Friend of a Friend (FOAF)
      3.1.6.4 Semantically-Interlinked Online Communities (SIOC)
      3.1.6.5 Creative Commons
      3.1.6.6 Open Archives Initiative - Object Reuse and Exchange (OAI-ORE)
      3.1.6.7 Open Annotations Collaboration
    3.1.7 Linked Data
      3.1.7.1 The DBPedia Project
      3.1.7.2 Vocabulary of Interlinked Datasets (VoID)
      3.1.7.3 Co-reference Resolution
  3.2 Adding Semantics to myExperiment
    3.2.1 The myExperiment Ontology
      3.2.1.1 Reusing Ontologies
      3.2.1.2 Simple Network Access Rights Management (SNARM) Ontology
      3.2.1.3 Base Module
      3.2.1.4 Extension Modules
      3.2.1.5 Specific Module
      3.2.1.6 Promoting Reuse
      3.2.1.7 The Evolution and Evaluation of the myExperiment Ontology
      3.2.1.8 Comparisons to Drupal RDF
    3.2.2 myExperiment's RDF API and SPARQL Endpoint
    3.2.3 myExperiment Linked Data
    3.2.4 A Semantic myExperiment

4 Applications of a Semantic Platform for an e-Research Society
  4.1 Research Objects (ROs)
    4.1.1 The Features of ROs
    4.1.2 Types of RO
    4.1.3 From myExperiment Packs to Research Objects
    4.1.4 The Future of ROs and myExperiment
  4.2 Scientific Discourse
    4.2.1 Scientific Discourse and the Semantic Web
    4.2.2 Alignment of Scientific Discourse Related Ontologies
      4.2.2.1 Online Communities
      4.2.2.2 Discourse Models
      4.2.2.3 Bibliographic Citations
      4.2.2.4 Curation
      4.2.2.5 Experimental Data
      4.2.2.6 SCientific ARticle of The Future (SCARF)
    4.2.3 Research Objects for Scientific Discourse
  4.3 Question-Answering Systems for e-Research Societies
    4.3.1 Natural Language Processing
    4.3.2 Inherent Issues with QA Systems
    4.3.3 A Brief History of QA Systems
    4.3.4 Tools for NLP
    4.3.5 Web and Semantic Web based QA Systems
    4.3.6 A Potential Question-Answering (QA) System for myExperiment
    4.3.7 Research Objects for Questions

5 Conclusions
  5.1 Research Outcomes
  5.2 Further Work

A myExperiment Data and Specifications
  A.1 myExperiment REST API Responses
    A.1.1 Example Workflow REST API Response
  A.2 myExperiment Ontology Modules
    A.2.1 Simple Network Access Rights Management (SNARM) Ontology Module
    A.2.2 Base Ontology Module
    A.2.3 Attribution and Credit Ontology Module
    A.2.4 Annotations Ontology Module
    A.2.5 Contributions Ontology Module
    A.2.6 Packs Ontology Module
    A.2.7 Experiments Ontology Module
    A.2.8 Viewings and Downloads Ontology Module
    A.2.9 Workflow Components Ontology Module
    A.2.10 Specific Ontology Module
  A.3 myExperiment Example Entity Data
    A.3.1 Examples of All Entities
    A.3.2 Example of a Relationship Entry
    A.3.3 Example of a User-defined Ontology
  A.4 myExperiment's VoID Specification
  A.5 Example Response for RDF Pack Request

B myExperiment Documentation
  B.1 myExperiment Ontology Documentation
  B.2 myExperiment How To SPARQL Documentation
    B.2.1 Using the SPARQL Endpoint
    B.2.2 PREFIX
    B.2.3 SELECT
    B.2.4 WHERE
    B.2.5 FILTER
    B.2.6 GROUP BY
    B.2.7 ORDER BY
    B.2.8 LIMIT
    B.2.9 Troubleshooting
  B.3 myExperiment Ontology Change Log

Bibliography

List of Figures

2.1 Taverna Workflow Visualisation
2.2 Comparison of Social Network Models
2.3 The Architecture of a Pack

3.1 Data, Information, Knowledge and Wisdom
3.2 Example of SKOS Entities and Relationships
3.3 SIOC Overview
3.4 Creative Commons overview
3.5 Resource Map and Aggregation in ORE
3.6 Linked Data Content Negotiation
3.7 Simple Network Access Rights Management Ontology Overview
3.8 Overview of myExperiment's Base Module
3.9 Class Hierarchy for myExperiment Ontology
3.10 Ontology Modules Architecture
3.11 Generic Workflow with Components
3.12 Drupal's RDF Schema Model

4.1 MyExperiment Pack evolving into a Research Object
4.2 SWAN Commons Ontology Architecture
4.3 SWAN SIOC Alignment
4.4 Example AO Annotation
4.5 SWAN Alignment with Experimental Ontologies
4.6 Screen shot of Wolfram Alpha
4.7 Screen shot of height of US presidents Wikipedia page
4.8 Screen shot of faceted browsing on myExperiment
4.9 Parse tree for myExperiment example question


List of Tables

3.1 Vocabulary and Syntax Encoding Schemes in DCMI Metadata Terms
3.2 Permissions, requirements, prohibitions available in Creative Commons licenses
3.3 Metadata for ORE Entities
3.4 Classes and Properties Reused by the myExperiment Ontology
3.5 Annotations and their Interfaces
3.6 Behaviours of Contributions
3.7 Accessers, AccessTypes and their associated Accesses
3.8 SWRL Rules for Friendships and Memberships
3.9 MIME Types and URLs for 303 Redirects
3.10 myExperiment's Current URI Schemes
3.11 Mod Rewrite Rules for myExperiment Linked Data
3.12 Parameters for Linkset Generation

4.1 START "Object-Property-Value" Encoded Question and Answer
4.2 Example question mapped to myExperiment entities and concepts
4.3 Example of merging SPARQL-like triples for different conjunctions


Listings

3.1 Example SPARQL Query
3.2 Requesting myExperiment RDF with User Authorisation
3.3 SPARQL Query for generating CSV Matrix of Friends
3.4 SPARQL Query using DBpedia's SPARQL Endpoint for European Countries
3.5 SPARQL Query for determining myExperiment-DBPedia Linkset
4.1 Extract of OCML Markup for ScholOnto
4.2 Example Solr query using logical operators
4.3 SPARQL Query for myExperiment example question
4.4 Modified SPARQL Query for myExperiment example question
A.1 Example Workflow REST Response
A.2 Simple Network Access Rights Management (SNARM) Ontology Module
A.3 myExperiment's Base Ontology Module
A.4 myExperiment's Attribution and Credit Module
A.5 myExperiment's Annotations Module
A.6 myExperiment's Contributions Module
A.7 myExperiment's Packs Module
A.8 myExperiment's Experiments Module
A.9 myExperiment's Viewings and Downloads Module
A.10 myExperiment's Workflow Components Module
A.11 myExperiment's Specific Module
A.12 Examples of All myExperiment Entities
A.13 Example of a Relationship Entry
A.14 Example of User-defined Ontology
A.15 myExperiment's VoID Specification
A.16 Example Response for RDF Pack Request


DECLARATION OF AUTHORSHIP

I, David Russell Newman, declare that the thesis entitled

The Building and Application of a Semantic Platform for an e-Research Society

and the work presented in it are my own. I confirm that:

• this work was done wholly or mainly while in candidature for a research degree at this University;

• where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated;

• where I have consulted the published work of others, this is always clearly at- tributed;

• where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work;

• I have acknowledged all main sources of help;

• where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself;

• parts of this work have been published as:

Newman, D., Bechhofer, S. and De Roure, D. (2009) myExperiment: An ontology for e-Research. In: Semantic Web Applications in Scientific Discourse, 25-10-2009, Washington DC, US (In Press)

Bechhofer, S., Ainsworth, J., Bhagat, J., Buchan, I., Couch, P., Cruickshank, D., Delderfield, M., Dunlop, I., Gamble, M., Goble, C., Michaelides, D., Missier, P., Owen, S., Newman, D., De Roure, D. and Sufi, S. (2010) Why Linked Data is Not Enough for Scientists. In: Sixth IEEE eScience conference (e-Science 2010), December 2010, Brisbane, Australia. (In Press)

De Roure, D., Goble, C., Aleksejevs, S., Bechhofer, S., Bhagat, J., Cruickshank, D., Fisher, P., Kollara, N., Michaelides, D., Missier, P., Newman, D., Ramsden, M., Roos, M., Wolstencroft, K., Zaluska, E. and Zhao, J. (2010) The Evolution of myExperiment. In: Sixth IEEE eScience conference (e-Science 2010), December 2010, Brisbane, Australia. (In Press)

Zhao, J., Goble, C., De Roure, D., Corcho, O., Missier, P., Belhajjame, K. and Newman, D. (2011) eScience. Book chapter in: Handbook of Semantic Web Technologies. (In Press)


De Roure, D., Bechhofer, S., Goble, C. and Newman, D. (2011) Scientific Social Objects: The Social Objects and Multidimensional Network of the myExperiment Website. In: 1st International Workshop on Social Object Networks (SocialObjects 2011), October 2011, Boston, MA, US. (In Press)

Signed: ......

Date: ......

Acknowledgements

I would like to thank my supervisor, David De Roure; myExperiment's project team and community; the members of the e-Laboratories Technical Architecture Group, both at the University of Manchester and the University of Southampton; and the Scientific Discourse subtask group of the W3C's HealthCare and Life Sciences (HCLS) Interest Group.


To my parents Bernard and Sheila Newman, whose continual encouragement was of great help during the course of my PhD.


Chapter 1

Introduction

“Society does not consist of individuals, but expresses the sum of interrelations, the relations within which these individuals stand.”

Karl Marx, The Grundrisse (1857)

This statement sets out how a society should be something that allows people to come together to work collaboratively towards a goal, rather than each individual attempting to reach it alone. This may be an unusual concept to some researchers, who often work with only very small groups of people.

e-Research is the development of electronic infrastructure to support scholars in conducting their research. It has evolved out of e-Science, which was more focussed on supporting the physical and biological sciences, and allows researchers from fields such as the arts and humanities to benefit from the advantages that an electronic infrastructure provides. Section 2.1 explores the role of e-Research by examining two of the UK's first e-Science platform projects, CombeChem and myGrid. Informed by the review of these two projects and the e-Research needs they uncovered, section 2.2 goes on to consider Social Networking and the rise of social networking websites. It reviews how the features of these websites can be incorporated into e-Research projects. Section 2.3 examines the myExperiment project (http://www.myexperiment.org), an e-Research website that allows users to upload, share and annotate workflows and associated files, through a social networking framework. It looks at the myExperiment model and compares it with similar projects to determine its capabilities and understand how it reconciles the concepts of e-Research and Social Networking to build a Virtual Organisation (VO) that can support an emergent "e-Research society".

The myExperiment project demonstrates how adding a social aspect to e-Research produces another layer of data beyond that of a basic e-Research project.



Therefore section 3.1 first describes the various features of Semantic Web (SW) technologies and then considers how these technologies can be used to provide a semantic platform for all the data and metadata that is captured within this emergent e-Research society. It concludes by considering how semantic representations for various projects, both e-Research and otherwise, can be linked together by following the principles proposed by the Linked Data project (http://linkeddata.org/).

Section 3.2 describes how SW technologies have been used to produce a specification, i.e. an ontology, for representing the data captured in myExperiment. It then discusses how the ontology has been designed both to reuse existing ontologies and to make itself suitable for reuse within other projects. This chapter then describes how this ontology has been applied to myExperiment's data to provide an RDF API, so users can access this data through a standard means. Providing access to RDF data is just the first step; therefore myExperiment also allows users to query over this data using a SPARQL endpoint. This chapter discusses how this SPARQL endpoint was developed and how it has been adapted and supported to allow novice users to employ it. It then reviews some use cases of this SPARQL endpoint and considers how myExperiment's RDF data may need to evolve to support these. Finally, it describes how myExperiment's infrastructure has been modified to support Linked Data and the insights gained through this process.

With this semantic representation of myExperiment, it has been possible to consider how similar representations could be applied to various other areas of e-Research. Chapter 4 considers three such applications. The first, Research Objects (ROs), is the concept of capturing all the elements of a piece of research or a research process, in such a way that a user can use the RO to reproduce it. myExperiment already provides a means for aggregating items that are interrelated and for representing research processes in the form of workflows and their enactment. This makes it ideally suited to providing an interface for building, sharing and evaluating ROs, as well as to informing the design of the Research Objects architecture, drawing on the insights gained by building the myExperiment ontology.

Another key area of e-Research is Scientific Discourse. Scientific or Scholarly Discourse is the collaborative argumentation of different claims and hypotheses, with the support of evidence from experiments, to construct, confirm or disprove theories. Currently, much of this discourse is encapsulated within research papers. Section 4.2 considers various approaches for providing a standard machine-readable format for describing Scientific Discourse, providing a more accurate and succinct representation. In particular it examines how the SWAN project (http://swan.mindinformatics.org/) represents scientific discourse and how it is aligning with related ontologies to provide a richer specification for this representation. It goes on to describe how the myExperiment ontology fits into this ontology alignment to give it the capability of supporting in silico experiments. Finally, it describes some of the similarities between this alignment of ontologies and the architecture for Research Objects, and considers how data captured by each could be exchanged to take advantage of the tools provided by the other.

Standard Semantic Web query interfaces, such as SPARQL endpoints, are difficult for novices to use, and websites rarely provide sufficiently sophisticated interfaces to allow users to make full use of the data they provide access to. Section 4.3 discusses a potential further application of myExperiment's Linked Data as a means of supporting a Question-Answering (QA) system for an e-Research society. In particular it considers how the execution of a question in a QA system is analogous to a research process that can be captured in a Research Object.

Chapter 5 considers the added benefits users gain from incorporating social networking into e-Research projects, particularly those of the myExperiment project. It appraises how taking the data contained within this new e-Research society and producing a Semantic Web representation has facilitated the exposing and querying of both e-Research data itself and the social metadata that surrounds it. In particular, it analyses how this representation may be used to align myExperiment with similar projects using a Linked Data approach. Finally it reviews the three proposed applications for this semantic platform for an e-Research society and what is required to implement these applications to make them truly useful to members of such a society.

1.1 Research Contributions

The main contribution of this thesis has been to provide extensive insight into the design of Research Objects as a novel approach for supporting reproducible research within an e-Research society. This contribution has been made as part of the e-Laboratories Technical Architecture group, which also includes: John Ainsworth, Sean Bechhofer, Jiten Bhagat, Iain Buchan, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Paolo Missier, Stuart Owen, David De Roure and Shoaib Sufi.

To be able to provide this insight and aid the design of Research Objects, several prerequisite research contributions were undertaken. The first of these was the review of the myExperiment data model to ascertain the abstract concepts it encapsulates and to elucidate the critical entities and features that make up a generic e-Research society. A subsequent contribution has been to take the knowledge gained from this abstraction process to design an extensible and reusable OWL ontology for the myExperiment data model and then provide an RDF API to this data. A further novel contribution has been the augmentation of the main myExperiment website to integrate this RDF API to meet the principles of the Linked Data project. These three contributions provided an infrastructure to investigate the design of Research Objects, in particular exploring how myExperiment packs could be extended to support the evolving use cases set out by members of the e-Laboratories Technical Architecture group and how myExperiment itself might evolve into an application that supports full Research Objects in the future.

As a result of the way the myExperiment ontology was designed, it has been eminently possible to align it with other ontologies. Another novel contribution of this thesis has been the initial alignment of the myExperiment ontology with other e-Research oriented ontologies, in the field of scientific discourse, as part of a collaboration within the W3C's HealthCare and Life Sciences (HCLS) Scientific Discourse subtask group. This contribution also includes insight into the form that data produced through use of the alignment of these ontologies might take, how it compares to Research Objects and whether synergy exists between the two.

The final research contribution of this thesis has been to consider how the Linked Data produced for myExperiment could be used to build new e-Research applications. In particular, as a theoretical exercise, it has considered how Linked Data along with other existing tools in Natural Language Processing (NLP) could be used to provide a novel interface to myExperiment’s data through a Question-Answering system. As part of this exercise it has also considered how the Research Object model may assist in both answering and representing these questions.

Members of the HCLS Scientific Discourse subtask group include Sophia Ananiadou, Uldis Bojars, John Breslin, Gully Burns, Kei Cheung, Annamaria Carusi, Paolo Ciccarese, Tim Clark, Ron Daniel, Sudeshna Das, Anita deWaard, Alf Eaton, Ronan Fox, Matthew Gamble, Carole Goble, Tudor Groza, Christoph Lange, Joanne Luciano, Scott Marshall, Marco Ocana, Jack Park, Alexandre Passant, Satya Sahoo, Matthias Samwald, Tony Scerri, Jodi Schneider, David Shotton, Susie Stephens, Holger Stenzhorn, Karin Verspoor, Elizabeth Wu and Jun Zhao.

Chapter 2

An e-Research Society

e-Research, the use of IT infrastructure and software to support research in the sciences, arts and humanities, is becoming ever more important across all areas of research. Many e-Research projects aim to build communities around the tools and applications they construct. Supporting this community is essential for such projects, because keeping the community happy is a key requirement for achieving the widest possible uptake of the tools and applications provided by a project, and hence for producing new research.

Social networking websites are a way of allowing communities to interact. Allowing a community of users first to self-organise into a social network and then to interact with each other to discuss their shared interests can provide invaluable information to and about the community. In the case of e-Research this benefits both those responsible for developing the tools and applications for the community and those wishing to reuse items contributed to it.

The myExperiment project is one of the first attempts to bring the users and outputs of an e-Research community together under a social networking framework. Bringing two different but symbiotic concepts together is not a straightforward task; much has to be learnt about the areas in which users in the community need the support of a social network and how best to provide this in an efficient and user-friendly manner.


2.1 e-Research

In virtually all aspects of life, Information Technology (IT) is assuming a more and more significant role, and academic research is no exception. Until recently much of the effort has been directed towards providing IT and electronic infrastructure for scientific subjects such as Chemistry, Physics, Biology, etc., categorised as e-Science. However, it is becoming ever more apparent that research areas in the Arts, Humanities and Social Sciences (http://www.ahessc.ac.uk/) would benefit from similar infrastructures, and therefore e-Research is used as a more encompassing term.

Electronic infrastructure can take a number of different forms, e.g.

• High Performance Computing (HPC) systems can process huge amounts of data or run very complex simulations.

• Applications allowing processes to be automated and run by a machine, saving the time and effort required to do them by hand.

• Tools for electronically logging data for experiments as they are carried out in the lab.

The Grid is often used as a term to refer to the electronic infrastructure required to facilitate this increasing reliance on collaborative, multidisciplinary research (Hey and Trefethen, 2002). The purpose of the Grid is to manage and provide access to computing resources, compute cycles and storage through the use of Middleware.

Many countries have invested in the concept of e-Science, including the Netherlands (http://www.dutchgrid.nl/), Denmark (http://www.escience.ku.dk/), Norway (http://www.forskningsraadet.no/en/Funding/eVITA/1253954581253), Germany (http://www.einfrastructures.org/content/calendar/4.7.Kunze.pdf), the United States (http://www.nsf.gov/dir/index.jsp?org=OCI), where it is described as Cyberinfrastructure, and the United Kingdom (http://www.rcuk.ac.uk/escience/default.htm). To the general public, probably the most well-known e-Science project is the Large Hadron Collider (http://public.web.cern.ch/public/en/LHC/LHC-en.html) at the European Organisation for Nuclear Research (CERN, http://public.web.cern.ch/public/).

The UK e-Science programme was created in 2000 and spanned all the scientific research councils in the UK. At this point each research council funded a number of pilot projects. These pilots covered research in the areas of:

• Environmental Science and Oceanography


• Particle Physics and Astronomy

• Aerospace Engineering

• Medicine and Biological Sciences (including Biotechnology and Bioinformatics)

• Chemistry (in particular Combinatorial Chemistry and Crystallography)

CombeChem and myGrid were two of these pilots. Despite being focussed on different research areas, Chemistry and Bioinformatics respectively, these projects shared similar goals, namely providing users with a customised interface to manage their scientific process. This is highlighted by their use of a similar technique to aid the design of the tools for each project.

2.1.1 CombeChem - Structure-Property Mapping: Combinatorial Chemistry and the Grid

The CombeChem project was a consortium made up of the Universities of Southampton and Bristol and industrial partners: Roche Discovery Welwyn, Pfizer and IBM. It involved collaborators from both Chemistry (particularly crystallographers and combinatorial chemists) and Computer Science, as well as statisticians. Its main focus was on how e-Science infrastructure could help manage the process of synthesising new compounds using combinatorial methods, something that produces significant amounts of new data that needs to be captured and analysed. CombeChem was structured as an umbrella, with smaller, more specific projects feeding in to meet the project's overall goal or acting as a testbed for collaboration with other projects.

2.1.1.1 eCrystals Archive

One such project was the eBank-UK project, for which the eCrystals archive was one of the main deliverables (Duke, 2009). It collaborated closely with CombeChem and the National Crystallography Service (NCS). The eCrystals archive is a heavily adapted instance of EPrints 3 (http://www.eprints.org). The purpose of the archive is to allow crystallographers to capture the entire crystallographic experiment, from conception to publication, in particular the files produced at each stage of this process (Coles et al., 2005).

The NCS collects large amounts of data from the chemical samples it analyses through processes such as X-ray diffraction. It allows crystallographers to run experiments remotely to analyse physical samples they have sent to the NCS. Once the NCS has run the first sample, the data is uploaded to a folder. After this the Automated Experiment Driver Software (AEDS) can be invoked. This software allows the crystallographer to access this data, view the progress of the experiment and intervene to set parameters and decide which services to use to analyse the data. This process can vary in complexity depending on the sample and the types of analysis that the crystallographer wants to perform. Once the experiment is complete, all the data, analysis and report files can be deposited into the eCrystals archive, and the crystallographer who submitted the sample, or any other permitted user, can access these files. At some point after this, if the sample is deemed to be both interesting and of sufficient quality, these files can be published.

CombeChem's aim in this project was to develop an "e-Chemistry activity for the assessment and utilisation of chemical structure information". The NCS had existed for a significant amount of time prior to the start of the project, but its work practices had remained very much off-line. Taking the NCS online as a Grid service would help reduce the amount of instrument time wasted on low-quality or uninteresting samples. If the submitting crystallographer can see, after initial data collection and analysis, that a sample is not worth proceeding with, they can avoid using additional instrument time pointlessly. Being able to set their own parameters may also reduce the instrument time needed and would save the crystallographer having to resubmit the same sample with a different set of parameters.

Turning such a service into a Grid service obviously raises a number of other issues, including authentication of clients, security of client data, the capture of provenance and interoperability with other services. However, CombeChem, as an interdisciplinary project, was able to call on computer scientists with the expertise to help tackle these issues.

2.1.1.2 Electronic Lab Notebook

One of the main elements of this project has been producing a digital replacement for the chemistry lab book, an Electronic Lab Notebook (ELN) (m. c. schraefel et al., 2004b). First the project observed the inherent flaws of existing paper-based lab books, demonstrated by the following difficulties they exhibit:

• Sharing information, particularly in a widely distributed community.

• Backing up information recorded.

• Proving Intellectual Property (IP) rights with sufficient rigour.

• Capturing the structure of data, to ensure all potentially crucial information is recorded.

The project determined that these difficulties could be alleviated by the careful development of an ELN as a digital lab book replacement. This ELN would connect to a database to store all the data it captured. In such an implementation the following functionality would be provided:

• Immediate access to the data for any permitted chemist.

• A server that can easily be backed up each night to ensure no data is lost.

• A means of timestamping data as it is recorded to support IP rights, which can be verified by other chemists permitted to access the data.

• Structured but flexible forms for experiment planning and execution, prompting users to record data they might otherwise forget.

Although paper-based lab books have their drawbacks, they also have many features that chemists make great use of:

• Ease of access, i.e. flipping back and forth through pages.

• Simplicity of data entry, i.e. no complex computer applications to learn.

• Ability to draw free-form sketches.

• Portability.

• Resilience to damage, e.g. chemical spills, dropping on the floor, etc.

• Security of storage.

• Minimal effort required to capture data.

Therefore a key requirement in the design of the ELN was to maintain most if not all of these features. Simplicity of data entry and the minimal effort required to capture data were the two most important of these features, and particular effort was given to maintaining them in the ELN.

2.1.1.3 The SmartTea Project

The SmartTea project (http://www.smarttea.org/) was conducted to determine the best means of eliciting user requirements from the chemists for the ELN (m. c. schraefel et al., 2004a). The main problem for the designers of the ELN was understanding what chemists record in their lab books, and why. This was particularly difficult when a chemist was carrying out a complex chemistry experiment, as the designers at the time had neither an understanding of the purpose of the experiment nor of the terminology being used to describe it.

The project therefore developed a technique inspired by Dix's "Deconstructing the Cracker" process to better elicit information about how a chemist plans and carries out an experiment (Dix et al., 2003). This technique produced the analogy that making a cup of tea is like conducting a chemistry experiment: first the key features of a chemistry experiment were determined, and then recombined in an everyday scenario that could be understood by the ELN designers. By better understanding the planning, process and expected outcome of the "experiment", the designers could observe and understand how and what the chemist recorded in their lab book during its course. This process was then iterated with more complex and Chemistry-oriented experiments to allow the designers to gain a more detailed and specific understanding.

The iterative process may have been useful for gathering the requirements needed for the ELN. However, this process cannot capture an understanding of the whole chemistry domain, and so the database that backs the ELN needed to be as flexible as possible, to be able to represent a wide range of chemical experimentation.

2.1.1.4 CombeChem and the AKT Project

The Advanced Knowledge Technologies (AKT) project was a large multi-institution project that collaborated with the CombeChem project to assist it with modelling the chemistry domain so that data captured by the ELN could be stored in a way that chemists could make the greatest use of it. The CoAKTinG project was an example of an AKT project that used CombeChem as a testbed for integration of knowledge technology tools (Bachler et al., 2004a).

One of the most significant knowledge technology tools that the CombeChem project decided to take up was the triplestore. This was used as the database back-end for the ELN, instead of a relational database system such as MySQL or PostgreSQL. Triplestores are used to store Resource Description Framework (RDF) data in the form of subject-predicate-object triples; this is explained in detail in section 3.1. At the time CombeChem decided to use a triplestore, there were a limited number of applications available. Being seen as one of the more established applications, Jena was chosen. Section 3.1.5.1 compares and contrasts current triplestore applications, including Jena.

To model the data stored in CombeChem's triplestore, a schema was written using RDF Schema (RDFS) (Hughes et al., 2004) (see section 3.1.1). The schema defined processes as well as substances, using RDFS's subClassOf property to define a hierarchy of each, e.g. FiltrationWithBuchnerFunnel is a subClassOf Filtration. Building on these concepts it was possible to define an experiment plan, which could then be instantiated when the experiment was carried out, to make sure the plan was followed correctly. When an experiment plan was instantiated it also allowed observations to be made, in the form of annotations to the experiment instance. All of this was represented as RDF triples and stored in the Jena triplestore. One of the most significant aspects of the schema was the ability to create multiple instances of an experiment, allowing the same or a different chemist to repeat the experiment.
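As an illustration, such a schema and its instance data might be sketched in Turtle as follows. Only the FiltrationWithBuchnerFunnel / Filtration pairing is taken from the description above; the namespace and the remaining class and property names are assumptions made for the purpose of the example, not CombeChem's actual vocabulary.

    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix cc:   <http://example.org/combechem#> .   # hypothetical namespace

    # The process hierarchy, defined with rdfs:subClassOf as described above
    cc:FiltrationWithBuchnerFunnel rdfs:subClassOf cc:Filtration .

    # An experiment plan, one instantiation of it, and an observation
    # recorded as an annotation on that instance
    cc:plan42 rdf:type cc:ExperimentPlan .
    cc:run1   rdf:type cc:ExperimentInstance ;
              cc:instantiates   cc:plan42 ;
              cc:hasObservation "White precipitate formed after five minutes." .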

An experiment plan consisted of process-product pairings. This model required each experiment plan to start with a process, where one or more of the experiment ingredients go through this process to produce a product. This product was then used in the next process, possibly with one or more other experiment ingredients. This continued until the final experiment product was produced (Hughes et al., 2004). This was a very simplistic model with a single central spine (sketched below). However, the model could be extended where it was necessary to have multiple paths producing a set of products before a combining step. Concepts from OWL-S (Martin et al., 2004) and the Web Services Business Process Execution Language (WS-BPEL) could have helped in the design of such an extension (Alves et al., 2006).
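A sketch of this single-spine structure, in the same assumed illustrative vocabulary as above, might look like this in Turtle:

    @prefix cc: <http://example.org/combechem#> .   # hypothetical namespace

    # Each process consumes the previous step's product, forming a single spine
    cc:plan42 cc:firstStep cc:step1 .
    cc:step1  a cc:Dissolution ;
              cc:input    cc:ingredientA , cc:ingredientB ;
              cc:product  cc:product1 ;
              cc:nextStep cc:step2 .
    cc:step2  a cc:FiltrationWithBuchnerFunnel ;
              cc:input    cc:product1 ;
              cc:product  cc:product2 .   # the final experiment product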

To allow the ELN to interface with it, the triplestore had a model server with its own bespoke API (Hughes et al., 2004). This API allowed specific queries to be made using the RDF Data Query Language (RDQL). It also allowed experiment plans to be created, modified, instantiated through experiment instances and annotated with observations.
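For example, a query for every Buchner-funnel filtration step and the product it yields might be written roughly as follows. This is a sketch in Jena's RDQL syntax of the period, using the same hypothetical vocabulary as the sketches above rather than the model server's actual query forms.

    SELECT ?step, ?product
    WHERE (?step, <rdf:type>, <cc:FiltrationWithBuchnerFunnel>),
          (?step, <cc:product>, ?product)
    USING cc  FOR <http://example.org/combechem#>,
          rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>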

CombeChem's choice of a triplestore over a more conventional relational database system provided flexibility of design through an extensible RDFS model for experiments. RDFS also provided the ability to perform inference over data created in conformance with that model, mainly through the class and property hierarchies that have been defined: before inferencing, a process has a single type; after inferencing, it may have multiple types that are the more generic forms of that process. In the case of CombeChem this was particularly useful for search, and even for categorization of experiments, as experiments that share the same or similar process-product pairs could be searched for.
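Concretely, given the subclass axiom from the schema sketch above, RDFS inference would add the more generic type to a step (instance names again illustrative):

    # Asserted in the data:
    cc:step2 rdf:type cc:FiltrationWithBuchnerFunnel .

    # Entailed via rdfs:subClassOf, so a search for generic
    # filtrations now also finds cc:step2:
    cc:step2 rdf:type cc:Filtration .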

2.1.1.5 Other CombeChem Projects

There have been a number of other projects that were either part of CombeChem or collaborated with it. One in particular was the Schools Malaria project, in collaboration with the e-Malaria project (http://chemtools.chem.soton.ac.uk/projects/emalaria/) (Frey et al., 2006). The focus of this project was quite different: its aim was to get school students more interested in science. The idea was to provide students with an interactive web interface for designing and testing, through computational simulation, their own anti-malaria drugs. This would lead to further discussion, allowing the students to gain some appreciation of the world of research.


Building the web interface required pulling together several tools and applications, such as distributed drug databases, under a common search interface, as well as integrating molecule editing and visualization tools.

EliCIT was designed as a web-based system for eliciting knowledge from planned experiments (Dupplaw et al., 2003). Although the system was not specifically designed for chemistry experiments, the CombeChem project and the testbed provided by the ELN presented an interesting use case. EliCIT could examine an experiment plan and highlight the factors / ranges to be investigated. It could also help manage the results obtained, to see how these were affected by changing parameters, and capture any experiences / insights gained through this process using dynamic questionnaires with free-form comments. This was particularly useful when the collaborators in an investigation were widely distributed.

2.1.2 myGrid: Directly Supporting the e-Scientist

The myGrid project was, in its first phase, a collaboration between the Universities of Manchester, Southampton, Nottingham, Newcastle and Sheffield, together with the European Bioinformatics Institute and industrial partners: GSK, AstraZeneca, IBM and SUN.

The first goal of the project was to understand the bioinformatic tasks undertaken by a biologist (Stevens et al., 2001). This would allow tools to be built to support and, in some cases, automate these tasks. At a high level these tasks arranged themselves, as in many fields of e-Science, into a five-step lifecycle (Oinn et al., 2006). For a new user the steps proceed as follows:

1. Discover and reuse services and previous experiments.

2. Build new experiments, using the discovered services / experiments.

3. Execute and monitor these experiments as they run.

4. Collect results and provenance and analyse.

5. Share the new experiments / services with the community.

Understanding this lifecycle, the project set out to build two applications, “one that supports the analysis of functional genomic data, and the other to support the annotation of a pattern database” (Hey and Trefethen, 2002). To be able to support these two applications, a somewhat complex architecture was required.

The myGrid architecture originally had three levels (Goble et al., 2003). At the bottom were the existing external services used by bioinformaticians. Some of these were legacy services wrapped in the Soaplab framework, so that they could be accessed using the Simple Object Access Protocol (SOAP); others were provided directly. Above this were the myGrid core services, which provided a high-level middleware layer. This accessed the underlying external services using Web Service and Grid communication protocols. These core services included the FreeFluo workflow enactment engine, an Open Grid Services Architecture (OGSA) distributed query processor and an information repository for the user to store experiment data. This layer also provided its own registries and discovery tools for both workflows and the services they encompassed, as well as management of the ontologies and metadata used to describe these workflows / services. Further services supported personalisation, event notification and provenance management.

Above the core services middleware layer was the application layer. Applications could access the core services through a gateway that provided a single point of access to the whole system, making it easier to interface with. Talisman (http://talisman.sourceforge.net/) was one such application for rapid prototyping, which used the gateway to interact directly with the FreeFluo workflow enactment engine. Two other applications were the myGrid Workbench and the Taverna Workflow Environment; these are now considered a single application, called the Taverna Workbench.

2.1.2.1 Transparent Access to Multiple Bioinformatics Information Sources (TAMBIS)

TAMBIS was a precursor to the myGrid project, with its original funding finishing in December 1998; however, it provided useful insights prior to the start of the myGrid project (Goble et al., 2001). The aim of the project was to allow bioinformaticians to express, and gain results from, source-independent rather than just source-dependent queries.

The world of Bioinformatics has produced, and continues to produce, many information resources (databases). In 2010 the Nucleic Acids Research Database collection contained 1230 separate databases, compared to just over 200 in the equivalent Molecular Biology Database collection in 2000 (Cochrane and Galperin, 2010), (Baxevanis, 2000). Even at the time of TAMBIS's conception, multiple heterogeneous molecular biology databases existed and it was clear that some means of performing queries across two or more of them would be essential (Markowitz, 1995).

TAMBIS was designed to provide users with a visual query interface that would allow them to specify source-independent queries that use multiple information resources. The ability to provide such an interface and transcribe the source-independent queries into ordered sequences of source-dependent queries was facilitated by a terminology server using an ontology that specified the available information resources. This sequence of queries could then be executed by a middleware layer. The flow of data between these queries is critical, and the ontology is essential for defining the interoperation between heterogeneous and often restrictive information resource interfaces.

By 2001 TAMBIS had produced a pilot Java applet that integrated five protein information resources. TAMBIS could have been developed continually to encompass more information resources and perform more complex queries. However, during development it was noted that some information resources had inadequate query interfaces, reducing recall and precision. Some of these interfaces were in a state of flux, making it difficult to keep TAMBIS's ontology up to date (Stevens et al., 2003). Although TAMBIS provided the useful facility of working out which information resources needed to be queried and in what order, it was limited to what the ontology defined as capable. Other possibilities may have existed that the user may have wanted to explore.

2.1.2.2 The myTea Project

The TAMBIS project provided some insight into a number of the tasks performed by a bioinformatician. To gain a more detailed picture of the bioinformatician's work, the myTea project (http://www.mytea.org.uk/) was undertaken. This project was similar to SmartTea (see section 2.1.1.3) in that it used an analogy to a common, simple activity to represent the more complex task the researcher was trying to tackle. myTea was designed to help discover how bioinformaticians could better integrate their work electronically, to save time and make it easier to share their results with others (Gibson et al., 2005). The analogy of assembling a jigsaw puzzle was used to represent the work of a bioinformatician.

A bioinformatician's workflow is quite different from that of the chemists in the SmartTea project. All their work is done in silico, so they do not need an electronic lab book in the same way chemists do. A bioinformatician uses many applications from different sources that provide little integration. This creates several problems: first, it slows the progress that bioinformaticians can make, because they need to copy and paste, download and upload information between applications; second, because of this piecemeal transfer of data it is very difficult for the user to keep track of their data. Which applications have been used to produce a file? What was the original data used to produce it? Has a file been manually edited? This problem arises because, despite there being plenty of web applications available, these are not integrated with the desktop where bioinformaticians do their work. The myTea project attempted to elicit the requirements for this integration.


2.1.2.3 Taverna Workbench

The Taverna Workbench's design was informed by the outcomes of both the TAMBIS and myTea projects. Like TAMBIS, a main requirement of the Taverna Workbench was that it had to allow bioinformaticians to perform queries across multiple information sources. The Taverna Workbench also shared similarities with the ELN built as part of the CombeChem project: like the ELN, it allowed the user to interact with web-based applications seamlessly, as well as providing an interface to a number of different myGrid components (Oinn et al., 2006).

As previously discussed in section 2.1.2, the e-Science lifecycle played a critical role in the design of the Taverna Workbench. Sitting at the application level, it made use of myGrid core services to support each step of this lifecycle. To be able to do this effectively, the Workbench had to have certain characteristics.

Taverna users would typically be biologists / bioinformaticians, not expert programmers, so a simple interface was needed to allow ad hoc workflows to be rapidly designed. Workflows are visualized in a Graphical User Interface (GUI) as part of the Workbench, allowing the user to edit them, adding new services and connecting them up to existing ones. Once completed, these visualizations could be exported as SVG files using Graphviz's dot package (http://www.graphviz.org/). Figure 2.1 is an example visualisation of a Taverna workflow. Visualisation of workflows made designing them easier for the non-expert user; however, the underlying model still needed to be understood. The Taverna Workbench used a dataflow-centric model, with which most bioinformaticians are already familiar.

Taverna Workbench users had to be able to use the services they wanted, so the onus was on the Workbench to support them rather than the other way round. A consequence of this was that a high level of complexity often arose when linking two heterogeneous services together. Therefore a multi-tiered hierarchy was necessary to hide from the users the complexity of the middleware that allowed two services to interface, while still providing sufficient information so they could see how the services linked up and could manage control flows and fault tolerance policies.

Fault tolerance is another consequence of integrating distributed heterogeneous services. At any time a service may not respond, may return erroneous data or may simply fail; therefore the overall workflow needed to handle these faults gracefully, to execute the workflow as best it could. However, it was up to the user to define the fault tolerance parameters, so that the results finally returned to them were of a high enough standard.

Figure 2.1: Taverna Workflow Visualisation (from http://www.myexperiment.org/workflows/966)

To support the last step of the e-Science lifecycle, the Taverna Workbench also allowed users' experiments to be stored and processed within a Web-accessible system, which made use of Semantic Web (SW) technologies such as triplestores and ontologies. Further to this, the system could generate reports from this analysis, which could be annotated by the owner and then shared with other users of the owner's choice. Annotation was an important component of the project, allowing the user to specify how reliable the data is; terms like finished, unsatisfactory, useful and needs attention helped keep track of the work, alongside the provenance data captured. This functionality provided some ideas towards the critical goal of closing the loop between steps five and one in the e-Science lifecycle. However, there was a clear need to expand upon these ideas to make appropriate reuse easier and more reliable within this community.

The Taverna Workbench has continued to evolve since the original myGrid project created it. As of October 2011 it is still widely used by bioinformaticians, and in the last few years the Taverna 2 Workbench has been released, allowing users to design ever more complex workflows.

2.1.3 e-Research and the Semantic Web

Sections 2.1.1 and 2.1.2 have demonstrated how the UK e-Science pilot projects made use of the tools and applications of e-Infrastructure. Many of these, as well as the overall architecture, i.e. OGSA, take advantage of Web specifications such as the Simple Object Access Protocol (SOAP) and the Web Services Description Language (WSDL). Some of these tools and applications go further and make use of early Semantic Web specifications, such as the Resource Description Framework (RDF), RDF Schema (RDFS), the Ontology Interchange Language (OIL) and triplestores.

The Semantic Grid community believes that the new technologies emerging for the Semantic Web should be used to extend the original Grid functionality, allowing information and services to be described in a much richer way to promote both sharing and reuse (De Roure et al., 2001). Both CombeChem and myGrid have been actively involved in the Semantic Grid community and have been closely associated with knowledge technology projects such as AKT. Follow-on projects have continued to use emergent SW technologies to facilitate interaction with electronic infrastructure (Bachler et al., 2004b), (Goble et al., 2003). Section 3.1 gives an overview of the most generic SW technologies and then focusses on those that can be more specifically applied to e-Research.

2.1.4 Socializing e-Research

Sections 2.1.1 and 2.1.2 have also identified how both projects have a strong focus on the user and their place within the community. This is particularly critical where the end of the e-Science lifecycle loops back to the beginning, namely where outputs from one lifecycle get reused by the next.

The CombeChem project produced experimental work plans that could be used to instantiate experiments for many different use cases, such as verifying results, allowing multiple runs with different parameters, or providing a protocol for reproducing a product for a further experiment. The experimental work plans themselves could be re-purposed, borrowing chunks of process-product pairs and splicing them into a new experiment work plan. These use cases are very much aligned with those of the Taverna Workbench, where in silico workflows may be reused in similar ways. In both, these use cases may be performed by the originator or a third party; in the latter case the supplementary information about these work plans / workflows is essential both for discovery and for the third party to understand whether the item is suitable for their intended reuse. Some of this supplementary information can only be provided by the originator, but as the item is being shared with the community, there should be some onus on the community to provide additional information to verify these details, and sometimes the claims, made by the originator. To make this possible some framework is required to support the community in this endeavour. One such means that exists in the wider world is social networking websites, which allow users to distribute information that the community can then augment with supplementary information.

2.2 Social Networking

Social Networking has existed since the beginning of mankind and indeed is exhibited within primates and other species (Krause et al., 2009). The concept of building relationships with others, who in turn make relationships with further people to build a network, is innate to human civilisation (Johnson and Earle, 1987). However, it has only been since the latter half of the twentieth century that this social network has gone online. Computer networking and the advent of the Internet have allowed humans to send and receive messages from each other using computers, through email, newsgroups, bulletin boards, Internet Relay Chat (IRC), Multi-User Dungeons (MUDs), etc. If logs from these were analysed it would be possible to visualise some of the first online social networks.

2.2.1 Social Networking on the Web

The intention of the Web and similar technologies at the time was to facilitate the sharing of human-readable information in a way that could be easily navigated by humans. In the case of the Web this was achieved through hyperlinks (Berners-Lee, 1989). Initially the only real benefit the Web provided to online social networks was to allow them to publish mailing list archives and chat logs online. Some early pioneers saw the potential of the Web for supporting online social networks. By setting up web-hosting services they hoped to facilitate the development of virtual communities for these social networks. Examples of such services include Tripod.com, Angelfire and GeoCities.

GeoCities in particular focussed on creating community spaces. It set up 29 neighbourhoods, allowing users to choose which community their website was hosted in (GeoCities, 1995). Communities included Broadway, CapitolHill, CollegePark, SiliconValley, etc., each designed for a particular user group, namely actors, politicians, students and technologists.

These projects may have allowed anyone to host websites and be part of a community of like-minded people, but in the early world of the Web there could only be limited interaction between website host and user. Simple applications such as guest books and hit counters allowed hosts to gain feedback from users, but these interactions were very basic and did not facilitate the building of a fully-interactive online social network.

As the Web evolved, the development of scripting languages such as PHP and Active Server Pages (ASP) allowed the generation of dynamic Web pages, making it possible to design more interactive applications such as wikis and forums. Such applications provide the facility for an online social network to come together to perform a particular task, such as collaborative document editing or discussion of particular topics. However, since the beginning of the 21st century a number of successful websites have sprung up that are oriented around the social network rather than achieving a specific task.

2.2.2 Social Networking Websites

Prior to the advent of the social networking sites that as of 2009 hold the greatest market share of users, i.e. MySpace, Facebook, Twitter, etc., a number of less well known sites were set up in the late 1990s, such as Six Degrees and SocialNet (comScore, 2010).

Six Degrees's social networking model follows the concept of the same name, first alluded to by Frigyes Karinthy in his 1929 book Everything is Different and later tested by Stanley Milgram (Barabási, 2002). It is based on the idea of each member having a set of contacts and those contacts having their own contacts and so on (Bedell, 1998). As this continues, “circles of associations” appear as new members list contacts who are existing members. However, the main aim of the site was to widen a member's list of contacts by allowing them to contact all members up to three degrees of separation away. A member could also explore all the links between members, potentially allowing them to find any other member on the site. An example given for how this might be used is a student who has just graduated from university and wants to become an environmental lawyer in Dallas. The member may well have contacts in Dallas and in environmental law but not any environmental lawyers in Dallas. However, there is a fairly high probability that one or two further degrees of separation away there will be such a contact.

More recent social networking sites such as Facebook or MySpace have had a model slightly different from that of Six Degrees. They have been more interested in the interaction between fairly close-knit groups of friends. Figure 2.2 helps to highlight the three main differences. MySpace and Facebook’s models have:

1. Users with a greater number of first degree friends. Friends beyond the second degree (friends of friends) are not relevant.

2. Users have a greater number of shared friends.

3. Explicit groups of users that may not all be friends with each other.

New members to these more recent social networking websites may have been sent an invite to join by an existing member. However, these sites have been able to create sufficient hype for users to sign up independently. In the case of Facebook, targeting the student community was particularly effective in generating this hype (Phillips, 2007). Once a user has joined the site they may then start building their social network. There are two means for a user to achieve this:

1. Request to become a friend with another user.

Figure 2.2: Comparison of Social Network Models. (a) Six Degrees Social Network Model; (b) More Recent Social Networking Website Model

2. Request to join an explicit group.

Alternatively, a user can be invited by another to become a friend or join a group. In the latter case the inviting user must at least be a current member of that group. Groups may be created by any user for any legal purpose. They may reflect an existing real world group or may consist of people who have never met but share a similar interest or viewpoint. Groups allow users to share content with each other without all the users having to be friends with each other.

Once a user has established their initial social network, they can start to take advantage of the other features provided by the website. The most common feature is the ability for a user to send messages to members of their social network. This may be a direct message to one or more other users or an announcement sent out to a group. Some sites have the concept of the user's 'Wall', where users can post messages that can be viewed by all the members of that user's social network.

Other common features include a user being able to share photos and videos (content) within their social network or parts thereof. This then allows members of the social network, including the user themselves, to tag the content with key words and users that appear within it. Beyond this, permitted users can comment on this content. Being able to create an event and ask members of their social network whether they plan to attend is also a recurring feature.

Different social networking sites have additional features such as Facebook's applications or 'poke' feature. These features help differentiate social networking sites. Often these additional features are designed to meet a particular requirement of the user community that is unique to that particular social networking site. Beyond the main social networking sites smaller sites exist, which are designed for a particular user community. SocialGo16 even allows people to design their own social networking site for a particular community.

However, there are two main activities that social networking sites support:

1. The ability to socialise with other users. Although this is nothing new, the type of social interaction this allows goes beyond what was previously possible, as section 2.2.3 describes.

2. The ability to share and collaboratively undertake tasks, such as tagging photos. This does not just have implications in the social world but also in the business world.

2.2.3 The Pros and Cons of Social Networking Websites

As is often the case with new technologies there are both pros and cons that need to be weighed up to evaluate the benefits to both the individual and the wider community.

One of the first social networking sites to become popular in the United Kingdom was called Friends Reunited17. This was based on an equivalent website in the US called Classmates.com18 (Clark, 2003). The main purpose of these sites was to allow old school friends to find each other after leaving school and losing contact. Many social networking sites' inherent architectures facilitate such a task. Beyond this, some sites even provide applications to assist with this, such as suggesting users who share more than one friend in common. Before social networking sites, finding old school friends could be extremely time consuming. One problem was determining where to start; a second was coming to a dead end. Social networking sites are not only a good starting point but they generally provide a number of routes of investigation, i.e. multiple mutual friends.

16http://www.socialgo.com/
17http://www.friendsreunited.com/
18http://www.classmates.com/

There are many different media and means of communication that can be used to organise events. Social networking sites have several useful features that, combined, allow them to stand out from other options. First, the potential audience for the event is already there as the user's social network. This may be friends with whom a user wants to organise a night out, a social group that a user wants to organise a day trip for, or even a larger group with a shared cause that wants to organise an activity. This may be something quite jovial like a Flash Mob or more serious such as a political demonstration (Leigh, 2008), (Schlesinger, 2007).

A second advantage is that social networking sites combine adaptability, rapid interactivity and good collation. The first of these allows quite specific functionality to be designed for sending invitations and receiving responses from invitees in a short time. To an extent this is possible with a group email or text message; however, when responses come back, the organiser must collate them manually. Social networking sites such as Facebook provide the invitee with a set of options (e.g. yes, no and maybe); responses can then be collated automatically, allowing the organiser to know not only who but how many people are likely to be attending.

Before the advent of social networking sites staying in contact with more than a small group of friends was difficult. Some estimates suggest an inner circle of 15 friends, beyond which the chances of losing contact with friends who are less close are significantly higher (Geoghegan, 2009). Many sites provide the user with news feeds that alert them to the statuses of all their tens if not hundreds of friends. This may not in itself be direct contact but, with the ease of sending a message between users, it can help prompt them to stay in contact, even if they no longer meet up in person.

Although there are many good things about social networking sites, they also have their share of issues that may dissuade someone from using them. It has been regularly reported that Facebook's privacy settings are too complicated, causing users to allow access to things they would prefer to keep more private (Bilton, 2010). Even when social networking sites try to simplify their settings, this sometimes creates further problems because the new default settings may make things public that a user did not originally intend.

Even if a user masters the privacy settings they still have to deal with the potential problem of the leak of information between social friends and work 'friends'. This can easily occur when a user has friends that exist in both circles. Social networking sites have found it difficult to understand the context in which two friends may be sharing content, leading to this being shared with users neither friend intended it to be shared with. Users also have limited control over how other users share information about them. An example of this is the tagging of a user in a photo. A user may not want to ban friends from doing this, but they may well not want these photos to be accessible by anyone who uses the site, including prospective employers.

Diaspora* proposes itself as an alternative to social networking sites like Facebook (Quick, 2010). The project team state their aim is to address the previously described privacy issues with Facebook. They intend to achieve this by allowing users to create Diaspora* 'seeds': personal Web servers hosted by the users themselves for holding their content. Then, through clear, contextual interfaces, users will have complete control over who they share their content with. These web server 'seeds' interlink to build a distributed network rather than a centralised hub. This model helps tackle one of the other major issues raised by social networking sites: “Who owns the content a user uploads?” There is no concept of upload with Diaspora*, as data is stored locally and shared appropriately using GNU Privacy Guard (GPG) encryption peer-to-peer.

Although this does appear to solve a lot of the issues that are prevalent within existing social networking sites, it creates its own new problems. There is an intention to allow existing social networks to act as seeds within this new network to allow users to keep their existing friends and share their existing content. However, it is currently uncertain how well this will work.

Another major problem is whether many users will have both the know-how and hardware to be able to host their own 'seed'. Again, Diaspora* provide a “paid turnkey hosted service” that a user can use as their seed, but from which they can decide to move to their own privately hosted seed at any time. It is as yet unclear whether this service will allow the user to maintain full Intellectual Property (IP) rights over the content they deposit.

A further issue may be that users are able to view the logs of who has accessed their content, as these would presumably be hosted locally. This may at first appear to be a good idea, but it may deter viewers of content who are concerned about what the user hosting the content might think of their viewing.

As has been highlighted, Diaspora* has a number of issues that require clarification; these should become clearer when the system is launched and its user base begins to grow. The main question remains whether enough users of existing social networking sites will be sufficiently concerned about the privacy and IP rights of their content that they will go to the effort of switching.

2.2.4 Social Networking for the e-Research Community

As the community for a social networking site grows and the list of requirements becomes longer and more complex, inevitably some proportion of the users will become unhappy with certain aspects. As has been described, effort can be made to address these concerns but it is very difficult to fully resolve them. New social networking sites may better provide for a certain type of user but ultimately the question is whether the niche appeal of a particular site outweighs the benefits of using a larger, more generic one.

Designing a social networking site for an e-Research community is no different from any other niche. If the site can provide specific features and functionality to the user that a generic site cannot, then it has a place. From what has been identified in section 2.1, it is clear that the second activity described at the end of section 2.2.2, “The ability to share and collaboratively undertake tasks”, is a key motivator for developing a social networking site in e-Research and specific functionality would be a necessity for a site such as this.

However, if this functionality is only used occasionally, it will be difficult to build up real user interaction within the social network. As observed in section 2.2.3 in the case of Diaspora*, asking a user to abandon or even demote an existing social networking site might be a step too far. Being able to interact with these major sites provides the potential to facilitate greater interaction between users of the niche site. Section 2.3 considers how this might be achieved as part of a detailed overview of the design of the e-Research focussed social networking site myExperiment.

2.3 The myExperiment Project

The myExperiment project started in 2007 and has been a collaboration between the Universities of Manchester and Southampton, originally funded by JISC under the Virtual Research Environments programme and by Microsoft's Technical Computing Initiative (De Roure, 2010a). It has involved collaborators from computer science, bioinformatics, social science and chemistry, amongst others. It was conceived out of a need for communities to be able to share their experiments. The initial user group envisaged was that of myGrid's Taverna community, who produce workflows representing in silico experiments (De Roure et al., 2007). As described in section 2.1.2.3, the Taverna Workbench has allowed a user to integrate heterogeneous web services from different sources to build a workflow, saving them both the time and complexity required to manage this manually. However, building a workflow from scratch still requires extensive expertise that takes time to learn. By allowing the community to share workflows, it aids this learning process, saves re-invention and facilitates rapid innovation and re-purposing to build new workflows.

The effort required to build a workflow is not trivial, therefore it is not unreasonable for a designer to only be happy to share it with a limited set of colleagues. Unlike other e-Science community websites that either mandate against this or do not have such a facility (e.g. OpenWetWare19 (Kelly et al., 2008)), the myExperiment project acknowledges that this is very important to the user: if they are confident that only certain users can access their workflow, they are more likely to share, albeit in a limited fashion, and become part of the community.

Being aware of existing social networking projects like those discussed in section 2.2.2, myExperiment knew the design and functionality of the website would be heavily user-driven. For this reason myExperiment describes itself as being in “perpetual beta”, adding new features as users require them, and at the outset of the project chose to use a web scripting framework that supported agile development and rapid innovation, namely Ruby-on-Rails (De Roure et al., 2009), (Thomas et al., 2007).

Ruby-on-Rails uses a Model-View-Controller (MVC) architecture. Using such an architecture means there are separate files of code for database management (in myExperiment's case a MySQL database), the design of the Web page and everything in between. This facilitates the generation of skeleton files for each task, which can then be easily modified for the particular user requirement. It also supports the creation of intuitive URLs (e.g. the group with ID equal to 9 is http://www.myexperiment.org/groups/9), making it possible for someone to type in the URL by hand rather than having to search for a link. Although the myExperiment website was designed for an initial community of Taverna Workbench users, because of the Ruby-on-Rails architecture it was easy to expand support to other workflow communities such as Triana20 and Kepler21, as well as other research areas such as Chemistry, Astronomy and Social Sciences (De Roure et al., 2007). This has been assisted by the ease of deploying Ruby-on-Rails plugins or Gems to support the codebase.

19http://openwetware.org/

The myExperiment codebase underlying the website is licensed under the 3-clause New BSD License22 and is available to download from RubyForge23, an SVN-backed code repository hosting site (De Roure, 2009). Doing this, as well as providing configuration settings, has allowed other similar projects to reuse the myExperiment codebase to a greater or lesser extent, or simply use it for ideas capture. Projects that have reused the codebase in some way include SysMO-DB24, BioCatalogue25, MethodBox26, SKUA's Spacebook27 and HPC/NA's APACE.

Over time, the perpetual beta development of myExperiment has allowed four capabilities to be distilled that are requisite for building what has come to be described as a Social Virtual Research Environment (Social VRE), i.e. a VRE supported by a social network (De Roure et al., 2008):

1. Facilitate management of Research Objects (ROs).

2. Support a social model.

3. Provide an open extensible environment.

4. Provide a platform to action research.

The third capability is in part achieved by having a design ethos like that of myExperiment, namely an agile development environment, open source licensing and customisable settings. However, like the other capabilities, the model is key to being able to implement these. When the fourth capability refers to actioning research, this means the platform in some way facilitates research being carried out. In the most basic sense this is the ability to run a workflow through the myExperiment website and collect the outputs it produces. A broader interpretation is that, by making research outputs accessible, it becomes possible for users to help others make use of their research contributions in their own research.

20http://www.trianacode.org/
21https://kepler-project.org/
22http://www.opensource.org/licenses/bsd-license.php
23http://rubyforge.org/projects/myexperiment
24www.sysmo-db.org/
25http://www.biocatalogue.org/
26http://www.methodbox.org/session/new
27http://code.google.com/p/skua/wiki/Spacebook

2.3.1 The myExperiment Model

The myExperiment model focuses on three main areas (Newman et al., 2009):

1. Content Management.

2. Social Networking.

3. Object Annotation.

myExperiment first and foremost manages workflows and files, collectively known as “contributions”, that users upload to the site. As described earlier, a key requirement for prospective users is that they can manage how these are shared with other users. For this reason the myExperiment model defines additive policies for each contribution that describe the permissions other users have to view, download and edit a particular contribution. However, to ask a user to define each individual permission every time they upload a contribution would be excessively time consuming, and this is one of the main motivations for supporting a social model.
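To make the additive nature of these policies concrete, the following Python sketch models a policy whose grants can widen but never revoke access; the class and field names are invented for illustration and are not myExperiment's actual schema.

from dataclasses import dataclass, field

@dataclass
class Policy:
    # Each set names individual users, group names or the token "friends";
    # an entry can only grant access, never take it away (additive).
    view: set = field(default_factory=set)
    download: set = field(default_factory=set)
    edit: set = field(default_factory=set)

def permitted(grants, user, owner_friends, user_groups):
    # True if any grant in the additive policy covers this user.
    return (user in grants
            or ("friends" in grants and user in owner_friends)
            or bool(grants & user_groups))

# Share a workflow with all of the owner's friends plus one named group.
policy = Policy(view={"friends", "bioinformatics-group"})
print(permitted(policy.view, "bob", owner_friends={"bob"}, user_groups=set()))  # True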

After a user has signed up to myExperiment, much like MySpace, Facebook or other social networking sites (see section 2.2.2), they may make friends by either sending or accepting friendship requests from other users (Goble and De Roure, 2007). They may also join groups using a similar membership request mechanism. When it comes to sharing a contribution, a user may assign permissions en masse to their friends and groups, whilst still retaining finer-grained access control to assign permissions to individual groups or friends. This task is made more straightforward by providing a list of choices restricted by the user's friendships and memberships.

The purpose of the social network stretches beyond the facility to simplify the sharing of contributions. It also structures the data management, making it easy for users to search for or discover particular contributions. For example, myExperiment has a shared contributions tab for each group and news feeds for when a user's friend uploads a new contribution to which they have access. This facilitates the final aspect of the myExperiment model, object annotation.

When a user uploads a contribution they will assign basic metadata to it, e.g. title, description, tags (i.e. keywords), etc. This is useful for helping other users to find it and understand what it is. However, this is just the opinion of the uploader. Object annotation in myExperiment is also a community activity where other users with access to that contribution can assign their own tags and rate, review, comment on, cite and bookmark it. This social curation, as well as enhancing search, provides the user with information about the quality of a contribution and potentially its usefulness to them.

Beyond the three main aspects of the myExperiment model, the perpetual beta ideology has helped to unearth additional features that a Social VRE may require. Assembling a workflow is not necessarily a task undertaken by a single person and even if it is, they may not be the person to upload it. As is highlighted by the need for sophisticated access control, workflow designers are often deeply concerned about the intellectual property of their work. Therefore, the model was modified to separate the uploader from the person(s) credited for a contribution.

With any website that makes intellectual property available to others, whether it be completely public or constrained to a set of users, it is essential that a license is assigned to it and this is no different for myExperiment. However, an outcome of users being able to share workflows is that a downloaded workflow may be redesigned or re-purposed for a new task and, although the license for most workflows on myExperiment permits this, it also requires that the original workflow is attributed if the workflow is made available. For that reason the model was modified to allow contributions to be annotated with attributions.

Workflows are rarely standalone entities. They may have input or output files, documentation explaining their use or associated papers where they have been cited. A common scenario posed by users was the need to group together a set of workflows being used in a Taverna training session. Therefore the myExperiment model was modified to provide a new contribution called a 'pack'. When a user creates a new pack they assign metadata much like they would for any other contribution; they may also add items to the pack.

Figure 2.3: The Architecture of a Pack (Due to http://www.myexperiment.org/packs)

As shown in Figure 2.3, these items may be existing contributions, including packs, as well as references to remote entities (via URLs), and for each of these items the user can assign metadata describing it within the context of the pack. Users have employed packs in various ways:

1. Collating workflows to provide example or benchmarking sets for new releases of an editor.

2. Bringing together workflows of a similar purpose.

3. Building a tutorial or demo for a training course or presentation.

4. Grouping together presentations that they have given.

5. Aggregating external resources on a particular topic, e.g. web pages, ontologies, web services, enactors, etc.

6. Combining workflows with papers or presentations describing them.

To meet the final capability of a Social VRE, “providing a platform to action research”, myExperiment provides an 'experiment' entity. This entity is a container for 'job' entities. Jobs take a workflow and run it in a remote enactor, i.e. a web service for running workflows, capturing both the inputs and the outputs of the workflow, as well as provenance data about the status of the job. Like contributions, a user can set basic metadata, e.g. title, description, etc., for both experiments and jobs.

As described in section 2.1.2.3, Taverna workflows can be exchanged in a standard XML format called SCUFL. Other workflow types (e.g. Triana, Kepler, etc.) can also be exchanged in standard XML formats (Deelman et al., 2009), (Altintas et al., 2004). myExperiment uses scripts that can extract data from Taverna SCUFL files. These assist an uploader with metadata suggestions, as well as generating an SVG visualisation of the workflow and a structured HTML listing of its components.
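A minimal sketch of such an extraction script is given below; the namespace URI and element names are assumptions about the SCUFL format based on the description above, not a definitive rendering of its schema.

import xml.etree.ElementTree as ET

# Assumed SCUFL namespace; check against a real SCUFL file before relying on it.
NS = {"s": "http://org.embl.ebi.escience/xscufl/0.1alpha"}

def suggest_metadata(scufl_path):
    # Pull out the workflow description and processor names to seed the
    # uploader's title, description and tag suggestions.
    root = ET.parse(scufl_path).getroot()
    desc = root.find("s:workflowdescription", NS)
    return {
        "title": desc.get("title") if desc is not None else None,
        "author": desc.get("author") if desc is not None else None,
        "description": (desc.text or "").strip() if desc is not None else "",
        "processors": [p.get("name") for p in root.findall("s:processor", NS)],
    }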

Primarily the myExperiment model has been designed to support the website; however, to fully support capability three, “Provide an open extensible environment”, there needs to be a way to access this model and the data it manages directly. A REpresentational State Transfer (REST) Web service can provide such a facility.

2.3.2 myExperiment’s REST API

REST APIs are RESTful web services that give programmatic access to a particular data model. REST is similar to, but simpler than, SOAP and WSDL-based web services (Rodriguez, 2008). RESTful web services have the following requirements:

• Use HTTP methods explicitly.

• Be stateless.

• Expose directory structure-like URIs.

• Transfer XML and/or JavaScript Object Notation (JSON).

These requirements make RESTful web services suited to providing the API for websites that have a complex underlying data model, as API calls resemble web page requests. Many social networking and community sites, including Facebook, Yahoo and Google, use REST APIs to allow developers to build third party applications and this is also true of myExperiment.

myExperiment's REST API is built as a separate library integrated into the rest of the Ruby-on-Rails code base. This approach allows a single tabulated XML file to be used to manage all the API calls and how they interact with the underlying data model. This facilitates editing of the API via XML table editors such as Microsoft Excel, making it easy to modify or adapt for projects reusing the code base.

The REST API has several different types of request (Cruickshank, 2009). Index requests use GET requests to return listings of entities of a particular type, such as workflows, users, tags, announcements, etc. GET requests can also be used to return properties for a single entity. For entities such as users and workflows, beyond basic metadata and provenance properties, these responses include listings of associated entities. For users this includes other users with whom they are friends, groups they belong to and workflows they have uploaded. Workflows have listings of associated tags, comments, reviews, ratings, credits, attributions, citations and versions.
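A minimal sketch of the index and single-entity calls from Python follows; the endpoint paths, response attributes and the example workflow id are assumptions based on the description above, so the API documentation should be treated as definitive.

import requests
import xml.etree.ElementTree as ET

BASE = "http://www.myexperiment.org"

# Index request: list workflows. Each listing element is assumed to carry
# the 'uri' and 'resource' properties described below as attributes.
listing = ET.fromstring(requests.get(BASE + "/workflows.xml").content)
for element in listing:
    print(element.get("uri"), element.get("resource"))

# Single-entity request for one (hypothetical) workflow id.
detail = requests.get(BASE + "/workflow.xml", params={"id": 82})
print(detail.status_code)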

Associated entity listings, as well as those generated from index calls, have both a URI and a resource property. The URI property is the API call required to return data for that entity, whereas the resource is the entity that the results of that API call refer to. However, sometimes a client of the REST API may know the resource but not the URI for the API call. There is no guaranteed formulaic way to generate this from the resource, so the API has a call that takes the resource as a parameter and redirects to the URI for the API call that will return the appropriate response.

POST and PUT requests are also supported for a limited set of entities. Workflows can be uploaded, edited and have new versions submitted. Comments on contributions such as workflows can also be made using a POST request. Like requests for non-public contributions, these requests require user credentials to authorise them. The API also provides a 'whoami' call that simply authenticates a user and then redirects to a GET response for that user if successful. This call replicates the login of a user; it is not an ideal solution, as it requires the user to trust that the third party application will not use these credentials for anything but the requests they authorise.

myExperiment has a number of third party applications developed using the REST API (De Roure et al., 2010a). Although some of them (including a couple of Google Gadgets28) only use public myExperiment data, others allow users to manage their account and/or contributions (such as the Facebook application29 and Taverna Workbench plugins), and therefore require some form of authentication. For this reason the myExperiment API has deployed the OAuth authentication mechanism (Atwood et al., 2009).

28http://www.google.com/ig/directory?synd=open&hl=en&gl=&q=myexperiment
29http://www.facebook.com/apps/directory.php?q=myExperiment

OAuth allows a consumer application, e.g. the Facebook application, to send authen- ticated requests to the service provider, i.e. myExperiment. When a user sets up the consumer application they will be asked to specify a key and a secret. They can obtain this by registering that they want this application to be able to access/manage certain parts of their data on myExperiment. Once these details have been set, each time the user starts a session using the consumer application they will be required to give au- thorisation by being redirected via myExperiment (logging in if they have not already) and agreeing to this access. This generates an access token that they can use for the life of the session, allowing them to perform specified actions through the third party application as though they were logged into myExperiment.
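The sketch below shows what a consumer application's side of this flow might look like in Python using the requests-oauthlib library, once the key, secret and access token have been obtained; the credential values and the endpoint path are placeholders.

from requests_oauthlib import OAuth1Session

session = OAuth1Session(
    client_key="CONSUMER_KEY",             # issued when registering the application
    client_secret="CONSUMER_SECRET",
    resource_owner_key="ACCESS_TOKEN",     # granted after the user authorises access
    resource_owner_secret="ACCESS_SECRET",
)

# Requests made through the session are signed, so the service provider can
# act on the user's behalf without ever seeing their password.
response = session.get("http://www.myexperiment.org/whoami.xml")
print(response.status_code)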

One of the recurring features of myExperiment's third party applications is search. The REST API has its own call for searches that uses a Solr server to find appropriate results. Solr is a Java application that uses the Lucene Java search library to provide full-text indexing and search (Au et al., 2010). It has a REST-like HTTP/XML interface, making it easy to integrate with the existing REST API, as well as the myExperiment website itself. Solr allows the application developer to specify the query syntax of their choice but by default, and in the case of myExperiment, it uses the Apache Lucene query parser syntax (Carlson, 2006). This is a powerful syntax that allows wildcard, fuzzy, proximity and range searches. Boolean operators, such as AND, OR and NOT, can be used to build up a query and restrictions can be placed on terms so they are only searched for in particular fields.
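As a sketch, a search client might combine fielded terms, boolean operators and a wildcard as follows; the endpoint path and parameter name are assumptions for illustration.

import requests

# A Lucene-syntax query: restrict terms to fields, combine with AND/OR and
# use a trailing wildcard.
query = 'title:protein* AND (tag:"sequence alignment" OR tag:blast)'
results = requests.get("http://www.myexperiment.org/search.xml",
                       params={"query": query})
print(results.status_code)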

2.3.3 Comparisons to Drupal

Drupal is a GPL-licensed open source content management platform (Buytaert, 2009). It has various features that allow it to support different sorts of websites, including blogs and community-driven websites, i.e. projects similar to myExperiment. It shares many features with myExperiment:

Friendly URLs Both myExperiment and Drupal make manual URL entry feasible, rather than having to find a link for a particular entity, such as a user, group or file.

Online help Both have wikis detailing how each can be used.

Open source Both projects have open source licenses and download sites.

Platform independence Ruby-on-Rails allows myExperiment to run on most platforms. From the start Drupal has been designed to be multi-platform.

Database independence myExperiment's Ruby-on-Rails MVC architecture and Drupal's database abstraction layer provide database independence.

Caching Both employ caching to reduce the number of database queries required to load a page.

Permalinks Both have links that will not change, although the content behind them might cease to exist.

Content syndication Both have extensive RSS feeds to allow content to be syndicated.

API support myExperiment uses a REST API whilst Drupal uses an XML-RPC API to support blogging from third party applications.

Authentication Both support external authentication, including OpenID30 for users and OAuth for applications.

Permissions In myExperiment users control permissions to access particular entities, making use of the social network. Drupal has a role-based permissions system, where administrators define roles and each user can be assigned one or more roles.

Commenting Both allow commenting on published content. Drupal has a more complex threaded comment model.

Usage statistics myExperiment gives basic statistics for the usage of contributions, although additional information, such as the user agent, is logged but not visible. Drupal has more sophisticated tools for usage analysis, tracking how a user navigates through the site and how they were referred to the site in the first place.

Searching Both have fully-indexed search.

Drupal has some additional features such as multi-language support, logging, metadata versioning and web-based administration. Ruby-on-Rails has many plugins and Gems that would allow these features to be integrated into myExperiment if they become required by its users.

Drupal also has discussion forums, blogs and polls. The first two of these were originally available in myExperiment but were disabled as they had very limited user uptake. This highlights one of the inherent differences between myExperiment and Drupal. Although Drupal supports content management, it is often more focussed on the discussion of the content. For myExperiment the content and how it is shared is central. How it is annotated through tags, comments, etc. is important, but this is more to improve search and curation.

The reason myExperiment and Drupal differ is down to the requirements of the user communities. Drupal is focussed on providing a generic means for managing content (commonly web-based content, e.g. blog posts, web pages, etc.). myExperiment comes from a more specific requirement of workflow communities to be able to securely share their workflows with colleagues. These workflows are not web-based, i.e. they are generally files generated in a workflow editor and then uploaded. Their content is machine-readable, requiring bespoke tools and user interfaces to manage their upload and representation. Being machine-readable, the need for discussion is more focussed on the usage of the workflow, e.g. “I couldn't get the workflow to run using this dataset as an input”, rather than the semantics of the content itself, e.g. disagreeing with a statement in a blog post or document, which is more complex to analyse.

30http://openid.net/

Drupal is used by many community-driven websites but myExperiment’s codebase has been reused or adapted in a number of other projects. The former provides a generic solution that could be adapted to suit particular projects, the latter is more specific and provides solutions for e-Research communities to manage, share and curate the files used in their experimentation.

2.3.4 myExperiment into the Future

After myExperiment's initial funding by JISC and Microsoft's Technical Computing Initiative ended, it received further JISC funding from 2009 as part of the Repository Enhancement programme. The purpose of this funding was both to enhance the existing functionality of myExperiment and to allow other repositories to be enhanced by being able to interact with myExperiment seamlessly. This consisted of a number of tasks, one of the most prominent being the interaction with the University of Southampton's institutional EPrints repositories31,32 and the University of Manchester's institutional Fedora33 repository eScholar34 (De Roure et al., 2010b).

In particular, the new phase of the project has required the ability to harvest metadata from these repositories when remote pack entries (essentially external links) are added to a pack. There are a number of ways of achieving this and each repository has differing support for these:

HTML Meta Tags Tags in the 'head' part of a web page that contain information about the page, e.g. in an institutional repository this could provide the title, authors, year of publication, etc. for the paper the web page describes.

RDFa A means for embedding RDF triples (see section 3.1.1) within an HTML web page (Adida and Birkbeck, 2008). This is not too dissimilar to HTML Meta Tags but it allows existing attributes within the web page to be augmented and allows them to be interpreted by both the human and data webs.

31http://eprints.soton.ac.uk/
32http://eprints.ecs.soton.ac.uk/
33http://fedora-commons.org/
34https://www.escholar.manchester.ac.uk/

Link Alternates These can be specified in the 'head' part of an HTML page and can be used to describe where an alternative format for the page can be found. This alternative format may be more suitable for harvesting metadata.

Linked Data This is thoroughly described in section 3.1.7. Its key feature is that the acceptable content type(s) issued in the request determine what is returned, allowing a metadata harvesting tool to specify the content type from which it is easiest to harvest metadata (see the sketch after this list).

OAI-PMH Open Archives Initiative Protocol for Metadata Harvesting. A bespoke protocol that software clients can use for harvesting metadata from repositories (Lagoze et al., 2002).
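As the sketch promised under Linked Data above illustrates, content negotiation amounts to requesting the same URI with different Accept headers; the workflow URI reuses the example from Figure 2.1, and the exact representations returned by myExperiment (described in section 3.2.3) may differ.

import requests

url = "http://www.myexperiment.org/workflows/966"

# Ask for RDF/XML rather than HTML; a Linked Data server inspects the Accept
# header and returns (or redirects to) the machine-readable representation.
rdf = requests.get(url, headers={"Accept": "application/rdf+xml"})
html = requests.get(url, headers={"Accept": "text/html"})
print(rdf.headers.get("Content-Type"))
print(html.headers.get("Content-Type"))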

OAI-PMH, being bespoke, should be the most reliable and accurate means to harvest metadata. However, this relies on a repository or a user-specified external resource providing such support. In some cases it would be considered either too much effort or inappropriate to implement OAI-PMH, as the type of website is too wildly different from the model of a repository. Therefore there is a need to implement a number of these methods to provide a high level of coverage. As already stated, this enhancement is a two-way street: it also involves improvements to the markup of myExperiment's content so that other repositories can make use of it. In particular there has been a focus on providing “Linked Data” for myExperiment, which is described in section 3.2.3 and makes up part of the intellectual contribution of this thesis.

One of the other goals of the enhancement project is to make it easier for users to find content on myExperiment. Several different approaches have been undertaken to achieve this. One is to facilitate the curation of content uploaded to myExperiment. This has been a two-step process: first, implementing functionality to allow the curation to take place; and second, getting specifically chosen curators to go through and do this curation. This is a step change from myExperiment's original model, which relied on the “wisdom of the crowd” to ensure that the best content would float to the top, through good metadata and ratings and high levels of comments, viewings and downloads. Even though myExperiment has a large user base (over 3000 users according to http://www.myexperiment.org/ as of 13/09/2010), the community shares certain traits. Some users who upload many workflows, sometimes in the same session, have insufficient time to mark all these up with the best possible metadata. Although content may be viewed and downloaded by many users, it is uncommon for them to give feedback in the form of comments, ratings, reviews, etc. Inevitably, as a website becomes more successful it is more likely to be attacked by spammers who provide unwanted content that needs to be filtered out. Expert curation is the centrepiece of this task, but improved spam filtering and tag suggestion provided through content analysis are also necessary to keep down the workload of the chosen curators.

Making the content the user wants easier to find is just one side of the picture; providing enhanced functionality to perform the searching is the other. As described at the end of section 2.3.2, myExperiment uses the Solr search engine. This has a sophisticated query syntax that allows quite complex queries to be expressed. This can be incredibly useful to someone who understands the syntax but less so to everyone else. However, it can also be used as the basis for building a more sophisticated yet simple user interface. myExperiment has recently been working on providing a web-based interface to allow users to iteratively build more sophisticated queries through faceted browsing. Section 4.3 describes this interface in greater detail, as well as another potential interface that may be even more powerful for users. Before this, chapter 3 describes how myExperiment's model and data can be encoded and queried to provide the underlying search framework needed to support such a user interface. Section 4.1 goes on to explain how this encoding of data can be part of an architecture that allows researchers to exchange machine-readable representations of their research and research processes.

Chapter 3

A Society with Semantics

myExperiment is a Social VRE allowing users to share experimental data within a social networking framework that allows it to be curated by the community through several forms of annotation. However, it is not just the data being shared that is interesting; the social network and user annotations are also very important.

Semantic Web technologies are a means for taking existing data and metadata and making it available for machines to understand the interrelations between objects encapsulated within the data, in a standardized and widely-used way. Being able to provide this machine-readable representation is essential in making it possible to analyse the large amount of information captured within myExperiment. To be able to achieve this it is important to understand the tools available, so choices can best be made on how to represent, deliver and facilitate use of this information.


3.1 Semantic Web Technologies

One of the first attempts to capture machine-usable descriptions on the web was the Meta Content Framework (MCF) (Guha and Bray, 1997). MCF used Directed Labelled Graphs (DLGs) comprising sets of labels, nodes and arcs, where an arc was a triple that connected two nodes using a label. These graphs could then be represented using the eXtensible Markup Language1 (XML) (Bray et al., 2004). In combination with Minsky's frames and other frame-based representation systems, MCF inspired the development of the Resource Description Framework2 (RDF) (Minsky, 1974), (Manola et al., 2004), (Lassila, 1998).

The new generation of the Web aims to increase the amount of machine-readable data it retains. This is unlikely to be achieved without a standard approach for representing knowledge. RDF is recognised by the World Wide Web Consortium (W3C) as the foremost standard for achieving this, having been made a recommendation in 1999 (Lassila and Swick, 1999) and subsequently revised in 2004.

3.1.1 RDF and RDF Schema

Like MCF, RDF uses triples to capture data and the relationships between it. Similarly, it can use XML, amongst other formats (N3, NTriples and Turtle), as a concrete syntax. According to Berners-Lee, RDF is the third layer of his Semantic Web “Layer Cake” (Berners-Lee, 2002). There are many envisioned levels above this to produce a web that stores machine-readable data that can be trusted and can be used to produce more information than the sum of its parts, i.e. through inference and other logic-based techniques. However, the next layer, RDF Schema, mainly deals with the structure of the RDF data.

RDF Schema3 (RDFS) allows RDF data to be grouped into a class/property structure, in some ways like an object-oriented programming language such as Java (Brickley and Guha, 2004). Structuring RDF data into classes with properties and instances of those classes allows additional triples to be generated beyond those explicitly specified in the RDF. This is however limited to the transitive properties subClassOf and subPropertyOf that allow hierarchical relationships to be defined: e.g. if dog is a subclass of mammal, which is a subclass of animal, a triple can be inferred stating that dog is a subclass of animal. To restrict where a particular RDF property can be used, its domain and range can be defined. This constrains the type of the subject or object for a triple predicated by this property.
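The dog/mammal/animal example can be reproduced with the rdflib and owlrl Python libraries, as a sketch of RDFS inference in general rather than of any particular triplestore; the example.org names are invented.

from rdflib import Graph, Namespace, RDF, RDFS
from owlrl import DeductiveClosure, RDFS_Semantics

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Dog, RDFS.subClassOf, EX.Mammal))
g.add((EX.Mammal, RDFS.subClassOf, EX.Animal))
g.add((EX.rex, RDF.type, EX.Dog))

# Materialise the RDFS closure, adding the inferred triples to the graph.
DeductiveClosure(RDFS_Semantics).expand(g)

print((EX.Dog, RDFS.subClassOf, EX.Animal) in g)  # True: inferred
print((EX.rex, RDF.type, EX.Animal) in g)         # True: inferred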

RDFS cannot define functional, inverse or set-theoretic (e.g. intersection, union) relationships. Two projects originally tried to develop a language that could represent these relationships: the DARPA Agent Markup Language4 (DAML) in the US and the Ontology Interchange Language5 (OIL) in Europe. These projects were eventually merged to form DAML+OIL and eventually the Web Ontology Language (OWL) (Horrocks et al., 2002).

1http://www.w3.org/TR/REC-xml/
2http://www.w3.org/TR/rdf-primer/
3http://www.w3.org/TR/rdf-schema/

3.1.2 OWL

The complexity of relationships that need to be defined in a domain's ontology can vary immensely depending on the domain. It is important that even the most complex logical relationships can be represented, but it is as important that if only simple logical relationships are required, only these are used. For this reason there are three main species of OWL that can represent increasingly complex logical relationships. They are OWL Lite, OWL DL (Description Logic) and OWL Full. However, since the original specification of these three species, a subset of OWL Lite called OWL Tiny has also been defined by the European SW Advanced Development (SWAD-Europe) group6.

OWL Lite is equivalent to the SHIF(D) description logic. It reuses the class/property structures provided by RDFS to define OWL classes and two types of OWL property: DatatypeProperty, which has literal values, and ObjectProperty, which has non-literal values. OWL Lite allows two or more classes or properties to be defined as equivalent and two or more instances as being the same. Classes that are the intersection of two or more other classes can also be defined. OWL Lite also allows more sophisticated restrictions on the value a property can have, and on the characteristics a property can have, such as being transitive, symmetric, functional, inverse functional or the inverse of another property.

OWL DL is equivalent to the SHOIN(D) description logic (Horrocks and Patel-Schneider, 2003). One of the main differences between the two is that OWL DL allows properties to be defined that have cardinality restrictions, e.g. a football match can have only 2 hasTeam relationships, whereas OWL Lite only allows a functional restriction, i.e. the subject may have no more than one relationship with a particular predicate. The other main difference is that OWL DL allows singleton classes which can only ever have one instance.
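As a sketch, the football-match example might be written as the following OWL DL cardinality restriction in Turtle, here embedded in Python and parsed with rdflib simply to check it is well formed; the ex: names are invented for illustration.

from rdflib import Graph

ontology = """
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .

ex:hasTeam a owl:ObjectProperty .
ex:FootballMatch a owl:Class ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty ex:hasTeam ;
        owl:cardinality "2"^^xsd:nonNegativeInteger
    ] .
"""
g = Graph().parse(data=ontology, format="turtle")
print(len(g), "triples parsed")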

OWL Full provides more flexibility than OWL DL but it is neither sound nor complete. There is no reasoning algorithm that is guaranteed to be decidable across all the data that can be expressed in OWL Full. OWL Full should only be used if it is impossible to represent a domain with OWL DL. Surveys have found that many ontologies defined as OWL Full could really be OWL DL or even OWL Lite (Bechhofer and Volz, 2004).

4http://www.daml.org/
5http://www.ontoknowledge.org/oil/
6http://www.w3.org/2001/sw/Europe/reports/dev_workshop_report_4/

As OWL stores logical assertions, it is possible to reason over these to provide additional information through inference. In general this can be achieved in one of two ways: at the time triples are imported, or at the time of querying a triplestore. Both techniques have their advantages but, more often than not, the processor time required to perform inference at the point of query comes at a higher premium than the extra storage space required to materialise inferred triples at import.

3.1.3 Data, Information and Knowledge

One of the main purposes of knowledge technologies, such as the Semantic Web (SW) technologies of RDF, RDFS and OWL, is to take human concepts and convert them into machine-readable knowledge. By doing this the machine does not just store data that a human has to interpret to infer knowledge, but through using inference and other logical processes can produce knowledge itself. This is far from trivial, especially with a large complex database. One of the greatest problems is that there are several stages that data must go through to become knowledge. Figure 3.1 shows how both relations and patterns must be understood before data can be transformed into knowledge (Bellinger et al., 2004).

Figure 3.1: Data, Information, Knowledge and Wisdom (Due to http://www.systems-thinking.org/dikw/dikw.htm - Copyright Gene Bellinger)

3.1.3.1 How RDF turns Data into Information

RDF uses the principle of a triple, which is analogous to the construction of a sentence: each triple must have a subject, an object and a predicate linking the former with the latter (Hughes et al., 2004). By using such a structure, data is converted to information, because the relationships within it have been explicitly defined in a form the machine can process.

Humans do not necessarily need relationships to be explicitly defined. Take the digital lab book replacement from the CombeChem project as an example (see section 2.1.1.2): a chemist using their paper-based lab book may record two related results next to each other; it may be clear to them and their colleagues that, by their proximity, these results are related, but a machine would not be able to make such an inference. Even if a system could be designed that reads in lab book pages and could determine that results are related, guaranteeing a high level of accuracy would be difficult and just a single false positive could render significant amounts of further analysis invalid.

3.1.3.2 How OWL turns Information into Knowledge

Getting machines to understand relationships is only the first step towards producing knowledge and not just data; the next step is unfortunately much more difficult. As shown by Figure 3.1, it is necessary to understand patterns to convert information into knowledge. Patterns are more difficult to explicitly define than relationships. RDFS provides some assistance by defining classes and properties that can be built into hierarchies using its subClassOf and subPropertyOf properties (Manola et al., 2004). Most if not all ontology designers will enforce some sort of hierarchy, as without this structure most RDF data becomes very difficult to manage. In general, a pattern can be considered as a set of relationships that can be grouped together using one or more logical assertions, for which objects instantiated from RDFS classes can be considered an example.

Using logical assertions allows reasoning to be performed, which generates inferenced relationships/triples. Inferenced triples are a crucial part of SW technologies, giving extra meaning to data without making consistency preservation more complex. Inferencing using an RDFS model can produce additional RDF type triples for all the super classes of the class an object instantiates. Further triples can also be produced by inferencing over the properties of an object that have super properties, producing similar triples where the predicate is replaced with that super property. RDFS model inferencing is fairly basic compared with even OWL Lite model inferencing. Applying such an inferencing model allows a lot more triples to be inferenced through the use of logical assertions such as unionOf, complementOf, inverseFunctionalPropertyOf, etc. The number of inferenced triples determines how deeply an RDF schema or OWL ontology can conceptualise the patterns within the RDF data. A low number of inferenced triples does not necessarily imply a poor schema/ontology; a simple domain may not have too many patterns that need to be elucidated.

3.1.4 Semantic Web Rule Language (SWRL)

SWRL is the integration of the OWL Lite and OWL DL sub-languages with the Rules Markup Language (RuleML) to allow rules to be defined (Horrocks et al., 2004). A rule consists of a body and a head, each of which is made up of zero or more atoms. An atom may be one of several different types, but mostly it represents some sort of RDF triple:

Class Has one variable that is an instance of a particular class.

Same As Has two variables, each a resource, which are the same as each other. (Equivalent to OWL's sameAs property).

Different From Has two variables, each a resource, which are different from each other. (Equivalent to OWL’s differentFrom property).

Individual Property Has two variables that represent resources, the first is the subject and is associated to the second through a particular property.

Data-valued Property Has two variables, the first is a resource that is the subject; it is associated to a literal value through a particular property.

Data Range Has one variable that is a literal and in a particular data range, e.g. a list of potential values.

Built In Has one variable that is a literal, which has a built-in function applied to it. A built-in function may be a type of comparison or a mathematical, boolean, string, date/time, list or URI operation.

The purpose of the rule is to imply the atoms of the head if the atoms of the body are matched. An example of this is:

hasParent(?x1, ?x2) ∧ hasBrother(?x2, ?x3) ⇒ hasUncle(?x1, ?x3) (3.1)

If the hasParent and hasBrother properties have triples that match, then a third triple with a hasUncle predicate and ?x1 and ?x3 as subject and object respectively can be implied. This allows additional triples to be generated beyond those that could ever be inferred using a reasoner and an appropriate schema/ontology. As rules can define specific implications, not just basic axiomatic inferences, they go beyond the knowledge representation capabilities of OWL.

3.1.5 RDF Triplestores

A relational database has its data and its structure separate, i.e. the tables and the relationships between them. A database that stores RDF is commonly referred to as an RDF triplestore and is considered to store its data and structure together, as the structure itself is stored as data. However, care must be taken with how structural and instantiation data intermingle. Often instantiation data is more contentious, e.g. there may be disagreement about exactly what toppings make up a four seasons pizza. Including too much instantiation data within an ontology can often make it less flexible, and it is also less likely that people will use an ontology if they disagree with the creator's instantiations. A potential solution to this is to provide an ontology for defining a domain's structure and then an ontology extension to define useful instantiation data.

RDF Triplestores provide a more flexible way to manage data than relational databases because they make storing relationships as generic as it is possible to achieve, i.e. a flat list of one-to-one relationships. This can be highly beneficial as whenever some new triples are generated they can just be added directly to this list. This is not always the case in relational databases as the data may not be compatible with the tables’ enforced structure. An RDF triplestore’s method for storing RDF data implements the conjecture that RDF captures information in the form of relationships.

3.1.5.1 Triplestore Applications

There are many projects that have produced triplestore applications for managing RDF triples:

• Sesame7 (Broekstra et al., 2002)

• Jena8 (Carroll et al., 2004)

• AllegroGraph9 (Franz Inc., 2010)

• Kowari10 (Wood et al., 2005)

• Virtuoso11 (Erling, 2009)

• 4Store12 (Harris et al., 2009)

Sesame was designed as an RDF framework to support RDF Schema inferencing and querying. It supports storage by various means (e.g. relational databases, in-memory, filesystems, keyword indexers, etc.). Sesame also defines a standard HTTP protocol for interacting with the triplestore that has been adopted by many other applications.

Jena is essentially a Java library for working with RDF. Part of the library allows for data to be stored in a number of persistent datastores such as a MySQL database. Another part of the library allows communication using the Sesame HTTP protocol.

7http://www.openrdf.org/ 8http://jena.sourceforge.net/ 9http://www.franz.com/agraph/allegrograph/ 10http://kowari.org 11http://virtuoso.openlinksw.com/ 12http://4store.org/

AllegroGraph is another application that also supports the Sesame protocol and is designed to scale so that it can store billions of triples in a specialised persistent datastore. One thing that makes it different to other applications is that it has a Lisp as well as a Java client for direct access to the triplestore.

Kowari describes itself as a massively scalable “metastore” database for storing RDF and OWL metadata.

Virtuoso is a hybrid data server that can manage relational, RDF-graph and full text document data by extending an Object-Relational Database Management System (ORDBMS). It also supports the Sesame HTTP protocol.

4Store evolved from the 3Store13 triplestore and builds on the Redland RDF libraries (Raptor and Rasqal) to provide an efficient, scalable and stable RDF database (Beckett, 2001).

Each of these applications has been built with a particular purpose in mind. However, the critical factors are commonly:

• How quickly can data be imported?

• How quickly can data be queried?

• How many triples can it store before it exhibits a significant decrease in performance?

• How complex can queries become before performance declines?

Each of these features needs to be considered in the light of the requirements of the project that requires a triplestore. There are further criteria that may be considered, such as developer support, inferencing support and a common querying interface. A thriving developer community at the very least provides a means of reporting bugs and asking questions; as a consequence such applications are often more robust. As inferencing is computationally expensive, it is usually done at or before import time and therefore any tool can be used without being tied to a triplestore application. Almost all triplestore applications now support SPARQL as their primary querying language or at least provide it as a plugin. Therefore the last two of these criteria, i.e. inferencing support and common querying interface, can be largely overlooked.

3.1.5.2 Triplestore Querying

The primary purpose of a triplestore is to be queried. SPARQL is designed specifically for querying RDF triplestores (Prud'hommeaux and Seaborne, 2006). SPARQL uses a syntax that resembles both SQL and TURTLE. TURTLE is an alternative concrete

syntax to XML for RDF. It better reflects the nature of RDF as just triples, rather than the objects with properties that the XML syntax can imply. In its simplest form a SPARQL query allows the user to search for certain patterns of RDF triples, where one or more elements of the triples are unspecified. Listing 3.1 is a query to find a group of human siblings, i.e. all resources that share the same two named parents.

13http://sourceforge.net/projects/threestore/

PREFIX exp: <http://www.example.org/relationships#>

SELECT ?sibling
WHERE {
  ?sibling exp:hasParent <http://www.example.org/people#Parent1> .
  ?sibling exp:hasParent <http://www.example.org/people#Parent2>
}

Listing 3.1: Example SPARQL Query

SPARQL is not the only language that has been designed to query RDF triplestores; Resource Description Query Language (RDQL), Resource Query Language (RQL) and Sesame's Resource Query Language (SeRQL) have also been defined (Seaborne, 2004), (Karvounarakis et al., 2002), (Broekstra et al., 2002). However, as already stated in section 3.1.5.1, most triplestore applications provide a SPARQL interface, and SPARQL has generally been accepted as the standard querying language.

3.1.6 Schemas and Ontologies for an e-Research Society

So far this chapter has described the basic Semantic Web technologies for defining, inferencing, storing and querying knowledge. With this insight it is possible to use these technologies to represent concepts that are relevant to the domain of an e-Research society. Many of these concepts have already been defined in existing schemas and ontologies. Dublin Core, SKOS, FOAF, SIOC, Creative Commons and OAI-ORE all provide conceptualizations for particular facets required to represent an e-Research society sharing content.

3.1.6.1 Dublin Core

The purpose of the Dublin Core Metadata Initiative (DCMI) has been to define standards, vocabularies and practices for metadata (Powell et al., 2007). DCMI has an abstract model which builds on the work of RDF and RDF Schema (see section 3.1.1). This abstract model breaks down into three separate sub-models:

The Resource Model Resources can be described by properties that have either a literal or non-literal value.

The Description Set Model A resource may have one or more descriptions that make up that resource's description set. Each description may have one or more

statements, i.e. property-value pairs. The value of a statement may be a literal or a non-literal, which could either be a vocabulary encoding scheme URI, a regular URI or a string described by a vocabulary encoding scheme.

The Vocabulary Model Vocabularies have terms that may be classes, properties, vocabulary encoding schemes or syntax encoding schemes, i.e. classes of literals. Classes and properties may have sub-classes and sub-properties, and properties may have domains and ranges that can also be classes. A resource can then be an instance of a class or a member of a vocabulary encoding scheme.

DCMI provides a schema called DCMI Metadata Terms that can be specified as an RDF schema14 (DCMI Usage Board, 2008). Table 3.1 shows the nine vocabulary encoding schemes and eleven syntax encoding schemes (for languages, countries, geographic coordinates, time intervals, etc.) captured by this schema.

Vocabulary Encoding Schemes                   Syntax Encoding Schemes
DCMI Types                                    DCMI Box
Dewey Decimal Classification                  ISO 3166-1
IANA Media Types                              ISO 639-2
Library of Congress Classification            ISO 639-3
Library of Congress Subject Headings          DCMI Period
Medical Subject Headings                      DCMI Point
National Library of Medicine Classification   RFC 1766
Getty Thesaurus of Geographic Names           RFC 3066
Universal Decimal Classification              RFC 4646
                                              IETF's URIs
                                              W3C Date and Time Formats

Table 3.1: Vocabulary and Syntax Encoding Schemes in DCMI Metadata Terms

DCMI Metadata Terms reuses the legacy properties from the DCMI Elements Set15 and defines additional properties and classes. These properties cover standard bibliography fields and properties required in the publication process. The classes allow for new instances of the non-literal values that some properties require, such as location, license document, file format, media type, linguistic system, etc., as well as any agents or classes of agent that may be involved in the publication process.
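For illustration, a sketch of a bibliographic description using DCMI Metadata Terms (the document URI and values are hypothetical):

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

<http://www.example.org/papers/1>
    dcterms:title       "A Study of Workflow Reuse"@en ;
    dcterms:creator     <http://www.example.org/people#alice> ;
    dcterms:created     "2009-06-15"^^xsd:date ;
    dcterms:description "An analysis of workflow sharing."@en .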

3.1.6.2 Simple Knowledge Organisation System (SKOS)

SKOS is designed to provide a means for representing various types of concept schemes in RDF (Isaac and Summers, 2009). Concept schemes are things such as thesauri, classification schemes, taxonomies, folksonomies and controlled vocabularies. Figure 3.2 shows the main aspects of SKOS.

14http://dublincore.org/2008/01/14/dcterms.rdf 15http://dublincore.org/2008/01/14/dcelements.rdf

Figure 3.2: Example of SKOS Entities and Relationships

SKOS's unit of currency is the Concept; these must be in a ConceptScheme but may also be a member of a Collection. If a concept is the most generic, it should be related to a ConceptScheme as the topConceptOf. However, the interrelation between Concepts is the most important feature. More generic concepts have a narrower relationship to those more specific; narrower's inverse property is broader, linking the specific to the generic. These two properties cannot be made transitive (which would allow Concept 4 to be described as narrower than Concept 1) because the hierarchy would be lost when inference is performed. Therefore, the transitive properties narrowerTransitive and broaderTransitive have been defined, allowing direct relationships from any Concept to all Concepts more specific or generic than it. If two Concepts are similar to each other but sit on different branches of the hierarchy, the related property can be used to link them together. To make Concepts human-readable, SKOS provides three types of label: preferred, alternative and hidden.
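The following TURTLE sketch (with hypothetical concepts) shows these SKOS entities and relationships together:

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://www.example.org/concepts#> .

ex:scheme    a skos:ConceptScheme .

ex:science   a skos:Concept ;
    skos:topConceptOf ex:scheme ;          # most generic concept
    skos:prefLabel    "Science"@en ;
    skos:narrower     ex:chemistry .

ex:chemistry a skos:Concept ;
    skos:inScheme  ex:scheme ;
    skos:broader   ex:science ;            # inverse of narrower
    skos:prefLabel "Chemistry"@en ;
    skos:altLabel  "Chemical science"@en ;
    skos:related   ex:physics .            # similar, but on another branch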

3.1.6.3 Friend of a Friend (FOAF)

The goal of the FOAF project has been to support people in creating FOAF profile documents16 for themselves (Brickley and Miller, 2010). The main purpose of a FOAF profile document is to allow a person to specify a “basic expression of community mem- bership.” (Dumbill, 2002). These documents are managed by their owners, so the data is

decentralised, an approach adopted by the Diaspora* social networking site (see section 2.2.3). They are designed so they can be aggregated into community directories. FOAF documents can be produced using an RDF vocabulary that is specified in the FOAF ontology17 and can also be processed by RDFS tools. A number of social networking sites allow users to export FOAF profiles, including LiveJournal and FriendFeed (FOAF project team, 2009).

16http://users.ecs.soton.ac.uk/drn/foaf/me.rdf

FOAF allows the creation of a Person entity to represent the person for whom the FOAF profile document is being created. This provides a URI allowing the person to express their globally unique identity, which is essential before they can start associating information with themselves. This information can include their name, email address, phone number, various homepages (e.g. workplace homepage, weblog, etc.), interests, age, location, publications, an image of themselves, etc.

After defining a person's information, by using the knows property a person can associate themselves with other people, providing "the first degree" of a social network (see section 2.2.2). Although a FOAF profile is generally intended to allow the owner to describe information about themselves, it also allows them to express information about other people as long as they know their URIs. It is then possible to merge these sets of information to generate a more complete FOAF profile as part of community directories.
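A minimal sketch of a FOAF profile document (with hypothetical URIs), expressing an identity and the first degree of a social network:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://www.example.org/people#me>
    a             foaf:Person ;
    foaf:name     "Jane Doe" ;
    foaf:mbox     <mailto:jane@example.org> ;
    foaf:homepage <http://www.example.org/~jane/> ;
    foaf:knows    <http://www.example.org/people#bob> .

# Information expressed about another person
<http://www.example.org/people#bob>
    a         foaf:Person ;
    foaf:name "Bob Smith" .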

FOAF allows the specification of Groups that may have one or more people as members, and Organisations to represent institutions, companies, societies, etc. Projects can also be defined, which a person may describe as a current or past project of theirs. Online communities mean that a person may have several accounts, one for each community; FOAF allows each of these accounts to be associated with the person through the account property.

All the entities described may have documents associated with them; FOAF provides the page property to represent this association. Inversely, each Document has a topic property for every entity associated with it, which can be automatically generated by using the ontology to perform OWL DL inference over the RDF. Finally, it is useful to define information about the profile document itself, so that at a later date it is possible to determine where information came from.

3.1.6.4 Semantically-Interlinked Online Communities (SIOC)

The SIOC project aims to provide “the main concepts and properties required to de- scribe information from online communities”, including weblogs, messageboards and wikis (Berrueta et al., 2007). There is an urgent need for this because these technologies are starting to replace libraries and publishing as a means for keeping a community informed. In some respects SIOC is an extension of FOAF to allow greater detail to be

specified about how a Person, via an OnlineAccount (of which SIOC's User is a subclass), interacts with documents. In the case of SIOC these are online collaborative documents, generically referred to as Containers, which hold specific pieces of content called Items. Figure 3.3 shows an example of the use of Containers and Items to represent a Forum

17http://xmlns.com/foaf/spec/index.rdf

with Posts. A Forum may have nested Containers, e.g. Forums or Threads, building a hierarchical structure. Each top-level Container can then be hosted within a Space, e.g. a Forum on a Site. Each Item can have various properties to define its position within a container, based on version, date or reply status. Basic metadata is also provided that can be used along with Dublin Core to describe the Item and manage the content it captures.

Figure 3.3: SIOC Overview (Due to http://www.w3.org/Submission/2007/SUBM-sioc-spec-20070612/)
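A sketch of this structure in TURTLE (forum URIs are hypothetical), showing a Post held in a Forum hosted on a Site:

@prefix sioc:    <http://rdfs.org/sioc/ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://forum.example.org/>           a sioc:Site .

<http://forum.example.org/forums/1>   a sioc:Forum ;
    sioc:has_host <http://forum.example.org/> .

<http://forum.example.org/posts/42>   a sioc:Post ;
    sioc:has_container <http://forum.example.org/forums/1> ;
    sioc:has_creator   <http://forum.example.org/users/jane> ;
    dcterms:title      "Welcome to the forum" .

<http://forum.example.org/users/jane> a sioc:User .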

A User in SIOC may have many Roles; some of these are pre-defined in the ontology, such as member of a UserGroup, administrator of a Site, subscriber of a Container, modifier of an Item, etc. Others can be specified as Role entities as they are required by the particular community.

SIOC also has a number of extension modules:

Access 18 for assigning Permissions to Roles and Statuses to Items.

Types 19 defines different types of Containers, Items, Forums and Posts.

Services 20 allows Services that are provided by a Site to be associated with it.

18http://rdfs.org/sioc/access 19http://rdfs.org/sioc/types 20http://rdfs.org/sioc/services

3.1.6.5 Creative Commons

Creative Commons is a project that provides “free licenses and legal tools” to allow content creators to licence their work “so others can share, remix and use commercially” (Creative Commons, 2008). An RDF schema has been defined to specify the properties a license can entail21. The schema allows a piece of Work to be defined that has a License that comes under a Jurisdiction. Each License may have a number of properties

assigned to it to describe what it permits, requires and prohibits, as listed in Table 3.2, as well as properties linking to the legal text for a license or stating the date the license became deprecated. A piece of Work may also have properties to reference additional permissions and alternative licenses, as well as a name or URI for its creator.

Figure 3.4: Creative Commons overview

Permissions        Requirements          Prohibitions
Reproduction       Notice                Commercial Use
Distribution       Attribution           High Income Nation Use
Derivative Works   Share Alike
                   Source Code Sharing
                   Copyleft
                   Lesser Copyleft

Table 3.2: Permissions, requirements and prohibitions available in Creative Commons licenses
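As a sketch, a license and a licensed piece of Work might be described as follows (using the creativecommons.org/ns# namespace; the Work URI is hypothetical):

@prefix cc: <http://creativecommons.org/ns#> .

<http://creativecommons.org/licenses/by-sa/3.0/>
    a           cc:License ;
    cc:permits  cc:Reproduction , cc:Distribution , cc:DerivativeWorks ;
    cc:requires cc:Notice , cc:Attribution , cc:ShareAlike .

<http://www.example.org/workflows/7>
    a          cc:Work ;
    cc:license <http://creativecommons.org/licenses/by-sa/3.0/> .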

3.1.6.6 Open Archives Initiative - Object Reuse and Exchange (OAI-ORE)

OAI-ORE “defines standards for the description and exchange of aggregations of Web resources” (Lagoze et al., 2008). This is required because many things in the physical world are often grouped together such as tracks on a CD or photos in an album. So

when it comes to referring to these electronically, e.g. bookmarks in a browser, photos on Flickr, etc., there needs to be a way of maintaining this aggregation.

21http://creativecommons.org/schema.rdf

OAI-ORE cites the example of a document in the arXiv22 repository. Each document has an HTML web page that lists metadata for the document, described as a human start page. There are two core issues that ORE tries to address with this human start page. First, the use of its URI to represent the URI of the whole document, when in fact it is only a single representation of it. Second, the ambiguity to a machine agent of links that point at constituents of the document, e.g. different formats (PDF, PostScript, etc.) and navigational links for the archive that are outside the scope of the document. ORE proposes an explicit machine representation of what a document contains to resolve this.

The ORE model has three main entities:

Aggregated Resource A resource that is part of the Aggregation, e.g. a PDF or Postscript of a paper.

Aggregation A specific collection of Aggregated Resources brought together because they all relate to a similar concept, e.g. formats, versions and supporting resources for a paper.

Resource Map A machine-readable representation of an Aggregation with its own URI, perhaps serialised in a number of formats including RDFa, RDF/XML and Atom XML (Adida and Birkbeck, 2008), (Nottingham and Sayre, 2005).

Figure 3.5 demonstrates how these entities interact. All three of these entities contain

Figure 3.5: Resource Map and Aggregation in ORE (Due to http://www. openarchives.org/ore/1.0/primer.html)

Dublin Core or other appropriate metadata to describe themselves. Table 3.3 gives an example of the type of metadata each might have. It is important to note the difference

between the creator of the Aggregation, which in the case of a paper would be its authors, and the creator of the Resource Map, which would be the repository that holds the paper.

22http://arxiv.org/

Aggregated Resource   Aggregation        Resource Map
format                creator            date created
title                 title              date modified
language              rights (license)   creator
type

Table 3.3: Metadata for ORE Entities

The example of an Aggregation representing a grouping of different formats and versions of a paper has a very specific central concept. ORE can also be used to build aggregations of more general concepts, such as a set of bookmarks. In this case Aggregated Resources may need to be described in the context of the Aggregation. ORE describes these contexts as Proxies. These can capture metadata such as when an Aggregated Resource was added to the Aggregation, its position relative to other Aggregated Resources (e.g. chapters in a book), its title or any other piece of metadata in the context of the Aggregation.
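A sketch of these three entities and a Proxy in TURTLE (all URIs are hypothetical):

@prefix ore:     <http://www.openarchives.org/ore/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://www.example.org/rems/paper1> a ore:ResourceMap ;
    ore:describes   <http://www.example.org/aggs/paper1> ;
    dcterms:creator <http://www.example.org/repository> .   # the repository

<http://www.example.org/aggs/paper1> a ore:Aggregation ;
    ore:aggregates <http://www.example.org/papers/paper1.pdf> ,
                   <http://www.example.org/papers/paper1.ps> ;
    dcterms:creator "A. Author" .                            # the paper's author

# A Proxy describes paper1.pdf in the context of this Aggregation
<http://www.example.org/proxies/paper1-pdf> a ore:Proxy ;
    ore:proxyFor <http://www.example.org/papers/paper1.pdf> ;
    ore:proxyIn  <http://www.example.org/aggs/paper1> .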

3.1.6.7 Open Annotations Collaboration

The Open Annotations Collaboration (OAC) data model allows annotations to be assigned to resources (Sanderson and Van de Sompel, 2010). It draws significantly from the Annotea model (Kahan et al., 2002). A basic OAC Annotation uses two properties, hasBody and hasTarget. hasBody links to the content of the annotation; this may be defined through a non-resolvable URI, i.e. a URN, in the case of a tag or comment, or a resolvable URI, such as for a review. hasTarget links to the resource being annotated. These two relationships imply a third annotates relationship between the Body and Target objects. This relationship can be explicitly defined, or the following SWRL rule could be applied to generate the annotates relationships.

oac:hasBody(?a, ?b) ∧ oac:hasTarget(?a, ?t) ⇒ oac:annotates(?b, ?t) (3.2)

An OAC Annotation may have further properties taken from existing schemas and ontologies, such as Dublin Core's title, creator and created. Annotations themselves may be nested, such as for a reply to a comment: when an Annotation's hasTarget relation points to another Annotation, the class can be inferred to be a Reply. Another feature of OAC is the ability to have Annotations that apply the same Body object to multiple Targets.
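A sketch of a basic Annotation applying a review to a workflow (the URIs are hypothetical, and the namespace is that used by early OAC drafts):

@prefix oac: <http://www.openannotation.org/ns/> .

<http://www.example.org/annotations/1> a oac:Annotation ;
    oac:hasBody   <http://www.example.org/reviews/15> ;
    oac:hasTarget <http://www.example.org/workflows/16> .

# Applying rule (3.2) yields the implied triple:
#   <http://www.example.org/reviews/15> oac:annotates
#       <http://www.example.org/workflows/16> .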

Being able to constrain what an Annotation covers is important. A Constraint can be applied to both the Target and the Body of an Annotation. A Constraint can be a URN, so that a string description can be captured locally, or it can point to a resource. Another alternative is to define multiple predicates to describe the Constraint in greater detail. An example of a Target Constraint might be describing the co-ordinates that enclose the area of an image that is being annotated. For a Body Constraint this might be that only one section of a review is about the Target.

Time Constraints are a special form of Constraint and use the when predicate. They may be applied to an Annotation, e.g. to clarify that a comment is about a web page as seen on a particular day. They may also be attached to the Target and Body simultaneously. This is necessary where the Body makes a relative time reference, e.g. "Yesterday's main BBC news headline was interesting" will have a Target with a when property one day earlier than that of the Body.

One of the major considerations in the design of the OAC model is how it conforms to Linked Data principles, as by its very nature it will commonly be used to link together resources from different providers.

3.1.7 Linked Data

“Linked Data refers to the set of best practices for publishing and connecting struc- tured data on the Web” (Bizer et al., 2009). Linked Data sets out four ‘principles’ for publishing data that will facilitate a “single global data space”:

1. Anything being described must have its own URI.

2. These URIs should be resolvable over HTTP to retrieve information about that particular thing.

3. The information must be useful and provided in a standard format, e.g. RDF, SPARQL, etc.

4. Where appropriate this information should link to other URIs to support discovery.

An example of the implementation of these principles is demonstrated by the Linking Open Data project23, which has linked together datasets from many different projects in varying fields, with around 142 million links as of May 2009.

From an application perspective, it is common to want to publish a human-readable form, i.e. an HTML web page, as well as the data itself in a machine-readable format such as RDF/XML. However, both of these refer to the same thing, which has a single URI, referred to as the non-information resource URI (Bizer et al., 2007). Publishing Linked Data requires this URI to be resolvable to a standard machine-readable format, but the requester may want the human-readable form. There are a number of ways to resolve this issue. One is to embed the RDF as RDFa in the web page. However, the

preferred way for Linked Data is to dereference the URIs. There are two ways to do this: Hash URIs or 303 redirects. Hash URIs keep the data for multiple resources in the same file, so they are generally not suitable where there is a large number of resources and/or these resources are highly dynamic. 303 redirects work by performing content negotiation: the server looks at the HTTP request header for an Accept parameter to determine whether to redirect to an RDF/XML or HTML representation. This puts some extra load on the server processing the request, as it has to handle two requests rather than one (see Figure 3.6), but it provides a much more flexible mechanism than Hash URIs.

23http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

Figure 3.6: Linked Data Content Negotiation (Due to http://www4.wiwiss. fu-berlin.de/bizer/pub/LinkedDataTutorial/20070727/)

3.1.7.1 The DBPedia Project

One of the largest datasets in the Linking Open Data project is that of the DBPedia project24. The DBPedia dataset is generated by parsing Wikipedia's25 pages. Although these pages do not represent data using SW technologies, they capture a large amount of structured data through HTML markup and linking, wiki categorisation, templates and other conventions (Auer et al., 2007). By parsing this data it is possible to generate significant numbers of triples. Beyond extracting this data, DBPedia has tried to capture the structure of information presented by Wikipedia with its own ontology26. This ontology provides a class hierarchy, DBPedia's own properties and URIs for instances such as places, people, species, organizations, etc. As DBPedia is one of the most comprehensive and well-known resources for 'instance' URIs, it is linked to by many external datasets.

24http://dbpedia.org/ 25http://www.wikipedia.org 26http://wiki.dbpedia.org/Ontology Chapter 3 A Society with Semantics 55

3.1.7.2 Vocabulary of Interlinked Datasets (VoID)

There can often be many elements to the Linked Data published by a project, and there needs to be a means to discover all of them. The Vocabulary of Interlinked Datasets (VoID) ontology allows a project to define a file that contains all the salient information about the data it is publishing, i.e. its Dataset, and how it is being published (Zhao et al., 2008). This includes the vocabularies used, i.e. RDF schemas and OWL ontologies, data dumps of the RDF, querying interfaces such as SPARQL and URI lookup endpoints, as well as example resources, statistical items (e.g. number of triples) and URI regular expression patterns that describe the Dataset.

Beyond describing the Dataset, VoID also allows the definition of sets of links, i.e. RDF triples, between Datasets, called Linksets. These are necessary to support navigation in a Linked Data world. A Linkset can either be a subset of one of the Datasets being linked (classic Linked Open Data (LOD)) or its own Dataset, i.e. a third-party Linkset. Each Linkset must define the two Datasets it links. It may also specify the predicate that links the Datasets. Often these predicates are bi-directional, i.e. symmetric properties, such as OWL's sameAs, RDFS's seeAlso and FOAF's knows. However, if a predicate is uni-directional (e.g. FOAF's based near), the Linkset must define which Dataset is the subject and which is the object. Sometimes it is feasible to build these Linksets manually, such as when the Linkset is a subset of one of the Datasets, but commonly this process is time-consuming and/or difficult and therefore some automated assistance, such as co-reference resolution, is required.
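A sketch of a VoID description with a Linkset (the dataset URIs are hypothetical; the property names are those of the published VoID vocabulary):

@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

<http://www.example.org/void#myDataset> a void:Dataset ;
    void:sparqlEndpoint  <http://www.example.org/sparql> ;
    void:dataDump        <http://www.example.org/dumps/all.rdf.gz> ;
    void:exampleResource <http://www.example.org/workflows/16> .

# A Linkset of owl:sameAs triples; subjectsTarget/objectsTarget
# record the direction of the links between the two Datasets
<http://www.example.org/void#dbpediaLinks> a void:Linkset ;
    void:subjectsTarget <http://www.example.org/void#myDataset> ;
    void:objectsTarget  <http://www.example.org/void#dbpedia> ;
    void:linkPredicate  owl:sameAs .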

3.1.7.3 Co-reference Resolution

In this context, co-reference resolution is the process of determining that two URIs actually represent the same thing (Glaser et al., 2009). Tools exist for automating this process by comparing the triples for two different subject URIs; if a threshold number and percentage of matches is exceeded, the two URIs can be considered to reference the same thing. Further sophistication could be added to analyse text in the literals to find near matches, or inferencing could be performed on the Datasets to see if the additional triples help to meet the thresholds required.

Sindice27 (the Semantic Web Index) allows for searching on keywords, URIs and property-value pairs, as well as more advanced searches where triple patterns are matched using the wildcard character (*) to represent unknown subjects, predicates or objects. Once a Linkset has been defined, a tool such as sameAs28 allows a Linked Data URI to be looked up to see if it is the "same as" any other URIs. Sindice and sameAs both rely

on indexing all the Linked Data in a central store, making the ability of a triplestore application to store huge amounts of triples ever more important.

27http://sindice.com/ 28http://sameas.org/

3.2 Adding Semantics to myExperiment

3.2.1 The myExperiment Ontology

The purpose of an ontology is to specify the conceptualization of a model, in this case the myExperiment model as described in section 2.3.1. Although the data for this model is taken from a MySQL database, it would not be appropriate to use the structure of this database to generate the ontology, despite a number of tools existing for this purpose (e.g. RDBtoOnto (Cerbah, 2008), DB2OWL (Cullot et al., 2008)). There are several reasons for this. First, in Ruby-on-Rails' MVC architecture the underlying database is not the model; it is just a data structure that the model can be built on. The model itself is defined within the Ruby code of the model files, and this would be lost by just converting the database. Second, the main purpose of the model is to support the myExperiment website; certain database tables exist solely for this purpose and would not be useful to conceptualize within an ontology. Furthermore, the organisation of the model is intended to best support this website and make it as efficient as possible, which may not coincide with the most succinct, logical representation. Finally, because of myExperiment's sharing model, some of the data captured by the model should not be published. Although this does not dramatically affect the model's structure, it reinforces the need for careful consideration of the ontology's design to ensure only appropriate data is made publicly available, and it explains why a tool to convert the database schema into an ontology would be inappropriate.

3.2.1.1 Reusing Ontologies

As described in section 3.1.6, various schemas and ontologies already exist that are capable of representing an e-Research society, at least in part. The myExperiment ontology reuses classes and properties from Dublin Core, FOAF, SIOC, SKOS, Creative Commons, OAI-ORE and DBPedia. Table 3.4 indexes all the classes and properties reused by myExperiment and how and where they are used. Sections 3.2.1.2 - 3.2.1.4 explain in greater detail how these properties have been incorporated.

3.2.1.2 Simple Network Access Rights Management (SNARM) Ontology

myExperiment's sharing model is a key feature in making myExperiment unique. It is essential to provide an extensible specification for representing this so that each contribution in myExperiment can define a Policy for its access rights. Simple Network Access Rights Management (SNARM) is a simple ontology that allows additive policies to be defined (see Appendix A.2.1). Figure 3.7 provides an overview of this ontology. A SNARM Policy contains one or more Access entities to describe the access available to the contribution that uses it. An Access defines the type of access allowed; in the case of myExperiment there are three types: View, Download and Edit. Some Access entities are universal, whereas a subset (RestrictedAccess entities) are restricted to an Accesser. Critically, an Accesser may represent one of many different things: a single User, a Group or a virtual group. In the case of myExperiment a virtual group could be the contributing user's friends or all of the groups that he or she belongs to. What is important is that this virtual group can be resolved to determine the underlying users. In the case of friends and groups on myExperiment, SWRL rules (see section 3.1.4) could be defined to achieve this.

Reused Class/Property    How Reused            Reusing Module     Reusing Class/Property

Dublin Core
title                    owl:onProperty        multiple           multiple
description              owl:onProperty        multiple           multiple
created                  owl:onProperty        Base               Actor, Submission
modified                 owl:onProperty        multiple           multiple
identifier               owl:onProperty        Base, Components   License, Dataflow
Agent                    rdfs:subClassOf       Base               Actor
RightsStatement          rdfs:subClassOf       SNARM              Policy
BibliographicCitation    rdfs:subClassOf       Annotations        Citation
hasVersion               rdfs:subPropertyOf    Base               has-version
isVersionOf              owl:onProperty        Base               Version

Friend Of A Friend (FOAF)
name                     owl:onProperty        Base               User
mbox                     owl:onProperty        Base               User
mbox                     rdfs:subPropertyOf    Base               email
homepage                 owl:onProperty        Base               User
based near               owl:onProperty        Base               User

Semantically-Interlinked Online Communities (SIOC)
name                     owl:onProperty        Base               Actor
avatar                   owl:onProperty        Base               User
has owner                owl:onProperty        Base, Packs        multiple
has owner                rdfs:subPropertyOf    Base               has-annotator, has-announcer
has member               owl:onProperty        Base               Group
User                     owl:equivalentClass   Base               User
UserGroup                owl:equivalentClass   Base               Group
Item                     owl:equivalentClass   Base               Submission

Simple Knowledge Organisation System (SKOS)
Concept                  rdfs:subClassOf       Annotations        Tag
prefLabel                owl:onProperty        Annotations        Tag

Creative Commons
permits                  owl:onProperty        Base               License
prohibits                owl:onProperty        Base               License
requires                 owl:onProperty        Base               License
License                  rdfs:subClassOf       Base               License

Open Archives Initiative - Object Reuse and Exchange (OAI-ORE)
aggregates               owl:onProperty        Packs              Pack
isDescribedBy            owl:onProperty        Packs              Pack
proxyFor                 owl:onProperty        Packs              Entry
proxyIn                  owl:onProperty        Packs              Entry
Proxy                    rdfs:subClassOf       Packs              Entry

DBPedia
residence                owl:onProperty        Base               User

Table 3.4: Classes and Properties Reused by the myExperiment Ontology

Figure 3.7: Simple Network Access Rights Management Ontology Overview

The two ?virtualgroup unknowns in the rules below can then be used as these virtual groups in a policy:

has owner(?contribution, ?user) ∧ knows(?user, ?friend) ⇒ has member(?virtualgroup, ?friend)

has owner(?contribution, ?user) ∧ has member(?group, ?user) ∧ has member(?group, ?member) ⇒ has member(?virtualgroup, ?member)

SNARM is comparable to SIOC's Access module (see section 3.1.6.4). Both share the goal of defining who can perform certain actions on something. However, the means for organizing these permissions differs. SIOC assigns Permissions, e.g. view, edit, etc., to Roles, e.g. a forum moderator, that are assigned to one or more UserAccounts. As each Role has a scope, e.g. a forum or a site, only entities within that scope are subject to that permission. SNARM does not have a concept of scope; this is due to the need to be able to assign different permissions to each individual contribution.
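As an illustrative TURTLE sketch of how such an additive Policy might look (the property names, e.g. snarm:has-access, are assumptions for illustration rather than terms quoted from Appendix A.2.1):

@prefix snarm: <http://rdf.myexperiment.org/ontologies/snarm/> .
@prefix ex:    <http://www.example.org/> .

ex:policy1 a snarm:Policy ;
    snarm:has-access ex:publicView , ex:friendsEdit .   # assumed property name

ex:publicView a snarm:Access ;
    snarm:has-access-type ex:View .                     # universal: anyone may view

ex:friendsEdit a snarm:RestrictedAccess ;
    snarm:has-access-type ex:Edit ;
    snarm:has-accesser    ex:janesFriends .             # a virtual-group Accesser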

3.2.1.3 Base Module

Section 2.3.1 helped to determine the three main features of the Base module, namely content management, social networking and object annotation. There are various specifics for how these features are applied in myExperiment, but there is an underlying generic model that needs a logical representation. The Base ontology module (see Appendix A.2.2) provides this representation.

Figure 3.8 highlights how the User is the central focus of myExperiment. They are required for content management through ownership of contributions. They are the

building bricks of the social network, having friendships with other users, memberships of groups and communicating through messages and announcements. Users are also responsible for all types of object annotation. Reifying the annotation of objects by users as entities in themselves conforms with a tripartite ontology model encompassing users, objects and annotations, implemented as three bipartite graphs (i.e. users and objects, users and annotations, objects and annotations). This has been recognised as a means of supporting structured community-based annotations, such as folksonomies (Mika, 2005). This model also conforms in part with the basic Annotations of OAC (see section 3.1.6.7), which was defined after the myExperiment ontology was created. It should be possible for all myExperiment Annotations to be defined as OAC Annotations. Some thought will be required to fully achieve this alignment, in particular how to map string/integer literals for Comments, Ratings, etc. to URNs, and how existing composite Annotations, such as Reviews that have a title and body, may be restructured so that the Review resource is a separate OAC Body object from the Annotation applying it to a Contribution.

Figure 3.8: Overview of myExperiment's Base Module

Figure 3.8 also shows the importance of policies and licenses for contributions. The Policy for a Contribution is provided by the SNARM ontology (see section 3.2.1.2), whereas the License is provided by extending the Creative Commons License class to allow additional properties to be mandated. SIOC's has owner is such a property; it is used to describe who is responsible for maintaining the metadata for the license on myExperiment or whatever website the license is being provided for use on.

Figure 3.9: Class Hierarchy for myExperiment Ontology

As Figure 3.9 demonstrates, the Base module has a number of abstract classes. User and Group are subclasses of Actor. This is because some actions and roles in myExperiment are ambiguous; for example, the requester and accepter of a Membership can either be the User or the Group, depending on who initiated the Request. To align with the SIOC ontology, Contributions, Annotations and Requests are all subclasses of Submission, so this can be made equivalent to a SIOC Item, which defines a unit of content.

Various classes in myExperiment have specific behaviours, e.g. what annotations can be made about them, whether they can be versioned, etc. These behaviours may mandate an instance of a class being the subject or object of more than one property. There is no hierarchical model for assigning these behaviours/mandated properties, so a further abstract class is needed, called the Interface class. This Interface class can be extended to define the mandated properties of a behaviour, e.g. Annotatable: something that can be the subject of annotations; Versionable: something that may be versioned. Then, by using RDFS's subClassOf property, these behaviours can be assigned to individual classes in a consistent manner.
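For example, a hypothetical extension could mint a new contribution type and assign it behaviours simply by subclassing the relevant Interface classes (a sketch; the module namespace is assumed):

@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix mebase: <http://rdf.myexperiment.org/ontologies/base/> .
@prefix my:     <http://www.example.org/my-extension#> .

# A new contribution type that can be annotated and versioned
my:Presentation rdfs:subClassOf mebase:Contribution ,
                                mebase:Annotatable ,
                                mebase:Versionable .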

3.2.1.4 Extension Modules

The Base module has been designed to allow additional modules to be added to it to extend the ontology’s specification to more specific concepts. Figure 3.10 diagrams how these modules extend the Base module.

Figure 3.10: Ontology Modules Architecture

Section 2.3.1 observed the user's requirement for a means to credit other users for the creation of a contribution, or to attribute the design of a contribution to one uploaded to myExperiment at an earlier date. The Attributions and Creditation module (see Appendix A.2.3) provides interfaces that can be used to define which contributions can be attributed and those for which credit can be paid. It also provides Attribution and Creditation entities to encapsulate metadata about these assignments.

The Viewings and Downloads module defines usage statistics for contributions. This includes properties for contributions to specify their viewing and download counts, as well as classes that can be instantiated to specify each incidence. To capture a Viewing or Download instance for each incidence would often be excessive, especially as many of these are generated by web crawlers. Therefore Viewings and Downloads are defined that combine all instances of the same type with an accompanying count property, where "same type" means they share the same user, user agent (e.g. Firefox, Internet Explorer, etc.) and contribution.

The Annotations module defines specific annotations, as well as interfaces that can be applied to contributions that use them (see Table 3.5).

Annotation   Interface
Citation     Citationable
Comment      Commentable
Favourite    Favouritable
Rating       Rateable
Review       Reviewable
Tagging      Taggable

Table 3.5: Annotations and their Interfaces

The Contributions module defines the contributions used within myExperiment, namely Workflows, Files and Vocabularies, along with specific properties associated with these contributions. Each of these contributions is a subclass of Interface classes that specify the behaviours it exhibits (see Table 3.6). Workflows are Versionable and have WorkflowVersions that share similar properties with Workflows; having an AbstractWorkflow superclass to define the required properties saves specifying them twice.

The Packs module defines the Pack contribution and the PackEntries that it contains. A Pack is essentially the same as an OAI-ORE Aggregation, but rather than just making it a subclass, the module places restrictions on it to have OAI-ORE aggregates and isDescribedBy properties, which is what logically makes a class an Aggregation. PackEntries are a subclass of OAI-ORE Proxy, as they allow the entity that each entry represents to be described in the context of the Pack in which it resides. As Packs can contain both references to entities within myExperiment and remote URLs, either LocalPackEntry or RemotePackEntry should be used respectively, as they support the appropriate properties for each type.

The Experiments module defines the Experiment contribution, a specialisation of a Pack specifically for aggregating sets of Jobs. A Job is the execution of a Runnable entity, such as a Workflow, on a Runner. Inputs and outputs for a Job are defined as

Behaviour      Workflow   Workflow Version   File   Pack   Experiment
Attributable   X                             X      X
Citationable   X          X
Commentable    X          X                  X      X      X
Creditable     X                             X      X
Favouritable   X          X                  X      X      X
Rateable       X          X                  X
Reviewable     X          X
Taggable       X          X                  X      X      X
Versionable    X
Version                   X

Table 3.6: Behaviours of Contributions

Data entities that may contain a text string and/or a URI to a file that contains the input/output data. These are then collated with provenance data about the Job, i.e. its current status and its submission, start, completion and last status change times.

The Annotations, Contributions, Packs and Experiments modules are comparable to SIOC's Types module (see section 3.1.6.4). The classes in both are defined as subclasses of SIOC's Item class. The main difference is that, to accurately represent the myExperiment model, these types need to be described more specifically using a complex class hierarchy and OWL Restriction sub-classes, whereas SIOC's Types module defines each type very basically, i.e. whether it is a sub-class of Container, Item, Forum or Post, with zero or more RDFS seeAlso relations to similar types from other ontologies or schemas.

The Components module allows the inner workings of a Workflow to be defined. As myExperiment's original user group was Taverna users, this module is an abstraction of the Taverna workflow model (see section 2.1.2.3), introducing generalisations where appropriate so that only minor extensions should be required to support other workflow models. There are six main types of component encapsulated within a Dataflow that a Workflow can execute:

Source An initial input node for a workflow.

Sink A final output node for a workflow.

Processor A node that performs some processing.

Input A data entry point for a node.

Output A data exit point for a node.

Link Links an Output to an Input.

A Processor may have both Inputs and Outputs, but a Source can only have Outputs and a Sink only Inputs. There are currently five types of processor defined in the Components module (see Appendix A.2.9):

BeanshellProcessor Runs a local Beanshell script that is stored within the workflow.

ConstantProcessor Prints a constant value as an output.

DataflowProcessor Runs a nested workflow.

WSDLProcessor Calls a remote Web Services Description Language (WSDL) service.

OtherProcessor A catch-all for all other types of processor. The processor-type property allows a string to be defined for the specific type of processor.

Figure 3.11 shows how all these components can fit together to build a workflow. As can be seen by comparing it with Figure 2.1, it is not too dissimilar to the structure of a Taverna workflow.
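A sketch of a trivial Dataflow in TURTLE (the class names are those of the Components module; the linking properties are assumed names, as the exact terms are defined in Appendix A.2.9):

@prefix mecomp: <http://rdf.myexperiment.org/ontologies/components/> .
@prefix ex:     <http://www.example.org/workflows/16#> .

ex:dataflow a mecomp:Dataflow .

ex:source1 a mecomp:Source .          # initial input node
ex:proc1   a mecomp:WSDLProcessor .   # calls a remote WSDL service
ex:sink1   a mecomp:Sink .            # final output node

ex:out1 a mecomp:Output .             # data exit point of ex:source1
ex:in1  a mecomp:Input .              # data entry point of ex:proc1

ex:link1 a mecomp:Link ;
    mecomp:links-from ex:out1 ;       # assumed property names
    mecomp:links-to   ex:in1 .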

3.2.1.5 Specific Module

The Specific module pulls together the Base module and all the extension modules using OWL's imports property. It also contains classes and instances that are highly specific to myExperiment, such as the TavernaEnactor class and the particular Accesses, AccessTypes and Accessers for use in SNARM Policies shown in Table 3.7. The module also specifies an anonymous user instance. This allows any property that

                 Accesser
AccessType       Public            Friends
View             PublicView        FriendsView
Download         PublicDownload    FriendsDownload
Edit             PublicEdit        FriendsEdit

Table 3.7: Accessers, AccessTypes and their associated Accesses

links to a user, e.g. SIOC's has owner, to explicitly specify that the user is unknown, a necessity when using the Open World Assumption (OWA). In particular this is used

to represent usage statistics for users that are not logged in. The last function of the Specific module is to retroactively assign certain conditions applicable to myExperiment, namely Group being a subclass of Commentable and Taggable, as myExperiment allows groups to have both comments and tags, and AbstractWorkflow being a subclass of Runnable, because myExperiment allows Workflows to be run as a Job on a Runner.

Figure 3.11: Generic Workflow with Components

3.2.1.6 Promoting Reuse

Unless an ontology is being defined for a very specific task, it is sensible to design it in a reusable way so that others can either use or extend it, saving reinvention. Several rules were adopted whilst designing the myExperiment ontology to encourage reuse.

First, the ontology has attempted to abstract similar concepts. Figure 3.9 demonstrates how only ten classes are not the subclass of another, and four of these are from the SNARM ontology module. This facilitates assigning additional properties and constraints as the hierarchy is descended, moving from the generic to the specific. For someone reusing the ontology, these abstracted classes can be extended to create their own classes, such as new types of annotation or contribution.

The introduction of the Interface class, for which subclasses can be created to define particular behaviours, also encourages reuse. It provides a means for someone extending the ontology both to assign behaviours to their own classes and to create further interface classes defining their own behaviours.

The myExperiment ontology is an extensive specification, and someone reusing it may not be concerned with certain aspects. The modular design lets them reuse only the modules they need. Consideration was given to what is core to the myExperiment model and thus should be defined in the Base module; the rest of the specification was then broken up into its different aspects to create the extension modules. All of these require the Base module, but care was taken to limit their reliance upon each other. Figure 3.10 shows how there are only five dependencies between the seven extension modules.

Finally, making use of more generic ontologies and schemas, i.e. Dublin Core, FOAF, SIOC, Creative Commons, OAI-ORE and DBPedia, helps to set the myExperiment ontology in context. This allows someone reusing or extending the ontology to better understand its purpose, as they can more easily compare the classes and properties with those in other ontologies. To further improve on this, SWRL rules can be used to imply properties from more generic ontologies using specific patterns of triples (see Table 3.8).

Friendship(?friendship) ∧ accepted-at(?friendship, ?accepted_at) ∧ has-requester(?friendship, ?requester) ∧ has-accepter(?friendship, ?accepter) ⇒ foaf:knows(?requester, ?accepter) ∧ foaf:knows(?accepter, ?requester)

Membership(?membership) ∧ accepted-at(?membership, ?accepted_at) ∧ has-requester(?membership, ?requester) ∧ has-accepter(?membership, ?accepter) ∧ User(?requester) ⇒ sioc:has member(?accepter, ?requester)

Membership(?membership) ∧ accepted-at(?membership, ?accepted_at) ∧ has-requester(?membership, ?requester) ∧ has-accepter(?membership, ?accepter) ∧ User(?accepter) ⇒ sioc:has member(?requester, ?accepter)

Table 3.8: SWRL Rules for Friendships and Memberships

These rules generate FOAF knows and SIOC has member triples from Friendship and Membership instances. An additional benefit of maximising the number of reused properties is that the RDF generated for myExperiment can be understood and compared with other RDF without needing to refer to the ontology. This is particularly useful in the

Linked Data world for co-reference resolution (see section 3.1.7.3). When two subjects share a number of predicate-object pairs, a degree of certainty can be apportioned to them being equivalent. As the number of these pairs increases this certainty grows.

3.2.1.7 The Evolution and Evaluation of the myExperiment Ontology

The myExperiment ontology was first conceived in 2008, by which time the myExperiment data model had sufficiently stabilised, after the initial design of the myExperiment website, to allow users to make friends, join groups, upload workflows and files and then rate, review, comment on and tag them. At this stage myExperiment also had forums and blogs for users to discuss and write their own opinions about particular topics. These were included in the original ontology, but as it became clear that users did not find a particular use for these features, they were disabled in the myExperiment website and consequently removed from the ontology.

The original myExperiment ontology used both the structure of the MySQL database and the model of myExperiment as described in the Ruby-on-Rails MVC architecture to aid its design. For the reasons discussed at the start of section 3.2.1, it was not appropriate to use an automated tool to generate the ontology; instead this process was performed manually. The advantage of this approach was that very careful consideration could be given to how to abstract the data model and capture the generic features and entities that underlie myExperiment and, more generally, e-Research societies. The development of the ontology and of the RDF for the myExperiment entities it describes was an iterative process, using the insight gained from producing a script to generate the RDF to inform how the ontology could be further abstracted and made more concise and consistent.

After discussion with some of the members of the W3C HealthCare and Life Sciences Scientific Discourse subtask group and analysis of the model for the SWAN ontology29, it became clear that the myExperiment ontology would benefit from being modularised, and this was undertaken towards the end of 2008. As discussed in section 3.2.1.6, the extensible nature of a modularised ontology helps promote reuse, allowing other projects to choose to just reuse generic modules and produce their own specific modules. Another motivation for this modularisation was to aid alignment with other ontologies, as is discussed later in section 4.2.2.5.

Both prior to and after the modularisation of the ontology, myExperiment's data model was evolving with the addition of Packs, Announcements, Licenses, Attributions, Creditations and Workflow Components. As each of these was included in the data model, how it could be added to the ontology in a concise and consistent manner had to be

29http://swan.mindinformatics.org/ontology.html Chapter 3 A Society with Semantics 69 considered. Workflow Components is an example of an interaction with a user of myEx- periment’s RDF data to provide a representation of the components of a Workflow so that the user could make use of this in building a workflow recommender system. Other requests have been made by users, so they can make greater use of RDF data provided by myExperiment.

One example of such a request was when some users wanted to visualize myExperiment's RDF in a Semantic Web browser such as Zitgist30. These browsers allow users to navigate the RDF via the resources referenced by object properties. At the time, Annotations, Friendships and Memberships referenced the Contribution or User, but the Contribution or User did not have the inverse relationships. This meant that the amount of navigation possible from the User or Contribution RDF was limited, and the only way to find all the Annotations, Friendships or Memberships for Contributions or Users was through a SPARQL query. Introducing properties such as has-tagging, has-friendship and has-membership increased the ability to navigate around myExperiment's RDF in a Semantic Web browser.
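For example, where previously only the Tagging resource referenced the Workflow, an inverse property now lets a browser navigate from the Workflow to its Taggings (a sketch; the module prefix and tagging identifier are illustrative):

@prefix meannot: <http://rdf.myexperiment.org/ontologies/annotations/> .

# Navigable in both directions once the inverse property is asserted
<http://www.myexperiment.org/workflows/16>
    meannot:has-tagging <http://www.myexperiment.org/taggings/12345> .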

Since the myExperiment ontology was modularised, a change log31 of alterations to the ontology has been kept; Appendix B.3 has a listing of this change log. It contains references to some of the modifications made to support users, as just described, as well as other modifications such as the addition of DBPedia residence predicated triples, described in greater detail in section 3.2.3. The main purpose of the change log is to inform those performing SPARQL queries across the data of modifications that might affect the results they get. Although changes that are liable to significantly affect SPARQL queries are generally avoided, they are sometimes necessary if an inconsistency is identified or elements need to be modified because the underlying data model has changed.

Like myExperiment itself, the ontology is considered to be in perpetual beta, with an agile approach to its evolution allowing new features added to the myExperiment data model to be incorporated into the ontology. Feedback from users, whether it be to include additional features, to modify the ontology to aid alignment with a relevant ontology or just to report an inconsistency, is evaluated and changes to the ontology are made as appropriate.

3.2.1.8 Comparisons to Drupal RDF

In section 2.3.3, the myExperiment project was compared to Drupal. The Drupal model has also been mapped into RDF; figure 3.12 shows Drupal’s RDF model (Corlosquet, 2008). The Drupal model is intended to be more generic than myExperiment and this is reflected in its RDF schema. It is therefore more appropriate to compare it with myExperiment’s Base module rather than the whole ontology.

30http://dataviewer.zitgist.com
31http://rdf.myexperiment.org/ontologies/CHANGELOG

Figure 3.12: Drupal’s RDF Schema Model (due to http://groups.drupal.org/node/9311)

Like myExperiment, Drupal reuses classes and properties from ontologies and schemas such as Dublin Core, FOAF and SIOC. Drupal, however, chooses to always use these classes/properties directly, rather than specifying equivalence or sub-class/sub-property relationships in an ontology or schema.

There are several main differences in the model. First, Drupal separates a user’s account from its profile, as prescribed in the FOAF schema. myExperiment does not currently make this distinction but it may be necessary to change this in the future to better support Linked Data (see section 3.2.3). Second, Drupal uses SIOC’s Roles model for permissions; the differences compared with myExperiment’s Policy model have already been discussed in section 3.2.1.2.

Drupal’s model also uses SKOS (see section 3.1.6.2) to represent terms (keywords) that can be associated with an Item. The myExperiment ontology also uses SKOS, making a Tag a sub-class of Concept and using prefLabel as the human-readable title for a Tag.
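As a sketch of how this manifests in a query (assuming the Tag class is exposed under the same Base module namespace used in the listings below; the actual module may differ), the human-readable titles of tags can be retrieved directly through SKOS:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>

SELECT ?tag ?title
WHERE {
  # Every Tag is a skos:Concept, so its title is its skos:prefLabel.
  ?tag rdf:type mebase:Tag .
  ?tag skos:prefLabel ?title
}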

Finally, Drupal’s model supports encoding the content of an Item within the RDF. myExperiment does not do this for a number of reasons. First, the size of the encoded data could be of the order of tens if not hundreds of megabytes, which is not appropriate to encode in RDF. Second, the permission to view the metadata for a contribution is separate from that for downloading it. It was therefore deemed easier to have a property that links to the URL where the contribution can be downloaded, so access can be managed separately.

The comparison between the myExperiment ontology and Drupal’s RDF model reinforces the conclusions of section 2.3.3: Drupal is designed to be very generic, so it is able to support many different types of community-driven website, whereas myExperiment has some very specific use cases, which is exemplified by the need for various extension modules to support them.

3.2.2 myExperiment’s RDF API and SPARQL Endpoint

RDF data for myExperiment entities is generated using a specially designed script to map from the MySQL database model to that specified by the ontology. This bespoke script was required because, as in the design of the ontology, certain complex mappings were needed that are not supported by standard relational-database-to-RDF mapping tools. The RDF produced is accessible from the main myExperiment site through content negotiation, using the same URL, namely the Non-Information Resource (NIR) URI, as for the equivalent HTML page. Further to this, a REST API XML version can also be requested through content negotiation. This is achieved by specifying the MIME type of the format wanted in the accept line of the HTTP request; this generates a 303 response containing the URL for the particular format requested, which can then be followed to obtain that format. Alternatively, the URL for the format wanted can be requested directly. Table 3.9 shows the MIME types and URLs returned when different formats for http://www.myexperiment.org/workflows/16 are requested.

MIME Type            Format URL
text/html            http://www.myexperiment.org/workflows/16.html
application/rdf+xml  http://www.myexperiment.org/workflows/16.rdf
text/xml             http://www.myexperiment.org/workflows/16.xml

Table 3.9: MIME Types and URLs for 303 Redirects

As much of this data has restricted access, the API checks that the entity being requested is not restricted; if it is, it returns a “401 Unauthorized” response. In a web browser this prompts for the entry of a username and password, assuming there is not a user logged into myExperiment who is authorised to access the entity. Wget and other HTTP clients require the username and password to be specified in the command, as illustrated in listing 3.2.

wget --header "Accept: application/rdf+xml" --http-user=yourusername \
     --http-password=yourpassword http://www.myexperiment.org/experiments/33

Listing 3.2: Requesting myExperiment RDF with User Authorisation

Assuming a username-password pair for an authorised user is provided, the RDF for the entity will be returned; otherwise another 401 response is sent.

All unrestricted RDF is imported into a triplestore once a day. Jena was originally chosen as the application for providing this triplestore because of its flexibility for designing scripts for importing, querying, inferencing and performing various analyses of the data. These analyses include counting the number of triples and enumerating the graphs in the triplestore. However, when 4Store became Open Source it made sense to migrate to it, because it is quicker at both importing and querying data. Most importantly, the degradation in performance is much less noticeable as queries become more complex, something users had noted as a significant problem whilst Jena was in use. 4Store has built-in scripts comparable with those written to support the Jena triplestore, and scripts that were not built-in were fairly easy to adapt to work with 4Store. Like Jena, 4Store has an active developer and user community32, which gives some assurance that any bugs in the application will be reported and patched and that it will remain state of the art.
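The analyses mentioned above translate directly into SPARQL; the sketch below assumes the triplestore supports the COUNT aggregate (a 4Store extension, later standardised in SPARQL 1.1):

# Count all the triples in the triplestore.
SELECT (COUNT(*) AS ?triples)
WHERE { ?s ?p ?o }

# Enumerate the named graphs in the triplestore.
SELECT DISTINCT ?g
WHERE { GRAPH ?g { ?s ?p ?o } }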

With myExperiment’s RDF imported to a triplestore it can now be queried. As discussed in section 3.1.5.1, SPARQL is generally accepted as the standard query language for RDF and myExperiment has its own SPARQL endpoint33. 4Store has a built-in HTTP server that provides a basic web interface with a SPARQL endpoint. However, a SPARQL endpoint interface already existed that was designed when the endpoint interfaced with Jena, so to maintain consistency this interface was adapted to work with 4Store.

Providing a SPARQL endpoint to query RDF data is useful, but it relies on users being able to form queries. For projects where the main users are not computer scientists, significant effort is required to make querying accessible. Although myExperiment’s main users are not computer scientists, they tend to have experience of building and using workflows, which gives them some background in machine-readable entities for automating tasks. The myExperiment SPARQL endpoint attempts to support these users in building and submitting queries in a number of ways:

1. Providing both a web form as well as a standard automated means (by URL encoding the query in the HTTP GET header) for submitting queries.

2. Supplying data about the status of the triplestore. Namely, the time of the database snapshot that makes up the triplestore’s data and the number of triples.

3. Assistance in building queries by providing useful prefixes. These can be added to the query by clicking on them. Alternatively, the endpoint will add a prefix automatically if its name appears somewhere in the query.

32http://4store.org/support
33http://rdf.myexperiment.org/sparql

4. Providing a user guide34 that gives advice on how to form SPARQL queries with real-world interactive examples to query the myExperiment triplestore.

5. Formatting the error and warning messages generated by 4Store, so they are clearly visible to the user and providing troubleshooting advice on how to resolve these errors / warnings.

6. Returning results in useful formats: the standard SPARQL results format, which can be used by applications; an HTML table format that is human-readable; a Comma-Separated Values (CSV) format to allow data to be ported to other applications such as Microsoft Excel; and a JSON format so the data can be used in JavaScript-based applications.

7. Allowing users to take their finished query and generate a service. Essentially, this is just stripping the appropriate parameters from the HTML POST form and adding them to the URL as part of the HTTP GET header. However, the concept of generating a service allows the user, who might typically be someone who puts services together to build a workflow, to be confident they have the exact service incantation (URL) they need.

There have been a number of specific use cases that have benefited, or could benefit, from using SPARQL queries. One is querying the components of a workflow. These queries need to be able to analyse which processors are used by a workflow uploaded to myExperiment and how these are situated within a dataflow. They could then be used to provide a recommender service for processors / web services, to help users build new workflows. Such queries require looking at how processors link together, a task to which SPARQL is ideally suited because its syntax makes it easy to define this interlinking. Even if a REST API call could perform the same task, it would be a continual task for a developer to add nuances to the call, whereas a SPARQL query can easily be adapted to incorporate any nuances required by the user; a sketch of such a query follows listing 3.3 below.

myExperiment, as well as being a Computer Science / Bioinformatics project, has also had Social Scientists involved, so being able to find out statistics about the user base is a key requirement. The tools that social scientists use often require CSV-formatted data, which is one of the reasons the SPARQL endpoint provides this format. One particular example is analysing the social network on myExperiment. The SPARQL endpoint can supply a CSV matrix of all the friendships using the SPARQL query in listing 3.3.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>

SELECT DISTINCT ?friend1 ?friend2
WHERE {
  ?friendship rdf:type mebase:Friendship .
  { ?friendship mebase:has-requester ?friend1 .
    ?friendship mebase:has-accepter ?friend2 }
  UNION
  { ?friendship mebase:has-requester ?friend2 .
    ?friendship mebase:has-accepter ?friend1 } .
  ?friendship mebase:accepted-at ?accept_time
}

Listing 3.3: SPARQL Query for generating CSV Matrix of Friends

34http://rdf.myexperiment.org/howtosparql
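Returning to the workflow components use case, a query of roughly the following shape could list which processors appear in which workflows. This is only a sketch: the mecomp namespace and its property names are assumptions for illustration, as the actual terms of the Components module are described in section 3.2.1.4.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Hypothetical namespace and property names for the Components module.
PREFIX mecomp: <http://rdf.myexperiment.org/ontologies/components/>

SELECT DISTINCT ?workflow ?processor
WHERE {
  ?workflow mecomp:has-processor ?processor .
  ?processor rdf:type mecomp:Processor
  # How processors link together could then be followed through the
  # dataflow's links, e.g. (again hypothetically):
  # ?link mecomp:has-source ?processor . ?link mecomp:has-sink ?next .
}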

Another user statistic that has been requested is the number of users from Europe. This is difficult because the field that describes which country the user comes from is defined by the user through a free-form, albeit guided, text field. It is consequently represented as a literal value with a string datatype. Because the text field is guided, the specified value is generally fairly consistent, so it is conceivable that URIs could be created for each country that a user comes from; but this does not solve the problem of determining how many users come from Europe. To achieve this, the URIs for countries need a property stating which continent they are in. The best way to do this is to find a provider that already captures this information as RDF. Performing such a task is the main purpose of Linked Data.

3.2.3 myExperiment Linked Data

As described in section 3.2.2, RDF, HTML and REST API XML representations of a myExperiment entity can be returned by requesting the NIR with a specific accept type in the request, i.e. content negotiation. A means of obtaining both human- and machine-readable versions of an entity is essential for supporting Linked Data (see section 3.1.7). Implementing this aspect of Linked Data support in myExperiment was more difficult than first envisioned. myExperiment has been developed in response to user requirements and was not originally intended to be a Linked Data project. This has led to some divergence between the HTML, REST API XML and RDF representations, an issue that other Linked Data projects have not had to deal with. This divergence has made meeting some requirements to fully support Linked Data more challenging.

The first issue encountered due to this divergence was the choice of a URI scheme for the Non-Information Resource (NIR). myExperiment had three very different URI schemes for the three different formats, as listed in table 3.10. Creating yet another URI scheme for the NIR would not have been appropriate, as it would quite possibly have confused users. Although meeting all the requirements to support Linked Data was important, any detrimental impact on existing users had to be limited. For this reason it was decided that HTML should be considered the primary format, as it is the only one accessed by the majority of users, and its URI scheme was therefore chosen as the one to be used for the NIR.

HTML:     http://www.myexperiment.org/<type>/<id>
          (e.g. http://www.myexperiment.org/workflows/16)
REST API: http://www.myexperiment.org/<type>.xml?id=<id>
          (e.g. http://www.myexperiment.org/workflow.xml?id=16)
RDF API:  http://rdf.myexperiment.org/<type>/<id>
          (e.g. http://rdf.myexperiment.org/Workflow/16)

Table 3.10: myExperiment’s Current URI Schemes

The next decision was what method to use for delivering these different formats. Section 3.1.7 discusses two means for doing this, Hash URIs and 303 redirects, and the differences between them. Based on these differences and due to both the scale and dynamism of myExperiment’s data, 303 redirects was chosen as the most appropriate method. This choice may already have been apparent from the description of content negotiation in section 3.2.2.

The most common way of retrieving documents over HTTP is with a web browser, which generally has no means of specifying the format required. Therefore URI schemes similar to that used for the NIRs need to be defined for the various formats. Generalising, Linked Data projects use three different templates to provide URI schemes for different formats:

1. http://<format>.example.org/<id>

2. http://example.org/<format>/<id>

3. http://example.org/<id>.<format>

In myExperiment’s case it was decided that option 3 would be the most suitable, as it would create the least change from a user perspective and provide the most intuitive means of requesting different formats, using the file-extension paradigm that is generally well understood.

myExperiment uses Apache HTTP Server35 to deliver its data in all three forms, although for HTML and REST this goes via a Ruby-on-Rails Mongrel web server36. Apache can be extended with various modules to provide extra functionality; mod_rewrite37 is such a module, allowing rules to be defined that determine when a request should be redirected or rewritten before attempting to return a response. Much of the alignment of the new URI schemes with the existing URI infrastructure for myExperiment was achieved through a set of mod_rewrite rules (see table 3.11). However, a number of amendments needed to be made to the website and RDF API codebase to fully support Linked Data:

35http://httpd.apache.org/
36http://github.com/fauna/mongrel
37http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html

• The website’s codebase was modified to respond to the .html URI scheme. However, links on the website still use the original URI scheme that now represents the NIR.

• Web pages now display the NIR allowing it to be bookmarked.

• Web pages now define link alternates to the RDF and REST API XML formats.

• URIs referred to in the RDF API now use the new NIR scheme.

• The RDF API now has a separate object to refer to the RDF graph representing the entity. This uses FOAF’s primaryTopic property to refer to the entity the graph describes.

• Entities in the RDF API use FOAF’s homepage property to refer to the HTML format of the entity and two Dublin Core hasFormat properties to refer to the RDF and REST API XML formats of the entity.

Fortunately, the REST API specifies both the URI of the call and the resource that it returns data for (see listing A.1 in Appendix A.1.1), so it did not require any modifications.

To best support users adapting to the new Linked Data URI schemes, certain aspects of the mod_rewrite rules in table 3.11 differ from what might be expected for Linked Data. To understand the reasons for this, it is necessary to see things from a user’s perspective.

Many users will only ever have interacted with one format: the HTML of a web page through a web browser. This means they are likely to be more familiar with the ‘.html’ format than with the NIR, as it is what they see in the address bar of their browser. Even by making the NIR visible in the page and providing the facility to bookmark it, it is still likely that if they want to share the resource being described, e.g. a workflow, this will be done via the .html URI copied from the address bar of the web browser.

Using the ‘.html’ URI rather than the NIR does not create a problem until the person receiving this URI wants to get the data for this resource for a third-party application (via the REST API) or for a SW application (via the RDF API). When requesting this URI they will specify that they want XML or RDF/XML returned in the response. Typically, a ‘.html’ URI would ignore what format the requester is willing to accept, as the format is explicitly defined by the extension that is part of the URI. In this scenario, however, the recipient of the URI would not be able to make use of the resource unless they discovered they had been sent the HTML URL rather than the NIR, at which point they could correct it manually. The mod_rewrite rules described in table 3.11 allow the URI recipient to get hold of the format of the resource they require without needing to make a manual correction. This is achieved by redirecting with a “301 Moved Permanently” response to the NIR, keeping the same accept line in the request; this then causes a 303 content negotiation redirect to the correct format.

Table 3.11: Mod Rewrite Rules for myExperiment Linked Data (tabulating, for each Request URI and Accept type, the redirect code, 301 or 303, and the Redirect URI returned)

It is possible for there to be other mismatches between the accept line in the request and the format a particular URI scheme specifies, e.g. requesting http://www.myexperiment.org/workflows/16.rdf with text/html in the accept line of the header. In this case it would be inappropriate to redirect back to the ‘.html’ URI, as it would prevent a user from being able to access the RDF via a web browser. Besides, if the user has retrieved the RDF they can either copy the NIR it specifies into a web browser and allow content negotiation to return the web page, or use the URI specified by the FOAF homepage property to access it directly.

myExperiment’s mod_rewrite rules treat the adoption of the new URI schemes for RDF and the REST API somewhat differently. To encourage users to adopt the new RDF URI scheme, all requests using the old URI scheme are redirected with a “301 Moved Permanently” to the new scheme. This would not be appropriate for the REST API, because it allows various parameters to be specified about a particular entity, e.g. a workflow, that cannot be aligned with the Linked Data URI scheme. It is therefore appropriate to incorporate only a reduced form of the REST API as part of Linked Data, i.e. basic requests for single resources, e.g. http://www.myexperiment.org/workflow.xml?id=16, allowing users to retrieve resources in the REST API XML format using a Linked Data approach. However, the REST API is often used intensively by third-party applications, so it would be inappropriate to redirect API requests to the new URI scheme: this would put significant extra load on the myExperiment REST API server as well as slow down third-party applications. Redirecting from the new scheme to the old one allows a primary URI for REST API requests to be maintained and avoids any impression that the two schemes might return different data, whilst still supporting a consistent means of requesting the same resource in different formats.

A further issue that the REST API raises is support within Linked Data for anything besides HTTP GET requests, i.e. POST, PUT and DELETE. There does not yet appear to be consensus about how Linked Data might support write as well as read access, which is why myExperiment has avoided integrating the REST API too tightly and restricting the options available when consensus does develop.

One facet of Linked Data that has not yet been considered is versioning. The properties of a resource may change over time but the resource to which they refer is still the same thing; e.g. the M5 motorway has always had that designation, but over time its route and junctions can and have changed. When any modifications are made these changes need to be captured, but the data about the previous layout of the M5 is still useful, so each version should be snapshotted. This creates the problem of what to deliver when the data for the M5 is requested. There are two suggested approaches for addressing this problem (Tennison, 2009):

1. Having ‘current’ and ‘dated’ resource URIs, with the ‘current’ one redirecting to an appropriate ‘dated’ one.

2. Having ‘dated’ document URIs that become named graphs.

The first approach will return entities whose URI does not correspond with the URI requested, e.g. requesting http://transport.data.gov.uk/id/road/M5/ would return data for the current version http://transport.data.gov.uk/id/road/M5/2009-09-01. The second would return the entity requested, including an RDF Schema isDefinedBy property pointing to the current version, which may then contain references to older versions.

Currently, only workflows in myExperiment are versioned. myExperiment’s concept of versioning is different from the example previously given: a later version of a workflow does not necessarily supersede the previous version, although the distinction between the current version and older versions is important. Further to this, the metadata for the workflow is not simply an exact replica of that of the latest version, so an adapted version of the second approach is the most appropriate. The workflow itself refers to the current and other versions of itself, to which the user can then decide to make further requests. Each version references which number version it is and of which workflow it is a version.
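A sketch of what this looks like to a query author follows; has-version and version-number are hypothetical stand-ins for the actual terms of the ontology:

PREFIX mev: <http://rdf.myexperiment.org/ontologies/versions/> # hypothetical

SELECT ?version ?number
WHERE {
  # List every version of workflow 16 and its version number.
  <http://www.myexperiment.org/workflows/16> mev:has-version ?version .
  ?version mev:version-number ?number
}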

Proposing a standard way of versioning in the Linked Data world may not be appropriate, because the relative importance of versions differs from project to project and defining how this relationship should be treated may prove controversial. The myExperiment approach to versioning demonstrates a greater equality between current and previous versions, whereas other projects may consider older versions less important or even irrelevant.

So far only the means for delivering Linked Data for myExperiment has been described, not how the data is linked to other projects. There are various facets of myExperiment that could be linked. Currently myExperiment’s RDF links to three other Linked Data projects:

• ECS EPrints38.

• The Digital Bibliography and Library Project39 (DBLP).

• DBPedia.

For the first two projects these links come from the remote URLs that users can define as items within a Pack; they are therefore represented as OAI-ORE aggregates properties of that Pack and as OAI-ORE proxyFor properties of the appropriate Remote Pack Entries. In the case of DBPedia the links are drawn from a User’s FOAF based_near and myExperiment Base country properties. Using a simple text parsing algorithm,

these values are used to produce a DBPedia URI for those places and countries. This can never be 100% successful at producing URIs that exist within the DBPedia dataset, because the values used are specified by the user without any significant restriction. However, because these values are entered by the user through a guided interface and the text parsing algorithm can clean up the data to make it more consistent, the success rate of mapping to DBPedia URIs is over 80%. These DBPedia URIs are then referenced by the User entity through DBPedia’s residence property.

38http://eprints.ecs.soton.ac.uk
39http://dblp.uni-trier.de/db/index.html
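Once these residence triples are in place, per-country user counts become a simple aggregate query (a sketch, assuming SPARQL 1.1 aggregate support in the triplestore):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?country (COUNT(?user) AS ?users)
WHERE {
  ?user rdf:type mebase:User .
  ?user dbpedia-owl:residence ?country
}
GROUP BY ?country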

One of the main reasons for linking with DBPedia using its residence property is to take advantage of the information it provides about what continent a country is in. The end of section 3.2.2 described a user requirement of the SPARQL endpoint: to find out how many users of myExperiment were based in Europe. Although this linking alone still does not make it possible to answer this query through the myExperiment SPARQL endpoint, by pooling both datasets into a single triplestore a query like listing 3.4 becomes possible.

In the future it should be possible to link to more datasets. The ability to define remote URLs as items in a Pack means users can help with linking up myExperiment where appropriate. Beyond these, there are other aspects of the myExperiment model that could also be linked. Users of myExperiment may have other URIs representing them, such as their own FOAF profile URI or the URIs created for them by other Linked Data projects such as EPrints, Flickr40 or CiteSeer41. As discussed in section 3.2.1.8, a separation between the myExperiment user and their account may be needed so that the predicates linking myExperiment to these other projects are appropriate. This could be achieved by creating a FOAF account property to link the person using myExperiment to their account. Then any other Linked Data projects that refer to the person, e.g. as an author of an EPrint, could be linked using OWL’s sameAs property, and any accounts, e.g. a Flickr account, could be linked with additional FOAF account triples in the RDF for a user in myExperiment.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?user
WHERE {
  ?user rdf:type mebase:User .
  ?user dbpedia-owl:residence ?country .
  ?country rdf:type dbpedia-owl:Country .
  ?country skos:subject <http://dbpedia.org/resource/Category:European_countries> .
  ?country rdfs:label ?name .
  FILTER ( lang(?name) = "en" )
}

Listing 3.4: SPARQL Query using DBpedia’s SPARQL Endpoint for European Countries

40http://www.flickr.com/
41http://citeseerx.ist.psu.edu/

BioCatalogue, as mentioned in section 2.3, is another Ruby-on-Rails project based loosely on the myExperiment codebase. BioCatalogue is a Web 2.0 site for registering, browsing and annotating web services used by the Life Sciences community (Bhagat et al., 2010). Many of these web services are used in the workflows uploaded to myExperiment. Web services are described by the Processor class in the Components module of the myExperiment ontology, as described in section 3.2.1.4. BioCatalogue is in the process of building its own ontology and RDF representation for web services. This will make it possible to use a Linked Data approach to associate myExperiment Processors with BioCatalogue web services using OWL sameAs relationships. Determining where these links can be made is fairly straightforward, because both entities will refer to the same URL, where the web service is available. Once defined, these links will benefit both myExperiment and BioCatalogue. For myExperiment, being able to get the annotations for web services would help to provide more detailed information about the make-up of a workflow. Currently, detail about web services in a workflow is limited to what can be extracted from its XML markup. Adding BioCatalogue annotations would provide a human aspect, helping myExperiment users to better understand the workflows to which they have access. For BioCatalogue it would be possible to list all the myExperiment workflows in which a web service is used. Being able to see where a web service is used in a real-world example may allow BioCatalogue users to more fully appreciate the uses of a web service. It may even allow them to discover a workflow they can reuse, either completely or in part, rather than writing their own from scratch.
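When BioCatalogue's RDF becomes available, a query over the pooled datasets of roughly this shape could pull BioCatalogue annotations into a view of a myExperiment workflow. Apart from owl:sameAs and the Processor class, everything here (the mecomp namespace and the use of dcterms:description for annotations) is a hypothetical illustration:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
# Hypothetical namespace for the Components module.
PREFIX mecomp: <http://rdf.myexperiment.org/ontologies/components/>

SELECT ?processor ?description
WHERE {
  ?processor rdf:type mecomp:Processor .
  # The link asserted between the two projects.
  ?processor owl:sameAs ?service .
  # An annotation held by BioCatalogue for the equivalent service.
  ?service dcterms:description ?description
}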

As explained in section 3.1.7.2, VoID provides a vocabulary for describing both the RDF datasets generated by a project and the links that these datasets make to other projects’ datasets, known as Linksets. For myExperiment these Linksets are continually changing, because they are created through user-generated content. Therefore, to produce a VoID document that remains accurate, an automated means of updating this document is required. myExperiment achieves this by performing appropriate queries across the public dataset each day after it has been updated. The results of these queries both help calculate the number of links for the VoID document and produce files listing these links in NTriple format. To make this extensible as myExperiment links with more projects, a configuration file was required that specifies the name of the linking project, the path to the objects in the linking project and the predicate that links to these objects. Table 3.12 lists the parameters required to generate the current Linksets as they are defined in the configuration file. Taking the DBPedia Linkset as an example, this translates into the SPARQL query in listing 3.5.

Link Project   Object Path                           Link Project VoID Document
DBPedia        http://dbpedia.org/resource/          http://dbpedia.org/void.ttl#Geonames
DBLP           http://dblp.l3s.de/d2r/resource/      http://dblp.rkbexplorer.com/id/void
ECS-EPrints    http://eprints.ecs.soton.ac.uk/id/    http://eprints.rkbexplorer.com/id/void

Table 3.12: Parameters for Linkset Generation

SELECT * WHERE {
  ?s ?p ?o .
  FILTER( isURI(?o) && REGEX(STR(?o), '^http://dbpedia.org/resource/.+', 'i') )
}

Listing 3.5: SPARQL Query for determining myExperiment-DBPedia Linkset
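For the VoID statistics themselves, a counting variant of listing 3.5 can be used (a sketch, again assuming COUNT support in the triplestore):

# Count the myExperiment-to-DBPedia links for the VoID Linkset.
SELECT (COUNT(?o) AS ?links)
WHERE {
  ?s ?p ?o .
  FILTER( isURI(?o) && REGEX(STR(?o), '^http://dbpedia.org/resource/.+', 'i') )
}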

3.2.4 A Semantic myExperiment

A number of lessons have been learnt through the process of applying SW tools and techniques to an e-Research society, namely myExperiment. The first of these was how to transform the existing Ruby-on-Rails data model, built on a MySQL database, into an ontology. Despite there being a number of automated options, manually constructing the ontology allowed greater flexibility and provided the option to incorporate a number of features to improve the usability of the ontology and the data it represents:

• Reuse of classes and properties from well-known ontologies and schemas.

• Abstraction of the data model to define the three main underlying aspects of myExperiment: social networking, content management and object annotation.

• Modularisation of the ontology to keep separate aspects of functionality apart, allowing partial as well as full reuse.

• Capability to evolve the ontology based on new user requirements.

The next task was to deliver RDF data that had been mapped from the existing data model to the ontology’s specification. This was achieved via a specially designed script. It was originally kept separate and available via a separate server, but was later integrated with the main myExperiment website to meet one of the requirements of supporting Linked Data. Keeping them separate initially allowed greater flexibility to evolve the ontology without having to conform too closely to the Ruby-on-Rails data model. After the ontology had stabilized, it was appropriate to integrate with the website to make it easier for users to access the machine-readable format.

Like the design of the myExperiment website itself, much consideration had to be given to the design of the SPARQL endpoint. It was important not just to make it user-friendly, but to make it accessible to users who do not come from a Computer Science background. Therefore a combination of information about the SPARQL endpoint and how to use it, a considered choice of triplestore, careful design of the user interface, and multiple request methods and results formats have all been implemented so users can make the best and most efficient use of the SPARQL endpoint.

Retroactively making myExperiment a Linked Data project from one that had three separate APIs (HTML, RDF and REST XML) was one of the greatest challenges. Many things needed to be considered to determine how changes could be made to existing APIs to meet Linked Data principles without impacting dramatically on how existing users make use of them. A further challenge was redesigning existing interfaces so that inexperienced users could make use of Linked Data with only minimal understanding, e.g. providing links to the different formats from the web page and redirecting to ‘.rdf’ URIs from ‘.html’ URIs when the request accept type is set to RDF.

Although myExperiment is not on the most recent Linked Data Cloud42, it now meets all the requirements to appear. The final part of fulfilling these requirements was to generate a VoID document to describe the myExperiment RDF dataset and its Linksets to other projects. Ensuring this is consistent with the downloadable dataset and the data that can be queried through the SPARQL endpoint, which is updated each day, was an additional feature required to manage a dynamic dataset. Providing a consistent and regularly updated VoID specification gives the best support for other projects to link with myExperiment.

Many SW technologies were used to build, deliver and support a machine-readable interface to myExperiment. As well as users employing this interface, it is possible to build upon it to provide additional tools to the emergent e-Research society of myExperiment.

42http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.png (September 2010)

Chapter 4

Applications of a Semantic Platform for an e-Research Society

Building an ontology for myExperiment has provided a semantic representation for an e-Research society. There are numerous applications that could make use of various aspects provided by this representation. This chapter will discuss three of them:

1. Research Objects.

2. Experimental Data in Scientific Discourse.

3. Question-Answering Systems for e-Research societies.

Research Objects (ROs) are a means of publishing a machine-readable representation of a person’s research output, as opposed to a human-readable one in the form of a conference paper or journal article (Bechhofer et al., 2010b). myExperiment already supports a crude representation of someone’s research output in the form of a pack. The ontology therefore begins to provide some insight into how ROs might be defined, and if this link is maintained it should leave myExperiment well-placed to support ROs when they come to fruition.

Another significant aspect of the myExperiment ontology has been to represent experiments that collate the enactments of workflows, along with their inputs and outputs. Scientific Discourse (also known as Scholarly Discourse) is the argumentation of competing claims and hypotheses to produce theories. In producing formal specifications for scientific discourse there needs to be a means of representing experimental evidence that supports a claim, of which myExperiment’s Experiment class is a generic example.


Section 3.2.2 identified that there are numerous different queries that people want to make of myExperiment’s dataset. These people come from different backgrounds, and expecting them all to be able to make use of a SPARQL endpoint without significant effort on their part is unrealistic. Therefore, a further application of a semantic platform for an e-Research society is to provide a query interface that is as flexible as a SPARQL endpoint but more intuitive to users, requiring somewhat less training to use effectively. A Question-Answering (QA) System that makes use of the semantic representation provided by the myExperiment ontology could provide such an interface.

4.1 Research Objects (ROs)

Human-readable representations of the research process, such as journal articles and conference papers, have long been an accepted means of publishing research. However, since the advent of e-Science / e-Research, the research process has evolved, making it increasingly difficult to capture all of it in a human-readable form. This is due to the increasing amount of digital content that makes up the research process; the Climategate incident of November 20091 exemplifies how human-readable statements can be misinterpreted. Digital content is often highly complex and nearly impossible to describe in human-readable form to an extent where it can support the reproduction of the research it represents.

Between the University of Manchester and the University of Southampton there exist a number of VRE and similar projects that are described under the umbrella of e-Laboratories. All these projects produce or consume some form of potentially highly complex research process. It is often necessary to exchange data between two or more of these projects, and it has therefore become apparent that, to make this exchange efficient and consistent, there needs to be a standard protocol and representation.

The term Research Object (RO) has been coined to give this standard protocol and representation a name. A RO will provide a machine-readable representation, allowing the whole research process to be captured, including data, methods, processes and human-readable elements, supporting reproducible research in this ever more digital age (Bechhofer et al., 2010b). Having such a representation means that any ambiguity in the human-readable aspect can immediately be resolved by analysing or even replicating the accompanying research process, avoiding any possible misinterpretation. By encapsulating all parts of the research process, there is a guarantee that important information will not be lost in any exchange between two systems.

The basic concept of an RO, as a means of flexibly grouping associated research entities together and then describing links between them, is not new. A number of more recent models are described in section 4.2 in relation to how they can support scientific discourse. However, the Boundary Objects concept from the field of sociology is one of the first published examples (Star and Griesemer, 1989). This concept focusses on the premise that research is often heterogeneous, with researchers who have different viewpoints and understanding. Therefore, to make it possible for all of these researchers to work together, they need an object they can share and all understand. Boundary Objects propose two main ways of achieving this, namely standardisation of methods and support for the translation between viewpoints. They also identify two important features of any model for capturing research and the research process, namely adaptability and robustness.

1BBC News Report, Climate e-mails row university ’breached data laws’, http://news.bbc.co.uk/1/hi/uk/8484385.stm

4.1.1 The Features of ROs

Reproducible research is one of the key outcomes of having a RO, but reproducibility is only one of a number of features that were originally identified within the e-Laboratories projects as behaviours ROs should exhibit in assisting researchers to carry out their work (Bechhofer et al., 2010b):

Reusable A RO encapsulates and makes available data, methods and processes, allowing them to be reused as part of another study, which may then produce its own RO.

Repurposable It should be possible to modify an existing RO to fulfil a new purpose, saving the need to write a new research process from scratch.

Repeatable In an e-Research world it is common that a research process needs to be repeated multiple times in a consistent fashion with different datasets or parameters. An explicit machine-readable representation can both speed this up and guarantee consistency across these repetitions.

Reproducible As already described, reproducing research is critical in giving confidence to a research process and the outputs it generates. For an RO to facilitate this it must contain all the data and intermediate results to allow the process to be verified.

Replayable Many ROs represent purely in silico processes. Therefore it should be possible to replay a research process with a single click and then observe it step by step.

Supporting all of these features, and several more that have been identified since, e.g. Reliable, Referenceable, Re-interpretable, Respectable, Retrievable, Refreshable and Recoverable (De Roure, 2010b), requires several specific properties. As part of reproducing research, there needs to be provenance to reinforce the validity of results and give confidence in the research process. A RO can contain many items; these can be incorporated locally or referenced as remote sources. To achieve this, a model for aggregating these items and a concrete syntax for exchanging them is required. Because ROs can be reused and re-purposed, the likelihood of two or more being very similar to each other is increased, so it is essential that each one is uniquely identifiable. The files and remote links encapsulated by an RO may be well annotated, but it is also necessary that the RO itself has sufficient metadata describing its purpose, structure and attributes so that someone can make sense of it.

Although ROs could be built after a research process is complete, one intention is that they are used to assist a researcher in constructing their research process, all the way from inception to publication. To support reuse, ROs must be aware of this lifecycle model, and each instance must specify which stage it is currently at, so a user can determine how it can be reused. As a RO develops through this lifecycle, changes to its metadata and the items that it aggregates need to be tracked. Tracking these changes provides both provenance for supporting reproducible research and a versioning capability, allowing ROs to be reverted or branched but still retain their original history. By doing this a user would be able to determine whether two ROs have the same origin.

Support for managing all the features previously described is essential. One critical requirement of this management is the ability to keep ROs secure, so that, until publication, sharing only takes place with those who are appropriate. This is essential in ensuring that credit for research and/or its process cannot be usurped before publication. It is also extremely important that all aspects of a RO can be attributed to a person or process, both to make certain that appropriate parties are credited at publication and to enhance a RO’s provenance and ultimately the confidence in its research process as a whole.

To be able to exchange ROs between different systems and make use of various services it is necessary that the concrete syntax previously described can support a “graceful degradation of understanding”. A RO has meaning on a number of levels; it may be interpreted as a simple aggregation of resources, the representation of a research process or a specific workflow detailing very domain-specific operations. Depending on the system or service being used this interpretation will vary. A graceful degradation of understanding allows more generic tools to have only a superficial understanding of the RO to perform their operations. This is analogous to the central concept of Boundary Objects. An RO could potentially have a level of detail that could be understood by parties from all research disciplines and then a more detailed level where some parts might be understood by one discipline and other parts by another.

4.1.2 Types of RO

There are many use cases for ROs that can encapsulate one or more of the five features previously described. This leads to a number of different types of RO being required (Bechhofer et al., 2010b):

Publication Object Supports reproducible research by providing an immutable record of activity so that a research process can be verified.

Work Object Captures a particular activity that may not specifically be a research process, e.g. auditing, curating, etc., so that it can be repeated, replayed or re-purposed.

Live Object Sometimes a research process may be designed by a single person, but ROs provide a means for collaborative development. This is why versioning is

critical, so that multiple users can work on a single RO at the same time but still end up with something that accurately represents the research process and can be reproduced.

Exposing Object Because a RO can encapsulate multiple items and assign metadata, this provides a useful means of presenting a set of related data files that someone might want to reuse.

View/Context Object Defines a means of processing and/or presenting data, not the data itself. It can then interact with an appropriate dataset to provide a specific view of that data. It is immediately repeatable, as the RO can be cloned or reset to interact with a different dataset.

Method Object A RO may encapsulate a methodology that is not re-playable in itself but can be used as a template for producing a specific research process that is.

Archived Object Once an RO is no longer required for its original purpose it should not be thrown away. It is important to maintain the information it contains in an immutable state, even if the process it represents was never intended or able to be repeated, replayed or reproduced, or can no longer easily be. This is because it may contain data that could be reused or analysis techniques that may be re-purposable.

A RO’s use cases may evolve over its lifetime. It may start out as a means of collaboratively developing a research process. Once it is complete, this research process may be published. Finally the RO may become defunct, but because it still contains useful data and analysis techniques it should be preserved. Each of these use cases is equally important and therefore each needs management systems that can support them.

4.1.3 From myExperiment Packs to Research Objects

Part of the inspiration for the concept of ROs has come from observing how users of myExperiment make use of packs (see section 2.3.1). The myExperiment ontology has made it possible to define a consistent representation for packs, and the RDF API has provided a means for delivering this in the concrete syntax of RDF/XML. This RDF provides a semantic representation of the pack and its items, allowing it to be queried via the SPARQL endpoint or discovered via a RDF crawler. Through the use of OWL restrictions that require a Pack to use OAI-ORE’s aggregates and isDescribedBy properties, it is possible to infer that a Pack is a form of OAI-ORE Aggregation. When a user downloads the RDF representation of a pack, to ensure that the user can use it in OAI-ORE tools, the representation also includes the ResourceMap, AggregatedResources and Proxies that put the aggregated resources in the context of the Pack (see Appendix A.5). This concrete syntax representation is essential to support all the properties described in section 4.1.1, and OAI-ORE is a suitable framework on which this can be built.
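As a sketch of how a pack can then be consumed as an OAI-ORE Aggregation, the items it aggregates can be listed with nothing more than the ORE vocabulary (the pack URI is illustrative):

PREFIX ore: <http://www.openarchives.org/ore/terms/>

SELECT ?item
WHERE {
  # A Pack is inferred to be an ore:Aggregation, so its items are
  # whatever it ore:aggregates.
  <http://www.myexperiment.org/packs/55> ore:aggregates ?item
}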

4.1.4 The Future of ROs and myExperiment

e-Research has always been about providing electronic support to researchers and, as more and more research and the processes it entails are performed in silico, supporting these research communities is becoming ever more essential. An important application of any future semantic platform for an e-Research society will be to manage these machine-readable representations of research. myExperiment is well-placed to provide such an application because it already provides such a semantic platform, sharing a basic form of machine-readable research representation as Packs. This means that as ROs evolve to support more sophisticated research processes with richer semantics, myExperiment is well-placed to evolve with them. To achieve this, a number of modifications will be needed, some to the underlying model (and ontology) and others to the user interface.

First, how the myExperiment model supports the interrelation of items within a RO needs to be considered. Figure 4.1 shows how a Pack in myExperiment has started to evolve to incorporate these interrelations. It shows how the Pack aggregates the data, workflow and results files, all of which are encapsulated as Entries. It is important to recognise the difference between the items and the Entries that encapsulate them, i.e. the latter describe the former in the context of the Pack in which they are aggregated. For this reason a myExperiment Pack has ORE aggregates and its own has-entry predicated triples that reference the item and Entry respectively.

For a RO, the interrelations between items are just as important as the items themselves. Figure 4.1 shows how these interrelations, defined by the Relationship class, are encapsulated within a RelationshipEntry (like LocalPackEntry and RemotePackEntry, a sub-class of Entry), just as items are encapsulated within Entries. The Relationship class can be described with three properties: RDF’s subject, predicate and object. The predicate refers to an OWL ObjectProperty that can be described by a user-defined ontology. Figure 4.1 shows how a data file has a relationship with a paper through some unspecified property; this might be something like is-referenced-by or is-analysed-in. An example of the RDF/XML markup required for defining a Relationship and encapsulating it within a RelationshipEntry can be found in Appendix A.3.2.

The data file and paper are referenced by the Relationship rather than by the Entries that encapsulate them, because the statement that the Relationship object makes, e.g. that the data file is referenced by the paper, must be either universally true or not. Whether they are being referred to in the context of the Pack has no effect on whether the statement is true. However, the encapsulation of the Relationship in a RelationshipEntry is important, as this provides the ability to assign provenance in the context of the Pack rather than universally. Assigning provenance directly to the Relationship is inappropriate, as the provenance for the same Relationship within different Packs may end up being combined and generate inconsistent data.

Figure 4.1: MyExperiment Pack evolving into a Research Object

As a Relationship is a universal statement, to make it easier to reference, each Relationship should have a URN generated by hashing the concatenation of the subject, predicate and object URIs. UUID version 52 is probably the most appropriate scheme for performing this hashing, as it is preferred over version 3 by RFC 4122, both being valid for generating URN hashes from URLs. In very basic terms this simplifies any SPARQL queries, but it also saves redefining the whole Relationship object each time it is referred to. Also, whenever it is redefined it can be checked that the subject, predicate and object are consistent with the URN.

Figure 4.1 demonstrates a Pack with only one Relationship and three items. ROs of the future may well have tens of items and a comparable number of Relationships. Multiple uses of the same predicates in Relationships in the same Pack may cause ambiguity.

2http://www.ietf.org/rfc/rfc4122.txt

The simplest example of this is a Pack with a workflow, two input files and two output files. The workflow has ’has-input’ Relationships with the input files and ’has-output’ Relationships with the output files. In this case it is important that there are also Relationships between the corresponding input and output files, so that it is clear which input produced which output. This case also demonstrates how even a fairly simple RO can have more Relationships than items and how a complex web of interrelations can develop. It also demonstrates how the builder of the Pack needs the ability to define their own properties, if they are not already provided, through user-defined ontologies.
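A sketch of recovering this interlinking from the reified Relationships follows; the ex namespace stands in for whichever user-defined ontology supplies the predicates:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Hypothetical user-defined ontology supplying the Relationship predicates.
PREFIX ex: <http://example.org/workflow-relations#>

SELECT ?input ?output
WHERE {
  # Which input produced which output, via the Relationship reifications.
  ?r1 rdf:subject ?workflow . ?r1 rdf:predicate ex:has-input . ?r1 rdf:object ?input .
  ?r2 rdf:subject ?workflow . ?r2 rdf:predicate ex:has-output . ?r2 rdf:object ?output .
  ?r3 rdf:subject ?input . ?r3 rdf:predicate ex:produced . ?r3 rdf:object ?output
}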

Using SKOS was originally considered as a means of defining the properties for predicates as SKOS Concepts, which could be grouped together in a SKOS Concept Scheme. SKOS’s broader and narrower properties could have been used to define relationships between properties, such as has-example-input skos:broader has-input, and SKOS’s related property could have been used to associate non-hierarchical relationships between properties. SKOS was considered because it was thought it would be easier than trying to define full-blown OWL ontologies. However, RDFS’s subPropertyOf can represent hierarchical relationships, and OWL’s equivalentProperty and RDFS’s seeAlso can define strong and weak versions of SKOS’s related property. RDFS’s isDefinedBy can also be used to ensure that all the properties are explicitly grouped together, like SKOS Concepts within a Concept Scheme. Appendix A.3.3 is an example of a basic user-defined OWL ontology with several properties that can be used as the predicates for Relationships.
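A sketch of the pay-off: with such subPropertyOf triples in place, a query for the broader property can also match statements made with the narrower one (the ex terms mirror the has-example-input example above and are hypothetical):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Hypothetical user-defined ontology, of the kind shown in Appendix A.3.3.
PREFIX ex: <http://example.org/workflow-relations#>

SELECT ?rel ?object
WHERE {
  # Match Relationships whose predicate is declared a sub-property of
  # has-input, e.g. has-example-input (under RDFS inference this also
  # includes has-input itself).
  ?p rdfs:subPropertyOf ex:has-input .
  ?rel rdf:predicate ?p .
  ?rel rdf:object ?object
}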

Having a very basic form of OWL ontology means that it should be possible to design a simple user interface to allow novice users to define their own properties. It should also be possible to build a similarly straightforward interface for defining the Relationships between Pack items. Both of these interfaces would benefit from a graphical user interface to encourage uptake. Having such an interface would make it easier for a user to visualise what is in their RO or ontology and how it fits together, and ultimately provide a more intuitive (paradigm-based) means of associating new items or properties. A canvas-based graph editor like Graphviz would probably be the most suitable GUI for this task. A number of editors of this type are now available online, which myExperiment could use to provide a platform-independent interface. In the case of building user-defined ontologies, more experienced users may want to download their ontologies so that they can edit them by hand, or use a graphical application such as Protégé, before re-uploading. Providing this facility would allow more sophisticated ontologies to be developed where appropriate and therefore provide the potential for richer and more complex ROs to be constructed. However, permitting both methods of editing requires some safeguards to ensure the ontology does not become invalid.

There are many different reasons for developing a user-defined ontology. First it is important to state that user-defined does not necessarily, and in fact generally does not, refer to a single user developing an ontology in isolation. Typically a group of users or even a community may want to develop an ontology for their own Relationship predicate properties. Examples of this may be for different scientific domains such as Chemistry, Bioinformatics, Astronomy, etc. and, beyond this, particular sub-domains such as Organic Chemistry or gene expression. Also, different ontologies may be needed for different types of RO, such as live, publication, archived, etc. Even within the same type of RO there may be a need for individual ontologies to provide properties so an RO can conform to a particular standard. These different use cases demonstrate how the ontologies themselves can be hierarchical, with more specific ontologies extending the more generic.

The need to be able to facilitate user-defined ontologies demonstrates how just making research available as Linked Data is not sufficient to support users in sharing their research and/or research process in a way which allows an RO to exhibit all the features described in section 4.1.1 (Bechhofer et al., 2010a). For this reason ROs have a model to help set these user-defined ontologies in context. As already discussed, ROs are to be built on top of OAI-ORE. The specification for the RO model itself will have two main layers:

• Research Object Upper Model (ROUM).

• Research Object Domain Schema (RODS).

The purpose of the ROUM will be to incorporate OAI-ORE along with other e-Research ontologies, such as SIOC, OAC and the Annotation Ontology (AO), to provide a specification so that basic ROs of the types defined in section 4.1.2 can be instantiated. RODSs are the user-defined ontologies. As well as being able to define Relationship predicates, as already discussed, they will be able to define additional properties used within the RO and further restrictions on what makes a valid type of RO. Each RODS may just refer to entities defined in the ROUM, but it may also refer to entities defined in one or more other RODSs. This allows for a rather loosely hierarchical relationship between RODSs, with the more specific extending the generic, either to define ontologies for scientific sub-domains or more specific features for types of RO. The more generic RODSs are likely to be developed and used by larger user groups. Limiting how and when these can be updated is important to ensure ROs that use them remain consistent. For this reason the most generic RODSs should probably be maintained by the teams responsible for myExperiment and other systems that manage ROs. Management/maintenance policies for more specific RODSs will need to be considered on a case by case basis.

Fairly few changes have been required to implement a basic version of the ROUM and support for RODSs in the myExperiment ontology. Adding Relationships and Relationship Entries to the Packs module of the ontology has been a fairly simple addition. There is no specification for the ontologies and object properties that are used for representing the RODSs as this is defined by the OWL ontology and RDFS.

The interaction between myExperiment and ROs is important for the continued success of the former and the uptake of the latter. myExperiment has begun to evolve beyond a system that just allows grouping of research entities, to provide researchers with a means of sharing their research processes. This evolution needs to continue, otherwise myExperiment will not be able to communicate users' ever more complex research accurately and in sufficient detail. Without a semantic platform for the e-Research society on which ROs can be shared, discovered and built collaboratively, there is limited motivation for researchers to take up the RO model and start using ROs.

4.2 Scientific Discourse

Scientific Discourse (also called Scholarly discourse) is the collaborative argumentation of different claims and hypotheses, with the support of evidence from experiments, to construct, confirm or disprove theories. Argumentation is the cornerstone of scientific research and has been present in some form since the times of Ancient Greece and the advent of dialectics (Adler, 1927). In more recent history, consideration has been given to the components of argumentation (Toulmin, 1969). Toulmin proposed that there are six interrelated elements for “analyzing arguments” that are used in scientific discourse:

• Claim

• Evidence (Data)

• Warrant

• Backing

• Rebuttal

• Qualifier

A Warrant is a statement that supports some Evidence being ascribed to a Claim. Backing is often required to ensure that a Warrant is credible. A Rebuttal is an addition (often a caveat) to a Warrant that may lead to additional Evidence being required to support a Claim. Argumentation is rarely unambiguous; Qualifiers, e.g. certainly, probably, possibly, are therefore required to specify the certainty of a Claim and the burden of Evidence required to support it.

With Toulmin's and other specifications of argumentation, it has been possible to consider the design of argumentation systems as a structural computing problem. Structural computing problems concern themselves with the linking together of objects; this is the key attribute of hypertext. Argumentation support, however, is a special case of this, due to the need for semantic constraints to restrict which objects can be linked together (Nürnberg et al., 1997). Certain aspects of argumentation complicate the design of a hypertext system for argumentation support, e.g. disambiguating personal associations and formal relationships. The development of Semantic Web technologies provides some of the solutions to these complications.

4.2.1 Scientific Discourse and the Semantic Web

ScholOnto was originally designed as an ontology-based hypertext argumentation system (Buckingham Shum et al., 2000). However, as the ScholOnto project began before many

Semantic Web technologies had been standardised, it used the Operational Conceptual Model Language (OCML) and the WebOnto application to build this ontology. Listing 4.1 is an example of OCML syntax. OCML is designed for specifying ontologies and therefore shares similar attributes with OWL and RDFS.

(def-class skc-theory-model (scholarly-contribution)
  ;;; non-argumentation relationships
  (addresses :type skc-problem)
  (analyses :type (or skc-data, skc-idea))
  (uses-applies :type (or skc-analysis, skc-approach, skc-data, skc-idea,
                          skc-language, skc-methodology, skc-phenomenon,
                          skc-software, skc-theory-model))
  (modifies-extends :type skc-theory-model)
  (is-an-example-of :type (or skc-theory-model, skc-data, skc-idea,
                              skc-phenomenon))
  (encapsulates :type skc-theory-model)
  (envisages :type scholarly-contribution)
  (predicts :type scholarly-contribution)

  ;;; argumentation relationships
  (confirms :type scholarly-contribution)
  (is-consistent-with :type scholarly-contribution)
  (is-inconsistent-with :type scholarly-contribution)
  (takes-issue-with :type scholarly-contribution)
  (raises-problem :type problem)
  (refutes :type scholarly-contribution)))

Listing 4.1: Extract of OCML Markup for ScholOnto

The central entity of the ScholOnto ontology, like that of Toulmin's specification, is the Claim. A Claim asserts a Concept, which can take one of many different forms, namely an analysis, approach, data, idea, language, methodology, phenomenon, problem, software or theory. Each Claim must identify the Agent who submitted it, whether they be a human or an automated piece of software. A Claim must also have a Justification; this is equivalent to Toulmin's Backing element. This Justification can be just free text, a formal document or even a semantic structure that logically describes how the Claim is justified. A network of Concepts and Claims can be built up using Relationships. Importantly, these Relationships are clearly disambiguated as either argumentative or non-argumentative, as can be seen in listing 4.1.

With the ScholOnto ontology already defined in OCML, once the appropriate Semantic Web technologies had been standardised it was possible to transcribe it to an RDF schema3, making it the earliest Semantic Web specification for scientific discourse. Since then, several other projects have set about producing such specifications; one example is the Semantic Web Applications in Neuromedicine (SWAN) project (Ciccarese et al., 2009).

3http://projects.kmi.open.ac.uk/scholonto/resources/Scholonto2.rdfs

As its name suggests, the SWAN project's domain is neuromedicine, but the ontology it has produced4 is intended to be reusable in other domains. Like the myExperiment ontology (see section 3.2.1) it has a modular structure, so that different components can be added depending on the functionality required and the domain covered. Figure 4.2 shows the entire architecture of the SWAN Commons ontology5.

Figure 4.2: SWAN Commons Ontology Architecture (Due to http://swan.mindinformatics.org/ontology.html)

Unlike the myExperiment ontology, there are a number of modules that provide the specification for basic components:

FOAF Essential specifies a subset of the FOAF vocabulary, which is OWL DL compliant unlike the full FOAF specification6. This allows agents (i.e. people, groups and organizations), projects, documents and user accounts to be defined.

Provenance, authoring and versioning provides properties for specifying the sta- tus of documents comparable with Concepts in the ScholOnto ontology. Where and how a document came about and who is responsible for it is essential information for scientific discourse.

Collections allows a number of items to be grouped together as an unordered set, a bag or an ordered list.

Discourse Relationships defines a set of relationships much like those described in listing 4.1 for ScholOnto.

Reification allows a BinaryRelationship, i.e. a property, to be instantiated as an object so that additional information can be assigned, much as Memberships in the myExperiment ontology are a reification of SIOC's has member property.

These modules are designed to be stand-alone, so they can be individually reused by other projects for the expressivity they provide.
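As an illustration of the kind of reification the Reification module provides, the following rdflib sketch instantiates a membership relationship as an object so that further information can be attached to it; all URIs and the class and property names here are invented for illustration, not taken from the SWAN or myExperiment ontologies.

from rdflib import Graph, Literal, Namespace, URIRef, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/reified#")
DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
# The relationship (playing the role of sioc:has_member between a group
# and a user) instantiated as an object in its own right.
membership = EX["membership1"]
g.add((membership, RDF.type, EX["Membership"]))
g.add((membership, EX["group"], URIRef("http://example.org/groups/5")))
g.add((membership, EX["member"], URIRef("http://example.org/users/42")))
# Because the relationship is now an object, additional information,
# such as when it came about, can be assigned to it.
g.add((membership, DCT["created"], Literal("2011-10-01", datatype=XSD.date)))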

4http://swan.mindinformatics.org/ontology.html
5http://swan.mindinformatics.org/ontologies/1.2/swan.owl
6http://xmlns.com/foaf/spec/index.rdf

As data used in SWAN is pulled in from external resources, over which it has no direct control, it is inevitable that it will contain references to things that are not uniquely identified. An example of this is that a journal database will have a collection of articles, each with its own set of authors, often just specified in plain text. As different journals may format the same person's name differently, and because it is possible for a name to belong to more than one person, SWAN provides the Agents module, which imports the specifications of most of SWAN's basic modules, to handle this ambiguity. It does this by instantiating PersonName objects for each author it finds for an article. If the person whom that PersonName represents can be determined, then an aka relationship can be defined that has the FOAF Person representation as its object. If required, this aka property can be reified so provenance data can be assigned to this relationship. It is common in scientific discourse that more than a single person is responsible for a particular discourse element. Therefore the Agents module also provides homogeneous and heterogeneous lists of classes of agent that can be assigned to a discourse element.

To build the complete SWAN scientific discourse ontology a further module is re- quired that imports the Discourse Relationships and the Agents modules. It specifies a DiscourseElement that allows ResearchQuestions and ResearchStatements (comparable with Toulmin’s Claim element) to be defined. It also provides reified classes for the discourse relationships making them comparable with Toulmin’s Warrant element and facilitating the association of entities comparable with Toulmin’s Backing, Rebuttal and Qualifier elements.

Qualifiers are not part of the SWAN scientific discourse ontology but are added to SWAN Commons as a separate set of modules. This is because most qualifiers are domain specific. To allow qualifiers to be instantiated, SWAN defines its own version of SKOS as a module, thus ensuring the ontology as a whole remains OWL DL compliant. A second module defines a Qualifier class. A Qualifier may be assigned Meanings using a similar vocabulary to SKOS, i.e. the associated Meaning can be broader, narrower or equivalent to the Qualifier. A QualifierConcept class that is a subclass of both Qualifier and SKOS's Concept can then be defined. This allows SKOS controlled vocabularies of Qualifiers to be built up for the appropriate domain. Finally, SWAN Commons incorporates a basic set of ResearchStatement QualifierConcepts that are comparable with ScholOnto's Concept types.

As already discussed, SWAN imports data from external resources, such as journal databases. To be able to fully specify this data, another module called Citations is required. As well as providing the Citation class, it provides classes for different types of contribution that exist in the world of scientific publication, i.e. books, book chapters, conference proceedings and articles, news and comments in journals, newspapers and on the web. Properties are also required so that roles such as author, editor or publisher can be assigned to these contributions. This allows ResearchStatements to use Citations like Toulmin's Backing element.

SWAN Commons is the standard distribution of the SWAN ontology. However, ad- ditional SKOS-based modules have been specified to define pathogenic narrative and mechanisms taxonomy qualifiers and a Nature7 stem cells cheat sheet to produce SWAN Alzheimer8. This is a more specific distribution designed for scientific discourse about Alzheimer’s disease. Because SWAN Commons contains a Life Sciences Entities mod- ule (containing representations for genes, organisms, proteins, etc.) and a connector to the Gene Ontology9 (GO), it is still somewhat domain specific. However, SWAN’s modular design allows for a more generic distribution to be created without these. Addi- tional modules could then be added to this more generic distribution to support different domains.

4.2.2 Alignment of Scientific Discourse Related Ontologies

What makes the SWAN project particularly relevant is that, through the design of the ontology, it has been able to reach out to similar projects through the Healthcare and Life Sciences (HCLS) Interest Group10 of the W3C to begin a process of alignment across several different aspects of the ontology:

• Online Communities

• Discourse Models

• Bibliographic Citations

• Curation

• Experimental Data

Aligning with the models and ontologies from these areas gives the potential to create new modules for the SWAN ontology to increase its expressive power and capture sci- entific discourse in greater detail. The ultimate goal of this alignment is to provide an ontology for capturing the SCientific ARticle of The Future (SCARF).

The outreach process also gives the potential to work with integrating tools such as the aTag generator11 (Samwald and Stenzhorn, 2009). This tool allows a web page to create links so that concepts can be bookmarked with the user's own choice of tags. These aTags can then be embedded into a web application such as Wordpress12 and be discovered using RDF-enabled search engines. RDF visualization tools can also be used to analyse a set of aTags, e.g. those generated by a particular user, group or community.

7http://www.nature.com/
8http://swan.mindinformatics.org/ontologies/1.2/swan-alzheimer.owl
9http://www.geneontology.org
10http://www.w3.org/blog/hcls
11http://hcls.deri.org/atag/generator/
12http://wordpress.org/

4.2.2.1 Online Communities

SWAN already incorporates part of the FOAF specification so it can assign discourse elements and other entities to uniquely identified agents. As already demonstrated in section 3.1.6.4, SIOC provides a semantic representation for community discussion, which can be considered as a more general form of scientific discourse.

Figure 4.3: SWAN SIOC Alignment (Due to http://www.w3.org/TR/hcls-swansioc/)

Figure 4.3 highlights this generalization and shows where the two ontologies can be aligned (Passant et al., 2009). All nodes in the discourse network, along with all types of SWAN Citation, can be considered subclasses of SIOC's Item class or one of its subclasses. SWAN's discourse relationships can also be assigned as sub-properties of SIOC's related to property. These associations are specified in the SWAN/SIOC ontology module13. The main motivation for aligning SWAN with SIOC was to ease the integration/alignment of other aspects of the ontology, i.e. discourse models, bibliographic ontologies and experimental data.
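Expressed as triples, the alignment amounts to statements of the following kind; the SWAN namespace and term names here are assumptions based on the module names, so only the shape of the statements should be taken as definitive.

from rdflib import Graph, Namespace, RDFS

SIOC = Namespace("http://rdfs.org/sioc/ns#")
# Assumed SWAN namespace, for illustration only.
SWAN = Namespace("http://swan.mindinformatics.org/ontologies/1.2/swan#")

g = Graph()
# Discourse elements and citations become kinds of SIOC Item...
g.add((SWAN["ResearchStatement"], RDFS.subClassOf, SIOC["Item"]))
g.add((SWAN["Citation"], RDFS.subClassOf, SIOC["Item"]))
# ...and discourse relationships become sub-properties of sioc:related_to.
g.add((SWAN["refutes"], RDFS.subPropertyOf, SIOC["related_to"]))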

4.2.2.2 Discourse Models

SWAN provides a basic discourse model through the properties in its Scientific Discourse Relationships module. There exist many other models for expressing discourse relationships beyond those already discussed (Harmsze, 2000), (Teufel and Moens, 2002), (Mizuka et al., 2006), (Lisacek et al., 2005). There are several aspects

that differentiate these models (Groza et al., 2009). These include whether they support coarse-grained rhetorical structure (e.g. abstract, motivation, background, conclusion, etc.) that may be used to summarize a network of finer-grained discourse; the type of discourse relationships they support, such as argumentative (e.g. disagrees, agrees), cognitive coherence (e.g. consistent-with, alternative-to) or rhetorical (e.g. antithesis, circumstance, concession, purpose); and whether these relations have implicit or explicit polarity or weighting. Models also vary in the degree of metadata and provenance support they provide.

13http://rdfs.org/sioc/swan

Through the HCLS interest group’s scientific discourse subgroup, a process of alignment has begun between two models:

• Annotations, Background, Contribution, Discussion and Entities (ABCDE) (de Waard and Tel, 2006).

• Semantically Annotated LATEX (SALT) (Groza et al., 2007).

Both the ABCDE and SALT models have their basis in annotating the original document, which in this case is written in the LATEX format14. The ABCDE model has its own LATEX style-sheet with macros that allow Dublin Core metadata annotations to be assigned. The model requires the author to consider whether each section is Background, Contribution or Discussion by assigning it a B, C or D respectively, creating coarse-grained rhetorical blocks. The user is then encouraged to annotate key sentences of these sections, which can then be extracted to generate a structured abstract rather than producing one by hand. Papers contain entities. Some of these, such as references, figures, tables, etc., are already extracted by LATEX to produce listings, whilst others, such as project websites and names of people and places, are not. Marking these up significantly enriches the metadata that can be extracted, which is the main purpose of the ABCDE format.

SALT, like ABCDE, has its own LATEX macros to allow scientific discourse to be annotated within a document. However, it is intended for SALT to be usable in other document formats. SALT's discourse model is very detailed and tries to extend the representation provided by ABCDE. SALT has three separate ontologies. The Document ontology captures the linear structure and various components (sections, paragraphs, sentences, figures, tables, etc.) of a document. The Rhetorical ontology externalizes the rhetoric and argumentation encapsulated within the document and its components. The Annotation ontology allows the latter to be linked to the former.

SALT’s Rhetorical ontology first considers Rhetorical Relations; these provide a reifica- tion of the relationship between a Claim and an Explanation. This has a basis in the

14http://www.latex-project.org/

Rhetorical Structure Theory (RST)15 and reuses some of the relation types it defines, e.g. Antithesis, Concession and Means (Taboada and Mann, 2006). The relation type explains the reason for the existence of the Claim-Explanation pair in a document. Each Rhetorical Relation must specify the Rhetorical Structure, e.g. abstract, conclusion, motivation, etc., to which it belongs. Explanations and Claims are considered as Rhetorical Elements that may have associated Arguments to discuss their validity.

ABCDE and SALT share the same goal of adding semantics to link the physical structure of the paper to the scientific discourse it contains. This goal is slightly different from that of SWAN, which is to create a knowledge base. Aligning SWAN with these models will therefore allow it to map the knowledge it represents more specifically to parts of the document rather than just the document as a whole. The first part of this alignment is described as the Ontology of Rhetorical Blocks (ORB), which has an OWL representation16 (HCLS Scientific Discourse Sub Task Group, 2010). This ontology module aligns the different ontologies' coarse-grained structure into five different sections:

1. Introduction

2. Methods

3. Results

4. Discussion

5. References

With these ‘sections’, it is now possible to start aligning the less coarse-grained elements of these models, known as middle-grained. These can be considered as the sub-elements of these coarse-grained ‘sections’, where the introduction may contain positioning, hy- potheses, etc. or the discussion may contain related work, conclusions, etc.

Another ontology that captures the underlying components of a publication is the Document Components Ontology (DoCO) (Shotton and Peroni, 2010). It describes itself as “an ontology for describing the component parts of a bibliographic document” and is part of the Semantic Publishing And Referencing (SPAR) ontologies suite. DoCO is in part an amalgamation of the rhetorical block elements provided by SALT and the structural components described by the Document Structural Patterns ontology (Di Iorio et al., 2010). It is not yet clear whether DoCO could provide extra coverage beyond that provided by the existing alignment between SWAN, SALT and ABCDE. Many of DoCO's structural components already exist within the alignment, mainly in SALT's Document ontology. However, adding DoCO to the list of aligned models provides a

means of testing the existing coverage. It also gives those marking up their publications the option of choosing the model best suited to them, with confidence that it can still be interpreted in systems that use alternate models.

15http://www.sfu.ca/rst/
16http://esw.w3.org/images/d/d2/Orb-0_1.owl

4.2.2.3 Bibliographic Citations

SWAN has its own basic modelling for bibliographic data using its Citations module. Slightly more sophisticated models for representing bibliographic data such as the Bib- liographic ontology (BIBO) exist (Giasson and D’Arcus, 2009). The SWAN ontology’s modular structure means that additional modules could be defined to enhance or replace the existing Citations module. Through outreach to the HCLS interest group, there is an intention to align SWAN with the Citation Typing Ontology (CiTO) and the Pub- lishing Requirements for Industry Standard Metadata (PRISM) specification (Shotton, 2009), (IDEAlliance, 2009).

Unlike the discourse models in section 4.2.2.2, CiTO focuses on providing semantic annotation for the reference lists found at the end of a paper. CiTO imports the document types defined by DCMI17, providing significantly more options than SWAN provides in its Citation model. This makes it possible to be more specific about what a publication is, as well as supporting items that are not specified in SWAN, e.g. letters, spreadsheets, ontologies, presentations, images, etc. CiTO, unlike SWAN and BIBO, recognises the Functional Requirements for Bibliographic Records' (FRBR) distinction between a piece of Work, the Expression of the work and its Manifestation. For example, something is written as a research paper, is published as a journal article and is available as a web page (International Federation of Library Associations and Institutions, 1998).

To build its citation network, CiTO provides various properties to interlink cited works. This makes it more sophisticated than a standard citation network because it can specify why, and how many times, something cites something else. These properties also allow scientific discourse at the even coarser-grained level of a whole publication, compared with the granularity described in section 4.2.2.2. An alignment with SWAN would therefore allow this additional level of scientific discourse to be encapsulated.
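As a brief sketch of such a typed citation link using rdflib: the two CiTO properties shown are taken from its published vocabulary, while the paper URIs are invented for illustration.

from rdflib import Graph, Namespace, URIRef

CITO = Namespace("http://purl.org/spar/cito/")

g = Graph()
citing = URIRef("http://example.org/papers/a")
cited = URIRef("http://example.org/papers/b")
# Rather than one bare citation link, CiTO properties state why one work
# cites another, and the same pair of works can be linked more than once.
g.add((citing, CITO["extends"], cited))
g.add((citing, CITO["usesMethodIn"], cited))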

SWAN and BIBO capture a basic level of metadata about a publication. PRISM specifies a much more detailed level of metadata, including a subset of Dublin Core elements, that it extends to provide a wide range of metadata properties for content publication, licensing, and reuse situations (IDEAlliance, 2009). Building an ontology module that implements the PRISM specification would allow a much wider range of metadata to be assigned to a citation.

17http://dublincore.org/documents/dcmi-type-vocabulary/

4.2.2.4 Curation

The Annotation Ontology18 (AO) is an ontology for annotating scientific documents on the web, i.e. performing curation (Ciccarese et al., 2010). Like the OAC ontology (see section 3.1.6.7), it is based on the Annotea model. It was originally designed for annotating entities in the SWAN scientific discourse model but can be used more generically for annotating any document, although it has specific features for scientific documents.

One feature of AO is the ability to indicate that an Annotation was made by a piece of software invoked by a person, with both the software and the person captured as provenance for the Annotation. Like the OAC ontology, an AO Annotation links to the entity being annotated (like OAC's Target) and a separate entity for the content of the Annotation itself (like OAC's Body). One main difference from OAC is that the Annotation can be defined as belonging to a set, i.e. a Curation process. Figure 4.4 shows an example of an AO Annotation, representing part of a Curation, where a piece of software was run by a user over a SWAN Claim to mark up the proteins that can be found within it.

Figure 4.4: Example AO Annotation (Due to (Ciccarese et al., 2010))

One feature of both Curations and Annotations is their status. Both a human and a piece of software may create erroneous annotations; having a status allows unchecked entities to be marked as pending and afterwards as either accepted or rejected. Steps have

already been taken to align AO with SIOC and the support it provides for community annotation. An AO Annotation is a sub-class of SIOC Item and an AO Curation is a sub-class of SIOC Annotation Set.

18http://annotation-ontology.googlecode.com/svn/trunk/annotation.owl

Curations are needed to record the markup produced through the application of rhetorical models and the determination of bibliographic citations described in the previous two sections. These tasks will have been undertaken by a person, possibly with the assistance of a software application. Capturing the provenance of these Curations is essential in building up an accurate knowledge base for documents. Inconsistencies may occur in these knowledge bases, especially when they are contributed to by multiple disparate users using different pieces of software. Maintaining provenance is the only way of resolving these inconsistencies.

4.2.2.5 Experimental Data

Representation of the experiments that provide the evidence component of scientific discourse has up to now not been a central focus of the SWAN ontology. Several projects have set out to define a representation for experiments so that they can be referenced in scientific discourse. NeuroScholar is an early example of this (Burns, 2001). NeuroScholar's model describes flow charts, data containers, data mappings and reusable data structures for measurements, as well as more neuroscience-specific concepts, to capture the experimental process.

ISA-Tab is a project that provides various tools to allow standard templates for exper- iment metadata to be defined and instantiated to capture information about individ- ual experiments and then convert them to different formats or upload their data to a database (Field et al., 2009). ISA-Tab is based in the biological / biomedical domain and incorporates similar domain-specific projects such as MAGE-TAB19, FuGE20 and PSI21. It uses the Minimum Information for Biological and Biomedical Investigations22 (MIBBI) and ontologies from the Open Biological and Biomedical Ontologies (OBO) Foundry23 as standards in aiding the design of these experimental templates.

Both projects have focussed on specific domains. SWAN's goal is to provide support for representing experiments across multiple domains, but it has initially focussed on the same domains as NeuroScholar and ISA-Tab. The OBO Foundry made use of by ISA-Tab is also used to help build the Ontology for Biomedical Investigations (OBI24). It is intended to provide ‘universal’ terms “applicable across various biological and technological domains” which can be used to describe biological and clinical investigations

19http://www.mged.org/mage-tab
20http://fuge.sourceforge.net/
21http://www.psidev.info/
22http://www.mibbi.org/index.php/Main_Page
23http://www.obofoundry.org/
24http://purl.obolibrary.org/obo/obi.owl

(Courtot et al., 2008). OBI achieves this by importing key terms from ontologies hosted in the Foundry and assembling them following the Basic Formal Ontology (BFO) (Smith, 1998).

Two ontologies that are subsumed by OBI are the Microarray Gene Expression Data (MGED)25 ontology and the Experimental Factor Ontology (EFO). MGED defines “Concepts, definitions, terms, and resources for standardized description of a microar- ray experiment”26. EFO is specified in both OWL and OBO format and is designed to model the experimental factors in ArrayExpress27, an archive for genomic experiments (Malone et al., 2008).

Aligning SWAN with OBI, focusing on the aspects defined in the MGED ontology and EFO, will allow SWAN to better represent the experiments involved in its scientific discourse model. The myExperiment ontology has its own module for representing experiments, which provides a different focus. Primarily, the intention of myExperiment's Experiment class, as discussed in section 3.2.1, is to represent an in silico experiment, where one or more Workflows are enacted one or more times, processing some data and producing an output. Aligning SWAN, aspects of OBI and the myExperiment ontology has therefore required some consideration. Figure 4.5 shows a proposed alignment of SWAN's Discourse Elements with myExperiment's Experiment objects. At present it proposes using terms from the MGED ontology but, as previously discussed, these may become superseded by terms from OBI, as this is a more generic ontology for biological/biomedical investigation.

To be able to align these three ontologies an additional ‘glue’ module is required to express their synergy. Central to the ‘glue’ module is the concept of a Study, motivated by a previously described ResearchQuestion in the SWAN knowledge base. This Study entity would be equivalent to an MGED Experiment entity and have ExperimentalFactors associated with it. It would allow Data Acquisitions from experimental Assays, as well as Computations from enacted Workflows, to be assigned to it. This would allow scientific discourse statements to be made about the study in the form of Hypotheses, made up of one or more Claims, backed by the evidence provided by the Data produced by the experimental Assays and/or enacted Workflows.

In addition to the alignment proposed in Figure 4.5, the planned integration between myExperiment and BioCatalogue, as described in section 3.2.3, is also relevant to defining the provenance of analyses generated by enacted Workflows. This further alignment with a BioCatalogue ontology would allow social metadata captured by the BioCatalogue website to be incorporated, so that the services within a workflow can be annotated as well as the workflow itself. It is essential that every element, even sub-elements, can be annotated within the aligned model of scientific discourse ontologies.

25http://mged.sourceforge.net/ontologies/MGEDOntology.1.3.0.1.owl
26http://mged.sourceforge.net/ontologies/MGEDontology.php
27http://www.ebi.ac.uk/arrayexpress/

Figure 4.5: SWAN Alignment with Experimental Ontologies (Due to http://esw. w3.org/HCLSIG/SWANSIOC/Actions/SWANmyExpArray)

Different elements will have different properties and by including a BioCatalogue ontol- ogy with the model, it would be possible to incorporate the unique properties it has for workflow services. These could then be used within the AO model described in section 4.2.2.4 for curation of this aspect of a Study.

One area where the alignment of experimental ontologies does not have great coverage is the design and instantiation of experimental plans. Currently, a Study can have Hypotheses and Claims that can be supported by Evidence. How this Evidence is collated is represented by Data Acquisition, Computation and Analysis steps. However, how experimenters go from their Hypotheses and Claims to determining the steps they are going to carry out to obtain their Evidence, i.e. their plan, is unclear. There exist a number of projects that have a closer focus on the concept of an experimental plan. oreChem is a relatively new project for providing semantic representations of research, as well as developing and deploying infrastructure, services, and applications to support these representations (Lagoze, 2009). Like ROs it is based on OAI-ORE, allowing a number of resources to be aggregated together. Also like ROs, it intends to follow Linked Data principles to support the delivery of its representations of research and link with appropriate repositories, databases and web services.

As the name may suggest, the project's initial focus is Chemistry, but it is intended to be generic enough that it can be used in any area of research that uses experimentation. The types of resources that make up oreChem's representations of research are quite similar to those proposed in the alignments of ontologies with SWAN across the four different aspects previously described, i.e. online communities, discourse models, bibliographic citations and experimental data. Both include experimental measurements, documents, data and scientists (people).

In part due to the chemistry focus, oreChem has a model of experimental plans that can be instantiated as an experiment one or more times. Once instantiated the experiment can be augmented with data, observations or measurements captured during the exper- iment, the plan that the experiment follows can also be amended to handle unexpected occurrences. Once an experiment is complete, the outputs can be analysed and used to produce publications and other research outputs, thus completing the experimental lifecycle.

Knowledge Engineering for Experimental Design (KEfED) is a model that allows the representation of the pieces of data that make up a scientific experiment (Burns and Russ, 2009). These pieces of data can take the form of one of three different variable types:

Independent variables Such as parameters set by the experimenter.

Dependent variables Such as measurements observed by the experimenter.

Derived variables Such as calculations from existing variables.

These variables are defined in the context of how they relate to each other as well as to other components within the experiment, namely experimental objects and activities. Primarily, this provides a means of curating information gathered from literature to provide a knowledge base of experiment plans and instantiations of these plans with different sets of independent and dependent variables. However, there is no reason why the model could not be used by experimenters to help build their own experiment plans and capture instantiations thereof. To accurately capture the interrelations between dependent and independent variables, each dependent variable is treated as a multi-dimensional space, where each dimension represents one of the independent variables. By adding taxonomies and other structures from external ontologies that describe the organisation of the parameter and measurement space, it becomes possible to perform first-order logical reasoning over a KEfED model. This reasoning may help determine potential relationships between dependent and independent variables, allowing the user to determine new experiments to test these.
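A minimal sketch of this treatment in Python follows; the variable names (dose and time as independent variables, response as the dependent variable) are invented for illustration.

# Each dependent variable is a multi-dimensional space whose dimensions
# are the independent variables; here a dict keyed by (dose_mg, time_h).
measurements = {}

def record(dose_mg, time_h, response):
    measurements[(dose_mg, time_h)] = response

record(10.0, 1.0, 0.42)
record(10.0, 2.0, 0.55)
record(20.0, 1.0, 0.61)

# Fixing one independent variable gives a slice through the space,
# e.g. the response over time at a dose of 10 mg.
dose_10 = {t: r for (d, t), r in measurements.items() if d == 10.0}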

This existing alignment model will allow SWAN to not only represent scientific discourse that is extracted from publications but also discourse as it evolves over the course of a Study. Adding some of the concepts proposed by oreChem and KEfED will further en- hance the capability of SWAN to capture scientific discourse over the entire experimental life-cycle in an integrated manner. This will assist in the process of producing publica- tions from experiments and using publications as well as the discourse or experimental results to inspire new experiments.

4.2.2.6 SCientific ARticle of The Future (SCARF)

As already set out, the purpose of aligning ontologies across the five areas discussed in sections 4.2.2.1 to 4.2.2.5 is to facilitate the design of a specification that can capture a machine-readable representation of a research publication in the most semantically- rich way possible, namely a SCientific ARticle of The Future (SCARF). The different parties involved in the alignment process have varying motivations for trying to achieve this. SWAN and CiTO have primarily wanted to be able to take existing publications and be able to build a knowledge base from them. ABCDE and SALT have come from the opposite direction of wanting to allow scientists to be able to produce semantically- enhanced publications to submit to journals and conferences. All of these projects believe that the research paper is still the “unit of currency” for scientific publication.

Some projects have a more radical opinion of what will be the future “unit of currency” for scientific publication. The Concept Web Alliance's (CWA) idea of Nano-publication proposes that, rather than having many papers that essentially include the same assertion because of the ambiguity of natural language, each assertion should be published individually (Mons and Velterop, 2009). Detecting replication of assertions across huge numbers of natural language papers would be very difficult to automate and would be extremely time-consuming to do by hand. However, where this is done, republishing as nano-publications would be extremely useful. The motivation for Nano-publication's streamlined form of publication is to support reproducible research.

However research may be published in the future, the key goal of all the projects involved with the HCLS Scientific Discourse subtask group is to support reproducible research in a world where there are ever more complex research processes and large datasets. This corresponds with the primary motivation of Research Objects (ROs).

4.2.3 Research Objects for Scientific Discourse

Section 4.1.2 described a number of different types of RO to meet various use cases. Modelling scientific discourse is just another use case for an RO. There are clear comparisons between the Study entity shown in Figure 4.5 and an RO. The Publication RO probably corresponds most closely to the Study entity. Both entities have the goal of allowing those that build them to present their research so that it can be digested by human and machine. However, their models for providing this vary in certain ways.

One major factor that has affected each model's design is the type of experiment that is its primary inspiration. In the case of SCARF, experiments would typically involve one or more in vivo or in vitro steps, whereas ROs are primarily focussed on experiments that are wholly in silico. For this reason much of the focus for SCARF is on being able to provide digital representations for these in vivo and in vitro steps.

The focus of the two models is also somewhat different; for SCARF this focus is scientific discourse. This means that a significant amount of the model concerns itself with discourse, which is the driving force of its experimental lifecycle (from research question to hypothesis and then a study to produce evidence to support research statements). ROs, on the other hand, were originally devised to provide a means for the interchange of users' research processes as workflows, whether automated or manually driven, and they put this at the centre of their model. The experimental lifecycle is therefore based on the steps required to reproduce the research or research process captured. This could potentially entail the same entities as SCARF, i.e. research question, hypothesis, etc., but often these steps are more practical.

The rhetorical structure of a scientific paper is central to the architecture of SCARF, from coarse through medium to fine-grained rhetorical blocks. The RO architecture is based on a network of associated items. How items can be associated, and what properties can be used to associate certain items, is described in the ROUM and RODSs. As already described in section 4.1.4, the ROUM provides generic properties to build networks that represent research and research processes. RODSs define more specific properties. These specific properties may be for building a type of RO such as a Publication or Method object. A RODS can be defined for scientific discourse so that a user can capture properties like motivates and hasExperimentalHypothesis. It is also theoretically possible to write a RODS for capturing rhetorical structure. However, as SCARF has already spent significant time implementing this, there seems little point in ROs trying to reinvent the wheel.

Despite these differences, being able to perform transformations from one model to another would be advantageous because of their synergy. For ROs, being able to extract the research and research processes described within a SCARF would be useful to allow users to take advantage of the features provided by ROs, as discussed in section 4.1.1, and the tools that will be provided to manipulate, visualize and enact them. In particular, an extracted RO could be reused and re-purposed to make up part of a new publication, where this augmented RO could be dropped back into a SCARF and used as a resource to build a semantically-rich scientific article. It is not yet clear how to perform such transformations. Both models have a graceful degradation of understanding. At a basic level, ROs have Interrelations with predicates taken from hierarchical SKOS Concept Schemes. SCARF uses AO to build sets of annotations for capturing the various aspects, such as rhetorical structure, discourse elements and experimental data. Being able to map between RO Interrelations and AO Annotations may provide a means for performing these transformations. However, significant work will be needed to determine whether this is possible in a way that does not lose or incorrectly transpose information in the process.

4.3 Question-Answering Systems for e-Research Societies

The myExperiment project allows significant amounts of data to be captured about e-Research contributions, the society that shares them and how they are shared and annotated. This data is stored in a relational database and is queried using the Structured Query Language (SQL). There are a number of reasons why it would be inappropriate to provide myExperiment users with a means of querying its data using SQL, in particular the requirement for users to learn the query language.

To a novice user there are several problems with using such a language. First, they have to understand the terms, syntax and grammar of the language; even though this is fairly simple, it will be unfamiliar to many users who do not come from a Computer Science background. Second, to be able to use this language the user must have intricate knowledge of the structure of the database, i.e. the precise names of all the entities it contains. From scratch this understanding will take significant time to acquire, and this is one reason why it is unusual for any project to provide universal read-only access to its database. Another major reason, as highlighted in section 3.2.1, is that much of the data needs to be kept private.

Chapter 3 has described a number of ways that myExperiment makes its data available whilst ensuring private elements can only be accessed if the user has appropriate permissions:

1. The website.

2. REST API.

3. RDF API and SPARQL endpoint.

The website is the only one of these interfaces designed specifically for users to access the data stored and is an example of both a Graphical User Interface (GUI) and a form- based interface. Four significant types of user interface that can be designed to allow users access to stored data are:

1. Command line

2. Form-based

3. Graphical

4. Natural Language (NL)

Command line interfaces have already been discussed as being both difficult and time-consuming for novices to use. Form-based interfaces are commonly used as a quick and easy means of providing a user interface to a machine-readable querying language. PHPMyAdmin28 is an example of a form-based interface that generates SQL to construct and manipulate a MySQL database. By using this interface, a user does not need to know the terms, syntax or grammar of SQL to complete most tasks. myExperiment uses form-based interfaces for search using the Solr search engine, for calls to the REST API29 and to build SPARQL queries30. Higher-level form-based systems also exist, providing a greater abstraction over the underlying database and making it easier for novice users to access the data they require (Florescu et al., 1998).
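The SPARQL endpoint behind the query-builder interface can also be queried programmatically. The following sketch uses the SPARQLWrapper Python library; the endpoint path and the query itself are assumptions for illustration.

from SPARQLWrapper import SPARQLWrapper, JSON

# Endpoint path assumed; footnote 30 gives the query-builder interface.
sparql = SPARQLWrapper("http://rdf.myexperiment.org/sparql")
sparql.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?workflow ?title
    WHERE { ?workflow dcterms:title ?title . }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["workflow"]["value"], row["title"]["value"])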

GUIs can also simplify the accessing of data held within a database. Like form-based interfaces, they allow a lot of information about the structure of the database to be displayed. Unlike form-based interfaces, GUIs require little if any typed input. mSpace is an example of a GUI for a database (m. c. schraefel et al., 2005). One example of mSpace allows users to find pieces of classical music using a graphical, hierarchical search technique. This allows the music to be broken into categories and subcategories based on period, composer, form and arrangement. The user can also view additional information about the categories, such as pictures of composers and descriptions of periods, as well as a sample of music from each category.

Natural Language (NL) and pseudo-NL interfaces for querying databases are often referred to as Question-Answering (QA) systems. A pseudo-NL interface is one that is only capable of processing a restricted subset of NL phrases. These are often a useful compromise between a full NL system and a formal query language. They allow users to build queries in a form that is intuitive to them, i.e. as questions, but with a less complex mechanism required for transcribing them into machine-readable queries. The disadvantage, however, is that without training more questions are liable to fail to return a result. Fortunately, this training usually occurs incidentally by users reformulating their questions until they get a result. As long as there is sufficient flexibility in the questions that can be answered successfully, the user may well adapt to this pseudo-NL specification rather than lose patience with the interface.

QA systems are quite different from both form-based interfaces and GUIs. Considerably more effort is required to process the user's input to determine what results should be returned. The first part of this is how the natural language is parsed to produce a machine-readable representation. To achieve this, a set of rules, known as a grammar, is required to define how natural language is parsed.

28http://www.phpmyadmin.net/
29http://www.myexperiment.org/mashup/api
30http://rdf.myexperiment.org

4.3.1 Natural Language Processing

Natural Language Processing (NLP) is the process of parsing and evaluating natural language to produce a machine-readable representation of the grammatical structure and semantic content of the text. Chomsky showed that a finite-state grammar that uses chaining Markov models is not sufficient to map all English NL sentences (Chomsky, 1956). He stated that a context-free, phrase-structure grammar can represent many more sentences. He also considered that many NL sentences share basically the same meaning, e.g.

• The sandwich was eaten by the man.

• The man ate the sandwich.

In this example both sentences describe the same concept, i.e. they have the same action, actor and thing being acted upon. The difference between these sentences is that one is written in the passive voice and the other is written in the active voice. For this reason, Chomsky proposed the idea of transformational rules. One of these transformational rules can convert the first sentence into the second. Using transformational rules means that phrase structure rules are only needed to define basic sentences, from which transformational rules can then generate the rest. A further advantage of transformational rules is that they provide two levels of representation for a sentence:

Deep Structure This is the most basic form of the sentence.

Surface Structure Derived from a deep structure through the application of transfor- mational rules.

Having this additional level of representation, known as the deep structure, which effec- tively stores the semantic meaning of the sentence, is useful within the context of a QA system as it can make it easier to process and determine the correct response.
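As a concrete illustration of phrase-structure parsing, the NLTK Python library can parse the active-voice sentence from the example above with a toy grammar; the grammar is invented for this example and is far from a full phrase-structure grammar of English.

import nltk

# A toy phrase-structure grammar covering the example sentence.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the'
    N -> 'man' | 'sandwich'
    V -> 'ate'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the man ate the sandwich".split()):
    print(tree)  # (S (NP (Det the) (N man)) (VP (V ate) (NP (Det the) (N sandwich))))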

The addition of transformational rules to a phrase structure grammar has come to be known as a transformational grammar. Since Chomsky’s seminal paper many grammat- ical theories have been developed to try and provide better ways to represent a trans- formational grammar with the ultimate goal of producing a universal grammar theory applicable across all natural languages (Chomsky, 1956). These grammars include:

• Augmented Transition Network (ATN) (Woods, 1970).

• Generalized Phrase Structure Grammar (GPSG) (Gazdar et al., 1985).

• Head-driven Phrase Structure Grammar (HPSG).

• Principle-based Grammars.

• Lexical Functional Grammar (LFG) (Kaplan and Bresnan, 1982).

• Unification-based grammar.

• Tree Adjoining Grammar (TAG) (Joshi et al., 1997).

Each of these grammars can be implemented as part of an NL parser with the same outcome: a machine-readable representation of some natural language. The interpretation of the natural language produced by applying these grammars may differ, but the arbiter of which interpretation is correct is ultimately the person who provided the original natural language. Although much effort has gone into modelling natural language, there are certain quirks that are difficult to encompass within logical rules. Statistical analysis is a means of handling some of these quirks.

Corpora are collections of natural language text and can be used for statistical modelling of natural language and for assessing the validity of a particular grammar or set of rules. Parsing these corpora produces annotated treebanks, such as the Penn Treebank31 or Susanne Corpus32, that have Parts Of Speech (POS) tags for each word and a grammatical tree structure for each sentence (Marcus et al., 1994), (Sampson, 1995). These treebanks allow probabilistic models to be generated to help determine a sentence's structure. Combining these probabilistic models with a transformational grammar provides a hybrid approach to NL processing.

The original concept for the hybrid approach to NLP was proposed by the theory of Prob- abilistic Context-Free Grammars (PCFG). This technique allows several parse trees of a sentence to be generated that have varying final probabilities (Booth, 1969). However, it was not until the early 1990s, when sufficiently large annotated treebanks could be produced that it was possible to implement such an approach. Many modern day NL parsers, such as SPATTER, Collins Parser, Stanford Parser and Minipar use similar approaches to parse natural language (Magerman, 1995), (Collins, 1996), (Klein and Manning, 2003), (Lin, 1998).
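A PCFG attaches a probability to each production, so that competing parse trees for the same sentence can be ranked; the following NLTK sketch uses invented probabilities.

import nltk

# The probabilities for each left-hand side must sum to 1.
pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [1.0]
    VP -> V NP [1.0]
    Det -> 'the' [1.0]
    N -> 'man' [0.6] | 'sandwich' [0.4]
    V -> 'ate' [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse("the man ate the sandwich".split()):
    # A tree's probability is the product of its rule probabilities.
    print(tree.prob(), tree)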

NL parsers are only the first step to converting an NL question into a machine-readable query. There are many other tools required to interpret the semantic meaning of the question and deal with the inherent issues with natural language such as domain opacity, co-reference resolution, anaphora, entity recognition, etc.

4.3.2 Inherent Issues with QA Systems

The first problem that a user might encounter when using a QA system is domain opacity. Even if the QA system is for a restricted domain, such as e-Research in a

specific field, the user is unlikely to know precisely how the domain is represented and therefore how to express their questions. Form-based interfaces and GUIs have less of a problem with this, as the options and sometimes even the structure of the domain are inherently displayed within the interface. To minimise domain opacity as a problem, QA systems need to interpret as many different constructions of questions as is feasible. They must also be flexible with the vocabulary that can be used, as different users may use different synonyms.

31http://www.cis.upenn.edu/~treebank/
32http://www.grsampson.net/RSue.html

Inevitably, some questions will not be interpretable, and in this case it is important that a user receives an appropriate error message so they can understand why no answer could be determined and more accurately assess how their question needs to be modified to get the answer they require:

1. The question submitted had either grammatical or spelling defects.

2. The question does not have any specific spelling or grammatical defects but it could not be mapped to a syntactical representation.

3. A syntactical representation for the question was successfully produced but no semantic meaning could be extracted from it. Chomsky gives the now famous example of “Colorless green ideas sleep furiously” as a grammatical sentence with no semantic meaning.

4. The question’s semantic meaning can be interpreted but the system is incapable of transcribing the question into a machine-readable query.

5. The database to which the machine-readable query is applied does not contain any data relevant to that query. If the QA system subscribes to an Open World Assumption (OWA), it cannot say that no results exist, just that it is not aware of any.

For the first three of these, it is possible for the user to reformulate the question and potentially get the answer they require. For the last two it is unlikely or impossible that any reworking of the question would produce a suitable answer. This may well be because the type of question asked is outside the scope of the system. It is therefore important that error messages for these make the user aware of the scope of the system, so they do not pointlessly make repeated attempts, which will invariably lead to frustration and disillusionment with the system (Shneiderman, 1980).

Co-reference resolution has already been discussed as a problem in the world of Linked Data, see section 3.1.7. It is equally as big a problem in NL processing. Determining that two entities in a piece of text are in fact the same thing can be a difficult problem to resolve, especially when they occur in different sentences or even different paragraphs. A similar approach to that of Linked Data is often used to resolve this problem. Once a text is parsed and a semantic meaning has been determined, the semantic relationships for each of the entities can be compared. If there is a sufficient number of shared relationships and the two relationship sets do not show any inconsistency, then the two entities can be considered to be the same. However, this often requires some conceptual representation of the domain, e.g. an ontology, to be able to check for inconsistency. Fortunately, questions are usually made up of only a single sentence and are therefore unlikely to contain two different entities with the same name. If they do, the grammar of the sentence will most likely be able to determine this. However, pronouns add a complication to co-reference resolution that may not be resolved by the question's grammar.

Anaphora is where pronouns, both personal and impersonal, are used to refer to people or things that have been mentioned previously, e.g. “John has a dog. It was given to him by his dad.” There are three pronouns in this example, it, him and his. Most people would probably assume that John is male, with this assumption the pronouns him and his can be interpreted as referring to John. As no reference is made to the gender of the dog, in general it would be assumed that it refers to the dog. This demonstrates a semantic analysis of the sentence. John is not just a proper noun but a male name, a dog is a creature that has a gender but can be referred to as it if no gender is specified. Even with this semantic analysis it is still sometimes impossible to resolve anaphora, e.g. “The page gave the knight his sword.” “Whose sword does the knight have?” is an impossible question to answer with certainty. It is possible to determine that both the page and knight are male, therefore the possessive pronoun his could refer to either of them. In this case, a QA system would have to return an error message asking the user to rephrase the question to disambiguate this his. For example, “The page gave the knight’s sword to him.” “Whose sword does the knight have?”

Sometimes it is easy to determine the entities of a sentence, as they are single words like the previous examples concerning anaphora. However, entities may not just be two or three words long but whole phrases or even sentences. myExperiment's workflows often have long titles, such as "KEGG pathways common to both QTL and microarray based investigations". Such a title could make it difficult to parse a sentence, because the phrase is not relevant to the semantic meaning of the sentence beyond it being an entity that represents a workflow. Therefore, it is often necessary to determine that this is an entity before parsing the sentence to generate POS tags and a grammatical tree.

Often a user's recall of a workflow may be imperfect, making the task of recognizing and labelling the entity more difficult. To overcome this problem, entity recognition algorithms need to be more sophisticated. One way of achieving this is through n-grams. N-grams work by breaking an entity down into its individual words and then searching for occurrences of two or more of these words appearing in close proximity to each other. The number, closeness and ordering of these words determine a probability of whether there is an entity match. A threshold can then be applied to specify how high the probability of a match needs to be for it to be accepted. The sophistication of the entity recognition can be further increased by determining the stems and synonyms of words, so that even if a word is not an exact match it can still add weight to the probability of an entity match. Entity recognition is a key component for determining the semantic meaning of a sentence or question.
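To make the n-gram approach concrete, the following Python sketch scores a fuzzy match between a workflow title and a question. The scoring scheme (word coverage weighted by ordering) and any acceptance threshold are illustrative assumptions rather than values from a real system, and the closeness of matched words is not modelled here.

def ngram_match_score(entity_title, question):
    # Break the entity down into its individual words and find those
    # that also occur in the question.
    entity_words = entity_title.lower().split()
    question_words = question.lower().split()
    hits = [w for w in entity_words if w in question_words]
    if len(hits) < 2:
        return 0.0  # require at least two co-occurring words
    coverage = len(hits) / len(entity_words)
    # Reward matched words that appear in the same relative order.
    positions = [question_words.index(w) for w in hits]
    ordered = all(a < b for a, b in zip(positions, positions[1:]))
    return coverage * (1.0 if ordered else 0.8)

title = "KEGG pathways common to both QTL and microarray based investigations"
question = "Who uploaded the KEGG pathways workflow for QTL and microarray data?"
score = ngram_match_score(title, question)
print(round(score, 2))  # compare against an acceptance threshold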

The ability to answer NL questions may give the user the impression that a system is intelligent, with common sense and human reasoning abilities (Androutsopoulos et al., 1995). This may lead to an exaggerated sense of disappointment with a system when it does not live up to these expectations, by not returning an answer or by returning one that is unexpected, due to any of the issues previously discussed. Despite all of these issues, QA systems provide the user with an interface where they do not have to follow the very strict, and often non-intuitive, grammatical rules that languages such as SQL and SPARQL require. Instead they allow a question to be expressed much more flexibly, in a language the user understands. This increased flexibility allows the user to express questions that a more structured form-based interface may not allow. Form-based interfaces and GUIs also often struggle when a user query involves negation, quantification or temporal relations (Cohen, 1992).

4.3.3 A Brief History of QA Systems

QA systems in part have their origins in the conversational systems developed in the late 1950s and 1960s, in an effort to design a computer system that could pass the Turing test. In basic terms, the Turing test is a test to see whether a computing system can convince a human that they are talking to another human. It is an attempt to frame as a real-world problem the question "Can machines think?" (Turing, 1950). To test this empirically, Turing proposed a scenario called "The Imitation Game".

The Imitation Game involves three players, a man, a woman and a human interrogator of unspecified gender. The interrogator cannot see the man or the woman but can ask them questions and receive back type-written answers. The goal for the interrogator is to determine which of the other two players is the woman. The goal of both the woman and the man is to convince the interrogator that they are the woman, whether they are or not. By repeating this experiment, a control percentage for how often an interrogator is deceived can be determined. If the man is now replaced by a computer system and the game is replayed, and the interrogator is deceived as often, then the computer system can be considered to have passed the Turing test.

One of the most well-known of these earlier conversational systems was called ELIZA (Weizenbaum, 1966). It was designed to attempt to pass the Turing test by trying to convince a human that it was a psychiatrist. Written in Lisp, ELIZA would examine a sentence input by a user to find keywords. Taking the set of keywords found and ranking them, ELIZA could decompose the sentence and perform the correct transformation, e.g. first person pronouns to second person pronouns and vice-versa, before using a reassembly rule to build a reply. By using this technique ELIZA could come across as a very convincing psychiatrist. However, because it uses non-keyword parts of the sentence to make its response seem as though it has understood the sentence, if the human user inputs garbage for these parts, ELIZA will actually respond using this garbage, making it easy for the user to detect they are not conversing with a human.
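The decompose-and-reassemble technique can be illustrated with a few lines of Python; the keyword rules and pronoun swaps below are invented for the example and are far simpler than ELIZA's actual script.

import re

# Invented fragments in the spirit of ELIZA's keyword rules.
PRONOUN_SWAPS = {"i": "you", "my": "your", "am": "are", "me": "you"}
RULES = [
    (re.compile(r"i feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"my (.*)", re.IGNORECASE), "Tell me more about your {0}."),
]

def swap_pronouns(text):
    return " ".join(PRONOUN_SWAPS.get(word, word) for word in text.lower().split())

def reply(sentence):
    for pattern, template in RULES:
        match = pattern.search(sentence)
        if match:
            # Echo the non-keyword part of the input, pronouns swapped,
            # which makes the response seem understood, and is also why
            # garbage in that part of the sentence gets echoed back.
            return template.format(swap_pronouns(match.group(1)))
    return "Please go on."

print(reply("I feel my colleagues ignore me"))
# -> Why do you feel your colleagues ignore you?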

The first QA systems, such as The Oracle, the General Enquirer, Baseball and the Specific Question Answerer, used similar although slightly less shallow approaches than ELIZA to try to determine what results they should return to the user (Phillips, 1960), (Stone et al., 1962), (Green Jr. et al., 1963), (Black, 1964). These systems could perform basic syntactic analysis using a minimal set of grammatical rules over simple and usually domain-specific questions. In general, this syntactic analysis would then be used to crudely build machine-readable queries to produce a set of results that could be returned to the user as an answer. What was queried varied between logical statements, previously entered sentences and actual databases storing information about the domain.

As grammatical theories have developed to support the parsing of many more English sentences, it has been possible to develop QA systems whose parsers make use of these grammars. Lunar was one of the first systems to use a transformational grammar parser, in its case one based on ATNs (Woods et al., 1972). It queried a database about moon rocks with queries generated from the parser's analysis. Ladder also used a parser, which it called Lifer, that was capable of performing spelling corrections and providing error messages on failure (Hendrix et al., 1978). Ladder was also capable of handling ellipsis: it could keep track of at least the previous question, allowing users to ask a stream of questions without having to redefine what the question was in reference to each time.

Chat-80 is a QA system that queries over a Prolog database of logical statements. Rather than using a transformational grammar parser, it used one based on extraposition grammar, a specialised definite clause grammar (a grammar formalism for representing first-order logic) capable of left extraposition, one of the features of transformational rules. Janus also used a parser based on a logic-based grammar known as Montague grammar (Hinrichs, 1988). Janus was one of the first examples to show how several applications, in this case parsers and translators, are needed to produce an output that can be mapped to a sufficiently accurate and detailed machine-readable query.

FALCON is a more modern QA system and an early user of a statistical rather than transformational or logic-based grammar (Harabagiu et al., 2000). Like Janus and many of its peers, FALCON recognised that converting an NL question into a machine-readable query is a complex process requiring multiple tools, due to the number of inherent issues described in section 4.3.2.

4.3.4 Tools for NLP

Beyond NL parsers there are still many tools needed to take an NL question and convert it to a machine-readable query that can generate results that will satisfy the user. Many of these tools are designed to tackle the inherent issues discussed previously. An example of such a tool is a thesaurus; this allows links to be constructed between synonymous words. Such a tool is essential for a QA system as it allows different terminology to be used, but when it comes down to distilling the question, the same machine-readable query can be determined. Thesauruses were originally written as books where users could look up words and find synonyms along with a brief description of what the word meant. This meaning was essential because a single word could have many different meanings, e.g. minute, meaning both sixty seconds and very small, and the user would want to ensure they had found a synonym for the appropriate meaning. Representing a thesaurus electronically, in a relational database or similar, is considerably more efficient with only one word meaning needed for each group of synonyms.

WordNet33 is an example of an online lexical database (Fellbaum et al., 1998). Such a system replicates the functionality of the thesaurus, where a form is used to look up synonyms for a particular word. By having a primary entity known as a lexical concept, WordNet can however go further than this. Each lexical concept has a type, namely noun, verb, adjective, adjective satellite or adverb; it also has one or more words and commonly one but sometimes more definitions. Each lexical concept can be associated with other lexical concepts through a number of different properties, namely antonym, hyponym, hypernym, meronym, holonym, caused by, groups with, participle of, pertains to and see also. This allows, in the case of antonym, users to find words that mean the opposite of the word they searched for. Hyponym and hypernym, and to a lesser extent meronym and holonym, are equivalent to the narrower and broader properties used in SKOS, see section 3.1.6.2. They allow hierarchies of lexical concepts to be built. For a QA system this is especially useful, as a question may use a very specific term that is not represented within the domain but, if the thesaurus can resolve it to a more general term, a query can still be generated.
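As an illustration of how such a lexical database might be consulted programmatically, the following Python sketch uses the NLTK interface to WordNet. It assumes NLTK and its WordNet corpus are installed; it is an example of use, not part of any myExperiment implementation.

from nltk.corpus import wordnet as wn  # requires NLTK with the WordNet corpus

# A word maps to several lexical concepts (synsets), each with a type
# (noun 'n', verb 'v', ...) and a definition, as described above.
for synset in wn.synsets("minute"):
    print(synset.name(), synset.pos(), "-", synset.definition())

# Hypernyms give broader concepts, which is what lets a QA system fall
# back from a specific term in a question to a more general domain term.
dog = wn.synset("dog.n.01")
print([hypernym.name() for hypernym in dog.hypernyms()])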

It is not uncommon that a term may not occur in a standard thesaurus or even one constructed for a specific domain. OpenThesaurus is a project, albeit designed for German rather than English, that provides users with a system to collaboratively construct their own thesaurus (Naber, 2004). A user of the system can create, discover and edit synonym sets. The editing history of these sets of synonyms is recorded, allowing administrators to roll back errors and resolve any inconsistencies created by users. This is similar to the Wiktionary project34, which allows users to collaboratively build dictionary entries, including references to synonyms, within a wiki. Probably due in part to the success of its sister Wikipedia project, Wiktionary has had a considerably higher number of users and contributors. The DBpedia project has also been taking information captured by Wiktionary to produce a formal representation of it. However, in some ways neither of these projects meets the goal of either a specific community of users building a thesaurus for their terminology or a single user being able to build a thesaurus so they can map terms they use to the terms that will be recognised by the QA system.

33http://wordnet.princeton.edu/
34http://en.wiktionary.org/wiki/Wiktionary:Main_Page

WordNet, OpenThesaurus and Wiktionary only deal with a single issue of natural language. There are many other issues to deal with, and different applications and approaches for tackling them. For this reason the General Architecture for Text Engineering (GATE) provides an architecture that allows developers to piece together NL processing components, using a graphical IDE, to build a workflow that can be run over some natural language input (Cunningham, 1999). For a QA system, this can theoretically be used to produce a workflow that goes from an NL question to a machine-readable query. These components support tasks such as named entity recognition, co-reference (e.g. pronoun) resolution, template element construction, relation construction and scenario template production. They are provided through the Collection of REusable Objects for Language Engineering (CREOLE), which is essentially a library of modules that encapsulate pre-existing or user-defined tools and resources. These can take one of three forms (a sketch of how such components chain together follows the list):

Language Resources (LRs) such as lexicons, corpora, annotated treebanks, gazetteers and ontologies.

Processing Resources (PRs) such as parsers, generators and n-gram modelers.

Visual Resources (VRs) for visualizing and editing the LR and PR components.
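The sketch below is not GATE code; it merely illustrates in Python the idea underlying this architecture: processing resources are chained over a shared document, each adding annotations that later components can build upon.

class Document:
    """A text plus the annotations accumulated by each component."""
    def __init__(self, text):
        self.text = text
        self.annotations = []

def tokenizer(doc):
    for word in doc.text.split():
        doc.annotations.append(("Token", word))

def gazetteer_lookup(doc, entries):
    for entry in entries:
        if entry in doc.text:
            doc.annotations.append(("NamedEntity", entry))

def run_pipeline(doc, components):
    for component in components:  # each component enriches the same document
        component(doc)

doc = Document("Find workflows tagged with BLAST")
run_pipeline(doc, [tokenizer, lambda d: gazetteer_lookup(d, ["BLAST"])])
print(doc.annotations)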

A Nearly-New Information Extraction system (ANNIE) is an example of a tool developed using GATE, by combining CREOLE modules (Cunningham et al., 2007). ANNIE can perform all the tasks previously described for components. One particular task it can be used for is to build more complete entities. To do this, it first takes an NL sentence and tokenizes it. There are five different types of token, namely word, number, symbol, punctuation and space. Each of these types has further subtypes. Next, this tokenized sentence is searched for named entities taken from a gazetteer. Finally, based on the named entity found, it selects an appropriate grammatical rule in an attempt to build the complete entity. E.g. if the search using the gazetteer found the term "US Dollars", it would invoke the associated grammatical rule that checks preceding tokens to see if they are a number or a word that represents a number. If they are, these tokens are added to the existing entity to make it more complete.
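A toy Python version of the "US Dollars" example might look as follows; the number-word list and the rule itself are simplifications invented for illustration, not ANNIE's actual grammar.

import re

NUMBER_WORDS = {"one", "two", "three", "ten", "twenty", "hundred", "million"}

def extend_currency_entity(tokens, hit_index):
    """Grow a gazetteer hit such as 'US Dollars' leftwards over number tokens."""
    start = hit_index
    while start > 0:
        previous = tokens[start - 1].lower()
        # Accept numeric tokens or words that represent numbers.
        if re.fullmatch(r"\d+(\.\d+)?", previous) or previous in NUMBER_WORDS:
            start -= 1
        else:
            break
    return " ".join(tokens[start:hit_index + 2])  # hit spans 'US' and 'Dollars'

tokens = "It cost twenty million US Dollars last year".split()
print(extend_currency_entity(tokens, tokens.index("US")))
# -> twenty million US Dollars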

Java Annotation Patterns Engine (JAPE) is a tool for building rules for annotating documents with tags. It can be used by a tool like ANNIE to perform information extraction from text (Thakker et al., 2009). JAPE will first have run a tokenizer, sentence splitter, POS tagger and gazetteer over the text. This will generate tags for entities, phrases and words and assign basic annotations, such as parts of speech to words or attributes to the identified entities. Using these tags and associated annotations, JAPE rules can be applied using a transducer to generate further annotations for these tags. Multiple transducers can be applied one after another to build up ever more detailed information about the text.

One of the most interesting things about GATE is that when a system, i.e. a workflow of assembled components, is executed, all the communication between the modules goes through the GATE document manager, to facilitate a common API for communication. This allows the document manager to act as a repository and store all the information that is collected about processed text in a standard format, specifically the TIPSTER annotation format (Grishman, 1997). From the perspective of a QA system, having richly annotated NL questions is useful for evaluating the type of questions being asked, so the system can be tuned to ensure it has a high successful response rate.

4.3.5 Web and Semantic Web based QA Systems

Since the advent of the Web, the number of interfaces to databases has increased significantly. The number of people using these interfaces is also far greater, made up of a wide spectrum of users, some of whom may have very little understanding of the domain to which they are interfacing or even of how to go about using such an interface. For QA systems, this lack of understanding is particularly important. It makes it critical that the user is informed about the capabilities of the system before they start using it, and that when they fail to get the responses they expected, they are presented with informative error messages.

The START system, according to its website35, was the first web-based QA system and has operated continuously since December 1993. The current incarnation of START uses a system called Omnibase (Katz et al., 2002), whose data model is designed to store the heterogeneous data that can be found on the web. Omnibase's data model stores records of the form "object-property-value" to represent relationships, much like the RDF model of triples. Many questions that might be submitted to a QA system can have an answer record encoded in this way. Table 4.1 shows how the question "What is the capital of Spain?" might be encoded, making the assumption that "what" can be determined as an unknown.

35http://start.csail.mit.edu/

          Object   Property           Value
Question  What     is the capital of  Spain
Answer    Madrid   is the capital of  Spain

Table 4.1: START “Object-Property-Value” Encoded Question and Answer

Any answer records that include Spain as the value and have a property that represents the former being the capital of the latter can be returned. To be able to achieve this it is essential to be able to determine when two different pieces of text represent the same thing. At a basic level this may just mean being able to determine synonymous words such as 'tallest' and 'highest'; however, often more sophisticated evaluation will be required. Further to this, answer records may be inverted, e.g. 'Spain' 'has the capital' 'Madrid'. This requires the database to be able to associate these inverse properties so that the system can perform searches where the unknown is the value, not the object. Both of these techniques should increase the number of times the system can return a response but, unless these associations of analogous and inverted properties are highly accurate, it is likely that the percentage of erroneous responses will also increase.

Web-based QA systems can be employed to work within fairly structured domains, such as the book, film and music industries. Although this structure tends to limit the number of properties that need to be handled, these domains do contain large numbers of named entities. In particular, these named entities are often difficult to isolate from the question, as they can be a phrase if not a sentence in their own right. Taking the music industry as an example, the question "Who recorded standing on the shoulder of giants?" is a grammatical sentence, but to those unable to identify the named entity it would appear slightly odd semantically. A system like START would first have to identify the likely domain of the question. The property 'recorded' suggests that the domain could be films but is more likely music. The START system would then go to the web source(s) it has for that domain and search for a matching named entity. First it will look for the whole phrase it has initially identified as the value, i.e. "standing on the shoulder of giants". Usually, if this is put unquoted into a search engine, it will return both exact matches and those that match one or more keywords. With this data START can evaluate whether there is a sufficiently accurate match and, if so, what the complete named entity is. The web page representing this resource can then be parsed to extract information about the entity. This will hopefully provide an "object-property-value" that can be used to determine the value of 'who'.

Omnibase is not the only system to try to integrate heterogeneous web resources so they can be used by a common interface: Araneus (Atzeni et al., 1997) and Ariadne (Knoblock et al., 2001) use an SQL database to store the data from different web sources in separate tables and then use tools to integrate these tables to produce one integrated web resource, where data from several sources can be pulled together to answer a query. The main difference between these systems and Omnibase is Omnibase's relational "object-property-value" data model, which makes integration between user questions and online content more intuitive. However, Omnibase does not support questions where multiple sources are needed.

AskMSR is another web-based system that uses search engines to try to find answers to questions (Brill et al., 2002). It focusses on being able to determine analogous sentence templates, i.e. those that have the same predicate or property. The system achieves this by taking the NL question submitted and rewriting it using all suitable declarative answer templates that have an unknown, which is the answer to the question. These rewrites are weighted using heuristics based on their resemblance to the original question. The rewrites are then submitted to a search engine that looks over the annotated corpora. From the results it returns, n-grams are extracted that might be the answer to the question. These n-grams are then weighted based on the rewrites' weightings and summed together to produce totals for each different n-gram. A further re-weighting is performed dependent on whether the answer matches the type expected, e.g. 'who' in the question would expect an n-gram for a person, group or organisation. Finally, overlapping n-grams are also merged and their totals summed. The list of n-grams is then given percentage probabilities and returned to the user. The Web QA System uses a similar technique, called Query Formulation, to try to find all the possible forms of answer phrase (Yousefi and Kosseim, 2006).
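A heavily simplified Python sketch of this rewrite-and-vote process follows; the rewrites, weights and the stand-in search function are invented for illustration, and the type-based re-weighting and n-gram merging steps are omitted.

from collections import Counter

# Declarative rewrites of "Who wrote Hamlet?", each weighted by how
# closely it resembles the original question (weights are invented).
REWRITES = [
    ('"wrote Hamlet"', 2.0),  # exact-phrase rewrite: higher confidence
    ("wrote Hamlet", 1.0),    # looser bag-of-words rewrite
]

def search_snippets(query):
    # Stand-in for a real search engine call, returning text snippets.
    return ["Shakespeare wrote Hamlet around 1600",
            "It is said Shakespeare wrote Hamlet for the Globe"]

scores = Counter()
for query, weight in REWRITES:
    for snippet in search_snippets(query):
        words = snippet.split()
        # Treat the word preceding the matched verb as a candidate answer
        # and accumulate the rewrite's weight for it.
        for i, word in enumerate(words):
            if word == "wrote" and i > 0:
                scores[words[i - 1]] += weight

print(scores.most_common(1))  # the highest-scoring candidate is the answer guess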

Some web search engines claim to allow the user to submit an NL question and find an answer, such as Answers.com36 and Ask Jeeves37. The former of these relies on crowdsourcing, where users can both ask and answer questions. This can potentially be very effective in answering complex NL questions, as humans have an innate ability to understand them. However, it does rely on the answerer having the knowledge to answer correctly and not deliberately answering the question incorrectly. For this, Answers.com provides a trust rating for answerers38. However, this rating could potentially be increased by a user using dummy accounts to assign "Trust Points" to the primary account.

Ask Jeeves does not actually fully evaluate the question asked. Like many search engines, it searches for keywords in the query and uses n-grams to weight those results where keywords appear closer to each other. Other techniques, such as word-stemming, e.g. allowing the keyword 'corpora' to also match 'corpus', are also used. However, because the search engine is only returning Web pages that may contain the answer, determining a full semantic representation of the question would be excessive. This would only be useful if every currently indexed Web page had all its text similarly evaluated. However, in specialist cases search engines do parse queries more carefully, such as maths, dictionary, weather, time and conversion questions.

Wolfram Alpha39 is an example of a search engine, or as it calls itself a "computational knowledge engine", that has many macros for determining and evaluating specialist case questions. These macros have been written for 29 different topic areas, including mathematics, most scientific, social science, engineering and humanities fields, dates, colours, weather, transportation, organizations, etc. The Wolfram Alpha framework allows macros to be continually added to increase the number of questions it is capable of answering. When a question is submitted, the input is interpreted. Sometimes this is just determining a keyword; other times it might determine a phrase that can be put into a template, e.g. "What is the largest country in Europe?" will use the "largest country" template with the parameter that it is "in Europe". This keyword or template will cause one or more macros to run. In the case of the previous example, three macros run, returning lists of the five largest countries by total area, Gross Domestic Product (GDP) and population respectively. If the user had chosen to ask the more specific question "What is the largest country in Europe by GDP?", this would have created a template with two parameters, "in Europe" and "by GDP", and would have caused only one macro to fire.

36http://www.answers.com
37http://uk.ask.com/
38http://wiki.answers.com/help/trust_points
39http://www.wolframalpha.com/

Much attention has been paid to the Wolfram Alpha user interface. In the previous example of the list of countries, each list can be expanded to give the rankings of smaller countries. Also, depending on the macro, different formats of results will be returned, such as maps, graphs, diagrams, tables or paragraphs of text. Unlike some QA systems, Wolfram Alpha does not directly answer the question posed but provides concise information that can be used by the user to determine the answer for themselves. From a user's perspective this is often more useful than providing a single-word answer, as it provides additional information to verify the answer is correct. A further feature of Wolfram Alpha is that it provides the sources from which the macros found their information, so that if the user so wishes, they can research the answer in further detail.

A drawback of Wolfram Alpha is that the coverage of its macros is not complete. This can lead to what may seem like fairly simple questions failing to return a satisfactory response. An example of this is the question "Who has been the tallest president of the US?" Figure 4.6 shows how Wolfram Alpha will return information about the president of the US for this question, but none of this can be used to determine the answer. Alternatively, if the same question is submitted to Google, the first result is a Wikipedia page containing a table ranking the US presidents in order of height, see Figure 4.7. At present, Wolfram Alpha is also quite limited in the complexity of questions it can answer. Composite queries such as "What is the largest country in Europe or Africa?" or those that use negation, such as "Which countries are not members of the European Union?", fail to return correct answers. Some temporal questions return correct answers when a specific defined chronology exists, such as "Who was the US president in 1974?", but others, such as "Which countries were members of the European Union in 1974?", do not. Although Wolfram Alpha can struggle with more complex questions, it can be a very useful resource if a user has a simple specific question.

One of the critical requirements for being able to tackle complex questions is having a detailed model of the domain being queried. As described in section 3.1, SW technologies, and OWL in particular, are a well-developed means of producing specifications for describing a domain. A number of modern QA systems have built upon these technologies.

Figure 4.6: Screen shot of Wolfram Alpha answering the question "Who has been the tallest president of the US?" (As of 15/12/2010)

Figure 4.7: Screen shot of height of US presidents Wikipedia page (As of 15/12/2010)

NaLiX is a QA system that uses XQuery queries over a database of XML documents to find its answers (Boag et al., 2007), (Li et al., 2005). Although XML is not specifically a SW technology, it is a standard format for encoding SW data. NaLiX can support questions that include comparisons, simple negation, quantifications, nesting, aggregations and sorting. NaLiX uses the statistical parser Minipar to generate a syntactic parse tree for a submitted question. The system then has three steps to take this parse tree and convert it into a valid XQuery expression:

Classification Each word or phrase is classified as a token or marker, depending on whether it is an XQuery component or not.

Validation Checks whether the classified parse tree can map to an XQuery. It also checks whether the elements, attributes and values in the user's question can be found in the database. If either of these tasks cannot be completed, an appropriate error message is returned to the user.

Translation The NL constructions of the parse tree are used to generate an appropriate XQuery expression. The concepts of token attachment, token relationship and core tokens are used to achieve this mapping.

Later QA systems that have used specific SW technologies include GINSENG and TeLQAS (Bernstein et al., 2006), (Hejazi et al., 2003). GINSENG (Guided Input Natural language Search ENGine) is one of the first systems that attempted to provide an NL / pseudo-NL search engine using SW technologies. It guided the user as they entered the words of their query, to try to ensure it received an NL question that used known words and structure. This helped overcome one of the greatest problems with QA systems, namely domain opacity. Having a guided input meant the NL processing and conversion techniques required to produce a machine-readable query could be made much simpler.

The GINSENG architecture was made up of three main parts. The first was a multi-level grammar, which contained about 120 static domain-independent rules and a dynamic grammar rule for every entity contained within the loaded ontologies; these provided the pop-up options for the guided input interface. The second part was the incremental parser which, as well as specifying all the parsable sentences, also provided the information required to generate the machine-readable (SPARQL) queries. The final part was Jena's SPARQL interpreter, which executed the query and returned the result.

GINSENG had a few limitations and issues. First, any queries that could not be transcribed into SPARQL could not be handled, such as "How many cities named Austin are there in the USA?" Second, the grammar of GINSENG was limited, meaning that a perfectly acceptable NL question might not be understood. Grammatical rules were static, therefore if any common structures were missing they had to be added manually. A further problem was that a user had to use certain terms; if they tried to use a synonym the query would generally fail.

TeLQAS (Telecommunication Literature Question-Answering System) was a QA system developed for answering simple English questions in the telecommunication domain. The system had three main subsystems: off-line, ontology and online. The off-line system used a focussed crawler to gather documents relevant to the domain, which were automatically categorized and indexed. TeLQAS incorporated a domain ontology. For the most part this was constructed by experts, but parts could be auto-generated by the off-line system as it categorised and indexed documents. The online system took a user's NL question and parsed it to produce a 'plausible' question, which it sent to the inference engine. If an answer could be found in the knowledge base without any inference then it was returned; otherwise new 'plausible' questions were inferred using various transforms in a recursive process. If answers were found in the knowledge base for these new plausible questions, then the values for the plausibility of the question and the support for the answer were combined to determine the certainty of whether the answer was correct. TeLQAS recognised that the domain ontology and knowledge base may not contain enough information to generate a plausible question that can find an answer. In this case the system would just perform a keyword search over the documents indexed in the system.

One thing common across all these web and SW-based systems is the consideration given to the user interface. Providing guided input, producing structured and visual results, intermediate output, links to sources, failing over to more basic query methods and appropriate error messages are all techniques to assist the user in asking a question that will give an answer, or allowing the user to assess whether that answer is correct. As search engines have become commonplace, the average user has become used to tailoring their queries to improve the chances of finding the particular page they want. A similar tailoring process can occur when users become accustomed to how a particular QA system works. Users will use grammatical structures they know to work and endeavour to use the simplest sentence possible to express their question. Although many of the QA systems described in this section have striven to be able to answer complex questions, many questions are often quite basic. As observed with Wolfram Alpha, extra complexity is often added to questions through logical operators, e.g. 'and', 'or' and 'not', and temporal clauses. As long as these modifiers can be isolated, it is possible to build the basic machine-readable queries and then combine or refine them to represent more complex composite queries.

Another commonality across these Web and SW-based QA systems, as well as some of the QA systems described in section 4.3.3, is the need for a well-defined model of the domain(s), so that once the NL question is parsed it is possible to generate a machine-readable query that will succeed in retrieving some useful results. As knowledge representation technologies, in particular those for the Semantic Web, have improved, being able to build these well-defined models and associated knowledge bases has become easier.

4.3.6 A Potential Question-Answering (QA) System for myExperiment

As already described, myExperiment has a number of ways that a user can query its data or discover entities. There are simple shallow ways for the user to try to find what they are looking for, such as entering keywords into the Solr search engine or scrolling through lists of items, ranked by the date created or how popular they are. There are more sophisticated ways of searching, such as using the SPARQL endpoint or the query syntax provided by the Solr search engine. Solr's query syntax is based on the Lucene query parser syntax40, allowing users to add logical operators to find items that meet a number of criteria; e.g. listing 4.2 is the query that could be used in myExperiment to find all Taverna 1 and 2 workflows tagged with BLAST or bio2rdf.

(kind:"Taverna 1" OR kind:"Taverna 2") AND (tag_list:BLAST OR tag_list:bio2rdf) \label{lst:solrsearch}

Listing 4.2: Example Solr query using logical operators

40http://lucene.apache.org/java/2_3_2/queryparsersyntax.html

However, both SPARQL and Solr's query syntax are very specific and not particularly error tolerant. As already discussed in section 3.2.2, for a project whose user base is not made up of computer scientists, expecting users to learn both myExperiment's specification of the domain and the syntax for the query language is not a sensible approach and may lead to users shying away from using myExperiment altogether. The burden has to be on the developer to provide the user with an interface they can get to grips with quickly. As part of the Repository Enhancement programme funding, myExperiment has been working on developing a faceted browsing tool. This allows users to filter the workflows they are looking for by specifying values for one or more facets. Figure 4.8 is a screen shot of the faceted browsing tool performing the same query as described in listing 4.2. The left-hand menu allows users to choose on which facet they want to filter. When a user checks a box, that value is filtered in, as can be seen in the screen shot, where the first result is a Taverna 1 workflow tagged with BLAST, consistent with those two boxes being checked. Much consideration was given to how different facets and values should be combined. It was decided, for the initial implementation, to keep this simple by OR-ing values and AND-ing facets, meaning that it is not possible to get the interface to find all workflows that are either Taverna 1 or tagged with the word BLAST. As users become accustomed to the system and provide feedback, it will be possible to evolve the interface to allow different types of query, like the one previously described or those that have negation or temporal clauses. However, care needs to be taken to ensure that the interface does not become too complicated, as otherwise the whole purpose of providing a simple means for filtering down workflows will be lost.

QA systems do not suffer the same problem as form-based faceted browsing, as adding functionality to perform additional and more complex queries does not appear in the user interface. Therefore, in cases where myExperiment users want to perform either complex faceted browsing queries or queries that require more sophisticated semantics to be understood, they only have to deal with a single text input box.

myExperiment already has a number of the components required to build a QA system. It has a well-defined model in the form of the myExperiment ontology. This is accompanied by a structured knowledge base, the RDF API, that can be queried using a SPARQL endpoint. This leaves only a few more components needing to be built. The most significant component required is an application that can accurately parse the NL questions that myExperiment users may submit to a QA system and produce a machine-readable semantic representation of each sentence. A tool such as GATE could be used to build such an application. However, it would need a number of language resources to be able to perform this task.

One language resource required is gazetteers, so that the system can perform named entity recognition. The SPARQL endpoint provides a means of obtaining lists of tags, user and group names, workflow, file or pack titles, etc., all of which may be named entities that a user might include in a question.

Figure 4.8: Screen shot of faceted browsing on myExperiment (As of 17/12/2010)

Another language resource required is a thesaurus. This is needed to be able to map nouns and verbs used in a user's question to classes and properties used in the myExperiment ontology. The SPARQL endpoint can be queried for the names of classes and properties in the ontology, but mapping to potential terms that a user might use requires an external resource. WordNet would be a good choice for this, as it is already a Linked Data project and therefore myExperiment could generate a linkset between its domain concepts and WordNet's lexical concepts. When a term in a user's question does not map directly to a domain concept, appropriate links to WordNet lexical concepts should be followed to see if they contain the term used in the question. The process of generating the linkset should be fairly automatic, as the domain concept names found in myExperiment could be used to generate a list of SPARQL queries that could be submitted to the WordNet SPARQL endpoint41. The results returned from these queries would have to be evaluated by a human to determine which lexical concepts should be mapped to. This would also indicate which domain concepts within the ontology do not have mappings, for which the QA system developer would have to go and find these lexical concepts manually or write their own.

41http://api.talis.com/stores/wordnet/services/sparql
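The query-generation step might be sketched as follows in Python; the concept names, the use of rdfs:label and the regular-expression matching are all assumptions made for illustration, since the precise properties exposed by the WordNet RDF conversion would need checking.

DOMAIN_CONCEPTS = ["Workflow", "File", "Pack", "Tag"]  # example ontology names

QUERY_TEMPLATE = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?concept ?label WHERE {{
  ?concept rdfs:label ?label .
  FILTER (REGEX(STR(?label), "{term}", "i"))
}}"""

def linkset_queries(concepts):
    # One candidate-finding query per domain concept; a human then reviews
    # the results to decide which lexical concepts to map to.
    return {name: QUERY_TEMPLATE.format(term=name.lower()) for name in concepts}

for name, query in linkset_queries(DOMAIN_CONCEPTS).items():
    print(f"-- candidates for {name}\n{query}\n")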

With these language resources it is possible to map much of a user's question to a semantic machine-readable representation. Table 4.2 shows an example question with its semantic machine-readable mappings.

How many  workflows           are owned by    Paul Fisher     and  tagged with      BLAST
          mecontrib:Workflow  sioc:has_owner  myexp:users/43       meannot:has-tag  myexp:tags/49

Table 4.2: Example question mapped to myExperiment entities and concepts

There are still a number of things that need to be done to produce a SPARQL query that can retrieve the information needed to answer the question. One of the most critical is determining how to combine the machine-readable entities into a single query. The parse tree generated by the NL parser in the GATE-built parsing application should be able to provide some assistance with this. Figure 4.9 is the parse tree for the question shown in table 4.2, just down to the level of phrases rather than POS tags.

Figure 4.9: Parse tree for myExperiment example question

Using this parse tree it is possible to see that sioc:has_owner should be joined with myexp:users/43 and meannot:has-tag should be joined with myexp:tags/49. As the types of the entities and the range and domain of the properties can be determined from the ontology, it is possible to determine whether these entities are the subject or object of a triple. Therefore the following two SPARQL-like triples can be generated:

1. ?s1 sioc:has_owner myexp:users/43

2. ?s2 meannot:has-tag myexp:tags/49

The conjunction 'and' means these two triples can be merged. Depending on the conjunction, the rule for merging terms needs to be different. Table 4.3 shows rules for merging triples with 'and' and 'or' conjunctions.

Conjunction  First Triple  Second Triple  Merged Triples
and          ?s1 ?p1 ?o1   ?s2 ?p2 ?o2    ?s ?p1 ?o1 . ?s ?p2 ?o2
or           ?s1 ?p1 ?o1   ?s2 ?p2 ?o2    ?s ?p1 ?o1 UNION ?s ?p2 ?o2

Table 4.3: Example of merging SPARQL-like triples for different conjunctions

Using the rule for 'and' in table 4.3 for the example question provides the following merging:

?s sioc:has_owner myexp:users/43 . ?s meannot:has-tag myexp:tags/49
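The merging rules in table 4.3 are mechanical enough to sketch directly in Python (note that in concrete SPARQL syntax the UNION alternatives must be wrapped in group braces, which the table elides):

def merge(conjunction, triple1, triple2):
    """Merge two (subject, predicate, object) triples per table 4.3,
    unifying the two subjects into a shared variable ?s."""
    (_, p1, o1), (_, p2, o2) = triple1, triple2
    first = f"?s {p1} {o1}"
    second = f"?s {p2} {o2}"
    if conjunction == "and":
        return f"{first} . {second}"  # both patterns must hold
    if conjunction == "or":
        return f"{{ {first} }} UNION {{ {second} }}"  # either may hold
    raise ValueError(f"no merge rule for conjunction: {conjunction}")

print(merge("and",
            ("?s1", "sioc:has_owner", "myexp:users/43"),
            ("?s2", "meannot:has-tag", "myexp:tags/49")))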

Next, the remaining entity from the question has to be merged. Having mapped workflows to a class in the ontology, this generates an RDF type triple, i.e. ?s3 rdf:type mecontrib:Workflow. Because the parse tree in Figure 4.9 shows that this is a noun phrase (NP) for which the previously merged triples are an associated verb phrase (VP), the two subjects can be taken to be the same, producing the following merger:

?s rdf:type mecontrib:Workflow . ?s sioc:has_owner myexp:users/43 . ?s meannot:has-tag myexp:tags/49

This merger can then be wrapped with appropriate SPARQL query attributes to build an executable query, as shown in listing 4.3. Unfortunately, this query is not currently executable, because a SWRL rule has not been implemented to generate meannot:has-tag relationships from a Tagging's mebase:annotates and meannot:uses-tag properties. To do this, first the rule would need to be written up using the appropriate syntax, and secondly a script would need to be written to evaluate all the triples for appropriate matches and then generate the additional triples that this SWRL rule implies.

BASE <http://www.myexperiment.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX meannot: <http://rdf.myexperiment.org/ontologies/annotations/>

SELECT * WHERE {
  ?s1 rdf:type mecontrib:Workflow .
  ?s1 sioc:has_owner <users/43> .
  ?s1 meannot:has-tag <tags/49>
}

Listing 4.3: SPARQL Query for myExperiment example question

The results returned by the query could be presented to the user with just some basic formatting, as a listing of URLs. However, this does not fully answer the question; it does not tell the user 'how many' workflows. SPARQL 1.142 implements a COUNT function and this is now implemented in myExperiment's instance of 4store. However, it does not yet fully support more complicated mathematical operations, so post-processing functions would need to be written to take the results of the SPARQL query and perform the appropriate mathematical operation.

42http://www.w3.org/TR/2009/WD-sparql11-query-20091022/
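For example, a 'how many' question could either be answered with SPARQL 1.1's COUNT directly or, where aggregate support is incomplete, by counting plain SELECT results in a post-processing step. The Python sketch below uses the standard SPARQL protocol; the endpoint URL is illustrative.

import json
import urllib.parse
import urllib.request

ENDPOINT = "http://rdf.myexperiment.org/sparql"  # assumed endpoint location

def run_select(query):
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Preferred: let the endpoint aggregate (SPARQL 1.1).
COUNT_QUERY = "SELECT (COUNT(?s1) AS ?n) WHERE { ?s1 ?p ?o }"

# Fallback: run the plain SELECT and count the bindings afterwards.
def count_by_postprocessing(select_query):
    results = run_select(select_query)
    return len(results["results"]["bindings"])  # SPARQL JSON results layout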

The example question in table 4.2 is a very basic one. More complex questions, with negation or temporal clauses or with more complex grammatical structures, will be more difficult to answer. Modifying the example question to find workflows that do not have the owner Paul Fisher requires only a fairly simple modification to the query: adding a filter on the FOAF name of a User to check that it is not "Paul Fisher", as shown in listing 4.4.

BASE <http://www.myexperiment.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX meannot: <http://rdf.myexperiment.org/ontologies/annotations/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT * WHERE {
  ?s1 rdf:type mecontrib:Workflow .
  ?s1 sioc:has_owner ?user .
  ?user foaf:name ?user_name .
  ?s1 meannot:has-tag <tags/49>
  FILTER (!REGEX(STR(?user_name), 'Paul Fisher'))
}

Listing 4.4: Modified SPARQL Query for myExperiment example question

Performing negation on tags is more difficult, because there are potentially multiple has-tag properties. One way to resolve this is to perform two queries, the original query (listing 4.3) and then the same query without the has-tag triple. This would require a post-processing step to subtract the first list of results from the second. However, this issue is not unique to SPARQL, as SQL would have the same problem.
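The subtraction step could be post-processed along these lines (a minimal sketch; the workflow URIs are invented):

def negate_by_subtraction(all_results, tagged_results):
    """Answer 'workflows NOT tagged X' by removing every workflow returned
    by the tag-constrained query from the unconstrained result list."""
    tagged = set(tagged_results)
    return [workflow for workflow in all_results if workflow not in tagged]

all_workflows = ["workflows/16", "workflows/72", "workflows/101"]
blast_tagged = ["workflows/72"]
print(negate_by_subtraction(all_workflows, blast_tagged))
# -> ['workflows/16', 'workflows/101']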

There are two main types of temporal clause: explicit and relative. Explicit clauses reference a certain date, whereas relative clauses use terms like yesterday, next week, the last three months, a year ago, etc. An information extraction tool like ANNIE could be used to determine entities for both explicit and relative clauses. A separate algorithm would then need to be implemented to convert relative entities into explicit ones. These explicit entities would then need to be mapped to a SPARQL FILTER statement. Taking the phrase "in the last three months" as an example, first this is mapped to a date; if the current date is 18th December 2010, this date will be 18th September 2010. Second, this date is mapped to the filter statement ?date >= xsd:dateTime('2010-09-18T00:00:00Z'). The "greater than or equal to" is determined by the "in the last" part of the phrase.
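A sketch of that relative-to-explicit conversion is given below; the phrase table is deliberately tiny and 'three months' is approximated as 91 days, which happens to give exactly 18th September 2010 for the date in the example.

from datetime import datetime, timedelta

# Minimal mapping from relative temporal phrases to (offset, operator);
# a real implementation would need a far richer phrase grammar.
RELATIVE_PHRASES = {
    "in the last three months": (timedelta(days=91), ">="),
    "a year ago": (timedelta(days=365), ">="),
}

def temporal_filter(phrase, now):
    offset, operator = RELATIVE_PHRASES[phrase]
    explicit = (now - offset).strftime("%Y-%m-%dT00:00:00Z")
    return f"FILTER (?date {operator} xsd:dateTime('{explicit}'))"

print(temporal_filter("in the last three months", datetime(2010, 12, 18)))
# -> FILTER (?date >= xsd:dateTime('2010-09-18T00:00:00Z'))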

Mapping more complex grammatical structures to SPARQL queries might require individual attention. However, as the tasks for which users might use the myExperiment QA system are limited, the number of structures to deal with should also be limited. Also, using an approach adopted by other QA systems, failing over to executing the partial query or queries that could be determined from the question may provide the information the user required. Because myExperiment is a Linked Data project, when URIs are returned as results to the user in a web browser, they can click on these to go to the human-readable web pages for these entities, which may provide the answer wanted.

4.3.7 Research Objects for Questions

As described at the end of section 4.3.4, a system built using GATE stores all the inputs and outputs from components in the system. This could be extended to incorporate all the steps to produce the SPARQL query, the SPARQL queries themselves and any post-processing steps used to generate the final results and format them for the user.

Using ROs is one way of capturing the details of the answering of a question in a myExperiment QA system. To be able to build a Question RO, an RO Domain Schema (RODS) will need to be defined. This schema will need to contain classes to represent all the components, inputs and outputs throughout the system. The myExperiment ontology Components module (see section 3.2.1.4) could be extended to provide a representation for this. A question RO will also capture provenance information, such as the user asking the question, the time the question was executed and the state of the gazetteers, thesauruses and RDF datasets at this time. This may require taking a snapshot of the VoID specification at the time of the question.

ROs for questions can be used to provide useful provenance information to the user, producing an audit trail that allows assessments of reliability to be made. Capturing the SPARQL query or queries executed as part of the RO allows the user to extract these and use them in their own workflow, making the RO reusable.

Storing question ROs as part of the QA system may save having to re-evaluate a question if it has been asked before. The query generated for the question RO will need to be re-executed, along with any time-dependent components, such as that for converting relative temporal clauses into explicit ones, to ensure that up-to-date results are returned. This demonstrates a repeatable use of a question RO.

Storing question ROs also allows users to review previous questions to see what types of questions will successfully return results, maybe copying and altering them slightly, an example of a re-purposable use case. The system would also be able to capture provenance information referencing the original question for this re-purposed question.

To make a question RO truly reproducible, all the resources that could change over time need to be encapsulated within the RO. This includes full representations of gazetteers, thesauruses and RDF datasets. An example use case for this is needing to prove in a publication that myExperiment had a certain number of workflows on a particular date.

Storing a question as an RO allows the information about its execution to be imported into tools that can evaluate this information and aggregate analysis across multiple questions. Another tool might be one that can demonstrate how the system goes step by step from question to answer. This could help give the user further confidence in the accuracy of the answer, making a question RO re-playable.

As demonstrated, there are clearly use cases for ROs as a means of representing the execution of questions in a QA system. Having a stable, concrete syntax for question executions gives the QA system the opportunity to provide a number of additional useful features.

Chapter 5

Conclusions

e-Research is intended to support researchers with electronic infrastructure to aid their research. This infrastructure may be high-performance computing systems that allow huge amounts of data to be processed or very complex simulations to be run. It may be tools that allow processes to be automated by a machine, saving the time and effort required to do them by hand. It could be applications for electronically logging data as experiments are carried out in the lab. The common consequence of all of these is that they output data that may need to be further analysed. This analysis of the data is often a collaborative process, so a key part of this electronic infrastructure is being able to share the data it produces.

Tools such as email, newsgroups, bulletin boards, the Web, IRC and MUDs are all means of communicating that have been used by researchers to share data, often in a very free-form way. However, their lack of structure means that data sharing is often not as efficient as it could be, e.g. emailing colleagues with a number of attached data files, where data is replicated and sent to several people. Some of the recipients may have no interest in the data, so for them this replication process is wasted. The sender may have forgotten to send the email to a particular person, or somebody new suddenly needs to be able to access the data. This requires additional effort by the sender or one of the recipients to forward it. In an email the sender may provide some metadata about the data files. The distinct separation between the email and its attachments makes it very easy for the metadata for these data files to be lost. If the email is forwarded, extra metadata may be added that is only available to the new recipients. Ensuring any email discussion that arises includes all the participants is difficult, and aggregating this discussion is a further problem. A lot of these problems arise because email is a distributed means of sharing data. The web is a more centralized way of sharing data, but it needs to be tailored so that it can meet the requirements of collaborators who want to share and discuss their research.


Social networking sites are an example of a tailored web application. Their purpose is to allow users to find each other and create links between them in the form of friend- ships. These friendships allow users to communicate specifically with their friends online. However, unlike Instant Messaging (IM) clients, this communication is designed to be persistent so that both those involved (and maybe other permitted parties) can look back at this in the future, without requiring somebody to capture and distribute a log. Communication is essentially just a specific form of data sharing, where users exchange natural language messages. Social Networking sites allow friends to share other forms of data such as photos, videos, documents, links, etc. and then to discuss these pieces of data. This functionality would also be incredibly useful to the researcher emailing out a set of data files discussed in the earlier example. They would be able to:

1. Make a file available to a specific group of people, but only actually distribute it to those who are interested in the file.

2. Easily allow additional people to access the file without having to resend it.

3. Allow someone to get hold of a new copy, if they lose the original.

4. Provide a central location for a file so that if the associated metadata gets separated from the file it can easily be recovered.

5. Allow additional metadata and discussion about the file to be added and accessible to all parties concerned at any time, now or in the future.

Social networking websites such as Facebook, MySpace, etc. have been designed for a specific user base who want to share aspects of their social life, such as pictures and videos. The requirements of researchers are slightly different, and the social networking model therefore needs to be refined to meet them. The myExperiment project is an example of where this refinement has been performed, by seeking the input of researchers across various fields to ensure that the features provided best support their data-sharing requirements. These requirements included being able to assign structured metadata to files and control over which friends or groups could view, download or edit the metadata for an individual file. This process also identified that features beyond data sharing were required to support researchers collaborating using social networking applications. These were combined with existing requirements to define the four capabilities of Social VREs (De Roure et al., 2008):

1. Facilitate management of Research Objects (ROs).

2. Support a social model.

3. Provide an open, extensible environment.

4. Provide a platform to action research.

Having applications that fulfil these capabilities makes it possible to build an e-Research society that can collaborate and carry out its research online. However, building these applications alone may lead to the creation of separate silos of research. Exchanging research between different applications is as important as users being able to share it within an application. REST APIs are one means of sharing information between applications, but they lack a strong semantic representation of the things they exchange. This means significant effort is required to transform the data so that entities in one application can be interpreted in another. Semantic Web (SW) technologies allow entities to be defined against an open specification such as an OWL ontology. Where these ontologies are used by multiple applications, no transformation is required during exchange; otherwise, the ontologies describe the entities in a way that assists the transformation process or helps other applications interpret and make use of them.

One of the main contributions of this thesis has been to take the e-Research society myExperiment and provide a semantic representation of it and a platform for accessing it. To achieve this, the underlying model of myExperiment first had to be understood. This required decomposing the model into its three main areas of functionality:

1. Content Management

2. Social Networking

3. Object Annotation

These three areas of functionality were assembled as the Base module of an ontology for myExperiment, which also imported SNARM, a bespoke ontology for managing myExperiment's sharing model. Extension modules for specific contributions and annotations, along with functionality for credit and attribution, usage statistics, packs, experiments and workflow components, were also defined. To support extensibility, Interface classes were defined to describe particular behaviours. Using OWL's support for multiple inheritance, non-interface classes were defined as subclasses of these Interface classes so they could inherit their behaviours, e.g. making a Workflow Taggable and Rateable. All these modules were then imported by a final module, named 'Specific', that included entities specific to the main instance of myExperiment. This modularisation was intended to make the ontology reusable, allowing anyone who reuses it to choose which modules to use and then potentially write their own modules to provide additional functionality where needed.
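
The Interface pattern is simple to express in triples. The following minimal sketch uses Python's rdflib; the namespace URIs follow the module URLs listed in Appendix A.2, but the exact class IRIs are illustrative rather than quoted from the published ontology.

    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    MEBASE = Namespace("http://rdf.myexperiment.org/ontologies/base/")
    MEANNOT = Namespace("http://rdf.myexperiment.org/ontologies/annotations/")
    MECONTRIB = Namespace("http://rdf.myexperiment.org/ontologies/contributions/")

    g = Graph()
    # Interface classes describe behaviours (here: being taggable and rateable).
    for iface in (MEANNOT.Taggable, MEANNOT.Rateable):
        g.add((iface, RDF.type, OWL.Class))
        g.add((iface, RDFS.subClassOf, MEBASE.Interface))

    # OWL multiple inheritance: Workflow is a subclass of both Interface
    # classes, so it inherits both behaviours.
    g.add((MECONTRIB.Workflow, RDF.type, OWL.Class))
    g.add((MECONTRIB.Workflow, RDFS.subClassOf, MEANNOT.Taggable))
    g.add((MECONTRIB.Workflow, RDFS.subClassOf, MEANNOT.Rateable))

    print(g.serialize(format="turtle"))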

Reusability has been one of the main focusses in the design of the myExperiment ontology. To achieve this it has been essential to reuse existing classes and properties from well-established ontologies/schemas. myExperiment has reused classes and properties from Dublin Core, FOAF, SIOC, OAI-ORE, Creative Commons, SKOS and DBPedia. This makes the task for anyone reusing the myExperiment ontology easier, because it clarifies the purpose of the classes and properties used. It also makes the myExperiment ontology easier to align with existing ontologies, as is currently being undertaken with the SWAN ontology.

The myExperiment ontology, although it has the specific purpose of supporting the myExperiment model, is defined generally enough to support the three areas of functionality described earlier. It is not the only specification that does so. Drupal, a website design framework originally designed to support content management, has defined an RDF schema for its own model. Rather than defining its own classes and properties, it has tried to reuse solely existing classes and properties from well-established ontologies/schemas. Following this approach has allowed it to avoid inconsistencies between its model and those used by the ontologies it reuses, such as FOAF and SIOC. Comparison with Drupal's RDF schema has made it possible to identify where inconsistencies lie in the myExperiment ontology and where modifications may be required to align with other ontologies. A particular example is the myExperiment ontology's lack of separation between the user and their account.

Defining an ontology for myExperiment is only one part of building a semantic platform for an e-Research society; another critical part is generating the machine-readable representation of myExperiment entities based on the ontology. At the time this task was undertaken there were no sufficiently advanced tools to support the intricacies of the mappings between myExperiment's relational database, or even its Ruby-on-Rails model, and the myExperiment ontology's model. This led to a bespoke script being written to perform these mappings.
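
The overall shape of such a script is nonetheless straightforward. The sketch below is hypothetical: it assumes a workflows table with id, title and description columns, a deliberate simplification of myExperiment's actual schema, and it is not the bespoke script itself.

    import sqlite3

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    MECONTRIB = Namespace("http://rdf.myexperiment.org/ontologies/contributions/")
    WORKFLOW_BASE = "http://www.myexperiment.org/workflows/"

    def workflows_to_rdf(db_path):
        """Map rows of a (simplified) workflows table to RDF triples."""
        g = Graph()
        g.bind("mecontrib", MECONTRIB)
        conn = sqlite3.connect(db_path)
        for wid, title, desc in conn.execute(
                "SELECT id, title, description FROM workflows"):
            uri = URIRef(WORKFLOW_BASE + str(wid))
            g.add((uri, RDF.type, MECONTRIB.Workflow))
            g.add((uri, DCTERMS.title, Literal(title)))
            g.add((uri, DCTERMS.description, Literal(desc)))
        return g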

Once the RDF has been generated, it has to be delivered to the user, or more particularly to the tools they want to use it with. Originally it was decided to keep the machine-readable representations of myExperiment entities separate from the web pages that describe them. This was a good decision for the initial stages of development, as it saved having to make compromises in the design of the ontology to fit the existing myExperiment model, allowing the full abstraction of concepts to be achieved. However, as the project progressed it became apparent that for myExperiment to make full use of these machine-readable representations, it would be necessary to integrate them with the myExperiment website. The principles of Linked Data provided guidance in this task, primarily the requirement to provide all formats of the same entity from a single URI and then use content negotiation to deliver the correct format.

It was clear that myExperiment differed from many of the other projects that have implemented the principles of Linked Data. myExperiment already had three well-developed formats, HTML, XML and RDF, which up to that point had been kept quite separate, with their own URI schemes. Careful consideration was needed to bring these different formats together under a single URI scheme whilst maintaining existing functionality. This led to a few decisions that differ slightly from the model encapsulated by Linked Data principles. One decision was to add an additional redirect to the old XML URI scheme for requests for XML, i.e. URIs ending '.xml'. This differed from the choice taken for the RDF API, which redirects requests for the old URI scheme to the new one. This decision ensured that applications already using the REST API would not be affected and that users would still be able to take the intuitive step of adding '.xml' to the NIR URI to get the XML format. Another deviation from Linked Data principles was to allow requests for '.html' URIs to redirect via the NIR to RDF or XML if that was the format specified in the request. As most myExperiment users are not computer scientists and are probably unaware of how Linked Data works, if they wanted to take a myExperiment entity and use it in a SW-based tool, they might not realise that the URI in their web browser is that of the HTML format rather than the NIR URI. This mechanism prevents this being an issue, whether the use of an '.html' URI is due to a lack of understanding or an accident. Both these deviations and the choice of a URI scheme that uses file extensions for different formats were made to ensure that the Linked Data provided by myExperiment was as user-friendly as possible.
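
From the client's side, the resulting behaviour can be observed with nothing more than an HTTP library. The snippet below is illustrative: it assumes the workflow 16 URIs used in Appendix A.1.1 and that the server still performs the content negotiation and redirects described above.

    import requests

    # The NIR (non-information-resource) URI for workflow 16.
    nir = "http://www.myexperiment.org/workflows/16"

    for accept in ("text/html", "application/xml", "application/rdf+xml"):
        r = requests.get(nir, headers={"Accept": accept})
        # The final URL shows which format URI the NIR redirected to.
        print(accept, "->", r.url)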

Another interface to the machine-readable representations of myExperiment entities is the SPARQL endpoint. As with the design decisions taken in implementing Linked Data principles, consideration of the user was central. One way this is reflected is in the choice of triplestore. A user wants to receive prompt responses to their queries even as they become more complex, which is one of the features of 4Store. It also provides the user with useful error and warning messages when there is a problem with a query, and the option to return results in different formats. To enhance these features, the myExperiment SPARQL endpoint has its own bespoke user interface. This allows the user to view information about the status of the SPARQL endpoint; choose additional formats to render their results in, such as an HTML table or a CSV file; quickly add useful prefixes to their query or have them inferred from the prefix labels used; clearly see error or warning messages generated by the triplestore; and generate their query as a service so they can use it in a separate application such as a Taverna workflow. However, one of the most important features has been the documentation on how to write SPARQL queries against myExperiment, giving extensive explanations of the purpose of the various clauses used in SPARQL and demonstrating how they work through example queries that can be executed through myExperiment's SPARQL endpoint. This has helped to bridge the knowledge gap for those with very little or no previous experience of using SPARQL.
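
Queries of the kind the documentation teaches can also be submitted programmatically using the standard SPARQL protocol. In the sketch below the endpoint URL is an assumption based on the rdf.myexperiment.org domain used throughout Appendix A.

    import requests

    ENDPOINT = "http://rdf.myexperiment.org/sparql"  # assumed endpoint URL

    QUERY = """
    PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
    PREFIX dcterms:   <http://purl.org/dc/terms/>
    SELECT ?workflow ?title
    WHERE {
      ?workflow a mecontrib:Workflow ;
                dcterms:title ?title .
    }
    LIMIT 10
    """

    r = requests.get(ENDPOINT, params={"query": QUERY},
                     headers={"Accept": "application/sparql-results+json"})
    print(r.json())  # assumes the endpoint honours the Accept header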

To be placed on the Linked Data Cloud, myExperiment has to link to projects already on this cloud. myExperiment has already managed to link to three different projects, EPrints, DBLP and DBPedia: the first two through resources aggregated in myExperiment Packs and the last through DBPedia's residence property for the location of myExperiment users. As the resources that packs aggregate are user-defined, the number of these links can continue to grow, potentially linking to other projects on the Linked Data Cloud. Writing a VoID specification is a way of listing details of the RDF dataset a project publishes, along with 'Linksets' to other projects, and is a requirement for being placed on the Linked Data Cloud. It became apparent that any VoID specification for myExperiment would need regular updating because of the dynamism of its data. As myExperiment's triplestore is updated every morning, it made sense to also update the published public dataset, along with the VoID specification and Linksets to other projects, using an automated script. As more projects with data as dynamic as myExperiment's are added to the Linked Data Cloud, it will become more apparent how important it is to keep these Linksets up to date. Tools are being developed that can discover links between projects and could potentially generate Linksets automatically, but until these are fully realised the onus is on the Linked Data provider to produce up-to-date VoID specifications and Linksets.
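
A script of this kind only has to regenerate a handful of VoID statements each morning. The following sketch is illustrative rather than a copy of the production script: the triple count is taken from the specification reproduced in Appendix A.4, while the fragment identifiers and the Linkset target are assumptions.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    VOID = Namespace("http://rdfs.org/ns/void#")
    dataset = URIRef("http://rdf.myexperiment.org/void.rdf#myExperiment")
    linkset = URIRef("http://rdf.myexperiment.org/void.rdf#dbpediaLinkset")

    g = Graph()
    g.bind("void", VOID)
    g.add((dataset, RDF.type, VOID.Dataset))
    g.add((dataset, DCTERMS.title, Literal("myExperiment")))
    g.add((dataset, VOID.sparqlEndpoint,
           URIRef("http://rdf.myexperiment.org/sparql")))
    g.add((dataset, VOID.triples, Literal(980719)))  # count from Appendix A.4

    # A Linkset recording the dbpprop:residence links to DBpedia.
    g.add((linkset, RDF.type, VOID.Linkset))
    g.add((linkset, VOID.subjectsTarget, dataset))
    g.add((linkset, VOID.objectsTarget, URIRef("http://dbpedia.org/")))  # assumed
    g.add((linkset, VOID.linkPredicate,
           URIRef("http://dbpedia.org/property/residence")))

    print(g.serialize(format="turtle"))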

With an ontology, an RDF API built within a Linked Data architecture, a SPARQL endpoint and the publication of an RDF dataset, Linksets and a VoID specification, myExperiment can now truly say it has a comprehensive semantic platform for an e-Research society. The process of taking this step has provided much insight into the issues of adding such an infrastructure to an existing e-Research society and the tools required to achieve this goal. This insight has helped inform other e-Laboratories projects when it comes to designing ontologies and implementing Linked Data principles. It has also informed the design of Research Objects (ROs), which have been conceived to allow the transfer of research and research processes between, primarily, different e-Laboratories projects but ultimately any e-Research projects. myExperiment's representation of packs has guided the ROs in the choice of high-level architecture, namely OAI-ORE. The myExperiment ontology's use of base and extension modules has informed the choice to define a basic upper model (ROUM) and then extend it with domain-specific modules (RODSs). myExperiment's involvement in the ROs effort has allowed it to position itself as a suitable repository for ROs. This has allowed consideration of how it might support the additions required to turn packs into ROs, in particular how to represent the interrelations between pack items, so that new types of interrelation can be user-defined but also associated with existing properties, and so that provenance about these interrelations, such as their time of creation, can be captured.
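
As a rough illustration, such an interrelation could be captured as the reified Relationship defined in the packs module (Appendix A.2.6). The property names has-subject, has-predicate and has-object and all the resource URIs below are placeholders, and inputDataTo follows the user-defined ontology of Appendix A.3.3.

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDF

    MEPACK = Namespace("http://rdf.myexperiment.org/ontologies/packs/")
    MEPO = Namespace("http://rdf.myexperiment.org/examples/ontologies#")  # hypothetical

    g = Graph()
    rel = URIRef("http://www.myexperiment.org/relationships/1")  # hypothetical

    # A reified triple: <file 100> inputDataTo <workflow 16>.
    g.add((rel, RDF.type, MEPACK.Relationship))
    g.add((rel, MEPACK["has-subject"], URIRef("http://www.myexperiment.org/files/100")))
    g.add((rel, MEPACK["has-predicate"], MEPO.inputDataTo))
    g.add((rel, MEPACK["has-object"], URIRef("http://www.myexperiment.org/workflows/16")))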

Providing the capability to align the myExperiment ontology with existing ontologies has been one of the focusses of its design. SWAN, an ontology for modelling scientific discourse, has been at the centre of an effort, as part of the HCLS scientific discourse subtask group at the W3C, to align ontologies in the scientific discourse field. Although the myExperiment ontology does not define any scientific discourse concepts, it does define a representation for in silico experiments, where workflows are enacted and the inputs and outputs are captured along with the provenance of the enactment, such as the time it was run and the application used to run it. This part of the myExperiment ontology is an element that is not currently defined within SWAN. By combining it with SWAN's scientific discourse elements and OBI's specification for in vivo / in vitro experiments, it becomes possible to specify the provenance of data and analysis produced by a user to support a claim made in the scientific discourse model, either supporting or refuting a previous hypothesis. The proposed alignment between myExperiment and a future BioCatalogue ontology would allow further provenance to be stored, because BioCatalogue annotates services used in myExperiment workflows and these annotations could be added to the model.

The overall goal of the HCLS scientific discourse subtask group has been to produce a specification for what it sees as the SCientific ARticle of the Future (SCARF), a semantically-rich version of the current conference paper or journal publication. This specification encompasses bibliographic data, rhetorical models, experiment data and social curation. The purpose of this is to support reproducible research, where a user can take a SCARF and replicate the research it entails without having to converse with the person who carried it out. ROs share this goal of enabling reproducible research. There are benefits for both projects in being able to convert their information from one model to the other. In the case of ROs it would allow the tools within SCARF to be used to help build a semantically rich publication from the research or research process captured in the RO. For SCARF it would allow the use of RO tools for replaying, reusing or re-purposing the research it captures. Both SCARF and ROs have an overarching model for how they interconnect their contents: in the case of SCARF it is the use of Annotation Sets described using the Annotation Ontology (AO); for ROs it is the use of Interrelation objects to describe relationships between items.

Many projects produce Linked Data but far fewer consume it. myExperiment's semantic representation and Linked Data have been shown to be useful in a number of ways, such as allowing collaborators to build workflow recommender systems or supporting the evaluation of statistics about users and content on myExperiment. However, it does not "eat its own dog food" (http://en.wikipedia.org/wiki/Eating_your_own_dog_food) by consuming both its own and other projects' Linked Data. There are various scenarios where this might be useful. One example is in building a QA system for myExperiment. The difficulty of getting to grips with SPARQL has already been discussed, and even with extensive effort to support users there is still a steep learning curve that may put them off. A QA system built using the semantic representation provided by the myExperiment ontology, along with data available from other Linked Data projects such as WordNet, is a way of allowing users to take advantage of this representation without the steep learning curve. As section 4.3 described, people have been designing QA systems for over forty years with mixed success. However, until recently many of the tools required to make this possible have not been available. Annotated treebanks, hybrid NL parsers, extensive machine-readable thesauruses and information extraction tools are all necessary for generating a machine-readable representation of a natural language question. The critical application, however, is the one that allows the user to assemble all these components together, of which GATE is probably the best example. Combining these tools with the detailed semantic representation that can be provided by an ontology such as myExperiment's offers the real possibility of producing a QA system that has a high rate of success but is also capable of answering complex questions that would be difficult to answer through any other interface. Linked Data is one of the final pieces of the jigsaw, as for the first time it allows the association of existing resources so that as each evolves, the whole can take advantage of these changes, such as using new lexical concepts defined by WordNet.
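
To make the translation step concrete, the toy sketch below maps a single hand-written question pattern onto a SPARQL query over the myExperiment ontology. A real QA system would use the NLP tools above to generalise far beyond one regular expression, and the dcterms:title tag-label property used here is an assumption about how tags are labelled.

    import re

    PATTERN = re.compile(r"which workflows are tagged with (\w+)\??$", re.I)

    def question_to_sparql(question):
        """Translate one fixed question shape into a SPARQL query."""
        m = PATTERN.match(question.strip())
        if m is None:
            return None
        return """
    PREFIX meannot: <http://rdf.myexperiment.org/ontologies/annotations/>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?workflow WHERE {
      ?workflow meannot:has-tag ?tag .
      ?tag dcterms:title "%s" .
    }""" % m.group(1)

    print(question_to_sparql("Which workflows are tagged with pathway?"))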

A QA system does not just give myExperiment the opportunity to consume its own Linked Data; it is also another suitable use case for an RO. All the information captured in the process of answering a question may be useful to whoever asked the question or to those who submit questions in the future. These uses coincide with the features of ROs:

• Reliability through user evaluation of the process information.

• Reusability through extraction of SPARQL queries for use elsewhere.

• Repeatability by allowing users to use existing questions rather than writing their own.

• Re-purposing through the adaptation of previous questions with slightly different parameters.

• Reproducibility and re-playability through the re-execution of the answering of a question to prove that a statement made in a publication is true.

The focus of this thesis has been to consider e-Research from a user perspective. Designing tools and applications that allow researchers to take advantage of electronic infrastructure is important, but it is key that users can make full use of these without needing to expend significant effort learning how they work. Beyond this, these tools should not force users to significantly alter their current research practices or create additional work that gives them no immediate benefit and no sight of how it will benefit them in the future. This has always been the ethos of the myExperiment project, with continuous user engagement and new features driven by what the user needs at the time, rather than a longer development cycle that might lead to the building of extensive functionality not needed by the user. Every step in the process of making myExperiment a Linked Data project has been focussed on how to provide this functionality to users in such a way that they can make the greatest use of it, whether they are myExperiment users or those who wish to make use of the representation that myExperiment provides through its ontology. To ensure myExperiment continues to meet the ever more sophisticated requirements of its user community, it needs to be capable of making use of the semantic representation that has been defined for it. This includes being able to support detailed representations of users' research in the form of ROs, and sophisticated methods for interrogating this semantic representation, such as through a QA system.

5.1 Research Outcomes

Chapter 1 described the novel research contributions of this thesis. The main contribution cited was the extensive insight provided towards the design of Research Objects, obtained through the development of a semantic platform for the emergent e-Research society of myExperiment. This process has made myExperiment a suitable repository for storing and developing Research Objects, and this thesis has discussed the steps that myExperiment should take to support full Research Objects so that users can share and manage their research and research processes to produce truly reproducible research.

As stated in chapter 1, before insight could be contributed to the design of Research Objects, a number of prerequisite contributions were required. The first of these was to review the myExperiment data model. Doing this made it possible to abstract the most basic features and entities and consider how they combine to provide more sophisticated applications. This made it possible to determine what is required to build a basic platform for a generic e-Research society that supports users in uploading, sharing and annotating research output within a social networking framework.

Using this abstraction of the myExperiment data model, the next contribution was to design an OWL ontology for myExperiment. This has allowed it to provide much richer data to its users. Allowing users to export or query over this data has given them the opportunity to build new applications on top of myExperiment, such as workflow recommender systems. A significant part of this contribution was to consider how the OWL specification could be used to provide an ontology that was both extensible and reusable, allowing the ontology to continue to evolve and grow as the myExperiment data model changes, and providing the opportunity to align with related ontologies.

Another prerequisite contribution was the augmentation of the existing myExperiment website to support Linked Data. Doing this allowed myExperiment to take advantage of the data captured by other Linked Data projects, such as the metadata captured about publications in EPrints and DBLP and the geographical information captured in DBPedia. This process also provided a node for other projects, such as BioCatalogue, to associate with, so that if such a project implements Linked Data it can link up with myExperiment. One of the most significant outcomes of this contribution was the insight gained into the difficulties of bringing together several different existing delivery mechanisms for the various formats of myExperiment data under the umbrella of Linked Data, in particular how to achieve this without causing significant disruption to existing users.

As a result of the careful design of the myExperiment ontology, making it both reusable and extensible, it was possible to collaborate with similar projects to begin the initial alignment with other e-Research ontologies for use in scientific discourse. This undertaking has made it possible to gain insight into myExperiment's place in the world of e-Research. It has also allowed a comparison between the specification being generated by this alignment and the architecture of Research Objects, in particular how these might complement each other and how they may develop symbiotically so that each can potentially make use of the tools and applications developed for the other.

After building a semantic platform for an e-Research society and using it to provide insight into the design of Research Objects, the final contribution of this thesis was to consider how both of these could be used to provide a more sophisticated and, hopefully, more intuitive interface for querying the data of an e-Research society, in particular myExperiment. This took the form of a theoretical exercise in how existing Natural Language Processing (NLP) tools and the Linked Data and semantic platform provided by myExperiment could be combined to produce a Question-Answering (QA) system for interrogating myExperiment data. This exercise has allowed an assessment of how myExperiment can link further with other Linked Data projects to provide an even richer representation of its data. It has also allowed the conceptualisation of a QA system execution as a new and different type of Research Object, where the answering of a question is the particular use case.

The central outcome of all the contributions in this thesis has been progress towards the ultimate goal of producing reproducible research, in a world where both research and research processes are growing ever more complex and difficult to understand.

5.2 Further Work

As this chapter has made clear, there are several areas in which the work undertaken in this thesis could be extended. There are still aspects of the myExperiment ontology and the RDF API that may need to evolve to support alignment with other ontologies and to provide users with the highest quality of RDF data. These include the separation of the user and account concepts, the adaptation of Annotations so they can be aligned with the Open Annotation ontology, and the definition and application of SWRL rules so that reified objects like Taggings, Ratings and Creditations can be inferred to produce simple relationships between a Contribution and the value of an associated Annotation.

ROs are still very much at a theoretical stage and considerable work is still needed for them to be realised. Support for ROs in myExperiment is a means of moving towards this realisation stage. At present only a very basic assessment has been made of how myExperiment and its ontology can support ROs. If time were spent on an initial implementation based on this, a more detailed assessment would be possible.

Like ROs, the process of ontology alignment to provide a specification for SCARF is still at a theoretical stage. Dedicated time is needed to assess how to build the experimental data component. Once both ROs and SCARF reach the realisation stage, it should be easier to determine whether and how research and research processes stored in each can be exchanged. However, this is probably not a piece of further work that could be undertaken immediately.

The design and implementation of any Question-Answering (QA) system is a major task. Even with significant consideration of how a QA system for myExperiment could take advantage of modern NLP tools and be assisted by Semantic Web technologies and the power of Linked Data, there is still a lot of detail to be determined. However, as e-Research becomes more complex, providing novel user interfaces that can handle this complexity and expose it to users in a digestible way will become more and more important.

Appendix A myExperiment Data and Specifications

A.1 myExperiment REST API Responses

A.1.1 Example Workflow REST API Response

http://www.myexperiment.org/workflow.xml?id=16 as of 16/10/2011.

The response describes workflow 16:

• id: 16
• title: Pathways and Gene annotations for QTL region
• description: "This workflow searches for genes which reside in a QTL (Quantitative Trait Loci) region in the mouse, Mus musculus. The workflow requires an input of: a chromosome name or number; a QTL start base pair position; QTL end base pair position. Data is then extracted from BioMart to annotate each of the genes found in this region. The Entrez and UniProt identifiers are then sent to KEGG to obtain KEGG gene identifiers. The KEGG gene identifiers are then used to search for pathways in the KEGG pathway database."
• workflow type: Taverna 2
• uploader: Paul Fisher
• updated: Fri Sep 02 11:43:00 +0100 2011
• preview: http://www.myexperiment.org/workflows/16/previews/full
• svg: http://www.myexperiment.org/workflows/16/previews/svg
• licence: by-sa
• download: http://www.myexperiment.org/workflows/16/download/pathways_and_gene_annotations_forqtl_region_290738.t2flow
• content type: application/vnd.taverna.t2flow+xml
• tags: shim, pathway, qtl, kegg, genotype, phenotype, data-driven, pathway-driven, pathways, disease, nbiconworkflows, mouse, subworkflow, chromosome, ensembl, entrez, gene, genes, uniprot

Listing A.1: Example Workflow REST Response

A.2 myExperiment Ontology Modules

A.2.1 Simple Network Access Rights Management (SNARM) Ontology Module

http://rdf.myexperiment.org/ontologies/snarm/ as of 16/10/2011.

SNARM Ontology v1.1 (Simple Network Access Rights Management (SNARM) Ontology; David R Newman; created 2009-01-28; last modified 2011/09/02; rdf/xml). This ontology is designed for representing access rights within a simple network of associated users/groups.

Classes:
• Access: the unrestricted Access to an AccessType.
• Accesser: the Accesser that is getting access.
• AccessType: the AccessType that is being given, e.g. view, edit, download, etc.
• Policy: a Policy for the access rights to an object for users in the social network.
• RestrictedAccess: the restricted Access to an AccessType.

Properties:
• has-access: an Access that a Policy provides.
• has-accesser: an Accesser that a Mode provides access to.
• has-access-type: the AccessType an Access provides.

Listing A.2: Simple Network Access Rights Management (SNARM) Ontology Module

A.2.2 Base Ontology Module

http://rdf.myexperiment.org/ontologies/base/ as of 16/10/2011.

myExperiment Base v1.0 (The myExperiment Base Ontology; David R Newman; created 2009-01-28; last modified 2010/11/25; rdf/xml). This provides the base elements required by myExperiment for content management, social networking and object annotation.

Classes:
• Interface: superclass for all Interface classes.
• Annotatable: an object that can be annotated with Annotations.
• Versionable: a Contribution that can be a Version.
• Version: a Contribution may be a Version of another Contribution.
• Actor: an object that can perform an action.
• Submission: an object that has been submitted; this might be a Contribution, Annotation, Request or an Attribution/Creditation of an Upload.
• Annotation: an Annotation of an Annotatable object.
• Contribution: an object that is contributed by a User.
• Request: a Request can be made by an Actor to another Actor.
• Invitation: a Request could be an external Invitation.
• Upload: an object that can be contributed by a User that requires uploading.
• Announcement: a public Announcement.
• ContentType: the type of content for an Upload.
• Friendship: a Friendship between two Users.
• FriendshipInvitation: a FriendshipInvitation to an external email address.
• Group: a Group of Users.
• GroupAnnouncement: an Announcement to a Group.
• License: a License under which an Upload is licensed.
• Membership: a Membership of a User to a Group.
• MembershipInvitation: a MembershipInvitation to an external email address.
• Message: a Message sent between two Users.
• User: a User.

Object properties:
• has-annotator: an Annotation has a User as annotator.
• annotator: a User is an annotator-of an Annotation.
• annotates: an Annotation is associated with a particular Contribution.
• has-annotation: an Annotatable may have an Annotation.
• has-announcer: an Announcement has an announcer that is a User.
• announced-to: the Group a GroupAnnouncement has been announced to.
• has-version: a Versionable object has at least one version.
• content-url: an Upload has content at a URL.
• has-current-version: a Versionable object has a current version.
• has-announcement: Groups may have GroupAnnouncements.
• email: a User has an email address.
• is-friends-with: a User may be friends with another User.
• from: a Message is sent from a User.
• has-content-type: an Upload has a ContentType.
• openid-url: a User may have an openid-url represented as a uri.
• has-policy: a Contribution has a Policy for access rights management.
• has-accepter: a Request must have an accepter that can accept.
• reply-to: a Message may have a Message it is a reply-to.
• has-requester: a Request must have a requester.
• has-friendship: a User has a Friendship (with another User).
• has-membership: a User has a Membership (of a Group).
• has-shared-item: Contributions that are shared within a Group.
• to: a Message is sent to a User.
• unconfirmed-email: an email that has yet to be confirmed.
• uri: the URI for some object.

Datatype properties:
• accepted-at: a Request can be accepted at a certain dateTime.
• activated-at: a User account is activated at a certain dateTime.
• auto-except: a Group either auto-accepts a User or it doesn't.
• country: a User is based in a country.
• count: certain Annotations may be a count of something.
• contact-details: a User has contact-details.
• deleted-by-sender: the sender has deleted this Message from their inbox.
• deleted-by-recipient: the recipient has deleted this Message from their inbox.
• email-confirmed-at: an email is confirmed at a certain dateTime.
• field: a User works in a field.
• filename: a File has a filename.
• interests: a User's interests.
• is-current-version: whether a Version is the current version.
• last-seen-at: the last time a User was seen.
• request-token: Requests may have a request-token as a string.
• occupation: a User has an occupation.
• organisation: a User is part of an organisation.
• public-announcement: whether the GroupAnnouncement is viewable (public) to those outside the Group.
• read-at: a Message is read at a particular dateTime.
• receive-notifications: whether the user receives notifications.
• subject: a Message has a subject.
• text: a Submission has some text associated with it.
• username: a User may have a username represented as a string.
• version-number: a Version has a version-number.

Listing A.3: myExperiment’s Base Ontology Module

A.2.3 Attribution and Credit Ontology Module

http://rdf.myexperiment.org/ontologies/attrib_credit/ as of 16/10/2011.

myExperiment Attribution & Credit v1.0 (The myExperiment Attribution & Creditation Ontology; David R Newman; created 2009-01-28; last modified 2011/05/19; rdf/xml). This allows contributions to give attribution to earlier contributions and pay credit to users and groups involved in their creation.

Classes:
• Attributable: an object that can be attributed to another object.
• Creditable: an object that can be credited to someone.
• Attribution: an Attribution to a Contribution from another Contribution.
• Creditation: a Creditation from a Contribution to an Actor.

Properties:
• has-attribution: an Attributable has an attribution for another Attributable.
• has-attributable: an Attribution has an Attributable.
• attributes: an Attribution attributes an Attributable object.
• gives-credit-to: a Creditable gives credit to an Actor.
• credits: a Creditation credits an Actor.
• has-creditable: a Creditation has a Creditable.

Listing A.4: myExperiment’s Attribution and Credit Module

A.2.4 Annotations Ontology Module

http://rdf.myexperiment.org/ontologies/annotations/ as of 16/10/2011.

myExperiment Annotations v1.0 (The myExperiment Annotations Ontology; David R Newman; created 2009-01-28; last modified 2011/10/03; rdf/xml). This provides the different types of annotations used in myExperiment.

Classes:
• Citationable: an object that can be annotated with a Citation.
• Commentable: an object that can be annotated with a Comment.
• Favouritable: an object that can be made a Favourite.
• Reviewable: an object that can be annotated with a Review.
• Rateable: an object that can be annotated with a Rating.
• Taggable: an object that can be tagged.
• Citation: a Citation made by a User about a piece of work associated with the Citationable object.
• Comment: a Comment made by a User about a Contribution.
• Favourite: a Favourite created by a User of a Favouritable object.
• Rating: a Rating for a Rateable object.
• Review: a Review for a Reviewable object.
• Tag: a Tag to a word or phrase.
• Taggings: a Taggable object can have Taggings.
• Vocabulary: a Vocabulary of Tags.

Object properties:
• citation-url: the URL where a citation is located.
• uses-tag: a Tagging uses a Tag.
• for-tagging: a Tag may be for a tagging.
• has-tag: a Taggable object has Tags that tag it.
• has-citation: a Citationable object has Citations.
• has-comment: a Commentable object has Comments.
• has-favourite: a Favouritable object has Favourites.
• has-rating: a Rateable object has Ratings.
• has-review: a Reviewable object has Reviews.
• has-tagging: a Taggable object has Taggings.

Datatype properties:
• accessed-at: a piece of work cited by a Citation was accessed at a particular dateTime.
• authors: a piece of work cited by a Citation has authors that are represented by a string.
• tag-count: a Tag has a count of the number of times it is used.
• isbn: a piece of work cited by a Citation may have an ISBN.
• issn: a piece of work cited by a Citation may have an ISSN.
• publication: a piece of work cited by a Citation may be from a publication.
• published-at: a piece of work cited by a Citation was published at a particular dateTime.
• rating-score: a Rating has a rating-score between 1 and 5.

Listing A.5: myExperiment’s Annotations Module

A.2.5 Contributions Ontology Module

http://rdf.myexperiment.org/ontologies/contributions/ as of 16/10/2011.

myExperiment Contributions v1.0 (The myExperiment Contributions Ontology; David R Newman; created 2009-01-28; last modified 2011/02/20; rdf/xml). This provides the different types of contributions used in myExperiment.

Classes:
• File: a File uploaded by an Actor.
• AbstractWorkflow: an AbstractWorkflow on which Workflow and WorkflowVersion can be templated.
• Workflow: a Workflow uploaded by an Actor.
• WorkflowVersion: a Version of a Workflow.

Properties:
• preview: a Workflow may have a uri that resolves to a preview image representation of it.
• svg: a Workflow may have a uri that resolves to an svg representation of it.
• thumbnail: a link to a thumbnail image of the Workflow.
• thumbnail-big: a link to a big thumbnail image of the Workflow.
• last-edited-by: a Workflow will have been last edited by a particular User.

Listing A.6: myExperiment’s Contributions Module

A.2.6 Packs Ontology Module

http://rdf.myexperiment.org/ontologies/packs/ as of 16/10/2011.

myExperiment Packs v1.1 (The myExperiment Packs Ontology; David R Newman; created 2009-01-28; last modified 2011/10/03; rdf/xml). This facilitates the use of packs to aggregate contributions and remote urls together and link these items together with relationships.

Classes:
• Entry: an entry into some aggregation.
• Entry: an entry in a Pack.
• LocalPackEntry: an entry in a Pack that is a Contribution.
• Pack: a Pack of Contributions/remote urls.
• RelationshipEntry: a Relationship in the context of a particular Pack.
• Relationship: a Relationship containing a subject, predicate and object; a reified triple.
• RemotePackEntry: an entry in a Pack that is a remote url.

Properties:
• has-pack-entry: a pack may have zero or more pack entries.

Listing A.7: myExperiment’s Packs Module

A.2.7 Experiments Ontology Module

http://rdf.myexperiment.org/ontologies/experiments/ as of 16/10/2011.

myExperiment Experiments v1.0 (The myExperiment Experiments Ontology; David R Newman; created 2009-01-28; last modified 2010/05/20; rdf/xml). This contains the classes required to create experiments and annotate them with jobs that have been or are scheduled to run.

Classes:
• Runnable: an object that can be run by a Runner.
• Data: input to or output from a Job.
• Experiment: an Experiment is a container for experimentation.
• Job: an enactment of a Workflow as part of an Experiment.
• Runner: a Job requires a Runner to run.

Object properties:
• has-runner: to run a Job, a Job must have a runner.
• has-parent-job: a Job may have a parent job.
• has-runnable: a Job has a Runnable object.
• has-input: a Job may have some Data as input.
• has-output: a Job may have some Data as output.
• runner-url: a Runner must have a runner-url from where to invoke the runner.

Datatype properties:
• completed-at: a Job is completed at a particular dateTime.
• job-manifest: the Job's manifest.
• last-status: the last-status of the Job running in the Runner.
• last-status-at: the last dateTime when the last-status of the Job was polled.
• started-at: a Job is started at a particular dateTime in a Runner.
• submitted-at: a Job is submitted at a particular dateTime to a Runner.

Listing A.8: myExperiment’s Experiments Module

A.2.8 Viewings and Downloads Ontology Module

http://rdf.myexperiment.org/ontologies/viewings_downloads/ as of 16/10/2011.

myExperiment Viewings & Downloads v1.0 (The myExperiment Viewings & Downloads Ontology; David R Newman; created 2009-01-28; last modified 2009/05/06; rdf/xml). This allows statistics on the viewings and downloads of contributions to be recorded.

Classes:
• Usage: a Usage of a Contribution, i.e. Viewing, Download, etc.
• Download: a Download of a Contribution by a User.
• Downloads: Downloads of a Contribution by a User.
• Viewing: a Viewing of a Contribution by a User.
• Viewings: Viewings of a Contribution by a User.

Properties:
• downloads-count: the count of Contribution downloads by a User.
• viewings-count: the count of Contribution viewings by a User.
• downloaded: the count of the number of times a Contribution has been downloaded.
• viewed: the count of the number of times a Contribution has been viewed.
• user-agent: the user-agent used to view/download.

Listing A.9: myExperiment’s Viewings and Downloads Module

A.2.9 Workflow Components Ontology Module

http://rdf.myexperiment.org/ontologies/components/ as of 16/10/2011.

myExperiment Components v1.0 (The myExperiment Components Ontology; David R Newman; created 2009-01-28; last modified 2010/05/20; rdf/xml). This provides classes for representing the components of a workflow.

Classes:
• WorkflowComponent: a component of a Workflow (e.g. a Sink, Source, Processor or Link).
• NodeComponent: a component of a Workflow that is not a Link or IOComponent (i.e. a Sink, Source or Processor).
• IOComponent: an Input or Output to a NodeComponent (e.g. a Sink, Source or Processor).
• Link: a component of a Workflow that links a Source to a Sink (assuming the Link isn't an initial input or a final output).
• Processor: a component of a Workflow that processes some data.
• WSDLProcessor: a Processor that executes some WSDL.
• BeanshellProcessor: a Processor that executes a Beanshell.
• DataflowProcessor: a Processor that executes a Dataflow.
• ConstantProcessor: a Processor that performs the same process each time (e.g. a stringconstant processor just echoes the same string each time).
• OtherProcessor: a Processor that executes something else.
• Sink: a component of a Workflow that is the sink for the data being output.
• Source: a component of a Workflow that is the source of data being input.
• Input: a Link must have an Input into a NodeComponent (i.e. Source, Sink or Processor).
• Output: a Link must have an Output from a NodeComponent (i.e. Source, Sink or Processor).
• Dataflow: a Dataflow is what is executed by an AbstractWorkflow subclass or a DataflowProcessor.

Object properties:
• belongs-to-workflow: a WorkflowComponent belongs to a particular Workflow.
• has-component: a Dataflow may have WorkflowComponents.
• to-input: a Link WorkflowComponent will go to an Input.
• from-output: a Link WorkflowComponent will come from an Output.
• for-component: an IOComponent will input to or output from a NodeComponent (i.e. Sink, Source or Processor).
• executes-dataflow: an AbstractWorkflow subclass or a DataflowProcessor executes a Dataflow.

Datatype properties:
• processor-uri: a Processor may have a URI where the service resides.
• waits-on: a Processor may have to wait on one or more processors to complete before it can execute.
• processor-type: a Processor must have a type property if it is not a specific Processor class.
• processor-script: a Processor may have a script that it executes.
• processor-value: a Processor may have a value that it represents.
• service-name: a Processor may have a name for the service it executes.
• service-category: the service a Processor executes may have a category.
• authority-name: a Processor may have the name of an authority that is responsible for the service it executes.
• example-value: a Sink or Source may have one or more example values.

Listing A.10: myExperiment’s Workflow Components Module

A.2.10 Specific Ontology Module

http://rdf.myexperiment.org/ontologies/specific/ as of 16/10/2011.

myExperiment Specific v1.0 (The myExperiment Specific Ontology; David R Newman; created 2009-01-28; last modified 2010/03/31; rdf/xml). This provides classes and objects specific to myExperiment, including SNARM AccessTypes, CreativeCommons Licenses, the TavernaEnactor (a Taverna-specific Runner) and the myExperiment AnonymousUser.

Classes and instances:
• TavernaEnactor: specific Runners that enact Taverna workflows.
• Friends: anyone that the Contribution creator has an accepted Friendship with.
• AccessTypes: View, Download and Edit a myExperiment Contribution.
• Accesses: Anyone can View; Anyone can Download; Friends can View; Friends can Download; Friends can Edit.
• Anonymous User: the myExperiment AnonymousUser, with placeholder timestamps of 1970-01-01T00:00:00.

Listing A.11: myExperiment’s Specific Module

A.3 myExperiment Example Entity Data

A.3.1 Examples of All Entities

http://rdf.myexperiment.org/examples/all_entities as of 16/10/2011.

The examples cover each of the main myExperiment entity types. Among them are: an Announcement, "New announcements feature!" (2008); the "Taverna 1" ContentType, with its description and MIME type application/vnd.taverna.scufl+xml; a citation of "caArray Experiment data: Diffuse large B-cell lymphoma outcome prediction"; a Comment, "pretty good...will retrieve any pathway image"; an Experiment, "Don's unique tag experiment"; a File, "gaim-forum-users-helping-users.png", described as output from the FLOSS Communication Centralization Workflow; the official "myExperiment" Group; a GroupAnnouncement, "Visit Paris to discuss WikiPedia Scholar with Segolene and Denis"; a Job, "Job_20080910-1323_Don Cruickshank", with its COMPLETE status and its input (query_protein: P53); a Message, "remember remember The 5th of November"; a Pack, "Test pack"; a Rating with a rating-score of 5; a remote pack entry linking to "the myGrid ontology"; a Tag, "chromosome"; a Runner, "aida runner"; the User David R Newman (Southampton, United Kingdom); and the Workflow "Pathways and Gene annotations for QTL region", together with the WorkflowVersion "Pathways and Gene annotations for QTL Phenotype".

Listing A.12: Examples of All myExperiment Entities

A.3.2 Example of a Relationship Entry

http://rdf.myexperiment.org/examples/relationship_entries as of 10/10/2011.

The example shows a RelationshipEntry in the pack "Test pack for relationships" ("This is my private pack for testing out the pack relationships"), in which the Workflow "Unique tags" ("This workflow takes a comma separated list of tags and removes duplicate entries") is related to another pack item using the inputDataTo concept from the user-defined ontology of Appendix A.3.3. The surrounding data includes the pack, the workflow and the User Don Cruickshank (Southampton, United Kingdom).

Listing A.13: Example of a Relationship Entry

A.3.3 Example of a User-defined Ontology

http://rdf.myexperiment.org/examples/ontologies as of 10/10/2011.

]>

myExperiment pack ontology This is an ontology for expressing the relationships between pack items in myExperiment. 2011-06-03T18:04:56Z 2011-06-03T18:04:56Z en Appendix A myExperiment Data and Specifications 239

rdf/xml

inputDataTo This concept is used to indicate that the subject is an input to the object. 2011-06-03T18:04:56Z 2011-06-03T18:04:56Z

outputDataFrom This concept is used to indicate that the subject is an output from the object. 2011-06-03T18:04:56Z 2011-06-03T18:04:56Z

exampleInputDataTo This concept is used to indicate that the subject is an example input to the object. 2011-06-03T18:04:56Z 2011-06-03T18:04:56Z

exampleOutputDataFrom This concept is used to indicate that the subject is an example output from the object. 2011-06-03T18:04:56Z 2011-06-03T18:04:56Z

presentAt This concept is used to indicate that the subject was presented at the object. 2011-06-28T17:22:20Z 2011-06-28T17:22:20Z

Listing A.14: Example of User-defined Ontology

A.4 myExperiment’s VoID Specification

http://rdf.myexperiment.org/void.rdf as of 16/10/2011.

myExperiment myExperiment’s Public Dataset 980719

2165

348

50

Listing A.15: myExperiment’s VoID Specification

A.5 Example Response for RDF Pack Request

http://www.myexperiment.org/packs/8.rdf as of 28/11/2010.

]>

xmlns:meac ="&meac;" xmlns:meannot ="&meannot;" xmlns:mepack ="&mepack;" xmlns:meexp ="&meexp;" xmlns:mecontrib ="&mecontrib;" xmlns:mevd ="&mevd;" xmlns:mecomp ="&mecomp;" xmlns:mespec ="&mespec;" xmlns:rdf ="&rdf;" xmlns:rdfs ="&rdfs;" xmlns:owl ="&owl;" xmlns:dc ="&dc;" xmlns:dcterms ="&dcterms;" xmlns:cc ="&cc;" xmlns:foaf ="&foaf;" xmlns:sioc ="&sioc;" xmlns:ore ="&ore;" xmlns:dbpedia ="&dbpedia;" xmlns:snarm ="&snarm;" xmlns:xsd ="&xsd;" >

Testing123 <p>A pack that has packs</p> 2008-07-12T08:33:53Z 2008-07-12T08:33:53Z 148 55


3 2007-11-06T17:10:16Z 2007-11-06T17:10:16Z 3_9479.xml 315

3.png 2007-11-06T17:12:59Z 2008-10-09T14:04:26Z Picture 3 122 72


Testing456 <p>subpack 1</p> 2008-07-12T08:42:06Z 2008-07-12T08:42:20Z 151

2008-07-12T08:34:14Z 2008-07-12T08:34:14Z

2008-07-12T08:36:03Z 2008-07-12T08:36:03Z

2008-07-12T08:44:11Z 2008-07-12T08:44:11Z

Link

the myGrid ontology 2008-07-12T08:35:01Z 2008-07-12T08:35:01Z

2007-07-31T17:23:00Z 2009-07-27T09:48:55Z Antoon Goderis Belgium 2009-07-27T09:48:55Z 2007-07-31T17:23:00Z 1 a4dc74aba4bbadddf643199c16bb1794232d3c86


Listing A.16: Example Response for RDF Pack Request

Appendix B myExperiment Documentation

B.1 myExperiment Ontology Documentation

As of 16/10/2011.


myExperiment is a collaborative environment where scientists can safely publish their workflows and experiment plans, share them with groups and find those of others. (Please see the myExperiment Wiki for more detailed information). This results in the myExperiment data model having three main underlying features:

Content Management
Social Networking
Object Annotation

These features are the main focus of the myExperiment Ontology. Fig.1 gives a rough outline of how the main entities of myExperiment interact.

Fig.1 The Main Entities of myExperiment

The myExperiment Ontology borrows terms from a number of well-known ontologies/schemas to foster reuse by simplifying ontology alignment:

Dublin Core to provide common metadata properties.
FOAF and SIOC to describe the social network.
Creative Commons to define licenses for shared objects.
OAI-ORE to support the aggregation of resources.
DBPedia so users can specify where they are resident as a URI rather than a literal.

myExperiment borrows terms in a number of ways:

Equivalence relationships for properties or classes (e.g. Group is equivalent to SIOC's UserGroup).
Subclass/subproperty relationships (e.g. the email property is a subproperty of FOAF's mbox).
Required properties in class restrictions (e.g. an uploaded Contribution must have a Creative Commons license property).

It was decided to construct the ontology in a modularized way to further encourage reuse, where others could pick and choose the modules they wanted to use. The myExperiment ontology currently has 10 modules:

SNARM (Simple Network Access Rights Management) defines Policies for describing who can do what with certain objects.
Base provides the base elements required by myExperiment for content management, social networking and object annotation.
Attributions & Credit allows contributions to give attribution to earlier contributions and pay credit to users and groups involved in their creation.
Annotations provides the different types of annotations used in myExperiment.
Packs facilitates the use of packs to aggregate contributions and remote URLs together. (More Information)
Experiments contains the classes required to create experiments and annotate them with jobs that have been run or are scheduled to run. (More Information)
Viewings & Downloads allows usage statistics on the viewings and downloads of contributions to be recorded.
Contributions provides the different types of contributions used in myExperiment.
Components allows components within workflows to be represented. (More Information)
Specific provides classes and objects specific to myExperiment, including SNARM AccessTypes, Creative Commons Licenses, the TavernaEnactor (a Taverna-specific Runner) and the myExperiment AnonymousUser.

Fig.2 is a class hierarchy diagram for all the classes used in the myExperiment ontology, colour-coded by the module they belong to.

Fig.2 Class Hierarchy Diagram for Ontology Modules

For most of the classes described in Fig.2, RDF examples are available. Modules reuse terms from other modules as well as terms borrowed from the ontologies/schemas previously described. Fig.3 diagrams how modules borrow from each other, with each arrow going from the borrowed-from module/ontology/schema to the borrowing module. The "Base" module is built upon by all the other modules (except SNARM). The "Specific" module is slightly different as it imports all the other modules below it. This allows a single URI to be referred to when importing the myExperiment ontology set. Documentation explaining how all the ontology's classes and properties can be used is available here.

Fig.3 Ontology Modules Architecture

Content Management

The Base module contains two classes, Contribution and its subclass Upload, to represent content objects that can be managed. Contribution is a superclass to describe any content that can be contributed by a User. Upload is designed to represent only content that has been uploaded rather than created online. Upload therefore has additional properties for filename, file URL, file type and licensing.

During the course of the myExperiment project one of the key challenges was providing a flexible and user-friendly interface for users to define who can do what with their contributions. This led to the data model becoming slightly convoluted in this area. Therefore the SNARM ontology was defined to try to rationalise how this information is stored in RDF form.

SNARM uses Policies which contain one or more Access objects. Access objects can be unrestricted or restricted to an Accesser, which could be a single user, a single group or a more abstract concept such as all of a user's friends or the content owner. The second component of an Access object is the AccessType it provides, e.g. view, edit, download, etc. These types are often quite particular and therefore the Specific module is used to store these along with abstract Accessers (e.g. Friends) and abstract/unrestricted Access objects (e.g. FriendsEdit, PublicView). A more detailed explanation of SNARM can be found here.
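To illustrate how this is laid out, the sketch below walks from a contribution to its Policy and Accesses. mebase:has-policy appears in the data later in this appendix; the snarm namespace URI and the property names has-access, has-accesser and has-access-type are illustrative assumptions rather than confirmed SNARM terms, and the workflow URI is only a placeholder.

PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
# The snarm namespace and property names below are assumed for illustration.
PREFIX snarm: <http://rdf.myexperiment.org/ontologies/snarm/>

SELECT ?access ?accesser ?access_type
WHERE {
  # Placeholder contribution; any workflow, file or pack URI would do.
  <http://www.myexperiment.org/workflows/12> mebase:has-policy ?policy .
  ?policy snarm:has-access ?access .
  ?access snarm:has-access-type ?access_type .
  # Unrestricted Access objects may have no Accesser, hence OPTIONAL.
  OPTIONAL { ?access snarm:has-accesser ?accesser }
}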

Social Networking

The Base module contains three abstract classes (Actor, Request and Invitation) for social network building. From these, six object classes are derived:

User
Group
Friendship
Membership
FriendshipInvitation
MembershipInvitation

Users can request friendships with other users, which can then be accepted. Users can request membership of a group, which may be accepted by the group's owner. A group owner may also request for a user to join a group, which the user may then choose to join. All information about the friends a user has and the groups they are a member of is stored within the Friendship and Membership objects. FriendshipInvitation and MembershipInvitation are similar to their non-invitation counterparts except the request is made to an external party via an email address.

Friendships and Memberships are separate objects from the User or Group. However, it is possible to write SWRL rules to infer is-friends-with and (is/has)-member triples that can be stored as properties of the User or Group object.
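A sketch of that inference expressed as a SPARQL CONSTRUCT rather than SWRL: the mebase:is-friends-with property name is an assumption, while Friendship, has-requester, has-accepter and accepted-at all appear in the SPARQL examples later in this appendix.

PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>

# Materialise symmetric friend links from accepted Friendship objects.
# mebase:is-friends-with is a hypothetical property name.
CONSTRUCT {
  ?requester mebase:is-friends-with ?accepter .
  ?accepter mebase:is-friends-with ?requester .
}
WHERE {
  ?friendship a mebase:Friendship ;
    mebase:accepted-at ?accepted_at ;
    mebase:has-requester ?requester ;
    mebase:has-accepter ?accepter .
}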

The Base module also contains a Message class for sending messages between users within myExperiment.

Object Annotation

The Base module contains an abstract class Annotation. The Annotations module contains the types of Annotation object used in myExperiment, including:

Citation
Comment
Favourite
Rating
Review
Tagging

No data for the Annotation is stored as part of the Contribution; rather, the Contribution is pointed to by the Annotation. Fig.4 is an RDF graph of an example Annotation:

Fig.4 Example Annotation (A Tagging)
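Because the pointer runs from Annotation to Contribution, collecting every annotation on a contribution is a single query pattern. A minimal sketch (the workflow URI is a placeholder; mebase:annotates and mebase:has-annotator are used in the SPARQL examples later in this appendix):

PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?annotation ?type ?annotator
WHERE {
  # Every Annotation points at the Contribution it annotates.
  ?annotation mebase:annotates <http://www.myexperiment.org/workflows/12> ;
    rdf:type ?type ;
    mebase:has-annotator ?annotator
}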

Packs

The Packs module provides a way to aggregate myExperiment Contributions and external URLs within a single object and implements the OAI-ORE Specification (see example). Packs are a subclass of the OAI-ORE schema's Aggregation class. To allow metadata to be associated only with an AggregatedResource when it occurs within a particular Aggregation, the Packs module implements two ORE Proxy classes:

LocalPackEntry
RemotePackEntry

Remote Pack Entries are pointers to URLs (see example), whereas Local Pack Entries have a pointer (mepack:requires) to specify the myExperiment Contribution they represent (see example). Packs contain the property ore:isDescribedBy to allow the discovery of the associated ResourceMaps, allowing the Pack to be exported to another repository (see example).
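For example, a pack's aggregated items and its ResourceMaps can be listed with a sketch like the following (pack 8 is the pack used in the example response in Appendix A.5):

BASE <http://www.myexperiment.org/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>

SELECT ?item ?resource_map
WHERE {
  # Items aggregated by the pack, local and remote alike.
  <packs/8> ore:aggregates ?item .
  # ResourceMaps that describe the pack, for export to other repositories.
  OPTIONAL { <packs/8> ore:isDescribedBy ?resource_map }
}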

Relationships in Packs

As well as describing items in a Pack it is possible to describe the relationships between them. A RelationshipEntry allows two items to be linked together with a predicate. These predicates can be described in user-defined ontologies. A RelationshipEntry is similar to a LocalPackEntry or RemotePackEntry in that it refers to a Relationship and allows its use within the context of a particular Pack. It does this by using an ore:proxyFor to the Relationship and an ore:proxyIn to the Pack (see Fig.5). A RelationshipEntry differs from LocalPackEntry and RemotePackEntry because it can be used to assert a Relationship in the context of any entity, not just a Pack.

Fig.5 Example Structure of a Pack (Research Object)

A Relationship is made up of an RDF subject, predicate and object. Its URI is a UUID version 5 URN (i.e. an SHA-1 hash) generated from the concatenation of the subject, predicate and object URIs. (N.B. So that anyone can generate the same URN for the same triple, no namespace parameter is provided to the UUID-generating function). This ensures that all identical relationships have the same URI, essentially indexing relationships to make it easier to find packs that share the same relationships.
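One consequence of this indexing is that finding every pack asserting a given relationship reduces to following the ORE proxy properties named above; a sketch:

PREFIX ore: <http://www.openarchives.org/ore/terms/>

SELECT ?pack
WHERE {
  # Substitute the relationship's shared UUID v5 URN for this dummy value.
  ?relationship_entry ore:proxyFor <urn:uuid:00000000-0000-0000-0000-000000000000> ;
    ore:proxyIn ?pack
}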

Experiments

myExperiment provides cloud services to run Runnable Contributions (e.g. Workflows) on remote Runners. These cloud services are provided by the Experiments module. An Experiment is a specialisation of a Pack that aggregates Jobs (making Jobs AggregatedResources). Jobs are workflows enacted on a remote Runner. A Job contains all the information about the Workflow being run, the Runner being used, the inputs/outputs of the Job and status information. The inputs and outputs of a Job have their own URIs so they can be referenced separately. (See an example of a Job). Like Packs, Experiments have an ore:isDescribedBy property to allow the discovery of the associated ResourceMaps, allowing them to be exported to another repository.
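Since an Experiment is itself an ORE Aggregation, its Jobs can be listed the same way as pack items. A sketch: the experiments namespace follows the module URI that appears in the reasoning example later in this appendix, but the class names Experiment and Job are taken on trust from the description above.

PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX meexp: <http://rdf.myexperiment.org/ontologies/experiments/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?experiment ?job
WHERE {
  # Experiments specialise Pack, so Jobs are AggregatedResources.
  ?experiment rdf:type meexp:Experiment ;
    ore:aggregates ?job .
  ?job rdf:type meexp:Job
}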

Components

Workflows in essence are a number of interlinked processes that have initial inputs and final outputs. Using Taverna workflows as an exemplar, six types of Workflow Component have been defined:

Processor: A service that performs some processing on one or more Inputs and produces one or more Outputs.
Source: An initial piece of data to be processed by the Workflow.
Sink: A final piece of data produced by the Workflow.
Input: A piece of data going into a Processor.
Output: A piece of data coming out of a Processor.
Link: A connection between an Output of a Source or Processor and the Input of another Processor or Sink.

All these Workflow Components are encompassed within a Dataflow. Fig.6 shows how these Workflow Components interlink to form a Dataflow. Currently every publicly available Taverna workflow has an executes-dataflow property to its Dataflow.

Fig.6 Overview of Organisation of Components into a Workflow

A Workflow may have many Processors, each of which must be one of the following types:

BeanshellProcessor: Executes a Beanshell script.
ConstantProcessor: Produces a constant output, e.g. echoes a particular string.
DataflowProcessor: Executes another Dataflow, allowing workflows to be nested.
WSDLProcessor: Calls a remote WSDL service.
OtherProcessor: Represents all other types of Processor. All Processors have a processor-type property that gives a more specific categorisation for the Processor.

Due to it being possible to nest one Dataflow within another, each Workflow Component has a belongs-to-workflow property that makes it possible to execute a simple SPARQL query to find all the components of a particular workflow, as the sketch below shows.
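A sketch of that query: the components namespace and executes-dataflow appear in the workflow data shown later in this appendix, the workflow URI is a placeholder, and it is assumed belongs-to-workflow points at the Dataflow.

PREFIX mecomp: <http://rdf.myexperiment.org/ontologies/components/>

SELECT ?component
WHERE {
  # From the workflow to its Dataflow...
  <http://www.myexperiment.org/workflows/12> mecomp:executes-dataflow ?dataflow .
  # ...then to every component, however deeply nested.
  ?component mecomp:belongs-to-workflow ?dataflow
}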

Copyright © 2007 - 2011 The University of Manchester and University of Southampton

B.2 myExperiment How To SPARQL Documentation

As of 16/10/2011.

B.2.1 Using the SPARQL Endpoint



1. Using the SPARQL Endpoint

1.1. Useful Prefixes

The myExperiment SPARQL endpoint has a number of features to assist in its use. As explained in the PREFIX page, both the PREFIX and BASE clauses facilitate writing more succinct and easier-to-follow queries. If you need to include any prefixes in your query, just use the PREFIX as defined in Useful Prefixes (e.g. rdfs, mebase, etc.) and the PREFIX line will be prepended to the query before it is executed.
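For example, because the endpoint prepends the PREFIX lines for you, a query as short as the following sketch can be submitted as-is (assuming rdf and mebase are among the predefined prefixes listed under Useful Prefixes):

# No PREFIX clauses needed: the endpoint prepends rdf and mebase automatically.
SELECT ?user
WHERE { ?user rdf:type mebase:User }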

1.2. Formatting

myExperiment's triplestore, which the SPARQL endpoint queries, is updated with the previous day's data between 08:10 and 08:30 each morning UK time. The endpoint provides information about the time the latest snapshot was taken and the number of triples, so you can be sure how current the data is.

SPARQL results can be rendered in a number of formats:

HTML Table: Renders the results in an HTML table, giving a more visual way to view your results.
XML: Renders just the SPARQL results on an application/sparql-results+xml content type page.
Text: Returns a simple Subject-Predicate-Object text representation.
JSON: Returns a JSON-encoded version of the SPARQL results XML.
CSV: Returns results as comma-separated values, in table column format.
CSV Matrix: Returns results as comma-separated values in a matrix format, where the first variable is enumerated on the x-axis, the second variable is enumerated on the y-axis and 1s are rendered for each tuple.

An example of a use for the CSV Matrix format is for friendships:

PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?requester ?accepter
WHERE {
  ?friendship rdf:type mebase:Friendship ;
    mebase:accepted-at ?accepted_time ;
    mebase:has-requester ?requester ;
    mebase:has-accepter ?accepter .
}

1.3. Soft Limit

The Soft Limit option determines the amount of resources dedicated to returning all the matching results. In general 1% is sufficient. However, if all the results are not returned then a warning message will be displayed and you can try re-running the query with a greater Soft Limit percentage.

1.4. Reasoning

The Enable RDFS Reasoning option allows you to make use of 4Store Reasoner, an RDFS reasoner addition to 4Store. This will perform query-time RDFS reasoning on RDFS subClassOf, subPropertyOf, domain and range properties. The following query will return more results if RDFS reasoning is enabled, because all the super classes of Workflow are also returned:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?type
WHERE { <workflow-uri> rdf:type ?type }

Without Reasoning:
http://rdf.myexperiment.org/ontologies/contributions/Workflow

With Reasoning:
b139c71020000004f
http://rdf.myexperiment.org/ontologies/base/Submission
http://rdf.myexperiment.org/ontologies/attrib_credit/Attributable
http://rdf.myexperiment.org/ontologies/annotations/Favouritable
http://rdf.myexperiment.org/ontologies/annotations/Commentable
http://rdf.myexperiment.org/ontologies/attrib_credit/Creditable
http://rdf.myexperiment.org/ontologies/base/Upload
http://rdf.myexperiment.org/ontologies/annotations/Taggable
http://rdf.myexperiment.org/ontologies/annotations/Citationable
http://rdf.myexperiment.org/ontologies/contributions/Workflow
http://rdf.myexperiment.org/ontologies/annotations/Rateable
http://rdf.myexperiment.org/ontologies/base/Annotatable
http://rdf.myexperiment.org/ontologies/base/Contribution
http://rdf.myexperiment.org/ontologies/base/Interface
http://rdf.myexperiment.org/ontologies/base/Versionable
http://rdf.myexperiment.org/ontologies/contributions/AbstractWorkflow
http://rdf.myexperiment.org/ontologies/experiments/Runnable
http://rdf.myexperiment.org/ontologies/annotations/Reviewable

1.5. Automated Querying

If you wish to write automated queries rather than using the endpoint form, you can insert the query (in URL-encoded format) into the URL as the query parameter of an HTTP GET request. If you have built a query using the endpoint form and want to use it as an automated service in something such as a workflow, instead of clicking "Submit Query", click on "Generate Service from Query". This will take you to a page with a link like the one below, which you can copy and paste into your workflow or any HTTP-request-capable application.

http://rdf.myexperiment.org/sparql?query=PREFIX+mebase%3A+%3Chttp%3A%2F%2Frdf.myexperiment.org%2Fontologies%2Fbase%2F%3E%0D%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0ASELECT+%3Frequester+%3Faccepter+%0D%0AWHERE%7B+%0D%0A++%3Ffriendship+rdf%3Atype+mebase%3AFriendship+%3B%0D%0A++++mebase%3Aaccepted-at+%3Faccepted_time+%3B+%0D%0A++++mebase%3Ahas-requester+%3Frequester+%3B%0D%0A++++mebase%3Ahas-accepter+%3Faccepter+%0D%0A%7D

As you will notice if you click on the link above, this will return results in raw SPARQL results XML format. If you wish to get the results in a different format you can also set the formatting parameter in the GET request:

http://rdf.myexperiment.org/sparql?query=PREFIX+mebase%3A+%3Chttp%3A%2F%2Frdf.myexperiment.org%2Fontologies%2Fbase%2F%3E%0D%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0ASELECT+%3Frequester+%3Faccepter+%0D%0AWHERE%7B+%0D%0A++%3Ffriendship+rdf%3Atype+mebase%3AFriendship+%3B%0D%0A++++mebase%3Aaccepted-at+%3Faccepted_time+%3B+%0D%0A++++mebase%3Ahas-requester+%3Frequester+%3B%0D%0A++++mebase%3Ahas-accepter+%3Faccepter+%0D%0A%7D&formatting=HTML+Table

The options for the formatting parameter are:

HTML Table
XML
Text
JSON
CSV
CSV Matrix

The Soft Limit can also be set as an integer between 1 (default) and 100 by using the softlimit GET parameter:

http://rdf.myexperiment.org/sparql?query=PREFIX+mebase%3A+%3Chttp%3A%2F%2Frdf.myexperiment.org%2Fontologies%2Fbase%2F%3E%0D%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0ASELECT+%3Frequester+%3Faccepter+%0D%0AWHERE%7B+%0D%0A++%3Ffriendship+rdf%3Atype+mebase%3AFriendship+%3B%0D%0A++++mebase%3Aaccepted-at+%3Faccepted_time+%3B+%0D%0A++++mebase%3Ahas-requester+%3Frequester+%3B%0D%0A++++mebase%3Ahas-accepter+%3Faccepter+%0D%0A%7D&formatting=HTML+Table&softlimit=5

Finally, reasoning can be enabled by setting the reasoning parameter to 1, yes, or true in the GET request.

B.2.2 PREFIX



2. PREFIX

The first part of most queries is the listing of one or more prefixes:

PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?a ?text
WHERE {
  ?a rdf:type mebase:Announcement .
  ?a mebase:text ?text
}

a: http://www.myexperiment.org/announcements/6
text: myExperiment is discussed by Jim Hendler in his article Reinventing Academic Publishing, Part 3 in IEEE Intelligent Systems January/February 2008, page 2-3.

a: http://www.myexperiment.org/announcements/29
text: A new project to link the myExperiment system with two existing electronic laboratory notebook (ELN) systems developed in Southampton, currently in use in several departments, has been announced today (June 11, 2009). By integrating these ELNs with myExperiment we will pave the way for longer term takeup in the experimental laboratory science communities, including Chemistry, Physics, Biology and Engineering. The project will commence in July 2009 and will be led by Prof Jeremy Frey, who is based in the School of Chemistry at Southampton and is one of the partners in the myGrid Platform. For more info please see http://wiki.myexperiment.org/index.php/MyExperimentalScience -- Dave

a: http://www.myexperiment.org/announcements/14
text: We're pleased to announce that we've remodelled the myexperiment wiki to support the myExperiment developer community. There is now a Developers' section where those of you developing over the API can share information about your projects. It currently includes Google gadgets, the Taverna plugin, workflows for Facebook and the Java API. This section also carries information about people running their own myExperiments, and myExperiment developer documentation. The other information is now organised into Users, News and About. Previously you needed a wiki account to get at anything other than the main page, but now everything is publicly available, so you won't need to use your old accounts. We're really keen that everyone working over the API uses the wiki, if only to list who you are and what you're doing – then all the developers will be able to help out as the community grows. To get an account so that you can contribute content to the Developer pages please email Don Cruickshank on [email protected]

a: http://www.myexperiment.org/announcement/3
text: In true 'perpetual beta' fashion, we continue our updates and additions to myExperiment: In the user profile page, you can now see how many times a user has been credited for Workflows and Files in myExperiment. You can also see these credited items directly in the new "Creditations" tab, when viewing a user's profile. Clearer statistics in the Workflow, File and User pages. Improved layout for the Workflow and File pages. Feedback always welcome!

a: http://www.myexperiment.org/announcement/25
text: Our paper "Software Design for Empowering Scientists", which discusses the design principles of Taverna and myExperiment, has appeared in the January/February 2009 issue of IEEE Software ("Developing Scientific Software, Part 2"). The article is available to subscribers on http://www2.computer.org/portal/web/csdl/doi/10.1109/MS.2009.22 and the preprint is available on http://eprints.ecs.soton.ac.uk/15032/ The full reference is: De Roure, D. and Goble, C. Software Design for Empowering Scientists, IEEE Software, Volume 26, Issue 1, Jan-Feb 2009, pages 88-95. Digital Object Identifier 10.1109/MS.2009.22. Also, myExperiment, along with OpenWetWare and nanoHub, feature in the article "Research intelligence - Log on to a global laboratory" in the January 1st issue of THE (Times Higher Education), written by Sarah Collinson and Zoe Corbyn. You can find the article on http://www.timeshighereducation.co.uk/story.asp?sectioncode=26&storycode=404834 ...and Happy New Year to all our users! -- Dave

Prefixes are not required within a query; they just save rewriting the namespace each time you need to use it and make the query easier to read. The previous query could be re-written as follows if you didn't want to use prefixes:

SELECT ?a ?text
WHERE {
  ?a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.myexperiment.org/ontologies/base/Announcement> .
  ?a <http://rdf.myexperiment.org/ontologies/base/text> ?text
}


2.1. BASE

The BASE clause is similar to the PREFIX clause but a name is not required; just the URI path needs to be defined. If you wish to use the BASE clause it must precede any PREFIX clauses. A common use of the BASE clause is to define the path for the data that is being queried. E.g.

SELECT ?p ?o
WHERE { <http://www.myexperiment.org/workflows/12> ?p ?o }

p  o
http://purl.org/dc/terms/modified  2007-11-13T16:16:54Z
http://purl.org/dc/terms/hasFormat  http://www.myexperiment.org/workflow.xml?id=12
http://purl.org/dc/terms/hasFormat  http://www.myexperiment.org/workflows/12.rdf
http://www.w3.org/1999/02/22-rdf-syntax-ns#type  http://rdf.myexperiment.org/ontologies/contributions/Workflow
http://purl.org/dc/terms/title  Transcribe a DNA sequence into an RNA sequence
http://xmlns.com/foaf/0.1/homepage  http://www.myexperiment.org/workflows/12.html
http://purl.org/dc/terms/created  2007-10-03T18:35:58Z
http://purl.org/dc/terms/description  This workflow transcribes a DNA sequence into an RNA sequence
http://rdfs.org/sioc/ns#has_owner  http://www.myexperiment.org/users/43
http://rdf.myexperiment.org/ontologies/base/content-url  http://www.myexperiment.org/workflows/12/download/transcribe_a_dna_sequence_into_an_rna_sequence_27205.xml
http://rdf.myexperiment.org/ontologies/base/has-content-type  http://www.myexperiment.org/content_types/1
http://rdf.myexperiment.org/ontologies/base/filename  transcribe_a_dna_sequence_into_an_rna_sequence_27205.xml
http://rdf.myexperiment.org/ontologies/base/has-policy  http://www.myexperiment.org/workflows/12/policies/180
http://rdf.myexperiment.org/ontologies/base/has-license  http://www.myexperiment.org/licenses/1
http://rdf.myexperiment.org/ontologies/annotations/has-tagging  http://www.myexperiment.org/tags/544/taggings/236
http://rdf.myexperiment.org/ontologies/annotations/has-tagging  http://www.myexperiment.org/tags/547/taggings/300
http://rdf.myexperiment.org/ontologies/annotations/has-tagging  http://www.myexperiment.org/tags/409/taggings/234
http://rdf.myexperiment.org/ontologies/annotations/has-tagging  http://www.myexperiment.org/tags/214/taggings/299
http://rdf.myexperiment.org/ontologies/annotations/has-tagging  http://www.myexperiment.org/tags/497/taggings/235
http://rdf.myexperiment.org/ontologies/annotations/has-tagging  http://www.myexperiment.org/tags/119/taggings/301
http://rdf.myexperiment.org/ontologies/annotations/has-tagging  http://www.myexperiment.org/tags/542/taggings/302
http://rdf.myexperiment.org/ontologies/viewings_downloads/downloaded  630
http://rdf.myexperiment.org/ontologies/viewings_downloads/viewed  887
http://www.openarchives.org/ore/terms/isDescribedBy  http://www.myexperiment.org/atom_entries/workflows/12
http://www.openarchives.org/ore/terms/isDescribedBy  http://www.myexperiment.org/resource_maps/workflows/12
http://rdf.myexperiment.org/ontologies/contributions/thumbnail  http://www.myexperiment.org/workflow/image/12/thumb/transcribe_a_dna_sequence_into_an_rna_sequence_27205.png
http://rdf.myexperiment.org/ontologies/components/executes-dataflow  http://www.myexperiment.org/workflows/12/versions/2/dataflow
http://rdf.myexperiment.org/ontologies/contributions/svg  http://www.myexperiment.org/workflow/svg/12/transcribe_a_dna_sequence_into_an_rna_sequence_27205.svg
http://rdf.myexperiment.org/ontologies/contributions/preview  http://www.myexperiment.org/workflow/image/12/transcribe_a_dna_sequence_into_an_rna_sequence_27205.png
http://rdf.myexperiment.org/ontologies/base/has-current-version  http://www.myexperiment.org/workflows/12/versions/2
http://rdf.myexperiment.org/ontologies/base/last-edited-by  http://www.myexperiment.org/users/43
http://rdf.myexperiment.org/ontologies/contributions/thumbnail-big  http://www.myexperiment.org/workflow/image/12/medium/transcribe_a_dna_sequence_into_an_rna_sequence_27205.png
http://rdf.myexperiment.org/ontologies/base/has-version  http://www.myexperiment.org/workflows/12/versions/1
http://rdf.myexperiment.org/ontologies/annotations/has-comment  http://www.myexperiment.org/users/9/favourites/5
http://rdf.myexperiment.org/ontologies/annotations/has-comment  http://www.myexperiment.org/workflows/27/comments/92
http://rdf.myexperiment.org/ontologies/annotations/has-comment  http://www.myexperiment.org/workflows/27/comments/275
http://rdf.myexperiment.org/ontologies/annotations/has-comment  http://www.myexperiment.org/users/1634/favourites/47
http://rdf.myexperiment.org/ontologies/annotations/has-rating  http://www.myexperiment.org/workflows/27/ratings/59
http://rdf.myexperiment.org/ontologies/annotations/has-rating  http://www.myexperiment.org/workflows/27/ratings/15

Could be rewritten:

BASE <http://www.myexperiment.org/>
SELECT ?p ?o
WHERE { <workflows/12> ?p ?o }




B.2.3 SELECT



3. SELECT

After adding your prefixes, most SPARQL queries start with a SELECT; queries can also start with ASK, DESCRIBE or CONSTRUCT, but these will not be discussed here. The purpose of the SELECT is very similar to its use in SQL. It allows you to define which variables in your query you want values returned for. Like SQL you can list these individually or use an asterisk (*) to specify that you want values returned for each variable. E.g.

PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?a ?text
WHERE {
  ?a rdf:type mebase:Announcement .
  ?a mebase:text ?text
}


Is the same query as:

PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT *
WHERE {
  ?a rdf:type mebase:Announcement .
  ?a mebase:text ?text
}


3.1. DISTINCT

It is not uncommon for a set of results to contain duplicates. If you don't want duplicates you can append DISTINCT after SELECT.

PREFIX meannot: <http://rdf.myexperiment.org/ontologies/annotations/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
SELECT DISTINCT ?annotator_name
WHERE {
  ?comment mebase:annotates <workflow-uri> .
  ?comment rdf:type meannot:Comment .
  ?comment mebase:has-annotator ?annotator .
  ?annotator sioc:name ?annotator_name
}

Without DISTINCT (annotator_name):
Don Cruickshank
Franck Tanoh
Franck Tanoh
Franck Tanoh

With DISTINCT (annotator_name):
Don Cruickshank
Franck Tanoh

3.2. COUNT

As of SPARQL 1.1 it has been possible to apply mathematical functions to selected variables. The most straightforward of these is COUNT. The example below is a way of finding out how many public workflows myExperiment has without requiring any post-processing.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>

SELECT (COUNT(?workflow) AS ?no_workflows)
WHERE { ?workflow rdf:type mecontrib:Workflow }

no_workflows 1255

3.3. SUM

Another common mathematical function now available is SUM, which like COUNT works in the same way as it does in SQL. The following example gives the sum of the downloads of all workflows owned by the user with ID 43.

BASE <http://www.myexperiment.org/>
PREFIX mevd: <http://rdf.myexperiment.org/ontologies/viewings_downloads/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>

SELECT (SUM(?downloaded) AS ?total_downloaded)
WHERE {
  ?workflow rdf:type mecontrib:Workflow .
  ?workflow sioc:has_owner <users/43> .
  ?workflow mevd:downloaded ?downloaded
}

total_downloaded 29382

3.4. AVG

The SUM example above is probably not very useful if you want to know how popular a user's workflows are, as it is dependent on how many they have. A better way might be to work out the average number of downloads for a user's workflows. This can be found using the AVG function.

BASE <http://www.myexperiment.org/>
PREFIX mevd: <http://rdf.myexperiment.org/ontologies/viewings_downloads/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>

SELECT (AVG(?downloaded) AS ?average_downloads)
WHERE {
  ?workflow rdf:type mecontrib:Workflow .
  ?workflow sioc:has_owner <users/43> .
  ?workflow mevd:downloaded ?downloaded
}

average_downloads 367.2749999999999970618

3.5. MAX and MIN

As well as knowing the average, you might want to know the most and least times a workflow belonging to a user has been downloaded. This can be achieved using the MAX and MIN functions.

BASE <http://www.myexperiment.org/>
PREFIX mevd: <http://rdf.myexperiment.org/ontologies/viewings_downloads/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>

SELECT (MAX(?downloaded) AS ?max_downloaded) (MIN(?downloaded) AS ?min_downloaded)
WHERE {
  ?workflow rdf:type mecontrib:Workflow .
  ?workflow sioc:has_owner <users/43> .
  ?workflow mevd:downloaded ?downloaded
}

max_downloaded min_downloaded 3250 1

These examples of the use of mathematical functions in the SELECT clause are quite basic and return just a single row of results. The GROUP BY function allows aggregation on a particular subject, so you could produce a league table of users by the total number of downloads for their workflows, as the sketch below illustrates.
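A sketch of such a league table, reusing the prefixes from the examples above:

PREFIX mevd: <http://rdf.myexperiment.org/ontologies/viewings_downloads/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>

SELECT ?owner (SUM(?downloaded) AS ?total_downloaded)
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
    sioc:has_owner ?owner ;
    mevd:downloaded ?downloaded
}
GROUP BY ?owner
ORDER BY DESC(?total_downloaded)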



B.2.4 WHERE



4. WHERE

The WHERE clause of a SPARQL query defines the graph pattern used to find values for the variables you have defined in the SELECT clause. Inside the curly braces {} the basic unit is a triple; this is made up of three components: a subject, a predicate and an object. This is much like the grammatical structure of a basic natural language sentence.

The boy (Subject) catches (Predicate) a ball (Object).

In SPARQL the subject, predicate or object can take one of two forms: a variable, which is defined by putting a ? before the variable name (e.g. ?a, ?text, etc.), or a Universal Resource Identifier (URI). As discussed in PREFIX (and BASE) this can take one of two forms depending on whether a prefix for the namespace of the URI has been defined within the query. These triples can then be concatenated together using a full-stop (.). By doing this you can build an interconnected graph of nodes joined by relationships. It is generally good to ensure that each triple connects at some point in the graph. The first of the following queries is a completely connected graph whereas the second isn't, and will therefore return the superset of all possible homepage/mbox combinations. This is such a high number of results that there is intentionally no link to run this query, as the webserver will run out of memory trying to process all the results.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?homepage ?mbox
WHERE {
  ?x rdf:type mebase:User .
  ?x foaf:homepage ?homepage .
  ?x foaf:mbox ?mbox
}

homepage  mbox
http://www.linkedin.com/in/drneilravenhill  mailto:[email protected]
http://www.msu.edu/~lampecli  mailto:[email protected]
http://pbeltrao.blogspot.com  mailto:[email protected]
http://bibliogum.wordpress.com/  mailto:[email protected]
http://www.leaphish.com/102/  mailto:[email protected]
http://www.informatics.sussex.ac.uk/escience-usability  mailto:[email protected]
http://www.science.uva.nl/~adam  mailto:[email protected]
http://www.rug.nl/staff/m.a.swertz  mailto:[email protected]
http://www.ucps.k12.nc.us  mailto:[email protected]
http://libresoft.es  mailto:[email protected]
http://ianfoster.typepad.com  mailto:[email protected]

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?homepage ?mbox
WHERE {
  ?x rdf:type mebase:User .
  ?x foaf:homepage ?homepage .
  ?y foaf:mbox ?mbox
}

As in the query above, it is not unusual to have the same subject multiple times over. A semi-colon (;) rather than a full-stop (.) can be used after each triple to replace the subject for the next triple if it is the same, leaving you only needing to define the predicate and object. This is useful because it helps reduce the chance of typos like the one previously described. The query below is the same as the previous query but uses this shorthand syntax:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?homepage ?mbox
WHERE {
  ?x rdf:type mebase:User ;
    foaf:homepage ?homepage ;
    foaf:mbox ?mbox
}


Sometimes as well as having the same subject multiple times over you may also have the same predicate. In this case you can use a comma (,) to separate each object. Another way to save time writing your query is to use 'a' rather than rdf:type to specify the type of a particular entity.

BASE <http://www.myexperiment.org/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX mepack: <http://rdf.myexperiment.org/ontologies/packs/>

SELECT ?pack
WHERE {
  ?pack a mepack:Pack ;
    ore:aggregates <item-uri-1> , <item-uri-2>
}

pack
http://www.myexperiment.org/packs/1
http://www.myexperiment.org/packs/47

4.1. UNION

The UNION clause allows you to return results where you want to match multiple patterns. An example of this may be returning all comments and ratings for a particular workflow:

BASE <http://www.myexperiment.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX meannot: <http://rdf.myexperiment.org/ontologies/annotations/>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
SELECT ?annotation ?annotator
WHERE {
  ?annotation mebase:annotates <workflows/72> .
  { ?annotation rdf:type meannot:Comment } UNION { ?annotation rdf:type meannot:Rating } .
  ?annotation mebase:has-annotator ?annotator
}

annotation  annotator
http://www.myexperiment.org/workflows/72/comments/33  http://www.myexperiment.org/users/18
http://www.myexperiment.org/workflows/72/ratings/63  http://www.myexperiment.org/users/283
http://www.myexperiment.org/workflows/72/comments/163  http://www.myexperiment.org/users/18
http://www.myexperiment.org/workflows/72/ratings/132  http://www.myexperiment.org/users/690

4.2. OPTIONAL

There may be occasions where you want to include additional variables in your search but you still want to return results if there are no values for these variables. An example might be that you want to return the name, homepage and email address for users. Some users may not have set a homepage and/or email address but you still want to know their name:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?name ?homepage ?email
WHERE {
  ?user rdf:type mebase:User ;
    sioc:name ?name .
  OPTIONAL { ?user foaf:homepage ?homepage } .
  OPTIONAL { ?user foaf:mbox ?email }
}

name  homepage  email
Workflowxdong
Piculin
Daniel Kornev  http://blogs.msdn.com/semantics/  mailto:[email protected]
Ramesh kuc
John locke
Nikolas
Nlynch
kondas
José Manuel Rodríguez  http://www.inab.org  mailto:[email protected]
Babo1ug
Funkyd101  mailto:[email protected]
Hong harper  mailto:[email protected]



B.2.5 FILTER



5. FILTER

The FILTER clause is used within the curly braces {} as a subclause of the WHERE clause. As its name suggests, it allows you to filter the results based on certain conditions. Most commonly you may want to filter on text, but you can also filter on numbers and dates.

5.1. On Text

An example of a query where you want to filter on text is when you want to find all Taverna workflows (both 1 and 2):

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?workflow ?ct_title
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
    mebase:has-content-type ?ct .
  ?ct dcterms:title ?ct_title
  FILTER regex(?ct_title, '^taverna', 'i')
}

workflow  ct_title
http://www.myexperiment.org/workflows/167  Taverna 1
http://www.myexperiment.org/workflows/932  Taverna 2 beta
http://www.myexperiment.org/workflows/193  Taverna 1
http://www.myexperiment.org/workflows/874  Taverna 1
http://www.myexperiment.org/workflows/519  Taverna 1
http://www.myexperiment.org/workflows/545  Taverna 1

The regex operand allows you to compare two text strings. In the example above this compares the value of ?ct_title with the text string 'taverna'. The caret (^) sign is used to indicate that the string for ?ct_title must start with 'taverna', not just have it somewhere within the string. The 'i' as the third parameter for the regex operand means that the regular expression is case insensitive; if you wish it to be case sensitive only the first two parameters are required.

Sometimes you may want to pattern match against a variable that is a URI rather than a literal. To do this you need to use the str function to convert the URI into a string. An example of this is finding all the accepted memberships where a group asked a user to join:

PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?membership ?requester
WHERE {
  ?membership rdf:type mebase:Membership ;
              mebase:has-requester ?requester ;
              mebase:has-accepter ?accepter ;
              mebase:accepted-at ?accepted_at
  FILTER regex(str(?requester), 'Group', 'i')
}

Example results:

membership                                            requester
http://www.myexperiment.org/users/30/memberships/96   http://www.myexperiment.org/groups/51
http://www.myexperiment.org/users/2646/memberships/825   http://www.myexperiment.org/groups/195
http://www.myexperiment.org/users/2213/memberships/884   http://www.myexperiment.org/groups/211
http://www.myexperiment.org/users/2611/memberships/745   http://www.myexperiment.org/groups/187
http://www.myexperiment.org/users/13/memberships/721     http://www.myexperiment.org/groups/194

5.2. On Numbers In some cases you may want to query on a number being greater (or less) than a particular value. The FILTER clause allows inequalities (and equalities) to be defined as criteria to filter on. An example of this may be to find all the workflows that have a rating of 4 or greater:

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX meannot: <http://rdf.myexperiment.org/ontologies/annotations/>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?workflow ?ct_title
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
            mebase:has-content-type ?ct .
  ?ct dcterms:title ?ct_title .
  ?rating rdf:type meannot:Rating ;
          mebase:annotates ?workflow ;
          meannot:rating-score ?score
  FILTER (?score >= 4)
}

Example results:

workflow                                      ct_title
http://www.myexperiment.org/workflows/244     Taverna 1
http://www.myexperiment.org/workflows/742     SimileXMLv3
http://www.myexperiment.org/workflows/19      Taverna 1
http://www.myexperiment.org/workflows/133     Taverna 1
http://www.myexperiment.org/workflows/735     Excel 2007 Macro-Enabled Workbook
http://www.myexperiment.org/workflows/90      Taverna 1
http://www.myexperiment.org/workflows/549     Chemistry Plan

You may want to filter on more than one criterion. For the above example you may only want Taverna 1 workflows that are rated at least 4 out of 5:

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX meannot: <http://rdf.myexperiment.org/ontologies/annotations/>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?workflow ?ct_title
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
            mebase:has-content-type ?ct .
  ?ct dcterms:title ?ct_title .
  ?rating rdf:type meannot:Rating ;
          mebase:annotates ?workflow ;
          meannot:rating-score ?score
  FILTER (?score >= 4 && regex(?ct_title, '^Taverna 1'))
}

Example results:

workflow                                      ct_title
http://www.myexperiment.org/workflows/173     Taverna 1
http://www.myexperiment.org/workflows/40      Taverna 1
http://www.myexperiment.org/workflows/66      Taverna 1
http://www.myexperiment.org/workflows/235     Taverna 1

5.3. On Dates Often one of the most useful filters is for finding things that have been added/modified before or after a certain date. You may want to find all the workflows that have been added since the beginning of September 2009:

PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?workflow ?added
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
            dcterms:created ?added
  FILTER ( ?added >= xsd:dateTime('2009-09-01T00:00:00Z') )
}

Example results:

workflow                                      added
http://www.myexperiment.org/workflows/932     2009-10-20T17:00:59Z
http://www.myexperiment.org/workflows/952     2009-11-16T12:37:25Z
http://www.myexperiment.org/workflows/906     2009-09-08T15:12:57Z
http://www.myexperiment.org/workflows/955     2009-11-16T18:10:54Z
http://www.myexperiment.org/workflows/912     2009-09-15T11:32:52Z



B.2.6 GROUP BY



6. GROUP BY

The purpose of the GROUP BY clause is to allow aggregation over one or more properties. This is particularly useful when you want to use mathematical functions on variables in the SELECT clause. A good example is using COUNT to list how many workflows are owned by each user:

PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?user (COUNT(?workflow) AS ?no_workflows)
WHERE {
  ?workflow rdf:type mecontrib:Workflow .
  ?workflow sioc:has_owner ?user .
}
GROUP BY ?user

Example results:

user                                        no_workflows
http://www.myexperiment.org/users/6890      3
http://www.myexperiment.org/users/36        2
http://www.myexperiment.org/users/10173     1
http://www.myexperiment.org/users/884       2
http://www.myexperiment.org/users/87        1
http://www.myexperiment.org/users/8674      1
http://www.myexperiment.org/users/12835     12
http://www.myexperiment.org/users/7486      1
http://www.myexperiment.org/users/4533      11
http://www.myexperiment.org/users/83        1

The GROUP BY clause can also be used with the SUM function to, for example, get the total number of downloads across all the workflows owned by each user:

PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX mevd: <http://rdf.myexperiment.org/ontologies/viewings_downloads/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?user (SUM(?downloaded) AS ?total_downloads)
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
            sioc:has_owner ?user ;
            mevd:downloaded ?downloaded
}
GROUP BY ?user

Example results:

user                                        total_downloads
http://www.myexperiment.org/users/6890      351
http://www.myexperiment.org/users/36        520
http://www.myexperiment.org/users/10173     359
http://www.myexperiment.org/users/884       792
http://www.myexperiment.org/users/87        667
http://www.myexperiment.org/users/8674      60

Again, the GROUP BY clause can also be used with the AVG, MAX and MIN functions to get, for example, the average, maximum and minimum ratings of workflows:

PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX meannot: <http://rdf.myexperiment.org/ontologies/annotations/>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?workflow (AVG(?rating_score) AS ?avg_rating) (MAX(?rating_score) AS ?max_rating) (MIN(?rating_score) AS ?min_rating)
WHERE {
  ?workflow rdf:type mecontrib:Workflow .
  ?rating rdf:type meannot:Rating ;
          mebase:annotates ?workflow ;
          meannot:rating-score ?rating_score
}
GROUP BY ?workflow

Example results:

workflow                                      avg_rating              max_rating   min_rating
http://www.myexperiment.org/workflows/240     5                       5            5
http://www.myexperiment.org/workflows/761     4.999999999999999999    5            5
http://www.myexperiment.org/workflows/1402    4                       4            4
http://www.myexperiment.org/workflows/20      3.9999999999999999992   5            3
http://www.myexperiment.org/workflows/6       4.4999999999999999991   5            4
http://www.myexperiment.org/workflows/688     4                       4            4

N.B. Currently (March 9th 2011) 4Store does not handle floating point numbers properly, so average results may need to be manually rounded, for example turning 4.4999999999999999991 into 4.5.



B.2.7 ORDER BY



7. ORDER BY

The filter-on-dates example restricts the workflows returned; however, you may want to go further and list the most recent workflows first. The ORDER BY clause can be used to do this:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?workflow ?added
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
            dcterms:created ?added
  FILTER ( ?added >= xsd:dateTime('2009-09-01T00:00:00Z') )
}
ORDER BY DESC(?added)

Example results:

workflow                                      added
http://www.myexperiment.org/workflows/1008    2009-12-22T20:45:54Z
http://www.myexperiment.org/workflows/1005    2009-12-15T22:33:09Z
http://www.myexperiment.org/workflows/1004    2009-12-15T22:17:56Z
http://www.myexperiment.org/workflows/1003    2009-12-15T22:17:11Z
http://www.myexperiment.org/workflows/1002    2009-12-15T22:16:23Z

The DESC operator orders the results in descending order, in this case with the latest workflows first. If you want them ordered ascending you don't require any operator, as ascending is the default. If you want a second criterion to order on, you just add it after the first. An example of this might be to order by workflow type with the latest of each type first:

PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?workflow ?added ?ct_title
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
            dcterms:created ?added ;
            mebase:has-content-type ?ct .
  ?ct dcterms:title ?ct_title
  FILTER ( ?added >= xsd:dateTime('2009-09-01T00:00:00Z') )
}
ORDER BY ?ct_title DESC(?added)

Example results:

workflow                                      added                  ct_title
http://www.myexperiment.org/workflows/36      2010-01-12T13:58:46Z   Taverna 1
http://www.myexperiment.org/workflows/15      2009-12-04T16:04:38Z   Taverna 1
http://www.myexperiment.org/workflows/994     2009-12-04T10:47:04Z   Taverna 1
http://www.myexperiment.org/workflows/1005    2009-12-15T22:33:09Z   Taverna 2 beta
http://www.myexperiment.org/workflows/1004    2009-12-15T22:17:56Z   Taverna 2 beta
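If you prefer to be explicit, ascending order can also be written with the ASC operator; the following two ORDER BY lines are equivalent:

ORDER BY ?ct_title DESC(?added)
ORDER BY ASC(?ct_title) DESC(?added)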

By default, ORDER BY will treat the field(s) it is ordering on as alphanumeric. In some cases it is necessary to treat a field as numeric. An example of this is a list of the most downloaded workflows. Below is an example of how to treat ?downloaded as a numeric field.

BASE <http://www.myexperiment.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX mevd: <http://rdf.myexperiment.org/ontologies/viewings_downloads/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>

SELECT ?workflow ?downloaded
WHERE {
  ?workflow rdf:type mecontrib:Workflow .
  ?workflow sioc:has_owner ?owner .
  ?workflow mevd:downloaded ?downloaded
}
ORDER BY DESC(xsd:nonNegativeInteger(?downloaded))

Example results:

Alphanumeric Ordering                                     Numeric Ordering
workflow                                    downloaded    workflow                                    downloaded
http://www.myexperiment.org/workflows/1392  99            http://www.myexperiment.org/workflows/140   7852
http://www.myexperiment.org/workflows/12    988           http://www.myexperiment.org/workflows/10    3250
http://www.myexperiment.org/workflows/95    985           http://www.myexperiment.org/workflows/16    2918
http://www.myexperiment.org/workflows/1215  98            http://www.myexperiment.org/workflows/72    2631
http://www.myexperiment.org/workflows/1105  98            http://www.myexperiment.org/workflows/15    2542
http://www.myexperiment.org/workflows/1224  98            http://www.myexperiment.org/workflows/124   2303
http://www.myexperiment.org/workflows/1246  98            http://www.myexperiment.org/workflows/154   2162
http://www.myexperiment.org/workflows/1315  98            http://www.myexperiment.org/workflows/158   2008
http://www.myexperiment.org/workflows/75    978           http://www.myexperiment.org/workflows/79    1788
http://www.myexperiment.org/workflows/39    976           http://www.myexperiment.org/workflows/19    1617



B.2.8 LIMIT



8. LIMIT

Sometimes you may not want all possible results. The LIMIT clause allows you to limit how many results are returned. In the examples used in FILTER and ORDER BY you may only want the latest 10 workflows:

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?workflow ?added
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
            dcterms:created ?added
}
ORDER BY DESC(?added)
LIMIT 10

Example results:

workflow                                      added
http://www.myexperiment.org/workflows/1010    2010-01-12T13:58:46Z
http://www.myexperiment.org/workflows/1009    2010-01-04T17:42:36Z
http://www.myexperiment.org/workflows/1008    2009-12-22T20:45:54Z
http://www.myexperiment.org/workflows/1005    2009-12-15T22:33:09Z
http://www.myexperiment.org/workflows/1004    2009-12-15T22:17:56Z
http://www.myexperiment.org/workflows/1003    2009-12-15T22:17:11Z
http://www.myexperiment.org/workflows/1002    2009-12-15T22:16:23Z
http://www.myexperiment.org/workflows/1001    2009-12-15T22:15:21Z
http://www.myexperiment.org/workflows/1000    2009-12-15T22:14:39Z
http://www.myexperiment.org/workflows/999     2009-12-15T22:13:44Z

8.1 OFFSET Once you have the first 10 workflows you might want to get the next 10. The OFFSET clause allows you to do this:

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?workflow ?added
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
            dcterms:created ?added
}
ORDER BY DESC(?added)
LIMIT 10
OFFSET 10

Example results:

workflow                                      added
http://www.myexperiment.org/workflows/998     2009-12-15T22:12:50Z
http://www.myexperiment.org/workflows/997     2009-12-15T22:12:00Z
http://www.myexperiment.org/workflows/996     2009-12-15T22:10:36Z
http://www.myexperiment.org/workflows/995     2009-12-04T16:04:38Z
http://www.myexperiment.org/workflows/994     2009-12-04T10:47:04Z
http://www.myexperiment.org/workflows/993     2009-12-01T06:56:54Z
http://www.myexperiment.org/workflows/992     2009-12-01T06:28:26Z
http://www.myexperiment.org/workflows/991     2009-12-01T06:24:54Z
http://www.myexperiment.org/workflows/990     2009-12-01T04:08:08Z
http://www.myexperiment.org/workflows/989     2009-12-01T03:01:28Z

If you then want the third set of 10 you can change to OFFSET 20, as shown below. The OFFSET clause can also be used without the LIMIT clause; by removing the LIMIT clause in the example above you will get all but the 10 latest workflows.
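The final three clauses of the query would then read:

ORDER BY DESC(?added)
LIMIT 10
OFFSET 20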



B.2.9 Troubleshooting



9. Troubleshooting

When trying to execute a query you may get one or more warning/error messages. If you are using the query form with HTML Table results these messages will be shown in a red box just above the query text box, e.g.:

parser warning: Variable q was selected but is unused in the query. at line 1

If you request raw SPARQL results XML these messages will appear within a comment tag in the XML itself, usually between the head and results tags.
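A sketch of what this can look like in the results XML (the element layout follows the SPARQL Query Results XML Format; the exact comment text and placement will vary by endpoint):

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="workflow"/>
  </head>
  <!-- parser warning: Variable q was selected but is unused in the query. at line 1 -->
  <results>
    ...
  </results>
</sparql>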

Generally these messages are fairly self-explanatory, but here are some tips for how you might go about resolving them.

9.1. Syntax Errors Error messages generally indicate syntactic errors in the query itself. Commonly this is because the clauses have not been defined in the right order. The type of error message you receive may look something like this:

parser error: syntax error, unexpected WHERE, expecting AS at line 6

Below is a rough guide to how the clauses described in this tutorial should be ordered (other orderings may also work):

BASE
PREFIX
SELECT
WHERE {
  UNION / OPTIONAL
  FILTER
}
GROUP BY
ORDER BY
LIMIT
OFFSET
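For illustration only, here is a query reusing the prefixes from earlier examples that exercises most of these clauses in the order above (the OPTIONAL triple assumes, hypothetically, that workflows carry a dcterms:title; GROUP BY is omitted as it requires an aggregate SELECT):

BASE <http://www.myexperiment.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>

SELECT ?workflow ?added
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
            dcterms:created ?added .
  OPTIONAL { ?workflow dcterms:title ?title }   # hypothetical property, for illustration
  FILTER ( ?added >= xsd:dateTime('2009-09-01T00:00:00Z') )
}
ORDER BY DESC(?added)
LIMIT 10
OFFSET 10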

A second common syntactic error is failing to put a dot between triples in the WHERE clause. This is why it is often good practice to start a new line for each triple and indent appropriately when using the semicolon (;) operator. E.g.

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
PREFIX meannot: <http://rdf.myexperiment.org/ontologies/annotations/>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?workflow ?ct_title
WHERE {
  ?workflow rdf:type mecontrib:Workflow ;
            mebase:has-content-type ?ct .
  ?ct dcterms:title ?ct_title .
  ?rating rdf:type meannot:Rating ;
          mebase:annotates ?workflow ;
          meannot:rating-score ?score
  FILTER (?score >= 4)
}

SPARQL queries often use several levels of parentheses: either curly parentheses {} in WHERE, OPTIONAL and UNION clauses, or round parentheses () in FILTER clauses. If you get an error message like the one below, you should check that all your parentheses pair up.

parser error: syntax error, unexpected $end, expecting '}' at line 11

9.2. Complexity Warnings If you run a query that is fairly complex (e.g. lots of triples, many OPTIONAL / UNION clauses, complicated FILTER clauses, etc.) and is expected to return quite a lot of results, you may receive a complexity warning message like the one below:

warning: hit complexity limit 8 times, increasing soft limit may give more results

Usually you will receive some results, but the query engine cannot guarantee it has returned all of them. This can usually be resolved by increasing the Soft Limit to 5%, 10%, 20%, etc. until you cease to get this warning message. Sometimes you may receive a complexity warning because there is a semantic error in your query, such as a typo in one of your variable names, which means your triples don't link together in the way you intended.
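For instance, in the sketch below the misspelt ?workflw in the second triple pattern never joins with ?workflow in the first, so the two patterns produce a cross product instead of the intended restriction:

?workflow rdf:type mecontrib:Workflow .
?workflw dcterms:created ?added    # typo: never joins with ?workflow above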

9.3. Parser Warnings Parser warnings like the one shown at the top of this page mean that the query was syntactically correct and able to execute, but that there may be some mistake in your query causing you not to get the results you expected. The warning below is a classic example of where a variable name in the SELECT clause does not match up with a variable in the WHERE clause:

parser warning: Variable nmae was selected but is unused in the query. at line 1
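The fix is simply to make the two names agree, e.g. (a hypothetical minimal case):

# triggers the warning: ?nmae is never bound in the WHERE clause
SELECT ?nmae WHERE { ?user sioc:name ?name }

# corrected: the SELECT variable matches the bound variable
SELECT ?name WHERE { ?user sioc:name ?name }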



B.3 myExperiment Ontology Change Log

As of 16/10/2011.

12/10/2011 Changed mebase:User to be equivalent to a SIOC UserAccount rather than a SIOC User, which no longer exists in the current specification. A new mebase:Person class will be added shortly, with person-based triples migrated to it.

03/10/2011 Corrected rdfs:label for Tagging in the Annotations module, as it was just set to Tag. Corrected the wrong namespace for Entry when defining PackEntry as a subClassOf it in the Packs module. Also modified all of the ontology module descriptors to include owl:versionInfo, so that dc:date captures the original modularisation date of the ontology and owl:versionInfo captures the last modification date of the ontology module.

02/09/2011 Corrected rdfs:label for RestrictedAccess as it was just set to Access.

18/07/2011 Replaced PackRelationship with RelationshipEntry as a more generic way of associating a Relationship with an Aggregation. Created an Entry class to hang PackEntry and RelationshipEntry off as subclasses.

03/07/2011 Removed Concepts from the Packs module; predicates of relationships are now defined as OWL ObjectProperties that make up user-defined ontologies. Vocabulary moved to the Annotations module to be used to represent a set of tags.

09/03/2011 Updated the Packs module to create a separate Relationship class that has PackRelationship as an ore:Proxy to describe it in the context of the Pack. Fixed some other subclassing issues with PackEntry and Vocabulary.

10/02/2011 Added PackRelationship to Packs module to allow relationship between items in Packs to be defined. Moved Vocabularies from Contributions to Packs module to support PackRelationships. Added Concepts to Packs module, as a specific form of SKOS Concept to define the predicate of a PackRelationship.

25/11/2010 Added foaf:name to User as it is more commonly used than sioc:name, and added restrictions for sioc:has_member and sioc:member_of to Group and User respectively.

15/09/2010 Added DBPedia’s residence property as potential field(s) for a User.

01/07/2010 Added various has-x properties to allow users in the Linked Data world to more easily navigate around myExperiment entities.

01/07/2010 Migrated RDF to use Linked Data Non-Information Resource URIs. This has also involved providing information about the RDF document itself and not just the entity it is describing.

01/07/2010 The consensus seems to be that xsd:anyURI should only be used if the resource is non-resolvable, so these have been switched to unrestricted ObjectProperties where URIs can be resolved. Also removed mebase:human-start-page as it is covered by the foaf:homepage in the description of the graph.

19/05/2010 Added ranges to properties reused from non-myExperiment ontologies.

05/04/2010 Made Vocabulary a subclass of SKOS ConceptScheme and Tag a subclass of Concept, changing its dcterms property to skos:prefLabel.

31/03/2010 Removed accessed-from-site as this can be inferred from user-agent. Also removed the retroactive assignment of human-start-page as a property restriction for Policy, as no such thing currently exists.

30/03/2010 Made Version a subclass of Interface rather than Contribution.

24/03/2010 Removed Use AccessType and associated Accesses from specific module as they are unused.

24/03/2010 Fixed a syntax error, changing foaf:based-near to foaf:based_near. Added Commentable as a subclass to File, and Commentable, Favouritable and Taggable to Pack (and by inheritance Experiment). Added Taggable and Commentable as subclasses to Group retroactively in the Specific module.

22/03/2010 Removed the cc:License definition in the Specific module as these have been replaced by myExperiment's License class, for which instances can be found at http://rdf.myexperiment.org/Licenses.

21/03/2010 Added the DBPedia ontology's residence property to Users. This essentially uses DBPedia URIs rather than text strings and should in time replace the foaf:based_near and mebase:country properties.

03/02/2010 Anonymised usage statistics (i.e. Viewings and Downloads) will no longer be generated, due to the excessive amount of processor time taken to keep them up to date versus their actual usefulness.

12/01/2010 The Components module has been completely rewritten to encompass Dataflows and to represent extra data exposed through the new Taverna GEMs.

28/09/2009 Removed has-license property from Base module, cc:license should be used instead.

26/08/2009 To improve Linked Data compliance, object properties have been changed to anyURI-typed datatype properties where the object of these properties is not intended to resolve to RDF. Also a file extension has been added to the filename property for workflows and workflow versions.

22/06/2009 Converted mebase:uri from anyURI datatype property to an object property and replaced mepack:alternate-uri with rdfs:seeAlso.

19/06/2009 Upgraded Jobs to Contributions to support ORE Aggregation capability.

26/05/2009 Removed dcterms:type from Workflows, WorkflowVersions and Files in favour of mebase:has-content-type. ContentType contains dcterms:title as a human-readable label for the type and dcterms:format for the MIME type.

Glossary

4Store A triplestore application developed by Garlik. Originally developed as 3Store at the University of Southampton.

ABCDE Annotations, Background, Contribution, Discussion and Entities. A discourse model proposed by (de Waard and Tel, 2006).

AEDS Automated Experiment Driver Software. Software that allows a user to manage the analysis of data captured from a chemical sample.

AKT Advanced Knowledge Technologies. A six-year multi-institution project to investigate technologies for the “capture, modelling, publishing, reuse and management of knowledge”.

AllegroGraph A modern, high-performance, persistent RDF graph database, capable of scaling to billions of triples.

Angelfire A web-hosting service that facilitated the development of virtual communities.

ANNIE A Nearly-New Information Extraction system. A tool built using GATE by combining CREOLE modules.

Annotea A system that allows metadata annotations in RDF format to be made about a web document, external to the document itself.

AO Annotation Ontology. An ontology for annotating scientific documents on the web.

APACE Application Advanced Computing Exchange.

Apache A well-known HTTP (Web) server application.

ASP Active Server Pages. A Web-scripting language.

aTag Associative tags. “Snippets of HTML that capture the information that is most important to you in a machine-readable, interlinked format”.

ATN Augmented Transition Network. A type of transformational grammar that is a modified non-deterministic pushdown automaton for traversing finite-state grammars, capable of supporting transformational rules.

Atom An XML-based document format that describes lists of related information known as ‘feeds’.

BFO Basic Formal Ontology. Focused on the task of providing a genuine upper ontology which can be used in support of domain ontologies developed for scientific research.

BIBO Bibliographic ontology. An ontology for modelling bibliographic data.

Boundary Object From the field of Sociology. A means of sharing research entities amongst a heterogeneous group of researchers with different viewpoints and understandings.

CERN The European Organisation for Nuclear Research.

CiTO Citation Typing Ontology. An ontology for modelling bibliographic data.

Classmates.com An early US social networking site designed for rediscovering old school friends.

CoAKTinG Collaborative Advanced Knowledge Technologies in the Grid.

CombeChem One of the original UK e-Science platform projects. Intended primarily to support chemists in managing the large amounts of data produced through combinatorial methods.

Corpus A collection of natural language text that can be parsed.

Creative Commons A project that provides “free licenses and legal tools” to allow content creators to licence their work “so others can share, remix and use commercially”.

CREOLE Collection of REusable Objects for Language Engineering. A library that encapsulates pre-existing or user-defined tools and resources, so they can be used within GATE.

CWA Concept Web Alliance. A group “devoted to the Concept Web: a dynamic, interactive fabric of concepts and their relationships”.

DAML DARPA Agent Markup Language. Has been amalgamated with OIL to produce OWL.

DBPedia A project with the aim of producing a machine-readable transcription of Wikipedia.

DCMI Dublin Core Metadata Initiative. An organisation set up to define standards, vocabularies and practices for metadata.

Deep Structure The most basic form of the sentence that effectively stores the semantic meaning. Can be used to generate more sophisticated sentences through the application of transformational rules.

Diaspora* An alternative to social networking sites like Facebook. Using a distributed network of Web server ‘seeds’, it aims to provide clear, contextual interfaces to give users complete control over who they share their content with.

DLG Directed Labelled Graph.

DoCO Document Component Ontology. An ontology “describing the component parts of a bibliographic document.” Part of the SPAR ontologies suite.

Dublin Core A set of vocabularies for metadata, defined in RDF Schema.

e-Laboratories A number of VRE and similar projects run at the University of Manchester and/or University of Southampton.

eCrystals archive An archive for storing the results and analysis of chemical crystal samples.

EFO Experimental Factor Ontology.

EliCIT A web-based knowledge elicitation system for planned experiments. Part of the CombeChem project.

ELIZA A fairly simple system written in Lisp designed to impersonate a psychiatrist in an attempt to pass the Turing test.

ELN Electronic Lab Notebook. Digital replacement for a paper-based lab book.

Facebook A social networking website, which as of 2009 held one of the greatest market shares.

Flash Mob An organised event where a large group of people suddenly assemble together in a public place to perform an unusual act before dispersing.

FOAF Friend Of A Friend. A project to define machine-readable profile documents, describing the person concerned and amongst other things the people they know.

Forum Also known as a message board. An online discussion site where people can hold conversations in the form of posted messages.

FRBR Functional Requirements for Bibliographic Records.

FriendFeed A social networking site that supports FOAF profile documents.

Friends Reunited One of the first social networking sites to become popular in the UK. Designed for rediscovering old school friends and based on the US site Classmates.com.

GATE General Architecture for Text Engineering. An architecture allowing resources for processing natural language to be pieced together to build a workflow that can be executed over some text.

Gem A package of Ruby code that bundles installation information along with the files to install.

GeoCities A web-hosting service that facilitated the development of virtual communities.

GINSENG Guided Input Natural language Search ENGine. A Question-Answering system that uses Semantic Web technologies.

GO The Gene Ontology. “A major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases”.

GPSG Generalized Phrase Structure Grammar. An extension of Chomsky’s Transformational Grammar with a greater emphasis on the semantics of the natural language.

Graphviz An open source piece of software for graph visualisation.

GUI Graphical User Interface.

HCLS Healthcare and Life Sciences. An Interest Group of the W3C.

HPC High Performance Computing. Systems capable of processing large amounts of data or running complex simulations.

HPC/NA High Performance Computing / Numerical Analysis.

HPSG Head-driven Phrase Structure Grammar. A type of transformational grammar based on GPSG but providing a much simpler context-free grammar, at the expense of a much more complex lexicon.

IANA Internet Assigned Numbers Authority. Responsible for the global coordination of the DNS Root, IP addressing, and other Internet protocol resources.

IDE Integrated Development Environment.

IETF Internet Engineering Task Force. Has the goal of making the Internet work better.

in silico An adjective describing a process or workflow that is performed on a computer.

in vitro Experimentation not carried out on a living organism but within controlled conditions. Also known as ex vivo.

in vivo Experimentation within or upon a living organism.

IT Information Technology.

JAPE Java Annotation Patterns Engine. Rule builder for document annotation with tags.

Jena A triplestore application and Java RDF library.

JISC Joint Information Systems Committee. A UK research funding body.

JSON JavaScript Object Notation. “A lightweight text-based open standard designed for human-readable data interchange”.

KEfED Knowledge Engineering for Experimental Design. A model that allows for the representation of the pieces of data that are part of a scientific experiment, allowing relationships to be built up between the different pieces of data and the other components that make up the experiment.

Kepler A free and open source scientific workflow application.

Kowari A massively scalable “metastore” database for storing RDF and OWL metadata.

LFG Lexical Functional Grammar. A transformational grammar with two levels of syntactic description: constituent structure and functional structure. The former represents the phrase structure grammar; the latter the grammatical functions of the sentence, using a feature structure that captures the phrasal category, plurality, person, agreement, subject, object, predicate and tense.

Linked Data A set of best practices for publishing and connecting structured data on the Web.

LiveJournal A social networking site that supports FOAF profile documents.

MAGE MicroArray and Gene Expression.

MCF Meta Content Framework. Part of the inspiration for RDF.

MGED Microarray Gene Expression Data.

MIBBI Minimum Information for Biological and Biomedical Investigations. A metadata standard.

Middleware A piece of software that allows two or more other software components to interact with each other.

Mongrel A HTTP (Web) server application for delivering Web pages written in Ruby-on-Rails.

MVC Model-View-Controller. A software design architecture.

myGrid A project to produce a suite of tools designed to “help e-Scientists get on with science and get on with scientists.” One of the first UK e-Science platform projects.

MySpace A social networking website, which as of 2009 held one of the greatest market shares.

MySQL An implementation of a Relational DataBase Management System (RDBMS).

myTea A project that developed the common-world analogy of putting together a jigsaw to help understand the work undertaken by bioinformaticians.

n-gram A sequence of n consecutive words.

N3 Notation3. A concrete syntax for RDF that is more compact and readable than XML.

Nano-publication The idea of publishing each research assertion as a separate machine-readable publication.

NCS National Crystallography Service. A UK service for the analysis of chemical samples based at the University of Southampton.

NeuroScholar A project for capturing neuroscience experiments. One of the first projects to define a machine-readable representation of experiments for use in scientific discourse.

NL Natural Language. Typical language used day-to-day for reading and writing, such as English, French, etc.

NLP Natural Language Processing. The process of parsing and evaluating natural language to produce a machine-readable representation of the grammatical structure and semantic content of the text.

NTriple An alternative concrete syntax to XML for RDF. A fixed subset of N3.

OAC Open Annotations Collaboration. A data model to allow annotations to be assigned to resources, drawing significantly on the Annotea model.

OAI-ORE Open Archives Initiative Object Reuse and Exchange protocol. Defines “standards for the description and exchange of aggregations of Web resources”.

OAI-PMH Open Archives Initiative Protocol for Metadata Harvesting.

OAuth A means for third party (consumer) applications to register and authenticate themselves with the primary application, allowing them to make full use of the primary application’s REST API.

OBI Ontology for Biomedical Investigations.

OBO Open Biological and Biomedical Ontologies.

OCML Operational Conceptual Model Language.

OGSA Open Grid Services Architecture. An architecture for describing service-oriented grid computing.

OIL Ontology Interchange Language. Has been amalgamated with DAML to produce OWL.

OpenID An open standard for supporting user authentication in a decentralised manner.

OpenWetWare An e-Science community wiki that mandates that all content deposited must be open to the whole community.

ORB Ontology of Rhetorical Blocks. An ontology module aligning SWAN with SALT, ABCDE and other models containing an element of coarse-grained rhetorical structure.

oreChem A project providing semantic representations of research, as well as developing and deploying infrastructure, services, and applications to support these representations.

OWA Open World Assumption. The concept that unless something is explicitly stated, it cannot be assumed to be either true or false. E.g. just because a query returns no results, it does not mean that no data exists that fulfils the criteria of that query.

OWL The Web Ontology Language. An evolution of DAML+OIL, capable of representing logical relationships between different concepts.

OWL-S Web Ontology Language for Services. Language to semantically markup web services.

PCFG Probabilistic Context Free Grammar. A grammar that uses probabilistic values to determine the most likely syntactic structure of a sentence, in contrast to the discrete parameters used by purely transformational grammars. Also known as Stochastic Context-Free Grammars (SCFGs).

PHP PHP Hypertext Preprocessor. A Web-scripting language.

Plugin A set of software components that adds specific capabilities to a larger software application.

POS Parts Of Speech. Such as nouns, verbs, adjectives, etc.

PostgreSQL An implementation of a Relational DataBase Management System (RDBMS).

Principle-based Grammars Type of transformational grammar. Includes Government and Binding (GB) and the Minimalist Program. Instead of using rules for how to precisely generate sentences, they use principles which specify well-formedness conditions.

PRISM Publishing Requirements for Industry Standard Metadata. A specification for managing bibliographic data.

Protégé A free, open source ontology editor and knowledge-base framework.

PSI Proteomics Standards Initiative.

Question-Answering (QA) System A system that allows the user to obtain answers to natural language questions they submit.

RDF Resource Description Framework. Conceptually a triple joining a subject to an object via a predicate.

RDF/XML RDF formatted in XML.

RDFS Resource Description Framework Schema.

RDQL Resource Description Query Language. A triplestore query language.

Research Object A means of publishing a machine-readable representation of a person’s research output, as opposed to a human-readable one in the form of a conference paper or journal article.

REST REpresentational State Transfer. A software architecture for distributed hypermedia systems such as the Web.

RO See Research Objects.

RODS Research Object Domain Schema, for defining domain-specific RO types, properties and Relationship predicates.

ROUM Research Object Upper Model for defining basic RO types, properties and Relationship predicates.

RQL Resource Query Language. A triplestore query language.

RST Rhetoric Structure of Text. A theory “intended to describe texts, rather than the processes of creating or reading and understanding them”.

SALT Semantically Annotated LaTeX. A discourse model proposed by (Groza et al., 2007).

sameAs A tool for looking up owl:sameAs relationships for a particular URI.

SCARF SCientific ARticle of The Future. The ultimate goal of the alignment of the SWAN ontology with other scientific discourse ontologies: to provide a specification for a machine-readable and semantically rich version of the existing research publication.

ScholOnto An ontology-based hypertext argumentation system.

Schools Malaria project A collaboration between the CombeChem and e-Malaria projects.

Scientific Discourse The argumentation of competing claims and hypotheses to produce theories.

SCUFL Simple Conceptual Unified Flow Language. An XML format for defining Taverna workflows.

SeRQL Sesame’s Resource Query Language.

Sesame A triplestore application that can support multiple back-end storage mechanisms.

Sindice The Semantic Web Index. A website that allows users to search over SW documents.

SIOC Semantically-Interlinked Online Communities. A project that aims to provide “the main concepts and properties required to describe information from online communities”, including weblogs, messageboards and wikis.

Six Degrees One of the first social networking websites. Its model was based on the “six degrees of separation” concept first proposed by Frigyes Karinthy in 1929.

SKOS Simple Knowledge Organisation System. A schema designed to provide a means for representing various types of concept schemes in RDF.

SKUA Semantic Knowledge Underpinning Astronomy.

SmartTea A project to determine the best means of eliciting user requirements from the chemists for the ELN.

SNARM Simple Network Access Rights Management. An ontology written for myExperiment that allows access rights to be assigned based on the relationship between users in a simple, commonly social, network.

SOAP Simple Object Access Protocol. An XML-based protocol for exchanging structured information.

SocialNet One of the first social networking websites.

Solr A Java application that uses the Lucene Java search library to provide full-text indexing and search.

SPAR Semantic Publishing And Referencing ontology suite.

SQL Structured Query Language. The standard language for querying relational databases.

Surface Structure Derived from a deep structure through the application of transformational rules, allowing complex sentences to be defined.

SVG Scalable Vector Graphics. An XML-based file format for describing two dimensional vector graphics.

SWAN Semantic Web Applications in Neuromedicine.

SWRL Semantic Web Rule Language. The integration of the OWL Lite and OWL DL sublanguages with the Rule Markup Language (RuleML) to allow rules to be defined.

TAG Tree Adjoining Grammar. A mildly context-sensitive transformational grammar. Similar to a context-free grammar but instead of having production rules, it has initial and auxiliary trees. Grammatical representation of the sentence is achieved through adjoining auxiliary trees or substitution of non-terminal leaf nodes with initial or derived initial trees.

TAMBIS Transparent Access to Multiple Bioinformatics Information Sources. A precursor to the myGrid project.

Taverna An open source and domain independent Workflow Management System.

Taverna Workbench A free software tool that provides an environment for designing and executing workflows.

TeLQAS Telecommunication Literature Question-Answering System. An experimental domain-specific QA system developed for answering simple English questions in the telecommunication domain.

The Grid The electronic infrastructure required to “facilitate the increasing reliance on collaborative, multidisciplinary research” (Hey and Trefethen, 2002).

Transformational rules Rules that allow complex sentences (known as surface structures) with different tenses, cases or plurality to be derived from a basic sentence (known as a deep structure).

Triana A workflow development environment for problem solving.

Triplestore The Semantic Web technologies’ equivalent of a relational database’s data structure.

Tripod.com A web-hosting service that facilitated the development of virtual communities.

TURTLE Terse RDF Triple Language. An alternative concrete syntax to XML for RDF.

Twitter A social networking website, which as of 2009 held one of the greatest market shares.

Unification-based grammar A transformational grammar that takes feature structures and unifies them if all corresponding register values in the two feature structures are the same. Starting with basic features, this unification process is repeated until a single specific feature structure for the sentence is generated.

URI Uniform Resource Identifier. A standard way of defining character strings for resources when representing them on the Internet.

URN Uniform Resource Name. A non-resolvable URI.

Virtuoso A hybrid data server for relational, RDF-graph, and full text document data management.

VO Virtual Organisation. A dynamic set of individuals and/or institutions defined around a set of resource-sharing rules and conditions.

VoID Vocabulary of Interlinked Datasets. A vocabulary for defining a file that contains all the salient information about the data being published by a project as Linked Data.

VRE Virtual Research Environment. A system that allows users to perform research in the electronic (virtual) rather than the physical world.

W3C World Wide Web Consortium. An organisation that defines standards for the Web.

WebOnto An application for building ontologies.

Wiki A Web site that allows the easy creation and editing of any number of interlinked web pages via a Web browser using a simplified markup language.

Wordpress A piece of software for running a Web log (blog) server.

WS-BPEL Web Services - Business Process Execution Language. An orchestration language standardised by OASIS.

WSDL Web Services Description Language. XML-based language for describing services on the Web.

XML eXtensible Markup Language. Similar to HTML but for marking up real world objects and not just hypertext.

XQuery A language for querying XML documents.

Bibliography

B. Adida and M. Birbeck. RDFa Primer. W3C Working Group Note, October 2008. http://www.w3.org/TR/xhtml-rdfa-primer/ (Last Accessed: 07/04/2010).

M. J. Adler. Dialectic. Kegan Paul, Trench and Co Ltd., 1927.

I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, and S. Mock. Kepler: An Extensible System for Design and Execution of Scientific Workflows. In SSDBM, pages 21–23, 2004.

A. Alves, A. Arkin, S. Askay, B. Bloch, F. Curbera, Y. Goland, N. Kartha, C. K. Liu, D. König, V. Mehta, S. Thatte, D. van der Rijn, P. Yendluri, and A. Yiu. Web Services Business Process Execution Language Version 2.0. Technical report, OASIS Committee, May 2006.

I. Androutsopoulos, G. D. Ritchie, and P. Thanisch. Natural Language Interfaces to Databases – an introduction. Journal of Language Engineering, 1(1):29–81, 1995.

M. Atwood, D. Balfanz, D. Bounds, R. M. Conlan, B. Cook, L. Culver, B. de Medeiros, B. Eaton, K. Elliott-McCrea, L. Halff, E. Hammer-Lahav, B. Laurie, C. Messina, J. Panzer, S. Quigley, D. Recordon, E. Sandler, J. Sergent, T. Sieling, B. Slesinsky, and A. Smith. OAuth Core 1.0 Revision A. Web page, June 2009. http://oauth.net/core/1.0a/ (Last Accessed: 07/04/2010).

P. Atzeni, G. Mecca, and P. Merialdo. Semistructured and Structured Data in the Web: Going Back and Forth. ACM SIGMOD Record, 26(4):16–23, December 1997.

B. Au, D. Cutting, O. Gospodnetić, E. Hatcher, C. Hostetter, G. Ingersoll, M. Klaas, S. S. Mangar, R. McKinley, M. Miller, P. Noble, Y. Seeley, and K. Sekiguchi. Welcome to Solr. Web page, January 2010. http://lucene.apache.org/solr/ (Last Accessed: 07/04/2010).

S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, and Z. Ives. DBpedia: A Nucleus for a Web of Open Data. In 6th International Semantic Web Conference, Busan, Korea, pages 11–15. Springer, 2007.

M. Bachler, S. Buckingham Shum, J. Chen-Burger, J. Dalton, D. De Roure, M. Eisenstadt, J. G. Frey, J. Komzak, D. Michaelides, K. Page, S. Potter, N. R. Shadbolt, and A. Tate. Chain ReAKTing: Collaborative Advanced Knowledge Technologies in the CombeChem Grid. In S. Cox, editor, Proceedings of the UK e-Science All Hands Meeting, Nottingham, Sep 2003, page (6pp). EPSRC, 2004a.

M. Bachler, S. Buckingham Shum, J. Chen-Burger, J. Dalton, D. De Roure, M. Eisenstadt, J. Komzak, D. Michaelides, K. Page, S. Potter, N. R. Shadbolt, and A. Tate. Collaborative Tools in the Semantic Grid. In Luc Moreau, editor, GGF11 - The Eleventh Global Grid Forum, Honolulu, Hawaii, USA, 2004b. Global Grid Forum.

A. Barabási. Linked: The New Science of Networks. Perseus Publishing, April 2002.

A. D. Baxevanis. The molecular biology database collection: an online compilation of relevant database resources. Nucleic Acids Res., 28:1–7, 2000.

S. Bechhofer, J. Ainsworth, J. Bhagat, I. Buchan, P. Couch, D. Cruickshank, M. Delderfield, I. Dunlop, M. Gamble, C. Goble, D. Michaelides, P. Missier, S. Owen, D. Newman, D. De Roure, and S. Sufi. Why Linked Data is Not Enough for Scientists. In Sixth IEEE e-Science conference (e-Science 2010), August 2010a.

S. Bechhofer, D. De Roure, M. Gamble, C. Goble, and I. Buchan. Research Objects: Towards Exchange and Reuse of Digital Knowledge. In The Future of the Web for Collaborative Science (FWCS 2010), February 2010b. Co-located with WWW’10.

S. Bechhofer and R. Volz. Patching Syntax in OWL Ontologies. In Proceedings of ISWC2004, 2004.

D. Beckett. The Design and Implementation of the Redland RDF Application Framework. In WWW10, 2001.

D. Bedell. Meeting your new best friends Six Degrees widens your contacts in exchange for sampling Web sites. Electronic news article, October 1998. http://www.dougbedell.com/sixdegrees1.html (Last Accessed: 07/04/2010).

G. Bellinger, D. Castro, and A. Mills. Data, Information, Knowledge, and Wisdom. Web Page, 2004. http://www.systems-thinking.org/dikw/dikw.htm (Last Accessed: 07/04/2010).

T. Berners-Lee. Information Management: A Proposal. Technical report, CERN, March 1989.

T. Berners-Lee. The Semantic Web. W3C Presentation, March 2002. http://www.w3.org/2003/Talks/03-pcforum-tbl/ (Last Accessed: 07/04/2010).

A. Bernstein, E. Kaufmann, C. Kaiser, and C. Kiefer. Ginseng: A Guided Input Natural Language Search Engine for Querying Ontologies. In Proceedings of the 2006 Jena User Conference, May 2006.

D. Berrueta, D. Brickley, S. Decker, S. Fernández, C. Görn, A. Harth, T. Heath, K. Idehen, K. Kjernsmo, A. Miles, A. Passant, A. Polleres, L. Polo, and M. Sintek. SIOC Core Ontology Specification. W3C Member Submission, June 2007. http://www.w3.org/Submission/2007/02/ (Last Accessed: 07/04/2010).

J. Bhagat, F. Tanoh, E. Nzuobontane, T. Laurent, J. Orlowski, M. Roos, K. Wolstencroft, S. Aleksejevs, R. Stevens, S. Pettifer, R. Lopez, and C. Goble. BioCatalogue: a universal catalogue of web services for the life sciences. Nucleic Acids Research, May 2010.

N. Bilton. Price of Facebook Privacy? Start Clicking. Electronic news article, May 2010. http://www.nytimes.com/2010/05/13/technology/personaltech/13basics.html (Last Accessed: 31/08/2010).

C. Bizer, R. Cyganiak, and T. Heath. How to Publish Linked Data on the Web. Web Page, July 2007. http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/20070727/ (Last Accessed: 07/04/2010).

C. Bizer, T. Heath, and T. Berners-Lee. Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems (IJSWIS), page preprint, 2009.

F. S. Black. A Deductive Question-Answering System. PhD thesis, Harvard University, Cambridge, Mass., June 1964.

S. Boag, D. Chamberlin, M. F. Fernández, J. Robie, and J. Siméon. XQuery 1.0: An XML Query Language. W3C Recommendation, W3C, January 2007.

T. L. Booth. Probabilistic Representation of Formal Languages. In Tenth Annual IEEE Symposium on Switching and Automata Theory, pages 74–81. IEEE, 1969.

T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau. Extensible Markup Language (XML) 1.0. W3C, third edition, February 2004. W3C Recommendation.

D. Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation, February 2004. http://www.w3.org/TR/rdf-schema/ (Last Accessed: 07/04/2010).

D. Brickley and L. Miller. FOAF Vocabulary Specification 0.97. Web Page, January 2010. http://xmlns.com/foaf/spec/20100101.html (Last Accessed: 07/04/2010).

E. Brill, S. Dumais, and M. Banko. An Analysis of the AskMSR Question-Answering System. In Conference on Empirical Methods in Natural Language Processing, 2002.

J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In ISWC2002, 2002.

S. Buckingham Shum, E. Motta, and J. Domingue. ScholOnto: An Ontology-Based Digital Library Server for Research Documents and Discourse. International Journal on Digital Libraries, 3:237–248, 2000.

G. Burns and T. Russ. Biomedical Knowledge Engineering tools based on Experimental Design: a case study based on neuroanatomical tract-tracing experiments. In KCAP 2009, Long Beach, CA., 2009.

G. A. Burns. Knowledge management of the neuroscientific literature: the data model and underlying strategy of the NeuroScholar system. Philos Trans R Soc Lond B Biol Sci, 356(1412):1187–208, 2001. ISSN 0962-8436.

D. Buytaert. About the Drupal project. Web Page, February 2009. http://drupal.org/History-mission-and-community (Last Accessed: 07/04/2010).

P. Carlson. Apache Lucene - Query Parser Syntax. Online PDF document, 2006. http://lucene.apache.org/java/2_9_1/queryparsersyntax.pdf (Last Accessed: 07/04/2010).

J. Carroll, I. Dickinson, C. Dollin, D. Reynolds, A. Seaborne, and K. Wilkinson. Jena: Implementing the Semantic Web Recommendations. In WWW2004, 2004.

F. Cerbah. Learning Highly Structured Semantic Repositories from Relational Databases - The RDBtoOnto Tool. In Proceedings of the 5th European Semantic Web Conference (ESWC 2008), June 2008.

N. Chomsky. Three models for the description of language. IEEE Transactions on Information Theory, 2(3):113–124, September 1956.

P. Ciccarese, T. Clark, and M. Ocana. Semantic Web Applications in Neuromedicine (SWAN) Ontology. W3C Interest Group Note, October 2009. http://www.w3.org/TR/hcls-swan/ (Last Accessed: 09/04/2010).

P. Ciccarese, M. Ocana, S. Das, and T. Clark. AO: An Open Annotation Ontology for Science on the Web. In Bio-ontologies 2010, 2010.

E. Clark. Friends Reunited comes of age. Electronic news article, January 2003. http://news.bbc.co.uk/1/hi/business/2652959.stm (Last Accessed: 27/08/2010).

G. R. Cochrane and M. Y. Galperin. The 2010 Nucleic Acids Research Database Issue and online Database Collection: a community of data resources. Nucleic acids research, 38(Database issue):D1–4, January 2010. ISSN 1362-4962.

P. R. Cohen. The Role of Natural Language in a Multimodal Interface. In Proceedings of the 5th annual ACM symposium on User interface software and technology, pages 143–149, New York, NY, USA, 1992. ACM Press.

S. Coles, J. G. Frey, M. Hursthouse, M. Light, L. Carr, D. De Roure, C. Gutteridge, H. Mills, K. Meacham, M. Surridge, E. Lyon, R. Heery, M. Duke, and M. Day. The ’end to end’ crystallographic experiment in an e-Science environment: From conception to publication. http://eprints.soton.ac.uk/17454/ (Last Accessed: 21/11/2010), September 2005.

M. J. Collins. A new statistical parser based on bigram lexical dependencies. In Arivind Joshi and Martha Palmer, editors, Proceedings of the Thirty-Fourth Annual Meeting of the Association for Computational Linguistics, pages 184–191, San Francisco, California, US, 1996. Morgan Kaufmann Publishers.

comScore. The 2009 U.S. Digital Year in Review: A Recap of the Year in Digital Marketing. Online PDF Document, February 2010. http://www.comscore.com/layout/set/popup/request/Presentations/2010/The_2009_U.S._Digital_Year_in_Review_PDF_Request?req=slides&pre=The+2009+U.S.+Digital+Year+in+Review (Last Accessed: 07/04/2010).

S. Corlosquet. Drupal RDF Schema proposal. Web Page, March 2008. http://groups.drupal.org/node/9311 (Last Accessed: 07/04/2010).

M. Courtot, W. Bug, F. Gibson, A. L. Lister, J. Malone, D. Schober, R. Brinkman, and A. Ruttenberg. The OWL of Biomedical Investigations. In OWLED 2008, 2008.

Creative Commons. What is Creative Commons. Online PDF Document, February 2008. http://wiki.creativecommons.org/images/3/35/Creativecommons-what-is-creative-commons_eng.pdf (Last Accessed: 07/04/2010).

D. Cruickshank. myExperiment Wiki - Developer:API. Wiki Page, May 2009. http://wiki.myexperiment.org/index.php?title=Developer:API (Last Checked: 07/04/2010).

N. Cullot, R. Ghawi, and K. Yetongnon. DB2OWL: A Tool for Automatic Database-to-Ontology Mapping. In Proceedings of the 15th Italian Symposium on Advanced Database Systems (SEBD 2007), pages 491–494, June 2008.

H. Cunningham. Information Extraction - a User Guide (Second Edition). Institute for Language, Speech and Hearing (ILASH) and Department of Computer Science, University of Sheffield, Sheffield, UK, 2 edition, April 1999.

H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, C. Ursu, M. Dimitrov, M. Dowman, N. Aswani, I. Roberts, Y. Li, and A. Shafirin. Developing Language Processing Components with GATE Version 4 (a User Guide), 2007.

DCMI Usage Board. DCMI Metadata Terms. Web Page, January 2008. http://dublincore.org/documents/dcmi-terms/ (Last Accessed: 07/04/2010).

D. De Roure. myExperiment Wiki - Developer:Licensing. Wiki Page, July 2009. http://wiki.myexperiment.org/index.php?title=Developer:Licensing (Last Accessed: 07/04/2010).

D. De Roure. myExperiment Wiki - About. Wiki Page, July 2010a. http://wiki.myexperiment.org/index.php?title=About (Last Accessed: 09/09/2010).

D. De Roure. Replacing the Paper: The Twelve Rs of the e-Research Record. Web page, November 2010b. http://blogs.nature.com/eresearch/2010/11/replacing_the_paper_the_twelve_rs_of_the_e-research_record.html (Last Accessed: 28/11/2010).

D. De Roure, M. Borkum, P. Groth, D. Michaelides, and J. Bhagat. myExperiment Wiki - Developer:Projects. Wiki page, January 2010a. http://wiki.myexperiment.org/index.php?title=Developer:Projects (Last Accessed: 07/04/2010).

D. De Roure, C. Goble, J. Bhagat, D. Cruickshank, A. Goderis, D. Michaelides, and D. R. Newman. myExperiment: Defining the Social Virtual Research Environment. In 4th IEEE International Conference on e-Science, pages 182–189. IEEE Press, December 2008.

D. De Roure, C. Goble, and R. Stevens. Designing the myExperiment Virtual Research Environment for the Social Sharing of Workflows. In e-Science 2007: Proceedings of the Third IEEE International Conference on e-Science and Grid Computing, pages 603–610. IEEE Computer Society, 2007.

D. De Roure, C. Goble, and R. Stevens. The Design and Realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows. Future Generation Computer Systems, 25:561–567, February 2009.

D. De Roure, N. R. Jennings, and N. R. Shadbolt. Research Agenda for the Semantic Grid: A Future e-Science Infrastructure, December 2001. 296 BIBLIOGRAPHY

D. De Roure, C. Goble, S. Aleksejevs, S. Bechhofer, J. Bhagat, D. Cruickshank, P. Fisher, N. Kollara, D. Michaelides, P. Missier, D. Newman, M. Ramsden, M. Roos, K. Wolstencroft, E. Zaluska, and J. Zhao. The Evolution of myExperiment. In Sixth IEEE e-Science conference (e-Science 2010), October 2010b.

A. de Waard and G. Tel. The ABCDE format - enabling semantic conference proceedings. In 1st Workshop: SemWiki2006 - From Wiki to Semantics, Budva, Montenegro, 2006.

E. Deelman, D. Gannon, M. Shields, and I. Taylor. Workflows and e-Science: An overview of workflow system features and capabilities. Future Generation Computer Systems, 25(5): 528–540, 2009. ISSN 0167-739X.

A. Di Iorio, F. Vitali, and S. Peroni. The Pattern Ontology: Describing documents by means of their structural components. Web page, October 2010. http://www.essepuntato.it/2008/12/pattern (Last Accessed: 02/12/2010).

A. Dix, M. Blythe, K. Overbeeke, A. Monk, and P. Wright. Funology: From Usability to Enjoyment, chapter 13: Deconstructing Experience - pulling crackers apart, pages 165–178. Kluwer Academic Publishers, Dordrecht, 2003.

M. Duke. eBank UK Homepage. Web page, October 2009. http://www.ukoln.ac.uk/projects/ebank-uk/ (Last Accessed: 27/08/2010).

E. Dumbill. XML Watch: Finding friends with XML and RDF. Web page, June 2002. http://www.ibm.com/developerworks/xml/library/x-foaf.html (Last Accessed: 07/04/2010).

D. Dupplaw, D. Brunson, A. E. Vine, C. P. Please, S. M. Lewis, A. M. Dean, A. J. Keane, and M. J. Tindall. Web-based knowledge elicitation and application to planned experiments for product development. In ASME Design Engineering Technology Conference. ASME International, 2003. DETC2003/CIE-48275.

O. Erling. Implementing a SPARQL compliant RDF Triple Store using a SQL-ORDBMS. Technical report, OpenLink Software, 2009.

C. Fellbaum, G. A. Miller, R. Al-Halimi, R. C. Berwick, J. F. M. Burg, M. Chodorow, J. Grabowski, and S. Harabagiu. WordNet: An Electronic Lexical Database. MIT Press, 1998.

D. Field, S. Sansone, A. Collis, T. Booth, P. Dukes, S. K. Gregurick, K. Kennedy, P. Kolar, E. Kolker, M. Maxon, S. Millard, A. Mugabushaka, N. Perrin, J. E. Remacle, K. Remington, P. Rocca-Serra, C. F. Taylor, M. Thorley, B. Tiwari, and J. Wilbanks. ’Omics Data Sharing. Science, 326(5950):234–236, October 2009. ISSN 1095-9203.

D. Florescu, A. Y. Levy, and A. O. Mendelzon. Database Techniques for the World-Wide Web: A Survey. SIGMOD Record, 27(3):59–74, 1998.

FOAF project team. DataSources - FOAF Wiki. Wiki Page, December 2009. http://wiki.foaf-project.org/w/DataSources (Last Accessed: 07/04/2010).

Franz Inc. AllegroGraph Introduction. Web page, March 2010. http://www.franz.com/agraph/support/documentation/current/agraph-introduction.html (Last Accessed: 07/04/2010).

J. G. Frey, R. Gledhill, S. Kent, B. Hudson, and J. Essex. Schools Malaria Project. CombeChem Project Output, September 2006.

G. Gazdar, E. Klein, G. K. Pullum, and I. A. Sag. Generalized Phrase Structure Grammar. Basil Blackwell Publisher Ltd., 108 Cowley Road, Oxford, UK, 1 edition, 1985.

GeoCities. 29 Neighborhoods. Web Page, 1995. http://web.archive.org/web/19961219233921/www.geocities.com/homestead/homedir.html (Last Accessed: 07/04/2010).

T. Geoghegan. What’s the ideal number of friends? Electronic News Article, March 2009. http://news.bbc.co.uk/1/hi/magazine/7920434.stm (Last Accessed: 27/08/2010).

F. Giasson and B. D’Arcus. Bibliographic Ontology Specification. Web Page, November 2009. http://bibliontology.com/specification (Last Accessed: 09/04/2010).

A. Gibson, R. Stevens, R. Cooke, S. Brostoff, and m. c. schraefel. myTea: Connecting the Web to Digital Science on the Desktop. Technical report, ECS, University of Southampton, October 2005.

H. Glaser, A. Jaffri, and I. Millard. Managing Co-reference on the Semantic Web. In WWW2009 Workshop: Linked Data on the Web (LDOW2009), April 2009.

C. Goble and D. De Roure. myExperiment: social networking for workflow-using e-scientists. In Proceedings of the 2nd workshop on Workflows in support of large-scale science, pages 1–2. ACM, June 2007.

C. Goble, R. Stevens, G. Ng, S. Bechhofer, N. W. Paton, P. Baker, M. Peim, and A. Brass. Transparent access to multiple bioinformatics information sources. IBM Systems Journal, 40(2), 2001.

C. Goble, C. Wroe, and R. Stevens. The myGrid project: services, architecture and demonstrator. In All Hands Meeting, pages 595–603, September 2003.

B. F. Green Jr., A. K. Wolf, C. S. Chomsky, and K. Laughery. Baseball: An Automatic Question-Answerer. Computers and Thought, pages 207–216, 1963.

R. Grishman. TIPSTER Architecture Design Document Version 2.3. Technical report, DARPA, 1997.

T. Groza, S. Handschuh, T. Clark, S. Buckingham Shum, and A. de Waard. A Short Survey of Discourse Representation Models. In Semantic Web Applications in Scientific Discourse (SWASD) ISWC2009 Workshop, 2009.

T. Groza, S. Handschuh, K. Möller, and S. Decker. SALT - Semantically Annotated LaTeX for Scientific Publications. In ESWC ’07: Proceedings of the 4th European conference on The Semantic Web, pages 518–532, Berlin, Heidelberg, 2007.

R. V. Guha and T. Bray. Meta Content Framework Using XML. W3C Note, June 1997. http://www.w3.org/TR/NOTE-MCF-XML/ (Last Accessed: 07/04/2010).

S. Harabagiu, D. Moldovan, M. Pasca, R. Mihalcea, M. Surdeanu, R. Bunescu, R. Gîrju, V. Rus, and P. Morarescu. FALCON: Boosting Knowledge for Answer Engines. In Proceedings of the Text REtrieval Conference (TREC-9), 2000.

F. A. P. Harmsze. A modular structure for scientific articles in an electronic environment. PhD thesis, University of Amsterdam, 2000.

S. Harris, N. Lamb, and N. R. Shadbolt. 4store: The Design and Implementation of a Clustered RDF Store. Technical report, Garlik Ltd., 2009.

HCLS Scientific Discourse Sub Task Group. HCLSIG/SWANSIOC/Actions/RhetoricalStructure/alignment/mediumgrain. Wiki page, December 2010. http://esw.w3.org/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/alignment/mediumgrain (Last Accessed: 07/12/2010).

M. R. Hejazi, M. S. Mirian, K. Neshatian, A. Jalali, and B. Ofoghi. TelQAS: A Telecommunication Literature Question Answering System Benefits from a Text Categorization Mechanism. In IKE, pages 500–504, 2003.

G. G. Hendrix, E. D. Sacerdoti, D. Sagalowicz, and J. Slocum. Developing a Natural Language Interface to Complex Data. ACM Transactions on Database Systems, 3(2):105–147, June 1978.

A. J. G. Hey and A. E. Trefethen. The UK e-Science Core Programme and the Grid. Future Generation Computer Systems, 18(8):1017–1031, September 2002.

E. W. Hinrichs. Tense, Quantifiers, and Contexts. Computational Linguistics, 14(2), June 1988.

I. Horrocks and P. F. Patel-Schneider. Reducing OWL Entailment to Description Logic Satisfiability. In Proc. of the 2nd International Semantic Web Conference (ISWC), 2003.

I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean. SWRL: A Semantic Web Rule Language Combining OWL and RuleML. W3C Member Submission, May 2004. http://www.w3.org/Submission/SWRL/ (Last Accessed: 07/04/2010).

I. Horrocks, P. F. Patel-Schneider, and F. van Harmelen. Reviewing the design of DAML+OIL: an ontology language for the semantic web. In Eighteenth national conference on Artificial intelligence, pages 792–797, Menlo Park, CA, USA, 2002. American Association for Artificial Intelligence. ISBN 0-262-51129-0.

G. Hughes, H. Mills, D. De Roure, J. G. Frey, L. Moreau, m. c. schraefel, G. Smith, and E. Zaluska. The Semantic Smart Laboratory: A system for supporting the chemical eScientist. Org. Biomol. Chem., 2:1–10, 2004. DOI: 10.1039/b410075a.

IDEAlliance. PRISM: Publishing Requirements for Industry Standard Metadata (Version 2.1), 2009.

International Federation of Library Associations and Institutions. Functional Requirements for Bibliographic Records: Final Report, 1998.

A. Isaac and E. Summers. SKOS Simple Knowledge Organization System Primer. Web Page, August 2009. http://www.w3.org/TR/skos-primer (Last Accessed: 07/04/2010).

A. W. Johnson and T. K. Earle. The evolution of human societies: from foraging group to agrarian state. Stanford University Press, Stanford, Calif., 1987. ISBN 0804713391.

A. K. Joshi, Y. Schabes, G. Rozenberg, and A. Salomaa. Handbook of Formal Languages, chapter Tree-Adjoining Grammars, pages 69–124. Springer, Berlin, New York, 1997.

J. Kahan, M. R. Koivunen, E. Prud’hommeaux, and R. R. Swick. Annotea: an open RDF infrastructure for shared Web annotations. Computer Networks, 39(5):589–608, 2002. ISSN 1389-1286.

R. M. Kaplan and J. Bresnan. The Mental Representation of Grammatical Relations, chapter Lexical-Functional Grammar: A Formal System for Grammatical Representation, pages 173–281. MIT Press, 1982.

G. Karvounarakis, S. Alexaki, V. Christophides, D. Plexousakis, and M. Scholl. RQL: A Declarative Query Language for RDF. In WWW2002, May 2002.

B. Katz, S. Felshin, D. Yuret, A. Ibrahim, J. Lin, G. Marton, A. J. McFarland, and B. Temelkuran. Omnibase: Uniform Access to Heterogeneous Data for Question Answering. In The 7th International Workshop on Applications of Natural Language to Information Systems (NLDB 2002), 2002.

J. R. Kelly, B. Canton, and R. P. Shetty. OpenWetWare:Mission. Wiki Page, April 2008. http://openwetware.org/wiki/OpenWetWare:Mission (Last Accessed: 07/04/2010).

D. Klein and C. D. Manning. Fast Exact Inference with a Factored Model for Natural Language Parsing. In Advances in Neural Information Processing Systems 15, pages 3–10. MIT Press, 2003.

C. A. Knoblock, S. Minton, J. L. Ambite, N. Ashish, I. Muslea, A. G. Philpot, and S. Tejada. The Ariadne approach to Web-based information integration. IEEE Intelligent Systems, 10 (1-2):145–169, 2001.

J. Krause, D. Lusseau, and R. James. Animal social networks: an introduction. Behavioral Ecology and Sociobiology, 63(7):967–973, May 2009.

C. Lagoze. The oreChem Project: Integrating Chemistry Scholarship with the Semantic Web. In WebSci’09: Society On-Line, 2009.

C. Lagoze, H. Van de Sompel, P. Johnston, M. Nelson, R. Sanderson, and S. Warner. ORE User Guide - Primer. Web Page, October 2008. http://www.openarchives.org/ore/1.0/primer.html (Last Accessed: 07/04/2010).

C. Lagoze, H. Van de Sompel, M. Nelson, and S. Warner. The Open Archives Initiative Protocol for Metadata Harvesting (Version 2.0). Web page specification, June 2002. http://www.openarchives.org/OAI/openarchivesprotocol.html (Last Accessed: 13/09/2010).

O. Lassila. Web Metadata: A Matter of Semantics. IEEE Internet Computing, 2(4):30–37, July/August 1998.

O. Lassila and R. R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. Technical report, W3C, February 1999. W3C Recommendation.

R. Leigh. Videos: Police step in to prevent Facebook flash mob events. Electronic News Article, May 2008. http://www.mirror.co.uk/news/weird-world/2008/05/19/videos-police-step-in-to-prevent-facebook-flash-mob-events-115875-20423056/ (Last Accessed: 27/08/2010).

Y. Li, H. Yang, and H. V. Jagadish. NaLIX: an Interactive Natural Language Interface for Querying XML. In Proceedings of SIGMOD 2005, 2005.

D. Lin. Dependency-based Evaluation of MINIPAR. In Workshop on the Evaluation of Parsing Systems, May 1998.

F. Lisacek, C. Chichester, A. Kaplan, and A. Sandor. Discovering Paradigm Shift Patterns in Biomedical Abstracts: Application to Neurodegenerative Diseases. In 1st International Symposium on Semantic Mining in Biomedicine (SMBM), 2005.

m. c. schraefel, G. Hughes, H. Mills, G. Smith, and J. G. Frey. Making Tea: Iterative Design through Analogy. In Proceedings of the 2004 Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques, pages 49–58, Cambridge Mass, USA, 2004a. ACM Press.

m. c. schraefel, G. Hughes, H. Mills, G. Smith, T. Payne, and J. G. Frey. Breaking the Book: Translating the Chemistry Lab Book into a Pervasive Computing Lab Environment. In Conference on Human Factors in Computing Systems (CHI), pages 25–32, Vienna, Austria, 2004b. ACM Press.

m. c. schraefel, D. A. Smith, A. Russel, A. Owens, C. Harris, and M. L. Wilson. The mSpace Classical Music Explorer: Improving Access to the Classical Music Space for Real People. In MusicNetwork Open Workshop: Integration of Music in Multimedia Applications, Vienna, Austria, 2005.

D. M. Magerman. Parsing as Statistical Pattern Recognition. Technical report, IBM T. J. Watson Research Center, January 1995.

J. Malone, T. F. Rayner, X. Z. Bradley, and H. Parkinson. Developing an application focused experimental factor ontology: embracing the OBO Community. In ISMB2008, 2008.

F. Manola, E. Miller, and B. McBride. RDF Primer. W3C Recommendation, February 2004. http://www.w3.org/TR/rdf-primer/ (Last Accessed: 07/04/2010).

M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):315–330, 1994.

V. M. Markowitz. Heterogeneous Molecular Biology Databases. Computational Biology, 2(2): 537–538, 1995.

D. Martin, A. Ankolekar, M. Burstein, G. Denker, D. Elenius, J. Hobbs, L. Kagal, O. Lassila, D. McDermott, D. McGuinness, S. McIlraith, M. Paolucci, B. Parsia, T. Payne, T. Sabou, C. Schlenoff, E. Sirin, M. Solanki, N. Srinivasan, K. Sycara, and R. Washington. OWL-S: Semantic Markup for Web Services. DAML Technical Overview Web Page, November 2004. http://www.daml.org/services/owl-s/1.1/overview/ (Last Accessed: 07/04/2010).

P. Mika. Ontologies are us: A unified model of social networks and semantics. In ISWC 2005, 2005.

M. Minsky. A Framework for Representing Knowledge. Technical report, Massachusetts Institute of Technology, Cambridge, MA, USA, 1974.

Y. Mizuta, A. Korhonen, T. Mullen, and N. Collier. Zone analysis in biology articles as a basis for information extraction. International Journal of Medical Informatics, 75:468–487, 2006.

B. Mons and J. Velterop. Nano-Publication in the e-science era. In Semantic Web Applications in Scientific Discourse, 2009.

D. Naber. OpenThesaurus: Building a Thesaurus with a Web Community. Online PDF Document, June 2004. http://www.openthesaurus.de/download/openthesaurus.pdf (Last Accessed: 07/04/2010).

D. R. Newman, S. Bechhofer, and D. De Roure. myExperiment: An ontology for e-Research. In Semantic Web Applications in Scientific Discourse, October 2009.

M. Nottingham and R. Sayre. The Atom Syndication Format. Request For Comments, December 2005. http://tools.ietf.org/html/rfc4287 (Last Accessed: 07/04/2010).

P. J. Nürnberg, J. J. Leggett, and E. R. Schneider. As We Should Have Thought. In Eighth ACM conference on Hypertext, 1997.

T. Oinn, M. Greenwood, M. Addis, M. N. Alpdemir, J. Ferris, K. Glover, C. Goble, A. Goderis, D. Hull, D. Marvin, P. Li, P. Lord, M. R. Pocock, M. Senger, R. Stevens, A. Wipat, and C. Wroe. Taverna: lessons in creating a workflow environment for the life sciences. Concurrency Computat.: Pract. Exper., 18:1067–1100, 2006.

A. Passant, P. Ciccarese, J. G. Breslin, and T. Clark. SWAN/SIOC: Alignment Between the SWAN and SIOC Ontologies. W3C Interest Group Note, October 2009. http://www.w3.org/TR/hcls-swansioc/ (Last Accessed: 09/04/2010).

A. V. Phillips. A Question-Answering Routine. Memo 16, MIT, Cambridge, Mass., May 1960.

S. Phillips. A brief history of Facebook. Electronic News Article, July 2007. http://www.guardian.co.uk/technology/2007/jul/25/media.newmedia (Last Accessed: 07/04/2010).

A. Powell, M. Nilsson, A. Naeve, P. Johnston, and T. Baker. DCMI Abstract Model. Web Page, June 2007. http://dublincore.org/documents/abstract-model/ (Last Accessed: 07/04/2010).

E. Prud’hommeaux and A. Seaborne. SPARQL Query Language for RDF. W3C Candidate Recommendation, April 2006. http://www.w3.org/TR/rdf-sparql-query/ (Last Accessed: 07/04/2010).

D. Quick. Diaspora – a distributed, open source, secure social network with Facebook in its sights. Electronic News Article, May 2010. http://www.gizmag.com/diaspora-open-source-social-network/15098/ (Last Accessed: 31/08/2010).

A. Rodriguez. RESTful Web services: The basics. Online PDF Document, November 2008. http://download.boulder.ibm.com/ibmdl/pub/software/dw/webservices/ws-restful/ws-restful-pdf.pdf (Last Accessed: 07/04/2010).

G. R. Sampson. English for the Computer. Oxford University Press, 1995.

M. Samwald and H. Stenzhorn. Simple, ontology-based representation of biomedical statements through fine-granular entity tagging and new web standards. In Bio-Ontologies 2009, June 2009.

R. Sanderson and H. Van de Sompel. Open Annotation: Alpha3 Data Model Guide. Web page, October 2010. http://www.openannotation.org/spec/alpha3/ (Last Accessed: 12/12/2010).

F. Schlesinger. Swivelchair activism. Electronic News Article, December 2007. http://www.guardian.co.uk/education/2007/dec/11/students.studentpolitics (Last Accessed: 27/08/2010).

A. Seaborne. RDQL - A Query Language for RDF. W3C Member Submission, January 2004. http://www.w3.org/Submission/RDQL/ (Last Accessed: 07/04/2010).

B. Shneiderman. Natural vs. precise language for human operation of computers. In ACL Proceedings, 18th Annual Meeting, pages 139–141, 1980.

D. Shotton. CiTO, the Citation Typing Ontology, and its use for annotation of reference lists and visualization of citation networks. In Bio-Ontologies 2009, a Special Interest Group meeting at ISMB 2009, 2009. Image Bioinformatics Research Group, Department of Zoology, University of Oxford, Oxford OX1 3PS, UK.

D. Shotton and S. Peroni. The Document Components Ontology (DoCO). Web page, November 2010. http://sempublishing.svn.sourceforge.net/viewvc/sempublishing/DoCO/doc/doco.html (Last Accessed: 07/12/2010).

B. Smith. The Basic Tools of Formal Ontology. In Formal Ontology in Information Systems, 1998.

S. L. Star and J. R. Griesemer. Institutional Ecology, ‘Translations’ and Boundary Objects: Amateurs and Professionals in Berkeley’s Museum of Vertebrate Zoology, 1907-39. Social Studies of Science, 19(3):387–420, 1989.

R. Stevens, C. Goble, P. Baker, and A. Brass. A classification of tasks in bioinformatics. Bioinformatics, 17(2):180–188, 2001.

R. Stevens, C. Goble, N. W. Paton, S. Bechhofer, G. Ng, P. Baker, and A. Brass. Complex Query Formulation Over Diverse Information Sources in TAMBIS. In Zoe Lacroix and Terence Critchlow, editors, Bioinformatics: Managing Scientific Data. Morgan Kaufmann, May 2003. ISBN 1-55860-829-X.

P. J. Stone, R. F. Bales, J. Z. Namenwirth, and D. M. Ogilvie. The General Inquirer: A Computer System for Content Analysis and Retrieval Based on the Sentence as a Unit of Information. Behavioral Science, 7(4):1–15, 1962.

M. Taboada and W. C. Mann. Rhetorical Structure Theory: looking back and moving ahead. Discourse Studies, 8:423–459, 2006.

J. Tennison. Linked Open Data in a Changing World. Blog Post, July 2009. http://www.jenitennison.com/blog/node/108 (Last Accessed: 27/04/2010).

S. Teufel and M. Moens. Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status. Computational Linguistics, 28, 2002.

D. Thakker, T. Osman, and P. Lakin. GATE JAPE Grammar Tutorial Version 1.0, February 2009.

D. Thomas, D. H. Hansson, L. Breedt, M. Clark, J. D. Davidson, J. Gehtland, and A. Schwarz. Agile Web Development with Rails. Pragmatic Bookshelf, 2nd edition, 2007.

S. E. Toulmin. The Uses of Argument. Cambridge University Press, 1969.

A. M. Turing. Computing machinery and intelligence. Mind, 59:433–460, 1950.

J. Weizenbaum. ELIZA - A Computer Program For the Study of Natural Language Communication Between Man And Machine. Communications of the ACM, 9(1):36–45, January 1966.

D. Wood, P. Gearon, and T. Adams. Kowari: A Platform for Semantic Web Storage and Analysis. In WWW2005, 2005.

W. A. Woods. Transition network grammars for natural language analysis. Communications of the ACM, 13(10):591–606, October 1970.

W. A. Woods, R. M. Kaplan, and B. N. Webber. The Lunar Sciences Natural Language Information System: Final Report. Technical report, Bolt Beranek and Newman Inc., Cambridge, Massachusetts, 1972.

J. Yousefi and L. Kosseim. Automatic Acquisition of Semantic-Based Question Reformulations for Question Answering. In A. Gelbukh, editor, CICLing 2006, pages 441–452. Springer-Verlag Berlin Heidelberg, 2006.

J. Zhao, K. Alexander, M. Hausenblas, and R. Cyganiak. VoID Guide - Using the Vocabulary of Interlinked Datasets. Web Page, December 2008. http://rdfs.org/ns/void-guide (Last Accessed: 07/04/2010).