Semantic Representation for Experimental Protocols

UNIVERSIDAD POLITÉCNICA DE MADRID

DOCTORAL THESIS

SeMAntic RepresenTation for experimental Protocols

Author: Olga Ximena Giraldo Pasmin
Supervisor: Prof. Dr. Oscar Corcho

A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy in the Ontology Engineering Group, Department of Artificial Intelligence

April 23, 2019

Declaration of Authorship

I, Olga Ximena Giraldo Pasmin, declare that this thesis titled “SeMAntic RepresenTation for Experimental Protocols” and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:
Date:

Abstract

Department of Artificial Intelligence, Escuela Técnica Superior de Ingenieros Informáticos

This research addresses the problem of semantically representing experimental protocols in the life sciences and of relating such information to data. The need for open, interoperable data supporting research transparency, systematic reuse of existing data, and experimental reproducibility has been widely acknowledged. Several efforts are providing infrastructure for sharing and storing data.
However, data per se does not imply reproducibility; there is also a need to know how the data was produced: here is the data, but where are the experimental protocols? Several efforts have studied the question “is this reproducible?”; fewer have addressed the problem of semantically valid, machine-processable reporting structures. SMART Protocols (SP) makes use of Semantic Web technology, thus facilitating interoperability and machine processability; SP delivers an extensible infrastructure that allows researchers to search for similar protocols, or for investigations with similar techniques, methods, instruments, variables and/or populations. To achieve this degree of functionality, a comprehensive vocabulary was gathered throughout this investigation by annotating documents; the corresponding infrastructure, henceforth BioH, was specially developed to support this task. The evaluation of the vocabulary thus gathered made it possible to generate the SP gold standard, a gold standard corpus specifically engineered for experimental protocols. The tooling and methods applied when building this gold standard can be applied to other domains. Furthermore, this investigation also delivers a semantic publication platform for experimental protocols. Scientific publications aggregate data by encompassing it within a persuasive narrative. The SP approach addresses the problem of supporting such aggregation over a document that is to be born semantic, interoperable and conceived as an aggregator within a web-of-data publishing workflow.

Acknowledgements

First and foremost, thanks to my family. You are the foundation of all my strength. To my mother, thank you for your constant love and support; it is something that I have always depended on without thinking, and I would be nowhere without it.
To my husband, you have given more to me than I could ever ask; thank you for riding along with me through the storms and the doldrums of this journey and for reaching down and lifting me back up every time I started to drift beneath the surface. Most importantly, and from the bottom of my heart, thanks to my daughter, in whom I have found my deepest happiness as well as my true inner strength. Since she was born, she has taught me more about myself than everything I thought I knew. To God, who blessed me with Alba.

Contents

Declaration of Authorship
Abstract
Acknowledgements

1 Introduction
  1.1 Introducing the problem
  1.2 Motivation
  1.3 Problem statement
  1.4 Contributions of this thesis
    1.4.1 Research Outcomes related to this Investigation
      Awards
      Journal Papers
      Conferences and Workshops
  1.5 Outline of this Thesis
  Bibliography

2 A Guideline for Reporting Experimental Protocols in Life Sciences
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 Materials
      i) Instructions for authors from analyzed journals
      ii) Corpus of protocols
      iii) Minimum information standards and Ontologies
    2.2.2 Methods for developing this guideline
      Analyzing guidelines for authors
      Analyzing the protocols
      Analyzing Minimum Information Standards and ontologies
      Generating the first draft
      Evaluation of data elements by domain experts
  2.3 Results
    2.3.1 Bibliographic data elements
    2.3.2 Data elements of the discourse
    2.3.3 Data elements for materials
    2.3.4 Data elements for the procedure
  2.4 Data elements represented in the SMART Protocols Ontology
  2.5 Discussion
  2.6 Conclusion
  Bibliography

3 Using Semantics for Representing Experimental Protocols
  3.1 Background
  3.2 Methods
    3.2.1 The Kick-off, Scenarios and Competency Questions
    3.2.2 Conceptualization and Formalization
      Domain Analysis and Knowledge Acquisition, DAKA
      Linguistic and Semantic Analysis, LISA
      Iterative ontology building and validation, IO
    3.2.3 Ontology Evaluation
  3.3 Results
    3.3.1 The SMART Protocols ontology
      The Document Module
      The Workflow Module
    3.3.2 Evaluation
      Syntax
      Conceptualization and Formalization
      Competency questions
  3.4 Applying the SMART Protocols Ontology to the Definition of a Minimal Information Model
    3.4.1 The Sample Instrument Reagent Objective (SIRO) Model
    3.4.2 Evaluating the SIRO Model
  3.5 Discussion
    3.5.1 SMART Protocols Ontology
    3.5.2 Modularization of the SP ontology
    3.5.3 Limitations
    3.5.4 The SIRO model, application of the ontology
  3.6 Conclusions
  Bibliography

4 Laboratory Protocols in Bioschemas
  4.1 Introduction
  4.2 Why semantic structuring?
  4.3 Bioschemas at a glance
    4.3.1 Experimental Protocols and Bioschemas
  4.4 Developing the LabProtocol profile
  4.5 Results, The Labprotocol Profile
    4.5.1 Mandatory properties
    4.5.2 Recommended properties
  4.6 Discussion
  4.7 Conclusions and Future Work
  Bibliography

5 BioH, The Smart Protocols Annotation Tool
  5.1 Introduction
  5.2 The SIRO Curation Model
  5.3 The Tool
    5.3.1 Architecture
  5.4 Discussion and Concluding Remarks
  Bibliography

6 Generating a Gold Standard Corpus for Experimental Protocols
  6.1 Introduction
  6.2 Materials and Methods
    6.2.1 Materials
      Corpus of documents
      Annotators
      Annotation guidelines
  6.3 Methods
  6.4 Results
  6.5 Discussion
  6.6 Conclusions
  Bibliography

7 Semantics at Birth, the SMART Protocols Publication Platform
  7.1 Introduction
  7.2 Semantic Publishing for Experimental Protocols
    7.2.1 Preserving the Resource Map for a Protocol
  7.3 Results
    7.3.1 Architecture and Data Workflow
  7.4 Discussion
