FlyClockbase


bioRxiv preprint doi: https://doi.org/10.1101/099192; this version posted January 9, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Watching the clock for 25 years in FlyClockbase:
Variability in circadian clocks of Drosophila melanogaster as uncovered by biological model curation

Katherine S. Scheuer*,†, Bret Hanlon‡, Jerdon W. Dresel*,†, Erik D. Nolan*,†, John C. Davis‡, Laurence Loewe*,†

* Systems Biology Theme, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715
† Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706
‡ Department of Statistics, University of Wisconsin-Madison, Madison, WI, 53706

General Article Summary

Circadian clocks impact health and fitness by controlling daily rhythms of gene-expression through complex gene-regulatory networks. Deciphering how they work requires experimentally tracking changes in amounts of clock components. We designed FlyClockbase to simplify data-access for biologists and modelers and curated over 400 time series observed in wildtype fruit flies from 25 years of research on clocks. We found differences in peak time variance of the clock-proteins ‘PERIOD’ and ‘TIMELESS’, which probably stem from differences in phosphorylation-network complexity. Combining in-depth circadian-biology, model-curation, and compiler logic, our trans-disciplinary research shows how biology-friendly compilers could simplify model curation enough to democratize it.

POST StabliizingZone Level: QualityQuest. VersionVariant Number: QQv1r0p0_2017m01d05_LL

Running title: FlyClockbase time series variance curation

Keywords / Key phrases:
Drosophila melanogaster circadian clock model biological data input review,
experimental time series peak-valley variance and outlier clock observations,
differential variance hypothesis on PERIOD - TIMELESS amount peak times,
data repository for estimating parameters of mechanistic simulation models,
compiler logic enabling human error analysis simplifying biological model curation.

Corresponding author:
Laurence Loewe
Wisconsin Institute for Discovery, University of Wisconsin-Madison,
330 N Orchard St, Madison, WI, 53715
Email and Phone: [email protected] (608) 316-4324

Statement of data availability and stabilizing versioning number: QQv1
For review purposes: QQv1 zip-archive in Supplemental Material or upon request.
Before final publication: FlyClockbase will be made available on http://github.com/

Abbreviations: Table 1: Core clock components; Table 2: Concepts in FlyClockbase
Supporting Material:
Supporting Text and Tables, Supplemental Statistical Analysis (87 pages),
R-Script zip file (>12K lines, QQv1), FlyClockbase zip file (QQv1).

Abstract

High-quality model curation provides insights by organizing biological knowledge-fragments.
We aim to integrate published results about circadian clocks in Drosophila melanogaster while exploring economies of scale in model curation. Clocks govern rhythms of gene-expression that impact fitness, health, cancer, memory, mental functions, and more. Human clock insights have been pioneered in flies. Flies simplify investigating complex gene regulatory networks, which express proteins cyclically using environmentally entrained interlocking feedback loops. Simulations could simplify research further, but currently few models test their quality directly against experimentally observed time series scattered across publications. We designed FlyClockbase for robust efficient access to such scattered data for biologists and modelers, prioritizing simplicity and openness to encourage experimentalists to preserve more annotations and raw-data. Such details could multiply long-term value for modelers interested in meta-analyses, parameter estimates, and hypothesis testing. Currently FlyClockbase contains over 400 wildtype time series of core circadian components systematically curated from 86 studies published between 1990 and 2015. Using FlyClockbase, we show that PERIOD protein amount peak time variance unexpectedly exceeds that of TIMELESS. We hypothesize, PERIOD’s exceedingly more complex phosphorylation rules are responsible. Human error analysis improved data quality and revealed significance-degrading outliers, possibly violating presumed absence of wildtype heterogeneity or lab evolution. We found PCR-measured peak time variances exceed those from other methods, pointing to initial count stochasticity. Our trans-disciplinary analyses demonstrate how compilers with more biology-friendly logic could simplify, guide, and naturally distribute biological model curation. Resulting quality increases and cost reductions benefit curation-dependent grand challenges like personalizing medicine.
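As a rough illustration of the kind of comparison the abstract describes (whether peak times of one clock protein are more variable than those of another), the following is a minimal Python sketch of a one-sided permutation test on the ratio of sample variances. It is not the authors' method: the paper's analysis lives in its own R script, which is not reproduced here. The function names and the peak-time values are hypothetical and chosen only for illustration; they are not FlyClockbase data. The sketch also assumes peak times have already been placed on a linear scale, so that no value wraps around the 24-hour boundary.

```python
import random
import statistics


def variance_ratio(a, b):
    """Ratio of sample variances var(a) / var(b); inf if var(b) is zero."""
    var_b = statistics.variance(b)
    if var_b == 0:
        return float("inf")
    return statistics.variance(a) / var_b


def permutation_test_variance(a, b, n_perm=10_000, seed=0):
    """One-sided permutation test of whether var(a) exceeds var(b).

    Each group is centered on its own median before pooling, so that a
    difference in typical peak time does not inflate the spread of the
    shuffled groups. Returns (observed_ratio, p_value).
    """
    rng = random.Random(seed)
    med_a, med_b = statistics.median(a), statistics.median(b)
    centered_a = [x - med_a for x in a]
    centered_b = [x - med_b for x in b]
    observed = variance_ratio(centered_a, centered_b)

    pooled = centered_a + centered_b
    n_a = len(centered_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if variance_ratio(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical peak times in hours; NOT taken from FlyClockbase.
HYPOTHETICAL_PER_PEAKS = [18.0, 20.5, 22.0, 19.0, 23.5, 17.5, 21.0, 24.0]
HYPOTHETICAL_TIM_PEAKS = [19.5, 20.0, 20.5, 19.0, 20.0, 21.0, 19.5, 20.5]

if __name__ == "__main__":
    ratio, p = permutation_test_variance(
        HYPOTHETICAL_PER_PEAKS, HYPOTHETICAL_TIM_PEAKS
    )
    print(f"variance ratio = {ratio:.2f}, permutation p = {p:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and no libraries beyond the Python standard library; other tests for unequal variance would serve the same illustrative purpose.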
Table of Contents

INTRODUCTION
Challenges --- Circadian clocks --- Model organisms --- Math models
Estimating unknown rates from observed time series --- Biological example --- Models and reality
Reproducibility of research --- Firm foundations --- Problems with label reproducibility
Statistical reproducibility --- Statistical error iceberg --- Reproducibility in genetics
Versioned Biological Information Resources (VBIRs) --- Importance of versioned data integrators
Flexibility of VBIRs --- Database integration --- Cochrane review
Genome projects as a model for VBIRs development on a broader scale
Genome projects have revolutionized biology --- Importance of biological model curation
How a compiler could help in biological model curation --- Enforce best practices
Efficiencies of scale --- Opportunity --- Purpose of this study --- Formal organization of Versioned Biological Information Resources (VBIRs) --- Data integrated by curation --- Hypothesis tested: biological variability --- Hypothesis tested: observation method --- Human error analysis
Compiler logic design --- Importance of efficient biological model curation --- Overview of Sections

MODELS
Biological model of fly circadian clocks --- Main loop --- Other loops
In silico models integrating fly clock observations --- Biological results overview
Role of stochasticity --- Shared problems --- Parameter estimation in complex models
Using abstract time series traits --- Using complete observed time series --- Using both, complete observed time series and abstract traits --- Complete experimentally observed time series --- Working with abstract time series traits --- Distances --- Both sides offer advantages
Studies integrating time series data
FlyClockbase data model overview --- Overview --- SumS, DetS --- Logic in biology
Other design decisions --- Simple file system storage --- Flexibility --- Data types for organizing content --- Content, Attributes, Traits --- Identification of TimeSeries --- IDL, IDF, IDM, Ref.Fig.TS -- Raw, Mod, and Odd Observations -- Data types of time measurements
ZT, DZT, CZT -- Data types of amounts --- Current definition of scope --- Collection of data

MATERIALS AND METHODS
Literature search --- Initial eligibility assessment --- Biological eligibility assessment
TimeSeries data extraction --- Accuracy estimates of digitized TimeSeries data
Extracting TimeSeries Attributes
TimeSeries Traits analysis of Peaks and Valleys refined by ObsOdd checks
Measuring a limit for maximal peak time variance
Factors contributing to increased trait variance --- Linearizing Time Series Data
Statistical Analyses --- Automated analysis with R script --- Outlier analysis
Testing differences in variance --- Testing differences in mean
Supplemental Statistical Analysis

RESULTS
Experimental observations used in modeling
FlyClockbase is a new resource enabling studies of circadian clock TimeSeries
Hypothesis testing --- Error analysis --- Overview of