Ontology-Based Methods for Analyzing Life Science Data
Recommended publications
-
D3.3 Initial Technical & Logical Architecture and Future Work
D3.3 Initial Technical & Logical Architecture and future work recommendations
ECP 2008 DILI 558001, Europeana v1.0
Deliverable number: D3.3
Dissemination level: Public
Delivery date: 30 July 2010
Status: Final
Author(s): Makx Dekkers, Stefan Gradmann, Jan Molendijk
eContentplus: This project is funded under the eContentplus programme (OJ L 79, 24.3.2005, p. 1), a multiannual Community programme to make digital content in Europe more accessible, usable and exploitable.
1. Introduction
This deliverable has two tasks: to characterise the technical and logical architecture of Europeana as a system in its 1.0 state (that is to say, by the time of the ‘Rhine’ release), and to outline the future work recommendations that can reasonably be made at that moment. This also provides a straightforward and logical structure to the document: characterisation comes first, followed by the recommendations for future work.
2. Technical and Logical Architecture
From a high-level architectural point of view, Europeana.eu is best characterized as a search engine and a database. It loads metadata delivered by providers and aggregators into a database, and uses that database to allow users to search for cultural heritage objects, and to find links to those objects. Various methods of searching and browsing the objects are offered, including a simple and an advanced search form, a timeline, and an openSearch API. It is also important to describe what Europeana.eu does not do, even though people sometimes expect it to.
-
A Knowledge Discovery Approach
Semantic XML Tagging of Domain-Specific Text Archives: A Knowledge Discovery Approach
Dissertation for the award of the academic degree Doktoringenieur (Dr.-Ing.), accepted by the Faculty of Computer Science of Otto von Guericke University Magdeburg, by Diplom-Kaufmann Peter Karsten Winkler, born 1 October 1971 in Berlin.
Reviewers: Prof. Dr. Myra Spiliopoulou, Prof. Dr. Gunter Saake, Prof. Dr. Stefan Conrad
Place and date of the doctoral colloquium: Magdeburg, 22 January 2009
Karsten Winkler. Semantic XML Tagging of Domain-Specific Text Archives: A Knowledge Discovery Approach. Dissertation, Faculty of Computer Science, Otto von Guericke University Magdeburg, Magdeburg, Germany, January 2009.
Contents
List of Figures
List of Tables
List of Algorithms
Abstract
Zusammenfassung (German summary)
Acknowledgments
1 Introduction
  1.1 The Abundance of Text
  1.2 Defining Semantic XML Markup
  1.3 Benefits of Semantic XML Markup
  1.4 Research Questions
  1.5 Research Methodology
  1.6 Outline
2 Literature Review
  2.1 Storage, Retrieval, and Analysis of Textual Data
    2.1.1 Knowledge Discovery in Textual Databases
    2.1.2 Information Storage and Retrieval
    2.1.3 Information Extraction
  2.2 Discovering Concepts in Textual Data
    2.2.1 Topic Discovery in Text Documents
    2.2.2 Extracting Relational Tuples from Text
    2.2.3 Learning Taxonomies, Thesauri, and Ontologies
  2.3 Semantic Annotation of Text Documents
    2.3.1 Manual Semantic Text Annotation
    2.3.2 Semi-Automated Semantic Text Annotation
    2.3.3 Automated Semantic Text Annotation
  2.4 Schema Discovery in Marked-Up Text Documents
-
A Logical Framework for Modularity of Ontologies∗
A Logical Framework for Modularity of Ontologies∗
Bernardo Cuenca Grau, Ian Horrocks, Yevgeny Kazakov and Ulrike Sattler
The University of Manchester, School of Computer Science, Manchester, M13 9PL, UK
{bcg, horrocks, ykazakov, sattler}@cs.man.ac.uk
Abstract
Modularity is a key requirement for collaborative ontology engineering and for distributed ontology reuse on the Web. Modern ontology languages, such as OWL, are logic-based, and thus a useful notion of modularity needs to take the semantics of ontologies and their implications into account. We propose a logic-based notion of modularity that allows the modeler to specify the external signature of their ontology, whose symbols are assumed to be defined in some other ontology. We define two restrictions on the usage of the external signature, a syntactic and a slightly less restrictive, semantic one, each of which is decidable and guarantees a certain kind of “black-box” behavior, which enables the controlled merging of ontologies.
ontology itself by possibly reusing the meaning of external symbols. Hence by merging an ontology with external ontologies we import the meaning of its external symbols to define the meaning of its local symbols.
To make this idea work, we need to impose certain constraints on the usage of the external signature: in particular, merging ontologies should be “safe” in the sense that they do not produce unexpected results such as new inconsistencies or subsumptions between imported symbols. To achieve this kind of safety, we use the notion of conservative extensions to define modularity of ontologies, and then prove that a property of some ontologies, called locality, can be used to achieve modularity. More precisely, we define two notions of locality for SHIQ TBoxes: (i) a tractable syntactic one which can be used to provide guidance in ontology editing tools, and (ii) a more general semantic one which can
-
Proquest Dissertations
Automated learning of protein involvement in pathogenesis using integrated queries
Eithon Cadag
A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, University of Washington, 2009
Program Authorized to Offer Degree: Department of Medical Education and Biomedical Informatics
UMI Number: 3394276. All rights reserved.
INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.
UMI Dissertation Publishing. UMI 3394276. Copyright 2010 by ProQuest LLC. All rights reserved. This edition of the work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.
University of Washington Graduate School
This is to certify that I have examined this copy of a doctoral dissertation by Eithon Cadag and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made.
Chair of the Supervisory Committee: Reading Committee: Peter Tarczy-Hornoch
In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with "fair use" as prescribed in the U.S.
-
Molecular Biology for Computer Scientists
CHAPTER 1
Molecular Biology for Computer Scientists
Lawrence Hunter
“Computers are to biology what mathematics is to physics.” (Harold Morowitz)
One of the major challenges for computer scientists who wish to work in the domain of molecular biology is becoming conversant with the daunting intricacies of existing biological knowledge and its extensive technical vocabulary. Questions about the origin, function, and structure of living systems have been pursued by nearly all cultures throughout history, and the work of the last two generations has been particularly fruitful. The knowledge of living systems resulting from this research is far too detailed and complex for any one human to comprehend. An entire scientific career can be based in the study of a single biomolecule. Nevertheless, in the following pages, I attempt to provide enough background for a computer scientist to understand much of the biology discussed in this book. This chapter provides the briefest of overviews; I can only begin to convey the depth, variety, complexity and stunning beauty of the universe of living things.
Much of what follows is not about molecular biology per se. In order to explain what the molecules are doing, it is often necessary to use concepts involving, for example, cells, embryological development, or evolution. Biology is frustratingly holistic. Events at one level can affect and be affected by events at very different levels of scale or time. Digesting a survey of the basic background material is a prerequisite for understanding the significance of the molecular biology that is described elsewhere in the book.
-
Deciding Semantic Matching of Stateless Services∗
Deciding Semantic Matching of Stateless Services∗
Duncan Hull†, Evgeny Zolin†, Andrey Bovykin‡, Ian Horrocks†, Ulrike Sattler†, and Robert Stevens†
† School of Computer Science, University of Manchester, UK
‡ Department of Computer Science, University of Liverpool, UK
Abstract
We present a novel approach to describe and reason about stateless information processing services. It can be seen as an extension of standard descriptions which makes explicit the relationship between inputs and outputs and takes into account OWL ontologies to fix the meaning of the terms used in a service description. This allows us to define a notion of matching between services which yields high precision and recall for service location. We explain why matching is decidable, and provide biomedical example services to illustrate the utility of our approach.
Introduction
Understanding the data generated from genome sequencing projects like the Human Genome Project is recognised
specifying automated reasoning algorithms for such stateful service descriptions is basically impossible in the presence of any expressive ontology (Baader et al. 2005). Statelessness implies that we do not need to formulate pre- and post-conditions since our services do not change the world.
The question we are interested in here is how to help the biologist to find a service he or she is looking for, i.e., a service that works with inputs and outputs the biologist can provide/accept, and that provides the required functionality. The growing number of publicly available biomedical web services, 3000 as of February 2006, required better matching techniques to locate services. Thus, we are concerned with the question of how to describe a service request Q and service advertisements Si such that the notion of a service S matching the request Q can be defined in a “useful” way.
-
Bibliography [Abiteboul and Kanellakis, 1989] Serge Abiteboul and Paris Kanellakis
Bibliography
[Abiteboul and Kanellakis, 1989] Serge Abiteboul and Paris Kanellakis. Object identity as a query language primitive. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 159–173, 1989.
[Abiteboul et al., 1995] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison Wesley Publ. Co., Reading, Massachusetts, 1995.
[Abiteboul et al., 1997] Serge Abiteboul, Dallan Quass, Jason McHugh, Jennifer Widom, and Janet L. Wiener. The Lorel query language for semistructured data. Int. J. on Digital Libraries, 1(1):68–88, 1997.
[Abiteboul et al., 2000] Serge Abiteboul, Peter Buneman, and Dan Suciu. Data on the Web: from Relations to Semistructured Data and XML. Morgan Kaufmann, Los Altos, 2000.
[Abiteboul, 1997] Serge Abiteboul. Querying semi-structured data. In Proc. of the 6th Int. Conf. on Database Theory (ICDT’97), pages 1–18, 1997.
[Abrahams et al., 1996] Merryll K. Abrahams, Deborah L. McGuinness, Rich Thomason, Lori Alperin Resnick, Peter F. Patel-Schneider, Violetta Cavalli-Sforza, and Cristina Conati. NeoClassic tutorial: Version 1.0. Technical report, Artificial Intelligence Principles Research Department, AT&T Labs Research and University of Pittsburgh, 1996. Available as http://www.bell-labs.com/project/classic/papers/NeoTut/NeoTut.html.
[Abrett and Burstein, 1987] Glen Abrett and Mark H. Burstein. The KREME knowledge editing environment. Int. J. of Man-Machine Studies, 27(2):103–126, 1987.
[Abrial, 1974] J. R. Abrial. Data semantics. In J. W. Klimbie and K. L. Koffeman, editors, Data Base Management, pages 1–59. North-Holland Publ. Co., Amsterdam, 1974.
[Achilles et al., 1991] E. Achilles, B.
-
Ploidetect Enables Pan-Cancer Analysis of the Causes and Impacts of Chromosomal Instability
bioRxiv preprint doi: https://doi.org/10.1101/2021.08.06.455329; this version posted August 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Ploidetect enables pan-cancer analysis of the causes and impacts of chromosomal instability
Luka Culibrk1,2, Jasleen K. Grewal1,2, Erin D. Pleasance1, Laura Williamson1, Karen Mungall1, Janessa Laskin3, Marco A. Marra1,4, and Steven J.M. Jones1,4
1 Canada’s Michael Smith Genome Sciences Center at BC Cancer, Vancouver, British Columbia, Canada
2 Bioinformatics training program, University of British Columbia, Vancouver, British Columbia, Canada
3 Department of Medical Oncology, BC Cancer, Vancouver, British Columbia, Canada
4 Department of Medical Genetics, Faculty of Medicine, Vancouver, British Columbia, Canada
Cancers routinely exhibit chromosomal instability, resulting in the accumulation of changes in the abundance of genomic material, known as copy number variants (CNVs). Unfortunately, the detection of these variants in cancer genomes is difficult. We developed Ploidetect, a software package that effectively identifies CNVs within whole-genome sequenced tumors. Ploidetect was more sensitive to CNVs in cancer related genes within advanced, pre-treated metastatic cancers than other tools, while also segmenting the most contiguously.
tumors mutate, these variants are considerably more difficult to detect accurately compared to other types of mutations and consequently they may represent an under-explored facet of tumor biology.
While small mutations can be determined through base changes embedded within aligned sequence reads, CNVs are variations in DNA quantity and are typically determined
-
Description Logics
Description Logics
Franz Baader1, Ian Horrocks2, and Ulrike Sattler2
1 Institut für Theoretische Informatik, TU Dresden, Germany, [email protected]
2 Department of Computer Science, University of Manchester, UK, {horrocks,sattler}@cs.man.ac.uk
Summary. In this chapter, we explain what description logics are and why they make good ontology languages. In particular, we introduce the description logic SHIQ, which has formed the basis of several well-known ontology languages, including OWL. We argue that, without the last decade of basic research in description logics, this family of knowledge representation languages could not have played such an important rôle in this context. Description logic reasoning can be used both during the design phase, in order to improve the quality of ontologies, and in the deployment phase, in order to exploit the rich structure of ontologies and ontology based information. We discuss the extensions to SHIQ that are required for languages such as OWL and, finally, we sketch how novel reasoning services can support building DL knowledge bases.
1 Introduction
The aim of this section is to give a brief introduction to description logics, and to argue why they are well-suited as ontology languages. In the remainder of the chapter we will put some flesh on this skeleton by providing more technical details with respect to the theory of description logics, and their relationship to state of the art ontology languages. More detail on these and other matters related to description logics can be found in [6].
Ontologies
There have been many attempts to define what constitutes an ontology, perhaps the best known (at least amongst computer scientists) being due to Gruber: “an ontology is an explicit specification of a conceptualisation” [47] (this was later elaborated to “a formal specification of a shared conceptualisation” [21]). In this context, a conceptualisation means an abstract model of some aspect of the world, taking the form of a definition of the properties of important
-
BIOINFORMATICS APPLICATIONS NOTE Pages 380-381
Vol. 14 no. 4 1998, BIOINFORMATICS APPLICATIONS NOTE, Pages 380-381
MView: a web-compatible database search or multiple alignment viewer
Nigel P. Brown, Christophe Leroy and Chris Sander
European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
Received on December 10, 1997; revised and accepted on January 15, 1998
Abstract
Summary: MView is a tool for converting the results of a sequence database search into the form of a coloured multiple alignment of hits stacked against the query. Alternatively, an existing multiple alignment can be processed. In either case, the output is simply HTML, so the result is platform independent and does not require a separate application or applet to be loaded.
Availability: Free from http://www.sander.ebi.ac.uk/mview/ subject to copyright restrictions.
Contact: [email protected]
Often when running FASTA (Pearson, 1990) or BLAST (Altschul et al., 1990), it is desired to visualize the database hits stacked against the query sequence.
may be hyperlinked to the SRS system (Etzold et al., 1996), a text field, a field of scoring information from searches, and a field reporting the per cent identity of each sequence with respect to a preferred sequence in the alignment, usually the query in the case of a search.
Multiple alignments require minimal parsing and are subjected only to formatting stages. Search hits are first stacked against the ungapped query sequence and require special processing. Ungapped search (e.g. BLAST) hit fragments are assembled into a single string by overlaying them preferentially by score onto a template string, while gapped search (e.g. FASTA) hits have columns corresponding to query gaps excised. Consequently, the stacked alignment is a patchwork of reconstituted sequences that nevertheless is
-
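The MView excerpt above describes assembling ungapped hit fragments "by overlaying them preferentially by score onto a template string". The following Python sketch illustrates one way such an overlay could work. It is not MView's actual code: the `Fragment` type, the `overlay_fragments` function, the 0-based query offsets, and the `-` filler character are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """One ungapped hit fragment (hypothetical structure for this sketch)."""
    start: int      # 0-based offset of the fragment on the ungapped query
    residues: str   # residues of the hit aligned from that offset
    score: float    # score reported by the search program


def overlay_fragments(query_length: int, fragments: List[Fragment],
                      filler: str = "-") -> str:
    """Overlay fragments onto a template so higher scores win on overlap."""
    # Template string as long as the ungapped query, filled with placeholders.
    template = [filler] * query_length
    # Write low-scoring fragments first; later (higher-scoring) fragments
    # overwrite them wherever the two overlap.
    for fragment in sorted(fragments, key=lambda f: f.score):
        for offset, residue in enumerate(fragment.residues):
            position = fragment.start + offset
            if 0 <= position < query_length:
                template[position] = residue
    return "".join(template)


hits = [Fragment(start=0, residues="AAAA", score=10.0),
        Fragment(start=2, residues="CCCC", score=50.0)]
print(overlay_fragments(8, hits))  # AACCCC--
```

Sorting in ascending score order and overwriting is only one choice; filling empty positions in descending score order would give the same string.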
PREDICTD: Parallel Epigenomics Data Imputation with Cloud-Based Tensor Decomposition
bioRxiv preprint doi: https://doi.org/10.1101/123927; this version posted April 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
PREDICTD: PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition
Timothy J. Durham (Department of Genome Sciences, University of Washington)
Maxwell W. Libbrecht (Department of Genome Sciences, University of Washington)
J. Jeffry Howbert (Department of Genome Sciences, University of Washington)
Jeff Bilmes (Department of Electrical Engineering, University of Washington)
William Stafford Noble (Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington)
April 4, 2017
Abstract
The Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project have produced thousands of data sets mapping the epigenome in hundreds of cell types. However, the number of cell types remains too great to comprehensively map given current time and financial constraints. We present a method, PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition (PREDICTD), to address this issue by computationally imputing missing experiments in collections of epigenomics experiments. PREDICTD leverages an intuitive and natural model called “tensor decomposition” to impute many experiments simultaneously. Compared with the current state-of-the-art method, ChromImpute, PREDICTD produces lower overall mean squared error, and combining methods yields further improvement. We show that PREDICTD data can be used to investigate enhancer biology at non-coding human accelerated regions.
PREDICTD provides reference imputed data sets and open-source software for investigating new cell types, and demonstrates the utility of tensor decomposition and cloud computing, two technologies increasingly applicable in bioinformatics. -
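To make the idea of imputing a missing experiment from a tensor decomposition concrete, here is a minimal Python sketch of a generic rank-R CP (PARAFAC-style) reconstruction. It is an illustration only and is not PREDICTD's actual model or code: the function name `cp_predict`, the three factor tables (cell type, assay, genomic position), and the toy numbers are all assumptions.

```python
from typing import List

Factors = List[List[float]]  # one row of R latent values per index


def cp_predict(cell_factors: Factors, assay_factors: Factors,
               position_factors: Factors,
               cell: int, assay: int, position: int) -> float:
    """Estimate one tensor entry as a sum over latent components.

    value[cell, assay, position] ~ sum_r C[cell][r] * A[assay][r] * P[position][r]
    """
    return sum(c * a * p for c, a, p in zip(cell_factors[cell],
                                            assay_factors[assay],
                                            position_factors[position]))


# Toy rank-2 factors (made-up values): 2 cell types, 2 assays, 1 position.
C = [[1.0, 2.0], [0.5, 0.0]]
A = [[0.5, 1.0], [2.0, 1.0]]
P = [[2.0, 3.0]]

# Even if the experiment (cell 1, assay 1) was never measured, the learned
# factors still give an estimate of its signal at position 0.
print(cp_predict(C, A, P, cell=1, assay=1, position=0))  # 2.0
```

In a real setting the factor tables would be fitted to the observed entries, for example by minimizing squared error; that training step is omitted here.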
Are You an Invited Speaker? a Bibliometric Analysis of Elite Groups for Scholarly Events in Bioinformatics
Are You an Invited Speaker? A Bibliometric Analysis of Elite Groups for Scholarly Events in Bioinformatics
Senator Jeong, Sungin Lee, and Hong-Gee Kim
Biomedical Knowledge Engineering Laboratory, Seoul National University, 28–22 YeonGeon Dong, Jongno Gu, Seoul 110–749, Korea. E-mail: {senator, sunginlee, hgkim}@snu.ac.kr
Participating in scholarly events (e.g., conferences, workshops, etc.) as an elite-group member such as an organizing committee chair or member, program committee chair or member, session chair, invited speaker, or award winner is beneficial to a researcher’s career development. The objective of this study is to investigate whether elite-group membership for scholarly events is representative of scholars’ prominence, and which elite group is the most prestigious. We collected data about 15 global (excluding regional) bioinformatics scholarly events held in 2007. We sampled (via stratified random sampling) participants from elite groups in each event. Then, bibliometric indicators (total citations and h index) of seven elite groups and a non-elite group, consisting of authors who submitted at least one paper to an event but were
evaluation, but it would be hard to claim that they have provided comprehensive lists of evaluation measurements. This article aims not to provide such lists but to add to the current practices an alternative metric that complements existing performance measures to give a more comprehensive picture of scholars’ performance.
By one definition (Jeong, 2008), a scholarly event is “a sequentially and spatially organized collection of scholars’ interactions with the intention of delivering and sharing knowledge, exchanging research ideas, and performing related activities.” As such, scholarly events are communication channels from which our new evaluation tool can draw its supporting evidence.
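The excerpt above names total citations and the h index as its bibliometric indicators. As a small worked illustration, the Python sketch below computes both from a list of per-paper citation counts, using the standard definition of the h index (the largest h such that h papers each have at least h citations). The function names and the sample counts are assumptions for illustration and are not taken from the study.

```python
from typing import Iterable


def total_citations(citations: Iterable[int]) -> int:
    """Sum of citation counts over all of a scholar's papers."""
    return sum(citations)


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have >= rank citations
        else:
            break      # counts only decrease from here, so h cannot grow
    return h


papers = [10, 8, 5, 4, 3]          # made-up citation counts
print(total_citations(papers))     # 30
print(h_index(papers))             # 4
```

Here four papers have at least four citations each, but there are not five papers with at least five, so the h index is 4.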