Deep Autoencoder Neural Networks for Gene Ontology Annotation Predictions

Total Page:16

File Type:pdf, Size:1020Kb

Deep Autoencoder Neural Networks for Gene Ontology Annotation Predictions Deep Autoencoder Neural Networks for Gene Ontology Annotation Predictions ∗ y Davide Chicco Peter Sadowski Pierre Baldi Politecnico di Milano University of California, Irvine University of California, Irvine Dipartimento di Elettronica Dept. of Computer Science Dept. of Computer Science Informazione Bioingegneria Institute for Genomics and Institute for Genomics and Milan, Italy Bioinformatics Bioinformatics [email protected] Irvine, CA, USA Irvine, CA, USA [email protected] [email protected] ABSTRACT functional features from a controlled vocabulary. These an- The annotation of genomic information is a major challenge notations are important for effective communication in biomed- in biology and bioinformatics. Existing databases of known ical research, and lay the groundwork for bioinformatics soft- gene functions are incomplete and prone to errors, and the ware tools and data mining investigations. The in vitro bimolecular experiments needed to improve these databases biomolecular experiments used to validate gene functions are are slow and costly. While computational methods are not expensive, so the development of computational methods to a substitute for experimental verification, they can help in identify errors and prioritize new biomolecular experiments two ways: algorithms can aid in the curation of gene anno- is a worthwhile area of research [1]. tations by automatically suggesting inaccuracies, and they The Gene Ontology project (GO) is a bioinformatics ini- can predict previously-unidentified gene functions, acceler- tiative to characterize all the important features of genes and ating the rate of gene function discovery. In this work, we gene products within a cell [2] [3]. GO is composed of three develop an algorithm that achieves both goals using deep controlled vocabularies structured as mostly-separate sub- autoencoder neural networks. With experiments on gene ontologies: biological processes, cellular components, and annotation data from the Gene Ontology project, we show molecular functions. Each GO sub-ontology is structured as that deep autoencoder networks achieve better performance a directed acyclic graph of features (nodes) and ontological than other standard machine learning methods, including relationships (edges). In January 2014, GO contained 39,000 the popular truncated singular value decomposition. terms with more than 25,450 biological processes, 2,250 cel- lular components, and 9,650 molecular functions. However, GO annotations are constantly being revised and added as Categories and Subject Descriptors new experimental evidence is produced. I.2.6 [Artificial Intelligence]: Learning; J.3 [Life and One approach to improving gene function annotation data Medical Sciences]: Biology and Genetics; H.2.8 [Database bases like GO is to use patterns in the known annotations Applications]: Data mining to predict new annotations. This can be viewed as a matrix- completion problem, in which one attempts to recover a ma- Keywords trix with some underlying structure from noisy observations. Machine learning algorithms have proved very successful in biomolecular annotations, matrix-completion, autoencoders, similar applications, such as the famous million-dollar Net- neural networks, Gene Ontology, truncated singular value flix prize awarded in 2009. Many machine learning algo- decomposition, principal component analysis rithms have already been applied to gene function annota- tion ([4] [5] [6] [7] [8] [9]), but to the best of our knowledge 1. INTRODUCTION deep autoencoder neural networks have not. Deep networks In bioinformatics, a controlled gene function annotation of multiple hidden layers have an advantage over shallow is a binary matrix associating genes or gene products with machine learning methods in that they are able to model complex data with greater efficiency. They have proven their ∗corresponding author usefulness in fields such as vision and speech recognition, and ycorresponding author promise to yield similar performance gains in other machine learning applications that have complex underlying struc- Permission to make digital or hard copies of all or part of this work for personal or ture in the data. classroom use is granted without fee provided that copies are not made or distributed A popular algorithm for matrix-completion is the trun- for profit or commercial advantage and that copies bear this notice and the full citation cated singular value decomposition method (tSVD). Kha- on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or tri et al. first used this method for GO annotation predic- republish, to post on servers or to redistribute to lists, requires prior specific permission tion [10], and one of the authors of this work has extended and/or a fee. Request permissions from [email protected]. their method with gene clustering and term-term similarity BCB’14, September 20–23, 2014, Newport Beach, CA, USA. weights [11] [12]. However, the tSVD method can be viewed Copyright is held by the owner/author(s). Publication rights licensed to ACM. as a special linear case of a more general approach using ACM 978-1-4503-2894-4/14/09 ...$15.00. autoencoders [13] [14] [15]. Deep, non-linear, autoencoder http://dx.doi.org/10.1145/2649387.2649442. ACM-BCB 2014 533 neural networks have more expressive power, and may be better suited for discovering the underlying patterns in gene function annotation data. In this paper, we summarize the tSVD and autoencoder methods, show how they can be used to predict annotations, and compare the performance on six separate GO datasets. 2. SYSTEM AND METHODS In this section we describe the two annotation-prediction algorithms used in this paper: Truncated Singular Value De- composition and Autoencoder Neural Network. 2.1 Truncated Singular Value Decomposition Truncated Singular Value Decomposition (tSVD) [16] is a matrix factorization method that produces a low-rank ap- Figure 1: An illustration of the Singular Value m×n Decomposition (upper green image) and the Trun- proximation to a matrix. Define Ad 2 f0; 1g to be a cated SVD reconstruction (lower blue image) of the matrix of annotations. The m rows of Ad correspond to A matrix. In the classical SVD decompostion, A genes, while the n columns correspond to GO features, such m×n m×m m×n T n×n that 2 f0; 1g , U2 R , Σ 2 R V 2 R . In the Truncated decomposition, where k 2 N is the trun- cation level, U 2 m×k, Σ 2 k×k, V T 2 k×n, and the ( k R k R k R 1 if gene i is annotated with feature j, m×n output matrix Ae2 R Ad(i; j) = 0 otherwise. (1) When features are organized into ontologies, sometimes The matrix Ae is real valued and can be interpreted as a only the most specific feature is specified, and the more gen- model of the noisy, incomplete observations. It can be used eral features (ancestors) are implicit. Thus, in this work we to predict both inaccuracies and missing gene functions | a consider a modified matrix A defined as large value of eaij suggests that gene i should be annotated 8 with term j, whereas a value close to zero suggests the oppo- > if gene i is annotated with feature j < 1 site. The choice of the k truncation parameter controls the A(i; j) = or with any descendant of j, > complexity of the model, and affects the predictions. Khatri : 0 otherwise. et al. use a fixed value of k = 500 in [10] [18] [19], while one (2) of the authors of this paper has developed a new discrete op- T The ith row of the A matrix (ai ) contains all the direct and timization algorithm to select the best truncation level on indirect annotations of gene i. The jth column encodes the the basis of the ROC AUCs, described in [20]. list of genes that have been annotated (directly or indirectly) In order to better comprehend why Ae can be used to to feature j. This process is sometimes defined as annotation predict gene-to-term annotations, we highlight that an al- unfolding [17]. ternative expression of Equation (4) can be obtained using Predictions are produced by computing the SVD of the basic linear algebra manipulations: matrix A and truncating the less-significant singular values. T The SVD of the matrix A is given by Ae = AVk Vk (5) A = U Σ V T (3) Additionally, the SVD of the matrix A is related to the T T eigen-decomposition of the symmetric matrices T = A A where U is a m × m unitary matrix (i.e. U U = I), Σ is T and G = AA . The columns of Vk (Uk) are a set of k a non-negative diagonal matrix of size m × n, and V T is T eigenvectors corresponding to the k largest eigenvalues of a n × n unitary matrix (i.e. V V = I). Conventionally, the matrix T (G). The matrix T has a simple interpretation the entries along the diagonal of Σ (the singular values) in our context. In fact, are sorted in non-increasing order. The number r ≤ p of m non-zero singular values is equal to the rank of the matrix X T (j ; j ) = A · A (6) A, where p = min(m; n). For a positive integer k < r, the 1 2 (i;j1) (i;j2) i=1 tSVD matrix Ae is given by i.e. T (j1; j2) is the number of genes annotated with both terms, j1 and j2. Consequently, T (j1; j2) indicates the (un- T Ae = Uk Σk Vk (4) normalized) correlation between term pairs and it can be interpreted as a similarity score of the terms j and j , the where U (V T ) is a m × k (n × k) matrix achieved by 1 2 k k computation of which is exclusively based on the use of these retaining the first k columns of U (V ) and Σ is a k × k di- terms in available annotations.
Recommended publications
  • Unicarbkb: Building a Standardised and Scalable Informatics Platform for Glycosciences Research
    Platforms EOI: UniCarbKB: Building a Standardised and Scalable Informatics Platform for Glycosciences Research 20 September 2019 at 13:02 Project title UniCarbKB: Building a Standardised and Scalable Informatics Platform for Glycosciences Research Field of Research code(s) 03 CHEMICAL SCIENCES 06 BIOLOGICAL SCIENCES 08 INFORMATION AND COMPUTING SCIENCES 11 MEDICAL AND HEALTH SCIENCES EOI Lead Name Matthew Campbell EOI lead Research Group Institute for Glycomics EOI lead Organisation Institute for Glycomics, Griffith University EOI lead Email Collaborator details Name Research Group Organisation Malcolm Wolski eResearch Services Griffith University Project description Over the last few years the glycomics knowledge platform UniCarbKB has contributed to the growth of international glycoinformatics resources by providing access to data collections, web services and software applications supported by biocuration activities. However, UniCarbKB has continued to develop as a single monolithic resource. This proposed project will improve the growth and scalability of UniCarbKB by breaking down the platform into smaller and composable pieces. This will be achieved through the adoption of a more decentralised approach and a microservice architecture that will allow components of the platform to scale at different rates. This architecture will allow us is to build an extendable and cross-disciplinary resource by integrating existing data with new analysis tools. The adoption of international resources (e.g. NIH GlyGen), will allow not just mapping of glycan data to genes and proteins but also pathways, gene ontology, publications and a wide variety of data from EBI, NCBI and other sources. Additionally, we propose to integrate glycan array data and develop ontologies that can serve as tools for integration and subsequently analysis of glycan-associated Existing technology Adopt The proposed project will adopt a multi-tier application architecture in which applications will be developed and deployed as microservices.
    [Show full text]
  • Hierarchical Classification of Gene Ontology Terms Using the Gostruct
    Hierarchical classification of Gene Ontology terms using the GOstruct method Artem Sokolov and Asa Ben-Hur Department of Computer Science, Colorado State University Fort Collins, CO, 80523, USA Abstract Protein function prediction is an active area of research in bioinformatics. And yet, transfer of annotation on the basis of sequence or structural similarity remains widely used as an annotation method. Most of today's machine learning approaches reduce the problem to a collection of binary classification problems: whether a protein performs a particular function, sometimes with a post-processing step to combine the binary outputs. We propose a method that directly predicts a full functional annotation of a protein by modeling the structure of the Gene Ontology hierarchy in the framework of kernel methods for structured-output spaces. Our empirical results show improved performance over a BLAST nearest-neighbor method, and over algorithms that employ a collection of binary classifiers as measured on the Mousefunc benchmark dataset. 1 Introduction Protein function prediction is an active area of research in bioinformatics [25]; and yet, transfer of annotation on the basis of sequence or structural similarity remains the standard way of assigning function to proteins in newly sequenced organisms [20]. The Gene Ontology (GO), which is the current standard for annotating gene products and proteins, provides a large set of terms arranged in a hierarchical fashion that specify a gene-product's molecular function, the biological process it is involved in, and its localization to a cellular component [12]. GO term prediction is therefore a hierarchical classification problem, made more challenging by the thousands of annotation terms available.
    [Show full text]
  • Gene Ontology Analysis with Cytoscape
    Introduction to Systems Biology Exercise 6: Gene Ontology Analysis Overview: Gene Ontology (GO) is a useful resource in bioinformatics and systems biology. GO defines a controlled vocabulary of terms in biological process, molecular function, and cellular location, and relates the terms in a somewhat-organized fashion. This exercise outlines the resources available under Cytoscape to perform analyses for the enrichment of gene annotation (GO terms) in networks or sub-networks. In this exercise you will: • Learn how to navigate the Cytoscape Gene Ontology wizard to apply GO annotations to Cytoscape nodes • Learn how to look for enriched GO categories using the BiNGO plugin. What you will need: • The BiNGO plugin, developed by the Computational Biology Division, Dept. of Plant Systems Biology, Flanders Interuniversitary Institute for Biotechnology (VIB), described in Maere S, et al. Bioinformatics. 2005 Aug 15;21(16):3448-9. Epub 2005 Jun 21. • galFiltered.sif used in the earlier exercises and found under sampleData. Please download any missing files from http://www.cbs.dtu.dk/courses/27041/exercises/Ex3/ Download and install the BiNGO plugin, as follows: 1. Get BiNGO from the course page (see above) or go to the BiNGO page at http://www.psb.ugent.be/cbd/papers/BiNGO/. (This site also provides documentation on BiNGO). 2. Unzip the contents of BiNGO.zip to your Cytoscape plugins directory (make sure the BiNGO.jar file is in the plugins directory and not in a subdirectory). 3. If you are currently running Cytoscape, exit and restart. First, we will load the Gene Ontology data into Cytoscape. 1. Start Cytoscape, under Edit, Preferences, make sure your default species, “defaultSpeciesName”, is set to “Saccharomyces cerevisiae”.
    [Show full text]
  • To Find Information About Arabidopsis Genes Leonore Reiser1, Shabari
    UNIT 1.11 Using The Arabidopsis Information Resource (TAIR) to Find Information About Arabidopsis Genes Leonore Reiser1, Shabari Subramaniam1, Donghui Li1, and Eva Huala1 1Phoenix Bioinformatics, Redwood City, CA USA ABSTRACT The Arabidopsis Information Resource (TAIR; http://arabidopsis.org) is a comprehensive Web resource of Arabidopsis biology for plant scientists. TAIR curates and integrates information about genes, proteins, gene function, orthologs gene expression, mutant phenotypes, biological materials such as clones and seed stocks, genetic markers, genetic and physical maps, genome organization, images of mutant plants, protein sub-cellular localizations, publications, and the research community. The various data types are extensively interconnected and can be accessed through a variety of Web-based search and display tools. This unit primarily focuses on some basic methods for searching, browsing, visualizing, and analyzing information about Arabidopsis genes and genome, Additionally we describe how members of the community can share data using TAIR’s Online Annotation Submission Tool (TOAST), in order to make their published research more accessible and visible. Keywords: Arabidopsis ● databases ● bioinformatics ● data mining ● genomics INTRODUCTION The Arabidopsis Information Resource (TAIR; http://arabidopsis.org) is a comprehensive Web resource for the biology of Arabidopsis thaliana (Huala et al., 2001; Garcia-Hernandez et al., 2002; Rhee et al., 2003; Weems et al., 2004; Swarbreck et al., 2008, Lamesch, et al., 2010, Berardini et al., 2016). The TAIR database contains information about genes, proteins, gene expression, mutant phenotypes, germplasms, clones, genetic markers, genetic and physical maps, genome organization, publications, and the research community. In addition, seed and DNA stocks from the Arabidopsis Biological Resource Center (ABRC; Scholl et al., 2003) are integrated with genomic data, and can be ordered through TAIR.
    [Show full text]
  • Snps) Distant from Xenobiotic Response Elements Can Modulate Aryl Hydrocarbon Receptor Function: SNP-Dependent CYP1A1 Induction S
    Supplemental material to this article can be found at: http://dmd.aspetjournals.org/content/suppl/2018/07/06/dmd.118.082164.DC1 1521-009X/46/9/1372–1381$35.00 https://doi.org/10.1124/dmd.118.082164 DRUG METABOLISM AND DISPOSITION Drug Metab Dispos 46:1372–1381, September 2018 Copyright ª 2018 by The American Society for Pharmacology and Experimental Therapeutics Single Nucleotide Polymorphisms (SNPs) Distant from Xenobiotic Response Elements Can Modulate Aryl Hydrocarbon Receptor Function: SNP-Dependent CYP1A1 Induction s Duan Liu, Sisi Qin, Balmiki Ray,1 Krishna R. Kalari, Liewei Wang, and Richard M. Weinshilboum Division of Clinical Pharmacology, Department of Molecular Pharmacology and Experimental Therapeutics (D.L., S.Q., B.R., L.W., R.M.W.) and Division of Biomedical Statistics and Informatics, Department of Health Sciences Research (K.R.K.), Mayo Clinic, Rochester, Minnesota Received April 22, 2018; accepted June 28, 2018 ABSTRACT Downloaded from CYP1A1 expression can be upregulated by the ligand-activated aryl fashion. LCLs with the AA genotype displayed significantly higher hydrocarbon receptor (AHR). Based on prior observations with AHR-XRE binding and CYP1A1 mRNA expression after 3MC estrogen receptors and estrogen response elements, we tested treatment than did those with the GG genotype. Electrophoretic the hypothesis that single-nucleotide polymorphisms (SNPs) map- mobility shift assay (EMSA) showed that oligonucleotides with the ping hundreds of base pairs (bp) from xenobiotic response elements AA genotype displayed higher LCL nuclear extract binding after (XREs) might influence AHR binding and subsequent gene expres- 3MC treatment than did those with the GG genotype, and mass dmd.aspetjournals.org sion.
    [Show full text]
  • Use and Misuse of the Gene Ontology Annotations
    Nature Reviews Genetics | AOP, published online 13 May 2008; doi:10.1038/nrg2363 REVIEWS Use and misuse of the gene ontology annotations Seung Yon Rhee*, Valerie Wood‡, Kara Dolinski§ and Sorin Draghici|| Abstract | The Gene Ontology (GO) project is a collaboration among model organism databases to describe gene products from all organisms using a consistent and computable language. GO produces sets of explicitly defined, structured vocabularies that describe biological processes, molecular functions and cellular components of gene products in both a computer- and human-readable manner. Here we describe key aspects of GO, which, when overlooked, can cause erroneous results, and address how these pitfalls can be avoided. The accumulation of data produced by genome-scale FIG. 1b). These characteristics of the GO structure enable research requires explicitly defined vocabularies to powerful grouping, searching and analysis of genes. describe the biological attributes of genes in order to allow integration, retrieval and computation of the Fundamental aspects of GO annotations data1. Arguably, the most successful example of system- A GO annotation associates a gene with terms in the atic description of biology is the Gene Ontology (GO) ontologies and is generated either by a curator or project2. GO is widely used in biological databases, automatically through predictive methods. Genes are annotation projects and computational analyses (there associated with as many terms as appropriate as well as are 2,960 citations for GO in version 3.0 of the ISI Web of with the most specific terms available to reflect what is Knowledge) for annotating newly sequenced genomes3, currently known about a gene.
    [Show full text]
  • Wormbase Curation Interfaces and Tools
    Wormbase Curation Interfaces And Tools Xiaodong Wang, Paul Sternberg and the WormBase Consortium Division of Biology 156-29, California Institute of Technology, Pasadena, CA 91125, USA Abstract Cellular Component GO curation form Antibody OA Gene ontology OA Curating biological information from the published literature can be time- and labor-intensive especially without automated tools. WormBase1 has adopted several curation interfaces and tools, most of which were built in-house, to help curators recognize and extract data more efficiently from the literature. These tools range from simple computer interfaces for data entry to employing scripts that take advantage of complex text extraction algorithms, which automatically identify specific objects in a paper and presents them to the curator for curation. By using these in-house tools, we Go term are also able to tailor the tool to the individual needs and preferences of the curator. For example, C.elegans protein Cellular Gene Ontology Cellular Component and gene-gene interaction curators employ the text mining Component software Textpresso2 to indentify, retrieve, and extract relevant sentences from the full text of an Category terms article. The curators then use a web-based curation form to enter the data into our local database. Wormbase has developed our own Ontology Annotator tool based on the publicly available Phenote ontology annotation curation interface (developed by the Berkeley Bioinformatics Open-Source Ontology Annotator Projects (BBOP)), which we have adapted with datatype specific configurations. Currently, we have Autocomple4on field eight data s, including antibody, GO, gene regulation, interaction, molecule, people, phenotype and with ontology Transgene OA transgene are using or will use OA for routine literature curation.
    [Show full text]
  • The Biogrid Interaction Database
    D470–D478 Nucleic Acids Research, 2015, Vol. 43, Database issue Published online 26 November 2014 doi: 10.1093/nar/gku1204 The BioGRID interaction database: 2015 update Andrew Chatr-aryamontri1, Bobby-Joe Breitkreutz2, Rose Oughtred3, Lorrie Boucher2, Sven Heinicke3, Daici Chen1, Chris Stark2, Ashton Breitkreutz2, Nadine Kolas2, Lara O’Donnell2, Teresa Reguly2, Julie Nixon4, Lindsay Ramage4, Andrew Winter4, Adnane Sellam5, Christie Chang3, Jodi Hirschman3, Chandra Theesfeld3, Jennifer Rust3, Michael S. Livstone3, Kara Dolinski3 and Mike Tyers1,2,4,* 1Institute for Research in Immunology and Cancer, Universite´ de Montreal,´ Montreal,´ Quebec H3C 3J7, Canada, 2The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada, 3Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, 4School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JR, UK and 5Centre Hospitalier de l’UniversiteLaval´ (CHUL), Quebec,´ Quebec´ G1V 4G2, Canada Received September 26, 2014; Revised November 4, 2014; Accepted November 5, 2014 ABSTRACT semi-automated text-mining approaches, and to en- hance curation quality control. The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein in- INTRODUCTION teractions curated from the primary biomedical lit- Massive increases in high-throughput DNA sequencing erature for all major model organism species and technologies (1) have enabled an unprecedented level of humans. As of September 2014, the BioGRID con- genome annotation for many hundreds of species (2–6), tains 749 912 interactions as drawn from 43 149 pub- which has led to tremendous progress in the understand- lications that represent 30 model organisms.
    [Show full text]
  • Summary Visualisations of Gene Ontology Terms with GO-Figure!
    bioRxiv preprint doi: https://doi.org/10.1101/2020.12.02.408534; this version posted December 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Summary Visualisations of Gene Ontology Terms with GO-Figure! Maarten JMF Reijnders1*, Robert M Waterhouse1* 1Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland * Correspondence: [email protected] (MJMFR), [email protected] (RMW) Keywords: functional genomics, Python software, redundancy reduction, semantic similarity, GO term enrichment Abstract The Gene Ontology (GO) is a cornerstone of functional genomics research that drives discoveries through knowledge-informed computational analysis of biological data from large- scale assays. Key to this success is how the GO can be used to support hypotheses or conclusions about the biology or evolution of a study system by identifying annotated functions that are overrepresented in subsets of genes of interest. Graphical visualisations of such GO term enrichment results are critical to aid interpretation and avoid biases by presenting researchers with intuitive visual data summaries. Amongst current visualisation tools and resources there is a lack of standalone open-source software solutions that facilitate systematic comparisons of multiple lists of GO terms. To address this we developed GO-Figure!, an open-source Python software for producing user-customisable semantic similarity scatterplots of redundancy-reduced GO term lists. The lists are simplified by grouping together GO terms with similar functions using their quantified information contents and semantic similarities, with user-control over grouping thresholds.
    [Show full text]
  • Genedb and Wikidata[Version 1; Peer Review: 1 Approved, 1 Approved with Reservations]
    Wellcome Open Research 2019, 4:114 Last updated: 20 OCT 2020 SOFTWARE TOOL ARTICLE GeneDB and Wikidata [version 1; peer review: 1 approved, 1 approved with reservations] Magnus Manske , Ulrike Böhme , Christoph Püthe , Matt Berriman Parasites and Microbes, Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK v1 First published: 01 Aug 2019, 4:114 Open Peer Review https://doi.org/10.12688/wellcomeopenres.15355.1 Latest published: 14 Oct 2019, 4:114 https://doi.org/10.12688/wellcomeopenres.15355.2 Reviewer Status Abstract Invited Reviewers Publishing authoritative genomic annotation data, keeping it up to date, linking it to related information, and allowing community 1 2 annotation is difficult and hard to support with limited resources. Here, we show how importing GeneDB annotation data into Wikidata version 2 allows for leveraging existing resources, integrating volunteer and (revision) report report scientific communities, and enriching the original information. 14 Oct 2019 Keywords GeneDB, Wikidata, MediaWiki, Wikibase, genome, reference, version 1 annotation, curation 01 Aug 2019 report report 1. Sebastian Burgstaller-Muehlbacher , This article is included in the Wellcome Sanger Medical University of Vienna, Vienna, Austria Institute gateway. 2. Andra Waagmeester , Micelio, Antwerp, Belgium Any reports and responses or comments on the article can be found at the end of the article. Corresponding author: Magnus Manske ([email protected]) Author roles: Manske M: Conceptualization, Methodology, Software, Writing – Original Draft Preparation; Böhme U: Data Curation, Validation, Writing – Review & Editing; Püthe C: Project Administration, Writing – Review & Editing; Berriman M: Resources, Writing – Review & Editing Competing interests: No competing interests were disclosed. Grant information: This work was supported by the Wellcome Trust through a Biomedical Resources Grant to MB [108443].
    [Show full text]
  • 430.Full.Pdf
    430 LETTER TO JMG J Med Genet: first published as 10.1136/jmg.39.6.430 on 1 June 2002. Downloaded from A novel 2 bp deletion in the TM4SF2 gene is associated with MRX58 F E Abidi, E Holinski-Feder, O Rittinger, F Kooy, H A Lubs, R E Stevenson, C E Schwartz ............................................................................................................................. J Med Genet 2002;39:430–433 linked mental retardation (XLMR) represents around 5% detection of any alteration at the splice junction. Since the of all MR, with a prevalence of 1 in 600 males.12Fifteen PCR products for exons 2 and 5 were greater than 310 bp, they to twenty percent of the total XLMR is the result of the were digested with HaeIII and PvuII respectively (table 1). X 3 fragile X syndrome. Non-fragile X mental retardation was Amplification was done in a PTC-200 thermocycler (MJ subdivided into syndromal and non-syndromal conditions by Research). Following PCR, 10 µl of the IPS loading dye (95% Neri et al4 in 1991. The syndromal XLMR entities (MRXS) are formamide, 10 mmol/l NaOH, 0.25% bromophenol blue, 0.25% those in which there is a specific pattern of physical, xylene cyanol) was added. The samples were denatured at neurological, or metabolic abnormalities associated with the 96°C for three minutes and loaded on a 0.5 × MDE gel (FMC) presence of mental retardation.5 Non-syndromic XLMR prepared in 0.6 × TBE. The gel was run at 8 W for 15-20 hours (MRX) are conditions in which a gene mutation causes men- at room temperature.
    [Show full text]
  • Gene Ontology Functional Annotations and Pleiotropy
    Network based analysis of genetic disease associations Sarah Gilman Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy under the Executive Committee of the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2014 © 2013 Sarah Gilman All Rights Reserved ABSTRACT Network based analysis of genetic disease associations Sarah Gilman Despite extensive efforts and many promising early findings, genome-wide association studies have explained only a small fraction of the genetic factors contributing to common human diseases. There are many theories about where this “missing heritability” might lie, but increasingly the prevailing view is that common variants, the target of GWAS, are not solely responsible for susceptibility to common diseases and a substantial portion of human disease risk will be found among rare variants. Relatively new, such variants have not been subject to purifying selection, and therefore may be particularly pertinent for neuropsychiatric disorders and other diseases with greatly reduced fecundity. Recently, several researchers have made great progress towards uncovering the genetics behind autism and schizophrenia. By sequencing families, they have found hundreds of de novo variants occurring only in affected individuals, both large structural copy number variants and single nucleotide variants. Despite studying large cohorts there has been little recurrence among the genes implicated suggesting that many hundreds of genes may underlie these complex phenotypes. The question
    [Show full text]