Improving the Gene Ontology Resource to Facilitate More Informative Analysis and Interpretation of Alzheimer’S Disease Data
Total Page:16
File Type:pdf, Size:1020Kb
G C A T T A C G G C A T genes Article Improving the Gene Ontology Resource to Facilitate More Informative Analysis and Interpretation of Alzheimer’s Disease Data Barbara Kramarz 1 , Paola Roncaglia 2 , Birgit H. M. Meldal 2 , Rachael P. Huntley 1 , Maria J. Martin 2, Sandra Orchard 2, Helen Parkinson 2, David Brough 3, Rina Bandopadhyay 4, Nigel M. Hooper 3 and Ruth C. Lovering 1,* 1 UCL Institute of Cardiovascular Science, University College London, Rayne Building, 5 University Street, London WC1E 6JF, UK; [email protected] (B.K.); [email protected] (R.P.H.) 2 European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK; [email protected] (P.R.); [email protected] (B.H.M.M.); [email protected] (M.J.M.); [email protected] (S.O.); [email protected] (H.P.) 3 Division of Neuroscience and Experimental Psychology, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, AV Hill Building, Oxford Road, Manchester M13 9PT, UK; [email protected] (D.B.); [email protected] (N.M.H.) 4 UCL Queen Square Institute of Neurology and Reta Lila Weston Institute of Neurological Studies, 1 Wakefield Street, London WC1N 1PJ, UK; [email protected] * Correspondence: [email protected] or [email protected]; Tel.: +44-207-679-6965 Received: 31 October 2018; Accepted: 23 November 2018; Published: 29 November 2018 Abstract: The analysis and interpretation of high-throughput datasets relies on access to high-quality bioinformatics resources, as well as processing pipelines and analysis tools. Gene Ontology (GO, geneontology.org) is a major resource for gene enrichment analysis. The aim of this project, funded by the Alzheimer’s Research United Kingdom (ARUK) foundation and led by the University College London (UCL) biocuration team, was to enhance the GO resource by developing new neurological GO terms, and use GO terms to annotate gene products associated with dementia. Specifically, proteins and protein complexes relevant to processes involving amyloid-beta and tau have been annotated and the resulting annotations are denoted in GO databases as ‘ARUK-UCL’. Biological knowledge presented in the scientific literature was captured through the association of GO terms with dementia-relevant protein records; GO itself was revised, and new GO terms were added. This literature biocuration increased the number of Alzheimer’s-relevant gene products that were being associated with neurological GO terms, such as ‘amyloid-beta clearance’ or ‘learning or memory’, as well as neuronal structures and their compartments. Of the total 2055 annotations that we contributed for the prioritised gene products, 526 have associated proteins and complexes with neurological GO terms. To ensure that these descriptive annotations could be provided for Alzheimer’s-relevant gene products, over 70 new GO terms were created. Here, we describe how the improvements in ontology development and biocuration resulting from this initiative can benefit the scientific community and enhance the interpretation of dementia data. Keywords: Alzheimer’s disease; dementia; cognitive impairment; neurodegeneration; Gene Ontology; annotation; biocuration; amyloid-beta; microtubule-associated protein tau Genes 2018, 9, 593; doi:10.3390/genes9120593 www.mdpi.com/journal/genes Genes 2018, 9, 593 2 of 24 1. Introduction Understanding the cellular bases of Alzheimer’s disease (AD) and other dementias is an essential step in helping develop improved therapies for their treatment and prevention, as well as supportingGenes 2018 early, 9, 593 diagnosis. Although several genes associated with monogenic AD2 of have 24 been identified [1], the majority of cases are likely to be due to multiple genetic, as well as environmental, 1. Introduction risk factors [2,3]. In order to understand the cellular processes and risk factors associated with AD Understanding the cellular bases of Alzheimer’s disease (AD) and other dementias is an and other dementias, numerous transcriptomic, proteomic, and genome-wide association (GWA) essential step in helping develop improved therapies for their treatment and prevention, as well as studies havesupporting been conductedearly diagnosis. [2,4 –Although6]. Analyses several and ge thenes interpretationassociated with of mo suchnogenic high-throughput AD have been datasets rely onidentified high-quality [1], the annotations majority of cases describing are likely to the be biologicaldue to multiple roles genetic, of the as well implicated as environmental, gene products, as well asrisk appropriate factors [2,3]. methodologiesIn order to understand being the used cellular to generate processes theand data,risk factors the application associated with of appropriate AD statisticaland methods, other dementias, and implementation numerous transcriptomic, in analysis proteomic, tools [4,6 and–8]. genome-wide Gene annotation association provides (GWA) functional knowledgestudies of genehave productsbeen conducted (proteins [2,4–6]. and Analyses RNAs) and ina the format interpretation that can of be such fully high-throughput exploited by systems datasets rely on high-quality annotations describing the biological roles of the implicated gene biology andproducts, genomic as well investigators. as appropriate Thus, methodologies annotation being resources used to generate bridge the data, gap betweenthe application data of collection and dataappropriate analysis statistical [4,9]. The methods, main resourcesand implementation used to identifyin analysis significantly tools [4,6–8]. enrichedGene annotation pathways in GWA, proteomic,provides functional and transcriptomic knowledge of gene studies products are (p thoseroteins provided and RNAs) by in thea format Gene that Ontology can be fully (GO) [10], Reactomeexploited [11], KEGG by systems [12 ],biology and molecularand genomic interaction investigators. databases Thus, annotation [13,14 resources]. This data bridge is the imported gap into independentbetween enrichment data collection analysis and data tools, analysis such [4,9]. as g:Profiler The main [ 15resources], VLAD used [16 to], oridentify DAVID significantly [17], which can enriched pathways in GWA, proteomic, and transcriptomic studies are those provided by the Gene provideOntology different (GO) enrichment [10], Reactome options [11], and/orKEGG [12], settings, and molecular and are interaction updated databases with different [13,14]. frequencies,This all of whichdata willis imported affect analyses’into independent outcomes. enrichment analysis tools, such as g:Profiler [15], VLAD [16], or TheDAVID GO resource [17], which is a can biomedical provide different ontology enrichment that describes options and/or the physiological settings, and are functional updated attributeswith of gene products,different acrossfrequencies, all species, all of which in a will consistent affect analyses’ and computer-accessible outcomes. manner using a controlled vocabulary thatThe GO facilitates resource the is a integrationbiomedical ontology of public that biological describes the data physiological [10,18,19]. functional Fully defined attributes GO terms of gene products, across all species, in a consistent and computer-accessible manner using a are associated with gene products across many species, providing a computable summary of individual controlled vocabulary that facilitates the integration of public biological data [10,18,19]. Fully experimentsdefined that GO demonstrate terms are associated functional with gene information products across (Figure many1). Thesespecies, links providing between a computable GO terms and gene productssummary are of knownindividual as ex ‘annotations’,periments that demonstrate and enable functional the description information ofgene (Figure products 1). These accordinglinks to their molecularbetween GO functions terms and (e.g., gene ‘scavenger products are receptor known as activity’), ‘annotations’, the and biological enable the processes description they of gene contribute towardsproducts (e.g., ‘microtubule according to their cytoskeleton molecular functions organisation’), (e.g., ‘scavenger and their receptor subcellular activity’), locations the biological (or cellular processes they contribute towards (e.g., ‘microtubule cytoskeleton organisation’), and their components, e.g., ‘extracellular region’) [10]. subcellular locations (or cellular components, e.g., ‘extracellular region’) [10]. Figure 1. Cont. Genes 2018, 9, 593 3 of 24 Genes 2018, 9, 593 3 of 24 FigureFigure 1. A selection1. A selection of Geneof Gene Ontology Ontology (GO)(GO) annotati annotationsons created created by Alzheimer’s by Alzheimer’s Research Research United United KingdomKingdom (ARUK)—University (ARUK) – University College College London London (UCL), (UCL as), as apart a part of ourof our project project that that focused focused on on capturing knowledgecapturing on amyloid-betaknowledge on andamyloid-beta tau by the and expert tau by biocuration the expert ofbiocuration experimental of experimental data available data through the biomedicalavailable through literature. the biomedical These annotations literature. These describe annotations related describe biological related processes biological processes using GO terms, whichusing are descendants GO terms, which of theare descendan ‘neuron projectionts of the ‘neuron development’ projection development’ GO term.