Autism Candidate Genes Via Mouse Phenomics ⇑ Terrence F
Total Page:16
File Type:pdf, Size:1020Kb
Journal of Biomedical Informatics 44 (2011) S5–S11 Contents lists available at ScienceDirect Journal of Biomedical Informatics journal homepage: www.elsevier.com/locate/yjbin Autism candidate genes via mouse phenomics ⇑ Terrence F. Meehan ,1, Christopher J. Carr 1,2, Jeremy J. Jay, Carol J. Bult, Elissa J. Chesler, Judith A. Blake The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA article info abstract Article history: Autism spectrum disorders (ASD) represent a group of developmental disabilities with a strong genetic Available online 11 March 2011 basis. The laboratory mouse is increasingly used as a model organism for ASD, and MGI, the Mouse Gen- ome Informatics resource, is the primary model organism database for the laboratory mouse. MGI uses Keywords: the Mammalian Phenotype (MP) ontology to describe mouse models of human diseases. Using bioinfor- Autism spectrum disorders matics tools including Phenologs, MouseNET, and the Ontological Discovery Environment, we tested data Phenotype ontology associated with MP terms to characterize new gene-phenotype associations related to ASD. Our integra- Mouse disease models tive analysis using these tools identified numerous mouse genotypes that are likely to have previously uncharacterized autistic-like phenotypes. The genes implicated in these mouse models had considerable overlap with a set of over 300 genes recently associated with ASD due to small, rare copy number vari- ation (Pinto et al., 2010). Prediction and characterization of autistic mutant mouse alleles assists researchers in studying the complex nature of ASD and provides a generalizable approach to candidate gene prioritization. Ó 2011 Elsevier Inc. All rights reserved. 1. Introduction disease phenotype; any one allele on its own has little phenotypic contribution [5]. By this notion, the common alleles contributing to Autism spectrum disorders (ASD) are the fastest growing group disease should be identified in patient populations by performing of serious developmental disabilities in the United States, affecting genome-wide association studies (GWAS). ASD has been subject one in every 110 children [1]. ASD is used to describe multiple dis- to a number of GWAS, but to date only a few common alleles have orders that are primarily characterized by deficits in socialization been implicated and the majority of these have not been confirmed and communication, as well as restricted, stereotyped and repeti- in follow up studies [6]. However, there is a growing appreciation tive patterns of behavior [2]. ASD, broadly defined, has a substan- that many complex human disorders are not the result of a combi- tial genetic basis with a 92% concordance rate among monozygotic nation of common alleles but, rather, they result from one or few twins compared to 10% in dizygotic pairs [3]. Some cases are asso- rare variant alleles that have a large phenotypic contribution [7]. ciated to genetic disorders such as Fragile X, to rare mutations in An important tenant of this model is that a large number of gene synaptic genes, or to risk loci identified from genome-wide associ- variants must reside within a population to explain the prevalence ation studies [4]. However, the genetic determinates and molecular of disease. Support for this model in ASD comes from a recent mechanism of ASD remain largely unknown. study that implicates over 300 genes in autism by small (1 or 2 Until recently, the prevailing hypothesis for the cause of com- gene) copy number variants (CNV) found with a higher burden in plex genetic diseases was that a combination of alleles that are rel- 996 ASD patients [8]. atively common in a population collectively act to manifest a With such a large number of genes being implicated in ASD, a bioinformatics and computational approach can usefully be em- ployed to assess how these alleles can manifest in a common phe- Abbreviations: ASD, autism spectrum disorders; MGI, the Mouse Genome notype. In the last 10 years, such scientists have taken advantage of Informatics; MP, Mammalian Phenotype ontology; CNV, copy number variants; the Gene Ontology (GO) to evaluate the characteristics of sets of OMIM, Online Mendelian Inheritance in Man; ODE, Ontological Discovery Environ- ment; ABBA, Anchored Bicliques of Biomolecular Associates. genes [9]. The GO, like other formal ontologies, consists of a struc- ⇑ Corresponding author. Address: Mouse Genome Informatics, The Jackson tured hierarchical controlled vocabulary whose use serves to stan- Laboratory, 600 Main St., Bar Harbor, ME 04609, USA. Fax: +1 207 288 6830. dardize the representation of gene and gene product attributes. GO E-mail addresses: [email protected] (T.F. Meehan), [email protected] term enrichment experiments use the transitive closure power of (C.J. Carr), [email protected] (J.J. Jay), [email protected] (C.J. Bult), Elissa.Chesler@ the GO hierarchical structure to identify biological processes that jax.org (E.J. Chesler), [email protected] (J.A. Blake). 1 These authors contributed equally to this work. sets of genes are statistically more frequently associated with 2 Present address: North Carolina State University, Raleigh, NC, USA. [10]. The authors of the CNV study, for example, performed such 1532-0464/$ - see front matter Ó 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.jbi.2011.03.003 S6 T.F. Meehan et al. / Journal of Biomedical Informatics 44 (2011) S5–S11 a ‘‘GO term enrichment’’ test that revealed biological processes the Phenologs database, a table displaying ASD-implicated human which were not previously implicated in ASD including cellular genes and their mouse orthologs is generated. MP terms that were proliferation and GTPase/Ras signaling [8]. While pointing out po- significantly overrepresented among these mouse orthologs were tential new avenues of investigation, there are limitations to this ranked using a hypergeometric probability as described [14].MP type of GO analysis since GO annotations describe gene function terms that had 2 or more orthologs among the human autism gene in normal biological processes and specifically do not describe set, and had a p-value higher than p > 0.2.05eÀ04, were selected pathological processes. Understanding why certain alleles associ- for further analysis. ate with ASD, however, requires that the analysis include informa- tion about the associated disease phenotypes. For example, a 2.2. Phenome Map in the Ontological Discovery Environment (ODE) common experimental approach is to subject a mouse strain carry- ing a targeted mutation of an ASD candidate gene to behavioral and The Ontological Discovery Environment (ODE) provides analysis neurological studies [11]. Such studies provide valuable insights of functional genomics data sets across species and experimental into the human disorder, but as more genes are implicated in systems (http://ontologicaldiscovery.org/) [16]. The ODE database ASD, and more mouse models become available, researchers can contains gene sets including those that consist of genes with alleles be overwhelmed in just reviewing the literature. annotated to a MP term in the MGI database. These gene sets have Biocuration efforts, coupled to ontology development, make the MP term as their identifier. Within ODE, a user is able to input this literature more computable. The Mouse Genome Informatics gene sets that are centered on a specific phenotype or experimen- (MGI) bioinformatics resource captures data about experiments tal result, retrieve publicly available sets, and analyze multiple conducted using the mouse as a model of human biology including gene sets using a variety of analysis tools. One analysis tool, Phe- those related to autism. MGI integrates genetic, genomic and phe- nome Map [17], uses multi-way intersections of gene sets to create notypic data for the laboratory mouse [12], and curates data using a hierarchy of gene-phenotype associations using a novel algo- several different ontologies including the Mammalian Phenotype rithm for biclique enumeration. We used Phenome Map to identify (MP) Ontology [13]. As of January 8, 2011, the MP contains over genes that are not associated with the OMIM term ‘Autism’ within 7800 terms that are used to describe the phenotypes of over MGI, but that appear to be good candidate genes based on func- 38,000 mouse genotypes. Mouse genotypes are also associated tional similarity with autism-implicated genes that are in MGI. within MGI to the records in the Online Mendelian Inheritance in Mouse alleles found after review of the biomedical literature that Man (OMIM) resource when a mouse is explicitly used as a model were not yet associated with autism within MGI were submitted for a human disease. to the MGI curation pipeline. The MP has been incorporated into several recently developed bioinformatics tools. (1) The Phenologs database, released in 2010 [14] identifies MP terms that are overrepresented among 2.3. MouseNET mouse genes that are orthologous to human genes associated with disease in OMIM. (2) The MouseNET system [15] supports predic- MouseNET uses computational integration of diverse genetic, tion of functional assignments based on integration of genetic and genomic, and phenotype data and Bayesian inference to predict genomic data. (3) The Ontological Discovery Environment (ODE) functional relationships among mouse genes [15]. From a user-de- [16] integrates gene sets associated with phenotype terms using fined list of genes of interest, MouseNET generates a probabilistic a variety of analysis tools such as Phenome