A Genetic Resource for Genes and Mutations Related to Epilepsy
Total Page:16
File Type:pdf, Size:1020Kb
Published online 16 October 2014 Nucleic Acids Research, 2015, Vol. 43, Database issue D893–D899 doi: 10.1093/nar/gku943 EpilepsyGene: a genetic resource for genes and mutations related to epilepsy Xia Ran1,†, Jinchen Li1,†, Qianzhi Shao1, Huiqian Chen1, Zhongdong Lin2, Zhong Sheng Sun1,3,* and Jinyu Wu1,3,* 1Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou 325000, China, 2Department of Pediatric Neurology, The Second Affiliated & Yuying Children’s Hospital, Wenzhou Medical University, Wenzhou 325000, China and 3Beijing Institutes of Life Science, Chinese Academy of Science, Beijing 100101, China Received July 07, 2014; Revised September 06, 2014; Accepted September 27, 2014 ABSTRACT juvenile myoclonic epilepsy. Idiopathic epilepsy, represent- ing up to 47% of all epilepsies, is considered to have a ge- Epilepsy is one of the most prevalent chronic neu- netic basis with a monogenic or polygenic mode of inher- rological disorders, afflicting about 3.5–6.5 per 1000 itance (4). Meanwhile, individuals with epilepsy are con- children and 10.8 per 1000 elderly people. With in- sistently reported to show clinical features of other dis- tensive effort made during the last two decades, orders, or vice versa. In particular, autism spectrum dis- numerous genes and mutations have been pub- order (ASD) and attention-deficit/hyperactivity disorder lished to be associated with the disease. An orga- (ADHD) are the most common comorbid conditions asso- nized resource integrating and annotating the ever- ciated with epilepsy (2). Besides, the prevalence of epilepsy increasing genetic data will be imperative to acquire in patients with autism and mental retardation (MR) is up a global view of the cutting-edge in epilepsy re- to 40% (5,6), and individuals with epilepsy are at an in- search. Herein, we developed EpilepsyGene (http: creased risk of developing schizophrenia (SCZ) like psy- chosis (7). Therefore, to unveil the genetic architecture of //61.152.91.49/EpilepsyGene). It contains cumulative epilepsy, it is of vital importance to investigate the pheno- to date 499 genes and 3931 variants associated with typic and genetic complexity of epilepsy and its comorbidity 331 clinical phenotypes collected from 818 publica- with ASD/MR/ADHD/SCZ. tions. Furthermore, in-depth data mining was per- In the past two decades, with intensive effort made to ex- formed to gain insights into the understanding of the plore genetic susceptibility of epilepsy, numerous genes and data, including functional annotation, gene prioriti- mutations have been discovered to be associated with the zation, functional analysis of prioritized genes and disease. Over the last 2 years, particularly, rapid progress overlap analysis focusing on the comorbidity. An in- in its gene discovery has been accelerated by the appli- tuitive web interface to search and browse the diver- cation of massively parallel sequencing technologies (8,9). sified genetic data was also developed to facilitate An organized resource integrating and annotating the ever- access to the data of interest. In general, Epilepsy- increasing genetic data will be imperative for researchers to acquire a global view of the cutting-edge in epilepsy re- Gene is designed to be a central genetic database to search. However, genetic database that integrates and ana- provide the research community substantial conve- lyzes the scattered genetic data on epilepsy is still in its in- nience to uncover the genetic basis of epilepsy. fancy when compared with other disease-specific databases, such as AutismKB (10) and ADHDgene (11). Therefore, it is urgently required to conduct thorough collection, system- INTRODUCTION atic integration and detailed annotation of existing genes Epilepsy is a group of neurological disorders characterized and mutations underlying epilepsy. by recurrent epileptic seizures (1). As one of the most preva- The currently available genetic databases for epilepsy are: lent chronic neurological disorders, it affects about 3.5– GenEpi (http://epilepsy.hardwicklab.org/), CarpeDB (http: 6.5 per 1000 children (2) and 10.8 per 1000 elderly people //www.carpedb.ua.edu/), epiGAD (12), The Lafora Gene (3). With ages of onset varying from infancy to adulthood, Mutation Database (13) and MeGene (http://www.epigene. epilepsy encompasses a broad range of clinical phenotypes, org/mutation/). However, they are far from a comprehen- such as infantile spasms, childhood absence epilepsy and sive genetic database: either lacking complete genetic infor- *To whom correspondence should be addressed. Tel: +86 577 8883 1309; Fax: +86 577 8883 1309; Email: [email protected] Correspondence may also be addressed to Jinyu Wu. Tel: +86 577 8883 1309; Fax: +86 577 8883 1309; Email: [email protected] †The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors. C The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] D894 Nucleic Acids Research, 2015, Vol. 43, Database issue mation, or restricted on specific diseases or researches. In dbSNP (18). In addition, mutation spectrum (19) address- this study, we present EpilepsyGene, a comprehensive ge- ing the mutation distribution was provided to present an netic database aimed to fulfill the growing needs of data in- overview of all mutations at gene level. All the mutations tegration and mining from all available resources. It inte- were mapped onto the locus of the corresponding gene, with grates and annotates 499 genes, 3931 variants and 331 clini- different colors denoting different mutation types (MTs) or cal phenotypes collected from 818 eligible publications. An effects. Mutations reported more than once were separated intuitive web interface with versatile searching and brows- from those initially identified. ing functionalities was also developed to help researchers To provide an informative gene card for each epileptic access the data of interest conveniently and perform fur- gene, gene annotation was produced including basic gene ther data analysis. In general, EpilepsyGene is designed to information, related Gene Ontology (GO) (20) terms and be a central genetic database to provide research communi- pathways, and brain expression details. Firstly, gene an- ties substantial convenience to uncover the phenotypic and notation file ‘Homo sapiens.gene info.gz’ was downloaded genetic complexity of epilepsy and its comorbidities with from NCBI (http://www.ncbi.nlm.nih.gov/) to obtain ba- other disorders. sic information of all epileptic genes. Secondly, WebGestalt (21) was used to enrich GO (20) terms, WIKI pathways (22), KEGG pathways (23) and Pathway Commons (24) associ- DATA COLLECTION AND ANALYSIS ated with the genes. Thirdly, brain expression levels of each Data collection gene spanning 12 periods and 16 brain regions were ob- tainedfromRNA-seq(22) to present the expression pat- To obtain a complete list of genes and mutations rele- tern of the gene. Finally, gene details and relevant pheno- vant to epilepsy, comprehensive searches were performed types in Mouse Genome Informatics (MGI) (25) and On- for epilepsy-related genetic studies. Initially, we retrospec- line Mendelian Inheritance in Man (OMIM, http://www. tively searched the PubMed database (http://www.ncbi.nih. omim.org/)(26) were also included in the gene card. gov/pubmed) with the following query terms: ‘epilepsy [Title/Abstract] OR specific phenotype such as West syn- drome [Title/Abstract] AND gene [Title/Abstract] OR ge- Gene prioritization netic [Title/Abstract] AND mutation [Title/Abstract] OR To identify high-confidence genes by the relevance to variant [Title/Abstract] OR variation [Title/Abstract]’. Ad- epilepsy, gene prioritization was conducted by adopting ditionally, EpilepsyGene also includes genetic variants se- the annotations of 10 functional prediction tools (i.e. SIFT lected with discretion from existing databases, including (27), phyloP (28), SiPhy (29), LRT (30), MutationTaster MITOMAP (14), The Lafora Gene Mutation Database (31), MutationAssessor (32), FATHMM (33), GERP++ (13), epiGAD (12), GenEpi (http://epilepsy.hardwicklab. (34), PolyPhen2 HDIV (35) and PolyPhen2 HVAR (35)). org/) and MeGene (http://www.epigene.org/mutation/). A score of 1 was assigned to the variant, which was pre- Overall, more than 1000 publications dating from 1995 dicted as ‘Deleterious’, ‘Disease-causing’, ‘Tolerated’ or to 2014 were obtained. The abstracts of these articles were ‘Conserved’, and 10 was assigned to the variant whose MT manually screened, and those with negative results or per- is ‘frame shift’, ‘splicing’, ‘stop gain’ or ‘stop loss’. The score forming only functional analysis of known variants were of the variant is then calculated by: excluded. In all, 818 studies were recruited for further x, MT is nonsynonymous information extraction. Genetic data such as nucleotide S = 10, MT i s f r ameshi f t, splicing, stopgain or stoploss change, gene symbol and clinical phenotype, were extracted through in-depth reading the full text of each publication where x is the number of tools which predict the variant as and double-checked manually. Besides,