Genetic Region Characterization (Gene RECQuest) - Software to Assist in Identification and Selection of Candidate Genes from Genomic Regions
University of Massachusetts Medical School
eScholarship@UMMS
Open Access Articles, Open Access Publications by UMMS Authors
2009-10-02

Genetic region characterization (Gene RECQuest) - software to assist in identification and selection of candidate genes from genomic regions

Rajani S. Sadasivam, University of Massachusetts Medical School, et al.

Follow this and additional works at: https://escholarship.umassmed.edu/oapubs
Part of the Bioinformatics Commons, Genetics and Genomics Commons, and the Medicine and Health Sciences Commons

Repository Citation
Sadasivam RS, Sundar G, Vaughan LK, Tanik MM, Arnett DK. (2009). Genetic region characterization (Gene RECQuest) - software to assist in identification and selection of candidate genes from genomic regions. Open Access Articles. https://doi.org/10.1186/1756-0500-2-201. Retrieved from https://escholarship.umassmed.edu/oapubs/2100

This material is brought to you by eScholarship@UMMS. It has been accepted for inclusion in Open Access Articles by an authorized administrator of eScholarship@UMMS. For more information, please contact [email protected].
BMC Research Notes
BioMed Central

Short Report, Open Access

Genetic region characterization (Gene RECQuest) - software to assist in identification and selection of candidate genes from genomic regions

Rajani S Sadasivam*†1, Gayathri Sundar†2, Laura K Vaughan†3, Murat M Tanik†2 and Donna K Arnett†4

Address: 1Division of Health Informatics and Implementation Science, Quantitative Health Sciences, University of Massachusetts Medical School, USA, 2Department of Electrical and Computer Engineering, School of Engineering, University of Alabama at Birmingham, Birmingham, Alabama, USA, 3Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama, USA and 4Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama, USA

Email: Rajani S Sadasivam* - [email protected]; Gayathri Sundar - [email protected]; Laura K Vaughan - [email protected]; Murat M Tanik - [email protected]; Donna K Arnett - [email protected]
* Corresponding author †Equal contributors

Published: 30 September 2009
Received: 13 July 2009
Accepted: 30 September 2009

BMC Research Notes 2009, 2:201 doi:10.1186/1756-0500-2-201
This article is available from: http://www.biomedcentral.com/1756-0500/2/201

© 2009 Sadasivam et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract
Background: The availability of research platforms like the web tools of the National Center for Biotechnology Information (NCBI) has transformed the time-consuming task of identifying candidate genes from genetic studies to an interactive process where data from a variety of sources are obtained to select likely genes for follow-up.
This process presents its own set of challenges, as the genetic researcher has to interact with several tools in a time-intensive, manual, and cumbersome manner. We developed a method and implemented an effective software system to address these challenges by multidisciplinary efforts of professional software developers with domain experts. The method presented in this paper, Gene RECQuest, simplifies the interaction with existing research platforms through the use of advanced integration technologies.
Findings: Gene RECQuest is a web-based application that assists in the identification of candidate genes from linkage and association studies using information from Online Mendelian Inheritance in Man (OMIM) and PubMed. To illustrate the utility of Gene RECQuest we used it to identify genes physically located within a linkage region as potential candidate genes for a quantitative trait locus (QTL) for very low density lipoprotein (VLDL) response on chromosome 18.
Conclusion: Gene RECQuest provides a tool which enables researchers to easily identify and organize literature supporting their own expertise and make informed decisions. It is important to note that Gene RECQuest is a data acquisition and organization software, and not a data analysis method.

Background
Complex genetic diseases are due to common variants acting alone or in combination with other genes or the environment to cause disease. For the past several years, genetic linkage has been a mainstay for the analysis of complex diseases. Although there has been some discussion about the future of genetic linkage studies, there is little debate that genetic linkage studies have had tremendous success in identifying regions of the genome that contribute to a wide variety of complex phenotypes [1]. However, identifying the gene (or genes) underlying a linkage peak or in a region of association which drive the association remains a significant challenge.

Linkage studies typically identify regions of association that cover 10-30 cM, which can contain up to 300 genes and be 10-30 Mb in length [2]. In order to identify the causal variant, these regions must then be subjected to fine-mapping, where the area under each region is saturated with additional molecular markers, and these markers are then tested for association with the trait in question. This technique can reduce the area under each region and result in dozens of putative candidate genes for each region. These reduced regions are then subjected to sequence analysis to further refine the search area. Numerous sequence variants, both in coding and non-coding regions, can exist within each region. Because a complex trait region can result from several variants within the same region, each variant must be tested independently for functionality, as well as combinations of all the variants. With a linkage study of a complex trait, multiple regions are expected to be identified, resulting in several hundred, if not thousands, of putative candidate loci, and necessitating fine-mapping and subsequent sequencing for numerous genomic regions, which could be costly and time prohibitive [3-6].

Figure 1: Workflow of manual linkage studies research activities.

A commonly used method in selecting genes from a linkage study for follow-up is to identify the genes that lie within the region of interest and select genes from this list for further analysis based on some criteria such as previously demonstrated association with the phenotype. Typically, biological investigators will undertake a manual process such as that depicted in Figure 1, in which researchers browse through multiple pages in the Map Viewer tool of NCBI, selecting the organism, chromosome, maps and other options, and specifying the region of chromosome to download a data file with a list of genes on sequence. For each gene, genetic researchers will then search and retrieve information such as individual PubMed citations and OMIM summaries, which are searched manually for the key words of interest. This process is hampered by the large number of genes in a linkage region, and the difficulty involved in identifying and cataloguing each of those genes.

In an effort to make this gene selection process more efficient, streamlined and organized, we have developed a software system called GENEtic REgion Characterization Quest (Gene RECQuest). For genes located within a region of interest, Gene RECQuest automatically retrieves and stores the PubMed and OMIM citations using XML and Web Services, and allows for a key phrase-based search using database technologies. To demonstrate the utility, we applied our method to a recently identified linkage peak for VLDL response on chromosome 18.

Figure 2: Gene RECQuest Map Viewer Interface.

Implementation
Our overall goal was to design and develop a web-based tool to integrate and customize the existing tools (NCBI web tools) to simplify and streamline genetic linkage studies. We approached the problem with a user-centered design [7] and a modified service-oriented architecture approach described previously [8-11]. Full instructions for use are available on the software's website and key implemented features are as follows:

Integration with Map Viewer
Integration with the Map Viewer tool in Gene RECQuest allows for the reduction of the number of steps required to interact with Map Viewer. Map Viewer is a genome analysis tool provided by the NCBI. [...] more information than the researcher needs for his or her purpose. To reduce the number of steps, we have designed a single interface in which the user will be able to set the parameters for displaying the maps in a single step. The Gene RECQuest system (Figure 2) navigates the user to the appropriate Map Viewer from which the user can download the chromosomal data, which will be uploaded to the Gene RECQuest MySQL database for further analysis. Unfortunately, as the output format of Map Viewer is inconsistent, we were not yet able to integrate the Map Viewer download to the file upload system of Gene RECQuest. To overcome this inconsistency, we introduced a step where the user has to format the Map Viewer output before uploading to the Gene RECQuest system.

Integration with NCBI Web Services
NCBI Web Services provide a Web Services Description Language (WSDL)-based Application Programming Interface (API) to a collection of Entrez E-utilities. The Gene RECQuest NCBI tools-as-services wrapper uses the NCBI WSDL-API to access and retrieve the necessary information (Gene, OMIM, and PubMed information) from the NCBI database. Due to the amount of data and restric-