Leveraging Existing Biological Knowledge in the Identification Of

BMC Bioinformatics BioMed Central Proceedings Open Access Leveraging existing biological knowledge in the identification of candidate genes for facial dysmorphology Hannah J Tipney1, Sonia M Leach1,2, Weiguo Feng3, Richard Spritz4, Trevor Williams3 and Lawrence Hunter*1 Address: 1Computational Pharmacology Department, University of Colorado at Denver and Health Sciences Center, Aurora, CO, USA, 2ESAT, Research Division SCD, Katholieke Universiteit Leuven, B-3001 Leuven, Belgium, 3Department of Craniofacial Biology, University of Colorado at Denver and Health Sciences Center, Aurora, CO, USA and 4Human Medical Genetics Program, University of Colorado at Denver and Health Sciences Center, Aurora, CO, USA Email: Hannah J Tipney - [email protected]; Sonia M Leach - [email protected]; Weiguo Feng - [email protected]; Richard Spritz - [email protected]; Trevor Williams - [email protected]; Lawrence Hunter* - [email protected] * Corresponding author from The First Summit on Translational Bioinformatics 2008 San Francisco, CA, USA. 10–12 March 2008 Published: 5 February 2009 BMC Bioinformatics 2009, 10(Suppl 2):S12 doi:10.1186/1471-2105-10-S2-S12 <supplement> <title> <p>Selected Proceedings of the First Summit on Translational Bioinformatics 2008</p> </title> <editor>Atul J Butte, Indra Neil Sarkar, Marco Ramoni, Yves Lussier and Olga Troyanskaya</editor> <note>Proceedings</note> </supplement> This article is available from: http://www.biomedcentral.com/1471-2105/10/S2/S12 © 2009 Tipney et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background: In response to the frequently overwhelming output of high-throughput microarray experiments, we propose a methodology to facilitate interpretation of biological data in the context of existing knowledge. Through the probabilistic integration of explicit and implicit data sources a functional interaction network can be constructed. Each edge connecting two proteins is weighted by a confidence value capturing the strength and reliability of support for that interaction given the combined data sources. The resulting network is examined in conjunction with expression data to identify groups of genes with significant temporal or tissue specific patterns. In contrast to unstructured gene lists, these networks often represent coherent functional groupings. Results: By linking from shared functional categorizations to primary biological resources we apply this method to craniofacial microarray data, generating biologically testable hypotheses and identifying candidate genes for craniofacial development. Conclusion: The novel methodology presented here illustrates how the effective integration of pre-existing biological knowledge and high-throughput experimental data drives biological discovery and hypothesis generation. Background well-characterized variables to exploring a complicated The increased use of high-throughput analysis methods, mire of thousands of inter-related variables simultane- such as microarrays, in mainstream biological research ously [1]. These methods are powerful, but their outputs has led to a shift from studying small groups of reasonably are complicated and difficult to interpret due to the sheer Page 1 of 7 (page number not for citation purposes) BMC Bioinformatics 2009, 10(Suppl 2):S12 http://www.biomedcentral.com/1471-2105/10/S2/S12 volume of data produced. Interpretation can be prohibi- Methods tively time consuming in the absence of computational Microarray expression data assistance. A comprehensive murine craniofacial developmental expression dataset was used in this study [6]. Expression The ultimate goal of any microarray experiment is to gain was analyzed through the microdissection of mandibular, insight into the workings of cellular organisms by under- maxillary and frontonasal prominences at time points standing the interactions of genes and proteins. For this to E10.5–E12.5 at 0.5 day increments. Expression was meas- be accomplished, raw data must not only be converted ured using the Affymetrix MOE430_2A microarray sys- into information, but this information must also be inter- tem. 916 microarray probes, corresponding to 712 unique preted in context, to be transformed into timely biological MGI identifiers, were clustered using the MANOVA test discovery and knowledge [2]. Currently, the lack of a com- statistic in R [7]. Hierarchical complete clustering was munity-wide consensus on how best to integrate experi- undertaken on the resulting correlation coefficients. The mental data with information resources limits this resultant tree was cut to produce 36 clusters. knowledge acquisition [2]. The recent work of Saraiya et al. (2005) [1] highlighted a "critical need" for tools able Explicit and implicit data sources to "connect numerical patterns to the underlying biologi- Traditionally an 'interaction' between two proteins is cal phenomena", as current techniques fail to adequately defined as a physical association. Here we expand the link microarray data to biological meaning, which limits term to include functional relationships between pairs of researchers' biological insights [1]. proteins (encompassing any type of evidence, including physical, functional, genetic, biochemical, evolutionary, One intuitive way to integrate biological knowledge and and computational evidence [3]). Functional interaction microarray data is through protein-interaction networks, information was retrieved from a number of different where nodes represent proteins and edges symbolize rela- resources falling into either of two categories: explicit and tionships between proteins [3]. However, focusing solely implicit. Explicit sources indicate a direct interaction on physical protein interactions, such network constructs between a pair of genes/proteins, and include experimen- neglect a wealth of knowledge currently distributed tally measured physical, biochemical and genetic interac- among hundreds of existing biological databases (over tions, and computationally predicted gene 1000 listed in this year's Nucleic Acids Research database neighborhoods, gene fusion events, or conserved phylo- issue alone [4]) that is directly applicable to proteins genetic profiles. Implicit sources provide information per- investigated via microarray experiments. Current protein taining to an individual gene or protein attribute, which network constructs typically focus on a small subset of may be shared by any given pair of genes or proteins. Such this biological knowledge, producing incomplete and attributes include literature references [8], sequence sparsely populated resources. This is a particular problem motifs (PReMod, InterPro) [9,10], protein categories for higher eukaryotic organisms such as mice and (ChEBI) [11], protein complexes [12], phenotypes (as humans, for which physical protein interaction data are described by the Mouse Genome Database) [13], cellular limited. location, molecular function, and biological processes (Gene Ontology) [14] and pathways (KEGG, Ingenuity) In agreement with Lee et al. (2004) [5] and Leach et al. [3,15,16]. (2007) [3], we demonstrate by expanding the definition of 'interaction' to include functional information that a) Network construction, weighting and visualization there is enough publicly available biological information Genes within any given cluster were defined to be the to produce biologically useful, well populated interaction nodes in our network constructs; a network was produced networks for higher eukaryotic species, b) through the for each cluster identified from the hierarchical complete combination of expression data and functional informa- clustering stage. Data from both implicit and explicit tion, it is possible to provide contextual insight into the sources were used to define arc interactions between pairs network, and c) it is possible to effectively link to existing of proteins. Applying the cons NoisyOR methodology of biological knowledge using current technology. Using a Leach et al. (2007) [3], the edges between each pair of murine craniofacial developmental expression microarray nodes were assigned a combined reliability score (net- dataset [6] and a recently published technique for weight- work component, σNET) based on the individual reliabili- ing and integrating functional interaction information ties of the sources asserting the edge [3]. Resultant [3], we illustrate how the application of context sensitive networks were viewed using Cytoscape [17]. methodology leverages the full force of current available biological knowledge, enabling the translation of com- Network identification and interrogation plex high-throughput datasets into scientific insight and Based on significant tissue-restricted expression (expres- discovery. sion limited to the mandibular prominence) and progres- Page 2 of 7 (page number not for citation purposes) BMC Bioinformatics 2009, 10(Suppl 2):S12 http://www.biomedcentral.com/1471-2105/10/S2/S12 A Mandibular Maxillary Frontonasal Embryonic day B ChEBIFigure informed 1 network ChEBI informed network. A) Illustratates the mandibular-specific expression profile, B) the network of 11 nodes and their associated edges. Page 3 of 7 (page number not for citation purposes) BMC Bioinformatics 2009, 10(Suppl

Leveraging Existing Biological Knowledge in the Identification Of

• La Gestion Efficace De L’Énergie • Le Réseautage Planétaire • Les Processus Géophysiques • L’Économie Globale, La Sécurité Et La Stabilité

Fall 2016 Is Available in the Laboratory of Dr

Olga G. Troyanskaya - Sciencewatch.Com

Celebrate Princeton Invention 2017

Patrick J. H. Bradley

Seq-Setnet: Exploring Sequence Sets for Inferring Structures

Annual Scientific Report 2011 Annual Scientific Report 2011 Designed and Produced by Pickeringhutchins Ltd

Curtis Huttenhower Associate Professor of Computational Biology

Matthew A. Hibbs, Ph.D. Assistant Professor the Jackson Laboratory

Top 100 AI Leaders in Drug Discovery and Advanced Healthcare Introduction

PSB 2009 Attendees (As of January 9, 2009) Mario Albrecht Max Planck

Functional Genomics Workshop Report