An Empirical Framework for Binary Interactome Mapping
Total Page:16
File Type:pdf, Size:1020Kb
ARTICLES An empirical framework for binary interactome mapping Kavitha Venkatesan1,2,12,13, Jean-Franc¸ois Rual1,2,12,13, Alexei Vazquez1,3,4,13, Ulrich Stelzl5,6,13, Irma Lemmens7,13, Tomoko Hirozane-Kishikawa1,2, Tong Hao1,2, Martina Zenkner5, Xiaofeng Xin8, Kwang-Il Goh1,3,9, Muhammed A Yildirim1,2,10, Nicolas Simonis1,2, Kathrin Heinzmann1,2,12, Fana Gebreab1,2, Julie M Sahalie1,2, Sebiha Cevik1,2,12, Christophe Simon1,2,12, Anne-Sophie de Smet7, Elizabeth Dann1,2, Alex Smolyar1,2, Arunachalam Vinayagam5, Haiyuan Yu1,2, David Szeto1,2, Heather Borick1,2,12, Ame´lie Dricot1,2, Niels Klitgord1,2,12, Ryan R Murray1,2, Chenwei Lin1,2, Maciej Lalowski5,12, Jan Timm5, Kirstin Rau5, Charles Boone8, Pascal Braun1,2, Michael E Cusick1,2, Frederick P Roth1,11, David E Hill1,2, Jan Tavernier7, Erich E Wanker5, Albert-La´szlo´ Baraba´si1,3,12 & Marc Vidal1,2 Several attempts have been made to systematically map The protein-protein interactome of an organism is the network protein-protein interaction, or ‘interactome’, networks. formed by all protein-protein interactions that can occur at a range However, it remains difficult to assess the quality and coverage of physiologically relevant protein concentrations. Mapping pro- of existing data sets. Here we describe a framework that uses tein-protein interactions is crucial, albeit not sufficient, for unra- an empirically-based approach to rigorously dissect quality veling the dynamic aspects of cellular networks—including when, parameters of currently available human interactome maps. where and for what purpose protein interactions do occur in vivo1. Our results indicate that high-throughput yeast two-hybrid Currently available human protein-protein interactome maps have (HT-Y2H) interactions for human proteins are more precise been derived using HT-Y2H2,3, high-throughput coaffinity purifi- than literature-curated interactions supported by a single cation followed by mass spectrometry4, curation of published low- publication, suggesting that HT-Y2H is suitable to map a throughput experiments5–10 or computational predictions11,12. significant portion of the human interactome. We estimate Despite a few attempts2,3,13,14, it remains difficult to accurately that the human interactome contains B130,000 binary estimate the quality and coverage of these interactome maps. Nature America, Inc. All rights reserved. 9 interactions, most of which remain to be mapped. Similar Differentiation between sets of protein pairs that can interact to estimates of DNA sequence data quality and genome size (biophysical interactions) and do interact (biological interactions) 200 © early in the Human Genome Project, estimates of protein is only possible with reliable biophysical interactome maps. How- interaction data quality and interactome size are crucial ever, several issues remain unresolved, including what proportion to establish the magnitude of the task of comprehensive of currently available interactome maps represents true biophysical human interactome mapping and to elucidate a path toward interactions and what proportion represents artifacts; whether the this goal. interactions provided by curated low-throughput experiments are 1Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, 1 Jimmy Fund Way, Boston, Massachusetts 02115, USA. 2Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, Massachusetts 02115, USA. 3Center for Complex Network Research and Department of Physics, University of Notre Dame, 225 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA. 4The Simons Center for Systems Biology, Institute for Advanced Study, Einstein Drive, Princeton, New Jersey 08540, USA. 5Max Delbru¨ck Center for Molecular Medicine, Robert-Roessle-Strae 10, D-13125 Berlin, Germany. 6Otto- Warburg Laboratory, Max Planck Institute for Molecular Genetics, Ihnestrae 63-73, D-14195 Berlin, Germany. 7Department of Medical Protein Research, Vlaams Instituut voor Biotechnologie, and Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Albert Baertsoenkaai 3, 9000 Ghent, Belgium. 8Banting and Best Department of Medical Research and Department of Molecular Genetics, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada. 9Department of Physics, Korea University, 1 Anam-dong 5-ga, Seongbuk-gu, Seoul 136- 713, Korea. 10School of Engineering and Applied Sciences, Harvard University, 29 Oxford Street, Cambridge, Massachusetts 02138, USA. 11Department of Biochemistry and Molecular Pharmacology, Harvard Medical School, 250 Longwood Avenue, Boston, Massachusetts 02115, USA. 12Present addresses: Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA (K.V.), Department of Cell Biology, Harvard Medical School, 240 Longwood Avenue, Boston, Massachusetts 02115, USA (J.-F.R.), Centre for Cancer Therapeutics, The Institute of Cancer Research, 15 Cotswold Road, Sutton, SM2 5NG, UK (K.H.), University College Dublin, School of Biomolecular and Biomedical Science, Belfield, Dublin 4, Ireland (S.C.), Genome Exploration Research Group, RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan (C.S.), Department of Biological Sciences, Clemson University, 132 Long Hall, Clemson, South Carolina 29634, USA (H.B.), Bioinformatics Program, Boston University, 24 Cummington Street, Boston, Massachusetts 02215, USA (N.K.), Protein Chemistry/Proteomics/Peptide Synthesis and Array Unit, Biomedicum Helsinki, University of Helsinki, Haartmaninkatu 8, FI-00014 Helsinki, Finland (M.L.) and Center for Complex Network Research and Departments of Physics, Biology and Computer Sciences, Northeastern University, 360 Huntington Avenue, Boston, Massachusetts 02115, USA (A.-L.B.). 13These authors contributed equally to this work. Correspondence should be addressed to M.V. ([email protected]), A.-L.B. ([email protected]), E.E.W. ([email protected]) or J.T. ([email protected]). RECEIVED 4 JUNE; ACCEPTED 10 NOVEMBER; PUBLISHED ONLINE 7 DECEMBER 2008; DOI:10.1038/NMETH.1280 NATURE METHODS | VOL.6 NO.1 | JANUARY 2009 | 83 ARTICLES 2 Figure 1 | Conceptual framework for interactome mapping. The concepts of 1 4 5 3 6 screening completeness (fraction of all pairwise protein combinations tested), True comprehensive binary assay sensitivity (fraction of all biophysical interactions identifiable by a 9 10 biophysical interactome 8 7 given assay), sampling sensitivity (fraction of all identifiable interactions that 15 are detected in a single trial) and precision (fraction of pairs reported by a All protein-coding genes (N) 11 12 13 14 given assay that are true positives) can be estimated independently and ) N combined to empirically estimate the size of binary interactomes. Solid black 16 ~10% lines in a given network graph represent true biophysical interactions present 17 18 in that network; dashed lines represent true biophysical interactions missing Screening completeness ~25% from that network; and solid colored lines represent biophysical artifactual (ORF pairs tested) pairs present in that network. 2 1 4 5 ~15% 3 6 ORF pairs tested (n × m) 9 10 All protein-coding genes ( maps of the same interactome share common interactions. 8 7 Both approaches suffer several inherent limitations. Methods 15 that evaluate the quality of interactions with respect to mRNA 11 14 12 13 coexpression22,23 are systematically biased against true biological 16 Interactions interactions between proteins whose mRNAs are not necessarily 17 18 12345678 correlated, or are even anticorrelated, in expression. Because avail- Assay sensitivity 1 able annotations for protein function and localization are far from (interactions detectable) 2 2 Assays 3 comprehensive, lack of evidence for colocalization of a given pair of 1 4 5 3 6 proteins does not imply that the interaction observed between these proteins is an artifact. Methods based on measuring the extent of 9 10 8 13,20,21 7 overlap between two interactome maps require that the 15 corresponding data sets be derived from identical or similar assays. 14 11 12 13 Existing analyses have not always fulfilled this requirement13. Most 3-hit existing methods for quality assessment do not distinguish between 16 2-hit 17 18 1-hit the multiple sources of false negatives and false positives associated Sampling sensitivity with any interactome mapping strategy. For instance, interactions (interactions detected) missed by a single screen of an assay but identifiable after multiple 2 Interactions 1 4 5 screens must be distinguished from interactions that would never 3 6 123 be identified by that assay even after a saturating number of screens. 9 10 8 Number of screens 7 Here we developed a framework to estimate various quality 15 parameters associated with currently used protein-protein inter- 11 12 13 14 action assays, namely screening completeness, assay sensitivity, sampling sensitivity and precision. We generated empirical data 16 Nature America, Inc. All rights reserved. 17 18 to rigorously dissect these quality parameters without relying on 9 Precision correlation with other biological attributes. Combining these (artifactual pairs 200 observed) parameters provides an estimate of the size of the human binary 2 © 1 4 5 biophysical interactome and projects a path toward the completion 3 6 an independent assay Rate of scoring positive in of its mapping. 9 10 PRS RRS