Interrogating Protein Interaction Networks Through Structural Biology

Interrogating protein interaction networks through structural biology Patrick Aloy and Robert B. Russell* European Molecular Biology Laboratory (EMBL), Postfach 10 22 09, D-69012, Heidelberg, Germany Communicated by I. M. Gelfand, Rutgers, The State University of New Jersey-New Brunswick, Highland Park, NJ, March 13, 2002 (received for review November 16, 2001) Protein–protein interactions are central to most biological processes. databank suggests considerable variation in the interaction partners Although much recent effort has been put into methods to identify preferred by particular protein families (14). Clearly detailed stud- interacting partners, there has been a limited focus on how these ies are required to understand when it is possible to infer an interactions compare with those known from three-dimensional (3D) interaction between proteins when one is known to occur between structures. Because comparison of protein interactions often involves homologues. considering homologous, but not identical, proteins, a key issue is Here, we present a method to model putative interactions on whether proteins that are homologous to an interacting pair will known 3D complexes and to assess the compatibility of a proposed interact in the same way, or interact at all. Accordingly, we describe protein–protein interaction with such a complex. After identifying a method to test putative interactions on complexes of known 3D the residues that make atomic contacts in a known crystallographic structure. Given a 3D complex and alignments of homologues of the complex, we look to homologues of both interacting proteins to see interacting proteins, we assess the fit of any possible interacting pair whether these interactions are preserved by means of empirical on the complex by using empirical potentials. For studies of interact- potentials. This method permits us to score all possible pairs ing protein families that show different specificities, the method between two protein families, and say which are likely to interact. provides a ranking of interacting pairs useful for prioritizing experi- We apply the method to the fibroblast growth factor͞receptor ments. We evaluate the method on interacting families of proteins system, and explore the intersection between all complexes of with multiple complex structures. We then consider the fibroblast known 3D structure and interactions between yeast proteins pro- growth factor͞receptor system and explore the intersection between posed by methods such as two-hybrids. We demonstrate and discuss complexes of known structure and interactions proposed between the importance of incorporating 3D structure information into yeast proteins by methods such as two-hybrids. We provide confir- studies of protein–protein interactions. mation for several interactions, in addition to suggesting molecular details of how they occur. Methods Overview of the Methodology. Our initial analysis of complexes of major goal of functional genomics is to determine protein known 3D structure showed that interactions between proteins Ainteraction networks for whole organisms. Large-scale studies occurred through a variety of main-chain and side-chain contacts have identified hundreds of potentially interacting proteins or (Fig. 1) as described in part previously (15). This prompted us to complexes in yeast (1–3). Computational methods have used gene derive empirical potentials for amino acids to be involved in fusion (4, 5), gene order (6), phyletic distribution (7), or a combi- particular side-chain to side-chain and side-chain to main-chain nation of approaches (5) to predict functional associations (and contacts at protein–protein interfaces (see below). putative interactions) between thousands of proteins, and efforts For a complex of known structure, we identify residues making are underway to catalog interaction data contained within the atomic contacts at the interface of the two proteins. We then use literature (8, 9). the potentials to score the compatibility of any homologous intra- Despite attempts to identify putative interactions, little attention species pair of sequences, and calculate a statistical confidence has been paid to one of the best sources of protein interaction data: based on a comparison to scores for a background of sequences that complexes of known three-dimensional (3D) structure. Decades of are unlikely to make favorable complexes (see below). In the x-ray crystallography have produced hundreds of structures for sections that follow, we refer to scores as being significant if their Յ Յ protein complexes, and these structures provide a rich source of probability to occur by chance is 0.01 or 0.1 when compared data for learning principles of how proteins interact and for with the background. validating interactions determined by other methods. Although the number of complexes of known 3D structure is relatively small, it Database of 3D Protein Complexes. To construct a nonredundant is possible to expand this set by considering homologous proteins. database of interacting protein domains, we used BLAST2 (16) to Any interaction, whether known from two-hybrids, crystallogra- compare sequences from representatives of the SCOP database phy, or another method, will typically involve two or more proteins (17) of protein structures to those from the Pfam database of protein domain families (18). We defined links between the two that themselves are members of homologous families. A major Յ problem in genome annotation efforts is to understand when it is databases as matches with an expectation E-value 0.01. We then searched for instances where different Pfam domains matched possible to transfer functional information, determined by experi- Ͻ ment, from one protein to its homologues. Recent years have seen different chains that were in contact (any atom distance 5Å) in the progress in predicting whether details such as enzymatic specificity same structure. We manually grouped the final, nonredundant set or binding sites can be extrapolated to other members of a protein of 356 unique interacting pairs of different Pfam families into broad family (e.g., refs. 10 and 11), although similar studies on protein– classes: (i) enzyme inhibitors; (ii) cytokine receptors; (iii) signaling protein interactions have been limited. It is not known whether it is generally possible to say that proteins homologous to a known Abbreviations: 3D, three-dimensional; FGF, fibroblast growth factor; FGFR, FGF receptor; interacting pair will interact in the same way, or indeed interact at CKS, cyclin-dependent kinase regulatory subunit. all. For example, cytokines in the same family can be either *To whom reprint requests should be addressed. E-mail: [email protected]. promiscuous or highly specific regarding the receptors they prefer The publication costs of this article were defrayed in part by page charge payment. This (e.g., ref. 12), and some homologues do not bind receptors at all article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. (e.g., ref. 13). Moreover, analysis of interactions within the protein §1734 solely to indicate this fact. 5896–5901 ͉ PNAS ͉ April 30, 2002 ͉ vol. 99 ͉ no. 9 www.pnas.org͞cgi͞doi͞10.1073͞pnas.092147999 Downloaded by guest on September 28, 2021 where na and nb are the total number of amino acids a and b and T is the total number of interacting pairs. Eab is the molar expected Ϫ frequency for the a b pair, Cab is the number of observed contacts between residues a and b and Sab the log-odds score. We assess the compatibility of a potential complex by evaluating a total score as the sum of Sab values for all interacting residue pairs identified in Ͼ the crystallographic complex. Sab values are evaluated only if Cab Ͼ Ϫ 0 and Eab 5 to ensure sufficient data. Otherwise Sab is set to 0.5 and Ϫ1.0 for side-chain to main-chain and side-chain to side-chain potential, respectively, which are approximately the negative of the largest values. High scoring complexes are evaluated by this func- tion to be more favorable. Assigning Statistical Significance and Evaluation. To assess whether an interface contains a sufficient number and type of contacts to be used further, we generated 1,000 random sequences for each Fig. 1. Interaction types across the four classes. interacting protein, based on surface residue frequencies. We scored them on the known complexes by using the empirical potentials on the interacting residues identified from the crystal- complexes; (iv) hetero-multimers; (v) immune system complexes lographic complex. We used random sequences because we could (i.e., antibody͞antigen or MHC complexes); and (vi) other protein– not systematically extract information about noninteracting pairs of protein interactions. proteins homologous to a known complex, because such information is scant and subjective. Only when the known complex gave a Derivation of Empirical Potentials for Protein–Protein Interactions. score with Z Ն 2.3 (2.3 standard deviations above the mean; Ϫ Empirically derived potentials incorporate thermodynamic effects significance Յ10 3) did we consider it suitable for further use. This without explicitly having to model proteins, which makes them process effectively removes complexes where there are few inter- useful in protein fold-recognition and protein–protein docking actions or that are mediated mainly through main-chain to main- (e.g., refs. 19 and 20). We derived main-chain to side-chain and chain contacts. side-chain to side-chain potentials from the complexes above. We To test the method, we extracted 59 pairs of homologous, defined interacting residues by requiring one or more of: hydrogen nonidentical complexes of known structure, and scored one com- bonds (NOO distances Յ 3.5 Å), salt bridges (NOO distances Յ plex by using the other. We ignored immune system complexes as 5.5 Å), or van de Waals interactions (COC distances Յ 5 Å). We different interacting partners are expected (e.g., immunoglobulins). also require a relative accessibility of the unbound proteins Ն10% To ensure the integrity of the potentials we performed the test as a jack-knife (leave-one-out) procedure during this process.

Interrogating Protein Interaction Networks Through Structural Biology

Whole Genome and Segmental Duplications Underlie Glutamine Synthetase and Phosphoenolpyruvate Carboxylase Diversity in Narrow-Leafed Lupin (Lupinus Angustifolius L.)

A Roadmap for Metagenomic Enzyme Discovery

Structural Genomics: an Approach to the Protein Folding Problem

Syntenic Gene and Genome Duplication Drives Diversification of Plant Secondary Metabolism and Innate Immunity in Flowering Plants

The Role of Protein Structure in Genomics

Structural Genomics Lecture Notes

The Impact of Structural Genomics: Expectations and Outcomes John-Marc Chandonia and Steven E

Classical Genetics 3. the Beginnings of Genomic Biol

Unexpected Features of the Dark Proteome

National Science Foundation-Sponsored Workshop Report

Structural Genomics of the Thermotoga Maritima Proteome Implemented in a High-Throughput Structure Determination Pipeline

SINEUP Non-Coding Rnas Rescue Defective Frataxin Expression and Activity in a Cellular Model of Friedreich’S Ataxia