Understanding the properties of a system by abstracting the structural and functional behaviors of its underlying cellular components is the very objective of systems analysis. Although structural biology could be obviated at the final stage of understanding a signal transduction pathway, a cell, an organism or any living system, we need it to reach that stage. Structures of proteins, especially of molecular machines, could provide quantitative parameters, help to elucidate functional networks or enable rationally designed perturbation experiments for reverse engineering. The role of structural biology in systems biology should be to provide enough understanding that macromolecules can be translated into dots, or even into equations devoid of chemistry.

Depending on the specific biological question at hand, different structural details and biophysical properties of protein complexes should be explored to provide significant insight. For example, when part of a signal transduction cascade is analyzed, accurate kinetic constants will be crucial to model the system correctly. As we discuss below, protein complex structures could be used to predict these kinetic constants in silico. When the aim is to understand the spatial distribution of larger protein complexes in the cell, the affinity or approximate kinetic constants might be enough. In such circumstances, qualitative experimental binding information, such as that from pull-down assays, can be combined with structural information from electron microscopy and fluorescence imaging.
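To make the distinction between affinity and kinetics concrete, here is a minimal sketch, assuming simple bimolecular binding A + B ⇌ AB with the partner B held in large excess (pseudo-first-order conditions), which has a closed-form solution. The rate constants are hypothetical values chosen only so that both complexes share the same Kd = koff/kon = 1 nM; the function name is illustrative.

```python
import math

def fraction_bound(t, kon, koff, b_conc):
    """Fraction of protein A in complex AB at time t (seconds).

    Assumes simple bimolecular binding A + B <-> AB, no complex at t = 0,
    and B in large excess so that its free concentration stays constant
    (pseudo-first-order conditions), which gives a closed-form solution.
    """
    kd = koff / kon                   # dissociation constant, M
    k_obs = kon * b_conc + koff       # observed relaxation rate, 1/s
    f_eq = b_conc / (b_conc + kd)     # bound fraction at equilibrium
    return f_eq * (1.0 - math.exp(-k_obs * t))

# Two hypothetical complexes with the same affinity, Kd = 1 nM.
fast = {"kon": 1e6, "koff": 1e-3}   # kon in M^-1 s^-1, koff in s^-1
slow = {"kon": 1e4, "koff": 1e-5}
b = 10e-9                           # 10 nM partner, held constant

for t in (10, 100, 1_000, 100_000):
    print(f"t={t:>7} s  fast={fraction_bound(t, b_conc=b, **fast):.3f}"
          f"  slow={fraction_bound(t, b_conc=b, **slow):.3f}")
```

Both complexes settle at the same bound fraction (10/11, about 0.91), so a steady-state description needs only Kd. The fast-associating complex, however, relaxes about a hundred times faster (k_obs ≈ 0.011 s⁻¹ versus 1.1 × 10⁻⁴ s⁻¹), and that difference is what governs the timing of a signaling response.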

In the following, we will discuss how structural genomics is being explored and the role it should increasingly play to reduce proteins to their key functional properties (Figure 1).

Prediction of protein interactions using structural information
Understanding a system requires knowledge of its network of interactions in space and time. In other words, we need to understand who interacts with whom, how these interactions affect the properties of the individual components, what the properties of the resulting complexes are, and how these interactions change in space and time. Determining protein–protein interactions has therefore become a favorite target of large-scale projects, ranging from pull-down assays [2,3,4] to full yeast two-hybrid analyses [5–8]. Although much progress has been achieved in this area [9], we are still far from 100% coverage and accuracy [10]. Also, despite the progress in structural genomics projects and the existence of dedicated large-scale consortia that aim to determine the structures of macromolecular complexes (e.g. http://www.3drepertoire.org/), we are far from having a full atomic description of all cellular complexes. The number of possible complexes, the transient nature of many of them and inherent experimental difficulties make this goal difficult to achieve. Thus, in recent years, efforts have been made to use available structural information to predict and model the structures of interacting proteins (recently reviewed in [11]). Although the prediction of protein–protein interactions using structural information is far from perfect, it is becoming a useful tool that enables not only a Boolean assignment (yes or no) to a particular putative interaction, but also the production of structural models, sometimes at very high resolution [12]. Particular problems that remain to be solved are the modeling of loop conformations, backbone movements and docking. These problems are minimized when structures are available for many complexes involving members of the same protein family [12].
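The template-based idea behind such predictions can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical library of domain-family pairs observed in contact in solved complex structures and hypothetical domain assignments for the query proteins: two proteins are predicted to interact if any pair of their domains matches a template, and the matching templates are returned as the basis for a structural model. None of the names below come from a real resource, and real methods additionally score the modeled interface.

```python
from itertools import product

# Hypothetical template library: pairs of domain families observed in
# contact in at least one solved complex structure. frozenset makes the
# pair unordered, so ("Ras", "RBD") and ("RBD", "Ras") are the same key.
TEMPLATES = {frozenset(("Ras", "RBD")), frozenset(("SH3", "PRM"))}

# Hypothetical domain assignments for three query proteins.
DOMAINS = {
    "P1": ["Ras"],
    "P2": ["RBD", "Kinase"],
    "P3": ["PDZ"],
}

def predict_interaction(a, b, domains=DOMAINS, templates=TEMPLATES):
    """Return (predicted, support) for proteins a and b.

    predicted is the Boolean (yes/no) call; support lists the domain
    pairs (domain of a, domain of b) that match a template complex and
    could serve as starting points for building a structural model.
    """
    support = sorted(
        (da, db)
        for da, db in product(domains[a], domains[b])
        if frozenset((da, db)) in templates
    )
    return bool(support), support

print(predict_interaction("P1", "P2"))
print(predict_interaction("P1", "P3"))
```

A realistic pipeline would go on to build and score the modeled interface, because a domain pair seen in one complex does not guarantee that every member of the two families binds; this is where the loop, backbone and docking problems mentioned above come in.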

Figure 1. Summary of the main concepts discussed in this review. Structural information can be used in many ways to help us retrieve the characteristic functional properties of cellular components. Here, we detail recent advances in the use of protein structures to predict protein interactions, protein function and quantitative binding parameters, curate large-scale protein interaction studies and understand the impact of coding variability.

www.sciencedirect.com | Current Opinion in Structural Biology 2007, 17:378–384 | Sequences and topology

Quantitative data
In many cases, determining the network of interacting components is not enough to understand a biological system to the level of making successful predictions. For this purpose, quantitative parameters (approximate or detailed, depending on the problem [13]) are required. Currently, there are no high-throughput experimental approaches to obtain these values. Thus, the possibility of predicting thermodynamic and kinetic properties of protein complexes, based on X-ray complex structures or homology models, could be one of the major contributions of structural biology to systems biology [14]. Affinities and kinetic constants are important for modeling cellular signal transduction pathways, as is done in SmartCell (http://smartcell.embl.de) [15], where diffusion and cellular localization are taken into account. Successful predictions of binding affinities for wild-type and mutant complexes have been carried out using the protein design algorithm FoldX (http://foldx.embl.de) [16–18]. Examples are the prediction of Ras–effector interactions [12,19–21] and of the interactions of PDZ and SH3 domains with their targets [22,23].
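Structure-based energy functions express their predictions as free energies, whereas pathway models need dissociation constants. A minimal sketch of the conversion, assuming the standard relation ΔG = RT ln Kd (1 M standard state) at 25 °C; the function names and the numeric inputs are hypothetical and are not part of FoldX or SmartCell. Here ΔΔG is defined as ΔG(mutant) − ΔG(wild type), so a positive value means weaker binding.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

def kd_from_dg(dg_bind, temp_k=298.15):
    """Dissociation constant (M) from a binding free energy (kcal/mol).

    Uses dG = RT ln(Kd) with a 1 M standard state, so more negative
    dG values give tighter binding (smaller Kd).
    """
    return math.exp(dg_bind / (R_KCAL * temp_k))

def kd_after_mutation(kd_wt, ddg, temp_k=298.15):
    """Mutant Kd from the wild-type Kd and ddG = dG_mut - dG_wt (kcal/mol).

    A positive ddG (destabilizing mutation) increases Kd by exp(ddG / RT).
    """
    return kd_wt * math.exp(ddg / (R_KCAL * temp_k))

kd_wt = kd_from_dg(-12.0)                 # hypothetical wild-type complex
kd_mut = kd_after_mutation(kd_wt, 1.5)    # hypothetical predicted ddG
print(f"Kd wt = {kd_wt:.2e} M, Kd mutant = {kd_mut:.2e} M, "
      f"fold change = {kd_mut / kd_wt:.1f}")
```

At 25 °C, RT ln 10 ≈ 1.36 kcal/mol, so every 1.36 kcal/mol of predicted destabilization corresponds to a roughly tenfold weaker complex; an error of about 1 kcal/mol in an energy prediction therefore translates into a several-fold error in Kd.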

Predicting binding affinities and hot-spot residues is also important in rational design, to modify the binding specificity of ligands. This was successfully done for the TRAIL receptor system, for which DR5-selective TRAIL variants were generated that do not induce apoptosis in DR4-responsive cell lines, but show a large increase in biological activity in DR5-responsive cancer cell lines [24]. Other examples are the successful creation of new specifically interacting DNase–inhibitor pairs [25,26] and the rational design of ICAM-1 mutants with enhanced affinity for its binding partner, LFA-1 [27].

Approaches to predict association rate constants make use of the principle of electrostatic steering [28]. Based on this concept, the association rate constant of a protein complex can be enhanced by increasing the electrostatic charge complementarity at the interface and at the edge of the interface. The protein design algorithm PARE [29] was developed to specifically enhance the rate of association without affecting the rate of dissociation, and has been used successfully for various protein systems [29,30,31]. This method has the potential to specifically change the kinetic properties of protein complexes involved in signal transduction pathways, and thereby to investigate how the magnitude of signal transduction changes in vivo. Using this design tool, important biological questions can be addressed; for example, whether the magnitude of signal transduction depends on the affinity alone, or whether the individual association and dissociation rate constants are important as well.

Figure 2. From nodes and edges to atomic detail. We should be able to use large-scale protein interaction data to obtain meaningful insight into cellular functions. Most studies of these large data sets have so far focused on a simplified formal description of interactions in which all components are represented identically. From this ‘nodes and edges’ view, global network properties emerge [32–34]. For this information to be of more use, one must be able to find within the data set the modules that are traditionally studied from the bottom up. Common to all these modules are local network properties and universal node roles [35,36,37]. For example, no module exists in isolation, so connecting roles are necessary and universal to all cellular modules. We propose that structural information be used to further characterize large-scale networks by providing the key functional properties of cellular components.

The role of structural biology in the post-genomic era
Recent efforts to map all possible interactions between cellular components in a high-throughput fashion have created very large data sets. Properly mined, these data should help us to better understand living cells. Extracting meaningful information from these data sets is, however, not a simple task. Most studies of these interactions rely on a simplified network representation, in which components are nodes and the connections between them are edges. This has enabled the vast amount of information to be grasped in a formal way, leading to the discovery of important and general global network properties [32–34].

We will undoubtedly require much richer detail to be added to this representation if we are ever to comprehend how cellular components bring about cellular functions (see Figure 2). Some studies have already started to address this by discriminating between different node types [35,36,37]. Vidal and colleagues have used expression data to identify highly connected proteins (hubs) that interact with their partners either simultaneously (party hubs) or at different times (date hubs). Other studies have tried to
study local network properties by first identifying modules within the large networks and then classifying nodes according to their pattern of intramodule or intermodule connections [36,37]. The identification of these modules and of different node types brings network analysis closer to more traditional studies of cellular pathways. It is our opinion that structural information can further help to bridge the divide (see Figure 2).

As we describe above, protein structures can be used to predict protein interactions and the associated binding constants. We should then be able to curate current interaction networks using structural information to derive pathway models that include rich structural detail (as proposed by Aloy and Russell [11]). To enable this, several databases specializing in repertoires of binding interfaces have already been set up [38–43] (see Table 1). Using one of these databases, Kim et al. [44] were able to assign binding interfaces to 1269 of the protein–protein interactions of S. cerevisiae. They could then identify mutually exclusive interactions and discriminate between hubs with many and hubs with few binding interfaces. They showed that multi-interface hubs, compared with hubs with few binding interfaces, are more likely to be essential and are, on average, more conserved, as evaluated by the ratio of non-synonymous to synonymous substitutions. They also found that the interaction partners of multi-interface hubs have a higher expression correlation (i.e. are expressed at the same time) than those of hubs with few interfaces.

We have used a similar approach to analyze how protein binding specificity might influence the evolutionary turnover of protein interactions [45]. We searched for human proteins with multiple interactions through one domain and compared this group with another having a similar number of interactions through more than one domain. We found that, given the same number of partners, proteins interacting mostly through one domain have a higher rate of change of their interactions during evolution than proteins interacting through multiple domains. In conjunction with other results, this has led us to propose that more promiscuous protein binding domains have a higher evolutionary turnover of their interactions.

Table 1
Database    Domain definitions used    Number of interfaces(a)            References
iPfam       Pfam                       3019 domain–domain interactions    [38]
3did        Pfam                       3034 domain–domain interactions    [39]
SCOPPI      SCOP                       8400 interface types               [40]
PRISM       NA                         3799 interface clusters            [41]
SNAPPI-DB   SCOP, CATH and Pfam        NA                                 [42]
PIBASE      SCOP                       18755 interface types              [43]

(a) The number of interfaces was obtained from the database web site or from the article, where available. Because of the different methods used, it is not possible to compare the numbers directly. NA, information not available.

Structural information for pathway modeling
The studies discussed above point to the usefulness of structural information for curating large interaction networks. The challenge now will be to show that structural data can also be used to improve pathway models. One very important aspect of modeling concerns the identification of the key functional units, those that mostly determine the properties of the pathway. Once a module has been identified within the large interaction network, functional information on its components must be analyzed.

In cases in which the functional role of a protein is not known, structural information can be used to direct experimental studies by predicting possible biological functions (reviewed in [46,47]). Successful examples of the structure-based prediction of protein function include the prediction of protein fold [48], binding pockets [49], and interactions with DNA [50] and ions [18]. Prediction of protein fold can be used to transfer functional annotations from other proteins with similar folds in cases in which sequence-based predictions are not possible. Functional information can also be obtained from structure-based prediction of protein interactions through ‘guilt-by-association’ reasoning. In many cases, however, protein folds are functionally promiscuous, making the transfer of functional annotation difficult. Also, even if structure-based predictions point to a very likely protein–protein interaction, it might not have an obvious functional role in the pathway of interest. For example, in the case of Ras–effector interactions, high binding affinity does not necessarily mean functional importance: in Ras signal transduction, Rap binds to RalGDS with high affinity, but no functional relevance has been found to date [51]. Therefore, structure-based prediction of functional importance should still be integrated with all other available data sources for maximum accuracy [52].

Once most components within the cellular module are sufficiently characterized, structural details can be further used to simplify the mathematical model of the pathway. This can be accomplished, for example, by identifying stable complexes. In most cases, it would be sufficient to model these complexes as a single object instead of the
detailed modeling of all the components. One example for which this reasoning has already been successfully applied is the modeling of microtubule dynamics [53–55].

Pathways, structures and disease
One of the important deliverables of sequencing projects is the realization of the diversity at the single-nucleotide level (single-nucleotide polymorphism; SNP) and even at the level of gene copy number [56]. Currently, more than 4.5 million unique SNPs have been identified and catalogued for the human genome [57].

Acknowledgements
We thank the EU for financial support (INTERACTION PROTEOME grant number LSHG-CT-2003-505520 and COMBIO grant number LSHG-CT-2004-503568). Pedro Beltrao is supported by a grant from Fundação para a Ciência e Tecnologia through the Graduate Program in Areas of Basic and Applied Biology.

References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest

framework to simulate cellular processes that combines

