Protein Folding and Binding Can Emerge As Evolutionary Spandrels Through Structural Coupling
Total Page:16
File Type:pdf, Size:1020Kb
Protein folding and binding can emerge as evolutionary spandrels through structural coupling Michael Manharta and Alexandre V. Morozova,b,1 aDepartment of Physics and Astronomy and bBioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway, NJ 08854 Edited* by Curtis G. Callan, Jr., Princeton University, Princeton, NJ, and approved December 31, 2014 (received for review August 16, 2014) Binding interactions between proteins and other molecules medi- global properties of protein interaction networks (18, 19). ate numerous cellular processes, including metabolism, signaling, However, a subtle but key property of proteins has not been and gene regulation. These interactions often evolve in response explored in this context: structural coupling of folding and to changes in the protein’s chemical or physical environment (such binding (the fact that folding is required for function) implies as the addition of an antibiotic). Several recent studies have evolutionary coupling of folding stability and binding strength. shown the importance of folding stability in constraining protein Thus, selection acting directly on only one of these traits may evolution. Here we investigate how structural coupling between produce apparent, indirect selection for the other. The impor- folding and binding—the fact that most proteins can only bind tance of this effect was popularized by Gould and Lewontin in their targets when folded—gives rise to an evolutionary coupling their influential paper on evolutionary “spandrels” (20), defined between the traits of folding stability and binding strength. Using as traits that evolve as byproducts in the absence of direct se- a biophysical and evolutionary model, we show how these protein lection. Since then the importance of spandrels and coupled traits can emerge as evolutionary “spandrels” even if they do not traits has been explored in many areas of evolutionary biology confer an intrinsic fitness advantage. In particular, proteins can (21), including various molecular examples (12, 22, 23). evolve strong binding interactions that have no functional role How do coupled traits affect protein evolution? We consider but merely serve to stabilize the protein if its misfolding is dele- a biophysical model that describes evolution of a new binding terious. Furthermore, such proteins may have divergent fates, interaction in response to a change in the protein’s chemical or evolving to bind or not bind their targets depending on random physical environment, including availability and concentrations EVOLUTION mutational events. These observations may explain the abundance of various ligands (24, 25), or in response to artificial selection in of apparently nonfunctional interactions among proteins ob- a directed evolution experiment (3). We postulate a fitness served in high-throughput assays. In contrast, for proteins with landscape as a function of two biophysical traits: folding stability both functional binding and deleterious misfolding, evolution and binding affinity to a target molecule. We then use an exact may be highly predictable at the level of biophysical traits: adap- numerical algorithm (26, 27) to characterize adaptive paths on tive paths are tightly constrained to first gain extra folding sta- this fitness landscape, focusing on how coupled protein traits bility and then partially lose it as the new binding function is affect evolutionary fates, epistasis, and predictability. developed. These findings have important consequences for our understanding of how natural and engineered proteins evolve un- Results der selective pressure. Model of Protein Energetics. We consider a protein with two-state folding kinetics (1). In the folded state, the protein has an in- protein evolution | evolutionary spandrels | fitness landscapes | terface that binds a target molecule. Assuming thermodynamic protein interactions | folding stability equilibrium (valid when protein folding and binding are faster than typical cellular processes), the probabilities of the protein’s roteins carry out a diverse array of chemical and mechanical structural states are given by their Boltzmann weights: Pfunctions in the cell, ranging from metabolism to gene ex- pression (1). Thus, proteins serve as central targets for natural Significance selection in wild populations, as well as a key toolbox for engi- neering novel molecules with medical and industrial applications Most proteins must be folded to perform their function, which (2, 3). Most proteins must fold into their native state, a unique typically involves binding another molecule. This property 3D conformation, to perform their function, which typically enables protein traits of stable folding and strong binding to involves binding a target molecule such as a small ligand, DNA, emerge through adaptation even when the trait itself does not or another protein (1). Misfolded proteins not only fail to per- increase the organism’s fitness. Such traits, known as evolu- form their function but also may form toxic aggregates and divert tionary “spandrels,” have been investigated across many valuable protein synthesis and quality control resources (4–7). It organisms and scales of biology. Here we show that proteins is therefore imperative that the folded state be stable against can spontaneously evolve strong but nonfunctional binding typical thermal fluctuations. However, biophysical experiments interactions. Moreover, evolution of such interactions can and computational studies reveal that most random mutations in be highly unpredictable, with chance mutation events de- proteins destabilize the folded state (8, 9), including mutations termining the outcome. In contrast, evolution of functional that improve function (9–11). As a result, many natural proteins interactions follows a more predictable pattern of first gaining tend to be only marginally stable, mutationally teetering at the folding stability and then losing it as the new binding func- brink of unfolding (12, 13). With proteins in such a precarious tion improves. position, how can they evolve new functions while maintaining sufficient folding stability? Author contributions: M.M. and A.V.M. designed research; performed research; analyzed Directed evolution experiments have offered a window into data; and wrote the paper. the dynamics of this process (2, 3), indicating the importance of The authors declare no conflict of interest. compensatory mutations, limited epistasis, and mutational ro- *This Direct Submission article had a prearranged editor. bustness. Theoretical efforts to describe protein evolution in 1To whom correspondence should be addressed. Email: [email protected]. biophysical terms have focused on evolvability (14), distributions This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. of protein stabilities and evolutionary rates (13, 15–17), and 1073/pnas.1415895112/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1415895112 PNAS | February 10, 2015 | vol. 112 | no. 6 | 1797–1802 Downloaded by guest on September 28, 2021 Methods). Amino acid energies e ði; σiÞ and e ði; σiÞ are randomly −β E +E f b folded; bound: p ; = e ð f bÞ Z; sampled from distributions constructed using available ΔΔG f b −βE [1] data and other biophysical considerations (Materials and Meth- folded; unbound: p ; = e f Z; f ub ods); we will consider properties of the model averaged over unfolded; unbound: puf;ub = 1=Z: these distributions. The exact shape of the distributions is not important for large enough L due to the central limit theorem. Here β is the inverse temperature, Ef is the free energy of folding ΔG E = E′ − μ E′ Fitness Landscape. We construct a fitness landscape based on the (also known as ), and b b , where b is the binding E E free energy and μ is the chemical potential of the target mole- molecular traits f and b. Without loss of generality, we assume E that the protein contributes fitness 1 to the organism if it is al- cule. For simplicity we will refer to b as the binding energy. The f ; f ∈ ; −βðEf +EbÞ −βEf ways folded and bound. Let ub uf ½0 1 be the multiplicative partition function is Z = e + e + 1. Note that Ef < 0 for E < fitness penalties for being unbound and unfolded, respectively: intrinsically stable proteins and b 0 for favorable binding f interactions; binding can thus be made more favorable either the total fitness is ub if the protein is unbound but folded, and f f if the protein is both unbound and unfolded. Then the by improving the intrinsic affinity between the protein and the ub uf fitness of the protein averaged over all three possible structural target (decreasing E′) or by increasing the concentration of the b states in Eq. 1 is given by target molecule (increasing μ). Because the protein can bind only when it is folded, the binding and folding processes are structur- À Á F E ; E = p ; + f p ; + f f p ; : [3] ally coupled. This gives rise to binding-mediated stability: strong f b f b ub f ub ub uf uf ub binding stabilizes a folded protein, such that even intrinsically E > This fitness landscape is divided into three nearly flat plateaus unstable proteins ð f 0Þ may attain their folded and bound 1 state if binding is strong enough ðE + E < 0Þ. Different physical corresponding to the three protein states of Eq. , separated by f b steep thresholds corresponding to the folding and binding tran- mechanisms are possible for the kinetics of the structural cou- A pling (28, 29), but in thermodynamic equilibrium only the free sitions (Fig. 1 ). The heights of the plateaus are determined by the values of fub and fuf , leading to three qualitative regimes of energy differences matter. B–D The folding and binding energies depend on the protein’s the global landscape structure (Fig. 1 ). In the first case (Fig. 1B), a protein that is perfectly folded but genotype (amino acid sequence) σ. We assume that adaptation unbound has no fitness advantage over a protein that is both only affects “hotspot” residues at the binding interface (30, 31); unbound and unfolded: f = f f . Thus, selection acts directly the rest of the protein does not change on relevant time scales ub uf ub on the binding trait only.