<<

Downloaded by guest on October 1, 2021 rti protein collective facilitates We modes. which functional , paths. evolu- long-range scattering protein observe of multiple and simulations tion over to framework sum mechanical a this which apply as function, interpreted Green the the be in of a can nonlinearity the among when to related transition subdomains. —is epistasis—the into topological that protein the find a divides We perturbations by such manifested of band is emergence The function field. perturba- force the of scatter localized gene that propagator introduce the the the thereby, to Mutations tions and how forces. acids amino of describes propagator, the transmission propagator mechanical of connectivity the The the by determines function. captured Green mod- are with the is network that protein acid motions amino Our coupled and function. strongly a mechanical protein, as and of eled of interactions emergence theory physical the dynamic to simplified to and correlations a sequence genetic relates here observed which outline the We between search cooperativity. the links motivated has physical been this large- for and also sequences, exhibit have protein correlations which interactions in revealed Long-range acids, cooperative modes. amino from dynamical their Vergassola) scale arises Massimo of and Thattai proteins rearrangements Mukund by and of reviewed 2017; 25, function September review The for (sent 2017 20, November Libchaber, Albert by Contributed d Dutta evolution Sandipan protein minimal of a model in mechanical correlated of function Green www.pnas.org/cgi/doi/10.1073/pnas.1716215115 1 (Fig. protein the in regions the connected with A modes weakly soft of these associate emergence pro- works allosteric Recent in (31–33). functionally especially teins (27–30), capture (23–26). modes motion modes large-scale normal networks “soft” relevant into elastic low-frequency network function as that the between proteins revealed of link dynamics model the the to Decomposing is examine motion to and subdomains approach hinge- protein by An of induced sliding changes (20–22). shear-like conformational and or twists, global (14–18) rotations, involve like allostery often In of (19) (1–13). fit acids mechanisms amino the their particular, of displacements coordinated and A function Green Universit a iednmc—n npriua,teudryn psai—snot epistasis—is underlying the particular, in dynamics—and tive of 69). subsets (58–60, coevolving acids and amino can pathways 57), allosteric (56, variation (50–52), interactions epistatic sequence proteins (53–55), of Provided of structure (62–68). 3D analysis the data, landscape predict large fitness sufficiently protein’s physical epis- with effects, the by nonlinear shapes linked inducing By tasis residues function. among common among or place interaction forces the takes (49– epistasis, that acids indicate mutations amino correlations the The among 61). correlations fam- long-range protein show of sequences ilies aligned, When collective. remarkably proteins (46–48). allosteric networks of of and models patterns 43–45) in (36, Such emerge 38). (39–42) (37, modes “floppy” motion viscoelastic enable 36)—that etrfrSf n iigMte,IsiuefrBscSine la 41,Korea; 44919, Ulsan Science, Basic for Institute Matter, Living and Soft for Center eateto hsc,UsnNtoa nttt fSineadTcnlg,Usn499 Korea 44919, Ulsan Technology, and Science of Institute National Ulsan Physics, of Department tl,tempigbtensqec orlto n collec- and correlation sequence between mapping the Still, are proteins’ , dynamic their Like and fpoen steeegneo olciepten fforces of patterns collective of emergence the is proteins of omnpyia ai o h ies ilgclfunctions biological diverse the for basis physical common —cak, serbns”o canl”(1 2 34– 22, (21, “channels” or bands,” “shear B)—“cracks,” eGen de e ´ | iesoa reduction dimensional v,C-21Gnv ,Switzerland; 4, Geneva CH-1211 eve, ` | epistasis a enPer Eckmann Jean-Pierre , | eoyet-hntp map -to- b letLibchaber Albert , c etrfrSuisi hsc n ilg,TeRceelrUiest,NwYr,N 02;and 10021; NY York, New University, Rockefeller The Biology, and Physics in Studies for Center | c,1 n siTlusty Tsvi and , ulse nieArl3,2018. 30, April online Published under distributed is article 1 BY-NC-ND). (CC 4.0 access License NonCommercial-NoDerivatives open This interest. of conflict no declare authors Diego. The Biological San for California, Center of National University M.V., Research, and Fundamental Sciences; of Institute and Tata research M.T., performed Reviewers: and designed T.T. paper. and the A.L., wrote J.-P.E., S.D., contributions: Author rti ensisfins n iet h vltoaysearch. evolutionary the directs and fitness the its across response defines mechanical protein of forces propagation of The propagation via motion. impulse and localized a pro- the to how the responds measures understand protein function to Green way The natural dynamics. a collective is tein’s 77) (76, function Green a vletwr pcfi ehnclfnto:i epnet a conforma- 1 to large-scale (Fig. a response to change undergo in tional will network protein function: the the mechanical force, allowing specific localized is acid a interactions, that amino toward the topology one evolve tweak substitute with that another network Mutations with acid gene. the amino in evolving encoded an of as interpretation tein mechanical a the give evolu- how to their aim of particular, directs epistasis. Our model in proteins simplified (46–48). and functional a tion networks of construct in allosteric dynamics to study collective and different: to 43–45) is used evolu- here 36, recently protein been (35, of have questions proteins models basic extensively Such examine to and tion. one map allow which the (24), survey proteins networks applica- lattice elastic as the or such (73–75) models, is coarse-grained approach simplified of complementary tion natural are A or which experiments (70–72). dimension, high-throughput evolution by huge even of sample, spaces to hard genotype-to-phenotype connects The proteins data. of the map of inter- sparsity physical the the and of actions complexity there the however, challenges: analysis; inherent reliable remain for required databases accu- sequencing mulates extensive and valuable dynamics, protein provide on simulations information and Experiments understood. fully owo orsodnemyb drse.Eal [email protected] or [email protected] Email: addressed. be may correspondence [email protected]. whom To sfins adcp n psai,wihaeotnhr to hard often are which epistasis, and calculate. such landscape evolution, fitness of determinants as basic This of to protein. derivation gene the for the in allows dynamics links collective directly and in which propagation protein function, force complexity, Green describes the this model of of Our terms light useful. fold are In that models network. acids simplified coupled amino strongly of a species into 20 nonrandom The of protein: of made behaviors. nature heteropolymers complex collective the is phenotypic challenge major and genetic links physical for long- between search shows their motivated sequences has of This their correlations. of range motion alignment large-scale while acids, involve amino functions protein Many Significance b hsppritoue oregandter httet pro- treats that theory coarse-grained a introduces paper This D preetd hsqeTh Physique de epartement ´ a,d,1 C and oiu n eto eMath de Section and eorique PNAS ´ .W hwta h plcto of application the that show We D). | o.115 vol. raieCmosAttribution- Commons Creative | o 20 no. | ematiques, ´ E4559–E4568

BIOPHYSICS AND Fig. 1. Protein as an evolving machine and propagation of mechanical forces. (A) Formation of a softer shear band (red) separating the protein into two rigid subdomains (light blue). When a binds, the biochemical function involves a low-energy hinge-like or shear motion (arrows). (B) Shear band and large-scale motion in a real protein: the arrows show the displacement of all amino acids in human glucokinase when it binds glucose (Protein Data Bank ID codes 1v4s and 1v4t). The coloring shows a high-shear region (red) separating two low-shear domains that move as rigid bodies (shear calculated as in refs. 21 and 36). (C) The mechanical model. The protein is made of two species of amino acids, polar (P; red) and hydrophobic (H; blue), with a sequence that is encoded in a gene. Each forms weak or strong bonds with its 12 near neighbors (Right) according to the interaction rule in the table (Left). (D) The protein is made of 10 × 20 = 200 amino acids with positions that are randomized from a regular triangular lattice. Strong bonds are shown as gray lines. Evolution begins from a random configuration (Left) and evolves by mutating one amino acid at each step, switching between H and P. The fitness is the mechanical response to a localized force probe (pinch) (2). After ∼ 103 mutations (Center; intermediate stage), the evolution reaches a solution (Right). The green arrows show the mechanical response: a hinge-like, low-energy motion with a shear band starting at the probe and traversing the protein, qualitatively similar to B. L, left; R, right.

Thus, the Green function explicitly defines the map: gene → in terms of “multiple scattering” pathways. These indirect phys- amino acid network → protein dynamics → function. We use ical interactions appear as long-range correlations in the co- this map to examine the effects of mutations and epistasis. A evolving genes. perturbs the Green function and scatters the propaga- Using a Metropolis-type evolution algorithm, solutions are tion of force through the protein (Fig. 2). We quantify epistasis quickly found, typically after ∼103 steps. Mutations add localized

E4560 | www.pnas.org/cgi/doi/10.1073/pnas.1716215115 Dutta et al. Downloaded by guest on October 1, 2021 Downloaded by guest on October 1, 2021 nuegoa eomto fa nye h response The . function an Green the of by should determined substrate deformation a of global binding specific induce example, force for localized fit, given induced In a to motion large-scale scribed bonds. 1 C the Fig. of of rules strength interaction acid the and network H(c) the of connectivity the Hamiltonian field the force is the stant gives that field, law displacement Hooke’s by tured These amorphous. network the make to position stant codon and acid (a hydrophobic of acids, sequence amino of (a species polar two its with (d lattice hexagonal 2D epneo h ewr oadslcmn field displacement as mechanical a displaced the to are determines acids network bonds amino the the an of of as response strength defined is The bonds which bonds. of interaction, flavors chemical two AND are the to There to springs neighbors. according harmonic nearest next by and connected est is acid vector amino a in stored site source response force the the from (blue) to protein the across wave,” “diffraction tion 2. Fig. rdi es 5ad3 ap 3teen.Tepoeni chain a is protein The therein). B3 (app. gene 36 and a of 35 in refs. in encoded 39) ered are (23–27, that network 1 interactions elastic (Fig. an and of connectivity terms with in Function. Green description Its grained and Network Acid Amino The Machine Evolving an the as Protein of Model: region high-shear the the epis- along origin, in ranged channel. mechanical correlations long its fields. becomes between to displacement tasis Owing correspondence and phenotype. the force tight and between of genotype a dimension spaces find of the solutions reduction to We of huge genes set a of sep- The is space acids subdomains. there amino rigid occurs sparse: connected into which is weakly protein transition, of Protein the topological band arates band. shearable a shear a by continuous when signaled a eventually is are into function which evolution network, by acid arranged amino the to perturbations (C (9). (6). paths paths scattering propagator multiple the mutations, of on series mutation a the as of effect The force. ut tal. et Dutta f A vlto erhsfrapoenta ilrsodb pre- a by respond will that protein a for searches Evolution particular any therefore, and fold, constant a consider We n a G sannierfnto ftegene the of function nonlinear a is 200 = esrstepoaaino h ehnclsga,dpce sa as depicted signal, mechanical the of propagation the measures ae strong a gate: c h re func- Green The (A) epistasis. and mutations, propagation, Force C i G i c P = ntegn noe naioacid amino an encodes gene the in and δ i H 0 = mn acids: amino 200 i .Teaioai hi secddi gene a in encoded is chain acid amino The ). and r .Smlrvco lsiiymdl eeconsid- were models elasticity vector Similar D). i noe a encodes iaycdn,where codons, binary ntepoen h positions The protein. the in v δ mutation A (B) v. H xetteoe ttebudre,every boundaries, the at ones the Except r. f j = seuvln oasre fmlil scattering multiple of series a to equivalent is , H B f H(c) − 2 = a a H(c), P H i G .W olwteH oe 7,74) (73, model HP the follow We ). mn acid. amino (i u δH odadweak and bond r . 1, = i h nlgeo h pigcon- spring the of analogue The i (Eq. → G n G d r . . . δ i × H 11, (76): + c , i n i n eet h rpgto of propagation the deflects u h psai ewe two between epistasis The ) n 1 = v d a aeil n Methods). and Materials i d eetn h amino the reflecting c, oddit a into folded ) h epnei cap- is response The . arx hc records which matrix, = noe an encodes a f d C r i H δ i eueacoarse- a use We · G tacrancon- certain a at n r randomized are − f G a a edescribed be can δH P nue ya by induced 400 = z hnthe when u, f 12 = i and i (“pinch”). and H) = G H δH 10 f f are dfs amino P (pinch) near- j G × u − a c, 20 P is v Eq. narno ee yial,w aeasalfato famino of fraction small a of take we type of Typically, acid gene. random a in acid amino the pattern stiffening or bond softening network. the by changing response elastic thereby the protein, and the between in exchanging codon position to selected process dom corresponds randomly mutation This a one. point flip and we a zero step, via each evolved at and is where, (Materials protein sites) 8–10 The (typically Methods). output channel’s the protein of We of tions the R). maximum allostery the and allows as L fitness specific on that the sites define (unlike specified band between tasks communicating shear of side. mechanical tasks wider right general the a on perform to sites to the prescribed rise of The gives any (pinch). at side This occur left may the on response site dipolar specific response a prescribed at applied a for Landscape. search lations Mechanical the in Searches force Evolution of patterns complex more treat well approach motion. as This and can mode. and hinge-like general a of is emergence the pinch drive localized which a for examples ular u function fitness a the if fitter is protein by The 1D). Fig. in (R pinch protein the of side right vector neigh- displacement two 1D), at dipole Fig. force in a acids 1D): amino Fig. boring in (pinch probe localized is quantity of related A Methods). resolvent, the and (Materials modes nonzero of inversion Hamiltonian, the that of imply [1] and law Hooke’s Hence, lar. and rotation, 2D one A and vanishes. translations, cost energy elastic has the protein and change, not propagator, do a bonds into gene the propaga- maps mechanical motion, that linear into c function force the function nonlinear a is Green the turns it the that space, of tor nature phenotype dual the the in reflects This G(c)f. genotype the from source force Eq. the 2A). from signals of sion the [2] from evaluate to it use and change function fitness below) Green fitness the explained initial in method change low a the of calculate we therefore, step, and each stiff protein the G eacp h uainif mutation the accept we change fitness The → = vlto trsfo admpoencngrto encoded configuration protein random a from starts Evolution a to response mechanical strong rewards function fitness The the of lengths the body, rigid a as moved is protein the When H, H stemcaia rpgtrta esrstetransmis- the measures that propagator mechanical the is oti n,w vleteaioai ewr oincrease to network acid amino the evolve we end, this To v. 2 Gf G(c). .Tehg rcino togbnsrenders bonds strong of fraction high The Left ). 1D, (Fig. enstefins landscape fitness the defines ω f ntepecie response prescribed the on = rdcsalredfraini h ieto specified direction the in deformation large a produces 1 λ k osiue nepii eoyet-hntp map genotype-to-phenotype explicit an constitutes P n . H f q 0 G(ω (about 0 δ 3 = ntenniglrsbpc fthe of subspace nonsingular the in = F : F −f ( = ) uhzr oe Glla ymtis,two symmetries), (Galilean modes zero such c hc stepoeto ftedisplacement the of projection the is which , p p ihadplrresponse, dipolar a with v, F δ otemcaia phenotype mechanical the to = G(c) admypstoe ihnamajority a within positioned randomly 5%) 0 0 F h rsrbdmto sseie ya by specified is motion prescribed The . = (c) and ω eemnsteft ftemutation: the of fate the determines − δ u F H) q v δ = H(c) 0 = | F −1 u ntelf ieo h rti (L protein the of side left the on G(c) v ≥ = | ihplsa h nrylevels energy the at poles with , PNAS + δ tews,temtto is mutation the otherwise, 0; F v v: f. G H | ee eeaiepartic- examine we Here, (c). f f 7,7) hc mut to amounts which 79), (78, F G(c) . n rsrbdresponse prescribed and u s hrfr,awy singu- always therefore, is, f vralptnilloca- potential all over [2] | = costepoen(Fig. protein the across o.115 vol. hra ti also is it whereas f, G v f G . nue yaforce a by induced stepseudoinverse the is H v q | and = n u: o 20 no. d −v − P c u simu- Our F → p n taran- a at nthe on , ' 0 | δ = u(c) 397 = G At 0. E4561 [1] [2] [3] (by G: v, f

BIOPHYSICS AND COMPUTATIONAL BIOLOGY rejected. Since fitness is measured by the criterion of strong will easily excite a large-scale motion with a distinct high-shear mechanical response, it induces softening of the amino acid band (Fig. 1D, Right). network. 3 The typical evolution trajectory lasts about 10 steps. Most Results are neutral mutations (δF ' 0) and deleterious ones (δF < 0); Mechanical Function Emerges at a Topological Transition. The hall- the latter are rejected. About a dozen or so beneficial muta- mark of evolution reaching a solution gene c∗ is the emer- tions (δF > 0) drive the protein toward the solution (Fig. 3A). gence of a new zero-energy mode, u∗, in addition to the three The increase in the fitness reflects the gradual formation of the Galilean symmetry modes. Near the solution, the energy of channel, while the jump in the shear signals the emergence of this mode λ∗ almost closes the spectral gap, λ∗ → 0, and G(ω) the soft mode. The first few beneficial mutations tend to form has a pole at ω ≈ 0. As a result, the emergent mode domi- weakly bonded P-enriched regions near the pinch site on the left | nates the Green function, G ' u∗u /λ∗. The response to a pinch side and close to the right boundary of the protein. The follow- ∗ will be mostly through this soft mode, and the fitness [2] will ing ones join these regions into a floppy channel (a shear band), diverge as which traverses the protein from left to right. We stop the sim- | (v|u∗)(u∗f) ulation when the fitness reaches a large positive value Fm ∼ 5. F (c∗) ' F∗ = . [4] The corresponding gene c∗ encodes the functional protein. The λ∗ ad hoc value Fm ∼ 5 signals slowing down of the fitness curve On average, we find that the fitness increases exponentially with toward saturation at F > Fm, as the channel has formed and now the number of beneficial mutations (Fig. 3A). However, ben- only continues to slightly widen. In this regime, even a tiny pinch eficial mutations are rare and are separated by long stretches

A B

e1 * F e0 Fitness F

e-1 L

-4 0

CD0.02

t = 16 100 g(r) t = 11 0.004 t = 1 0.00 4 8 channel 0.1 distance r 10-1 0.002

protein -2 0 10 shear 0 0.05 0.1 0.15 pc 0 4 8 12 16 0.0 0.025 p-pc

Fig. 3. The mechanical Green function and the emergence of protein function. (A) Progression of the fitness F during the evolution run shown in Fig. 1D (black) together with the fitness trajectory averaged over ∼ 106 runs hFi (red). Shown are the last 16 beneficial mutations toward the formation of the channel. The contribution of the emergent low-energy mode hF∗i (blue) dominates the fitness [4]. (B) Landscape of the fitness change δF [3] averaged over ∼ 106 solutions for all 200 possible positions of point mutations at a solution. Underneath, the average amino acid configuration of the protein is shown in shades of red (P) and blue (H). In most sites, mutations are neutral, while mutations in the channel are deleterious on average. L, left. (C) The average magnitude of the two-codon correlation |Qij| [5] in the shear band (amino acids in rows 7–13; red) and in the whole protein (black) as a function of the number of beneficial mutations, t.(Inset) Profile of the spatial correlation g(r) within the shear band (after t = 1, 11, 16 beneficial mutations). (D) The mean shear in the protein in a single run (black) and averaged over ∼ 106 solutions (red) as a function of the fraction of P amino acids, p. The values of p are shifted by the position of the jump, pc.(Inset) Distribution of pc.

E4562 | www.pnas.org/cgi/doi/10.1073/pnas.1716215115 Dutta et al. Downloaded by guest on October 1, 2021 Downloaded by guest on October 1, 2021 apr h nlsso elsqecs tec iese,w cal- c we correlation step, time two-codon protein each the At culate sequences. real real the of that of analysis from correlation the phylogenetic alignment hampers genes the sequence without align albeit to we (49–61), families this, analogy see in To simulations up. builds codons probe force the to response formed in is 3D). deform channel (Fig. easily the aver- can as protein the abruptly the jumps where and protein transition, the phase in can shear dynamical that soft age a The domains cost. at energetic two appears low into at mode other divided each of now independently is move network acid amino ntesre balances series force the imposed in the field elastic by force the balanced residual mutation, longer a the no of is result field a force sum as a scatterings: as multiple interpretation physical over straightforward a has series This series infinite an into iterated be can latter The can mutations Methods): of and function, mechanics Green perturbed The the 81). G examining by (80, explored crystal further a pertur- be in corresponding defect The Hamiltonian a 2B). the (Fig. of acid bation than amino more mutated no of the strength the Perturbations. vary Mechanical may tion Localized Are protein Mutations 47). of Point (46, models networks allosteric coarse-grained and families 44) in protein 43, as 36, real (35, well allostery in as appear 58–60) residues (53–55, coevolving the of In the regions correlation protein. in the whole shown of correlation the profile long-range spatial in (Fig. significant than channel is larger there forming 10-fold most channel, the is that in it concentrated where find 3C), is We correlation averages. the ensemble of denote brackets where gap, spectral muta- the of of effect vanishing the The sites, most neutral. λ in practically landscape that, is fitness shows the tions from which evident 3B), is (Fig. This mutations. neutral of eatr ftedul uainfo diiiyo w single two of additivity from mutation mutations: double the of departure both effect mutates combined one solution, nal site a at deviation, mutation independent another function function site fitness Green the a in the hence, at in acid change amino a algorithm an evolution the mutate We from 2C). and obtained (Fig. nonlin- protein mutations functional the interacting Correlations. a of epistasis, take effect Genetic of fitness the definition of to calculable earity a Mechanics provides Protein model Links Epistasis of the factor accelerate a to by perturbation computation the of nature which localized [12], the formula exploits Woodbury the using function Green mutated defor- further by for induced accounts mation term scattering second the Similarly, ut tal. et Dutta i ∗ 0 steserbn stkn hp,tecreainamong correlation the shape, taking is band shear the As and = → G aiet satplgclcag ntesse:the system: the in change topological a as manifests 0, c + j : δ δ G hc by h yo qain(7 2 (Materials 82) (77, equation Dyson the obeys which G, δ G = j G and 0 δ − G δu G i δ e Q ,j δf ij F δf = ij n ofrh npatc,w aclt the calculate we practice, In forth. so and and G j = ≡ ytedeformation the by ial,satn gi rmteorigi- the from again starting Finally, . −G hc ≡ 0 δ δ = u H F ∼ δ G δ δ i F i ,j 10 G H H c δ i − = j F ,j − h − i 4 i h epistasis The . s hrfr,lclzd knto akin localized, therefore, is, g G i δ Q δ aeil n Methods). and (Materials (r .Oecnsmlryperform similarly can One [3]. + h rtsatrn term scattering first The f. G H δ F and δ ij G ) G H i c G ewe l ar fcodons of pairs all between i − i Fg 3C, (Fig. ihc δ i )and [12]) by (calculated j hsmtto induces mutation This . G H δ 0 . F j iutnosy iha with simultaneously, j i, j rdcn second a producing , . z δu δ · · · G− H 12 = = .Analogous Inset). e ij G od around bonds esrsthe measures δf = . leaving f, G muta- A δ ∼ Gf. H Our 10 [5] [7] [6] 6 a for lar at mutation to due of energy shear the in factor The r eomdb h otmd.We ohstsaeoutside are sites both perturbations When the mode. by soft varied channel, the bonds the by the deformed where are band, shear the trmisngiil vni n ftemutations, the of one if even channel, the negligible remains It cteigpts hc nld both include which paths, scattering [ case, (∆ pertur- bations noninteracting nonadjacent, among epistasis Long-range energy epistasis an with of perturbation [6] effective series an Dyson the in term (where ha eomto:na h rniin h re ucinis function Green mode, the soft transition, the by the dominated near to deformation: epistasis long-range shear links directly expansion perturbation The the and pinch the at with nonlinearity mutations this response: of isolated value of product from inner effect response combined mechanical and the the func- of of Green deviation the additivity the of measures (“curvature”) tion nonlinearity the mechanics: has effect. mutation smaller second antagonistically a a mutation, interact deleterious channel strongly a the after elastic in (83): mutations the the change since 4 epistasis, hardly tive, (Fig. small landscapes domains epistasis rigid only The the response. show in region mutations high-shear since the outside acids average all ensemble for E the calculation take and double-mutation tions the acids, perform amino among we interaction epistatic average the evaluate To ,w n ipeepeso o the for expression simple a find shear: the we of [8], function a and as epistasis 6] [ mechanical From [4]. by patterns the in similarity to The equal acids. period of amino a with 10 region length: channel channel’s the in epistasis correlations mean strong the between two-codon correspondence depicts tight E a which find 4D, We [5]. Fig. in shown correlations 57) (56, mechanical relations from originates acids. epistasis amino mutated how among forces shows [8] Relation xii otc psai 4–1,bcuetoajcn pertur- adjacent two because acids (49–51), bations, amino epistasis Neighboring range. contact interaction exhibit the Paths. to Scattering according over sis Sum a as Epistasis the network. through sequences acid propagating amino protein interactions mechanical aligned the from among stems observed correlations range h neato al Fg 1C), (Fig. table interaction the ij ij sadrc ikbteneitssadprotein and epistasis between link direct a is [7] Definition ntegn,eittcitrcin r aietdi oo cor- codon in manifested are interactions epistatic gene, the In Q δ = hwsgicn psai ntecanl(i.4.Amino 4). (Fig. channel the in epistasis significant show H j ij , he i xrse h nonlinearity the expresses 6] ∆G n h energy the and ) and δ h ij δ H e j i Thus, . H ij i i H n h oo correlations codon the and ,j ,j e i E h = ij Q i i ≡ and h ,j steprubto ybt uain) h leading The mutations). both by perturbation the is ij h v ≡ ij ' i j 0 = | , δ niae htamjrcnrbto otelong- the to contribution major a that indicates  h rmteainetof alignment the from G u F [ G j δ ∗ | h e i sosre ln h hne Fg ) nthis In 4). (Fig. channel the along observed is ) H 1  · δ ,j i ij δ  H  j H − and = neatnnieryvathe via nonlinearly interact , + 1 ssmall, is [10] epistasis the 1, i i u δ h −v G h ∗ e G i i /λ ij and , h δ h λ i | H i j = − ∗ [G ∗ + j r infiatol nadaround and in only significant are G v ftesf oe n ti simi- is it and mode, soft the of δ G stertoo h change the of ratio the is [10] in | ∆H G + 1 e ∆ + ' ∆G j ij h ∆ PNAS h epistasis The . G G j u i ' h ∆G ,j H i ∗ i j δ ,j i ,j u G]f+ h i H − ,j and E ∗ | j sasnl cteigfrom scattering single a is Q f. ∼ i F | j /λ ij ,j ≡ i G n a lsiyepista- classify can One + 1 ij psai a nybe only can Epistasis . ∆ 10 o.115 vol. = teepcainvalue expectation (the sasmoe multiple over sum a as ohpten exhibit patterns Both . ∗ δ h δ · · · j r otyposi- mostly are A–C) H ihfitness with , H H i 6 he Fg 2C): (Fig. h + i i ucinlgenes functional i i . ,j ,j ij G]f + hc ilsthe yields which , h adcpsof Landscapes i. − j e | h ij · · · − δ j o 20 no. AND e H  ssml the simply is ij . i ' − . 10 2h F δ i 6 aeof gate | H sin is , i given solu- j E4563 h [10] 0 = 6 j [9] [8] F c ∗ i .

BIOPHYSICS AND COMPUTATIONAL BIOLOGY AB

L L

0 6 0 0.3 C D 200

10-1 100

L 100 -2 10-2 10

10-4 -0.3 0 0.5 10-3 100 200

6 Fig. 4. Mechanical epistasis. The epistasis [7], averaged over ∼ 10 solutions Eij = heiji, between a fixed amino acid at position i (black arrow) and all other positions j. Here, i is located at (A) the binding site, (B) the center of the channel, and (C) slightly off the channel. Underneath, the average amino acid configuration of the protein is drawn in shades of red (P) and blue (H). Significant epistasis mostly occurs along the P-rich channel, where mechanical interactions are long ranged. Although epistasis is predominantly positive, negative values also occur, mostly at the boundary of the channel (C). Landscapes are plotted for specific output site at right. L, left. (D) The two-codon correlation function Qij [5] measures the coupling between mutations at positions i and j [5]. The epistasis Eij and the gene correlation Qij show similar patterns. Axes are the positions of i and j loci. Significant correlations and epistasis occur mostly in and around the channel region (positions ∼70–130, rows 7–13).

long ranged along the channel when both mutations are sig- out from the bulk continuous spectrum. These are the col-  −1 −1 −1 nificant, hi , hj  1, and eij ' F 1 − hi − hj + (hi + hj ) ' lective dfs, which show loci in the gene and positions in the F [1 − 1/ min(hi , hj )]. We conclude that epistasis is maximal “flow” (i.e., displacement) and shear fields that tend to vary when both sites are at the start or end of the channel as illus- together. trated in Fig. 4. The nonlinearity of the fitness function gives rise We examine the correspondence among three sets of eigen- to antagonistic epistasis. vectors: {Uk } of the flow, {Ck } of the gene, and {Sk } of the shear. The first eigenvector of the flow, U1, is the hinge motion Geometry of and Gene-to-Function Map. With caused by the pinch, with two eddies rotating in opposite direc- our mechanical evolution algorithm, we can swiftly explore tions (Fig. 5A). The next two modes, shear (U2) and breathing the fitness landscape to examine its geometry. The genotype (U3), also occur in real proteins, such as glucokinase (Fig. 1B). space is a 200D hypercube with vertices that are all possi- The first eigenvectors of the shear S1 and of the gene sequence ble genes c. The phenotypes reside in a 400D space of all C1 show that the high-shear region is mirrored as a P-rich region, possible mechanical responses u. The Green function provides where a mechanical signal may cause local rearrangement of the the genotype-to-phenotype map [1]. A functional protein is amino acids by deforming weak bonds. In the rest of the pro- encoded by a gene c∗ with fitness that exceeds a large threshold, tein, the H-rich regions move as rigid bodies with insignificant F (c∗) ≥ Fm ' 5, and the functional phenotype is dominated by shear. The higher gene eigenvectors, Ck (k > 1), capture patterns the emergent zero-energy mode, u(c∗) ' u∗ (Fig. 3A). We also of correlated genetic variations. The striking similarity between characterize the phenotype by the shear field s∗ (Materials and the sequence correlation patterns Ck and the shear eigenvectors Methods). Sk shows a tight genotype-to-phenotype map, as is further shown The singular value decomposition (SVD) of the 106 solutions in the likeness of the correlation matrices of the amino acid and returns a set of eigenvectors with ordered eigenvalues that show shear flow (Fig. 5C). their significance in capturing the data (Materials and Meth- In the phenotype space, we represent the displacement field ods). The SVD spectra reveal strong correspondence between u in the SVD basis, {Uk } (Fig. 5B). Since ∼ 90% of the data are the genotype c∗ and the phenotype, u and s∗ (Fig. 5). In all explained by the first ∼ 15 Uk , we can compress the displacement three datasets, the largest eigenvalues are discrete and stand field without much loss into the first 15 coordinates. This implies

E4564 | www.pnas.org/cgi/doi/10.1073/pnas.1716215115 Dutta et al. Downloaded by guest on October 1, 2021 Downloaded by guest on October 1, 2021 ognrltssadfins ucin ihmlil nusand inputs multiple with functions fitness applicable for and responses. therefore, tasks expressions are, net- and general acid the mutations amino to localized the particular, the of (in protein. geometry and basic work the theory the in from this follow mode epistasis) of functional collective results a Green The of the emergence of the scatter- nonlinearity multiple the all [7 over equivalently, function sum or a trajectories to corresponds ing among gene, interaction low the the in this Epistasis, loci yield spectrum. that dynamical the configurations functional of network end prescribed for given looks problem: one modes, scattering inverse the ing the (with reference of genes frame the a of is representation contrast, solutions In directions. of set the that (C cloud. genotype the correlation. Corr, of shape spheroid the with eigenvectors the by defined is (Lower field flow 5. Fig. ain,wihsatrtefrefil n eetispropagation its pertur- deflect and mechanical [2]. field to force [3 protein mutations the the genetic scatter which map through bations, to transmission cooper- us of take allows force we This terms and effi- phenotype, motion in functional be a ative between can form As which mapping explicit calculated. functions), ciently the an (Green where takes propagators mechanical protein, phenotype of and informa- genetic genotype theory introduced of simplified We function. evolution protein a to the rise of give with together, physics which matter many-body tion, the acid combine amino to the need protein of Theories Discussion shape that constraints different 84–89). the (36, from them genotypes stems mapping phenotypes in reduction to dimensional incompressible dramatic an The is 5B). set solution the space, ut tal. et Dutta and .Dniyo ouin sclrcdd h eoyecosscini h ln endb h eigenvectors the by defined plane the is cross-section genotype The coded. color is solutions of Density ). rmgn omcaia ucin pcr n iesos ( dimensions. and spectra function: mechanical to gene From Fg ) h vltoaypoesaonst solv- to amounts process evolutionary The 2). (Fig. 6] U k ,adteshear the and (Middle), and .W n htln-ag psai signals epistasis long-range that find We 8]. U 3 {C − ici,wihi a nmost in flat is which discoid, 15D U k 100 } S k ai)rvasta,i genotype in that, reveals basis) rs-etostruhtesto ouin ntegntp pc (Upper space genotype the in solutions of set the through Cross-sections (B) (Bottom). i h et.Tedmninlrdcini aietdb h ici emtyo h hntp lu compared cloud phenotype the of geometry discoid the by manifested is reduction dimensional The text). the (in eei correlations Genetic ) peod(Fig. spheroid 200D c ∗ nteSVD the in h rtfu V ievcos(ntetx)o h gene the of text) the (in eigenvectors SVD four first The A) Q ij hwsmlrt ocreain nteserfield, shear the in correlations to similarity show es eueteH oe 7)wt w mn cdseis hydrophobic species, acid amino two where with (73) model HP (a are the amino and use There letters We are the bonds. Roman ters. that of to by vertices indexed connectivity correspond with acids The that amino lattice 90). edges hexagonal 39, and a 24, acids by (23, described springs is harmonic network of made Protein. work of Model Mechanical The Methods and Materials fsize of operator dient where A, connected which is degree, it maximal 1C the which (Fig. have to acids acids amino amino most is of model, our number In the bonds. is by acid amino each of and r degrees as Laplacian of the part write diagonal can The otherwise. 0 and uct k length )b h orsodn ieto vector direction corresponding the gradient by graph (±1) the element taking by obtained the is that so (39), stresses” equilibrium is “internal their energy at of elastic are initial possibility springs the all that disregarding such length, is configuration initial the that H k bonds weak the whereas 1C bond (Fig. table by interaction connected model’s HP the of rule k AND the by determined is i α s α ∈ i z = hc eke nadiagonal a in keep we which , eebdtegahi ulda space Euclidean in graph the embed We edfiete“medd rdetoperator gradient “embedded” the define We = = E = ∆ = ∇ and 1 d )adplr(a polar and H) k 2 hl mn cd ttebudr aefwrneighbors, fewer have boundary the at acids amino while 12, αj w n oec mn cd ecnaeaealpstosi vector a in positions all concatenate We acid. amino each to c n .Tecnetvt ftegahi eoddb h daec matrix adjacency the by recorded is graph the of connectivity The ). i b ∇ z + a = = i A × · . | (k .A ntecniumcs,teLpaeoperator Laplace the case, continuum the in As −1. ij d noe and H encodes 1 k h odaoa elements nondiagonal The ∇. n = w s ≡ a − :i vertices if ): = fteei odfrom bond a is there if 1 ∇ n k d 1 hsdtrie pigntok eas assume also We network. spring a determines This 0.01. w eae h pcso od n etcs(n s therefore, is, (and vertices and bonds of spaces the relates ial,t ahbn,w sinasrn ihconstant with spring a assign we bond, each to Finally, . )c hsipista togH strong a that implies This α. i i c j = E where , 0 ) h mn cdcani noe nagene a in encoded is chain acid amino The P). = i = ∆ and 0. − c i n P and P Z c j = n i r once yabond a by connected are − b and noe P, encodes 0 C emdltepoena neatcnet- elastic an as protein the model We × where A, 3 − n c b − PNAS C ∆ j j matrix 100 to r h ooso h mn acid amino the of codons the are have H ∆ stedegree the is ∇ E n ntepeoyesae it space, phenotype the in and , i n ij and d Z n utpyn ahnonzero each multiplying and b are | (d s stedaoa arxo the of matrix diagonal the is od nee yGeklet- Greek by indexed bonds h tegho h spring the of strength The K. ∗ i o.115 vol. n h hntp space phenotype the and ) C = = A if −1 k clrcddi o scale). log in coded (color D k ij α 1, )b sinn positions assigning by 2) ,tedisplacement the (Top), = o size (of − = n . . . ij tews.Tegra- The otherwise. 0 i odhas bond H k = and ∆ w , n n | euulytake usually We . then α, a ii a r = = h degree The . i o 20 no. n j − b r connected are 10 ∆ z × i r ec,we Hence, . j steprod- the is ×  n ∇ / d 20 k |

,which ), αi r α i z = E4565 +1 = − = < r 200 r k 12 of j z c,

s ), i , .

BIOPHYSICS AND COMPUTATIONAL BIOLOGY Thus, D is a tensor (D = ∇αinij), which we store as a matrix (α is the bond deleterious mutations (a finite temperature Metropolis algorithm) gave sim- connecting vertices i and j). In each row vector of D, which we denote as ilar results. We may impose a stricter minimum condition δF ≥ ε F with mα ≡ Dα,:, there are only 2d nonzero elements. To calculate the elastic a small positive ε, say 1%. An alternative, stricter criterion would be the response of the network, we deform it by applying a force field f, which demand that each of the terms in F, vp·up and vq·uq, increases separately. leads to the displacement of each vertex by ui to a new position ri + ui The evolution is stopped when F ≥ Fm ∼ 5, which signals the formation of (39). For small displacements, the linear response of the network is given a shear band. When simulations ensue beyond Fm ∼ 5, the band slightly by Hooke’s law, f = Hu. The elastic energy is E = u|Hu/2, and the Hamilto- widens, and the fitness slows down and converges at a maximal value, 2 nian, H = D|KD, is the Hessian of the elastic energy E, Hij = δ E/(δuiδuj). By typically Fmax ∼ 8. 1/2 p rescaling, D → K D, which amounts to scaling all distances by 1/ kα, we obtain H = D|D. It follows that the Hamiltonian is a function of the gene Evolving the Green Function Using the Dyson and Woodbury Formulas. The 0 0+ + H(c), which has the structure of the Laplacian ∆ multiplied by the tensor Dyson formula follows from the identity δH ≡ H − H = G − G , which 0 product of the direction vectors. Each d × d block Hij (i 6= j) is a function of is multiplied by G on the left and G on the right to yield [6]. The for- the codons ci and cj: mula remains valid for the pseudoinverses in the nonsingular subspace. One can calculate the change in fitness by evaluating the effect of a | mutation on the Green function, G0 = G + δG, and then examining the Hij(ci, cj) = ∆ijnijnij   [11] change, δF = v|δGf [3]. Using [6] to calculate the mutated Green func- = −A k + (k − k )c c n n|. 0 ij w s w i j ij ij tion G is an impractical method, as it amounts to inverting at each step a large nd × nd matrix. However, the mutation of an amino acid at i has The diagonal blocks complete the row and column sums to zero, Hii = a localized effect. It may change only up to z = 12 bonds among the − P . j =6 i Hij bonds α(i) with the neighboring amino acids. Thanks to the localized nature of the mutation, the corresponding defect Hamiltonian δHi is, The Inverse Problem: Green Function and Its Spectrum. The Green function therefore, of a small rank, r ≤ z = 12, equal to the number of switched G is defined by the inverse relation to Hooke’s law, u = Gf [1]. If H were bonds (the average r is about 9.3). δHi can be decomposed into a prod- invertible (nonsingular),G would have been just G = H−1. However, H is uct δHi = MBM|. The diagonal r × r matrix B records whether a bond α(i) always singular owing to the zero-energy (Galilean) modes of translation is switched from weak to strong (Bαα = ks − kw = +0.99) or vice versa and rotation. Therefore, one needs to define G as the Moore–Penrose pseu- (Bαα = −0.99), and M is a nd × r matrix with r columns that are the doinverse (78, 79), G = H+, on the complement of the space of Galilean bond vectors mα for the switched bonds α(i). This allows one to calculate transformations. The pseudoinverse can be understood in terms of the spec- changes in the Green function more efficiently using the Woodbury formula trum of H. There are at least n0 = d(d + 1)/2 zero modes: d translation (91, 92): modes and d(d − 1)/2 rotation modes. These modes are irrelevant and will be projected out of the calculation (note that these modes do not  + δG = −GM B−1 + M|GM M|G. [12] come from missing connectivity of the graph ∆ itself but from its embed- d ding in E ). H is singular but is still diagonalizable (since it has a basis of dimension nd), and it can be written as the spectral decomposition, n = P d λ | {λ } { } H k=1 kukuk , where k is the set of eigenvalues and uk are the corresponding eigenvectors (note that k denotes the index of the eigen- value, while i and j denote amino acid positions). For a nonsingular matrix, n −1 = P d λ−1 | one may calculate the inverse simply as H k=1 k ukuk . Since H is singular, we leave out the zero modes and get the pseudoinverse H+, G = n + P d −1 | H = λ uku . It is easy to verify that, if u is orthogonal to the k=n0+1 k k zero modes, then u = GHu. The pseudoinverse obeys the four requirements (78): (i) HGH = H, (ii) GHG = G, (iii)(HG)| = HG, and (iv)(GH)| = GH. In prac- tice, as the projection commutes with the mutations, the pseudoinverse has most virtues of a proper inverse. The reader might prefer to link G and H K = P λkt | = R ∞ K through the heat kernel, (t) k e ukuk . Then, G 0 dt (t) and d H = dt K|t=0.

Pinching the Network. A pinch is given as a localized force applied at the boundary of the “protein.” We usually apply the force on a pair of neigh- boring boundary vertices, p0 and q0. It seems reasonable to apply a force

dipole (i.e., two opposing forces fq0 = −fp0 ), since a net force will move the center of mass. This pinch is, therefore, specified by the force vector f (of

size nd), with the only 2d nonzero entries being fq0 = −fp0 . Hence, it has 0 the same structure as a bond vector mα of a “pseudobond” connecting p 0 and q A normal pinch f has a force dipole directed along the rp0 − rq0 line (the np0q0 direction). Such a pinch is expected to induce a hinge motion. A shear pinch will be in a perpendicular direction ⊥ np0q0 and is expected to induce a shear motion. Evolution tunes the spring network to exhibit a low-energy mode, in which the protein is divided into two subdomains moving like rigid bod- ies. This large-scale mode can be detected by examining the relative motion of two neighboring vertices, p and q, at another location at the boundary (usually at the opposite side). Such a desired response at the other side of the protein is specified by a response vector v, and the only nonzero entries correspond to the directions of the response at p and q. Again, we usually consider a “dipole” response vq = −vp. Fig. 6. The effect of the backbone on evolution of mechanical function. The backbone induces long-range mechanical correlations, which influence Evolution and Mutation. The quality of the response (i.e., the biological protein evolution. We examine two configurations: parallel (A and B) and fitness) is specified by how well the response follows the prescribed one perpendicular (C and D) to the channel. (A and B) Parallel. (A) The back- v. In the context of our model, we chose the (scalar) observable F as bone directs the formation of a narrow channel along the fold (compared | | F = v u = vp·up + vq·uq = v Gf [2]. In an evolution simulation, one would with Fig. 5A). (B) The first four SVD eigenvectors of the gene Ck (Top), exchange amino acids between H and P, while demanding that the fitness the flow Uk (Middle), and the shear Sk (Bottom). (C and D) Perpendic- change δF is positive or nonnegative. By this, we mean δF > 0 is thanks to ular. (C) The formation of the channel is “dispersed” by the backbone. a beneficial mutation, whereas δF = 0 corresponds to a neutral one. Delete- (D) The first four SVD eigenvectors of Ck (Top), Uk (Middle), and shear Sk rious mutations δF < 0 are generally rejected. A version that accepts mildly (Bottom).

E4566 | www.pnas.org/cgi/doi/10.1073/pnas.1716215115 Dutta et al. Downloaded by guest on October 1, 2021 Downloaded by guest on October 1, 2021 2 ai ,Tut 21)Rc-eitdhmlg erha eryotmlsignal optimal nearly a as a search via homology emerge Reca-mediated (2010) catalysis T and Tlusty regulation Y, Savir Allosteric 12. (2008) SJ Benkovic NM, conformational of Goodey impact The 11. proofreading: Conformational (2007) T Tlusty Y, Savir 10. 3 rn J of A camnJ 21)Lrecnomtoa hne nproteins: in changes conformational Large (2010) JA McCammon AA, Gorfe BJ, Grant 13. 4 oo ,WmnJ hnexJ 16)O h aueo lotrctastos A transitions: allosteric Haem-haem haemoglobin: of in effects nature cooperative of the Stereochemistry (1970) On MF Perutz (1965) JP 15. Changeux J, Wyman J, Monod 14. 6 u ,KrlsM(08 lotr n oprtvt revisited. cooperativity and Allostery (2008) M Karplus Q, Cui 16. ir,tpclyaon oe rs,crytennadmcorrelation nonrandom the carry so, out- larger or few dozen A bulk a matrices. continuum information. random around a of typically in spectra reside liers, the eigenvalues the resembles most of that spectra, roots region typical square In matrix the same. covariance are the the eigenvalues of SVD corresponding eigenvalues matrices, The these 36). of ref. eigenvectors the culate ato h tantno Foeisnr) hs he ye fvectors matrices of three types of three These rows norm). the (Frobenius along tensor stored strain are the in the method of of the value part using The field displacement 21. the ref. of derivative symmetrized the field shear length of tor n (i vectors: among three correlation the typically at solutions, looked numerous we map, genotype-to-phenotype the and unwanted SVD. or and sites Dimension isolated at The localized pre- left. being This to protein. from right the channels. weaken of vibrations from not motion does the also, hinge it but enables vents if transmis- and only right mutation the responses to a two-way for accept left the look to the modified only is from not algorithm we pinch basic Thereby, the versa. of vice responses sion at and look right, left, face face protein on the on site to binding force the pinch pinch the symmetrically: applying network by pinch (> pathologies the such diverge of avoid to We location mode. response the the of understand to independent cause easy vibrate and some can are these which of modes, All floppy site. binding around the includes to connecting typically out (which (iii and region sites), target 8–10 the outside H terminate via that connected (i evolutionary relative weakly are The the become observed other. common of The we each modes. that unintended energies to nonfunctional pathologies such low respect in up the with end might components to search the owing of modes motion floppy exhibits nents Networks. Broken of is and factor tensor Pathologies [ 12] a low-rank by expan- of a simulations series of advantage our a accelerates is practical This requires by (n parentheses). The it [6] in that [12]. of term (pseudo-)inversion (the in series only scattering pseudoinverse the the that the get of may sion one and alent, ut tal. et Dutta .EsnesrE,e l 20)Itiscdnmc fa nyeudriscatalysis. underlies enzyme an of dynamics kinases. Intrinsic protein (2005) of al. plasticity et conformational EZ, The Eisenmesser (2002) reaction 9. J enzymatic an Kuriyan along M, motions Huse Intrinsic 8. (2007) al. et biomolecules. Bridging KA, of proteins: simulations Henzler-Wildman dynamics of 7. Molecular dynamics (2002) Global JA (2010) McCammon E M, Karplus Eyal LW, 6. Yang TR, Lezon of landscape I, energy Bahar dynamic The (2006) 5. PE Wright HJ, in Dyson D, processes McElheny DD, Boehr Mechanical catalysis. 4. (2004) to motion protein D Relating enzyme (2006) Izhaky SJ in Benkovic NR, S, dynamics Hammes-Schiffer Forde of 3. role YR, The Chemla (2003) C, JC Smith Bustamante JL, 2. Finney RV, Dunn RM, Daniel 1. a a = h w xrsin o h uainimpact mutation the for expressions two The eeto system. detection route. common recognition. molecular of specificity the on changes Nature 109:275–282. trajectory. Biol Struct Nat function. and structure between catalysis. reductase dihydrofolate Biochem . activity. inln n te functions. other and Signaling neato n h rbe fallostery. of problem the and interaction model. plausible 1307. /r 0 oos;( codons); 200 ) 3 ' 438:117–121. 10 nuRvBohsBoo Struct Biomol Biophys Rev Annu 75:519–541. 4 Nature s . ∗ nuRvBiochem Rev Annu n avco flength of vector (a h eeo h ucinlprotein, functional the of gene the ) :4–5,adertm(02 9:788. (2002) erratum and 9:646–652, a hmBiol Chem Nat d o Biol Mol J = o Cell Mol 450:838–844. hnesta tr n n ttetre einwith- region target the at end and start that channels ) ii 0 of 400 h o ed(displacement), field flow the ) oeaietegoer ftefins landscape fitness the of geometry the examine To 40:388–396. 12:88–118. s ∗ x urOi tutBiol Struct Opin Curr 4:474–482. edi h u fsurso h traceless the of squares of sum the is field N nuRvBiophys Rev Annu and 73:705–748. Science sol → ∼ ewr rknit ijitcompo- disjoint into broken network A y sltdndsa h onaythat boundary the at nodes isolated ) 10 n uain,(ii mutations, P 32:69–92. a Nature 313:1638–1642. eoiycmoet) n (iii and components); velocity F 6 = W m ahslto scaatrzdby characterized is solution Each . ihu rdcn functional a producing without ) 0) ecmueteseras shear the compute We 200). | W C 228:726–734. k hl h ievcosare eigenvectors the while , , 39:23–42. 20:142–147. LSOne PLoS W U δ k C [6 G, and , , sdwy”channels “sideways” ) W c u(c ∗ U and and , 2:e468. avco flength of vector (a ∗ S rti Sci Protein ) k = i V a in (as SVD via , ,aeequiv- are 12], G(c W ∗ S ecal- We . )f nuRev Annu 17:1295– avec- (a the ) Cell 1 igD alM 20)Alseyi oregandmdlo rti dynamics. protein of model functional coarse-grained enable a in to Allostery (2005) structures ME Wall protein D, Ming of Adaptability 31. (2015) I biology. Bahar structural T, in analysis Haliloglu from mode arising 30. normal Coarse-grained proteins (2005) of AJ Rader change I, Bahar Conformational 29. (2001) YH Sanejouand inhibitor, F, Trypsin dynamics: Tama normal-mode Protein 28. (1985) PS Stern C, Sander coarse- M, Levitt by 27. predicted modes soft of significance L functional 26. the On (2010) single- I Bahar a 25. from under- for proteins models network Elastic in (2005) I Bahar motions LW, Yang AJ, elastic Rader C, Chennubhotla amplitude 24. Large (1996) MM data structural Tirion protein of analysis 23. twist and strain Elastic (2018) low S and Leibler MR, structures Mitchell protein 22. of analysis Strain (2016) S Leibler T, Tlusty MR, Mitchell 21. allostery. of coupled nature ensemble form The (2014) rearrangements VJ Hilser Contact J, Li (2008) JO, Wrabl HN, JJ Motlagh Gray 18. TJ, Upadhyaya MD, Daily 17. 0 esenM ekA,CohaC(94 tutrlmcaim o oanmovements domain for mechanisms Structural (1994) C Chothia synthesis. AM, protein Lesk M, to Gerstein specificity enzyme 20. of theory a of Application (1958) D Koshland 19. rn B-00adteSmn etrfrSsesBooyo h Institute the of Biology Systems com- for Science Princeton. Basic Center Study, valuable for Advanced Simons for Institute the for by and Council supported Wyart IBS-R020 Research is T.T. Alex Grant European Matthieu and by thank Bridges, supported Olivier and Grant is also J.-P.E. Advanced Yan, and manuscript. We the Le Zocchi, on encouragement. ments Granick, Giovanni and Steve Moses, discussions Petroff, Elisha helpful for Mitchell, Stanislas thank Rivoire We R. discussions. helpful Michael for and Leibler, 1B) (Fig. glucokinase in shear ACKNOWLEDGMENTS. long-range the on and search the solutions. of among sig- convergence correlations no fast has presence the this The on but sequence effect search, model. evolutionary nificant and our the constrain field of might results backbone the the basic of of the patterns changing to without that features examination correlation certain our further from adds requires conclude backbone this we the but Overall, fold, data. the structural of of direction analysis the 6C with (Fig. “dispersion” oriented output roughly to be the leads at and signal backbone the the neces- of of configuration) deformation (perpendicular configuration) significant fold (parallel sitates the fold across the Transmission of 6A). orientation the (Fig. along easier signals be mechanical of to Transmission seems it. around correlations sequence the with eigenvectors h eiaieo h o n hrfr,hglgtmr itntythese distinctly more highlight therefore, and the flow in the differences. seen of be derivative can the as differences, eigenvectors noticeable 6C flow shows 1D (Fig. inspection oriented (Figs. the closer of less ilar, backbone progression much without evolutionary resulting is the model The configuration, channel the 6A). perpendicular the (Fig. in In fold than 5). in the narrower moving along is eddies progress constrains channel backbone to two the formation with configuration, channel case displacement parallel the the (i.e., backboneless In pattern the fashion. flow to hinge-like a a similar with is protein that the field) of mode low-energy a to a perpendicular or cases: band extreme that shear two 6). bonds the on (Fig. to strong it focused parallel We conserved either acids. of backbone amino path serpentine our all linear augmented through and a therefore, once We, acids “backbone”: passes much mutates. a amino protein are with the the bonds model among when peptide change peptide interactions The not covalent noncovalent do backbone. by the protein than linked polypep- the stronger acids, are form amino Proteins which far. of bonds, so heteropolymers described linear results tides, the affect might backbone Backbone. Protein The sfrtecrepnec ewe eeeigenvectors gene between correspondence the for As of emergence the with interfere not does backbone the of presence The neatosadeouinr implications. evolutionary and interactions Biol Struct Opin Curr calculations. mode normal lysozyme. and ribonuclease crambin, Biol Struct Opin proteins. membrane for models grained assemblies. supramolecular to Biol From machinery: biomolecular standing analysis. atomic parameter, kcsa. channel transmembrane the of allostery and E5855. couplings. allosteric mechanical of dimensionality proteins. allosteric in motions local from networks e Lett Rev proteins. in USA Sci Acad Natl Proc 508:331–339. pzBac R Chac JR, opez-Blanco ` 2:S173–S180. 95:198103. iceity(Mosc) Biochemistry S k h akoeafcstesaeo h hne nconcert in channel the of shape the affects backbone the , 37:46–53. U k 15:586–592. Fg 6 (Fig. nP(06 e eeaino lsi ewr models. network elastic of generation New (2016) P on ` usinmyaiea owa xetteprotein’s the extent what to as arise may question A 44:98–104. etakJcusRueotfrcluain of calculations for Rougemont Jacques thank We rti n e Sel Des Eng Protein hsRvLett Rev Phys B 33:6739–6749. and o Biol Mol J .Tesereigenvectors shear The D). .W rps htteserbn will band shear the that propose We ). e Physiol Gen J 77:1905–1908. .Wieteflwpten r sim- are patterns flow the While ). PNAS urOi tutBiol Struct Opin Curr 14:1–6. 181:423–447. rcNt cdSiUSA Sci Acad Natl Proc | hsBiol Phys Proteins o.115 vol. 135:563–573. 71:455–466. 5:036004. | 35:17–23. o 20 no. C S k k n shear and 113:E5847– represent | Nature E4567 and Phys Phys Curr

BIOPHYSICS AND COMPUTATIONAL BIOLOGY 32. Zheng WJ, Brooks BR, Thirumalai D (2006) Low-frequency normal modes that describe 61. de Juan D, Pazos F, Valencia A (2013) Emerging methods in protein co-evolution. Nat allosteric transitions in biological nanomachines are robust to sequence variations. Rev Genet 14:249–261. Proc Natl Acad Sci USA 103:7664–7669. 62. Cordell HJ (2002) Epistasis: What it means, what it doesn’t mean, and statistical 33. Arora K, Brooks CL (2007) Large-scale allosteric conformational transitions of adeny- methods to detect it in humans. Hum Mol Genet 11:2463–2468. late kinase appear to involve a population-shift mechanism. Proc Natl Acad Sci USA 63. Phillips PC (2008) Epistasis–The essential role of gene interactions in the structure and 104:18496–18501. evolution of genetic systems. Nat Rev Genet 9:855–867. 34. Miyashita O, Onuchic JN, Wolynes PG (2003) Nonlinear elasticity, proteinquakes, and 64. Mackay TFC (2014) Epistasis and quantitative traits: Using model organisms to study the energy landscapes of functional transitions in proteins. Proc Natl Acad Sci USA gene-gene interactions. Nat Rev Genet 15:22–33. 100:12570–12575. 65. Ortlund EA, Bridgham JT, Redinbo MR, Thornton JW (2007) Crystal structure of an 35. Tlusty T (2016) Self-referring DNA and protein: A remark on physical and geometrical : Evolution by conformational epistasis. Science 317:1544–1548. aspects. Philos Trans A Math Phys Eng Sci 374:20150070. 66. Breen MS, Kemena C, Vlasov PK, Notredame C, Kondrashov FA (2012) Epistasis as the 36. Tlusty T, Libchaber A, Eckmann JP (2017) Physical model of the genotype-to- primary factor in molecular evolution. Nature 490:535–538. phenotype map of proteins. Phys Rev X 7:021037. 67. Gong LI, Suchard MA, Bloom JD (2013) Stability-mediated epistasis constrains the 37. Qu H, Zocchi G (2013) How enzymes work: A look through the perspective of evolution of an influenza protein. eLife 2:e00631. molecular viscoelastic properties. Phys Rev X 3:011009. 68. Miton CM, Tokuriki N (2016) How mutational epistasis impairs predictability in 38. Joseph C, Tseng CY, Zocchi G, Tlusty T (2014) Asymmetric effect of mechanical stress protein evolution and design. Protein Sci 25:1260–1272. on the forward and reverse reaction catalyzed by an enzyme. PLoS One 9:e101442. 69. Fitch WM, Markowitz E (1970) An improved method for determining codon variability 39. Alexander S (1998) Amorphous solids: Their structure, lattice dynamics and elasticity. in a gene and its application to the rate of fixation of mutations in evolution. Biochem Phys Rep 296:65–236. Genet 4:579–593. 40. Alexander S, Orbach R (1982) Density of states on fractals: “Fractons.” J Phys Lett 70. Koonin EV, Wolf YI, Karev GP (2002) The structure of the protein universe and 43:625–631. evolution. Nature 420:218–223. 41. Phillips JC, Thorpe MF (1985) Constraint theory, vector percolation and glass- 71. Povolotskaya IS, Kondrashov FA (2010) Sequence space and the ongoing expansion formation. Solid State Commun 53:699–702. of the protein universe. Nature 465:922–926. 42. Thorpe MF, Lei M, Rader AJ, Jacobs DJ, Kuhn LA (2001) Protein flexibility and 72. Liberles DA, et al. (2012) The interface of protein structure, protein biophysics, and dynamics using constraint theory. J Mol Graphics Model 19:60–69. molecular evolution. Protein Sci 21:769–785. 43. Hemery M, Rivoire O (2015) Evolution of sparsity and in a model of protein 73. Dill KA (1985) Theory for the folding and stability of globular proteins. Biochemistry allostery. Phys Rev E 91:042704. (Mosc) 24:1501–1509. 44. Flechsig H (2017) Design of elastic networks with evolutionary optimized long-range 74. Lau KF, Dill KA (1989) A lattice statistical mechanics model of the conformational and communication as mechanical models of allosteric proteins. Biophys J 113:558–571. sequence spaces of proteins. Macromolecules 22:3986–3997. 45. Thirumalai D, Hyeon C (2017) Signaling networks and dynamics of allosteric transi- 75. Shakhnovich E, Farztdinov G, Gutin AM, Karplus M (1991) bottlenecks: tions in bacterial chaperonin GroEL: Implications for iterative annealing of misfolded A lattice Monte Carlo simulation. Phys Rev Lett 67:1665–1668. proteins. arXiv:171007981. 76. Green G (1828) An Essay on the Application of Mathematical Analysis to the Theories 46. Rocks JW, et al. (2017) Designing allostery-inspired response in mechanical networks. of Electricity and Magnetism (T. Wheelhouse, Nottingham, England). Proc Natl Acad Sci USA 114:2520–2525. 77. Abrikosov A, Gorkov L, Dzyaloshinski I (1963) Methods of Quantum Field Theory in 47. Yan L, Ravasio R, Brito C, Wyart M (2017) Architecture and coevolution of allosteric Statistical Physics (Prentice Hall, Englewood Cliffs, NJ). materials. Proc Natl Acad Sci USA 114:2526–2531. 78. Penrose R (1955) A generalized inverse for matrices. Math Proc Cambridge Philos Soc 48. Yan L, Ravasio R, Brito C, Wyart M (2017) Principles for optimal cooperativity in 51:406–413. allosteric materials. arXiv:170801820. 79. Ben-Israel A, Greville TN (2003) Generalized Inverses: Theory and Applications 49. Gobel¨ U, Sander C, Schneider R, Valencia A (1994) Correlated mutations and residue (Springer Science & Business Media, Springer-Verlag, New York), Vol 15. contacts in proteins. Proteins 18:309–317. 80. Tewary VK (1973) Green-function method for lattice statics. Adv Phys 22:757–810. 50. Marks DS, et al. (2011) Protein 3D structure computed from evolutionary sequence 81. Elliott RJ, Krumhansl JA, Leath PL (1974) The theory and properties of randomly variation. PLoS One 6:e28766. disordered crystals and related physical systems. Rev Mod Phys 46:465–543. 51. Marks DS, Hopf TA, Sander C (2012) Protein structure prediction from sequence 82. Dyson FJ (1949) The s matrix in quantum electrodynamics. Phys Rev 75:1736–1755. variation. Nat Biotechnol 30:1072–1080. 83. Desai MM, Weissman D, Feldman MW (2007) Evolution can favor antagonistic 52. Jones DT, Buchan DWA, Cozzetto D, Pontil M (2012) Psicov: Precise structural con- epistasis. 177:1001–1010. tact prediction using sparse inverse covariance estimation on large multiple sequence 84. Savir Y, Noor E, Milo R, Tlusty T (2010) Cross-species analysis traces adaptation of alignments. Bioinformatics 28:184–190. rubisco toward optimality in a low-dimensional landscape. Proc Natl Acad Sci USA 53. Lockless SW, Ranganathan R (1999) Evolutionarily conserved pathways of energetic 107:3475–3480. connectivity in protein families. Science 286:295–299. 85. Savir Y, Tlusty T (2013) The as an optimal decoder: A lesson in molecular 54. Suel GM, Lockless SW, Wall MA, Ranganathan R (2003) Evolutionarily conserved recognition. Cell 153:471–479. networks of residues mediate allosteric communication in proteins. Nat Struct Biol 86. Kaneko K, Furusawa C, Yomo T (2015) Universal relationship in gene-expression 10:59–69. changes for cells in steady-growth state. Phys Rev X 5:011014. 55. Reynolds K, McLaughlin R, Ranganathan R (2011) Hot spots for allosteric regulation 87. Friedlander T, Mayo AE, Tlusty T, Alon U (2015) Evolution of bow-tie architectures in on protein surfaces. Cell 147:1564–1575. biology. PLoS Comput Biol 11:e1004055. 56. Hopf TA, et al. (2017) Mutation effects predicted from sequence co-variation. Nat 88. Furusawa C, Kaneko K (2017) Formation of dominant mode by evolution in biological Biotechnol 35:128–135. systems. arXiv:170401751. 57. Poelwijk FJ, Socolich M, Ranganathan R (2017) Learning the pattern of epistasis 89. Tlusty T (2010) A colorful origin for the : Information theory, statistical linking genotype and phenotype in a protein. bioRxiv:10.1101/213835. mechanics and the emergence of molecular codes. Phys Life Rev 7:362–376. 58. Halabi N, Rivoire O, Leibler S, Ranganathan R (2009) Protein sectors: Evolutionary 90. Born M, Huang K (1954) Dynamical Theory of Crystal Lattices, The International Series units of three-dimensional structure. Cell 138:774–786. of Monographs on Physics (Clarendon, Oxford). 59. Rivoire O, Reynolds KA, Ranganathan R (2016) Evolution-based functional decompo- 91. Woodbury MA (1950) Inverting Modified Matrices (Princeton Univ, Princeton), sition of proteins. PLoS Comput Biol 12:e1004817. Statistical Research Group Memo Report 42, p 4. 60. Tes¸ileanu T, Colwell LJ, Leibler S (2015) Protein sectors: Statistical coupling analysis 92. Deng CY (2011) A generalization of the Sherman–Morrison–Woodbury formula. Appl versus conservation. PLoS Comput Biol 11:e1004091. Math Lett 24:1561–1564.

E4568 | www.pnas.org/cgi/doi/10.1073/pnas.1716215115 Dutta et al. Downloaded by guest on October 1, 2021