<<

Downloaded by guest on September 24, 2021 fitness experimental for of confounded landscapes. choice additionally nonrandom is the inference by that acid-specific and amino landscape and hotspots the ruggedness demon- epistatic of local we both part to data, known owing difficult a the is beyond of extrapolation multi- subsets that global Using strate the landscape. of fitness features allelic quantify we approaches, statistical and theoretical proposed recently extending and Com- traditional mutations. global bining epistatic the positively four that via find reached we is peak landscape, fitness the in in Hsp90 negative protein of tern heat-shock the in cerevisiae romyces sites 6 at acid-changing amino mutations 13 of combinations mutants possible all engineered represent large that 640 uniquely comprising a landscape present predictability fitness we the multiallelic under- Here, for evolution. true implications of the repeatability its of and and shape predictions landscape the theoretical fitness of test lying understanding to our us improve allowing and mutations, of fitness the of effects studies extensive sequencing and accurate next-generation geno- enable methods with (NGS) mapping combined at approaches aims exper- imental 2016) Novel which 2, attention. ever-increasing August receiving landscapes, is review fitness, for to fitness (received types 2016 of 11, October study approved and The NY, Ithaca, University, Cornell Clark, G. 01605 Andrew MA by Worcester, Edited School, Medical Massachusetts of University Pharmacology, Molecular & Biochemistry Switzerland; a Bank Claudia intragenic landscape large fitness a of (un)predictability the On w ie.Frty ifrn miia adcpshv been www.pnas.org/cgi/doi/10.1073/pnas.1612676113 have combination landscapes the on based empirical generally 4), ref. different in (reviewed Firstly, assessed sides. two (8). the strategies the treatment for and and pathogens particular vaccine effective in in of evolution development biology, drug-resistance of evolutionary quantify- study of toward clinical field far step impacts the first yield beyond would the that of this be advancement parts predictability—an would whether ing unknown which unclear to landscape, extrapolation however, the an is, for It allows categorization (10). extensively been have which studied landscape architecture, fea- experimental similar the of an landscapes relate capture theoretical that to that and statistics landscape the in of interest tures muta- growing of hundreds is to the there tens for tions, of With landscapes allow fitness experimentally. that full of approaches assessed assessment experimental be of to development high the fitness too respective their much to effects—is mutations of of mapping combinations a possible is, all organism—that (the- an is of a landscape of that and fitness dimensionality complete optimization the convergent Unfortunately, of (4). of degree achievable likelihood oretically) the and the evolution, drift, the repeatability parallel , genetic for the of potential on the importance of evolution, information shape of The carries predictability and (5–9). landscape biology increasing fitness of received the subfields has other concept across landscape attention fitness accurate and the large toward datasets, biology systems advance- and the molecular With of (2–4). ment evolution- alike inspired mathematicians has and individual ary an of success reproductive the S nttt ubnind Ci de Gulbenkian Instituto xsigrsac nti ail rwn edcmsfrom comes field growing rapidly this in research Existing fafins adcp eaiggntp o hntp)to idea ) the (or (1), 1932 relating in landscape Wright fitness Sewall a by of proposed first ince | adaptation c ws nttt fBonomtc,11 asne Switzerland; Lausanne, 1015 Bioinformatics, of Institute Swiss a,b,c,1 ne lvtdslnt.Dsieapeaetpat- prevalent a Despite salinity. elevated under eata Matuszewski Sebastian , | epistasis ni,28-5 ers Portugal; Oeiras, 2780-156 encia, ˆ | teslandscape fitness | b,c,1 mutagenesis ynT Hietpas T. Ryan , b colo ieSine,EoePltcnqeF Polytechnique Ecole Sciences, Life of School Saccha- d l il n opn,Idaaoi,I 62;and 46225; IN Indianapolis, Company, and Lilly Eli 1073/pnas.1612676113/-/DCSupplemental at online information supporting contains article This 3 Dryad the in 2 deposited been have paper 1 this in (dx.doi.org/10.5061/dryad.th0rj). reported Repository data Digital The deposition: Data Submission. Direct PNAS a is article paper.This interest. the of wrote conflict J.D.J. no and R.T.H., declare S.M., authors C.B., The and data; analyzed S.M. and C.B. search; cdcagn uain nteha-hc rti s9 in cerevisiae Hsp90 protein charomyces heat-shock the in the mutations of acid-changing rest the to as informative indeed landscape. is the whether topography and for landscape local mutations the the of of choice topography the nonrandom affects the is experiment question which remaining to crucial extent A the with (23). compatible Model are Geometric landscapes the approaches Fisher’s whether features statistical assess certain to Current by and landscapes 14) 22)]. existing (10, the and rank two to 21 used refs. been [i.e., have see epistasis 19 but (refs. 20 expected negative than and deleterious and more are concert (18)] in advan- mutations both more expected; are with than concert landscapes in 17) tageous mutations (16, two [i.e., rugged epistasis and positive (15) smooth both between ing effects the interaction quantify (i.e., and epistasis (10–14). landscape number mutations) of the a degree characterize developed expected that and statistics properties, the of model], respective (HoC), (RMF) concert). Fuji their House-of-Cards land- Mount Rough in studied the the different and beneficial (NK), as muta- proposed NK be Kauffman [such of has to architectures combination research observed scape a theoretical been (i.e., have Secondly, walk that adaptive dissection tions observed the on an or mutations of beneficial observed previously of uhrcnrbtos .. ... n ...dsge eerh ...promdre- performed R.T.H. research; designed J.D.J. and R.T.H., C.B., contributions: Author d,e owo orsodnesol eadesd mi:[email protected]. Email: addressed. be should correspondence whom State To Arizona Sciences, Life of School Medicine, & Evolution for Center address: work. Present this to equally contributed S.M. and C.B. nvriy ep,A 85281. AZ Tempe, University, uti uainseicifraini o considered. not is diffi- information inherently -specific and if predictability cult together landscape hence which and the mutations, extrapolation hotspot of make epistatic ruggedness and of existence peak local the fitness report topog- landscape global They fitness the raphy. the of predictability of for potential accessibility the approaches, the theoretical study proposed they recently combination a and Using traditional yeast. in of Hsp90 protein mutations heat-shock engineered the in systematically intragenic authors 640 multiallelic of complete the landscape and fitness Here, large evolution. uniquely a adaptive present deter- in and stochastic processes concerned of fundamentally ministic roles relative is the landscapes understanding fitness with of study The Significance ee epeeta nrgncfins adcp f60amino 640 of landscape fitness intragenic an present we Here, report- mixed, is studies these from emerges that picture The n efe .Jensen D. Jeffrey and , nacalnigevrnetipsdby imposed environment challenging a in ed ´ rl eLuan,11 Lausanne, 1015 Lausanne, de erale ´ b,c,2,3 . www.pnas.org/lookup/suppl/doi:10. e eatetof Department NSEryEdition Early PNAS | f6 of 1 Sac-

EVOLUTION high salinity. With all possible combinations of 13 mutations of various fitness effects at 6 positions, the presented landscape is not only uniquely large but also distinguishes itself from pre- viously published work regarding several other experimental features—namely, by its systematic and controlled experimen- tal setup using engineered mutations of various selective effects and by considering multiple alleles simultaneously. We begin by describing the landscape and identifying the global peak, which is reached through a highly positively epistatic combination of four mutations. Based on a variety of implemented statistical measures and models, we describe the accessibility of the peak, the pattern of epistasis, and the topography of the landscape. To accommodate our data, we extend several previously used models and statistics to the multiallelic case. Using subsets of the landscape, we discuss the predictive potential of such mod- eling and the problem of selecting nonrandom mutations when attempting to quantify local landscapes to extrapolate global features. Results and Discussion We used the EMPIRIC approach (24, 25) to assess the growth rate of 640 mutants in yeast Hsp90 (Materials and Methods). Based on previous screenings of fitness effects in different envi- ronments (26) and on different genetic backgrounds (20), and on expectations of their biophysical role, 13 amino acid-changing point mutations at 6 sites were chosen for the fitness landscape presented here (Fig. 1). The fitness landscape was created by assessing the growth rate associated with each individual muta- tion on the parental background and all possible mutational com- Fig. 2. (A) Empirical fitness landscape by mutational distance to parental binations. A previously described Monte Carlo Markov chain type. Each line represents a single substitution. Vertical lines appear when (MCMC) approach was used to assess fitness and credibility multiple alleles have been screened at the same position. There is a global intervals (ref. 27 and Materials and Methods). pattern of negative epistasis. The three focal landscapes are highlighted. (B) Close-up on the beneficial portion of the landscape. (C) Expected (“Exp.”) versus observed (“Obs.”) fitness for the focal landscapes, obtained from 1,000 posterior samples. We observe strong positive epistasis in the land- scape that contains the global optimum, whereas the other two are domi- nated by negative epistasis. In A–C, the y axis depicts growth rate as a proxy for fitness.

The Landscape and Its Global Peak. Fig. 2 presents the resulting fitness landscape, with each mutant represented based on its Hamming distance from the parental genotype and its median estimated growth rate. Lines connect single-step sub- stitutions, with vertical lines occurring when there are multiple mutations at the same position (Fig. 1). With increasing Ham- ming distance from the parental type, many mutational combina- tions become strongly deleterious. Thus, we observe strong neg- ative epistasis between the substitutions that, as single steps on the parental background, have small effects. This pattern is con- sistent with Fisher’s Geometric Model (28) when combinations of individually beneficial or small-effect mutations overshoot the optimum and with classic arguments predicting negative epistasis based on mutational load (29). It is also intuitively comprehen- sible on the protein level, where the accumulation of too many mutations is likely to destabilize the protein and render it dys- functional (30). Fig. 1. Individual amino acid substitutions and their effect on the parental The global peak of the fitness landscape is located four muta- background in elevated salinity, obtained from 1,000 samples from the pos- tional steps away from the parental type (Fig. 2B), with 98% terior distribution of growth rates. Boxes represent the interquartile range of posterior samples identifying the peak. The fitness advan- [i.e., the 50% confidence interval (CI)], whiskers extend to the highest/lowest tage of the global peak reaches nearly 10% over the parental data point within the box ±1.5 times the interquartile range, and circles type and is consistent between replicates (Materials and Meth- represent outliers; gray and white background shading alternates by amino ods and SI Appendix, Fig. S3 1). Perhaps surprising given the acid position. The box below indicates with colored dots which mutations degree of conservation of the studied genomic region (figure are involved in the focal landscapes discussed throughout the main text: the four mutations leading from the parental type to the global optimum S5 in ref. 24), it is important to note that these fitness effects (“opt”), the four individually most beneficial mutations on the parental are measured under highly artificial experimental conditions background (“best”), and the four mutations with the individually low- including high salinity, which are unlikely to represent a nat- est growth rates on the parental background (“worst”). (Inset) Parental ural environment of yeast. The effects of the individual muta- sequence at positions 582–590 and assessed amino acids by position. tions comprising the peak in a previous experiment without

2 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1612676113 Bank et al. Downloaded by guest on September 24, 2021 Downloaded by guest on September 24, 2021 ake al. et Bank S3 Fig. Appendix, simi- (SI a epistasis shows positive optima growth local of of beneficial signature five lar terms the of in each type fact, In parental individu- rate). the are back- from that indistinguishable mutations parental beneficial (i.e., ally one mutations the of “neutral” combination three synergistic on and highly a mutations via but single-step ground beneficial also most see 20; with ref. associated from (data adaptation 26). environment ref. of salinity cost increased the potential respectively, the M589A, and emphasizing N588L, A587P, W585L, mutations for h oa n lbllnsaeptenmyb ut ifrn,an different, quite be may pattern that landscape indicates to global observation likely and This local is peak. the landscape fitness suboptimal studied a the at 26% on probability stall the higher adaptation only much from Hence, a with 47%. with away reached reached of substitutions is is 3C) two (Fig. it type optimum parental optimum, adap- local an global A in included the probability. are to vertices all walk of 78% tive type and edges parental all the of 73% from restricting and initiating when 3 walks changes (Fig. adaptive picture to the of The analysis accessibility 3). the high (Fig. indicating optimum - points global from starting probability of high with majority reached a is starting and of landscape, 95% the almost in from points probability nonzero with reached be land- the in and mutant 3 given (Fig. reach any scape to from probability starting the optimum particular and a Information Methods), Supporting and Materials Appendix, Extended (SI steps 1: optimum of fitness number the a of reach variance to and mean the for solutions lytical mutations Methods neighboring and the (Materials by the attainable to increases correspond absorbing fitness probabilities the relative transition the to which correspond in absorbing and optima an states, mutation local as weak where dynamics resulting selection chain, the Markov strong express can the until we landscape, In (34), the limit reached. in is mutant optimum any an of from length the starting the evaluated walks we of adaptive addition, accessibility In the optima. local studied observed pro- and six walks recently landscape, adaptive empirical simulated the framework Given we (33). a Plotkin and Draghi within by posed Landscape. landscape Fitness fitness the empirical on Walks Adaptive interact effects individual 13). their (4, for antagonistically synergistically, interact selected to beneficial mutations tend and combined whereas background their wild-type (13) a for mutations on selected effect small-effect were that of more mutations landscape be that to the tends than mutations large-effect rugged of landscape that such fitness TEM-1 mutations the small-effect than enzyme strongly more resistance interact tions antibiotic (32). in the found al. β recently underlying et pattern Visser gene the de the support by results predicted our between been Furthermore, epistasis has positive mutations epista- and negative neutral mutations in fact, beneficial particularly In between 31), evolution. sis (18, compensatory occasionally of epista- context observed the positive been also frequently, has more sis reported been epistasis. during has background mutations negative adaptation com- beneficial parental strong between actual exhibits epistasis the negative thus the on Although and Notably, mutations deleterious 6%. highly four is of posi- these per benefit of mutation a bination one predicts most only at in tion) (considering background (best) advantage. parental dataset the fitness most- the on individually 4% mutations four a the single-step only of beneficial, combination predicts a (opt) even peak Furthermore, involved mutations global four the the in of combination a that demonstrates were NaCl added lcaaein -lactamase uiul,tegoa eki o ece ycmiigthe combining by reached not is peak global the Curiously, sn hsfaeok efidta h lblotmmcan optimum global the that find we framework, this Using is S3 Figs. Appendix , SI hwn htlreefc muta- large-effect that showing coli, Escherichia is S3 Figs. Appendix, SI −0.04135, .Ti prahalwdu odrv ana- derive to us allowed approach This ). −0.01876, n S3 and 3 n S3 and 3 .31,and −0.03816, et esuidthe studied we Next, 4 .Hr,although Here, ). 4 ). 2 −0.02115 .Fg 2C Fig. ). esrmn netit n dpiewalks). adaptive and in uncertainty to these measurement refer on we focus text; therefore main will the We illustrative. proved 14, particularly ref. be in to proposed recently effects, fitness in landscape-averaged correlations measuring statistics our gamma and topography the this conclusions, supports statistics landscape com- of HoC set 4 random whole (Fig. a component rugged- of additive an mixture intermediate and a ponent with by characterized landscape is RMF which ness, a of that resembles all for landscapes focal (iii) three geno- the and discussed. examples parental special the sublandscapes, as approach), containing highlighting diallelic type, landscapes “drop-one” 4-step 360 the diallelic, possible 1,570 as com- all to was (ii) referred acid amino subsequently of approach 1 cross-validation set (a (42), which landscape whole the in mul- from the landscapes removed pletely of all computed consis- (i) case we assess for potential, the statistics To predictive an to Methods). and provide statistic and tency we (Materials used necessary, landscapes the Whenever tiallelic of Text]. Intro- extension Main Models brief analytical Landscape the for Fitness in Different (14); duced of see landscapes Overview 2: terms, egg-box mation these (41), with (36–38), of HoC them [NK definitions 40), compared models (39, and landscape RMF 14) theoretical 11, from gamma (10, expectations proposed Methods ) recently and the Materials and epistasis, statistics; of fraction ratio, acid-specific. amino or shape site- the are in landscape changes the whether of study opportunity. may contained we this are landscape, site the with same within the us at alleles provides statistics. multiple because here landscape Moreover, consis- of studied the potential landscape for predictive tests The the preventing into hence 12), divided and and be 10 tency refs. to see small (but too cre- subsets are were of landscapes majority landscapes complete the Furthermore, published studied criteria. conclusions different the to drawing according because ated However, difficult (10). proven similar landscape capturing has the hence and of been correlated have features land- them ruggedness fitness of and the most epistasis proposed, of Landscape. of topography Fitness measures global Various the scape. the of considered Topography we the Next, and Measures Epistasis experi- of contribution the of see discussion interaction error, additive a mental purely for no alleles; is between there to attributed (i.e., are epistasis 62% magnitude remaining sign the reciprocal whereas and (35), (30%) epistasis sign alle- (8%) pervasive of show pairs loci that different find at we les peaks, fitness local multiple of existence below detail more in discussed (see and confirmed is that observation accessible. but poorly probability, is high it very which a from The with points starting sequence. reached are general, parental there in the is, to from (B) corresponds optimum starting geno- line global red when given The a probability dataset. from respective the starting in the optimum all specific for a computed reach type to probability the is, h e ieidctsma ahlnt o dpiewlsfo the from walks adaptive for length chain). path Markov the mean (B of indicates times genotype. absorbing parental line (i.e., red landscape full The the in type any 3. Fig. ABC efidta h eea oorpyo h teslandscape fitness the of topography general the that find We (roughness-to-slope statistics landscape various computed We .I iewt the with line In Statistics). Landscape of Potential Predictive itiuino vrg egh faatv ak trigfrom starting walks adaptive of lengths average of Distribution (A) IApni,Spotn nomto :Extended 1: Information Supporting Appendix, SI i.S3 Fig. Appendix, SI and C itiuino bobn rbblte,that probabilities, absorbing of Distribution ) IAppendix SI IApni,Spotn Infor- Supporting Appendix, SI o diinlrsls(e.g., results additional for 5 ). NSEryEdition Early PNAS A and .Weesthe Whereas B). | f6 of 3

EVOLUTION Fig. 4. (A) Expected pattern of landscape-wide epis-

tasis measure γd (SI Appendix, Eq. S1 15) with muta- tional distance for theoretical fitness landscapes with

6 loci from ref. 14. (B) Observed decay of γd with mutational distance under the drop-one approach is quite homogenous, except when hot spot mutation 588P is removed; 95% CIs are contained within the lines for the global landscape. (C) Observed decay of

γd for all diallelic six-locus sublandscapes. Depend-

ing on the underlying mutations, γd is vastly differ- ent, suggesting qualitative differences in the topog- raphy of the underlying fitness landscape and in the extent of additivity in the landscape. Three focal

landscapes representative of different types and γd for the full landscape have been highlighted. (D)

Observed decay of γd for all diallelic four-locus sub- landscapes containing the parental type, indicative of locally different landscape topographies. High- lighted are the three focal landscapes. (Insets) His- tograms of the roughness-to-slope ratio r/s for the respective subset, with horizontal lines indicating

values for highlighted landscapes [r/s for global landscape (blue) is significantly different from HoC expectation, p = 0.01]. Similar to γd, there is a huge variation in r/s for subparts of the fitness landscape, especially when considering the four-locus subsets.

Predictive Potential of Landscape Statistics. When computed based reflected in the corresponding roughness-to-slope ratio (inset of on the whole landscape and on a drop-one approach, the land- Fig. 4C and D), further emphasizing heterogeneity of the fitness scape appears quite homogeneous, and the gamma statistics landscape with local epistatic hotspots. show relatively little epistasis (Figs. 4B and 5B). At first sight, Finally, the 1,570 di-allelic four-locus landscapes containing this result contradicts our earlier statement of strong negative the parental genotype, although highly correlated genetically, and positive epistasis but can be understood, given the differ- reflect a variety of possible landscape topographies (Fig. 4D), ent definitions of the epistasis measures used: above, we have ranging from almost additive to egg-box shapes, accompanied by measured epistasis based on the deviation from the multiplica- an extensive range of roughness-to-slope ratios. The three focal tive combination of the single-step fitness effects of mutations on landscapes discussed above are not strongly different compared the parental background. Because these effects were small, epis- with the overall variation and yet show diverse patterns of epis- tasis was strong in comparison. Conversely, the gamma measure tasis between substitutions (Fig. 5). is independent of a reference genotype and captures the fitness decay with a growing number of substitutions as a dominant and quite additive component of the landscape. Only mutation 588P has a pronounced effect on the global ABCD landscape statistics and seems to act as an epistatic hotspot by making a majority of subsequent mutations (of individually small effect) on its background strongly deleterious (clearly vis- ible in Figs. 4B and 5C). This finding can be explained by look- ing at the biophysical properties of this mutation. In wild-type Hsp90, amino acid 588N is oriented away from solvent and forms hydrogen bond interactions with neighboring amino acids (24). Proline lacks an amide proton, which inhibits hydrogen bond interactions. As a result, substituting 588N with a proline could disrupt hydrogen bond interactions with residues that may be involved in main chain hydrogen bonding and destabilize the protein. In addition, the pyrrolidine ring of proline is extremely rigid and can constrain the main chain, which may restrict the conformation of the residue preceding it in the protein Fig. 5. (A) Epistasis measure γ (SI Appendix, Eq. S1 8; (A , B ) on (Ai,Bi)→(Aj,Bj) i i sequence (43). the x axis) between any two substitutions, averaged across the entire land- The variation between inferred landscape topographies in- scape. The majority of interactions are small to moderate (blue). Parts of the creases dramatically for the 360 diallelic, 6-locus sublandscapes fitness landscape show highly localized and mutation-specific epistasis, rang- (Fig. 4C). Whereas all sublandscapes are largely compatible with ing from strong magnitude epistasis (white) to sign and reciprocal sign epis- tasis (yellow). (B) The average epistatic effect γ (SI Appendix, Eq. S1 10) an RMF landscape, the decay of landscape-wide epistasis with Ai→ mutational distance (as measured by γd ) shows a large variance, of a mutation occurring on any background is always small. (C) The aver- age epistatic effect γ (SI Appendix, Eq. S1 13) of a background on any suggesting large differences in the degree of additivity. Interest- →Aj ingly, various sublandscapes, typically carrying mutation *588P, new mutation is usually small, except for mutations to *588P, which shows a show a relaxation of epistatic constraint with increasing muta- strong magnitude effect. (D) Locus-specific gamma for the four mutations leading to the global optimum (Top), the four largest single-effect mutations tional distance (i.e., increasing γd ) that is not captured by any (Middle), and the single-effect mutations with the lowest fitness (Bottom). of the proposed theoretical fitness landscape models, sugges- The opt landscape exhibits strong sign epistasis between loci 587 and 588 and tive of systematic compensatory interactions (but see the egg-box between loci 588 and 589. Also, the best landscape exhibits pervasive epista- model for an explicit example featuring nonmonotonicity in γd ). sis with sign epistasis between locus 588 and loci 585 and 586, respectively. The variation in the shape of the fitness (sub)landscapes is also We observe almost no epistasis in the worst landscape.

4 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1612676113 Bank et al. Downloaded by guest on September 24, 2021 Downloaded by guest on September 24, 2021 lndit h 47P lsi htcntttvl xrse Hsp90. expresses constitutively that generated individ- plasmid were and p417GPD isolated (24) the Hsp90 described previously into yeast previously cloned as 13 of strategies, region of cassette-ligation 582-to-590 optimized mutants) from the sextuplet within mutants to ual up (single tions Generation. Data detailed more in A presented used. is methods work theoretical and the materials of the treatment outline briefly we Here, Methods and Materials site-specific of indi- integration of properties. the land- properties biophysical with the small combined about multiple mutations, information of vidual gather utilization to the used be scapes increasing will for forward power way land- predictive promising one fitness that suggests constructing this also danger However, work of combinations. the mutational a practice ascertained highlight from common is scapes thus the From results there to genotype. these inherent parental point, probabil- standpoint, the appreciable practical starting from with a starting any reached when be particular almost will ity, that from optimum reach local peak within global the is Although extremes. these between intermediate epistatic to owing valleys fitness of creation are interactions. the that of paths result adaptive direct obligatory a following predictable geno- population locally starting become the the may with on evolution time, strongly same depends the probability them At the type. of and one unpre- any optima globally reach multiple is to are evolution there of epistasis because presence sign) dictable, the reciprocal in taken and Conversely, path (sign unpredictable. the entirely but will locally optimum, population is single-fitness the the because reach predictable eventually evo- In highly landscape), evolution. globally additive purely predicting is a lution in for (i.e., epistasis epistasis of of absence the duality pattern the by global imposed overall the of hotspots.” predictor “epistatic of poor global because sub- a the smaller Although are landscape, data. homogeneous landscapes rather the a of indicates subsets pattern from could the features extrapolated global with yeast. whether be along test in to landscape, us protein fitness allows nature, com- the Hsp90 multiallelic landscape of the size a of unprecedented of mutants The analysis engineered and 640 creation prising the via question this be to quantita- be yet are optimal will goal the this achieve and extrapolate determined. to area, approaches to fit- qualitative this and of ability in tive nature progress the to high-dimensional however, paramount the landscapes, between to and ness Due within epistasis regions. of prevalence genomic the in and predictability tool evolution the powerful of regarding evo- questions a complex become adaptive address to to describe biology promise to landscapes metaphor fitness a lution, as introduced Originally Conclusion proximity hydrogen spatial and (46). on relationship, similarity bonding based evolutionary residues, structures equivalent protein of effects two mutation-specific provide aligning of may integration via DeepAlign allow as to (45)] such opportunity models matrix the Newer BLOSUM sufficient. 44), [e.g., mutation-specific. not ref. model be is to site-specific (e.g., need a forward models Considering into step muta- such that important hotspot properties demonstrate an epistatic biophysical we is mutation, local of models of single integration landscape existence a the the only Although to across tions. due even fail landscape, may the of olation ake al. et Bank h miia teslnsaesuidhr per obe to appears here studied landscape fitness empirical The difficulty inherent the highlight results our combination, In addressing toward step important an taken have we Here, Extrap- indeed. difficult is landscapes fitness predicting Thus, oo usiuinlbaiscnitn f60combina- 640 of consisting libraries substitution Codon IAppendix SI . ai) npriua,w acltdtetp feittcitrcinbetween interaction epistatic epis- of sign type show back- the mutations mutations both calculated (i.e., both in we if epistasis particular, effect (i.e., In opposite sign epistasis tasis). an sign mutations], has reciprocal considered of or mutations grounds) number two the the (57), of nonadditive with one are increases effects fitness fitness [i.e., whether but so, epistasis if magnitude and, epistatically display interact loci these two between alleles of pairs cific Epistasis. of Fraction landscape. fitness alleles of γ pair specific a between effects Eq. pair Eq. allele specific a on alleles of pairs s g an by Defining tima). of characterized total is a process of consisting with subsequent This chain of optimum walks). Markov consisting an adaptive absorbing only until (so-called process continue reached Markov that is a substitutions as one-step modeled fitness-increasing, be can tion Walks. distance Adaptive files. trace Hellinger resulting the the of using inspection visual assessed with 7,419 combined of was (50) size amino approach samples Convergence each effective for 725). average estimates an (minimum posterior to corresponding 10,000 mutation of acid acid consisted result- amino The output rates. same growth MCMC approach the equal ing with MCMC for replicates Bayesian coding as a interpreted sequences were using Nucleotide sequence 20 27. ref. ref. in in described proposed approach the to ing Rates. Growth of Estimation sequencing (26). of described Analysis previously 49). as (48, performed scoring were PHRED data on based position read Elim each by performed produced was and Sequencing Biopharmaceuticals (25). described previously as performed at stored and points time specific gen- at 12 taken for were NaCl M Samples 0.5 erations. containing medium The selective to dextrose. 30 shifted then (wt/vol) at and h Hsp90 2% 8 with for replaced grown are was culture Gal and Gal to raffinose similar but medium selective medium ampli- to After transferred Gal). was 1% culture and library raffinose leucine, the 1% g/L with fication, uracil 0.1 g/L methionine, 0.01 and g/L lysine, g/L 0.02 0.1 g/L 0.03 hemisulfate, 0.03 sulfate, phenylalanine, adenine g/L g/L 0.05 ammonium 0.04 isoleucine, tyrosine, g/L g/L g/L 0.03 0.4 threonine, acid, 5 glutamic g/L g/L 0.2 0.1 acids, valine, serine, g/L 0.03 arginine, amino g/L 30 0.02 acid, without aspartic at g/L base transfor- h 100 nitrogen 12 Following with :pgals- yeast medium for (47). (Gal) : amplified galactose method using ho: was conditions acetate :leu2 library hsc82: lithium the :leu2 mation, the hsp82: using ura3-1 hsp82-his3) trp1-1 leu2-3,12 his3-11,15 the into introduced h arxo psai fet ewe ifrn ar falleles of calculated pairs geno- different we between all (14), effects theory over epistatic recent of averaged Extending matrix background, landscape. the fitness genetic the altered different is of mutation a types focal onto a of put effect selective when mutations the of how effects quantifies fitness which of (14), correlation the Mutations. calculating by of assessed were Effects Fitness of Correlation dataset. full the from obtained those all to comparing and statistics recalculating and unobserved) as delet- mutation in by corresponding rows assessed and were columns mutations corresponding specific the of ing influence the proba- and the results calculate the of to and optimum the optimum reach in any to variance bility reaching the before and steps expectation the of determine number to allows then matrix tal neighbors one-mutant matrix adaptive, transition all genotype over current sum the the of by normalized 53) (52, ity mutant any from j A ( (g) arigamtn leea locus at allele mutant a carrying A j es yi,DAioain n rprto o luiasqecn were sequencing Illumina for preparation and isolation, DNA lysis, Yeast were combinations mutation Hsp90 of libraries expressed Constitutively ,B i S1 S1 ,B = j i  ) w ihHmigdistance Hamming with 15) ,adtedcyo orlto ffins effects fitness of correlation of decay the and 12), → termed (g Eq. Appendix, (SI g [ j [i ] k ) ] − and bobn ie,otm)and optima) (i.e., absorbing γ w w ntesrn eeto ekmtto ii 5) adapta- (51), limit mutation weak selection strong the In (A P stefins fgenotype of fitness the as (g) g i ,sc httetasto probabilities transition the that such (g), , g 5)i t aoia omadcmuigtefundamen- the computing and form canonical its in (54) B [ oaymutant any to i olwn es 5ad5,w unie hte spe- whether quantified we 56, and 35 refs. Following .cerevisiae S. )→(A j ] (with g If g. j ,B hnsatn rmgenotype from starting when S1 j niiulgot ae eeetmtdaccord- estimated were rates growth Individual ) i Eq. , Appendix (SI g 6 = ,tevco feittcefcsbtenall between effects epistatic of vector the 9), ◦ sa(oa)optimum, (local) a is oalwsuofo h idtp oyof copy wild-type the of shutoff allow to C j ihrsett ie eeec geno- reference given a to respect with ) htf tanDY8 cn-0 ade2-1 (can1-100 DBY288 strain shutoff 0mlinraso 9 ofiec at confidence 99% of reads million ≈30 i d h eeto ofceti eoe by denoted is coefficient selection the , g [i vrgdoe l genotypes all over averaged ] A r ie ytefiainprobabil- fixation the by given are j (A , B n n j i P  , − teghadtp fepistasis of type and Strength ifrn tts(.. mutants), (i.e., states different ie,b setal raigthe treating essentially by (i.e., B termed k i ) S1 rnin tts(.. nonop- (i.e., states transient and g, nalohrpiso alleles of pairs other all on NSEryEdition Early PNAS ,tevco fepistatic of vector the 8), /Lapcli 17g/L (1.7 ampicillin µg/mL ◦ γ p ne nonselective under C →(A g, g g [i ] 0 j = γ ,B stegenotype the as 5) (55). p d j g, ) .Ptigthe Putting 1. IAppendix, (SI IAppendix, (SI [i (A ] −80 o going for i g ,B | fthe of i ) f6 of 5 ◦ C. and γ

EVOLUTION type g over the entire fitness landscape. There was no epistatic interaction if variance and, thus, a very large roughness-to-slope ratio (as in the HoC −6 |si(g[ j]) − si(g)| < ε = 10 , magnitude epistasis if sj(g)sj(g[i]) ≥ 0 and model). In addition, we calculated a test statistic Zρ by randomly shuffling si(g)si(g[ j]) ≥ 0, reciprocal sign epistasis if sj(g)sj(g[i]) < 0 and si(g)si(g[ j]) < fitness values in the sequence space to evaluate the statistical significance 0, and sign epistasis in all other cases (58). of the obtained roughness-to-slope ratio from the dataset ρdata given by ρdata−E[ρRSL] Zρ = √ . Var[ρ ] Roughness-to-Slope Ratio. Following ref. 11, we calculated the roughness- RSL to-slope ratio ρ by fitting the fitness landscape to a multidimensional lin- ear model using the least-squares method. The slope of the linear model ACKNOWLEDGMENTS. We thank Dan Bolon, Brian Charlesworth, Pamela Cote, Inesˆ Fragata, and Hermina Ghenu for helpful comments and dis- corresponds to the average additive fitness effect (10, 23), whereas the cussion. This project was funded by grants from the Swiss National roughness is given by the variance of the residuals. Generally, the bet- Science Foundation and a European Research Council Starting Grant ter the linear model fit, the smaller the variance in residuals such that (to J.D.J.). Computations were performed at the Vital-IT Center the roughness-to-slope ratio approaches 0 in a perfectly additive model. (www.vital-it.ch) for high-performance computing of the Swiss Institute of Conversely, a very rugged fitness landscape would have a large residual Bioinformatics.

1. Wright S (1932) The roles of mutation, inbreeding, crossbreeding, and selection in Uncovering the potential for adaptive walks in challenging environments. Genetics evolution. Proceedings of the Sixth International Congress of Genetics, ed Jones DF 196(3):841–852. (Brooklyn Botanic Garden, Menasha, WI), pp 356–366. 28. Weinreich DM, Knies JL (2013) Fisher’s geometric model of adaptation meets the func- 2. Coyne JA, Barton NH, Turelli M (1997) Perspective: A critique of ’s shift- tional synthesis: Data on pairwise epistasis for fitness yields insights into the shape ing balance theory of evolution. Evolution 51(3):643–671. and size of phenotype space. Evolution 67(10):2957–2972. 3. Gavrilets S (2004) Fitness Landscapes and the Origin of . (Princeton Univ Press, 29. Kimura M, Maruyama T (1966) Mutational load with epistatic gene interactions in Princeton). fitness. Genetics 54(6):1337–1351. 4. de Visser JAGM, Krug J (2014) Empirical fitness landscapes and the predictability of 30. Soskine M, Tawfik DS (2010) Mutational effects and the evolution of new protein evolution. Nat Rev Genet 15(7):480–490. functions. Nat Rev Genet 11(8):572–582. 5. Costanzo M, et al. (2010) The genetic landscape of a cell. Science 327(5964):425–431. 31. Trindade S, et al. (2009) Positive epistasis drives the acquisition of multidrug resis- 6. Jimenez´ JI, Xulvi-Brunet R, Campbell GW, Turk-Macleod R, Chen IA (2013) Compre- tance. PLoS Genet 5(7):e1000578. hensive experimental fitness landscape and evolutionary network for small RNA. Proc 32. de Visser JA, Hoekstra RF, van den Ende H (1997) An experimental test for synergistic Natl Acad Sci USA 110(37):14984–14989. epistasis and its application in Chlamydomonas. Genetics 145(3):815–819. 7. Huang S (2013) Genetic and non-genetic instability in tumor progression: Link 33. Draghi JA, Plotkin JB (2013) Selection biases the prevalence and type of epistasis between the fitness landscape and the epigenetic landscape of cancer cells. Cancer along adaptive trajectories. Evolution 67(11):3120–3131. Metastasis Rev 32(3-4):423–448. 34. Gillespie JH (1984) Molecular evolution over the mutational landscape. Evolution 8. Mann JK, et al. (2014) The fitness landscape of HIV-1 gag: Advanced modeling 38(5):1116–1129. approaches and validation of model predictions by in vitro testing. PLoS Comput Biol 35. Weinreich DM, Watson RA, Chao L (2005) Perspective: Sign epistasis and genetic con- 10(8):e1003776. straint on evolutionary trajectories. Evolution 59(6):1165–1174. 9. Liu G, Rancati G (2016) Adaptive evolution: Don’t fix what’s broken. Curr Biol 36. Kauffman S, Levin S (1987) Towards a general theory of adaptive walks on rugged 26(4):R169–R171. landscapes. J Theor Biol 128(1):11–45. 10. Szendro IG, Schenk MF, Franke J, Krug J (2013) Quantitative analyses of empirical 37. Kauffman SA (1993) The Origins of Order: Self Organization and Selection in Evolu- fitness landscapes. J Stat Mech 2013:P01005. tion. (Oxford Univ Press, New York). 11. Aita T, Iwakura M, Husimi Y (2001) A cross-section of the fitness landscape of dihy- 38. Schmiegelt B, Krug J (2014) Evolutionary accessibility of modular fitness landscapes. drofolate reductase. Protein Eng 14(9):633–638. J Stat Phys 154(1):334–355. 12. Franke J, Klozer¨ A, de Visser JAGM, Krug J (2011) Evolutionary accessibility of muta- 39. Neidhart J, Szendro IG, Krug J (2014) Adaptation in tunably rugged fitness landscapes: tional pathways. PLoS Comput. Biol. 7(8):e1002134. The Rough Mount Fuji model. Genetics 198(2):699–721. 13. Schenk MF, Szendro IG, Salverda MLM, Krug J, de Visser JAGM (2013) Patterns of 40. Aita T, et al. (2000) Analysis of a local fitness landscape with a model of the rough Epistasis between beneficial mutations in an antibiotic resistance gene. Mol Biol Evol Mt. Fuji-type landscape: Application to prolyl endopeptidase and thermolysin. 30(8):1779–1787. Biopolymers 54(1):64–79. 14. Ferretti L, et al. (2016) Measuring epistasis in fitness landscapes: The correlation of 41. Kingman JFC (1978) A simple model for the balance between selection and mutation. fitness effects of mutations. J Theor Biol 396:132–143. J Appl Probab 15(1):1–12. 15. Kryazhimskiy S, Rice DP, Jerison ER, Desai MM (2014) Microbial evolution. Global 42. Geisser S (1993) Predictive Inference. (CRC Press, New York), Vol 55. epistasis makes adaptation predictable despite sequence-level stochasticity. Science 43. Bajaj K, et al. (2007) Stereochemical criteria for prediction of the effects of proline 344(6191):1519–1522. mutations on protein stability. PLoS Comput Biol 3(12):e241. 16. Khan AI, Dinh DM, Schneider D, Lenski RE, Cooper TF (2011) Negative epis- 44. Wylie CS, Shakhnovich EI (2011) A biophysical protein folding model accounts tasis between beneficial mutations in an evolving bacterial population. Science for most mutational fitness effects in viruses. Proc Natl Acad Sci USA 108(24): 332(6034):1193–1196. 9916–9921. 17. Schoustra SE, Bataillon T, Gifford DR, Kassen R (2009) The properties of adaptive walks 45. Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. in evolving populations of fungus. PLoS Biol 7(11):e1000250. Proc Natl Acad Sci USA 89(22):10915–10919. 18. Weinreich D, Delaney N, DePristo M (2006) Darwinian evolution can follow only very 46. Wang S, Ma J, Peng J, Xu J (2013) Protein structure alignment beyond spatial proxim- few mutational paths to fitter proteins. Science 312(5770):111–114. ity. Sci Rep 3:1448. 19. Dickinson WJ (2008) Synergistic fitness interactions and a high frequency of benefi- 47. Gietz DR, Woods RA (2002) Transformation of yeast by lithium acetate/single- cial changes among mutations accumulated under relaxed selection in Saccharomyces stranded carrier DNA/polyethylene glycol method. Methods Enzymol 350: cerevisiae. Genetics 178(3):1571–1578. 87–96. 20. Bank C, Hietpas RT, Jensen JD, Bolon DNA (2015) A systematic survey of an intragenic 48. Ewing B, Green P (1998) Base-calling of automated sequencer traces using Phred. II. epistatic landscape. Mol Biol Evol 32(1):229–238. Error probabilities. Genome Res 8:186–194. 21. MacLean RC, Hall AR, Perron GG, Buckling A (2010) The of antibi- 49. Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer otic resistance: Integrating molecular mechanisms and treatment contexts. Nat Rev traces using Phred. I. Accuracy assessment. Genome Res 8:175–185. Genet 11(6):405–414. 50. Boone EL, Merrick JRW, Krachey MJ (2012) A Hellinger distance approach to MCMC 22. Jasmin JN, Lenormand T (2016) Accelerating mutational load is not due to Syner- diagnostics. J Stat Comput Simulat 84(4):833–849. gistic epistasis or mutator alleles in mutation accumulation lines of yeast. Genetics 51. Gillespie JH (1983) A simple stochastic gene substitution model. Theor Popul Biol 202(2):751–763. 23(2):202–215. 23. Blanquart F, Achaz G, Bataillon T, Tenaillon O (2014) Properties of selected muta- 52. Fisher RA (1930) The Genetical Theory of . (Clarendon Press, Oxford, tions and genotypic landscapes under Fisher’s geometric model. Evolution 68(12): UK). 3537–3554. 53. Wright S (1931) Evolution in Mendelian populations. Genetics 16(2):97. 24. Hietpas RT, Jensen JD, Bolon DNA (2011) Experimental illumination of a fitness land- 54. McCandlish DM (2011) Visualizing fitness landscapes. Evolution 65(6):1544–1558. scape. Proc Natl Acad Sci USA 108(19):7896–7901. 55. Kemeny JG, Snell JL (1960) Finite Markov Chains (van Nostrand, Princeton). 25. Hietpas R, Roscoe B, Jiang L, Bolon DNA (2012) Fitness analyses of all possible point 56. Poelwijk FJ, Kiviet DJ, Weinreich DM, Tans SJ (2007) Empirical fitness landscapes reveal mutations for regions of genes in yeast. Nat Protoc 7(7):1382–1396. accessible evolutionary paths. Nature 445(7126):383–386. 26. Hietpas RT, Bank C, Jensen JD, Bolon DNA (2013) Shifting fitness landscapes in 57. Fisher RA (1918) The correlation between relatives on the supposition of Mendelian response to altered environments. Evolution 67(12):3512–3522. inheritance. Trans R Soc Edinburgh 52:399–433. 27. Bank C, Hietpas RT, Wong A, Bolon DN, Jensen JD (2014) A bayesian MCMC 58. Brouillet S, Annoni H, Ferretti L, Achaz G (2015) MAGELLAN: A tool to explore small approach to assess the complete distribution of fitness effects of new mutations: fitness landscapes. bioRxiv:031583.

6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1612676113 Bank et al. Downloaded by guest on September 24, 2021