USING COMPLEX SELECTION DYNAMICS TO REVEAL THE FITNESS LANDSCAPE AND CROSS EVOLUTIONARY VALLEYS

by

Barrett Steinberg

A dissertation submitted to Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy

Baltimore, Maryland June 2015

Abstract

Nature repurposes proteins via evolutionary processes. During , the fitness landscapes of proteins change dynamically. Selection for new functionality leaves the protein susceptible to genetic drift in the absence of selective pressure for the former function. Drift is considered to be a driver of evolution, and functional tradeoffs are common during selection. We measured the effect on ampicillin resistance of ~12,500 unique mutants of alleles of TEM-1 β-lactamase along an adaptive path in the evolution of cefotaxime resistance. This series of shifting protein fitness landscapes provides a systematic, quantitative description of genetic drift and pairwise/tertiary intragenic epistasis involving adaptive . Our study provides insight into the relationships between , protein structure, protein stability, epistasis, and drift and reveals the tradeoffs inherent in the evolution of new functions.

We further use principles of ruggedness and dynamic change in fitness landscapes to develop and evaluate novel directed evolution strategies using complex selection dynamics. Interestingly, the strategy that included negative selection relative to the original landscape yielded more highly active variants of β-lactamase than the other four selection strategies. We reconstructed evolutionary pathways leading to this highly active allele, confirmed the presence of a fitness valley, and found an initially deleterious mutation that serves as an epistatic bridge to cross this fitness valley. The ability of negative selection and changing environments to provide access to novel fitness peaks has important implications for applied directed evolution as well as the natural evolutionary mechanisms, particularly of antibiotic resistance.


Finally, we applied principles of the influence of selection environment to a clinically relevant system by comparing the evolutionary pathways of cells evolving competitively and continuously under single antibiotics, a cocktail of two antibiotics, or alternating cycles of the two antibiotics. We find evidence for distinct evolutionary pathways between antibiotic strategies. Specifically, we suggest that cocktail strategies appear to select for “specialists” of varying activity while cycling more stringently selects for “generalists.” We hypothesize that this result arises because separately emerging populations can evolve in distinct niches during cocktail therapies. Our results have direct relevance to informing clinical antibiotic regimens.

Advisor: Dr. Marc Ostermeier, Professor of Chemical and Biomolecular Engineering, Johns Hopkins University

Readers: Dr. Michael Betenbaugh, Professor of Chemical and Biomolecular Engineering, Johns Hopkins University

Dr. Robert Schleif, Professor of Biology, Johns Hopkins University


Acknowledgments

First and foremost, I would like to thank Kathleen January for her boundless support. I would not be here without both sharing help with her during harder times and joy with her at each small victory. I am relieved and ecstatic to finally see our greater plans begin to work out together. I also thank my parents and family for providing the building blocks towards my education.

I couldn’t have finished without the support of members of the Ostermeier lab. Specifically, I couldn’t ask for a better coworker and friend than Nirav Shelat. Nirav, thank you for the science, pool, and whiskey. I’ve shared great camaraderie with Courtney Gonzalez, Nathan Nicholes, and Tina Xiong. Martin Kang, great job and best of luck. Outside the lab, both Dillon Nye and Joey Priola have proven to be excellent friends. I hope all of these connections are lasting, and wish you all the best of luck.

I must directly thank Dr. Horacio Frydman for introducing me to proper scientific method and instilling the first true appreciation of science into me. Of course, I also thank Dr. Ostermeier for the freedom and critical feedback he has provided. He has given me the opportunity to try countless extra experiments, most of which have failed, but a few of which have succeeded. I have this freedom to thank for honing my scientific creativity, and these acknowledgements to thank for dealing with the many failures. I thank my committee members for providing feedback and support on this work. Finally, I cannot begin to acknowledge the great musicians, novelists, and scientists that have helped me.

As Thomas Pynchon has helpfully summarized, “Life's single lesson: that there is more accident to it than a man can ever admit to in a lifetime and stay sane.”


Table of contents

Abstract ...... ii

Acknowledgments ...... iv

Table of contents ...... v

List of Tables ...... viii

List of Figures ...... ix

Chapter 1: Introduction and background ...... 1

The fitness landscape ...... 1

Epistasis and the “tape of life” ...... 6

Genetic drift ...... 10

Directed Evolution ...... 11

β-lactamase as a model gene in directed evolution ...... 14

Directed evolution informed by landscape ruggedness ...... 19

Chapter 2: Shifting fitness and epistatic landscapes reflect tradeoffs along an evolutionary pathway ...... 21

Introduction ...... 21

Methods ...... 26

Mutagenesis ...... 26

Selection ...... 26

Deep sequencing ...... 26

Data analysis ...... 27


Results and Discussion ...... 32

Fitness landscapes along an adaptive pathway ...... 32

Structural map of epistasis ...... 40

Sign epistasis along an adaptive pathway ...... 44

Reciprocal sign epistasis and ruggedness along an adaptive pathway ...... 46

Conclusions ...... 47

Chapter 3: Environmental changes bridge fitness valleys ...... 49

Introduction ...... 49

Materials and Methods ...... 55

Plasmid conditions ...... 55

Mutagenesis ...... 55

Selection ...... 55

Deep sequencing and sequence analysis ...... 56

Phylogenetic analysis ...... 57

Selection-weighted attractive graphing (SWAG) ...... 57

Calculation of epistasis ...... 57

Minimum inhibitory concentration testing ...... 58

Results and Discussion ...... 58

Chapter 4: Evaluation of antibiotic cycling and cocktail therapy on evolution of antibiotic resistance ...... 83

Introduction ...... 83

Methods ...... 86

Results ...... 88


Discussion ...... 95

Chapter 5: Conclusions and Future Directions ...... 101

Conclusions reached in this work ...... 101

Revealing the shape of the fitness landscape ...... 104

Evolution in changing environments and selection pressure ...... 106

Applications in antibiotic resistance ...... 108

References ...... 110

Curriculum Vitae ...... 123


List of Tables

Table 1: Table of resistance to cefotaxime conferred by mutants leading to GKTS allele. ...... 16

Table 2: Plasmid mutations resulting from ten rounds of evolution in each selection strategy and associated MIC values...... 100

List of Figures

Figure 1: Modern model representation of a fitness landscape ...... 2

Figure 2: Demonstration of types of epistasis ...... 7

Figure 3: Structure of TEM-1 β-lactamase protein ...... 15

Figure 4: Structure of TEM-1 β-lactamase with sites of GKTS mutations indicated ...... 16

Figure 5: Diagram of evolutionary pathways with monotonically increasing fitness during evolution of GKTS from TEM-1, from Weinreich et al. ...... 18

Figure 6: Adaptive pathway of TEM-1 to TEM-15 ...... 23

Figure 7: Overview of methodology used to measure protein fitness...... 25

Figure 8: Comparison of the TEM-1 fitness values from Firnberg et al., 2014 and those used in this work...... 28

Figure 9: Illustration of all possible reciprocal sign epistasis, R, in our system...... 31

Figure 10: The landscapes of genetic drift along an adaptive pathway...... 33

Figure 11: Distribution of pairwise and tertiary epistasis values along an adaptive pathway...... 34

Figure 12: The distribution of epistatic effects involving G238S depends on the severity of the mutation...... 36

Figure 13: Plot of third-order epistasis versus pairwise epistasis values...... 37

Figure 14: Epistasis as a function of sequence and structure...... 39

Figure 15: Correlation between a mutation’s effect on cefotaxime resistance and the effect on ampicillin resistance for TEM-15...... 40

Figure 16: The relationship between epistasis and protein structure...... 41

Figure 17: Frequency of sign epistasis and ruggedness along an adaptive pathway...... 44

Figure 18: Sum of occurrences of observed sign epistasis when comparing fitness landscapes of β-lactamase alleles...... 46

Figure 19: Reciprocal sign epistasis as a function of sequence and structure...... 47

Figure 20: Simplified model representation of how evolution in alternating environments that modulates the fitness landscape can lead to the crossing of fitness valleys...... 50

Figure 21: General schematic of the experimental evolution experiments on the β-lactamase gene...... 53

Figure 22: Course of the evolution experiments...... 58

Figure 23: Diversity and conferred resistance of alleles after the eighth round of evolution. ...... 59

Figure 24: Heat map of matrix of Hamming similarity between unique alleles found from 47 most resistant alleles following round 8 in all selections...... 60

Figure 25: Genetic distance of evolved sequences following eight rounds of evolution...... 61

Figure 26: Structural mapping of mutations found in each selection scheme following round 8...... 62

Figure 27: Heat map of mutations observed in the negative evolution strategy as analyzed by deep sequencing...... 64

Figure 28: Stacked frequency distribution of mutations during negative evolution found by deep sequencing as a function of codon position over eight rounds...... 65

Figure 29: Stacked distribution of mutations found in top 47 alleles from all selection strategies following round 8 as a function of codon position in β-lactamase...... 65

Figure 30: Structural mapping of mutations found in the BS-NEG-4 allele, which confers extremely high resistance to cefotaxime...... 66

Figure 31: Selection-weighted attraction graph (SWAG) of the possible evolutionary intermediates between TEM-15 and BS-NEG-4...... 68

Figure 32: Pathways between clusters found in SWAG analysis...... 70

Figure 33: SWAG landscape of pathways between TEM-15 and BS-NEG-4 with single mutants indicated...... 72

Figure 34: SWAG landscape modeling selection for low fitness alleles along the evolutionary pathway from TEM-15 to BS-NEG-4...... 74

Figure 35: Radial graph of all analyzed pathways from TEM-15 to BS-NEG-4...... 75

Figure 36: 3D SWAG of evolutionary pathways only following increasing fitness from TEM-15 to BS-NEG-4...... 76

Figure 37: Epistasis and fitness along possible evolutionary pathways between TEM-15 and BS-NEG-4...... 77

Figure 38: Change in pairwise epistasis, ∆ε3, between two mutations on the pathway from TEM-15 to BS-NEG-4 in response to the addition of F230S...... 78

Figure 39: The protein fitness effects of single mutations in TEM-15 for the mutations observed among the 47 most resistant mutants in each of the evolution experiments. ...... 81

Figure 40: Illustration of whole-plasmid evolution strategies evaluated...... 85

Figure 41: pTSmut and barcoded pSkunk3 plasmid maps...... 86

Figure 42: Diagram of continuous evolution for antibiotic resistance using phage and mutator plasmid...... 88

Figure 43: History of library MICs during whole-plasmid evolution...... 89

Figure 44: Distribution of mutated features from plasmids resulting from each selection. ...... 92

Figure 45: MICs of individual mutants resulting from each whole-plasmid selection scheme...... 94

Figure 46: Model of generalist evolution during antibiotic cycling ...... 96

Chapter 1: Introduction and background

The fitness landscape

Sewall Wright invented the concept of the fitness landscape to attempt to understand the search space of evolution1. A fitness landscape can represent the evolution of species, metabolic networks, proteins, or any biological system subject to the evolutionary forces of mutation and selection. Ideally, Wright wanted to understand the limitations and availabilities to evolutionary processes. This concept used contour lines to indicate fitness, with drastic simplifications in the dimensionality of possible protein sequences to a two-dimensional plane. Fitness is then indicated by the “height” above a given genotype represented as a point on the xy plane, or “sequence space” (Figure 1).

Wright’s own estimate of sequence space numbered the size of all possible combinations of alleles at 10^1000, a number already greater than the amount of matter in the observable universe. In fact, Wright’s estimate assumed only 1000 genes per organism, causing him to underestimate the size of the possible landscape. Despite the astronomic scale of this concept, the fitness landscape metaphor provided a conceptual, intuitive framework for the processes of evolution even before the discovery of DNA. As Wright suggested in this paper, “The problem of evolution as I see it is that of a mechanism by which the species may continually find its way from lower to higher peaks in such a field. In order that this may occur, there must be some trial and error mechanism on a grand scale”1. In other words, given such staggering possible complexity, how can species hope to evolve towards improved function and can the fitness landscape help us understand this search? More fundamentally, how deterministic are the processes of evolution? Is the uniqueness of species ever-increasing, or are there commonalities in evolutionary pathways?

Figure 1: Modern model representation of a fitness landscape. The gridded plane indicates sequence space, with a given allele represented by the white point. The elevation along the landscape represents the fitness at that genotype. Here, the landscape shown is rugged. The white sequence represents a local maximum, but not the global maximum on the landscape.

Since Wright’s original work, the concept of the landscape has been expanded both conceptually and mathematically. John Maynard Smith focused on the application of adaptive pathways to the metaphor of the fitness landscape2. His work questioned whether continuous paths exist within sequence space to possible proteins. Smith’s extension emphasized the concept of “ruggedness” of topology, where multiple fitness peaks may be separated by “valleys” of lower fitness2. Further models of changing topology affect the evolutionary processes represented by the fitness landscape. In many instances, these models explore the process of adaptation and how it may be dependent on the topology of the landscape. An adaptive walk along the landscape tends to proceed “uphill” as fitness increases. Non-adaptive, random walks also exist, including those due to genetic drift, recombination, or other stochastic processes. By altering the central metaphor of the peak structure, or topology of the landscape, these adaptive walks can vary dramatically. For instance, a simple landscape with a single global peak is termed a “Mt. Fuji” landscape. This is the model used by Fisher in which he assumed that ruggedness of the landscape is irrelevant in the face of an infinitely large population3. In this case, all beneficial adaptations climb uphill to a global maximum. Other models include Kauffman’s NK landscape with tunable ruggedness, flat landscapes, and “holey” landscapes4. Theoretical mathematical treatments have also been applied to changing models of the landscape4. For example, these treatments model the evolutionary time to fixation or diversity as a function of population size. The topology of each model affects the prediction of these parameters.
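As a rough illustration of how tunable ruggedness changes the number of peaks, the following Python sketch builds a small NK-style landscape over binary genotypes and counts its local maxima. The function names, the circular-neighbour interaction scheme, and the parameter values are assumptions chosen for illustration; they are not taken from the cited work.

```python
import itertools
import random


def make_nk_landscape(n, k, seed=0):
    """Return a fitness function over binary genotypes of length n.

    Assumption: site i interacts with its k right-hand neighbours
    (wrapping around), and each contribution is a uniform random number.
    """
    rng = random.Random(seed)
    # One lookup table per site, keyed by the states of the site and its k neighbours.
    tables = [
        {key: rng.random() for key in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            key = tuple(genotype[(i + j) % n] for j in range(k + 1))
            total += tables[i][key]
        return total / n

    return fitness


def count_local_maxima(n, k, seed=0):
    """Count genotypes that are fitter than all n single-site neighbours."""
    fitness = make_nk_landscape(n, k, seed)
    peaks = 0
    for genotype in itertools.product((0, 1), repeat=n):
        f = fitness(genotype)
        neighbours = (
            genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
            for i in range(n)
        )
        if all(fitness(nb) < f for nb in neighbours):
            peaks += 1
    return peaks
```

With k = 0 every site contributes independently, so the landscape has a single peak of the “Mt. Fuji” type; raising k typically introduces additional local maxima, which is the sense in which ruggedness is tunable.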

One of Wright’s original questions related to the fitness landscape is still unresolved: how populations move between peaks in the landscape to cross “valleys.”

Wright suggested several possible answers including change of environment, inbreeding or genetic drift, or isolation of smaller populations1. However, these theoretical explanations are largely unresolved and valley crossing remains an open question4,5.

Newer suggestions for traversing fitness valleys include stochastic tunneling and recombination6,7, but these strategies rely entirely on chance. Landscapes and valley crossing have high relevance to the study of speciation, where populations that reach different peaks produce hybrids with incompatibilities. For example, a mule would represent a practical sterile evolutionary valley between a horse and donkey.

The Bateson-Dobzhansky-Muller model provides one description by highlighting the incompatibility of mutations in differing backgrounds8. Alternatively, neutral theory simply suggests that the diversity of populations is due to the ability of evolution to find many diverse yet functionally equivalent solutions to an ecological niche9, advocating instead an argument to scale of complexity. However, the use of the fitness landscape metaphor provides strong theoretical explanation for speciation. The division of a population as it shifts towards different peaks, due for example to geographic isolation or changing selection pressures, may explain how speciation occurs4.

Fitness valleys have conceptual import both on the macro scale of species and on the smaller scales of gene networks and proteins. Ruggedness has been shown to be common in both cases10,11. Here, multiple fitness peaks indicate the ability of a protein or gene network to improve function or adapt to a novel function. Fitness in these cases may be defined by the activity of a protein or product yield of a metabolic network, for example. Isolated peaks imply evolutionary “traps” in function. The landscape metaphor is assumed to be isomorphic between micro- and macro-scales.

Fitness landscape models also may be used to predict the advantageousness of a random mutation. One common model is Fisher’s geometric model, which claims that mutations of small effect have the highest likelihood of being beneficial (with greatest likelihood of 50%)3. As mutations have greater effect, the likelihood that they are beneficial decreases. This leads to a deceleration of adaptive potential as a single evolutionary pathway is followed. However, recent predictions from the effects of epistasis provide alternate explanations of the interactions between mutations and diminishing returns because the single-peak model is not assumed12-14.
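The dependence of the beneficial fraction on effect size can be sketched numerically. The text above states only the 50% limit for small mutations; the closed form used below, P = 1 − Φ(x) with x = r√n / (2d), is the standard large-dimension approximation usually attributed to Fisher and is brought in here as an assumption, as are the parameter names (r for mutation size, n for the number of phenotypic dimensions, d for the distance to the optimum).

```python
import math


def prob_beneficial(r, n, d):
    """Approximate probability that a random mutation of size r is beneficial.

    Assumed form: P = 1 - Phi(x), x = r * sqrt(n) / (2 * d), where Phi is the
    standard normal cumulative distribution function.
    """
    x = r * math.sqrt(n) / (2.0 * d)
    # 1 - Phi(x) written with the complementary error function.
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def beneficial_fraction_table(sizes, n=20, d=1.0):
    """Return (size, probability) pairs for a list of mutation sizes."""
    return [(r, prob_beneficial(r, n, d)) for r in sizes]
```

Under these assumptions the probability starts at 0.5 for vanishingly small mutations and falls monotonically as r grows, which matches the qualitative statement in the paragraph above.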

Empirical fitness landscapes have recently begun to be explored in depth.

Sequencing-based approaches have been particularly beneficial in this field, allowing for the analysis of large libraries of variants in parallel15-22. One example of the landscape of the Hsp90 chaperone in yeast using the EMPIRIC (“extremely methodical and parallel investigation of randomized individual codons”) approach is typical of this process: point mutants in Hsp90 were created and fitness was evaluated by the abundance of the sequence of a given allele after growth as measured by deep sequencing17. In essence, this method uses a competitive, progeny-based measure of fitness. Empirical approaches such as these are often limited by lack of information on the whole protein and by imprecise assignment of low fitness. In growth assays, low fitness is indicated by no growth or very low growth. Distinguishing alleles conferring low fitness accurately is very difficult in competitive assays because this signal does not amplify. Thus the resolution of many fitness measurements is low.
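A generic enrichment score shows why low-fitness alleles are hard to resolve in competitive assays. The sketch below is not the EMPIRIC procedure or the method used in this work; it is a minimal, assumed scoring scheme (log2 change in read counts relative to wild type, with a hypothetical pseudocount) and the read counts in the usage note are invented for illustration.

```python
import math


def relative_fitness(counts_before, counts_after, wild_type="WT", pseudocount=0.5):
    """Score each allele by its log2 enrichment relative to wild type.

    counts_before / counts_after map allele names to sequencing read counts
    before and after competitive growth. The pseudocount (an assumption)
    keeps the logarithm defined when an allele drops to zero reads.
    """
    def log_ratio(allele):
        after = counts_after.get(allele, 0) + pseudocount
        before = counts_before[allele] + pseudocount
        return math.log2(after / before)

    wt_ratio = log_ratio(wild_type)
    return {allele: log_ratio(allele) - wt_ratio for allele in counts_before}
```

For example, with hypothetical counts `{"WT": 1000, "X": 1000, "Y": 1000}` before selection and `{"WT": 2000, "X": 0, "Y": 0}` after, alleles X and Y receive identical scores even if their true activities differ, because both have fallen below the detection floor.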

Alternative selection methods rely on phage display or on viral or RNA reproductive capacity. Like other methods relying on positive selection, mutations of large and positive effect are amplified and produce a clearer signal. Other methods instead screen a smaller library. The landscape of TEM-1 β-lactamase on amoxicillin was constructed by screening the MIC of 10,000 variants, for example19. Still, most screens have difficulty resolving mutations of small effect, especially debilitating mutations.

Work done previously in the Ostermeier lab avoids issues of resolution by both utilizing mutagenesis of an entire protein and assigning fitness aided by a synthetic genetic circuit23. We believe this to be the highest-resolution approach for calculating fitness, especially for accurately analyzing low-activity variants and therefore providing high dynamic range of fitness measurements. Since the band-pass genetic circuit selects for low activity rather than using positive selection above a threshold of a given activity, low-activity variants are amplified and still produce a strong fitness signal in our data.

However, most of these empirical landscapes are “static,” calculating fitness in an unchanging condition. Recent metaphors for the fitness landscape have used the nomenclature of “fitness seascape” in attempting to emphasize the ever-changing nature of the landscape24,25. In other words, the landscape is fluid and shifts due to the environment of selection as well as past selective history. The high dimensionality of the landscape hinders our intuitive understanding of its ability to change, but these changes may be responsible for many speciation or valley crossing events in nature.

Modern experiments in understanding fitness landscapes have also begun to concentrate on their shifting dynamics. A landmark experiment by the Lenski laboratory following long-term evolution of E. coli tracks fitness changes of adaptive alleles over evolutionary time26. Other experiments probe changes in small landscapes due to environmental change over shorter time scales27. An antibiotic-resistance kinase was also used to develop a fitness landscape in changing antibiotic environments16. The landscape of Hsp90 in yeast using the EMPIRIC approach has also been studied in the context of four different environments, providing support for Fisher’s geometrical model of fitness landscapes and adaptive tradeoffs28.

Epistasis and the “tape of life”

Ruggedness of the fitness landscape is necessitated by sign epistasis29. Broadly, epistasis may be defined as the deviation from additivity in fitness between any two mutations. Epistasis provides a useful definition of the unexpected results of combining mutations and may be used as a measure to quantify landscape ruggedness. Epistasis can be defined intergenically, between two alleles, or intragenically, between amino acid mutations in a single allele. Further categories of epistasis have been defined: magnitude epistasis, sign epistasis, and reciprocal sign epistasis (Figure 2). Magnitude epistasis exists when a mutation has a positive or negative effect on fitness, but the magnitude of this effect changes when a second mutation is added. For example, if mutations A and B increase fitness by 10% each, the introduction of mutation B along with mutation A may be expected to multiply and increase fitness by 1.1^2 = 1.21-fold.

Magnitude epistasis would occur if the combination of both mutations still increases fitness but the increase in fitness deviates from 21% in this example. Sign epistasis occurs if one of these combinations exhibits a different sign of fitness effect. For example, if mutation A provides a fitness increase but mutation B decreases fitness, sign epistasis would exist for the allele AB if the combination increased fitness relative to wildtype. Sign epistasis would then exist for locus B. Finally, reciprocal sign epistasis exists if sign epistasis exists on both loci A and B. Reciprocal sign epistasis between two loci presents a simple visualization of ruggedness, as two maxima are evident from the possible pathways shown.

Figure 2: Demonstration of types of epistasis. Epistasis is shown between a wildtype allele (WT), an allele with mutation A, an allele with mutation B, and an allele with mutations A and B (AB). Arrows point towards increasing fitness. From left to right: no sign epistasis, sign epistasis at locus B, and reciprocal sign epistasis due to sign epistasis at both loci A and B.
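The categories described above can be made concrete with a short classifier. This is a minimal sketch under stated assumptions: fitness effects combine multiplicatively in the absence of epistasis (as in the 1.1^2 example), epistasis is therefore measured as a deviation from additivity on a log scale, and sign epistasis at a locus is defined by the mutation changing the sign of its effect between the two genetic backgrounds. The function name and tolerance are hypothetical.

```python
import math


def classify_epistasis(w_wt, w_a, w_b, w_ab, tol=1e-9):
    """Return (epsilon, category) for a pair of mutations A and B.

    epsilon is the deviation of the double mutant's log fitness from the
    sum of the single-mutant log effects (zero when effects multiply).
    """
    epsilon = (
        math.log(w_ab / w_wt) - math.log(w_a / w_wt) - math.log(w_b / w_wt)
    )
    # Effect of each mutation in the wildtype background and in the
    # background that already carries the other mutation.
    a_in_wt, a_in_b = math.log(w_a / w_wt), math.log(w_ab / w_b)
    b_in_wt, b_in_a = math.log(w_b / w_wt), math.log(w_ab / w_a)

    sign_at_a = a_in_wt * a_in_b < 0
    sign_at_b = b_in_wt * b_in_a < 0

    if sign_at_a and sign_at_b:
        category = "reciprocal sign"
    elif sign_at_a or sign_at_b:
        category = "sign"
    elif abs(epsilon) > tol:
        category = "magnitude"
    else:
        category = "none"
    return epsilon, category
```

For the example in the text, `classify_epistasis(1.0, 1.1, 1.1, 1.21)` reports no epistasis, while a double mutant with fitness 1.5 in place of 1.21 would be classified as magnitude epistasis.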


Epistasis then is a necessary condition for landscape ruggedness, which creates local maxima on the fitness landscape. Epistasis is also responsible for the changing nature of the fitness landscape during evolution. If all adaptive walks follow increasing fitness, a rugged landscape indicates that a global maximum may not be reached. Indeed, the maximum reached would depend on the starting point of each evolutionary pathway.

The repeatability of evolution then depends on the structure of the fitness landscape. The question of the repeatability of evolution was elegantly presented by Stephen Gould as the repeatability of the “tape of life”30. In his book, Gould considers whether the same species would evolve upon repeating the processes of evolution. The ruggedness of the fitness landscape is a metaphor that directly addresses this question: will the same peaks be reached as evolution is repeated? The ruggedness of the landscape and the stochastic nature of evolution are the key determinants of whether the conclusion of repeatability is possible.

Importantly, epistasis is enriched during adaptation10,31,32. Rather than being a random artifact of interactions between mutations, epistasis is apparently central to how proteins evolve in nature. Thus the understanding of the constraints and opportunities imposed by epistasis upon evolutionary pathways is central to the understanding of evolutionary mechanisms. For example, epistasis shapes adaptation by adding or subtracting adaptive evolutionary pathways (“sign epistasis”).

Early empirical epistasis measurements have concentrated on calculating the pairwise interaction of a limited number of loci33-37. One early example identified epistatic structural interactions between adaptive residues in the evolution of an ancient enzyme38. However, empirical epistasis measurements can also be calculated from pairs of suitably chosen fitness landscapes39,40. Thus epistasis measurements encounter the same problems as fitness landscape measurements: low resolution and difficulty resolving mutations of small or strongly negative effect. Most of these studies also concentrate on a protein domain, such as those by Olson et al.41 and Bank et al.39. Additionally, the focus of study is typically on pairwise epistasis, the interaction of two given mutations. Other studies using inferred fitness and epistatic measurements from statistical deep sequencing and competition-based approaches are subject to error. For example, Hinkley et al. have produced a study measuring the effects of pairwise epistasis in an entire protein42, but the methods used are subject to bias in measurement43. Limited theoretical consideration has been given to higher-order epistasis44, and basic analyses have been attempted on some evolutionary pathways45, but no studies exist with large-scale empirical calculations covering an entire protein and more than two mutations.

Both a simplified epistasis calculation for high-order epistatic interactions and a high-throughput measurement of epistasis between more than two sites or epistasis over a whole protein are lacking.

A typical application of epistasis measurements is the analysis of diminishing returns in evolution. Much supporting evidence exists to demonstrate negative epistasis between adaptive mutations, causing diminishing returns of fitness benefits and/or isolation between peaks at different loci13,14,46-49. Epistasis is used to directly quantify the rate of diminishing returns among adaptive mutations, but typically the number of adaptive loci examined is small. Since adaptive mutations are rare, analysis of a large number of adaptations is difficult. Most studies also examine only adaptive mutants displaying one amino acid difference from the wildtype protein. Alternatively, epistasis between activities in different environmental contexts has also been explored with limited loci27,50. These studies are likewise limited by low-resolution fitness, low numbers of loci examined, and low mutational distance from the wildtype protein.

Although epistasis is very prevalent in nature, only limited quantification of pairwise epistasis exists. Data sets of epistasis are highly limited to pairwise evaluations and typically only to domains of proteins. A complete analysis of the epistasis of an entire protein is lacking, especially that of higher order epistasis between more than two mutations. Since epistasis is believed to be prevalent during adaptation, focused data on epistatic effect while a protein evolves is desirable. This quantification would allow us to better understand the extent of landscape ruggedness, how epistasis influences adaptation (and vice versa), and the shape of the broader fitness landscape and will be discussed in Chapter 2.

Genetic drift

One key identified stochastic process in evolution is analyzed by neutral theory, proposed by Kimura and extended by Ohta51,52. Neutral theory posits that many mutations are nearly neutral in activity, creating a broad and relatively flat landscape model. Neutral theory introduces the importance of population size and genetic drift to the processes of adaptation. Namely, the probability that an allele may reach fixation is directly tied to the population size. Fixation is then a stochastic property with parameters besides (but including) fitness. Genetic drift extends this further, assuming the complete absence of selection. In this case, fixation and fitness to a given activity are completely stochastic. Neutral drift differs slightly from genetic drift in that the level of starting activity (or above) is maintained, but there is no additional selective pressure to improve while mutations accumulate.
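The stochastic nature of fixation can be illustrated with a small simulation. The sketch below assumes a haploid Wright-Fisher model (fixed population size, non-overlapping generations, binomial resampling each generation), which is a common textbook idealization and not a model specified in this work; the function name, parameters, and default values are hypothetical.

```python
import random


def fixation_probability(pop_size, initial_copies, s=0.0, trials=2000, seed=1):
    """Estimate the chance that an allele fixes in a Wright-Fisher population.

    s is the selection coefficient of the allele (s = 0 means pure drift).
    Each trial runs until the allele is lost (0 copies) or fixed (pop_size).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        copies = initial_copies
        while 0 < copies < pop_size:
            p = copies / pop_size
            # Selection shifts the expected frequency before resampling.
            p_selected = p * (1.0 + s) / (1.0 + p * s)
            # Binomial sampling of the next generation.
            copies = sum(rng.random() < p_selected for _ in range(pop_size))
        if copies == pop_size:
            fixed += 1
    return fixed / trials
```

In this idealized model a neutral allele (s = 0) fixes with probability close to its starting frequency, so a new mutation present as a single copy fixes far more often in a small population than in a large one, whereas a positive s raises the fixation probability above that baseline.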

Muller’s ratchet provides a thought experiment relying on genetic drift: by assuming most random mutations are deleterious, Muller reasoned that the genetic load of deleterious mutations should accumulate until an asexual population becomes extinct53.

Although intuitively selection for phenotypic fitness offsets the highly deleterious effects of drift, Muller’s ratchet illustrates the extreme case of the potential of genetic drift in accumulating and fixing deleterious mutations, especially in small population sizes.

Genetic drift and neutral drift have been proposed as evolutionary drivers. Both processes provide some escape from the strictly uphill adaptive walks on the fitness landscape. Random mutations may access regions of sequence space that do not increase activity, creating “tunneling” activity on the landscape. It has been proposed that these processes may allow evolution to reach new peaks along the landscape. Indeed, neutral drift has been successfully demonstrated to improve promiscuous activity to alternate substrates or reach alternate maxima of function which may have been previously inaccessible54-56. However, no studies exist which demonstrate improved activity on a gene’s original activity through neutral drift, and one study that examined the effects of drift experimentally suggests negative results57.

Directed Evolution

Directed evolution is simply the extension of evolutionary principles to the identification of proteins or phenotypes with desired properties defined by the researcher.

Because protein activity is still very difficult to predict from a given sequence, directed evolution is a highly useful technique to evolve chosen functionality without understanding the structural contributions of accumulated mutations. The principles of directed evolution are equivalent to those of evolution: processes of mutation and selection are iterated. However, both the methodology of mutation and the principles and types of selection may be adjusted artificially towards better engineering a specified function. Directed evolution studies often target the evolution of a single gene by selecting for activity conferred by the protein the gene codes for (defined as fitness).

Traditional studies may mutate the gene in a number of ways, including introducing mutations made by natural mutagenesis, error-prone PCR, gene shuffling, cassette mutagenesis, or comprehensive saturation mutagenesis58.

Typically, the fittest mutants are selected or screened for and used to seed the next round of mutagenesis until a desired phenotype is found or a local maximum in fitness is reached. Alternate improvements on directed evolution techniques include in vivo evolution, familial gene shuffling, rational targeting of mutations to sites associated with the activity of interest, or the evolution of stability before function to circumvent stability-function tradeoffs59,60. This last technique is of particular interest in the light of drift, as neutral drift is prone to accumulate stabilizing mutations. Because neutral drift often enriches stabilizing mutants55, the hypothesis that stabilizing mutations may be required for increasing functionality is one possible explanation for the utility of neutral theory in explaining de novo adaptation.

Modern theoretical limits of directed evolution are highlighted by the prevailing theory of diminishing returns12. Supporting evidence for diminishing returns among adaptive mutations includes the large amount of negative epistasis between adaptations.

According to this theory, as an evolutionary path nears a fitness maximum, mutations of large effect become rarer and only minimally beneficial mutations may be accessed.

The theory of diminishing returns follows Fisher’s predictions of evolution in that only one local maximum is assumed to be accessible. Thus the rate of adaptation slows with the accumulation of adaptive mutations. Large jumps in sequence space (i.e. large numbers of mutations) would be necessary to access a distinct fitness peak. Fisher’s geometric model directly addresses the likelihood of a mutation being adaptive in a single-peak landscape by assuming at the extreme that a mutation of infinitesimally small effect has a 50% chance of being adaptive, with this chance decreasing as effect size increases. Thus mutations of large effect become increasingly rare as evolution progresses towards a peak.
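The 50% limit and its decline with effect size can be checked numerically. The following is a minimal Monte Carlo sketch of Fisher's geometric model and is not part of the original analysis; the function name prob_beneficial, the number of phenotypic dimensions n, and the distance d to the optimum are illustrative assumptions.

```python
import math
import random


def prob_beneficial(r, n=10, d=1.0, trials=20000, seed=0):
    """Estimate the chance that a random mutation of size r moves a phenotype
    closer to a single optimum (Fisher's geometric model).

    The optimum sits at the origin of an n-dimensional phenotype space and the
    current phenotype sits at distance d from it. All parameters are
    hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    start = [d] + [0.0] * (n - 1)
    hits = 0
    for _ in range(trials):
        # Draw a uniformly random direction by normalising Gaussian draws.
        step = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(s * s for s in step))
        new = [x + r * s / norm for x, s in zip(start, step)]
        # The mutation is adaptive if it ends up closer to the optimum.
        if math.sqrt(sum(x * x for x in new)) < d:
            hits += 1
    return hits / trials


for size in (1e-6, 0.1, 0.5, 2.5):
    print(size, prob_beneficial(size))
```

In this sketch a mutation larger than twice the distance to the optimum can never be adaptive, which is the geometric reason that large-effect beneficial mutations become rare near a peak.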

Another prevalent limitation on directed evolution results is that of evolutionary tradeoffs, or the principle that “you get what you select for.” In other words, selection of phenotype must be considered carefully, as stability or pleiotropic effects are often compromised when evolving towards a certain function50,61,62. Maintenance of multiple selection criteria is necessary to avoid the engineering of a protein with unintended secondary properties.

More practical limits on directed evolution techniques in the laboratory center on optimizing the selection or screen for a given activity. Often, the creation of libraries is not as much of a bottleneck as compared to development of a high-throughput screen or selection targeted towards the intended activity. However, the nature of selection indicates that tying phenotypic fitness to a given activity must likely be accomplished on a case-by-case basis.

β-lactamase as a model gene in directed evolution

Willem Stemmer presented an early, highly successful study of directed evolution by evolving β-lactamase activity using gene shuffling63. β-lactamase genes code for an antibiotic resistance protein conferring resistance to β-lactam antibiotics (Figure 3). β-lactamase is an ancient enzyme predating human use of antibiotics64. Evidence from reconstruction of ancient β-lactamase proteins indicates that ancestral β-lactamases may have exhibited high stability and promiscuity, implying that modern β-lactamase may have evolved to be more specialized65. Many variants exist today with a spectrum of hydrolyzing activity on a number of antibiotic substrates66. β-lactam antibiotics are the most commonly used antibiotics, and β-lactamase genes could be the most prevalent antibiotic resistance genes67. The understanding of β-lactamase and its evolutionary potential is then highly clinically relevant and may be useful in predicting resistance outbreaks68.

Figure 3: Structure of TEM-1 β-lactamase protein. Protein structure is shown in cartoon model. The sulfate ion located at the catalytic binding site is indicated in blue spheres. Structure data taken from pdb:1BTL.

Stemmer’s study began with the allele TEM-1 β-lactamase, which has extremely high hydrolyzing activity towards ampicillin66. However, TEM-1 has nearly no activity against cefotaxime, a third-generation β-lactam antibiotic. Gene shuffling with some inherent mutagenesis was used to generate and recombine library diversity while selection was raised in stringency for levels of resistance to cefotaxime63. The net result of this process was an allele accumulating four amino acid mutations and a promoter mutation. These amino acid mutations relative to TEM-1 were A42G, E104K, M182T, and G238S in Ambler notation (an allele henceforth referred to as the GKTS allele)69

(Figure 4). The combination of these mutations in Stemmer’s system improved resistance by 33,000-fold relative to TEM-1 (Table 1).

Designation   A42G   E104K   M182T   G238S   Cefotaxime MIC (µg/ml)   Fold improvement   Tm (°C)
TEM-1          -      -       -       -      0.088                    1                  51.5
None           +      -       -       -      0.088                    1
TEM-17         -      +       -       -      0.13                     1.48               51
TEM-135        -      -       +       -      0.063                    0.72
TEM-19         -      -       -       +      1.4                      15.9               47
TEM-15         -      +       -       +      360                      4,090              46
“GKTS”         +      +       +       +      2900                     33,000

Table 1: Table of resistance to cefotaxime conferred by mutants leading to GKTS allele. Adapted from Weinreich et al.45 and Wang et al.70
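The fold-improvement column of Table 1 is each allele's cefotaxime MIC divided by the MIC of TEM-1. The short sketch below recomputes it from the MIC values listed in the table; the dictionary layout and the function name fold_improvement are illustrative and not taken from the original analysis.

```python
# Cefotaxime MICs (µg/ml) as listed in Table 1.
mic = {
    "TEM-1": 0.088,
    "TEM-17": 0.13,
    "TEM-135": 0.063,
    "TEM-19": 1.4,
    "TEM-15": 360.0,
    "GKTS": 2900.0,
}


def fold_improvement(mic_by_allele, reference="TEM-1"):
    """Return each allele's MIC divided by the MIC of the reference allele."""
    ref = mic_by_allele[reference]
    return {allele: value / ref for allele, value in mic_by_allele.items()}


fold = fold_improvement(mic)
for allele, value in fold.items():
    print(f"{allele:8s} {value:10.3g}")
```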

Figure 4: Structure of TEM-1 β-lactamase with sites of GKTS mutations indicated. Red stick models indicate sites of GKTS mutations (A42, E104, M182, and G238). Sulfate located within the binding site is indicated in blue spheres. Structure taken from pdb:1BTL.

Since Stemmer’s study, β-lactamase has often been reused as a model gene for directed evolution studies due to its clinical relevance and ease of selection. Because it confers phenotypic resistance in proportion to its hydrolyzing activity on a given substrate, β-lactamase provides a selectable analog for protein fitness on the phenotype of the organism. Studies have evolved β-lactamase towards alternate substrates of β-lactam antibiotics as well as recapitulated Stemmer’s original results45,63,68,71-77.

Strikingly, in all published cases no evolved enzyme has conferred increased cefotaxime resistance relative to Stemmer’s original results.

Because of its familiarity, the structural contributions of each mutation of the GKTS allele have also been well characterized. G238S is thought to be the mutation that most directly affects the active site, widening it by 2.8 Å68. This mutation destabilizes the protein (ΔΔG = -1.94 kcal/mol)70. E104K is also slightly destabilizing (ΔΔG = -0.22 kcal/mol). These mutations are commonly found together78, with the allele containing both termed TEM-15. This combination of mutations has roughly additive destabilization (TEM-15 ΔΔG = -2.24 kcal/mol). TEM-15 has highly improved activity towards cefotaxime relative to TEM-1. The addition of M182T commonly occurs after the gain of these two mutations and is thought to serve as a global stabilizing mutation, in this case offsetting the destabilization of the TEM-15 mutations79. The A42G mutation is less well characterized and appears to be the least common mutation, with several studies not reporting its acquisition68,73-77,80. A42G may counteract loop reorganization due to the G238S mutation68, but the relative rarity of this mutation among directed evolution studies may imply tradeoffs to its acquisition.

Therefore the β-lactamase evolutionary system is very well studied due to its high degree of improvement in evolved activity and convergent endpoint. In one case, Weinreich et al. have provided a thorough description of all combinations of mutants leading to the final GKTS allele45. This paper makes the assumption that β-lactamase only evolves towards increasing activity and thus describes the small proportion of evolutionary pathways that TEM-1 may take in reaching the final allele GKTS. Among 2^5 = 32 combinations of mutations with 5! = 120 possible pathways of the four amino acid mutations and a promoter mutation, Weinreich et al. find that 18 pathways display monotonically increasing resistance (Figure 5). By studying this four-amino-acid landscape, Weinreich et al. conclude that this adaptive β-lactamase landscape is single-peaked with a global maximum at the GKTS allele.

Figure 5: Diagram of evolutionary pathways with monotonically increasing fitness during evolution of GKTS from TEM-1, from Weinreich et al. Figure taken from Weinreich et al. (Darwinian Evolution Can Follow Only Very Few Mutational Paths to Fitter Proteins. Daniel M. Weinreich, Nigel F. Delaney, Mark A. DePristo, and Daniel L. Hartl. Science 7 April 2006: 312 (5770), 111-114. [DOI:10.1126/science.1123539]). Reprinted with permission from AAAS.
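The counting argument (five mutations give 5! = 120 orderings, of which only some increase resistance at every step) can be sketched as follows. This is an illustrative sketch, not the analysis of Weinreich et al.: the two landscapes additive and one_valley are hypothetical toy functions rather than measured resistance values, and "promoter" is a placeholder label for the promoter mutation.

```python
from itertools import permutations

MUTATIONS = ("promoter", "A42G", "E104K", "M182T", "G238S")


def count_accessible_paths(fitness, mutations=MUTATIONS):
    """Count orderings of `mutations` in which every step strictly increases fitness.

    `fitness` is a callable mapping a frozenset of acquired mutations to a
    resistance value. Returns (accessible, total).
    """
    accessible = 0
    total = 0
    for order in permutations(mutations):
        total += 1
        genotype = frozenset()
        monotonic = True
        for mutation in order:
            nxt = genotype | {mutation}
            if fitness(nxt) <= fitness(genotype):
                monotonic = False
                break
            genotype = nxt
        accessible += monotonic
    return accessible, total


def additive(genotype):
    """Hypothetical landscape: every mutation adds one unit of fitness."""
    return len(genotype)


def one_valley(genotype):
    """Hypothetical landscape: one mutation is deleterious when acquired alone."""
    return -1 if genotype == frozenset({"M182T"}) else len(genotype)


print(count_accessible_paths(additive))
print(count_accessible_paths(one_valley))
```

In the toy landscape with a single deleterious first step, every ordering that begins with that mutation is excluded, which illustrates how sign epistasis reduces the number of selectively accessible trajectories.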

Two high-throughput fitness landscapes have also been produced for TEM-1 β-lactamase on amoxicillin and ampicillin19,23, providing the early groundwork analysis of the broader fitness effects of mutations on this gene and its evolutionary potential. However, these data only represent single amino acid substitutions, covering a single slice of the entire fitness landscape and do not describe the shift of the landscape with the gain of additional mutations. The high degree of study, utility of knowledge about β-lactamase with regards to clinical use, and applicability to broader evolutionary principles make β-lactamase an excellent model system for directed evolution studies.

Directed evolution informed by landscape ruggedness

The ability of directed evolution to reach more distinct and potentially fitter peaks remains a prevalent and unsolved question. Improvement of directed evolution techniques has direct practical benefits in protein engineering, which often encounters diminishing returns as a peak is climbed12. As discussed, studies to date typically increase selection pressure by selecting for fitter variants at each round of evolution. Others have attempted starting at a variety of starting points to “seed” distinct evolutionary pathways to reach a variety of fitness maxima81. Finally, stochastic strategies such as neutral drift have been attempted with mixed success54-57. None of these strategies have demonstrated significant improvement beyond existing methods, especially in the model system of β-lactamase. Nor have any studies systematically evaluated the effects of selection strategies for improving the efficiency or diversity generated by directed evolution.

Given our knowledge about fitness landscape models, we propose a more informed,

systematic approach to directed evolution in a landscape with many peaks. Since epistasis is prevalent, more complex selection dynamics may be applied to directed evolution to reach a possibly distinct and fitter maximum, as will be discussed in Chapter 3.

The combination of data of a higher dimensional fitness landscape then provides novel insights to the processes of evolution, such as landscape ruggedness, and can be used to inform directed evolution strategies in new ways. Understanding the shifting nature of the fitness landscape during evolution leads to understanding of evolutionary constraints. We can then implement protein engineering strategies to bypass or minimize these evolutionary constraints. Alternatively, we can take advantage of these constraints to limit the evolution of antibiotic resistance in the clinic by implementing evolutionarily informed strategies of antibiotic prescription (see Chapter 4). By providing these data, we develop a more comprehensive picture of evolutionary adaptation and the unexplored possibilities of the fitness landscape made accessible by applying complex selection strategies and taking advantage of epistatic interactions.

Chapter 2: Shifting fitness and epistatic landscapes reflect tradeoffs along an evolutionary pathway

Introduction

Proteins often evolve to serve new roles. Such repurposing can come at the expense of the original function. For instance, different β-lactamases provide different levels of resistance to β-lactam antibiotics82. Selective pressure for resistance to one class of β-lactam may decrease resistance to a second class. Because selective pressures can change over time, the selective history and evolutionary pathways of protein function may be complex65,83. A function not experiencing selective pressure is subject to genetic drift, an important driver of evolution. By alleviating selective pressure for an original function, random alleles relative to the unselected function may become fixed in a population, offering a variety of new evolutionary pathways from which new functions might evolve54,55,84. However, we lack an extensive, systematic study of the changes in fitness and adaptability of a protein undergoing drift and the functional tradeoffs that necessarily result, especially with regards to a protein’s original function. To better understand the evolutionary compromises inherent in adaptation, we used protein fitness landscapes to extensively quantify functional tradeoffs and the effects of adaptation on the prevalence of epistasis.

The advent of deep sequencing has provided the ability for extensive studies of the effect of mutation on function and fitness for a single gene or protein15,85. Protein fitness landscapes provide a description of the effects of mutation on protein function or the phenotype they provide. Most studies of protein fitness landscapes have focused on

the effects of single mutations in a set genetic background, characterizing only the first possible evolutionary steps from a given allele. However, the non-additive effects of mutations (i.e. epistasis) give rise to rugged landscapes, making the effect of multiple mutations difficult to predict from the effects of individual mutations4,31,35,40,86,87.

Intragenic epistasis is believed to be enriched during adaptive evolution31,88, but the evidence for this enrichment mostly comes from epistatic interactions between adaptive mutations or through homolog comparisons rather than a systematic study of epistasis throughout the protein along an adaptive pathway. The few large scale studies of protein epistatic landscapes19,39,41,89-91 were not designed to globally address epistasis in the context of adaptive mutations and have been limited to pairwise epistasis. With one exception19, these studies have not examined mutations throughout an entire protein in a physiological setting. Other studies42 relied on statistical inference of epistasis, which is subject to bias43. To best capture the relationships between adaptation, epistasis and functional tradeoffs, stacks of physiological fitness landscapes of full-length genes involving a series of alleles along an adaptive pathway must be analyzed and compared.

Here, using the TEM-1 β-lactamase gene, we examine how protein fitness landscapes change with respect to the original function as adaptive mutations for a new function accumulate. We also investigate how the prevalence and types of epistasis change along an adaptive pathway.

TEM-1 is highly optimized to provide penicillin resistance to bacteria but has nearly no ability to confer cefotaxime resistance. TEM-17 (E104K), TEM-19 (G238S), and TEM-15 (E104K/G238S) are clinically isolated alleles of TEM-1 with the indicated mutations78. These mutations confer increased cefotaxime resistance and exhibit positive

epistasis. E104K and G238S individually confer 4- and 8-fold increases in cefotaxime resistance, respectively, but combined confer a 128-fold increase92. Improved resistance results from active site changes that synergistically increase catalytic activity on cefotaxime, but this adaptation comes at the expense of penicillinase activity and thermodynamic stability70. In particular, the G238S mutation causes the largest increase in cefotaximase activity, the largest decrease in penicillinase activity, and the largest decrease in stability (ΔΔG = -1.94 kcal/mol)70. E104K is only slightly destabilizing (ΔΔG = -0.22 kcal/mol), and the combination of the two mutations is approximately additive in terms of their effect on stability (TEM-15 ΔΔG = -2.24 kcal/mol).

[Figure 6 plot. Series: E104K and G238S on ampicillin, E104K and G238S on cefotaxime. x-axis: number of mutations (0, 1, 2). Left y-axis: ampicillin resistance relative to TEM-1. Right y-axis: cefotaxime resistance relative to TEM-1. Both y-axes are logarithmic, from 10^-1 to 10^1.]

Figure 6: Adaptive pathway of TEM-1 to TEM-15 The gain of either mutation E104K or G238S decreases the resistance provided by the allele to ampicillin, the originally optimized substrate (left y-axis, green lines). However, these mutations are adaptive in the context of cefotaxime (right y-axis, blue lines), especially in tandem.

Since TEM-1 is highly specialized for penicillin hydrolysis, these adaptive mutations for cefotaxime resistance expose the allele to risk for loss in the capacity to provide resistance to penicillins such as ampicillin (Amp) (Figure 6). Here, we examine these functional tradeoffs by quantifying how the protein fitness landscape for ampicillin

resistance changes along the evolutionary pathway from TEM-1 to TEM-15. This analysis describes drift in ampicillin resistance during the evolution of cefotaxime resistance because resistance to ampicillin is not directly selected for. Since either mutation can occur first45 in the evolutionary pathway to TEM-15, we characterized the fitness landscapes along both possible evolutionary trajectories.

Fitness conferred by antibiotic-resistant alleles can be measured through growth competition experiments in the presence of the antibiotic; however, the fitness values depend greatly on the concentration of antibiotic used, and the method cannot distinguish fitness differences among alleles conferring antibiotic resistance far above or far below the level of resistance required for growth. We circumvent these limitations by measuring the effect of mutations on TEM-1’s ability to confer Amp resistance using a synthetic-biology-based method that quantifies the protein’s underlying fitness landscape and thus its intrinsic evolutionary potential with respect to its primary cellular function23. This method combines high-throughput site-directed mutagenesis93, a band-pass genetic circuit to partition alleles based on fitness94, and deep sequencing to assign fitness values23 (Figure 7). Although the resulting protein fitness landscape is the major determinant of an organismal fitness landscape for growth of the bacteria in the presence of the antibiotic19, the two types of landscapes are not equivalent. However, unlike most previous large-scale studies of protein epistatic landscapes, our landscape is determined in a physiological setting and includes a mutation’s effect on protein specific activity, protein cellular abundance, and potentially other factors arising from the native cellular context. Although synonymous mutations can have small fitness effects23 in TEM-1, here we average the effect of synonymous mutations and measure protein fitnesses.

Library generation: Comprehensive site saturation mutagenesis over the length of the β-lactamase gene and transformation into band-pass E. coli.

Selection: Selection of libraries in vivo under band-pass conditions on a range of ampicillin concentrations (Plate 1 through Plate 13, in µg/ml). Band-pass partitions cells into sublibraries according to their fitness. Plate 1 selects only the least fit alleles, with selected fitness increasing up to plate 13, which partitions the most fit alleles.

Amplification: Barcoded PCR of each library according to selected plate (barcodes BC1 through BC13).

PacBio deep sequencing: Pooling and deep sequencing with 3-pass circular consensus.

MATLAB analysis: Custom scripts written to filter for quality, length, observation bias, and other errors. Mutations aligned and identified. Fitness, w, calculated from barcode information for each single codon mutation with 5+ reads (Mutation A, wA; Mutation B, wB; ... Mutation N, wN).

Figure 7: Overview of methodology used to measure protein fitness.

Methods

Mutagenesis

Comprehensive saturation mutagenesis of TEM-17, TEM-19, and TEM-15 was performed as previously described93 for TEM-1. Mutagenic oligonucleotides containing the degenerate codon NNN were targeted to every codon of the genes. We designed the process to create a library consisting of all 19 × 286 = 5434 mutants that differ from the template gene by only one amino acid substitution.

Selection

Selection was performed using the band-pass genetic circuit described previously23 with the following exceptions. We used plasmid pTS40 instead of pTS42. Plasmid pTS42 is identical to pTS40 except that a small section of inconsequential DNA between the chloramphenicol resistance gene and the CloDF13 has been removed. We used 10 µg/ml tetracycline in the band-pass selection experiments with cefotaxime. Ampicillin selections spanned 13 plates doubling in concentration from 0.25 µg/ml to 1024 µg/ml ampicillin, while cefotaxime selections spanned 7 plates doubling from 0.01 µg/ml to 0.64 µg/ml cefotaxime.
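As a quick consistency check, the plate concentrations follow from the starting concentration and the number of doublings. The helper name doubling_series below is hypothetical and not part of the original protocol.

```python
def doubling_series(start, n_plates):
    """Concentrations for n_plates plates, each double the previous one."""
    return [start * 2 ** k for k in range(n_plates)]


ampicillin = doubling_series(0.25, 13)  # µg/ml, plates 1 to 13
cefotaxime = doubling_series(0.01, 7)   # µg/ml, plates 1 to 7

print(ampicillin)
print(cefotaxime)
```

Twelve doublings from 0.25 µg/ml give 1024 µg/ml, and six doublings from 0.01 µg/ml give 0.64 µg/ml, matching the ranges stated above.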

Deep sequencing

Deep sequencing was performed using amplicons generated from plasmid DNA isolated from each swept selection plate as described23. In this study, however, 3 bp barcodes indicating the plate were added on each side of the amplicon to identify the plate from which the amplicon originated. PacBio deep sequencing was performed on these amplicons and analyzed using a three-pass circular consensus criterion.

Data analysis

We used custom MATLAB scripts to align, analyze, and quantify reads and amino acid mutation composition (synonymous codons were grouped together). Reads were filtered for quality score (reported average quality score > 30, or average reported probability of error = 10^-3), length (length of read less than 1100 bp but more than 930 bp), and quality of alignment to the reference gene (entirety of reference gene aligned to read) and the barcode (perfect match accepted only). If insertions or deletions were encountered within 3 bp of a substitution, the read was discarded due to possible misalignment of a mutation and ambiguity of position. Other cases of insertions and deletions were assumed to be sequencing errors. Each read was then aligned to the reference gene as well as each barcode to identify and catalog mutations. Only reads with a full alignment to the reference and only containing one codon substitution were accepted for analysis.
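The acceptance criteria above can be summarized as a single predicate. The sketch below is written in Python for illustration and is not the MATLAB code used in this work; the Read record and all of its field names are hypothetical stand-ins for quantities that would come from the alignment step.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical per-read summary produced by an upstream alignment step."""
    mean_quality: float             # reported average quality score
    length: int                     # read length in bp
    full_alignment: bool            # entire reference gene aligned to the read
    barcode_perfect: bool           # barcode matched exactly
    codon_substitutions: int        # codon substitutions relative to the reference
    indel_near_substitution: bool   # indel within 3 bp of a substitution


def passes_filters(read: Read) -> bool:
    """Return True if the read meets every acceptance criterion in the text."""
    return (
        read.mean_quality > 30
        and 930 < read.length < 1100
        and read.full_alignment
        and read.barcode_perfect
        and not read.indel_near_substitution
        and read.codon_substitutions == 1
    )


good = Read(35.0, 1000, True, True, 1, False)
print(passes_filters(good))
```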

Because band-pass selection selects for cells with activity at or near a plated concentration of ampicillin (or cefotaxime) and not at lower concentrations, growth at each ampicillin concentration determines the fitness of each mutant. Selection and growth of cells leads to more DNA copies of that mutant, and therefore deep sequencing of the selected mutant library with identification of the associated ampicillin concentrations via barcodes may be used to determine fitness for each mutant. We tabulated the number of sequencing reads (the counts) for each allele at each ampicillin or cefotaxime concentration. Next, all counts of sequences for a given plate were adjusted to account for the fact that the number of reads originating from each plate might over- or under-represent the number of alleles conferring growth on that plate (as determined from colony counts on the plate). We also applied this adjustment to the previously published23 fitness measurements of TEM-1, resulting in minor changes in the fitness values (Figure 8).

[Figure 8 plot. x-axis: TEM-1 fitness (Firnberg et al.), 0 to 2. y-axis: TEM-1 fitness (this paper), 0 to 2.]

Figure 8: Comparison of the TEM-1 fitness values from Firnberg et al., 2014 and those used in this work. The TEM-1 fitness values in this paper were derived from those in Firnberg et al. but accounted for observation bias in the deep sequencing (see Experimental Procedures). The red line is y=x.

To calculate a fitness, we identified the plate with the highest adjusted counts for each allele, set a window including that plate and the four adjacent plates, two on either side (five total), and determined the fitness using the PacBio deep sequencing read counts from these five plates as described previously23. The unnormalized fitness value f of mutant allele i is calculated as the read-count-weighted average of the log2 ampicillin concentration over the plates p using the following equation, where c represents the number of PacBio read counts and a represents the concentration of ampicillin on plate p in µg/ml (as identified from the DNA barcode):

\[ f_i = \frac{\sum_{p=1}^{13} c_{i,p}\,\log_2(a_p)}{\sum_{p=1}^{13} c_{i,p}} \qquad \text{(Equation 1)} \]

The unnormalized fitness f represents the resistance of each mutant to ampicillin.

To calculate a fitness $w_i$ normalized to the wildtype fitness $f_{wt}$, the following equation is used:

\[ w_i = \frac{2^{f_i}}{2^{f_{wt}}} \qquad \text{(Equation 2)} \]

The normalized fitness w is unitless and defined relative to the resistance of the wildtype allele. This value directly represents the fraction of resistance conferred relative to the wildtype allele. Except where noted, all fitness values are relative to that of wildtype TEM-1, which was set to a value of 1.00.
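Equations 1 and 2 can be sketched in a few lines. This is an illustrative Python sketch and not the MATLAB implementation used here: the function names and read counts are hypothetical, and the plate-window selection and count adjustment described above are omitted, so the sums simply run over whatever plates are supplied.

```python
import math


def unnormalized_fitness(counts, concentrations):
    """Equation 1: read-count-weighted mean of log2(concentration)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("allele has no reads")
    weighted = sum(c * math.log2(a) for c, a in zip(counts, concentrations))
    return weighted / total


def normalized_fitness(f_i, f_wt):
    """Equation 2: fitness relative to wildtype, 2**f_i / 2**f_wt."""
    return 2 ** f_i / 2 ** f_wt


# The 13 ampicillin plates, 0.25 to 1024 µg/ml.
plates = [0.25 * 2 ** k for k in range(13)]

# Hypothetical read counts per plate, for illustration only.
wt_counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 20, 5, 0]
mutant_counts = [0, 0, 0, 0, 0, 0, 0, 5, 20, 5, 0, 0, 0]

f_wt = unnormalized_fitness(wt_counts, plates)
f_mut = unnormalized_fitness(mutant_counts, plates)
w = normalized_fitness(f_mut, f_wt)
print(f_wt, f_mut, w)
```

In this toy example the mutant's read profile is centred two plates below that of the wildtype, that is, at a four-fold lower ampicillin concentration, so its normalized fitness is 0.25.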

In cases of finding two clusters of fitness values (using K-means clustering), preference was given to clusters of counts containing more than one codon usage. A fitness peak was determined iteratively: if a window around the plate with the highest number of counts did not contain 5 counts, the next peak was found and evaluated. In this way only alleles with 5 or more counts (before adjustment) were considered. The counts were also used to estimate an error in the fitness measurement as described23.

Epistasis values were calculated using Equations 3 through 7. Pairwise epistasis between mutation A with fitness $w_A$ and mutation B with fitness $w_B$ can be defined as

\[ \varepsilon_{A \cdot B} = \log_{10}\!\left(\frac{w_{AB}\,w_0}{w_A\,w_B}\right) \qquad \text{(Equation 3)} \]

in which $w_0$ is the wildtype fitness95. Generalizing for epistasis of order N with mutations i,j,k…n:


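$$ \varepsilon_{ijk \ldots n} = \log_{10} \frac{w_{ijk \ldots n}\, w_0^{N-1}}{\prod_{m \in \{i,j,k,\ldots,n\}} w_m} \qquad \text{(Equation 4)} $$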

Thus, tertiary epistasis between mutations A, B, and X is:

$$ \varepsilon_{A \bullet B \bullet X} = \log_{10} \frac{w_{ABX}\, w_0^{2}}{w_A\, w_B\, w_X} \qquad \text{(Equation 5)} $$

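Tertiary epistasis can also be calculated by summing the appropriate pairwise epistasis terms:

$$ \varepsilon_{i \bullet j \bullet k} = \varepsilon_{i \bullet j} + \varepsilon_{i \bullet k} + \varepsilon_{j \bullet k \mid i} = \varepsilon_{i \bullet j} + \varepsilon_{j \bullet k} + \varepsilon_{i \bullet k \mid j} = \varepsilon_{i \bullet k} + \varepsilon_{j \bullet k} + \varepsilon_{i \bullet j \mid k} \qquad \text{(Equation 6)} $$

in which $\varepsilon_{i \bullet j \mid k}$ refers to the pairwise epistasis between i and j in the context of the allele containing mutation k. Thus, epistasis effects among E104K, G238S, and a third mutation (X) can be characterized by six pairwise epistasis terms and one tertiary epistasis term.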

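Among three mutations, the amount a third mutation changes the epistasis is the same regardless of which mutation is considered the third mutation, as can be seen by this manipulation of Equation 6:

$$ \Delta\varepsilon_3 = \varepsilon_{i \bullet j \mid k} - \varepsilon_{i \bullet j} = \varepsilon_{i \bullet k \mid j} - \varepsilon_{i \bullet k} = \varepsilon_{j \bullet k \mid i} - \varepsilon_{j \bullet k} \qquad \text{(Equation 7)} $$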

The criterion for designating a pair of mutations as exhibiting significant positive or negative epistasis was that the magnitude of the epistasis value exceeded twice the error calculated for that epistasis value, in the positive or negative direction, respectively.
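The epistasis definitions and the significance criterion can be checked numerically with a short Python sketch. The fitness values for E104K, G238S, and TEM-15 are the rounded values quoted in the Results (0.54, 0.22, and 0.29); the fitness values involving the third mutation X, the error value, and the function names are hypothetical and used only for illustration.

```python
import math

def pairwise_epistasis(w_ab, w_a, w_b, w_0=1.0):
    """Equation 3: log10 of observed over expected double-mutant fitness."""
    return math.log10((w_ab * w_0) / (w_a * w_b))

def tertiary_epistasis(w_abx, w_a, w_b, w_x, w_0=1.0):
    """Equation 5: third-order epistasis among A, B, and X."""
    return math.log10((w_abx * w_0 ** 2) / (w_a * w_b * w_x))

def is_significant(eps, err):
    """Epistasis is called significant if its magnitude exceeds twice the error."""
    return abs(eps) > 2.0 * err

# A = E104K, B = G238S; rounded fitness values relative to TEM-1.
w_0, w_a, w_b, w_ab = 1.0, 0.54, 0.22, 0.29
# Hypothetical fitness values for alleles containing a third mutation X.
w_x, w_ax, w_bx, w_abx = 0.80, 0.45, 0.15, 0.18

eps_ab = pairwise_epistasis(w_ab, w_a, w_b, w_0)
eps_ax = pairwise_epistasis(w_ax, w_a, w_x, w_0)
eps_bx = pairwise_epistasis(w_bx, w_b, w_x, w_0)
# Conditional terms reuse Equation 3 with the shared mutation as the reference allele.
eps_bx_given_a = pairwise_epistasis(w_abx, w_ab, w_ax, w_a)
eps_ax_given_b = pairwise_epistasis(w_abx, w_ab, w_bx, w_b)
eps_ab_given_x = pairwise_epistasis(w_abx, w_ax, w_bx, w_x)
eps_abx = tertiary_epistasis(w_abx, w_a, w_b, w_x, w_0)

# Equation 6: three equivalent decompositions of the tertiary epistasis.
forms = [
    eps_ab + eps_ax + eps_bx_given_a,
    eps_ab + eps_bx + eps_ax_given_b,
    eps_ax + eps_bx + eps_ab_given_x,
]
# Equation 7: the change in epistasis is the same whichever mutation is "third".
deltas = [
    eps_ab_given_x - eps_ab,
    eps_ax_given_b - eps_ax,
    eps_bx_given_a - eps_bx,
]
print(f"eps_AB={eps_ab:.2f}  eps_ABX={eps_abx:.2f}  delta_eps3={deltas[0]:.2f}")
print("significant:", is_significant(eps_ab, err=0.05))  # err is hypothetical
```

With the rounded fitness values the pairwise term between E104K and G238S comes out near 0.39, close to the reported value of 0.40. The three entries of `forms` agree with the tertiary epistasis and the three entries of `deltas` agree with each other, as Equations 6 and 7 require.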

Sign epistasis was determined solely by fitness measurements29. In this case, sign epistasis of a mutation X between alleles A and B was counted as positive (+1) if the fitness of X in the background of A was significantly detrimental (fitness $w_{AX}$ was less than the wildtype value $w_0$ minus twice the fitness error), while the fitness of X in the background of B was significantly advantageous (fitness $w_{BX}$ was greater than the wildtype value $w_0$ plus twice the fitness error). Negative sign epistasis (-1) is the inverse of this case. The frequency of sign epistasis between alleles A and B, $S_{A \bullet B}$, was found by summing all occurrences of sign epistasis and dividing by the size of the total data set.
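The classification rule for sign epistasis can be sketched in Python as below. The sketch assumes that the fitness of X in each background is expressed relative to a reference value of 1.0 for that comparison, and it reports positive and negative frequencies separately; the function names and the example records are hypothetical.

```python
def sign_epistasis(w_x_in_a, err_a, w_x_in_b, err_b, ref=1.0):
    """Return +1, -1, or 0 for a mutation X compared between alleles A and B.

    +1: X is significantly deleterious in A and significantly beneficial in B.
    -1: the inverse case. "Significantly" means beyond twice the fitness error.
    """
    deleterious_a = w_x_in_a < ref - 2.0 * err_a
    beneficial_a = w_x_in_a > ref + 2.0 * err_a
    deleterious_b = w_x_in_b < ref - 2.0 * err_b
    beneficial_b = w_x_in_b > ref + 2.0 * err_b
    if deleterious_a and beneficial_b:
        return 1
    if beneficial_a and deleterious_b:
        return -1
    return 0

def sign_epistasis_frequency(records):
    """Return (positive, negative) frequencies over the whole data set."""
    calls = [sign_epistasis(*r) for r in records]
    n = len(calls)
    return calls.count(1) / n, calls.count(-1) / n

# Hypothetical records: (w of X in A, error, w of X in B, error).
records = [
    (0.5, 0.1, 1.5, 0.1),    # deleterious in A, beneficial in B -> +1
    (1.5, 0.1, 0.5, 0.1),    # beneficial in A, deleterious in B -> -1
    (0.9, 0.1, 1.1, 0.1),    # within twice the error in both    ->  0
    (1.0, 0.05, 1.0, 0.05),  # neutral in both                   ->  0
]
print(sign_epistasis_frequency(records))
```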

The criteria for reciprocal sign epistasis, $R_{A \bullet B}$, incorporated identical error calculation but were further specified by the cases illustrated in Figure 9. $R_{A \bullet B}$ was defined as positive (+1) if the combination of mutations A and B constituted a local maximum (synergistic effect). Negative $R_{A \bullet B}$ (-1) is the inverse case (antagonistic effect).

[Figure 9 panels: +R E104K•X; +R G238S•X; +R G238S•X|E104K; -R E104K•X|G238S; -R E104K•G238S|X; +R E104K•G238S|X. Each panel connects four alleles drawn from TEM-1, E104K, G238S, X, E104K+X, G238S+X, TEM-15, and TEM-15+X.]

Figure 9: Illustration of all possible reciprocal sign epistasis, R, in our system. Arrows point towards increasing fitness between two alleles.
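One way to express the reciprocal sign epistasis call for a single square of alleles is sketched below. This is an interpretation of the definition in the text (the double mutant is a local maximum for +1, the inverse for -1) combined with the conventional requirement that both mutations change sign. The use of a single error value for every comparison, the function name, and the example fitness values are assumptions.

```python
def reciprocal_sign_epistasis(w_0, w_a, w_b, w_ab, err):
    """Return +1, -1, or 0 for the square of alleles 0, A, B, and A+B.

    +1: both single mutants are significantly less fit than both the starting
        allele and the double mutant (the double mutant is a local maximum
        separated from the starting allele by a valley).
    -1: the inverse case (both single mutants are local maxima).
    """
    margin = 2.0 * err

    def lower(x, y):
        # True if x is below y by more than twice the error.
        return x < y - margin

    if lower(w_a, w_0) and lower(w_b, w_0) and lower(w_a, w_ab) and lower(w_b, w_ab):
        return 1
    if lower(w_0, w_a) and lower(w_0, w_b) and lower(w_ab, w_a) and lower(w_ab, w_b):
        return -1
    return 0

# Hypothetical fitness values (w_0, w_A, w_B, w_AB, error).
print(reciprocal_sign_epistasis(1.0, 0.5, 0.6, 1.2, 0.05))  # valley between 0 and A+B
print(reciprocal_sign_epistasis(1.0, 1.5, 1.4, 0.8, 0.05))  # singles are peaks
print(reciprocal_sign_epistasis(1.0, 0.5, 1.2, 1.5, 0.05))  # only one sign change
```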


Results and Discussion

Fitness landscapes along an adaptive pathway

Our previous study quantified the effect on fitness (with respect to Amp resistance) of 95.6% (5212/5453) of the possible single amino acid substitutions in the TEM-1 protein23. Here, we quantified the fitness effects on Amp resistance of 39% of all mutations in TEM-17, 50% of all mutations in TEM-19, and 45% of all mutations in TEM-15 using the same approach (Figure 10). As expected based on previous minimum inhibitory concentration experiments with these alleles92, G238S caused the greater reduction in fitness, which could be ameliorated somewhat by the E104K mutation. Fitness values relative to that of TEM-1 were 0.54 (TEM-17), 0.22 (TEM-19), and 0.29 (TEM-15). A mutation’s effect on TEM-1 is much more predictive of the mutation’s effect on TEM-17 than on TEM-19 (Figure 10). The high frequency of beneficial mutations in TEM-19 is striking (12.7% of all mutations improve fitness >50% relative to TEM-19) and illustrates the prevalence of compensatory mutations in the context of deleterious mutations (i.e. protein robustness). These compensatory mutations are preferentially drawn from mutations with small effects on TEM-1 fitness. In contrast, G238S tends to magnify the negative effects of deleterious mutations, which may be quantified by calculating the epistasis between sets of mutations.


[Figure 10 panels: histograms (Count versus fitness) for TEM-1, TEM-17 (E104K), TEM-19 (G238S), and TEM-15 (E104K+G238S), connected by the adaptive mutations + E104K and + G238S; scatter plots of TEM-17 Fitness, TEM-19 Fitness, and TEM-15 Fitness versus TEM-1 Fitness on logarithmic axes (0.001 to 1).]

Figure 10: The landscapes of genetic drift along an adaptive pathway. The gain of the adaptive mutations (E104K or G238S) for cefotaxime resistance is indicated along with the distribution of fitness effects on ampicillin resistance in that context. Histograms display fitness values relative to the particular allele being characterized. The relative fitness of TEM-1 is indicated along with the percent of mutants with fitness values >50% above that of the genetic background. Scatter plots show fitness values relative to TEM-1 and illustrate how well the effect of the mutation on TEM-1 can predict the effect of the mutation on the indicated allele. The lines indicate the expected value assuming no epistasis.

Figure 11 shows these seven epistatic landscapes. In our analysis we only include epistasis values for which $w_X$ (the fitness of TEM-1 containing mutation X) is greater than 0.02 to avoid an artifactual increase in epistasis values due to the lower limit in measuring fitness. Pervasive epistasis involving the adaptive mutations is apparent, and the extent of mutations exhibiting epistasis increases as the adaptive mutations accumulate. E104K and G238S exhibit positive epistasis (pairwise epistasis $\varepsilon_{E104K \bullet G238S}$ = 0.40). This epistasis is 33% less than that observed with cefotaxime ($\varepsilon_{E104K \bullet G238S}$ = 0.60). The extent and magnitude of pairwise epistasis involving G238S is much greater than that involving E104K (Figure 11A). $\varepsilon_{G238S \bullet X}$ tends towards negative epistasis values as $w_X$ decreases, whereas $\varepsilon_{E104K \bullet X}$ is largely independent of $w_X$. In other words, when mutation X has a large negative effect on the fitness of TEM-1, its effect tends to be even more negative in the context of G238S, a destabilizing mutation.

This result, like those of previous studies31,39,41,84, fits the threshold robustness model describing protein fitness landscapes84,96,97. This model posits a stability margin that buffers the effect of destabilizing mutations. In the cell, chaperones can also provide a buffer against the destabilizing effects of mutations by compensating for the mutations’ negative effect on protein abundance23,55. However, once that stability margin is exhausted, the deleterious effects of destabilizing mutations are fully realized, resulting in a landscape that is inherently dominated by negative epistasis.

[Figure 11 panels: (A) Pairwise Epistasis: εE104K•X, εG238S•X, εE104K•X|G238S, εG238S•X|E104K, and εE104K•G238S|X, each plotted against TEM-1 Fitness, with εE104K•G238S = 0.40 indicated. (B) Tertiary Epistasis: εE104K•G238S•X and Δε3, each plotted against TEM-1 Fitness. Epistasis axes span -2 to 2; TEM-1 Fitness axes span 0 to 2.]

Figure 11: Distribution of pairwise and tertiary epistasis values along an adaptive pathway. (A) The distribution of the six possible pairwise epistasis values between two common mutations (E104K and G238S) and a third mutation (X) is shown as a function of the effect of mutation X in TEM-1. (B) Tertiary epistasis ($\varepsilon_{E104K \bullet G238S \bullet X}$) and the effect of adding a third mutation on the epistasis ($\Delta\varepsilon_3$) are shown as a function of the effect of mutation X on TEM-1. Pie graphs indicate the portion of alleles exhibiting positive epistasis (blue), negative epistasis (red), or no epistasis (white). The criterion for assigning an allele as exhibiting epistasis is that the absolute value of the epistasis must exceed twice the value of the uncertainty in the epistasis measurement. Scatter plots exclude a very small number of alleles that exhibited epistasis values >2 or <-2.


Although this model fits for $\varepsilon_{G238S \bullet X}$ when considering nearly neutral and moderately deleterious mutations, the model begins to break down when considering mutations with a severe effect on TEM-1 fitness ($w_X$ < 0.2), which are equally likely to exhibit positive or negative epistasis (Figure 12). The reason for this breakdown of the model may lie with the origin of the deleterious effects. We have previously shown that mutations with a severe effect on TEM-1 fitness ($w_X$ = ~0.05) are deleterious primarily due to the mutations’ effects on specific catalytic activity instead of effects on the protein dose23. Thus, such mutations may not necessarily erode the stability buffer as significantly. Alternatively, positive epistasis with G238S may arise from mutations that decrease stability in the same region of the protein as G238S does. Thus, positive epistasis may manifest if G238 and the mutated residue have a local stabilization interaction whose energy is lost by mutation of either position. Regardless, G238S (but not E104K) possesses the potential to either exacerbate or mitigate severely deleterious mutations, suggesting that severely destabilizing mutations move a protein into an area of the protein fitness landscape with more rugged local topography.


[Figure 12 panels: histogram of TEM-1 Fitness (wX) (Count versus fitness, 0 to 2) with three bins marked: severe (0.02 < wX < 0.2), deleterious (0.2 < wX < 0.5), and neutral (0.8 < wX < 1.2). Below, histograms (Count versus Epistasis, -2 to 2) for each bin, annotated with mean (m) and standard deviation (σ):
εE104K•X: severe m = 0.11, σ = 0.21; deleterious m = 0.04, σ = 0.18; neutral m = 0.03, σ = 0.16.
εG238S•X: severe m = -0.05, σ = 0.58; deleterious m = -0.27, σ = 0.42; neutral m = 0.01, σ = 0.20.
εE104K•G238S•X: severe m = 0.04, σ = 0.51; deleterious m = -0.11, σ = 0.40; neutral m = 0.27, σ = 0.22.]

Figure 12: The distribution of epistatic effects involving G238S depends on the severity of the mutation. Whereas mutations with a milder deleterious effect on TEM-1 tend to exhibit negative epistasis with G238S, mutations with a severe deleterious effect on TEM-1 are equally likely to exhibit positive or negative epistasis with G238S. m, mean; σ, standard deviation.

From the distributions of Figure 11 one can visualize how the tertiary epistasis is composed of the sum of three pairwise epistasis values according to Equation 6. This is seen most easily by summing $\varepsilon_{E104K \bullet G238S}$, $\varepsilon_{E104K \bullet X}$, and $\varepsilon_{G238S \bullet X \mid E104K}$. Since E104K and X exhibit little epistasis and $\varepsilon_{E104K \bullet G238S}$ = +0.40, the tertiary epistasis distribution can be seen to approximate that of $\varepsilon_{G238S \bullet X \mid E104K}$ shifted up about 0.40 (Figure 11). Thus, $\varepsilon_{G238S \bullet X \mid E104K}$ is the pairwise epistasis term that best predicts the shape of the distribution of tertiary epistasis values (Figure 13).


[Figure 13 panels A-E: Tertiary ε (y-axis, -2 to 2) plotted against (A) εE104K•X, (B) εG238S•X, (C) εE104K•X|G238S, (D) εG238S•X|E104K, and (E) εE104K•G238S|X (x-axis, -2 to 2).]

Figure 13: Plot of third-order epistasis versus pairwise epistasis values. Correlation between tertiary epistasis and pairwise epistasis between (A) E104K and X, (B) G238S and X, (C) E104K and X in the context of G238S, (D) G238S and X in the context of E104K, and (E) E104K and G238S in the context of X.

The term $\Delta\varepsilon_3$ is the change in epistasis upon adding a third mutation. The terms $\varepsilon_{A \bullet B \bullet X}$ and $\Delta\varepsilon_3$ reflect different aspects of tertiary epistasis. Although $\varepsilon_{A \bullet B \bullet X}$ provides a measure of the epistasis involving the three residues according to Equation 5, it cannot distinguish between tertiary epistasis originating solely between two mutations and that originating in the complex interaction between all three mutations. The term $\Delta\varepsilon_3$ provides a measure of non-additive fitness effects manifesting from the three mutations collectively and better represents higher order epistatic effects.


In our study of TEM-1, $\Delta\varepsilon_3$ reflects how mutation X influences the positive pairwise epistasis between E104K and G238S (Figure 11B). On average, the effect is negative (median $\Delta\varepsilon_3$ = -0.24) but not enough to erase the positive interaction between E104K and G238S, which is why the tertiary epistasis on average is positive (median $\varepsilon_{E104K \bullet G238S \bullet X}$ = 0.17). Due to the symmetry in Equation 7, the average effect of G238S on the epistasis between E104K and X is equivalently negative, as is the effect of E104K on the epistasis between G238S and X. The term $\Delta\varepsilon_3$ is negative for 87% of all triple mutants (1271/1461) and the median epistasis value for a position in the protein is nearly always negative (Figure 14D). Thus, despite a trend towards positive tertiary epistasis values, the adaptive mutations of TEM-15 brought the protein to a more precarious region of sequence space in which a mutation’s effect tends to be more negative than expected based on the mutation’s effect in TEM-1, TEM-17 or TEM-19. The protein fitness landscape around TEM-15, containing both adaptive mutations, has much steeper drop-offs in fitness than around TEM-1.


[Figure 14 panels: (A) εE104K•X (median), (B) εG238S•X (median), (C) εE104K•G238S•X (median), and (D) Δε3 (median), each plotted against Position in TEM-1 (0 to 250), with the signal sequence, E104K, and G238S marked. (E) Structure regions with highlighted positions 26, 29, 145, 148, 149, 161, 162, 163, 164, 165, 167, 221, 222, 225, 226, 228, 230, 231, 243, 244, 287, and 289; G238 is labeled.]

Figure 14: Epistasis as a function of sequence and structure. (A-D) Median epistasis as a function of position in TEM-1 (using the Ambler consensus number scheme). Median values were calculated only for those positions for which epistasis values were measured for five or more mutations at that position. The lack of a bar can indicate no epistasis but usually means insufficient data to calculate a median value. (E) Selected regions in the structure of TEM-1 showing a tendency for positive (blue) or negative (red) epistasis with G238S as defined by the median epistasis value. The numbers of the positions highlighted are indicated.
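The per-position summary used for Figure 14 can be sketched in Python as below: epistasis values are grouped by residue position and a median is reported only where five or more mutations were measured. The function name and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import median

def median_epistasis_by_position(records, min_mutations=5):
    """Map position -> median epistasis, keeping positions with enough data.

    records: iterable of (position, epistasis value) pairs, one per mutation.
    """
    by_position = defaultdict(list)
    for position, eps in records:
        by_position[position].append(eps)
    return {
        position: median(values)
        for position, values in sorted(by_position.items())
        if len(values) >= min_mutations
    }

# Hypothetical epistasis values with G238S at two positions.
records = [
    (243, 0.1), (243, 0.5), (243, 0.3), (243, 0.4), (243, 0.2),  # five values
    (104, -0.2), (104, -0.1), (104, 0.0), (104, 0.1),            # only four
]
print(median_epistasis_by_position(records))
```

Position 104 is dropped because it has fewer than five measured mutations, which corresponds to a missing bar in the figure.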

A cost of the adaptive mutations is this precariousness. Although this cost is illustrated here with the original function (ampicillin resistance), this topography likely extends to the evolved function (cefotaxime resistance) as well. We measured the fitness effects of mutations on TEM-15 for cefotaxime fitness and found that in general mutations reduced cefotaxime fitness as much as or more than they reduced ampicillin fitness (Figure 15). The median fitness in the presence of ampicillin of single-mutants of TEM-15 was 0.34 relative to TEM-15, while for cefotaxime this value was 0.20. In contrast, the median fitness of single-mutants of TEM-1 in the presence of ampicillin was 0.53, indicating gentler terrain around TEM-1. Most mutations in TEM-15 produced similar fitness effects for both antibiotics (R² = 0.73), especially for deleterious mutations (R² = 0.8). This result suggests that most mutations realize their effects through changes in protein properties common to both substrates (e.g. stability).

[Figure 15 plot: TEM-15 Fitness (cefotaxime) versus TEM-15 Fitness (ampicillin), logarithmic axes.]

Figure 15: Correlation between a mutation’s effect on cefotaxime resistance and the effect on ampicillin resistance for TEM-15. Fitness values are normalized to that of TEM-15. The band-pass filter in the context of cefotaxime tended to allow alleles to confer growth preferentially at a single cefotaxime concentration, hence fitness values in the context of cefotaxime tended towards values defined by the cefotaxime concentration used in the experiment (i.e. the resolution in fitness values tended to 2-fold increments as evidenced by the horizontal banding of points in the figure). The line is y=x.

Structural map of epistasis

We next examined the relationship between epistasis and protein structure. Mutations occurring within about 10 Å of G238S were biased towards exhibiting positive pairwise epistasis with G238S (Figure 16A) and positive tertiary epistasis (Figure 16B), perhaps because proximal mutations cooperatively contribute towards protein stability. This result substantiates the trend seen within smaller protein domains41,89,98 (but see Lunzer et al.88). Mutations exhibiting positive or negative pairwise epistasis with G238S preferentially partitioned to the interior of the protein (Figure 16D). Although only mutations participating in negative tertiary epistasis exhibited this bias (Figure 16E), this primarily reflected a high frequency of positive tertiary epistasis involving mutations at surface exposed residues that did little to disrupt the positive epistasis between E104K and G238S (Figure 16H). Mutations that negatively affected the positive epistasis between E104K and G238S had an even stronger bias to be buried in the structure (Figure 16F). To some extent, residues participating in positive epistasis tended to localize with other residues participating in positive epistasis, and residues participating in negative epistasis localized analogously (Figure 16G,H; Figure 14E).

[Figure 16 panels: (A-C) fraction of alleles versus Distance to G238S (Å, 0 to 30) and (D-F) fraction of alleles versus % solvent-accessible surface area (0 to 100) for εG238S•X, εE104K•G238S•X, and Δε3, with curves for no epistasis, positive epistasis, and negative epistasis. (G, H) Structures of TEM-1 colored by median epistasis (positive, none, negative) for εG238S•X and εE104K•G238S•X, with E104 and G238 labeled.]

Figure 16: The relationship between epistasis and protein structure. (A-C) The relationship between C-alpha distance from G238 to X and epistasis. (D-F) The relationship between solvent-accessible surface area of residue X and epistasis. (G, H) Median epistasis values mapped by color-coding onto the structure of TEM-1. Median values were calculated only for those positions for which epistasis values were measured for five or more mutations at that position. White can indicate no epistasis or insufficient data to calculate a median value. (A, D) depict pairwise epistasis between G238S and X, (B, E) depict tertiary epistasis between G238S, E104K and X, and (C, F) depict $\Delta\varepsilon_3$, the change in epistasis upon adding a third mutation.

Given the localization of positive and negative epistasis in the structure of the protein, it is not too surprising that there was also localization in the primary sequence (Figure 14). Positions 243 and 244 were prone to exhibit strong positive epistasis with G238S, most likely because these positions cooperatively provide stability with G238. Mutations in regions 161-167, 221-231, and the N- and C-termini of the protein (26, 29, 287 and 289) were prone to exhibit strong negative epistasis with G238S. Residues 161-167 compose the first third of the Ω loop, which plays an important role in TEM-1’s catalytic activity and specificity. Positions 221-231 comprise the end of a beta strand, a short alpha helix, and a loop that connects these two interacting structures. The residues at the N- and C-termini include interacting histidines H26 and H289, which help tie down the ends of the two alpha helices that comprise the N- and C-termini of the protein, and I286, which forms a favorable interaction with P226 (i.e. a residue in the aforementioned 221-231 segment). We propose that mutations in these regions (221-231, and the N- and C-termini) tend to have moderate destabilizing effects, but together with G238S they exhaust the inherent stability buffer of the protein, resulting in negative pairwise epistasis with G238S.

The first 23 codons of TEM-1 encode the signal sequence that directs the protein to the periplasmic space of E. coli, where the signal sequence is removed. Although the signal sequence is not part of the mature protein, mutations therein can affect fitness by changing the protein dose resulting from codon usage effects at the beginning of the gene and alterations in export efficiency23. We wondered if epistasis could manifest between mutations in the signal sequence and the E104K/G238S mutations in the mature protein.

We found that the means for $\varepsilon_{E104K \bullet X}$ (0.00 ± 0.28), $\varepsilon_{G238S \bullet X}$ (0.19 ± 0.28), and $\varepsilon_{E104K \bullet G238S \bullet X}$ (0.41 ± 0.23) were statistically different from each other (P < 0.0001, Student’s t-test). Although $\varepsilon_{G238S \bullet X}$ values for the mature protein tended towards negative epistasis, $\varepsilon_{G238S \bullet X}$ values for the signal sequence tended towards positive epistasis (Figure 14B), especially for signal sequence mutations with a negative effect. This could reflect G238S’s suspected capacity to cause protein aggregation45 and the complex relationship between export rates and protein dose. Decreased protein synthesis rates (and, analogously, export rates) can counter-intuitively increase soluble protein levels by decreasing off-pathway aggregation99. Thus, mutations causing decreased protein translation or export might decrease aggregation, thereby increasing the protein dose and resulting in positive epistasis. The mean tertiary epistasis for signal sequence mutations matched $\varepsilon_{E104K \bullet G238S}$ (0.40) (Figure 14C), suggesting that, for the most part, tertiary epistasis involving signal sequence mutations was positive only because the mutations did not affect the positive pairwise epistasis between E104K and G238S.


Sign epistasis along an adaptive pathway

[Figure 17 panels: (A) Frequency of sign epistasis (negative and positive) for S TEM-1•TEM-17, S TEM-1•TEM-19, S TEM-17•TEM-15, and S TEM-19•TEM-15. (B) Frequency of reciprocal sign epistasis for +R E104K•X, +R G238S•X, +R G238S•X|E104K, -R E104K•X|G238S, -R E104K•G238S|X, and +R E104K•G238S|X.]

Figure 17: Frequency of sign epistasis and ruggedness along an adaptive pathway. (A) Frequency of sign epistasis, S, observed between alleles. Sign epistasis is defined as the switch from a mutation X being deleterious to beneficial (positive, blue) or from beneficial to deleterious (negative, red) by the addition of an adaptive mutation (E104K or G238S). For example, $S_{TEM\text{-}1 \bullet TEM\text{-}17}$ is positive if a mutation X was deleterious in TEM-1 but advantageous in TEM-17 (i.e. in the presence of E104K). (B) Fraction of sites demonstrating reciprocal sign epistasis between landscapes.

We further wanted to examine the changes in beneficial mutations between landscapes as caused by evolutionary tradeoffs. The addition or removal of accessible advantageous mutations due to genetic background is termed sign epistasis100. Sign epistasis reflects the direct change in potential advantageous evolutionary pathways between genetic backgrounds. Here, we count a mutation X as exhibiting sign epistasis if it is advantageous in one allele and disadvantageous in a second allele that differs from the first by only one amino acid substitution. Positive sign epistasis occurs when the mutation switches from deleterious to advantageous by the addition of an adaptive mutation (E104K or G238S). Negative sign epistasis is the inverse case. Our analysis indicates that the second adaptive mutation significantly increases the prevalence of negative sign epistasis (Figure 17A). Mutations that are detrimental in TEM-1 but beneficial in TEM-19 have a higher frequency near G238S, especially at R241 and S243, and in the signal sequence (Figure 18B). The Ω-loop (160-178) is a hot spot for mutations exhibiting negative sign epistasis, with a relatively high occurrence of mutations that are advantageous in TEM-17 but deleterious in TEM-15 (Figure 18C). Apparently, E104K’s deleterious effect on ampicillin resistance can be compensated for by these mutations in the nearby Ω-loop but not in the context of G238S. The overall loss of potential adaptive pathways indicated by the prevalence of negative sign epistasis is a cost of adaptation.


[Figure 18 panels: (A) S E104K•X, (B) S G238S•X, (C) S G238S•X|E104K, and (D) S E104K•X|G238S, each plotted from -5 to 5 against Amino acid position (0 to 250).]

Figure 18: Sum of occurrences of observed sign epistasis when comparing fitness landscapes of β-lactamase alleles. Sum of occurrences of sign epistasis, S, between landscapes is tallied against the position in the gene. Sign epistasis is defined as positive (blue) in the case of a mutation’s effect changing from deleterious to beneficial at that position or negative (red) if a mutation’s advantageous effect is lost between given genetic backgrounds.

Reciprocal sign epistasis and ruggedness along an adaptive pathway

Reciprocal sign epistasis is a specialized case of sign epistasis in which multiple fitness peaks exist between alleles (Figure 9)100. This term is a direct measure of ruggedness between genetic backgrounds. As with sign epistasis, negative reciprocal sign epistasis occurs more frequently when two adaptive mutations are present (Figure 17B). We find that 4.2% of the pathways from TEM-1 to TEM-15+X are rugged (i.e. the pathways exhibit reciprocal sign epistasis). G238S is the primary cause of this ruggedness. Most of these rugged pathways (83%) include G238S+X as a local maximum, and the frequency of reciprocal sign epistasis is highest near the G238S mutation (Figure 19). We speculate that threshold robustness, by advantageously allowing for adaptive mutations that compromise protein stability, generally will lead to increased epistasis and landscape ruggedness along adaptive pathways. Ruggedness as quantified here represents potential evolutionary traps: as the number of local maxima increases, adaptive pathways may be prevented from reaching fitter maxima.

[Figure 19 panels: (A) Occurrences of reciprocal sign epistasis (0 to 10) versus Amino acid position (0 to 300). (B) Structure of β-lactamase.]

Figure 19: Reciprocal sign epistasis as a function of sequence and structure. (A) Sum of occurrences of observed reciprocal sign epistasis at every amino acid position when comparing fitness landscapes of β-lactamase alleles. Reciprocal sign epistasis occurrences are tallied on every analyzed pathway from TEM-1 to TEM-15+X. (B) Distribution of values in (A) as shown on the structure of β-lactamase. Increasing instances of reciprocal sign epistasis at a given position are indicated by darker red shades and stick representation on the structure. E104 and G238 are noted as green spheres.

Conclusions

Our high-throughput, systematic approach to characterizing local fitness landscapes along an adaptive pathway provides extensive maps of fitness trade-offs and epistasis that reveal the consequences of adaptation for the original protein function in unprecedented detail. Adaptation moves the protein to an area of the fitness landscape characterized by increased epistasis, ruggedness, and precariousness. In this region, deleterious mutations tend to be even worse, and the synergistic effects of beneficial mutations begin to erode. This topography provides an empirical landscape-based explanation for diminishing returns of fitness gains and accessible evolutionary pathways. All these consequences represent costs of adaptation. A residue’s statistical preferences for epistatic effects can be partially understood in terms of structural models. In addition, the extent of a mutation’s effect showed statistical preferences for the type of epistasis. Whereas moderately deleterious mutations tend to exhibit negative epistasis, severely deleterious mutations were equally likely to exhibit positive or negative epistasis. We propose an explanation for this observation involving the threshold robustness model coupled with a mechanistic understanding of the origin of the severely deleterious effects.


Chapter 3: Environmental changes bridge fitness valleys

Introduction

The fitness landscape metaphor is often visualized as a hilly surface with distinct fitness peaks and valleys (Figure 20). This representation flattens the high dimensionality of actual landscapes to three dimensions to aid our understanding of what a fitness landscape represents. Fitness landscapes depict the change in fitness along possible evolutionary pathways1, and thus the shape of the fitness landscape affects the dynamics and outcomes of evolution. Accordingly, much effort has gone into understanding the shape of the landscape and the degree to which ruggedness manifests in actual evolutionary scenarios4. Ruggedness results from complex interactions between mutations, especially those exhibiting reciprocal sign epistasis29. In rugged landscapes, many local fitness maxima exist as “traps,” and therefore the evolutionary endpoint of adaptation from a given starting point is not predetermined as it would be in a smooth landscape. The endpoint of each evolutionary pathway depends on selective history, landscape topology, and chance. However, one way in which evolution might escape traps and cross fitness valleys is through changes in environment that remodel the landscape1. Understanding the role that environmental changes and landscape ruggedness play in adaptive pathways would allow insight into the predictability and potentiality of evolutionary pathways.


Figure 20: Simplified model representation of how evolution in alternating environments that modulate the fitness landscape can lead to the crossing of fitness valleys. The xy plane represents sequence space, while the z-axis represents fitness. The solid red arrow indicates an evolutionary pathway following increasing selection pressure. (A) A sequence (circle) sits at an optimum separated from global optima by a fitness valley. (B) A change in the environment alters the landscape such that evolution brings the gene to a new sequence that previously resided in the valley. (C) Upon return to the original environment, the sequence resides in the valley but can now evolve to the global optimum, which lies uphill.

Despite the importance of ruggedness in fitness landscapes and evolutionary pathways10,91,101, how changes in environment or selection affect evolutionary outcomes is underexplored experimentally. For example, is increasingly stringent positive selection the best strategy for optimization? This strategy is ubiquitously employed in directed evolution experiments, but on a rugged landscape positive selection alone may quickly reach a local maximum, limiting evolutionary capacity. Studies of evolution under strong selection have indicated that evolution can follow few mutational pathways using positive selection, supporting the predictability of evolutionary endpoints45. However, how evolution might cross fitness valleys remains an unresolved problem1,4. Models and simulations have suggested that random strategies such as stochastic tunneling or recombination may occasionally result in the crossing of fitness valleys6,7. Other studies have examined the effect of changing the starting point of evolution to arrive at different fitness maxima81, but experiments rarely fundamentally change the nature of selection over the course of experimental evolution. Exceptions such as selection on alternating antibiotics83 and a regimen of neutral drift followed by positive selection54,56,102,57 have met with mixed success in evolving improved function. In contrast, genes in nature show strong evidence of complex selection dynamics due to selection tradeoffs12 (see Chapter 2), epistasis10 (see Chapter 2), inverted selection24,103, changing metabolic flux optimization104,105, drift84, or fixation due to population size106. The effects of mutations change dynamically as genes evolve or selection contexts change24,25 (Chapter 2), and evolutionary pathways are contingent on past environmental interactions and selective history30. Given the prevalence of natural analogs of these effects10,24,35,38,107,108, complex, fluctuating fitness landscapes and complex selective pressures are likely to contribute to the routes of natural evolutionary pathways.

We wondered if complex selection dynamics such as these could facilitate the crossing of fitness valleys. Specifically, we examined whether dynamic changes in the fitness landscape caused by changing environments would result in evolutionary pathways ending in distinct, potentially fitter evolutionary endpoints. We designed an experiment in which environmental changes enabled selection towards sequences previously residing in a low-fitness valley (“negative selection”) and tested whether such an evolutionary path would arrive at a distinct, higher final fitness maximum (Figure 20).

We used E. coli possessing a band-pass genetic circuit designed for tunable selection of β-lactam antibiotic resistance as a model system for systematically examining such a scenario (Figure 21)94. This genetic circuit allowed us to perform selections for β-lactam resistance within a narrow window of resistance (i.e. one can select for resistance above a first threshold but below a second, higher threshold). Thus, selections for low but above background resistance can be performed. We term such selections "negative" selections in the sense that high fitness (the ability of the cells to grow) is achieved only by low levels of β-lactam resistance. Similarly, neutral selection for maintenance of about the original level of resistance can be performed. These band-pass selections are tuned by the level of the β-lactam antibiotic in the environment and require the addition of tetracycline. Thus, these environments reshape the fitness landscape analogously to the depiction in Figure 20, and our engineered bacteria are a model system for examining evolution under landscape-altering environmental changes.
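The three selection modes described above can be summarized as a simple window test on the resistance an allele confers. The following is only an illustrative sketch: the function name `grows` and all threshold values are hypothetical and are not taken from the experiments.

```python
from typing import Optional

def grows(resistance: float, lower: float, upper: Optional[float] = None) -> bool:
    """Return True if a cell with the given resistance level forms a colony.

    With no upper threshold this models ordinary positive selection.
    With an upper threshold it models a band-pass window: growth requires
    resistance at or above `lower` but below `upper`.
    All numeric thresholds used with this function are hypothetical.
    """
    if upper is None:
        return resistance >= lower
    return lower <= resistance < upper

# Hypothetical windows, expressed in arbitrary resistance units.
start = 1.0                       # resistance of the starting allele
positive = dict(lower=2.0)        # must exceed the starting level
neutral = dict(lower=0.5, upper=2.0)    # about the starting level
negative = dict(lower=0.01, upper=0.1)  # low, but above background

print(grows(start, **positive), grows(start, **neutral), grows(start, **negative))
```

Under these made-up thresholds, the starting allele survives only the neutral window, while a strongly impaired allele (for example, resistance 0.05) would survive only the negative window.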


Figure 21: General schematic of the experimental evolution experiments on the β-lactamase gene. After comprehensive saturation mutagenesis, which allows access to all amino acids rather than just those reachable by single base pair mutations, selection proceeds in one of three manners depending on the environment. Positive selection (green) enriches for sequences that provide above a certain level of resistance. Neutral selection (yellow) enriches for sequences conferring about the same resistance as the starting gene. Negative selection (red) enriches for sequences conferring very low, but above background resistance. The latter two selection schemes are enabled by an engineered band-pass gene circuit present in the cell94 and require the addition of the antibiotic tetracycline and different levels of cefotaxime, which constitute the difference in environment from the positive selection.

The β-lactamase allele TEM-1 provides high antibiotic resistance to ampicillin, but very low resistance to cefotaxime. Clinically isolated alleles of TEM-1 conferring increased cefotaxime resistance include TEM-15 (E104K/G238S) and TEM-52 (E104K/M182T/G238S), the latter of which increases resistance 4000-fold73. Six independent directed evolution experiments on TEM-1 that imposed positive selection for cefotaxime resistance found these same three mutations in the highest resistance alleles63,68,74-76,81. The ‘GKTS’ allele, which adds the A42G mutation, has the highest resistance observed in these experiments45,63,72, though other combinations of amino acids at these positions can confer slightly higher resistance109. These directed evolution studies have used typical methodology: mutation of the gene, positive selection for higher resistance, and iteration of these processes over several rounds.

Here, we subjected TEM-15 to eight rounds of experimental evolution using four different selection strategies: positive selection, neutral selection, negative selection, and oscillating selection (Figure 22). As a control for evolution from low resistance alleles using only positive selection, we also subjected TEM-1 to positive selection. The differences in the selection strategies occurred in the first three rounds of evolution and were followed by five rounds of positive selection to climb towards local fitness maxima.

The positive selection experiments represent evolution in a fixed environment with a static landscape and are the typical directed evolution strategy. Due to the band-pass selection, our neutral selection is neutral in a sense beyond existing studies. While other studies select for at or above a near wildtype level of fitness, ours uses band-pass selection as a “stringent” neutral regimen to avoid selection for alleles conferring resistance far above the starting allele. During the negative selection steps in the oscillating and negative selection schemes, we set the level of resistance needed for growth above the background level of resistance exhibited by E. coli without a β-lactamase gene. We reasoned that mutations that completely deactivate a protein might be exceedingly common (such as nonsense mutations), whereas maintenance of an above-background level of activity would better reflect the desired scenario that some enzyme activity is required by the cell.


Materials and Methods

Plasmid conditions

All experiments used expression plasmid “pSkunk3” with a 9 bp barcode indicating selection type following the stop codon of β-lactamase. Expression of β-lactamase was induced using 300 µM IPTG, and the plasmid was maintained with the antibiotic spectinomycin (50 µg/ml).

Mutagenesis

Pfunkel mutagenesis was performed as described previously93 (see Chapter 2). Selected libraries in vector “pSkunk3” were transformed into Escherichia coli CJ236 in order to prepare single-stranded template DNA containing uracil using filamentous phage R408. Mutagenic oligonucleotides with degenerate codon NNN were targeted to every codon in β-lactamase. Following mutagenesis, all reaction libraries were transformed into electrocompetent 5-alpha cells (NEB) to prepare a naïve library for each round. All naïve libraries exceeded 10^6 transformants following mutagenesis.

Selection

For band-pass experiments, pSkunk3 libraries were miniprepped from the naïve library in 5-alpha cells and transformed into Escherichia coli SNO301 in the presence of plasmid pTS40. Band-pass experiments were performed as described previously on solid media, except that 10 µg/ml tetracycline and various concentrations of cefotaxime were used94 (see Chapter 2). Selection in non-band-pass conditions was performed in NEB 5-alpha cells on LB agar containing 300 µM IPTG at 37°C for 24 hrs. Roughly 10^6 CFU were plated for each selection experiment. Cell stocks had not been exposed to β-lactam antibiotics prior to selection.

Deep sequencing and sequence analysis

Amplicons were prepared for deep sequencing via PCR with the addition of a 3 bp barcode on either side of the amplicons that indicated the plate from which each amplicon was isolated. PacBio deep sequencing was performed on these amplicons using a three-pass circular consensus. Custom MATLAB scripts described previously (see Chapter 2) were used to align, analyze, and quantify reads and mutation composition. Reads were filtered for quality score, length, and quantity of insertions and deletions. Each read was then aligned to the reference gene to identify barcodes and mutations. For deep sequencing of each round of the negative selection strategy, barcodes indicated the round number, whereas in band-pass experiments barcodes indicated the cefotaxime concentration.

For calculating protein fitness, selections were performed over 11 plates containing doubling levels of cefotaxime from 0.01 µg/ml to 10.24 µg/ml under band-pass conditions. After amplifying the library from each plate, deep sequencing data were generated for each selection condition. First, all counts of barcodes for a given plate were normalized to the distribution of all barcodes to account for sampling errors. Next, the peak of normalized counts was found, and a window around that and the surrounding two plates was used to calculate the final fitness measurement. Only sequences with 5 or more counts were considered. Fitness scores, w, were defined relative to the wildtype fitness as previously described (Chapter 2).
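The plate-based procedure can be sketched roughly as follows. This is not the MATLAB analysis used in this work, whose exact fitness definition is given in Chapter 2. The sketch assumes that normalization means dividing each sequence's count by the total count on that plate, that the 5-count cutoff applies to a sequence's total across plates, and that the window is summarized by a frequency-weighted mean of log2 concentration; all three are assumptions, and every function name is hypothetical.

```python
import math

# Eleven plates with doubling cefotaxime levels, 0.01 to 10.24 ug/ml.
PLATE_CONC = [0.01 * 2 ** k for k in range(11)]

def normalize(counts, plate_totals):
    """Convert raw per-plate counts to per-plate frequencies (assumed scheme)."""
    return [c / t if t else 0.0 for c, t in zip(counts, plate_totals)]

def peak_window(freqs):
    """Index range covering the peak plate and its two neighbouring plates."""
    peak = max(range(len(freqs)), key=freqs.__getitem__)
    return max(0, peak - 1), min(len(freqs) - 1, peak + 1)

def resistance_estimate(counts, plate_totals, min_count=5):
    """Frequency-weighted geometric-mean concentration over the peak window.

    Returns None for sequences with fewer than `min_count` total reads.
    The weighting scheme is an assumption made for illustration only.
    """
    if sum(counts) < min_count:
        return None
    freqs = normalize(counts, plate_totals)
    lo, hi = peak_window(freqs)
    weights = freqs[lo:hi + 1]
    if sum(weights) == 0:
        return None
    logs = [math.log2(c) for c in PLATE_CONC[lo:hi + 1]]
    mean_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
    return 2 ** mean_log

def relative_fitness(estimate, wildtype_estimate):
    """Fitness w expressed relative to the reference allele (assumed ratio)."""
    return estimate / wildtype_estimate
```

A sequence whose reads all fall on the sixth plate would receive an estimate equal to that plate's concentration (0.32 µg/ml), and dividing by the reference allele's estimate gives a value of 1.0 for the reference itself.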


Phylogenetic analysis

We used the Hamming distance between any two given sequences on the amino acid level to construct a distance matrix using custom scripts in MATLAB. Next, we applied a neighbor-joining algorithm in MATLAB to generate the tree from all combined sequences and cluster branches. The distance tree was visualized using FigTree110.
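As a minimal illustration of the first step, the sketch below computes a pairwise Hamming distance matrix for equal-length amino acid sequences. It is written in Python rather than the MATLAB used in this work, the function names are hypothetical, and the neighbor-joining step is not reproduced.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(seqs):
    """Symmetric matrix of pairwise Hamming distances."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = hamming(seqs[i], seqs[j])
    return d

# Toy four-residue sequences, used only to show the call pattern.
print(distance_matrix(["MSIQ", "MSIK", "MTIK"]))
```

The resulting matrix is the kind of input a neighbor-joining routine would take to build the tree.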

Selection-weighted attraction graphing (SWAG)

SWAG network analysis was accomplished using Gephi and calculated fitness values. Forces between nodes were weighted by the calculated relative selection strength between nodes (the ratio of fitnesses). Nodes were connected only if they differed by a single amino acid mutation. The ForceAtlas2 algorithm was used to generate graphs. Clustering was accomplished using the modularity algorithm weighted by selection strength. PageRank was likewise weighted by selection strength.
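The edge list underlying such a graph can be sketched as below. The layout, modularity clustering, and PageRank steps performed in Gephi are not reproduced. Genotypes are encoded as bitmasks over the mutations considered, and the weight of a step is taken as the fitness of the destination divided by the fitness of the source; the direction of that ratio, the encoding, and the function name are assumptions made for illustration.

```python
def single_step_edges(fitness, n_mut):
    """List (source, destination, weight) for every single-mutation gain.

    `fitness` maps a genotype bitmask (bit k set = mutation k present) to
    its measured fitness. Genotypes without a measurement are skipped.
    weight = fitness[destination] / fitness[source]  (assumed direction).
    """
    edges = []
    for src in sorted(fitness):
        for k in range(n_mut):
            bit = 1 << k
            if src & bit:
                continue  # mutation k already present in the source
            dst = src | bit
            if dst in fitness:
                edges.append((src, dst, fitness[dst] / fitness[src]))
    return edges

# Toy two-mutation landscape with made-up fitness values.
toy = {0b00: 1.0, 0b01: 2.0, 0b10: 0.5, 0b11: 4.0}
for edge in single_step_edges(toy, n_mut=2):
    print(edge)
```

For a complete set of nine mutations this construction yields 2^9 = 512 nodes and 9 × 2^8 = 2304 single-step edges; an edge list of this form can be imported into a graph tool for force-directed layout.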

Calculation of epistasis

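Pairwise epistasis between mutation A with fitness $w_A$ and mutation B with fitness $w_B$ can be defined as95:

$$\varepsilon_{AB} = \log_{10}\left(\frac{w_{AB}\, w_0}{w_A\, w_B}\right)$$

in which $w_0$ is the wildtype fitness (see Chapter 2). Generalizing for higher-order epistasis of order N with mutations i,j,k…n:

$$\varepsilon_{ijk\ldots n} = \log_{10}\left(\frac{w_{ijk\ldots n}\, w_0^{N-1}}{\prod_{m \in \{i,j,k,\ldots,n\}} w_m}\right)$$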
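The two definitions above reduce to a single calculation, sketched here in Python. The function name and argument layout are hypothetical; the arithmetic follows the log10 ratio of the observed combined fitness to the product of the single-mutant fitnesses, scaled by the wildtype fitness raised to N−1.

```python
import math

def epistasis(w_combined, w_singles, w_wt=1.0):
    """N-order epistasis for N >= 2 mutations.

    w_combined : fitness of the allele carrying all N mutations
    w_singles  : fitnesses of the N single mutants
    w_wt       : fitness of the reference (wildtype) allele
    """
    n = len(w_singles)
    if n < 2:
        raise ValueError("epistasis needs at least two mutations")
    # Fitness expected if the mutations acted independently.
    expected = math.prod(w_singles) / w_wt ** (n - 1)
    return math.log10(w_combined / expected)

# Made-up values: two mutations whose effects multiply exactly.
print(epistasis(1.0, [0.5, 2.0]))
```

With the reference fitness set to 1.0, a value of zero means the combined fitness equals the product of the single-mutant fitnesses, while positive and negative values indicate combinations that are fitter or less fit than that expectation.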


Minimum inhibitory concentration testing

Minimum inhibitory concentration (MIC) testing was performed in NEB 5-alpha cells by growing 1 ml of each culture to be tested overnight in Luria-Bertani broth in the presence of spectinomycin (50 µg/ml). Saturated growths were diluted 1/100 and grown until above OD600=0.3. All samples were then diluted back to OD600=0.003, and 1 µl was spotted onto LB agar plates with 300 µM IPTG and various concentrations of cefotaxime in 2-fold increments. Growth was evaluated after 24 hrs at 37°C. The MIC was the lowest concentration at which no growth was visible.
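The MIC readout rule stated above can be written as a short function. This is only an illustration of the rule; the function name and the example concentrations are hypothetical.

```python
def mic(growth_by_conc):
    """Lowest tested concentration at which no growth was visible.

    `growth_by_conc` maps antibiotic concentration to True (growth) or
    False (no growth). Returns None if growth occurred on every plate,
    in which case the MIC lies above the tested range.
    """
    no_growth = [conc for conc, grew in growth_by_conc.items() if not grew]
    return min(no_growth) if no_growth else None

# Hypothetical 2-fold dilution series (ug/ml) for one allele.
print(mic({0.5: True, 1.0: True, 2.0: False, 4.0: False}))
```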

Results and Discussion

[Figure 22 plots. Panels: (A) Positive, (B) Neutral, (C) Negative, (D) Oscillating. x-axis: Round (0 to 8); y-axis: Selected cefotaxime resistance (µg/ml), log scale from 10^-2 to 10^3. Series labels: TEM15, TEM1. Legend: Positive selection, Band-pass selection.]

Figure 22: Course of the evolution experiments. Experiments differ in the first three rounds, in which the bacteria underwent (A) positive selection, (B) neutral selection, (C) negative selection, or (D) oscillating selection. Each round represents both mutagenesis (line segments) and selection at the indicated level of cefotaxime (endpoints). Blue circles indicate band-pass selections and green circles indicate positive selections. The dashed horizontal line indicates the resistance level conferred by TEM-15.


Five plasmids were constructed with distinct DNA barcodes identifying the five experiments to monitor and ensure against cross-contamination of the libraries. One round of evolution consisted of comprehensive codon mutagenesis on the plasmid-borne β-lactamase genes (isolated from the previous round), transformation of the newly mutated plasmid library into fresh SNO301 E. coli cells, and selection on agar plates for the desired level of cefotaxime resistance (Figure 21). At least 10^6 library members were generated and subjected to selection in each round. Selections were tuned such that at least hundreds but more commonly thousands of colonies appeared at each step to minimize the effects of evolutionary bottlenecking on the experimental outcome.

Interestingly, populations under negative and oscillating selection recovered from negative selection to resistance above that of TEM-15 after one round of positive selection (Figure 22). This result supports the idea of the commonality of compensating mutations or suppressor mutations in restoring protein activity37,97.

[Figure 23 plots. (A) Unrooted tree with branches labeled Positive (TEM-1), Positive (TEM-15), Neutral, Negative, Oscillating; scale bar 1.0. (B) y-axis: Fold increase of cefotaxime resistance over TEM-1 (0 to 14000); categories: TEM-1, TEM-15, Positive (TEM-1), Positive (TEM-15), Neutral, Negative, Oscillating.]

Figure 23: Diversity and conferred resistance of alleles after the eighth round of evolution. (A) Unrooted tree of Hamming distances between randomly selected alleles. Branches are color coded by the evolution experiment as indicated. Scale bar indicates 1 unit of Hamming distance (i.e. one missense mutation). (B) Increase in resistance conferred by seven alleles from each selection strategy as compared to TEM-1. The seven alleles were selected from among those depicted in the tree to represent the diversity of sequences.


At the end of the experiment, all strategies resulted in bacteria with a high cefotaxime resistance phenotype, with maximum resistance levels reaching over 1 mg/ml cefotaxime in the oscillating library. The coding sequences from 47 of the fittest colonies from each selection strategy were sequenced. All selection strategies showed sequence clustering, indicating that each strategy, for the most part, reaches distinct areas of sequence space (Figure 23A). Sequences resulting from negative selection showed both the highest average genetic distance from TEM-15 (Figure 23A, Figure 24, and Figure 25A) and the highest diversity (Figure 25B). We discerned no apparent pattern in the location of mutations on the protein structure (Figure 26).

[Figure 24 heat map. Rows and columns: Positive (TEM-1), Positive (TEM-15), Neutral, Negative, Oscillating. Color bar: Amino Acid Similarity (270 to 286).]

Figure 24: Heat map of matrix of Hamming similarity between unique alleles found from 47 most resistant alleles following round 8 in all selections. Red color indicates a higher similarity in amino acid mutations between two given sequences up to the full 287 amino acids of the protein, while blue indicates a greater number of differences in amino acid mutations.


We chose seven highly distinct sequences from each selection experiment (based on diversity within the tree) to sub-clone and decouple adaptive mutations in the coding sequence from those elsewhere on the plasmid. The coding sequences were ligated into a fresh plasmid backbone and transformed into an E. coli strain without previous antibiotic exposure for resistance testing (Figure 23B). All libraries produced alleles that conferred resistance higher than that conferred by GKTS, although a drop in resistance upon sub-cloning the genes indicated that mutations elsewhere on the plasmids contributed to the high resistance observed after round eight. Strikingly, the negative selection strategy consistently evolved alleles with the highest activity, with 5/7 alleles tested conferring an 8-fold resistance increase over that of GKTS (a 12,800-fold increase over TEM-1).

[Figure 25 plots. (A) y-axis: Mean Hamming distance from TEM15 (0 to 10). (B) y-axis: Mean diversity within selection type (0 to 10). Categories in both panels: Positive (TEM-1), Positive (TEM-15), Neutral, Oscillating, Negative.]

Figure 25: Genetic distance of evolved sequences following eight rounds of evolution. (A) Mean Hamming distance between TEM-15 and unique sequences taken from 47 most resistant alleles resulting from each selection type after eight rounds of evolution (with equal mutation rates). Negative selection resulted in alleles with greater average evolutionary distance from the starting point as compared with all other selection types (p<.005 by Student’s T-test). (B) Hamming diversity of alleles within each selection type as measured by Hamming distance. Negative selection results in the highest diversity of resulting mutants, while oscillating selection results in the lowest diversity (p<.005 each by Student’s T-test relative to all other selection strategies).


[Figure 26 panels: Positive (TEM-15), Positive (TEM-1), Neutral, Negative, Oscillating, Combined.]

Figure 26: Structural mapping of mutations found in each selection scheme following round 8. The combined panel depicts, for each site, the selection scheme in which that site was mutated most frequently.


Deep sequencing of the population at each round of the negative selection strategy revealed high expansion of genetic diversity during negative selection, consistent with our expectation that deleterious mutations are common (Figure 27). In contrast, during the subsequent positive selection steps, increasing purifying selection is apparent (Figure 27). In particular, the A184I mutation was highly enriched in the population (Figure 28; Figure 29). However, a large amount of diversity was maintained throughout the experiment. Even after the final selection in round 8, the most constrictive bottleneck in our experiment, unique sequences accounted for 40/47 (85%) of the total analyzed (with an average difference of 6.1 amino acid mutations between unique mutants). In addition, 5836 unique sequences were found in deep sequencing data after the first positive selection, demonstrating that despite rapidly changing modes of selection, we maintained high diversity of alleles.


Figure 27: Heat map of mutations observed in the negative evolution strategy as analyzed by deep sequencing. Each plot indicates all possible amino acid mutations (y-axis) found in a given round across codon position (x-axis) in the gene, with color indicating relative frequency of that mutation from low (blue) to high (yellow). White indicates no observations of the indicated mutation. Rounds 1-3 were negative selections, while rounds 4-8 were positive selections in this selection scheme.


Figure 28: Stacked frequency distribution of mutations during negative evolution found by deep sequencing as a function of codon position over eight rounds.

Figure 29: Stacked distribution of mutations found in top 47 alleles from all selection strategies following round 8 as a function of codon position in β-lactamase.


Figure 30: Structural mapping of mutations found in the BS-NEG-4 allele, which confers extremely high resistance to cefotaxime. Mutations developed during the negative evolutionary pathway are shown in green and red and with stick models. The mutation F230S is noted in red. The active site is shown in blue.

To determine if the environmental change resulting in negative selection was a key factor in the evolutionary history of an allele conferring high resistance, we attempted to reconstruct all possible evolutionary intermediates between TEM-15 and one of the fittest alleles resulting from the negative selection strategy (BS-NEG-4) (Figure 30). This allele contained nine missense mutations relative to TEM-15 (A11S, A42G, L49M, Q90R, A184I, I208W, A224V, F230S, and R241F, plus the synonymous mutation P22P; Ambler notation69) and conferred an 8-fold higher cefotaxime resistance than GKTS. Of the mutations in BS-NEG-4, A11S, A42G, L49M, and A224V have been independently seen in other TEM alleles, though none were present together in the same allele111. We created a library designed to contain all 512 combinations of these nine mutations in order to determine the fitness of all possible evolutionary intermediates in the evolution of BS-NEG-4 (assuming no mutation reversals). We determined the fitness of 98.8% (506/512) of all possible combinations of these nine mutations in a high-throughput manner involving band-pass selection and deep sequencing as previously described (see Chapter 2). One mutation, F230S, was particularly deleterious ($w_{F230S} = 0.09$; with fitness of TEM-15 set to a value of 1.0) and remained so even in combination with 1-2 other mutations, making it a highly likely starting point on the evolutionary pathway resulting in BS-NEG-4.


[Figure 31 panels A to D. Labeled nodes: TEM-15 and BS-NEG-4. Vertical axis in the 3D panels: w. Panel C legend: +F230S, -F230S.]

Figure 31: Selection-weighted attraction graph (SWAG) of the possible evolutionary intermediates between TEM-15 and BS-NEG-4. Nodes are sequences and edges connect nodes that differ by one missense mutation. Forces between connected nodes are weighted proportionally to the selection strength as determined by the nodes’ fitness values. The graph was generated in Gephi using ForceAtlas 2, and the cluster coloring was generated by the selection-weighted modularity function (cluster resolution 2.0). (A) 2D SWAG with the first mutational steps from TEM-15 indicated by a larger size. 3D SWAGs have fitness as the z-axis and are color coded by (B) cluster, (C) whether the sequence contains F230S, and (D) fitness value.

We sought a method to visualize the high-dimensional space of the fitness landscape comprising the 512 potential evolutionary intermediates between TEM-15 and BS-NEG-4. For this purpose, we introduce selection-weighted attraction graphing (SWAG), a force-directed graphing method in which nodes represent sequences and edges connect nodes differing by only one missense mutation (Figure 31). Forces between connected nodes are weighted proportionally to the selection strength as determined by the nodes’ respective fitness values. These forces are applied in the x-y plane, and the graph can be depicted in three dimensions by adding fitness as the z-axis. In the graph, edges represent possible single steps in the evolutionary pathway, and the distance between connected nodes in the x-y plane is inversely proportional to the likelihood of taking that step under positive selection.


[Figure 32 panels A to C. Panel A groups: -F230S, +F230S. Panels B and C: Source cluster versus Destination cluster, grouped by -F230S and +F230S. Color scales: Transition frequency (B); Selection-weighted transition frequency (C).]

Figure 32: Pathways between clusters found in SWAG analysis. (A) All connections between clusters shown on a circular map. (B) Heat map of all connections between clusters. (C) Heat map of all connections between clusters weighted by selection strength of each connection.


SWAG analysis of the intermediates between TEM-15 and BS-NEG-4 resulted in eight major clusters of sequences representing local fitness peaks (Figure 31A,B). Such clustering was robust to the weighting of selection strength in the forces. No clustering was observed when fitness values were randomly reassigned to the nodes. Clusters comprise groups of sequences with many interconnections (Figure 32) that tend to lead in succession to ever increasing fitness values. The graph can be seen to possess two ‘ranges’ of four fitness peaks each, separated by a large valley devoid of nodes (Figure 31). The two ranges are completely defined by whether the sequences contain F230S or not (Figure 31C). The range containing F230S lies far from the start point TEM-15 and below the end point BS-NEG-4. The starting range contains TEM-15 and nearby nodes. Inter-range edges primarily connect low fitness alleles, typically with the lower fitness node lying in the F230S range.


[Figure 33 image. Labeled nodes: TEM-15, BS-NEG-4, and the single mutants A11S, A42G, L49M, Q90R, A184I, I206W, A224V, F230S, R241F.]

Figure 33: SWAG landscape of pathways between TEM-15 and BS-NEG-4 with single mutants indicated.

The SWAG depiction of all possible pathways between TEM-15 and BS-NEG-4 illustrates that the evolution of BS-NEG-4 is extremely unlikely under positive selection. First, eight of the possible nine initial mutations (the exception being F230S) are located in the starting range (Figure 33). Second, there are many more steps within a cluster than between clusters (Figure 32B). Finally, steps within a cluster tend to be to fitter sequences than steps between clusters (Figure 32C), making inter-cluster steps even less likely. Thus, starting from TEM-15, the most likely mutational pathways in a constant, positive selection environment progress first into one of the clusters of the starting range, and then preferentially climb within that cluster to local fitness maxima. However, when the environment changes such that low resistance is beneficial (i.e. under negative selection), mutational steps to low fitness values in the depths of the F230S range become advantageous (Figure 31D, Figure 34). Then, upon a reversion back to the original environment (i.e. back to positive selection), sequences tend to evolve up from the depths in the clusters of the F230S range, enabling access to BS-NEG-4. The environmental change can be thought to cause inter-range bridges to form across fitness valleys that end in ‘gateway’ sequences possessing F230S, allowing access to the sequence space of that range of peaks. Without this bridge, evolution of BS-NEG-4 is extremely unlikely.


Figure 34: SWAG landscape modeling selection for low fitness alleles along the evolutionary pathway from TEM-15 to BS-NEG-4. TEM-15 is indicated by the larger grey node. Selection strength between nodes is set to the inverse of the fitness of the destination node for fitnesses less than 0.5. Other selection strengths are set to 0.01. Inverse fitness is graphed on the z-axis, and color indicates low fitness (red) to high fitness (blue).

As further evidence that evolution of BS-NEG-4 in a constant, positive selection environment is unlikely, Figure 35 depicts all analyzed pathways and illustrates the prevalence of negative steps along them. Of the 347,616 analyzed pathways from TEM-15 to BS-NEG-4, only sixteen involve only increasing fitness as each mutation is added. Furthermore, evolution under positive selection along these increasing fitness pathways from TEM-15 to BS-NEG-4 is likely to terminate at intermediates that are local maxima along the way (Figure 36). We conclude that BS-NEG-4 is extremely unlikely to evolve under positive selection alone, but can be accessed through the series of environmental changes employed, which modulate the selective pressure on the gene.
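Counting pathways of this kind can be sketched with a small dynamic program over genotype bitmasks. This is an illustrative reimplementation, not the analysis code used here; the encoding (bit k set means mutation k is present) and the function name are assumptions. If all 512 genotypes were measured, the unrestricted count would be 9! = 362,880 orderings; the function below simply skips genotypes that lack a fitness measurement.

```python
def count_paths(fitness, n_mut, increasing_only=True):
    """Count mutation orderings from genotype 0 to the full genotype.

    `fitness` maps genotype bitmask -> measured fitness. A step adds one
    mutation. With `increasing_only`, every step must strictly raise
    fitness; otherwise every path through measured genotypes is counted.
    """
    full = (1 << n_mut) - 1
    if 0 not in fitness or full not in fitness:
        return 0
    paths = {0: 1}
    # Adding a mutation always yields a larger integer, so visiting
    # genotypes in increasing order finalizes each count before use.
    for src in range(full):
        ways = paths.get(src, 0)
        if ways == 0:
            continue
        for k in range(n_mut):
            bit = 1 << k
            if src & bit:
                continue
            dst = src | bit
            if dst not in fitness:
                continue
            if increasing_only and fitness[dst] <= fitness[src]:
                continue
            paths[dst] = paths.get(dst, 0) + ways
    return paths.get(full, 0)

# Toy two-mutation landscape with made-up fitness values.
toy = {0b00: 1.0, 0b01: 2.0, 0b10: 0.5, 0b11: 4.0}
print(count_paths(toy, 2), count_paths(toy, 2, increasing_only=False))
```

In the toy landscape only one of the two orderings climbs at every step, because the other begins with a deleterious mutation.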

[Figure 35 panels A and B. Color scale: Fitness; categories: Negative, Neutral, Positive.]

Figure 35: Radial graph of all analyzed pathways from TEM-15 to BS-NEG-4. Each unit branch extending from the center indicates the gain of one of nine amino acids from TEM-15 until reaching the endpoint containing the full set of mutations. (A) The log of relative fitness change from the previous mutant is indicated by color: the scale of increasingly negative mutations are yellow to red while increasingly positive mutations are black to cyan. (B) The data shown in (A) with color indicating absolute fitness change from TEM-15.


[Figure 36 image. Labeled nodes: TEM-15 and BS-NEG-4. Vertical axis: w.]

Figure 36: 3D SWAG of evolutionary pathways only following increasing fitness from TEM-15 to BS-NEG-4. Fitness is represented by height on the z-axis. Color indicates clusters. Each node is sized by likelihood of arrival as measured by PageRank algorithm weighted by selection strength between nodes.

We next turned our attention in more detail to the role F230S played in the evolution of BS-NEG-4. In our systematic study of fitness and epistatic landscapes along the evolutionary pathway from TEM-1 to TEM-15 (see Chapter 2), we observed that F230S is very deleterious to TEM-15 for both cefotaxime resistance ($w_{F230S} = 0.080$) and ampicillin resistance ($w_{F230S} = 0.046$), but only mildly deleterious to ampicillin resistance in TEM-1 ($w_{F230S} = 0.45$; relative to TEM-1). The different effects of F230S in TEM-1 and TEM-15 originate in a strong negative pairwise epistatic interaction with G238S ($\varepsilon_{F230S \cdot G238S} = -0.73$), which lies at the opposite end of the same β-strand. Strong negative epistasis is not observed between F230S and E104K ($\varepsilon_{F230S \cdot E104K} = +0.14$), nor can E104K alleviate the negative epistasis between G238S and F230S ($\varepsilon_{G238S \cdot F230S | E104K} = -1.13$, and $\varepsilon_{E104K \cdot G238S \cdot F230S} = -0.59$).


[Figure 37 plots. (A) y-axis: Epistasis of each mutant (-0.8 to 1.2). (B) y-axis: Fitness of each mutant (0 to 9). x-axis in both panels: Number of mutations (0 to 10). Legend: F230S absent, F230S present.]

Figure 37: Epistasis and fitness along possible evolutionary pathways between TEM-15 and BS-NEG-4. (A) N-order epistasis between all mutations and (B) fitness values of all possible evolutionary intermediates as a function of the number of mutations. Mutants containing mutation F230S are noted in red, while all other mutants are shown in blue. Points are jittered for clarity. Larger points indicate the mean for each data set, with bars extending to the standard deviation.

Because we suspected F230S to be key in the evolution of this allele, we compared the epistatic and fitness effects of mutations of evolutionary intermediates containing F230S to intermediates lacking F230S (Figure 37). Seven of the nine mutations of BS-NEG-4 are beneficial as the sole mutation in TEM-15. However, the higher-order epistatic effect of combinations of these mutations in the absence of F230S is increasingly negative (Figure 37A), consistent with previous observations that beneficial mutations usually interact with negative epistasis, creating diminishing returns13,14,71. In contrast, higher-order epistatic interactions in intermediates containing F230S are almost always positive (252/256 or 98%) and become increasingly positive as mutations accumulate up to about 4-5 mutations, where the higher-order epistasis plateaus at $\varepsilon = +0.7$. This epistasis value corresponds to the average fitness of alleles with F230S being five times higher than expected based on the effects of the individual mutations in TEM-15. We emphasize that this strong positive epistasis with F230S is a higher-order epistasis term. The benefits of F230S are not so apparent when one examines the effect that F230S has on pairwise epistasis among the other eight mutations, though F230S tends to have a slight positive effect on the pairwise epistasis between certain mutations (Figure 38).
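The five-fold figure follows directly from the base-10 logarithm in the epistasis definition. As a short worked step, writing $w_{\text{expected}}$ (a label introduced here only for illustration) for the fitness predicted from the individual mutations:

```latex
\[
\frac{w_{\text{observed}}}{w_{\text{expected}}}
  = 10^{\varepsilon}
  = 10^{0.7}
  \approx 5.0,
\qquad
w_{\text{expected}} = \frac{\prod_{m} w_m}{w_0^{\,N-1}}
\]
```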

[Figure 38 plots. (A) Histogram; x-axis: Δε (-1 to 1.5); y-axis: Frequency (0 to 0.12). (B) Heat map; color scale: Δε.]

Figure 38: Change in pairwise epistasis, Δε, between two mutations on the pathway from TEM-15 to BS-NEG-4 in response to the addition of F230S. (A) Histogram of all values of Δε measured. (B) Heat map of average Δε values found between pairs of mutations.


We suggest F230S serves as an “epistatic bridge,” a key link in traversing the fitness valley in this evolutionary pathway that serves to alleviate the negative interactions between beneficial mutations, creating the pathway to the final high-activity allele. Although alleles with F230S tend to have lower fitness, especially among intermediates with few mutations (Figure 37B), many intermediates demonstrate higher fitness with the addition of F230S, including the highest-fitness member analyzed, which contains all but the Q90R and A184I mutations.

Despite seeding different starting pathways and being able to access any single amino acid mutation at each round, both of our positive selection strategies yielded only mutants that conferred at most two-fold higher resistance than GKTS. This result implies that, in this experiment, constraints on accessible amino acids restrict evolutionary potential less than the selection dynamics do. Positive higher-order epistasis through epistatic bridges may then serve as the mechanism by which more favorable evolutionary pathways are accessed, and adaptation in changing environments may enrich the bridges needed to access these pathways.

Interestingly, the three strategies other than negative selection almost uniformly yielded variants containing the M182T mutation, a well-known stabilizing and adaptive mutation for β-lactamase on cefotaxime (Figure 29)112, and our oscillating selection yielded variants on the GKTS set of mutations. Instead of M182T, the negative selection very commonly yielded the A184I mutation. It is possible that this mutation, in concert with deleterious mutations such as F230S, directed these evolutionary pathways towards distinct maxima. The oscillating selection, by extension, may not have had sufficient successive mutation and selection at low fitness to generate as much diversity along the fitness landscape, leaving a low probability of escape from a local maximum. Indeed, the oscillating selection resulted in mutants with the lowest diversity (Figure 25B). We hypothesize that antibiotic cycling strategies mimicking this rapid oscillating selection may be useful for minimizing the evolutionary potential for antibiotic resistance, as measured by the diversity of mutants generated. This interpretation could explain the null result of Schenk et al. in evolving higher activity during selections with alternating antibiotics83.

Previous studies have demonstrated the advantage of pursuing alternate evolutionary paths in the evolution of activity towards alternate substrates by generating diversity through neutral drift, but we believe that our results are the first to show that negative selection may enrich for more and potentially fitter pathways54,56,57. To our knowledge, neutral drift has not shown utility as a tool in directed evolution to enhance activity past a local maximum in selection for the original activity. Our β-lactamase alleles evolved from stringent neutral selection show 4-fold improvement of resistance over GKTS and improvement over the results of all selection schemes except for the negative selection strategy. Our results provide the first experimental support for the ability of drift to access alleles of improved fitness and change evolutionary pathways for the original activity selected for. While neutral selection does seem to access alternate fitness maxima, the distance of escape relative to TEM-1 does not appear to match the effects of extended negative selection in our alleles tested here as measured by Hamming distance and acquired fitness maxima. Negative selection may thus be a more effective means of generating highly diverse evolutionary pathways.
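For readers unfamiliar with the distance measure mentioned above, the following minimal sketch computes a Hamming distance between two aligned protein sequences of equal length. The two ten-residue fragments are hypothetical placeholders, not TEM-1 or evolved sequences, and the function name is illustrative.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Hypothetical fragments (assumption): a reference and a variant that
# differs from it by two substitutions.
reference = "MSIQHFRVAL"
variant = "MSIKHFRVSL"
distance = hamming_distance(reference, variant)
print(distance)
```

A larger distance from the reference indicates a more divergent allele, which is the sense in which escape distance relative to TEM-1 is compared between selection strategies.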

Through negative selection strategies, we evolved several distinct β-lactamase alleles 8-fold more active than any previously laboratory-derived allele. One example of


these alleles, BS-NEG-4, shows strong hallmarks of negative selection and epistatic interaction in its possible evolutionary pathways. Since sequence space is large, it is likely possible to arrive at a given allele by chance. However, evolution following negative selection in particular seems to have selected for individual mutations that are more fit on average than the other mutations (Figure 39) (data from Chapter 2). We suggest that such epistatic bridges may bring these mutations together and enrich jumps to more distinct fitness maxima, such as the alleles analyzed here. These jumps would account for the high genetic distance demonstrated in alleles resulting from negative selection as compared to the other four selection strategies.

Figure 39: The protein fitness effects of single mutations in TEM-15 for the mutations observed among the 47 most resistant mutants in each of the evolution experiments. Fitness data taken from Chapter 2. In the box plots, the red lines indicate median fitness values, the dashed lines extend to the most extreme data values not deemed outliers (2.7 sigma or 99.3% coverage with assumed normal distribution) and each box outlines the 25th to 75th percentile of values. Outliers are indicated by jittered blue points.

Due to the prevalence of genetic drift, selective tradeoffs, and other effects that may alter selection dynamics12,31,113 (see Chapter 2), we expect that the expansion of evolutionary pathways by altering the selective environment may be prevalent in nature, including in the evolution of β-lactamase alleles. β-lactamase in particular has evolved to


act on many substrates and may thus access distinct evolutionary pathways towards an original substrate by evolving activity towards a novel substrate (i.e., through genetic drift)114 (see Chapter 2). Understanding the ability of antibiotic resistance to evolve and generate complexity under shifting selective contexts may inform antibiotic treatments that minimize the evolution of resistance.

We synthesize predictions of landscape ruggedness with emphasis on changing selective pressure in crossing fitness valleys, verifying some of the untested original predictions of Wright1. Contrary to previous results, positive selection alone is not sufficient to predict the efficacy or extent of evolutionary pathways45. This result may appear counter-intuitive to protein engineers but could prove useful in escaping plateaus in protein engineering effectiveness. Our study lends further credence to models of rough fitness landscapes and demonstrates practical inferences from such models. In such cases, replaying the “tape of life” produces very different results due to rugged evolutionary landscapes30.


Chapter 4: Evaluation of antibiotic cycling and cocktail therapy on evolution of antibiotic resistance

Barrett Steinbergᵃ, Martin Kangᵇ, and Marc Ostermeier

Introduction

A direct and clinically relevant analog of the impact of changing selection environments exists in the introduction of antibiotic cycling and antibiotic cocktail therapies115,116. Antibiotic cycling strategies use alternating antibiotics in series against bacteria, while cocktail therapies combine multiple antibiotics concurrently. However, as previously discussed, changing selective environments have potential to result in the evolution of new and potentially fitter pathways (see Chapter 3). Alternatively, it is possible that alternating environments or cocktail therapies may constrain evolutionary pathways and limit evolution, which is the intent. Results in Chapter 3 showed that populations evolved from oscillating selection strategies were the least diverse.

Analogously, cycling selection may similarly limit the generation of diversity and thus available evolutionary pathways. Previous studies support short-term evolutionary effectiveness of cycling as well as cocktail therapy115,117,118, specifically those by Kim et al.116 and Schenk et al.83. These studies commonly track the evolution of a single antibiotic resistance gene or the whole of an organism. However, a study on whole-plasmid evolution is lacking. We believe that evolution on the plasmid level is

ᵃ BS conceived and designed the experiment, created and tested genetic constructs and protocols, and analyzed data.
ᵇ MK performed evolutionary experiments and aided in data analysis.

more representative of natural and clinically relevant systems. Plasmids are a source of widespread horizontal transmission of antibiotic resistance between bacteria119.

Understanding plasmid evolution is then critical to predicting the emergence of transmissible antibiotic resistance elements. To this end, we utilized elements from modern continuous evolution systems120 to evolve β-lactamase resistance in vivo under a variety of selection schemes. Mutagenesis induction is similar to that of phage-assisted continuous evolution, but here we implement semi-discrete selection steps to allow for adaptation to each antibiotic environment over 24 hours. In this manner, the entire plasmid is evolved even as it is introduced to naïve cells between rounds.

We decided to test four strategies of in vivo, competitive directed evolution in liquid culture: selection for only cefoxitin resistance, selection for only cefotaxime resistance, selection for combined cefoxitin and cefotaxime resistance (cocktail selection), and alternating selection for cefoxitin and cefotaxime (cycling selection) (Figure 40). Cefoxitin and cefotaxime were chosen as model antibiotics both for their use in the clinic and because pilot experiments indicated a negative interaction between cefotaxime and cefoxitin resistance (mutants exhibiting greater cefotaxime resistance tended to have lesser cefoxitin resistance and vice versa). By comparing the results of these selection strategies, we can infer the opportunities or limitations imposed on evolution by the different selections.


[Figure 40 image. Panels: Cefotaxime; Cefoxitin; Cefotaxime+Cefoxitin Cocktail; Cefotaxime+Cefoxitin Cycling. Axes: concentration versus time.]

Figure 40: Illustration of whole-plasmid evolution strategies evaluated.


Methods

[Figure 41 image. Plasmid maps of pTSmut (11873 bp) and pSkunk3-BLA-CEFT (4360 bp).]

Figure 41: pTSmut and barcoded pSkunk3 plasmid maps. pTSmut is used in this experiment for induced mutagenesis when arabinose is added. pSkunk3 contains IPTG-inducible beta-lactamase expression, a phage origin for efficient packaging, and a 12bp barcode indicating selection type (shown here is one example barcode, “CEFT”).

Three replicate evolutionary lines were established for each selection scheme. To start, naïve libraries of pSkunk3 plasmids were obtained by transforming plasmids pSkunk3 (initially containing TEM-1 beta-lactamase as well as a 12 bp barcode indicating the selection scheme) and pTSmut into NEB Turbo F+ E. coli. Inoculants of these transformants were grown with 10 µl R408 phage (NEB) and 0.1% arabinose and maintenance antibiotics spectinomycin and chloramphenicol (50 µg/ml each) in 1 ml LB broth at 37°C and shaking at 250 rpm. After overnight growth to saturation, cells were pelleted and discarded, and the supernatant containing phage was recovered. This step was repeated to remove residual bacteria. The supernatant, containing phage libraries of barcoded and mutated pSkunk3-BLA, was heated to 65°C for 20 minutes to kill


remaining bacteria. These naïve phage libraries served as the starting point for evolutionary experiments to provide initial population diversity.

Each evolutionary round of infection, selection, and mutagenesis (Figure 42) proceeded by first infecting naïve cells with phage libraries for one hour with shaking at 250 rpm at 37°C. IPTG (300 µM) and arabinose (0.1% w/v) were also added at this time to induce beta-lactamase expression and mutagenesis, respectively. The first round of selection was performed with the naïve phage library, but subsequent rounds used phage selected from the previous round. Antibiotics spectinomycin (50 µg/ml), chloramphenicol (50 µg/ml), and cefotaxime and/or cefoxitin were added after an hour of infection. All selections were performed in 96-well plates using 1 ml of LB broth. Cells were selected in this manner for 24 hours, the resulting phage was harvested, and the process was repeated.

Concentrations of cefotaxime and/or cefoxitin used ranged in two-fold increments among wells in empirically defined ranges. Phage were recovered from the culture with the highest antibiotic concentration at which the OD600 > 0.2. These phage were used to infect fresh cells in the next round. Minimum inhibitory concentration (MIC) was defined as twice the highest cefotaxime and/or cefoxitin concentration meeting this growth threshold.
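The well-selection rule and MIC definition above can be sketched as follows. This is a minimal illustration under stated assumptions: the OD600 readings and concentrations are hypothetical, and the function name and data layout are not taken from the actual analysis.

```python
def select_well(od600_by_conc, threshold=0.2):
    """Return (harvest concentration, MIC) for one two-fold dilution series.

    Phage are harvested from the highest antibiotic concentration whose
    culture exceeds the OD600 threshold; the MIC is defined as twice that
    concentration. Returns (None, None) if no well grew.
    """
    growing = [conc for conc, od in od600_by_conc.items() if od > threshold]
    if not growing:
        return None, None
    highest = max(growing)
    return highest, 2 * highest


# Hypothetical readings (assumption): antibiotic concentration in ug/ml
# mapped to the OD600 measured after 24 hours of selection.
readings = {0.04: 1.10, 0.08: 0.95, 0.16: 0.42, 0.32: 0.21, 0.64: 0.05, 1.28: 0.03}
harvest_conc, mic = select_well(readings)
print(harvest_conc, mic)
```

Because the concentrations are spaced in two-fold steps, doubling the last growing concentration simply names the first concentration in the series at which growth fell below the threshold.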

[Figure 42 image. Panel A: maps of the mutagenesis and band-pass plasmid pTSmut (11.9 kB) and the evolution plasmid pSkunk3 (4.4 kB). Panel B: cycle of infection of fresh E. coli F' cells (1 hour); growth, mutagenesis, and selection with spectinomycin, chloramphenicol, arabinose, IPTG, and cefotaxime and/or cefoxitin (~24 hours); and phage recovery (~20 min).]

Figure 42: Diagram of continuous evolution for antibiotic resistance using phage and mutator plasmid.

Results

We used a PACE-based continuous evolution system to evolve populations of bacteria in competition in liquid media over 10 days, with one round of evolution per day (~500 generations in total). These populations were challenged by various selection schemes designed to test the effects of clinically relevant antibiotic treatments on evolutionary outcomes. Specifically, we tested positive selection for only cefoxitin, only cefotaxime, cycling of both cefoxitin and cefotaxime, and a combination “cocktail” therapy of cefoxitin and cefotaxime. We established three replicate lineages to test each selection type.


Figure 43: History of library MICs during whole-plasmid evolution. Competitive whole-plasmid evolution experiments were carried out for ten rounds of evolution in the strategies discussed. Three lines of replicate experiments were evolved in each strategy, indicated by differently colored and shaped points in the graphs. Average minimum inhibitory concentration among replicates is indicated by the black line. Importantly, the minimum inhibitory concentration (MIC) noted in the cocktail therapy graphs was measured during selection in the presence of both cefotaxime and cefoxitin and is not indicative of individual MICs to a single antibiotic.

Plasmids pTSmut and pSkunk3 were constructed to carry out competitive evolutionary experiments with whole-plasmid mutagenesis. pSkunk3 expresses beta-lactamase, contains a DNA barcode indicating selection type, and is horizontally transferable via phage. pSkunk3 contains a phage origin of replication (f1), which allows for phage packaging. pTSmut induces mutagenesis through expression of the error-prone polymerase pol V, consisting of subunits umuD’ and umuC; expression of the dominant negative proofreading subunit dnaQ926; and induction of the SOS pathway through expression of recA730120. During each round of evolution, plasmids are randomly mutated via induced


genes on pTSmut. Selective advantage in antibiotic resistance causes adaptive library members to become enriched in the population. Concurrently, filamentous phage R408 infects cells. R408 preferentially packages the pSkunk3 plasmid, although a smaller fraction of phage contains the phage genome121. Phage continuously infect cells and are secreted from cells into the growth media. Growth advantages in bacteria (conferred by pSkunk3 plasmids) translate into growth advantages for the phage. Thus, the phage population is enriched for packaging of adaptive library members of pSkunk3 that confer greater antibiotic resistance to E. coli. After a given round, phage populations from each selection can be harvested to infect naïve E. coli containing only pTSmut.

Mutator plasmid pTSmut was constructed by incorporating mutagenic elements from plasmid MP-QUR120 into plasmid pTS40. Initially, we hoped to show that the band-pass elements from pTS40 are functional in continuous evolution; however, pilot testing showed that phage infection activated the ampC promoter and caused constitutive expression of TetC. Nevertheless, we determined that plasmid pTSmut induced mutagenesis as desired. By inducing the pTSmut mutagenesis machinery and expressing a beta-lactamase construct with a 1 bp nonsense mutation, we found that induction of pTSmut with 0.1% arabinose resulted in a mutagenesis rate of ~6.4 × 10⁻⁶ nt/replication, ~10⁴-fold higher than the natural E. coli mutagenesis rate122. We therefore used pTSmut as the mutator plasmid in all experiments.

Figure 43 demonstrates the evolutionary history of the pSkunk3-BLA plasmid libraries in each selection scheme. Positive selection in only cefotaxime resulted in an average of 192-fold improvement in minimum inhibitory concentration (MIC). Positive selection in only cefoxitin increased resistance an average of 128-fold. Relative to


cefotaxime-only selection, cefotaxime resistance was reduced 12-fold on average in cocktail strategies (importantly, in the presence of cefoxitin) and 24-fold in cycling strategies. In contrast, relative to the cefoxitin-only selections, cefoxitin resistance did not change due to cycling but showed a 16-fold reduction in cocktail strategies in the presence of cefotaxime. Cefotaxime resistance was also consistently lower across rounds during cycling selection. Phenotypically, it appears that both cefotaxime and cefoxitin resistance were constrained by antibiotic cycling.
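The relative reductions quoted above can be turned into implied average fold improvements with simple division. The sketch below assumes that each reduction is a ratio of average fold improvements in MIC, which is one reading of the text rather than a stated definition; the dictionary keys are illustrative labels.

```python
# Average fold improvements reported for the single-antibiotic selections.
ctx_only_fold = 192  # cefotaxime resistance, cefotaxime-only selection
fox_only_fold = 128  # cefoxitin resistance, cefoxitin-only selection

# Implied average fold improvements (assumption: each reported reduction
# is a ratio of average fold improvements).
implied_fold = {
    "cefotaxime, cocktail": ctx_only_fold / 12,
    "cefotaxime, cycling": ctx_only_fold / 24,
    "cefoxitin, cocktail": fox_only_fold / 16,
    "cefoxitin, cycling": fox_only_fold / 1,  # reported as unchanged
}

for condition, fold in implied_fold.items():
    print(f"{condition}: ~{fold:g}-fold over the starting MIC")
```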


Figure 44: Distribution of mutated features from plasmids resulting from each selection. Sanger sequencing was used to sequence representative plasmids from each selection scheme (n=33 total). (A) Relative frequency distribution of mutations found within each selection scheme. (B) Relative enrichment value of mutations within each selection scheme. Here, the relative enrichment value within each selection scheme is shown on the y-axis. Relative enrichment is found by calculating the frequency distribution of mutations found within each selection scheme and dividing by the expected frequency of random mutations. Color indicates key features of the plasmid: beta-lactamase (BLA, orange), the spectinomycin resistance gene (SmR, grey), the origin of replication (p15a, red), and the phage origin (f1, green).

After ten rounds of evolution (~500 generations of selection), we Sanger sequenced nine representative plasmids resulting from each selection and evaluated the resistance levels of these plasmids on both antibiotics. Only plasmids with complete sequence data and from evolutionary schemes with appropriately matching DNA


barcodes were evaluated in this work (n=8 from cefotaxime-only selection, n=9 from cefoxitin-only selection, n=7 from cocktail selection, and n=9 from cycling selection).

Figure 44 shows the distribution of the locations of these mutations for the 33 successfully sequenced whole plasmids among selection strategies (see also Table 2). The fewest mutations in the beta-lactamase coding region were found following cefoxitin-only selection. These data suggest that beta-lactamase evolutionary capacity was most constrained by cefoxitin and cycling selections, as measured by relative enrichment. The low frequency of mutation may indicate that the fitness of TEM-1 beta-lactamase is near-maximum in cefoxitin, leaving less potential for beneficial mutations within the gene. Instead, selective pressure may have favored mutations in regulatory elements elsewhere in the plasmid. In contrast, cefotaxime selection enriched mutations within the coding region of beta-lactamase. Regulatory or expression-related mutations that evolved may include changes to the replication origin p15a, which underwent greatest enrichment in cocktail and cycling selection strategies. Interestingly, the relative enrichment of beta-lactamase mutations during cocktail selections appears to approximate an intermediate between cefotaxime-only and cefoxitin-only selections. However, the cycling selections take an apparently distinct enrichment pattern, with nearly all mutations found in the origin of replication or in an unknown feature of the plasmid, which may serve a regulatory function.
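The relative enrichment measure used in this comparison (observed mutation frequency in a feature divided by the frequency expected for random mutations) can be sketched as below. The sketch assumes that random mutations fall uniformly along the plasmid, so that the expected frequency of a feature is its share of the plasmid length. The feature lengths and mutation counts are hypothetical values chosen only so that the lengths sum to 4360 bp; they are not the measured data.

```python
def relative_enrichment(mutation_counts, feature_lengths):
    """Observed fraction of mutations per feature divided by the fraction
    expected if mutations were uniformly distributed along the plasmid."""
    total_mutations = sum(mutation_counts.values())
    total_length = sum(feature_lengths.values())
    enrichment = {}
    for feature, length in feature_lengths.items():
        observed = mutation_counts.get(feature, 0) / total_mutations
        expected = length / total_length
        enrichment[feature] = observed / expected
    return enrichment


# Hypothetical feature lengths in bp (assumption), summing to 4360.
feature_lengths = {"BLA": 861, "SmR": 1000, "p15a": 800, "f1": 450, "other": 1249}
# Hypothetical mutation counts for one selection scheme (assumption).
mutation_counts = {"BLA": 16, "SmR": 2, "p15a": 8, "f1": 6, "other": 4}

enrichment = relative_enrichment(mutation_counts, feature_lengths)
for feature, value in enrichment.items():
    print(f"{feature}: {value:.2f}")
```

Values above 1 indicate that a feature carries more mutations than its length alone would predict, and values below 1 indicate fewer.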

Well-characterized mutations conferring cefotaxime resistance, such as E104K and G238S, were noted in the cefotaxime selection (7/8 alleles sequenced had E104K and/or G238S). These mutations were also often seen following the cocktail selection strategy (6/7 alleles had E104K and/or G238S). Notably, these well-characterized mutations were absent from the samples of both the cycling and cefoxitin selections. These mutations are commonly found in the clinic and in other directed evolution studies74.

[Figure 45 image. Fold improvement after Round 10 in cefoxitin and cefotaxime resistance for plasmids from the cefotaxime, cefoxitin, cocktail, and cycling selections.]

Figure 45: MICs of individual mutants resulting from each whole-plasmid selection scheme. Plasmids resulting from each selection scheme that had complete sequencing data were individually tested for MIC on both antibiotics. Fold improvement over resistance conferred by TEM-1 is noted in red for cefoxitin resistance and in blue for cefotaxime resistance. Notably, cycling selections ended during a round of cefoxitin selection.

The sequenced plasmids from each library were tested to evaluate the improvement in MIC after round 10 on both antibiotics (Figure 45, Table 2). The results largely confirm trends seen in Figure 43 on the library level but better illustrate some of the variation within the library, especially during cocktail selection. Individual variants were found in the cocktail selection strategies with MICs matching the greatest resistance found in other strategies to both antibiotics. In contrast, cefoxitin and cycling selections demonstrate less variation in resistance in both environments.
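One simple way to quantify this variation is the spread of MICs within each selection scheme, counted in two-fold steps. The sketch below uses the MIC pairs listed in Table 2; the choice of summary statistic (log2 of the ratio of the highest to the lowest MIC) is an illustrative assumption and not necessarily the measure behind Figure 45.

```python
import math

# (cefotaxime MIC, cefoxitin MIC) in ug/mL for each sequenced plasmid,
# transcribed from Table 2.
mics = {
    "cefotaxime only": [(5.12, 5.12), (5.12, 10.24), (5.12, 5.12), (1.28, 5.12),
                        (1.28, 10.24), (5.12, 5.12), (5.12, 5.12), (5.12, 5.12)],
    "cefoxitin only": [(0.32, 20.48)] * 9,
    "cocktail": [(0.64, 10.24), (0.64, 10.24), (0.64, 20.48), (2.56, 10.24),
                 (2.56, 10.24), (5.12, 20.48), (1.28, 10.24)],
    "cycling": [(0.32, 20.48)] * 9,
}


def log2_range(values):
    """Spread of a set of MICs, in two-fold dilution steps."""
    return math.log2(max(values) / min(values))


spread = {
    scheme: (
        log2_range([ctx for ctx, _ in pairs]),  # cefotaxime spread
        log2_range([fox for _, fox in pairs]),  # cefoxitin spread
    )
    for scheme, pairs in mics.items()
}

for scheme, (ctx_spread, fox_spread) in spread.items():
    print(f"{scheme}: cefotaxime {ctx_spread:.0f} steps, cefoxitin {fox_spread:.0f} steps")
```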


Discussion

Collectively, the results indicate that antibiotic cycling, at least at rapid speeds such as those used in this experiment, most drastically limited the evolution of beta-lactamase resistance to cefotaxime, both phenotypically and on the genetic level, in comparison to the other selections that included cefotaxime. Despite periodic exposure to cefotaxime, key typical resistance mutations to cefotaxime are not prevalent in the population. However, this result may be due to the final round of cycling taking place in the presence of cefoxitin, thereby selecting against cefotaxime-resistant members. Selection for cefoxitin, which is antagonistic to cefotaxime, probably decreases the frequency of members with these mutations in the population. Surprisingly, cocktail treatment with both antibiotics appears to result in much larger variability in resistance levels. One possibility is that “specialist” library members evolved during this strategy to handle separate antibiotics. In this case, a synergistic interaction could exist between members of the population, and particular niches of activity for selection may have emerged. This hypothesis is corroborated by the high variation in MICs found from the cocktail selection as well as the intermediate enrichment profile of beta-lactamase and other mutations as compared with cefotaxime- and cefoxitin-only selections.

Evolution during antibiotic cycling also appears to change the evolutionary pathways accessed. However, in the cycling case, the rapid changes in the type and strength of selection may have constrained the evolutionary pathways available to beta-lactamase. Instead, the high frequency of mutations to other plasmid elements such as the origin of replication suggests that the distinct pathway followed in this selection scheme was selection for increased beta-lactamase expression. Evidence for this hypothesis is


additionally found in the low variability of MICs between the two antibiotics following evolution in these conditions. Increasing expression is an intuitive generalist response if the beta-lactamase enzyme has some base-level activity towards both antibiotics.

Cefoxitin-only selection likewise shows low levels of MIC variation, but mutations are less frequent in the origin of replication. We hypothesize that the rapid fluctuations in antibiotics during cycling selections made specialist populations relatively rare in the population (Figure 46). Specialist frequency would have decreased every other round due to the alternation of antibiotics. Instead, generalists may have preferentially evolved, in this case apparently most easily by increasing beta-lactamase expression. Therefore, we suggest that cycling-based selections directed evolutionary pathways distinctly relative to single or combined antibiotic selections.

[Figure 46 image. Frequency versus round for the cefotaxime specialist, the cefoxitin specialist, and the generalist.]

Figure 46: Model of generalist evolution during antibiotic cycling. Frequency of members with each evolutionary strategy in the population is shown as a function of round of selection. Although specialists are more fit in individual rounds, antagonistic selection strength in subsequent rounds diminishes their frequency in the population. The generalist strategy, in this case hypothesized to be selection for increased expression, is less fit initially but is able to gain frequency in the population over time. In contrast, cocktail selection maintains pressure on both populations to coexist in the population.
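The qualitative model in Figure 46 can be illustrated with a minimal frequency-update simulation. This is a sketch under stated assumptions and not the model used to draw the figure: the relative fitness values and starting frequencies are hypothetical, and each round simply reweights frequencies by fitness in the current antibiotic and renormalizes.

```python
def simulate_cycling(initial, fitness_by_antibiotic, schedule):
    """Track strategy frequencies across rounds of alternating selection."""
    freqs = dict(initial)
    history = [dict(freqs)]
    for antibiotic in schedule:
        fitness = fitness_by_antibiotic[antibiotic]
        weighted = {name: freqs[name] * fitness[name] for name in freqs}
        total = sum(weighted.values())
        freqs = {name: value / total for name, value in weighted.items()}
        history.append(dict(freqs))
    return history


# Hypothetical relative fitness values (assumption): specialists do well in
# one antibiotic and poorly in the other; the generalist is intermediate.
fitness_by_antibiotic = {
    "cefotaxime": {"cefotaxime specialist": 1.0, "cefoxitin specialist": 0.1, "generalist": 0.5},
    "cefoxitin": {"cefotaxime specialist": 0.1, "cefoxitin specialist": 1.0, "generalist": 0.5},
}
# Hypothetical starting frequencies (assumption).
initial = {"cefotaxime specialist": 0.45, "cefoxitin specialist": 0.45, "generalist": 0.10}
# Ten rounds alternating between antibiotics, ending on cefoxitin.
schedule = ["cefotaxime", "cefoxitin"] * 5

history = simulate_cycling(initial, fitness_by_antibiotic, schedule)
for round_number, freqs in enumerate(history):
    print(round_number, {name: round(value, 3) for name, value in freqs.items()})
```

With these illustrative numbers, a specialist dominates after any single round, but the generalist accumulates over successive cycles because its fitness product across one full cycle (0.25) exceeds that of either specialist (0.1).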


These results corroborate the results of Schenk et al. and Kim et al. in evolving lower resistance levels following the cycling selection strategies, at least towards cefotaxime. However, previous studies did not take the possibility of specialized library members in the cocktail selection into consideration. By evaluating individual members, we note that individual variants demonstrate various resistance levels in cocktail selections, and this variation is absent in cycling strategies despite being exposed to multiple antibiotics. Because whole-plasmid evolution was not tested, previous studies also did not suggest altered expression conditions evolved as we hypothesize due to antibiotic cycling. Our study allows us to hypothesize the reasons for evolving these alternate alleles and lower overall resistance as modeled in Figure 46.

We are aware of no existing studies that demonstrate differences in creating evolutionary niches due to cocktail-based selections or cycling-based selections. This result merits further study. To improve our understanding of these results, we are undertaking next-generation sequencing of the libraries evolved following all evolutionary strategies. We hope that the population-based sequencing data may aid in corroborating our results shown here, such as by confirming distinct populations of “specialists” found following the cocktail-based evolution and by providing a greater sample size to compare mutations within beta-lactamase to other mutations, such as those within the origin of replication. This latter ratio would indicate a greater proportion of “generalist” library members that increase beta-lactamase expression or regulation. Thus these data should be able to provide conclusive evidence of the presence of generalists or specialists due to each selection scheme on a population-wide scale.


We suggest that different evolutionary pathways were accessed by changing the selection environment to include both cefotaxime and cefoxitin. We find indicative hallmarks of these different pathways in the changing prevalence between selections of the TEM-15 mutations E104K and G238S, the frequency of mutations outside the beta-lactamase coding region, and the various MICs evolved. Further, we conjecture that different lengths of cycles, for instance selection at multiple rounds with each single antibiotic before switching, should also change the evolutionary trajectory of the plasmids. Allowing more time to adapt to shifting environments may allow for greater evolutionary escape. This method would more closely mirror the negative selection discussed in Chapter 3.

Phenotypically, these results indicate that rapid cycling should be investigated in the clinic for mitigating the evolution of antibiotic resistance, with potential caveats regarding the length of cycles used. Longer cycles may allow for greater ability to adapt to a changing environment and could open new evolutionary pathways (see Chapter 3). Additionally, our experiment in antibiotic cycling ended with selection for cefoxitin resistance, not cefotaxime resistance. It could be argued that this final selection decreased library members with high cefotaxime resistance, providing us with a skewed sample of library members. However, we note that the MIC of the library shown in Figure 43 discourages this interpretation, since the cycling selection also reliably demonstrates lower resistance to cefotaxime across all rounds. Further studies using next-generation sequencing should better characterize rare members of the population and allow us to determine if common cefotaxime-resistance mutations are still present at a lower frequency. Even among the same class of antibiotic used here, we show that the


phenotypic resistance is constrained by this selection strategy, although expression may be increased. Although we suggest this promotes generalist evolution, the activity of the enzymes evolved is limited because key cefotaxime-hydrolyzing mutations are not gained. Specifically, the appearance of E104K and G238S is suppressed. Thus the ability to evolve cefotaxime resistance is relatively impaired. Our results further indicate caution in the use of multiple antibiotics concurrently, as cocktail therapy may have evolutionary consequences if specialist niches are able to evolve. This interpretation highlights the importance of population drift as a purifying force when antibiotics, and thus selection forces, change during evolution. Likewise, the importance of evolutionary niches may explain cooperative resistance reservoirs as found in biofilms123. We are currently undertaking deep sequencing of the evolved plasmids to better elucidate the variants evolved during this study. Additional work could compare the expression levels of beta-lactamase between selections. Further studies may investigate the effects of the rate of change of selection strength or the influence of the degree of increasing positive selection on evolutionary outcomes. Our results further indicate the contingency of population-wide, competitive evolutionary outcomes on the environment of selection and demonstrate potential practical inferences for designing clinical treatments.


Plasmid | Cefotaxime MIC (μg/mL) | Cefoxitin MIC (μg/mL) | Mutations
Cefotaxime only_A1 | 5.12 | 5.12 | (BLA E104K), (BLA G238S), A3257G (p15a ori), T3536C (f1 origin)
Cefotaxime only_A2 | 5.12 | 10.24 | (BLA E104K), (BLA G238S), G3963A (f1 origin), C4053G (f1 origin)
Cefotaxime only_A3 | 5.12 | 5.12 | (BLA E104K), (BLA G238S), A3257G (p15a ori), G3963A (f1 origin)
Cefotaxime only_B2 | 1.28 | 5.12 | G475A (LacO), (BLA G238S), A3257G (p15a ori), G3963A (f1 origin)
Cefotaxime only_B3 | 1.28 | 10.24 | G475A (LacO), (BLA E104K), (BLA G238S), G829A (BLA E108K), A1065T (BLA T186T), T1107G (BLA A200A), C1799G (SmR A600G), G2417A, T2522A, G3489A (p15a ori)
Cefotaxime only_C1 | 5.12 | 5.12 | (BLA E104K), (BLA G238S), G2417A
Cefotaxime only_C2 | 5.12 | 5.12 | (BLA E104K), (BLA G238S), A3325C (p15a ori)
Cefotaxime only_C3 | 5.12 | 5.12 | (BLA E104K), (BLA G238S), G2417A, C3056A (p15a ori), G3067A (p15a ori), G3417A (p15a ori), C3439T (p15a ori), G4044A (f1 origin)
Cefoxitin only_A1 | 0.32 | 20.48 | G435A (P300), C985A (BLA L160I), T3824C (f1 origin)
Cefoxitin only_A2 | 0.32 | 20.48 | C985A (BLA L160I), T3824C (f1 origin)
Cefoxitin only_A3 | 0.32 | 20.48 | G435A (P300), C985A (BLA L160I), T3824C (f1 origin)
Cefoxitin only_B1 | 0.32 | 20.48 | G435A (P300), T3824C (f1 origin)
Cefoxitin only_B2 | 0.32 | 20.48 | A1496G, A1532T (SmR K511I), T2161G (SmR S721A), T2397G, A3642G, T3824C, A3851G (f1 origin), A3940G (f1 origin), A4000G (f1 origin)
Cefoxitin only_B3 | 0.32 | 20.48 | G3067A (p15a ori)
Cefoxitin only_C1 | 0.32 | 20.48 | C1725T (SmR C575C)
Cefoxitin only_C2 | 0.32 | 20.48 | T460C, A2356G
Cefoxitin only_C3 | 0.32 | 20.48 | C1725T (SmR C575C), A1854C (SmR E618D)
Cocktail_A1 | 0.64 | 10.24 | C371A, T387G, A438C (P300), (BLA G238S), A1841T (SmR D614V), G3357A (p15a ori), A3409G (p15a ori), C3439T (p15a ori), C3590A (p15a ori), A4121G (f1 origin)
Cocktail_A2 | 0.64 | 10.24 | C371A, T387G, A438C (P300), (BLA G238S), G3357A (p15a ori), A3409G (p15a ori), C3439T (p15a ori), C3590A (p15a ori), A4121G (f1 origin)
Cocktail_A3 | 0.64 | 20.48 | C371A, T387G, A438C (P300), (BLA G238S), C1811T (SmR A604V), A2679G, A3145T (p15a ori), C3590A (p15a ori), T3667C, A4121G (f1 origin)
Cocktail_B1 | 2.56 | 10.24 | T434A (P300), A655G (BLA N50D), (BLA E104K), (BLA G238S), G1610C (SmR S537T), G1840C (SmR D614H), G3438A (p15a ori)
Cocktail_B2 | 2.56 | 10.24 | T434A (P300), A655G (BLA N50D), (BLA E104K), (BLA G238S), G3438A (p15a ori)
Cocktail_B3 | 5.12 | 20.48 | T434A (P300), A655G (BLA N50D), (BLA E104K), (BLA G238S), G1293A (BLA T262T), G3438A (p15a ori)
Cocktail_C2 | 1.28 | 10.24 | G435A (P300), G683A (BLA R59R), T2144C (SmR F715S), G2594A, T3103C (p15a ori), C3928T (f1 origin)
Cycling_C1 | 0.32 | 20.48 | C2545T, T3536C
Cycling_C2 | 0.32 | 20.48 | G435A (P300), C2545T, A3320T (p15a ori), T3536C, A3757C, G3763T, G3807C
Cycling_C3 | 0.32 | 20.48 | C2545T, T3536C
Cycling_B1 | 0.32 | 20.48 | T876C (BLA A123A), C1252T (BLA P249S), G2270A (SmR R757H), G2429A, T2430A, A2978G (p15a ori), T3404C (p15a ori)
Cycling_B2 | 0.32 | 20.48 | T876C (BLA A123A), C1252T (BLA P249S), A2385G, A2978G (p15a ori), T3404C (p15a ori), T3408G (p15a ori)
Cycling_B3 | 0.32 | 20.48 | T876C (BLA A123A), C1252T (BLA P249S), A2385G, G2451A (p15a ori), A2978G (p15a ori), T3404C (p15a ori), T3408G (p15a ori)
Cycling_A1 | 0.32 | 20.48 | G435A (P300), T876C (BLA A123A), C1252T (BLA P249S), A2385G, T3404C (p15a ori), T3408G (p15a ori)
Cycling_A2 | 0.32 | 20.48 | T876C (BLA A123A), C1252T (BLA P249S), G1817A (SmR R606K), A2385G, T2398A, A2978G (p15a ori), T3153G (p15a ori), G3159A (p15a ori), T3150A (p15a ori), T3404C (p15a ori), T3408G (p15a ori)
Cycling_A3 | 0.32 | 20.48 | G435A (P300), T876C (BLA A123A), G1243T (BLA A246S), C1252T (BLA P249S), A2385G, A2978G (p15a ori), A3175G (p15a ori), T3404C (p15a ori)

Table 2: Plasmid mutations resulting from ten rounds of evolution in each selection strategy and associated MIC values. Only plasmids with successful full sequencing are shown. Mutations are noted by point of DNA mutation unless shown for beta-lactamase, in which amino acid mutation is noted. Each known feature is color-coded (mutations in the coding region of beta-lactamase are noted in green, promoter mutations are in blue, origin of replication are in orange, phage origin of replication is in grey, and spectinomycin resistance is shown in tan).


Chapter 5: Conclusions and Future Directions

This work describes the structure of the fitness landscape, applies that understanding to protein engineering, and uses both landscape structure and directed evolution to understand and impede the evolution of antibiotic resistance. Further studies may be undertaken to better understand each of these subjects.

Conclusions reached in this work

Chapter 2 provides a high-resolution, quantitative description of how the fitness landscape changes during evolution. We concentrate our analysis on the effect of random additional mutations as the adaptive mutations E104K and/or G238S are gained. In this manner, we generate a “stack” of fitness landscapes along an adaptive pathway, describing the higher dimensionality of the fitness landscape beyond existing studies. This stack of fitness data allows us to deeply explore epistatic interactions and fitness tradeoffs. We find extensive epistasis and are able to attribute much of it to G238S both structurally and functionally (i.e., through destabilizing effects). Understanding how adaptation increases epistasis enhances our understanding of evolution. We provide evidence for a robustness-threshold model of protein evolution. G238S is destabilizing and augments epistasis. The protein appears to have a threshold of stability, and the combination of G238S and an additional mutation may push the protein beneath this threshold, dramatically increasing epistasis and lowering the fitness of the protein. This could suggest widespread stability-activity tradeoffs in evolving functionality. Additionally, we are able to describe the shape of the fitness landscape, specifically its local steepness and ruggedness. As with epistatic interactions, we find that the steepness and ruggedness of the landscape increase during adaptation. These descriptions were not possible before this study because they require a systematic, high-resolution approach and a series of landscapes along an adaptive pathway. These results demonstrate the limitations of evolution and show trends of functional tradeoffs in evolving activity.
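
As an illustration only, the robustness-threshold argument above can be sketched with a textbook two-state folding model in which fitness is taken to be proportional to the fraction of folded protein. This is not the analysis performed in Chapter 2. The function names, the free-energy values, and the treatment of a destabilizing adaptive mutation as a uniform 3 kcal/mol shift of the background are all assumptions, chosen to show how the same pair of mildly destabilizing mutations can look nearly additive in a stable background and strongly negatively epistatic in a background closer to the stability threshold.

```python
import math

RT = 0.593  # kcal/mol at roughly 298 K


def fraction_folded(dg_fold):
    """Two-state fraction folded; dg_fold < 0 (kcal/mol) means a stable fold."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))


def log_fitness(dg_background, ddg_mutations=()):
    """Log fitness, ASSUMING fitness is proportional to the folded fraction
    and that the stability effects (ddG) of mutations simply add."""
    return math.log(fraction_folded(dg_background + sum(ddg_mutations)))


def pairwise_epistasis(dg_background, ddg_a, ddg_b):
    """Deviation of the double mutant from additivity on the log-fitness scale."""
    w_0 = log_fitness(dg_background)
    w_a = log_fitness(dg_background, (ddg_a,))
    w_b = log_fitness(dg_background, (ddg_b,))
    w_ab = log_fitness(dg_background, (ddg_a, ddg_b))
    return w_ab - w_a - w_b + w_0


# Hypothetical values, not measurements from this work.
STABLE_BACKGROUND = -8.0        # kcal/mol, a comfortably stable protein
DESTABILIZED_BACKGROUND = -5.0  # same protein after a 3 kcal/mol destabilization
DDG_A = DDG_B = 2.0             # two mildly destabilizing additional mutations

eps_stable = pairwise_epistasis(STABLE_BACKGROUND, DDG_A, DDG_B)
eps_destabilized = pairwise_epistasis(DESTABILIZED_BACKGROUND, DDG_A, DDG_B)

print(f"epistasis in stable background:      {eps_stable:.4f}")
print(f"epistasis in destabilized background: {eps_destabilized:.4f}")
```

Under these assumed values the epistasis term is negative in both backgrounds and is much larger in magnitude in the destabilized background, which is the qualitative pattern a stability threshold would produce.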

In Chapter 3, we undertook four directed evolution strategies in parallel in order to evaluate the effects of changing selection pressure and environment on evolution. This question harkens back to hypotheses central to evolutionary theory, as posed by Sewall Wright, Stephen Jay Gould, and others, as to whether evolution is contingent on past events. Because epistatic interactions change as a gene evolves (as evidenced by the work in Chapter 2), we find that evolution follows distinct pathways depending on previous selective events. Specifically, we compared positive selection, neutral selection, oscillating selection, and negative selection. We found that negative selection relative to the original activity counter-intuitively resulted in mutants of beta-lactamase with the highest resistance after subsequent positive selection. Each selection type generated distinct genotypes of variants, with negative selection yielding the greatest average genetic distance from the wild-type allele.

Additionally, we studied in depth a high-activity mutant resulting from negative selection strategies. We constructed most combinations of the nine mutations in this allele and evaluated the fitness of these combinations. This method allowed us to evaluate most potential evolutionary pathways among these mutations. We found that the vast majority of pathways to the final mutant had negative selection steps. We created a new method for visualizing fitness landscapes, selection-weighted attractive graphing (SWAG), in order to graph the landscape of these nine mutations. To our knowledge, this is the largest existing landscape of an evolutionary pathway, 16-fold larger than the five-amino-acid landscape evaluated by Weinreich et al.45. Additionally, it is the first characterized landscape that requires fitness valleys to be crossed to arrive at its evolutionary endpoint. Our new method, SWAG, allows us both to visualize this landscape and to verify the existence of a fitness valley in acquiring the mutation F230S and the associated high-fitness peaks. This work has direct impact both for understanding evolutionary theory in general and for beta-lactamase in particular, which is highly clinically relevant and exposed to many selective environments and antibiotic substrates. Finally, this work includes library members with a 32-fold increase in activity over the GKTS allele, which has been studied as a model pathway in directed evolution for over twenty years without improved mutants being found, with one recent exception demonstrating only a two-fold improvement109. Therefore, methods incorporating negative selection should have direct practical applications in protein engineering.
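
The arithmetic behind these comparisons, and one simple way of classifying pathways, can be sketched in a few lines of Python. For n mutations there are 2^n genotypes and n! orders in which the mutations can be acquired, so nine mutations give 512 genotypes (16 times the 32 genotypes of a five-mutation landscape) and 362,880 direct pathways. The three-mutation landscape below is entirely hypothetical: the labels a, b, and c and every fitness value are invented for illustration. The sketch does not reproduce the SWAG method; it only shows how a pathway can be flagged when any single step lowers fitness.

```python
from itertools import permutations
from math import factorial


def landscape_size(n_mutations):
    """Return (number of genotypes, number of direct pathways) for n mutations."""
    return 2 ** n_mutations, factorial(n_mutations)


def classify_paths(fitness, mutations):
    """Split all mutation orderings into monotonic and valley-crossing paths.

    `fitness` maps a frozenset of acquired mutations to a fitness value.
    A path is 'valley-crossing' if at least one step decreases fitness.
    """
    monotonic, valley_crossing = [], []
    for order in permutations(mutations):
        genotype = frozenset()
        previous = fitness[genotype]
        has_drop = False
        for mutation in order:
            genotype = genotype | {mutation}
            current = fitness[genotype]
            if current < previous:
                has_drop = True
            previous = current
        (valley_crossing if has_drop else monotonic).append(order)
    return monotonic, valley_crossing


# Hypothetical toy landscape: 'a' is deleterious on its own but
# strongly beneficial once 'b' is present.
TOY_FITNESS = {
    frozenset(): 1.0,
    frozenset("a"): 0.5,
    frozenset("b"): 1.5,
    frozenset("c"): 1.2,
    frozenset("ab"): 3.0,
    frozenset("ac"): 0.7,
    frozenset("bc"): 1.4,
    frozenset("abc"): 4.0,
}

genotypes_9, paths_9 = landscape_size(9)
genotypes_5, _ = landscape_size(5)
monotonic_paths, valley_paths = classify_paths(TOY_FITNESS, ("a", "b", "c"))

print(genotypes_9, paths_9, genotypes_9 // genotypes_5)
print(len(monotonic_paths), "monotonic;", len(valley_paths), "valley-crossing")
```

In a real analysis the fitness dictionary would hold measured values for the constructed combinations, and pathways through unconstructed genotypes would need to be handled explicitly rather than assumed.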

Chapter 4 describes experiments that point to clinically relevant implications of these effects, in which we performed continuous and competitive whole-plasmid evolution for ~500 generations. We designed and implemented a system for testing evolution under positive selection on cefotaxime and cefoxitin as well as cycling and cocktail combinations of these antibiotics. Cycling and cocktail strategies are used in the clinic, but their implications are poorly understood. The cycling strategy appeared to have the greatest ability to limit the evolution of resistance, as measured by the minimum inhibitory concentration of individual variants and the emergence of key cefotaxime-resistance mutations in beta-lactamase. Instead, cycling enriched mutations potentially important to the regulation of expression. We suggest that cycling and cocktail selection strategies preferentially evolved “generalist” and “specialist” population niches, respectively.

However, this data is preliminary and limited by small sample sizes. To remedy this limitation, we are deep sequencing libraries of plasmids resulting from each selection scheme. Our efforts confirm earlier work on the benefits of antibiotic cycling71,116, but extend the analysis from a population-wide study of genes to a population and individual variant study of the whole plasmids involved. The study of whole plasmids highlights the role of regulatory mutations, which were enriched during antibiotic cycling, and demonstrates distinct evolutionary pathways among populations. This work should provide the first in-depth study of the effects of antibiotic treatment strategies on plasmid evolution and could inform clinical strategies.

Revealing the shape of the fitness landscape

Due to the astronomical size of sequence space, a complete description of the fitness landscape is impossible. This analysis focuses on changing fitness effects due to adaptation and provides a thorough measure of pairwise and tertiary epistasis along an adaptive pathway. Epistasis is underexplored experimentally, and the principles used here to understand secondary and tertiary interactions between residues may be extended to higher orders, to other types of proteins (e.g., a binding protein instead of an enzyme), or to intergenic epistasis. However, our current results are limited to a small number of adaptations in a single protein. Comparing the tradeoffs of adaptation among a diverse set of adaptive mutations in many proteins may reveal whether the trends noted here are truly generalizable.


A particular trend of interest is whether stability-activity tradeoffs are widespread in evolution. Researchers speculate that stability may allow for greater evolvability, but this hypothesis remains largely untested with rare exceptions97. A direct answer to this question in beta-lactamase would provide synergistic data with the work presented here, as a multidimensional analysis of fitness-epistasis-stability tradeoffs could be attempted.

Although our results suggest destabilizing mutations to be the major determinant of fitness, an empirical high-resolution data set of the stability landscape (that is, the stability change of each single-amino-acid mutant in beta-lactamase) is necessary to verify this hypothesis. As with stability-activity tradeoffs, the robustness-threshold model of protein stability could also be tested directly and on a large scale with this data.

The results of a stability landscape may then be used to better guide protein engineering if stability-activity tradeoffs are widespread. In these cases, proteins would be engineered first for high stability and then for high activity. However, this hypothesis is under-examined, and others believe that excessive stability may instead “lock” the protein into an undesirable conformation or limit the possible evolutionary outcomes124.

In such cases, maintaining lower stability or an ensemble of protein states and accessible evolutionary pathways may be more desirable.

Our data also presents the first description of the “steepness” or greater structure of the landscape of mutants along an adaptive pathway, but this description is limited, as it is built from only four “slices” of the local landscape. Because we provide the first description of landscape ruggedness and steepness, the extent of these trends and the greater structure of the landscape are of interest. For instance, are rugged features relatively rare and localized near adaptive residues, or are they widespread in protein space? For this comparison, the local fitness landscape around a neutral or deleterious mutation would need to be constructed. “Deeper” data sets created by increasing the depth of such stacks could also be integrated with the SWAG landscape visualization presented in Chapter 3.

Further fruitful investigations into mechanisms of higher order epistasis may include structural analyses. In other words, why do residues exhibiting high levels of epistasis interact strongly? Commonalities may exist that extend beyond our analysis of localization of these mutations to the core or surface of the protein. Computational or structural experiments on notable residues found in our data would be expected to have highly synergistic results with the landscapes presented here. Ultimately, multidimensional comparisons between structure, stability, fitness, and epistasis on a large scale would provide a thorough description of the adaptive landscape.

If epistatically interacting mutations are enriched during adaptation, is the reverse true? In other words, could mutating strongly epistatically interacting residues provide greater adaptability? It could be that such residues are “key” residues in the protein. In this case, targeting high-epistasis residues for directed evolution may enrich the generation of novel or high-activity proteins, but this hypothesis is entirely speculative and has no current theoretical basis. However, we note that the F230S mutation, which was key to the results of Chapter 3, showed extremely high epistatic interaction with the G238S adaptive residue.

Evolution in changing environments and selection pressure

Just as the fitness landscape spans a vast space, there are many possible search strategies to follow during directed evolution. Here, we show the impact of selection environment by following only five of these paths in beta-lactamase. Clearly, there is potential for exploring more pathways. A notable outstanding question is not only whether the direction of selection affects evolution (as explored in Chapter 3), but also whether the strength of selection is a major determinant of each evolutionary pathway. It is highly likely that this is the case, and early experiments with more continuously increasing selection stringencies indicate large evolutionary potential125,126. Still, no extensive comparison exists between evolutionary paths as selection strength is varied from continuously increasing to larger discrete steps.

The libraries generated in this experiment are potentially a very rich source of data. We present here the deep sequencing of libraries following negative selection.

Unfortunately, limited conclusions can be drawn from this deep sequencing data on its own. The deep sequencing of libraries undergoing the other selection strategies would generate a data set ripe for comparison. The analysis of the similarities and differences of the differently selected libraries may reveal interesting trends that could also be cross-referenced to the fitness landscape data presented here in Chapter 2.

Further studies could also expand on the characterization of the high-activity variants studied here. Other variants could be analyzed in the same detail as the BS-NEG-4 allele to verify the widespread influence of fitness valleys in evolutionary history. A structural analysis of the most resistant variants would help explain how the protein functions and, in particular, what role the F230S mutation plays in residue interactions.

Additionally, data from diverse beta-lactamase variants and families might be compared to see whether natural commonalities exist with the novel variants evolved in this study.


A particularly valuable extension of this work would be to apply the directed evolution methods used here to a protein other than beta-lactamase. For example, using negative selection to generate high-specificity antibodies would be a viable and highly practical extension of this research.

Applications in antibiotic resistance

The changes in diversity and resistance observed under various complex selection pressures in both types of directed evolution experiments (Chapters 3 and 4) suggest that it may be possible to limit evolution when required. As noted in the previous section, many possible combinations of evolutionary pressure may be applied. In these experiments, rapid fluctuation generated less diversity and resistance. However, does this result depend on the rate of fluctuation? Longer periods of fluctuation may instead allow more diverse and potentially fitter regions of sequence space to be explored. Therefore, although this method shows promise for limiting evolution, optimizing parameters such as the period of fluctuation would better inform clinical practice.

The apparent difference in evolving specialist or generalist populations during cocktail or cycling selections, respectively, is surprising and merits further study. More extensive testing of the variation in resistances generated would help to better evaluate whether these results are significant. We are currently undertaking deep sequencing of all libraries generated. This data should better explain the results found and indicate the significance of variability or trends between selection types. This data could also be cross-referenced to the fitness landscape data generated here. If specialist populations evolved in the cocktail selection, for example, they might be identifiable by the presence of two common families of plasmid in the deep sequencing data.

Because our analysis of the experiments in Chapter 4 included the entire plasmid, it is likely a good analog of natural evolution. However, at this point it is difficult to explain the effects of many mutations outside of the beta-lactamase gene. Subcloning of beta-lactamase could be undertaken to see if these effects can be isolated to the plasmid backbone. Likewise, expression tests could be attempted to see if the copy number of the plasmid, mRNA or protein stability, or yield of beta-lactamase protein changed in particular cases.

A final clear extension of this work is to apply it to other antibiotics and systems.

Resistance to the antibiotics used here exhibits some antagonism. However, it seems likely that the results are generalizable to any evolutionary situation with a fitness tradeoff. The research could readily be extended not only to other antibiotics but also to other evolutionary models, such as the effectiveness of cocktail therapy on the evolution of drug-resistant cancer.

References

1. Wright, S. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proceedings of the VI International Congress of Genetics 1, 356–366 (1932).

2. Smith, J. M. Natural selection and the concept of a protein space. Nature 225, 563–564 (1970).

3. Fisher, R. A. The Genetical Theory of Natural Selection. (Oxford University Press, 1999).

4. Gavrilets, S. Fitness Landscapes and the Origin of Species. (Princeton University Press, 2004).

5. Chiotti, K. E. et al. The Valley-of-Death: Reciprocal sign epistasis constrains adaptive trajectories in a constant, nutrient limiting environment. 104, 431–437 (2014).

6. Weinreich, D. M. & Chao, L. Rapid evolutionary escape by large populations from local fitness peaks is likely in nature. Evolution 59, 1175–1182 (2005).

7. Gokhale, C. S., Iwasa, Y., Nowak, M. A. & Traulsen, A. The pace of evolution across fitness valleys. Journal of Theoretical Biology 259, 613–620 (2009).

8. Orr, H. A. Dobzhansky, Bateson, and the genetics of speciation. Genetics 144, 1331 (1996).

9. Hubbell, S. P. The Unified Neutral Theory of Biodiversity and Biogeography. (Princeton University Press, 2001).

10. Breen, M. S., Kemena, C., Vlasov, P. K., Notredame, C. & Kondrashov, F. A. Epistasis as the primary factor in molecular evolution. Nature (2012). doi:10.1038/nature11510

11. Lehner, B. Molecular mechanisms of epistasis within and between genes. Trends Genet (2011). doi:10.1016/j.tig.2011.05.007

12. Tokuriki, N. et al. Diminishing returns and tradeoffs constrain the laboratory optimization of an enzyme. Nat Comms 3, 1257 (2012).

13. MacLean, R. C., Perron, G. G. & Gardner, A. Diminishing returns from beneficial mutations and pervasive epistasis shape the fitness landscape for rifampicin resistance in Pseudomonas aeruginosa. Genetics 186, 1345–1354 (2010).

14. Chou, H. H., Chiu, H. C., Delaney, N. F., Segre, D. & Marx, C. J. Diminishing returns epistasis among beneficial mutations decelerates adaptation. Science 332, 1190–1192 (2011).

15. Boucher, J. I. et al. Viewing protein fitness landscapes through a next-gen lens. Genetics 198, 461–471 (2014).

16. Melnikov, A., Rogov, P., Wang, L., Gnirke, A. & Mikkelsen, T. S. Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent fitness landscapes. Nucleic Acids Research 42, e112–e112 (2014).

17. Hietpas, R. T., Jensen, J. D. & Bolon, D. N. A. Experimental illumination of a fitness landscape. Proceedings of the National Academy of Sciences (2011). doi:10.1073/pnas.1016024108

18. Pitt, J. N. & Ferre-D'Amare, A. R. Rapid construction of empirical RNA fitness landscapes. Science 330, 376–379 (2010).

19. Jacquier, H. et al. Capturing the mutational landscape of the beta-lactamase TEM-1. Proceedings of the National Academy of Sciences 110, 13067–13072 (2013).

20. Kouyos, R. D. et al. Exploring the Complexity of the HIV-1 Fitness Landscape. PLoS Genet. 8, e1002551 (2012).

21. Fowler, D. M. et al. High-resolution mapping of protein sequence-function relationships. Nature Methods 7, 741–746 (2010).

22. Roscoe, B. P., Thayer, K. M., Zeldovich, K. B., Fushman, D. & Bolon, D. N. A. Analyses of the effects of all ubiquitin point mutants on yeast growth rate. J Mol Biol 425, 1363–1377 (2013).

23. Firnberg, E., Labonte, J. W., Gray, J. J. & Ostermeier, M. A comprehensive, high-resolution map of a gene's fitness landscape. Mol. Biol. Evol. 31, 1581–1592 (2014).

24. Mustonen, V. & Lässig, M. From fitness landscapes to seascapes: non-equilibrium dynamics of selection and adaptation. Trends in Genetics 25, 111–119 (2009).

25. Richter, H. & Engelbrecht, A. Recent Advances in the Theory and Application of Fitness Landscapes. 6, (Springer Berlin Heidelberg, 2014).

26. Meyer, J. R. et al. Repeatability and contingency in the evolution of a key innovation in phage lambda. Science 335, 428–432 (2012).

27. Flynn, K. M., Cooper, T. F., Moore, F. B.-G. & Cooper, V. S. The environment affects epistatic interactions to alter the topology of an empirical fitness landscape. PLoS Genet. 9, e1003426 (2013).

28. Hietpas, R. T., Bank, C., Jensen, J. D. & Bolon, D. N. A. Shifting fitness landscapes in response to altered environments. Evolution 67, 3512–3522 (2013).

29. Poelwijk, F. J., Tănase-Nicola, S., Kiviet, D. J. & Tans, S. J. Reciprocal sign epistasis is a necessary condition for multi-peaked fitness landscapes. Journal of Theoretical Biology 272, 141–144 (2011).

30. Gould, S. J. Wonderful Life: The Burgess Shale and the Nature of History. (W. W. Norton & Company, 1990).

31. Gong, L. I. & Bloom, J. D. Epistatically interacting substitutions are enriched during adaptive protein evolution. PLoS Genet. 10, e1004328 (2014).

32. Jones, A. G., Bürger, R. & Arnold, S. J. Epistasis and natural selection shape the mutational architecture of complex traits. Nat Comms 5, (2014).

33. Hall, D. W., Agan, M. & Pope, S. C. Fitness epistasis among 6 biosynthetic loci in the budding yeast Saccharomyces cerevisiae. J Hered 101 Suppl 1, S75–84 (2010).

34. Franke, J., Klözer, A., de Visser, J. A. G. M. & Krug, J. Evolutionary accessibility of mutational pathways. PLoS Comput Biol 7, e1002134 (2011).

35. Kvitek, D. J. & Sherlock, G. Reciprocal sign epistasis between frequently experimentally evolved adaptive mutations causes a rugged fitness landscape. PLoS Genet. 7, e1002056 (2011).

36. O'Maille, P. E. et al. Quantitative exploration of the catalytic landscape separating divergent plant sesquiterpene synthases. Nature Chemical Biology 4, 617–623 (2008).

37. Brown, K. M. et al. Compensatory mutations restore fitness during the evolution of dihydrofolate reductase. Molecular Biology and Evolution 27, 2682–2690 (2010).

38. Ortlund, E. A., Bridgham, J. T., Redinbo, M. R. & Thornton, J. W. Crystal structure of an ancient protein: evolution by conformational epistasis. Science 317, 1544–1548 (2007).

39. Bank, C., Hietpas, R. T., Jensen, J. D. & Bolon, D. N. A. A systematic survey of an intragenic epistatic landscape. Mol. Biol. Evol. 229–238 (2014). doi:10.1093/molbev/msu301

40. Phillips, P. C. Epistasis--the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet 9, 855–867 (2008).

41. Olson, C. A., Wu, N. C. & Sun, R. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Current Biology 2643–2651 (2014). doi:10.1016/j.cub.2014.09.072

42. Hinkley, T. et al. A systems analysis of mutational effects in HIV-1 protease and reverse transcriptase. Nat Genet 43, 487–489 (2011).

43. Otwinowski, J. & Plotkin, J. B. Inferring fitness landscapes by regression produces biased estimates of epistasis. Proceedings of the National Academy of Sciences 111, E2301–9 (2014).

44. Weinreich, D. M., Lan, Y., Wylie, C. S. & Heckendorn, R. B. Should evolutionary geneticists worry about higher-order epistasis? Curr Opin Genet Dev 23, 700–707 (2013).

45. Weinreich, D. M., Delaney, N. F., Depristo, M. A. & Hartl, D. L. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006).

46. Rokyta, D. R. et al. Epistasis between Beneficial Mutations and the Phenotype-to-Fitness Map for a ssDNA Virus. PLoS Genet. 7, e1002075 (2011).

47. Khan, A. I., Dinh, D. M., Schneider, D., Lenski, R. E. & Cooper, T. F. Negative Epistasis Between Beneficial Mutations in an Evolving Bacterial Population. Science 332, 1193–1196 (2011).

48. da Silva, J., Coetzer, M., Nedellec, R., Pastore, C. & Mosier, D. E. Fitness epistasis and constraints on adaptation in a human immunodeficiency virus type 1 protein region. Genetics 185, 293–303 (2010).

49. Moradigaravand, D. et al. Recombination Accelerates Adaptation on a Large-Scale Empirical Fitness Landscape in HIV-1. PLoS Genet. 10, e1004439 (2014).

50. Costanzo, M. S., Brown, K. M. & Hartl, D. L. Fitness trade-offs in the evolution of dihydrofolate reductase and drug resistance in Plasmodium falciparum. Plos One 6, e19636 (2011).

51. Kimura, M. The Neutral Theory of Molecular Evolution. (Cambridge University Press, 1984).

52. Ohta, T. The nearly neutral theory of molecular evolution. Annual Review of Ecology and Systematics (1992). doi:10.2307/2097289

53. Muller, H. J. The relation of recombination to mutational advance. Mutat. Res. 106, 2–9 (1964).

54. Bloom, J. D., Romero, P. A., Lu, Z. & Arnold, F. H. Neutral genetic drift can alter promiscuous protein functions, potentially aiding functional evolution. Biol Direct 2, 17–17 (2007).

55. Bershtein, S., Goldin, K. & Tawfik, D. S. Intense neutral drifts yield robust and evolvable consensus proteins. J Mol Biol 379, 1029–1044 (2008).

56. Smith, W. S., Hale, J. R. & Neylon, C. Applying neutral drift to the directed molecular evolution of a β-glucuronidase into a β-galactosidase: Two different evolutionary pathways lead to the same variant. BMC Res Notes 4, 138 (2011).

57. Petrie, K. L. & Joyce, G. F. Limits of neutral drift: lessons from the in vitro evolution of two ribozymes. J Mol Evol 1–16 (2014). doi:10.1007/s00239-014-9642-z

58. Romero, P. A. & Arnold, F. H. Exploring protein fitness landscapes by directed evolution. Nat Rev Mol Cell Biol 10, 866–876 (2009).

59. Brustad, E. M. & Arnold, F. H. Optimizing non-natural protein function with directed evolution. Curr Opin Chem Biol 15, 201–210 (2011).

60. Dellus-Gur, E., Toth-Petroczy, A., Elias, M. & Tawfik, D. S. What makes a protein fold amenable to functional innovation? Fold polarity and stability tradeoffs. Journal of Molecular Biology (2013). doi:10.1016/j.jmb.2013.03.033

61. Abriata, L. A., M Salverda, M. L. & Tomatis, P. E. Sequence-function-stability relationships in proteins from datasets of functionally annotated variants: The case of TEM β-lactamases. FEBS Lett. 586, 3330–3335 (2012).

62. Heineman, R. H. & Brown, S. P. Experimental evolution of a bacteriophage virus reveals the trajectory of adaptation across a fecundity/longevity trade-off. Plos One 7, e46322 (2012).

63. Stemmer, W. P. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389–391 (1994).

64. D’Costa, V. M. et al. Antibiotic resistance is ancient. Nature 477, 457–461 (2011).

65. Risso, V. A., Gavira, J. A., Mejia-Carmona, D. F., Gaucher, E. A. & Sanchez-Ruiz, J. M. Hyperstability and substrate promiscuity in laboratory resurrections of precambrian β-lactamases. ACS Synth. Biol. 135, 2899–2902 (2013).

66. Livermore, D. M. Beta-Lactamases in laboratory and clinical resistance. Clin Microbiol Rev 8, 557–584 (1995).

67. Poole, K. Resistance to beta-lactam antibiotics. Cell. Mol. Life Sci. 61, 2200–2223 (2004).

68. Orencia, M. C., Yoon, J. S., Ness, J. E., Stemmer, W. P. & Stevens, R. C. Predicting the emergence of antibiotic resistance by directed evolution and structural analysis. Nat. Struct. Biol. 8, 238–242 (2001).

69. Ambler, R. P. et al. A standard numbering scheme for the class-a beta-lactamases. Biochem J 276, 269–270 (1991).

70. Wang, X., Minasov, G. & Shoichet, B. K. Evolution of an antibiotic resistance enzyme constrained by stability and activity trade-offs. Journal of Molecular Biology 320, 85–95 (2002).

71. Schenk, M. F., Szendro, I. G., Salverda, M. L. M., Krug, J. & de Visser, J. A. G. M. Patterns of epistasis between beneficial mutations in an antibiotic resistance gene. Molecular Biology and Evolution (2013). doi:10.1093/molbev/mst096

72. Singh, M. K. & Dominy, B. N. The evolution of cefotaximase activity in the TEM β-lactamase. Journal of Molecular Biology 415, 205–220 (2012).

73. Poyart, C., Mugnier, P., Quesne, G., Berche, P. & Trieu-Cuot, P. A novel extended-spectrum TEM-type beta-lactamase (TEM-52) associated with decreased susceptibility to moxalactam in Klebsiella pneumoniae. Antimicrob. Agents Chemother. 42, 108–113 (1998).

74. Barlow, M. & Hall, B. G. Predicting evolutionary potential: in vitro evolution accurately reproduces natural evolution of the tem beta-lactamase. Genetics 160, 823–832 (2002).

75. Kopsidas, G. et al. RNA mutagenesis yields highly diverse mRNA libraries for in vitro protein evolution. BMC Biotechnol. 7, 18 (2007).

76. Zaccolo, M. & Gherardi, E. The effect of high-frequency random mutagenesis on in vitro

protein evolution: a study on TEM-1 beta-lactamase. J Mol Biol 285, 775–783 (1999).

77. Long McGie, J., Liu, A. D. & Schellenberger, V. Rapid in vivo evolution of a β‐

lactamase using phagemids. Biotechnol. Bioeng. 68, 121–125 (2000).

78. Mabilat, C. & Courvalin, P. Development of ‘oligotyping’ for characterization and

molecular epidemiology of TEM beta-lactamases in members of the family

Enterobacteriaceae. Antimicrob. Agents Chemother. 34, 2210–2216 (1990).

79. Sideraki, V., Huang, W., Palzkill, T. & Gilbert, H. F. A secondary drug resistance

mutation of TEM-1 beta-lactamase that suppresses misfolding and aggregation.

Proceedings of the National Academy of Sciences 98, 283–288 (2001).

80. Salverda, M. L. M., de Visser, J. A. G. M. & Barlow, M. Natural evolution of TEM-1 β-

lactamase: experimental reconstruction and clinical relevance. FEMS Microbiology

Reviews (2010). doi:10.1111/j.1574-6976.2010.00222.x

81. Salverda, M. L. M. et al. Initial mutations direct alternative pathways of protein

evolution. PLoS Genet. 7, e1001321 (2011).

82. Matagne, A., Lamotte-Brasseur, J. & Frere, J. M. Catalytic properties of class A beta-

lactamases: efficiency and diversity. Biochem J 330 ( Pt 2), 581–598 (1998).

83. Schenk, M. F. et al. Role of pleiotropy during adaptation of TEM‐1 β‐lactamase to two

novel antibiotics. Evolutionary Applications (2014). doi:10.1111/eva.12200

84. Bershtein, S., Segal, M., Bekerman, R., Tokuriki, N. & Tawfik, D. S. Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature 444, 929–932 (2006).

85. Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science.

Nature Methods 11, 801–807 (2014).

86. Kondrashov, D. A. & Kondrashov, F. A. Topological features of rugged fitness

landscapes in sequence space. Trends Genet 31, 24–33 (2015).

87. Martin, G., Elena, S. F. & Lenormand, T. Distributions of epistasis in microbes fit

predictions from a fitness landscape model. Nature Genetics 39, 555–560 (2007).

88. Lunzer, M., Golding, G. B. & Dean, A. M. Pervasive cryptic epistasis in molecular

evolution. PLoS Genet. 6, e1001162 (2010).

89. Melamed, D., Young, D. L., Gamble, C. E., Miller, C. R. & Fields, S. Deep mutational

scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein.

RNA 19, 1537–1551 (2013).

90. Araya, C. L. & Fowler, D. M. Deep mutational scanning: assessing protein function on a

massive scale. Trends Biotechnol 435–442 (2011). doi:10.1016/j.tibtech.2011.04.003

91. Podgornaia, A. I. & Laub, M. T. Pervasive degeneracy and epistasis in a protein-protein

interface. Science 347, 673–677 (2015).

92. Hall, B. G. Predicting evolution by in vitro evolution requires determining evolutionary

pathways. Antimicrob. Agents Chemother. 46, 3035–3038 (2002).

93. Firnberg, E. & Ostermeier, M. PFunkel: efficient, expansive, user-defined mutagenesis.

PLoS One 7, e52031 (2012).

94. Sohka, T. et al. An externally tunable bacterial band-pass filter. Proceedings of the

National Academy of Sciences 106, 10135–10140 (2009).

95. Østman, B., Hintze, A. & Adami, C. Impact of epistasis and pleiotropy on evolutionary

adaptation. Proc. Biol. Sci. 279, 247–256 (2012).

96. Bloom, J. D. et al. Thermodynamic prediction of protein neutrality. Proceedings of the

National Academy of Sciences 102, 606–611 (2005).

97. Tokuriki, N. & Tawfik, D. S. Stability effects of mutations and protein evolvability.

Current Opinion in Structural Biology 19, 596–604 (2009).

98. Poon, A. F. Y. & Chao, L. Functional origins of fitness effect-sizes of compensatory

mutations in the DNA bacteriophage phiX174. Evolution 60, 2032–2043 (2006).

99. Baneyx, F. & Mujacic, M. Recombinant protein folding and misfolding in Escherichia

coli. Nat Biotechnol 22, 1399–1408 (2004).

100. Poelwijk, F. J., Kiviet, D. J., Weinreich, D. M. & Tans, S. J. Empirical fitness landscapes

reveal accessible evolutionary paths. Nature 445, 383–386 (2007).

101. de Visser, J. A. G. M. & Krug, J. Empirical fitness landscapes and the predictability of

evolution. Nat Rev Genet (2014). doi:10.1038/nrg3744

102. Bloom, J. D. & Arnold, F. H. In the light of directed evolution: pathways of adaptive protein evolution. Proceedings of the National Academy of Sciences 106 Suppl 1, 9995–10000 (2009).

103. Palmer, A. C., Angelino, E. & Kishony, R. Chemical decay of an antibiotic inverts

selection for resistance. Nature Chemical Biology 6, 105–107 (2010).

104. Dykhuizen, D. E., Dean, A. M. & Hartl, D. L. Metabolic flux and fitness. Genetics 115,

25–31 (1987).

105. Varma, A. & Palsson, B. O. Metabolic flux balancing: basic concepts, scientific and

practical use. Biotechnology (1994).

106. DePristo, M. A., Hartl, D. L. & Weinreich, D. M. Mutational reversions during adaptive

protein evolution. Mol. Biol. Evol. 24, 1608–1610 (2007).

107. Taute, K. M., Gude, S., Nghe, P. & Tans, S. J. Evolutionary constraints in variable

environments, from proteins to networks. Trends Genet 30, 192–198 (2014).

108. Podolsky, T., Fong, S. T. & Lee, B. T. Direct selection of tetracycline-sensitive

Escherichia coli cells using nickel salts. Plasmid 36, 112–115 (1996).

109. Firnberg, E. & Ostermeier, M. The genetic code constrains yet facilitates Darwinian

evolution. Nucleic Acids Research (2013). doi:10.1093/nar/gkt536

110. Rambaut, A. & Drummond, A. FigTree. Tree figure drawing tool. (2012). at

111. Thai, Q. K., Bös, F. & Pleiss, J. The Lactamase Engineering Database: a critical survey

of TEM sequences in public databases. BMC Genomics 10, 390 (2009).

112. Huang, W. & Palzkill, T. A natural polymorphism in beta-lactamase is a global

suppressor. Proceedings of the National Academy of Sciences 94, 8801–8806 (1997).

113. John, S. & Jain, K. Effect of drift, selection and recombination on the equilibrium

frequency of deleterious mutations. Journal of Theoretical Biology (2014).

doi:10.1016/j.jtbi.2014.10.023

114. Novais, A. et al. Evolutionary trajectories of beta-lactamase CTX-M-1 cluster enzymes:

predicting antibiotic resistance. PLoS Pathog 6, e1000735 (2010).

115. Brown, E. M. & Nathwani, D. Antibiotic cycling or rotation: a systematic review of the

evidence of efficacy. J. Antimicrob. Chemother. 55, 6–9 (2005).

116. Kim, S., Lieberman, T. D. & Kishony, R. Alternating antibiotic treatments constrain

evolutionary paths to multidrug resistance. Proceedings of the National Academy of

Sciences 201409800 (2014). doi:10.1073/pnas.1409800111

117. Bonhoeffer, S., Lipsitch, M. & Levin, B. R. Evaluating treatment protocols to prevent

antibiotic resistance. Proceedings of the National Academy of Sciences 94, 12106–12111

(1997).

118. Bergstrom, C. T. Ecological theory suggests that antimicrobial cycling will not reduce antimicrobial resistance in hospitals. Proceedings of the National Academy of Sciences 101, 13285–13290 (2004).

119. Ochman, H., Lawrence, J. G. & Groisman, E. A. Lateral gene transfer and the nature of

bacterial innovation. Nature 405, 299–304 (2000).

120. Esvelt, K. M., Carlson, J. C. & Liu, D. R. A system for the continuous directed evolution of biomolecules. Nature 472, 499–503 (2011).

121. Russel, M., Kidd, S. & Kelley, M. R. An improved filamentous helper phage for

generating single-stranded plasmid DNA. Gene 45, 333–338 (1986).

122. Drake, J. W., Charlesworth, B., Charlesworth, D. & Crow, J. F. Rates of spontaneous

mutation. Genetics 148, 1667 (1998).

123. Stewart, P. S. & Costerton, J. W. Antibiotic resistance of bacteria in biofilms. Lancet 358, 135–138 (2001).

124. Bloom, J. D., Wilke, C. O., Arnold, F. H. & Adami, C. Stability and the evolvability of

function in a model protein. Biophysical Journal 86, 2758–2764 (2004).

125. Zhang, Q. et al. Acceleration of emergence of bacterial antibiotic resistance in connected

microenvironments. Science 333, 1764–1767 (2011).

126. Hermsen, R., Deris, J. B. & Hwa, T. On the rapidity of antibiotic resistance evolution facilitated by a concentration gradient. Proceedings of the National Academy of Sciences 109, 10775–10780 (2012).

Curriculum Vitae

Barrett Steinberg was born to Fred Steinberg and Marilyn Gresham on May 2, 1988 in

Memphis, TN. After graduating from Memphis University School in 2006, he attended Boston University, where he graduated magna cum laude with a Bachelor of Science degree in biomedical engineering in 2010. In that same year, he began doctoral studies in the

Department of Chemical and Biomolecular Engineering at the Johns Hopkins University in

Baltimore, MD.
