
Research Collection

Doctoral Thesis

Development of a Standardized Assembly Technology for Large-Scale DNA Constructs and Demonstration of its Applicability to Build Synthetic Chromosomes

Author(s): Venetz, Jonathan E.

Publication Date: 2020

Permanent Link: https://doi.org/10.3929/ethz-b-000421086

Rights / License: In Copyright - Non-Commercial Use Permitted


DISS. ETH NO. 26654

Development of a Standardized Assembly Technology for Large-Scale DNA Constructs and Demonstration of its Applicability to Build Synthetic Chromosomes

A thesis submitted to attain the degree of

DOCTOR OF SCIENCES of ETH ZURICH

(Dr. sc. ETH Zurich)

presented by Jonathan Edelbert Venetz

MSc ETH Biotechnology

born 15.08.1991 citizen of Stalden VS

accepted on the recommendation of Prof. Dr. Beat Christen, Prof. Dr. Jörn Piel, Prof. Dr. Sai Reddy

2020

Abstract

The fields of synthetic and systems biology have been significantly impacted by large-scale de novo DNA synthesis, which allows for the design and chemical synthesis of novel genome-sized DNA constructs. However, these synthesis and assembly technologies are still rather new and require substantial financial and human resources. As a consequence, only a small number of research groups can profit from this technology. This thesis aimed to make the assembly of large synthetic DNA constructs more accessible in order to accelerate the pace of discoveries enabled by synthetic genomics.

A highly efficient chromosome-scale DNA assembly technology was developed to make DNA synthesis technology financially affordable. One factor adding to the price of such chromosome-scale assemblies is the laborious screening process after the assembly. In this work, auxotrophic yeast markers were used to select for the correct construct assembly. Only cells in which these markers, and thus the target DNA construct, have been assembled correctly can survive. However, recombination of these auxotrophic selection markers with their counterparts in wild-type yeast can result in false-positive clones. Therefore, to minimize the risk of such clones, a Saccharomyces cerevisiae strain was engineered that does not contain any elements of the wild-type auxotrophic markers. The last step of the assembly technology development was the elimination of the high GC-content barrier, which has previously prevented large-scale DNA assemblies in yeast. To remove this barrier, different autonomously replicating sequences (ARS) were tested and implemented into chromosome-sized high-GC constructs. The DNA assembly technology presented in this thesis, combined with a previously described standardized assembly barcode pipeline, makes the efficient and inexpensive assembly of almost any synthetic chromosome-scale DNA construct possible.

To test the standardized assembly pipeline, the synthetic Caulobacter ethensis-2.0 (C. eth-2.0) genome was synthesized, assembled, and functionally tested. The 785 kbp sequence of C. eth-2.0 was based on the sequence of C. eth-1.0, the essential genome of the prokaryote Caulobacter crescentus. The rewritten DNA sequence of C. eth-2.0 differed from the parental sequence of C. eth-1.0 by 17%. The alteration of the sequence resulted in the synonymous rewriting of 123,562 codons, which corresponds to 56.1% of the total codons. The functionality of the rewritten synthetic genome was tested by applying transposon sequencing at the level of segments, which are 20 kbp DNA building blocks of C. eth-2.0. In sum, 81.5% of all synthetic genes within C. eth-2.0 were proven to be functional. The analysis of non-functional elements enabled the identification of novel essential genetic features within coding sequences, as well as genetic control features within genes involved in cell division control.

To demonstrate the wider applicability of large DNA constructs, a project in the field of biosensor research was conducted. A prototype of a biosensor system based on genetically modified S. cerevisiae and a polymeric matrix material is presented. The cells immobilized within the polymeric material detect the presence of estradiol and respond with a fluorescent signal. The signal can be analyzed using a smartphone equipped with a filter set. This sensor device is envisioned as an inexpensive first line of analysis in a field scenario where contamination with endocrine-disrupting chemicals is suspected. The high modularity of the biosensor described herein allows the molecule to be sensed to be exchanged. Therefore, future sensors can be created at a fast pace to expand detection to other common pollutants, such as heavy metals.

The field of synthetic biology is of dual use, which holds great potential but also considerable risks. A typical dual-use scenario is synthetic viruses, which can be used to develop either novel vaccines or pathogens. In the last chapter of this thesis, the current Swiss policies and regulations concerning genetically modified organisms are reviewed. Additionally, my opinion on why these regulations do not sufficiently cover the synthetic biology of the future is provided. Possible solutions to be implemented in the future, such as stakeholder panels and self-regulation, are discussed. This chapter aims to introduce the regulatory and ethical difficulties synthetic biologists will face in the future and to provide a basis for discussing how to solve them.

In summary, this thesis presents a standardized, large-scale synthetic DNA assembly technology, which opens up new possibilities for fundamental and applied research. This thesis also aims at opening up a discussion regarding the regulatory and ethical consequences of the proposed technology.

Zusammenfassung (Summary)

The fields of synthetic and systems biology have been considerably influenced by de novo DNA synthesis, which enables the design and chemical synthesis of novel DNA constructs on the scale of chromosomes. These new synthesis technologies still require a large investment of financial and human resources. Consequently, only a few research groups can benefit from this technology. The aim of this work was to make the assembly of large synthetic DNA constructs more accessible in order to accelerate the pace of new discoveries enabled by synthetic genomics.

A highly efficient chromosome-scale DNA assembly technology was successfully developed that makes DNA synthesis technology affordable. One factor that raises the price of large DNA constructs is the laborious selection after assembly. In this work, auxotrophic yeast markers were used to select for the correct construct assembly. Only cells in which these markers, and thus the DNA construct, have been assembled correctly are viable. However, recombination of these auxotrophic selection markers with their counterparts in wild-type yeast can lead to false-positive clones. To minimize the risk of such clones, a Saccharomyces cerevisiae strain was constructed that contains no elements of the wild-type auxotrophic markers. The last step in developing an efficient DNA assembly technology was overcoming the GC-content barrier, which had previously prevented the majority of DNA assemblies. To this end, different autonomously replicating sequences (ARS) were tested and incorporated into high-GC constructs. The DNA assembly technology presented in this work, combined with a previously described standardized assembly barcode pipeline, enables the efficient and inexpensive construction of almost any synthetic DNA construct.

To test the standardized assembly pipeline, the synthetic Caulobacter ethensis-2.0 (C. eth-2.0) genome was synthesized, assembled, and tested for functionality. The 785 kbp sequence of C. eth-2.0 is based on the sequence of C. eth-1.0, the essential genome of the prokaryote Caulobacter crescentus. The rewritten DNA sequence of C. eth-2.0 differed from the C. eth-1.0 sequence by 17%. The sequence alteration led to the synonymous rewriting of 123,562 codons, which corresponds to 56.1% of all codons. The functionality of the rewritten synthetic genome was tested by transposon sequencing at the segment level. Segments are 20 kbp DNA building blocks of C. eth-2.0. It was shown that 81.5% of all synthetic genes in C. eth-2.0 are functional. Analysis of the non-functional elements enabled the identification of novel essential genetic features within the coding sequences, as well as genetic control features within the genes involved in the control of cell division.

A project in the field of biosensor research showed how large DNA constructs can be used. A prototype of a biosensor system based on genetically modified S. cerevisiae cells and a polymeric matrix material was developed. The cells immobilized in the polymeric material detect estradiol and emit a fluorescent signal. The signal can be analyzed with a smartphone fitted with a filter attachment. This sensor device is intended as an inexpensive first analysis method in the field when contamination with endocrine-disrupting chemicals is suspected. The high modularity of the biosensor described here allows the molecule to be detected to be exchanged. Future sensors can therefore be developed quickly to enable the detection of other pollutants, such as heavy metals.

The field of synthetic biology is of dual use, which holds great potential but also considerable risks. A typical dual-use scenario is synthetic viruses, which can be used either to develop novel vaccines or to develop pathogens. In the last chapter of this work, the current Swiss laws and regulations concerning genetically modified organisms are reviewed. This chapter introduces the regulatory and ethical difficulties that synthetic biology will face in the future and is intended to provide a basis for discussing how to solve these problems.

In summary, this work presents a standardized synthetic DNA assembly technology that opens up new possibilities for fundamental and applied research. This thesis also aims to stimulate a discussion on the regulatory and ethical consequences of this technology.

Acknowledgments

First and foremost I would like to thank Prof. Beat Christen for the opportunity to conduct my PhD thesis in his group. The environment you create at work and also the atmosphere during lunch breaks was always very stimulating and thought-provoking. The always open door to your office for discussions on the latest developments of the projects or new ideas was greatly appreciated and I highly value your support.

A big thank you to Dr. Matthias Christen for helping me keep my project on track. You taught me many valuable lessons, be it in biology, project management, financial planning, politics, business, human interactions, or mountaineering and climbing. I will miss our discussions during morning coffee breaks and afternoon colas, which often resulted in significant advancement of my projects.

Additionally, I would like to thank Prof. Jörn Piel and Prof. Sai Reddy for agreeing to be my co-examiners. The meetings were always very stimulating and I much appreciated the easy communication to arrange details of meetings.

To Nadine Lobsiger, a collaboration partner who became a friend. Conducting projects and writing manuscripts was always a great experience with you. Your writing skills never cease to amaze me. I am very thankful for the very critical and thorough comments and the motivation boosts you provided during the writing of this thesis.

To Philipp Schächle, who started his Christen-Lab adventure on the same day as I did. You showed me the cultural and geographical marvels of Liechtenstein (in one weekend). I would like to thank you for the great friendship that developed over the years and hope that there will be many more climbing, photographing and travel events coming up. Also, thank you very much for the helpful comments you made during the writing of this thesis.

To Carlos Flores-Tinoco, you always liked to share your wisdom during our talks, be it in the lab or over a drink after work. I would also like to thank you for your critical and thought-provoking comments on this thesis.

To Mariëlle van Kooten and Luca Del Medico, as well as all the students and members of the Christen Lab. I would like to thank you for contributing to a great working environment. I would like to thank the students I supervised during the course of this thesis for giving me the opportunity to teach and be taught.

I would like to thank all the members of the IMSB, whom I had the pleasure to get to know over the last three years. Furthermore, I would also like to thank the members of the Functional Materials Laboratory at D-CHAB; the coffee breaks, joint evenings and the retreat were a blast.

Last but definitely not least I would like to thank my family. My parents for their unwavering support before and during my studies. My sister Véronique and her family, for always getting me back to the ground when I was starting to fly too high. My brother Nicolas, for his calming effects, whenever the storm grows fierce.

Contents

1 Introduction
  1.1 S. cerevisiae, a Longstanding Companion of Humankind
  1.2 DNA Cloning, a Technology to Shape Modern Biology
  1.3 Systems Biology, Unraveling the Networks of
  1.4 Synthetic Biology, a New Field to Shape Future Research

2 Aim of the Thesis

3 Establishing a Synthetic Genome Assembly Technology
  3.1 Introduction
  3.2 Results
  3.3 Discussion
  3.4 Materials and Methods

4 Chemical Synthesis Rewriting of a Bacterial Genome to Achieve Design Flexibility and Biological Functionality
  4.1 Abstract
  4.2 Preface
  4.3 PNAS Publication
  4.4 Supplementary: Materials and Methods
  4.5 Supplementary: Figures
  4.6 Supplementary: Tables

5 YestroSens, a Field-Portable S. cerevisiae Biosensor Device for the Detection of Endocrine-Disrupting Chemicals: Reliability and Stability
  5.1 Abstract
  5.2 Preface
  5.3 Biosensors and Bioelectronics Publication

6 Synthetic Genomics – a Challenge for Regulation- and Policy-Makers
  6.1 Benefits and Risks of Synthetic Biology
  6.2 Why Today's Regulations do not Sufficiently Cover Synthetic Biology
  6.3 Scientific Self-regulation Complements a Future Framework of Policies

7 Discussion and Outlook
  7.1 Discussion
  7.2 Outlook

Bibliography

CV

List of Figures

1.1 Microscopic drawings by F. Kützing
1.2 Phosphoramidite chemistry-based synthesis
1.3 TnSeq produces knock-out-strains at a high throughput
1.4 Systems and synthetic biology; similar questions, different approaches

3.1 Standardized genome assembly pipeline
3.2 Pipeline of Construct Design, Synthesis and Assembly
3.3 Auxotrophic split marker functionality test
3.4 Testing of ARS-elements in a synthetic design
3.5 Overview of Accelerated Vaccine Design

4.1 Part design, compilation and chemical synthesis rewriting of the C. eth-1.0 genome
4.2 Assembly of C. eth-2.0 in S. cerevisiae
4.3 Fault diagnosis and error isolation across the C. eth-2.0 chromosome
4.4 Sequence design flexibility within rewritten C. eth-2.0 genes
4.5 Fault diagnosis and repair across the C. eth-2.0 chromosome
4.6 Rewriting of the bacterial designer genome C. eth-2.0
4.7 Massive sequence rewriting of the C. eth-1.0 genome
4.8 Confirmation of complete C. eth-2.0 assembly in yeast
4.9 Cell morphology phenotype and genome stability of yeast cloned C. eth-2.0
4.10 Stability of C. eth-2.0 chromosome segments upon conjugation into Caulobacter

5.1 Schematic of estrogen detection using an immobilized yeast biosensor
5.2 Detection of different estrogen-compounds in liquid yeast cultures
5.3 Spotting assay to determine viability after lyophilization
5.4 Estradiol detection in sensor material containing immobilized yeast
5.5 Prototype of sensor device

6.1 Explanation of Bt-corn

List of Tables

3.1 Auxotrophic marker targeting sequences
3.2 Colony counts of the ARS functionality tests
3.3 Set of ARS for synthetic DNA designs
3.4 Yeast strains used for the generation and usage of the genome assembler toolbox
3.5 used for the strain engineering by CRISPR-Cas9

4.1 Part list used to build the C. eth-1.0 genome design
4.2 Sequence rewriting of C. eth-1.0 into C. eth-2.0 leads to massive reduction of genetic features
4.3 Functionality of C. eth-2.0 genes according to cellular processes
4.4 Codon frequency table of the rewritten C. eth-2.0 genome
4.5 Codon substitutions per C. eth-2.0 chromosome segments
4.6 Non-synonymous mutations introduced upon the build process
4.7 Conjugational transfer frequency of C. eth-2.0 chromosome segments
4.8 List of identified toxic C. eth-2.0 genes
4.9 Deletions within C. eth-2.0 chromosome segments in Caulobacter

Chapter 1

Introduction

1.1 S. cerevisiae, a Longstanding Companion of Humankind

Throughout history, fermented products have played an important role in human societies and economies. This is supported by archaeological discoveries, which indicate the existence of Chinese fermented beverages dating back to around 7000 BC [1] and of Iranian (6000 BC) and Egyptian (3000 BC) wine [2, 3]. During the following centuries, humans spread the knowledge of grape cultivation and the art of fermentation around the globe [4]. Later in history, the production of beer and bread became important in multiple cultures [5]. However, the molecular processes of fermentation remained poorly understood. In the late 18th and early 19th century, chemists defined fermentation as a strictly chemical process [6, 7]. The improvement of the microscope [8] made it possible to revise this theory of a purely chemical process and gave rise to the new scientific branch of microbiology [9].

In 1837, Charles Cagniard-Latour [10], Friedrich Kützing (Figure 1.1) [11] and Theodor Schwann [12] described the yeast Saccharomyces cerevisiae (S. cerevisiae) as a living organism for the first time. It took until 1880 for yeasts to be recognized as living microbes by other scientists [9]. During the second half of the 19th century, the scientists Louis Pasteur [13] and Pierre Berthelot [14] attempted to explain the fermentation process carried out by yeast. These attempts were based on the analysis of the different compounds in fermented products. Yeast cells were the first microorganisms to be examined scientifically; they laid the foundations for microbiology and also had a significant impact on establishing the field of enzymology through the description of sucrose inversion in yeast extract [9]. In conclusion, although yeast has accompanied humanity for millennia, it has not lost its importance in the present.


Figure 1.1: Microscopic drawing by Friedrich Kützing depicting different observed microorganisms. Fig. X depicts the yeast S. cerevisiae. Figure sourced from [11].

Yeast as an Invaluable Tool for Eukaryotic Genetics

After its establishment in peas and flies, the field of classical genetics focused on prokaryotic organisms, especially Escherichia coli (E. coli). Several prokaryotic properties, such as the typically short doubling time and the well-established cultivation and handling technologies in the lab, together with the later elucidation of the fundamental principles of genetics (DNA, RNA, expression, etc.), paved the way for molecular genetics [15]. However, eukaryotic organisms differ from prokaryotes in many ways, such as cell compartmentalization and gene regulation. These differences make it more difficult to link genotype and phenotype, which complicated the transfer of genetic knowledge from prokaryotes to eukaryotes. As an alternative to the prokaryotic model organisms, a new eukaryotic model organism was found in S. cerevisiae [16]. Although yeast is a microorganism, eukaryotic proteins in yeast show a high degree of amino-acid conservation with respect to higher-order eukaryotes. This conservation is particularly important when using S. cerevisiae as a model organism for eukaryotic genetics. As a microorganism, S. cerevisiae has short generation times comparable to those of prokaryotes. Well-established recombinant technologies to easily manipulate the DNA information in yeast cells facilitated the establishment of S. cerevisiae as a model organism. In general, there are two methods to genetically manipulate S. cerevisiae. The first uses plasmids, which are replicated within the yeast cells and can be used for protein expression. Plasmids come in two variants: centromeric plasmids, which are maintained by the cells at low copy numbers, and 2-µ plasmids, which occur at high copy numbers. By using these two plasmid systems, it was possible to over-express mutated genes at different concentrations [17, 18]. The second method uses integrative linear DNA constructs. These DNA constructs are not capable of self-replication and need to be integrated into the genome by homologous recombination. By using this recombination system, it became possible to easily integrate foreign genes [19] or disrupt genes directly within the yeast genome [20, 21]. Thus, scientists were equipped with straightforward technologies to manipulate and investigate a eukaryotic genome.

In 1996, the full genome of S. cerevisiae was sequenced in a global collaboration [22]. Shortly after this success, an almost complete library of open reading frame deletions in yeast was produced and analyzed [23, 24]. This library is a first step in characterizing the function of every gene in S. cerevisiae. It was combined with other available yeast libraries, in which, for example, all proteins were tagged to measure concentrations [25] or fluorescence-labeled proteins were used to obtain localization data [26]. With the help of these different libraries, an overarching understanding of the yeast genome was achieved, which resulted in the creation and maintenance of the Saccharomyces Genome Database (www.yeastgenome.org). This database provides detailed information on every known S. cerevisiae gene. The well-characterized genes of S. cerevisiae were used as a reference for the establishment of new methods [27]. These new methods included genome-scale technologies to study gene expression and regulation, such as chromatin immunoprecipitation sequencing (ChIP-seq) [28] and mRNA stability assays [29]. In addition to the gene expression and regulatory networks mentioned above, protein interaction networks [30] and gene interaction networks [31, 32] have been studied. These networks, which could be defined thanks to the extraordinary characterization of S. cerevisiae, stood at the beginning of the field of systems biology.

In conclusion, S. cerevisiae has accompanied humankind for a very long time. However, even more than 180 years after the description of the yeast organisms, researchers still see the potential to answer many fundamental and applied questions by using this organism, such as the mechanism of aging [33] and genome-sized DNA assembly [34].

1.2 DNA Cloning, a Technology to Shape Modern Biology

DNA cloning technology has transformed the field of biology and has promoted new fields such as molecular biology, genetic engineering, and systems as well as synthetic biology [35]. To appreciate the current level of cloning technology, it is important to understand its beginnings. After antibiotics had been developed in the 1940s, the resistance shown by some microorganisms was a setback for scientists in the antimicrobial community. It was discovered that cells transmit resistance factors to one another by direct contact [36, 37]. Scientists also discovered that resistance is maintained on extrachromosomal, circular DNA elements, which were termed plasmids [38–40]. The realization that antibiotic resistance information is stored and transmitted on plasmids led to the development of new technologies. With the development of transformation methods for E. coli, it became possible to insert plasmid DNA into living cells and thereby modify their genomic information [41]. To successfully clone foreign genes into plasmid DNA, the discovery and isolation of restriction enzymes was necessary. In 1970, a research team at Johns Hopkins University (Baltimore, USA) reported the blunt-end cutter HindII [42]. Two years later, the restriction enzyme EcoRI was isolated, which leaves sticky ends on the DNA after the cut [43]. The sticky ends generated by EcoRI made it possible to create a new plasmid in vitro and transform it into E. coli [44]. Shortly after, the first E. coli expressing DNA from a different prokaryote (Staphylococcus aureus) was created [45]. As a consequence of these achievements, DNA cloning technology was established globally as a key technology in biology laboratories.
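The sticky-end cut described above can be illustrated with a minimal Python sketch. It assumes the well-known EcoRI recognition sequence GAATTC with the cut placed after the first G on the top strand; the function name `ecori_fragments` and the toy sequence are hypothetical and only serve as an illustration.

```python
ECORI_SITE = "GAATTC"   # recognition sequence of EcoRI
ECORI_CUT_OFFSET = 1    # top strand is cut after the G: G^AATTC


def ecori_fragments(sequence: str) -> list[str]:
    """Return the top-strand fragments of a linear DNA molecule cut by EcoRI."""
    sequence = sequence.upper()
    fragments = []
    start = 0
    position = sequence.find(ECORI_SITE)
    while position != -1:
        cut = position + ECORI_CUT_OFFSET
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(ECORI_SITE, position + 1)
    fragments.append(sequence[start:])
    return fragments


# Made-up sequence containing two EcoRI sites (illustration only).
toy_insert = "TTGAATTCAGGCTAGAATTCCA"
fragments = ecori_fragments(toy_insert)
print(fragments)
```

Every fragment downstream of a cut starts with AATTC on the top strand, which reflects the single-stranded AATT overhang that lets two EcoRI-cut molecules anneal before ligation.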

Sequencing and PCR Founded the Field of Genomics

In the early days of cloning, it was still difficult to verify the success of the cloning process, as sequence verification methods had not yet been established. The most obvious way was to use growth selection for confirmation, which does not allow mutations in the cloned products to be detected. The second method was to use agarose gel electrophoresis [46] in combination with restriction enzymes to separate and visualize the fragments of the cloned and subsequently digested plasmid. The problem of confirming the DNA sequence was solved with the development of DNA sequencing in the late seventies [47, 48]. With this new technology, scientists were able to determine and verify the sequence of the clones they created. A decade later, the cloning process was facilitated by PCR with the thermostable Taq polymerase, which allowed the thermocycling process to be automated because no additional enzyme needed to be added during the reaction. This enhancement of PCR technology established the sequencing, cloning, targeted isolation and amplification of DNA stretches as everyday lab procedures [49]. These new technologies facilitated the cloning of DNA and promoted its use in other fields. It became possible to isolate, mutate, clone and verify any DNA sequence, to answer many fundamental questions and to develop new applications, such as quantitative real-time PCR [50].

S. cerevisiae Has Been Used Early on to Clone Large Parts of

Yeast artificial chromosomes (YACs) have been used to analyze large genomes of different higher organisms [51–53]. In this technology, pulsed-field gel electrophoresis is used to separate the DNA parts of a digested genome. The parts are isolated and cloned into a YAC vector by transforming it into yeast [54–56]. As S. cerevisiae is a eukaryotic organism, DNA that is unstable when cloned in prokaryotes often remains stable in a YAC [54, 57, 58]. After assembly, the YAC can be transformed into mammalian cells, in which the genes of interest and their flanking regions can be studied [58]. The cloning of YACs was very helpful in the early study of genomes and complex genes. It showed the high potential of S. cerevisiae to clone and maintain large pieces of foreign DNA [58]. However, the YAC technology has a limitation: the genomes have to be isolated and cloned in their wild-type state, and it is very difficult to modify the introduced DNA at a large scale to answer different research questions. Nevertheless, the YAC technology was a valuable tool for scientists, as it provided the possibility to assemble large DNA constructs and investigate their function.

Larger, Faster and More Precise - the New Generation of Sequencing

The ability to sequence DNA is an invaluable tool for the verification of DNA cloning, as well as for identifying new genetic elements within organisms. The first generation of sequencing methods (i.e. Sanger sequencing) was successful in determining sequences of up to 1 kilobase (kb) at low throughput. However, for a project that aims to sequence a genome-sized DNA construct, this sequencing technique is very laborious and expensive [59]. The second generation of sequencing was based on pyrosequencing [60] or fluorescent sequencing [61]. The read length achieved by these machines is low, but the amount of data and the read coverage are high, resulting in a high sequence fidelity [62]. This enables high-throughput sequencing suitable for genome-sized DNA constructs. However, due to the short read length, the in silico assembly of the sequence data can be challenging. With the second-generation technologies, the sequencing of genome-sized DNA pieces has become affordable, as sequencing costs have fallen faster than Moore's law would predict [63, 64]. The latest generation of sequencers, the third generation, is capable of sequencing single DNA molecules. Single-molecule sequencing results in long reads, which complement the high-fidelity reads of the second-generation devices [65]. The third-generation sequencing technology makes the verification of large cloning products possible on a short time scale. Additionally, it enables researchers to develop new tools for the analysis and modification of full genomes. Thanks to these modern sequencing methods, large amounts of sequencing data can be gathered in a short time, providing novel insights for scientists.
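The trade-off between short reads and high coverage can be made concrete with the standard mean-coverage relation (number of reads times read length, divided by construct length). The sketch below is only an illustration: the 785 kbp construct length is taken from the abstract, whereas the 150 bp read length and the 100-fold target coverage are assumptions chosen for the example, and the function names are hypothetical.

```python
import math


def mean_coverage(read_count: int, read_length_bp: int, construct_length_bp: int) -> float:
    """Average number of reads covering each base of the construct."""
    return read_count * read_length_bp / construct_length_bp


def reads_needed(target_coverage: float, read_length_bp: int, construct_length_bp: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * construct_length_bp / read_length_bp)


CONSTRUCT_LENGTH_BP = 785_000   # size of C. eth-2.0 as stated in the abstract
ASSUMED_READ_LENGTH_BP = 150    # assumption: a typical short-read length
ASSUMED_TARGET_COVERAGE = 100   # assumption: illustrative target coverage

n_reads = reads_needed(ASSUMED_TARGET_COVERAGE, ASSUMED_READ_LENGTH_BP, CONSTRUCT_LENGTH_BP)
achieved = mean_coverage(n_reads, ASSUMED_READ_LENGTH_BP, CONSTRUCT_LENGTH_BP)
print(n_reads, round(achieved, 2))
```

The relation gives only the mean coverage; it says nothing about how evenly the reads are distributed or how difficult the in silico assembly of the short reads will be.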

Synthetic DNA, the New Tool in Biology

Another rapidly advancing technology that is revolutionizing biology is the chemical de novo synthesis of DNA. The technology dates back to the 1950s, when DNA oligomers were manually synthesized in research laboratories [66–68]. Because of its efficient automation, the solid-phase phosphoramidite chemistry-based oligonucleotide synthesis (Figure 1.2) [69, 70] is still widely employed today.

[Figure 1.2: reaction scheme of the synthesis cycle. Recoverable step labels: coupling of a new base, blocking of failed reactions, oxidation, decapping before the next cycle, and release and deprotection.]

Figure 1.2: Phosphoramidite chemistry-based oligonucleotide synthesis. The synthesis cycle starts with a DNA base bound to a PEG-based support (orange). A new base is coupled to this initial base via the phosphoramidite group attached to the new base. The added base is protected by a dimethoxytrityl (DMT) group, which prevents an additional coupling reaction to this base. In an intermediate step, the non-reacted bases are capped (AcO) to prevent faulty prolongation of the sequence in a later step. After oxidizing and modifying the phosphate group of the synthetic chain, a new cycle can be started by deprotecting the last base of the oligo. Alternatively, if the chain elongation is terminated, the oligo can be released and deprotected.
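Because every cycle couples only a fraction of the chains and the failed chains are capped, the share of full-length product shrinks with each added base. A common approximation, assuming a constant per-cycle coupling efficiency, is efficiency raised to the power of the number of couplings. The Python sketch below illustrates this; the 99% efficiency and the oligo lengths are assumed values for illustration, not figures from this thesis, and the function name is hypothetical.

```python
def full_length_fraction(coupling_efficiency: float, oligo_length: int) -> float:
    """Fraction of chains reaching full length after (oligo_length - 1) couplings.

    Assumes the first base is pre-loaded on the support and that every
    coupling step succeeds with the same probability.
    """
    if oligo_length < 1:
        raise ValueError("oligo_length must be at least 1")
    return coupling_efficiency ** (oligo_length - 1)


ASSUMED_EFFICIENCY = 0.99  # assumption: illustrative per-cycle coupling efficiency

short_oligo = full_length_fraction(ASSUMED_EFFICIENCY, 20)
long_oligo = full_length_fraction(ASSUMED_EFFICIENCY, 200)
print(f"20-mer: {short_oligo:.1%} full length, 200-mer: {long_oligo:.1%} full length")
```

Under these assumptions most 20-mers are full length, whereas only a small fraction of 200-mers are, which is one reason why long constructs are assembled from shorter oligos rather than synthesized in one piece.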

Later developments in array-based oligo synthesis and semiconductor technology enabled the high-throughput and parallelized production of oligonucleotides [71]. The introduction of array-based oligo synthesis methods has lowered the price of DNA synthesis. However, the early high-throughput methods were prone to introducing sequence mutations, and therefore different methods have been developed to minimize the error rate in the synthesis process [72]. A first method uses a gene encoding a fluorescent protein, which is integrated in vectors to select for correct DNA synthesis [73]. The fluorescent protein can only be expressed if the synthetic DNA is integrated correctly into the vector. As the most common mutation during synthesis is a deletion, this method works well to detect single or double deletions. However, if three bases or a multiple thereof are deleted, the reading frame is preserved and this method does not work. A second method to detect synthesis errors is to use a commercial enzyme mix (e.g. CorrectASE by Thermo Fisher). This mix can be added to the DNA oligos after a PCR amplification step. The enzymes will digest any oligos with mismatches, thus raising the reliability of the synthesized DNA pool [74]. In a third method, second-generation sequencing is used to identify correct variants in a pool of synthetic oligos. By using a complex barcoding scheme, the selected oligos can be isolated from the pool with a specialized PCR protocol [72]. In conclusion, the higher error rate of new high-throughput synthesis methods can be counteracted by selecting for correct variants, thus lowering the price for large-scale DNA synthesis.
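The limitation of the fluorescent reporter selection can be made explicit with a short sketch. It assumes, as a simplification not spelled out in the text, that the reporter is read in frame with the synthetic insert, so that fluorescence is lost only when a deletion shifts the reading frame. The function name is hypothetical.

```python
def reporter_detects_deletion(deleted_bases: int) -> bool:
    """Return True if a deletion of this size shifts the reading frame.

    Assumption (not from the source): the fluorescent reporter is read in
    frame with the synthetic insert, so only frame-shifting deletions
    abolish fluorescence and can be selected against.
    """
    if deleted_bases < 0:
        raise ValueError("deleted_bases must be non-negative")
    # Deletions of 3, 6, 9, ... bases keep the frame and go unnoticed.
    return deleted_bases % 3 != 0


if __name__ == "__main__":
    for n in range(1, 7):
        status = "detected" if reporter_detects_deletion(n) else "missed"
        print(f"deletion of {n} base(s): {status}")
```

In this simplified picture, single and double deletions are caught, whereas deletions of three bases or a multiple thereof escape the selection.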

Due to the emergence of new cloning methods, the automated assembly of large DNA oligo pools into larger DNA constructs came within reach. These assembly methods included, but were not limited to, yeast assembly [75, 76], Gibson assembly [34], and Golden Gate assembly [77]. Thanks to these new cloning methods, the synthesis of genome-sized DNA constructs with a low number of mutations is now feasible and affordable [78]. The synthesis of genome-sized DNA constructs opens up new possibilities for cloning, as constructs can be designed in silico and synthesized directly. This opens up many possibilities for fundamental research, as the effects of mutations on the encoding of functionality into DNA can be researched on a bigger scale. Scientists are no longer limited to single genes or small networks, but complete genomes can be custom made and analyzed in a single experiment [79]. Further, this technology also opens up many possibilities for applied research, as DNA sequences that were only sequenced but never cloned can now be synthesized. This holds a lot of potential to discover proteins with novel functionalities in . However, most DNA synthesis platforms are still based on phosphoramidite chemistry, and therefore synthesis problems such as secondary structures, high GC-content and homopolymeric stretches limit the applicability of DNA synthesis [80].


1.3 Systems Biology, Unraveling the Networks of Life

The classical field of biology often relies on a reductionist research approach. In such an approach, it is assumed that any complicated problem can be simplified to a set of smaller, easier-to-solve problems. This concept also includes the principle of Occam’s razor, stating that the simplest solution is the best solution [81]. In contrast, the term “holism” means that to understand a system, one cannot reduce it into smaller systems [82]. It is out of holism that systems biology was born in the 1990s. The main goal in the field of systems biology lies in understanding the structure of a system, realizing the behavior, characteristics and different states of the system and, finally, defining how to design such a system [83]. In recent years, different -omics methods such as genomics [84], metabolomics [85] and proteomics [86] have been used to measure and characterize networks in biological systems (e.g. organisms). However, these methods generate terabytes of data, and for handling such large data sets, novel computational high-throughput analysis tools need to be developed [87, 88]. Systems biology is an interdisciplinary field in which wet-lab and computational scientists work closely together to explain biology using a systems-wide approach [89].

Transposon Sequencing as a Valuable Tool in Systems Biology

A multitude of different -omics methods exist, which can be used to characterize different components of networks. Here, the transposon sequencing (TnSeq) method [90] will be introduced. This technology enables the high-throughput determination of the essentiality of genes under different selections, for example, antimetabolites or starvation. Many different transposons are known. The Tn5-transposon (Figure 1.3A, B) was discovered in the 1970s while studying antibiotic resistances [91]. Transposons are DNA elements, which are integrated into the host genome by a “cut and paste” mechanism [92]. As the insertion-site specificity is nearly uniform, the transposon can be pasted at most locations within the host genome [93–96]. All steps required to introduce transposons into the genome are conducted by a single transposase protein (Figure 1.3A) and no additional host proteins are involved [97]. The TnSeq method exploits this integration of transposons into the genome of an organism by sequencing from within the transposon into the genome using second-generation sequencing (Figure 1.3B, C). If the transposon is integrated within an essential gene, disrupting it and making it non-functional, the organism cannot survive and the sequence will not be observed. The high-throughput TnSeq method enables the creation of complete transposon insertion libraries to study the essentiality of genes and genomic networks under different growth conditions [90]. Due to further engineering of the transposon and the optimization of the protocols, it is possible to analyze genomes at an eight bp resolution [98]. Since its development in 2009, the TnSeq method has been established and applied for many different organisms like Streptococcus pneumoniae [90], Caulobacter crescentus [98], Vibrio cholerae [99], Methylobacterium extorquens [100], and Brucella abortus [101]. Therefore, the TnSeq method is a versatile tool to analyze the genomes of a wide variety of organisms.

[Figure 1.3 panels: (A) Tn5 plasmid for TnSeq with Tn5-transposon, transposase and antibiotic (Ab) resistance; (B) Tn5-transposon with Pxyl, terminator, Illumina primer binding site and Ab-resistance; (C) transformation, integration, DNA extraction and sequencing of the Caulobacter genome with Tn5-insertions]

Figure 1.3: TnSeq produces knock-out strains at high throughput. (A) Tn5 plasmid used for the TnSeq protocol. The Tn5-transposon (red) is located on the plasmid along with a hyperactive transposase and an antibiotic-resistance marker. Note: there is no on this plasmid, ensuring a single transposon insertion event for each cell. (B) Tn5-transposon in detail. The Tn5-transposon contains the IS elements needed for the insertion at both ends (not marked). The Pxyl promoter ensures that only one transposition occurs. The terminator located downstream of the promoter ensures the termination of every protein into which the transposon has been inserted. The Illumina primer binding site is used to analyze the location of the transposon within the genome, making the analysis at high throughput possible. (C) Schematic workflow of a TnSeq experiment. The Tn5 plasmid is transferred into Caulobacter from E. coli using conjugation. In the next step, colonies are grown on a selective plate, which ensures that every colony contains a Tn5-transposon in the genome. After pooling the colonies and extracting the genomic DNA, high-throughput Illumina sequencing enables the mapping of all inserted transposons in the Caulobacter genome (red). This map can be used to elucidate essential CDS within the genome (no inserts = essential).

Definition of the Essential Caulobacter crescentus Genome by TnSeq

One of the goals in systems biology is elucidating the genes and genetic networks of an organism essential for its survival (Figure 1.4). This goal has been achieved by low-throughput transposon mutagenesis [102–104] and by constructing in-frame deletion libraries [105, 106]. This work was helpful in broadly determining the essential genes of the respective organisms. To get a more detailed understanding of the information encoded in the genome, measurement at a higher resolution was required. In 2011, a hyper-saturated transposon mutagenesis strategy was developed to define the essential genome of the bacterium Caulobacter crescentus (Caulobacter) [98]. In this study, the Tn5 transposon insertion experiment was conducted at a scale of several thousand growth plates. The colonies growing on each plate originated from cells containing a single Tn5 insertion in their genome. The location of each transposon in the pool of colonies was determined by sequencing. An average distance of 8 bp was determined between neighboring transposons, indicating a very high coverage of transposon integration within the genome. The analysis of the localization dataset resulted in the definition of 480 essential open reading frames (ORFs), 130 essential non-coding elements and 402 essential promoter regions in Caulobacter. Based on this extensive dataset, the first rewritten, chemically synthesized and assembled synthetic genome will be described (Chapter 4) [79].
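The two quantities discussed above, the average distance between neighboring insertions and the set of insertion-free reading frames, can be illustrated with a toy sketch. It is not the analysis used in [98]: the coordinates, ORF names and function names are hypothetical, and a real analysis would need a statistical treatment (for example of ORF length and local insertion density) before calling an ORF essential.

```python
from typing import Dict, List, Tuple


def mean_insertion_distance(positions: List[int]) -> float:
    """Average distance in bp between neighboring unique insertion sites."""
    sites = sorted(set(positions))
    if len(sites) < 2:
        raise ValueError("need at least two distinct insertion sites")
    gaps = [b - a for a, b in zip(sites, sites[1:])]
    return sum(gaps) / len(gaps)


def insertion_free_orfs(
    orfs: Dict[str, Tuple[int, int]], positions: List[int]
) -> List[str]:
    """Names of ORFs (inclusive start, end) that contain no insertion.

    Simplified reading of "no inserts = essential"; candidates only.
    """
    sites = set(positions)
    return sorted(
        name
        for name, (start, end) in orfs.items()
        if not any(start <= p <= end for p in sites)
    )


if __name__ == "__main__":
    # Hypothetical toy data, not taken from any real dataset.
    toy_positions = [5, 12, 20, 29, 80, 88, 95]
    toy_orfs = {"orfA": (1, 30), "orfB": (35, 75), "orfC": (78, 100)}
    print(mean_insertion_distance(toy_positions))
    print(insertion_free_orfs(toy_orfs, toy_positions))
```

The denser the insertion map, the shorter an insertion-free stretch needs to be before it stands out, which is why the high insertion density matters for resolving small essential elements.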

A Minimal Cell is of Interest for Fundamental and Applied Researchers

One of the best-known quotes of the Nobel Prize-winning theoretical physicist Richard Feynman is “What I cannot create, I do not understand” [107]. This quote is especially true when discussing the understanding and defining of life. The full understanding of a replicating cell is a very daunting goal. As organisms’ genomes are very complex and include many redundancies and genes to cope with different environments, it is currently not possible to fully understand them [108]. However, by designing and synthesizing a minimal cell, the complexity is reduced [109]. A precisely defined cell can be used as an efficient chassis for novel biotechnological pathways, excluding any interactions between the pathway and the host, which have the potential of lowering the efficiency of the pathway [110]. It has been shown that the deletion of significant parts of the E. coli as well as the Bacillus subtilis genome has beneficial effects on the production of recombinant proteins [111–114]. With the additional fundamental understanding gained from essential organisms, novel potential drug targets [115] can be identified and, for example, be employed to combat multidrug-resistant pathogens [116]. Coming back to Feynman’s quote, we can summarize that it is the goal of systems biology to understand and define the essential genome or blueprint of an organism and thereby the coding of life. However, to prove this hypothesis, researchers have to build such a minimal organism, as is done in the field of synthetic biology (Figure 1.4).


[Figure 1.4 scheme: cycle between Digital Information (Database, Design) and Biological System, with Synthetic Biology (Synthesis) leading from the database to the biological system and Systems Biology (Bacterial Blueprint) leading from the biological system back to the database]

Figure 1.4: Systems and synthetic biology; similar questions, different approaches. Systems biology and synthetic biology start at opposite ends, each targeting the starting point of the other. The field of systems biology starts at the living organism as it is found in nature. By using different high-throughput -omics methods, systems biologists characterize the different parts (e.g. essential DNA sequences) and interactions of a system and store the gathered blueprints in databases. Synthetic biology, on the other hand, uses the knowledge stored in databases. By applying rational design principles, novel organisms can in the future be designed and subsequently synthesized. The final goal of synthetic biology is the creation of a synthetic, wild-type-like biological system.

1.4 Synthetic Biology, a New Field to Shape Future Research

The field of synthetic biology still is very young and up to now, no clear definition of the overarching goals of the field has been formulated [117]. However, the field is loosely defined by the fact that classical forward engineering approaches are used to reach different goals, such as designing synthetic cells or novel pathways. This approach contrasts the field of systems biology, where a full system analysis strategy is employed to elucidate different networks controlling the cell [83]. The knowledge resulting from the systems analysis (e.g. the genetic regulation) is used by synthetic biologists to create novel circuits, pathways and even minimal cells (Figure 1.4) [117]. In the following paragraphs, some of the achieved milestones leading to the creation of a synthetic cell will be elaborated.

Genetic Circuits, the First Steps in the Field

The first wave of synthetic biology aimed at the creation of genetic circuits. Synthetic genetic circuits can be used to control a cell in different situations, like sensing a metabolite to induce a defined reaction (e.g. fluorescent protein expression). However, the first advances in creating such theoretical circuits used in silico simulations derived from electrical engineering [118, 119]. One of the first functional genetic circuits with a detectable function was the genetic toggle switch circuit. The cell containing this toggle switch has two stable genetic states, between which it can be switched by the addition of small molecules [120]. This first functional genetic circuit was soon followed by an oscillatory circuit [121] and several autoregulatory feedback modules [122–125]. Following the development of these relatively simple circuits, more complex constructs and mechanisms have been developed. Such constructs include, for example, logic AND gates, where two components need to be simultaneously present to result in a signal [126]. Further, sensors using quorum sensing to create multicellular patterning have been reported [127, 128]. As quorum sensing has a high potential in engineering systems, circuits using this mechanism were further developed to create multicellular synchronized oscillators [129] as well as edge-detecting circuits [130]. The examples of genetic circuits provided in this paragraph are non-exhaustive and the process of creating more complex and robust circuits is still ongoing.
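The notion of two stable genetic states can be illustrated with a minimal simulation. The sketch below uses a generic dimensionless model of two mutually repressing genes with Hill-type repression, integrated with a simple Euler scheme. The model form, the parameter values and the function name are illustrative assumptions and are not taken from the cited studies.

```python
from typing import Tuple


def simulate_toggle(
    u0: float,
    v0: float,
    alpha: float = 10.0,   # assumed maximal synthesis rate of each repressor
    hill: float = 2.0,     # assumed cooperativity of repression
    dt: float = 0.01,
    steps: int = 5000,
) -> Tuple[float, float]:
    """Euler integration of two mutually repressing genes u and v.

    du/dt = alpha / (1 + v**hill) - u
    dv/dt = alpha / (1 + u**hill) - v
    """
    u, v = u0, v0
    for _ in range(steps):
        du = alpha / (1.0 + v ** hill) - u
        dv = alpha / (1.0 + u ** hill) - v
        u, v = u + dt * du, v + dt * dv
    return u, v


if __name__ == "__main__":
    # Two different starting points settle into two different states.
    print(simulate_toggle(5.0, 1.0))  # repressor u ends up high
    print(simulate_toggle(1.0, 5.0))  # repressor v ends up high
```

In this toy model, the small molecules mentioned above would correspond to transiently lowering the activity of one repressor, which pushes the system from one state into the other.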

Metabolic Engineering Enables the Production of Different Compounds

Artemisinin is an antimalarial compound, which can be extracted from Artemisia annua, a wormwood [131]. As malaria is highly prevalent in tropical regions and concerns about resistance have arisen, there is a high demand for a greater variety of affordable drugs [132]. However, as the drug is extracted from slowly growing plants, large-scale production is difficult and expensive [133]. After an extensive development period, large-scale bio-production of Artemisinin in S. cerevisiae became reality in 2013 [134]. The drug is licensed by Sanofi (France) and is provided to patients at a low price, potentially saving many lives in third world countries [117]. The case of the Artemisinin production in yeast is one example of many different natural products which researchers are currently attempting to produce in recombinant systems. Apart from solutions in drug development, synthetic biology also holds the promise of helping to alleviate the dependency of our society on fossil resources. Some notable achievements of this research include the bio-production of isobutanol by engineering the synthesis pathway [135, 136], biodiesel and gasoline synthesis [137, 138] and the production of bioplastics [139]. There are two reasons why the pathways used in synthetic biology to create novel products have become more complex. First, the synthesis of the current molecules of interest usually is complex. Second, the advances made in the field of synthetic biology entail that current projects need higher degrees of complexity to be relevant to the community. This demand for complexity makes classical cloning more difficult, and thus de novo DNA synthesis would facilitate building such pathways. Further, as previously discussed, there is the risk of unwanted interference between the host organism and the enzymes of the production pathway. It was hypothesized that a minimal cell could be used as a chassis to optimize the bio-synthesis process [110]. The creation of such synthetic life was one of the main goals of the field of synthetic biology.

JCVI-syn1.0, Achieving a Milestone in Synthetic Biology

In 2010, scientists at the J. Craig Venter Institute reported the synthesis, assembly and boot-up of the first synthetic bacterial cell, JCVI-syn1.0 [140, 141]. The bacterium chosen as the template for the first synthetic life form was the pathogen Mycoplasma mycoides [142]. Mycoplasma contain the smallest known genomes among independently growing organisms [102]. Before the synthesis, a yeast replication and selection element and identifying watermarks were added to the wild-type sequence of Mycoplasma mycoides [140, 141]. Sequencing of the full genome after the assembly showed that there were 19 unplanned mutations, which differentiated the synthetic genome from the wild-type template [140]. Apart from these mutations, the yeast elements, and the synthetically introduced watermarks, the synthetic cell was an exact copy of the wild type. The synthetic genome was transplanted into different Mycoplasma cells, where the synthetic genome started replicating [140, 143, 144]. Creating a synthetic living cell has shown that it might become possible for humans to create life. However, there are some flaws inherent to this study worth mentioning. Creating this organism required funds of around $40 million and countless hours of work. This is not an investment bearable for every research group willing to conduct such a project, and thereby the reach of this technology is limited. Further, the organism used in this study is a known human and animal pathogen, raising the question of the safety of research on such a synthetic organism and of whether the current policies and regulations cover all required aspects. Last, regarding the minimal cell, this synthetic organism certainly is a step in the right direction but still is a wild-type sequence. A minimized version (JCVI-syn3.0) of this organism has subsequently been created [145], but even within this newly created genome the full information on all the genes has not been elucidated.
The creation of synthetic organisms is a huge opportunity to further the understanding of the fundamental coding of life.

Chapter 2

Aim of the Thesis

The synthesis and boot-up of the first synthetic organism at the J. Craig Venter Institute has marked a milestone for the field of biology [140]. As previously described, the technology of DNA synthesis on a genome scale holds great promise for the fundamental understanding of life. Also, it enables the further establishment of biology as an engineering discipline and paves the way for many novel applications. However, up to now the full access to this technology has been restricted to a small number of well-funded research groups and institutions. This thesis aims to make this technology accessible to a wider audience by establishing a standardized, fast-paced DNA assembly technology. In the later chapters, this thesis presents some of the possibilities this technology holds, as well as a critical discourse about future policies and regulations regarding this fast-evolving field of biology.

During this PhD-Thesis, three goals were achieved:

• Facilitate Access to Synthetic Genome Assembly Technologies
The synthesis of large DNA constructs is expensive and thus unavailable to most research groups. The assembly of a large construct from the synthesized pieces of DNA currently is a custom project requiring a specialized workforce, making it expensive. In this thesis, existing DNA assembly methods were standardized and streamlined. Further, the GC-content barrier during the assembly process was lowered, broadening the space of sequences that can be assembled. By facilitating the selection process after the assembly, the time required for a successful assembly was significantly decreased. This work is discussed in Chapters 3 and 4.


• Demonstrate the Applicability of Large-scale DNA Synthesis and Artificial Biological Systems
Two proof-of-concept studies are included in this thesis to demonstrate the utility of synthesizing DNA at a large scale. The first study (Chapter 4) describes and discusses the DNA rewriting, synthesis, assembly, and functionality-testing of a minimal bacterial genome. This study shows that a high GC-content synthetic genome can be successfully synthesized and assembled. Further, fundamental knowledge regarding gene regulation and annotation was gained. The second study (Chapter 5) is concerned with an estrogen biosensor in S. cerevisiae within a portable device prototype. The system was not built using synthetic DNA, but this project demonstrated the potential of biotechnology for everyday applications. Furthermore, the potential to build more complex synthetic circuits and molecular factories in the future was identified.

• Discuss Safety Hazards as well as Future Policies and Regulations to Advance the Field
The field of synthetic biology is constantly evolving, leading to the emergence of many novel products and organisms. Additionally, synthetic biology is opening up to hobby biologists because less specialized equipment is needed. With this fast evolution and wider spread, it is very important to analyze the risks and benefits of this field. Additionally, it is important to understand why the current policies and regulations are no longer sufficient to regulate the field of synthetic biology. In Chapter 6, these points are addressed and a potential way of regulating this field in the future is described.

Chapter 3

Establishing a Synthetic Genome Assembly Technology

Synthetic chromosome-sized DNA constructs are becoming a popular tool in fundamental and applied sciences. However, the assembly of such large synthetic DNA sequences poses a major challenge regarding feasibility and efficiency, resulting in a high demand on financial and human resources. To raise the accessibility of the genome assembly technology, new streamlined assembly processes need to be developed. This chapter describes the establishment of a standardized genome assembly technology, which facilitates the assembly of synthetic chromosome-sized DNA constructs. The presented assembly technology contains three different components: a dedicated S. cerevisiae assembly strain, specialized selection markers, and DNA maintenance elements. To create the assembly strain, the auxotrophic marker genes used for the later assembly selection were deleted from the S. cerevisiae genome using a CRISPR-Cas9 protocol. The specialized auxotrophic selection markers used for the faithful DNA assembly are described and tested in this chapter. The last elements described and tested are the autonomously replicating sequences, which are implemented in the design. These sequences ensure the replication of the assembled DNA constructs within the yeast cells. The combination of these three components results in the chromosome-sized DNA assembly technology. By using this technology, it is possible to assemble a fully synthetic genome at high throughput for a fraction of the financial and time resources. This improved assembly technology will make large synthetic DNA research projects accessible to a wide spectrum of laboratories.


3.1 Introduction

In the past, genetic engineering of large constructs has been considered lengthy and difficult, as classical cloning methods using PCR, restriction enzyme digestion and ligation were work-intensive processes. These challenges explain why large DNA constructs containing a multitude of components with more than several thousand base pairs have only rarely been designed and assembled. Recently, silicon chip-based chemical DNA synthesis technology has enabled robust and high-throughput methods to produce large amounts of small (< 1 kb) synthetic DNA within a short time at a low price [78]. Also, computer algorithms can help automate the design process and reduce the chance of non-synthesizable sequences [146]. This is the case for sequences containing elements which hinder the chemical synthesis process, such as high GC-content, hairpins or homopolymeric stretches [78, 80, 147]. These advances indicate that chemical synthesis of DNA will enable us to engineer entire complex biological pathways, as well as other large and complex pieces of DNA. Compared to the classical cloning of standard-sized plasmids, the assembly of large strands of DNA entails a new set of problems. In routinely employed molecular cloning kits, the columns are certified to be usable up to a certain size of DNA (e.g. 40 kb, QIAGEN Plasmid Miniprep Kit). However, when using these routine kits to work with large constructs such as complex pathways or genome-sized constructs, the DNA is filtered out as genomic DNA above the membrane cut-off and cannot be recovered. Another frequently observed problem when handling large DNA constructs is the rapidly increasing DNA fragility, which is due to shearing forces when pipetting [148]. This requires special care and a minimal number of pipetting steps. Assuming successful DNA extraction and purification, the assembly of two already large DNA pieces into one even larger piece also requires special protocols along with a set of specialized cells.

Some of the challenges mentioned in the previous paragraph were solved in 2010, when scientists at the J. Craig Venter Institute (JCVI) reported the first-ever synthetic Mycoplasma cell [140]. They had developed new protocols such as the Gibson assembly [75] as well as refined and applied well-established protocols such as the yeast spheroplast transformation for assembly in S. cerevisiae by homologous recombination [149]. However, the price of this synthetic cell was estimated at several million dollars, restricting the access to this technology to well-funded organizations. The selection for the assembled design was not maximally stringent, as only one selection marker had been included. This resulted in a high amount of screening due to false-positive colonies. Furthermore, the GC-content of the synthetic organism’s genome was 24% [142]. The replication system of S. cerevisiae is optimized for a GC-content of around 38% [150]. In consequence, DNA sequences with GC-contents of around 40% or below can be maintained by the cell. This made the assembly of the synthetic genome possible. To establish the de novo synthesis and assembly of large DNA constructs as a standard protocol in labs, the possible GC-content range of the constructs has to be enlarged and the selection for a full assembly has to be facilitated. Further, the assembly protocol calls for standardization.

We have developed a standardized assembly pipeline, universally applicable to DNA constructs as long as they are non-toxic in E. coli and S. cerevisiae (Figure 3.1A). At the core of this pipeline is the genome partitioner algorithm [151]. This algorithm takes a long input DNA sequence and splits it into several small parts. The sizes of these individual parts are chosen by the researcher and determine the number of assembly tiers needed for a successful large DNA construct assembly. In our experiments, we either used subblocks of around 900 bp length or blocks which are between 4 and 5 kb long. Constructs of both sizes can be ordered for synthesis from several different providers; however, the subblock variant results in significantly lower costs, as the synthesis process requires fewer manual steps. To enable the standardized assembly strategy, universal adapters are added to the constructs (Figure 3.1B). The adapters contain a unique restriction site specific for each assembly tier, as well as barcodes which can be used to track and amplify a certain entity in a mix of DNA components. By using these barcodes, it is possible to extract specific DNA parts from a pool of synthesized DNA by PCR. Using a universal primer strategy, an automated protocol could be applied for the assembly of the constructs. For the assembly of the different parts, two different organisms were used (Figure 3.1C). For the first tier, E. coli in combination with the Gibson assembly protocol [75] has been applied. To assemble the larger DNA constructs, the homologous recombination mechanism of S. cerevisiae was used. We hypothesize that by following this standardized approach, the assembly of large synthetic DNA constructs becomes a process feasible to carry out in standard molecular biology labs. However, the selection performance and the in vivo maintenance of large constructs still need to be optimized.
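The idea of splitting a long sequence into tiers of parts can be sketched as follows. This is not the genome partitioner algorithm [151] itself: a real partitioner would, among other things, add the tier-specific adapters and take synthesis constraints into account when choosing boundaries. The function names and default sizes (900 bp subblocks, 4.5 kb blocks) are assumptions chosen to match the sizes mentioned above.

```python
from typing import List


def split_evenly(sequence: str, target_size: int) -> List[str]:
    """Split a sequence into near-equal parts of roughly target_size bases."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    if not sequence:
        return []
    n_parts = max(1, round(len(sequence) / target_size))
    base, extra = divmod(len(sequence), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # Distribute the remainder over the first `extra` parts.
        size = base + (1 if i < extra else 0)
        parts.append(sequence[start:start + size])
        start += size
    return parts


def partition(
    sequence: str, block_size: int = 4500, subblock_size: int = 900
) -> List[List[str]]:
    """Two-tier partition: a list of blocks, each a list of subblocks."""
    return [
        split_evenly(block, subblock_size)
        for block in split_evenly(sequence, block_size)
    ]


if __name__ == "__main__":
    toy_sequence = "ACGT" * 2500  # hypothetical 10 kb input
    tiers = partition(toy_sequence)
    for number, block in enumerate(tiers, start=1):
        print(f"block {number}: subblock sizes {[len(s) for s in block]}")
```

Concatenating all subblocks in order restores the input sequence, which is the property any partitioning scheme has to preserve regardless of how the boundaries are chosen.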

Concerning efficient selection, auxotrophic markers are a well-established tool when working with S. cerevisiae. To use these markers, a gene within the biosynthesis path- way of a DNA-base or an amino-acid is deleted in the host strain. This gene can be re-introduced into the organism via a plasmid, resulting in a positive selection for the newly introduced construct [152]. As mentioned before, the team of the JCVI did only use one such marker in their very large design. As a consequence, the assembly of this large synthetic design resulted in a high number of false-positive colonies and therefore much additional screening work [140]. To reduce the number of false-positive genome assemblies, several different markers have been used in our design. By design, the markers are split in half and only recombine upon assembly of the synthetic genome. However, as the split-markers will be at a double-stranded DNA break facilitating re-

19 Chapter 3

A Subblock tier Block tier Segment tier Genome tier

Assembly Assembly Assembly 1 2 3 Gibson LiAc Sphero. ITA

BC Assembly 1: Gibson ITA Assembly 2: LiAc Assembly 3: Spheroplast

Segment adapter Vector Vector Enzymes Digest Block adapter

Trans- Trans- Trans- formation formation formation in vitro E. coli S. cerevisiae S. cerevisiae Subblock adapter Spheroplast assembly w/ Block w/ Segment w/ Genome

Figure 3.1: Standardized genome assembly pipeline (A) The 900-1,500 bp subblocks including the universal adapters are delivered by the manufacturer. Before each as- sembly step, the tier-specific adapters are removed by restriction digest. The subblocks are assembled into 4-5 kb blocks by using the Gibson isothermal assembly (ITA) pro- tocol. For the assembly of the 20 kb segments or the 40-60 kb mega-segments (not shown), the lithium acetate (LiAc) transformation protocol in S. cerevisiae is used. For the final assembly-tiers (>60 kb) the yeast spheroplast protocol is applied to assemble the genome construct. (B) Three different adapters are used in the assembly pipeline. 1) subblock adapter (purple): each delivered subblock contains this adapter. 2) block adapter (green): only subblocks located at the edge of blocks after the assembly contain this adapter. The adapter includes homologous sequences to the vector for the assem- bly. 3) segment adapter (blue): same as with the block adapter, only for the segment tier of the assembly. (C) Assembly 1 Gibson ITA: After removing the adapters from the subblocks, they are mixed with the assembly backbone and enzymes and incubated in vitro for one hour to assemble. The assembly mix is transformed into E. coli to isolate the assembled blocks. Assembly 2 LiAc-transformation: the blocks without the block adapters are transformed into S. cerevisiae using the standard LiAc-transformation pro- tocol. The yeast cell assembles the segment by homologous recombination. Assembly 3 spheroplast-transformation: before the transformation step, the cell wall of S. cere- visiae is digested to form spheroplasts. After the transformation and recovery of the yeast cells, clones with the complete genome construct can be selected.

combination, there exists the risk that the DNA parts will recombine with sequences in the yeast genome instead of with the corresponding synthetic component [152]. This potentially results in a false-positive colony. It will therefore be necessary to create a new set of yeast strains that can be used for the reliable assembly of synthetic DNA constructs. The main properties of these engineered yeast strains will be a high DNA-uptake capacity when transformed using the spheroplast transformation protocol and a low chance of recombination between the genome and the synthetic construct.
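The size tiers of the standardized pipeline (Figure 3.1) can be summarized as a small lookup. The following is only a minimal sketch: the function name is hypothetical, and the cut-offs of 5 kb and 60 kb are taken from the figure caption and treated as hard thresholds, which is an assumption (in practice the ranges overlap).

```python
def choose_assembly_protocol(target_size_bp: int) -> str:
    """Suggest an assembly protocol for a target construct size.

    Thresholds follow the tiers named in Figure 3.1; treating them as
    sharp boundaries is an assumption made for illustration only.
    """
    if target_size_bp <= 0:
        raise ValueError("target size must be positive")
    if target_size_bp <= 5_000:
        # Subblocks (900-1,500 bp) are joined into 4-5 kb blocks in vitro.
        return "Gibson ITA (in vitro, recovered in E. coli)"
    if target_size_bp <= 60_000:
        # 20 kb segments and 40-60 kb mega-segments.
        return "LiAc transformation (S. cerevisiae)"
    # Final tiers above 60 kb.
    return "spheroplast transformation (S. cerevisiae)"


for size in (4_500, 20_000, 180_000):
    print(size, "->", choose_assembly_protocol(size))
```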

20 Chapter 3

The difficulty of assembling and maintaining a large synthetic DNA construct in S. cerevisiae is linked to the GC-content of the sequence. This link arises from the way DNA replication is initiated. The origin recognition complex binds at the Autonomous Replicating Sequences (ARS) within the genome to initiate DNA replication [153]. ARS are AT-rich sequences [154], which are either spread throughout the genome of S. cerevisiae or located at the origin region of stable plasmids. However, if the GC-content of a synthetic DNA construct is high, the probability that an ARS-like sequence occurs by chance is lower. Further, if the construct is too large to be maintained by the ARS element incorporated on the backbone vector (>150 kb), the construct cannot be maintained in yeast [155]. For this reason, functional S. cerevisiae ARS elements have to be integrated within the design to enable the maintenance of a large, high GC-content DNA construct in yeast.
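The effect of GC-content on the chance occurrence of ARS-like sequences can be illustrated with a back-of-the-envelope calculation. The sketch below is not part of the presented work: it assumes the commonly cited 11 bp ARS consensus sequence WTTTAYRTTTW, independent bases with P(G) = P(C) and P(A) = P(T), and example GC fractions of 0.38 and 0.67 chosen only for illustration. A consensus match is necessary but not sufficient for ARS function, so the numbers indicate a trend only.

```python
ACS_LENGTH = 11  # length of the assumed consensus WTTTAYRTTTW


def acs_match_probability(gc: float) -> float:
    """Probability that a random 11-mer matches WTTTAYRTTTW.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2.
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be a fraction between 0 and 1")
    p_specific_at = (1.0 - gc) / 2.0  # one specific base, A or T
    p_w = 1.0 - gc                    # W = A or T
    p_y = 0.5                         # Y = C or T
    p_r = 0.5                         # R = A or G
    # Two W positions, seven fixed A/T positions, one Y and one R.
    return p_w ** 2 * p_specific_at ** 7 * p_y * p_r


def expected_acs_matches(length_bp: int, gc: float) -> float:
    """Expected number of consensus matches on both strands."""
    positions = max(length_bp - ACS_LENGTH + 1, 0)
    return 2 * positions * acs_match_probability(gc)


for gc in (0.38, 0.67):
    print(f"GC = {gc:.2f}: {expected_acs_matches(100_000, gc):.3f} matches per 100 kb")
```

Under these assumptions, the expected number of chance matches drops by more than two orders of magnitude between the two GC fractions, which is consistent with the need to add ARS elements explicitly to high-GC designs.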

[Figure 3.2 schematic. Stages: in silico (Database, Data Extraction, Design, Selection & Maintenance, Optimization), in vitro (Synthesis), in vivo (Assembly).]

Figure 3.2: Pipeline of construct design, synthesis, and assembly. The pipeline is split into an in silico, an in vitro and an in vivo part. During the in silico part of the pipeline, the knowledge gained from previous studies is used to create new designs from database information. It is important to incorporate the selection and maintenance elements for the subsequent assembly already at this stage. The last step in the design phase is the optimization of the sequence by applying the DNA rewriting algorithms. This optimization ensures the feasibility of the synthesis as well as the integration of the assembly into the standardized pipeline by removing synthesis constraints and splitting the sequence into smaller parts. During the in vitro stage, the split parts of the sequence including the assembly adapters are synthesized by an external provider. During the in vivo step of the pipeline, the full DNA construct is assembled in several subsequent steps. The assembly pipeline does not change if applied to different constructs.

To design and create a large synthetic DNA construct, we always apply the same pipeline (Figure 3.2). During the in silico design step, the sequence information needed for the design is extracted from different online databases and collated into a new design. However, this initial design cannot be stably assembled, as the previously described selection and maintenance features are missing. After adding these features to the design, the sequence is optimized for synthesis by automated algorithms and the small DNA subblocks are defined. The chemical de novo synthesis step of the pipeline is outsourced. In the last step of the pipeline, all the parts are assembled into the full construct, often using several tiers of assembly. In this chapter of my thesis, I discuss two aspects of this pipeline. First, I created a set of S. cerevisiae strains that are specialized for the assembly of large DNA constructs. Within the genomes of these strains, all the auxotrophic marker genes have been deleted. Also, the high DNA-uptake capability of these strains is known due to their common ancestor. The second part of this chapter describes the selection and maintenance features that need to be implemented in the design. The auxotrophic split markers were tested for their functionality. Further, the ARS elements within the design were tested and a new, extended set was created.


3.2 Results

Engineering of the Assembler Yeast Strain

For an efficient assembly of a synthetic genome, an engineered custom S. cerevisiae strain is needed. To reduce the number of false-positive results, the assembly of a large synthetic DNA construct was done using several auxotrophic split markers. However, this strategy posed the risk of a synthetic DNA part containing a split auxotrophic marker recombining with its wild type counterpart within the genome. This event can occur because the genes used as auxotrophic markers in S. cerevisiae are often only knocked out by point mutations and not entirely deleted. The recombination of the yeast genome with the synthetic DNA would make it impossible to correctly assemble the full construct. To eliminate this recombination problem, we decided to delete the auxotrophic genes in question from the S. cerevisiae genome. As a starting point, the S. cerevisiae strain YPH857 [156] was used, which has been shown to have a high DNA-uptake efficiency when transformed using the spheroplast protocol. CRISPR-Cas9 was used to introduce the deletions into the genome [157].

Design of the Gene-Targeting Sequences Using the CRISPOR-Tool

The first step in deleting an auxotrophic marker gene in yeast is usually to define the location within the genome at which Cas9 is to introduce the double-stranded break. To define the targeting sequences, the CRISPOR online tool was used (Table 3.1) [158]. In the targeting oligo design, the resulting targeting sequences were flanked by sequences homologous to the linearized CRISPR-Cas9 plasmid to enable homologous recombination. Using these oligos, the CRISPR-Cas9 system was targeted towards the auxotrophic genes of interest and could be used to efficiently delete them.

Table 3.1: CRISPR-Cas9 targeting sequences to introduce double-stranded breaks into the corresponding S. cerevisiae auxotrophic marker genes.

Marker      Targeting Sequence
Met14       GCTGAGCAAAGGGACCCTAA
Ura3-52     GGGTCAACAGTATAGAACCG
Ade2-101    GATATCAAGAGGATTGGAAA
Leu2-∆1     GCAACAAACCCAAGGAACCT
Trp1-∆63    AGCGGAGGTGTGGAGACAAA
Lys2-801    GATAAATTCACAATGCTGAG
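The targeting oligo layout described above (targeting sequence flanked by plasmid homology) can be sketched as follows. The targeting sequences are those of Table 3.1. The function names are hypothetical, and the homology arms are placeholders written as runs of N, since the actual plasmid-homologous sequences are not given here.

```python
# Targeting sequences from Table 3.1.
GUIDES = {
    "Met14": "GCTGAGCAAAGGGACCCTAA",
    "Ura3-52": "GGGTCAACAGTATAGAACCG",
    "Ade2-101": "GATATCAAGAGGATTGGAAA",
    "Leu2-D1": "GCAACAAACCCAAGGAACCT",
    "Trp1-D63": "AGCGGAGGTGTGGAGACAAA",
    "Lys2-801": "GATAAATTCACAATGCTGAG",
}


def validate_guide(seq: str, length: int = 20) -> bool:
    """Check that a targeting sequence has the expected length and alphabet."""
    return len(seq) == length and set(seq.upper()) <= set("ACGT")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def build_targeting_oligo(guide: str, left_arm: str, right_arm: str) -> str:
    """Flank a targeting sequence with plasmid-homologous arms."""
    if not validate_guide(guide):
        raise ValueError("targeting sequence must be 20 nt of A, C, G, T")
    return left_arm + guide + right_arm


# Placeholder arms (hypothetical): replace with the real plasmid homology.
LEFT_ARM = "N" * 40
RIGHT_ARM = "N" * 40

for marker, guide in GUIDES.items():
    oligo = build_targeting_oligo(guide, LEFT_ARM, RIGHT_ARM)
    print(f"{marker}: GC = {gc_fraction(guide):.2f}, oligo length = {len(oligo)} nt")
```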


Deletion of Auxotrophic Marker Genes

The different markers of the YPH857 strain have been deleted by serially applying the CRISPR-Cas9 system. To find the best deletion strategy for each gene, the results published by Brachmann et al. [159] were followed. These results state how common auxotrophic marker genes can be deleted without impeding the performance of the S. cerevisiae cell. For genes not described in this paper, the full ORF was deleted. By using this strategy, the S. cerevisiae strains YJV01 to YJV05 were created. In each of these strains, an additional marker gene was deleted. The first strain contained a met14∆, the second a ura3∆, the third an ade2∆, the fourth a leu2∆, and the fifth a lys2∆ mutation. The Trp1 gene has also been targeted for deletion. However, no distinct strain was created, as the colony PCR reaction used to verify this deletion was not successful, even though the result of the deletion experiment indicated a successful deletion. The mutation trp1-∆63, which was already present in YPH857, is a drastically shortened form of the Trp1 gene. Therefore, the risk of later recombination at this location is expected to be low. In summary, this procedure yielded a set of S. cerevisiae strains showing a high DNA-uptake capability during spheroplast experiments. Furthermore, the six auxotrophic marker genes were deleted from the yeast genome (with the trp1 deletion not confirmed by colony PCR) to prevent homologous recombination between the synthetic construct and the yeast genome.

Evaluation of the Auxotrophic Split Markers using Giga-Segments

To select for the correct assembly of a large synthetic DNA construct, a set of rewritten auxotrophic split markers was designed. As the name indicates, these markers were split and located at the 3’- or 5’-end of two adjacent 20 kb DNA segments (Figure 3.3A). For the assembly, the segments were transformed into yeast and assembled by homologous recombination. If the yeast cell correctly assembled the segments, the auxotrophic marker became active and a colony could grow on the corresponding selection plate. Five of these split markers have been incorporated into the proof-of-concept (PoC) C. eth-2.0 design [79], discussed in Chapter 4. This section describes the testing of the functionality of the markers and of the production burden that each marker places on the yeast cell.

Giga-segments, which consisted of 3 to 4 of the 40-60 kb mega-segment building blocks of the C. eth-2.0 design (Figure 3.3B), were assembled to test the auxotrophic split markers. As the size of the giga-segments was between 140 kb and 200 kb, the spheroplast transformation protocol was used to load the yeast cells with DNA. To test the auxotrophic split markers, each giga-segment was designed such that it contained a single marker. After the standard incubation period, the grown colonies were analyzed (Figure 3.3C). The plates for the first two giga-segments, selecting for tryptophan (Trp) and histidine (His) respectively, showed a high number of colonies at the expected growth rate. The third giga-segment, selecting for methionine (Met), also showed a high colony count. However, in between the normally grown colonies, there were many significantly smaller colonies. The smaller colonies indicate that the selection on the Met marker is not completely selective. The fourth giga-segment, selecting for leucine (Leu), showed a growth rate similar to the first two constructs, but the number of colonies was lower. The last two giga-segments were both selected for by the adenine (Ade) marker. On these plates, the number of colonies was similar to that on the plates of the first two giga-segments. However, the Ade-selected colonies were significantly smaller, even after an additional day of incubation.

[Figure 3.3 panels. (A) Seg x and Seg x+1 with split markers, transformation, selection. (B) C. eth-2.0 (785,701 bp) with giga-segments 1 to 6 and the markers TRP1, HIS3, MET14, LEU2 and ADE2. (C) Selection plates for giga-segments 1 to 6: Trp, His, Met, Leu, Ade, Ade.]

Figure 3.3: Auxotrophic split marker functionality test. (A) Two adjacent segments (orange), each containing a part of an auxotrophic split marker (green), were transformed into a yeast cell. If both segments were present, yeast assembled them to activate the marker. Corresponding selection resulted in the growth of colonies containing only the correctly assembled construct. (B) Schematic of the C. eth-2.0 design. Several mega-segments (orange) were assembled into giga-segments (grey). After the assembly, each giga-segment contained a single selectable marker. (C) Giga-segment selection plates after incubation. Plates 1 and 2 both showed a high density of colonies. Plate 3 contained two colony populations: 1) colonies of the same size as on plates 1 and 2, and 2) very small colonies. Plate 4 carried fewer colonies than the first two plates. Plates 5 and 6 both had a high colony count, but the colonies were very small.
In conclusion, the Trp and His markers are working as intended and should be prioritized for inclusion during the design process. The Met and Leu markers are both functional, but they have unwanted side effects, which reduce the usability of the markers. The Ade marker is not working as intended and should be avoided when putting a cell under stress, such as when assembling a synthetic genome.

In summary, a set of 5 auxotrophic split markers for the assembly of genome-sized DNA constructs has been tested. Of these 5 markers, Trp and His showed a performance comparable to wild type behavior. The Met marker was functional but not completely selective, while the Leu marker seems to pose a burden for cell metabolism, as fewer colonies were obtained. The Ade marker is considered dysfunctional, as cell growth was observed to be very slow.
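The split-marker principle can be illustrated in silico with a toy model in which two segment ends are joined through a shared overlap, and the marker is considered restored only if its full sequence appears in the joined product. This is a deliberately simplified sketch: the function names and the short toy sequences are hypothetical, and exact suffix/prefix matching is only a stand-in for homologous recombination in the yeast cell.

```python
from typing import Optional


def join_by_overlap(left: str, right: str, min_overlap: int = 8) -> Optional[str]:
    """Join two sequences via the longest exact suffix/prefix overlap.

    Returns None if no overlap of at least min_overlap bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


def marker_restored(segment_a: str, segment_b: str, marker: str) -> bool:
    """True only if both segments join and the full marker is reconstituted."""
    joined = join_by_overlap(segment_a, segment_b)
    return joined is not None and marker in joined


# Toy marker, split into two halves that share an 8-base overlap.
MARKER = "ATGGCTGCTAAAGGTTCTGACTAA"
SEG_X = "CCCC" + MARKER[:16]        # 3'-end of segment x carries the first part
SEG_X_PLUS_1 = MARKER[8:] + "GGGG"  # 5'-end of segment x+1 carries the rest

print(marker_restored(SEG_X, SEG_X_PLUS_1, MARKER))   # both segments present
print(marker_restored(SEG_X, "GGGGTTTTCCCC", MARKER))  # wrong partner segment
```

Neither toy segment contains the complete marker on its own, which mirrors why only correctly assembled constructs allow growth on the selection plate.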

ARS-Element Testing and Complementing

In yeast and other eukaryotes, ARS elements are required to ensure the replication of the chromosomes [160–162]. Previously, it has been shown that an ARS is necessary approximately every 100 kb to ensure the stable maintenance of high-GC constructs in S. cerevisiae [155]. The original form of the PoC C. eth-2.0 design [79], which will be introduced in Chapter 4 of this thesis, contained an ARS element approximately every 100-140 kb. In the first part of this section, it was analyzed whether the originally included ARS elements are working as expected and thus enable the maintenance of a large synthetic construct. In the second part of this section, the number of ARS elements available in the DNA toolbox was expanded to ensure the stable maintenance of synthetically designed constructs.

Testing of ARS-Elements Included in the Synthetic C. eth-2.0 Design

To ensure stable maintenance of a large, high GC-content synthetic construct in yeast, the functionality of each ARS element embedded within the design has to be guaranteed. To assess the functionality, a new assay based on the assembly of a 40 kb piece of DNA (Figure 3.4) has been developed. Within the synthetic C. eth-2.0 design, each ARS element was situated at the intersection of two 20 kb segments. The corresponding segments were assembled to test the functionality of each ARS element. For the assembly, the neighboring segments contained a 120 bp homologous sequence. To form a closed plasmid, two different linkers were used to join the other ends of the segments. Linker 1 (Figure 3.4 top) contained a Cen6/ARSH4 sequence in addition to the homologous sequences at the 3’- and 5’-ends. The Cen6 sequence is a centromeric element, which ensures the segregation of the plasmid during cell division. The ARSH4 sequence is a well-working ARS element, which is known to stabilize a 40 kb DNA construct in yeast. Therefore, the constructs assembled using this linker were the positive controls. Linker 2 (Figure 3.4 bottom) was almost identical to linker 1, except for the omission of the ARSH4 sequence. By using the second linker for the assembly, it was possible to force the cell to use the ARS element incorporated within the segments. As selection markers, the auxotrophic split markers described above have been used.

After transformation and incubation, the ARS functionality was analyzed. The colony counts resulting from the different ARS-testing constructs (Table 3.2) were compared. ARS MAX2 and ARS HI [163] both showed a high colony count (28 and 31, respectively) on the functionality test plates. The positive control count was lower than the test count, indicating that these synthetic ARS elements work better than the established ones. The ARS416, which is associated with the His marker in the design, also showed a higher colony count (13) on the test plate than on the positive control plate. However, the colonies were significantly smaller on the test plate. This indicated that the ARS was partially functional, but not at the level of wild type behavior. The functionality test of the ARS1213 did not result in colony growth, while the positive control resulted in the expected colony count. Last, the functionality test of the ARS1516 did not result in any grown colonies. It was not possible to deduce at this stage whether the problem was only due to the faulty Ade marker shown above, or whether the ARS functionality was also playing a role. In conclusion, we were able to show that two ARS (ARS MAX2 and ARS HI) were fully functional, one ARS (ARS416) was partially functional, and two ARS (ARS1213 and ARS1516) were non-functional or not testable.

[Figure 3.4 schematic. Labels: positive control, Seg x, Seg x+1, ARS of interest, transformation with segments, recombination, linkers 1) Cen/ARSH4 and 2) Cen, ARS functional, ARS non-functional.]

Figure 3.4: Testing of ARS elements in a synthetic design. To test the ARS elements in a synthetic design, two neighboring segments (orange) containing an ARS element at their intersection (red) were assembled using two different linkers. Linker 1 consisted of a Cen6/ARSH4 sequence (purple/green) flanked by homologous sequences designed to recombine with the two segments. Linker 2 was identical to linker 1, except for the missing ARSH4 sequence. After assembly in yeast, the colony counts were analyzed. Linker 1 was always expected to yield colonies, as it was a positive control. The linker 2 assemblies yielded colonies if the ARS element at the intersection of the segments was functional.

Troubleshooting of the Partial and Non-functional ARS-Elements

It had to be elucidated why two ARS elements (ARS1213, ARS1516, Table 3.2) in the design were not fully functional in order to subsequently design large, high-GC content DNA constructs. The sequences of the different non-functional designer ARS elements were analyzed and compared to the corresponding wild type S. cerevisiae ARS elements. The comparison revealed that the ARS416 was 9 bp and the ARS1213 was 12 bp shorter than their respective wild type sequences. The three ARS sub-elements B1, B2, and B3, which are involved in replication initiation, are located around this position. It has been shown that the mutation of one of these elements lowers the replication initiation efficiency of an ARS. However, if more than one of these elements was mutated at the same time, no replication initiation took place at this location in the genome [164, 165]. This seems to explain why ARS416 was still partially functional, while ARS1213, which contained the larger deletion, was non-functional.

Table 3.2: Colony counts of the ARS functionality tests.

ARS ID    Marker    # Cen    # Cen/ARS (a)    # Neg. Cont.
MAX2      Trp       28       7                7
HI        Leu       31       29               0
416 (b)   His       13       10               0
1213      Met       0        26               0
1516      Ade (c)   0        0                0

(a) Positive control.
(b) Small observed Cen colony size and reduced growth rate.
(c) The Ade selection marker within the C. eth-2.0 design showed a low performance in previous characterizations. This resulted in a low fitness and no colony growth on either plate.
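The interpretation of the colony counts in Table 3.2 can be written down as a simple decision rule. The sketch below uses the counts from the table; the function name, the category labels and the `small_colonies` flag are illustrative assumptions, and colony size has to be supplied by the experimenter because it cannot be derived from the counts.

```python
def classify_ars(test_count: int, positive_control_count: int,
                 small_colonies: bool = False) -> str:
    """Classify an ARS element from its functionality-test colony counts.

    test_count: colonies with the Cen-only linker (ARS of interest required).
    positive_control_count: colonies with the Cen/ARSH4 linker.
    """
    if test_count > 0:
        return "partially functional" if small_colonies else "functional"
    if positive_control_count > 0:
        # Assembly and selection worked, so the ARS itself failed.
        return "non-functional"
    # No growth even in the positive control: the assay is uninformative.
    return "not testable"


# (Cen count, Cen/ARS count, small colonies observed) from Table 3.2.
TABLE_3_2 = {
    "MAX2": (28, 7, False),
    "HI": (31, 29, False),
    "416": (13, 10, True),
    "1213": (0, 26, False),
    "1516": (0, 0, False),
}

for ars_id, (cen, cen_ars, small) in TABLE_3_2.items():
    print(f"ARS {ars_id}: {classify_ars(cen, cen_ars, small)}")
```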

Design and Testing of New ARS-Elements

To assemble the C. eth-2.0 design (Chapter 4) and to further expand future synthetic designs, an additional set of ARS elements had to be found. As a starting point, the collection of ARS in the "S. cerevisiae OriDB" database [166], which consisted of 829 entries, was used. To narrow down the space of possible candidates, two selection criteria were applied: 1) none of the restriction sites critical for the synthesis and assembly process (AarI, BsaI, BspQI, PacI, PmeI, I-CeuI, I-SceI) should be present in the sequence, and 2) the size of the ARS was not to exceed 250 bp. To decide on a final set of ARS sequences, a distance criterion was employed as a third criterion. The distance of an ARS to its nearest known neighbors in the wild type genome was calculated and the ARS were sorted accordingly. By using these 3 criteria, a set of 10 different ARS, which could potentially be incorporated into future synthetic DNA designs, was selected (Table 3.3). As the last step, some of these elements were tested.

Due to time constraints, the functionality testing of this new set of ARS elements was conducted differently from the first functionality test. To test the different elements, ARS1018, ARS1113, ARS516, ARS727, and ARS1323 were directly incorporated into the C. eth-2.0 design and assembled in yeast. This assembly had not been successful in prior experiments due to the two non-functional ARS elements. As the assembly was now successful (see Chapter 4), it was deduced that the novel ARS elements were capable of initiating the replication of the synthetic construct. With this, the definition of a set of new ARS to stabilize the maintenance of large synthetic DNA constructs has been achieved.

Table 3.3: Set of ARS for designs of large synthetic DNA constructs.

Name       Tot. Dist. [bp] (a)    Size [bp]    Tested
ARS1018    106,240                235          Yes
ARS1217    100,568                249          No
ARS516     89,193                 247          Yes
ARS1524    80,585                 219          No
ARS1510    66,668                 246          No
ARS1113    55,426                 233          Yes
ARS517     64,062                 202          No
ARS512     62,448                 249          No
ARS727     60,937                 245          Yes
ARS1323    56,764                 215          Yes

(a) Distance to the neighbouring ARS elements in wild type S. cerevisiae.
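The three selection criteria used to define the new ARS set (no critical restriction sites, size of at most 250 bp, ranking by distance) can be sketched as a filter. This is an illustration under stated assumptions, not the script used in this work: the candidate records are toy data, the function names are hypothetical, ranking by descending distance is assumed, and only the recognition sequences of the five type II enzymes are included. The long I-CeuI and I-SceI recognition sites would have to be added to the dictionary in the same way.

```python
# Recognition sequences (5'->3') of the type II enzymes named in the text.
FORBIDDEN_SITES = {
    "AarI": "CACCTGC",
    "BsaI": "GGTCTC",
    "BspQI": "GCTCTTC",
    "PacI": "TTAATTAA",
    "PmeI": "GTTTAAAC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def contains_forbidden_site(seq: str) -> bool:
    """True if any forbidden site occurs on either strand."""
    seq = seq.upper()
    return any(
        site in seq or reverse_complement(site) in seq
        for site in FORBIDDEN_SITES.values()
    )


def select_ars_candidates(candidates, max_size=250, top_n=10):
    """Filter by size and restriction sites, then rank by neighbor distance."""
    kept = [
        c for c in candidates
        if len(c["sequence"]) <= max_size
        and not contains_forbidden_site(c["sequence"])
    ]
    kept.sort(key=lambda c: c["total_distance"], reverse=True)
    return kept[:top_n]


# Toy candidates (not real ARS sequences).
CANDIDATES = [
    {"name": "toy_A", "sequence": "AT" * 50, "total_distance": 60_000},
    {"name": "toy_B", "sequence": "AT" * 40 + "GGTCTC", "total_distance": 90_000},
    {"name": "toy_C", "sequence": "A" * 300, "total_distance": 95_000},
    {"name": "toy_D", "sequence": "TA" * 60, "total_distance": 80_000},
    {"name": "toy_E", "sequence": "AT" * 10 + "GAGACC", "total_distance": 99_000},
]

print([c["name"] for c in select_ars_candidates(CANDIDATES)])
```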


3.3 Discussion

E. coli is the standard organism used for traditional cloning procedures. However, depending on the genotype, not every strain is equally suitable for cloning. In the late ’80s and early ’90s, scientists from the Hanahan Lab (Cold Spring Harbor Laboratory, USA) developed the DH5α and DH10B strains, which nowadays are two of the most widely used and sold E. coli cloning strains [167]. One of the reasons for this success is the deletion of recombination and restriction genes, resulting in a high tolerance and low mutability of non-host DNA. However, the classical cloning of DNA using E. coli becomes unreasonably laborious once a certain DNA construct size is surpassed (usually 50-70 kb). If a researcher wants to clone larger DNA constructs (ranging up to full genome size), another host organism is required. In this case, S. cerevisiae with its very efficient homologous recombination system becomes relevant. To clone medium- to large-sized plasmids (20-80 kb) in S. cerevisiae, the commonly used LiAc transformation protocol is applied [168]. This protocol works for most routinely used S. cerevisiae strains. However, if the size of the cloning product exceeds 100 kb, the spheroplast transformation protocol is used due to the higher quantity of transformed DNA compared to the LiAc protocol [149]. For this protocol, varying DNA uptake efficiencies can be observed when comparing different strains [169]. To ensure a maximal DNA uptake by the cells, we decided to use the S. cerevisiae strain YPH857 [156] as the origin for the engineering of a dedicated genome assembler strain. This strain has previously been used to assemble large DNA constructs with the spheroplast transformation protocol [170]. However, in previous assemblies, the selection markers were included within the different DNA pieces that were being assembled.
In the current assembly strategy, the auxotrophic markers were used as split markers and the cell had to correctly assemble the markers to survive. As YPH857 still contained most of the marker sequences, there was a risk that the synthetic DNA constructs would recombine with the yeast genome, thereby raising the number of false-positive results. To eliminate this source of false-positive results, YPH857 has been engineered by fully removing the auxotrophic marker sequences in the genome with CRISPR-Cas9. This removal resulted in the generation of the strains YJV01-05. In each of these strains, an additional marker gene has been deleted, thus reducing the risk of false-positive colonies due to homologous recombination between the construct and the genome. While assembling different large DNA constructs (140-200 kb) with high efficiency, it was shown that these strains are highly competent for DNA uptake when transformed using the spheroplast protocol. Therefore, this set of strains can be used for any synthetic DNA assembly in the future, as long as the assembled construct is not toxic for S. cerevisiae.


To establish the synthesis and assembly of large DNA constructs as a common lab method, we introduced a standardized assembly strategy centered around the rewritten auxotrophic split markers. The parts of these markers are located at either the 3’- or 5’-end of two adjacent segments of the DNA construct. The marker is only active if the two segments are correctly assembled. In this chapter, the auxotrophic split markers included in the PoC construct of C. eth-2.0 (Chapter 4) were tested by assembling the giga-segments. Each of these 140-200 kb DNA constructs was assembled in YJV04 using the spheroplast assembly method and each was selected for with a different auxotrophic marker. As all the transformations resulted in high colony counts, it was confirmed that the DNA uptake efficiency of the YJV04 strain is high. Further, it was shown that the newly designed auxotrophic split markers are all functional, albeit to different degrees. The Trp and His markers showed a good selection without inhibiting colony growth. The Met marker showed a functional selection. However, after prolonged incubation, smaller false-positive colonies were observed. The Leu marker also yielded hundreds of colonies, but the metabolic burden on the cell seemed to be high, as the number of colonies was lower when using this marker. Last, the Ade marker showed some functionality, but the colonies were very small even after several days of incubation. It was suspected that the problem of this marker was the associated promoter, which is not strong enough to efficiently activate the Ade2 expression. For smaller assemblies, which do not need all 5 selection markers, it is advised to implement the markers in the order discussed above. Regarding the Ade marker, an alternative promoter should resolve the problem in future designs.

A given set of rules needs to be followed to successfully synthesize and assemble large synthetic constructs. The most important rule is to avoid elements that are toxic in E. coli and S. cerevisiae at all costs. In addition, the first set of guidelines concerns the selection and maintenance of a synthetic DNA construct in yeast. When constructing a pathway or a genome from any organism, it is very often not possible to lower the GC-content of the designer construct to the GC-content of S. cerevisiae to ensure maintenance. In this case, ARS sequences have to be embedded into the design at 100 kb intervals. A list of possible sequences is provided in Table 3.3. For short designs smaller than 100 kb, the ARS element located on the pMR10Y ETH assembly vector is sufficient. The pMR10Y ETH vector is a multi-host backbone, which can carry DNA constructs of up to at least 60 kb. To select for the full assembly, it is important to integrate a split auxotrophic marker every 140-160 kb in the design, in the previously described order. The distance between the two best markers should be maximized, so that the initial selection can rely on two auxotrophic markers instead of five, which reduces the stress on the yeast cells after the assembly. This procedure improves the survival rate after transformation. The selection for the full assembly can then be done by streaking the grown colonies onto plates selecting for all markers. By using these two simple guidelines, the selection and maintenance of a large DNA construct in S. cerevisiae is ensured.
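The two spacing guidelines above (an ARS at 100 kb intervals, a split marker every 140-160 kb) lend themselves to an automated design check. The following sketch assumes a circular construct and uses hypothetical element positions; the function names are illustrative, and the positions are not the actual coordinates of the C. eth-2.0 design.

```python
def circular_gaps(positions, construct_length):
    """Distances between consecutive elements on a circular construct."""
    if not positions:
        return [construct_length]
    pts = sorted(p % construct_length for p in positions)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(construct_length - pts[-1] + pts[0])  # wrap-around gap
    return gaps


def check_design(ars_positions, marker_positions, construct_length,
                 max_ars_gap=100_000, max_marker_gap=160_000):
    """Report whether ARS and marker spacing satisfy the guidelines."""
    return {
        "ars_ok": max(circular_gaps(ars_positions, construct_length)) <= max_ars_gap,
        "markers_ok": max(circular_gaps(marker_positions, construct_length)) <= max_marker_gap,
    }


# Hypothetical layout on a 785,701 bp construct.
LENGTH = 785_701
ARS_POSITIONS = list(range(0, 800_000, 100_000))     # 0, 100 kb, ..., 700 kb
MARKER_POSITIONS = list(range(0, 800_000, 160_000))  # 0, 160 kb, ..., 640 kb

print(check_design(ARS_POSITIONS, MARKER_POSITIONS, LENGTH))
# Dropping one ARS leaves a 200 kb stretch without an origin.
print(check_design(ARS_POSITIONS[:3] + ARS_POSITIONS[4:], MARKER_POSITIONS, LENGTH))
```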

The second set of guidelines concerns the synthesis and assembly process of the DNA constructs. In the past years, much progress on the chemical de novo synthesis of DNA sequences has been observed. However, many DNA sequences cannot be synthesized due to the presence of secondary structures, mispriming of oligonucleotides and polymerase slippage [78, 80]. This problem is especially pronounced in sequences with a high GC-content. To remove such obstructive elements and to lower the average GC-content of a DNA design before the synthesis, the Calligrapher algorithm [146] can be applied. This algorithm optimizes the designed DNA sequence for synthesis by using neutral rewriting within the coding sequences of the design. Furthermore, the algorithm also considers the codon table of the target organism to ensure the optimal sequence. After optimizing the sequence for synthesis, the last step is to divide it into smaller, synthesizable parts by applying the Partitioner algorithm [151]. This algorithm ensures the applicability of the standard assembly pipeline established in our laboratory by splitting the full sequence into smaller parts and adding the required adapters to the ends of each component. The partitioned sequence can be assembled in a parallel fashion, thus enabling an efficient assembly process. In conclusion, by adding the ARS and auxotrophic split markers to the design, the stable selection and maintenance in yeast can be ensured. Furthermore, applying the Calligrapher and Partitioner algorithms to the designs makes the constructs amenable to synthesis and assembly.

To validate the design rules in combination with the standard assembly pipeline, we designed, synthesized and assembled the minimum genome of C. crescentus as a proof-of-concept (PoC) [79]. The details of this PoC can be found in Chapter 4 of this thesis. In brief, the set of essential genes in optimal conditions of the cell-cycle model organism C. crescentus was defined using the Tn-seq method [98]. Subsequently, the essential genes were collated into a new design, which included the auxotrophic split markers and the first set of ARS. This new design was around 785 kb in size and we refer to it as C. eth-1.0. In theory, this first version of the design should already be stable in S. cerevisiae. However, no synthesis company would be able to produce the design, as around 7,000 synthesis constraints were detected. At this stage, the Calligrapher algorithm was applied, substituting 17% of all the bases. This is the equivalent of rewriting 56% of all the codons in the design and reduced the number of synthesis constraints to around 300. This new version of the design was named C. eth-2.0. To enable the fast-paced assembly of the synthesized components in a four-tier assembly process, the Partitioner algorithm was used to split the design into 236 blocks and to add the required adapters to each component. Out of the 236 blocks, 235 were successfully manufactured using a low-cost synthesis pipeline. One block had to be synthesized using a more expensive custom synthesis pipeline, as the synthesis was not successful in the low-cost pipeline. This high synthesis success rate indicates that the rewriting of the C. eth-1.0 design was successful with regard to the de novo synthesis. During the assembly process of the C. eth-2.0 construct, non-functional ARS elements within the design were detected and replaced with the new set of ARS elements. Using the new design with the additional ARS elements included, the assembly of the construct was successful. This PoC shows that it is possible to synthesize and assemble high GC-content sequences using the technology proposed in this chapter.
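The figures quoted above can be turned into two summary ratios with a few lines of arithmetic. The sketch uses only the numbers stated in the text; the constraint counts are the approximate values given there ("around 7,000" and "around 300"), so the resulting reduction is approximate as well.

```python
def percent(part: float, total: float) -> float:
    """Return part as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


BLOCKS_TOTAL = 236
BLOCKS_LOW_COST_OK = 235
CONSTRAINTS_BEFORE = 7_000  # approximate, C. eth-1.0
CONSTRAINTS_AFTER = 300     # approximate, C. eth-2.0

low_cost_success = percent(BLOCKS_LOW_COST_OK, BLOCKS_TOTAL)
constraint_reduction = 100.0 - percent(CONSTRAINTS_AFTER, CONSTRAINTS_BEFORE)

print(f"Low-cost synthesis success rate: {low_cost_success:.1f}%")          # 99.6%
print(f"Reduction of synthesis constraints: {constraint_reduction:.1f}%")   # 95.7%
```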

To enable future use of this technology, we need to learn how to encode biological functionality in a new organism. To do so, we need to know a) how information is stored within the genomic sequence of an organism and b) whether there is design flexibility when it comes to functionality. In the past, these questions were difficult to answer by using standard genomic methods, as there were no tools for analyzing the full genome at once. However, with C. eth-2.0 we were able to show that there is high sequence flexibility within the coding regions, as 56% of all codons were rewritten and 81% of all genes were shown to be functional. Moreover, by analyzing the non-functional genes within the assembled design, new insights on information storage within the genome were obtained. Given the large amount of knowledge we gained through the analysis of only the 530 rewritten C. eth-2.0 genes, we suspect that this technology holds the potential to be the next big step for the field of genetics. By advancing this technology further, we will not only help fundamental biology to understand genetics more thoroughly, but it will also become possible to implement the new knowledge in novel sequence designs and create new applications in biotechnology.

In the last section of the discussion, possible future applications of this technology are outlined. As this technology is applicable in many different fields of biology, many possibilities are conceivable. The first and most straightforward application is to synthesize biosynthetic pathways and produce valuable compounds such as vitamin B12, artemisinin, Taxol and many more. In general, the natural pathways for the production of these compounds are known, but cloning them into new production organisms is either not possible or tedious due to the sequence size and the pathway complexity. With this new technology, however, the pathway can be designed and optimized for the new host organism in silico before assembly. For this application, I am currently conducting a PoC study to create a novel vitamin B12 pathway. Another source of compounds can be found in the dark matter of DNA. There are many organisms with a known genome sequence that cannot be cultivated under laboratory conditions. By mining this in silico genome information, new compounds and their corresponding pathways can be discovered [171]. Those pathways can subsequently be optimized for a new production host, synthesized and assembled. Thus, genome mining in combination with synthesis opens the gates to a host of new compounds, which bear the potential to cure different types of cancer or other currently non-curable diseases. The implementation of single pathways into production strains is the first example of this technology applied to the field of modern biotechnology. The next level is the design of completely new organisms or molecular factories. Such organisms will be genetically fully defined and optimized for the production of the desired compound. This overarching knowledge of the organism will also enable us to synthesize more complex compounds of higher commercial value. Such synthetically designed molecular factories could potentially produce new materials such as bioplastics. Further, such a specialized organism could conceivably be used to produce biofuel with high efficiency.

[Figure 3.5: workflow schematic. Stage labels: Sequencing, Synthesis, Assembly. Steps: viral hosts, genome database, vaccine design, vaccine in yeast. Path 1: downstream processing, purified genomic vaccine, vaccination. Path 2: resurrection of the attenuated virus, downstream processing, purified attenuated vaccine, vaccination.]

Figure 3.5: Overview of Accelerated Vaccine Design. Virus samples are sequenced and added to a global sequence database. To design the vaccine, the DNA sequence is recoded by an algorithm, which also enables the synthesis of the sequence. After synthesis and assembly, there are two possible paths to proceed to a vaccine: 1) the DNA plasmid is purified and injected as a DNA vaccine, 2) the attenuated virus is rescued and purified.

The last potential future application of this technology discussed in this work is the development of new vaccines (Figure 3.5). One of the major bottlenecks in vaccine development is the necessity to cultivate the desired virus. However, virus cultivation is not trivial and very laborious [172]. Using the rewriting technology described herein combined with knowledge from , it will become feasible to create genomic vaccines by only using the sequence of the viral agent. This makes it possible to shorten the vaccine development process significantly. After the design, synthesis and assembly process, two possible paths to administer the new vaccines are conceivable. The first path is the traditional way, in which the vaccine construct is booted up in cell lines, followed by purification of the attenuated virus. The purified virus particles are subsequently used for vaccination. This process is very similar to the current standard of vaccine production. The second conceivable administration route is by means of DNA vaccines. A DNA vaccine consists of only the vaccine DNA construct, which is injected into the patient. Within the designated patient cells, the vaccine encoded in the DNA is expressed and triggers the desired immune response. This second administration method is one of the promising technologies of the next decade, and we are confident that the technology presented in this thesis will significantly contribute to it.

3.4 Materials and Methods

Unless otherwise stated, all chemicals were purchased from Sigma-Aldrich (St. Louis, MO, US). Milli-Q water was used for all experiments.

Strains and Plasmids

Assembler Strains

Table 3.4: Yeast strains used for the generation and usage of the genome assembler toolbox.

| Strain ID | Name | Organism | Comment | Source |
|---|---|---|---|---|
| BC3980 | YPH857 | S. cerevisiae | α, ura3-52 lys2-801 ade2-101 trp1-∆63 his3-∆200 leu2-∆1 | P. Hieter [156] |
| BC3981 | YJV01 | S. cerevisiae | YPH857, ∆met14 | This Work |
| BC4040 | YJV02 | S. cerevisiae | YJV01, ∆ura3 | This Work |
| BC4102 | YJV03 | S. cerevisiae | YJV02, ∆ade2 | This Work |
| BC4163 | YJV04 | S. cerevisiae | YJV03, ∆leu2 | This Work |
| BC4184 | YJV05 | S. cerevisiae | YJV04, ∆lys2 | This Work |

CRISPR-Cas9 Plasmids

Table 3.5: Plasmids used for the strain engineering by CRISPR-Cas9.

| Strain ID | Name | Comment | Source |
|---|---|---|---|
| BC4029 | Universal CRISPR-Cas9 Plasmid | AflII / BsaI restriction site (a) | Master Thesis J.E.V., F. Rudolf |
| BC4100 | Helper Plasmid | Nourseothricin resistance | Master Thesis J.E.V., F. Rudolf |

(a) Restriction site used to linearize the plasmid for target acquisition of sgRNA

Media and supplements

For the growth of S. cerevisiae, either YPD (spatula tip adenine hemisulfate, 10 g/L Bacto yeast extract (Difco), 20 g/L Bacto Peptone (Difco), 2% Glucose) or synthetic-defined (SD) medium (17 g/L Yeast nitrogen base w/o amino-acids (AA), 50 g/L Ammonium sulfate, 0.04 g/L L-Tyrosine (Tyr), 0.02 g/L L-Arginine (Arg), 0.06 g/L L-Phenylalanine (Phe), 0.04 g/L L-Lysine monohydrochloride (Lys), 0.05 g/L L-Glutamic acid monosodium salt (Glu), 0.06 g/L L-Isoleucine (Ile), 0.05 g/L L-Aspartic acid (Asp), 0.12 g/L L-Valine (Val), 0.04 g/L L-Proline (Pro), 0.2 g/L L-Serine (Ser), 0.1 g/L L-Threonine (Thr), 0.02 g/L Uracil (Ura), 0.02 g/L L-Histidine monohydrochloride monohydrate (His), 0.08 g/L L-Leucine (Leu), 0.02 g/L L-Methionine (Met), 0.04 g/L L-Tryptophan (Trp), 0.04 g/L Adenine hemisulfate (Ade), 2% Glucose) was used. For the SD medium, any AA or base could be omitted for selection. If needed, 20 g/L Agar granulated (Difco) was added to the media for plating. The following antibiotic concentrations were used: 200 µg/mL G 418 disulfate salt (Gibco, Thermofisher, Waltham, MA, US) and 100 µg/mL Nourseothricin (Nat) (Jena Bioscience, Jena, DE). E. coli was grown in LB medium (10 g/L Bacto Tryptone (Difco, Thermofisher), 5 g/L Bacto yeast extract (Difco), 5 g/L NaCl) or on LB plates containing 15 g/L agar granulated (Difco) in addition to the LB medium. For the selection of E. coli, the following antibiotics were used: Kanamycin (50 µg/mL on plates and 30 µg/mL in liquid culture) and Carbenicillin (100 µg/mL on plates and 50 µg/mL in liquid culture).
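The SD recipe lends itself to a small helper that scales the listed concentrations to a batch volume and leaves out the components used for selection. The following is a minimal sketch only: the function name `dropout_recipe` is hypothetical, and 2% glucose is assumed to mean 2% w/v (20 g/L), which the text does not state explicitly.

```python
# SD medium components in g/L, copied from the recipe in the text.
SD_G_PER_L = {
    "Yeast nitrogen base w/o AA": 17.0,
    "Ammonium sulfate": 50.0,
    "Tyr": 0.04, "Arg": 0.02, "Phe": 0.06, "Lys": 0.04, "Glu": 0.05,
    "Ile": 0.06, "Asp": 0.05, "Val": 0.12, "Pro": 0.04, "Ser": 0.2,
    "Thr": 0.1, "Ura": 0.02, "His": 0.02, "Leu": 0.08, "Met": 0.02,
    "Trp": 0.04, "Ade": 0.04,
}
GLUCOSE_G_PER_L = 20.0  # assumption: "2% Glucose" read as 2% w/v
AGAR_G_PER_L = 20.0     # from the text, added only for plates


def dropout_recipe(volume_l, omit=(), agar=False):
    """Return grams per component for `volume_l` litres of SD medium.

    `omit` lists the amino acids or bases left out for selection.
    """
    unknown = set(omit) - set(SD_G_PER_L)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    recipe = {
        name: round(grams * volume_l, 4)
        for name, grams in SD_G_PER_L.items()
        if name not in omit
    }
    recipe["Glucose"] = round(GLUCOSE_G_PER_L * volume_l, 4)
    if agar:
        recipe["Agar"] = round(AGAR_G_PER_L * volume_l, 4)
    return recipe


if __name__ == "__main__":
    # 0.5 L of SD -Ura plates
    for name, grams in dropout_recipe(0.5, omit=("Ura",), agar=True).items():
        print(f"{name}: {grams} g")
```

Keeping the recipe as data makes it easy to generate any drop-out combination without retyping the list.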

Microbial growth conditions

For liquid cultures, yeast was grown at 30 °C and E. coli at 37 °C. Both organisms were agitated vigorously (∼200 rpm) during growth. Plates were incubated at the same temperatures without shaking.

Yeast lithium-acetate transformation

Yeast transformation [168] was used to insert DNA into yeast cells. The selection for a successful transformation of the DNA was accomplished by using either an auxotrophic marker (e.g. Ura3) or antibiotic resistance (e.g. Kan). To start, a 5 mL yeast preculture was grown overnight. Next, the dense culture was diluted 1:25 in 50 mL YPD-medium and incubated for 4 h. After incubation, the culture was spun down at 1,000 rcf for 5 min.

The supernatant was replaced with 50 mL H2O and the cells were spun down at 3,000 rcf for 5 min. After discarding the supernatant, the cells were resuspended in 100 µL LiAc mix (0.1 M lithium-acetate, 0.01 M Tris-HCl pH = 7.5, 0.001 M EDTA pH = 8) per transformation. To the LiAc-cell mix, 10 µL salmon-sperm DNA solution (1% w/v salmon-sperm DNA (ssDNA), 0.01 M Tris-HCl pH = 7.5, 0.001 M EDTA pH = 8) and 600 µL PEG mix (40% w/v poly(ethylene glycol) 3015-3685 g/mol, 0.01 M Tris-HCl pH = 7.5, 0.001 M EDTA pH = 8) were added per transformation. Of this master mix, 710 µL was aliquoted into 1.5 mL tubes, to which the appropriate amount of DNA (100-200 ng unless stated otherwise) was added. The tubes were incubated for 30 min at room temperature while shaking. After incubation, 70 µL of DMSO was added and the tubes were incubated at 42 °C. The cells were spun down at 1,000 rcf for 2 min, the supernatant was discarded and the pellet was resuspended in 300 µL YPD for antibiotic markers or H2O for auxotrophic markers. If G 418 was used as a marker, the transformed cells were incubated at room temperature while shaking for at least 3 h. For auxotrophic markers, no additional incubation was required after the heat shock. The cells were plated on appropriate agar plates and incubated at 30 °C. Colonies were expected to appear after two days of incubation.
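The per-transformation volumes of the master mix (100 µL LiAc-cell mix, 10 µL salmon-sperm DNA solution and 600 µL PEG mix, giving 710 µL per aliquot) can be scaled to a batch of transformations. The sketch below is illustrative only: the function names and the optional `excess` pipetting reserve are assumptions and not part of the protocol.

```python
# Volumes per transformation in µL, taken from the protocol text.
PER_TRANSFORMATION_UL = {
    "LiAc-cell mix": 100.0,
    "salmon-sperm DNA solution": 10.0,
    "PEG mix": 600.0,
}


def aliquot_volume_ul():
    """Volume of master mix dispensed per tube."""
    return sum(PER_TRANSFORMATION_UL.values())


def master_mix(n_transformations, excess=0.0):
    """Scale the master mix to `n_transformations`.

    `excess` is a hypothetical pipetting reserve (0.1 = 10% extra).
    """
    if n_transformations < 1 or excess < 0:
        raise ValueError("need at least one transformation and excess >= 0")
    factor = n_transformations * (1 + excess)
    return {
        name: round(volume * factor, 1)
        for name, volume in PER_TRANSFORMATION_UL.items()
    }


if __name__ == "__main__":
    print(f"aliquot per tube: {aliquot_volume_ul()} µL")
    for name, volume in master_mix(8, excess=0.1).items():
        print(f"{name}: {volume} µL")
```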

CRISPR-Cas9 assisted gene deletion in yeast

To delete the auxotrophic marker genes within the yeast strains in a fast and efficient manner, CRISPR-Cas9 was used. I co-developed the system during my master's thesis, then refined it and used it to mutate different yeast strains during this project. This CRISPR-Cas9 system consists of three parts: 1) the universal CRISPR-Cas9 plasmid, 2) a helper plasmid, and 3) a mutation-oligo. The complete workflow to introduce a clean mutation into any locus within S. cerevisiae using this system took 2 weeks in total. This system also holds the potential to be used in a multiplexed fashion in the future.

Universal sgRNA target acquisition

The universal CRISPR-Cas9 plasmid was an all-in-one plasmid. It contained a constitutively expressed sgRNA with a placeholder targeting sequence and a Cas9 ORF under the control of a β-estradiol inducible promoter [173]. For the Cas9 to cut at a specific location within the genome, a target has to be defined within the sgRNA. For this, the AflII / BsaI (NEB, Ipswich, MA, US) digested plasmid was transformed into the yeast cell along with an 80 bp oligo containing the targeting sequence as well as 30 bp overhangs to the sgRNA for homologous recombination. The selection for the intact CRISPR-Cas9 plasmid was conducted using G 418. After two days of growth, the colonies were screened for a functional CRISPR-Cas9 system by spotting them onto YPD ± β-estradiol plates. If the system was active, there was no growth on plates containing the activator compound. After this screen, CRISPR-Cas9 positive strains were used for further engineering steps.
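The layout of the 80 bp targeting oligo can be sketched as a concatenation of a 30 bp homology arm, the targeting sequence and a second 30 bp homology arm. A 20 bp targeting sequence is inferred here from 80 - 2 x 30 and is not stated in the text. The function name and all sequences below are placeholders for illustration; they are not the sequences used in this work.

```python
VALID_BASES = set("ACGT")


def build_targeting_oligo(left_arm, target, right_arm, arm_len=30, oligo_len=80):
    """Join homology arm + targeting sequence + homology arm and check lengths."""
    parts = [p.upper() for p in (left_arm, target, right_arm)]
    for part in parts:
        if not part or set(part) - VALID_BASES:
            raise ValueError(f"invalid DNA sequence: {part!r}")
    if len(parts[0]) != arm_len or len(parts[2]) != arm_len:
        raise ValueError(f"homology arms must be {arm_len} bp each")
    oligo = "".join(parts)
    if len(oligo) != oligo_len:
        raise ValueError(f"oligo is {len(oligo)} bp, expected {oligo_len} bp")
    return oligo


# Placeholder sequences only (hypothetical, chosen for their lengths).
LEFT_ARM = "ACGT" * 7 + "AC"        # 30 bp
TARGET = "GATTACAGATTACAGATTAC"     # 20 bp
RIGHT_ARM = "TGCA" * 7 + "TG"       # 30 bp

if __name__ == "__main__":
    print(build_targeting_oligo(LEFT_ARM, TARGET, RIGHT_ARM))
```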

CRISPR-Cas9 assisted gene deletion

The genes of interest were deleted by means of homologous recombination. A yeast pre-culture containing the targeted CRISPR-Cas9 plasmid was grown overnight and diluted 1:25 in 100 mL YPD medium. After 2.5 h of incubation, 2.5 µM β-estradiol was added and the cells were incubated for an additional 1.5 h. During this time, double-strand breaks were introduced into the yeast genome by the Cas9 nuclease. Following the incubation, the mutation-oligo and the helper plasmid were transformed into the yeast cell using the LiAc-transformation protocol. The function of the helper plasmid was to confer an additional antibiotic resistance (Nat), while the mutation-oligo was designed in such a way that the desired mutation was introduced upon recombination with the genome. After transformation, the cells were plated onto YPD + G 418 + Nat + β-estradiol plates, selecting for the CRISPR-Cas9 plasmid and the helper plasmid. The β-estradiol in the plates additionally kept Cas9 expression active, ensuring positive selection for the mutation. After 2 days of incubation, colonies had grown on the plates.

Verification of gene deletion by colony PCR

To verify the successful deletion of the auxotrophic marker genes, diagnostic colony PCR was used. The primers were designed such that the amplification of a product was only possible if the deletion was introduced correctly. To break the cell wall, S. cerevisiae colonies were dissolved in 3 µL 0.02 M sodium hydroxide and boiled at 99 °C for 10 min. After this, the PCR master mix (5 µL 5 M betaine, 12.5 µL BioRed 2x Mastermix (Bioline, London, UK), 0.55 µL each diagnostic primer (100 nM), and 3.55 µL H2O) was added to the broken cell mixture. The reactions were conducted in a thermocycler (C1000 touch, BioRad, Cressier, Switzerland) according to the following protocol: (1) 5 min at 96 °C, (2) 30 s at 96 °C, (3) 30 s at 55 °C, (4) 1 min at 72 °C, (5) repeat steps 2 – 4, 30 times, (6) final elongation 10 min at 72 °C. PCR products were analyzed on 1.5% agarose gels by electrophoresis to identify clones in which the genes had been deleted successfully.
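As a quick plausibility check of the colony PCR setup, the hold time of the cycling program and the total reaction volume can be computed from the values given above. This is a sketch under stated assumptions: 30 cycles in total, two diagnostic primers per reaction, and no allowance for temperature ramping. The function names are hypothetical.

```python
def pcr_runtime_min(initial_s=300, cycle_steps_s=(30, 30, 60), cycles=30, final_s=600):
    """Sum of all hold times in minutes (ramp times between steps are ignored)."""
    total_s = initial_s + cycles * sum(cycle_steps_s) + final_s
    return total_s / 60


def reaction_volume_ul(lysate=3.0, betaine=5.0, mastermix=12.5,
                       primer_each=0.55, water=3.55, n_primers=2):
    """Total reaction volume in µL; two primers per reaction is an assumption."""
    total = lysate + betaine + mastermix + n_primers * primer_each + water
    return round(total, 2)


if __name__ == "__main__":
    print(f"hold time: {pcr_runtime_min():.0f} min")
    print(f"reaction volume: {reaction_volume_ul()} µL")
```

With the values from the protocol this gives 75 min of hold time and a 25.15 µL reaction.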

Outgrowth of the CRISPR-Cas9 plasmid

After the successful deletion of the target gene in S. cerevisiae, the CRISPR-Cas9 plasmid had to be removed. For this, the yeast cells were grown in plain YPD-medium without any selection for 24 h. After this, the culture was spread on a YPD-plate. After incubation, single colonies were spotted onto YPD plates with and without G 418. By comparing the growth pattern on the two plates, the colonies without the plasmids were identified.

Yeast spheroplast transformation

To assemble DNA constructs larger than 70 kb, yeast spheroplast transformation was applied according to the following adapted protocol [149]. A yeast pre-culture was diluted in 100 mL YPD medium and incubated at 30 °C overnight until an OD600 between 0.3 and 0.5 was reached. Cells were harvested by centrifugation at 1,000 rcf for 5 min, washed in 30 mL sterile H2O and washed again in 20 mL 1 M sorbitol, followed by resuspension of the pellet in 20 mL SPE solution (1 M sorbitol, 0.01 M sodium phosphate, 0.01 M EDTA pH 8.0) supplemented with 40 µL 2-mercaptoethanol and 20 µL Zymolyase solution (10 mg/mL Zymolyase 20T, 0.05 M Tris-HCl pH 7.5, 25% glycerol). The mixture was incubated at 30 °C and the progression of spheroplast formation was monitored after 20 min by mixing 100 µL of the digested cells with either 900 µL 1 M sorbitol or 900 µL 2% SDS. The OD600 ratio in the absence and presence of SDS was determined on a spectrometer. The Zymolyase digestion was continued until the sample treated with 2% SDS showed a 3 to 5-fold lower OD600 than the control sample. Spheroplasts were harvested by centrifugation at 300 rcf for 10 min and gently resuspended in 50 mL 1 M sorbitol. This washing step was repeated twice before careful resuspension of the spheroplasts in 2 mL STC solution (1 M sorbitol, 0.01 M Tris-HCl pH 7.5, 0.01 M calcium chloride). For the assembly reactions, 1.8 µg of digested and purified mega-segments in a volume of 20 µL as well as 2-3 µg ssDNA (Sigma-Aldrich, USA) were pipetted into a sterile 1.5 mL tube, and 200 µL of freshly prepared spheroplasts were added and incubated at room temperature for 10 min. To each sample, 800 µL PEG solution (20% PEG 8000, 0.01 M calcium chloride, 0.01 M Tris-HCl pH 7.5) was added, and the tube was inverted carefully several times and incubated at room temperature for 10 min. Spheroplasts were collected by centrifugation at 300 rcf for 10 min, resuspended in 800 µL SOS solution (1 M sorbitol, 0.0065 M calcium chloride, 0.25% yeast extract (Difco), 0.5% peptone (Difco)) and incubated at 30 °C for 40 min. The spheroplast solution was mixed with 7 mL SDSORB-TOP (SD agar containing 1 M sorbitol and 2.5% agar, pre-tempered at 50 °C), inverted several times and poured onto SDSORB plates (SD agar containing 1 M sorbitol and 2% agar) selecting for the appropriate marker. After hardening, the plates were incubated at 30 °C for 5-7 days until colonies appeared.
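The SDS lysis check used to monitor the Zymolyase digestion reduces to a ratio of two OD600 readings. The sketch below encodes the 3 to 5-fold window from the protocol. The function names are hypothetical, and the handling of ratios above 5-fold is an assumption, since the protocol does not say how such samples were treated.

```python
def sds_fold_drop(od_sorbitol, od_sds):
    """Fold difference between the sorbitol control and the SDS-treated sample."""
    if od_sorbitol <= 0 or od_sds <= 0:
        raise ValueError("OD600 readings must be positive")
    return od_sorbitol / od_sds


def digestion_state(od_sorbitol, od_sds, low=3.0, high=5.0):
    """Classify a reading against the 3 to 5-fold target window."""
    fold = sds_fold_drop(od_sorbitol, od_sds)
    if fold < low:
        return "continue digestion"
    if fold <= high:
        return "harvest spheroplasts"
    # Assumption: beyond the window the sample is only flagged for inspection.
    return "beyond target window"


if __name__ == "__main__":
    for control, sds in [(0.50, 0.40), (0.40, 0.10), (0.48, 0.06)]:
        fold = sds_fold_drop(control, sds)
        print(f"{fold:.1f}-fold: {digestion_state(control, sds)}")
```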

Testing of ARS elements

To test the ARS elements included in a synthetic design, an assay was developed. For this, two synthetic 20 kb DNA sequences were assembled into one piece using a 120 bp homologous sequence on one side and a linker on the other side. Two different linkers were used: 1) a linker containing a Cen6 and an ARSH4 element as well as the homologous regions for both DNA pieces, and 2) a linker containing only the Cen6 element in addition to the homologous regions. Two days after the LiAc-transformation, colonies were growing on the plates. By comparing the colony counts of the transformations using linker #1 and linker #2, the functionality of the ARS elements was deduced.
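The readout of this assay is a comparison of two colony counts. Because linker #1 supplies ARSH4 itself while linker #2 does not, colonies obtained with linker #2 depend on the ARS elements within the design. The sketch below expresses this as a ratio; the function name and the cutoff of 0.5 are hypothetical choices for illustration, as no numerical threshold was defined for the assay.

```python
def ars_functionality(colonies_linker1, colonies_linker2, min_ratio=0.5):
    """Compare colony counts of linker #2 (Cen6 only) and linker #1 (Cen6 + ARSH4).

    Returns (ratio, functional). `min_ratio` is an illustrative cutoff,
    not a value taken from the assay.
    """
    if colonies_linker1 <= 0:
        raise ValueError("linker #1 control gave no colonies; assay not interpretable")
    if colonies_linker2 < 0:
        raise ValueError("colony counts cannot be negative")
    ratio = colonies_linker2 / colonies_linker1
    return ratio, ratio >= min_ratio


if __name__ == "__main__":
    for counts in [(200, 180), (200, 4)]:
        ratio, functional = ars_functionality(*counts)
        verdict = "functional" if functional else "likely non-functional"
        print(f"linker #2 / linker #1 = {ratio:.2f}: design ARS {verdict}")
```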

40 Chapter 4

Chemical Synthesis Rewriting of a Bacterial Genome to Achieve Design Flexibility and Biological Functionality

4.1 Abstract

Understanding how to program biological functions into artificial DNA sequences remains a key challenge in synthetic genomics. Here, we report the chemical synthesis and testing of Caulobacter ethensis-2.0 (C. eth-2.0), a rewritten bacterial genome composed of the most fundamental functions of a bacterial cell. We rebuilt the essential genome of Caulobacter crescentus through the process of chemical synthesis rewriting and studied the genetic information content at the level of its essential genes. Within the 785,841 bp genome, we employed sequence rewriting to reduce the number of encoded genetic features from 6,290 to 799. Overall, we introduced 133,313 base substitutions resulting in the rewriting of 123,562 codons. We tested the biological functionality of the genome design in Caulobacter crescentus by transposon mutagenesis. Our analysis revealed that 432 essential genes of C. eth-2.0, corresponding to 81.5% of the design, are equal in functionality to natural genes. These findings suggest that neither changing mRNA structure nor the codon context has significant influence on biological functionality of synthetic genomes. Discovery of 98 genes that lost their function identified essential genes with incorrect annotation, including a limited set of 27 genes where we uncovered novel non-coding control features embedded within protein-coding sequences. In sum, our results highlight the promise of chemical synthesis rewriting to decode fundamental genome functions and its utility towards the design of improved organisms for industrial purposes and health benefits.

4.2 Preface

The project discussed in this chapter consisted of two main parts. The first part was the design, optimization, synthesis, and assembly of the de novo synthesized C. eth-2.0 genome. The second part of the project considered the functionality of the newly created genome, by analyzing the different parts for their functionality.

My contribution to this publication was made to the first of these two parts. As already mentioned in Chapter 3 of this thesis, the assembly of the C. eth-2.0 genome was used as a PoC of the newly developed synthetic genome assembly technology. When I started working on the project, the 20 kb DNA segments had already been assembled or delivered by the synthesis companies, and the assembly of the 40-60 kb mega-segments was ongoing. I started my part of the project by assembling the remaining mega-segments, for which the previous assembly attempts had not been successful. Once all the mega-segments were assembled successfully, I dedicated my time to testing the auxotrophic split markers (see Chapter 3) while establishing the yeast spheroplast transformation protocol. After both of these steps were successful, I started to assemble the full 785 kb C. eth-2.0 construct; however, this failed several times. After analyzing the false-positive colonies of the full assemblies, I decided to test the ARS elements included in the design, resulting in the integration of the new ARS elements described in Chapter 3. After this, I successfully assembled the full construct and was able to prove the stable maintenance of this construct within yeast.

In addition to carrying out the experimental steps to assemble the synthetic C. eth-2.0 genome, I helped write and revise the manuscript. I also worked on the visualization of the results in the figures related to the assembly process. Lastly, I took part in the submission and revision process of the paper.

4.3 Chemical Synthesis Rewriting of a Bacterial Genome to Achieve Design Flexibility and Biological Functionality

Jonathan E. Venetz, Luca Del Medico, Alexander Wölfle, Philipp Schächle, Yves Bucher, Donat Appert, Flavia Tschan, Carlos E. Flores-Tinoco, Mariëlle van Kooten, Rym Guennoun, Samuel Deutsch, Matthias Christen and Beat Christen

Published in Proceedings of the National Academy of Sciences, 2019, 116(16), 8070-8079

Introduction

In the early 2000s, the template-independent chemical synthesis of the 7.4 kb polio virus [174] and 5.4 kb bacteriophage phiX174 genomes [175] using oligonucleotides ushered in the field of synthetic genomics. The initial progress on moderately sized viral genomes has spurred whole-genome synthesis of more complex organisms. In 2008 and 2010, the Craig Venter Institute reported the chemical synthesis of genome replicas from Mycoplasma genitalium (583 kb) and Mycoplasma mycoides (1.1 Mb) [140, 141]. These efforts expanded the chemical synthesis scale to megabases and improved in vitro DNA assembly strategies and genome transplantation methods. However, their work also highlighted the challenges of whole-genome synthesis, as a single missense mutation within the dnaA gene initially prevented boot-up. To gain insights into a minimal gene set for cellular life, the teams of Craig Venter built a 473-gene reduced version of the M. mycoides genome [145].

Alongside these accomplishments, the concept of whole-genome synthesis and genome minimization has been expanded towards the rebuilding of all 16 chromosomes of Saccharomyces cerevisiae, driven by an international consortium comprising 21 institutions. In 2014, the consortium reported the synthesis of the artificial yeast chromosome synIII (273 kb) [176]. Subsequently, five additional chromosomes [177–181] were generated, and as of 2018, roughly 40% of the entire yeast genome has been covered. In the redesigned chromosomes, repetitive sequences (tRNA genes, introns and transposons) were removed to increase targeting fidelity during step-wise homologous replacement, and loxP sites were seeded to permit iterative genome reduction upon completion of the yeast chromosomes. At the beginning of the yeast 2.0 synthesis project, CRISPR had not yet entered the stage, but today it offers an alternative approach for progressive genome reduction.

The redundancy of the genetic code, which defines the same amino acid by multiple synonymous codons, offers the possibility to erase and reassign codons throughout an entire genome. Such rewriting efforts are used to engineer organisms with altered genetic codes and to free up codons for the incorporation of artificial amino acids, which do not occur within natural organisms. To date, genome-wide rewriting efforts have been primarily reported for viral genomes [182–184], and a few focused on the rewriting of microbial genomes of Escherichia coli, Salmonella and S. cerevisiae. Using oligo-mediated recombineering [185], all 321 instances of the TAG stop codon in E. coli were altered to TAA, demonstrating the dispensability of a stop codon within the genetic code [186]. In an extension of this approach, rewriting of 13 sense codons across a set of ribosomal genes [187] and genome-wide rewriting of 123 instances of the rare arginine codons AGA and AGG [188] were accomplished in E. coli. These studies unearthed unexpected recalcitrant synonymous rewriting events that occurred primarily in the vicinity of the 5'- and 3'-termini of protein-coding sequences [188, 189]. Recently, to investigate the impact of more complex rewriting schemes, de novo DNA synthesis methods have been used for the rewriting of gene cassettes in conjunction with genomic replacement strategies [185, 190]. Ongoing de novo synthesis towards a 57-codon E. coli genome was reported [191], with the complete genome synthesis underway.

Despite this progress, the underlying rewriting design principles have remained ill-defined and debugging has remained challenging [187, 189]. It has been speculated that the presence of embedded transcriptional and translational control signals at the termini of coding sequences as well as imprecise genome annotations are the underlying cause. We hypothesized that massive synonymous rewriting in conjunction with a systematic investigation of error causes will shed light onto the general sequence design principles of how biological functions are programmed into genomes.

However, while some progress has been made to study recoding schemes using individual genes and gene clusters [191], the field currently lacks a broadly applicable high-throughput error diagnosis approach to probe the rewriting of entire genomes. Here, we report the chemical synthesis of Caulobacter ethensis-2.0 (C. eth-2.0), a minimized bacterial genome composed of the most fundamental functions of a bacterial cell. We present a broadly applicable design-build-test approach to program the most fundamental functions of a cell into a customized genome sequence. By rebuilding the essential genome of Caulobacter crescentus (Caulobacter hereafter) through the process of chemical synthesis rewriting, we studied the genetic information content at the level of its essential genes.

Results

Essential Part List to Build C. eth-1.0

We conceived a bacterial genome design encoding the entire set of essential DNA sequences from the freshwater bacterium Caulobacter (Fig. 4.1A). Caulobacter is recognized as an exquisite cell-cycle model organism [192–195] for which multi-dimensional omics [196], transcriptome- [197] and ribosome-profiling measurements have been integrated into a well annotated genome model [198, 199]. We computationally generated the entire list of essential DNA parts for building a bacterial genome from a previously published high-resolution transposon sequencing data set [98] that identified with base-pair resolution the precise coordinates of essential genes including endogenous promoter sequences. DNA parts were extracted from the native Caulobacter NA1000 genome sequence (NCBI Accession: NC_011916.1) according to predefined design rules and concatenated into a digital genome design preserving gene organization and orientation (Fig. 4.1A, Supplementary Methods, [146]). The resulting 785,701 bp genome design termed Caulobacter ethensis-1.0 (C. eth-1.0) encodes the most fundamental functions of a bacterial cell. Cumulatively, C. eth-1.0 consists of 1,761 DNA parts including 676 protein-coding, 54 non-coding and 1,015 intergenic sequences. To select for faithful assembly and permit stable maintenance in S. cerevisiae, auxotrophic marker genes (TRP1, HIS3, MET14, LEU2, ADE2) and a set of 10 autonomously replicating sequences (ARS) were seeded across the genome design (Table 4.1, Supplementary Methods). Furthermore, the pMR10Y [146] shuttle-vector sequence, permitting stringent low-copy replication in S. cerevisiae, E. coli and Caulobacter, was inserted at the native location of the Caulobacter origin of replication.

[Figure 4.1, panel A: circular maps of the native C. crescentus genome (scale 0 to 4.0) and the C. eth-2.0 genome (scale 0 to 0.6), labeled "676 rewritten essential genes". Panel B: an essential C. eth-1.0 DNA part with Tn hits, begin and end coordinates, ORFs, TSS and RBS (scale bar 500 bp), converted by DNA sequence rewriting into a rewritten essential C. eth-2.0 gene.]

Figure 4.1: Part design, compilation and chemical synthesis rewriting of the C. eth-1.0 genome. (A), Schematic representation of the digital design process. 1,745 DNA parts were extracted from the native Caulobacter NA1000 genome (grey) and reorganized into a rewritten genome design (blue) comprising the entire list of essential genes required to run the basic operating system of a bacterial cell. Lines (in blue) connect positions of DNA parts between native and rewritten genomes. (B), Workflow of the part identification and chemical synthesis rewriting process. Transposon sequencing was used to identify the entire set of essential DNA parts of Caulobacter at a resolution of a few base pairs. Absence of transposon insertions (Tn hits plotted as grey lines) pinpoints non-disruptable DNA regions within the native Caulobacter genome. Such essential DNA parts may encode putative alternative open reading frames, transcription start sites (TSS) or ribosome binding sites (RBS) that are not required for functionality of the essential DNA part itself. Computational sequence rewriting (Materials and Methods) was used to erase putative sequence features that have not been assigned to a specific biological function. The resulting rewritten DNA parts are fully defined and only encode their desired function.
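The idea that essential DNA parts show up as stretches without transposon insertions can be illustrated with a simple gap search over sorted insertion coordinates. This is a toy sketch only: it treats the genome as linear, uses a fixed minimum gap length, and does not reproduce the published analysis of the transposon sequencing data set [98]. The function name and the example coordinates are hypothetical.

```python
def insertion_free_gaps(insertion_positions, genome_length, min_gap):
    """Return (start, end) intervals (1-based, inclusive) without insertions.

    Only gaps of at least `min_gap` bp are reported. The genome is treated
    as linear, so a gap spanning the origin is reported as two intervals.
    """
    positions = sorted(set(insertion_positions))
    if positions and (positions[0] < 1 or positions[-1] > genome_length):
        raise ValueError("insertion position outside the genome")
    gaps = []
    previous = 0  # virtual insertion just before the first base
    for position in positions + [genome_length + 1]:
        start, end = previous + 1, position - 1
        if end - start + 1 >= min_gap:
            gaps.append((start, end))
        previous = position
    return gaps


if __name__ == "__main__":
    # Hypothetical insertion coordinates on a 30 bp toy genome.
    print(insertion_free_gaps([5, 6, 20, 22], genome_length=30, min_gap=5))
```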

Sequence Rewriting of C. eth-1.0 to Enable de novo Genome Synthesis

We were unable to obtain 3-4 kb DNA building blocks of C. eth-1.0 from commercial DNA suppliers due to a multitude of synthesis constraints. Synthesis constraints are a common problem of natural genome sequences, which have evolved to maintain biological information rather than to facilitate chemical synthesis. Recent bioinformatics work [146] showed that more than three quarters of all deposited bacterial genome sequences are not amenable to low-cost synthesis. We hypothesized that computational synonymous rewriting into an easy-to-synthesize sequence would facilitate chemical synthesis of the 785 kb genome while maintaining the encoded biological functions (Fig. 4.6A). We used our previously reported computational DNA design algorithms [146, 151] and generated a synthesis-optimized genome design termed C. eth-2.0. Cumulatively, we introduced 10,172 base substitutions and removed 5,668 synthesis constraints (Table 4.2). These are composed of 1,233 repeats, 93 homopolymeric stretches and 4,342 regions of high GC-content (Table 4.2), known to hinder chemical DNA synthesis. Moreover, we erased an additional 1,045 endonuclease restriction sites to facilitate standardized assembly of the DNA building blocks into the 785 kb chromosome of C. eth-2.0.
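One of the synthesis constraints named here, regions of high GC-content, is defined in Table 4.2 as a GC fraction above 0.8 within a 100 bp window. A sliding-window scan for such regions can be sketched as follows. The function name is hypothetical, and the sketch only detects windows; it is not the design algorithm of [146, 151].

```python
def high_gc_windows(sequence, window=100, threshold=0.8):
    """Return 0-based start positions of windows whose GC fraction exceeds `threshold`."""
    sequence = sequence.upper()
    if window < 1:
        raise ValueError("window must be at least 1 bp")
    if len(sequence) < window:
        return []
    # GC count of the first window, then updated incrementally while sliding.
    gc_count = sum(1 for base in sequence[:window] if base in "GC")
    hits = []
    for start in range(len(sequence) - window + 1):
        if start > 0:
            gc_count -= sequence[start - 1] in "GC"
            gc_count += sequence[start + window - 1] in "GC"
        if gc_count / window > threshold:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy sequence: a 10 bp G-run flanked by A-runs, scanned with a 10 bp window.
    toy = "A" * 10 + "G" * 10 + "A" * 10
    print(high_gc_windows(toy, window=10))
```

Overlapping hits can be merged into regions afterwards; how regions were counted for Table 4.2 is not specified in the text.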

Table 4.1: Part list used to build the C. eth-1.0 genome design

| DNA part category | Quantity | Size [bp] | Fraction |
|---|---|---|---|
| Protein-coding sequences | 676 | 660,789 | 83.9% |
| - Essential | 462 | 471,072 | 59.8% |
| - Semi-essential | 113 | 114,270 | 14.5% |
| - Redundant | 15 | 14,970 | 1.9% |
| - Non-essential | 86 | 60,477 | 7.7% |
| Non-coding sequences | 54 | 9,726 | 1.2% |
| - tRNA | 44 | 3,455 | 0.4% |
| - rRNA | 3 | 4,387 | 0.6% |
| - ncRNA | 7 | 1,884 | 0.2% |
| Intergenic sequences | 1,015 | 96,043 | 12.2% |
| Genome replication & assembly | 16 | 19,143 | 2.5% |
| - Click-markers (a) | 5 | 6,352 | 0.8% |
| - ARS | 10 | 2,121 | 0.3% |
| - pMR10Y (b) | 1 | 10,670 | 1.4% |
| Total number of DNA parts | 1,761 | 785,701 | |

(a) Auxotrophic selection markers that were used to direct the assembly and maintenance of the genome design in yeast.
(b) The pMR10Y shuttle vector contains a broad host-range RK2-based low copy (Genbank AJ606312.1), a kanamycin selection marker, oriT function for conjugational transfer from E. coli to Caulobacter as well as URA3 marker, ARS and CEN elements for selection and replication in yeast.
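The quantities and sizes in Table 4.1 can be cross-checked by summing the sub-categories. The short sketch below does this with the values copied from the table; the data structure and function names are illustrative only.

```python
# (quantity, size in bp) per sub-category, copied from Table 4.1.
PART_LIST = {
    "Protein-coding sequences": {
        "Essential": (462, 471_072),
        "Semi-essential": (113, 114_270),
        "Redundant": (15, 14_970),
        "Non-essential": (86, 60_477),
    },
    "Non-coding sequences": {
        "tRNA": (44, 3_455),
        "rRNA": (3, 4_387),
        "ncRNA": (7, 1_884),
    },
    "Intergenic sequences": {
        "Intergenic": (1_015, 96_043),
    },
    "Genome replication & assembly": {
        "Click-markers": (5, 6_352),
        "ARS": (10, 2_121),
        "pMR10Y": (1, 10_670),
    },
}


def category_totals(category):
    """Sum quantity and size over the sub-categories of one category."""
    entries = PART_LIST[category].values()
    return sum(q for q, _ in entries), sum(s for _, s in entries)


def design_totals():
    """Sum quantity and size over the whole part list."""
    totals = [category_totals(name) for name in PART_LIST]
    return sum(q for q, _ in totals), sum(s for _, s in totals)


if __name__ == "__main__":
    for name in PART_LIST:
        quantity, size = category_totals(name)
        print(f"{name}: {quantity} parts, {size} bp")
    print("total:", design_totals())
```

The sums reproduce the table totals of 1,761 DNA parts and 785,701 bp; the first three categories together give the 1,745 parts extracted from the native genome.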

Sequence Rewriting to Minimize the Number of Genetic Features

We reasoned that chemical synthesis rewriting offers a powerful experimental approach to probe the accuracy of existing genome annotations and study where additional layers of information exist beyond the primary amino acid code. Further, novel fundamental functions encoded within the essential genomes can be identified. In addition to the base substitutions introduced for synthesis streamlining, we employed computational sequence design algorithms [146, 151] to deliberately add 123,141 base substitutions within protein-coding sequences to yield the rewritten C. eth-2.0 design (Table 4.2). In C. eth-2.0, we replaced 56.1% of all codons by synonymous versions. While the amino acid sequence of the 676 annotated genes was maintained, rewriting enabled us to minimize the number of hypothetical genetic elements present within protein-coding se- quences of C. eth-2.0. These elements include alternative open reading frames (ORFs), predicted gene internal transcriptional start sites (TSS) and sequence motifs (predicted or cryptic) that may fine-tune rates (Fig. 4.1B, Materials and Methods). Over- all, we removed 87.4% of all putative ORFs (2,822 out of 3,229, Fig. 4.7C), 95.3% of all internal TSS (1,648 out of 1,730) and 76.7% of all predicted ribosome stalling motifs

47 Chapter 4

Table 4.2: Sequence rewriting of C. eth-1.0 into C. eth-2.0 leads to massive reduction of genetic features

Type                                C. eth-1.0    C. eth-2.0    Fraction
Sequence rewriting
  Base substitutions                      none       133,313       17.0%
  Rewritten codons (a)                    none       123,562       56.1%
Codons
  TTG                                    1,154             0        100%
  TTA                                       46             0        100%
  TAG                                      173            10       94.2%
Alternative genetic features
  ORFs (b)                               3,229           407       87.4%
  TSS (c)                                1,730            82       95.3%
  RBS (d)                                1,331           310       76.7%
  Remaining genetic features (e)         6,290           799
DNA synthesis constraints
  High GC regions (f)                    4,342             0        100%
  Direct repeats ≥ 8 bp                    880           113       87.2%
  Hairpins ≥ 8 bp                          606           140       76.9%
  Homopolymers                             139            46       66.9%
  Restriction sites (g)                  1,047             2       99.8%
  Synthesis constraints (h)              7,014           301

(a) Number of synonymous codon substitutions introduced upon sequence rewriting.
(b) Number of alternative open reading frames (ORFs) residing within the 676 CDS of C. eth-1.0 and C. eth-2.0, respectively.
(c) Number of transcriptional start sites (TSS) internal to coding sequences (CDS).
(d) Number of ribosome binding sites (RBS) internal to CDS.
(e) Number of remaining genetic features within CDS of C. eth-1.0 and C. eth-2.0, respectively.
(f) Regions of high GC-content > 0.8 within a 100 bp window.
(g) Total number of type IIS restriction sites that were removed (AarI, BsaI, BspQI, PacI, PmeI, I-CeuI, I-SceI). Note: Two unique PmeI and PacI sites remained within the pMR10Y backbone to facilitate linearization of the final assembled chromosome for subsequent analysis by pulsed-field gel electrophoresis.
(h) Number of DNA synthesis constraints of C. eth-1.0 and C. eth-2.0, respectively.

(1,021 out of 1,331, Table 4.2). Testing whether rewritten genes remain functional will identify genes in which additional information beyond the amino acid code is necessary for proper functioning. Achieving functional C. eth-2.0 genes, on the other hand, will provide fully defined artificial genes comprised of a minimized number of genetic elements. The precise knowledge of which genes remain functional and the subsequent repair of non-functional genes will ultimately lead to a fully defined artificial cell.
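The reduction percentages quoted above follow directly from the counts in Table 4.2. The short sketch below recomputes them; the dictionary layout and the function name `reduction` are illustrative choices and not part of the original analysis pipeline.

```python
# Counts copied from Table 4.2: (C. eth-1.0, C. eth-2.0).
features = {
    "ORFs": (3229, 407),
    "TSS": (1730, 82),
    "RBS": (1331, 310),
}


def reduction(before, after):
    """Return (number of features removed, percentage removed)."""
    removed = before - after
    return removed, 100.0 * removed / before


for name, (before, after) in features.items():
    removed, pct = reduction(before, after)
    print(f"{name}: {removed} of {before} removed ({pct:.1f}%)")

# Column totals, reported in Table 4.2 as "Remaining genetic features".
total_before = sum(before for before, _ in features.values())
total_after = sum(after for _, after in features.values())
print(f"Genetic features within CDS: {total_before} -> {total_after}")
```

Running the sketch reproduces the removed counts (2,822, 1,648 and 1,021) and the column totals of 6,290 and 799 listed in the table.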


[Figure 4.2: graphic not reproduced, panels A to E. Panel D band sizes: 945, 825, 771, 750, 680 and 565 kb. Panel E axes: sequencing coverage [log10] versus genome coordinates [kb] at segment level in E. coli, mega-segment level and for the complete C. eth-2.0 assembly.]

Figure 4.2: Assembly of C. eth-2.0 in S. cerevisiae. (A) Schematic representation of the circular 785,701 bp C. eth-2.0 chromosome with 6 auxotrophic selection markers (red), 11 autonomously replicating sequences (ARS, black) and the restriction sites for PmeI and PacI (blue). 236 DNA blocks (green boxes) were assembled into 37 genome segments (blue boxes), 16 mega-segments (orange boxes) and further assembled into the complete C. eth-2.0 genome (outermost grey track). (B) The complete C. eth-2.0 chromosome was assembled in a single reaction from 16 mega-segments by yeast spheroplast transformation and subsequent growth selection for auxotrophic TRP1 and LEU2 markers. (C) Growth selection on medium lacking Ura, Trp, His, Met, Leu and Ade identified yeast clone 2 (C. eth-2.0) as positive for all auxotrophic markers, while the parental strain (YJV04) fails to grow. (D) Size validation of the 785 kb C. eth-2.0 chromosome by pulsed-field gel electrophoresis. Digestion with PmeI and PacI releases a 771 kb portion of the C. eth-2.0 chromosome (arrow) from the shuttle vector pMR10Y. Undigested (Marker) and PmeI and PacI digested yeast chromosomes (YJV04 digest) serve as controls. (E) DNA sequencing coverage at segment level (upper panel), mega-segment level (centre panel) and the complete chromosome assembly (bottom panel) is shown.

Chemical Synthesis of C. eth-2.0

We computationally devised a four-tier DNA assembly strategy starting from 3-4 kb assembly blocks to build the complete C. eth-2.0 chromosome in yeast [151] (Fig. 4.2A). Demonstrating the ease of genome-scale synthesis upon sequence rewriting, 235 out of 236 blocks were successfully manufactured (Supplementary Materials) and only a single DNA block required custom synthesis. We progressively assembled these initial 236 DNA blocks into 37 chromosome segments (19-22 kb in size), and further into 16 mega-segments (38-65 kb in size) (Fig. 4.2A, Supplementary Materials) using yeast transformation. To select for the complete chromosome assembly, we applied a click-marker strategy by introducing five auxotrophic yeast genes (TRP1, HIS3, MET14, LEU2 and ADE2) split between adjacent mega-segments. Upon correct chromosome assembly in an engineered yeast strain lacking all auxotrophic marker genes (YJV04),


click-markers will form functional genes and reconstitute prototrophy (Fig. 4.2B). Initial attempts to assemble the C. eth-2.0 chromosome from 16 mega-segments were not successful. Sequencing of yeast clones with partial C. eth-2.0 assemblies identified two defective ARS elements (ARS416 and ARS1213), which prevented replication of the full-length chromosome. We corrected these design errors and added five additional ARS sequences to promote efficient replication of the GC-rich C. eth-2.0 chromosome in yeast. One-step transformation of the 16 corrected mega-segments into yeast spheroplasts yielded two clones, one of which restored prototrophy for all six auxotrophic click-markers, indicating complete assembly of C. eth-2.0 (Fig. 4.2C). We subsequently confirmed the presence of C. eth-2.0 as a single chromosome by pulsed-field gel electrophoresis (Fig. 4.2D), diagnostic PCR (Fig. 4.8) and sequencing (Fig. 4.2E).

C. eth-2.0 has a high GC-content exceeding 57%, while previous chemically synthesized chromosomes [140,145,178] exhibit low GC-contents closely matching the native yeast genome. So far, attempts to clone high GC sequences in yeast have proven difficult [155]. To assess whether C. eth-2.0 is stably maintained in yeast, we performed whole genome sequencing upon prolonged cultivation. After propagation for over 60 generations we found no occurrences of adaptive mutations or chromosomal rearrangements within C. eth-2.0, indicating stable maintenance in YJV04 (Fig. 4.9A). In agreement with this observation, electron micrographs showed normal yeast cell morphologies for parental cells and YJV04 bearing the C. eth-2.0 chromosome (Fig. 4.9B).

We sequence-verified C. eth-2.0 at each assembly level to assess the performance of the genome synthesis process (Fig. 4.2E). Across the 785 kb genome design, we detected a total of 21 non-synonymous mutations (Table 4.6). Of these, 17 originated from non-sequence-perfect DNA blocks that were provided by one of the two commercial suppliers. Only four additional mis-sense mutations within the genes argS (arginyl-tRNA synthetase), fabI (acyl-carrier protein) and the ribosomal genes S7P and L12P were introduced during segment and mega-segment assembly in yeast and E. coli, respectively. No further mutations occurred in the final assembly of the C. eth-2.0 chromosome, indicating a high sequence fidelity in the genome build process.
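As a rough consistency check on the assembly hierarchy described above, the mean part size at each tier can be estimated by dividing the genome length by the number of parts. This is only a sketch: it ignores the overlaps between adjacent parts that are required for homologous recombination, so true part sizes are somewhat larger, and the variable and function names are illustrative.

```python
GENOME_BP = 785_701  # length of the C. eth-2.0 design

# Number of parts at each assembly tier, as stated in the text.
levels = {"blocks": 236, "segments": 37, "mega-segments": 16}


def mean_size_kb(genome_bp, n_parts):
    """Mean part size in kb, ignoring overlaps between adjacent parts."""
    return genome_bp / n_parts / 1000


for name, n_parts in levels.items():
    print(f"{name}: {n_parts} parts, mean {mean_size_kb(GENOME_BP, n_parts):.1f} kb")

# Density of non-synonymous mutations detected across the final design.
NONSYNONYMOUS_MUTATIONS = 21
kb_per_mutation = (GENOME_BP / 1000) / NONSYNONYMOUS_MUTATIONS
print(f"about one non-synonymous mutation per {kb_per_mutation:.1f} kb")
```

The estimated means (about 3.3 kb, 21.2 kb and 49.1 kb) fall within the size ranges quoted for blocks, segments and mega-segments.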

Mapping of Toxic Genes

It was previously reported in clone-based genome sequencing studies that natural microbial genomes contain genes encoding for toxic and dosage-sensitive expression products [200, 201]. We speculated that toxic genes residing on C. eth-2.0 would prevent chromosomal maintenance in Caulobacter. Therefore, we tested the design in form of the 37 individual C. eth-2.0 chromosome segments for the presence of toxic


[Figure 4.3: graphic not reproduced. Panel A: merodiploid Caulobacter strain (native chromosome plus C. eth-2.0 chromosome segments 1-37) subjected to TnSeq, with fault diagnosis by presence or absence of Tn insertions. Panel B: map of the C. eth-2.0 chromosome (coordinates 20K to 680K) marking functional, non-functional and non-essential genes.]

Figure 4.3: Fault diagnosis and error isolation across the C. eth-2.0 chromosome. (A) Functionality assessment of the C. eth-2.0 chromosome. Merodiploid strains bearing episomal C. eth-2.0 chromosome segments (orange and blue circle) are subjected to transposon sequencing (TnSeq). Presence of transposon insertions (blue marks) in a previously essential chromosomal gene (grey arrows) indicates functionality of the homologous C. eth-2.0 gene (blue arrow), while absence of insertions indicates a non-functional C. eth-2.0 gene (orange arrow). (B) Functionality map of the C. eth-2.0 chromosome with functional genes (blue arrows), non-functional genes (orange arrows) and non-essential control genes (grey arrows).

genes. Quantification of conjugational transfer from E. coli to Caulobacter in conjunction with sequencing demonstrated that toxic genes were absent in 25 out of 37 chromosome segments (Fig. 4.10, Table 4.7). However, we observed a drastic reduction in transfer efficiency for 12 segments, suggesting the presence of toxic genes that collectively cover 18.9 ± 3.6 kb in sequence (Table 4.7). We carried out genetic suppressor analysis and identified evolved strains that tolerated formerly toxic genome segments (Materials and Methods). Whole genome sequencing of suppressor strains led to the identification of 14 toxic genetic loci (Table 4.8) that bear mutations alleviating toxicity. An additional 3 chromosome segments acquired small deletions upon selection for fast growth (Table 4.9). Among the toxic genes, we found three chromosome replication genes (dnaQ, dnaB, rarA), six genes involved in LPS and fatty acid biosynthesis (fabB, lptD, lpxD, accC, murU, waaF), two genes encoding interacting RNA polymerase components (rpoC, topA), the S10-spc-alpha ribosomal protein gene cluster (CETH_01304-01323) and the sodium-proton antiporter nhaA. Several of the identified toxic genes encode interacting components of protein complexes, suggesting an imbalance in subunit dosage as a likely cause for toxicity. Overall, the observed fraction of 1.9% (14 out of 730) toxic genes found in C. eth-2.0 agrees well with the previously reported average of 2.15 ± 0.8% of toxic genes identified among seven E. coli strains [200]. We concluded that computational sequence rewriting as part of the chemical synthesis rewriting process does not induce additional gene toxicity. In agreement with this conclusion, 6 of the 14 identified toxic rewritten genes have been previously


identified as “unclonable genes” in E. coli [201]. Furthermore, mis-balanced expression of rpoC, rarA, dnaB and accC has previously been reported to elicit toxicity due to imbalance in protein complex subunit stoichiometry [202–205]. Given the precedent of toxicity for wild-type genes, we argue that the toxicity of these genes when ectopically expressed is likely a general property and is not attributable to the rewriting process, which maintains identical proteins.
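The comparison with the published E. coli average reduces to a one-line calculation. The sketch below recomputes the toxic-gene fraction and checks whether it lies within one reported deviation of the reference value; treating the quoted ± 0.8% as a symmetric tolerance is an assumption made here for illustration only.

```python
TOXIC_LOCI = 14
TOTAL_GENES = 730  # denominator used in the text

fraction_pct = 100 * TOXIC_LOCI / TOTAL_GENES

# Reference value for seven E. coli strains, as quoted from [200].
REF_MEAN_PCT = 2.15
REF_DEVIATION_PCT = 0.8

within_reference = abs(fraction_pct - REF_MEAN_PCT) <= REF_DEVIATION_PCT
print(f"toxic fraction: {fraction_pct:.2f}% (within reference range: {within_reference})")
```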

Genome-wide Functionality Assessment of C. eth-2.0

While the C. eth-2.0 genome was maintained in heterologous hosts throughout the build process, we next investigated whether rewritten genes resume their anticipated function upon introduction into Caulobacter. Functionality assessment and error diagnosis of large-scale DNA constructs is a major challenge for bio-engineering of synthetic genomes. To permit parallel functionality assessment of rewritten C. eth-2.0 genes, we developed a transposon-based testing approach. This approach assesses the functionality of rewritten genes in merodiploid test strains, which harbour episomal copies of C. eth-2.0 chromosome segments in addition to the native chromosome. The testing approach measures the functional equivalence between native and rewritten C. eth-2.0 genes through genetic complementation. In the presence of functional C. eth-2.0 genes, previously essential native genes become dispensable and acquire disruptive transposon insertions (Fig. 4.3A). In contrast, native genes remain essential and do not tolerate disruptive transposon insertions in the presence of non-functional rewritten genes (Fig. 4.3A). In the case of functional C. eth-2.0 genes, such an analysis will prove that rewritten gene variants are functionally equivalent to essential native Caulobacter genes. Failure in complementation will identify specific genes where sequence rewriting in C. eth-2.0 erases additional genetic control elements that are important for proper gene functioning.

We asked whether rewritten genes are functionally equivalent to native genes despite the massive level of sequence modification introduced. We subjected 37 merodiploid test strains bearing C. eth-2.0 chromosome segments as episomal copies alongside the native chromosome to transposon mutagenesis to test gene functionality. We compared transposon insertion patterns obtained between complementing and non-complementing conditions and assessed the functionality of C. eth-2.0 (Materials and Methods, Data S1). The substitutions introduced upon rewriting and sequence optimization of the C. eth-2.0 genome allowed us to unambiguously assign transposon insertions to the native Caulobacter genome and C. eth-2.0 chromosome segments. Cumulatively, we found 81.5% (432 out of 530) of all essential and semi-essential C. eth-2.0 genes to be functional (Fig. 4.3B, Table 4.3). Functional rewritten C. eth-2.0 genes encompass a drastic reduction in the number of genetic features (annotated, cryptic or predicted)

compared to the wild type Caulobacter genome annotation. Maintenance of biological functionality within rewritten genes suggests dispensability of these genetic features and hence will lead towards refinement of the current genome annotation. During the design process of C. eth-2.0, we reduced the number of genetic features within CDS from 6,290 to 799 (Table 4.2). The high functionality level of 81.5% observed within the rewritten C. eth-2.0 suggests that the large majority of the 6,290 previously annotated and predicted genetic features do not serve an essential function. Among the genetic features found to be dispensable were the three formerly assigned anti-sense transcripts (sRNAs) CCNA_R0109, R0151 and R0194 internal to rpoC, sufB and atpD, which acquired 16, 17 and 62 base substitutions during the rewriting process (Fig. 4.4A). Dispensability of the formerly assigned anti-sense transcripts suggests that the majority of chromosomally encoded sRNAs identified by transcriptome analysis [206,207] do not elicit an essential function.
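The complementation logic used in this section can be summarised as a simple decision rule on transposon insertion counts in the native copy of each gene. The sketch below is a schematic illustration only: the insertion cut-off, the data structure and the example genes (`geneA`, `geneB`, `geneC`) with their counts are hypothetical, and the actual analysis is the one described in Materials and Methods.

```python
from dataclasses import dataclass


@dataclass
class GeneInsertions:
    name: str
    control: int      # insertions in the native gene, wild-type control strain
    merodiploid: int  # insertions in the native gene, strain carrying the C. eth-2.0 segment


MIN_INSERTIONS = 5  # hypothetical cut-off for calling a gene "disrupted"


def classify(gene, min_insertions=MIN_INSERTIONS):
    """Classify the rewritten copy of a gene from insertions in its native counterpart."""
    if gene.control >= min_insertions:
        # Native gene tolerates insertions even without complementation.
        return "non-essential control"
    if gene.merodiploid >= min_insertions:
        # Native gene became dispensable: the rewritten copy complements it.
        return "functional"
    # Native gene stays essential: the rewritten copy does not complement it.
    return "non-functional"


# Made-up example data, for illustration only.
examples = [
    GeneInsertions("geneA", control=0, merodiploid=42),
    GeneInsertions("geneB", control=1, merodiploid=0),
    GeneInsertions("geneC", control=37, merodiploid=40),
]
for gene in examples:
    print(gene.name, classify(gene))
```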

Sequence Design Flexibility Beyond Protein-coding Sequences

The large majority of the 133,313 base substitutions were introduced within protein-coding sequences of C. eth-2.0. However, a significant number of non-synonymous substitutions were inserted within intergenic and non-coding regions such as tRNA genes to facilitate the de novo DNA synthesis process. We found that base substitutions within non-coding sequences were frequently tolerated and did not impair gene functionality. For example, the two tRNA genes tRNATrp and tRNATyr remained functional despite base changes that were introduced within the anti-codon arm to erase DNA synthesis constraints present within the wild type sequences (Fig. 4.4B). In the case of tRNATrp, we removed a Type IIS restriction site, and in tRNATyr, a homopolymeric sequence pattern hindering DNA synthesis was removed. Both rewritten tRNA genes retained their function as revealed by our transposon-based complementation measurements (Fig. 4.4B). These findings suggest that, even apart from protein-coding sequences, a high level of sequence design flexibility exists to imprint biological functions into DNA.

Level of Gene Functionality Among Cellular Processes

We reasoned that the analysis of gene functionality among different cellular processes would permit identification of gene classes harbouring high levels of transcriptional and translational control elements within CDS. Assignment of gene functionality among cellular processes revealed that metabolic genes were enriched, with 90.3% functionality (p value of 2.60E-4, Table 4.3). This supports the idea that metabolic genes contain a low level of transcriptional and translational control elements embedded within their CDS. This finding correlates with the observation that regulation of bacterial metabolism


[Figure 4.4: graphic not reproduced. Panel A: dispensable anti-sense RNAs within rpoC, sufB and atpD. Panel B: secondary structures of C. eth-2.0 tRNATrp (Type IIS site removed) and tRNATyr (homopolymer removed). Panel C: the mreBCD-rodA, groEL-ES and L34P-yhjA operons.]

Figure 4.4: Sequence design flexibility within rewritten C. eth-2.0 genes. (A) Dispensability of anti-sense RNAs. Schematic depicting dispensable anti-sense transcripts embedded within coding sequences of genes rpoC, sufB and atpD (blue arrows). Upon synonymous rewriting, anti-sense transcripts CCNA_R0109, CCNA_R0151 and CCNA_R0194 (dotted arrows) internal to rpoC, sufB and atpD acquired 16, 17 and 62 base substitutions. Essential chromosomal genes rpoC, sufB and atpD carry disruptive transposon insertions (blue marks) in presence of complementing C. eth-2.0 chromosome segments as compared to the transposon insertion pattern of the wild-type control strain (green marks), indicating that anti-sense transcripts are non-essential. (B) Schematic depiction of the secondary structure of the rewritten tRNATrp and tRNATyr. Type IIS restriction sites (red letters, left panel) and homopolymeric sequences (red letters, right panel) hindering chemical synthesis of tRNA genes were erased by introducing base substitutions (blue) in the anti-codon arms while maintaining the anticodons (grey box). Transposon testing reveals functionality of C. eth-2.0 tRNA genes. (C) Functionality testing of C. eth-2.0 operons. Upon complementation with C. eth-2.0 operons, chromosomal genes tolerate disruptive transposon insertions (blue marks) throughout the native operon, leading to simultaneous inactivation of multiple native genes.

mainly occurs at the enzymatic level [208, 209]. On the other hand, hypothetical and ribosomal genes were underrepresented with a fraction of 64.2% and 60.6% functional C. eth-2.0 genes respectively (p values of 5.45E-4 and 2.01E-3, Table 4.3). Based on these findings, we estimated that one third of the hypothetical essential genes encode for genetic features other than the annotated protein-coding sequence. Likewise, a fraction close to 40% of all ribosomal genes likely contain additional regulatory elements embedded within their protein-coding sequence. From a gene regulatory perspective, this is not surprising as protein synthesis is the major consumer of cellular energy in bacteria [210]. Further, the biogenesis of functional ribosome complexes depends on the concerted transcriptional control of many ribosomal operon genes [211]. In sum, our analysis suggests that a low level of additional essential regulatory elements is

embedded within the protein-coding sequences of metabolic genes. However, a high number of regulatory elements are embedded within coexpressed ribosomal genes and other multi-gene core modules of the bacterial cell.

Table 4.3: Functionality of C. eth-2.0 genes according to cellular processes

Category                      % of functional C. eth-2.0 genes (a)    p value (b)
Translation (c)               73.6% (81/110)                          5.49E-03
  Ribosomal proteins (c)      60.6% (20/33)                           2.01E-03
  t-RNA synthetases           81.8% (18/22)                           6.14E-01
  t-RNAs                      67.8% (19/28)                           4.73E-02
  Translation factors         88.9% (24/27)                           2.22E-01
Transcription                 86.7% (13/15)                           4.51E-01
DNA replication               83.9% (26/31)                           4.69E-01
Cellular processes            87.2% (123/141)                         1.12E-02
  Cell-cycle                  87.5% (28/32)                           2.52E-01
  Cell-envelope               86.9% (73/84)                           8.48E-02
  Protein turnover            88.0% (22/25)                           2.81E-01
Energy production             73.9% (34/46)                           1.03E-01
Metabolism (d)                90.3% (121/134)                         2.60E-04
Hypothetical proteins (c)     64.2% (34/53)                           7.45E-04

Total                         81.5% (432/530)

(a) Fraction of functional C. eth-2.0 genes as assessed by TnSeq. Numbers of functional genes versus total gene numbers per class are shown in brackets.
(b) p value for functionality enrichment and de-enrichment of different gene categories.
(c) Categories of genes that display a significant decrease in functionality.
(d) Categories of genes that display a significant increase in functionality.
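The statistical test behind the p values in Table 4.3 is not specified in this section. As one plausible way to assess enrichment or depletion of functional genes within a category, the sketch below uses a one-sided hypergeometric test implemented with the standard library. The choice of test is an assumption, so the resulting values are not expected to reproduce the table exactly; only the gene counts are taken from Table 4.3.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return comb(successes, k) * comb(population - successes, draws - k) / comb(population, draws)


def tail_p(observed, population, successes, draws, upper=True):
    """One-sided p value: P(X >= observed) if upper, else P(X <= observed)."""
    lowest = max(0, draws - (population - successes))
    highest = min(draws, successes)
    ks = range(observed, highest + 1) if upper else range(lowest, observed + 1)
    return sum(hypergeom_pmf(k, population, successes, draws) for k in ks)


N_TOTAL, N_FUNCTIONAL = 530, 432  # totals from Table 4.3

# Metabolism: 121 of 134 genes functional (test for enrichment).
p_metabolism = tail_p(121, N_TOTAL, N_FUNCTIONAL, 134, upper=True)

# Ribosomal proteins: 20 of 33 genes functional (test for depletion).
p_ribosomal = tail_p(20, N_TOTAL, N_FUNCTIONAL, 33, upper=False)

print(f"metabolism enrichment p = {p_metabolism:.2e}")
print(f"ribosomal depletion p = {p_ribosomal:.2e}")
```

A two-sided Fisher's exact test or a binomial test would be equally reasonable alternatives under the same counts.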

Rewritten C. eth-2.0 Operons Encompass Fully Functional Biological Modules

Although a significant fraction of chemical synthesis rewritten C. eth-2.0 genes are functional on an individual basis, we hypothesized that additive fitness effects might arise when multiple synthetic genes were combined. We thus searched for chromosomal transposon insertions within essential Caulobacter operons leading to simultaneous inactivation of multiple gene products due to truncation of a poly-cistronic mRNA transcript. We observed such transposon insertions within 41 formerly essential Caulobacter operon genes (Data S3, Fig. 4.4C), suggesting that the chemical synthesis rewritten C. eth-2.0 genes may indeed fully encompass functional biological modules complementing the function of their native counterparts. One example includes the mreBCD-rodA operon, which is involved in the coordination of cell wall peptidoglycan biosynthesis machinery. This complex is critical for the generation and maintenance of bacterial cell shape [212]. We found Tn insertions disrupting the poly-cistronic mRNA, suggesting that the function of the chromosomal mreBCD-rodA operon is complemented by the C. eth-2.0 counterparts (Fig. 4.4C). Similar patterns of transposon insertions were obtained for the groEL-ES operon [213] and the membrane protein chaperone operon yidC-yidA [214]. Both tolerated disruptive transposon insertions throughout the native sequence, leading to simultaneous inactivation of multiple genes. These findings support the idea that additive fitness effects are unlikely to arise when multiple synthetic genes are combined into functional modules, which will ultimately simplify the build process of artificial chromosomes by using chemical synthesis rewritten DNA.

Discovery of Novel Essential Genetic Features within CDS

We reasoned that the process of chemical synthesis rewriting offers a powerful experimental approach to map hitherto unknown genetic regulatory elements encoded within the protein-coding sequence and to validate the annotation accuracy of an organism’s genome. Discovery of genes that lost their function upon rewriting suggested the presence of additional essential genetic features, which have evaded previous genome annotation efforts. Error cause classification of the 98 non-functional C. eth-2.0 genes (Materials and Methods) pointed to 52 instances of imprecise annotation of the ancestral Caulobacter genome, including mis-annotated promoter regions and incorrect translational start site predictions. This implies that a significant number of protein-coding genes remain mis-annotated within curated genomes. We found evidence for 27 transcriptional and translational control signals embedded within protein-coding sequences that were erased due to sequence rewriting. This finding suggests that internal transcriptional and translational control elements do not often occur within CDS of the Caulobacter genome. In only 13 instances, we detected non-functional genes due to base substitutions introduced outside of protein-coding sequences to optimize synthesis. Furthermore, six genes acquired deleterious mutations during the build and boot-up process (Data S2). These findings suggest that inaccurate annotation of protein-coding sequences is the main cause for losing functionality upon synonymous rewriting.
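The four error classes listed above account for all 98 non-functional genes. The sketch below tallies them and expresses each as a share of the total; the category labels are paraphrased from the text and the variable names are illustrative.

```python
# Error causes for non-functional C. eth-2.0 genes, counts as given in the text.
error_causes = {
    "imprecise annotation": 52,
    "erased control signals within CDS": 27,
    "substitutions outside CDS": 13,
    "mutations acquired during build and boot-up": 6,
}

total = sum(error_causes.values())
shares = {cause: 100 * count / total for cause, count in error_causes.items()}

for cause, count in error_causes.items():
    print(f"{cause}: {count} genes ({shares[cause]:.1f}%)")
print(f"total: {total} genes")
```

Imprecise annotation accounts for slightly more than half of all cases, consistent with the conclusion that it is the main cause of lost functionality.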

Genetic Control Features within the Cell-division Genes

We next investigated the presence of additional genetic features within the cell-division genes murG, murC, ftsQ and ftsZ, in which genetic complementation failed with the corresponding C. eth-2.0 counterparts (Fig. 4.5A). We hypothesized that computational sequence rewriting likely erased critical control elements needed for proper gene expression. Indeed, we found that rewriting of an overlapping CDS upstream of murC corrupted the associated promoter region. Similarly, sequence rewriting of ddlB erased


[Figure 4.5: graphic not reproduced. Panel A: map of the cell-division gene cluster (scale bar 1 kb). Panel B: genetic control elements (translational coupling, promoter, RBS, attenuator) associated with murG, murC, ftsQ and ftsZ. Panel C: LacZ activity of non-functional and repaired murG, murC, ftsQ and ftsZ constructs.]

Figure 4.5: Fault diagnosis and repair across the C. eth-2.0 chromosome. (A) Fault diagnosis across the C. eth-2.0 cell-division gene cluster. Transposon insertions in the wild-type control (green marks) and upon complementation with C. eth-2.0 cell-division genes (blue marks) are shown. With the exception of the four non-functional genes murG, murC, ftsQ and ftsZ (orange arrows), the large majority of rewritten genes are functional (blue arrows). (B) Chemical synthesis rewriting reveals genetic control elements present within the cell division gene cluster including translational coupling signals (murG), internal ribosome binding sites (RBS, ftsQ), extended promoter regions (murC) and attenuator sequences upstream of ftsZ. (C) Insertion of the wild type sequence elements upstream of non-functional cell division genes restores gene expression as measured by β-galactosidase assays using lacZ reporter gene fusions.

an internal ribosome binding site necessary for translation of the downstream gene ftsQ (Fig. 4.5B). The rewriting of ftsW erased an embedded short transcript (ftsWs) necessary for murG translation [199] (Fig. 4.5B). Finally, we found a short annotated CDS [199] upstream of the non-functional ftsZ gene. However, sequence analysis revealed that the wild-type sequence contains a hairpin secondary structure, which resembles a transcriptional attenuation element. This may control ftsZ expression depending on the metabolic conditions (Fig. 4.5B). While further studies are needed to unravel the exact molecular functions of these genetic control elements within the cell division gene cluster, we found that repair of the sequence upstream of the ftsZ gene restored the ftsZ expression levels (Fig. 4.5C). Similarly, insertion of the wild type sequence elements into the C. eth-2.0 genes murC, ftsQ and murG also restored gene


expression (Fig. 4.5C). This suggests that once missing essential genetic elements are identified, error causes can rapidly be deduced to allow for rational repair of genome designs. Furthermore, identification and error diagnosis of non-complementing genes will provide a formidable opportunity to uncover new DNA design principles that will further improve our capabilities in programming biological functions into synthetic chromosomes.

Discussion

Caulobacter crescentus has emerged as an important model organism for understanding the regulation of the bacterial cell cycle [195,215,216]. A notable feature of Caulobacter is that the regulatory events that control polar differentiation and cell-cycle progression are highly integrated and occur in a temporally restricted order [217]. The advent of genomic technologies has enabled global analyses that have revolutionized our understanding of Caulobacter genetic core networks that control the life cycle [196–199]. In recent years, many components of the regulatory circuit have been identified and simulation of the circuitry has been reported [195,218]. More recent experimental work using transposon sequencing has shown that 12% of the Caulobacter genome is essential for survival under laboratory conditions [98]. The identified set of essential sequences included not only protein-coding sequences, but also regulatory regions and non-coding elements that collectively store the genetic information necessary to run a living cell. Of the individual DNA regions identified as essential, 91 were non-coding regions of unknown function and 49 were genes presumably coding for hypothetical proteins whose function is unknown.

Although classical genetic approaches dissect the functioning of biological systems by analyzing individual native genes, uncovering the function of essential genes has remained very challenging. Herein we show that the rewriting of entire genomes through the process of chemical synthesis provides a powerful and complementary research concept to understand how essential functions are programmed into genomes. Contemporary synthetic genome projects [141, 145, 178] have largely maintained natural genome sequences, implementing only modest design changes to increase the likelihood of functionality.
However, conservative genome design misses a key opportunity of chemical DNA synthesis: the rewriting of DNA to advance our understanding of how fundamental biological functions are encoded within genomes. Indeed, the construction of synthetic autonomous bacteria such as the M. mycoides strain JCVI-syn3.0, comprising 473 genes within a 531 kb genome [145], resulted in a replicative cell. However, this strain also encompasses 149 genes with unknown functions (84 labelled as “generic” and 65 as “unknowns”) [219]. This corresponds to nearly one third of its gene set. While these studies were highly valuable to experimentally determine the core set of genes for an independently replicating cell, they did not probe the genetic information content of its essential genes.

By rebuilding the essential genome of Caulobacter through the process of chemical synthesis rewriting, we assessed the essential genetic information content of a bacterial cell on the level of its protein coding sequences. Within the 785,701 bp genome of C. eth-2.0, we employed sequence rewriting to reduce the number of genetic features present within protein coding sequences from 6,290 to 799. Overall, we introduced 133,313 base substitutions resulting in the synonymous rewriting of 123,562 codons. We speculated that synonymous rewriting of protein-coding sequences maintains the encoded amino acid sequences but likely erases additional genetic information layers. These include alternative reading frames as well as hidden control elements embedded within protein coding sequences of essential genes.
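To make the notion of synonymous rewriting concrete, the toy sketch below replaces every occurrence of a forbidden codon with a synonymous alternative and verifies that the encoded protein is unchanged. It is not the design algorithm of [146, 151]: the replacement rule (first permitted synonymous codon in table order), the function names and the example sequence are all illustrative assumptions. The codon table is the standard genetic code, whose amino acid assignments are shared by the bacterial translation table.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AMINO))  # codon -> one-letter amino acid ('*' = stop)


def translate(seq):
    """Translate a DNA sequence codon by codon, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODE[seq[i:i + 3]] for i in range(0, usable, 3))


def recode(seq, forbidden):
    """Replace forbidden codons with the first permitted synonymous codon."""
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if codon in forbidden:
            # Raises StopIteration if no permitted synonymous codon exists.
            codon = next(c for c in CODONS
                         if CODE[c] == CODE[codon] and c not in forbidden)
        out.append(codon)
    return "".join(out)


def count_substitutions(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


original = "ATGTTGAAATTATAA"  # hypothetical example: M L K L stop
rewritten = recode(original, forbidden={"TTG", "TTA"})

assert translate(rewritten) == translate(original)  # protein is preserved
print(rewritten, count_substitutions(original, rewritten))
```

In this example the two leucine codons TTG and TTA, which Table 4.2 lists as fully eliminated in C. eth-2.0, are both replaced by CTT, giving four base substitutions without changing the protein.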

Rewriting of 56% of all codons resulted in complete rewriting of the essential Caulobacter transcriptome. Despite incorporating such drastic changes at the level of mRNA, our functionality analysis revealed that 432 of the transcribed essential genes of C. eth-2.0, corresponding to 81.5% of all rewritten essential genes, are equal in functionality to their natural counterparts and support viability. This result suggests that in most essential genes neither the primary mRNA sequence, the secondary structure nor the codon context has significant influence on biological functionality. This finding is surprising, given that previous studies on individual genes reported that codon translation in vivo is controlled by many factors, including codon context [220]. Furthermore, our findings suggest that the vast majority of the probed open reading frames encode exclusively for proteins, and that other layers of genetic control do not seem to play a significant role. Among the 134 enzyme-encoding genes that make up the metabolic core network of C. eth-2.0, the level of functional genes is even over 90%, suggesting that rewritten biosynthetic pathways retain their functionality in most cases. A possible explanation for the high proportion of functional metabolic genes might be that regulation of essential metabolic functions occurs by allosteric interactions at the level of enzymes rather than at the level of gene expression.

In addition to the 432 functional rewritten genes, our study precisely mapped 98 genes that lost functionality upon synonymous rewriting, as detected by our transposon-based functionality assessment. Since retaining solely the protein-coding sequences of these genes is not sufficient for their functionality, it is reasonable to conclude that these genes are mis-annotated or contain hitherto unknown essential genetic elements embedded within their CDS. Alternatively, a subset of these genes may encode RNA rather than protein functions. Taken together, our genome-rewriting approach can be used to experimentally validate the annotation fidelity of entire genomes.

Altogether, the identified set of 98 non-functional genes corresponds to less than 20% of the essential genome of C. eth-2.0 and precisely reveals where gaps in our knowledge have persisted despite previous omics-informed genome reannotation efforts. In the future, it will be interesting to unravel why rewriting renders particular genes non-functional. These studies will shed light on hitherto unknown transcriptional and translational control layers embedded within protein-coding sequences that are of fundamental importance for proper gene functioning. Targeted repair of identified non-functional C. eth-2.0 genes, as exemplified within the subset of the four faulty cell division genes murG, murC, ftsQ and ftsZ, will lead to the discovery of novel genetic features, such as the essential attenuator element identified upstream of the ftsZ gene, whose function is currently unknown. We acknowledge that the 98 identified non-functional genes are still poorly understood, yet our findings on C. eth-2.0 serve as an excellent starting point to close current knowledge gaps in essential genome functions towards the rational construction of synthetic organisms with a fully defined genetic blueprint.

On the level of de novo DNA synthesis, we herein demonstrate how chemical synthesis rewriting facilitates the genome synthesis process. To simplify the entire genome build process, we used sequence design algorithms [146, 151] and collectively introduced 10,172 base substitutions to remove 5,668 DNA synthesis constraints, including 1,233 repeats, 93 homopolymeric stretches and 4,342 regions of high GC content. The successful low-cost synthesis and subsequent higher-order assembly of C. eth-2.0 into the complete chromosome exemplifies the utility of our approach to rapidly produce designer genomes. Our results highlight the promise of chemical synthesis rewriting of entire genomes to understand how the most fundamental functions of a cell are programmed into DNA.

On the systems engineering level, our design-build-test approach makes it possible, for the first time, to harness massive design flexibility to produce rewritten genomes that are customized in sequence while maintaining their biological functionality. On the level of genome synthesis, our findings also highlight how chemical synthesis facilitates rewriting of biological information into DNA sequences that can be physically manufactured in a highly reliable manner, thereby reducing costs and increasing the effectiveness of the genome build process. In sum, our results highlight the promise of chemical synthesis rewriting to decode fundamental genome functions and its utility towards the design of improved organisms for industrial purposes and health benefits.

Author Contributions

J.V., A.W. and F.T. performed DNA assembly; J.V. performed genome assembly and verification; L.DM., M.VK., Y.B., D.A. and P.S. performed directed evolution; L.DM., P.S., Y.B., D.A. and C.FT. performed transposon mutagenesis experiments; R.G. assessed gene expression; B.C. and M.C. performed genome design, sequencing and functionality analyses. S.D. contributed to DNA synthesis, segment assembly and sequencing. M.C., J.V. and B.C. wrote the manuscript. M.C. and B.C. jointly conceived the research project and directed the study.

Acknowledgements

We thank R. Schlapbach and L. Poveda from ZFGC for sequencing support, B. Maier and members of ScopeM for electron microscopy support, S. Nath from the JGI for DNA synthesis and sequencing support, F. Rudolf for assistance with yeast marker design, H. Christen for conception of computational algorithms, and Samuel I. Miller, Markus Aebi and Uwe Sauer for critical comments. This work received institutional support from the Swiss Federal Institute of Technology (ETH) Zürich, ETH research grant [ETH-08 16-1] to B.C., the Swiss National Science Foundation [31003A_166476] to B.C. and two Community Science Program (CSP) DNA synthesis awards [JGI CSP-1593 and CSP-2840] to B.C. and M.C. from the U.S. Department of Energy Joint Genome Institute in Walnut Creek, CA, USA. The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

4.4 Supplementary: Materials and Methods

Media and Supplements

Microbial Growth Conditions

Standard culturing conditions were used to grow microbial strains. E. coli strains were cultured in liquid or solid Luria-Bertani (LB) medium at 37 °C, while C. crescentus (Caulobacter hereafter) strains were cultured in liquid or solid peptone-yeast extract (PYE) medium at 30 °C. S. cerevisiae strains were cultured in standard YPD medium supplemented with adenine hemisulfate (80 mg·L⁻¹) or in synthetic defined (SD) medium at 30 °C. Unless otherwise indicated, antibiotics and other supplements were used at the following concentrations: (i) E. coli: 10 µg·mL⁻¹ gentamicin (Gm), 50 µg·mL⁻¹ kanamycin (Km), 100 µg·mL⁻¹ ampicillin (Amp); (ii) Caulobacter: 20 µg·mL⁻¹ Km, 20 µg·mL⁻¹ nalidixic acid (Na), 10 µg·mL⁻¹ Gm; (iii) S. cerevisiae: adenine hemisulfate (80 mg·L⁻¹), D-glucose (20 g·L⁻¹).

Design of C. eth-1.0 Genome and Sequence Rewriting into C. eth-2.0

Compilation of the C. eth-1.0 Genome Design

The comprehensive list of DNA sequences used for the design of C. eth-1.0 comprised essential DNA loci according to a previously reported essential genome data set from Caulobacter NA1000 obtained for growth under rich-medium conditions [98]. Cumulatively, C. eth-1.0 encompasses 575 essential and semi-essential protein-coding sequences, 54 non-coding sequences as well as 1,015 intergenic sequences including gene regulatory features such as promoters and terminators. In addition, 15 redundant genes involved in cellular core metabolism and DNA recombination pathways were also included. Furthermore, 86 non-essential genes were added to the C. eth-1.0 genome design as control genes for subsequent functionality assessment. Essential and semi-essential DNA sequences were compiled into the 785,701 base-pair (bp) C. eth-1.0 genome design [151] with order and orientation maintained according to the native Caulobacter genome (NCBI Accession: NC_011916.1). To select for faithful assembly and permit stable maintenance in S. cerevisiae, auxotrophic marker genes (TRP1, HIS3, MET14, LEU2, ADE2) and a set of 10 ARS sequences (ARS Max2 [163], ARS416, ARS1018, ARS1113, ARS1213, ARS516, ARS HI [163], ARS727, ARS4, ARS1323) were seeded across the genome design. Furthermore, the pMR10Y [146] shuttle-vector sequence permitting replication in S. cerevisiae, E. coli and Caulobacter was inserted between the genes CETH_00003 and CETH_03878. The pMR10Y shuttle vector sequence consists of a broad-host-range RK2 replicon, a kanamycin resistance marker gene, the URA3 gene, CEN and ARS209 sequences and a conjugational transfer origin (oriT) [146].

Synthesis Optimization and Sequence Rewriting into the C. eth-2.0 Genome Design

The C. eth-1.0 design was rewritten to implement DNA synthesis sequence optimization using the previously reported Genome Calligrapher algorithm and sequence design pipeline [146]. Synthesis constraints and disallowed sequences impeding large-scale de novo DNA synthesis were refactored by neutral rewriting of protein-coding sequences (synonymous codon replacement) and by applying the desired base substitutions within intergenic sequences. Cumulatively, sequence refactoring removed a total of 6,713 synthesis and DNA assembly constraints from the C. eth-1.0 genome design: all 4,342 (100%) high-GC regions (GC content larger than 0.8 within a 99 bp window), 87.2% (767 out of 880) of all direct repeats and 76.9% (466 out of 606) of all hairpin structures with a repeat size equal to or larger than 8 bp, 66.9% (93 out of 139) of all homopolymeric sequence stretches (longer than six G, eight C, nine A or T) and all 1,045 instances of endonuclease sites for the AarI, BsaI, BspQI, PacI, PmeI, I-CeuI and I-SceI enzymes. The GC content was lowered from 66.2% in the C. eth-1.0 design to 57.4% in the new C. eth-2.0 design by requiring the GC content within protein-coding sequences not to exceed 0.7 within a 99 bp window and not to exceed 0.85 within a 21 bp window. Similarly, the AT content was set not to exceed 0.7 within a 99 bp window and not to exceed 0.85 within a 21 bp window.
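The windowed composition limits stated above can be illustrated with a short sliding-window check. This is a minimal sketch, not the Genome Calligrapher implementation; the function names `window_fractions` and `violations` are hypothetical, and the thresholds in the usage note are simply the values quoted in the text.

```python
def window_fractions(seq, window, bases="GC"):
    """Yield (start, fraction of `bases`) for every full window along seq."""
    seq = seq.upper()
    if len(seq) < window:
        return
    # Count the bases of interest in the first window.
    count = sum(1 for b in seq[:window] if b in bases)
    yield 0, count / window
    for start in range(1, len(seq) - window + 1):
        # Slide by one base: drop the leftmost base, add the new rightmost one.
        count -= seq[start - 1] in bases
        count += seq[start + window - 1] in bases
        yield start, count / window


def violations(seq, window, limit, bases="GC"):
    """Return start positions of windows whose base fraction exceeds `limit`."""
    return [s for s, f in window_fractions(seq, window, bases) if f > limit]
```

Under these assumptions, a protein-coding sequence `cds` would satisfy the stated limits if `violations(cds, 99, 0.7)`, `violations(cds, 21, 0.85)`, `violations(cds, 99, 0.7, bases="AT")` and `violations(cds, 21, 0.85, bases="AT")` all return empty lists.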

Rewriting of Protein-coding Sequences

Synonymous rewriting of protein-coding sequences was applied in addition to synthesis optimization. The average codon rewriting probability was set to 0.56, resulting in the introduction of 133,313 base substitutions across the 785,701 bp C. eth-2.0 genome design. A subset of segments (segments 7-11 and 29-31) was rewritten in gradual increments, with rewriting rates from 12.5% up to 100%. The first four amino acid codons of protein-coding sequences were excluded from rewriting to preserve potential translational initiation signals. Furthermore, the rare codons AGT, ATA, AGA, GTA and AGG were set as immutable codons and were neither replaced nor introduced upon rewriting, except when required for removal of type IIS restriction sites. The two leucine codons TTA and TTG were erased throughout the C. eth-1.0 genome design. Furthermore, 163 out of 173 instances of the amber stop codon (TAG) were removed. Cumulatively, sequence refactoring resulted in 123,562 synonymous codon substitutions out of a total of 220,263 codons present across the genome design. To distinguish rewritten genes from their native Caulobacter counterparts, the prefixes of gene IDs were changed in C. eth-2.0 from CCNA to CETH while maintaining the gene ID number (e.g. CCNA_00008 to CETH_00008).
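The rewriting rules described above can be sketched for a single CDS as follows. This is an illustrative sketch only and not the published pipeline: choosing replacements uniformly among synonymous codons, leaving the final (stop) codon untouched, and letting the four protected codons take precedence over TTA/TTG erasure are assumptions, and the names `rewrite_cds`, `synonyms` and `translate` are hypothetical. The exception for type IIS restriction site removal is not modelled.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

IMMUTABLE = {"AGT", "ATA", "AGA", "GTA", "AGG"}  # neither replaced nor introduced
ERASED = {"TTA", "TTG"}                          # always replaced, never introduced


def synonyms(codon):
    """Allowed synonymous replacements for a codon."""
    aa = CODON_TABLE[codon]
    return [c for c, a in CODON_TABLE.items()
            if a == aa and c != codon and c not in IMMUTABLE and c not in ERASED]


def translate(seq):
    """Translate an in-frame DNA sequence ('*' marks stop codons)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def rewrite_cds(cds, p=0.56, protected=4, rng=None):
    """Synonymously rewrite a CDS with per-codon probability p."""
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of three")
    rng = rng or random.Random()
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]
    out = []
    for idx, codon in enumerate(codons):
        options = synonyms(codon)
        # Assumption: first `protected` codons and the last codon stay as they are.
        locked = (idx < protected or idx == len(codons) - 1
                  or codon in IMMUTABLE or not options)
        if not locked and (codon in ERASED or rng.random() < p):
            codon = rng.choice(options)
        out.append(codon)
    return "".join(out)
```

Whatever replacement rule is used, `translate(rewrite_cds(cds)) == translate(cds)` should hold, which is the defining property of synonymous rewriting.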

Impact of Rewriting on Alternative ORFs and CDS Internal Motifs

The massive scale of synonymous rewriting (> 56% of all codons replaced) applied to the C. eth-1.0 genome maintains the protein-coding sequence of each annotated CDS, whereas additional overlapping open reading frames (ORFs) are altered. Similarly, synonymous rewriting erased features and motifs internal to CDS, including transcriptional and translational control signals. To assess the impact of synonymous rewriting on alternative genome features and CDS-internal motifs, a detailed sequence comparison between the C. eth-1.0 genome design before and after rewriting into the C. eth-2.0 genome design was carried out using BioPython [221]. The complete set of alternative ORFs was identified using the Python regex module with the regular expression search pattern '((ATG)|(GTG)|(TTG))([ATGC]{3,3})+?((TAA)|(TAG)|(TGA))' across the forward and reverse strands of the C. eth-1.0 genome sequence before and after rewriting. ORFs smaller than 50 bp were discarded. To detect ORFs alternative to the current genome annotation, ORFs were grouped according to shared stop codon positions (ORF groups). Within ORF groups, smaller ORFs sharing the same stop codon position as the annotated CDS were discarded. In addition to the 676 annotated CDS, this procedure detected 3,229 alternative ORFs. For each CDS and alternative ORF, protein identity before and after rewriting was calculated using BioPython string comparison. Of the 3,229 alternative ORFs, 2,822 (87.4%) showed non-synonymous mutation rates exceeding 20% and were classified as erased upon rewriting. Furthermore, in 2,113 ORFs (74.9%) a premature stop codon (nonsense mutation) was introduced upon rewriting, leading to a truncated protein product. To analyse the effect of rewriting on CDS-internal ribosome binding and stalling sites, sequences matching the bacterial Shine-Dalgarno (SD) consensus sequence were identified across the native C. eth-1.0 genome design and compared against the corresponding rewritten sequences of the C. eth-2.0 genome. SD consensus sequences were defined by requiring a match to the hepta-nucleotide sequence 'AAGGAGG' with less than one mismatch or no more than two A to G substitutions. CDS-internal ribosome binding sites were classified as erased from the C. eth-2.0 genome design if the rewritten sequences no longer matched the SD consensus sequence according to the above criteria. Similarly, CDS-internal transcriptional start sites (TSS) according to a previously reported experimental dataset [197] were classified as erased if, within a 50 bp window upstream of the TSS, more than 10% of all nucleotides were substituted upon rewriting.
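The ORF search and the TSS criterion described above can be sketched with the Python standard library. This is a sketch under stated assumptions rather than the analysis code used in this work: it uses the built-in `re` module with a lookahead to recover overlapping ORFs, uses `*?` instead of `+?` so that a stop codon directly following a start codon cannot be read through, keeps only the longest ORF per stop codon position, and reports stop positions in the coordinates of the scanned strand. The names `find_orfs` and `tss_erased` are hypothetical.

```python
import re

# Lookahead so that overlapping ORFs (other frames, nested starts) are all reported.
ORF_PATTERN = re.compile(r"(?=((?:ATG|GTG|TTG)(?:[ATGC]{3})*?(?:TAA|TAG|TGA)))")
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=50):
    """Return {(strand, stop_end): orf}, keeping the longest ORF per stop codon."""
    best = {}
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for m in ORF_PATTERN.finditer(s):
            orf = m.group(1)
            if len(orf) < min_len:
                continue  # ORFs smaller than 50 bp are discarded
            key = (strand, m.start() + len(orf))  # group by stop codon position
            if len(orf) > len(best.get(key, "")):
                best[key] = orf
    return best


def tss_erased(original, rewritten, tss, window=50, threshold=0.10):
    """True if more than `threshold` of the bases upstream of the TSS changed."""
    start = max(0, tss - window)
    pairs = list(zip(original[start:tss], rewritten[start:tss]))
    changed = sum(a != b for a, b in pairs)
    return bool(pairs) and changed / len(pairs) > threshold
```

Running `find_orfs` on a design before and after rewriting and translating the ORFs found at matching stop positions would then allow the protein identity comparison described in the text.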

Retro-synthetic Partitioning of the C. eth-2.0 Genome Design

To define the optimal retro-synthetic assembly route, the synthesis-optimized C. eth-2.0 genome design was partitioned using the previously published Genome Partitioner algorithm [151]. The partitioning strategy was designed as a four-tier hierarchical assembly process comprising 236 blocks, each 3-4 kb in size (assembly level 1), that were assembled into 37 genome segments of approximately 20 kb (assembly level 2), which were further assembled into 16 mega-segments of 40-60 kb in size (assembly level 3) and finally into the 785 kb genome construct. Mega-segment boundaries were set to split 5 auxotrophic markers (TRP1, HIS3, MET14, LEU2, ADE2) seeded equidistantly into the C. eth-2.0 genome design to serve as click-markers directing faithful assembly of the whole genome in yeast. Furthermore, segments were required to encompass only intact gene sets to prevent splitting of individual genes and to allow for functionality testing at the level of individual segments. Terminal homology overlaps of each assembly level were optimized by the Genome Partitioner algorithm [151] using standard settings (maximal repeat size set to 8 bp, non-specific overlap set to 8 bp). The standard flanking 3' prefix and 5' suffix adapter sequences were reported previously [151].

Synthesis and Hierarchical Assembly of the C. eth-2.0 Genome

Low-cost DNA Synthesis by Commercial Suppliers

The 236 partitioned 3-4 kb assembly blocks used for hierarchical assembly of the C. eth-2.0 genome design were ordered from two commercial suppliers of low-cost de novo DNA synthesis as sequence-verified, plasmid-cloned constructs (Gen9 Inc and GeneArt, Thermo Fisher Scientific, Regensburg, Germany). A total of 236 assembly blocks were ordered from the former supplier (Gen9 Inc), of which 182 were manufactured by low-cost synthesis (77% synthesis success rate). From the second commercial provider (GeneArt), 54 out of 55 assembly blocks were manufactured by low-cost DNA synthesis (98% synthesis success rate), while synthesis of the single remaining assembly block was accomplished by custom gene synthesis. Constructs were delivered as plasmid-cloned constructs in the commercial maintenance vectors pG9m, GeneArt pMS-RQ and pMA-RQ.

BspQI-mediated Release of Assembly Blocks from Maintenance Vector

Assembly blocks were released from the maintenance vector via a BspQI type IIS restriction digestion. Digestion reactions to release individual 3-4 kb assembly blocks consisted of a 40 µL endonuclease digestion reaction composed of 10 µL (> 5 µg) purified maintenance plasmid containing the appropriate assembly block, 1 µL (10 U) BspQI type IIS restriction enzyme (NEB, USA), 4 µL 10x NEBuffer 3.1 (NEB, USA) and 25 µL nuclease-free water (Promega, USA). The digestion reactions were incubated at 50 °C for 1.5 h prior to heat inactivation of the BspQI enzyme at 80 °C for 20 min. The restriction-digested assembly blocks were purified using the NucleoSpin® Gel and PCR Clean-up Kit (Macherey-Nagel, Switzerland) and quantified on a Nanodrop ND-1000 spectrometer (Thermo Fisher Scientific, Carlsbad, USA).

Yeast Assembly of Segments and Mega-segments

Column-purified 3-4 kb DNA blocks were used for assembly of the 37 segments into the pMR10Y [146] shuttle vector (pMR10::CEN/ARS::URA3). From a mid-log phase S. cerevisiae culture (VL6-48N [222], OD600 of 0.7), 2 mL of cells were collected by centrifugation and washed in 1 mL 0.9% NaCl solution. To the washed cells, 100 ng salmon sperm DNA (single-stranded, from salmon testes, D7656, Sigma-Aldrich, USA), 540 ng linearized pMR10Y (digested with PacI and PmeI) and 300 ng of each DNA block were added. The pellet was resuspended in 500 µL transformation mixture (400 µL 50% PEG solution, 50 µL 1 M lithium acetate, 50 µL distilled water), and 57 µL DMSO was added prior to incubation at RT for 15 min, followed by incubation at 42 °C for 15 min. Cells were harvested by centrifugation, resuspended and plated onto selective yeast synthetic defined medium (supplemented with glucose (10 g·L⁻¹) and adenine (80 mg·L⁻¹) but lacking uracil) and incubated at 30 °C for two days. For the assembly of mega-segments from segments, a similar lithium acetate transformation protocol [168] was used with the following modifications. The segments were released from the pMR10Y vector by PacI and/or PmeI restriction digestion. The first segment of each mega-segment was digested using PmeI, and the last segment was digested using PacI. For mega-segments composed of three segments, the centre segment was released by a PacI and PmeI double digest. The 20 µL endonuclease digestion reaction consisted of 1 µg segment DNA, 0.5 µL (5 U) PacI / PmeI, 2 µL 10x CutSmart Buffer (NEB, USA) and water to a final volume of 20 µL. The restriction digest was conducted for 1 h at 37 °C prior to heat inactivation of the restriction enzymes at 65 °C for 20 min. Of each restriction digest, 4 µL (200 ng) of the released segments were transformed into S. cerevisiae using the protocol described above.

Assessment of Autonomously Replicating Sequence Functionality

Initial attempts to assemble the C. eth-2.0 chromosome in yeast resulted in the isolation of partially assembled chromosome constructs bearing mega-segments 1 to 6, as determined by whole genome sequencing. However, no clones with larger constructs were detected. Close inspection of the initial partial assembly products revealed that assemblies failed at the 5'-end of mega-segment VI, which contains the His3 click marker and ARS416 at the 3'-end. The Met14 marker and ARS1213 are located 100 kb downstream of mega-segment VI. We reasoned that one or multiple ARS sequences were not functional in promoting replication of the GC-rich C. eth-2.0 chromosome sequence in S. cerevisiae. We benchmarked the functionality of ARS elements via assembly of adjacent segments containing a click marker and an ARS element with two different linker sequences. The first linker contained a Cen6 and an ARSH4 site, while the second linker contained a Cen6 site only. The first linker was used as a positive control, since replication of the assembled segments is maintained by the ARSH4 site. The second linker was used to test the functionality of the ARS elements seeded in the design, as the linker itself does not permit DNA replication. This experiment showed that ARS416 and ARS1213, located at the junctions of mega-segments 5 and 6 and of mega-segments 8 and 9, respectively, are non-functional, while ARS Max2 and ARS HI, located at the junctions of mega-segments 2 and 3 and of mega-segments 10 and 11, respectively, are functional. To repair the initial design, five additional ARS elements (ARS1018, ARS1113, ARS516, ARS727 and ARS1323) were integrated at the 5'-end of mega-segments 5, 6, 9, 12 and 15.

Addition of ARS to Mega-segments

ARS elements for the repair of the C. eth-2.0 chromosome design were selected according to the following criteria. ARS had to be shorter than 250 bp and must not contain any of the restriction sites needed for the synthesis of the construct (AarI, BsaI, BspQI, PacI, PmeI). Selected ARS elements satisfying these criteria were flanked by 70 bp overhangs to the 3'-end of the mega-segment and to the pMR10Y vector and were ordered as G-Blocks (IDT, Skokie, IL, USA). The mega-segments were PmeI restriction digested as described above. To assemble ARS sequences into mega-segments, the following protocol was used. The S. cerevisiae (VL6-48N) pre-culture was grown in 5 mL YPD to high density, diluted 1:25 in 50 mL YPD and grown for 4 h prior to collection of cells by centrifugation at 1,000 rcf for 5 min; the supernatant was replaced by 25 mL MQ water. The cells were centrifuged at 3,000 rcf for 5 min and the supernatant was discarded. The pellet was resuspended in 100 µL lithium acetate mix (0.1 M lithium acetate, 0.01 M Tris-HCl pH 7.5, 0.001 M EDTA pH 8.0) per transformation. To the LiAc-cell mix, 10 µL salmon-sperm DNA solution (1% w/v salmon-sperm DNA (ssDNA), 0.01 M Tris-HCl pH 7.5, 0.001 M EDTA pH 8.0) and 600 µL PEG mix (40% w/v poly(ethylene glycol) 3015-3685 g/mol, 0.01 M Tris-HCl pH 7.5, 0.001 M EDTA pH 8.0) were added per transformation. Of this master mix, 710 µL were aliquoted into 1.5 mL tubes, to which 200 ng of the digested mega-segment and 15 ng of the G-Block were added. The samples were incubated at RT for 30 min on a shaker (Titramax100, Heidolph, Schwabach, DE) at 300 rpm. Following incubation, 70 µL DMSO was added and the samples were incubated at 42 °C for 15 min. The cells were centrifuged at 1,000 rcf for 2 min, the supernatant was replaced by 300 µL MQ water, and 150 µL of the transformants were plated onto SD plates lacking uracil. The plates were incubated at 30 °C for two days. The correct integration of the ARS was confirmed by diagnostic PCR and Sanger sequencing (Microsynth, Balgach, CH).
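The two ARS selection criteria stated above (length below 250 bp, no recognition site of the enzymes used during construct synthesis) lend themselves to a simple filter. The following is a minimal sketch with hypothetical function names; the recognition sequences are the standard ones for these enzymes, and both strands are checked because the type IIS sites are not palindromic.

```python
# Recognition sequences (5'->3') of the enzymes named in the selection criteria.
SITES = {
    "AarI": "CACCTGC",
    "BsaI": "GGTCTC",
    "BspQI": "GCTCTTC",
    "PacI": "TTAATTAA",
    "PmeI": "GTTTAAAC",
}
COMPLEMENT = str.maketrans("ATGC", "TACG")


def forbidden_sites(seq):
    """Names of enzymes whose site occurs on either strand of seq."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    return sorted(name for name, site in SITES.items() if site in seq or site in rc)


def ars_is_eligible(seq, max_len=250):
    """Apply both criteria: shorter than max_len bp and free of forbidden sites."""
    return len(seq) < max_len and not forbidden_sites(seq)
```

A candidate list could then be screened with `[name for name, seq in candidates.items() if ars_is_eligible(seq)]`, where `candidates` is an assumed mapping of ARS names to sequences.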

Verification of Higher-order Assemblies in S. cerevisiae using Diagnostic PCR

Correct higher-order assemblies of segments and mega-segments were verified by performing diagnostic PCR reactions across assembly junctions. S. cerevisiae colonies were resuspended in 3 µL 0.02 M sodium hydroxide and incubated for 10 min at 99 °C in a thermocycler. The diagnostic PCR reaction contained 5 µL 5 M betaine, 12.5 µL BioRed 2x Mastermix (Bioline, London, UK), 0.5 µL of each diagnostic primer (100 nM) specific for each assembly junction present in the higher-order assembly, 3.5 µL water and 3 µL of boiled S. cerevisiae cells as template. PCR reactions were cycled in a thermocycler (C1000 Touch, BioRad, Cressier, Switzerland) according to the following protocol: (1) 5 min at 96 °C, (2) 30 s at 96 °C, (3) 30 s at 55 °C, (4) 1 min at 72 °C, (5) repeat steps 2-4 30 times, (6) final elongation for 10 min at 72 °C. PCR products were analysed by electrophoresis on 1.5% agarose gels to identify clones bearing full-length higher-order assemblies.
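For bookkeeping, the cycling protocol and reaction composition above can be written down as data. The sketch below assumes that steps 2-4 are run for 30 cycles in total and that one primer pair (two primers at 0.5 µL each) is used per reaction; both are interpretations of the text, and ramp times between temperatures are ignored. All names are hypothetical.

```python
def pcr_program(anneal_c=55, cycles=30):
    """Program as (label, temperature in deg C, seconds, repeats) steps."""
    return [
        ("initial denaturation", 96, 5 * 60, 1),
        ("denaturation", 96, 30, cycles),
        ("annealing", anneal_c, 30, cycles),
        ("elongation", 72, 60, cycles),
        ("final elongation", 72, 10 * 60, 1),
    ]


def hold_time_minutes(program):
    """Total programmed hold time in minutes (ramping not included)."""
    return sum(seconds * repeats for _, _, seconds, repeats in program) / 60


def reaction_volume_ul(primers=2):
    """Sum of the listed reaction components in microlitres."""
    components = {
        "5 M betaine": 5.0,
        "2x master mix": 12.5,
        "primers": 0.5 * primers,  # assumption: 0.5 uL per primer
        "water": 3.5,
        "boiled cells": 3.0,
    }
    return sum(components.values())
```

Under these assumptions the programmed hold time is 75 min and the reaction volume is 25 µL, so the 12.5 µL of 2x master mix ends up at 1x.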

Isolation of Higher-order Constructs from S. cerevisiae

The yeast-assembled C. eth-2.0 chromosome segments and mega-segments cloned into the pMR10Y shuttle vector were extracted from S. cerevisiae and transformed into E. coli according to the following procedure. S. cerevisiae strains bearing segment or mega-segment constructs were inoculated in 5 mL SD medium lacking uracil and grown overnight. Cells were harvested by centrifugation and resuspended in 250 µL Zymolyase solution (0.143 M 2-mercaptoethanol, 0.01 M Tris-HCl pH 8.0, 1.2 M sorbitol, 0.005 M CaCl2, 8 mg/mL Zymolyase 20T from Arthrobacter luteus (AMS Biotechnology Europe, UK)), followed by incubation at 37 °C for 1 h. To each digest, 12 µL 1 M Tris-HCl pH 8.0 and 6 µL 0.5 M EDTA pH 8.0 solution were added, followed by addition of 250 µL lysis buffer (GeneJET Plasmid Miniprep Kit, Thermo Scientific, USA) and incubation for 5 to 7 min. To terminate the lysis process, 350 µL neutralization buffer (GeneJET Plasmid Miniprep Kit, Thermo Scientific, USA) was added and mixed, and the cell debris was pelleted by centrifugation for 10 min at 15,000 rcf. The clear supernatant (800 µL) was transferred, and DNA was precipitated by addition of 640 µL 2-propanol and pelleted by centrifugation in a microfuge (10 min, 15,000 rcf). The resulting pellet was washed with 500 µL 70% ethanol, centrifuged for 5 min, dried and resuspended in 50 µL 0.1 M Tris-HCl buffer, pH 8.0.

Transformation of Higher-order Constructs into E. coli and Isolation

To transform higher-order constructs into E. coli, 2 µL of the isolated pMR10Y plasmid DNA was electroporated into E. coli (DH5α / DH10B, 90 µL aliquots, OD 25) at 1.35 kV, 200 Ω and 25 µF using Gene Pulser® cuvettes with a 0.1 cm electrode gap (Bio-Rad Laboratories, USA). The pulse was applied at time constants of around 4.5 ms. Immediately after electroporation, transformed E. coli were rescued in 1 mL SOC medium and incubated at 37 °C for 1 h. Of each rescued electroporation sample, 100 µL was plated onto selective LB medium supplemented with kanamycin (20 µg·mL⁻¹) and incubated at 37 °C overnight. Segments and mega-segments were isolated from E. coli using the NucleoBond Xtra Midi kit (Macherey-Nagel, Switzerland) from 50 mL E. coli cultures grown overnight in LB medium supplemented with kanamycin. The purity of the extracted plasmid DNA was assessed by agarose gel electrophoresis, and the DNA was quantified on a Nanodrop ND-1000 spectrometer (Thermo Fisher Scientific, Carlsbad, USA).

Size Confirmation of Segments and Mega-segments by Electrophoresis

Gel electrophoresis was used to assess DNA assemblies of segments. Each assembled segment was released from pMR10Y by restriction digestion as described above, and 100 ng were loaded into a well of a 0.5% UltraPure Agarose (Invitrogen) gel in 1x TAE buffer containing 1x GelRed (Biotium, Fremont, CA, USA). Electrophoresis was conducted at 7.1 V/cm for 40 min using 1x TAE as running buffer. A pulsed-field gel electrophoresis CHEF-DR III variable angle system (BioRad, Cressier, Switzerland) was used to assess DNA assemblies of mega-segments. The running gel consisted of 1% UltraPure Agarose (Invitrogen) in 0.5x TBE buffer (45 mM Tris-borate, 1 mM EDTA, pH 8.3). The temperature of the running buffer (0.5x TBE) was set to 14 °C. PacI / PmeI restriction-digested DNA samples (10 µL DNA at 35 ng/µL plus 2 µL 6x Purple Loading Dye (NEB, USA)) were loaded into the wells while the pump was turned off. After 10 min of electrophoresis, the pump was turned on again. The gel was run for 15 h at 6 V/cm with a pulsed-field switch time of 1 to 8 ms. Agarose gels were stained with 1x GelRed (Biotium, Fremont, CA, USA) in 0.5x TBE buffer for 20 min at RT. The gels for both the segments and the mega-segments were imaged using an AlphaImager (ProteinSimple, San Jose, CA, USA).

Sequence Verification of Segments and Mega-segments by Next-generation Sequencing

To track the assembly progress, sequence verification was performed at the segment and mega-segment level. Chromosome segments and mega-segments cloned into the maintenance plasmid pMR10Y [146] were isolated using the NucleoBond Xtra Midi kit (Macherey-Nagel, Switzerland) from 50 mL E. coli cultures grown overnight in LB medium supplemented with kanamycin. DNA yield was quantified on a Nanodrop ND-1000 spectrometer (Thermo Fisher Scientific, Carlsbad, USA), and the quality of the isolated plasmid DNA was assessed by gel electrophoresis on a 0.5% agarose gel against a supercoiled plasmid standard. DNA samples were subjected to tagmentation [223], barcoded using the Illumina DNA library preparation protocols and sequenced on a MiSeq instrument using standard paired-end sequencing. The resulting reads were demultiplexed, filtered to remove low-quality reads and trimmed to remove adapter sequences, followed by read alignment to the reference C. eth-2.0 genome using bwa. The resulting SAM file was sorted and converted into the BAM format using samtools [224] prior to construction of the read pileup (mpileup) and calculation of read coverage. The output BCF file was converted into a VCF file using bcftools [225] for SNP calling and assignment.
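The alignment and variant-calling steps described above can be sketched as a list of command lines assembled in Python. This is an assumption-laden sketch, not the commands actually run for this work: it uses `bwa mem` and the current `bcftools mpileup` / `bcftools call` interface (older workflows ran mpileup through samtools), and the file names and the helper names `build_commands` and `run` are hypothetical. Demultiplexing, quality filtering, adapter trimming and coverage calculation are not covered.

```python
import subprocess


def build_commands(reference, reads_1, reads_2, prefix):
    """Return the command lines for alignment, sorting and SNP calling."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    bcf = f"{prefix}.bcf"
    vcf = f"{prefix}.vcf"
    return [
        ["bwa", "index", reference],                             # index the reference
        ["bwa", "mem", "-o", sam, reference, reads_1, reads_2],  # paired-end alignment
        ["samtools", "sort", "-o", bam, sam],                    # SAM -> sorted BAM
        ["samtools", "index", bam],
        ["bcftools", "mpileup", "-Ob", "-o", bcf, "-f", reference, bam],  # pileup as BCF
        ["bcftools", "call", "-mv", "-Ov", "-o", vcf, bcf],      # variants only, as VCF
    ]


def run(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run(build_commands("ceth2.fasta", "seg07_R1.fastq", "seg07_R2.fastq", "seg07"))` would process one segment, provided bwa, samtools and bcftools are installed; the file names here are placeholders.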

Release of Mega-segments from pMR10Y:ETH Plasmids

All mega-segments except mega-segments 1 and 16 were pooled in equimolar ratios (96 ng per 40 kb mega-segment, 153 ng per 60 kb mega-segment, 1.6 µg total) in a 1.5 mL tube and digested using 45 units of PacI and PmeI for 8 h at 37 °C. Similarly, mega-segments 1 and 16 were restriction digested using only PacI or PmeI, respectively. To remove the excised pMR10Y backbone from the mega-segment inserts, the digests were loaded onto a 0.5% low gelling temperature agarose (Sigma-Aldrich, USA) gel in 1x TAE buffer and subjected to electrophoresis at 11.4 V/cm for 1 h. After the electrophoresis run, the bands corresponding to the mega-segments were recovered from the gel and incubated in 500 µL 1x β-Agarase I Buffer (NEB, USA) for 30 min on ice. This washing step was repeated once. After equilibration of the gel slice, the agarose plug was melted at 65 °C for 10 min, followed by an equilibration step at 42 °C for 15 min prior to addition of 1.5 µL β-Agarase I (NEB, USA) and further incubation at 42 °C for 1 h. The digests were transferred into 1.5 mL screw-cap tubes, extracted with one volume of buffer-saturated phenol and centrifuged at 17,000 rcf for 10 min. The upper layer was transferred into a new 1.5 mL tube, and an equal volume of 2-propanol supplemented with 1 µL/100 µL Glycogen Blue (Invitrogen) was added prior to centrifugation at 17,000 rcf for 20 min. The DNA pellet was washed with 70% ethanol, and the dried pellet was dissolved in 20 µL 0.01 M Tris-HCl buffer pH 8.0 and incubated overnight at 4 °C.

One-step Assembly of C. eth-2.0 from 16 Mega-segments in Yeast

Whole genome assembly was performed in YJV04, a CRISPR-engineered S. cerevisiae strain based on YPH857 of Philip Hieter [156]. In YJV04, the native auxotrophic marker genes URA3, TRP1, HIS3, MET14, LEU2 and ADE2 have been deleted. The whole genome assembly was conducted according to the following procedure [149]. A pre-culture of YJV04 (OD600 = 1.67) was diluted 222-fold in 100 mL YPD medium and incubated at 30 °C overnight until an OD600 between 0.3 and 0.5 was reached. Cells were harvested by centrifugation at 1,000 rcf for 5 min, washed in 30 mL sterile water and again in 20 mL 1 M sorbitol, after which the pellet was resuspended in 20 mL SPE solution (1 M sorbitol, 0.01 M sodium phosphate, 0.01 M EDTA pH 8.0) supplemented with 40 µL 2-mercaptoethanol and 20 µL Zymolyase solution (10 mg/mL Zymolyase 20T, 0.05 M Tris-HCl pH 7.5, 25% glycerol). The mixture was incubated at 30 °C, and progression of spheroplast formation was monitored after 20 min by mixing 100 µL of the digested cells with either 900 µL 1 M sorbitol or 900 µL 2% SDS. The OD600 ratio in the absence or presence of SDS was determined on a spectrometer. The Zymolyase digestion was continued until the sample treated with 2% SDS showed a 3- to 5-fold lower OD600 than the control sample. Spheroplasts were harvested by centrifugation at 300 rcf for 10 min and gently resuspended in 50 mL 1 M sorbitol. This washing step was repeated twice prior to careful resuspension of the spheroplasts in 2 mL STC solution (1 M sorbitol, 0.01 M Tris-HCl pH 7.5, 0.01 M calcium chloride). For whole genome assembly reactions, 1.8 µg of digested and purified mega-segments in a volume of 20 µL as well as 2-3 µg ssDNA (Sigma-Aldrich, USA) were pipetted into a sterile 1.5 mL tube, and 200 µL of freshly prepared spheroplasts were added and incubated at room temperature for 10 min. To each sample, 800 µL PEG solution (20% PEG 8000, 0.01 M calcium chloride, 0.01 M Tris-HCl pH 7.5) was added, and the tube was inverted carefully several times and incubated at room temperature for 10 min. Spheroplasts were collected by centrifugation at 300 rcf for 10 min, resuspended in 800 µL SOS solution (1 M sorbitol, 0.0065 M calcium chloride, 0.25% yeast extract (Difco, BE), 0.5% peptone (Difco)) and incubated at 30 °C for 40 min. The spheroplast suspension was mixed with 7 mL SDSORB-TOP (SD agar containing 1 M sorbitol and 2.5% agar, pre-tempered at 50 °C), inverted several times and poured onto SDSORB plates (SD agar containing 1 M sorbitol and 2% agar) selecting for Trp and Leu. After hardening, the plates were incubated at 30 °C for 5-7 days until colonies appeared.

Diagnostic PCR to Verify C. eth-2.0 Assembly

To assess the correct assembly of the C. eth-2.0 genome, a PCR was conducted using yeast cells as template. After growth on a plate, the colony of interest was picked and resuspended in 3 µL 0.02 M sodium hydroxide in a PCR tube. The resuspended colonies were incubated in a thermocycler at 99°C for 10 min. Each PCR reaction contained 5 µL 5 M betaine, 12.5 µL BioRed 2x Mastermix (Bioline, London, UK), 0.5 µL diagnostic primers (100 nM, annealing to each of the 15 assembly junctions to be tested) and 3.5 µL water. The PCR reaction mix was added to the lysed yeast cells, and the following PCR protocol was run in the thermocycler: (1) initial denaturation for 5 min at 96°C, (2) denaturation for 30 s at 96°C, (3) primer annealing for 30 s at 60°C, (4) elongation for 1 min at 72°C, (5) repeat steps 2-4, 30 times, (6) final elongation for 10 min at 72°C. PCR products were analysed by agarose gel electrophoresis to confirm the correct amplicon size.

Size Confirmation of the C. eth-2.0 Genome Construct by Pulsed-field Electrophoresis

A CHEF-DR III variable angle pulsed-field gel electrophoresis system (BioRad, Cressier, Switzerland) was used to confirm complete DNA assembly of the C. eth-2.0 genome. Prior to the electrophoresis step, intact chromosomal DNA was extracted from the S. cerevisiae cells. A YJV04 C. eth-2.0 pre-culture was grown in SD medium selecting for the auxotrophic markers. The pre-culture was diluted 1:15 in SD medium and grown overnight to mid to late log phase. Cells corresponding to 1.5 mL of the culture were harvested, washed with 1 mL 50 mM EDTA, pH 8.0 and resuspended in 100 µL 50 mM EDTA, pH 8.0. To the washed cells, 50 µL Zymolyase solution (1 mL SCE solution (1 M sorbitol, 0.1 M sodium citrate, 60 mM EDTA, pH 7.0), 9 mg Zymolyase 20T, 50 µL 2-mercaptoethanol) was added prior to the addition of 250 µL of a 1% low melting point (LMP) agarose solution in 0.125 M EDTA, pH 8.0, equilibrated at 50°C. A total of 80 µL of the agarose solution containing the yeast cells was pipetted into CHEF Mapper XA System 50-Well Plug Molds (BioRad, Cressier, Switzerland) and incubated on ice until the agarose had solidified. The plugs were collected in 2 mL tubes containing 1.5 mL ETB solution (9 mL 0.5 M EDTA pH 8.0, 1 mL 1 M Tris-HCl pH 8.0, 0.5 mL 2-mercaptoethanol) and incubated at 37°C for 4 h. The plugs were washed twice with 1.5 mL 50 mM EDTA pH 8.0 prior to adding 2 mL proteinase K solution (9 mL 0.5 M EDTA pH 8.0, 1 mL 10% N-lauroylsarcosine, 10 mg proteinase K, 1 mg RNase), followed by an overnight incubation at 37°C. Following the proteinase K digest, the plugs were washed three times with 1.5 mL 50 mM EDTA, pH 8.0. Following a 1 h incubation in TE buffer at 37°C, the plugs were incubated in 500 µL CutSmart buffer for 30 min on ice. 
After removing the buffer, 160 µL restriction enzyme mix (40 µL 5x CutSmart buffer, 111 µL MQ water, 3 µL (30 U) PmeI / PacI) was added and incubated at 37°C for 5 h, after which the restriction enzyme mix was exchanged and the plugs were digested further overnight at 37°C. Prior to loading into the 1% UltraPure Agarose (Invitrogen) gel in 0.5x TBE buffer, the plugs were incubated in TE buffer for 1 h at room temperature. For the PFGE run, 0.5x TBE buffer at 14°C was used. The switch time was ramped from 60 s to 120 s over the course of 24 h. The set voltage was 6 V/cm at an angle of 120°. Agarose gels were stained with 1x GelRed (Biotium, Fremont, CA, USA) in 0.5x TBE buffer for 20 min at room temperature and imaged using an AlphaImager (ProteinSimple, San Jose, CA, USA).

Sequence Confirmation of C. eth-2.0 Genome Construct

To verify the whole genome assembly of C. eth-2.0 in S. cerevisiae and to assess chromosome maintenance and stability upon serial propagation of the host strain, total genomic DNA was extracted according to the following procedure. The initially isolated S. cerevisiae strain bearing the C. eth-2.0 genome according to PFGE and diagnostic PCR assessment, as well as isolates propagated for twenty, forty and sixty generations, were grown overnight to full density. Cells corresponding to 1.5 mL of the overnight culture were harvested at 13,000 rcf for 1 min. The supernatant was discarded and the cells were resuspended in 200 µL lysis buffer (1% SDS, 0.1 M sodium chloride, 2% Triton X-100, 0.01 M Tris-HCl pH 8.0, 0.001 M EDTA pH 8.0). To each sample, 200 µL acid-washed glass beads (400-600 µm, Sigma-Aldrich) and 200 µL phenol/chloroform/isoamyl alcohol were added. The mix was placed on a FastPrep24 5G (MP Biomedicals, Santa Ana, CA, USA) and homogenized twice at full speed for 15 s with incubation on ice for 2 min in between. The cell debris was pelleted by centrifugation at 13,000 rcf for 5 min, and the top layer was transferred into a new 1.5 mL tube, supplemented with 100 µL chloroform, vortexed at full speed for 30 s and phase-separated by centrifugation at 13,000 rcf for 2 min. The top layer (150 µL) was transferred into a new 1.5 mL tube, and 1 µL RNase solution (0.01 M sodium acetate, 10 mg/mL RNase A (Roche, BS, CH), 0.1 M Tris-HCl pH 7.5) was added and incubated at room temperature for 1 h. Genomic DNA was precipitated by adding 2.5 sample volumes of precooled (-20°C) 100% ethanol and incubating for 30 min at -20°C. The precipitated DNA was collected by centrifugation at 14,000 rcf for 5 min at 4°C, the supernatant was discarded and the pellet was washed with 70% ethanol at room temperature. The pellet was air dried and resuspended in 50 µL Tris-HCl pH 8.0. The genomic DNA extract was stored at 4°C. 
The DNA concentration of each sample was quantified on a Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). To obtain DNA libraries for next-generation sequencing, DNA samples were processed according to the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB, USA). Sequencing was conducted on an Illumina NextSeq 500 sequencer using a NextSeq 500/550 Mid Output v2 kit (FC-404-2001, Illumina, San Diego, CA, USA) at the Functional Genomics Center Zurich. The resulting sequencing output data was demultiplexed, and reads were quality filtered, subjected to adapter trimming and mapped to the C. eth-2.0 genome using bwa. Similar to the sequence analysis for sequence verification of chromosome segments and mega-segments, the resulting SAM files were processed using samtools and bcftools to generate coverage data and determine SNP frequencies across the C. eth-2.0 genome design.
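The coverage step can be illustrated with a minimal sketch. It assumes that per-base depth is available as three tab-separated columns (reference name, 1-based position, depth), which is the layout written by `samtools depth`; the function name, the window size and the `+ 1` pseudocount are illustrative choices and are not taken from the thesis pipeline.

```python
import math
from collections import defaultdict


def windowed_log10_coverage(depth_lines, window=1000):
    """Summarize per-base depth into fixed windows as log10(mean depth + 1).

    depth_lines: iterable of 'ref<TAB>pos<TAB>depth' strings (1-based pos).
    Positions missing from the input are treated as zero depth, so the
    window total is always divided by the full window size.
    """
    totals = defaultdict(int)
    for line in depth_lines:
        _ref, pos, depth = line.rstrip("\n").split("\t")
        totals[(int(pos) - 1) // window] += int(depth)
    # Key each window by its 0-based start coordinate.
    return {
        idx * window: math.log10(total / window + 1)
        for idx, total in sorted(totals.items())
    }
```

Plotting the returned values against the window start coordinates gives a log10 coverage profile of the kind shown for the serial passages; a partially filled final window is underestimated by this simplification.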

Construction of Mero-synthetic Caulobacter Test Strains

Conjugational transfer rates of C. eth-2.0 chromosome segments into Caulobacter

Sequence-verified pMR10Y plasmids, each harbouring a different C. eth-2.0 chromosome segment, were conjugated from E. coli S17-1 donor strains into wild-type Caulobacter NA1000 to generate a panel of 37 mero-synthetic test strains. To identify C. eth-2.0 chromosome segments that elicit toxicity upon boot-up in Caulobacter, conjugational transfer frequencies were quantified by counting the number of kanamycin-resistant colonies per 10^8 recipient cells using the conjugation assay and compared against a pMR10Y plasmid control lacking any chromosome segment. Toxic C. eth-2.0 chromosome segments were identified using a cut-off of a more than 1000-fold reduction in kanamycin-resistant transconjugants compared to the empty pMR10Y plasmid control. A total of 12 out of 37 C. eth-2.0 chromosome segments harboured one or more toxic C. eth-2.0 genes. Assuming random distribution, the occurrence of toxic C. eth-2.0 genes was estimated with a Poisson rate constant based on the fraction of genome segments that did not elicit toxicity according to:

$$\lambda = -\ln\left(\frac{N}{S}\right) \qquad (4.1)$$

with N = 25 non-toxic segments out of a total of S = 37 genome segments assayed, yielding a rate constant λ of 0.392. Based on a Poisson distribution with a rate constant of 0.392, 25 non-toxic segments, 10 segments with a single toxic gene and 2 segments harbouring two toxic genes are expected. The cumulative size of toxic DNA sequences within individual segments and across the entire C. eth-2.0 genome design was estimated based on the observed conjugational transfer frequencies for each toxic chromosome segment and an approximated mutation rate of 10^-8 per base, assuming a phenotypic lag of 2-3 cell divisions before cell division ceases upon conjugational transfer of a toxic chromosome segment.
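The Poisson estimate can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in Equation 4.1 and the expected segment counts; the function name and its arguments are illustrative.

```python
import math


def poisson_expected_counts(non_toxic, total, max_k=2):
    """Return the Poisson rate and the expected number of segments with k toxic genes.

    The rate follows Equation 4.1: lambda = -ln(N / S).
    """
    rate = -math.log(non_toxic / total)
    expected = [
        total * math.exp(-rate) * rate**k / math.factorial(k)
        for k in range(max_k + 1)
    ]
    return rate, expected


rate, expected = poisson_expected_counts(non_toxic=25, total=37)
print(round(rate, 3))                # 0.392
print([round(x) for x in expected])  # [25, 10, 2]
```

With N = 25 and S = 37, the expected counts for zero, one and two toxic genes per segment round to 25, 10 and 2, matching the values stated in the text.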

In vivo Adaptation of C. eth-2.0 Chromosome Segments at the Population Level

To assay the adaptive dynamics upon in vivo boot-up of individual C. eth-2.0 chromosome segments in Caulobacter, populations of 10^3 to 10^5 transconjugants were pooled and pMR10Y plasmids were isolated using the NucleoBond Xtra Midi kit (Macherey-Nagel, Switzerland). Plasmid DNA was quantified by agarose gel electrophoresis. Tagmentation and barcoding of individual DNA samples was performed using Illumina Nextera XT Index Kit v2 barcodes prior to paired-end sequencing on an Illumina MiSeq instrument according to standard sequencing chemistry and protocols. Raw reads were demultiplexed using the indexing read information according to the Illumina workflow. Reads were filtered for high quality, and adapter sequences were trimmed prior to alignment to the C. eth-2.0 reference genome sequence using bwa. Read coverage for each C. eth-2.0 chromosome segment was normalized against total reads mapping to the pMR10Y backbone sequence and compiled into a single data set for plotting the sequence coverage across the entire genome design.
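The backbone normalization can be sketched as follows. The data layout (one read count per segment and one backbone read count per sample) and the function name are assumptions made for illustration; the thesis does not specify how the counts were stored.

```python
def normalize_segment_coverage(segment_reads, backbone_reads):
    """Divide the reads mapping to each segment by the reads mapping to the
    pMR10Y backbone in the same sample.

    segment_reads, backbone_reads: dicts keyed by segment ID (hypothetical layout).
    """
    normalized = {}
    for segment, reads in segment_reads.items():
        backbone = backbone_reads[segment]
        if backbone == 0:
            raise ValueError(f"no backbone reads for {segment}")
        normalized[segment] = reads / backbone
    return normalized
```

A segment that is stably maintained should give a ratio comparable to that of the other segments, whereas a segment that is partially lost from the population should give a lower ratio.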

Isolation of Evolved C. eth-2.0 Chromosome Segments and Analysis of Self-repair

To identify toxic genes and isolate mero-synthetic Caulobacter strains evolved for stable maintenance of C. eth-2.0 chromosome segments, transconjugants were isolated and PCR screened for the presence of intact chromosome segments. For PCR screening, sets of multiplex PCR primers were used that annealed to each block junction present within chromosome segments. Strains that maintained the full length or the largest portion of a formerly toxic chromosome segment were isolated, and the corresponding plasmids were isolated using the NucleoBond Xtra Midi kit (Macherey-Nagel, Switzerland). To detect suppressor mutations responsible for tolerance of toxic chromosome segments, individual plasmids bearing intact chromosome segments were sequenced on an Illumina MiSeq instrument using DNA library preparation and indexing according to the Illumina workflow. Read coverage data for each C. eth-2.0 chromosome segment was calculated and normalized against the pMR10Y backbone. Single base substitutions, indels and larger deletions occurring upon in vivo evolution of chromosome segments were mapped using SNP calling by samtools and bcftools. Furthermore, to detect insertion events into C. eth-2.0 genes by IS288 and IS544 sequences, the output sequencing read dataset was searched for reads mapping to the end sequences of native Caulobacter insertion sequences (IS511, IS298, ISCc1, ISCc2, ISCc3, ISCc4 and ISCc5) and reading into the C. eth-2.0 genome design. For C. eth-2.0 chromosome segment 1, no plasmid-borne suppressor mutations were identified. To identify chromosomal suppressor mutations that permit tolerance of the formerly toxic C. eth-2.0 chromosome segment 1, sequencing reads from chromosomal DNA present within the plasmid-prepped DNA were aligned to the native Caulobacter reference sequence, and SNPs occurring in native copies of mero-diploid genes were identified using samtools and bcftools. To fine-map the precise locations of deletions occurring within C. eth-2.0 chromosome segments upon in vivo adaptation of mero-synthetic Caulobacter test strains, the sequence coverage data was analysed to detect regions devoid of any sequencing reads. A total of 7 deletions spanning 367 bp up to 12,659 bp in size were detected. The precise genome coordinates of deletions were assigned by searching for reads covering both end-points of each deletion.
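The search for regions devoid of sequencing reads can be sketched as a scan for runs of zero depth. This is a simplified illustration, not the routine used in the thesis: it assumes a per-base depth list for one segment, and the function name and the `min_length` parameter are hypothetical. The exact deletion end-points would still have to be confirmed with junction-spanning reads, as described above.

```python
def zero_coverage_regions(depth, min_length=1):
    """Return 1-based inclusive (start, end) intervals where depth is zero.

    depth: list of per-base read depths along one reference sequence.
    min_length: shortest zero-depth run that is reported.
    """
    regions = []
    start = None
    for pos, value in enumerate(depth, start=1):
        if value == 0 and start is None:
            start = pos                      # a zero-depth run begins
        elif value != 0 and start is not None:
            if pos - start >= min_length:    # run covers start..pos-1
                regions.append((start, pos - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(depth) - start + 1 >= min_length:
        regions.append((start, len(depth)))
    return regions
```

Setting `min_length` to a few hundred bases would suppress short coverage gaps that are more likely to reflect sequencing depth than genuine deletions.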

Fault Diagnosis of C. eth-2.0 by Transposon Sequencing

Transposon mutagenesis, PCR amplification and sequencing of Tn5 junctions

To generate hyper-saturated transposon libraries, a conjugation-based mutagenesis procedure was used as previously described [98]. In brief, Tn5 transposon derivatives bearing transposon-internal barcode sequences were conjugated using the filter mating procedure from an E. coli S17-1 donor strain into the mero-diploid Caulobacter crescentus test strains bearing the different 20 kb long rewritten DNA segments on pMR10Y plasmids. Transposon-internal barcodes consisting of a 14 base-pair random sequence adjacent to the transposon end sequences were used to tag independent samples and provide multiplexing capabilities. After overnight incubation on PYE plates supplemented with xylose, cells from each mating filter were harvested and resuspended in 800 µL PYE, and 100 µL aliquots were plated onto PYE plates containing gentamicin, nalidixic acid, kanamycin and xylose. Plates were incubated for three days at 30°C, and transposon mutant libraries from each plate were pooled and stored at -80°C as individual pools of 10,000 insertion mutants in 96 deep-well plates (Eppendorf AG, Germany). Reference transposon mutant pools were constructed in a similar manner using test strains bearing only the empty pMR10Y plasmid to subsequently generate a control TnSeq data set.

Parallel Amplification of Transposon Junctions by Semi-arbitrary PCR

Amplification of transposon junctions carrying terminal adapters compatible with Illumina sequencing was performed as previously reported [98]. In brief, a two-step arbitrary PCR strategy was carried out in 384-well plate format to amplify transposon junctions in parallel from each mutant pool in a reaction volume of 10 µL, using 1 µL of a freeze-thawed transposon mutant pool (OD600 1.0) as template and Taq polymerase mix (GoTaq, Promega, Dübendorf, Switzerland) in a thermocycler instrument (C1000 touch, BioRad, Cressier, Switzerland). Primers used for first-round amplification consisted of a transposon-specific primer (M13 universal) and one of four arbitrary PCR primers as previously described [98]. In a second round of PCR, 1.5 µL of the first-round PCR products were further amplified using Illumina paired-end primers PE1.0 and PE2.0. The first PCR amplification was performed according to the following PCR program: (1) 94°C for 3 min, (2) 94°C for 30 s, (3) 42°C for 30 s, slope -1°C/cycle, (4) 72°C for 1 min, (5) repeat steps 2-4, 6 times, (6) 94°C for 30 s, (7) 58°C for 30 s, (8) 72°C for 1 min, (9) repeat steps 6-8, 25 times, (10) 72°C for 3 min, (11) 12°C hold. The product of the first round of PCR was further amplified in a second, nested PCR reaction using the following thermocycling conditions: (1) 94°C for 3 min, (2) 94°C for 30 s, (3) 64°C for 30 s, (4) 72°C for 1 min, (5) repeat steps 2-4, 30 times, (6) 72°C for 3 min, (7) 12°C hold.

DNA Library Purification and Next-generation Sequencing of Transposon Junctions

Amplicons in the 200-700 bp range corresponding to transposon junctions were size selected by gel electrophoresis (2.0% agarose) and purified over a silica column (Macherey-Nagel, Switzerland). The DNA concentration of each sample was quantified on a Nanodrop ND-1000 spectrometer (Thermo Fisher Scientific, Carlsbad, USA), and samples from different TnSeq libraries were combined. Transposon junction libraries were paired-end sequenced (2x150 bp) on HiSeq and MiSeq Illumina platforms using standard cluster generation protocols, sequencing chemistry and primers PE 1.0 and 2.0 with a 5% phiX spike-in for calibration of base-calling. Sequencing was performed at the Functional Genomics Center Zurich, Zurich, Switzerland.

Transposon Sequencing Read Processing and Mapping of Transposon Insertion Sites

Raw sequencing data processing, read alignment and sample demultiplexing were performed using a previously described analysis pipeline based on Python, Biopython, bwa and Matlab routines [226]. Read filtering criteria included i) removal of low-quality reads, ii) requiring at least a 15 bp long perfect match to the Tn5 transposon end sequence [GTGTATAAGAGACAG] and iii) requiring a genomic insert size of at least 15 bp as inferred by testing for overlapping read sequences between mate reads. Sequences corresponding to the Illumina adapters, transposon end sequence and arbitrary PCR primer sequences were trimmed, and processed reads were aligned to the Caulobacter NA1000 and C. eth-2.0 reference genomes using bwa [227]. Transposon insertion sites were assigned requiring read pairs with correct orientation, perfect matches within the first 15 bases of both reads to a genomic location, unambiguous mapping to either the native genome or the C. eth-2.0 genome sequence and a genomic insert size smaller than 500 bases. Nucleotide substitutions introduced upon rewriting and sequence optimization of the C. eth-2.0 genome made it possible to unambiguously assign transposon insertions to the native Caulobacter crescentus NA1000 reference genome NC_011916 and chromosome segments of C. eth-2.0. Demultiplexing into the different TnSeq datasets was performed according to the 14 bp long internal transposon barcode sequences adjacent to the transposon end. Upon insertion into a target locus, Tn5 transposition generates a nine-base-pair-long duplication, which was taken into account for subsequent insertion site analysis. Transposon insertion sites were defined according to the first reference base detected immediately after reading outwards of the transposon I-end, which contains an outward-pointing Pxyl promoter [98]. Global insertion site statistics, insertion occurrence and distribution across all annotated features of the reference genomes were calculated with Matlab according to previously described metrics and routines [98,226].
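Filtering criteria ii) and iii) can be illustrated with a simplified single-read sketch. It is not the published pipeline [226]: the function name is hypothetical, the transposon end is assumed to appear in the read in the orientation quoted above, and the insert size is approximated by the length of the sequence remaining after the transposon end rather than by testing for overlap between mate reads.

```python
TN5_END = "GTGTATAAGAGACAG"  # 15 bp Tn5 end sequence quoted in the text
MIN_INSERT = 15              # minimum genomic insert length in bp


def trim_transposon_read(read):
    """Return the genomic sequence following the Tn5 end, or None if the
    read lacks a perfect match to the Tn5 end or the remaining insert is
    shorter than MIN_INSERT (single-read proxy for criterion iii)."""
    idx = read.find(TN5_END)
    if idx == -1:
        return None
    insert = read[idx + len(TN5_END):]
    return insert if len(insert) >= MIN_INSERT else None
```

The first base of the returned sequence corresponds to the first reference base read outwards of the transposon end, which is the position used in the text to define an insertion site.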

Functionality Analysis of C. eth-2.0 Genes

Transposon insertion numbers across protein- and RNA-coding genes were normalized to account for differences in the global number of transposon insertion sites mapped between the different TnSeq measurements. Reference transposon insertion frequencies for native gene copies residing on the Caulobacter chromosome were determined for each gene present on the C. eth-2.0 genome design using the following routine. Essential and semi-essential genes tolerate a low frequency of transposon insertions at sites that are non-disruptive and still permit expression of a functional gene product (5' and 3' portions of essential genes). Transposon hit rates for native genes from a panel of TnSeq measurements across different mero-diploid test strains were linearly fitted against the total number of unique insertions recovered from each individual data set. Based on the linear fit function, the mean baseline insertion frequency and standard deviation under non-complementing conditions were determined for each Caulobacter gene. For the linear fitting procedure, only transposon insertions with unique mapping positions were considered. A Z-score metric was used to quantify the level of functionality of each rewritten C. eth-2.0 gene (functionality Z-score) and to diagnose faulty genes in the genome design according to the following formula:

$$z = \frac{Tn_{\mathrm{compl}} - Tn_{\mathrm{baseline}}}{\sigma} \qquad (4.2)$$

with Tn_compl specifying the normalized number of transposon insertions within a native Caulobacter gene under complementing conditions (in the presence of a C. eth-2.0 chromosome segment that carries rewritten gene copies), Tn_baseline specifying the normalized number of transposon insertions within the native Caulobacter gene according to baseline experiments in the absence of complementation and σ specifying the experimentally determined standard deviation of transposon hits for a given Caulobacter gene under non-complementing conditions. The functionality of rewritten C. eth-2.0 genes was assessed for each gene present on the evolved C. eth-2.0 genome design, with genes that were deleted or partially deleted excluded from the analysis. A total of 42,218 bp of the genome design was deleted upon in vivo adaptation (deletions identified within segments 6, 12, 14, 21, 22, 25 and 38). In addition, we discovered that test strains containing segment 4 were no longer susceptible to conjugation-based transposon mutagenesis. To carry out transposon mutagenesis and assess functionality, a test strain bearing a partially deleted segment 4 (9.4 kb deletion) was used. Deleted regions of the C. eth-2.0 genome were enlarged to exclude partially deleted genes and operons from subsequent functionality assessment. According to these criteria, 89.2% (697,348 bp out of 781,541 bp excluding the pMR10Y backbone) of the C. eth-2.0 genome design was subjected to the TnSeq-based functionality analysis. Genes were classified into functional and faulty C. eth-2.0 genes according to the following criteria. Non-essential and redundant C. eth-2.0 genes were excluded from the functionality analysis. Essential and semi-essential C. eth-2.0 genes with a functionality Z-score equal to or larger than 1.65, corresponding to a p value smaller than 0.05, were classified as functional C. eth-2.0 genes, whereas genes with a functionality Z-score smaller than 1.65 were deemed non-functional.
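The functionality Z-score of Equation 4.2 and the 1.65 cut-off can be sketched as follows. The function names and the example numbers in the usage lines are illustrative, and the one-sided p value is computed from a standard normal distribution, which is an assumption consistent with the stated correspondence between Z = 1.65 and p < 0.05.

```python
import math


def functionality_z(tn_compl, tn_baseline, sigma):
    """Equation 4.2: (Tn_compl - Tn_baseline) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (tn_compl - tn_baseline) / sigma


def one_sided_p(z):
    """Upper-tail probability of a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def classify_gene(z, threshold=1.65):
    """Apply the cut-off from the text: Z >= 1.65 is called functional."""
    return "functional" if z >= threshold else "non-functional"


z = functionality_z(tn_compl=30, tn_baseline=10, sigma=5)  # hypothetical values
print(z, classify_gene(z))
```

A rewritten gene that complements its native copy allows more insertions in the native copy than the baseline, which gives a large positive Z-score.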

Functionality Analysis of C. eth-2.0 Genes Across Gene Categories

C. eth-2.0 genes were grouped according to the clusters of orthologous groups (COG) classification into Metabolism (COG classifiers G, E, F, H, I, P), Cellular processes (COG classifiers D, M, N, O), Transcription (COG classifier K), DNA replication (COG classifier L), Energy production (COG classifier C), Translation (COG classifier J, including tRNAs) and Hypothetical proteins (COG classifier S). P values for enrichment and de-enrichment in functionality of C. eth-2.0 genes were calculated based on a cumulative binomial distribution for the above gene categories, assuming uniform gene functionality across a given gene set.
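A cumulative binomial test of this kind can be sketched with the standard library. The function name is hypothetical, and the choice of the overall fraction of functional genes as the success probability is an assumption about how "uniform gene functionality" was modelled.

```python
from math import comb


def binomial_tail(k, n, p, upper=True):
    """Cumulative binomial probability for X ~ Binomial(n, p).

    upper=True  returns P(X >= k), used to test for enrichment.
    upper=False returns P(X <= k), used to test for de-enrichment.
    """
    ks = range(k, n + 1) if upper else range(0, k + 1)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in ks)
```

For a category with n assessed genes of which k are functional, `binomial_tail(k, n, p)` gives the enrichment p value and `binomial_tail(k, n, p, upper=False)` the de-enrichment p value, where p is the assumed genome-wide fraction of functional genes.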

Fault Diagnosis and Classification of Error Cause

Fault diagnosis of individual faulty C. eth-2.0 genes was performed according to the following classification scheme. Fault causes were grouped into four fault categories: i) mis-annotation, ii) sequence rewriting, iii) synthesis optimization and iv) mutation. Faulty genes that are shorter than annotated, non-existent or lack a functional promoter according to transcriptional start site measurements [197] were assigned to the fault category ‘mis-annotation’. Faulty genes with internal promoters embedded within upstream genes that were erased upon rewriting and faulty genes located downstream of functional genes within operons were assigned to the fault category ‘sequence rewriting’. Faulty genes with synthesis constraints such as type IIS endonuclease sites, hairpins and direct repeats removed within their promoter regions were assigned to the fault category ‘synthesis optimization’. Faulty genes that harboured sequence changes occurring upon synthesis, assembly and transfer to Caulobacter were assigned to the fault category ‘mutation’.


Repair of Faulty Cell Division Genes and β-galactosidase Reporter Assay

DNA sequences covering the promoter regions of ftsZ (CCNA_02623, CETH_02623), ftsQ (CCNA_02625, CETH_02625), murC (CCNA_02629, CETH_02626) and murG (CCNA_02634, CETH_02634) from the native chromosome or the C. eth-2.0 genome, including the first four amino acids of the corresponding protein-coding regions, were PCR amplified and cloned via BglII and SpeI sites into the lacZ reporter plasmid pR9TT using isothermal assembly [75]. Reporter plasmids were conjugated into Caulobacter NA1000 cells, and the β-galactosidase activity of the resulting translational lacZ reporter constructs was assayed in cells using standard ONPG-based assays. The β-galactosidase activities reported represent the average of at least three independent measurements derived from mid-log phase cultures.


4.5 Supplementary: Figures

[Figure 4.6 panels. A: native DNA and rewritten DNA encoding an identical protein. B: circular genome map of C. eth-2.0 (785,701 bp; 133,313 base substitutions), coordinates in kb; radial axis: base changes per segment (0 to 3000), to A, to C, to T, to G. C: erased features [%] (0 to 100), grouped into biological features and synthesis constraints: alternative ORFs, 2822; internal TSS, 1648; internal RBS, 1021; GC content > 80%, 4342; hairpins & repeats, 1233; homopolymers, 93; restriction sites, 1045.]

Figure 4.6: Rewriting of the bacterial designer genome C. eth-2.0. (A) Synonymous rewriting maintains identical protein sequences upon genome rewriting. (B) Genome map of C. eth-2.0 illustrating the number of base substitutions introduced per genome segment (radial bars), with base changes to A (beige), T (grey), C (blue) and G (green) colour coded. (C) Bar chart summarizing the numbers and categories of genome features and DNA synthesis constraints that have been erased upon genome rewriting.


[Figure 4.7 panels. A: cumulative substitutions (0 to 3*10^4) per native base (C, G, A, T), split by substituted base (to G, to C, to A, to T). B: GC content (0 to 1.0) versus genome coordinates [kbp] for C. eth-1.0 and C. eth-2.0. C: amino acid identity (0 to 1.0) versus ORF length [AA]; 676 ORFs maintained, 2822 alternative ORFs removed.]

Figure 4.7: Massive sequence rewriting of the C. eth-1.0 genome. (A) Bar graph summarizing cumulative base substitutions between the C. eth-1.0 genome (design based on native sequences) and the rewritten (synthesis streamlined) C. eth-2.0 genome design. (B) GC content plot comparing the difficult-to-synthesize genome design (C. eth-1.0 genome, grey) and the synthesis-optimized genome design (C. eth-2.0 genome, blue). Regions with low GC content correspond to yeast ARS sequences introduced to facilitate chromosome maintenance and replication in S. cerevisiae strain YJV04. (C) Dot plot of the amino acid sequence identity versus ORF length between C. eth-1.0 and C. eth-2.0. A total of 676 ORFs (blue) were maintained in C. eth-2.0 while 2,822 putative alternative ORFs (grey) were erased upon rewriting.


[Figure 4.8 panels. A: circular map of C. eth-2.0 with assembly junctions 1 to 15, auxotrophic markers (URA3, TRP1, ADE2, HIS3, LEU2, MET14) and ARS elements (ARS209, ARS1323, ARS_Max2, ARS4, ARS727, ARS416, ARS_HI, ARS1018, ARS516, ARS1113, ARS1213). B: amplicon sizes [bp] per junction: 1, 320; 2, 786; 3, 400; 4, 600; 5, 935; 6, 601; 7, 844; 8, 233; 9, 861; 10, 169; 11, 407; 12, 897; 13, 207; 14, 454; 15, 975. C: agarose gel, lanes 1 to 15 flanked by size standard (L) with 1000 bp and 500 bp bands marked.]

Figure 4.8: Confirmation of complete C. eth-2.0 assembly in yeast. (A) Diagnostic PCR was used to verify chromosome assembly. (B) Fifteen amplicons covering assembly junctions between mega-segments (light and dark orange boxes) were designed. Locations of annealing sites for diagnostic primers (blue arrows) are shown. The pMR10Y vector sequence is represented by a black triangle. (C) Size distribution of the PCR amplicons derived from the isolated YJV04 clone 2 bearing the complete C. eth-2.0 chromosome is shown against a size standard (L).


[Figure 4.9 panels. A: sequencing coverage [log10] versus genome coordinates [kb] (0 to 700) of C. eth-2.0 after >20, >40 and >60 generations. B: SEM images of YJV04 and YJV04 + C. eth-2.0.]

Figure 4.9: Cell morphology phenotype and genome stability of yeast cloned C. eth-2.0. (A) Sequencing coverage retrieved upon serial outgrowth indicates stable chromosome maintenance of C. eth-2.0. Sequencing coverage is shown for C. eth-2.0 after serial passage for 20, 40 and 60 generations in yeast. (B) Phenotypic analysis of YJV04 (upper panel) and YJV04 bearing C. eth-2.0 (lower panel) by scanning electron microscopy (SEM). Normal yeast cell morphologies and no phenotypic influence of the C. eth-2.0 chromosome on cellular replication of S. cerevisiae are detected.

[Figure 4.10 panels. Upper: original C. eth-2.0 design. Lower: evolved C. eth-2.0 design. Both: sequencing coverage [log10] versus genome coordinates [kb] (0 to 700).]

Figure 4.10: Stability of C. eth-2.0 chromosome segments upon conjugation into Caulobacter. Sequence coverage plots are shown across the C. eth-2.0 chromosome upon conjugation of individual segments into Caulobacter. Coverage data obtained from sequencing pools of transconjugants of the original C. eth-2.0 genome design (upper panel) and from single isolates (lower panel) evolved for stable maintenance of full-length chromosome segments are shown.


4.6 Supplementary: Tables

Table 4.4: Codon frequency table of the rewritten C. eth-2.0 genome

| 1st base | 3rd base | 2nd base U | 2nd base C | 2nd base A | 2nd base G |
|---|---|---|---|---|---|
| U | C | F 3,231 (6,924) | S 1,925 (1,690) | Y 1,787 (2,481) | C 552 (1,343) |
| U | U | F 4,417 (724) | S 1,192 (285) | Y 2,499 (1,805) | C 956 (166) |
| U | A | L 0 (42)a | S 208 (207) | * 236 (191) | * 321 (201) |
| U | G | L 2 (1,105)a | S 3,644 (4,940) | * 8 (173)a | W 2,526 (2,526) |
| C | C | L 6,042 (3,438) | P 3,194 (3,708) | H 1,388 (2,920) | R 4,205 (9,722) |
| C | U | L 5,207 (1,040) | P 3,002 (601) | H 2,357 (824) | R 5,265 (2,028) |
| C | A | L 1,290 (202) | P 1,709 (289) | Q 4,245 (882) | R 2,287 (428) |
| C | G | L 8,482 (15,196) | P 3,662 (6,969) | Q 2,535 (5,898) | R 2,930 (2,506) |
| A | C | I 4,579 (9,822) | T 4,253 (7,047) | N 4,119 (2,166) | S 3,124 (2,971) |
| A | U | I 5,863 (621) | T 1,111 (242) | N 919 (2,871) | S 164 (164)b |
| A | A | I 107 (107)b | T 1,095 (168) | K 518 (5,275) | R 139 (109)b |
| A | G | M 4,561 (4,560) | T 4,275 (3,276) | K 8,857 (4,100) | R 192 (222)b |
| G | C | V 6,406 (9,013) | A 9,173 (17,959) | D 4,610 (10,183) | G 5,795 (14,411) |
| G | U | V 4,126 (1,173) | A 8,215 (1,845) | D 8,204 (2,630) | G 6,793 (2,170) |
| G | A | V 195 (194)b | A 3,060 (480) | E 8,885 (4,360) | G 3,334 (625) |
| G | G | V 6,309 (6,656) | A 8,798 (8,963) | E 4,655 (9,180) | G 3,076 (1,795) |

* Stop codon.
a Codons erased upon rewriting.
b Rare codons maintained upon rewriting except for removal of forbidden type IIS endonuclease sites.


Table 4.5: Codon substitutions per C. eth-2.0 chromosome segment

| Segment ID | Coordinates | Size [bp] | Codon substitutions (a) | Rewriting % |
|---|---|---|---|---|
| seg 1 | 1..22276 | 22,276 | 3,861 (6,387) | 60.45 |
| seg 2 | 22157..41386 | 19,230 | 3,451 (5,351) | 64.49 |
| seg 3 | 41267..60572 | 19,306 | 3,475 (5,570) | 62.39 |
| seg 4 | 60451..80089 | 19,639 | 3,484 (5,747) | 60.62 |
| seg 5 | 79966..101065 | 21,100 | 3,550 (5,946) | 59.70 |
| seg 6 | 100946..122064 | 21,119 | 3,560 (5,883) | 60.51 |
| seg 7 | 121944..142404 | 20,461 | 927 (5,585) | 16.60 |
| seg 8 | 142293..161370 | 19,078 | 1,039 (5,121) | 20.29 |
| seg 9 | 161367..182611 | 21,245 | 1,441 (5,731) | 25.14 |
| seg 10 | 182491..202737 | 20,247 | 1,472 (5,086) | 28.94 |
| seg 11 | 202617..223202 | 20,586 | 2,176 (6,265) | 34.73 |
| seg 12 | 223083..245967 | 22,885 | 3,766 (6,364) | 59.18 |
| seg 13 | 245848..266880 | 21,033 | 3,449 (5,892) | 58.54 |
| seg 14 | 266762..288129 | 21,368 | 3,816 (6,211) | 61.44 |
| seg 15 | 288009..309977 | 21,969 | 4,127 (6,830) | 60.42 |
| seg 16 | 309856..332605 | 22,750 | 4,205 (6,935) | 60.63 |
| seg 17 | 332486..351749 | 19,264 | 3,449 (5,706) | 60.45 |
| seg 18 | 351627..374062 | 22,436 | 3,897 (6,621) | 58.86 |
| seg 19 | 373942..391434 | 17,493 | 3,005 (5,025) | 59.80 |
| seg 21 | 391316..413535 | 22,220 | 3,933 (6,873) | 57.22 |
| seg 22 | 413414..434554 | 21,141 | 3,731 (6,650) | 56.11 |
| seg 23 | 434431..456204 | 21,774 | 3,657 (6,244) | 58.57 |
| seg 24 | 456085..476452 | 20,368 | 3,597 (5,813) | 61.88 |
| seg 25 | 476332..496786 | 20,455 | 3,494 (5,693) | 61.37 |
| seg 26 | 496667..518097 | 21,431 | 3,344 (5,465) | 61.19 |
| seg 27 | 517978..539588 | 21,611 | 3,536 (5,898) | 59.95 |
| seg 28 | 539466..559225 | 19,760 | 3,842 (6,188) | 62.09 |
| seg 29 | 559105..578007 | 18,903 | 2,066 (4,755) | 43.45 |
| seg 30 | 577887..597167 | 19,281 | 3,072 (5,322) | 57.72 |
| seg 31 | 597046..617173 | 20,128 | 4,754 (5,535) | 85.89 |
| seg 32 | 617051..638739 | 21,689 | 3,672 (6,079) | 60.40 |
| seg 33 | 638620..659188 | 20,569 | 3,143 (5,268) | 59.66 |
| seg 34 | 659068..681646 | 22,579 | 4,122 (6,711) | 61.42 |
| seg 35 | 681526..702398 | 20,873 | 2,775 (4,475) | 62.01 |
| seg 36 | 702278..725152 | 22,875 | 4,001 (6,693) | 59.78 |
| seg 37 | 725032..748644 | 23,613 | 4,310 (7,103) | 60.68 |
| seg 38 | 748524..773851 | 25,328 | 4,573 (7,723) | 59.21 |
| Total codons substituted in C. eth-2.0 | | | 123,562 (220,350) | 56.08 |

a Total numbers of codons are indicated in brackets.


Table 4.6: Non-synonymous mutations introduced during the build process

Segment  Mutation (a)    Stage (b)  Gene annotation, SNP effect (c)
seg 7    128065 G to C   assembly   CETH 00531, LSU ribosomal protein L12P, K127N
seg 7    135815 G to A   synthesis  CETH 00537, RNAP beta’ chain, E1030K
seg 8    149437 G to T   synthesis  CETH 00735, beta-barrel assembly protein BamF, terminator
seg 8    150347 T to C   synthesis  CETH 00736, lipoprotein signal peptidase, I109V
seg 8    152937 T to C   synthesis  CETH 00737, isoleucyl-tRNA synthetase, D264G
seg 8    156211 C to T   synthesis  CETH 00749, ferrous iron transport protein B, T258M
seg 8    161242 G to A   synthesis  CETH R0015, tRNA-Val, 63g>a
seg 9    175114 C to T   synthesis  CETH 00892, phosphoenolpyruvate-protein, S276P
seg 10   185224 C to T   synthesis  CETH 00944, flagellar hook length protein, P337S
seg 10   190540 G to A   synthesis  CETH 01001, ribokinase, R22C
seg 10   193501 C to T   synthesis  CETH 01060, type I secretion protein RsaD, A486P
seg 10   201452 C to T   synthesis  CETH 01104, glycosyltransferase, A100T
seg 11   208036 T to C   synthesis  CETH 01210, nucleotidyltransferase protein, P7S
seg 11   209021 G to A   synthesis  CETH 01211, MobA-like NTP transferase, V265P
seg 11   212793 A to G   synthesis  CETH 01214, YjgP/YjgQ membrane permease, L120P
seg 24   470251 6 bp ∆   synthesis  CETH 02177, hypothetical protein, terminator
seg 29   577864 C to T   synthesis  CETH 02934, conserved hypothetical protein, terminator
seg 30   589432 T to C   synthesis  CETH 03026, two-component regulator PetR, V182T
seg 31   616412 C to T   assembly   CETH 03305, SSU ribosomal protein S7P, D15N
seg 33   649119 G to T   assembly   CETH 03469, arginyl-tRNA synthetase, P597H
seg 38   752493 C to T   assembly   CETH 03833, acyl-carrier protein FabI, M215I

(a) Positions of sequence alterations are shown together with the nucleotide change.
(b) Stage of the build process at which the nucleotide change occurred.
(c) Gene IDs are provided together with the functional annotation and amino-acid changes.


Table 4.7: Conjugational transfer frequency of C. eth-2.0 chromosome segments

Segment ID               Genome coordinates  Size [bp]  Frequency [log10] (a)  Size of toxic DNA part [bp] (b)
seg 1                    1..22276            22,276     -4.72±0.05             948±105
seg 2                    22157..41386        19,230     -1.05±0.05             NA
seg 3                    41267..60572        19,306     -2.06±0.16             NA
seg 4                    60451..80089        19,639     -1.68±0.1              NA
seg 5                    79966..101065       21,100     -2.33±0.08             NA
seg 6                    100946..122064      21,119     -4.63±0.11             1,173±262
seg 7                    121944..142404      20,461     -3.95±0.09             5,576±1050
seg 8                    142293..161370      19,078     -0.83±0.07             NA
seg 9                    161367..182611      21,245     -0.95±0.01             NA
seg 10                   182491..202737      20,247     -1.12±0.08             NA
seg 11                   202617..223202      20,586     -0.96±0.08             NA
seg 12                   223083..245967      22,885     -4.95±0.03             564±33
seg 13                   245848..266880      21,033     -4.72±0.07             965±152
seg 14                   266762..288129      21,368     -1.96±0.06             NA
seg 15                   288009..309977      21,969     -1.83±0.03             NA
seg 16                   309856..332605      22,750     -4.56±0.07             1,394±195
seg 17                   332486..351749      19,264     -4.67±0.02             1,061±53
seg 18                   351627..374062      22,436     -2.03±0.14             NA
seg 19                   373942..391434      17,493     -4.40±0.32             2,006±1043
seg 21                   391316..413535      22,220     -4.64±0.02             1,138±60
seg 22                   413414..434554      21,141     -1.55±0.10             NA
seg 23                   434431..456204      21,774     -1.41±0.08             NA
seg 24                   456085..476452      20,368     -1.35±0.10             NA
seg 25                   476332..496786      20,455     -1.90±0.07             NA
seg 26                   496667..518097      21,431     -1.75±0.05             NA
seg 27                   517978..539588      21,611     -4.57±0.01             1,341±37
seg 28                   539466..559225      19,760     -1.39±0.12             NA
seg 29                   559105..578007      18,903     -0.77±0.02             NA
seg 30                   577887..597167      19,281     -0.72±0.06             NA
seg 31                   597046..617173      20,128     -2.00±0.10             NA
seg 32                   617051..638739      21,689     -1.17±0.16             NA
seg 33                   638620..659188      20,569     -1.39±0.07             NA
seg 34                   659068..681646      22,579     -1.28±0.12             NA
seg 35                   681526..702398      20,873     -4.41±0.16             1,968±602
seg 36                   702278..725152      22,875     -1.43±0.05             NA
seg 37                   725032..748644      23,613     -1.93±0.02             NA
seg 38                   748524..773851      25,328     -4.82±0.03             767±57
pMR10Y (vector control)                                 -0.70±0.07             NA
Cumulative size of toxic DNA parts                                             18,903±3,650
Estimated number of toxic DNA parts (c)                                        14

(a) Log10-transformed conjugational transfer frequency of a given segment relative to the transfer rate of the empty pMR10Y plasmid.
(b) Estimated size of toxic DNA parts based on an effective mutation rate of 10^-7.
(c) Estimated total number of toxic DNA parts.
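The footnotes do not spell out how the size estimate follows from the transfer frequency. One reading that reproduces the tabulated sizes to within a few percent is to normalize each segment's frequency to the vector control and divide by the effective mutation rate. This normalization is an assumption inferred from the numbers, not a formula stated in the text, and the function name below is hypothetical. A minimal Python sketch:

```python
# Sketch only: the normalization to the vector control is an assumption
# inferred from the tabulated values, not a formula given in the text.

EFFECTIVE_MUTATION_RATE = 1e-7   # from footnote (b) of Table 4.7
VECTOR_LOG10_FREQ = -0.70        # pMR10Y vector control, from Table 4.7


def toxic_part_size_bp(log10_freq,
                       vector_log10_freq=VECTOR_LOG10_FREQ,
                       mutation_rate=EFFECTIVE_MUTATION_RATE):
    """Estimate the size of a toxic DNA part in bp (hypothetical helper)."""
    # Transfer frequency of the segment relative to the empty vector.
    relative_freq = 10 ** (log10_freq - vector_log10_freq)
    # Number of positions that would need to be hit at the given rate.
    return relative_freq / mutation_rate


if __name__ == "__main__":
    # Log10 frequencies taken from Table 4.7.
    for seg, log10_freq in [("seg 1", -4.72), ("seg 12", -4.95), ("seg 38", -4.82)]:
        print(seg, round(toxic_part_size_bp(log10_freq)), "bp")
```

Under this reading, segments that transfer about as well as the vector control carry no estimate (NA), because no inactivating mutation is needed for transfer.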


Table 4.8: List of identified toxic C. eth-2.0 genes

Segment  Gene ID             Annotation                              Mutation type
seg 1    CETH 00005          DnaQ DNA pol III ε subunit              Chrom: Val99 to Val (GTC to GTA)
seg 6    CETH 00390          WaaF, LPS heptosyltransferase (a)       ∆ 105919:113454
         CETH 00459          NhaA, Na+/H+ antiporter (a)
seg 7    CETH 00537          RpoC, RNApol β’ (b)                     Partial reversion to native sequence
seg 12   CETH 01304 – 1323   Ribosomal protein operon (a)            ∆ 235844:236985
seg 13   CETH 01342          RarA, recombination protein (c)         His331 to Arg (CAC to CGC)
seg 16   CETH 01737          DnaB replicative DNA helicase (d)       Val266 to Gly (GTG to GGG)
seg 17   CETH 01760          LptD, LPS-assembly protein (a)          Gln202 to Met (CTG to ATG)
seg 19   CETH 01961          AccC, Biotin carboxylase (e)            ISCc3 insertion 383675:383676
seg 21   CETH 01986 – 1992   LPS synthesis gene cluster (a)          ∆ 395165:403283
seg 27   CETH 02535          TopA, DNA topoisomerase I (a)           Lys70 to Lys (AAA to AAG)
seg 35   CETH 03650          MurU, Mannose-1-P guanylyltransferase   Leu55 to Gly (GTC to GGC)
seg 38   CETH 03835          parS (f), FabB, oxoacyl synthase        Gly298 to Gly (GGT to GGC)

(a) Toxicity demonstrated in E. coli ([201], PanDaTox database [228]).
(b) Change in RNApol β and β’ stoichiometry results in growth arrest in E. coli [202].
(c) Toxicity demonstrated in E. coli [203].
(d) Toxicity demonstrated in E. coli due to an imbalance in DnaB:DnaC stoichiometry [204].
(e) Toxicity demonstrated in E. coli due to an imbalance in AccC:AccB stoichiometry [205].
(f) Toxicity of an extra copy of the centromeric parS site has been reported for Caulobacter [229].

Table 4.9: Deletions within C. eth-2.0 chromosome segments in Caulobacter

Segment ID  Deletion coordinates  Size [bp]
seg 6       105919:113454         7,537
seg 12      235844:236985         1,143
seg 14      276227:279173         2,948
seg 21      395165:403283         8,120
seg 22      419856:420221         367
seg 25      482128:491570         9,444
seg 38      761193:773851         12,659
Cumulative size of deletions      42,218

Chapter 5

YestroSens, a Field-Portable S. cerevisiae Biosensor Device for the Detection of Endocrine-Disrupting Chemicals: Reliability and Stability

5.1 Abstract

Farming, industry and urbanization lead to increases in the concentrations of potentially harmful compounds in waste, surface and drinking waters. One example of such pollutants is the estrogens, the steroidal female reproductive hormones. Already at a few nanograms per litre, these hormones can trigger endocrine disruption and cause acute and chronic health problems in humans and wildlife. Here, we present a Saccharomyces cerevisiae estrogen biosensor capable of detecting estradiol, as well as ethinylestradiol, at concentrations of 1 nM. After an initial characterization of the sensor strain performance in an optimal laboratory setting, we focused on developing a biosensor device. We addressed current limitations of biosensors, such as the requirement of the cells for a liquid growth matrix, the controlled storage conditions required to preserve cell viability, and the bulky and expensive laboratory equipment that is usually required. Our study provides significant new insights into the field of applied biosensors. The system presented in this work takes microorganism-based analytics one step closer to field application in decentralized locations.


5.2 Preface

The project described in this chapter aimed at the application of a versatile and inexpensive biosensor platform [230] for the analysis of endocrine-disrupting chemicals in aqueous samples. The platform consisted of two main components. The biological component of the sensor was based on an S. cerevisiae strain, which has previously been engineered to study transcription factors upon estradiol induction [173]. In the context of this project, the inducibility of the strain was used in a sensor application. The material component of the platform consists of polymeric discs, which can be produced at high throughput and low cost, enable storage and facilitate cell culture.

This project was realized in collaboration with N. Lobsiger from the D-CHAB, ETH Zürich. As the biologist in this interdisciplinary collaboration, I was responsible for the biological components of this project. At first, I defined which strains were potential candidates for the biosensor. There were several different candidates in the literature, some of which I had previous experience with. After choosing the most suitable strain for the platform, I was responsible for providing optimally grown and healthy biosensor cultures for experimentation. After each round of experiments, I was responsible for optimizing the experimental conditions from a biological perspective (i.e. growth media, growth time, cryo-protectants). I was also responsible for devising strategies to improve the biosensor in the future. The conceptualization and management of the project, as well as the manuscript writing and revision, were carried out together with N. Lobsiger.

This project demonstrates an approach to combine chemical and biological engineering in order to make biological sensor research accessible. With advances in the bioengineering of more complex pathways, it will be possible to create a high number of complex biosensor constructs. As a consequence, biological systems with de novo synthesized designer pathways have the potential to complement laborious chemical analysis.


5.3 YestroSens, a Field-Portable S. cerevisiae Biosensor Device for the Detection of Endocrine-Disrupting Chemicals: Reliability and Stability

Nadine Lobsiger, Jonathan E. Venetz, Michele Gregorini, Matthias Christen, Beat Christen and Wendelin J. Stark

Published in Biosensors and Bioelectronics, 2019, 146

Introduction

Human activities such as farming, industry and urbanization lead to increases in the concentrations of potentially harmful compounds in water. This study is concerned with the detection of estrogens, the steroidal female reproductive hormones mainly produced by humans and livestock. Estrogens are composed of 18 carbon atoms arranged in three hexagonal rings and one pentagonal ring. The compounds of particular interest are either of natural origin, such as estradiol (E2), estrone (E1) or estriol (E3), or of synthetic origin, such as the oral contraceptive 17α-ethinylestradiol (EE2) [231]. Apart from increased estrogen concentrations in sewage treatment plants, the manure from livestock, which is spread on land, represents a major source of steroids in soil and water [232]. Already at a very low concentration of a few nanograms per litre, these molecules are known to trigger endocrine disruption (EDCs, endocrine-disrupting chemicals) and cause acute and chronic health problems in humans and wildlife [233, 234]. The high estrogenic potency of natural hormones is caused by their high affinity for binding to nuclear estrogen receptors (ERs) [231]. Concentrations and exact effects vary depending on species and developmental stage. For example, in wild roach (Rutilus rutilus) an EE2 concentration as low as 0.5 ng/L already significantly affects gene expression during early development. In such aquatic species, continuous long-term exposure to EE2 at low concentrations (sub ng/L to ng/L) has been reported to change gene expression and induce intersex and reproduction deficiencies [235].

There is a growing demand for reliable analytic methods for EDC detection at sub-nanomolar concentrations in environmental matrices. In laboratories, LC-MS/MS is routinely employed to detect the presence of EDCs, and cell-based bioassays are used to detect and quantify biological activity.
The main drawbacks of laboratory-based assays are the laborious sample preparation involving extraction and pre-concentration, as well as expensive analytic devices. To globally monitor the spreading of environmental pollution and raise awareness of the ecosystem-disrupting consequences of EDCs, there is an increased need for on-site analysis of EDC levels. Ideally, there would exist a low-cost, facile assessment of the pollution level employing a first-line bioactivity monitoring. Upon confirmation of the presence of EDCs, the samples could be sent to a laboratory for a more thorough analysis by LC-MS/MS.

Figure 5.1: (A) Schematic representation of the mechanism of action of the LexA-ER-AD system, with ER being the hormone-binding domain of the human estrogen receptor and AD the activation domain. Adapted from [173]. (B) Schematic visualizing environmental sampling followed by pre-concentration using solid-phase extraction (SPE). The dried sensor material is rehydrated, incubated and induced with the environmental sample. The sensor material subsequently detects EDCs and shows a dose-dependent fluorescent response.

In the discipline of biomonitoring, living organisms and their physiological responses to stimuli are used to determine changes in an environment such as water, air or soil. Microbiologists, synthetic biologists and ecotoxicologists have collaborated and constructed bioreporter systems in pro- and eukaryotes. In whole-cell biosensors, a target compound interacts with the cellular biorecognition element and a physicochemical transducer converts the interaction into a measurable signal. A comprehensive overview of different bacterial bioreporter designs is provided by van der Meer and Belkin [236]. Biosensor systems can be classified according to their detection systems, and a distinction is made between amperometric, colourimetric, luminometric and fluorometric biosensors [237]. Most previously reported whole-cell biosensors are based on bacterial cells due to short cultivation times and thorough genetic characterization. However, due to their eukaryotic and robust nature, the interest in Saccharomyces cerevisiae (S. cerevisiae) as a biosensor organism has increased [238]. It is hypothesized that eukaryotic cells might even provide information on the effects of direct relevance to other eukaryotes [239]. Additionally, the genome of S. cerevisiae is completely sequenced, and strategies as well as vectors to genetically modify it are readily available [169, 240, 241].

Yeast cells that respond to a broad range of different molecules have been engineered and reported in the literature [242–244]. With the intrinsic ability of S. cerevisiae to correctly express transfected vertebrate nuclear receptors and the subsequent generation of transgenic yeast strains, yeast biosensors targeting EDCs have become particularly prominent [237, 245, 246]. For example, the strain of S. cerevisiae used in this study does not naturally react to estrogenic compounds. Ottoz et al. [173] have engineered it to use E2 as the inducer of a transcription factor. A fusion between the bacterial LexA gene and the human estrogen receptor (ER) [247] was stably integrated into the genome. Upon binding of E2 to the ER, LexA can bind to specific LexA-binding sites, inducing the expression of a downstream reporter gene such as citrine [248]. With a higher number of LexA-binding sites, more LexA can bind upstream of the promoter and thus the resulting expression is stronger. The system was developed to precisely control gene expression in biotechnological applications [173]. The stable genetic integration of the gene fusions into the genome of the yeast cell is an advantage, as it makes antibiotic selection obsolete. The absence of antibiotics facilitates the usage of the system, as it reduces the handling steps and the use of thermally fragile compounds [249] necessary for cell cultivation. Several yeast-based in vitro assays for estrogens have been developed, and they are commonly known under the name yeast estrogen screen (YES). Several studies have assessed estrogenic potency in waste, surface and drinking waters employing YES [250–254]. Furthermore, standardization and validation methods that are essential for regulatory acceptance when moving from research towards practical monitoring have been described [237]. A recent example of a yeast-based biosensor that is routinely employed in a standardized fashion is the ISO-standardized assay for the determination of the estrogenic potential of water and wastewater (ISO 19040-1:2018).

Lateral flow assays, such as a home pregnancy test, are the prime example of market-implemented, standardized and validated point-of-care devices. We observe that these systems require minimal sample preparation and no instrumentation. However, comparing these devices to currently existing whole-cell biosensors, the following limitations have been identified: 1) Cell embedding matrix enabling long-term ambient temperature storage while preserving cell activity and viability, 2) Scalability of the sensor material production, 3) Laborious pre-analytical steps such as extraction and concentration, 4) Analyte toxicity on sensor cells, 5) Lack of on-site application and user-friendliness and 6) Compliance with the national and international legislation controlling the application of genetically modified organisms.

A polymeric material used to entrap the yeast cells should permit diffusion of molecules, be non-toxic and biodegradable, and provide good mechanical stability [236]. For example, Vopalenska et al. [255] have presented the immobilization of yeast cells in alginate beads for the detection of copper ions. Similarly, the enclosure of sensor cells in agarose for the detection of diclofenac in wastewater has been described [256]. All these requirements are envisioned to facilitate on-site handling steps and to ensure cell viability, sensor stability and reproducibility. In this work, we report a modified version of the Lentikats system using polyvinyl alcohol to create favourable microenvironments for the sensor cells [257]. The resulting biocompatible hydrogel matrix can be manufactured at low cost to facilitate scalability of the sensor material production. Combining the matrix with lyophilization, which preserves yeast cells in a dormant state and facilitates long-term storage, addresses the aforementioned limitations 1 and 2. The final device will be designed in a way that prevents direct contact of the biosensor matrix with natural water bodies to avoid escape and contamination of the environment. Different methods have been described to address limitation 3, which is concerned with laborious pre-analytical steps such as extraction and concentration.
Most methods to extract estrogens from environmental samples for bioassay analysis rely on solid-phase extraction, and the components are commercially available as kits, which greatly facilitates their use [250, 258–262]. Heub et al. presented a platform to perform SPE for the detection of E2 in environmental water samples by immunoassays. The system is fully automated, portable, user-friendly and efficient for the pre-concentration of E2 at ng/L levels in water samples and immunoassays [263]. In this work, the challenge of extraction was omitted, as artificially prepared estrogen samples were used to prove the concept of sensor cell performance in liquid culture, within the material and after lyophilization and storage. To improve on-site applicability and user-friendliness, we present an integrated smartphone-compatible sensor device based on a fluorescent reporter system, avoiding yeast cell lysis and the addition of temperature-sensitive external substrates. Combining the sensor material enclosed in a device with a smartphone equipped with a custom-designed filter set for fluorescence detection, we can substitute bulky and expensive bench-top instrumentation with a simple chip. Recently, Cevenini et al. [264] have presented an example of a cell-based mobile platform employing a novel bioluminescent yeast-estrogen screen (nanoYES) in combination with a low-cost compact camera as the light detector. Using such a camera instead of a smartphone increases the applicability, as it can connect wirelessly with any smartphone model. Lopreside et al. [265] have recently presented a smartphone-based device to monitor bioluminescence in an upgraded version of the widely used yeast estrogen screening (YES) assays. This paper demonstrates the potential integration of cell cultivation, sensor function and signal detection within a stimulus-responsive material incorporated into a field-portable device. Further, it shows that the handling steps are minimized to facilitate the application by an untrained user in a resource-limited setting. The limitation concerning the legislation has not been addressed in this work, as all experiments were conducted within a laboratory setting. To facilitate the prospective use of the sensor device outside of a laboratory, yeast, a well-characterized and non-pathogenic organism, was used.

Material and Methods

Chemicals and reagents

Unless stated otherwise, all chemicals were purchased from Sigma Aldrich (St. Louis, MO, US). The H2O used was MQ-H2O.

Strains and their Construction

The strains used in this study have all been constructed and published by Ottoz et al. [173]. All strains were kindly gifted by Fabian Rudolf. The positive control (PC) corresponds to strain FRY757, the negative control (NC) to strain FRY11 and the sensor strain is FRY666.

Microbial Growth Conditions

Standard culturing conditions were used to grow microbial strains. S. cerevisiae strains were cultured in synthetic defined medium (SD) [79, 266] at 30 °C. The SD-Full medium contained all the amino acids essential for S. cerevisiae growth. For agar plates, 20 g/L granulated agar (Difco) was added to the SD medium. "High-density cultures" describes cultures that have reached stationary phase, where the rate of cell death equals the rate of cell division and the optical density of the suspension no longer changes.


Estrogen Induction

A 10 mM stock of E2 in absolute ethanol was prepared. Dilutions of 10, 1, 0.1 and 0.001 µM were prepared subsequently [173].

Alternative Inducers

Stock solutions at a concentration of 10 mM of EE2, E1 and E3 were prepared followed by subsequent dilutions to working solutions of 10, 1, 0.1 and 0.001 µM.
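The dilution factors implied by these working solutions follow from C1·V1 = C2·V2. The short Python sketch below computes them for the 10 mM stock; the helper names and the 1000 µL final volume are illustrative assumptions, not part of the protocol.

```python
# Sketch of the dilution arithmetic (C1 * V1 = C2 * V2).
# Helper names and the 1000 µL final volume are illustrative assumptions.

STOCK_MM = 10.0                        # stock concentration in mM (from the text)
WORKING_UM = [10.0, 1.0, 0.1, 0.001]   # working solutions in µM (from the text)


def dilution_factor(stock_mM, target_uM):
    """Fold dilution needed to go from the stock to the target concentration."""
    return (stock_mM * 1000.0) / target_uM   # 1 mM = 1000 µM


def stock_volume_uL(stock_mM, target_uM, final_volume_uL):
    """Volume of stock required for a direct, single-step dilution."""
    return final_volume_uL / dilution_factor(stock_mM, target_uM)


if __name__ == "__main__":
    for target in WORKING_UM:
        factor = dilution_factor(STOCK_MM, target)
        volume = stock_volume_uL(STOCK_MM, target, 1000.0)
        print(f"{target} µM: {factor:.0f}-fold dilution, {volume:.6f} µL stock per 1000 µL")
```

The stock volumes for the most dilute solutions are far too small to pipette directly, which is why such solutions are typically prepared as subsequent (serial) dilutions.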

Lyophilization

Cell cultures were prepared for lyophilization by removing culture medium (centrifugation, 5 min, 2543 rcf). The supernatant was discarded and replaced by an equal volume of a sterile 5% sodium glutamate solution. The resuspended cells were incubated for 5 min before freezing. Frozen samples were immediately transferred to the lyophilizer unit at -49 °C and 0.007 mbar and dried for 16 h.

Storage

Tubes containing lyophilized samples were sealed with parafilm and stored at room temperature until use.

Assessment of viability after lyophilization

Viability Analysis in Liquid Cultures

Viability of the lyophilized yeast strains was assessed by rehydrating the cultures in the same volume of nutrient medium from which they were lyophilized. Subsequently, the cells were diluted 250x in fresh nutrient medium, transferred to the wells of a 96-well plate and inducer added in the corresponding concentrations. Growth and induction curves were recorded with the microplate reader.

Viability Analysis in Liquid Cultures after Storage

The tubes containing the lyophilized cells were stored at room temperature until use and rehydrated as described in the preceding section (Viability Analysis in Liquid Cultures). Growth and induction curves were recorded with the microplate reader. As microbial growth can be approximated with a logistic function, the following equation was used and fitted with the least-squares method:

\[ f(x) = \frac{L}{1 + e^{-k(x - x_0)}} \qquad (5.1) \]


A threshold value of OD600 = 0.2 was defined. The cycle number at which the threshold was crossed was taken as a measure of the viability of the cells after lyophilization, as it reflects the duration of the lag phase. The growth curves were evaluated by this method at different time points.
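The threshold evaluation can be sketched in Python as follows. For a fitted curve of the form of Equation 5.1, the crossing point has the closed form x_T = x0 - ln(L/T - 1)/k. The fit shown here is a simplified stand-in for the least-squares procedure: it assumes the plateau L is known and fits k and x0 by linear regression on the logit-transformed data, which is not necessarily the procedure used in this work. All function names and the synthetic parameter values are hypothetical.

```python
import math

# Sketch only. The linearized fit (fixed plateau L, logit transform) is a
# simplification swapped in for a full nonlinear least-squares fit.


def logistic(x, L, k, x0):
    """Equation 5.1: L / (1 + exp(-k * (x - x0)))."""
    return L / (1.0 + math.exp(-k * (x - x0)))


def fit_logistic_linearized(xs, ys, L):
    """Least-squares estimate of k and x0 for an assumed, fixed plateau L.

    Uses ln(L / y - 1) = -k * x + k * x0, which is linear in x.
    Points with y <= 0 or y >= L are skipped (transform undefined).
    """
    pts = [(x, math.log(L / y - 1.0)) for x, y in zip(xs, ys) if 0.0 < y < L]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in pts)
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    k = -slope
    x0 = intercept / k
    return k, x0


def threshold_cycle(L, k, x0, threshold=0.2):
    """Cycle at which the fitted curve crosses the threshold (0 < threshold < L)."""
    return x0 - math.log(L / threshold - 1.0) / k


if __name__ == "__main__":
    # Synthetic growth curve with hypothetical parameters (one cycle = 30 min).
    cycles = list(range(0, 60))
    od = [logistic(c, 1.0, 0.4, 30.0) for c in cycles]
    k, x0 = fit_logistic_linearized(cycles, od, L=1.0)
    crossing = threshold_cycle(1.0, k, x0, threshold=0.2)
    print(f"k = {k:.3f}, x0 = {x0:.2f}, threshold crossed at cycle {crossing:.1f}")
```

A later crossing cycle corresponds to a longer lag phase, so comparing this value between fresh and lyophilized cultures gives a single number per growth curve.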

Yeast Spotting Assay

Spotting assays were used to visualize the survival of yeast after lyophilization. The colonies were grown overnight to full density and serially diluted five times at a 1:10 ratio in a 96-well plate (Thermo Fisher, Waltham, MA, US). Alternatively, lyophilized dense cultures were used and treated the same way as the fresh cultures. To create the spots, 3.5 µL of the diluted cells were pipetted onto SD-Full plates. This theoretically resulted in between 3 and 5 cells in the most dilute spot. Following incubation of the serial dilutions, the plates were visually inspected and cell counts of the spots were compared. Spots of a similar count were identified and the mortality rate was determined under consideration of the dilution factor.
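The arithmetic behind the spotting assay can be sketched in Python. The starting density of 1e8 cells/mL is an assumption chosen because it is consistent with the 3 to 5 cells expected in the most dilute spot; it is not a measured value from this work, and the function names are hypothetical.

```python
# Sketch of the spotting-assay arithmetic. The starting density of
# 1e8 cells/mL is an assumption, not a value reported in the text.


def cells_per_spot(density_cells_per_mL, spot_volume_uL=3.5,
                   n_dilutions=5, dilution_step=10):
    """Expected number of cells in a spot after n serial dilution steps."""
    dilution = dilution_step ** n_dilutions
    return density_cells_per_mL * (spot_volume_uL / 1000.0) / dilution


def survival_fraction(fresh_dilution_exp, lyo_dilution_exp, dilution_step=10):
    """Survival estimate when a lyophilized spot at dilution 10**lyo shows a
    colony count similar to a fresh spot at dilution 10**fresh."""
    return dilution_step ** (lyo_dilution_exp - fresh_dilution_exp)


if __name__ == "__main__":
    print(cells_per_spot(1e8))          # expected cells in the most dilute spot
    print(survival_fraction(5, 1))      # spots matching four dilution steps apart
```

In this reading, a lyophilized spot that matches a fresh spot four 1:10 dilution steps further down corresponds to a survival of about 1 in 10^4.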

Microplate Reader

Growth of yeast cultures was monitored by measuring the optical density at a wavelength of 600 nm. GFP fluorescence (excitation = 488 nm, emission = 530 nm) was measured from the bottom at 30 °C in a 96-well plate with a microplate reader (TECAN, Spark). The data were initially analyzed in Excel (Office 2010, Microsoft).

Preparation of PEG-PVA Hydrogel

Solutions of poly(vinyl alcohol) (PVA, Mowiol 28-99; Mw 145,000 g/mol) at 10% (w/w) were prepared by dissolving PVA in tap water at 80 °C, followed by repeated heating to 80 °C and stirring on a magnetic stirrer plate. Once the PVA had dissolved, polyethylene glycol 600 (PEG, abcr GmbH) was added at 6% (w/w) while continuously heating and stirring [257]. The hydrogel solution was stored at room temperature until use.

Material Production

The biosensor material containing the yeast strains was produced using high-density cultures. After growth, 5 mL of cells were harvested by centrifugation (5 min, 2543 rcf). The supernatant was discarded, the pellet was resuspended in 500 µL of sterile 5% sodium glutamate solution, and the suspension was incubated for 5 min before it was added to 9.5 mL of the previously described PEG-PVA hydrogel [230]. To ensure homogeneous mixing, material and cells were mixed in a SpeedMixer DAC 150 FVZ (Hauschild Engineering, Germany) at 2000 rpm for 10 s. The homogeneously mixed hydrogel was applied to the UV-sterilized aluminium mould (10x10x0.1 cm) and frozen at -20 °C for 1 h. Frozen samples were immediately transferred to the lyophilizer unit (CHRIST ALPHA 1-2, Germany) at -49 °C and 0.007 mbar and dried for 16 h.

Material Cost Analysis

Producing 1 m² of sensor material requires 1 L of PVA-PEG hydrogel solution (60 g PEG, 100 g PVA). Polyethylene glycol 600 is available at CHF 60.50/kg and the PVA (Mowiol® 28-99; Mw 145,000) at CHF 128.40/kg. The material cost of 1 L of hydrogel solution therefore amounts to CHF 16.40.
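The cost per litre can be retraced from the quoted masses and prices with a few lines of Python. The function name is hypothetical; with the inputs exactly as quoted, the polymers sum to roughly CHF 16.47 per litre, close to the stated figure.

```python
# Sketch of the material cost arithmetic using the masses and prices
# quoted in the text. The function name is hypothetical.


def hydrogel_cost_chf_per_litre(peg_g=60.0, pva_g=100.0,
                                peg_chf_per_kg=60.50, pva_chf_per_kg=128.40):
    """Polymer cost of 1 L hydrogel solution (water cost neglected)."""
    peg_cost = peg_g / 1000.0 * peg_chf_per_kg
    pva_cost = pva_g / 1000.0 * pva_chf_per_kg
    return peg_cost + pva_cost


if __name__ == "__main__":
    cost = hydrogel_cost_chf_per_litre()
    # 1 L of hydrogel yields 1 m² of sensor material, so this is also the cost per m².
    print(f"CHF {cost:.2f} per litre of hydrogel")
```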

Data Analysis

All experiments were carried out in biological triplicates and data was analyzed with Origin (OriginLab Corporation). For the evaluation of the sensor performance within the hydrogel material, a two-sample t-test for unequal variance was conducted with an assumed alpha level of 0.05.
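For reference, the two-sample t-test for unequal variances (Welch's test) can be written out in a few lines of Python. This sketch computes the t statistic and the Welch–Satterthwaite degrees of freedom only; obtaining the p-value for comparison with the alpha level of 0.05 requires a t-distribution, which a statistics package such as Origin provides. The triplicate values in the example are invented for illustration.

```python
import math
from statistics import mean, variance

# Sketch of Welch's t-test. The example triplicates are hypothetical,
# not data from this work.


def welch_t_test(a, b):
    """Return (t, df) for a two-sample t-test with unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)          # sample variances (n - 1)
    se2 = va / na + vb / nb                    # squared standard error
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch–Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    induced = [5.1, 5.6, 4.9]      # hypothetical fold changes, induced discs
    uninduced = [1.0, 1.2, 0.9]    # hypothetical fold changes, control discs
    t, df = welch_t_test(induced, uninduced)
    print(f"t = {t:.2f}, df = {df:.2f}")
```

With triplicates, the degrees of freedom lie between 2 and 4, so the test only flags fairly large differences as significant.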

Design of the Field-portable Device and the Smartphone Adapter

The 3D design of the field-portable device was created using the software SOLIDWORKS 2016 and printed with a ProJet 2500 Plus (3D Systems, Inc.). The material used for the 3D print was VisiJet M2R-WT, an ABS-like material providing a rigid function with moderate flex. Using this workflow, the field-portable device, as well as the smartphone adapter prototypes, were manufactured.

Results and Discussion

Analytical performance of the sensor

The analytical performance of the yeast biosensor in a laboratory setting has been evaluated. In an initial screen, the limit of quantification (LoQ) for E2 was determined [173]. Subsequently, the sensor was tested for cross-inducibility by the different estrogenic compounds E1, E3 and EE2 [245]. To pave the way for a real-world application within a device, the viability and E2-inducibility of the cells after lyophilization were characterized.


Limit of Quantification and Induction Specificity in Liquid Cultures

We characterized the sensor strain in terms of inducibility by E2 and compared its performance to a positive control (constitutive reporter gene expression) and a negative control ('empty' strain) in a dose-response curve, as visualized in Figure 5.2A. To characterize the biosensor in terms of induction specificity, several steroidal estrogens were tested at concentrations between 0 and 10 nM. Strong induction was observed for E2 and EE2. E1 also induced the sensor strain at a concentration of 10 nM. E3 did not cause a measurable induction of the sensor.

Viability and Growth Recovery of Sensor Cells after Lyophilization

The biosensor strain is strongly induced by E2 and EE2, and more weakly by E1 at higher concentrations, when tested with a fresh, non-lyophilized culture. The sensor material fabrication process involves freeze-drying to enable storage. To assess the compatibility of the yeast sensor strain with such a process, high-density cultures were subjected to lyophilization in a 5% sodium glutamate solution. Following lyophilization, growth and induction curves were measured at regular seven-day intervals to assess a potential decrease in viability after extended storage. Cultures were rehydrated in SD medium and diluted to an OD600 of 0.001. By monitoring the growth over 60 hours, we found that the lag phase of the lyophilized yeast cells was extended. The growth after lyophilization was approximated by fitting a logistic function. Subsequently, the cycle at which the growth curve crosses the arbitrarily defined threshold value of OD600 = 0.2 was determined. For a fresh culture, an OD600 of 0.2 was reached after about 8 hours. For a lyophilized sample, the extended lag phase caused the culture to cross the OD600 threshold of 0.2 only after 29 h when not stored, and after 43 h following storage at room temperature for 35 days. The storage behaviour over several months is currently being tested.

[Figure 5.2. Panel A: fold change versus cycle (30 min) for 0.3, 0.5, 1.0, 3.0 and 5.0 nM E2, PC and NC. Panel B: chemical structures of estrone (E1), estradiol (E2), estriol (E3) and ethinylestradiol (EE). Panel C: fold change per compound (E1, E2, E3, EE) at 1, 5 and 10 nM.]

Figure 5.2: (A) The fold change in fluorescence expression upon E2 induction was plotted as a function of growth time. One cycle corresponds to 30 minutes. Symbols represent the mean and error bars the standard deviation. The increase in fluorescence is higher than the background fluorescence starting from 0.5 nM. (B) Chemical structures of the compounds used for the determination of induction specificity. (C) Comparison of sensor inducibility by E1, E3, E2 and EE2 at 1, 5 and 10 nM, evaluated after 24 cycles (12 h), when the cells had reached an OD600 of approximately 0.3.

To qualitatively check the impact of lyophilization on the survival of S. cerevisiae, a yeast spotting assay was conducted. Two time points post-lyophilization were tested. The first colony was lyophilized and subsequently stored at room temperature for one month, and the second colony was plated 24 hours after lyophilization. As a positive control, a freshly grown colony was plated. As expected, the survival rate of the lyophilized colonies was lower than that of the positive control. The survival rate after the lyophilization process was determined to be about 1 in 10^4. Visual inspection did not reveal a major difference in survival between the two samples stored for different periods of time, as shown in Figure 5.3. Furthermore, no difference between the different strains (sensor, constitutive and empty) was observed. In line with the dose-response curves reported for the LoQ, the inducibility of the sensor cells after lyophilization was checked and confirmed to be preserved. As soon as growth was observed, the sensor cells produced citrine in the presence of E2, and the positive control indicated viability by showing an inducer-independent fluorescence signal.

[Figure 5.3 image: yeast spotting assay. Columns: dilution factor (Dil. F.) 0, 10, 10^2, 10^3, 10^4, 10^5. Rows: Control, Lyo 1 m, Lyo 1 d.]

Figure 5.3: (A) Results of the yeast spotting assay. The top row indicates the dilution factor and the left column the different conditions tested. The non-lyophilized culture shows growth at all dilutions. For the lyophilized and stored samples, reduced viability was observed.

After characterization of the cell survival following lyophilization, the robustness of the sensor cells was exploited: the cells were embedded in a polymeric matrix and the resulting material was tested quantitatively.


Fabrication of the Biosensor Material

Production of Sensor Material

Briefly, the cell culture was resuspended in a small volume of a 5% sodium glutamate solution. This cell suspension was mixed with the hydrogel solution resulting in a 10%, w/w and PEG 6%, w/w biosensor material precursor [230]. In the previous section, we have demonstrated the yeast survival after lyophilizing in the presence of sodium glutamate as a . Combining lyophilization with the hydrogel platform, we have established an easily scalable production process for a yeast biosensor material.

Low Cost of Sensor Material Manufacturing with Current Setup

Considering the low costs of the polymers constituting the growth matrix for the S. cerevisiae sensor strains, we have calculated the material cost for the production of one batch of sensor discs. With a lyophilizer chamber diameter of 23 cm and a height of 27 cm, square aluminium moulds with a diagonal of 15 cm can be used for sample preparation. The moulds are 1 cm thick, potentially allowing 27 layers to be stacked within the chamber. Assuming a sensor disc diameter of 0.5 cm, 900 discs can fit per layer. A total of 24’300 discs can therefore be produced per lyophilizer batch. A full run requires approximately 600 ml of hydrogel and the material cost amounts to CHF 9.95. The production of an individual sensor disc therefore costs CHF 0.0004. As four discs are necessary for the analysis of one sample, the cost of the sensor material within a single device amounts to approximately CHF 0.0016. In the future, fabrication of the device housing the sensor discs will be performed by injection moulding, allowing for very low prices per unit and further paving the way for cheap and robust on-site applications.
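The arithmetic behind these figures can be retraced with a few lines of code. This is only a sketch: the number of discs per layer and the batch cost are taken as stated in the paragraph above rather than derived from the mould geometry, and the variable names are chosen for illustration.

```python
# Inputs as stated in the text.
chamber_height_cm = 27
mould_thickness_cm = 1
discs_per_layer = 900      # stated value, not recomputed from the mould size
batch_cost_chf = 9.95      # material cost of one full lyophilizer run
discs_per_sample = 4       # discs needed to analyse one sample

# Derived quantities.
layers = chamber_height_cm // mould_thickness_cm
discs_per_batch = layers * discs_per_layer
cost_per_disc_chf = batch_cost_chf / discs_per_batch
cost_per_sample_chf = discs_per_sample * cost_per_disc_chf

print(f"layers per batch:       {layers}")
print(f"discs per batch:        {discs_per_batch}")
print(f"cost per disc [CHF]:    {cost_per_disc_chf:.4f}")
print(f"cost per sample [CHF]:  {cost_per_sample_chf:.4f}")
```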

Qualitative Evaluation of Cell-recovery and Sensor Performance

Up to this point, this work has described the production of a low-cost, cell-containing material that can be stored at ambient temperature. Next, the sensor performance of the material was evaluated. The sensor material discs were rehydrated and incubated in nutrient medium. For cell recovery, 48 h were allowed as determined from the lyophilization growth assays. Subsequently, E2 was added at concentrations ranging from 0 to 5 nM. After 5 hours, the sensor discs were transferred to a sample holder and images were taken (Figure 5.4A). Figure 5.4B compares the mean pixel intensities of the citrine fluorescence readout of the positive control (fluorescent protein driven by a constitutive promoter), the negative control (yeast wildtype strain containing no reporter gene construct), and the sensor strain with the E2-inducible element. As expected, the positive control exhibited fluorescence independent of the inducer concentration and the negative control showed only a background signal. For the sensor strain, the finding


[Figure 5.4 image: (A) sensor discs of NC, S and PC at E2 concentrations of 0, 1, 3 and 5 nM. (B) Mean pixel intensity versus E2 concentration [nM] for S, NC and PC; the difference marked with * has p = 0.02.]

Figure 5.4: (A) Visualization of the effect of E2 on the sensor material. PC: positive control, NC: negative control, S: sensor material. (B) Quantitative evaluation of the sensor discs depicted in (A) after 48 h of growth followed by 5 hours of induction.

of Figure 5.2B of a limit of quantification of 1 nM was confirmed. Fluorescence of the induced sensor strain at 1 nM was significantly different from the negative control (p = 0.02). To enable quantification, the experiment was carried out in triplicate and Figure 5.4B shows the mean intensities of the different strains at different E2 concentrations. In summary, two different detection limits have been observed depending on the experimental conditions. In liquid culture, where fluorescence is monitored with a highly sensitive microplate reader, the signal caused by 0.5 nM E2 could be detected. Within the hydrogel material, on the other hand, the fluorescence signal is captured with a single-lens reflex camera of lower sensitivity, and E2 could only be detected from 1 nM upwards.
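The image-based readout can be sketched as follows. The text reports mean pixel intensities per disc but does not describe the image-processing routine, so the circular region of interest, the function name and the toy image below are assumptions made purely for illustration.

```python
def mean_intensity_in_disc(image, center_row, center_col, radius):
    """Mean pixel value inside a circular region of interest.

    `image` is a list of rows of grey values (hypothetical input format).
    """
    total = 0.0
    count = 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            # Keep only pixels whose centre lies within the circle.
            if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2:
                total += value
                count += 1
    if count == 0:
        raise ValueError("region of interest contains no pixels")
    return total / count


# Toy 5 x 5 image: dark background with a brighter disc in the centre.
image = [[10] * 5 for _ in range(5)]
for r, c in [(2, 2), (1, 2), (3, 2), (2, 1), (2, 3)]:
    image[r][c] = 110

sensor_mean = mean_intensity_in_disc(image, center_row=2, center_col=2, radius=1)
print(f"mean intensity in region of interest: {sensor_mean:.1f}")
```

Applying the same region to images of the negative and positive controls would give values that can be compared directly across E2 concentrations.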


A Smartphone-readable Sensor Device to Enable on-site Analysis and Field Use

Device integrating cell cultivation, sample application and sensor readout

Current biosensor systems are limited by their dependence on controlled storage conditions, which directly impact sensor cell survival and therefore sensor performance. Ideally, a biosensor device would facilitate cell handling to an extent where a biology laboratory is made obsolete and sensor readout can be conducted without expensive laboratory equipment [264,265]. We have developed a device integrating cell cultivation and sample application, while at the same time providing a platform for imaging of the

[Figure 5.5 image: (A) top row: Controlled Storage, Liquid Growth Matrix, Laboratory Equipment; bottom row: Device, Rehydration & Induction, Signal Readout. (B) Device cross-section in Incubation Position and, after a flip, in Imaging Position.]

Figure 5.5: (A) The top row summarizes the challenges of current biosensor applications. Sensor cell survival is closely linked to controlled storage conditions, usually in a freezer to stop the metabolism or at least in a fridge to slow it down. The cells require a liquid growth matrix for optimal growth and usually expensive and bulky laboratory equipment is necessary for signal readout. The bottom row of (A) describes how this biosensor tackles the aforementioned challenges. With lyophilization of the sensor cells and incorporation within a device, storage is addressed. The cells are rehydrated and induced within the sterile environment of the sensor chip and the use of expensive laboratory equipment is made obsolete by employing a smartphone with a filter set for signal readout. (B) The device is designed to be used in two different orientations. A cross-section through the device is presented, showing the polymer disc housing dried cells with nutrient medium penetrating the disc in incubation position. In a real-world scenario, the incubation with the analyte takes place at room temperature in the incubation position. Subsequently, the device is flipped to imaging position and images are taken.


sensor material after a period of incubation. The device was 3D-printed with a ProJet 2500 Plus, a state-of-the-art professional-grade 3D printer capable of printing rigid and elastomeric parts with high precision. The biosensor device features four sterile, independent chambers, which are sealed from the environment with a septum. Within each chamber, the sensor material disc is mounted on support pillars, which hold it in position underneath a glass window. During assembly, nutrient medium powder is immobilized on the walls of the device. Figure 5.5A graphically shows the steps to use the sensor device. The device is delivered to the customer in a sachet providing a controlled atmosphere for longterm storage. Upon unpacking from the sachet, each chamber is injected with a given volume of water provided with the kit. Together with the immobilized nutrient powder, the water constitutes the medium required for yeast growth. The device is flipped to incubation position to ensure optimal nutrient access as depicted in Figure 5.5B. The inducer is added after 48 hours of incubation, the device is incubated for another 5 hours and subsequently flipped by 180 degrees back to imaging position. The disc is then no longer covered by nutrient medium and is optimally presented in the window.

Smartphone Adapter Enabling Sensor Readout

To fully detach the sensor device from a laboratory environment and facilitate readout of the signal, a smartphone filter adapter was designed and fabricated using 3D printing technology. The last part of Figure 5.5A shows the adapter mounted to a smartphone. The adapter features two filters. The first one is a blue excitation filter (peak at 460 nm) covering the area of the flash. When the flash is operated, the excitation filter transmits light at wavelengths below 460 nm. The orange emission filter (bandpass filter, 520 nm), mounted in front of the camera, then allows capturing light at the emitted wavelength. After incubation in the field at slightly elevated ambient temperature conditions, the device is flipped to the imaging position, the adapter is fixed to the smartphone camera and images are taken. In the future, we plan to develop a smartphone application to automate the quantification of the fluorescent signal. The results will be reported in a future publication.

Conclusion

Here we report the development of a portable and storable S. cerevisiae biosensor platform for the detection of E2. The sensor cells embedded in a polymer matrix can reliably detect the presence of E2 at concentrations of 1 nM. After storage, the sensor material discs are incubated in nutrient medium for 48 hours, the inducer is added and the signal reported after another 5 hours of incubation. Looking at the specificity of the

sensor, we have observed that E2 and EE2 cause induction of the sensor already at a concentration of 1 nM, E1 at 10 nM and there was no measurable fluorescence signal in the presence of E3.

Conventionally, the detection of environmental pollutants is conducted in specialized analytical laboratories. Such facilities are equipped with devices for analytical chemistry, enabling a very precise determination of sample composition. In a biological laboratory, biological detection methods are employed to a similar end. Assays conducted in laboratory facilities deliver extremely precise results but come with the main disadvantage of high costs for maintenance, labour and consumables, resulting in a high cost per analysis.

With the biosensor platform described in this study, we have taken laboratory analytics one step closer to application in the field. The lyophilized polymeric matrix enables longterm storage of the sensor cells within the sensor device and facilitates sensor operation by minimizing the handling steps. By means of rapid prototyping, we have developed a field-portable analytical biosensor device integrating cell cultivation, sensor function and signal detection within a stimulus-responsive material. Signal readout and quantification are conducted with a smartphone, an electronic device that is omnipresent nowadays. With methods from analytical chemistry, a well-equipped laboratory and educated staff, very precise determination of a chemical sample is possible. If the bioactivity of compounds in a sample matrix is of interest, bioassays are the tools of choice but with similar drawbacks as mentioned above. With an analytical task at hand, one always has to balance the pros and cons of different methods. In the case of a biosensor, the cons usually are slow growth, a narrow range of optimal assay conditions and the requirement for sterile handling conditions. The system presented in this work takes microorganism-based analytics one step closer to application in decentralized locations. Future efforts should be centred around improving the limit of detection either by biologically improving the sensor strain or by facilitating the extraction and pre-concentration methods. Once these challenges have been mastered, the true performance of the bioreporter assay can be examined by comparing its results to chemical analytics conducted in parallel.

Author Contributions

Nadine Lobsiger: Conceptualization, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization, Project administration. Jonathan E. Venetz: Conceptualization, Validation, Investigation, Writing - original draft, Writing - review & editing, Visualization, Project administration. Michele Gregorini: Conceptualization, Formal analysis, Writing - review & editing. Matthias Christen: Conceptualization, Writing - review & editing, Supervision. Beat Christen: Conceptualization, Writing - review & editing, Supervision. Wendelin J. Stark: Conceptualization, Writing - review & editing, Supervision, Funding acquisition.

Acknowledgements

We would like to thank F. Rudolf for the gifted S. cerevisiae strains. Many thanks also to R. Walker from the ETH Zurich, D-CHAB, ICB workshop for support with rapid prototyping. Financial support by ETH Zurich is kindly acknowledged. The authors gratefully acknowledge the financial funding by the Gebert Rüf Stiftung (GRS-056/16).

Chapter 6

Synthetic Genomics – a Challenge for Regulation- and Policy-Makers

The relatively young field of synthetic biology holds the potential to solve many current and future problems faced by our society, such as health problems and the dependency on natural resources [137–139]. This ability arises from drastically new engineering approaches (e.g. CRISPR-Cas9 [267], large-scale DNA synthesis) introduced into biology. In some scientific works, authors talk about the "phenomenological naturalization of technology", which can be interpreted as mixing nature and a new generation of industrial technologies [268]. To establish these new technologies, scientists from many different fields (e.g. biology, chemistry, data science) have to collaborate. However, while the fast-paced innovation rate due to the novelty and the interdisciplinarity of the field are its biggest assets, these properties also pose a problem for policymakers, as no regulations covering these innovations exist yet. As an example, there are currently very limited regulations concerning the synthetic organisms emerging from new fields of science, which led policymakers and scientists to apply the existing regulations for genetically modified organisms (GMO) [269]. However, the complexity of the problems faced by synthetic biology regulations goes beyond the scope of existing policies and regulations. In this chapter, the benefits and risks of the field of synthetic biology will be considered, followed by an opinion on why the field needs a new set of regulations and why the existing regulations are not enough. At the end of the chapter, some potential strategies to define a new set of policies and regulations for the field of synthetic biology will be introduced.


6.1 Benefits and Risks of Synthetic Biology

Benefits of Synthetic Biology

Currently, there are many approaches to synthetic biology. In this chapter, the focus lies on the approach of synthetic genomics as described in previous chapters of this thesis. The general goal of this field is to create synthetic life. More specifically, the goal is the creation of minimal chassis cells and subsequently specialized industrial producer cells [110]. The minimal chassis cells only contain the genetic components necessary to survive under laboratory or production conditions. In the first generation of specialized producer cells, an additional synthetic pathway will be added to the chassis cell to produce the desired product. Due to the minimal design of the chassis cell, there will be minimal interference of the normal cell metabolism with the compound production and vice versa. By implementing different genetic modules in addition to the minimal chassis, the next-generation production cell line can be specialized to yield high product titers [270]. Potentially this will enable already existing bio-synthesized products (e.g. insulin, detergents) to be produced in higher quantities and at a lower cost.

Considering metabolic pathways encoding the production of complex compounds, the field of synthetic genomics opens up many possibilities. Previously, these compounds could only be synthesized chemically or had to be extracted from natural resources. An example is the anti-cancer drug taxol, which is originally produced by the Pacific yew [271]. In consequence, the large-scale production of these compounds was often harmful to the environment as well as dependent on non-renewable resources. Metabolic engineering and DNA synthesis will facilitate the creation and expression of large biosynthesis pathways like the taxol pathway in the future [272]. Due to the high-throughput nature of the design and synthesis process, as well as the standardized assembly strategies proposed in this thesis, it will be possible to optimize the known pathways by testing different sets of genetic regulators. With this fast pipeline, the prototyping of novel pathways becomes a high-throughput process, opening up innovation avenues for industrial biotechnology to produce novel products.

Up to this point, only currently known pathways and products have been discussed. However, nature offers many more not yet known compounds, which can be discovered using sequence data mining approaches [273]. These compounds hold the promise of being, for example, therapeutics or replacements for oil-based plastics. One of the reasons for this vast undiscovered space of compounds is that the majority of microorganisms cannot be cultured under laboratory conditions. This is especially relevant for organisms which live in symbiosis with higher-order organisms. However, the DNA of these non-cultivatable organisms can often be isolated from environmental samples [274].


Over the last years, DNA sequencing technology has improved significantly and has become more sensitive and cheaper. Due to the high sensitivity of the newest sequencing technologies, trace amounts of DNA often suffice to gain sequence information. This way, samples taken from nature can be sequenced without cultivation. The data mining of this "genomic dark matter" yields pathways and products which have not been known before [171, 273]. Using the genome-mining approach in combination with extensive fundamental research, it was possible to produce the new cytotoxic compounds aeronamides in a short time frame [275]. To recreate and modify such pathways identified in silico, the synthetic genomic approaches discussed in this thesis can be used. By expressing these synthetic constructs, novel compounds can be produced and characterized with respect to their medical activities.

As previously discussed in this thesis, the fast synthesis pipeline for any sequence also holds the potential of creating DNA vaccines [276]. Such vaccines do not rely on complex and time-intensive production systems. Also, new algorithms relying on artificial intelligence enable the design of non-pathogenic viruses or virus particles resembling the pathogen for immunizations. As the vaccine design will be done in silico and no virus cultivation will be needed, the development and production time of a vaccine can be drastically reduced to weeks instead of the current months to years [277]. Furthermore, the increased stability of DNA vaccines will enable more widespread vaccination of the population of the developing world. The higher stability facilitates transport of these novel vaccines to remote regions and also enables easier storage.

The benefits discussed in this section so far have concentrated mainly on applications and not on fundamental research. However, synthetic biology also has a considerable impact on fundamental biology. By means of synthetic biology, it becomes possible to study the interactions of different genes in an organism on a more complex level by synthesizing a tailor-made model organism, as opposed to mutating an existing one. Furthermore, the synthesis process enables researchers to rewrite the genetic code of the organism. As only the known information layers are considered during the rewriting process, so far unknown layers that are rewritten unintentionally can affect the functionality of the final construct. By analyzing these previously unknown layers after modifying them, the understanding of how information is stored and regulated within the genome can be expanded.

Risks of Synthetic Biology

As discussed above, the field of synthetic genomics holds great promise for future applications, such as vaccines and new biosynthetic pathways. However, if this tool gets into the wrong hands, there is also a very large potential to negate these benefits. This is especially true for governments ignoring international agreements and researching bio-weapons. This problem, however, is not limited to governments but also applies to non-governmental extremist groups such as terror organizations. The feasibility of creating potentially dangerous organisms and genetic constructs was shown in the early 2000s when the genome of an infectious poliovirus was synthesized [174]. Although the publication of this work drew a lot of criticism, it did not stop other scientists from chemically synthesizing and publishing further viruses, such as Influenza [184, 278, 279], Zika virus [280], SARS virus [281] and many more [282–285]. The publication of the assembly methods and the blueprints of all of these synthetic viruses is a considerable problem, especially when considering the growing ease with which large DNA constructs can be synthesized and assembled nowadays. Therefore, the field of synthetic virology shows the most obvious potential for misuse.

However, the synthesis of viruses is not the limit. In the previous section, the possibilities of synthetic pathways have been discussed. It is important to note that sometimes a single pathway or even a single enzyme can change an organism from a non-pathogenic biosafety level 1 organism into a dangerous pathogen, which has to be handled with great care and in a carefully restricted environment [286]. The group of people that go by names such as Do-It-Yourself (DIY) synthetic biologists, citizen scientists or biohackers has the potential to cause harm without intention, a concern that was part of the discussions at the G20 summit in 2017 [287]. Such people are often inexperienced individuals who buy commercially available kits to conduct synthetic biology experiments at home. When experimenting with kits containing CRISPR-Cas9 components and other powerful synthetic biology tools, it is imaginable to accidentally create strains with potentially adverse effects on health and the environment. This is a risk factor that has to be considered before synthetic biology kits become as easily accessible and common as the well-known chemistry-at-home kits.

When considering the new organisms which can potentially be generated by DIY biologists as well as trained scientists, it is important to consider the risk of genetically modified organisms (GMO) or synthetically created organisms breaking out of the lab confinement. Usually, the risks of released organisms to human health and the environment are considered to be low, as these organisms are lab-bred under optimal conditions. Once in nature, these lab-bred organisms show major fitness disadvantages compared to their wild type counterparts [288]. In cases such as gene drives against mosquitoes, organisms containing synthetic parts are released on purpose to eradicate entire species [289]. The technology of gene drives has been developed to introduce a selfish gene into, for example, mosquito populations with the help of CRISPR-Cas9 to permanently alter the genetic properties of a species. In the particular case of the mosquito gene drive, the aim is to sterilize and thus eradicate the population. The targeted extinction of the mosquitoes aims at eradicating the zoonotic parasite Plasmodium, which is responsible for malaria. The targeted eradication of an entire species holds great potential for ethical discussions regarding the moral boundaries of scientific research [290]. Regarding the release of lab microorganisms, however, synthetic genomics enables scientists to create a new breed of organisms at a high pace, which might be stronger competitors in the environment. These novel organisms potentially pose a threat to existing life forms as pathogens or as competitors. The survival of newly created synthetic organisms is, like everything else, driven by evolution. This implies that the organism will evolve in unknown directions after the release. To date, it is impossible to use current data to estimate the consequences of the release of such organisms. Potentially, such evolved synthetic strains can cause harm and this risk has to be evaluated [291].

Lastly, the advent of synthetic biology could pose a major threat to water and food security in certain regions. It is possible that scientists succeed in designing and synthesizing a microorganism which can produce, for example, hydrocarbon chains. Using already established chemical production pipelines, these hydrocarbons could be used as a starting material to produce fuels or plastics needed in everyday life, eliminating the dependency on hydrocarbons from fossil sources. Alternatively, the organisms could be engineered to produce bio-plastics or bio-fuels directly. This would help to abolish our dependency on fossil resources. However, these microorganisms would need some form of carbon source for the metabolic processes involved in the production of these compounds. Possible carbon sources can be found in starch-rich plants (e.g. corn), which are typically grown in monocultures [292]. The explosion of demand for these materials would have the effect that farmland currently used to grow food would be used to grow non-edible starch-rich plants. As a consequence, new fields would have to be created to grow edible crops. This would result in the loss of biodiversity and it would also have an impact on the food and water security of those regions, as the resources would be used up for the irrigation of these "petrol-crops" [293]. The problem of "petrol-crops" is already an ongoing discussion considering the production of bioethanol [292]. It is also in conflict with the UN sustainable development goal 2, which aims at eradicating hunger around the globe [294]. In conclusion, this application of synthetic biology would lower the need for fossil resources but a replacement energy source has to be grown. This energy source would entail its own set of problems, which have to be carefully evaluated.

Many of the risks and benefits discussed in the last two sections offset each other. For example, it becomes easier to design new viruses. At the same time, the fast development of a vaccine against this specific virus will also be possible. This implies that the field of synthetic biology has the potential to mitigate the risks it creates. For this reason, it is very important to implement policies and regulations which put a certain degree of control onto the field. At the same time, it is crucial to give the young field enough leeway so it can still develop and innovate without being suffocated at the core.

6.2 Why Today’s Genetic Engineering Regulations do not Sufficiently Cover Synthetic Biology

Classical Genetic Engineering Uses a Top-Down Approach

To understand why the current policies and regulations are not sufficient to regulate synthetic biology, we have to take a look at their origins. The field of classical genetic engineering started in the 1970s and 1980s with the discovery of restriction enzymes and the development of cloning technologies [35]. With the emergence of the PCR technology, it became possible to amplify pieces of DNA from almost any template [49]. Because the PCR technology requires DNA oligo primers, the demand for synthetic DNA oligonucleotides increased, advancing the chemical oligonucleotide synthesis technology. At the same time, new technologies such as first-generation DNA sequencing were developed. Due to these novel sequencing and cloning technologies, it became possible to create and verify organisms with newly acquired beneficial functions, for example, the production of biosynthetic human insulin in E. coli [295]. Not only were these genetically altered organisms the basis of the biotechnology industry, but they also formed the basis for the first policies and regulations concerned with genetic engineering.

It is very important to note that the classical field of genetic engineering is based on a top-down engineering approach. This means that the elements newly introduced into GMOs have been previously known and are usually copied from existing organisms. An example of this is the insect-resistant Bt-corn, which carries genes encoding proteins of the bacterium Bacillus thuringiensis to battle the destruction of corn harvests by insects (Figure 6.1A) [296]. The integration of this already known Bacillus gene into corn would never occur naturally. However, using genetic engineering it was possible to integrate such genes into the corn genome (Figure 6.1B). Insects eating Bt-corn starve shortly after consumption due to a destroyed digestive system (Figure 6.1C). This example shows that science is at a stage at which significant world problems can be solved. However, the potential risk of this novel DNA modification resulted in strong opposition towards Bt-corn and its prohibition in many countries. The modification of an organism with a trait that already exists elsewhere is also where most of the current policies and regulations apply. By definition, the regulations and policies apply if an organism has been modified in a way which would not be possible in nature [269]. Furthermore, the regulations assume that the safety of the newly integrated parts is already known, which is not always the case in synthetic biology.


[Figure 6.1 image: (A) Harvest destroyed. (B) Endotoxin gene from Bacillus thuringiensis, cloning, transformation, corn expressing endotoxin. (C) Vermin exterminated, harvest successful.]

Figure 6.1: Explanation of Bt-corn. (A) Corn plants are eaten by insects. Once a plant is eaten, the vermin migrate to the next healthy plant. (B) The gene of a delta-endotoxin from Bacillus thuringiensis has been isolated and a plasmid has been constructed. The plasmid was transformed into corn plants. These plants express the endotoxin (pink) throughout the plant. (C) The endotoxin consumed by the insects is activated in the stomach, resulting in paralysis of the digestive system and the eventual lysis of the gut walls. As a consequence of the inactivation of the digestive system, the vermin starve to death. These endotoxins are only active in insects and do not harm mammals.

The Efficiency of Current Regulations Decreases with the Advancement of Synthetic Biology

The field of synthetic biology is still loosely defined regarding its goals and so far no consensus has been found for a clear definition. However, a goal shared throughout the whole field is the bottom-up engineering approach [117]. This approach, in combination with the interdisciplinary nature of the field, separates synthetic biology from the field of genetic engineering. In bottom-up engineering, the goal is not to take already existing properties and implement them in a new host, but to create new functionalities from parts available in nature. If the goal of the engineering is to introduce a novel genetic circuit into a cell, the current regulations developed for classical genetic engineering apply well because the resulting organism still falls within the category of GMOs. However, as described in Chapter 4 of this thesis, another goal of synthetic biology is to create novel organisms. Due to the rewriting of the genetic information in the C. eth-2.0 genome, the resulting organism, if alive and proliferating, would be considered a new species. An organism carrying the rewritten genome would qualify as a new species because its DNA sequence varies from the parent sequence and results in a phylogenetically significant difference. At this point, it could be argued that this is not a GMO any more, as nothing was modified in its genome; instead, it is a completely new and synthetically created organism. Along these lines, the applicability of the current policies and regulations on genetic engineering can be disputed.

The ambitions of the field do not stop at creating minimal cell chassis and optimized strains. Another goal within the field of synthetic biology is the in silico design of new entities such as enzymes. Computer tools have been developed to design novel enzymes using a function as a starting point and not an already existing enzyme [297]. At the current level, many iterations of design optimization are still needed to reach the goal of a well-functioning enzyme. It is expected that with some progress in the field, it will become possible to design fully functional enzymes on the computer and produce them in the laboratory [298]. If one expands this thought to future applications, it might be possible to design a whole new organism with all the desired specifications in silico. This development poses a major challenge for policymakers working on regulating the field of synthetic biology. It is very difficult to define the risk of a non-natural design which has new functionalities and was designed on a computer. Moreover, the regulations for novel synthetic DNA sequences are currently sparse, bearing the danger of potentially dangerous designs not being recognized as such.

6.3 Scientific Self-regulation Complements a Future Framework of Policies

The current set of policies and regulations covering the field of synthetic biology is limited and will probably be outdated soon, resulting in a legally uncertain situation that could be exploited by people with malicious intent. In this last section, different options to ensure proper regulation of the field are discussed with regard to the innovation process driving further developments. It is important to note that the field of synthetic biology, like many other fields, has a large number of different stakeholders [299]. First, there are scientists, who want to drive the field forward and develop new technologies that contribute to solutions for global problems and fundamental questions. However, it is also common for scientists to apply for patents on their inventions and become entrepreneurs commercializing their technologies. Usually, the main goal of entrepreneurs and companies is to maximize revenue. The next stakeholders are the governments of nations conducting research in synthetic biology. On the one hand, they have to limit the field to assure national security; on the other hand, they want access to technologies that provide them with advantages in internationally competitive fields (e.g. petrol production and bio-defense). Another group of stakeholders comprises organizations promoting environmental protection. Such groups often oppose applied synthetic biology due to the risk of environmental damage caused by the inappropriate release of organisms. This list of potential stakeholders is not exhaustive, and many more stakeholders are impacted by the field of synthetic biology. The vast number of different stakeholders prevents a universal solution for policies. With a careful stakeholder analysis, the different aspects of synthetic biology can be considered from different angles, reducing the risk of biased policymaking.

When implementing new policies and regulations, their potential to restrict research progress needs to be considered. Currently, the regulations on large synthetic constructs are lax. However, if newly developed regulations turn out to be too stringent, the progress of the field is likely to stall. This risk is especially high in such a dynamic and young field. It will therefore be very important to create a legal framework that regulates synthetic biology broadly. Such a broad framework would imply that known risk factors, such as the publishing of detailed blueprints and methods for the assembly of synthetic pathogenic viruses, would be regulated. At the same time, research on synthetic viruses has to be allowed in order to promote research on DNA vaccines, as these hold great promise in fighting future viral outbreaks. Of course, such a dynamic form of regulation would place a high responsibility on scientists and calls for self-regulation [300]. The amount of knowledge gained from experiments on synthetic constructs would increase over time, and it would soon become possible to clarify and adapt policies and regulations. One could think of a process very similar to the regulation of animal experimentation, which is a great example of successful policymaking. Initially, researchers were given a lot of leeway in their experimental procedures. With increasing knowledge and a slower rate of innovation, the regulations became tighter, and nowadays almost every aspect of the approval process and the experiments is strictly regulated. This approach allowed the technology to develop to its full potential while resulting in a reasonable set of policies and regulations [301].

In conclusion, the future set of policies and regulations has to minimize the previously discussed risks, starting from an initially loose legal framework. This framework has to be defined by stakeholder panels and should cater to the needs of the different stakeholders as well as possible. While the panels define the framework, scientists must fulfill their duty of self-regulation. This self-regulation can also be implemented by panels of researchers who approve planned experiments, much as is done in the field of clinical trials. An additional layer of security is the implementation of molecular safety mechanisms, which allow synthetic organisms to be eliminated in case of an emergency.

The Regulation of DNA Synthesis

The last aspect to be discussed in this chapter is the regulation of the DNA synthesis process. It has become obvious that some kind of control will be required, especially when considering DIY synthetic biologists working at home or organizations with malicious intent. Due to a lack of internationally approved regulations and thus unclear legal responsibilities, several commercial DNA providers, which are major stakeholders within synthetic biology, have founded the International Gene Synthesis Consortium (IGSC) [302]. The goal of the IGSC is to screen placed orders for dangerous sequences. Furthermore, members of the IGSC verify their customers to ensure that suitably equipped laboratories exist for the work with the synthesized DNA. However, participation in this consortium is still voluntary. The rapidly expanding gene databases and the ordering of ambiguous sequences, which have to be evaluated by an operator, make it increasingly difficult to screen for dangerous sequences and increase the cost per screen. It has been argued that this rising cost will eventually make the companies unprofitable [303]. This example shows that a set of new policies is needed, such as smarter databases for the screen, training of scientists to avoid red flags for standard constructs, or international incentives that reward companies for participating in the screen [302].

However, sequence screening by the providers does not solve the biosecurity problem of DNA synthesis. As DNA synthesizers become more readily available and their prices have dropped significantly, more laboratories will possess their own machines. Similar to next-generation sequencing machines, synthesizers are trending towards a fully automated chip-based process, which makes handling easy even for users with basic to moderate training. Consequently, it is conceivable that DNA synthesizers will be owned by private persons. As this development negates the screening efforts of the IGSC, an additional set of international policies on the ownership of DNA synthesizers needs to be discussed. Several ideas have been proposed, such as regulating the sale of the machines, regulating the sale of the reagents, or conducting sequence screens on each machine [300]. Regulating the sale of machines and reagents seems an efficient measure to make sure that only trained people use the technology. However, such measures result in additional administrative work and can still be circumvented rather easily. Directly screening each sequence on the machine before synthesis seems difficult to implement, as the same issues faced by the IGSC are encountered. The machines would need access to huge databases, which are filled with large amounts of ambiguous DNA data. Such an automated screen would either reject many sequences or require an expert for evaluation. Additionally, such a safety program would be difficult to implement internationally, as every government wants to keep its sovereignty over which sequences are put on a DNA sequence blacklist. In conclusion, none of these regulations would solve the problem perfectly, but they would certainly support efforts to minimize the risks associated with DNA synthesis.
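
To make the triage problem described above more concrete, the following is a minimal sketch of an automated order screen. It is not the protocol used by the IGSC or by any provider. The function names, the k-mer length, and both thresholds are assumptions chosen purely for illustration, and the flagged entry used in the example is an arbitrary placeholder string, not a real sequence of concern. The sketch only illustrates why partial matches end up in front of a human expert.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def triage(order, flagged, k=8, decline_at=0.5, review_at=0.1):
    """Classify an order as 'accept', 'expert review', or 'decline'.

    The score is the largest fraction of any flagged entry's k-mers that
    also occurs in the order. k and both thresholds are hypothetical values.
    """
    order_kmers = kmers(order, k)
    best = 0.0
    for entry in flagged.values():
        entry_kmers = kmers(entry, k)
        if entry_kmers:  # skip entries shorter than k
            shared = len(entry_kmers & order_kmers)
            best = max(best, shared / len(entry_kmers))
    if best >= decline_at:
        return "decline"
    if best >= review_at:
        return "expert review"
    return "accept"


# Hypothetical database with one arbitrary placeholder entry.
flagged = {"placeholder": "ATGGCTAGCTAGGATCCGATCGATTACG"}
full_hit = "CC" + flagged["placeholder"] + "CC"
partial_hit = "TTTTTTTTTT" + flagged["placeholder"][:12] + "TTTTTTTTTT"
no_hit = "ACACACACACACACACACAC"
```

In this toy setting, `triage(full_hit, flagged)` declines the order, `triage(partial_hit, flagged)` routes it to expert review, and `triage(no_hit, flagged)` accepts it. The middle category is the one that requires an operator, and it would be expected to grow as the database of flagged and ambiguous entries grows.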

Chapter 7

Discussion and Outlook

7.1 Discussion

Grant Access to Large-Scale DNA Synthesis and Assembly for All

The report of the first synthetic organism [140] opened the door to many potential methods and applications in biology and biotechnology, such as creating complex pathways or genomic vaccines. However, back in 2010, the technology was still very young and thus very resource-intensive. One could almost compare it to the door of a VIP lounge that has recently opened: everybody wants to get in, but the bouncers keep unwanted guests out, so entry remains exclusive. During this PhD, and generally in the Christen lab, we have strived to transform this restrictive door into an open gate that enables many people to enter and to shape the technology by contributing their own innovations.

The first step towards opening up the technology of synthesizing and assembling genome-sized DNA constructs was the creation of the codon-rewriting and partitioning algorithms, as these algorithms make the synthesis of almost any DNA construct possible [146, 151]. They remove synthesis restrictions and make expensive custom synthesis obsolete. By using the algorithms to optimize the genome sequence and to separate large DNA sequences into standardized pieces, the standardization of the downstream assembly pipeline comes within reach. With such a pipeline, parallelized high-throughput assembly can be achieved and custom protocols become obsolete. An additional factor that previously contributed to the final cost of around $40 million was the labor-intensive verification process needed to find correctly assembled sequences, as there was no selection for correct assembly. To reduce the amount of work required, auxotrophic split markers were integrated into the design and tested. These split markers are located on two adjacent segments and are only active, and thus only ensure the survival of the organism, if the segments were assembled correctly. With such markers, the selection process has been greatly simplified to streaking out the candidate colonies onto selective plates. With the help of a set of autonomously replicating sequences (ARS) integrated into the design, the construct was replicated in S. cerevisiae and the GC-content barrier was removed.

With the combination of the rewriting and partitioning algorithms, the design requirements, and a proper assembly strategy, the synthesis and assembly of large-scale DNA constructs have been greatly facilitated. Two major limitations persist. First, the DNA sequences need to be non-toxic for the assembly organisms E. coli and S. cerevisiae. Second, it must be possible to rewrite the sequence, which can be challenging in DNA stretches with extremely high GC content. If these two limitations are taken into account, almost all computer-designed synthetic DNA sequences can be produced without depending on classical cloning approaches. At the same time, the price of producing such sequences has decreased significantly. These two developments open up many possibilities for applications in fundamental research and in applied projects.
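
The two ideas of standardized partitioning and split-marker selection can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions and not the partitioning algorithm of [146, 151]: the function names, the fixed block length, and the fixed overlap are hypothetical, and the selection step is reduced to the logical rule stated above, namely that a marker split across two adjacent segments is only functional if these segments are joined directly and in the correct order.

```python
def partition(seq, block_len, overlap):
    """Split seq into blocks of block_len that share `overlap` bases with
    their neighbour (the last block may be shorter)."""
    if not 0 <= overlap < block_len:
        raise ValueError("overlap must be non-negative and smaller than block_len")
    step = block_len - overlap
    blocks, start = [], 0
    while True:
        blocks.append(seq[start:start + block_len])
        if start + block_len >= len(seq):
            return blocks
        start += step


def reassemble(blocks, overlap):
    """Join blocks in the given order, dropping the duplicated overlap."""
    return blocks[0] + "".join(block[overlap:] for block in blocks[1:])


def survives_selection(assembled_order, marker_junctions):
    """Model of split-marker selection: a clone grows on selective plates
    only if every junction (i, j) that carries a split marker is intact,
    i.e. segment i is directly followed by segment j in the assembly."""
    adjacent = set(zip(assembled_order, assembled_order[1:]))
    return all(junction in adjacent for junction in marker_junctions)
```

For example, `partition("ACGT" * 25, 40, 10)` yields three blocks from which `reassemble` recovers the original 100 bases. With split markers at the junctions `(0, 1)` and `(2, 3)`, the segment order `[0, 1, 2, 3]` passes `survives_selection`, whereas a scrambled order such as `[0, 2, 1, 3]` or an assembly lacking segment 2 does not, which mirrors how streaking onto selective plates replaces laborious screening.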

Different Applications Enabled by the DNA Synthesis Rewriting Technology

The potential of synthetic genomics for fundamental biology has been shown with the assembly and testing of C. eth-2.0. In this proof of concept, the essential genome of Caulobacter was synthesized, assembled, and tested. Through synonymous rewriting of the coding sequences within the design, the fundamental principles of information encoding and regulation in a genome were understood in more detail. It was also shown that many previously annotated elements were incorrect and required correction. The study of C. eth-2.0 can be seen as a further development of yeast artificial chromosomes (YACs), which were briefly discussed in the Introduction (Chapter 1). In short, by cloning large parts of eukaryotic genomes into yeast, geneticists successfully characterized the genomes of eukaryotic organisms. For example, some human chromosomes (e.g. the X chromosome) have been characterized using YACs. One disadvantage of YAC technology is that the cloning uses fragments of the genome, which are hard to mutate in a targeted fashion. Therefore, custom-made YACs that encode the synthesis of complex compounds (e.g. vitamin B12) are impossible to construct by this route. With the technology presented in this thesis, the sequence contained within the YAC can be precisely defined and customized for the research question at hand. This freedom of sequence makes it possible to up-regulate some components of a network while simultaneously down-regulating others and maintaining the overall network on a single construct. Such combinations are a very interesting approach for systems biology to further elucidate network structures within biological systems.

In addition to the value generated for fundamental research, DNA synthesis can also be employed to design and build biological pathways, to synthesize novel products, or to optimize the production of existing compounds. To implement an example of this application in the laboratory, a 40 kbp synthetic B12 pathway has been designed and synthesized for the assembly of different sequence variations. These sequence variations will show the effect of different sequences on the synthesizability as well as the functionality of a construct. This project is currently being conducted in a collaboration between Alexander Wölfle, Dr. Rym Guennoun, and me. The assembly of the pathway is still ongoing, and further experiments are needed to characterize the final pathway. Nevertheless, the capability to biosynthesize novel products is expected to hold great potential for many different fields such as medicine, nutrition, biofuels, and bioplastics.

In addition to the B12 pathway, we developed biosensor platforms during this thesis. Although the biosensor discussed in this thesis (Chapter 5) was not based on a synthetic DNA approach, it shows the large potential of applying biological systems as sensors. Such a biosensor based on S. cerevisiae has shown long shelf stability, and its use requires neither extensive training nor specialized equipment. The current generation of biosensor cells used in this study does not yet reach the sensitivity that can be obtained in a specialized analysis laboratory. Nevertheless, the whole-cell biosensor is expected to become an excellent tool for the first level of pollution assessment in the field. By combining the existing knowledge with the technologies presented in this thesis, it will become possible to characterize variations of new sensors at a fast pace. If, for example, the sensor output relies on fluorescent signals, fluorescence-activated cell sorting (FACS) can be used to isolate cells with the best sensor output. Within the measured cell pool, a large variety of different sensor constructs would be tested. All of these sensor constructs would be built from a pool of differently designed building blocks, resulting in high final variance among the constructs.
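
The size of such a combinatorial sensor library, and the effect of a sorting gate on it, can be sketched as follows. This is a hypothetical illustration only: the pool names, the building-block labels, and the idea of keeping a fixed top fraction by signal are assumptions made for the example and do not describe an experiment performed in this thesis.

```python
import math
from itertools import product


def build_library(pools):
    """Enumerate every construct that takes exactly one building block
    from each pool. `pools` maps a slot name to its candidate blocks."""
    slots = list(pools)
    return [dict(zip(slots, combo))
            for combo in product(*(pools[slot] for slot in slots))]


def sort_gate(signals, top_fraction):
    """Mimic a sorting gate: return the construct ids with the highest
    signal, keeping the top `top_fraction` of the pool (at least one)."""
    n_keep = max(1, math.ceil(len(signals) * top_fraction))
    ranked = sorted(signals, key=signals.get, reverse=True)
    return ranked[:n_keep]
```

With three hypothetical pools of 3, 2, and 2 building blocks, `build_library` returns 3 x 2 x 2 = 12 distinct constructs, which shows how quickly the variance within the pool grows with each additional slot. `sort_gate` then stands in for the FACS step by retaining only the best-performing fraction of the measured constructs.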

Raising Awareness of the Risks Associated with DNA Synthesis

The facilitated access to DNA synthesis also makes the use of this technology easier for individuals or organizations with malicious intent. In contrast to older technologies, no template DNA is needed to create a strain. This simplification makes it easier for individuals to create dangerous organisms (e.g. viruses), and thus the risk of bioterrorism needs to be taken into consideration. Another problem arises from the growing movement of DIY biologists, who conduct experiments in their garages. These individuals usually do not have malicious intent, but they often lack scientific laboratory training. This lack of training and knowledge, for example regarding the inactivation of biological waste, increases the risk of genetically modified organisms being accidentally released into the wild. To minimize such risks, it is essential to implement rules and regulations that control the access of untrained people to the synthesis of DNA [287].

When defining policies to regulate synthetic biology, it is important to consider that the field is still very young. Much of the potential for its future development is still unknown. Overly restrictive regulations could suffocate the innovation process of such a young field. One could view policy-making for a new field as building the scaffolding for a monument that is yet to be built. The scaffold cannot be built at the size of the monument base, because at a later point the monument will be larger than its base. For this reason, the scaffolding has to be constructed in anticipation of how the monument structure will develop over time. Conversely, if the scaffolding is conceived too large, the builders cannot reach the monument to proceed with construction. Two kinds of scaffolding are possible: 1) an adaptive scaffold, which changes with the monument, or 2) a scaffold that has the right size to accommodate the full monument from the beginning. In this analogy, the monument to be built represents the field of synthetic biology, while the scaffolding represents the policies implemented for this field. As we are currently at the base of this monument, it is essential to implement an adaptive policy system. This goal can be achieved by defining the policies for novel experiments on a case-by-case basis. These custom policies have to be updated as the project progresses, also with regard to future projects that could arise from it. Once defined, however, these policies can be used as a reference for similar future projects. This procedure also facilitates modifications of the policies. The board that defines these policies should be composed of people with different expertise and interests to avoid biases.

Regardless of the existing and future policies that regulate the field, the most important regulators should be the scientists themselves. It is of utmost importance that each scientist working with these technologies is well informed about the benefits and the associated risks. A very important component, which should be implemented in each design, is a molecular kill switch that makes it possible to abort experiments. I am convinced that by implementing some of the principles discussed here and in Chapter 6, the risks of this technology can be minimized with a few measures while simultaneously maximizing its future benefits. However, these measures require human and financial resources to be applicable.

7.2 Outlook

A characteristic of technology is that innovations keep coming, and thus it keeps developing. Naturally, this also holds for the technology presented in this thesis. Several future ideas and projects are planned to advance it further. One ongoing project evaluates rewriting a sequence in several different ways, pooling all synthesized variants, and assembling the parts in an unbiased way. This advanced technology comes with two main advantages: 1) If one rewritten sequence cannot be synthesized, different variants encoding the same information are available. This sequence ambiguity removes the waiting time for DNA sequences that cannot be produced directly, because they can be replaced with easier-to-synthesize alternatives. 2) The technology is also an easy way to further understand the regulation of different parts, as promoters are affected differently by different rewriting strategies. The second point has two applications: the first is to correct wrongly annotated regulatory sequences, and the second is to discover previously unknown ones. In the first case, faulty in silico annotation of the construct results in promoters being labeled as coding sequences. As a consequence, these promoters are rewritten, which sometimes alters protein expression. Therefore, creating several differently rewritten constructs also helps to identify constructs that are particularly efficient compared to others encoding the same information. In the second case, if a regulatory sequence is not known before the rewriting process, the functional analysis of rewritten synthetic constructs can help discover new genomic elements.
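
The notion of several rewritten variants that encode the same information can be illustrated with a short Python sketch. It is a simplified stand-in and not the rewriting algorithm used in this thesis: the function names are hypothetical, synonymous codons are drawn uniformly at random from the standard genetic code, and GC content is used only as an assumed, crude proxy for how easy a variant might be to synthesize.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons by the amino acid (or stop signal) they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds):
    """Translate an upper-case coding sequence codon by codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def rewrite_variants(cds, n_variants, seed=0):
    """Return n_variants synonymous rewrites of cds. Each codon is replaced
    by a randomly chosen codon for the same amino acid."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    rng = random.Random(seed)
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return ["".join(rng.choice(SYNONYMS[CODON_TABLE[c]]) for c in codons)
            for _ in range(n_variants)]


def gc_content(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)
```

Every variant returned by `rewrite_variants` translates to the same protein as the input, so a pool of such variants offers alternatives if one of them cannot be synthesized, for example by preferring the variant with the most moderate `gc_content`. Note that this sketch rewrites every codon blindly; it has no notion of regulatory elements hidden inside coding sequences, which is exactly the situation in which the functional comparison of variants described above becomes informative.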

Another area that requires further work is the assembly or transfer of full genomes in E. coli. This is especially important if the synthetic constructs need to be transferred into another prokaryotic host, as this is often done by conjugation from E. coli [304]. By developing new protocols, or by optimizing existing small- to mid-scale transfer protocols to achieve this large-scale DNA transfer, we will be able to use different new hosts for the synthetic constructs. Another important milestone is the full automation of the assembly process. Only with a fully automated assembly process will it be possible to seek answers to fundamental questions and optimize complex bio-pathways at high throughput and with high efficiency.

Along with the new technologies mentioned above, new variants of the C. eth-2.0 design are planned. The first design, C. eth-3.0, which is currently being synthesized, is a smaller design containing repaired sequences of all non-functional genes of C. eth-2.0. As soon as all current challenges have been addressed, further design iterations will follow to reach the ultimate goal of booting up a synthetic Caulobacter ethensis cell. This cell will be revolutionary, as it will be one of the first living cells that have been fundamentally rewritten. It would also be recognized as the first living man-made species by the NCBI. Thanks to this new synthetic organism and future fundamentally rewritten organisms, we will significantly expand our knowledge about essential genomic networks and their regulation. With the further development of the technology, it will one day be possible to create non-minimized rewritten organisms and study them in their entirety.

In the future, the de novo synthesis of large DNA constructs will be used to produce novel classes of products. The amount of available sequencing data is increasing every day. Much of this sequence data comes from non-cultivable organisms that live in specialized niches. Thanks to sequence mining algorithms and de novo synthesis technology, we will soon be able to identify compounds produced by such organisms. The mining of this “genomic dark matter” has already been reported with the aim of discovering novel compound-producing pathways [171]. It will not only be possible to create new compounds but also to encode novel functionalities in organisms. This can be illustrated by the example of nitrogen fixation in plants. If all crops could fix their own nitrogen, the need for artificial fertilizer would decrease significantly. As the use of such fertilizers is bad for the soil and their production is harmful to the climate, independence from fertilizer would be of great benefit to the environment. Recently, Flores-Tinoco et al. [305] proposed a new nitrogen-fixing mechanism, which lies at the foundation of the symbiosis between legumes and Rhizobia. As the mechanism of this N2-fixation is now known, a DNA construct containing the information needed to fix nitrogen in planta can be created. Using this synthetic DNA for plant engineering, we might be able to create novel crops (e.g. wheat or corn) that are capable of entering a symbiosis with bacteria and thus fix their own nitrogen, instead of depending on large amounts of fertilizer.

This outlook is just a glimpse of the large potential of synthetic DNA. It shows, however, that significant advances and applications are already under development. Such novel applications will shape and potentially change the way biological research is conducted over the next decade. All aspects considered, the field of synthetic biology and rewritten DNA synthesis has everything required to become a staple of the molecular biology of the future.

Bibliography

[1] P. E. McGovern, J. Zhang, J. Tang, Z. Zhang, G. R. Hall, R. A. Moreau, A. Nuñez, E. D. Butrym, M. P. Richards, C. S. Wang, G. Cheng, Z. Zhao, and C. Wang, “Fermented beverages of pre- and proto-historic China,” Proceedings of the National Academy of Sciences of the USA, vol. 101, no. 51, pp. 17593–17598, 2004.

[2] P. E. McGovern, U. Hartung, V. R. Badler, D. L. Glusker, and L. J. Exner, “The beginnings of winemaking and viniculture in the ancient Near East and Egypt,” in Expedition Magazine, vol. 39, pp. 3–21, Penn Museum, 1997.

[3] D. Cavalieri, P. E. McGovern, D. L. Hartl, R. Mortimer, and M. Polsinelli, “Evidence for S. cerevisiae fermentation in ancient wine,” Journal of Molecular Evolution, vol. 57, pp. 226–232, 2003.

[4] I. S. Pretorius, “Tailoring wine yeast for the new millennium: Novel approaches to the ancient art of winemaking,” Yeast, vol. 16, no. 8, pp. 675–729, 2000.

[5] J. L. Legras, D. Merdinoglu, J. M. Cornuet, and F. Karst, “Bread, beer and wine: Saccharomyces cerevisiae diversity reflects human history,” Molecular Ecology, vol. 16, no. 10, pp. 2091–2102, 2007.

[6] A. L. Lavoisier, Traité élémentaire de chimie. Cuchet, 1789.

[7] J. L. Gay-Lussac, “Sur l’analyse de l’alcool et de l’éther sulfurique, et sur les produits de la fermentation,” Annales de chimie, no. 95, pp. 311–318, 1815.

[8] G. Amici, De’microscopj catadiottrici etc. Societa Tipografica, 1818.

[9] J. A. Barnett, “Beginnings of microbiology and biochemistry: The contribution of yeast research,” Microbiology, vol. 149, no. 3, pp. 557–567, 2003.

[10] C. Cagniard-Latour, “Mémoire sur la fermentation vineuse,” Comptes rendus, vol. 4, pp. 905–906, 1837.

[11] F. Kützing, “Microscopische Untersuchungen über die Hefe und Essigmutter, nebst mehreren andern dazu gehörigen vegetabilischen Gebilden,” Journal für praktische Chemie, vol. 11, pp. 385–409, 1837.

[12] T. Schwann, “Vorläufige Mittheilung, betreffend Versuche über die Weingährung und Fäulniss,” Annalen der Physik, vol. 117, no. 5, pp. 184–193, 1837.

[13] L. Pasteur, “Mémoire sur la fermentation alcoolique,” Annales de Chimie et de Physique, vol. 58, pp. 323–426, 1860.

[14] M. Berthelot, “Sur la fermentation glucosique du sucre de canne,” Comptes rendus, vol. 50, pp. 980–984, 1860.

[15] D. Botstein and G. R. Fink, “Yeast: An experimental organism for modern biology,” Science, vol. 240, no. 4858, pp. 1439–1443, 1988.

[16] J. N. Strathern, The molecular biology of the yeast Saccharomyces, vol. 11 of Cold Spring Harbor monograph series. Cold Spring Harbor, N.Y: Cold Spring Harbor Laboratory, 1981.

[17] D. Meeks-Wagner and L. H. Hartwell, “Normal stoichiometry of histone dimer sets is necessary for high fidelity of mitotic chromosome transmission,” Cell, vol. 44, no. 1, pp. 43–52, 1986.

[18] M. D. Rose and G. R. Fink, “KAR1, a gene required for function of both intranuclear and extranuclear microtubules in yeast,” Cell, vol. 48, pp. 1047–1060, 1987.

[19] G. S. Roeder and G. R. Fink, “DNA rearrangements associated with a transposable element in yeast,” Cell, vol. 21, pp. 239–249, 1980.

[20] R. J. Rothstein, “One-step gene disruption in yeast,” Methods in Enzymology, vol. 101, pp. 202–211, 1983.

[21] D. Shortle, J. E. Haber, and D. Botstein, “Lethal disruption of the yeast actin gene by integrative DNA transformation,” Science, vol. 217, no. 4557, pp. 371–373, 1982.

[22] A. Goffeau, B. G. Barrell, H. Bussey, R. Davis, B. Dujon, H. Feldmann, F. Galibert, J. D. Hoheisel, C. Jacq, M. Johnston, E. J. Louis, H. W. Mewes, Y. Murakami, P. Philippsen, H. Tettelin, and S. G. Oliver, “Life with 6000 genes,” Science, vol. 274, no. 5287, pp. 546–567, 1996.

[23] E. A. Winzeler, D. D. Shoemaker, A. Astromoff, H. Liang, K. Anderson, B. Andre, R. Bangham, R. Benito, J. D. Boeke, H. Bussey, A. M. Chu, C. Connelly, K. Davis, F. Dietrich, S. W. Dow, M. El Bakkoury, F. Foury, S. H. Friend, E. Gentalen, G. Giaever, J. H. Hegemann, T. Jones, M. Laub, H. Liao, N. Liebundguth, D. J. Lockhart, A. Lucau-Danila, M. Lussier, N. M’Rabet, P. Menard, M. Mittmann, C. Pai, C. Rebischung, J. L. Revuelta, L. Riles, C. J. Roberts, P. Ross-MacDonald, B. Scherens, M. Snyder, S. Sookhai-Mahadeo, R. K. Storms, S. Véronneau, M. Voet, G. Volckaert, T. R. Ward, R. Wysocki, G. S. Yen, K. Yu, K. Zimmermann, P. Philippsen, M. Johnston, and R. W. Davis, “Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis,” Science, vol. 285, no. 5429, pp. 901–906, 1999.

[24] G. Giaever, A. M. Chu, L. Ni, C. Connelly, L. Riles, S. Véronneau, S. Dow, A. Lucau-Danila, K. Anderson, B. André, A. P. Arkin, A. Astromoff, M. E. Bakkoury, R. Bangham, R. Benito, S. Brachat, S. Campanaro, M. Curtiss, K. Davis, A. Deutschbauer, K.-D. Entian, P. Flaherty, F. Foury, D. J. Garfinkel, M. Gerstein, D. Gotte, U. Güldener, J. H. Hegemann, S. Hempel, Z. Herman, D. F. Jaramillo, D. E. Kelly, S. L. Kelly, P. Kötter, D. LaBonte, D. C. Lamb, N. Lan, H. Liang, H. Liao, L. Liu, C. Luo, M. Lussier, R. Mao, P. Menard, S. L. Ooi, J. L. Revuelta, C. J. Roberts, M. Rose, P. Ross-MacDonald, B. Scherens, G. Schimmack, B. Shafer, D. D. Shoemaker, S. Sookhai-Mahadeo, R. K. Storms, J. N. Strathern, G. Valle, M. Voet, G. Volckaert, C.-y. Wang, T. R. Ward, J. Wilhelmy, E. A. Winzeler, Y. Yang, G. Yen, E. Youngman, K. Yu, H. Bussey, J. D. Boeke, M. Snyder, P. Philippsen, R. W. Davis, and M. Johnston, “Functional profiling of the Saccharomyces cerevisiae genome,” Nature, vol. 418, pp. 387–391, 2002.

[25] S. Ghaemmaghami, W.-K. Huh, K. Bower, R. W. Howson, A. Belle, N. Dephoure, E. K. O’Shea, and J. S. Weissman, “Global analysis of protein expression in yeast,” Nature, vol. 425, no. 6959, pp. 737–741, 2003.

[26] W.-K. Huh, J. V. Falvo, L. C. Gerke, A. S. Carroll, R. W. Howson, J. S. Weissman, and E. K. O’Shea, “Global analysis of protein localization in budding yeast,” Nature, vol. 425, no. 6959, pp. 686–691, 2003.

[27] D. Botstein and G. R. Fink, “Yeast: An experimental organism for 21st century biology,” Genetics, vol. 189, no. 3, pp. 695–704, 2011.

[28] G. Robertson, M. Hirst, M. Bainbridge, M. Bilenky, Y. Zhao, T. Zeng, G. Euskirchen, B. Bernier, R. Varhol, A. Delaney, N. Thiessen, O. L. Griffith, A. He, M. Marra, M. Snyder, and S. Jones, “Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing,” Nature Methods, vol. 4, no. 8, pp. 651–657, 2007.

[29] Y. Wang, C. L. Liu, J. D. Storey, R. J. Tibshirani, D. Herschlag, and P. O. Brown, “Precision and functional specificity in mRNA decay,” Proceedings of the National Academy of Sciences of the USA, vol. 99, no. 9, pp. 5860–5865, 2002.

[30] N. J. Krogan, G. Cagney, H. Yu, G. Zhong, X. Guo, A. Ignatchenko, J. Li, S. Pu, N. Datta, A. P. Tikuisis, T. Punna, J. M. Peregrín-Alvarez, M. Shales, X. Zhang, M. Davey, M. D. Robinson, A. Paccanaro, J. E. Bray, A. Sheung, B. Beattie, D. P. Richards, V. Canadien, A. Lalev, F. Mena, P. Wong, A. Starostine, M. M. Canete, J. Vlasblom, S. Wu, C. Orsi, S. R. Collins, S. Chandran, R. Haw, J. J. Rilstone, K. Gandi, N. J. Thompson, G. Musso, P. St Onge, S. Ghanny, M. H. Lam, G. Butland, A. M. Altaf-Ul, S. Kanaya, A. Shilatifard, E. O’Shea, J. S. Weissman, C. J. Ingles, T. R. Hughes, J. Parkinson, M. Gerstein, S. J. Wodak, A. Emili, and J. F. Greenblatt, “Global landscape of protein complexes in the yeast Saccharomyces cerevisiae,” Nature, vol. 440, no. 7084, pp. 637–643, 2006.

[31] A. H. Y. Tong, M. Evangelista, A. B. Parsons, H. Xu, G. D. Bader, N. Pagé, M. Robinson, S. Raghibizadeh, C. W. Hogue, H. Bussey, B. Andrews, M. Tyers, and C. Boone, “Systematic genetic analysis with ordered arrays of yeast deletion mutants,” Science, vol. 294, no. 5550, pp. 2364–2368, 2001.

[32] M. Costanzo, A. Baryshnikova, J. Bellay, Y. Kim, E. D. Spear, C. S. Sevier, H. Ding, J. L. Koh, K. Toufighi, S. Mostafavi, J. Prinz, R. P. St. Onge, B. Vandersluis, T. Makhnevych, F. J. Vizeacoumar, S. Alizadeh, S. Bahr, R. L. Brost, Y. Chen, M. Cokol, R. Deshpande, Z. Li, Z. Y. Lin, W. Liang, M. Marback, J. Paw, B. J. S. Luis, E. Shuteriqi, A. H. Y. Tong, N. Van Dyk, I. M. Wallace, J. A. Whitney, M. T. Weirauch, G. Zhong, H. Zhu, W. A. Houry, M. Brudno, S. Ragibizadeh, B. Papp, C. Pál, F. P. Roth, G. Giaever, C. Nislow, O. G. Troyanskaya, H. Bussey, G. D. Bader, A. C. Gingras, Q. D. Morris, P. M. Kim, C. A. Kaiser, C. L. Myers, B. J. Andrews, and C. Boone, “The genetic landscape of a cell,” Science, vol. 327, no. 5964, pp. 425–431, 2010.

[33] A. V. Oliveira, R. Vilaça, C. N. Santos, V. Costa, and R. Menezes, “Exploring the power of yeast to model aging and age-related neurodegenerative disorders,” Biogerontology, vol. 18, pp. 3–34, 2017.

[34] D. G. Gibson, G. A. Benders, K. C. Axelrod, J. Zaveri, M. A. Algire, M. Moodie, M. G. Montague, J. C. Venter, H. O. Smith, and C. A. Hutchison, “One-step assembly in yeast of 25 overlapping DNA fragments to form a complete synthetic Mycoplasma genitalium genome,” Proceedings of the National Academy of Sciences of the USA, vol. 105, no. 51, pp. 20404–20409, 2008.

[35] S. N. Cohen, “DNA cloning: A personal view after 40 years,” Proceedings of the National Academy of Sciences of the USA, vol. 110, no. 39, pp. 15521–15529, 2013.

[36] K. Harada, M. Kameda, M. Suzuki, and S. Mitsuhashi, “Drug resistance of enteric bacteria. 3. acquisition of transferability,” Journal of bacteriology, vol. 88, no. 5, pp. 1257–1265, 1964.

[37] T. Watanabe, C. Ogata, and S. Sato, “Episome-mediated transfer of drug resistance in Enterobacteriaceae. VIII. Six-drug-resistance R factor,” Journal of bacteriology, vol. 88, no. 4, pp. 922–928, 1964.

[38] T. Watanabe and T. Fukasawa, “Episome-mediated transfer of drug resistance in Enterobacteriaceae. I. Transfer of resistance factors by conjugation.,” Journal of bacteriology, vol. 81, pp. 669–678, 1961.

[39] A. M. Campbell, “Episomes,” Advances in Genetics, vol. 11, pp. 101–145, 1963.

[40] F. Jacob, S. Brenner, and F. Cuzin, “On the regulation of DNA replication in bacteria,” in Cold Spring Harbor symposia on quantitative biology, vol. 28, pp. 329–348, Cold Spring Harbor Laboratory Press, 1963.

[41] S. N. Cohen, A. C. Y. Chang, and L. Hsu, “Nonchromosomal antibiotic resistance in bacteria: genetic transformation of Escherichia coli by R-factor DNA.,” Proceedings of the National Academy of Sciences of the USA, vol. 69, no. 8, pp. 2110–2114, 1972.

[42] H. O. Smith and K. W. Wilcox, “A restriction enzyme from Hemophilus influenzae. I. Purification and general properties,” Journal of Molecular Biology, vol. 51, no. 2, pp. 379–391, 1970.

[43] R. Yoshimori, D. Roulland-Dussoix, and H. W. Boyer, “R factor-controlled restriction and modification of deoxyribonucleic acid: restriction mutants.,” Journal of Bacteriology, vol. 112, no. 3, pp. 1275–1279, 1972.

[44] S. N. Cohen, A. C. Chang, H. W. Boyer, and R. B. Helling, “Construction of biologically functional bacterial plasmids in vitro,” Proceedings of the National Academy of Sciences of the USA, vol. 70, no. 11, pp. 3240–3244, 1973.

[45] A. C. Y. Chang and S. N. Cohen, “Genome construction between bacterial species in vitro: Replication and expression of Staphylococcus plasmid genes in Escherichia coli,” Proceedings of the National Academy of Sciences of the USA, vol. 71, no. 4, pp. 1030–1034, 1974.

[46] P. A. Sharp, B. Sugden, and J. Sambrook, “Detection of two restriction endonuclease activities in Haemophilus parainfluenzae using analytical agarose-ethidium bromide electrophoresis,” Biochemistry, vol. 12, no. 16, pp. 3055–3063, 1973.

[47] A. M. Maxam and W. Gilbert, “A new method for sequencing DNA,” Proceedings of the National Academy of Sciences of the USA, vol. 74, no. 2, pp. 560–564, 1977.

[48] F. Sanger, S. Nicklen, and A. R. Coulson, “DNA sequencing with chain-terminating inhibitors.,” Proceedings of the National Academy of Sciences of the USA, vol. 74, no. 12, pp. 5463–5467, 1977.

[49] R. K. Saiki, D. H. Gelfand, S. Stoffel, S. J. Scharf, R. Higuchi, G. T. Horn, K. B. Mullis, and H. A. Erlich, “Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase,” Science, vol. 239, no. 4839, pp. 487–491, 1988.

[50] L. Garibyan and N. Avashia, “Research techniques made simple: Polymerase chain reaction (PCR),” The Journal of investigative dermatology, vol. 133, no. 3, 2013.

[51] D. Garza, J. W. Ajioka, D. T. Burke, and D. L. Hartl, “Mapping the Drosophila genome with yeast artificial chromosomes,” Science, vol. 246, no. 4930, pp. 641–646, 1989.

[52] R. D. Little, G. Porta, G. F. Carle, D. Schlessinger, and M. D’Urso, “Yeast artificial chromosomes with 200- to 800-kilobase inserts of human DNA containing HLA, V(k), 5S, and Xq24-Xq28 sequences,” Proceedings of the National Academy of Sciences of the USA, vol. 86, no. 5, pp. 1598–1602, 1989.

[53] B. H. Brownstein, G. A. Silverman, R. D. Little, D. T. Burke, S. J. Korsmeyer, D. Schlessinger, and M. V. Olson, “Isolation of single-copy human genes from a library of yeast artificial chromosome clones,” Science, vol. 244, no. 4910, pp. 1348–1351, 1989.

[54] D. T. Burke, G. F. Carle, and M. V. Olson, “Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors,” Science, vol. 236, no. 4803, pp. 806–812, 1987.

[55] D. Marchuk and F. S. Collins, “pYAC-RC, a yeast artificial chromosome vector for cloning DNA cut with infrequently cutting restriction endonucleases,” Nucleic Acids Research, vol. 16, no. 15, p. 7743, 1988.

[56] H. Cooke and S. Cross, “pYAC-4 Neo, a yeast artificial chromosome vector which codes for G418 resistance in mammalian cells,” Nucleic Acids Research, vol. 16, no. 12, p. 11817, 1988.

[57] R. Anand, D. J. Ogilvie, R. Butler, J. H. Riley, R. S. Finniear, S. J. Powell, J. C. Smith, and A. F. Markham, “A yeast artificial chromosome contig encompassing the cystic fibrosis locus,” Genomics, vol. 9, no. 1, pp. 124–130, 1991.

[58] M. Ramsay, “Yeast artificial chromosome cloning,” Molecular Biotechnology, vol. 1, no. 2, pp. 181–201, 1994.

[59] S. Levy, G. Sutton, P. C. Ng, L. Feuk, A. L. Halpern, B. P. Walenz, N. Axelrod, J. Huang, E. F. Kirkness, G. Denisov, Y. Lin, J. R. MacDonald, A. W. C. Pang, M. Shago, T. B. Stockwell, A. Tsiamouri, V. Bafna, V. Bansal, S. A. Kravitz, D. A. Busam, K. Y. Beeson, T. C. McIntosh, K. A. Remington, J. F. Abril, J. Gill, J. Borman, Y. H. Rogers, M. E. Frazier, S. W. Scherer, R. L. Strausberg, and J. C. Venter, “The diploid genome sequence of an individual human,” PLoS Biology, vol. 5, no. 10, pp. 2113–2144, 2007.

[60] M. Ronaghi, M. Uhlén, and P. Nyrén, “A sequencing method based on real-time pyrophosphate,” Science, vol. 281, no. 5375, pp. 363–365, 1998.

[61] G. Turcatti, A. Romieu, M. Fedurco, and A. P. Tairi, “A new class of cleavable fluorescent nucleotides: Synthesis and optimization as reversible terminators for DNA sequencing by synthesis,” Nucleic Acids Research, vol. 36, no. 4, 2008.

[62] J. M. Heather and B. Chain, “The sequence of sequencers: The history of sequencing DNA,” Genomics, vol. 107, no. 1, pp. 1–8, 2016.

[63] G. E. Moore, “Cramming more components onto integrated circuits,” Electronics, vol. 38, no. 8, pp. 114–117, 1965.

[64] L. D. Stein, “The case for cloud computing in genome informatics,” Genome Biology, vol. 11, no. 5, 2010.

[65] E. E. Schadt, S. Turner, and A. Kasarskis, “A window into third-generation sequencing,” Human Molecular Genetics, vol. 19, no. 2, pp. 227–240, 2010.

[66] A. M. Michelson and A. R. Todd, “Nucleotides Part XXXII: Synthesis of a dithymidine dinucleotide containing a 3’:5’-internucleotidic linkage,” Journal of the Chemical Society, pp. 2632–2638, 1955.

[67] R. H. Hall, A. R. Todd, and R. F. Webb, “Nucleotides. Part XLI.* Mixed anhydrides as intermediates in the synthesis of dinucleoside phosphates.,” Journal of the Chemical Society, pp. 3291–3296, 1957.

[68] H. G. Khorana, W. E. Razzell, P. T. Gilham, G. M. Tener, and E. H. Pol, “Syntheses of dideoxyribonucleotides,” Journal of the American Chemical Society, vol. 79, no. 4, pp. 1002–1003, 1957.

[69] S. L. Beaucage and M. H. Caruthers, “Deoxynucleoside phosphoramidites-A new class of key intermediates for deoxypolynucleotide synthesis,” Tetrahedron Letters, vol. 22, no. 20, pp. 1859–1862, 1981.

[70] H. Lönnberg, “Synthesis of oligonucleotides on a soluble support,” Beilstein J. Org. Chem, vol. 13, pp. 1368–1387, 2017.

[71] A. R. Pike, L. H. Lie, R. A. Eagling, L. C. Ryder, S. N. Patole, B. A. Connolly, B. R. Horrocks, and A. Houlton, “DNA on silicon devices: On-chip synthesis, hybridization, and charge transfer,” Angewandte Chemie - International Edition, vol. 41, no. 4, pp. 615–617, 2002.

[72] J. J. Schwartz, C. Lee, and J. Shendure, “Accurate gene synthesis with tag-directed retrieval of sequence-verified DNA molecules,” Nature Methods, vol. 9, no. 9, pp. 913–915, 2012.

[73] H. Kim, H. Han, D. Shin, and D. Bang, “A Fluorescence Selection Method for Accurate Large-Gene Synthesis,” ChemBioChem, vol. 11, no. 17, pp. 2448–2452, 2010.

[74] S. Kosuri, N. Eroshenko, E. M. Leproust, M. Super, J. Way, J. B. Li, and G. M. Church, “Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips,” Nature Biotechnology, vol. 28, no. 12, pp. 1295–1299, 2010.

[75] D. G. Gibson, L. Young, R.-Y. Chuang, J. C. Venter, C. A. Hutchison, and H. O. Smith, “Enzymatic assembly of DNA molecules up to several hundred kilobases.,” Nature methods, vol. 6, pp. 343–345, May 2009.

[76] H. Muller, N. Annaluru, J. W. Schwerzmann, S. M. Richardson, J. S. Dymond, E. M. Cooper, J. S. Bader, J. D. Boeke, and S. Chandrasegaran, “Assembling Large DNA Segments in Yeast,” in Gene Synthesis: Methods and Protocols, vol. 852, pp. 133–150, Humana Press, 2012.

[77] E. Weber, C. Engler, R. Gruetzner, S. Werner, and S. Marillonnet, “A modular cloning system for standardized assembly of multigene constructs,” PLoS ONE, vol. 6, no. 2, 2011.

[78] S. Kosuri and G. M. Church, “Large-scale de novo DNA synthesis: technologies and applications.,” Nature methods, vol. 11, pp. 499–507, May 2014.

[79] J. E. Venetz, L. Del Medico, A. Wölfle, P. Schächle, Y. Bucher, D. Appert, F. Tschan, C. E. Flores-Tinoco, M. van Kooten, R. Guennoun, S. Deutsch, M. Christen, and B. Christen, “Chemical synthesis rewriting of a bacterial genome to achieve design flexibility and biological functionality,” Proceedings of the National Academy of Sciences of the USA, vol. 116, pp. 8070–8079, Apr. 2019.

[80] M. J. Czar, J. C. Anderson, J. S. Bader, and J. Peccoud, “Gene synthesis demystified,” Trends in Biotechnology, vol. 27, no. 2, pp. 63–72, 2009.

[81] M. Á. Medina, “Systems biology for molecular life sciences and its impact in biomedicine,” Cellular and Molecular Life Sciences, vol. 70, no. 6, pp. 1035–1053, 2013.

[82] J. C. Smuts, Holism and evolution. The Macmillan Company, 1926.

[83] H. Kitano, “Perspectives on systems biology,” New Generation Computing, vol. 18, no. 3, pp. 199–216, 2000.

[84] P. J. Bickel, J. B. Brown, H. Huang, and Q. Li, “An overview of recent developments in genomics and associated statistical methods,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 367, no. 1906, pp. 4313–4337, 2009.

[85] K. Dettmer, P. A. Aronov, and B. D. Hammock, “Mass spectrometry-based metabolomics,” Mass Spectrometry Reviews, vol. 26, pp. 51–78, 2006.

[86] R. Aebersold and M. Mann, “Mass spectrometry-based proteomics,” Nature, vol. 422, no. 6928, pp. 198–207, 2003.

[87] N. Gehlenborg, S. I. O’Donoghue, N. S. Baliga, A. Goesmann, M. A. Hibbs, H. Kitano, O. Kohlbacher, H. Neuweger, R. Schneider, D. Tenenbaum, and A. C. Gavin, “Visualization of omics data for systems biology,” Nature Methods, vol. 7, no. 3, pp. S56–S68, 2010.

[88] B. Palsson and K. Zengler, “The challenges of integrating multi-omic data sets,” Nature Chemical Biology, vol. 6, no. 10, pp. 787–789, 2010.

[89] H.-Y. Chuang, M. Hofree, and T. Ideker, “A decade of systems biology,” Annual Review of Cell and Developmental Biology, vol. 26, no. 1, pp. 721–744, 2010.

[90] T. van Opijnen, K. L. Bodi, and A. Camilli, “Tn-seq: High-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms,” Nature Methods, vol. 6, no. 10, pp. 767–772, 2009.

[91] D. E. Berg, J. Davies, B. Allet, and J.-D. Rochaix, “Transposition of R factor genes to bacteriophage λ,” Proceedings of the National Academy of Sciences of the USA, vol. 72, no. 9, pp. 3628–3632, 1975.

[92] W. S. Reznikoff, “Transposon Tn5,” Annual Review of Genetics, vol. 42, pp. 269–286, 2008.

[93] R. C. Johnson and W. S. Reznikoff, “DNA sequences at the ends of transposon Tn5 required for transposition,” Nature, vol. 304, no. 5923, pp. 280–282, 1983.

[94] C. Sasakawa, G. F. Carle, and D. E. Berg, “Sequences essential for transposition at the termini of IS50,” Proceedings of the National Academy of Sciences of the USA, vol. 80, no. 23, pp. 7293–7297, 1983.

[95] I. Y. Goryshin, J. A. Miller, Y. V. Kil, V. A. Lanzov, and W. S. Reznikoff, “Tn5 / IS50 target recognition,” Proceedings of the National Academy of Sciences of the USA, vol. 95, no. September, pp. 10716–10721, 1998.

[96] Y. Shevchenko, G. G. Bouffard, Y. S. N. Butterfield, R. W. Blakesley, J. L. Hartley, A. C. Young, M. A. Marra, S. J. M. Jones, J. W. Touchman, and E. D. Green, “Systematic sequencing of cDNA clones using the transposon Tn5,” Nucleic Acids Research, vol. 30, no. 11, pp. 2469–2477, 2002.

[97] I. Y. Goryshin and W. S. Reznikoff, “Tn5 in vitro transposition,” Journal of Biological Chemistry, vol. 273, no. 13, pp. 7367–7374, 1998.

[98] B. Christen, E. Abeliuk, J. M. Collier, V. S. Kalogeraki, B. Passarelli, J. A. Coller, M. J. Fero, H. H. McAdams, and L. Shapiro, “The essential genome of a bacterium.,” Molecular Systems Biology, vol. 7, no. 1, pp. 528–528, 2011.

[99] T. G. Dong, B. T. Ho, D. R. Yoder-Himes, and J. J. Mekalanos, “Identification of T6SS-dependent effector and immunity proteins by Tn-seq in Vibrio cholerae,” Proceedings of the National Academy of Sciences of the USA, vol. 110, no. 7, pp. 2623–2628, 2013.

[100] A. M. Ochsner, M. Christen, L. Hemmerle, R. Peyraud, B. Christen, and J. A. Vorholt, “Transposon sequencing uncovers an essential regulatory function of phosphoribulokinase for methylotrophy,” Current Biology, vol. 27, no. 17, pp. 2579–2588, 2017.

[101] J. F. Sternon, P. Godessart, R. G. de Freitas, M. Van der Henst, K. Poncin, N. Francis, K. Willemart, M. Christen, B. Christen, J. J. Letesson, and X. De Bolle, “Transposon sequencing of Brucella abortus uncovers essential genes for growth in vitro and inside macrophages,” Infection and Immunity, vol. 86, no. 8, pp. 1–20, 2018.

[102] C. A. Hutchison, S. N. Peterson, S. R. Gill, R. T. Cline, O. White, C. M. Fraser, H. O. Smith, and J. C. Venter, “Global transposon mutagenesis and a minimal Mycoplasma genome,” Science, vol. 286, no. 5447, pp. 2165–2169, 1999.

[103] M. A. Jacobs, A. Alwood, I. Thaipisuttikul, D. Spencer, E. Haugen, S. Ernst, O. Will, R. Kaul, C. Raymond, R. Levy, L. Chun-Rong, D. Guenthner, D. Bovee, M. V. Olson, and C. Manoil, “Comprehensive transposon mutant library of Pseudomonas aeruginosa,” Proceedings of the National Academy of Sciences of the USA, vol. 100, no. 24, pp. 14339–14344, 2003.

[104] J. I. Glass, N. Assad-Garcia, N. Alperovich, S. Yooseph, M. R. Lewis, M. Maruf, C. A. Hutchison, H. O. Smith, and J. C. Venter, “Essential genes of a minimal bacterium,” Proceedings of the National Academy of Sciences of the USA, vol. 103, no. 2, pp. 425–430, 2006.

[105] K. Kobayashi, S. D. Ehrlich, A. Albertini, G. Amati, K. K. Andersen, M. Arnaud, K. Asai, S. Ashikaga, S. Aymerich, P. Bessieres, F. Boland, S. C. Brignell, S. Bron, K. Bunai, J. Chapuis, L. C. Christiansen, A. Danchin, M. Débarbouillé, E. Dervyn, E. Deuerling, K. Devine, S. K. Devine, O. Dreesen, J. Errington, S. Fillinger, S. J. Foster, Y. Fujita, A. Galizzi, R. Gardan, C. Eschevins, T. Fukushima, K. Haga, C. R. Harwood, M. Hecker, D. Hosoya, M. F. Hullo, H. Kakeshita, D. Karamata, Y. Kasahara, F. Kawamura, K. Koga, P. Koski, R. Kuwana, D. Imamura, M. Ishimaru, S. Ishikawa, I. Ishio, D. le Coq, A. Masson, C. Mauël, R. Meima, R. P. Mellado, A. Moir, S. Moriya, E. Nagakawa, H. Nanamiya, S. Nakai, P. Nygaard, M. Ogura, T. Ohanan, M. O’Reilly, M. O’Rourke, Z. Pragai, H. M. Pooley, G. Rapoport, J. P. Rawlins, L. A. Rivas, C. Rivolta, A. Sadaie, Y. Sadaie, M. Sarvas, T. Sato, H. H. Saxild, E. Scanlan, W. Schumann, J. F. Seegers, J. Sekiguchi, A. Sekowska, S. J. Séror, M. Simon, P. Stragier, R. Studer, H. Takamatsu, T. Tanaka, M. Takeuchi, H. B. Thomaides, V. Vagner, J. M. van Dijl, K. Watabe, A. Wipat, H. Yamamoto, M. Yamamoto, Y. Yamamoto, K. Yamane, K. Yata, K. Yoshida, H. Yoshikawa, U. Zuber, and N. Ogasawara, “Essential Bacillus subtilis genes,” Proceedings of the National Academy of Sciences of the USA, vol. 100, no. 8, pp. 4678–4683, 2003.

[106] T. Baba, T. Ara, M. Hasegawa, Y. Takai, Y. Okumura, M. Baba, K. A. Datsenko, M. Tomita, B. L. Wanner, and H. Mori, “Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: The Keio collection,” Molecular Systems Biology, vol. 2, 2006.

[107] M. Way, “What I cannot create, I do not understand,” The Company of Biologists Ltd, 2017.

[108] G. Weng, U. S. Bhalla, and R. Iyengar, “Complexity in biological signaling systems,” Science, vol. 284, no. 5411, pp. 92–96, 1999.

[109] M. Juhas, “On the road to synthetic life: The minimal cell and genome-scale engineering,” Critical Reviews in Biotechnology, vol. 36, no. 3, pp. 416–423, 2016.

[110] M. Juhas, L. Eberl, and J. I. Glass, “Essence of life: Essential genes of minimal genomes,” Trends in Cell Biology, vol. 21, no. 10, pp. 562–568, 2011.

[111] G. Pósfai, G. Plunkett, T. Fehér, D. Frisch, G. M. Keil, K. Umenhoffer, V. Kolisnychenko, B. Stahl, S. S. Sharma, M. De Arruda, V. Burland, S. W. Harcum, and F. R. Blattner, “Emergent properties of reduced-genome Escherichia coli,” Science, vol. 312, no. 5776, pp. 1044–1046, 2006.

[112] T. Fehér, B. Papp, C. Pál, and G. Pósfai, “Systematic genome reductions: Theoretical and experimental approaches,” Chemical Reviews, vol. 107, no. 8, pp. 3498–3513, 2007.

[113] T. Morimoto, R. Kadoya, K. Endo, M. Tohata, K. Sawada, S. Liu, T. Ozawa, T. Kodama, H. Kakeshita, Y. Kageyama, K. Manabe, S. Kanaya, K. Ara, K. Ozaki, and N. Ogasawara, “Enhanced recombinant protein productivity by genome reduction in Bacillus subtilis,” DNA Research, vol. 15, no. 2, pp. 83–91, 2008.

[114] K. Manabe, Y. Kageyama, T. Morimoto, T. Ozawa, K. Sawada, K. Endo, M. Tohata, K. Ara, K. Ozaki, and N. Ogasawara, “Combined effect of improved cell yield and increased specific productivity enhances recombinant enzyme production in genome-reduced Bacillus subtilis strain MGB874,” Applied and Environmental Microbiology, vol. 77, no. 23, pp. 8370–8381, 2011.

[115] M. Thibonnier, J. M. Thiberge, and H. De Reuse, “Trans-translation in Helicobacter pylori: Essentiality of ribosome rescue and requirement of protein tagging for stress resistance and competence,” PLoS ONE, vol. 3, no. 11, 2008.

[116] J. O’Neill, “Tackling drug-resistant infections globally: Final report and recommendations.,” HM Government and Wellcome Trust: UK, 2016.

[117] D. E. Cameron, C. J. Bashor, and J. J. Collins, “A brief history of synthetic biology,” Nature Reviews Microbiology, vol. 12, no. 5, pp. 381–390, 2014.

[118] H. H. McAdams and L. Shapiro, “Circuit simulation of genetic networks,” Science, vol. 269, no. 5224, pp. 650–656, 1995.

[119] H. H. McAdams and A. Arkin, “Gene regulation: Towards a circuit engineering discipline,” Current Biology, vol. 10, no. 8, pp. 318–320, 2000.

[120] T. S. Gardner, C. R. Cantor, and J. J. Collins, “Construction of a genetic toggle switch in Escherichia coli,” Nature, vol. 403, no. 6767, pp. 339–342, 2000.

[121] M. B. Elowitz and S. Leibler, “A synthetic oscillatory network of transcriptional regulators,” Nature, vol. 403, no. 6767, pp. 335–338, 2000.

[122] A. Becskei and L. Serrano, “Engineering stability in gene networks by autoregulation,” Nature, vol. 405, no. 6786, pp. 590–593, 2000.

[123] A. Becskei, B. Séraphin, and L. Serrano, “Positive feedback in eukaryotic gene networks: Cell differentiation by graded to binary response conversion,” EMBO Journal, vol. 20, no. 10, pp. 2528–2535, 2001.

[124] J. Hasty, D. McMillen, and J. J. Collins, “Engineered gene circuits,” Nature, vol. 420, no. 6912, pp. 224–230, 2002.

[125] F. J. Isaacs, J. Hasty, C. R. Cantor, and J. J. Collins, “Prediction and measurement of an autoregulatory genetic module,” Proceedings of the National Academy of Sciences of the USA, vol. 100, no. 13, pp. 7714–7719, 2003.

[126] J. C. Anderson, C. A. Voigt, and A. P. Arkin, “Environmental signal integration by a modular AND gate,” Molecular Systems Biology, vol. 3, no. 133, 2007.

[127] S. Basu, Y. Gerchman, C. H. Collins, F. H. Arnold, and R. Weiss, “A synthetic multicellular system for programmed pattern formation,” Nature, vol. 434, no. 7037, pp. 1130–1134, 2005.

[128] L. You, R. S. Cox, R. Weiss, and F. H. Arnold, “Programmed population control by cell-cell communication and regulated killing,” Nature, vol. 428, no. 6985, pp. 868–871, 2004.

[129] T. Danino, O. Mondragón-Palomino, L. Tsimring, and J. Hasty, “A synchronized quorum of genetic clocks,” Nature, vol. 463, no. 7279, pp. 326–330, 2010.

[130] J. J. Tabor, H. M. Salis, Z. B. Simpson, A. A. Chevalier, A. Levskaya, E. M. Marcotte, C. A. Voigt, and A. D. Ellington, “A synthetic genetic edge detection program,” Cell, vol. 137, no. 7, pp. 1272–1281, 2009.

[131] T. Youyou, N. Muyun, Z. Yurong, L. Lanna, G. Shulian, Z. Muqun, W. Xiuzhen, and L. Xiaotian, “Studies on the constituents of Artemisia annua part II,” Journal of Medicinal Plant Research, vol. 44, pp. 143–145, 1982.

[132] R. G. Ridley, “Medical need, scientific opportunity and the drive for antimalarial drugs.,” Nature, vol. 415, no. 1, pp. 686–693, 2002.

[133] V. Hale, J. D. Keasling, N. Renninger, and T. T. Diagana, “Microbially derived artemisinin: A biotechnology solution to the global problem of access to affordable antimalarial drugs,” American Journal of Tropical Medicine and Hygiene, vol. 77, no. SUPPL. 6, pp. 198–202, 2007.

[134] C. J. Paddon, P. J. Westfall, D. J. Pitera, K. Benjamin, K. Fisher, D. McPhee, M. D. Leavell, A. Tai, A. Main, D. Eng, D. R. Polichuk, K. H. Teoh, D. W. Reed, T. Treynor, J. Lenihan, H. Jiang, M. Fleck, S. Bajad, G. Dang, D. Dengrove, D. Diola, G. Dorin, K. W. Ellens, S. Fickes, J. Galazzo, S. P. Gaucher, T. Geistlinger, R. Henry, M. Hepp, T. Horning, T. Iqbal, L. Kizer, B. Lieu, D. Melis, N. Moss, R. Regentin, S. Secrest, H. Tsuruta, R. Vazquez, L. F. Westblade, L. Xu, M. Yu, Y. Zhang, L. Zhao, J. Lievense, P. S. Covello, J. D. Keasling, K. K. Reiling, N. S. Renninger, and J. D. Newman, “High-level semi-synthetic production of the potent antimalarial artemisinin,” Nature, vol. 496, no. 7446, pp. 528–532, 2013.

[135] S. Atsumi, T. Hanai, and J. C. Liao, “Non-fermentative pathways for synthesis of branched-chain higher alcohols as biofuels,” Nature, vol. 451, no. 7174, pp. 86–89, 2008.

[136] Y. X. Huo, K. M. Cho, J. G. Rivera, E. Monte, C. R. Shen, Y. Yan, and J. C. Liao, “Conversion of proteins into biofuels by engineering nitrogen flux,” Nature Biotechnology, vol. 29, no. 4, pp. 346–351, 2011.

[137] E. J. Steen, Y. Kang, G. Bokinsky, Z. Hu, A. Schirmer, A. McClure, S. B. Del Cardayre, and J. D. Keasling, “Microbial production of fatty-acid-derived fuels and chemicals from plant biomass,” Nature, vol. 463, no. 7280, pp. 559–562, 2010.

[138] Y. J. Choi and S. Y. Lee, “Microbial production of short-chain alkanes,” Nature, vol. 502, no. 7472, pp. 571–574, 2013.

[139] H. Yim, R. Haselbeck, W. Niu, C. Pujol-Baxley, A. Burgard, J. Boldt, J. Khandurina, J. D. Trawick, R. E. Osterhout, R. Stephen, J. Estadilla, S. Teisan, H. B. Schreyer, S. Andrae, T. H. Yang, S. Y. Lee, M. J. Burk, and S. Van Dien, “Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol,” Nature Chemical Biology, vol. 7, no. 7, pp. 445–452, 2011.

[140] D. G. Gibson, J. I. Glass, C. Lartigue, V. N. Noskov, R.-Y. Chuang, M. A. Algire, G. A. Benders, M. G. Montague, L. Ma, M. M. Moodie, C. Merryman, S. Vashee, R. Krishnakumar, N. Assad-Garcia, C. Andrews-Pfannkoch, E. A. Denisova, L. Young, Z.-Q. Qi, T. H. Segall-Shapiro, C. H. Calvey, P. P. Parmar, C. A. Hutchison, H. O. Smith, and J. C. Venter, “Creation of a bacterial cell controlled by a chemically synthesized genome.,” Science, vol. 329, pp. 52–56, July 2010.

[141] D. G. Gibson, G. A. Benders, C. Andrews-Pfannkoch, E. A. Denisova, H. Baden-Tillson, J. Zaveri, T. B. Stockwell, A. Brownley, D. W. Thomas, M. A. Algire, C. Merryman, L. Young, V. N. Noskov, J. I. Glass, J. C. Venter, C. A. Hutchison III, and H. O. Smith, “Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome,” Science, vol. 319, no. 5867, pp. 1215–1220, 2008.

[142] J. Westberg, A. Persson, A. Holmberg, A. Goesmann, J. Lundeberg, K. E. Johansson, B. Pettersson, and M. Uhlén, “The genome sequence of Mycoplasma mycoides subsp. mycoides SC type strain PG1T, the causative agent of contagious bovine pleuropneumonia (CBPP),” Genome Research, vol. 14, no. 2, pp. 221–227, 2004.

[143] C. Lartigue, J. I. Glass, N. Alperovich, R. Pieper, P. P. Parmar, C. A. Hutchison, H. O. Smith, and J. C. Venter, “Genome transplantation in bacteria: Changing one species to another,” Science, vol. 317, no. 5838, pp. 632–638, 2007.

[144] C. Lartigue, S. Vashee, M. A. Algire, R.-Y. Chuang, G. A. Benders, L. Ma, V. N. Noskov, E. A. Denisova, D. G. Gibson, N. Assad-Garcia, N. Alperovich, D. W. Thomas, C. Merryman, C. A. Hutchison, H. O. Smith, J. C. Venter, and J. I. Glass, “Creating bacterial strains from genomes that have been cloned and engineered in yeast,” Science, vol. 325, no. 5948, pp. 1693–1696, 2009.

[145] C. A. Hutchison, R.-Y. Chuang, V. N. Noskov, N. Assad-Garcia, T. J. Deerinck, M. H. Ellisman, J. Gill, K. Kannan, B. J. Karas, L. Ma, J. F. Pelletier, Z.-Q. Qi, R. A. Richter, E. A. Strychalski, L. Sun, Y. Suzuki, B. Tsvetanova, K. S. Wise, H. O. Smith, J. I. Glass, C. Merryman, D. G. Gibson, and J. C. Venter, “Design and synthesis of a minimal bacterial genome.,” Science, vol. 351, no. 6280, p. 1414 aad6253, 2016.

[146] M. Christen, S. Deutsch, and B. Christen, “Genome Calligrapher: A Web Tool for Refactoring Bacterial Genome Sequences for de novo DNA Synthesis.,” ACS synthetic biology, vol. 4, pp. 927–934, Aug. 2015.

[147] M. A. Jensen, M. Fukushima, and R. W. Davis, “DMSO and betaine greatly improve amplification of GC-rich constructs in de novo synthesis,” PLoS ONE, vol. 5, no. 6, pp. 1–5, 2010.

[148] R. T. Kovacic, L. Comal, and A. J. Bendich, “Protection of megabase DNA from shearing,” Nucleic Acids Research, vol. 23, no. 19, pp. 3999–4000, 1995.

[149] N. Kouprina and V. Larionov, “Selective isolation of genomic loci from complex genomes by transformation-associated recombination cloning in the yeast Saccharomyces cerevisiae,” Nature protocols, vol. 3, no. 3, pp. 371–377, 2008.

[150] V. Wood, R. Gwilliam, M.-A. Rajandream, M. Lyne, R. Lyne, A. Stewart, J. Sgouros, N. Peat, J. Hayles, S. Baker, D. Basham, S. Bowman, K. Brooks, D. Brown, S. Brown, T. Chillingworth, C. Churcher, M. Collins, R. Connor, A. Cronin, P. Davis, T. Feltwell, A. Fraser, S. Gentles, A. Goble, N. Hamlin, D. Harris, J. Hidalgo, G. Hodgson, S. Holroyd, T. Hornsby, S. Howarth, E. J. Huckle, S. Hunt, K. Jagels, K. James, L. Jones, M. Jones, S. Leather, S. McDonald, J. McLean, P. Mooney, S. Moule, K. Mungall, L. Murphy, D. Niblett, C. Odell, K. Oliver, S. O’Neil, D. Pearson, M. A. Quail, E. Rabbinowitsch, K. Rutherford, S. Rutter, D. Saunders, K. Seeger, S. Sharp, J. Skelton, M. Simmonds, R. Squares, S. Squares, K. Stevens, K. Taylor, R. G. Taylor, A. Tivey, S. Walsh, T. Warren, S. Whitehead, J. Woodward, G. Volckaert, R. Aert, J. Robben, B. Grymonprez, I. Weltjens, E. Vanstreels, M. Rieger, M. Schäfer, S. Müller-Auer, C. Gabel, M. Fuchs, C. Fritzc, E. Holzer, D. Moestl, H. Hilbert, K. Borzym, I. Langer, A. Beck, H. Lehrach, R. Reinhardt, T. M. Pohl, P. Eger, W. Zimmermann, H. Wedler, R. Wambutt, B. Purnelle, A. Goffeau, E. Cadieu, S. Dréano, S. Gloux, V. Lelaure, S. Mottier, F. Galibert, S. J. Aves, Z. Xiang, C. Hunt, K. Moore, S. M. Hurst, M. Lucas, M. Rochet, C. Gaillardin, V. A. Tallada, A. Garzon, G. Thode, R. R. Daga, L. Cruzado, J. Jimenez, M. Sanchez, F. del Rey, J. Benito, A. Dominguez, J. L. Revuelta, S. Moreno, J. Armstrong, S. L. Forsburg, L. Cerrutti, T. Lowe, W. R. McCombie, I. Paulsen, J. Potashkin, G. V. Shpakovski, D. Ussery, B. G. Barrell, and P. Nurse, “The genome sequence of Schizosaccharomyces pombe,” Nature, vol. 415, no. 6874, pp. 871–880, 2002.

[151] M. Christen, L. Del Medico, H. Christen, and B. Christen, “Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications.,” PloS one, vol. 12, no. 5, pp. 1–19, 2017.

[152] V. Siewers, “An overview on selection marker genes for transformation of Saccharomyces cerevisiae,” in Yeast Metabolic Engineering, pp. 3–15, Springer, 2014.

[153] S. P. Bell and B. Stillman, “ATP-dependent recognition of eukaryotic origins of DNA replication by a multiprotein complex,” Nature, vol. 357, no. 6374, pp. 128–134, 1992.

[154] J. Broach, Y.-Y. Li, J. Feldman, M. Jayaram, J. Abraham, K. Nasmyth, and J. Hicks, “Localization and sequence analysis of yeast origins of replication,” in Cold Spring Harbor Symposia on Quantitative Biology, vol. 47, pp. 1165–1173, Cold Spring Harbor Laboratory Press, 1983.

[155] V. N. Noskov, B. J. Karas, L. Young, R.-Y. Chuang, D. G. Gibson, Y.-C. Lin, J. Stam, I. T. Yonemoto, Y. Suzuki, C. Andrews-Pfannkoch, J. I. Glass, H. O. Smith, C. A. Hutchison, J. C. Venter, and P. D. Weyman, “Assembly of large, high G+C bacterial DNA fragments in yeast.,” ACS synthetic biology, vol. 1, pp. 267–273, July 2012.

[156] F. Spencer, Y. Hugerat, G. Simchen, O. Hurko, C. Connelly, and P. Hieter, “Yeast kar1 mutants provide an effective method for YAC transfer to new hosts.,” Genomics, vol. 22, pp. 118–126, July 1994.

[157] J. E. Dicarlo, J. E. Norville, P. Mali, X. Rios, J. Aach, and G. M. Church, “Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems,” Nucleic Acids Research, vol. 41, no. 7, pp. 4336–4343, 2013.

[158] M. Haeussler, K. Schönig, H. Eckert, A. Eschstruth, J. Mianné, J.-B. Renaud, S. Schneider-Maunoury, A. Shkumatava, L. Teboul, J. Kent, J.-S. Joly, and J.-P. Concordet, “Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR,” Genome biology, vol. 17, no. 1, pp. 1–12, 2016.

[159] C. B. Brachmann, A. Davies, G. J. Cost, E. Caputo, J. Li, P. Hieter, and J. D. Boeke, “Designer deletion strains derived from Saccharomyces cerevisiae S288C: A useful set of strains and plasmids for PCR-mediated gene disruption and other applications,” Yeast, vol. 14, no. 2, pp. 115–132, 1998.

[160] D. T. Stinchcomb, K. Struhl, and R. W. Davis, “Isolation and characterisation of a yeast chromosomal replicator,” Nature, vol. 282, no. 5734, pp. 39–43, 1979.

[161] J. A. Huberman, J. Zhu, L. R. Davis, and C. S. Newlon, “Close association of a DNA replication origin and an ARS element on chromosome III of the yeast, Saccharomyces cerevisiae,” Nucleic acids research, vol. 16, no. 14, pp. 6373–6384, 1988.

[162] B. J. Brewer and W. L. Fangman, “The Localization of Replication Origins on ARS Plasmids in S. cerevisiae,” Cell, vol. 51, pp. 463–471, 1988.

[163] I. Liachko, R. A. Youngblood, U. Keich, and M. J. Dunham, “High-resolution mapping, characterization, and optimization of autonomously replicating sequences in yeast.,” Genome research, vol. 23, pp. 698–704, Apr. 2013.

[164] A. Rowley, J. H. Cocker, J. Harwood, and J. F. X. Diffley, “Initiation complex assembly at budding yeast replication origins begins with the recognition of a bipartite sequence by limiting amounts of the initiator, ORC,” The EMBO Journal, vol. 14, no. 11, pp. 2631–2641, 1995.

[165] H. Rao and B. Stillman, “The origin recognition complex interacts with a bipartite DNA binding site within yeast replicators,” Proceedings of the National Academy of Sciences of the USA, vol. 92, no. 6, pp. 2224–2228, 1995.

[166] C. C. Siow, S. R. Nieduszynska, C. A. Müller, and C. A. Nieduszynski, “OriDB, the DNA replication origin database updated and extended,” Nucleic Acids Research, vol. 40, no. D1, pp. 682–686, 2012.

[167] S. G. N. Grant, J. Jessee, F. R. Bloom, and D. Hanahan, “Differential plasmid rescue from transgenic mouse DNAs into Escherichia coli methylation-restriction mutants,” Proceedings of the National Academy of Sciences of the USA, vol. 87, pp. 4645–4649, June 1990.

[168] R. D. Gietz and R. H. Schiestl, “High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method,” Nature Protocols, vol. 2, no. 1, pp. 31–34, 2007.

[169] R. S. Sikorski and P. Hieter, “A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae.,” Genetics, vol. 122, no. 1, pp. 19–27, 1989.

[170] V. Larionov, N. Kouprina, J. Graves, and M. A. Resnick, “Highly selective isolation of human DNAs from rodent-human hybrid cells as circular yeast artificial chromosomes by transformation-associated recombination cloning,” Proceedings of the National Academy of Sciences of the USA, vol. 93, no. 24, pp. 13925–13930, 1996.

[171] K. S. Makarova, Y. I. Wolf, and E. V. Koonin, “Towards functional characterization of archaeal genomic dark matter,” Biochemical Society Transactions, vol. 47, no. 1, pp. 389–398, 2019.

[172] M. K. Estes, K. Ettayebi, V. R. Tenge, K. Murakami, U. Karandikar, S. C. Lin, B. V. Ayyar, N. W. Cortes-Penfield, K. Haga, F. H. Neill, A. R. Opekun, J. R. Broughman, X. L. Zeng, S. E. Blutt, S. E. Crawford, S. Ramani, D. Y. Graham, and R. L. Atmar, “Human Norovirus Cultivation in Nontransformed Stem Cell-Derived Human Intestinal Enteroid Cultures: Success and Challenges,” Viruses, vol. 11, no. 7, pp. 9–11, 2019.

[173] D. S. Ottoz, F. Rudolf, and J. Stelling, “Inducible, tightly regulated and growth condition-independent transcription factor in Saccharomyces cerevisiae,” Nucleic Acids Research, vol. 42, no. 17, pp. 1–11, 2014.

[174] J. Cello, A. V. Paul, and E. Wimmer, “Chemical synthesis of poliovirus cDNA: generation of infectious virus in the absence of natural template,” Science, vol. 297, no. 5583, pp. 1016–1018, 2002.

[175] H. O. Smith, C. A. Hutchison, C. Pfannkoch, and J. C. Venter, “Generating a synthetic genome by whole genome assembly: phiX174 bacteriophage from synthetic oligonucleotides.,” Proceedings of the National Academy of Sciences of the USA, vol. 100, pp. 15440–15445, Dec. 2003.

[176] N. Annaluru, H. Muller, L. A. Mitchell, S. Ramalingam, G. Stracquadanio, S. M. Richardson, J. S. Dymond, Z. Kuang, L. Z. Scheifele, E. M. Cooper, Y. Cai, K. Zeller, N. Agmon, J. S. Han, M. Hadjithomas, J. Tullman, K. Caravelli, K. Cirelli, Z. Guo, V. London, A. Yeluru, S. Murugan, K. Kandavelou, N. Agier, G. Fischer, K. Yang, J. A. Martin, M. Bilgel, P. Bohutskyi, K. M. Boulier, B. J. Capaldo, J. Chang, K. Charoen, W. J. Choi, P. Deng, J. E. DiCarlo, J. Doong, J. Dunn, J. I. Feinberg, C. Fernandez, C. E. Floria, D. Gladowski, P. Hadidi, I. Ishizuka, J. Jabbari, C. Y. L. Lau, P. A. Lee, S. Li, D. Lin, M. E. Linder, J. Ling, J. Liu, J. Liu, M. London, H. Ma, J. Mao, J. E. McDade, A. McMillan, A. M. Moore, W. C. Oh, Y. Ouyang, R. Patel, M. Paul, L. C. Paulsen, J. Qiu, A. Rhee, M. G. Rubashkin, I. Y. Soh, N. E. Sotuyo, V. Srinivas, A. Suarez, A. Wong, R. Wong, W. R. Xie, Y. Xu, A. T. Yu, R. Koszul, J. S. Bader, J. D. Boeke, and S. Chandrasegaran, “Total synthesis of a functional designer eukaryotic chromosome,” Science, vol. 344, no. 6179, pp. 55–58, 2014.

[177] L. A. Mitchell, A. Wang, G. Stracquadanio, Z. Kuang, X. Wang, K. Yang, S. Richardson, J. A. Martin, Y. Zhao, R. Walker, Y. Luo, H. Dai, K. Dong, Z. Tang, Y. Yang, Y. Cai, A. Heguy, B. Ueberheide, D. Fenyö, J. Dai, J. S. Bader, and J. D. Boeke, “Synthesis, debugging, and effects of synthetic chromosome consolidation: synVI and beyond,” Science, vol. 355, no. 6329, p. eaaf4831, 2017.

[178] S. M. Richardson, L. A. Mitchell, G. Stracquadanio, K. Yang, J. S. Dymond, J. E. DiCarlo, D. Lee, C. L. V. Huang, S. Chandrasegaran, Y. Cai, J. D. Boeke, and J. S. Bader, “Design of a synthetic yeast genome.,” Science, vol. 355, pp. 1040–1044, Mar. 2017.

[179] Y. Shen, Y. Wang, T. Chen, F. Gao, J. Gong, D. Abramczyk, R. Walker, H. Zhao, S. Chen, W. Liu, Y. Luo, C. A. Müller, A. Paul-Dubois-Taine, B. Alver, G. Stracquadanio, L. A. Mitchell, Z. Luo, Y. Fan, B. Zhou, B. Wen, F. Tan, Y. Wang, J. Zi, Z. Xie, B. Li, K. Yang, S. M. Richardson, H. Jiang, C. E. French, C. A. Nieduszynski, R. Koszul, A. L. Marston, Y. Yuan, J. Wang, J. S. Bader, J. Dai, J. D. Boeke, X. Xu, Y. Cai, and H. Yang, “Deep functional analysis of synII, a 770-kilobase synthetic yeast chromosome,” Science, vol. 355, no. 6329, 2017.

[180] Y. Wu, B.-Z. Li, M. Zhao, L. A. Mitchell, Z.-X. Xie, Q.-H. Lin, X. Wang, W.-H. Xiao, Y. Wang, X. Zhou, H. Liu, X. Li, M.-Z. Ding, D. Liu, L. Zhang, B.-L. Liu, X.-L. Wu, F.-F. Li, X.-T. Dong, B. Jia, W.-Z. Zhang, G.-Z. Jiang, Y. Liu, X. Bai, T.-Q. Song, Y. Chen, S.-J. Zhou, R.-Y. Zhu, F. Gao, Z. Kuang, X. Wang, M. Shen, K. Yang, G. Stracquadanio, S. M. Richardson, Y. Lin, L. Wang, R. Walker, Y. Luo, P.-S. Ma, H. Yang, Y. Cai, J. Dai, J. S. Bader, J. D. Boeke, and Y.-J. Yuan, “Bug mapping and fitness testing of chemically synthesized chromosome X.,” Science, vol. 355, p. 1048 eaaf4706, Mar. 2017.

[181] Z.-X. Xie, B.-Z. Li, L. A. Mitchell, Y. Wu, X. Qi, Z. Jin, B. Jia, X. Wang, B.-X. Zeng, H.-M. Liu, X.-L. Wu, Q. Feng, W.-Z. Zhang, W. Liu, M.-Z. Ding, X. Li, G.-R. Zhao, J.-J. Qiao, J.-S. Cheng, M. Zhao, Z. Kuang, X. Wang, J. A. Martin, G. Stracquadanio, K. Yang, X. Bai, J. Zhao, M.-L. Hu, Q.-H. Lin, W.-Q. Zhang, M.-H. Shen, S. Chen, W. Su, E.-X. Wang, R. Guo, F. Zhai, X.-J. Guo, H.-X. Du, J.-Q. Zhu, T.-Q. Song, J.-J. Dai, F.-F. Li, G.-Z. Jiang, S.-L. Han, S.-Y. Liu, Z.-C. Yu, X.-N. Yang, K. Chen, C. Hu, D.-S. Li, N. Jia, Y. Liu, L.-T. Wang, S. Wang, X.-T. Wei, M.-Q. Fu, L.-M. Qu, S.-Y. Xin, T. Liu, K.-R. Tian, X.-N. Li, J.-H. Zhang, L.-X. Song, J.-G. Liu, J.-F. Lv, H. Xu, R. Tao, Y. Wang, T.-T. Zhang, Y.-X. Deng, Y.-R. Wang, T. Li, G.-X. Ye, X.-R. Xu, Z.-B. Xia, W. Zhang, S.-L. Yang, Y.-L. Liu, W.-Q. Ding, Z.-N. Liu, J.-Q. Zhu, N.-Z. Liu, R. Walker, Y. Luo, Y. Wang, Y. Shen, H. Yang, Y. Cai, P.-S. Ma, C.-T. Zhang, J. S. Bader, J. D. Boeke, and Y.-J. Yuan, “‘Perfect’ designer chromosome V and behavior of a ring derivative.,” Science, vol. 355, p. 1046 eaaf4704, Mar. 2017.

[182] J. R. Coleman, D. Papamichail, S. Skiena, B. Futcher, E. Wimmer, and S. Mueller, “Virus attenuation by genome-scale changes in codon pair bias,” Science, vol. 320, no. 5884, pp. 1784–1787, 2008.

[183] M. A. Martínez, A. Jordan-Paiz, S. Franco, and M. Nevot, “Synonymous virus genome recoding as a tool to impact viral fitness,” Trends in microbiology, vol. 24, no. 2, pp. 134–147, 2016.

[184] S. Mueller, J. R. Coleman, D. Papamichail, C. B. Ward, A. Nimnual, B. Futcher, S. Skiena, and E. Wimmer, “Live attenuated influenza virus vaccines by computer-aided rational design,” Nature biotechnology, vol. 28, no. 7, p. 723, 2010.

[185] H. H. Wang, F. J. Isaacs, P. A. Carr, Z. Z. Sun, G. Xu, C. R. Forest, and G. M. Church, “Programming cells by multiplex genome engineering and accelerated evolution.,” Nature, vol. 460, pp. 894–898, Aug. 2009.

[186] F. J. Isaacs, P. A. Carr, H. H. Wang, M. J. Lajoie, B. Sterling, L. Kraal, A. C. Tolonen, T. A. Gianoulis, D. B. Goodman, N. B. Reppas, C. J. Emig, D. Bang, S. J. Hwang, M. C. Jewett, J. M. Jacobson, and G. M. Church, “Precise manipulation of chromosomes in vivo enables genome-wide codon replacement,” Science, vol. 333, no. 6040, pp. 348–353, 2011.

[187] M. Lajoie, S. Kosuri, J. Mosberg, C. Gregg, D. Zhang, and G. Church, “Probing the limits of genetic recoding in essential genes,” Science, vol. 342, no. 6156, pp. 361–363, 2013.

[188] M. G. Napolitano, M. Landon, C. J. Gregg, M. J. Lajoie, L. Govindarajan, J. A. Mosberg, G. Kuznetsov, D. B. Goodman, O. Vargas-Rodriguez, F. J. Isaacs, D. Söll, and G. M. Church, “Emergent rules for codon choice elucidated by editing rare arginine codons in Escherichia coli,” Proceedings of the National Academy of Sciences of the USA, vol. 113, no. 38, pp. E5588–E5597, 2016.

[189] K. Wang, J. Fredens, S. F. Brunner, S. H. Kim, T. Chia, and J. W. Chin, “Defining synonymous codon compression schemes by genome recoding.,” Nature, vol. 539, pp. 59–64, Nov. 2016.

[190] Y. H. Lau, F. Stirling, J. Kuo, M. A. Karrenbelt, Y. A. Chan, A. Riesselman, C. A. Horton, E. Schäfer, D. Lips, M. T. Weinstock, D. G. Gibson, J. C. Way, and P. A. Silver, “Large-scale recoding of a bacterial genome by iterative recombineering of synthetic DNA,” Nucleic acids research, vol. 45, no. 11, pp. 6971–6980, 2017.

[191] N. Ostrov, M. Landon, M. Guell, G. Kuznetsov, J. Teramoto, N. Cervantes, M. Zhou, K. Singh, M. G. Napolitano, M. Moosburner, E. Shrock, B. W. Pruitt, N. Conway, D. B. Goodman, C. L. Gardner, G. Tyree, A. Gonzales, B. L. Wanner, J. E. Norville, M. J. Lajoie, and G. M. Church, “Design, synthesis, and testing toward a 57-codon genome,” Science, vol. 353, no. 6301, pp. 819–822, 2016.

[192] J. Holtzendorff, D. Hung, P. Brende, A. Reisenauer, P. H. Viollier, H. H. McAdams, and L. Shapiro, “Oscillating global regulators control the genetic circuit driving a bacterial cell cycle.,” Science, vol. 304, pp. 983–987, May 2004.

[193] P. T. McGrath, H. Lee, L. Zhang, A. A. Iniesta, A. K. Hottes, M. H. Tan, N. J. Hillson, P. Hu, L. Shapiro, and H. H. McAdams, “High-throughput identification of transcription start sites, conserved promoter motifs and predicted regulons.,” Nature Biotechnology, vol. 25, pp. 584–592, May 2007.

[194] M. Christen, H. D. Kulasekara, B. Christen, B. R. Kulasekara, L. R. Hoffman, and S. I. Miller, “Asymmetrical distribution of the second messenger c-di-GMP upon bacterial cell division.,” Science, vol. 328, pp. 1295–1297, June 2010.

[195] X. Shen, J. Collier, D. Dill, L. Shapiro, M. Horowitz, and H. H. McAdams, “Architecture and inherent robustness of a bacterial cell-cycle control system.,” Proceedings of the National Academy of Sciences of the USA, vol. 105, no. 32, pp. 11340–11345, 2008.

[196] J. M. Schrader, B. Zhou, G.-W. Li, K. Lasker, W. S. Childers, B. Williams, T. Long, S. Crosson, H. H. McAdams, J. S. Weissman, and L. Shapiro, “The coding and noncoding architecture of the Caulobacter crescentus genome.,” PLoS Genetics, vol. 10, p. e1004463, July 2014.

[197] B. Zhou, J. M. Schrader, V. S. Kalogeraki, E. Abeliuk, C. B. Dinh, J. Q. Pham, Z. Z. Cui, D. L. Dill, H. H. McAdams, and L. Shapiro, “The global regulatory architecture of transcription during the Caulobacter cell cycle.,” PLoS genetics, vol. 11, p. e1004831, Jan. 2015.

[198] W. C. Nierman, T. V. Feldblyum, M. T. Laub, I. T. Paulsen, K. E. Nelson, J. A. Eisen, J. F. Heidelberg, M. R. Alley, N. Ohta, J. R. Maddock, I. Potocka, W. C. Nelson, A. Newton, C. Stephens, N. D. Phadke, B. Ely, R. T. DeBoy, R. J. Dodson, A. S. Durkin, M. L. Gwinn, D. H. Haft, J. F. Kolonay, J. Smit, M. B. Craven, H. Khouri, J. Shetty, K. Berry, T. Utterback, K. Tran, A. Wolf, J. Vamathevan, M. Ermolaeva, O. White, S. L. Salzberg, J. C. Venter, L. Shapiro, C. M. Fraser, and J. Eisen, “Complete genome sequence of Caulobacter crescentus.,” Proceedings of the National Academy of Sciences of the USA, vol. 98, pp. 4136–4141, Mar. 2001.

[199] J. M. Schrader, G.-W. Li, W. S. Childers, A. M. Perez, J. S. Weissman, L. Shapiro, and H. H. McAdams, “Dynamic translation regulation in Caulobacter cell cycle control.,” Proceedings of the National Academy of Sciences of the USA, vol. 113, pp. E6859–E6867, Nov. 2016.

[200] A. Kimelman, A. Levy, H. Sberro, S. Kidron, A. Leavitt, G. Amitai, D. R. Yoder-Himes, O. Wurtzel, Y. Zhu, E. M. Rubin, and R. Sorek, “A vast collection of microbial genes that are toxic to bacteria.,” Genome Research, vol. 22, pp. 802–809, Apr. 2012.

[201] R. Sorek, Y. Zhu, C. J. Creevey, M. P. Francino, P. Bork, and E. M. Rubin, “Genome-wide experimental determination of barriers to horizontal gene transfer,” Science, vol. 318, no. 5855, pp. 1449–1452, 2007.

[202] J. Izard, C. D. G. Balderas, D. Ropers, S. Lacour, X. Song, Y. Yang, A. B. Lindner, J. Geiselmann, and H. De Jong, “A synthetic growth switch based on controlled expression of RNA polymerase,” Molecular systems biology, vol. 11, no. 11, p. 840, 2015.

[203] T. Shibata, T. Hishida, Y. Kubota, Y.-W. Han, H. Iwasaki, and H. Shinagawa, “Functional overlap between RecA and MgsA (RarA) in the rescue of stalled replication forks in Escherichia coli,” Genes to Cells, vol. 10, no. 3, pp. 181–191, 2005.

[204] G. Allen and A. Kornberg, “Fine balance in the regulation of DnaB helicase by DnaC protein in replication in Escherichia coli.,” Journal of Biological Chemistry, vol. 266, no. 33, pp. 22096–22101, 1991.

[205] E. Choi-Rhee and J. E. Cronan, “The biotin carboxylase-biotin carboxyl carrier protein complex of Escherichia coli acetyl-CoA carboxylase,” Journal of Biological Chemistry, vol. 278, no. 33, pp. 30806–30812, 2003.

[206] S. T. Rutherford and B. L. Bassler, “Bacterial quorum sensing: its role in virulence and possibilities for its control,” Cold Spring Harbor perspectives in medicine, vol. 2, no. 11, p. a012427, 2012.

[207] G. Storz, J. Vogel, and K. M. Wassarman, “Regulation by small RNAs in bacteria: expanding frontiers,” Molecular cell, vol. 43, no. 6, pp. 880–891, 2011.

[208] J. Monod, J.-P. Changeux, and F. Jacob, “Allosteric proteins and cellular control systems,” Journal of molecular biology, vol. 6, no. 4, pp. 306–329, 1963.

[209] V. Chubukov, L. Gerosa, K. Kochanowski, and U. Sauer, “Coordination of microbial metabolism,” Nature Reviews Microbiology, vol. 12, no. 5, pp. 327–340, 2014.

[210] G.-W. Li, D. Burkhardt, C. Gross, and J. S. Weissman, “Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources,” Cell, vol. 157, no. 3, pp. 624–635, 2014.

[211] B. J. Paul, W. Ross, T. Gaal, and R. L. Gourse, “rRNA transcription in Escherichia coli,” Annual review of genetics, vol. 38, pp. 749–770, 2004.

[212] N. A. Dye, Z. Pincus, J. A. Theriot, L. Shapiro, and Z. Gitai, “Two independent spiral structures control cell shape in Caulobacter,” Proceedings of the National Academy of Sciences of the USA, vol. 102, no. 51, pp. 18608–18613, 2005.

[213] M. F. Susin, R. L. Baldini, F. Gueiros-Filho, and S. L. Gomes, “GroES/GroEL and DnaK/DnaJ have distinct roles in stress responses and during cell cycle progression in Caulobacter crescentus,” Journal of bacteriology, vol. 188, no. 23, pp. 8044–8053, 2006.

[214] D. Kiefer and A. Kuhn, “YidC as an essential and multifunctional component in membrane protein assembly,” International review of cytology, vol. 259, pp. 113–138, 2007.

[215] H. H. McAdams and L. Shapiro, “A bacterial cell-cycle regulatory network operating in time and space.,” Science, vol. 301, pp. 1874–1877, Sept. 2003.

[216] H. H. McAdams and L. Shapiro, “System-level design of bacterial cell cycle control.,” FEBS letters, vol. 583, p. 3991, Dec. 2009.

[217] K. Lasker, T. H. Mann, and L. Shapiro, “An intracellular compass spatially coordinates cell cycle modules in Caulobacter crescentus.,” Current opinion in microbiology, vol. 33, pp. 131–139, Oct. 2016.

[218] J. M. Skerker and M. T. Laub, “Cell-cycle progression and the generation of asymmetry in Caulobacter crescentus.,” Nature reviews microbiology, vol. 2, pp. 325–337, Apr. 2004.

[219] A. Danchin and G. Fang, “Unknown unknowns: essential genes in quest for function.,” Microbial biotechnology, vol. 9, pp. 530–540, Sept. 2016.

[220] F. F. V. Chevance and K. T. Hughes, “Case for the genetic code as a triplet of triplets.,” Proceedings of the National Academy of Sciences of the USA, vol. 114, pp. 4745–4750, May 2017.

[221] P. J. A. Cock, T. Antao, J. T. Chang, B. A. Chapman, C. J. Cox, A. Dalke, I. Friedberg, T. Hamelryck, F. Kauff, B. Wilczynski, and M. J. L. de Hoon, “Biopython: freely available Python tools for computational molecular biology and bioinformatics.,” Bioinformatics (Oxford, England), vol. 25, pp. 1422–1423, June 2009.

[222] N. Kouprina, V. N. Noskov, and V. Larionov, “Selective isolation of large chromosomal regions by transformation-associated recombination cloning for structural and functional analysis of mammalian genomes.,” Methods in molecular biology, vol. 349, pp. 85–101, 2006.

[223] A. Adey, H. G. Morrison, Asan, X. Xun, J. O. Kitzman, E. H. Turner, B. Stackhouse, A. P. MacKenzie, N. C. Caruccio, X. Zhang, and J. Shendure, “Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition.,” Genome biology, vol. 11, no. 12, p. R119, 2010.

[224] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup, “The sequence alignment/map format and SAMtools.,” Bioinformatics (Oxford, England), vol. 25, pp. 2078–2079, Aug. 2009.

[225] H. Li, “A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.,” Bioinformatics (Oxford, England), vol. 27, pp. 2987–2993, Nov. 2011.

[226] M. Christen, C. Beusch, Y. Bösch, D. Cerletti, C. E. Flores-Tinoco, L. Del Medico, F. Tschan, and B. Christen, “Quantitative selection analysis of bacteriophage ϕCbK susceptibility in Caulobacter crescentus.,” Journal of molecular biology, Nov. 2015.

[227] H. Li and R. Durbin, “Fast and accurate long-read alignment with Burrows-Wheeler transform.,” Bioinformatics (Oxford, England), vol. 26, pp. 589–595, Mar. 2010.

[228] G. Amitai and R. Sorek, “Pandatox: a tool for accelerated metabolic engineering,” Bioengineered, vol. 3, no. 4, pp. 218–221, 2012.

[229] E. Toro, S.-H. Hong, H. H. McAdams, and L. Shapiro, “Caulobacter requires a dedicated mechanism to initiate chromosome segregation,” Proceedings of the National Academy of Sciences of the USA, 2008.

[230] K. Schulz-Schönhagen, N. Lobsiger, and W. J. Stark, “Continuous production of a shelf-stable living material as a biosensor platform,” Advanced materials technologies, p. 1900266, 2019.

[231] S. Combalbert and G. Hernandez-Raquet, “Occurrence, fate, and biodegradation of estrogens in sewage and manure,” Applied microbiology and biotechnology, vol. 86, pp. 1671–1692, 5 2010.

[232] S. L. Shrestha, F. X. M. Casey, H. Hakk, D. J. Smith, and G. Padmanabhan, “Fate and transformation of an estrogen conjugate and its metabolites in agricultural soils,” Environmental science & technology, vol. 46, pp. 11047–11053, 10 2012.

[233] M. Adeel, X. Song, Y. Wang, D. Francis, and Y. Yang, “Environmental impact of estrogens on human, animal and plant life: A critical review.,” Environment international, vol. 99, pp. 107–119, 2 2017.

[234] G. P. Daston, J. W. Gooch, W. J. Breslin, D. L. Shuey, A. I. Nikiforov, T. A. Fico, and J. W. Gorsuch, “Environmental estrogens and reproductive health: A discussion of the human and environmental data,” Reproductive toxicology, vol. 11, pp. 465–481, 7 1997.

[235] L. Nikoleris, C. L. Hultin, P. Hallgren, and M. C. Hansson, “17α-Ethinylestradiol (EE2) treatment of wild roach (Rutilus rutilus) during early life development disrupts expression of genes directly involved in the feedback cycle of estrogen,” Comparative biochemistry and physiology. Toxicology & pharmacology, vol. 180, pp. 56–64, 2016.

[236] J. R. van der Meer and S. Belkin, “Where microbiology meets microengineering: design and applications of reporter bacteria,” Nature reviews. Microbiology, vol. 8, pp. 511–522, 7 2010.

[237] S. Jarque, M. Bittner, L. Blaha, and K. Hilscherova, “Yeast biosensors for detection of environmental pollutants: Current state and limitations,” Trends in biotechnology, vol. 34, pp. 408–419, 5 2016.

[238] A. Adeniran, M. Sherer, and K. E. Tyo, “Yeast-based biosensors: Design and applications,” FEMS yeast research, vol. 15, no. 1, pp. 1–15, 2015.

[239] K. H. Baronian, “The use of yeast and moulds as sensing elements in biosensors,” Biosensors & bioelectronics, vol. 19, pp. 953–962, 4 2004.

[240] R. Gnügge, T. Liphardt, and F. Rudolf, “A shuttle vector series for precise genetic engineering of Saccharomyces cerevisiae,” Yeast, vol. 33, no. 10, pp. 83–98, 2016.

[241] H. Mewes, K. Albermann, and M. Bähr, “Overview of the Yeast Genome,” Nature, vol. 387, no. 6632 Suppl, pp. 7–65, 1997.

[242] P. A. Cahill, A. W. Knight, N. Billinton, M. G. Barker, L. Walsh, P. O. Keenan, C. V. Williams, D. J. Tweats, and R. M. Walmsley, “The GreenScreen® genotoxicity assay: A screening validation programme,” Mutagenesis, vol. 19, no. 2, pp. 105–119, 2004.

[243] K. Tag, K. Riedel, H. J. Bauer, G. Hanke, K. H. Baronian, and G. Kunze, “Amperometric detection of Cu2+ by yeast biosensors using flow injection analysis (FIA),” Sensors and actuators. B, Chemical, vol. 122, no. 2, pp. 403–409, 2007.

[244] H. Wang, Q. Lang, L. Li, B. Liang, X. Tang, L. Kong, M. Mascini, and A. Liu, “Yeast surface displaying glucose oxidase as whole-cell biocatalyst: Construction, characterization, and its electrochemical glucose sensing application,” Analytical Chemistry, vol. 85, no. 12, pp. 6107–6112, 2013.

[245] V. Beck, A. Pfitscher, and A. Jungbauer, “GFP-reporter for a high throughput assay to monitor estrogenic compounds,” Journal of biochemical and biophysical methods, vol. 64, pp. 19–37, 7 2005.

[246] C. R. Lyttle, P. Damian-Matsumura, H. Juul, and T. R. Butt, “Human estrogen receptor regulation in a yeast model system and studies on receptor agonists and antagonists,” The Journal of steroid biochemistry and molecular biology, vol. 42, pp. 677–685, 8 1992.

[247] V. Kumar, S. Green, A. Staub, and P. Chambon, “Localisation of the oestradiol-binding and putative DNA-binding domains of the human oestrogen receptor.,” The EMBO journal, vol. 5, no. 9, pp. 2231–6, 1986.

[248] O. Griesbeck, G. S. Baird, R. E. Campbell, D. A. Zacharias, and R. Y. Tsien, “Reducing the environmental sensitivity of yellow fluorescent protein,” The Journal of biological chemistry, vol. 276, no. 31, pp. 29188–29194, 2001.

[249] B. J. Berendsen, I. J. Elbers, and A. A. Stolker, “Determination of the stability of antibiotics in matrix and reference solutions using a straightforward procedure applying mass spectrometric detection,” Food additives & contaminants. Part A, Chemistry, analysis, control, exposure & risk assessment, vol. 28, no. 12, pp. 1657–1666, 2011.

[250] B. I. Escher, M. Allinson, R. Altenburger, P. A. Bain, P. Balaguer, W. Busch, J. Crago, N. D. Denslow, E. Dopp, K. Hilscherova, A. R. Humpage, A. Kumar, M. Grimaldi, B. S. Jayasinghe, B. Jarosova, A. Jia, S. Makarov, K. A. Maruya, A. Medvedev, A. C. Mehinto, J. E. Mendez, A. Poulsen, E. Prochazka, J. Richard, A. Schifferli, D. Schlenk, S. Scholz, F. Shiraishi, S. Snyder, G. Su, J. Y. M. Tang, B. van der Burg, S. C. van der Linden, I. Werner, S. D. Westerheide, C. K. C. Wong, M. Yang, B. H. Y. Yeung, X. Zhang, and F. D. L. Leusch, “Benchmarking Organic Micropollutants in Wastewater, Recycled Water and Drinking Water with In Vitro Bioassays,” Environmental science & technology., vol. 48, pp. 1940–1956, 2 2014.

[251] H. Fang, W. Tong, R. Perkins, A. M. Soto, N. V. Prechtl, and D. M. Sheehan, “Quantitative comparisons of in vitro assays for estrogenic activities,” Environmental health perspectives., vol. 108, pp. 723–729, 8 2000.

[252] A. C. Mehinto, A. Jia, S. A. Snyder, B. S. Jayasinghe, N. D. Denslow, J. Crago, D. Schlenk, C. Menzie, S. D. Westerheide, F. D. Leusch, and K. A. Maruya, “Interlaboratory comparison of in vitro bioassays for screening of endocrine active chemicals in recycled water,” Water research, vol. 83, pp. 303–309, 10 2015.

[253] A. J. Murk, J. Legler, M. M. Van Lipzig, J. H. Meerman, A. C. Belfroid, A. Spenkelink, B. Van der Burg, G. B. Rijs, and D. Vethaak, “Detection of estrogenic potency in wastewater and surface water with three in vitro bioassays,” Environmental toxicology and chemistry, vol. 21, pp. 16–23, 1 2002.

[254] P. A. Neale and B. I. Escher, “In vitro bioassays to assess drinking water quality,” Current Opinion in Environmental Science & Health, vol. 7, pp. 1–7, 2 2019.

[255] I. Vopálenská, L. Váchová, and Z. Palková, “New biosensor for detection of copper ions in water based on immobilized genetically modified yeast cells,” Biosensors & Bioelectronics, vol. 72, pp. 160–167, 10 2015.

[256] C. Schirmer, J. Posseckardt, A. Kick, K. Rebatschek, W. Fichtner, K. Ostermann, A. Schuller, G. Rödel, and M. Mertig, “Encapsulating genetically modified Saccharomyces cerevisiae cells in a flow-through device towards the detection of diclofenac in wastewater,” Journal of Biotechnology, vol. 284, pp. 75–83, 10 2018.

[257] V. Krasňan, R. Stloukal, M. Rosenberg, and M. Rebroš, “Immobilization of cells and enzymes to LentiKats®,” Applied Microbiology and Biotechnology, vol. 100, pp. 2535–2553, 3 2016.

[258] A. Abbas, I. Schneider, A. Bollmann, J. Funke, J. Oehlmann, C. Prasse, U. Schulte-Oehlmann, W. Seitz, T. Ternes, M. Weber, H. Wesely, and M. Wagner, “What you extract is what you see: Optimising the preparation of water and wastewater samples for in vitro bioassays,” Water Research, vol. 152, pp. 47–60, 4 2019.

[259] R. Kase, B. Javurkova, E. Simon, K. Swart, S. Buchinger, S. Könemann, B. I. Escher, M. Carere, V. Dulio, S. Ait-Aissa, H. Hollert, S. Valsecchi, S. Polesello, P. Behnisch, C. di Paolo, D. Olbrich, E. Sychrova, M. Gundlach, R. Schlichting, L. Leborgne, M. Clara, C. Scheffknecht, Y. Marneffe, C. Chalon, P. Tusil, P. Soldan, B. von Danwitz, J. Schwaiger, A. Morán, F. Bersani, O. Perceval, C. Kienle, E. Vermeirssen, K. Hilscherova, G. Reifferscheid, and I. Werner, “Screening and risk management solutions for steroidal estrogens in surface and wastewater,” Trends in Analytical Chemistry, vol. 102, pp. 343–358, 5 2018.

[260] P. A. Neale, W. Brack, S. Aït-Aïssa, W. Busch, J. Hollender, M. Krauss, E. Maillot-Maréchal, N. A. Munz, R. Schlichting, T. Schulze, B. Vogler, and B. I. Escher, “Solid-phase extraction as sample preparation of water samples for cell-based and other: In vitro bioassays,” Environmental science. Processes & impacts, vol. 20, no. 3, pp. 493–504, 2018.

[261] E. Simon, A. Schifferli, T. B. Bucher, D. Olbrich, I. Werner, and E. L. Vermeirssen, “Solid-phase extraction of estrogens and herbicides from environmental waters for bioassay analysis—effects of sample volume on recoveries,” Analytical and Bioanalytical Chemistry, vol. 411, pp. 2057–2069, 4 2019.

[262] P. Tušil, E. Vermeirssen, J. Schwaiger, V. Dulio, Y. Marneffe, S. Könemann, R. Kase, B. I. Escher, I. Werner, R. Schlichting, C. Scheffknecht, C. Di Paolo, E. Simon, B. von Danwitz, G. Reifferscheid, L. Leborgne, M. Clara, O. Perceval, B. Javurkova, S. Valsecchi, M. I. San Martín Becares, D. Olbrich, P. Soldàn, P. Behnisch, C. Chalon, F. Bersani, M. Carere, M. Schlüsener, K. Hilscherová, E. Sychrova, S. Buchinger, K. Swart, T. Ternes, S. Aït-Aïssa, S. Polesello, and H. Hollert, “Effect-based and chemical analytical methods to monitor estrogens under the European Water Framework Directive,” Trends in Analytical Chemistry, vol. 102, pp. 225–235, 5 2018.

[263] S. Heub, N. Tscharner, V. Monnier, F. Kehl, P. S. Dittrich, S. Follonier, and L. Barbe, “Automated and portable solid phase extraction platform for immuno-detection of 17β-estradiol in water,” Journal of chromatography. A, vol. 1381, pp. 22–28, 2015.

[264] L. Cevenini, A. Lopreside, M. M. Calabretta, M. D’Elia, P. Simoni, E. Michelini, and A. Roda, “A novel bioluminescent NanoLuc yeast-estrogen screen biosensor (nanoYES) with a compact wireless camera for effect-based detection of endocrine-disrupting chemicals,” Analytical and Bioanalytical Chemistry, vol. 410, no. 4, pp. 1237–1246, 2018.

[265] A. Lopreside, M. M. Calabretta, L. Montali, M. Ferri, A. Tassoni, B. R. Branchini, T. Southworth, M. D’Elia, A. Roda, and E. Michelini, “Prêt-à-porter nanoYESα and nanoYESβ bioluminescent cell biosensors for ultrarapid and sensitive screening of endocrine-disrupting chemicals,” Analytical and Bioanalytical Chemistry, pp. 1–13, 4 2019.

[266] F. Sherman, “Getting started with yeast,” Methods in Enzymology, vol. 350, pp. 3–41, 2002.

[267] M. Jinek, K. Chylinski, I. Fonfara, M. Hauer, J. A. Doudna, and E. Charpentier, “A Programmable dual-RNA–guided DNA endonuclease in adaptive bacterial immunity,” Science, vol. 337, pp. 816–822, 2012.

[268] J. C. Schmidt, “Philosophy of late-modern technology,” in Synthetic Biology (J. Boldt, ed.), pp. 13–29, Springer, 2016.

[269] Schweizer Eidgenossenschaft, “Bundesgesetz über die Gentechnik im Ausserhumanbereich,” 2018. https://www.admin.ch/opc/de/classified-compilation/19996136/index.html.

[270] S. Y. Lee and H. U. Kim, “Systems strategies for developing industrial microbial strains,” Nature Biotechnology, vol. 33, no. 10, pp. 1061–1072, 2015.

[271] W. Slichenmyer and D. Von Hoff, “Taxol: a new and effective anti-cancer drug,” Anti-cancer drugs, vol. 2, no. 6, pp. 519–530, 1991.

[272] A. Nazhand, A. Durazzo, M. Lucarini, M. A. Mobilia, B. Omri, and A. Santini, “Rewiring cellular metabolism for heterologous biosynthesis of Taxol,” Natural Product Research, vol. 34, no. 1, pp. 110–121, 2019.

[273] R. Ueoka, A. Bhushan, S. I. Probst, W. M. Bray, R. S. Lokey, R. G. Linington, and J. Piel, “Genome-based identification of a plant-associated marine bacterium as a rich natural product source,” Angewandte Chemie - International Edition, vol. 57, no. 44, pp. 14519–14523, 2018.

[274] P. Hugenholtz and G. W. Tyson, “Metagenomics,” Nature, vol. 455, pp. 481–483, 2008.

[275] A. Bhushan, P. J. Egli, E. E. Peters, M. F. Freeman, and J. Piel, “Genome mining- and synthetic biology-enabled production of hypermodified peptides,” Nature Chemistry, vol. 11, no. 10, pp. 931–939, 2019.

[276] S. S. A. A. Hasson, J. K. Z. Al-Busaidi, and T. A. Sallam, “The past, current and future trends in DNA vaccine immunisations,” Asian Pacific Journal of Tropical Biomedicine, vol. 5, no. 5, pp. 344–353, 2015.

[277] K. Houser and K. Subbarao, “Influenza vaccines: Challenges and solutions,” Cell Host and Microbe, vol. 17, no. 3, pp. 295–300, 2015.

[278] S. A. Tomlins, D. R. Rhodes, S. Perner, R. Kuefer, C. Lee, and J. E. Montie, “Characterization of the reconstructed 1918 Spanish Influenza pandemic virus,” Science, vol. 310, pp. 644–648, oct 2005.

[279] C. Yang, S. Skiena, B. Futcher, S. Mueller, and E. Wimmer, “Deliberate reduction of hemagglutinin and neuraminidase expression of influenza virus leads to an ultraprotective live vaccine in mice,” Proceedings of the National Academy of Sciences of the USA, vol. 110, no. 23, pp. 9481–9486, 2013.

[280] Y. X. Setoh, N. A. Prow, N. Peng, L. E. Hugo, G. Devine, J. E. Hazlewood, A. Suhrbier, and A. A. Khromykh, “De Novo generation and characterization of new Zika Virus isolate using sequence Data from a microcephaly case,” mSphere, vol. 2, p. 478, may 2017.

[281] M. M. Becker, R. L. Graham, E. F. Donaldson, B. Rockx, A. C. Sims, T. Sheahan, R. J. Pickles, D. Corti, R. E. Johnston, R. S. Baric, and M. R. Denison, “Synthetic recombinant bat SARS-like coronavirus is infectious in cultured cells and in mice,” Proceedings of the National Academy of Sciences of the USA, vol. 105, pp. 19944–19949, dec 2008.

[282] R. S. Noyce, S. Lederman, and D. H. Evans, “Construction of an infectious horsepox virus vaccine from chemically synthesized DNA fragments,” PLoS ONE, vol. 13, no. 1, p. e0188453, 2018.

[283] F. Diaz-San Segundo, G. N. Medina, E. Ramirez-Medina, L. Velazquez-Salinas, M. Koster, M. J. Grubman, and T. de los Santos, “Synonymous deoptimization of Foot-and-Mouth Disease virus causes attenuation in vivo while inducing a strong neutralizing antibody response,” Journal of Virology, vol. 90, pp. 1298–1310, feb 2016.

[284] L. de Fabritus, A. Nougairède, F. Aubry, E. A. Gould, and X. de Lamballerie, “Attenuation of tick-borne encephalitis virus using large-scale random codon re-encoding,” PLoS Pathogens, vol. 11, pp. 1–19, mar 2015.

[285] S. H. Shen, C. B. Stauft, O. Gorbatsevych, Y. Song, C. B. Ward, A. Yurovsky, S. Mueller, B. Futcher, and E. Wimmer, “Large-scale recoding of an arbovirus genome to rebalance its insect versus mammalian preference,” Proceedings of the National Academy of Sciences of the USA, vol. 112, pp. 4749–4754, apr 2015.

[286] L. W. Riley, “Pandemic lineages of extraintestinal pathogenic Escherichia coli,” Clinical Microbiology and Infection, vol. 20, no. 5, pp. 380–390, 2014.

[287] B. Kolodziejczyk and A. Kagansky, “Consolidated G20 synthetic biology policies and their role in the 2030 Agenda for Sustainable Development,” G20 Insights, pp. 1–10, 2017.

[288] Z. Palkova, “Multicellular microorganisms: Laboratory versus nature,” EMBO Reports, vol. 5, no. 5, pp. 470–476, 2004.

[289] A. Hammond, R. Galizi, K. Kyrou, A. Simoni, C. Siniscalchi, D. Katsanos, M. Gribble, D. Baker, E. Marois, S. Russell, A. Burt, N. Windbichler, A. Crisanti, and T. Nolan, “A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae,” Nature Biotechnology, vol. 34, no. 1, pp. 78–83, 2016.

[290] J. Pugh, “Driven to extinction? The ethics of eradicating mosquitoes with gene-drive technologies,” Journal of Medical Ethics, vol. 42, no. 9, pp. 578–581, 2016.

[291] J. Boldt, “Swiss watches, genetic machines, and ethics,” in Synthetic Biology (J. Boldt, ed.), pp. 1–9, Springer, 2016.

[292] R. E. Sims, W. Mabee, J. N. Saddler, and M. Taylor, “An overview of second generation biofuel technologies,” Bioresource Technology, vol. 101, no. 6, pp. 1570–1580, 2010.

[293] B. Giese, H. Wigger, C. Pade, and A. von Gleich, “Promising applications of synthetic biology–and how to avoid their potential pitfalls,” in Synthetic Biology (J. Boldt, ed.), pp. 195–215, Springer, 2016.

[294] U. Secretary-General, “Special edition: progress towards the Sustainable Development Goals,” UN Economic and Social Council, 2019.

[295] D. V. Goeddel, D. G. Kleid, and F. Bolivar, “Expression in Escherichia coli of chemically synthesized genes for human insulin,” Proceedings of the National Academy of Sciences of the USA, vol. 76, no. 1, pp. 106–110, 1979.

[296] M. Peferoen, “Progress and prospects for field use of Bt genes in crops,” Trends in Biotechnology, vol. 15, no. 5, pp. 173–177, 1997.

[297] F. Richter, A. Leaver-Fay, S. D. Khare, S. Bjelic, and D. Baker, “De novo enzyme design using Rosetta3,” PLoS ONE, vol. 6, no. 5, pp. 1–12, 2011.

[298] A. Zanghellini, “De novo computational enzyme design,” Current Opinion in Biotechnology, vol. 29, no. 1, pp. 132–138, 2014.

[299] H. König, D. Frank, R. Heil, and C. Coenen, “Synthetic biology’s multiple dimensions of benefits and risks: implications for governance and policies,” in Synthetic Biology (J. Boldt, ed.), pp. 217–232, Springer, 2016.

[300] M. S. Garfinkel, D. Endy, G. L. Epstein, and R. M. Friedman, “Synthetic genomics: options for governance,” Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science, vol. 5, no. 4, pp. 359–362, 2007.

[301] Schweizer Eidgenossenschaft, “Verordnung des BLV über die Haltung von Versuchstieren und die Erzeugung gentechnisch veränderter Tiere sowie über die Verfahren bei Tierversuchen,” 2010. https://www.admin.ch/opc/de/classified-compilation/20082892/index.html.

[302] D. DiEuliis, S. R. Carter, and G. K. Gronvall, “Options for Synthetic DNA Order Screening, Revisited,” mSphere, vol. 2, no. 4, pp. 2–5, 2017.

[303] S. R. Carter and R. M. Friedman, “DNA synthesis and biosecurity: lessons learned and options for the future,” J Craig Venter Institute, La Jolla, CA, 2015.

[304] C. Dahlberg, M. Bergström, M. Andreasen, B. B. Christensen, S. Molin, and M. Hermansson, “Interspecies bacterial conjugation by plasmids from marine environments visualized by GFP expression,” Molecular Biology and Evolution, vol. 15, no. 4, pp. 385–390, 1998.

[305] C. E. Flores-Tinoco, M. Christen, and B. Christen, “Co-catabolism of arginine and succinate drives symbiotic nitrogen fixation,” bioRxiv, 2019.
