Expanding the Toolkit

for Metabolic Engineering

Yao Zong (Andy) Ng

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

2016

© 2016

Yao Zong (Andy) Ng

All rights reserved ABSTRACT

Expanding the Toolkit for Metabolic Engineering

Yao Zong (Andy) Ng

The essence of metabolic engineering is the modification of microbes for the overproduction of useful compounds. These cellular factories are increasingly recognized as an environmentally-friendly and cost-effective way to convert inexpensive and renewable feedstocks into products, compared to traditional chemical synthesis from petrochemicals. The products span the spectrum of specialty, fine or bulk chemicals, with uses such as pharmaceuticals, nutraceuticals, flavors and fragrances, agrochemicals, biofuels and building blocks for other compounds. However, the process of metabolic engineering can be long and expensive, primarily due to technological hurdles, our incomplete understanding of biology, as well as redundancies and limitations built into the natural program of living cells. Combinatorial or directed evolution approaches can enable us to make progress even without a full understanding of the cell, and can also lead to the discovery of new knowledge. This thesis is focused on addressing the technological bottlenecks in the directed evolution cycle, specifically de novo DNA assembly to generate strain libraries and small molecule product screens and selections.

In Chapter 1, we begin by examining the origins of the field of metabolic engineering.

We review the classic “design–build–test–analyze” (DBTA) metabolic engineering cycle and the different strategies that have been employed to engineer cell metabolism, namely constructive and inverse metabolic engineering. This is followed by a discussion of how combinatorial methods that draw on both approaches can improve the speed, cost and efficiency of metabolic engineering. Next, we discuss how enabling technologies such as DNA assembly methods and genome-wide mutagenesis techniques have served as a driver of developments in the field, as well as some current limitations in the area of screens and selections for natural products.

Finally, we offer some perspectives on how the field of metabolic engineering could be advanced in the future.

In order to re-engineer the metabolism of a cell, its genetic program has to be modified, either by mutating its DNA or introducing foreign genes and pathways. Furthermore, because metabolic engineering is a combinatorial optimization problem, it could be advantageous to create a library of strain variants in parallel for testing. However, a key bottleneck is the lack of tools to build long multi-gene pathways and libraries, especially in genetically intractable host cells. In Chapter 2, we report the de novo assembly of the ~ 100 kb meridamycin Type I modular polyketide synthase (mPKS) pathway using Reiterative Recombination. The mPKS pathways are difficult to manipulate due to their long, GC-rich and highly repetitive sequences.

Our strategy was to harness the power of yeast recombination to assemble and engineer the pathway in the chromosome of the yeast Saccharomyces cerevisiae, followed by transfer of the pathway into a heterologous Streptomyces lividans host for polyketide production. We also developed FLIP, which is based on site-specific recombination, for the efficient recovery of the pathway/library from the yeast chromosome onto shuttle plasmids. Lastly, we describe the construction of a promoter library in yeast using CRISPR in order to optimize the production of the polyketide meridamycin, a compound with promising neuroprotective activities. Our approach can also be potentially applied for combinatorial biosynthesis to produce novel analogs. Although there has been much progress in the development of techniques for the de novo assembly of pathways, libraries and even whole genomes, as well as large-scale mutagenesis approaches, it is widely held that the current bottleneck to applying combinatorial approaches to metabolic engineering is a lack of general and high-throughput screens and selections for the product. State-of-the-art methods such as liquid/gas chromatography – mass spectrometry

(LC/GC-MS) can detect a diverse range of compounds, but are too low-throughput (< 102 samples/day) to screen libraries of ≥ 103. Other methods that rely on detecting a product’s fluorescence, color, bioactivity or reactivity cannot be generally applied to the majority of compounds. Thus in Chapter 3, we established a high-throughput screen (104 – 105 samples/day) based on the Fluorescence Polarization (FP) assay, which in theory can be adapted as a screen for a wide range of small molecules and natural products. The FP assay works by measuring the changes in fluorescence polarization values when the target molecule competes with a fluorescently-tagged reporter molecule for binding to a common receptor protein. As a proof-of- principle, we applied the FP assay to screen a UV-mutagenized library of Streptomyces tsukubaensis and identified strains that produced higher titers of FK506 (tacrolimus), an immunosuppressant drug. We also demonstrate the biosynthesis of FK506 in 96-well, deep well, microtiter plates, which is compatible with the throughput of the FP assay performed in 384-well microtiter plates.

In contrast to a screen, genetic selections enable much higher throughput (> 108 samples/day limited by culture volume), since the population of cells can be assayed together in a one pot growth selection. However, this requires that the production of the target compound confer a growth advantage on the cell. In Chapter 4, we describe how the yeast three-hybrid system (Y3H) can be adapted to detect a product by displacement of one arm of a small molecule dimerizer that bridges two receptor protein fusions that reconstitute an active transcription factor, thus changing the transcription levels of a reporter gene. We report Y3H systems capable of detecting trimethoprim (TMP), meridamycin and FK506, and linking detection to the transcription of a selectable marker. This sets the stage to use the Y3H for genetic selections, if both the production and detection functions can be performed in the same cell.

Although metabolic engineering can potentially be used to biosynthesize a large portfolio of compounds, there are limitations to relying completely on this approach. For instance, the final products could be toxic to the cell, or the enzymes for a reaction step could be non-existent in nature, or more often than not, the biosynthetic process from feedstock to final compound is not economically viable or would take too long to develop. Thus, many groups have developed semi-synthetic approaches that combine biosynthetic steps with chemical synthesis to generate compounds. In Chapter 5, we describe novel semi-synthetic strategies leading to Ambrox, a key component of fragrances, and the tocotrienols, key components of Vitamin E that possess anti- oxidant and anti-cancer properties, as well as their derivatives. Both strategies rely on engineering the yeast S. cerevisiae or the bacterium E. coli to produce linear terpenoid intermediates followed by modification with synthetic chemistry. To derive Ambrox, E. coli was engineered to produce the unnatural polyene intermediate Z,E-farnesol, which was cyclized by the halonium-based reagents BDSB (bromodiethylsulfonium bromopentachloroantimonate) and

IDSI (iododiethylsulfonium iodopentachloroantimonate) to form halo-Ambrox, which could be readily converted to 9-epi-Ambrox and its analogs. For the tocotrienols, yeast was engineered using a triple enzyme fusion to produce E,E,E-geranylgeraniol, which was converted in one step to the tocotrienols using a novel Lewis acid – Phase Transfer Catalyst system (Type I Adanve acid). Table of Contents

List of Figures v

List of Tables ix

List of Abbreviations x

Acknowledgements xiv

Dedication xvii

Chapter 1 Strategies and Enabling Technologies for Metabolic Engineering 1

1.0 Chapter outlook 2

1.1 Metabolic engineering – origins 3

1.2 Classic strategies for metabolic engineering 6

1.2.1 The design–build–test–analyze cycle 8

1.2.2 Constructive metabolic engineering 10

1.2.3 Inverse metabolic engineering 16

1.3 Combinatorial directed evolution strategies and tools for metabolic

engineering 18

1.3.1 Mutagenesis to create single gene libraries 20

1.3.2 Mutagenesis to create pathway libraries 22

1.3.3 Genome-wide mutagenesis 30

1.3.4 Screening and selection technologies for small molecule products 40

1.4 Conclusions and future perspectives 46

1.5 References 49

i

Chapter 2 De novo Assembly and Promoter Engineering of the ~ 100 kb Type I

Meridamycin Polyketide Biosynthetic Pathway 64

2.0 Chapter outlook 65

2.1 Introduction 66

2.2 Results 68

2.2.1 ~ 100 kb meridamycin PKS pathway assembly by

Reiterative Recombination 68

2.2.2 Sequence editing and promoter library construction using

CRISPR 74

2.2.3 Pathway recovery from the chromosome onto a shuttle vector by

FLIP 77

2.2.4 Meridamycin production and promoter library screening 90

2.3 Discussion 92

2.4 Experimental methods 99

2.5 Assembly PCR, check PCR and CRISPR tables 108

2.6 Strains and plasmids 119

2.7 References 124

Chapter 3 Fluorescence Polarization Assay for High-throughput Small Molecule

Screens 132

3.0 Chapter outlook 133

3.1 Introduction 134

3.2 Results 136

3.2.1 High-throughput FP assay to detect FK506 136

ii

3.2.2 Production of FK506 in 96-well microtiter plate format 138

3.2.3 Extraction of FK506 and growth determination in

96-well microtiter plates 141

3.2.4 Random mutagenesis and screening for improved

FK506 production 143

3.2.5 Scale-up of top mutants in shake flasks 145

3.2.6 Measuring the performance of the FP assay by Z’ score 146

3.3 Discussion 147

3.4 Experimental methods 152

3.5 References 157

Chapter 4 Yeast Three-Hybrid Assay for Small Molecule Screens and Selections 165

4.0 Chapter outlook 166

4.1 Introduction 167

4.2 Results 170

4.2.1 Yeast three-hybrid screen for trimethoprim 170

4.2.2 Yeast three-hybrid screen for meridamycin 173

4.2.3 Yeast three-hybrid screen for FK506 177

4.3 Discussion 181

4.4 Experimental methods 183

4.5 References 187

Chapter 5 Semi-synthesis Approaches to the Terpenoids Ambrox and

the Tocotrienols 191

5.0 Chapter outlook 192

iii

5.1 Introduction 193

5.2 Results 196

5.2.1 Semi-synthesis approach to Ambrox and derivatives 196

5.2.2 Biosynthesis of Z,E-farnesol in E. coli 199

5.2.3 Biosynthesis of E,E-farnesol in S. cerevisiae 201

5.2.4 Semi-synthesis approach to the tocotrienols (Vitamin E) 204

5.2.5 Biosynthesis of E,E,E-geranylgeraniol in S. cerevisiae 205

5.3 Discussion 210

5.4 Experimental methods 212

5.5 Strains, plasmids, oligonucleotides and media 221

5.6 Spectra 233

5.7 References 243

Appendix Sequences of Oligonucleotides, IDT gblocks, Pathways Constructed

by Reiterative Recombination and Shuttle Plasmids 248

A.1 Oligonucleotides used in Chapter 2 249

A.2 dsDNA gblocks used in Chapter 2 261

A.3 Sequence of Fragment 1 yeast assembly 265

A.4 Sequence of Fragment 2 yeast assembly 281

A.5 Sequence of Fragment 3 yeast assembly 295

A.6 Sequence of Fragment 4 yeast assembly 317

A.7 Sequence of pAN84 330

A.8 Sequence of pAN146 336

iv

List of Figures

Figure 1-1 The Design-Build-Test-Analysis metabolic engineering cycle. 7

Figure 1-2 Biosynthetic pathways to produce 1,3-propanediol, 1,4-butanediol

and artemisinic acid. 12

Figure 1-3 Directed evolution cycle. 19

Figure 1-4 Yeast gap repair for DNA assembly on plasmids. 25

Figure 1-5 Reiterative Recombination cycle. 27

Figure 1-6 Synthetic yeast chromosome and SCRaMbLE. 29

Figure 1-7 Global transcription machinery engineering. 31

Figure 1-8 Multiplex automated genome engineering. 32

Figure 1-9 Trackable multiplex recombineering. 34

Figure 1-10 Comparison between the TALENs, ZFNs and CRISPR/Cas9. 37

Figure 1-11 Engineering protein receptor – transducer fusions. 42

Figure 1-12 Engineering RNA aptamers to create small molecule sensors. 43

Figure 1-13 Harnessing natural regulators to create small molecule selections. 45

Figure 2-1 ~ 100 kb meridamycin Type I NRPS – PKS pathway assembly scheme. 69

Figure 2-2 PCR verification of complete PKS fragment assemblies. 70

Figure 2-3 Pulsed field gel electrophoresis and Southern blotting to verify PKS

assemblies. 71

Figure 2-4 Fitness tests for yeast strain containing PKS fragment 3. 72

Figure 2-5 Sub-culture scheme to check the stability of PKS assemblies in the

chromosome. 73

Figure 2-6 PCR to check the stability of PKS assemblies in the chromosome. 73

v

Figure 2-7 Promoter library construction by CRISPR to optimize meridamycin

production. 76

Figure 2-8 Proportion of 30 colonies containing each promoter after one round

of CRISPR. 77

Figure 2-9 Design of the FLIP system based on RMCE. 78

Figure 2-10 FLIP efficiencies with culture time. 79

Figure 2-11 Design of FRT sites, yeast and bacterial marker genes for iterative

FLIP. 81

Figure 2-12 Individual FLIPs to recover all four fragments onto plasmids. 82

Figure 2-13 FLIP efficiencies of Fragment 1, 2, 3 and 4. 83

Figure 2-14 Restriction analysis of plasmids obtained by FLIP. 84

Figure 2-15 Colony PCR to test for the recovery of the pathway onto pAN438 by

TAR. 86

Figure 2-16 Restriction analysis of TAR plasmids containing fragment 3. 87

Figure 2-17 Restriction digest and ligation scheme to clone pathway fragments. 88

Figure 2-18 Restriction analysis of plasmids containing fragment 4 promoter library. 89

Figure 2-19 RT-PCR to test for expression of fragment 4 in S. lividans K4-114. 90

Figure 2-20 Standard curves to detect meridamycin via the FP assay. 92

Figure 3-1 FP assay to detect FK506. 137

Figure 3-2 Production of FK506 in 96-well plates. 140

Figure 3-3 Extraction of FK506 and growth determination in 96-well plates. 142

Figure 3-4 High-throughput mutagenesis and screening platform. 144

Figure 3-5 Production titer of FK506 in shake flasks for the top 8 mutants by

vi

productivity. 145

Figure 3-6 Z’ score for the FP assay. 147

Figure 4-1 Y3H assay for small molecule detection. 169

Figure 4-2. Y3H system to detect TMP. 171

Figure 4-3. TMP detection using the Y3H system. 172

Figure 4-4. Y3H system to detect meridamycin or FK506. 174

Figure 4-5. LEU2 selection with Y3H systems containing FKBP12 receptor mutants. 175

Figure 4-6. Detection of 100 nM and 5 µM of meridamycin standards with the

H87L Y3H system. 176

Figure 4-7. Growth curves with different concentrations of meridamycin. 177

Figure 4-8. Detection of 100 nM and 5 µM of FK506 standards with the H87L

Y3H system. 178

Figure 4-9. Growth of the Y3H strain in the presence of various extracts of

unfermented ISPz media. 179

Figure 4-10. Detection of biosynthesized FK506 using the Y3H system. 180

Figure 5-1 Comparison of natural and semi-synthetic routes to 9-epi Ambrox

and analogs. 196

Figure 5-2 Scheme to derive 3-Z,E-homofarnesol from (a) Z,E-FOH and

(b) E,E-FOH. 197

Figure 5-3 Scheme for cyclization of homofarnesol to halogenated 9-epi

Ambrox and further derivatization to novel analogs. 198

Figure 5-4 Biosynthetic pathways for the production of Z,E-FOH and E,E-FOH. 200

Figure 5-5 Production yields of Z,E-FOH and E,E-FOH in E. coli expressing

vii

different Z,E-FPPSs. 201

Figure 5-6 Production yields of prenyl alcohols in S. cerevisiae strains engineered

for Z,E-FOH production. 203

Figure 5-7 Semi-synthesis scheme for the production of the tocotrienols. 205

Figure 5-8 Biosynthetic pathway for the production of E,E,E-geranylgeraniol in S.

cerevisiae. 206

Figure 5-9 Production yield of GGOH in engineered yeast strains. 208

Figure 5-10 Production yield of prenyl alcohols in yeast strains engineered for GGOH

production. 209

Figure 5-11 GC-MS traces of E. coli fermentations from FPP2 flask fermentation. 233

Figure 5-12 GC-MS traces of E. coli fermentations from FPP2 bench-scale

production. 234

Figure 5-13 GC-MS traces of E. coli fermentations of the other Z,E-FPPSs. 237

Figure 5-14 1H-NMR of biosynthetic Z,E-FOH from E. coli. 238

Figure 5-15 GC-MS traces of S. cerevisiae fermentations for Z,E-FOH production. 240

Figure 5-16 GC-MS analysis of yeast fermentation extracts and authentic GGOH

standard. 241

Figure 5-17 1H-NMR of biosynthetic GGOH from bench-scale yeast fermentation. 242

viii

List of Tables

Table 2-1 Mutations in yeast assemblies. 74

Table 2-2 Assembly PCRs for Fragment 1. 108

Table 2-3 Assembly PCRs for Fragment 2. 109

Table 2-4 Assembly PCRs for Fragment 3. 110

Table 2-5 Assembly PCRs for Fragment 4. 112

Table 2-6 PCRs to verify Fragment 1. 113

Table 2-7 PCRs to verify Fragment 2. 114

Table 2-8 PCRs to verify Fragment 3. 115

Table 2-9 PCRs to verify Fragment 4. 116

Table 2-10 CRISPR details and gRNA primers. 117

Table 2-11 CRISPR details and repair DNA primers. 118

Table 2-12 Strains used in this study. 119

Table 2-13 Plasmids used in this study. 122

Table 5-1 Strains used in this study. 221

Table 5-2 Plasmids used in this study. 222

Table 5-3 Primers and dsDNA gene fragments (gblocks) used in this study. 224

Table 5-4 Yeast synthetic-dropout media recipe. 231

Table 5-5 Amino acids used for 20X amino acid mix. 232

ix

List of Abbreviations

5-FOA 5-fluoroorotic acid

AD activation domain of transcription factor

AmpR ampicillin resistance gene

BDO 1,4-butanediol bp base pair

°C degrees Celsius

CmR chloramphenicol resistance

DBD DNA-binding domain of transcription factor

Dex dexamethasone

DHFR dihydrofolate reductase

DMF dimethyl formamide

DMSO dimethylsulfoxide

DNA deoxyribonucleic acid dNTP deoxynucleotide triphosphate dsDNA double-stranded DNA

E. coli Escherichia coli et al. et alia

FACS fluorescence-activated cell sorting

E,E-FOH E,E-farnesol

Z,E-FOH Z,E-farnesol

FPP farnesyl pyrophosphate g gram

x

G418 geneticin gal galactose gDNA genomic DNA

GFP green fluorescent protein

GR glucocorticoid receptor gTME global transcription machinery engineering h hour

IPP isopentenyl pyrophosphate

IPTG isopropyl β-D-thiogalactopyranoside

KanR kanamycin resistance

Kb kilobase

L liter

LexAop operator binding a dimer of the LexA DNA-binding domain

M moles per liter

MAGE multiplex automated genome engineering

Mb megabase

MCS multiple cloning site mg milligram min minute mL milliliter

Mtx methotrexate nm nanometer nM nanomoles per liter

xi

NRPS nonribosomal peptide synthase

ODx optical density at x nm ori origin of replication

PCR polymerase chain reaction

PDO 1,3-propanediol

PKS polyketide synthase pM picomoles per liter

RNA ribonucleic acid rpm revolutions per minute s second

S. cerevisiae Saccharomyces cerevisiae

SC synthetic complete

SCRaMbLE synthetic chromosome recombination and modification by LoxP-mediated

evolution

SDS sodium dodecyl sulfate

S. lividans Streptomyces lividans t time

µg microgram

µL microliter

µM micromoles per liter

UV ultraviolet

Y2H yeast two-hybrid

Y3H yeast three-hybrid

xii

YOGE yeast oligo-mediated genome engineering

YPD yeast peptone dextrose media

xiii

Acknowledgements

First, I would like to give thanks to my advisor, Professor Virginia Cornish, for being such a wonderful mentor. Over the course of five years, she has not only provided opportunities to learn new knowledge, but also to develop my interpersonal skills and to work with many collaborators and students. She has always reminded her students to think of the big picture and encouraged us to excel to our full potential.

Many thanks also goes to my PhD defense committee, Professor Jef Boeke, Professor

Harris Wang, Professor Wei Min and Dr. George Ellestad for kindly sharing their time and expertise, and providing helpful suggestions to improve my dissertation. I would also like to thank Professor Harris Wang and Professor Ronald Breslow, for providing much useful feedback during my second year defense and fourth year original research proposal.

I am also grateful to all of my colleagues in the Cornish lab, past and present, for all the invigorating discussions, help and friendship that we have had. In particular, Pedro Baldera-

Aguayo has been instrumental in helping to perform experiments for the FP project (Chapter 3).

He is also picking up the remaining work for meridamycin production and screening in Chapter

2. Bertrand Adanve developed the exciting projects in Chapter 5 and we spent many late nights doing both science and dinner. I have also had the pleasure to collaborate with Ehud Herbst,

Miguel Jimenez, Andrew Anzalone, Marie Harton, Mia Shandell, Caroline Patenode, Tracy

Wang, Lisa Kahl (Biology PhD rotation student, Fall 2014) and Nathaniel Jaffe (PhD rotation student, Spring 2014), all talented people in their own right. I also obtained much valuable technical help and advice from Dr. Dante Romanini, Dr. Zhixing Chen, Dr. Nili Ostrov, Dr.

Sonja Billerbeck, Dr. Anna Kaplan and Dr. Kenichi Shimada (the latter two from the Stockwell lab).

xiv

During my PhD, I have had the opportunity to mentor several high school, undergraduate and masters students, many of whom contributed significantly to my projects. My gratitude goes to Hans (Mingjian) Gao (Undergrad, 2014-2016), Dean Deng (High school student then

Undergrad, 2012-2016) and Estefania Chavez (Undergrad, 2012-2014) for their long-term commitment, dedication and perseverance on the polyketide project (Chapter 2). Other students who have contributed to this project include Jingkang Chen (Masters, 2013), Jeffrey Li (Cornell

Undergrad, Summer 2014), Millicent Olawale (Post-Bac, 2013-2014), Madison Jones

(Undergrad, 2012-2013), Kevin Vo (Undergrad, 2012-2013), Matthew Wilder (Masters, Summer

2013), Yu-Wei Chang (Masters, Summer 2013), Camilla Bykhovsky (High school student,

Summer 2014), Benjamin Jones (High School Student, Summer 2015) and Leah Howard (High

School Student, Summer 2013). For the yeast engineering projects to produce terpenoids

(Chapter 5), I am grateful to Erik Aznauryan (Masters, Summer 2014) for all his hard work and effort, and Mehaa Bajaj (Masters, Spring 2014). Lastly, I would like to thank Rea Globus

(Masters, Summer 2014) for all his dedication and preliminary work on the FP and Y3H project

(Chapters 3 and 4).

I would also like to thank my collaborators for all their kind advice, help and support. In particular, Professor Alessandra Eustaquio (University of Illinois Chicago, formerly Pfizer) performed the conjugation experiments in Chapter 2. Jeffrey Janso (Pfizer) also performed conjugation experiments and provided much support in terms of materials for our experiments.

Dr. Frank Koehn (former Head of Natural Products Lab, Pfizer) provided strong and continuous support during our collaboration. I also learnt much about the biotech industry from our work with Dr. Philip Goelet (co-founder, Acidophil LLC), Dr. Simon Aspland (Acidophil LLC) and

Dr. Brian Green (Acidophil LLC).

xv

Finally, I would like to thank my parents, family and friends for all their unconditional and unwavering love and support throughout the years. During the eight years I have spent away from home to pursue my bachelor and PhD degrees, their constant presence and encouragement has made this journey so much brighter, successful and meaningful. Also, Tongchaoran Gao provided much invaluable help with the figures in this thesis.

xvi

To my family, for their love and support.

xvii

Chapter 1

Strategies and Enabling Technologies

for Metabolic Engineering

1

1.0 Chapter outlook

Metabolic engineering has the potential to revolutionize how chemicals are made – moving from organic synthesis using petrochemical sources or low-yield extraction from natural sources to high-yield biosynthesis using engineered microbes. Although fermentation technology for the production of chemicals such as ethanol, acetic acid and lactic acid has existed for centuries, but the microbes used were either natural strains or derived by artificial selection. In contrast, the goal of metabolic engineering is to optimize metabolic flux or improve the fitness of microbes via genetic engineering for the overproduction of a target compound. The key advance that enabled this was the establishment of recombinant DNA technology in the 1970s. Since the inception of metabolic engineering in the 1990s, the field has blossomed into an interdisciplinary one encompassing biochemistry, genetic engineering and analytical techniques. In this chapter, we trace the origins of metabolic engineering as a discipline within the broader field of industrial microbial biotechnology. Next, starting with the classic design-build-test-analyze cycle, we review the two key approaches that have been employed in the field, namely constructive metabolic engineering and inverse metabolic engineering. We then present a third approach using combinatorial libraries or directed evolution to increase the efficiency of strain development. This approach has been enabled by advances in technologies for DNA assembly and genome-wide mutagenesis to construct gene, pathway, network or whole genome libraries, as well as product screens and selections for testing the library. We highlight some prominent technologies and discuss their applications and limitations. Finally we offer some perspectives on future developments in the field.

2

1.1 Metabolic engineering – origins

For centuries, humans have carried out fermentation for the production of beer, wine, bread, cheese and soya. However, it was only with the direct observation of microbial cells by

Leeuwenhoek in the 1680s, and studies by Pasteur from the 1850s–1870s, that the role of microbes in fermentation was established, and technologies developed for their isolation, characterization and sterile culture1. This led not only to a spurt in the output volume of the aforementioned products, but also to a greater diversity of fermentation products, including the commodity (bulk) chemicals lactic acid2 and citric acid3. During World War I, to meet war needs, acetone and butanol were made by the Weizmann (ABE) fermentation process4 and glycerol by the sulfite-steered yeast fermentation process5. However, the rise of the oil industry in the early 20th century led to the replacement of these fermentation processes to produce bulk chemicals with cheaper petrochemical routes.

A key turning point in microbial biotechnology was the discovery of penicillin by

Fleming in 1928, which ushered in the age of antibiotics6. Research on penicillin production began in 1941, driven by military needs during World War II. After two years of intensive efforts involving screening of natural strains from different environments and random mutagenesis, production levels were improved by 140,000 times1. Encouraged by this success, research on antibiotics continued and led to compounds such as actinomycin, streptomycin, erythromycin and tetracycline7. Because these natural products had more complex structures than bulk chemicals, they were difficult to synthesize chemically, and thus the only economically viable option to produce them was via fermentation of microbes.

In the following decades, analogs of the original antibiotics were developed, in order to improve their pharmacokinetic properties or to circumvent resistance mechanisms of bacteria.

3

For instance, minocycline and doxycycline have longer half-lives and are active against a wider range of bacteria than their parent antibiotic tetracycline8. These analogs were often produced by semi-synthesis9: first using microbial fermentation to biosynthesize an intermediate compound, which is then chemically modified to create the final compound. This strategy was adopted instead of relying solely on biosynthesis as the final compound could be toxic to the producer cell, or there could be a lack of natural enzymes to catalyze a chemical reaction or due to the overall economics of the process.

Another prominent branch of industrial biotechnology in the 20th century was the use of microbial cells, whole cell extracts or purified enzymes as catalysts for biotransformation10.

Whereas fermentation involves a multi-step reaction pathway, biotransformation consists of one or two steps to produce a compound that is structurally similar to the substrate. Some notable examples were the production of vinegar (acetic acid) by oxidation of ethanol using acetic acid bacteria11, the conversion of cortisol to prednisolone (an anti-inflammatory drug) using

Corynebacterium simplex12, and the conversion of acrylonitrile to acrylamide using nitrile hydratase13. Since biotransformation relies on enzymes, the reactions occur in mild operating conditions and produce compounds with high regioselectivity and stereospecificity.

In the 1970s, the field of biotechnology was revolutionized with the development of recombinant DNA techniques. In 1970, Hamilton Smith discovered restriction enzymes, which could cleave dsDNA at specific target sites14. Then in 1972, Paul Berg produced the first recombinant DNA by the insertion of viral DNA into bacterial DNA using restriction digest followed by ligation15. This was followed in 1973 by Herbert Boyer and Stanley Cohen, who expressed recombinant DNA from the frog Xenopus laevis in the bacterium E. coli16. This was

4

an exciting moment as it raised the possibility that rapid-growing microbes could be reprogrammed and cultivated in large scale to produce desired genes and their products.

It was not long before the first drug – human insulin, manufactured using bacteria engineered with recombinant DNA, was approved by the FDA in 198217. This obviated the need for the purification of insulin from the pancreas of cattle and pigs to treat diabetes. Instead, microbes could be fermented in a controlled environment to efficiently and reproducibly produce large quantities of the pure drug. This achievement sparked a “gold rush” in biotechnology to clone and express heterologous genes in production hosts to produce a variety of compounds, including peptide-based drugs (e.g. hormones, growth factors, enzymes and antibodies), vaccines

(e.g. Hepatitis B), commercial enzymes (e.g. lipases, proteases and carbohydrases), primary metabolites (e.g. amino acids, organic acids, vitamins and nucleotides) and secondary metabolites with important bioactivities (e.g. anti-bacterial, anti-fungal and anti-parasitic)18.

In 1991, the term “metabolic engineering” was coined by James Bailey, Gregory

Stephanopoulos and Joseph Vallino. Metabolic engineering was defined as the improvement of cellular activities through the use of recombinant DNA techniques to modify the enzymatic, regulatory or transport systems in the cell19. This broad definition recognizes that metabolic engineering is not only about manipulating biosynthetic genes or pathways to overproduce a compound, but that the same techniques can be applied to optimize relevant physiological properties of the cell. Some examples include overcoming product feedback inhibition, increasing tolerance to hypoxia or high osmotic pressure, ensuring the delivery of precursors to reaction sites, and the channeling of intermediates between enzymes and cellular compartments.

This definition also sets metabolic engineering apart from traditional microbial biotechnology.

While the former involves the use of DNA engineering techniques to alter the genetic program of

5

a cell, the latter relies on natural strains from the environment or strains modified by artificial selection.

Another early definition of metabolic engineering is “the directed improvement of product formation or cellular properties through the modification of specific biochemical reaction(s) or the introduction of new one(s) with the use of recombinant DNA technology20.”

Thus, metabolic engineering is not only about redirecting fluxes in the cell, but also potentially, the creation of new ones by the introduction of foreign genes or pathways. These may serve to expand the repertoire of chemistry in the cell in order to create non-natural products or to utilize non-traditional feedstocks. Other activities that were classified as metabolic engineering include heterologous protein production, xenobiotic degradation and the elimination or reduction of unwanted side products21. Furthermore, a sense of “direction” is inherent in all metabolic engineering efforts and implies making rational changes to the enzymes, regulators, pathways or strain background (constructive metabolic engineering, see Section 1.2.2). However, random or inverse metabolic engineering (see Section 1.2.3) or combinatorial approaches (see Section 1.3) play a critical role to reduce the time and cost of strain development, given a limited knowledge of the cell. These approaches can also lead to the discovery of new regulators and interacting genes, which can be used in future rational designs.

1.2 Strategies for metabolic engineering

Nature has imbued cells with robust regulatory networks that maintain homeostasis in the face of environmental changes. This is actually an advantage of using live cells as bio-factories, since the conditions during fermentation can be extreme (e.g. low pH and high osmotic stress).

However, these regulatory networks also serve to balance the flux of metabolites in the cell.

6

Thus, while it may be possible to create cells that produce many natural chemicals in low amounts, it is more difficult to redirect the flux to obtain high yields of a particular metabolite22.

This problem is compounded by our lack of understanding of all of the interacting enzymatic, regulatory or transport networks in the cell. Hence, metabolic engineers typically employ an iterative cycle of design, build, test and analysis, which borrows from the engineering design process23 (Fig. 1-1).

Figure 1-1. The Design-Build-Test-Analysis metabolic engineering cycle.

7

1.2.1 The design–build–test–analyze cycle

The first step in the design-build-test-analyze (DBTA) cycle is “design”24 (Fig. 1-1). This process involves selecting a host organism for production, taking into account factors such as toxicity of the product or intermediates, availability of metabolic precursors, feedstock utilization, and growth characteristics (e.g. optimal temperature and pH). Next, the biosynthetic pathway has to be designed to optimize flux towards the product and decrease flux towards competing reactions. This involves modeling the enzymes (kcat and Km) in the cell, their associated metabolite fluxes, as well as their transcription, translation and regulation, in order to identify key nodes for intervention. If the final product is non-natural to the cell, enzymes that catalyze the intermediate reactions leading to the product have to be identified and cloned for heterologous expression in the host cell. However, if there are no enzymes available, or if improvements to the enzymes are desired (e.g. increase kcat, decrease Km or improve substrate specificity), directed evolution can be used to evolve novel enzymes. Once a promising design or set of designs are made, the next stage is to construct the strains containing the various modifications.

The “build” step involves using genetic engineering techniques to overexpress, knockdown, delete or mutate enzymes or their regulators in the desired host cell, and is a key step in metabolic engineering25. Indeed, the birth of the field was enabled by the development of recombinant DNA techniques (see Section 1.1). Recently, the rise of synthetic biology offers a new paradigm and set of tools to construct gene pathways, circuits and networks, by combining well-characterized “parts” consisting of promoters, genes, terminators and regulators26. In the last few years, technologies for DNA assembly and mutagenesis have blossomed. Coupled with the decrease in cost per base pair of DNA synthesis, and the increasing lengths of dsDNA blocks

8

that are readily available commercially (~ 5 kb), it is now possible to perform extensive mutagenesis of the genome to create large, random or focused libraries (see Section 1.3), or to construct long, multigene, designer pathways, circuits and even whole genomes de novo (see

Chapter 2) (Fig. 1-1).

The next step after strain construction is the “test” phase27. Broadly, this encompasses all attempts to characterize the strain and its performance in culture. This includes sequencing to verify the successful incorporation of genes or mutations, gene expression profiling, as well as studying protein and metabolite changes in the cell. One can take advantage of various -omics technologies such as whole genome sequencing, RNA-seq, high-throughput proteomics and metabolomics that can rapidly measure global changes in gene sequences, RNA transcripts, proteins and small molecule metabolites respectively28 (Fig. 1-1). Another important tool is metabolic flux analysis with 13C or other labelled substrates, which provides a dynamic view of changes in the flow of metabolites in the cell29,30. In terms of product output, the Titer (final concentration of product in the fermentation medium), Rate (production per unit time), and Yield

(amount of product made per unit of feedstock consumed), known as “TRY”23, are important parameters to be determined by specific product screens or gas/liquid chromatography–mass spectrometry (GC/LC-MS) (see Section 1.3.4 and Chapter 3). The strain(s) can be initially tested in small-scale microfluidic cells, multi-well plates or test tubes, and later scaled up to shake flasks, bench-top fermenters or pilot fermenters31.

Finally, the “analyze” or “learn” phase seeks to create new knowledge of how various genetic perturbations affect cellular metabolism. This knowledge can be incorporated into future designs. There is the potential for synergy with the field of systems biology, which seeks to study the molecular mechanisms of living cells using both mathematical modelling and experimental

9

biology, in order to predict biological function32. Indeed, many of the tools to interpret large data sets from –omics approaches are identical. For instance, statistical techniques such as principal component analysis (PCA) and machine learning algorithms can be applied to discover obscure patterns and trends33. In order for the information to be broadly useful, there needs to be a detailed recording of as many parameters as possible, such as a list of genetic perturbations made to a strain, cultivation conditions and sampling time. Finally, the “analyze” step can also serve to uncover additional interacting enzymes or regulators that were previously unknown, and contribute to a more comprehensive understanding of networks in the cell.

1.2.2 Constructive metabolic engineering

The DBTA cycle is a process that is performed in order to overcome gaps in our understanding of cell metabolism. However, the first step “design” implies a certain degree of knowledge about the enzymes, regulators or their interacting partners in a biosynthetic pathway.

This is then used to formulate hypotheses about changes that can be made to increase the titer, rate or yield of the product. This approach has been traditionally adopted by metabolic engineers and is termed “constructive”, “rational” or “reductionist” metabolic engineering34. This will be further described using examples drawn from some notable successes in the field (Fig. 1-2).

Selection of a production host

One of the first steps in metabolic engineering is deciding on a production host.

Typically, model organisms such as E. coli or S. cerevisiae are chosen as their biology is relatively well-characterized – there are models and databases of their metabolic networks, pathways, enzymes and reactions35, as well as detailed genome36,37, transcriptome37,38, proteome39,40 and metabolome41,42 information. Many tools have been developed for their genetic

10

engineering, their cultivation and fermentation technology is mature, and they often enjoy favorable regulatory status. For instance, DuPont Tate & Lyle have engineered E. coli for the commercial production of bio-based 1,3-propanediol (PDO)43, and Genomatica have engineered

E. coli to produce the commodity chemical 1,4-butanediol (BDO)44, both using renewable glucose feedstocks instead of petrochemical sources (Fig. 1-2). However, in the Semi-synthetic

Artemisinin Project45, which aimed to develop a microbial strain to produce the anti-malarial drug precursor artemisinic acid, engineering efforts were switched from the prokaryote E. coli to the eukaryote S. cerevisiae, as a late step in the pathway required the expression of an eukaryotic cytochrome P450 enzyme derived from the plant Artemisia annua.

Enzyme identification and pathway design

Another key step in constructive metabolic engineering is to identify enzymes and design pathways that can convert a cellular metabolite to the final compound. If the compound is non- native to the cell, enzymes have to be identified from foreign sources or evolved by directed evolution for heterologous expression in the production host. PDO is naturally produced by many bacteria when grown anaerobically on glycerol, but economics strongly favor an aerobic process starting from the cheaper feedstock glucose. Thus, a four-step reaction pathway starting from the glycolytic pathway intermediate dihydroxyacetone phosphate (DHAP) was designed43

(Fig. 1-2). For BDO, which is not produced naturally by any organism, 10,000 pathways were generated by bioinformatics and ranked by maximum theoretical yield, pathway length, number of non-native steps, number of novel steps and thermodynamic feasibility. A five-step reaction pathway starting from the tricarboxylic acid (TCA) cycle metabolite succinyl-CoA and proceeding through the intermediate 4-hydroxybutyrate (4-HB) was identified for testing44 (Fig.

1-2). In order to produce artemisinic acid, a four-step enzymatic pathway from A. annua that

11

catalyzed the conversion of farnesyl pyrophosphate (FPP) to artemisinic acid was codon- optimized and expressed heterologously expressed in S. cerevisiae46 (Fig. 1-2).

Figure 1-2. Biosynthetic pathways to produce 1,3-propanediol, 1,4-butanediol and artemisinic acid.

12

Single enzyme changes

Another rational change that is made during strain engineering is the identification of flux-limiting enzymes in the pathway and attempting to increase their activity. This can be achieved by raising gene copy number via expression on high-copy plasmids or multi-copy integration into the chromosome, increasing transcription efficiency through the use of stronger promoters, performing codon optimization, engineering of ribosome binding sites to improve translation efficiency, and less commonly, engineering of the mRNA transcripts25,47. The enzyme itself can be engineered via directed evolution to boost its catalytic activity (higher kcat, lower km), improve substrate specificity or to eliminate product feedback inhibition. Alternatively, homologs of the enzyme possessing more desirable attributes can be discovered via library screening approaches48.

During strain engineering for the production of PDO, BDO and artemisinic acid, non- native genes had to be cloned and expressed in the production host to channel flux from native metabolites in the cell towards the desired product (Fig. 1-2). For PDO, the native E. coli oxidoreductase yghD was found to lead to better yields when compared to the original heterologous enzyme DhaT49. In the case of BDO, it was observed that production rate decreased significantly when BDO titer reached 80-90 g/L, with a concomitant increase in the intermediate compound 4-HB. A series of diagnostic experiments revealed that the downstream enzyme Cat2 was inhibited by high concentrations of BDO. Thus, directed evolution was performed to discover Cat2 variants that were insensitive to high levels of BDO. When the best Cat2 variant was expressed together with an improved Ald enzyme, the production levels of BDO increased by 20 % to > 110 g/L and 4-HB levels decreased by 75 %44. For the biosynthesis of artemisinic acid in yeast, the rate-limiting enzyme in the mevalonate pathway, 3-hydroxy-3-methylglutaryl

13

coenzyme A reductase, was overexpressed by integrating three copies of the gene into the chromosome to ensure a steady supply of precursors45. In fact, a truncated version of the enzyme was used (tHMGR), which lacked the regulatory domains for feedback control, thus enabling higher levels of flux through the pathway50.

Pathway, network or regulatory changes

Changes can also be made on a broader scale to multiple enzymes in the pathway or other interacting networks. For instance, increasing the activity of a flux-limiting enzyme may lead to other enzymes becoming rate-limiting, and these can be similarly overexpressed. Enzymes that direct flux into competing pathways can be eliminated or knocked down to reduce side products.

Also to this end, enzyme fusions, scaffold proteins or the co-localization of enzymes in cellular compartments can be used to improve substrate channeling and reduce the leakage of metabolites into other cellular pathways. In addition, it is important to consider the role of regulatory genes on the pathway. Negative regulators or repressors of the enzymes or pathway can be knocked down or deleted, while positive regulators can be overexpressed.

For PDO, the genes encoding glycerol kinase (glpK) and glycerol dehydrogenase (gldA) were deleted to prevent glycerol from being channeled back into central carbon metabolism43

(Fig. 1-2). In the case of BDO, the levels of the side product, acetate, was reduced by constitutive expression of the genes encoding acetate kinase and phosphotransacetylase (ackA- pta), which convert acetate to acetyl-CoA44. For artemisinic acid, all enzymes in the mevalonate pathway up to ERG20, which are responsible for the production of terpenoid monomers, were overexpressed using the galactose-inducible GAL1/GAL10 divergent promoter51. Furthermore, to eliminate the need for galactose induction, which would have made the process non-economical,

14

the regulatory gene GAL80 was deleted to enable constitutive expression of all GAL-regulated genes45. Next, the gene encoding squalene synthase (ERG9), which converts FPP to squalene for the synthesis of ergosterol (a competing pathway), was knocked down using a CTR3 copper- repressible promoter.

Strain background changes

Last but not least, rational changes can be made to the host cell background in order to improve strain fitness, genomic stability, tolerance to extreme conditions during fermentation

(e.g. low pH, osmotic stress, organic solvents), and to mitigate toxicity of the product, intermediates or side products34. For example, influx transporters can be expressed to deliver feedstocks and nutrients more effectively into the cell, while efflux transporters can help the secretion of products or intermediates out of the cell and prevent their accumulation in the cytosol52. For commercial BDO production by E. coli, the genes encoding the phage receptors tonA and lamB were mutated to impart resistance to common phage53. For artemisinic acid production, the gene encoding an artemisinic aldehyde dehydrogenase (ALDH1) from A. annua was expressed in yeast, in order to reduce the levels of the toxic intermediate artemisinic aldehyde and increase the yield of artemisinic acid45 (Fig. 1-2). In addition, cytosolic catalase

(CTT1) was overexpressed to mitigate oxidative stress54. Finally, all relevant enzymes were stably integrated into the chromosome to remove the requirement for plasmid selection.

15

1.2.3 Inverse metabolic engineering

The success of a constructive metabolic engineering approach hinges upon our understanding of the enzymes, pathways and regulatory networks in the cell, and our ability to model these to predict mutations that can lead to improved titer, rate and yield of a product.

While there has been much progress on all fronts, especially in our knowledge of cell biology and computational modeling, we still do not yet have a complete whole cell metabolic model for the basic production hosts E. coli and S. cerevisiae55. Many designs often produce unexpected results, which could be due to a number of reasons, such as a failure to model the pathway itself accurately, a lack of knowledge of the interactions between the pathway and other networks in the cell, the presence of undiscovered regulators, enzymes or regulatory mechanisms, and redundancies in enzyme activities. Thus, inverse metabolic engineering offers a way to improve strains with only a limited understanding of the cell, and to discover new knowledge.

The approach was codified by Bailey as “first, identifying, constructing, or calculating a desired phenotype; second, determining the genetic or the particular environmental factors conferring that phenotype; and third, endowing that phenotype on another strain or organism by directed genetic or environmental manipulation56.” Thus, inverse metabolic engineering begins with a desired phenotype in mind, followed by random mutagenesis to create a library of mutant cells. The library is screened to find clones that exhibit the desired phenotype, which are analyzed (e.g. by –omics approaches and sequencing) to find the causative mutations. The mutations can be re-created in a targeted manner in a clean strain background to verify the phenotype. This approach can reveal potentially novel enzymes, regulators or other cellular components that confer a particular phenotype. This knowledge can then be used in future rational designs as part of a constructive approach57.

16

Spontaneous mutagenesis

Inverse metabolic engineering has been practiced for centuries, albeit on a rudimentary level, as part of artificial selection programs for microorganisms with desirable traits. The source of genetic diversity was spontaneous mutagenesis arising from errors in DNA replication, which occurs rarely (~ 10-10 per base pair per replication in E. coli and S. cerevisiae) if the DNA repair machinery of the cell is intact58,59. Thus, the process has to rely on growth selections to cover a sufficiently large library of cells (> 108) to increase the chances of uncovering mutants possessing the desired phenotype. A screen would be simply too inefficient and impractical to perform. For example, ethanol-tolerant mutants of E. coli, which can potentially be advantageous for bio-ethanol production, were enriched by serial transfers into broth containing increasing concentrations of ethanol60. In order to generate a strain of S. cerevisiae capable of efficiently utilizing xylose as a sole carbon source (the second most abundant monosaccharaide after glucose), the enzymes xylose reductase, xylitol dehydrogenase and xylulose kinase from Pichia stipitis were first expressed heterologously in yeast61. The yeast strain was subjected to serial cultivation in minimal media supplemented with xylose to enrich for strains with enhanced xylose consumption and growth rates.

Random mutagenesis by chemical mutagens, UV or transposons

Instead of relying on infrequent spontaneous mutations to generate genetic diversity, one can employ chemical mutagens such as N’-nitro-N-nitrosoguanidine (NTG) or ethylmethanesulfonate (EMS), or expose the cells to UV, in order to create random mutations throughout the genome. Smith et al. treated E. coli with NTG and selected for growth in the presence of norvaline, a toxic analog of valine, which shares its biosynthetic pathway with

17

isobutanol. Subsequently, a strain showing improved flux towards isobutanol production, which outcompetes the toxic norvaline, was isolated62. Another way to generate mutations randomly is to use transposon mutagenesis. First, a transposon library is created containing random insertions of the transposon in the chromosome. The library is then subject to some selective pressure.

Genomic DNA fragments containing the transposon from the selected libraries can be compared with the unselected transposon insertion library using DNA microarrays, in order to identify insertions that affect fitness63.

Compared to spontaneous mutagenesis, these methods of random mutagenesis have a sufficiently high mutagenic rate, such that potentially interesting mutants can be found from screening a population of cells ~ 104 – 105. This greatly broadens the applicability of the inverse metabolic engineering approach to screenable but non-selectable phenotypes (see Section 1.3.4).

For instance, Alper et al. used transposon mutagenesis to identify gene knockout targets in E. coli that led to increased lycopene (a colored compound) production, using a screen for colonies that were more orange/red, which indicates more lycopene64. This approach identified two new potential gene targets, yjfP and yjiD, which were previously uncharacterized.

1.3 Combinatorial directed evolution strategies and tools for metabolic engineering

Constructive and inverse metabolic engineering are really two ends of a spectrum of possible approaches that can be employed for engineering cellular factories. Hybrid strategies that marry the strengths of both approaches can lead to advances in the speed and efficiency of metabolic engineering. This approach has been termed “combinatorial metabolic engineering”57,65, “evolutionary metabolic engineering”34, “systems metabolic engineering”66 or simply “directed evolution”67. The key steps are first, the generation of a semi-rational library at

18

the enzyme, pathway, network or genome level, and second, applying a screen or selection for the product molecule to identify variants with improved titer, rate and yield (Fig. 1-3).

Importantly, although not discussed in this chapter, pathway libraries can also be generated for the production of novel analogs of the compound68.

Thus, the combinatorial directed evolution approach bridges the gap between entirely rational and random approaches. The size and degree of “rationality” of the library can be designed based on the desired application and the amount of knowledge that is available about the enzyme, pathway or cell. The two key technological bottlenecks to this approach are methods to generate mutant libraries and screens or selections for the product (Fig. 1-3). While there has been much progress addressing the DNA mutagenesis problem, there is still a need for more general and high-throughput screens/selections for small molecules. In the next sections, we discuss these technologies and provide examples of how they have been harnessed for combinatorial metabolic engineering.

Figure 1-3. Directed evolution cycle.

19

1.3.1 Mutagenesis to create single gene libraries

Protein engineering

Following the identification of rate-limiting enzymes in a biosynthetic pathway, directed evolution can be performed to improve the properties of the enzymes, such as their catalytic activity, thermostability and specificity. Various technologies that have been developed for single protein engineering can be applied here. For instance, Nair et al. used error-prone PCR to improve the specificity of the promiscuous enzyme xylitol reductase for D-xylose over L- arabinose by 8.7-fold69. This was improved further to 16.5-fold by subsequent rounds of targeted site-saturation mutagenesis, using overlap-extension PCR with primers that contained a degenerate NNS codon (N: any nucleotide; S: cytosine or guanine).

Pirakitikulr et al. developed PCRless library mutagenesis via oligonucleotide recombination in yeast70. The method relies on mutagenic single-stranded oligonucleotides, which can be readily synthesized commercially. The target gene is located on a linearized plasmid and co-transformed into the cell with the mutagenic oligonucleotides. The presence of a double-stranded DNA break stimulates efficient homologous recombination, which was used to mutagenize up to three active site loops simultaneously in a proof-of-principle marker gene

TRP1.

A more difficult challenge is to create enzymes with novel catalytic abilities, in order to expand the repertoire of chemistry performed by the cell71,72. Arnold and co-workers have developed a strategy to accomplish this. First, bioinformatics is used to search for enzymes possessing mechanistic similarities to a desired transformation for which no enzymes are known.

Directed evolution can then be used to improve upon the enzyme’s activity and specificity. The

20

group has successfully evolved cytochrome P450s to catalyze a range of novel chemical reactions such as cyclopropanation, carbine and nitrene transfers, and C-H amination71.

Discovery of novel enzymes from metagenomic libraries

Environmental DNA libraries can also be a valuable resource to search for homologs of enzymes with improved properties. For PDO production, metagenomic libraries were screened by colony hybridization and enzyme complementation in dehydratase-negative E. coli to find homologs of glycerol and diol dehydratases73. Eventually, a diol dehydratase was found, which exhibited higher kcat/kM values and retained 40 % of its activity even in the presence of high concentrations (2 M) of the product PDO.

The method of family shuffling or in vitro DNA recombination using PCR can be used to mix-and-match different regions of homologous proteins to create a library of hybrids74,75. This method was used to shuffle three homologs of the enzyme biphenyl dioxygenase to evolve variants that had enhanced activities to degrade polychlorinated biphenyls (PCBs)76.

Tuning single gene expression

Next, the expression of the enzyme or regulator can be optimized by tuning its level of transcription and translation, via the use of promoter libraries and ribosome binding site (RBS) variants respectively77. Alper et al. generated a promoter library by error-prone PCR of a constitutive promoter, and used it to assess how the levels of the enzymes phosphoenolpyruvate carboxylase (ppc) and deoxy-xylulose-P synthase (dxs) impacted growth yield and lycopene production respectively in E. coli78. It was found that lycopene yields scaled linearly with dxs expression, indicating a rate-limiting step. In addition, the stability of the mRNA and protein can

21

be tuned by incorporating RNA stabilizing/de-stabilizing elements and protein degradation tags respectively79. Babiskin et al. created a library of Rnt1p hairpin substrates, which mediate mRNA degradation when placed at the 3’ UTR of mRNAs. They fused the library to the gene encoding squalene synthase (ERG9) in S. cerevisiae in order to modulate the flux through the ergosterol pathway80. Finally, gene knockout or overexpression libraries such as the Keio knockout collection for E. coli81 can be used for the systematic analysis of genes that affect strain production57.

1.3.2 Mutagenesis to create pathway libraries

Since metabolic engineering involves the optimization of multiple interacting genes and pathways, combinatorial approaches would benefit from generating libraries of multi-gene pathways as well as to optimize their individual expression simultaneously. The two main approaches to achieve this are either bottom-up DNA assembly of the pathway library or top- down mutagenesis of an existing pathway. The advent of synthetic biology has led to the development of DNA assembly methods that are capable of efficiently building pathway libraries de novo82,83. At the same time, methods for in vivo mutagenesis have continued to expand, enabling multiple genomic loci to be targeted simultaneously for systems-level engineering and systems biology studies84. These methods will be explored later in the context of genome-wide mutagenesis, even though they can also be applied to build pathway libraries.

In vitro DNA assembly methods

In vitro DNA assembly methods such as restriction enzyme digestion and ligation can be used to construct DNA pathway libraries. BioBrick™85 and Golden Gate86 employ well-chosen

22

restriction digest and ligation schemes to enable re-use of the same restriction sites for multiple rounds of assembly. For instance, the BioBrick™ standard involves the creation of DNA cassettes flanked at the 5′ region by an EcoRI and XbaI site, while the 3′ region contains a SpeI and PstI site85. Ligation of the compatible sticky ends of XbaI and SpeI destroys both sites, enabling further rounds of assembly using the same enzymes. Golden Gate employs Type IIS restriction enzymes that cleave outside of its recognition sequence to enable scarless DNA assembly86. Guo et al. used this method to create a series of standardized biological parts to enable the rapid construction of a promoter library (three promoters) for a three-gene carotenoid biosynthesis pathway (total of 27 combinations)87. Latimer et al. also used Golden Gate to construct promoter libraries for the eight-gene xylose utilization pathway in S. cerevisiae, in order to investigate differences in promoter enrichment under various conditions88.

Occasionally, the use of restriction enzyme-based methods may be limited due to the occurrence of restriction sites within the sequence of interest. In those cases, other seamless assembly methods such as overlap-extension PCR can be used. Jeschek et al. used this method to create rationally-reduced RBS libraries (RedLibs) for three genes in order to optimize the flux around a branch point in the violacein biosynthesis pathway89. Variants were identified that showed increased production of the pigments violacein and deoxyviolacein by 240% and 420% respectively. Other useful in vitro methods that enable the rapid and scarless assembly of DNA fragments based on homology include sequence and ligation-independent cloning (SLIC)90 and

Gibson assembly, which has been shown to be capable of assembling several hundred kilobase- long genomes91. Layton et al. used Gibson assembly to assemble ester fermentative pathways in

23

modular fashion to enable the rapid construction of promoter, RBS and origin of replication libraries to optimize transcription, translation and plasmid copy number respectively92.

In vivo DNA assembly on plasmids

Another class of methods for DNA assembly involve harnessing the highly efficient homologous recombination machinery in the yeast S. cerevisiae or the bacterium Bacillus subtilis for in vivo assembly. For instance, Gibson et al. used gap repair in yeast for the one-step assembly of 25 overlapping DNA fragments to form the complete, synthetic, 592 kb

Mycoplasma genitalium genome93,94 (Fig. 1-4). Larionov et al. developed Transformation

Associated Recombination (TAR), based on gap repair of a linearized plasmid in yeast, for the efficient cloning of genes or genomic regions up to 300 kb in length from co-transformed prokaryotic or eukaryotic genomes95. Shao et al. developed DNA Assembler, also based on yeast gap repair, and demonstrated the one-step construction of a functional eight-gene, D-xylose utilization and zeaxanthin biosynthesis pathway (19 kb)96. Next, the authors used DNA

Assembler to construct a functional 29 kb aureothin polyketide biosynthesis pathway, and modified the pathway gene AurB to produce an analog of aureothin97. However, when the assembly of the ~ 45 kb spectinabilin polyketide biosynthesis pathway was attempted, deletions were observed in the pathway, which was probably due to recombination between repetitive sequences. This necessitated the assembly of the pathway in two steps using yeast followed by in vitro assembly. This and our own experience (see Chapter 2) suggest a potential drawback to in vivo homologous recombination-based methods for the assembly of highly repetitive sequences.

24

Figure 1-4. Yeast gap repair for DNA assembly on plasmids. Multiple DNA fragments with overlapping homologies can be co-transformed with a linearized plasmid for efficient recombination in yeast to form a closed plasmid containing the assembled DNA.

Mitchell et al. combined in vitro assembly by Golden Gate with in vivo yeast gap repair to create a Versatile Genetic Assembly System (VEGAS)98. In that study, Golden Gate was first used to construct transcriptional units (promoter – gene – terminator) flanked by orthogonal sequences. These were used as homology arms for the one-step assembly of the four gene β- carotene pathway and the five gene violacein pathway by homologous recombination in yeast.

Next, the authors used VEGAS to assemble combinatorial promoter and terminator libraries (10 promoters and 5 terminators) for each gene in the β-carotene pathway. The resultant colonies

25

produced different concentrations of β-carotene and intermediate compounds, which was reflected phenotypically in color differences between colonies98. Overall, in vivo DNA assembly using yeast gap repair can offer advantages in terms of speed and efficiency over in vitro DNA assembly methods, as only a single transformation step is required to create functional pathways and libraries on plasmids in yeast (Fig. 1-4). However, for the construction of robust and stable production strains, it is preferable to assemble biosynthetic pathways in the chromosome instead of plasmids, as plasmids require continuous selection pressure for maintenance, and can exhibit variability in their copy number99.

In vivo DNA assembly in the chromosome

Itaya et al. have developed the B. subtilis genome as a vector (BGM vector) for the cloning of the 3.5 Mb genome of the photosynthetic bacterium Synechocystis PCC6803 using a homologous recombination-based method named “inchworm”100. However, the method requires long (> 100 kb), high quality, continuous DNA as template, which may not be readily available.

The authors later developed “domino” assembly of overlapping blocks of DNA that can be generated by PCR or gene synthesis, and demonstrated the assembly of the 16.3 kb mouse mitochondrial genome and the 134.5 kb rice chloroplast genome101.

Our lab (Wingler et al.) developed Reiterative Recombination as a robust, user-friendly and efficient tool for the assembly of multi-gene pathways and libraries in the yeast chromosome102,103. Reiterative Recombination relies on inducing an endonuclease to create a dsDNA break in the genome, which increases the efficiency of repair by homologous recombination by orders of magnitude104,105. A marker recycling scheme (HIS3 and LEU2) that exchanges the marker gene during each round of Reiterative Recombination, enables multiple

26

rounds of assembly to be performed without running out of markers (Fig. 1-5). We have applied this method for the assembly of a challenging ~ 100 kb polyketide biosynthesis pathway for meridamycin (Chapter 2). Importantly, the high efficiency allows for the construction of pathway libraries of at least 104. Wingler et al. demonstrated this by building a mock library for the three gene lycopene biosynthesis pathway (crtE, crtB and crtI) and recovering one active clone from a pool of 104 inactive pathways102.

Figure 1-5. Reiterative Recombination cycle. The system relies on cycling between a set of two endonucleases and marker genes (HIS3 and LEU2) for indefinite rounds of assembly. In round 1 (top), endonuclease 1 is induced by galactose, which creates a dsDNA break at the endonuclease target site in the genome. This stimulates efficient homologous recombination using a piece of donor DNA (orange) containing the DNA to be assembled. Integration into the chromosome can be selected for using marker 1, which is only active on the chromosome and not the plasmid. In round 2 (bottom), endonuclease 2 cuts the genome to stimulate homologous recombination with the next piece of donor DNA. The process swaps out marker 1 for marker 2, allowing marker 1 to be reused for the next round of assembly.

27

In 2014, Annaluru et al. reported the total synthesis of a ~ 270 kb functional, designer yeast chromosome III. The authors used a hierarchical assembly scheme starting from oligonucleotides to create ~ 750 bp double-stranded DNA building blocks by overlap-extension

PCR. These were joined by yeast gap repair to create 2 – 4 kb mini-chunks. Finally, pools of overlapping, synthetic mini-chunks, were used to replace native sections of the yeast chromosome III by homologous recombination (Fig. 1-6). A total of eleven rounds of recombination were performed to replace the entire chromosome with synthetic DNA, using alternating marker genes (LEU2 and URA3) to select for mini-chunk integration.

This achievement is part of the S. cerevisiae (Sc2.0) project, which seeks to construct an entirely synthetic, designer yeast genome106. Significantly, the refactored genome will have destabilizing elements such as transposons removed, and TAG stop codons will be replaced by

TAA stop codons to free up a codon for future unnatural amino acid incorporation. Finally, all non-essential genes and major landmarks such as the centromere and telomere, will be flanked by loxPsym sites to enable genome-scale rearrangements, duplications, inversions and deletions to occur upon inducible expression of Cre recombinase (SCRaMbLE) (Fig. 1-6). Once the synthetic genome is complete, SCRaMbLE can be used to explore genome minimization to create a fully-dedicated cell factory107, or to accelerate the directed evolution of the host background (see next) for improved strain fitness and production phenotypes108.

28

Figure 1-6. Synthetic yeast chromosome and SCRaMbLE. Top: Homologous recombination was used to replace all portions of the native yeast chromosome with synthetic constructs. Bottom: Upon induction of Cre recombinase, the genes (arrows) can undergo a variety of recombination events between the flanking LoxPsym sites (diamonds) to produce deletions, duplications, inversions and transpositions.

29

Overall, DNA assembly technologies have advanced to the stage where the construction of pathway libraries of up to 104 variants are becoming routine. This has largely overcome the bottleneck in the mutagenesis step for the directed evolution of biosynthetic pathways. The next challenge is to optimize the host strain background by combinatorial approaches. While DNA assembly methods are beginning to make inroads into the de novo total synthesis of functional bacterial93,94,104 and eukaryotic genomes106,109, these approaches are still developing and remain technically challenging to perform. However, in recent years, there has been much progress in the development of technologies to perform genome-wide mutagenesis for synthetic and systems biology applications. In the next section, we highlight some prominent technologies and approaches that are enabling the high-throughput directed evolution of host genomes for combinatorial metabolic engineering.

1.3.3 Genome-wide mutagenesis

Global transcription machinery engineering

Alper et al. developed global transcription machinery engineering (gTME) for the global perturbation of the transcriptome110 (Fig. 1-7). The method relies on using error-prone PCR to mutate the E. coli gene rpoD that encodes the main bacterial sigma factor, σ70. In E. coli, sigma factors can tune the promoter preferences of RNA polymerase for multiple genes111. Alper et al. demonstrated that mutagenizing the global regulator σ70 was more efficient than traditional methods targeting multiple genes, to optimize phenotypes such as ethanol tolerance, lycopene production and combined phenotypes (ethanol and SDS tolerance).

30

Figure 1-7. Global transcription machinery engineering. Global regulators such as the sigma factor are mutagenized in E. coli. This may result in changes to the gene expression profile of several genes that are originally regulated by the sigma factor, or new genes that are now regulated by the mutated factor.

Various –omics approaches such as transcriptomics can be used to quickly identify target genes that may be responsible for an interesting phenotype.

Recombineering in E. coli

Recombineering has emerged as an efficient and reliable method to mutagenize the genome in E. coli112,113. The method harnesses phage-based recombination machinery such as the

ET system or λ Red system, for homologous recombination of linear single-stranded DNA

(ssDNA) or double-stranded DNA (dsDNA) with E. coli plasmids or the genome114-118. The efficiency of Red-recombination approaches 25 % of cells surviving electroporation when the E. coli mismatch repair system is knocked out119. Importantly, the region of homology required for recombination is short enough (20 - 70 bases), such that thousands of single-stranded mutagenic

31

oligonucleotides (ss-oligos) can be synthesized at reasonable cost on a chip for genome-wide mutagenesis120.

Figure 1-8. Multiplex automated genome engineering. Low-cost, 90-base single-stranded oligos can be used to create large randomized libraries (NNNNN) in specific locations or genes in the genome.

Using iterative cycles of MAGE, which can be automated, extensive changes can be made to the genome.

Wang et al. developed Multiplex Automated Genome Engineering (MAGE), based on recombineering, for the automated, genome-wide mutagenesis and evolution of E. coli121 (Fig. 1-

8). Each cycle takes ~ 2 – 2.5 h and is capable of mutagenizing > 30 % of the cell population using 90-base ss-oligos. The authors applied MAGE to optimize the host background for lycopene production. Twenty genes were chosen for the construction of RBS libraries to tune their translation, and four genes were targeted for inactivation simultaneously. After 35 cycles of

MAGE, during which ~ 1.5 x 1010 variants were generated, a strain producing five-fold more

32

lycopene compared to the background strain was isolated. Significantly, the strain exceeded the best documented yields at the time, and was evolved in less than a week121. Isaacs et al. applied

MAGE to refactor all the 314 TAG stop codons in E. coli to TAA, in order to free up a codon for potential applications such as unnatural amino acid incorporation. They accomplished this by using MAGE to refactor 10 different stop codons at a time using a pool of 10 oligos to generate

32 strains. The authors then developed hierarchical conjugative assembly genome engineering

(CAGE) to combine all the refactored genomes in the 32 strains to produce a strain containing all

314 stop codon replacements122,123. Wang et al. further developed MAGE to enhance the efficiency of integrations by co-selection for oligo-mediated reversion of an inactivated genomic marker (CoS-MAGE)124. The authors showed that the population of cells containing oligo- mediated mutations near the co-selection marker was enriched by this co-selection process. They further applied CoS-MAGE to combinatorially insert the 20-bp T7 promoter in front of 12 genes to optimize the production of indigo. Subsequently, a strain producing four-fold more indigo than the background strain was isolated.

While MAGE was applied for the combinatorial tuning of known genes that affect a biosynthetic pathway, Warner et al. developed another recombineering-based method, called

Trackable multiplex recombineering (TRMR), to enable genome-wide searches for unknown genes that impact upon a selectable phenotype125 (Fig. 1-9). The authors created a gene overexpression library and a gene knockdown library containing unique barcodes for 4,077 genes in E. coli (95 % of all genes). The libraries were subjected to a selection and screened for enriched genes using DNA microarrays to scan for the barcodes. Using TRMR, the authors rapidly mapped all 4,077 genes individually to selectable traits such as growth in various media

33

(rich, minimal and cellulosic hydrolysate) and in the presence of growth inhibitors126. To showcase how TRMR could be employed for combinatorial metabolic engineering, Sandoval et al. used TRMR to identify individual genes that impact upon acetate tolerance, growth at pH 5, and cellulosic hydrolysate tolerance. The authors then used recursive recombineering, similar to

MAGE, to construct combinatorial RBS libraries for the top hits identified from TRMR.

Superior combinations of mutations were subsequently identified127. The optimization of metabolic flux requires more than simple overexpression or knockdown. Thus, Freed et al. incorporated inducible promoters by TRMR (T2RMR) to enable a more fine-grained mapping of different gene expression levels to specific phenotypes128.

Figure 1-9. Trackable multiplex recombineering. Barcoded oligos are used to create a genome-wide overexpression or gene knockdown library. After screening or selecting for cells that display a desired phenotype, the population can be sequenced using the barcodes to quickly identify target genes that may be responsible for the phenotype.

34

Yeast homologous recombination

Compared to E. coli, the yeast S. cerevisiae possesses a highly efficient endogenous homologous recombination (HR) machinery, which has enabled efficient genome manipulations to be performed for decades129. However, when using dsDNA, high efficiencies of recombination (~ 10-3) necessitates either long homology arms (> 500 bp)130 or the generation of a dsDNA break near the region of modification102, coupled with the use of selection markers to select for integration. When using short ssDNA with homology arms ~ 20 – 30 bp, the efficiency drops to ~ 10-6, which can be rescued by using high amounts (~ 1 nmol) of oligonucleotides, selecting for reversion of a marker gene131 or creating a dsDNA break near the site of modification105. Thus, although it is straightforward to perform the targeted mutagenesis of a few genes in the yeast genome, it is challenging to apply libraries of short, synthetic oligos to perform genome-wide mutagenesis.

Nevertheless, Di Carlo et al. developed yeast oligo-mediated genome engineering

(YOGE), for the recombineering of the yeast genome using synthetic oligonucleotides132. After knockout of the mismatch repair system, overexpression of DNA recombinase, and optimization of conditions such as oligonucleotide length and homology, efficiencies of 0.2 – 2 % were achieved across all loci for oligo incorporation without a selectable marker. While an impressive achievement, the efficiency of YOGE is still ~ 10 – 100 fold lower than MAGE121. Thus, the authors propose that a negative selection against unedited genomes using programmable nucleases (see next) can help to eliminate non-mutagenized cells and increase the proportion of mutagenized cells in the population.

35

Programmable sequence-specific DNA-binding proteins

The development of programmable nucleases that can make targeted dsDNA breaks in the genome has opened up new possibilities to perform efficient genome editing in yeast and other organisms133 (Fig. 1-10). Zinc finger nucleases (ZFNs) and Transcription activator-like effector nucleases (TALENs) have been created by fusing a non-specific DNA-cleaving endonuclease to a modular DNA-binding domain (DBD) based on the zinc finger proteins

(ZFPs) or transcription activator-like effectors (TALEs) respectively134. Due to their modular structure, ZFNs or TALENs that target specific DNA sequences can be created by changing the amino acid sequence of the DBD. The creation of a dsDNA break can stimulate homologous repair with a co-transformed piece of mutagenic ssDNA or dsDNA. Zhang et al. developed

TALENs-assisted multiplex editing (TAME) in yeast for genome-scale directed evolution135.

TALENs were designed to recognize the consensus TATA and GC boxes and create a dsDNA break in between, in order to produce insertions, deletions and nucleotide substitutions (indels) via repair by non-homologous end joining (NHEJ). A total of 66 sites regulating the transcription of 98 genes were potentially targeted simultaneously by this approach. The authors then applied

TAME to evolve yeast strains with improved ethanol tolerance and found that the best performing strain contained TALENs-induced indels at 7 sites. The TAME approach is similar to gTEM in that a single regulatory system controlling multiple genes is mutagenized. However, compared to homologous recombination, NHEJ is inefficient in yeast since repair of dsDNA breaks by HR is the dominant pathway136.

An alternative to editing the DNA sequence of the genome is to apply the programmable

DNA-binding proteins as artificial transcription factors by fusion of the DBD to an activator or repressor domain (Fig. 1-10). Park et al. employed a library of zinc finger proteins (ZFPs) to

36

effect genome-wide changes in gene expression137. By randomly shuffling DNA segments to create three-finger proteins, followed by fusing them individually to either a repression or activation domain, the group created a library of artificial transcription factors with varied DNA- binding specificities. This was used to identify ZFPs that increased the tolerance of yeast cells to heat shock, high osmotic pressure and drug resistance.

Figure 1-10. Comparison between the TALENs, ZFNs and CRISPR/Cas9. The DNA-recognition modules that enable sequence-specific targeting in these systems are either protein domains (TALENs), zinc finger protein domains (ZFNs) or mRNA (CRISPR). The DNA-recognition modules can be fused to

37

a nickase/endonuclease, activator or repressor domain to enable the creation of sequence-specific ssDNA/dsDNA breaks, gene activation or repression respectively.

CRISPR/Cas9 RNA-programmable DNA targeting proteins

Recently, the CRISPR/Cas9 system from Streptococcus pyogenes has been harnessed as an RNA-guided DNA-editing tool in various organisms138-141. Compared to the ZFPs and

TALEs, which are bulky and require multiple changes to the DBD to tune its DNA specificity,

Cas9 can be easily targeted to a specific DNA sequence by a complementary guide RNA

(gRNA) (Fig. 1-10). CRISPR/Cas9 has been shown to be highly efficient for the marker-free editing of yeast genomes since Cas9 cutting is cell-lethal, which imposes a negative selection against non-repaired dsDNA breaks or non-mutated DNA142. Ryan et al. developed CRISPRm for the single and multiplex editing of up to three genomic loci in yeast simultaneously by driving gRNA expression via a strong tRNA promoter and fusing a HDV ribozyme to the 5’ end of the gRNA to improve its stability143. Bao et al. developed HI-CRISPR for the multiplex knockout of three genomic loci by using a polycistronic gRNA and repair DNA array, as well as an ultra-high copy number plasmid carrying iCas9 to increase efficiency144. The authors used HI-

CRISPR to knockout three genes in a heterologous hydrocortisone biosynthesis pathway with

100 % efficiency in 6 days. Jakočiūnas et al. demonstrated that CRISPR/Cas9 can be used to knockout different combinations of up to 5 genes, and identified strains producing 41-fold higher titers of mevalonate compared to the wild-type strain145.

In the same vein as the ZF and TALE-based transcription factors, CRISPR/Cas9 can be exploited to target gene transcription. Catalytically inactive Cas9 (dCas9) that retains its DNA binding function but lacks its endonuclease function has been developed as a general platform

38

for RNA-guided DNA targeting (Fig. 1-10). The binding of dCas9 to DNA has been shown to inhibit RNA polymerase or transcription factor binding, or transcriptional elongation, resulting in the effective silencing of genes (CRISPRi)146,147. This system has been shown to be highly specific, reversible and capable of targeting multiple genes at once. In addition, the fusion of repressor or activator domains to dCas9 can result in the more effective repression or activation of transcription of endogenous genes in eukaryotes respectively148,149. Bikard et al. showed that dCas9 can be similarly exploited as an artificial transcriptional repressor in E. coli, and when fused to the omega subunit of RNA polymerase, can serve as a transcriptional activator150.

Farzarhard et al. further showed that the activity of CRISPR-based transcription factors

(CRISPR-TFs) can be modulated in S. cerevisiae by targeting several CRISPR-TFs to different positions in natural promoters, or by creating multiple CRISPR-TF binding sites within synthetic promoters151.

Examples of using CRISPR-TFs for transcriptional re-programming for metabolic engineering are beginning to emerge. Cress et al. constructed a library of T7-lac promoters that can be orthogonally repressed by CRISPRi in E. coli152. The authors incorporated these promoters into the five-gene, branched violacein biosynthesis pathway, and showed that they could use CRISPRi for the selective redirection of flux to produce different compounds. Cleto et al. demonstrated the use of CRISPRi to decrease the expression of the genes pgi, pck and pyk in

Corynebacterium glutamicum and showed that they could achieve increased titers of L-lysine and L-glutamate production comparable to levels achieved via gene deletion153. Already, genome-wide gRNA libraries that target practically all the genes in fly154 and human155 have been constructed. It is anticipated that such gRNA libraries that target all known protein-coding genes, and even non-protein coding regions in various host strains, will be used for combinatorial

39

metabolic engineering to improve the strain background and to discover new genes and regulatory systems that impact production.

1.3.4 Screens or selections for small molecule products

While there have been major advances in methods for bottom-up DNA assembly and top- down genome-wide mutagenesis, it is widely held that the current bottleneck to employing combinatorial approaches in metabolic engineering is the lack of robust and efficient screens and selections for small molecule products156. The current state-of-the-art for small molecule detection is gas or liquid chromatography coupled with mass spectrometry (GC/LC-MS), which is capable of detecting and quantifying multiple compounds in one sample since chromatographic separation is coupled to highly sensitive and specific mass detection157. As such, GC/LC-MS has been harnessed to track global changes in carbon networks in the cell by

13C isotope-based metabolic flux analysis158-160. However, for the purpose of screening a combinatorial strain library to detect a single target molecule, the low-throughput and the requirement for additional sample preparation steps impose severe limitations on library size – assuming sample turnover is on the order of minutes, a modest library of 103 samples will take days to screen. This limitation has led groups to develop creative screens/selections or to be fastidious in their choice of metabolic engineering targets.

Products with screenable or selectable phenotypes

Molecules that are intrinsically colored such as lycopene64, indigo161 or violacein162, fluorescent such as astaxanthin163, or contain reactive groups that can produce a color/fluorescence change in the presence of an enzyme/reactant such as L-tyrosine (which can

40

be converted to melanin by tyrosinase164), are excellent targets to develop screens. This is because differences in colony color or fluorescence can be easily distinguished on agar plates (up to 106 clones) or measured in multi-well plates (~ 104 - 105 samples/day) using a plate reader.

Also, for fluorescent products, high-throughput fluorescence-activated cell sorting (FACS) (up to

109 per day) can be used to enrich a population of cells demonstrating increased fluorescence165.

Even better are target molecules that can confer a growth advantage on the producer cell, including primary metabolites such as prephenate166 and pyruvate167, which will enable higher producing clones to be enriched in a one-pot growth selection. For growth selections, the library size is only limited by culture volume (<1010) and for this reason, selections are indispensable to fully cover large random or combinatorial mutant libraries (>108).

Engineering synthetic protein receptor-transducer fusions

Unfortunately, many metabolic engineering targets of interest, such as fatty acids, short chain alcohols and diols, lack obvious properties that will enable the direct application of high- throughput screening/selection methods168. For such targets, various transduction strategies are needed to couple the detection of the molecule to an output amenable to high-throughput assay formats. Early work was focused on combining protein-based small molecule receptors with a transducer, in order to turn the binding event into a detectable signal. One approach was to incorporate a receptor into a transducer such as a fluorescent protein or enzyme169 (Fig. 1-11).

Some examples include fluorescent protein – receptor fusions that change their fluorescence output upon ligand binding170,171, or enzyme (e.g. DHFR) – receptor fusions that exhibit ligand- dependant activity172-174. These are based on the observation that protein insertions within a domain of another protein can result in a functional fusion protein and may occasionally result in

41

functional coupling between the two175. However, these methods require cumbersome screening of a large pool of circularly permutated fusion proteins for functional variants.

An alternative approach is to incorporate a transducer into a receptor, for instance, labeling the receptor with an environmentally-sensitive fluorophore positioned close to the ligand binding site, which can result in changes in fluorescence output upon ligand binding due to displacement of the fluorophore from the binding site176 (Fig. 1-11). This has been applied to antibodies⁠177-179, cell surface receptors180,⁠ bacterial periplasmic binding proteins181 and enzymes182-184.⁠ However, these methods require the additional step of site-specific protein labeling, screening to optimize fluorophore positioning and may result in variable output depending on the nature of the fluorophore.

Figure 1-11. Engineering protein receptor – transducer fusions. In one approach (top), receptor proteins can be fused to GFP in different positions (circularly permutated). Several positions have to be

42

screened to find a functional variant where the fluorescence output is dependent on the presence of target molecule (red star) binding to the receptor. In another approach (bottom), an environmentally-sensitive fluorophore can be placed close to the receptor binding site. Displacement of the fluorophore by ligand binding can result in a change in fluorescence, thus producing a detectable signal.

Figure 1-12. Engineering RNA aptamers to create aptasensors. A fluorophore and quencher (blue circle) are attached to the ends of a RNA aptamer. In the absence of the ligand (blue star), the fluorophore is far away from the quencher and thus emits fluorescence. When the ligand is present, the aptamer conformation is stabilized, leading to proximity between the quencher and fluorophore, which results in a decrease in fluorescence.

Engineering RNA/DNA-based aptamers

An alternative to protein-based receptors, RNA/DNA-based aptamers that bind to small molecules have also generated much interest for use in bio-sensing. This is because there are well established methods such as SELEX to develop high-affinity RNA/DNA binders.185,186

These aptamers have been incorporated into modular RNA/DNA designs that allow for switching between different secondary structural states in response to ligand binding to the aptamer region, and are known as molecular aptamer beacons or aptasensors187,188 (Fig. 1-12).

The output is usually a change in fluorescence by having the ligand stabilize a structure that either brings together or shifts apart a fluorophore-quencher pair or FRET pair. Molecular

43

beacons have been mostly applied to detect nucleic acids189, which they were originally developed for, although progress is being made to detect other molecules such as proteins190,191 and small molecules192,193. Challenges faced by RNA/DNA-based sensors include biphasic switch behavior in response to varying ligand concentrations (absence of a linear dynamic range), high background in the cell (tend to enter nucleus and fluoresce) and lack of structural stability in different types of conditions194,195.

Engineering natural small molecule regulators

Drawing inspiration from natural sensing mechanisms in the cell, natural regulators such as riboswitches/riboregulators196,197, transcription factors168,198 and nuclear hormone receptors199 have been harnessed to link small molecule detection to translation or transcription of a reporter gene157,195 (Fig. 1-13). These regulators are elegant since they combine the functions of target binding and signal transduction in one molecule and can be genetically encoded. When applied to metabolic engineering, these regulators can be used to couple small molecule production to increased cell growth in selective media in order to enrich for better producers, or to create feedback circuits for product-dependent regulation of biosynthetic genes to improve the efficiency of production200. For example, Srivatsan et al. used MAGE to create RBS, start codon and promoter libraries to tune the expression of genes involved in the production of precursors for naringenin production in E. coli201. They then coupled a naringenin sensor, ttgR, to regulate the expression of TolC, an outer membrane protein that confers resistance to sodium dodecyl- sulfate (SDS), and when absent renders the cell tolerant to colicin E1202. Importantly, this enabled the authors to perform a positive selection using SDS for high naringenin producers, followed by negative selection with colicin E1 to weed out “cheaters” that constitutively express

44

TolC in the absence of induction of naringenin production (toggled selection). After four rounds of mutagenesis (15 MAGE cycles each) and toggled selection, a strain producing 36-fold more naringenin than the parent strain was isolated.

Figure 1-13. Harnessing natural regulators to create small molecule selections. Riboswitches (top) can couple small molecule binding to an RNA aptamer to a change in translational output, for instance by sequestering the ribosome binding site (RBS). Transcription factors (middle row) that bind/unbind DNA 45

in the presence/absence of a small molecule ligand can be used to create ligand-responsive gene circuits.

Nuclear hormone receptors (bottom) function in a similar manner to transcription factors, except that the small molecule ligand promotes the entry of the receptor into the nucleus to affect gene transcription.

These systems can be used to couple small molecule binding to a change in translation or transcription of a selectable marker gene.

However, the repertoire of these small molecule regulators is largely limited by their availability in nature. There have been very few reported cases where natural regulators have been successfully re-designed or evolved to switch their target specificity, and even then, the new target is often similar in structure to the natural target203-207. One possible reason for this is that the binding and signal transduction functions are tightly interwoven in these regulators, which makes it difficult to rationally design new ligand specificities that still preserve their transduction function. The use of directed evolution approaches alone is also limited since this will probably require the in vivo exploration of sequence space that is beyond current capabilities for routine practice. Further advances in the development of small molecule regulators for metabolic engineering can come from using metagenomics to discover new natural regulators from the environment, and strategies that combine rational design, computational modelling and directed evolution to develop artificial regulators for novel molecules208.

1.4 Conclusions and future perspectives

Metabolic engineering for the production of chemicals promises to revolutionize our society, leading to cheaper goods and materials, improved access to pharmaceuticals and greener production technologies. However, we are only just beginning to realize the potential of living cells as chemical factories. This is because cells are complex systems that are challenging to re-

46

engineer, due to our lack of understanding of all the parts and their interactions. While rational

(constructive) approaches can lead to a strain that produces a low level of compound, inverse or directed evolution approaches typically have to be applied to raise production to commercially viable levels. These combinatorial strategies depend on technologies for de novo DNA assembly or top-down mutagenesis to construct large gene, pathway, network or strain libraries, coupled with product screens or selections to identify promising variants. While there has been much progress in the “build” technologies, more resources have to be channeled to solve the “test” bottleneck.

Despite our best engineering efforts, a major limitation of using live microbial cells for biosynthesis is that they are not fully efficient chemical factories. Ultimately, energy and materials are needed for cell growth, maintenance and reproduction209. In addition, there are redundant pathways that serve to enhance the robustness of the system, as well as pathways for feedback regulation that limit the accumulation of particular metabolites. One potential approach to overcome these obstacles is to develop a “chassis organism”, which should contain the requisite metabolic precursors, transport systems and also channel most of its resources towards production of the desired target compound. A related objective is the development of a “minimal cell”, which contains a basic complement of genes and pathways for maintenance and reproduction. By chipping away redundant genes and pathways, biosynthetic circuits can be built that are orthogonal to unrelated cell functions. Technologies for genome minimization by evolution (e.g. SCRaMbLE)107 and de novo genome synthesis94,106 can contribute to future advances.

Another concern of employing cellular factories to perform chemistry is the limited repertoire of chemical reactions that are available in nature72. Here, the field of protein design,

47

enzyme engineering and directed evolution can contribute to the design and evolution of enzymes de novo to perform novel reactions. Bio-prospecting or environmental genome mining can lead to the discovery of new enzymes and pathways, which can be heterologously cloned and tested in the production host. Furthermore, the exploration of non-traditional hosts for production can lead to the development of new chassis organisms for producing different classes of compounds210. This will require portable technologies such as CRISPR/Cas9138 that are broadly applicable to a range of organisms, in order to perform genome engineering.

Finally, advances in our understanding of biology and computational modeling can eventually lead to the development of comprehensive whole cell models55. These models would include all cellular processes and can be used to predict metabolic responses to environmental stimuli or genetic mutations. In this way, the “design-test-build-analyze” cycle can be performed at least initially in silico, and multiple cycles can be run to expand the number of permutations and combinations of mutations that are tested. De novo DNA assembly or genome-wide mutagenesis technologies can then be applied for rapid strain construction and testing. Metabolic engineering would then evolve towards a more rational discipline where fewer designs need to be tested for an increased probability of success.

48

1.5 References

1 Buchholz, K. & Collins, J. The roots--a short history of industrial microbiology and biotechnology. Appl Microbiol Biotechnol 97, 3747-3762, doi:10.1007/s00253-013-4768-2 (2013). 2 Gasson, M. J. Progress and Potential in the Biotechnology of Lactic-Acid Bacteria. Fems Microbiology Reviews 12, 3-20 (1993). 3 Soccol, C. R., Vandenberghe, L. P. S., Rodrigues, C. & Pandey, A. New perspectives for citric acid production and application. Food Technol Biotech 44, 141-149 (2006). 4 Tashiro, Y., Yoshida, T., Noguchi, T. & Sonomoto, K. Recent advances and future prospects for increased butanol production by acetone-butanol-ethanol fermentation. Eng Life Sci 13, 432-445, doi:10.1002/elsc.201200128 (2013). 5 Wang, Z. X., Zhuge, J., Fang, H. & Prior, B. A. Glycerol production by microbial fermentation: a review. Biotechnol Adv 19, 201-223 (2001). 6 Bud, R. Penicillin : triumph and tragedy. (Oxford University Press, 2007). 7 Aminov, R. I. A brief history of the antibiotic era: lessons learned and challenges for the future. Front Microbiol 1, 134, doi:10.3389/fmicb.2010.00134 (2010). 8 Scholar, E. M. & Pratt, W. B. The antimicrobial drugs. 2nd edn, (Oxford University Press, 2000). 9 Wright, P. M., Seiple, I. B. & Myers, A. G. The evolving role of chemical synthesis in antibacterial drug discovery. Angew Chem Int Ed Engl 53, 8840-8869, doi:10.1002/anie.201310843 (2014). 10 Liese, A., Seelbach, K. & Wandrey, C. Industrial biotransformations. 2nd completely rev. and extended edn, (Wiley-VCH, 2006). 11 Raspor, P. & Goranovic, D. Biotechnological applications of acetic acid bacteria. Crit Rev Biotechnol 28, 101-124, doi:10.1080/07388550802046749 (2008). 12 Bhatti, H. N. & Khera, R. A. Biological transformations of steroidal compounds: a review. Steroids 77, 1267-1290, doi:10.1016/j.steroids.2012.07.018 (2012). 13 Kobayashi, M. & Shimizu, S. Metalloenzyme nitrile hydratase: structure, regulation, and application to biotechnology. Nat Biotechnol 16, 733-736, doi:10.1038/nbt0898-733 (1998). 14 Smith, H. O. & Wilcox, K. W. A restriction enzyme from Hemophilus influenzae. I. Purification and general properties. J Mol Biol 51, 379-391 (1970). 15 Jackson, D. A., Symons, R. H. & Berg, P. Biochemical method for inserting new genetic information into DNA of Simian Virus 40: circular SV40 DNA molecules containing lambda

49

phage genes and the galactose operon of Escherichia coli. Proc Natl Acad Sci U S A 69, 2904- 2909 (1972). 16 Cohen, S. N., Chang, A. C., Boyer, H. W. & Helling, R. B. Construction of biologically functional bacterial plasmids in vitro. Proc Natl Acad Sci U S A 70, 3240-3244 (1973). 17 Johnson, I. S. Human insulin from recombinant DNA technology. Science 219, 632-637 (1983). 18 Demain, A. L. Microbial biotechnology. Trends Biotechnol 18, 26-31 (2000). 19 Bailey, J. E. Toward a science of metabolic engineering. Science 252, 1668-1675 (1991). 20 Stephanopoulos, G., Aristidou, A. A. & Nielsen, J. H. Metabolic engineering : principles and methodologies. (Academic Press, 1998). 21 Nielsen, J. Metabolic engineering. Appl Microbiol Biotechnol 55, 263-283 (2001). 22 Stephanopoulos, G. & Vallino, J. J. Network rigidity and metabolic engineering in metabolite overproduction. Science 252, 1675-1681 (1991). 23 Nielsen, J. & Keasling, J. D. Engineering Cellular Metabolism. Cell 164, 1185-1197, doi:10.1016/j.cell.2016.02.004 (2016). 24 Copeland, W. B. et al. Computational tools for metabolic engineering. Metab Eng 14, 270-280 (2012). 25 Keasling, J. D. Synthetic biology and the development of tools for metabolic engineering. Metab Eng 14, 189-195, doi:10.1016/j.ymben.2012.01.004 (2012). 26 Jensen, M. K. & Keasling, J. D. Recent applications of synthetic biology tools for yeast metabolic engineering. FEMS Yeast Res, doi:10.1111/1567-1364.12185 (2014). 27 Petzold, C. J., Chan, L. J., Nhan, M. & Adams, P. D. Analytics for Metabolic Engineering. Front Bioeng Biotechnol 3, 135, doi:10.3389/fbioe.2015.00135 (2015). 28 Bro, C. & Nielsen, J. Impact of 'ome' analyses on inverse metabolic engineering. Metab Eng 6, 204-211, doi:10.1016/j.ymben.2003.11.005 (2004). 29 Antoniewicz, M. R. Methods and advances in metabolic flux analysis: a mini-review. J Ind Microbiol Biotechnol 42, 317-325, doi:10.1007/s10295-015-1585-x (2015). 30 Wiechert, W. 13C metabolic flux analysis. Metab Eng 3, 195-206, doi:10.1006/mben.2001.0187 (2001). 31 Hegab, H. M., Elmekawy, A. & Stakenborg, T. Review of microfluidic microbioreactor technology for high-throughput submerged microbiological cultivation. Biomicrofluidics 7, 21502, doi:10.1063/1.4799966 (2013). 32 Nielsen, J. & Jewett, M. C. Impact of systems biology on metabolic engineering of Saccharomyces cerevisiae. FEMS Yeast Res 8, 122-131, doi:10.1111/j.1567-1364.2007.00302.x (2008).

50

33 Sweetlove, L. J., Last, R. L. & Fernie, A. R. Predictive metabolic engineering: a goal for systems biology. Plant Physiol 132, 420-425, doi:10.1104/pp.103.022004 (2003). 34 Nevoigt, E. Progress in metabolic engineering of Saccharomyces cerevisiae. Microbiol Mol Biol Rev 72, 379-412, doi:10.1128/MMBR.00025-07 (2008). 35 Keseler, I. M. et al. EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res 41, D605-612, doi:10.1093/nar/gks1027 (2013). 36 Zhou, J. & Rudd, K. E. EcoGene 3.0. Nucleic Acids Res 41, D613-624, doi:10.1093/nar/gks1235 (2013). 37 Cherry, J. M. et al. Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res 40, D700-705, doi:10.1093/nar/gkr1029 (2012). 38 Ishii, N. et al. Multiple high-throughput analyses monitor the response of E. coli to perturbations. Science 316, 593-597, doi:10.1126/science.1132067 (2007). 39 Han, M. J. & Lee, S. Y. The Escherichia coli proteome: past, present, and future prospects. Microbiol Mol Biol Rev 70, 362-439, doi:10.1128/MMBR.00036-05 (2006). 40 Hodges, P. E., Payne, W. E. & Garrels, J. I. The Yeast Protein Database (YPD): a curated proteome database for Saccharomyces cerevisiae. Nucleic Acids Res 26, 68-72 (1998). 41 Sajed, T. et al. ECMDB 2.0: A richer resource for understanding the biochemistry of E. coli. Nucleic Acids Res 44, D495-501, doi:10.1093/nar/gkv1060 (2016). 42 Jewison, T. et al. YMDB: the Yeast Metabolome Database. Nucleic Acids Res 40, D815-820, doi:10.1093/nar/gkr916 (2012). 43 Nakamura, C. E. & Whited, G. M. Metabolic engineering for the microbial production of 1,3- propanediol. Curr Opin Biotechnol 14, 454-459 (2003). 44 Burgard, A., Burk, M. J., Osterhout, R., Van Dien, S. & Yim, H. Development of a commercial scale process for production of 1,4-butanediol from sugar. Curr Opin Biotechnol 42, 118-125, doi:10.1016/j.copbio.2016.04.016 (2016). 45 Paddon, C. J. et al. High-level semi-synthetic production of the potent antimalarial artemisinin. Nature 496, 528-532, doi:10.1038/nature12051 (2013). 46 Paddon, C. J. & Keasling, J. D. Semi-synthetic artemisinin: a model for the use of synthetic biology in pharmaceutical development. Nat Rev Microbiol 12, 355-367, doi:10.1038/nrmicro3240 (2014). 47 Boyle, P. M. & Silver, P. A. Parts plus pipes: synthetic biology approaches to metabolic engineering. Metab Eng 14, 223-232, doi:10.1016/j.ymben.2011.10.003 (2012).

51

48 Davids, T., Schmidt, M., Bottcher, D. & Bornscheuer, U. T. Strategies for the discovery and engineering of enzymes for biocatalysis. Curr Opin Chem Biol 17, 215-220, doi:10.1016/j.cbpa.2013.02.022 (2013). 49 Emptage, M., Haynie, S. L., Laffend, L. A., Pucci, J. P. & Whited, G. Process for the biological production of 1,3-propanediol with high titer. USA patent 6,514,733 (2003). 50 Donald, K. A., Hampton, R. Y. & Fritz, I. B. Effects of overproduction of the catalytic domain of 3-hydroxy-3-methylglutaryl coenzyme A reductase on squalene synthesis in Saccharomyces cerevisiae. Appl Environ Microbiol 63, 3341-3344 (1997). 51 Westfall, P. J. et al. Production of amorphadiene in yeast, and its conversion to dihydroartemisinic acid, precursor to the antimalarial agent artemisinin. Proc Natl Acad Sci U S A 109, E111-118, doi:10.1073/pnas.1110740109 (2012). 52 Kell, D. B., Swainston, N., Pir, P. & Oliver, S. G. Membrane transporter engineering in industrial biotechnology and whole cell biocatalysis. Trends Biotechnol 33, 237-246, doi:10.1016/j.tibtech.2015.02.001 (2015). 53 Braun, V. FhuA (TonA), the career of a protein. J Bacteriol 191, 3431-3436, doi:10.1128/JB.00106-09 (2009). 54 Farrugia, G. & Balzan, R. Oxidative stress and programmed cell death in yeast. Front Oncol 2, 64, doi:10.3389/fonc.2012.00064 (2012). 55 Macklin, D. N., Ruggero, N. A. & Covert, M. W. The future of whole-cell modeling. Curr Opin Biotechnol 28, 111-115, doi:10.1016/j.copbio.2014.01.012 (2014). 56 Bailey, J. E. et al. Inverse metabolic engineering: A strategy for directed genetic engineering of useful phenotypes. Biotechnol Bioeng 52, 109-121, doi:10.1002/(SICI)1097- 0290(19961005)52:1<109::AID-BIT11>3.0.CO;2-J (1996). 57 Skretas, G. & Kolisis, F. N. Combinatorial approaches for inverse metabolic engineering applications. Comput Struct Biotechnol J 3, e201210021, doi:10.5936/csbj.201210021 (2012). 58 Drake, J. W., Charlesworth, B., Charlesworth, D. & Crow, J. F. Rates of spontaneous mutation. Genetics 148, 1667-1686 (1998). 59 Fijalkowska, I. J., Schaaper, R. M. & Jonczyk, P. DNA replication fidelity in Escherichia coli: a multi-DNA polymerase affair. FEMS Microbiol Rev 36, 1105-1121, doi:10.1111/j.1574- 6976.2012.00338.x (2012). 60 Yomano, L. P., York, S. W. & Ingram, L. O. Isolation and characterization of ethanol-tolerant mutants of Escherichia coli KO11 for fuel ethanol production. J Ind Microbiol Biotechnol 20, 132-138 (1998).

52

61 Scalcinati, G. et al. Evolutionary engineering of Saccharomyces cerevisiae for efficient aerobic xylose consumption. FEMS Yeast Res 12, 582-597, doi:10.1111/j.1567-1364.2012.00808.x (2012). 62 Smith, K. M. & Liao, J. C. An evolutionary strategy for isobutanol production strain development in Escherichia coli. Metab Eng 13, 674-681, doi:10.1016/j.ymben.2011.08.004 (2011). 63 Badarinarayana, V. et al. Selection analyses of insertional mutants using subgenic-resolution arrays. Nat Biotechnol 19, 1060-1065, doi:10.1038/nbt1101-1060 (2001). 64 Alper, H., Miyaoku, K. & Stephanopoulos, G. Construction of lycopene-overproducing E. coli strains by combining systematic and combinatorial gene knockout targets. Nat Biotechnol 23, 612-616, doi:10.1038/nbt1083 (2005). 65 Santos, C. N. & Stephanopoulos, G. Combinatorial engineering of microbes for optimizing cellular phenotype. Curr Opin Chem Biol 12, 168-176, doi:10.1016/j.cbpa.2008.01.017 (2008). 66 Choi, K. R., Shin, J. H., Cho, J. S., Yang, D. & Lee, S. Y. Systems Metabolic Engineering of Escherichia coli. EcoSal Plus 7, doi:10.1128/ecosalplus.ESP-0010-2015 (2016). 67 Chartrain, M., Salmon, P. M., Robinson, D. K. & Buckland, B. C. Metabolic engineering and directed evolution for the production of pharmaceuticals. Curr Opin Biotechnol 11, 209-214 (2000). 68 Sun, H., Liu, Z., Zhao, H. & Ang, E. L. Recent advances in combinatorial biosynthesis for drug discovery. Drug Des Devel Ther 9, 823-833, doi:10.2147/DDDT.S63023 (2015). 69 Nair, N. U. & Zhao, H. Evolution in reverse: engineering a D-xylose-specific xylose reductase. Chembiochem 9, 1213-1215, doi:10.1002/cbic.200700765 (2008). 70 Pirakitikulr, N., Ostrov, N., Peralta-Yahya, P. & Cornish, V. W. PCRless library mutagenesis via oligonucleotide recombination in yeast. Protein Sci 19, 2336-2346, doi:10.1002/pro.513 (2010). 71 Arnold, F. H. The nature of chemical innovation: new enzymes by evolution. Q Rev Biophys 48, 404-410, doi:10.1017/S003358351500013X (2015). 72 Renata, H., Wang, Z. J. & Arnold, F. H. Expanding the enzyme universe: accessing non-natural reactions by mechanism-guided directed evolution. Angew Chem Int Ed Engl 54, 3351-3367, doi:10.1002/anie.201409470 (2015). 73 Knietsch, A., Bowien, S., Whited, G., Gottschalk, G. & Daniel, R. Identification and characterization of coenzyme B12-dependent glycerol dehydratase- and diol dehydratase- encoding genes from metagenomic DNA libraries derived from enrichment cultures. Appl Environ Microbiol 69, 3048-3060 (2003). 74 Stemmer, W. P. DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc Natl Acad Sci U S A 91, 10747-10751 (1994).

53

75 Zhao, H. & Arnold, F. H. Optimization of DNA shuffling for high fidelity recombination. Nucleic Acids Res 25, 1307-1308 (1997). 76 Barriault, D., Plante, M. M. & Sylvestre, M. Family shuffling of a targeted bphA region to engineer biphenyl dioxygenase. J Bacteriol 184, 3794-3800 (2002). 77 Coussement, P., Maertens, J., Beauprez, J., Van Bellegem, W. & De Mey, M. One step DNA assembly for combinatorial metabolic engineering. Metab Eng 23, 70-77, doi:10.1016/j.ymben.2014.02.012 (2014). 78 Alper, H., Fischer, C., Nevoigt, E. & Stephanopoulos, G. Tuning genetic control through promoter engineering. Proc Natl Acad Sci U S A 102, 12678-12683, doi:10.1073/pnas.0504604102 (2005). 79 Siddiqui, M. S., Thodey, K., Trenchard, I. & Smolke, C. D. Advancing secondary metabolite biosynthesis in yeast with synthetic biology tools. FEMS Yeast Res 12, 144-170, doi:10.1111/j.1567-1364.2011.00774.x (2012). 80 Babiskin, A. H. & Smolke, C. D. A synthetic library of RNA control modules for predictable tuning of gene expression in yeast. Mol Syst Biol 7, 471, doi:10.1038/msb.2011.4 (2011). 81 Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2, 2006 0008, doi:10.1038/msb4100050 (2006). 82 Chao, R., Yuan, Y. & Zhao, H. Recent advances in DNA assembly technologies. FEMS Yeast Res, doi:10.1111/1567-1364.12171 (2014). 83 Juhas, M. & Ajioka, J. W. High molecular weight DNA assembly in vivo for synthetic biology applications. Crit Rev Biotechnol, 1-10, doi:10.3109/07388551.2016.1141394 (2016). 84 Esvelt, K. M. & Wang, H. H. Genome-scale engineering for systems and synthetic biology. Mol Syst Biol 9, 641, doi:10.1038/msb.2012.66 (2013). 85 Smolke, C. D. Building outside of the box: iGEM and the BioBricks Foundation. Nat Biotechnol 27, 1099-1102, doi:10.1038/nbt1209-1099 (2009). 86 Engler, C., Kandzia, R. & Marillonnet, S. A one pot, one step, precision cloning method with high throughput capability. PLoS One 3, e3647, doi:10.1371/journal.pone.0003647 (2008). 87 Guo, Y. et al. YeastFab: the design and construction of standard biological parts for metabolic engineering in Saccharomyces cerevisiae. Nucleic Acids Res 43, e88, doi:10.1093/nar/gkv464 (2015). 88 Latimer, L. N. et al. Employing a combinatorial expression approach to characterize xylose utilization in Saccharomyces cerevisiae. Metab Eng 25, 20-29, doi:10.1016/j.ymben.2014.06.002 (2014).

54

89 Jeschek, M., Gerngross, D. & Panke, S. Rationally reduced libraries for combinatorial pathway optimization minimizing experimental effort. Nat Commun 7, 11163, doi:10.1038/ncomms11163 (2016). 90 Li, M. Z. & Elledge, S. J. Harnessing homologous recombination in vitro to generate recombinant DNA via SLIC. Nat Methods 4, 251-256, doi:10.1038/nmeth1010 (2007). 91 Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6, 343-345, doi:10.1038/nmeth.1318 (2009). 92 Layton, D. S. & Trinh, C. T. Engineering modular ester fermentative pathways in Escherichia coli. Metab Eng 26, 77-88, doi:10.1016/j.ymben.2014.09.006 (2014). 93 Gibson, D. G. et al. One-step assembly in yeast of 25 overlapping DNA fragments to form a complete synthetic Mycoplasma genitalium genome. Proc Natl Acad Sci U S A 105, 20404- 20409, doi:10.1073/pnas.0811011106 (2008). 94 Gibson, D. G. et al. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science 319, 1215-1220, doi:10.1126/science.1151721 (2008). 95 Kouprina, N. & Larionov, V. Transformation-associated recombination (TAR) cloning for genomics studies and synthetic biology. Chromosoma, doi:10.1007/s00412-016-0588-3 (2016). 96 Shao, Z., Zhao, H. & Zhao, H. DNA assembler, an in vivo genetic method for rapid construction of biochemical pathways. Nucleic Acids Res 37, e16, doi:10.1093/nar/gkn991 (2009). 97 Shao, Z., Luo, Y. & Zhao, H. Rapid characterization and engineering of natural product biosynthetic pathways via DNA assembler. Mol Biosyst 7, 1056-1059, doi:10.1039/c0mb00338g (2011). 98 Mitchell, L. A. et al. Versatile genetic assembly system (VEGAS) to assemble pathways for expression in S. cerevisiae. Nucleic Acids Res 43, 6620-6630, doi:10.1093/nar/gkv466 (2015). 99 Gu, P. et al. A rapid and reliable strategy for chromosomal integration of gene(s) with multiple copies. Sci Rep 5, 9684, doi:10.1038/srep09684 (2015). 100 Itaya, M., Tsuge, K., Koizumi, M. & Fujita, K. Combining two genomes in one cell: stable cloning of the Synechocystis PCC6803 genome in the Bacillus subtilis 168 genome. Proc Natl Acad Sci U S A 102, 15971-15976, doi:10.1073/pnas.0503868102 (2005). 101 Itaya, M., Fujita, K., Kuroki, A. & Tsuge, K. Bottom-up genome assembly using the Bacillus subtilis genome vector. Nat Methods 5, 41-43, doi:10.1038/nmeth1143 (2008). 102 Wingler, L. M. & Cornish, V. W. Reiterative Recombination for the in vivo assembly of libraries of multigene pathways. Proc Natl Acad Sci U S A 108, 15135-15140, doi:10.1073/pnas.1100507108 (2011).

55

103 Ostrov, N., Wingler, L. M. & Cornish, V. W. Gene assembly and combinatorial libraries in S. cerevisiae via reiterative recombination. Methods Mol Biol 978, 187-203, doi:10.1007/978-1- 62703-293-3_14 (2013). 104 Lartigue, C. et al. Creating bacterial strains from genomes that have been cloned and engineered in yeast. Science 325, 1693-1696, doi:10.1126/science.1173759 (2009). 105 Storici, F., Durham, C. L., Gordenin, D. A. & Resnick, M. A. Chromosomal site-specific double- strand breaks are efficiently targeted for repair by oligonucleotides in yeast. Proc Natl Acad Sci U S A 100, 14994-14999, doi:10.1073/pnas.2036296100 (2003). 106 Dymond, J. S. et al. Synthetic chromosome arms function in yeast and generate phenotypic diversity by design. Nature 477, 471-476, doi:10.1038/nature10403 (2011). 107 Dymond, J. & Boeke, J. The Saccharomyces cerevisiae SCRaMbLE system and genome minimization. Bioeng Bugs 3, 168-171, doi:10.4161/bbug.19543 (2012). 108 Shen, Y. et al. SCRaMbLE generates designed combinatorial stochastic diversity in synthetic chromosomes. Genome Res 26, 36-49, doi:10.1101/gr.193433.115 (2016). 109 Annaluru, N. et al. Total synthesis of a functional designer eukaryotic chromosome. Science 344, 55-58, doi:10.1126/science.1249252 (2014). 110 Alper, H. & Stephanopoulos, G. Global transcription machinery engineering: a new approach for improving cellular phenotype. Metab Eng 9, 258-267, doi:10.1016/j.ymben.2006.12.002 (2007). 111 Burgess, R. R. & Anthony, L. How sigma docks to RNA polymerase and what sigma does. Curr Opin Microbiol 4, 126-131 (2001). 112 Pines, G., Freed, E. F., Winkler, J. D. & Gill, R. T. Bacterial Recombineering: Genome Engineering via Phage-Based Homologous Recombination. ACS Synth Biol 4, 1176-1185, doi:10.1021/acssynbio.5b00009 (2015). 113 Sharan, S. K., Thomason, L. C., Kuznetsov, S. G. & Court, D. L. Recombineering: a homologous recombination-based method of genetic engineering. Nat Protoc 4, 206-223, doi:10.1038/nprot.2008.227 (2009). 114 Zhang, Y., Buchholz, F., Muyrers, J. P. & Stewart, A. F. A new logic for DNA engineering using recombination in Escherichia coli. Nat Genet 20, 123-128, doi:10.1038/2417 (1998). 115 Murphy, K. C. Use of bacteriophage lambda recombination functions to promote gene replacement in Escherichia coli. J Bacteriol 180, 2063-2071 (1998). 116 Zhang, Y., Muyrers, J. P., Testa, G. & Stewart, A. F. DNA cloning by homologous recombination in Escherichia coli. Nat Biotechnol 18, 1314-1317, doi:10.1038/82449 (2000).

56

117 Datsenko, K. A. & Wanner, B. L. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci U S A 97, 6640-6645, doi:10.1073/pnas.120163297 (2000). 118 Ellis, H. M., Yu, D., DiTizio, T. & Court, D. L. High efficiency mutagenesis, repair, and engineering of chromosomal DNA using single-stranded oligonucleotides. Proc Natl Acad Sci U S A 98, 6742-6746, doi:10.1073/pnas.121164898 (2001). 119 Costantino, N. & Court, D. L. Enhanced levels of lambda Red-mediated recombinants in mismatch repair mutants. Proc Natl Acad Sci U S A 100, 15748-15753, doi:10.1073/pnas.2434959100 (2003). 120 Cleary, M. A. et al. Production of complex nucleic acid libraries using highly parallel in situ oligonucleotide synthesis. Nat Methods 1, 241-248, doi:10.1038/nmeth724 (2004). 121 Wang, H. H. et al. Programming cells by multiplex genome engineering and accelerated evolution. Nature 460, 894-898, doi:10.1038/nature08187 (2009). 122 Isaacs, F. J. et al. Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science 333, 348-353, doi:10.1126/science.1205822 (2011). 123 Ma, N. J., Moonan, D. W. & Isaacs, F. J. Precise manipulation of bacterial chromosomes by conjugative assembly genome engineering. Nat Protoc 9, 2285-2300, doi:10.1038/nprot.2014.081 (2014). 124 Wang, H. H. et al. Genome-scale promoter engineering by coselection MAGE. Nat Methods 9, 591-593, doi:10.1038/nmeth.1971 (2012). 125 Mansell, T. J., Warner, J. R. & Gill, R. T. Trackable multiplex recombineering for gene-trait mapping in E. coli. Methods Mol Biol 985, 223-246, doi:10.1007/978-1-62703-299-5_12 (2013). 126 Warner, J. R., Reeder, P. J., Karimpour-Fard, A., Woodruff, L. B. & Gill, R. T. Rapid profiling of a microbial genome using mixtures of barcoded oligonucleotides. Nat Biotechnol 28, 856-862, doi:10.1038/nbt.1653 (2010). 127 Sandoval, N. R. et al. Strategy for directing combinatorial genome engineering in Escherichia coli. Proc Natl Acad Sci U S A 109, 10540-10545, doi:10.1073/pnas.1206299109 (2012). 128 Freed, E. F. et al. Genome-Wide Tuning of Protein Expression Levels to Rapidly Engineer Microbial Traits. ACS Synth Biol 4, 1244-1253, doi:10.1021/acssynbio.5b00133 (2015). 129 Scherer, S. & Davis, R. W. Replacement of chromosome segments with altered DNA sequences constructed in vitro. Proc Natl Acad Sci U S A 76, 4951-4955 (1979). 130 Wach, A., Brachat, A., Pohlmann, R. & Philippsen, P. New heterologous modules for classical or PCR-based gene disruptions in Saccharomyces cerevisiae. Yeast 10, 1793-1808 (1994).

57

131 Moerschell, R. P., Tsunasawa, S. & Sherman, F. Transformation of yeast with synthetic oligonucleotides. Proc Natl Acad Sci U S A 85, 524-528 (1988). 132 DiCarlo, J. E. et al. Yeast oligo-mediated genome engineering (YOGE). ACS Synth Biol 2, 741- 749, doi:10.1021/sb400117c (2013). 133 David, F. & Siewers, V. Advances in yeast genome engineering. FEMS Yeast Res, doi:10.1111/1567-1364.12200 (2014). 134 Gaj, T., Gersbach, C. A. & Barbas, C. F., 3rd. ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends Biotechnol 31, 397-405, doi:10.1016/j.tibtech.2013.04.004 (2013). 135 Zhang, G. et al. TALENs-Assisted Multiplex Editing for Accelerated Genome Evolution To Improve Yeast Phenotypes. ACS Synth Biol 4, 1101-1111, doi:10.1021/acssynbio.5b00074 (2015). 136 Paques, F. & Haber, J. E. Multiple pathways of recombination induced by double-strand breaks in Saccharomyces cerevisiae. Microbiol Mol Biol Rev 63, 349-404 (1999). 137 Park, K. S. et al. Phenotypic alteration of eukaryotic cells using randomized libraries of artificial transcription factors. Nat Biotechnol 21, 1208-1214, doi:10.1038/nbt868 (2003). 138 Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821, doi:10.1126/science.1225829 (2012). 139 Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826, doi:10.1126/science.1232033 (2013). 140 Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823, doi:10.1126/science.1231143 (2013). 141 Doudna, J. A. & Charpentier, E. Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096, doi:10.1126/science.1258096 (2014). 142 DiCarlo, J. E. et al. Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Res 41, 4336-4343, doi:10.1093/nar/gkt135 (2013). 143 Ryan, O. W. & Cate, J. H. Multiplex engineering of industrial yeast genomes using CRISPRm. Methods Enzymol 546, 473-489, doi:10.1016/B978-0-12-801185-0.00023-4 (2014). 144 Bao, Z. et al. Homology-integrated CRISPR-Cas (HI-CRISPR) system for one-step multigene disruption in Saccharomyces cerevisiae. ACS Synth Biol 4, 585-594, doi:10.1021/sb500255k (2015). 145 Jakociunas, T. et al. Multiplex metabolic pathway engineering using CRISPR/Cas9 in Saccharomyces cerevisiae. Metab Eng 28, 213-222, doi:10.1016/j.ymben.2015.01.008 (2015).

58

146 Qi, L. S. et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173-1183, doi:10.1016/j.cell.2013.02.022 (2013). 147 Larson, M. H. et al. CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nat Protoc 8, 2180-2196, doi:10.1038/nprot.2013.132 (2013). 148 Gilbert, L. A. et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442-451, doi:10.1016/j.cell.2013.06.044 (2013). 149 Perez-Pinera, P. et al. RNA-guided gene activation by CRISPR-Cas9-based transcription factors. Nat Methods 10, 973-976, doi:10.1038/nmeth.2600 (2013). 150 Bikard, D. et al. Programmable repression and activation of bacterial gene expression using an engineered CRISPR-Cas system. Nucleic Acids Res 41, 7429-7437, doi:10.1093/nar/gkt520 (2013). 151 Farzadfard, F., Perli, S. D. & Lu, T. K. Tunable and multifunctional eukaryotic transcription factors based on CRISPR/Cas. ACS Synth Biol 2, 604-613, doi:10.1021/sb400081r (2013). 152 Cress, B. F. et al. Rapid generation of CRISPR/dCas9-regulated, orthogonally repressible hybrid T7-lac promoters for modular, tuneable control of metabolic pathway fluxes in Escherichia coli. Nucleic Acids Res 44, 4472-4485, doi:10.1093/nar/gkw231 (2016). 153 Cleto, S., Jensen, J. V., Wendisch, V. F. & Lu, T. K. Corynebacterium glutamicum Metabolic Engineering with CRISPR Interference (CRISPRi). ACS Synth Biol 5, 375-385, doi:10.1021/acssynbio.5b00216 (2016). 154 Bassett, A. R., Kong, L. & Liu, J. L. A genome-wide CRISPR library for high-throughput genetic screening in Drosophila cells. J Genet Genomics 42, 301-309, doi:10.1016/j.jgg.2015.03.011 (2015). 155 Gilbert, L. A. et al. Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell 159, 647-661, doi:10.1016/j.cell.2014.09.029 (2014). 156 Lynch, S. A. & Gill, R. T. Synthetic biology: new strategies for directing design. Metab Eng 14, 205-211, doi:10.1016/j.ymben.2011.12.007 (2012). 157 Dietrich, J. A., McKee, A. E. & Keasling, J. D. High-throughput metabolic engineering: advances in small-molecule screening and selection. Annu Rev Biochem 79, 563-590, doi:10.1146/annurev- biochem-062608-095938 (2010). 158 Fischer, E. & Sauer, U. Metabolic flux profiling of Escherichia coli mutants in central carbon metabolism using GC-MS. European journal of biochemistry / FEBS 270, 880-891 (2003). 159 Fischer, E., Zamboni, N. & Sauer, U. High-throughput metabolic flux analysis based on gas chromatography-mass spectrometry derived 13C constraints. Analytical biochemistry 325, 308- 316 (2004).

59

160 Zamboni, N., Fendt, S. M., Ruhl, M. & Sauer, U. (13)C-based metabolic flux analysis. Nature protocols 4, 878-892, doi:10.1038/nprot.2009.58 (2009). 161 Pathak, H. & Madamwar, D. Biosynthesis of indigo dye by newly isolated naphthalene-degrading strain Pseudomonas sp. HOB1 and its application in dyeing cotton fabric. Appl Biochem Biotechnol 160, 1616-1626, doi:10.1007/s12010-009-8638-4 (2010). 162 Choi, S. Y., Yoon, K. H., Lee, J. I. & Mitchell, R. J. Violacein: Properties and Production of a Versatile Bacterial Pigment. Biomed Res Int 2015, 465056, doi:10.1155/2015/465056 (2015). 163 Ukibe, K., Katsuragi, T., Tani, Y. & Takagi, H. Efficient screening for astaxanthin-overproducing mutants of the yeast Xanthophyllomyces dendrorhous by flow cytometry. FEMS microbiology letters 286, 241-248, doi:10.1111/j.1574-6968.2008.01278.x (2008). 164 Santos, C. N. & Stephanopoulos, G. Melanin-based high-throughput screen for L-tyrosine production in Escherichia coli. Applied and environmental microbiology 74, 1190-1197, doi:10.1128/AEM.02448-07 (2008). 165 Tracy, B. P., Gaida, S. M. & Papoutsakis, E. T. Flow cytometry for bacteria: enabling metabolic engineering, synthetic biology and the elucidation of complex phenotypes. Current opinion in biotechnology 21, 85-99, doi:10.1016/j.copbio.2010.02.006 (2010). 166 Gamper, M., Hilvert, D. & Kast, P. Probing the role of the C-terminus of Bacillus subtilis chorismate mutase by a novel random protein-termination strategy. Biochemistry 39, 14087- 14094 (2000). 167 Griffiths, J. S. et al. A bacterial selection for the directed evolution of pyruvate aldolases. Bioorganic & medicinal chemistry 12, 4067-4074, doi:10.1016/j.bmc.2004.05.034 (2004). 168 Dietrich, J. A., Shis, D. L., Alikhani, A. & Keasling, J. D. Transcription factor-based screens and synthetic selections for microbial small-molecule biosynthesis. ACS Synth Biol 2, 47-58, doi:10.1021/sb300091d (2013). 169 Hellinga, H. W. & Marvin, J. S. Protein engineering and the development of generic biosensors. Trends in biotechnology 16, 183-189 (1998). 170 Doi, N. & Yanagawa, H. Design of generic biosensors based on green fluorescent proteins with allosteric sites by directed evolution. FEBS letters 453, 305-307 (1999). 171 Baird, G. S., Zacharias, D. A. & Tsien, R. Y. Circular permutation and receptor insertion within green fluorescent proteins. Proceedings of the National Academy of Sciences of the United States of America 96, 11241-11246 (1999). 172 Tucker, C. L. & Fields, S. A yeast sensor of ligand binding. Nature biotechnology 19, 1042-1046, doi:10.1038/nbt1101-1042 (2001).

60

173 Louvion, J. F., Havaux-Copf, B. & Picard, D. Fusion of GAL4-VP16 to a steroid-binding domain provides a tool for gratuitous induction of galactose-responsive genes in yeast. Gene 131, 129- 134 (1993). 174 Israel, D. I. & Kaufman, R. J. Dexamethasone negatively regulates the activity of a chimeric dihydrofolate reductase/glucocorticoid receptor protein. Proceedings of the National Academy of Sciences of the United States of America 90, 4290-4294 (1993). 175 Collinet, B. et al. Functionally accepted insertions of proteins within protein domains. The Journal of biological chemistry 275, 17428-17433, doi:10.1074/jbc.M000666200 (2000). 176 Loving, G. S., Sainlos, M. & Imperiali, B. Monitoring protein interactions and dynamics with solvatochromic fluorophores. Trends in biotechnology 28, 73-83, doi:10.1016/j.tibtech.2009.11.002 (2010). 177 Renard, M. et al. Knowledge-based design of reagentless fluorescent biosensors from recombinant antibodies. Journal of molecular biology 318, 429-442, doi:10.1016/S0022- 2836(02)00023-2 (2002). 178 Rabbany, S. Y., Donner, B. L. & Ligler, F. S. Optical immunosensors. Critical reviews in biomedical engineering 22, 307-346 (1994). 179 Pollack, S. J., Nakayama, G. R. & Schultz, P. G. Introduction of nucleophiles and spectroscopic probes into antibody combining sites. Science 242, 1038-1040 (1988). 180 Masharina, A., Reymond, L., Maurel, D., Umezawa, K. & Johnsson, K. A fluorescent sensor for GABA and synthetic GABA(B) receptor ligands. Journal of the American Chemical Society 134, 19026-19034, doi:10.1021/ja306320s (2012). 181 Gilardi, G., Zhou, L. Q., Hibbert, L. & Cass, A. E. Engineering the maltose binding protein for reagentless fluorescence sensing. Analytical chemistry 66, 3840-3847 (1994). 182 Boyd, A. E., Marnett, A. B., Wong, L. & Taylor, P. Probing the active center gorge of acetylcholinesterase by fluorophores linked to substituted cysteines. The Journal of biological chemistry 275, 22401-22408, doi:10.1074/jbc.M000606200 (2000). 183 Tsai, Y. C., Jin, Z. & Johnson, K. A. Site-specific labeling of T7 DNA polymerase with a conformationally sensitive fluorophore and its use in detecting single-nucleotide polymorphisms. Analytical biochemistry 384, 136-144, doi:10.1016/j.ab.2008.09.006 (2009). 184 Chan, P. H. et al. Rational design of a novel fluorescent biosensor for beta-lactam antibiotics from a class A beta-lactamase. Journal of the American Chemical Society 126, 4074-4075, doi:10.1021/ja038409m (2004). 185 Ellington, A. D. & Szostak, J. W. In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818-822, doi:10.1038/346818a0 (1990).

61

186 Tuerk, C. & Gold, L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505-510 (1990). 187 Nutiu, R. & Li, Y. Structure-switching signaling aptamers: transducing molecular recognition into fluorescence signaling. Chemistry 10, 1868-1876, doi:10.1002/chem.200305470 (2004). 188 Kim, Y. S. & Gu, M. B. Advances in aptamer screening and small molecule aptasensors. Advances in biochemical engineering/biotechnology 140, 29-67, doi:10.1007/10_2013_225 (2014). 189 Tombelli, S., Minunni, M. & Mascini, M. Analytical applications of aptamers. Biosensors & bioelectronics 20, 2424-2434, doi:10.1016/j.bios.2004.11.006 (2005). 190 Yamamoto, R., Baba, T. & Kumar, P. K. Molecular beacon aptamer fluoresces in the presence of Tat protein of HIV-1. Genes to cells : devoted to molecular & cellular mechanisms 5, 389-396 (2000). 191 Li, J. J., Fang, X. & Tan, W. Molecular aptamer beacons for real-time protein recognition. Biochemical and biophysical research communications 292, 31-40 (2002). 192 Bayer, T. S. & Smolke, C. D. Programmable ligand-controlled riboregulators of eukaryotic gene expression. Nature biotechnology 23, 337-343, doi:10.1038/nbt1069 (2005). 193 Trevino, S. G. & Levy, M. High-Throughput Bead-Based Identification of Structure-Switching Aptamer Beacons. Chembiochem : a European journal of chemical biology, doi:10.1002/cbic.201402037 (2014). 194 Wang, K. et al. Molecular engineering of DNA: molecular beacons. Angewandte Chemie 48, 856- 870, doi:10.1002/anie.200800370 (2009). 195 Michener, J. K., Thodey, K., Liang, J. C. & Smolke, C. D. Applications of genetically-encoded biosensors for the construction and control of biosynthetic pathways. Metabolic engineering 14, 212-222, doi:10.1016/j.ymben.2011.09.004 (2012). 196 Mandal, M. & Breaker, R. R. Gene regulation by riboswitches. Nature reviews. Molecular cell biology 5, 451-463, doi:10.1038/nrm1403 (2004). 197 Montange, R. K. & Batey, R. T. Riboswitches: emerging themes in RNA structure and function. Annual review of biophysics 37, 117-133, doi:10.1146/annurev.biophys.37.032807.130000 (2008). 198 van Sint Fiet, S., van Beilen, J. B. & Witholt, B. Selection of biocatalysts for chemical synthesis. Proceedings of the National Academy of Sciences of the United States of America 103, 1693- 1698, doi:10.1073/pnas.0504733102 (2006). 199 Evans, R. M. & Mangelsdorf, D. J. Nuclear Receptors, RXR, and the Big Bang. Cell 157, 255- 266, doi:10.1016/j.cell.2014.03.012 (2014).

62

200 Rogers, J. K. et al. Synthetic biosensors for precise gene control and real-time monitoring of metabolites. Nucleic Acids Res 43, 7648-7660, doi:10.1093/nar/gkv616 (2015). 201 Raman, S., Rogers, J. K., Taylor, N. D. & Church, G. M. Evolution-guided optimization of biosynthetic pathways. Proc Natl Acad Sci U S A 111, 17803-17808, doi:10.1073/pnas.1409523111 (2014). 202 DeVito, J. A. Recombineering with tolC as a selectable/counter-selectable marker: remodeling the rRNA operons of Escherichia coli. Nucleic Acids Res 36, e4, doi:10.1093/nar/gkm1084 (2008). 203 Collins, C. H., Leadbetter, J. R. & Arnold, F. H. Dual selection enhances the signaling specificity of a variant of the quorum-sensing transcriptional activator LuxR. Nature biotechnology 24, 708- 712, doi:10.1038/nbt1209 (2006). 204 Mohn, W. W., Garmendia, J., Galvao, T. C. & de Lorenzo, V. Surveying biotransformations with a la carte genetic traps: translating dehydrochlorination of lindane (gamma- hexachlorocyclohexane) into lacZ-based phenotypes. Environmental microbiology 8, 546-555, doi:10.1111/j.1462-2920.2006.00983.x (2006). 205 Tang, S. Y. & Cirino, P. C. Design and application of a mevalonate-responsive regulatory protein. Angew Chem Int Ed Engl 50, 1084-1086, doi:10.1002/anie.201006083 (2011). 206 Chockalingam, K., Chen, Z., Katzenellenbogen, J. A. & Zhao, H. Directed evolution of specific receptor-ligand pairs for use in the creation of gene switches. Proc Natl Acad Sci U S A 102, 5691-5696, doi:10.1073/pnas.0409206102 (2005). 207 Peet, D. J., Doyle, D. F., Corey, D. R. & Mangelsdorf, D. J. Engineering novel specificities for ligand-activated transcription in the nuclear hormone receptor RXR. Chem Biol 5, 13-21 (1998). 208 Taylor, N. D. et al. Engineering an allosteric transcription factor to respond to new ligands. Nat Methods 13, 177-183, doi:10.1038/nmeth.3696 (2016). 209 Wu, S. G., He, L., Wang, Q. & Tang, Y. J. An ancient Chinese wisdom for metabolic engineering: Yin-Yang. Microb Cell Fact 14, 39, doi:10.1186/s12934-015-0219-3 (2015). 210 Woolston, B. M., Edgar, S. & Stephanopoulos, G. Metabolic engineering: past and future. Annu Rev Chem Biomol Eng 4, 259-288, doi:10.1146/annurev-chembioeng-061312-103312 (2013).

63

Chapter 2

De novo Assembly and Promoter Engineering of the

~ 100 kb Type I Meridamycin Polyketide Biosynthetic Pathway

──────────────────────────────────────────────────────────────────────────────

The contents of this chapter will be published in:

Yao Zong Ng, Pedro Baldera-Aguayo, Mingjian Gao, Dean Deng, Estefania Chavez, Jeffrey

Janso, Frank Koehn, Alessandra Eustaquio, Virginia W. Cornish. “De novo assembly and promoter engineering of the ~ 100 kb type I meridamycin polyketide biosynthetic pathway using yeast synthetic biology”, in preparation.

Chapter contributions: Y.Z.N. and A.E. designed the experiments, Y.Z.N., M.G., D.D. and

E.C. performed pathway assembly, Y.Z.N. developed the FLIP system, Y.Z.N., M.G. and P.B.A. constructed the promoter libraries and performed FLIP. A.E. and J.J. performed conjugation,

P.B.A. will perform polyketide production and screening.

64

2.0 Chapter outlook

Building and testing combinatorial libraries of biosynthetic pathways is an efficient approach to optimize compound production and to produce novel analogs. However, it remains technically challenging to assemble and engineer long (10s – 100s kb) and complex multi-gene pathways, especially in host cells with a limited toolkit for genetic engineering. One strategy is to construct the pathway/library in a genetically tractable host such as yeast, followed by shuttling of the constructs into the heterologous production host. In this study, we report the de novo assembly of a ~ 100 kb Type I modular polyketide synthase (PKS) pathway for meridamycin in the yeast chromosome using Reiterative Recombination. This is one of the longest and most complex (highly repetitive and GC-rich) biosynthetic pathways for a small molecule natural product. The pathway was assembled in four fragments in parallel to create four yeast strains, each containing one fragment. Next, CRISPR/Cas9 was used to build a promoter library to drive the expression of each fragment separately, in order to optimize the production of meridamycin – a compound with promising neuroprotective properties. Finally, we developed

FLIP for the efficient recovery of the fragments onto shuttle plasmids for their transfer by conjugation into the heterologous host Streptomyces lividans K4-114. We plan to screen the library using a high-throughput fluorescence polarization assay (see Chapter 3) to identify strains producing higher titres of meridamycin. In future, our approach can potentially be harnessed for combinatorial biosynthesis for the production of novel analogs.

65

2.1 Introduction

Synthetic biology offers a new paradigm for the bottom-up engineering of cells to perform user-defined functions, such as bio-sensing or metabolic engineering for the production of chemicals and feedstocks1. A key bottleneck is the introduction of multiple non-native genes, pathways or even whole genomes to create customized host cells2. De novo gene synthesis can provide the ultimate flexibility to build codon-optimized pathways or a library of variants to optimize a complex function. However, although the cost of DNA synthesis has dropped significantly, the lengths of double-stranded (ds)DNA blocks that are commercially available are still quite short (~ 2 – 5 kb). Thus, there is a need for efficient and accessible strategies to assemble these dsDNA blocks into multi-gene pathways (10s – 100s kb) and libraries for their expression in a host cell3.

An important class of natural product pathways is the Type I modular polyketide synthases (mPKSs), which are large (10s – 100s kb), multi-modular enzyme complexes4. Each module consists of enzyme domains that extend the polyketide chain with monomers and modify them in assembly line fashion. Many notable drugs are modular polyketides, such as the anti- bacterial erythromycin5, anti-fungal soraphen and candicidin6, immunosuppressant FK506 and rapamycin7, and anti-parasitic avermectin8. Because of their bioactive profile and multi-step, low-yield chemical syntheses9-11, there is a strong interest in optimizing the biosynthesis of these compounds in native and heterologous hosts. Furthermore, their modular architecture gives rise to the opportunity for pathway engineering or combinatorial biosynthesis to obtain novel analogs12-14. However, the Type I mPKSs are one of the most challenging pathways to engineer due to their length, highly repetitive module sequences, GC content (> 70 %), and lack of tools for genetic engineering in their native actinomycetes or myxobacteria hosts15. Although some

66 progress has been made with CRISPR16, the method is still inefficient in Streptomyces spp. and may be limited by off-target effects when targeting the repetitive mPKS sequence (see Section

2.2.2).

De novo assembly methods can offer the greatest flexibility to create designer sequences17. In one report, a combination of in vitro methods such as PCR and restriction digest/ligation was used to assemble the DEBS Type I mPKS pathway (32 kb, 6 modules) for expression in E. coli3. This approach requires unique restriction sites to be available, which would be problematic for longer sequences. In contrast, in vivo methods that rely on homologous recombination allow for the scarless assembly of DNA with a simple transformation step. The assembly of an iterative Type I PKS (45 kb, 4 modules) was achieved using DNA Assembler, based on plasmid gap repair, in the yeast Saccharomyces cerevisiae, followed by expression in

Streptomyces lividans18. However, the efficiency of the method is low and a two-step assembly strategy had to be used to overcome severe gene deletions likely caused by the repetitive PKS sequences. Previously, our lab developed Reiterative Recombination as a user friendly, robust and efficient tool for assembling multi-gene pathways and libraries in the yeast chromosome19,20.

The method relies on dsDNA break-induced homologous recombination to increase the efficiency of assembly21. Thus, we sought to explore the limits of our method by challenging it with building a long and complex mPKS pathway.

In this chapter, we report the de novo assembly of a ~ 100 kb non-ribosomal peptide synthase (NRPS) – Type I mPKS (14 modules) that produces the polyketide meridamycin22. To our knowledge, this is the longest Type I mPKS pathway assembled from the bottom-up so far3,18. Our strategy was to split the pathway into four fragments according to open reading frames (ORFs) for parallel assembly in yeast. Meridamycin is structurally similar to FK506 and

67 rapamycin, but is non-immunosuppressive23, and has shown promise as a neuroprotectant against ischemic stroke, Parkinson’s disease and dementia22,24-28. Thus, we sought to optimize the production of the compound by using CRISPR in yeast to build a promoter library. Next, we developed FLIP, based on Recombinase Mediated Cassette Exchange (RMCE)29, for the efficient recovery of the pathway fragments and library from the chromosome onto shuttle plasmids. These were amplified in E. coli and cloned in vitro onto expression plasmids for transfer into the heterologous host Streptomyces lividans K4-114. Finally, we plan to screen for meridamycin production using our previously developed high-throughput fluorescence polarization assay (see Chapter 3). Our strategy can potentially be used to study the function of the PKS complexes, optimize compound production and for the biosynthesis of novel analogs.

2.2 Results

2.2.1 ~ 100 kb meridamycin PKS pathway assembly by Reiterative Recombination

In our design, the ~ 100 kb meridamycin NRPS-PKS pathway was divided into four fragments according to open reading frames (ORFs), for parallel assembly in four separate yeast strains (Fig. 2-1). The constitutive ermE* promoter (ermEp*)30 from Streptomyces erythraeus was placed in front of each fragment to drive their expression separately. Although the Type I

PKSs are typically transcribed as a single operon, several studies have shown that ORFs cloned and expressed in separate loci in the genome can reconstitute a functional PKS for polyketide production31-34. This strategy has been harnessed for combinatorial biosynthesis to produce a library of analogs. In our study, this “divide and conquer” approach also enabled the faster parallel assembly of the pathway by Reiterative Recombination, and in theory, would reduce the chances of unwanted recombination events between homologous regions during the assembly

68 process. Various unique restriction sites, recombination sites and bacterial antibiotic resistance markers were placed around the fragments to enable their recovery onto shuttle plasmids for transfer into a heterologous host13 (Fig. 2-11 and Fig. 2-17).

Figure 2-1. ~ 100 kb meridamycin Type I NRPS – PKS pathway assembly scheme. The fragments were split according to open reading frames (ORFs): Fragment 1: merP, merA; Fragment 2: merB;

Fragment 3: merC; Fragment 4: merE – merT. The core PKS subunits are encoded by merA, merB and merC, which are composed of 14 modules (M1 – M14) and a loading module. The rounds of Reiterative

Recombination performed to assemble each fragment are shown below the horizontal line. Repeated numbers indicate that two or three DNA building blocks were assembled in that round. Various enzymatic domains in each module are shown schematically below in colored circles. KS: ketosynthase; AT: acyltransferase; DH: dehydratase; KR: ketoreductase; ER: enoyl-reductase; ACP: acyl carrier protein.

69

In total, 44 rounds of Reiterative Recombination were performed to assemble all of the four PKS pathway fragments (Fig. 2-1). During each round of Reiterative Recombination, we assembled one, two or three blocks of dsDNA of length ~ 1.5 – 3 kb each, which were mostly

PCR amplified from a plasmid (pHLW30) containing the full pathway13 (Tables 2-2 – 2-5). We allowed for 100 – 250 bp of overlap between each DNA block in order to reduce the chances of off-target recombination, although in theory as little as 30 – 40 bp of homology is sufficient to assemble uncomplicated sequences19. After each round of Reiterative Recombination, ~ 104 colonies/mL were obtained, which is similar to previously reported values20. PCR was used initially to screen for the assemblies in the chromosome (Fig. 2-2, Tables 2-6 – 2-9). On average, six colonies were screened after each round, of which at least one colony contained the correct assembly. When two or more blocks of DNA were assembled at the same time, we sometimes had to screen up to 12 colonies to find a correct clone, which was likely due to an increased frequency of undesired recombination events happening.

Figure 2-2. PCR verification of complete PKS fragment assemblies. Lane 1: NEB 1 kb ladder. From left to right: check PCRs that were performed for each fragment to verify its assembly in the genome.

70

Primers and expected product sizes are listed in Table 2-6 (Fragment 1), Table 2-7 (Fragment 2), Table

2-8 (Fragment 3) and Table 2-9 (Fragment 4).

The four yeast strains containing the pathway fragments in chromosome IV were further verified by restriction digest of agarose-embedded DNA, followed by pulsed field gel electrophoresis and Southern blotting (Fig. 2-3).

Figure 2-3. Pulsed field gel electrophoresis and Southern blotting to verify PKS assemblies. 1 kb extend ladder (NEB); 1: LW2591Y undigested, dark band shows position of undigested chromosome IV;

(2-6: Samples digested with SpeI and expected sizes) 2: LW2591Y (13.9 kb), 3: Fragment 1 (38.5 kb), 4:

Fragment 2 (37.5 kb), 5: Fragment 3 (50.2 kb), 6: Fragment 4 (33.6 kb).

Also, we checked for fitness defects in the yeast strain containing Fragment 3 (merC), which is the longest fragment (~ 36 kb), but did not find any when compared to the Reiterative

Recombination acceptor strain (LW2591Y) and its parental strain (BY4733) (Fig. 2-4).

71

Figure 2-4. Fitness tests for yeast strain containing PKS fragment 3. Comparison of fitness of

Fragment 3 yeast strain F3-R12 (column 3) with the Reiterative Recombination acceptor strain LW2591Y

(column 2) and parental strain BY4733 (column 1), when cultured under different conditions. Top row from left to right: YPGE, 30 ºC, 6 days; YPD, 30 ºC, pH 4.0, 3 days; YPD + MMS (0.05 %), 30 ºC, 3 days. Bottom row from left to right: YPD, 25 ºC, 3 days; YPD, 30 ºC, 3 days; YPD, 37 ºC, 3 days.

Furthermore, to determine if the assemblies were stable in the chromosome, the yeast strain containing Fragment 3, which has 7 repetitive PKS modules, was sub-cultured for 10 days

(> 120 generations) in three independent lineages (Fig. 2-5). Due to the presence of repetitive sequences, it is possible that portions of the pathway could pop-out by homologous recombination35. However, we did not observe any such cases by PCR verification of the assembly (Fig. 2-6).

72

Figure 2-5. Sub-culture scheme to check the stability of PKS assemblies in the chromosome. 3 colonies were picked from a streak of strain F3-R12 and sub-cultured independently. Each day, 5 μL of culture was transferred into 5 mL of fresh YPD in a 14 mL culture tube. After Day 10 (~ 120 generations), the culture was plated to obtain single colonies, which were isolated and the genomic DNA

(gDNA) extracted for testing by PCR.

Figure 2-6. PCR to check the stability of PKS assemblies in the chromosome. Colonies from three independent lineages that were sub-cultured for 10 days were tested.

73

2.2.2 Sequence editing and promoter library construction using CRISPR

After the assembly was complete, we sequenced all of the fragments by Sanger sequencing of the check PCR products (Fig. 2-2). A total of 12 single base pair differences from our reference sequence were found (Table 2-1). Five mutations were found in the PKS genes, three of which were missense and two were silent mutations. To demonstrate that we could utilize yeast synthetic biology to make sequence edits and to keep the amino acid sequence identical to the wild-type (WT), we decided to correct all of the missense mutations in the pathway.

Table 2-1. Mutations in yeast assemblies.

Fragment Position Mutation (sense strand) Locus 1 15707 G to A (Gly to Glu) merA-M2-AT 1 21005 T to C (Leu to Pro) merA-M3-AT 1 23224 Deletion of T tetR promoter 2 23409 C to T cat (chloramphenicol resistance gene) 2 23524 Insertion of T cat terminator 3 1444 G to T ermE* promoter Linker between merC-M9-AT and merC- 3 9058 C to T (silent) M9-KR HO endonuclease cut site in ReRec acceptor 3 NA Deletion of A locus after Round 10) Linker between merC-M13-DH and merC- 3 27250 G to A (silent) M13-ER 3 35069 G to A ant3ia promoter 3 36274 G to T ant3ia terminator 4 13416 C to T (Ala to Thr) merQ

74

Recently, the CRISPR/Cas9 method has been developed for the in vivo editing of DNA in yeast and other organisms36,37. The method is highly versatile, since the endonuclease Cas9 can be directed to make a dsDNA cut at a site that is complementary to a user-designed guide RNA

(gRNA)38. The dsDNA break stimulates repair by homologous recombination with a block of user-supplied repair DNA to allow for sequence editing. We performed CRISPR successfully to correct a total of four mutations, including one silent mutation and one single base-pair deletion that occurred in the HO endonuclease cut site of the Reiterative Recombination module during

Round 10 of assembly of Fragment 3 (Table 2-1). However, one missense mutation was recalcitrant to our efforts to correct it using CRISPR. This mutation occurred in the acyltransferase (AT) domain of PKS module 2, and led to local sequence identity with the homologous AT domain in PKS module 1. We hypothesize that this could have led to off-target

Cas9 cleavage of PKS module 1, since the gRNAs that we designed to target the mutation in module 2 differed only by one base-pair from the AT sequence in module 1 (Table 2-10 and

Table 2-11). We also successfully applied CRISPR to introduce recombination marker genes and FRT sites to allow for subsequent fragment recovery onto shuttle plasmids (Table 2-10).

Next, we decided to apply CRISPR to construct a promoter library in an attempt to increase the production yield of meridamycin in a heterologous host. The reported yields of meridamycin from native strains is quite low (< 10 mg/L)39,40. Previously, the entire pathway was shuttled into the heterologous host S. lividans K4-114, but no production was observed with the native promoter13. When the constitutive ermEp* promoter was swapped in to form strain E7, a very low yield was observed (~ 240 μg/L)30,41. Thus, promoter libraries could offer a combinatorial way to optimize the expression of the major ORFs in the pathway, especially since they are transcribed independently in our study42.

75

Figure 2-7. Promoter library construction by CRISPR to optimize meridamycin production. 5 promoters were chosen to build the library in front of each fragment. They are listed in order of increasing strength and compared with the ermEp* promoter. Shown on the right are hypothetical combinations of different promoters leading to different product yields.

Five constitutive Streptomyces promoters of various strengths were chosen to build the library (Fig. 2-7). kasOp*43, SF14p44 and p2145,46 have been shown to be stronger than ermEp*.

B4p is weaker than ermEp*, while p72 is inactive and serves as a negative control. Compared to ermEp*, these promoters have not been fully explored for metabolic engineering in Streptomyces spp.45. We performed one round of CRISPR on each fragment and supplied an equimolar mix of the five promoters with appropriate homologies to the fragments as repair DNA. Each CRISPR attempt yielded hundreds of colonies (data not shown), which is similar to that of other reports36.

30 colonies were screened by yeast colony PCR and sequenced to verify the identity of the promoter. All the screened colonies contained one of the five promoters in place of the original ermEp* promoter (Fig. 2-8), making the CRISPR/Cas9 method highly efficient (~ 100 %) at producing desired mutations. Assuming all four fragment libraries are transferred into

Streptomyces K4-114, the final library size would be 625 (54).

76

Figure 2-8. Proportion of 30 colonies containing each promoter after one round of CRISPR.

2.2.3 Pathway recovery from the chromosome onto a shuttle vector by FLIP

Development of a FLIP system for pathway recovery

Next, we sought to recover the NRPS-PKS pathway fragments and library from the yeast chromosome onto a shuttle vector for transfer into Streptomyces spp.. For this, we developed

“FLIP”, which is based on site-specific recombination (SSR). Unlike homologous recombination, which can occur between any pair of sequences containing a minimum amount of identity, SSR occurs only between a pair of specific target sequences recognized by a recombinase enzyme47. In particular, the Cre/Lox system from bacteriophage P1 and Flp/Flp

Recombinase Target (FRT) site system from yeast have been exploited as efficient tools to perform gene insertions, deletions and inversions in a wide variety of organisms48. The original

FRT site (FRT-WT) has been engineered by the creation of orthogonal 48 base-pair FRT mutant sites (FRT-X) differing only in an 8 base-pair spacer, which recombine preferentially with

77 themselves over other variant sites49. This has led to the development of Recombinase Mediated

Cassette Exchange (RMCE), in which DNA cassettes flanked by the same pair of heterologous

FRT sites are exchanged in a double SSR event catalyzed by Flp recombinase29. RMCE has been harnessed to integrate DNA into the genome of mammalian, plant and insect cells, and can be multiplexed for the simultaneous modification of two genomic loci50.

Figure 2-9. Design of the FLIP system based on RMCE. Marker 1 and 2 are fused in frame to the 48 base pair FRT-WT site. Marker 1 is silent due to the absence of a promoter and start codon. Upon FLIP, marker 1 is activated by a strong promoter (pMET25) and start codon on the shuttle vector, whereas marker 2 is silenced, enabling positive and negative selection respectively. FRT-X represents a heterologous FRT variant site. Flp recombinase was cloned onto the shuttle vector under control of the inducible galactose promoter.

78

In this study, we engineered RMCE in reverse with a start codon and promoter trap to enable in vivo selection for the recovery of DNA from the yeast chromosome onto a shuttle plasmid (FLIP) (Fig. 2-9). In the FLIP system, an inactive marker gene (no start codon and no promoter) in the chromosome is activated after FLIP onto a plasmid containing a promoter and start codon. As a proof-of-principle, we have recovered the URA3 marker gene alone, flanked by

FRT-WT and FRT-3, from the chromosome onto a pRS series E. coli – S. cerevisiae shuttle vector51. High efficiencies of 10-4 were only obtained in the presence of Flp recombinase, compared to background homologous recombination rates of ~ 10-6, a 100-fold difference (Fig.

2-10). Thus, FLIP can enable the recovery of pathway libraries of at least 104 given a culture of

108 cells.

Figure 2-10. FLIP efficiencies with culture time. The pRS415 shuttle plasmids containing Flp recombinase (pAN133, lines C and D) and lacking Flp recombinase (pAN65, lines A and B) were transformed into a yeast strain (AN61Y) containing URA3 in the chromosome flanked by FRT-WT and

FRT-3. Transformants were scraped and cultured in glucose-based media (lines A and C) or galactose- based media to induce Flp recombinase (lines B and D). After culturing for different time (3 h, 7 h, 12 h,

79

24 h and 48 h), the cells were diluted and plated on selective agar lacking uracil, as well as non-selective agar. Efficiencies were calculated as number of colonies on selective / non-selective plates. Error bars represent the standard deviation of colony counts from two plates.

Recovery of assembled fragments by FLIP

In order to recover our large pathway fragments from the yeast chromosome for transfer into Streptomyces spp., we modified a pSBAC E. coli – Streptomyces shuttle vector previously developed for the cloning of the meridamycin (and other large) pathways13. We incorporated yeast replication sequences for its maintenance as a low-copy centromeric plasmid in yeast, the

LEU2 marker for selection in yeast, the FLIP acceptor module (FRT-WT – TRP1 – FRT-3)

(pAN98) and the Flp recombinase under control of a galactose-inducible promoter (pAN146).

Originally, our goal was to recover all four fragments onto a single shuttle plasmid using an iterative FLIP scheme (Fig. 2-11).

However, we faced difficulties in the recovery of fragment 3 onto a plasmid that already contained fragment 4. Very few/no colonies were obtained, which could have been due to a non- functional KanMX marker (the FRT-WT site adds an extra 16 amino acids at the N-terminus of the protein). The few colonies obtained had plasmids with incomplete pathway fragments, which could have been due to unwanted recombination events occurring between the repetitive sequences (data not shown).

80

Figure 2-11. Design of FRT sites, yeast and bacterial marker genes for iterative FLIP.

As such, we opted to recover each of the fragments individually by FLIP onto plasmids, followed by restriction digest and ligation to clone the fragments onto final shuttle plasmids for transfer into Streptomyces spp.. CRISPR was performed on strain F3-R12 to exchange the

KanMX marker in fragment 3 for the URA3 (cr23-1) or the TRP1 (cr23-2) marker to create strain AN501Y and AN503Y respectively (Table 2-10). Various FLIP shuttle plasmids (pAN428

– pAN432) were created containing different pairs of heterologous FRT sites and markers to recover all four fragments by FLIP (Fig. 2-12).

81

Figure 2-12. Individual FLIPs to recover all four fragments onto plasmids. Shuttle plasmids for each fragment-containing yeast strain are: pAN435 (F1-R10-cr10), pAN437 (F2-R14-cr9), pAN434 (F3-R12- cr23-1), pAN436 (F3-R12-cr23-2), pAN146 /or pAN433 (F4-R8-cr6-cr7).

82

Next, we carried out FLIP on all four fragments individually and selected for activation of the yeast marker genes URA3 (fragment 1, 3 (cr23-1) and 4) or TRP1 (fragment 2 and 3

(cr23-2)). Efficiencies of > 10-4 were recorded (except for cr23-1), which was similar to flipping the URA3 gene alone (Fig. 2-13). Selecting for the TRP1 marker produced about 3-fold higher efficiencies than the URA3 marker. However, the efficiencies decreased generally when flipping fragment 3, whether selecting for the URA3 (cr23-1) or TRP1 (cr23-2) marker.

Figure 2-13. FLIP efficiencies of Fragment 1, 2, 3 and 4. Efficiencies were calculated after incubation for one day (T = 24 h) in glucose (GLU) or galactose (GAL) – based media. Galactose was used to induce

Flp Recombinase. Efficiencies are measured as number of colonies on selective / non-selective plates.

Error bars represent standard deviation for four plates.

83

Following FLIP, the plasmids were extracted from yeast using a gentle lysis protocol52 and transformed into E. coli EPI300 by electroporation. Transformants were selected for the apramycin resistance marker present in the backbone of pSBAC, as well as for the specific antibiotic resistance marker assembled at the end of each fragment (Fig. 2-11). The plasmids were amplified and verified by restriction analysis (Fig. 2-14).

Figure 2-14. Restriction analysis of plasmids obtained by FLIP. 6 colonies were screened for fragment

1, 2 and 3 (cr23-1) and (cr23-2). 8 colonies were screened for fragment 4. Successful clones were stocked: pAN472 (fragment 1), pAN475 (fragment 5), pAN285 (fragment 4).

84

Flipping fragments 1, 2 and 4 produced plasmids containing the expected fragments, except for one colony from F2, which did not contain the fragment. In contrast, the plasmids derived from flipping fragment 3 all lacked the expected fragment (Fig. 2-14). We later discovered that EPI300 is already resistant to streptomycin, which was used to select for fragment 3 in E. coli.

Fragment recovery by transformation associated recombination

Thus, we explored the use of transformation associated recombination (TAR)53, as an alternative method to recover the pathway fragments from the yeast chromosome onto plasmids.

TAR relies on gap repair of a linearized plasmid containing ~ 60 bp “hook” sequences, which are homologous to the ends of the fragment of interest (“bait”)54,55. This approach has been used to clone long (up to 250 kb) and repetitive sequences from co-transformed eukaryotic or prokaryotic genomes onto yeast plasmids56. In this study, we did not transform any exogenous genome into the yeast cell, but rather sought to rely on gap repair of the plasmid using the endogenous genome as template. To reduce the background of self-closed plasmids, we removed the autonomous replicating sequence (ARS) from our yeast - E. coli - Streptomyces spp. shuttle plasmid. Successful replication of the plasmid would then depend on recovery of the chromosomal target sequence, which contains an ARS57. TAR has been reported to yield 1-5 % of gene-positive clones, which can be raised to 20-30 % by introduction of a dsDNA break near the target region58. Since we did not introduce a dsDNA break in the chromosome, hook sequences (homology arms) of ~ 1 kb were used to increase the efficiency of recovery. The final

TAR shuttle plasmid (pAN438) was transformed into a mix of the four fragment-containing yeast strains (F1-R10-cr10, F2-R14-cr9, F3-R12, F4-R8-cr6-cr7) and selected on agar plates

85 lacking leucine. After 6 days, colonies containing each of the four fragments on plasmids were obtained with an overall efficiency of 50 %, which we verified by yeast colony PCR (Fig. 2-15).

Figure 2-15. Colony PCR to test for the recovery of the pathway onto pAN438 by TAR. Three colony

PCRs were performed on 48 colonies using the forward primer AN358, which is specific to the pAN438 backbone, and the reverse primers AN299 / AN681 / AN285, which are specific for fragments 1 and 4

(top gel) / fragment 3 (middle gel) / fragment 2 (bottom gel) respectively.

We further extracted the plasmids from the four yeast colonies containing fragment 3, and transformed them into E. coli EPI300, followed by amplification and restriction digest to verify the integrity of the pathway. Out of the four yeast plasmids, only one (F3-3, also called

TAR1-C7-c1) contained the full fragment 3 pathway (Fig. 2-16). The other plasmids (F3-1, F3-2 and F3-4) produced missing or unexpected bands, which may have been due to parts of fragment

3 looping out by homologous recombination between repetitive sequences.

86

Figure 2-16. Restriction analysis of TAR plasmids containing fragment 3. The top gel is identical to the bottom gel, but it has been run for a longer time to increase the resolution of the larger bands. The four TAR plasmids from yeast were extracted and transformed separately into E. coli. Three E. coli colonies were picked from each transformation for plasmid amplification and restriction analysis using the enzymes StuI and XhoI. Expected bands are listed below in kilobases.

Cloning of fragments by restriction digest and ligation

Finally, we employed an iterative cloning scheme using the unique restriction sites XbaI,

NheI and MfeI to enable the fragments to be cloned in any combination onto the shuttle plasmid pAN84 for transfer into Streptomyces spp. (Fig. 2-17). pAN84 is a derivative of pSBAC, created by site-directed mutagenesis to remove existing NheI and MfeI sites in the backbone, followed by insertion of a multiple cloning site containing the NheI and MfeI sites to allow for the insertion of the pathway fragments. Using the restriction digest and ligation scheme, fragments 1 and 2 were cloned into pAN84 to create plig5 (data not shown).

87

Figure 2-17. Restriction digest and ligation scheme to clone pathway fragments. The design of the fragment assemblies and restriction sites are shown above. To perform cloning of the fragments,

FLIP/TAR plasmids containing each of the fragments are digested with XbaI and MfeI. The shuttle plasmids (pAN84 or pAN446) are digested with NheI and MfeI. The cut fragment and plasmid are ligated together overnight at 16 ºC using T4 ligase (high concentration), followed by transformation into EPI300 and selection for the apramycin marker on the pSBAC and the antibiotic resistance gene attached to the fragment. XbaI and NheI produce complementary sticky ends that when ligated together, destroy both sites. Thus, in the second round of cloning, the plasmid containing a single fragment can be digested with

NheI and MfeI to remove the bacterial selection marker, and allow for the cloning of a second fragment digested with XbaI and MfeI. This process can be repeated indefinitely. 88

The pSBAC shuttle plasmid was developed to integrate constructs in the phage φBT1 attB locus in the genome of Streptomyces spp., using the φBT1 attP/int system encoded on the plasmid13. This system has been shown to be compatible with the orthologous φC31 attP/int system, which integrates plasmids into a different locus59. Other orthologous systems have been developed to enable multiplex integration into four different loci in the Streptomyces spp. genome60. Fayed et al. demonstrated this by separately integrating the three DEBS PKS subunits and a fourth module containing tailoring enzymes into four different loci. Thus, we replaced the

φBT1 attP/int system in pAN84 with φC31 attP/int to create pAN446, in order to enable the integration of the pathway fragments into different genomic loci to test for locus-specific effects on gene expression, which could potentially impact on product yield.

Recovery of the promoter libraries

Our next goal was to recover the promoter library from the yeast chromosome onto shuttle plasmids. So far, we have recovered the promoter library for fragment 4 on plasmids using a single round of FLIP (Fig. 2-18).

Figure 2-18. Restriction analysis of plasmids containing fragment 4 promoter library. All plasmids were digested with EcoRV. Expected sizes (kb): 11.7, 9.8, 7.7, 5.3, 2.8, 0.3. 1 kb NEB ladder, 1: pAN285, 2: pAN286, 3: pAN340, 4: pAN341, 5: pAN342, 6: pAN343, 7: pAN344, 8: pAN345, 9: pAN346, 10: pAN347, 11: pAN348, 12: pAN349.

89

2.2.4 Transfer of pathway into Streptomyces spp. for meridamycin production and library screening

Pathway transfer into Streptomyces spp.

Once all the fragments and the promoter library are cloned onto the appropriate shuttle plasmids, we plan to transfer the pathway/library into S. lividans K4-114 by conjugation with E. coli13. As a proof-of-concept, while awaiting construction of the other shuttle plasmids, we transferred a plasmid containing fragment 4 under control of the ermEp* promoter into K4-114.

We verified the presence of the full fragment 4 in the genome of K4-114 by PCR (A.E. data not shown), and detected expression of fragment 4 by RT-PCR (Fig. 2-19). However, we observed a very low efficiency for the conjugation process (only two colonies). This will be optimized in order to transfer the promoter library. The protoplast transformation method, which is generally less efficient than conjugation, can also be tested31.

Figure 2-19. RT-PCR to test for expression of fragment 4 in S. lividans K4-114. Top gel: Total RNA extracted from K4-114 (WT) and K4-114-F4, showing the presence of 23S and 16S rRNA. LD: NEB 100 bp ladder. Bottom gel: Reverse transcription followed by PCR to check for expression. A band that

90 corresponded to the positive control (+ genomic DNA of K4-114-F4) was detected for F4 only when reverse transcription was performed (+RT) but not in the absence of reverse transcription (-RT).

Meridamycin production and screening

We plan to optimize meridamycin production initially in shake flasks using the engineered strain E7, which contains the full meridamycin pathway introduced into K4-114 on the pSBAC vector13. Since meridamycin production levels reported for that strain are low (~ 240

μg/L), which may be challenging to detect, we plan on testing several natural strains along with media that have been reported to produce higher levels (~ 5 – 10 mg/L) of meridamycin61,62.

In order to screen our promoter library (theoretical size of 625), we plan to culture the streptomyces library in 96-well, deep-well microtiter plates for compound production. In

Chapter 3, we demonstrate the use of this production platform for culturing Streptomyces tsukubaensis for the biosynthesis of FK506 (tacrolimus), a polyketide which is an immunosuppressant drug and is structurally similar to meridamycin. Furthermore, we have developed a high-throughput fluorescence polarization (FP) assay for the detection and quantification of meridamycin (Fig. 2-20) and more generally as a platform for small molecule product detection (see Chapter 3). Briefly, a fixed amount of receptor (purified FKBP12) and reporter molecules (FK506 linked to the fluorophore fluorescein) are supplied to each sample in a 384-well plate. Due to competition between the reporter (FK506-fluorescein) and the target compound (meridamycin) for binding to a common receptor (FKBP12), the ratio of bound to unbound reporter molecules is inversely correlated with the concentration of meridamycin. This results in different fluorescence polarization values, which can be measured using a plate reader.

A standard curve can be constructed to quantify meridamycin concentrations in the sample (Fig.

91

2-20). In Chapter 3, as a proof-of-principle, we demonstrate how we can apply the FP assay to screen a library of 160 UV-mutagenized S. tsukubaensis colonies for FK506 production.

400

350

300

250

200

FP (mP) FP 150 1 µM FKBP12

100 500 nM FKPB12

50 50 nM FKBP12

0 0.0001 0.001 0.01 0.1 1 Meridamycin (mg/L)

Figure 2-20. Standard curves to detect meridamycin via the FP assay. Different receptor concentrations (50 nM, 500 nM, 1 μM) were used to construct a series of meridamycin standard curves.

Increasing receptor concentrations shift the detection range to higher levels of meridmaycin. Error bars represent standard deviation from 3 wells.

2.3 Discussion

Type I modular PKS pathway assembly

In this study, we demonstrate the de novo assembly of the ~ 100 kb meridamycin biosynthetic pathway using Reiterative Recombination. To our knowledge, this is the first report of the de novo assembly of a modular, Type I hybrid NRPS-PKS pathway of this scale.

Compared to methods for performing large-scale DNA mutagenesis, bottom-up DNA assembly can offer the ultimate flexibility to construct user-defined, heterologous pathways and even whole genomes2,17, especially when natural templates are difficult to obtain (e.g. when the source

92 organism cannot be cultured in the lab). In addition, substantial refactoring of biosynthetic pathways such as codon optimization can be performed by piecing together synthetic DNA building blocks. Although we use mostly PCR-amplified products for our pathway assembly,

Reiterative Recombination is equally compatible with commercially available synthetic DNA building blocks (e.g. GeneArt™ Strings or IDT gBlocks®). Importantly, de novo assembly allows for unique combinations of genes and regulators to be assembled for the production of novel circuits, compounds and analogs.

There is a strong interest in manipulating the Type I modular PKS pathways to increase the naturally low polyketide yields and also for combinatorial biosynthesis for the production of novel analogs14,32,33,63. However, the PKS sequences are highly repetitive and GC-rich (~ 70 %), which make this class of pathways challenging to engineer. This problem is compounded by a lack of genetic engineering tools in their native Streptomyces spp. hosts. Although CRISPR/Cas9 has been established in Streptomyces spp., but the efficiency of DNA transformation remains low, which limits the library sizes that can be built16. Also, long homology arms (~ 1 kb) are needed to introduce point mutations by homology-directed repair since homologous recombination is not as efficient as in yeast, which prevents the use of low-cost, short (< 90 bases) oligonucleotides to build large libraries. A potential solution is to perform both pathway engineering and compound production in a genetically tractable host, such as E. coli or S. cerevisiae. However, due to differences in host cell metabolism, lack of precursor biosynthetic pathways, lack of post-translational enzymes, as well as difficulties in heterologous expression of the large multi-modular polyketide synthases, these examples have been limited to proof-of- principle studies using only a few modules of the PKS64,65 or expressing the model Type I DEBS

PKS for erythromycin in E. coli66,67. Thus, our approach of first harnessing yeast, a genetically

93 tractable organism with developed genetic engineering tools, for pathway assembly and engineering, followed by pathway/library transfer into Streptomyces spp. for production, can be a potentially general route to engineer the Type I modular PKS pathways.

Another option is to perform pathway assembly entirely in vitro. Kodumal et al. assembled the Type I DEBS PKS pathway de novo using polymerase chain assembly (PCA)68 of overlapping synthetic oligonucleotides, followed by hierarchical restriction digest and ligation to re-create the pathway3. However, finding unique restriction sites within the long PKS sequences may be difficult, and the introduction of sites by mutating the sequence (even silent mutations) can interfere with PKS gene expression63. Alternatively, in vitro assembly methods based on end homology, such as Gibson assembly, have been developed for the scarless assembly of dsDNA building blocks69. Importantly, only a short region of homology (20 – 80 bp) is exposed at both ends of each dsDNA block by exonuclease digestion. In theory, this would allow for the one-step cloning of multiple building blocks containing highly repetitive sequences, if the junctions are carefully placed to maximize the homology at the paired ends of two sequential building blocks, and to minimize the amount of homology with the ends of other building blocks. This is an advantage over in vivo homologous recombination, which can recombine repetitive sequences even if they are in the middle of a building block. However, the efficiency of Gibson assembly decreases sharply with increasing numbers of building blocks in a round of assembly70, and in vitro handling of long DNA fragments can result in mechanical shearing.

Itaya and coworkers have explored the use of the Bacillus subtilis genome as a vector for

DNA assembly71. They have shown that the single, circular chromosome of B. subtilis can accommodate large inserts, including the whole 3.5 megabase (Mb) genome of the bacterium

Synechocystis PCC680372. They have developed assembly methods (“inchworm” and “domino”)

94 both based on homologous recombination for the sequential assembly of DNA in the B. subtilis genome73,74. To broaden the scope of their assembly platform, they have also developed a method based on homologous recombination (Bacillus recombinational transfer – BReT) to recover the assembled DNA onto a plasmid for transfer to other organisms75. Significantly, repetitive DNA can be maintained stably in the B. subtilis genome vector76. However, this system has yet to be widely adopted by other groups, which may be because the development of tools for efficient genome modification in B. subtilis76 still lag behind that of model organisms such as E. coli and S. cerevisiae77.

Our lab developed Reiterative Recombination as an user-friendly and efficient method for the in vivo assembly of DNA pathways and libraries in the yeast chromosome19. Despite being a sequential assembly method, we can split the PKS pathway by open reading frames for faster assembly in parallel. This approach can also be exploited to build libraries to improve production or to create analogs31-33.

Although methods based on yeast gap repair to assemble pathways18 and genomes78 on plasmids are quicker, since the plasmids can be directly isolated from yeast cells, but chromosomal assembly offers advantages such as a stable, single-copy existence and propagation without a selection marker. Compared to the natural chromosome, yeast centromeric plasmids

(YCps) are present on an average of 1-3 copies per cell, which may increase the chances of unwanted recombination events occurring between repetitive sequences. Even yeast artificial chromosomes (YACS), which are linear and more similar to natural chromosomes, have an elevated rate of recombination resulting in chimeras79. To resolve these issues, recombination deficient yeast strains can be used to increase the stability of YACS80, but this results in slower growth and lower transformation efficiency81. Thus, assembly in the native yeast chromosome is

95 advantageous, provided there are simple and efficient methods to retrieve the assembly. Indeed, our fragment 3 PKS assembly (merC, 35 kb, 7 modules) remained stable in the chromosome for at least ~ 120 generations despite the yeast being propagated in non-selective, rich YPD media

(Fig. 2-6). Furthermore, yeast can be easily freeze-dried or sporulated to form spores, which can serve as long-term, stable repositories for DNA assemblies.

After assembly of the core pathway, we used CRISPR/Cas9 to diversify the pathway to create mutations or to build the promoter library. Although Reiterative Recombination can be used to build libraries of 104 directly during the assembly process, but this requires prior planning. Also, the efficiency of Reiterative Recombination at assembling correct clones decreases during later rounds of assembly, which could be due to higher probabilities of unwanted recombination events with the previous assemblies in the chromosome. Thus, cumbersome screening of clones have to be performed during each round to pick out library members without gene deletions to carry forward for subsequent rounds of assembly. If not, the population of incorrect clones would quickly overtake the culture over several rounds of assembly. Therefore, the most flexible strategy would be to use Reiterative Recombination to assemble the core pathway, followed by diversification using CRISPR/Cas9 or other programmable nucleases such as the transcription activator-like effector nucleases (TALENs) or zinc-finger nucleases (ZFNs)82. These can be used to build promoter/ribosome-binding site libraries to optimize expression, or pathway libraries at specific loci to make analogs. In addition, other genetic modification techniques that have been developed for yeast, such as delitto perfetto83 and YOGE84, can be used for the efficient, precise and scarless modification of the pathway to create insertions, deletions and substitutions, and to construct libraries. This makes the yeast S. cerevisiae a versatile DNA assembly and engineering factory.

96

Pathway recovery onto shuttle plasmids

Few approaches are available to clone long DNA fragments from the chromosome onto a plasmid. In vitro methods such as restriction digest of genomic DNA, followed by ligation into a plasmid, require unique restriction sites to be available that do not cut within the region of interest. Also, many clones have to be screened to identify the right constructs (unless a selectable marker is available), since restriction enzymes often cut the genome at multiple sites, which will produce a high background of unwanted clones. Recently, another in vitro method has been developed using a purified CRISPR/Cas9 system to create dsDNA breaks at both ends of the fragment of interest, followed by Gibson assembly into a linearized plasmid containing homologies to the ends of the fragment. This method greatly reduces the background of unwanted clones, due to the high specificities of both CRISPR (target site is complementary to gRNA) and Gibson assembly (recombination is based on homology). However, both in vitro methods suffer from low efficiencies due to limiting amounts of the genomic target, as well as possible shearing of long DNA fragments by mechanical forces during pipetting.

In vivo methods based on homologous recombination (HR) have been developed for the recovery of genomic sequences onto shuttle plasmids in yeast (TAR)53. However, we thought that activating the HR machinery near the pathway might result in gene deletions due to unwanted recombination events between repetitive elements within the pathway. Thus, we developed the FLIP method based on site-specific recombination (SSR), which is catalyzed by a specific Flp recombinase enzyme that recognizes defined FRT target sites. The efficiency of

FLIP in yeast was found to be 10-4 when recovering the URA3 marker (~ 1 kb) (Fig. 2-10), which was similar to the values obtained during recovery of fragments 1, 2 and 4 (~ 20 kb each).

However, the efficiency decreased slightly when recovering fragment 3 (~ 35 kb) (Fig. 2-13).

97

This is to be expected, since Flp recombinase has to search for the short 48 bp FRT sites for recombination.

Next, when FLIP was combined with a bacterial selection marker to select for fragments

1, 2 and 4, almost all the bacterial colonies obtained from the transformation of FLIP-derived plasmids contained correct assemblies (Fig. 2-14). In the case of fragment 3, the streptomycin selection was not effective as EPI300 is naturally resistant to the antibiotic. In that case, all the E. coli colonies tested contained plasmids identical to the original FLIP acceptor shuttle plasmid that was transformed into the yeast cells (verified by sequencing). This background is surprising, since we did not observe this when flipping the other fragments. One possibility was human error, or that the cells somehow managed to escape the URA3 or TRP1 selection by activating the silent marker gene in the chromosome. One approach that we are testing to overcome this is to transform the yeast-purified plasmids (after FLIP) into a different strain of E. coli that is sensitive to streptomycin. Colonies that appear on streptomycin plates will be scraped and their plasmids extracted for transformation into EPI300 for subsequent BAC amplification.

We also tested TAR for the recovery of our fragments. In yeast, we obtained very high efficiencies of 50 %, which is greater than the reported optimized efficiencies of 20 - 30 %58.

This is probably due to our use of ~ 1 kb homology arms instead of the minimal amount of homology required (60 bp)85. However, in the case of fragment 3-containing yeast colonies, only one out of four had the complete fragment. This was likely due to unwanted recombination occurring during the recovery by HR, resulting in deletions occurring. These results suggest that the FLIP approach, when combined with a bacterial selection for the fragment, can potentially yield a higher percentage of colonies containing the full assemblies with no deletions.

98

Conclusions

In summary, we have shown that the yeast chromosome can be a stable vector for the de novo assembly and engineering of long and highly repetitive DNA sequences, such as the Type I modular PKS pathways. In addition, we have developed FLIP to enable the efficient recovery of the pathway/library from the yeast chromosome onto a shuttle plasmid for transfer and expression in a heterologous host. Polyketides are a privileged class of natural products due to their interesting bioactivities. Platforms that facilitate the engineering of PKS pathways to construct promoter/RBS or pathway libraries can enable combinatorial metabolic engineering approaches to be applied to optimize polyketide production or to create novel analogs respectively. In the next chapter, we show how the fluorescence polarization assay may be harnessed as a high-throughput, general solution to alleviate the screening bottleneck in combinatorial metabolic engineering.

2.4 Experimental methods

General materials and methods. PCR was performed using Phusion High-Fidelity DNA

Polymerase (NEB, M0530S), using the GC Buffer and adding DMSO unless otherwise stated.

For E. coli and yeast colony PCRs, GoTaq DNA Polymerase (Promega, M300) was used. For E. coli colony PCRs, a tiny quantity of cells were mixed directly into the PCR buffer. For yeast colony PCRs, the DNA was first prepared by adding a small quantity of cells (just visible on the end of a pipette tip) into 30 μL of 0.02 M NaOH and heating at 98 ºC for 20 minutes.

Subsequently, 2 μL of the mixture was used as template for the PCR. Oligonucleotides were obtained from Integrated DNA Technologies. The dNTPs for PCR were purchased from NEB,

N0447S. PCR products were purified by agarose gel electrophoresis followed by the QIAquick

99

Gel Extraction Kit, (Qiagen, 28704). DNA ladders were purchased from NEB. All sequencing was performed by Genewiz. Restriction enzymes, Gibson Assembly Master Mix (E2611) and T4

DNA ligase (M0202) were purchased from New England Biolabs. DNA concentrations were determined by absorption at 260 nm using a Tecan Infinite M200. Milli-Q water was used for in vitro reactions. PCR was carried out using a Bio-Rad T100 thermal cycler.

Standard methods for molecular biology were applied when culturing Saccharomyces cerevisiae and E. coli86,87. E. coli strains were grown at 37 ºC in LB media with appropriate antibiotics for the selection of plasmids. General cloning was performed using the E. coli strain

TG1 Electroporation-Competent Cells (Agilent Technologies, #200123). All cloning of pSBAC constructs were performed using TransforMax EPI300 Electrocompetent E. coli (Epicentre,

EC300110). This strain allows for the pSBAC to be maintained at low copy number normally, but can be induced to high copy number using the CopyControl Induction Solution (Epicentre,

CCIS125). S. cerevisiae strains were cultivated at 30°C in YEPD media or SC drop-out media to select for auxotrophic marker genes20. SC drop-out media contains 2 % glucose. For galactose induction, 2 % galactose supplemented with 2 % raffinose was used instead of glucose. Plasmids in E. coli were purified using the QIAprep Spin Miniprep Kit (Qiagen, 27104). Plasmids in yeast were purified according to the Silverman et al. protocol52. Yeast genomic DNA for PCR verification of the pathway assembly was purified using the YeaStar Genomic DNA Kit (Zymo

Research, D2002). Transformation of E. coli was performed by electroporation using a Bio-Rad

E. coli Pulser. Yeast electroporation was carried out using a Bio-Rad Gene Pulser Xcell, following an established protocol20.

100

Pathway assembly by Reiterative Recombination and verification of assemblies in yeast.

Multiple cycles of Reiterative Recombination were performed to assemble the pathway using previously published protocols19,20. Building blocks for fragment assembly were PCR amplified using pHLW30 as template13. The list of assembly PCRs, primers, PCR conditions and expected lengths can be found in Table 2-2 (Fragment 1), Table 2-3 (Fragment 2), Table 2-4 (Fragment

3) and Table 2-5 (Fragment 4). The list of check PCRs and primers etc. that were used to verify the chromosomal assemblies can be found in Table 2-6 (Fragment 1), Table 2-7 (Fragment 2),

Table 2-8 (Fragment 3) and Table 2-9 (Fragment 4).

pAN68 was created by first PCR amplifying FRT3 with the primers AN264 and AN265, using pAN15 as template. The PCR product was used as template for a second round of PCR with primers AN266 and AN265. The product was co-transformed into yeast with pLW2593 digested with SalI and selected on SC(U-) in order for gap repair to clone the FRT3 site into pLW2593. Yeast transformants were miniprepped and the plasmids transformed into E. coli for verification to create pAN68.

For pulsed field gel electrophoresis (PFGE), yeast genomic DNA was extracted in agarose plugs according to a published method88. In-gel complete restriction digest of the genomic DNA was performed at 37 ºC using the enzyme SpeI (NEB R0133S) according to the manufacturer’s recommendations. PFGE was performed using the following conditions: Initial

Switch time: 1.7 seconds; Final Switch time: 2.6 seconds; Total run time: 22 hours; Voltage:

6V/cm; Buffer: 0.5x commercial grade TBE; Temperature 14 ºC; Gel: 180 ml of 1% BioRad

Molecular Biology Certified agarose (Amr Al-Zain, Symington lab). For the Southern blot, the probe (500 bp of the HIS3 gene) was radiolabeled with the Promega Prime-a-Gene labeling kit, but using P-32 dATP instead of dCTP. Following labeling, the probe was hybridized to the blot

101 using the GE Amersham Rapid-hyb buffer. The blot was exposed on a phosphor screen overnight and imaged using a phosphor-imager.

CRISPR to correct mutations and build promoter library. Guide RNA (gRNA) plasmids were created by PCR of the SNR52 promoter using the gRNA R primer (see Table 2-10) and

AN318. The SUP4 terminator was amplified by PCR using the gRNA F primer (Table 2-10) and

AN321. For both PCRs, a gRNA-CAN1 plasmid obtained from the Church lab was used as the template36. Fusion PCR was performed using AN318 and AN321 to fuse the pSNR52 and tSUP4 fragments and create the unique gRNA 20-bp targeting sequence. The fusion PCR product was digested with SacI and XhoI, and ligated into gRNA-CAN1 (pRS426)51 that was digested with the same pair of enzymes. The ligation product was transformed into E. coli and selected on

LB/Ampicillin plates. To perform CRISPR, p414-pTEF-Cas9 (a kind gift from the Church lab36) containing the Cas9 endonuclease constitutively expressed by the TEF promoter was first transformed into yeast and selected on SC(HT-) plates. A few colonies were picked to undergo a second round of transformation with 500 ng of gRNA plasmid and 0.05 nmol of repair fragment.

Transformants were selected on SC(HTU-) plates and verified by colony PCR and sequencing.

The CRISPR gRNA and Cas9 plasmids were cured from the cells by culturing for one day in non-selective media SC(H-), followed by streaking the cells on SC(H-), 0.1 % FOA and 0.05 %

FAA plates to select against the URA389 and TRP190 marker respectively. The repair fragments were generated by PCR using the primers and templates listed in Table 2-11.

For cr16b (promoter libraries), the 5 promoters (SF14p, kasOp*, p21, B4p, p72) were

PCR amplified using the primers in Table 2-11 and gblocks as templates, to generate each of the

102 promoters flanked by 90 bp of homology on both ends to the respective fragment. An equimolar mix of the 5 promoters for each fragment (total 0.05 nmol) and the cr16b gRNA plasmid (1 µg) were supplied during the transformation into yeast cells containing the p414-pTEF-Cas9 plasmid. 30 colonies that appeared on the selective plate SC(HTU-) were picked for colony PCR and sequencing to verify the identity of the pathway promoter that was inserted to replace the ermEp* promoter. Colony PCR was performed using the following primer pairs: fragment 1

(LK18, LK23), fragment 2 (AN533, LK19), fragment 3 (LK24, LK26), fragment 4 (AN324,

LK13). Two colonies / promoter / fragment were cured of the CRISPR/Cas9 plasmids and stocked (Table 2-12). These will be used for FLIP to recover the library onto plasmids.

Shuttle plasmid construction. To obtain pAN8, site-directed mutagenesis was performed on the pSBAC to remove the NheI restriction site in the int gene. The pSBAC backbone was

PCR amplified with primers AN3 and AN4, using PfuUltra High-Fidelity DNA Polymerase AD

(Agilent Technologies, #600385). The PCR product was separated on a gel, purified, digested with DpnI overnight, transformed into EPI300 and selected on LB agar plates containing 50

μg/mL of apramycin to select for the pSBAC backbone. Plasmids were verified by restriction digest and sequencing. pAN12 was also created by the same site-directed mutagenesis protocol, but using the primers AN21 and AN22 to amplify pAN8, in order to remove the MfeI restriction site in the parA gene. To create pAN84, a multiple cloning site containing the NheI and MfeI restriction sites was created by PCR extension using primers AN343 and AN344. The product was amplified by a second round of PCR using primers AN362 and AN363, followed by digestion with EcoRI and HindIII. This was ligated into pAN12 that was digested using the same set of enzymes. pAN98 was constructed by Gibson assembly. First, the vector, pAN84, was

103 digested with MfeI and NheI, and gel purified. Next, the inserts containing the yeast CEN/ARS replication sequence, LEU2 auxotrophic marker, and the FLIP acceptor module (pMET25 –

AUG – FRT(WT) – TRP1 – FRT(3)) were amplified from pAN65 in two blocks using the primers AN415/AN447 and AN448/AN416. These were Gibson assembled into the cut vector

(pAN84) to form pAN98. To construct pAN146, pGAL – Flp was amplified from pAN133 using the primers AN619 and AN620. tACT was amplified from yeast gDNA using primers

AN621/AN622. The two inserts were then Gibson assembled into HindIII-digested pAN98 to form pAN146.

To construct the FLIP shuttle plasmids for individually flipping all the fragments, overlap extension PCR (no template) was first performed to create the following cassettes using the primer pairs in brackets: pAN65-FRT3-MfeI (AN849/AN850), pAN65-FRT13-MfeI

(AN851/AN852), pAN65-FRT15-MfeI (AN853/AN854), pAN27-FRT13-MfeI (AN851/AN855) and pAN27-FRT14-MfeI (AN857/AN856). The first three cassettes were Gibson assembled into pAN65 digested with SmaI and NheI, to create pAN428, pAN429 and pAN430 respectively. The last two cassettes were Gibson assembled into pAN27 partially digested with KpnI (6.5 kb band) to form pAN431 and pAN432 respectively. Next, to create the pSBAC FLIP acceptor plasmids pAN433, pAN434, pAN435, pAN436 and pAN437, the FLIP acceptor cassettes were first PCR amplified using primers AN416 and AN858 from the templates pAN428, pAN429, pAN430, pAN431 and pAN432 respectively. To create the final plasmids, the FLIP acceptor cassettes were then Gibson assembled into pAN146 digested with NheI and KfII, together with a fragment containing int, TraJ and oriT that was PCR amplified from pAN146 using the primers AN876 and AN877. The Gibson assembly reaction was transformed into EPI300 and plated on

LB/apramycin plates to obtain colonies that were verified by plasmid extraction and sequencing.

104

To construct the TAR acceptor plasmid pAN438, the following cassettes were PCR amplified: 1) left homology arm (HO-L) was amplified using primers AN859 and AN884, with

LW2591Y gDNA as template. 2) right homology arm (HO-R) was amplified using primers

AN885 and AN862, with LW2591Y gDNA as template. 3) LEU2 gene was amplified using primers AN863 and AN890, with pAN146 as template. 4) CEN6 (no ARS) was amplified using primers AN891 and AN866, with pAN146 as template. The 4 cassettes were Gibson assembled into pAN84 digested with NheI and HindIII, to form pAN438.

To construct the second pSBAC-Streptomyces spp. integrative plasmid pAN446, the following cassettes were PCR amplified: 1) Kanamycin resistance marker (aphII) was amplified using primers AN892 and AN893, with pGM190 (Addgene, #69614) as template. 2) Fd terminator was amplified using primers AN894 and AN895, with gblock-pAN84-tFd as template. 3) attP/int from the phage φC31 was PCR amplified using primers AN896 and AN897, with pSET152 (kind gift from Alessandra Eustaquio) as template. The cassettes were Gibson assembled into pAN84 digested with AvrII, to form pAN446.

Construction of the FLIP system. pAN15 containing the FRT(3) site was created by site- directed mutagenesis, following the same protocol as above but using primers AN15 and AN16 to amplify pJ33R. pAN20 containing the N terminal HA-tagged Flp recombinase under control of the GALL promoter was constructed by PCR of Flp recombinase from pFL119 using the primers AN47 and AN48. The product was amplified by a second round of PCR using the primers AN49 and AN48. This insert was digested with BamHI and SalI, and ligated into p416GALL digested with the same pair of enzymes, to form pAN20. To construct pAN27, the

105

URA3 gene was first PCR amplified with the primers AN104 and AN106 to remove the start codon and create the FRT-WT site. The PCR product was amplified by a second round of PCR with the primers AN105 and AN106. This product was digested with BamHI and SalI and ligated into p415MET25 digested with the same enzymes, to form pAN27. To construct pAN29, the TRP1 gene was first PCR amplified with primers AN107 and AN108 to remove the start codon and create the FRT-WT site. The PCR product was amplified by a second round of PCR with the primers AN105 and AN108. This product was digested with BamHI and SalI, and subsequently ligated into p415MET2591 that was digested with the same set of enzymes, to form pAN29. To construct pAN53, the pGAL – HA – Flp – tCYC insert was PCR amplified from pAN20 using the primers AN128 and AN138. The product was purified and used as template for a second round of PCR using primers AN137 and AN138, followed by digestion with KpnI. This was ligated into the vector that was partially digested with KpnI (cut once), to form pAN53. To construct pAN65, the insert containing the FRT(3) site was first PCR amplified from pAN15 using primers AN231 and AN232. The vector pAN29 was partially digested with KpnI (cut once), gel purified, and ligated with the FRT(3) insert to form pAN65. To construct pAN133

(pAN53 but no HA tag on the Flp recombinase), the vector pAN53 was partially digested with

NdeI (cut once). The insert was created by PCR extension of primers NJ8 and NJ9. This was

Gibson assembled into the partially digested vector to form pAN133.

FLIP protocol. To perform FLIP, the shuttle plasmid containing the compatible FRT sites and FLIP markers were transformed into the yeast strain and selected on SC(L-) plates. FLIP shuttle plasmids for the various fragment-containing strains are: pAN435 (F1-R10-cr10), pAN437 (F2-R14-cr9), pAN434 (F3-R12-cr23-1), pAN436 (F3-R12-cr23-2), pAN146 /or

106 pAN433 (F4-R8-cr6-cr7). About 40 transformant colonies (1 – 2 plates) were scraped with water, pelleted and washed once with sterile water, pelleted again and resuspended in 1 mL of sterile water to create the main cell stock. The OD600 was measured by diluting the cell stock 50 times (20 µL of cells in 980 µL sterile water). The cell stock was diluted to create three 1 mL starter cultures of final OD600 = 1. The final media for the three starter cultures were: t0 = sterile water, t24-Gal = SC(L-, 2 % galactose, 2 % raffinose), t24-Glu = SC(L-, 2 % glucose). The t0 culture was diluted and plated on selective (for the FLIP marker in the chromosome) and non- selective (SC) plates to calculate the efficiency of FLIP at t0. The other two cultures were incubated at 30 ºC in a 230 rpm shaker for 24 hours, and plated on selective and non-selective plates to calculate FLIP efficiencies in yeast with the induction of Flp recombinase (t24-Gal) and without Flp induction (t24-Glu).

Checking gene expression via RT-PCR. RNA was extracted from S. lividans strains K4-114 and K4-114-F4 using the RNeasy Mini Kit and gDNA eliminator columns (Qiagen, 74104) according to the manufacturer’s recommendations. The RNA was treated with DNaseI (NEB,

M0303) to remove contaminating DNA. Reverse transcription was performed using SuperScript

II Reverse Transcriptase (Invitrogen, 18064014). PCR was performed to detect merE using

GoTaq polymerase and primers DD13 and DD14.

107

2.5 Assembly PCR, Check PCR and CRISPR tables

Table 2-2. Assembly PCRs for Fragment 1.

Annealing Rev Size Extension Round* Fwd primer Fwd position Rev primer temperature Template Strain (AKA) position (bp) time (min) (ºC) F1-R1 1 AN133 31 AN134 1057 1027 58 0:30 pAN38 (AN43Y) 2 (PCR 1) AN217 1022 AN219 1363 342 52.5 0:15 pSE34 2 (PCR 2)** AN218 1002 AN219 1363 362 53 0:15 2 (PCR 1) F1-R2 3 EC1 1327 EC2 3424 2098 52 1:00 pHLW30 F1-R3 4-1*** EC6 3331 KV18 5298 1968 54 1:00 pHLW30 F1-R4 4-2 KV19 5157 KV20 7159 2003 54 1:00 pHLW30 5-1 KV21 7024 KV22 9057 2034 54 1:00 pHLW30 F1-R5 5-2 KV23 8802 KV24 10771 1970 54 1:00 pHLW30 6-1 KV29 10574 KV30 12378 1805 54.3 1:00 pHLW30 F1-R6 108 6-2 KV31 12185 KV32 13874 1690 53 1:00 pHLW30

7-1 KV56 13497 KV40 15729 2233 54 1:00 pHLW30 F1-R7 7-2 KV55 15494 KV42 17414 1921 52.5 1:00 pHLW30 8-1 KV47 17232 KV48 20010 2779 53.9 1:20 pHLW30 F1-R8 8-2 KV49 19933 KV50 21831 1899 53.6 1:00 pHLW30 9 KV65 21691 KV66 23205 1515 55 0:45 pHLW30 F1-R9 10 (PCR 1) KV70 23179 KV71 24643 1465 52.5 0:45 pACYC184 10 (PCR 2) KV72 23140 KV73 24675 1536 55 0:45 10 (PCR 1) 10 (PCR 3) KV74 23101 KV73 24675 1575 55 0:45 10 (PCR 2) 10 (PCR 4)** KV75 23101 KV73 24675 1575 55 0:45 10 (PCR 3) F1-R10 cr10 see Table 2-11 F1-R10-cr10 * The first number corresponds to the round of Reiterative Recombination. ** For Round 2, PCR2 was used for Reiterative Recombination, whereas for Round 10, PCR 4 was used. These blocks were created by multiple rounds of PCR, using the previous product as template for the next PCR. *** The labels 4-1 and 4-2 mean that two blocks were assembled together in round 4 of Reiterative Recombination.

Table 2-3. Assembly PCRs for Fragment 2.

Fwd Fwd Rev Rev Size Annealing Extension Round Template Strain (AKA) primer position primer position (bp) temperature (ºC) time (min) 1-1* AN100 NA AN101 NA NA 50 0:15 pAN15 F2-R1 (AN23Y) 1-2* AN102 NA AN103 1353 NA 50 0:15 pSE34 2 AN109 1324 AN146 3315 1992 54 1:00 pHLW30 F2-R2 (AN41Y) 3 AN155 3243 AN156 5315 2073 57 1:00 pHLW30 F2-R3 (AN46Y) 4 AN164 5206 AN165 7197 1992 55 1:00 pHLW30 F2-R4 (AN51Y) 5 AN168 6942 AN169 9172 2231 55 1:00 pHLW30 F2-R5 (AN57Y) 6 AN171 8939 AN172 10970 2032 55 1:00 pHLW30 F2-R6 (AN60Y) 7 AN188 10747 AN189 12752 2006 55 1:00 pHLW30 F2-R7 (AN62Y) 8 AN215 12482 AN216 14527 2046 55 1:00 pHLW30 F2-R8 (AN63Y) 9 AN229 14297 AN230 16292 1996 55 1:00 pHLW30 F2-R9 10 AN241 16061 AN242 18087 2027 56 1:00 pHLW30 F2-R10 11 AN249 17945 AN250 19916 1972 56 1:00 pHLW30 F2-R11 12 AN273 19658 AN274 21682 2025 55 1:00 pHLW30 F2-R12

109 13-1 AN277 21480 AN345 22705 1226 55 0:40 pHLW30 F2-R13 13-2** AN346 NA AN347 NA NA 54 0:40 pAN38 14 (PCR 1) AN485 22668 AN553 23634 967 56 0:30 pACYC184 F2-R14 14 (PCR 2)** AN487 22648 AN554 23661 1014 55 0:30 14 (PCR 1) cr9-a (PCR 2), cr9*** AN531 -33 AN492 1118 1152 57 0:30 F2-R14-cr9 cr9-b (PCR 3) cr9-a (PCR 1) AN133 1 AN532 781 781 55 0:25 pAN39 cr9-a (PCR 2) AN531 -33 AN532 781 815 55 0:25 cr9-a (PCR 1) LW2591Y cr9-b (PCR 1) AN533 730 AN534 1026 297 55 0:15 gDNA cr9-b (PCR 2) AN533 730 AN535 1044 315 55 0:15 cr9-b (PCR 1) cr9-b (PCR 3) AN533 730 AN524 1078 349 55 0:15 cr9-b (PCR 2) * FRT-3 and ermE* promoter were assembled in Round 1, CRISPR (cr9) was performed to replace FRT-3 with FRT-WT, the TRP1 marker and FRT-15. ** The ampR (ampicillin resistance) gene was originally assembled in Round 13, and replaced by cat (chloramphenicol resistance gene) in Round 14. *** Repair DNA block for CRISPR (cr9), formed by fusion PCR of cr9-a (PCR 2) and cr9-b (PCR 3).

Table 2-4. Assembly PCRs for Fragment 3.

Annealing Fwd Fwd Rev Rev Size Extension Round temperature Template Strain (AKA) primer position primer position (bp) time (min) (ºC) 1* AN133 31 AN134 1057 1027 58 0:30 pAN38 F3-R1 (AN43Y) 2 (PCR 1)* AN217 NA AN220 1475 NA 52.5 0:15 pSE34 2 (PCR 2)* AN218 NA AN220 1475 NA 52.5 0:15 2 (PCR 1) F3-R2 3 KV1 1435 KV2 3462 2028 52 1:00 pHLW30 F3-R3 4-1 KV4 3296 JK1 5305 2010 54 1:00 pHLW30 F3-R4 4-2 JK2 5122 JK3 7124 2003 53 1:00 pHLW30 5-1 JK33 6863 JK34 8879 2017 54 1:00 pHLW30 F3-R5 5-2 JK7 8704 JK8 10705 2002 53 1:00 pHLW30 see Table cr4 F3-R5-cr4 2-11 6-1 JK40 10364 JK41 12329 1966 54.5 1:00 pHLW30 F3-R6 6-2 JK42 12211 JK43 14483 2273 54.5 1:00 pHLW30

110 7 JK67 14151 JK68 17497 3347 55.5 1:45 pHLW30 F3-R7

8-1 JK69 17354 JK59 19286 1933 54.5 1:00 pHLW30 F3-R8 8-2 JK71 19092 JK72 20978 1887 56.5 1:00 pHLW30 9 JK91 20744 JK76 22986 2243 54 1:00 pHLW30 F3-R9 10-1 JK92 22790 JK94 25253 2464 54.5 1:10 pHLW30 F3-R10 10-2 JK95 25066 JK87 27386 2321 54.5 1:10 pHLW30 see Table cr12 F3-R10-cr12 2-11 cr15-a (PCR 3), cr15- cr15 repair*** AN531 -33 AN492 1238 1272 57 0:40 F3-R10-cr12-cr15 b (PCR 1) cr15-a (PCR 1) AN613 43 AN681 644 602 55 0:20 pLW2594 cr15-a (PCR 2) AN614 36 AN681 644 609 55 0:20 cr15-a (PCR 1) cr15-a (PCR 3) AN133 1 AN681 644 644 55 0:20 cr15-a (PCR 2) cr15-b (PCR 1) AN682 614 AN492 1238 625 55 0:20 gblock ptKanR 11-1 JK98 27076 JK100 29742 2667 55 1:15 pHLW30 F3-R11 11-2 JK130 29612 JK131 31706 2095 56 1:00 pHLW30 12-1 JK111 31546 JK112 33899 2354 56 1:10 pHLW30 F3-R12 12-2 JK113 33747 JK114 35050 1304 57 0:40 pHLW30 12-3** JK115 34912 JK120 36365 1454 54 0:45 12-3a, 3b, 3c

12-3a JK115 34912 JK116 35225 314 55.5 0:15 gblock ptKanR Acidophil's plasmid 12-3b JK117 35196 JK118 36206 1011 56 0:30 with ant3ia gene 12-3c JK119 36178 JK120 36365 188 54 0:15 gblock ptKanR cr23-1a, cr23-1b, F3-R12-cr23-1 cr23-1 repair**** AN910 NA AN916 NA 1189 57 0:35 cr23-1c (AN501Y) cr23-1a AN910 NA AN911 NA 200 55 0:15 F1-R10-cr10 gDNA cr23-1b AN913 NA AN914 NA 822 55 0:20 F1-R10-cr10 gDNA cr23-1c AN915 NA AN916 NA 226 54 0:15 F3 gDNA cr23-2a, cr23-2b, F3-R12-cr23-2 cr23-2 repair**** AN910 NA AN916 NA 1060 57 0:35 cr23-2c (AN503Y) cr23-2a AN910 NA AN912 NA 208 55 0:15 F2-R14-cr9 gDNA cr23-2b AN917 NA AN918 NA 691 55 0:20 F2-R14-cr9 gDNA cr23-2c AN919 NA AN916 NA 231 54 0:15 F3 gDNA * The URA3 marker was originally assembled in Round 1. CRISPR (cr15) was performed later to convert the URA3 marker to KanMX and add FRT-14. ** 12-3 was built by PCR fusion of 12-3a, 12-3b and 12-3c.

111 *** Repair DNA block for CRISPR (cr15), formed by fusion PCR of cr15-a (PCR 3) and cr15-b (PCR 1).

**** The repair fragments for cr23-1 and cr23-2 were formed by fusion PCR of three PCR blocks.

Table 2-5. Assembly PCRs for Fragment 4.

Fwd Rev Rev Size Annealing Extension Round Fwd primer positio Template Strain (AKA) primer position (bp) temperature (ºC) time (min) n 1 AN133 31 AN134 1057 1027 58 0:30 pAN38 F4-R1 (AN43Y) 2 (PCR 1)* AN217 NA AN221 1407 NA 52.5 0:15 pSE34 2 (PCR 2)* AN218 NA AN221 1407 NA 52.5 0:15 2 (PCR 1) F4-R2 3 DD1 1367 DD3 3367 2001 54 1:00 pHLW30 F4-R3 4 DD5 3198 DD6 5211 2014 56 1:00 pHLW30 F4-R4 5 DD9 5047 DD10 7292 2246 55 1:00 pHLW30 F4-R5 6-1 DD18 7105 DD20 9136 2032 54 1:00 pHLW30 F4-R6 6-2 DD21 9005 DD22 10942 1938 53 1:00 pHLW30 7-1 DD30 10820 DD31 13093 2274 53.8 1:00 pHLW30 F4-R7 7-2 DD32 12894 DD33 14972 2079 53.5 1:00 pHLW30 8** NA 14769 NA 19681 4913 NA NA NA F4-R8 8a (Gibson) DD51 14769 DD39 16869 2101 55 1:00 pHLW30 112 8b (Gibson) DD52 16850 DD41 18667 1818 55 1:00 pHLW30

8c (Gib-pAN68) DD42 18607 DD54 19681 1075 54 0:30 pAN38 cr6 see Table 2-11 F4-R8-cr6 cr7 repair*** AN489 1008 AN492 1170 163 55 0:15 cr7 (PCR 1) F4-R8-cr6-cr7 cr7 (PCR 1) AN490 1046 AN491 1130 85 55 0:15 Fusion PCR * CRISPR (cr7) was performed to insert FRT-13. ** In Round 8, the 3 building blocks (8a, 8b, 8c) were Gibson assembled into pAN68 digested with SalI. *** Repair DNA block for CRISPR (cr7).

Table 2-6. PCRs to verify Fragment 1.

Fwd Fwd Rev Annealing Extension No. Rev position Size (bp) primer position primer temperature (ºC) time (min) 1 AN348 -7 AN219 1367 1375 55 0:40 2 AN19 1064 KV17 2374 1311 55 0:40 3 EC8 2276 EC9 4248 1973 56 1:00 4 KV15 4109 KV16 6125 2017 55 1:00 5 KV25 5965 KV26 7959 1995 55 1:00 6 KV27 7881 KV28 9879 1999 55 1:00 7 KV94 9736 KV95 11667 1932 56 1:00 8 AN239 11490 KV96 13184 1695 56 0:45 9 KV51 13002 KV52 15040 2039 56 1:00 10 KV53 14998 KV54 16755 1758 54.5 1:00

113 11 KV60 16525 KV61 18471 1947 56 1:00 12 KV62 18361 KV63 20794 2434 57 1:00

13 KV64 20569 KV50 21831 1263 53 0:35 14 KV65 21691 KV66 23205 1515 56 0:45 15 KV75 23101 AN272 24983 1883 54.5 0:45

Table 2-7. PCRs to verify Fragment 2.

Fwd Fwd Rev Rev Size Annealing Extension No. primer position primer position (bp) temperature (ºC) time (min) 1 AN348 -7 AN103 1353 1361 55 0:45 2 AN580 1149 AN146 3315 2167 55 1:00 3 AN173 2088 AN145 4080 1993 55 1:00 4 AN144 3969 AN175 5759 1791 55 1:00 5 AN164 5206 AN187 7707 2502 55 1:00 6 AN223 7399 AN224 9692 2294 55 1:00 7 AN245 9380 AN246 11626 2247 55 1:00 8 AN227 11569 AN228 13570 2002 55 1:00 9 AN239 13465 AN240 15497 2033 55 1:00 10 AN247 15389 AN248 17446 2058 55 1:00

114 11 AN267 17220 AN268 19232 2013 55 1:00 12 AN275 18997 AN276 20849 1853 55 1:00

13 AN364 20614 AN365 22071 1458 55 0:45 14 AN366 21953 AN272 23975 2023 55 1:00

Table 2-8. PCRs to verify Fragment 3.

Fwd Fwd Rev Size Annealing Extension No. Rev primer primer position position (bp) temperature (ºC) time (min) 1 AN348 -7 AN220 1475 1483 55 0:30 2 AN580 1269 KV3 3084 1816 54.5 1:00 3 KV6 2924 KV7 4565 1642 55 1:00 4 KV10 4398 KV11 6418 2021 55 1:00 5 JK35 5928 JK36 7506 1579 55 0:45 6 JK4 6954 JK38 9372 2419 55.5 1:10 7 JK44 9175 JK45 11677 2503 56.4 1:15 8 JK46 11610 JK47 13648 2039 56 1:00 9 JK62 13435 JK63 14937 1503 55 0:45 10 JK64 14814 JK65 16612 1799 55 1:00

115 11 JK66 16418 JK53 18197 1780 53.8 1:00 12 JK73 17848 JK74 19743 1896 56 1:00

13 JK81 19517 JK82 21845 2329 56 1:00 14 JK83 21761 JK84 23689 1929 55 1:00 15 JK85 23555 JK96 25987 2433 55 1:00 16 JK97 25859 JK107 27762 1904 56 1:00 17 JK134 27488 JK135 28869 1382 56 0:45 18 JK104 28747 JK105 30654 1908 55.5 1:00 19 JK106 30426 JK121 32420 1995 56.5 1:00 20 JK122 32240 JK123 34372 2133 55.5 1:00 21 JK124 34270 AN272 36658 2389 55 1:15

Table 2-9. PCRs to verify Fragment 4.

Fwd Fwd Rev Size Annealing Extension No. Rev position primer position primer (bp) temperature (ºC) time (min) 1 AN348 -7 AN106 885 893 55 0:30 2 AN184 463 AN221 1407 945 55 0:30 3 AN580 1201 DD4 3009 1809 55 1:00 4 DD7 2900 DD8 4558 1659 54 1:00 5 DD11 4319 DD12 6343 2025 57 1:00 6 DD29 5779 DD28 8337 2559 55 1:00 7 DD26 8103 DD27 10059 1957 56 1:00 8 DD44 9856 DD45 12026 2171 55.5 1:00 9 DD36 11746 DD37 13929 2184 56 1:00 10 DD46 13722 DD47 15724 2003 56.5 1:00

1 11 DD48 15535 DD49 17828 2294 55.5 1:00 16 12 DD50 17696 AN272 20097 2402 54.5 1:00

Table 2-10. CRISPR details and gRNA primers.

CRISPR gRNA R gRNA F Fragment Position Function Locus Target strand no. primer primer

cr14 1 15707 Correct mutation: A to G merA-M2-AT Antisense AN609 AN610

cr14-2 1 15707 Correct mutation: A to G merA-M2-AT Sense AN693 AN694

cr10 1 21005 Correct mutation: C to T merA-M3-AT Antisense AN542 AN543

Replace FRT-3 with FRT-WT, cr9 2 31 – 1059 ermE* promoter Sense AN525 AN526 TRP1 and FRT-15

merC-M9-AT and KR cr4 3 9058 Correct mutation: T to C Sense AN339 AN340 linker

cr12-1 3 NA Correct mutation: insert A HO endonuclease cut site Sense AN599 AN600

Replace URA3 with KanMX, FRT- 117 cr15 3 79 – 1179 ermE* promoter Sense AN611 AN612 14

cr23-1 3 79 - 885 Replace KanMX with URA3 KANMX Sense AN908 AN909

cr23-2 3 79 - 885 Replace KanMX with TRP1 KANMX Sense AN908 AN909

cr6 4 13416 Correct mutation: T to C merQ Antisense AN396 AN397

1064 – cr7 4 Insert FRT-13 ermE* promoter Sense AN493 AN494 1111

Replace ermE* promoter with cr16b-F1 1 NA ermE* promoter Sense EC36 EC37 library

Replace ermE* promoter with cr16b-F2 2 NA ermE* promoter Sense EC36 EC37 library

Replace ermE* promoter with cr16b-F3 3 NA ermE* promoter Sense EC36 EC37 library

Replace ermE* promoter with cr16b-F4 4 NA ermE* promoter Sense EC36 EC37 library

Table 2-11. CRISPR details and repair DNA primers.

Repair fragment Repair fragment length CRISPR no. PCR template for repair Comments primers (bp)

Failed to correct. gRNA sequence is cr14 KV41, KV76 pHLW30 247 identical to merA-M1-AT except for 9th base from the PAM sequence

Failed to correct. gRNA sequence is cr14-2 AN695, AN696 Fusion pcr 97 identical to merA-M1-AT except for 1st base from the PAM sequence

cr10 AN544, AN545 Fusion pcr 90 Corrected

cr9 see Table 2-3 see Table 2-3 1151 Inserted

cr4 AN341, AN342 Fusion pcr 98 Corrected

118 cr12-1 AN605, AN606 Fusion pcr 91 Corrected

cr15 see Table 2-4 see Table 2-4 1272 Inserted

cr23-1 see Table 2-4 see Table 2-4 1189 Inserted

cr23-2 see Table 2-4 see Table 2-4 1060 Inserted

cr6 AN374, AN375 Fusion pcr 100 Corrected

cr7 see Table 2-5 see Table 2-5 163 Inserted

cr16b-F1 EC46, EC53 gblocks F1-kasOp/SF14p/p21/B4p/p72 Various Inserted

cr16b-F2 EC46, EC52 gblocks F2-kasOp/SF14p/p21/B4p/p72 Various Inserted

cr16b-F3 EC46, EC54 gblocks F3-kasOp/SF14p/p21/B4p/p72 Various Inserted

cr16b-F4 EC46, EC47 gblocks F4-kasOp/SF14p/p21/B4p/p72 Various Inserted

2.6 Strains and plasmids

Table 2-12. Strains used in this study.

Name (AKA) Genotype / Details Source Wingler and Cornish, LW2591Y Reiterative Recombination parental acceptor 2011 AN61Y AN43Y / FRT-3 site (FRT-WT – URA3 – FRT-3) This study F1-R1 (AN43Y) This study F1-R2 This study F1-R3 This study F1-R4 This study F1-R5 This study F1-R6 Fragment 1 (merP, merA) containing strains This study F1-R7 This study F1-R8 This study F1-R9 This study F1-R10 This study F1-R10-cr10 This study F2-R1 (AN23Y) This study F2-R2 (AN41Y) This study F2-R3 (AN46Y) This study F2-R4 (AN51Y) This study F2-R5 (AN57Y) This study F2-R6 (AN60Y) This study F2-R7 (AN62Y) This study F2-R8 (AN63Y) Fragment 2 (merB) containing strains This study F2-R9 This study F2-R10 This study F2-R11 This study F2-R12 This study F2-R13 This study F2-R14 This study F2-R14-cr9 This study F3-R1 (AN43Y) This study Fragment 3 (merC) containing strains F3-R2 This study

119

F3-R3 This study F3-R4 This study F3-R5 This study F3-R5-cr4 This study F3-R6 This study F3-R7 This study F3-R8 This study F3-R9 This study F3-R10 This study F3-R10-cr12 This study F3-R10-cr12- This study cr15 F3-R11 This study F3-R12 This study F3-R12-cr23-1 This study (AN501Y) F3-R12-cr23-2 This study (AN503Y) F4-R1 (AN43Y) This study F4-R2 This study F4-R3 This study F4-R4 This study F4-R5 This study Fragment 4 (merE - merT) containing strains F4-R6 This study F4-R7 This study F4-R8 This study F4-R8-cr6 This study F4-R8-cr6-cr7 This study AN367Y F1-cr16b-p72-c5 This study AN368Y F1-cr16b-p72-c11 This study AN369Y F1-cr16b-kasOp*-c1 This study AN370Y F1-cr16b-kasOp*-c3 This study AN371Y F1-cr16b-SF14p-c2 This study AN372Y F1-cr16b-SF14p-c13 This study AN373Y F1-cr16b-B4p-c4 This study AN374Y F1-cr16b-B4p-c9 This study

120

AN375Y F1-cr16b-p21-c10 This study AN376Y F1-cr16b-p21-c14 This study AN356Y F2-cr16b-SF14p-c1 This study AN357Y F2-cr16b-SF14p-c2 This study AN358Y F2-cr16b-kasOp-c1 This study AN359Y F2-cr16b-kasOp-c10 This study AN360Y F2-cr16b-p21-c5 This study AN361Y F2-cr16b-p21-c9 This study AN362Y F2-cr16b-p72-c4 This study AN363Y F2-cr16b-p72-c8 This study AN364Y F2-cr16b-B4p-c3 This study AN365Y F2-cr16b-B4p-c6 This study AN306Y F3-cr16b-kasOp-c1 This study AN307Y F3-cr16b-kasOp-c50 This study AN309Y F3-cr16b-SF14p-c7 This study AN310Y F3-cr16b-SF14p-c15 This study AN350Y F3-cr16b-p72-c17 This study AN351Y F3-cr16b-p72-c26 This study AN352Y F3-cr16b-B4p-c18 This study AN353Y F3-cr16b-B4p-c20 This study AN354Y F3-cr16b-p21-c8 This study AN355Y F3-cr16b-p21-c13 This study AN248Y F4-cr16b-p72-c1 This study AN249Y F4-cr16b-p72-c48 This study AN250Y F4-cr16b-SF14p-c19 This study AN251Y F4-cr16b-SF14p-c27 This study AN252Y F4-cr16b-B4p-c24 This study AN253Y F4-cr16b-B4p-c38 This study AN254Y F4-cr16b-p21-c12 This study AN255Y F4-cr16b-p21-c35 This study AN256Y F4-cr16b-kasOp-c26 This study AN257Y F4-cr16b-kasOp-c37 This study K4-114 Streptomyces lividans host strain Liu et al., 2009 / Pfizer E7 K4-114 / pHLW30 conjugant (meridamycin producer) Liu et al., 2009 / Pfizer K4-114-F4 S. lividans strain with Fragment 4 This study

121

Table 2-13. Plasmids used in this study.

Source/ Name Details Reference

Reiterative Recombination pLW2592 Reiterative Recombination odd round donor plasmid Wingler and pLW2593 Reiterative Recombination even round donor plasmid Cornish, pLW2594 Reiterative Recombination Round 1 donor plasmid 2011 Reiterative Recombination even round donor plasmid containing FRT pAN68 This study (3) site

FLIP system ATCC p415MET25 E. coli - S. cerevisiae shuttle plasmid #87322 ATCC p416GALL E. coli - S. cerevisiae shuttle plasmid #87340 pFL119 Flp recombinase gene G. Struhl pJ33R FRT-WT flanking Drosophila DNA G. Struhl pAN15 pJ33R / site-directed mutagenesis to create FRT-3 This study pAN20 p416GALL with N terminus HA-tagged Flp This study pAN27 p415MET25 with AUG-FRT(WT)-URA3 This study pAN29 p415MET25 with AUG-FRT(WT)-TRP1 This study pAN65 p415MET25 with AUG-FRT(WT)-TRP1-FRT(3) This study pAN53 p415MET25 with AUG-FRT(WT)-TRP1-FRT(3) and pGAL-HA-Flp This study pAN133 pAN53 / remove HA tag in Flp1 This study pAN428 pAN65 / FRT(WT)-TRP1-FRT(3)-MfeI This study pAN429 pAN65 / FRT(WT)-TRP1-FRT(13)-MfeI This study pAN430 pAN65 / FRT(WT)-TRP1-FRT(15)-MfeI This study pAN431 pAN27 / FRT(WT)-URA3-FRT(13)-MfeI This study pAN432 pAN27 / FRT(WT)-URA3-FRT(14)-MfeI This study pSBAC acceptors Liu et al., E. coli - Streptomyces spp. shuttle Bacterial Artificial Chromosome pSBAC 2009 / conjugation vector Pfizer Inc. pAN8 pSBAC / Site-directed mutagenesis to remove NheI site in int This study pAN12 pAN8 / Site-directed mutagenesis to remove MfeI site in parA This study pAN84 pAN12 / Insert multiple cloning site containing NheI and MfeI This study

122

pAN84 / Insert Flp acceptor module FRT(WT)-TRP1-FRT(3), pAN98 This study CEN/ARS, LEU2 marker pAN146 pAN98 / FRT(WT)-TRP1-FRT(3) / Insert pGAL-Flp1-tACT This study pAN433 pAN146 / Replace Flp acceptor with FRT(WT)-TRP1-FRT(3)-MfeI This study pAN434 pAN146 / Replace Flp acceptor with FRT(WT)-TRP1-FRT(13)-MfeI This study pAN435 pAN146 / Replace Flp acceptor with FRT(WT)-TRP1-FRT(15)-MfeI This study pAN436 pAN146 / Replace Flp acceptor with FRT(WT)-URA3-FRT(13)-MfeI This study pAN437 pAN146 / Replace Flp acceptor with FRT(WT)-URA3-FRT(14)-MfeI This study pAN438 TAR acceptor plasmid with NheI site to linearize This study pAN84 / attP-int from phage ɸC31 / kanamycin marker (aphII) / Fd pAN446 This study phage terminator pSBAC with pathway fragments pAN285 pAN146 / Fragment 4 (merE - merT) This study pAN340 pAN146 / F4-SF14p-c1 This study pAN341 pAN146 / F4-SF14p-c5 This study pAN342 pAN146 / F4-B4p-c3 This study pAN343 pAN146 / F4-B4p-c12 This study pAN344 pAN146 / F4-p72-c4 This study pAN345 pAN146 / F4-p72-c8 This study pAN346 pAN146 / F4-kasOp-c15 This study pAN347 pAN146 / F4-kasOp-c23 This study pAN348 pAN146 / F4-p21-c25 This study pAN349 pAN146 / F4-p21-c27 This study pAN472 pAN435 / Fragment 1 (merP, merA) This study pAN475 pAN437 / Fragment 2 (merB) This study TAR1-C7-c1 pAN438 / Fragment 3 (merC) This study plig5 pAN84 / F1 (merA) / F2 (merB) This study

123

2.7 References

1 Yadav, V. G., De Mey, M., Lim, C. G., Ajikumar, P. K. & Stephanopoulos, G. The future of metabolic engineering and synthetic biology: towards a systematic practice. Metabolic engineering 14, 233-241 (2012). 2 Gibson, D. G. et al. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science 319, 1215-1220, doi:10.1126/science.1151721 (2008). 3 Kodumal, S. J. et al. Total synthesis of long DNA sequences: synthesis of a contiguous 32-kb polyketide synthase gene cluster. Proceedings of the National Academy of Sciences of the United States of America 101, 15573-15578, doi:10.1073/pnas.0406911101 (2004). 4 Khosla, C. Structures and mechanisms of polyketide synthases. J Org Chem 74, 6416-6420, doi:10.1021/jo9012089 (2009). 5 Kao, C. M., Katz, L. & Khosla, C. Engineered biosynthesis of a complete macrolactone in a heterologous host. Science 265, 509-512 (1994). 6 Waksman, S. A., Lechevalier, H. A. & Schaffner, C. P. Candicidin and other polyenic antifungal antibiotics. Bull World Health Organ 33, 219-226 (1965). 7 Thomson, A. W. & Woo, J. Immunosuppressive properties of FK-506 and rapamycin. Lancet 2, 443-444 (1989). 8 Campbell, W. C. History of avermectin and ivermectin, with notes on the history of other macrocyclic lactone antiparasitic agents. Curr Pharm Biotechnol 13, 853-865 (2012). 9 Nicolaou, K. C., Chakraborty, T. K., Piscopio, A. D., Minowa, N. & Bertinato, P. Total synthesis of rapamycin. Journal of the American Chemical Society 115, 4419-4420, doi:10.1021/ja00063a093 (1993). 10 Ley, S. V. et al. Total synthesis of rapamycin. Chemistry 15, 2874-2914, doi:10.1002/chem.200801656 (2009). 11 Maddess, M. L. et al. Total synthesis of rapamycin. Angewandte Chemie 46, 591-597, doi:10.1002/anie.200604053 (2007). 12 Carter, G. T. in Drug Discovery from Natural Products (ed O. and Vicente Genilloud, F.) Ch. 5, (The Royal Society of Chemistry, 2012). 13 Liu, H. et al. Rapid cloning and heterologous expression of the meridamycin biosynthetic gene cluster using a versatile Escherichia coli-streptomyces artificial chromosome vector, pSBAC. Journal of natural products 72, 389-395, doi:10.1021/np8006149 (2009). 14 Bentley, R. & Bennett, J. W. Constructing polyketides: from collie to combinatorial biosynthesis. Annu Rev Microbiol 53, 411-446, doi:10.1146/annurev.micro.53.1.411 (1999).

124

15 Chaudhary, A. K., Dhakal, D. & Sohng, J. K. An insight into the "-omics" based engineering of streptomycetes for secondary metabolite overproduction. Biomed Res Int 2013, 968518, doi:10.1155/2013/968518 (2013). 16 Cobb, R. E., Wang, Y. & Zhao, H. High-efficiency multiplex genome editing of Streptomyces species using an engineered CRISPR/Cas system. ACS synthetic biology 4, 723-728, doi:10.1021/sb500351f (2015). 17 Annaluru, N. et al. Total synthesis of a functional designer eukaryotic chromosome. Science 344, 55-58, doi:10.1126/science.1249252 (2014). 18 Shao, Z., Luo, Y. & Zhao, H. Rapid characterization and engineering of natural product biosynthetic pathways via DNA assembler. Molecular bioSystems 7, 1056-1059, doi:10.1039/c0mb00338g (2011). 19 Wingler, L. M. & Cornish, V. W. Reiterative Recombination for the in vivo assembly of libraries of multigene pathways. Proceedings of the National Academy of Sciences of the United States of America 108, 15135-15140, doi:10.1073/pnas.1100507108 (2011). 20 Ostrov, N., Wingler, L. M. & Cornish, V. W. Gene assembly and combinatorial libraries in S. cerevisiae via reiterative recombination. Methods in molecular biology 978, 187-203, doi:10.1007/978-1-62703-293-3_14 (2013). 21 Storici, F., Durham, C. L., Gordenin, D. A. & Resnick, M. A. Chromosomal site-specific double- strand breaks are efficiently targeted for repair by oligonucleotides in yeast. Proceedings of the National Academy of Sciences of the United States of America 100, 14994-14999, doi:10.1073/pnas.2036296100 (2003). 22 Sun, Y. et al. Organization of the biosynthetic gene cluster in Streptomyces sp. DSM 4137 for the novel neuroprotectant polyketide meridamycin. Microbiology 152, 3507-3515, doi:10.1099/mic.0.29176-0 (2006). 23 Dumont, F. J., Staruch, M. J., Koprak, S. L., Melino, M. R. & Sigal, N. H. Distinct mechanisms of suppression of murine T cell activation by the related macrolides FK-506 and rapamycin. Journal of immunology 144, 251-258 (1990). 24 Summers, M. Y., Leighton, M., Liu, D., Pong, K. & Graziani, E. I. 3-normeridamycin: a potent non-immunosuppressive immunophilin ligand is neuroprotective in dopaminergic neurons. The Journal of antibiotics 59, 184-189, doi:10.1038/ja.2006.26 (2006). 25 Graziani, E. I. & Pong, K. Meridamycin analogues for the treatment of neurodegenerative disorders. US patent US 7745457 B2 (2010).

125

26 Gold, B. G., Armistead, D. M. & Wang, M. S. Non-FK506-binding protein-12 neuroimmunophilin ligands increase neurite elongation and accelerate nerve regeneration. Journal of neuroscience research 80, 56-65, doi:10.1002/jnr.20447 (2005). 27 Guo, X., Dawson, V. L. & Dawson, T. M. Neuroimmunophilin ligands exert neuroregeneration and neuroprotection in midbrain dopaminergic neurons. The European journal of neuroscience 13, 1683-1693 (2001). 28 Edmund Idris Graziani, C. R., NY (US); Kevin Pong, Robbinsville, NJ (US) MERIDAMYCIN ANALOGUES FOR THE TREATMENT OF NEURODEGENERATIVE DISORDERS US 7,745,457 B2 (2010). 29 Turan, S., Zehe, C., Kuehle, J., Qiao, J. & Bode, J. Recombinase-mediated cassette exchange (RMCE) - a rapidly-expanding toolbox for targeted genomic modifications. Gene 515, 1-27, doi:10.1016/j.gene.2012.11.016 (2013). 30 Bibb, M. J., Janssen, G. R. & Ward, J. M. Cloning and analysis of the promoter region of the erythromycin resistance gene (ermE) of Streptomyces erythraeus. Gene 38, 215-226, doi:http://dx.doi.org/10.1016/0378-1119(85)90220-3 (1985). 31 Ziermann, R. & Betlach, M. C. A two-vector system for the production of recombinant polyketides in Streptomyces. J Ind Microbiol Biotech 24, 46-50, doi:10.1038/sj.jim.2900763 (2000). 32 Xue, Q., Ashley, G., Hutchinson, C. R. & Santi, D. V. A multiplasmid approach to preparing large libraries of polyketides. Proceedings of the National Academy of Sciences of the United States of America 96, 11740-11745 (1999). 33 Tang, L., Fu, H. & McDaniel, R. Formation of functional heterologous complexes using subunits from the picromycin, erythromycin and oleandomycin polyketide synthases. Chemistry & Biology 7, 77-84, doi:http://dx.doi.org/10.1016/S1074-5521(00)00073-9 (2000). 34 Pieper, R., Luo, G., Cane, D. E. & Khosla, C. Cell-free synthesis of polyketides by recombinant erythromycin polyketide synthases. Nature 378, 263-266, doi:10.1038/378263a0 (1995). 35 Scherer, S. & Davis, R. W. Replacement of chromosome segments with altered DNA sequences constructed in vitro. Proceedings of the National Academy of Sciences of the United States of America 76, 4951-4955 (1979). 36 DiCarlo, J. E. et al. Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic acids research 41, 4336-4343, doi:10.1093/nar/gkt135 (2013). 37 Sander, J. D. & Joung, J. K. CRISPR-Cas systems for editing, regulating and targeting genomes. Nature biotechnology 32, 347-355, doi:10.1038/nbt.2842 (2014).

126

38 Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821, doi:10.1126/science.1225829 (2012). 39 Fehr, T., Sanglier, Jean-jacques, Schuler, Walter. RAPAMYCIN-LIKE MACROLIDE AND A NEW STRAIN OF STREPTOMYCES WHICH PRODUCES IT. WO/1994/018207 (1994). 40 Salituro, G. M. et al. Meridamycin: A Novel Nonimmunosuppressive FKBP12 Ligand From Streptomyces hygroscopius. Tetrahedron Letters 36, 4 (1995). 41 Schmitt-John, T. & Engels, J. Promoter constructions for efficient secretion expression in Streptomyces lividans. Appl Microbiol Biotechnol 36, 493-498, doi:10.1007/BF00170190 (1992). 42 Alper, H., Fischer, C., Nevoigt, E. & Stephanopoulos, G. Tuning genetic control through promoter engineering. Proceedings of the National Academy of Sciences of the United States of America 102, 12678-12683, doi:10.1073/pnas.0504604102 (2005). 43 Wang, W. et al. An engineered strong promoter for streptomycetes. Applied and environmental microbiology 79, 4484-4492, doi:10.1128/AEM.00985-13 (2013). 44 Labes, G., Bibb, M. & Wohlleben, W. Isolation and characterization of a strong promoter element from the Streptomyces ghanaensis phage I19 using the gentamicin resistance gene (aacC1) of Tn 1696 as reporter. Microbiology 143 ( Pt 5), 1503-1512 (1997). 45 Siegl, T., Tokovenko, B., Myronovskyi, M. & Luzhetskyy, A. Design, construction and characterisation of a synthetic promoter library for fine-tuned gene expression in actinomycetes. Metabolic engineering 19, 98-106, doi:10.1016/j.ymben.2013.07.006 (2013). 46 Bibb, M. J., White, J., Ward, J. M. & Janssen, G. R. The mRNA for the 23S rRNA methylase encoded by the ermE gene of Saccharopolyspora erythraea is translated in the absence of a conventional ribosome-binding site. Molecular Microbiology 14, 533-545, doi:10.1111/j.1365- 2958.1994.tb02187.x (1994). 47 Grindley, N. D., Whiteson, K. L. & Rice, P. A. Mechanisms of site-specific recombination. Annu Rev Biochem 75, 567-605, doi:10.1146/annurev.biochem.73.011303.073908 (2006). 48 Gaj, T., Sirk, S. J. & Barbas, C. F., 3rd. Expanding the scope of site-specific recombinases for genetic and metabolic engineering. Biotechnology and bioengineering 111, 1-15, doi:10.1002/bit.25096 (2014). 49 Schlake, T. & Bode, J. Use of mutated FLP recognition target (FRT) sites for the exchange of expression cassettes at defined chromosomal loci. Biochemistry 33, 12746-12751 (1994). 50 Turan, S., Kuehle, J., Schambach, A., Baum, C. & Bode, J. Multiplexing RMCE: versatile extensions of the Flp-recombinase-mediated cassette-exchange technology. Journal of molecular biology 402, 52-69, doi:10.1016/j.jmb.2010.07.015 (2010).

127

51 Sikorski, R. S. & Hieter, P. A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics 122, 19-27 (1989). 52 Silverman, G. A. Purification of YAC-containing total yeast DNA. Methods in molecular biology 54, 65-68 (1996). 53 Larionov, V. et al. Specific cloning of human DNA as yeast artificial chromosomes by transformation-associated recombination. Proceedings of the National Academy of Sciences of the United States of America 93, 491-496 (1996). 54 Ketner, G., Spencer, F., Tugendreich, S., Connelly, C. & Hieter, P. Efficient manipulation of the human adenovirus genome as an infectious yeast artificial chromosome clone. Proceedings of the National Academy of Sciences of the United States of America 91, 6186-6190 (1994). 55 Erickson, J. R. & Johnston, M. Direct cloning of yeast genes from an ordered set of lambda clones in Saccharomyces cerevisiae by recombination in vivo. Genetics 134, 151-157 (1993). 56 Noskov, V. N. et al. A general cloning system to selectively isolate any eukaryotic or prokaryotic genomic region in yeast. BMC genomics 4, 16, doi:10.1186/1471-2164-4-16 (2003). 57 Kouprina, N., Noskov, V. N., Koriabine, M., Leem, S. H. & Larionov, V. Exploring transformation-associated recombination cloning for selective isolation of genomic regions. Methods in molecular biology 255, 69-89, doi:10.1385/1-59259-752-1:069 (2004). 58 Leem, S. H. et al. Optimum conditions for selective isolation of genes from complex genomes by transformation-associated recombination cloning. Nucleic acids research 31, e29 (2003). 59 Gregory, M. A., Till, R. & Smith, M. C. Integration site for Streptomyces phage phiBT1 and development of site-specific integrating vectors. Journal of bacteriology 185, 5320-5323 (2003). 60 Fayed, B. et al. Multiplexed integrating plasmids for engineering of the erythromycin gene cluster for expression in Streptomyces spp. and combinatorial biosynthesis. Applied and environmental microbiology 81, 8402-8413, doi:10.1128/AEM.02403-15 (2015). 61 Jiang, H. et al. Investigation of the biosynthesis of the pipecolate moiety of neuroprotective polyketide meridamycin. The Journal of antibiotics 64, 533-538, doi:10.1038/ja.2011.45 (2011). 62 Graziani, E. & Pong, K. (Google Patents, 2005). 63 Menzella, H. G. et al. Combinatorial polyketide biosynthesis by de novo design and rearrangement of modular polyketide synthase genes. Nature biotechnology 23, 1171-1176, doi:10.1038/nbt1128 (2005). 64 Kealey, J. T., Liu, L., Santi, D. V., Betlach, M. C. & Barr, P. J. Production of a polyketide natural product in nonpolyketide-producing prokaryotic and eukaryotic hosts. Proceedings of the National Academy of Sciences of the United States of America 95, 505-509 (1998).

128

65 Mutka, S. C., Bondi, S. M., Carney, J. R., Da Silva, N. A. & Kealey, J. T. Metabolic pathway engineering for complex polyketide biosynthesis in Saccharomyces cerevisiae. FEMS Yeast Res 6, 40-47, doi:10.1111/j.1567-1356.2005.00001.x (2006). 66 Gao, X., Wang, P. & Tang, Y. Engineered polyketide biosynthesis and biocatalysis in Escherichia coli. Appl Microbiol Biotechnol 88, 1233-1242, doi:10.1007/s00253-010-2860-4 (2010). 67 Pfeifer, B. A., Admiraal, S. J., Gramajo, H., Cane, D. E. & Khosla, C. Biosynthesis of complex polyketides in a metabolically engineered strain of E. coli. Science 291, 1790-1792, doi:10.1126/science.1058092 (2001). 68 Stemmer, W. P., Crameri, A., Ha, K. D., Brennan, T. M. & Heyneker, H. L. Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene 164, 49-53 (1995). 69 Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6, 343-345, doi:10.1038/nmeth.1318 (2009). 70 Kuijpers, N. G. et al. A versatile, efficient strategy for assembly of multi-fragment expression vectors in Saccharomyces cerevisiae using 60 bp synthetic recombination sequences. Microbial cell factories 12, 47, doi:10.1186/1475-2859-12-47 (2013). 71 Itaya, M. Toward a bacterial genome technology: integration of the Escherichia coli prophage lambda genome into the Bacillus subtilis 168 chromosome. Molecular & general genetics : MGG 248, 9-16 (1995). 72 Itaya, M., Tsuge, K., Koizumi, M. & Fujita, K. Combining two genomes in one cell: stable cloning of the Synechocystis PCC6803 genome in the Bacillus subtilis 168 genome. Proceedings of the National Academy of Sciences of the United States of America 102, 15971-15976, doi:10.1073/pnas.0503868102 (2005). 73 Itaya, M., Nagata, T., Shiroishi, T., Fujita, K. & Tsuge, K. Efficient cloning and engineering of giant DNAs in a novel Bacillus subtilis genome vector. Journal of biochemistry 128, 869-875 (2000). 74 Itaya, M. & Tsuge, K. Construction and manipulation of giant DNA by a genome vector. Methods in enzymology 498, 427-447, doi:10.1016/B978-0-12-385120-8.00019-X (2011). 75 Tsuge, K. & Itaya, M. Recombinational transfer of 100-kilobase genomic DNA to plasmid in Bacillus subtilis 168. Journal of bacteriology 183, 5453-5458 (2001). 76 Iwata, T. et al. Bacillus subtilis genome vector-based complete manipulation and reconstruction of genomic DNA for mouse transgenesis. BMC genomics 14, 300, doi:10.1186/1471-2164-14- 300 (2013).

129

77 Dong, H. & Zhang, D. Current development in genetic engineering strategies of Bacillus species. Microbial cell factories 13, 63, doi:10.1186/1475-2859-13-63 (2014). 78 Gibson, D. G. et al. One-step assembly in yeast of 25 overlapping DNA fragments to form a complete synthetic Mycoplasma genitalium genome. Proceedings of the National Academy of Sciences of the United States of America 105, 20404-20409, doi:10.1073/pnas.0811011106 (2008). 79 Kouprina, N., Eldarov, M., Moyzis, R., Resnick, M. & Larionov, V. A model system to assess the integrity of mammalian YACs during transformation and propagation in yeast. Genomics 21, 7- 17, doi:10.1006/geno.1994.1218 (1994). 80 Haldi, M. et al. Large human YACs constructed in a rad52 strain show a reduced rate of chimerism. Genomics 24, 478-484, doi:10.1006/geno.1994.1656 (1994). 81 Le, Y. & Dobson, M. J. Stabilization of yeast artificial chromosome clones in a rad54-3 recombination-deficient host strain. Nucleic acids research 25, 1248-1253 (1997). 82 Kim, H. & Kim, J. S. A guide to genome engineering with programmable nucleases. Nat Rev Genet 15, 321-334, doi:10.1038/nrg3686 (2014). 83 Stuckey, S. & Storici, F. Gene knockouts, in vivo site-directed mutagenesis and other modifications using the delitto perfetto system in Saccharomyces cerevisiae. Methods in enzymology 533, 103-131, doi:10.1016/B978-0-12-420067-8.00008-8 (2013). 84 Dicarlo, J. E. et al. Yeast Oligo-Mediated Genome Engineering (YOGE). ACS synthetic biology, doi:10.1021/sb400117c (2013). 85 Noskov, V. N. et al. Defining the minimal length of sequence homology required for selective gene isolation by TAR cloning. Nucleic acids research 29, E32 (2001). 86 Adams A, G. D., Kaiser C, Stearns T, eds. Methods in Yeast Genetics

(Cold Spring Harbor Laboratory Press, 1998). 87 Ausubel F, e. a. Current Protocols in Molecular Biology. (Wiley, 1995). 88 Argueso, J. L. et al. Double-strand breaks associated with repetitive DNA can reshape the genome. Proceedings of the National Academy of Sciences of the United States of America 105, 11845-11850, doi:10.1073/pnas.0804529105 (2008). 89 Boeke, J. D., Trueheart, J., Natsoulis, G. & Fink, G. R. 5-Fluoroorotic acid as a selective agent in yeast molecular genetics. Methods in enzymology 154, 164-175 (1987). 90 Toyn, J. H., Gunyuzlu, P. L., White, W. H., Thompson, L. A. & Hollis, G. F. A counterselection for the tryptophan pathway in yeast: 5-fluoroanthranilic acid resistance. Yeast 16, 553-560, doi:10.1002/(SICI)1097-0061(200004)16:6<553::AID-YEA554>3.0.CO;2-7 (2000).

130

91 Mumberg, D., Muller, R. & Funk, M. Regulatable promoters of Saccharomyces cerevisiae: comparison of transcriptional activity and their use for heterologous expression. Nucleic acids research 22, 5767-5768 (1994).

131

Chapter 3

Fluorescence Polarization Assay for

High-throughput Small Molecule Screens

──────────────────────────────────────────────────────────────────────────────

The contents of this chapter will be published in:

Yao Zong Ng, Pedro Baldera-Aguayo, Virginia W. Cornish. “Fluorescence polarization assay for high-throughput screening of FK506 biosynthesized in 96-well microtiter plates”, submitted.

Yao Zong Ng, Pedro Baldera-Aguayo, Virginia W. Cornish. “Fluorescence polarization assay for high-throughput screening of FK506, rapamycin, meridamcyin and related compounds produced in microtiter plates”, patent pending.

Chapter contributions: Y.Z.N. conceived the ideas, Y.Z.N. designed the experiments, Y.Z.N and P.B.A performed the experiments and analyzed the results.

132

3.0 Chapter outlook

Metabolic engineering of microbes is a promising approach to produce compounds of pharmaceutical, agricultural and industrial importance in a cheap and green way. However, unlike single protein engineering, the combinatorial space is orders of magnitude greater, since multiple enzymes and interacting networks have to be optimized. This problem is compounded by our incomplete knowledge of enzymatic pathways and networks in the cell. Thus, rational

(constructive) approaches for the development of high-yielding strains is often a costly and time- consuming process. On the other hand, inverse metabolic engineering approaches such as combinatorial metabolic engineering allow for the generation of improved strains based on limited knowledge of their cellular functions. Recent advances in DNA assembly technologies and genome-wide mutagenesis techniques have enabled the efficient, multiplex generation of large strain libraries. However, a current bottleneck is the lack of general and high-throughput screens to detect the small molecule products. We propose that the fluorescence polarization (FP) assay can be adapted as a high-throughput and potentially generalizable screen for metabolic engineering. The FP assay is quantitative, sensitive, fast and cheap. As a proof of principle, we established the FP assay to screen for FK506 (tacrolimus) biosynthesized by Streptomyces tsukubaensis in 96-well plates. A UV mutagenized library of 160 colonies was screened to find strains showing higher FK506 productivities. The FP assay has the potential to be adapted quickly to create screens for a wide range of small molecule metabolites for combinatorial metabolic engineering.

133

3.1 Introduction

Metabolic engineering has the potential to revolutionize the production of pharmaceuticals, chemicals and fuels by lowering its cost and environmental impact, as well as providing access to novel chemistry.1-3 However, successful examples such as engineering yeast to produce the anti-malarial precursor amorphadiene4,5 and E. coli to produce the bulk chemical

1,3-propanediol6, all required heroic effort, time and cost to achieve. This is understandable given that metabolic engineering is a combinatorial optimization problem involving multi-gene pathways and interacting cellular networks.7 As such, platforms that enable the high-throughput creation and screening of large populations of mutant strains can transform the speed, cost and scale of metabolic engineering. Although many technologies have been established to generate large strain libraries8 (> 106), it is widely held that the current bottleneck to applying combinatorial approaches to metabolic engineering is the lack of generalizable, high-throughput screens for small molecule metabolites.9

The current state-of-the-art for small molecule detection is gas/liquid chromatography

(GC/LC-MS), which is an expensive and low-throughput method (~102 samples per day).9 High- throughput methods such as FACS or growth selections (~107 samples per day) require that the target molecule be colored10, fluorescent11, contain reactive groups that allow conversion to a colored/fluorescent compound12, or confer a growth advantage on the producer cell13,14, and are thus not generalizable to the majority of metabolites. Some assays rely on adapting natural regulators such as riboswitches15,16, transcription factors17,18 and nuclear hormone receptors9,19,20, but there have been few reports of their successful re-design or evolution to change their target specificity21-24, so their generality is still unknown. Likewise, reports of receptors labelled directly with reporters (e.g. GFP25,26, enzymes27-29, environmentally-sensitive fluorophores30-38)

134 to detect small molecule binding and RNA/DNA molecular aptamer beacons or aptasensors39-43 have yet to be widely adopted since their binding and signal transduction functions are tightly interwoven, which makes it difficult to alter ligand specificity and still preserve their signaling function.

We propose that the fluorescence polarization (FP) assay, when applied as a competition assay, can offer a generalizable and high-throughput screen for small molecule metabolite production. Although the FP assay has been used for drug discovery and to study enzyme kinetics, antibody–antigen and other biological interactions 44-48, its potential for small molecule detection for metabolic engineering remains untapped. In the assay, the target molecules compete with a fixed concentration of reporter molecules (labeled with a fluorophore) for binding to a common receptor. Thus, the ratio of bound to unbound reporter molecules, which can be measured by fluorescence polarization, is inversely correlated with the concentration of target molecules. This allows for highly sensitive detection of the target compound and accurate quantification without the need for tedious washing steps, unlike other competition assays using radioactive ligands.49 The FP assay can be performed using the microtiter plate format (e.g.

96/384/1536 wells) to screen libraries of 104-105 samples per day 50, making it cheap, automatable and high-throughput compared to GC/LC-MS. The dynamic range of the assay is tunable to detect a wide range of target concentrations simply by changing the receptor concentration. The FP assay requires two components: a receptor, which can either be found in nature or evolved using established display technologies such as phage, mRNA or yeast surface display51-53, as well as a reporter, which can be the target itself coupled to a fluorophore.54,55

Thus, we expect the FP assay to be broadly applicable to a wide range of small molecule metabolites.

135

As a proof-of-concept, we have established the FP assay as a high-throughput screen for the polyketide FK506, which is naturally produced by Streptomyces spp.56-62 FK506

(Tacrolimus) is the major immunosuppressive drug used to prevent the rejection of organ transplants (e.g. kidney, liver, heart) and also used topically to treat atopic dermatitis.63 It functions by binding to an intracellular receptor FKBP12, which then triggers a signaling pathway leading to immunosuppression.64 Due to the off-patent status of tacrolimus as well as a need to reduce off-target effects, there is a strong interest in strain and culture process development to optimize the production of the drug and analogs.65-69 Previously, FP assays using the receptor FKBP12 and various fluoresceinated reporters were developed to identify novel

FKBP12 ligands and to study other FK506-binding proteins (FKBPs) such as FKBP51 and

FKBP52.70-72 In this study, we have established a high-throughput production and screening platform for FK506, based on culturing Streptomyces tsukubaensis in 96-well, deep-well microtiter plates, followed by extraction and detection of FK506 using the FP assay. As a pilot experiment, we screened a strain library of 160 colonies created by UV mutagenesis and found strains that exhibited higher productivities of FK506.

3.2 Results

3.2.1 High-throughput FP assay to detect FK506

The FP assay can be adapted to detect a target small molecule for metabolic engineering

(Fig. 3-1). To perform the assay, a fixed amount of receptor and reporter molecules (which contains a fluorophore and competes with the target molecule for binding to the receptor) are added to the sample in a microtiter plate. If the concentration of the target molecules is low, the reporter molecules will be mostly bound to the receptor, thus producing higher fluorescence

136 polarization values. However, if the concentration of the target molecules is high, the reporter molecules will be mostly unbound, thus resulting in lower fluorescence polarization values (Fig.

3-1).

(a)

(b) 350

300

250

200

150

100 100 nM FKBP12

Fluorescence polarization (mP) polarization Fluorescence 50 1 µM FKBP12

0 0.01 0.1 1 10 100 FK506 concentration (mg/L)

Figure 3-1. FP assay to detect FK506. (a) Principle of applying the FP assay to detect metabolites.

(b) Competition curves for FK506 standards using 100 nM and 1 µM of FKBP12. Error bars are standard deviations of 3 readings each of 2 replicate wells. Concentrations refer to that of the sample. The final concentration in the assay well is 10-fold lower.

137

As a proof-of-concept, we adapted the FP assay to establish a high-throughput screen for

FK506 using the receptor FKBP12 and the reporter molecule FK506-fluorescein (Fig. 3-1, b).

The receptor and reporter were previously used to measure the affinity of FKBP12 receptor mutants for FK506.73 We first constructed a standard curve of fluorescence polarization values against different concentrations of FK506 that we purchased commercially (Fig. 3-1, b). The curve is sigmoidal, and the FP readings approach a maximum and minimum at low and high

FK506 concentrations in the sample respectively. In order to use the standard curve to determine the concentration of FK506 in a sample, the sample reading has to fall within the non-saturating region where changes in FP signal occur. We performed the FP assay with two different receptor concentrations (100 nM and 1 µM), which shifted the dynamic range from ~0.1–10 mg/L to ~1–

20 mg/L of FK506 respectively (Fig. 3-1, b). This suggests that the dynamic range of the FP assay can be readily adjusted to detect different levels of small molecule (FK506) simply by tuning the receptor concentration.

3.2.2 Production of FK506 in 96-well microtiter plate format

In order for the FP assay to be useful as a high-throughput screening platform for metabolic engineering, the production platform has to be of similar throughput and also compatible with the microtiter plate format used in the FP assay. Thus we attempted to produce

FK506 by cultivating S. tsukubaensis in 96-well, square, deep-well plates. Microtiter plates are cheap, automatable74, standardized and high-throughput compared to shake flasks.75,76 However, cultivating Streptomyces spp. in plates is challenging due to its propensity for mycelial growth, which tends to limit oxygen transfer. It has been shown that the use of square wells can increase oxygen transfer rates by increasing turbulence, and thus improve secondary metabolite

138 production 77-80. While the production of rapamycin has been demonstrated in 96-well plate format,81,82 to our knowledge, the production of FK506 in liquid culture in 96-well microtiter plates has not yet been reported.

We cultivated S. tsukubaensis in three different volumes of production media – 500, 750 and 1000 µL and tested for the production of FK506 over the course of 8 days. Production of

FK506 was observed after 24 h for all three volumes and increased over the course of 8 days

(Fig. 3-2, a). Production levels as measured by the FP assay reached ~17 mg/L after 8 days for both 750 µL and 1000 µL, which was higher than that for 500 µL (~14 mg/L). Cell growth was measured by taking the dry weight of submerged mycelia as well as growth on the walls above the liquid level (Fig. 3-2, b).

We further quantified the amount of FK506 in all our samples by mass spectrometry. The values thus obtained correlate with the values measured by the FP assay (R2 = 0.705) (Fig. 3-2, c). Compared to the readings obtained by the FP assay, the readings obtained by mass spectrometry displayed large error bars (data not shown).

(a)

25 500 µL

750 µL 20 1000 µL

15

10 FK506 (mg/L) FK506

5

0 24 48 72 96 120 144 168 192 Time (h)

139

(b)

18

16 500 µL 14 750 µL 12 10 1000 µL 8 6 4 2

Dry weight (mg/mL of of media) (mg/mL weight Dry 0 0 24 48 72 96 120 144 168 192 Time (h)

(c)

30 500 µL 750 µL 25 1000 µL

20

15

FP assay (mg/L) assay FP 10

5

0 0 5 10 15 20 25 30 Mass spectrometry (mg/L)

Figure 3-2. Production of FK506 in 96-well plates. (a) FK506 produced (in mg per liter of production media) over 8 days using 500, 750 and 1000 µL per well, as measured by the FP assay. All error bars represent the standard deviation from 4 wells. (b) Growth curves of cells for 500, 750 and 1000

µL, as measured by increase in dry mass over time. (c) Correlation chart for mass spectrometry and FP measurements.

140

3.2.3 Extraction of FK506 and growth determination in 96-well microtiter plates

FK506 is insoluble in water and is not secreted from the producer cells. This necessitates an additional extraction step before high-throughput screening using the FP assay can be carried out. Thus we developed a protocol to extract FK506 from the streptomyces cells using the organic solvent ethyl acetate, which can be performed in the same 96-well production plate (see

Experimental methods).

We tested our protocol by spiking a series of FK506 standards into production media (no cells) and carried out the extraction. We then performed the FP assay with the extracted standards, which matched the results obtained from standards that were directly added to the FP assay (Fig. 3-3, a). Performing the extraction step in 96-well plates would allow for future automation of the extraction step, which should increase the throughput and reproducibility of the data obtained.

Next, we adopted the methylene blue assay to measure the dry weight of S. tsukubaensis cells in 96-well plates.83,84 Using a series of samples with different amounts of cells, we constructed a standard curve of methylene blue absorbance against dry weight (Fig. 3-3, b). The curve shows a linear relationship with the amount of methylene blue left unabsorbed in solution decreasing proportionally with an increasing amount of cells. This method is rapid (~30 min) and high-throughput compared to traditional methods to determine dry weight, which may require over a day for the sample to dry in an oven.

141

(a)

350

R² = 0.978

300

250

200 FP values FP for FK506 standards

150 150 200 250 300 350 FP values for extracted FK506 standards

(b)

2.4

2.2 y = -1.3597x + 3.0072 R² = 0.9886 2.0

1.8

1.6

1.4

1.2 Absorbance 660nm Absorbance 1.0

0.8

0.6 0.4 0.6 0.8 1.0 1.2 1.4 Dry weight (mg)

Figure 3-3. Extraction of FK506 and growth determination in 96-well plates. (a) Plot of FP values for non-extracted FK506 standards versus extracted FK506 standards extracted from media in 96-well plates.

Error bars are the standard deviation of 3 readings each of 2 replicate wells. (b) Standard curve of methylene blue absorbance (at 660 nm) against dry weight of cells.

142

3.2.4 Random mutagenesis and screening for improved FK506 production

As a pilot experiment combining our high-throughput production and screening platform, we screened a UV-mutagenized library of 160 S. tsukubaensis mutants for FK506 production

(Fig. 3-4, a). Overall, one mutagenesis, production and screening cycle took ~12 days, which is an improvement over other methods that have been reported that measure rapamycin production using the zone of inhibition of sensitive reporter cells.81

The mutant colonies displayed a range of productivities that range from 47% to 160% of the non-mutagenized spores (Fig. 3-4, b). In general, the higher the amount of FK506 produced

(measured in mg per liter of media), the higher the productivity of the cells (measured as mg of

FK506 per gram of dry weight), since cell growth (dry weight per mL of media) overall does not seem to increase with FK506 production (Fig. 3-4, b). However, in some instances such as colony 43, the higher productivity value despite lower FK506 amounts can be explained by lower cell growth. Also, colony 119 was not included in the analysis as it had a seemingly high productivity value (1.17 mg FK506 / g dry weight) but this was due to the fact that the cells did not grow in the production plate. This may be due to experimental error or UV-induced mutations that impaired the growth of the cells.

Thus, when selecting mutants for scaled-up testing or further rounds of mutagenesis, both the productivity value and cell growth/density have to be taken into account. Cell growth/density is an important parameter in biosynthesis since it relates to the robustness of the cells, rate of compound production, and efficiency of feedstock to product conversion (yield).

143

(a)

(b)

FK506 (mg/L) Dry weight (mg/mL) Productivity (mg FK506 / g dry weight) 2.5 20

18 Productivity (mg FK506 dry/ g weight) 2 16 14 12 1.5 10 8 1 6 4 0.5 2

FK506 (mg/L) or Dry (mg/mL) weight Dry or (mg/L) FK506 0 0

Mutant colony number

Figure 3-4. High-throughput mutagenesis and screening platform. (a) Integrated protocol to mutagenize and screen a library of spores for improved production. (b) Productivity of mutant colonies

(bottom black bars, y-axis on the right) arranged in ascending order from left to right. Only half the colonies are labelled. The final bar on the right represents the average of 16 wells of non-mutagenized

(WT) spores. For all other mutant colonies, bars represent the average of 2 wells grown in different plates.

The amount of FK506 extracted (middle gray bars, y-axis on the left) and dry weight of cells (upper

144 dotted bars, y-axis on the left) are also shown. Error bars represent the standard deviation for the 2 replicates.

3.2.5 Scale-up of top mutants in shake flasks

Next, we scaled up production to shake flasks for the top 8 mutants, by order of productivity of FK506 (mg of FK506 per gram of dry weight of cells). In our first trial (grey bars), all 8 mutants were tested (M41, M1, M71, M94, M95, M79, M83 and M121, listed in order of decreasing productivity) and compared to two non-mutagenized colonies (WT2 and

WT4). In the second trial (yellow bars), only WT2, WT4, M71, M94, M79 and M83 were tested.

140

120

100

80

60

FK506 FK506 (mg/L) 40

20

0

M1

M41 M71 M94 M95 M79 M83

WT2 WT4 M121

WT_ave

Figure 3-5. Production titer of FK506 in shake flasks for the top 8 mutants by productivity. Gray bars represent the first production trial. Yellow bars represent the second production trial. All error bars represent the standard deviation of 3 replicate flasks per colony for each production trial.

The results of the first trial show that the production titer of FK506 (mg/L) for all the mutants were comparable to or more than the average of the two WT colonies (WT_ave). In 145 particular, M71, M94 and M83 exhibited ~ 5-fold, 2-fold and 2-fold higher titers of FK506 compared to WT_ave respectively (Fig. 3-5). In order to validate these results, we repeated the production trial. This time, the titers of FK506 obtained were higher generally compared to the first trial. The WT colonies produced comparable titers of FK506 with M71, M94 and M83. In particular, the variation in titer between replicate flasks for each colony was high. Promisingly, the average titer of M71 remained higher than the WT_ave, although the error bars are large, thus requiring further validation.

3.2.6 Measuring the performance of the FP assay by Z’ score

The performance of the FP assay was evaluated by calculating the Z’ score. The Z’ score is a statistical parameter that takes into account the assay dynamic range and the variability associated with measurements, and thus reports on the suitability of the assay for high- throughput screening applications.85

We used the FP assay to screen a total of 96 wells of methanol (CH3OH) (maximum FP signal) and 96 wells containing a final concentration of 5 µM FK506 (minimum FP signal) that were divided across two different 384-well plates (Fig. 3-6). The Z’ score was determined to be

0.68, which is similar to FP assays used in other contexts, and falls within the classification of an excellent assay (An ideal assay has a Z’ score of 1).85-87 The FP assay described here can also be applied to detect other compounds that bind to FKBPs such as rapamycin (data not shown).

146

380

340 CH3OH

300

Z' score = 0.68 260

220

180 FK506 Fluorescence polarization (mP) polarization Fluorescence

140 0 20 40 60 80 100 Sample number

Figure 3-6. Z’ score for the FP assay. Faint dots represent FP values for the 96 samples of methanol

(CH3OH) and 96 FK506 controls. The figure shows the mean of the controls (continuous black line), one standard deviation (long dashes) and 3 standard deviations (short dashes) from the mean.

3.3 Discussion

Tuning the detection range of the FP assay

The FP assay is versatile in that the dynamic range can be tuned to measure different target concentrations by varying the receptor concentration. When screening a library, the receptor concentration should be chosen such that the WT (non-mutagenized) or control sample falls in the middle or lower part of the dynamic range, which would allow for higher production levels to be measured. Alternatively, the samples can be diluted before testing. Thus, the FP assay can serve as a high-throughput preliminary screen to identify potentially interesting strains for scaled-up production in flask or bioreactors.

The lower limit (sensitivity) of the assay depends on the Kd of both the target molecule and reporter molecule, as well as the fluorophore used. For molecules with Kd in the low nM range, it should be possible to detect as low as 10 nM of compound using lower receptor

147 concentrations (e.g. 50 nM), which is possibly approaching the lower limit of the assay since the final concentration of reporter used is 2.5 nM. This is the minimum concentration of reporter required for a reliable FP signal. The minimum concentration of reporter used can be empirically determined by testing various combinations of receptor and reporter concentrations alone in a well, until a reliable difference in FP signal between the presence and absence of the receptor is observed.

Potential for the FP assay to be generalized for other target molecules

We expect that the FP assay can be quickly and easily adopted to screen for a wide range of small molecule metabolites. Significantly, the FP assay can be performed by simply adding a receptor and competitive reporter molecule to the sample.88 This is unlike other assays based on fluorophore quenching in the binding site89 or bioluminescence resonance energy transfer

(BRET)90, which require receptor modification with a quencher or fluorophore respectively, and are highly target-dependent. In order to develop an FP assay for a target molecule, two components need to be developed – a receptor and a competitive reporter molecule.

Receptor design and evolution

The receptor has to have adequate affinity for both the reporter molecule and target molecule, ideally a Kd in the nM – low µM range. This is because weak affinities will require very high concentrations of receptor in order to observe a change in FP when the reporter is added. This could be problematic as the receptor (if protein), may aggregate or bind non- selectively to the reporter/fluorophore at high concentrations.

For the majority of metabolites, we expect that protein-based receptors can be found in nature, including its natural target receptor or the inactivated enzyme for which the molecule is the substrate or product. A receptor can also be created de novo by computational design, or have

148 its binding affinity increased via established directed evolution technologies (e.g. phage, mRNA, yeast surface display).51-53 The receptor can also be an antibody/monobody, RNA-based molecule (e.g. aptamers) or other binder.

The receptor used determines the specificity of the FP assay. In this study, the receptor

FKBP12 is employed. Since it binds to several ligands with nanomolar affinity including FK506, rapamycin and various analogs, the assay can be used to detect any one of these targets (data not shown). Similarly, reporter molecules that are based on any one of these molecules (or others that bind to FKBP12) can be used interchangeably in the FP assay. Thus, if the FP assay is used for screening in metabolic engineering, it is important to verify the identity of the product by an alternative method such as mass spectrometry. However, if a higher degree of specificity is desired, the receptor can be modified or evolved using directed evolution technologies to selectively bind one target molecule over the others.

Reporter design and synthesis

The competitive reporter molecule consists of a molecule that is linked to a fluorophore, and which competes with the target molecule for binding to the receptor. The reporter can be constructed by chemically coupling the target molecule itself to a fluorophore. Analogs of the target molecule or inhibitors that bind to the receptor can also be used. However, a major concern is that attaching a fluorophore to a target molecule/inhibitor of comparable size would be likely to reduce the affinity of the reporter conjugate to the receptor due to steric hindrance from the fluorophore. This can be overcome by choosing linker sites on the target/inhibitor that are not critical for receptor binding or that point outwards from the binding site. The choice of linker sites can be informed by X-ray crystal structures of the target/inhibitor bound to the receptor, or structure-activity relationship studies that have been performed on the molecule.54,55

149

Also, the linker length can be increased to reduce steric interaction between the fluorophore and receptor. However, the linker length must not be too long since this will increase the flexibility of the fluorophore to rotate and result in a lower FP signal when the reporter is bound to the receptor (the “propeller effect”).91

Another important decision lies in the choice of the fluorophore. The fluorophore may interact non-specifically with the receptor to produce a false high polarization signal. This can be detected by supplying a high concentration of target molecule, which should displace the reporter only if it is bound specifically, and decrease the FP signal. Other characteristics of the fluorophore such as quantum yield, which affects the signal intensity may also be important considerations. Fluorophores with longer excited-state lifetimes are suitable for measuring interactions over a larger molecular weight range. Also, red-shifted fluorophores can reduce the background from intrinsically fluorescent components in the sample.

FP assay in a high-throughput setting

The FP assay provides a measure of the proportion of bound versus unbound reporter molecule and is a ratiometric technique. Thus, it is not affected by light absorbance or color quenching by other components in the sample. The FP assay is homogeneous – it can be performed with all the components added to the same well and there is no need to separate the bound and unbound species. This minimizes the number of steps to perform the assay.

Since the FP assay relies on the detection of changes in polarization of a fluorescence signal, the assay is highly sensitive and can be performed easily and cheaply in microtiter plate format using a fluorescence plate reader that has a FP module. In this study, we performed the

FP assay in a 384-well plate format, which would potentially allow screens of up to >104 samples (~26 plates) per day (30 minutes per plate). The FP assay has also been reported to be

150 achieved in 1536-well plates, which would increase sample throughput to ~105 samples per day.92,93

However, the production platform for culturing the organism can quickly become limiting. Here, we demonstrate that FK506 can be produced by cultivation in 96-well plates. In general, there have been only a few reports of cultivation of Streptomyces spp. in 96-well plate format.76,81,82,94 This is probably due to the requirement of most organisms such as Streptomyces spp. for high oxygen transfer rates for growth and secondary metabolite production. This can be achieved through the use of high-frequency shakers, smaller culture volumes, tilted rotation and larger shaking orbits. Microtiter plates have the added advantage that they can be easily automatable. Importantly, cultivation in microtiter plates have been shown to be more reproducible than shake flask cultivations.82 However, a key question to be addressed is whether the relationship between the productivities of different strains in a microtiter plate will hold when scaled up to a shake flask or bioreactor. In the future, novel platforms that use microfluidics to cultivate organisms in mini-bioreactor droplets can potentially increase throughput.95

We have established that the FP assay can be used for small molecule detection for metabolic engineering purposes. As a proof-of-concept, we adapted the FP assay for high- throughput screening of FK506 produced by Streptomyces tsukubaensis in 96-well microtiter plates. A FK506 production and extraction protocol in 96-well plates was developed to enable a high-throughput production and screening cycle. We applied our platform to screen a UV mutagenized library for improved production strains. We further verified that the FP assay is robust using the Z’ score. Overall, the platform developed here will be useful not only for screening mutagenized strain libraries, but also to screen different culture conditions and media compositions to optimize the production process. We expect that the FP assay will be broadly

151 applicable to screen for other targets of interest to the metabolic engineering community, as well as for other purposes such as the discovery of natural product pathways in metagenomics.

3.4 Experimental methods

Strain, sporulation and mutagenesis conditions. Streptomyces tsukubaensis (NRRL 18488) was obtained from the Agricultural Research Service Culture Collection (NRRL), USA. Spores were harvested after culturing NRRL 18488 on inorganic salts/starch agar media (ISP4) (BD

Difco, #277210) at 28 ºC for 4 days. UV mutagenesis was performed using a Stratalinker 2400

(Agilent). UV dose was calibrated to obtain a 99 % kill rate.

Production conditions. Production of FK506 was carried out in 2.2 mL, square, 96-well deep well microtiter plates (Axygen Scientific) using a two-stage cultivation method.62 First, 500 μL of modified BaSa seed medium was inoculated with spores of NRRL 18488 and incubated for 3 days at 28 ºC and 250 rpm (Innova 44R, 2-inch orbit). The seed culture was used to inoculate

500 – 1000 μL of ISPz production media 62 per well and incubated for 4 – 8 days at 28 ºC and

250 rpm. The volume of seed culture added was 10 % of the final volume in the well. The plates were sealed with two layers of sterile, breathable rayon seal (AeraSeal) before incubation.

For flask productions to verify the top 8 mutants by productivity that were found in the

FP screen, 5 mL of seed culture (4.5 mL of BaSa medium inoculated with 500 μL of spores of each mutant) was incubated for 40 h at 28 °C and 230 rpm. The seed cultures were used for the inoculation of 5 mL of ISPz production media (to a final OD600 of 1) in 50 mL, unbaffled flasks

(Pyrex Erlenmeyer 50 mL, CE-FLAS050), and incubated for 8 days at 28 °C and 230 rpm.

152

Measurement of cell growth

Dry weight. Cell growth was determined by measuring the dry weight of an aliquot of the culture. The cells were pelleted in pre-weighed 4 mL glass vials, washed twice with acid to remove insoluble CaCO3 in the media, and then dried in an oven for 2 days before measuring the mass. For the construction of the standard curve, the cells were transferred onto pre-weighed

Nalgene filter units (polyether sulfone, membrane diameter 75 mm, pore size 0.45 μm, Thermo

Scientific), washed twice with acidified water (pH 0.5) to remove insoluble CaCO3 in the media, and then dried in an oven at 80 °C until constant weight (1 day) before measuring the mass.

Methylene blue assay. For the library screening, cell growth was measured using a methylene blue absorption assay.83 A standard curve of methylene blue absorption against dry cell weight was constructed to convert absorbance values to dry weight (Fig. 3b). The methylene blue assay was performed by transferring 100 µL of sample into a Corning Costar polystyrene clear 96-well plate. 100 µL of methylene blue solution was added to the sample (final concentration of 1.5 mM) and the plate was incubated for 15 min at 80 ºC. At 5 min intervals, the plate was agitated for 10 s in a high-frequency microplate shaker (800 rpm) to enable mixing. After incubation for

15 min, the plate was centrifuged for 5 min at 3000 rpm and 10 µL of supernatant was transferred into a new 96-well clear plate containing 190 µL of water in each well. The contents were mixed and absorbance was determined in a Tecan plate reader at 660 nm.

Extraction of FK506 Initial testing for production. For the initial testing of production of

FK506 in microtiter plates and shake flasks, the culture in each well/flask was transferred into a

2 mL Eppendorf tube. An equal volume of ethyl acetate was added and the tube was vortexed for

153

30 min. After centrifugation for 5 minutes, the top ethyl acetate layer was transferred into a 4 mL glass vial and evaporated to dryness by rotary evaporation. An equal volume of methanol was added to dissolve any FK506 present for subsequent testing.

Library screening. For the library screening, FK506 was extracted from cell cultures of S. tsukubaensis using the same 96-well deep-well microtiter plate that was used for production

(Axygen Scientific). Firstly, the cells in the plates were subjected to 3 freeze-thaw cycles. 100

µL of culture was taken for the methylene blue assay. 500 µL of ethyl acetate was added to the remaining culture (~650 µL). The plates were sealed with a silicone mat (ImpermaMat, Axygen

Scientific), wrapped with parafilm, and incubated for 1 hour at 20 ºC, 800 rpm in a high frequency plate shaker. The plates were then centrifuged for 5 minutes at 3000 rpm. 150 µL of ethyl acetate was added gently to the top layer, mixed gently, and 100 µL of the top ethyl acetate layer was transferred to a 96-well polypropylene plate (Greiner). The plate was evaporated to dryness in a chemical hood, and 100 µL of methanol was added to dissolve the extracted FK506.

The methanol sample was added directly into the assay wells for the FP assay. All procedures were performed using 8-channel manual pipettes.

FK506 quantification Fluorescence polarization assay. The FP assay for FK506 was performed using the receptor FKBP12 (purified using the Ni-NTA kit (Qiagen) from a previous construct73) and reporter molecule FK506-fluorescein (a gift from Ariad Pharmaceuticals).73,96,97

The assay was performed in 384-well, round, black-bottomed plates (Corning #3821). A total volume of 40 µL per well was used. The final concentrations of reagents in each well were: 2.5 nM FK506-fluorescein and 1 µM FKBP12 (unless otherwise specified). The order of addition of

154 each reagent into the well was: 32 µL of master mix containing FK506-fluorescein, 2X FP assay buffer 73 and water, followed by 4 µL of the sample (in methanol solvent) and finally 4 µL of 10

µM FKBP12. The plates were left for two hours for equilibrium to be reached. FP readings were taken using a Victor X5 plate reader (Perkin Elmer), with excitation filter (485 nm) and emission filter (535 nm). FK506 used for the standard curves were purchased from Sigma (#F4679).

Mass spectrometry. Mass spectrometry was performed using an LC-MS (Agilent 1100 Series

Capillary LCMSD Trap XCT Spectrometer, NYU). A standard curve was generated by selected ion monitoring of ion counts at m/z of 826.4 (FK506 + Na) using different concentrations of commercial FK506 (Sigma #F4679). This curve was used to determine the concentration of

FK506 in the sample.

Random mutagenesis and screening for improved FK506 production. First, a S. tsukubaensis spore suspension was irradiated with UV (killing ratio of 99 %) and plated on ISP4 agar. Sporulated colonies appeared after 3-4 days. These were picked to create a glycerol spore stock in a 96-well plate, which was preserved in a -80 ºC freezer for future testing and further rounds of mutagenesis.94 The glycerol stock plate was used to inoculate a seed culture plate.

After incubation for 3 days, the seed culture plate was used to inoculate two independent production plates (replicates). Following incubation for 4 days to allow for production of FK506,

100 µL of culture in each well was used to determine cell growth by the methylene blue assay.

The remainder of the culture (650 µL per well) was subject to ethyl acetate extraction in the same production plate. The FP assay was performed to measure the amount of FK506 in the extracts.

155

Z’ score. The Z’ score was calculated using the following equation: Z’ = 1 – (3σn + 3σp) / (µn -

µp), where σn is the standard deviation of the negative control (methanol), σp is the standard deviation of the positive control (final concentration of 5 µM FK506), µn is the mean FP value

85 for methanol and µp is the mean FP value for 5 µM FK506.

156

3.5 References

1 Rabinovitch-Deere, C. A., Oliver, J. W., Rodriguez, G. M. & Atsumi, S. Synthetic biology and metabolic engineering approaches to produce biofuels. Chemical reviews 113, 4611-4632, doi:10.1021/cr300361t (2013). 2 Lee, J. W., Kim, T. Y., Jang, Y. S., Choi, S. & Lee, S. Y. Systems metabolic engineering for chemicals and materials. Trends in biotechnology 29, 370-378, doi:10.1016/j.tibtech.2011.04.001 (2011). 3 Chartrain, M., Salmon, P. M., Robinson, D. K. & Buckland, B. C. Metabolic engineering and directed evolution for the production of pharmaceuticals. Current opinion in biotechnology 11, 209-214 (2000). 4 Westfall, P. J. et al. Production of amorphadiene in yeast, and its conversion to dihydroartemisinic acid, precursor to the antimalarial agent artemisinin. Proceedings of the National Academy of Sciences of the United States of America 109, E111-118, doi:10.1073/pnas.1110740109 (2012). 5 Ro, D. K. et al. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature 440, 940-943, doi:10.1038/nature04640 (2006). 6 Kurian, J. V. A new polymer platform for the future - Sorona (R) from corn derived 1,3- propanediol. J Polym Environ 13, 159-167, doi:DOI 10.1007/s10924-005-2947-7 (2005). 7 Abatemarco, J., Hill, A. & Alper, H. S. Expanding the metabolic engineering toolbox with directed evolution. Biotechnology journal 8, 1397-1410, doi:10.1002/biot.201300021 (2013). 8 Esvelt, K. M. & Wang, H. H. Genome-scale engineering for systems and synthetic biology. Molecular systems biology 9, 641, doi:10.1038/msb.2012.66 (2013). 9 Dietrich, J. A., McKee, A. E. & Keasling, J. D. High-throughput metabolic engineering: advances in small-molecule screening and selection. Annual review of biochemistry 79, 563-590, doi:10.1146/annurev-biochem-062608-095938 (2010). 10 Alper, H., Miyaoku, K. & Stephanopoulos, G. Construction of lycopene-overproducing E. coli strains by combining systematic and combinatorial gene knockout targets. Nature biotechnology 23, 612-616, doi:10.1038/nbt1083 (2005). 11 Ukibe, K., Katsuragi, T., Tani, Y. & Takagi, H. Efficient screening for astaxanthin-overproducing mutants of the yeast Xanthophyllomyces dendrorhous by flow cytometry. FEMS microbiology letters 286, 241-248, doi:10.1111/j.1574-6968.2008.01278.x (2008).

157

12 Santos, C. N. & Stephanopoulos, G. Melanin-based high-throughput screen for L-tyrosine production in Escherichia coli. Applied and environmental microbiology 74, 1190-1197, doi:10.1128/AEM.02448-07 (2008). 13 Gamper, M., Hilvert, D. & Kast, P. Probing the role of the C-terminus of Bacillus subtilis chorismate mutase by a novel random protein-termination strategy. Biochemistry 39, 14087- 14094 (2000). 14 Griffiths, J. S. et al. A bacterial selection for the directed evolution of pyruvate aldolases. Bioorganic & medicinal chemistry 12, 4067-4074, doi:10.1016/j.bmc.2004.05.034 (2004). 15 Mandal, M. & Breaker, R. R. Gene regulation by riboswitches. Nature reviews. Molecular cell biology 5, 451-463, doi:10.1038/nrm1403 (2004). 16 Montange, R. K. & Batey, R. T. Riboswitches: emerging themes in RNA structure and function. Annual review of biophysics 37, 117-133, doi:10.1146/annurev.biophys.37.032807.130000 (2008). 17 van Sint Fiet, S., van Beilen, J. B. & Witholt, B. Selection of biocatalysts for chemical synthesis. Proceedings of the National Academy of Sciences of the United States of America 103, 1693- 1698, doi:10.1073/pnas.0504733102 (2006). 18 Dietrich, J. A., Shis, D. L., Alikhani, A. & Keasling, J. D. Transcription factor-based screens and synthetic selections for microbial small-molecule biosynthesis. ACS synthetic biology 2, 47-58, doi:10.1021/sb300091d (2013). 19 Peet, D. J., Doyle, D. F., Corey, D. R. & Mangelsdorf, D. J. Engineering novel specificities for ligand-activated transcription in the nuclear hormone receptor RXR. Chemistry & biology 5, 13- 21 (1998). 20 Michener, J. K., Thodey, K., Liang, J. C. & Smolke, C. D. Applications of genetically-encoded biosensors for the construction and control of biosynthetic pathways. Metabolic engineering 14, 212-222, doi:10.1016/j.ymben.2011.09.004 (2012). 21 Collins, C. H., Leadbetter, J. R. & Arnold, F. H. Dual selection enhances the signaling specificity of a variant of the quorum-sensing transcriptional activator LuxR. Nature biotechnology 24, 708- 712, doi:10.1038/nbt1209 (2006). 22 Mohn, W. W., Garmendia, J., Galvao, T. C. & de Lorenzo, V. Surveying biotransformations with a la carte genetic traps: translating dehydrochlorination of lindane (gamma- hexachlorocyclohexane) into lacZ-based phenotypes. Environmental microbiology 8, 546-555, doi:10.1111/j.1462-2920.2006.00983.x (2006). 23 Tang, S. Y. & Cirino, P. C. Design and application of a mevalonate-responsive regulatory protein. Angewandte Chemie 50, 1084-1086, doi:10.1002/anie.201006083 (2011).

158

24 Chockalingam, K., Chen, Z., Katzenellenbogen, J. A. & Zhao, H. Directed evolution of specific receptor-ligand pairs for use in the creation of gene switches. Proceedings of the National Academy of Sciences of the United States of America 102, 5691-5696, doi:10.1073/pnas.0409206102 (2005). 25 Doi, N. & Yanagawa, H. Design of generic biosensors based on green fluorescent proteins with allosteric sites by directed evolution. FEBS letters 453, 305-307 (1999). 26 Baird, G. S., Zacharias, D. A. & Tsien, R. Y. Circular permutation and receptor insertion within green fluorescent proteins. Proceedings of the National Academy of Sciences of the United States of America 96, 11241-11246 (1999). 27 Tucker, C. L. & Fields, S. A yeast sensor of ligand binding. Nature biotechnology 19, 1042-1046, doi:10.1038/nbt1101-1042 (2001). 28 Louvion, J. F., Havaux-Copf, B. & Picard, D. Fusion of GAL4-VP16 to a steroid-binding domain provides a tool for gratuitous induction of galactose-responsive genes in yeast. Gene 131, 129- 134 (1993). 29 Israel, D. I. & Kaufman, R. J. Dexamethasone negatively regulates the activity of a chimeric dihydrofolate reductase/glucocorticoid receptor protein. Proceedings of the National Academy of Sciences of the United States of America 90, 4290-4294 (1993). 30 Renard, M. et al. Knowledge-based design of reagentless fluorescent biosensors from recombinant antibodies. Journal of molecular biology 318, 429-442, doi:10.1016/S0022- 2836(02)00023-2 (2002). 31 Rabbany, S. Y., Donner, B. L. & Ligler, F. S. Optical immunosensors. Critical reviews in biomedical engineering 22, 307-346 (1994). 32 Pollack, S. J., Nakayama, G. R. & Schultz, P. G. Introduction of nucleophiles and spectroscopic probes into antibody combining sites. Science 242, 1038-1040 (1988). 33 Loving, G. S., Sainlos, M. & Imperiali, B. Monitoring protein interactions and dynamics with solvatochromic fluorophores. Trends in biotechnology 28, 73-83, doi:10.1016/j.tibtech.2009.11.002 (2010). 34 Masharina, A., Reymond, L., Maurel, D., Umezawa, K. & Johnsson, K. A fluorescent sensor for GABA and synthetic GABA(B) receptor ligands. Journal of the American Chemical Society 134, 19026-19034, doi:10.1021/ja306320s (2012). 35 Gilardi, G., Zhou, L. Q., Hibbert, L. & Cass, A. E. Engineering the maltose binding protein for reagentless fluorescence sensing. Analytical chemistry 66, 3840-3847 (1994).

159

36 Boyd, A. E., Marnett, A. B., Wong, L. & Taylor, P. Probing the active center gorge of acetylcholinesterase by fluorophores linked to substituted cysteines. The Journal of biological chemistry 275, 22401-22408, doi:10.1074/jbc.M000606200 (2000). 37 Tsai, Y. C., Jin, Z. & Johnson, K. A. Site-specific labeling of T7 DNA polymerase with a conformationally sensitive fluorophore and its use in detecting single-nucleotide polymorphisms. Analytical biochemistry 384, 136-144, doi:10.1016/j.ab.2008.09.006 (2009). 38 Chan, P. H. et al. Fluorescein-labeled beta-lactamase mutant for high-throughput screening of bacterial beta-lactamases against beta-lactam antibiotics. Analytical chemistry 77, 5268-5276, doi:10.1021/ac0502605 (2005). 39 Yamamoto, R., Baba, T. & Kumar, P. K. Molecular beacon aptamer fluoresces in the presence of Tat protein of HIV-1. Genes to cells : devoted to molecular & cellular mechanisms 5, 389-396 (2000). 40 Li, J. J., Fang, X. & Tan, W. Molecular aptamer beacons for real-time protein recognition. Biochemical and biophysical research communications 292, 31-40 (2002). 41 Tombelli, S., Minunni, M. & Mascini, M. Analytical applications of aptamers. Biosensors & bioelectronics 20, 2424-2434, doi:10.1016/j.bios.2004.11.006 (2005). 42 Bayer, T. S. & Smolke, C. D. Programmable ligand-controlled riboregulators of eukaryotic gene expression. Nature biotechnology 23, 337-343, doi:10.1038/nbt1069 (2005). 43 Trevino, S. G. & Levy, M. High-Throughput Bead-Based Identification of Structure-Switching Aptamer Beacons. Chembiochem : a European journal of chemical biology, doi:10.1002/cbic.201402037 (2014). 44 Nasir, M. S. & Jolley, M. E. Fluorescence polarization: an analytical tool for immunoassay and drug discovery. Combinatorial chemistry & high throughput screening 2, 177-190 (1999). 45 Nikiforov, T. T. & Simeonov, A. M. Application of fluorescence polarization to enzyme assays and single nucleotide polymorphism genotyping: some recent developments. Combinatorial chemistry & high throughput screening 6, 201-212 (2003). 46 Turek-Etienne, T. C., Kober, T. P., Stafford, J. M. & Bryant, R. W. Development of a fluorescence polarization AKT serine/threonine kinase assay using an immobilized metal ion affinity-based technology. Assay and drug development technologies 1, 545-553, doi:10.1089/154065803322302808 (2003). 47 Gaudet, E. A. et al. A homogeneous fluorescence polarization assay adaptable for a range of protein serine/threonine and tyrosine kinases. Journal of biomolecular screening 8, 164-175, doi:10.1177/1087057103252309 (2003).

160

48 Owicki, J. C. Fluorescence polarization and anisotropy in high throughput screening: perspectives and primer. Journal of biomolecular screening 5, 297-306 (2000). 49 Pollard, T. D. A guide to simple and informative binding assays. Molecular biology of the cell 21, 4061-4067, doi:10.1091/mbc.E10-08-0683 (2010). 50 Inglese, J. et al. High-throughput screening assays for the identification of chemical probes. Nature chemical biology 3, 466-479, doi:10.1038/nchembio.2007.17 (2007). 51 Bradbury, A. R., Sidhu, S., Dubel, S. & McCafferty, J. Beyond natural antibodies: the power of in vitro display technologies. Nature biotechnology 29, 245-254, doi:10.1038/nbt.1791 (2011). 52 Gilbreth, R. N. & Koide, S. Structural insights for engineering binding proteins based on non- antibody scaffolds. Current opinion in structural biology 22, 413-420, doi:10.1016/j.sbi.2012.06.001 (2012). 53 Amstutz, P., Forrer, P., Zahnd, C. & Pluckthun, A. In vitro display technologies: novel developments and applications. Current opinion in biotechnology 12, 400-405 (2001). 54 Kuramochi, K. Synthetic and structure-activity relationship studies on bioactive natural products. Bioscience, biotechnology, and biochemistry 77, 446-454, doi:10.1271/bbb.120884 (2013). 55 Koehn, F. E. & Carter, G. T. The evolving role of natural products in drug discovery. Nature reviews. Drug discovery 4, 206-220, doi:10.1038/nrd1657 (2005). 56 Kino, T. et al. FK-506, a novel immunosuppressant isolated from a Streptomyces. II. Immunosuppressive effect of FK-506 in vitro. The Journal of antibiotics 40, 1256-1265 (1987). 57 Dumont, F., Garrity, G. M., Fernandez, I. M. & Matas, T. D. (Google Patents, 1992). 58 Garrity, G. M., Heimbuch, B. K., Motamedi, H. & Shafiee, A. Genetic relationships among actinomycetes that produce the immunosuppressant macrolides FK506, FK520/FK523 and rapamycin. Journal of industrial microbiology 12, 42-47 (1993). 59 Sigmund, J. M., Clark, D. C., Rainey, F. A. & Anderson, A. S. Detection of eubacterial 3- hydroxy-3-methylglutaryl coenzyme a reductases from natural populations of actinomycetes. Microbial ecology 46, 106-112, doi:10.1007/s00248-002-2029-5 (2003). 60 Kim, H. S. & Park, Y. I. Isolation and identification of a novel microorganism producing the immunosuppressant tacrolimus. Journal of bioscience and bioengineering 105, 418-421, doi:10.1263/jbb.105.418 (2008). 61 Martinez-Castro, M., Barreiro, C., Romero, F., Fernandez-Chimeno, R. I. & Martin, J. F. Streptomyces tacrolimicus sp. nov., a low producer of the immunosuppressant tacrolimus (FK506). International journal of systematic and evolutionary microbiology 61, 1084-1088, doi:10.1099/ijs.0.024273-0 (2011).

161

62 Martinez-Castro, M. et al. Taxonomy and chemically semi-defined media for the analysis of the tacrolimus producer 'Streptomyces tsukubaensis'. Applied microbiology and biotechnology 97, 2139-2152, doi:10.1007/s00253-012-4364-x (2013). 63 Liu, F., Wang, Y. Q., Meng, L., Gu, M. & Tan, R. Y. FK506-binding protein 12 ligands: a patent review. Expert opinion on therapeutic patents 23, 1435-1449, doi:10.1517/13543776.2013.828695 (2013). 64 Demain, A. L. & Sanchez, S. Microbial drug discovery: 80 years of progress. The Journal of antibiotics 62, 5-16, doi:10.1038/ja.2008.16 (2009). 65 Vaid, S. & Dr, N. P. (Google Patents, 2006). 66 Moss, S. J., Wilkinson, B., Martin, C. J. & Schell, U. E. (Google Patents, 2010). 67 Chen, D. et al. Improvement of FK506 production in Streptomyces tsukubaensis by genetic enhancement of the supply of unusual polyketide extender units via utilization of two distinct site-specific recombination systems. Applied and environmental microbiology 78, 5093-5103, doi:10.1128/AEM.00450-12 (2012). 68 Xia, M. et al. Enhanced FK506 production in Streptomyces tsukubaensis by rational feeding strategies based on comparative metabolic profiling analysis. Biotechnology and bioengineering 110, 2717-2730, doi:10.1002/bit.24941 (2013). 69 Xu, H., Perez, S. D., Cheeseman, J., Mehta, A. K. & Kirk, A. D. The allo- and viral-specific immunosuppressive effect of belatacept, but not tacrolimus, attenuates with progressive T cell maturation. American journal of transplantation : official journal of the American Society of Transplantation and the American Society of Transplant Surgeons 14, 319-332, doi:10.1111/ajt.12574 (2014). 70 Bollini, S. et al. High-throughput fluorescence polarization method for identification of FKBP12 ligands. Journal of biomolecular screening 7, 526-530, doi:10.1177/1087057102238626 (2002). 71 Dubowchik, G. M., Ditta, J. L., Herbst, J. J., Bollini, S. & Vinitsky, A. Fluoresceinated FKBP12 ligands for a high-throughput fluorescence polarization assay. Bioorganic & medicinal chemistry letters 10, 559-562 (2000). 72 Kozany, C., Marz, A., Kress, C. & Hausch, F. Fluorescent probes to characterise FK506-binding proteins. Chembiochem : a European journal of chemical biology 10, 1402-1410, doi:10.1002/cbic.200800806 (2009). 73 de Felipe, K. S., Carter, B. T., Althoff, E. A. & Cornish, V. W. Correlation between ligand- receptor affinity and the transcription readout in a yeast three-hybrid system. Biochemistry 43, 10353-10363, doi:10.1021/bi049716n (2004).

162

74 Huber, R. et al. Robo-Lector - a novel platform for automated high-throughput cultivations in microtiter plates with high information content. Microbial cell factories 8, 42, doi:10.1186/1475- 2859-8-42 (2009). 75 Mølgaard, L. & Mølgaard, L. Engineering the Polyketide Cell Factory. (Technical University of Denmark, 2012). 76 Duetz, W. A. Microtiter plates as mini-bioreactors: miniaturization of fermentation methods. Trends in microbiology 15, 469-475, doi:10.1016/j.tim.2007.09.004 (2007). 77 Duetz, W. A. & Witholt, B. Oxygen transfer by orbital shaking of square vessels and deepwell microtiter plates of various dimensions. Biochemical Engineering Journal 17, 181-185, doi:http://dx.doi.org/10.1016/S1369-703X(03)00177-3 (2004). 78 Funke, M., Diederichs, S., Kensy, F., Muller, C. & Buchs, J. The baffled microtiter plate: increased oxygen transfer and improved online monitoring in small scale fermentations. Biotechnology and bioengineering 103, 1118-1128, doi:10.1002/bit.22341 (2009). 79 Duetz, W. A. et al. Methods for intense aeration, growth, storage, and replication of bacterial strains in microtiter plates. Applied and environmental microbiology 66, 2641-2646 (2000). 80 Duetz, W. A. & Witholt, B. Effectiveness of orbital shaking for the aeration of suspended bacterial cultures in square-deepwell microtiter plates. Biochem Eng J 7, 113-115 (2001). 81 Xu, Z. N., Shen, W. H., Chen, X. Y., Lin, J. P. & Cen, P. L. A high-throughput method for screening of rapamycin-producing strains of Streptomyces hygroscopicus by cultivation in 96- well microtiter plates. Biotechnology letters 27, 1135-1140, doi:10.1007/s10529-005-8463-y (2005). 82 Zhu, Y. A Rapid, Small-Scale Method for Improving Fermentation Medium Performance Master's thesis, The University of Waikato, (2007). 83 Fischer, M. & Sawers, R. G. A universally applicable and rapid method for measuring the growth of streptomyces and other filamentous microorganisms by methylene blue adsorption-desorption. Applied and environmental microbiology 79, 4499-4502, doi:10.1128/AEM.00778-13 (2013). 84 Fischer, M., Schmidt, C., Falke, D. & Sawers, R. G. Terminal reduction reactions of nitrate and sulfate assimilation in Streptomyces coelicolor A3(2): identification of genes encoding nitrite and sulfite reductases. Research in microbiology 163, 340-348, doi:10.1016/j.resmic.2012.05.004 (2012). 85 Zhang, J. H., Chung, T. D. & Oldenburg, K. R. A Simple Statistical Parameter for Use in Evaluation and Validation of High Throughput Screening Assays. Journal of biomolecular screening 4, 67-73 (1999).

163

86 Kimple, A. J. et al. A high throughput fluorescence polarization assay for inhibitors of the GoLoco motif/G-alpha interaction. Combinatorial chemistry & high throughput screening 11, 396-409 (2008). 87 Souza-Fagundes, E. M. et al. A high-throughput fluorescence polarization anisotropy assay for the 70N domain of replication protein A. Analytical biochemistry 421, 742-749, doi:10.1016/j.ab.2011.11.025 (2012). 88 Hulme, E. C. & Trevethick, M. A. Ligand binding assays at equilibrium: validation and interpretation. British journal of pharmacology 161, 1219-1237, doi:10.1111/j.1476- 5381.2009.00604.x (2010). 89 Wong, K.-y. H. K., CN), Leung, Yun Chung (Hong Kong, CN), Chan, Pak Ho (Hong Kong, CN), Chung, Wai Hong (Hong Kong, CN), Chow, Ka Yan (Hong Kong, CN), Chow, Ho Yin (Hong Kong, CN), Leung, Hon Man (Hong Kong, CN). Method and system for detection of chloramphenicol. United States patent (2012). 90 Griss, R. et al. Bioluminescent sensor proteins for point-of-care therapeutic drug monitoring. Nature chemical biology 10, 598-603, doi:10.1038/nchembio.1554 (2014). 91 Lea, W. A. & Simeonov, A. Fluorescence polarization assays in small molecule screening. Expert opinion on drug discovery 6, 17-32, doi:10.1517/17460441.2011.537322 (2011). 92 Minuesa, G. et al. A 1536-well fluorescence polarization assay to screen for modulators of the MUSASHI family of RNA-binding proteins. Combinatorial chemistry & high throughput screening 17, 596-609 (2014). 93 Yasgar, A. et al. High-throughput 1,536-well fluorescence polarization assays for alpha(1)-acid glycoprotein and human serum albumin binding. PloS one 7, e45594, doi:10.1371/journal.pone.0045594 (2012). 94 Minas, W., Bailey, J. E. & Duetz, W. Streptomycetes in micro-cultures: growth, production of secondary metabolites, and storage and retrieval in the 96-well format. Antonie van Leeuwenhoek 78, 297-305 (2000). 95 Wang, B. L. et al. Microfluidic high-throughput culturing of single cells for selection based on extracellular metabolite production or consumption. Nature biotechnology 32, 473-478, doi:10.1038/nbt.2857 (2014). 96 Clackson, T. et al. Redesigning an FKBP-ligand interface to generate chemical dimerizers with novel specificity. Proceedings of the National Academy of Sciences of the United States of America 95, 10437-10442 (1998). 97 Clackson, T. P. et al. (Google Patents, 2003).

164

Chapter 4

Yeast Three-Hybrid Assay for

Small Molecule Screens and Selections

──────────────────────────────────────────────────────────────────────────────

Chapter contributions: Yao Zong Ng and Virginia W. Cornish conceived the idea, Y.Z.N. designed the study, Y.Z.N., Marie D. Harton and Pedro Baldera-Aguayo performed the experiments and analyzed the results.

165

4.0 Chapter outlook

In the last chapter, we established the fluorescence polarization (FP) assay as a high- throughput screen for small molecules, capable of screening ~ 104 – 105 samples per day.

However, when library sizes approach 108 or more, screening methods becomes impractical and growth selections have to be employed in order to sample the full library. Previously, our lab developed chemical complementation, based on the small molecule yeast three-hybrid system

(Y3H), to establish a selection for enzyme catalysis. In this chapter, we propose to adapt the

Y3H to detect small molecules by displacement of a competitive small molecule dimerizer that bridges the two receptor-fusion proteins in the Y3H system. In this way, small molecule detection can be coupled to a change in the transcription output of a reporter gene, which can be used to create a screen or selection. In preliminary results, we show that the Y3H system can detect the small molecule trimethoprim (TMP), as well as the polyketides meridamcyin and

FK506. At the same time, small molecule detection was coupled to the expression of a selectable reporter gene. This sets the stage for the Y3H system to be further developed as a general selection platform for small molecules for combinatorial metabolic engineering.

166

4.1 Introduction

To address the combinatorial complexity of metabolic engineering, directed evolution approaches can be applied to optimize the biosynthetic enzymes, pathways, interacting networks and the host strain background. However, this will require high-throughput DNA assembly or mutagenesis technologies for creating combinatorial libraries, as well as screens and selections for the desired product (see Chapter 1). While many technologies have been developed for the creation of large DNA libraries1 (> 108), there is a lack of generalizable and high-throughput screens or selections for small molecules2. Colorimetric or fluorescence-based screens are limited to ~ 104 – 105 samples per day, except for fluorescence activated cell sorting (FACS), which can screen up to ~ 108 single cells per day3. However, unless the product is fluorescent and accumulates intracellularly, the application of FACS to detect small molecule products for metabolic engineering is limited4. In contrast, selections enable large populations of cells ~ 1010 to be enriched in a one-pot growth selection, and thus remains the method of choice for testing really large libraries (> 108)2.

One strategy to create a small molecule selection is to couple the detection of the compound to a change in the expression of a selectable reporter gene. Natural regulators such as transcription factors, riboregulators (e.g. riboswitches) and nuclear receptor proteins detect natural metabolites and can transduce the binding event into a change in gene transcription or translation. Thus, these natural regulators have been harnessed to create product screens or growth selections by coupling small molecule detection to the expression of a screenable or selectable reporter gene respectively5. In addition, the specificity of some regulators have been switched in order to expand the repertoire of compounds that they can detect6. However, this is neither trivial nor routine as the binding and signaling functions in these regulators are tightly

167 interwoven. Thus, more modular systems that can interface small molecules with genetics may generally be easier to engineer.

The small molecule three-hybrid (Y3H) system has its roots in the use of natural chemical dimerizers such as rapamycin for the control of protein interactions in the cell7.

Subsequently, the idea of interfacing the two halves of a transcription factor – the DNA binding domain (DBD) and activation domain (AD), with a dimeric small molecule, was born as a way to modulate gene transcription with small molecules8. The Y3H is highly modular in that different receptors can be fused to the DBD and AD. The two receptor-fusion proteins are then bridged by a dimeric small molecule (the chemical inducer of dimerization or CID) that can be synthesized by joining the ligands of the two receptors together. The Y3H system has since been exploited to screen cDNA libraries to discover receptors for small molecules8, to identify FKBP12 or FRAP

(receptor) mutants that bind specific FK506 or rapamycin analogs9,10, and to evolve receptors to bind novel ligands11.

Cornish et al. developed the idea of “Chemical Complementation”12 to couple enzyme activity to a transcriptional output via the Y3H system. By introducing reactive groups in the middle of the CID, the three-hybrid system was adapted to select for enzymes that catalyzed bond formation (glycosynthases)13 or bond cleavage (cellulases)14 reactions. Here, we envisioned that the Y3H system can be applied in a competition format to assay for small molecule displacement of one end of the CID from its receptor, which will result in a decrease in reporter gene transcription (Fig. 4-1). This is analogous to the Boeke “reverse” yeast one or two-hybrid systems, which can be used to screen for mutations or molecules that dissociate DNA-binding proteins from DNA or protein-protein interactions respectively15. As a proof-of-concept, we established Y3H systems capable of detecting the small molecules trimethoprim (TMP),

168 meridamycin and FK506 (tacrolimus). In addition, we show that the Y3H can be used to differentiate varying concentrations of these molecules and to couple detection to the transcription of a selectable marker gene.

Figure 4-1. Y3H assay for small molecule detection. X and Y are receptor proteins that each bind to one half of the small molecule CID, Z. X and Y are fused to the DNA-binding domain (DBD) and activation domain (AD) respectively. In the presence of X, Y and Z, an active transcription factor is reconstituted, thus activating transcription of the reporter gene. The presence of a small molecule product

(e.g. biosynthesized by a cell) can compete with one end of the CID for binding to a receptor. The displacement of the CID results in inactivation of reporter gene transcription. In this way, the Y3H system can couple small molecule detection to a genetic output.

169

4.2 Results

4.2.1 Yeast three-hybrid screen for trimethoprim

First, we tested to see if the Y3H system was capable of detecting small molecules, using trimethoprim (TMP) as a proof-of-principle target. Previously, our lab has developed Y3H systems based on the two receptor fusion pairs: dihydrofolate reductase (DHFR)–LexA DNA- binding domain (DBD) and glucocorticoid receptor–B42 activation domain (AD). These are brought together by the CID, dexamethasone-methotrexate (Dex-Mtx), to reconstitute a transcription factor, resulting in the activation of a downstream reporter gene16. Our lab also developed a second generation Y3H system based on the same small molecule-receptor pairs, but with the URA3 reporter gene placed under an extra layer of control by the dual tetracycline (Tet)- inducible regulators, which are in turn regulated by the Y3H system17,18 (Fig. 4-2). This system was shown to enable selective enrichment from a mock library of 106.19 The use of URA3 as a marker allows for a negative selection to be performed in the presence of 5-fluoroorotic acid

(FOA), which is converted to a toxic product by the enzyme product of URA320.

170

Figure 4-2. Y3H system to detect TMP. The second generation Y3H system adds an extra layer of control mediated by the TetR’-activator and TetR’-repressor. In the absence of TMP, the Y3H is active, resulting in transcription of the TetR’-activator. This alleviates the repression of the URA3 reporter by the constitutively expressed TetR’-repressor, thus resulting in transcription of URA3, which leads to cell death in the presence of 5-FOA (negative selection). When TMP is present, it can displace Mtx-Dex from binding to DHFR, resulting in an inactive Y3H system. This prevents the TetR’-activator from being expressed. Thus, the TetR’-repressor binds to TetO and inhibits the transcription of URA3, which increases cell survival in the presence of 5’-FOA.

Thus, we envisioned that this system could be used to detect TMP, a small molecule antibiotic that binds with high affinity to bacterial dihydrofolate reductase (DHFR)21. TMP should compete with the methotrexate end of the CID (Dex-Mtx) for binding to DHFR, thus reducing transcription of URA3 and promoting cell growth in the presence of FOA (Fig. 4-2).

171

Indeed, when we cultured the cells with different concentrations of TMP (0.5 nM – 50 µM), we observed that the cells grew faster in the presence of higher concentrations of TMP (Fig. 4-3).

TMP detection by Y3H 2

1.8

1.6

1.4

1.2

1 OD600 0.8

0.6

0.4

0.2

0 0 20 40 60 80 100 120 140 160 Time (h)

- CID, - DMF - CID, + DMF - CID, + DMF, 50 uM TMP + CID, - TMP + CID, 0.5 nM TMP + CID, 5 nM TMP + CID, 50 nM TMP + CID, 500 nM TMP + CID, 5 µM TMP + CID, 50 µM TMP

Figure 4-3. TMP detection using the Y3H system. The CID (Dex-Mtx) was either added (+) or not added (-) to the wells. CID was added to a final concentration of 5 µM. Dimethyl formamide (DMF) is the solvent for the CID. Concentrations of TMP refer to final concentrations in the well. Error bars represent standard deviation of 4 wells.

172

The maximum growth range of the assay is set by the minimum growth curve when the

CID is added (+ CID), and the maximum growth curve when no CID is added (- CID). Since the

CID was dissolved in the solvent dimethyl formamide (DMF), we also made sure that its presence did not affect growth (- CID, + DMF curve was identical to the - CID, - DMF curve).

The Y3H system was highly sensitive and we could detect as low as 0.5 nM of TMP, although the difference in growth compared to the minimum curve (+ CID, - TMP) was small. Above 5

µM of TMP, the growth reached the maximum (- CID, - DMF curve). This suggests that the

Y3H system is capable of differentiating between 0.5 nM and 5 µM of TMP (a dynamic range of at least 104) if run as a growth selection, since minor differences in growth can be amplified over time. However, when the Y3H is run as a screen based on OD600 measured in microtiter plates, the dynamic range may be smaller since minor growth differences can fall within the error of the

OD600 measurements. In fact, OD600 changes did not scale linearly with TMP concentration. The maximum OD600 change occurred between 50 nM and 500 nM of TMP (Fig. 4-3). Thus, when applied as a screen, this Y3H system is most useful for detecting product concentrations in that range.

4.2.2 Yeast three-hybrid screen for meridamycin

Next, we wanted to see if we could construct an Y3H system to detect the polyketide meridamycin that we aimed to produce using our engineered Streptomyces spp. strains (see

Chapter 2). Previously, meridamycin has been shown to bind to the receptor FKBP1222. An

Y3H system employing the receptor fusions Lex A DBD-glucocorticoid receptor and B42 AD-

FKBP12, bridged by the CID dexamethasone-FK506 has been developed8. Our lab developed an

Y3H system using the CID methotrexate–SLF (Mtx-SLF)23 to bridge the receptor fusions LexA

173

DBD-DHFR and B42 AD-FKBP12. SLF is a synthetic analog of FK506, and has been used successfully as part of a CID in Y3H systems9,23. In that study, various FKBP12 mutants with varying affinities for FK506 were generated, in order to test the effect of small molecule-receptor affinity on the transcriptional output24. However, it was reported that the Y3H system could only activate the screenable marker lacZ but not the auxotrophic markers LEU2 or URA3. Thus, in this study, we decided to swap the receptor fusion pairs to B42 AD-DHFR and LexA DBD-

FKBP12, to see if we could couple the Y3H system to a selectable LEU2 reporter gene, which would enable growth in the absence of leucine. In this setup, meridamycin would displace the

SLF moiety of the CID from binding to FKBP12, leading to inactivation of LEU2 transcription and decreased cell growth in media lacking leucine (Fig. 4-4).

Figure 4-4. Y3H system to detect meridamycin or FK506. The presence of FK506 or meridamycin can displace SLF from binding to the receptor FKBP12. This results in inactivation of the Y3H system, which decreases the transcription of the LEU2 gene, leading to lower cell growth in the absence of leucine.

To increase our chances of developing a sensitive Y3H system to detect meridamycin, various FKBP12 mutants (WT, H87L, Y26F) with varying Kd’s for SLF (15, 7.3, 170 nM respectively) were tested. When the CID alone was supplied, only the WT FKBP12 and H87L

174 mutant resulted in LEU2 activation and growth in media lacking leucine, with the higher affinity receptor H87L demonstrating faster growth (Fig. 4-5).

LEU2 selection 2.5 2

1.5 600

OD 1 0.5 0 0 1 2 3 Time (Days) No Mtx-SLF WT No Mtx-SLF H87L No Mtx-SLF Y26F Plus Mtx-SLF WT Plus Mtx-SLF H87L Plus Mtx-SLF Y26F

Figure 4-5. LEU2 selection with Y3H systems containing FKBP12 receptor mutants. For each receptor mutant (wild-type WT, H87L or Y26F), growth curves were measured in the presence (plus Mtx-

SLF) and absence of the CID. All error bars represent standard deviation across 3 wells.

We selected the H87L Y3H system for use to maximize the dynamic range of the assay, and tested the strain with 100 nM and 5 µM of meridamycin. This led to the expected result, where increasing meridamycin concentrations caused decreased cell growth (Fig. 4-6).

175

Figure 4-6. Detection of 100 nM and 5 µM of meridamycin standards with the H87L Y3H system.

All error bars represent standard deviation across 3 wells.

To ensure that the decrease in growth was due to the effect of meridamycin on the Y3H output and not the toxicity of the compound itself, we performed a growth assay in the presence of different concentrations of meridamycin (0, 50 nM, 100 nM, 1 µM, 5 µM). Although the addition of meridamycin led to a ~ 25 % decrease in growth, but all the different concentrations tested led to similar levels of growth (Fig. 4-7). This suggests that the differences in growth observed between 100 nM and 5 µM of meridamycin, when the CID was added, was likely due to the effect of meridamycin on the transcriptional output of the Y3H system (Fig. 4-6).

176

Meridamycin toxicity 2.5 2

1.5 600

OD 1 0.5 0 0 1 2 3 Time (Days)

0 nM 50 nM 100 nM 1 µM 5 µM

Figure 4-7. Growth curves with different concentrations of meridamycin. No CID was added. Values refer to the final concentrations of meridamycin (0, 50 nM, 100 nM, 1 µM and 5 µM) in the well.

Standards were dissolved in DMSO. All error bars represent standard deviation across 3 wells.

4.2.3 Yeast three-hybrid screen for FK506

Finally, we sought to optimize the Y3H for detecting biosynthetic meridamycin.

However, since the high biosynthetic strains for meridamycin were not yet available, we used the polyketide FK506 instead as a test bed. In Chapter 3, we demonstrated the production of FK506 in cultures of Streptomyces tsukubaensis cells. Here, we tested to see if the Y3H assay could be used to detect a real biosynthetic product (as opposed to purified standards). First, we checked that the H87L Y3H system could detect FK506. As expected, the addition of 100 nM and 5 µM of FK506 standards led to a concentration-dependent decrease in growth (Fig. 4-8), which was similar to the case for meridamycin (Fig. 4-6).

177

Figure 4-8. Detection of 100 nM and 5 µM of FK506 standards with the H87L Y3H system. Chart from Marie Harton, thesis. Standards were dissolved in DMSO. Bars represent the average OD600 after 6 days of culture. All error bars represent standard deviation across 3 wells.

As biosynthesized FK506 is not secreted from the S. tsukubaensis cells, we first had to perform a chemical extraction step using ethyl acetate (see Chapter 3). Previously, to prepare the samples for the fluorescence polarization assay, we rotavapped the ethyl acetate extracts and resuspended the samples in methanol. In this study, we tested different methods of extraction from unfermented Streptomyces spp. ISPz media used for FK506 production25. We tested to see if the ethyl acetate extracts could be screened directly by the Y3H assay, and compared that to samples that were resuspended in ethanol or extracted with methanol. Our results show that adding the unfermented media directly to the Y3H sensor cells inhibited cell growth, whereas the ethyl acetate extract produced maximal growth (Fig. 4-9). Replacing the ethyl acetate solvent with ethanol led to a decrease in cell growth, while the methanol extracts completely inhibited cell growth. One possible explanation for these results is that the ISPz media contained water- soluble compound(s) that were inhibitory to the growth of yeast. Performing the ethyl acetate extraction removed these inhibitory substance(s), leading to the restoration of Y3H growth.

Furthermore, ethyl acetate did not harm the yeast cells as it was immiscible with the aqueous

178 phase, a small volume (5 µL) was used, and it evaporated quickly in the incubator. Adding either ethanol or methanol inhibited yeast growth, and thus these were not tested further.

1

600 0.5 OD

0 0 1 2 3 4 5 Time (Days)

No SM Plus MS MS_media(1) MS_media(2A) MS_media(2B) MS_media(3) MS_mediaFK506

Figure 4-9. Growth of the Y3H strain in the presence of various extracts of unfermented ISPz media.

No SM: No small molecule CID. Plus MS: 5 µM of MTX-SLF added. All media were tested in the presence of the CID. (1): Unaltered ISPz media. (2A): Ethyl acetate extract. (2B): Ethyl acetate extraction followed by rotavap and resuspension in ethanol. (3): Methanol extract of ISPz media. MS_mediaFK506: unaltered ISPz media spiked with 5 µM FK506. For the methanol extracts, an equal volume of methanol was added to ISPz, vortexed for 1 h and the supernatent tested. Error bars represent one standard deviation across three wells.

Finally, we tested an ethyl acetate extract of a sample from a S. tsukubaensis fermentation trial that had previously been shown to contain ~ 15 mg/L (~ 20 µM) of FK506 by LC-MS (see 179

Chapter 3). The expected final concentration of FK506 in the well is 500 nM. Our results show a detectable decrease in Y3H growth when the sample was added compared to an ethyl acetate extract of unfermented media (Fig. 4-10). These results are promising and in future, further tests can be performed with extracts from other time points in the fermentation run, which contain different levels of FK506 (Fig. 3-2).

1

600 0.5 OD

0 0 1 2 3 4 5 Time (Days)

No SM Plus MS MS_media(2A) MS_t192(2A)

Figure 4-10. Detection of biosynthesized FK506 using the Y3H system. A fermentation sample of S. tsukubaensis in ISPz from Day 8 (t192) was extracted with an equal volume of ethyl acetate and 5 µL was added to the wells (200 µL) containing the Y3H sensor strain. Thus, the final expected concentration of

FK506 in the well is 500 nM. Error bars represent the standard deviation across three wells.

180

4.3 Discussion

In this chapter, we show how the Y3H system can be adapted to couple small molecule detection to a transcriptional readout for the small molecules TMP, meridamycin and FK506.

However, these Y3H systems made used of small molecule-receptor pairs that had already been previously established. A key question is whether the Y3H is sufficiently modular to be readily applied as a general platform to detect small molecules for metabolic engineering.

Early work on small molecule–protein dimerizer systems26 exploited natural small molecule–receptor pairs such as FK506-FKBP12, cyclosporin A-FKBP12, rapamycin-FKBP12 and rapamycin-FRAP for the control of protein interactions and localization in cells7,9,27-29. Next, autonomous systems based on synthetic analogs of FK506/rapamycin that interact only with mutant FKBP12 but not the wild-type (WT) receptor were developed to prevent cross-talk with endogenous pathways9. Later, Y3H systems based on the small molecule-receptor pairs dexamethasone—glucocorticoid receptor (DEX-GR)8,16, FK506-FKBP128,24,30, methotrexate— dehydrofolate reductase (MTX-DHFR)16,30, DDBO—scytalone dehydratase31, biotin—

32 32 streptavidin (SA) , estradiol—estrogen receptor (ER) , and O6-benzylguanine—human O6- alkylguanine-DNA alkyltransferase (hAGT) SNAP tag33 were developed. The Y3H has also been applied to screen cDNA libraries to identify the protein targets of small molecules like anecortave acetate (AA)34, kinase inhibitors35, erlotinib, atorvastatin and sulfasalazine36, and the plant hormones cucurbic acid methylester and 2,6-dihydroxybenzoic acid37.

As such, there is a rich history of examples where natural and synthetic small molecules have been interfaced with receptor proteins to control genetic programs in the cell. In addition, most of these studies involving the Y3H system also showed that an excess of the small molecule monomer can compete with the CID to inactivate transcription, which suggests that the Y3H can

181 be used as a small molecule sensor16,38. Furthermore, the Y3H has been harnessed for directed evolution to switch the specificity of the retinoid X receptor (RXR) from retinoic acid to the

11 synthetic ligand LG335 (EC50~40nM) , demonstrating that the Y3H can be used to evolve receptors for new small molecule targets.

In terms of the CID, we and others have performed studies on the optimal CID linker length38 and type27,35, which can inform future CID designs. Another key concern is the affinity

(Kd) of the small molecule-receptor pairs needed to create functional Y3H systems. It has been estimated for the yeast two-hybrid system that protein-protein interactions with Kd < 1 µM can be detected39 and our recent work using the Y3H to fish out the anecortave acetate - PDE6D

34 receptor interaction (Kd = 2.7µM) suggests that this is also true for the Y3H . However, this may be complicated by other factors such as the uptake of the CID and target molecule into the cell, receptor turnover and the influx/efflux of the CID from the nucleus8,30. Thus for receptors with Kd above this range, directed evolution (e.g. using the Y3H system) can be first used to evolve a higher affinity binder.

Finally, in order for the Y3H to be run as a selection and not a screen, the production and selection functions for the target molecule have to be combined within the same cell. However, there could be a few concerns with this approach. For one, both the production and selection functions are naturally antagonistic, which can reduce the power of the selection5. This is because optimized production strains are often under heavy metabolic burden and may exhibit lowered growth rates as resources are channeled for production. Another potential challenge common to all growth selections is that the product, if secreted, can diffuse into other cells resulting in cross-talk, which will increase the background noise and reduce the enrichment power of the selection. Thus, methods that co-encapsulate the producer and sensor cells, such as

182 in alginate beads, may represent a potential solution to these problems. Microbeads (~100 μm) offer an efficient means to compartmentalize cells for biological assays and can be sorted based on fluorescence by the large-particle COPAS sorter for libraries of 106 - 107.40-42 In order to achieve this, the Y3H system can be coupled to the expression of fluorescent genes such as RFP or GFP. Another efficient method to sort the beads is to couple the Y3H output to a gas- producing enzyme such as peroxidase or carbonic anhydrase. Gas production can cause the beads to float in water to allow sorting of the beads easily, cheaply and quickly by density43.

Alternatively, three-hybrid systems can be constructed in producer hosts such as Streptomyces spp. to enable selection for non-secreted compounds such as FK506 or meridamycin.

In conclusion, we show that the Y3H system can differentiate varying concentrations of the small molecules TMP and meridamycin, and couple that to the expression of a selectable marker gene. Furthermore, we developed a potential protocol to apply the Y3H to screen real biosynthetic FK506 samples. We anticipate that the Y3H system can be a general platform to develop small molecule screens/selections for metabolic engineering, and will significantly increase the library sizes that can be tested using high-throughput directed evolution approaches.

3.4 Experimental methods

General methods and molecules. All molecular biology techniques (e.g. PCR, cloning,

Gibson Assembly, E. coli and S. cerevisiae culture and transformation) were performed according to standard methods (see Section 2.4). TMP (#T7883) and FK506 (#F4679) were obtained from Sigma. Meridamycin was obtained from Pfizer Inc. Dex-Mtx was synthesized by

Zhixing Chen and Ehud Herbst. Mtx-SLF was synthesized by Eric Althoff23.

183

Yeast three-hybrid strains. The Y3H strain for TMP detection (MH24503_098) was a previously constructed strain in the lab19. The Y3H strain for FK506 detection was constructed as follows. EGY48 (MATa trp1 his3 ura3 6LexAop-LEU2 GAL+)44 was transformed with the plasmid MH423B42-eDHFR and selected for on SC (H-, 2% glucose) plates. Next, colonies were transformed with the plasmid MH414LexA-FKBP12* to generate MH-H87L. Successful transformants were selected on SC (HT-, 2% glucose) plates.

Plasmid construction. MH414LexA-FKBP12* was constructed as follows. The pGAL1-

LexA fragment was amplified from pKB52112 with primers MH162 (5’-

AAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGCCCCATTATCTTAGCC

TAAA) and MH164 (5’- TCCTGGGGAGATGGTTTCCACCTGCACTCCCAATTCCAG

CCAGTCGC) that added homology to pRS414GAL1 and the FKBP12*-tADH fragment. The

FKBP12*-tADH fragment was amplified from 1054E24 with primers MH165 (5’-

GTTATTCGCAACGGCGACTGGCTGGAATTGGGAGTGCAGGTGGAAAC) and MH163

(5’- GAGGGCGTGAATGTAAGCGTGACATAACTAATTACATGACGGATCCGTGTGGAA

GAAC). The fragments pGAL1-LexA and FKBP12*-tADH were Gibson assembled with pRS414GAL1 (ATCC 87328) digested with SacI and XhoI according to the manufacturer’s protocol (NEB #2611). Plasmid MH423B42-eDHFR was constructed as follows. The pGAL1-

B42-eDHFR fragment was amplified from pBC1034E38 with primers MH162 and MH163. The product was Gibson assembled into pRS423 (ATCC 87327) digested with SacI and XhoI.

Following Gibson assembly, the reaction mix was transformed into E. coli by electroporation and selected on LB/ampicillin plates. Plasmids were extracted (Miniprep, Qiagen) and characterized by sequencing (Genewiz). The other strains containing WT FKBP12 and the Y26F

184 were constructed in an analogous manner but using templates containing WT FKBP12 or

Y26F24.

Y3H assays to detect TMP. Three colonies of the Y3H strain were inoculated into SC(HTL-),

2% glucose media and shaken overnight at 30°C. Cells were pre-induced for 24 hours in

SC(HTL-), 2% galactose, 2% raffinose and 2 μg/mL of doxycycline (Dox). The pre-induced cells were added to SC(HTL-), 2% galactose, 2% raffinose, 0.2 % FOA, 2 μg/mL Dox, +/- 5 μM Mtx-

SLF to have a starting OD600 of 0.1. 96-well plates were used and the total volume in each well was 200 µL. The plates were shaken at 30°C and 230 rpm, and OD600 readings were taken with a

Tecan plate reader. All growth curves were performed in duplicate with three technical replicates.

Y3H assays to detect FK506 or meridamycin. In general, three colonies of the Y3H strain were inoculated into SC(HT-), 2% glucose media and shaken overnight at 30°C. Cells were pre- induced for 24 hours in SC(HT-), 2% galactose, 2% raffinose media. The pre-induced cells were

- added to SC(HTL ), 2% galactose, 2% raffinose, +/- 5 μM Mtx-SLF to have a starting OD600 of

0.1. 96-well plates were used and the total volume in each well was 200 µL. The plates were shaken at 30°C and 230 rpm, and OD600 readings were taken with a Tecan plate reader. All growth curves were performed in duplicate with three technical replicates.

FK506 production. Streptomyces tsukubaensis (NRRL 18488) was obtained from the

Agricultural Research Service Culture Collection (NRRL), USA. Spores were harvested after culturing NRRL 18488 on inorganic salts/starch agar media (ISP4) (BD Difco, #277210) at 28

185

ºC for 4 days. Production of FK506 was carried out in 2.2 mL, square, 96-well deep well microtiter plates (Axygen Scientific) using a two-stage cultivation method.25 First, 500 μL of modified BaSa seed medium was inoculated with spores of NRRL 18488 and incubated for 3 days at 28 ºC and 250 rpm (Innova 44R, 2-inch orbit). The seed culture was used to inoculate

500 – 1000 μL of ISPz production media 25 per well and incubated for 4 – 8 days at 28 ºC and

250 rpm. The volume of seed culture added was 10 % of the final volume in the well. The plates were sealed with two layers of sterile, breathable rayon seal (AeraSeal) before incubation.

Extraction of FK506. The culture in each well/flask was transferred into a 2 mL Eppendorf tube. An equal volume of ethyl acetate was added and the tube was vortexed for 30 min. After centrifugation for 5 minutes, the top ethyl acetate layer was used for testing with the Y3H system. To obtain the ethanol sample, the ethyl acetate layer was transferred into a 4 mL glass vial and evaporated to dryness by rotary evaporation. An equal volume of ethanol was added to dissolve any FK506 present for subsequent testing. Methanol extraction was performed by adding an equal volume of methanol to the culture. The tube was vortexed for 1 h and centrifuged. The supernatant was taken for testing with the Y3H system.

186

3.5 References

1 Esvelt, K. M. & Wang, H. H. Genome-scale engineering for systems and synthetic biology. Molecular systems biology 9, 641, doi:10.1038/msb.2012.66 (2013). 2 Dietrich, J. A., McKee, A. E. & Keasling, J. D. High-throughput metabolic engineering: advances in small-molecule screening and selection. Annual review of biochemistry 79, 563-590, doi:10.1146/annurev-biochem-062608-095938 (2010). 3 Basu, S., Campbell, H. M., Dittel, B. N. & Ray, A. Purification of specific cell population by fluorescence activated cell sorting (FACS). J Vis Exp, doi:10.3791/1546 (2010). 4 Cheng, Q., Ruebling-Jass, K., Zhang, J., Chen, Q. & Croker, K. M. Use FACS sorting in metabolic engineering of Escherichia coli for increased peptide production. Methods Mol Biol 834, 177-196, doi:10.1007/978-1-61779-483-4_12 (2012). 5 Dietrich, J. A., Shis, D. L., Alikhani, A. & Keasling, J. D. Transcription factor-based screens and synthetic selections for microbial small-molecule biosynthesis. ACS synthetic biology 2, 47-58, doi:10.1021/sb300091d (2013). 6 Chockalingam, K., Chen, Z., Katzenellenbogen, J. A. & Zhao, H. Directed evolution of specific receptor-ligand pairs for use in the creation of gene switches. Proceedings of the National Academy of Sciences of the United States of America 102, 5691-5696, doi:10.1073/pnas.0409206102 (2005). 7 Amara, J. F. et al. A versatile synthetic dimerizer for the regulation of protein-protein interactions. Proceedings of the National Academy of Sciences of the United States of America 94, 10618-10623 (1997). 8 Licitra, E. J. & Liu, J. O. A three-hybrid system for detecting small ligand-protein receptor interactions. Proceedings of the National Academy of Sciences of the United States of America 93, 12817-12821 (1996). 9 Clackson, T. et al. Redesigning an FKBP-ligand interface to generate chemical dimerizers with novel specificity. Proceedings of the National Academy of Sciences of the United States of America 95, 10437-10442 (1998). 10 Liberles, S. D., Diver, S. T., Austin, D. J. & Schreiber, S. L. Inducible gene expression and protein translocation using nontoxic ligands identified by a mammalian three-hybrid screen. Proceedings of the National Academy of Sciences of the United States of America 94, 7825-7830 (1997). 11 Schwimmer, L. J., Rohatgi, P., Azizi, B., Seley, K. L. & Doyle, D. F. Creation and discovery of ligand-receptor pairs for transcriptional control with small molecules. Proceedings of the

187

National Academy of Sciences of the United States of America 101, 14707-14712, doi:10.1073/pnas.0400884101 (2004). 12 Baker, K. et al. Chemical complementation: a reaction-independent genetic assay for enzyme catalysis. Proceedings of the National Academy of Sciences of the United States of America 99, 16537-16542, doi:10.1073/pnas.262420099 (2002). 13 Lin, H., Tao, H. & Cornish, V. W. Directed evolution of a glycosynthase via chemical complementation. Journal of the American Chemical Society 126, 15051-15059, doi:10.1021/ja046238v (2004). 14 Peralta-Yahya, P., Carter, B. T., Lin, H., Tao, H. & Cornish, V. W. High-throughput selection for cellulase catalysts using chemical complementation. Journal of the American Chemical Society 130, 17446-17452, doi:10.1021/ja8055744 (2008). 15 Vidal, M., Brachmann, R. K., Fattaey, A., Harlow, E. & Boeke, J. D. Reverse two-hybrid and one-hybrid systems to detect dissociation of protein-protein and DNA-protein interactions. Proceedings of the National Academy of Sciences of the United States of America 93, 10315- 10320 (1996). 16 Lin, H., Abida, W. M., Sauer, R. T. & Cornish, V. W. Dexamethasone−Methotrexate: An Efficient Chemical Inducer of Protein Dimerization In Vivo. Journal of the American Chemical Society 122, 4247-4248, doi:10.1021/ja9941532 (2000). 17 Belli, G., Gari, E., Piedrafita, L., Aldea, M. & Herrero, E. An activator/repressor dual system allows tight tetracycline-regulated gene expression in budding yeast. Nucleic acids research 26, 942-947 (1998). 18 Gari, E., Piedrafita, L., Aldea, M. & Herrero, E. A set of vectors with a tetracycline-regulatable promoter system for modulated gene expression in Saccharomyces cerevisiae. Yeast 13, 837-848, doi:10.1002/(SICI)1097-0061(199707)13:9<837::AID-YEA145>3.0.CO;2-T (1997). 19 Harton, M. D., Wingler, L. M. & Cornish, V. W. Transcriptional regulation improves the throughput of three-hybrid counter selections in Saccharomyces cerevisiae. Biotechnology journal 8, 1485-1491, doi:10.1002/biot.201300186 (2013). 20 Boeke, J. D., Trueheart, J., Natsoulis, G. & Fink, G. R. 5-Fluoroorotic acid as a selective agent in yeast molecular genetics. Methods in enzymology 154, 164-175 (1987). 21 Jing, C. & Cornish, V. W. Design, synthesis, and application of the trimethoprim-based chemical tag for live-cell imaging. Curr Protoc Chem Biol 5, 131-155, doi:10.1002/9780470559277.ch130019 (2013).

188

22 Summers, M. Y., Leighton, M., Liu, D., Pong, K. & Graziani, E. I. 3-normeridamycin: a potent non-immunosuppressive immunophilin ligand is neuroprotective in dopaminergic neurons. J Antibiot (Tokyo) 59, 184-189, doi:10.1038/ja.2006.26 (2006). 23 Althoff, E. A. & Cornish, V. W. A bacterial small-molecule three-hybrid system. Angewandte Chemie 41, 2327-2330, doi:10.1002/1521-3773(20020703)41:13<2327::AID- ANIE2327>3.0.CO;2-U (2002). 24 de Felipe, K. S., Carter, B. T., Althoff, E. A. & Cornish, V. W. Correlation between ligand- receptor affinity and the transcription readout in a yeast three-hybrid system. Biochemistry 43, 10353-10363, doi:10.1021/bi049716n (2004). 25 Martinez-Castro, M. et al. Taxonomy and chemically semi-defined media for the analysis of the tacrolimus producer 'Streptomyces tsukubaensis'. Applied microbiology and biotechnology 97, 2139-2152, doi:10.1007/s00253-012-4364-x (2013). 26 Spencer, D. M., Wandless, T. J., Schreiber, S. L. & Crabtree, G. R. Controlling signal transduction with synthetic ligands. Science 262, 1019-1024 (1993). 27 Kley, N. Chemical dimerizers and three-hybrid systems: scanning the proteome for targets of organic small molecules. Chemistry & biology 11, 599-608, doi:10.1016/j.chembiol.2003.09.017 (2004). 28 Belshaw, P. J., Spencer, D. M., Crabtree, G. R. & Schreiber, S. L. Controlling programmed cell death with a cyclophilin-cyclosporin-based chemical inducer of dimerization. Chemistry & biology 3, 731-738 (1996). 29 Belshaw, P. J., Ho, S. N., Crabtree, G. R. & Schreiber, S. L. Controlling protein association and subcellular localization with a synthetic ligand that induces heterodimerization of proteins. Proceedings of the National Academy of Sciences of the United States of America 93, 4604-4607 (1996). 30 Henthorn, D. C., Jaxa-Chamiec, A. A. & Meldrum, E. A GAL4-based yeast three-hybrid system for the identification of small molecule-target protein interactions. Biochemical pharmacology 63, 1619-1628 (2002). 31 Firestine, S. M., Salinas, F., Nixon, A. E., Baker, S. J. & Benkovic, S. J. Using an AraC-based three-hybrid system to detect biocatalysts in vivo. Nature biotechnology 18, 544-547, doi:10.1038/75414 (2000). 32 Hussey, S. L., Muddana, S. S. & Peterson, B. R. Synthesis of a beta-estradiol-biotin chimera that potently heterodimerizes estrogen receptor and streptavidin proteins in a yeast three-hybrid system. Journal of the American Chemical Society 125, 3692-3693, doi:10.1021/ja0293305 (2003).

189

33 Gendreizig, S., Kindermann, M. & Johnsson, K. Induced protein dimerization in vivo through covalent labeling. Journal of the American Chemical Society 125, 14970-14971, doi:10.1021/ja037883p (2003). 34 Shepard, A. R. et al. Identification of PDE6D as a molecular target of anecortave acetate via a methotrexate-anchored yeast three-hybrid screen. ACS chemical biology 8, 549-558, doi:10.1021/cb300296m (2013). 35 Becker, F. et al. A three-hybrid approach to scanning the proteome for targets of small molecule kinase inhibitors. Chemistry & biology 11, 211-223, doi:10.1016/j.chembiol.2004.02.001 (2004). 36 Chidley, C., Haruki, H., Pedersen, M. G., Muller, E. & Johnsson, K. A yeast-based screen reveals that sulfasalazine inhibits tetrahydrobiopterin biosynthesis. Nature chemical biology 7, 375-383, doi:10.1038/nchembio.557 (2011). 37 Cottier, S. et al. The yeast three-hybrid system as an experimental platform to identify proteins interacting with small signaling molecules in plant cells: potential and limitations. Frontiers in plant science 2, 101, doi:10.3389/fpls.2011.00101 (2011). 38 Abida, W. M., Carter, B. T., Althoff, E. A., Lin, H. & Cornish, V. W. Receptor-dependence of the transcription read-out in a small-molecule three-hybrid system. Chembiochem : a European journal of chemical biology 3, 887-895, doi:10.1002/1439-7633(20020902)3:9<887::AID- CBIC887>3.0.CO;2-F (2002). 39 Estojak, J., Brent, R. & Golemis, E. A. Correlation of two-hybrid affinity data with in vitro measurements. Molecular and cellular biology 15, 5820-5829 (1995). 40 PANKE, S. R., Zürich, CH-8050, CH), PELLAUX, René (Sihlfeldstrasse 150, Zürich, CH-8004, CH), HELD, Martin (Schaffhauserstrasse 41, Zürich, CH-8006, CH), MEYER, Andreas (Schöntalstrasse 18, Zürich, CH-8004, CH),. SUSPENSION ARRAYS FOR SPECIES-SPECIES INTERACTION STUDIES. (2009). 41 Walser, M. et al. Novel method for high-throughput colony PCR screening in nanoliter-reactors. Nucleic acids research 37, e57, doi:10.1093/nar/gkp160 (2009). 42 Walser, M., Leibundgut, R. M., Pellaux, R., Panke, S. & Held, M. Isolation of monoclonal microcarriers colonized by fluorescent E. coli. Cytometry. Part A : the journal of the International Society for Analytical Cytology 73, 788-798, doi:10.1002/cyto.a.20597 (2008). 43 Pellaux, R. Z., CH), Walser, Marcel (Winterthur, CH), Meyer, Andreas Jörg (Frauenfeld, CH), Held, Martin (Zurich, CH), Panke, Sven (Zurich, CH),. FAST SCREENING OF CLONES. United States patent (2013). 44 Golemis, E. A. et al. Interaction trap/two-hybrid system to identify interacting proteins. Curr Protoc Cell Biol Chapter 17, Unit 17 13, doi:10.1002/0471143030.cb1703s08 (2001).

190

Chapter 5

Semi-synthesis Approaches to the

Terpenoids Ambrox and the Tocotrienols

──────────────────────────────────────────────────────────────────────────────

The contents of this chapter will be published in:

Bertrand T. Adanve, Caroline A. Patenode, Yao Zong Ng, Scott A. Snyder and Virginia W.

Cornish. “A Hybrid Biosynthetic and Synthetic Approach for the Preparation of 9-epi-Ambrox

Derivatives”, in preparation.

Bertrand T. Adanve, Yao Zong Ng, Scott A. Snyder, and Virginia W. Cornish. “Novel Acid

Catalyst System Enabled the First Single-Step Synthesis of Tocotrienols from Biosynthetic

Geranylgeraniol”, in preparation.

Chapter contributions: B.T.A. conceived the ideas, B.T.A. designed and performed chemical synthesis experiments, Y.Z.N. and B.T.A. designed S. cerevisiae experiments, Y.Z.N. performed

S. cerevisiae experiments. C.A.P. performed E. coli experiments.

191

5.0 Chapter outlook

The use of metabolically engineered microbes for the biosynthesis of high-value, complex chemicals from cheap, renewable feedstocks represents a potentially environmentally- friendly and low-cost alternative to chemical synthesis. However, there are several limitations in relying entirely on this approach, including toxicity of the final product or intermediates, a lack of natural enzymes to catalyze certain chemical steps, as well as the time and cost of developing a commercially-ready strain for the conversion of feedstock to end product. Thus, several groups have established semi-synthetic schemes, combining biosynthesis with chemical synthesis to generate efficient and cost-effective routes to compounds, such as the anti-malarial drug artemisinin and anti-cancer drug taxol. In this chapter, we report semi-synthetic strategies to produce the terpenoids Ambrox, a key component of fragrances traditionally obtained from the rare natural source ambergris (sperm whale excrement), and the tocotrienols, Vitamin E components with anti-oxidant and anti-cancer properties, as well as their analogs. To make

Ambrox, we engineered E. coli to produce the unnatural intermediate Z,E-farnesol, followed by cyclization by BDSB (bromodiethylsulfonium bromopentachloroantimonate) and IDSI

(iododiethylsulfonium iodopentachloroantimonate) to form halo-Ambrox, which could be easily converted chemically to 9-epi-Ambrox and its analogs. Attempts to engineer yeast to produce

Z,E-farnesol were unsuccessful but led to the overproduction of the natural isomer E,E-farnesol instead. For the tocotrienols, we engineered yeast to express a novel triple enzyme fusion to biosynthesize the intermediate E,E,E-geranylgeraniol, which was converted in one step to the tocotrienols using a novel Lewis acid – Phase Transfer Catalyst system (Type I Adanve acid).

192

5.1 Introduction

Natural products are typically secondary metabolites of plants, fungi and bacteria that play important roles in signaling, nutrient acquisition and defense1,2. As such, these compounds possess significant bioactivities that have made them an indispensable source of pharmaceutical, agricultural, nutraceutical, fragrance and industrial compounds3. Traditionally, natural products are extracted from natural sources that produce these molecules in low amounts, which make this method expensive and largely unsustainable. It has been estimated that the bark of an entire

Pacific yew tree is needed to yield one dose of the anti-cancer drug paclitaxel (Taxol)4. On the other hand, total chemical synthesis is a powerful tool enabling us to access diverse molecules and analogs. However, because natural products are often relatively large and complex molecules, their total syntheses require multiple steps and are usually not economically viable for commercial production5.

Metabolic engineering of microbes holds the promise of providing an alternative, renewable, high-yield source of natural products from low-cost feedstocks6. Microbes that naturally produce the compound can be modified to increase the titer, rate and yield of the product. Alternatively, genetically tractable hosts such as E. coli and S. cerevisiae, which are easy to cultivate, can be engineered by transplanting entire biosynthetic pathways into them to produce non-native natural products7. These approaches have been made possible by advances in our ability to manipulate DNA and our understanding of cell biology.

While there have been numerous landmark successes, the majority of these efforts have required a tremendous input of time, labor and resources8-10. There is a long developmental lag- time between a proof-of-principle strain that can produce the compound in low amounts, and a high-yielding, commercially viable strain11. The reasons for this include our incomplete

193

knowledge of all the enzymes and regulatory genes that interact with a given pathway. This problem can be compounded when the core biosynthetic pathway is only partially elucidated, which limits production to natural hosts4. Even if all the enzymes are known, it may be challenging to express them and optimize their activity in a heterologous host. For instance, cytochrome P450s are notoriously difficult to express in a heterologous host12. Finally, the product or intermediates could be toxic to the host cell.

Given these challenges, many groups have turned to semi-synthetic strategies, combining biosynthesis and organic synthesis steps to derive the compound in the most efficient manner.

Some prominent examples include the anti-malarial drug artemisinin10 and the anti-cancer drug

Taxol13. In the former case, yeast was engineered to produce artemisinic acid, which was converted to artemisinin chemically in four steps. In the case of Taxol, the intermediate 10- deacetylbaccatin III was extracted from the needles of the Taxus plants and converted in four steps to Taxol13, although currently plant cell culture is the more promising approach for the production of Taxol14. Semi-synthesis can offer advantages such as stereo- and regioselective catalysis by enzymes and the ability to modify biosynthetic intermediates with synthetic chemistry to produce non-natural analogs15.

The terpenoids form a significant class of natural products because a few key intermediates can give rise to an incredibly diverse array of molecules16. Terpenoids offer a renewable source of next generation biofuels17-19, materials20, and pharmaceuticals4,10. Our first target is the terpenoid 9-epi-Ambrox (Fig. 5-1, (1)) and several related derivatives21,22. Ambrox is a valuable tricyclic labdane ether component of fragrances, naturally derived from ambergris

(sperm whale excrement). Due to the ban on whaling, difficulty in obtaining this compound has led to the development of total syntheses for Ambrox production21. However, these methods are

194

encumbered by long, multi-step reactions and low yield. Furthermore, the majority of reported

Ambrox analogs contain modifications to just the tetrahydrofuran ring, leaving the remainder of the molecule’s core unexplored23. Thus, we have devised a scheme to engineer E. coli / S. cerevisiae for the biosynthesis of the linear, unnatural terpenoid intermediate 2Z, 6E-farnesol

(Z,E-FOH), which can be chemically modified (Fig. 5-2) and cyclized by the halonium-based reagents (bromodiethylsulfonium bromopentachloroantimonate) and IDSI (iododiethylsulfonium iodopentachloroantimonate) to form halo-Ambrox (Fig. 5-3). The introduction of a halogen handle allows for subsequent functionalization and the preparation of novel labdane analogs.

Our second terpenoid target is the group of molecules known as the tocotrienols.

Together with the tocopherols, they form the Vitamin E family of compounds, which play important roles as anti-oxidants and even cardio-protective agents24. Despite the fact that the tocotrienols have shown promise as anti-cancer, neuro-protective25 and hypocholesteromic agents26, they have received less attention than the tocopherols, which is probably due to their scarcity in nature. While racemic α-tocopherol is produced industrially by a single step condensation of isophytol with trimethylhydroquinone (TMHQ)27,28, an analogous direct synthesis of α-tocotrienol through the condensation of geranyllinalool (GLOH) with TMHQ has not been reported. As such, nearly all of the reported racemic29-31 as well as asymmetric32-34 tocotrienol preparation methods involve the assembly of the molecule in different stages—first the construction of the chroman skeleton, followed by addition of the polyene chain. Our strategy was to engineer S. cerevisiae for the production of E,E,E-geranylgeraniol (E,E,E-GGOH), followed by a one-pot tandem C-C coupling of the geranylgeranyl chain to TMHQ using a novel

Lewis acid—Phase Transfer Catalyst system (type I Adanve acid). A Bronsted acid version of

195

the catalyst system (type II Adanve acid) was employed to achieve regioselective cycloetherification of 3-geranygeranyl-hydroquinones into their tocotrienol equivalents (Fig. 5-

7).

5.2 Results

5.2.1 Semi-synthesis approach to Ambrox and derivatives

Figure 5-1. Comparison of natural and semi-synthetic routes to 9-epi Ambrox and analogs. (A)

Natural biosynthetic route to 9-epi Ambrox. (B) Proposed semi-synthesis approach to 9-epi Ambrox and analogs.

196

Previously, other groups have designed semi-synthetic strategies to produce several key drug molecules such as the penicillins35, paclitaxel4 and artemisinin10. In each case, an advanced intermediate in the pathway was produced by biosynthesis and subject to organic synthesis steps to yield the final product. In contrast to this general strategy, we sought to produce biosynthetic intermediates that are amenable to chemical modification and are versatile, in order to enable access to a greater number of end products. Importantly, the intermediate need not occur naturally in the biosynthetic pathway towards the target molecule.

Figure 5-2. Scheme to derive 3-Z,E-homofarnesol from (a) Z,E-FOH and (b) E,E-FOH.

Z,E-FOH is an unnatural terpenoid molecule that is expensive (~$700 per mg, Sigma

94967) and is difficult to synthesize chemically36. Thus, our goal was to engineer E. coli and S.

197

cerevisiae to ensure a cheap and renewable source of Z,E-FOH (Fig. 5-4). As an intermediate,

Z,E-FOH can be efficiently homologated to 3-Z,E-homofarnesol (Fig. 5-2) and cyclized by halonium-donating reagents BDSB and IDSI to form halo-Ambrox (Fig. 5-3). This can be easily converted into 9-epi-Ambrox and other analogs. We also explored the use of the natural isomer of Z,E-FOH, all-trans-farnesol (E,E-FOH), as a starting material for chemical synthesis37.

Although only two steps are required to derive homofarnesol, a mixture of E,E-homofarnesol and Z,E-homofarnesol (ratio of 2:1) was produced, which required a tedious separation step to obtain the desired, pure Z,E-homofarnesol (Fig. 5-2). Thus, it is advantageous to biosynthesize the isomer Z,E-FOH.

198

Figure 5-3. Scheme for cyclization of homofarnesol to halogenated 9-epi Ambrox and further derivatization to novel analogs.

5.2.2 Biosynthesis of Z,E-farnesol in E. coli

In prokaryotes like E. coli, the building blocks for terpenoid synthesis, isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP), are generated by the 2-C-methyl-D- erythritol 4-phosphate (MEP) pathway, whereas in eukaryotes, the mevalonate (MVA) pathway is responsible (Fig. 5-4). The condensation of IPP and DMAPP, which is catalyzed by trans- farnesyl pyrophosphate synthase (trans-FPPS), produces geraniol pyrophosphate (GPP). A second condensation of GPP with IPP, which is catalyzed by the same enzyme trans-FPPS, produces E,E-farnesyl pyrophosphate (E,E-FPP), which can be dephosphorylated by endogenous phosphatases to produce E,E-FOH38.

To engineer E. coli for the production of Z,E-FOH, the enzymes in the MEP pathway from Bacillus subtilis (dxs, dxr, ispD, ispF, ispG, ispH and idi) were codon-optimized and overexpressed using the IPTG-inducible pLac promoter. It has been shown that B. subtilis dxs and dxr possess higher activity than their E. coli homologs39. Next, we heterologously expressed cis-farnesyl pyrophosphate synthases (cis-FPPSs) from several bacteria (Mycobacterium tuberculosis, Thermobifida fusca, Corynebacterium glutamicum, and Corynebacterium efficiens) in our host strain40,41. cis-FPPS catalyzes the same reactions as trans-FPPS but produces the isomer Z,E-FPP instead of E,E-FPP. This can be hydrolyzed by phosphatases to form Z,E-FOH.

199

Figure 5-4. Biosynthetic pathways for the production of Z,E-FOH and E,E-FOH. (A) E. coli and (B)

S. cerevisiae.

We tested the strains in shake flasks (small scale production trials) and found that the M. tuberculosis cis-FPPS (Rv1086) and C. glutamicum cis-FPPS produced Z,E-FOH, at levels at least four-fold higher than the natural isomer E,E-FOH (Fig. 5-5, Fig. 5-11, Fig. 5-13). Rv1086 resulted in the highest yield of Z,E-FOH (8 mg/L) compared to the C. glutamicum enzyme (~ 1 mg/L). Finally, we scaled-up production of the Rv1086 strain in a 6 L, bench-top vessel, which produced ~ 3 mg/L (Fig. 5-5, Fig. 5-12, Fig. 5-14). During the course of our experiments, an independent report confirmed our findings that expression of Rv1086 in an E. coli strain

200

containing an exogenous MVA pathway led to production of 7.7 mg/L of Z,E-FOH42. The group further fused the E. coli trans-FPPS (IspA) to Rv1086, which led to 115.6 mg/L of Z,E-FOH.

The authors hypothesized that the enzyme fusion led to higher titers than expressing the enzymes separately because the intermediate GPP produced by ispA was channeled more effectively to

Rv1086.

Figure 5-5. Production yields of Z,E-FOH and E,E-FOH in E. coli expressing different Z,E-FPPSs.

Production yields are reported as mg/L of media. The production yield for FPP2/A8 (E. coli containing the M. tuberculosis Z,E-FPPS Rv1086) is reported for bench-scale production (represented by the jar) and flask production. Error bars represent the standard error of four replicates.

5.2.3 Biosynthesis of E,E-farnesol in S. cerevisiae

We also attempted to engineer the yeast S. cerevisiae for the production of Z,E-FOH, to see if we could further increase yields. We chose to codon-optimize Rv1086 and Tfu0456 (trans-

201

FPPS from T. fusca) for yeast expression as these two enzymes were reported to have the highest

40 Kcat/Km values out of the four bacterial trans-FPPSs . We constructed high-copy plasmids containing Rv1086 or Tfu0456 alone, under control of the strong, constitutive PGK1 promoter, or plasmids containing the trans-FPPSs fused to DPP1 (either at the C- or N-terminus) via a

GGGS linker (Table 5-2). DPP1 is an endogenous phosphatase that is thought to hydrolyze

FPP/GGPP into FOH/GGOH43.

First, we overexpressed a truncated version of the rate-limiting enzyme in the mevalonate pathway, HMG-CoA reductase (tHMG1), in the yeast strain ATCC200589, which has been reported to lead to high prenyl alcohol yields44. We tested several colonies and found two populations of strains, one that produced ~ 10-fold higher levels of E,E-FOH, nerolidol (NOH),

GGOH and squalene overall (high-AN-ZE1-T7) than the other (low-AN-ZE1-T7) (Fig. 5-6).

Next, we transformed the plasmids overexpressing Rv1086 or Tfu0456 alone into the yeast strain high-AN-ZE1-T7 to create AN-ZE5-T2 and AN-ZE5-T1 respectively. However, we did not manage to detect any Z,E-FOH production in both strains. Instead, both strains produced high levels of squalene (2-3 mg/L) and low levels (<0.3 mg/L) of E,E-FOH (Fig. 5-6). When we transformed the trans-FPPS-DPP1 fusion constructs into high-AN-ZE1-T7, we found that the combination ATCC200589/tHMG1/DPP1-Tfu0456 (CP-51) led to high production of E,E-FOH

(20.3 mg/L) but no Z,E-FOH (Fig. 5-6).

202

Log Scale

Figure 5-6. Production yields of prenyl alcohols in S. cerevisiae strains engineered for Z,E-FOH production. Data is plotted on a log scale. In brackets is the number of independent replicates from which the average values and error bars were derived. Error bars represent one standard deviation.

In addition, CP-51 produced comparable levels of NOH and GGOH as the background strain high-AN-ZE1-T7, but a ten-fold lower amount of squalene (Fig. 5-6). This effect could be due to the overexpressed DPP1 within the fusion construct acting on E,E-FPP to produce E,E-

FOH, which reduces the pool of FPP available for squalene synthase to produce squalene. This effect was also observed for the other three trans-FPPS-DPP1 combinations (strains: EA-14, EA-

3 and AN-ZE2-T3). All three produced E,E-FOH (~0.2 mg/L) and GGOH (~0.2 mg/L) at comparably low levels as low-AN-ZE1-T7, but did not produce any detectable amounts of squalene as compared to low-AN-ZE1-T7 (~0.2 mg/L) (Fig. 5-6).

Finally, in our CP-51 fermentations, we observed a significant side peak at retention time

(RT) ~ 9.90 min that possessed a similar mass spectra to a side peak at the same retention time in

203

our synthetic Z,E-FOH standard (Fig. 5-15). This unidentified side peak was also present in the background strain high-AN-ZE1-T7. Furthermore, we occasionally observed this side peak in our initial E. coli fermentations performed without decane, which disappeared when we used 20

% decane as an overlay (Fig. 5-12). This suggested to us that the side peak was due to Z,E-FOH and/or E,E-FOH interacting with components in the media or breaking down. Unfortunately, we could not test our yeast strains with 20 % decane as their growth was severely impaired when >5

% decane was added (data not shown). We also tested alternative surfactants such as isopropyl myristate for our yeast fermentations to sequester the prenyl alcohol products. However, we and our collaborators at Acidophil found that isopropyl myristate stuck to the GC-MS columns, and thus could not be used to quantify the products. Our failure to produce Z,E-FOH in yeast could also be due to the codon-optimized versions of Rv1086 and Tfu0456 being non-functional in yeast.

5.2.4 Semi-synthesis approach to the tocotrienols (Vitamin E)

Our semi-synthetic scheme to produce the tocotrienols involves harnessing yeast to produce the linear terpenoid E,E,E-geranylgeraniol (E,E,E-GGOH) (Fig. 5-7). E,E,E-GGOH is used as a component of fragrances and as a starting material for the production of vitamins and other terpenes. Thus there is pharmaceutical and industrial interest in deriving renewable and cost-efficient sources of GGOH7. Next, we developed a novel Lewis acid—Phase Transfer

Catalyst system (type I Adanve acid) to perform the one-pot tandem C-C coupling of the geranylgeranyl chain to TMHQ. Furthermore, a Bronsted acid version of the catalyst system

(type II Adanve acid) was employed to achieve regioselective cycloetherification of 3- geranygeranyl-hydroquinones into their tocotrienol equivalents (Fig. 5-7).

204

Figure 5-7. Semi-synthesis scheme for the production of the tocotrienols. (A) Engineered yeast for the biosynthesis of E,E,E-geranylgeraniol followed by one-step coupling with TMHQ to form the tocotrienols. (B) Regioselective cycloetherification of 3-geranylgeranylhydroquinones to the tocotrienols catalyzed by type II Adanve acids. (C) Asymmetric chroman synthesis catalyzed by type II Adanve acids.

5.2.5 Biosynthesis of E,E,E-geranylgeraniol in S. cerevisiae

In yeast, E,E,E-GGOH is derived from the mevalonate pathway. Farnesyl pyrophosphate synthase (ERG20) catalyzes the formation of E,E-FPP from IPP and DMAPP. Geranylgeranyl pyrophosphate synthase (BTS1) catalyzes the condensation of E,E-FPP and IPP to form geranylgeranyl pyrophosphate (GGPP), which can be hydrolyzed by endogenous phosphatases like DPP1 to produce GGOH (Fig. 5-8). Previously, low-level GGOH production was observed

205

in the strain ATCC200589 and AURGG101 (ATCC200589 with mutations in the AUR1C gene)44. This was increased to 2.3 mg/L of GGOH in AURGG101 upon overexpression of

HMG1 or a truncated version of HMG1 (Δ1192–1266). GGOH production has also been reported to be achieved by the use of squalene synthase and phytoene synthase inhibitors in the carotenoid producing yeast Rhodotorula rubra45.

Figure 5-8. Biosynthetic pathway for the production of E,E,E-geranylgeraniol in S. cerevisiae.

Recently, high-level GGOH production has been reported to be achieved via the use of fusions of enzymes catalyzing sequential steps in the GGOH biosynthesis pathway46,47. Enzyme

206

fusions are thought to promote substrate channeling by increasing the proximity between the active sites of sequential enzymes, thus reducing diffusive loss of the substrate/product between different steps of the pathway48. A diploid prototroph (background YPH499) overexpressing

HMG1 and the double fusion ERG20-BTS1 was reported to produce 138 mg/L GGOH in a 5 L fermenter, but this strain also produced 60 mg/L of squalene46. Later, the group engineered

YPH499 to overexpress HMG1 and the two double fusion constructs BTS1-DPP1 and BTS1-

ERG20, which they report to produce 3.3 g/L GGOH under optimal conditions in a fermenter47.

In this study, we attempted to express a novel triple fusion construct (DPP1-BTS1-

ERG20) as well as the two double fusions (DPP1-BTS1 and BTS1-ERG20) in the previously reported YPH499 strain, and also another background: AN159Y (ATCC200589 engineered with the AUR1C mutation using CRISPR). First, we overexpressed a truncated version of the rate- limiting enzyme in the mevalonate pathway, HMG-CoA reductase (tHMG1), in both backgrounds, which led to low level (< 1 mg/L) GGOH production (Fig. 5-9). Next, we found that co-expression of the two double fusions (DPP1-BTS1 and BTS1-ERG20) led to the production of ~ 5 mg/L GGOH in the background strain YPH499 (strain AN-GG1-T3) but not

AN159Y (AN-GG2-T5) (Fig. 5-9). This yield was more than doubled to ~ 13 mg/L when we overexpressed the triple fusion of DPP1-BTS1-ERG20 in AN159Y/tHMG1 (strain AN-GG2-T3) but not YPH499 (strain AN-GG1-T4) (Fig. 5-9).

207

Figure 5-9. Production yield of GGOH in engineered yeast strains. D-B: DPP1-BTS1 double enzyme fusion. B-E: BTS1-ERG20 double enzyme fusion. D-B-E: DPP1-BTS1-ERG20 triple fusion. Extracted yields are reported in mg/L of growth media. For each strain, the number in brackets indicates the number of independent transformant colonies tested, except for AN159Y/tHMG1/D-B-E (AN-GG2-T3), where five independent colonies derived from a streak of one original transformant colony were tested.

Error bars represent one standard deviation.

Both AN-GG1-T3 (double fusion) and AN-GG2-T3 (triple fusion) produced E,E-FOH at approximately ten-fold lower amounts than GGOH, compared to the background strains overexpressing tHMG1 alone, which produced similar levels of GGOH and E,E-FOH (Fig. 5- 208

10). This suggests that the enzyme fusion constructs, if properly expressed, can help in substrate channeling for GGOH production.

Figure 5-10. Production yield of prenyl alcohols in yeast strains engineered for GGOH production.

Data is plotted on a log scale to capture the full range of production values. Strains are labelled as in Fig.

5-9.

209

5.3 Discussion

Prenyl alcohols such as E,E-FOH, NOH and E,E,E-GGOH are attractive targets for metabolic engineering, as they have a variety of uses, such as starting materials for fragrances, vitamins and pharmaceuticals49. Our results show that in yeast, overexpression of tHMG1 under the strong, constitutive TEF1 promoter (AN-ZE1-T7) can increase the overall production of the downstream terpenoids E,E-FOH, NOH, GGOH and squalene (Fig. 5-6). However, out of six independent transformants (colonies) of AN-ZE1-T7 screened, two produced a 10-fold higher yield of the terpenoids (high-AN-ZE1-T7) compared to the other four (low-AN-ZE1-T7). Also for GGOH production, the strains AN-GG1-T4 (triple fusion in YPH499) and AN-GG2-T5

(double fusions in AN159Y) did not produce any detectable GGOH, which was surprising given that the same plasmids, when transformed into the alternate strain background led to production

(Fig. 5-6). Furthermore, we observed a generally low transformation efficiency with the triple fusion plasmids (only four colonies for AN-GG2-T3 and two colonies for AN-GG1-T4). Out of the four colonies for AN-GG2-T3, only one led to GGOH production.

These results suggest that colonies from the same transformation attempt can exhibit large differences in production phenotypes, which could be due to reasons such as mutations, differences in plasmid copy number and/or imbalance in pathway flux. Therefore, when testing various constructs on plasmids, it is important to screen several colonies to account for this variability. Alternatively, chromosomal integration of the genes can help to reduce variability and increase strain stability. Also, there could be problems with the folding of the enzyme fusion constructs or natural selection for spontaneous mutations that prevent the expression of the fusion constructs, since these are expressed on high copy plasmids and are likely to impose a metabolic burden on the cells. In future, this can be potentially remedied by growing the strains

210

at a lower temperature (e.g. 28 ºC) to improve folding, using low copy plasmids, and tuning the expression levels of the enzyme constructs using promoter libraries. Despite this, the colonies that produced GGOH consistently did so across multiple replicate fermentations, which suggests that enzyme folding is not an issue.

E,E-FPP and E,E,E-GGPP are natural intermediates downstream of the MEP pathway in

E. coli and the MVA pathway in S. cerevisiae (Fig. 5-8). In yeast, two E,E-FPP molecules are joined by squalene synthase (ERG9) to form squalene, which is the first committed step in the ergosterol biosynthesis pathway. Thus, the majority of engineering efforts in yeast for the production of E,E-FOH and other terpenes have traditionally relied on ERG9 repression or mutated (non-functional) ERG9 combined with overexpression of specific terpene synthases/cyclases to divert flux to the production of terpenoids of interest10,50. However, this approach often requires supplementation of the media with ergosterol, an essential component of yeast cell membranes, which adds to fermentation costs. Possible ways to circumvent this requirement may be to improve substrate channeling through co-localization or fusions of the enzymes in the pathway, or introducing a highly active downstream enzyme that can compete effectively with ERG9 for the substrate FPP. In our yeast engineering efforts for Z,E-FOH production, we observed that expression of enzyme fusions containing the phosphatase DPP1 resulted in a significant drop in squalene levels (Fig. 5-6). This may be due to substrate channeling or DPP1 sequestering FPP and decreasing the pool of FPP for squalene synthesis.

Next, since Z,E-FPP is a non-natural isomer of E,E-FPP in E. coli and S. cerevisiae, it could form a possible orthogonal pathway for the synthesis of unnatural isomers of natural metabolites in the cell. This may open up further interesting lines of investigation into

211

biochemistry and cell biology, such as the enantioselectivity of enzymes, and whether unnatural metabolite isomers are functional in the cell.

Finally, the production yields reported in this work are based on mg/L of growth media and not of the extract. It has been shown that a significant proportion of FOH and GGOH remain stuck within the cell or in the cell membrane even with the addition of soybean oil as a surfactant51. Thus, actual production yields are likely to be higher. Furthermore, during our bench-scale production of GGOH using the strain AN-GG2-T3, we noticed that the yield of

GGOH was lower (about 6 mg/L from one trial, data not shown) than that obtained from small- scale flask fermentations (~13 mg/L). This was probably due to a lack of proper control of aeration, pH, mixing and nutrient availability in our crude setup. Further optimization of these factors during scale-up can lead to significant increases in production yield47.

5.4 Experimental methods

Culture media. LB media was made from Miller powdered mix (broth:10285, plates:10283),

TB media was made from BD-Difco powdered mix (243820). 99 % glycerol (G5516), D-(+)- glucose (G7021), ampicillin sodium salt (A0166) and spectinomycin dihydrochloride pentahydrate (S40014) were purchased from Sigma-Aldrich. YPD media (1 L): 10 g BD Bacto

Yeast Extract (212750), 20 g Bacto Peptone (211677), dH2O to 950 mL, autoclave; when cooled to 50 ºC add 50 mL 40 % glucose. For yeast SC selective media, see Table 5-4 and 5-5.

Strain construction for Z,E-FOH production. BL21 (DE3) cells were made competent using

52 CaCl2 and transformed with plasmid pA8 (carrying an IPTG-inducible subset of the B. subtillis

MEP pathway) and with one of four plasmids carrying various Z,E-FPPSs under an IPTG-

212

inducible promoter. The four Z,E-FPPSs were derived from various species of bacteria and codon-optimized for expression in E. coli: M. tuberculosis/RV1086 (pFPP2), T. fusca/Tfu0456

(pFPP3), C. efficiens/CE1055 (pFPP4) and C. glutamicum/uppS1 (pFPP5). E. coli transformants were selected on LB plates containing 50 mg/L spectinomycin (for pA8) and 100 mg/L ampicillin (for pFPPs). The yeast strain ATCC200589 was obtained from the American Type

Culture Collection, Rookville. The other strains used in this study (see Table 5-1) were constructed by transforming various plasmids (see Table 5-2) into the background strain

ATCC200589. All yeast transformations were performed according to established protocols53.

Strain construction for E,E,E-GGOH production. The yeast strain YPH499 (ATCC76625) was obtained from the American Type Culture Collection, Rookville. AN159Y which contains mutations (F158Y and A240C) in the AUR1C gene of ATCC20058944 was constructed by the

CRISPR method54. In brief, pAN70 (a kind gift from the Church lab) containing yeast codon- optimized Cas9 under a constitutive TEF1p promoter was transformed into ATCC200589. A guide RNA (gRNA) plasmid, pAN151, was constructed to target Cas9 to AUR1C. A repair fragment (for homology-directed repair) containing the desired mutations was created by PCR of

AUR1C in 3 parts using ATCC200589 genomic DNA as template and the following primer pairs: AUR1C-a: AN627/AN628; AUR1C-b: AN629/AN630; AUR1C-c: AN631/AN632. The three parts were fused by PCR using the external primers AN627/AN632. pAN151 (500 ng) and the repair fragment (2 µg) were transformed into the previously obtained ATCC200589/pAN70-

Cas9 colonies and plated on SC agar plates lacking histidine, tryptophan and uracil. Correct clones were confirmed by PCR of genomic DNA (YeaStar Genomic DNA Kit, Zymo Research).

PCR was performed using Phusion polymerase (NEB). The other strains used in this study (see

213

Table 5-1) were constructed by transforming various combinations of plasmids (see Table 5-2) into either of the background strains AN159Y or YPH499.

Plasmid construction for Z,E-FOH production. Four Z,E-FPPSs previously identified in four species of bacteria (M. tuberculosis:RV1086, T. fusca:Tfu0456, C. efficiens:CE1055, and C. glutamicum:uppS1)40 were codon-optimized for expression in E. coli and synthesized commercially by DNA 2.0 Inc. Each of the four Z,E-FPPSs was amplified by PCR to add NcoI and XhoI restriction sites on the 5’ and 3’ ends respectively. The resulting amplicons and pTrc2B vector were digested with NcoI and XhoI, and the construct inserted via InFusion cloning

(Clontech), to create the four IPTG-inducible Z,E-FPPS expression plasmids pFPP2, pFPP3, pFPP4 and pFPP5. The cloning mix was transformed into electrocompetent E. coli TG1 cells and selected on LB/ampicillin agar plates. This cloning was performed by our collaborator Acidophil

LLC. Plasmid pA8 containing the B. subtilis MEP pathway was also constructed by Acidophil. It consists of a pCL1920 backbone55 with a replication origin from pSC1010 and carrying a spectinomycin resistance cassette. It contains the B. subtilis MEP genes dxs, dxr, ispD, ispF, ispG, ispH and idi, which were codon-optimized for E. coli expression. The genes were cloned under the IPTG-inducible pLac promoter. B. subtilis genes were chosen due to the higher flux through the MEP pathway in B. subtilis compared to E. coli39.

The general strategy adopted for the construction of yeast expression plasmids was: (1) PCR of the individual parts (see Table 5-3 for primers), (2) overlap/fusion PCR (external primers

AN409 and AN412 for all plasmids except pAN127) to create a transcriptional unit (TU) composed of the promoter, ORF and terminator, (3) Gibson assembly to insert the TU into the

214

cut plasmid backbone, and (4) transformation of the Gibson mixture into E. coli TG1 cells for selection on LB/ampicillin plates and amplification prior to transforming into S. cerevisiae.

pAN96 was constructed by Gibson assembly of the vector pRS41656 digested with SacI and

KpnI, and PCR products of the promoter and terminator, as well as synthetic double-stranded gene fragments (gblock-1 and gblock-2, Integrated DNA Technologies) coding for yeast codon- optimized Tfu0456 containing an N-terminal His6-tag and TEV cleavage site (Table 5-3). pAN99 was constructed via the same strategy but using gblock-3 and gblock-4 coding for yeast codon-optimized Rv1086 containing an N-terminal His6-tag and TEV cleavage site. pAN193 and pAN194 were constructed by Gibson assembly of the vector pRS426 digested with SacI and

KpnI and the PCR amplified transcriptional units from pAN96 and pAN99 respectively (primers

AN409 and AN412).

The plasmids pAN137 and pCP51 containing the enzyme fusions were constructed by Gibson assembly of SacI and KpnI-digested pRS426 with a fusion PCR product containing the promoter,

N-terminal His6-tag and TEV cleavage site, DPP1, GGGS linker, Rv1086 or Tfu0456 and terminator amplified from pAN99/pAN96 respectively. For these two plasmids, the N-terminal

His6-tag and TEV cleavage site were created by primer extension (primers AN466 and AN467) followed by a second round of PCR with the primers AN468 and AN469. The plasmids pAN138 and pAN141 were constructed by Gibson assembly of SacI and KpnI–digested pRS426 with a fusion PCR product containing the promoter, Z,E-FPPases amplified from pAN96/pAN99 respectively, GGGS linker, DPP1 and terminator.

215

Plasmid construction for E,E,E-GGOH production. The strategy adopted for the construction of the yeast expression plasmids pAN117, pAN119, pAN127 and pAN128 (see Table 5-2) was:

(a) PCR of the individual parts (see Table 5-3 for primers and parts), (b) fusion PCR (external primers AN409/AN412 for pAN117/pAN119; AN568/AN569 for pAN127/pAN128) to create a transcriptional unit (TU) composed of the promoter, ORF and terminator, (c) Gibson assembly

(NEB, E2611S) to insert the TU into the cut plasmid backbone (pRS426 digested with SacI and

KpnI for pAN117/pAN119; pRS424 digested with SacI and KpnI for pAN127/pAN128), (d) transformation of the Gibson mixture into electro-competent E. coli TG1 cells followed by selection on LB/ampicillin plates, (e) testing of colonies by amplification in LB/ampicillin media overnight followed by plasmid extraction (Miniprep kit, Qiagen), sequencing and restriction analysis to determine correct constructs.

pAN120 was constructed by Gibson assembly of the vector pAN117 (digested with KpnI) and

PCR products of PGK1 (AN536/AN540) and BTS1-ERG20 (AN541/AN537) using pAN119 as a template. pAN151 was constructed by PCR of SNR52p (AN318/AN625) and SUP4t

(AN626/AN321) using a CAN1-gRNA plasmid as template (kind gift from the Church lab), followed by fusion PCR, restriction digestion (SacI/XhoI) and ligation into SacI/XhoI-digested pRS426.

Small scale production trials for E. coli Z,E-FOH production. E. coli farnesol production was performed in sterilized, baffled, 125 mL Erlenmeyer flasks (Corning 4444-125) with culture caps (KimKap 73665-25) in an incubator shaker (New Brunswick, Innova 43R). On Day 1, a single colony was inoculated into a 14 mL Falcon culture tube containing 3 mL LB with 100

216

mg/L ampicillin and 50 mg/L spectinomycin. The culture was shaken at 37 C and 230 rpm for

15 hours. On Day 2, 0.5 mL of the saturated overnight culture was added to 50 mL of LB with

1% glycerol, 100 mg/L ampicillin and 50 mg/L spectinomycin. The flasks were grown at 30C and 230 rpm until they reached an optical density (OD600) of approximately 1.5 (4-6 hrs) as determined by a 1:10 dilution of the culture in LB media. OD600 measurements were carried out in a SpectraMaxPlus 384 spectrophotometer using 1.5 mL cuvettes (VWR, 97000-586). At this point, IPTG was added (0.2 mM) to induce expression of the MEP pathway components and

Z,E-FPPS. An overlay of 4.5 mL of decane (95 %, Sigma 30570) was added at the time of induction. The culture was harvested on Day 3 after 24 hours post-induction.

Small scale production trials for S. cerevisiae Z,E-FOH and E,E,E-GGOH production.

Yeast prenyl alcohol production was performed in sterilized, 50 ml Erlenmeyer flasks (VWR) in an incubator shaker (New Brunswick, Innova 43R) at 30 ºC and 230 rpm for a total of 7 days. On

Day 1, 0.3 ml of saturated, overnight culture grown in selective SD media was added to 2.7 ml of

SD or YPD media. On Day 2, 3 ml of YPD and 150 µl of decane (final concentration of 2.5 %) were added. On Day 3, 4 ml of YPD and 100 µl of decane were added. Cultures were harvested on Day 8 after incubation for 7 days. As a measure of cell growth, the final optical density

(OD600) was determined by 10-fold dilution of the culture broth with YPD.

Bench scale production for Z,E-FOH production. Bench scale E. coli production was performed in a Bellco 6 L Low Profile Vertical Sidearm Bioreactor vessel. Filtered air was supplied via a sidearm sparger, and vented via two exit ports equipped with moisture traps. The 6

L vessel was placed in a water bath to maintain the temperature at 29.5 C. Mixing of the broth

217

was achieved by the sparged air. There was no control of dissolved oxygen, pH, nutrient availability or culture volume. Bench scale production was attempted only with the FPP2/A8 E. coli strain. Briefly, on Day 1, a single colony was inoculated into a 14 mL Falcon culture tube containing 3 mL LB with 100 mg/L ampicillin and 50 mg/L spectinomycin, and grown at 37 C and 230 rpm for 8 hours. The 3 mL culture was then used to inoculate 25 mL of the same media and incubated for 15 hours at 37 C and 230 rpm. On Day 2, the 25 mL overnight culture was added to 2.5 L of TB broth and 1 % glycerol (autoclaved at 250 F for 45 minutes on Day 1) in a

6 L vessel, along with 400 µl of Sigma Antifoam-400, 100 mg/L ampicillin and 50 mg/L spectinomycin. The culture was grown until it reached an OD600 of approximately 0.3 (4-6 hours), at which time 0.2 mM IPTG was added to induce expression of the Z,E-FPPS RV1086 and MEP pathway components. 375 mL of decane was added to the 6 L vessel at the time of induction to form an overlay. Pulses of glycerol (25 mL, 1 % of final vol) were added at 8 and 16 hours post-inoculation. At 24 hours post-inoculation, 1 L of TB media containing the aforementioned concentrations of IPTG and antibiotics was added to replace media lost from the

6 L vessel. A further 250 mL of decane was also added at this time to replenish the overlay. 6 L vessels were collected and extracted at 48 hours post-induction.

Bench scale production for E,E,E-GGOH production. The same setup as for E. coli was used. On Day 1, 100 mL of starter culture grown for 2 days was added to 2 L of YPD to start the fermentation. On Day 2, 1 L of YPD and 50 mL of decane were added. On Day 3 and Day 6, another 1 L of YPD and 100 mL decane were added. The culture was harvested on Day 8 (1 week).

218

Prenyl alcohol extraction for E. coli. Small scale flask fermentations were transferred into a 50 mL falcon tube. The inside of the flask was rinsed with 5 mL of ethyl acetate (99.5 %, Sigma

437549), and combined with the fermented broth. 1 mL of brine was added to aid separation.

The tube was vortexed vigorously for 4 x 30 seconds. The tube was then centrifuged at 3200 rpm for 10 minutes and the top organic layer was filtered and transferred to a glass GC-MS vial for analysis. Bench scale fermentations were transferred into a 2 L separation funnel in batches of 1

L. 500 mL of ethyl acetate and 20 mL of brine were added to each batch in the funnel, and the contents shaken vigorously. Separation was allowed to occur and the aqueous layer was removed and re-extracted once more as above. The organic layer was transferred into centrifuge tubes and spun at 4000 rpm for 10 minutes to break the emulsion. The top organic layer was recovered, combined with organic layers from other batches and rotavapped to remove the ethyl acetate.

The concentrated product was dissolved in hexane for analysis (99 %, Sigma 32293).

Prenyl alcohol extraction for S. cerevisiae. For small-scale fermentation trials, 1 ml of the final culture broth was removed for pH and OD600 measurements. The remainder of the culture

(~8.4 ml) was transferred into a 15 ml falcon tube. 1 ml of brine and 2 ml of hexane were added.

The tube was placed horizontally in an incubator shaker (230 rpm) for 1 hour to mix. The mixture was centrifuged at 3200 rpm for 10 min and the top hexane layer was filtered and transferred into a glass GC-MS vial for analysis. For bench-scale production, 8.4 ml of the broth was sampled as above and analyzed by GC-MS after 7 days. The remainder of the broth was mixed in batches with ethyl acetate and transferred into a 2 L-separating funnel. The organic layer was collected, dried (MgSO4) and concentrated under reduced pressure. The resulting concentrate was purified on silica gel to yield an oil, which was weighed and its identity further confirmed by 1H-NMR.

219

Analysis of prenyl alcohols. Z,E-FOH, E,E-FOH, nerolidol, GGOH and squalene were analyzed using a ThermoFinnigan PolarisQ GC-MS (Agilent J&W column, #CP8907, VF-1 ms

15 m X 0.25 mm x 0.25 µm) at NYU. The instrument was acquired through the support of the

National Science Foundation under Award number DUE-0126958. GC-MS runs were carried out with the following method: MS: ion source 225 ºC, EI/NICI, mass range 50-600. GC: helium gas, 1.2 ml/min, inlet temperature 225 ºC, initial column temperature 50 ºC, hold 2.5 minutes, raise at 17 ºC/min to 285 ºC, hold 3.5 minutes.

For quantitative analysis of E. coli fermentations, cis-nerolidol (96 %, Sigma 72180) was used as an internal standard. A standard concentration curve was created by graphing the ratio of the total ion counts (TIC) area of the E,E-FOH (96 %, Sigma 277541) peak to the TIC area of the

25 mg/L cis-nerolidol peak for a range of E,E-FOH concentrations. To determine the concentration of Z,E-FOH in the biosynthesis extracts, each sample was dosed to achieve a final concentration of 25 mg/L cis-nerolidol. The equation of the linear regression of the standard curve was used to calculate the concentration of Z,E-FOH42. The production values were back calculated and reported as mg/L of culture media. Specifically, the total amount of extracted Z,E-

FOH was calculated by multiplying the determined concentration by the volume of the extract.

This was then divided by the total volume of culture to determine production yield in mg/L of media.

For quantitative analysis of S. cerevisiae fermentations, undecanol was used as an internal standard. The ratio of the selected ion peak area (67 mass fragment) of 25 mg/L undecanol to that of 25 mg/L of authentic standards was used as a standard ratio. The amount of prenyl alcohol in a sample was determined by comparison of this ratio to the ratio in a sample spiked with 25 mg/L undecanol.

220

5.5 Strains, plasmids, oligonucleotides and media

Table 5-1. Strains used in this study.

Source/ Name Genotype/Plasmids-insert Reference

FPP2/A8 DH5(DE3)/pA8/pFPP2 This study

FPP3/A8 DH5(DE3)/pA8/pFPP3 This study

FPP4/A8 DH5(DE3)/pA8/pFPP4 This study

FPP5/A8 DH5(DE3)/pA8/pFPP5 This study

ATCC200589 MATalpha can1 leu2 trp1 ura3 aro7 ATCC

AN-ZE1-T7 ATCC200589/pAN127-tHMG1 This study

AN-ZE5-T1 ATCC200589/pAN127-tHMG1/pAN193-Tfu0456 This study

AN-ZE5-T2 ATCC200589/pAN127-tHMG1/pAN194-Rv1086 This study

CP-51 ATCC200589/pAN127-tHMG1/pCP51-DPP1-Tfu0456 This study

EA-3 ATCC200589/pAN127-tHMG1/pAN137-DPP1-Rv1086 This study

EA-14 ATCC200589/pAN127-tHMG1/pAN138-Tfu0456-DPP1 This study

AN-ZE2-T3 ATCC200589/pAN127-tHMG1/pAN141-Rv1086-DPP1 This study

YPH499 MATa ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 ATCC (ATCC76625)

ATCC200589 MATalpha can1 leu2 trp1 ura3 aro7 ATCC

This study, AN159Y ATCC200589/AUR1C 44

AN-CP4-S1 YPH499/pAN127-tHMG1 This study

AN-CP2-S4 AN159Y/pAN127-tHMG1 This study

AN-GG1-T3 YPH499/pAN127-tHMG1/pAN120-DPP1-BTS1/BTS1-ERG20 This study

AN-GG1-T4 YPH499/pAN127-tHMG1/pAN119-DPP1-BTS1-ERG20 This study

AN-GG2-T3 AN159Y/pAN127-tHMG1/pAN119-DPP1-BTS1-ERG20 This study

AN-GG2-T5 AN159Y/pAN128-HMG1/pAN120-DPP1-BTS1/BTS1-ERG20 This study

221

Table 5-2. Plasmids used in this study.

Transcriptional Unit Name Origin Marker Details (Promoter-ORF-Terminator) pSC101, pLac-(dxs, dxr, ispD,ispF, idi, E. coli codon-optimized MEP pA8 SpecR E coli ispH) –T1 components from B. subtilis pBR322, E. coli codon-optimized Z,E- pFPP2 AmpR pTrc-RV1086-rrnBt E coli FPPase from M. tuberculosis pBR322, E. coli codon-optimized Z,E- pFPP3 AmpR pTrc-Tfu0456-rrnBt E coli FPPase from T. fusca pBR322, E. coli codon-optimized Z,E- pFPP4 AmpR pTrc-CE1055-rrnBt E coli FPPase from C. efficiens pBR322, E. coli codon-optimized Z,E- pFPP5 AmpR pTrc-uppS1-rrnBt E coli FPPase from C. glutamicum pAN127 2µ, yeast TRP1 TEF1p-tHMG1-ACT1p Truncated tHMG

CEN, pAN96 URA3 PGK1p-Tfu0456-ADH1t Codon optimized Tfu0456 yeast CEN, pAN99 URA3 PGK1p-Rv1086-ADH1t Codon optimized Rv1086 yeast pAN193 2µ, yeast URA3 PGK1p-Tfu0456-ADH1t Codon optimized Tfu0456 pAN194 2µ, yeast URA3 PGK1p-Rv1086-ADH1t Codon optimized Rv1086 pAN137 2µ, yeast URA3 PGK1p-DPP1-Rv1086-ADH1t Fusion of DPP1-Rv1086 pAN138 2µ, yeast URA3 PGK1p-Tfu0456-DPP1-ADH1t Fusion of Tfu0456-DPP1 pAN141 2µ, yeast URA3 PGK1p-Rv1086-DPP1-ADH1t Fusion of Rv1086-DPP1 pCP51 2µ, yeast URA3 PGK1p-DPP1-Tfu0456-ADH1t Fusion of DPP1-Tfu0456

CEN, pAN70 TRP1 TEF1p-Cas9-CYC1t Gift from Di Carlo, Church lab yeast

CRISPR gRNA plasmid to pAN151 2µ, yeast URA3 SNR52p-cr17-gRNA-SUP4t create AUR1C mutation pAN127 2µ, yeast TRP1 TEF1p-tHMG1-ACT1t Truncated HMG1 pAN128 2µ, yeast TRP1 TEF1p-HMG1-ACT1t Wild-type HMG1 pAN117 2µ, yeast URA3 PGK1p-DPP1-BTS1-ADH1t Double fusion DPP1-BTS1

222

PGK1p-DPP1-BTS1-ERG20- Triple fusion DPP1-BTS1- pAN119 2µ, yeast URA3 ADH1t ERG20

Two double fusions DPP1- PGK1p-DPP1-BTS1-ADH1t / pAN120 2µ, yeast URA3 BTS1/BTS1-ERG20 on one PGK1p-BTS1-ERG20-ADH1t-rev plasmid

223

Table 5-3. Primers and dsDNA gene fragments (gblocks) used in this study.

Name Plasmid Template Sequence (5’ – 3’)

CACTAAAGGGAACAAAAGCTGGAGCTATAGCTTCAA AN569 pAN127 TEF1p-F AATGTTTCTACTCC GTTTTCACCAATTGGTCCATAAACTTAGATTAGATTGC AN387 pAN127 TEF1p-R TATGCT TAGCAATCTAATCTAAGTTTATGGACCAATTGGTGAA AN388 pAN127 tHMG1-F AACTGAAG TACGCGCACAAAAGCAGAGATTAGGATTTAATGCAG AN401 pAN127 tHMG1-R GTGACGG

AN402 pAN127 ACT1t-F TCACCTGCATTAAATCCTAATCTCTGCTTTTGTGCGCG

GACTCACTATAGGGCGAATTGGGTACACACTATGATA AN568 pAN127 ACT1t-R TATAAATATAATAGTTTTTCG All CCTCACTAAAGGGAACAAAAGCTGGAGCTCGAAGTA AN409 except PGK1p-F CCTTCAAAGAATGGGGTC pAN127 ATGATGATGGTGCATTTGTTTTATATTTGTTGTAAAAA AN413 pAN96 PGK1p-R GTAGATAATTAC ACAAATATAAAACAAATGCACCATCATCATCACCACG GTAAACCAATTCCTAATCCTTTGTTAGGTTTGGACTCC ACTGAAAACTTATATTTCCAAGGTATTGACCCATTCA CCATGGGTTTACTTAGAATTCCCTTGTACAGATTGTAC GAAAGAAGACTAGAACGTGCCTTGTCCAACGCGCCCA AGCCAAGACACGTTGGTGTTATTTTGGACGGTAACAG gblock- pAN96 Tfu0456-1 AAGATGGGCTCGTTTGTCCGGTTTGTCTTCACCAAAG 1 GAAGGTCATAGAGCTGGTGCCGAGAAGATCTTCGAAC TGTTGGATTGGTGTGACGAAGTTGGTGTCCAAGTCGT CACTTTGTGGTTACTTTCTACAGATAACTTGGCTCGTC CACCAGAAGAATTGGAACCATTGTTCGAAATTATTGA AAATACCGTTAGAAGATTGTGTAATGAAGGTAGAAG AGTTAACCCAATGGGT AATGAAGGTAGAAGAGTTAACCCAATGGGTGCTTTAG ATTTGTTACCAGCTTCTACGGCTCAAGTCATGAAGGA AGCTGGTACCACTACCGAAAGAAACCCAGGTTTATTG GTCAACGTCGCTGTCGGTTACGGTGGTAGAAGAGAAA gblock- TCGCTGATGCTGTTAGATCTTTGTTGTTGGAGGAAGCT pAN96 Tfu0456-2 2 GCTAAGGGTACTACCTTGGAAGAATTGGCGGAAAGAT TGGACTTAGACGATATTGCTAAGCATTTGTACACTAG AGGTCAACCAGACCCAGACTTGTTGATTAGAACCTCT GGTGAACAAAGATTGTCTGGTTTCTTGTTGTGGCAAT CTGCTCACTCTGAATTCTACTTCTGTGAAGTTTTCTGG

224

CCAGCCTTCCGTAAGATTGATTTCTTGAGAGCTTTGAG ATCTTATAGCGTCAGACAAAGACGTTTTGGTTGTTAA GCGAATTTCTTATGA CGTTTTGGTTGTTAAGCGAATTTCTTATGATTTATGAT AN414 pAN96 ADH1t-F TTTTAT

All TACGACTCACTATAGGGCGAATTGGGTACCGAGCGAC AN412 except ADH1t-R CTCATGCTATACC pAN127 GTGATGATGATGCATTTGTTTTATATTTGTTGTAAAAA AN410 pAN99 PGK1p-R GTAGATAATTAC ACAAATATAAAACAAATGCATCATCATCACCACCATG GTAAACCAATTCCAAACCCATTGTTGGGTTTGGACTC TACCGAAAACTTGTACTTCCAAGGTATCGACCCATTC ACCATGGAGATTATTCCTCCAAGATTGAAAGAACCAT TGTACAGATTGTATGAATTGAGATTGAGACAAGGTTT GGCTGCTTCCAAGTCTGACTTACCAAGACATATAGCT gblock- pAN99 Rv1086-1 GTTTTGTGTGATGGTAACAGACGTTGGGCTAGATCTG 3 CCGGTTACGACGATGTCTCTTACGGTTACAGAATGGG TGCTGCTAAGATTGCCGAAATGTTAAGATGGTGTCAT GAAGCTGGTATCGAATTGGCTACCGTCTATTTGTTATC TACTGAAAATTTGCAAAGAGATCCAGATGAACTTGCG GCCTTGATTGAAATCATCACTGACGTTGTCGAAGAAA TTTGTGCCCCAGCTAATCACTGGAGCGTT ATTTGTGCCCCAGCTAATCACTGGAGCGTTAGAACCG TCGGTGACTTAGGTTTGATTGGTGAAGAACCAGCTAG AAGGTTGAGAGGTGCTGTTGAATCCACTCCAGAAGTT GCCTCTTTCCACGTTAATGTGGCGGTGGGTTACGGTG GTAGGAGAGAAATTGTCGATGCCGTTAGAGCTTTATT GTCCAAGGAATTAGCTAACGGTGCTACTGCCGAAGAA gblock- pAN99 Rv1086-2 TTGGTCGATGCTGTTACTGTTGAAGGTATTTCTGAAAA 4 CTTGTACACTTCCGGTCAACCAGACCCAGACTTGGTT ATAAGAACCTCCGGTGAACAAAGATTGTCTGGTTTCT TGTTATGGCAATCTGCTTATTCTGAAATGTGGTTCACT GAAGCTCATTGGCCAGCTTTCAGACACGTCGATTTCTT GAGAGCCTTGAGAGATTATTCGGCTCGTCATAGATCT TACGGTCGATAAGCGAATTTCTTATGA TCTTACGGTCGATAAGCGAATTTCTTATGATTTATGAT AN411 pAN99 ADH1t-F TTTTAT pAN137/ GGTGGTGATGATGATGCATTTGTTTTATATTTGTTGTA AN465 PGK1p-R pCP51 AAAAGTAGATAATTAC pAN137/ CATCATCATCACCACCATGGTAAACCAATTCCAAACC AN466 HIS -TEV-F1 pCP51 6 CATTGTTGGGTTTGGACTCTACC

AN467 pAN137/ HIS6-TEV-R1 GTGAATGGGTCGATACCTTGGAAGTACAAGTTTTCGG

225

pCP51 TAGAGTCCAAACCCAACAATGGG

pAN137/ CAACAAATATAAAACAAATGCATCATCATCACCACCA AN468 HIS -TEV-F2 pCP51 6 TGGTAAAC pAN137/ ATAAACGAAACTCTGTTCATGGTGAATGGGTCGATAC AN469 HIS -TEV-R2 pCP51 6 CTTGGAAG pAN137/ AAGGTATCGACCCATTCACCATGAACAGAGTTTCGTT AN470 DPP1-F pCP51 TATTAAAACG TAATCTCCATAGAACCACCACCCATACCTTCATCGGA AN471 pAN137 DPP1-R CAAAGGATG AGGTATGGGTGGTGGTTCTATGGAGATTATTCCTCCA AN472 pAN137 Rv1086-F AGATTGAA CATAAGAAATTCGCTTATCGACCGTAAGATCTATGAC AN473 pAN137 Rv1086-R G CTTACGGTCGATAAGCGAATTTCTTATGATTTATGATT AN474 pAN137 ADH1t-F TTTAT GTGATGATGATGGTGCATTTGTTTTATATTTGTTGTAA AN480 pAN138 PGK1p-R AAAGTAGATAATTAC CAACAAATATAAAACAAATGCACCATCATCATCACCA AN481 pAN138 Tfu0456-F CG CTGTTCATAGAACCACCACCACAACCAAAACGTCTTT AN482 pAN138 Tfu0456-R GTCTGAC TTGGTTGTGGTGGTGGTTCTATGAACAGAGTTTCGTTT AN483 pAN138 DPP1-F ATTAAAACG pAN138/ AAATCATAAATCATAAGAAATTCGCTTACATACCTTC AN431 DPP1-R pAN141 ATCGGACAAAGG pAN138/ TCCTTTGTCCGATGAAGGTATGTAAGCGAATTTCTTAT AN432 ADH1t pAN141 GATTTATGATTTTTAT GTGGTGATGATGATGCATTTGTTTTATATTTGTTGTAA AN475 pAN141 PGK1p-R AAAGTAGATAATTAC CAACAAATATAAAACAAATGCATCATCATCACCACCA AN452 pAN141 Rv1086-F TGG CTGTTCATAGAACCACCACCTCGACCGTAAGATCTAT AN476 pAN141 Rv1086-R GACGAG CTTACGGTCGAGGTGGTGGTTCTATGAACAGAGTTTC AN477 pAN141 DPP1-F GTTTATTAAAACG GTAAACCCATAGAACCACCACCCATACCTTCATCGGA AN478 pCP51 DPP1-R CAAAGGATG

226

aggtatgggtggtggttctATGGGTTTACTTAGAATTCCCTTGTA AN479 pCP51 Tfu0456-F C pFPP2, FPP- pFPP3, GAGGAATAAACCATGCATCACCATCATCACCAC NcoI-F FWR pFPP4, pFPP5 Mtub_R pFPP2 XhoI-R AGCTGCAGATCTCGACTCGAGTTAGCGGCCGTA WR AGGAGGTAAAACATATGCATCACCATCATCACCACGG TAAACCAATCCCTAATCCGTTGCTGGGTCTGGATAGC ACGGAAAACCTGTATTTTCAGGGCATTGATCCGTTTA CCATGGAGATTATCCCTCCGCGTCTGAAAGAGCCGTT GTACCGTCTGTATGAGTTGCGTCTGCGTCAAGGTCTG GCGGCAAGCAAGTCTGATTTGCCGCGTCACATCGCGG TTCTGTGTGACGGCAATCGCCGTTGGGCACGTTCGGC TGGTTACGACGATGTTAGCTACGGTTATCGTATGGGT GCTGCCAAGATCGCGGAAATGCTGCGTTGGTGCCACG AAGCGGGTATTGAGCTGGCGACCGTGTATCTGCTGAG CACGGAGAATCTGCAACGTGACCCGGATGAGCTGGC AGCGCTGATCGAGATCATTACCGACGTGGTCGAAGAG optEC- pFPP2 -- ATTTGCGCCCCAGCTAACCACTGGTCCGTGCGTACCG RV1086 TCGGTGACCTGGGTCTGATCGGCGAAGAGCCGGCACG TCGCCTGCGTGGCGCGGTTGAAAGCACCCCGGAAGTG GCAAGCTTCCATGTTAATGTCGCCGTCGGCTATGGCG GTCGCCGCGAAATTGTTGACGCGGTACGTGCACTGCT GTCCAAAGAGCTGGCCAACGGCGCAACCGCGGAAGA GCTGGTGGACGCGGTTACGGTTGAGGGTATTTCTGAG AACTTGTACACCAGCGGCCAGCCGGACCCGGACCTGG TGATCCGCACGAGCGGTGAACAGCGTCTGAGCGGTTT CTTGCTGTGGCAAAGCGCGTACAGCGAGATGTGGTTT ACTGAAGCGCACTGGCCAGCGTTCCGTCATGTCGATT TCCTGCGTGCCCTGCGCGATTACTCCGCTCGCCACCGC AGCTACGGCCGCTAACTCAGA Tfus_R AGCTGCAGATCTCGACTCGAGTTAACAACCGAAGCGA pFPP3 XhoI-R WR C AGGAGGTAAAACATATGCATCACCATCATCACCACGG TAAACCAATCCCTAATCCGTTGCTGGGTCTGGATAGC ACGGAAAACCTGTATTTTCAGGGCATTGATCCGTTTA CCATGGGTCTGCTGCGTATTCCGTTGTACCGCCTGTAT optEC- GAGCGTCGCCTGGAGCGTGCTCTGAGCAACGCGCCGA Tfu045 pFPP3 -- AGCCGCGTCACGTCGGTGTGATCCTGGATGGCAACCG 6 CCGCTGGGCCCGTCTGTCGGGCCTGAGCAGCCCGAAA GAGGGTCACCGCGCAGGCGCCGAGAAAATCTTTGAA CTGCTGGACTGGTGTGACGAAGTTGGTGTCCAGGTCG TGACCCTGTGGCTGCTGAGCACCGATAATCTGGCGCG TCCGCCGGAAGAACTGGAACCGTTGTTCGAGATTATT

227

GAGAACACGGTTCGTCGCCTGTGCAATGAGGGTCGTC GTGTGAATCCGATGGGCGCACTGGACTTGCTGCCGGC TTCCACGGCGCAAGTGATGAAAGAAGCGGGCACCAC CACTGAGCGTAATCCAGGCCTGCTGGTTAACGTTGCC GTGGGTTACGGTGGCCGTCGCGAGATCGCAGACGCCG TCCGTAGCCTGCTGCTGGAAGAGGCGGCGAAGGGTAC CACGTTGGAAGAGCTGGCAGAACGTCTGGATCTGGAC GACATCGCGAAGCACCTGTACACCCGTGGTCAGCCGG ACCCGGATCTGTTGATTCGCACCAGCGGCGAGCAACG CTTGAGCGGTTTCCTGCTGTGGCAGAGCGCGCATTCC GAGTTCTACTTCTGCGAGGTATTTTGGCCGGCATTTCG TAAGATCGACTTCTTGCGTGCGCTGCGTTCTTATAGCG TTCGCCAACGTCGCTTCGGTTGTTAACTCAGA Ceff_R pFPP4 XhoI-R AGCTGCAGATCTCGACTCGAGTTACTTGCCGAAAC WR AGGAGGTAAAACATATGCATCACCATCATCACCACGG TAAACCAATCCCTAATCCGTTGCTGGGTCTGGATAGC ACGGAAAACCTGTATTTTCAGGGCATTGATCCGTTTA CCATGCACCCGGGCTGGACCTGTAACTACTCGGATAC CCCAGGCGTGCCGGGTCAAGACTGCGTTGCCATTGTG GAACAACGTCCGGATGGTAACCGTGGTGCGGGTGTCC GCGATATCCCGGTCGTGGGTACCCGTTTGCTGAATGT CCTGAATCTGCCGCGTCTGCTGTATCCGTTGTACGAGC GTCGTCTGCTGAAAGAGCTGAACGGTGCCCGTCAACC GGGTCACGTTGCGATTATGTGCGATGGCAATCGTCGT TGGGCGCGTGAGGCTGGTTTTGTTGACCTGTCTCACG GTCATCGTGTTGGCGCGAAGAAAATCGGCGAGTTGGT CCGCTGGTGTGACGACGTTGATGTGGACCTGGTTACG optEC- GTGTATCTGCTGAGCACGGAGAACCTGGGTCGCGACG pFPP4 -- CE1055 AGGCGGAACTGCAGCTGCTGTATGACATCATTGGTGA TGTGGTTGCGGAGCTGTCTTTGCCGGAAACCAACTGC CGTGTGCGCCTGGTCGGTCATCTGGACTTGCTGCCGA AAGAATTCGCGGAACGTCTGCGTGGCACCGTCTGTGA AACGGTCGACCACACTGGCGTTGCAGTGAATATTGCA GTTGGCTACGGTGGTCGCCAAGAAATCGTAGATGCTG TGCGCGACTTGCTGACCAGCGCACGTGATGAGGGCAA GACCCTGGACGAGGTCATTGAAAGCGTTGACGCGCAG GCAATCAGCACCCACCTGTACACTAGCGGCCAGCCGG ACCCGGATCTGGTGATCCGTACGAGCGGTGAGCAACG CCTGAGCGGTTTCATGCTGTGGCAGAGCGCCTATTCC GAGATTTGGTTCACGGATACCTACTGGCCTGCGTTTC GTCGCATCGACTTCCTGCGCGCACTGCGCGATTACAG CCAGCGTTCCCGCCGTTTCGGCAAGTAACTCAGA Cglu_R pFPP5 XhoI-R AGCTGCAGATCTCGACTCGAGTTACTTACCGAAGC WR optEC- AGGAGGTAAAACATATGCATCACCATCATCACCACGG pFPP5 -- uppS1 TAAACCAATCCCTAATCCGTTGCTGGGTCTGGATAGC

228

ACGGAAAACCTGTATTTTCAGGGCATTGATCCGTTTA CCATGCTGAACTTGCCGCGTCTGCTGTACCCGGTGTA CGAGCGCCGTCTGCTGCGTGAGCTGGATGGTGCAAAA CAGCCGGGTCACGTTGCGATCATGTGTGATGGTAATC GCCGCTGGGCGCGTGAGGCAGGCTTCACCGATGTTAG CCACGGCCACCGCGTCGGCGCGAAGAAAATTGGTGA GATGGTTCGTTGGTGCGACGACGTGGATGTCAATCTG GTTACGGTTTATCTGCTGTCTATGGAAAACTTGGGTCG CAGCAGCGAAGAGCTGCAACTGCTGTTCGACATCATT GCCGACGTTGCGGACGAATTGGCGCGTCCGGAAACCA ACTGTCGCGTGCGTCTGGTGGGTCACCTGGACCTGTT GCCGGACCCGGTGGCGTGCCGTCTGCGCAAGGCCGAA GAGGCGACCGTAAACAATACCGGCATCGCTGTCAATA TGGCAGTTGGCTACGGCGGTCGTCAAGAAATCGTTGA TGCAGTCCAAAAGCTGCTGACGATTGGTAAAGACGAG GGCCTGTCCGTCGATGAACTGATCGAGAGCGTCAAAG TGGATGCTATCAGCACGCATTTGTACACCAGCGGCCA GCCGGATCCAGACCTGGTGATCCGTACTAGCGGTGAG CAACGTCTGTCCGGTTTCATGCTGTGGCAGTCTGCCTA CTCGGAGATTTGGTTCACCGACACCTATTGGCCGGCA TTTCGTCGTATTGACTTTCTGCGTGCGATTCGTGACTA TAGCCAGCGCAGCCGTCGCTTCGGTAAGTAACTCAGA

AN318 pAN151 SNR52p-F ctaaagggaacaaaagctgGAGCTCTCTTTGAAAAGATAATG

GCTCTAAAACaaatgaaatagtccgtacggGATCATTTATCTTT AN625 pAN151 SNR52p-R CACTGCGGAGAAG ccgtacggactatttcatttGTTTTAGAGCTAGAAATAGCAAGTT AN626 pAN151 SUP4t-F AAAATAAG gacataactaattacatgaCTCGAGAGACATAAAAAACAAAAA AN321 pAN151 SUP4t-R AAG

AN627 NA AUR1a-F ATGGCAAACCCTTTTTCGAGATGG

AN628 NA AUR1a-R GAAATAGTCCGTACGGTAACCATGC

cattttagcatggttaccgtacggactatttcattaTGGGGCCCCATTTGTCG AN629 NA AUR1b-F TTG gcaccgaaaatgacggaggaatttgaaaaacaTGTAGTATACATATTAA AN630 NA AUR1b-R TACCGAG

AN631 NA AUR1c-F TTTTTCAAATTCCTCCGTCATTTTCG

AN632 NA AUR1c-R TTAAGCCCTCTTTACACCTAGTGAC

pAN127/ CACTAAAGGGAACAAAAGCTGGAGCTATAGCTTCAA AN569 TEF1p-F pAN128 AATGTTTCACT1TCC

AN387 pAN127 TEF1p-R GTTTTCACCAATTGGTCCATAAACTTAGATTAGATTGC

229

TATGCT

AN388 pAN127 tHMG1-F tagcaatctaatctaagtttATGGACCAATTGGTGAAAACTGAAG

pAN127/ AN401 tHMG1-R tacgcgcacaaaagcagagaTTAGGATTTAATGCAGGTGACGG pAN128 pAN127/ AN402 ACT1t-F tcacctgcattaaatcctaaTCTCTGCTTTTGTGCGCG pAN128 pAN127/ gactcactatagggcgaattgggtacACACTATGATATATAAATATA AN568 ACT1t-R pAN128 ATAGTTTTTCG

AN571 pAN128 TEF1p-R cttgaatagcggcggcatAAACTTAGATTAGATTGCTATGCT

AN570 pAN128 HMG1-F tagcaatctaatctaagtttATGCCGCCGCTATTCAAGG

All cctcactaaagggaacaaaagctggagctcGAAGTACCTTCAAAGAAT AN409 except PGK1p-F GGGGTC pAN127 pAN117/ ttttaataaacgaaactctgttcatTTGTTTTATATTTGTTGTAAAAAG AN420 PGK1p-R pAN119 TAGATAATTAC pAN117/ actttttacaacaaatataaaacaaATGAACAGAGTTTCGTTTATTAA AN421 DPP1-F pAN119 AACG pAN117/ cagctcatctatcttggcctccatagaaccaccaccCATACCTTCATCGGA AN441 DPP1-R pAN119 CAAAGGATG pAN117/ catcctttgtccgatgaaggtatgggtggtggttctATGGAGGCCAAGATA AN442 BTS1-F pAN119 GATGAG aaatcataaatcataagaaattcgcTCACAATTCGGATAAGTGGTCT AN443 pAN117 BTS1-R A aatagaccacttatccgaattgtgaGCGAATTTCTTATGATTTATGAT AN444 pAN117 ADH1t-F TTTTAT All tacgactcactatagggcgaattgggtaccGAGCGACCTCATGCTATAC AN412 except ADH1t-R C pAN127 taatttctttttctgaagccatagaaccaccaccCAATTCGGATAAGTGGT AN445 pAN119 BTS1-R CTATTATAT aatagaccacttatccgaattgggtggtggttctATGGCTTCAGAAAAAG AN446 pAN119 ERG20-F AAATTAGG aaatcataaatcataagaaattcgcCTATTTGCTTCTCTTGTAAACTT AN424 pAN119 ERG20-R TG caaagtttacaagagaagcaaatagGCGAATTTCTTATGATTTATGA AN425 pAN119 ADH1t-F TTTTTAT

230

PGK1p(RC)- tacgactcactatagggcgaattgggtaccGAAGTACCTTCAAAGAAT AN536 pAN120 F GGGGTC PGK1p(RC)- tcatctatcttggcctccatTTGTTTTATATTTGTTGTAAAAAGTA AN540 pAN120 R GATAATTAC

AN541 pAN120 BTS1(RC)-F ctttttacaacaaatataaaacaaATGGAGGCCAAGATAGATGAG

ERG20(RC)- ctcaggtatagcatgaggtcgctcggtaccCTATTTGCTTCTCTTGTAA AN537 pAN120 R ACTTTG

Table 5-4. Yeast synthetic-dropout media recipe.

Ingredient Amount to 100 mL 2X (13 g/L) Yeast Nitrogen Base (YNB) from 50 mL BD-Difco (291940) 40 % Glucose 5 mL 20X amino acid 5 mL 1 % Histidine (if required) 0.2 mL 1 % Leucine (if required) 0.9 mL 1 % Tryptophan (if required) 0.2 mL 0.2 % Uracil (if required) 1.0 mL

4 % Bacto-Agar (plates) or dH2O (liquid media) 40 mL

231

Table 5-5. Amino acids used for 20X amino acid mix.

Amino acid To 500 mL Sigma-Aldrich Catalog Code Adenine sulfate 0.2 g S/A-8626 Arginine HCl 0.2 g S/A-5006 **Aspartic acid 1.0 g S/A-9256 Glutamic acid 1.0 g S/G-1251 Isoleucine 0.3 g S/I-2752 Lysine HCl 0.3 g S/L-5626 Methionine 0.2 g S/M-9625 Phenylalanine 0.5 g S/P-2126 Serine 4.0 g S/S-4500 **Threonine 2.0 g S/T-8625 Tyrosine 0.3 g S/T-3754 Valine 1.5 g S/V-0500 Autoclave all and add ** when cooled, then sterile filter

Selective amino acids To 200 mL Sigma-Aldrich Catalog Code 1 % Histidine 2 g S/H-8125 1 % Leucine 2 g S/L-8000 1 % Tryptophan 2 g S/T-0254 0.2 % Uracil 0.4 g S/U-0750 autoclave

232

5.6 Spectra

(i) (ii) (iii)

Figure 5-11. GC-MS traces of E. coli fermentations from FPP2 flask fermentation. Gas chromatographs of extract from FPP2 (M. tuberculosis FPPase) flask fermentation in decane (top) and synthetic Z,E-FOH standard in decane (below). Corresponding mass spectra at the bottom. Peak (i) denotes the Z,E-FOH product of the FPP2 strain. Peak (iii) denotes the synthesized Z,E-FOH, and peak

(ii) denotes a potential degradation product (side peak) of Z,E-FOH. Retention time (RT) of Z,E-FOH is

~12.12 min. RT of the Z,E-FOH side peak is ~12.05 min. These runs were performed with a different GC column compared to all other traces in this report, hence the different retention times.

233

(i) (ii)

(iii) (iv)

234

Figure 5-12. GC-MS traces of E. coli fermentations from FPP2 bench-scale production. Gas chromatographs of extract from FPP2 bench-scale production in hexane (top) and the synthetic Z,E-FOH standard (below) in hexane. Corresponding mass spectra at the bottom. Peak (i) denotes the added cis- nerolidol standard. Peak (ii) denotes the biosynthesized Z,E-FOH. Peak (iv) denotes the chemically synthesized Z,E-FOH, and peak (iii) denotes the side peak as in Fig. 5-11. RT of cis-nerolidol in hexane is ~8.87 min. RT of standard Z,E-FOH in hexane is ~9.98 min. RT of Z,E-FOH side peak in hexane is

~9.90 min.

235

(i) (ii) (iii)

236

(iv) (v) (vi)

(vii) (viii) (ix)

(x) (xi)

237

Figure 5-13. GC-MS traces of E. coli fermentations of the other Z,E-FPPSs. Gas chromatographs of extracts from E. coli strains containing the other Z,E-FPPSs: 1) FPP5 (C. glutamicum); 2) FPP4 (C. efficiens); 3) FPP3 (T. fusca); 4) Mix of farnesol isomers (Sigma, W247804) in decane; 5) Synthesized

Z,E-FOH standard in decane. Corresponding mass spectra below. Peaks (i), (iv), and (vi) denote the added cis-nerolidol. Peaks (ii) and (vii) denote biosynthesized Z,E-FOH. Peaks (iii), (v) and (viii) denote

E,E-FOH. RT of cis-nerolidol in decane is ~8.76 min. RT of standard Z,E-FOH in decane (peaks (ix) and

(xi)) is ~9.81 min. RT of standard E,E-FOH in decane (peak (x)) is ~ 9.95 min.

Figure 5-14. 1H-NMR of biosynthetic Z,E-FOH from E. coli.

238

B (i) (ii)

(iii) (iv)

239

(v) (vi)

(vii)

Figure 5-15. GC-MS traces of S. cerevisiae fermentations for Z,E-FOH production. (A) GC chromatographs of: 1. Extract from small scale fermentation trial of CP-51. 2. Extract from small scale fermentation trial of ATCC200589/tHMG1 (AN-ZE1-T7). 3. Synthetic Z,E-FOH standard. 4. Authentic

E,E-FOH standard (96 %, Sigma 277541). The undecanol internal standard is present in each trace at retention time (RT) ~ 7.68 min. RT of the unidentified side peak is ~ 9.90 min. RT of Z,E-FOH is ~ 9.99 min. RT of E,E-FOH is ~10.12 min. (B) Mass spectra of labeled peaks in the GC chromatographs.

240

B (i) (ii)

(iii) (iv)

Figure 5-16. GC-MS analysis of yeast fermentation extracts and authentic GGOH standard. (A) GC chromatographs of (i) authentic GGOH standard; GGOH produced from small-scale fermentation trials of

241

(ii) AN-GG1-T3, (iii) AN-GG2-T3 and (iv) bench-scale production of AN-GG2-T3. The undecanol internal standard is present in each trace at retention time (RT) ~7.68 min. RT of E,E-FOH is ~10.08 min.

RT of GGOH is ~12.72 min. (B) Mass spectra of GGOH peak for (i), (ii), (iii) and (iv).

Figure 5-17. 1H-NMR of biosynthetic GGOH from bench-scale yeast fermentation.

242

5.7 References 1 Vining, L. C. Functions of secondary metabolites. Annu Rev Microbiol 44, 395-427, doi:10.1146/annurev.mi.44.100190.002143 (1990). 2 Demain, A. L. & Fang, A. The natural functions of secondary metabolites. Adv Biochem Eng Biotechnol 69, 1-39 (2000). 3 Cragg, G. M. & Newman, D. J. Natural products: a continuing source of novel drug leads. Biochim Biophys Acta 1830, 3670-3695, doi:10.1016/j.bbagen.2013.02.008 (2013). 4 Goodman, J. & Walsh, V. The story of taxol : nature and politics in the pursuit of an anti-cancer drug. (Cambridge University Press, 2001). 5 Li, J. J. & Corey, E. J. Total synthesis of natural products : at the frontiers of organic chemistry. (Springer, 2013). 6 Demain, A. L. From natural products discovery to commercialization: a success story. J Ind Microbiol Biotechnol 33, 486-495, doi:10.1007/s10295-005-0076-x (2006). 7 Chang, M. C. & Keasling, J. D. Production of isoprenoid pharmaceuticals by engineered microbes. Nature chemical biology 2, 674-681, doi:10.1038/nchembio836 (2006). 8 Burgard, A., Burk, M. J., Osterhout, R., Van Dien, S. & Yim, H. Development of a commercial scale process for production of 1,4-butanediol from sugar. Curr Opin Biotechnol 42, 118-125, doi:10.1016/j.copbio.2016.04.016 (2016). 9 Nakamura, C. E. & Whited, G. M. Metabolic engineering for the microbial production of 1,3- propanediol. Curr Opin Biotechnol 14, 454-459 (2003). 10 Paddon, C. J. & Keasling, J. D. Semi-synthetic artemisinin: a model for the use of synthetic biology in pharmaceutical development. Nat Rev Microbiol 12, 355-367, doi:10.1038/nrmicro3240 (2014). 11 Nielsen, J. & Keasling, J. D. Engineering Cellular Metabolism. Cell 164, 1185-1197, doi:10.1016/j.cell.2016.02.004 (2016). 12 Renault, H., Bassard, J. E., Hamberger, B. & Werck-Reichhart, D. Cytochrome P450-mediated metabolic engineering: current progress and future challenges. Curr Opin Plant Biol 19, 27-34, doi:10.1016/j.pbi.2014.03.004 (2014). 13 Exposito, O. et al. Biotechnological production of taxol and related taxoids: current state and prospects. Anticancer Agents Med Chem 9, 109-121 (2009). 14 Zhong, J. J. Plant cell culture for production of paclitaxel and other taxanes. Journal of bioscience and bioengineering 94, 591-599 (2002).

243

15 Buchholz, K. & Collins, J. The roots--a short history of industrial microbiology and biotechnology. Applied microbiology and biotechnology 97, 3747-3762, doi:10.1007/s00253- 013-4768-2 (2013). 16 Kampranis, S. C. & Makris, A. M. Developing a yeast cell factory for the production of terpenoids. Comput Struct Biotechnol J 3, e201210006, doi:10.5936/csbj.201210006 (2012). 17 Zhang, H. et al. Microbial production of sabinene--a new terpene-based precursor of advanced biofuel. Microb Cell Fact 13, 20, doi:10.1186/1475-2859-13-20 (2014). 18 Peralta-Yahya, P. P. et al. Identification and microbial production of a terpene-based advanced biofuel. Nat Commun 2, 483, doi:10.1038/ncomms1494 (2011). 19 Phelan, R. M., Sekurova, O. N., Keasling, J. D. & Zotchev, S. B. Engineering terpene biosynthesis in Streptomyces for production of the advanced biofuel precursor bisabolene. ACS Synth Biol 4, 393-399, doi:10.1021/sb5002517 (2015). 20 Wilbon, P. A., Chu, F. & Tang, C. Progress in renewable polymers from natural terpenes, terpenoids, and rosin. Macromol Rapid Commun 34, 8-37, doi:10.1002/marc.201200513 (2013). 21 Paquette, L. A. & Maleczka, R. E. Enantioselective Total Synthesis of (-)-9-Epi-Ambrox, a Potent Ambergris-Type Olfactory Agent. J Org Chem 56, 912-913, doi:DOI 10.1021/jo00003a004 (1991). 22 Stoll, M. & Hinder, M. Les substances bicyclohomofarnesiqueues. Helv Chim. Acta. 33, 1251- 1261 (1950). 23 Barrero, A. F., Altarejos, J., AlvarezManzaneda, E. J., Ramos, J. M. & Salido, S. Synthesis of (+/-)-ambrox from (E)-nerolidol and beta-ionone via allylic alcohol [2,3] sigmatropic rearrangement. J Org Chem 61, 2215-2218, doi:DOI 10.1021/jo951908o (1996). 24 Jiang, Q. Natural forms of vitamin E: metabolism, antioxidant, and anti-inflammatory activities and their role in disease prevention and therapy. Free Radic Biol Med 72, 76-90, doi:10.1016/j.freeradbiomed.2014.03.035 (2014). 25 Sen, C. K., Khanna, S. & Roy, S. Tocotrienol: the natural vitamin E to defend the nervous system? Ann N Y Acad Sci 1031, 127-142, doi:10.1196/annals.1331.013 (2004). 26 Sen, C. K., Khanna, S. & Roy, S. Tocotrienols: Vitamin E beyond tocopherols. Life Sci 78, 2088- 2098, doi:10.1016/j.lfs.2005.12.001 (2006). 27 Bonrath, W. & Netscher, T. Catalytic processes in vitamins synthesis and production. Appl Catal a-Gen 280, 55-73, doi:10.1016/j.apcata.2004.08.028 (2005). 28 Bonrath, W., Eggersdorfer, M. & Netscher, T. Catalysis in the industrial preparation of vitamins and nutraceuticals. Catal Today 121, 45-57, doi:10.1016/j.cattod.2006.11.021 (2007).

244

29 Pearce, B. C., Parker, R. A., Deason, M. E., Qureshi, A. A. & Wright, J. J. Hypocholesterolemic activity of synthetic and natural tocotrienols. J Med Chem 35, 3595-3606 (1992). 30 Mayer, H., Metzger, J. & Isler, O. [The stereochemistry of natural gamma-tocotrienol (plastochromanol-3), plastochromanol-8 and plastochromenol-8]. Helv Chim Acta 50, 1376-1393, doi:10.1002/hlca.19670500519 (1967). 31 Urano, S., Nakano, S. & Matsuo, M. Synthesis of Dl-Alpha-Tocopherol and Dl-Alpha- Tocotrienol. Chem Pharm Bull 31, 4341-4345 (1983). 32 Scott, J. W., Bizzarro, F. T., Parrish, D. R. & Saucy, G. Syntheses of (2R,4'R,8'R)-alpha- tocopherol and (2R,3'E,7'E)-alpha-tocotrienol. Helv Chim Acta 59, 290-306, doi:10.1002/hlca.19760590135 (1976). 33 Chenevert, R., Courchesne, G. & Pelchat, N. Chemoenzymatic synthesis of both enantiomers of alpha-tocotrienol. Bioorg Med Chem 14, 5389-5396, doi:10.1016/j.bmc.2006.03.035 (2006). 34 Couladouros, E. A., Moutsos, V. I., Lampropoulou, M., Little, J. L. & Hyatt, J. A. A short and convenient chemical route to optically pure 2-methyl chromanmethanols. Total asymmetric synthesis of beta-, gamma-, and delta-tocotrienols. J Org Chem 72, 6735-6741, doi:10.1021/jo0705418 (2007). 35 Hamilton-Miller, J. Development of the semi-synthetic penicillins and cephalosporins. Int J Antimicrob Ag 31, 189-192, doi:10.1016/j.ijantimicag.2007.11.010 (2008). 36 Yu, J. S., Kleckley, T. S. & Wiemer, D. F. Synthesis of farnesol isomers via a modified Wittig procedure. Organic letters 7, 4803-4806, doi:10.1021/ol0513239 (2005). 37 Buchi, G., Cushman, M. & Wuest, H. Conversion of allylic alcohols to homologous amides by N,N-dimethylformamide acetals. Journal of the American Chemical Society 96, 5563-5565, doi:10.1021/ja00824a041 (1974). 38 Wang, C. et al. Farnesol production from Escherichia coli by harnessing the exogenous mevalonate pathway. Biotechnology and bioengineering 107, 421-429, doi:10.1002/bit.22831 (2010). 39 Zhao, Y. et al. Biosynthesis of isoprene in Escherichia coli via methylerythritol phosphate (MEP) pathway. Applied microbiology and biotechnology 90, 1915-1922, doi:10.1007/s00253-011- 3199-1 (2011). 40 Ambo, T., Noike, M., Kurokawa, H. & Koyama, T. Cloning and functional analysis of novel short-chain cis-prenyltransferases. Biochemical and biophysical research communications 375, 536-540, doi:10.1016/j.bbrc.2008.08.057 (2008). 41 Schulbach, M. C., Brennan, P. J. & Crick, D. C. Identification of a short (C15) chain Z-isoprenyl diphosphate synthase and a homologous long (C50) chain isoprenyl diphosphate synthase in

245

Mycobacterium tuberculosis. The Journal of biological chemistry 275, 22876-22881, doi:10.1074/jbc.M003194200 (2000). 42 Wang, C. et al. Engineered heterologous FPP synthases-mediated Z,E-FPP synthesis in E. coli. Metabolic engineering 18, 53-59, doi:10.1016/j.ymben.2013.04.002 (2013). 43 Faulkner, A. et al. The LPP1 and DPP1 gene products account for most of the isoprenoid phosphate phosphatase activities in Saccharomyces cerevisiae. The Journal of biological chemistry 274, 14831-14837 (1999). 44 Ohto, C., Muramatsu, M., Obata, S., Sakuradani, E. & Shimizu, S. Overexpression of the gene encoding HMG-CoA reductase in Saccharomyces cerevisiae for production of prenyl alcohols. Applied microbiology and biotechnology 82, 837-845, doi:10.1007/s00253-008-1807-5 (2009). 45 Muramatsu, M., Ohto, C., Obata, S., Sakuradani, E. & Shimizu, S. Accumulation of prenyl alcohols by terpenoid biosynthesis inhibitors in various microorganisms. Applied microbiology and biotechnology 80, 589-595, doi:10.1007/s00253-008-1578-z (2008). 46 Ohto, C., Muramatsu, M., Obata, S., Sakuradani, E. & Shimizu, S. Production of geranylgeraniol on overexpression of a prenyl diphosphate synthase fusion gene in Saccharomyces cerevisiae. Applied microbiology and biotechnology 87, 1327-1334, doi:10.1007/s00253-010-2571-x (2010). 47 Tokuhiro, K. et al. Overproduction of geranylgeraniol by metabolically engineered Saccharomyces cerevisiae. Applied and environmental microbiology 75, 5536-5543, doi:10.1128/AEM.00277-09 (2009). 48 Castellana, M. et al. Enzyme clustering accelerates processing of intermediates through metabolic channeling. Nat Biotechnol 32, 1011-1018, doi:10.1038/nbt.3018 (2014). 49 Kirby, J. & Keasling, J. D. Metabolic engineering of microorganisms for isoprenoid production. Natural product reports 25, 656-661, doi:10.1039/b802939c (2008). 50 Chambon, C., Ladeveze, V., Oulmouden, A., Servouse, M. & Karst, F. Isolation and properties of yeast mutants affected in farnesyl diphosphate synthetase. Current genetics 18, 41-46 (1990). 51 Muramatsu, M., Ohto, C., Obata, S., Sakuradani, E. & Shimizu, S. Various oils and detergents enhance the microbial production of farnesol and related prenyl alcohols. Journal of bioscience and bioengineering 106, 263-267, doi:10.1263/jbb.106.263 (2008). 52 Dagert, M. & Ehrlich, S. D. Prolonged incubation in calcium chloride improves the competence of Escherichia coli cells. Gene 6, 23-28 (1979). 53 Ostrov, N., Wingler, L. M. & Cornish, V. W. Gene assembly and combinatorial libraries in S. cerevisiae via reiterative recombination. Methods in molecular biology 978, 187-203, doi:10.1007/978-1-62703-293-3_14 (2013).

246

54 DiCarlo, J. E. et al. Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic acids research 41, 4336-4343, doi:10.1093/nar/gkt135 (2013). 55 Lerner, C. G. & Inouye, M. Low copy number plasmids for regulated low-level expression of cloned genes in Escherichia coli with blue/white insert screening capability. Nucleic acids research 18, 4631 (1990). 56 Sikorski, R. S. & Hieter, P. A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics 122, 19-27 (1989).

247

Appendix Sequences of Oligonucleotides, IDT gblocks, Pathways Constructed by Reiterative Recombination and Shuttle Plasmids

248

A.1 Oligonucleotides used in Chapter 2.

AN3 caacgcacgaccggccgcCagctgtgcttc AN4 gtcgaccgaagcacagctGgcggccggtc AN15 CCTATTCCGAAGTTCCTATTCTTCAAATAGTATAGGAAC

AN16 GCTCTGAAGTTCCTATACTATTTGAAGAATAGGAAC AN19 GAGCTCGGTACCAGCCCG AN21 cacgcatatgattaatttgttcaatGgtataaccaac AN22 gagcaacgtgttggttatacCattgaacaaattaatc AN47 GTTCCAGATTATGCTccacaatttggtatattatgtaAAACACCACCTAAGGTGCTTG AN48 gctacaGTCGACttatatgcgtctatttatgtaggatg AN49 gattcaGGATCCatgTATCCATATGATGTTCCAGATTATGCTccacaat AN100 aaaattgtgcctttggacttaaaatggcgtTGTAAACGACGGCCAGTG AN101 GTCGGGCTggtaccgagctctctagaGCCTTAGTCGACCTGCAG AN102 GCCTGCAGGTCGACTAAGGCtctagagagctcggtaccAGCCCG AN103 cttagggataacagggtaatAGACGAACTCGACGGCAGGTggatccTACCAACCGGCAC AN104 tccgaagttcctattctctagaaagtataggaacttcTCGAAAGCTACATATAAGGAACG AN105 taagacGGATCCATGGAAGTTCCTATTCCGAAGTTCCTATTCTCTAGAAAGTATAGG AN106 aactatGTCGACTTAGTTTTGCTGGCCGCATCTTC AN107 cgaagttcctattctctagaaagtataggaacttcTCTGTTATTAATTTCACAGGTAGTT AN108 aactatGTCGACCTATTTCTTAGCATTTTTGACGAAATTTGC AN109 tgagaaggttttgggacgctcgaaggctttggtaggatccACCTGCCGTCGAGTTCGTCT AN128 tccgaagttcctattcttcaaatagtataggaacttcTAGTACGGATTAGAAGCCGC AN133 aaaattgtgcctttggacttaaaatggcgtGAAGTTCCTATTCCGAAGTTCCTATTCTCT AN134 tcagtacaatcttagggataacagggtaatGAGCGACCTCATGCTATACCTGAG AN137 taagacGGTACCGAAGTTCCTATTCCGAAGTTCCTATTCTTCAAATAGTAT AN138 aactatGGTACCGGCCGCAAATTAAAGCCTTCG AN144 TCGACCAGTTCCACTCGGCCG AN145 TGGAAGGGGTAGGTCGGCAGCT AN146 ctgttgcggaaagctgaaaGATGGAGTGATCCCGTAGGAGG AN155 ggacgctcgaaggctttTCGTCCAGCCGGTCCTGTTCA

249

AN156 cttagggataacagggtaatCATCGGTGTCCACCAGCACCAG AN164 ggacgctcgaaggctttTGTCGACACGGGTGATGGTGT AN165 ctgttgcggaaagctgaaaACGGTGGGCGTGGACATCA AN168 ggacgctcgaaggctttTGATGCACCACGACTACCTCAC AN169 cttagggataacagggtaatGAAGGCGTACGTCGGCAGAT AN171 ggacgctcgaaggctttCACCACACCTTCATCGAGATCAG AN172 ctgttgcggaaagctgaaaCGAGTACGACGGCGACATG AN173 CGGTGATGACGACGACGAAC AN175 GCGTCAACGACGTGAGCAAC AN184 GTTAGCGGTTTGAAGCAGGC AN187 TCATCGATGTGCAGGCTCTG AN188 ggacgctcgaaggctttCGACTGGGACCGGCACTAC AN189 cttagggataacagggtaatCCAGCTCACGCAGCAGTTC AN215 ggacgctcgaaggctttCATGTGATTCTGGAGCAGGCC AN216 ctgttgcggaaagctgaaaGGTGAACGGAAGGTGCGGTC AN217 tcaggttgctttctcaggtatagcatgaggtcgctctctagaGAGCTCGGTACCAGCCCG AN218 ggacgctcgaaggctttgagtaactctttcctgtaggTCAGGTTGCTTTCTCAGGTATAG AN219 ctgttgcggaaagctgaaacttccccgacgactgatggaaaGGATCCTACCAACCGGCAC AN220 ctgttgcggaaagctgaaagagttcctggtcgatgattccgGGATCCTACCAACCGGCAC AN221 ctgttgcggaaagctgaaaccctcgttcgcgcatcaactgaGGATCCTACCAACCGGCAC AN223 taagacctcgagTTAAAAATCTAGCTTAGGCTTATCAC AN224 GTTGACGGCCCCGAATGGTC AN227 CGGCGTGTTCACGGGGAC AN228 GAACACCGTATGGCCAAGCTC AN229 ggacgctcgaaggctttGTCTACGCCGAGATGAGCAC AN230 cttagggataacagggtaatGTTGGCGAGCCACCAGACC AN231 TTAATTCGAGCTCGGTACCCG AN232 aactatGGTACCGCTAGCTTCAAAAGCGCTC AN239 GGACACCATCGAGTTGGACGC AN240 CCACTACTCCGGAGCCGATC

250

AN241 ggacgctcgaaggctttGACCGTATCGGCCGACTGC AN242 ctgttgcggaaagctgaaaGACAGCCGTTCCAGGACGAG AN245 CGTTACACGATCCGGTGGAAG AN246 CTCGTCCGCCTTCAGCAAC AN247 GTCAACTTCCGCGATGTACTGC AN248 CTCCGATATCGCGTCCGTG AN249 ggacgctcgaaggctttGTGATGACGACACCGGACCTG AN250 cttagggataacagggtaatAGAACCGCTCACGCTGGAAC AN264 ccactagacTTTCAGCTTTCCGCAACAGTATAActgtgcTTAATTCGAGCTCGGTACCCG AN265 tgctttttctttttttttctcttgaactcgacggagagctcGCTAGCTTCAAAAGCGCTC AN266 ggacgctcgaaggctttgcatgcctgcaggtcgaccactagacTTTCAGCTTTCCG AN267 CCACCCTTGTCTTCGACCACC AN268 CATCCTCCAAAGTCAACGCAC AN272 cttgcggttgccataagagaag AN273 ggacgctcgaaggctttgCAGGGCCACACGGTGTTCGTC AN274 ctgttgcggaaagctgaaaGATGACACCGTCGTCCACCAC AN275 GATGCGTGAATGTGCCGATG AN276 GAAGCTCCTTCCACTCCAGC AN277 ggacgctcgaaggctttGTGACCTGGTGCTGGCCAG AN285 GCACACATATAATACCCAGCAAG AN299 GCGTCTCCCTTGTCATCTAAACC AN318 ctaaagggaacaaaagctgGAGCTCTCTTTGAAAAGATAATG AN321 gacataactaattacatgaCTCGAGAGACATAAAAAACAAAAAAAG AN324 tcttgagtaactctttcctgtaggtc AN339 gctctaaaacaccaagaccccacacccccgGATCATTTATCTTTCACTGCGGAGAAG AN340 cgggggtgtggggtcttggtGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG GTCGATGGTGATGTGGTGGATCCGGGGCAGTCGGGGGTGTGGGGTCTTGGCCGGG AN341 TGATC TCGATCAGACCACCCCAACGATCCGGATGCTCCAAACGGATCACCCGGCCAAGAC AN342 CCCAC AN343 TCTAGAGGATCCCCGGGCAATTGTACCGAGCTCCAGAGCACCGTCGAAGCCCTGA

251

TGAAG

AN344 GCCAGTGAATTCTCTAGAccggcacgattgtgcGCTAGCTTCATCAGGGCTTCGACGGTG GAATGTATTTAGAAAAATAAACAAATAGGGGTgctagcTCATGACCCATCGAGTTCC AN345 TGG CCAGGAACTCGATGGGTCATGAgctagcACCCCTATTTGTTTATTTTTCTAAATACAT AN346 TC AN347 cttagggataacagggtaatcgGATCTTCACCTAGATCCTTTTAAATTAAAA AN348 ccaacgtaaaattgtgcctttggac AN358 CAACTGTTGGGAAGGGCGATC GACCATGATTACGCCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGG AN362 GCA CCAGGGTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGAATTCTCTAGAcc AN363 gg AN364 CCATCTCCCCTTCTCCTGGAG AN365 GTGGCCACGAGTGTGGGTTC AN366 CTGTCGGAGCAGGACGTG GCGGTCGTGGTGTCGACGACCTCCAGTGCCACGGTGCGCTGAACTCCGGCGGTGC AN374 CGAAG TCCCCACGGGGTGGGGCGACCGCACCATCGTCGACAGCCTCTTCGGCACCGCCGG AN375 AGTTC AN396 gctctaaaacGTGGTGCCGAAGAGGCTGTCGATCATTTATCTTTCACTGCGGAGAAG AN397 GACAGCCTCTTCGGCACCACGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG AN415 GGTCGACTCTAGAGGATCCCCGGGCAATTGGGTCCTTTTCATCACGTGCTAT AN416 TTCTCTAGAccggcacgattgtgcGCTAGCCGGATGCAAGGGTTCGAATC AN447 atatatatatttcaaggatataccattctaATGTCTGCCCCTATGTCTGC AN448 cataggggcagacatTAGAATGGTATATCCTTGAAATATATATATATATTG GAATCATCGACCAGGAACTCGATGGGTCATGAgctagcGGCGAAAATGAGACGTTGA AN485 TCG AN487 ggacgctcgaaggctttgCTCGCTCGACGACATGTTCGGAATCATCGACCAGGAACTCGA AN489 ctctttcctgtaggtcaggttgctttctcaggtatagcatgaggtcgctctctagaGAAG AN490 atgaggtcgctctctagaGAAGTTCCTATTCCGAAGTTCCTATTCTCATATAAGTATAGG TGCTCGGGTCGGGCTggtaGAAGTTCCTATACTTATATGAGAATAGGAACTTCGGAA AN491 TAG CTCGAACTCCGGTCCGACATCGACCAGGCGTGCCGGCGCGTGCTCGGGTCGGGCT AN492 ggtaG

252

AN493 gctctaaaacagctctctagagagcgacctGATCATTTATCTTTCACTGCGGAGAAG AN494 aggtcgctctctagagagctGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG TGCTCGGGTCGGGCTggtaGAAGTTCCTATACTCCTATAAGAATAGGAACTTCGGAA AN524 TAG AN525 GCTCTAAAACagctctctagagccttagtcGATCATTTATCTTTCACTGCGGAGAAG AN526 gactaaggctctagagagctGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG AN531 tgatatttgatctttaccgtttagttccaacgtAAAATTGTGCCTTTGGACTTAAAATGG AN532 atacataaacatacgcgcacaaaagcagagaCTATTTCTTAGCATTTTTGACGAAATTTG AN533 gtcaaaaatgctaagaaatagTCTCTGCTTTTGTGCGCG AN534 cggaataggaacttctctagaACACTATGATATATAAATATAATAGTTTTTCG AN535 CTATAAGAATAGGAACTTCGGAATAGGAACTTCtctagaACAC AN542 GCTCTAAAACgagttggggccggacgggacGATCATTTATCTTTCACTGCGGAGAAG AN543 gtcccgtccggccccaactcGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG GCCGATGGGCTGGATACTCTCCGTGAGCTGGGTGTGGGTTCGTTCCTGGAGTTGG AN544 GGC GCCATCCGCCAAGGCGGTCAACGTCCCGTCCGGCCCCAACTCCAGGAACGAACCC AN545 AC AN553 tatacttctgatagaataggaacttcggaataggaacttcCGACGACCGGGTCGAATTTG AN554 ctgttgcggaaagctgaaagaagttccTATACTTCTGATAGAATAGGAACTTCGGAATAG AN580 GTCCATGCGAGTGTCCGTTC AN599 gctctaaaaccacagttatactgtgcggaaGATCATTTATCTTTCACTGCGGAGAAG AN600 ttccgcacagtataactgtgGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG AN605 atgttcgacgccctggtattcgTTTCAGCTTTCCGCAACAGTATAActgtgtatttcaca AN606 tttttctcttgaactcgacggatctatgcggtgtgaaatacacagTTATACTGTTGCGG AN609 gctctaaaacgaggccaagcggattccggtGATCATTTATCTTTCACTGCGGAGAAG AN610 accggaatccgcttggcctcGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG AN611 GCTCTAAAACagctctctagagagcgacctGATCATTTATCTTTCACTGCGGAGAAG AN612 aggtcgctctctagagagctGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG AN613 ccgaagttcctattctctagaaagtataggaacttcGGTAAGGAAAAGACTCACGTTTCG AN614 tcctattCCGAAGTTCCTATTCTCTAGAAAG AN619 caggaaacagctatgaccatgattacgccaACGGATTAGAAGCCGCCGAG

253

AN620 caaaagcagagaTTATATGCGTCTATTTATGTAGGATGAAAG AN621 cataaatagacgcatataaTCTCTGCTTTTGTGCGCG AN622 tcctctagagtcgacctgcaggcatgcaACACTATGATATATAAATATAATAGTTTTTCG AN681 gaatccggtgagaatggcaaaag AN682 tgcataagcttttgccattctcac AN693 gctctaaaaccttggcctccgggaactccgGATCATTTATCTTTCACTGCGGAGAAG AN694 cggagttcccggaggccaagGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG CCGGTGTCGACGGTGGTGTCGGGGGCTGTTGAGGTGCTGGAGGGGGTGCTGGCGG AN695 AGTTC GCGACGCATAATCCACCGGAATCCGCTTGGCCCCCGGGAACTCCGCCAGCACCCC AN696 CTCCA TAATTTGCGGCCGGTACCCGAAGTTCCTATTCCGAAGTTCCTATTCTTCAAATAGT AN849 ATAG GCGAATTGGGTACCGCTAGCCAATTGGAAGTTCCTATACTATTTGAAGAATAGGA AN850 ACTTC TAATTTGCGGCCGGTACCCGAAGTTCCTATTCCGAAGTTCCTATTCTCATATAAGT AN851 ATAG GCGAATTGGGTACCGCTAGCCAATTGGAAGTTCCTATACTTATATGAGAATAGGA AN852 ACTTC TAATTTGCGGCCGGTACCCGAAGTTCCTATTCCGAAGTTCCTATTCTTATAGGAGT AN853 ATAG GCGAATTGGGTACCGCTAGCCAATTGGAAGTTCCTATACTCCTATAAGAATAGGA AN854 ACTTC TATAGGGCGAATTGGGTACCGCTAGCCAATTGGAAGTTCCTATACTTATATGAGA AN855 ATAGG TAATTTGCGGCCGGTACCCGAAGTTCCTATTCCGAAGTTCCTATTCTATCAGAAGT AN856 ATAG TATAGGGCGAATTGGGTACCGCTAGCCAATTGGAAGTTCCTATACTTCTGATAGA AN857 ATAGG AN858 CACTATAGGGCGAATTGGGTACC AN859 attctctagaccggcacgattgtgcgctagTGTTGCAAGGCTAATGACGGC AN862 gctcattttttaacACGGTGCTGCTGTTCAAGGTG AN863 cagcagcaccgtGTTAAAAAATGAGCTGATTTAACAAAAATTTAAC AN866 cagctatgaccatgattacgccaagctGGGGTTCCGCGCACATTTCC AN876 GAGTTCGTGGACGTCGGGC AN877 gcacaatcgtgccggTCTAG

254

AN884 CCAGCATTCTAGTTAGCTAGCACGCCATTTTAAGTCCAAAGGCAC AN885 GGCGTGCTAGCTAACTAGAATGCTGGAGTAGAAATACG AN890 cttgtctgtgtagaagaTCGTAAGGCCGTTTCTGACAG AN891 gaaacggccttacgaTCTTCTACACAGACAAGATGAAACAATTC AN892 gatgcttttctgtgactggtgagtatccctaggCCGGAATTGCCAGCTGGGG AN893 gccgcttttgcggTCAGAAGAACTCGTCAAGAAGGC AN894 gagttcttctgaCCGCAAAAGCGGCCTTTGAC AN895 ggaaaacgtgaagcAAAGTTTTGTCGTCTTTCCAGACG AN896 gacgacaaaactttGCTTCACGTTTTCCCAGGTCAG AN897 cagaatgacttggttgagtatccctaggAGACGTAGATCAGGCTTCCCG AN908 gctctaaaacattaatttcccctcgtcaaaGATCATTTATCTTTCACTGCGGAGAAG AN909 tttgacgaggggaaattaatGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG AN910 taatgtagagttgcacgtagttcttac AN911 ACGTTCCTTATATGTAGCTTTCGAG AN912 GGACCAGAACTACCTGTGAAATTAATAAC AN913 GAACTTCTCGAAAGCTACATATAAGG AN914 attgtcagtactgaTTAGTTTTGCTGGCCGCATCT AN915 gccagcaaaactaaTCAGTACTGACAATAAAAAGATTCTTG AN916 aatcgacagcagtatagcgac AN917 ggaacttcTCTGTTATTAATTTCACAGG AN918 gtcagtactgaCTATTTCTTAGCATTTTTGACGAAATTTG AN919 caaaaatgctaagaaatagTCAGTACTGACAATAAAAAGATTCTTG DD1 ggacgctcgaaggctttGTGCCGGTTGGTAggatccTCAGTTGATGCGCGAACG DD3 cttagggataacagggtaatCGGAGGGTTTGTGCTGTTCC DD4 CCGACAAGGCTCAAGTGGC DD5 ggacgctcgaaggctttGCGTTACCACTCGACCCGAC DD6 ctgttgcggaaagctgaaaCTCAGCTGGCACTGGATCTTC DD7 GATCATCATCCAGGCCGAACC DD8 CAGCCTGGGAATGTCCGGTG DD9 ggacgctcgaaggctttGGTGTAGATGCCCAGCATCAG

255

DD10 cttagggataacagggtaatcggCAAAATGCGTTCGGGGTTCTATC DD11 GTGAACGCCACCGCGAAGG DD12 GTTACCCTGGGCCGCATGACC DD18 ggacgctcgaaggctttGCAACAGCCTTGTCCCCAG DD20 CTCGACGATGTCGCAGAG DD21 GAAGCCGCCTCGGTAGTAC DD22 ctgttgcggaaagctgaaaTTCAAGTATCGGATGCTCGTC DD26 GTACCTCGCGGACCTGTTCAC DD27 CAACGCCGACGAGCTGATC DD28 GGTCCAGGTCAGCCAAGTCAC DD29 AGACTATTTGTTTCGGCTCTACTTG DD30 ggacgctcgaaggctttgTCTCGTAGTAGCGCAGTGTC DD31 GTATCCCGCATGACAGAGAAG DD32 TCGAAGACGCGGAAGATAG DD33 cttagggataacagggtaatcGGAGGTCGTCGGCCAGAC DD36 GTCGCGTTCCCCTTGAGGC DD37 CTCTTCGACCGCTCGCAC DD39 GTTGGAGCGTGTCGGGAATG DD41 TGAATGTATTTAGAAAAATAAACAAATAGGGGTgctagcggtcgacaacatcccgacccc DD42 cggggtcgggatgttgtcgaccgctagcACCCCTATTTGTTTATTTTTCTAAATACATTC DD44 ATGGTCTTCGAGCGGATGTAG DD45 GGCCTCATCCGGATGCAGAC DD46 CGGCCTGAAACGTGCTCAG DD47 GAGGGCGAGGTCGAGCTTC DD48 CGTCGGGGTCCGTGAGATC DD49 acggcgatgtggtgacgatc DD50 TAGCGCACGAAGTCCTCG DD51 aggctttgcatgcctgcaggCGAACCCGGACCTTCACACC DD52 CATTCCCGACACGCTCCAAC DD54 gaaagctgaaagtctagtggCAATTGGATCTTCACCTAGATCCTTTTAAATTAAAA

256

EC1 ggacgctcgaaggctttGTGCCGGTTGGTAggatccTTTCCATCAGTCGTCGGGG EC2 cttagggataacagggtaatCCAGGCCACCAGGTTGAG EC6 ggacgctcgaaggctttGAACCTCGCCTATGTCATTTACAC EC8 TACGTGGACAGCGACGACATC EC9 ACATAGGCCACCAGCCTCTTG EC36 GCTCTAAAACgccgtcaagatcgaccgcgtGATCATTTATCTTTCACTGCGGAGAAG EC37 acgcggtcgatcttgacggcGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG EC46 GCCCGACCCGAGCACGC EC47 GTCGTACGGACAACGCCGTTC EC52 GAGGTAATGACGAAGTTTCTCTTCATTC EC53 GAAGAGATTGCTCATGGCCGC EC54 GTCGTTCGACATCCGCGTGC JK2 GGAGCTGATCGAGACCATTC JK4 ggacgctcgaaggctttgCCCTGTTGGCCACGTATGGTC JK7 GAGGTCACTCGTCGAGTCC JK33 ggacgctcgaaggctttgTTGGCGAATGCCCGTTTGTC JK35 AATATCTGAGCGTGGAATTGCTC JK36 CCACCCAGCAACTCCTCACG JK38 CTCGCCAGCACAACAGTCTC JK40 ggacgctcgaaggctttgATGAGCTGCCGATTCCCTG JK41 CAGCCGGTCAGCAGAAACC JK42 TCTGTCGTTGGAGGACGGTG JK43 ctgttgcggaaagctgaaaCGTAATGGCGTTCCCATTCG JK44 GGACCAGGTTGCGGTTCGTG JK45 CGAGAGCACCCACGGAACAG JK46 TGGCCGAACCCGAGGAAACC JK47 GGTCGGCAGGTCGATGAGAC JK53 cttagggataacagggtaatcggACGTGTCCGCAATCGGTG JK59 gatcccccgggctgcaggaattcgatatcaTGTGCGCTACCTGTATCTCTTC JK62 CATTGGTTTCGCGATGACGG

257

JK63 ACGATCGGATCGTCGTCCAC JK64 CCGTATTCGACTACCCGACCC JK65 ATCCAAATGCTGGTCCAGCTG JK66 GCGGATCGTGGTGAGTTGTTG JK67 cttgcttgagaaggttttgggacgctcgaaggctttAGTTGACCCGTGACATGGAACTC JK68 tcagtacaatcttagggataacagggtaatGGATGGCTCCAACCAGAACCG JK69 cttgcttgagaaggttttgggacgctcgaaggctttGAGCCTCTGACCGCTATGG JK71 TCGACCTGGACCGGACGTTC JK72 gcacagttatactgttgcggaaagctgaaaGCCTATCCAGATGCTGGTCCAG JK73 TGTGGTGTCATTGTTGTCGGTCG JK74 CCGGTCAGAAGGTATCCTTCGG JK76 tcagtacaatcttagggataacagggtaatGCCGAGGAGAACAACACAAAG JK81 GAGTTCGACGCCGGATTCTTC JK82 AGAACCGCTGCCGTTGGAAC JK83 GGTGTGGAGGTGGACTGGGAC JK84 GATGATGGCGATCGGTTCGTC JK85 GCTTCCGTGATCTACGACTACC JK87 gcacagttatactgttgcggaaagctgaaaCGAATACCAGGGCGTCGAAC JK91 cttgcttgagaaggttttgggacgctcgaaggcttTCTTTGGTGGCCACCCGGTC JK92 cttgcttgagaaggttttgggacgctcgaaggctttgCTTGCGGCTCTTTTGGTGAC JK94 GGAAGACGAACACCACCTTGC JK95 GTCCTTCATCGACGCCCATC JK96 CTGGTTCATCAGGGCTTCGAC JK97 CAGATCGCGTTCCACTCGAC JK98 cttgcttgagaaggttttgggacgctcgaaggctttGTATCTGCCGTTCTCCTGGAG JK100 AGGTCCAGCACCAGCTGTTC JK104 CTTCGAGCGAGGGATCTTCAC JK105 CGGGAGAACTCCACGAAGATG JK106 GATCCGAGGGCTACTTCCTGAC JK107 GAATCCTGGTGGCCGTCGTG

258

JK111 cttgcttgagaaggttttgggacgctcgaaggctttgGGAACTGCTGGACGGACTGG JK112 GATCGGACGTATCGGTGTCGAC JK113 CGCGTCTGGTGCTGGTGAC JK114 TCATGGCCGATCCGTCCGC JK115 CATTCTCGTGCTCGAAGAGC JK116 CAAGGTTCTGGACCAGTTGCG JK117 ATGCGCTCACGCAACTGG JK118 TTATTTGCCGACTACCTTGGTGATC JK119 GCGAGATCACCAAGGTAGTCG JK120 gtgaaataccgcacagTTATACTGTTG JK121 GTTCGAGGAAGGTGCCCGTG JK122 GTACTCGACACGGTGGACTTCC JK123 CACAAAGCGGTCCAGTTCCAG JK124 GTCCCTCTTGCCGGAATCCC JK130 CTCTTCCGCGGTCTGGTCC JK131 tcagtacaatcttagggataacagggtaatCGAAAACCGGGAACGACTGGTAC JK134 GACATGCCCGACATGGTGGTG JK135 TGCGGAATGGACAGCACCAG KV1 ggacgctcgaaggctttGTGCCGGTTGGTAggatccCGGAATCATCGACCAGGAAC KV3 CGACCAGTCGCCAAAGACCAC KV4 ggacgctcgaaggctttCTGGATCGGTATTTGGATAGGCC KV6 GAGTCTGAGGCTGAAGGTGTG KV7 CGACCGACAACAAAGACACCAC KV10 GTCTCTCTTCGCAGCTCTCC KV11 GAAATAGCCTTCCGACGCTTC KV15 GATGACCAGGTGAAGATACGTGG KV16 GGTTCCATGCGTTTCCTCGG KV17 CACCACGGTGTTGACGAAGAAC KV18 CAGAACTCGAGTTGGCGTTG KV19 GAGATCTGTCCATCGTGTACG

259

KV20 ctgttgcggaaagctgaaaGGTGGCCGATGTTGGTCTTC KV21 ggacgctcgaaggctttGTTCAGTACGTGGAGCTGCAC KV22 GAGTTGCTTGAAGGTGTGCTC KV23 GATTCCACTGCCCACGTACG KV24 cttagggataacagggtaatcggCATTCACACCACGCAACAACTC KV25 CAAGTCCCTCTTCGACCGTG KV26 GTCGTCACCGTGGATCACGTC KV27 CCAGGCTCTTGACGAAGTGTG KV28 ACGGGAGAACTCCACGAACATC KV29 ggacgctcgaaggctttgCCGTCGCCTGGGTGTTGTC KV30 GTGTGCGTCGCAGTGGAC KV31 TGTGCTCGCTCACCACTG KV32 ctgttgcggaaagctgaaaGGGGAGATCCCGAAGAGTTC KV40 ATCCACCGGAATCCGCTTG KV41 TGTCGGTTCCGGTTTCTGC KV42 cttagggataacagggtaatcGGAACCACATCCGTATCTACATC KV47 ggacgctcgaaggctttGGTGGAGGAGCGGTTTGAG KV48 GTCGCCAGCGGAGAACAC KV49 GACATGGCGGGTGAACCTG KV50 ctgttgcggaaagctgaaaGATGCCTACCAGCGCATTC KV50 ctgttgcggaaagctgaaaGATGCCTACCAGCGCATTC KV51 TCTGCCGGCCACTTCGATC KV52 TGGGGTTGTCGTCAGCGAAG KV53 GCGCAGATCGAAGGCTTG KV54 CGCCTCCACGAAGATTTCG KV55 GTCTGTCGTTGGAGGACGG KV56 ggacgctcgaaggctttgCACCTTGCTGTTCGACCACC KV60 GTGTCGGTGTTTTCGTGTCGTG KV61 GAACCCGATCTCCTGGAACGTC KV62 GAAAGGCAGCACCAGATCATGC

260

KV63 CCGAATGAAACGCATGACTCACC KV64 CGTTTGATGCAGGCGTTGC KV65 cttgcttgagaaggttttgggacgctcgaaggctttGGTGACTTGGTGGATCCTGGC KV66 tcagtacaatcttagggataacagggtaatgctagcTCACGAGATCCCGAACTCCTTG CTCGGCAAGGAGTTCGGGATCTCGTGAgctagcCGATATAAGTTGTAATTCTCATGTT KV70 TG KV71 ctatactcctataagaataggaacttcggaataggaacttcTGCCAAGGGTTGGTTTGCG KV72 gacatggattccgcgacggacgaggagatgttcgacctcCTCGGCAAGGAGTTCGGGATC KV73 gttatactgttgcggaaagctgaaaGAAGTTCCTATACTCCTATAAGAATAGGAACTTCG CTGCTGTCCCACTGGGGAGCGGCCACCAACAGTGACATCGACATGGATTCCGCGA KV74 CGGAC KV75 cttgcttgagaaggttttgggacgctcgaaggctttCTGCTGTCCCACTGGGGAG KV76 AACGGCACCTCACCCGTAC KV94 GTCACCGTGGACACGGCATG KV95 GATGTCAGTGTCGGTGGTGTCG KV96 AAGGTCTCCCAGTCGACGTCC LK13 GGGTCCATGGAGAAGAAGAAGC LK18 ctctttcctgtaggtcaggttgc LK19 CGCAGATCCTTCGTGACCTC LK23 ACAATGCCGGTGGATTCGTTTC LK24 cgtcaatcgtatgtgaatgctgg LK26 CGGGAAACGGCAGCTCATAC NJ8 ctatactttaacgtcaaggagaaaaaaccccggatATGCCACAATTTGGTATATTATG NJ9 CTGACGAACAAGCACCTTAGGTGGTGTTTtacataatataccaaattgtggcatatccgg

A.2 dsDNA gblocks used in Chapter 2. gblock- GGCGAATGGGCTGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCATCGCCTT pAN84 CTATCGCCTTCTTGACGAGTTCTTCTGACCGCAAAAGCGGCCTTTGACTCCCTGCAAGCCTCAGCGACC -tFd GAATATATCGGTTATGCGTGGGCGATGGTTGTTGTCATTGTCGGCGCAACTATCGGTATCAAGCTGTT

261

TAAGAAATTCACCTCGAAAGCAAGCTGATAAACCGATACAATTAAAGGCTCCTTTTGGAGCCTTTTTTT TTGGAGATTTTCAACGTGAAAAAATTATTATTCGCAATTCCTTTAGTTGTTCCTTTCTATTCTCACTCCG CTGAAACTGTTGAAAGTTGTTTAGCAAAACCTCATACAGAAAATTCATTTACTAACGTCTGGAAAGAC GACAAAACTTTGCTTCACGTTTTCCCAGGTCAGAAGCGGTTTTCGGGAGTAGTGC gaaatgcataagcttttgccattctcaccggattcagtcgtcactcatggtgatttctcacttgataaccttatttttgacgaggggaaat taataggttgtattgatgttggacgagtcggaatcgcagaccgataccaggatcttgccatcctatggaactgcctcggtgagttttctc cttcattacagaaacggctttttcaaaaatatggtattgataatcctgatatgaataaattgcagtttcatttgatgctcgatgagttttt ctaatcagtactgacaataaaaagattcttgttttcaagaacttgtcatttgtatagtttttttatattgtagttgttctattttaatcaaat gttagcgtgatttatattttttttcgcctcgacatcatctgcccagatgcgaagttaagtgcgcagaaagtaatatcatgcgtcaatcgt atgtgaatgctggtcgctatactgctgtcgattcgatactaacgccgccatccagtgtcgatctagaGAAGTTCCTATTCCGAA GTTCCTATTCTATCAGAAGTATAGGAACTTCtaccAGCCCGACCCGAGCACGCGCCGGCACGCCTGGTC gblock- GATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGTCCATCATTCTCGT ptKanR GCTCGAAGAGCTCAACAAGCTCGACGATTCGATCCTCGTGCTCGACCCGGCAAGCGCGGCACGGGTG CGGATCTCGACCCTGCTCCAGGACCTGGCCGCGAAATGGGTCGAGCGGACGGGCGAGATCACCAAG GTAGTCGGCAAATAATAATACTAGCAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGATT ACCGCCTTTGAGTGAGCGAAGTTCCTATTCCGAAGTTCCTATTCTCATATAAGTATAGGAACTTCTTTC AGCTTTCCGCAACAGTATAActgtgcggtatttcATCGGCCATGAgctagcGTGGCACTTTTCGGGCAGAATA AATGATCATATCGTCAATTATTACCTCCACGGGGAGAGCCTGAGCAAACTGGCCTCAGGCATTTGAGA AGCACACGGTCACACTGCTTCCGGTAGTCAATAAACCGGTAAGTAGCGTATGCGCTCACGCAACTGG TCCAGAACCTTG CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATGCTGTAGGCTGTTAATATATTTCGGTGTGTAGGATACGGGCTTTCCATCAGTCGTCGGGGAAG F1-B4p GCTGTGTTGTGGGGAATTCAGGCGCACCCAAATCCCGTGGTCTCAGCGCGGCCATGAGCAATCTCTTC TTTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATTGTTCACATTCGAACGGTCTCTGCTTTGACAacatgctgtgcggtgtTGTAAAGTCGTGGCCAGGAGA F1- ATACGACAGCGTGCAGGACTGGGGGAGTTTTTCCATCAGTCGTCGGGGAAGGCTGTGTTGTGGGGA kasOp ATTCAGGCGCACCCAAATCCCGTGGTCTCAGCGCGGCCATGAGCAATCTCTTCTTTTTTTGTTTTTTAT GTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATTGTGCGGGCTCTAACACGTCCTAGTATGGTAGGATGAGCAATTTCCATCAGTCGTCGGGGAAG F1-p21 GCTGTGTTGTGGGGAATTCAGGCGCACCCAAATCCCGTGGTCTCAGCGCGGCCATGAGCAATCTCTTC TTTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATCTAATTGGCTACGTCATAGAGAGATTCTTTAGGATGAGAAATTTCCATCAGTCGTCGGGGAAGG F1-p72 CTGTGTTGTGGGGAATTCAGGCGCACCCAAATCCCGTGGTCTCAGCGCGGCCATGAGCAATCTCTTCT TTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA gblock- CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC F1- GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT SF14p CCATTCGAGAGATCTTAATTAATGAGTTACGTAGACCTACGCCTTGACCTTGATGAGGCGGCGTGAGC

262

TACAATCAATACTCGATTAGAATTTTTCCATCAGTCGTCGGGGAAGGCTGTGTTGTGGGGAATTCAGG CGCACCCAAATCCCGTGGTCTCAGCGCGGCCATGAGCAATCTCTTCTTTTTTTGTTTTTTATGTCTGGTA CCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATGCTGTAGGCTGTTAATATATTTCGGTGTGTAGGATACGGGCACCTGCCGTCGAGTTCGTCTCCG F2-B4p AGTGAGTCCAGCACCGCGTTGAGAGGGCCGTCCTGTGGAGAATGAAGAGAAACTTCGTCATTACCTC TTTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATTGTTCACATTCGAACGGTCTCTGCTTTGACAacatgctgtgcggtgtTGTAAAGTCGTGGCCAGGAGA F2- ATACGACAGCGTGCAGGACTGGGGGAGTTACCTGCCGTCGAGTTCGTCTCCGAGTGAGTCCAGCACC kasOp GCGTTGAGAGGGCCGTCCTGTGGAGAATGAAGAGAAACTTCGTCATTACCTCTTTTTTTGTTTTTTATG TCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATTGTGCGGGCTCTAACACGTCCTAGTATGGTAGGATGAGCAAACCTGCCGTCGAGTTCGTCTCCG F2-p21 AGTGAGTCCAGCACCGCGTTGAGAGGGCCGTCCTGTGGAGAATGAAGAGAAACTTCGTCATTACCTC TTTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATCTAATTGGCTACGTCATAGAGAGATTCTTTAGGATGAGAAAACCTGCCGTCGAGTTCGTCTCCG F2-p72 AGTGAGTCCAGCACCGCGTTGAGAGGGCCGTCCTGTGGAGAATGAAGAGAAACTTCGTCATTACCTC TTTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATTCGAGAGATCTTAATTAATGAGTTACGTAGACCTACGCCTTGACCTTGATGAGGCGGCGTGAGC F2- TACAATCAATACTCGATTAGAATTACCTGCCGTCGAGTTCGTCTCCGAGTGAGTCCAGCACCGCGTTG SF14p AGAGGGCCGTCCTGTGGAGAATGAAGAGAAACTTCGTCATTACCTCTTTTTTTGTTTTTTATGTCTGGT ACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATGCTGTAGGCTGTTAATATATTTCGGTGTGTAGGATACGGGCCGGAATCATCGACCAGGAACTC F3-B4p GATGGGTCATGAGCAGCGAGAACGTCCGACCGGAAATCGAGGGGACTGGCACGCGGATGTCGAAC GACTTTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATTGTTCACATTCGAACGGTCTCTGCTTTGACAacatgctgtgcggtgtTGTAAAGTCGTGGCCAGGAGA F3- ATACGACAGCGTGCAGGACTGGGGGAGTTCGGAATCATCGACCAGGAACTCGATGGGTCATGAGCA kasOp GCGAGAACGTCCGACCGGAAATCGAGGGGACTGGCACGCGGATGTCGAACGACTTTTTTTGTTTTTT ATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA gblock- CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC F3-p21 GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT

263

CCATTGTGCGGGCTCTAACACGTCCTAGTATGGTAGGATGAGCAACGGAATCATCGACCAGGAACTC GATGGGTCATGAGCAGCGAGAACGTCCGACCGGAAATCGAGGGGACTGGCACGCGGATGTCGAAC GACTTTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATCTAATTGGCTACGTCATAGAGAGATTCTTTAGGATGAGAAACGGAATCATCGACCAGGAACTC F3-p72 GATGGGTCATGAGCAGCGAGAACGTCCGACCGGAAATCGAGGGGACTGGCACGCGGATGTCGAAC GACTTTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATTCGAGAGATCTTAATTAATGAGTTACGTAGACCTACGCCTTGACCTTGATGAGGCGGCGTGAGC F3- TACAATCAATACTCGATTAGAATTCGGAATCATCGACCAGGAACTCGATGGGTCATGAGCAGCGAGA SF14p ACGTCCGACCGGAAATCGAGGGGACTGGCACGCGGATGTCGAACGACTTTTTTTGTTTTTTATGTCTG GTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATGCTGTAGGCTGTTAATATATTTCGGTGTGTAGGATACGGGCTCAGTTGATGCGCGAACGAGGG F4-B4p AGTCAACAGTGAGCGAGACCTTGTCCCTTCCCGGGACCGTGAAGGCCGAACGGCGTTGTCCGTACGA CTTTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATTGTTCACATTCGAACGGTCTCTGCTTTGACAacatgctgtgcggtgtTGTAAAGTCGTGGCCAGGAGA F4- ATACGACAGCGTGCAGGACTGGGGGAGTTTCAGTTGATGCGCGAACGAGGGAGTCAACAGTGAGCG kasOp AGACCTTGTCCCTTCCCGGGACCGTGAAGGCCGAACGGCGTTGTCCGTACGACTTTTTTTGTTTTTTAT GTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATTGTGCGGGCTCTAACACGTCCTAGTATGGTAGGATGAGCAATCAGTTGATGCGCGAACGAGGG F4-p21 AGTCAACAGTGAGCGAGACCTTGTCCCTTCCCGGGACCGTGAAGGCCGAACGGCGTTGTCCGTACGA CTTTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATCTAATTGGCTACGTCATAGAGAGATTCTTTAGGATGAGAAATCAGTTGATGCGCGAACGAGGG F4-p72 AGTCAACAGTGAGCGAGACCTTGTCCCTTCCCGGGACCGTGAAGGCCGAACGGCGTTGTCCGTACGA CTTTTTTTGTTTTTTATGTCTGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA CCAAGCGCGCAATTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGCCCGACCCGAGCACGCGCC GGCACGCCTGGTCGATGTCGGACCGGAGTTCGAGGTACGCGGCTTGCAGGTCCAGGAAGGGGACGT gblock- CCATTCGAGAGATCTTAATTAATGAGTTACGTAGACCTACGCCTTGACCTTGATGAGGCGGCGTGAGC F4- TACAATCAATACTCGATTAGAATTTCAGTTGATGCGCGAACGAGGGAGTCAACAGTGAGCGAGACCT SF14p TGTCCCTTCCCGGGACCGTGAAGGCCGAACGGCGTTGTCCGTACGACTTTTTTTGTTTTTTATGTCTGG TACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCA

264

A.3 Sequence of Fragment 1 yeast assembly.

Sequence of Fragment 1 shown schematically in Fig. 2-1. To show the context of the sequence in the chromosome, the last 30 bp of the HO(L) region on the 5’ end, and the pPYK promoter of the acceptor module on the 3’ end are included. The sequence is in Genbank format:

LOCUS mer-fragment-1-031016 27156 bp DNA linear UNA 07- Apr-2012 DEFINITION FEATURES Location/Qualifiers Misc._recombina 1..30 /label=HO(L) Misc._recombina 31..78 /label=FRT-WT Primer 72..97 /label=AN913 Primer complement(78..102) /label=AN911 ORF 79..879 /label=URA3 Primer_binding_ 463..482 /label=AN184 Primer_binding_ complement(688..710) /label=AN299 Primer complement(857..879) /label=AN106 Terminator 892..1057 /label=tADH Primer 1008..1030 /label=LK18 Restriction_sit 1058..1063 /label=XbaI Cleavage_site 1058..1063 /label=assembly1 Primer_binding_ 1064..1081 /label=AN19 Amplification 1064..1345 /label="from pSE34" Misc._recombina 1073..1164 /label=pAN84-gib1-homology Misc._signal complement(1077..1096) /label=cr21-1 Misc._signal 1085..1104 /label=cr21-3 Primer 1097..1115 /label=EC48 Misc._signal 1116..1135 /label=cr21-4 Misc._signal 1130..1149 /label=cr21-2 Promoter 1167..1343 /label=ermE Primer complement(1327..1345)

265

/label=AN20 Primer complement(1327..1367) /label=AN219 Primer 1327..1364 /label=EC1 RBS 1346..1375 /label=merP Primer complement(1371..1391) /label=AN598 ORF 1376..6094 /label=merP Primer complement(1449..1470) /label=LK23 Primer_binding_ 1594..1616 /label=FAN1 Primer_binding_ complement(1793..1813) /label=FAN2 Primer complement(1793..1812) /label=AN834 Primer_binding_ 2276..2296 /label=EC8 Primer_binding_ complement(2353..2374) /label=KV17 Primer_binding_ 2944..2963 /label=KV12 Primer_binding_ complement(3067..3085) /label=EC3 Primer 3251..3268 /label=EC4 Primer 3331..3354 /label=EC6 Primer complement(3407..3424) /label=EC2 Primer_binding_ 3550..3569 /label=FAN3 Primer_binding_ complement(3598..3619) /label=KV13 Primer 3989..4009 /label=KV77 Primer_binding_ 4109..4131 /label=KV15 Primer_binding_ complement(4228..4248) /label=EC9 Primer_binding_ 4760..4780 /label=FAN4 Primer_binding_ complement(4760..4780) /label=KV14 Primer 5157..5178 /label=EC10 Primer 5157..5177 /label=KV19 Primer complement(5189..5207) /label=EC5 Primer_binding_ complement(5279..5298) /label=KV18 Primer complement(5279..5298) /label=EC7

266

Primer_binding_ 5368..5391 /label=FAN5 Primer complement(5442..5463) /label=KV78 Primer_binding_ 5965..5984 /label=KV25 ORF 6067..23205 /label=merA Primer_binding_ complement(6106..6125) /label=KV16 Active_site 6130..7395 /label=LM-KS Primer_binding_ 6579..6600 /label=FAN6 Primer_binding_ complement(6641..6661) /label=FAN7 Primer 7024..7044 /label=KV21 Primer complement(7138..7159) /label=EC11 Primer complement(7140..7159) /label=KV20 Primer_binding_ 7270..7290 /label=FAN8 Primer_binding_ complement(7365..7386) /label=FAN9 Primer 7754..7773 /label=KV33 Active_site 7807..8706 /label=LM-AT Primer_binding_ 7881..7901 /label=KV27 Primer_binding_ complement(7939..7959) /label=KV26 Primer 8383..8402 /label=KV79 Primer_binding_ 8515..8535 /label=FAN10 Primer_binding_ complement(8599..8619) /label=FAN11 Primer 8802..8821 /label=KV23 Active_site 8977..9189 /label=LM-ACP Primer complement(9037..9057) /label=KV22 Primer_binding_ complement(9132..9153) /label=FAN12 Primer_binding_ 9221..9240 /label=KV35 Active_site 9247..10521 /label=M1-KS Primer complement(9340..9359) /label=KV80 Primer_binding_ 9693..9713 /label=KV34 Primer 9736..9755

267

/label=KV94 Primer_binding_ complement(9858..9879) /label=KV28 Primer_binding_ 10360..10379 /label=FAN13 Primer_binding_ complement(10414..10433) /label=FAN14 Primer 10574..10592 /label=KV29 Primer complement(10750..10771) /label=KV24 Active_site 10834..11700 /label=M1-AT Primer_binding_ complement(10974..10995) /label=KV58 Primer_binding_ 11005..11025 /label=FAN15 Primer complement(11242..11262) /label=KV81 Primer_binding_ 11482..11502 /label=KV37 Primer 11490..11510 /label=AN239 Primer complement(11646..11667) /label=KV95 Primer_binding_ complement(11699..11718) /label=KV36 Primer 11816..11836 /label=KV82 Primer_binding_ 12185..12202 /label=KV31 Primer_binding_ complement(12361..12378) /label=KV30 Active_site 12508..13035 /label=M1-KR Primer complement(12511..12530) /label=KV83 Primer 12779..12799 /label=KV84 Primer_binding_ 13002..13020 /label=KV51 Primer 13055..13073 /label=KV43 Primer complement(13164..13184) /label=KV96 Primer_binding_ complement(13165..13186) /label=KV38 Active_site 13300..13557 /label=M1-ACP Primer 13497..13516 /label=KV56 Active_site 13633..14898 /label=M2-KS Primer_binding_ 13668..13685 /label=KV39 Primer_binding_ complement(13855..13874) /label=KV32

268

Primer_binding_ 14232..14254 /label=FAN16 Primer_binding_ complement(14360..14377) /label=FAN17 Primer 14891..14908 /label=KV45 Primer_binding_ 14998..15015 /label=KV53 Primer_binding_ complement(15021..15040) /label=KV52 Primer complement(15075..15094) /label=KV44 Active_site 15190..16059 /label=M2-AT Misc._feature 15272..16021 /label=F1-cr14-repair-gblock Primer 15276..15295 /label=KV85 Primer complement(15313..15334) /label=KV59 Primer 15494..15512 /label=KV55 Primer_binding_ 15575..15593 /label=KV41 Misc. 15643..15739 /label=cr14-2-repair Primer 15643..15702 /label=AN695 Primer complement(15680..15739) /label=AN696 Misc._signal complement(15706..15725) /label=cr14-target Mutation 15707 /label="G to A" Primer_binding_ complement(15711..15729) /label=KV40 Primer complement(15711..15729) /label=KV57 Primer complement(15803..15821) /label=KV76 Primer complement(15997..16015) /label=KV97 Primer 16161..16180 /label=KV86 Active_site 16222..16713 /label=M2-DH Primer_binding_ complement(16440..16459) /label=KV46 Primer_binding_ 16525..16546 /label=KV60 Primer_binding_ complement(16737..16755) /label=KV54 Primer_binding_ 17232..17250 /label=KV47 Primer_binding_ complement(17392..17414) /label=KV42 Active_site 17536..18075

269

/label=M2-KR Primer 17659..17679 /label=KV88 Primer complement(17812..17831) /label=KV87 Primer_binding_ complement(17889..17908) /label=FAN19 Primer_binding_ 17910..17930 /label=FAN18 Active_site 18340..18597 /label=M2-ACP Primer_binding_ 18361..18382 /label=KV62 Primer_binding_ complement(18450..18471) /label=KV61 Active_site 18652..19932 /label=M3-KS Primer complement(18660..18680) /label=KV89 Primer_binding_ complement(18900..18918) /label=FAN20 Primer_binding_ 19008..19029 /label=KV67 Primer 19440..19459 /label=KV90 Primer_binding_ complement(19545..19564) /label=FAN21 Primer_binding_ 19641..19661 /label=KV68 Primer 19933..19951 /label=KV49 Primer complement(19993..20010) /label=KV48 Primer_binding_ complement(20158..20179) /label=FAN23 Primer_binding_ 20217..20238 /label=FAN22 Active_site 20224..21036 /label=M3-AT Primer_binding_ 20569..20587 /label=KV64 Primer_binding_ complement(20772..20794) /label=KV63 Primer 20959..21016 /label=AN544 Primer complement(20992..21048) /label=AN545 Mutation 21005 /label="T to C" Misc._signal complement(21007..21026) /label=cr10-target Primer_binding_ 21175..21196 /label=FAN24 Primer_binding_ complement(21297..21315) /label=KV69 Primer complement(21381..21401) /label=KV91

270

Primer_binding_ 21691..21711 /label=KV65 Primer_binding_ complement(21813..21831) /label=KV50 Active_site 21964..22485 /label=M3-KR Primer_binding_ 22438..22457 /label=FAN25 Primer_binding_ complement(22585..22605) /label=FAN26 Primer complement(22675..22694) /label=KV92 Active_site 22726..22956 /label=M3-ACP Primer_binding_ 23101..23119 /label=KV75 Primer 23101..23160 /label=KV74 Primer 23140..23199 /label=KV72 Primer 23141..23160 /label=AN846 Primer 23179..23238 /label=KV70 Primer_binding_ complement(23184..23205) /label=KV66 Source 23212..24602 /label=pACYC184 Mutation 23224 /label="T deletion" ORF 23312..24502 /label=tet-efflux-classC Primer 23714..23733 /label=KV93 Primer_binding_ complement(23938..23959) /label=FAN28 Primer_binding_ 23968..23992 /label=FAN27 Primer complement(24584..24643) /label=KV71 Misc._recombina 24603..24650 /label=FRT-15 Primer complement(24616..24675) /label=KV73 Primer complement(24752..24767) /label=AN112 ORF complement(24880..25583) /label=HIS3 Primer 24962..24983 /label=AN835 Primer complement(24962..24983) /label=AN272 ORF complement(25584..26297) /label=GFP Primer_binding_ complement(25603..25624) /label=AN325 Misc._recombina 25733..25836

271

/label=pAN84-gib1-homology Restriction_sit 25733..25738 /label=MfeI Misc._signal 25736..25755 /label=cr22-1 Misc._signal complement(25780..25799) /label=cr22-2 Misc._signal complement(25822..25841) /label=cr22-3 Primer complement(26185..26207) /label=AN836 Promoter complement(26298..27156) /label=pPYK ORIGIN 1 aaaattgtgc ctttggactt aaaatggcgt GAAGTTCCTA TTCCGAAGTT CCTATTCTCT 61 AGAAAGTATA GGAACTTCTC GAAAGCTACA TATAAGGAAC GTGCTGCTAC TCATCCTAGT 121 CCTGTTGCTG CCAAGCTATT TAATATCATG CACGAAAAGC AAACAAACTT GTGTGCTTCA 181 TTGGATGTTC GTACCACCAA GGAATTACTG GAGTTAGTTG AAGCATTAGG TCCCAAAATT 241 TGTTTACTAA AAACACATGT GGATATCTTG ACTGATTTTT CCATGGAGGG CACAGTTAAG 301 CCGCTAAAGG CATTATCCGC CAAGTACAAT TTTTTACTCT TCGAAGACAG AAAATTTGCT 361 GACATTGGTA ATACAGTCAA ATTGCAGTAC TCTGCGGGTG TATACAGAAT AGCAGAATGG 421 GCAGACATTA CGAATGCACA CGGTGTGGTG GGCCCAGGTA TTGTTAGCGG TTTGAAGCAG 481 GCGGCAGAAG AAGTAACAAA GGAACCTAGA GGCCTTTTGA TGTTAGCAGA ATTGTCATGC 541 AAGGGCTCCC TATCTACTGG AGAATATACT AAGGGTACTG TTGACATTGC GAAGAGCGAC 601 AAAGATTTTG TTATCGGCTT TATTGCTCAA AGAGACATGG GTGGAAGAGA TGAAGGTTAC 661 GATTGGTTGA TTATGACACC CGGTGTGGGT TTAGATGACA AGGGAGACGC ATTGGGTCAA 721 CAGTATAGAA CCGTGGATGA TGTGGTCTCT ACAGGATCTG ACATTATTAT TGTTGGAAGA 781 GGACTATTTG CAAAGGGAAG GGATGCTAAG GTAGAGGGTG AACGTTACAG AAAAGCAGGC 841 TGGGAAGCAT ATTTGAGAAG ATGCGGCCAG CAAAACTAAG TCGACCTCGA Ggcgaatttc 901 ttatgattta tgatttttat tattaaataa gttataaaaa aaataagtgt atacaaattt 961 taaagtgact cttaggtttt aaaacgaaaa ttcttattct tgagtaactc tttcctgtag 1021 gtcaggttgc tttctcaggt atagcatgag gtcgctctct agagagctcg gtaccAGCCC 1081 GACCCGAGCA CGCGCCGGCA CGCCTGGTCG ATGTCGGACC GGAGTTCGAG GTACGCGGCT 1141 TGCAGGTCCA GGAAGGGGAC GTCCATGCGA GTGTCCGTTC GAGTGGCGGC TTGCGCCCGA 1201 TGCTAGTCGC GGTTGATCGG CGATCGCAGG TGCACGCGGT CGATCTTGAC GGCTGGCGAG 1261 AGGTGCGGGG AGGATCTGAC CGACGCGGTC CACACGTGGC ACCGCGATGC TGTTGTGGGC 1321 ACAATCGTGC CGGTTGGTAg gatccTTTCC ATCAGTCGTC GGGGAAGGCT GTGTTGTGGG 1381 GAATTCAGGC GCACCCAAAT CCCGTGGTCT CAGCGCGGCC ATGAGCAATC TCTTCGAGCG 1441 GACCAGGAGA AACGAATCCA CCGGCATTGT GCCGGTCGAC CGGGGCCGGG AGCTGAGGGC 1501 GTCGTTCGCC CAGCAACGGC TGTGGTTTCT GGACCAGTTG GAACCCGGCA ACGCCTCGTA 1561 CAATCTCCCC TTCGCGGTGC GGGTGCGCGG CCGCTTGGAC ATCTCTCATC TCTCCCGGGC 1621 CCTCTCGCTC GTGGTCGCCC GGCACGAGGC GCTGCGCACC ACCTTCGGCG AGGCCGGCGG 1681 TCAGCCGGTG CAGCGGATCG AGCCCCCCGG CCCCGTCCCG GTGCGCCTTG AAGCGGTGTC 1741 CGGCGGCTCG GAGGAGGAGC GGCTGGCCGA GGTCCGGCGG CTGGCCGGAG CCGAGATCAC 1801 CGAGCCCTTC GACCTGAGCA CCGGGCCCCT GCTGCGCGCC AAGGCGCTGC GACTGGACGA 1861 ACAGGACCAC GTCCTGCTGC TGACGGTGCA CCATGTGGCG ACGGACGCCT GGTCACAAGG 1921 CATTGTGGTG CGTGAGCTGT CCGTCGCGTA CGCGTCGCTC GACGCCGGGC GCGAGCCCGT 1981 GCTGCCCCCG CTGCCCGTGC AGTACGCGGA CTACGCGGAG TGGGAGCGCG ACTGGCTGTC 2041 CGGCCCGACC CTGCGCCGCC AGCTGGACTA CTGGACGAAG CGGCTCGACG GCATGGCGCC 2101 CGCGCTGGAG CTGCCCACCG ACCGGCCCAG GCCCTCGGTC GCCAGCCAGG AAGGCGACGC 2161 GGTGCGCTGG GAGTTGCCGC CGGAACTGAT CCGGGCGGCC CGCCGGCTGG GCGCCGGTGA 2221 GAACGCGACC CTCTACATGA CCCTGCTGGC CGCTTTCCAG CTGGTACTGG GCCGGTACGT 2281 GGACAGCGAC GACATCACGG TGGGCACCCC CGTGGCCAAC CGGGGCCGCG CCGAGGTCGA 2341 GGGGCTCATC GGGTTCTTCG TCAACACCGT GGTGCTGCGG ACCGACCTGT CCGGCGACCC 2401 CACCTTCCGC CAACTGCTGG GCCGGGTCCG CGACACGGCG GCGGGTGCCT TCGCCCATGG 2461 CGACCTGCCC TTCGAGTATC TGGTGGAGCA GGTGCACCCC GAGCGGGACT TGTCGCGGAA 2521 CCCGCTGGTC CAGGTGCTCT TCCAGATGAT CAACGTACCG GCGGAGCGGC TCGAGCTGCC

272

2581 CGGCGCGCGG ACCGAGCCCT ACGACCACGG CGGCATCCTC ACGCGAATGG ATCTGGAGGT 2641 CCATCTCGTC GAGACCGGGG ACGGGGTTCT GGGGCACATC GTCTTCAGCA AGGCCCTGTT 2701 CGACACGAGC ACCATCGAAC GGCTGCTGCA CCACGTCACC GTCGTCCTCC GGGGCGTCCT 2761 GGCCGAGCCG GACCGGCGCA TCTCCGAGAT CTCGCTGCTC GACGAGGCGG AGCGGGCGAA 2821 GGTCCTGGAG AAGTTCAACA CGACCACGGG CCCCGTACCC GCCGGATCCC TGCCCGCGCT 2881 CTTCACCGCC CAGGCCGAGC GCCGCCCCGA TGCGGTGGCC GTGATCAGCG GTGGTGACCG 2941 GGTGACCTAC GCCGAGCTGG ATCAGCGGGC GAACCAGCTC GCCCATCTGC TGGAGGGCCG 3001 GGGGGTCGGC CCCGAGACCC TGGTCGGGCT CTGCGTCGAT CGCGGCATCG AGATGATCGT 3061 GGCGATCCTC GCGATCCTCA AGCTCGGAGC GGCCTATGTG CCGATCGATC CCCACCACCC 3121 CCGAGACCGC GTCCAGTTCG TCCTTGCCGA CTCCGGGGTG ACCGTCGCCG TCACCCAGCA 3181 GCGCTTCACC GGCCTGCTCG AAACCCCGGA GGCACCCGGG ACGCCCGATG CGTCCGGGAC 3241 GTCCGGGATC CGCCTCATCC TGCTCGACGC CGAGCGCGAG CCGCTCGCCG GGCAGCCCCG 3301 GACCCCGCCC ACGGCACGGC CCAGCGCCCA GAACCTCGCC TATGTCATTT ACACCTCCGG 3361 CTCCACCGGA GTCCCCAAGG GCATCCTCAT GCCCGCCACC TGTGTGCTCA ACCTGGTGGC 3421 CTGGCAGAAG CGGGCCCTGC CGATCGGTCC CGACGCCAAG ACGGCACAGT TCGCCACGCT 3481 GACCTTCGAT ATCTCGTTGC AGGAGATCTT CTCCGCGCTG CTGTACGGCG AGACGATCGT 3541 CGTCCCCGGC GAGGAACTGC GCATGGACCC CGCCGAGTTC GCCACATGGG TCCACGCCAA 3601 CGAGATCGAC CAGCTCTTCG TCCCGAATGT GATGCTGCGG GCGATCTCCG AGGAGGTGGA 3661 TCCGCACGGC ACCGAGCTGG CCGCACTGCG CCACCTCTCA CAGGCCGGCG AACCCCTCTC 3721 CCTCCACCAC GATCTGCGCG AGCTGTGCGC CCGCCGCCCC GAGTTGCGGC TGCACAACCA 3781 CTACGGTCCC AGCGAAGCCC ATGTGGTGAC GTCGTACTCG CTCCCCGCCG AGGTGGCCGA 3841 GTGGCCGCTC ACCGCACCCA TCGGCCGCCC GATCGGCAAC ACCCGGGTGT ATGTGGTCGA 3901 CCGGCGGCTC CGGCCCGTCC CGGTGGGGGT GCCAGGTGAG CTGTGCGTGG CCGGAGAGGG 3961 GCTGGCCAGG GGCTATCTCG GCCGCCCGGA TCTGACCGCT TCCCGGTTCG TGGCGGACCC 4021 GTTCCGCGGC GACGGATCGC GTATGTACCG CTCCGGCGAC CTGGTGCGCT GGCTGCCCGA 4081 CGGCAACCTG GAATTCCTCG GCCGGATCGA TGACCAGGTG AAGATACGTG GCTTCCGGAT 4141 CGAACCGGGC GAGATCGAGG CGATCCTCGC CCGGCACCAG GACGTTCTGC ACACGGCCGT 4201 GATGGTGCGC GAGGACACCC CCGGCGACAA GAGGCTGGTG GCCTATGTGG TGGCCGATGC 4261 CACCGCCGCG GACCGGCACG GCGGGCTGAC CGAGACCCTG CGCCGGCACG TCGAGTCCGC 4321 GGTGCCCGAA TACATGGTGC CCTCCGCGTT CGTCCTGCTG GACACCATGC CCCTGACCTC 4381 CGGCGGCAAG ATCGACCGGA AGGCGCTGCC CGCCCCCGAT CTGCGCACCG TGCTCGAGGT 4441 CGGCTACGTC GCCCCACGCA CCCCCGAGGA AGAGGCCGTC TGCCGGGTTT ACGCGGATCT 4501 GCTCGGCGCG GCCAAGGTCG GCATCGACGA CGACTTCTTC GCACTGGGCG GCCATTCCCT 4561 CATCGCCACC AGGGTGGTCG CCAGGCTCCG GTCCGCCCTC GGTATCGCCG TACCGCTGAA 4621 GACCGTCTTC CAGCAGCGCA CCCCCCGAGA GCTGGCGGCC ACGCTCACCG CCGCGGCCCG 4681 CTCCGGTCCC GAACCCGAGC TGCCGCCGCT GGTTCCCACG CGGCGCGACC AGCCCGTCCC 4741 CCTCACCTTC GCACAGCAGC AGACGGACCT CTTCTTCGAC GATGTCCTGA ACGCCGGGCA 4801 CTGGAACATC CCCATGGCGG TGCGGGTGTC GGGCGAACTG GACCTCGACT GCCTGCGGCG 4861 GGCGATGGAC CTGCTGATCG ACCGCCACGA GGCCCTGCGC ACCACCTTCG TCAGGGAAGC 4921 CGACGGATAC GTCCAGGTGA TCCGGCCGAG CGCGCCGGTC CAGGTGGAGG TGGCCGAGAC 4981 GCACGACGAG ACCGAAGCCT CGGTACTGGC CGGCCAGGAG GCCGCCCGCC CCTTCGACCT 5041 CACACGCGGC CCGCTGGCGA GACTGCGCGT GCTGCGGCTG TCCCAGTCCG ACCATGTGCT 5101 GGTGCTCACC CTGCACCACC TGGTCACCGA CGGCTGGTCC CAGGGAGTGC TGGTCCGAGA 5161 TCTGTCCATC GTGTACGCGG CACTGCTGCA CGGCACCGAA CCCGATCTGC CACCCGCACC 5221 CGTCCAGTAC GCCGATGTCG CGAGCTGGGA GCGGAAGTGG TTGCGCGGTC CGCTGCTGCA 5281 ACGCCAACTC GAGTTCTGGA AGCGGCATTT CGAGGGCATG ACCCCCGCCG AACTGCCCAC 5341 CGACCGGCCC CGCGCCGCGT CGGCCCGCTA CGAGAGTGAC ATCTTCCACT GGCGACTGCC 5401 GACGGACGCC GTCGAGACCG CCCGACGGCT GGGCGAATCG TGCAACGCCA CCTTGTACAT 5461 GACGCTGCTG ACCGCCCTGA AGGTGGTCAT GTCCGCCCGC TCGGACAACC AGGACGTCCT 5521 CGTCGGCGTG CCCACGGCCA ACCGTGGCCG GGACGAACTG GAGAACACGG TGGGCCTCGT 5581 CTCCAAGATG CTCGCGCTGC GCACCGAAGT GTCCGGTGCC ACGGACTTCG GCACACTGCT 5641 GGCCACGGTG CGCGATGCGA TGTCCGACGC CCATACACAC CAGGACGTGC CCTTCGTGTC 5701 CGTTCTCAAG CACATCGGTG ACCACACCGC CGGCCCCGCC GGTGACACCG CCGGCGGCCG 5761 GGCCGGGACG CGGCTGTCGG ACGATCCGCC AGTGAAGGTG ATCTTTCAGA TCGTCAACAC 5821 CCCGCCGCGG CCACTCCGGC TCACCGGACT GACGGCCGAG CCGTTCCCGA TGACCCACCC 5881 GCCGGTCACG GTCAACGTGG ACATGGAGAT CGACCTGTAC GAGAGCGCGG AGGACGGCGG 5941 CCTCGCCGGC ACCGTGCTGT TCAGCAAGTC CCTCTTCGAC CGTGCCACGA TCGAGCGGTT

273

6001 CTGCGACGAC GTGGTGGCGG TCGTCTCCGC GGCCGCCGCG GATCCCGGAC GGCCGGTCTC 6061 ACAGGTGTGG CAGGGCCGGG GCCGCGACCA GTGAACGATC CCGCCCCGAG GAAACGCATG 6121 GAACCGGATG AGGCCGTCGC CGTTGTCGGA ATGTCCTGCC GCTTTCCGCA GGCACCCGAT 6181 CCCGAGGCGT TCTGGCGGCT GCTGAGCGAG GGCATCTCGG CCATCGGTGA GGTGCCCGCG 6241 GGGCGGTGGA CCGACGACCA GCCCACGCCG TCCGGGACCG ACGAGCGGTC CACGCCGCCC 6301 GCCATCCGCC GCGGCGGCTT CATCGACGAC GTCGACCGCT TCGACCCCGC GTTCTTCGGC 6361 ATCTCACCAC GGGAAGCCGC GGCGATGGAC CCCCAGCAGC GGCTGATGCT CGAGCTGGCC 6421 TGGGAAGGGC TGGAGGACGC GGGCATCGTG CCCGCCACCC TGCGGGGCGC CACCGTCGGA 6481 GCGTTCATCG GCGCCGGGTC CGACGACTAC GCCTCGCTGA TCCGCGCCCG CGGCCGTTCA 6541 CACCACACGC TGACCGGCAC CCAGCGGGGC ATGATCGCGA ACCGGCTCTC CCATGTGTTC 6601 GGCCTGAGCG GCCCGAGCGT GACCGTGGAC GCGGCCCAGG CATCCTCCCT GGTCGCGGTA 6661 CACATGGCCG TGGAGAGCGT GCGCCGCGGC GAGTCACGGC TCGCGCTGGC GGGCGGGGTC 6721 AACCTGAACC TCTCCGCGGA GACCGCCGCC GATATCGCGG CGTTCGGCGC ACTGTCCCCG 6781 GACGGCCGCT GCTTCACCTT CGACGCACGC GCCAATGGCT ATGTGCGGGG CGAGGGCGGC 6841 GGACTCGTCG TCCTGAAACC GCTCTCCGAC GCTCTCGCCG ACGGCGACAC CGTCTACTGC 6901 GTGATCGAGG GCAGCGCGGT CAACAACGAC GGCGGCGGTG CATCGCTCAC CGCACCCGAC 6961 CCGGACGGCC AGCGACGGGT GCTCCGACTC GCCCAGCGGC GGGCCGCGAT CTCCCCCGAG 7021 GCCGTTCAGT ACGTGGAGCT GCACGGCACC GGAACGGCAC TCGGCGACCC GGCGGAAGCG 7081 GCGGCCCTGG GCGCCGTCTT CGGCCGGAGC GGAGCGAGGC CGGTGCAGCT GGGGTCGGTG 7141 AAGACCAACA TCGGCCACCT CGAAGCCGCC GCCGGTATCG CCGGACTTCT GAAGACCGCA 7201 CTGGCCATCC ACCACCGGCA GCTGCCGGCC GGCCTCAATT ACCGCACGCC GAATCCCCGT 7261 ATCCCCATGG GCGAACTCAA CCTGGAGATG CGCCTCGCAC CGGGGGAGTG GCCGAAGCCG 7321 GACGACCGCC TGGTCGCCGG TGTCAGCTCT TTCGGGATGG GCGGCACCAA CTGCCATGTC 7381 CTGCTCGCCG AACCACTCGT CGGCGTCCCC TCCCACGCCT CCGCGCATGC CCCTGAGCCC 7441 GACTCCCTCC CCAGCTCGAT CCCGGCCCCG GTCCCGGTCC CGGTCCCGGT CCCGGCCCCG 7501 GTCCCGGTCC CGGCCCCGGC CCCGGCCCCG GCCCCGGTCC CGGTCCCCGT CCCGCTTCCG 7561 TTGTCCGGGG TGTCCGCTGC CGCGCTTCGC GGCCAGGCGA TGCGGCTACG GCCGTATCTG 7621 GAGCGATCGC CGAACCTCAC CGACCTCTCC TTCTCCCTCG CCACCGCACG AACCTCCTTC 7681 GACCACCGTG CGGTGCTGAT CACCGGGCAG GCGGCCGACG CGGCACACGG CCTGGACGCG 7741 CTCGTCGAAG GCGGCACGGT GGCGGGTTTG GTGACGGGCA CGGCGAGGGC GGCGGGAAAG 7801 CTCGCCTTCG CCTTCGCCGG CCAGGGCTCG CAGCGTCTCG GCATGGGACG TGAACTCGGG 7861 GCCGTCTTCC CCGTCTTTGC CCAGGCTCTT GACGAAGTGT GCACGGCGCT GGACGCACAC 7921 CTGGACCGGC CGCTTCGGGA CGTGATCCAC GGTGACGACG CCGAACCGCT CAACCGGACG 7981 GTGTACGCCC AGGCCGGACT CTTCGCGGTG GAGGTGGCGC TGTTCCGGCT GCTGGAGGAC 8041 TTCGGCCTCG TACCGGACCT GCTGATCGGC CACTCCCTCG GCGAGGTGAG CGCCGCCCAT 8101 GTCGCCGGTG TGCTGTCCTT GGCGGACGCC GCCACCTTCG TCGCCGCCCG TGGGCGGCTG 8161 ATGCAGGCCG TGACGGAGCC GGGCGCCATG GTGTCGCTCG AAGCCACCGA GGACGAGGTC 8221 ACCCGGACGC TCATGGCGGG CGGGGCATCG GACGACGGTG CCCGGGTGTG CGTGGCGGCG 8281 GTCAACGGCC CCACCGCCAC GGTGATCTCG GGGGACGAGC GCGCCGTACT CGACCTGGCG 8341 GTGGAGTGGG CCGGTCGCGG ACGCAAGACG AAGCGGCTCC GGACGAGCCA CGCCTTCCAT 8401 TCGCCCCATC TGGACCCCGT ACTGGACGAG CTTCGGCACA TCGCCGAGAG CCTCACGTAC 8461 CGGGCGCCCC GGATCCCGCT GGTGTCGAAT GTGACCGGCC GACGTGCCAC GGCGGAAGAG 8521 CTGTGTTCTC CGGAGTACTG GGTCCGGCAT GTCCGCCGGA CCGTACGGTT CCTGGACGGC 8581 GTCCGCTGTC TGGAGGACGA AGGCGTCACC ACCATCCTGG AACTGGGCCC GGACAAGGCG 8641 CTCACCACCC TGGCCCGCGA CTGCCTGACC GGGCCCGGGA CGCTGGTGGG CACCCTTCGT 8701 CGCGACCGGC CCGAGCCGCA GGCCCTGGTC ACCGCGCTGG CCGAGCTGTA TGTCTCGGGT 8761 GTCGAAGTGG CATGGAGCCC GCTGGTGTCC GGTGGGCGGC GGATTCCACT GCCCACGTAC 8821 GCCTTCCAGC GGCAGCGGTA CTGGTTCTCC GCTCCCGGGC CCGAGAGCGG AACCACGCCT 8881 GGCCATGGGG TCACATCCGG GCGCGAGCGC ACGGACACCG GCCTGAGCGG CGACGAGGCG 8941 CCCGACACCG GCCCGAGCGG CGGCGAGACG CTTGGCATGG TCCGGGCGCA CGCGGCCGTC 9001 GTGCTCGGAT ACGCGTCGGC AACCGCCATC GGCGCCGAGC ACACCTTCAA GCAACTCGGG 9061 TTCGACTCGA TCACCGCCGT CGAACTGTGC GAACGGCTCG GTGCGGCGAC CGCGCTTCCG 9121 CTGCCCGGCA CCTTGCTGTT CGACTATCCG ACGCCCGCCG CGCTCGCCGA GCATCTGCAC 9181 CGCAGGCTCC ACGGCCGGAC GGATGAGCAG GCCGCGCCCG CGACCGTGCC AACACCTGAC 9241 GGCGGCGATC CGGTGGTGAT CGTGGGGATG GGCTGCCGGT TCCCCGGCCG GGCCCACTCG 9301 CCGGAGGACC TGTGGCGGAT CGTGGCCGAC GGTGAGGACG CCATCTCCGG CTTTCCGTCC 9361 GACCGGGGCT GGGACCTCGC TGGTCTCTAC CACCCCGACC CCGACCACCC CGGCACGTCA

274

9421 TACGCACGCG ACGGCGGATT CCTCTACGAC GCGGCCGAGT TCGACGCGGG GTTCTTCGGG 9481 ATCTCACCGC GTGAGGCCGA GGCGATGGAC CCGCAGCAGC GGCTGCTGCT GGAGACATCG 9541 TGGGAGGCGT TGGAACGGGC GGGTATCCCC GCGGAACACA TCAAGGGCAG TAGCACGGGC 9601 GTGTTCATCG GCGCCTCGTC GGTCGGCTAC GCGGCGGACG CCGGAGAGGC GGCCGAGGGC 9661 TACCAGCTGA CCGGCACTGC CGCGAGCGTG GCCTCGGGCA GGGTGTCCTA CACCCTGGGC 9721 CTCGAAGGCC CGGCGGTCAC CGTGGACACG GCATGCTCGT CCTCGCTGGT GGCATTGCAC 9781 CTGGCCGTAC AGTCGCTGAG GGCGGGCGAG TGCTCACTGG CATTGGCGGG CGGTGTGACC 9841 GTGATGGCCA CACCGGCGAT GTTCGTGGAG TTCTCCCGTC AGCGGGGGCT GGCCATGGAC 9901 GGTCGGTGCA AGGCGTTCGC GGCGGCGGCG GACGGCACGG GGTGGGCCGA AGGCGTCGGG 9961 GTGCTGGTGG TCGAGCGGTT GTCGGACGCC GAGCGCAATG GGCATCGGGT GTTGGCGGTG 10021 GTGCGTGGTT CTGCGGTGAA TCAGGATGGT GCGTCGAATG GTTTGACGGC GCCGAATGGT 10081 CCGTCGCAGC AGCGGGTGAT CCGGCAGGCG TTGGCGAGTG CGGGTCTTGT GGCGTCGGAT 10141 GTGGATGCGG TGGAGGCGCA TGGTACGGGT ACGACGCTCG GTGATCCGAT TGAGGCGCAG 10201 GCGTTGTTGG CCACGTACGG TCAGGGTCGG GATGCGGATC GGCCGTTGTG GTTGGGGTCG 10261 GTGAAGTCGA ACATCGGTCA TACGCAGGCG GCCGCGGGTG TGGCTGGTGT GATCAAGATG 10321 GTGATGGCCA TGCGGCACGG GGTGCTGCCG CGAACGCTGC ACGTGGATGA GCCGTCGACC 10381 CACGTCGACT GGTCCGGCGG CCGGGTAGAG CTGCTCACCG GGACAACGCC ATGGCCCACG 10441 ACGGGTGGCC TTCGCCGAGC GGGCGTCTCC TCGTTCGGTG TGAGTGGCAC CAACGCTCAC 10501 GTCATCCTGG AGCAGGTCCC GGAGACGGCC CGGCCGACCG GGCCCATCGG GGAAGACGAC 10561 GGCGAAGCGG CGCCCGTCGC CTGGGTGTTG TCGGGACAGG GCGAGACTGG GCTGCGGGCC 10621 CAGGCCGAGC GGCTGTGCGC CTTCATGGCG GCCGATACCC GCCCCACCCC GGCGGAAGTG 10681 GGATGGTCAC TGGCATCGAC ACGTGCGACG TTGTCGCACC GCGCGGTGGT CGTGGGTGCT 10741 GGACGCGACG AGTTGTTGCG TGGTGTGAAT GCGGTGGCGA ACGGGACACC CGTGCCGGGA 10801 GTGGTACGGG GCACCGGAGC CTCCGGGGAC GTGGTGTTCG TCTTCCCGGG GCAGGGGTCG 10861 CAGTGGGTTG GGATGGCGTT GGAGTTGGTG GAGTCGTCGC CGGTGTTCGC GCGGCGGTTG 10921 GGTGATTGTG CGGATGCGTT GGCGCCGTTT GTGGAGTGGT CGTTGTTCGA TGTGTTGGGT 10981 GATGAGGTGG CGATCGGTCG GGTTGATGTG GTGCAGCCGG TGTTGTGGGC GGTGATGGTG 11041 TCGTTGGCGG AGTTGTGGCG TTCGTTTGGT GTGGTGCCGT CGGCGGTGGT GGGGCATTCG 11101 CAGGGTGAGA TCGCGGCGGC GTGTGTGGCG GGTGCGTTGA CTTTGGAGGA TGGGGCGCGT 11161 GTGGTGGCCT TGCGGAGCAG GGCGTTGCTG GCTCTGTCGG GTCGGGGCGG CATGGTGTCC 11221 GTACCGGTGT CCGCCGATCG GCTCCGTGAC CGTGTGGGGT TGTCGGTGGC GGCGGTGAAT 11281 GGTCCGGCGT CGACGGTCGT GTCCGGGGCG GTTGAGGTGC TGGAGGCGGT GCTGGCGGAG 11341 TTCCCGGAGG CCAAACGGAT TCCGGTGGAT TATGCCTCGC ATTCGGTGCA GGTGGAGGGG 11401 ATCCGGGAGG GGCTGGCGGA GGCGTTGGCG CCGGTTCGGC CGCGTACGGG TCAGGTGCCG 11461 TTCTATTCGA CGGTGACCGG CCGGCTGATG GACACCATCG AGTTGGACGC GGAGTACTGG 11521 TACAGGAACC TGCGCGAGAC GGTGGAGTTC CAGAGCACCG TCGAACACCT CATGCGCCAG 11581 GGTCATACGG TGTTTGTCGA GGCCAGCCCG CATCCGGTGC TGACCATCGG CGTCCAGGAC 11641 ACCGCCGACA CCACCGACAC TGACATCGTC GTCACCGGAT CGCTGCGCCG CGATGATGGC 11701 ACTGTCCAGC GGTTTCTGAC CTCCCTGGCC GAGCTCCACG TGCGCGGTGT CCGGATCGAC 11761 TGGGGCCCGC TCTTCGCCGG TGTCTCGCCC GTTGAGCTGC CGACGTACGC CTTCCAACGG 11821 GAACGGTTCT GGCTTGGGGC GGACATCGCC GAGTCCGCCG TGGACACGTG GCGATACCAG 11881 ATCTCCTGGA AGCCGCTGCC GGACATGGAC CCCCCGGCCC TCTCCGGCAC CTGGCTGGCC 11941 GTGGTCCCCG AAGGGGACGA GTGGGCCATG GCGGGCGCAC GGGCGCTGAT CGAGTCGGGC 12001 ACGGCCAGCG TCCGTACCCT CCAGGTGACC TGCGACGCGG ACCGCCGGAC CCTGGCCGGG 12061 CCGCTGACGG ATGTGGCGGG ATCCGAAGAC ATCGCCGGTG TCGTCTCGTT CCTGGCCGCC 12121 GACGAAGTTC CGCATCCGGC CCACCCCGCG CTGTCCCGGG GGATGGCGCA CACGGTCGAG 12181 CTGCTGTGCT CGCTCACCAC TGCCGATGTC GAGGCCCCGC TGTGGTGTGT CACCCGGGCG 12241 GCCGTCACGG CACTGCCCGC GGACCCGGCG CCGAGCCCCG CCCAGGCGGC GGTATGGGGA 12301 TTCGGACGGG TGGCCGGGCT GGAGCGATCC GAGCGGTGGG GCGGCCTGAT CGACCTGCCC 12361 GTCCACTGCG ACGCACACGT GCTGCGGCGG TTCGTCGCCG TACTCGCGCA GGCAGCCGGT 12421 GAGGACCAGG TGGCGGTGCG GCCATCGGCG GCCCTGGGCC GACGGTTGGA GCCGGCGCCC 12481 AGGACCGGAC CGGCCGGCGC ATGGCGCCCG CACGGCACGG TGCTGATCAC CGGTGGCACC 12541 GGCGTGCTGG GCGCACATGT GGCACGGTGG CTGGCGCGGT CCGGCGCGGA ACACCTGGTG 12601 CTGCTCAGCC GCCGTGGCCC GCAGGCCCCT GGGGCGGCCG TGCTCGACGA CGAACTGACC 12661 GCGCTCGGCG TACGAGTGAC CCTGACGGCC TGCGATGTGA CCGACCGGGC CGCTCTCGCC 12721 GGGGTGCTGG CATCGGTGCC GGACCTCACC GCCGTGGTCC ATCTCGCGGG GACCGTGCGA 12781 TTCGGCAATT CCATCGACGC GGACCTCGAC GAGTACGCCG GCGTCTTCGA CGCCAAGGTC

275

12841 ACCGGTGCCC TGCATCTGGA CGAGCTCCTC GACCACTCGT CACTGGAGGC GTTCGTCCTC 12901 TTCTCCTCGG CAGCGGCCGT CTGGGGCGGT GTCGGCCAGG CCGGTTACGC GGCGGCGAAC 12961 GCCCTGCTCG ACGCGGTGGC ACAGCGGCGT CGCGCACGCG GTCTGCCGGC CACTTCGATC 13021 GGCTGGGGCA CCTGGGGCGG CAGCCTCGCG CCCGAGGACG AGGAGCGGCT GAGCCGCATC 13081 GGCCTGCGCC CGATGCGGCC GGAGGTGGCC GTCACCGAGC TGCGCCACGT CGTCGGATCG 13141 GCCGAGCCCT GCCCGGCCAT CGCGGACGTC GACTGGGAGA CCTTCGGCCC GGCCTTCACG 13201 GCAGGCCGGC CCAGCCGCCT GCTCAGCGAG TTGCCGCGGC TGCGAAACAC CTCCGGCGCC 13261 ATGGCGATGA CCGGCGACCA CGCCGCATTG CGGAGGCGAC TGGCCGGGGT GTCCGCGGCC 13321 GACCAGGCCC GGACGCTGGT GGACCTGGTA CGTGAACACG CGGCGGAACT CCTGGGGCAC 13381 CGCGGCCCGG CGGCGATCGA CCCCACGGTG CCATTCCGGC AACTGGGCTT CGACTCGCTG 13441 ACGGCGGTCG AGCTGCGGAC CCGGCTGAAC GCGGCCACGG GACTGCGCCT CCCGGCCACC 13501 TTGCTGTTCG ACCACCCGAG CTGCCGGGCG GTCGCCGATC TGCTGCGCTC GGAACTGCTC 13561 GGCGACCGGC CGGGCTCCCT CGCGGCGTCG TCCGCCACGG AGGCTGTGCC CGCCGGCGTG 13621 GTGGCCTCCG ACGAGCCGAT CGCCATCGTC GCGATGAGCT GCCGCTTCCC GGGAGGCATC 13681 GGAACCCCCG AGGACTTGTG GCGGGTGGTC AGCGAGGGCC GGGACGTGCT CTCCGACTTC 13741 CCCGACGACC GCGGCTGGGA CGTGGACGCG CTGTACGACC CGGACCCGGA CCGGCCCGGC 13801 ACCAGCTATG TGCGTACCGG TGGATTCCTC CACGACGCCG CGGAGTTCGA CCCGGAACTC 13861 TTCGGGATCT CCCCGCGTGA GGCGCTGGCG ATGGATCCCC AGCAGCGGCT GCTGCTGGAG 13921 TCGGCGTGGC AGGTCCTGGA GCGCGCCAGG ATGGCGCCGA CCTCCCTGCG ATCCAGCAGG 13981 ACCGGTGTCT TCATCGGCGG CTGGGGCCAG GGCTACCCCT CGGCCTCGGA CGAGGGGTAT 14041 GCCCTGACCG GCGCCGCGAC CAGCGTGATG TCCGGTCGTA TCGCCTACGC GCTGGGGCTG 14101 GAGGGCCCCG CCCTGACCGT GGACACGGCA TGTTCGTCCT CGCTGGTGGC GCTGCATCTG 14161 GCGAGCGAGG CGCTACGGCG CGGCGAGTGC TCGCTGGCGC TCGCCGGCGG CGTGACGGTG 14221 ATGGCGACGC CCAGTACCTT TGTGGAGTTC TCGCGCCAGC GTGGGCTGGC CCCGGACGGG 14281 CGCTGCAAGC CGTTCGCCGG GGCGGCGGAC GGCACGGGGT GGGGCGAGGG CGTGGGCATG 14341 CTGCTGGTGG AGCGGTTGTC GGATGCTGAG CGGCTTGGGC ATCCGGTGCT GGCCGTTGTC 14401 TCCGGCTCTG CGGTGAATCA AGACGGTGCG TCGAATGGTT TGACGGCGCC GAATGGTCCG 14461 TCGCAGCAGC GGGTGATCCG TCAGGCGTTG GCGAGTGCGG GTCTTGTGGC GTCGGATGTG 14521 GATGCGGTGG AGGCGCACGG TACGGGTACG ACGCTCGGTG ATCCGATCGA GGCGCAGGCG 14581 CTGCTGGCCA CCTACGGTCA GGACCGGGAT GCGGATCGGC CGTTGTGGTT GGGGTCCCTG 14641 AAGTCGAACA TCGGTCATAC GCAGGCGGCC GCGGGTGTGG CTGGTGTGAT CAAGATGGTG 14701 ATGGCCATGC GGCACGGGGT GCTGCCGCGA ACGCTGCACG TGGATGAGCC GACACCGAAG 14761 GTGGATTGGT CCGCCGGCGC GGTGGGACTG CTCACCGAGT CGGCCGAGTG GCGGCAGGAG 14821 GGCCGACCGC GCCGAGCCGG GGTGTCGGCT TTCGGGGTGA GCGGCACCAA TGCCCATGTG 14881 ATCCTGGAGC AGGCCCCGAA GCACGCACCG GGGGTGGCGG CCGAGGGCAG GAAGGGGCGC 14941 GGGGAGCCGC CGACGGTGCC CTGGGTGCTG TCGGGCGCGA GCGAGGCGGG TCTGCGGGCG 15001 CAGATCGAAG GCTTGCGGGC CTTCGCTGAC GACAACCCCA CGCTCGATCC GGCGGATGTG 15061 GGCTGGTCGT TGGCGTCCAC ACGTGCGCTT CTGCCGTATC GCACTGTCGT CGTGGGCACC 15121 GACCTCGACG AGTTGCGGCG TGGGTTGGAC GCGGCGGAGG TGGTGGGCGC GGCCGAGCCG 15181 GACCGTGGCG CCGTGTTGGT GTTCCCGGGG CAGGGGTCGC AGTGGGTTGG GATGGCGTTG 15241 GAGTTGGTGG AGTCGTCGCC GGTGTTCGCG GGGCGGATGC GTGATTGTGC GGATGCGTTG 15301 GCGCCGTTCG CCGAGTGGTC GTTGTTCGGT GTGTTGGGTG ATGAGGTGGC GCTTGGGCGG 15361 GTTGATGTGG TGCAGCCGGT GTTGTGGGCG GTGATGGTGT CGTTGGCGGA GTTGTGGCGT 15421 TCGTTTGGTG TGGTGCCGTC GGTGGTGGTG GGGCATTCGC AGGGTGAGAT CGCGGCGGCG 15481 TGTGTGGCCG GGGGTCTGTC GTTGGAGGAC GGTGCCCGTG TGGTGGCCTT GCGGAGCAGG 15541 GCGTTGCTGG CTCTGTCGGG TCGGGGTGGG ATGGTGTCGG TTCCGGTTTC TGCTGACCGG 15601 CTGCGGGGTC GTGTGGGGTT GTCGGTGGCG GCGGTGAATG GTCCGGTGTC GACGGTGGTG 15661 TCGGGGGCTG TTGAGGTGCT GGAGGGGGTG CTGGCGGAGT TCCCGGGGGC CAAGCGGATT 15721 CCGGTGGATT ATGCGTCGCA TTCGGTGCAG GTGGAGGGGA TCCGGGAGGG GTTGGCGGAG 15781 GCGTTGGCAC CGGTTCGGCC GCGTACGGGT GAGGTGCCGT TCTATTCGAC GGTGACCGGG 15841 CGATTGATGG ACACCGTGGG GCTGGATGGG GAGTACTGGT ATCGGAATCT GCGTGAGACG 15901 GTGGAGTTCC AGTCCGCGAT CGAGGGGCTG CTGGAGCTTG GTCATACGGT GTTCGTCGAG 15961 GCCAGCCCGC ATCCGGTGCT GACCGTCGGC ATCCAGGACA CCGCCGAGAC CACGGACACC 16021 GACATCCTCG TCACCGGCTC GCTGCGCCGT GACGGCGGTG GCCTTGCCTC TTTCCTCACC 16081 GCGCTGGCCC GGCTGCATGT CCGGGGTGTC GCGGTGGAGT GGCGGGAGGC GTTCGCCGGG 16141 CTGGACGCCC ACGCCGTGGA CCTGCCGACC TACGCCTTTC AGCGTCGGCG CTTCTGGGCG 16201 GCCTCCCTGC GGCAGACTCC CGGGACGGCC GAGTTCGACC ATCCCCTCCT GGGCGCGGTG

276

16261 CTGCCCTTGC CCGATTCCGG CGGCGGTCTG CTCACGGGCG TGCTCACACT GGCCGGACAG 16321 CCGTGGCTGG CCGAACACTC GGTGGCCGGT GTGGTGTTGT TCCCGGGGAC GGGGTTTGTG 16381 GAGTTGGTGT TGCAGGCGGG GTTGCGGTGG GGGTGTGGGG TGGTTGAGGA GTTGACTTTG 16441 GAGGGGCCGT TGGTGCTTCC GGAGCGGGGT GAGGTTGAGG TTCAGGTTTC GGTGGGTGGT 16501 GTGGATGGGG CGGGGTGTCG GTCGGTGTCG GTGTTTTCGT GTCGTGGGGG TGAGTGGGTT 16561 CGGCATGCGG TGGGTGTGCT TGGGGTGGGG GATGGTGTGG TGCCGGGTGT GGAGGTGTGG 16621 CCGCCGGTGG GTGCGGAGCG GGTTGGGGTG GAGGGGGTTT ATGAGGTTTT GGCGGAGCGG 16681 GGGTATGTGT ATGGGCCGGT GTTCCAGGGG TTGCGGGACG CCTGGCGCCG GGGCGACGAA 16741 ATCTTCGTGG AGGCGGAGGT ACCGGCGGAG GCGCGGGGCG ATGCGGCTCG CTGTGCCATC 16801 CATCCCGCGC TGCTCGACGC AGGGCTGCAC GGCGTCGGAT TGGGCGGCCT GATCAGCGAC 16861 GACGGCCGGG CGTACCTGCC GTTCTCCTGG AGCGGGGTCA GGCTGCACGC GGTCGGCGCA 16921 TCCGCTGTCC GGATGACGCT GACGCCCGCC GGACCGGACG CGGTGTCGCT GAGGGTGACC 16981 GATGAGGCGG GCGAGGCGGT GCTGACGGCG GACTCCCTTG TGCTCCGCCC GGTCACCGAG 17041 GGACAGCTCG CCGAAGCCGA GATCGGCAAC CGCGATGTGC TTCATCGGGT GGAGTGGGTG 17101 GATGCGGGGG CGTGTTCGGT GGGGTCGTTC GTGGAGTGGG GTGAGGTGGC TGCTGGTGGG 17161 GTGGTGCCGG ATTGTGTGGT GTTGGCCGGG GCTGATGTGG CGGGTGTGTT GGAGGTTTTG 17221 CGGACGTGGG TGGTGGAGGA GCGGTTTGAG GGTTCGCGGT TGGTGGTGGT GACGAGGGGT 17281 GCGGTGTCGG TCGGTGGTGA GGGTTTGGAG GATGTGAGTG GTGGTGCGGT GTGGGGGTTG 17341 GTGCGGTCGG CGCAGTCGGA GCATCCGGGG CGGTTTGTGC TGGTGGACGC CGATGTAGAT 17401 ACGGATGTGG TTCCGGATGT GGTGGGGCTG GGGGAGTGGC AGGTGGCGGT GCGTGCGGGT 17461 CGGGTGTGGG TGCCGCGTCT GGTGGATGTG GATGTGAGTG TGGGTGGTGC TGTGGTGCGT 17521 GGGGGCTTGG GTTCGGGTGT GGCGTTGGTG ACGGGTGGGA CGGGGTTGCT GGGTGGGTTG 17581 GTGGCGCGTC ATCTGGTGTC GGCGTATGGG GTGGGTGAGT TGGTGTTGGT GAGTCGTCGG 17641 GGGGTGGCTG CGCCGGGCGT GGAGGAGTTG GTGGGGGAGT TGGAGGGGTT GGGCGCGCGG 17701 GTGCGGGTGG TGGCGTGTGA TGTGGCGGAT CGGGGTGCGG TGGCGGAGTT GGTGGGGTCG 17761 ATCGAGGGGT TGCGGGTGGT GGTGCACGCG GCGGGTGTCG TGGATGACGG GGTGATCGGT 17821 TCGTTGGACG CGGAGCGGTT GTGTGGGGTG ATGGGGCCGA AGGCGTGGGG TGCCTGGCAT 17881 CTGCATGAGC TGACGCGTGG GTTGGATCTG TCGGCGTTCG TGTTGTTCTC GTCGGCGGCG 17941 GGTGTGTTGG GCAACGCGGG CCAGGGCGGC TACGCGGCCG CGAATGGGTT CCTGGACGCG 18001 CTGGCGGTTC ACCGTCGGGG GCGGGGACTC CCCGCGGTGT CGATCGCGTG GGGCTTCTGG 18061 GAGGAACGCA GCGAACTGAC CGCCGACCTG GCCGAGGTGC AGCTGTCGAG GATCTCCCGG 18121 TCCGTAGGGG CCAGCATCAG CAGCGCACAA GGACTGGATC TGTTCGACGC GGCGCTTGCC 18181 GCCGACGAGC CGATGGTGCT GGCCACACCC CTGAACCTGC CCGCGTTGCG GGACCAGGCC 18241 GCCGCGGGCA CGTTGCCCTC GATCCTGAGC GGACTGGTCA CCGCTCCCGT CCGCAGGACG 18301 GCCGGCACCG GGCGCACTCC GGCCGGACTG CGGCACCAAC TCGCCGGGGT GACAGAGGCC 18361 GAAAGGCAGC ACCAGATCAT GCGCCTGGTG CAGGAACATG TGGCCGGCGT TCTGGGACAT 18421 GCCTCCGCGG AGTTGGTCGA CGCCTCGCGG ACGTTCCAGG AGATCGGGTT CGACTCGCTG 18481 ACCGCCGTGG AACTGCGCAA CCGGATCAGC GCCGCCACCG GCATACGGCT GCCCGCCACC 18541 GCGGTCTTCG ACCACCCCAC GCCCAGGCTG CTGGCCGAGC GGGTGCTGGC CGAGGTAGGG 18601 GGCTCCTTGC CGACCGCCGC CCCGATCGCG CCGGTGTCGG CCGTCGATGA CGAGCCGATC 18661 GTGATCGTGG GCATGAGTTG CCGCTTCCCC GGCGGCGTCG AGTCCCCCGA GGACCTGTGG 18721 CGCCTGGTCC ACTCGGCCAC CGACGCGGTC TCCGCGCTGC CCACGGACCG GGGCTGGGAC 18781 CTGGCCACCT TGTCCGGTGC CAAGGGCGGC GCCGGTGCCT CGTACGCCCG GGACGGCGGA 18841 TTCCTTTACG ACGCGGCTGA GTTCGACGCC GGATTCTTCG GGATCTCGCC GCGCGAGGCG 18901 ACCGCGATGG ATCCGCAGCA GCGGCTGCTG CTGGAGGCGG CCTGGGAGGT GTTCGAGCGG 18961 GCCGGAATCG CCCCGGACAC GCTCAAAGGC AGCCGGACGG GCGTCTTCAC AGGCGTGATG 19021 TACCACGACT ACGGCTCGTG GCTCACCGAT GTCCCGGAGG ACGTCGAGGG CTATCTGGGC 19081 ACAGGCATCG CGGGCAGTGT GGCGTCGGGG CGACTCGCCT ATACGTTCGG CCTTGAGGGG 19141 CCTGCCCTGA CGGTGGACAC GGCCTGCTCC TCATCACTGG TGGCGCTGCA TCTGGCGGCC 19201 GAGTCGCTGC GGCGCGGGGA GTGCTCGCTG GCACTCGCGG GCGGCGTCAC CGTACTGGCG 19261 ACTCCGCAGG TCTTCGTGGA GTTCACACGC CAGGGCGGAC TCGCACCGGA TGGCCGGTGC 19321 AAGCCCTTCG CCGCTGGTGC GGATGGGACG GGCTGGTCGG AGGGTGTTGG GCTGCTGCTG 19381 GTGGAGCGGT TGTCGGATGC CGAGCGGAAC GGGCATCCGG TGCTGGCCGT TGTCTCCGGC 19441 TCCGCGGTGA ATCAAGACGG TGCGTCGAAT GGTTTGACGG CGCCGAATGG TCCGTCGCAG 19501 CAGCGGGTGA TCCGTCAGGC GTTGGCGAAC GCCGGGCTCG CCGCCAGGGA TGTCGATGCG 19561 GTGGAGGCGC ATGGTACGGG GACGACGCTG GGTGATCCGA TCGAGGCGCA GGCGTTGCTG 19621 GCCACGTACG GTCAGGGCCG GGATGTGGGT CAGCCGTTGT GGTTGGGGTC GGTGAAGTCG

277

19681 AACATCGGTC ATACGCAGGC GGCTGCGGGT GTGGCTGGTG TGATCAAGAT GGTGATGGCT 19741 ATGCGGCACG GGGTGCTGCC GCGAACGCTG CACGTCGATG AGCCGTCGCC GCATGTGGAT 19801 TGGTCTGCTG GGGCGGTGGA GCTCCTGGGG GAGCACATGG GCTGGCCGGA GGTCGGGCGG 19861 CCCCGTCGGG CGGGTGTCTC GTCGTTCGGG GCGAGTGGCA CCAACGCCCA TGTGATTCTT 19921 GAGCAGGCCC CCGACATGGC GGGTGAACCT GAGCAAAGGC CGGAGCGTAA CGAACTACCG 19981 GCGATTCCCT GGGTGTTCTC CGCTGGCGAC GAGGCGGGTT TGCGGGCACA GGCCGTACGG 20041 CTACGGGCCT TCGCGGACCG GAATCCGGAT CTGGATCCGG TGGATGTGGG GTGGTCTTTG 20101 GCGACTGGTC GTGCGGGGTT GTCGCATCGT GCGGTGGTGG TGGGTGCGGG TCGTGGTGAG 20161 TTGTTGGGGG CTTTGGAGGG TGTGCCGGTG GTGGGTGTGC CGGTGGTGGG TGGGTTGGGT 20221 GTGTTGTTTG CGGGTCAGGG GTCGCAGCGG TTGGGGATGG GTCGTGGGTT GTATGAGGGG 20281 TATCCGGTGT TCGCTGCGGT GTGGGATGAG GTGTGCGCGC AGCTGGACCA GCATTTGGAT 20341 AGGCCGGTGG GTGAGGTGGT GTGGGGTGAT GATGCCGGGT TGGTCGGGGA GACGGTGTAT 20401 GCGCAGGCGG GGTTGTTCGC GCTTGAGGTG GCGCTGTATC GGCTGATCGC TTCGTGGGGT 20461 GTGAGGGGGG ATTATCTGCT GGGTCATTCG ATTGGTGAGT TGGCTGCGGC GTATGTGGCG 20521 GGTGTGTGGT CGTTGGAGGA TGCGGGGAGG GTGGTGGTGG CGCGGGGTCG TTTGATGCAG 20581 GCGTTGCCGT CGGGTGGTGC GATGGTTGGG GTGGCGGCGT CGGAGGGTGT GGTGCGGCCG 20641 CTGCTGGGCG AGGGTGTGGT GGTTGCGGCG GTGAATGGTC CCGAGTCGGT GGTGCTGTCG 20701 GGTGATGAGG ATGCGGTTGA GGCGGTTGTG GATGTGTTGG CTGGGCGTGG GGTGCGGACG 20761 CGGCGGTTGC GGGTGAGTCA TGCGTTTCAT TCGGCTCGTA TGGACGGGAT GCTGGCGGAG 20821 TTCGGTGAGG TGCTTCGGGG GGTGGAGTTC CGTGCCCCGA GCGTGCCCGT GGTGTCGAAC 20881 GTGTCCGGTG CGGTGGCGGG TGAGGAGTTG TGTTCGCCGG AGTATTGGGT GCGGCATGTG 20941 CGGGAGACGG TCCGGTTCGC CGATGGGCTG GATACTCTCC GTGAGCTGGG TGTGGGTTCG 21001 TTCCTGGAGT TGGGGCCGGA CGGGACGTTG ACCGCCTTGG CGGATGGCGA TGGTGTGCCT 21061 GTCTTGCGTC GGGATCGTCC GGAGCCTCTG ACCGCTATGG CGGCTTTGGG CGGGCTGTAC 21121 GTCCGGGGTG TCCAGATCGA CTGGGATGCG GTGTTCCCGG GTGCTCGGCG GGTTGATTTG 21181 CCGACGTATG CCTTCCAGCG TGAGCGGTTC TGGTTGGAGC CGTCCCCTGA GCGGCCCACG 21241 ACGAGCGTGG TTGACGCGGC GTTCTGGGAT GCGGTTGAGC GTGGGGATCT CGGTTCGTTC 21301 GGCATCGATG CCGAGCAGCC GCTCAGCACC GCCCTGCCCG CCCTCTCGTC CTGGCGGAGG 21361 GCGCGGCAGG AGCAGTCGGT GATTGATGGC TGGCGTTACC GGCTCGGTTG GATGCCGATT 21421 CCGGCGGTGT CCGGGGAGGT GGGCCTCACC GGTACCTGGC TGGTTGTGGT CGAGCCGGGT 21481 GCGGACGGTA CTGATGTGGC TGTCGCGTTG CGGTCGGCCG GGGCCGGTGT CGAGGTTGTG 21541 ACGTCGGCGG AGCTGAGCGC TGGTCCGGTT GCGGGTGTGG TGTCGTTGGT GTCGGTCGAG 21601 GCGACGGTGT CGTTGCTGCA CGTCCTTGTG GCGGCCGGGG TCGATGCGCC GTTGTGGTGT 21661 GTGACTCGTG GTGCGGTCTC GGTGGTCGAC GGTGACTTGG TGGATCCTGG CCAGGCGGGA 21721 GTCTGGGGTC TGGGCCGTGT GATCGGTCTG GAGCATCCGG ATCGTTGGGG CGGGCTGATC 21781 GACTTGCCTG GCGAACTGGA CGATCGCGCG GGGAATGCGC TGGTAGGCAT CCTTGCCGGG 21841 GGCACCGGTG AGGATCAGGT GGCCATCCGT GTCACCGGCA TATGGGGTGC CCGGCTGGTG 21901 CGGGCGACGC CGGTCCCGAT CGGTGACGCG GGTGGTGAGG CTGCGGCCGC GTGGCGTGGG 21961 CGTGGTACCG CGCTGGTCAC CGGTGGTACG GGGGCGCTGG GGCGCCAGGT GGCGCGCTGG 22021 CTGGTGGACA GTGGTCTGGA GCGGGTCGTG CTGACGAGCC GTCGGGGGGG CGAGGCGCCC 22081 GGTGCCGTCG AGCTGGTGGC TGAGTTGGGG AGCCGAGTGC GTGTCGTGGC CTGTGATGTC 22141 GGCGATCGTG AGGAGCTTGC GGCTCTTTTG GCGATGCTCC CGGATGTGCG GACCATCGTG 22201 CATGCGGCGG GTGTCCTCGA CGACGGGGTG CTCGAATCGC TGACGCCCGA GCGGATCCGT 22261 GAGGTGATGC GGGCCAAGGC CGACGGCGCG CGGCATCTCC ACGAGTTGAC CCGTGACATC 22321 GACCTCGACG CCTTCGTGTT GTTCTCGTCG GCTGCCGGGA CCGTGGGTAA TGCGGGTCAG 22381 GGGAGCTATG CGGCGGCCAA CGCCGTCCTG GACGGGCTGG CGTGGCGTCG CCGGGCCGAG 22441 GGCTTGGTGG CCACATCGGT GGCCTGGGGA GCCTGGGCCG ACAGCGGCAT GGGGGCTGGG 22501 CACGCACGGG CCATGGCACC ACGGCTGGCG CTGGCAGCCC TTCAGCGAGC GTTGGACGAC 22561 GACGAGACCG CACTCATGGT CGCGGACGTG GATTGGTCGA GCTTCGGCTC CCGGTTCACC 22621 GCCGTACGGC CGAGCCCGCT GCTGAGCGAA CTGCTGCCCC GCTCCAGCGC GCCGGTGGAA 22681 CCGGTCGAGG CACTCGCCAC CCGGTTGCGG GGCATGTCGC GGATCGAGCG CGATCGGGCG 22741 GTGCTGGAGC TGGTCCGTGC CCAAGTGGCG GCCGTGCTGG GACATGCGAA GCCCGCTTCG 22801 GTCGACCCCT CGCGGACCTT CCAGGAAGTC GGCTTCGACT CGCTGACCGC GGTGGAGCTG 22861 CGGAACCGGC TGGCCACTGC CACCGGCGTA CCGTTCCCGG GGTCGGTCAT CTTCGACTAT 22921 CCGACTCCCA CGGCGCTCGC CGACCATGTC CGGGCCCGGT TCGTTCCGGA CACGGACAAC 22981 GACGAGGACG GGGGCGGCGC GACGTCCGTG CTCGACGAGC TGACCAGGCT GGAAGCCGTG 23041 CTGTCCGACC TGTCCCCGAG CGACGTGGCC GGTGCCGAGG TCGCCGCGAA GATCAAGAGC

278

23101 CTGCTGTCCC ACTGGGGAGC GGCCACCAAC AGTGACATCG ACATGGATTC CGCGACGGAC 23161 GAGGAGATGT TCGACCTCCT CGGCAAGGAG TTCGGGATCT CGTGAgctag cCGATATAAG 23221 TTGTAATTCT CATGTTTGAC AGCTTATCAT CGATAAGCTT TAATGCGGTA GTTTATCACA 23281 GTTAAATTGC TAACGCAGTC AGGCACCGTG TATGAAATCT AACAATGCGC TCATCGTCAT 23341 CCTCGGCACC GTCACCCTGG ATGCTGTAGG CATAGGCTTG GTTATGCCGG TACTGCCGGG 23401 CCTCTTGCGG GATATCGTCC ATTCCGACAG CATCGCCAGT CACTATGGCG TGCTGCTAGC 23461 GCTATATGCG TTGATGCAAT TTCTATGCGC ACCCGTTCTC GGAGCACTGT CCGACCGCTT 23521 TGGCCGCCGC CCAGTCCTGC TCGCTTCGCT ACTTGGAGCC ACTATCGACT ACGCGATCAT 23581 GGCGACCACA CCCGTCCTGT GGATCCTCTA CGCCGGACGC ATCGTGGCCG GCATCACCGG 23641 CGCCACAGGT GCGGTTGCTG GCGCCTATAT CGCCGACATC ACCGATGGGG AAGATCGGGC 23701 TCGCCACTTC GGGCTCATGA GCGCTTGTTT CGGCGTGGGT ATGGTGGCAG GCCCCGTGGC 23761 CGGGGGACTG TTGGGCGCCA TCTCCTTGCA TGCACCATTC CTTGCGGCGG CGGTGCTCAA 23821 CGGCCTCAAC CTACTACTGG GCTGCTTCCT AATGCAGGAG TCGCATAAGG GAGAGCGTCG 23881 ACCGATGCCC TTGAGAGCCT TCAACCCAGT CAGCTCCTTC CGGTGGGCGC GGGGCATGAC 23941 TATCGTCGCC GCACTTATGA CTGTCTTCTT TATCATGCAA CTCGTAGGAC AGGTGCCGGC 24001 AGCGCTCTGG GTCATTTTCG GCGAGGACCG CTTTCGCTGG AGCGCGACGA TGATCGGCCT 24061 GTCGCTTGCG GTATTCGGAA TCTTGCACGC CCTCGCTCAA GCCTTCGTCA CTGGTCCCGC 24121 CACCAAACGT TTCGGCGAGA AGCAGGCCAT TATCGCCGGC ATGGCGGCCG ACGCGCTGGG 24181 CTACGTCTTG CTGGCGTTCG CGACGCGAGG CTGGATGGCC TTCCCCATTA TGATTCTTCT 24241 CGCTTCCGGC GGCATCGGGA TGCCCGCGTT GCAGGCCATG CTGTCCAGGC AGGTAGATGA 24301 CGACCATCAG GGACAGCTTC AAGGATCGCT CGCGGCTCTT ACCAGCCTAA CTTCGATCAT 24361 TGGACCGCTG ATCGTCACGG CGATTTATGC CGCCTCGGCG AGCACATGGA ACGGGTTGGC 24421 ATGGATTGTA GGCGCCGCCC TATACCTTGT CTGCCTCCCC GCGTTGCGTC GCGGTGCATG 24481 GAGCCGGGCC ACCTCGACCT GAATGGAAGC CGGCGGCACC TCGCTAACGG ATTCACCACT 24541 CCAAGAATTG GAGCCAATCA ATTCTTGCGG AGAACTGTGA ATGCGCAAAC CAACCCTTGG 24601 CAGAAGTTCC TATTCCGAAG TTCCTATTCT TATAGGAGTA TAGGAACTTC TTTCAGCTTT 24661 CCGCAACAGT ATAActgtgc ggtatttcac accgcataga tccgtcgagt tcaagagaaa 24721 aaaaaagaaa aagcaaaaag aaaaaaggaa agcgcgcctc gttcagaatg acacgtatag 24781 aatgatgcat taccttgtca tcttcagtat catactgttc gtatacatac ttactgacat 24841 tcataggtat acatatatac acatgtatat atatcgtatg ctgcagcttt aaataatcgg 24901 tgtcactaca taagaacacc tttggtggag ggaacatcgt tggtaccatt gggcgaggtg 24961 gcttctctta tggcaaccgc aagagccttg aacgcactct cactacggtg atgatcattc 25021 ttgcctcgca gacaatcaac gtggagggta attctgctag cctctgcaaa gctttcaaga 25081 aaatgcggga tcatctcgca agagagatct cctactttct ccctttgcaa accaagttcg 25141 acaactgcgt acggcctgtt cgaaagatct accaccgctc tggaaagtgc ctcatccaaa 25201 ggcgcaaatc ctgatccaaa cctttttact ccacgcactg ctcctagggc ctctttaaaa 25261 gcttgaccga gagcaatccc gcagtcttca gtggtgtgat ggtcgtctat gtgtaagtca 25321 ccaatgcact caacgattag cgaccagccg gaatgcttgg ccagagcatg tatcatatgg 25381 tccagaaacc ctatacctgt gtggacgtta atcacttgcg attgtgtggc ctgttctgct 25441 actgcttctg cctctttttc tgggaagatc gagtgctcta tcgctagggg accacccttt 25501 aaagagatcg caatctgaat cttggtttca tttgtaatac gctttactag ggctttctgc 25561 tctgtaccag aaccaccaga accTTTGTAC AATTCATCCA TACCATGGGT AATACCAGCA 25621 GCAGTAACAA ATTCTAACAA GACCATGTGG TCTCTCTTTT CGTTTGGATC TTTGGATAAG 25681 GCAGATTGAG TGGATAAGTA ATGGTTGTCT GGTAACAAGA CTGGACCATC ACCAATTGGA 25741 GTATTTTGTT GATAATGGTC AGCTAATTGA ACAGAACCAT CTTCAATGTT GTGTCTAATT 25801 TTGAAGTTAA CTTTGATACC ATTCTTTTGT TTGTCAGCCA TGATGTAAAC ATTGTGAGAG 25861 TTATAGTTGT ATTCCAATTT GTGACCTAAA ATGTTACCAT CTTCTTTAAA ATCAATACCT 25921 TTTAATTCGA TTCTATTAAC TAAGGTATCA CCTTCAAACT TGACTTCAGC TCTGGTCTTG 25981 TAGTTACCGT CATCTTTGAA AAAAATAGTT CTTTCTTGAA CATAACCTTC TGGCATGGCA 26041 GACTTGAAAA AGTCATGTTG TTTCATATGA TCTGGGTATC TcGCAAAACA TTGAACACCA 26101 TAACCGAAAG TAGTGACTAA GGTTGGCCAT GGAACTGGCA ATTTACCAGT AGTACAAATA 26161 AATTTTAAGG TCAATTTACC GTAAGTAGCA TCACCTTCAC CTTCACCGGA GACAGAAAAT 26221 TTGTGACCAT TAACATCACC ATCTAATTCA ACCAAAATTG GGACAACACC AGTGAATAAT 26281 TCTTCACCTT TAGACATTGT GATGATGTTT TATTTGTTTT GATTGGTGTC TTGTAAATAG 26341 AAACAAGAGA GAATAATAAA CAAGTTAAGA ATAAAAAACC AAAGGATGAA AAAGAATGAA 26401 TATGAAAAAG AGTAGAGAAT AACTTTGAAA GGGGACCATG ATATAACTGG AAAAAAGAGG 26461 TTCTTGGAAA TGAAAAGTTA CCAAAGAGTA TTTATAATTC AGAAAAAAAA GCCAACGAAT

279

26521 ATCGTTTTGA TGGCGAGCCT TTTTTTTTTT TTAGGAAGAC ACTAAAGGTA CCTAGCATCA 26581 TATGGGAAGG AAAGGAAATC ACTTGGAAGA CATCACAAGC ATTCATTTAC CAAGAGAAAA 26641 AATATGCATT TTAGCTAAGA TCCATTGAAC AAAGCACTCA CTCAACTCAA CTGAATGAAC 26701 GAAAGAAGAA AGAACAGTAG AAAACACTTT GTGACGGTGC GGAACACATT TACGTAGCTA 26761 TCATGCTGAA TTCTACTATG AAAATCTCCC AATCTGTCGA TGGCAAAACG ACCCACGTGG 26821 CAGAGTTGGG TCAAGTGCCA GTTTCTGGAT TAAGTAACAG ATACAGACAT CACACGCCAT 26881 AGAGGAATCC CGCCGTTGCG AGAGATGGAA AACAATAGAG CCGAAATTGT GGAAGCCCGA 26941 TGTCTGGGTG TACATTTTTT TTTTTTttCT TTCTTTCTCT TTCAATAATC TTTCCTTTTT 27001 CCATTTAGCT TGCCGGAAAA ACTTTCGGGT AGCGAAAATC TTTCTGCCGG AAAAATTAGC 27061 TATTTTTTTC TTCCTTATTA TTTTTTTAGT TCTGAAGTTT GACCAGGGCG CTACCCTGAC 27121 CGTATCACAA CCGACGATCC GGGGTCATGG CGGCTA //

280

A.4 Sequence of Fragment 2 yeast assembly.

Sequence of Fragment 2 shown schematically in Fig. 2-1. To show the context of the sequence in the chromosome, the last 30 bp of the HO(L) region on the 5’ end, and the pPYK promoter of the acceptor module on the 3’ end are included. The sequence is in Genbank format:

LOCUS mer-fragment-2-031016 26148 bp DNA linear UNA 07- Apr-2012 DEFINITION FEATURES Location/Qualifiers Misc._recombina 1..30 /label=HO-L Misc. 1..1118 /label=cr9-repair-fragment Primer 1..18 /label=AN348 Primer 1..60 /label=AN133 Primer 1..27 /label=AN531 Primer 1..16 /label=AN578 Misc._recombina 31..78 /label=FRT-WT ORF 79..750 /label=TRP1 Primer complement(82..110) /label=AN912 Primer complement(177..199) /label=AN285 Primer 508..529 /label=AN183 Primer complement(722..781) /label=AN532 Primer 730..768 /label=AN533 Terminator 751..1005 /label=tACT Primer complement(974..1026) /label=AN534 Primer complement(1002..1044) /label=AN535 Restriction_sit 1006..1011 /label=XbaI Misc._recombina 1012..1059 /label=FRT-15 Primer complement(1019..1078) /label=AN524 Cleavage_site 1054..1059 /label=assembly1 Primer complement(1059..1118) /label=AN492 Amplification 1060..1333

281

/label="from pSE34" Primer 1149..1168 /label=AN580 Promoter 1155..1331 /label=ermE Primer complement(1307..1327) /label=AN579 Primer complement(1315..1333) /label=AN20 Primer complement(1315..1353) /label=AN103-R9-R1 Primer_binding_ complement(1315..1333) /label=AN20 Primer 1324..1353 /label=AN109-R9-R2 Amplification 1334..1390 /label="from mer" ORF 1391..22699 /label=merB Primer complement(1427..1446) /label=LK19 Active_site 1490..2758 /label=M4-KS Primer complement(1501..1519) /label=LK20 Primer_binding_ 1697..1714 /label=B1 Primer_binding_ complement(1894..1913) /label=B2 Primer_binding_ 2088..2107 /label=AN173 Primer_binding_ 2449..2467 /label=B3 Primer_binding_ complement(2653..2670) /label=B4 Active_site 3059..3964 /label=M4-AT Primer 3243..3263 /label=AN155-R9-R3 Primer_binding_ 3273..3291 /label=AN120 Primer complement(3294..3315) /label=AN119 Primer_binding_ complement(3294..3315) /label=AN146 Primer_binding_ 3969..3989 /label=AN144 Primer_binding_ complement(4059..4080) /label=AN145 Active_site 4133..4627 /label=M4-DH Primer_binding_ 4472..4490 /label=B5 Primer_binding_ complement(4565..4585) /label=B6 Primer 5206..5226 /label=AN164-R9-R4

282

Primer_binding_ 5206..5226 /label=AN164 Primer complement(5294..5315) /label=AN156-R9-R3 Primer_binding_ complement(5294..5315) /label=AN156 Active_site 5456..6004 /label=M4-KR Primer_binding_ complement(5740..5759) /label=AN175 Primer_binding_ 5779..5797 /label=B7 Primer_binding_ complement(5841..5859) /label=B8 Active_site 6260..6517 /label=M4-ACP Primer_binding_ 6382..6400 /label=B9 Primer_binding_ complement(6469..6488) /label=B10 Primer complement(6518..6547) /label=AN110-r2-pcr1 Primer 6518..6547 /label=AN124 Active_site 6575..7858 /label=M5-KS Primer 6942..6963 /label=AN168-R9-R5 Primer_binding_ 6942..6963 /label=AN168 Primer complement(7179..7197) /label=AN165-R9-R4 Primer_binding_ complement(7179..7197) /label=AN165 Primer_binding_ 7399..7418 /label=AN223 Primer_binding_ 7574..7591 /label=B18 Primer_binding_ complement(7688..7707) /label=AN187 Primer_binding_ 8146..8164 /label=B11 Primer_binding_ complement(8180..8200) /label=B12 Active_site 8189..9058 /label=M5-AT Primer_binding_ 8880..8897 /label=B13 Primer_binding_ complement(8891..8909) /label=B14 Primer complement(8923..8962) /label=AN125 Primer 8923..8962 /label=AN126 Primer 8939..8961 /label=AN171-R9-R6 Primer 8993..9014

283

/label=AN136 Primer complement(9063..9083) /label=AN135 Primer complement(9153..9172) /label=AN169-R9-R5 Primer 9380..9400 /label=AN245 Primer_binding_ 9542..9560 /label=AN225 Primer_binding_ complement(9674..9692) /label=AN224 Active_site 10079..10609 /label=M5-KR Primer_binding_ 10157..10176 /label=B15 Primer_binding_ complement(10260..10277) /label=B16 Primer 10529..10549 /label=AN400 Primer 10747..10765 /label=AN188 Active_site 10895..11152 /label=M5-ACP Primer_binding_ 10910..10929 /label=B17 Primer complement(10952..10970) /label=AN172-R9-R6 Primer_binding_ complement(10952..10970) /label=AN172 Primer complement(11173..11202) /label=AN127 Active_site 11219..12505 /label=M6-KS Primer_binding_ 11569..11586 /label=AN227 Primer complement(11608..11626) /label=AN246 Primer_binding_ complement(11658..11677) /label=AN226 Primer_binding_ 12221..12238 /label=B18 Primer_binding_ complement(12335..12354) /label=AN187 Primer 12482..12502 /label=AN215 Primer complement(12734..12752) /label=AN189 Active_site 12809..13675 /label=M6-AT Primer_binding_ 12851..12870 /label=B19 Primer_binding_ complement(12940..12961) /label=B20 Primer 13465..13485 /label=AN239 Primer_binding_ complement(13550..13570) /label=AN228

284

Active_site 13853..14353 /label=M6-DH Primer_binding_ 14096..14116 /label=BAN21 Primer_binding_ complement(14099..14120) /label=BAN22 Primer 14297..14316 /label=AN229 Primer complement(14306..14326) /label=AN286 Primer complement(14508..14527) /label=AN216 Primer_binding_ 14766..14786 /label=BAN23 Primer_binding_ complement(14768..14787) /label=BAN24 Active_site 15290..16210 /label=M6-ER Primer 15389..15410 /label=AN247 Primer complement(15478..15497) /label=AN240 Primer 16061..16079 /label=AN241 Primer_binding_ complement(16097..16117) /label=BAN25 Active_site 16208..16771 /label=M6-KR Primer complement(16274..16292) /label=AN230 Primer_binding_ 16753..16770 /label=BAN26 Primer_binding_ complement(16781..16801) /label=BAN27 Active_site 17024..17281 /label=M6-ACP Primer 17220..17240 /label=AN267 Active_site 17342..18625 /label=M7-KS Primer complement(17428..17446) /label=AN248 Primer 17945..17965 /label=AN249 Primer complement(18068..18087) /label=AN242 Primer_binding_ 18629..18647 /label=BAN28 Primer_binding_ complement(18650..18669) /label=BAN29 Active_site 18914..19780 /label=M7-AT Primer_binding_ 18956..18975 /label=B19 Primer 18997..19016 /label=AN275 Primer complement(19212..19232)

285

/label=AN268 Primer_binding_ complement(19627..19646) /label=BAN30 Primer 19658..19678 /label=AN273 Primer complement(19897..19916) /label=AN250 Active_site 19955..20455 /label=M7-DH Primer_binding_ 20212..20230 /label=BAN31 Primer 20614..20634 /label=AN364 Primer complement(20830..20849) /label=AN276 Primer complement(21343..21361) /label=AN377 Primer 21363..21382 /label=AN376 Active_site 21383..21937 /label=M7-KR Primer 21480..21498 /label=AN277 Primer complement(21662..21682) /label=AN274 Primer 21953..21970 /label=AN366 Primer complement(22052..22071) /label=AN365 Active_site 22196..22453 /label=M7-ACP Primer_binding_ 22318..22336 /label=B9 Primer complement(22544..22561) /label=AN379 Primer 22552..22570 /label=AN378 Primer 22648..22689 /label=AN487 Primer 22668..22727 /label=AN485 Primer complement(22678..22705) /label=AN345 Restriction_sit 22700..22705 /label=NheI Promoter 22706..22824 /label=cat ORF 22825..23484 /label=CAT Primer complement(23142..23163) /label=AN298 Terminator 23485..23594 /label=cat Primer complement(23575..23634) /label=AN553 Misc._recombina 23595..23642 /label=FRT14

286

Primer complement(23602..23661) /label=AN554 Primer complement(23643..23672) /label=LMW375 Primer complement(23744..23759) /label=AN112 Terminator complement(23745..23802) /label=tHIS3 ORF complement(23872..24575) /label=HIS3 Primer complement(23954..23975) /label=AN272 ORF complement(24576..25289) /label=GFP Restriction_sit 24725..24730 /label=MfeI Promoter complement(25290..26148) /label=pPYK ORIGIN 1 aaaattgtgc ctttggactt aaaatggcgt GAAGTTCCTA TTCCGAAGTT CCTATTCTCT 61 AGAAAGTATA GGAACTTCTC TGTTATTAAT TTCACAGGTA GTTCTGGTCC ATTGGTGAAA 121 GTTTGCGGCT TGCAGAGCAC AGAGGCCGCA GAATGTGCAC TAGATTCCGA TGCTGACTTG 181 CTGGGTATTA TATGTGTGCC CAATAGAAAG AGAACAATTG ACCCGGTTAT TGCAAGGAAA 241 ATTTCAAGTC TTGTAAAAGC ATATAAAAAT AGTTCAGGCA CTCCGAAATA CTTGGTTGGC 301 GTGTTTCGTA ATCAACCTAA GGAGGATGTT TTGGCTCTGG TCAATGATTA CGGCATTGAT 361 ATCGTCCAAC TGCATGGAGA TGAGTCGTGG CAAGAATACC AAGAGTTCCT CGGTTTGCCA 421 GTTATTAAAA GACTCGTATT TCCAAAAGAC TGCAACATAC TACTCAGTGC AGCTTCACAG 481 AAACCTCATT CGTTTATTCC CTTGTTTGAT TCAGAAGCAG GTGGGACAGG TGAACTTTTG 541 GATTGGAACT CGATTTCTGA CTGGGTTGGA AGGCAAGAGA GCCCCGAGAG CTTACATTTT 601 ATGTTAGCTG GTGGACTGAC GCCAGAAAAT GTTGGTGATG CGCTTAGATT AAATGGCGTT 661 ATTGGTGTTG ATGTAAGCGG AGGTGTGGAG ACAAATGGTG TAAAAGACTC TAACAAAATA 721 GCAAATTTCG TCAAAAATGC TAAGAAATAG TCTCTGCTTT TGTGCGCGTA TGTTTATGTA 781 TGTACCTCTC TCTCTATTTC TATTTTTAAA CCACCCTCTC AATAAAATAA AAATAATAAA 841 GTATTTTTAA GGAAAAGACG TGTTTAAGCA CTGACTTTAT CTACTTTTTG TACGTTTTCA 901 TTGATATAAT GTGTTTTGTC TCTCCCTTTT CTACGAAAAT TTCAAAAATT GACCAAAAAA 961 AGGAATATAT ATACGAAAAA CTATTATATT TATATATCAT AGTGTtctag aGAAGTTCCT 1021 ATTCCGAAGT TCCTATTCTT ATAGGAGTAT AGGAACTTCt accAGCCCGA CCCGAGCACG 1081 CGCCGGCACG CCTGGTCGAT GTCGGACCGG AGTTCGAGGT ACGCGGCTTG CAGGTCCAGG 1141 AAGGGGACGT CCATGCGAGT GTCCGTTCGA GTGGCGGCTT GCGCCCGATG CTAGTCGCGG 1201 TTGATCGGCG ATCGCAGGTG CACGCGGTCG ATCTTGACGG CTGGCGAGAG GTGCGGGGAG 1261 GATCTGACCG ACGCGGTCCA CACGTGGCAC CGCGATGCTG TTGTGGGCAC AATCGTGCCG 1321 GTTGGTAgga tccACCTGCC GTCGAGTTCG TCTCCGAGTG AGTCCAGCAC CGCGTTGAGA 1381 GGGCCGTCCT GTGGAGAATG AAGAGAAACT TCGTCATTAC CTCAAAGAGG TCACGAAGGA 1441 TCTGCGGCAG ACCCGCCAGC GCTTGCAGGA CGTCGAGGCG AAGAGCCGCG AGCCCATCGC 1501 GATCGTCGGC ATGAGCTGCC GTTTCCCCGG TGGCATCGCA ACGCCGGAAG CGCTGTGGGA 1561 CCTGGTGCGC GAGGGCGGCG ACGCGGTGTC GGAGTTCCCG GCCGACCGCG GATGGGACAC 1621 GGAGGGCCTC TACGACCCCG CGGGCGGCTC CGGGAAGTCG GTCACCCGCT ACGGCGGATT 1681 CCTGCGCGGC GTCGCCGATT TCGACGCCGC GCTCTTCGGG ATCTCTCCCC GTGAGGCGAT 1741 CGCGATGGAC CCGCAGCAGC GGCTGATGCT GGAGACCTCC TGGGAAGCGT TCGAGCGGGC 1801 CGGTGTCAAC CGTGACGCGG TGCGGGGCAG CCGGACCGGG GTGTTCATCG GCACCAACGG 1861 CCAGGACTAC GCGACACTGC TCAGCGCTGC CCGGGACGAT GTGCAAGGCC ACCTCGGCAC 1921 GGGCAGCGCG GCCAGTGTGC TCTCGGGACG GGTCGCCTAC ACCTTCGGTC TCGAAGGGCC 1981 GACGGTCACC GTGGACACCG CGTGCTCGTC CTCACTGATC GCCCTGCACC TGGCCGTCCA 2041 GGCACTGCGC AACGGCGAGT GCGAGCTGGC GCTGGCGGGC GGCGTCACGG TGATGACGAC 2101 GACGAACACC TTCGTCGAGC TGTCCAAGCA GGGCGGGCTG GCGCCGGACG GCCGGTCCAA 2161 GGCGTTCGCG GCGGCGGCGG ACGGCACCGG CTGGGGTGAG GGCGCCGGGA TGCTGCTGGT 2221 GGAGCGGCTG TCCGACGCCG AACGGCACGG TCACCCCGTG CTGGCGGTGG TGCGTGGCAC

287

2281 CGCCGCCAAC CAGGACGGCG CGTCGAATGG GCTGACCGCG CCGAACGGGC CCTCCCAGCG 2341 CCGGGTCATC CGCGCGGCGC TGTCCAACGC CCAGCTGTCC ACGGGCGATG TCGACGTGGT 2401 GGAGGCACAC GGCACCGGCA CCCGGCTCGG CGACCCGATC GAGGCACAGG CCCTGCTCGA 2461 CACCTACGGT CAGGACCGGG ACCGGCCGCT GTGGCTCGGA TCGGTCAAGT CGAACCTGGG 2521 ACACACCCAG GCCGCCGCGG GTGTCGCCGG GGTCATCAAG ATGGTGCTCG CCATGCGCCA 2581 CGGTGTGCTG CCGCGCACCC TGCACGTGGA TGAACCGACC CCGCATGTGG ACTGGTCCGC 2641 CGGGGCGGTG CGGCTGCTCA CCGAGCGGAC CCCGTGGCCG GAGGCCGACC GGCCGCGCAG 2701 GGCGGGCGTC TCCGCCTTCG GAGTGAGCGG CACCAACGCC CATGTGATCG TGGAGCAGGC 2761 ATCGGAGGCC GAGCCCGTCG AGCCGCCCCG GGCCGAACCG GTGACGGTGC CCTGGGTGCT 2821 CTCGGGCCAG GGCGAGGCCG GTCTGCGGGC CTTCGCGGCC CGGCTCGCCG ATGTGGCCAC 2881 CGAAGCGCAC CCCGGCGACC TCGGATGGAC CCTGGCCACC ACCCGCTCGG CGCTGCCGCA 2941 CCGTGCGGTG GTGATCGGAT CCACACCAGA GGAACTGCGG AGCGGCCTCG CGGCGGTGGC 3001 CGCCGGAGAG CCGGCCTCGA ACGTGGTGGA GGGAGTGGCC GGCTCCGACA CCGGCGTGGT 3061 CTTCGTCTTC CCGGGACAGG GCTCGCAGTG GGCCGGTATG GCCGTGGAAC TGCTGGACTC 3121 CTCCCCGGCC TTCGCCCGCC GGTTCGCCGA ATGCGCCCGT GCCCTGGAGA CACACCTCGA 3181 CTGGTCCATC GAGGACGTGG TGCGTTCCGC GCCCGGTGCG CCCTCGCTCG ACCTCATCGA 3241 GGTCGTCCAG CCGGTCCTGT TCACCATGAT GGTGTCCCTC GCTGAGCTGT GGGCCTCCTA 3301 CGGGATCACT CCATCGGCCG TGGTCGGCCA CTCCCAGGGC GAGATCGCGG CGGCCTGTGT 3361 GGCCGGGGCG CTGTCGCTGG AGGACGCGGC CAAGGTGGTG GTGTTGCGCA GCCGCCTCTT 3421 CGCCGAAACG CTGGTGGGCA ACGGCGCCAT CGCCTCGGTC GCCCTGCCCG CGGAACAACT 3481 GGCCACCCGG ATCGAGCCGT GGGGCGAGCG CCTCGTGGTG GCCGGGGTGA ACGGGCCCGC 3541 GGCCGCCACG GTGGCCGGCG ATCCCCAGAG CCTCGAGGAG TTCGTCGCCG CATGCGCGGC 3601 GGACGGCGTA CGCGCCCGCG TCGTGCCCGC CACCGTGGCC TCCCACGGCC CGCAGGTGGA 3661 ACCGCTGCGG GAACGGCTGC TCGCCCTGCT GGCCGACGTG GCGCCACGCC AGTCCACCGT 3721 TCCGTTCTAC TCCACGGTGA CCGGCGGACT CCTGGACACC ACCGAACTCG ACGCGGACTA 3781 CTGGTTCTGG AACGCCCGTA AGCCGATCGA CTTCCTCGGC GCGCTCCGGG CGCTGTTCGC 3841 CGACGGCCAC CGCGTCTTCG TGGAGTCGAG CACCCACCCC GCCCTGACCA TGGGGGTCCA 3901 GGACACCGCG GATGCCTCCG GCGAGTCCGT GGAGGTCACC GGCTCGTTGC GGCGTGGCGA 3961 GGGCGGGCTC GACCAGTTCC ACTCGGCCGT GGCGCGGCTG CATGTGCACG GCGTACGGGT 4021 GGACTGGTCC GCGGCCTTCG GGGCGGCGCG GCGGGTGGAG CTGCCGACCT ACCCCTTCCA 4081 GCGGGAGCGT TACTGGCTGA CGCCCCGGCC CGGCCAGGGT GACGCCTCCG CCCTGGGGCT 4141 GGGTGCGCTC GACCACCCCC TGCTGGGGGC CACGGTCGTG CTGCCCGAGT CCGGCGGTTG 4201 CCTGCTCACC GGTCGGCTGT CCCTGGCCGG ACAGCCGTGG CTGGCCGATC ACGCCCTCTC 4261 CGGTGTGGTG TTGCTGCCGG GGACGGGGTT TGTGGAGTTG GTGTTGCAGG CGGGGTTGCG 4321 GTGGGGGTGT GGGGTGGTTG AGGAGTTGAC TTTGGAGGGG CCGTTGGTTC TTCCGGAGCG 4381 GGGTGAGGTT GAGGTTCAGG TTTCGGTGGG TGGTGTGGAT GGGGCCGGGT GTCGGTCGGT 4441 GTCGGTGTTT TCGTGTCGTG GGGGTGAGTG GGTTCGGCAT GCGGTGGGTG TGCTTGGGGT 4501 GGGGGATGGT GCGGTGCCGG TGGCGGAGGT GTGGCCGCCG GTGGGTGCGG AGCGGGTTGG 4561 GGTGGAGGGG GTTTATGAGG CGTTGGCGGA GCGGGGGTAT GCGTACGGCC CGGTGTTCCA 4621 GGGGCTGCGG GACGCCTGGC GCCGGGGAGA CGAAATCTTC GTCGAGGTGG CGGTGGCCCA 4681 GGAGGCACGG GCGGACGCGG CGCGGTGCGC GATCCATCCC GCGCTGCTCG ACGCCGCGCT 4741 CCACGGGGTG CGATTCGGTG ACTTCGTATC CGACGACGAC CAGGCTTATG TGCCGTTCTC 4801 CTGGACCGGC GTCACGCTGC ACGCGGTCGG TGCGACGGTC CTGCGCGTCA CACTGTCCCC 4861 GGCAGGACGC GACGCGATCG CCCTCCGGGC CACGGACACC ACCGGTGCGC CGGTCCTGTC 4921 GGCACGCTCA CTGGCCCTGC GACCGGTCTC CGCCCAGCAG TTGAACGACA CGCGGGGGAG 4981 CAGGACTGAC GCCCTCCATC GGGTGGAGTG GGTGGACGCG TCCGGAACCG TGGCGGTGGG 5041 GGGTGAGGTG GCGCCGCGGA CTGAGGTGGT GCGGGTCGTC TCCGAGGGTC CGGATGTGGT 5101 GGGTGAGGCG TACGGGCATG TGCTTGAGGT TCTGGAGCGG GTGCAGGCGT GGGTGGCGGA 5161 TGAGGACCTG GCGGGTGAGC GGTTGGTGGT GGTGACGCGG GGCGCTGTCG ACACGGGTGA 5221 TGGTGTGGCG GACGTGGCTG GGGCCGCGGT GTGGGGCCTG GTGCGGTCCG CGCAGTCGGA 5281 GAACCCGGGG CGTCTGGTGC TGGTGGACAC CGATGACCTG GACGGCGTCG ACAGTCTGCT 5341 TCCCGGGATG CTGGCTCTGG ATGAGGAGCA GGTGCTGGTG CGGTCGGGTG CGGTGCGGGT 5401 GCCGCGTCTG GCTCGGGTGC CGGCGCCGGG TGAGGTATCG GGAGGGTTTG GTTCCGGTGC 5461 GGTGTTGGTG ACGGGTGGCA CTGGTGTGCT GGGCGGTCTG GTGTCACGGC ATCTGGTGGC 5521 GCGGCATGGG GTGAGCAGGC TGGTGCTGCT GTCGCGTCGC GGTGCGGAGG CCGAAGGTGC 5581 GGCGGAGTTG CGGGAGGAGC TGGAGGCCGC GGGCGCCGAG GTGGTGATCG CGGCGTGTGA 5641 TGCCGCGGAT CGTGAGGCTC TGGCCGGGGT GTTGTCGGGG TTGTCGGCGG ACTTCGCCTT

288

5701 GAGCGGTGTG GTGCATGCGG CGGGTGTGCT GGACGACGGG TTGCTCACGT CGTTGACGCG 5761 TGAGCGGGTC GAGCCGGTGT TGCGGGCGAA GGTGGACGCG GCGTGGAACC TGCATGAGCT 5821 GACCACGGGC ATGGATCTGT CGGCGTTTGT GCTGTTCTCA TCGGCGGCGG GTATTCTGGG 5881 CAACGCGGGC CAGGGCAGTT ATGCGGCGGC GAACGGGTTC CTGGACGCGC TGGCGGCTCA 5941 TCGGCGGGCG CGGGGACTGC CCGCGGTGTC GATCGCGTGG GGCTTCTGGG AAGCACGCAG 6001 CGAGCTGACC CAGCACCTGT CGGCCGACGA TCTGGCGCGT GCCCACGCGG TGCCGATGCC 6061 CACCTCCCAG GCACTGGATC TGTTCGACGC GACGCTCGCC GCCGACGAGC CGATGGTGCT 6121 GGCCGCACCC CTGAACCCGC AGGCATGGTC GGACGCCGGC CACCTGCCTC CCGTCCTGCG 6181 CGATCTGGTC CGGCCGCGGA TCCGGCGCGC GGCGGAGACA ACCGGCGCCC CCGAATCGGC 6241 CTCCGCGCTC GGACACCGGC TGGCCGCCGT CGACCGCTCC GAGTGGGACC AGGTCGTACG 6301 CGAACTCGTG CGCAATCACA TCGCGGCGGT GCTGCGCCAT GCCTCCGGGG AGTCGGTGGA 6361 CACCTCGCGG ACGTTCCAGG AGATCGGCTT CGACTCGCTG ACCGCCGTGG AACTGCGCAA 6421 CCGGATCAGC GCCGCCACCG GCGTACGGCT GCCCGCCACC GCCGTGTTCG ACTACCCGAC 6481 ACCGCAAGCG CTGGCCGAGT ACCTGCTCGC CGAAGTCCTC GGGAAGGACA GCGCCGCCGC 6541 CGCGACACCC GTCGGAACCG CCCTCGTCGC CGACGATCCC ATCGTCATCG TCGGAATGAG 6601 CTGCCGCTAC CCCGGCGGGA TCACCTCGCC GGAAGCGCTG TGGGACCTGG TGCGCTCGGA 6661 CGGCGATGCC ATATCCGTCC TGCCGGCCGA CAGAGGATGG GACCTGGACG GCCTCTACGA 6721 CCCGGATCCG GACCGCACCG GTACGTCGTA CGCCCGCAGC GGTGGATTCG TCTACGACGC 6781 GGCCGAGTTC GACGCCGCCT TCTTCGGGAT CTCGCCGCGC GAGGCCGCCG CCATGGACCC 6841 GCAGCAGCGG CTGCTACTGG AAACCTCATG GGAGGCGTTC GAACGCGCGG GCATCCCCGC 6901 CACCTCCGTC AAGGGTGAGC GGATCGGCGT GTTCACCGGG GTGATGCACC ACGACTACCT 6961 CACCCGCCTG TCGACCACAC CGGACGCCGT TGAGGGCTAT CTGGGCACGG GCGCGGCAGC 7021 GGGCGTCGCC TCGGGCCGCG TGGCCTACAC CTTCGGACTC GAGGGCCCGG CGGTCACCGT 7081 GGACACCGCC TGCTCGTCGT CGCTGGTGGC CCTGCACCTC GCCGTACAGG CGCTGCGCCT 7141 CGGCGAGTGC TCGCTCGCGC TGGCCGGTGG TGTGACGGTG ATGTCCACGC CCACCGTCTT 7201 CGTCGAGTTC TCCCGCCAGC GCGGGCTCGC GCCGGACGGC AGGTGTAAGG CGTTCGCGGG 7261 AGCGGCGGAC GGCACCGGCT TCGCCGAAGG CATCGGCATG CTGCTGGTCG AACGGCTCTC 7321 GGACGCACGG CGCAACGGAC ACCCCGTCCT GGCCGTGGTG CGGGGCAGTG CGGTGAATCA 7381 GGATGGTGCG TCGAATGGGT TGACGGCCCC GAATGGTCCG TCGCAGCAGC GGGTGATCCG 7441 GCAGGCGCTG GCGAGCGCGG GGCTGTCCAC GGTGGATGTG GACGCGGTGG AGGCGCACGG 7501 TACGGGTACG ACGCTGGGTG ATCCGATCGA GGCGCAGGCG TTGCTGGCCA CGTACGGTCA 7561 GGGCCGGGAT TCGGACCGGC CGTTGCTGCT GGGGTCGATC AAGTCGAACA TCGGTCACAC 7621 TCAGGCGGCC GCCGGTGTGG CTGGTGTGAT CAAGATGGTG ATGGCGATGC GCCACGGCGT 7681 GCTGCCGCAG AGCCTGCACA TCGATGAGCC CACTCCCCAC GTCGACTGGT CCACCGGCGC 7741 GGTGGAGCTC CTGAGCGAAC AGACGGCATG GCCGGAGGCC GGGCGGCCCC GCCGGGCCGG 7801 GGTGTCGTCG TTCGGCATCA GCGGGACGAA CGCGCACCTG ATCCTTGAGC AGGCTCCGCT 7861 GCCGACGGCA GCGGAGCGGC CCGGTGACGC CGAGCCCGTT CCGGTCGAGC CTGCCGCGGT 7921 GGTCCCGTGG ATCGTCTCGG GGCGCGACCG GCATGCCGTG CGCGCGCAGG CGGAACGACT 7981 GCGCGCACAC GTGGTGAGCC ACCCTGACCG GAGGGTGGCG GACATCGGTT TCTCGCTGCT 8041 GACCAGCCGC GCCGTGCTGG AGCACCGAGC GGTGGTACTC GGCGGTGACC ATGCCGAACT 8101 GCTGGCCGGG CTGACGGCCC TGGCACGGGA CGAACCCGCA CCGGGCGTGG TGGAGGCCCT 8161 GGACGCGGCC GAGCCGGGGC GCAAGGTGGT GTTCGTCTTC CCCGGTCAGG GGTCGCAGTG 8221 GGCCGGGATG GCGCTGGAAC TGATGGAGTC CTCGCCCGTG TTCGCACGGC GGATGGGCGA 8281 GTGCGCCGAT GCGCTGGCTC CGCTGGTGGA GTGGTCGCTG CCGGACGTGC TGGCGGATGA 8341 GCGAGCGCTG GCCCGTGTCG ATGTGGTGCA GCCGGTGCTG TGGGCGGTGA TGGTGTCGCT 8401 GGCCGAGCTG TGGCGTTCGT ACGGTGTGGT GCCGTCGGCG GTGGTGGGTC ACTCGCAGGG 8461 TGAGATCGCG GCGGCGTGTG TCGCGGGTGG CCTGTCCCTG GCGGACGGGG CAAGGGTGGT 8521 CGTGCTGCGC GGCAAGGCGC TGCTCGCCTT GTCGGGCCGG GGCGGAATGG TGTCCGTTCC 8581 GGTGCCCGCC GACCGGCTGC GGGACCGGCC CGGGGTCTCC ATCGCGGCGG TGAACGGCCC 8641 ATCCTCGACA GTGGTGTCCG GCGGCGACGA GGTGCTGGAC GCGGTGCTGG CGGAGTTCCC 8701 GGCCGCCAAG CGCATCCCGG TGGACTACGC CTCCCACTCG CCCCAGATCG ACGACATCCG 8761 GGACGAACTG CTGAAGGCCC TGGCGCCGAT CGAGCCGCGC ACCGCGGCGA TCCCCTTCCA 8821 CTCCACGGTG ACCGGACGGC CCATCGACAC CGCCGACCTG GACGCGGACT ACTGGTATCG 8881 CAATCTGCGC GAGACCGTGG AGCTCGAGCG GGTCATCCGT ACGGCGGTCG AGGACGGCCA 8941 CCACACCTTC ATCGAGATCA GCCCCCACCC GGTGCTGACC ACGGGCCTGC GCGAAACACT 9001 CGACGACGCG GACGCGCACG GCGGCCTCGT ACTGGCCTCA CTGCGCCGGG ACGACGGTGG 9061 CCCTACCCGC TTCCTCACCG CCTTGGCCGA GGCGTACGCA CACGGCGTCG AGGTCGACTG

289

9121 GCTGCCGCTG TTCCCGGGCG CCCGCCGGGT GGATCTGCCG ACGTACGCCT TCCAGCGCGA 9181 GCGCTACTGG CTGGACGCGC CCACCGCCGA GGCCCCCACC AGCGCGATCG ACGCGGAATT 9241 CTGGGCCGCC GTCGAGCGCG AGGACCTCGA GTCGCTCGCC GCGACGCTGC GCGTCGACGG 9301 GCAGCCGCTG CGCGAAGTGC TGCCCGCCCT GTCCCAGTGG CGGCGCGAAC GCCGTGACGT 9361 CTCCACCATC GACTCATGGC GTTACACGAT CCGGTGGAAG CCGCTCACCC CGCCCGCCAC 9421 TTCACCGACC GGCACCTGGC TGGTCGTGGT CTGCCATGCC GAGGCCGGGC ACGAGTGGGT 9481 CGCGGGGGTG ACCGACGCGC TGACCCGTCA CGGTGCCGAG CCGCTCGTGG TCGTTCTCGG 9541 CGAGCCCGAA CTGGACCGTG CCGCGCTGGC CGCCCGGCTG GGCGGCGTAC TGGCCGACAC 9601 CCCCAGGATC AGCGGTGTGG TGTCGCTGAC CGCGCTGGAC GAGAGCCCGC ACCCGGCGTA 9661 CCCCTCGGTC CCCCAGGGAT ACGCGATGAC GCTGCTGCTC TCGCAGGCGC TCGGGGACGC 9721 CAGGGTGGAA GCTCCGCTGT GGTGCCTCAC CCAGCGCGGC GTCTCGCTCG GCGATGCCGG 9781 AGGCAGTGGC AGTGGCAGTG GCACTGGCGA CGGCAGGGGC AAGGGCAAGG GTGATGTGGC 9841 CGTCAGCCGG AAGCAGGCCC TGACCTGGGG TCTCGGCAAG GTGATCGCTC TGGAACAGCC 9901 CCTGCGCTGG GGCGGTTTGA TCGACCTGCC GGAGGGCGTG GCCCCGCATA CCCAGGACTA 9961 CCTTGCCGGT GTGCTGTCCG GCACCTCGGA CGAGGACCAG GTGGCGATCC GCCCGACGGG 10021 GCTCTTCGGC CGTAGGCTGG CCCACGCGCC GGCCCGCGAG CGCGGCGGGG GCTGGCAACC 10081 CCGCGGCACC GTACTGGTCA CCGGTGGCAC CGGAGCGCTG GGCGGCCATG TCGCCCGGTG 10141 GCTGGCCGGC CAGGGGGCTG AACACGTGGT GCTGACCAGT CGCCGGGGCA TGGCCGCGCC 10201 CGGCGCCGAG CGGCTGGCCG GGGAGCTGGA GGCGCTCGGC GCCCGGGTGA CGGTGGCGGC 10261 GTGCGACGTC GGTGACCGGG ACGCCCTGGC CGGGTTGCTG GCCGAGGTCG GCCCGCTGAC 10321 CGCTGTGGTG CACACCGCGG CGGTGCTCGA CGACGGCACG CTGAACTCGC TCACCACCGA 10381 CCAGCTGCAA CGCGTGCTGC GCGTCAAGAC CGACGGCGCG GTGCATCTGC ACGAACTGAC 10441 GCGGGACATG GAGCTGTCCG CGTTCGTGCT CTTCTCCTCG CTGTCCGGCA CTCTGGGCGC 10501 ACCCGGTCAG GGCAACTACG CACCCGGCCA TGTCTTCGTG GACACGCTGG CCGAGCAGCG 10561 GCGGGCCGAG GGCCTGGTGG CCACCTCCAT CGCCTGGGGG CTGTGGGCCG GTGACGGCAT 10621 GGGCGAGGGC GGTGTGGGCG ACGTGGCCCG CCGCCATGGC GTACCGGAGA TGGCGCCGGA 10681 GATGGCGGTC GCCGCCATGG CACGCGCCGT CGAGCAGGAC GACACCGTCG TCACGGTGGC 10741 CGAGATCGAC TGGGACCGGC ACTACGTCGC GTTCACCGCG ACCCGCCCCA GCCCGCTGCT 10801 GTCCGACCTC CCCGAGGTGC GTGCGCTGGT CGACGCCGGA GTCGGCCAGG AGAGCGCCGA 10861 GCCGGGCCAC GAGCGCTCGG AATTCGCGGA GCGGCTCGCC GGGATGGCCG AGACCGACCG 10921 GAACCACGCG TTGCTGGACC TGGTCCGGCG CCATGTCGCC GTCGTACTCG GACACACCGG 10981 TCCGGACGCG ATCGACCCCG GCCGGGCCTT CCACGAGATC GGCTTCGACT CGGTCACCGC 11041 GGTCGAACTG CGCAACCGGC TCAACCGGGC CACCGGCCTA CGGCTGCCCG CCACCGTGAC 11101 GTTCGACCAG CCCACCCCGC TGGCGATGGC GCAGTACCTC CGCGGCGAAC TGCTGCACGA 11161 CGGCCAAGGC CGATCGGCCC CCGCCCTCCC GGTCCGCGCG ACCGGCGCGG TGGACGACGA 11221 GCCTATCGCG ATCGTGGGGA TGAGCTGCCG CTTCCCCGGG GACGTCGCGT CCCCCGAGGA 11281 CCTGTGGCGG CTGCTCGCCG ACGGTTCCGA CGCCATCGGC GAGTTCCCCG AGAACCGGGG 11341 CTGGGACACC GCGCACCTCT TCCACCCGGA CCCCGACCAC CGAGGCACCT CCTCCACCCG 11401 AGCGGCCGCG TTCGTCTCCG GGGCCGGTGA GTTCGACGCC GGATTCTTCG GGATCTCCCC 11461 GCGGGAAGCG GTGGCGATGG ACCCGCAACA GCGGCTGCTG CTCGAAGTGT CATGGGAGGC 11521 GCTGGAGCGG GCCGGGATCG ACCCCACGAC CCTGCGGGGC AGCGAGACCG GCGTGTTCAC 11581 GGGGACGAAC GGTCAGGACT ACGCGTCGTT GCTGAAGGCG GACGAGACGG GTGACTTCGA 11641 GGGCCGGGTG GGCACCGGCA ACTCGGCATC GGTCATGTCC GGCCGGATCT CCTACGTCCT 11701 CGGTCTCGAA GGCCCCGCGC TGACCGTGGA TACGGCGTGC TCGTCGTCGC TGGTGGCATT 11761 GCACCTGGCG GTGCGGGCCC TGCGGTCGGG CGAGTGCTCA CTGGCCCTGG CGGGAGGCGC 11821 GAGTGTCATG ACGACCGCCG GCATCTTCGT GGAGTTCTCC CGTCAGCGCG CGTTGGCGGC 11881 CGATGGACGC TGCAAGGCGT TCGCGGCGGC GGCGGACGGT ACCGGCTGGG GTGAGGGTGC 11941 CGGAATGCTG GTGGTGGAGC GGTTGTCGGA TGCTGAGCGG CTTGGGCATC GGGTGTTGGC 12001 GGTGGTGCGT GGTTCTGCGG TGAATCAGGA TGGTGCGTCG AATGGTTTGA CGGCGCCGAA 12061 TGGTCCGTCG CAGCAGCGGG TGATCCGGCA GGCGCTGGCG AGCGCGGGGC TGTCCACGGT 12121 GGATGTGGAC GCGGTGGAGG CGCACGGTAC GGGTACGACG CTGGGTGATC CGATCGAGGC 12181 GCAGGCGTTG CTGGCCACGT ACGGTCAGGG CCGGGATTCG GACCGGCCGT TGCTGCTGGG 12241 GTCGATCAAG TCGAACATCG GTCACACTCA GGCGGCCGCC GGTGTGGCTG GTGTGATCAA 12301 GATGGTGATG GCGATGCGCC ACGGTGTGCT GCCGCAGAGC CTGCACATCG ATGAGCCCAC 12361 TCCCCACGTC GACTGGTCCA CCGGCGCGGT GGAGCTCCTG AGCGAACAGA CGGCATGGCC 12421 GGAGAACACA CGGCCCCGCC GCGCCGGGGT GTCCGCCTTC GGAGTGAGCG GCACCAACGC 12481 GCATGTGATT CTGGAGCAGG CCCCCGAGCC GACCGCCGCC CAGCCCGAAC TCTCGCCGGA

290

12541 ACGCGACGAA ATGAGGGCCG TGCCGTGGGT GGTGACGGGT GCGAGCGAGG CCGGAGTCCG 12601 CGCACAGGCC GCGCGCCTCA TGGCCTTTGT CGACGACCGG CCGGAACTCC GCCCGGTGAA 12661 CATCGGCTGG TCGCTGGCCT CGACCCGCGC GGCCCTGTCA CACCGTGCCG TGGTCGTAGG 12721 TGCTGAACGT ACGGAACTGC TGCGTGAGCT GGAGGCCGTG GCCAGTGGCA GCGTCACGGT 12781 CGGCGAGGCC CGCACGCATT CCGGGGTGGT GTTCGTCTTC CCGGGGCAGG GGTCGCAGTG 12841 GGTTGGGATG GCGTTGGAGT TGGTGGAGTC GTCGCCGGTG TTCGCGGGGC GGATGCGTGA 12901 TTGTGCGGAT GCGTTGGCCC CGTTTGTGGA GTGGTCGTTG TTCGATGTGT TGGGTGATGA 12961 GGTGGCGCTT GGGCGGGTTG ATGTGGTGCA GCCGGTGTTG TGGGCGGTGA TGGTGTCGTT 13021 GGCGGAGTTG TGGCGTTCGT TTGGTGTGGT GCCGTCGGTG GTGGTGGGGC ATTCGCAGGG 13081 TGAGATCGCG GCGGCGTGTG TGGCCGGGGG TCTGTCGTTG GAGGACGGTG CCCGTGTGGT 13141 GGCCTTGCGG AGCAGGGCGT TGCTGGCTCT GTCGGGTCGG GGCGGCATGG TGTCGGTTCC 13201 GGTTTCTGCT GACCGGCTGC GGGGTCGTGT GGGGTTGTCG GTGGCGGCGG TGAATGGTCC 13261 GGTGTCGACG GTGGTGTCGG GGGCTGTTGA GGTGCTGGAT GGGGTGCTGG CGGAGTTCCC 13321 GGAGGCGAGG CGGATTCCGG TGGATTATGC GTCGCATTCG GTGCAGGTGG AGGGGATCCG 13381 GGAGGGTTTG GCGGAGGCGT TGGCGCCGGT TCGGCCGCGT ACGGGTGAGG TGCCGTTCTA 13441 TTCGACGGTG ACCGGCCGGC TGATGGACAC CATCGAGTTG GACGCGGAGT ACTGGTACCG 13501 GAACCTGCGC GAGACGGTGG AGTTCCAGAG CGCGATCGAG GGGCTGCTGG AGCTTGGCCA 13561 TACGGTGTTC GTCGAGGCCA GCCCGCATCC GGTGCTGACC ATTGGCATCC AGGACACCGC 13621 CGACACCACC GACACCGACA TCGTCGTAAG CGGGTCACTG CGCCGCGACG ACGGCGGTCC 13681 TGTCCGCTTC CTCAGCACCG TCGGGCGACT GTTCACCGAG GGCGTGCCGG TGGAGTGGCA 13741 GCCGCTGTTC GCCGCGGCCG GGGCGCGAAA GGTCGATCTC CCGACCTATG CGTTCCAGCA 13801 TGAGTGGTTC TGGCTGGATC CGGTGCGCGG CGCGAGTGAT GTGGGCGGCG CGGGCCTTGC 13861 CGGTCTCGCT CACCCCTTGG TGAGCGCGGT GTTGCCGCTG CCCGAATCCG ATGGCTGTGT 13921 GCTGACCGGC TCGCTCTCCT CGGCCACCCA TCCTTGGCTG CGTGACCACG CCGTGCTGGA 13981 CAAGGTGTTG CTGCCGGGCA CCGGGTTCGT GGAACTGGCC CTTCAGGCCG GGCTGCACCT 14041 GGGCTGCCGG ACGCTGGATG AGCTGACCCT GCAGGCGCCG CTCATGCTGC CCGCGCACGG 14101 AGACGTACAG ATCCAGGTGG CGGTCGGCGG ACCGGACGAC AGCGGCCGCC GGCCGGTCAC 14161 GGTGTACTCC AGGCCGGGCA AGGACCGGAC CTGGATGCGG CACGCCACCG GCAGCATCAG 14221 CCCCGTCGGT GAAACGGCCA CCGTGGACCG GGCGGTGTGG CCCCCGGTCG GCGCCACACC 14281 GGTCGAGCTC ACCGATGTCT ACGCCGAGAT GAGCACGCAC GGTTACGCGT ATGGGCCCGT 14341 CTTCCAGGGG CTGCGCGCCG CATGGCGACG TGGCGACGAG GTGTTCGCCG AGGTGGTCCT 14401 GCCCGAGACG GCCGAGAGCG ACGCGGGTCG TTGCGCCATC CACCCCGCCC TCCTCGACGC 14461 CGCCCTGCAC GGTGCCGGAC TGGGCACGTT CGTGACCGAA CCAGGCCGAC CGCACCTTCC 14521 GTTCACCTGG ACCGGTGTCA CCCTGCACGC CGTCGGTGCC ACCACCTTGC GGGTCGTCCT 14581 GTCGCCCGCC GGGCCGGACG CCATCTCGCT CCTGGCCATG GACGGCACGG GAGCGCCGGT 14641 GCTGACGGCG GACTCTCTGG CCCTGCGCCC GGTGTCCGAG GGCGGGCTCG GCGGCTCCCA 14701 CGACGACTCG CTGTTCCGCG TGGACTGGAC CGAGCTCACC CTGGACGCCT CGGACGCCTC 14761 GGACGCACCG GAGGTGTCGG ATGAAGCGGC CTTCCCGGTC GTCGAGTCCG TGGCCCAGCT 14821 GGCCGGGGTG GCGGCGGCCC GGAGCGGGCG CGGGGCCGTG GTGTTCAGGC TTTCCACCAC 14881 GGAGACCACA GGAGGCGCCG CCGAGGAGAG CCCGGAGGAC GTCTACGCGC TCACCAGCCG 14941 TGTCCTCAAG GTCGCGCAGG CGTGGTTGGC GGACGACCGG TTCGGGGACG CCCGCCTCGT 15001 CGTGGTGACG CGGGGCGCGG TCGCGACCAC GCCCGGAGAG AACCCGGAGA GCCTTGCCGC 15061 CGCCGCGGTC TGGGGCCTCA TCCGCACCGC GCAGACCGAG AACCCCGGCC GTTTCGTCCT 15121 CGTGGACACG GTGGACGAGG ATCCGTCGGC GTTGCCGGGG GTGCTCGCCA CCGATGAGCC 15181 ACAGGTGGCG ATCCGGGCGG GGAAGGCGCT GGTGCCCAGG CTGGTACGGG CCACCTCGTC 15241 GGCGTTGCCG GTACCAGCTG AGACGGACAC CTGGCGGCTG GAGACCGACG GTCAGGGCAC 15301 TCTGGAGAAC CTGGTCCTCT CGCCCCGCGC CGAGGCGTCC AGGCCACTTG CCGCACATGA 15361 GATCCGGGTG GCCGTGCACG CGGCCGGGGT CAACTTCCGC GATGTACTGC TCGCTTTGGG 15421 GATGTACCCG GACAAGGCCG GTCTGCTGGG CAGCGAAGCC GCCGGGACGG TGCTGGAGAT 15481 CGGCTCCGGA GTAGTGGGAG TGGCACCGGG AGACCGGGTG ATGGGTCTGT TCTCCGGTGC 15541 CTTCGCGCCG GTGGCGATCA CCGATCACCG ACTGGTGGCA CCGATCCCGG AGGGGTGGTC 15601 CTTCCCGCAG GCCGCCGCCA CCCCGATCGC CTTCCTCACG GCGATGTACG CCTTGATCGA 15661 CCTGGCCGAA GTGCGGAGCG GCGAGTCGGT GCTGGTGCAC GCGGCGGCCG GTGGCGTCGG 15721 GATGGCGGCA GTGCAGGTGG CGCGCTGGCT GGGCGCCGAG GTGTTCGCCA CCGCGAGCCC 15781 GGCCAAGTGG GATGCGGTGC GCGCATGCGG GGTCGCCCCG CGGCGGATCG CTTCCTCCCG 15841 CTCGCCGGAG TTCGCGGACC GCTTCCGCTC GGACGCACCG GACGGTGTGG ATGTCGTACT 15901 CAACTCGCTG ACCGGTGAAC TCCTCAACGC GTCGCTCGGA CTGCTGCGTC CCGGTGGACG

291

15961 GCTGATCGAG ATGGGCAGGA CCGAACTCCG GGACGCACAG GAGGTGATGG CGCGCCACGG 16021 TGTGTCGTAC CGGGCCTTCG AACTGCTCGA CGCCGGTCCC GACCGTATCG GCCGACTGCT 16081 CACCGAGCTG CTCGCCCTGT TCCACCAGGG CGTGTTCACC CCGCTGCCAC TGCGCGTCCA 16141 GGACGTACGG CAGGCGAGTG ACGCTTTCCG CCACCTCTCC CAGGCGCGCC ACATCGGCAA 16201 GCTGGCCCTC ACCATCCCGC GACCGTTGTC CGGCGGCACC GCACTGATCA CCGGGGGCAC 16261 CGGGACACTG GGCGGTCTGG TGGCTCGCCA ACTGGTGCGG GAGCACGGCG TGACGGAGCT 16321 GGTGCTGGCC AGCCGTCGTG GTGACACCGC TCCGCAGGCG GCGGAGCTGC TCACCGAGCT 16381 GGAGGCCGCC GGGGCGCGGG TGCGGGTGGC CGCATGCGAT GTGTCGGACC GGGACGCCAT 16441 CGCCGCACTC GTCGCCTCGC TGCCGAACCT GCGGAGCGTG GTGCACACGG CCGGTGTCCT 16501 CGACGACGCC GTGATCGGGT CGCTCACCCC GGAGCGGCTG CGGACGGTAC TGCGTCCCAA 16561 GGCGGACGCC GCATGGCATC TGCATGAACT GACCCGGGAC CGGGACCTTG CCGAGTTCGT 16621 GTTGTTCTCC TCGGCGGCCG GAGTACTCGG TGGCCCAGGG CAGGGCAATT ACGCGGCGGC 16681 CAACGCCTTC CTGGACGCGC TGGCCGCGCG CCGCCGGGCA CAGGGACTGC CCGCGACCTC 16741 GCTGGCCTGG GGCTTCTGGG AGCAGCGCAG CGGACTGACC GAACACCTGA CCACCGATCG 16801 GCTCGCCCGG GCCGGCGTCC TGCCGCTGTC CACCGACGAG GGGCTGGTCC TCTTCGACGA 16861 CGCCCGCGCG ACCGGCGACA CCCTGCTGGT GCCGATGCGT TACGAACCGT CCTCGCCGGG 16921 CCCTGAGCCG GTACCCGCCC TGCTGCGTGG CCTCGTACGC GCTCCGCTCG CCCGCGCCCT 16981 TCCGGGCCCG GCCGATGGTG TGGGCAGCGG TGTGGCGGAG GGCCTCACAG GGCTGGCGGC 17041 GGACGAACGC CTCGGCGCAC TGCTCGACCT GGTCCGCCGG GAGGCGGCGG CCGTGCTCGG 17101 CCACGGCGGT CCGGAATCGG TGACACCCCA GCGTCCGTTC AAGGAACTCG GCTTCGACTC 17161 GCTCTCCGCC GTGGAACTGC GCAACCGGCT GCGCGCGGCG ACCGGCCGAC GGCTGGAGGC 17221 CACCCTTGTC TTCGACCACC CCACTCCGGC CGTGCTCGCA CGCCACCTCG ACGCCGAGCT 17281 GTTCGGCGCC ACCGACGTGG CGGCGCCCGT ACCAGCACCG GCGGTCGCGC ACCCGGCCGA 17341 CGAGCCGATC GCCATCGTCG GCATGAGCTG TCGGCTCCCG GCCGGGGTGG ACTCCCCGGA 17401 GGCGCTGTGG AAGCTGCTGG TGAGCGGCAC GGACGCGATA TCGGAGCTGC CCCCCGACCG 17461 CGGCTGGGAC CTTGACAGGC TCTACGACCA GGATCCGAGC CGGCCCGGTA CGACATACGC 17521 CAAGACCGGT GGCTTCCTGA AGAACGCGGC GGACTTCGAC GCGGGATTCT TCACGATCTC 17581 CCCCCGAGAG GCGCTGGCCG CGGATCCCCA GCAACGGCTG TGGCTCGAGG CGTGCTGGGA 17641 AGCCTTCGAA CGCGCCGGTA TCGATCCGCT CGCCCTGAAG GGCACCCGAA CCGGGGTGTT 17701 CGCGGGTGCC GTTTCGACGA CGTACGGCGC GGGTCAGGCC GCCACTCCGG ACGGCTCCGA 17761 GGGGTACCTG CTCACCGGCA ACTCCACCTC CGTGATCTCC GGCCGCGTGG CCTACACCCT 17821 CGGCCTCGAA GGCCCCGCCG TCACCGTGGA CACCGCGTGC TCGTCCTCGC TGGTCAGCGT 17881 GCACTGGGCG TGTGAGTCCC TGCGCCGGGG CGAAAGCACA CTGGCGCTGG CGGGCGGTGT 17941 GGCGGTGATG ACGACACCGG ACCTGCTGGT CGAATTCTCC CGCCAGCGCG GACTCGCACC 18001 GGACGGGCGG TGCAAGTCGT TCGCCGCCGC CGCTGACGGC ACAGGGTTCG CCGAAGGCGT 18061 CGGGGTGCTC GTCCTGGAAC GGCTGTCCGA CGCCACGCGG AACGGCCACC AGGTGCTGGC 18121 GGTGATCCGC GGCTCCGCCG TCAACCAGGA CGGCGCGTCC AACGGTCTGA CCGCGCCGAA 18181 CGGCCCCTCG CAGCAGCGGG TGATCCGGCA GGCGCTGGTG AACGCCGGAC TCGCCTCCCA 18241 GGATGTCGAC GTGGTGGAGG CGCACGGTAC GGGTACGACG CTGGGCGACC CCATCGAGGC 18301 GCAGGCTCTG CTGGCCACCT ACGGCCAGGA CCGGGATCCG GATCGGCCGC TGCTGCTGGG 18361 CTCCGTGAAG TCCAACATCG GGCACACCCA GGCGGCCGCA GGTGCCGCCG GACTCATCAA 18421 GATGGTTCTG GCGCTGCGCA ACGGCGTACT GCCGCGCACC CTGCACGTCG ACGAGCCCTC 18481 CCCGCACGTC GACTGGTCCG CCGGGGCCAT GGAGCTGCTG ACCGAGCAGA CCGCGTGGCC 18541 CGACCGGGAC CACCTGCGCC GGGCCGGGGT GTCCGCGTTC GGAGTGAGCG GCACCAACGC 18601 CCATGTGATC CTCGAACAGG CCCCGGAGCC GGATGAGAAC GGCGAACCGG ACACCGTCCG 18661 GTCGTGGTTG CCCGCGGTGC CCTGGGTGCT GTCGGGCGCG GGAGCGGCCG GGCTTCGGGC 18721 CCAGGCCCAG CGGTTGGCGT CCTTCGTGCG GGAGAACCCC GGGCTCGACC CCGTGGACGT 18781 GGGCTGGTCC CTGGTCGCGA CCCGCGCCGC CCTGTCGCAC CGAGCCGTCG TCGTGGGCGC 18841 GGACCGCACG GAACTGCTGC GCGAGCTGGC CGCGGTGGAA TCCGTGGGCG CCGCCGAGGC 18901 GGAGCGCGAC GTGGTGTTCG TCTTCCCGGG GCAGGGGTCG CAGTGGGTTG GGATGGCGTT 18961 GGAGTTGGTG GAGTCGTCGC CGGTGTTCGC GGGGCGGATG CGTGAATGTG CCGATGCGCT 19021 CGCCCCGTTT GTGGAGTGGT CGTTGTTCGG TGTGTTGGGT GATGAGGTGG CGCTCGGTCG 19081 GGTTGATGTG GTGCAGCCGG TGTTGTGGGC GGTGATGGTG TCGCTGGCGG AGTTGTGGCG 19141 TTCGTTTGGT GTGGTGCCGT CGGTGGTGGT GGGGCATTCG CAGGGTGAGA TTGCGGCGGC 19201 GTGTGTGGCG GGTGCGTTGA CTTTGGAGGA TGGGGCGCGT GTGGTGGCCT TGCGGAGCAG 19261 GGCGTTGCTG GCTCTGTCGG GTCGGGGCGG CATGGTGTCG GTTCCGGTGT CCGCTGATCG 19321 GCTGCGGGGT CGTGTGGGGT TGTCGGTGGC GGCGGTGAAT GGTCCGGTGT CGACGGTGGT

292

19381 GTCGGGGGCT GTTGAGGTGC TGGATGGGGT GCTGGCGGAG TTCCCGGAGG CGAGGCGGAT 19441 TCCGGTGGAT TATGCGTCGC ATTCGGTGCA GGTGGAGGGG ATCCGGGAGG GTTTGGCGGA 19501 GGCGTTGGCG CCGGTTCGGC CGCGTACGGG TGAGGTGCCG TTCTATTCGA CGGTGACCGG 19561 TCGGTTGATG GACACCGTGG GGCTGGACGG GGAGTACTGG TATCGGAATC TGCGTGAGAC 19621 GGTGGAGTTC CAGAGCACCG TCGAAGCTCT GATCGGCCAG GGCCACACGG TGTTCGTCGA 19681 GGCCAGCCCG CATCCGGTGC TGACCGTCGG CGTCCAGGAC ACCGCCGACG CGATGGAGAC 19741 CCCCATAGTG GCCACCGGTT CGCTTCGCCG GGACGAGGGA GGCGTACGAC GGTTCCTGAC 19801 GTCACTGGCT GAGGTATCCG TCCATGGCAT CGAGGTCAAC TGGCAGACGG TCTTCGACGG 19861 CACCGGCGCT CGGCGAGTCG ACTTGCCCAC CTACGCGTTC CAGCGTGAGC GGTTCTGGCT 19921 GGTGCCATCG ACGGGCACGG GCGACGCGTC CGGGCTGGGC CTGGGCGCCG TTGACCATCC 19981 GCTGCTGGGC GCGGCGGTGC CGCTTCCGGA CGCGGACGGC TGTGTGCTGA CCGGTGCGCT 20041 GTCGCTGGCC GGGCAGCCAT GGCTGGCCGA CCACTCCGTC CTCGGCATGG TCCTTCTGCC 20101 GGGCACCGCG TTCGTGGAGC TCGCGTTGCA GGCGGGGGCG CGGTTCGGCT GCGGCACTCT 20161 GGAAGAGCTG ACGTTGCATG AGCCGCTCGT CCTGCCCGAG CGGGAGACCG TGCAGCTCCA 20221 GGTGTCGGTC GGAGGCTCGG ACGACTTCGG AGGCCGCCCC TTCACGGTGT TCTCCCGCTG 20281 TGAGGGTGAG TGGATACGCC ACGCCGGGGG CACCCTGCGT GTGGGCGAGC GTGGCGATCC 20341 GCCCGCGAAC CCGTCGGTCT GGCCACCGGC CGATGCCCGG CCGGTCGATG TCGCCGAGTT 20401 GCACACGACG ATGGCCGAGC GGGGCTATCA GTACGGGCCC GCCTTCCAGG GCCTGCGGAA 20461 GGCATGGATC CGTGACAGCG AAGTGTTTCT CGACGTCGCG TTGCCCGAGC AGGTGAGGGG 20521 CGACGCGGCC CGCTGCGGAG TGCATCCCGC GTTGCTGGAC GCGGCCCTGC AAGGCATCGG 20581 CCTCGGCGCC TTCGTCAACG AACCGGGCCA GGCCCATCTC CCCTTCTCCT GGAGCGGGGT 20641 GACCCTGCAC GCGGTGGGCG CCACTGCCGT GCGGGTGACA CTCAGCCCGG CCGGACCGGA 20701 CACGGTGGCC ATCCGGATGG CGGACACCAT CGGGGCGCCC GTGCTGTCCA TCGACGCGCT 20761 GGCGATGCGT CCGCTCGCGG AGCAGCGGCT GCTCGAGGCG GGTGGCAGCC GCGGCGATGC 20821 GCTGTTCCGG CTGGAGTGGA AGGAGCTTCC CGTCCCCACG GGGGCCACCG GCCCACGGGC 20881 GCAGTCCTGG GGCCTGCTGG GCGGCCACGA CGAGCCTCGA CTGACCGCGG CGCTGACCGC 20941 GGCCGGTGTG TCGCCACAAC GCCATCGGGA CCTCGCCTCC ATCGACCAGG TGCCGGATGT 21001 GCTGGTCCTG TCGTGTCCGC CCGAGGCGGA TGGCGGCCCG GCCCCGGAAG CCACCTCGTC 21061 CGCCCTCCGC CGAGTGCTGG AAGTGGTGCG GGAGTGGCTC GGGGACGCGC GGTACACCGA 21121 TGCCCGACTG ATGGTGCTCA CCCGCCGCGC GGTGGCCACA TCCACCGGTG ACGACGTGGA 21181 GGATCTGGCG GCGGCCGCTG TACGGGGACT CCTGCGCACC GCACAACAGG AGAACCCCGA 21241 CCGGCTCGTC GTGATCGACC ATGACGACTC GGACCTTGAG GTGCTCCCCG TGGTGCTCGG 21301 GACAGGGGAG CCCGAAGCGG CCATCCGGGC CGGTAAGGTG CTGGTGCCCA GGCTGGTCAA 21361 GGCGGCCGTA TCGGAAGGGA AGGCCCCTGC CTGGGACGCC GGCACCGTGC TGATCACCGG 21421 CGGGACGGGG ACACTCGGCG GCCTGGTCGC CCGCCATCTG GTGACCACCC ATGGCGCGCG 21481 TGACCTGGTG CTGGCCAGTC GCGGAGGTGA CACCGCGCCC GGCGCCGTGG AACTGGCCAC 21541 CGAACTGGAG GCGCTCGGTG CCCGCATCCG CGTCGCCGCC TGCGATGTGG CCGATCGTGC 21601 CCAGCTGACC GCGCTGCTCG ACACCATTCC GGCGCTGCGT GCTGTCGTCC ACACCGCAGG 21661 TGTGGTGGAC GACGGTGTCA TCGGCTCGAT GACCGCCGAA CGCGTGGAGA CCGTCCTACG 21721 GCCGAAGGCG AACGCGGCGT GGCACCTGCA CGCGCTGACC CGCCACCTGG ACCTGGACGC 21781 GTTCGTACTG TTCTCCTCCG CCACCGGAGT GCTGGGCAGC GCGGGACAGG GCAACTACGC 21841 CGCGGCCAAC GCCTTCCTCG ACGCGCTGGC CGTGCACCGG CGCGCCCAGG GGCTCCCCGC 21901 GGTGTCGGTG GCATGGGGCC TGTGGGAGCG GCGCAGTGGC CTGACCGCAC ATCTGTCGGA 21961 GCAGGACGTG GCCCGTATGA CCAGCACGGG CGCCGTTCCC CTCTCCGACG AACGCGGTCT 22021 CGAGCTGTTC GACGCCGCGT GCCGGAGTGG CGAACCCACA CTCGTGGCCA CCCCGTTGCA 22081 CCTTCGTGCG GTGGCGGCCA CCGGTACGGT GCCCCACGTG CTCAGCGCGC TGGCACCGAC 22141 CCCGCCACGC CGGGCCGCCG AGGCCGGTGA CGGTGGAGTG GCTCTACGGC AGAGCCTTGC 22201 CGAGATGTCG GGCGCGGAAC AGAGCCAGAC CGTCCTGGGG CTGGTACGCG GGCAGGTCGC 22261 CGCCGTGCTG CGGCACCCGG ACCCGTCGGC GATCGACACG GCGCGGACGT TCCAGGAGAT 22321 CGGCTTCGAC TCGCTGACCG CGGTGGAGCT GCGCAACCGG CTGGGCGCCA CCACCGGGAT 22381 CAGGCTGGCC GCGACCGCGA TCTTCGACTA TCCGACACCT GCCACGCTGG CACAGCACCT 22441 GCTCGCCGAG ATCGTGCCGG AGACCGCCGA CCCGGTCGCG GCCCGGCTCG GCGAGCTGGA 22501 CAAGGTGGCC GCCATGATTT CGGCGATGGC CGAGGACGAC ACCCTGCGCG AGCAGTTGTC 22561 CTCGCGGATG GAGACCATCG TCGCGATGTG GGCCGACCTG CACCGTCCGG AGCGGCCGGG 22621 CACGGTTGAG CGGGACCTCG AATCCGCCTC GCTCGACGAC ATGTTCGGAA TCATCGACCA 22681 GGAACTCGAT GGGTCATGAg ctagcGGCGA AAATGAGACG TTGATCGGCA CGTAAGAGGT 22741 TCCAACTTTC ACCATAATGA AATAAGATCA CTACCGGGCG TATTTTTTGA GTTATCGAGA

293

22801 TTTTCAGGAG CTAAGGAAGC TAAAATGGAG AAAAAAATCA CTGGATATAC CACCGTTGAT 22861 ATATCCCAAT GGCATCGTAA AGAACATTTT GAGGCATTTC AGTCAGTTGC TCAATGTACC 22921 TATAACCAGA CCGTTCAGCT GGATATTACG GCCTTTTTAA AGACCGTAAA GAAAAATAAG 22981 CACAAGTTTT ATCCGGCCTT TATTCACATT CTTGCCCGCC TGATGAATGC TCATCCGGAA 23041 TTCCGTATGG CAATGAAAGA CGGTGAGCTG GTGATATGGG ATAGTGTTCA CCCTTGTTAC 23101 ACCGTTTTCC ATGAGCAAAC TGAAACGTTT TCATCGCTCT GGAGTGAATA CCACGACGAT 23161 TTCCGGCAGT TTCTACACAT ATATTCGCAA GATGTGGCGT GTTACGGTGA AAACCTGGCC 23221 TATTTCCCTA AAGGGTTTAT TGAGAATATG TTTTTCGTCT CAGCCAATCC CTGGGTGAGT 23281 TTCACCAGTT TTGATTTAAA CGTGGCCAAT ATGGACAACT TCTTCGCCCC CGTTTTCACC 23341 ATGGGCAAAT ATTATACGCA AGGCGACAAG GTGCTGATGC CGCTGGCGAT TCAGGTTCAT 23401 CATGCCGTCT GTGATGGCTT CCATGTCGGC AGAATGCTTA ATGAATTACA ACAGTACTGC 23461 GATGAGTGGC AGGGCGGGGC GTAATTTTTT TAAGGCAGTT ATTGGTGCCC TTAAACGCCT 23521 GGTGCTACGC CTGAATAAGT GATAATAAGC GGATGAATGG CAGAAATTCG AAAGCAAATT 23581 CGACCCGGTC GTCGGAAGTT CCTATTCCGA AGTTCCTATT CTATCAGAAG TATAGGAACT 23641 TCTTTCAGCT TTCCGCAACA GTATAActgt gcggtatttc acaccgcata gatccgtcga 23701 gttcaagaga aaaaaaaaga aaaagcaaaa agaaaaaagg aaagcgcgcc tcgttcagaa 23761 tgacacgtat agaatgatgc attaccttgt catcttcagt atcatactgt tcgtatacat 23821 acttactgac attcataggt atacatatat acacatgtat atatatcgta tgctgcagct 23881 ttaaataatc ggtgtcacta cataagaaca cctttggtgg agggaacatc gttggtacca 23941 ttgggcgagg tggcttctct tatggcaacc gcaagagcct tgaacgcact ctcactacgg 24001 tgatgatcat tcttgcctcg cagacaatca acgtggaggg taattctgct agcctctgca 24061 aagctttcaa gaaaatgcgg gatcatctcg caagagagat ctcctacttt ctccctttgc 24121 aaaccaagtt cgacaactgc gtacggcctg ttcgaaagat ctaccaccgc tctggaaagt 24181 gcctcatcca aaggcgcaaa tcctgatcca aaccttttta ctccacgcac tgctcctagg 24241 gcctctttaa aagcttgacc gagagcaatc ccgcagtctt cagtggtgtg atggtcgtct 24301 atgtgtaagt caccaatgca ctcaacgatt agcgaccagc cggaatgctt ggccagagca 24361 tgtatcatat ggtccagaaa ccctatacct gtgtggacgt taatcacttg cgattgtgtg 24421 gcctgttctg ctactgcttc tgcctctttt tctgggaaga tcgagtgctc tatcgctagg 24481 ggaccaccct ttaaagagat cgcaatctga atcttggttt catttgtaat acgctttact 24541 agggctttct gctctgtacc agaaccacca gaaccTTTGT ACAATTCATC CATACCATGG 24601 GTAATACCAG CAGCAGTAAC AAATTCTAAC AAGACCATGT GGTCTCTCTT TTCGTTTGGA 24661 TCTTTGGATA AGGCAGATTG AGTGGATAAG TAATGGTTGT CTGGTAACAA GACTGGACCA 24721 TCACCAATTG GAGTATTTTG TTGATAATGG TCAGCTAATT GAACAGAACC ATCTTCAATG 24781 TTGTGTCTAA TTTTGAAGTT AACTTTGATA CCATTCTTTT GTTTGTCAGC CATGATGTAA 24841 ACATTGTGAG AGTTATAGTT GTATTCCAAT TTGTGACCTA AAATGTTACC ATCTTCTTTA 24901 AAATCAATAC CTTTTAATTC GATTCTATTA ACTAAGGTAT CACCTTCAAA CTTGACTTCA 24961 GCTCTGGTCT TGTAGTTACC GTCATCTTTG AAAAAAATAG TTCTTTCTTG AACATAACCT 25021 TCTGGCATGG CAGACTTGAA AAAGTCATGT TGTTTCATAT GATCTGGGTA TCTcGCAAAA 25081 CATTGAACAC CATAACCGAA AGTAGTGACT AAGGTTGGCC ATGGAACTGG CAATTTACCA 25141 GTAGTACAAA TAAATTTTAA GGTCAATTTA CCGTAAGTAG CATCACCTTC ACCTTCACCG 25201 GAGACAGAAA ATTTGTGACC ATTAACATCA CCATCTAATT CAACCAAAAT TGGGACAACA 25261 CCAGTGAATA ATTCTTCACC TTTAGACATT GTGATGATGT TTTATTTGTT TTGATTGGTG 25321 TCTTGTAAAT AGAAACAAGA GAGAATAATA AACAAGTTAA GAATAAAAAA CCAAAGGATG 25381 AAAAAGAATG AATATGAAAA AGAGTAGAGA ATAACTTTGA AAGGGGACCA TGATATAACT 25441 GGAAAAAAGA GGTTCTTGGA AATGAAAAGT TACCAAAGAG TATTTATAAT TCAGAAAAAA 25501 AAGCCAACGA ATATCGTTTT GATGGCGAGC CTTTTTTTTT TTTTAGGAAG ACACTAAAGG 25561 TACCTAGCAT CATATGGGAA GGAAAGGAAA TCACTTGGAA GACATCACAA GCATTCATTT 25621 ACCAAGAGAA AAAATATGCA TTTTAGCTAA GATCCATTGA ACAAAGCACT CACTCAACTC 25681 AACTGAATGA ACGAAAGAAG AAAGAACAGT AGAAAACACT TTGTGACGGT GCGGAACACA 25741 TTTACGTAGC TATCATGCTG AATTCTACTA TGAAAATCTC CCAATCTGTC GATGGCAAAA 25801 CGACCCACGT GGCAGAGTTG GGTCAAGTGC CAGTTTCTGG ATTAAGTAAC AGATACAGAC 25861 ATCACACGCC ATAGAGGAAT CCCGCCGTTG CGAGAGATGG AAAACAATAG AGCCGAAATT 25921 GTGGAAGCCC GATGTCTGGG TGTACATTTT TTTTTTTTtt CTTTCTTTCT CTTTCAATAA 25981 TCTTTCCTTT TTCCATTTAG CTTGCCGGAA AAACTTTCGG GTAGCGAAAA TCTTTCTGCC 26041 GGAAAAATTA GCTATTTTTT TCTTCCTTAT TATTTTTTTA GTTCTGAAGT TTGACCAGGG 26101 CGCTACCCTG ACCGTATCAC AACCGACGAT CCGGGGTCAT GGCGGCTA //

294

A.5 Sequence of Fragment 3 yeast assembly.

Sequence of Fragment 2 shown schematically in Fig. 2-1. To show the context of the sequence in the chromosome, the last 30 bp of the HO(L) region on the 5’ end, and the pPYK promoter of the acceptor module on the 3’ end are included. The sequence is in Genbank format:

LOCUS mer-fragment-3-031016 38831 bp DNA linear UNA 07- Apr-2012 DEFINITION FEATURES Location/Qualifiers Misc._recombina 1..30 /label=HO-L Primer 1..27 /label=AN531 Primer 1..60 /label=AN133 Primer 1..18 /label=AN348 Misc._recombina 31..78 /label=FRT-WT Primer 36..66 /label=AN614 Primer 43..102 /label=AN613 ORF 79..885 /label=KanMX Misc. 610..1288 /label=AN-cr15-gblock Primer 614..637 /label=AN682 Primer complement(622..644) /label=AN681 Misc._signal 685..704 /label=cr23-target Terminator 886..1125 /label=tKanMX Misc._differenc 1043 /label="different from R1-donor" Primer 1055..1077 /label=LK24 Primer complement(1108..1140) /label=AN615 Primer complement(1117..1176) /label=AN616 Restriction_sit 1126..1131 /label=XbaI Cleavage_site 1126..1131 /label=assembly1 Misc._recombina 1132..1179 /label=FRT-14 Primer complement(1146..1196) /label=AN617 Primer complement(1179..1238)

295

/label=AN492 Amplification 1180..1453 /label="from pSE34" Primer complement(1268..1288) /label=DD77 Primer 1269..1288 /label=AN580 Promoter 1275..1451 /label=ermE Misc._recombina 1385..1444 /label=CRISPR-repair Misc._recombina complement(1423..1488) /label=CRISPR-repair-delete-bamHI Misc._recombina 1427..1446 /label=CRISPR-target Primer complement(1427..1447) /label=AN579 Primer_binding_ complement(1434..1453) /label=AN697 Primer complement(1435..1453) /label=AN20 Primer complement(1435..1475) /label=AN220 Primer 1435..1473 /label=KV1 Mutation 1444 /label="G to T" RBS 1454..1483 /label=merC-RBS ORF 1484..35050 /label=merC Primer 1489..1506 /label=JK22 Active_site 1631..2914 /label=M8-KS Primer complement(1650..1669) /label=LK26 Primer_binding_ complement(1831..1850) /label=GAN1 Primer 1856..1875 /label=JK9 Primer_binding_ complement(2503..2522) /label=GAN2 Primer 2504..2523 /label=JK33 Primer 2913..2932 /label=JK139 Primer 2924..2944 /label=KV6 Primer 2931..2950 /label=JK52 Primer complement(3064..3084) /label=KV3 Active_site 3197..4009 /label=M8-AT Primer 3293..3315 /label=JK10

296

Primer 3296..3318 /label=KV4 Primer_binding_ 3440..3462 /label=GAN3 Primer complement(3441..3462) /label=KV2 Primer_binding_ 3535..3553 /label=GAN10 Primer_binding_ 3634..3653 /label=GAN62 Primer_binding_ complement(3656..3674) /label=GAN25 Primer_binding_ 3969..3990 /label=GAN4 Primer_binding_ complement(3969..3990) /label=GAN5 Primer complement(4393..4409) /label=JK32 Primer 4398..4417 /label=KV10 Primer complement(4544..4565) /label=KV7 Primer complement(4565..4585) /label=JK140 Primer_binding_ 4786..4805 /label=GAN59 Primer_binding_ 4820..4839 /label=GAN63 Active_site 4907..5437 /label=M8-KR Primer_binding_ complement(5057..5076) /label=GAN7 Primer 5057..5076 /label=KV8 Primer 5122..5141 /label=JK2 Primer complement(5255..5273) /label=KV5 Primer complement(5284..5305) /label=JK1 Primer_binding_ 5687..5708 /label=GAN6 Active_site 5711..5947 /label=M8-ACP Primer_binding_ complement(5729..5748) /label=GAN8 Primer 5928..5950 /label=JK35 Primer complement(5963..5980) /label=JK23 Active_site 6005..7273 /label=M9-KS Primer_binding_ 6236..6253 /label=GAN9 Primer_binding_ complement(6236..6253) /label=GAN60 Primer complement(6398..6418)

297

/label=KV11 Primer_binding_ complement(6862..6881) /label=GAN2 Primer 6863..6882 /label=JK33 Primer 6954..6974 /label=JK4 Primer complement(7103..7124) /label=KV9 Primer complement(7104..7124) /label=JK3 Primer 7275..7293 /label=JK37 Primer complement(7487..7506) /label=JK36 Active_site 7556..8368 /label=M9-AT Primer_binding_ complement(7871..7889) /label=GAN12 Primer_binding_ 7894..7912 /label=GAN10 Primer_binding_ 8544..8563 /label=GAN11 Primer_binding_ complement(8592..8611) /label=GAN13 Primer 8704..8722 /label=JK7 Primer complement(8861..8879) /label=JK34 Primer complement(8926..8944) /label=JK6 Primer complement(8926..8945) /label=JK5 Primer 9008..9067 /label=AN341-cr4-repair Misc._RNA 9039..9058 /label="Target mutated sequence" Primer complement(9046..9105) /label=AN342-cr4-repair Mutation 9058 /label="C to T" Primer 9175..9194 /label=JK44 Active_site 9278..9820 /label=M9-KR Primer complement(9353..9372) /label=JK38 Primer_binding_ 9690..9713 /label=GAN14 Primer complement(9962..9982) /label=JK39 Active_site 10028..10285 /label=M9-ACP Active_site 10343..11599 /label=M10-KS Primer 10364..10382 /label=JK40

298

Primer_binding_ complement(10451..10469) /label=GAN16 Primer complement(10686..10705) /label=JK8 Primer_binding_ complement(10982..10999) /label=GAN17 Primer_binding_ 11039..11060

/label=GAN15 Primer 11610..11629 /label=JK46 Primer complement(11658..11677) /label=JK45 Active_site 11909..12781 /label=M10-AT Primer 12211..12230 /label=JK42 Primer complement(12301..12319) /label=JK41 Primer_binding_ 12813..12832 /label=GAN18 Primer_binding_ complement(12919..12937) /label=GAN19 Primer 13435..13454 /label=JK62 Primer complement(13629..13648) /label=JK47 Active_site 13796..14326 /label=M10-KR Primer 14151..14173 /label=JK67 Primer_binding_ complement(14164..14183) /label=GAN20 Primer 14270..14288 /label=JK48 Primer 14271..14288 /label=JK60 Primer complement(14464..14483) /label=JK43 Active_site 14612..14869 /label=M10-ACP Primer 14814..14834 /label=JK64 Primer complement(14918..14937) /label=JK63 Active_site 14927..16213 /label=M11-KS Primer_binding_ complement(15491..15509) /label=GAN22 Primer_binding_ 15533..15551 /label=GAN21 Primer_binding_ complement(16065..16084) /label=GAN23 Primer 16212..16231 /label=JK50 Primer 16230..16249 /label=JK52

299

Primer complement(16363..16383) /label=KV3 Primer complement(16418..16437) /label=JK49 Primer 16418..16438 /label=JK66 Active_site 16496..17308 /label=M11-AT Primer complement(16592..16612) /label=JK65 Primer_binding_ 16839..16858 /label=GAN24 Primer_binding_ complement(16955..16973) /label=GAN25 Primer 17354..17372 /label=JK55 Primer complement(17376..17393) /label=JK58 Primer complement(17477..17496) /label=JK54 Primer complement(17477..17497) /label=JK68 Primer 17848..17870 /label=JK73 Primer complement(18088..18107) /label=JK51 Primer complement(18180..18197) /label=JK53 Active_site 18218..18739 /label=M11-KR Primer_binding_ 18424..18443 /label=GAN26 Primer_binding_ complement(18512..18530) /label=GAN27 Active_site 18995..19240 /label=M11-ACP Primer 19092..19111 /label=JK71 Primer_binding_ complement(19121..19139) /label=GAN28 Primer complement(19265..19285) /label=JK56 Primer complement(19265..19286) /label=JK59 Primer complement(19265..19284) /label=JK61 Primer complement(19265..19286) /label=JK70 Active_site 19307..20590 /label=M12-KS Primer 19517..19537 /label=JK81 Primer complement(19722..19743) /label=JK74 Primer 19776..19797 /label=JK80 Primer_binding_ 20093..20111

300

/label=GAN29 Primer_binding_ complement(20193..20213) /label=GAN31 Primer 20267..20287 /label=JK127 Primer 20589..20608 /label=JK50 Primer 20607..20626 /label=JK52 Primer_binding_ complement(20743..20762) /label=GAN32 Primer 20744..20763 /label=JK91 Primer 20804..20824 /label=JK75 Active_site 20858..21670 /label=M12-AT Primer complement(20957..20978) /label=JK72 Primer_binding_ complement(21303..21321) /label=GAN33 Primer_binding_ 21317..21336 /label=GAN30 Primer 21761..21781 /label=JK83 Primer complement(21826..21845) /label=JK82 Primer complement(22382..22401) /label=JK126 Primer_binding_ 22411..22430 /label=GAN34 Active_site 22598..23119 /label=M12-KR Primer 22790..22809 /label=JK78 Primer 22790..22809 /label=JK92 Primer 22906..22965 /label=AN607 Primer complement(22942..23001) /label=AN608 Primer complement(22966..22986) /label=JK76 Primer complement(22966..22986) /label=JK77 Primer_binding_ 22991..23010 /label=GAN35 Primer_binding_ complement(23009..23027) /label=GAN36 Active_site 23360..23614 /label=M12-ACP Primer 23555..23576 /label=JK85 Primer complement(23669..23689) /label=JK84 Active_site 23672..24955 /label=M13-KS

301

Primer_binding_ 24138..24157 /label=GAN37 Primer_binding_ complement(24142..24161) /label=GAN40 Primer_binding_ 24748..24768 /label=GAN38 Primer_binding_ complement(24762..24781) /label=GAN41 Primer 25066..25085 /label=JK86 Primer 25066..25085 /label=JK95 Primer 25131..25149 /label=JK128 Primer complement(25233..25253) /label=JK79 Primer complement(25233..25253) /label=JK93 Primer complement(25233..25253) /label=JK94 Active_site 25241..26056 /label=M13-AT Primer_binding_ 25295..25315 /label=GAN39 Primer_binding_ complement(25360..25379) /label=GAN42 Primer 25859..25878 /label=JK97 Primer complement(25967..25987) /label=JK96 Primer 26318..26338 /label=JK133 Active_site 26393..26917 /label=M13-DH Primer_binding_ 26460..26479 /label=GAN43 Primer_binding_ complement(26512..26531) /label=GAN44 Primer_binding_ complement(27076..27096) /label=GAN45 Primer 27076..27096 /label=JK98 Mutation 27250 /label="G to A" Primer 27269..27289 /label=JK89 Primer complement(27367..27386) /label=JK87 Primer complement(27367..27386) /label=JK88 Primer complement(27398..27417) /label=JK129 Primer 27488..27508 /label=JK134 Primer complement(27743..27762) /label=JK107 Active_site 27950..28858

302

/label=M13-ER Primer 28220..28241 /label=JK108 Primer complement(28292..28313) /label=JK132 Primer 28747..28767 /label=JK104 Primer complement(28846..28866) /label=JK103 Primer complement(28850..28869) /label=JK135 Active_site 28856..29422 /label=M13-KR Primer_binding_ complement(29435..29455) /label=GAN47 Primer 29435..29455 /label=JK101 Primer 29612..29630 /label=JK130 Active_site 29696..29953 /label=M13-ACP Primer complement(29723..29742) /label=JK90 Primer complement(29723..29742) /label=JK99 Primer complement(29723..29742) /label=JK100 Primer_binding_ 30007..30025 /label=GAN46 Active_site 30014..31291 /label=M14-KS Primer_binding_ complement(30032..30051) /label=GAN48 Primer 30426..30447 /label=JK106 Primer_binding_ 30464..30481 /label=JK136 Primer complement(30634..30654) /label=JK105 Primer_binding_ 31030..31051 /label=GAN49 Primer_binding_ complement(31030..31051) /label=GAN51 Primer 31546..31565 /label=JK111 Active_site 31637..32506 /label=M14-AT Primer_binding_ 31665..31684 /label=GAN50 Primer complement(31679..31700) /label=JK102 Primer complement(31684..31706) /label=JK131 Primer_binding_ 32230..32250 /label=JK137 Primer 32240..32261 /label=JK122

303

Primer_binding_ complement(32246..32265) /label=GAN61 Primer complement(32401..32420) /label=JK121 Active_site 32684..33103 /label=M14-DH Primer_binding_ complement(32891..32914) /label=GAN53 Primer 32891..32914 /label=JK125 Primer_binding_ 33599..33618 /label=GAN52 Primer_binding_ complement(33616..33635) /label=GAN54 Primer 33747..33765 /label=JK113 Primer complement(33878..33899) /label=JK112 Primer 33997 /label="C to A mutation" Active_site 34001..34477 /label=M14-KR Primer 34270..34289 /label=JK124 Primer complement(34352..34372) /label=JK123 Primer_binding_ complement(34364..34384) /label=JK138 Active_site 34631..34888 /label=M14-ACP Primer_binding_ complement(34843..34862) /label=GAN56 Source 34912..35225 /label=gblock-AN-ptKanR Primer 34912..34931 /label=JK115 Primer complement(35032..35050) /label=JK114 Promoter 35057..35195 /label="from pSPL" Mutation 35069 /label="G to A" Primer 35196..35213 /label=JK117 ORF 35196..36206 /label=ant3ia-specm-strpm Primer complement(35205..35225) /label=JK116 Primer_binding_ complement(35451..35474) /label=GAN57 Primer_binding_ 35537..35559 /label=GAN55 Primer_binding_ complement(36006..36025) /label=GAN58 Source 36178..36363 /label=ptKanR Primer 36178..36198

304

/label=JK119 Primer complement(36182..36206) /label=JK118 Terminator 36207..36277 /label="kanR terminator" Mutation 36274 /label="G to T" Misc._recombina 36278..36325 /label=FRT-13 Primer complement(36326..36355) /label=LW375 Primer complement(36339..36365) /label=JK120 Primer complement(36427..36442) /label=AN112 ORF complement(36555..37258) /label=HIS3 Primer complement(36637..36658) /label=AN272 ORF complement(37259..37972) /label=GFP Promoter complement(37973..38831) /label=pPYK ORIGIN 1 aaaattgtgc ctttggactt aaaatggcgt GAAGTTCCTA TTCCGAAGTT CCTATTCTCT 61 AGAAAGTATA GGAACTTCgg taaggaaaag actcacgttt cgaggccgcg attaaattcc 121 aacatggatg ctgatttata tgggtataaa tgggctcgcg ataatgtcgg gcaatcaggt 181 gcgacaatct atcgattgta tgggaagccc gatgcgccag agttgtttct gaaacatggc 241 aaaggtagcg ttgccaatga tgttacagat gagatggtca gactaaactg gctgacggaa 301 tttatgcctc ttccgaccat caagcatttt atccgtactc ctgatgatgc atggttactc 361 accactgcga tccccggcaa aacagcattc caggtattag aagaatatcc tgattcaggt 421 gaaaatattg ttgatgcgct ggcagtgttc ctgcgccggt tgcattcgat tcctgtttgt 481 aattgtcctt ttaacagcga tcgcgtattt cgtctcgctc aggcgcaatc acgaatgaat 541 aacggtttgg ttgatgcgag tgattttgat gacgagcgta atggctggcc tgttgaacaa 601 gtctggaaag aaatgcataa gcttttgcca ttctcaccgg attcagtcgt cactcatggt 661 gatttctcac ttgataacct tatttttgac gaggggaaat taataggttg tattgatgtt 721 ggacgagtcg gaatcgcaga ccgataccag gatcttgcca tcctatggaa ctgcctcggt 781 gagttttctc cttcattaca gaaacggctt tttcaaaaat atggtattga taatcctgat 841 atgaataaat tgcagtttca tttgatgctc gatgagtttt tctaatcagt actgacaata 901 aaaagattct tgttttcaag aacttgtcat ttgtatagtt tttttatatt gtagttgttc 961 tattttaatc aaatgttagc gtgatttata ttttttttcg cctcgacatc atctgcccag 1021 atgcgaagtt aagtgcgcag aaagtaatat catgcgtcaa tcgtatgtga atgctggtcg 1081 ctatactgct gtcgattcga tactaacgcc gccatccagt gtcgatctag aGAAGTTCCT 1141 ATTCCGAAGT TCCTATTCTA TCAGAAGTAT AGGAACTTCt accAGCCCGA CCCGAGCACG 1201 CGCCGGCACG CCTGGTCGAT GTCGGACCGG AGTTCGAGGT ACGCGGCTTG CAGGTCCAGG 1261 AAGGGGACGT CCATGCGAGT GTCCGTTCGA GTGGCGGCTT GCGCCCGATG CTAGTCGCGG 1321 TTGATCGGCG ATCGCAGGTG CACGCGGTCG ATCTTGACGG CTGGCGAGAG GTGCGGGGAG 1381 GATCTGACCG ACGCGGTCCA CACGTGGCAC CGCGATGCTG TTGTGGGCAC AATCGTGCCG 1441 GTTGGTAgga tccCGGAATC ATCGACCAGG AACTCGATGG GTCATGAGCA GCGAGAACGT 1501 CCGACCGGAA ATCGAGGGGA CTGGCACGCG GATGTCGAAC GACGAAAAGG TACTCGAGTA 1561 CCTCAAGAAG CTCACCGCCG ATCTGCGCCA GACGCGTCAG CGTCTCCAGG ACGTCGAGGC 1621 CAAGAGCCGC GAGCCGATCG CGATCGTCGG TATGAGCTGC CGTTTCCCGG GTGGGGTGAG 1681 CTCCCCGGAA GACCTGTGGC GGCTGACGGA GTCTGCGGTG GACGCGGTCT CCGGTTTCCC 1741 CACGGACCGA GGCTGGGACC TGGACGGTCT GTACGACCCC GACCCGGATC GCGCGGGCCG 1801 GTCGTACGCC CGAGAGGGCG CGTTCATCCC CGATGCAGGC CACTTCGACC CCGGCCTCTT 1861 CGGGATCTCG CCACGTGAGG CGCTGGCGAT GGATCCGCAG CAGCGGCTGC TGCTGGAGGC 1921 ATCGTGGGAG GCCCTGGAGC GGGCGGGTAT TCCCACCGAT TCCCTGAAGG GCAGCCGGAC

305

1981 CGGGGTGTTC GCCGGACTGA TGTCTTCCGA CTATGTCTCG CGGCTGTCCG CGGTCCCGGA 2041 CGAACTCGAG GGGTACGTCG GAATCGGAAG CGCGGCGAGC GTCGCCTCCG GCCGCGTGTC 2101 GTACACCCTG GGGCTTGAGG GCCCGGCGGT CACCGTGGAC ACGGCGTGTT CGTCGTCGTT 2161 GGTGGCGTTG CATCTGGCGG TGCAGGCATT GCGGTCGGGT GAGTGCTCGC TGGCGCTCGC 2221 GGGCGGTGTC ACGGTGATGG CGACACCCGG CACCTTCGTC CAGTTCTCCC GCCAGCGCGG 2281 CCTGGCCGCC GACGGCCGGT GCAAGGCGTT CGCGGCGGGG GCCGACGGTA CCGGCTGGGG 2341 CGAAGGCGTC GGCATGCTGG TGGTGGAGCG GTTGTCGGAT GCTGAGCGGC TTGGGCATCG 2401 GGTGTTGGCG GTGGTGCGTG GTTCTGCGGT GAATCAGGAT GGTGCGTCGA ATGGTTTGAC 2461 GGCGCCGAAT GGTCCGTCGC AGCAGCGGGT GATCCGTCAG GCGTTGGCGA ATGCCCGTTT 2521 GTCGGCGGTG GATGTGGATG CGGTGGAGGC GCATGGTACG GGTACGGCGT TGGGTGATCC 2581 GATTGAGGCG CAGGCGTTGT TGGCCACGTA TGGTCAGGGT CGGGATGTGG GTCGGCCGTT 2641 GTGGTTGGGG TCGGTGAAGT CGAACATCGG TCATACGCAG GCGGCTGCGG GTGTGGCTGG 2701 TGTGATCAAG ATGGTGATGG CGATGCGGCA TGGGGTGTTG CCGCGGACGT TGCATGTGGA 2761 TGAGCCGTCG CCGCATGTGG ATTGGTCTGC TGGTGCGGTT GAGTTGTTGA CGGGGCAGGT 2821 GGCGTGGCCG GAGGTGGATC GGCCGCGTCG GGCGGGTGTG TCGGCGTTCG GGGTGAGTGG 2881 GACGAATGCG CATGTGATTG TGGAGCAGGC GCCTGAAGTG GCGGAGTCTG AGGCTGAAGG 2941 TGTGGTGTTG CCTGCTGTGC CGTGGGTGGT GTCGGGTGTG GGTGAGGTGG CGGTGCGGGC 3001 GCAGGTGGAG CGGTTGCGGG CCTTTGCGGA CCGGAATCCG GGTCTGGATC CGGTGGATGT 3061 GGGGTGGTCT TTGGCGACTG GTCGTGCGGG GTTGTCGCAT CGTGCGGTGG TGGTGGGTGC 3121 GGGTCGTGGT GAGTTGTTGG GGGCTTTGGA GGGTGTGCCG GTGGTGGGTG TTCCGGTGGT 3181 GGGTGGGTTG GGTGTGTTGT TTGCGGGTCA GGGGTCGCAG CGGTTGGGGA TGGGGCGTGG 3241 GTTGTATGAG GGGTATCCGG TGTTCGCTGC GGTGTGGGAT GAGGTGTGCG CGCAGCTGGA 3301 TCGGTATTTG GATAGGCCGG TGGGTGAGGT GGTGTGGGGT GATGATGCCG GGTTGGTCGG 3361 GGAGACGGTG TATGCGCAGG CGGGGTTGTT CGCGCTTGAG GTGGCGTTGT ATCGGCTGAT 3421 CGCTTCGTGG GGTGTGAGGG CGGATTATCT GCTGGGTCAT TCGATTGGTG AGTTGGCTGC 3481 GGCGTATGTG GCGGGTGTGT GGTCGTTGGA GGATGCGGTG AGGGTGGTGG TGGCGCGGGG 3541 GCGTTTGATG CAGGCGTTGC CGTCGGGTGG TGCGATGGTT GCGGTGGGGG CGTCGGAGGG 3601 TGTGGTGCGG CCGCTGCTGG GCGAGGGTGT GGTGGTTGCG GCGGTGAATG GTCCCGAGTC 3661 GGTGGTGCTG TCGGGTGATG AGGATGCGGT TCAGGTTGTG GTGGATGTGT TGGCTGGGCG 3721 TGGGGTGCGG ACGCGGCGGT TGCGGGTGAG TCATGCGTTT CATTCGGCTC GTATGGACGG 3781 CATGCTGGCG GAGTTCGGTG AGGTGCTTCG GGGCGTGGAG TTCCGTGCCC CGAGCGTGCC 3841 CGTGGTGTCG AACGTGTCCG GTGTGGTGGC GGGCGAGGAG TTGTGTTCGC CGGAGTATTG 3901 GGTGCGGCAT GTGCGGGAGA CGGTCCGGTT CGCCGATGGG CTGGAGACGC TGCGCGAGCT 3961 GGGTGTGGGT TCGTTCCTGG AGTTGGGACC GGACGGGACA TTGACCGCGC TGGCCGACGG 4021 CGATGGTGTG TCGGCGCTGC GCCGGGACCG TCCGGAACCG ACTGCGGTAA TGGCTGCTTT 4081 GGGTGGGTTG TATGTCCGGG GTGTGGAGGT CGACTGGGAC GCGGTGTTCC CGGGTGCTCG 4141 GCGGGTCGAT TTGCCGACGT ATGCCTTCCA GCGTGAGCGG TTCTGGCTGG AACCGGCCGC 4201 TGAGCAGCCT GCGACGAGCG CGGTGGACGC GGCGTTCTGG GACGCGGTCG AGCGGGGCGA 4261 TGCGGAGATT CTCGGGGTTG ACGTTGAGCA GCCGTTGAGT GCCGCGTTGC CCGCATTGGC 4321 GTCGTGGCGA CGGGCGCGGC AGGAAGAGTC GGTCATCGAC GCATGGCGGT ATCGGCTGAC 4381 CTGGACCCCG GTCGCGGGTC TCTCTTCGCA GCTCTCCGGC GTGTGGTTGG TGGTGGTCGA 4441 GCCGGACGAG GCGGAGCCGG ACGTCGTCGC CGCGCTGCGG GGCGCCGGCG CCGAGGTGCG 4501 TGTCGTAACG ATCGATGAGC TGGACGCGGG CCCGGTCGCG GGCGTGGTGT CTTTGTTGTC 4561 GGTCGAGACG ACGGTGTCAT TGCTCCAGGC CCTTGTGGCA GAGGGGGGCG ATGCGCCGTT 4621 GTGGTGTGTC ACTCGGGGTG CGGTCTCCGT GGTGGACGGG GATGTGGTGG ATCCGCATGC 4681 GTCGGCCGTC TGGGGTTTGG GCCGTGTGAT CGGTCTGGAG CATCCGGACC GTTGGGGCGG 4741 GCTGATCGAT CTGCCCACCG CATGGGGTGA GCGAACCTCC GGCATGTTGT GCTCGGTGCT 4801 TTCGGGCGCC ACGGGTGAGG ACCACACAGC GATCCGTGGC GACGAGGTGT TGGGTTGTCG 4861 TCTGAGCCGT GCGACGACGT CGGCACCGGG GCCGTCCACT GCCTGGGAAG CGTCGGGGAC 4921 CGCGCTGATC ACCGGTGGAA CGGGTGCCTT GGGGAGCCAT GTCGCCCGAT GGCTCGCGGA 4981 TACCGGCGTC GAAGAGATCG TGCTGACGAG CCGACGAGGC GCGGACGCTC CCGGAGCACG 5041 GGAACTGGTC GCCGAACTGT CGGCCATGGG CGTATCGGCC CGCGTCGTGG CGTGTGATGT 5101 GGCCGATCGG GACGCGGTTG CGGAGCTGAT CGAGACCATT CCGGACCTCC GCGTGGTCGT 5161 CCACGCCGCG GGAGTACCGA GCTGGGGTGC GTTGAGCACA CTTACCGCAC AGGGCCTTCA 5221 GGATGGGATG CGGGCGAAGG TCGCGGGAGC CATCCACCTG GATGAGCTGA CGCGCGATAT 5281 GCGCTTGGAC GCCTTTGTGT TGTTCTCGTC GGTGGCGGGG GTGTGGGGGA GCGGTAGTCA 5341 GTCGGCGTAT GCGGCGGCGA ACGCGTTTCT GGATGGGTTG GCGTGGCGGC GTCGTGGTGT

306

5401 TGGGTTGGTG GCGACGTCGG TGGCGTGGGG GATGTGGGGT GGCGGTGGTA TGGCGGTTGG 5461 GGGTGAGGAG TTTCTGGTTG AGCGTGGTGT GTCGGGGATG GCTCCGGGGT CGGCGGTGGC 5521 TGCGTTGCGG CGGGCGCTGT GTGATGGTGA GACGGCGCTT GTGGTGGCGG ATGTGGATTG 5581 GGAGCGGTTC GGGCCGAGGT TCACCGCGTT GCGTCCGAGC CCACTGCTGA GCGAGCTGAT 5641 CCCCGACACC GTCGGCTCGG GGGTTCCGCT GGGTGAATTC GCGGCCCGTT TCCAGACCAT 5701 GTCCGAGGGC GAGCGCATGC GCGCGGCCGT CGAGCTGGTG CGTGTTTCGG CCGCGGCCGT 5761 GCTGGGGCAC CAGGGCCCGG AGGCCATCGA TCCCGTCAGG ACGTTCCAGG AGATCGGCTT 5821 CGACTCGCTG ACCGCGGTGG AACTGCGCAA CCGGATCGCC ACGGCTACCG GTATCCGCCC 5881 GCCGGCCACG ATGGTCTTCG ACTATCCGAC TCCTGTGGCC CTCGCCGAAT ATCTGAGCGT 5941 GGAATTGCTC GGTTCGCCGC AGGACAGTGT GCCGCCGTTG CAGGTGGCCG CGCCGGACGA 6001 CGGTGATCCC ATTGTCATCG TCGGCATGAG CTGCCGCTTC CCCGGGGACG TCGAGTCTCC 6061 CGAGGATCTG TGGCGGTTGA TCGACTCCGA CGGCGATGCC ATAACGGCCT TTCCGACGGA 6121 CCGTGGATGG GACCTGACCG GCCTCTTCGA CACGGCTGTG GGGGAGTCGG GGACGTCGTA 6181 TGCGCGTGTT GGTGGCTTCG TCCACGACGC GGGTGAGTTC GATCCGGCCT TCTTCGGTAT 6241 CTCGCCGCGT GAGGCGACCG CGATGGATCC GCAGCAGCGG CTGCTGCTGC ACGCGGCATG 6301 GGAGGCGTTC GAGCGGGCCG GTATCCCGGC CGCCTCGGTC AGGGGCAGCA GGACTGGAGT 6361 GTTCGTCGGA GCCTCGCCGC AGGGCTATGG CGCCGCCGAA GCGTCGGAAG GCTATTTCCT 6421 CACCGGTAGT TCGGGCAGTG TCATTTCGGG TCGCGTGTCG TACACGCTGG GTCTTGAGGG 6481 CCCGGCGGTC ACGGTGGATA CGGCGTGTTC GTCGTCGTTG GTGGCGTTGC ATCTGGCGGT 6541 GCAGGCGTTG CGGTCGGGCG AGTGTTCGCT GGCGCTCGCG GGCGGTGTCA CGGTGATGGC 6601 GACACCCACT GCTTTCGTGG AGTTCTCGCG TCAGCGTGGG CTGGCCGCCG ATGGCCGCTG 6661 CAAGTCCTTT GCCGCTGGTG CGGATGGGAC AGGTTGGTCG GAGGGTGTTG GGCTGCTGCT 6721 GGTGGAGCGG TTGTCGGATG CGGAGCGGCT TGGGCATCGG GTGCTGGCGG TGGTGCGTGG 6781 TTCTGCGGTG AATCAGGATG GTGCGTCGAA TGGTTTGACG GCGCCGAATG GTCCGTCGCA 6841 GCAGCGGGTG ATCCGTCAGG CGTTGGCGAA TGCCCGTTTG TCGGCGGTGG ATGTGGATGC 6901 GGTGGAGGCG CATGGTACGG GTACGGCGTT GGGTGATCCG ATCGAGGCGC AGGCCCTGTT 6961 GGCCACGTAT GGTCAGGGTC GGGATGTGGG TCGGCCGTTG TGGTTGGGGT CGGTGAAGTC 7021 GAATATTGGT CATACGCAGG CGGCTGCGGG TGTGGCTGGT GTGATCAAGA TGGTGATGGC 7081 GCTGCGGCAT GGGGTGCTGC CGCGAACGTT GCACGTCGAT GAACCCTCCC CGCATGTGGA 7141 CTGGTCGTCC GGGGCGGTCG AGTTGTTGAG CGAGAGGGCT GCTTGGCCGG AGATGGGCCG 7201 ACCGCGTCGG GCGGGCGTGT CGTCGTTCGG GGTGAGCGGG ACGAACGCGC ATGTGGTGTT 7261 GGAGCAGGCT CCTGGGGCGG TGGAGGAGTC TCGGGGCGAG GGTGTTGCGT TGCCTGCTGT 7321 GCCGTGGGTG GTGTCGGGTG CGGGTGAGGT GGCGGTGCGG GCGCAGGTGG AGCGGTTGCG 7381 GGCCTTCGCG GACCGGAATC CGGGTCTGGA TCCGGTGGAT GTGGGGTGGT CTTTGGTGGC 7441 CACTCGTTCT GGGTTGTCGC ATCGTGCGGT GGTGGTGGGT GCGGATCGTG AGGAGTTGCT 7501 GGGTGGGTTG GGTTCGGTGG TGGTGGGTGT TCCGGTTGCG GGTGGGTTGG GTGTGTTGTT 7561 TGCGGGTCAG GGGTCGCAGC GGTTGGGGAT GGGTCGTGGG TTGTATGAGG GGTATCCGGT 7621 GTTCGCTGCG GTGTGGGATG AGGTGTGCGG GGAGCTGGAT CGGTATCTGG ATAGGCCGGT 7681 GGGTGAGGTG GTGTGGGGTG ATGATGCCGG GTTGGTCGGG GAGACGGTGT ATGCGCAGGC 7741 GGGGTTGTTC GCGCTGGAGG TGTCGCTGTA TCGGCTGATC GCTTCGTGGG GTGTGAGGGG 7801 GGATTATCTG CTGGGTCATT CGATTGGTGA GTTGGCTGCG GCGTATGTGG CGGGTGTGTG 7861 GTCGTTGGAG GATGCGGGGA GGGTGGTGGT GGCGCGGGGG CGTTTGATGC AGGCGTTGCC 7921 GTCGGGTGGT GCGATGGTTG CGGTGGCGGC GTCGGAGGGT GAGGTGCGGC CGCTGCTGGG 7981 CGAGGGTGTG GTGGTTGCGG CGGTGAATGG TCCCGAGTCG GTGGTGGTCT CGGGGGATGA 8041 GGATGCGGTT GAGGCGGTTG TGGATGTGTT GGCTGGGCGT GGGGTGCGGA CGCGGCGGTT 8101 GCGGGTGAGT CATGCGTTTC ATTCGGCTCG TATGGACGGG ATGCTCGCGG AGTTCGGTGA 8161 GGTGCTTCGG GGCGTGGAGT TCCGTGCCCC GAGCGTGCCC GTGGTGTCGA ACGTGTCCGG 8221 TGCGGTGGCC GGTGAAGAGC TCTGCTCGCC GGAGTATTGG GTGCGTCATG TGCGGGAGAC 8281 GGTCCGATTC GCGGATGGGC TGGAGACGCT CCGTGAGCTG GGTGTTGGTT CGTTCCTGGA 8341 GTTGGGGCCT GACGGGACGT TGACCGCCTT GGCGGATGGC GATGGTGTGC CTGTCTTGCG 8401 TCGGGATCGT CCGGAGCCTC TGACCGTTAT GGCGGCTTTG GGTGGGCTGT ACGTCCGGGG 8461 TGTCCAGATC GACTGGGATG CGGTGTTCCC GGGTGCTCGG CGGGTTGATT TGCCGACGTA 8521 TGCCTTCCAG CGTGAGCGGT TCTGGTTGGA GCCGTCCCCT GAGCAGCCCA CGACGAGCGC 8581 GGCGGACGCG GCGTTCTGGG ATGCGGTTGA GCGTGGGGAT CTCGGTTCTT TCGGTATCGA 8641 TGCCGAACAG CCGCTCAGCG CCGCACTGCC CGCCCTCTCG TCCTGGCGCC GCCGTCACCA 8701 GGAGAGGTCA CTCGTCGAGT CCTGGCGGTA CCGCCTCGAC TGGTCCCCGA TCGGCACCGC 8761 TTCCGAGCAG CCGAGTCTGC GCGGCACGTG GCTGGTGGTG GGCGAGGGCG GAGACGACGT

307

8821 GGTCGCCGTG CTGCGGGCTG CGGGGGCCGA TGCGCGAGTT GTGACAATGG CGGAGCTGGG 8881 CGAGGTCGCG GCTGCGGGTG TGGTGTCGTT GTTGCCGGTC GAGGCGACGG TGTCACTGGT 8941 GCAGGCACTG GGGACGGCCG GGGCCGATGC GCCGTTGTGG TGTGTGACTC GGGGTGCGGT 9001 GTCGGTGGTC GATGGTGATG TGGTGGATCC GGGGCAGTCG GGGGTGTGGG GTCTTGGCCG 9061 GGTGATCCGT TTGGAGCATC CGGATCGTTG GGGTGGTCTG ATCGATGTGC CGGTGGTGGT 9121 GGATGAGGAG GCCGGGGCTT GGTTGTGCCG GGTGTTGGGT GGGGGTACGG GGGAGGACCA 9181 GGTTGCGGTT CGTGGTGGTG GGGCGTGGGG TGCTCGGCTG GTGCGGGTGT CGGGCTCGGG 9241 TTCGGGATCG GGTGGGGCGG TTGTGTGGCG GGGTCGAGGG GCGGCGTTGG TGACGGGCGG 9301 TACGGGTGCG TTGGGTGGTC ATGTGGCGCG GTGGTTGGCC GGTGCTGGTG TGGAGACTGT 9361 TGTGCTGGCG AGTCGTCGGG GGATGGCTGC GCCGGATGCG GAGCAGCTGG TCGCGGAGTT 9421 GGAGGGGTTG GGTGTTGCGG TGCGGGTGGT GGCGTGTGAT GTGGCGGATC GGGGTGCGGT 9481 GGCGGAGTTG TTGGAGGGGA TTGGGGATTT GCGTGTGGTG GTGCATGCGG CGGGTGTGCT 9541 GGATGACGGT GTGTTGGAGT CGCTGACGTC TGAGCGGGTT CGTGAGGTGA TGCGGGTCAA 9601 GGCGGAGGGT GCGCGGTATC TGGATGAGTT GACGCGGGGT TGGGATCTGG ATGCGTTTGT 9661 GTTGTTTTCT TCGGCTGCGG GGACTGTGGG TAATGCGGGT CAGGGGAGTT ATGCGGCGGC 9721 GAATGCGGTG TTGGACGGGT TGGCTTGGCG GCGTCGGGCG GAGGGGTTGG TGGCCACGTC 9781 GGTGGCTTGG GGAGCCTGGG CCGACAGCGG CATGGGGGCT GGGCACGCAC GGGCCATGGC 9841 ACCACGGCTG GCGTTGGCAG CCCTTCAGCG AGCGTTGGAC GACGACGAGA CCGCACTGAT 9901 GATCGCGGAC GTGGATTGGT CGAGCTTCGG CTCCCGGTTC ACCGCCGTGC GGCCCAGCCC 9961 GCTGCTCGGT GAATTGCTGG GTGGCGCCGC TCATCCCGCG CCCGCGGTGG GCGGGTTCGT 10021 CGACCGGCTA CGGGACCTCC CCCCGGCCGA GCGGGAACGG ACGGTCCTTG AGCTCGTACG 10081 TGGCCAGGTG GCCGTCGTTC TGGGACATGC CACCCCGGGG GCGATCGACA CCGCAGCGAC 10141 ATTCCAGTCA GCCGGTTTCG ACTCCCTGAC CGCGATCGAA CTCCGCAATC GGCTCATGGC 10201 GGCCACCGGA GTGCAGACAC CTGCCTCGGT CGTCTTCGAC TACCCCACTC CGGAACTTCT 10261 CGCCGGCCAC CTGCGGGAGC AACTGCTCGG GGCAGGGTCG GCAGCACTCT CGACGACGGT 10321 CGCCACGGCT CCGGTCGATG ACGACCCGAT TGCGATCATC GGCATGAGCT GCCGATTCCC 10381 TGGTGGTGTC GACTCGCCCG AAGAGCTGTG GCGGCTCCTG GAGTCGGGGA CGGATGCCAT 10441 TTCCGCCTTT CCACAAGACC GCGGCTGGGA CCTCGTGGGC GGAGTCGATG GCGCGTCGGT 10501 CCGGGCGGGT GGCTTCCTCT ACACGGCGGC CGAGTTCGAC CCCGCGTTCT TCGGGATCTC 10561 GCCGCGCGAG GCGATCGCGA TGGATCCGCA GCAGCGGCTG CTGCTCGAGG CCTCGTGGGA 10621 GGTCTTCGAG CGGGCCGGGA TCGCCGCGGA CGCGTTGCGG GACAGCCCCA CCGGAGTGTT 10681 CGTCGGCACC AACGGCCAGG ATTACGCCGC CCTCGTCGGT AACGCGCCAC AGCGTGCGGA 10741 CGGCCATCTG GCCACCGGCA GCGCGGCGAG CGTGGCATCC GGCCGACTGT CCTACACCTT 10801 CGGGCTCGAG GGCCCGGCCA TCACCGTGGA CACCGCGTGT TCGTCGTCGC TGGTGGCCAT 10861 GCACCTGGCC GCGCAGGCGC TGCGCTCGGG CGAATGCCGT ATGGCCCTTG CGGGCGGCGC 10921 CACGGTAATG GCCACGCCCA CCGCGTTCGC CGAGTTCTCC CGGCAAGGCG CGTTGGCCGC 10981 TGATGGCCGG TGCAAGGCGT TCGCGGCGGG CGCGGACGGC ACCGGCTGGG GCGAAGGCGT 11041 AGGCATTCTG CTGTTGGAGC GGCTGTCCGA CGCCGAGCGG AACGGCCACC GGGTGCTGGC 11101 GGTGATGCGT GGCTCCGCCG TCAACCAGGA TGGTGCGTCG AATGGTTTGA CGGCGCCGAA 11161 TGGTCCGTCG CAGCAGCGGG TGATCCGGCA GGCGCTGGCG AACGCACGTC TGTCCACAGT 11221 AGACGTGGAC GCGGTGGAGG CGCACGGTAC GGGTACGACG CTGGGCGACC CCATCGAGGC 11281 GCAGGCTCTG CTGGCCACCT ACGGCCAGGA CCGGGATCCG GATCGGCCGC TGCTGCTGGG 11341 CTCCGTGAAG TCCAACATCG GCCATACGCA GGCCGCGGCC GGTGTGGCTG GTGTGATCAA 11401 GATGGTGATG GCGATGCGCC ACGGCGTGCT GCCGCGGAGC CTACACATCG ACGAGCCCAC 11461 TCCCCACGTG GACTGGACGG CCGGACGGAT CGCACTGCTC ACCGAACCGT CCCCCTGGCC 11521 TCTGACGGGA GCGCCGCGAC GCGCCGCCGT CTCCTCGTTC GGTGTGAGTG GCACCAACGC 11581 GCATGTGATC CTCGAACAGG CATCTGCGGT GGCCGAACCC GAGGAAACCG ACACGGCGCG 11641 AACACCCGAA CCGCCAGCTG TTCCGTGGGT GCTCTCGGCA CGGAGCGAGG CGGGGCTACG 11701 GGCGCATGCC CTCAGGCTTC GGTCCTTCGT GAACGCCGAT GCTGATCTGC GTCCAGTCGA 11761 TGTCGGCTGG TCGCTGGCGT CGGCTCGCTC GGTGTTGTCA CACCGTGCGG TGGTCGTGGG 11821 CGCGGACCGC GATGAACTCC TCCGTGAACT GGAGGCCGTG GCCAGTGGCA GCGTCACGGT 11881 CGGCGAGGCC CGCACGCATT CCGGGGTGGT GTTTGTCTTC CCGGGGCAGG GGTCGCAGTG 11941 GGTTGGGATG GCGTTGGAGC TCCTGGAGCA TTCGCCGGTG TTCGCGGGGC GGATGCGTGA 12001 TTGTGCGGAT GCGTTGGCGC CGTTTGTGGA GTGGTCGTTG TTCGATGTGT TGGGTGATGA 12061 GGTGGCGCTC GGTCGGGTTG ATGTGGTGCA GCCGGTGTTG TGGGCGGTGA TGGTGTCGCT 12121 GGCCGAGTTG TGGCGTTCGT TTGGTGTGGT GCCGTCGGCG GTGGTGGGGC ATTCGCAGGG 12181 TGAGATCGCG GCGGCGTGTG TGGCCGGGGG TCTGTCGTTG GAGGACGGTG CCCGTGTGGT

308

12241 GGCCTTGCGG AGCAGGGCGT TGCTGGCTCT GTCGGGTAGG GGCGGCATGG TGTCGGTTCC 12301 GGTTTCTGCT GACCGGCTGC GGGGTCGTGT GGGGTTGTCG GTTGCGGCGG TGAATGGTCC 12361 GGTGTCGACG GTGGTGTCGG GGGCGGTTGA GGTGCTGGAG GGGGTGCTGG CGGAGTTCCC 12421 GGAGGCCAAG CGGATTCCGG TGGATTATGC GTCGCATTCG GTGCAGGTGG AGGGGATCCG 12481 GGAGGGTTTG GCGGAGGCGT TGGCGCCGGT TCGGCCGCGT ACGGGTGAGG TGCCGTTCTA 12541 TTCGACGGTG ACCGGCCGGC TGATGGACAC CATCGAGTTG GACGGGGAGT ACTGGTACCG 12601 GAATCTGCGT GAGACGGTGG AGTTCCAGAG CACCGTCGAA GCTCTGATCG GCCAGGGTCA 12661 TACGGTGTTC GTCGAGGCCA GCCCGCATCC GGTGCTGACC GTCGGCGTCC AGGACACCGC 12721 CGACACCACC GACACCGCCA CCGACATCGT CGTCACCGGA TCGCTGCGCC GCGACGACGG 12781 CGGTCCGGCG CGCTTCCTCA CCGCGCTGGC CGAGCTGTCC GTACGAGGGG TGGCGACGGA 12841 CTGGCGGCAG GCGTTCGAAG GGACCGGCGC CCGACATGTC GACTTGCCGA CCTACCCCTT 12901 CCAGCGGCAG CGCTTTTGGA TCGAACCCAC TGCCCCGGAC GTGGCCCGGG AGGACGCTCG 12961 CGTCACCACT GCGGACGGCG AGTTCTGGGC GGCCGTCGAG CGCGAAGACG CCGCATCCCT 13021 GGCAACAGCC CTGGAGGTCG ACGACGCCTC ACTGGGCAAC CTGCTGCCCG CCTTGTCGGC 13081 CTGGCGCCGC CGGCGGCACG AGTGGTCCGC ATTGGAGGCC GTCCGGTACC AGGTCAACTG 13141 GAAGCGGCTC GTCGATGACC GACCCGCGAT GTTGTCAGGT GCCTGGCTGG TCGTGGTTTC 13201 CCAGGCCGAC GCCGACCATG AGTGGGTCTC CGGCGTAAGC GAGACGCTCG CCGAGTACGG 13261 GGCCGAGCCA GTGGTGTGCC CGGTGGACGA GCGACACCTG GATCGTGCCG TGCTGGCCGA 13321 CCGGCTGGCG AGCATGACCG GTACGAGCAG CACGACGAGC ACGGCGAGTA TCAGCGGCGT 13381 GGTGTCGCTG GTCGCCCTGG ACCAGCGCCC GCACCCGGAC TTCGCCTCCG TGCCCATTGG 13441 TTTCGCGATG ACGGTGCTGC TGACTCAGGC GTTGGGCGAC ACGGGGGTGG AGGCCCCGCT 13501 GTGGAGTCTG ACCCAACACG CCGTGTCCAC CGGGCCCGCT GACACCCTCC TCGCGTCCGC 13561 ATCGGCGCAG GCACTGGTGT GGGGCGTCGG CCGAGTGATC GCACTCGAGC AGCCCCTGCG 13621 CTGGGGTGGT CTCATCGACC TGCCGACCGA GGTGAACGCG AGGGCGCGGG AACGGCTGGC 13681 AAGGGTCCTG TCAGGCGTTT CGGGCGAGGA CCAGGTCGCG ATCCGGACGG TGGGGGCCTT 13741 CGGACGCAGG CTCGTCCATG CACCCGCGTT GCGGACCGAC CTGCCGTCCT GGCAGCCGAG 13801 CGGGACCGTA CTGGTCACCG GAGGCACTGG AGCGCTGGGC GGTCATATCG CGCGGTGGCT 13861 GGCGCATCAG GGCGCGGAGC ACCTCGTGCT GACCAGCCGA CGCGGTATGG CCGCGCCCGG 13921 GGCGTCCGCA CTCGTGGCGG ACCTGGAAGC GGCCGGAGCG GCGGTGACGG TGGCCGTGTG 13981 CGACGTGGCC GAGCGTGCCC AACTGGCCGA CCTGGTGGCG GATGTCGGCC CGCTGACGGC 14041 TGTTGTGCAC ACGGCCGCCC TGCTGGACGA CGCGACGGTC GAGTCCCTGA CCACCGAGCA 14101 ACTGCACCGG GTGCTCCGCG TCAAGGTCGA CGGTGCGACG CATCTGCACG AGTTGACCCG 14161 TGACATGGAA CTCTCCGCGT TCGTGCTCTT CTCCTCCTTG TCCGGGACGG TCGGCACACC 14221 GGGGCAGGGC AACTACGCAC CGGGCAACGC CTTCCTCGAC GCGCTGGCCG AGTACCGCAG 14281 GACCCAAGGC CTGGTGGCGA CATCGGTGGC CTGGGGCCTG TGGGCCGGTG ACGGGATGGG 14341 AGAGGGCGAA GCCGGCGAGG TGGCCCGGCG GCATGGTGTT CCCGCGCTGT CGCCGGAGCT 14401 GGCGGTGGCC GCTCTGCGTG CGGCCGTCGA ACAGGGCGAC GCGGTGGTCA CGGTTGCCGA 14461 CATCGAATGG GAACGCCATT ACGCCGCCTT CACCGCGACG CGCCCCAGCC CCTTGCTCGC 14521 CGACCTTCCA GAGGTACGGC GACTCATCGA CGCGGGCGCC GCTTCGGCCG TCGAGGAGAC 14581 GGACCGGGAC CGATCCGGAC TCAGCGGGCG CTTGGCAGGG CTCGACGGGG CCGAACAGCG 14641 GCGACTGCTG CTCGATTTGG TACGCCGCAA TGTCGCGGTG GTGCTCGGGC ACACCGACCC 14701 AGAAGCCGTG TCGTCCCACC GCGCCTTCCA GGAGCTCGGC TTCGACTCCG TGACGGCGGT 14761 CGAGTTCCGC AACCGGCTGG GTGCCGCGAC CGGTCTGCGG CTCCCGGCCA CTGCCGTATT 14821 CGACTACCCG ACCCCGCTGG CCCTGGCGGA GTACGCGCTG TCGGAACTGC TGGGGACGGT 14881 CGGGGAGCCC CTTCGCGTCG AGTCGAGCGG CTCCCCCGTG GACGACGATC CGATCGTGAT 14941 CGTGGGAATG AGCTGCCGCT TCCCCGGCGG GGTGAGCTCG CCGGAGGACC TGTGGGACCT 15001 CCTCACCGAG GGCGGGGACG CGATGTCGGC GTTCCCCGGG GACCGTGGCT GGGACCTGGC 15061 CGGGCTCTTC CACAGCGACC CCGGCCACCC GGGTACCTCG TACACCCGGA CAGGTGGTTT 15121 CCTCCATGAC GCGACCGCGT TCGACGCCGA CTTCTTCGGC ATCTCGCCAC GTGAAGCGCT 15181 GGCGATGGAC CCGCAGCAGC GGCTGCTGCT GGAGGCGTCA TGGGAGGCGT TCGAGCGGGC 15241 GGGGATCGAT CCTCGGTCGC TGCGGGGCAG CGAGACCGGG GTGTTCGCCG GCACCAATGG 15301 TCAGGACTAC GTCAGCCTTT TGGGCGGAGA TCAGCCGCAG GAGTTCGAGG GCTATGTCGG 15361 AACGGGCAAT TCGGCATCGG TGATGTCCGG CCGGATCGCC TACGTCCTGG GCCTTGAGGG 15421 CCCGGCGCTG ACGGTGGATA CGGCGTGTTC GTCGTCGTTG GTGGCGTTGC ATCTGGCGGT 15481 GCAGGCGTTG CGGTCGGGTG AGTGTTCGCT GGCGCTCGCG GGCGGTGTCA CGGTGATGGC 15541 GACACCGGGT CTGTTCGTGG AGTTCTCCCG TCAGCGTGGC CTGGCCGCCG ATGGTCGGTG 15601 CAAGGCGTTC GCGGGGGCGG CTGATGGCAC CGGTTTCTCC GAGGGTGTGG GGATGCTGGT

309

15661 GGTGGAGCGG TTGTCGGATG CTGAGCGGCT TGGGCATCGG GTGTTGGCGG TGGTGCGGGG 15721 CAGTGCGGTG AATCAGGATG GTGCGTCGAA TGGTTTGACG GCGCCGAATG GTCCGTCGCA 15781 GCAGCGGGTG ATCCGTCAGG CGTTGGCGAG CGCGGGTCTT GTGGCGGTGG ATGTGGATGC 15841 GGTGGAGGCG CATGGTACGG GTACGGCGTT GGGTGATCCG ATTGAGGCGC AGGCGTTGTT 15901 GGCCACGTAT GGTCAGGGTC GGGATGTGGG TCGGCCGTTG TGGTTGGGTT CGGTGAAGTC 15961 GAATATTGGT CATACGCAGG CGGCCGCGGG TGTGGCTGGT GTGATCAAGA TGGTGATGGC 16021 GCTGCGGCAT GGGGTGTTGC CGCAGAGTCT GCACATCGAT GAGCCGACAC CGCATGTGGA 16081 CTGGTCCACC GGCGCGGTGG AGCTCCTGGG GGAGCACACG GGCTGGCCGG AGGTGGATCG 16141 GCCGCGTCGG GCGGGTGTGT CGGCGTTCGG GGTGAGTGGG ACGAATGCGC ATGTGATTGT 16201 GGAGCAGGCG CCTGAAGTGG TGGAGCCTGA GGCTGAAGGT GTGGTGTTGC CTGCTGTGCC 16261 GTGGGTGGTG TCGGGTGTGG GTGAGGTGGC GGTGCGGGCG CAGGTGGAGC GGTTGCGGGC 16321 CTTTGCGGAC CGGAATCCGG GTCTGGATCC GGTGGATGTG GGGTGGTCTT TGGCGACTGG 16381 TCGTGCGGGG TTGTCGCATC GTGCGGTGGT GGTGGGTGCG GATCGTGGTG AGTTGTTGGG 16441 GGCTTTGGAG GGTGTGCCGG TGGTGGGTGT TCCGGTGGTG GGTGGGTTGG GTGTGTTGTT 16501 TGCGGGGCAG GGGTCGCAGC GGTTGGGGAT GGGTCGTGGG TTGTATGAGG GGTATCCGGT 16561 GTTCGCTGCG GTGTGGGATG AGGTGTGCGC GCAGCTGGAC CAGCATTTGG ATAGGCCGGT 16621 GGGTGAGGTG GTGTGGGGTG ATGATGCCGA GCTAATTGGC GAGACGGTGT ATGCGCAGGC 16681 GGGGTTGTTC GCGCTTGAGG TGGCGCTGTA TCGGCTGATC GCTTCGTGGG GTGTGAGGGG 16741 GGATTATCTG CTGGGTCATT CGATTGGTGA GTTGGCTGCG GCGTATGTGG CGGGTGTGTG 16801 GTCGTTGGAG GATGCGGCGA GGGTGGTGGT GGCGCGGGGT CGTTTGATGC AGGCGTTGCC 16861 GTCGGGTGGT GCGATGGTTG CGGTGGCCGT TTCGGAGGGT GTGGTGCGGC CGCTGCTGGG 16921 CGAGGGTGTG GTGGTTGCGG CGGTGAATGG TCCCGAGTCG GTGGTGCTGT CGGGTGATGA 16981 GGATGCGGTT CAGGTTGTGG TGGATGTGTT GGCTGGGCGT GGGGTGCGGA CGCGGCGGTT 17041 GCGGGTGAGT CATGCGTTCC ATTCGGCTCG TATGGACGGG ATGCTGGCGG AGTTCGGTGA 17101 GGTGCTTGGG GGCGTGGAGT TCCGTGCCCC GAGCGTGCCC GTGGTGTCGA ACGTGTCCGG 17161 TGCGGTGGCG GGTGAGGAGT TGTGTTCGCC GGAGTATTGG GTGCGGCATG TGCGGGAGAC 17221 GGTCCGGTTC GCCGATGGGC TGGAGACGCT GCGCGAGCTG GGTGTGGGTT CGTTCCTGGA 17281 GTTGGGGCCT GACGGGACGT TGACTGCCTT GGCGGATGGC GATGGTGTGC CTGTCTTGCG 17341 TCGGGATCGT CCGGAGCCTC TGACCGCTAT GGCGGCTTTG GGCGGGCTGT ACGTCCGGGG 17401 TGTCCAGATC GACTGGGGGG CGGTGTTCCC GGGTGCTCGG CGGGTCGATT TGCCGACGTA 17461 TGCCTTCCAG CGTGAGCGGT TCTGGTTGGA GCCATCCGCT GAGCAGCCTG CGACGAGCGT 17521 GGTGGACGCG GCGTTCTGGG ACGCGGTCGA GCGGGGCGAT GCGGAGGCTC TTGGGGGCGA 17581 TGCCGAGCAG TCGTTGAGTG CCGCGTTGCC TGCTTTGGCG TCGTGGCGGC GGGCGCAGCA 17641 GGAAGAGTCG GTTATCGACG GGTGGCGTTA CCGGCTCGGC TGGACGCCGA TCCCGGTGGT 17701 GCTGGGGGAG CCATGCCTCA CTGGCACTTG GCGGGTTGTG GTCGAACCGG GTGCGGACGG 17761 TACCGATGTG GCTGCCGCGC TGCGGTCGGC CGGGGCTGAT GCCGAGGTCG TGACGTCGGC 17821 GGAACTGAGC GCGGGGCCGG TCGCGGGTGT GGTGTCATTG TTGTCGGTCG AGGCGACGGT 17881 GGCGCTGGTG CAGGCTCTCG GGACGGTCGG GATCGATGCG CCGTTGTGGT GTGTGACGCG 17941 GGGTGCGGTC TCCGTGGTGG ACGGGGATGT GGTGGAACCG TACGCGTCGG CCGTCTGGGG 18001 TCTGGGCCGT GTGATCGGTC TGGAGCATCC GGACCGTTGG GGCGGGCTGA TCGACCTGCC 18061 CACGGAGGCG GACGCACGTG TGGGTGCGTT GTTGGCCGGG GTTCTCGCCG GGCGCACCGG 18121 GGAGGATCAG GTGGCAATCC GGGCCGCCGG GGCGTGGGGT GCCCGGCTGA GCCGGGCGAC 18181 ACCGATTGCG GACACGTCTG GCGGGTGGCG TGGTCGGGGA GCTGCCTTGA TCACCGGGGG 18241 TACGGGTGCG CTGGGCGGCC ATGTGGCGCG CTGGCTGGCG GGGACCGGGG TGGAGCGCAT 18301 CGTGCTGACG AGCCGCCGGG GGATCGAGAC CCCGGGTGCG GCCGAGCTGG TGACCGAGTT 18361 GGAGGAGTTC GGAGTCCAGG TGACGGTGGT CGCGTGCGAT GTCGCCGATC GGGAGGCGGT 18421 CGCGACGCTG CTGGTCACCA TCCCCGATCT CCGGGTCGTC GTACACGCCG CAGGGGTGCC 18481 GAGCTGGAGT GCGGTGGACA GCCTGACACC CGAGGAGTTC GAGGAGAGCG CGCGGTCGAA 18541 GGTTGCCGGG GCGGCGAACC TGGACGCGCT CCTGGCGGAC GCTGAGCTGG ACGCCTTTGT 18601 GTTGTTCTCG TCGGTGGCGG GGGTGTGGGG GAGCGGTAGT CAGTCGGCGT ATGCGGCGGC 18661 GAACGCGTTT CTGGATGGGT TGGCGTGGCG GCGTCGTGGT GTTGGGTTGG TGGCGACGTC 18721 GGTGGCGTGG GGGATGTGGG GTGGCGGTGG TATGGCGGTT GGGGGTGAGG AGTTTCTGGT 18781 TGAGCGTGGT GTGTCGGGGA TGGCTCCGGG GTTGGCGGTG GCTGCGTTGC GGCGGGCGCT 18841 GTGTGATGGT GAGACGGCGC TTGTGGTGGC GGATGTGGAT TGGGAGCGGT TCGGGCCGAG 18901 GTTCACCGCG TTGCGTCCGA GCCCACTGCT GAGCGAGCTG ATCCCCGATA CGTCCGAACC 18961 ACTCGCGTCG ACGGTGGGTG AGTTCGCGGT CGAGCTGCGC GGATTGTCGC GCGAGGACCG 19021 GGACCGTGCC GTCGTGGAGC TCGTACGGAC ACATGCCGCC GAGGTGTTGG GCCACCAGAA

310

19081 CCCGAGCGCG ATCGACCTGG ACCGGACGTT CCAGGAGCTG GGCTTTGACT CGCTGACCGC 19141 CGTGGAATTG CGGGACCGGC TCGGCACGGC TACTCAGCTG CGATTCCCAG CGTCCGTGAT 19201 CTTCGACTAC CCGACTCCGG CGGCACTCGC CGAGCATGTG TGCGGGGCGG CCCTCGGACT 19261 GGCCGAAGAG ATACAGGTAG CGCACACGCC CAGCGCGGTG GCCGACGATC CGATCGTGAT 19321 CATCGGCATG AGCTGCCGAT TCCCGGGCGG TGTGGACTCT CCGGAGGCGC TGTGGCGGCT 19381 GGTCAGCGCC GGTGGCGACG CCGTATCGTC CTTCCCGTCC GACCGTGGCT GGGACCTGGC 19441 CGGTGTGTAC GACGCCGACG CCACTCGCTC GGGCCGGTCG TACGTCCGCA CGGGTGGATT 19501 CCTCCATGAC GCGGCTGAGT TCGACGCCGG ATTCTTCGGG ATCTCGCCGC GCGAGGCGAC 19561 CGCGATGGAT CCGCAGCAGC GGCTGCTGCT GGAGGCGTCC TGGGAGGCGT TCGAGCGGGC 19621 CGGAATCCCG GCCTCGACGC TCAAGGGCAG CCAGACCGGC GTCTTCGTGG GCGCGTCCGC 19681 ACAGGGCTAT GGCGGCGGGG ACGGGCAGGC GCCGGAAGGA TCCGAAGGAT ACCTTCTGAC 19741 CGGCAACGCG GGCAGCGTGG TGTCCGGTCG GGTGGCCTAT ACGTTTGGGC TGGAGGGCCC 19801 GGCGGTCACC GTGGACACGG CGTGCTCGTC CTCGTTGGTG GCGCTGCACT GGGCGGTGCG 19861 GGCCCTTCGG TCGGGCGAGT GCTCCCTCGC GCTGGCCGGC GGAGTGACGG TGATGGCGAC 19921 ACCCGCCACC TTTGTGGAGT TCTCACGTCA GCGTGGGCTG GCCGCCGATG GCCGCTGCAA 19981 GTCCTTCGCC GCCGGTGCGG ATGGGACGGG CTGGTCGGAG GGTGTTGGGC TGTTGCTGGT 20041 GGAGCGGTTG TCGGATGCCG AGCGGAACGG GCATCCGGTG CTGGCCGTTG TCTCCGGCTC 20101 TGCGGTGAAT CAAGACGGTG CGTCGAATGG TTTGACGGCG CCGAATGGTC CGTCGCAGCA 20161 GCGGGTGATC CGTCAGGCGT TGGCGAATGC GGGTCTTGTG GCGTCGGATG TGGATGCGGT 20221 GGAGGCGCAC GGTACGGGTA CGACGCTGGG TGATCCGATC GAGGCGCAGG CGTTGTTGGC 20281 CACGTACGGT CAGGGTCGGG ATGCGGGTCG GCCGTTGTGG TTGGGGTCGG TGAAGTCGAA 20341 CATCGGTCAT ACGCAGGCGG CTGCGGGTGT GGCTGGTGTG ATCAAGATGG TGATGGCCAT 20401 GCGGCATGGG GTGTTGCCGC GGACGTTGCA TGTGGATGAG CCGTCGCCGC ATGTGGATTG 20461 GTCTGCTGGT GCGGTGGAGT TGTTGACGGG GCAGGTGGCG TGGCCGGAGG TGGATCGGCC 20521 GCGTCGGGCG GGTGTGTCGG CGTTCGGGGT GAGTGGGACG AATGCGCATG TGATTGTGGA 20581 GCAGGCGCCT GAAGTGGTGG AGCCTGAGGC TGAAGGTGTG GTGTTGCCTG CTGTGCCGTG 20641 GGTGGTGTCG GGTGTGGGTG AGGTGGCGGT GCGGGCGCAG GTGGAGCGGT TGCGGGCCTT 20701 CGCGGACCGG AATCCGGGTC TGGATCCGGT GGATGTGGGG TGGTCTTTGG TGGCCACCCG 20761 GTCTGGGTTG TCGCATCGTG CGGTGGTGGT GGTTGCGGAT GGTGAGGAGT TGTTGGGGGC 20821 TTTGGAGGGT GTTCCGGTGG TGGGTGGGTT GGGTGTGTTG TTTGCGGGTC AGGGGTCGCA 20881 GCGGTTGGGG ATGGGTCGTG GGTTGTATGA GGGGTATCCG GTGTTCGCTG CGGCGTGGGA 20941 TGAGGTGTGC GCCCAGCTGG ACCAGCATCT GGATAGGCCG GTGGGTGAGG TGGTGTGGGG 21001 TGATGATGCC GAGCTAATTG GCGAGACGGT GTATGCGCAG GCGGGGTTGT TCGCGCTTGA 21061 GGTGGCGCTG TATCGGCTGG TCGCCTCGTG GGGTGTGAGG GCGGATTACC TGCTGGGTCA 21121 TTCGATTGGT GAGTTGGCTG CGGCGTATGT GGCGGGTGTG TGGTCGTTGG AGGATGCGGC 21181 GAGGGTGGTG GCGGCGCGGG GACGTTTGAT GCAGGCGTTG CCGTCGGGTG GCGCGATGGT 21241 CGCGGTGGCG GCGTCGGAGG GTGAGGTGCG GCCGCTGCTG GGCGAGGGTG TGGTGGTTGC 21301 GGCGGTGAAC GGTCCCGAGT CGGTAGTGGT CTCGGGTGAT GAGGATGCGG TGCATGCCAT 21361 CGAGGAGACG TTCGCCATGG GTGGGGTGCG GACGCGGCGG TTGCGGGTGA GTCATGCGTT 21421 CCATTCGGCT CGTATGGACG GGATGCTCGC GGAGTTCGGT GAGGTGCTTC GGGGCGTGGA 21481 GTTCCGTGCC CCGAGCGTGC CTGTCGTGTC GAACGTGTCC GGTGCGGTGG CCGGTGAGGA 21541 GCTCTGCTCG CCGGAGTATT GGGTGCGGCA TGTGCGGGAA ACGGTCCGGT TCGCCGATGG 21601 GCTGGATACT CTCCGTGAGC TGGGTGTGGG TTCGTTCCTG GAGTTGGGGC CGGACGGGAC 21661 GTTGACCGCC TTGGCGGATG GCGATGGTGT GCCTGTCTTG CGTCGGGATC GTCCGGAGCC 21721 CCTGACCGCT ATGGCGGCTC TGGGCGGGCT GTACGTCCGG GGTGTGGAGG TGGACTGGGA 21781 CGCGGTGTTC CCCGGCGGTC GGCGGGTCGA TCTCCCCACC TACGCGTTCC AACGGCAGCG 21841 GTTCTGGTTG GAGTCGGCCT CGGACCAGCC TGCGACCAGC GCGGTGGACG CGGCGTTCTG 21901 GGACGCGGTC GAGCGCGGGG ATGCGCGGGC GCTGGGCATT GACGAGGAAC AGCCGTTGAG 21961 TGCCGTACTG CCCGCCCTCT CGTCGTGGCG GAGGGCGCGG CAGGAGCAGT CGGTGATTGA 22021 TGGCTGGCGT TATCGGCTCG GTTGGATGCC GATTCCGGCG GTGTTGGGGG AGGTGGGCCT 22081 CATCGGTACC TGGCTGGTTG TGGTCGAGCC GGGTGTGGAC GGTACTGATG TGGCCGCAGT 22141 GTTGCGGTCG GCCGGGGCTG GTGTCGAGGT TGTGACGTCG GCGGAGCTGA GCGCTGGTCC 22201 GGTTGCGGGT GTGGTGTCGT TGGTGTCGGT CGAGGCGACG GTGTCGTTGC TGCAAGTCCT 22261 TGTGGCGGCC GGGGTCGATG CGCCGTTGTG GTGTGTGACT CGTGGTGCGG TCTCGGTGGT 22321 CGACGGTGAC CTGGTGGATC CTGGCCAGGC GGGAATCTGG GGTCTGGGCC GTGTGATCGG 22381 TCTGGAGTGT CCGGACCGTT GGGGCGGGCT GATCGACTTG CCTGGCGAAC TGGACGATCG 22441 CGCGGGGAAT GCGCTGGTAG GCATCCTTGC CGGGGGCACC GGTGAGGATC AGGTGGCCAT

311

22501 CCGTGTCACC GGCATATGGG GTGCCCGGCT GGTGCGGGCG ACGCCGGTCC CGATCGGTGA 22561 CGCGGGTGGT GAGGCTGCGG CCGCGTGGCG TGGGCGTGGT ACCGCGCTGG TCACCGGTGG 22621 TACGGGGGCG TTGGGGCGTC AGGTGGCGCG GTGGCTGGTG GGCAGTGGTC TGGAGCGGGT 22681 CGTGCTGACG AGCCGTCGGG GGGTTGAGGC GCCCGGTGCC GTCGAGCTGG TGGCTGAGTT 22741 GGGGAGCCGA GTGCGTGTCG TGGCCTGTGA TGTCGGCGAT CGTGAGGAGC TTGCGGCTCT 22801 TTTGGTGACG CTTCCGGATG TGCGGACCAT CGTGCATGCG GCGGGTGTCC TCGACGACGG 22861 GGTGCTCGAA TCGCTGACGC CCGAGCGGAT CCGTGAGGTG ATGCGGGCCA AGGCCGACGG 22921 CGCGCGGCAT CTCCACGAGT TGACCCGTGA CATCGACCTC GACGCCTTTG TGTTGTTCTC 22981 CTCGGCTGCC GGGACCGTGG GTAATGCGGG TCAGGGGAGC TATGCGGCGG CCAACGCCGT 23041 CCTGGACGGG CTGGCGTGGC GTCGCCGGGC CGAGGGCTTG GTGGCCACAT CGGTGGCCTG 23101 GGGAGCCTGG GCCGAATCCG GTATGGCCGC GGAGATGGCG CGGTCGCAGG GCATGGATCC 23161 GAGGTCGGCG CTCGCCGCCC TGGGGCTGGT GCTGGCCGCT GACGAGACCA CGGTGATGGT 23221 GGCCGACATC GACTGGGCGA CCTTCGGGGC CCGGTTCACC GCCTCACGGC CGAGCCCGCT 23281 GCTCAGCGAG TTGCTCGGCG ACGGATCCGT GTCGACCGAG GCAGCCGACG GCGAACCGGC 23341 CGACGCGTTC GCCACCCGCC TGGAGGCCAT GGCCGAGCGG GAACGGGCGG CCACCGTGCT 23401 GGACCTCGTC CGTACGCATG TGGCCGCTGT CCTGGGACAC ACGGCATCCG AGGCGATCGA 23461 CCCGGCCCGG CCCTTCCAGG AGATCGGTTT CGACTCGCTC ACCGCGGTGG AGCTGCGGAA 23521 CCGGCTCACC GCGGCCACCG GGGTACGGTT CCCGGCTTCC GTGATCTACG ACTACCCGAC 23581 CCCGGCCGCG CTCGCCGAGC ACGTGTGCCG GGAGGCGCTG GGTCCGGGCG GACGGACACC 23641 GGCTCCGGTG GTGCCACGCC CGGTGGACGA CGAACCGATC GCCATCATCG GGATGAGCTG 23701 CCGTTTCCCC GGCGGGGTGA GCTCGCCGGA GGACCTGTGG GGGCTGCTGG CCGAGGGCCG 23761 TGACGCCGTG TCGGACTTCC CGGCGGACCG TGGCTGGAAC CTGGCCGAGC TGTACGACCC 23821 GGATCCCGAC CACCCCGGCT CCTCGTACGT CCGGGCGGGC GGATTCCTTG ATGACGCGGC 23881 CGCGTTCGAC CCCGGCTTCT TCGGGATATC GCCGCGCGAG GCGCTCGCGA TGGACCCGCA 23941 GCAGCGGCTA TTGCTGGAGG TCGCCTGGGA GGCGTTCGAG CGCGCCCATA TGTCCCCCGC 24001 CACCCTCAAG GGCAGCCGGA CCGGGGTGTT CGTCGGGACC AACGGCCAGG ATTACGCCGC 24061 TCTGGCGAGC GGGGCCCCGC GGAGCGCGGA AGGGTATCTG GGCACGGGCA GCGCCGCCAG 24121 TGTCGCCTCG GGCCGGCTGG CGTACACCTT CGGCCTCGAG GGCCCGGCGG TCACCGTGGA 24181 CACCGCCTGC TCGTCGTCGC TGGTCGCGCT GCACCTCGCC GCACAGGCCC TGCGCTCCGG 24241 TGAATGCTCC TTGGCCTTGG CCGGTGGTGC GACCGTCATG GCCACTCCGG CGGCCTTCCT 24301 GGAATTCTCC CGCCAGCGTG CGTTGGCGGC CGATGGGCGC TGCAAGGCGT TCGCGGCGGC 24361 GGCGGACGGC ACCGGCTGGG GCGAGGGCGT CGGCATGCTC CTGGTGGAGC GGCTCTCCGA 24421 CGCGGAGCGC AACGGCCACC GGGTGCTGGC GGTGATGCGT GGCTCCGCCG TCAATCAGGA 24481 CGGCGCGTCC AACGGGCTCA CGGCGCCGAA CGGCCCGTCG CAGCAGCGAG TGATCCGTCA 24541 GGCCCTGGCG AACGCCCGGC TGTCCGCCAC GGACATCGAC GTGGTGGAGG CGCACGGCAC 24601 CGGCACCAGT CTCGGCGACC CGATCGAGGC GCAGGCACTG CTCGCCACGT ACGGTCAGGG 24661 CCGGTCCCAG AACAAGCCAC TGTGGCTCGG CTCGGTGAAG TCCAACATCG GGCACACCCA 24721 GGCGGCCGCC GGCGTGGCCG GTGTGATCAA GATGGTCATG GCCATGCGAC ACGGTGTACT 24781 GCCGCGGACC CTGCATGTCG ACTCGCCCTC GCCCCATGTG GACTGGGCGG CGGCCCGGGT 24841 CGAGTTGCTC GTCGAAGCGA GGGAGTGGCC GCGGACCGGC GCTCCTCGCC GGGCGGGTGT 24901 GTCCTCGTTC GGGGTCAGTG GCACCAACGC CCATGTCATC GTCGAGCAGG GGCCGGTGGT 24961 GGCCCGGCCC GATCGGGAGT CGGCGCGCGA GCCGTCACCC TCCGTGCCGT GGGTGCTGTC 25021 AGGTGCGGGG GAGGCCGGGC TGAGGGCCCA GGTCGAGCGC CTGGCGTCCT TCATCGACGC 25081 CCATCCGGGC CTGGATCCCG CCGATGTCGG GTGGACGCTG GTGGCCGGCC GTTCGTGTCA 25141 GTCGCACCGC GCCGTAGTGG TGGGTGCAGA CCTCGCGGAG CTTCGACGTG GACTGGACGC 25201 AGTCTCGACC GGTGGCGCCG CCCGGTCCGG CCGCAAGGTG GTGTTCGTCT TCCCCGGCCA 25261 GGGGTCGCAG TGGGCCGGAA TGGCGTTGGA ACTGTTGGAG CATTCGCCGG TGTTCGCGGA 25321 GCGGATGCGT GCATGCGCCG ATGCGCTCAC CCCGTTCGCC GAGTGGTCGC TGTTCGATGT 25381 GCTGGGTGAT GAGGTGGCGC TCGGTCGGGT TGATGTGGTG CAGCCGGTGT TGTGGGCGGT 25441 GATGGTGTCG CTGGCCGAGT TGTGGCGTTC GTTTGGTGTG GTGCCGTCGG CGGTGGTGGG 25501 GCATTCGCAG GGTGAGATCG CGGCGGCGTG TGTGGCCGGG GGTCTGTCGC TGGAGGACGG 25561 TGCCCGTGTG GTGGCCTTGC GGAGCAGGGC GTTGCTGGCT CTGTCGGGTC GGGGTGGGAT 25621 GGTGTCCGTA CCGGTGTCCG CCGATCGGCT CCGTGACCGT GCGGGGTTGT CGGTGGCGGC 25681 GGTGAACGGT CCGGCGTCGA CGGTGGTGTC GGGGGCTGTT GAGGTGCTGG ATGGGGTGCT 25741 GGCGGAGTTT CCGGAGGCCA AACGGATTCC GGTGGATTAC GCCTCACACT CCCCGCAGGT 25801 GGCCGAGATC CAGCGGGAGC TGGCGGACGT GCTGGCGCCG GTCCGGCCGC GCGGTGGACA 25861 GATCGCGTTC CACTCGACGG TGACCGGACG GCTCACCGAC ACCTCCGAAC TCGACGCCGA

312

25921 CTACTGGTAC CGCAACCTCC GGCACACCGT GGAATTCCAG AGCACCGTCG AAGCCCTGAT 25981 GAACCAGGGC CACACCGTGT TCGTCGAGGT GAGCCCGCAC CCCGTGCTGA CCATCGGCAT 26041 CCAGGACACC GCCGAGACCC CAGGCACCCC CGACACCCCA GGCACCCCCG ACACCGCGGA 26101 CGCCACCGAC GCTCACGAGG CCACCGGCGC CCCCGACGTC GCCAACACCG CCGACGTCAC 26161 CGGCGCTCCC GACGTCACCG GCGCCGACAT CGTCATCACC GGATCGCTGC GCCGCGACGA 26221 CGGTGGCCCC GCCCGCTTCC TCACCGCCCT CGGCGACCTC CACACCCGGG GCGTGGACGT 26281 GGACTGGAGC CCGGTCTTCA CCGGAGCCCG GACGGTGGAC CTTCCCACCT ACGCCTTCCA 26341 ACGGGAACGC TTCTGGCTGA AGCCCGCGCG GGCGGTGACC CAGGCGTCCG GGCTGGGCCT 26401 CGGCGATATC GAGCACCCCC TGCTGGGCGC GGTACTGCCC CTGCCCGGGG ACGAGGGCGG 26461 TGTGCTGACC GGACTGCTCT CCCTGGACGG ACAGCCCTGG CTGGCCCACC ACATGGTGCG 26521 GGACACGGTT GTCTTCCCCG GCACGGGATT CGTCGAACTC GCCCTGCAGG CCGGTCAGCA 26581 CTTCGGCCAC TCGGTGATCG AGGAGCTGAC CCTGCATGCC CCGCTGGTGG TGCCGGACCA 26641 GGGCGGGGTC CAGGTACAGG TGGCCGTATC GGCGGCGGAC GAACGGGGCC GGAGGCCGGT 26701 CACGGTGCAC TCGTGCCGTG CCGGGGAGTG GCTGCTGCAC GCCTCGGGCA CTCTCGGCGC 26761 CACCGGAGGC CTCGACGTCA CCGAGCCGCG CCCCGCCGAC GTGGCCCGGC CCCTGGAGGT 26821 CTGGCCGCCC GAGGGCGCGC GGAGCCTCGA TGTCTCGGGG ATGTACGAGG CGATGGCGGA 26881 GCGCGGCTAC GGGTACGGTC CCGCTTTCCA AGGGCTGCGC GCCGCGTGGA CACGGGACGA 26941 TGAGATCTAC GCCGAAGTGG CTCTGGAGCC GGAGGCACAG GACGTGGCGG CGCGGTGCGG 27001 TGCGCATCCG GCCCTGCTCG ACGCCGCGCT CCACGGAGTG GGGCTCGGCC GCTTCCTCAC 27061 CGACCCCGGC CAGGCGTATC TGCCGTTCTC CTGGAGCGGG GTCGCGCTGC ACGCGGTAGG 27121 CGCCTCCGCC ATCCGCGTGG TGCTCTCCCC GGCCGGTACG GACGCGGTGT CGCTGGAGGT 27181 GACGGACCCG ACGGGAGCGC CGGTGCTGTC GGTGGCGTCG CTCTCGTTGC GTCCGCTGTC 27241 CAGCGGGCGG ATCGCGGACA CCCGAGGGGT GGACCAGGAC TCGCTGTACC GCGTGGACTG 27301 GGTCGAGATG CCGCTGCCGA CTGCCCCGGC AGGCTCGGCC CCGGCCGAGT ACGACGCGCC 27361 GGCGATGTTC GACGCCCTGG TATTCGACGC CCCGGTCGAG TACGACGTTC TCGCCTCCGA 27421 CGCCTCCGAC GCCTCCGACG CCTCCGACGC CCCCGGCACC CCCGACGCCT CCAGTGCCCC 27481 GGTGCCCGAC ATGCCCGACA TGGTGGTGCT GCCGTGTGAG TCGGCCGGTG ACGCGGTGTC 27541 CACCGTCGTG TGCCGGGCGC TGGCGGCGGT ACGGCGATGG CTCGCCGACG AGCGCTGTGC 27601 CCGGTCGCGG CTGGCCGTGC TGACGCGCGG CGCGATGGCC ACCGCTCCCG GCGAGAGCGT 27661 CGAAGACCTC GGCGCGGCAG CGGTCTGGGG CCTGCTCCGC AGCGCCCAGG CCGAGCACCC 27721 GGACCGCTTC GTCCTCGTCG ACCACGACGG CCACCAGGAT TCCCGTGCGG TGCTCGCCGC 27781 CGCGCTGGCC GCCGCGGTCG ACGGTGGCCA TGCGCATCTC GCGCTGCGCC GTGGCCGTGT 27841 CCTGACGCCT CAGCTCGCTC CGCTCACCCC GTCCGCGACC GCCCTGTCCA CCACCGCACC 27901 GCCCGCCGCC ACCCCAACCC CGGAGGCCGG GGCACCGTGG CGGATGGACG TCACCAGTCA 27961 GGGCACGCTG GAGAACCTGG CCGCCGTCCC CTGCCCGGAG GCCGCCGGTG TCCTCGGCGC 28021 CGGACAGGTG CGGGTGGCGA TGCACGCGGC CGGGGTGAAC TTCCGGGACG TCGTCGTCGC 28081 CCTCGGCATG ATCCCCGGTC AGGACGTCAT CGGCAGCGAG GGTGCCGGAG TGGTGCTCGA 28141 CATCGGCCCC GGTGTGTCCG GCCTGGCGCC CGGTGACCGG GTGATGGGTC TGTTCTCCGG 28201 GGCGTTCGGC CCCGTGGCGG TGACCGATCA CCGACTGTTG GCGCGGCTGC CGGAAGGCTG 28261 GTCGTTCGCC GACGCCGCGG CCACGCCGGT GGTGTTCCTG ACCGCCATGT ACGGGCTGAT 28321 GGACCTGGCC GGTCTGCGAC CCGGTGAATC GGTGCTGCTG CACTCGGCCG CCGGCGGGGT 28381 GGGCATGGCC GCGACACAGG TGGCCCGCTG GCTCGGCGCT GAGGTGTACG CCACCGCGAG 28441 CCCAGGGAAG TGGGACGCGC TGCGCGCCGG AGGAGTGGCG GACGACCGGA TCGCCTCGTC 28501 CCGCTCCTTG GAGTTCGCCG ACCGCTTCGG CCGGGTGGAC GTGGTGCTGA ACTCGCTGGC 28561 GGGCGAGTAC GTGGACGCCT CGCTCGGCCT GCTCGCCGAC GGTGGCCGTT TCCTGGAGAT 28621 GGGCAAGACC GACATCCGCG ACGGTGAGCG CGTGGCCGCG GAGCACGGGG TGCGGTACCA 28681 GGCGTTCGAC CTCATGGACG CGGGGCCCGA CCGGGTCGGG GAACTGCTCA GGCTGCTGGT 28741 GTCGCTCTTC GAGCGAGGGA TCTTCACGGC ACTGCCGACC CGCGTCTGGG ACGTCCGGCA 28801 GGCGGGTGAC GCGCTGCGCT TCCTCTCGCA GGCACGCCAC ATCGGCAAGC TGGTGCTGTC 28861 CATTCCGCAG CCGCTGCGGG AGGGGGACAC CGTGCTCATC ACCGGCGGCA CCGGCACACT 28921 GGGCGGGCTG GTCGCCCGTC ACCTGGTCGA ACGGCACGGA GTACGGGATG TCGTCCTGGC 28981 CGGCCGGCGG GGGCCGGACG CCCCGGACGC GGCCGAACTC GCCGCCGCCC TGCGCGAATA 29041 CGGCGCCCGG GTGCGGGTGG TGGCCTGTGA CGTGGCCGAC CGGGACCAGC TGGCACGGCT 29101 GCTGGACACC GTCTCCGGCC TGCGGATGGT GGTGCACACC GCGGGTGTGC TCGACGACGG 29161 GGTGATCGAG TCGCTCACCC CGGAGCGGGT GCGCGAGGTC CTGAGGCCGA AAGTGGACGC 29221 CGCCTGGTAT CTGCACGAGC TGACGGCCGG TCGTGAGCTG GCGGAATTCG TGGTGTTCTC 29281 CTCGGCCGCG GGTGTTCTGG GAAGCCCCGG GCAGGGCGCC TACGCGGCGG CGAACGCCTG

313

29341 GCTGGACGCG CTGATGGCGC ATCGCCGGGC CGCCGGGCTG CCGGGTCTCT CCGTGGCCTG 29401 GGGGCTGTGG GCCGAGCGCA GCGGGATGAC CGGCCATCTG TCGGACCGGG ATCTCGCCCG 29461 GATGGCCAGG GCCGGTGCCA CGCCTCTCGC CACCGATCAG GGGCTCCGGC TCCTGGACAG 29521 TGCCAGGGCG GCCACCGAGG CGCTCGTGCT GGCCACACCG CTGGACGCCG CGGCGCTGCG 29581 GGCACAAGCC GACGCCGGGG CGCTGCCCGC GCTCTTCCGC GGTCTGGTCC GTGCGCCGAT 29641 CCGCCGCGCG ACCGGCGCGG GCCCGGTGGA GGACGAGTCG TCGCTGCGGG GCCGGATGGC 29701 CGCGATGCCG GTCGCCGAGC GCGAACAGCT GGTGCTGGAC CTGGTCCGTA CGCAGGTGGC 29761 GACCGTGCTG GGGCACGGCA CCGCCACCGC GGTCGACCCG GCGCGTACGT TCGCGGAGAC 29821 CGGCTTCGAC TCGCTCACGG CCGTCGAGCT GCGCAACCGG CTGCGCACCG CCACCGGGGT 29881 CAGGCTGTCG GCCACCGCGA TCTTCGACTA TCCGACACCC GCGGTCCTGG CCGGTCATCT 29941 CCTCCGGGAG CTGGACGGCA CCGTCGGCGA GGCCGTGACA CGGCCCGCCG CCCCGGCCGC 30001 CGCCACCGAC CGGGACCCGA TCGTGATCGT CGGAATGGCC TGCCGCTATC CGGGCGGAGT 30061 GGCGTCGCCC GAGGAGTTGT GGGAGCTGCT CGCCACCGGG CGCGACGCGG TCGCGGATCT 30121 GCCGGACGAC CGGGGCTGGG ACCTGGACGG CCTGTACAGC GCCGATCCGG ACAGCTCGGG 30181 CACCTCGTAC GTCCGCTCCG GTGGCTTCGT GTACGACGCG GGCGAGTTCG ACGCCGACTT 30241 CTTCGGCATC TCGCCGCGCG AGGCGCTCGC GATGGATCCG CAGCAGCGGT TGCTGCTGGA 30301 AGTGGCCTGG GAGACGGTGG AGCGGGCCGG TGTCCCGGCG GCGTCGCTGA AGGGGAGCCA 30361 GACCGGGGTG TTCGTCGGTG CCGCGGCACA GGGCTACGGC ACGGGGGCCG GGCAGGCGGC 30421 GGAGGGATCC GAGGGCTACT TCCTGACCGG TGGCGCGGGC AGCGTGGTCT CCGGCCGGCT 30481 CTCGTACACC TTCGGCCTGG AGGGGCCGGC GGTCACCGTG GACACCGCCT GCTCGTCGTC 30541 GCTGGTCGCG CTGCACCTGG CGGCGCAGGC CCTGCGGTCC GGCGAGTGCT CGCTGGCACT 30601 GGCCGGCGGG GTGACGGTGA TGGCCACCCC GGGCATCTTC GTGGAGTTCT CCCGACAGCG 30661 CGGACTGGCC GCCGACGGCC GCTGCAAGGC GTTCGCCGAC GCGGCGGACG GCACCGGCTG 30721 GGGCGAGGGC GTCGGCATGC TGCTGCTGGA GCGGCTGTCC GACGCCCGCC GCAACGGCCA 30781 CCGGGTCCTG GCGGTCGTAC GGGGCTCCGC CGTCAACCAG GACGGCGCCT CGAACGGCCT 30841 GACGGCGCCG AACGGGCCCT CGCAGCAGCG GGTGATCCGG GCCGCGCTGG CGAACGCCGG 30901 GCTGGCCGCG TCGGACGTGG ACGCGGTGGA GGCACACGGC ACCGGCACCA GCCTGGGCGA 30961 CCCGATCGAG GCACAGGCGC TGCTGGCCAC CTACGGGCAG CAACGCGAAC GGCCGCTGCT 31021 GCTGGGCTCG ATCAAGTCGA ACATCGGGCA CACCCAGTCG GCCGCGGGAG TGGCCGGTGT 31081 GATCAAGATG GTGCTGGCGA TGCGGCACGG GGCGCTGCCC CGCACCCTGC ACGTGGACCA 31141 GCCGTCGACC CATGTGGACT GGTCGGCCGG TGCGGTGGAG CTGCTGACCG AGCCCGCCGA 31201 GTGGCCGGGG ACCTCCCGCC CCCGCCGGGC CGGGGTGTCC TCGTTCGGGG TGAGCGGGAC 31261 CAACGCCCAT GTGATCCTCG AACAGCCACC CGCGGAGGCG GAGTCCGGGC CCGCTCCGGA 31321 GTCGGCACCC GGGCCCGTCC CGGCGGTGGT GCCCGGGCCC GTCCCGGCGG TGGTGCCATG 31381 GGTGCTCTCC GGCCAGGGCG AGCGCGGACT GCGGGCGCAG GCCGCCCGGT TGCGGTCCTT 31441 CCTGGCCGCG CGCCCCGAGT CCGGCCCGGC CGACGTGGGC TGGTCGCTGG CCGCCACCCG 31501 TTCGGCGCTC TCCCACCGGG CCGCGGTGGT CGGGGCGGAC CGGGCGGAAC TGCTGGACGG 31561 ACTGGCCGCG CTTGCGGCCG GCGAGCCCGC CCCGGGCGTG GTCTTGGGCA CCGCGGACCC 31621 GGGCCGGGTG GGCGTGCTGT TCGCGGGCCA GGGTACGCAA CGGCCCGGTA TGGGGCGTGA 31681 GTTGTACCAG TCGTTCCCGG TTTTCGCGGC GGCGTGGGAC GAGGTGTGCG CCGCGCTCGA 31741 CCCGCATCTG GACCGTCCGC TCGGCGAGGT GGTGACCGAT GCCACCGGCG CGCTGGACGC 31801 CACCACGTAC ACGCAGGCGG GCCTGTTCGC CCTCGAAGTG TCGCTGTTCC GGCTGGTGTC 31861 CTCCTGGGGC GTGCGGCCGG ACTATCTGCT GGGCCACTCC ATCGGCGAGC TGGCGGCCGC 31921 GCAGGTGGCC GGTCTGTGGT CGCTGGAGGA CGCCGCCAAG GTGGTGGCGG CCCGGGGCCG 31981 GCTCATGGGC GCGCTGCCGC CGGGCGGGGC GATGGTGGCC CTGGCCGCGC CGGAGGACCA 32041 GGTACGGCCG TTCCTGACCG ACCGGGTCGC CCTCGCGGCC GTGAACGGGC CGTCGTCGGT 32101 CGTGGTGTCC GGGGACGAGG ACGCGGTGTG CGGTGTGGCC GAGGCGTTCG CCGCCCGTGG 32161 GGTGAAGACG CGGCGGCTGC GGGTCGGCCA CGCCTTCCAC TCGCCGCTGA TGGACGAGAT 32221 GCTCATCGCG TTCGCCGAGG TACTCGACAC GGTGGACTTC CGCACCCCGC GGATACCGGT 32281 GGTGTCGAAC CTCTCCGGTG CGGTGGCGGG GGAGGAGCTG TGCTCCCCCG CTTACTGGGT 32341 GCGGCAGGTG CGGGAGACGG TGCGGTTCGC CGCCGGGCTT GAGCGTCTGC GGGAGCTCGG 32401 CACGGGCACC TTCCTCGAAC TCGGGCCGGA CGGCACCCTC ACCGCCTTGG CCCAGGCCCA 32461 GATCACCGGG GCGGACGCCG AGTTCATCCC CACTCTGCGC GCCGACCGGC CCGAGCCGGT 32521 CACGGTCACC ACCGCCCTCG CCCAGTTGCA CACACACGGT GTGGAGCCGG ACTGGTCCGC 32581 GGTCTTCCCC GGCGCCCACC GGGCCGAGCT GCCGACCTAC GCCTTCCAGC GCTCCCGCTT 32641 CTGGCTGGAG CCCTCCCGTA CACCCGGTGA CGCGGGCGAC TTCGGGCTCG GCGCGCTGGA 32701 CCATCCGCTG GTCGGCGCGA GGGTGCCGCT GCCCGACGCG GACGGCGTTC TGCTCACCGG

314

32761 CCGCATCTCC GCCGAGGCCC ACTCGTGGCT GATCGGTCAG CGGGCGCTGG GCGTGCCCCT 32821 GTTCCCGGCG ACCGGCTTCC TGGAACTGGT GCTCCAGGCG GGGCTCCAGT GCGACAGCCG 32881 GACGGTGGAC GAACTCACCA TCCATGAACC ACTCGTCCTC CCCGAGCGGG GCGGGGTCGA 32941 GGTGCAGGTG TCCGTCCGTG GCGCCGACGA GTCCGGCCGC CGCCCGGCCA CCGTGTACTG 33001 CCGCCGCGAC CAGCGGTGGG TCCGGCATGC CACGGCCGTC CTCGGCGCGG ACCGGCCGCC 33061 CGCGCCGGAG CCGCGCCCCG AGCCCTGGCC GCCCACCGGC GCCCGGCCGC TGGAGTCCGG 33121 CGGGACACCG GCGTGGCGCC GTGACGACGA GGTCTTCCTG GACATCGAGC TGCCCGAGGT 33181 GGCCGGGGCC GAGGCCGAAC GCTGGACGCT GCATCCCGCC CTGCTCGAAC AGGCGTTGCG 33241 CGGGGAGGCG CTGGCAGGGC TGGTCACGGC GGCCGAGGGG ACCCATCTGC CGTTCTCCTG 33301 GACGGGGATC ACCCTGCACA CGACGGGTGC CACGAGACTG CGAGCCACCC TCGCGCCCGT 33361 CGGCCCGGAC ACGGTCTCGC TCCACGTGGC CGACGCCGCC GGAACACCCG TGCTGTCGGT 33421 GGACTCGCTG GCGCTCCGCC CGGTGTCCGG ACAGCGGCTG CGCCAGGCCA ACGCGGCGCT 33481 GTTCCGGCCG GTGTGGGCGG CTTGCCGCAC GCGGGCCGAA CCGGACACCG GCTCTGTCCG 33541 ATGGGGGCTC GTCGGCGACC CGGACGCCTG GAAACCGGAC ACGCTCGGCG CGCCGGTCGC 33601 GCTGTACCCG GACCTGTCGG CCATCGAGGA CGTACCGGAC GTCATCCTCC TCCCGTGCGT 33661 ATCCGAGGGC GGAACGGCGT CCGAGGTGGC CGTCCGCGTA TCCGAGACCG TGCGGACGTG 33721 GTTGGCCGGG GAGCGGTTCG CCGCCTCGCG TCTGGTGCTG GTGACCCGGG GCGCGCTCGC 33781 CACGGCGGCC GGTGAGGAGC TCGAGGACCT GGCCGCGGCC GCGGTGTGGT CGCTGGTCGA 33841 GCCCCTCCAG GCGGCCGTGG CGGGACGGCT GACACTCGTC GACACCGATA CGTCCGATCT 33901 GCGCATGCTG CCCGCCGCGG TGGCCGTGGG GGAGGACCGG GTCGCGGTCC GGGCGGGAGC 33961 GGTGCTGGTA CCGGACCTGG TCACGCCGCC GGCCACCGAG CAGGATCCGC CCGCCTGGGG 34021 CCCGGGGACG GTGCTGGTCA CCGGTGGGTC GGCCATGGCT GTCTCCCGGC ATCTGGTCGC 34081 CGAACGCGGT GTGCGTGACC TGGTCCTGGC CGGGGACGGC GACATGGCCG AACTGGCGGC 34141 CCTCGGAGCC ACGGTTCGGC TCGCCCCGTG TGATCCGGCG GACGGTCAGG CGCTGGCGGC 34201 GCTGGTGGCG GAGATTCCCG GGCTGCGGAG CGTGGTGCAC ACCGCGGCCG ACGCCCCGGA 34261 GCGGACCCGG TCCCTCTTGC CGGAATCCCT GCGGCCACAG CTGCGGTCGG GAGTGGCGGC 34321 GGCCTGGAAC CTGCACCTGG CCACGCGGGG CCTGGAACTG GACCGCTTTG TGCTGTTCAC 34381 CTCCGCCGAC GGGACACTGG GCCCCGCGTA CGCCGACGCG CTGGCCGCAC ACCGGCGGGC 34441 TCGCGGACTG CCCGCGGTGT CCGTCTCCAC CGATCTGGGT CTCGCCCTGT TCGACGAGGC 34501 ATGCGCCGGG CCCGGGGAGG CGATCCGGGT CACCACCGCC ACGCCGGCCC CCGCACCCAC 34561 CGAGGCGGAC CGGCAGCCGG TGGAACAACC CCCGGCGGCC GAGGCCTCCG CGACCACGTT 34621 GCTGGAGCGG CTGGCCGGGC GGACGGAGGA CGAGCAGGAC GAGATCCTGC TGGAGCTGGT 34681 CCGTGGCCAG GTCGCCATGG TGCTCGGCCA TCCCGACGCC ACCATGGTCG ACCCGGACCG 34741 AGGCTTCGTG GAACTGGGCT TCGACTCGGT GGCGGCCGTG AAGCTCCGCA ACCAACTGGC 34801 CGGAGCCACC CGGCTCGACC TGCCCGCCAG CCTCACCTTC GACCACCCCA CGGCTGTCGA 34861 TCTCGCCCGC CATCTGCGCG CCGAAATGCT GCCCGACGAC GCGGCGGCCG CCATTCTCGT 34921 GCTCGAAGAG CTCAACAAGC TCGACGATTC GATCCTCGTG CTCGACCCGG CAAGCGCGGC 34981 ACGGGTGCGG ATCTCGACCC TGCTCCAGGA CCTGGCCGCG AAATGGGTCG AGCGGACGGA 35041 TCGGCCATGA gctagcGTGG CACTTTTCGG GCAGAATAAA TGATCATATC GTCAATTATT 35101 ACCTCCACGG GGAGAGCCTG AGCAAACTGG CCTCAGGCAT TTGAGAAGCA CACGGTCACA 35161 CTGCTTCCGG TAGTCAATAA ACCGGTAAGT AGCGTATGCG CTCACGCAAC TGGTCCAGAA 35221 CCTTGACCGA ACGCAGCGGT GGTAACGGCG CAGTGGCGGT TTTCATGGCT TGTTATGACT 35281 GTTTTTTTGG GGTACAGTCT ATGCCTCGGG CATCCAAGCA GCAAGCGCGT TACGCCGTGG 35341 GTCGATGTTT GATGTTATGG AGCAGCAACG ATGTTACGCA GCAGGGCAGT CGCCCTAAAA 35401 CAAAGTTAAA CATCATGAGG GAAGCGGTGA TCGCCGAAGT ATCGACTCAA CTATCAGAGG 35461 TAGTTGGCGT CATCGAGCGC CATCTCGAAC CGACGTTGCT GGCCGTACAT TTGTACGGCT 35521 CCGCAGTGGA TGGCGGCCTG AAGCCACACA GTGATATTGA TTTGCTGGTT ACGGTGACCG 35581 TAAGGCTTGA TGAAACAACG CGGCGAGCTT TGATCAACGA CCTTTTGGAA ACTTCGGCTT 35641 CCCCTGGAGA GAGCGAGATT CTCCGCGCTG TAGAAGTCAC CATTGTTGTG CACGACGACA 35701 TCATTCCGTG GCGTTATCCA GCTAAGCGCG AACTGCAATT TGGAGAATGG CAGCGCAATG 35761 ACATTCTTGC AGGTATCTTC GAGCCAGCCA CGATCGACAT TGATCTGGCT ATCTTGCTGA 35821 CAAAAGCAAG AGAACATAGC GTTGCCTTGG TAGGTCCAGC GGCGGAGGAA CTCTTTGATC 35881 CGGTTCCTGA ACAGGATCTA TTTGAGGCGC TAAATGAAAC CTTAACGCTA TGGAACTCGC 35941 CGCCCGACTG GGCTGGCGAT GAGCGAAATG TAGTGCTTAC GTTGTCCCGC ATTTGGTACA 36001 GCGCAGTAAC CGGCAAAATC GCGCCGAAGG ATGTCGCTGC CGACTGGGCA ATGGAGCGCC 36061 TGCCGGCCCA GTATCAGCCC GTCATACTTG AAGCTAGACA GGCTTATCTT GGACAAGAAG 36121 AAGATCGCTT GGCCTCGCGC GCAGATCAGT TGGAAGAATT TGTCCACTAC GTGAAAGGCG

315

36181 AGATCACCAA GGTAGTCGGC AAATAATAAT ACTAGCAGAA ATCATCCTTA GCGAAAGCTA 36241 AGGATTTTTT TTATCTGATT ACCGCCTTTG AGTGAGCGAA GTTCCTATTC CGAAGTTCCT 36301 ATTCTCATAT AAGTATAGGA ACTTCTTTCA GCTTTCCGCA ACAGTATAAc tgtgcggtat 36361 ttcacaccgc atagatccgt cgagttcaag agaaaaaaaa agaaaaagca aaaagaaaaa 36421 aggaaagcgc gcctcgttca gaatgacacg tatagaatga tgcattacct tgtcatcttc 36481 agtatcatac tgttcgtata catacttact gacattcata ggtatacata tatacacatg 36541 tatatatatc gtatgctgca gctttaaata atcggtgtca ctacataaga acacctttgg 36601 tggagggaac atcgttggta ccattgggcg aggtggcttc tcttatggca accgcaagag 36661 ccttgaacgc actctcacta cggtgatgat cattcttgcc tcgcagacaa tcaacgtgga 36721 gggtaattct gctagcctct gcaaagcttt caagaaaatg cgggatcatc tcgcaagaga 36781 gatctcctac tttctccctt tgcaaaccaa gttcgacaac tgcgtacggc ctgttcgaaa 36841 gatctaccac cgctctggaa agtgcctcat ccaaaggcgc aaatcctgat ccaaaccttt 36901 ttactccacg cactgctcct agggcctctt taaaagcttg accgagagca atcccgcagt 36961 cttcagtggt gtgatggtcg tctatgtgta agtcaccaat gcactcaacg attagcgacc 37021 agccggaatg cttggccaga gcatgtatca tatggtccag aaaccctata cctgtgtgga 37081 cgttaatcac ttgcgattgt gtggcctgtt ctgctactgc ttctgcctct ttttctggga 37141 agatcgagtg ctctatcgct aggggaccac cctttaaaga gatcgcaatc tgaatcttgg 37201 tttcatttgt aatacgcttt actagggctt tctgctctgt accagaacca ccagaaccTT 37261 TGTACAATTC ATCCATACCA TGGGTAATAC CAGCAGCAGT AACAAATTCT AACAAGACCA 37321 TGTGGTCTCT CTTTTCGTTT GGATCTTTGG ATAAGGCAGA TTGAGTGGAT AAGTAATGGT 37381 TGTCTGGTAA CAAGACTGGA CCATCACCAA TTGGAGTATT TTGTTGATAA TGGTCAGCTA 37441 ATTGAACAGA ACCATCTTCA ATGTTGTGTC TAATTTTGAA GTTAACTTTG ATACCATTCT 37501 TTTGTTTGTC AGCCATGATG TAAACATTGT GAGAGTTATA GTTGTATTCC AATTTGTGAC 37561 CTAAAATGTT ACCATCTTCT TTAAAATCAA TACCTTTTAA TTCGATTCTA TTAACTAAGG 37621 TATCACCTTC AAACTTGACT TCAGCTCTGG TCTTGTAGTT ACCGTCATCT TTGAAAAAAA 37681 TAGTTCTTTC TTGAACATAA CCTTCTGGCA TGGCAGACTT GAAAAAGTCA TGTTGTTTCA 37741 TATGATCTGG GTATCTcGCA AAACATTGAA CACCATAACC GAAAGTAGTG ACTAAGGTTG 37801 GCCATGGAAC TGGCAATTTA CCAGTAGTAC AAATAAATTT TAAGGTCAAT TTACCGTAAG 37861 TAGCATCACC TTCACCTTCA CCGGAGACAG AAAATTTGTG ACCATTAACA TCACCATCTA 37921 ATTCAACCAA AATTGGGACA ACACCAGTGA ATAATTCTTC ACCTTTAGAC ATTGTGATGA 37981 TGTTTTATTT GTTTTGATTG GTGTCTTGTA AATAGAAACA AGAGAGAATA ATAAACAAGT 38041 TAAGAATAAA AAACCAAAGG ATGAAAAAGA ATGAATATGA AAAAGAGTAG AGAATAACTT 38101 TGAAAGGGGA CCATGATATA ACTGGAAAAA AGAGGTTCTT GGAAATGAAA AGTTACCAAA 38161 GAGTATTTAT AATTCAGAAA AAAAAGCCAA CGAATATCGT TTTGATGGCG AGCCTTTTTT 38221 TTTTTTTAGG AAGACACTAA AGGTACCTAG CATCATATGG GAAGGAAAGG AAATCACTTG 38281 GAAGACATCA CAAGCATTCA TTTACCAAGA GAAAAAATAT GCATTTTAGC TAAGATCCAT 38341 TGAACAAAGC ACTCACTCAA CTCAACTGAA TGAACGAAAG AAGAAAGAAC AGTAGAAAAC 38401 ACTTTGTGAC GGTGCGGAAC ACATTTACGT AGCTATCATG CTGAATTCTA CTATGAAAAT 38461 CTCCCAATCT GTCGATGGCA AAACGACCCA CGTGGCAGAG TTGGGTCAAG TGCCAGTTTC 38521 TGGATTAAGT AACAGATACA GACATCACAC GCCATAGAGG AATCCCGCCG TTGCGAGAGA 38581 TGGAAAACAA TAGAGCCGAA ATTGTGGAAG CCCGATGTCT GGGTGTACAT TTTTTTTTTT 38641 TttCTTTCTT TCTCTTTCAA TAATCTTTCC TTTTTCCATT TAGCTTGCCG GAAAAACTTT 38701 CGGGTAGCGA AAATCTTTCT GCCGGAAAAA TTAGCTATTT TTTTCTTCCT TATTATTTTT 38761 TTAGTTCTGA AGTTTGACCA GGGCGCTACC CTGACCGTAT CACAACCGAC GATCCGGGGT 38821 CATGGCGGCT A //

316

A.6 Sequence of Fragment 4 yeast assembly.

Sequence of Fragment 2 shown schematically in Fig. 2-1. To show the context of the sequence in the chromosome, the last 30 bp of the HO(L) region on the 5’ end, and the pPYK promoter of the acceptor module on the 3’ end are included. The sequence is in Genbank format:

LOCUS mer-fragment-4-031016 22270 bp DNA linear UNA 07- Apr-2012 DEFINITION FEATURES Location/Qualifiers Misc._recombina 1..30 /label=R1-homology-AN38 Misc._recombina 31..78 /label=FRT-WT ORF 79..879 /label=URA3 Primer_binding_ 463..482 /label=AN184 Primer complement(688..710) /label=AN299 Primer_binding_ complement(857..885) /label=AN106 Terminator 892..1057 /label=tADH Primer 998..1023 /label=AN324 Misc. 1008..1170 /label=cr7-repair Primer 1008..1067 /label=AN489 Primer 1046..1105 /label=AN490 Restriction_sit 1058..1063 /label=XbaI Cleavage_site 1058..1063 /label=assembly1 Misc._recombina 1064..1111 /label=FRT13 Primer complement(1071..1130) /label=AN491 Primer complement(1111..1170) /label=AN492 Amplification 1112..1385 /label="from pSE34" Primer 1201..1220 /label=AN580 Promoter 1207..1383 /label=ermE Primer complement(1359..1379) /label=AN579 Primer complement(1367..1385) /label=AN20 Primer complement(1367..1407)

317

/label=AN221 Primer 1367..1403 /label=DD1 RBS 1386..1415 /label=merE ORF 1416..2609 /label=merE Primer complement(1553..1572) /label=LK2 Primer_binding_ 1652..1673 /label=DAN1 Primer_binding_ complement(1654..1675) /label=DAN2 Primer complement(1654..1675) /label=LK13 Primer 1848..1868 /label=DD14 Primer_binding_ 2279..2301 /label=DAN3 Primer_binding_ complement(2283..2305) /label=DAN4 Primer 2560..2580 /label=DD59 Primer complement(2560..2580) /label=DD63 Primer complement(2675..2693) /label=DD13 ORF complement(2759..3415) /label=merF-1 Primer 2900..2920 /label=DD7 Primer complement(2991..3009) /label=DD4 Primer 3198..3217 /label=DD5 ORF complement(3301..3837) /label=merF-2 Primer_binding_ complement(3348..3367) /label=DD3 Primer complement(3348..3367) /label=DD3 Primer complement(3471..3493) /label=DD64 Primer_binding_ 3477..3499 /label=DAN5 Primer complement(3902..3921) /label=DD2 Primer_binding_ 3991..4010 /label=DAN6 ORF complement(4237..5721) /label=merG Primer 4319..4337 /label=DD11 Primer complement(4539..4558) /label=DD8 Primer 5047..5067 /label=DD9

318

Primer complement(5191..5211) /label=DD6 Primer_binding_ 5620..5641 /label=DAN7 Primer_binding_ complement(5758..5780) /label=DAN8 Primer 5779..5803 /label=DD29 ORF complement(5840..6319) /label=merH Primer 6110..6131 /label=DD23 Primer 6189..6209 /label=DD24 Primer complement(6323..6343) /label=DD12 Primer_binding_ 6352..6374 /label=DAN9 ORF complement(6401..7345) /label=merI Primer_binding_ complement(6534..6554) /label=DAN10 Primer complement(6876..6895) /label=DD60 Primer 7105..7123 /label=DD16 Primer 7105..7123 /label=DD18 Primer complement(7265..7286) /label=DD15 Primer complement(7270..7292) /label=DD10 ORF 7300..8295 /label=merJ Primer_binding_ 7691..7713 /label=DAN11 Primer_binding_ complement(7701..7720) /label=DAN12 Primer 8103..8123 /label=DD26 Primer complement(8143..8161) /label=DD25 Primer complement(8317..8337) /label=DD28 ORF complement(8405..9229) /label=merK Primer_binding_ 8771..8792 /label=DAN13 Primer_binding_ complement(8798..8818) /label=DAN14 Primer 9005..9023 /label=DD21 Primer complement(9119..9136) /label=DD20 Primer complement(9119..9136) /label=DD19 Primer complement(9136..9155)

319

/label=DD17 ORF complement(9342..10391) /label=merL Primer_binding_ complement(9381..9401) /label=DAN16 Primer_binding_ 9414..9433 /label=DAN15 Primer 9856..9876 /label=DD44 Primer 9904..9923 /label=DD34 Primer complement(10041..10059) /label=DD27 ORF complement(10461..10868) /label=merM Primer_binding_ 10582..10601 /label=DAN17 Primer_binding_ complement(10614..10633) /label=DAN18 Primer 10820..10839 /label=DD30 Primer complement(10922..10942) /label=DD22 ORF complement(10953..12446) /label=merN Primer_binding_ complement(11301..11319) /label=DAN20 Primer_binding_ 11306..11324 /label=DAN19 Primer 11746..11764 /label=DD36 Primer 11906..11925 /label=DD55 Primer complement(11907..11925) /label=DD35 Primer complement(12007..12026) /label=DD45 ORF complement(12443..13084) /label=merO Primer 12478..12499 /label=DD56 Primer_binding_ complement(12491..12511) /label=DAN21 Primer 12894..12912 /label=DD32 Primer complement(13073..13093) /label=DD31 Primer_binding_ 13137..13156 /label=DAN22 Primer_binding_ complement(13159..13178) /label=DAN23 ORF complement(13174..14220) /label=merQ Primer 13367..13426 /label=AN374 Misc._signal 13407..13426 /label="cr6-2 target"

320

Primer complement(13407..13466) /label=AN375 Misc._signal complement(13415..13434) /label=cr6-target Mutation 13416 /label="C to T" Primer 13722..13740 /label=DD46 Primer complement(13912..13929) /label=DD37 ORF 14085..14741 /label=merR Primer_binding_ 14347..14366 /label=DAN24 Primer_binding_ complement(14394..14414) /label=DAN25 Primer 14769..14788 /label=DD38 Primer 14769..14788 /label=DD51 ORF 14870..16249 /label=merS Primer complement(14955..14972) /label=DD33 Primer complement(15074..15094) /label=DD61 Primer 15535..15553 /label=DD48 Primer complement(15706..15724) /label=DD47 Primer 16036..16054 /label=DD62 Primer_binding_ complement(16246..16266) /label=DAN27 Primer_binding_ 16291..16309 /label=DAN26 ORF complement(16299..16706) /label=merT Primer 16730..16751 /label=DD40 Primer complement(16850..16869) /label=DD39 Primer 16850..16869 /label=DD52 ORF complement(16939..18628) /label=merU Primer_binding_ complement(17031..17052) /label=DAN28 Primer_binding_ 17035..17056 /label=DAN29 Primer 17696..17713 /label=DD50 Primer complement(17809..17828) /label=DD49 Primer complement(18234..18255) /label=DD57 Primer 18310..18330

321

/label=DD58 Primer_binding_ complement(18383..18403) /label=DAN30 RBS complement(18599..18628) /label=merT Primer 18607..18666 /label=DD42 Primer complement(18608..18667) /label=DD41 Promoter 18664..18692 /label=ampR ORF 18734..19594 /label=Ampicillin Primer 19077..19101 /label=AN380 Primer complement(19123..19143) /label=AN381 Primer complement(19646..19681) /label=DD43 Primer complement(19646..19675) /label=AN347 Primer complement(19646..19681) /label=DD54 Misc._recombina 19741..19788 /label=FRT3 Primer complement(19866..19881) /label=AN112 Terminator complement(19867..19924) /label=tHIS3 Terminator complement(19994..22270) /label=HIS3-part ORF complement(19994..20697) /label=HIS3 Primer complement(20076..20097) /label=AN272 ORF complement(20698..21411) /label=GFP Promoter complement(21412..22270) /label=pPYK ORIGIN 1 aaaattgtgc ctttggactt aaaatggcgt GAAGTTCCTA TTCCGAAGTT CCTATTCTCT 61 AGAAAGTATA GGAACTTCTC GAAAGCTACA TATAAGGAAC GTGCTGCTAC TCATCCTAGT 121 CCTGTTGCTG CCAAGCTATT TAATATCATG CACGAAAAGC AAACAAACTT GTGTGCTTCA 181 TTGGATGTTC GTACCACCAA GGAATTACTG GAGTTAGTTG AAGCATTAGG TCCCAAAATT 241 TGTTTACTAA AAACACATGT GGATATCTTG ACTGATTTTT CCATGGAGGG CACAGTTAAG 301 CCGCTAAAGG CATTATCCGC CAAGTACAAT TTTTTACTCT TCGAAGACAG AAAATTTGCT 361 GACATTGGTA ATACAGTCAA ATTGCAGTAC TCTGCGGGTG TATACAGAAT AGCAGAATGG 421 GCAGACATTA CGAATGCACA CGGTGTGGTG GGCCCAGGTA TTGTTAGCGG TTTGAAGCAG 481 GCGGCAGAAG AAGTAACAAA GGAACCTAGA GGCCTTTTGA TGTTAGCAGA ATTGTCATGC 541 AAGGGCTCCC TATCTACTGG AGAATATACT AAGGGTACTG TTGACATTGC GAAGAGCGAC 601 AAAGATTTTG TTATCGGCTT TATTGCTCAA AGAGACATGG GTGGAAGAGA TGAAGGTTAC 661 GATTGGTTGA TTATGACACC CGGTGTGGGT TTAGATGACA AGGGAGACGC ATTGGGTCAA 721 CAGTATAGAA CCGTGGATGA TGTGGTCTCT ACAGGATCTG ACATTATTAT TGTTGGAAGA 781 GGACTATTTG CAAAGGGAAG GGATGCTAAG GTAGAGGGTG AACGTTACAG AAAAGCAGGC 841 TGGGAAGCAT ATTTGAGAAG ATGCGGCCAG CAAAACTAAG TCGACCTCGA Ggcgaatttc 901 ttatgattta tgatttttat tattaaataa gttataaaaa aaataagtgt atacaaattt 961 taaagtgact cttaggtttt aaaacgaaaa ttcttattct tgagtaactc tttcctgtag

322

1021 gtcaggttgc tttctcaggt atagcatgag gtcgctctct agaGAAGTTC CTATTCCGAA 1081 GTTCCTATTC TCATATAAGT ATAGGAACTT CtaccAGCCC GACCCGAGCA CGCGCCGGCA 1141 CGCCTGGTCG ATGTCGGACC GGAGTTCGAG GTACGCGGCT TGCAGGTCCA GGAAGGGGAC 1201 GTCCATGCGA GTGTCCGTTC GAGTGGCGGC TTGCGCCCGA TGCTAGTCGC GGTTGATCGG 1261 CGATCGCAGG TGCACGCGGT CGATCTTGAC GGCTGGCGAG AGGTGCGGGG AGGATCTGAC 1321 CGACGCGGTC CACACGTGGC ACCGCGATGC TGTTGTGGGC ACAATCGTGC CGGTTGGTAg 1381 gatccTCAGT TGATGCGCGA ACGAGGGAGT CAACAGTGAG CGAGACCTTG TCCCTTCCCG 1441 GGACCGTGAA GGCCGAACGG CGTTGTCCGT ACGACCCGCC GGAGGCGCAC CGCCGACTGC 1501 GGGACAAGGG CGAACTGGGC AAACTGGAGC TGCCCGGCGG TCTGGTGATG TGGTTCCTGA 1561 CCAAGCACGA CGACATCAGG GCCATGCTGG CCGACTCCCG GTTCAGCGGT GCGAGGGTGC 1621 CGTTTCCGGC GATGAACCCG GAGATACCCG CGGGCTTCTT CTTCTCCATG GACCCGCCGG 1681 ACCACACCCG CTACCGCCGC ACACTCACCG CCGAGTTCTC GGTGCGCGGC GCACGCGAAC 1741 TGACCGGCCG GATCGAGCGG CTGGCCGACC GGCACCTCGA TGCGATGGAG GCGGCGGGCA 1801 CGAGCGCGGA CCTCGTGGCG GCCTACGCCA GTCCGGTGCC CGCGATGGTG ATCTCCGAAA 1861 TCCTCGGCGT GCCGTACACC TACCACCAGA AGTTCGACCA CGAGGTACGC ACGCTCCGGG 1921 AGACCGGCGG CGACGATCAG GCCGTCGGCG CGATGGCGAC CGCGTGGTGG GACGAGATGC 1981 GCGGATTCGT GCGTGCCAAA CGGGCCGAGC CCGGGGACGA CATGATCAGC AGGCTGCTGC 2041 ATGATGAGGT CGAGGGCGGT GCGCTGACCG ACGAGGAGGT GGTCGGCATT GCGATGACCA 2101 TCATTTTCGC CGGTCATGAA CCCGTGGAGA ACCTGATCGG CCTCGGCATG CTGGCGCTGT 2161 TCCAGGACGG TGAGCAGCTG ACCCGGTTGC GGGAGAACCC CGACCTCATT GACAGCGCCG 2221 TGGAGGAGTT CCTTCGCTAC TTCCCCGTCA ACAACTTCGG CACCGTGCGC ACCGCCACCG 2281 AGGATGCAGT GATCAATGGT CACCCCATCG CGAAGGGCGA GATCGTGGCC GGTCTGGTGT 2341 CCACCGCCAA CCGGGACCCC GAGCGGTTCG CCGATCCCGA CCGCCTTGTC CTCGACCGGT 2401 CGCACACCTC CCACCTCGCG TTCGGGCACG GTGTGCACCA GTGTCTGGGC CAGCAGCTGG 2461 CGAGGGTGGA ACTGAAGGTG CTCCTACAGC GGCTGCTCGT CAGGTTCCCC GCTCTGCGGC 2521 TGGCGGTGGC CCCGGAGGAG ATCAGGTACC GGGAGAACAC CTCGTTCTAC GGTGTCCACG 2581 AGCTCCCGGT GACCTGGGCG GCCGAGTAGC CGCAGCCGGG GCCGGAAGAC ACGGCGCGGG 2641 CGGTGGCCGC GGGGTCCGGC GCGAGCGGTG GCCGGATACC CGGCCACCGG CTCAGCCGGC 2701 CCGGGTGACG CCCACTGCCG CCCTGAGATC CGCCCAGAAC TCCGGCCGAC CGCGGATCTC 2761 AAGAGCCGAG CCGGCCGCCG GTCAGGCGTG CCAGCGGGCG GCGGCGACGA GCCGCGCGCC 2821 TGCGCAGGAC GACCGCGGCG GTGGCGCCCG CCGCGGCGGC ACCCGCACCG GCGACGGCTG 2881 CCAGGACCTT CCGGGCCTTG ATCATCATCC AGGCCGAACC CGCCGCCGCG GCAGCCTTCT 2941 TGGGCGCCTC AGCGGCCACG GTACGACCGG TGGTCACGGC CTTCCCGGCG GCCACTTGAG 3001 CCTTGTCGGC GGCGACATGA GCGGTGTGCG CCGCGGTCGT CGCCGCGCCG GCCGCGGCGG 3061 TCTTCGCCGT GGTCTCCGCC TTGCCCGCGG TGTCCTTGGC CCTTTCCGCG GTCTCCTTGG 3121 CCTTGGTGGC GGTGGCCTTC GCGTTCACGT TGGTGCTCCG GGTAGTGTGT GCGTTCTTCT 3181 TCTCGGTCAT GTTCACCGCG TTACCACTCG ACCCGACAAC AAACCCGCGT CCTCGGGGCC 3241 GGCCGCCACG GCGTGACGAT CCGAGGGGGC GGCACACCGG GAGGCGCGCC GCCCATCGGC 3301 TCATTCGATG TCTGAGCCGC CCGAACCACC CGAGCCGCGC CGCTGCTGGA ACAGCACAAA 3361 CCCTCCGGCC AGGGCGAACA GACAGACGCC GATGATGATG CCCACGGCGG GCCAAGCGGA 3421 TCCGCCGTCG GATGTCGCGG CCTTCGAAGG CTCCAGCGAT GTGGCATCGG GCAACTGTCT 3481 TACGACGAAA CTGTGTTCGG CGTTCTCGCC GGGTGCGAGC GCCGGGCCGC CGACCGAGTA 3541 GCCGTCATCG GTGGGCTTCA GCTTCCAGCC CTTCGGGGCC TGCTTCAGCC TCACATCGCC 3601 GGGGTCGATG CCCGTGGGCA ACACGGTCCG GATCTCGGTG AAACCGGCCC TCCCGTCCTC 3661 CGCCTCCGAC TCGAACGTCA GCGTGACGTC CTTCGCCAGG GCGCGGGAGT CGGAGGCGCT 3721 CACCTCGGTG TGGGCCAGGG CGGGGGTCGC CGTGGCGAGG ACGAGCGCCG AGGTGGCGAC 3781 GGCCAGGGCG CCGATCCGTC GCGGCCGCGG GTGGGACGGC GGACGATGTG CTTTCACGGT 3841 ATCTCTCCTC GTCCATGGGG TGGTCACCCG AGCCCTCCTC ACCGGTCACC CGGCGCGCTC 3901 ACCAGGTGCG TTCACATTCC CCGGTACGTA CGGCCCGGGC GCGATGTTCA TCCTCGCCCA 3961 GACCGGAAGC CCGCTTGCCA CGCGGGGGAG CAAGGAGTTC CGGCGTCTTC GTGGCCCGCG 4021 CAAGGCGGAC CGAGGGGGTC GCCGCGTCCC AGTGCGGTGA TGTGCCCGGT GGTCAGGTGG 4081 TCCGCTGGTC CCGCAGGGTC CGGGACAGCT CCTCCTCGGT GAGCACCCGG GGCGCGGGTC 4141 CGGCACCGGG CTTCGTCTCG GCGCCGTTCG CCGGGCTCTT GTTCTCGGTC GTCGTCATGA 4201 GGGTGCACCT TTCGCTGGTG GTGGCGGAGG ACGGGATCAG GCGGTGGCGA CCGGTGTGGC 4261 GGGCGGATTG GCGTTCCGCG GCACAGCGGG GGGCTTGCGC GCAGGTCCTC GCAGTACGGT 4321 GAACGCCACC GCGAAGGCCG CCACGATGAG CCCCGTTCCG ACGGCGAAGG CCAGGTGGTA 4381 GCCGCCGGTC AGCGCCTCGG CCCGACCCTT GCCCCGGGAG AGCAGGGCAT CCGTGCGGGA

323

4441 GGCGGCCAGG GTGGACAGCA CCGCGACGCC CAGCGCCATG CCGATCTGCT GGGTGGTGTT 4501 GAACAGCCCG GAGACGAGCC CGGCCTCGTC CTCCTTCGCA CCGGACATTC CCAGGCTGGT 4561 CAGCGCAGGG AGCGCCAGCC CGAAACCGGC GGCGAGCAGC ATCACCGGGA GGAGGTCGGG 4621 GAGGTACCGG GCGTGCACGG GGACGCGGAC GAGCAGGCCG AGAACGCCGG TCAGGAGGGC 4681 CAGCCCGGTC AGCAGCACCG CGCGGTCGCC GAAGCGTGCG CTGAGCCGTG CGGAGACGCC 4741 GAGGGACACC GCGCCGATGG CGATGGCGGC CGGGAGCATG GCCAGACCGG TTCCGGTGGC 4801 GTCGTACCCC AGCACATTGC GCAGATAGAG GGCGACCAGG ATCTGGAACG AGAAGAGCGC 4861 GGCCACCATC AGGAGCTGGA CCAGATTGGC CCCCGCCACC CCGCGCGACC GCAGGATCCG 4921 CAGGGGCATC AGCGGGGTGC GGGCGGTGGT CTGGCGGACC AGGAACAGGG CGATCAGGAG 4981 GATCGAGACG GCGCCGAGGC CGAGTGTGCG CGCCGCCGTC CAGCCGTAGT CCGCCACCTT 5041 GACCACGGTG TAGATGCCCA GCATCAGCCC GGTCGTGACC AGCAGGGCGC CGAGGACATC 5101 GGCGCCGGCC GCGAGGCCCG GCCCGCGGTC GGCGGGCAGG ACGGGTATGG CGACCGCGAG 5161 CGTCAGCAGC CCGATCGGCA GATTGATCAG GAAGATCCAG TGCCAGCTGA GCGCGTCGGT 5221 GAGGAGGCCG CCGAGCACCT GGCCGATCGA CGCTCCGGCG GCGCCGGTGA AGCTGAACAC 5281 GGCGATCGCC TTCGACCGTT CGGCGCGTTC GGTGAAGAGC GTGACGAGGA TGCCCAGGCT 5341 GACCGCCGAG GCCATCGCGC TGCCGACCCC CTGGAGGAAC CGTGCGGCGA TCAGCACAGC 5401 GGGGGAGGTG GCCACGGCCG CGAGCAACGA GGCCGCGGTG AACACCGCGG TACCGGTCAG 5461 GAACACCCGC TTGCGGCCGA TGAGATCGCC GATACGGCCG CCGAGCAGCA GCAGACCGCC 5521 GAACGCGATC AGGTAGGCGT TGACGACCCA GCTGAGCCCG GCGGGGGAGA ACCGCAGATC 5581 GCTCTGGATG GCGGGCATGG CCACGGTCAC GATGCTGCCG TCGAGGATCA CCATCAGCAT 5641 GCCGGTGGCG ATGACCCCGA GGGCCAGTCG ACGTGTCGGG GGGACACGGG GAGACGTCGG 5701 GTCGGAAGAG GCGGCGGACA TGCGGACACT CCTGTCAGTA GGCGCGACAG GAGTGACCGT 5761 AGCAGATGGT TTTGTTGCAG ACTATTTGTT TCGGCTCTAC TTGTGGCGCA TGACGGGCGG 5821 CCGCGCGGAT GTCTCCGCTC TACTTCTCGC GCTGCCGCGC CCTCCGCGCC GGGCGGGGGC 5881 TCTCGGCGGG CGTGGCCAGA TGCCCCTCGG ACAGCCGGGT CAACGCCTTT AGCAGCGCGG 5941 CGCGTTGGGT CTCGGGGAGT GTCGCCAACG CCTCGCGATG GACGCGGTCC ACGATCTCCT 6001 GGCTCCGTTC GGCGATCCGC GCGCCCTCCT CGGTGACCGC GATGATCCGG GCCCGGCGAT 6061 CGTGGGTCGA GGCGCGCCGC TCCGCGAGGC CCGCCTTCTC CAGGGCGTCC ACCGTCACCA 6121 CCATCGTGGT CTTGTCCATG TCGCCGATCT CGGCGAGCTG GGCCTGGGTG CGCTCTTCCT 6181 CCAGGGCGTG GACCAGTACG CAGTGCATCC GCGCCGTCAG CCCGATTTCG GCGAGCGCGG 6241 CCGACATCTG GGTGCGGAGG ACGTGGCTGG TGTGGTCGAG GAGGAACGAC AGGTCGGGTT 6301 CGGTCTTGGT GGGCGCCATG GCGGTCATGC GGCCCAGGGT AACAATTCGA TCCGTACTGG 6361 ATTATCCGGA ACAGTCCATA GGGAGGGGTG GGGTCAGGGG TCAGCCGTTG CGGTAGAGGG 6421 CGGTGAGCAG CTCGACCGCG GTCTGGGTGG CGGGGTGCGG GTCGTCCCGC CACCAGGCGA 6481 GGCGTACCGC GATGGGCTCG GCGTCGCGGA CCGGCCGGTA GGCGATTCCG GGCCTCGGAT 6541 ACTGGTTGGC CGTGGACTCC GCCGTCATGC CGACGCAGCG GCCCGCGGAG ATCACGGTGA 6601 GCCAGTCCTC CACGTCGTGG GTCTCCTCCG TGGCCGGCCG GGAGTCGGGC GGCCACAGCT 6661 CCGTGGTGGT GGTACCGGTC CTGCGGTCGA CCAGCAGGGT GCGCCCGCTG AGGTCGGCCA 6721 GCCGGACCGA GCGGCGCCTG GCGAGCGGGT CGTCGGCGGC CACGGCGCAC AGCCGCCGCT 6781 CCAGTCCGAC GATGGCGGAG TCGAAGCGGC GCTCGTCGAG CGGTCTGCGC ACCACGGCCA 6841 GGTCGCAGGC GCCCTCCGTC AGCCCCGCGG TGGCGGAATT GACGCGGACG AGGTGCAGCT 6901 CCGTCTCGGG ATACGCCTGC GCCCAGCGGC GCTGGAAGGC GGGGGTGTGA CGGCCCAGCG 6961 CGGACCAGGC GTAGCCGATC CGCAGATGGG CGTGGCCCGA TACGGCCTCC CGGATCAGCC 7021 CGTCCACCTC GGCCAGCACC CGCCGGGCGT GTGCCACCAC CCGCAGCCCG GTGCCCGTCG 7081 GGGTCACCTC GCGGGAGGTC CGCCGCAACA GCCTTGTCCC CAGGGCGCGT TCGAGCGCTG 7141 CCAGGGTGCG GGACACGGCC GCCTGGGAGA CGCCGAGCGC GATGGCGGCG TCGGTGAAGG 7201 TGCCCTCGTC GACGATCGCG ACGAGGCAGC GCAGTTGCCG TAGCTCCACA TCCATACGTC 7261 CAGCGTATAG ATAGAACCCC GAACGCATTT TGCGCATGCA TGAGCCGGGC GCACGATCGA 7321 CGCATGCGAC TCGCCCCCGC GTCACGCACC CCGTCCCCAC GAGCGATGGA CACCGCACAC 7381 CGGACCGCGC CGACCCCTGC CGACTACGAC CTGGGACAGG GCCTGGAGCG GGGCCTGGCC 7441 CCTGACCCTG ATCAGCGGCC GACCGGACGG CGGTTCGCCG GTGTGGCCAC GATGATCGGC 7501 AGTGGGCTGT CCAACCAGAC CGGCGCCGCG ATCGGATCCC AGGCCTTCCC CGTCATCGGC 7561 CCGGTCGGGG TCGTCGCCGT CCGCCAGTAC GTGGCCGCGA TCGTCCTGCT GGCCGTCGGC 7621 AGGCCCCGGT TGCGGAGCTT CACCTGGTGG CAGTGGCGGC CGGTGGTGGG GCTCGCCGTG 7681 GTGTTCGGCA CCATGAATCT GTCCCTGTAC AGCGCCATCG ACCGCATCGG CCTCGGGCTG 7741 GCGGTGACCC TGGAGTTCCT CGGCCCGCTG TGCATCGCGC TCGCCGGCTC ACGGCGCCGC 7801 GTGGACGCCT GCTGTGCGCT GGTCGCGGCG GCCGCCGTGG TGACCCTCAT GCGCCCGCGC

324

7861 CCCTCGGCCG ACTATCTGGG TATGGGGCTG GGGTTGCTGG CCGCCGTGTG CTGGGCGTCG 7921 TACATCCTGC TCAACCGCAC CGTGGGGCGG CGGGTCCCCG GCGCCCAGGG GTCGGCGGCG 7981 GCCGCGGGGA TCTCCGCGCT GATGTTCCTG CCGGTCGGGA TCGCCGTCGC CGTCCACCAG 8041 CCGCCGACCG TGAGCGCCGC GGCGTACGCC ATCATCGCGG GCGTCCTCTC CTCGGCCGTG 8101 CCGTACCTCG CGGACCTGTT CACGCTGCGC CGCGTGCCCG CCCAGGCGTT CGGGCTCTTC 8161 ATGAGCGTCA ACCCCGTCCT CGCCGCACTG GTCGGCTGGG TCGGCCTGGG GCAGAGCCTG 8221 GGGTGGACGG AGTGGATCAG CGTGGGCGCC ATCGTCGCGG CCAACGCGCT GAGCATCCTC 8281 ACCCGGCGCG GCTGAAGGAC CAGCGGGGGT GGCCCGGTGA CTTGGCTGAC CTGGACCCGG 8341 GGGTGGACCC GGGGACGGAG GGCCGCGCCG CCCCCAGGCC ACCGCTCCGC CCCCGGGCCA 8401 CCGCTCAGCC CGCGGCCTCG AACAGCGCCT CCGCGGCGGC GATCGCCTCG GCCAGGGCGG 8461 TGGGCTCCGG CCGCAGCCCC GCCACGATCG TGTCGATGGC GCGCAGGTCC GCCCGCTGGA 8521 GGAGCCGCTT CTCGTTGGTG ACCCACTCAC CGCGCGCCGC CAGCACCGCG TGCGCCGTCT 8581 GGAGGGCGGC GGTCGCCACC GCCCCCGCGA CCTCGGTGGC CTGGCCACGT CCCACGTACG 8641 CGGCCGAGGC GTACCGCAGG GTCAAGGCCG CCCGCCCGCG CCACGCCGGG GGTGCCGCCT 8701 CACGCAGCGC CGCCGGGTAC TCGGGGCGGG GCAGGGTGCC CCGCAGCACC TGGTTCAGGG 8761 CGAGTTCGGC CACCACCAGA TAGCTGGGGA TGCCCGCGAG GTGGAACATC AGCGGCTCCC 8821 AGTGGAAGCG GCCCCGTCGC GACTCGGCGA GTTCGTGTTC CACCACCTCG AGGTCGCGGT 8881 AGTGGACGTC CACGCGCCGT CCGTCGATCG TCAGCCAGGC GCCCCCGTTG AAGACACCAC 8941 CGCCCCACTC GCCGAGCTCG GAGACCTCGC CCTCCCAGCC CACGGCCCGC AGCGCGGCCG 9001 GGTCGAAGCC GCCTCGGTAG TACAGGGCCA GGTCCCAGTC GCTCTCCGGG GTGTGGGTCC 9061 CCTGCGCACG CGAGCCCCCG AGGGCGACGG CGTGCACGGC GGGCAGGGCG GCGAGTCGCT 9121 CTGCGACATC GTCGAGGAAC GTGTCGTCGG TCATGAGGAA CATGTCGTCG GTCATACGGA 9181 TCCGATCGTG TGAAGTGGAT GACGGGTGCC GCGGGCACAC CGAACGCACG CCGGAGCACA 9241 GGCTCGAACG GCGGCGTCCA TACGGATCGG TGTGCCGGGT CATCACTCCA TGACGCCGAC 9301 CTTACCGCCC CCGTCAGAGG GCGGCAGCGG CCAGCGGCGG ATCAGTCCTT CACCAGGCCC 9361 AGGCGGAACA GGCTCTCCGC GGTGTCGAGG ATGGTCGTCA CCGGGTCGCG CGGGGTCCAG 9421 CCGAACACGG AACGCGCCTT CTCGGTGCGC AGGATCGGCA CCCGCTCCGT CACGCCGACC 9481 GCTTCCCGCG CTCGTTCGTC GTCGAACTCC CGCGTGGGCA CCCGGGCGGC GCGCTCGCCG 9541 AGGTGCTCGG CCAGCACCTG GGCGATCCAC AGGAAGCTGA CGGTCCGGTC GCCGCTGGCG 9601 AGGAAGCGCT CTCCGGCCGC GGCGGGGTGT GCCATGGCCC GGAGGTGGAG CTCGGCGACA 9661 TCGCGCACGT CCACCATGCC GAAGTGTGCG CGGGGGACGG CCGACATCGC CCCCTCCAGC 9721 ATCGCCCGGA CGTGTTCCGT CGAGGCGGAC AGCCGGGGGC CGAGTGCCGG ACCGAAGATC 9781 CCGGTCGGGT TGATCACCGT CAGTTCGAGG CCGTCCCCCT CCTTCGCCAC GAAGTCCCAG 9841 GCGGCCAGCT CCGCGATGGT CTTCGAGCGG ATGTAGGGCG GGTTGTCGTC CTCGGGGTCG 9901 GTCCAGTCGC TCTCGTCGTA CTCGTCACCG TCCTTGTGGC TGTATCCCAC CGCGGCGAAC 9961 GAGGACGTCA TCACGACCCG TTTCACACCC TGGTCCCGTG CGGCCCTCAG CACACGAAGG 10021 GTGCCGTCCC GCGCGGGGAC GATCAGCTCG TCGGCGTTGT CCGGCTGGAC GGCGGGGAAC 10081 GGTGACGCGA CGTGGTGGAC GCGGGTGCAC CCCGCCATCG CGTCGTCCCA GCCGTCGTCC 10141 GTGGTCAGGT CGGCGCTGAC GATATCGAGC CGCCCGCCGG GATCGACACC GGAGGCCGCG 10201 ATGGCCGACC GGACACTCGC GGCGGCGCCG GTGGCCGGGC CGTGTGAACG GACCGTGGTG 10261 CGGACCCGAT GGCCGCTCCG CAGCAGGCCG CTGATCACAT GGGTGCCGAG ATAGCCGCTG 10321 CCTCCTGTCA CCAGGACGAG TTCGCCACTA ACGGTGTCGC CACGGGCGTC GGCGCCGGCA 10381 TCAGCGGACA CGGGGGTTGC CTTGCTTTCC ATGGGGTACT TCGGATCCCT TCCCAAGTGT 10441 GTTTCTCGCA GCTGTGTCTC TCACGGCCGC AGCGCGTCGA TGACGTCCGT CAGTTCGTCG 10501 ATCGCCCGGC GCTCCGCATC GGCGTCGGCG CGCTCCCGCG CGTCCCACAT GTCCGCCTTG 10561 AGCCGCAGAT AGCGCAGCCG GACCTCCATG AGCGCGATCT CCCGCTCCAG GCGGTCCGCG 10621 TTGCGTTGGA AGAGGTCACG CAGGGGAGCC GCACCCTGAT CGCCTTCGTC GAGGTGGCCG 10681 AGGTAGGCGC GCATGTCCTG CATGCTCATG CCGGTGGATC TCAGGCACCC CAGCGACCTG 10741 ATCGTCTCCA CCACGGAGGG GGGATAGCGC CGGTGGCCAC TGTCCCGGTC GCGGTCCACG 10801 GCGGGGATCA AGCCGATCTT CTCGTAGTAG CGCAGTGTCG GCTCCGAGAG GCCACTCAGC 10861 CTCGACACCT GCTGGATGGT CATCGGGGAG CCCCGGACCT CTGTCCTCGT CGTTGTCATA 10921 GGACGAGCAT CCGATACTTG AAGCGCTTGA GGTCAAGCGA GCCGATCGGC CTTTGCGGGG 10981 ACGGCGGTCC CGGAAGTGGG CGCGCCCGGG CGCTTCCCGG CCGTGGCGAT GGAGATGTCC 11041 CGCATGAGGA GCAGCGCCCC GAGGGCGAGG AGCCCGGCGG CGCCGCCGCT CCAGGAGACG 11101 GTGGAGCCGA TGGCCGTGGC GGTGGGCGCG GCCGCGCCGG AGTGGCCGAT CAGGGACTGG 11161 GCGCTGGCCA GGCCGACCGC GCCACCGAGC TGCTTGGTGA GCGCGGAACC CGCGGTGGCG 11221 GTGCCCATGT CCGCACGCGG GACGGCGCTC TGGGTGGCGA TGGTGAGCCC GCCCATGGCC

325

11281 GGTCCCGCGC CGAGCCCGAC GAGCAGCAGC AGAACGGACG TCAGCGCGAG AGGGGTCGTG 11341 GCCCGCAGGG CGACGAAGGC GGCGGTACCG GCGGTGAGCA GCCCCGCGCC GATCAGCAGG 11401 ACCGGCTTGA CGTGCCCGCT GCGCAGCACG GTGGCGGCGG TGAGCCGGTT GCCCAGGGTC 11461 ATGCCGATGA GCAGGGGGAG CAGCAGCAGA CCGGAGGCGG TGGCCGAATG GCCGCGGATG 11521 TGCTGGAAGT ACAGCGGCAG GAAGATTCCC ACCGGCGCCG CGGCGACCTG GAAGAAGAAA 11581 CCGGCGGTCA GCAGGGCGGT GTAGGTGCGG TGCCGGAACA GCCGCAGGGG CAGGACGGGG 11641 ACGGCGGCCC GCCGCTCGAC CGGTATGAGC GTGGTGAGCA GCGCCAGACC GCCGAGCAGA 11701 CAGCCCAGCA CGGCCGGGTC CGTCCAGGAG GGCGCGTGTC CGGCGGTCGC GTTCCCCTTG 11761 AGGCTGAGGC CGGTCAGCGC GAGGGCGAGC CCCGCGGCGA GCAGGAGGAT CCCCGCCACG 11821 TCGAGCCGGC CGGACGGCGG GGTGGCGGGA CGGCGGTCGG GCAGGGCCAG GACGATGACG 11881 GCGCCCGCGG CCAGCCCGAG CGGGAGGTTG AGCCAGAACG CCCAGCGCCA GCCGATGTGA 11941 TCGGCGAGTA ACCCGCCGAG GAGCGGGCCG CCCACCATGC CCAGGATCAT CATGGCGGCC 12001 ATCGCCGTCT GCATCCGGAT GAGGCCCTGG GGGCGGGACG GCGGGTGGAG GTCGCGGACC 12061 AGTGCCATGC CGAGGGTCAG CAGGGATCCG GCACCCAGGC CCTGGAGCGC GCGGGAGAGG 12121 ATCAGGGCGG GCATCGAGGC GGACAGGCCG CAGGCGATGG AGCCGATCAG GAAGACGCCG 12181 AGCCCGCCGA TCAGCAGCCG GCGGCGGCCG TGGAGGTCGG AGAAGCGGCC GTAGACCGGC 12241 ACGCTGACCG AGGAGGTCAG CAGATAGGCG GTGACGAGCC AGACGTACCA GGAGTCCCCT 12301 CCGCCGATCT GCTCGACGAT GCGGGGCAGC GCGGTGCCGA CCACGGTGCC GTCCAGCATG 12361 GCCAGGAAGG CGCAGCCCAG CAGGGCGATG GTGACCAGGG CCCGGCGGCG GTGCGGGAGC 12421 GCTTCGTACC CGTCGGGTCC GGTCACCGGG CCTCCCCGGG CGCCACGAGC GCGCCGTGCA 12481 GGAAGAGGTC CACGACTTCC TCGGTGGTCA GCGGCTCGGG GGCGCCCAGG CGCCCGGCCG 12541 ACATCAGCGT CAGCTGGAAG GCGTCGGCGA GCCGTTCCGG GGCGAGTCGC AGGCGGTCCC 12601 GGTCGGGCTC GAACAGCGCG GCCAGCGCGG CGCGCGGCCG GACCAGGCTC GCCTCGCGGT 12661 CCGGGAGGCG CCCGTCCTTG CCGGGCTTGG GCGCCATGCG CTCCAGCCGC CCGGCCGCCG 12721 CGAGCGCCCC GGCGACCGCG CCGATGCGCG CCATGTGTCC GCGCACCACA TCGGCCGCCT 12781 CGGCGAGCCG GTCCGCAAGC GGCTGGTCAA GGGCGATCGA CTCCAGATGG GCCACGGTGT 12841 CATCGGGCCG CACGGCCTCC GCCATACAGG CCGCGAGCAG GGCGTCCTTG TCCTCGAAGA 12901 CGCGGAAGAT AGTGCCTTCC CCGATGCCCG CGGCCCGGGC GATCTTCGCG GTCGTCACGG 12961 TGGCGCCGTA TTCGACGACG AGGGGGAGCG CGGCGGCGAC GATCATCGCG CGGCGCTGGT 13021 CGGGATCCAT GGCCGGAGCG CGGCGGCGGG TCGGAGTGGA GGGGTTCTCT GCCTTCTCTG 13081 TCATGCGGGA TACGGTACGG AGTGAGTACT CACTCCGTCA ATGCACGGTG CGCGGCCACA 13141 AGGCGAGTGG CGGTTCGGCT TCGACGTTGT CGGTCAGCGC GCGGCGAGCA GGGCCCGGCG 13201 CAGGGCGCGG CCCGCCTCGG ACGGGGGATG GTTGCGCGAG GCGGCGAGGC CGAGACCGTG 13261 CAGCGGCGGT GGATCGGCGA GATCGACCTG GGTCAGGCCG GGCCGCGAGG CGATGGTCTC 13321 CTCGGCCACG AAGGCGAGCC CGAGACGCCG CCGGACCATG GTCAGGGCGG TCGTGGTGTC 13381 GACGACCTCC AGTGCCACGG TGCGCTGAAC TCCGGCGGTG CCGAAGAGGC TGTCGACGAT 13441 GGTGCGGTCG CCCCACCCCG TGGGGAAGTC GATGAAGCGG CGGTCGGCGA GGTCGGCGTA 13501 GGTCACGCCG TGCGCCTCGG CGAGGGGGTC GTCGGTGCGG CAGGCCAGCC CGAGGCGTAT 13561 CCGCGACACA TCATCGATGA TCAGATCCGG GCCGAGGACG GCGGGGCCGT GCGGGGGCAC 13621 CGGCAGCAGC ATCAGGTCGA ACCTGCCCTC GCGCAGGGCG GTGGCGTGTC CGGCCAGCGG 13681 ACCGGTCGAG TGGCGCAGCC GCACCACGAC ATCGGGGTGC TCGGCCTGAA ACGTGCTCAG 13741 CGCCCCGATC AGGTCGAACG AGCCGGTGGA CAGGACCGTC CCGAGGGTGA CCGTACCGCT 13801 GAGCCCCCCG GTGAGACGGC CCATGTCATC GCGCGCCCGC TGCGCCTCCG CGAGCAGGAT 13861 CCGGGCCCGG GCCAGCAGGG TGCGCCCCGC GGTGGTCAGC TCCAGGGTGC GGTGCGAGCG 13921 GTCGAAGAGC GCGGTCTGGA ACTCCTGCTC CAGCCGGGCC ACGGCGGCGG AGGCCGCCGA 13981 CTGGACGACG TGTTCCCGCT GGGCGCCGCG GGTGAAGCTG CGCTCCTCGG CCACCGCGAC 14041 GAAGTACGCC AGCTGCCGGA GCTCCACCAT CATCTCCATT CGCGATGCCA CACAGCACAC 14101 ATCATCGTTG GACACGATAC CTATGGGACC GCCACCGTGG AGGGGAAGCG GAACGCCCCG 14161 GCCGGACGGC CCGGTTCGGC GCCACGCCCC CCAACTTCCC CGTGTGCCAG CACACTTCAC 14221 CACGGAAGGC ATCCATCGTC ATGAGCGTCT CAGCCATCCA GATCGGGCTC CACCCCGATG 14281 CCATCGACTA CGAGGCGCCG GAGTTCGCCG CCTTCGCCGG TCTGAGCCGG GAGACGTTGC 14341 GCGCCGCCAA CGACGACAAC CTCGCCCTGC TGCTCGACGC CGGATACGAG GCGGACGGCT 14401 GTCAGATCGA CTTCGGGGAG ACCGCCCTCG ACACCATCCG CGCCATGCTC GGCCGCAAGC 14461 GCTACGACGC GGTCCTCATC GGCGCCGGGG TACGGCTCAC CGCGGGCAAT ACACTGCTCT 14521 TCGAATCCAT CGTCAACCTC GTCCACACCG CGTTGCCCCA CGCGCGGTTC ATCTTCAACC 14581 ACTCCGCCGC GGCCACCCCC GACGACATCC GCCGCCACTA CCCCGACCCG GCCTCCACCG 14641 TTCCCCTCGA CGTCCCCCGC GACCTCGAGG AGGCCGCGCT GAAGAACCCC GGCAACGCCG

326

14701 CCCGCCCCGA AGCCGCCCAC GGCCCGCGGG AGACGCGGTG ACCGCCCCGG CCCGGCCCCA 14761 CGGTGAGGCG AACCCGGACC TTCACACCAC GGATGTGCTC GTCGTCGGCG GCGGGCCGAC 14821 CGGAATGACC CTGGCCGGGG ATCTGGCACG GGCCGGACGC GCGGTCACCG TGCTGGAACG 14881 CCGGCCGGCG ATCCATCCGT CCAGCCGTGC CTTCGTCACC ATGCCCCGCA CCCTGGAAGT 14941 CCTCGACAGC CGTGGTCTGG CCGACGACCT CCTGGCCGGG GCGAACACCA CCGAAGCGGT 15001 CCACCTGTTC GCGGGCGCCA CGCTCGATCT GACACATCTG CCCTCCCGCC ACCGATACGG 15061 GATGATCACC CCGCAGACCA ATGTGGACCA GGCGCTCGAA CGCTACGCCC GCGACCAGGG 15121 CGCCCGGGTG CTGCGCGGCA CCGAGGTCAC CGGCCTCGCC CAGGACGCCG ACGCGGTCAC 15181 CGTCACCGCC CGCGCCGACG GCGGCGGACC CGCTTCCACG TGGCGAGCCC GGTACGTCGT 15241 GGGGGCGGAC GGGGCGCACA GCACCGTCCG CGGCCTCCTC GGCGCCGACT TCCCCGGAAG 15301 GACGGTTCTG ACCTCCGTGG TGCTGGCCGA TGTCCGCCTC GCCGACGGCC CCACCGGGAA 15361 CGGGCTCACC CTGGGCAACA CCCCTGAGGT CTTCGGCTTC CTCGTGCCGT ACGGGAAGGC 15421 GCGCCCCGGC TGGTACCGGT CGATGACCTG GGACCGCCGC CACCAACTGC CCGACAAGGC 15481 CGCCGTGGAG GAGGCGGAGG TCACCCGCGT ACTGGCCGAG GCCATGGGAC GTGACGTCGG 15541 GGTCCGTGAG ATCGGCTGGC ACTCCCGGTT CCACTGCGAT GAACGCCAGG TCCGCTCCTA 15601 CCGGCACGGC CGGGTCTTCC TCGCCGGGGA CGCCGCCCAC GTGCACTCCC CGATGGGCGG 15661 CCAGGGCATG AACACCGGCG TCCAGGACGC GGCCAACCTC GCCTGGAAGC TCGACCTCGC 15721 CCTCGGCGGC GCCGACCCGG CCATCCTGGA CACCTACCAC CGGGAGCGCC ACCCCGTCGG 15781 CCGCCGTGTC CTGCTCCAGA GCGGTGCCAT GATGCGCGCC GTCACCCTCG GGCCGCGCCC 15841 GGCGCGGTGG CTGCGCGACC ATCTGGCCCC GGCCCTGCTG GGCGTCGGCC GGGTGCGCGA 15901 CACCATCGCC GGAAGCTTCA CCGGCGTCAC CCCGCGCTAT CCGCGCGGAC GGCGACAGCA 15961 CGCACTGGTG GGCACCCGCG CCACCGAAGT CCCGCTCGCC GAGGGCCGGT TGACCGAACT 16021 GCAGCGGGCC GGTGGCTTTC TGCTGATCCG CGAGCGGGGC GCGGCGCGCG TCGACACCAC 16081 GGTGGCCCAG GCCGAGCGCA CCGACTCCGG CCCCGCCCTG CTGGTCCGCC CCGACGGCTA 16141 TATCGCCTGG GCCGGACCCG GTGTCCGTAC GGACGGCCCC GACGGCTGGC ACACCACATG 16201 GCGGGCCTGG ACCGGCCCGG CCACCGATGC GGTGCGCGCC GGGCGCTGAA CAGGAGACGG 16261 GGAGACGGCG CCGGGCGGGC GGCCCGGCGC CGACGCCCTC ATCCGTTCCC CGTCGCCTGC 16321 CCGGCGGACA GGGAGTCGGG GAGGGCAGCG GCGGCCGGTT CCTCGGGTCG TCCCTTGCGG 16381 AAGAACCGGA GCAGCGACGG GCCGCCCCGG AAACACCACA GCGCGATGAC CGGATCGCAG 16441 ATCGCACAGC CCAGCGACGA GCCATGCGCG AACGGGACGA TGGGGTTGCC CTTGATGTGA 16501 TGCACGATGT CCCACGCGGT GTGCAGCAGC CAGCCGATGC CGATGAAGGT CCACGACTCC 16561 AGGCCACGGT AGGCCACATA GGTGGCGACC ACGGTGAAGG CGAACTCCCA GCCGTCCAGG 16621 CCGCCGCCGC TGAGGTAGGC CGCACCCGCT CCGCCGACCA TGATCGCGTT GAAGCGCCGG 16681 CGGTGCGGTT CGCGAATCAG GGACATCAGG AGCGCGTAGA GGACACCGAT GAAGACCGGA 16741 GCGATGTATT GGATCATGCG GAAGAACTTC CTGCGGGTGA CGGAACGTTG GCCGCCCGGC 16801 GGGGCGACGG TTCATCACGC TAGATCCGCC CCCGGCCGCC CCACAGGGCC ATTCCCGACA 16861 CGCTCCAACG GATAATCGCC GGGGCCGGAT CATCGCCGTG GCCACGGCCT CCACCCGGCC 16921 ACCACGCTCA GGGCCCGATC ACAGCAGCCG CCACAGGTGG TCATCGGTTC CGTTGTCGTC 16981 GTACTGCACC ACCTGGGCGC TGTTGGCGGT GGACATGCCG TCGACACCCA GCACCTTCTG 17041 GCTGTTCTTG TTGAGGACGC GGAACCAGCC GTCGCCGTTG TCCACCTTCC GCCAGAGGTG 17101 ATCGGCCGTG CCGTTGTCCT CGTACTGGAC GACGATGGCG CTGTTCGCGG TGGACATCCG 17161 GTCGACGCCC AGGACCTTGC CGCTGTGGCC GTTGCGGATC AGGAACCAGC CGTCGCCCCG 17221 GTCGATCCAC TGCCAGGCGT GGTCGCCCGT CGGTGTGTTG TCGTACTGCA CCACGCGGGC 17281 GCTGTTGGCC GTGGACATCT CGTCGACGGC GAGCACCTTG CCGCTGTGCT TGTTGAGCAG 17341 TCGGCGGAAG GGCGGCTCCG GGGTCCACGC CCGGCCGTCC GGCGCGGCCG TGGGAAAGCA 17401 GGTGATACGC AGCCGGGCGG CGCCCATGGG GATGAGGGTG ACCGTCTCCG CCGGTGCGTC 17461 GGCCCGGGCC GGGCTCTGCT GAAGCGGGGT GACCACATGC TCGTCGTCCG AGACCCACTC 17521 GGCGATACGG CGCGCCTGGG CGGTCATGCG GACCGGGGTG GTCTCGTGGG TGAAGGGATT 17581 GGCGGCGAGC GGACCGTCGT CGCGGGTGAG CACGGGGAGG GCTCCGGGGG CGAGGCCGTA 17641 GTTCCACGGA GTGGTGGCGT GCACTTCGTA CTCGGGGAAG GTGTCGGTGC CGGCGTAGCG 17701 CACGAAGTCC TCGCCGATGC GCAGGGAGta cgtcagcggg ccgtggtcga cgctgaccgc 17761 gccgtgctgc gccgaccagg tccgcagggc ggtgcgctgc ggcaggcgga tcgtcaccac 17821 atcgccgtcc gtccagctcc ggtcgacctt gacgaaggcc ggaccgccgc gcgtggccac 17881 cgcccggccg ttgacctcga tccgggggtt cttgcaccag ccggggaccc gcagatggag 17941 cgggaaggcc accttctcgg gggtggacag cgtgagtgtg atggtctcgt cgaacggata 18001 gtcggtgtcc tcggtgacgg tgaccgtcgt accgcccgcc accttcgcgg acacctggct 18061 tgcggcgtac agggaggcgg cgagcccctt gtcgggcgtg gccagccaca gctcctcgct

327

18121 gaagtacggc cagcccatgc cgtagttgtg cggacagcag cggtactggt cgacgcccgg 18181 ctggtacgac tgcatcgcga agccgttctg gaactgcccc tgcgacttca ccgcgttgtt 18241 cagatcgatg ctgttcgcgc tggtgatgta gtgggtgccg gtgccctggg ggtcgagggc 18301 ggcgggcagc atgttgaacg ccaggtcctc gcaccggtcg gcccacaccg gatcgccggt 18361 gatccgggtc agcagctcat ggctggccat gaattcgacg atgccgcagg tctcgaagcc 18421 ctgccggggg tctccgaaac ccgggcggta gttctcgtcc ccggcgaagc caccgcccgg 18481 gaactggccg tatgcgccga gcaccgacgt atagccgcgg taggtcgcct gcctgagctc 18541 ggcggagccg gtcagctggg cgtactgggc gggctcgcgg aagccctggg cgatattgac 18601 gttgtgcggg gtcgggatgt tgtcgaccgc tagcACCCCT ATTTGTTTAT TTTTCTAAAT 18661 ACATTCAAAT ATGTATCCGC TCATGAGACA ATAACCCTGA TAAATGCTTC AATAATATTG 18721 AAAAAGGAAG AGTATGAGTA TTCAACATTT CCGTGTCGCC CTTATTCCCT TTTTTGCGGC 18781 ATTTTGCCTT CCTGTTTTTG CTCACCCAGA AACGCTGGTG AAAGTAAAAG ATGCTGAAGA 18841 TCAGTTGGGT GCACGAGTGG GTTACATCGA ACTGGATCTC AACAGCGGTA AGATCCTTGA 18901 GAGTTTTCGC CCCGAAGAAC GTTTTCCAAT GATGAGCACT TTTAAAGTTC TGCTATGTGG 18961 CGCGGTATTA TCCCGTATTG ACGCCGGGCA AGAGCAACTC GGTCGCCGCA TACACTATTC 19021 TCAGAATGAC TTGGTTGAGT ACTCACCAGT CACAGAAAAG CATCTTACGG ATGGCATGAC 19081 AGTAAGAGAA TTATGCAGTG CTGCCATAAC CATGAGTGAT AACACTGCGG CCAACTTACT 19141 TCTGACAACG ATCGGAGGAC CGAAGGAGCT AACCGCTTTT TTGCACAACA TGGGGGATCA 19201 TGTAACTCGC CTTGATCGTT GGGAACCGGA GCTGAATGAA GCCATACCAA ACGACGAGCG 19261 TGACACCACG ATGCCTGTAG CAATGGCAAC AACGTTGCGC AAACTATTAA CTGGCGAACT 19321 ACTTACTCTA GCTTCCCGGC AACAATTAAT AGACTGGATG GAGGCGGATA AAGTTGCAGG 19381 ACCACTTCTG CGCTCGGCCC TTCCGGCTGG CTGGTTTATT GCTGATAAAT CTGGAGCCGG 19441 TGAGCGTGGG TCTCGCGGTA TCATTGCAGC ACTGGGGCCA GATGGTAAGC CCTCCCGTAT 19501 CGTAGTTATC TACACGACGG GGAGTCAGGC AACTATGGAT GAACGAAATA GACAGATCGC 19561 TGAGATAGGT GCCTCACTGA TTAAGCATTG GTAACTGTCA GACCAAGTTT ACTCATATAT 19621 ACTTTAGATT GATTTAAAAC TTCATTTTTA ATTTAAAAGG ATCTAGGTGA AGATCcaatt 19681 gTTTCAGCTT TCCGCAACAG TATAActgtg cTTAATTCGA GCTCGGTACC CGGGGATCTT 19741 GAAGTTCCTA TTCCGAAGTT CCTATTCTTC AAATAGTATA GGAACTTCAG AGCGCTTTTG 19801 AAGCTAGCga gctctccgtc gagttcaaga gaaaaaaaaa gaaaaagcaa aaagaaaaaa 19861 ggaaagcgcg cctcgttcag aatgacacgt atagaatgat gcattacctt gtcatcttca 19921 gtatcatact gttcgtatac atacttactg acattcatag gtatacatat atacacatgt 19981 atatatatcg tatgctgcag ctttaaataa tcggtgtcac tacataagaa cacctttggt 20041 ggagggaaca tcgttggtac cattgggcga ggtggcttct cttatggcaa ccgcaagagc 20101 cttgaacgca ctctcactac ggtgatgatc attcttgcct cgcagacaat caacgtggag 20161 ggtaattctg ctagcctctg caaagctttc aagaaaatgc gggatcatct cgcaagagag 20221 atctcctact ttctcccttt gcaaaccaag ttcgacaact gcgtacggcc tgttcgaaag 20281 atctaccacc gctctggaaa gtgcctcatc caaaggcgca aatcctgatc caaacctttt 20341 tactccacgc actgctccta gggcctcttt aaaagcttga ccgagagcaa tcccgcagtc 20401 ttcagtggtg tgatggtcgt ctatgtgtaa gtcaccaatg cactcaacga ttagcgacca 20461 gccggaatgc ttggccagag catgtatcat atggtccaga aaccctatac ctgtgtggac 20521 gttaatcact tgcgattgtg tggcctgttc tgctactgct tctgcctctt tttctgggaa 20581 gatcgagtgc tctatcgcta ggggaccacc ctttaaagag atcgcaatct gaatcttggt 20641 ttcatttgta atacgcttta ctagggcttt ctgctctgta ccagaaccac cagaaccTTT 20701 GTACAATTCA TCCATACCAT GGGTAATACC AGCAGCAGTA ACAAATTCTA ACAAGACCAT 20761 GTGGTCTCTC TTTTCGTTTG GATCTTTGGA TAAGGCAGAT TGAGTGGATA AGTAATGGTT 20821 GTCTGGTAAC AAGACTGGAC CATCACCAAT TGGAGTATTT TGTTGATAAT GGTCAGCTAA 20881 TTGAACAGAA CCATCTTCAA TGTTGTGTCT AATTTTGAAG TTAACTTTGA TACCATTCTT 20941 TTGTTTGTCA GCCATGATGT AAACATTGTG AGAGTTATAG TTGTATTCCA ATTTGTGACC 21001 TAAAATGTTA CCATCTTCTT TAAAATCAAT ACCTTTTAAT TCGATTCTAT TAACTAAGGT 21061 ATCACCTTCA AACTTGACTT CAGCTCTGGT CTTGTAGTTA CCGTCATCTT TGAAAAAAAT 21121 AGTTCTTTCT TGAACATAAC CTTCTGGCAT GGCAGACTTG AAAAAGTCAT GTTGTTTCAT 21181 ATGATCTGGG TATCTcGCAA AACATTGAAC ACCATAACCG AAAGTAGTGA CTAAGGTTGG 21241 CCATGGAACT GGCAATTTAC CAGTAGTACA AATAAATTTT AAGGTCAATT TACCGTAAGT 21301 AGCATCACCT TCACCTTCAC CGGAGACAGA AAATTTGTGA CCATTAACAT CACCATCTAA 21361 TTCAACCAAA ATTGGGACAA CACCAGTGAA TAATTCTTCA CCTTTAGACA TTGTGATGAT 21421 GTTTTATTTG TTTTGATTGG TGTCTTGTAA ATAGAAACAA GAGAGAATAA TAAACAAGTT 21481 AAGAATAAAA AACCAAAGGA TGAAAAAGAA TGAATATGAA AAAGAGTAGA GAATAACTTT

328

21541 GAAAGGGGAC CATGATATAA CTGGAAAAAA GAGGTTCTTG GAAATGAAAA GTTACCAAAG 21601 AGTATTTATA ATTCAGAAAA AAAAGCCAAC GAATATCGTT TTGATGGCGA GCCTTTTTTT 21661 TTTTTTAGGA AGACACTAAA GGTACCTAGC ATCATATGGG AAGGAAAGGA AATCACTTGG 21721 AAGACATCAC AAGCATTCAT TTACCAAGAG AAAAAATATG CATTTTAGCT AAGATCCATT 21781 GAACAAAGCA CTCACTCAAC TCAACTGAAT GAACGAAAGA AGAAAGAACA GTAGAAAACA 21841 CTTTGTGACG GTGCGGAACA CATTTACGTA GCTATCATGC TGAATTCTAC TATGAAAATC 21901 TCCCAATCTG TCGATGGCAA AACGACCCAC GTGGCAGAGT TGGGTCAAGT GCCAGTTTCT 21961 GGATTAAGTA ACAGATACAG ACATCACACG CCATAGAGGA ATCCCGCCGT TGCGAGAGAT 22021 GGAAAACAAT AGAGCCGAAA TTGTGGAAGC CCGATGTCTG GGTGTACATT TTTTTTTTTT 22081 ttCTTTCTTT CTCTTTCAAT AATCTTTCCT TTTTCCATTT AGCTTGCCGG AAAAACTTTC 22141 GGGTAGCGAA AATCTTTCTG CCGGAAAAAT TAGCTATTTT TTTCTTCCTT ATTATTTTTT 22201 TAGTTCTGAA GTTTGACCAG GGCGCTACCC TGACCGTATC ACAACCGACG ATCCGGGGTC 22261 ATGGCGGCTA //

329

A.7 Sequence of pAN84.

LOCUS pAN84-pSBAC-acceptor 12032 bp DNA circular UNA 31-MAR- 2011 COMMENT This file is created by Vector NTI http://www.invitrogen.com/ FEATURES Location/Qualifiers Primer 749..772 /label=AN509 Primer 870..889 /label=AN512 Misc._signal 876..948 /label="attP site" Primer 976..995 /label=AN14 CDS 992..2617 /vntifkey="4" /gene="int" Primer complement(1094..1138) /label=AN841 Mutation 1114 /label="A to G (NheI delete)" Primer 1444..1465 /label=AN813 Primer 1514..1534 /label=AN511 Primer complement(1560..1582) /label=AN812 Primer complement(2052..2073) /label=AN510 Primer 2336..2356 /label=AN814 Mutation 2570 /label="Insertion G" Primer complement(2604..2621) /label=AN13 Mutation 2648 /label="Insertion G" Primer complement(2848..2871) /label=AN513 Primer 2888..2912 /label=AN815 Primer complement(2960..2981) /label=AN508 CDS complement(3339..3710) /vntifkey="4" /gene="TraJ" Primer 4364..4384 /label=AN358 Primer 4456..4479 /label=M13F(-47) Primer 4456..4472 /label=VM129 Primer 4458..4515 /label=AN363 Primer 4462..4479

330

/label=M13F(-41) Primer 4482..4499 /label=M13F(-21) Primer 4494..4553 /label=AN344 Restriction_sit 4527..4532 /label=NheI Primer complement(4532..4591) /label=AN343 Restriction_sit 4569..4574 /label=MfeI Primer complement(4573..4630) /label=AN362 Primer complement(4628..4644) /label=M13R Primer complement(4629..4647) /label=VM130 Primer complement(4738..4758) /label=AN357 Primer 5344..5363 /label=AN43 CDS complement(5463..6266) /vntifkey="4" /gene="aacIII(IV)" Primer 5464..5484 /label=AN2 Primer complement(6236..6257) /label=AN1 Primer 6236..6255 /label=AN369 Primer complement(6573..6592) /label=AN42 Primer 6844..6866 /label=AN370 Primer 7550..7571 /label=AN371 Primer 8071..8091 /label=AN816 CDS 8442..9197 /vntifkey="4" /gene="repE" Primer complement(8508..8527) /label=AN368 Primer 8677..8699 /label=AN817 CDS 9795..10952 /vntifkey="4" /gene="parA" Primer 10010..10064 /label=AN840 Primer 10014..10050 /label=AN22 Primer complement(10023..10059) /label=AN21 Mutation complement(10034) /label="T-G (MfeI delete)" Primer 10227..10249

331

/label=AN818 Primer complement(10375..10395) /label=AN24 CDS 10952..11710 /vntifkey="4" /gene="parB" Primer 11249..11269 /label=AN820 Primer complement(11388..11411) /label=AN819 Primer 11779..11799 /label=AN839 Primer complement(11920..11941) /label=AN838 ORIGIN 1 ATTATTAGTC TGGGACCACG GTCCCACTCG TATCGTCGGT CTGATTATTA GTCTGGAACC 61 ACGGTCCCAC TCGTATCGTC GGTCTGATTA TTAGTCTGGG ACCACGGTCC CACTCGTATC 121 GTCGGTCTGA TTATTAGTCT GGGACCACGA TCCCACTCGT GTTGTCGGTC TGATTATCGG 181 TCTGGGACCA CGGTCCCACT TGTATTGTCG ATCAGACTAT CAGCGTGAGA CTACGATTCC 241 ATCAATGCCT GTCAAGGGCA AGTATTGACA TGTCGTCGTA ACCTGTAGAA CGGAGTAACC 301 ATAGGCCTCA GTGAGGCACC TATCTCAGCG ATCTGTCTAT TTCGTTCATC CATAGTTGCC 361 TGACTCCCCG TCGTGTAGAT AACTACGATA CGGGAGGGCT TACCATCTGG CCCCAGTGCT 421 GCAATGATAC CGCGAGACCC ACGCTCACCG GCTCCAGATT TATCAGCAAT AAACCAGCCA 481 GCCGGAAGGG CCGAGCGCAG AAGTGGTCCT GCAACTTTAT CCGCCTCCAT CCAGTCTATT 541 AATTGTTGCC GGGAAGCTAG AGTAAGTAGT TCGCCAGTTA ATAGTTTGCG CAACGTTGTT 601 GCCATTGCTA CAGGCATCGT GGTGTCACGC TCGTCGTTTG GTATGGCTTC ATTCAGCTCC 661 GGTTCCCAAC GATCAAGGCG AGTTACATGA TCCCCCATGT TGTGCAAAAA AGCGGTTAGC 721 TCCTTCGGTC CTCCGATCGT TGTCAGAAGT AAGTTGGCCG CAGTGTTATC ACTCATGGTT 781 ATGGCAGCAC TGCATAATTC TCTTACTGTC ATGCCATCCG TAAGATGCTT TTCTGTGACT 841 GGTGAGTATC CCTAGGTGGC GCCGGACGGG GCTTCAGACG TTTCGGGTGC TGGGTTGTTG 901 TCTCTGGACA GTGATCCATG GGAAACTACT CAGCACCACC AATGTTCCCA AAAGAAAGCG 961 CAGGTCAGCG CCCATGAGCC AAGATCTAGG CATGTCGCCC TTCATCGCTC CCGACGTCCC 1021 TGAGCACCTT CTAGACACTG TTCGCGTCTT CCTGTACGCG CGTCAGTCTA AGGGCCGGTC 1081 CGACGGCTCA GACGTGTCGA CCGAAGCACA GCTGGCGGCC GGTCGTGCGT TGGTCGCGTC 1141 TCGCAACGCC CAGGGGGGTG CGCGCTGGGT CGTGGCAGGT GAGTTCGTGG ACGTCGGGCG 1201 CTCCGGCTGG GACCCGAACG TGACCCGTGC CGACTTCGAG CGCATGATGG GCGAAGTCCG 1261 CGCCGGCGAA GGTGACGTTG TCGTTGTGAA TGAGCTTTCC CGGCTCACTC GCAAGGGCGC 1321 CCATGACGCG CTCGAAATCG ACAACGAATT GAAGAAGCAC GGCGTGCGCT TCATGTCGGT 1381 TCTTGAGCCG TTCCTTGACA CGTCTACCCC TATCGGCGTC GCCATTTTCG CGCTGATCGC 1441 TGCCCTTGCG AAACAGGACA GTGACCTGAA GGCGGAGCGC CTGAAGGGTG CGAAAGACGA 1501 GATTGCCGCG CTGGGTGGCG TTCACTCGTC TTCCGCCCCG TTCGGAATGC GCGCCGTGCG 1561 CAAGAAGGTC GATAATCTCG TGATCTCCGT TCTTGAGCCG GACGAAGACA ACCCGGATCA 1621 CGTCGAGCTA GTTGAGCGCA TGGCGAAAAT GTCGTTCGAA GGCGTGTCCG ACAACGCCAT 1681 TGCAACGACC TTCGAGAAGG AAAAGATCCC GTCGCCCGGA ATGGCTGAGA GACGCGCCAC 1741 GGAAAAGCGT CTTGCGTCCG TCAAGGCACG TCGCCTGAAC GGCGCTGAAA AGCCGATCAT 1801 GTGGCGCGCT CAAACGGTCC GATGGATTCT CAACCATCCC GCAATCGGCG GTTTCGCATT 1861 CGAGCGTGTG AAGCACGGTA AGGCGCACAT CAACGTCATA CGGCGCGACC CCGGCGGCAA 1921 GCCGCTAACG CCCCACACGG GCATTCTCAG CGGCTCGAAG TGGCTTGAGC TTCAAGAGAA 1981 GCGTTCCGGG AAGAATCTCA GCGACCGGAA GCCTGGGGCC GAAGTCGAAC CGACGCTTCT 2041 GAGCGGGTGG CGTTTCCTGG GGTGCCGAAT CTGCGGCGGC TCAATGGGTC AGTCCCAGGG 2101 TGGCCGTAAG CGCAACGGCG ACCTTGCCGA AGGCAATTAC ATGTGCGCCA ACCCGAAGGG 2161 GCACGGCGGC TTGTCGGTCA AGCGCAGCGA ACTGGACGAG TTCGTTGCTT CGAAGGTGTG 2221 GGCACGGCTC CGCACAGCCG ACATGGAAGA TGAACACGAT CAGGCATGGA TTGCCGCCGC 2281 TGCGGAGCGC TTCGCCCTTC AGCACGACCT AGCGGGGGTG GCCGATGAGC GGCGCGAACA 2341 ACAGGCGCAC CTAGACAACG TGCGGCGCTC CATCAAGGAC CTTCAGGCGG ACCGTAAGCC 2401 CGGTCTGTAC GTCGGGCGTG AAGAGCTGGA AACGTGGCGC TCAACGGTGC TGCAATACCG 2461 GTCCTACGAA GCGGAGTGCA CGACCCGACT CGCTGAGCTT GACGAGAAGA TGAACGGCAG

332

2521 CACCCGCGTT CCGTCTGAGT GGTTCAGCGG CGAAGACCCG ACGGCCGAAG GGGCATCTGG 2581 GCAAGCTGGG ACGTGTACGA GCGTCGGGAG TTCCTGAGCT TCTTCCTTGA CTCCGTCATG 2641 GTCGACCGGG GCGCCACCCT GAGACGAAGA AATACATCCC CCTGAAGGAC CGTGTGACGC 2701 TCAAGTGGGC GGAGCTGCTG AAGGAGGAAG ACGAAGCGAG CGAAGCCACT GAGCGGGAGC 2761 TTGCGGCGCT GTAGCGCACA GCGGGAGGGG TCGAGCCGGC GGACGGTTCG GCCCCTTTTT 2821 TGGCCTTGAA ATCGTTAGTT AGGCTAACTA GTAGTTCCTT CGTCACCACA GCGGGCAGGG 2881 AGCGCCTAGG GATACTCAAC CAAGTCATTC TGAGAATAGT GTATGCGGCG ACCGAGTTGC 2941 TCTTGCCCGG CGTCAATACG GGATAATACC GCGCCACATA GCAGAACTTT AAAAGTGCTC 3001 ATCATTGGAA AACGTTCTTC GGGGCGAAAA CTCTCAAGGA TCTTACCGCT GTTGAGATCC 3061 AGTTCGATGT AACCCACTCG TGCACCCAAC TGATCTTCAG CATCTTTTAC TTTCACCAGC 3121 GTTTCTGGGT GAGCAAAAAC AGGAAGGCAA AATGCCGCAA AAAAAGGGAA TAAGGGCGAC 3181 ACGGAAATGT TGAATACTCA TACTCTTCCT TTTTCAATTT GCCTTGCTCG TCGGTGATGT 3241 ACTTCACCAG CTCCGCGTAG TCGCTCTTCT TGATGGAGCG CATGGGGACG TGCTTGGCAA 3301 TCACGCGCAC CCCCCGGCCG TTTTAGCGGC TAAAAAAGTC ATGGCTCTGC CCTCGGGCGG 3361 ACCACGCCCA TCATGACCTT GCCAAGCTCG TCCTGCTTCT CTTCGATCTT CGCCAGCAGG 3421 GCGAGGATCG TGGCATCACC GAACCGCGCC GTGCGCGGGT CGTCGGTGAG CCAGAGTTTC 3481 AGCAGGCCGC CCAGGCGGCC CAGGTCGCCA TTGATGCGGG CCAGCTCGCG GACGTGCTCA 3541 TAGTCCACGA CGCCCGTGAT TTTGTAGCCC TGGCCGACGG CCAGCAGGTA GGCCGACAGG 3601 CTCATGCCGG CCGCCGCCGC CTTTTCCTCA ATCGCTCTTC GTTCGTCTGG AAGGCAGTAC 3661 ACCTTGATAG GTGGGCTGCC CTTCCTGGTT GGCTTGGTTT CATCAGCCAT CCGCTTGCCC 3721 TCATCTGTTA CGCCGGCGGT AGCCGGCCAG CCTCGCAGAG CAGGATTCCC GTTGAGCACC 3781 GCCAGGTGCG AATAAGGGAC AGTGAAGAAG GAACACCCGC TCGCGGGTGG GCCTACTTCA 3841 CCTATCCTGC CCGGCTGACG CCGTTGGATA CACCAAGGAA AGTCTACACG AACCCTTTGG 3901 CAAAATCCTG TATATCGTGC GATTATTGAA GCATTTATCA GGGTTATTGT CTCATGAGCG 3961 GATACATATT TGAATGTATT TAGAAAAATA AACAAATAGG GGTTCCGCGC ACATTTCCCC 4021 GAAAAGTGCC ACCTGACGTC TAAGAAACCA TTATTATCAT GACATTAACC TATAAAAATA 4081 GGCGTATCAC GAGGCCCTTT CGTCTCGCGC GTTTCGGTGA TGACGGTGAA AACCTCTGAC 4141 ACATGCAGCT CCCGGAGACG GTCACAGCTT GTCTGTAAGC GGATGCCGGG AGCAGACAAG 4201 CCCGTCAGGG CGCGTCAGCG GGTGTTGGCG GGTGTCGGGG CTGGCTTAAC TATGCGGCAT 4261 CAGAGCAGAT TGTACTGAGA GTGCACCATA TGCGGTGTGA AATACCGCAC AGATGCGTAA 4321 GGAGAAAATA CCGCATCAGG CGCCATTCGC CATTCAGGCT GCGCAACTGT TGGGAAGGGC 4381 GATCGGTGCG GGCCTCTTCG CTATTACGCC AGCTGGCGAA AGGGGGATGT GCTGCAAGGC 4441 GATTAAGTTG GGTAACGCCA GGGTTTTCCC AGTCACGACG TTGTAAAACG ACGGCCAGTG 4501 AATTCTCTAG Accggcacga ttgtgcGCTA GCTTCATCAG GGCTTCGACG GTGCTCTGGA 4561 GCTCGGTACA ATTGCCCGGG GATCCTCTAG AGTCGACCTG CAGGCATGCA AGCTTGGCGT 4621 AATCATGGTC ATAGCTGTTT CCTGTGTGAA ATTGTTATCC GCTCACAATT CCACACAACA 4681 TACGAGCCGG AAGCATAAAG TGTAAAGCCT GGGGTGCCTA ATGAGTGAGC TAACTCACAT 4741 TAATTGCGTT GCGCTCACTG CCCGCTTTCC AGTCGGGAAA CCTGTCGTGC CAGCTGCATT 4801 AATGAATCGG CCAACGCGCG GGGAGAGGCG GTTTGCGTAT TGGGCGCTCT TCCGCTTCCT 4861 CGCTCACTGA CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG AGCGGTATCA GCTCACTCAA 4921 AGGCGGTAAT ACGGTTATCC ACAGAATCAG GGGATAACGC AGGAAAGAAC ATGCATGGAG 4981 ATCCCCAATT CCCCAATGTC AAGCACTTCC GGAATCGGGA GCGCGGCCGA TGCAAAGTGC 5041 CGATAAACAT AACGATCTTT GTAGAAACCA TCGGCGCAGC TATTTACCCG CAGGACATAT 5101 CCACGCCCTC CTACATCGAA GCTGAAAGCA CGAGATTCTT CGCCCTCCGA GAGCTGCATC 5161 AGGTCGGAGA CGCTGTCGAA CTTTTCGATC AGAAACTTCT CGACAGACGT CGCGGTGAGT 5221 TCAGGCTTTT TCATATCTCA TTGCCCCCGG ACGAGCGTCT GCTCCGCCAT TCGCCGTCCG 5281 CCGTGCCAAT CGGATCAGCC GTCCAAATGC GGGATTTTCG TTAGTCGGAG GCCAAACGGC 5341 ATTGAGCGTC AGCATATCAT CAGCGAGCTG AAGAAAGACA ATCCCCGATC CGCTCCACGT 5401 GTTGCCCCAG CAATCAGCGC GACCTTGCCC CTCCAACGTC ATCTCGTTCT CCGCTCATGA 5461 GCTCAGCCAA TCGACTGGCG AGCGGCATCG CATTCTTCGC ATCCCGCCTC TGGCGGATGC 5521 AGGAAGATCA ACGGATCTCG GCCCAGTTGA CCCAGGGCTG TCGCCACAAT GTCGCGGGAG 5581 CGGATCAACC GAGCAAAGGC ATGACCGACT GGACCTTCCT TCTGAAGGCT CTTCTCCTTG 5641 AGCCACCTGT CCGCCAAGGC AAAGCGCTCA CAGCAGTGGT CATTCTCGAG ATAATCGACG 5701 CGTACCAACT TGCCATCCTG AAGAATGGTG CAGTGTCTCG GCACCCCATA GGGAACCTTT 5761 GCCATCAACT CGGCAAGATG CAGCGTCGTG TTGGCATCGT GTCCCACGCC GAGGAGAAGT 5821 ACCTGCCCAT CGAGTTCATG GACACGGGCG ACCGGGCTTG CAGGCGAGTG AGGTGGCAGG 5881 GGCAATGGAT CAGAGATGAT CTGCTCTGCC TGTGGCCCCG CTGCCGCAAA GGCAAATGGA

333

5941 TGGGCGCTGC GCTTTACATT TGGCAGGCGC CAGAATGTGT CAGAGACAAC TCCAAGGTCC 6001 GGTGTAACGG GCGACGTGGC AGGATCGAAC GGCTCGTCGT CCAGACCTGA CCACGAGGGC 6061 ATGACGAGCG TCCCTCCCGG ACCCAGCGCA GCACGCAGGG CCTCGATCAG TCCAAGTGGC 6121 CCATCTTCGA GGGGCCGGAC GCTACGGAAG GAGCTGTGGA CCAGCAGCAC ACCGCCGGGG 6181 GTAACCCCAA GGTTGAGAAG CTGACCGATG AGCTCGGCTT TTCGCCATTC GTATTGCACG 6241 ACATTGCACT CCACCGCTGA TGACATCAGT CGATCATAGC ACGATCAACG GCACTGTTGC 6301 AAATAGTCGG TGGTGATAAA CTTATCATCC CCTTTTGCTG ATGGAGCTGC ACATGAACCC 6361 ATTCAAAGGC CGGCATTTTC AGCGTGACAT CATTCTGTGG GCCGTACGCT GGTACTGCAA 6421 ATACGGCATC AGTTACCGTG AGCGGGTACC GAGCTCGAAT TAATTCGTAA TCATGGCCAT 6481 GCATGTGAGC AAAAGGCCAG CAAAAGGCCA GGAACCGTAA AAAGGCCGCG TTGCTGGCGT 6541 TTTTCCATAG GCTCCGCCCC CCTGACGAGC ATCACAAAAA TCGACGCTCA AGTCAGAGGT 6601 GGCGAAGGCC TGAAGGGCTT CCCGGTATCA ACAGGGACAC CAGGATTTAT TTATTCTGCG 6661 AAGTGATCTT CCGTCACAGG TATTTATTCG CGATAAGCTC ATGGAGCGGC GTAACCGTCG 6721 CACAGGAAGG ACAGAGAAAG CGCGGATCTG GGAAGTGACG GACAGAACGG TCAGGACCTG 6781 GATTGGGGAG GCGGTTGCCG CCGCTGCTGC TGACGGTGTG ACGTTCTCTG TTCCGGTCAC 6841 ACCACATACG TTCCGCCATT CCTATGCGAT GCACATGCTG TATGCCGGTA TACCGCTGAA 6901 AGTTCTGCAA AGCCTGATGG GACATAAGTC CATCAGTTCA ACGGAAGTCT ACACGAAGGT 6961 TTTTGCGCTG GATGTGGCTG CCCGGCACCG GGTGCAGTTT GCGATGCCGG AGTCTGATGC 7021 GGTTGCGATG CTGAAACAAT TATCCTGAGA ATAAATGCCT TGGCCTTTAT ATGGAAATGT 7081 GGAACTGAGT GGATATGCTG TTTTTGTCTG TTAAACAGAG AAGCTGGCTG TTATCCACTG 7141 AGAAGCGAAC GAAACAGTCG GGAAAATCTC CCATTATCGT AGAGATCCGC ATTATTAATC 7201 TCAGGAGCCT GTGTAGCGTT TATAGGAAGT AGTGTTCTGT CATGATGCCT GCAAGCGGTA 7261 ACGAAAACGA TTTGAATATG CCTTCAGGAA CAATAGAAAT CTTCGTGCGG TGTTACGTTG 7321 AAGTGGAGCG GATTATGTCA GCAATGGACA GAACAACCTA ATGAACACAG AACCATGATG 7381 TGGTCTGTCC TTTTACAGCC AGTAGTGCTC GCCGCAGTCG AGCGACAGGG CGAAGCCCTC 7441 GAGCTGGTTG CCCTCGCCGC TGGGCTGGCG GCCGTCTATG GCCCTGCAAA CGCGCCAGAA 7501 ACGCCGTCGA AGCCGTGTGC GAGACACCGC GGCCGGCCGC CGGCGTTGTG GATACCTCGC 7561 GGAAAACTTG GCCCTCACTG ACAGATGAGG GGCGGACGTT GACACTTGAG GGGCCGACTC 7621 ACCCGGCGCG GCGTTGACAG ATGAGGGGCA GGCTCGATTT CGGCCGGCGA CGTGGAGCTG 7681 GCCAGCCTCG CAAATCGGCG AAAACGCCTG ATTTTACGCG AGTTTCCCAC AGATGATGTG 7741 GACAAGCCTG GGGATAAGTG CCCTGCGGTA TTGACACTTG AGGGGCGCGA CTACTGACAG 7801 ATGAGGGGCG CGATCCTTGA CACTTGAGGG GCAGAGTGCT GACAGATGAG GGGCGCACCT 7861 ATTGACATTT GAGGGGCTGT CCACAGGCAG AAAATCCAGC ATTTGCAAGG GTTTCCGCCC 7921 GTTTTTCGGC CACCGCTAAC CTGTCTTTTA ACCTGCTTTT AAACCAATAT TTATAAACCT 7981 TGTTTTTAAC CAGGGCTGCG CCCTGTGCGC GTGACCGCGC ACGCCGAAGG GGGTGCCCCC 8041 CTTCTCGAAC CCTCCCGGTC GAGTGAGCGA GGAAGCACCA GGGAACAGCA CTTATATATT 8101 CTGCTTACAC ACGATGCCTG AAAAAACTTC CCTTGGGGTT ATCCACTTAT CCACGGGGAT 8161 ATTTTTATAA TTATTTTTTT ATTAGTTTTT AGATCTTCTT TTTTAGAGCG CCTTGTAGGC 8221 CTTTATCCAT GCTGGTTCTA GAGAAGGTGT TGTGACAAAT TGCCCTTTCA GTGTGACAAA 8281 TCACCCTCAA ATGACAGTCC TGTCTGTGAC AAATTGCCCT TAACCCTGTG ACAAATTGCC 8341 CTCAGAAGAA GCTGTTTTTT CACAAAGTTA TCCCTGCTTA TTGACTCTTT TTTATTTAGT 8401 GTGACAATCT AAAAACTTGT CACACTTCAC ATGGATCTGT CATGGCGGAA ACAGCGGTTA 8461 TCAATCACAA GAAACGTAAA AATAGCCCGC GAATCGTCCA GTCAAACGAC CTCACTGAGG 8521 CGGCATATAG TCTCTCCCGG GATCAAAAAC GTATGCTGTA TCTGTTCGTT GACCAGATCA 8581 GAAAATCTGA TGGCACCCTA CAGGAACATG ACGGTATCTG CGAGATCCAT GTTGCTAAAT 8641 ATGCTGAAAT ATTCGGATTG ACCTCTGCGG AAGCCAGTAA GGATATACGG CAGGCATTGA 8701 AGAGTTTCGC GGGGAAGGAA GTGGTTTTTT ATCGCCCTGA AGAGGATGCC GGCGATGAAA 8761 AAGGCTATGA ATCTTTTCCT TGGTTTATCA AACGTGCGCA CAGTCCATCC AGAGGGCTTT 8821 ACAGTGTACA TATCAACCCA TATCTCATTC CCTTCTTTAT CGGGTTACAG AACCGGTTTA 8881 CGCAGTTTCG GCTTAGTGAA ACAAAAGAAA TCACCAATCC GTATGCCATG CGTTTATACG 8941 AATCCCTGTG TCAGTATCGT AAGCCGGATG GCTCAGGCAT CGTCTCTCTG AAAATCGACT 9001 GGATCATAGA GCGTTACCAG CTGCCTCAAA GTTACCAGCG TATGCCTGAC TTCCGCCGCC 9061 GCTTCCTGCA GGTCTGTGTT AATGAGATCA ACAGCAGAAC TCCAATGCGC CTCTCATACA 9121 TTGAGAAAAA GAAAGGCCGC CAGACGACTC ATATCGTATT TTCCTTCCGC GATATCACTT 9181 CCATGACGAC AGGATAGTCT GAGGGTTATC TGTCACAGAT TTGAGGGTGG TTCGTCACAT 9241 TTGTTCTGAC CTACTGAGGG TAATTTGTCA CAGTTTTGCT GTTTCCTTCA GCCTGCATGG 9301 ATTTTCTCAT ACTTTTTGAA CTGTAATTTT TAAGGAAGCC AAATTTGAGG GCAGTTTGTC

334

9361 ACAGTTGATT TCCTTCTCTT TCCCTTCGTC ATGTGACCTG ATATCGGGGG TTAGTTCGTC 9421 ATCATTGATG AGGGTTGATT ATCACAGTTT ATTACTCTGA ATTGGCTATC CGCGTGTGTA 9481 CCTCTACCTG GAGTTTTTCC CACGGTGGAT ATTTCTTCTT GCGCTGAGCG TAAGAGCTAT 9541 CTGACAGAAC AGTTCTTCTT TGCTTCCTCG CCAGTTCGCT CGCTATGCTC GGTTACACGG 9601 CTGCGGCGAG CGCTAGTGAT AATAAGTGAC TGAGGTATGT GCTCTTCTTA TCTCCTTTTG 9661 TAGTGTTGCT CTTATTTTAA ACAACTTTGC GGTTTTTTGA TGACTTTGCG ATTTTGTTGT 9721 TGCTTTGCAG TAAATTGCAA GATTTAATAA AAAAAACGCA AAGCAATGAT TAAAGGATGT 9781 TCAGAATGAA ACTCATGGAA ACACTTAACC AGTGCATAAA CGCTGGTCAT GAAATGACGA 9841 AGGCTATCGC CATTGCACAG TTTAATGATG ACAGCCCGGA AGCGAGGAAA ATAACCCGGC 9901 GCTGGAGAAT AGGTGAAGCA GCGGATTTAG TTGGGGTTTC TTCTCAGGCT ATCAGAGATG 9961 CCGAGAAAGC AGGGCGACTA CCGCACCCGG ATATGGAAAT TCGAGGACGG GTTGAGCAAC 10021 GTGTTGGTTA TACCATTGAA CAAATTAATC ATATGCGTGA TGTGTTTGGT ACGCGATTGC 10081 GACGTGCTGA AGACGTATTT CCACCGGTGA TCGGGGTTGC TGCCCATAAA GGTGGCGTTT 10141 ACAAAACCTC AGTTTCTGTT CATCTTGCTC AGGATCTGGC TCTGAAGGGG CTACGTGTTT 10201 TGCTCGTGGA AGGTAACGAC CCCCAGGGAA CAGCCTCAAT GTATCACGGA TGGGTACCAG 10261 ATCTTCATAT TCATGCAGAA GACACTCTCC TGCCTTTCTA TCTTGGGGAA AAGGACGATG 10321 TCACTTATGC AATAAAGCCC ACTTGCTGGC CGGGGCTTGA CATTATTCCT TCCTGTCTGG 10381 CTCTGCACCG TATTGAAACT GAGTTAATGG GCAAATTTGA TGAAGGTAAA CTGCCCACCG 10441 ATCCACACCT GATGCTCCGA CTGGCCATTG AAACTGTTGC TCATGACTAT GATGTCATAG 10501 TTATTGACAG CGCGCCTAAC CTGGGTATCG GCACGATTAA TGTCGTATGT GCTGCTGATG 10561 TGCTGATTGT TCCCACGCCT GCTGAGTTGT TTGACTACAC CTCCGCACTG CAGTTTTTCG 10621 ATATGCTTCG TGATCTGCTC AAGAACGTTG ATCTTAAAGG GTTCGAGCCT GATGTACGTA 10681 TTTTGCTTAC CAAATACAGC AATAGCAATG GCTCTCAGTC CCCGTGGATG GAGGAGCAAA 10741 TTCGGGATGC CTGGGGAAGC ATGGTTCTAA AAAATGTTGT ACGTGAAACG GATGAAGTTG 10801 GTAAAGGTCA GATCCGGATG AGAACTGTTT TTGAACAGGC CATTGATCAA CGCTCTTCAA 10861 CTGGTGCCTG GAGAAATGCT CTTTCTATTT GGGAACCTGT CTGCAATGAA ATTTTCGATC 10921 GTCTGATTAA ACCACGCTGG GAGATTAGAT AATGAAGCGT GCGCCTGTTA TTCCAAAACA 10981 TACGCTCAAT ACTCAACCGG TTGAAGATAC TTCGTTATCG ACACCAGCTG CCCCGATGGT 11041 GGATTCGTTA ATTGCGCGCG TAGGAGTAAT GGCTCGCGGT AATGCCATTA CTTTGCCTGT 11101 ATGTGGTCGG GATGTGAAGT TTACTCTTGA AGTGCTCCGG GGTGATAGTG TTGAGAAGAC 11161 CTCTCGGGTA TGGTCAGGTA ATGAACGTGA CCAGGAGCTG CTTACTGAGG ACGCACTGGA 11221 TGATCTCATC CCTTCTTTTC TACTGACTGG TCAACAGACA CCGGCGTTCG GTCGAAGAGT 11281 ATCTGGTGTC ATAGAAATTG CCGATGGGAG TCGCCGTCGT AAAGCTGCTG CACTTACCGA 11341 AAGTGATTAT CGTGTTCTGG TTGGCGAGCT GGATGATGAG CAGATGGCTG CATTATCCAG 11401 ATTGGGTAAC GATTATCGCC CAACAAGTGC TTATGAACGT GGTCAGCGTT ATGCAAGCCG 11461 ATTGCAGAAT GAATTTGCTG GAAATATTTC TGCGCTGGCT GATGCGGAAA ATATTTCACG 11521 TAAGATTATT ACCCGCTGTA TCAACACCGC CAAATTGCCT AAATCAGTTG TTGCTCTTTT 11581 TTCTCACCCC GGTGAACTAT CTGCCCGGTC AGGTGATGCA CTTCAAAAAG CCTTTACAGA 11641 TAAAGAGGAA TTACTTAAGC AGCAGGCATC TAACCTTCAT GAGCAGAAAA AAAGCTGGGG 11701 TGATATTTGA AGCTGAAGAA GTTATCACTC TTTTAACTTC TGTGCTTAAA ACGTCATCTG 11761 CATCAAGAAC TAGTTTAAGC TCACGACATC AGTTTGCTCC TGGAGCGACA GTATTGTATA 11821 AGGGCGATAA AATGGTGCTT AACCTGGACA GGTCTCGTGT TCCAACTGAG TGTATAGAGA 11881 AAATTGAGGC CATTCTTAAG GAACTTGAAA AGCCAGCACC CTGATGCGAC CTCGTTTTAG 11941 TCTACGTTTA TCTGTCTTTA CTTAATGTCC TTTGTTACAG GCCAGAAAGC ATAACTGGCC 12001 TGAATATTCT CTCTGGGCCC ACTGTTCCAC TT //

335

A.8 Sequence of pAN146.

LOCUS pSBAC-Flp-acceptor-pGAL-Flp 19162 bp DNA circular UNA 31-MAR-2011 COMMENT This file is created by Vector NTI http://www.invitrogen.com/ FEATURES Location/Qualifiers CDS complement(323..1081) /vntifkey="4" /gene="parB" CDS complement(1081..2238) /vntifkey="4" /gene="parA" Primer 1638..1658 /label=AN24 Primer 1974..2010 /label=AN21 Primer complement(1983..2019) /label=AN22 Mutation 1999 /label="T-G (MfeI delete)" CDS complement(2836..3591) /vntifkey="4" /gene="repE" Primer 3506..3525 /label=AN368 Primer complement(4462..4483) /label=AN371 Primer complement(5167..5189) /label=AN370 Primer 5420..5447 /label=pB1 Primer 5441..5460 /label=AN42 CDS 5767..6570 /vntifkey="4" /gene="aacIII(IV)" Primer 5776..5797 /label=AN1 Primer complement(5778..5797) /label=AN369 Primer complement(6549..6569) /label=AN2 Primer complement(6670..6689) /label=AN43 Primer 7275..7295 /label=AN357 Mutation 7358 /label="T to A" Primer 7386..7404 /label=VM130 Primer 7389..7405 /label=M13R Primer 7389..7438 /label=AN619 Restriction_sit 7418

336

/label="destroyed HindIII" Promoter 7419..7875 /label=pGAL Primer 7876..7921 /label=AN5 ORF 7876..9147 /label=Flp Primer 7879..7921 /label=AN47 Primer complement(8132..8151) /label=AN87 Primer 8333..8352 /label=AN243 Primer complement(8647..8665) /label=AN86 Primer 8937..8955 /label=AN170 Primer complement(9118..9159) /label=AN620 Primer complement(9122..9147) /label=AN48 Primer 9129..9165 /label=AN621 Terminator 9148..9402 /label=tACT Primer complement(9371..9430) /label=AN622 Restriction_sit 9403 /label="destroyed HindIII" Primer 9415..9466 /label=AN415 Replication_ori complement(9445..9963) /label=CEN/ARS Primer 10425..10442 /label=AN61 ORF complement(10690..11784) /label=LEU2-f1 ORF complement(10690..11796) /label=LEU2-f2 Primer 10944..10961 /label=AN62 Primer 11438..11456 /label=AN63 Primer complement(11777..11826) /label=AN447 Primer 11782..11832 /label=AN448 Replication_ori complement(12633..12939) /label="F1 ori" Primer 12851..12868 /label=AN50 Primer 12982..13002 /label=AN358 Primer 13126..13147 /label=AN154 Misc._recombina complement(13180..13227) /label=FRT3

337

Terminator complement(13244..13495) /label=tCYC Primer 13508..13537 /label=AN108 ORF complement(13508..14179) /label=TRP1 Primer complement(13729..13750) /label=AN183 Primer 14059..14081 /label=AN285 Misc._recombina complement(14180..14227) /label=FRT(WT) Promoter complement(14249..14630) /label=pMET25 Primer complement(14611..14660) /label=AN416 Primer complement(14664..14681) /label=M13F(-21) Primer complement(14684..14707) /label=M13F(-47) Primer complement(14684..14701) /label=M13F(-41) Primer complement(14691..14707) /label=VM129 Primer complement(14779..14799) /label=AN358 Replication_ori complement(15275..15373) /label="oriT (RP4)" CDS 15453..15824 /vntifkey="4" /gene="TraJ" Primer 16182..16203 /label=AN508 Source 16277..18307 /label=Phi-BT1 Primer 16542..16559 /label=AN13 CDS complement(16546..18171) /vntifkey="4" /gene="int" Primer 17090..17111 /label=AN510 Primer complement(17629..17649) /label=AN511 Mutation 18049 /label="T-C (NheI delete)" Primer complement(18168..18187) /label=AN14 Misc._recombina complement(18215..18287) /label=attP Primer complement(18391..18414) /label=AN509 Primer complement(18835..18862) /label=pB2 ORIGIN 1 AAGTGGAACA GTGGGCCCAG AGAGAATATT CAGGCCAGTT ATGCTTTCTG GCCTGTAACA 61 AAGGACATTA AGTAAAGACA GATAAACGTA GACTAAAACG AGGTCGCATC AGGGTGCTGG

338

121 CTTTTCAAGT TCCTTAAGAA TGGCCTCAAT TTTCTCTATA CACTCAGTTG GAACACGAGA 181 CCTGTCCAGG TTAAGCACCA TTTTATCGCC CTTATACAAT ACTGTCGCTC CAGGAGCAAA 241 CTGATGTCGT GAGCTTAAAC TAGTTCTTGA TGCAGATGAC GTTTTAAGCA CAGAAGTTAA 301 AAGAGTGATA ACTTCTTCAG CTTCAAATAT CACCCCAGCT TTTTTTCTGC TCATGAAGGT 361 TAGATGCCTG CTGCTTAAGT AATTCCTCTT TATCTGTAAA GGCTTTTTGA AGTGCATCAC 421 CTGACCGGGC AGATAGTTCA CCGGGGTGAG AAAAAAGAGC AACAACTGAT TTAGGCAATT 481 TGGCGGTGTT GATACAGCGG GTAATAATCT TACGTGAAAT ATTTTCCGCA TCAGCCAGCG 541 CAGAAATATT TCCAGCAAAT TCATTCTGCA ATCGGCTTGC ATAACGCTGA CCACGTTCAT 601 AAGCACTTGT TGGGCGATAA TCGTTACCCA ATCTGGATAA TGCAGCCATC TGCTCATCAT 661 CCAGCTCGCC AACCAGAACA CGATAATCAC TTTCGGTAAG TGCAGCAGCT TTACGACGGC 721 GACTCCCATC GGCAATTTCT ATGACACCAG ATACTCTTCG ACCGAACGCC GGTGTCTGTT 781 GACCAGTCAG TAGAAAAGAA GGGATGAGAT CATCCAGTGC GTCCTCAGTA AGCAGCTCCT 841 GGTCACGTTC ATTACCTGAC CATACCCGAG AGGTCTTCTC AACACTATCA CCCCGGAGCA 901 CTTCAAGAGT AAACTTCACA TCCCGACCAC ATACAGGCAA AGTAATGGCA TTACCGCGAG 961 CCATTACTCC TACGCGCGCA ATTAACGAAT CCACCATCGG GGCAGCTGGT GTCGATAACG 1021 AAGTATCTTC AACCGGTTGA GTATTGAGCG TATGTTTTGG AATAACAGGC GCACGCTTCA 1081 TTATCTAATC TCCCAGCGTG GTTTAATCAG ACGATCGAAA ATTTCATTGC AGACAGGTTC 1141 CCAAATAGAA AGAGCATTTC TCCAGGCACC AGTTGAAGAG CGTTGATCAA TGGCCTGTTC 1201 AAAAACAGTT CTCATCCGGA TCTGACCTTT ACCAACTTCA TCCGTTTCAC GTACAACATT 1261 TTTTAGAACC ATGCTTCCCC AGGCATCCCG AATTTGCTCC TCCATCCACG GGGACTGAGA 1321 GCCATTGCTA TTGCTGTATT TGGTAAGCAA AATACGTACA TCAGGCTCGA ACCCTTTAAG 1381 ATCAACGTTC TTGAGCAGAT CACGAAGCAT ATCGAAAAAC TGCAGTGCGG AGGTGTAGTC 1441 AAACAACTCA GCAGGCGTGG GAACAATCAG CACATCAGCA GCACATACGA CATTAATCGT 1501 GCCGATACCC AGGTTAGGCG CGCTGTCAAT AACTATGACA TCATAGTCAT GAGCAACAGT 1561 TTCAATGGCC AGTCGGAGCA TCAGGTGTGG ATCGGTGGGC AGTTTACCTT CATCAAATTT 1621 GCCCATTAAC TCAGTTTCAA TACGGTGCAG AGCCAGACAG GAAGGAATAA TGTCAAGCCC 1681 CGGCCAGCAA GTGGGCTTTA TTGCATAAGT GACATCGTCC TTTTCCCCAA GATAGAAAGG 1741 CAGGAGAGTG TCTTCTGCAT GAATATGAAG ATCTGGTACC CATCCGTGAT ACATTGAGGC 1801 TGTTCCCTGG GGGTCGTTAC CTTCCACGAG CAAAACACGT AGCCCCTTCA GAGCCAGATC 1861 CTGAGCAAGA TGAACAGAAA CTGAGGTTTT GTAAACGCCA CCTTTATGGG CAGCAACCCC 1921 GATCACCGGT GGAAATACGT CTTCAGCACG TCGCAATCGC GTACCAAACA CATCACGCAT 1981 ATGATTAATT TGTTCAATGG TATAACCAAC ACGTTGCTCA ACCCGTCCTC GAATTTCCAT 2041 ATCCGGGTGC GGTAGTCGCC CTGCTTTCTC GGCATCTCTG ATAGCCTGAG AAGAAACCCC 2101 AACTAAATCC GCTGCTTCAC CTATTCTCCA GCGCCGGGTT ATTTTCCTCG CTTCCGGGCT 2161 GTCATCATTA AACTGTGCAA TGGCGATAGC CTTCGTCATT TCATGACCAG CGTTTATGCA 2221 CTGGTTAAGT GTTTCCATGA GTTTCATTCT GAACATCCTT TAATCATTGC TTTGCGTTTT 2281 TTTTATTAAA TCTTGCAATT TACTGCAAAG CAACAACAAA ATCGCAAAGT CATCAAAAAA 2341 CCGCAAAGTT GTTTAAAATA AGAGCAACAC TACAAAAGGA GATAAGAAGA GCACATACCT 2401 CAGTCACTTA TTATCACTAG CGCTCGCCGC AGCCGTGTAA CCGAGCATAG CGAGCGAACT 2461 GGCGAGGAAG CAAAGAAGAA CTGTTCTGTC AGATAGCTCT TACGCTCAGC GCAAGAAGAA 2521 ATATCCACCG TGGGAAAAAC TCCAGGTAGA GGTACACACG CGGATAGCCA ATTCAGAGTA 2581 ATAAACTGTG ATAATCAACC CTCATCAATG ATGACGAACT AACCCCCGAT ATCAGGTCAC 2641 ATGACGAAGG GAAAGAGAAG GAAATCAACT GTGACAAACT GCCCTCAAAT TTGGCTTCCT 2701 TAAAAATTAC AGTTCAAAAA GTATGAGAAA ATCCATGCAG GCTGAAGGAA ACAGCAAAAC 2761 TGTGACAAAT TACCCTCAGT AGGTCAGAAC AAATGTGACG AACCACCCTC AAATCTGTGA 2821 CAGATAACCC TCAGACTATC CTGTCGTCAT GGAAGTGATA TCGCGGAAGG AAAATACGAT 2881 ATGAGTCGTC TGGCGGCCTT TCTTTTTCTC AATGTATGAG AGGCGCATTG GAGTTCTGCT 2941 GTTGATCTCA TTAACACAGA CCTGCAGGAA GCGGCGGCGG AAGTCAGGCA TACGCTGGTA 3001 ACTTTGAGGC AGCTGGTAAC GCTCTATGAT CCAGTCGATT TTCAGAGAGA CGATGCCTGA 3061 GCCATCCGGC TTACGATACT GACACAGGGA TTCGTATAAA CGCATGGCAT ACGGATTGGT 3121 GATTTCTTTT GTTTCACTAA GCCGAAACTG CGTAAACCGG TTCTGTAACC CGATAAAGAA 3181 GGGAATGAGA TATGGGTTGA TATGTACACT GTAAAGCCCT CTGGATGGAC TGTGCGCACG 3241 TTTGATAAAC CAAGGAAAAG ATTCATAGCC TTTTTCATCG CCGGCATCCT CTTCAGGGCG 3301 ATAAAAAACC ACTTCCTTCC CCGCGAAACT CTTCAATGCC TGCCGTATAT CCTTACTGGC 3361 TTCCGCAGAG GTCAATCCGA ATATTTCAGC ATATTTAGCA ACATGGATCT CGCAGATACC 3421 GTCATGTTCC TGTAGGGTGC CATCAGATTT TCTGATCTGG TCAACGAACA GATACAGCAT 3481 ACGTTTTTGA TCCCGGGAGA GACTATATGC CGCCTCAGTG AGGTCGTTTG ACTGGACGAT

339

3541 TCGCGGGCTA TTTTTACGTT TCTTGTGATT GATAACCGCT GTTTCCGCCA TGACAGATCC 3601 ATGTGAAGTG TGACAAGTTT TTAGATTGTC ACACTAAATA AAAAAGAGTC AATAAGCAGG 3661 GATAACTTTG TGAAAAAACA GCTTCTTCTG AGGGCAATTT GTCACAGGGT TAAGGGCAAT 3721 TTGTCACAGA CAGGACTGTC ATTTGAGGGT GATTTGTCAC ACTGAAAGGG CAATTTGTCA 3781 CAACACCTTC TCTAGAACCA GCATGGATAA AGGCCTACAA GGCGCTCTAA AAAAGAAGAT 3841 CTAAAAACTA ATAAAAAAAT AATTATAAAA ATATCCCCGT GGATAAGTGG ATAACCCCAA 3901 GGGAAGTTTT TTCAGGCATC GTGTGTAAGC AGAATATATA AGTGCTGTTC CCTGGTGCTT 3961 CCTCGCTCAC TCGACCGGGA GGGTTCGAGA AGGGGGGCAC CCCCTTCGGC GTGCGCGGTC 4021 ACGCGCACAG GGCGCAGCCC TGGTTAAAAA CAAGGTTTAT AAATATTGGT TTAAAAGCAG 4081 GTTAAAAGAC AGGTTAGCGG TGGCCGAAAA ACGGGCGGAA ACCCTTGCAA ATGCTGGATT 4141 TTCTGCCTGT GGACAGCCCC TCAAATGTCA ATAGGTGCGC CCCTCATCTG TCAGCACTCT 4201 GCCCCTCAAG TGTCAAGGAT CGCGCCCCTC ATCTGTCAGT AGTCGCGCCC CTCAAGTGTC 4261 AATACCGCAG GGCACTTATC CCCAGGCTTG TCCACATCAT CTGTGGGAAA CTCGCGTAAA 4321 ATCAGGCGTT TTCGCCGATT TGCGAGGCTG GCCAGCTCCA CGTCGCCGGC CGAAATCGAG 4381 CCTGCCCCTC ATCTGTCAAC GCCGCGCCGG GTGAGTCGGC CCCTCAAGTG TCAACGTCCG 4441 CCCCTCATCT GTCAGTGAGG GCCAAGTTTT CCGCGAGGTA TCCACAACGC CGGCGGCCGG 4501 CCGCGGTGTC TCGCACACGG CTTCGACGGC GTTTCTGGCG CGTTTGCAGG GCCATAGACG 4561 GCCGCCAGCC CAGCGGCGAG GGCAACCAGC TCGAGGGCTT CGCCCTGTCG CTCGACTGCG 4621 GCGAGCACTA CTGGCTGTAA AAGGACAGAC CACATCATGG TTCTGTGTTC ATTAGGTTGT 4681 TCTGTCCATT GCTGACATAA TCCGCTCCAC TTCAACGTAA CACCGCACGA AGATTTCTAT 4741 TGTTCCTGAA GGCATATTCA AATCGTTTTC GTTACCGCTT GCAGGCATCA TGACAGAACA 4801 CTACTTCCTA TAAACGCTAC ACAGGCTCCT GAGATTAATA ATGCGGATCT CTACGATAAT 4861 GGGAGATTTT CCCGACTGTT TCGTTCGCTT CTCAGTGGAT AACAGCCAGC TTCTCTGTTT 4921 AACAGACAAA AACAGCATAT CCACTCAGTT CCACATTTCC ATATAAAGGC CAAGGCATTT 4981 ATTCTCAGGA TAATTGTTTC AGCATCGCAA CCGCATCAGA CTCCGGCATC GCAAACTGCA 5041 CCCGGTGCCG GGCAGCCACA TCCAGCGCAA AAACCTTCGT GTAGACTTCC GTTGAACTGA 5101 TGGACTTATG TCCCATCAGG CTTTGCAGAA CTTTCAGCGG TATACCGGCA TACAGCATGT 5161 GCATCGCATA GGAATGGCGG AACGTATGTG GTGTGACCGG AACAGAGAAC GTCACACCGT 5221 CAGCAGCAGC GGCGGCAACC GCCTCCCCAA TCCAGGTCCT GACCGTTCTG TCCGTCACTT 5281 CCCAGATCCG CGCTTTCTCT GTCCTTCCTG TGCGACGGTT ACGCCGCTCC ATGAGCTTAT 5341 CGCGAATAAA TACCTGTGAC GGAAGATCAC TTCGCAGAAT AAATAAATCC TGGTGTCCCT 5401 GTTGATACCG GGAAGCCCTT CAGGCCTTCG CCACCTCTGA CTTGAGCGTC GATTTTTGTG 5461 ATGCTCGTCA GGGGGGCGGA GCCTATGGAA AAACGCCAGC AACGCGGCCT TTTTACGGTT 5521 CCTGGCCTTT TGCTGGCCTT TTGCTCACAT GCATGGCCAT GATTACGAAT TAATTCGAGC 5581 TCGGTACCCG CTCACGGTAA CTGATGCCGT ATTTGCAGTA CCAGCGTACG GCCCACAGAA 5641 TGATGTCACG CTGAAAATGC CGGCCTTTGA ATGGGTTCAT GTGCAGCTCC ATCAGCAAAA 5701 GGGGATGATA AGTTTATCAC CACCGACTAT TTGCAACAGT GCCGTTGATC GTGCTATGAT 5761 CGACTGATGT CATCAGCGGT GGAGTGCAAT GTCGTGCAAT ACGAATGGCG AAAAGCCGAG 5821 CTCATCGGTC AGCTTCTCAA CCTTGGGGTT ACCCCCGGCG GTGTGCTGCT GGTCCACAGC 5881 TCCTTCCGTA GCGTCCGGCC CCTCGAAGAT GGGCCACTTG GACTGATCGA GGCCCTGCGT 5941 GCTGCGCTGG GTCCGGGAGG GACGCTCGTC ATGCCCTCGT GGTCAGGTCT GGACGACGAG 6001 CCGTTCGATC CTGCCACGTC GCCCGTTACA CCGGACCTTG GAGTTGTCTC TGACACATTC 6061 TGGCGCCTGC CAAATGTAAA GCGCAGCGCC CATCCATTTG CCTTTGCGGC AGCGGGGCCA 6121 CAGGCAGAGC AGATCATCTC TGATCCATTG CCCCTGCCAC CTCACTCGCC TGCAAGCCCG 6181 GTCGCCCGTG TCCATGAACT CGATGGGCAG GTACTTCTCC TCGGCGTGGG ACACGATGCC 6241 AACACGACGC TGCATCTTGC CGAGTTGATG GCAAAGGTTC CCTATGGGGT GCCGAGACAC 6301 TGCACCATTC TTCAGGATGG CAAGTTGGTA CGCGTCGATT ATCTCGAGAA TGACCACTGC 6361 TGTGAGCGCT TTGCCTTGGC GGACAGGTGG CTCAAGGAGA AGAGCCTTCA GAAGGAAGGT 6421 CCAGTCGGTC ATGCCTTTGC TCGGTTGATC CGCTCCCGCG ACATTGTGGC GACAGCCCTG 6481 GGTCAACTGG GCCGAGATCC GTTGATCTTC CTGCATCCGC CAGAGGCGGG ATGCGAAGAA 6541 TGCGATGCCG CTCGCCAGTC GATTGGCTGA GCTCATGAGC GGAGAACGAG ATGACGTTGG 6601 AGGGGCAAGG TCGCGCTGAT TGCTGGGGCA ACACGTGGAG CGGATCGGGG ATTGTCTTTC 6661 TTCAGCTCGC TGATGATATG CTGACGCTCA ATGCCGTTTG GCCTCCGACT AACGAAAATC 6721 CCGCATTTGG ACGGCTGATC CGATTGGCAC GGCGGACGGC GAATGGCGGA GCAGACGCTC 6781 GTCCGGGGGC AATGAGATAT GAAAAAGCCT GAACTCACCG CGACGTCTGT CGAGAAGTTT 6841 CTGATCGAAA AGTTCGACAG CGTCTCCGAC CTGATGCAGC TCTCGGAGGG CGAAGAATCT 6901 CGTGCTTTCA GCTTCGATGT AGGAGGGCGT GGATATGTCC TGCGGGTAAA TAGCTGCGCC

340

6961 GATGGTTTCT ACAAAGATCG TTATGTTTAT CGGCACTTTG CATCGGCCGC GCTCCCGATT 7021 CCGGAAGTGC TTGACATTGG GGAATTGGGG ATCTCCATGC ATGTTCTTTC CTGCGTTATC 7081 CCCTGATTCT GTGGATAACC GTATTACCGC CTTTGAGTGA GCTGATACCG CTCGCCGCAG 7141 CCGAACGACC GAGCGCAGCG AGTCAGTGAG CGAGGAAGCG GAAGAGCGCC CAATACGCAA 7201 ACCGCCTCTC CCCGCGCGTT GGCCGATTCA TTAATGCAGC TGGCACGACA GGTTTCCCGA 7261 CTGGAAAGCG GGCAGTGAGC GCAACGCAAT TAATGTGAGT TAGCTCACTC ATTAGGCACC 7321 CCAGGCTTTA CACTTTATGC TTCCGGCTCG TATGTTGTGT GGAATTGTGA GCGGATAACA 7381 ATTTCACACA GGAAACAGCT ATGACCATGA TTACGCCAac ggattagaag ccgccgagcg 7441 ggtgacagcc ctccgaagga agactctcct ccgtgcgtcc tcgtcttcac cggtcgcgtt 7501 cctgaaacgc agatgtgcct cgcgccgcac tgctccgaac aataaagatt ctacaatact 7561 agcttttatg gttatgaaga ggaaaaattg gcagtaacct ggccccacaa accttcaaat 7621 gaacgaatca aattaacaac cataggatga taatgcgatt agttttttag ccttatttct 7681 ggggtaatta atcagcgaag cgatgatttt tgatctatta acagatatat aaatgcaaaa 7741 actgcataac cactttaact aatactttca acattttcgg tttgtattac ttcttattca 7801 aatgtaataa aagtatcaac aaaaaattgt taatatacct ctatacttta acgtcaagga 7861 gaaaaaaccc cggatatgcc acaatttggt atattatgta AAACACCACC TAAGGTGCTT 7921 GTTCGTCAGT TTGTGGAAAG GTTTGAAAGA CCTTCAGGTG AGAAAATAGC ATTATGTGCT 7981 GCTGAACTAA CCTATTTATG TTGGATGATT ACACATAACG GAACAGCAAT CAAGAGAGCC 8041 ACATTCATGA GCTATAATAC TATCATAAGC AATTCGCTGA GTTTCGATAT TGTCAATAAA 8101 TCACTCCAGT TTAAATACAA GACGCAAAAA GCAACAATTC TGGAAGCCTC ATTAAAGAAA 8161 TTGATTCCTG CTTGGGAATT TACAATTATT CCTTACTATG GACAAAAACA TCAATCTGAT 8221 ATCACTGATA TTGTAAGTAG TTTGCAATTA CAGTTCGAAT CATCGGAAGA AGCAGATAAG 8281 GGAAATAGCC ACAGTAAAAA AATGCTTAAA GCACTTCTAA GTGAGGGTGA AAGCATCTGG 8341 GAGATCACTG AGAAAATACT AAATTCGTTT GAGTATACTT CGAGATTTAC AAAAACAAAA 8401 ACTTTATACC AATTCCTCTT CCTAGCTACT TTCATCAATT GTGGAAGATT CAGCGATATT 8461 AAGAACGTTG ATCCGAAATC ATTTAAATTA GTCCAAAATA AGTATCTGGG AGTAATAATC 8521 CAGTGTTTAG TGACAGAGAC AAAGACAAGC GTTAGTAGGC ACATATACTT CTTTAGCGCA 8581 AGGGGTAGGA TCGATCCACT TGTATATTTG GATGAATTTT TGAGGAATTC TGAACCAGTC 8641 CTAAAACGAG TAAATAGGAC CGGCAATTCT TCAAGCAATA AACAGGAATA CCAATTATTA 8701 AAAGATAACT TAGTCAGATC GTACAATAAA GCTTTGAAGA AAAATGCGCC TTATTCAATC 8761 TTTGCTATAA AAAATGGCCC AAAATCTCAC ATTGGAAGAC ATTTGATGAC CTCATTTCTT 8821 TCAATGAAGG GCCTAACGGA GTTGACTAAT GTTGTGGGAA ATTGGAGCGA TAAGCGTGCT 8881 TCTGCCGTGG CCAGGACAAC GTATACTCAT CAGATAACAG CAATACCTGA TCACTACTTC 8941 GCACTAGTTT CTCGGTACTA TGCATATGAT CCAATATCAA AGGAAATGAT AGCATTGAAG 9001 GATGAGACTA ATCCAATTGA GGAGTGGCAG CATATAGAAC AGCTAAAGGG TAGTGCTGAA 9061 GGAAGCATAC GATACCCCGC ATGGAATGGG ATAATATCAC AGGAGGTACT AGACTACCTT 9121 TCATCCTACA TAAATAGACG CATATAATCT CTGCTTTTGT GCGCGTATGT TTATGTATGT 9181 ACCTCTCTCT CTATTTCTAT TTTTAAACCA CCCTCTCAAT AAAATAAAAA TAATAAAGTA 9241 TTTTTAAGGA AAAGACGTGT TTAAGCACTG ACTTTATCTA CTTTTTGTAC GTTTTCATTG 9301 ATATAATGTG TTTTGTCTCT CCCTTTTCTA CGAAAATTTC AAAAATTGAC CAAAAAAAGG 9361 AATATATATA CGAAAAACTA TTATATTTAT ATATCATAGT GTTGCATGCC TGCAGGTCGA 9421 CTCTAGAGGA TCCCCGGGCA ATTGGGTCCT TTTCATCACG TGCTATAAAA ATAATTATAA 9481 TTTAAATTTT TTAATATAAA TATATAAATT AAAAATAGAA AGTAAAAAAA GAAATTAAAG 9541 AAAAAATAGT TTTTGTTTTC CGAAGATGTA AAAGACTCTA GGGGGATCGC CAACAAATAC 9601 TACCTTTTAT CTTGCTCTTC CTGCTCTCAG GTATTAATGC CGAATTGTTT CATCTTGTCT 9661 GTGTAGAAGA CCACACACGA AAATCCTGTG ATTTTACATT TTACTTATCG TTAATCGAAT 9721 GTATATCTAT TTAATCTGCT TTTCTTGTCT AATAAATATA TATGTAAAGT ACGCTTTTTG 9781 TTGAAATTTT TTAAACCTTT GTTTATTTTT TTTTCTTCAT TCCGTAACTC TTCTACCTTC 9841 TTTATTTACT TTCTAAAATC CAAATACAAA ACATAAAAAT AAATAAACAC AGAGTAAATT 9901 CCCAAATTAT TCCATCATTA AAAGATACGA GGCGCGTGTA AGTTACAGGC AAGCGATCCG 9961 TCCTAAGAAA CCATTATTAT CATGACATTA ACCTATAAAA ATAGGCGTAT CACGAGGCCC 10021 TTTCGTCTCG CGCGTTTCGG TGATGACGGT GAAAACCTCT GACACATGCA GCTCCCGGAG 10081 ACGGTCACAG CTTGTCTGTA AGCGGATGCC GGGAGCAGAC AAGCCCGTCA GGGCGCGTCA 10141 GCGGGTGTTG GCGGGTGTCG GGGCTGGCTT AACTATGCGG CATCAGAGCA GATTGTACTG 10201 AGAGTGCACC ATATCGACTA CGTCGTAAGG CCGTTTCTGA CAGAGTAAAA TTCTTGAGGG 10261 AACTTTCACC ATTATGGGAA ATGGTTCAAG AAGGTATTGA CTTAAACTCC ATCAAATGGT 10321 CAGGTCATTG AGTGTTTTTT ATTTGTTGTA TTTTTTTTTT TTTAGAGAAA ATCCTCCAAT

341

10381 ATCAAATTAG GAATCGTAGT TTCATGATTT TCTGTTACAC CTAACTTTTT GTGTGGTGCC 10441 CTCCTCCTTG TCAATATTAA TGTTAAAGTG CAATTCTTTT TCCTTATCAC GTTGAGCCAT 10501 TAGTATCAAT TTGCTTACCT GTATTCCTTT ACTATCCTCC TTTTTCTCCT TCTTGATAAA 10561 TGTATGTAGA TTGCGTATAT AGTTTCGTCT ACCCTATGAA CATATTCCAT TTTGTAATTT 10621 CGTGTCGTTT CTATTATGAA TTTCATTTAT AAAGTTTATG TACAAATATC ATAAAAAAAG 10681 AGAATCTTTT TAAGCAAGGA TTTTCTTAAC TTCTTCGGCG ACAGCATCAC CGACTTCGGT 10741 GGTACTGTTG GAACCACCTA AATCACCAGT TCTGATACCT GCATCCAAAA CCTTTTTAAC 10801 TGCATCTTCA ATGGCCTTAC CTTCTTCAGG CAAGTTCAAT GACAATTTCA ACATCATTGC 10861 AGCAGACAAG ATAGTGGCGA TAGGGTCAAC CTTATTCTTT GGCAAATCTG GAGCAGAACC 10921 GTGGCATGGT TCGTACAAAC CAAATGCGGT GTTCTTGTCT GGCAAAGAGG CCAAGGACGC 10981 AGATGGCAAC AAACCCAAGG AACCTGGGAT AACGGAGGCT TCATCGGAGA TGATATCACC 11041 AAACATGTTG CTGGTGATTA TAATACCATT TAGGTGGGTT GGGTTCTTAA CTAGGATCAT 11101 GGCGGCAGAA TCAATCAATT GATGTTGAAC CTTCAATGTA GGGAATTCGT TCTTGATGGT 11161 TTCCTCCACA GTTTTTCTCC ATAATCTTGA AGAGGCCAAA ACATTAGCTT TATCCAAGGA 11221 CCAAATAGGC AATGGTGGCT CATGTTGTAG GGCCATGAAA GCGGCCATTC TTGTGATTCT 11281 TTGCACTTCT GGAACGGTGT ATTGTTCACT ATCCCAAGCG ACACCATCAC CATCGTCTTC 11341 CTTTCTCTTA CCAAAGTAAA TACCTCCCAC TAATTCTCTG ACAACAACGA AGTCAGTACC 11401 TTTAGCAAAT TGTGGCTTGA TTGGAGATAA GTCTAAAAGA GAGTCGGATG CAAAGTTACA 11461 TGGTCTTAAG TTGGCGTACA ATTGAAGTTC TTTACGGATT TTTAGTAAAC CTTGTTCAGG 11521 TCTAACACTA CCGGTACCCC ATTTAGGACC ACCCACAGCA CCTAACAAAA CGGCATCAAC 11581 CTTCTTGGAG GCTTCCAGCG CCTCATCTGG AAGTGGGACA CCTGTAGCAT CGATAGCAGC 11641 ACCACCAATT AAATGATTTT CGAAATCGAA CTTGACATTG GAACGAACAT CAGAAATAGC 11701 TTTAAGAACC TTAATGGCTT CGGCTGTGAT TTCTTGACCA ACGTGGTCAC CTGGCAAAAC 11761 GACGATCTTC TTAGGGGCAG ACATAGGGGC AGACATTAGA ATGGTATATC CTTGAAATAT 11821 ATATATATAT TGCTGAAATG TAAAAGGTAA GAAAAGTTAG AAAGTAAGAC GATTGCTAAC 11881 CACCTATTGG AAAAAACAAT AGGTCCTTAA ATAATATTGT CAACTTCAAG TATTGTGATG 11941 CAAGCATTTA GTCATGAACG CTTCTCTATT CTATATGAAA AGCCGGTTCC GGCCTCTCAC 12001 CTTTCCTTTT TCTCCCAATT TTTCAGTTGA AAAAGGTATA TGCGTCAGGC GACCTCTGAA 12061 ATTAACAAAA AATTTCCAGT CATCGAATTT GATTCTGTGC GATAGCGCCC CTGTGTGTTC 12121 TCGTTATGTT GAGGAAAAAA ATAATGGTTG CTAAGAGATT CGAACTCTTG CATCTTACGA 12181 TACCTGAGTA TTCCCACAGT TAACTGCGGT CAAGATATTT CTTGAATCAG GCGCCTTAGA 12241 CCGCTCGGCC AAACAACCAA TTACTTGTTG AGAAATAGAG TATAATTATC CTATAAATAT 12301 AACGTTTTTG AACACACATG AACAAGGAAG TACAGGACAA TTGATTTTGA AGAGAATGTG 12361 GATTTTGATG TAATTGTTGG GATTCCATTT TTAATAAGGC AATAATATTA GGTATGTGGA 12421 TATACTAGAA GTTCTCCTCG ACCGTCGATA TGCGGTGTGA AATACCGCAC AGATGCGTAA 12481 GGAGAAAATA CCGCATCAGG AAATTGTAAA CGTTAATATT TTGTTAAAAT TCGCGTTAAA 12541 TTTTTGTTAA ATCAGCTCAT TTTTTAACCA ATAGGCCGAA ATCGGCAAAA TCCCTTATAA 12601 ATCAAAAGAA TAGACCGAGA TAGGGTTGAG TGTTGTTCCA GTTTGGAACA AGAGTCCACT 12661 ATTAAAGAAC GTGGACTCCA ACGTCAAAGG GCGAAAAACC GTCTATCAGG GCGATGGCCC 12721 ACTACGTGAA CCATCACCCT AATCAAGTTT TTTGGGGTCG AGGTGCCGTA AAGCACTAAA 12781 TCGGAACCCT AAAGGGAGCC CCCGATTTAG AGCTTGACGG GGAAAGCCGG CGAACGTGGC 12841 GAGAAAGGAA GGGAAGAAAG CGAAAGGAGC GGGCGCTAGG GCGCTGGCAA GTGTAGCGGT 12901 CACGCTGCGC GTAACCACCA CACCCGCCGC GCTTAATGCG CCGCTACAGG GCGCGTCGCG 12961 CCATTCGCCA TTCAGGCTGC GCAACTGTTG GGAAGGGCGA TCGGTGCGGG CCTCTTCGCT 13021 ATTACGCCAG CTGGCGAAAG GGGGATGTGC TGCAAGGCGA TTAAGTTGGG TAACGCCAGG 13081 GTTTTCCCAG TCACGACGTT GTAAAACGAC GGCCAGTGAG CGCGCGTAAT ACGACTCACT 13141 ATAGGGCGAA TTGGGTACCG CTAGCTTCAA AAGCGCTCTG AAGTTCCTAT ACTATTTGAA 13201 GAATAGGAAC TTCGGAATAG GAACTTCAAG ATCCCCGGGT ACCGGCCGCA AATTAAAGCC 13261 TTCGAGCGTC CCAAAACCTT CTCAAGCAAG GTTTTCAGTA TAATGTTACA TGCGTACACG 13321 CGTCTGTACA GAAAAAAAAG AAAAATTTGA AATATAAATA ACGTTCTTAA TACTAACATA 13381 ACTATAAAAA AATAAATAGG GACCTAGACT TCAGGTTGTC TAACTCCTTC CTTTTCGGTT 13441 AGAGCGGATG TGGGGGGAGG GCGTGAATGT AAGCGTGACA TAACTAATTA CATGACTCGA 13501 GGTCGACCTA TTTCTTAGCA TTTTTGACGA AATTTGCTAT TTTGTTAGAG TCTTTTACAC 13561 CATTTGTCTC CACACCTCCG CTTACATCAA CACCAATAAC GCCATTTAAT CTAAGCGCAT 13621 CACCAACATT TTCTGGCGTC AGTCCACCAG CTAACATAAA ATGTAAGCTC TCGGGGCTCT 13681 CTTGCCTTCC AACCCAGTCA GAAATCGAGT TCCAATCCAA AAGTTCACCT GTCCCACCTG 13741 CTTCTGAATC AAACAAGGGA ATAAACGAAT GAGGTTTCTG TGAAGCTGCA CTGAGTAGTA

342

13801 TGTTGCAGTC TTTTGGAAAT ACGAGTCTTT TAATAACTGG CAAACCGAGG AACTCTTGGT 13861 ATTCTTGCCA CGACTCATCT CCATGCAGTT GGACGATATC AATGCCGTAA TCATTGACCA 13921 GAGCCAAAAC ATCCTCCTTA GGTTGATTAC GAAACACGCC AACCAAGTAT TTCGGAGTGC 13981 CTGAACTATT TTTATATGCT TTTACAAGAC TTGAAATTTT CCTTGCAATA ACCGGGTCAA 14041 TTGTTCTCTT TCTATTGGGC ACACATATAA TACCCAGCAA GTCAGCATCG GAATCTAGTG 14101 CACATTCTGC GGCCTCTGTG CTCTGCAAGC CGCAAACTTT CACCAATGGA CCAGAACTAC 14161 CTGTGAAATT AATAACAGAG AAGTTCCTAT ACTTTCTAGA GAATAGGAAC TTCGGAATAG 14221 GAACTTCCAT GGATCCACTA GTTCTAGAGT ATGGATGGGG GTAATAGAAT TGTATCTATG 14281 TATCTGACGA CCCTGTATTA CACGAAGGAA ATAGTAGATG AAAAGACAAG AGAGCAAGAA 14341 AAAGGAAAAA CTTCCTTTCT AACAGACGCT TTACTTAACC TTATATATAT CTTATTTTTT 14401 TCATCGAGCG TGTTCAATTG GACAAGGTGC CACCTTTTCG ACACTTCGGT TATTATGTTA 14461 CACAGTTTTC ATGAGGATGG CGCCTTGACT AACTTAATTA GTCATTTGCC AACCACCACA 14521 GTTCCCCAAT ATCGACAGCT TCACGTGCCA TTTGCCATTT TGCGATCATG TGACCTCAAG 14581 AGAAAAAGCA AAAAATAATG AGAGCTAAGG GATTCGAACC CTTGCATCCG GCTAGCgcac 14641 aatcgtgccg gTCTAGAGAA TTCACTGGCC GTCGTTTTAC AACGTCGTGA CTGGGAAAAC 14701 CCTGGCGTTA CCCAACTTAA TCGCCTTGCA GCACATCCCC CTTTCGCCAG CTGGCGTAAT 14761 AGCGAAGAGG CCCGCACCGA TCGCCCTTCC CAACAGTTGC GCAGCCTGAA TGGCGAATGG 14821 CGCCTGATGC GGTATTTTCT CCTTACGCAT CTGTGCGGTA TTTCACACCG CATATGGTGC 14881 ACTCTCAGTA CAATCTGCTC TGATGCCGCA TAGTTAAGCC AGCCCCGACA CCCGCCAACA 14941 CCCGCTGACG CGCCCTGACG GGCTTGTCTG CTCCCGGCAT CCGCTTACAG ACAAGCTGTG 15001 ACCGTCTCCG GGAGCTGCAT GTGTCAGAGG TTTTCACCGT CATCACCGAA ACGCGCGAGA 15061 CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT AATGGTTTCT 15121 TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG TTTATTTTTC 15181 TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT GCTTCAATAA 15241 TCGCACGATA TACAGGATTT TGCCAAAGGG TTCGTGTAGA CTTTCCTTGG TGTATCCAAC 15301 GGCGTCAGCC GGGCAGGATA GGTGAAGTAG GCCCACCCGC GAGCGGGTGT TCCTTCTTCA 15361 CTGTCCCTTA TTCGCACCTG GCGGTGCTCA ACGGGAATCC TGCTCTGCGA GGCTGGCCGG 15421 CTACCGCCGG CGTAACAGAT GAGGGCAAGC GGATGGCTGA TGAAACCAAG CCAACCAGGA 15481 AGGGCAGCCC ACCTATCAAG GTGTACTGCC TTCCAGACGA ACGAAGAGCG ATTGAGGAAA 15541 AGGCGGCGGC GGCCGGCATG AGCCTGTCGG CCTACCTGCT GGCCGTCGGC CAGGGCTACA 15601 AAATCACGGG CGTCGTGGAC TATGAGCACG TCCGCGAGCT GGCCCGCATC AATGGCGACC 15661 TGGGCCGCCT GGGCGGCCTG CTGAAACTCT GGCTCACCGA CGACCCGCGC ACGGCGCGGT 15721 TCGGTGATGC CACGATCCTC GCCCTGCTGG CGAAGATCGA AGAGAAGCAG GACGAGCTTG 15781 GCAAGGTCAT GATGGGCGTG GTCCGCCCGA GGGCAGAGCC ATGACTTTTT TAGCCGCTAA 15841 AACGGCCGGG GGGTGCGCGT GATTGCCAAG CACGTCCCCA TGCGCTCCAT CAAGAAGAGC 15901 GACTACGCGG AGCTGGTGAA GTACATCACC GACGAGCAAG GCAAATTGAA AAAGGAAGAG 15961 TATGAGTATT CAACATTTCC GTGTCGCCCT TATTCCCTTT TTTTGCGGCA TTTTGCCTTC 16021 CTGTTTTTGC TCACCCAGAA ACGCTGGTGA AAGTAAAAGA TGCTGAAGAT CAGTTGGGTG 16081 CACGAGTGGG TTACATCGAA CTGGATCTCA ACAGCGGTAA GATCCTTGAG AGTTTTCGCC 16141 CCGAAGAACG TTTTCCAATG ATGAGCACTT TTAAAGTTCT GCTATGTGGC GCGGTATTAT 16201 CCCGTATTGA CGCCGGGCAA GAGCAACTCG GTCGCCGCAT ACACTATTCT CAGAATGACT 16261 TGGTTGAGTA TCCCTAGGCG CTCCCTGCCC GCTGTGGTGA CGAAGGAACT ACTAGTTAGC 16321 CTAACTAACG ATTTCAAGGC CAAAAAAGGG GCCGAACCGT CCGCCGGCTC GACCCCTCCC 16381 GCTGTGCGCT ACAGCGCCGC AAGCTCCCGC TCAGTGGCTT CGCTCGCTTC GTCTTCCTCC 16441 TTCAGCAGCT CCGCCCACTT GAGCGTCACA CGGTCCTTCA GGGGGATGTA TTTCTTCGTC 16501 TCAGGGTGGC GCCCCGGTCG ACCATGACGG AGTCAAGGAA GAAGCTCAGG AACTCCCGAC 16561 GCTCGTACAC GTCCCAGCTT GCCCAGATGC CCCTTCGGCC GTCGGGTCTT CGCCGCTGAA 16621 CCACTCAGAC GGAACGCGGG TGCTGCCGTT CATCTTCTCG TCAAGCTCAG CGAGTCGGGT 16681 CGTGCACTCC GCTTCGTAGG ACCGGTATTG CAGCACCGTT GAGCGCCACG TTTCCAGCTC 16741 TTCACGCCCG ACGTACAGAC CGGGCTTACG GTCCGCCTGA AGGTCCTTGA TGGAGCGCCG 16801 CACGTTGTCT AGGTGCGCCT GTTGTTCGCG CCGCTCATCG GCCACCCCCG CTAGGTCGTG 16861 CTGAAGGGCG AAGCGCTCCG CAGCGGCGGC AATCCATGCC TGATCGTGTT CATCTTCCAT 16921 GTCGGCTGTG CGGAGCCGTG CCCACACCTT CGAAGCAACG AACTCGTCCA GTTCGCTGCG 16981 CTTGACCGAC AAGCCGCCGT GCCCCTTCGG GTTGGCGCAC ATGTAATTGC CTTCGGCAAG 17041 GTCGCCGTTG CGCTTACGGC CACCCTGGGA CTGACCCATT GAGCCGCCGC AGATTCGGCA 17101 CCCCAGGAAA CGCCACCCGC TCAGAAGCGT CGGTTCGACT TCGGCCCCAG GCTTCCGGTC 17161 GCTGAGATTC TTCCCGGAAC GCTTCTCTTG AAGCTCAAGC CACTTCGAGC CGCTGAGAAT

343

17221 GCCCGTGTGG GGCGTTAGCG GCTTGCCGCC GGGGTCGCGC CGTATGACGT TGATGTGCGC 17281 CTTACCGTGC TTCACACGCT CGAATGCGAA ACCGCCGATT GCGGGATGGT TGAGAATCCA 17341 TCGGACCGTT TGAGCGCGCC ACATGATCGG CTTTTCAGCG CCGTTCAGGC GACGTGCCTT 17401 GACGGACGCA AGACGCTTTT CCGTGGCGCG TCTCTCAGCC ATTCCGGGCG ACGGGATCTT 17461 TTCCTTCTCG AAGGTCGTTG CAATGGCGTT GTCGGACACG CCTTCGAACG ACATTTTCGC 17521 CATGCGCTCA ACTAGCTCGA CGTGATCCGG GTTGTCTTCG TCCGGCTCAA GAACGGAGAT 17581 CACGAGATTA TCGACCTTCT TGCGCACGGC GCGCATTCCG AACGGGGCGG AAGACGAGTG 17641 AACGCCACCC AGCGCGGCAA TCTCGTCTTT CGCACCCTTC AGGCGCTCCG CCTTCAGGTC 17701 ACTGTCCTGT TTCGCAAGGG CAGCGATCAG CGCGAAAATG GCGACGCCGA TAGGGGTAGA 17761 CGTGTCAAGG AACGGCTCAA GAACCGACAT GAAGCGCACG CCGTGCTTCT TCAATTCGTT 17821 GTCGATTTCG AGCGCGTCAT GGGCGCCCTT GCGAGTGAGC CGGGAAAGCT CATTCACAAC 17881 GACAACGTCA CCTTCGCCGG CGCGGACTTC GCCCATCATG CGCTCGAAGT CGGCACGGGT 17941 CACGTTCGGG TCCCAGCCGG AGCGCCCGAC GTCCACGAAC TCACCTGCCA CGACCCAGCG 18001 CGCACCCCCC TGGGCGTTGC GAGACGCGAC CAACGCACGA CCGGCCGCCA GCTGTGCTTC 18061 GGTCGACACG TCTGAGCCGT CGGACCGGCC CTTAGACTGA CGCGCGTACA GGAAGACGCG 18121 AACAGTGTCT AGAAGGTGCT CAGGGACGTC GGGAGCGATG AAGGGCGACA TGCCTAGATC 18181 TTGGCTCATG GGCGCTGACC TGCGCTTTCT TTTGGGAACA TTGGTGGTGC TGAGTAGTTT 18241 CCCATGGATC ACTGTCCAGA GACAACAACC CAGCACCCGA AACGTCTGAA GCCCCGTCCG 18301 GCGCCACCTA GGGATACTCA CCAGTCACAG AAAAGCATCT TACGGATGGC ATGACAGTAA 18361 GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC TGCGGCCAAC TTACTTCTGA 18421 CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA CAACATGGGG GATCATGTAA 18481 CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT ACCAAACGAC GAGCGTGACA 18541 CCACGATGCC TGTAGCAATG GCAACAACGT TGCGCAAACT ATTAACTGGC GAACTACTTA 18601 CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC GGATAAAGTT GCAGGACCAC 18661 TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA TAAATCTGGA GCCGGTGAGC 18721 GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG TAAGCCCTCC CGTATCGTAG 18781 TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG AAATAGACAG ATCGCTGAGA 18841 TAGGTGCCTC ACTGAGGCCT ATGGTTACTC CGTTCTACAG GTTACGACGA CATGTCAATA 18901 CTTGCCCTTG ACAGGCATTG ATGGAATCGT AGTCTCACGC TGATAGTCTG ATCGACAATA 18961 CAAGTGGGAC CGTGGTCCCA GACCGATAAT CAGACCGACA ACACGAGTGG GATCGTGGTC 19021 CCAGACTAAT AATCAGACCG ACGATACGAG TGGGACCGTG GTCCCAGACT AATAATCAGA 19081 CCGACGATAC GAGTGGGACC GTGGTTCCAG ACTAATAATC AGACCGACGA TACGAGTGGG 19141 ACCGTGGTCC CAGACTAATA AT //

344