Genetic fidelity and stability in the hyperthermophilic archaeon Sulfolobus acidocaldarius

A dissertation submitted to the University of Cincinnati in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

by

Dominic Mao

Department of Biological Sciences

McMicken College of Arts and Sciences

2012

Dennis Grogan, Ph.D., Committee chair

Brian Kinkle, Ph.D.

Eddie Merino, Ph.D.

Charlotte Paquin, Ph.D.

Katherine Tepperman, Ph.D.

Abstract

Hyperthermophilic archaea grow optimally at temperatures that accelerate DNA damage, which raises important questions about how these organisms maintain genetic fidelity and genome stability. Archaea in general have reshaped our understanding of the different adaptations of cellular life, and their uniqueness justifies their classification into a separate domain of life. The aim of the thesis research presented in this document was to investigate genetic fidelity and genome stability in hyperthermophilic archaea. The approach involved developing genetic assays based on conventional bacterial and eukaryal model systems as well as novel approaches to probe fundamental mechanisms of genome stability at the molecular level. An important component of genetic fidelity, DNA mismatch repair, was investigated in Sulfolobus acidocaldarius. Sulfolobus acidocaldarius was found to repair mismatches formed during homologous recombination (HR), which provides the first in vivo evidence for mismatch repair in hyperthermophilic archaea. However, the events seen in S. acidocaldarius were highly localized, involving individual mismatches or short patches of mismatches within long tracts of mismatched DNA, and thus differed from those resulting from conventional mismatch repair. This process contributed to the unique properties of HR in S. acidocaldarius compared to known bacterial and eukaryotic counterparts. Evidence for genome stability and genetic fidelity in S. acidocaldarius was obtained by sequencing and comparing whole genomes of three natural isolates from local populations separated by large distances (~8200 km). Only 40 polymorphisms were found across all three strains, which suggests a combination of efficient global dispersal and (novel) genetic mechanisms that limit replication errors and chromosomal rearrangements. DNA exchange via conjugation between S. acidocaldarius cells can contribute to this process. Preliminary studies of the size(s) of DNA fragment(s) transferred and direction of transfer show transfer capability of at least 122 kb and provide insights into the processing of DNA by the recipient cell following transfer.

Acknowledgements

I would like to thank Dennis Grogan for being an excellent mentor. He provides the best learning environment for budding scientists and is extremely generous with his time and advice. His appreciation for science and dedication towards it are contagious and inspiring. In addition to being my Ph.D. advisor he is a close friend who has witnessed or been part of many milestones in my personal and professional life.

I would also like to thank all my committee members for being supportive when my research plans did not work out, for helping me find alternative strategies, and for always giving the best advice with my future career in focus. The biological sciences department has been extremely helpful with funding as well as providing a good support system for graduate students.

I am grateful to Cynthia Sakofsky for her friendship and for being a great colleague. We had many conversations that ranged from the intellectual, intense and deep to the humorous and absurd. Other Groganlab members who became great friends are Jananie Rockwood, Stacey Cranert and Laura Runck.

Arvind Goja and Hardik Kheskani provided the much needed fun times to punctuate every achievement during my graduate school career. Alongside the fun, they were always there to give me sound advice and to listen to my occasional banter.

The women in my family have been a constant source of inspiration; being the youngest in the family, I can safely say that I derived a lot of my own ‘character’ from these women. My mother, Margaret Mao, raised four children on her own under extremely hard circumstances. My eldest sister Sylvia is strong, knows no fear of the unknown and has an adventurous spirit. When I was a young boy, she introduced me to the world of books and set the foundation for my penchant for reading – for that I will be forever grateful. Sophia is persevering, hard working and patient. Angelina was my partner in mischief as kids and as we grew up I witnessed her shine academically and professionally against all odds. She is also a bundle of energy who constantly works to strengthen bonds within the family. Here’s to the Mao family!

Finally I would like to thank my wife, Hanan Dahche, for all her love and support.

Table of contents

Chapter 1 Hyperthermophilic archaea and the question of genetic fidelity

1.1 Introduction to archaea

1.2 Genome stability and DNA repair in cellular organisms

1.3 Hyperthermophilic archaea and the question of genome stability and DNA repair

1.4 Sulfolobus acidocaldarius, a model hyperthermophilic archaeon

1.5 References

Chapter 2 Heteroduplex plasmid mismatch repair assay

2.1 Introduction

2.2 Experimental design

2.3 Materials and Methods

2.3.1 Strains

2.3.2 Plasmid and plasmid modification

2.3.3 Construction of heteroduplex plasmid substrates

2.3.4 Transformation and analysis of transformants

2.3.5 Control for co-transformation

2.4 Results

2.5 Discussion

2.6 Future directions

2.7 References

Chapter 3 Formation and fates of mismatches in vivo during homologous recombination in S. acidocaldarius

3.1 Introduction

3.2 Experimental design

3.3 Materials and methods

3.3.1 Strains and growth conditions

3.3.2 Donor DNAs

3.3.2.1 462bp DNAs with varying number of mismatches

3.3.2.2 Multiply marked full-length ds pyrE

3.3.2.3 Multiply marked full-length ss pyrE

3.3.2.4 Distinctly marked pyrE sense and antisense 185nt oligos

3.3.3 Site-directed mutagenesis

3.3.4 Transformation

3.3.4 Analysis of transformants

3.3.5 Scoring non-selected markers

3.3.5.1 Genotyping by Ligase Chain Reaction

3.4 Results

3.4.1 There is no apparent penalty of increasing mismatches on HR efficiency

3.4.2 Sectored colonies result from transformation using ds pyrE

3.4.3 Sectoring is not due to conjugation

3.4.4 Sectoring is not due to more than one copy of donor DNA recombining multiple times with the chromosome(s)

3.4.5 Single-stranded donor DNA can generate sectored colonies

3.4.6 Both strands of a donor duplex participate in HR

3.5 Discussion

3.6 References

Chapter 4 Genome stability and evolution of S. acidocaldarius

4.1 Introduction

4.2 Materials and methods

4.2.1 Sulfolobus acidocaldarius strains

4.2.1.1 History of the GBA strain

4.2.2 Genomic DNA extraction and sequencing

4.2.3 Fluctuation test for GBA strain

4.3 Results

4.3.1 The genome of S. acidocaldarius is globally conserved

4.3.2 Possible sequencing errors in the DSM639 reference genome sequence

4.3.3 Genomic and mutation spectra data contradict mutator phenotype of GBA

4.4 Discussion

4.5 References

Chapter 5 DNA exchange at high temperatures

5.1 Introduction

5.2 Experimental design

5.3 Materials and methods

5.3.1 Conjugational crosses

5.3.2 Analysis of recombinants

5.4 Results

5.5 Discussion

5.6 References

Chapter 6 Conclusions

6.1 References

Chapter 1. Hyperthermophilic archaea and the question of genetic fidelity

1.1 Introduction to archaea

Archaea are prokaryotic organisms that form the third domain of life. Before the era of sequence analysis, archaea and bacteria were grouped together as ‘prokaryotes’, primarily because of their similar cellular organization: both are unicellular and anucleate and contain circular chromosomes. However, phylogenetic analyses of small-subunit ribosomal DNA (rDNA) revealed archaea to be monophyletic (Woese and Fox, 1977), subsequently leading to the proposal to change the dichotomous tree of life (prokaryotes and eukaryotes) into a trichotomous one in which archaea form a third domain of life (Woese et al., 1990). This proposal was further supported by the discovery of certain unique features that validate grouping these organisms into a separate domain. For example, archaea have phytanyl ethers as opposed to fatty ester lipids in their cell membranes (Woese et al., 1978). The cell envelopes further lack a canonical peptidoglycan layer and possess a proteinaceous S layer (Esko et al., 2009). Archaeal B-family DNA polymerases have the unique property of stalling upon encountering a uracil residue in the DNA template (Connolly et al., 2003). An interesting observation from genome comparisons was that archaeal genomes contain homologs of bacterial as well as eukaryotic genes; proteins involved in metabolic processes resemble bacterial counterparts, whereas proteins involved in information processing (i.e., replication, transcription and translation) resemble eukaryotic counterparts (Makarova and Koonin, 2003). The molecular and genetic processes of archaea cannot be predicted completely by analogy to bacteria or eukarya, but should be delineated experimentally, a need further highlighted by the observation that about half of all archaeal genomes consist of elements that are conserved within the archaeal domain and have no known homologs in bacteria or eukaryotes (Allers and Mevarech, 2005).

The phylogenetic juxtaposition of the archaeal domain between bacteria and eukarya makes the study of archaea important for understanding the evolutionary relationships of all organisms. Archaeal research gains further significance considering their molecular similarities with eukaryotes, which allow these organisms to serve as simpler experimental models that provide insights into more complex eukaryal processes (Kelman and Kelman, 2003).

Archaea have broadened our views of cellular adaptations through the lifestyles and unique metabolism of some members of this domain.

Among archaea, the hyperthermophiles are particularly interesting because these organisms thrive in conditions that generally destabilize biomolecules essential for cellular viability (Stetter, 1999). Furthermore, they have great biotechnological significance considering the intrinsic thermostable nature of their proteins; in fact, highly accurate and processive polymerases, ligases, etc. from hyperthermophilic archaea are popularly used tools in research (Cline et al., 1996).

1.2 Genome stability and DNA repair in cellular organisms

The genomes of cellular organisms are constantly challenged with damage (or DNA lesions) in the form of oxidized bases, abasic sites, cyclobutane pyrimidine dimers, etc. (Lindahl and Wood, 1999). Genomic stability can even be affected by the intrinsic sequence composition; for example, mononucleotide repeats, direct or inverted repeats, etc., can promote replication slippage or homologous recombination (Aguilera and Gomez-Gonzalez, 2008).

It has been shown in several model organisms spanning all domains of life that a number of excision repair pathways exist that can repair a variety of lesions (Lindahl and Wood, 1999). Base excision repair (BER) removes abnormal bases like oxidized variants. Enzymes called glycosylases, each specific for a type of modified base, recognize and remove the damaged base by cleaving the N-glycosidic bond, creating an abasic site (Krokan et al., 1997). The sugar-phosphate backbone is then removed by an endonuclease (Sung and Demple, 2006).

Nucleotide excision repair (NER) removes bulky adducts like cisplatin adducts, cyclobutane pyrimidine dimers (CPDs), etc., by discarding a short single-stranded segment containing the lesion. In E. coli, the UvrABC complex recognizes the DNA adduct and nicks the strand on either side of it (also known as the incision step), followed by removal of the incised oligonucleotide containing the adduct by a helicase (Orren et al., 1992). The genes involved in NER in humans are categorized into 7 groups, xpa-xpg (named after xeroderma pigmentosum, a disease manifestation of NER deficiency), based on complementation studies. They encode proteins that assemble or form the recognition-incision complex and helicases (Garfinkel and Bailis, 2002). Mutations in most of the XP genes cause diseases such as the aforementioned xeroderma pigmentosum and Cockayne syndrome, and symptoms such as sensitivity to sunlight, where exposure can lead to skin cancer (Cleaver et al., 2009).

Figure 1. Fates of DNA lesions. (A) Double-stranded DNA containing lesion (red block) on one strand. (B) Excision repair pathways remove lesion. (C) Lesion-free DNA. (D) If the lesion is a double-strand break, homologous recombination (HR) can repair it provided a homologous DNA partner is available. (E) Replication fork encounters the lesion and stalls. (F) Translesion synthesis (TLS) or HR allows lesion to be bypassed. (G) Replication proceeds downstream of lesion generating daughter DNAs (G) where the lesion persists and (H) which is lesion-free. (G) faces the same fates as (A).

Mismatch repair (MMR) primarily repairs replication errors, such as base mispairs and the insertion/deletion loops that can be caused by replication slippage, by excising a long fragment of the strand containing the error. MMR differs from other repair pathways in that its primary substrates are normal bases that are erroneous only in the context of their complement. In E. coli, MutSL forms the mismatch recognition complex; in eukaryotes, different heterodimers of MutS homologs recognize specific types of mismatches (Kunkel and Erie, 2005). MMR also polices homologous recombination. Mismatched intermediates are formed during recombination between similar but not identical sequences, and MMR can inhibit HR between such sequences. If the recombination is allowed to proceed, MMR decides the outcome by its action on the heteroduplex intermediate. If MMR repairs the mismatches in the intermediate, gene conversion occurs, where all or part of one of the participating DNAs is lost (Iyer et al., 2006). In humans, deficiency in MMR leads to predisposition to hereditary nonpolyposis colorectal cancer (Peltomaki, 2001).

After the removal of the offending adduct, repair synthesis fills the gap in all three pathways (BER, NER and MMR). These repair pathways share in common the prerequisite of an intact complementary strand opposite the strand containing the lesion to serve as a template for repair synthesis and generally result in error-free repair (Lindahl and Wood, 1999).

DNA damage can also be directly reversed by certain proteins. Bacteria, archaea and lower eukaryotes have been documented to possess DNA photolyase, which can harness light energy to cleave the bond between the dimerized bases, a process called photoreactivation (Essen and Klar, 2006).

Lesions that escape repair can halt progression of a replication fork due to the inability of the primary replication polymerase to elongate a nascent strand past the lesion (in other words, the inability to use the lesion as a template) (Cox et al., 2000). Replication fork stalling can also result in disassembly of the replication complex (also known as fork collapse) (Labib and Hodgson, 2007). One strategy employed by cells to overcome such situations is to recruit translesion synthesis (TLS) polymerases to the blocked site in order to synthesize past the lesion (Goodman, 2002; Nohmi, 2006). Structural analyses of TLS polymerases show the active sites to be highly accommodating to a variety of template substrates (Waters et al., 2009). The caveat to recruiting a TLS polymerase is the risk of introducing mutations, as these polymerases have a higher error rate than replicative polymerases. Nevertheless, TLS ensures that replication proceeds, which is required for the immediate survival of the cell. This is considered damage tolerance (lesion bypass) in contrast to damage repair (removal of the lesion). Because the lesion persists after bypass, the action of TLS polymerases also offers another chance for cells to repair the offending lesion. Most TLS polymerases belong to the Y family of polymerases, though Pol ζ, a B-family eukaryotic polymerase, is also capable of lesion bypass (Johansson et al., 2001). Homologues of TLS polymerases have been found in all domains of life (Waters et al., 2009).

Stalled or collapsed forks can also be rescued by homologous recombination in various ways that primarily involve using homologous, lesion-free DNA to synthesize nascent DNA corresponding to the site of the damage. Various models have been elucidated, like template switching, formation of chicken-foot structures as a result of fork regression, strand transfer, etc. (Heller and Marians, 2006; Izhar et al., 2008). Most of the models involving HR require a protein that binds single-stranded DNA to initiate strand transactions; in bacteria RecA performs this function, while the eukaryotic and archaeal homologs are Rad51 and RadA respectively (Lin et al., 2006). Like TLS, these pathways do not remove the lesion, but allow replication to proceed. However, lesion bypass by homologous recombination is generally error-free.

DNAs containing single-stranded breaks produce double-strand breaks (DSBs) during replication (Kuzminov, 1995). DSBs can also result from ionizing radiation. Some normal cellular mechanisms that involve DSBs include mating-type switching in yeast, V(D)J recombination, etc. (Alt et al., 1992; Klar et al., 1991). DSBs pose a serious threat to immediate cell survival, but can be repaired by homologous recombination provided an intact homologous partner is available. DSB repair has been extensively documented both in bacteria and eukaryotes (Aylon and Kupiec, 2004; Jackson, 2002; Motamedi et al., 1999). DSBs can also be repaired by non-homologous end-joining (NHEJ), but this frequently results in loss of DNA sequence at the ends of the DSB (Mladenov and Iliakis, 2011).

1.3 Hyperthermophilic archaea and the question of genome stability and DNA repair

Hyperthermophilic archaea serve as interesting study organisms for DNA repair and genome stability because they require high temperatures for growth, and high temperatures are known to promote DNA damage (Lindahl, 1993; Wang et al., 1982). Hyperthermophilic archaea have optimal growth temperatures of 80°C and above, and some species additionally require highly acidic environments (Stetter, 1999). Thus, these organisms provide unique opportunities to study DNA repair and the diverse ways by which genetic fidelity can be achieved in extreme environments. Moreover, all hyperthermophilic archaeal genomes sequenced show the absence of genes encoding damage recognition proteins involved in MMR (Tables 1 and 2) and NER (Grogan, 2004). Curiously, homologs of genes involved in the downstream reactions of NER are present. However, some of these genes have been documented not to possess functions equivalent to those of their bacterial/eukaryotic counterparts. For example, UVDE, an endonuclease of the prokaryotic NER pathway, does not seem to play a role in repair of UV photoproducts in S. acidocaldarius (Sakofsky et al., 2011). In Thermococcus kodakaraensis, XPB and XPD helicases have been shown to play minimal roles in the repair of UV photoproducts (Fujikane et al., 2010).

Table 1. Archaeal species possessing at least one copy each of mutS and mutL in their genomes. The entries are listed in increasing order of their optimal growth temperatures; mid-points were considered for ranges of growth temperature.

Phylum (species represented/total sequenced): Euryarchaea (31/76)

Species                               Growth temperature (°C)
Halopiger xanaduensis                 21.8
Methanoculleus marisnigri             20-25
Methanococcoides burtonii             23
Methanosphaerula palustris            28-30
Halorubrum lacusprofundi              31
Methanospirillum hungatei             30-35
                                      ~35
Candidatus Methanoregula boonei       35
Methanohalophilus mahii               35
Methanosaeta concilii                 35
Methanosarcina mazei                  35
Halalkalicoccus jeotgali              21-50
Methanocorpusculum labreanum          37
Methanoplanus petrolearius            37
Methanosalsum zhilinae                37
Natronomonas pharaonis                37
Natrialba magadii                     37-40
Methanosarcina barkeri str. Fusaro    37-42
Haloarcula hispanica                  41
sp. NRC-1                             42
Halomicrobium mukohataei              40-45
Haloarcula marismortui                40-50
Haloferax volcanii                    45
Haloquadratum walsbyi                 45
Methanohalobium evestigatum           40-50
Halorhabdus utahensis                 50
Halogeometricum borinquense           49-53
Haloterrigena turkmenica              51
Methanosaeta thermophila              55-60
halophilic archaeon                   ?
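The ordering rule stated in the caption of Table 1 (sort by optimal growth temperature, using the mid-point when a range is given) can be illustrated with a short sketch. This is only an illustration of the rule, not the procedure actually used to prepare the table; the function name `midpoint` and the regular-expression parsing are assumptions introduced here, and the five entries are copied from Table 1.

```python
import re

def midpoint(temp: str) -> float:
    """Mid-point of a growth-temperature entry such as '20-25', '~35' or '21.8'.

    Hypothetical helper: a single value is returned unchanged, a range
    is replaced by the mean of its two end points.
    """
    numbers = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", temp)]
    if not numbers:
        raise ValueError(f"no temperature found in {temp!r}")
    return sum(numbers) / len(numbers)

# A few (species, growth temperature) entries taken from Table 1, shuffled.
entries = [
    ("Methanoculleus marisnigri", "20-25"),
    ("Halalkalicoccus jeotgali", "21-50"),
    ("Halopiger xanaduensis", "21.8"),
    ("Methanosarcina mazei", "35"),
    ("Methanococcoides burtonii", "23"),
]

# Sort by mid-point, as described in the table caption.
ordered = sorted(entries, key=lambda entry: midpoint(entry[1]))

for species, temp in ordered:
    print(f"{species:<30}{temp:>8}{midpoint(temp):>8.1f}")
```

Under this rule Halalkalicoccus jeotgali (21-50, mid-point 35.5) sorts after the species listed at 35, which matches its position in Table 1.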

Table 2. Archaeal species missing both mutS and mutL or only mutL (*) in their genomes. Multiple species of the same genus with the same mutSL status are represented by the genus name followed by the number of species analyzed within brackets. The entries are listed in increasing order of their optimal growth temperatures within each phylum; mid-points were considered for ranges of growth temperature.

Genus/Species                             Growth temperature (°C)

Phylum (species represented/total sequenced): Crenarchaea (39/39)
Metallosphaera [2]                        55-75
Caldivirga maquilingensis                 60-92
Acidianus hospitalis                      65-95
Sulfolobus [12]                           ~80
Thermofilum pendens                       67-93
Acidilobus saccharovorans                 80-85
Desulfurococcus [2]                       85
Thermoproteus [3]                         85
Thermosphaera aggregans                   85
Ignicoccus hospitalis                     73-98
Vulcanisaeta [2]                          85-90
Staphylothermus [2]                       85-92
Pyrobaculum [6]                           75-104
                                          90-95
Ignisphaera aggregans                     92-95
Hyperthermus butylicus                    80-108
*                                         106

Phylum (species represented/total sequenced): Euryarchaea (45/76)
Methanobrevibacter [2]*                   ~30-40
Methanocella paludicola SANAE*            35-37
[8]                                       35-40
Ferroplasma acidarmanus*                  35-42
Halobacterium salinarum*                  49-50
Thermoplasma acidophilum                  55-60
Picrophilus torridus                      60
*                                         60
Methanothermobacter [2]*                  55-65
Methanothermococcus okinawensis           60-65
Methanotorris igneus Kol                  75
[5]                                       65-85
Archaeoglobus fulgidus                    76
Archaeoglobus veneficus*                  ~80
Archaeoglobus profundus*                  82
Ferroglobus placidus*                     85
Thermococcus [6]*                         85
Pyrococcus sp. NA2                        93
kandleri                                  96
*                                         96
Methanothermus fervidus*                  97
Methanosphaera stadtmanae*                98
*                                         98
Pyrococcus yayanosii*                     98
*                                         100
Methanobacterium sp. SWAN-1*              ?
Uncultured methanogenic archaeon* RC-I    ?

Phylum (species represented/total sequenced): Other (3/3)
Candidatus Caldiarchaeum subterraneum     ?
Candidatus Korarchaeum cryptofilum        ?
Nanoarchaeum equitans Kin4-M              73-98

Phylum (species represented/total sequenced): Thaumarchaea (2/2)
Cenarchaeum symbiosum                     ~10
Nitrosopumilus maritimus SCM1             9-29

1.4 Sulfolobus acidocaldarius, a model hyperthermophilic archaeon

For the studies conducted in this dissertation, Sulfolobus acidocaldarius was used as a model organism. Sulfolobus acidocaldarius is a hyperthermophilic archaeon belonging to the phylum Crenarchaeota that grows optimally at ~80 °C and pH ~3.0 (Brock et al., 1972).

Sulfolobus acidocaldarius serves as a good model system because of the ease with which it can be cultured in the laboratory and its ability to be transformed with exogenous DNA. The pyrE gene, which codes for orotate phosphoribosyltransferase (OPRT), serves as an effective selectable gene for molecular genetic experiments. OPRT is involved in the biosynthetic pathway of pyrimidines, and loss-of-function mutations in pyrE produce uracil auxotrophs (i.e., cells that require exogenous uracil to grow). pyrE offers the additional advantage of allowing direct selection for pyrE mutants, since they are resistant to 5-fluoro-orotic acid (FOA). FOA is a pyrimidine analog that is metabolized by Pyr+ S. acidocaldarius cells to yield a genotoxic product, 5-fluoro-uracil monophosphate (Grogan and Gunsalus, 1993).
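The two-way selection offered by pyrE can be summarized as a small truth table. The sketch below is a deliberately simplified model based only on the description above: it assumes a cell is either fully Pyr+ or a complete loss-of-function pyrE mutant, and the function name `grows` and its parameters are hypothetical names introduced for illustration.

```python
def grows(pyrE_functional: bool, uracil: bool, foa: bool) -> bool:
    """Predict growth on a plate under a simplified two-state model.

    pyrE_functional: True for Pyr+ cells, False for pyrE loss-of-function mutants.
    uracil:          True if the medium is supplemented with uracil.
    foa:             True if the medium contains 5-fluoro-orotic acid.
    """
    if pyrE_functional:
        # Pyr+ cells synthesize their own pyrimidines, but they also
        # metabolize FOA into a toxic product, so FOA prevents growth.
        return not foa
    # pyrE mutants cannot metabolize FOA (they are FOA-resistant),
    # but as uracil auxotrophs they need exogenous uracil.
    return uracil

# Print the full truth table for both genotypes.
for genotype, functional in (("Pyr+", True), ("pyrE-", False)):
    for uracil in (False, True):
        for foa in (False, True):
            result = "growth" if grows(functional, uracil, foa) else "no growth"
            print(f"{genotype:6} uracil={uracil!s:5} FOA={foa!s:5} -> {result}")
```

In this model, medium lacking uracil selects for Pyr+ cells (for example, after a pyrE mutant recipient has been transformed with functional pyrE DNA), whereas medium containing both FOA and uracil selects for pyrE mutants.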

The spontaneous mutation rate of S. acidocaldarius is comparable to that of Escherichia coli (Grogan et al., 2001), implying highly accurate DNA replication and/or alternative repair proteins in these organisms that compensate for the missing MMR and NER components. Recently it has been shown that Phr, a DNA photolyase homolog, can repair UV photoproducts in S. acidocaldarius (Sakofsky et al., 2011); this can relieve the organism’s genome of a subset of lesions that are normally repaired by NER in other organisms. However, no pathway has been documented that can repair mismatches introduced during replication or homologous recombination.

The overarching theme of this doctoral dissertation is investigating genetic fidelity in S. acidocaldarius. Towards this goal several independent studies were carried out:

1. Since canonical MMR plays a key role in keeping a check on spontaneous mutations, the mismatch repair capability of S. acidocaldarius was tested. Sulfolobus acidocaldarius cells were presented with mismatches to test if they could be repaired in vivo. Mismatches were delivered in two different contexts; the mismatches were (i) preformed in vitro in plasmids or (ii) formed in vivo by engineering HR substrates that could generate mismatched intermediates. Cells were transformed with these substrates and transformants were analyzed for repair of the mismatches. Since conventional MMR has inhibitory effects on HR between mismatched DNAs, the effects of mismatches on the efficiency of HR in S. acidocaldarius were also tested.

2. Genome sequences of natural S. acidocaldarius isolates from different geographical locations were compared for sequence divergence to observe the evolution of S. acidocaldarius in nature. The results were compared to sequence divergence seen in a sister species, S. islandicus. A laboratory S. acidocaldarius strain that had been subjected to chemical and UV mutagenesis and reported to possess a mutator phenotype was also evaluated for genes that were affected as a result of induced mutations. The effects of the mutagens on the genome of S. acidocaldarius were also investigated by tracing the origins of the induced mutations across the various mutagenic treatments.

3. An ongoing study investigates the phenomenon of exchange of genetic material between S. acidocaldarius cells via conjugation at the molecular level. Two main aspects are being addressed: size(s) of DNA(s) transferred as well as directionality of transfer.

1.5 References

Aguilera, A., and Gomez-Gonzalez, B. (2008). Genome instability: a mechanistic view of its causes and consequences. Nature reviews Genetics 9, 204-217.

Allers, T., and Mevarech, M. (2005). Archaeal genetics - the third way. Nat Rev Genet 6, 58-73.

Alt, F.W., Oltz, E.M., Young, F., Gorman, J., Taccioli, G., and Chen, J. (1992). VDJ recombination. Immunol Today 13, 306-314.

Aylon, Y., and Kupiec, M. (2004). DSB repair: the yeast paradigm. DNA repair 3, 797-815.

Brock, T.D., Brock, K.M., Belly, R.T., and Weiss, R.L. (1972). Sulfolobus: a new genus of sulfur-oxidizing bacteria living at low pH and high temperature. Arch Mikrobiol 84, 54-68.

Cleaver, J.E., Lam, E.T., and Revet, I. (2009). Disorders of nucleotide excision repair: the genetic and molecular basis of heterogeneity. Nat Rev Genet 10, 756-768.

Cline, J., Braman, J.C., and Hogrefe, H.H. (1996). PCR fidelity of pfu DNA polymerase and other thermostable DNA polymerases. Nucleic Acids Res 24, 3546-3551.

Connolly, B.A., Fogg, M.J., Shuttleworth, G., and Wilson, B.T. (2003). Uracil recognition by archaeal family B DNA polymerases. Biochem Soc Trans 31, 699-702.

Cox, M.M., Goodman, M.F., Kreuzer, K.N., Sherratt, D.J., Sandler, S.J., and Marians, K.J. (2000). The importance of repairing stalled replication forks. Nature 404, 37-41.

Esko, J.D., Doering, T.L., and Raetz, C.R.H. (2009). Eubacteria and Archaea.

Essen, L.O., and Klar, T. (2006). Light-driven DNA repair by photolyases. Cell Mol Life Sci 63, 1266-1277.

Fujikane, R., Ishino, S., Ishino, Y., and Forterre, P. (2010). Genetic analysis of DNA repair in the hyperthermophilic archaeon, Thermococcus kodakaraensis. Genes Genet Syst 85, 243-257.

Garfinkel, D.J., and Bailis, A.M. (2002). Nucleotide Excision Repair, Genome Stability, and Human Disease: New Insight from Model Systems. J Biomed Biotechnol 2, 55-60.

Goodman, M.F. (2002). Error-prone repair DNA polymerases in prokaryotes and eukaryotes. Annu Rev Biochem 71, 17-50.

Grogan, D.W. (2004). Stability and repair of DNA in hyperthermophilic Archaea. Curr Issues Mol Biol 6, 137-144.

Grogan, D.W., Carver, G.T., and Drake, J.W. (2001). Genetic fidelity under harsh conditions: analysis of spontaneous mutation in the thermoacidophilic archaeon Sulfolobus acidocaldarius. Proc Natl Acad Sci U S A 98, 7928-7933.

Grogan, D.W., and Gunsalus, R.P. (1993). Sulfolobus acidocaldarius synthesizes UMP via a standard de novo pathway: results of biochemical-genetic study. J Bacteriol 175, 1500-1507.

Heller, R.C., and Marians, K.J. (2006). Replisome assembly and the direct restart of stalled replication forks. Nature reviews Molecular cell biology 7, 932-943.

Iyer, R.R., Pluciennik, A., Burdett, V., and Modrich, P.L. (2006). DNA mismatch repair: functions and mechanisms. Chem Rev 106, 302-323.

Izhar, L., Goldsmith, M., Dahan, R., Geacintov, N., Lloyd, R.G., and Livneh, Z. (2008). Analysis of strand transfer and template switching mechanisms of DNA gap repair by homologous recombination in Escherichia coli: predominance of strand transfer. Journal of molecular biology 381, 803-809.

Jackson, S.P. (2002). Sensing and repairing DNA double-strand breaks. Carcinogenesis 23, 687-696.

Johansson, E., Majka, J., and Burgers, P.M. (2001). Structure of DNA polymerase delta from Saccharomyces cerevisiae. The Journal of biological chemistry 276, 43824-43828.

Kelman, L.M., and Kelman, Z. (2003). Archaea: an archetype for replication initiation studies? Mol Microbiol 48, 605-615.

Klar, A.J., Bonaduce, M.J., and Cafferkey, R. (1991). The mechanism of fission yeast mating type interconversion: seal/replicate/cleave model of replication across the double-stranded break site at mat1. Genetics 127, 489-496.

Krokan, H.E., Standal, R., and Slupphaug, G. (1997). DNA glycosylases in the base excision repair of DNA. Biochem J 325 ( Pt 1), 1-16.

Kunkel, T.A., and Erie, D.A. (2005). DNA mismatch repair. Annu Rev Biochem 74, 681-710.

Kuzminov, A. (1995). Collapse and repair of replication forks in Escherichia coli. Molecular microbiology 16, 373-384.

Labib, K., and Hodgson, B. (2007). Replication fork barriers: pausing for a break or stalling for time? EMBO Rep 8, 346-353.

Lin, Z., Kong, H., Nei, M., and Ma, H. (2006). Origins and evolution of the recA/RAD51 gene family: evidence for ancient gene duplication and endosymbiotic gene transfer. Proceedings of the National Academy of Sciences of the United States of America 103, 10328-10333.

Lindahl, T. (1993). Instability and decay of the primary structure of DNA. Nature 362, 709-715.

Lindahl, T., and Wood, R.D. (1999). Quality control by DNA repair. Science 286, 1897-1905.

Makarova, K.S., and Koonin, E.V. (2003). Comparative genomics of archaea: how much have we learned in six years, and what's next? Genome Biol 4.

Mladenov, E., and Iliakis, G. (2011). Induction and repair of DNA double strand breaks: the increasing spectrum of non-homologous end joining pathways. Mutation research 711, 61-72.

Motamedi, M.R., Szigety, S.K., and Rosenberg, S.M. (1999). Double-strand-break repair recombination in Escherichia coli: physical evidence for a DNA replication mechanism in vivo. Genes Dev 13, 2889-2903.

Nohmi, T. (2006). Environmental stress and lesion-bypass DNA polymerases. Annu Rev Microbiol 60, 231-253.

Orren, D.K., Selby, C.P., Hearst, J.E., and Sancar, A. (1992). Post-incision steps of nucleotide excision repair in Escherichia coli. Disassembly of the UvrBC-DNA complex by helicase II and DNA polymerase I. J Biol Chem 267, 780-788.

Peltomaki, P. (2001). Deficient DNA mismatch repair: a common etiologic factor for colon cancer. Hum Mol Genet 10, 735-740.

Sakofsky, C.J., Runck, L.A., and Grogan, D.W. (2011). Sulfolobus Mutants, Generated via PCR Products, Which Lack Putative Enzymes of UV Photoproduct Repair. Archaea.

Stetter, K.O. (1999). Extremophiles and their adaptation to hot environments. Febs Lett 452, 22-25.

Sung, J.S., and Demple, B. (2006). Roles of base excision repair subpathways in correcting oxidized abasic sites in DNA. FEBS J 273, 1620-1629.

Wang, R.Y., Kuo, K.C., Gehrke, C.W., Huang, L.H., and Ehrlich, M. (1982). Heat- and alkali- induced deamination of 5-methylcytosine and cytosine residues in DNA. Biochim Biophys Acta 697, 371-377.

Waters, L.S., Minesinger, B.K., Wiltrout, M.E., D'Souza, S., Woodruff, R.V., and Walker, G.C. (2009). Eukaryotic Translesion Polymerases and Their Roles and Regulation in DNA Damage Tolerance. Microbiol Mol Biol R 73, 134-+.

Woese, C.R., and Fox, G.E. (1977). Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci U S A 74, 5088-5090.

Woese, C.R., Kandler, O., and Wheelis, M.L. (1990). Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A 87, 4576-4579.

Woese, C.R., Magrum, L.J., and Fox, G.E. (1978). Archaebacteria. J Mol Evol 11, 245-252.

Chapter 2 Heteroduplex plasmid mismatch repair assay

2.1 Introduction

Post-replicative, long-patch DNA mismatch repair (MMR) is a DNA repair pathway that contributes to maintaining genetic fidelity by repairing errors introduced during replication (Li, 2008). MMR defects in cells result in a high rate of mutation (50-1000 fold increase); they are also associated with hereditary nonpolyposis colorectal cancer in humans (Iyer et al., 2006).

The MMR system repairs mismatches (base mispairs from misincorporation and insertion/deletion loops (IDLs) from polymerase slippage) that escape proofreading by DNA polymerase during replication (Li, 2008). Mismatches have to be repaired before the next round of replication, since strands containing the erroneous bases will otherwise serve as templates for polymerization, fixing the errors as mutations.

In E. coli, MutS and MutL proteins form the mismatch recognition complex and upon binding, recruit other proteins to excise a part of the strand containing the misincorporated base(s) (Fig. 1A and 1B); DNA polymerase then resynthesizes the excised strand resulting in replacement of the mispaired base with the correct nucleotide (Fig. 1C) (Iyer et al., 2006). Identification of the “erroneous” strand is crucial for repair and since these errors are introduced during replication, bias is against the nascent strand. In eukaryotes, homologues of MutS and MutL function similarly, although there are different heterodimers of MutS homologues that recognize specific types of mismatches (Kunkel and Erie, 2005; Larrea et al., 2010). Generally, nicks or breaks in one strand serve as a signal for strand discrimination resulting in the removal of the discontinuous strand (Jiricny, 2006). E. coli can recognize the erroneous strand even in the absence of nicks by hemimethylation of the adenine residue in 5’-GATC-3’ sequences caused by the lag in the methylation of the newly synthesized strand; a specialized endonuclease MutH nicks the non-methylated strand and provides the substrate for excision initiation.

Figure 1. Mechanistic highlights of MMR. (A) Mismatch recognition: mismatched DNA with a nick on the bottom strand. MutSL recognizes the mismatch and recruits proteins to catalyze exonucleolytic removal of the error-containing strand (the strand with the discrimination signal) past the mismatch (B). (C) Repair synthesis fills in the gap, followed by ligation.

All hyperthermophilic archaeal genomes sequenced to date lack homologues of the key genes (mutS and mutL) involved in the MMR pathway (Grogan, 2004). Interestingly, the spontaneous mutation rate of at least one species, S. acidocaldarius (1.8 x 10⁻³ mutations per genome per replication), is comparable to that of E. coli (2.5 x 10⁻³ mutations per genome per replication) (Grogan et al., 2001), which has a functional MMR system. No other cellular organism has been shown to lack mutS/mutL homologues and still maintain a low mutation rate, as is seen in S. acidocaldarius. A fundamental question is therefore how hyperthermophilic archaea avoid spontaneous mutations when their genomes lack homologues of mutS and mutL, the genes that encode the proteins forming the mismatch recognition complex in a conventional MMR pathway. This study probes the capability of S. acidocaldarius to repair preformed mismatches.

2.2 Experimental design

A direct approach to test for the ability to repair mismatches is to transform the organism with DNA containing preformed mismatches and follow the fates of the mismatches in the transformants for repair (Bishop and Kolodner, 1986).

A critical requirement for the construction of mismatches using double stranded (ds) circular DNA is nicking endonucleases, which are enzymes that cleave only one strand of a dsDNA substrate (Wang and Hays, 2006). Plasmids with mismatches can be constructed by nicking a plasmid at two closely located sites on the same strand (Fig. 2A), heating to the melting temperature (Tm) of the nicked fragment for release of the short single-strand (ss) to generate a gap in the plasmid (Fig. 2B) and annealing a synthetic ss oligo that generates a single mismatch to the ss “backbone” strand on the plasmid (Fig. 2C), followed by ligation to obtain a covalently-closed circular heteroduplex.

Figure 2. Construction of heteroduplex plasmid. (A) Dual nicking step (nicking digestion); “↓” indicates positions of nicks. (B) Heat release step; the released strand leaves a ss gap opposite the backbone strand. (C) Annealing step; the synthetic oligo (incoming strand) annealed to the gap forms the desired mismatch, which is followed by ligation to seal the nicks and yield the heteroduplex plasmid.
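As a rough illustration of the three construction steps (dual nicking, heat release, annealing of a synthetic oligo), the following Python sketch builds a heteroduplex in silico and reports where the incoming strand mismatches the backbone. The sequence, nick coordinates, and function names are hypothetical and chosen only for illustration; they are not the plasmid sequence or the oligos used in this work.

```python
# Minimal in-silico sketch of heteroduplex construction (hypothetical sequences).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the base-by-base complement (same left-to-right order)."""
    return "".join(COMPLEMENT[b] for b in seq)


def build_heteroduplex(top, nick1, nick2, incoming):
    """Replace top[nick1:nick2] (the strand released by heating) with `incoming`.

    Returns the new top strand and the untouched backbone (bottom) strand,
    both written left to right so that position i pairs with position i.
    """
    if len(incoming) != nick2 - nick1:
        raise ValueError("incoming oligo must exactly fill the gap")
    backbone = complement(top)                       # bottom strand stays intact
    new_top = top[:nick1] + incoming + top[nick2:]   # anneal oligo, then ligate
    return new_top, backbone


def find_mismatches(top, backbone):
    """List (position, top base, backbone base) for every non-Watson-Crick pair."""
    return [(i, t, b) for i, (t, b) in enumerate(zip(top, backbone))
            if COMPLEMENT[t] != b]


# Hypothetical 20-nt region with nicks bounding positions 5-14 of the top strand.
top = "GATTACAGGCCTTCGCGATC"
released = top[5:15]                           # strand lost on heating
incoming = released[:2] + "T" + released[3:]   # single G -> T change in the oligo
new_top, backbone = build_heteroduplex(top, 5, 15, incoming)
print(find_mismatches(new_top, backbone))
```

In this toy case the incoming T sits opposite a C on the backbone, giving a single T/C mismatch; swapping which strand is nicked and replaced would place the mismatch in the opposite orientation.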

Strand discrimination signals for a conventional MMR system can be tested by introducing nicks in the vicinity of the mismatch using a different nicking enzyme. Thus, the substrate can be designed to detect directional repair (detection and removal of erroneous strand via strand discrimination signals) of preformed mismatches in vivo. It is important to investigate if repair is directional because repair without distinguishing a correct base from an incorrect one can result in replication errors being fixed; such a repair system, even though capable of removing mismatches, will not be effective in reducing replication errors.

2.3 Materials and Methods

2.3.1 Strains

Strain MR31, which has an 18bp deletion (positions 154-171 of the coding region) of pyrE, was used for all transformations in S. acidocaldarius. Escherichia coli strain DH5α was used in transformations for plasmid construction. E. coli strain ER2566 carrying a plasmid expressing a functional equivalent of the SuaI RM system, EsaBC41 (a generous gift from Rick Morgan, New England Biolabs) (Kurosawa and Grogan, 2005), was used for transformations when plasmids needed to be methylated. An E. coli strain, E. cloni®5-alpha, was used for the MMR assay due to its inability to restrict methylated C residues, which allows the methylated plasmid to propagate in E. coli.

Figure 3. Map of Sulfolobus-E. coli shuttle vector pC. “Universal backbone” and the site of insertion (NsiI site) is shown. Map labels: BbvCI, StuI, NruI, BbvCI (universal backbone); NsiI (1009); orf904; pyrE; pyrF; bla.

2.3.2 Plasmid and plasmid modification

A shuttle vector for Sulfolobus and E. coli, pC, was used for generating plasmids with mismatches. pC contains the pyrE and the bla selective markers that allow for the selection of uracil prototrophs (cells not requiring uracil in medium to grow) in Sulfolobus and ampicillin resistance in E. coli respectively (Fig. 3) (Berkner et al., 2007).

In order to modify pC to enable introduction of all possible mismatches, a short DNA fragment was inserted into pC. The fragment contained StuI and NruI sites flanked by BbvCI sites; none of these sites are present in native pC (Fig. 4). The motivation for engineering BbvCI sites is that the enzyme is commercially available in two forms, Nt. and Nb., which cut the top and bottom strands within the recognition sequence respectively (Fig. 5). Digesting with Nt. BbvCI or Nb. BbvCI will result in dual nicking of the top or bottom strand respectively, which enables the top or bottom strand to be removed. Thus, by selectively replacing the top or bottom strand with synthetic oligos, all possible mismatches can be generated (Fig. 5).

Figure 4. Map of pDM8 showing positions of restriction sites used in the MMR assay. Naturally occurring sites for BsmI, BtsCI and BspQI (used for strand discrimination signal) are shown along with the coordinates. The engineered BbvCI, StuI and NruI sites are highlighted in green boxes. Sites shown on the map: BbvCI (1011), StuI (1021), NruI (1027), BbvCI (1034), BspQI (1257), BspQI (1347), BsmI (1920), BsmI (2325), BtsCI (3946), BtsCI (4233), BtsCI (4414), BspQI (6336), BsmI (8169), BsmI (8265), BspQI (8646); genes: orf904, pyrE, pyrF, bla.

About 40-80 ng of pC was digested using NsiI in a total reaction volume of 20 µL containing ~10 units of enzyme and 1X NEB1 buffer, as specified by the manufacturer (New England Biolabs), at 37°C for an hour. The reaction mixture was supplemented with 1X Antarctic phosphatase buffer and ~5 units of Antarctic phosphatase and further incubated at 37°C for 30 minutes to dephosphorylate the 5’-ends of the linearized plasmid. DNA was purified from the reaction mixture using Sureclean from Bioline. Purified linearized and dephosphorylated pC, 40 pmol of MMRunivtop/MMRunivbot duplex (obtained by mixing equimolar amounts of the ss oligos) (Table 1), 1X T4 DNA ligase buffer and ~10 units of T4 DNA ligase in a total reaction volume of 20 µL were incubated at 16°C overnight. The ss oligos MMRunivtop and MMRunivbot, when annealed to each other, form a duplex with 3’-overhangs compatible with the NsiI cut site. The ligation mixture was used to transform chemically competent E. coli and transformants were selected using yeast extract-tryptone (YT) rich solid medium containing 500 µg/ml of ampicillin.

Transformants were incubated in 3 ml liquid YT-ampicillin at 37°C overnight. Aliquots of 1 ml of the liquid cultures were cryopreserved at -80°C with dimethyl sulfoxide (DMSO). The remaining 2 ml of the cultures were used for plasmid extraction using the UltraClean® Standard Mini Plasmid Prep Kit from Mo Bio Laboratories.

Plasmids containing the ds insert were identified by digesting with StuI; sensitivity to StuI indicated successful insertion of the fragment into pC. Since native pC does not have a StuI site, presence of the ds fragment insert would result in linearization of the plasmid upon treatment with StuI. StuI-sensitive samples were verified to contain the insert by Sanger sequencing performed at the DNA core facility at Cincinnati Children’s Hospital Medical Center, using primer pDMMMRA1f.

Figure 5. “Universal backbone” inserted into pC. Nicking sites for Nt. BbvCI and Nb. BbvCI are indicated by “↓” and “↑” respectively. Recognition sites are underlined and mismatches are shown in bold. (A)-(J) are synthetic oligonucleotides which serve as “incoming” strands to anneal to the gap on the plasmid.

2.3.3 Construction of heteroduplex plasmid substrates

About 500 ng of pDM8 was digested with 10 units (~10 times excess) of Nb. BbvCI (for T/C heteroduplex plasmid) or Nt. BbvCI (for G/G heteroduplex plasmid) and 1X NEB buffer 4 in a total reaction volume of 40 µL at 37 °C for 1 hour. Excess enzyme was added to maximize dual nicking at the two BbvCI sites. A volume of 10 µL of a 10 µM solution of the synthetic ss oligo listed in Fig. 5A (for T/C heteroduplex plasmid) or Fig. 5G (for G/G heteroduplex plasmid) was added and the mixture was heated to 75 °C for 2 min to release the short, nicked strand, followed by cooling at room temperature for 30 min. Being in >10x molar excess, the synthetic oligos outcompete the released native strand for annealing to the gap on the plasmid. About 10 units of StuI (for T/C heteroduplex plasmid) or NruI (for G/G heteroduplex plasmid) was added and the reaction mixture was incubated at 37 °C for 1 hour to linearize any homoduplex plasmids. An excess of T4 DNA ligase was added along with T4 DNA ligase buffer to a final concentration of 1X and the mixture was incubated at room temperature for an hour. DNA was precipitated using Sureclean and the purified DNA pellet was resuspended in 20 µL of distilled water. Strand discrimination signals (i.e., nicks on one strand close to a mismatch) were introduced by nicking with 10 units of Nt. BspQI in a reaction containing 1X NEB buffer 3 and incubating at 50 °C for an hour. DNA was purified using Sureclean prior to transformations.
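The molar excess of oligo over plasmid in the annealing step can be checked with a short calculation. The sketch below assumes an average mass of 650 g/mol per base pair of dsDNA and a plasmid size of about 8.7 kb; the size is an assumption read off the largest coordinate on the pDM8 map (Fig. 4), since the exact size is not stated in the text.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA (standard approximation)


def plasmid_pmol(mass_ng, size_bp):
    """Convert a mass of double-stranded plasmid (ng) to picomoles."""
    grams = mass_ng * 1e-9
    return grams / (size_bp * AVG_BP_MASS) * 1e12


def oligo_pmol(volume_ul, conc_um):
    """Picomoles of oligo in a given volume (µL) at a given concentration (µM)."""
    return volume_ul * conc_um  # 1 µL of a 1 µM solution holds 1 pmol


plasmid = plasmid_pmol(500, 8700)  # ~500 ng pDM8; 8.7 kb is an assumed size
oligo = oligo_pmol(10, 10)         # 10 µL of a 10 µM oligo solution
print(f"plasmid: {plasmid:.3f} pmol, oligo: {oligo:.0f} pmol, "
      f"excess: {oligo / plasmid:.0f}x")
```

Under these assumptions the oligo is present at roughly a thousand-fold molar excess over the plasmid, which is comfortably above the >10x threshold cited for outcompeting the released strand.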

For all heteroduplex pDM8 constructions, methylated pDM8 was used since S. acidocaldarius has a robust restriction-modification system called SuaI that restricts unmethylated 5’-GGCC-3’ sites. pDM8 was methylated by propagating it in E. coli strain ER2566 carrying plasmid EsaBC41 (see section 2.3.1) (Kurosawa and Grogan, 2005). DNA recovered from the methylating strain of E. coli was checked for methylation by restriction digestion with HaeIII. HaeIII shares the same recognition sequence as SuaI, and cleavage is blocked by methylation at the cognate site (Grogan, 2003).

2.3.4 Transformation and analysis of transformants

Sulfolobus acidocaldarius pyrE mutants were transformed with mismatch-containing plasmids via electroporation (carried out in a 1 mm cuvette with a single pulse of 1.25 kV at 2 µF and 1 kΩ) and plated on xylose-tryptone (XT) medium without uracil; plates were incubated at 75°C for 5-6 days. Transformant colonies were inoculated into 2 ml liquid XT and incubated at 70°C. Genomic DNA was extracted and the universal backbone locus was PCR amplified using primers pDMMMRA1f and pDMMMRA1r with thermocycler program DG48, which consists of an initial denaturation step at 95°C for 2 min, followed by 30 cycles of denaturation at 95°C for 22 s, annealing at 48°C for 22 s and extension at 72°C for 1 min 33 s; a final extension step was carried out at 72°C for 3 min.
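For reference, the DG48 program described above can be written down as a small data structure, which also makes it easy to total the programmed hold times. The dictionary layout and names below are illustrative assumptions, not a thermocycler file format, and ramp times between steps are ignored.

```python
# Thermocycler program DG48 as described in the text (times in seconds).
DG48 = {
    "initial_denaturation": (95, 120),
    "cycles": 30,
    "cycle_steps": [
        ("denaturation", 95, 22),
        ("annealing", 48, 22),
        ("extension", 72, 93),  # 1 min 33 s
    ],
    "final_extension": (72, 180),
}


def total_hold_seconds(program):
    """Sum programmed hold times; ramping between temperatures is not included."""
    per_cycle = sum(seconds for _, _, seconds in program["cycle_steps"])
    return (program["initial_denaturation"][1]
            + program["cycles"] * per_cycle
            + program["final_extension"][1])


total = total_hold_seconds(DG48)
print(f"{total} s (~{total / 60:.1f} min) of programmed hold time")
```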

To serve as a comparative control, the assay was conducted in MMR-proficient E. coli. E. cloni®5-alpha strain was transformed with heteroduplex (HD) plasmids via chemical transformation, spread on YT-ampicillin plates and incubated overnight at 37°C. Transformant colonies were inoculated into 2 ml liquid YT and incubated at 37°C overnight. Plasmids were extracted using the UltraClean® Standard Mini Plasmid Prep Kit from Mo Bio Laboratories.

In order to score for repair of T/C mismatches, plasmid DNA recovered or amplified from transformants was digested with BsmI and StuI. Partial sensitivity of the DNAs to both enzymes was evidence for no repair (Fig. 6A, lanes 2, 3 and Fig. 6B, lanes 1, 2). Complete sensitivity to only BsmI or StuI indicated removal of the strand containing C or T respectively (Fig. 6A, lanes 5, 6 and Fig. 6B, lanes 3, 4). Repair of G/G mismatches was scored by digesting with NruI and BtsCI.

Figure 6. Agarose gel images of results from screening (A) plasmids extracted from E. coli transformants showing no repair (lanes 1, 2, 3; incomplete digestion by single cutter StuI in lane 2) and repair (lanes 4, 5, 6; complete digestion by StuI in lane 5) of mismatches. (B) Amplicons obtained from S. acidocaldarius transformants showing no repair (lanes 1, 2; both lanes show partial digestion) and repair (lanes 3, 4; lane 3 shows complete digestion while lane 4 shows complete resistance to digestion) of mismatches. Gel labels: (A) lanes 1, 4: undigested plasmid; lanes 2, 5: StuI digested; lanes 3, 6: BsmI digested; bands: linearized, supercoiled. (B) lanes 1, 3: BsmI digested; lanes 2, 4: StuI digested; bands: undigested amplicon, digested amplicon fragments.

2.3.5 Control for co-transformation

A prerequisite for the assay is that only a single DNA molecule should transform a given cell. In order to ensure that only one plasmid molecule enters and is established in a transformant cell, two versions of pC containing HpaI and PmeI sites respectively at the same location of the plasmid were mixed in equal amounts and transformed into S. acidocaldarius and E. coli cells. No transformant colonies containing both versions of pC were obtained, indicating that under the assay conditions only one DNA molecule is transformed into S. acidocaldarius and E. coli cells.
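The digest-based scoring rule for T/C mismatches described in section 2.3.4 can be summarized as a small decision function. This is a sketch only: the three-level sensitivity labels and the function name are assumptions introduced for illustration, and any digestion pattern not described in the text is returned as unscored.

```python
def score_tc_transformant(bsmi, stui):
    """Classify a transformant from BsmI and StuI digestion patterns.

    Each argument is one of "complete", "partial", or "resistant",
    describing how far the recovered DNA was cut by that enzyme.
    """
    allowed = {"complete", "partial", "resistant"}
    if bsmi not in allowed or stui not in allowed:
        raise ValueError("sensitivity must be complete, partial, or resistant")
    if bsmi == "partial" and stui == "partial":
        return "no repair (both parental strands retained)"
    if bsmi == "complete" and stui == "resistant":
        return "repair: strand containing C removed"
    if stui == "complete" and bsmi == "resistant":
        return "repair: strand containing T removed"
    return "unscored (pattern not described in the text)"


print(score_tc_transformant("partial", "partial"))
print(score_tc_transformant("complete", "resistant"))
```

Returning an explicit "unscored" value for unexpected patterns (for example, complete digestion by both enzymes) keeps ambiguous gels from being silently counted as repair or no repair.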

2.4 Results

In MMR-proficient E. coli, pDM8 containing a T/C mismatch without strand discrimination signals yielded 70% of transformants containing plasmids representing both parental strands of the heteroduplex plasmid (no repair), while in 30% of the transformants the backbone strand was retained (removal of incoming strand). However, when a nick was introduced in the backbone strand 268 nt from the mismatch, MMR-proficient E. coli yielded 80% of transformants that retained both parental strands of the heteroduplex plasmid and 10% each in which either the incoming or the backbone strand was discarded. In previous studies where similar assays were carried out in E. coli, about half of the transformants retained both parental strands of a heteroduplex, whereas the other half showed non-directional repair (about equal numbers of transformants showing repair favoring either strand) when strand discrimination signals were absent (Parker and Marinus, 1992). In the presence of strand discrimination signals, more than 90% of transformants were reported to retain only the strand containing no strand discrimination signal (Parker and Marinus, 1992).

When transformed with pDM8 carrying a T/C mismatch without strand discrimination signals, 79.2% of S. acidocaldarius transformants retained both parental strands of the heteroduplex plasmid and 20.8% discarded the incoming strand. Strand discrimination signals on the backbone strand yielded 45.8% of transformants that did not resolve the mismatch, 33.3% that discarded the backbone strand and 20.8% that discarded the incoming strand.
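The percentages reported for S. acidocaldarius can be sanity-checked by asking what the smallest number of scored transformants is that would reproduce them after rounding to one decimal place. The sketch below is an inference from the reported percentages only; the text does not state the sample sizes, and the function name and tolerance are illustrative assumptions.

```python
def smallest_consistent_n(percentages, max_n=1000, tol=0.05):
    """Return (n, counts) for the smallest n whose integer counts reproduce
    every percentage to within `tol` and sum to n; None if nothing fits."""
    for n in range(1, max_n + 1):
        counts = [round(p * n / 100) for p in percentages]
        fits = all(abs(100 * k / n - p) <= tol + 1e-9
                   for k, p in zip(counts, percentages))
        if fits and sum(counts) == n:
            return n, counts
    return None


print(smallest_consistent_n([79.2, 20.8]))        # no strand discrimination signal
print(smallest_consistent_n([45.8, 33.3, 20.8]))  # nick on the backbone strand
```

Both sets of percentages are reproduced by as few as 24 scored transformants. This is only a lower bound (multiples of 24 fit equally well), but it indicates that a shift of a few colonies could change the reported percentages noticeably.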

G/G mismatches were also tested in MMR-proficient E. coli. Dual nicking for replacement of the top strand was carried out by digesting with Nt. BbvCI, and the strand discrimination signal was introduced by nicking with Nt. BspQI. The expected result in case of repair would be a majority of transformants retaining only the backbone strand, since the strand containing the incoming oligo carries the nick introduced by Nt. BspQI digestion. In the absence of strand discrimination signals, 60% of transformants retained both parental strands of the heteroduplex plasmid while 40% discarded the incoming strand. In the presence of strand discrimination signals, 70% retained both parental strands while 30% discarded the incoming strand. Since most of the E. coli transformants retained both parental strands of the heteroduplex plasmids with or without strand discrimination signals, these results are indicative of no repair.

2.5 Discussion

Several factors could explain why MMR-proficient E. coli did not efficiently repair the mismatches in pDM8. The strand discrimination signal could have been ligated in vivo before MMR could act on the mismatch. In fact, about 50% of mismatch-containing plasmid substrates with nicks get ligated in MMR assays in vitro (Peter Hoffman, personal communication). Replication of the plasmid could also have outcompeted the E. coli MMR system. The plasmid itself could be a weak substrate for the MMR system.

The results from S. acidocaldarius suggest there could be a repair system akin to conventional MMR, given the increase in transformants where the strand containing the nick was removed compared to heteroduplex plasmids that were not nicked. However, the increase in repair cases upon introducing a nick in one of the strands could also result from partial denaturation of the nicked strand and removal of the ss flaps by structure-specific endonucleases. On the other hand, even though mismatches in pDM8 proved to be weak substrates for the E. coli MMR system, this might not be the case in S. acidocaldarius. Without the assay performing as expected in E. coli, however, there is no comparative control for the S. acidocaldarius results.

The results from E. coli and S. acidocaldarius confirm the efficiency with which mismatches can be introduced into plasmids using nicking enzymes; since the majority of transformants retained both strands, heteroduplex construction in vitro was efficient. This suggests that failure of the assay to perform as expected in E. coli was not due to unwanted forms of the plasmid (like gapped forms or native homoduplexes) present in the heteroduplex preparation.

2.6 Future directions

pDM8 can be used to develop assays to investigate translesion synthesis (TLS) in S. acidocaldarius. TLS polymerases are specialized polymerases that allow replication to proceed past DNA lesions in the template; most TLS polymerases belong to the Y family of polymerases (Goodman, 2002). They replace replicative polymerases at blocked replication sites, bypass the obstructing lesions and hand downstream polymerization back to the main replicative polymerases (Friedberg et al., 2005; Nohmi, 2006).

The genome of S. acidocaldarius encodes at least four polymerases, one of which, named dbh (DinB Homolog), is homologous to dinB, a Y family TLS polymerase gene in E. coli. In E. coli, dinB encodes the pol IV TLS polymerase, which bypasses a variety of adducts including derivatives of guanine residues (Yuan et al., 2008). Our lab has successfully disrupted the dbh gene in S. acidocaldarius and mutation spectrum analyses show that Dbh might be involved in accurately bypassing modified guanine residues (Sakofsky et al., 2012).

pDM8 can be used to deliver specific lesions into Dbh+ and Dbh- S. acidocaldarius strains to investigate lesion bypass properties of Dbh. The steps to introduce a synthetic single stranded oligo containing desired lesions are the same as described for constructing plasmids with mismatches (Fig. 7A-C). After introducing the lesion-containing strand, a second dual nicking step followed by heat release (Fig. 7D) will generate a plasmid with a lesion in a single-stranded gap (Fig. 7E). The plasmid construct will only be able to replicate if the gap is filled in vivo. Therefore, transformants obtained by electroporating the gapped plasmid construct will represent cases where the gap was filled successfully. Transforming Dbh+ and Dbh- cells with the gapped plasmid construct and analyzing the transformants for the base that is inserted opposite the lesion will reveal the role of Dbh, if any, in bypassing the respective lesions. The identity of the base(s) inserted opposite a bypassed lesion will also show whether Dbh bypasses a given lesion with the correct complementary base(s).

Figure 7. (A) First dual nicking step (indicated by blue dashed arrows; here, top strand is represented by the inner circle). (B) First heat release of native plasmid strand. (C) Annealing of the “incoming” strand, a synthetic ss oligo with defined lesion (red block), to the gap, followed by ligation. (D) Second dual nicking step (indicated by dashed green arrows). (E) Second heat release yields the desired substrate (gapped plasmid with defined lesion).

2.7 References

Berkner, S., Grogan, D., Albers, S.V., and Lipps, G. (2007). Small multicopy, non-integrative shuttle vectors based on the plasmid pRN1 for Sulfolobus acidocaldarius and Sulfolobus solfataricus, model organisms of the (cren-)archaea. Nucleic Acids Res 35, e88.

Bishop, D.K., and Kolodner, R.D. (1986). Repair of Heteroduplex Plasmid DNA after Transformation into Saccharomyces-Cerevisiae. Molecular and Cellular Biology 6, 3401-3409.

Friedberg, E.C., Lehmann, A.R., and Fuchs, R.P.P. (2005). Trading places: How do DNA polymerases switch during translesion DNA synthesis? (vol 18, pg 499, 2005). Mol Cell 19, 143-143.

Goodman, M.F. (2002). Error-prone repair DNA polymerases in prokaryotes and eukaryotes. Annu Rev Biochem 71, 17-50.

Grogan, D.W. (2004). Stability and repair of DNA in hyperthermophilic Archaea. Curr Issues Mol Biol 6, 137-144.

Grogan, D.W., Carver, G.T., and Drake, J.W. (2001). Genetic fidelity under harsh conditions: analysis of spontaneous mutation in the thermoacidophilic archaeon Sulfolobus acidocaldarius. Proc Natl Acad Sci U S A 98, 7928-7933.

Iyer, R.R., Pluciennik, A., Burdett, V., and Modrich, P.L. (2006). DNA mismatch repair: functions and mechanisms. Chem Rev 106, 302-323.

Jiricny, J. (2006). The multifaceted mismatch-repair system. Nat Rev Mol Cell Biol 7, 335-346.

Johansson, E., Majka, J., and Burgers, P.M.J. (2001). Structure of DNA polymerase delta from Saccharomyces cerevisiae. Journal of Biological Chemistry 276, 43824-43828.

Johnson, R.E., Prakash, S., and Prakash, L. (1999). Efficient bypass of a thymine-thymine dimer by yeast DNA polymerase, Pol eta. Science 283, 1001-1004.

Kunkel, T.A., and Erie, D.A. (2005). DNA mismatch repair. Annu Rev Biochem 74, 681-710.

Kurosawa, N., and Grogan, D.W. (2005). Homologous recombination of exogenous DNA with the Sulfolobus acidocaldarius genome: properties and uses. FEMS Microbiol Lett 253, 141-149.

Larrea, A.A., Lujan, S.A., and Kunkel, T.A. (2010). SnapShot: DNA mismatch repair. Cell 141, 730 e731.

Li, G.M. (2008). Mechanisms and functions of DNA mismatch repair. Cell Res 18, 85-98.

Nohmi, T. (2006). Environmental stress and lesion-bypass DNA polymerases. Annu Rev Microbiol 60, 231-253.

Parker, B.O., and Marinus, M.G. (1992). Repair of DNA heteroduplexes containing small heterologous sequences in Escherichia coli. Proc Natl Acad Sci U S A 89, 1730-1734.

Sakofsky, C.J., Foster, P.L., and Grogan, D.W. (2012). Roles of the Y-family DNA polymerase Dbh in accurate replication of the Sulfolobus genome at high temperature. DNA Repair (Amst) 11, 391-400.

Wang, H., and Hays, J.B. (2006). Construction of MMR plasmid substrates and analysis of MMR error correction and excision. Methods Mol Biol 314, 345-353.

Waters, L.S., Minesinger, B.K., Wiltrout, M.E., D'Souza, S., Woodruff, R.V., and Walker, G.C. (2009). Eukaryotic Translesion Polymerases and Their Roles and Regulation in DNA Damage Tolerance. Microbiol Mol Biol R 73, 134-+.

Yuan, B., Cao, H., Jiang, Y., Hong, H., and Wang, Y. (2008). Efficient and accurate bypass of N2-(1-carboxyethyl)-2'-deoxyguanosine by DinB DNA polymerase in vitro and in vivo. Proc Natl Acad Sci U S A 105, 8679-8684.

Chapter 3 Formation and fates of mismatches in vivo during homologous recombination in S. acidocaldarius

Data from this chapter appear in the manuscript published in Frontiers in Evolutionary and Genomic Microbiology titled “Heteroduplex formation, mismatch resolution, and genetic sectoring during homologous recombination in the hyperthermophilic archaeon Sulfolobus acidocaldarius”. Authors: D. Mao and D. Grogan.

3.1 Introduction

Genetic transformation by linear DNA via homologous recombination (HR) (also known as ‘ends-out recombination’) is used extensively for ‘targeted gene replacement’ in model organisms spanning all domains of life (Capecchi, 1994; Hastings et al., 1993; Sharan et al., 2009). Though the ability to transform Sulfolobus acidocaldarius with linear DNA is well documented, little is known about the mechanism(s) by which the exogenous DNA is incorporated into the chromosome. Even in well-established model organisms, there is ongoing debate over the numerous models that have been proposed to explain the strand transactions that lead to the transfer of linear DNA (donor) to the chromosome (recipient). Most of the models have a common feature of an intermediate hybrid DNA molecule formed by the annealing of part or all of the participating homologous DNAs (Fig. 1B), though multiple ways have been reported for the formation of the intermediate. Recombination mediated by the Red proteins of λ phage in E. coli involves processing ds DNA into single stranded intermediates that are annealed to complementary regions of the chromosome exposed during replication (the Red recombination system of λ phage expressed in E. coli allows for transformation using linear DNA which would normally get degraded by the RecBCD nuclease of E. coli) (Maresca et al., 2010). In Saccharomyces cerevisiae, both ends of the donor DNA are recessed by 5’-3’ exonucleolytic activity to expose 3’-overhangs that invade homologous chromosomal regions leading to double crossover (a strand invasion event at one end is shown in Fig. 1B) (Langston and Symington, 2004).

Figure 3. Roles of MMR in HR. (A) Two homologous DNAs (shown in red and black). (B) Strand invasion of DNA partner in black by DNA partner in red forms a heteroduplex intermediate. (C) Recombination is terminated by MMR if the recombining sequences exceed the threshold limit of mismatches. If recombination occurs, the heteroduplex can be acted upon by MMR, which can either remove (D) the red strand or (E) the black strand; result of repair is gene conversion. (F) If heteroduplex is not acted upon by MMR, both strands can serve as templates during replication and the resulting DNAs can segregate into daughter cells, which results in genetic sectoring.

Dissimilarities in partner sequences would result in the formation of a mismatched intermediate upon which conventional MMR can act in two ways: (i) MMR can terminate the recombination reaction (Fig. 1C). It has been shown in yeast that a single mismatch within a 350bp donor DNA can inhibit recombination 5-10 fold; addition of more mismatches in the donor further elevates inhibition (Datta et al., 1997). The inhibitory effects of MMR on recombination have also been documented in bacteria as well as mammalian cells (Majewski and Cohan, 1998; Modrich and Lahue, 1996). (ii) If a recombination reaction is allowed to proceed, MMR can influence the outcome by repairing the mismatches, leading to gene conversion whereby one strand is lost (Fig. 1D and 1E). If the mismatches are not repaired, each strand can serve as a template during replication and segregate into daughter cells (Fig. 1F).

Assays were developed to query outcomes resulting from both (i) and (ii) in S. acidocaldarius. The results suggest that there is no detectable penalty for increasing mismatches between partner DNAs in homologous recombination in S. acidocaldarius. Furthermore, genetically sectored colonies were obtained from transformation using marked DNA, suggesting that mismatched intermediates are formed during HR followed by segregation of the mismatched markers into individual daughter cells. This provides evidence that the heteroduplex intermediates were not acted upon by a conventional MMR-like pathway. In addition to querying for MMR, the study also provides clues to mechanistic details of recombination using linear DNA in S. acidocaldarius that suggest a recombinational pathway distinct from that of eukaryotes or bacteria.

3.2 Experimental design

To determine whether the inhibitory effects of an MMR-like system on recombination can be detected in S. acidocaldarius, the effect of increasing the number of mismatches on HR efficiency was tested. If an MMR-like system is present in S. acidocaldarius, a decrease in efficiency in HR would be seen with increasing mismatches in the partner DNAs. Sulfolobus acidocaldarius was transformed with different DNAs that have in common a selectable sequence and a synonymous base pair substitution (BPS) that creates a HindIII site, but differ in the number of synonymous BPSs between the selectable and the HindIII marker (Fig. 2). The DNAs were transformed into MR31 and HR efficiency was measured by scoring the number of transformants. The effect of increasing mismatches between the selectable and the HindIII marker on the frequency of incorporation of the HindIII marker was also tested.

Figure 4. Donor DNAs testing penalty of mismatch number on HR efficiency. Circles indicate synonymous BPS. The two constant markers present in all DNAs are denoted by black square (selectable marker) and red circle (HindIII). Nucleotides in coding sequence of pyrE/donor DNA as well as position of markers in recipient/donor are given below each marker. Donor DNAs shown: 0-MM, 2-MM, 5-MM, 6-MM. Marker labels: (C/T) 19/88, (A/G) 30/99, (A/C) 45/114, (A/C) 66/135, (C/T) 87/156, (T/G) 117/186, (A/G) 147/216; selectable marker 154-171/223-240.

To test for formation of heteroduplex intermediates and follow their fates in vivo, S. acidocaldarius pyrE mutants were transformed with exogenous pyrE (donor) that contains the selectable marker to rescue the mutation in the chromosomal copy of the gene (recipient), as well as multiple scorable synonymous BPSs (Fig. 3). Transformant colonies were genotyped by scoring the non-selected markers. Sectored colonies were identified by the presence of mixtures of pyrE sequences representing both donor and recipient markers at any of the multiple synonymous BPS sites.
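The sectoring criterion described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each purified sibling clone is summarized as a string with one character per scorable site ("D" for the donor allele, "R" for the recipient allele). The function names and the example profiles are hypothetical and are not data from this study.

```python
def is_sectored(sibling_profiles):
    """True if sibling clones from one colony carry more than one pyrE marker profile."""
    return len(set(sibling_profiles)) > 1


def mixed_sites(sibling_profiles):
    """Indices of marker sites at which both donor (D) and recipient (R) alleles occur."""
    return [i for i, alleles in enumerate(zip(*sibling_profiles))
            if len(set(alleles)) > 1]


# Hypothetical colonies: three sibling clones each, five scorable sites per clone.
uniform_colony = ["DDDRR", "DDDRR", "DDDRR"]
sectored_colony = ["DDDRR", "RDDDD", "DDDRR"]

print(is_sectored(uniform_colony), mixed_sites(uniform_colony))
print(is_sectored(sectored_colony), mixed_sites(sectored_colony))
```

A colony is called sectored as soon as one site shows both alleles, which mirrors the "at any of the multiple synonymous BPS sites" wording of the criterion.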

3.3 Materials and methods

3.3.1 Strains and growth conditions

The Sulfolobus strain MR31 was used for most experiments (Reilly and Grogan, 2001). Strain SA1 was used for testing the effects of conjugation; this strain is isogenic to MR31, and contains, in addition to the 18bp internal pyrE deletion, a deletion of most of the upsE gene encoding an ATP-dependent pilus-assembly protein (Ajon et al., 2011). Cells were cultured at 76-78°C; liquid growth and plating conditions were those described previously (Grogan and Rockwood, 2010).

3.3.2 Donor DNAs

3.3.2.1 462bp DNAs with varying number of mismatches

The penalty of mismatch number on HR efficiency was tested using four 462bp fragments (designated 0-MM, 2-MM, 5-MM and 6-MM) of wild type pyrE containing the MR31 18bp selectable sequence in the center and a synonymous BPS that creates a HindIII site towards the 5’-end of the pyrE coding sequence, separated by a distance of 134bp. 0-MM, 2-MM, 5-MM and 6-MM have 0, 2, 5 and 6 mismatches respectively in the 134bp region between the selectable and HindIII marker (Fig. 2). All 462bp DNAs were obtained by PCR amplification using primers JRset1 fwd and MM HR rev (Table 1), thermocycler program DG48 and the following genomic DNA templates: 2-MM, 5-MM and 6-MM are from transformants obtained during a previous study by undergraduate researcher Jananie Rockwood (Grogan and Rockwood, 2010), while 0-MM was obtained by site-directed mutagenesis (see section 3.3.3).

Donor DNA:        ....CTCCCCAACCATCCGGACATATTTTCATTCGTCGTTAGTTCTGCGGTCGACATAGTAAAAGGTATAAATTTTGATATGATCTTGGGAGTTGTTACAGGAGGCGTT....
Recipient genome: ....CTCCCCAACCATCCTGACATATTTTCATTCGTCGTTAGTTCTGCAGTCGAC (18 bp deletion) TTTGATATGATATTGGGAGTTGTTACTGGGGGCGTT....
Sites labeled in the figure: BspEI and MboI (donor DNA), PstI (recipient genome).

Figure 3. Sense strand alignment of donor and recipient pyrE showing the selectable marker and synonymous BPSs creating or destroying restriction enzyme sites. In the donor DNA shown here, the PstI site is destroyed and BspEI and MboI sites are created in wildtype pyrE. The alignment can also be used to visualize the heteroduplex intermediates that could be formed during recombination by replacing any one strand with its complement.

3.3.2.2 Multiply marked full-length ds pyrE

The multicopy plasmid pSaPyrEv3 (Grogan and Rockwood, 2010) was used as a source of marked full-length pyrE for sectoring assays. pSaPyrEv3 contains the wild-type pyrE sequence (Saci_1597) (Chen et al., 2005) modified with synonymous BPSs regularly spaced along the entire length of the coding sequence that either create or destroy restriction enzyme sites. The plasmid does not have a Sulfolobus replication origin and hence cannot replicate in Sulfolobus. For most transformations with full-length pyrE, the pyrEv3 cassette was amplified by PCR using primers sspyrEfwd and sspyrErev (Table 1). After PCR and digestions (if any, see below), the amplicons were purified using SureClean (Bioline).
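How a synonymous BPS creates or destroys a restriction site can be checked directly on the sense-strand excerpts shown in Figure 3. The sketch below is only an illustration: the sequences are transcribed from that figure, the recognition sequences (BspEI TCCGGA, PstI CTGCAG, MboI GATC) are the standard ones for these enzymes, and the function name `site_status` is hypothetical.

```python
# Sense-strand excerpts transcribed from Figure 3 (flanking dots omitted).
DONOR = ("CTCCCCAACCATCCGGACATATTTTCATTCGTCGTTAGTTCTGCGGTCGAC"
         "ATAGTAAAAGGTATAAAT"   # 18 nt that are absent from the recipient
         "TTTGATATGATCTTGGGAGTTGTTACAGGAGGCGTT")
RECIPIENT = ("CTCCCCAACCATCCTGACATATTTTCATTCGTCGTTAGTTCTGCAGTCGAC"
             "TTTGATATGATATTGGGAGTTGTTACTGGGGGCGTT")

# Palindromic recognition sequences, so searching one strand is sufficient.
SITES = {"BspEI": "TCCGGA", "PstI": "CTGCAG", "MboI": "GATC"}


def site_status(donor, recipient, site):
    """Compare presence of a recognition sequence in donor versus recipient."""
    in_donor, in_recipient = site in donor, site in recipient
    if in_donor and not in_recipient:
        return "created in donor"
    if in_recipient and not in_donor:
        return "destroyed in donor"
    return "unchanged"


for enzyme, site in SITES.items():
    print(enzyme, site_status(DONOR, RECIPIENT, site))
```

Each such site can then be scored by digesting a PCR amplicon of the locus, since only one of the two alleles is cleaved.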

Table 3. Oligonucleotides used in this study. The bases in bold are the query bases for Ligase Chain Reaction (LCR) probes. All downstream oligo probes for LCR were phosphorylated at the 5’-end.

DESIGNATION         SEQUENCE (5'-3')

PCR
sspyrE fwd          TGATGCGGTACCATGGATTTCGTGAAAGCTTTAC
sspyrE rev          ATACGAAGCGCTATTATTCTAGCTTTTTCCAATGTTTTTC
JRset1 fwd          ACGCCCTTAAATAAGGTTAG
JRset1 rev          GGGACATTGAAAGAACTAGA
HindIII JR          GGATTTCGTGAAAGCTTTACTT
MM HR rev           AGCTTTCCTTACCTCCTCCAC

Transforming
PE AseI+            AACTCCCAATATCATATCAAAATTAATACCTTTTACTATGTCGACTGCAGAACTAACGAC
PE AseI-            AACTCCCAATATCATATCAAAATTTATACCTTTTACTATGTCGACTGCAGAACTAACGAC

LCR probes
LCRSaPE90dntop      GATTTAAGGAAACTCCC
LCRSaPE90dnbot      AGGTAATAGGGGCTAAC
LCRSaPE90Atop       GTTAGCCCCTATTACCTA
LCRSaPE90Tbot       GGGAGTTTCCTTAAATCT
LCRSaPE90Ttop       GTTAGCCCCTATTACCTT
LCRSaPE90Abot       GGGAGTTTCCTTAAATCA
LCRSaPE117dntop     GACATATTTTCATTCGTCG
LCRSaPE117dnbot     GGATGGTTGGGGAG
LCRSaPE117Ttop      CTCCCCAACCATCCT
LCRSaPE117Abot      CGACGAATGAAAATATGTCA
LCRSaPE117Atop      CTCCCCAACCATCCA
LCRSaPE117Tbot      CGACGAATGAAAATATGTCT
LCRSaPE192dntop     GTTACTGGGGGCG
LCRSaPE192dnbot     ACTCCCAATATCATATCA
LCRSaPE192Gtop      GATATGATATTGGGAGTG
LCRSaPE192Cbot      GCCCCCAGTAACC
LCRSaPE192Ttop      GATATGATATTGGGAGTT
LCRSaPE192Abot      GCCCCCAGTAACA

3.3.2.3 Multiply marked full-length ss pyrE

The PCR primers used to amplify the pyrEv3 cassette also provided for the specific digestion of one strand to generate the single-stranded form. The forward primer (sspyrEfwd) was designed to include a 5'-tail with a KpnI site (highlighted in red in Table 1 and Fig. 4) and a 3'-end that anneals to the start of pyrE (start codon underlined in Table 1). The reverse primer (sspyrErev) has a corresponding 5'-tail with a HinP1I site (highlighted in blue in Table 1 and Fig. 4) and a 3'-end that anneals to the end of the pyrE coding sequence (stop codon underlined). Digesting the resulting amplicon with KpnI and HinP1I creates 3’- and 5’-overhangs respectively. The amplicon was then treated with ExoIII, which degrades one strand starting from the end with the 5’-overhang, because the end with the 3’-overhang is not a substrate for the exonuclease (Henikoff, 1984). Any residual ds amplicons were eliminated by digestion with SalI, which cuts 5nt upstream of the 5’-end of the selectable marker on the sense strand and 1nt upstream of the 3’-end of the selectable marker on the anti-sense strand. Double stranded amplicons digested with SalI did not yield transformants, confirming that the treatment was effective in eliminating ds DNA.
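The logic of the strand-specific digestion can be illustrated with a short sketch. It assumes the standard cleavage patterns KpnI GGTAC^C and HinP1I G^CGC, and the general property that ExoIII does not initiate at 3’-overhangs of four or more nucleotides. The data structure and function names are hypothetical, and the sketch only classifies the ends; it does not model the digestion itself.

```python
# name: (recognition site, top-strand cut offset, bottom-strand cut offset),
# offsets counted in bases from the 5' end of the site as read on the top strand.
ENZYMES = {
    "KpnI": ("GGTACC", 5, 1),    # GGTAC^C
    "HinP1I": ("GCGC", 1, 3),    # G^CGC
}

PRIMERS = {
    "sspyrE fwd": "TGATGCGGTACCATGGATTTCGTGAAAGCTTTAC",
    "sspyrE rev": "ATACGAAGCGCTATTATTCTAGCTTTTTCCAATGTTTTTC",
}


def overhang(top_cut, bottom_cut):
    """Return (type, length) of the overhang left by a staggered cut."""
    length = abs(top_cut - bottom_cut)
    if top_cut > bottom_cut:
        return ("3'", length)
    if top_cut < bottom_cut:
        return ("5'", length)
    return ("blunt", 0)


def exoiii_can_initiate(kind, length):
    """ExoIII starts at blunt or 5'-overhang ends; 3' overhangs of >= 4 nt resist it."""
    return not (kind == "3'" and length >= 4)


for primer, enzyme in (("sspyrE fwd", "KpnI"), ("sspyrE rev", "HinP1I")):
    site, top_cut, bottom_cut = ENZYMES[enzyme]
    kind, length = overhang(top_cut, bottom_cut)
    print(primer, enzyme, "site at index", PRIMERS[primer].find(site),
          kind, length, "ExoIII entry:", exoiii_can_initiate(kind, length))
```

Under these assumptions only the HinP1I end admits the exonuclease, so digestion proceeds from one end of the amplicon and removes a single strand.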

5’-TGATGCGGTAC CATGGATTTCGTGAAAGCTTTAC……………GAAAAACATTGGAAAAAGCTAGAATAATAG CGCTTCGTAT-3’
3’-ACTACGC CATGGTACCTAAAGCACTTTCGAAATG……………CTTTTTGTAACCTTTTTCGATCTTATTATCGC GAAGCATA-5’

Figure 4. ss pyrE construction. Forward primer has a 5'-tail with KpnI site (red) and a 3'-end that anneals to the start of the pyrE (start codon shown). Reverse primer has a 5'-tail with HinP1I site (blue) and a 3'-end that anneals to end of the pyrE coding sequence (stop codon shown). Digesting with KpnI and HinP1I creates 3’- and 5’-overhangs respectively (cleavage sites shown as gaps in sequence). Treatment with ExoIII selectively degrades the top (sense) strand since 3’-overhangs are resistant to cleavage.

3.3.2.4 Distinctly marked pyrE sense and antisense 185nt oligos

Synthetic donor DNAs markedSaPE43-227top and markedSaPE43-227bot were designed with regularly spaced non-selected markers as for the pyrEv3 cassette (Grogan and Rockwood, 2010), except that only a 185bp interval (pyrE nt 43 to 227) was represented. All seven sites of BPSs in the sense (top) strand oligo were different from those in the antisense (bottom) strand oligo (Table 2 and Fig. 5).

Table 2. Multiply-marked 185-nt transforming DNAs. Most of the BPSs create or destroy restriction enzyme sites in the recipient. Sites 90, 117 and 192 were scored by ligase chain reaction (LCR) as indicated.

WT pyrE nt      Recipient  Donor    Recipient    Donor        Base pair scoring  Base pair scoring
position        (Sense)    (Sense)  (Antisense)  (Antisense)  (sense)            (antisense)
63              A          C        T            C            Bsu36I             HpyCH4IV
90              A          G        T            A            BstNI              LCR
117             T          A        A            C            LCR                BspEI
144             T          C        A            C            SsiI               PstI
163, 165, 168   G, T, T    T, C, A  C, A, A      A, C, T      NsiI (selectable)  BspDI (selectable)
192             T          G        A            G            TsoI/LCR           HinfI
207             T          A        A            G            RsaI               BsaHI

3.3.3 Site-directed mutagenesis

PCR was carried out with primers HindIII JR and JRset1 rev, genomic DNA from wildtype S. acidocaldarius using thermocycler program DG48. The HindIII JR primer contains the BPS that introduces the HindIII site (highlighted in green in Table 1). The PCR product was used to transform MR31 and the transformants were scored for presence of the BPS by digesting with HindIII. Samples that were sensitive to HindIII were confirmed for the desired BPS by Sanger sequencing. One confirmed sample was cryopreserved and its genomic DNA was used as a source of 0-MM.

[Figure 5 diagram: the two 185nt strands drawn 5’ to 3’ and 3’ to 5’, with markers at positions 63, 90, 117, 144, 154-171, 192 and 207. Sense strand labels: Bsu36I, BstNI, LCR, SsiI, NsiI, LCR, RsaI. Antisense strand labels: HpyCH4IV, LCR, BspEI, PstI, BspDI, HinfI, BsaHI.]

Figure 5. 185nt donor DNAs annealed to each other. Synonymous, scorable substitutions are shown as semicircles and the 18nt selectable sequences are shown as squares (sense strand markers are in black and bottom strand markers are in grey). The 18nt selectable sequences are modified versions of the wild type sequence that confer non-synonymous changes, but yield functional PyrE, creating NsiI and BspDI sites in the sense and antisense oligos respectively. In subsequent figures, full circles indicate marker base pairs representing duplex DNA.

3.3.4 Transformation

5, 10 and 15ng each of 0-MM, 2-MM, 5-MM and 6-MM were electroporated into S. acidocaldarius MR31 cells. For genetic sectoring assays, the amount of pyrEv3 DNA used was 60-100 ng, and the amount of 185nt oligo was 10 and 20 pmol. Transformation reactions were plated on XT plates and incubated at 75°C for 5 days.

3.3.4 Analysis of transformants

For the assay testing mismatch penalty, total transformant colony counts were first obtained. Transformant colonies were then picked from a circular area of 1-inch diameter on the plate, inoculated separately into 2 ml liquid XT and incubated for 2 days at 75°C. Genomic DNAs were extracted and the pyrE region amplified using JRset1 fwd and rev primers. The amplicons were digested with HindIII and then subjected to electrophoresis on 1.65% agarose gels to score for sensitivity to HindIII.

In order to analyze transformants for the genetic sectoring assay, two different methods were employed. In initial studies, isolated S. acidocaldarius Pyr+ transformant colonies were suspended in sterile buffer and serially diluted. Dilutions (10⁻⁴, 10⁻⁵ and 10⁻⁶) were plated on selective plates and incubated for 5 days. Eight isolated colonies of sibling cells from each transformant colony were picked and grown in selective liquid medium for about 2 days. Genomic DNA was extracted and a 995bp fragment containing the pyrE locus was amplified using primers JRset1fwd and JRset1rev (Table 1). Preliminary restriction digestion screening using 4-5 markers equally spaced within the pyrE gene was carried out to identify sibling strains that differed in digestion patterns. These strains were selected for further screening by sequencing the pyrE locus to obtain complete marker profiles.

In a later streamlined procedure, the original (non-purified) transformant colonies were picked and grown in 3 ml selective medium. Genomic DNA was extracted from 2 ml of culture and pyrE was amplified as described above, while the remaining culture was stored at room temperature until needed for plating (see below). A restriction-digestion screen was carried out on the amplicons to identify samples that showed only partial cleavage, indicating both endonuclease-sensitive and -resistant amplicons in the sample (Fig. 6 Lane 5). Apparently dual-genotype cultures were then serially diluted and plated to obtain isolated colonies as described above. The markers in these “sibling colonies” were then scored as described above to confirm distinct clones, and representatives of these sibling clones were then scored at all remaining sites.

The two approaches of identifying sectored colonies, (i) isolating and analyzing sibling colonies initially from each transformant, vs. (ii) screening non-purified transformant colonies followed by confirmation of purified sibling colonies, yielded comparable efficiencies in detecting sectored colonies from cells transformed with the pyrEv3 cassette. Method (ii) was therefore employed for most of the study owing to its greater efficiency, thus increasing sample size.

[Figure 6 gel image: lanes 1 to 5; bands labeled "Full length amplicon (995bp)" and "BsaHI digested fragments" (719 bp and 276 bp).]

Figure 6. Restriction digestion results of pyrE amplicons from unpurified transformant colonies obtained using 185bp donor. Lane 1 shows a sample that had a single genotype of pyrE at the BsaHI site, evident by complete digestion of the amplicon. Lanes 2, 3 and 4 show samples with complete absence of the BsaHI site. Lane 5 shows a potential sectored sample where undigested as well as digested products are obtained.
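The reading of a digestion lane in the streamlined screen can be written down as a small decision rule. The sketch below uses the band sizes given for the BsaHI example in Figure 6 (995 bp uncut, 719 bp and 276 bp cut). The function name, the size tolerance and the "unscorable" category are assumptions added for illustration, not part of the protocol.

```python
FULL_LENGTH = 995        # bp, undigested pyrE amplicon
FRAGMENTS = (719, 276)   # bp, BsaHI digestion products


def classify_lane(bands, tolerance=20):
    """Classify one gel lane from its estimated band sizes in bp."""
    def seen(size):
        return any(abs(band - size) <= tolerance for band in bands)

    uncut = seen(FULL_LENGTH)
    cut = all(seen(size) for size in FRAGMENTS)
    if cut and uncut:
        return "potentially sectored"   # partial cleavage, as in Fig. 6 Lane 5
    if cut:
        return "site present"
    if uncut:
        return "site absent"
    return "unscorable"


print(classify_lane([719, 276]))
print(classify_lane([995]))
print(classify_lane([995, 719, 276]))
```

Lanes flagged as potentially sectored would still need confirmation by clonal purification, because incomplete digestion alone can produce the same pattern.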

3.3.5 Scoring non-selected markers

Most of the BPSs introduced into the native pyrE sequence in the various donor DNAs either create or destroy restriction sites, enabling scoring by restriction enzyme digestions of PCR amplicons and fragment analysis on agarose gels. However, two alleles of the pyrEv3 cassette and two of the 185nt oligonucleotide pair did not affect restriction sites. For simplicity, all pyrEv3 markers were scored by Sanger sequencing of the PCR amplicon.

3.3.5.1 Genotyping by Ligase Chain Reaction

For oligonucleotide transformants, the endonuclease TsoI proved to be unreliable in scoring the marker incorporated at pyrE position 192, so ligase chain reaction (LCR) (Wiedmann et al., 1994) was used to score this marker and two other donor alleles that did not alter restriction sites (Table 2). Genotyping by LCR uses the sequence specificity required for ligation to identify bases at a known position. The template DNA is denatured and four oligos are allowed to anneal to the two complementary strands of the template; two that anneal upstream of the query position and two that anneal downstream of the query position on both denatured template strands. The 3’-ends of the upstream oligos probe for the bases at the query position and ligation can occur only if the four oligos are perfectly annealed to the template strands. The sequential steps of denaturation, annealing and ligation are repeated several times in a thermocycler. The ligation product can then be identified by obtaining a melting curve of the reaction mixture with the help of SYBR green. SYBR green preferentially binds ds DNA, with lower affinity for ss DNA. Therefore, the loss in fluorescence with increase in temperature can be plotted; if the upstream oligos are a perfect match and ligation occurs, the melting curve will show a peak for the ligated product (Fig. 8A and 8D). For this study, there are two possible base pairs (donor or recipient) for any given query position. Two separate LCR reactions with upstream oligo pairs carrying either donor or recipient bases at the 3’-end were carried out. Fig. 8 shows typical melting curves obtained after LCR; in Fig. 8A, the first peak at about 60°C corresponds to the oligo probes, which denature as temperature is increased, resulting in the loss of fluorescence. The second peak corresponds to the ligated product (if ligation was successful depending on the base pair in the template and the upstream oligo probe) (Fig. 8A). The third peak corresponds to the template DNA, which is the pyrE amplicon in the LCR performed for this study (Fig. 8A).

[Figure 7 diagram: panels (A) and (B) each follow the template
5’-TTTGATATGATATTGGGAGTGGTTACTGGGGGCGT-3’
3’-AAACTATACTATAACCCTCACCAATGACCCCCGCA-5’
through denaturation, annealing of the four probes, and ligation. Panel (A) ends in "Ligated product"; panel (B) ends in "No ligated product".]

Figure 7. Illustrations of LCR reactions with template containing donor marker G/C at query position 192 (bolded). (A) Reaction with upstream oligos (in red) carrying donor markers at the 3’-end (bolded) resulting in ligated product. (B) Reaction with upstream oligos (in red) carrying recipient markers at the 3’-end (bolded) resulting in no ligated product.

[Figure 8 plots: melting curves in panels (A) to (D); panel labels "LCR probing for (A/T)", "LCR probing for (C/G)" and "Ligated product".]

Figure 8. LCR results for two transformants probed for markers at position PE192. (A) and (B) are LCR results from genotyping a transformant for AT and CG respectively; presence of a peak corresponding to the ligated product in reactions (A), but not in (B), indicates A/T at the query position. Similarly, (C) and (D) show LCR results for a transformant with C/G at the query position.
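The base discrimination that LCR relies on can be sketched with the position-192 probes from the oligonucleotide table and the template shown in Figure 7. The sketch makes a strong simplifying assumption: ligation is treated as possible only when the upstream and downstream probes, joined end to end, match one template strand exactly. Real ligases tolerate some mismatches away from the junction, so this is an idealization; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def can_ligate(template_top, upstream, downstream):
    """Idealized rule: the joined probes must match one template strand exactly."""
    product = upstream + downstream
    return product in template_top or product in revcomp(template_top)


# Template with the donor G/C pair at query position 192 (top strand, Figure 7).
TEMPLATE = "TTTGATATGATATTGGGAGTGGTTACTGGGGGCGT"

# Position-192 probes, written 5'-3'; the query base is the 3' end of each upstream probe.
DN_TOP, DN_BOT = "GTTACTGGGGGCG", "ACTCCCAATATCATATCA"
G_TOP, C_BOT = "GATATGATATTGGGAGTG", "GCCCCCAGTAACC"   # donor alleles
T_TOP, A_BOT = "GATATGATATTGGGAGTT", "GCCCCCAGTAACA"   # recipient alleles

print("donor probes ligate:", can_ligate(TEMPLATE, G_TOP, DN_TOP),
      can_ligate(TEMPLATE, C_BOT, DN_BOT))
print("recipient probes ligate:", can_ligate(TEMPLATE, T_TOP, DN_TOP),
      can_ligate(TEMPLATE, A_BOT, DN_BOT))
```

Running the two probe sets against the same template reproduces the logic of the paired reactions: only the set whose 3’-terminal bases match the template yields a ligated product.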

The LCR mixture consisted of approximately 10ng pyrE DNA, 2µM of the four probes (Table 1), 1X SYBR Green, 1X 9°N™ ligase buffer and 5U of 9°N™ (Thermococcus) DNA ligase (New England Biolabs) in a total volume of 20µL. After an initial denaturation at 95°C for 3 minutes, the following 2-step cycle was used: denaturation at 95°C for 30s and annealing and ligation at 59.3°C, for 39 cycles. This was followed by a final annealing and ligation at 59.3°C for 3 min. A melting curve of the reaction mixture was obtained from 60-95°C by plotting decrease in fluorescence per increment of 0.5°C.
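The melting-curve readout, the decrease in fluorescence per 0.5°C increment, amounts to a finite-difference estimate of -dF/dT whose local maxima mark the melting transitions. The sketch below illustrates the calculation on a synthetic signal: the three transition temperatures, the sigmoid shape and the peak threshold are hypothetical values chosen for the example, not measurements from this study.

```python
import math


def melt_peaks(temps, fluorescence, min_fraction=0.1):
    """Temperatures (interval midpoints) where the fluorescence drop per degree peaks."""
    mids, drops = [], []
    for i in range(len(temps) - 1):
        mids.append((temps[i] + temps[i + 1]) / 2)
        drops.append((fluorescence[i] - fluorescence[i + 1]) / (temps[i + 1] - temps[i]))
    floor = min_fraction * max(drops)   # ignore negligible drops between transitions
    return [mids[i] for i in range(1, len(drops) - 1)
            if drops[i] > floor and drops[i - 1] < drops[i] > drops[i + 1]]


# Readings every 0.5 degrees from 60 to 95, as in the protocol.
temps = [60 + 0.5 * k for k in range(71)]

# Hypothetical transitions standing in for probes, ligated product and template.
HYPOTHETICAL_TM = (62.25, 75.25, 88.25)


def synthetic_signal(temp):
    """Sum of sigmoid melting transitions (illustrative only)."""
    return sum(1 / (1 + math.exp((temp - tm) / 0.8)) for tm in HYPOTHETICAL_TM)


fluorescence = [synthetic_signal(t) for t in temps]
print(melt_peaks(temps, fluorescence))
```

In a real run, the presence or absence of the middle peak is what distinguishes a reaction with ligated product from one without.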

3.4 Results

3.4.1 There is no apparent penalty of increasing mismatches on HR efficiency

Total colony counts show no significant differences in overall transformation efficiencies: analysis of variance (ANOVA) revealed no significant effect of increasing mismatches on overall transformation efficiencies (F(5,33) = 1.83, P = 0.31). Also, increasing the number of mismatches does not appear to reduce the frequency of incorporation of the HindIII marker (Table 3; ANOVA F(3,20) = 0.68, P = 0.58).

Table 3. Summary of frequency of HindIII incorporation and overall transformation efficiencies.

462bp DNA   Number of mismatches   % HindIII incorporation   Transformation efficiency/ng DNA
            in 134bp region        Average    Std. Dev.      Average    Std. Dev.
WT          0                      -          -              3.40       1.86
0-MM        0                      27.96      23.00          5.60       4.28
2-MM        2                      23.27      13.00          3.98       2.74
5-MM        5                      25.58      5.20           3.41       2.47
6-MM        6                      25.74      21.00          3.03       0.84

3.4.2 Sectored colonies result from transformation using ds pyrE

About 22% of the transformant colonies analyzed from transformation using ds pyrE were sectored. The marker profiles of the distinct pyrE genotypes obtained from sectored colonies are given in Fig. 9A. Most of the marker profiles show differences in marker incorporation at the ends of the marked region.

3.4.3 Sectoring is not due to conjugation

In order to exclude conjugation as a mechanism that contributes to the formation of sectored colonies, the assay using ds pyrE was carried out in SA1 cells. One can imagine a scenario where conjugation between transformed and non-transformed cells could result in colonies that mimic sectoring. Even though SA1 cells are incapable of conjugation, sectored colonies were obtained, indicating that conjugation does not contribute to the formation of sectored colonies (Fig. 9B).

[Figure 9 diagram: marker profiles in panels (A), (B) and (C); profile labels dsPE1 to dsPE12, SA1 to SA3 and ssPE1 to ssPE3.]

Figure 9. Marker profiles showing two distinct pyrE genotypes from sectored colonies obtained by transforming (A) MR31 with ds full-length pyrE, (B) SA1 cells with ds full-length pyrE, (C) MR31 with ss full-length pyrE. Solid square and circles represent selectable and scorable donor markers. Open circles represent recipient markers.

3.4.4 Sectoring is not due to more than one copy of donor DNA recombining multiple times with the chromosome(s)

Sulfolobus acidocaldarius cells have a relatively long G2 phase during which the cell possesses two copies of the chromosome (Bernander and Poplawski, 1997; Lundgren et al., 2004). In theory, two copies of a donor DNA can enter a cell and recombine independently with each of the chromosomes of a cell in G2 phase. In order to test this possibility, MR31 was transformed with saturating amounts of a mixture of two differentially marked 80nt oligos; one had the wild type sequence identified by the presence of an AseI site within the 18bp selectable sequence, while the AseI site was destroyed in the other oligo (Table 1). Both oligos correspond to the antisense strand of the pyrE gene (anneals to the sense strand on the chromosome). Out of 55 transformant colonies analyzed, 25 incorporated the oligo with the AseI site while 30 incorporated the oligo without the AseI site. No colonies showed incorporation of both oligos, suggesting only one donor DNA molecule gets incorporated in a recipient cell under the assay conditions.

3.4.5 Single-stranded donor DNA can generate sectored colonies

Since S. acidocaldarius can be transformed by ss oligos (Kurosawa and Grogan, 2005), the assay was conducted using ss pyrE to see if it was capable of generating sectored colonies. Sectored colonies showing marker profiles similar to those obtained using ds pyrE were obtained (Fig. 9C).

Since full-length ss pyrE was prepared by enzymatic degradation of one strand, there is a possibility of ds regions persisting at one end. In order to confirm sectoring by ss DNA, 185nt synthetic oligos (Table 2 and Fig. 5) were used to transform MR31 cells. A previous study reported a strand bias when transforming S. acidocaldarius with ss DNA; the authors reported higher transformation efficiencies with exogenous DNA annealing to the leading strand template (Grogan and Stengel, 2008). In order to investigate if differences could be seen in sectoring as a result of sense or antisense strand transformations, the assay was performed using synthetic oligos that anneal to the same coordinates of the sense or antisense strands of the pyrE gene on the chromosome respectively. No strand bias was observed in transformation efficiencies from top (sense) or bottom (antisense) strand transformations and both donor DNAs yielded sectored colonies (Fig. 10 A and B). Frequencies of sectored colonies for top and bottom strand transformations were 29.2% and 39.6% respectively.

Figure 10. Marker profiles of (A) sectored cases from 185nt top (sense) strand, (B) bottom (antisense) strand and (C) ds oligo transformations; (D) non-sectored cases from 185nt ds transformations.

3.4.6 Both strands of a donor duplex participate in HR

The 185nt oligos, when annealed to each other, form a duplex with distinct markers on each strand. This allows for testing whether both strands of a donor duplex participate in HR; in other words, whether both donor markers can be recovered from a transformant colony. Equimolar amounts of top and bottom strand oligos were annealed at room temperature and transformed into MR31. Sectored colonies were obtained at frequencies (26.4%) comparable to that of ss transformations. An interesting observation was that in a majority of the sectored colonies (14/19), markers from both donor strands were incorporated (Fig. 10C).

3.5 Discussion

This study exploits the effects of conventional MMR on HR to probe if there is an MMR-like system in S. acidocaldarius. Genetic assays were developed to detect formation of heteroduplex intermediates and their fates in vivo. In addition, this study also reveals properties of HR between linear and chromosomal DNA in S. acidocaldarius.

Mismatches have been shown to decrease efficiencies of recombination in bacteria and eukaryotes due to the action of MMR inhibiting formation of the heteroduplex intermediate (Datta et al., 1997; Shen and Huang, 1989). In yeast, a 100-fold decrease in recombination was observed when the recombining sequences were 94% identical compared to recombination between 100% identical sequences. An assay was developed to test the effect of number of mismatches between two markers (selectable and scorable HindIII) on HR efficiency in S. acidocaldarius. DNA sequences of 100%, 98.5%, 96.3% and 95.5% identity (0, 2, 5 and 6 mismatches within a 134bp sequence) did not show a significant decrease in transformation efficiency as measured by total colony counts per ng of DNA. Thus, under the assay conditions no penalty is observed for increasing mismatches on HR efficiency in S. acidocaldarius, suggesting the absence of a conventional MMR-like system that inhibits recombination between divergent sequences. No significant differences were found in the frequency of co-transfer of the HindIII site along with the selectable marker between the various input DNAs.
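The identity values quoted above follow directly from the number of mismatches in the 134bp interval. A minimal check of the arithmetic, with a hypothetical helper name:

```python
REGION_BP = 134   # interval between the selectable and HindIII markers


def percent_identity(mismatches, length=REGION_BP):
    """Percent identity of two aligned sequences differing at `mismatches` positions."""
    return round(100 * (length - mismatches) / length, 1)


for name, mismatches in (("0-MM", 0), ("2-MM", 2), ("5-MM", 5), ("6-MM", 6)):
    print(name, percent_identity(mismatches))
```

Even the most divergent donor (6-MM) therefore remains above the 94% identity at which a 100-fold decrease was reported in yeast.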

To investigate heteroduplex intermediate formation and processing in vivo, an assay was developed to detect sectoring as a result of segregation of unresolved heteroduplex intermediates. Two distinct pyrE genotypes were obtained in 22% of transformant colonies analyzed when cells were transformed with ds full-length, marked pyrE. Sectoring could be seen even in strains incapable of conjugation, indicating that it is not due to recombination via conjugation between transformed and non-transformed cells. Furthermore, two distinctly marked antisense oligos corresponding to the same pyrE coordinates failed to generate sectored colonies even at saturating amounts of transforming DNA, suggesting only one DNA molecule enters a cell and/or participates in recombination. Together, the data provide the first documented evidence for unresolved heteroduplex intermediates during HR in S. acidocaldarius.

Carrying out the assay with the distinctly marked 185nt sense and antisense oligos allowed a more in-depth analysis of the formation and resolution of the heteroduplex. Marker profiles were compared to the expected outcomes of popular models of ends-out recombination in bacteria and eukaryotes (Fig. 11).

The first model involves a double crossover whereby both ends of the donor are first resected in the 5’-3’ direction (Fig. 11B) and the 3’-overhangs that are formed then invade the recipient (Fig. 11C) (Langston and Symington, 2004; Paques and Haber, 1999). This model has been proposed to explain the trans configuration of markers in the siblings, where one sibling genotype shows incorporation of donor markers on the right side of the selectable marker while the other incorporates donor markers on the left side (Fig. 11D).

The second model involves single strands of the donor annealing to the recipient either during exposure of ss complementary regions of the chromosome during replication or by strand invasion (Fig. 11E). This model is also known as single strand annealing/assimilation (Mosberg et al., 2010; Paques and Haber, 1999). Repair in the region of the selectable marker favoring the donor strand results in a cis configuration where one sibling genotype shows incorporation of donor markers on both sides of the selectable marker while the other has recipient markers at those positions (Fig. 11F). Of course, if repair does not occur, one strand will be lost due to the absence of the selectable marker and no sectoring will be seen (Fig. 11G).

The results from sectoring assays conducted using different DNA types reveal that a large majority of the sectored cases did not conform to a cis or trans configuration, but differed in marker incorporation only on one side of the selectable marker (Table 4). This suggests that the main mode of ends-out recombination in S. acidocaldarius does not follow either of the proposed models in bacteria and eukaryotes. Additional evidence for this is that most of the sibling genotypes obtained by transformation using distinctly marked 185bp duplex incorporated markers from top and bottom (sense and antisense) strands of the donor duplex. The outcome of the two established models would be either top or bottom donor markers incorporated in a sibling genotype. However, out of 60 total recombinants analyzed (sectored as well as non-sectored cases), only 25% and 20% had exclusively top or bottom strand markers, respectively.

[Figure 11 schematic; panel labels: 5’-3’ resection, strand invasion, resolution of branched structure and replication, ss annealing, repair, no repair/replication, replication.]

Figure 11. Classical models of ends-out recombination. (A) Recombination between two DNAs, where the solid square is the selectable marker and solid circles are scorable donor markers. (B-D) The double crossover pathway, in which (B) 5’-ends of the donor are resected and the donor then (C) invades the recipient, resulting in (D) a trans configuration of marker incorporation. In the ss assimilation/annealing pathway, (E) a single strand of the donor anneals to the recipient. If the mismatch is repaired at the selectable region favoring the donor strand (shown by dotted arrows), (F) the selectable marker is transferred to the recipient strand, (G) resulting in a cis configuration of marker incorporation. (H) If no repair occurs, one daughter cell will be lost due to the absence of the selectable marker.
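The cis, trans and one-sided categories used to score sectored colonies can be stated as a small decision rule. The Python sketch below is a simplification made for illustration only: it assumes each sibling genotype is reduced to two booleans (donor markers present on the left and on the right of the selectable marker), whereas the actual marker profiles contain several markers per side. The type alias and function name are hypothetical.

```python
from typing import Tuple

# (donor markers on the left side, donor markers on the right side)
Sibling = Tuple[bool, bool]


def classify_sectored_pair(sib_a: Sibling, sib_b: Sibling) -> str:
    """Classify a pair of sibling genotypes from one sectored colony."""
    left_differs = sib_a[0] != sib_b[0]
    right_differs = sib_a[1] != sib_b[1]
    if left_differs and right_differs:
        # Siblings differ on both sides: cis if one sibling carries donor
        # markers on both sides, trans if each carries them on one side only.
        return "cis" if sib_a[0] == sib_a[1] else "trans"
    if left_differs or right_differs:
        return "one-sided"
    return "identical"


print(classify_sectored_pair((True, True), (False, False)))   # cis
print(classify_sectored_pair((True, False), (False, True)))   # trans
print(classify_sectored_pair((True, True), (True, False)))    # one-sided
```

Under this rule, the double crossover model predicts "trans" pairs and the ss annealing model predicts "cis" pairs, so a predominance of "one-sided" pairs fits neither.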

Fundamental differences can be seen in the marker profiles from ss and ds 185nt transformations; the former showed a higher loss of markers at the 3’-end of the donor, while the latter lost 5’-end markers preferentially. This suggests that there might be two different pathways or ways of processing ss and ds donor DNAs in ends-out recombination in S. acidocaldarius.

Isolated markers, which are interruptions of a tract of one marker type by another marker type, can be seen in marker profiles from transformants obtained by transformation using full-length ds and ss pyrE as well as 185nt oligos. For example, in one of the siblings in dsPE6 (Fig. 9), an isolated recipient marker can be seen flanked by two donor markers. Isolated recipient markers can also be seen in siblings of ssPE2, T68, B9, TB34, etc. (Fig. 9C, 10A, 10B, 10C). The simplest way to explain incorporation of isolated markers is by repair of extremely short patches of mismatches (sometimes single mismatches) within a string of mismatches in the heteroduplex intermediates. However, there was no bias seen in the frequency of isolated marker types; both donor and recipient markers were discarded with comparable frequencies. Another indication of repair of mismatches in S. acidocaldarius is the incorporation of both top and bottom donor markers within single pyrE genotypes obtained from sectored as well as non-sectored colonies via ds 185bp transformations. This suggests that the preformed heteroduplex donor might be subject to short-patch repair before it undergoes recombination with the chromosome. Removal of short patches of the top or bottom strand and resynthesis of the resulting gap could transfer markers between the top and bottom strands prior to recombination. This form of short-patch mismatch repair would not by itself enhance replication accuracy, because that requires the ability to discriminate against the error-containing strand. It is possible that the hypothesized short-patch repair system can identify the error in particular situations; for example, when it is associated with the replication machinery.

Table 4. Summary of marker incorporation configuration among sectored cases.

Transforming DNA        Total analyzed  Cis  Trans  One-sided disagreement
ds full-length pyrE     15              1    2      12
ss full-length pyrE     3               1    -      2
ss 185nt top strand     14              1    2      11
ss 185nt bottom strand  19              -    3      16
ds 185bp duplex         19              1    4      14

This study provides the first in vivo evidence for repair of mismatches in S. acidocaldarius and highlights the non-conformity of the organism’s ends-out recombinational pathway(s) to conventional model systems in bacteria and eukaryotes.

3.6 References

Ajon, M., Frols, S., van Wolferen, M., Stoecker, K., Teichmann, D., Driessen, A.J., Grogan, D.W., Albers, S.V., and Schleper, C. (2011). UV-inducible DNA exchange in hyperthermophilic archaea mediated by type IV pili. Mol Microbiol 82, 807-817.

Bernander, R., and Poplawski, A. (1997). Cell cycle characteristics of thermophilic archaea. J Bacteriol 179, 4963-4969.

Capecchi, M.R. (1994). Targeted gene replacement. Sci Am 270, 52-59.

Chen, L., Brugger, K., Skovgaard, M., Redder, P., She, Q., Torarinsson, E., Greve, B., Awayez, M., Zibat, A., Klenk, H.P., et al. (2005). The genome of Sulfolobus acidocaldarius, a model organism of the Crenarchaeota. J Bacteriol 187, 4992-4999.

Datta, A., Hendrix, M., Lipsitch, M., and Jinks-Robertson, S. (1997). Dual roles for DNA sequence identity and the mismatch repair system in the regulation of mitotic crossing-over in yeast. Proc Natl Acad Sci U S A 94, 9757-9762.

Grogan, D.W., and Rockwood, J. (2010). Discontinuity and limited linkage in the homologous recombination system of a hyperthermophilic archaeon. J Bacteriol 192, 4660-4668.

Hastings, P.J., McGill, C., Shafer, B., and Strathern, J.N. (1993). Ends-in vs. ends-out recombination in yeast. Genetics 135, 973-980.

Henikoff, S. (1984). Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28, 351-359.

Langston, L.D., and Symington, L.S. (2004). Gene targeting in yeast is initiated by two independent strand invasions. Proc Natl Acad Sci U S A 101, 15392-15397.

Lundgren, M., Andersson, A., Chen, L., Nilsson, P., and Bernander, R. (2004). Three replication origins in Sulfolobus species: synchronous initiation of chromosome replication and asynchronous termination. Proc Natl Acad Sci U S A 101, 7046-7051.

Majewski, J., and Cohan, F.M. (1998). The effect of mismatch repair and heteroduplex formation on sexual isolation in Bacillus. Genetics 148, 13-18.

Maresca, M., Erler, A., Fu, J., Friedrich, A., Zhang, Y., and Stewart, A.F. (2010). Single-stranded heteroduplex intermediates in lambda Red homologous recombination. BMC Mol Biol 11, 54.

Modrich, P., and Lahue, R. (1996). Mismatch repair in replication fidelity, genetic recombination, and cancer biology. Annu Rev Biochem 65, 101-133.

Mosberg, J.A., Lajoie, M.J., and Church, G.M. (2010). Lambda red recombineering in Escherichia coli occurs through a fully single-stranded intermediate. Genetics 186, 791-799.

Paques, F., and Haber, J.E. (1999). Multiple pathways of recombination induced by double-strand breaks in Saccharomyces cerevisiae. Microbiol Mol Biol Rev 63, 349-404.

Reilly, M.S., and Grogan, D.W. (2001). Characterization of intragenic recombination in a hyperthermophilic archaeon via conjugational DNA exchange. J Bacteriol 183, 2943-2946.

Sharan, S.K., Thomason, L.C., Kuznetsov, S.G., and Court, D.L. (2009). Recombineering: a homologous recombination-based method of genetic engineering. Nat Protoc 4, 206-223.

Shen, P., and Huang, H.V. (1989). Effect of base pair mismatches on recombination via the RecBCD pathway. Mol Gen Genet 218, 358-360.

Wiedmann, M., Wilson, W.J., Czajka, J., Luo, J., Barany, F., and Batt, C.A. (1994). Ligase chain reaction (LCR)--overview and applications. PCR Methods Appl 3, S51-64.


Chapter 4 Genome stability and evolution of S. acidocaldarius

Data from this chapter appear in the manuscript published in The ISME Journal (International Society for Microbial Ecology) titled “Genomic evidence of rapid, global-scale gene flow in a Sulfolobus species”. Authors: D. Mao and D. Grogan. Advance online publication 15 March 2012.

4.1 Introduction

The genomes of organisms undergo changes resulting from several factors. Mutations introduced during replication or as a result of DNA damage can become fixed if they do not confer a strong disadvantage on the organism. Simple repetitive elements can cause replication slippage, and as a result variation in the lengths of these elements is frequently seen in sequenced genomes (Richards and Sutherland, 1994). Genomes can also be subject to change by various other factors. Horizontal gene transfer (HGT) is a phenomenon by which organisms can acquire genetic material from closely related or sometimes even distant species, primarily via transformation with naked DNA, transduction and conjugation (Gogarten and Townsend, 2005; Thomas and Nielsen, 2005). Insertion sequences, transposons, and miniature inverted-repeat transposable elements can also shuffle and rearrange the genome during their transit from one location to another (Bennetzen, 2005; Mahillon and Chandler, 1998). A study conducted in E. coli reported that less than 40% of protein coding genes were shared among three strains, suggesting that a majority of each genome is newly and differentially acquired (Welch et al., 2002). As more complete genome sequences of prokaryotes are obtained, the emerging picture strongly argues that HGT contributes to the variability seen between different isolates of the same species (Ochman et al., 2000).

The evolution of microbes has been a subject of debate, with two primary schools of thought arguing for and against allopatric speciation (independent evolution of different populations of a species separated by geographic barriers due to limited gene flow) in microorganisms. The argument against allopatric speciation in microorganisms is exemplified by the Baas-Becking hypothesis that microorganisms are everywhere, but the environment selects (de Wit and Bouvier, 2006). Numerous studies have shown the efficiency with which microorganisms disperse, in support of this hypothesis (Finlay, 2002; Hooper et al., 2008).

Countering this are studies that provide evidence for allopatric speciation in microorganisms with very specialized habitats that may be sparsely distributed geographically. The most convincing evidence comes from a study conducted using S. islandicus, a sister species of S. acidocaldarius, that demonstrated a geographic signature in genome sequences of various natural isolates from different locations; phylogenetic trees constructed from the genomes of the various isolates showed that strains from closer geographical locations grouped together (average divergences for distances of 1500 and 6500 km were 4.6 × 10⁻³ and 1.1 × 10⁻², respectively) (Reno et al., 2009). In other words, sequence divergence was seen to increase with increasing distance between the habitats from which the various strains were isolated. Reno et al. interpret their findings as resulting from the failure of viable cells to disperse successfully over long distances (Reno et al., 2009).

In order to see whether S. acidocaldarius showed similar divergence across large distances, genome sequences of three natural isolates from North America, Germany and Japan were analyzed for polymorphisms. These are the only three strains of S. acidocaldarius isolated to date. The average geographical separation of these strains was about 8200 km. This study directly tested the hypothesis that allopatric speciation of microbes, which have highly specialized growth requirements, results from inefficient dispersal of viable cells over large distances. The results showed that the genomes of three S. acidocaldarius isolates were highly conserved, suggesting efficient gene flow, in contrast to the results seen in S. islandicus.

In addition to querying the evolution of S. acidocaldarius in nature (as measured by sequence divergences of natural isolates), the plasticity of the S. acidocaldarius genome under the influence of mutagens was also investigated. An attempt was also made to identify genes that were involved in maintaining genetic fidelity in S. acidocaldarius. Former graduate student Greg Bell isolated a mutator strain of S. acidocaldarius, designated GBA, by random mutagenesis using UV and the methylating agent N-methyl-N′-nitro-N-nitrosoguanidine (MNNG) (Bell and Grogan, 2002). The rationale behind this approach is that random mutagenesis can cause loss-of-function mutations in genes involved in lowering spontaneous mutations, which will result in a mutator phenotype. UV causes cyclobutane pyrimidine dimers between TT and CC residues in DNA; translesion polymerases have been shown to bypass TT dimers accurately owing to their tendency to pair AA opposite dimers, but CC dimers are often paired with TT (Brash and Haseltine, 1982; Strauss et al., 1982). MNNG primarily causes formation of O⁶-methylguanine in DNA, which is mutagenic since it can pair with thymine and as a result cause G:C→A:T transition mutations (Sedgwick, 2004). The genome sequence of GBA was analyzed to identify the genes that were mutated and to look for candidate genes that could be involved in lowering spontaneous mutations and genomic rearrangements. To verify whether GBA was indeed a mutator, the spontaneous mutation spectrum of GBA was compared to that of wild type. Intermediate strains from each successive mutagen treatment were also analyzed for the number as well as the type of mutations that accumulated. The genome sequence results as well as the mutation spectra data contradict the claim of GBA being a mutator strain. The study also showed that MNNG primarily causes G:C→A:T substitutions in S. acidocaldarius, consistent with the effect reported in other model organisms (Sedgwick, 2004).

4.2 Materials and methods

4.2.1 Sulfolobus acidocaldarius strains

Strain N8 (gift from Norio Kurosawa) was isolated from Hokkaido in Japan (Kurosawa et al., 1995). Ron12/I (gift from Karl Stetter) was isolated from a uranium mining site in Ronnenberg, Germany (Fuchs et al., 1995). DG185 and DSM639 are synonymous with strain 98-3, which is the first S. acidocaldarius isolated from Yellowstone National Park, North America, by Thomas Brock (Brock et al., 1972). The genome sequence of DSM639 is deposited in GenBank under accession number NC_007181.1 (Chen et al., 2005).

4.2.1.1 History of the GBA strain

The original parent of GBA is strain C obtained from I. Holz (Max-Planck-Institut für Biochemie, Germany), initially mistaken to be S. solfataricus strain P1 but later found to be S. acidocaldarius (Grogan, 1991). DG1 was subjected to two rounds of selecting for papillae, where cells were plated for confluent growth and colonies that formed over the lawn were picked (Grogan, 1991). A clone from one such colony, designated DG6, was treated with UV and selected for mutants that were resistant to 5-fluoroorotic acid (FOA) in the presence of uracil (having a Pyr- phenotype), yielding strain DG64, which has a mutation in the pyrB4 gene (encoding aspartate carbamyl transferase) involved in the UMP biosynthetic pathway (Grogan and Gunsalus, 1993). DG64 was treated with MNNG and enriched for mutator strains by alternating MNNG-treated cells between two selective conditions: plating with FOA and uracil (selecting for loss of pyrEF function) and plating with orotic acid as a source of pyrimidine (selecting for restoration of pyrEF function) (Bell and Grogan, 2002). The alternating selection provides a way to enrich for mutator strains because a mutator will have an advantage in adapting to a changing environment. Several prospective mutator strains were identified by their high spontaneous rate of forming auxotrophic mutants, an indication of an elevated spontaneous mutation rate (Bell and Grogan, 2002). One such mutator, MM8, yielded a spontaneous mutant auxotrophic for a combination of amino acids, designated GBA.

[Figure 5 schematic: DG1 → (selection of papillae) → DG6 → (UV mutagenesis + selection) → DG64 → (MNNG; enriching for mutator phenotype by alternating selection) → MM8 (mutator phenotype) → (spontaneous auxotrophic mutant isolation) → GBA (auxotrophic mutant).]

Figure 5. Genealogy of GBA.

4.2.2 Genomic DNA extraction and sequencing

Genomic DNAs from S. acidocaldarius strains DG185, Ron12/I, N8, DG1, DG6, DG64, MM8 and GBA were extracted by Dennis Grogan and sequenced using the ILLUMINA multiplex platform (Iowa State University DNA facility). ILLUMINA deep sequencing generated millions of 76 bp reads that could be aligned to each other based on overlapping regions as well as to a reference genome sequence. Raw reads were aligned to the S. acidocaldarius strain DSM639 sequence from GenBank (Chen et al., 2005) by Dennis Grogan using CLC Genomics Workbench software at Duke University. Alignment parameters were defined such that reads that matched more than one location on the reference, or that had low coverage, were discarded; this resulted in a number of contigs for each strain, separated by several gaps corresponding to the ambiguous reads. To identify polymorphisms, contigs of each strain were compared to the DSM639 sequence. The ILLUMINA sequencing yielded an average coverage of >200X per base.
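The relationship between read count, read length and average coverage can be made concrete with a short calculation. The Python sketch below estimates the minimum number of 76 bp reads implied by 200X average coverage. The chromosome length of 2,225,959 bp is an assumption taken here for the DSM639 reference and is not stated in this section; the function name is illustrative.

```python
def reads_for_coverage(coverage: int, genome_bp: int, read_bp: int) -> int:
    """Smallest number of reads whose total length reaches coverage * genome_bp."""
    return -(-coverage * genome_bp // read_bp)  # ceiling division on integers


GENOME_BP = 2_225_959  # assumed length of the DSM639 chromosome
READ_BP = 76           # read length stated in the text

min_reads = reads_for_coverage(200, GENOME_BP, READ_BP)
print(min_reads)
```

The result, just under six million reads per strain, is consistent with the "millions of 76 bp reads" described above. It ignores reads discarded as ambiguous, so the number of raw reads actually required would be somewhat higher.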

In order to fill the gaps between the contigs and to generate a consensus sequence for each strain, primer pairs were designed to amplify the gaps (Table 1). Amplification was performed using thermocycler program DG50 (all parameters are the same as DG48 [see chapter 2] except that the annealing temperature is 50°C). Sanger sequencing of the amplicons was performed at Duke University’s DNA core facility.

All polymorphism sites identified in GBA by ILLUMINA sequencing were amplified from strains DG1, DG6, DG64 and MM8 and Sanger sequenced to trace the origins of the mutations across the various treatments of DG1 leading to GBA.

Table 4. Primers used to amplify loci of interest from S. acidocaldarius chromosome.

Primers                    Forward sequence (5'-3')     Reverse sequence (5'-3')
Gap 1 856729               ACTAATCAGGAGTTGCCG           CCAGTTGTTATCCGTTTCC
Gap 2 924277               CTTATCTCTGGTTTCGCTC          TCTGAGGAAGTAGCCAG
Gap 3 930230               TGTCATCTGGTAACTGCC           GCTAAGTACGGTATAGAGGTAG
Gap 4 987197               CAGCAGGGAATTCACAG            GTTCATGAGAAGCCTTATCTATG
Gap 5 995043               GCATAGATTCTCTAAGAGATGAC      CCTTCATTCATTAATACACACTTATAG
Gap 6 1610559              GTGGTAACTGCTGAGGAG           AGCGGTGAATCCAACAT
Gap 7 1633415              CTTGAAGTAAACAGAGCAATAGA      CAACTGAAGATCTAAGCTCAAG
Gap 8 1675041              CTCATTATCTCTGATTTCGACC       CCCAGTGTTTTAGTTTCTTGTC
Gap 9 1754359              GAATAAAGATAGTCGTGGCG         CAGTGTAGTTCGCGGT
Gap 10 1826676             GGGATTCATCAAAGTTCTAGTC       CAAATACGTGGGTTTATAAGTCT
Gap 11 1829828             GTTTCTGTGTGTCGTGG            ACTAGGAACGTCGAACAC
Gap 12 1942567             CCGAATGTATTTTCTAGTCCAC       AAATTGGAAAGGAGCTGG
Gap 13 1963233             GACTTATTAGCCCTACTCTTAAC      ACTGTAGTGGTTAAGAAATTAGAC
Gap 14 2049743             GCATGCCCAATTACTGATG          GATCAATCTGCTCTAAGACTC
Gap 15 2050945             AAGACAATATCATCTGGTAACTG      AGCAGGGATTGATTGTTTTC
50k del                    TCTTCACGTCTTCATCTGATATAG     GCAGAGCGGTGTATATTC
186412                     ATGTACTGACATGGAGGAG          CAGTATTGATGGTGGTATGTC
201458                     GTTGAAAGAACCACGTATACTAC      CGCTGTCAGTCCTATATATGAT
207684                     ATGTTATGGCCCCTTCAAG          CTAGAGAGCATATCGTAGTTGAG
224354                     AGTTCATGGAGGGCAG             AGATGTTGGAGAGGAGTG
359302                     CGCCTGGTATGTTAAAGATC         GTTTCTCCCGACCGT
396514                     CCATCATAAAGCCTGAATATAGAG     CGCAGCATTAGAGATTTGAG
417973                     CTTTTCTTTCGTGGCACTC          TCTTAGCTACGACTCTGC
419147                     GCTTACCACGACGTTG             ATCGTAGCCTCATAGAATACC
542546                     CGCGAATTGGTAACCAC            TTATGGAGTAGCAATATTGACAG
627121                     GAGATTAATAATAGGTGGTGGAAC     ATACGATCAAATCTGCCTCT
628473                     GAAGAGGGAATAGGTGGT           ATCCACTCCAGCCTC
673664                     ATAGCTCCTCCTGGTTTAG          TACGATTTAAGGTTGCTGAC
700719                     GAACAAAAGGGGTAAGGTAG         TGTTAATGTGGATAATGTCAACG
708838                     AATTAAAGATATCTACGGCGAAG      CATACTAATAGAACCGCAGC
863050, 863063, 863070     CGGTTATTATTCCAGCAGG          ATCCTTTTCTACCCAGTTTCT
932496, 932620, 932650     ACAAACCTAGACTCAGAGAAC        TCCTCCTGCAGTGTAATTAG
933655                     GGCTCTTATACTTCTGATAGATGC     ATCCCATAATACCAGAATATTGATTAC
963994                     GATCCCCTAGAGCTTGG            GGGAATTTTCACGTGTTATCT
1083082                    CATCCACCACTAGCTGC            GTCTAGCTCCAATATTATCCTCT
1095706                    CGAAGACAAGAATATCTCCAG        GGCGAGTCAGTCCTTCG
1133249                    GGCTTAACTATCAATCCATTCTC      GTATTGTGACGGTCTTGAAG
1225699, 1225968           CTCCTGTCAATACATCTGTG         AGTGTATTAAGTGAATTTCCTGAAG
1269571, 1270078           ATCTTCCATTACCCCTAGTTAC       TGGGAGATTATATGTGGTTACC
1271203                    AGGTGGCTGGTCTTAC             CCACTTTCAACTTCTATATCTGG
1278928                    GGAATCTGGTTACTCTACTACTC      ATGAGGTATCGCTAGTTCTTAC
1351450                    ACTATCCTGCCGTAGTTAATATC      TTATTTTAGCCTTCCATTCCTC
1791107                    GTAAAGCACTATTCCACCTG         GACTCTGAAAATCCTGTTATACC
1755721                    TGCACCTACCAAGACAC            GAGGTTTATGATGCCAGAG
2002816                    ATGGAATCCTCATCTGGTAC         GTGAACAGTACTACAGCTATAAC
2008438                    ATCTGCAACGACACTATGG          TCTTATACCTTGTGTTAAATGACG
DG210 specific (63001)     GACAGAAGGATTACGAGTCAG        CTCCTCAGTTAAGCTCAGAATG
DG212 specific             CCATCATAAAGCCTGAATATAGAG     CGCAGCATTAGAGATTTGAG
6661                       TCTGCAAGGTGTATACTTCATC       GTTACTTCCTATCATTCCTTATTACAC
81844                      CTTCCATCTAAAGTTACTGCC        TGTAATAGGCTTAGTGATGTTCG
124734                     TCTGCATATAAATTCTACCTACTATCA  GAAGTAAACGTAAACAGTGGAG
126202                     ACAGGAAGAGCGGATTATG          ATTCAACATAGCTTCCAACTAC
264056                     CATATTAGCGTTAGTAGAAGTGAG     GTCTCATGTACTGGGAAAGG
304011, 304038, 304052     CACACAACTTAATGCAACACC        ATGATGAGGCACTGCTG
329548                     TGATTGAAAGATTTAGCAGTAGTC     CCTTTTCTCCAGTAAGCTCC
348066, 348082, 348226     GTAGGCTCACGATGTAACTC         CTGTATCCCCTGGAATGTTAG
375745                     CCAATAGTAGCTCCACTTATACC      CCTCCAACATTTTAACAGCATC
623697                     GCTGATATTGCGGTACATAG         CGCCTACAGTAGTTGGATC
720076                     AGGAAATAATAGAGTTAGTGGCG      CACAAACAAATACGACCTGATAG
736558                     AAGGCTAACGTACAACTATCG        CATACAGGGACTGGATAAGG
813324                     GGCAAGCTTATGACGAAG           ACCTGGTGCTATGATCTTAC
836667                     CACCGTTAACATACCAGC           AGATAGGTGGCGCTG
838531                     GCAGTGAAGAATAGGTTAGGAG       GCTCTCCTGGTATTTACGTAG
1034071                    CTAACTTGTACTCCCTAGCAG        GAAAGAAATGTTGAGAGATATAAGTTC
1098620                    ATCGGTTCTGGTGCATAAC          ATTAAGTATGCGTGTATTATGCTC
1177753                    CATATGAATGCGGTAATTGTGG       CAGAGTTTATAGATTAAGACAAGAGG
1228163                    TCTAGCCCTCCTAAATATCG         GGGCTGAATGGAATGG
1298723                    AGATGCTATAAGGTCTGTAGTAAC     CGGTCAGATAGGAGAAAACAATG
1322185                    AAGGAGAAGGATTACTGGG          CTATATGCCTTCTCTCCTGG
1324668                    AGCTACAGTAAGAGGAGAAGAG       CGGTTATATAGCCAGTCAAGTC
1326340                    ATGATGAGATTAGAATGAGGAAGG     AACAATGGAAATAAAACTTCTTCATG
1361175, 1361176           TCTTTTCGTATTCCATTTCATCTAC    CGTGATGAGCTAGAGGAC
1377156                    TTCAATATGGGAGTCGGAATG        ACAACGTCTTCCCTCAATC
1458274                    CGACCACGTAACATGTTC           GAGTCACATCCTTGCTCA
1580287                    TTCCTATGTATACCCAGTCAATAG     AAAGGACTAATTACACTAAAGTTGG
1584867                    GTTAGTGTTACATGTGAGGAAAG      ATGTGGAACGCGTATCTTC
1615168                    GTAGCCCTCAAGTAATATCCG        CTGCTGTACTGGAATGGTC
1700422                    AACATTACATGGTGATAGGTAAAG     ACCTCTACACCAATGCTC
1701284                    TCTTCTCGAGGTTGACTAGC         GTAGAGAGGTTAGATTGACGTTG
1702322                    ACTTCAACTCCAGATATTACTCC      CGTTCTAGGTGGACTCGG
1719129                    TCTACAATAGCAGTATATTCACCAG    GTACATAGTATTGAGCAAGAATGAG
1837116, 1837119, 1837120,
1837131, 1837134, 1837161,
1837173, 1837174, 1837176  TCTAGCTAGCGAGTCCTC           CCTCTCTACCCTTCTCACAC
1878887                    CACGAGAACTGTTGGAG            TCTCTCTGCGTTCCTGA
1917860                    GCTTATATGGATAGTTTGTTAGTGAC   ATGGTGTGTTTGGCCTAC
1937684                    TGGCTAACTCCTCGAG             GACACAGGAGACATCAGG
2026424                    CATTGCCGACACAGTAGC           CCTGCTAGTGGTATAGTATATGC
2027418                    TACAGTACAGCTAACATATAACCAG    GACGAATCTACTGAATTCTTACAC
2162456                    GTCCAATCTCCTCAATTCTACTC      GAGATAGAATAGCAGCTACATACG

4.2.3 Fluctuation test for GBA strain

Fluctuation tests (Luria and Delbrück, 1943) were carried out using GBA to isolate spontaneous pyrE mutants in order to compare the spontaneous mutation spectrum to that of wild type. Frozen stock of GBA was scraped using a sterile toothpick and suspended in 200µL sdil buffer. The cell suspension was diluted, and dilutions of 10⁻⁴, 10⁻⁵ and 10⁻⁶ were spread on XT plates supplemented with uracil and CAA and incubated for 5-6 days at 75°C until 2-3 mm colonies formed. About 30 colonies were picked for inoculation into individual wells of a 96-well microtiter plate containing 200µl liquid XT-uracil-CAA and incubated at 75°C for 2 days. Liquid cultures from each microtiter well were plated on XT-uracil-CAA plates containing 0.05 mg/ml FOA and incubated at 75°C for 5-6 days to select for forward mutations in the pyrE gene. One colony per plate was picked, inoculated in 3 ml liquid XT-uracil-CAA and incubated for 2 days at 75°C. To randomize sampling, each plate had been marked beforehand with an arbitrarily placed dot, and the single colony closest to the dot was picked. Genomic DNA was extracted from 2 ml of the liquid cultures while 1 ml was cryopreserved at -80°C. pyrE was amplified with JRset1fwd and JRset1rev primers using thermocycler program DG48 and the amplicons were Sanger sequenced. Mutations in pyrE were identified by aligning the sequences to the wild-type pyrE sequence. pyrE mutations for a total of 73 mutants from 3 independent fluctuation tests were obtained. The mutation spectrum of GBA was compared to that of wild type with respect to the position of the mutations within the pyrE gene as well as the type of mutation, and tested for significant differences using the Hyper-G computer program (Cariello et al., 1994).
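The idea behind comparing two mutation spectra by Monte Carlo resampling can be sketched in a few lines of Python. This is not the Hyper-G program and makes no claim to reproduce its algorithm; it is a generic permutation test on a 2 x k table of mutation counts, using a Pearson chi-square statistic. The function names are hypothetical, and any counts passed to it in an example are invented, not data from this study.

```python
import random
from collections import Counter


def chi_square(a: Counter, b: Counter) -> float:
    """Pearson chi-square statistic for a 2 x k table of mutation counts."""
    n_a, n_b = sum(a.values()), sum(b.values())
    total = n_a + n_b
    stat = 0.0
    for category in set(a) | set(b):
        column = a[category] + b[category]
        if column == 0:
            continue  # category observed in neither spectrum
        for observed, n in ((a[category], n_a), (b[category], n_b)):
            expected = column * n / total
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p_value(spec_a, spec_b, n_iter=2000, seed=0):
    """Fraction of random relabelings at least as extreme as the observed table."""
    rng = random.Random(seed)
    a, b = Counter(spec_a), Counter(spec_b)
    observed = chi_square(a, b)
    pooled = list(a.elements()) + list(b.elements())
    n_a = sum(a.values())
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # reassign mutants to the two strains at random
        if chi_square(Counter(pooled[:n_a]), Counter(pooled[n_a:])) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)
```

A large p-value from a test of this kind means the two spectra are indistinguishable at the chosen resolution, which is the sense in which a mutant spectrum "did not differ significantly" from wild type.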

4.3 Results

Table 2. Polymorphisms across three S. acidocaldarius isolates. The positions listed for the indels are of the base immediately before the affected base.

               DSM639          Ron12/I              N8
No.     Type   Position  Base  Position  Base       Position  Base
1       BPS    63001     G     63001     A          63001     G
2       BPS    143619    G     143619    A          143619    G
3       BPS    186412    G     184612    G          186412    T
4       indel  207685          207685               207685    +GGA
5       BPS    223721    C     223721    T          223724    C
6       BPS    224355    A     224355    A          224358    G
7       BPS    330705    C     330705    T          330708    C
8       indel  349787          349787    +A         349790
9       indel  417972          417973    +AA        417975
10      BPS    543853    T     543856    C          543856    T
11      indel  638620          638623    +G         638623
12      indel  677699          677703    +TAT       677702
13      BPS    806092    T     806099    C          806095    T
14      BPS    933825    G     933832    A          933828    G
15      indel  988822          988829    +TA        988825
16      BPS    1049027   T     1049036   A          1049030   T
17      BPS    1083005   A     1083014   T          1083008   A
18      indel  1211243         1211254   +C         1211248
19      BPS    1441077   A     1441089   A          1441082   T
20      BPS    1476215   A     1476227   A          1476220   G
21      BPS    1513632   A     1513644   A          1513637   G
22      indel  1600666         1600678   +AA        1600671
23      indel  1610486         1610500   +54 bp     1610491
24      indel  1633413         1633481   -2145 bp   1633418
25      indel  1637249         1635172   +C         1637254
26      indel  1674989         1672913   +59 bp     1674994
27      BPS    1700421   G     1698404   A          1700426   G
28      BPS    1709809   T     1707792   C          1709814   T
29      indel  1754362         1752345   +210 bp    1754367
30      BPS    1755552   C     1753745   T          1755557   C
31      BPS    1813552   G     1811745   A          1813557   G
32      indel  1846775         1844968              1846780   -49429 bp
33      BPS    1865777   G     1863970   A          (absent)
34, 35  indel  2008266         2006459   +A         1958842   -A
36      BPS    2015256   C     2013450   T          1965831   C
37      BPS    2016709   C     2014903   C          1967284   T
38      indel  2137109         2135303              2087684   -A
39      indel  2165081         2163274   +G         2115655
40      BPS    2165619   T     2163814   C          2116193   T
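The same site carries different coordinates in the three strains because each upstream insertion or deletion shifts every position after it. The Python sketch below illustrates that bookkeeping for the first few rows of Table 2 only; it is an illustration, not the procedure used to build the table, and the names are hypothetical. Following the table's convention, an indel is anchored at the base immediately before the affected base, so the anchor position itself is not shifted.

```python
from typing import List, Tuple

# (reference position immediately before the affected base, net length change)
Indel = Tuple[int, int]


def strain_coordinate(ref_pos: int, indels: List[Indel]) -> int:
    """Map a DSM639 coordinate to a strain coordinate given upstream indels."""
    return ref_pos + sum(delta for anchor, delta in indels if anchor < ref_pos)


N8_INDELS = [(207685, +3)]                   # row 4: +GGA
RON12_INDELS = [(349787, +1), (417972, +2)]  # rows 8 and 9: +A and +AA

print(strain_coordinate(223721, N8_INDELS))     # row 5 in N8
print(strain_coordinate(543853, RON12_INDELS))  # row 10 in Ron12/I
```

With these inputs the function returns 223724 and 543856, matching the N8 and Ron12/I positions listed for rows 5 and 10.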

4.3.1 The genome of S. acidocaldarius is globally conserved

The total number of polymorphisms across three S. acidocaldarius strains isolated from geographically distant locations was 40 (16 indels and 24 BPSs) (Table 2). Pairwise comparisons between the strains showed 30, 9 and 38 polymorphisms between Ron12/I:DG185, N8:DG185 and Ron12/I:N8, respectively.

Ron12/I showed maximum divergence among the three strains, probably because it was isolated from a waste depository of a uranium mine where the cells could have been subjected to ionizing radiation. N8 showed a remarkable difference, a missing 49429 bp fragment, compared to the other strains; this could reflect either loss of that fragment by N8 or acquisition of new DNA by a common ancestor that migrated to form DSM639 and Ron12/I.

S. acidocaldarius isolates showed an average nucleotide divergence of 1.14 × 10⁻⁵ for a geographical separation of ~8200 km, as opposed to 1.1 × 10⁻² seen in S. islandicus strains with a separation of ~6500 km. This amounts to about 1000-fold lower divergence in S. acidocaldarius compared to S. islandicus.

Table 3. Summary of polymorphisms across natural S. acidocaldarius isolates.

Strain comparison    Total polymorphisms  Nucleotide divergence
Ron12/I vs DSM639    30                   1.30 × 10⁻⁵
N8 vs DG185          9                    4.04 × 10⁻⁶
Ron12/I vs N8        38                   1.71 × 10⁻⁵
Average              25.7                 1.14 × 10⁻⁵
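The average divergence and the fold difference relative to S. islandicus can be checked from the tabulated values. The short Python sketch below uses only numbers given in the text and Table 3; the variable names are illustrative.

```python
# Pairwise nucleotide divergences from Table 3
divergences = {
    "Ron12/I vs DSM639": 1.30e-5,
    "N8 vs DG185": 4.04e-6,
    "Ron12/I vs N8": 1.71e-5,
}
ISLANDICUS_DIVERGENCE = 1.1e-2  # S. islandicus at ~6500 km (Reno et al., 2009)

mean_divergence = sum(divergences.values()) / len(divergences)
fold = ISLANDICUS_DIVERGENCE / mean_divergence

print(f"mean divergence: {mean_divergence:.3g}")
print(f"fold difference: {fold:.0f}")
```

The ratio comes out somewhat below 1000, which is the basis for describing the difference as about 1000-fold.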

4.3.2 Possible sequencing errors in the DSM639 reference genome sequence

While comparing the consensus sequence of each natural isolate to the DSM639 sequence, several polymorphisms were found to be shared by all three strains, Ron12/I, GBA and N8 (Table 2). The most obvious explanations for this observation were that they were either true polymorphisms or sequencing errors in the DSM639 strain that was sequenced by Chen et al. In order to check whether these polymorphisms occurred in DSM639, the loci for the polymorphisms common to all three strains were amplified from DG185, which is the designation given to our laboratory stock of DSM639, and the amplicons were sequenced by Sanger sequencing. The Sanger sequencing results showed that DG185 possessed the same “alleles” as Ron12/I, GBA and N8, suggesting that the differences were sequencing errors in the DSM639 sequence. This is a plausible explanation, as the authors reported 5.8X sequence coverage while assembling the DSM639 chromosome using shotgun cloning and Sanger sequencing (Chen et al., 2005). An additional confirmation would be to sequence the same loci in the DSM639 strain that was originally sequenced by Chen et al.

Table 3. Corrections made to DSM639 sequence (Accession #: NC_007181.1). Chromosomal coordinates and the corresponding corrections are listed. For indels, listed position is the position of the nucleotide immediately before the affected site.

DSM639          DG185 (corrected)
Position  Base  Position  Change
186412    T     186412    G
201457          201457    +A
224370    T     224371    C
359301          359302    -T
396514    A     396514    G
419146          419146    +C
542545          542546    -A
627121    G     627121    T
627222    A     627222    T
627265    A     627265    T
628472          628472    +A
673663          673664    -A
700719    G     700719    A
708838    C     708838    A
863049          863049    -T
863062          863061    -T
863069          863067    -T
932496    G     932493    T
932620    T     932617    C
932650    C     932647    T
933655    T     933652    C
963993          963990    +T
1083081         1083079   +TT
1095706   T     1095704   C
1133248         1133246   +T
1225698         1225697   -T
1225967         1225965   +T
1269571   A     1269570   G
1270078   C     1270077   T
1271203   C     1271202   T
1278928   A     1278927   G
1351450   C     1351449   T
1754276         1754275   -168 bp
1755721   T     1755552   C
1791106         1790937   -G
2002815         2002645   -C

4.3.3 Genomic and mutation spectra data contradict mutator phenotype of GBA

The total number of polymorphisms identified in GBA compared to the DSM639 sequence was 52, which is a remarkably low number of mutations for a strain subjected to mutagens. In E. coli, MNNG treatment resulted in ~4000 mutations genome-wide (Harper and Lee, 2012). When these mutations were traced back across the GBA lineage, 19 of the 52 were found to have been introduced during MNNG treatment and only 5 during UV treatment (Table 4). Surprisingly, half of the mutations (26/52) were differences between DG1 and DSM639 (see discussion). No differences were found between MM8 and GBA. In other words, MM8 and GBA were found to be genetically identical. Thus, the sequence data contradict the published result that claims MM8 to be a mutator strain that yielded GBA, an auxotrophic mutant, by spontaneous mutation. One would expect to see multiple mutations in GBA if MM8 were indeed a mutator.

In addition to the sequence data, the mutation spectrum of the GBA strain was obtained and compared to that of wild type. Although GBA was reported to have an elevated mutation rate compared to wild type (Bell and Grogan, 2002), comparing mutation spectra provides a more sensitive probe of phenotypic differences than measuring mutation rate. The mutation spectrum of GBA did not differ significantly from that of wild type with respect to either the types or the positions of mutations.

Table 4. Summary of mutations across the GBA lineage. Differences found in pairwise comparisons of strains are listed under total mutations along with identity of the strains.

Total Transitions Transversions Mutations

A:T A:T T:A C:G Strain G:C C:G Deletions Insertions       A:T A:T G:C C:G A:T G:C

DG1    26 (vs DSM639)   7   9   1   3   3   -   1   2
DG6    2 (vs DG1)       2   -   -   -   -   -   -   -
DG64   5 (vs DG6)       1   4   -   -   -   -   -   -
MM8    19 (vs DG64)     1   16  -   -   -   1   -   1
GBA    0 (vs MM8)       -   -   -   -   -   -   -   -
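The per-step totals in Table 4 can be tallied to confirm that they account for all 52 differences between GBA and DSM639. The sketch below is simple bookkeeping on the totals reported above; the dictionary keys and variable names are illustrative labels and are not part of the original analysis.

```python
# Differences gained at each step of the GBA lineage (totals from Table 4).
# Keys are illustrative labels: "strain (compared with its predecessor)".
step_counts = {
    "DG1 (vs DSM639)": 26,
    "DG6 (vs DG1)": 2,
    "DG64 (vs DG6)": 5,
    "MM8 (vs DG64)": 19,
    "GBA (vs MM8)": 0,
}

total = sum(step_counts.values())

# Share of the total contributed by each step, as a percentage.
shares = {step: 100 * n / total for step, n in step_counts.items()}

for step, n in step_counts.items():
    print(f"{step:18s} {n:3d}  {shares[step]:5.1f}%")
print(f"{'Total':18s} {total:3d}")
```

On these numbers, half of the differences separate DG1 from DSM639, and the step from DG64 to MM8 contributes a little over a third.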

4.4 Discussion

Increased genetic divergence with increasing distance seen in S. islandicus was hypothesized to be due to the inability of viable cells to migrate large distances successfully (Reno et al., 2009). Since S. acidocaldarius and S. islandicus share similar growth conditions and habitats, this hypothesis would predict similar divergence over equivalent distances among S. acidocaldarius isolates. However, the results from S. acidocaldarius strains separated by a distance of about 8200 km showed about 1000-fold lower divergence than that seen in S. islandicus strains separated by distances of about 6500 km. The results from S. acidocaldarius indicate efficient gene flow over large distances, in contrast to the results obtained from S. islandicus.

The remarkably low number of sequence differences seen in natural isolates of S. acidocaldarius could be due to the genome being highly intolerant to mutations (in other words, under strong purifying or negative selection) in natural populations. However, most of the substitutions were non-synonymous, with a dN/dS of 18/3 = 6, which suggests that positive selection is at play. Moreover, almost half of the polymorphisms (average of 45.3%) are indels, many of which disrupt protein-coding frames.
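As a quick check on the ratio quoted above, the sketch below computes it directly from the substitution counts. It assumes that the reported 18/3 is a ratio of raw counts; a formal dN/dS estimate would also normalize each count by the number of non-synonymous and synonymous sites, which are not given here, so the function name and its interpretation are illustrative only.

```python
def raw_substitution_ratio(nonsyn: int, syn: int) -> float:
    """Ratio of non-synonymous to synonymous substitution counts.

    This is a count ratio only. It is not normalized per available site,
    so it is a rough indicator rather than a formal dN/dS estimate.
    """
    if syn == 0:
        raise ValueError("no synonymous substitutions; ratio is undefined")
    return nonsyn / syn


# Counts reported in the text: 18 non-synonymous, 3 synonymous.
ratio = raw_substitution_ratio(18, 3)
print(ratio)
```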

The low sequence divergence between S. acidocaldarius isolates cannot be attributed to enhanced replication fidelity compared to S. islandicus. The rates of replication errors in S. acidocaldarius and S. islandicus measured in the pyrEF genes are comparable, at about 3 × 10^-7 and 4.9 × 10^-7 mutations per cell division respectively (Blount and Grogan, 2005).

Survival assays were conducted to see if S. acidocaldarius fared better than S. islandicus under two non-growing conditions, low temperature and desiccation, which might explain its higher success in migration. Interestingly, S. islandicus had higher survival rates than S. acidocaldarius in both tested conditions, though both strains showed a drastic decrease in viable cells under both conditions (Mao and Grogan, 2012).

A number of factors can be hypothesized to contribute to the differences in divergence seen between S. acidocaldarius and S. islandicus isolates. Sulfolobus acidocaldarius has a restriction-modification (R-M) system, SuaI, that cleaves 5’-GGCC-3’ sites that do not have the inner cytosine residue methylated at the N4 position (Prangishvili et al., 1985), which can limit horizontal gene transfer (HGT). No R-M system has yet been identified in S. islandicus.

No virus has yet been identified that infects S. acidocaldarius, while a number of viruses capable of infecting S. islandicus exist (Arnold et al., 2000; Prangishvili et al., 1999). Viruses can promote diversity in their hosts; hosts with mutations that confer resistance to infection will survive. On the other hand, viruses can undergo mutations that enable them to counter resistance mechanisms in the host. This constant interplay can drive diversification of both viruses and their hosts.

Sequencing of the GBA strain and genotyping the polymorphic sites in the predecessor strains yielded surprising results. Out of the 52 polymorphisms (mutations) identified in the GBA genome compared to the DSM639 genome, 26 were already present in DG1 prior to any induced mutagenesis. DG1 (or strain C) was initially thought to be S. solfataricus strain P1; after being identified correctly as S. acidocaldarius, it was assumed to be contamination from the S. acidocaldarius strain first isolated by Thomas Brock (strain 98-3 or DSM639). However, the differences between the DG1 and DSM639 genomes suggest that DG1 could in fact be a fourth strain of S. acidocaldarius. Two mutations accumulated while selecting for papillae and 5 during UV mutagenesis. Two of the mutations introduced during UV treatment are adjacent G:C→A:T substitutions, which is consistent with the formation of a CC dimer that is converted to TT as a result of translesion synthesis, as described in other organisms (de Gruijl et al., 2001). This suggests that S. acidocaldarius exhibits translesion synthesis, though it is unclear why only one tandem mutation of this type was found as a result of UV mutagenesis. Direct reversal by Phr photolyase has been demonstrated in S. acidocaldarius (Sakofsky et al., 2011), though an NER-like system has not yet been identified. Also, the genome of S. acidocaldarius shows the presence of one putative translesion polymerase gene, dbh (Chen et al., 2005), though the data do not address whether Dbh was responsible for the mutation.

The remaining 19 mutations accumulated during the MNNG treatment; 15 of the 19 were G:C→A:T, which is consistent with the type of substitution reported as a result of MNNG treatment in other model organisms (Sedgwick, 2004).

Another surprising observation was that there were no differences in the genome sequences of MM8 and GBA even though GBA was obtained as a spontaneous auxotrophic mutant from MM8. Therefore, the auxotrophic phenotype of GBA may be attributable to epigenetic factors.

Even though a higher rate of spontaneous mutations has been reported for GBA (Bell and Grogan, 2002), no differences were found in the mutation spectrum when compared to wild type with respect to positions of the mutations as well as types of mutations. Since a loss in genetic fidelity has been reported in the MM8 strain, the mutations accumulated between DG64 and MM8 were analyzed for the ORFs that were affected; no ORF was found to be obviously involved with DNA metabolism. In fact, none of the total 52 mutations affected any ORF predicted to play a role in DNA metabolism. However, a few of the affected ORFs are conserved among the archaea and have functions that are as yet unknown.

In summary, the genome of S. acidocaldarius was observed to be very stable in nature as well as when the organism was bombarded with mutagens in the laboratory. In addition to mechanisms that limit replication errors, there must be regulatory pathways that prevent intrachromosomal HR, which would result in genome rearrangements and deletions. HR in S. acidocaldarius was shown to be uninhibited by increasing mismatch number, which suggests that recombination between similar sequences in the chromosome could occur if not regulated. Analysis of the S. acidocaldarius genome revealed about four thousand imperfect repeats which could participate in such recombinational events (Grogan, personal communication). The dearth of such rearrangements in the genomes of the natural isolates of S. acidocaldarius suggests there must be pathways that police HR.

Efficient gene flow across large distances in S. acidocaldarius contrasts with the increasing genome divergence with increasing distance seen in S. islandicus. This suggests that factors other than mere geographical separation, for example the aforementioned biotic factors such as viruses, might contribute to the biogeography seen in S. islandicus. The mechanism(s) enforcing genome stability in S. acidocaldarius remain elusive.

4.5 References

Arnold, H.P., Zillig, W., Ziese, U., Holz, I., Crosby, M., Utterback, T., Weidmann, J.F., Kristjanson, J.K., Klenk, H.P., Nelson, K.E., et al. (2000). A novel lipothrixvirus, SIFV, of the extremely thermophilic crenarchaeon Sulfolobus. Virology 267, 252‐266.

Bell, G.D., and Grogan, D.W. (2002). Loss of genetic accuracy in mutants of the thermoacidophile Sulfolobus acidocaldarius. Archaea 1, 45‐52.

Bennetzen, J.L. (2005). Transposable elements, gene creation and genome rearrangement in flowering plants. Curr Opin Genet Dev 15, 621‐627.

Blount, Z.D., and Grogan, D.W. (2005). New insertion sequences of Sulfolobus: functional properties and implications for genome evolution in hyperthermophilic archaea. Mol Microbiol 55, 312‐325.

Brash, D.E., and Haseltine, W.A. (1982). UV‐induced mutation hotspots occur at DNA damage hotspots. Nature 298, 189‐192.

Brock, T.D., Brock, K.M., Belly, R.T., and Weiss, R.L. (1972). Sulfolobus: a new genus of sulfur‐oxidizing bacteria living at low pH and high temperature. Arch Mikrobiol 84, 54‐68.

Cariello, N.F., Piegorsch, W.W., Adams, W.T., and Skopek, T.R. (1994). Computer program for the analysis of mutational spectra: application to p53 mutations. Carcinogenesis 15, 2281‐2285.

Chen, L., Brugger, K., Skovgaard, M., Redder, P., She, Q., Torarinsson, E., Greve, B., Awayez, M., Zibat, A., Klenk, H.P., et al. (2005). The genome of Sulfolobus acidocaldarius, a model organism of the Crenarchaeota. J Bacteriol 187, 4992‐4999.

de Gruijl, F.R., van Kranen, H.J., and Mullenders, L.H. (2001). UV‐induced DNA damage, repair, mutations and oncogenic pathways in skin cancer. J Photochem Photobiol B 63, 19‐27.

de Wit, R., and Bouvier, T. (2006). 'Everything is everywhere, but, the environment selects'; what did Baas Becking and Beijerinck really say? Environ Microbiol 8, 755‐758.

Finlay, B.J. (2002). Global dispersal of free‐living microbial species. Science 296, 1061‐1063.

Fuchs, T., Huber, H., Teiner, K., Burggraf, S., and Stetter, K.O. (1995). Metallosphaera prunae, sp nov, a novel metal‐mobilizing, thermoacidophilic archaeum, isolated from a uranium mine in Germany. Syst Appl Microbiol 18, 560‐566.

Gogarten, J.P., and Townsend, J.P. (2005). Horizontal gene transfer, genome innovation and evolution. Nat Rev Microbiol 3, 679‐687.

Grogan, D.W. (1991). Selectable mutant phenotypes of the extremely thermophilic archaebacterium Sulfolobus acidocaldarius. J Bacteriol 173, 7725‐7727.

Grogan, D.W., and Gunsalus, R.P. (1993). Sulfolobus acidocaldarius synthesizes UMP via a standard de novo pathway: results of biochemical‐genetic study. J Bacteriol 175, 1500‐1507.

Harper, M., and Lee, C.J. (2012). Genome‐wide analysis of mutagenesis bias and context sensitivity of N‐methyl‐N'‐nitro‐N‐nitrosoguanidine (NTG). Mutat Res 731, 64‐67.

Hooper, S.D., Raes, J., Foerstner, K.U., Harrington, E.D., Dalevi, D., and Bork, P. (2008). A molecular study of microbe transfer between distant environments. PLoS One 3, e2607.

Kurosawa, N., Sugai, A., Fukuda, I., Itoh, T., Horiuchi, T., and Itoh, Y.H. (1995). Characterization and Identification of Thermoacidophilic Archaebacteria Isolated in Japan. J Gen Appl Microbiol 41, 43‐52.

Luria, S.E., and Delbruck, M. (1943). Mutations of Bacteria from Virus Sensitivity to Virus Resistance. Genetics 28, 491‐511.

Mahillon, J., and Chandler, M. (1998). Insertion sequences. Microbiol Mol Biol Rev 62, 725‐774.

Mao, D., and Grogan, D. (2012). Genomic evidence of rapid, global‐scale gene flow in a Sulfolobus species. Isme J.

Ochman, H., Lawrence, J.G., and Groisman, E.A. (2000). Lateral gene transfer and the nature of bacterial innovation. Nature 405, 299‐304.

Prangishvili, D., Arnold, H.P., Gotz, D., Ziese, U., Holz, I., Kristjansson, J.K., and Zillig, W. (1999). A novel virus family, the Rudiviridae: Structure, virus‐host interactions and genome variability of the sulfolobus viruses SIRV1 and SIRV2. Genetics 152, 1387‐1396.

Prangishvili, D.A., Vashakidze, R.P., Chelidze, M.G., and Gabriadze, I. (1985). A restriction endonuclease SuaI from the thermoacidophilic archaebacterium Sulfolobus acidocaldarius. FEBS Lett 192, 57‐60.

Reno, M.L., Held, N.L., Fields, C.J., Burke, P.V., and Whitaker, R.J. (2009). Biogeography of the Sulfolobus islandicus pan‐genome. Proc Natl Acad Sci U S A 106, 8605‐8610.

Richards, R.I., and Sutherland, G.R. (1994). Simple repeat DNA is not replicated simply. Nat Genet 6, 114‐116.

Sakofsky, C.J., Runck, L.A., and Grogan, D.W. (2011). Sulfolobus mutants, generated via PCR products, which lack putative enzymes of UV photoproduct repair. Archaea 2011, 864015.

Sedgwick, B. (2004). Repairing DNA‐methylation damage. Nat Rev Mol Cell Biol 5, 148‐157.

Strauss, B., Rabkin, S., Sagher, D., and Moore, P. (1982). The role of DNA polymerase in base substitution mutagenesis on non‐instructional templates. Biochimie 64, 829‐838.

Thomas, C.M., and Nielsen, K.M. (2005). Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol 3, 711‐721.

Welch, R.A., Burland, V., Plunkett, G., 3rd, Redford, P., Roesch, P., Rasko, D., Buckles, E.L., Liou, S.R., Boutin, A., Hackett, J., et al. (2002). Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America 99, 17020‐17024.

Chapter 5 DNA exchange at high temperatures

5.1 Introduction

Conjugation is the transfer of DNA between cells by means of physical contact between the participating cells (Clark and Adelberg, 1962). If a copy of a chromosomal fragment is transferred, the transferred DNA can then be integrated into the recipient chromosome by homologous recombination. Physical contact is mostly mediated by structures called pili or in some cases through actual fusion of the cells (Alvarez-Martinez and Christie, 2009). Most prokaryotic conjugation involves a plasmid that encodes the components necessary for transferring DNA (Willetts and Skurray, 1980).

Conjugation was demonstrated in S. acidocaldarius by mixing two distinct Pyr- mutants, which generated Pyr+ recombinants (Grogan, 1996). It was also observed that treating mating pairs with UV stimulated conjugation as seen by the increased yield of recombinants (Schmidt et al., 1999). A type IV pilus, whose components and assembly proteins are encoded by genes of the ups operon, was found to be an essential requirement for conjugation (Ajon et al., 2011).

Interestingly, the ups operon in S. solfataricus is among the most highly upregulated genes in response to UV exposure (Frols et al., 2007). Other studies have shown that S. solfataricus cells aggregate upon UV exposure and that the aggregation is mediated by the type IV pili (Frols et al., 2008). The name ups is derived from UV-inducible type IV pili operon of Sulfolobus; the operon consists of upsA and upsB, which encode pilin-like proteins, upsF, which encodes a transmembrane protein, and upsE, which encodes an ATPase responsible for assembly of the pili (Frols et al., 2008). All Sulfolobales genomes carry the ups operon. Ajon et al. showed that upon deleting the upsE gene, S. acidocaldarius cells failed to aggregate and form recombinants (Ajon et al., 2011). At least one of the conjugating strains needed to be Ups+ in order to obtain recombinants. In addition, when a Ups+ strain was irradiated with UV and mated with an unirradiated Ups- strain, recombination was stimulated. However, when the Ups- strain was irradiated and crossed with an unirradiated Ups+ strain, no stimulation of recombination was observed, suggesting that when the genome of S. acidocaldarius undergoes UV damage, the pilus is used to import DNA in order to repair the damage.

This study aims at understanding the DNA transaction that occurs during conjugation in S. acidocaldarius at the molecular level. Two important components will be investigated: (i) the size(s) of DNA(s) transferred during conjugation and (ii) whether the pilus allows bidirectional transfer of DNA between donor and recipient. The study employs whole genome mapping of markers in order to address these questions.

Preliminary results show that most of the donors transfer a short segment of DNA. However, a minority of the recombinants show transfer of at least 122kb of DNA.

5.2 Experimental design

Two S. acidocaldarius pyrE mutants with distinct pyrE mutations are mated with each other to produce Pyr+ recombinants. In order to produce Pyr+ recombinants, one of the strains will have to replace its pyrE mutation with a functional sequence from the other strain, which will result in a fully functional pyrE. No recombinants will be recovered if, in addition to the functional sequence, the mutation from the pairing strain is also transferred. The mating strains are chosen such that they have a number of polymorphisms that are spread out throughout the genome that can be used as non-selected markers.

Mating pairs were chosen based on the polymorphisms identified genome-wide in the different S. acidocaldarius strains (Chapter 4 of this document). MR31 and SA1 are isogenic except for an additional deletion of the upsE gene in SA1, which renders the strain incapable of conjugation (Ajon et al., 2011). SA1 was derived from MR31, which was in turn derived from DG6 (Reilly and Grogan, 2001). A spontaneous pyrE mutant derived from Ron12/I, designated DG269, was chosen as the strain to be mated with SA1 and MR31 because the locations of polymorphisms between Ron12/I and DG6 (identified in the study described in chapter 4 of this document) allow for better genome-wide coverage. DG269 was isolated by Dennis Grogan as a stable spontaneous FOA-resistant mutant (no reversion of the mutation was seen) and was tested for its ability to form recombinants when crossed with SA1. With respect to the selectable marker, DG269 has a single T insertion in a 4nt run of T residues located at positions 38-43 of the pyrE coding sequence, while MR31 and SA1 have an 18bp deletion of sequence 154-171 of the pyrE coding sequence (Grogan, personal communication). In order to form recombinants (having a functional pyrE), either strain should replace its pyrE mutation with the correct sequence from the other mating partner (Fig. 1).

[Figure 6 panel labels: (A) pyrE, Ron12/I, T at 38-43; (B) pyrE, MR31/SA1, 154-171; (C) functional pyrE, 594 bp.]

Figure 6. Selectable markers used in conjugation. (A) pyrE in DG269 showing a T insertion between coordinates 38‐43. (B) pyrE in MR31 and SA1 showing a deletion (triangle) of sequence 154‐171. (C) Functional pyrE in the recombinant that can be recovered only if: (A) replaces the T insertion mutation with the corresponding sequence from (B) without the T insertion, (B) replaces the 18bp deletion with the correct sequence from (A).
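The selection logic for Pyr+ recombinants can be sketched as a small interval check. This is a deliberately simplified, hypothetical model and not the procedure used in the study: it assumes that a single contiguous tract of donor sequence replaces the recipient sequence, that the tract must span the recipient's mutated interval, and that any overlap with the donor's own mutated interval carries that mutation along. The function and variable names are illustrative.

```python
# Mutated intervals in pyrE coding-sequence coordinates (inclusive),
# taken from the description of the selectable markers.
LESIONS = {
    "DG269": (38, 43),        # single T insertion in a run of T residues
    "MR31/SA1": (154, 171),   # 18 bp deletion
}


def overlaps(a, b):
    """True if inclusive intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def covers(outer, inner):
    """True if inclusive interval outer fully contains interval inner."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def is_pyr_plus(recipient, donor, tract):
    """Hypothetical rule: the recombinant is Pyr+ if the donor tract replaces
    the recipient's lesion without bringing in the donor's own lesion."""
    return covers(tract, LESIONS[recipient]) and not overlaps(tract, LESIONS[donor])


# DG269 as recipient: the donor tract must span 38-43 and stop short of 154.
print(is_pyr_plus("DG269", "MR31/SA1", (1, 100)))    # tract ends between the lesions
print(is_pyr_plus("DG269", "MR31/SA1", (1, 200)))    # tract also carries the deletion
# MR31/SA1 as recipient: the donor tract must span 154-171 and start after 43.
print(is_pyr_plus("MR31/SA1", "DG269", (100, 594)))
```

Under this model a tract that covers both mutated intervals never yields a Pyr+ recombinant, which matches the statement that no recombinants are recovered if the mutation from the pairing strain is also transferred.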

The non-selected markers (polymorphisms between DG269 and MR31/SA1) chosen for this study are listed in Table 5. Scoring the non-selected markers enables identification of the donor and recipient strains as well as the fragment size(s) of DNA(s) transferred.

Table 5. Polymorphisms between Ron12/I and MR31 and SA1 used as genetic markers in this study. The types of polymorphisms are identified for each strain with coordinates from DSM639. The primer pairs used to amplify loci of interest are also listed.

Reference position (DSM639)   DG269 (Ron12/I)   MR31/SA1   Fwd                         Rev
81844      T       G        CTTCCATCTAAAGTTACTGCC       TGTAATAGGCTTAGTGATGTTCG
304011     T       A
304052     T       A
329548     T       C
348066     G       A
348082     T       G
348226     G       A
375745     -       A
623697     G       A
720076     A       G        AGGAAATAATAGAGTTAGTGGCG     CACAAACAAATACGACCTGATAG
836667     -       T
838531     -       A
1228163    T       C        TCTAGCCCTCCTAAATATCG        GGGCTGAATGGAATGG
1298723    G       A        AGATGCTATAAGGTCTGTAGTAAC    CGGTCAGATAGGAGAAAACAATG
1610486    513bp   459bp    GTGGTAACTGCTGAGGAG          AGCGGTGAATCCAACAT
1633413    130bp   2275bp   AAGCTCCAGAACCTGC            CCAAATATCAGTGACTTCAGAAAC
1674990    687bp   628bp    CTCATTATCTCTGATTTCGACC      CATAAGGACCTGGAGTGATC
1754390    561bp   351bp    GAATAAAGATAGTCGTGGCG        CAGTGTAGTTCGCGGT
1755721    T       G        TGCACCTACCAAGACAC           GAGGTTTATGATGCCAGAG
1837116    T       C
1837176    A       T
1917860    C       A
2027418    TT      --       TACAGTACAGCTAACATATAACCAG   GACGAATCTACTGAATTCTTACAC
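Four of the markers in Table 5 are length polymorphisms that can be scored by amplicon size. As a small worked example, the sketch below computes the size difference between the two alleles at each of these loci from the amplicon lengths listed in the table; the variable names are illustrative.

```python
# Amplicon sizes in bp for the length polymorphisms listed in Table 5:
# reference position -> (DG269 (Ron12/I) allele, MR31/SA1 allele)
amplicon_sizes = {
    1610486: (513, 459),
    1633413: (130, 2275),
    1674990: (687, 628),
    1754390: (561, 351),
}

# Absolute size difference between the two alleles at each locus.
size_differences = {
    position: abs(ron12 - mr31)
    for position, (ron12, mr31) in amplicon_sizes.items()
}

for position, diff in size_differences.items():
    print(f"{position}: alleles differ by {diff} bp")
```

The 210 bp difference at position 1754390 corresponds to the 561 bp and 351 bp fragments resolved on the agarose gel.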

5.3 Materials and methods

5.3.1 Conjugational crosses

DG269 is a pyrE mutant derived from Ron12/I. SA1 is derived from MR31, which was in turn derived from DG6. Recombinants were obtained by Dennis Grogan from conjugational crosses (DG269 x MR31 and DG269 x SA1) following the procedure outlined in (Grogan, 1996).

5.3.2 Analysis of recombinants

Genomic DNA was extracted from 49 recombinants from SA1 x DG269 and 45 recombinants from MR31 x DG269, and loci of interest were amplified using the primers listed in Table 5 and thermocycler program DG50. The markers were scored either by electrophoresis on a 1% agarose gel (when markers differed in fragment sizes discernible by running on a gel) (Fig. 2) or by LCR (when markers were BPS or 1-2nt indels) (Fig. 3).

LCR was performed as previously described (chapter 3 of this document) using the oligo probes listed in Table 6; the only change in the protocol was the optimal Tm of each oligo probe set, which was derived experimentally by testing several temperatures. In addition, downstream oligos were phosphorylated enzymatically by T4 polynucleotide kinase, which is an economical alternative to purchasing phosphorylated oligos. Phosphorylation reactions consisted of 5µL of each downstream oligo, 80µL water, 10µL T4 DNA ligase buffer and 1µL polynucleotide kinase, incubated at 37°C overnight.

Table 6. Oligos used for LCR. Names of the oligos contain the polymorphism positions. Query bases in the upstream oligos are separated from the rest of the sequence by a space and bolded.

OLIGO            5’-3’ SEQUENCE
81844dnbot       AGTCTTGGTACAAGTAA
81844dntop       GAAAATGATTAAAGAAAAC
81844Ttop        TTACTTGTACCAAGACT T
81844Abot        GTTTTCTTTAATCATTTTC A
81844Ctop        TTACTTGTACCAAGACT C
81844Gbot        GTTTTCTTTAATCATTTTC G
720076dntop      CCTATAGACTTTAAGAGAA
720076dnbot      GTGCCGCTCTTTATA
720076Atop       ATAAAGAGCGGCAC A
720076Tbot       TTCTCTTAAAGTCTATAGG T
720076Gtop       ATAAAGAGCGGCAC G
720076Cbot       TTCTCTTAAAGTCTATAGG C
1228163dntop     AGAATCGTTAGTTTCTCC
1228163dnbot     ATATTCTTTATTGTAGAGAAAA
1228163Ttop      TTTTCTCTACAATAAAGAATAT T
1228163Abot      GGAGAAACTAACGATTCT A
1228163Ctop      TTTTCTCTACAATAAAGAATAT C
1228163Gbot      GGAGAAACTAACGATTCT G
1298723dntop     CTAAAATACCATATAGGTTG
1298723dnbot     ACGTATTTTTCTCATG
1298723Gtop      CATGAGAAAAATACGT G
1298723Cbot      ACCTATATGGTATTTTAG C
1298723Atop      CATGAGAAAAATACGT A
1298723Tbot      ACCTATATGGTATTTTAG T
1755721dntop     AGGGTTTGAGGTGGA
1755721dnbot     TGTACTCCCAGATCA
1755721Ctop      TGATCTGGGAGTACA C
1755721Gbot      TCCACCTCAAACCCT G
1755721Ttop      TGATCTGGGAGTACA T
1755721Abot      TCCACCTCAAACCCT A
2027418dnbot     TTTTTACCATGTTAAAAACTT
2027418dntop     TTTTTAAAATTATCTGAAATCA
2027418--bot     TTGATTTCAGATAATTTTAAAAA
2027418--top     AAGTTTTTAACATGGTAAAAA
2027418TTtop     AGTTTTTAACATGGTAAAAA TT
2027418AAbot     TGATTTCAGATAATTTTAAAAA AA

[Figure 2 gel labels: 561 bp fragment (Ron12/I allele); 351 bp fragment (MR31/SA1 allele).]

Figure 2. Scoring markers based on fragment length of amplicons obtained by PCR using 1754363fwd and 1754363rev. Ron12/I differs from MR31 and SA1 by 210 bp.

5.4 Results

Seven sites have been queried so far. All the markers scored in the recombinants from both crosses were DG269 markers, with one exception: 8/45 recombinants from DG269 x MR31 had MR31 markers, and 10/49 recombinants from DG269 x SA1 had SA1 markers, at all polymorphism positions scored between coordinates 1633414 and 1755721 (Fig. 4).
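The proportions behind these counts are easy to make explicit. The sketch below uses only the counts stated above and computes the fraction of recombinants in each cross, and in both crosses pooled, that carried MR31/SA1 alleles in this interval; the variable names are illustrative.

```python
# Recombinants carrying MR31/SA1 alleles at the scored positions, out of all
# recombinants analyzed in each cross (counts from the text).
crosses = {
    "DG269 x MR31": (8, 45),
    "DG269 x SA1": (10, 49),
}

for name, (carrying_donor, analyzed_n) in crosses.items():
    print(f"{name}: {100 * carrying_donor / analyzed_n:.1f}%")

# Pool both crosses.
carrying = sum(k for k, _ in crosses.values())
analyzed = sum(n for _, n in crosses.values())
pooled_percent = 100 * carrying / analyzed
print(f"Pooled: {carrying}/{analyzed} = {pooled_percent:.1f}%")
```

The pooled value, 18 of 94 recombinants, is the source of the "about 20%" figure used elsewhere in this chapter.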

[Figure 3: LCR traces, panels (A) to (D); ligation product peaks marked; probe sets labeled A/T and C/G.]

Figure 3. LCR results for two recombinants probed for markers at position 1755721. (A) and (B) are LCR results from genotyping a transformant for AT and CG respectively; presence of a peak corresponding to the ligated product in reactions (A), but not in (B) shows C/G at the query position. Similarly (C) and (D) show LCR results from a transformant with A/T at the query position.

5.5 Discussion

In all the recombinants obtained from both crosses, DG269 was the recipient, since the majority of markers are from the DG269 strain. This was not expected, and there might be physiological differences in the mating strains, or an effect of the type and/or position of the pyrE mutation, that contribute towards this bias. More crosses with other strains need to be carried out in order to investigate the basis for this bias.

In the cross between DG269 and SA1, only DG269 is capable of producing pili, which suggests the pili were used to import DNA from SA1. With respect to the amount of DNA transferred, most of the recombinants appear to have received a short segment containing the selectable marker. However, a total of 18 recombinants from both crosses had MR31/SA1 markers at all the query locations between 1610486 and 1755721. It is unusual that the same fragment was transferred in all 18 recombinants, and this might reflect an effect of the particular region of the chromosome. It would be interesting to see if the same region also gets transferred in crosses with other S. acidocaldarius strains.

[Figure 4 map labels: 81844; 304011, 304052; 329548; 348066, 348082, 348226; 623697; 720076; 836667, 838531; 1228163; upsEF (1272319); 1298723; pyrE; 1610486, 1633413, 1674990; 1754390, 1755721; 1837116, 1837176; 1917860; 2027418.]

Figure 4. Genome map of S. acidocaldarius showing approximate locations of markers. Colored (green and blue) boxes represent markers that have been scored so far. Green boxes indicate markers that showed the Ron12/I allele in all recombinants. Blue boxes indicate markers that showed MR31/SA1 allele in about 20% of the recombinants. Red circles indicate the origins of replication.

Another interesting observation is that two donor sequences are interrupted by a recipient sequence. The pyrE coordinates in the S. acidocaldarius DSM639 genome are 1361635-1362228. Since the selectable marker from MR31/SA1 needs to be transferred to DG269 to form recombinants, one end-point of the transferred DNA would have to be upstream of position 38 in the pyrE coding sequence (see Fig. 1). It is surprising that in 18 recombinants from both crosses DG269 imported DNA downstream of the pyrE coding sequence, since the 18bp deletion in MR31/SA1 lies downstream of the T insertion. The recombinants could have been formed in two ways: (i) transfer of two fragments, one fragment with one end point between the two selectable markers and the other end point upstream of position 38, and the other fragment with end points flanking coordinates 1610486 and 1755721 (~122kb), or (ii) transfer of a ~400kb fragment, one end of which lies upstream of position 38 and the other end of which lies beyond position 1755721, followed by repair at the 18bp deletion site favoring the DG269 strand. There is precedent for (ii) from the genetic sectoring study (chapter 3 of this document), where individual mismatches were repaired amidst a string of multiple mismatches.
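The fragment sizes discussed here follow from differences between DSM639 coordinates. The sketch below works through that arithmetic using only coordinates given in this chapter. It assumes a simple linear distance between two coordinates (no wrap-around on the circular chromosome), and the helper name is illustrative.

```python
def span_bp(start: int, end: int) -> int:
    """Linear distance in bp between two chromosomal coordinates."""
    return abs(end - start)


PYRE_START = 1361635   # start of the pyrE coordinates in DSM639, from the text
LAST_MARKER = 1755721  # last marker showing the MR31/SA1 allele

# Interval between the marker coordinates quoted in the Results.
results_interval = span_bp(1633413, LAST_MARKER)
# Interval between the coordinates quoted in this Discussion.
discussion_interval = span_bp(1610486, LAST_MARKER)
# Single-fragment scenario (ii): from pyrE to the last MR31/SA1 marker.
single_fragment = span_bp(PYRE_START, LAST_MARKER)

print(f"1633413 to 1755721: {results_interval / 1000:.0f} kb")
print(f"1610486 to 1755721: {discussion_interval / 1000:.0f} kb")
print(f"pyrE to 1755721:    {single_fragment / 1000:.0f} kb")
```

On these coordinates, the ~122kb value corresponds to the interval beginning at the marker at 1633413, the interval beginning at 1610486 is about 145 kb, and the distance from pyrE to 1755721 is about 394 kb, in line with the ~400kb fragment in scenario (ii).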

In summary, the pilus was observed to import DNA into the DG269 strain when crossed with Ups+ as well as Ups- strains of S. acidocaldarius. Whether S. acidocaldarius can use its pilus to export DNA remains inconclusive, since DG269 was the sole recipient in both crosses with MR31 as well as SA1. About 20% of the recombinants show transfer of at least 122kb of DNA in addition to the selectable marker. These preliminary results show that either a single, large fragment or multiple fragments of DNA can be transferred during conjugation in S. acidocaldarius. It would be interesting to see whether more transferred fragments can be identified as the rest of the markers are scored. Considering the hypothesized function of conjugation in Sulfolobus for DNA exchange between cells under genomic stress for repair purposes, it seems logical that large amounts of DNA (either as large fragments or multiple short fragments) are transferred, which would be necessary to provide repair templates for lesions that are formed genome-wide. It would be interesting to compare genome-wide maps of marker exchange between S. acidocaldarius strains in the absence and presence of genomic stress.

5.6 References

Ajon, M., Frols, S., van Wolferen, M., Stoecker, K., Teichmann, D., Driessen, A.J., Grogan, D.W., Albers, S.V., and Schleper, C. (2011). UV-inducible DNA exchange in hyperthermophilic archaea mediated by type IV pili. Mol Microbiol 82, 807-817.

Alvarez-Martinez, C.E., and Christie, P.J. (2009). Biological Diversity of Prokaryotic Type IV Secretion Systems. Microbiol Mol Biol Rev 73, 775-808.

Clark, A.J., and Adelberg, E.A. (1962). Bacterial conjugation. Annu Rev Microbiol 16, 289-319.

Frols, S., Ajon, M., Wagner, M., Teichmann, D., Zolghadr, B., Folea, M., Boekema, E.J., Driessen, A.J., Schleper, C., and Albers, S.V. (2008). UV-inducible cellular aggregation of the hyperthermophilic archaeon Sulfolobus solfataricus is mediated by pili formation. Mol Microbiol 70, 938-952.

Frols, S., Gordon, P.M., Panlilio, M.A., Duggin, I.G., Bell, S.D., Sensen, C.W., and Schleper, C. (2007). Response of the hyperthermophilic archaeon Sulfolobus solfataricus to UV damage. J Bacteriol 189, 8708-8718.

Grogan, D.W. (1996). Exchange of genetic markers at extremely high temperatures in the archaeon Sulfolobus acidocaldarius. J Bacteriol 178, 3207-3211.

Reilly, M.S., and Grogan, D.W. (2001). Characterization of intragenic recombination in a hyperthermophilic archaeon via conjugational DNA exchange. J Bacteriol 183, 2943-2946.

Schmidt, K.J., Beck, K.E., and Grogan, D.W. (1999). UV stimulation of chromosomal marker exchange in Sulfolobus acidocaldarius: implications for DNA repair, conjugation and homologous recombination at extremely high temperatures. Genetics 152, 1407-1415.

Willetts, N., and Skurray, R. (1980). The Conjugation System of F-Like Plasmids. Annu Rev Genet 14, 41-76.

Chapter 6 Conclusions

The overarching theme of this dissertation research is genome stability in HA, which was investigated at several different levels in the representative archaeon S. acidocaldarius. This research topic is particularly significant in view of the DNA-damaging conditions in which the HA thrive, combined with the missing NER and MMR components in the genomes of these organisms (Grogan, 2004). Properties typical of organisms that intrinsically do not have NER or MMR, or have impaired versions of these pathways, are reduced tolerance to UV irradiation and a 50-1000 fold increase in spontaneous mutation rates, respectively (Lindahl and Wood, 1999). The conventional NER system shares CPDs as substrates with the DNA photolyase repair system, which could in theory purge the cell of the lesions in the absence of NER; S. acidocaldarius has been shown to possess a functional photoreactivation system catalyzed by a DNA photolyase called Phr (Sakofsky et al., 2011). However, MMR serves the specialized role of correcting replication errors, and in the absence of MMR no other repair system can substitute to reduce spontaneous mutations, primarily because of the error-recognizing capability of the pathway. Where most repair pathways recognize damaged DNA, the primary substrates for MMR are normal bases mispaired to each other, necessitating recognition of the erroneous base(s) (Lindahl and Wood, 1999). Since the spontaneous mutation rate in S. acidocaldarius was found to be comparable to that of MMR-proficient E. coli (Grogan et al., 2001), a fundamental question arises: whether a novel system with a function equivalent to that of conventional MMR operates in S. acidocaldarius.

In order to investigate whether a conventional MMR-like system exists in S. acidocaldarius, various features of the canonical MMR pathway were tested: mismatch repair capability, ability to distinguish the error-containing strand, inhibition of recombination between nearly identical sequences, and gene conversion as a result of repair of long tracts of mismatched intermediates formed during HR. The direct approach to answering the question of whether S. acidocaldarius has an MMR-like system is to test whether preformed mismatches can be repaired in vivo (chapter 2 of this document). Single, defined mismatches were introduced into Sulfolobus-E. coli shuttle plasmids with and without strand discrimination signals (nicks on one strand), and S. acidocaldarius was transformed with the resulting heteroduplex plasmids. Generating the mismatches by means of overlapping restriction enzyme sites provided a way to score for repair of the mismatches. When S. acidocaldarius was transformed with plasmids containing a T/C mismatch without strand discrimination signals, 79.2% of the transformants retained both parental heteroduplex plasmid strands and 20.8% retained only the backbone strand (that is, discarded the incoming strand). Sulfolobus acidocaldarius transformed with pDM8 carrying a T/C mismatch and strand discrimination signals on the backbone strand yielded 45.8% of transformants that retained both parental strands, 33.3% that retained only the donor strand and 20.8% that retained only the backbone strand. Definitive conclusions could not be drawn from the data obtained in S. acidocaldarius because MMR-proficient E. coli failed to show repair of the mismatches in the heteroduplex plasmids with or without strand discrimination signals (the majority of the transformants retained both parental strands). This could be because mismatches in the context of the plasmid are weak substrates for the E. coli MMR machinery, or because ligation and/or replication outcompeted MMR. The failure of the assay to behave as expected in E. coli raises doubts about the apparent repair seen in S. acidocaldarius when the substrate carried strand discrimination signals. A nicked strand could partially denature in vivo at the physiological temperature of S. acidocaldarius (80°C), creating a flap that could be removed by endonucleases, for example, and this would mimic repair that discriminates against the strand containing the nick.

Even though the heteroduplex plasmid assay did not yield definitive conclusions about the repair of preformed mismatches, it can be modified to query TLS in S. acidocaldarius. The results from the MMR study confirm that single strands can be successfully replaced by dual nicking with nicking enzymes followed by heat release. The main limitation of the assay was its dependence on E. coli as a comparative control for investigating MMR, since S. acidocaldarius has no mutSL genes that can be knocked out to obtain mutants for comparison. This requirement can be bypassed when assaying TLS because an S. acidocaldarius strain is available in which the prospective TLS polymerase gene, dbh, has been knocked out.

The heteroduplex plasmid assay tested repair of preformed mismatches generated in vitro. In order to present mismatches in a different context, assays were developed to investigate the formation and repair of mismatches in vivo during homologous recombination in S. acidocaldarius (chapter 3 of this document). Transforming S. acidocaldarius with exogenous DNA containing genetic markers showed that heteroduplex intermediates are formed while the exogenous DNA is being incorporated into the chromosome, as evidenced by the sectored recombinant colonies that were obtained. Sectored colonies can be detected only if unrepaired mismatches segregate during replication. An in-depth analysis using differentially marked single- and double-stranded oligonucleotides showed that a subset of the mismatches formed during homologous recombination was repaired. However, the repair did not mimic a conventional MMR system, which removes a long tract of mismatches. Instead, individual mismatches were repaired within a heteroduplex containing multiple mismatches. The repair also showed no directionality, as no bias was seen in the strand that was removed. This study provided the first in vivo evidence for repair of mismatches in S. acidocaldarius. Even though the repair of individual mismatches occurred without strand bias, it is possible that an error-differentiation capability exists but that the assay did not provide the appropriate conditions for its function.
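The logic that links unrepaired mismatches to sectored colonies can be made concrete with a toy model. The sketch below is an illustration only and is not the scoring procedure used in chapter 3: the marker names (m1 to m3), the allele labels ("D" for donor, "R" for recipient) and both function names are hypothetical. It assumes that each strand of the heteroduplex templates one daughter lineage at the first replication, so any marker still mismatched at that point segregates into two sectors, whereas markers repaired one at a time, in either direction, give a uniform colony that may nonetheless be a mosaic of donor and recipient alleles.

```python
from typing import Dict, Iterable, List, Tuple

# Toy representation (hypothetical): one allele per marker on each strand.
Strand = Dict[str, str]


def repair_markers(
    donor: Strand,
    recipient: Strand,
    to_donor: Iterable[str] = (),
    to_recipient: Iterable[str] = (),
) -> Tuple[Strand, Strand]:
    """Resolve selected mismatches independently, before replication.

    Markers in `to_donor` end up with the donor allele on both strands,
    markers in `to_recipient` with the recipient allele. All other
    markers are left mismatched. No strand bias is built in.
    """
    overlap = set(to_donor) & set(to_recipient)
    if overlap:
        raise ValueError(f"marker(s) repaired in both directions: {sorted(overlap)}")
    d, r = dict(donor), dict(recipient)
    for marker in to_donor:
        r[marker] = d[marker]
    for marker in to_recipient:
        d[marker] = r[marker]
    return d, r


def colony_outcome(strand_a: Strand, strand_b: Strand) -> Tuple[str, List[str]]:
    """Classify the colony founded by a cell carrying this duplex.

    Each strand templates one daughter lineage, so any marker that still
    differs between the strands segregates and produces sectors.
    """
    unresolved = sorted(m for m in strand_a if strand_a[m] != strand_b[m])
    return ("sectored" if unresolved else "uniform", unresolved)


donor = {"m1": "D", "m2": "D", "m3": "D"}
recipient = {"m1": "R", "m2": "R", "m3": "R"}

# Repairing only the middle marker leaves the flanking markers to segregate.
d, r = repair_markers(donor, recipient, to_donor=["m2"])
print(colony_outcome(d, r))
```

In this model a conventional long-patch repair would correspond to converting every marker in the same direction, while the pattern described above corresponds to converting individual markers independently.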

Also, the inhibitory effect of mismatches on homologous recombination, seen in several MMR-proficient bacteria and eukaryotes, was not detected in S. acidocaldarius (chapter 3 of this document). In yeast, a 100-fold decrease in recombination was seen with sequences that were 94% identical (Datta et al., 1997). In contrast, S. acidocaldarius did not show significant differences in recombination efficiency among sequences that were 100%, 98.5%, 96.3% and 95.5% identical.

Although repair of mismatches could be seen with the genetic sectoring assay, no inhibitory effect of mismatches on HR efficiency was detected, suggesting that the putative mismatch repair system lacks the policing role of conventional MMR in terminating HR between divergent sequences. Repair of individual mismatches within a DNA duplex containing multiple mismatches can, in theory, be catalyzed by structure-specific endonucleases that cleave DNA containing Y-shaped structures and flaps; mismatches could produce bubble-like structures, due to the instability caused by the violation of Watson-Crick base pairing, thereby generating a Y-shaped structure at each end of the bubble. Sulfolobus acidocaldarius possesses homologues of a flap endonuclease called FEN-1 that has been shown to function as a 5’-flap endonuclease and to have 5’-3’ exonuclease activity as well (Chen et al., 2005; Lieber, 1997). FEN-1 is involved in DNA replication and repair and, in S. solfataricus, interacts with proliferating cell nuclear antigen (PCNA), an integral component of the replication machinery (Hutton et al., 2008). FEN-1 can therefore be hypothesized to catalyze removal of mismatches in vivo in S. acidocaldarius. Another candidate structure-specific endonuclease is XPF, which is a 3’-flap endonuclease (Chen et al., 2005). The interactions of structure-specific endonucleases with PCNA during progression of the replication fork might enable nascent DNA containing errors to be differentiated from the template.

The genetic sectoring assays also provided information about the mechanism of homologous recombination in S. acidocaldarius. Processing of linear DNA for recombination with the chromosome was found to differ from the established pathways in bacteria and eukaryotes: whereas mostly cis or trans configurations of marker incorporation are seen in other model organisms, these configurations were found in only a small minority of S. acidocaldarius transformants. Differences were also seen in the incorporation of markers at the ends of the donor DNA between single-stranded and double-stranded DNA transformations, suggesting differential processing of ss and ds donor DNAs in S. acidocaldarius.

The genome of S. acidocaldarius was found to be very stable in nature compared to that of its sister species, S. islandicus (chapter 4 of this document). Three S. acidocaldarius strains isolated from sites about 8200 km apart showed only 40 polymorphisms in total, with an average pairwise divergence among the three strains of 1.14 × 10⁻⁵. This is in sharp contrast to the average divergence of 1.1 × 10⁻² seen in S. islandicus strains isolated from sites about 6500 km apart. The efficient gene flow seen in S. acidocaldarius supports the Baas-Becking hypothesis that everything is everywhere, but the environment selects. The results suggest that factors other than simple geographical separation could contribute to the limited gene flow in S. islandicus.
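To put the two divergence values on a common scale, the short sketch below computes their ratio and converts each to differences per megabase. It is a back-of-the-envelope illustration that uses only the two averages quoted above; the constant and function names are assumptions introduced here, and the calculation ignores how the two studies sampled strains and called polymorphisms.

```python
# Average divergence values as reported in the text (chapter 4).
S_ACIDOCALDARIUS_DIVERGENCE = 1.14e-5  # three strains, sites about 8200 km apart
S_ISLANDICUS_DIVERGENCE = 1.1e-2       # strains from sites about 6500 km apart


def fold_difference(higher: float, lower: float) -> float:
    """Return how many times larger `higher` is than `lower`."""
    if lower <= 0:
        raise ValueError("lower divergence must be positive")
    return higher / lower


def differences_per_megabase(divergence: float) -> float:
    """Convert a per-site divergence into expected differences per 10^6 sites."""
    return divergence * 1_000_000


fold = fold_difference(S_ISLANDICUS_DIVERGENCE, S_ACIDOCALDARIUS_DIVERGENCE)
print(f"fold difference: {fold:.0f}")
print(f"S. acidocaldarius: {differences_per_megabase(S_ACIDOCALDARIUS_DIVERGENCE):.1f} per Mb")
print(f"S. islandicus: {differences_per_megabase(S_ISLANDICUS_DIVERGENCE):.0f} per Mb")
```

With the reported values the ratio is roughly 965-fold, that is, about 11 differences per megabase in S. acidocaldarius against about 11,000 in S. islandicus.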

The stability of the S. acidocaldarius genome was also apparent when S. acidocaldarius was treated with mutagens (chapter 4 of this document). The genome of S. acidocaldarius was observed to be fairly resistant to UV and MNNG. The mutations resulting from MNNG treatment were of the expected type (G:C to A:T), indicating that MNNG has chemical effects on the S. acidocaldarius genome similar to those documented in other organisms (Sedgwick, 2004); this suggests that the replicative polymerase of S. acidocaldarius pairs thymine opposite an O6-methylguanine in the template. UV mutagenesis yielded only 5 mutations, only one of which was of the type induced by UV in other organisms, namely conversion of a CC dinucleotide to TT.

Sulfolobus acidocaldarius does seem to have systems that enforce genetic fidelity and genome stability, as is evident from the low sequence divergence among the genome sequences of natural isolates. This is further supported by the results obtained from treating S. acidocaldarius with mutagens. However, unlike bacteria and eukaryotes, S. acidocaldarius does not seem to have a conventional MMR-like mechanism. Even though mismatches formed during HR were repaired, no strand discrimination was found. Furthermore, individual or short patches of mismatches were repaired within a long tract of mismatches; the expected result from conventional MMR would be resolution of long tracts (~1 kb in E. coli). Mismatches did not inhibit HR in S. acidocaldarius, suggesting that, unlike conventional MMR, the mechanism that repairs short patches of mismatches does not act on mismatched intermediates formed during HR to terminate recombination. Thus, the mechanistic features of conventional MMR were not seen in S. acidocaldarius. Together these results suggest that S. acidocaldarius does not have a canonical MMR-like system, but has novel strategies for lowering spontaneous mutation rates. These results are significant because they reinforce the uniqueness of archaea and broaden our knowledge of the strategies employed by cellular organisms to maintain genetic fidelity. Future investigations to elucidate the mechanisms enforcing genetic fidelity in S. acidocaldarius should allow for completely novel mechanisms, and assays should be developed accordingly.

Another area of DNA repair in S. acidocaldarius that is currently being investigated is conjugational transfer of DNA between cells mediated by a type IV pilus (chapter 5 of this document). The increase in recombinants from conjugation upon genomic stress is hypothesized to be a DNA damage response consisting of DNA exchange followed by homologous recombinational repair. I am investigating the transfer of DNA during conjugation in S. acidocaldarius in terms of the size(s) of the fragment(s) transferred, and am also trying to establish the role of the pilus in the directionality of DNA transfer (whether it is used to import or export DNA). The assay developed towards these objectives involves scoring genetic markers spread throughout the genome in recombinants obtained from conjugational crosses in order to identify the contribution of each partner in a mating cross. A pyrE mutant derived from Ron12/I was crossed with both MR31 and SA1, which are isogenic strains except that SA1 is incapable of assembling pili and, as a result, of conjugation. The data so far show that in a majority of the recombinants only a small fragment of DNA was transferred during conjugation. However, about 20% of recombinants showed transfer of a fragment of at least 122 kb along with the selected marker. The transfer could be of two fragments from the donor or of a single fragment that was processed to lose a short segment of the donor strand during HR into the recipient chromosome. Scoring more markers will give a clearer picture of this aspect of DNA transfer during conjugation in S. acidocaldarius. With respect to the directionality of transfer, Ron12/I was the recipient in both crosses, with MR31 and with SA1, which suggests that the pilus was used to import DNA. The simplicity of the assay allows it to be carried out with other S. acidocaldarius strains carrying different chromosomal markers to answer the question of whether the pilus can also be used to export DNA.

6.1 References

Chen, L.M., Brugger, K., Skovgaard, M., Redder, P., She, Q.X., Torarinsson, E., Greve, B., Awayez, M., Zibat, A., Klenk, H.P., et al. (2005). The genome of Sulfolobus acidocaldarius, a model organism of the Crenarchaeota. J Bacteriol 187, 4992-4999.

Datta, A., Hendrix, M., Lipsitch, M., and Jinks-Robertson, S. (1997). Dual roles for DNA sequence identity and the mismatch repair system in the regulation of mitotic crossing-over in yeast. Proc Natl Acad Sci U S A 94, 9757-9762.

Grogan, D.W. (2004). Stability and repair of DNA in hyperthermophilic Archaea. Curr Issues Mol Biol 6, 137-144.

Grogan, D.W., Carver, G.T., and Drake, J.W. (2001). Genetic fidelity under harsh conditions: analysis of spontaneous mutation in the thermoacidophilic archaeon Sulfolobus acidocaldarius. Proc Natl Acad Sci U S A 98, 7928-7933.

Hutton, R.D., Roberts, J.A., Penedo, J.C., and White, M.F. (2008). PCNA stimulates catalysis by structure-specific nucleases using two distinct mechanisms: substrate targeting and catalytic step. Nucleic Acids Res 36, 6720-6727.

Lieber, M.R. (1997). The FEN-1 family of structure-specific nucleases in eukaryotic DNA replication, recombination and repair. Bioessays 19, 233-240.

Lindahl, T., and Wood, R.D. (1999). Quality control by DNA repair. Science 286, 1897-1905.

Sakofsky, C.J., Runck, L.A., and Grogan, D.W. (2011). Sulfolobus mutants, generated via PCR products, which lack putative enzymes of UV photoproduct repair. Archaea 2011, 864015.

Sedgwick, B. (2004). Repairing DNA-methylation damage. Nat Rev Mol Cell Biol 5, 148-157.
