MIAMI UNIVERSITY The Graduate School

CERTIFICATE FOR APPROVING THE DISSERTATION

We hereby approve the Dissertation

Of

Brian Junior Henson

Candidate for the Degree:

Doctor of Philosophy

Advisor ______(Susan R. Barnum)

Advisor ______(Linda E. Watson)

Reader ______(David A. Francko)

Reader ______(John Z. Kiss)

Grad School Representative ______(Luis A. Actis)

Abstract

EVOLUTION, VARIATION, AND EXCISION OF DEVELOPMENTALLY REGULATED DNA ELEMENTS IN THE HETEROCYSTOUS CYANOBACTERIA

by Brian Junior Henson

In some cyanobacteria, heterocyst differentiation is accompanied by developmentally regulated DNA rearrangements that occur within the nifD, fdxN, and hupL genes. The DNA segments involved are referred to as the nifD, fdxN, and hupL elements. These elements are excised from the genome by site-specific recombination during the latter stages of heterocyst differentiation. In this dissertation, two major questions are addressed: 1) what is the evolutionary history of the nifD and hupL elements, and 2) how is the nifD element excised? To answer the first question, full-length nifD and hupL element sequences were characterized and compared, and xisA and xisC sequences (which encode the recombinases that excise the nifD and hupL elements, respectively) were phylogenetically analyzed. Results indicated extensive structural and compositional variation within the nifD and hupL elements. The data suggest that the nifD and hupL elements are of viral origin and that they have variable patterns of evolution in the cyanobacteria. To answer the second question, a recombination system was devised in which the ability of XisA to excise or recombine variants of the nifD element (substrate plasmids) was tested. Using PCR-directed mutagenesis, specific nucleotides within the flanking regions of the nifD element were altered and the effects on recombination determined. Results indicate that nucleotides within and outside of the direct repeats are involved in excision, and that not all nucleotides within the direct repeats are required. At certain nucleotide positions, the presence of a purine versus a pyrimidine greatly affected recombination. Although excision was inhibited when certain nucleotides were mutated, PCR analyses revealed that excision still occurred at a low level. The data also indicate that the site of excision lies within the direct repeats. The results presented here suggest that the elements may be variable in size, composition, and excision.
EVOLUTION, VARIATION, AND EXCISION OF DEVELOPMENTALLY REGULATED DNA REARRANGEMENTS IN THE HETEROCYSTOUS CYANOBACTERIA

A Dissertation

Submitted to the Faculty of

Miami University in partial

fulfillment of the requirements

for the degree of

Doctor of Philosophy

Department of Botany

by

Brian Junior Henson

Miami University

Oxford, Ohio

2005

Major Advisors: Susan R. Barnum Ph.D. and Linda E. Watson Ph.D.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

DEDICATION

ACKNOWLEDGEMENTS

INTRODUCTION
    REFERENCES

CHAPTER 1
    ABSTRACT
    INTRODUCTION
    MATERIALS AND METHODS
    RESULTS
    DISCUSSION
    REFERENCES

CHAPTER 2
    ABSTRACT
    INTRODUCTION
    MATERIALS AND METHODS
    RESULTS
    DISCUSSION
    REFERENCES

CHAPTER 3
    ABSTRACT
    INTRODUCTION
    MATERIALS AND METHODS
    RESULTS
    DISCUSSION
    REFERENCES

SUMMARY

LIST OF TABLES

CHAPTER I
    1. Similarities within conserved regions of the nifD element

CHAPTER II
    1. Prevalence of the nifD element
    2. Prevalence of the hupL element
    3. G+C content of the elements

CHAPTER III
    1. Primer table
    2. Results of recombination assay

LIST OF FIGURES

INTRODUCTION
    1. NifD element
    2. HupL element

CHAPTER I
    1. Similarity between conserved regions of the nifD elements

CHAPTER II
    1. Comparison of nifD elements
    2. Comparison of Cylindrospermum PCC 7417 and N. punctiforme nifD elements
    3. Comparison of hupL elements
    4. Combined xisA and xisC phylogeny
    5. XisA phylogeny
    6. XisC phylogeny

CHAPTER III
    1. NifD element
    2. Map of pDK1 and pSUB1
    3. Map of pSUB9B and its derivatives
    4. pRE-18
    5. pDK1
    6. pSUB1
    7. pSUB2
    8. pSUB9B
    9. nifD and nifK proximal regions in pSUB9B
    10. pSUB16Z, 10
    11. pSUB68,69,71-73
    12. pSUB11,17,18,23,26,74-77
    13. pSUB45-55
    14. pSUB34-44
    15. pSUB13,14,19-21,33
    16. Overview of excision
    17. Branch migration
    18. Model of excision

Dedication

This dissertation, as with all of my life’s work, is dedicated to the loving memory of my father, Clyde Junior Henson, my uncle, Pete L. York, and my godfather, Augustine Roman III. Your love and friendship are sorely missed. The memory of each of you serves as my inspiration and motivation.

Acknowledgements

I would like to thank my advisors Linda E. Watson Ph.D. and Susan R. Barnum Ph.D. for all of their help and guidance throughout my graduate career. I would also like to thank my other committee members for providing valuable input. I also thank the entire faculty of the Botany Department for allowing me to complete my graduate studies in the department. I acknowledge Chris Wood and the Center for Bioinformatics and Functional Genomics for assisting me in many aspects of my research. I thank Luis A. Actis Ph.D. for assistance with research and for supplying me with several plasmids and bacterial cultures. I thank the members of the Barnum lab for their assistance and camaraderie. This includes Dan Prochaska, Feng Fang, Kyle Kenyon, Erika Budde, Shari Hesselbrock, Jon Hlivko, Josie Hugie, Michael Tom, and Eric Pennington. I thank Dawn M. O’Dee for all of her support throughout this entire process. She made it a lot easier for me. I would also like to acknowledge the love and support of my family. I would like to recognize my brother Brad Henson and mother Margaret Henson. I thank you both for everything.

Introduction

Cyanobacteria are a diverse group of oxygenic photosynthetic prokaryotes that can be found in virtually every environment on Earth (Castenholz and Waterbury, 1989). They are an ancient lineage that dates back 3.5 billion years in the fossil record (Castenholz, 1992). It is believed that they were responsible for converting ancient Earth’s anaerobic atmosphere into an aerobic one (Hayes, 1983; Schopf et al., 1983). It is also widely accepted that chloroplasts were once free-living cyanobacteria that underwent an endosymbiotic event with the ancestor of modern plants. In addition to being primary producers, many cyanobacteria also fix nitrogen.

Nitrogen fixation is the process of reducing atmospheric nitrogen (N₂) to ammonia (NH₃). Although a small amount of nitrogen is fixed by artificial means (fertilizers) and lightning, the majority is fixed by microorganisms. The cyanobacteria play a major role in the global nitrogen cycle by supplying more fixed nitrogen to the environment than any other group of organisms (Sprent and Sprent, 1990). In addition, some cyanobacteria are involved in symbiotic relationships, directly supplying the host with reduced nitrogen (Rai et al., 2000). These symbiotic relationships can be found with a number of plants, including cycads, lichens, and the water fern Azolla (Rai et al., 2000; Whitton, 2000). Azolla, along with its cyanobacterial symbiont Anabaena azollae, is responsible for supplying rice paddies with fixed nitrogen. Without this source of fixed nitrogen, global rice production would plummet, which would be devastating considering the number of people who depend on rice as the basis of their diet.

Atmospheric nitrogen is reduced by the enzyme nitrogenase, which is encoded by the nifHDK operon. Nitrogenase is composed of two components, dinitrogenase reductase (iron protein) and dinitrogenase (molybdenum-iron protein). Dinitrogenase reductase is composed of two subunits encoded by nifH (Mavarech et al., 1980), and dinitrogenase is a tetramer, with two subunits encoded by nifD and two subunits encoded by nifK (Mazur and Chui, 1982). Dinitrogenase reductase mediates the ATP-dependent transfer of electrons to dinitrogenase, and dinitrogenase binds atmospheric nitrogen and reduces it (Postgate, 1982). In addition to nifHDK, other genes involved in nitrogen fixation have been identified. These include the nifENXW and nifBSU, fdxN operons (Mulligan et al., 1988; Mulligan and Haselkorn, 1989; Borthakur et al., 1990; Haselkorn, 1992; Haselkorn and Buikema, 1992).

Although some cyanobacteria are both photosynthetic and able to fix nitrogen, the two metabolic processes are incompatible. Oxygen, which is liberated during photosynthesis, binds to the iron cofactors of nitrogenase, inactivating it and preventing nitrogen fixation. To overcome this limitation, cyanobacteria have developed several ways to separate nitrogen fixation from photosynthesis and the oxygen it produces. Some cyanobacteria separate the two temporally, with photosynthesis occurring during the day and nitrogen fixation occurring at night (Adams and Duggan, 1999; Wolk et al., 1994). Cyanobacteria also use a variety of ways to separate photosynthesis and nitrogen fixation spatially. Some clump together as a large aggregate of cells, creating a micro-anaerobic environment in the center that allows the innermost cells to fix nitrogen (Adams and Duggan, 1999; Wolk et al., 1994). The heterocystous cyanobacteria also spatially separate oxygen and nitrogenase by developing heterocysts, specialized cells that are impermeable to oxygen (Carr, 1983). During periods of nitrogen starvation, certain vegetative cells differentiate into heterocysts, which are solely dedicated to fixing nitrogen and supplying it to the surrounding vegetative cells. In return, vegetative cells supply heterocysts with carbohydrates. Heterocyst differentiation involves a complex and orderly pattern of gene expression involving up to 1000 different genes, or 15-25% of the entire genome (Adams and Duggan, 1999; Lynn et al., 1986). Heterocysts are terminally differentiated: once they complete differentiation, they are unable to dedifferentiate.
Heterocysts are larger than vegetative cells, their cytoplasm appears to be less granulated, they contain polar bodies that are absent in vegetative cells, and they have a much thicker cell wall. The internal anaerobic environment of the heterocyst is achieved by the alteration of the cell wall during differentiation. The vegetative cell wall consists of an outer membrane and an inner cytoplasmic membrane separated by a peptidoglycan layer. During differentiation, three additional layers are added to the cell wall (Adams and Duggan, 1999; Wolk et al., 1994). The innermost layer, the laminated layer, is composed of glycolipids (Lazaro et al., 2001). The next layer, the homogeneous layer, is composed of polysaccharides deposited in a dense and compact arrangement. The outermost layer, the fibrous layer, is composed of the same polysaccharides as the homogeneous layer, but they are arranged in a less compact fashion. These additional layers, especially the laminated layer, decrease the permeability of heterocysts to oxygen (Adams and Duggan, 1999; Wolk et al., 1994).

Many genetic and biochemical changes are associated with differentiation. These include shutting down biochemical pathways not needed by heterocysts, including photosystem II. Genetic changes include numerous alterations in gene expression and several developmentally regulated DNA rearrangements. These DNA rearrangements involve the removal of genetic elements from within the chromosome by site-specific recombination. These elements are large fragments of DNA up to 55 kb in length. They occur within the coding regions of nifD, fdxN, and hupL. These genes encode a component of dinitrogenase, a bacterial-type ferredoxin, and a subunit of the membrane-bound uptake hydrogenase, respectively (Haselkorn, 1992; Golden et al., 1987; Carrasco et al., 1994; Mulligan et al., 1988). These genetic elements, termed the nifD, fdxN, and hupL elements, are excised from the genome during the latter stages of differentiation. Each element is independently excised, and each encodes the site-specific recombinase responsible for its excision.

The first element to be discovered and sequenced was the 11-kb nifD element in Nostoc PCC 7120 (Fig. 1) (Golden et al., 1985; Haselkorn, 1992; Lammers et al., 1990). This element is removed from the coding region of nifD by site-specific recombination between 11-bp direct repeats (GCCTCATTAGG) that flank the element (Golden et al., 1985; Haselkorn, 1992). Expression of the nifHDK operon and nitrogen fixation are contingent on the removal of the element. The recombinase responsible for excision is encoded by xisA, which is located within the element near nifK (Fig. 1).
Many other potential open reading frames (ORFs) or genes are encoded within the element; however, it is believed that xisA is the only gene required for excision (Lammers et al., 1986; 1990). There is evidence that xisA is not the only gene encoded within the nifD element to be expressed, but it has not been unequivocally determined which gene or genes are expressed (Rice et al., 1982; Lammers et al., 1990).

The 55-kb element within fdxN was the second to be discovered in Nostoc PCC 7120 (Golden et al., 1988; Carrasco et al., 1994; Mulligan et al., 1988), and its removal is required for the expression of the nifBSU, fdxN operon. The fdxN element is excised by XisF, XisH, and XisI, all of which are encoded within the element (Ramaswamy et al., 1997). The element is excised by site-specific recombination between flanking 5-bp direct repeats (TATTC) (Golden et al., 1987; 1988; Mulligan et al., 1988; Carrasco et al., 1994). Many potential ORFs are encoded within the fdxN element, but only xisF, xisH, and xisI are believed to be required for excision (Carrasco et al., 1995).

The third element found in Nostoc PCC 7120 was a 10.5-kb element within the coding region of hupL (Fig. 2) (Wolk et al., 1994; Carrasco et al., 1995). This element is excised from within hupL by the recombinase XisC, which is encoded within the element (Carrasco et al., 1995). The hupL element is excised by site-specific recombination between 16-bp direct repeats (CACAGCAGTTATATGG) that flank the element (Carrasco et al., 1995). Although many potential ORFs are encoded within the element, only xisC is believed to be involved in its excision (Carrasco et al., 1995).

Although these elements have received moderate attention in the literature, many questions remain about their evolutionary history, structural and compositional variation, and the regulation of their excision. It is not known how prevalent each element is in the heterocystous lineage. Some heterocystous cyanobacteria have all three elements, whereas others have one, two, or none of these elements. The evolutionary origin of these elements is unknown, although it has been hypothesized that they are the remnants of ancient viral infections that have lost the ability to self-replicate (Haselkorn, 1992; Henson et al., 2005). The fdxN element shows similarity to the Bacillus subtilis skin element (48 kb) that is excised from within the sigK gene during sporulation by the SpoIVCA recombinase (Kunkel et al., 1990; Haselkorn, 1992; Carrasco et al., 1994).
Both XisF and SpoIVCA belong to the resolvase/invertase or serine family of recombinases (Smith and Thorpe, 2002). The recombinases XisA and XisC belong to the tyrosine or phage integrase family of site-specific recombinases (Nunes-Duby et al., 1998). The tyrosine integrases are characterized by a conserved R-H-R-Y tetrad within the active site (Voziyanov et al., 1999; Nunes-Duby et al., 1998); however, XisA and XisC have a substitution in their active site. Instead of a histidine, they have a tyrosine (R-Y-R-Y) (Nunes-Duby et al., 1998). In addition, XisA and XisC are 61% similar and 43% identical at the amino acid level (Carrasco et al., 1995). BLAST (basic local alignment search tool) searches indicate that xisA and xisC are more similar to each other than to any other sequence in the database. The numerous similarities between xisA and xisC could suggest that they are related and share a common ancestor. Considering the hypothesized viral origin of the nifD and hupL elements and the close relationship between xisA and xisC, it is possible that the nifD and hupL elements themselves share a common evolutionary origin.

The exact sequences required for excision of the elements are unknown. The only element to be examined thus far is the nifD element. Brusca et al. (1990) created the substrate plasmid pAM461, which contained 450 bp surrounding the nifD proximal direct repeat and 750 bp surrounding the nifK proximal direct repeat cloned into pUC18. The substrate plasmid pAM461 was successfully recombined when introduced into E. coli cells expressing xisA, and it contains the smallest portions of the nifD element’s flanking regions that have been examined for the ability to be recombined. Although Brusca et al. (1990) addressed it to some extent, a thorough examination of the nucleotides required for excision of the nifD element has not been performed. It has been suggested that the only nucleotides absolutely required for excision of the nifD element (as well as the other two elements) are those within the direct repeats (Golden et al., 1985; Lammers et al., 1986; 1990; Brusca et al., 1990). The sequences surrounding the direct repeats of the examined heterocystous cyanobacteria are highly conserved (Henson et al., 2005), which may indicate that some of the nucleotides in these regions are involved in excision.

The goal of this dissertation was to further our knowledge about these genetic elements. To accomplish this, two specific questions were addressed: 1) What is the evolutionary history of the nifD and hupL elements? 2) How is the nifD element excised?
Chapter one, entitled “Characterization of a 4kb variant of the nifD element in Anabaena sp. Strain ATCC 33047”, has been published (Henson et al., 2005). In this chapter, a 4-kb variant of the nifD element was characterized and compared to the other sequenced nifD elements. In chapter two, I examine the evolutionary history of the nifD and hupL elements by characterizing and comparing full-length hupL and nifD element sequences and by phylogenetic analysis of xisA and xisC. Chapter two will be submitted to the Journal of Molecular Evolution. In chapter three, I examine the excision of the nifD element by determining which nucleotides in the flanking regions of the nifD element are required for excision. Chapter three will be submitted for publication in the journal Microbiology (Society for General Microbiology).

Literature cited:

Adams DG, Duggan PS (1999) Heterocyst and akinete differentiation in cyanobacteria. New Phytol 144:3-33

Brusca JS, Chastain CJ, Golden JW (1990) Expression of the Anabaena sp. strain PCC 7120 xisA gene from a heterologous promoter results in excision of the nifD element. J Bacteriol 172:3925-3931

Borthakur D, Basche M, Buikema WJ, Borthakur PB, Haselkorn R (1990) Expression, nucleotide sequence, and mutational analysis of two open reading frames in the nif gene region of Anabaena sp. strain PCC 7120. MGG 221:227-234

Carr NG (1983) Biochemical aspects of heterocyst differentiation and function. In: Papageorgiou GC, Packer L (eds) Photosynthetic Prokaryotes. Elsevier Publishing, New York. pp. 265-280

Carrasco CD, Ramaswamy KS, Ramasubramanian TS, Golden JW (1994) Anabaena xisF gene encodes a developmentally regulated site-specific recombinase. Genes Dev 8:74-83

Carrasco CD, Golden JW (1995) Programmed DNA rearrangement of cyanobacterial hupL gene in heterocysts. Proc Natl Acad Sci (USA) 92:791-795

Castenholz RW (1992) Species Usage, Concept, and Evolution in the Cyanobacteria (Blue-Green Algae). J Phycol 28:737-745

Castenholz RW, Waterbury JB (1989) Group I. Cyanobacteria. In: Krieg NR, Holt JG (eds) Bergey’s Manual of Systematic Bacteriology. vol. 3. Williams and Wilkins, Baltimore, MD. pp. 1710-1728

Golden JW, Robinson SJ, Haselkorn R (1985) Rearrangement of nitrogen fixation genes during heterocyst differentiation in the cyanobacterium Anabaena. Nature 314:419-423

Golden JW, Mulligan M, Haselkorn R (1987) Different recombination site specificity of two developmentally regulated genome rearrangements. Nature 327:526-529

Golden JW, Carrasco CD, Mulligan ME, Schneider GJ, Haselkorn R (1988) Deletion of a 55-kilobase-pair DNA element from the chromosome during heterocyst differentiation of Anabaena sp. Strain PCC 7120. J Bacteriol 170:5034-5041

Haselkorn R (1992) Developmentally regulated gene rearrangements in prokaryotes. Annu Rev Genet 26:113-130

Haselkorn R, Buikema WJ (1992) Nitrogen fixation in cyanobacteria. In: Stacey G, Burris RH, Evans HJ (eds) Biological Nitrogen Fixation. Chapman & Hall, New York. pp. 43-85

Hayes JM (1983) Geochemical evidence bearing on the origin of aerobiosis, a speculative hypothesis. In: Schopf JW (ed) The Earth’s Earliest Biosphere, its Origins and Evolution. Princeton University Press, Princeton, NJ. pp. 291-300

Henson BJ, Watson LE, Barnum SR (2005) Characterization of a 4kb variant of the nifD element in Anabaena sp. Strain ATCC 33047. Curr Microbiol 50:129-132

Kunkel B, Losick R, Stragier R (1990) The Bacillus subtilis gene for the development transcription factor sigma K is generated by excision of a dispensable DNA element containing a sporulation recombinase gene. Genes Dev 4:525-535

Lammers PL, Golden JW, Haselkorn R (1986) Identification and sequence of a gene required for a developmentally regulated DNA excision in Anabaena. Cell 44:905-911


Lammers PL, Mclaughlin S, Papin S, Trujillo-Provencio C, Ryncarz II AJ (1990) Developmental rearrangement of cyanobacterial nif genes: nucleotide sequence, open reading frames, and cytochrome P-450 homology of the Anabaena sp. strain PCC 7120 nifD element. J Bacteriol 172:6981-6990

Lazaro S, Fernandez-Pinas F, Fernandez-Valiente E, Blanco-Rivero A, Leganes F (2001) pbpB, a gene coding for a putative penicillin-binding protein, is required for aerobic nitrogen fixation in the cyanobacterium Anabaena sp. strain PCC 7120. J Bacteriol 183:628-636

Lynn ME, Bantle JA, Ownby JD (1986) Estimation of gene expression in heterocysts of Anabaena variabilis by using DNA-RNA hybridization. J Bacteriol 167:940-946

Mavarech M, Rice D, Haselkorn R (1980) Nucleotide sequence of a cyanobacterial nifH gene coding for nitrogenase reductase. Proc Natl Acad Sci (USA) 77:6476-6480

Mazur BO, Chui F (1982) Sequence of the gene coding for the beta-subunit of dinitrogenase from the blue-green alga Anabaena. Proc Natl Acad Sci (USA) 79:6782-6786

Mulligan ME, Haselkorn R (1989) Nitrogen-fixation (nif) genes of the cyanobacterium Anabaena sp. strain PCC 7120. The nifB-fdxN-nifS-nifU operon. J Biol Chem 264:19200-19207

Mulligan ME, Buikema WJ, Haselkorn R (1988) Bacterial-type ferredoxin genes in the nitrogen fixation regions of the cyanobacterium Anabaena sp. strain PCC 7120 and Rhizobium meliloti. J Bacteriol 170:4406-4410

Nunes-Duby SE, Kwon RS, Tirumalai RS, Ellenberger T, Landy A (1998) Similarities and differences among 105 members of the Int family of site-specific recombinases. Nucleic Acids Res 26:391-406

Postgate JR (1982) The fundamentals of nitrogen fixation. Cambridge University Press, New York.

Rai AN, Söderbäck E, Bergman B (2000) Tansley Review No. 116 Cyanobacterium-plant symbioses. New Phytol 147:449-481

Rice D, Mazur B, Haselkorn R (1982) Isolation and physical mapping of nitrogen fixation genes from the cyanobacterium Anabaena PCC 7120. J Biol Chem 257:13157-13163

Schopf JW, Hayes JM, Walter MR (1983) Evolution of the Earth’s earliest ecosystem: recent progress and unsolved problems. In: Schopf JW (ed) Earth’s Earliest Biosphere, its Origins and Evolution. Princeton University Press, Princeton, NJ. pp. 361-384

Sprent JI, Sprent P (1990) Nitrogen fixing organisms: pure and applied aspects. Chapman and Hall, New York, pp. 1-30

Smith MCM, Thorpe HM (2002) Diversity in the serine recombinases. Mol Microbiol 44:299-307

Voziyanov Y, Pathania S, Jayaram M (1999) A general model for site-specific recombination by the integrase family of recombinases. Nucleic Acids Res 27:930-941

Whitton BA (2000) Soils and rice-fields. In: Whitton BA and Potts M (eds) The Ecology of Cyanobacteria. Kluwer Academic Publishers, Boston, MA. pp 233-255

Wolk PC, Anneliese E, Elhai J (1994) Heterocyst metabolism and development. In: Bryant DA (ed) The Molecular Biology of Cyanobacteria. Kluwer Academic Publishers, The Netherlands. pp. 769-823


Chapter I

Characterization of a 4 kb variant of the nifD element in Anabaena sp. Strain ATCC 33047

Abstract

Heterocyst differentiation in some cyanobacteria is accompanied by a programmed DNA rearrangement within the nitrogen fixation gene nifD. The nifD element is excised from within nifD during the latter stages of heterocyst differentiation by site-specific recombination. There is considerable variation in those nifD elements examined thus far, with Nostoc sp. Strain PCC 7120 and Anabaena variabilis having 11-kb elements, and Nostoc punctiforme having a 24-kb element. Here we characterize a 4-kb nifD element in Anabaena sp. Strain ATCC 33047 and compare it to the other sequenced nifD elements. While there is considerable variation in both the size (ranging from 4 kb to 24 kb) and composition of the nifD elements examined thus far, there are regions that are conserved in all. These conserved regions include the flanking 3' and 5' regions, the xisA gene, and a small open reading frame known as ORF2 in Nostoc sp. Strain PCC 7120.

Introduction

Heterocystous cyanobacteria are unique among cyanobacteria because they develop heterocysts, which are terminally differentiated cells devoted to nitrogen fixation [1, 2, 20]. Heterocysts have a thickened cell wall that spatially separates nitrogenase from oxygen, which inactivates nitrogenase. Heterocysts supply surrounding vegetative cells with nitrogen and, in return, vegetative cells supply the heterocysts with energy [1, 2, 20]. Heterocyst differentiation involves an orderly pattern of gene expression that involves up to 1000 different genes, constituting 15-25% of the entire genome [1, 14]. Heterocyst differentiation in some cyanobacteria is accompanied by developmentally regulated DNA rearrangements. A developmentally regulated DNA rearrangement is the removal of a segment of DNA from the genome that accompanies a developmental shift or differentiation process. In Nostoc sp. Strain PCC 7120 (hereafter referred to as Nostoc PCC 7120), developmentally regulated DNA rearrangements have been found to occur within the nifD, fdxN, and hupL genes [5, 7-11, 20]. Each of these genetic or insertion elements is excised from the genome towards the end of heterocyst differentiation by site-specific recombination between direct repeats that flank each element [1, 5, 11]. The length and nucleotide sequence of the direct repeats vary for each element [1], and each element encodes its own site-specific recombinase responsible for its excision [1, 2]. The evolutionary origin of these elements is unknown, but it has been suggested that they may be of viral origin [11]. The first genetic element to be sequenced was an 11-kilobase (kb) element inserted within nifD. The nifD gene encodes the α subunit of dinitrogenase, which is part of the larger nitrogenase enzyme complex (encoded by nifHDK) that is responsible for nitrogen fixation in Nostoc PCC 7120 [17].
This element is excised by recombination between 11-base pair (bp) direct repeats (CGGAGTAATCC) that flank the element [3, 4, 8-13]. The enzyme responsible for the excision of this element is encoded by xisA, which is located within the nifD element closest to the 3’ end of nifD (Fig. 1). Many other potential open reading frames (ORFs) are located within the element; however, no ORF or gene other than xisA is believed to be required for excision [4, 12, 13]. Removal of the element is essential for expression of the nifHDK operon and for nitrogen fixation to occur. Two additional nifD elements have been sequenced: a 24-kb element from Nostoc punctiforme that is quite divergent from that of Nostoc PCC 7120 [15], and an 11-kb element from Anabaena variabilis [4] that is very similar to that of Nostoc PCC 7120. The two additional insertion elements in Nostoc PCC 7120 are the 55-kb element within fdxN and a 10.5-kb element within hupL. Excision of the fdxN element occurs between 5-bp direct repeats (TATTC) and requires the products of xisF, xisH, and xisI, all located within the element itself [6, 8, 16, 19]. Excision of the hupL element occurs between 16-bp direct repeats (CACAGCAGTTATATGG) and requires the product of xisC, which is located within the element [1, 2, 5]. In this study, we characterize a 4-kb nifD element from Anabaena sp. Strain ATCC 33047 (co-identical to Anabaena CA and hereafter referred to as Anabaena ATCC 33047) and compare it to the other sequenced nifD elements. We also make a global comparison of all nifD elements sequenced to date.
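The outcome of excision between flanking direct repeats, as described above, can be illustrated with a small string model. This is a minimal sketch under stated assumptions, not a model of the XisA mechanism: the function name and the toy flanking and interior sequences are hypothetical, only the 11-bp repeat CGGAGTAATCC comes from the text, and the crossover is treated as occurring somewhere within two identical repeats, so that one repeat copy remains in the chromosome and one leaves with the excised element.

```python
def excise_between_direct_repeats(chromosome: str, repeat: str) -> tuple[str, str]:
    """Model excision by recombination between the first two copies of `repeat`.

    Returns (rejoined_chromosome, excised_element). The excised element is a
    circle in vivo; here it is written linearly, starting at its repeat copy.
    """
    first = chromosome.find(repeat)
    if first == -1:
        raise ValueError("direct repeat not found")
    second = chromosome.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("a second copy of the direct repeat is required")
    # Everything from the first repeat up to (not including) the second repeat
    # is removed, so each product retains exactly one repeat copy.
    rejoined = chromosome[:first] + chromosome[second:]
    excised = chromosome[first:second]
    return rejoined, excised


# Hypothetical toy sequences; only NIFD_REPEAT is taken from the text.
NIFD_REPEAT = "CGGAGTAATCC"
left_flank = "ATGAAA"           # stands in for the 5' part of nifD
interior = "TTTTGGGGTTTT"       # stands in for the element interior
right_flank = "CCCTAA"          # stands in for the 3' part of nifD

chromosome = left_flank + NIFD_REPEAT + interior + NIFD_REPEAT + right_flank
rejoined, excised = excise_between_direct_repeats(chromosome, NIFD_REPEAT)
print(rejoined)
print(excised)
```

In this model the rejoined chromosome is simply the two flanks joined across a single repeat copy, which mirrors the statement that removal of the element restores a contiguous nifD coding region.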

Methods and Materials

Anabaena ATCC 33047 was obtained from the American Type Culture Collection (ATCC) and was grown in an illuminated shaking incubator at 27°C in a variation of BG-11 media (BG-11₀ + NaHCO₃ (5 mM)), which has a reduced amount of nitrate. Genomic DNA was extracted using the Puregene DNA Isolation Kit (Gentra Systems, Minneapolis, MN) with slight modification. Amplification of the nifD element was accomplished with the Expand Long Template PCR system (Roche, Mannheim, Germany), using the oligos 5' TTTCCGCCAAATGCACTCTTG 3' and 5' GCAAACCGTCGTAACCGTGAT 3', which are located within nifD and flank the direct repeats. PCR conditions were as follows: an initial hold at 94°C for 2 min; 10 cycles of denaturation at 94°C for 10 s, annealing at 55°C for 30 s, and extension at 68°C for 10 min; 15 cycles of denaturation at 94°C for 10 s, annealing at 55°C for 30 s, and extension at 68°C for 10 min, with each successive cycle increasing by 20 s in duration; and a final extension at 68°C for 7 min. PCR products were verified by electrophoresis on 0.8% agarose gels and were purified with the QIAquick Gel Extraction Kit (QIAGEN, Valencia, CA). Sequencing of the PCR product was accomplished with the DYEnamic™ ET terminator cycle sequencing kit (Amersham Biosciences, Piscataway, NJ) using capillary electrophoresis on the ABI 310 and ABI 3100 genetic analyzers (Applied Biosystems, Foster City, CA). Both the forward and reverse DNA strands of the PCR-amplified element were sequenced to ensure that the sequence was correct and that no PCR artifacts were present. The sequence was deposited under the GenBank accession number AY299473. Open reading frame searches were performed with the NCBI ORF finder (www.ncbi.org), Gene Finder (www.softberry.com), and Gene Mark (http://opal.biology.gatech.edu/GeneMark/) programs. BLAST (basic local alignment search tool) searches were performed on all potential open reading frames using both the nucleotide and amino acid sequences.
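As a rough check on the cycling program described above, the programmed hold times can be summed. This is a sketch under two assumptions that the text does not state explicitly: the 20-s increase is applied to the extension step and begins with the first of the 15 later cycles, and ramp times between temperatures are ignored. The function name is hypothetical.

```python
def pcr_program_seconds(increment_s: int = 20) -> int:
    """Sum the programmed hold times of the long-template PCR, in seconds."""
    initial_hold = 2 * 60
    denature, anneal, extend = 10, 30, 10 * 60

    # First block: 10 identical cycles.
    block1 = 10 * (denature + anneal + extend)

    # Second block: 15 cycles, each one `increment_s` longer than the last
    # (assumption: cycle k of this block is lengthened by k * increment_s).
    block2 = sum(denature + anneal + extend + increment_s * k for k in range(1, 16))

    final_extension = 7 * 60
    return initial_hold + block1 + block2 + final_extension


total = pcr_program_seconds()
print(f"{total} s, about {total / 3600:.2f} h")  # 18940 s, about 5.26 h
```

Under these assumptions the program holds for a little over five hours, of which 2400 s comes from the per-cycle increments alone.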

Results

Anabaena ATCC 33047 nifD element

The Anabaena ATCC 33047 nifD element, including both direct repeats, is 3975 bp in length. Open reading frame (ORF) searches revealed up to 12 potential ORFs. On the positive strand, these potential ORFs correspond to positions 130-1548, 3059-3637, 1984-2130, 3009-3161, and 749-865. On the negative strand, these potential ORFs correspond to positions 2402-2755, 1605-2330, 2359-2658, 3145-3387, 156-365, 3747-3944, and 1694-1891. BLAST results indicate that portions of the Anabaena ATCC 33047 nifD element have significant similarity to the other sequenced nifD elements. These regions of similarity include xisA, the region known as ORF2 in Nostoc PCC 7120, and the 3′ and 5′ flanking regions. Most of the other ORFs showed little similarity to any other sequence in the database; however, a 125-nucleotide region composed mostly of repeating CATCTT units, encompassing the ORFs at positions 2359-2658 and 2402-2755, showed significant similarity to a number of viruses.
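The ORF coordinates listed above can be tabulated to obtain each ORF's length and the overlap between the two negative-strand ORFs that span the CATCTT repeat region. This is an illustrative sketch that assumes the reported positions are 1-based and inclusive; the variable and function names are hypothetical.

```python
ELEMENT_LENGTH = 3975  # bp, including both direct repeats

# (start, end) positions as reported, grouped by strand.
ORFS = {
    "+": [(130, 1548), (3059, 3637), (1984, 2130), (3009, 3161), (749, 865)],
    "-": [(2402, 2755), (1605, 2330), (2359, 2658), (3145, 3387),
          (156, 365), (3747, 3944), (1694, 1891)],
}


def orf_length(start: int, end: int) -> int:
    """Length in nucleotides, assuming 1-based inclusive coordinates."""
    return end - start + 1


def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


for strand, orfs in ORFS.items():
    for start, end in sorted(orfs):
        nt = orf_length(start, end)
        print(f"{strand} {start:>4}-{end:<4} {nt:>5} nt {nt // 3:>4} codons")

shared = overlap((2359, 2658), (2402, 2755))
print(f"ORFs 2359-2658 and 2402-2755 share {shared} nt")
```

With inclusive coordinates, every listed ORF has a length that is a multiple of three, and the largest, at positions 130-1548, spans 1419 nt.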

Comparison of all nifD elements

There is considerable variation in the nifD elements examined thus far (Fig. 1). They vary in overall size, ranging from 4 kb to 24 kb in length, and in composition. Despite the large amount of variability among the different nifD elements, several regions are conserved in all of them (Fig. 1). Each nifD element is flanked by identical 11-bp direct repeats, CGGAGTAATCC, with the exception of the nifD element in Anabaena ATCC 33047, in which the two direct repeats differ from each other by one nucleotide. The direct repeat furthest from xisA, CGGAGTAATCC, is identical to the direct repeats of all the other nifD elements; however, the direct repeat nearest xisA, CGGAGTAATTC, has a T in place of the C at position 10. The upstream region of xisA (129 bp) and xisA itself are conserved in all elements (Fig. 1 and Table 1). The region of the Nostoc PCC 7120 nifD element known as ORF2 [13], a small putative open reading frame (~500 nucleotides) of unknown function, is present and conserved in all the elements (Fig. 1 and Table 1). The 3' flanking region of the element (~250 nucleotides, including the direct repeat furthest from xisA) is conserved in all the sequenced elements (Fig. 1 and Table 1). In addition to the regions shared by all nifD elements, other regions of similarity are shared by some elements but absent from others. For instance, the Anabaena variabilis nifD element contains all the regions of the Nostoc PCC 7120 nifD element except for ORF4 and adxA, which are located at the 3' end of the element (Fig. 1). Anabaena ATCC 33047 shares a small region between ORFs 5 and 6 with Nostoc PCC 7120 and Anabaena variabilis that is not found in Nostoc punctiforme (Fig. 1). Likewise, Nostoc punctiforme shares a small region between ORFs 6 and 7 with Nostoc PCC 7120 and Anabaena variabilis that is not found in Anabaena ATCC 33047 (Fig. 1).
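The single-nucleotide difference between the two Anabaena ATCC 33047 direct repeats can be located with a position-by-position comparison. The Python sketch below is illustrative only; the two repeat sequences come from the text, while the function and constant names are assumptions made for the example.

```python
def mismatches(seq_a, seq_b):
    """Return (position, base_a, base_b) for each difference; positions are 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Direct repeat shared by the other nifD elements (and the repeat furthest from xisA).
SHARED_REPEAT = "CGGAGTAATCC"
# Direct repeat nearest xisA in Anabaena ATCC 33047.
ATCC_33047_NEAR_XISA = "CGGAGTAATTC"

print(mismatches(SHARED_REPEAT, ATCC_33047_NEAR_XISA))
```

The comparison reports one difference, at position 10 of the 11-bp repeat, where C is replaced by T.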

Discussion

It is now evident that the nifD element is quite divergent in those heterocystous cyanobacteria that possess it. While the nifD elements examined thus far contain conserved regions, including xisA, ORF2, and the 5' and 3' flanking regions, there is considerable variation in overall size, ranging from 4 kb to 24 kb, and in composition. The overall difference in size and the variable composition of these elements may indicate a complex pattern of insertions, deletions, and sequence divergence during the course of evolution. In an effort to better understand the evolution of the nifD element, we are currently sequencing nifD elements from additional strains of heterocystous cyanobacteria, which differ in size and sequence composition from published sequences of the strains discussed above (unpublished data). Of special interest is the variation in the sequence of the direct repeats of the Anabaena ATCC 33047 nifD element. Both direct repeats from the Anabaena variabilis, Nostoc PCC 7120, and N. punctiforme nifD elements are identical in nucleotide sequence (CGGAGTAATCC). This is also the exact sequence of the direct repeat farthest from xisA in Anabaena ATCC 33047; however, the direct repeat nearest xisA (CGGAGTAATTC) differs by one nucleotide, with a T in place of the C at position 10. The nifD element is excised from the chromosome by recombination between the flanking direct repeats. The fact that the recombination repeats in Anabaena ATCC 33047 are not identical in nucleotide sequence leads us to question the exact role they play in excision. It is not known whether the direct repeats are all that is required for excision of the nifD element. Considering that the Anabaena ATCC 33047 direct repeats are not identical, it is possible that other cis- or trans-acting sequences are required as well. The possibility that sequences other than the direct repeats are required for excision is supported by the presence of conserved regions, including the 3' and 5' flanking regions and ORF2. This was addressed to some extent by Brusca et al. (1990) [3] and Lammers et al. (1986; 1990) [12, 13]; however, it has not been fully resolved. We are currently assessing which portions of the DNA surrounding the direct repeats are required for excision of the nifD element.

A                 A. variabilis   ATCC 33047   PCC 7120   N. punctiforme
A. variabilis     ----            89           99         86
ATCC 33047        88              ----         90         81
PCC 7120          91              86           ----       86
N. punctiforme    83              84           83         ----

B                 A. variabilis   ATCC 33047   PCC 7120   N. punctiforme
A. variabilis     ----            90           88         74
ATCC 33047        81              ----         89         75
PCC 7120          93              81           ----       73
N. punctiforme    80              83           80         ----

Table 1. Comparison of sequence similarities from the conserved regions of the nifD element. Similarities are given as percentages and represent similarities between nucleotide sequences. A) The 5' flanking region upstream of xisA (top) and xisA itself (bottom). B) The 3' flanking region (top) and ORF2 (bottom).

Literature Cited:

1. Adams DG, Duggan PS (1999) Heterocyst and akinete differentiation in cyanobacteria. New Phytol 144: 3-33

2. Bohme H (1998) Regulation of nitrogen fixation in heterocyst-forming cyanobacteria. Trends Plant Sci 3: 346-351

3. Brusca JS, Chastain CJ, Golden JW (1990) Expression of the Anabaena sp. strain PCC 7120 xisA gene from a heterologous promoter results in excision of the nifD element. J Bacteriol 172: 3925-3931

4. Brusca JS, Hale MA, Carrasco CD, Golden JW (1989) Excision of an 11-kilobase-pair DNA element from within the nifD gene in Anabaena variabilis. J Bacteriol 171: 4138-4145

5. Carrasco CD, Golden JW (1995) Programmed DNA rearrangement of cyanobacterial hupL gene in heterocysts. Proc Natl Acad Sci (USA) 92: 791-795

6. Carrasco CD, Ramaswamy KS, Ramasubramanian TS, Golden JW (1994) Anabaena xisF gene encodes a developmentally regulated site-specific recombinase. Genes Dev 8: 74-83

7. Golden JW, Carrasco CD, Mulligan ME, Schneider GJ, Haselkorn R (1988) Deletion of a 55-kilobase-pair DNA element from the chromosome during heterocyst differentiation of Anabaena sp. strain PCC 7120. J Bacteriol 170: 5034-5041

8. Golden JW, Mulligan M, Haselkorn R (1987) Different recombination site specificity of two developmentally regulated genome rearrangements. Nature 327: 526-529

9. Golden JW, Robinson SJ, Haselkorn R (1985) Rearrangement of nitrogen fixation genes during heterocyst differentiation in the cyanobacterium Anabaena. Nature 314: 419-423

10. Golden JW, Whorff LL, Weist DR (1991) Independent regulation of the nifHDK operon transcription and DNA rearrangement during heterocyst differentiation in the cyanobacterium Anabaena sp. strain PCC 7120. J Bacteriol 173: 7098-7105

11. Haselkorn R (1992) Developmentally regulated gene rearrangements in prokaryotes. Annu Rev Genet 26: 113-130

12. Lammers PJ, Golden JW, Haselkorn R (1986) Identification and sequence of a gene required for a developmentally regulated DNA excision in Anabaena. Cell 44: 905-911

13. Lammers PJ, McLaughlin S, Papin S, Trujillo-Provencio C, Ryncarz II AJ (1990) Developmental rearrangement of cyanobacterial nif genes: nucleotide sequence, open reading frames, and cytochrome P-450 homology of the Anabaena sp. strain PCC 7120 nifD element. J Bacteriol 172: 6981-6990

14. Lynn ME, Bantle JA, Ownby JD (1986) Estimation of gene expression in heterocysts of Anabaena variabilis by using DNA-RNA hybridisation. J Bacteriol 167: 940-946

15. Meeks JC, Elhai J, Thiel T, Potts M, Larimer F, Lamerdin J, Predki P, Atlas R (2001) An overview of the genome of Nostoc punctiforme, a multicellular, symbiotic cyanobacterium. Photosynth Res 70: 85-106

16. Mulligan ME, Buikema WJ, Haselkorn R (1988) Bacterial-type ferredoxin genes in the nitrogen fixation regions of the cyanobacterium Anabaena sp. strain PCC 7120 and Rhizobium meliloti. J Bacteriol 170: 4406-4410

17. Mulligan ME, Haselkorn R (1989) Nitrogen-Fixation (nif) Genes of the Cyanobacterium Anabaena sp. strain PCC 7120. The nifB-fdxN-nifS-nifU Operon. J Biol Chem 264: 19200-19207

18. Oxelfelt F, Tamagnini P, Lindblad P (1998) Hydrogen uptake in Nostoc sp. strain PCC 73102. Cloning and characterization of a hupSL homologue. Arch Microbiol 169: 267-274

19. Ramaswamy KS, Carrasco CD, Tasneem F, Golden JW (1997) Cell-type specificity of the fdxN-element rearrangement requires xisH and xisI. Mol Microbiol 23: 1241-1249

20. Wolk CP, Ernst A, Elhai J (1994) Heterocyst metabolism and development. In: Bryant DA (ed) The molecular biology of cyanobacteria. The Netherlands: Kluwer Academic Publishers, pp 769-823

Chapter II

Evolution and Variation of the nifD and hupL Elements in the Heterocystous Cyanobacteria

Abstract

In some cyanobacteria, heterocyst differentiation is accompanied by developmentally regulated DNA rearrangements that occur within the nifD, hupL, and fdxN genes, referred to as the nifD, hupL, and fdxN elements. These elements are pieces of DNA embedded within the coding region of each gene, and they range from 4 to 55 kb in length. Each element is independently excised from the genome during the latter stages of differentiation by site-specific recombination; the nifD and hupL elements are excised by the XisA and XisC recombinases, which are encoded within the nifD and hupL elements, respectively. Here we examine the variation and evolution of the nifD and hupL elements by comparing full length nifD and hupL element sequences and by constructing phylogenetic trees of their respective xisA and xisC genes. Our data indicate that the representative nifD and hupL elements differ in overall size and composition. However, conserved regions are found within all nifD elements and within all hupL elements. For each element type, conserved regions include the 5' and 3' flanking regions, the respective recombinase required for excision, and a small region of unknown function located near the middle of each element. The data indicate that the nifD and hupL elements have undergone a complex pattern of insertions, deletions, translocations, and sequence divergence over the course of evolution, but that conserved regions remain, suggesting that these regions have been under selective pressure to be retained.

Introduction

The heterocystous cyanobacteria are a monophyletic lineage and are unique because they develop heterocysts as a mechanism to spatially separate photosynthesis and nitrogen fixation (Adams and Duggan, 1999; Bohme, 1998; Wolk et al., 1994). Heterocysts have a thickened cell wall that shields nitrogenase from oxygen, which would otherwise inactivate the enzyme. Heterocysts supply surrounding vegetative cells with fixed nitrogen, and in return vegetative cells supply the heterocysts with photosynthates (Adams and Duggan, 1999; Bohme, 1998; Wolk et al., 1994). Heterocyst differentiation involves up to 1000 different genes, representing 15-25% of the cyanobacterial genome (Adams and Duggan, 1999; Lynn et al., 1986). In cyanobacteria, several developmentally regulated DNA rearrangements accompany heterocyst differentiation; each involves the removal of a segment of DNA in association with a developmental shift or differentiation process. Cellular differentiation and developmentally regulated DNA rearrangements are rare in prokaryotes, with the best-studied examples being sporulation in Bacillus subtilis and Clostridium difficile and heterocyst differentiation in the cyanobacteria (Kunkel et al., 1990; Haselkorn, 1992; Haraldsen and Sonenshein, 2003). Cellular differentiation and developmentally regulated DNA rearrangements are, however, common among eukaryotes and include macronucleus development in Tetrahymena (Patil et al., 1997) and the VDJ rearrangements of immunoglobulin genes (Sleckman et al., 1996). In the cyanobacteria, DNA rearrangements occur within the nifD, fdxN, and hupL genes, which encode a subunit of nitrogenase, a bacterial-type ferredoxin of unknown function, and a subunit of the membrane-bound uptake hydrogenase, respectively (Golden et al., 1985, 1987, 1991; Mulligan et al., 1988; Oxelfelt et al., 1998; Haselkorn, 1992; Carrasco and Golden, 1995; Carrasco et al., 1995; Wolk et al., 1994).
Each of these elements is self-excised from the genome during the latter stages of heterocyst differentiation by site-specific recombination between direct repeats that flank the element (Adams and Duggan, 1999; Haselkorn, 1992; Carrasco et al., 1995). The length and nucleotide sequence of the direct repeats vary for each element (Adams and Duggan, 1999), and each element harbors its own site-specific recombinase responsible for its excision (Adams and Duggan, 1999; Bohme, 1998). All three elements have been found in Nostoc PCC 7120 (Golden et al., 1985, 1987, 1991). An 11-kb element is present within the nifD gene of Nostoc sp. strain PCC 7120. The nifD gene is part of the nifHDK operon, which encodes nitrogenase (Mazur and Chui, 1982). Expression of the nifHDK operon and the process of nitrogen fixation are contingent on the removal of this element. The element is excised by the enzyme encoded by xisA, which is located within the nifD element, through recombination between the 11-bp direct repeats (CGGAGTAATCC) that flank it (Brusca et al., 1989, 1990; Golden et al., 1985, 1987, 1991; Lammers et al., 1986, 1990; Haselkorn, 1992). There are two possible in-frame start codons for translation of xisA, but evidence presented here (below) as well as elsewhere (Haselkorn, 1992) suggests that the second in-frame start codon is the one from which translation initiates. Many other potential open reading frames (ORFs) are located within the element, although it is not believed that any ORF or gene other than xisA is required for excision (Lammers et al., 1986, 1990; Brusca et al., 1990). Some have suggested that other genes within the Nostoc sp. strain PCC 7120 nifD element may be expressed (Rice et al., 1982; Lammers et al., 1990). A 55-kb element is embedded within the fdxN gene of Nostoc PCC 7120. The fdxN gene is part of the nifB-fdxN-nifS-nifU operon (Golden et al., 1987, 1988; Mulligan et al., 1988; Carrasco et al., 1994), and removal of the element is required for expression. The element is excised by the recombinase XisF and the accessory proteins XisH and XisI, all of which are encoded within the fdxN element (Ramaswamy et al., 1997). This element is excised by site-specific recombination between 5-bp direct repeats (TATTC) that flank it (Golden et al., 1987, 1988; Mulligan et al., 1988; Carrasco et al., 1994). Embedded within the hupL gene of Nostoc sp. strain PCC 7120 is a 9.4-kb element (Carrasco et al., 1995).
This element is excised by the recombinase XisC, which is encoded within the element (Carrasco et al., 1995) (Fig. 3). The hupL element is excised by site-specific recombination between 16-bp direct repeats (CCATATAACTGCTGTG) that flank the element (Carrasco et al., 1995). It is believed that no other functional products are encoded on the hupL element (Carrasco et al., 1995). Although these programmed DNA rearrangements have received moderate attention in the literature, many questions remain regarding their evolution, variation, and regulation of excision. It is not known how widespread these elements are in the cyanobacteria, but the nifD and hupL elements have been found in a number of strains (Tables 1 and 2), whereas the fdxN element has only been found in Nostoc sp. strain Mac, Anabaena PCC 7122, and Anabaena sp. strain M131 (Carrasco and Golden, 1995). It was initially thought that only heterocystous cyanobacteria of Subsection IV have the elements (Saville et al., 1987; Kallas et al., 1985; Lammers et al., 1990). However, a recent study using Southern hybridization reported the presence of the hupL element (or a portion of xisC) within Fischerella PCC 7521 (Subsection V) and Leptolyngbya PCC 73110 (a nonheterocystous Subsection III strain) (Table 2) (Tamagnini et al., 2000). Within Subsection IV, the distribution of the nifD and hupL elements appears to be haphazard and independent of one another, with some strains having all three elements and others having two, one, or none. For example, Nostoc sp. strain PCC 7120 has all three, whereas Anabaena variabilis and Nostoc punctiforme have only the nifD element (Happe et al., 2000; Meeks et al., 2001). Although the extent of variation within the three elements is unknown, previous studies of the nifD element revealed variation in size and composition (Henson et al., 2005). The origins of these elements are unknown. The fdxN element has similarity to the Bacillus subtilis SKIN element (48 kb), which is excised from within the sigK gene during sporulation by the SpoIVCA recombinase (Kunkel et al., 1990; Haselkorn, 1992; Carrasco et al., 1994). Both XisF and SpoIVCA are members of the resolvase/invertase or serine family of recombinases (Smith and Thorpe, 2002). The recombinases XisA and XisC belong to the tyrosine or phage integrase family of site-specific recombinases (Nunes-Düby et al., 1998).
The tyrosine integrases are characterized by a conserved R-H-R-Y tetrad within the active site (Voziyanov et al., 1999; Nunes-Düby et al., 1998); however, XisA and XisC carry a substitution in the active site, with a tyrosine in place of the histidine (R-Y-R-Y) (Nunes-Düby et al., 1998). XisA and XisC are 61% similar and 43% identical at the amino acid level (Carrasco et al., 1995), and BLAST (basic local alignment search tool) searches indicate that they are more similar to each other than to any other sequence in the GenBank database. The similarities between xisA and xisC suggest that they are related and share a common ancestor, which may be evidence that the nifD and hupL elements themselves share a common evolutionary origin. It has been suggested that the nifD, hupL, and fdxN elements are the remnants of ancient viral infections that have lost the ability to self-replicate (Haselkorn, 1992; Henson et al., 2005). If they are in fact of viral origin, then they probably represent defective prophages in advanced stages of mutational decay (Casjens, 2003). They are not known to be functional (lysogenic). They contain little sequence similarity to extant viruses, with the exception of a small region of the nifD element in Anabaena ATCC 33047 (Henson et al., 2005). The B. subtilis SKIN element, which is similar to the fdxN element, is believed to be a defective prophage (Mizuno et al., 1996; Casjens, 2003). Prophages that become defective enter a stage of mutational decay, which is believed to be characterized by increased rates of mutations, deletions, and possibly insertions (Casjens, 2003; Canchaya et al., 2003). Many questions about the nifD and hupL elements remain. What is the extent of structural and compositional variation within the elements? What are their patterns of evolution? What evidence supports a viral ancestry for them? Did the elements arise independently within the cyanobacteria, or was there a single common origin followed by divergence? To test the hypothesis that the elements arose independently, we compared full length sequences of the nifD and hupL elements from several heterocystous cyanobacterial strains. We also sequenced additional xisA and xisC genes and constructed phylogenies to determine whether they are distinct genes and to shed light on the origin of the elements.

Methods and Materials

Cyanobacterial cultures were obtained from the Pasteur Culture Collection (PCC) and the American Type Culture Collection (ATCC), and were grown in an illuminated shaking incubator at 27°C in a variation of BG-11 medium (BG-11₀ + 5 mM NaHCO3), which has reduced nitrate. Genomic DNA was extracted using the Puregene DNA Isolation Kit (Gentra Systems, Minneapolis, MN) with slight modification (Henson et al., 2002). Amplification of full length nifD and hupL elements was accomplished with the Expand Long Template PCR system (Roche, Mannheim, Germany) using oligonucleotides that flank the elements. For the nifD element, the primers used for PCR amplification were 5'-TTTCCGCCAAATGCACTCTTG-3' and 5'-GCAAACCGTCGTAACCGTGAT-3', which are located within nifD flanking the direct repeats. To amplify the hupL element, the primers used were 5'-GGCGCACATCACACCACCAGGC-3' and 5'-GCCCTGTTTGGCGGACAATGGC-3', which are located within hupL flanking the direct repeats. PCR conditions to amplify the full length elements were as follows: an initial hold at 94°C for 2 min; 10 cycles of denaturation at 94°C for 10 s, annealing at 55°C for 30 s, and extension at 68°C for 10 min; 15 cycles of denaturation at 94°C for 10 s, annealing at 55°C for 30 s, and extension at 68°C for 10 min, with each cycle increasing 20 s in duration; and a final extension at 68°C for 7 min. PCR products were verified by gel electrophoresis on 0.8% agarose and purified with the QIAquick Gel Extraction Kit (QIAGEN, Valencia, CA). Amplification of partial xisA and xisC genes was accomplished with Taq DNA polymerase (New England Biolabs, Ipswich, MA). The cyanobacterial strains from which the partial xisA and xisC sequences were generated are listed in Tables 1 and 2. Primers used to amplify xisA were 5'-CTGCGTTTGAAGTCTGCAAAGAC-3' and 5'-CGTAAACCAAAAACTGCTAACATCC-3'. To amplify xisC, the primers 5'-GCAGACTCCAACCGACTGGTTCGC-3' and 5'-TATGATTTGTCTCTAGGGATACCA-3' were used. PCR conditions were as follows: an initial hold at 94°C for 2 min, followed by 35 cycles of denaturation at 94°C for 20 s, annealing at 55°C for 15 s, and extension at 72°C for 1 min. PCR products were verified by gel electrophoresis on 0.8% agarose and were purified with the QIAquick Gel Extraction Kit (QIAGEN, Valencia, CA).
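To make the long-template cycling program described above concrete, the Python sketch below adds up the programmed hold times. It is a rough illustration rather than part of the original protocol: it assumes that the 20 s increment applies to the extension step and begins with the first of the 15 later cycles, and it ignores ramp times between temperatures. The function and parameter names are hypothetical.

```python
def long_template_program_seconds(increment_s=20):
    """Total programmed hold time (s) for the full length element PCR.

    Assumption: the extension step grows by `increment_s` per cycle in the
    second block, starting with the first cycle of that block.
    """
    initial_hold = 2 * 60                    # 94 C for 2 min
    base_cycle = 10 + 30 + 10 * 60           # 94 C 10 s, 55 C 30 s, 68 C 10 min
    first_block = 10 * base_cycle            # 10 cycles without increment
    second_block = sum(base_cycle + increment_s * n for n in range(1, 16))  # 15 cycles
    final_extension = 7 * 60                 # 68 C for 7 min
    return initial_hold + first_block + second_block + final_extension


total = long_template_program_seconds()
print(f"{total} s, about {total / 3600:.2f} h excluding ramp times")
```

Under these assumptions, the program holds for 18,940 s in total, or a little over five hours, most of which is extension time for the long amplicons.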
Sequencing of purified PCR products was accomplished with the DYEnamic™ ET terminator cycle sequencing kit (Amersham Biosciences, Piscataway, NJ) using capillary electrophoresis on ABI 310, ABI 3100, and ABI 3730 genetic analyzers (Applied Biosystems, Foster City, CA) in the Miami University Center for Bioinformatics and Functional Genomics. Both the forward and reverse DNA strands of the PCR products (full length nifD and hupL elements and partial xisA and xisC genes) were sequenced to ensure that the sequence was correct and that no PCR artifacts were present. BLAST searches were carried out using complete nifD and hupL element sequences as well as smaller portions of each element (ranging from ~200 to 1500 bp). The BLAST programs used were: quickly search for highly similar sequences (megablast), quickly search for divergent sequences (discontiguous megablast), nucleotide-nucleotide BLAST (blastn), search for short, nearly exact matches, and translated query vs. protein database (blastx). Databases that were searched include nr, est, wgs, and individual genomes. All elements were searched against the completely sequenced microbial genomes, including eubacteria and archaea. Phylogenetic analyses of xisA and xisC nucleic acid sequences were accomplished with PAUP* 4.0b10 (Swofford, 2002) using parsimony, distance, and maximum likelihood (ML) criteria. The xisA tree was rooted with xisC from Nostoc PCC 7120, and the xisC tree was rooted with xisA from Nostoc PCC 7120. We also aligned and analyzed the two genes in one data matrix using ML to measure branch length and divergence between xisA and xisC. To construct the data matrices, inferred amino acid sequences were aligned using Clustal W (Thompson et al., 1994) with gaps inserted for optimal alignment. The amino acid alignments were manually adjusted using MacClade 4.0 (Maddison and Maddison, 2000). Nucleotide alignments were generated using Codon Align 2.0 (Barry Hall, University of Rochester), which constructs the nucleotide alignment based on the amino acid alignment, with gaps inserted between rather than within codons.
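The idea of building a nucleotide alignment from an amino acid alignment, with gaps falling between codons, can be sketched in a few lines of Python. This is a minimal illustration of the general approach and is not the Codon Align 2.0 implementation; the function name and the toy sequences are hypothetical.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map an ungapped coding sequence onto a gapped protein alignment row.

    Each residue consumes one codon from `cds`; each gap becomes three gap
    characters, so gaps never split a codon.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for ch in aligned_protein if ch != gap)
    if residues != len(codons):
        raise ValueError("number of residues does not match number of codons")
    codon_iter = iter(codons)
    return "".join(gap * 3 if ch == gap else next(codon_iter) for ch in aligned_protein)


# Toy example (hypothetical sequences): a two-residue peptide aligned with one gap.
print(thread_codons("M-K", "ATGAAA"))
```

With the toy input, the gapped protein row M-K and the coding sequence ATGAAA yield ATG---AAA, which keeps the reading frame intact across the gap.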
Parsimony analysis was conducted using the heuristic search option with gaps treated as missing data. Trees were obtained using stepwise addition and tree-bisection-reconnection (TBR) branch swapping with random sequence addition for 100 replicates, with steepest descent not in effect, maxtrees set at 50,000, branches collapsed if the maximum length was zero, the multrees option in effect, and no topological constraints enforced. Distance analysis was performed using the neighbor-joining method (Saitou and Nei, 1987), with all characters included and the DNA distance measure set to LogDet/paralinear to correct for base compositional bias (Lake, 1994; Lockhart et al., 1994). Prior to maximum likelihood analyses of our data matrices, Modeltest 3.7 was used to determine the evolutionary model that best fit the xisA and xisC matrices (Posada and Crandall, 1998). Modeltest selects two best-fit models, one by hierarchical likelihood ratio tests (hLRTs) and one by the Akaike information criterion (AIC). For xisA, Modeltest selected TrN+G (-lnL = 7176.2539) by the hLRTs and GTR+G (-lnL = 7145.8521) by the AIC. For xisC, Modeltest selected HKY+G (-lnL = 4562.8667) by the hLRTs and GTR+G (-lnL = 4544.7305) by the AIC. Maximum likelihood analyses were conducted using the heuristic search option, with the likelihood settings corresponding to each of the models selected by Modeltest. The parameters were as follows: no molecular clock was enforced; starting branch lengths were determined via the Rogers-Swofford approximation method; trees with likelihoods that were 5% or further from the target score were rejected; branch-length optimization used the one-dimensional Newton-Raphson method with a pass limit of 20 and delta = 1e-06; starting trees were obtained via stepwise addition; sequence addition was random; one tree was held at each step during stepwise addition; the tree-bisection-reconnection (TBR) branch swapping algorithm was used; the steepest descent option was not in effect; maxtrees was set to 50,000; branches were collapsed if the branch lengths were equal to or less than 1e-08; the multrees option was in effect; and topological constraints were not enforced. Bootstrap values were calculated for 500 replicates to evaluate branch support using parsimony criteria (Bremer, 1994; Felsenstein, 1985; Huelsenbeck, 1995).
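The AIC comparison reported above can be reproduced from the -lnL values using AIC = 2k - 2 lnL, where k is the number of free model parameters. The Python sketch below is illustrative only. The parameter counts (HKY+G = 5, TrN+G = 6, GTR+G = 9) are an assumption based on the usual counts of free substitution-rate, base-frequency, and gamma-shape parameters, excluding branch lengths, and only the two models reported for each gene are compared rather than the full set that Modeltest evaluates.

```python
# (-lnL, assumed number of free parameters k) for the models reported in the text.
MODELS = {
    "xisA": {"TrN+G": (7176.2539, 6), "GTR+G": (7145.8521, 9)},
    "xisC": {"HKY+G": (4562.8667, 5), "GTR+G": (4544.7305, 9)},
}


def aic(neg_lnl, k):
    """Akaike information criterion: AIC = 2k - 2 lnL = 2k + 2(-lnL)."""
    return 2 * k + 2 * neg_lnl


def best_by_aic(candidates):
    """Return the name of the candidate model with the lowest AIC."""
    return min(candidates, key=lambda name: aic(*candidates[name]))


for gene, candidates in MODELS.items():
    for name, (neg_lnl, k) in candidates.items():
        print(f"{gene} {name}: AIC = {aic(neg_lnl, k):.4f}")
    print(f"{gene} lowest AIC: {best_by_aic(candidates)}")
```

Under these assumed parameter counts, GTR+G has the lower AIC for both genes, which is consistent with the AIC selections reported above.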

Results

Structural Variation of the nifD element

To date, five nifD elements have been completely sequenced, and considerable variation has been found among them (Fig. 1). They vary in overall size, ranging from 4 to 24 kb in length, and in composition (Henson et al., 2005) (Fig. 1). Considerable portions of each nifD element are composed of regions not found in the other completely sequenced elements; however, the G+C content of each element is comparable to that of the total genome (within 5%) (Table 3). Despite this compositional variation, there are conserved regions (Fig. 1). The 5' and 3' flanking regions (~250 bp) of each element are conserved (Fig. 1). Included in the conserved flanking regions are the 11-bp direct repeats, which for most strains are CGGAGTAATCC. However, in Anabaena ATCC 33047 (Henson et al., 2005) and Nostoc PCC 7416, the direct repeat closest to xisA differs by one nucleotide: CGGAGTAATTC in Anabaena ATCC 33047 (T in place of C at position 10) and CGGAGTAATCA in Nostoc PCC 7416 (A in place of C at position 11). All nifD elements contain xisA; however, the region upstream of xisA, which contains its promoter, is variable. The promoter of most xisA genes is ~130 bp in length and highly conserved. However, the xisA promoters in Cylindrospermum PCC 7417, Cylindrospermum PCC 7604, Nostoc PCC 7416, and Nostoc PCC 7415 contain small unique fragments inserted within them. The region of the Nostoc PCC 7120 nifD element encompassing alr1449 (~500 bp), originally designated as ORF2 (Lammers et al., 1990; Henson et al., 2005), is present and highly conserved in all completely sequenced nifD elements (Fig. 1). It is not known whether alr1449 is functional, or what its function may be. It has been suggested by others (Rice et al., 1982; Lammers et al., 1990) that alr1449 may be expressed in Nostoc PCC 7120. BLAST searches indicate that it is highly similar to the corresponding region of the other sequenced nifD elements, with little similarity to any other known sequence. In addition to the conserved regions found in all nifD elements, there are regions of similarity shared by some elements but not found in others. For example, the A. variabilis nifD element contains all of the regions that the Nostoc PCC 7120 element possesses except for alr1443, asr1451, asr1452, asr1453, and a portion of the intergenic spacer between alr1450 and asr1451 (Fig. 1).
Anabaena ATCC 33047 shares a small region between alr1443 and alr1444 with Nostoc PCC 7120 and A. variabilis, which is not found in other nifD elements (Fig. 1). N. punctiforme contains a portion of all1445, which is also found in the nifD elements of Nostoc PCC 7120 and A. variabilis, but not the other strains (Fig. 1). The Cylindrospermum PCC 7417 and N. punctiforme nifD elements both contain a region that encompasses the tRNA-Asn (600 bp) and ctiA genes (1500 bp), which are not found in the other elements (discussed below) (Fig. 2).

Structural Variation of the hupL element

The hupL elements from Nostoc PCC 7120 and Cylindrospermum PCC 7417 have been completely sequenced and are 9.4 kb and 7 kb in length, respectively. As with the nifD element, a considerable portion of each element is unique, but the G+C content of each element is comparable to that of its entire genome (Table 3). There are conserved regions present in both elements (Fig. 3). The 5' (1800 bp) and 3' (200 bp) flanking regions of the Nostoc PCC 7120 and Cylindrospermum PCC 7417 elements are conserved. The Nostoc PCC 7120 element is flanked by 16-bp direct repeats, CCATATAACTGCTGTG. Within the Cylindrospermum PCC 7417 hupL element, the direct repeat furthest from xisC is identical to the direct repeats in Nostoc PCC 7120, but the direct repeat nearest xisC, CCATATAACTGCTATG, differs by one nucleotide (A in place of G at position 14). The recombinase gene xisC (and its promoter) is highly conserved in Nostoc PCC 7120 and Cylindrospermum PCC 7417. Cylindrospermum PCC 7417 contains a ~50 bp region that corresponds to the area between all0685 and alr0686 in Nostoc PCC 7120. The region of the Nostoc PCC 7120 hupL element encompassing asr0680, alr0681, and asr0682 is conserved in Cylindrospermum PCC 7417, although the arrangement differs between the two (Fig. 3). In Nostoc PCC 7120 they are encoded in the order asr0680, alr0681, and asr0682 from 5' to 3', whereas in Cylindrospermum PCC 7417 they are encoded in the order asr0682, asr0680, and alr0681 (Fig. 3).

Phylogenies of xisA and xisC

XisA and xisC were aligned together in one data matrix and analyzed. ML analysis resulted in the tree in Fig. 4, which indicates that xisA and xisC are distinct genes. The individual xisA and xisC trees also differed from each other. Within the xisC tree, the Nostoc strains PCC 7121, PCC 7118, PCC 7120, and PCC 7906 and Anabaena PCC 7108 occur together; however, they do not occur together in the xisA tree. The similarities between the two trees include Cylindrospermum strains PCC 7417 and PCC 7604 forming a clade, the occurrence of Nostoc strains PCC 7121 and PCC 7906 together, and the placement of Nostoc strains PCC 7118 and PCC 7120 and Anabaena PCC 7108 in the same clade.

The xisA aligned data matrix was 594 characters long, and ML analyses with the TrN+G (-lnL = 7176.2539) and GTR+G (-lnL = 7145.8521) models produced single trees with identical topologies (Fig. 5). The topology of this tree is identical to that of the trees produced by parsimony and distance analyses. The xisA tree is composed of two major clades (Fig. 5). Clade A is composed of three branches: 1) Nostoc PCC 7415 and Nostoc PCC 7121; 2) sister to N. punctiforme, Scytonema PCC 7110 and Nostoc PCC 7906; and 3) sister to Cylindrospermum PCC 7417 and Cylindrospermum PCC 7604. In clade B, strains Anabaena PCC 7108, Nostoc PCC 7120, Nostoc PCC 6314, and Nostoc PCC 7118 are united in an unresolved polytomy, with A. variabilis, Nostoc PCC 7416, and Nostoc PCC 7524 as sister. Anabaena ATCC 33047 is at the base of the tree. Analysis of xisA with the third codon position excluded produced a different topology, in which clade B is embedded within clade A. The xisC aligned data matrix was 791 characters long, and ML analyses with the HKY+G (-lnL = 4562.8667) and GTR+G (-lnL = 4544.7305) models each resulted in a single tree with identical topologies (Fig. 6), which were also identical to those from distance and parsimony analyses. Clade A is composed of Cylindrospermum PCC 7417, Cylindrospermum PCC 7604, and Nodularia PCC 73104. Clade B is composed of Nostoc PCC 6411, Nostoc PCC 7121, Nostoc PCC 7118, and Nostoc PCC 7122 on a single branch, with Anabaena PCC 7108 and Nostoc PCC 7120 as sister. Analysis of the xisC data matrix with the third codon position excluded produced a phylogeny with a different topology, consisting of one branch uniting Cylindrospermum strains PCC 7417 and PCC 7604 and Nodularia PCC 73104, with the remaining strains occurring as an unresolved polytomy.

Discussion

Heterocyst differentiation is initiated by nitrogen deprivation, i.e., a decrease in the reduced nitrogen available in the environment. NtcA, a transcription factor involved in the regulation of nitrogen fixation and heterocyst differentiation (Chastain et al., 1990; Jiang et al., 1997, 2000; Ramasubramanian et al., 1994, 1996; Herrero et al., 2004), gauges the environmental nitrogen status by sensing 2-oxoglutarate levels (Muro-Pastor et al., 2001; Tanigawa et al., 2002; Herrero et al., 2004). HetR is considered a master regulator of heterocyst differentiation and is believed to be regulated by NtcA; however, the exact mechanism of this regulation is not known (Herrero et al., 2004). HetR, along with NtcA, HetC, and HetL, is required for initiation of differentiation (Golden and Yoon, 2003; Khudyakov and Golden, 2004). HetR, along with HetP, HetF, PatS, PatA, PatB, and PatU, is involved in establishing the spacing pattern of heterocysts within the filament (Golden and Yoon, 2003; Khudyakov and Golden, 2004). It is not known what factor(s) induce expression of the recombinases and, in turn, excision. Analysis of the xisA promoter from Nostoc PCC 7120 reveals three potential binding sites for NtcA (Jiang et al., 1997, 2000; Ramasubramanian et al., 1994, 1996; Herrero et al., 2004) and an additional binding site for “factor 2” (Ramasubramanian et al., 1994); however, it is not known how they influence expression.

NifD element

The 5' end of all nifD elements is conserved and includes the direct repeat, the xisA promoter, and xisA itself. The 3' end of all nifD elements (~250 bp) is also conserved and includes the direct repeat. All nifD elements contain alr1449 (Fig. 1), but it has no known function and is not similar to any other sequence in GenBank (Lammers et al., 1990; Henson et al., 2005). The work of others suggests that alr1449 may be expressed in Nostoc PCC 7120 (Rice et al., 1982), but there are discrepancies as to whether it is a combination of alr1448 and alr1449 or alr1450 that is expressed (Lammers et al., 1990). Alr1448 has no similarity to any other known sequence in GenBank and is not conserved in all nifD elements (Fig. 1). Alr1450 is found only in the nifD elements of Nostoc PCC 7120 and A. variabilis (Fig. 1), but it is similar to the cytochrome P-450 family of monooxygenases, which are found throughout the eukaryotes and prokaryotes. A divergent portion of alr1450 (125 bp) is located in the genome of N. punctiforme, but not within the nifD element (see below). It has been suggested that alr1450 could provide a competitive advantage to the organisms that have it, and may therefore supply the selective pressure to keep the element (Lammers et al., 1990; Haselkorn, 1992; Meeks et al., 1994), but this remains questionable because it is not conserved in all nifD elements. While some members of the serine family of site-specific recombinases require additional proteins for recombination, members of the tyrosine family, including XisA, are not believed to require any additional component for recombination (Smith and Thorpe, 2002; Nunes-Düby et al., 1998). This suggests that alr1449 is not involved in the actual excision of the nifD element.

Individual BLAST searches of sequenced nifD elements confirmed the expected similarities to the conserved regions of other nifD elements in GenBank (Fig. 1). They also identified particular regions of individual elements that are similar to a variety of other sequences. BLAST searches of the Nostoc PCC 7120 and A. variabilis nifD elements revealed that alr1444 is similar to several cyanobacterial genes. These include the pebAB operon in Fremyella diplosiphon, which is involved in the synthesis of the photosynthetic pigment phycoerythrobilin (Alvey et al., 2003), sll1232 in Synechocystis PCC 6803, and glr0577 and gll1896 in Gloeobacter PCC 7421. Asl1446 was found to be very similar to putative proteins within a number of cyanobacteria, and in some cases to multiple genes within individual genomes. In addition, a 35 bp region of the Nostoc PCC 7120 and A. variabilis nifD elements is similar to the Nostoc PCC 7524 nspV endonuclease (Ueno et al., 1993). Interestingly, this ~35 bp region is also conserved in the Cylindrospermum PCC 7417 hupL element. In addition, a region of the A. variabilis nifD element is similar to rbpG, which encodes an RNA-binding protein and occurs in multiple places within the A. variabilis genome (Maruyama et al., 1999). BLAST searches of the Anabaena ATCC 33047 nifD element also identified expected similarities to other nifD elements.
A 125 bp region of this element, composed mostly of repeating CATCTT units, is similar to sequences from several viruses (Rabbit fibroma virus and Rat cytomegalovirus), the protozoan Cryptosporidium hominis, the amoeba Entamoeba histolytica, the malaria parasite Plasmodium falciparum, the fungi Yarrowia lipolytica, Filobasidiella neoformans, Cryptococcus neoformans, and Kluyveromyces delphensis, the slime mold Dictyostelium discoideum, and portions of certain human and mouse chromosomes (Henson et al., 2005).

The largest known nifD element is the 24-kb N. punctiforme element. In addition to the expected similarities with other nifD elements (Fig. 1), portions of the N. punctiforme element are similar to a variety of sequences in GenBank that are not found in the other nifD elements. Two separate regions within the N. punctiforme element are similar to a number of eubacterial serine/threonine and histidine kinases and, in some cases, to multiple kinases within individual genomes. The N. punctiforme element contains a region that is similar to all3149 (a hypothetical protein from Nostoc PCC 7120) and slr1963 from Synechocystis PCC 6803, which is believed to encode a solanesyl diphosphate synthase involved in the formation of isoprenoid quinones (Okada et al., 1997). Regions of the N. punctiforme element are also similar to alr1077, alr2191, and all4941 from Nostoc PCC 7120. Of these, all4941 has no known function and is not widespread, alr1077 is a probable carboxymethylenebutenolidase (Sazuka, 2003), and alr2191 has no known function but is highly conserved in a number of eubacteria. Interestingly, the N. punctiforme nifD element also contains a ~100 bp portion that is similar to a region of the chromosome near patU. PatU is believed to be involved in heterocyst pattern formation (Meeks et al., 2002; Khudyakov and Golden, 2004), and is also present in the genomes of Nostoc PCC 7120 and A. variabilis (Meeks et al., 2002). BLAST searches of this 100 bp region revealed that it occurs in five separate places within the genome of N. punctiforme.

The last two regions of the N. punctiforme nifD element that have similarity to other sequences in the database are a ~500 bp region that is very similar to the tRNA-Asn gene and a ~1500 bp region that corresponds to the A. variabilis ctiA gene. The tRNA-Asn gene encodes the transfer RNA for asparagine. Although the exact function of ctiA remains unknown, it is very similar to the family of membrane-bound stomatin/prohibitin genes, which are found throughout the eukaryotes and prokaryotes. The close placement of tRNA-Asn and ctiA is common among the cyanobacteria; however, the spacer region between the two varies in size. The tRNA-Asn and ctiA region is conserved in A. variabilis, Nostoc PCC 7120, N. punctiforme, and Cylindrospermum PCC 7417, although it is located within the nifD element only in the latter two (Fig. 2).
The 9 kb nifD element in Cylindrospermum PCC 7417 contains all the conserved regions of the other nifD elements described above (Fig. 1), as well as the ctiA and tRNA-Asn genes it shares with N. punctiforme (Fig. 2). The rest of the Cylindrospermum PCC 7417 element has little similarity to any other known sequence.

Based on structure and composition, the nifD elements can be separated into several groups. The first group includes the nifD elements from Nostoc PCC 7120 and A. variabilis, which are ~70% identical over their entire length. Conservation at the nucleotide and gene organization levels between these two suggests that they are closely related. The only differences between the two elements are 1) the lack of a conserved alr1443 in A. variabilis and 2) the asr1451, asr1452, and asr1453 genes and surrounding sequences, which are absent in A. variabilis and were probably lost via deletion. Neither kind of change is uncommon during the mutational decay of prophages (Casjens, 2003).

The second group is composed of only Anabaena ATCC 33047, whose element is highly divergent and appears to have had two major deletions during its evolution: one encompassing the region between alr1444 and alr1448, and a second including the region from alr1450 to the last 350 bp of the element (Fig. 1). This element also contains segments that correspond to alr1443 and the alr1443-alr1444 intergenic region. In the Anabaena ATCC 33047 element, the region that corresponds to alr1443 is very divergent, probably due to mutation, whereas the alr1443-alr1444 intergenic region is conserved. The fact that the Anabaena ATCC 33047 element shares the alr1443-alr1444 intergenic region with the Nostoc PCC 7120 and A. variabilis elements, and with no other element, may indicate that they share a close relationship. It is possible that the ancestor of Anabaena ATCC 33047 contained an element that resembled those found in Nostoc PCC 7120 and A. variabilis. It appears that an increased rate of mutation as well as the two large deletions shaped the evolution of the Anabaena ATCC 33047 nifD element. Of particular importance is the 125 bp region that is similar to sequences from a number of organisms, including viruses, amoebae, protozoan parasites, fungi, slime molds, and mammals (Henson et al., 2005). It is possible that this region was present in the ancestral nifD element and has been retained in this strain but lost in the other elements.
An alternative hypothesis is that this region originated via another viral infection or through the exchange of genetic material with another integrated prophage (Garcia-Vallve et al., 1999; Casjens, 2003).

The last group contains the N. punctiforme and Cylindrospermum PCC 7417 elements and is defined by the presence of the ctiA and tRNA-Asn genes. N. punctiforme also contains a portion of all1445, suggesting that the full-length all1445 and possibly some of the surrounding regions were present in the ancestor of N. punctiforme but have been deleted over time. In addition, N. punctiforme contains a 125 bp segment that is similar to alr1450 but is not located within the nifD element, as it is in other strains. This would imply that the elements in this group, N. punctiforme and Cylindrospermum PCC 7417, originated from an ancestral element that resembled those in Nostoc PCC 7120 and A. variabilis.

The ctiA and tRNA-Asn genes are present in Nostoc PCC 7120, A. variabilis, N. punctiforme, and Cylindrospermum PCC 7417; however, they occur within the nifD elements of only the latter two. This suggests that the ctiA and tRNA-Asn genes have been either inserted into or removed from the nifD element over the course of evolution (via a translocation, chromosomal rearrangement, or deletion). It is possible that the virus that gave rise to the nifD element contained the ctiA and tRNA-Asn genes and that they have been transferred out of the nifD element into the genomes of Nostoc PCC 7120 and A. variabilis. It is also possible that these genes were not present in the ancestral nifD element but have since been transferred into the elements of N. punctiforme and Cylindrospermum PCC 7417. Several lines of evidence suggest that the location of the ctiA and tRNA-Asn genes within the nifD element is not the ancestral state, but that they have been transferred into the element. CtiA is found in many organisms (both eukaryotic and prokaryotic) and, to the best of our knowledge, is not located in any prophage (functional or defective). If ctiA had been introduced into the cyanobacteria via a prophage, then ctiA would be expected to still be associated with a prophage in other organisms (including the cyanobacteria). Furthermore, the tRNA-Asn gene is encoded only within the nifD element of N. punctiforme and nowhere else in the genome, suggesting it is the only functional copy. While some viruses do encode tRNA genes, most do not (Nishida et al., 1999). The N. punctiforme element contains several other regions with similarities to genes that are not typically found in prophages (Casjens, 2003).
These genes include histidine kinases, a potential solanesyl diphosphate synthase gene, and a carboxymethylenebutenolidase gene. All this suggests that segments of DNA may have been transferred into the nifD element of N. punctiforme over time. While the movement of genetic material into integrated prophages is infrequent, it is believed to have occurred (Garcia-Vallve et al., 1999; Casjens, 2003; Canchaya et al., 2003).

The structural and compositional variation within the nifD element suggests a model of evolutionary history that involves deletions, insertions, translocations, and nucleotide sequence divergence. Although this is speculative, we propose that the nifD elements in Nostoc PCC 7120 and A. variabilis more closely resemble the ancestral nifD element and that the other elements represent more derived forms.

HupL element

The 5' end of the hupL elements is conserved and includes the direct repeat, xisC, and its promoter. The 3' end (~200 bp) of the hupL elements is also conserved and includes the direct repeat. In addition, both the Cylindrospermum PCC 7417 and Nostoc PCC 7120 hupL elements contain a conserved central region, which encompasses the asr0680, alr0681, and asr0682 genes. However, these genes are in different arrangements in Cylindrospermum PCC 7417 and Nostoc PCC 7120, which indicates that a translocation has occurred during the course of evolution. It is not known which of these arrangements represents the ancestral state. Additional hupL element sequences may help determine the ancestral state, but it is unlikely that this will be unequivocally determined. Although it cannot be resolved with certainty, we hypothesize that the position of asr0682 in Cylindrospermum PCC 7417 is the result of a translocation. This is supported by comparative sequence analysis of the Nostoc PCC 7120 and Cylindrospermum PCC 7417 hupL elements. The intergenic region between the Nostoc PCC 7120 asr0681 and asr0682 genes is absent in Cylindrospermum PCC 7417. In Cylindrospermum PCC 7417 the end of asr0682 is located exactly 92 bp upstream of the start of asr0680. This 92 bp region corresponds to the region just upstream of asr0680 in Nostoc PCC 7120 and is highly conserved in both elements. The region between xisC and asr0682 in the Cylindrospermum PCC 7417 element, while divergent, has small regions of similarity to asl0678. These small regions of similarity, the orientation of asr0682, alr0680, and asr0681, and the presence of the 92 bp region between asr0682 and alr0680 in Cylindrospermum PCC 7417 lend support to the hypothesis that asr0682 has been translocated in Cylindrospermum PCC 7417.
BLAST searches of the hupL elements revealed the expected similarities to the conserved regions, but the remainder of each element has little similarity to any other sequence in GenBank. The one exception is a small region (35 bp) of the Cylindrospermum PCC 7417 hupL element that is similar to a portion of the Nostoc PCC 7524 nspV gene as well as to the nifD elements from Nostoc PCC 7120 and A. variabilis. Although only two hupL elements have been sequenced, it is apparent that they have experienced translocations, deletions, and nucleotide sequence divergence. Apart from the translocation, the remainder of the variation between the Nostoc PCC 7120 and Cylindrospermum PCC 7417 hupL elements is probably due to nucleotide sequence divergence over time. The difference in size between the two elements indicates that one or more deletions or insertions have occurred, which fits the mutational decay hypothesis for prophages (Casjens, 2003).

While the nifD and hupL elements each have unique conserved regions with no sequence similarity between them (other than xisA and xisC), there are structural similarities between their conserved regions: 1) both have conserved 5' and 3' flanking regions, including the direct repeats; 2) each element type encodes its own recombinase, located within 250 bp of one of the direct repeats; and 3) both element types have a conserved middle region. Although there are conserved regions, considerable variation in size and composition exists within each element type. In fact, the overall percentage of each individual element that is occupied by the conserved regions is relatively small. Within the nifD elements, the conserved regions account for 10 to 25% of the entire sequence (excluding Anabaena ATCC 33047, in which they account for over 60%), whereas within the hupL elements the conserved regions account for 30 to 40% of the entire sequence. Therefore the majority of each individual nifD and hupL element is composed of regions that are not conserved. A considerable portion of these nonconserved regions is unique and has no similarity to any other sequence, whereas other regions are similar to a variety of sequences in GenBank.
Both xisA and xisC belong to the same family of recombinases and are more similar to each other than to any other recombinase; our analyses indicate that they are clearly two separate genes that share a common evolutionary origin. The close relationship between xisA and xisC suggests a common ancestry for the nifD and hupL elements, which may have arisen from a common viral ancestor but diverged prior to insertion within the cyanobacteria. Our analyses of xisA resulted in a phylogeny that was distinct from that of xisC, implying that they may have had different patterns of evolution within the heterocystous cyanobacteria since their insertion within the nifD and hupL genes. This difference could also be due to sampling error, since the two data sets do not include the same taxa.

The available data support the hypothesis that the nifD and hupL elements arose independently within the cyanobacteria. If the nifD and hupL elements resulted from a single origin that subsequently duplicated and diverged, we would expect more sequence conservation between the two. While xisA and xisC have similar nucleotide and amino acid sequences, the nifD elements have little sequence similarity to the hupL elements, with the exception of the 35 bp region of similarity to the Nostoc PCC 7524 nspV endonuclease that the nifD elements of A. variabilis and Nostoc PCC 7120 and the hupL element of Cylindrospermum PCC 7417 share. The presence of this 35 bp fragment in the nifD elements of A. variabilis and Nostoc PCC 7120 and the hupL element of Cylindrospermum PCC 7417 could be explained by one of three possibilities: 1) it occurred by chance, 2) the nifD and hupL elements did not arise independently in the cyanobacteria, or 3) there has been an exchange of genetic material between the hupL and nifD elements in the ancestor of Cylindrospermum PCC 7417. The first and third possibilities are most likely. Exchange of genetic material between unrelated prophages within the same organism has been suggested by others (Casjens, 2003).

In addition to the independent origins of the nifD and hupL elements, it is likely that they have been under different selection pressures, which would explain the topological differences between the xisA and xisC phylogenies, as well as the seemingly haphazard occurrence of the elements. In addition, it appears that individual regions within each element type may have been under different selection pressures and experienced different rates of evolution.
This would explain the extensive structural and compositional variation within the nifD and hupL elements that is coupled with the presence of conserved regions. If only certain regions within the nifD and hupL elements are under selective pressure to be retained (Figs. 1 and 2), then the remainder would be free to accumulate mutations (e.g., insertions, deletions, and translocations) without inhibiting excision or harming the organism.

The origins of the elements are unclear, but the data suggest that they represent defective prophages that are in the process of mutational decay. This is supported by the close relationship of xisA and xisC with other members of the tyrosine family of recombinases, which are mostly found in viruses (Nunes-Düby et al., 1998; Carrasco et al., 2005). The structure of the nifD and hupL elements also suggests viral ancestry, with the recombinase being located at one end of the integrated prophage (Casjens, 2003). In addition, each element type is inserted in the same location in all examined strains. The only segment of any sequenced nifD or hupL element (other than xisA and xisC) that is similar to a virus is the 125 bp region of the Anabaena ATCC 33047 element, which is similar to portions of the Rabbit fibroma virus and Rat cytomegalovirus (Henson et al., 2005). The nifD and hupL elements may represent defective prophages in a state of mutational decay, which includes an increased mutation rate in certain regions and large-scale deletions, both of which are characteristic of decaying prophages (Casjens, 2003; Canchaya et al., 2003). In defective prophages, it is possible for certain genes to be retained because they are selected for, while the remainder of the prophage accumulates mutations. This could explain the structural and compositional variation within the nifD and hupL elements while certain regions remain conserved. It is also possible for a highly mutated prophage to have diverged so much that no characteristic genes from the virus (other than the recombinase) remain (Casjens, 2003).

If they are defective prophages, the elements exist as part of the flexible gene pool in cyanobacteria. The flexible gene pool is best described as the DNA within a bacterial genome that arose through horizontal transfer (Hacker and Carniel, 2001). It consists of genetic entities such as genomic islands, prophages, integrons, transposons, genomic islets, and plasmids (Hacker and Carniel, 2001). Genetic information within the flexible gene pool may provide an ecological or pathogenic advantage; however, it may instead provide no selective advantage and exist as selfish DNA (Lilley et al., 2000).
The occurrence of the nifD and hupL elements is haphazard. Initially they were thought to occur only in Subsection IV, but they are not present in all representatives, suggesting that the elements arose prior to the diversification of Subsection IV but have since been lost in some strains. Recently the hupL element (or at least a portion of xisC) has been found within a Subsection V Fischerella strain and a Subsection III Leptolyngbya strain (Tamagnini et al., 2000). This suggests that the elements may have originated within the lineage prior to the diversification of the filamentous cyanobacteria (Subsections III, IV, and V). It is also possible that they arose prior to the diversification of all cyanobacteria, and that remnants of these elements may exist in all subsections. However, no remnants of either element have been found in the unicellular strains that have been sequenced (Subsections I and II), although a thorough investigation is lacking. Regardless of when the elements arose within the cyanobacterial lineage, they have been independently lost numerous times.

For the elements to be maintained, there would have to be some selection pressure to keep them. If the excision of the elements became coupled with the onset of nitrogen fixation and heterocyst differentiation, then they would be retained. This would explain why many of the nonheterocystous cyanobacteria do not have the elements; however, it does not explain why some heterocystous strains do not have them. These strains may have lost the elements, which could happen if the elements were inadvertently excised.

The variation in size, structure, and composition of the nifD and hupL elements indicates a complex pattern of insertions, deletions, translocations, and sequence divergence over the course of evolution. This complex evolutionary pattern suggests that, aside from the conserved regions, these elements are a repository of genetic information where segments of DNA can be inserted, deleted, translocated, or mutated with little effect on the organism or on excision. The nifD and hupL elements represent an interesting aspect of the regulation of nitrogen fixation, heterocyst differentiation, and the molecular evolution of genome structure within the heterocystous cyanobacteria. In addition, they may represent an excellent system for the study of defective prophages that are in an advanced state of mutational decay.

Table 1. Cyanobacteria that have been examined for the presence of the nifD element. Strains are separated by taxonomic Subsection. Presence refers to whether or not the element has been detected, Y (yes) or N (no). Method refers to whether the element was detected by Southern hybridization (S) or by DNA sequencing (D). The reference in which the Southern hybridization or DNA sequence data can be found is listed. Culture collection abbreviations are: ATCC, American Type Culture Collection (Rockville, MD); PCC, Pasteur Culture Collection (Paris, France); and ARM, National Facility for Blue-Green Algal Collections, Indian Agricultural Research Institute, New Delhi.

Strain                               Presence  Method  Length         Reference
                                     (Y/N)     (S/D)   sequenced

Subsection I
Aphanocapsa sp. ATCC 27178           N         S       n/a            Saville et al., 1987
Anacystis nidulans R2                N         S       n/a            Saville et al., 1987
Coccochloris peniocystis ATCC 27147  N         S       n/a            Saville et al., 1987
Cyanothece PCC 7424                  N         S       n/a            Kallas et al., 1985
Gloeocapsa sp. ATCC 27191            N         S       n/a            Saville et al., 1987
Synechococcus cedrorum ATCC 27146    N         S       n/a            Saville et al., 1987
Synechococcus PCC 6301               N         S       n/a            Kallas et al., 1985
Synechococcus PCC 7335               N         S       n/a            Kallas et al., 1985
Synechococcus PCC 7425               N         S       n/a            Kallas et al., 1985
Synechocystis sp. ATCC 29109         N         S       n/a            Saville et al., 1987

Subsection III
Leptolyngbya sp. ATCC 29126          N         S       n/a            Saville et al., 1987
Leptolyngbya PCC 73110               N         S       n/a            Kallas et al., 1985
Pseudanabaena sp. ATCC 29210         N         S       n/a            Saville et al., 1985
Pseudanabaena PCC 7409               N         S       n/a            Kallas et al., 1985

Subsection V
Fischerella ATCC 27929               N         S       n/a            Saville et al., 1987
Mastigocladus sp. ARM 351            N         S       n/a            Prasanna and Kaushik, 1995

Subsection IV
Anabaena azollae                     N         S       n/a            Meeks et al., 1988
Anabaena ATCC 33047                  Y         D       C              Henson et al., 2005
Anabaena PCC 7108                    Y         D       xisA (700 bp)  this study
Anabaena variabilis                  Y         D       C              Brusca et al., 1989
Calothrix PCC 7601                   Y         S       n/a            Kallas et al., 1987
Cylindrospermum PCC 7604             Y         D       xisA (500 bp)  this study
Cylindrospermum PCC 7417             Y         D       C              this study
Nostoc PCC 7120                      Y         D       C              Kaneko et al., 2001
Nostoc PCC 7121                      Y         D       xisA (700 bp)  this study
Nostoc PCC 7906                      Y         D       xisA (500 bp)  this study
Nostoc PCC 7415                      Y         D       xisA (700 bp)  this study
Nostoc PCC 6314                      Y         D       xisA (700 bp)  this study
Nostoc PCC 7118                      Y         D       xisA (700 bp)  this study
Nostoc PCC 7416                      Y         D       xisA (700 bp)  this study
Nostoc PCC 7524                      Y         D       xisA (700 bp)  this study
Nostoc punctiforme                   Y         D       C              Meeks et al., 2001
Nostoc sp. strain Mac                Y         S       n/a            Meeks et al., 1994
Scytonema PCC 7110                   Y         D       xisA (700 bp)  this study
Scytonema sp. ARM 428                Y         S       none           Prasanna and Kaushik, 1995
Scytonematopsis sp. ARM 428          N         S       n/a            Prasanna and Kaushik, 1995
Tolypothrix ceylonica ARM 397        Y         S       none           Prasanna and Kaushik, 1995

Table 2. Cyanobacterial strains that have been examined for the presence of the hupL element. Strains are separated by Subsection. Presence refers to whether or not the element has been detected, Y (yes) or N (no). Method refers to whether the element was detected by Southern hybridization (S) or by DNA sequencing (D). The reference in which the Southern hybridization or DNA sequence data can be found is listed. Culture collection abbreviations are: PCC, Pasteur Culture Collection (Paris, France); CYA, Culture Collection of Algae, Norwegian Institute for Water Research (Oslo, Norway); HCC, Culture Collection of the Hawaii Natural Energy Institute, University of Hawaii (Honolulu, HI); CCAP, Culture Collection of Algae and Protozoa (Ambleside, UK); and ACOI, Culture Collection of Algae, University of Coimbra (Coimbra, Portugal).

Strain                               Presence  Method  Length         Reference
                                     (Y/N)     (S/D)   sequenced

Subsection III
Leptolyngbya PCC 73110               Y         S       n/a            Tamagnini et al., 2000
Oscillatoria agardhii CYA 29         N         S       n/a            Tamagnini et al., 2000

Subsection V
Fischerella PCC 7521                 Y         S       none           Tamagnini et al., 2000

Subsection IV
Anabaena PCC 7108                    Y         D       xisC (800 bp)  this study
Anabaena PCC 7122                    Y         D       xisC (500 bp)  this study
Anabaena variabilis                  N         S       n/a            Happe et al., 2000
Cylindrospermum PCC 7604             Y         D       xisC (700 bp)  this study
Cylindrospermum PCC 7417             Y         D       C              this study
Nodularia PCC 73104                  Y         D       xisC (700 bp)  this study
Nostoc PCC 6411                      Y         D       xisC (700 bp)  this study
Nostoc PCC 7121                      Y         D       xisC (650 bp)  this study
Nostoc PCC 7118                      Y         D       xisC (750 bp)  this study
Nostoc punctiforme                   N         D       n/a            Meeks et al., 2001
Nostoc PCC 7906                      Y         D       xisC (600 bp)  this study
Nostoc PCC 7120                      Y         D       xisC (600 bp)  Kaneko et al., 2001
Nostoc HCC 1048                      Y         S       none           Tamagnini et al., 2000
Nostoc HCC 1061                      Y         S       none           Tamagnini et al., 2000
Nostoc HCC 1075                      N         S       n/a            Tamagnini et al., 2000
Nostoc PCC 7107                      Y         S       none           Tamagnini et al., 2000
Nostoc PCC 6705                      Y         S       none           Tamagnini et al., 2000
Nostoc PCC 6314                      Y         S       none           Tamagnini et al., 2000
Nostoc PCC 6314/1                    Y         S       none           Tamagnini et al., 2000
Nostoc muscorum CCAP 1453/12         N         S       n/a            Tamagnini et al., 2000
Nostoc CYA 190                       N         S       n/a            Tamagnini et al., 2000
Nostoc CYA 238                       Y         S       none           Tamagnini et al., 2000
Nostoc CYA 295                       Y         S       none           Tamagnini et al., 2000
Nostoc CYA 306                       N         S       n/a            Tamagnini et al., 2000
Nostoc edaphicum ACOI 97             Y         S       none           Tamagnini et al., 2000
Nostoc microscopicum ACOI 578        N         S       n/a            Tamagnini et al., 2000
Nostoc ellipsosporum ACOI 610        Y         S       none           Tamagnini et al., 2000

Table 3. Comparison of the G+C % of the nifD and hupL elements to the genomes in which they reside.

Strain                      Genome    nifD element  hupL element

Nostoc PCC 7120             42.35 %   37.77 %       38.18 %
Cylindrospermum PCC 7417    43.89 %   39.55 %       37.91 %
Nostoc punctiforme          42.55 %   38.98 %       -
Anabaena variabilis         44.31 %   39.06 %       -
Anabaena ATCC 33047         42.23 %   39.73 %       -
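As a rough illustration of how values such as those in Table 3 can be obtained and compared, the sketch below computes G+C % for a sequence and the difference between each genome and its elements. The function names are hypothetical, ambiguous bases are simply ignored (an assumption, not necessarily how the values in Table 3 were calculated), and the numbers are copied from Table 3.

```python
def gc_percent(seq: str) -> float:
    """Percent G+C among the unambiguous bases (A, C, G, T) of a DNA sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


# G+C % values copied from Table 3: (genome, nifD element, hupL element).
# None marks a cell for which Table 3 lists no value.
TABLE_3 = {
    "Nostoc PCC 7120": (42.35, 37.77, 38.18),
    "Cylindrospermum PCC 7417": (43.89, 39.55, 37.91),
    "Nostoc punctiforme": (42.55, 38.98, None),
    "Anabaena variabilis": (44.31, 39.06, None),
    "Anabaena ATCC 33047": (42.23, 39.73, None),
}


def gc_deficit(genome, element):
    """Genome G+C % minus element G+C %, or None if no element value is listed."""
    return None if element is None else round(genome - element, 2)


deficits = {
    strain: (gc_deficit(genome, nifd), gc_deficit(genome, hupl))
    for strain, (genome, nifd, hupl) in TABLE_3.items()
}

for strain, (nifd_gap, hupl_gap) in deficits.items():
    print(f"{strain}: nifD {nifd_gap}, hupL {hupl_gap}")
```

For every value listed in Table 3, the element has a lower G+C % than the genome in which it resides, by roughly 2.5 to 6 percentage points.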

Literature cited

Adams DG, Duggan PS (1999) Heterocyst and akinete differentiation in cyanobacteria. New Phytol 144:3-33

Bohme H (1998) Regulation of nitrogen fixation in heterocyst-forming cyanobacteria. Trends in Plant Science 3:346-351

Bremer K (1994) Branch support and tree stability. Cladistics 10:295-304

Brusca JS, Chastain CJ, Golden JW (1990) Expression of the Anabaena sp. strain PCC 7120 xisA gene from a heterologous promoter results in excision of the nifD element. J Bacteriol 172:3925-3931

Carrasco CD, Ramaswamy KS, Ramasubramanian TS, Golden JW (1994) Anabaena xisF gene encodes a developmentally regulated site-specific recombinase. Genes Dev 8:74-83

Carrasco CD, Golden JW (1995) Two heterocyst-specific DNA rearrangements of nif operons in Anabaena cylindrica and Nostoc sp. strain Mac. Microbiol 141:2479-2487

Casjens S (2003) Prophages and bacterial genomics: what have we learned so far? Mol Microbiol 49:277-300

Chastain CJ, Brusca JS, Ramasubramanian TS, Wei TF, Golden JW (1990) A sequence-specific DNA-binding factor (VF1) from Anabaena sp. strain PCC 7120 vegetative cells binds to three adjacent sites in the xisA upstream region. J Bacteriol 172:5004-5051

Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evol 39:783-791

Garcia-Vallve S, Palau J, Romeu A (1999) Horizontal gene transfer in glycosyl hydrolases inferred from codon usage in Escherichia coli and Bacillus subtilis. Mol Biol Evol 16:1125-1134

Golden JW, Mulligan M, Haselkorn R (1987) Different recombination site specificity of two developmentally regulated genome rearrangements. Nature 327:526-529

Golden JW, Carrasco CD, Mulligan ME, Schneider GJ, Haselkorn R (1988) Deletion of a 55-kilobase-pair DNA element from the chromosome during heterocyst differentiation of Anabaena sp. strain PCC 7120. J Bacteriol 170:5034-5041

Golden JW, Yoon HS (2003) Heterocyst development in Anabaena. Curr Opin Microbiol 6:557-563

Hacker J, Carniel E (2001) Ecological fitness, genomic islands and bacterial pathogenicity. A Darwinian view of the evolution of microbes. EMBO Rep 2:376-381

Haselkorn R (1992) Developmentally regulated gene rearrangements in prokaryotes. Annu Rev Genet 26:113-130

Hayes JM (1983) Geochemical evidence bearing on the origin of aerobiosis, a speculative hypothesis. In: Schopf JW (ed) The Earth’s Earliest Biosphere, its Origins and Evolution. Princeton University Press, Princeton, NJ, pp. 291-300

Henson BJ, Watson LE, Barnum SR (2002) Molecular differentiation of the heterocystous cyanobacteria, Nostoc and Anabaena, based on complete nifD sequences. Curr Microbiol 45:161-164

Henson BJ, Watson LE, Barnum SR (2005) Characterization of a 4-kb variant of the nifD element in Anabaena sp. strain ATCC 33047. Curr Microbiol 50:129-132

Herrero A, Muro-Pastor AM, Valladares A, Flores E (2004) Cellular differentiation and the NtcA transcription factor in filamentous cyanobacteria. FEMS Microbiol Rev

Huelsenbeck JP, Hillis DM, Jones R (1995) Parametric bootstrapping in molecular phylogenetics: applications and performance. In: Ferraris JS, Palumbi SR (eds) Molecular Zoology: strategies and protocols. Wiley-Liss, New York, pp. 19-46

Jiang FY, Wisen S, Widersten M, Bergman B, Mannervik B (2000) Examination of the transcription factor NtcA-binding motif by in vitro selection of DNA sequences from a random library. J Mol Biol 301:783-793

Jiang FY, Mannervik B, Bergman B (1997) Evidence for redox regulation of the transcription factor NtcA, acting both as an activator and a repressor, in the cyanobacterium Anabaena PCC 7120. Biochem J 327:513-517

Kallas T, Coursin T, Rippka R (1985) Different organization of nif genes in nonheterocystous and heterocystous cyanobacteria. Plant Mol Biol 5:321-329

Kaneko T, Nakamura Y, Wolk CP, Kuritz T, Sasamoto S, Watanabe A, et al. (2001) Complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120. DNA Res 8(5):205-213, 227-253

Khudyakov IY, Golden JW (2004) Different functions of HetR, a master regulator of heterocyst differentiation in Anabaena sp. PCC 7120, can be separated by mutation. Proc Natl Acad Sci (USA) 101(45):16040-16045

Kunkel B, Losick R, Stragier P (1990) The Bacillus subtilis gene for the development transcription factor sigma K is generated by excision of a dispensable DNA element containing a sporulation recombinase gene. Genes Dev 4:525-535

Lake JA (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc Natl Acad Sci (USA) 91:1455-1459

Lammers PL, McLaughlin S, Papin S, Trujillo-Provencio C, Ryncarz II AJ (1990) Developmental rearrangement of cyanobacterial nif genes: nucleotide sequence, open reading frames, and cytochrome P-450 homology of the Anabaena sp. strain PCC 7120 nifD element. J Bacteriol 172:6981-6990

Lilley A, Young P, Bailey M (2000) Bacterial population genetics: Do plasmids maintain bacterial diversity and adaptation? In: Thomas CM (ed) The horizontal gene pool. Harwood Academic Publishers, pp. 287-300

Lockhart PJ, Steel MA, Hendy MD, Penny D (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Mol Biol Evol 11:605-612

Lynn ME, Bantle JA, Ownby JD (1986) Estimation of gene expression in heterocysts of Anabaena variabilis by using DNA-RNA hybridisation. J Bacteriol 167:940-946

Maddison WP, Maddison DR (2000) MacClade ver. 4 Analysis of phylogeny and character evolution. Sinauer, Sunderland, MA

Maruyama K, Sato N, Ohta N (1999) Conservation of structure and cold-regulation of RNA-binding proteins in cyanobacteria: probable convergent evolution with eukaryotic glycine-rich RNA-binding proteins. Nucleic Acids Res 27(9):2029–2036

Mazur BO, Chui F (1982) Sequence of the gene coding for the beta-subunit of dinitrogenase from the blue-green alga Anabaena. Proc Natl Acad Sci (USA) 79:6782-6786

Meeks JC, Campbell EL, Bisen PS (1994) Elements interrupting nitrogen fixation genes in cyanobacteria: presence and absence of a nifD element in clones of Nostoc sp. strain Mac. Microbiology 140:3225-3232

Meeks JC, Elhai J, Thiel T, Potts M, Larimer F, Lamerdin J, Predki P, Atlas R (2001) An overview of the genome of Nostoc punctiforme, a multicellular, symbiotic cyanobacterium. Photosyn Res 70:85-106

Mizuno M, Masuda S, Takemaru K, Hosono S, Sato T, Takeuchi M, et al. (1996) Systematic sequencing of the 283 kb 210 degrees-232 degrees region of the Bacillus subtilis genome containing the SKIN element and many sporulation genes. Microbiology 142:3103–3111

Mulligan ME, Buikema WJ, Haselkorn R (1988) Bacterial-type ferredoxin genes in the nitrogen fixation regions of the cyanobacterium Anabaena sp. strain PCC 7120 and Rhizobium meliloti. J Bacteriol 170:4406-4410

Muro-Pastor MI, Reyes JC, Florencio FJ (2001) Cyanobacteria perceive nitrogen status by sensing intracellular 2-oxoglutarate levels. J Biol Chem 276:38320-38328

Nishida K, Kawasaki T, Fujie M, Usami S, Yamada T (1999) Aminoacylation of tRNAs encoded by Chlorella Virus CVK2. Virology 263:220-229

Nunes-Düby SE, Kwon HJ, Tirumalai RS, Ellenberger T, Landy A (1998) Similarities and differences among 105 members of the Int family of site-specific recombinases. Nucleic Acids Res 26:391-406

Okada K, Kamiya Y, Zhu X, Suzuki K, Tanaka K, Nakagawa T, Matsuda H, Kawamukai M (1997) Cloning of the sdsA gene encoding solanesyl diphosphate synthase from Rhodobacter capsulatus and its functional expression in Escherichia coli and Saccharomyces cerevisiae. J Bacteriol 179:5992-5998


Posada D, Crandall KA (1998) Modeltest: testing the model of DNA substitution. Bioinformatics 14:817-818

Prasanna R, Kaushik BD (1995) Nitrogen fixation and nif gene organization in branched heterocystous cyanobacteria: variation in the presence of xisA. Folia Microbiol 40(2):176-180

Ramasubramanian TS, Wei TF, Golden JW (1994) Two Anabaena sp. strain PCC 7120 DNA-binding factors interact with vegetative cell- and heterocyst-specific genes. J Bacteriol 176:1214-1223

Ramasubramanian TS, Wei TF, Oldham AK, Golden JW (1996) Transcription of the Anabaena sp. strain PCC 7120 ntcA gene: multiple transcripts and NtcA binding. J Bacteriol 178:922-926

Ramaswamy KS, Carrasco CD, Tasneem F, Golden JW (1997) Cell-type specificity of the fdxN-element rearrangement requires xisH and xisI. Mol Microbiol 23:1241-1249

Rice D, Mazur B, Haselkorn R (1982) Isolation and physical mapping of nitrogen fixation genes from the cyanobacterium Anabaena PCC 7120. J Biol Chem 257:13157-13163

Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406-425

Saville B, Straus N, Coleman JR (1987) Contiguous organization of the nitrogenase genes in a heterocystous cyanobacterium. Plant Physiol 85:26-29

Sazuka T (2003) Proteomic analysis of the cyanobacterium Anabaena sp. strain PCC7120 with two-dimensional gel electrophoresis and amino-terminal sequencing. Photosynthesis Res 78(3):279-291

Schopf JW, Hayes JM, Walter MR (1983) Evolution of the Earth’s earliest ecosystem: recent progress and unsolved problems. In: Schopf JW (ed) Earth’s Earliest Biosphere, its Origins and Evolution. Princeton University Press, Princeton, NJ. pp. 361-384

Smith MCM, Thorpe HM (2002) Diversity in the serine recombinases. Mol Microbiol 44:299-307

Sprent JI, Sprent P (1990) Nitrogen fixing organisms: pure and applied aspects. Chapman and Hall, New York, pp. 1-30

Swofford DL (2001) PAUP. Phylogenetic Analysis Using Parsimony. Version 4. Sinauer Associates, Sunderland, MA

Tanigawa R, Shirokane M, Maeda Si S, Omata T, Tanaka K, Takahashi H (2002) Transcriptional activation of the NtcA-dependent promoters on Synechococcus sp. PCC 7942 by 2-oxoglutarate in vitro. Proc Natl Acad Sci USA 99:4251-4255

Tamagnini P, Costa JL, Almeida L, Oliveira MJ, Salema R, Lindblad P (2000) Diversity of cyanobacterial hydrogenases, a molecular approach. Curr Microbiol 40:356-361

Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673-4680

Ueno T, Ito H, Kotani H, Kimizuka F, Nakajima K (1993) Cloning and expression of the NspV restriction-modification genes of Nostoc sp. strain PCC7524. Nucleic Acids Res 21:3899


Voziyanov Y, Pathania S, Jayaram M (1999) A general model for site-specific recombination by the integrase family recombinases. Nucleic Acids Res 27:930-941

Wolk CP, Ernst A, Elhai J (1994) Heterocyst metabolism and development. In: Bryant DA (ed) The Molecular Biology of Cyanobacteria. Kluwer Academic Publishers, The Netherlands, pp. 769-823

Chapter III

Excision of the nifD Element in the Heterocystous Cyanobacteria


Abstract

In some cyanobacteria, heterocyst differentiation is accompanied by developmentally regulated DNA rearrangements that occur within the nifD, fdxN, and hupL genes. These genetic elements are excised from the genome by site-specific recombination during the latter stages of differentiation, before nitrogen fixation can begin. The nifD element is excised by the recombinase XisA, which is encoded within the element. Excision is believed to occur between the flanking 11-bp direct repeats; however, it is not known where within the direct repeats excision occurs, nor which nucleotides within and surrounding the direct repeats are required. Our objective was to determine which nucleotides flanking the nifD element are required for its XisA-mediated excision. To accomplish this, we tested the ability of XisA, expressed in an E. coli host, to recombine substrate plasmids that contained the flanking regions of the nifD element. Using PCR-directed mutagenesis, we altered nucleotides in the flanking regions carried on the substrate plasmids and determined the effect on recombination. Results indicate that nucleotides within and outside of the direct repeats are involved in excision, and that not all nucleotides within the direct repeats are required. At certain nucleotide positions, the presence of a purine versus a pyrimidine greatly affected recombination. Although excision was inhibited when certain nucleotides were mutated, PCR analyses revealed that excision still occurred at a low level. Our results also indicate that the site of excision and branch migration lies in a 6 bp region within the direct repeats. Together, these results indicate that excision of the nifD element is variable with respect to the nucleotides involved.

Introduction

Homologous recombination involves the exchange of genetic information between two DNA molecules that share long stretches of similarity. Site-specific recombination differs in that the exchange occurs between DNA molecules with only short stretches of similarity (i.e., direct repeats or inverted repeats). Site-specific recombination is typified by the integration of phage DNA into, or its excision from, the host genome. It is also involved in the integration and excision of transposons, integrons, and plasmids (Smith and Thorpe, 2002). The enzymes that perform these reactions are classified as site-specific recombinases, but have also been called invertases, integrases, excisionases, transposases, and resolvases. They catalyze a number of reactions, including integrations (insertions), inversions, and excisions (deletions). In viruses, a site-specific recombinase can catalyze either the integration of the viral genome into the host genome or its excision, depending on the stage of the life cycle (lytic or lysogenic).

Site-specific recombinases are currently divided into two classes: the resolvase/invertase or serine family of recombinases (Smith and Thorpe, 2002) and the tyrosine or phage integrase family of site-specific recombinases (Nunes-Düby et al., 1998). The serine and tyrosine families are unrelated, both evolutionarily and functionally (Smith and Thorpe, 2002). The serine family is composed of the resolvases and invertases (Smith and Thorpe, 2002; Smith et al., 2004). Resolvases typically recognize only direct repeats and excise the DNA between them, whereas invertases recognize inverted repeats and invert the DNA between them (Smith and Thorpe, 2002). Both resolvases and invertases recognize and bind unique sequences in addition to the inverted or direct repeats that are required for excision.
Furthermore, the invertases require a FIS (factor for inversion stimulation) cofactor (Smith and Thorpe, 2002). The mechanism that serine recombinases use is described as a “four-strand cleavage and rejoining mechanism” (Smith and Thorpe, 2002; Smith et al., 2004; Bibb et al., 2005). It involves a staggered cut at a 2 bp core sequence within both repeats and the formation of temporary bonds between the serine residue in the active site and the exposed 5′ phosphate (Smith and Thorpe, 2002; Smith et al., 2004; Bibb et al., 2005). Rotation and ligation of complementary strands then occur, completing recombination.

In the tyrosine family of site-specific recombinases, there is considerable variation in the nucleic acid and amino acid sequences of its members. However, sequence alignments based on secondary and tertiary structures reveal structural conservation within the family. Members are characterized by the “active site tetrad”, which consists of four conserved amino acids: arginine, histidine, arginine, and tyrosine (R-H-R-Y). The tyrosine residue is catalytic, initiating DNA cleavage and establishing the phosphotyrosine bond, whereas the R-H-R residues are involved in the DNA-protein interaction (Voziyanov et al., 1999; Nunes-Düby et al., 1995; Nunes-Düby et al., 1998). The typical recombination core for members of the tyrosine family is composed of two inverted binding regions of 11-13 bp separated by a 6-8 bp spacer region (Voziyanov et al., 1999). Recombination typically proceeds through two single-strand cleavage/exchange reactions, with a Holliday junction as an obligatory intermediate (Voziyanov et al., 1999). Branch migration within the Holliday junction is usually confined to the 6-8 bp spacer region (Nunes-Düby et al., 1995; Voziyanov et al., 1999). Single nucleotide mutations outside of the 6-8 bp recombination core typically have little or no effect on recombination, whereas mutations within the core dramatically affect recombination (Nunes-Düby et al., 1995).

In prokaryotes, site-specific recombination is known to accompany cellular differentiation in several organisms, where it occurs in conjunction with developmentally regulated DNA rearrangements. A developmentally regulated DNA rearrangement is the excision of a segment of DNA by site-specific recombination that is associated with a developmental shift or differentiation process.
Cellular differentiation is rare in prokaryotes, with the best studied examples being sporulation in Bacillus subtilis and Clostridium difficile, and heterocyst differentiation in the heterocystous cyanobacteria (Kunkel et al., 1990; Haselkorn 1992; Haraldsen and Sonenshein, 2003). Within the cyanobacteria, three developmentally regulated DNA rearrangements are known to be coupled with heterocyst differentiation. These rearrangements occur within the nifD, hupL, and fdxN genes, and the excised segments are referred to as the nifD, hupL, and fdxN elements, respectively. nifD encodes a subunit of nitrogenase, the enzyme required for nitrogen fixation (Mazur and Chui, 1982); fdxN encodes a bacterial-type ferredoxin of unknown function (Mulligan et al., 1988; Mulligan and Haselkorn, 1989); and hupL encodes the large subunit of the membrane-bound uptake hydrogenase (Oxelfelt et al., 1989; Carrasco et al., 1995). Each of these elements is independently excised from the genome during the latter stages of heterocyst differentiation by site-specific recombination between direct repeats that flank the element (Adams and Duggan, 1999; Haselkorn, 1992; Carrasco and Golden, 1995). The length and nucleotide sequence of the direct repeats vary among the elements, and each element encodes its own site-specific recombinase responsible for its excision (Adams and Duggan, 1999; Bohme, 1998). Excision occurs in such a way that the element becomes circularized and is neither amplified nor degraded (Brusca et al., 1989), and the gene that the element interrupts is ligated, restoring the contiguous reading frame (Fig. 1). The evolutionary origin of these elements remains unknown, but it has been hypothesized that they are remnants of ancient viruses that have lost the ability to self-replicate (Haselkorn, 1992; Henson et al., 2005; see chapter 2).

The fdxN element in Nostoc sp. strain PCC 7120 (hereafter referred to as Nostoc PCC 7120) is 55 kb in length and located within the coding region of fdxN. fdxN is part of the nifBSU, fdxN operon (Golden et al., 1987; 1988; Mulligan et al., 1988; Carrasco et al., 1994), and removal of the element is required for expression of the operon. Excision of the element occurs by recombination between the 5-bp direct repeats (TATTC) that flank it (Golden et al., 1987, 1988; Mulligan et al., 1988; Carrasco et al., 1994). The element is excised by enzymes encoded by xisF, xisH, and xisI, which are located within the element (Ramaswamy et al., 1997). xisF encodes the primary recombinase, and xisH and xisI encode accessory proteins.
The fdxN element shows similarity to the Bacillus subtilis skin element (48 kb), which is excised from within the sigK gene during sporulation by the SpoIVCA recombinase (Kunkel et al., 1990; Haselkorn, 1992; Carrasco et al., 1994). Both XisF and SpoIVCA are members of the serine family of recombinases (Smith and Thorpe, 2002).

The hupL element is 9.4 kb in Nostoc PCC 7120 and is excised by site-specific recombination between the 16-bp direct repeats (CCATATAACTGCTGTG) that flank the element (Carrasco et al., 1995, 2005). The hupL element is excised from within hupL by the recombinase encoded by xisC (Carrasco et al., 1995, 2005). It is believed that xisC is the only gene required for excision, although it is not known whether other genes located within the hupL element encode functional products (Carrasco et al., 1995, 2005).

The nifD element is 11 kb in length in Nostoc PCC 7120 (Fig. 1) and is excised between the 11-bp direct repeats (CGGAGTAATCC) that flank the element by the recombinase XisA, which is encoded by xisA within the element (Brusca et al., 1989, 1990; Golden et al., 1985, 1987, 1991; Lammers et al., 1986, 1990; Haselkorn 1992). Although it is believed that only xisA is required for excision, some have suggested that other gene(s) within the Nostoc PCC 7120 nifD element are expressed (Rice et al., 1982; Lammers et al., 1990). XisA and XisC, of the nifD and hupL elements respectively, are members of the tyrosine integrase family of site-specific recombinases (Nunes-Düby et al., 1998). The conserved active site tetrad (R-H-R-Y) differs in XisA and XisC: rather than a histidine, they have a tyrosine (R-Y-R-Y) in their active site (Nunes-Düby et al., 1998). This histidine residue has also been found to be replaced in other tyrosine recombinases (for example, by asparagine, lysine, or arginine) (Nunes-Düby et al., 1998). XisA and XisC are 61% similar and 43% identical at the amino acid level (Carrasco et al., 1995), and BLAST (basic local alignment search tool) searches indicate that they are more similar to each other than to any other sequence in GenBank.

Many questions remain about the evolution, function, and excision of the nifD element. It is unknown which specific nucleotides within and flanking the direct repeats are required for excision, or precisely where excision occurs. It has been hypothesized that the nucleotides within the direct repeats are required and that excision occurs within them (Haselkorn, 1992). It has also been suggested that nucleotides surrounding the direct repeats may be involved in excision (Henson et al., 2005). Therefore, our objective was to test the hypothesis that some, but not all, nucleotides within and flanking the direct repeats are involved in excision, and that excision occurs within the direct repeats.
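The outcome of excision between direct repeats, as described above, can be illustrated with a small string model. The Python sketch below is a simplification under stated assumptions: DNA is treated as a single-stranded string, the crossover is placed at the start of each repeat (with identical repeats, any crossover point inside them gives the same products), and the function name excise and the flanking toy sequences are hypothetical. Only the 11-bp repeat CGGAGTAATCC is taken from the text.

```python
from typing import Tuple


def excise(genome: str, repeat: str) -> Tuple[str, str]:
    """Model excision between the first two direct copies of `repeat`.

    Returns (chromosome, circle). The chromosome keeps one copy of the
    repeat and the interrupted gene becomes contiguous; the excised
    element carries the other copy. The circle is written as a linear
    string opened at an arbitrary point.
    """
    first = genome.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    second = genome.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("need two direct copies of the repeat")
    chromosome = genome[:first] + genome[second:]
    circle = genome[first:second]
    return chromosome, circle


# The 11-bp direct repeat is from the text; everything else is a toy sequence.
REPEAT = "CGGAGTAATCC"
toy_genome = "AAAA" + REPEAT + "TTTTGGGG" + REPEAT + "CCCC"
chromosome, circle = excise(toy_genome, REPEAT)
print("chromosome:", chromosome)
print("excised element:", circle)
```

In this model the lengths of the two products always add up to the length of the starting molecule, which reflects the statement that the element is circularized rather than degraded.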

Methods and Materials

Organisms and culturing conditions

The cyanobacterium Nostoc PCC 7120, formerly Anabaena PCC 7120 (Henson et al., 2002), was obtained from the Pasteur Culture Collection (PCC). The culture was grown in a variation of BG-11 medium (BG-11₀ + NaHCO3 (5 mM), without nitrate) at 27 °C under constant fluorescent illumination in shaking incubators. E. coli strains HB101 and TOP10 (Invitrogen, Carlsbad, CA) were cultured according to standard laboratory protocols (Sambrook and Russell, 2001). Luria Broth (LB) and M9 media were prepared as described elsewhere (Sambrook and Russell, 2001).

M9 medium was supplemented with casamino acids (10%), glycerol (20%), CaCl2 (0.01 M), MgCl2 (0.01 M), 0.5% thiamine, and arabinose (2 M). When necessary, media were supplemented with the antibiotics ampicillin (100 µg/ml), chloramphenicol (20 µg/ml), and kanamycin (50 µg/ml).

Nucleic acid manipulation

Cyanobacterial DNA was isolated using the Puregene DNA Isolation Kit (Gentra Systems, Minneapolis, MN) with slight modifications (Henson et al., 2002). Plasmid DNA was isolated using the Wizard Plus Miniprep DNA purification system (Promega, Madison, WI). PCR was performed with Taq DNA polymerase (New England Biolabs, Ipswich, MA). PCR products were verified by electrophoresis. Gel electrophoresis was performed on 0.8% or 1.5% agarose gels. DNA was extracted from agarose with the QIAquick Gel Extraction Kit (QIAGEN, Valencia, CA). Restriction endonucleases were used according to the manufacturer’s specifications (New England Biolabs, Ipswich, MA). Purified PCR products were sequenced with the DYEnamic™ ET terminator cycle sequencing kit (Amersham Biosciences, Piscataway, NJ) using capillary electrophoresis on the ABI 310, ABI 3100, and ABI 3730 genetic analyzers (Applied Biosystems, Foster City, CA). DNA ligations were carried out using T4 DNA Ligase (New England Biolabs, Ipswich, MA) according to the manufacturer’s specifications.

Construction of xisA expression vectors

XisA was expressed from pBAD33 (Guzman et al., 1995) (kindly donated by Luis A. Actis), which contains the chloramphenicol resistance gene. Cloning xisA into pBAD33 placed it under the control of the arabinose promoter, allowing for strict control of expression (Guzman et al., 1995). Two versions of the xisA expression vector were created. The vector pHEN1 contains the entire coding region of xisA, from the first potential start codon to the stop codon; it was transformed into E. coli strain HB101, and cells containing pHEN1 are hereafter referred to as pHEN1 cells. Using PCR, xisA was amplified from Nostoc PCC 7120 genomic DNA with the primers xisA-sac1-F and xisA-xmaI-R (Table 1), which incorporated the restriction sites SacI and XmaI into the PCR product. The PCR product was digested and ligated into the SacI and XmaI sites of pBAD33, transformed into E. coli, and plated on LB supplemented with chloramphenicol. The expression vector pHEN2 contains the coding region of xisA extending from the second potential start codon to the stop codon; it was transformed into E. coli strain HB101, and cells containing pHEN2 are hereafter referred to as pHEN2 cells. Using PCR, xisA was amplified from Nostoc PCC 7120 genomic DNA with the primers xisA-sac1-F2 and xisA-xmaI-R (Table 1), which incorporated the restriction sites SacI and XmaI into the PCR product. The PCR product was digested and ligated into the SacI and XmaI sites of pBAD33, transformed into E. coli, and plated on LB supplemented with chloramphenicol.

Construction of substrate plasmids

Several series of substrate plasmids were created to analyze the excision of the nifD element. These are referred to as the pDK, pRE, pSUB, and pTOPO series. All substrate plasmids were confirmed by sequencing with one of the following primers: M13 F, M13 R, pUCF1, pUCR1, LamInsBam-F, or LamInsHinc-R (Table 1). M13 F and M13 R flank the HindIII and EcoRI sites of pUC18. The primers pUCF1 and pUCR1 are located ~250 bp upstream and downstream (respectively) of the pUC18 multiple cloning site. The primers LamInsBam-F and LamInsHinc-R are located within the 1.5 kb Lambda spacer region.

pRE-18

The plasmid pRE-18 was created to determine whether a substrate plasmid containing the direct repeats (in the correct orientation) separated by a 1.3 kb segment of DNA can be recombined by XisA. To engineer pRE-18, PCR-directed mutagenesis was used to amplify a 1.3 kb segment of Lambda DNA (nucleotides 23135 to 24376) with primers that incorporated the 11-bp direct repeats (in the correct orientation) and XbaI and HindIII sites (Table 1). The PCR product, with the direct repeats and XbaI and HindIII sites included, was ligated into the XbaI and HindIII sites of pUC18, creating pRE-18.

pDK1 and pDK3

The plasmid pDK1 was designed to emulate the substrate plasmid pAM461 used by Brusca et al. (1990). pDK1 was engineered by amplifying a 750 bp fragment of the nifK proximal region of the nifD element (with primers that incorporated HindIII and HincII sites, Table 1) and ligating it into the HindIII and HincII sites of pUC18 (Fig. 2). This was followed by amplifying a 450 bp fragment of the nifD proximal region (with primers that incorporated EcoRI and XmaI sites, Table 1) and ligating the fragment into the EcoRI and BamHI sites of pUC18 (Fig. 2). The substrate plasmid pDK3 was created by inserting the EcoRI-HindIII fragment of pDK1 into the EcoRI and HindIII sites of pBR322.

pSUB series

The pSUB series of substrate plasmids was created to physically separate the nifD and nifK proximal regions and to analyze the effect of modifying nucleotides on excision. The plasmids pSUB1 through pSUB9B were engineered to have decreasing portions of the nifD and nifK proximal regions, with pSUB9B having the smallest portions. All derivatives of pSUB9B were generated using PCR-directed mutagenesis to alter specific nucleotides. To separate the nifD and nifK proximal regions, a 1.5 kb HincII-BamHI fragment of Lambda genomic DNA (from nucleotides 43180 to 41732, respectively) was amplified (including the HincII and BamHI sites) and ligated into the HincII and BamHI sites of pDK1, creating pSUB1 (Fig. 2). The plasmid pSUB2 was created in two steps. First, a 250 bp fragment of the nifK proximal region was amplified using primers that incorporate EcoRI and BamHI sites (Table 1) and ligated into the BamHI and EcoRI sites of pSUB1, creating pSUB1A.
The XmaI site of pSUB1 was replaced with a BamHI site. This was followed by amplifying a 250 bp segment of the nifD proximal side using primers that incorporate BamHI and EcoRI sites (Table 1) and ligating it into the HincII and HindIII sites of pSUB1A, creating pSUB2. The substrate plasmid pSUB6 was created by amplifying a 41 bp fragment spanning the nifD proximal direct repeat with primers that incorporate HincII and HindIII sites (Table 1) and inserting this fragment into the HincII and HindIII sites of pSUB2. To create pSUB9B, a 41 bp fragment of the nifK proximal region, including the direct repeat, was amplified using primers that incorporate BamHI and EcoRI sites and inserted into the BamHI and EcoRI sites of pSUB6m (Fig. 3).

To create the remaining plasmids in the pSUB series, PCR-directed mutagenesis was used to alter specific nucleotides in the nifD and nifK proximal regions of pSUB9B. We altered nucleotides that are identical in the nifD and nifK proximal regions; these are termed the “identical” nucleotides (see below). The PCR products were then ligated into the HindIII-EcoRI sites of pSUB2, the HindIII-HincII sites of pSUB9B, or the EcoRI-BamHI sites of pSUB9B, depending upon the PCR product. The primers used to generate the remaining pSUB substrate plasmids are listed in Table 1. Unless otherwise indicated, all nucleotides altered by PCR-directed mutagenesis were changed by transversion: A to C, C to A, T to G, and G to T.

pTOPO series

The pTOPO series of substrate plasmids was created to determine which nucleotides of pSUB9B that are not identical in the nifD and nifK proximal regions are involved in excision. These nucleotides are termed the “nonidentical” nucleotides (see below). Using PCR-directed mutagenesis with pSUB9B as a template, the “nonidentical” nucleotides were mutated and the products ligated into pCR®4-TOPO with the TOPO TA Cloning® Kit for Sequencing (Invitrogen, Carlsbad, CA). The primers used to generate the pTOPO series are listed in Table 1.
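The substitution rule used for the mutagenesis described above (A to C, C to A, T to G, G to T) can be written as a simple lookup. The Python sketch below is illustrative only and assumes a hypothetical helper, mutate_positions, that takes 0-based indices into a sequence string. These indices are not the D and K position labels of Fig. 9, and the example applies the rule to the 11-bp direct repeat purely as a demonstration, not as a reconstruction of any particular substrate plasmid.

```python
from typing import Iterable

# Substitution rule stated in the text: A<->C and T<->G (transversions).
TRANSVERSION = {"A": "C", "C": "A", "T": "G", "G": "T"}


def mutate_positions(seq: str, positions: Iterable[int]) -> str:
    """Return `seq` with the bases at the given 0-based positions
    replaced according to the transversion rule above."""
    bases = list(seq.upper())
    for i in positions:
        bases[i] = TRANSVERSION[bases[i]]
    return "".join(bases)


# Demonstration on the 11-bp direct repeat; the positions are arbitrary.
repeat = "CGGAGTAATCC"
mutant = mutate_positions(repeat, [0, 1, 2, 3])
print(repeat, "->", mutant)
```

Because the rule pairs A with C and T with G, applying it twice at the same position restores the original base.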

Recombination assays

Substrate plasmids were transformed into chemically competent pHEN1 and/or pHEN2 cells using standard protocols (Sambrook and Russell, 2001). All substrate plasmids were transformed into pHEN2 cells, and only pRE-18 and pDK1 were transformed into pHEN1 cells. This was done because it is believed that the second in-frame start codon is the one from which translation initiates (Haselkorn, 1992). The transformants were plated on M9 media supplemented with ampicillin and chloramphenicol to select for the substrate plasmid and expression vector, respectively, and allowed to grow for 12-16 hours at 37 °C. After incubation, ~10 colonies were pooled, inoculated into liquid M9 supplemented with ampicillin, chloramphenicol, and arabinose (to induce expression of xisA), and grown in a 37 °C shaking incubator for 15-19 hours. Total plasmids were isolated and quantified. Plasmid preparations (1.0-2.0 µg) were digested with HindIII for 20-28 hours. Plasmid digestions were separated by gel electrophoresis on 1.5% agarose gels.

Recombined substrate plasmids differ structurally from nonrecombined plasmids. Excision or recombination is thought to occur within the direct repeats (Haselkorn, 1992) and results in the removal of the nucleotides between the direct repeats. Therefore, recombined substrate plasmids are smaller than nonrecombined plasmids, and the two can be differentiated by gel electrophoresis. When applicable, the percentage of recombined versus nonrecombined substrate plasmids was determined using the Alpha Imager (Alpha Innotech, San Leandro, CA), according to the manufacturer’s suggestions. Recombined and nonrecombined bands were quantified using the spot density tool, with the automatic background function selected to correct for background fluorescence. To calibrate the quantification, known quantities of Lambda DNA (digested with HindIII and EcoRI) were electrophoresed alongside the substrate plasmids. Using the Lambda DNA as a control, a standard curve was created, allowing more accurate quantification of the recombined and nonrecombined substrate plasmids. This yielded estimated quantities of recombined and nonrecombined substrate plasmids. To determine the percentage of recombined versus nonrecombined substrate plasmids, we corrected for the difference in size between the two.
Nonrecombined substrate plasmids were ~1.5 kb larger than their recombined counterparts because they retain the Lambda DNA spacer region, which is removed upon recombination. In addition, PCR was used to confirm recombination or to check for very low levels of recombination. The primers pUCF and pUCR were used to amplify a portion of the recombined plasmids (corresponding to the contiguous nifD gene). To determine where recombination occurred, we sequenced through the recombination repeat of some of these PCR products.
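The size correction mentioned above can be illustrated with a short calculation. The Python sketch below is one plausible way to apply such a correction, not the documented procedure. It assumes that the DNA mass in each band has already been estimated from the Lambda standard curve and that the number of plasmid molecules in a band is proportional to mass divided by fragment length. The function name, argument names, and example values are all hypothetical.

```python
def percent_recombined(mass_rec_ng: float, len_rec_bp: int,
                       mass_non_ng: float, len_non_bp: int) -> float:
    """Estimate the percentage of recombined plasmid molecules.

    Band masses are converted to relative molar amounts by dividing by
    fragment length, so the larger nonrecombined plasmid is not
    over-counted simply because it contains more DNA per molecule.
    """
    if len_rec_bp <= 0 or len_non_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    mol_rec = mass_rec_ng / len_rec_bp
    mol_non = mass_non_ng / len_non_bp
    total = mol_rec + mol_non
    if total <= 0:
        raise ValueError("no DNA detected in either band")
    return 100.0 * mol_rec / total


# Hypothetical bands: the nonrecombined plasmid is 1.5 kb longer.
uncorrected = 100.0 * 100.0 / (100.0 + 100.0)  # equal band masses, by mass
corrected = percent_recombined(100.0, 2700, 100.0, 4200)
print(f"by mass: {uncorrected:.1f} %, by molecules: {corrected:.1f} %")
```

With equal band masses, the corrected value is higher than the mass-based value because the recombined band holds more molecules per nanogram.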

Results

No recombination or excision was detected when pRE-18 was cloned into pHEN1 or pHEN2 cells (Fig. 4), indicating that nucleotides other than just the direct repeats are required for excision. When transformed into pHEN2 cells, pDK1 was recombined inefficiently, so the recombined pDK1 had to be subcloned (Fig. 5). Inefficient recombination has also been reported by others (Brusca et al., 1990). We hypothesize that the inefficient recombination of pDK1 was due to either the high copy number of pUC18 derived substrate plasmids or the close proximity of the nifD and nifK proximal direct repeats in pDK1. The direct repeats in pDK1 are ~500 bp apart, whereas within the genomes of the heterocystous cyanobacteria they are between 4 and 24 kb apart. Their close proximity in pDK1 could limit recombination, perhaps by steric hindrance. To determine if the inefficient recombination of pDK1 was due to either copy number or the close proximity of the direct repeats, we took two approaches: 1) we created the substrate plasmid pDK3, a pBR322 derivative, which has a lower copy number than the pUC18 derived pDK1, and 2) we inserted a spacer region into pDK1 (creating pSUB1) to separate the direct repeats (Fig. 2). When transformed into pHEN2 cells, pDK3 was recombined but at a low level, suggesting that lowering the copy number of the substrate plasmid did not increase recombination efficiency (data not shown). When transformed into pHEN2 cells, pSUB1 was recombined at an efficiency of 40% (Fig. 6). This suggests that the close proximity of the direct repeats in pDK1 may inhibit efficient recombination. The plasmid pSUB2 was recombined at an efficiency of 40% when transformed into pHEN2 cells (Fig. 7). pSUB9B was recombined at an efficiency of ~25% (Fig. 8), indicating that the 41 bp nifD and nifK proximal regions were sufficient for recombination to occur.
Using PCR directed mutagenesis and pSUB9B as a template, nucleotides within the nifD and nifK proximal regions (the direct repeats, identical nucleotides, and nonidentical nucleotides) (Fig. 9) were mutated to determine which are required for excision. Results are summarized in Table 2. Nucleotides at positions D1-4 and K1-4 were shown to be involved in excision of the element. Substrate plasmids containing mutations in nucleotide positions D1-4 (pSUB16Z) and K1-4 (pSUB10) were deficient in recombination (Fig. 10), although PCR analysis revealed that recombination occurred at low levels (data not shown). However, individually mutating the nucleotides at positions D1-2 (pSUB68 and pSUB69, respectively) and K1-4 (pSUB70, pSUB71, pSUB72, and pSUB73, respectively) did not inhibit recombination (Fig. 11). Additionally, the substrate plasmid pSUB3, which contains the nucleotides TTCT in positions K1-4 (as opposed to CCGT), was successfully recombined (data not shown). This suggests that retaining the wild type nucleotide at K4 and implementing transitions as opposed to transversions at K1-2 allowed for recombination. The nucleotides at positions K9 and D9 had dramatically different effects on recombination when mutated. Altering the nucleotide at D9 from an A to a C (pSUB17) or from an A to a G (pSUB74) did not affect recombination (Fig. 12). Changing the nucleotide at K9 from an A to a C (pSUB11) inhibited recombination, although changing it from an A to a G (pSUB76) did not (Fig. 12). Furthermore, when both D9 and K9 were changed from G to T in the same substrate plasmid (pSUB26, which was constructed by ligating the EcoRI-HincII fragment from pSUB11 into pSUB17), recombination was successful (Fig. 12). This suggests that a double mutation at D9 and K9 was able to overcome the deleterious effects of a single mutation at K9. Altering the nucleotide at D11 from G to T (pSUB18) inhibited recombination (Fig. 12), although PCR revealed that recombination occurred at low levels (data not shown). Changing the nucleotide at D11 from G to A (pSUB75) greatly reduced recombination (Fig. 12); a faint band corresponding to the recombined pSUB75 could be seen when the gel in Fig. 12 was overexposed (data not shown). Mutating the nucleotide at position K11 from G to T (pSUB23) reduced recombination efficiency, as demonstrated by the faint band corresponding to the recombined pSUB23 (Fig. 12).
When K11 was changed from G to A (pSUB77), recombination was successful (Fig. 12). Mutational analyses of the nucleotides within the direct repeats revealed that not all are required for recombination. Of the 22 nucleotides within the direct repeats, only nine inhibited recombination when individually mutated. The results for the nifD proximal repeat can be found in Fig. 13, and the results for the nifK proximal repeat can be found in Fig. 14. Altering nucleotides D14-17, 22, and 24 (pSUB55-52, 47, and 45, respectively) (Fig. 13), as well as K14-18, 22, and 24 (pSUB44-40, 36, and 34, respectively) (Fig. 14), did not inhibit recombination. Altering nucleotides D18-21 and 23 (pSUB51-48 and 46) (Fig. 13) and K19-21 and 23 (pSUB39-37 and 35) (Fig. 14) inhibited recombination, although PCR revealed that recombination occurred at low levels (data not shown). When the nucleotide at K23 was changed from C to A (pSUB35), recombination was negatively affected. When K23 was altered from C to T (pSUB12C), a transition as opposed to a transversion, recombination was successful. The nucleotide at K23 is a T and not a C within the chromosome of Anabaena sp. strain ATCC 33047 (Henson et al., 2005). Although certain nucleotides within the direct repeats can be individually mutated without affecting recombination, altering more than one in an individual direct repeat negatively affected recombination. Substrate plasmids with mutations in positions K14-16 (pSUB78) and K14-15 (pSUB9A) were both deficient in recombination (data not shown). PCR analysis revealed that the substrate plasmid pSUB78 was recombined at low levels, but that pSUB9A was not. Substrate plasmids containing mutations at K30-32 (pSUB14), D30-32 (pSUB19), K35-36 (pSUB13), D35-36 (pSUB20), K41 (pSUB33), and D41 (pSUB21) were recombined when cloned into pHEN2 cells (Fig. 15). Although the plasmids pSUB14, pSUB19, and pSUB20 were recombined, they had reduced efficiencies, suggesting that although these nucleotides are not required for excision they may enhance it. To address whether the nonidentical nucleotides are required for recombination, we created pSUB56 and pSUB57, which contain mutations in all the nonidentical nucleotides in the nifK and nifD proximal regions, respectively. When cloned into pHEN2 cells, neither pSUB56 nor pSUB57 was recombined (data not shown), which was also substantiated by PCR (data not shown).
To further investigate the role that nonidentical nucleotides have in recombination, we created the plasmids pSUB60 (mutations in D5-8, 10, 12, and 13), pSUB63 (mutations in D25-29, 33, 34, and 37-40), and pSUB64 (mutations in K25-29, 33, 34, and 37-40). Of these, only pSUB63 was recombined at a low level (data not shown). We then set out to determine if any of the nonidentical nucleotides are individually required for excision. Using PCR directed mutagenesis and pSUB9B as a template, all the nonidentical nucleotides (except for nucleotides at positions D6, D10, K25-28, and K34) were mutated. These PCR products were cloned into pCR®4-TOPO (Invitrogen, Carlsbad, CA), creating the pTOPO series of substrate plasmids, which were cloned into pHEN2 cells. All of the pTOPO substrate plasmids were recombined when cloned into pHEN2 cells (data not shown). These results suggest that the nonidentical nucleotides do not affect recombination when individually mutated; however, when multiple nucleotides in these regions are mutated, recombination is negatively affected. By analyzing published nifD sequences (Henson et al., 2002; Henson et al., 2004), sequenced genomes (Meeks et al., 2001; Kaneko et al., 2001), and sequenced nifD elements (Henson et al., 2005), and by sequencing through the recombination junction of select substrate plasmids, we can deduce which nucleotides within the nifD and nifK proximal regions become part of the contiguous nifD gene and which become part of the excised element as a result of recombination (Fig. 16). The direct repeats are believed to be where recombination occurs, but the actual site of recombination within the direct repeats is unknown (Haselkorn, 1992). As with recombination mediated by other tyrosine recombinases, XisA-mediated excision of the nifD element is likely to have an obligatory intermediate Holliday structure and branch migration. To determine where recombination takes place and to what extent branch migration occurs, we sequenced through the recombination junction corresponding to the contiguous nifD gene from the recombination assays of pSUB34-55 (those plasmids harboring mutations within the direct repeats). The data indicate that the nucleotides at positions K14 and D21-23 become integrated into nifD, and therefore the nucleotides in positions D14 and K21-23 are incorporated into the excised element. This is supported by the electropherograms presented in Fig. 17. The electropherograms for K14 and D21-23 (pSUB44 and 37-35) each have the mutated nucleotide in the position of interest (Fig. 17), indicating that these nucleotide positions become incorporated within the contiguous nifD gene.
The electropherograms for D14 and K21-23 (pSUB44, 48-46 respectively) all have the wild type nucleotide in the position of interest (Fig. 17), indicating that these nucleotide positions (D14 and K21-23) become part of the excised element and not nifD. The data also support that branch migration occurs in a 6 bp region encompassing nucleotides D15-D20 and K15-K20 (Fig. 17). This is supported by the electropherograms corresponding to these nucleotide positions (Fig. 17). Within these electropherograms, the nucleotide position of interest contains a polymorphism. These polymorphisms are demonstrated by two separate nucleotide peaks in the same position, suggesting the presence of two nucleotides. This situation is described as a “peak under a peak”, and indicates that multiple templates have been sequenced. If, in the course of recombination, branch migration permitted excision not at a specific site but at any location within this 6 bp region (D15-D20 and K15-K20), then when sequencing through the recombination junction we would expect to see polymorphisms or multiple peaks at these positions, which is exactly what we observed. A model of the excision of the nifD element is presented in Fig. 18.
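The reasoning behind the "peak under a peak" observation can be illustrated with a small Python sketch. It is a deliberately simplified splice model, not a description of the actual strand-exchange chemistry: the contiguous nifD junction is assumed to take positions up to a crossover point from the nifK proximal repeat and the remaining positions from the nifD proximal repeat, and every crossover point from 14 to 20 is assumed to be equally likely. The wild type repeat sequence for positions 14-24 is read from the "from" bases listed in Table 2; the function names are hypothetical.

```python
from collections import Counter

# Wild type direct repeat, positions 14-24, read from the wild type
# bases listed for D14-D24 and K14-K24 in Table 2.
REPEAT_START = 14
WILD_TYPE = "CGGAGTAATCC"


def mutate(seq, position, base):
    """Return seq with the base at a repeat position (14-24) replaced."""
    i = position - REPEAT_START
    return seq[:i] + base + seq[i + 1:]


def junction(k_repeat, d_repeat, crossover):
    """Repeat left in nifD under the assumed splice model: positions up to
    and including `crossover` come from the nifK proximal copy, the rest
    from the nifD proximal copy."""
    i = crossover - REPEAT_START + 1
    return k_repeat[:i] + d_repeat[i:]


def bases_at(position, k_repeat, d_repeat, crossovers=range(14, 21)):
    """Tally the base found at one position over all crossover points."""
    i = position - REPEAT_START
    return Counter(junction(k_repeat, d_repeat, c)[i] for c in crossovers)


# A mutation at K14 always ends up in nifD (single peak, mutant base).
print("K14 C to A:", dict(bases_at(14, mutate(WILD_TYPE, 14, "A"), WILD_TYPE)))

# A mutation inside the 15-20 region gives a mixture (two peaks).
print("K17 A to C:", dict(bases_at(17, mutate(WILD_TYPE, 17, "C"), WILD_TYPE)))

# A mutation at K21 is always excised (single peak, wild type base).
print("K21 A to C:", dict(bases_at(21, mutate(WILD_TYPE, 21, "C"), WILD_TYPE)))

# A mutation at D21 always ends up in nifD (single peak, mutant base).
print("D21 A to C:", dict(bases_at(21, WILD_TYPE, mutate(WILD_TYPE, 21, "C"))))
```

Under these assumptions, only positions 15-20 produce a mixed population of junction sequences, which is the pattern a pooled sequencing reaction would display as overlapping peaks. The relative peak heights predicted by equal weighting are not meaningful; they depend entirely on the assumed distribution of crossover points.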

Discussion

The data presented in this study indicate that nucleotides both within and outside of the direct repeats are involved in excision. Surprisingly, not all nucleotides within the direct repeats were shown to be required for excision, which would explain the nucleotide differences found within the direct repeats of some heterocystous cyanobacteria (Henson et al., 2005; see Chapter two). Although recombination is not negatively affected when certain nucleotides are individually mutated, it can be inhibited when more than one is altered in the same substrate plasmid, suggesting an additive effect. This was true of both the identical and nonidentical nucleotides. Although mutating certain nucleotides had an inhibitory effect on recombination, rarely was recombination completely eliminated, since PCR detected recombination at low levels. In certain nucleotide positions there appears to be selectivity with respect to which nucleotides allow recombination and which ones inhibit it. This is the case for nucleotides in positions D11, K9, K11, and K23, where a transition did not adversely affect recombination, but a transversion inhibited it. The nucleotide at position K23 has been found to be modified from a C to a T (transition) in the genome of Anabaena ATCC 33047, which further supports that a transition at this position does not adversely affect recombination. The data suggest that the nucleotides required for excision (possibly by interacting with XisA) are not symmetrical with respect to the site of recombination. In other recombination systems, such as the Flp, XerD, and Cre recombinases (Voziyanov et al., 1999), the recombinase binding sites are symmetrical in that they flank the site of branch migration and recombination (Voziyanov et al., 1999; Nunes-Düby et al., 1995; Chen and Rice, 2003). The presence of a purine or a pyrimidine at several nucleotide positions significantly affected recombination. This suggests that something about the chemical structure or secondary structure of the nucleotides in these positions either allows for efficient recombination or blocks it. This could affect the ability of XisA to bind to the DNA, affect the stability of the XisA-DNA interaction, or affect the recombination reaction itself. Classical models predict that the binding of recombinases (via hydrogen bonds) to specific base pairs is a requirement for recombination (Kono and Sarai, 1999; Tian et al., 2004). The results presented here, as well as elsewhere (Tian et al., 2004), suggest that in some cases it may be individual nucleotides, not necessarily complementary base pairs, that are bound by the recombinase during recombination.

Our analyses indicate that excision of the nifD element occurs via the model presented in Fig. 18. The model predicts that the direct repeats are brought into close proximity with each other and that strand exchange, Holliday junction formation, and excision occur between them. Results indicate that excision occurs between nucleotide positions D14 and D21 on the nifD proximal side and K14 and K21 on the nifK proximal side, with branch migration occurring between positions D15-D20 and K15-K20. Based on mutational analyses of the nucleotides within this 6 bp region (D15-D20 and K15-K20), we propose that the exchange of strands, and Holliday junction formation, is initiated at nucleotide positions D20 and K20, followed by branch migration that proceeds up to and including D15 and K15 (Fig. 18). It has been suggested that sequence similarity at the site of the strand exchange is critical, since immediately following exchange the branch must migrate several nucleotides or the reaction can be reversed, and mismatches at the site of initial strand exchange inhibit recombination (Nunes-Düby et al., 1987; Kitts and Nash, 1988; Nunes-Düby et al., 1995).
For example, recombination mediated by Flp is inhibited when nucleotide mismatches occur in the 5' end of the strand being exchanged (Senecoff et al., 1988; Chen and Rice, 2003). This may explain why mutations at nucleotides D18-20 and K19-20 inhibit recombination, since mutations to these nucleotides would inhibit recombination if they occur near the 5' end of the strands being exchanged.

Although recombination was successful in E. coli, it was inefficient, never achieving complete (100%) recombination. Brusca et al. (1990) reported inefficient recombination of pAM461 in E. coli and that cells over-expressing XisA grew poorly, suggesting that high levels of XisA may be harmful. Alternatively, XisA may not be completely functional in E. coli, or it may be less efficient in trans. The results presented here suggest that XisA-mediated excision of the nifD element is distinct from recombination mediated by other identified members of the tyrosine family of site-specific recombinases. The data indicate there is variation in the sequences required for excision of the nifD element, which has significant functional and biological importance. The ability of XisA to excise slightly different templates would be beneficial to a cyanobacterium that develops a mutation in the flanking regions of the nifD element. It would be possible for that cyanobacterium to still excise the nifD element and fix nitrogen. It has been hypothesized that the nifD element originated as an ancient viral infection (Haselkorn, 1992; Henson et al., 2005), and the variation seen in its excision may be due to its ancestry. Functional variation would have been beneficial to the viral ancestor, enabling it to integrate into and excise out of a variety of locations within the genome. To further evaluate the excision of the nifD element, the XisA-DNA interaction should be studied, and the intermediate Holliday structures isolated.
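The transition versus transversion distinction drawn in this discussion can be made explicit with a short Python sketch. The classification rule is standard (purine to purine or pyrimidine to pyrimidine is a transition; anything else is a transversion); the function name and the small list of examples are illustrative, with the substitutions and outcomes copied from Table 2.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref, alt):
    """Classify a single-nucleotide substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and altered base are identical")
    if not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G, or T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# (position, plasmid, wild type base, altered base, recombination) from Table 2
OBSERVED = [
    ("K9", "pSUB11", "A", "C", "-"),
    ("K9", "pSUB76", "A", "G", "+"),
    ("K11", "pSUB23", "G", "T", "+/-"),
    ("K11", "pSUB77", "G", "A", "+"),
    ("K23", "pSUB35", "C", "A", "-"),
    ("K23", "pSUB12C", "C", "T", "+"),
]

for position, plasmid, ref, alt, outcome in OBSERVED:
    kind = classify(ref, alt)
    print(f"{position} ({plasmid}) {ref} to {alt}: {kind}, recombination {outcome}")
```

For the K9, K11, and K23 examples listed, the transitions are the substitutions scored + in Table 2 and the transversions are those scored - or +/-.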

Table 1. List of primers used in this study. The plasmids pRE-18, pDK1, pSUB1, pSUB2, pSUB6, and pSUB9B have all primers used for construction listed. The plasmids pSUB60, pSUB63, and pSUB64 list just the names of the primers used and not their sequence since these are located elsewhere in the table. The remaining substrate plasmids (both the pSUB and pTOPO series) have only the primer that incorporated the mutation into the PCR product listed, with the altered nucleotide(s) in bold face.

Plasmid           Primer          Sequence

Primers for sequencing
                  M13R            CAG GAA ACA GCT ATG AC
                  M13F            GTA AAA CGA CGG CCA G
                  pUCR1           GGC ACG ACA GGT TTC CCG ACT GG
                  pUCF1           GCG TAA GGA GAA AAT ACC GCA TCA GG
                  pBAD18 F1       CTG TTT CTC CAT ACC CGT T
                  LamInsHinc F    CCA GAC ATG CTC GTT GAA GCA TAC GG
                  LamInsBamH1 R   CGT ACC ATG TCC TGA TAC AGG GC

pHEN1 and pHEN2   xisA-sac1-F     TCT GCA AGA GCT CCA GGA GGG AGA ACA CAT GAG AAC AAA A
                  xisA-sac1-F2    AGG CTA AAG AGC TCA GGA GGC CAC AGC GAT GCA AAA TCA G
                  xisA-xmaI-R     GCA TTG CCC GGG TTA TTT TTA TAA AAT TCA ACT ATT C
pRE-18            LAM repeat R1   TAT GAT TCT AGA TAG AGG ATT ACT CCG CTT ATC GG
                  LAM repeat F1   GTA TAG TCA AGC TTC TCT TCT GTC GGA GTA ATC CTT TTA GGG
pDK1              Kprox R1        GTT AAC GGT TAC AAT TCC ACG AGC G
                  Kprox F1        GGT CTT GTC GAC TTT TGT TCT C
                  Dprox R2        GTG ATT CAA AGC TTT CGC CTA ACC
                  Dprox F1        GTA CTG TCG ACG AAT TCG CTC ACA ATG
pSUB1             Kprox R1        GTT AAC GGT TAC AAT TCC ACG AGC G
                  Kprox F1        GGT CTT GTC GAC TTT TGT TCT C
                  Dprox R2        GTG ATT CAA AGC TTT CGC CTA ACC
                  Dprox F1        GTA CTG TCG ACG AAT TCG CTC ACA ATG
                  New Lambda F1   GGC TGT ATA GTC AAC TAA CTC TTC
                  Lambda R1       GTG GCA TGC CCC GGG AAG GAC GTT TG
pSUB2             Kprox F2        CTA GGT CCC GGG GTA CTT TTG TTC T
                  Kprox F3B       CAA GGA GCG GAA TTC AAG CTC CAA G
                  Dprox R3B       GAT GAC GTT AAA GCT TAC GAA TTT G
                  Dprox F2        GCC GCA AAA ATG TTA ACT TCC CAG A
pSUB6             Dprox R4        GGT CTA AGC TTC CGT CAA ATG CAC TCT TGG GAT TAC TCC G
                  Dprox F3        CTG ACA TCG TCG ACC CGT CGC CAA GTT CGG AGT AAT CC
pSUB9B            Dprox R4        GGT CTA AGC TTC CGT CAA ATG CAC TCT TGG GAT TAC TCC G
                  Dprox F3        CTG ACA TCG TCG ACC CGT CGC CAA GTT CGG AGT AAT CC
                  Kprox RAB       TGT TGG ATC CCA TTA AAC CAC AAA AAG GAT TAC TCC G
                  Kprox 10        AAT CCG GAA TTC CCG TGA TAA GGG CCG GAG TAA TCC
pSUB3             Kprox F4        GTC AGA ATT CTG ATA AGG GCC GGA GTA ATC C
pSUB9A            Kprox R4        TGT TGG ATC CCA TTA AAC CAC AAA AAG GAT TAC TGG C
pSUB10            Kprox F5        AAT CCG GAA TTC AAT GGA TAA GGG CCG GAG TAA TCC
pSUB11            Kprox F6        AAT CCG GAA TTC CCG TGA TAC GGG CCG GAG TAA TCC
pSUB12C           Kprox F7        AAT CCG GAA TTC CCG TGA TAA GGG CCG GAG TAA TTC
pSUB13            Kprox R5B       TGT TGG ATC CCA TTC CAC CAC AAA AAG GAT TAC TCC G
pSUB14            Kprox R6B       TGT TGG ATC CCA TTA AAC ACA AAA AAG GAT TAC TCC G
pSUB16Z           Dprox F4        CTG ACA TCG TCG ACA ATG CGC CAA GTT CGG AGT AAT CC
pSUB17            Dprox F5        CTG ACA TCG TCG ACC CGT CGC CCA GTT CGG AGT AAT CC
pSUB18            Dprox F6        CTG ACA TCG TCG ACC CGT CGC CAA TTT CGG AGT AAT CC
pSUB19            Dprox R5        GGT CTA AGC TTC CGT CAA ATG TGT TCT TGG GAT TAC TCC G
pSUB20            Dprox R6        GGT CTA AGC TTC CGT CAC CTG CAC TCT TGG GAT TAC TCC G
pSUB21            Dprox R7Z       GGT CTA AGC TTC TGT CAA ATG CAC TCT TGG GAT TAC TCC G
pSUB23            Kprox F8Z       AAT CCG GAA TTC CCG TGA TAA GTG CCG GAG TAA TCC
pSUB33            Kprox R10       TGT TGG ATC CAC ATT AAA CCA CAA AAA GGA TTA CTC CG
pSUB34            Kprox F11       TCC GGA ATT CCC GTG ATA AGG GCC GGA GTA ATC A
pSUB35            Kprox F12       TCC GGA ATT CCC GTG ATA AGG GCC GGA GTA ATA C
pSUB36            Kprox F13       TCC GGA ATT CCC GTG ATA AGG GCC GGA GTA AGC C
pSUB37            Kprox F14       TCC GGA ATT CCC GTG ATA AGG GCC GGA GTA CTC C
pSUB38            Kprox F15       TCC GGA ATT CCC GTG ATA AGG GCC GGA GTC ATC C
pSUB39            Kprox F16       TCC GGA ATT CCC GTG ATA AGG GCC GGA GGA ATC C
pSUB40            Kprox F17       TCC GGA ATT CCC GTG ATA AGG GCC GGA TTA ATC C
pSUB41            Kprox F18       TCC GGA ATT CCC GTG ATA AGG GCC GGC GTA ATC C
pSUB42            Kprox F19       TCC GGA ATT CCC GTG ATA AGG GCC GTA GTA ATC C
pSUB43            Kprox F20       TCC GGA ATT CCC GTG ATA AGG GCC TGA GTA ATC C
pSUB44            Kprox F21       TCC GGA ATT CCC GTG ATA AGG GCA GGA GTA ATC C
pSUB45            Dprox R10       GTC TAA GCT TCC GTC AAA TGC ACT CTT GTG ATT ACT CCG
pSUB46            Dprox R11       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGT ATT ACT CCG
pSUB47            Dprox R12       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG CTT ACT CCG
pSUB48            Dprox R13       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG AGT ACT CCG
pSUB49            Dprox R14       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATG ACT CCG
pSUB50            Dprox R15       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT CCT CCG
pSUB51            Dprox R16       GTC TAA GCT TCC GTC AAA TCG ACT CTT GGG ATT AAT CCG
pSUB52            Dprox R17       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT ACG CCG
pSUB53            Dprox R18       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT ACT ACG
pSUB54            Dprox R19       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT ACT CAG
pSUB55            Dprox R20       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT ACT CCT
pSUB56            Dprox F7        AGT GCC AAG CTT CCT GAC AAG TCA CGA GGT GGA TTA CTC CGC CCG TTT ATA CGG GTC GAC GAT GG
pSUB57            Kprox R8        ATT ACG AAT TCC CGT TCG CAT GTA CGG AGT AAT CCG GCC GGT GTG TTC CGT GGA TCC CAT GGT CG
pSUB60            Made using M13R, Dprox R4 and pSUB56 as the template
pSUB63            Made using M13F, Dprox F3 and pSUB56 as the template
pSUB64            Made using M13F, Kprox F10 and pSUB57 as the template
pSUB68            Dprox F10       CAT CGT CGA CAC GTC GCC AAG TTC GGA GTA ATC C
pSUB69            Dprox F11       CAT CGT CGA CCA GTC GCC AAG TTC GGA GTA ATC C
pSUB70            Kprox F24       TCC GGA ATT CCC GGG ATA AGG GCC GGA GTA ATC C
pSUB71            Kprox F23       TCC GGA ATT CCC TTG ATA AGG GCC GGA GTA ATC C
pSUB72            Kprox F22       TCC GGA ATT CCA GTG ATA AGG GCC GGA GTA ATC C
pSUB73            Kprox F21       TCC GGA ATT CAC GTG ATA AGG GCC GGA GTA ATC C
pSUB74            Dprox R22       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT ACT CCG AAC TCG
pSUB75            Dprox R23       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT ACT CCG AAT TTG
pSUB76            Kprox F25       TCC GGA ATT CCC GTG ATA GGG GCC GGA GTA ATC C
pSUB77            Kprox F26       TCC GGA ATT CCC GTG ATA AGA GCC GGA GTA ATC C
pSUB78            Kprox F27       TCC GGA ATT CCC GTG ATA AGG GCA TGA GTA ATC C
pTOPO84           Kprox F28       TCC GGA ATT CCC GTT ATA AGG GCC G
pTOPO85           Kprox F29       TCC GGA ATT CCC GTG CTA AGG GCC G
pTOPO86           Kprox F30       TCC GGA ATT CCC GTG AGA AGG GCC G
pTOPO87           Kprox F31       TCC GGA ATT CCC GTG ATC AGG GCC G
pTOPO88           Kprox F32       TCC GGA ATT CCC GTG ATA ATG GCC G
pTOPO89           Kprox F33       TCC GGA ATT CCC GTG ATA AGG TCC G
pTOPO90           Kprox F34       TCC GGA ATT CCC GTG ATA AGG GAC G
pTOPO95           Kprox F39       TCC GGA ATT CCC GTG ATA AGG GCC GGA GTA ATC CTT TTG
pTOPO96           Kprox F40       TCC GGA ATT CCC GTG ATA AGG GCC GGA GTA ATC CTT TTT GTG T
pTOPO98           Kprox F42       TCC GGA ATT CCC GTG ATA AGG GCC GGA GTA ATC CTT TTT GTG GTT TC
pTOPO99           Kprox F43       TCC GGA ATT CCC GTG ATA AGG GCC GGA GTA ATC CTT TTT GTG GTT TAC
pTOPO100          Kprox F44       TCC GAA TTC CCG TGA TAA GGG CCG GAG TAA TCC TTT TTG TGG TTT AAG
pTOPO101          Dprox R41       GTC TAA CGT TCC TTC AAA TGC ACT CTT GGG
pTOPO102          Dprox R24       GTC TAA GCT TCC GGC AAA TGC ACT CTT GGG
pTOPO103          Dprox R25       GTC TAA GCT TCC GTA AAA TGC ACT CTT GGG
pTOPO104          Dprox R26       GTC TAA GCT TCC GTC CAA TGC ACT CTT GGG
pTOPO105          Dprox R27       GTC TAA GCT TCC GTC AAA GGC ACT CTT GGG
pTOPO106          Dprox R28       GTC TAA GCT TCC GTC AAA TTC ACT CTT GGG
pTOPO107          Dprox R29       GTC TAA GCT TCC GTC AAA TGC ACG CTT GGG
pTOPO108          Dprox R30       GTC TAA GCT TCC GTC AAA TGC ACT ATT GGG
pTOPO109          Dprox R31       GTC TAA GCT TCC GTC AAA TGC ACT CGT GGG
pTOPO110          Dprox R32       GTC TAA GCT TCC GTC AAA TGC ACT CTG GGG
pTOPO111          Dprox R33       GTC TAA GCT TCC GTC AAA TGC ACT CTT TGG
pTOPO112          Dprox R34       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT ACT CCG C
pTOPO113          Dprox R35       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT ACT CCG AC
pTOPO115          Dprox R37       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT ACT CCG AAC TTT
pTOPO116          Dprox R38       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT ACT CCG AAC TTG T
pTOPO118          Dprox R40       GTC TAA GCT TCC GTC AAA TGC ACT CTT GGG ATT ACT CCG AAC TTG GCT
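Because the bold face marking the altered nucleotides does not survive in plain text, the mutated positions can be recovered by comparing a mutagenic primer with its unmutated counterpart from Table 1. The Python sketch below is one simple way to do this, assuming the two primers have the same length and are aligned end to end; the function names are hypothetical, and the reported positions count along the primer, not along the D/K numbering used in Table 2.

```python
def clean(seq):
    """Remove the triplet spacing used in Table 1 and upper-case the sequence."""
    return "".join(seq.split()).upper()


def primer_differences(parent, mutant):
    """List (position, parent base, mutant base) for every mismatch.

    Positions are 1-based and count along the primer as written.
    """
    a, b = clean(parent), clean(mutant)
    if len(a) != len(b):
        raise ValueError("primers must have equal length for this comparison")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Sequences copied from Table 1.
dprox_f3 = "CTG ACA TCG TCG ACC CGT CGC CAA GTT CGG AGT AAT CC"  # pSUB9B (unmutated)
dprox_f4 = "CTG ACA TCG TCG ACA ATG CGC CAA GTT CGG AGT AAT CC"  # pSUB16Z
dprox_f5 = "CTG ACA TCG TCG ACC CGT CGC CCA GTT CGG AGT AAT CC"  # pSUB17

print("pSUB16Z:", primer_differences(dprox_f3, dprox_f4))
print("pSUB17: ", primer_differences(dprox_f3, dprox_f5))
```

For pSUB16Z the comparison recovers four adjacent changes, CCGT to AATG, and for pSUB17 a single A to C change, in agreement with the entries for D1-4 and D9 in Table 2.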

Table 2. Results of the recombination assay for the pSUB and pTOPO series of substrate plasmids. The first column lists the nucleotide position, with the plasmid name in parentheses. The second column lists the nucleotide changes that were made. The third column indicates whether or not recombination occurred, with + indicating yes, - indicating no, and +/- indicating greatly reduced recombination. The plasmids pSUB60, pSUB63, pSUB64, pSUB56, and pSUB57 were excluded.

Nucleotide position   Nucleotide change   Recombination
(substrate plasmid)

D1-4 (pSUB16Z)        CCGT to AATG        -
K1-4 (pSUB10)         CCGT to AATG        -
K1-3 (pSUB3)          CCG to TTC          +
D1 (pSUB68)           C to A              +
D2 (pSUB69)           C to A              +
K1 (pSUB70)           C to A              +
K2 (pSUB71)           C to A              +
K3 (pSUB72)           C to A              +
K4 (pSUB73)           C to A              +
D9 (pSUB17)           A to C              +
D9 (pSUB74)           A to G              +
K9 (pSUB11)           A to C              -
K9 (pSUB76)           A to G              +
K9 & D9 (pSUB26)      G to T (both)       +
D11 (pSUB18)          G to T              -
D11 (pSUB75)          G to A              +/-
K11 (pSUB23)          G to T              +/-
K11 (pSUB77)          G to A              +
D14 (pSUB55)          C to A              +
D15 (pSUB54)          G to T              +
D16 (pSUB53)          G to T              +
D17 (pSUB52)          A to C              +
D18 (pSUB51)          G to T              -
D19 (pSUB50)          T to G              -
D20 (pSUB49)          A to C              -
D21 (pSUB48)          A to C              -
D22 (pSUB47)          T to G              +
D23 (pSUB46)          C to A              -
D24 (pSUB45)          C to A              +
K14 (pSUB44)          C to A              +
K15 (pSUB43)          G to T              +
K16 (pSUB42)          G to T              +
K17 (pSUB41)          A to C              +
K18 (pSUB40)          G to T              +
K19 (pSUB39)          T to G              -
K20 (pSUB38)          A to C              -
K21 (pSUB37)          A to C              -
K22 (pSUB36)          T to C              +
K23 (pSUB35)          C to A              -
K23 (pSUB12C)         C to T              +
K24 (pSUB34)          C to A              +
K14-16 (pSUB78)       CG to AT            -
K14-17 (pSUB9A)       CGG to ATT          -
D30-32 (pSUB19)       GTG to TGT          +
K30-32 (pSUB14)       GTG to TGT          +
D35-36 (pSUB20)       TT to GG            +
K35-36 (pSUB13)       TT to GG            +
D41 (pSUB21)          G to T              +
K41 (pSUB33)          G to T              +
D5 (pTOPO118)         C to A              +
D7 (pTOPO116)         C to A              +
D8 (pTOPO115)         C to A              +
D12 (pTOPO113)        T to G              +
D13 (pTOPO112)        T to G              +
D25 (pTOPO111)        C to A              +
D26 (pTOPO110)        A to C              +
D27 (pTOPO109)        A to C              +
D28 (pTOPO108)        G to T              +
D29 (pTOPO107)        A to C              +
D33 (pTOPO106)        C to A              +
D34 (pTOPO105)        A to C              +
D37 (pTOPO104)        T to G              +
D38 (pTOPO103)        G to T              +
D39 (pTOPO102)        A to C              +
K5 (pTOPO84)          G to T              +
K6 (pTOPO85)          A to C              +
K7 (pTOPO86)          T to G              +
K8 (pTOPO87)          A to C              +
K10 (pTOPO88)         G to T              +
K12 (pTOPO89)         G to T              +
K13 (pTOPO90)         C to A              +
K29 (pTOPO95)         T to G              +
K33 (pTOPO96)         G to T              +
K37 (pTOPO98)         A to C              +
K38 (pTOPO99)         A to C              +
K39 (pTOPO100)        T to G              +
K40 (pTOPO101)        C to A              +

Literature cited

Adams DG, Duggan PS (1999) Heterocyst and akinete differentiation in cyanobacteria. New Phytol 144:3-33

Bohme H (1998) Regulation of nitrogen fixation in heterocyst-forming cyanobacteria. Trends in Plant Science 3:346-351

Bibb LA, Hancox MI, Hatfull GF (2005) Integration and excision by the large serine recombinase φRv1 integrase. Mol Microbiol 55:1896-1910

Brusca JS, Hale MA, Carrasco CD, Golden JW (1989) Excision of an 11-kilobase-pair DNA element from within the nifD gene in Anabaena variabilis heterocysts. J Bacteriol 171:4138-4145

Brusca JS, Chastain CJ, Golden JW (1990) Expression of the Anabaena sp. strain PCC 7120 xisA gene from a heterologous promoter results in excision of the nifD element. J Bacteriol 172:3925-3931

Carrasco CD, Ramaswamy KS, Ramasubramanian TS, Golden JW (1994) Anabaena xisF gene encodes a developmentally regulated site-specific recombinase. Genes Dev 8:74-83

Carrasco CD, Golden JW (1995) Two heterocyst-specific DNA rearrangements of nif operons in Anabaena cylindrica and Nostoc sp. strain Mac. Microbiol 141:2479-2487

Carrasco CD, Buettner JA, Golden JW (1995) Programmed DNA rearrangement of cyanobacterial hupL gene in heterocysts. Proc Natl Acad Sci USA 92:791-795

Carrasco CD, Holliday SD, Hansel A, Lindblad P, Golden JW (2005) Heterocyst-specific excision of the Anabaena sp. strain PCC 7120 hupL element requires xisC. J Bacteriol 187:6031-6038

Chen Y, Rice PA (2003) New insight into site-specific recombination from Flp recombinase-DNA structures. Annu Rev Biophys Biomol Struct 32:135-159

Golden JW, Mulligan M, Haselkorn R (1987) Different recombination site specificity of two developmentally regulated genome rearrangements. Nature 327:526-529

Golden JW, Carrasco CD, Mulligan ME, Schneider GJ, Haselkorn R (1988) Deletion of a 55-kilobase-pair DNA element from the chromosome during heterocyst differentiation of Anabaena sp. strain PCC 7120. J Bacteriol 170:5034-5041

Golden JW, Robinson SJ, Haselkorn R (1992) Rearrangement of nitrogen fixation genes during heterocyst differentiation in the cyanobacterium Anabaena. Nature 314:419-423

Guzman L, Belin D, Carson MJ, Beckwith J (1995) Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. J Bacteriol 177:4121-4130

Haselkorn R (1992) Developmentally regulated gene rearrangements in prokaryotes. Annu Rev Genet 26:113-130

Haraldsen JD, Sonenshein AL (2003) Efficient sporulation in Clostridium difficile requires disruption of the σK gene. Mol Microbiol 48:811–821

Henson BJ, Watson LE, Barnum SR (2002) Molecular differentiation of the heterocystous cyanobacteria, Nostoc and Anabaena, based on complete nifD sequences. Curr Microbiol 45:161-164

Henson BJ, Watson LE, Barnum SR (2004) Molecular phylogeny of the heterocystous cyanobacteria (Sections IV and V) based on nifD. Int J Syst Evol Microbiol 54:493-497

Henson BJ, Watson LE, Barnum SR (2005) Characterization of a 4 kb variant of the nifD element in Anabaena sp. strain ATCC 33047. Curr Microbiol 50:129-132

Kaneko T, Nakamura Y, Wolk CP, Kuritz T, Sasamoto S, Watanabe A, et al. (2001) Complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120. DNA Res 8:205-213, 227-253

Kitts PA, Nash HA (1988) Bacteriophage λ site-specific recombination proceeds with a defined order of strand-exchanges. J Mol Biol 204:95-108

Kono H, Sarai A (1999) Structure-based prediction of DNA target sites by regulatory proteins. Proteins 35:114-131

Kunkel B, Losick R, Stragier P (1990) The Bacillus subtilis Gene for the Development Transcription Factor Sigma K is Generated by Excision of a Dispensable DNA Element Containing a Sporulation Recombinase Gene. Genes Dev 4:525-535

Lammers PL, Golden JW, Haselkorn R (1986) Identification and sequence of a gene required for a developmentally regulated DNA excision in Anabaena. Cell 44:905-911

Lammers PL, Mclaughlin S, Papin S, Trujillo-Provencio C, Ryncarz II AJ (1990) Developmental rearrangement of cyanobacterial nif genes: nucleotide sequence, open reading frames, and cytochrome P-450 homology of the Anabaena sp. strain PCC 7120 nifD element. J Bacteriol 172:6981-6990

Mazur BO, Chui F (1982) Sequence of the gene coding for the beta-subunit of dinitrogenase from the blue-green alga Anabaena. Proc Natl Acad Sci USA 79:6782-6786

Meeks JC, Elhai J, Thiel T, Potts M, Larimer F, Lamerdin J, Predki P, Atlas R (2001) An overview of the genome of Nostoc punctiforme, a multicellular, symbiotic cyanobacterium. Photosyn Res 70:85-106

Mulligan ME, Buikema WJ, Haselkorn R (1988) Bacterial-type ferredoxin genes in the nitrogen fixation regions of the cyanobacterium Anabaena sp. strain PCC 7120 and Rhizobium meliloti. J Bacteriol 170:4406-4410

Mulligan ME, Haselkorn R (1989) Nitrogen-fixation (nif) genes of the cyanobacterium Anabaena sp. strain PCC 7120. The nifB-fdxN-nifS-nifU operon. J Biol Chem 264:19200-19207

Nunes-Düby SE, Kwon HJ, Tirumalai RS, Ellenberger T, Landy A (1998) Similarities and differences among 105 members of the Int family of site-specific recombinases. Nucleic Acids Res 26:391-406

Nunes-Düby SE, Marco AA, Landy A (1995) Swapping DNA strands and sensing homology without branch migration in λ site-specific recombination. Current Biology 5:139-148

Nunes-Düby SE, Matsumoto L, Landy A (1987) Site-specific recombination intermediates trapped with suicide substrates. Cell 50:779-788

Oxelfelt F, Tamagnini P, Lindblad P (1998) Hydrogen uptake in Nostoc sp. Strain PCC 73102. Cloning and characterization of a hupSL homologue. Arch Microbiol 169:267-274

Ramaswamy KS, Carrasco CD, Tasneem F, Golden JW (1997) Cell-type specificity of the fdxN-element rearrangement requires xisH and xisI. Mol Microbiol 23:1241-1249

Rice D, Mazur B, Haselkorn R (1982) Isolation and physical mapping of nitrogen fixation genes from the cyanobacterium Anabaena PCC 7120. J Biol Chem 257:13157-13163

Sambrook J, Russell DW (2001) Molecular Cloning: A Laboratory Manual, 3rd edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY

Senecoff JF, Rossmeissl PJ, Cox MM (1988) DNA recognition by the FLP recombinase of the yeast 2 mu plasmid. A mutational analysis of the FLP binding site. J Mol Biol 201:405-421

Tian L, Sayer JM, Jerina DM, Shuman S (2004) Individual nucleotide bases, not base pairs, are critical for triggering site-specific DNA cleavage by vaccinia topoisomerase. J Biol Chem 279:39718-39726

Smith MCM, Thorpe HM (2002) Diversity in the serine recombinases. Mol Microbiol 44:299-307

Smith MCA, Till R, Brady K, Soultanas P, Thorpe M, Smith MCM (2004) Synapsis and DNA cleavage in φC31 integrase-mediated site-specific recombination. Nucleic Acids Res 32:2607-2617

Voziyanov Y, Pathania S, Jayaram M (1999) A general model for site-specific recombination by the integrase family recombinases. Nucleic Acids Res 27:930-941

Summary

In some cyanobacteria, heterocyst differentiation is accompanied by developmentally regulated DNA rearrangements that occur within the nifD, hupL, and fdxN genes; the excised sequences are referred to as the nifD, hupL, and fdxN elements. The data presented in this dissertation indicate that the nifD and hupL elements are variable in size, structure, and composition. The sequenced nifD elements range from 4 to 24 kb in size, and the sequenced hupL elements range from 7 to 9.5 kb. Conserved regions are found within all nifD elements and within all hupL elements. For each element type, these conserved regions include the 5′ and 3′ flanking regions, the recombinase required for excision (xisA for the nifD element and xisC for the hupL element), and a small region of unknown function located near the middle of each element. The data indicate that the nifD and hupL elements have undergone a complex pattern of insertions, deletions, translocations, and sequence divergence over the course of evolution, but that conserved regions remain. This suggests that the conserved regions of the elements have been under selective pressure to be retained. These elements appear to have arisen from ancient viral infections and to have since lost the ability to self-replicate. Both the nifD and hupL elements may represent defective prophages that are in the process of mutational decay, which is characterized by increased rates of mutation.

The nifD, hupL, and fdxN elements are excised from the genome late in differentiation; however, the sequence requirements for excision of the elements are not known. This dissertation addressed the excision of the nifD element. The results presented here indicate that nucleotides both within and outside the direct repeats are involved in excision of the nifD element, but that not all nucleotides within the direct repeats are required for excision. At certain nucleotide positions, the presence of a purine versus a pyrimidine greatly affected recombination. Our results also indicate that excision and branch migration occur within a 6 bp region of the direct repeats.
