
The Pennsylvania State University

The Graduate School

Eberly College of Science

SYSTEMATICS AND EVOLUTION

IN THE PARASITIC (DODDER)

A Thesis in

Biology

by

Joel R. McNeal

© 2005 Joel R. McNeal

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

May 2005

The thesis of Joel R. McNeal was reviewed and approved* by the following:

Claude W. dePamphilis Associate of Thesis Adviser Chair of Committee

Stephen W. Schaeffer Associate Professor of Biology

Andrew G. Stephenson Professor of Biology

David M. Geiser Associate Professor of Plant Pathology

Douglas R. Cavener Professor of Biology Head of the Department of Biology

*Signatures are on file in the Graduate School.

ABSTRACT

Parasitism has evolved independently many times within the course of angiosperm evolution. One of the most economically damaging of these parasitic lineages is the genus Cuscuta, which is derived from within the Morning Glory Family (Convolvulaceae). All species of Cuscuta are epiphytic stem parasites that lack roots and expanded leaves at maturity. Their hosts include a wide array of non-grass land plants, and a single Cuscuta individual may parasitize dozens of host species simultaneously. Although almost 200 described species exist, identification and are difficult within the genus, as few morphological characteristics exist outside of the inflorescence in these reduced parasites. Although only about 10% of the genes necessary for photosynthesis are transcribed from the plastid genome, analyses of plastid genome structure, content, and sequence evolution are an excellent and efficient way to study the changes in photosynthetic ability and organellar function that accompany the transition from autotrophy to heterotrophy in angiosperms.

This work presents a useful method for acquiring complete plastid genome sequences from parasitic plants, analyses of full plastid genome sequences from two Cuscuta species and a nonparasitic relative, and a well-resolved and highly supported phylogeny of Cuscuta using three plastid genes (rbcL, rps2, and matK) and the nuclear ribosomal internal transcribed spacer locus. A molecular phylogenetic approach is used to address hypotheses involving taxonomy, biogeography, morphological evolution, photosynthetic ability, and plastid genome evolution within the genus.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS

Chapter 1: Overview
    References

Chapter 2: Utilization of Partial Genomic Libraries for Sequencing Complete Organellar Genomes
    Abstract
    Introduction
    Materials and Methods
        DNA Isolation and Purification
        Partial Genomic Library Construction
        Identifying Plastid Clones
        Selecting Clones for Sequencing
    Results and Discussion
    Acknowledgments
    References

Chapter 3: Disappearance of Promotes Adaptive Change and Loss of a Highly Conserved Maturase
    Abstract
    Methods
        PCR and Sequencing
        Data Analyses
    References
    Acknowledgments

Chapter 4: Complete Plastid Genome Sequences Suggest Strong Selection for Retention of Photosynthetic Genes in the Genus Cuscuta
    Abstract
    Introduction
    Materials and Methods
        Plastid Genome Sequencing, Assembly, and Annotation
        Molecular Evolutionary Analyses
    Results and Discussion
    Acknowledgments
    Literature Cited

Chapter 5: Systematics and Plastid Genome Evolution of the Cryptically Photosynthetic Parasitic Plant Genus Cuscuta (Convolvulaceae)
    Acknowledgments
    Abstract
    Materials and Methods
        Plant Material
        PCR and Sequencing
        Phylogenetic Analyses
        Genome Size Estimates
        Rates Analyses
    Results
        Phylogeny
        Genome Size Results
        Plastid Genome Variation Assays
        Tests of Selective Constraint
    Discussion
        Morphological and Biogeographical Interpretation of Phylogeny
        Genome Sizes and Speciation
        Plastid Genome Evolution in Cuscuta
        Loss of Photosynthesis in Cuscuta
    References
    Appendix

Chapter 6: Future Direction and Conclusion
    References

LIST OF FIGURES

Chapter 2:
    Figure Legends
    Figure 1: Macroarray screen of fosmid clones using pooled plastid probes
    Figure 2: Map of end-sequenced clone coverage on plastid genomes

Chapter 3:
    Figure Legends
    Figure 1: Results of PCR assays for presence or absence of two group IIB introns contained in ycf3 (a) and a group IIA intron in 3' rps12 (b)
    Figure 2: Phylogenies of (a) Convolvulaceae and (b) Orobanchaceae inferred from full and partial matK sequences

Chapter 4:
    Figure Legends
    Figure 1: Circular map of the complete plastid genome of Ipomoea purpurea. Inset: genomes scaled to relative size
    Figure 2: Circular map of the complete plastid genome of Cuscuta exaltata
    Figure 3: Circular map of the complete plastid genome of Cuscuta obtusiflora
    Figure 4: Pairwise dN/dS of Nicotiana and Ipomoea vs. Panax ginseng for all shared protein-coding genes
    Figure 5: Rates of substitution and selection across 4 functionally defined classes of genes
    Figure 6: Phylogenetic trees created using Maximum Likelihood GTR+gamma for each functionally defined gene class
    Figure 7: p-distance for Epifagus, Ipomoea, C. exaltata, and C. obtusiflora vs. Panax across most genes present in Epifagus
    Figure 8: dN/dS for all genes, all taxa (including Epifagus) vs. Panax

Chapter 5:
    Figure Legends
    Figure 1: Gynoecia and ovules of species across the taxonomic diversity of Cuscuta
    Figure 2: Maximum Parsimony consensus trees of 500 bootstrap replicates for rbcL, rps2, ITS, and all three genes combined with matK
    Figure 3: Parsimony bootstrap consensus tree (500 replicates) with taxonomic classifications and plastid genome changes
    Figure 4: Results of long PCR tests to detect differences in intergenic spacer regions
    Figure 5: Unconstrained maximum likelihood tree estimates for atpE, rbcL, rps2, and rpoA
    Figure 6: Floral diversity within the genus Cuscuta
    Figure 7: Individual gene phylograms produced by Neighbor-Joining method
    Figure 8: Approximation of phylogenetic inferences suggested by Yuncker

LIST OF TABLES

Chapter 2:
    Table 1: Number of clones screened and identified for each species

Chapter 3:
    Table 1: Intron distribution in Cuscuta, Ipomoea, Nicotiana, and Epifagus
    Table 2: Levels of selection on important evolutionary branches in matK
    Table 3: Voucher information and GenBank accession numbers

Chapter 4:
    Table 1: Plastid gene loss relative to Panax ginseng
    Table 2: (A) Pairwise relative rates tests (B) Relative Ratio Tests
    Table 3: Distance values estimated in unconstrained ML trees
    Table 4: (A) Intergenic distance between shared, intact coding sequence (B) Shared pseudogene sequence relative to Nicotiana
    Table 5: Cumulative codon usage

Chapter 5:
    Table 1: Genome size and chromosome numbers in Cuscuta
    Table 2: Likelihood Ratio Test comparisons of trees with constrained clades versus fully unconstrained trees
    Table 3: New primer sequences designed for this study
    Table 4: Synonymous and nonsynonymous substitutions for all branches of atpE
    Table 5: Synonymous and nonsynonymous substitutions for all branches of rbcL
    Table 6: Synonymous and nonsynonymous substitutions for all branches of rps2
    Table 7: Synonymous and nonsynonymous substitutions for all branches of rpoA

ACKNOWLEDGMENTS

Professional acknowledgments are included in each of the major chapters of this thesis; these acknowledgments are therefore more for personal thanks than business. My advisor, Claude dePamphilis, does not appear in those acknowledgments since he is listed as a co-author, so it is only fair to thank him here. Claude was supportive throughout my Ph.D. and encouraged me to be an independent thinker and researcher, as well as allowing me to take a leading role in teaching Plant Taxonomy as a field course here at Penn State. He was also the first person to introduce me to the world of parasitic plants, back when I was an undergraduate nearly 10 years ago. My committee (David Geiser, Stephen Schaeffer, and Andrew Stephenson) graciously allowed me extra time to finish my thesis. On short notice, they all gave thorough and thoughtful comments and revisions that will undoubtedly aid in finalizing the chapters for journal publication.

I would also like to thank Jim Leebens-Mack for being like a second advisor during the final, roughest years of finishing my degree, as well as being a good friend. He took over that role from Todd Barkman, who was integral in training me when I first started as a graduate student. After only a few short months of working closely with Todd, taking the Molecular Evolution class was a simple review, even though I had never had such a class in my undergraduate career. He was a crucial part of a flurry of learning during my first two years, before my research became more aimed towards lugubrious, monotonous data collection, analysis, and writing. Sheila Plock was a wonderful technician who was always helpful to me and did everything she could to expedite, not impede, the work of others in the lab. The research world would be a better place if all technicians were so thoughtful. Kevin Imafuku and Alison Lau helped me finish many of the tedious chores near the end of my dissertation.

Doug Cavener and the Biology Department helped to further my training by assisting in sending me to Costa Rica for a Tropical Plant Systematics class through the Organization for Tropical Studies that was undoubtedly one of the best experiences of my . The Biology Department funded me with various teaching assistantships and allowed me to teach Plant Taxonomy every year I was here, as well as giving me a semester to focus on research during a critical time by awarding me the Henry W. Popp Fellowship. Mauricio Bonifacino and Nicole Maturen were wonderful guides and translators during a collecting trip to South America in 2003, which also was undoubtedly one of the greatest experiences I will ever be lucky enough to have in life. David McCauley gave me my first job in a plant research lab after my freshman year at Vanderbilt University and was an excellent undergraduate advisor to me as well. Finally, I'd like to thank Dr. Robert Kral, whose Spring Flora class I was lucky enough to get into the last time it was ever offered at Vanderbilt before he and the herbarium moved on. That class introduced a budding amateur naturalist to the wonderful world of plant diversity and provided me with direction and motivation to do something I enjoyed during a critical period in my professional development.

Chapter 1:

Overview

Within flowering plants, the primary producers for most extant terrestrial life, the transition from autotrophy to heterotrophy has occurred numerous times. This transition is often mediated by the mycorrhizal mutualisms that most land plants share with fungi, in which the fungi facilitate inorganic nutrient uptake by the plant roots in exchange for organic nutrients made available by photosynthesis. Some plants exploit this relationship by becoming completely reliant on fungal associates for all of their nutrition. In this phenomenon, known as mycotrophy, organic nutrients flow from one plant through the fungus to the mycotrophic plant or "mycoparasite"; a taxonomically diverse assemblage of plants and fungi is known to be involved in such interactions (Bidartondo and Bruns, 2001; Bidartondo et al., 2002).

By contrast, plants commonly referred to as "parasitic plants" obtain some or all of their nutrition through a direct connection with their hosts. The connective organs, called haustoria, take a variety of forms depending on the differing life histories of the parasites. At least 11, and perhaps as many as 13, independent lineages of angiosperms have evolved direct haustorial parasitism (Barkman et al., in prep). Thousands of species of parasitic plants representing an astonishing diversity of morphology and life history exist among these lineages. A large number of parasites retain photosynthesis, a phenomenon referred to as hemiparasitism. Many others have lost the ability to photosynthesize altogether and are known as holoparasites. These parasites generally lack obvious and expanded leaves. Parasitic plants may be rooted in the soil and attach their haustoria to the roots of their hosts, they may be epiphytic and attach to branches or stems of their host, or they may be endophytic and grow entirely within their host, emerging only to flower and fruit (Kuijt, 1969).

This thesis deals specifically with one lineage of parasitic plants, the genus Cuscuta (dodders). These plants are epiphytic, with no roots or expanded leaves. They twine around their hosts and penetrate their stems with numerous haustoria in regions of close contact, eventually extracting nutrients from both xylem and phloem. This lineage evolved within the order Solanales and has been confidently shown to be nested within the Morning Glory family, Convolvulaceae (Stefanovic and Olmstead, 2004). Members of the genus cause extensive crop damage each year, largely on fodder crops such as alfalfa and clover, but also extending to onion, tomato, basil, , cranberries, citrus trees, and many other non-grain crops (Kuijt, 1969). This parasitic plant genus has been the focus of hundreds of studies involving its morphology, physiology, ecology, and plastid genome evolution (see internal chapter references for examples). The purpose of this thesis was to provide a phylogenetic framework from which to better understand all aspects of the biology of this genus in an evolutionary context, and in particular, to investigate plastid genome evolution within this parasitic lineage.

The second chapter of this thesis is formatted for submission to Biotechniques and outlines a procedure developed in the dePamphilis lab specifically for obtaining full plastid genome sequences from parasitic plants. Commonly used current methods of plastid genome sequencing (Jansen et al., 2005) do not work for many plants and are most often useless for parasitic plants. The method outlined in this thesis involves production of a large-insert clone library followed by a screening procedure to identify plastid clones. A set of clones that covers the entire plastid genome is chosen, these clones are sheared and subcloned, and enough fragments are randomly sequenced for adequate coverage of the entire plastid chromosome. This method should also be applicable to mitochondrial genomes in plants.

The third thesis chapter details the implications of loss of the gene matK from the plastid genome of one lineage within Cuscuta. This gene is thought to be an intron maturase and is located within an intron of a transfer RNA gene (trnK-UUU). The trnK intron, matK, and all other group II introns appear to have been incorporated into the plastid genome prior to the evolution of land plants, as they are also found in Charophytes, the algal lineage sister to land plants (Sanders, Karol, and McCourt, 2003). The putative function of matK is to promote splicing of group IIA introns, 7 of which exist in the plastid genomes of most plants (Liere and Link, 1995; Vogel, Borner, and Hess, 1999). The parasitic of Cuscuta likely has allowed the loss of many transfer RNA genes that contain group IIA introns. This phenomenon, combined with a high frequency of intron loss from intact genes through an unknown mechanism, has led to the loss of all group IIA introns from Cuscuta subgenus Grammica. This, in turn, likely rendered the function of matK obsolete and allowed for its loss in this group. Additionally, it appears that matK may undergo adaptive evolution after the loss of most group IIA introns in cases where it no longer must be a generalist splicing factor. This chapter is formatted for submission as a manuscript to Nature.

Chapter four of this thesis is formatted for submission to Molecular Biology and Evolution. In this chapter, data are presented for three fully sequenced plastid genomes: Ipomoea purpurea, Cuscuta exaltata, and Cuscuta obtusiflora. Ipomoea is a close relative of Cuscuta within Convolvulaceae. Cuscuta exaltata is a stout member of Cuscuta subgenus Monogyna that is very green throughout its stems and inflorescences. Cuscuta obtusiflora is a member of subgenus Grammica that usually only produces chlorophyll within its inflorescences, especially in the ovules. These three species were chosen to test whether significant changes to the plastid genome occurred before the evolution of parasitism, whether significant gene loss and changes to constraint within the plastid genome accompany reduced and localized chlorophyll production, and whether Cuscuta shows parallel patterns of plastid genome evolution relative to the nonphotosynthetic Epifagus virginiana, the only parasitic plant plastid genome sequenced prior to this work (Wolfe, Morden, and Palmer, 1992).

The fifth thesis chapter extends the work from chapter four to the entire genus. A well-supported phylogeny of Cuscuta is presented, and further sequence and PCR data are used to determine where in the genus the major changes to the plastid genome discovered in chapter four occurred. The phylogeny is compared with prior monographs of the genus (Choisy, 1841; Engelmann, 1859; Yuncker, 1932) to identify where taxonomic corrections need to be made. Finally, the phylogeny and analytical results are used to interpret various aspects of the biology of this parasitic lineage, including photosynthetic purpose and ability, species delimitation, and morphological evolution. This chapter is formatted as a manuscript to be submitted to American Journal of Botany. A final thesis chapter presents conclusions and directions for future work.

References

BARKMAN, T. J., J. R. MCNEAL, L. S., G. COAT, H. B. CROOM, N. D. YOUNG, AND C. W. DEPAMPHILIS. in prep. Mitochondrial DNA suggests 12 origins of parasitism in angiosperms and implicates parasitic plants as vectors of . To be submitted to Proceedings of the National Academy of Sciences of the United States of America.

BIDARTONDO, M. I., AND T. D. BRUNS. 2001. Extreme specificity in epiparasitic Monotropoideae (Ericaceae): widespread phylogenetic and geographical structure. Molecular Ecology 10: 2285-2295.

BIDARTONDO, M. I., D. REDECKER, I. HIJRI, A. WIEMKEN, T. D. BRUNS, L. DOMINGUEZ, A. SERSIC, J. R. LEAKE, AND D. J. READ. 2002. Epiparasitic plants specialized on arbuscular mycorrhizal fungi. Nature 419: 389-392.

CHOISY, J. D. 1841. De Convolvulaceis Dissertatio Tertia. Mem. Soc. Phys. Hist. Nat. Geneve 9: 261-288.

ENGELMANN, G. 1859. Systematic arrangement of the species of the genus Cuscuta, with critical remarks on old species and descriptions of new ones. Trans. of the Academy of Science, St. Louis 1: 453-523.

JANSEN, R. K., L. A. RAUBESON, J. L. BOORE, C. W. DEPAMPHILIS, T. W. CHUMLEY, R. C. HABERLE, S. K. WYMAN, A. J. ALVERSON, R. PEERY, S. J. HERMAN, H. M. FOURCADE, J. V. KUEHL, J. R. MCNEAL, J. LEEBENS-MACK, AND L. CUI. 2005. Methods for Obtaining and Analyzing Whole Chloroplast Genome Sequences. Methods in Enzymology (in press).

KUIJT, J. 1969. Biology of Parasitic Flowering Plants. University of California Press, Berkeley and Los Angeles.

LIERE, K., AND G. LINK. 1995. RNA-binding activity of the matK protein encoded by the chloroplast trnK intron from Mustard (Sinapis alba L.). Nucleic Acids Research 23: 917-921.

SANDERS, E. R., K. G. KAROL, AND R. M. MCCOURT. 2003. Occurrence of matK in a trnK group II intron in charophyte green algae and phylogeny of the Characeae. American Journal of Botany 90: 628-633.

STEFANOVIC, S., AND R. G. OLMSTEAD. 2004. Testing the phylogenetic position of a parasitic plant (Cuscuta, Convolvulaceae, Asteridae): Bayesian inference and the parametric bootstrap on data drawn from three genomes. Systematic Biology 53: 384-399.

VOGEL, J., T. BORNER, AND W. R. HESS. 1999. Comparative analysis of splicing of the complete set of chloroplast group II introns in three higher plant mutants. Nucleic Acids Research 27: 3866-3874.

WOLFE, K. H., C. W. MORDEN, AND J. D. PALMER. 1992. Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proceedings of the National Academy of Sciences of the United States of America 89: 10648-10652.

YUNCKER, T. G. 1932. The Genus Cuscuta. Memoirs of the Torrey Botanical Club 18: 113-331.

Chapter 2: Formatted for submission to Biotechniques

Utilization of Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

Joel R. McNeal, James H. Leebens-Mack, and Claude W. dePamphilis

Department of Biology, Huck Institutes of Life Sciences, and Institute of Molecular Evolutionary Genetics, The Pennsylvania State University, University Park, PA, 16802-5301

Abstract:

Eukaryotic organellar genomes are vastly smaller in size than their nuclear counterparts and exist in relatively high copy number per cell. Although cloning and sequencing of organellar genomes is generally much easier and far less expensive than total nuclear genome sequencing, it often demands extractions greatly enriched in organellar DNA for acceptable efficiency. The enrichment process involves isolation of intact organelles, a difficult process for most , and an impossible one for others. We herein present results using simple macroarray screening of partial genomic fosmid libraries from selected plants to isolate large fragments of plastid DNA. An optimal subset of these fragments was identified, shotgun sequenced, and assembled into complete plastid genomes. This technique is extremely useful for organisms whose organelles defy other available sequencing or for which very limited amounts of tissue are available.

INTRODUCTION

To date, over 600 mitochondrial genomes and over 40 plastid genomes are publicly available (http://www.ncbi.nlm.nih.gov/genomes/static/euk_o.html). Unlike the eukaryotic nuclear genome, organellar genomes occur in high copy number per cell and are of an overall size more tenable for complete sequencing. Gene orthology is typically easy to ascertain across a wide taxonomic range of organisms, and, thus, organellar genes provide a disproportionately large fraction of the genes currently used for organismal phylogeny (1). Although these genomes contain just a small proportion of the genes necessary for their primary organellar functions (ATP synthesis in the mitochondrion, photosynthesis in the plastid), much can be learned about the evolution of these processes from the subset of genes that remain in the organelle.

While it is generally much easier and much less costly to sequence organellar rather than entire organismal genomes, it is still a laborious task in most instances. Plant plastid genomes are particularly challenging to isolate and sequence. The earliest plastid genome sequences were generated by digesting, cloning, and mapping plastid-enriched DNA, followed by sequencing small fragments one at a time from the clone bank (2). With the advent of cost-effective, high-throughput sequencing, plastid genome sequences could be generated more efficiently by shotgun cloning and sequencing directly from plastid DNA (ptDNA) isolations. To separate ptDNA from nuclear and mitochondrial contaminants, intact plastids must be isolated, most often by sucrose or Percoll gradient centrifugation (3). The final yield of purified ptDNA is low, and large quantities of fresh starting tissue are necessary to produce even a small quantity of plastid-enriched DNA. Due to the immense quantity of nuclear DNA relative to ptDNA in a cell, it is rare to produce an extraction with less than 30% nuclear contamination. This proportion may be improved by using Rolling Circle Amplification (RCA) to further enrich the extraction. RCA often preferentially amplifies the high-copy ptDNA present in these extractions, reducing the percentage of DNA that is nuclear in origin and thus making shotgun sequencing of the product much more efficient. Another method recently employed for a number of plastid genomes is PCR of large sections of ptDNA between regions for which primers exist in conserved plastid sequence (4). Sequences from numerous PCR fragments are then assembled into a complete plastid genome. Jansen et al. (3) extensively review current procedures in ptDNA isolation and sequencing.
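The effect of nuclear contamination on shotgun efficiency can be illustrated with a rough calculation. The sketch below is not part of the original protocol; the genome size, target coverage, and read length are illustrative assumptions, and the function name `reads_needed` is hypothetical. It simply scales the number of reads required for a given plastid coverage by the fraction of reads expected to be plastid in origin.

```python
def reads_needed(genome_bp, coverage, read_bp, nuclear_percent):
    """Total shotgun reads required so that plastid-derived reads alone
    reach the target coverage, given the percent of non-plastid DNA."""
    if not 0 <= nuclear_percent < 100:
        raise ValueError("nuclear_percent must be in [0, 100)")
    plastid_percent = 100 - nuclear_percent
    numerator = coverage * genome_bp * 100
    denominator = read_bp * plastid_percent
    # Ceiling division on integers avoids floating-point rounding surprises.
    return -(-numerator // denominator)


# Illustrative values only (assumptions, not figures from the text).
GENOME_BP = 150_000   # assumed plastid genome size
COVERAGE = 8          # assumed target fold-coverage
READ_BP = 600         # assumed usable read length

for nuclear in (0, 30, 60, 90):
    total = reads_needed(GENOME_BP, COVERAGE, READ_BP, nuclear)
    print(f"{nuclear}% contamination: {total} reads")
```

Under these assumptions, the sequencing effort grows in inverse proportion to the plastid fraction, which is why enrichment (or, in the method presented here, pre-screening of clones) matters so much.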

Although the above procedures have succeeded for a variety of plant plastid genomes, many plants exist for which available methods are not feasible. For many plants, it is difficult, if not impossible, to produce isolations that are significantly enriched for ptDNA, even if large amounts of fresh tissue are readily available. The PCR method (4) eliminates the need for specific ptDNA isolation; however, assembling sequences of large, overlapping PCR products into a plastid genome is only practical if the genome is not highly rearranged or if gene order is otherwise known via prior mapping. A large set of PCR and sequencing primers spaced around the entire plastid genome is also necessary.

Among the most problematic groups of land plants for existing methods of plastid genome sequencing are parasitic plants. Parasitic plants provide an opportunity to study changes in the plastid genome that accompany the transition from autotrophy to heterotrophy and perhaps reveal underlying functions of the plastid genome that are not apparent in plants with a dominant photosynthetic function. Despite the utility of data from parasitic plants, the plastid genome of only one, Epifagus virginiana (Beechdrops), has been successfully sequenced. Sequencing of Epifagus required tedious production of a clone library produced from a combination of PCR products and excised restriction fragments (5). Parasitic plant plastids are typically more fragile than those of completely autotrophic plants, usually making plastid extraction and significant ptDNA enrichment impossible, and their genomes are often divergent enough structurally and at the nucleotide level to make PCR of some regions difficult, if not impossible. As such, the improved methods of plastid genome sequencing discussed above cannot be efficiently applied. The method we present here enables one to obtain complete plastid genomes from both parasitic and nonparasitic plants using a small amount of tissue, which may be fresh, frozen, or dried in silica gel.

MATERIALS AND METHODS

DNA Isolation and Purification

Fresh tissue was collected from two parasitic plants (Cuscuta exaltata and Cuscuta obtusiflora) and a completely autotrophic plant (Ipomoea purpurea), all grown from seed. Approximately 1 g of tissue from each plant was frozen in liquid nitrogen for approximately 20 seconds and pulverized to powder with a mortar and pestle. DNA was extracted in 10 ml of buffer from the powdered tissue using a standard 2X CTAB procedure (6) with 1% Polyethylene Glycol MW 8000 (PEG8000) included in the extraction buffer. At the final isopropanol precipitation step, DNA was spooled out with a glass hook, rinsed with 70% ethanol, and resuspended in 500 µl of H2O. To clean and further purify the DNA, it was precipitated again by adding 125 µl of 4 M NaCl, followed by 625 µl of 13% PEG8000, and incubated on ice for 20 minutes before centrifugation at 4°C for 15 minutes. The final DNA pellets were resuspended in 75 µl of H2O. The major high molecular weight fragment size of the DNA was determined by running 3 µl alongside a size standard on a 0.8% agarose gel using field inversion gel electrophoresis (FIGE).
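As a quick check on the reprecipitation step, the final NaCl and PEG8000 concentrations can be worked out from the stated volumes. The sketch below assumes the volumes are simply additive and that the percentage given for the PEG stock carries through linearly on dilution; the helper name `final_concentration` is hypothetical.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Dilution formula C1 * V1 = C2 * V2, solved for C2."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Volumes taken from the protocol text (in microliters).
dna_ul = 500    # resuspended DNA
nacl_ul = 125   # 4 M NaCl stock
peg_ul = 625    # 13% PEG8000 stock

# Assumption: volumes are additive on mixing.
total_ul = dna_ul + nacl_ul + peg_ul

nacl_final_molar = final_concentration(4.0, nacl_ul, total_ul)
peg_final_percent = final_concentration(13.0, peg_ul, total_ul)

print(f"total volume: {total_ul} µl")
print(f"final NaCl:   {nacl_final_molar:.2f} M")
print(f"final PEG:    {peg_final_percent:.2f} %")
```

With these inputs, the mixture works out to 1250 µl at roughly 0.4 M NaCl and 6.5% PEG8000.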

Partial Genomic Library Construction

The CopyControl™ Fosmid Library Production Kit from Epicentre® was used to construct partial total DNA libraries from the three purified DNA isolations. DNA of appropriate size for fosmid cloning was end-repaired as described in the protocol and gel-purified using FIGE with a 0.8% low-melt agarose gel. The recovered were resuspended in 10 µl of H2O. Concentration of final end-repaired DNA was determined using Amersham® PicoGreen™ dye and fluorimetry. Appropriate quantities of DNA were ligated and packaged according to the manufacturer's protocol.

Identifying Plastid Clones

E. coli cells provided with the Fosmid Kit were infected with phage particles and plated on LB-agar + 12.5 µg/ml chloramphenicol. A Genetix® Q-PixII™ robot was used to organize individual clones into 384-well plates. The same robot was then used to grid a predetermined number of colonies onto nylon membranes (Genetix® Q-Performa™) soaked in LB + 12.5 µg/ml chloramphenicol. Gridding patterns that allowed rapid identification of specific clones after hybridization were used (Fig. 1), and each clone was replicated a minimum of 6 times per filter. Colonies were grown on the filters for 16 hours. Afterwards, filters were allowed to soak up denaturing solution (0.5 N NaOH, 1.5 M NaCl) from saturated blotter paper below for 4 minutes. This process was repeated with fresh denaturing solution over a boiling water bath for an additional 4 minutes. The filters were then placed on blotter paper soaked in 1.5 M NaCl, 1 M Tris solution for 4 minutes at room temperature and allowed to dry for 10 minutes. Colonies were immersed in a Proteinase K solution (0.1 M NaCl, 50 mM Tris, 50 mM EDTA, 1 X Sarkosyl, 100 mg/L Proteinase K) for 50 minutes at 37°C, dried thoroughly, baked for 2 hours at 80°C, and cross-linked under ultraviolet light for 2 minutes.

PCR products ranging from approximately 200 to 700 base pairs were generated from the plastid genes rps2, rps4, rpl16, rps7, rbcL, and psaC for each species. After verifying the size of the amplicon, PCR products were pooled at approximately equal molar concentration, diluted to approximately 5 ng/µl, and radioactively labeled with [α-32P]dATP according to the Ambion® Strip-EZ™ DNA protocol. Excess radionucleotide was removed by running the probes through Centri-Spin™ columns (Princeton Separations®).
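Pooling amplicons of different lengths at equal molar concentration means longer products must contribute proportionally more mass. A minimal sketch of that calculation follows; it assumes the common approximation of about 650 g/mol per base pair of double-stranded DNA, and the amplicon lengths and concentrations shown are hypothetical placeholders rather than measured values from this study.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; a common approximation for dsDNA


def fmol_per_ul(ng_per_ul, length_bp):
    """Convert a mass concentration (ng/µl) to a molar one (fmol/µl)."""
    return ng_per_ul * 1e6 / (length_bp * AVG_BP_MASS)


def equimolar_volumes(amplicons, target_fmol):
    """Volume (µl) of each amplicon needed to contribute target_fmol.

    amplicons maps a gene name to (length_bp, ng_per_ul).
    """
    return {
        gene: target_fmol / fmol_per_ul(conc, length)
        for gene, (length, conc) in amplicons.items()
    }


# Hypothetical lengths (bp) and concentrations (ng/µl), for illustration only.
example_amplicons = {
    "rps2": (500, 20.0),
    "rps4": (450, 25.0),
    "rpl16": (200, 15.0),
    "rps7": (350, 30.0),
    "rbcL": (700, 40.0),
    "psaC": (220, 10.0),
}

for gene, volume in equimolar_volumes(example_amplicons, 50.0).items():
    print(f"{gene}: {volume:.2f} µl")
```

At equal mass concentration, an amplicon twice as long requires twice the volume to supply the same number of molecules.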

Filters were prehybridized in a solution of 5X NaCl/NaH2PO4/EDTA (SSPE), 5X Denhardt's Solution (7), 0.5% sodium dodecyl sulfate (SDS), and 0.1 mg/ml fragmented salmon sperm DNA for 1 hour at 68°C. Probes were diluted to 250 µl in 10 mM EDTA, denatured at 90°C for 10 minutes, and added to the prehybridized filters. The filters were allowed to hybridize at 68°C overnight. They were then subjected to 5 washes of 15 minutes each: a first wash in 2X SSPE / 0.5% SDS at room temperature, a second wash in 2X SSPE / 0.5% SDS at room temperature, a third wash in 0.3X SSPE / 0.5% SDS at room temperature, a fourth wash in 2X SSPE / 0.5% SDS at 55°C, and a fifth wash in 0.3X SSPE at room temperature. The filters were enclosed in plastic wrap and exposed on phosphorimaging screens overnight. A screen image was captured, and putative plastid clones were identified by positive hybridizations.

Selecting Clones for Sequencing

Out of the positively hybridizing clones, a subset of 6 to 15 clones was randomly selected as possible candidates for full sequencing for each species. Cultures with 5 ml of Terrific Broth + 12.5 µg/ml chloramphenicol were inoculated with the selected clones and grown for 15 hours. A 0.5 ml aliquot of this culture was then added to 4.5 ml of LB broth + 12.5 µg/ml chloramphenicol. This culture was then induced to high copy number following the CopyControl™ protocol. Cells were spun down and transferred to microcentrifuge tubes, where minipreps were performed using a mini alkaline-lysis procedure followed by precipitation with 1/4 volume NaCl and equal volume PEG8000 at 4°C for 20 minutes and then centrifugation. Pellets were resuspended in 20 µl of H2O, and DNA concentrations were determined on an Eppendorf® Biophotometer™. T7 RNA promoter forward primer and pCC1/pEpiFOS reverse sequencing primer (sequence provided in CopyControl™ protocol) were used to sequence the first few hundred bases at both ends of each fosmid insert on a Beckman Coulter CEQ8000™ system. For each reaction, 2.5 µg of DNA template and 5 µmoles of primer were used, with other sequencing parameters following those provided by Beckman Coulter for Bacterial Artificial Chromosome (BAC) end sequencing. BLASTn (8) was then used to confirm the plastid origin of each clone's insert and to identify the plastid region spanned by each insert. Directionality of the end sequences was also checked relative to the fully sequenced plastid genome of Nicotiana tabacum (GenBank accession NC_001879) for each clone in order to find any genomic rearrangements involving inversion that may have occurred with one internal and one external breakpoint. In addition, PCR tests were conducted for the same genes used as the hybridization probes to confirm that the clones covered the entire region indicated by the end sequences. With reasonably secure knowledge as to the region encompassed by each clone, minimally overlapping sets of clones encompassing the entire plastid genome were chosen for each species. The DNA preps of each were randomly sheared into fragments approximately 3 kilobases in length and subcloned (3). A total of 384 random clones were chosen from the subclone library and sequence reads of both ends of each fragment were obtained; fosmid vector sequence reads were screened out, and the remaining reads were assembled into a complete circular plastid genome as described in Jansen et al. (3). A 3 kilobase gap in fosmid clone coverage for Cuscuta exaltata was PCR amplified and sequenced separately on the Beckman Coulter CEQ8000™ following the standard procedures outlined by the manufacturer.
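The end-sequence directionality check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually used in this work: the function name, the representation of a BLASTn hit as a (position, strand) pair on the reference map, the 25 to 50 kilobase insert window, and the reference length used in the usage note are all hypothetical.

```python
def classify_clone(fwd_hit, rev_hit, genome_len,
                   min_insert=25_000, max_insert=50_000):
    """Classify a fosmid clone from the reference hits of its two end reads.

    Each hit is (position, strand) on a circular reference genome, with
    strand '+' or '-'. The insert-size window is an illustrative assumption.
    """
    (pos_a, strand_a), (pos_b, strand_b) = fwd_hit, rev_hit
    if strand_a == strand_b:
        # End reads of a collinear insert face each other, so they map to
        # opposite strands. Matching strands suggest an inversion with one
        # breakpoint inside the insert and one outside it.
        return "possible inversion"
    # The '+' read starts at the left edge of the spanned region and the
    # '-' read starts at the right edge.
    start, end = (pos_a, pos_b) if strand_a == "+" else (pos_b, pos_a)
    span = (end - start) % genome_len  # wrap around the circular map
    if min_insert <= span <= max_insert:
        return "collinear"
    return "unexpected span"
```

For example, with a hypothetical 155,000 bp reference, `classify_clone((150_000, "+"), (30_000, "-"), 155_000)` reports a collinear clone whose insert crosses the arbitrary origin of the circular map.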

RESULTS AND DISCUSSION

This method successfully produced plastid genomes for all three plants attempted, including both parasitic species. Macroarray probing resulted in clearly defined, positively hybridizing clones (Fig. 1). The combination of end sequencing and PCR analysis of clones provided a reliable inference of plastid genome coverage area for each clone. In all, 5 clones were necessary for coverage of Ipomoea purpurea, 4 for Cuscuta exaltata, and 3 for Cuscuta obtusiflora. Locations of clones are shown in Figure 2. The full plastid genome inverted repeat (IR) was only sequenced once in Cuscuta exaltata; no polymorphisms between the two IRs were detected in the other species.

Although this method was successfully implemented across these species, there were drastic differences in the overall percentages of clones that hybridized to the pooled plastid probes. At first look, the results were rather unexpected; Cuscuta exaltata is a much more chlorophyllous species with a much more conserved plastid genome than C. obtusiflora, yet over ten times as many clones positively hybridized for C. obtusiflora as for C. exaltata (Table 1). The reason for this result became clear once we examined flow cytometric nuclear genome size estimates of these species. Nuclear DNA content of C. exaltata was estimated to be over 25 times that of C. obtusiflora. Not surprisingly, larger amounts of nuclear DNA per cell resulted in a lower relative percentage of ptDNA in a total DNA isolation and, subsequently, a lower percentage of clone inserts in the fosmid libraries that were of plastid origin. Because the ratio of nuclear to ptDNA plays such a large role in determining how many clones must be screened to ensure that enough plastid clones are found, it follows that the tissue type sampled may also play a role. Plant tissues with a higher density of plastids would perhaps enhance the success rate of finding plastid clones. Age of tissue may also play an important role in determining its overall merit for this method, as plastid DNA concentration may decrease over the life of the cell (9). Interestingly, although estimates of nuclear genome size for Ipomoea and Cuscuta obtusiflora were very similar, the percentage of clones that were plastid in Ipomoea was over 3 times higher than in C. obtusiflora. Young leaf tissue was used for the original Ipomoea extraction, whereas stem-tip tissue was used for C. obtusiflora (leaves are reduced to minute scales in this genus). The difference in plastid clone percentage could be due to tissue type, as already mentioned, or due to an overall reduction in number of plastids and/or plastid DNA copy in the cells of the heterotrophic C. obtusiflora.
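The fold differences quoted in this paragraph follow directly from the counts and genome size estimates in Table 1. The short sketch below simply redoes that arithmetic; the dictionary layout and variable names are illustrative choices, and only the numbers come from Table 1.

```python
# Values copied from Table 1:
# (clones screened, positive hybridizations, nuclear genome size in pg/2C)
TABLE_1 = {
    "Ipomoea purpurea": (1536, 120, 1.51),
    "Cuscuta exaltata": (6144, 10, 41.86),
    "Cuscuta obtusiflora": (6144, 140, 1.59),
}

def positive_fraction(species):
    """Fraction of screened clones that hybridized to the plastid probes."""
    screened, positives, _ = TABLE_1[species]
    return positives / screened

# C. obtusiflora versus C. exaltata: plastid-positive clones
hyb_ratio = (positive_fraction("Cuscuta obtusiflora")
             / positive_fraction("Cuscuta exaltata"))

# C. exaltata versus C. obtusiflora: nuclear genome size
genome_ratio = TABLE_1["Cuscuta exaltata"][2] / TABLE_1["Cuscuta obtusiflora"][2]

# Ipomoea versus C. obtusiflora: plastid-positive clones
ipomoea_ratio = (positive_fraction("Ipomoea purpurea")
                 / positive_fraction("Cuscuta obtusiflora"))

print(f"{hyb_ratio:.1f}x, {genome_ratio:.1f}x, {ipomoea_ratio:.2f}x")
```

The three ratios come out at roughly 14, 26, and 3.4, consistent with the "over ten times", "over 25 times", and "over 3 times" statements in the text.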

Although this method worked well for the three plants discussed herein, there are some caveats that should be addressed. The ability to detect small organellar genomes is limited by the insert size of the library. The smallest genome that we sequenced was 85 kilobases, but plastid genomes smaller than 40 kilobases would not be included in the fosmid library and would be impossible to sequence using the techniques described above. In such cases, a solution would be to use a different type of library with smaller clone size.

This method also requires plastid probes spaced less than about 80 kilobases apart that can be hybridized against the library. Genomes for which insufficient PCR primers exist could be heterologously probed with genes amplified from a related species using hybridization conditions less stringent than those presented here. In addition, as long as at least one organellar clone is positively identified, its end sequences can be used to reprobe the library and "walk" around the genome in both directions. Highly rearranged genomes could also potentially cause problems with determining coverage across the entire plastid genome map. Although interpretation is complicated by the presence of fosmid vector ligated to the insert DNA, restriction mapping of clones could be used to confirm complete genome coverage. However, end sequencing and an increased number of internal PCR tests on each clone should, where possible, be sufficient in practically all cases.
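The probe-spacing requirement can be checked before a library is screened. The sketch below is a hedged illustration: the function names and the example probe coordinates are hypothetical, and only the approximate 80 kilobase limit is taken from the text.

```python
def largest_probe_gap(probe_positions, genome_len):
    """Largest distance between neighbouring probes on a circular genome map."""
    ordered = sorted(p % genome_len for p in probe_positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # Distance from the last probe, around the origin, back to the first one.
    gaps.append(genome_len - ordered[-1] + ordered[0])
    return max(gaps)

def probes_adequate(probe_positions, genome_len, max_gap=80_000):
    """True if no stretch of the circle between probes reaches max_gap."""
    return largest_probe_gap(probe_positions, genome_len) < max_gap
```

With hypothetical probes at 5, 40, 70, 110, and 140 kilobases on a 160 kilobase circle, the largest gap is 40 kilobases and the probe set passes; two probes only 10 kilobases apart on a 120 kilobase circle leave a 110 kilobase stretch unprobed and fail.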

One final caveat that we encountered in screening our clones is the possibility of false positive hybridizations resulting from lateral transfers of plastid DNA to either the mitochondrial or nuclear genome. Although most of the plated fosmid clones have inserts of nuclear origin, any transfer to the nuclear genome that exists in single copy form is highly unlikely to be detected using the number of clones we screened, as only a small proportion of the vast nuclear genome is represented. A transfer of plastid material to the mitochondrial genome is much more likely to be detected because, like the plastid genome, the mitochondrial genome exists in high copy number within each cell (10). We detected two clones whose inserts are suspected to be of mitochondrial origin. End sequence of a strongly hybridizing clone for Ipomoea gave BLASTn results very similar to regions of the Beta vulgaris mitochondrial genome (NC_002511) on both ends. One clone for Cuscuta exaltata reported plastid sequences as the best BLAST hit on both ends, and PCR tests showed it contained all of the plastid probes it should have as predicted by the end sequences. However, most of the genes in this clone were obvious pseudogenes, revealed as such by early stop codons or large truncations. Some pseudogenes were present in multiple copies, and many internal rearrangements existed for this clone, although the pseudogene sequences were not extremely divergent from the true gene sequences. Rapid structural change combined with slow mutation rates is characteristic of plant mitochondrial genomes (10), indicating this clone is probably a large fragment of plastid DNA that was transferred to the mitochondrial genome, where it has become nonfunctional. Transfers of this much genetic material from the plastid to the mitochondrion have not been reported before, but such a transfer is not completely unexpected given that in at least one strain of Arabidopsis thaliana, a nearly full copy of the mitochondrial genome is present on a nuclear chromosome (11).

Despite these caveats, this method is a proven, effective way to obtain complete plastid genomes from as little as 1 gram of plant tissue, even from those plants for which extracting purified ptDNA is impossible or which have undergone extensive rearrangements relative to other known plastid genomes. Unlike other methods that require large quantities of fresh tissue, simple, fast microextractions from small quantities of frozen or even silica gel dried plant material generally produce a sufficient quantity of DNA, with most high molecular weight fragments falling within the size range necessary for fosmid cloning. Even though the 8 kilobase fosmid vector accounts for 15 to 20 percent of the DNA that is sheared and shotgun sequenced, practically no finishing sequencing was necessary for the plastid genomes generated with this method; other shotgun sequencing methods, even with RCA ptDNA enrichment, rarely approach 80% plastid DNA efficiency (3). We found no evidence of heteroplasmy in the plastid genomes of the organisms used in this study; however, examination of multiple clones could provide insight into this phenomenon, including possible polymorphisms between the inverted repeats. Another benefit of this method over others is the ability to determine changes in orientation of the small single copy region of the plastid genome relative to the large single copy region. Recombination between inverted repeats could easily lead to a change in orientation of the small single copy region that would be impossible to detect via PCR or ptDNA shotgun sequencing. If fosmid clones are identified with inserts that extend from the large single copy region across an inverted repeat into the small single copy region, the relative orientations of the two single copy regions are easily determined.
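The 15 to 20 percent figure can be related to insert size with simple arithmetic. The sketch below assumes only the 8 kilobase vector size stated above; the function names are illustrative, and the insert sizes it returns are derived values, not measurements reported in this work.

```python
VECTOR_KB = 8.0  # fosmid vector size stated in the text

def vector_fraction(insert_kb, vector_kb=VECTOR_KB):
    """Fraction of a sheared clone prep that is vector rather than insert."""
    return vector_kb / (insert_kb + vector_kb)

def insert_for_fraction(fraction, vector_kb=VECTOR_KB):
    """Insert size (kb) at which the vector makes up the given fraction."""
    return vector_kb * (1.0 - fraction) / fraction

# Insert sizes implied by the quoted 20% and 15% vector proportions
print(insert_for_fraction(0.20), round(insert_for_fraction(0.15), 1))
```

Under this arithmetic, a vector share of 20 percent corresponds to a 32 kilobase insert and a share of 15 percent to an insert of about 45 kilobases.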

Although we used parasitic plant plastid genomes as an example of this method, it could easily be extended to other organellar genomes, including those of non-plants. Larger organellar genomes would require more probes to ensure no sections of the genome would be too far away from a probe on either side to be included in a positively hybridizing clone. For both mitochondrial and plastid genomes, Bacterial Artificial Chromosome (BAC) libraries could be used instead of fosmid libraries, so long as the library insert sizes were less than the overall size of the organellar genome. It would take fewer BAC clones than fosmid clones to cover an organellar genome, but BAC libraries are much more difficult to generate and, again, usually require sizable amounts of fresh material for DNA extraction (12). Finally, this method could be employed as a means to separate organellar DNA of organisms in close association, such as endophytes and endosymbiotic organisms and their hosts. As long as species-specific probes could be generated, organellar genomes could be readily attainable without contamination.

ACKNOWLEDGMENTS

The authors wish to thank Sheila Plock, Tim Chumley, and Xiaomu Wei for technical assistance, Tony Omeis and the Pennsylvania State University Greenhouse for assistance in growing plant material, John Carlson and Tei-hui Kao for use of pulsed-field gel equipment, K. Arumuganathan for flow cytometric nuclear genome size estimations, David Geiser, Steve Schaeffer, and Andy Stephenson for critical review of the manuscript, and Jennifer Kuehl and Jeffery Boore of the Joint Genome Institute for sequencing results and assemblies.

COMPETING INTERESTS STATEMENT

The authors declare no competing interests.

REFERENCES

1. Savolainen, V., M.W. Chase, N. Salamin, D.E. Soltis, P.S. Soltis, A.J. Lopez, O. Fedrigo and G.J.P. Naylor. 2002. Phylogeny reconstruction and functional constraints in organellar genomes: Plastid atpB and rbcL sequences versus animal mitochondrion. Systematic Biology 51:638-647.

2. Shinozaki, K., M. Ohme, M. Tanaka, T. Wakasugi, N. Hayashida, T. Matsubayashi, N. Zaita, J. Chunwongse, J. Obokata, K. Yamaguchi-Shinozaki, C. Ohto, K. Torazawa, B.Y. Meng, M. Sugita, H. Deno, T. Kamogashira, K. Yamada, J. Kusuda, F. Takaiwa, A. Kato, N. Tohdoh, H. Shimada and M. Sugiura. 1986. The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO Journal 5:2043-2049.

3. Jansen, R.K., L.A. Raubeson, J.L. Boore, C.W. dePamphilis, T.W. Chumley, R.C. Haberle, S.K. Wyman, A.J. Alverson, R. Peery, S.J. Herman, H.M. Fourcade, J.V. Kuehl, J.R. McNeal, J. Leebens-Mack and L. Cui. 2005. Methods for obtaining and analyzing whole chloroplast genome sequences. Methods in Enzymology (in press).

4. Goremykin, V.V., K.I. Hirsch-Ernst, S. Wolfl and F.H. Hellwig. 2003. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm. Molecular Biology and Evolution 20:1499-1505.

5. Wolfe, K.H., C.W. Morden and J.D. Palmer. 1992. Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proceedings of the National Academy of Sciences of the United States of America 89:10648-10652.

6. Doyle, J.J. and J.L. Doyle. 1990. Isolation of plant DNA from fresh tissue. Focus 12:13-15.

7. Sambrook, J., E.F. Fritsch and T. Maniatis. 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, New York.

8. Altschul, S.F., W. Gish, W. Miller, E.W. Myers and D.J. Lipman. 1990. Basic local alignment search tool. Journal of Molecular Biology 215:403-410.

9. Rowan, B.A., D.J. Oldenburg and A.J. Bendich. 2004. The demise of chloroplast DNA in Arabidopsis. Current Genetics 46:176-181.

10. Palmer, J.D. and L.A. Herbon. 1989. Plant mitochondrial DNA evolves rapidly in structure, but slowly in sequence. Journal of Molecular Evolution 28:87-97.

11. Lin, X.Y., S.S. Kaul, S. Rounsley, T.P. Shea, M.I. Benito, C.D. Town, C.Y. Fujii, T. Mason, C.L. Bowman, M. Barnstead, T.V. Feldblyum, C.R. Buell, K.A. Ketchum, J. Lee, C.M. Ronning, H.L. Koo, K.S. Moffat, L.A. Cronin, M. Shen, G. Pai, S. Van Aken, L. Umayam, L.J. Tallon, J.E. Gill, M.D. Adams, A.J. Carrera, T.H. Creasy, H.M. Goodman, C.R. Somerville, G.P. Copenhaver, D. Preuss, W.C. Nierman, O. White, J.A. Eisen, S.L. Salzberg, C.M. Fraser and J.C. Venter. 1999. Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana. Nature 402:761-768.

12. Chalhoub, B., H. Belcram and M. Caboche. 2004. Efficient cloning of plant genomes into bacterial artificial chromosome (BAC) libraries with larger and more uniform insert size. Plant Biotechnology Journal 2:181-188.

Figure 1. Macroarray screen of fosmid clones using pooled plastid probes. Eight plates, each containing 384 clone cultures from a partial genomic fosmid library of Cuscuta obtusiflora, were spotted onto the filter in a known pattern. Squares on the grid are labeled along the outer edge corresponding to the 384 wells of the plates. Each grid square contains clones corresponding to that well from all 8 plates, and each clone is replicated twice within the square in a particular pattern unique to each of the eight plates (shown below the grid). In total, 6144 spots representing 3072 unique clones were screened in this particular image, of which approximately 66 positively hybridized to the plastid probes. Six clones from plate 3 (wells C8, D14, F4, F5, and N5, shown with emboldened borders) were randomly chosen for end sequencing and internal PCR testing to determine what portion of the plastid genome they covered.

Figure 2. Map of end-sequenced clone coverage on plastid genomes. Both ends of selected clones were sequenced to determine relative coverage of the plastid genome. Sequence strand-directionality and internal PCR assays for a variety of plastid genes were also used to identify any genome rearrangements that may have occurred and could possibly confuse mapping. Minimal subsets of clones necessary for complete coverage were used for shotgun sequencing and are shown as solid arcs. Clone labels consist of a species identifier and the 384-well plate number before the period, followed by the well location after the period. End-sequenced clones not used for shotgun sequencing are shown as dashed arcs. Relative locations of the gene probes used for hybridization are marked on the circular genome map, with underlined gene labels for each probe inside the circles. Genome maps are drawn to scale relative to one another.

Table 1. Number of clones screened and identified for each species

Species             # of clones screened   # of positive hybridizations   Percent positives   Nuclear genome size
Ipomoea purpurea    1536                   120                            7.81%               1.51 pg/2C
Cuscuta exaltata    6144                   10                             0.16%               41.86 pg/2C
C. obtusiflora      6144                   140                            2.28%               1.59 pg/2C

Figure 1 [image not reproduced: macroarray filter grid with columns 1-24 and rows A-P, and the spotting-pattern key for plates 1-8]

Figure 2 [image not reproduced]

Chapter 3: Formatted for submission to Nature

Disappearance of introns promotes adaptive change and loss of a highly conserved maturase

Joel R. McNeal and Claude W. dePamphilis

Department of Biology, Huck Institutes of Life Sciences, and Institute of Molecular Evolutionary Genetics, The Pennsylvania State University, University Park, PA, 16802-5301

Abstract

With few exceptions, plastid genome content and arrangement are highly conserved across land plants and their closest algal relatives (family Characeae), with most introns likely having been acquired in the common ancestor of these two lineages1, together known as Streptophytes. The intron within the transfer RNA trnK-UUU contains a large open reading frame that encodes the presumed intron maturase, matK. This maturase gene is found in all streptophyte plastid genomes sequenced to date including that of the nonphotosynthetic parasitic plant Epifagus virginiana (Orobanchaceae), which possesses only four protein-coding genes not involved in or translation2. Here we report the first known loss of this gene from the plastid genome and present evidence that corroborates its involvement in splicing the full set of group IIA introns3 by examining matK and intron distribution in the parasitic plant genus Cuscuta (Convolvulaceae). Furthermore, we show that loss of most group IIA introns from the plastid genome results in substantial change in selective pressure within the hypothetical RNA-binding domain X of matK in both Cuscuta and Epifagus.

Unlike introns found in nuclear genes of eukaryotes, introns in organellar genomes don't rely on spliceosomes for excision from transcripts. They are often regarded as "self-splicing", although most, if not all, still require other trans-acting factors for efficient splicing in vivo4. Although matK has been shown experimentally to be an essential factor in the splicing of the trnK intron within which it is contained5, its involvement in the splicing of other plastid introns isn't yet fully understood, as it also interacts with the group IIB intron within plastid trnG-UCC6. Although the plastid genome of the nonphotosynthetic parasitic plant Epifagus lacks functional trnK and trnG transfer RNA genes, the trnK pseudogene retains a complete open reading frame for matK that appears to be evolving under selective constraint7, indicating matK may be essential for other functions beyond splicing the trnK intron in that species.

Various studies together have shown that in barley ribosome-deficient plastid mutants such as albostrians, the full set of seven group IIA introns in the plastid genome remains in an unspliced transcript form, while group IIB introns are largely unaffected and, instead, seem to rely upon a nuclear-encoded factor, crs2 in maize, for splicing3,8,9. Splicing of the only group I intron in the plastid genome, found within the trnL-UAA locus, is unaffected by any of these factors, as is splicing of the second of two group IIB introns found within ycf3 (ref. 9).

Like Epifagus, members of the genus Cuscuta are parasitic plants that have undergone substantial gene loss from their plastomes. However, at least some members of the genus retain a substantial portion of their photosynthetic genes and probably photosynthesize10, albeit in a localized form less crucial to the parasites' survival than to the survival of fully autotrophic plants11. Loss of three group IIA introns from the genomes of various Cuscuta species has been previously reported12,13, with at least the intron found within the 3' locus of the trans-spliced rps12 gene being polymorphic in its occurrence within the genus14. Using assays that gave clear positive or negative results (Fig. 1), we surveyed for the presence of matK along with all group IIA introns, trnG-UCC, and introns within ycf3 from a variety of Cuscuta species representing all three currently recognized subgenera (Table 1). In cases of tRNA (transfer-RNA) introns, we used sequence reads to confirm presence or absence of the gene and intron, as tRNA exons are generally shorter than 40 nucleotides in length.

Although the trnK gene itself is defunct across all Cuscuta species, all sampled members of subgenera Monogyna and Cuscuta retain an open reading frame for matK at the locus, identical to the situation reported for Epifagus. By contrast, all members of subgenus Grammica in this study lack matK, the first such instance reported in any lineage of Streptophyte. This loss correlates perfectly with the loss of all group IIA introns from the plastid genome, indicating that the major role of matK is nonexistent without introns of this class. However, members of subgenus Grammica still possess four group IIB introns and the trnL-UAA group I intron within otherwise normal genes, implying that resident plastid matK is not necessary for the splicing of these introns in Cuscuta.

Loss of tRNA genes is a common phenomenon in the plastid genomes of parasitic plants15-17, and Epifagus has also lost its atpF gene and its group IIA intron along with all other photosynthetic and photorespiratory genes2. Although sampled members of subgenus Grammica parallel Epifagus in losing all group IIA intron-containing tRNAs, atpF and rps12 remain intact in Cuscuta despite precise intron losses from these genes. The group IIB introns in ycf3 are also precisely lost from subgenus Grammica and Cuscuta nitida (Table 1), indicating a mechanism for intron loss that isn't limited to group IIA introns in Cuscuta. Intron losses from plastid genes are not unprecedented in land plants12,18,19, but they are far from common. Independent loss of introns from four different genes in Cuscuta suggests these plants are more prone to purge these elements from their genomes, perhaps through an increase in homologous recombination events involving processed retrotranscripts.

All species of Cuscuta that still possess matK also possess at least four group IIA introns with the exception of Cuscuta nitida, which retains only the 3' rps12 intron (Table 1, Fig. 2b). The open reading frame of matK was partially or fully sequenced for five species in Cuscuta subgenus Monogyna, three species from subgenus Cuscuta, and four species from the otherwise autotrophic family within which they are derived, Convolvulaceae (Morning Glory Family). Using outgroup sequences from available plastid genomes, a well-supported phylogeny was constructed that agrees fully with published relationships within Convolvulaceae20 (Fig. 2a). Ipomoea (tribe Convolvuleae) was strongly supported as sister to Cuscuta, although other hypotheses at this node could not be rejected in another study21. Because our taxon sampling outside of Cuscuta is sparse, we conservatively chose to collapse this node as a polytomy for analyses of selective constraint. All branches were found to be evolving under apparent selective constraint, with the inferred nucleotide distance at synonymous sites (dS) being greater than that at nonsynonymous sites (dN). One portion of matK, referred to as domain X, has been identified as the putative RNA binding domain of the protein22. If dN/dS is constrained globally, domain X appears to be evolving under stronger purifying selection (dN/dS = 0.3033) than the remainder of the gene (dN/dS = 0.5340). All sampled species also contained an amino acid consensus motif within domain X (SX3-6TLAXKXK) conserved across streptophytes23, further suggesting that matK remains functional across all Cuscuta that still possess it.

To detect changes in selective constraint within domain X that may be correlated with loss of group II introns, we performed Likelihood Ratio Tests (LRTs) by comparing the tree in Fig. 2a constrained with a global dN/dS against trees with dN/dS free to vary at individual branches where intron losses have occurred. None of the LRTs for these branches was significant, with one exception (Table 2). When the branch leading to C. nitida is left unconstrained, a significantly better likelihood estimate is attained for the tree (p=0.007). By contrast, releasing dN/dS for the same branch for the remainder of matK had little effect on improving the likelihood of the constrained tree (p=0.721). These results indicate that loss of three of the final four group IIA introns in C. nitida has resulted in relaxed, or even positive, selection specifically on domain X, as rates of nonsynonymous substitution actually exceed synonymous rates for this branch in the unconstrained tree (dN/dS = 1.283). This is a dramatic reversal from the trend across the remainder of the tree, where domain X is under much higher levels of constraint than the rest of the gene and where multiple group IIA introns are still present. It is possible that constraint on domain X to remain a "generalist" for group IIA intron binding has been released on the branch leading to Cuscuta nitida, and matK may have subsequently adapted to specialize on the 3' rps12 intron.
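The form of a likelihood ratio test of this kind can be sketched as follows. This is a minimal illustration, not the HYPHY analysis itself: it assumes that freeing dN/dS on a single branch adds exactly one parameter (one degree of freedom), and the log-likelihood values in the usage note are hypothetical, since the likelihood scores are not given in this text.

```python
import math

def lrt_p_value(lnl_constrained, lnl_free):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (test statistic, p-value). The statistic 2 * (lnL_free -
    lnL_constrained) is compared with a chi-square distribution with one
    degree of freedom, whose upper tail equals erfc(sqrt(x / 2)).
    """
    statistic = max(2.0 * (lnl_free - lnl_constrained), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

As a usage example with made-up scores, `lrt_p_value(-1000.0, -998.0)` gives a statistic of 4.0 and a p-value just under 0.05. Under the same one-degree-of-freedom assumption, a p-value near 0.007 corresponds to a statistic of roughly 7.3, that is, a log-likelihood improvement of about 3.6 units.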

Interestingly, in Epifagus one of only two remaining plastid group IIA introns is the 3' rps12 intron; the second is an intron in rpl2, which was presumably lost in the ancestor of all Convolvulaceae before the evolution of Cuscuta24. Because Epifagus retains only one additional intron relative to Cuscuta nitida and both introns are found in ribosomal protein genes, we performed LRT analyses to determine whether matK in Orobanchaceae may also be evolving under positive selection using available published sequences. A second nonphotosynthetic relative (Orobanche fasciculata), a photosynthetic parasite (Castilleja linariifolia), and a fully autotrophic sister-group to the parasites (Lindenbergia philippinensis) were included in the study, and the same outgroups were used as for the Convolvulaceae tests. The phylogeny obtained for these species (Fig. 2b) was congruent with published relationships25. As was the case with the Cuscuta/Convolvulaceae result, global dN/dS as estimated from the fully constrained trees was lower in domain X than for the rest of the gene. The only branch producing significantly better likelihood given un-enforced dN/dS was the terminal branch leading to Epifagus for the domain X partition (p=0.007). Just as in Cuscuta nitida, dN actually exceeds dS (dN/dS = 1.078), yet outside of domain X, unconstrained dN/dS at this branch does not significantly affect the likelihood estimate (Table 2B), again indicating a shift from strongly purifying to possibly adaptive selection within domain X. Unlike in the case of Cuscuta, we don't have full knowledge of group IIA intron distribution in Orobanchaceae, but given the strikingly parallel result for Cuscuta nitida, it is likely that the reduction in number of group IIA introns has exacerbated the increase in dN/dS for domain X in Epifagus as well. Besides sharing a paucity of group IIA introns, Cuscuta nitida and Epifagus share the trait of having lost all group II introns from tRNA genes and from atpF, which may suggest that the change in selective pressure observed on these two branches is influenced by evolution specifically for the group IIA introns in ribosomal protein genes. Unlike in Cuscuta, only full gene loss, not intron loss from intact genes, has ever been reported from Orobanchaceae.

Since its invasion of the plastid genome over 450 million years ago inside of a group IIA intron, matK has assumed the role of both a cis- and trans-splicing element in the plastid genome. It is likely that tRNA loss facilitated by parasitism, combined with a predisposition for intron loss, has allowed a lineage within the parasitic plant genus Cuscuta to lose the plastid matK gene, which is necessary for functional in all other streptophytes studied to date.

Methods

PCR and Sequencing

PCR primers were specifically chosen for ease of interpretation on 1% agarose gels stained with ethidium bromide (see Fig. 1). PCRs for matK and plastid introns were conducted using a combination of published25-28 and newly designed primer sequences (atpF-F 5'-ATGAAMRACGTAACCKATT-3', atpF-R 5'-CTCTTTGTAAGGYTTGTTG-3', ycf3-F 5'-TCAGGAGAAAAAGAGGCATT-3', ycf3-R 5'-GCAATTTCAGAATCTCCCTGTTG-3', rrn16-endF 5'-GTGAAGTCGTAACAAGGTAGCCG-3', rrn23-R1 5'-CGTCTCTGGGTGCCTAGGTATCC-3', trnKConv-endF 5'-CACTATGTATCATTTGATAACCC-3', matKConv-54F 5'-CCTATATCCACTTMTCTTTCAGGAG-3', matKConv-783F 5'-GTYTTTGYTAAGGATTTTMAGG-3', matKConv-801F 5'-GGCCAACCTAGGCTTGCTCAAGG-3', matKConv-882R 5'-TTGAAGCCAGAAKKGATTTTCC-3', matKConv-1339R 5'-AGTTCKAGCRCAAGAAAG-3', matKConv-1423R 5'-GTTCTTCCGACGTWAAGAATTCTTC-3', matKConv-1450F 5'-TTTRTATCRAATAAAGTATATAC-3', trnKsubgM-F1 5'-GGGCGAGTATAAAGAGAGAGGG-3', matKsubgM-2R 5'-CGTTCAATAATATCAGAATCT-3', matKsubgM-3F 5'-CGCGCTTTTTTACAAAGCTTGGG-3', matKsubgM-ex3R 5'-CCCAAGCTTTGTAAAAAAGCGCG-3', matKsubgM-ex4F 5'-ATCTCAGAATTTACGATCAATTC-3', matKsubgM-ex5R 5'-TGTAGAAAGAATTGTAATAAATG-3', matKsubgM-ex6R 5'-CGAAGCGTCTTGTACCCAGACCG-3', matKsubgC-R1 5'-GAATCTGAKAARTCGGYCCAACC-3', 5'-CAMGATTTCCARATGAGGGGGG-3'). matK primers designed using sequence from subgenus Monogyna are designated by the suffix subgM, ones designed using subgenus Cuscuta sequences by subgC, and ones designed with Convolvulaceae sequences by Conv. Sequencing was performed on a Beckman-Coulter CEQ8000 system according to the manufacturer's protocol and at the Pennsylvania State University Nucleic Acids Facility. GenBank accession numbers and voucher numbers for sequences generated by and used for this study are shown in Table 3 (Supplemental Info). Complete plastid genome sequences of Cuscuta obtusiflora, Cuscuta exaltata, and Ipomoea purpurea were used to assess presence of other plastid introns not directly involved in this study, to eliminate the possibility of gene transpositions in cases of PCR-detected intron loss, and to verify the presence of only the expected loci for genes examined in this study.
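Several of the primers listed above contain IUPAC ambiguity codes (for example M, R, K, Y, and W), so each represents a pool of sequences. The sketch below is an illustrative helper, not part of the original methods: the function names are hypothetical, and it only expands the standard IUPAC codes and computes reverse complements for sequences copied from the list above.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each code (R pairs with Y, M with K, B with V, D with H)
COMPLEMENT = str.maketrans("ACGTRYMKSWBDHVN", "TGCAYRKMSWVHDBN")

def reverse_complement(seq):
    """Reverse complement of a primer, preserving ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]

def degeneracy(seq):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in seq:
        count *= len(IUPAC[base])
    return count

def expand(seq):
    """List every unambiguous sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]

print(degeneracy("ATGAAMRACGTAACCKATT"))                # atpF-F
print(reverse_complement("CGCGCTTTTTTACAAAGCTTGGG"))    # matKsubgM-3F
```

Applied to the list above, atpF-F expands to eight sequences, and the reverse complement of matKsubgM-3F is identical to matKsubgM-ex3R, so those two primers anneal to the same site on opposite strands.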

Data Analyses

Initial matK phylogenies were constructed in PAUP* 4.0b10 (ref. 29) using Maximum Likelihood, GTR + gamma method with parameters estimated from the data. Likelihood Ratio Tests and nonsynonymous/synonymous rate calculations were done using HYPHY .99b (ref. 30) under the MG96 x HKY 3x4 codon model.

References
1. Turmel, M., Otis, C. & Lemieux, C. The chloroplast and mitochondrial genome sequences of the charophyte Chaetosphaeridium globosum: Insights into the timing of the events that restructured organelle DNAs within the green algal lineage that led to land plants. Proceedings of the National Academy of Sciences of the United States of America 99, 11275-11280 (2002).
2. Wolfe, K. H., Morden, C. W. & Palmer, J. D. Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proceedings of the National Academy of Sciences of the United States of America 89, 10648-10652 (1992).
3. Vogel, J., Borner, T. & Hess, W. R. Comparative analysis of splicing of the complete set of chloroplast group II introns in three higher plant mutants. Nucleic Acids Research 27, 3866-3874 (1999).
4. Lambowitz, A. M. & Perlman, P. S. Involvement of aminoacyl-transfer RNA-synthetases and other proteins in group-I and group-II intron splicing. Trends in Biochemical Sciences 15, 440-444 (1990).
5. Vogel, J., Hubschmann, T., Borner, T. & Hess, W. R. Splicing and intron-internal RNA editing of trnK-matK transcripts in barley plastids: Support for matK as an essential splice factor. Journal of Molecular Biology 270, 179-187 (1997).
6. Liere, K. & Link, G. RNA-binding activity of the matK protein encoded by the chloroplast trnK intron from Mustard (Sinapis alba L.). Nucleic Acids Research 23, 917-921 (1995).
7. Young, N. D. & dePamphilis, C. W. Purifying selection detected in the plastid gene matK and flanking regions within a group II intron of nonphotosynthetic plants. Molecular Biology and Evolution 17, 1933-1941 (2000).
8. Hubschmann, T., Hess, W. R. & Borner, T. Impaired splicing of the rps12 transcript in ribosome-deficient plastids. Plant Molecular Biology 30, 109-123 (1996).
9. Jenkins, B. D., Kulhanek, D. J. & Barkan, A. Nuclear mutations that block group II RNA splicing in maize reveal several intron classes with distinct requirements for splicing factors. Plant Cell 9, 283-296 (1997).
10. Haberhausen, G., Valentin, K. & Zetsche, K. Organization and sequence of photosynthetic genes from the plastid genome of the holoparasitic Cuscuta reflexa. Molecular & General Genetics 232, 154-161 (1992).
11. Hibberd, J. M. et al. Localization of photosynthetic metabolism in the parasitic angiosperm Cuscuta reflexa. Planta 205, 506-513 (1998).
12. Downie, S. R. et al. Six independent losses of the chloroplast DNA rpl2 intron in Dicotyledons - Molecular and phylogenetic implications. Evolution 45, 1245-1259 (1991).
13. Bommer, D., Haberhausen, G. & Zetsche, K. A large deletion in the plastid DNA of the holoparasitic flowering plant Cuscuta reflexa concerning two ribosomal-proteins (rpl2, rpl23), one transfer-RNA (trnI) and an orf2280 homolog. Current Genetics 24, 171-176 (1993).
14. Freyer, R., Neckermann, K., Maier, R. M. & Kossel, H. Structural and functional-analysis of plastid genomes from parasitic plants - Loss of an intron within the genus Cuscuta. Current Genetics 27, 580-586 (1995).

15. Taylor, G. W., Wolfe, K. H., Morden, C. W., dePamphilis, C. W. & Palmer, J. D. Lack of a functional plastid transfer RNA(Cys) gene is associated with loss of photosynthesis in a lineage of parasitic plants. Current Genetics 20, 515-518 (1991).
16. Wimpee, C. F., Morgan, R. & Wrobel, R. L. Loss of transfer-RNA genes from the plastid 16S-23S ribosomal-RNA gene spacer in a parasitic plant. Current Genetics 21, 417-422 (1992).
17. Lohan, A. J. & Wolfe, K. H. A subset of conserved tRNA genes in plastid DNA of nongreen plants. Genetics 150, 425-433 (1998).
18. Downie, S. R., Llanas, E. & KatzDownie, D. S. Multiple independent losses of the rpoC1 intron in angiosperm chloroplast DNA's. Systematic Botany 21, 135-151 (1996).
19. McPherson, M. A., Fay, M. E., Chase, M. W. & Graham, S. W. Parallel loss of a slowly evolving intron from two closely related families in Asparagales. Systematic Botany 29, 296-307 (2004).
20. Neyland, R. A phylogeny inferred from large ribosomal subunit (26S) rDNA sequences suggests that Cuscuta is a derived member of Convolvulaceae. Brittonia 53, 108-115 (2001).
21. Stefanovic, S. & Olmstead, R. G. Testing the phylogenetic position of a parasitic plant (Cuscuta, Convolvulaceae, Asteridae): Bayesian inference and the parametric bootstrap on data drawn from three genomes. Systematic Biology 53, 384-399 (2004).
22. Mohr, G., Perlman, P. S. & Lambowitz, A. M. Evolutionary relationships among group II intron-encoded proteins and identification of a conserved domain that may be related to maturase function. Nucleic Acids Research 21, 4991-4997 (1993).
23. Sanders, E. R., Karol, K. G. & McCourt, R. M. Occurrence of matK in a trnK group II intron in charophyte green algae and phylogeny of the Characeae. American Journal of Botany 90, 628-633 (2003).
24. Stefanovic, S., Krueger, L. & Olmstead, R. G. Monophyly of the Convolvulaceae and circumscription of their major lineages based on DNA sequences of multiple chloroplast loci. American Journal of Botany 89, 1510-1522 (2002).
25. Young, N. D., Steiner, K. E. & dePamphilis, C. W. The evolution of parasitism in Scrophulariaceae/Orobanchaceae: Plastid gene sequences refute an evolutionary transition series. Annals of the Missouri Botanical Garden 86, 876-893 (1999).
26. Demesure, B., Sodzi, N. & Petit, R. J. A set of universal primers for amplification of polymorphic noncoding regions of mitochondrial and chloroplast DNA in plants. Molecular Ecology 4, 129-131 (1995).
27. Dumolin-Lapegue, S., Pemonge, M. H. & Petit, R. J. An enlarged set of consensus primers for the study of organelle DNA in plants. Molecular Ecology 6, 393-397 (1997).
28. Nickrent, D. L., Yan, O. Y., Duff, R. J. & dePamphilis, C. W. Do nonasterid holoparasitic flowering plants have plastid genomes? Plant Molecular Biology 34, 717-729 (1997).
29. Swofford, D. L. PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), version 4.0b10 (Sinauer Associates, Sunderland, MA, 2002).

30. Kosakovsky-Pond, S. L., Frost, S. D. W. & Muse, S. V. HyPhy: hypothesis testing using phylogenies. Bioinformatics, bti079 (2004).
31. Rice, W. R. Analyzing tables of statistical tests. Evolution 43, 223-225 (1989).

Acknowledgements We would like to thank Jim Leebens-Mack, Dave Geiser, Steve Schaeffer, and Andy Stephenson for critical review of the manuscript. George Yatskievych, Daniel Austin, Julian Hibberd, Andreas Fleischmann, Peter Endress, Kim Steiner, Greg Jordan, and Todd Barkman assisted in providing plant material used in this study.

Competing interests statement The authors declare that they have no competing financial interests.

Correspondence and requests for material should be addressed to J.R.M. ([email protected]).

Table 1 Intron distribution in Cuscuta, Ipomoea, Nicotiana, and Epifagus

                                      Group IIA                                                             Group IIB
Taxon                Classification   trnK-UUU  atpF      *trnV-UAC rpl2      3'rps12   trnI-GAU  trnA-UGC  trnG-UCC  ycf3 (both)
Cuscuta exaltata     Subg. Monogyna   x         +         +/x       -         +         +         +         +         +
C. reflexa           Subg. Monogyna   x         +         +/x       -         +         +         +         +         +
C. japonica          Subg. Monogyna   x         +         +/x       -         +         +         +         +         +
C. lupuliformis      Subg. Monogyna   x         +         +/x       -         +         +         +         +         +
C. europaea          Subg. Cuscuta    x         +         x         -         +         +         +         x         +
C. epilinum          Subg. Cuscuta    x         +         x         -         +         +         +         x         +
C. nitida            Subg. Cuscuta    x         -         x         -         +         x         x         x         -
C. indecora          Subg. Grammica   x         -         x         -         -         x         x         x         -
C. umbellata         Subg. Grammica   x         -         x         -         -         x         x         x         -
C. tasmanica         Subg. Grammica   x         -         x         -         -         x         x         x         -
C. rostrata          Subg. Grammica   x         -         x         -         -         x         x         x         -
C. obtusiflora       Subg. Grammica   x         -         x         -         -         x         x         x         -
Ipomoea purpurea     Convolvulaceae   +         +         +         -         +         +         +         +         +
Nicotiana tabacum    Solanaceae       +         +         +         +         +         +         +         +         +
Epifagus virginiana  Orobanchaceae    x         x         x         +         +         x         x         x         x

x = nonfunctional or missing gene; - = intron loss; + = intron(s) present
*trnV introns in Cuscuta subg. Monogyna have deletions that may render them pseudogenes

Table 2 Levels of selection on important evolutionary branches in matK

                                                                  Domain X           Remainder of gene
Clade or species represented    Group IIA introns lost            dN/dS    p-value   dN/dS    p-value
at branch terminal              along branch

Tree A (from Fig. 2a) Convolvulaceae
Cuscuta nitida                  trnA, trnI, atpF                  1.282    0.007*    0.504    0.721
Subgenus Cuscuta                trnV                              0.282    0.692     0.471    0.732
All Cuscuta                     trnK                              0.000    0.345     0.738    0.200
All Convolvulaceae              rpl2                              0.260    0.781     0.575    0.862

Tree B (from Fig. 2b) Orobanchaceae
Epifagus virginiana             (all but rps12 and rpl2 lost)     1.078    0.007*    0.797    0.097
Orobanche fasciculata           ?                                 0.358    0.653     0.696    0.364
Nonphotosynthetic clade         ?                                 0.365    0.934     0.556    0.500
All parasites                   ?                                 0.000    0.172     0.363    0.546

*Significantly different in LRT when all other branches are constrained to a globally estimated dN/dS. The p-value threshold of < 0.05 becomes < 0.0125 with Bonferroni correction for 4 tests (ref. 31).
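The significance call in the Table 2 footnote can be illustrated with a minimal sketch. It assumes a likelihood ratio test with one extra free parameter (the dN/dS of a single branch), so the test statistic is compared against a chi-square distribution with one degree of freedom. The function names are hypothetical and are not part of HyPhy; only the p-values are taken from Table 2 (Tree A, Domain X).

```python
import math

def lrt_pvalue(lnl_null, lnl_alt):
    """P-value of a likelihood ratio test with one extra free parameter (1 d.f.).

    lnl_null: log-likelihood with dN/dS constrained to the global estimate.
    lnl_alt:  log-likelihood with dN/dS free on the branch of interest.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Chi-square survival function for 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2.0))

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# Four branches were tested per tree, so 0.05 becomes 0.0125.
threshold = bonferroni_threshold(0.05, 4)

# Domain X p-values for Tree A, copied from Table 2.
tree_a_domain_x = {
    "Cuscuta nitida": 0.007,
    "Subgenus Cuscuta": 0.692,
    "All Cuscuta": 0.345,
    "All Convolvulaceae": 0.781,
}
significant = [name for name, p in tree_a_domain_x.items() if p < threshold]
print(threshold, significant)
```

Under these assumptions only the Cuscuta nitida branch remains significant after correction, matching the asterisk in the table.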

Figure 1 Results of PCR assays for presence or absence of two group IIB introns contained in ycf3 (a) and a group IIA intron in 3' rps12 (b). Taxa included on the above gels are: Cuscuta japonica (1) and C. reflexa (2) representing subgenus Monogyna; C. nitida (3), C. europaea (4), and C. epilinum (5) of subgenus Cuscuta; and C. indecora (6), C. umbellata (7), C. tasmanica (8), and C. rostrata (9) of subgenus Grammica. Full results of all intron assays are shown in Table 1.

Figure 2 Phylogenies of Convolvulaceae (a) and Orobanchaceae (b) inferred from nucleotide sequence of full and partial matK sequences. a, Maximum Likelihood bootstrap values (100 replications) are shown at the nodes (bootstraps of 100 are denoted by asterisks). Taxonomic delimitations of Cuscuta subgenera and Convolvulaceae are boxed and labeled. Group IIA intron losses are mapped on branches where they are inferred to have occurred. b, Clades within Orobanchaceae (nonphotosynthetic lineage and parasitic lineage) used in LRT analyses presented in Table 2 are boxed and labeled.

[Figure 1: gel images, lanes 1-9. Panel a size markers: 1,806 bp and 308 bp. Panel b size markers: 1,301 bp and 762 bp.]

[Figure 2: phylogenetic trees. Panel a taxa: C. nitida, C. europaea, C. epilinum, C. lupuliformis, C. japonica, C. reflexa, C. exaltata, Ipomoea purpurea, Jacquemontia tamnifolia, Dichondra carolinensis, Humbertia madagascariensis, Nicotiana tabacum, Atropa belladonna, Panax ginseng, Spinacia oleracea; intron losses labeled on branches: trnA, trnI, atpF, trnV, trnK, rpl2. Panel b taxa: Epifagus virginiana, Orobanche fasciculata, Castilleja linariifolia, Lindenbergia philippinensis, Nicotiana tabacum, Atropa belladonna, Panax ginseng, Spinacia oleracea. Scale bars: 0.05 substitutions / site.]

Table 3 Voucher information and Genbank accession numbers

Species                        Voucher #            Genbank accession
Cuscuta exaltata               *                    XXXXXX
C. reflexa                     #                    XXXXXX
C. japonica                    #                    XXXXXX
C. lupuliformis                (PAC) JRM03.0808     XXXXXX
C. europaea                    (PAC) JRM03.1101     XXXXXX
C. epilinum                    (PAC) JRM03.1210a    XXXXXX
C. nitida                      *                    XXXXXX
C. indecora                    (PAC) JRM03.1103     XXXXXX
C. umbellata                   *                    XXXXXX
C. tasmanica                   *                    XXXXXX
C. rostrata                    (PAC) JRM03.1001     XXXXXX
C. obtusiflora                 (PAC) JRM03.0207     XXXXXX
Ipomoea purpurea               (PAC) JRM03.1203     XXXXXX
Jacquemontia tamnifolia        (MO) 00883399        XXXXXX
Dichondra carolinensis         #                    XXXXXX
Humbertia madagascariensis     (MO) 3854462         XXXXXX
Nicotiana tabacum              N/A                  NC001879
Atropa belladonna              N/A                  NC004561
Epifagus virginiana            N/A                  NC001568
Orobanche fasciculata          N/A                  AF051990
Castilleja linariifolia        N/A                  AF051981
Lindenbergia philippinensis    N/A                  AF051994
Panax ginseng                  N/A                  NC006290
Spinacia oleracea              N/A                  NC002202

bold = Obtained from Genbank

Chapter 4: Formatted for submission to Molecular Biology and Evolution

Complete Plastid Genome Sequences Suggest Strong Selection for Retention of Photosynthetic Genes in the Parasitic Plant Genus Cuscuta.

Joel R. McNeal,* Jennifer Kuehl,† Jeffery L. Boore,† and Claude W. dePamphilis*

*Department of Biology, Huck Institutes of Life Sciences, and Institute of Molecular Evolutionary Genetics, The Pennsylvania State University, University Park; †DOE Joint Genome Institute, Walnut Creek, California

Key Words: Ipomoea, Cuscuta, parasitic plants, plastid genome, chloroplast, photosynthesis

E-mail: [email protected]

ABSTRACT

Although only a small proportion of the genes necessary for photosynthesis are transcribed within the chloroplast, plastid genome content and protein sequence are, in general, highly conserved throughout all land plants and their closest algal relatives. Parasitic plants, which obtain some or all of their nutrition through an attachment to a host plant, are often a striking exception. With heterotrophy comes an apparent relaxation of constraint on genes in the plastid genome of some species, in many cases resulting in gene loss. We sequenced the full plastid genomes of two species in the parasitic plant genus Cuscuta along with a nonparasitic relative, Ipomoea purpurea, to investigate changes in the plastid genome that may result from transition to the parasitic lifestyle. An overall increase in rate of base substitution and at least one putative gene loss appear to have occurred even before evolution of parasitism within the family containing these species. Aside from loss of all NADH dehydrogenase genes, Cuscuta exaltata retains an otherwise intact set of photosynthetic and photorespiratory genes that evolve under strong selective constraint. Cuscuta obtusiflora has incurred substantially more change to its plastome, including loss of all genes for the plastid-encoded RNA polymerase. Despite extensive change in gene content and a greatly increased rate of overall nucleotide substitution, C. obtusiflora surprisingly also retains all photosynthetic and photorespiratory genes with only one minor exception. Although Epifagus virginiana, the only other parasitic plant with its plastid genome sequenced to date, has lost a set of transfer-RNA and ribosomal genes that largely overlaps with the set lost in Cuscuta, it has also lost all genes related to photosynthesis and maintains a set of genes which are among the most divergent in Cuscuta. Analyses demonstrate that photosynthetic genes are under the highest constraint of any genes within the plastomes of Cuscuta, indicating that a photosynthetic function is still the primary purpose of the plastid genome in these species.

Introduction

Parasitic plants offer excellent opportunities to study changes in genome evolution that accompany the switch from an autotrophic to a heterotrophic lifestyle, a transition that has occurred many times over the course of evolution. Within angiosperms, the ability to obtain nutrition through direct attachment to a host plant has evolved at least a dozen times (Barkman et al. in prep), with many additional instances of plants obtaining most or all of their nutrition through specific mycotrophic fungal interactions (Bidartondo and Bruns 2001; Bidartondo et al. 2002). While approximately 90% of genes involved in photosynthesis have been transferred to the nuclear genome over the course of chloroplast evolution since divergence from free-living cyanobacterial relatives (Martin and Herrmann 1998), these nuclear genes are often extremely difficult to study in non-model organisms. Widespread gene and genome duplication often makes inference of orthology among nuclear genes difficult, and rate acceleration in ribosomal loci of some parasitic plants suggests that the sequences of nuclear genes may be too divergent to amplify through standard PCR (Nickrent and Starr 1994). By contrast, genes remaining on the plastid chromosome evolve more slowly than nuclear genes and exist as single, readily identifiable orthologs in each plastome, although the plastid chromosome itself is in high copy number per cell (Wolfe, Li, and Sharp 1987).

Many species of parasitic plants retain the ability to photosynthesize and, aside from a supplemental connection to the roots of a host, otherwise resemble fully autotrophic plants in habit (Kuijt 1969). Others, however, display increased dependency on their hosts, often to the extent of becoming fully heterotrophic and nonphotosynthetic. Such plants are often deemed "holoparasites", and one such species, Epifagus virginiana (Beechdrops, Orobanchaceae), is the only parasitic plant whose full plastid genome has been sequenced to date (Wolfe, Morden, and Palmer 1992). Its plastid genome is reduced to less than half the size of that in normal angiosperms due to ubiquitous gene loss, including all photosynthetic and photorespiratory genes, some ribosomal protein genes, many tRNA genes, and genes for plastid-encoded polymerase (dePamphilis and Palmer 1990; Wolfe, Morden, and Palmer 1992). Despite such drastic changes, plastid transcription and intron splicing still occur (dePamphilis and Palmer 1990; Ems et al. 1995), presumably for the purpose of producing the four remaining proteins not related to transcription or translation. Smaller scale studies show similar or less genome reduction in related species (Wimpee, Morgan, and Wrobel 1992; Delavault, Sakanyan, and Thalouarn 1995; Delavault et al. 1996; Lohan and Wolfe 1998). For some holoparasitic lineages, existence of a functional plastid genome remains to be proven, although preliminary evidence suggests extremely divergent plastid genomes may occur in the families Balanophoraceae, Cytinaceae, Hydnoraceae, and Cynomoriaceae (Nickrent, Duff, and Konings 1997; Nickrent et al. 1997).

A large number of studies on plastid function have involved members of the parasitic genus Cuscuta, derived from within the otherwise autotrophic Morning Glory Family (Convolvulaceae, order Solanales, class Asteridae). Plastid ultrastructure and gene content are quite variable between different taxa (van der Kooij et al. 2000), and over 150 species exist in this widespread and recognizable genus (Yunker 1932). Unlike Epifagus and other root-parasitic Orobanchaceae, Cuscuta is a twining vine with no roots at maturity. Instead, it sends its shoot-like feeding organs, haustoria, directly into the stems of its hosts to invade the vasculature and obtain all necessary water and other nutrients. Leaves are reduced to vestigial scales. Despite an obligate reliance upon their hosts, many Cuscuta species show some green color, at least in their inflorescences and, particularly, in maturing ovules. Machado and Zetsche demonstrated the presence of Rubisco, chlorophyll, and low levels of carbon fixation in Cuscuta reflexa, a member of subgenus Monogyna (Machado and Zetsche 1990). Additionally, although all NADH dehydrogenase (ndh) genes were either undetectable or nonfunctional (Haberhausen and Zetsche 1994), other genes related to photosynthesis appeared to be present in functional form (Haberhausen, Valentin, and Zetsche 1992). In this species, green plastids of normal function are localized to a ring of cells between the stem pith and cortex that are isolated from atmospheric gas exchange, indicating photosynthesis may occur in this species using recycled respiratory CO2 (Hibberd et al. 1998) despite an altered xanthophyll cycle in its light-harvesting complex (Bungard et al. 1999). A different situation exists in the plastids of Cuscuta pentagona (subgenus Grammica), which lacks such a ring of cells but possesses what appear to be photosynthetically capable plastids with immunodetectable Rubisco, photosystem, and light-harvesting proteins in proper plastid locations within green tissues of seedlings and adult plants (Sherman, Pettigrew, and Vaughn 1999b). Other species within subgenus Grammica show a range of rbcL transcript levels, from low to none (van der Kooij et al. 2000), and sampled members of this subgenus lack promoters for plastid-encoded polymerase upstream of the rrn16 and rbcL genes, although transcription of rbcL still occurs from nuclear-encoded polymerase promoter sites in both cases (Krause, Berg, and Krupinska 2003). Conflicting evidence exists for Cuscuta europaea (subgenus Cuscuta), which has been described as lacking chlorophyll and detectable rbcL protein (Machado and Zetsche 1990), yet still possesses green color and more typical plastid sequences, including rbcL, than members of subgenus Grammica (Stefanovic, Krueger, and Olmstead 2002).

In this study, we test whether significant changes to the plastid genome occurred prior to the evolution of parasitism, whether previously published observations of plastid genome evolution in Cuscuta apply to other members of the genus, whether differences in chlorophyll content and distribution between Cuscuta species parallel differences in plastid genome content, whether plastid genes retained in Cuscuta are still evolving under strong purifying selection, and whether plastid gene retention and selective constraint suggest a photosynthetic function for plastids in this parasitic genus. To do so, we sequenced the full plastid genomes of two species of Cuscuta and a close photosynthetic relative, Ipomoea purpurea (Common Morning Glory). Ipomoea is a member of the Convolvuloideae clade, which has been shown to be the most likely sister group to Cuscuta in a number of studies (Neyland 2001; Stefanovic, Krueger, and Olmstead 2002; Stefanovic, Austin, and Olmstead 2003; Stefanovic and Olmstead 2004). Cuscuta exaltata, a member of subgenus Monogyna with visible chlorophyll distributed throughout the stems and inflorescences, and Cuscuta obtusiflora, a member of subgenus Grammica that usually only exhibits green pigmentation in inflorescences, fruits, starved seedlings, and stressed stem tips, were chosen to represent Cuscuta. We examined overall rates of substitution and changes in selective constraint by comparing rates of synonymous and nonsynonymous substitution for all plastid genes and across functionally defined classes of genes to determine whether photosynthetic genes remain the most highly conserved in the plastid genome and whether relaxation of functional constraint precedes gene losses both before and after the evolution of parasitism in this lineage. We also tested whether patterns of transfer RNA loss, changes in intergenic regions, and rates of substitution parallel those seen in the completely nonphotosynthetic Epifagus virginiana. Finally, we use the cumulative evidence of photosynthetic localization, specific gene loss, and strong functional constraint of specific genes to suggest a photosynthetic function of the plastid genome unrelated to the Calvin Cycle in Cuscuta and perhaps other parasitic plants as well.

Materials and Methods

Plastid Genome Sequencing, Assembly, and Annotation

Seeds of all three species were germinated and grown in the Pennsylvania State University Biology Greenhouse. An heirloom cultivar of Ipomoea purpurea, "Grandpa Ott's", was used to decrease the likelihood of heteroplasmy within the sample. One gram of young leaf tissue was used for DNA isolation in Ipomoea. One gram of tissue from a collection of very green seedlings originating from a selfed parental plant was used for Cuscuta exaltata, and one gram of stem tip tissue was used for Cuscuta obtusiflora. Partial fosmid libraries were constructed from the extracted DNA using the CopyControl Fosmid Library Production Kit (Epicentre, Madison, WI). Libraries were screened for clones containing plastid DNA according to McNeal et al. (McNeal 2004a). A subset of clones covering the entire plastid genome was selected for each species, and clones were shotgun sequenced and the reads assembled according to previously described methods (Jansen et al. 2005). Genome annotations were completed using DOGMA (Wyman, Jansen, and Boore 2004) in combination with manual sequence alignments of previously annotated genes from available related species.

Molecular Evolutionary Analyses

Phylogenies for each gene were constructed in PAUP*4.0b10 (Swofford 2002) under various Maximum Parsimony, Neighbor-Joining, and Maximum Likelihood criteria, including the following related taxa with full plastid genome sequences publicly available: Nicotiana tabacum (Genbank accession NC 001879) and Atropa belladonna (NC 004561) (Solanaceae, Solanales, Asteridae), Panax ginseng (NC 006290) (Araliaceae, Apiales, Asteridae), and Spinacia oleracea (NC 002202) (Caryophyllidae) as an outgroup. All sequenced genes appeared orthologous, with only minor, method-dependent aberrations from the expected phylogeny, which were probably due to extreme rate heterogeneity between taxa (Felsenstein 1978). Maximum Likelihood analyses performed under the General Time-Reversible model with gamma distribution of among-site variation (GTR+gamma) and model parameters estimated from the data were most accurate at recovering the expected phylogeny and were used for subsequent phylogenetic reconstruction of combined-gene datasets.

Pairwise synonymous (dS) and nonsynonymous (dN) nucleotide distances and standard errors were computed under the Kumar method using MEGA 2.1 (Kumar et al. 2000) for each gene and for classes of genes that together encode subunits of larger proteins. The predefined classes of genes examined were: ATP synthase genes (atp; 6 genes, 4,344 aligned characters); cytochrome b6/f complex subunits (pet; 6 genes, 2,622 aligned characters); photosystem I and II protein subunits (psa and psb = ps; 19 genes, 11,730 aligned characters); large and small ribosomal protein subunits (rpl and rps = rp; 17 genes, 7,686 aligned characters); plastid-encoded RNA polymerase (rpo; 4 genes, 11,958 aligned characters); and NADH dehydrogenase (ndh; 11 genes, 10,653 aligned characters). Pairwise dS, dN, and amino acid p-distance were also calculated between Epifagus virginiana and Panax, the closest available outgroup to both it and Cuscuta.
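As an illustration of the simplest of these distance measures, the amino acid p-distance is the proportion of compared alignment columns at which two sequences differ. The sketch below is not the MEGA implementation; the function name, the treatment of gapped or ambiguous columns (skipped), and the toy sequences are assumptions made for the example.

```python
def p_distance(seq_a, seq_b, skip_chars="-?X"):
    """Proportion of differing sites between two aligned protein sequences.

    Columns where either sequence has a gap or ambiguity code are skipped
    (pairwise deletion), which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return differing / compared

# Toy alignment (hypothetical sequences): five comparable sites, two differences.
example = p_distance("MKT-LV", "MRTALI")
print(example)
```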

Maximum Likelihood estimates of dS and dN/dS for each branch of the combined-gene dataset phylogenies were calculated under the MG96 x HKY 3x4 codon model in the HYPHY .99beta package (Kosakovsky-Pond, Frost, and Muse 2004). HYPHY was also used to conduct likelihood ratio tests (LRTs) between trees with universally constrained dS and dN/dS versus trees with each respective parameter free from constraint on one branch. Branches leading to Convolvulaceae (Ipomoea + both Cuscuta species), Cuscuta, and each individual Cuscuta species were tested for significant p-values in this manner. Additionally, pairwise relative rates tests were conducted for each gene class using various combinations of taxa with all parameters constrained as the null hypothesis and all parameters unconstrained as the alternate hypothesis. Pairwise relative ratio tests were conducted in HYPHY between the combined datasets with either synonymous or nonsynonymous distances constrained as the null hypothesis to determine whether there was significant heterogeneity in either across gene classes. Parameters were estimated independently for each branch. Finally, GCUA (McInerney 1998) was used to determine relative synonymous codon usage across all coding sequences for each genome to identify any changes in codon bias that may have accompanied tRNA loss or relaxed selection for photosynthesis.
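Relative synonymous codon usage (RSCU) is commonly defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that standard definition, not a reproduction of GCUA; it assumes the standard genetic code, ignores stop codons and codons containing ambiguous bases, and uses hypothetical function names and toy input.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (assumed for this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(coding_seqs):
    """Return {codon: RSCU} pooled over in-frame coding sequences."""
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue  # stop codons and unobserved amino acids are left out
        for c in codons:
            # observed / (family total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values

# Toy input (hypothetical): lysine is AAA x2, AAG x1; aspartate is GAT x2.
usage = rscu(["AAAAAAAAG", "GATGAT"])
print(usage["AAA"], usage["AAG"], usage["GAT"], usage["GAC"])
```

An RSCU above 1 marks a codon used more often than expected under equal usage, so comparing values between genomes is one way to look for the shifts in codon bias described above.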

Results and Discussion

The three plastid genomes presented here all have a pair of large, identical inverted repeat sequences (IR) separated from each other by a large single copy region and a small single copy region (LSC and SSC) on either end, as is the case for practically all plant plastid genomes (Palmer 1985). However, considerable length variation exists between these three plastid genome sequences, with the smallest genome, Cuscuta obtusiflora, barely half the size of that in Ipomoea purpurea (85,280 base pairs versus 162,046 bp). Cuscuta exaltata is intermediate in size at 125,373 bp (see figs. 1-3). The plastid genome of Ipomoea is slightly larger than that of Nicotiana tabacum (155,939 bp), largely through expansion of the inverted repeats into the small single copy region (fig. 1). While the IR of Nicotiana barely extends into ycf1, in Ipomoea the IR includes the entire ycf1 gene, rps15, ndhH, and a short fragment of the first exon of ndhA. By contrast, the LSC end of the IR is slightly constricted, not including rpl2 and rpl23 as it does in Nicotiana. Gene content in Ipomoea is decidedly similar to that in Nicotiana and Atropa. These three taxa, along with both Cuscuta species, lack an intact infA (Schmitz-Linneweber et al. 2002), indicating this gene loss probably occurred prior to the divergence of Solanaceae from Convolvulaceae, both in the order Solanales. This is not surprising, as infA has been lost from the plastid many times in angiosperm evolution (Millen et al. 2001). A second gene, ycf15, is lost across Convolvulaceae taxa sequenced in this study but is present in Solanaceae and outgroups (Schmitz-Linneweber et al. 2001; Schmitz-Linneweber et al. 2002; Kim and Lee 2004). However, the function of this gene is not known, and the effect of its loss in Convolvulaceae is difficult to interpret. A third gene, rpl23, may be a pseudogene in Ipomoea, is clearly a pseudogene in Cuscuta exaltata, and is lost completely in C. obtusiflora. Although a full length open reading frame exists in Ipomoea for this gene, a frameshift mutation occurs towards the 3' end of the gene. A second frameshift mutation eventually brings the sequence back into the original reading frame, but the stop codon is further downstream than in other taxa with functional rpl23. The gene also does not appear to be evolving under negative selective constraint as in Nicotiana (see fig. 4), further indicating it may be a pseudogene, although tests of expression will be necessary to confirm this. Despite being a component of the plastid translational apparatus, the expendability of this ribosomal protein gene subunit in its plastid location is supported by its loss from the plastid genome of Spinacia as well (Schmitz-Linneweber et al. 2001). A gene found thus far only in members of Solanaceae, sprA (Schmitz-Linneweber et al. 2002), is not found in any of the sequenced Convolvulaceae genomes, indicating presence of this gene in the plastome is restricted to Solanaceae.

One gene that surprisingly was found in all three Convolvulaceae plastid genomes is ycf1, a large gene of unknown function previously reported as missing in Cuscuta and three other Convolvulaceae (Downie and Palmer 1992). That study used Southern Blot hybridizations to screen for gene presence. In our sequences, ycf1 is still present as the second largest open reading frame in the plastid genome, but it is extremely variable in size between the two Cuscuta species, is greatly elongated in Ipomoea, possesses numerous large indels, and is unalignable at the protein level for much of its length. These factors likely explain the negative hybridizations previously observed. As is the case for ycf15, interpreting consequences of the extreme divergence of this gene in Convolvulaceae awaits full knowledge of its function.

Gene loss is much more prominent in the two Cuscuta species than Ipomoea. All genes lost in C. exaltata are also lost in C. obtusiflora, and are most parsimoniously assumed to be lost in the common ancestor of both species. Most notable of these losses are the ndh genes, all of which are fully lost or are pseudogenes in Cuscuta. This confirms the PCR and blot data collected for Cuscuta reflexa that suggested all ndh genes were missing or highly altered in that species (Haberhausen, Valentin, and Zetsche 1992).

All ndh genes are also lost in Epifagus (dePamphilis and Palmer 1990), indicating loss of these genes may be directly related to parasitic habit. Although ndh genes are retained in most photosynthetic plants, they are also lost from the chloroplast genome of Pinus

(Wakasugi et al. 1994), indicating their presence in the plastid genome is not necessary for photosynthesis, even in fully autotrophic plants. Both Cuscuta species also lack a functional rps16 gene, although C. exaltata contains a pseudogene with portions of both exons and the group II intron present between them. A final gene loss from both Cuscuta plastomes that is also reported in C. reflexa is the loss of trnK-UUU (Bommer,

Haberhausen, and Zetsche 1993). As is the case for Epifagus, C. exaltata retains the open reading frame, matK, contained within the intron of that tRNA. A deletion within the trnV -CAU intron also reported in C. reflexa (Haberhausen, Valentin, and Zetsche

1992), and similar to that seen in Orobanche minor, may hypothetically disrupt its splicing (Lohan and Wolfe 1998), but because both exons remain intact in these species, 54 we hesitate to call it a pseudogene in C. exaltata without experimental evidence. Aside from these gene losses, plastid genome content of C. exaltata is identical to that in

Ipomoea and includes a full set of genes presumably necessary for photosynthesis.

Structurally, the plastid genome of C. exaltata has undergone a number of changes relative to Ipomoea and Nicotiana. The LSC end of the IR is constricted in both

Cuscuta species, but it has apparently re-extended to include a few nucleotides of trnH-

GUG (4 nucleotides in C. exaltata, 6 in C. obtusiflora). As in Ipomoea, the first full gene in the LSC end of the IR in C. obtusiflora is trnI-CAU. However, the IR constriction is much more dramatic in C. exaltata, with rpl2, trnI, and over half of ycf2 falling outside the IR (fig. 2). Putative loss of these genes in C. reflexa detected by PCR (Bommer,

Haberhausen, and Zetsche 1993) is likely an artifact of this constriction rather than a deletion, as the primers used in that study would have shown similar results for C. exaltata and not amplified the opposite LSC/IR junction at which these genes actually do exist. The IR has not extended substantially into the SSC in Cuscuta as in Ipomoea. In fact, C. exaltata is somewhat contracted relative to Nicotiana and ends slightly before the start codon of ycf1. Like Nicotiana, the IR of C. obtusiflora contains a portion of the 5' end of ycf1. Two segmental inversion events are observed in C. exaltata. One inversion occurs from trnV-UAC to psbE in the LSC region, the other in the SSC encompassing only two genes, ccsA and trnL-UAG. Both of these inversions border on regions that once contained ndh genes. Extensive noncoding pseudogene sequence may have helped ameliorate accumulation of repeat sequences that could promote inversion. Perhaps not coincidentally, the only inversion observed in Epifagus is trnL-UAG in the SSC (Wolfe,

Morden, and Palmer 1992).

Only one substantial inversion was found in C. obtusiflora. In this species, the entire SSC is inverted relative to its position in Nicotiana and most other published angiosperm genomes. This is not a completely unexpected observation, as any homologous recombination event between the two IRs could produce such an inversion.
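To illustrate why recombination between the two IR copies inverts the SSC, the following minimal Python sketch models a toy plastome as LSC + IRb + SSC + IRa, with IRa the reverse complement of IRb. The sequences and function names are hypothetical and chosen only for illustration; they are not drawn from the genomes analyzed here.

```python
# Toy model of recombination between inverted repeats.
# All sequences below are made-up placeholders, not real plastid data.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_between_repeats(lsc, ir, ssc):
    """Return (original, recombined) linearized genomes.

    The genome is laid out as LSC + IRb + SSC + IRa, where IRa is the
    reverse complement of IRb. Recombination between the two repeat
    copies is modeled as inverting the whole IRb-SSC-IRa block.
    """
    ira = revcomp(ir)
    original = lsc + ir + ssc + ira
    recombined = lsc + revcomp(ir + ssc + ira)
    return original, recombined


lsc, ir, ssc = "AAAACCCC", "GATTACA", "TTGCGC"  # hypothetical segments
original, recombined = invert_between_repeats(lsc, ir, ssc)

# Because the repeats are inverted copies of each other, flipping the
# whole block leaves both IR copies in place and inverts only the SSC.
print(recombined == lsc + ir + revcomp(ssc) + revcomp(ir))
```

Under this simplified model the two isomers differ only in the orientation of the SSC relative to the LSC, which is the pattern described for C. obtusiflora.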

Perhaps more unusual is the lack of any other short gene rearrangements relative to

Nicotiana and Ipomoea in this species. Unlike C. exaltata, C. obtusiflora lacks extensive pseudogene sequence and may have purged such unused DNA from its plastome before sequence motifs necessary for inversion events had time to develop. Gene loss, on the other hand, is far more extensive in C. obtusiflora (Table 1). In addition to the genes previously discussed for C. exaltata, C. obtusiflora has lost a third ribosomal protein gene, rpl32, and five additional tRNAs. Also lost are all subunits of the plastid-encoded RNA polymerase (rpo) and the intron maturase matK, the loss of which parallels the loss of all group IIA introns from the genome, as previously reported

(McNeal 2004b). Blot data and negative PCR results have suggested loss of plastid rpo genes from other species within subgenus Grammica as well (Krause, Berg, and

Krupinska 2003), although the rrn gene cluster and rbcL gene appear to still be transcribed by a nuclear-encoded polymerase in at least some species (Berg, Krause, and

Krupinska 2004). Despite such extensive gene loss from the plastome, C. obtusiflora remarkably maintains all plastid genes directly involved in photosynthesis, including all atp genes, all pet genes, rbcL, and all psa and psb genes, with the exception of psaI. This gene is one of the smallest in the plastome (36 codons or fewer), although it is highly conserved across land plants.

With these three new full plastid genome sequences, we tested whether substantial changes in selective pressure on genes, particularly those lost in Cuscuta, occurred prior to the evolution of parasitism in this lineage. Comparing dN/dS of Ipomoea and Nicotiana for all genes relative to an outgroup, Panax, revealed an interesting trend toward relaxed selection in the genome of the fully autotrophic Morning Glory (fig. 4). Of 77 protein-coding genes shared between the two taxa, 56 (72.7%) have a higher dN/dS in Ipomoea than in Nicotiana, with only 15 genes showing higher dN/dS in Nicotiana (6 genes were indistinguishable or had dN/dS < 0.01 in both taxa). Furthermore, 12 of the 13 genes lost in both Cuscuta species had higher dN/dS in Ipomoea, indicating these genes may have already been under relaxed selection prior to the evolution of parasitism in

Convolvulaceae. All previously defined classes of genes (atp, pet, ps, rp, rpo, and ndh) with the exception of pet showed significantly greater overall rates of substitution in

Ipomoea than in Nicotiana in pairwise relative rates tests using Panax as an outgroup

(Supplemental Table 2). Analysis of the combined set of ndh genes revealed that dN/dS on the branch leading to Ipomoea is much higher than on the preceding branch leading to Solanales, producing a highly significant difference in the likelihood of the tree when the ratio is left unconstrained (p < 0.0001, Supplemental Table 3). This suggests that relaxed selection in ndh genes probably began before the advent of parasitism.
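The gene tally and the likelihood ratio test described above can be sketched as follows. This is a minimal illustration assuming a single additional free parameter (one degree of freedom); the log-likelihood values are hypothetical placeholders rather than values from Supplemental Table 3, and the sketch is not the procedure used to produce that table.

```python
import math


def lrt_pvalue(lnl_constrained, lnl_unconstrained):
    """P-value of a likelihood ratio test with one degree of freedom.

    The statistic 2 * (lnL_unconstrained - lnL_constrained) is compared
    against a chi-square distribution with df = 1, whose survival
    function equals erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (lnl_unconstrained - lnl_constrained)
    stat = max(stat, 0.0)  # guard against small negative values from rounding
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods for the combined ndh genes (illustrative only).
p = lrt_pvalue(lnl_constrained=-12010.0, lnl_unconstrained=-12000.0)
print(f"LRT p-value: {p:.2e}")

# Tally reported in the text: 77 shared genes, 56 with higher dN/dS in
# Ipomoea, 15 with higher dN/dS in Nicotiana, 6 indistinguishable.
higher_ipomoea, higher_nicotiana, tied = 56, 15, 6
total = higher_ipomoea + higher_nicotiana + tied
share = 100.0 * higher_ipomoea / total
print(f"{higher_ipomoea}/{total} genes = {share:.1f}%")
```

The one-degree-of-freedom assumption fits a comparison in which a single branch is allowed its own dN/dS; a model that frees more parameters would need the corresponding chi-square distribution instead.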

Pairwise relative rates tests also show significant overall rate differences between

Ipomoea and Cuscuta exaltata as well as between the two Cuscuta species for all types of genes (Supplemental Table 2). We next wanted to test whether ratios of overall selection between classes of genes remaining in Cuscuta are similar to those in autotrophic taxa. Figure 5 shows how patterns of synonymous and nonsynonymous substitution vary among sampled Solanalean taxa relative to Panax for the various classes of genes in the plastome. While there are minor changes in dS between different gene classes, relative ratio tests of dS for the tree topologies of each gene class yielded no significant differences (Supplemental Table 2). However, dN values for ps genes were significantly different from those for both atp and rp genes, and dN and dN/dS were lower for pet and ps genes in all pairwise comparisons performed (fig. 5b and c). The trend in Cuscuta clearly mirrors that in other taxa; all classes of genes appear to be evolving under strong negative selection with dN/dS much lower than 1, and photosystem and pet genes remain the most highly conserved, even in the rapidly evolving C. obtusiflora genome. Despite the loss of psaI in C. obtusiflora, selective constraint on the plastid genome of both

Cuscuta species strongly suggests that a photosynthetic process remains the primary purpose of their plastid genomes.
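As a reminder of how the dN/dS ratio is read, the sketch below computes it from nonsynonymous and synonymous rates and labels the selective regime. The function name, the tolerance around 1, and the example rates are assumptions made for illustration, not values from Figure 5.

```python
def selection_regime(dn, ds, tol=0.05):
    """Classify selection from dN and dS (substitutions per site).

    dN/dS below 1 suggests negative (purifying) selection, dN/dS near 1
    suggests neutrality, and dN/dS above 1 suggests positive selection.
    The tolerance band around 1 is an arbitrary illustrative choice.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    omega = dn / ds
    if omega < 1 - tol:
        return "negative"
    if omega > 1 + tol:
        return "positive"
    return "neutral"


# Hypothetical rates for a conserved photosystem gene.
print(selection_regime(dn=0.01, ds=0.20))
```

In practice the regime would be judged with a statistical test rather than a fixed tolerance; the band here only keeps the example short.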

Although plastid genes in Cuscuta are still evolving under strong negative selection, the data show that selection on them is somewhat relaxed compared to that in their fully autotrophic relatives. Figure 6 shows phylograms for each of the previously discussed gene classes with significant increases in dS and dN/dS as determined by LRTs indicated on the branches. The overall synonymous rate for C. obtusiflora varies between 5 and 8 times that of the branch leading to Convolvulaceae across the four classes of genes for which it could be studied, while that of C. exaltata is nearly identical to it (Supplemental

Table 3). These highly accelerated substitution rates in C. obtusiflora could result from a shorter generation time, from damage to repair machinery allowed by relaxed selective constraint, or from a smaller organismal or plastid genome population size (Mogensen 1996). Strong negative selective pressure in C. obtusiflora, particularly in ps and pet genes, occurring in spite of highly accelerated rates of nucleotide substitution, further supports the idea that C. obtusiflora must be utilizing its photosynthetic genes for some purpose important to the plant. This is particularly fascinating considering the full loss of plastid-encoded polymerase. While Epifagus virginiana has been shown to perform transcription of ribosomal and various other protein-coding genes in the absence of plastid rpo genes (dePamphilis and Palmer 1990;

Ems et al. 1995), this phenomenon is unknown from any photosynthetic plant. In large part, plastid polymerase performs the transcriptional duties for photosynthetic genes in typical green plants, but a dramatic shift seems to have occurred in Cuscuta toward imported nuclear polymerase transcription of all genes. Many plastid genes are known to be transcribed by both polymerases (Liere and Maliga 2001), and whether autotrophic relatives of Cuscuta obtusiflora already possess the ability to transcribe all genes with imported nuclear polymerase or whether novel promoters and transcription factor binding sites evolved rather recently remains to be seen.

Although Epifagus virginiana has undergone a similar downsizing of its plastid genome, it and Cuscuta are quite different in a number of ways, most obviously in that

Cuscuta retains a seemingly functional set of photosynthetic genes while Epifagus has lost all such genes. With the loss of rpo genes in both taxa, we investigated whether both taxa show similar patterns of deletion in intergenic regions, which should contain plastid promoters, transcription-factor binding sites, and other motifs no longer necessary in a nuclear-transcribed plastome. Overall, Epifagus has 22 fewer protein coding genes and 7 fewer tRNA genes than C. obtusiflora. While the plastid genome size of Epifagus

(70,028 bp) is over 15 kilobases smaller than that of C. obtusiflora, this is actually less than would be expected given such a dramatic difference in overall gene content. In 63 non-coding, intergenic regions between homologous functional genes in both Cuscuta species, Ipomoea, and Nicotiana, C. obtusiflora (11,714 bp) has undergone a 49% overall decrease in length relative to Nicotiana (22,996 bp), perhaps largely due to a deletion of plastid polymerase and transcription factor binding sites. C. exaltata has decreased 16% over the same area, and Ipomoea only 1%. Over the 16 intergenic regions shared with

Epifagus, C. obtusiflora has decreased by 33% relative to Nicotiana, while Epifagus has only decreased by slightly over 3% (values in Supplemental Table 4A). Likewise, in 3 regions for which conserved functional genes flanking regions containing homologously defunct genes could be compared between Epifagus and C. obtusiflora, Epifagus exhibits a 32% total decrease in size relative to the full length sequences containing functional genes in Nicotiana, while C. obtusiflora is 85% shorter (Supplemental Table 4B). The IR of Epifagus is almost the same length as that of a normal angiosperm, while its SSC and

LSC regions are the sites of practically all of its gene loss. Cuscuta has extensive deletion in those areas too, but also exhibits a significant contraction of the IR, largely through pseudogene loss relative to Epifagus. While Cuscuta almost completely lacks pseudogene sequences, Epifagus retains a fair number of them. Coupled with various intron losses (McNeal 2004b), the plastid genome of C. obtusiflora is much more streamlined than that of Epifagus.
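The percentage decreases quoted above follow from a simple ratio of summed intergenic lengths. The sketch below reproduces the C. obtusiflora figure from the two totals given in the text; the function name is hypothetical.

```python
def percent_decrease(reference_bp, observed_bp):
    """Percent reduction in length relative to a reference length."""
    if reference_bp <= 0:
        raise ValueError("reference length must be positive")
    return 100.0 * (reference_bp - observed_bp) / reference_bp


# Summed length of the 63 shared intergenic regions, as given in the text.
nicotiana_bp = 22_996
c_obtusiflora_bp = 11_714

decrease = percent_decrease(nicotiana_bp, c_obtusiflora_bp)
print(f"C. obtusiflora intergenic decrease: {decrease:.0f}%")
```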

We also wanted to test whether genes remaining in the plastid of the fully nonphotosynthetic Epifagus are under less constraint than those of the putatively photosynthetic Cuscuta species. Surprisingly, among the alignable genes they share, C. obtusiflora is usually more divergent at the protein level from a common outgroup,

Panax (Supplemental fig. 7). Comparison of dN/dS across all genes shows no clear trend, with some genes under greater constraint in Epifagus than in C. obtusiflora and others more conserved in Cuscuta (Supplemental fig. 8).

C. obtusiflora retains the four protein-coding genes in Epifagus not related to transcription or translation and presumably the reason for retaining a plastid genome in that species: accD, clpP, ycf1, and ycf2 (Wolfe, Morden, and Palmer 1992). If the two parasitic lineages are utilizing their plastid genomes for the same purpose, we would expect to see these genes evolving similarly between the taxa. accD and clpP both are less constrained in Cuscuta than in Epifagus, and in clpP, dramatically so, with all three

Convolvulaceae taxa exhibiting higher dN/dS for both genes. The effect this has on the amino acid divergence is also very apparent (Supplemental fig. 7). clpP encodes a protease that is essential for shoot development in Nicotiana, but exactly which proteins it degrades is still unknown (Kuroda and Maliga 2003), and how it can be so divergent in the closely related autotroph, Ipomoea, has yet to be deduced. While alignable regions of ycf1 and ycf2 actually have lower dN/dS in C. obtusiflora than in Epifagus, the rest of each gene is unalignable at even the protein level in Cuscuta while Epifagus is relatively easy to align, and overall protein divergence is actually much higher for C. obtusiflora than for Epifagus in these genes. Thus, strangely, the four genes for which a plastid genome exists in Epifagus are among the least conserved in the plastid genome of

Convolvulaceae taxa. Overall, with the exception of photosynthetic genes, the plastid genome of Cuscuta obtusiflora is more streamlined, less constrained, and more divergent than that of Epifagus for the genes they share. Whether this indicates faster overall evolutionary rates in C. obtusiflora or simply a longer time as a specialized parasite under relaxed constraint is difficult to discern without accurate dating methods and more taxon sampling.

Despite some differences in patterns of evolution, many parallels exist between plastid genome evolution in Cuscuta and that of the related but independently derived parasitic lineage Orobanchaceae, including Epifagus. Both lineages show overall increased rates of nucleotide substitution and relaxed selective constraint, and both lack any appreciable shift in synonymous codon usage in spite of loss of multiple tRNAs (Morden et al. 1991; Supplemental Table 5). Substantial gene loss is observed in both lineages; in addition to sharing loss of all ndh and rpo genes with C. obtusiflora, Epifagus has lost a largely overlapping set of tRNAs from its plastid genome. All tRNAs lost in C. obtusiflora are also lost in Epifagus with the exception of trnR-ACG, and even that has been suggested to be a pseudogene (Lohan and Wolfe 1998). The three ribosomal protein genes lost in C. obtusiflora are also a subset of the six lost in Epifagus. Although

Epifagus lacks all photosynthetic genes, other Orobanchaceae retain genes normally required for photosynthesis in seemingly functional form. Lathraea clandestina has what appears to be a functional rbcL (Rubisco, large subunit) gene, and rpo genes are also amplifiable by PCR, despite the fact that the plant apparently lacks chlorophyll and spends its entire life cycle underground except when flowering (Delavault et al. 1996).

Similarly, some members of the genus Orobanche and other holoparasites within the family retain rbcL genes that appear to be evolving under functional constraint (Wolfe and dePamphilis 1997; Wolfe and dePamphilis 1998; Leebens-Mack and dePamphilis

2002). Pholisma, a genus in the holoparasitic family Lennoaceae, is yet another example of an independently derived nonphotosynthetic lineage retaining rbcL (Bremer et al. 2002).

Without full plastid genome sequence from these plants, it is difficult to know whether they too may still possess a necessary complement of plastid genes for residual photosynthesis, although unlike the Cuscuta species in this study, they lack obvious chlorophyll at any life stage and rarely are above ground to encounter light.

Because no atmospheric gas exchange occurs with presumably photosynthetic cells in C. reflexa, recycling of respiratory carbon dioxide has been presented as a hypothesis for retention of photosynthesis in that species, and although its source carbohydrates all apparently originate from the host, a net decline in carbon dioxide release is indeed detected in the presence of light (Hibberd et al. 1998). However, recycling carbon dioxide back to carbohydrate through the Calvin cycle is not the only potential reason for retaining photosynthesis. Another possible explanation for conservation of photosynthetic genes in Cuscuta and retention of rbcL in other holoparasites may lie in a recently described alternative function of Rubisco involving lipid biosynthesis, where it acts independently of its formerly known role in Calvin cycle production of carbohydrates. In this alternative pathway, 20% more acetyl-CoA is available for fatty acid biosynthesis, and 40% less carbon is lost as carbon dioxide in green seeds of Brassica napus. This pathway is still largely reliant on ATP and NADPH generated during the light reactions of photosynthesis, although less than 15% of that necessary for the Calvin cycle is needed for this function of Rubisco to play a dominant role in lipid synthesis (Schwender et al. 2004). The authors of that study postulate that this pathway is the reason why many plants have green seeds, and that seeds perform this process most efficiently in high light, whereas non-green seeds such as those of sunflower do not seem to benefit greatly from it. No atmospheric carbon dioxide would be necessary for this process, and it could also explain the observation of less respiratory carbon dioxide loss during light exposure (Hibberd et al. 1998), when necessary ATP and NADPH for the reaction would be produced.

Collective evidence suggests that the lipid biosynthetic pathway is better supported as a hypothesis for retention of photosynthetic genes in Cuscuta than

Calvin cycle production of carbohydrates, particularly in Cuscuta obtusiflora.

Chlorophyll is most concentrated in developing ovules and seeds of Cuscuta obtusiflora and relatives in subgenus Grammica, while they lack the circular ring of chlorophyllous cells between their pith and cortex (Sherman, Pettigrew, and Vaughn 1999a). Because

Cuscuta species must survive long enough after germination to search for and attach to a host, utilizing this alternative function of photosynthesis for efficient lipid allocation to seeds and subsequent efficient carbon use in the free-living seedlings, which also display noticeable chlorophyll, may be the explanation for an intact photosynthetic apparatus in this parasitic lineage.

Loss of ndh genes in Cuscuta provides another line of evidence that photosynthetic genes are primarily retained for lipid biosynthesis in Cuscuta. Linear electron flow through the photosystems results in a lower ratio of ATP to NADPH than what is optimal for the Calvin cycle, while an increase in cyclic electron flow around photosystem I increases this ratio (Heber and Walker 1992). Overproduction and accumulation of NADPH during light reactions would likely result in lethal over-reduction of the plastid stroma, making this cyclic flow necessary (Munekage et al.

2004). ndh genes have been shown to play a role in cyclic electron flow around photosystem I (Shikanai et al. 1998), although under optimal growth conditions their function may be lost due to the presence of a partially redundant, complementary pathway (Joët et al. 2001). However, when stomata are forced to close under conditions favorable to photorespiration, tobacco mutants lacking a functional set of ndh genes perform poorly, suggesting the major role of ndh genes is facilitating photosynthesis under lowered carbon dioxide levels (Horvath et al. 2000). This perhaps explains the ability of Pinus thunbergii to lose ndh genes, as the adaptation of needle-like leaves in conifers decreases water loss from transpiration while allowing sufficient gas exchange for optimal photosynthesis through open stomata.

Cuscuta, on the other hand, lacks normally functioning stomata (Hibberd et al.

1998), and as such would be expected to have increased, not decreased, incidence of photorespiration and a greater need for the cyclical electron pathway involving ndh genes if carbohydrate production through the Calvin cycle was the primary role of photosynthesis. Extraordinarily high rates of respiration and an extremely high concentration of carbon dioxide in photosynthetic cells would be required to prevent acidification of the stroma. However, flux of lipid synthesis through Rubisco actually requires increased levels of NADPH relative to ATP (Schwender et al. 2004), and the loss of ndh genes in Cuscuta may very well indicate a shift in the primary use of photosynthesis from carbohydrate production to lipid production. Moreover, lipids in the phloem in Canola and likely other plants are in much lower concentration and are much different in composition than in normal plant tissue, while proteins and, especially, sugars are more commonly transported in the vasculature (Madey, Nowack, and Thompson

2002). While a steady flow of carbohydrate is available from hosts of Cuscuta, the parasite probably remains very much dependent on its own cells for lipid biosynthesis.

Use of Rubisco for lipid biosynthesis may also explain retention of rbcL in other holoparasite plastids. A role for Rubisco in acetyl-CoA and subsequent lipid synthesis is particularly tantalizing given that one of the four genes for which the plastid genome is retained in Epifagus, accD, encodes a subunit of acetyl-CoA carboxylase, indicating that lipid biosynthesis remains an important function of plastids in this species. Even in the absence of light or genes necessary for photosynthetic light reactions, efficient harvesting of host nutrients as a source for ATP and NADPH could maintain rbcL as a valuable resource in holoparasites such as Orobanche, as Rubisco-mediated oil metabolism requires far fewer of these photosynthetic products than carbohydrate synthesis

(Schwender et al. 2004). Future physiological study of photosynthetic tissues in Cuscuta as well as other parasitic plants, which may have largely if not entirely lost the primary photosynthetic function of their plastid genomes, should lead to greater understanding of possible alternate roles of the plastid genome in parasitic and autotrophic plants alike.

Acknowledgments

We would like to thank Mauricio Bonifacino and Daniel Austin for assistance in obtaining plant material for the study, Tim Chumley for technical support, Robert Jansen for general support and encouragement, and Jim Leebens-Mack, Dave Geiser, Steve

Schaeffer, and Andy Stephenson for critical review of the manuscript. This research was supported by National Science Foundation Grants DEB-0206659 and DEB-0120709.

Literature Cited

Barkman, T. J., J. R. McNeal, L. S., G. Coat, H. B. Croom, N. D. Young, and C. W. dePamphilis. In prep. Mitochondrial DNA suggests 12 origins of parasitism in angiosperms and implicates parasitic plants as vectors of horizontal gene transfer. To be submitted to Proceedings of the National Academy of Sciences of the United States of America.
Berg, S., K. Krause, and K. Krupinska. 2004. The rbcL genes of two Cuscuta species, C. gronovii and C. subinclusa, are transcribed by the nuclear-encoded plastid RNA polymerase (NEP). Planta 219:541-546.
Bidartondo, M. I., and T. D. Bruns. 2001. Extreme specificity in epiparasitic Monotropoideae (Ericaceae): widespread phylogenetic and geographical structure. Molecular Ecology 10:2285-2295.
Bidartondo, M. I., D. Redecker, I. Hijri, A. Wiemken, T. D. Bruns, L. Dominguez, A. Sersic, J. R. Leake, and D. J. Read. 2002. Epiparasitic plants specialized on arbuscular mycorrhizal fungi. Nature 419:389-392.
Bommer, D., G. Haberhausen, and K. Zetsche. 1993. A large deletion in the plastid DNA of the holoparasitic flowering plant Cuscuta reflexa concerning two ribosomal-proteins (rpl2, rpl23), one transfer-RNA (trnI) and an orf2280 homolog. Current Genetics 24:171-176.
Bremer, B., K. Bremer, N. Heidari, P. Erixon, R. G. Olmstead, A. A. Anderberg, M. Kallersjo, and E. Barkhordarian. 2002. Phylogenetics of based on 3 coding and 3 non-coding chloroplast DNA markers and the utility of non-coding DNA at higher taxonomic levels. Molecular Phylogenetics and Evolution 24:274-301.
Bungard, R. A., A. V. Ruban, J. M. Hibberd, M. C. Press, P. Horton, and J. D. Scholes. 1999. Unusual carotenoid composition and a new type of xanthophyll cycle in plants. Proceedings of the National Academy of Sciences of the United States of America 96:1135-1139.
Delavault, P., V. Sakanyan, and P. Thalouarn. 1995. Divergent evolution of two plastid genes, rbcL and atpB, in a non-photosynthetic parasitic plant. Plant Molecular Biology 29:1071-1079.
Delavault, P. M., N. M. Russo, N. A. Lusson, and P. A. Thalouarn. 1996. Organization of the reduced plastid genome of Lathraea clandestina, an achlorophyllous parasitic plant. Physiologia Plantarum 96:674-682.
dePamphilis, C. W., and J. D. Palmer. 1990. Loss of photosynthetic and chlororespiratory genes from the plastid genome of a parasitic flowering plant. Nature 348:337-339.
Downie, S. R., and J. D. Palmer. 1992. Use of chloroplast DNA rearrangements in reconstructing plant phylogeny. Pp. 14-35 in D. E. Soltis, P. S. Soltis, and J. J. Doyle, eds. Molecular Systematics of Plants. Chapman and Hall, London, UK.
Ems, S. C., C. W. Morden, C. K. Dixon, K. H. Wolfe, C. W. dePamphilis, and J. D. Palmer. 1995. Transcription, splicing and editing of plastid in the nonphotosynthetic plant Epifagus virginiana. Plant Molecular Biology 29:721-733.

Felsenstein, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Systematic Zoology 27:401-410.
Haberhausen, G., K. Valentin, and K. Zetsche. 1992. Organization and sequence of photosynthetic genes from the plastid genome of the holoparasitic flowering plant Cuscuta reflexa. Molecular & General Genetics 232:154-161.
Haberhausen, G., and K. Zetsche. 1994. Functional loss of ndh genes in an otherwise relatively unaltered plastid genome of the holoparasitic flowering plant Cuscuta reflexa. Plant Molecular Biology 24:217-222.
Heber, U., and D. Walker. 1992. Concerning a dual function of coupled cyclic electron transport in leaves. Plant Physiology 100:1621-1626.
Hibberd, J. M., R. A. Bungard, M. C. Press, W. D. Jeschke, J. D. Scholes, and W. P. Quick. 1998. Localization of photosynthetic metabolism in the parasitic angiosperm Cuscuta reflexa. Planta 205:506-513.
Horvath, E. M., S. O. Peter, T. Joet, D. Rumeau, L. Cournac, G. V. Horvath, T. A. Kavanagh, C. Schafer, G. Peltier, and P. Medgyesy. 2000. Targeted inactivation of the plastid ndhB gene in tobacco results in an enhanced sensitivity of photosynthesis to moderate stomatal closure. Plant Physiology 123:1337-1349.
Jansen, R. K., L. A. Raubeson, J. L. Boore, C. W. dePamphilis, T. W. Chumley, R. C. Haberle, S. K. Wyman, A. J. Alverson, R. Peery, S. J. Herman, H. M. Fourcade, J. V. Kuehl, J. R. McNeal, J. Leebens-Mack, and L. Cui. 2005. Methods for Obtaining and Analyzing Whole Chloroplast Genome Sequences. Methods in Enzymology (in press).
Joët, T., L. Cournac, E. M. Horvath, P. Medgyesy, and G. Peltier. 2001. Increased sensitivity of photosynthesis to antimycin A induced by inactivation of the chloroplast ndhB gene. Evidence for a participation of the NADH-dehydrogenase complex to cyclic electron flow around photosystem I. Plant Physiology 125:1919-1929.
Kim, K. J., and H. L. Lee. 2004. Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants. DNA Research 11:247-261.
Kosakovsky-Pond, S. L., S. D. W. Frost, and S. V. Muse. 2004. HyPhy: hypothesis testing using phylogenies. Bioinformatics:bti079.
Krause, K., S. Berg, and K. Krupinska. 2003. Plastid transcription in the holoparasitic plant genus Cuscuta: parallel loss of the rrn16 PEP-promoter and of the rpoA and rpoB genes coding for the plastid-encoded RNA polymerase. Planta 216:815-823.
Kuijt, J. 1969. Biology of Parasitic Flowering Plants. University of California Press, Berkeley and Los Angeles.
Kumar, S., K. Tamura, I. Jakobsen, and M. Nei. 2000. Molecular Evolutionary Genetics Analysis (MEGA) version 2.1.
Kuroda, H., and P. Maliga. 2003. The plastid clpP1 protease gene is essential for plant development. Nature 425:86-89.
Leebens-Mack, J. H., and C. W. dePamphilis. 2002. Power analysis of tests for loss of selective constraint in cave crayfish and nonphotosynthetic plant lineages. Molecular Biology and Evolution 19:1292-1302.

Liere, K., and P. Maliga. 2001. Plastid RNA polymerases in higher plants. Pp. 29-49 in E. Aro, and B. Andersson, eds. Regulation of photosynthesis. Kluwer Academic Publishers, Dordrecht, Boston, London.
Lohan, A. J., and K. H. Wolfe. 1998. A subset of conserved tRNA genes in plastid DNA of nongreen plants. Genetics 150:425-433.
Machado, M. A., and K. Zetsche. 1990. A structural, functional and molecular analysis of plastids of the holoparasites Cuscuta reflexa and Cuscuta europaea. Planta 181:91-96.
Madey, E., L. M. Nowack, and J. E. Thompson. 2002. Isolation and characterization of lipid in phloem sap of canola. Planta 214:625-634.
Martin, W., and R. G. Herrmann. 1998. Gene transfer from organelles to the nucleus: How much, what happens, and why? Plant Physiology 118:9-17.
McInerney, J. O. 1998. GCUA (General Codon Usage Analysis). Bioinformatics 14:372-373.
McNeal, J. R. 2004a. Chapter 2: "Utilization of partial genomic fosmid libraries for sequencing complete organellar genomes" in Systematics and plastid genome evolution in the parasitic plant genus Cuscuta (dodder). Department of Biology. The Pennsylvania State University, University Park.
McNeal, J. R. 2004b. Chapter 3: "Disappearance of introns promotes adaptive change and loss of a highly conserved maturase" in Systematics and molecular evolution in the parasitic plant genus Cuscuta (dodder). Department of Biology. The Pennsylvania State University, University Park.
Millen, R. S., R. G. Olmstead, K. L. Adams, J. D. Palmer, N. T. Lao, L. Heggie, T. A. Kavanagh, J. M. Hibberd, J. C. Giray, C. W. Morden, P. J. Calie, L. S. Jermiin, and K. H. Wolfe. 2001. Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell 13:645-658.
Mogensen, H. L. 1996. The hows and whys of cytoplasmic inheritance in plants. American Journal of Botany 83:383-404.
Morden, C. W., K. H. Wolfe, C. W. dePamphilis, and J. D. Palmer. 1991. Plastid translation and transcription genes in a nonphotosynthetic plant - Intact, missing and pseudo genes. EMBO Journal 10:3281-3288.
Munekage, Y., M. Hashimoto, C. Miyake, K. Tomizawa, T. Endo, M. Tasaka, and T. Shikanai. 2004. Cyclic electron flow around photosystem I is essential for photosynthesis. Nature 429:579-582.
Neyland, R. 2001. A phylogeny inferred from large ribosomal subunit (26S) rDNA sequences suggests that Cuscuta is a derived member of Convolvulaceae. Brittonia 53:108-115.
Nickrent, D. L., R. J. Duff, and D. A. M. Konings. 1997. Structural analyses of plastid-derived 16S rRNAs in holoparasitic angiosperms. Plant Molecular Biology 34:731-743.
Nickrent, D. L., and E. M. Starr. 1994. High rates of nucleotide substitution in nuclear small-subunit (18S) rDNA from holoparasitic flowering plants. Journal of Molecular Evolution 39:62-70.

Nickrent, D. L., O. Y. Yan, R. J. Duff, and C. W. dePamphilis. 1997. Do nonasterid holoparasitic flowering plants have plastid genomes? Plant Molecular Biology 34:717-729.
Palmer, J. D. 1985. Comparative organization of chloroplast genomes. Ann Rev Genet 19:325-354.
Schmitz-Linneweber, C., R. M. Maier, J. P. Alcaraz, A. Cottet, R. G. Herrmann, and R. Mache. 2001. The plastid chromosome of spinach (Spinacia oleracea): complete nucleotide sequence and gene organization. Plant Molecular Biology 45:307-315.
Schmitz-Linneweber, C., R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier. 2002. The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: The role of RNA editing in generating divergence in the process of plant speciation. Molecular Biology and Evolution 19:1602-1612.
Schwender, J., F. Goffman, J. B. Ohlrogge, and Y. Shachar-Hill. 2004. Rubisco without the Calvin cycle improves the carbon efficiency of developing green seeds. Nature 432:779-782.
Sherman, T. D., W. T. Pettigrew, and K. C. Vaughn. 1999a. Structural and immunological characterization of the Cuscuta pentagona L. chloroplast. Plant and Cell Physiology 40:592-603.
Sherman, T. D., W. T. Pettigrew, and K. C. Vaughn. 1999b. Structural and immunological characterization of the Cuscuta pentagona L. chloroplast. Plant and Cell Physiology 40:592-603.
Shikanai, T., T. Endo, M. Hashimoto, Y. Yamada, K. Asada, and A. Yokota. 1998. Directed disruption of the tobacco ndhB gene impairs cyclic electron flow around photosystem I. Proceedings of the National Academy of Sciences of the United States of America 95:9705-9709.
Stefanovic, S., D. F. Austin, and R. G. Olmstead. 2003. Classification of Convolvulaceae: A phylogenetic approach. Systematic Botany 28:791-806.
Stefanovic, S., L. Krueger, and R. G. Olmstead. 2002. Monophyly of the Convolvulaceae and circumscription of their major lineages based on DNA sequences of multiple chloroplast loci. American Journal of Botany 89:1510-1522.
Stefanovic, S., and R. G. Olmstead. 2004. Testing the phylogenetic position of a parasitic plant (Cuscuta, Convolvulaceae, Asteridae): Bayesian inference and the parametric bootstrap on data drawn from three genomes. Systematic Biology 53:384-399.
Swofford, D. L. 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Sinauer Associates, Sunderland, MA.
van der Kooij, T. A. W., K. Krause, I. Dorr, and K. Krupinska. 2000. Molecular, functional and ultrastructural characterisation of plastids from six species of the parasitic flowering plant genus Cuscuta. Planta 210:701-707.
Wakasugi, T., J. Tsudzuki, S. Ito, K. Nakashima, T. Tsudzuki, and M. Sugiura. 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the Black Pine Pinus thunbergii. Proceedings of the National Academy of Sciences of the United States of America 91:9794-9798.

Wimpee, C. F., R. Morgan, and R. L. Wrobel. 1992. Loss of transfer-RNA genes from the plastid 16S-23S ribosomal-RNA gene spacer in a parasitic plant. Current Genetics 21:417-422.

Wolfe, A. D., and C. W. dePamphilis. 1998. The effect of relaxed functional constraints on the photosynthetic gene rbcL in photosynthetic and nonphotosynthetic parasitic plants. Molecular Biology and Evolution 15:1243-1258.

Wolfe, A. D., and C. W. dePamphilis. 1997. Alternate paths of evolution for the photosynthetic gene rbcL in four nonphotosynthetic species of Orobanche. Plant Molecular Biology 33:965-977.

Wolfe, K. H., W. H. Li, and P. M. Sharp. 1987. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proceedings of the National Academy of Sciences of the United States of America 84:9054-9058.

Wolfe, K. H., C. W. Morden, and J. D. Palmer. 1992. Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proceedings of the National Academy of Sciences of the United States of America 89:10648-10652.

Wyman, S. K., R. K. Jansen, and J. L. Boore. 2004. Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20:3252-3255.

Yuncker, T. G. 1932. The Genus Cuscuta. Memoirs of the Torrey Botanical Club 18:113-331.

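Table 1 Plastid Gene Loss Relative to Panax ginseng

Columns: Gene Type; Ipomoea purpurea; Cuscuta exaltata; Cuscuta obtusiflora

NADH dehydrogenase
    Ipomoea purpurea:     (none)
    Cuscuta exaltata:     ndhA, ΨndhB, ndhC, ΨndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
    Cuscuta obtusiflora:  ndhA, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK

Photosystem Protein
    Ipomoea purpurea:     (none)
    Cuscuta exaltata:     (none)
    Cuscuta obtusiflora:  psaI

Ribosomal Protein
    Ipomoea purpurea:     (Ψrpl23?)
    Cuscuta exaltata:     Ψrpl23, Ψrps16
    Cuscuta obtusiflora:  rpl23, rpl32, rps16

Transfer-RNA
    Ipomoea purpurea:     (none)
    Cuscuta exaltata:     trnK-UUU
    Cuscuta obtusiflora:  trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnR-ACG†, trnV-UAC

RNA polymerase
    Ipomoea purpurea:     (none)
    Cuscuta exaltata:     (none)
    Cuscuta obtusiflora:  ΨrpoA, rpoB, rpoC1, rpoC2

Initiation factor
    Ipomoea purpurea:     ΨinfA*†
    Cuscuta exaltata:     ΨinfA*†
    Cuscuta obtusiflora:  infA*†

Unknown
    Ipomoea purpurea:     ycf15
    Cuscuta exaltata:     Ψycf15
    Cuscuta obtusiflora:  ycf15

Intron maturase
    Ipomoea purpurea:     (none)
    Cuscuta exaltata:     (none)
    Cuscuta obtusiflora:  matK†

* Also nonfunctional in Nicotiana tabacum and Atropa belladonna
† Still present in Epifagus virginiana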

FIG. 1.-Circular map of the complete plastid genome of Ipomoea purpurea, comprising an 88,172 bp LSC, a 12,110 bp SSC, and two 30,882 bp IRs. Position one of the annotated sequence begins at the LSC/IRA junction and increases numerically counterclockwise around the genome. Genes on the inside of the circle are transcribed clockwise, those on the outside, counterclockwise. Asterisks mark genes with introns (2 asterisks mark genes with 2 introns), Ψ indicates a pseudogene. INSET-Genomes scaled to relative size: Ipomoea (outermost), Cuscuta exaltata (middle), and C. obtusiflora (innermost).

FIG. 2.-Circular map of the complete plastid genome of Cuscuta exaltata, comprising an 82,721 bp LSC and a 9,250 bp SSC separated by two 16,701 bp IRs. Inversion endpoints are shown with lines connecting the inner circle to the outer. Position one of the annotated sequence begins at the LSC/IRA junction and increases numerically counterclockwise around the genome. Genes on the inside of the circle are transcribed clockwise, those on the outside, counterclockwise. Asterisks mark genes with introns (2 asterisks mark genes with 2 introns), Ψ indicates a pseudogene.

FIG. 3.-Circular map of the complete plastid genome of Cuscuta obtusiflora, comprised of a 50,201 bp LSC and a 6,817 bp SSC separated by 14,131 bp IRs. Position one of the annotated sequence begins at the LSC/IRA junction and increases numerically counterclockwise around the genome. Genes on the inside of the circle are transcribed clockwise, those on the outside, counterclockwise. Asterisks mark genes with introns (2 asterisks mark genes with 2 introns), Ψ indicates a pseudogene.

FIG. 4.-Pairwise dN/dS of Nicotiana and Ipomoea vs. Panax ginseng for all shared protein-coding genes. Genes are ranked left to right by increasing dN/dS for Nicotiana. Genes lost in Cuscuta exaltata and C. obtusiflora are indicated below the graph. All dN and dS values and standard errors are available as a supplemental spreadsheet.

FIG. 5.-Rates of substitution and selection across 4 functionally-defined classes of genes. A- dN estimates and standard errors vs. Panax for Atropa, Nicotiana, Ipomoea, C. exaltata, and C. obtusiflora. B- dS vs. Panax for the same taxa. C- Pairwise dN/dS for the same taxa vs. Panax; Ipomoea, C. exaltata, and C. obtusiflora vs. Nicotiana; C. exaltata and C. obtusiflora vs. Ipomoea; and C. exaltata vs. C. obtusiflora.

FIG. 6.-Phylogenetic trees created using Maximum Likelihood GTR+gamma for each functionally defined gene class. Branches with significantly higher (LRT, p < 0.01) rates of synonymous substitution per site are thickened. Branches with significantly higher dN/dS are marked with one (p < 0.01), two (p < 0.001), or three asterisks (p < 0.0001). Values of dS and dN/dS on relevant branches are given in Supplemental Table 3.

FIG. 7.-(supplemental) Amino acid p-distance for Epifagus, Ipomoea, C. exaltata, and C. obtusiflora vs. Panax across most genes present in Epifagus.

FIG. 8.-(supplemental) dN/dS for all genes, all taxa (including Epifagus) vs. Panax.
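The region lengths quoted in the captions of FIG. 1-3 can be cross-checked with a few lines of arithmetic. The sketch below assumes only that each genome is quadripartite (one LSC, one SSC, and two identical IR copies), as the captions describe; the dictionary layout and function name are illustrative choices, not part of the original analysis. The totals for Ipomoea purpurea and C. obtusiflora agree with the sizes printed on the genome maps (162,046 bp and 85,280 bp); the C. exaltata total is simply computed from its caption.

```python
# Region lengths (bp) copied from the captions of FIG. 1-3.
REGIONS = {
    "Ipomoea purpurea": {"LSC": 88_172, "SSC": 12_110, "IR": 30_882},
    "Cuscuta exaltata": {"LSC": 82_721, "SSC": 9_250, "IR": 16_701},
    "Cuscuta obtusiflora": {"LSC": 50_201, "SSC": 6_817, "IR": 14_131},
}


def genome_size(regions):
    """Total length of a quadripartite plastid genome: LSC + SSC + two IR copies."""
    return regions["LSC"] + regions["SSC"] + 2 * regions["IR"]


# Hypothetical helper table: total genome size per taxon.
SIZES = {taxon: genome_size(r) for taxon, r in REGIONS.items()}

for taxon, size in SIZES.items():
    print(f"{taxon}: {size:,} bp")
```

Read this way, the parasite genomes are smaller in every region, with most of the reduction in C. obtusiflora falling in the LSC.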

[FIG 1. Circular plastid genome map of Ipomoea purpurea, 162,046 bp, with the LSC, SSC, IRA, and IRB regions marked. Legend: Rubisco subunit; Photosystem protein; Cytochrome-related; ATP synthase; NADH dehydrogenase; Ribosomal protein subunit; Ribosomal RNA; Plastid-encoded RNA polymerase; Other; Unknown function / Unnamed (ycf); Pseudogene; Transfer RNA; Intron.]

[FIG 2. Circular plastid genome map of Cuscuta exaltata.]

[FIG 3. Circular plastid genome map of Cuscuta obtusiflora, 85,280 bp, with the LSC, SSC, IRA, and IRB regions marked. Legend as in FIG 1.]

[FIG 4. dN/dS: Nicotiana and Ipomoea for all genes relative to Panax. Axes: Genes (x), dN/dS (y). Series: Ipomoea, Nicotiana. * = Lost in C. exaltata and C. obtusiflora; # = Lost in C. obtusiflora.]

[FIG 5. Panels: A, dN relative to Panax; B, dS relative to Panax; C, Pairwise dN/dS. Each panel is grouped by the four gene classes (atp, pet, ps, rp).]

[FIG 6. Maximum likelihood trees, one panel per gene class: atp, pet, ps, rp, rpo, ndh. Taxa: C. obtusiflora, C. exaltata, Ipomoea, Nicotiana, Atropa, Panax, Spinacia (as available for each class). Scale bars: 0.05 substitutions / site.]

[FIG 7.]

[FIG 8. dN/dS all genes, all taxa vs. Panax. Axes: Genes (x), dN/dS (y). Series: Nicotiana, Ipomoea, C. exaltata, C. obtusiflora, Epifagus.]

Table 2 (Supplemental)

A Pairwise Relative Rates Tests

Taxa compared   C. exaltata vs. C. obtusiflora   C. exaltata vs. Ipomoea   Ipomoea vs. Nicotiana
Outgroup        Ipomoea                          Nicotiana                 Panax
atp             ***                              ***                       0.00377*
pet             ***                              ***                       0.257
ps              ***                              ***                       ***
rp              ***                              ***                       ***
rpo             -                                ***                       ***
ndh             -                                -                         ***

B Relative Ratio Tests

Gene Classes Compared   atp vs. pet   atp vs. ps   atp vs. rp
dN                      0.246         0.00634*     0.0231
dS                      0.186         0.0254       0.116543

Gene Classes Compared   pet vs. ps    pet vs. rp   ps vs. rp
dN                      0.0783        0.0554       ***
dS                      0.983         0.911        0.27

*p<0.01 **p<0.001 ***p<0.0001
All values remain significant after sequential Bonferroni tests as described in Rice, W.R. 1989. Evolution. 43: 223-225.

Table 3 (Supplemental)

Values estimated in unconstrained ML trees

Gene class   Branch                 dN/dS      dS         Synonymous rate increase (x Convolvulaceae)
atp          Globally constrained   0.184      0.205
atp          Solanales              0.0885     0.136
atp          Convolvulaceae         0.133      0.189
atp          Cuscuta                0.821***   0.0206
atp          C. exaltata            0.264**    0.216      1.14
atp          C. obtusiflora         0.231**    0.959***   5.07
pet          Globally constrained   0.114      0.208
pet          Solanales              0.0381     0.131
pet          Convolvulaceae         0.0907     0.135
pet          Cuscuta                0.0606     0.127
pet          C. exaltata            0.245*     0.16       1.19
pet          C. obtusiflora         0.154*     1.037***   7.68
ps           Globally constrained   0.0728     0.191
ps           Solanales              0.0402     0.106
ps           Convolvulaceae         0.046      0.159
ps           Cuscuta                0.138**    0.0847
ps           C. exaltata            0.0859     0.17       1.07
ps           C. obtusiflora         0.103***   0.91***    5.72
rp           Globally constrained   0.227      0.184
rp           Solanales              0.0988     0.106
rp           Convolvulaceae         0.267      0.157
rp           Cuscuta                0.314*     0.0959
rp           C. exaltata            0.390***   0.148      0.94
rp           C. obtusiflora         0.232      0.901***   5.74
rpo          Globally constrained   0.231      0.175
rpo          Solanales              0.144      0.163
rpo          Convolvulaceae         0.218      0.211
rpo          C. exaltata            0.435***   0.253***   1.2
ndh          Globally constrained   0.19       0.223
ndh          Solanales              0.181      0.114
ndh          Ipomoea                0.284***   0.401***

*p<0.01 **p<0.001 ***p<0.0001
Significance of rate differences determined by comparing fully constrained tree likelihood to tree with specified node unconstrained.
All values remain significant after standard or sequential Bonferroni tests as described in Rice, W.R. 1989. Evolution. 43: 223-225.
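The last column of Table 3 is not defined beyond its header, but the tabulated numbers are consistent with a simple ratio: the dS of a Cuscuta terminal branch divided by the dS of the Convolvulaceae branch for the same gene class. The sketch below recomputes the column under that assumption; the data structure and the name `rate_increase` are illustrative, and the dS values are copied from the table.

```python
# dS values copied from Table 3: for each gene class, the Convolvulaceae
# branch value and the values for the Cuscuta terminal branches.
DS = {
    "atp": (0.189, {"C. exaltata": 0.216, "C. obtusiflora": 0.959}),
    "pet": (0.135, {"C. exaltata": 0.16, "C. obtusiflora": 1.037}),
    "ps": (0.159, {"C. exaltata": 0.17, "C. obtusiflora": 0.91}),
    "rp": (0.157, {"C. exaltata": 0.148, "C. obtusiflora": 0.901}),
    "rpo": (0.211, {"C. exaltata": 0.253}),
}


def rate_increase(ds_branch, ds_convolvulaceae):
    """Assumed definition: fold change in dS relative to the Convolvulaceae branch."""
    return ds_branch / ds_convolvulaceae


RATIOS = {
    gene_class: {
        taxon: rate_increase(ds, ds_conv) for taxon, ds in branches.items()
    }
    for gene_class, (ds_conv, branches) in DS.items()
}

for gene_class, by_taxon in RATIOS.items():
    for taxon, ratio in by_taxon.items():
        print(f"{gene_class:4s} {taxon:15s} {ratio:.2f}")
```

Under this reading, the synonymous rate in C. obtusiflora is roughly five to eight times the Convolvulaceae value in every class it retains, whereas C. exaltata stays close to 1.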

Table 4 (Supplemental)

A Intergenic distance between shared, intact coding sequence

Region (flanking genes)      Epifagus   C. obtusiflora   C. exaltata   Ipomoea
rpl20-rps12                  745        720              799           814
trnN-ycf1                    323        113              537           402
rrn4.5-rrn5                  233        83               218           241
trnW-trnP                    225        153              74            169
rps18-rpl20                  213        102              244           257
rrn23-rrn4.5                 188        80               100           101
trnS-rps4                    182        119              310           313
rpl16-rps3                   162        79               159           166
trnD-trnY                    151        54               106           106
rps12-clpP                   138        81               187           187
trnfM-rps14                  133        126              152           144
rpl33-rps18                  115        124              185           193
rps11-rpl36                  108        103              100           94
trnY-trnE                    80         68               70            55
rps19-rpl2                   62         96               64            51
rps7-rps12                   47         48               53            53
trnQ-psbK                    -          350              360           370
trnR-atpA                    -          101              87            108
atpA-atpF                    -          66               61            59
atpF-atpH                    -          202              294           353
atpH-atpI                    -          289              744           1113
atpI-rps2                    -          79               412           239
trnC-petN                    -          347              709           928
petN-psbM                    -          353              497           1197
psbM-trnD                    -          94               459           660
trnE-trnT                    -          200              585           747
trnT-psbD                    -          636              868           1322
psbC-trnS                    -          107              283           250
trnS-psbZ                    -          133              262           351
psbZ-trnG                    -          168              285           278
trnG-trnfM                   -          89               100           270
rps14-psaB                   -          98               100           128
psaB-psaA                    -          19               28            28
psaA-ycf3                    -          349              724           790
ycf3-trnS                    -          166              576           829
rps4-trnT                    -          240              425           457
trnT-trnL                    -          370              660           875
trnL-trnF                    -          113              357           371
trnM-atpE                    -          172              209           211
atpB-rbcL                    -          401              799           414
rbcL-accD                    -          1053             584           899
ycf4-cemA                    -          296              517           616
cemA-petA                    -          170              318           220
petA-psbJ                    -          458              642           909
psbJ-psbL                    -          115              167           137
psbL-psbF                    -          40               25            25
petL-petG                    -          171              257           230
petG-trnW                    -          50               124           131
trnP-psaJ                    -          125              392           543
psaJ-rpl33                   -          114              356           460
clpP-psbB                    -          253              565           511
psbB-psbT                    -          110              163           203
psbT-psbN                    -          60               62            71
psbN-psbH                    -          122              110           116
psbH-petB                    -          120              130           133
petB-petD                    -          132              195           209
rpl36-rps8                   -          206              542           490
rps8-rpl14                   -          178              170           278
rpl14-rpl16                  -          107              145           105
rpl22-rps19                  -          90               62            68
trnV-rrn16                   -          116              220           221
trnL-ccsA                    -          119              158           124
rps15-ycf1                   -          218              207           369

# bp                         -          11714            19353         22762
% change                     -          -49.06%          -15.84%       -1.02%
Only those shared w/ E. v.   3105       2149             3358          3346
% change                     -3.36%     -33.12%          4.51%         4.14%

B Shared pseudogene* sequence relative to Nicotiana

Region         Epifagus   % change   C. obtusiflora   % change   Nicotiana
rrn16-rrn23    2009       -3.32%     341              -83.59%    2078
rps7-trnL      1322       -56.37%    307              -89.87%    3030
trnI-rpl2      471        1.29%      168              -63.87%    465
Total          3802       -31.78%    816              -85.36%    5573

*intergenic spacer regions between conserved genes that once contained one or more genes no longer present in functional form in either species
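The "% change" columns in part B of Table 4 follow from the region lengths themselves. As a minimal sketch, the percentages can be recomputed as the difference from the Nicotiana length divided by the Nicotiana length; the lengths are copied from the table, while the names `ROWS`, `TOTALS`, and `percent_change` are illustrative.

```python
# Region lengths (bp) copied from Table 4, part B.
ROWS = {
    "rrn16-rrn23": {"Epifagus": 2009, "C. obtusiflora": 341, "Nicotiana": 2078},
    "rps7-trnL": {"Epifagus": 1322, "C. obtusiflora": 307, "Nicotiana": 3030},
    "trnI-rpl2": {"Epifagus": 471, "C. obtusiflora": 168, "Nicotiana": 465},
}


def percent_change(length, reference):
    """Percent difference of a region length from the reference (Nicotiana) length."""
    return 100.0 * (length - reference) / reference


# Column totals, matching the bottom row of the table.
TOTALS = {
    taxon: sum(row[taxon] for row in ROWS.values())
    for taxon in ("Epifagus", "C. obtusiflora", "Nicotiana")
}

for region, row in ROWS.items():
    for taxon in ("Epifagus", "C. obtusiflora"):
        change = percent_change(row[taxon], row["Nicotiana"])
        print(f"{region:12s} {taxon:15s} {change:+.2f}%")
```

The same function applied to the column totals reproduces the bottom row of the table: both parasites have lost sequence from these former gene regions, C. obtusiflora far more than Epifagus.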

Table 5 (Supplemental)

Cumulative Codon Usage* for Ipomoea, C. exaltata, and C. obtusiflora (N = number of codons; RSCU = relative synonymous codon usage)

AA    Codon   Ipomoea N   RSCU    C. exaltata N   RSCU    C. obtusiflora N   RSCU
Phe   UUU     907         1.33    769             1.41    649                1.51
Phe   UUC     455         0.67    321             0.59    213                0.49
Leu   UUA     742         1.91    627             1.9     559                2.14
Leu   UUG     467         1.2     424             1.29    292                1.12
Tyr   UAU     649         1.6     517             1.62    392                1.71
Tyr   UAC     160         0.4     123             0.38    67                 0.29
ter   UAA     0           -       0               -       0                  -
ter   UAG     0           -       0               -       0                  -
Ser   UCU     463         1.53    355             1.55    262                1.56
Ser   UCC     291         0.96    214             0.94    138                0.82
Ser   UCA     344         1.14    260             1.14    203                1.21
Ser   UCG     184         0.61    149             0.65    108                0.64
Cys   UGU     211         1.45    150             1.46    112                1.53
Cys   UGC     80          0.55    55              0.54    34                 0.47
ter   UGA     0           -       0               -       0                  -
Trp   UGG     407         1       322             1       264                1

Leu   CUU     523         1.35    405             1.23    323                1.24
Leu   CUC     149         0.38    130             0.39    88                 0.34
Leu   CUA     314         0.81    264             0.8     215                0.82
Leu   CUG     135         0.35    128             0.39    92                 0.35
His   CAU     392         1.49    350             1.48    245                1.54
His   CAC     135         0.51    122             0.52    73                 0.46
Gln   CAA     631         1.56    530             1.52    425                1.51
Gln   CAG     180         0.44    168             0.48    138                0.49
Pro   CCU     347         1.49    263             1.33    207                1.41
Pro   CCC     194         0.84    187             0.95    109                0.74
Pro   CCA     257         1.11    201             1.02    185                1.26
Pro   CCG     131         0.56    139             0.7     88                 0.6
Arg   CGU     284         1.23    242             1.22    188                1.39
Arg   CGC     107         0.46    102             0.51    80                 0.59
Arg   CGA     315         1.37    273             1.37    202                1.49
Arg   CGG     112         0.49    117             0.59    62                 0.46

Ile   AUU     957         1.5     753             1.5     593                1.59
Ile   AUC     389         0.61    271             0.54    189                0.51
Ile   AUA     562         0.88    482             0.96    339                0.91
Met   AUG     511         1       385             1       310                1
Asn   AAU     828         1.54    684             1.5     543                1.53
Asn   AAC     247         0.46    226             0.5     167                0.47
Lys   AAA     938         1.49    860             1.56    697                1.58
Lys   AAG     317         0.51    243             0.44    188                0.42
Thr   ACU     505         1.61    428             1.67    358                1.85
Thr   ACC     257         0.82    210             0.82    121                0.62
Thr   ACA     358         1.14    280             1.09    208                1.07
Thr   ACG     132         0.42    105             0.41    88                 0.45
Ser   AGU     377         1.25    299             1.31    240                1.43
Ser   AGC     155         0.51    93              0.41    57                 0.34
Arg   AGA     410         1.78    326             1.64    207                1.53
Arg   AGG     153         0.66    135             0.68    74                 0.55

Val   GUU     437         1.45    372             1.42    282                1.46
Val   GUC     148         0.49    132             0.5     137                0.71
Val   GUA     448         1.49    378             1.44    235                1.22
Val   GUG     170         0.57    165             0.63    119                0.62
Asp   GAU     698         1.58    571             1.55    399                1.53
Asp   GAC     186         0.42    167             0.45    123                0.47
Glu   GAA     900         1.47    716             1.48    548                1.48
Glu   GAG     326         0.53    254             0.52    194                0.52
Ala   GCU     591         1.83    464             1.73    348                1.61
Ala   GCC     197         0.61    186             0.69    151                0.7
Ala   GCA     353         1.09    292             1.09    253                1.17
Ala   GCG     152         0.47    133             0.49    114                0.53
Gly   GGU     508         1.3     395             1.2     338                1.4
Gly   GGC     171         0.44    173             0.53    126                0.52
Gly   GGA     579         1.48    462             1.4     345                1.42
Gly   GGG     304         0.78    288             0.87    160                0.66

*Calculated across all coding regions for that taxon: total # of codons varies between taxa.
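Table 5 reports RSCU without restating how it is derived. A minimal sketch, assuming the standard definition of relative synonymous codon usage (the observed count of a codon divided by the mean count across all codons encoding the same amino acid), reproduces the tabulated values from the N columns. The function name and the choice of Phe and Leu from Ipomoea as worked cases are illustrative.

```python
def rscu(counts):
    """Relative synonymous codon usage for one amino acid.

    counts maps each synonymous codon to its observed count; the result is
    each count divided by the mean count over the synonymous codons.
    """
    mean = sum(counts.values()) / len(counts)
    return {codon: n / mean for codon, n in counts.items()}


# Codon counts for Ipomoea copied from Table 5.
IPOMOEA_PHE = {"UUU": 907, "UUC": 455}
IPOMOEA_LEU = {
    "UUA": 742, "UUG": 467,
    "CUU": 523, "CUC": 149, "CUA": 314, "CUG": 135,
}

PHE_RSCU = rscu(IPOMOEA_PHE)
LEU_RSCU = rscu(IPOMOEA_LEU)

for codon, value in {**PHE_RSCU, **LEU_RSCU}.items():
    print(f"{codon}: {value:.2f}")
```

Because the values for one amino acid always sum to its number of synonymous codons, an RSCU above 1 marks a codon used more often than expected under equal usage; the A- and U-ending codons dominate in all three taxa.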

Chapter 5: Formatted for submission to American Journal of Botany

SYSTEMATICS AND PLASTID GENOME EVOLUTION OF THE

CRYPTICALLY PHOTOSYNTHETIC PARASITIC PLANT GENUS CUSCUTA

(CONVOLVULACEAE)1

JOEL R. MCNEAL,2,4 KATHIRAVETPILLA ARUMUGANATHAN,3 AND CLAUDE W. DEPAMPHILIS2

2Department of Biology, Huck Institutes of Life Sciences, and Institute of Molecular Evolutionary Genetics, The Pennsylvania State University, University Park, Pennsylvania 16802-5301 USA; and 3Benaroya Research Institute at Virginia Mason, 1201 Ninth Avenue, Seattle, Washington 98101 USA

1Manuscript received XXXXXX 2005; revision accepted XXXXXX 2005.

The authors thank Daniel Austin, Todd Barkman, Mauricio Bonifacino, Alison Colwell, Peter Endress, Andreas Fleischmann, Julian Hibberd, Greg Jordan, Cliff Morden, Lytton Musselman, Ann Rhoads, Kim Steiner, Kyoji Yamada, George Yatskievych, Tony Omeis and the Pennsylvania State University Greenhouse, Tom Wendt and the University of Texas herbarium, and the P.S.U. herbarium for assistance in obtaining plant material, and Jim Leebens-Mack, Dave Geiser, Steve Schaeffer, and Andy Stephenson for critical review of the manuscript. This study was supported by a National Science Foundation Doctoral Dissertation Improvement Grant (DEB-0206659).

4E-mail: [email protected].

ABSTRACT

Members of the genus Cuscuta L. (Convolvulaceae), commonly known as dodders, are epiphytic vines that invade the stems of their hosts with haustorial feeding structures at points of contact. Though they lack expanded leaves, some species are noticeably chlorophyllous, especially as seedlings. Some species are reported as crop pests of worldwide distribution, whereas others are extremely rare with local distributions and apparent niche specificity. A systematic study of this large genus is essential to understand in an evolutionary context the interesting ecological, morphological, and molecular phenomena that occur within these parasites. Here we present a well-supported phylogeny of Cuscuta using sequences of the nuclear ribosomal internal transcribed spacer and plastid rps2, rbcL, and matK from representatives across most of the taxonomic diversity of the genus. We use the phylogeny to interpret morphological and plastid genome evolution within the genus. At least three currently recognized taxonomic sections are not monophyletic, and subgenus Cuscuta is unequivocally paraphyletic. Plastid genes are extremely variable in evolutionary constraint, with rbcL exhibiting even higher levels of purifying selection than in photosynthetic relatives. We conclude that most species of Cuscuta retain photosynthetic ability, primarily for nutrient apportionment to their seeds, while complete loss of photosynthesis is likely limited to a single clade of species of primarily Andean distribution.

Between 150 and 200 described species of Cuscuta exist, distributed widely on every continent except Antarctica (Yuncker, 1932). These parasites have no roots at maturity, and leaves are reduced to minute scales. As such, few morphological characters exist to distinguish and classify species outside of the flower and fruit. Style and stigma morphology, capsule dehiscence, and corolla and calyx shape and size form the basis of existing monographical studies (Choisy, 1841; Engelmann, 1859; Yuncker, 1932).

Engelmann (1859) separated Cuscuta into three subgenera on the basis of style fusion and stigma shape. Members of subgenus Monogyna have the two styles fused for most or all of their length, and consist of thick-stemmed species that commonly parasitize trees and shrubs; subgenera Cuscuta and Grammica have the styles free, with stigmas being globose in subgenus Grammica and elongate in subgenus Cuscuta (Fig. 1). The last full monograph of the genus, completed by Yuncker (1932), recognized 9 species in Monogyna, distributed primarily in Eurasia and Africa with one species, Cuscuta exaltata Engelmann, having a disjunct distribution in the southern United States in scrub habitat of and Texas. The 28 species in subgenus Cuscuta recognized by Yuncker are restricted to, but widely distributed in, the Old World. Subgenus Grammica, with 121 species recognized by Yuncker, is almost completely limited to the New World, with a handful of exceptions in Asia, Africa, and Pacific islands, including Tasmania and Australia.

Engelmann (1859) further divided each of the subgenera into sections based on stigma morphology and capsule dehiscence. Monogyna consists of 2 sections; the first, Callianche, contains only Cuscuta reflexa Roxburgh, defined by its elongate stigmas atop the fused styles. All other members of subgenus Monogyna are relegated to section Monogynella, which have shorter, stouter stigmas. All members of subgenus Monogyna possess a circumscissile capsule as the fruit. Subgenus Cuscuta is subdivided into four sections: section Cleistococca has only one species, Cuscuta capitata Roxburgh, which is distinguished from all other members of subgenus Cuscuta by having an indehiscent capsule as its fruit. Fruits of sections Pachystigma and Epistigma are only irregularly circumscissile, and fruits of section Eucuscuta are always cleanly dehiscent. Section Pachystigma is distinguished from section Epistigma by its members' long, slender styles topped by wider stigmas, whereas members of section Epistigma possess only short to indetectable styles topped by the elongate stigmas. The six species of Pachystigma are restricted to Southern Africa, while the four species of Epistigma and Cuscuta capitata are restricted to central Asia. Section Eucuscuta has a wider distribution, with the largest number of species found close to the Mediterranean Sea. Subgenus Grammica is divided into two sections based on capsule dehiscence, with section Eugrammica possessing complete to partially dehiscent capsules and section Cleistogrammica producing indehiscent capsules. Species of subgenus Grammica are relatively evenly divided between the two sections, with 53 species in section Cleistogrammica and 68 species in Eugrammica (Yuncker, 1932).

Cuscuta is readily recognizable as a genus, with only species in the completely unrelated but amazingly similar parasitic vine genus Cassytha L. () likely to ever cause any confusion (Kuijt, 1969); however, small flowers and a paucity of usable morphological characteristics often make identification of Cuscuta to the species level a challenge. Although no comprehensive taxonomic study of the entire genus has been completed since Yuncker's monograph, Cuscuta remains one of the most widely studied parasitic plant lineages, with numerous publications on its anatomy (Lyshede, 1992), nutritional physiology (Jeschke, Baig, and Hilpert, 1997), plastid evolution (Machado and Zetsche, 1990), and even foraging behavior (Kelly, 1992). Phylogenies of Convolvulaceae with a small sampling of Cuscuta species showed it is confidently nested within that family (Stefanovic, Krueger, and Olmstead, 2002); although its exact placement could not be strongly inferred (Stefanovic and Olmstead, 2004), another phylogeny from plastid matK (McNeal, 2005b) concurs with higher support on a placement sister to a clade referred to as /Convolvuloideae (Stefanovic, Austin, and Olmstead, 2003). Taxa from subgenus Monogyna appeared basal to subgenus Cuscuta and subgenus Grammica in those studies.

Conflicting evidence exists as to photosynthetic ability across the genus. Machado and Zetsche (1990) demonstrated low levels of photosynthetic carbon assimilation in the noticeably chlorophyllous stems of Cuscuta reflexa (subgenus Monogyna) despite apparent loss of all ndh genes (Haberhausen and Zetsche, 1994), but found no detectable levels of rubisco expression in C. europaea, despite the presence of the gene encoding its large subunit (rbcL) in the plastid genome. Studies further showed that C. reflexa only produces chlorophyll in a specific layer of cells isolated from atmospheric gas exchange, suggesting it only photosynthesizes by recycling carbon dioxide released from respiratory byproducts of carbohydrates from its host source (Hibberd et al., 1998). C. pentagona Engelmann of subgenus Grammica was shown to possess proper ratios of chlorophyll a and b, contain properly localized photosynthetic proteins, and display low levels of carbon assimilation (Sherman, Pettigrew, and Vaughn, 1999). However, other members of subgenus Grammica seem to possess highly altered plastid genomes; C. gronovii Willdenow and C. subinclusa Durand et Hilgard seemingly lack plastid-encoded polymerase (rpo) genes (Krause, Berg, and Krupinska, 2003), although low levels of transcription of rbcL still take place from nuclear-encoded polymerase promoter sites (Berg, Krause, and Krupinska, 2004), and these species, along with C. campestris Yuncker and C. reflexa, still possess normal chlorophyll a and b ratios (van der Kooij et al., 2000). By contrast, C. odorata Ruiz et Pavon and C. grandiflora Humboldt, Bonpland, et Kunth are achlorophyllous, lack thylakoids, and do not produce detectable levels of rbcL transcript or protein (van der Kooij et al., 2000). Complete plastid genome sequences of C. exaltata (subgenus Monogyna) and C. obtusiflora Humboldt, Bonpland, et Kunth (subgenus Grammica) revealed that loss of ndh genes probably spans across the genus, with additional loss of ycf15, rpl23, and trnK-UUU in both species (McNeal, 2005c).

In this study, we examined phylogeny of the genus Cuscuta by sampling 35 species from all sections of the genus defined by Engelmann (1859) with the exceptions of section Epistigma and the monospecific section Cleistococca. Our sampling also included species from 19 of 29 subsectional groups recognized by Yuncker (1932). We obtained DNA sequences from two plastid loci (rbcL and rps2) and the nuclear internal transcribed spacer (ITS) region between the 5.8S and 18S ribosomal RNA loci from largely overlapping subsets of taxa to investigate phylogenetic relationships within the genus and test monophyly of the previously defined subgeneric and subsectional delimitations. We determined genome sizes for species available as fresh tissue in order to address questions of species delimitation and to test whether genome size correlates with published chromosome numbers, which are highly variable (Pazy and Plitmann, 1995). In addition to the plastid loci mentioned above, which correspond to the rubisco large subunit and a small ribosomal protein subunit, respectively, we sampled two more plastid loci representing two other functionally distinct genes (atpE, ATP synthase subunit; rpoA, plastid-encoded polymerase subunit) from smaller subsets of taxa in order to test whether all classes of plastid genes are evolving equally in Cuscuta relative to photosynthetic taxa. Using further PCR assays, we tested the distribution of major changes to the plastid genome within the genus and combined them with previously published evidence to gain a comprehensive view of photosynthetic evolution within Cuscuta. Finally, we used evidence from the biology and natural history of these parasites to suggest potential hypotheses as to why photosynthesis is retained in most members of the genus despite what superficially appears to be minimal opportunity for gain of photosynthetic carbohydrate.

MATERIALS AND METHODS

Plant material- Quality of available tissue for different Cuscuta species was variable, but a common method using a typical plant C-TAB DNA isolation (Doyle and Doyle, 1990) with 1% polyethylene glycol (molecular weight 8000) added to the buffer proved effective for plant specimens including live plants grown in the Pennsylvania State University Biology greenhouse, freshly collected wild plants, frozen tissue, silica-gel dried tissue, and small samples from herbarium specimens. For some silica-gel dried material, vouchers were unavailable, and we instead identified species by dissection of rehydrated flowers from the sample. Photographs taken through a dissecting scope of characters necessary for identification are available as vouchers for such species. For two species for which we received neither voucherable material nor flowering and fruiting material for dissection, we verified proper identification of the sample by sequence comparison with vouchered data at loci always variable above the species level. Vouchered specimens were deposited in the Pennsylvania State University Herbarium (PAC). Vouchers, taxon information, and GenBank accession numbers for all sequences are available as a supplemental appendix.

PCR and sequencing- Previously designed primers ITS4 and ITS5 were used for amplification and sequencing of the nuclear ITS locus according to published protocol

(Baldwin, 1992). A few taxa exhibited sequence polymorphisms, particularly in a highly variable loop region (Hershkovitz and Zimmer, 1996), which was not confidently alignable and excluded for analyses. This also often resulted in length polymorphisms that required Topo cloning (Invitrogen, California) and to ensure sequence of only one copy. For all taxa with polymorphic ITS loci, we found no evidence of lineage sorting, as all alleles from a given species always formed a clear clade. We used consensus sequences from multiple clone reads to sort true nucleotide polymorphisms from Taq polymerase error in incorporated PCR fragments. True nucleotide polymorphisms were rare and were entered in the data matrix as the predominant locus in our sample. Only one sequence from each species with identified length polymorphisms was used. rps2 was amplified with primers rps2-661R and either rps2-18F or rps2-47F (dePamphilis,

Young, and Wolfe, 1997) or, for recalcitrant taxa, new primers designed from the more readily generated Cuscuta sequences and the available plastid genome sequences of C. 94 exaltata and C. obtusiflora. rbcL was also amplified using published primer sequences

(Olmstead et al., 1992) or new primers designed specifically for Cuscuta. For some taxa sampled from herbarium specimens, internal primer combinations were used to amplify and sequence the gene in parts when necessary. Amplification across atpE was performed using primers atpB-1277F (Hoot, Culham, and Crane, 1995) and trnF-F (Dumolin-Lapegue, Pemonge, and Petit, 1997); for members of section Eucuscuta, trnT(2)-R (Demesure, Sodzi, and Petit, 1995) was substituted for trnF-F on the basis of an inversion in those taxa verified by this PCR and a PCR from trnF-F to rps4-32F (Nickrent et al., 1997). rpoA or rpoA pseudogenes were amplified and sequenced with a combination of the newly designed primers petD-endF and rps11-C398F. PCR protocols for rps2, rbcL, atpE, and rpoA all followed the rps2 protocol described in dePamphilis et al. (1997). Long PCR assays of intergenic sequences were conducted using the following primer combinations: psbD-40F (Graham and Olmstead, 2000) to trnfM-R (Demesure, Sodzi, and Petit, 1995); trnC-F (Demesure, Sodzi, and Petit, 1995) to psbD-45R (Graham and Olmstead, 2000); and rps4-32F to atpB-s1277F. PCR from psbA-984F to ndhB-13F (Graham and Olmstead, 2000) was used to confirm shrinkage of the inverted repeat in members of subgenus Monogyna. These longer PCR assays were done using 1X Taq Extender Buffer, 0.2 mM of each dNTP, 2.5 mM MgCl2, 3.0 µM of each primer, 0.5 units of Taq DNA Polymerase (Promega, Pittsburgh), 0.5 units of Taq Extender (Stratagene, CA), and approximately 500 ng of template DNA in 50 µl total volume. Amplification was accomplished using a thermal-cycling scheme of an initial 94 °C denaturation for 2 min, followed by 10 cycles of 94 °C for 10 s, 55 °C for 30 s, and 68 °C for 6 min. Sixteen additional cycles were performed under the parameters of 94 °C for 20 s, 55 °C for 30 s, and 68 °C for 6 min, with an additional 20 s added to this extension time each cycle. A final extension at 68 °C for 7 min was also performed. In cases where multiple bands were produced, this process was repeated with the extra MgCl2 removed. All newly designed PCR primers are listed in Supplemental Table 3. All PCR products that were sequenced were cleaned using a Qiaquick PCR Purification Kit (Qiagen, California) or a combination of 5 units of Exonuclease I and 2 units of Shrimp Alkaline Phosphatase (USB) in 10 µl volume incubated at 37 °C for an hour followed by 15 minutes at 80 °C to inactivate the enzymes. Sequencing was performed on a Beckman-Coulter CEQ-8000XL machine following the manufacturer's protocol.
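The long PCR cycling program described above can be tallied with a short script. This is a minimal bookkeeping sketch only: the names `long_pcr_steps`, `total_hold_seconds`, and `increment_first_cycle` are illustrative, and it assumes by default that the 20 s extension increment already applies to the first of the 16 additional cycles, which the text does not specify. Ramp times between temperatures are ignored.

```python
def long_pcr_steps(increment_first_cycle=True):
    """Return the long PCR program as (label, temp_C, seconds) tuples."""
    steps = [("initial denaturation", 94, 2 * 60)]
    # First block: 10 cycles with a fixed 6 min extension.
    for _ in range(10):
        steps += [("denature", 94, 10), ("anneal", 55, 30), ("extend", 68, 6 * 60)]
    # Second block: 16 cycles, extension grows by 20 s per cycle.
    # Assumption: whether the increment starts on cycle 1 or cycle 2 is not stated.
    offset = 1 if increment_first_cycle else 0
    for cycle in range(16):
        extension = 6 * 60 + 20 * (cycle + offset)
        steps += [("denature", 94, 20), ("anneal", 55, 30), ("extend", 68, extension)]
    steps.append(("final extension", 68, 7 * 60))
    return steps


def total_hold_seconds(steps):
    """Sum of programmed hold times, excluding ramping between temperatures."""
    return sum(seconds for _, _, seconds in steps)


if __name__ == "__main__":
    program = long_pcr_steps()
    total = total_hold_seconds(program)
    print(f"{len(program)} steps, {total / 3600:.2f} h of programmed hold time")
```

Passing `increment_first_cycle=False` gives the alternative reading, in which the first of the 16 cycles keeps the plain 6 min extension.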

Phylogenetic Analyses- ITS sequences were initially aligned using Clustal X (Thompson et al., 1997) followed by manual adjustment. Protein-coding plastid sequences were easily aligned by eye, with attention paid to codon alignment in the few areas where gaps existed. A consensus of 500 bootstrap trees was created for each gene individually using maximum parsimony in PAUP*4.0b10 (Swofford, 2002). Aligned datasets contained 684 base pairs (bp) for ITS, 1399 bp for rbcL, and 660 bp for rps2. A combined bootstrap consensus was created using data from these three genes combined with previously reported matK data (1650 aligned bp, 4393 combined bp) (McNeal, 2005b), although because of gene loss or failed amplifications many taxa could not be represented by all four genes. Bayesian posterior probabilities were calculated for each node using MrBayes v3.0b4 (Huelsenbeck and Ronquist, 2001). Four cold chains and one chain heated at the default value were run with swapping according to default settings and a general time-reversible (GTR) likelihood model with gamma and invariant-sites parameters estimated from the data. One million generations were run with sampling every hundredth generation for a total of 10,001 trees. Likelihood estimates were graphically investigated to determine appropriate burn-in values for each gene (201 trees discarded for rps2 and rbcL, 401 trees discarded for ITS, 250 discarded for combined data). Additionally, Neighbor-Joining phylograms and bootstrap values (500 replicates) were generated for each of the three newly reported gene alignments after complete deletion of all gaps and characters missing for any taxa (ITS, 291 bp after deletion; rbcL, 1035 bp; rps2, 522 bp) under the GTR + gamma (0.5) model. C. cuspidata was removed from NJ ITS analyses because of numerous missing characters. Full sequence alignments are available as supplemental material (Appendix).
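The dataset sizes and MCMC sample counts reported above can be cross-checked with simple arithmetic. The sketch below uses only figures stated in this section; the variable names are illustrative.

```python
# Aligned lengths (bp) reported for each locus.
aligned_bp = {"ITS": 684, "rbcL": 1399, "rps2": 660, "matK": 1650}
combined_bp = sum(aligned_bp.values())

# One million generations sampled every 100th generation,
# plus the sample taken at generation 0.
generations = 1_000_000
sample_every = 100
sampled_trees = generations // sample_every + 1

# Trees discarded as burn-in for each analysis.
burn_in = {"rps2": 201, "rbcL": 201, "ITS": 401, "combined": 250}
retained = {name: sampled_trees - n for name, n in burn_in.items()}

# Fraction of each alignment left after complete deletion for the NJ analyses.
nj_bp = {"ITS": 291, "rbcL": 1035, "rps2": 522}
nj_fraction = {gene: nj_bp[gene] / aligned_bp[gene] for gene in nj_bp}

if __name__ == "__main__":
    print("combined alignment:", combined_bp, "bp")
    print("sampled trees:", sampled_trees)
    for name, n in retained.items():
        print(f"{name}: {n} trees retained after burn-in")
    for gene, frac in nj_fraction.items():
        print(f"{gene}: {frac:.0%} of aligned sites kept for NJ")
```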

Genome size estimates- Nuclear genome size estimates and standard errors were measured using rice, soybean, tobacco, barley, or wheat cultivars of known nuclear genome size as standards. Four replicates were done for each plant, with the mean estimates and standard deviations (SD) reported in Table 1. Fresh plant material for these measurements was grown in the Pennsylvania State University Biology greenhouse. Cuscuta seeds were germinated after scarification in concentrated H2SO4 and grown with Impatiens, Coleus, or Linum usitatissimum (for C. epilinum) as hosts. Fresh stem tip tissue was used for all size estimates reported.
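Summarizing four replicate measurements per plant as a mean and standard deviation can be sketched as follows. The replicate values are hypothetical placeholders, not data from Table 1, and `summarize_replicates` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_replicates(estimates_pg):
    """Return (mean, sample SD, standard error) for replicate 2C estimates in pg."""
    m = mean(estimates_pg)
    sd = stdev(estimates_pg)            # sample standard deviation (n - 1)
    se = sd / sqrt(len(estimates_pg))   # standard error of the mean
    return m, sd, se


if __name__ == "__main__":
    # Hypothetical replicate values for one plant.
    replicates = [1.14, 1.17, 1.15, 1.18]
    m, sd, se = summarize_replicates(replicates)
    print(f"mean = {m:.3f} pg/2C, SD = {sd:.3f}, SE = {se:.3f}")
```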

Rates Analyses- Aligned datasets of 12 identical taxa for atpE, rbcL, and rps2 were imported into HYPHY.99beta (Kosakovsky Pond, Frost, and Muse, 2004). A different set of taxa was used for rpoA, which is missing in all sampled members of subgenus Grammica. A user tree based on highly supported nodes of the bootstrap consensus tree in Fig. 2 that was congruent with all single-gene analyses was used for all genes (single-gene trees for atpE and rpoA not shown). Synonymous and nonsynonymous branch lengths were first calculated with no constraints under the MG96 x HKY 3x4 codon model. Next, a tree with all branches constrained to the same nonsynonymous to synonymous distance ratio (dN/dS) was optimized, and a likelihood ratio test (LRT) was performed to determine whether the unconstrained tree had a significantly better likelihood. Likelihood parameters were then re-optimized for trees with dN/dS constrained differently for various clades (i.e., two dN/dS ratios on the tree: one for the subclade being tested, one for the remainder of the tree). Clades examined in this manner for atpE, rbcL, and rps2 were the Convolvulaceae clade (Ipomoea + Cuscuta), all Cuscuta, all Cuscuta but subgenus Monogyna, and the clade comprising the three sampled species of subgenus Grammica. For rpoA, clades examined were Convolvulaceae, Cuscuta, subgenus Cuscuta, and Cuscuta nitida. LRTs were confined to testing only hypotheses of change at these expected nodes rather than performing numerous tests, which would increase the chance of Type I error.
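The likelihood ratio tests described above compare nested models using twice the difference in log-likelihood, referred to a chi-square distribution. The sketch below is a generic illustration and is not HyPhy code: the log-likelihood values are hypothetical, and the degrees of freedom assume one dN/dS parameter per branch in the unconstrained model of an unrooted 12-taxon tree (21 branches), which is an assumption about parameter counting rather than something stated in the text.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed form, integer df)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of Poisson-type terms.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return exp(-half) * total
    # Odd df: erfc term plus a finite sum over half-integer powers.
    total = sum(half ** (i - 0.5) / gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(half)) + exp(-half) * total


def likelihood_ratio_test(lnl_constrained, lnl_unconstrained, df):
    """Return (test statistic, p-value) for two nested models."""
    stat = 2.0 * (lnl_unconstrained - lnl_constrained)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    n_branches = 2 * 12 - 3  # unrooted tree with 12 taxa
    # Hypothetical log-likelihoods: one global dN/dS versus one per branch.
    stat, p = likelihood_ratio_test(-3050.0, -3030.0, df=n_branches - 1)
    print(f"global vs. free ratios: 2dlnL = {stat:.2f}, p = {p:.4f}")
    # Hypothetical log-likelihoods: one global dN/dS versus two ratios.
    stat, p = likelihood_ratio_test(-3050.0, -3041.0, df=1)
    print(f"global vs. two ratios:  2dlnL = {stat:.2f}, p = {p:.6f}")
```

Comparing a two-ratio model with the fully unconstrained model works the same way, with the degrees of freedom reduced by the extra ratio.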

RESULTS

Phylogeny- Figure 2 shows parsimony bootstrap consensus cladograms for all three genes and the 4-gene combined dataset, and Fig. 3 shows the 4-gene bootstrap consensus topology with taxonomic classifications of each of the sampled species displayed to the right. Maximum parsimony bootstrap values (MP) are shown above the nodes and Bayesian posterior probability estimates (PP) are shown below the nodes. The individual gene trees are almost identical in topology, with no well-supported incongruences. Many of the support values are high for individual genes, and almost every node is very well supported in the combined analysis. Furthermore, Neighbor-Joining analyses performed with gapped and missing data removed, which substantially reduced the number of characters available for analysis, again gave nearly congruent topologies that agreed at well-supported nodes (Supplemental Fig. 7). Cuscuta was found to be sister to the /Convolvuloideae clade for two of the genes (matK and ITS), and this placement was very well supported in the combined analysis (MP 92, PP 1.0). Within Cuscuta, subgenus Monogyna was monophyletic (MP 100, PP 1.0), with C. exaltata falling basal among sampled species. Section Monogynella was paraphyletic, with C. reflexa of the monotypic section Callianche nested within (MP 100, PP 1.0). Subgenus Cuscuta was strongly supported as paraphyletic (MP 98, PP 1.0), with Cuscuta nitida Meyer representing section Pachystigma falling sister to subgenus Grammica, a result also supported by a previously reported synapomorphic loss of two transfer RNA genes and loss of introns from ycf3 and atpF (McNeal, 2005b). The two sampled species in section Eucuscuta were monophyletic (MP 100, PP 1.0). Subgenus Grammica was clearly monophyletic (MP 100, PP 1.0), although neither section Eugrammica nor Cleistogrammica was monophyletic (many highly supported nodes). The basal lineage of subgenus Grammica was not clearly resolved, with the consensus showing a clade including subsection Odoratae (C. chilensis Ker-Gawler) with subsection Acutilobae (C. foetida Humboldt, Bonpland, et Kunth) and a clade with subsections Indecorae, Umbellatae, and Leptanthae falling in a polytomy with a clade containing the remainder of the sampled subsections of subgenus Grammica. Subsection Californicae and subsection Tinctoriae were not monophyletic in the combined tree, but the monophyly of all other subsections cannot be disputed by these data.

Genome size results- Genome size estimates were highly variable within Cuscuta and did not appear to be related to previously published chromosome numbers overall. Subgenus Monogyna species, which show, on average, intermediate chromosome numbers between the other two subgenera (Pazy and Plitmann, 1995), have extremely large nuclear genomes according to our results. The low number of plastid clones relative to nuclear clones in a genomic fosmid library used to generate the full plastid genome sequence of Cuscuta exaltata (McNeal, 2005a) helps confirm that this is, indeed, the case. Within subgenus Cuscuta section Eucuscuta, genome sizes of Cuscuta europaea L. and C. epilinum Weihe did appear to correlate with karyotypes and known ploidy levels, with the apparent recent triploid C. epilinum having a genome size consistent with these data relative to C. europaea. Estimated nuclear genome sizes within subgenus Grammica are the most variable, with an estimate for Cuscuta pentagona (1.16 picograms/2C) being the smallest of all sampled species and C. indecora Choisy (65.54 pg/2C) being the largest. There does not appear to be a "normal" genome size within this subgenus, although closely related species in subsection Oxycarpae, subsection Cephalanthae, and subsection Lepidanche all possess proportional nuclear genome sizes, with three size classes perhaps reflecting different ploidy levels. Interestingly, accessions of C. gronovii from different geographic localities showed quite striking differences in genome size, even between two collections made in the state of Pennsylvania. Smaller, secondary peaks were detected in many species, suggesting that these stem tips were growing so rapidly as to have many cells at different stages of mitosis with different overall DNA content depending on phase. Alternatively, the parasites could be undergoing endoreduplication of their genomes, making assessment of their true nuclear genome size difficult.
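The spread of genome sizes reported above, and the comparison of a size ratio with the ratio expected from ploidy, amount to simple arithmetic. In this sketch the two extreme estimates are taken from the text; the helper `matches_ploidy_ratio`, its tolerance, and the example values passed to it are hypothetical, and the helper assumes genome size scales linearly with ploidy.

```python
def fold_difference(larger_pg, smaller_pg):
    """Ratio of two genome size estimates given in pg/2C."""
    return larger_pg / smaller_pg


def matches_ploidy_ratio(size_a_pg, size_b_pg, ploidy_a, ploidy_b, tolerance=0.15):
    """True if size_a/size_b lies within a relative tolerance of ploidy_a/ploidy_b.

    Assumption: genome size is proportional to ploidy; tolerance is arbitrary.
    """
    observed = size_a_pg / size_b_pg
    expected = ploidy_a / ploidy_b
    return abs(observed - expected) / expected <= tolerance


if __name__ == "__main__":
    # Extremes reported for subgenus Grammica (pg/2C).
    c_indecora, c_pentagona = 65.54, 1.16
    print(f"range within Grammica: {fold_difference(c_indecora, c_pentagona):.1f}-fold")
    # Hypothetical triploid versus diploid pair, not measured values.
    print(matches_ploidy_ratio(3.1, 2.0, ploidy_a=3, ploidy_b=2))
```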

Plastid genome variation assays- PCR and sequencing of the region between petD and rps11 showed that taxa across subgenus Grammica contained only residual pseudogene sequence, although the length of the remaining intergenic region was surprisingly constant across those taxa (data not shown). This confirmed previous hybridization data that failed to detect rpo genes (Krause, Berg, and Krupinska, 2003) and showed loss of transcription from known plastid-encoded polymerase promoter sites. PCR data also showed that an inversion detected in the large single copy region of C. exaltata and C. reflexa is a synapomorphy in all sampled species of subgenus Monogyna, as is a constriction of the plastid genome inverted repeat into ycf2. A 2-kilobase inversion in the large single copy region of the plastome was found in both members of subgenus Cuscuta section Eucuscuta. Long PCR covering many intergenic regions demonstrated that the substantial shrinkage of these regions observed in C. obtusiflora (McNeal, 2005c) is shared across subgenus Grammica, with all species in the subgenus seemingly converging on a minimal length (Figure 4). C. lupuliformis shows much less reduction in intergenic regions, with lengths almost identical to those extracted from the full plastid sequence of its relative in subgenus Monogyna, C. exaltata. Members of subgenus Cuscuta show intermediate levels of intergenic constriction, demonstrating that this phenomenon does not completely result from loss of plastid-encoded polymerase, which they still possess. Finally, we attempted to study plastid genes in C. chilensis, an achlorophyllous species related to C. odorata, which appeared to lack rbcL in another study (van der Kooij et al., 2000). Unlike the results from C. odorata, we were unable to amplify rrn16 from C. chilensis using many combinations of primers. Furthermore, hybridization to over 1,500 clones from a genomic fosmid library with various ribosomal proteins and rrn16 returned no positive plastid results. Positive control amplifications and hybridizations to mitochondrial genes showed that DNA quality was not a factor. Although the existence of a plastid genome with dramatically altered gene sequences cannot be unequivocally ruled out, all indications point to the potential absence of a plastid genome in this species. Major changes to the plastid genome reported in this and previous studies are mapped on the cladogram in Fig. 3.

Tests of selective constraint- Unconstrained trees are shown in Fig. 5. Trees with all branches constrained to the same dN/dS were significantly worse than fully unconstrained trees for atpE, rbcL, and rps2 (Table 2), indicating lineage-specific heterogeneity in selective constraint for these genes. No significant difference was observed between the likelihoods of rpoA trees when trees with all branches constrained to identical dN/dS were compared to unconstrained trees. Of the four hypotheses tested for atpE, assigning all Cuscuta a dN/dS independent of the rest of the tree improved the likelihood the most, with the resulting likelihood no longer being significantly different from the fully unconstrained tree. For rbcL, all of the clades tested in the same manner remained significantly worse than the unconstrained tree, with the greatest improvement coming when subgenera Cuscuta and Grammica together were given a separate dN/dS. In this case, as is apparent in the unconstrained tree, dN/dS actually decreases within Cuscuta, with all species under higher levels of purifying selection than the autotrophic outgroups. For rps2, yet a third pattern was observed. Of the hypotheses tested, a change in dN/dS across Convolvulaceae improves the likelihood the most, again to the extent that it is no longer significantly different from the unconstrained tree, suggesting that a relaxation of constraint may have occurred in this gene before the evolution of parasitism. A similar result was found by McNeal et al. with a combined dataset of all ribosomal protein genes from C. exaltata, C. obtusiflora, and Ipomoea purpurea (McNeal, 2005c), and in the independently derived parasitic plant family Orobanchaceae, significant rate increases in rps2 are seen even in photosynthetic lineages before evolution of holoparasitism (dePamphilis, Young, and Wolfe, 1997). For rpoA, there was no significant difference between the fully constrained and fully unconstrained trees to begin with, and no appreciable changes occurred under any of the proposed hypothetical shifts in dN/dS.

DISCUSSION

Morphological and Biogeographical interpretation of phylogeny- Although Yuncker believed morphological features of subgenus Grammica were the ancestral state due to the species-richness of that subgenus and thus rooted his phylogeny there, subsectional relationships within sections largely agree with the phylogenetic relationships proposed by Yuncker based on morphology once the tree is re-rooted to the proper node (Figure 8, supplemental). Groupings found to be non-monophyletic mostly result from interpretation of two morphological characters: stigma morphology and capsule dehiscence. Elongated stigmas appear to be a derived state in C. monogyna, which is nested within a clade of species with much stouter stigmas. In contrast, the globose stigmas seen in subgenus Grammica are apparently derived from elongate stigmas, such as those seen in subgenus Cuscuta. Stigma morphology appears to be quite plastic within the genus, and a full range of intermediates between subgenus Cuscuta and subgenus Grammica exists. Thus, it is not surprising that section Pachystigma, with seemingly transitional stigma morphology, is actually sister to subgenus Grammica. In fact, a species within section Pachystigma, Cuscuta cucullata Yuncker, is so similar to the only member of subgenus Grammica found in South Africa, C. appendiculata Engelmann, that Yuncker mentions it as the only species likely to cause confusion. The existence of these species in South Africa has biogeographical implications for the colonization of the New World by subgenus Grammica through a South African/South American dispersal event. Putatively basal clades of subgenus Grammica are either distributed almost completely in South America (subsection Acutilobae and subsection Odoratae) or contain lineages distributed widely from South to North America (subsection Indecorae and subsection Umbellatae). Interestingly, C. cucullata and C. appendiculata are unique among South African Cuscuta species in having indehiscent capsules, which facilitate floating and water-mediated dispersal in many members of subgenus Grammica section Cleistogrammica. Subgenus Grammica apparently has successfully diversified since reaching the New World, with many more species than either of the other two subgenera. Whether the ancestor of C. exaltata of subgenus Monogyna took a similar route to colonize the New World is unknown, although it, too, shares a morphologically similar relative in South Africa (C. cassytoides Nees von Esenbeck).

While capsule dehiscence was one of the main characters used for monographical work in Cuscuta (Engelmann, 1859; Yuncker, 1932), our phylogenetic analyses indicate it is a transient character in the genus with very little systematic value. Many species of Cuscuta subgenus Grammica possess irregularly dehiscent capsules that are not easily classified as either indehiscent or circumscissile. Two interesting cases of indehiscent-capsuled species being allied to clades with circumscissile capsules are C. tasmanica Engelmann and C. sandwichiana Choisy. These derived members of subgenus Grammica have independently colonized islands far from the home of their Mexican sister-taxa, and both are found in coastal habitats. Again, indehiscent capsules seem to be the key to water dispersal. Other taxa from subgenus Grammica found in the Pacific Rim (C. victoriana Yuncker and C. australis R. Brown) likely took a similar route via indehiscent capsules.

Genome sizes and speciation- Estimates of species number within Cuscuta vary greatly, largely because so few characters exist to distinguish them. The existence of forms with supernumerary chromosomes (Pazy, 1997) and such widely scattered estimates of chromosome numbers in the genus (Pazy and Plitmann, 1995) suggest that polyploid and aneuploid evolution occur rather rapidly in this lineage. Species very similar morphologically may occupy very dissimilar habitat niches and exhibit different host preferences. One such example involves C. pentagona, C. campestris, C. polygonorum Engelmann, and other relatives in subsection Arvenses and subsection Platycarpae. C. campestris is often merged taxonomically with C. pentagona, as the two are distinguished primarily by the minor characters of overall flower size and angularity of the calyx. However, our estimates of genome size between accessions identified as either form differed by almost 10-fold. Estimates for C. polygonorum and C. pentagona differ by almost 50%, although those species have also been merged in at least one treatment (Beliz, 1986). C. polygonorum is identifiable by flowers that are often 4-merous and have a slightly different gynoecial shape than those in C. pentagona. However, the species can usually be distinguished simply by habitat and host preference, which are quite different. In such cases, where forms seem to be ecologically distinct as well as morphologically distinguishable, we suggest species-level distinction is likely warranted given the rapidity with which genome structure changes. Within a single species, Cuscuta gronovii, seemingly different ploidy levels exist. Morphological variation in corolla size and shape exists in this species as well (Fig. 6), indicating that perhaps cryptic species with different chromosome numbers that are incapable of interbreeding may exist. Accelerated rates of nucleotide substitution in the nuclear genome may also promote rapid speciation in subgenus Grammica, if acceleration in ribosomal loci such as ITS (Supplemental Fig. 7) and 18S (Nickrent and Starr, 1994) is proportional to protein-coding rates. As almost all species of Cuscuta produce selfed seed readily even in the absence of pollinators, and pollen is often deposited on the stigma before the corolla even opens, drastic changes in the nuclear genome that prevent outcrossing may promote speciation.

Plastid genome evolution in Cuscuta- Plastid genome evolution in Cuscuta has occurred in a stepwise fashion, with major changes occurring in the ancestor of the genus, the nodes leading up to subgenus Grammica, and within one clade of subgenus Grammica. Across most species of Grammica, plastid genome content appears to have settled on a smaller, but constrained size (e.g., Fig. 4). Different types of genes appear to be evolving under different levels of constraint. Most surprisingly, rbcL appears to be under much greater purifying selection in Cuscuta than in autotrophic relatives. This effect may be due largely to much higher rates of substitution in Cuscuta for the plastid genome (evidenced by branch lengths in Fig. 5, Supplemental Fig. 7) combined with a need for amino acid stasis in rbcL. But why do these parasites retain such strongly conserved photosynthetic genes in the absence of leaves and extensive photosynthetically capable tissues? Hibberd et al. (1998) suggest that recycling of internally respired carbon dioxide may be the answer. However, loss of ndh genes hypothetically would make these parasites more susceptible to photorespiration (McNeal, 2005c) unless extremely high respiratory rates existed near these photosynthetic cells or some other mechanism similar to C4 photosynthesis existed. Why these plants would need to produce carbohydrates, which are readily available from the host, is not known. At least in subgenus Grammica, chlorophyll is concentrated primarily in developing ovaries and ovules, and nearly all species of Cuscuta display this phenotype (Fig. 1). A second pathway involving rbcL in lipid biosynthesis in green seeds of Brassica (Schwender et al., 2004) could suggest a tantalizing explanation for Cuscuta. Seeds often have high lipid content as energy reserves for the seedling, and Cuscuta is no exception; it has been shown to accumulate lipid bodies around the periphery of the embryo which may aid in desiccation tolerance and seed longevity (Lyshede, 1992), traits in which Cuscuta seeds are very successful. Most Cuscuta species are annuals and must be prolific producers of highly energetic seeds to ensure at least some offspring will be able to germinate and survive long enough to search out and attach to a host. A combination of both Rubisco pathways could be in effect as well, particularly in subgenus Monogyna, where photosynthetic cells are concentrated in a layer of stem tissue as well as the gynoecium (Hibberd et al., 1998).

Loss of photosynthesis in Cuscuta- If photosynthesis is so important to seed production in Cuscuta, then why do some species exist that lack chlorophyll and probably rbcL (van der Kooij et al., 2000) (C. chilensis, Fig. 1)? Reproductive biology of the lineages of Cuscuta that contain these species, subsection Odoratae and subsection Grandiflorae, may provide an important clue. Large corolla size (see Cuscuta chilensis, Fig. 6) and strong fragrance characterize members of these subsections. Aside from the names of C. odorata and C. grandiflora, specific epithets of some members of subsection Acutilobae (rubella, bella, purpurata, and foetida), which in our phylogeny is shown to be closely related to Odoratae, further attest to the showiness and fragrance of species in this lineage of mostly Andean species. In our experience with cultivating C. chilensis, it is incapable of producing selfed seed (from over 100 hand-pollinations), while most species readily produce selfed seed without assistance. Crosses with a second individual with a distinct gynoecial phenotype (Fig. 1) also failed to produce seeds. Observations of various natural populations in Chile showed that pollinator visitation was frequent, with Lepidopterans, Hymenopterans, and Dipterans all eagerly entering the flowers and drinking the copious nectar secreted at the base of the corolla by the gynoecium. However, seed set in these natural populations was extremely low, with less than 5% of old flowers containing matured seed capsules. Similar observations have been made involving members of subsection Subulatae, which may be part of the same lineage, in the mountains of Central America; members of this subsection also produce large flowers, exhibit low seed set, and, like Cuscuta chilensis, can survive on the host year-round (Beliz, 1986). Ability to perenniate may explain why these species have less demand for seed set and, thus, are able to survive the cost of low seed set and reap the benefits of self-incompatibility. Alternatively, loss of photosynthesis and subsequent loss of seed set could have driven selection for the life-history characteristics demonstrated by this lineage, although this seems less likely given the immediate deleterious effects. Also, since at least one recent, independently derived case of self-incompatibility exists in C. rostrata, which also exhibits strong fragrance and large flowers (Fig. 6), we see that self-incompatibility can evolve without being driven by photosynthetic loss. C. rostrata remains an annual species, and sets abundant seed when hand-crossed with plants of a different . Wild populations show ample seed set as well. Our results and observations suggest that in-depth study of species in Cuscuta subgenus Grammica, subsections Odoratae, Grandiflorae, Acutilobae, and Subulatae will provide further insight into the evolutionary loss of photosynthesis in this parasitic lineage.

References

BALDWIN, B. G. 1992. Phylogenetic utility of the internal transcribed spacers of nuclear ribosomal DNA in plants: An example from the Compositae. Molecular Phylogenetics and Evolution 1: 3-16.

BELIZ, T. 1986. A revision of Cuscuta section Cleistogrammica using phenetic and cladistic analyses with a comparison of reproductive mechanisms and host preferences in species from California, Mexico, and Central America, University of California, Berkeley, Berkeley.

BERG, S., K. KRAUSE, AND K. KRUPINSKA. 2004. The rbcL genes of two Cuscuta species, C. gronovii and C. subinclusa, are transcribed by the nuclear-encoded plastid RNA polymerase (NEP). Planta 219: 541-546.

CHOISY, J. D. 1841. De Convolvulaceis Dissertatio Tertia. Mem. Soc. Phys. Hist. Nat. Geneve. 9: 261-288.

DEMESURE, B., N. SODZI, AND R. J. PETIT. 1995. A set of universal primers for amplification of polymorphic noncoding regions of mitochondrial and chloroplast DNA in plants. Molecular Ecology 4: 129-131.

DEPAMPHILIS, C. W., N. D. YOUNG, AND A. D. WOLFE. 1997. Evolution of plastid gene rps2 in a lineage of hemiparasitic and holoparasitic plants: Many losses of photosynthesis and complex patterns of rate variation. Proceedings of the National Academy of Sciences of the United States of America 94: 7367-7372.

DOYLE, J. J., AND J. L. DOYLE. 1990. Isolation of plant DNA from fresh tissue. Focus 12: 13-15.

DUMOLIN-LAPEGUE, S., M. H. PEMONGE, AND R. J. PETIT. 1997. An enlarged set of consensus primers for the study of organelle DNA in plants. Molecular Ecology 6: 393-397.

ENGELMANN, G. 1859. Systematic arrangement of the species of the genus Cuscuta, with critical remarks on old species and descriptions of new ones. Trans. of the Academy of Science, St. Louis 1: 453-523.

GRAHAM, S. W., AND R. G. OLMSTEAD. 2000. Utility of 17 chloroplast genes for inferring the phylogeny of basal angiosperms. American Journal of Botany 87: 1712-1730.

HABERHAUSEN, G., AND K. ZETSCHE. 1994. Functional loss of ndh genes in an otherwise relatively unaltered plastid genome of the holoparasitic flowering plant Cuscuta reflexa. Plant Molecular Biology 24: 217-222.

HERSHKOVITZ, M. A., AND E. A. ZIMMER. 1996. Conservation patterns in angiosperm rDNA ITS2 sequences. Nucleic Acids Research 24: 2857-2867.

HIBBERD, J. M., R. A. BUNGARD, M. C. PRESS, W. D. JESCHKE, J. D. SCHOLES, AND W. P. QUICK. 1998. Localization of photosynthetic metabolism in the parasitic angiosperm Cuscuta reflexa. Planta 205: 506-513.

HOOT, S. B., A. CULHAM, AND P. R. CRANE. 1995. The utility of atpB gene sequences in resolving phylogenetic relationships: Comparisons with rbcL and 18S ribosomal DNA sequences in the Lardizabalaceae. Annals of the Missouri Botanical Garden 82: 194-207.

HUELSENBECK, J. P., AND F. RONQUIST. 2001. MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17: 754-755.

JESCHKE, W. D., A. BAIG, AND A. HILPERT. 1997. Sink-stimulated photosynthesis, increased transpiration and increased demand-dependent stimulation of nitrate uptake: Nitrogen and carbon relations in the parasitic association Cuscuta reflexa-Coleus blumei. Journal of Experimental Botany 48: 915-925.

KELLY, C. K. 1992. Resource choice in Cuscuta europaea. Proceedings of the National Academy of Sciences of the United States of America 89: 12194-12197.

KOSAKOVSKY POND, S. L., S. D. W. FROST, AND S. V. MUSE. 2004. HyPhy: hypothesis testing using phylogenies. Bioinformatics: bti079.

KRAUSE, K., S. BERG, AND K. KRUPINSKA. 2003. Plastid transcription in the holoparasitic plant genus Cuscuta: parallel loss of the rrn16 PEP-promoter and of the rpoA and rpoB genes coding for the plastid-encoded RNA polymerase. Planta 216: 815-823.

KUIJT, J. 1969. Biology of Parasitic Flowering Plants. University of California Press, Berkeley and Los Angeles.

LYSHEDE, O. B. 1992. Studies on mature seeds of Cuscuta pedicellata and Cuscuta campestris by electron microscopy. Annals of Botany (London) 69: 365-371.

MACHADO, M. A., AND K. ZETSCHE. 1990. A structural, functional and molecular analysis of plastids of the holoparasites Cuscuta reflexa and Cuscuta europaea. Planta 181: 91-96.

MCNEAL, J. R. 2005a. Chapter 2: "Utilization of partial genomic fosmid libraries for sequencing complete organellar genomes" in Systematics and plastid genome evolution in the parasitic plant genus Cuscuta (dodder). PhD., The Pennsylvania State University, University Park.

______. 2005b. Chapter 3: "Disappearance of introns promotes adaptive change and loss of a highly conserved maturase" in Systematics and plastid genome evolution in the parasitic plant genus Cuscuta (dodder). PhD., The Pennsylvania State University, University Park.

______. 2005c. Chapter 4: "Complete plastid genome sequences suggest strong selection for retention of photosynthetic genes in the parasitic plant genus Cuscuta" in Systematics and plastid genome evolution in the parasitic plant genus Cuscuta (dodder). PhD., The Pennsylvania State University, University Park.

NICKRENT, D. L., AND E. M. STARR. 1994. High rates of nucleotide substitution in nuclear small-subunit (18S) rDNA from holoparasitic flowering plants. Journal of Molecular Evolution 39: 62-70.

NICKRENT, D. L., O. Y. YAN, R. J. DUFF, AND C. W. DEPAMPHILIS. 1997. Do nonasterid holoparasitic flowering plants have plastid genomes? Plant Molecular Biology 34: 717-729.

OLMSTEAD, R. G., H. J. MICHAELS, K. M. SCOTT, AND J. D. PALMER. 1992. Monophyly of the Asteridae and identification of their major lineages inferred from DNA sequences of rbcL. Annals of the Missouri Botanical Garden 79: 249-265.

PAZY, B. 1997. Supernumerary chromosomes and their behaviour in meiosis of the holocentric Cuscuta babylonica Choisy. Botanical Journal of the Linnean Society 123: 173-176.

PAZY, B., AND U. PLITMANN. 1995. Chromosome divergence in the genus Cuscuta and its systematic implications. Caryologia 48: 173-180.

SCHWENDER, J., F. GOFFMAN, J. B. OHLROGGE, AND Y. SHACHAR-HILL. 2004. Rubisco without the Calvin cycle improves the carbon efficiency of developing green seeds. Nature 432: 779-782.

SHERMAN, T. D., W. T. PETTIGREW, AND K. C. VAUGHN. 1999. Structural and immunological characterization of the Cuscuta pentagona L. chloroplast. Plant and Cell Physiology 40: 592-603.

STEFANOVIC, S., AND R. G. OLMSTEAD. 2004. Testing the phylogenetic position of a parasitic plant (Cuscuta, Convolvulaceae, Asteridae): Bayesian inference and the parametric bootstrap on data drawn from three genomes. Systematic Biology 53: 384-399.

STEFANOVIC, S., L. KRUEGER, AND R. G. OLMSTEAD. 2002. Monophyly of the Convolvulaceae and circumscription of their major lineages based on DNA sequences of multiple chloroplast loci. American Journal of Botany 89: 1510-1522.

STEFANOVIC, S., D. F. AUSTIN, AND R. G. OLMSTEAD. 2003. Classification of Convolvulaceae: A phylogenetic approach. Systematic Botany 28: 791-806.

SWOFFORD, D. L. 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Sinauer Associates, Sunderland, MA.

THOMPSON, J. D., T. J. GIBSON, F. PLEWNIAK, F. JEANMOUGIN, AND D. G. HIGGINS. 1997. The ClustalX windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 24: 4876-4882.

VAN DER KOOIJ, T. A. W., K. KRAUSE, I. DORR, AND K. KRUPINSKA. 2000. Molecular, functional and ultrastructural characterisation of plastids from six species of the parasitic flowering plant genus Cuscuta. Planta 210: 701-707.

YUNCKER, T. G. 1932. The Genus Cuscuta. Memoirs of the Torrey Botanical Club 18: 113-331.

Fig. 1. Gynoecia and ovules of species across the taxonomic diversity of Cuscuta. Species in subgenus Monogyna have fused styles, species in subgenus Cuscuta have linear stigmas, and species in subgenus Grammica have globose stigmas. All species examined had chlorophyllous ovules and gynoecia except C. chilensis. Two different flower morphs with gynoecia of different shapes, sizes, and colors were examined.

Fig. 2. Maximum Parsimony consensus trees of 500 bootstrap replicates for plastid rbcL, plastid rps2, nuclear ITS, and all three genes combined with plastid matK. Parsimony bootstrap values are shown above the branches at nodes above 50% support, while Bayesian posterior probabilities are given below the branches.

Fig. 3. Parsimony bootstrap consensus tree (500 replicates) with taxonomic classifications according to Yuncker (1932) to the right of taxon names. Changes to the plastid genome are mapped on nodes.

Fig. 4. Results of long PCR tests to detect differences in intergenic spacer regions. trnfM-CAU to psbD (top), psbD to trnC-GCA (middle), and atpB to rps4 (bottom). Lengths calculated from complete plastid genome sequences of Ipomoea purpurea, Cuscuta exaltata, and C. obtusiflora are shown beneath genes contained within each region.

Fig. 5. Unconstrained maximum likelihood tree estimates for atpE, rbcL, rps2, and rpoA. dN/dS values are shown above and dS values are shown below all branches with overall d > 0.02.

Fig. 6. Floral diversity within the genus Cuscuta, showing differences both within and between species. Ruler marks to the left are millimeters. Cuscuta chilensis and C. rostrata were unable to produce selfed seed when hand-pollinated, whereas all other species readily produced self seed with no assistance. C. coryli flower is rehydrated from an herbarium specimen; all other flowers were collected fresh from the P.S.U. greenhouse.

Fig. 7. Phylograms of individual genes produced by Neighbor-Joining GTR + gamma method with bootstrap values shown above the nodes. All gaps and characters missing for any taxa were excluded from the analysis, and Cuscuta cuspidata was left out of the ITS analysis due to many missing characters unique to it.

Fig. 8. Approximation of phylogenetic inferences suggested by Yuncker (1932) on journal page 116 of his monograph. Taxa included in this study are shown to the right of subsection classifications to which they belong.

Table 1. Genome size and chromosome numbers in Cuscuta

Species                        Nuclear Genome Size (pg/2C)   SD      Published chromosome estimate (2n)

Convolvulaceae
Ipomoea purpurea               1.51                          0.020   30

Subgenus Monogyna
Cuscuta exaltata               41.86                         0.559   ?
Cuscuta lupuliformis           44.93                         0.290   28

Subgenus Cuscuta
Cuscuta epilinum               7.74                          0.177   42
Cuscuta europaea               2.15                          0.046   14

Subgenus Grammica
Cuscuta chilensis              5.73                          0.074   ? (C. odorata = 32)
Cuscuta indecora               65.54                         0.572   30
Cuscuta obtusiflora            1.58                          0.022   ?
Cuscuta polygonorum            1.62                          0.018   ?
Cuscuta campestris             10.83                         0.290   56
Cuscuta pentagona              1.16                          0.023   56, 44
Cuscuta veatchii               5.83                          0.096   ? (C. denticulata = 30)
Cuscuta compacta               15.69                         0.056   30
Cuscuta rostrata               8.12                          0.015   ?
Cuscuta cephalanthi            7.85                          0.029   60
Cuscuta gronovii (NJ)          7.56                          0.129   60
Cuscuta gronovii (OH)          7.17                          0.109   ...
Cuscuta gronovii (C PA)        13.81                         0.074   ...
Cuscuta gronovii (SE PA)       4.37                          0.194   ...
Cuscuta gronovii calyptrata    11.47                         0.130   ...

Table 2. Likelihood Ratio Test comparisons of trees with constrained clades versus fully unconstrained trees.

Constrained branches                       dN/dS   p-value   degrees of freedom

atpE
All                                        0.256   0.040     20
Convolvulaceae                             0.284
All but Convolvulaceae                     0.193   0.049     19
Cuscuta                                    0.323
All but Cuscuta                            0.168   0.120     19
Subgenus Grammica+subg. Cuscuta            0.323
All but subg. Grammica+subg. Cuscuta       0.168   0.000     19
Subgenus Grammica                          0.238
All but subgenus Grammica                  0.264   0.032     19

rbcL
All                                        0.071   0.001     20
Convolvulaceae                             0.057
All but Convolvulaceae                     0.111   0.006     19
Cuscuta                                    0.052
All but Cuscuta                            0.111   0.011     19
Subgenus Grammica+subg. Cuscuta            0.047
All but subg. Grammica+subg. Cuscuta       0.108   0.047     19
Subgenus Grammica                          0.094
All but subgenus Grammica                  0.046   0.011     19

rps2
All                                        0.207   0.003     20
Convolvulaceae                             0.265
All but Convolvulaceae                     0.098   0.127     19
Cuscuta                                    0.249
All but Cuscuta                            0.140   0.012     19
Subgenus Grammica+subg. Cuscuta            0.249
All but subg. Grammica+subg. Cuscuta       0.165   0.005     19
Subgenus Grammica                          0.238
All but subgenus Grammica                  0.192   0.002     19

rpoA
All                                        0.322   0.247     20
Convolvulaceae                             0.357
All but Convolvulaceae                     0.259   0.331     19
Cuscuta                                    0.360
All but Cuscuta                            0.276   0.296     19
Subgenus Cuscuta                           0.333
All but subg. Cuscuta                      0.316   0.203     19
Cuscuta nitida                             0.389
All but C. nitida                          0.313   0.223     19
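The p-values in Table 2 come from likelihood ratio tests. As a minimal sketch, assuming the standard form of the test (twice the difference in log-likelihood between the unconstrained and constrained models, compared with a chi-square distribution on the listed degrees of freedom), the calculation can be illustrated as follows. The function names and the log-likelihood values are hypothetical and are not taken from the thesis, which reports only the dN/dS estimates, p-values, and degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite sum.
    total = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += half ** (i - 0.5) / math.gamma(i + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def lrt_p_value(lnl_constrained: float, lnl_unconstrained: float, df: int) -> float:
    """P-value for a likelihood ratio test between two nested models."""
    statistic = 2.0 * (lnl_unconstrained - lnl_constrained)
    return chi2_sf(statistic, df)


# Hypothetical log-likelihoods, for illustration only (not values from the thesis).
p = lrt_p_value(lnl_constrained=-1520.4, lnl_unconstrained=-1503.9, df=19)
print(f"p = {p:.3f}")
```

A small p-value indicates that the constrained model fits significantly worse than the unconstrained one, so the constraint placed on those branches is rejected.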

Table 3. New primer sequences designed for this study

Primer Name    Sequence (5' to 3')
rbcL-Z1Cus     ATGTCACCACAAACAGARACTAAARC
rbcL-521F      CTATTAAACCWAAATTGGGKTTATC
rbcL-599R      GTAAAATCAAGTCCACCRCGAAG
rbcL-818F      GATTCACTGCAAATACTTCTTTGG
rbcL-910R      GTCTATCAATAACKGCATGCATTG
rbcL-1392R     CTCYTTCCATACCTCACAAGCAG
rps2-J12F      ATATTGGAACATMAAWTTGGAAG
rps2-J662R     CYAATTTGTTMAGAATGAATCG
rps2-J306F     CGGTATGTTAACRAATTGGTCCAC
rps2-J458R     CCCAGATATMTTTGCAAGCGAGC
petD-endF      CAAAATCCATTTCGKCGTCCAG
rps11-C398F    GCCACACAATGGCTGTAGACCTCC
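Several primers in Table 3 contain IUPAC ambiguity codes (R, Y, W, K, M), so each such primer is a mixture of sequences. The sketch below shows how the degeneracy of a primer can be counted and how its variants can be listed. The function names are illustrative and are not part of the thesis methods.

```python
from itertools import product

# Bases represented by each IUPAC nucleotide code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC_BASES[code])
    return count


def expand(primer: str) -> list:
    """List every unambiguous sequence represented by a degenerate primer."""
    choices = [IUPAC_BASES[code] for code in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


# rbcL-Z1Cus from Table 3 has two R (A or G) positions.
primer = "ATGTCACCACAAACAGARACTAAARC"
print(degeneracy(primer))
for variant in expand(primer):
    print(variant)
```

Primers without ambiguity codes, such as rbcL-818F, have a degeneracy of 1.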

Fig. 1

[Figure image not reproduced. Panel labels: Subgenus Monogyna (C. exaltata, C. lupuliformis); Subgenus Cuscuta (C. epilinum, C. europaea, C. nitida); Subgenus Grammica (C. indecora, C. tasmanica, C. suksdorfii, C. obtusiflora, C. compacta, C. gronovii, C. gronovii var. calyptrata, C. rostrata, C. cephalanthi). The final row is labeled C. epilinum, C. rostrata, C. chilensis, and C. chilensis, with ovule panels marked.]

Fig. 2

[Figure image not reproduced. Panels: rbcL, rps2, ITS, and 4 gene combined. Each panel is a bootstrap consensus tree with support values on the branches; the values cannot be recovered from the extracted text.]

Fig. 3

[Figure image not reproduced. The tree lists each taxon with its subsection, section, and subgenus or family classification. Legible annotations mapped on nodes include: haustorial parasitism; 13 KB inversion (trnV-UAC to psbE); 2 KB inversion (trnL-UAG to ccsA); 2 KB inversion (trnT-UGU to trnF-GAA); - rpl23†, ycf15; - psaI, trnR-ACG, rpl32; - rbcL?, loss of photosynthesis?; - rpl2 intron; - infA; +sprA. Other rotated labels are not recoverable from the extracted text.]

† rpl23 appears to evolve as a pseudogene in Ipomoea, although it is present as a full length open reading frame.
* A deletion in the intron may render trnV-UAC a pseudogene in subgenus Monogyna, and, thus, functionally lost in the ancestor of all Cuscuta.

Fig. 4

[Gel images not reproduced.]

Top panel (lanes: lup, epl, nit, ind, ros)
Genes: trnfM-CAU, trnG-GCC, psbZ, trnS-UGA, psbC, psbD
Band size labels: 3700 bp, 3200 bp
Sequence lengths: 3906 bp in Ipomoea purpurea; 3684 bp in C. exaltata; 3252 bp in C. obtusiflora

Middle panel (lanes: lup, epl, nit, ind, ros)
Genes: psbD, trnT-GGU, trnE-UUC, trnT-GGU, trnT-GGU, psbM, petN, trnC-GCA
Band size labels: 4800 bp, 3300 bp
Sequence lengths: 6567 bp in Ipomoea purpurea; 4837 bp in C. exaltata; 3310 bp in C. obtusiflora

Bottom panel (lanes: epl, nit, ind, ros)
Genes: atpB, atpE, *, trnF-GAA, trnL-UAA, trnT-UGU, rps4
Band size labels: 3800 bp, 2700 bp
Sequence lengths: 7869 bp in Ipomoea purpurea; N/A in C. exaltata†; 2756 bp in C. obtusiflora

* trnV-UAC, ndhC, ndhK, and ndhJ are lost in Cuscuta species shown in gel.
† An inversion in all members of subgenus Monogyna prevents PCR of this region.

Fig. 5

[Figure image not reproduced. Panels: rbcL, atpE, rps2, and rpoA. Each panel is a maximum likelihood tree with a scale bar of 0.1 substitutions/site; the branch values cannot be reliably assigned to branches from the extracted text.]

Fig. 6

[Figure image not reproduced. Panels are grouped under Subgenus Monogyna, Subgenus Cuscuta, and Subgenus Grammica. Flower labels: C. exaltata, C. lupuliformis, C. europaea, C. epilinum, C. cephalanthi, C. suksdorfii, C. obtusiflora, C. campestris, C. veatchii, C. tasmanica, C. chilensis, C. coryli, C. compacta, C. gronovii (NJ), C. gronovii (OH), C. gronovii, C. indecora, C. indecora, and C. rostrata, with two flowers additionally marked var. longisepala and var. calyptrata.]

Fig. 7

[Figure image not reproduced. Panels: ITS, rbcL, and rps2. The bootstrap values cannot be assigned to nodes from the extracted text.]

Fig. 8

[Figure image not reproduced. The diagram uses the labels A, B, and C and the annotation "interpreted as".]

Table 4. Synonymous and Nonsynonymous substitutions for all branches of atpE

Shared TV/TS: 0.544

Branch                                 dN      dS      dN/dS
Atropa belladonna                      0.007   0.000   N/A
Cuscuta epilinum                       0.018   0.057   0.317
Cuscuta europaea                       0.047   0.032   1.476
Cuscuta exaltata                       0.063   0.146   0.433
Cuscuta indecora                       0.066   0.451   0.148
Cuscuta nitida                         0.107   0.326   0.327
Cuscuta obtusiflora                    0.023   0.149   0.157
Cuscuta rostrata                       0.032   0.280   0.113
Ipomoea purpurea                       0.030   0.147   0.208
Nicotiana tabacum                      0.007   0.087   0.079
C. epilinum & C. europaea              0.087   0.263   0.332
Solanaceae                             0.047   0.066   0.702
Solanales                              0.009   0.153   0.061
Convolvulaceae                         0.014   0.269   0.051
Cuscuta                                0.020   0.109   0.181
Subgenera Grammica & Cuscuta           0.018   0.000   N/A
C. nitida & Subgenus Grammica          0.039   0.053   0.740
Subgenus Grammica                      0.113   0.419   0.270
C. obtusiflora & C. rostrata           0.087   0.038   2.254
Panax ginseng                          0.071   0.164   0.433
Spinacia oleracea                      0.063   0.606   0.104
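The dN/dS column in Tables 4 through 7 is the ratio of the nonsynonymous estimate to the synonymous estimate for each branch, and it is listed as N/A where dS is 0.000. The sketch below recomputes the ratio from the rounded values printed in Table 4. The function name is illustrative. Because the printed dN and dS values are rounded to three decimals, a recomputed ratio can differ slightly from the tabulated one, which was presumably derived from unrounded estimates.

```python
from typing import Optional


def dn_ds_ratio(dn: float, ds: float) -> Optional[float]:
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


# Rounded branch estimates copied from Table 4 (atpE): (dN, dS).
branches = {
    "Cuscuta epilinum": (0.018, 0.057),
    "Cuscuta europaea": (0.047, 0.032),
    "Atropa belladonna": (0.007, 0.000),
}

for name, (dn, ds) in branches.items():
    ratio = dn_ds_ratio(dn, ds)
    label = "N/A" if ratio is None else f"{ratio:.3f}"
    print(f"{name}: dN/dS = {label}")
```

Ratios well below 1 are consistent with purifying selection on a branch, while ratios near or above 1 on short branches (small dS) should be read cautiously because they rest on very few substitutions.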

Table 5. Synonymous and Nonsynonymous substitutions for all branches of rbcL

Shared TV/TS: 0.559

Branch                                 dN      dS      dN/dS
Atropa belladonna                      0.002   0.031   0.062
Cuscuta epilinum                       0.006   0.076   0.080
Cuscuta europaea                       0.000   0.048   0.000
Cuscuta exaltata                       0.009   0.142   0.060
Cuscuta indecora                       0.020   0.416   0.047
Cuscuta nitida                         0.012   0.178   0.068
Cuscuta obtusiflora                    0.007   0.235   0.030
Cuscuta rostrata                       0.012   0.296   0.039
Ipomoea purpurea                       0.003   0.041   0.074
Nicotiana tabacum                      0.000   0.030   0.000
C. epilinum & C. europaea              0.014   0.291   0.047
Solanaceae                             0.016   0.139   0.118
Solanales                              0.027   0.056   0.486
Convolvulaceae                         0.021   0.177   0.117
Cuscuta                                0.017   0.126   0.138
Subgenera Grammica & Cuscuta           0.007   0.049   0.146
C. nitida & Subgenus Grammica          0.011   0.047   0.240
Subgenus Grammica                      0.013   0.352   0.037
C. obtusiflora & C. rostrata           0.002   0.208   0.009
Panax ginseng                          0.018   0.151   0.121
Spinacia oleracea                      0.033   0.475   0.070

Table 6. Synonymous and Nonsynonymous substitutions for all branches of rps2

Shared TV/TS: 0.439

Branch                                 dN      dS      dN/dS
Atropa belladonna                      0.000   0.018   0.000
Cuscuta epilinum                       0.028   0.051   0.544
Cuscuta europaea                       0.021   0.036   0.601
Cuscuta exaltata                       0.047   0.144   0.329
Cuscuta indecora                       0.164   0.365   0.449
Cuscuta nitida                         0.076   0.260   0.293
Cuscuta obtusiflora                    0.026   0.228   0.115
Cuscuta rostrata                       0.037   0.371   0.100
Ipomoea purpurea                       0.029   0.028   1.050
Nicotiana tabacum                      0.000   0.059   0.000
C. epilinum & C. europaea              0.090   0.452   0.200
Solanaceae                             0.016   0.177   0.089
Solanales                              0.004   0.094   0.040
Convolvulaceae                         0.049   0.120   0.409
Cuscuta                                0.028   0.093   0.297
Subgenera Grammica & Cuscuta           0.010   0.128   0.079
C. nitida & Subgenus Grammica          0.018   0.072   0.249
Subgenus Grammica                      0.097   0.205   0.474
C. obtusiflora & C. rostrata           0.010   0.148   0.068
Panax ginseng                          0.038   0.298   0.126
Spinacia oleracea                      0.077   0.716   0.108

Table 7. Synonymous and Nonsynonymous substitutions for all branches of rpoA

Shared TV/TS: 0.442

Branch                                 dN      dS      dN/dS
Atropa belladonna                      0.022   0.016   1.427
Cuscuta epilinum                       0.018   0.061   0.288
Cuscuta europaea                       0.033   0.067   0.499
Cuscuta exaltata                       0.022   0.009   2.419
Cuscuta japonica                       0.039   0.090   0.431
Cuscuta lupuliformis                   0.025   0.060   0.424
Cuscuta nitida                         0.133   0.361   0.369
Cuscuta reflexa                        0.028   0.060   0.470
Ipomoea purpurea                       0.033   0.058   0.565
Nicotiana tabacum                      0.009   0.069   0.131
Cuscuta                                0.048   0.057   0.833
Subgenus Monogyna                      0.030   0.199   0.153
Subgenus Monogyna above C. exaltata    0.007   0.051   0.129
C. japonica & C. lupuliformis          0.013   0.010   1.340
Subgenus Cuscuta                       0.041   0.090   0.460
C. epilinum & C. europaea              0.113   0.448   0.252
Solanales                              0.026   0.155   0.169
Solanaceae                             0.037   0.133   0.282
Convolvulaceae                         0.053   0.203   0.261
Panax ginseng                          0.056   0.164   0.341
Spinacia oleracea                      0.117   0.493   0.237

Appendix Voucher information and Genbank accession numbers

Species  Voucher  rbcL  rps2  matK  ITS  atpE  rpoA
Cuscuta cuspidata  N/A  N/A  N/A  N/A  AF323744  N/A  N/A
C. gronovii  (PAC) JRM03.1206  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX  N/A  N/A
C. gronovii var. calyptrata  (PAC) JRM03.1102  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. rostrata  (PAC) JRM03.1001  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX  XXXXXXXX  N/A
C. cephalanthi  (PAC) JRM03.1002  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX  N/A  N/A
C. glomerata  (TEX) 00393912  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX  N/A  N/A
C. compacta  (PAC) JRM03.1104  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX  N/A  N/A
C. denticulata  (PAC) CWD98.301  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. veatchii  (PAC) JRM04.0701  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX  N/A  N/A
C. campestris  (PAC) JRM04.0702  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX  N/A  N/A
C. polygonorum  (PAC) JRM03.1207  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. obtusiflora  (PAC) JRM03.0207  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX  XXXXXXXX  N/A
C. salina  (TEX) Halse4961  XXXXXXXX  N/A  N/A  XXXXXXXX  N/A  N/A
C. salina var. apoda  (TEX) Tiehm13405  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. suksdorfii  *  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. subinclusa  (TEX) provance2138  N/A  N/A  XXXXXXXX  N/A  N/A
C. californica  (TEX) van der Werff 1  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. tasmanica  *  XXXXXXXX  N/A  N/A  XXXXXXXX  N/A  N/A
C. tinctoria  (TEX) 00155775  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. applanata  *  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX  N/A  N/A
C. potosina  (TEX) 00155818  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. sandwichiana  (BISH) 2098  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. americana  *  XXXXXXXX  N/A  N/A  XXXXXXXX  N/A  N/A
C. attenuata  N/A  N/A  N/A  N/A  AF348405  N/A  N/A
C. indecora  (PAC) JRM03.1103  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX  XXXXXXXX  N/A
C. coryli  (PAC) 62115  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. leptantha  (TEX) 00394072  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. umbellata  *  N/A  XXXXXXXX  N/A  XXXXXXXX  N/A  N/A
C. foetida  (TEX) Sparre16952  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. chilensis  (PAC) JRM03.0203  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
C. nitida  *  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX
C. epilinum  (PAC) JRM03.1210a  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX
C. europaea  (PAC) JRM03.1101  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX
C. japonica  #  AY101061  XXXXXXXX  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX
C. lupuliformis  (PAC) JRM03.0808  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX
C. reflexa  #  X61698  XXXXXXXX  XXXXXXXX  XXXXXXXX  N/A  XXXXXXXX
C. exaltata  *  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX
Ipomoea purpurea  (PAC) JRM03.1203  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX
Ipomoea quamoclit X coccinea  *  N/A  N/A  N/A  XXXXXXXX  N/A  N/A
Calystegia sepium  (PAC) JRM97.052  AY100992  XXXXXXXX  N/A  XXXXXXXX  N/A  N/A
Jacquemontia tamnifolia  (MO) 00883399  XXXXXXXX  XXXXXXXX  XXXXXXXX  XXXXXXXX  N/A  N/A
Dichondra carolinensis  #  N/A  XXXXXXXX  XXXXXXXX  XXXXXXXX  N/A  N/A
Dichondra occidentalis  N/A  AY101023  N/A  N/A  N/A  N/A  N/A
Humbertia madagascariensis  (MO) 3854462  AY101062  XXXXXXXX  XXXXXXXX  XXXXXXXX  N/A  N/A
Nicotiana tabacum  N/A  NC001879  NC001879  NC001879  AJ492448  NC001879  NC001879
Atropa belladonna  N/A  NC004561  NC004561  NC004561  N/A  NC004561  NC004561
Panax ginseng  N/A  NC006290  NC006290  NC006290  N/A  NC006290  NC006290
Spinacia oleracea  N/A  NC002202  NC002202  NC002202  N/A  NC002202  NC002202

bold = Sequences already deposited on Genbank before this study
XXXXXXXX = Not yet deposited
* Not enough material for herbarium voucher; photographs of dissected flowers used for identification available upon request
# No voucher available; verified by sequence identity to existing sequences on Genbank

Chapter 6:

Future Direction and Conclusion

The research in this thesis provides a method for obtaining plastid or mitochondrial genome sequences from plants for which it would otherwise be difficult or impossible to do so (McNeal, 2005a). It also provides a phylogeny of Cuscuta that is valuable for understanding previous research on the genus in an evolutionary context (McNeal, 2005c). Complete plastid genome sequences of two Cuscuta species and a nonparasitic relative were analyzed (McNeal, 2005b), and through comparison with sequence data of other species, a more comprehensive understanding of plastid genome evolution, photosynthetic ability, and plastid function throughout the genus has been attained. Loss of ndh genes in all Cuscuta species suggests that lipid biosynthesis, rather than carbohydrate production, has probably been the primary purpose of photosynthesis in these parasites since before the diversification of extant species. Colonization of the New World by subgenus Grammica probably occurred through a South African dispersal event. Subgenus Cuscuta is paraphyletic, with a South African clade of species as the sister group to subgenus Grammica. Vast reduction of the plastid genome, including the loss of plastid-encoded polymerase genes and associated intergenic transcription-related sequence, occurred prior to colonization of the New World by subgenus Grammica. Most species diversity occurs in this lineage, and complete loss of photosynthesis is probably limited to one small clade of species predominantly distributed in the mountains of Central and South America (McNeal, 2005c).

Despite advances in knowledge of Cuscuta biology provided by this thesis, these results create many more opportunities for future study. Examination of lipid composition in seeds of photosynthetic and nonphotosynthetic Cuscuta species should confirm the hypothesis of photosynthesis for lipid production in most species in the genus, as well as provide further information on how nonphotosynthetic species accomplish successful seed production in the absence of Rubisco. Studies of the plastid genome in members of subsections Odoratae, Subulatae, Acutilobae, and Grandiflorae could help demonstrate how the steps toward complete loss of photosynthesis occur.

The phylogeny in this thesis demonstrates the need for taxonomic revision of Cuscuta. Subgenus Cuscuta is paraphyletic, and sectional delimitations within subgenus Grammica are completely artificial. Sampling of members of sections in subgenus Cuscuta not included in this study will help resolve how subgeneric boundaries should be outlined to preserve monophyletic groups, and will provide further information on Cuscuta biogeography. Species delimitation is a more difficult question to address and will require extensive population-level sampling, karyotyping, and pollination experiments for questionable taxa.

Finally, studies of other parasitic and mycotrophic groups will lead to a greater understanding of the evolutionary parallels between independently heterotrophic lineages. Using the method detailed in chapter 2, more plastid genomes of parasitic and mycotrophic plants are being produced and will show the similarities and differences in plastid evolution experienced as unrelated plants become heterotrophic. Additionally, further study of Cuscuta chilensis and other parasitic plant species for which proof of a functional plastid genome does not yet exist (Nickrent et al., 1997) will help determine whether plants can exist in the absence of a plastid genome and, if so, how they replace or compensate for the lost role it plays.

References

MCNEAL, J. R. 2005a. Chapter 2: "Utilization of partial genomic fosmid libraries for sequencing complete organellar genomes" in Systematics and plastid genome evolution in the parasitic plant genus Cuscuta (dodder). PhD., The Pennsylvania State University, University Park.
______. 2005b. Chapter 4: "Complete plastid genome sequences suggest strong selection for retention of photosynthetic genes in the parasitic plant genus Cuscuta" in Systematics and plastid genome evolution in the parasitic plant genus Cuscuta (dodder). PhD., The Pennsylvania State University, University Park.
______. 2005c. Chapter 5: "Systematics and plastid genome evolution of the cryptically photosynthetic parasitic plant genus Cuscuta (Convolvulaceae)" in Systematics and plastid genome evolution in the parasitic plant genus Cuscuta (dodder). PhD., The Pennsylvania State University, University Park.
NICKRENT, D. L., O. Y. YAN, R. J. DUFF, AND C. W. DEPAMPHILIS. 1997. Do nonasterid holoparasitic flowering plants have plastid genomes? Plant Molecular Biology 34: 717-729.

VITA

Joel R. McNeal

EDUCATION:
Fall 95 - Spring 99: B. S. (Biology) Vanderbilt University; Nashville, TN
Fall 99 - Spring 05: PhD. Program, Pennsylvania State University; University Park, PA
Summer 03: Organization for Tropical Studies Tropical Plant Systematics Course; Costa Rica

PROFESSIONAL EXPERIENCE:
Fall 96/Spring 97, Fall 98/Spring 99: Undergraduate Work Study in the Lab of Dr. David McCauley, Department of Biology, Vanderbilt University. Greenhouse work, DNA extraction, PCR, and DNA sequencing for various projects, mostly involving population ecology of Silene species.
Fall 97/Spring 98: Molecular Evolution Research Under Dr. Claude dePamphilis, Department of Biology, Vanderbilt University. Survey of diverse plant families to study molecular evolution of a mitochondrial intron and how it is distributed in parasitic plants.
Summer 98: N.S.F. Research Experience for Undergraduates and H.H.M.I. Summer Fellowship. Molecular evolutionary work with parasitic plants. Included DNA sequencing, plant collecting and pressing, and data analysis.

AWARDS / GRANTS:
Fall 99-Spring 00: Eberly College of Science Braddock Fellowship
June 02-June 04: National Science Foundation Doctoral Dissertation Improvement Grant, $10,000 award
Fall 02: Pennsylvania State University Biology Department Outstanding Teaching Assistant Award
Fall 03: Pennsylvania State University Biology Department Henry W. Popp Fellowship

PUBLICATIONS
2000: Barkman, T.J., G. Chenery, J.R. McNeal, J. Lyons-Weiler, W.J. Elisens, A.G. Moore, A.D. Wolfe, and C.W. dePamphilis. "Independent and Combined Analyses of Sequences of All Three Genomes Converge on the Root of Flowering Plant Phylogeny." Proceedings of the National Academy of Sciences. 97:13166-13171.
2001: Musselman, L. J. and J.R. McNeal. "Hydnora triceps (Hydnoraceae): Unique Flowers with an Uncertain Future." Proceedings of the 7th International Parasitic Weeds Symposium: 23-28.

TEACHING EXPERIENCE: (Pennsylvania State University Department of Biology)
Fall 99: BIOL 110: Basic Concepts and Biodiversity (T.A., 2 lab sections)
Spring 00: BIOL 414: Taxonomy of Seed Plants (T.A., 2 lab sections)
Spring 01: BIOL 407: Plant Anatomy (T.A., 1 lab section)
Fall 01: BIOL 110H: Basic Concepts and Biodiversity (T.A., 2 honors lab sections, 1 regular section)
Fall 02: BIOL 448: Field Ecology (T.A.)
May-June 01-04: BIOL 414: Taxonomy of Seed Plants (Co-Instructor 01 and 03, Instructor 02 and 04). 3 credit, 4 week intensive field course covering evolutionary systematics and taxonomy of plants. Website used as systematics text designed by J. McNeal: http://www.bio.psu.edu/Courses/bio414.