


Article in American Journal of Botany · February 2018 DOI: 10.1002/ajb2.1001



RESEARCH ARTICLE

Total duplication of the small single copy region in the angiosperm plastome: Rearrangement and inverted repeat instability in Asarum

Brandon T. Sinn1,3,4, Dylan D. Sedmak1, Lawrence M. Kelly2, and John V. Freudenstein1

Manuscript received 29 August 2017; revision accepted 27 November 2017.

1 The Ohio State University Museum of Biological Diversity, Department of Evolution, Ecology, and Organismal Biology, Columbus, Ohio 43212, USA
2 New York Botanical Garden, Bronx, New York 10458-5126, USA
3 West Virginia University, Department of Biology, Morgantown, West Virginia 26505, USA
4 Author for correspondence (e‑mail: [email protected])

Citation: Sinn, B. T., D. D. Sedmak, L. M. Kelly, and J. V. Freudenstein. 2018. Total duplication of the small single copy region in the angiosperm plastome: Rearrangement and inverted repeat instability in Asarum. American Journal of Botany 105(1): 71–84.
doi: 10.1002/ajb2.1001

PREMISE OF THE STUDY: As more plastomes are assembled, it is evident that rearrangements, losses, intergenic spacer expansion and contraction, and syntenic breaks within otherwise functioning plastids are more common than was thought previously, and such changes have developed independently in disparate lineages. However, to date, the magnoliids remain characterized by their highly conserved plastid genomes (plastomes).

METHODS: Illumina HiSeq and MiSeq platforms were used to sequence the plastomes of Saruma henryi and those of representative species from each of the six taxonomic sections of Asarum. Sequenced plastomes were compared in a phylogenetic context provided by maximum likelihood and parsimony inferences made using an additional 18 publicly available plastomes from early-diverging angiosperm lineages.

KEY RESULTS: In contrast to previously published magnoliid plastomes and the newly sequenced Saruma henryi plastome published here, Asarum plastomes have undergone extensive disruption and contain extremely lengthy AT-repeat regions. The entirety of the small single copy region (SSC) of A. canadense and A. sieboldii var. sieboldii has been incorporated into the inverted repeat regions (IR), and the SSC of A. delavayi is only 14 bp long. All sampled Asarum plastomes share an inversion of a large portion of the large single copy region (LSC) such that trnE-UUC is adjacent to the LSC-IR boundary.

CONCLUSIONS: Plastome divergence in Asarum appears to be consistent with trends seen in highly rearranged plastomes of the monocots and eudicots. We propose that plastome instability in Asarum is due to repetitive motifs that serve as recombinatory substrates and reduce genome stability.

KEY WORDS ; Asarum; cruciform DNA; inverted repeat region; magnoliid; phylogenomics; plastid genome; plastome; small single copy region

The majority of sequenced angiosperm plastid genomes (plastomes) are highly conserved with respect to gene content, gene order (synteny), length, and GC content (Guisinger et al., 2011; Wicke et al., 2011; Ruhlman and Jansen, 2014). Nearly all angiosperm plastomes are functionally tripartite due to the presence of two coevolving, expansive inverted repeat (IR) regions that separate large (LSC) and small single copy (SSC) regions. The IR regions have long been hypothesized to lend stability to the genome (Palmer, 1983; Knox, 2014). With the advent of massively parallel sequencing, our sampling and understanding of plastome diversity across the angiosperm phylogeny have greatly improved (Ruhlman and Jansen, 2014), and the discovery of nonstandard plastomes has become increasingly common among the monocots and eudicots (Chumley et al., 2006; Cai et al., 2008; Haberle et al., 2008; Magee et al., 2010; Guisinger et al., 2011; Ruhlman and Jansen, 2014; Sloan et al., 2014). However, syntenically disrupted and otherwise divergent plastomes have only recently been reported in the magnoliids (Blazier et al., 2016b; Naumann et al., 2016), an early-diverging angiosperm clade that comprises approximately 10,000 species in 20 families (Stevens, 2001) and four orders (Cai et al., 2006; APG IV, 2016). Here, we characterize widespread plastome disruption and its putative causes in one magnoliid lineage, and we present evidence for the first reported incorporation of the entirety of the SSC region into the IR.

The magnoliids represent the largest clade of basal angiosperms, yet the plastome sequences of only 46 representative taxa have been published (Cai et al., 2006; Kuang et al., 2011; Li et al., 2013; Blazier et al., 2016b; Zhou et al., 2017). Relative to other angiosperm

American Journal of Botany 105(1): 71–84, 2018; http://www.wileyonlinelibrary.com/journal/AJB © 2018 Botanical Society of America

lineages, magnoliid plastomes have generated comparatively little interest aside from their use in phylogenetic studies (Jansen et al., 2007; Moore et al., 2010; Soltis et al., 2011; Li et al., 2013; Logacheva et al., 2014; Ruhfel et al., 2014); those sequenced to date are unrearranged and highly conserved in length, gene content, and GC content (Cai et al., 2006; Li et al., 2013; Song et al., 2016), with the exception of a six-gene inversion and IR expansion in Annona cherimola (Annonaceae; Blazier et al., 2016b) and the highly degraded plastome of the parasitic Hydnora visseri (Hydnoraceae; Naumann et al., 2016). Of the more than 1000 land-plant plastomes databased by the National Center for Biotechnology Information (Ruhlman et al., 2017), fewer than 1% are from the approximately 10,000 magnoliid species. Currently, the plastomes of Magnolia are perhaps best characterized, with 18 of the 46 completely sequenced magnoliid plastomes available on GenBank from this woody, slowly evolving group (Kim et al., 2001).

To date, only two syntenically disrupted magnoliid plastomes have been published. The plastome of Annona cherimola (Blazier et al., 2016b) has undergone marked IR expansion and contains an SSC region comprising only two genes, whereas the plastome of the holoparasitic Hydnora visseri (Naumann et al., 2016) is extremely short and gene-sparse. Our discovery of highly variable, lengthy AT-rich repetitive regions in Asarum plastomes (Sinn et al., 2015a) as part of a phylogenetic project led us to investigate whether currently available magnoliid plastomes provide an accurate account of the presumed conservative nature of these organellar genomes. To assess the synteny and repeat content of Asarum plastomes, we sequenced whole plastomes, surveying each taxonomic section of Asarum as well as the sole extant member of the lineage that is sister to Asarum, Saruma henryi. We found that the plastome of S. henryi is syntenically similar to other plastomes reported from the magnoliids, yet is dissimilar from these plastomes due to the development of short AT-rich tandem repeats throughout the genome. Furthermore, the plastomes of Asarum are syntenically disrupted with respect to that of Saruma, and some have experienced shifts in IR boundaries that have resulted in the heretofore unreported incorporation of the entirety of the SSC into the IR. Taken together, these results are evidence that the magnoliids do not represent a genomically homogeneous clade frozen in time.

The growing number and distribution of syntenically disrupted plastomes representing lineages from across the angiosperm phylogeny are evidence that divergence of the content and structure of angiosperm plastomes may not be as infrequent as previously thought. Divergent, yet functional, plastomes have been reported from eudicot and monocot families such as Campanulaceae (Haberle et al., 2008; Knox, 2014), Caryophyllaceae (Sloan et al., 2012), Ericaceae (Fajardo et al., 2013), Fabaceae (Magee et al., 2010; Gurdon and Maliga, 2014), Orchidaceae (Kim et al., 2014), the Alismatales (Ross et al., 2015), Geraniaceae (Chumley et al., 2006; Blazier et al., 2016a), as well as the magnoliid families Annonaceae (Blazier et al., 2016b) and Hydnoraceae (Naumann et al., 2016). Extensive repetitive and low-complexity regions have been reported from the majority of syntenically disrupted plastomes. The relaxation of nuclear-encoded repair mechanisms (Zhang et al., 2016) and aberrant recombination and DNA repair have been implicated in the destabilization of plastome synteny (Guisinger et al., 2008; 2010; 2011; Ruhlman et al., 2017). In light of the syntenic disruption of Asarum plastomes relative to that of the syntenically canonical Saruma plastome, we asked: Can the mechanisms of Asarum plastome rearrangement be inferred from our data using a phylogenetic framework?

Through the comparison of the disrupted plastomes of Asarum with the canonically ordered Saruma plastome, we found that reduction in intergenic spacer complexity, coupled with lengthening of AT-rich motifs, corresponds to a reduction in syntenic conservatism. Specifically, we found that the development and extension of tandem repeats are associated with syntenic breaks and that differential resolution of recombinatory events involving cruciform DNA is likely responsible for the variable presence of the SSC region in Asarum plastomes. Our results provide further evidence that the control of replication fidelity and recombination may together underlie a generalizable mechanism of syntenic disruption of angiosperm plastomes.

MATERIALS AND METHODS

DNA extraction and library preparation

Total DNA was extracted from 1 g of material per sampled species using the CTAB method (Doyle and Doyle, 1987; see Appendix 1 for sample voucher information). To reduce the concentration of residual proteins and RNase, phenol–chloroform–isoamyl alcohol (25:24:1) extractions were used. DNA concentration and purity were quantified using a Nanodrop 2000 spectrophotometer (Thermo Scientific; Waltham, MA, USA).

DNA sequencing and data handling

Massively parallel sequencing of all species was accomplished by multiplexing samples on multiple Illumina (San Diego, CA, USA) HiSeq and MiSeq lanes. MiSeq runs used version 3 chemistry to achieve paired-end reads up to 300 bp long, and HiSeq sequencing produced 100-bp paired-end reads with an average insert size of 560 bp. Library preparation and DNA sequencing were done at the Molecular and Cellular Imaging Center of the Ohio Agricultural Research and Development Center. Geneious versions 7.1, 8.1, and 10 (Drummond et al., 2011) were used for demultiplexing, mate-pair shuffling, and quality trimming of reads; reads were trimmed using an erroneous-nucleotide probability of α = 0.05, corresponding to a Phred score of approximately 13 (Q = −10 log10 0.05 ≈ 13).

Plastome assembly and annotation

A combination of reference-based contig extension, de novo assembly, and Sanger sequencing was used to assemble plastomes. Using the Geneious Read Mapper (Drummond et al., 2011), reads from Saruma henryi were mapped to the Piper kadsura (Lee et al., 2016) plastome using parameters that allowed gaps of up to 3 bp, comprising up to 5% of a given mapped read. Mapped reads, including any read-pair mates that did not map, were then de novo assembled using the Geneious De Novo Assembler (Drummond et al., 2011) using the default Low Sensitivity / Fastest setting, with “use paired read distances to improve assembly” and “only paired hits during assembly” enabled, and with gap and maximum depth disabled. De novo contigs were proofread and trimmed, using a minimum depth of three reads, and their consensus sequences were extended using iterative reference-based mapping of all reads with the Geneious Read Mapper (Drummond et al., 2011). The Geneious de novo

assembler was used to combine contigs that overlapped after multiple rounds of iterative read mapping and extension. Unfortunately, extreme base-pair compositional bias in several AT-rich regions prevented circularization of Asarum plastomes. Primers for Sanger sequencing were developed in an attempt to bridge AT-rich regions that were either not represented in the sequencing libraries or were not recovered by Illumina sequencing, but this was largely unsuccessful and ultimately judged to be unproductive. Primers used for Sanger sequencing are included in Appendix 2. De novo assembly in Discovar de novo (Computational Research and Development Group, Broad Institute, Cambridge, MA, USA), SOAPdenovo2 (Luo et al., 2012), and Velvet (Zerbino and Birney, 2008) failed to circularize or improve the lengths of the contigs assembled iteratively in Geneious.

The Live Annotation feature of Geneious was used for all plastome annotation, with like regions identified by a minimum of 70% similarity to published annotations of the Piper kadsura plastome. The reading frame of each annotated region was visually inspected and hand-adjusted. We selected solutions that maintained open reading frames over those that resulted in pseudogenes. Annotated plastomes are archived on GenBank, and accession numbers can be found in Appendix 3.

Tandem repeat detection

Phobos version 3.3.12 (Mayer, 2006–2010) was used via the Geneious plugin to discover, count, and annotate tandem repeats. Phobos uses a score that is reflective of the quality of a local alignment as an optimality criterion. The tandem repeat search was restricted to perfect repeats between 2 and 1000 bp long and used default score constraints; the “remove hidden repeats” setting was enabled.

Sequence alignment and matrix construction

Eighteen complete plastomes comprising at least one representative of each magnoliid genus for which a complete plastome has been published, including Amborella trichopoda and Chloranthus spicatus as outgroups, were downloaded from GenBank and included in our study (Appendix 3). We constructed three matrices, one comprising our assembled and downloaded GenBank plastomes, a second with only Saruma and Asarum species, and a third with trnH-GUG sequences. One copy of the IR was removed, and in some cases, the SSC was reoriented, before alignment of entire plastomes. Probable assembly errors identified in the IR and SSC of Piper cenocladum were corrected before alignment; the map for this species (Fig. 1) includes these corrections. We chose to align plastomes in their entirety, rather than aligning particular regions or sets of regions and subsequently concatenating them, to minimize biases that could be introduced by arbitrarily selecting bounds of subpartitions to align across genomes. A whole-genome alignment approach allows the opportunity to better visualize and interpret the development of repetitive motifs in spacer regions across the magnoliids. However, this method does not easily accommodate highly divergent plastomes such as that of the parasitic species Hydnora visseri (Naumann et al., 2016), which is only 27 kb long and encodes only 24 genes, whose inclusion would confound the interpretability of recovered patterns in our alignment due to the large numbers of ambiguously aligned sites. Before alignment, contigs from each Asarum species were concatenated such that they matched the synteny of the plastome of Saruma henryi. Three question marks separated each Asarum contig, as well as regions that were split to match the synteny of the Saruma plastome, so that their boundaries could be identified in the final alignment. To arrange the synteny of the Annona cherimola plastome to match that of the typical ancestral synteny assumed for the magnoliids, LASTZ (version 1.02, Harris, 2007) was used to identify the bounds of syntenic breaks; the resulting partitions were then assembled to the plastome of Liriodendron tulipifera, the inversions reversed, and the contigs concatenated. Sequence alignment was conducted in Geneious using the MAFFT (Katoh et al., 2002) plugin with automatic algorithm selection and a 200PAM/k=2 scoring matrix with gap-open penalty and offset values set to 1.53 and 0.123, respectively. Due to the many repetitive regions found in Asarum plastomes, especially in intergenic spacer regions, Gblocks (version 0.91b; Castresana, 2000) was used with the stringent default settings as an objective and repeatable method to remove regions that could not be unambiguously aligned, contained indels, or were flanked by highly variable sequence.

Phylogenetic inference

Maximum likelihood (ML; Felsenstein, 1981) phylogenetic inference was conducted using the Pthreads, high-performance computing version of RAxML 8.2.4 (Stamatakis, 2006; Ott et al., 2007). Tree searches were conducted on each of 1000 randomly constructed topologies, using the GTRGAMMA model with six discrete rate categories. Support values were determined via 1000 bootstrap (BS) replicates.

Maximum parsimony (MP) phylogenetic inference was conducted using PAUP*4 (version 4.0a152, Swofford, 2002). The branch and bound algorithm was used to search for most parsimonious trees. Jackknife (JK; Farris et al., 1996) support values were determined by enabling the “emulate jac” option for 1000 branch and bound search replicates.

Determination of trnH-GUG genomic compartment membership

To determine whether trnH-GUG was present in a region of the plastome that failed to assemble, the total read pools of each species were mapped against the plastome copy of the gene from S. henryi using reference-guided mapping in Geneious and the Low Sensitivity / Fast settings modified as outlined in the Plastome assembly and annotation section. De novo assembly of the matching reads was completed using the Geneious de novo assembler, using the same settings for reference mapping with the addition of the reanalyze threshold set to 0, separate variants with coverage over 6, and “merge homopolymer variants” enabled. Maximum likelihood analysis of an alignment consisting of this region from each species of the Canellales and Piperales used in this study, constructed and analyzed in identical fashion as for ML inference of plastid relationships, was used to infer the genomic compartment housing the recovered copies of the trnH-GUG sequence.

RESULTS

Plastome length and gene content

The plastomes of Asarum and Saruma are highly variable in their length and the ease with which they can be circularized using

FIGURE 1. Phylogenetic relationships inferred by maximum likelihood (ML) and maximum parsimony (MP) frameworks using RAxML (ML) and PAUP* (MP) analyses of the entire plastome, minus a single copy of the IR, data set. The topology of highest likelihood is shown. ML bootstrap and MP jackknife support values below 100 are shown on branches, in that order. Incongruence between the ML topology and the single most parsimonious tree is indicated with a support value of –. The ML topology with proportional branch lengths is shown at bottom left. Red boxes denote duplication of all genes typically found in the SSC region; black boxes denote duplication of at least two genes from the region. Abbreviated physical maps depict SSC–IR and IR–LSC boundaries for the adjacent lineage. Maps are provided for species with divergent plastomes and closely related species to aid interpretation. Plastome gene content and synteny of species without maps differ minimally from the typical angiosperm plastome. Perfect tandem repeat content as a percentage of the total genome length is shown below the name of each species.
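The perfect tandem repeat content reported in Figure 1 is a percentage of total genome length. As a rough illustration of that quantity only, the sketch below finds perfect tandem repeats with a naive regular-expression scan and reports the share of a sequence they cover. It is not the Phobos algorithm used in the study; the function names, the toy sequence, and the small default `max_unit` are assumptions chosen to keep the example short and fast.

```python
import re


def perfect_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=2):
    """Find perfect tandem repeats with a naive left-to-right regex scan.

    Returns (start, end, unit, copies) tuples using 0-based, end-exclusive
    coordinates. Hits never overlap. This is an illustrative stand-in, not
    Phobos; the small max_unit default is a hypothetical choice for speed.
    """
    # A lazily matched unit followed by at least (min_copies - 1) exact copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits


def tandem_repeat_percent(seq, **kwargs):
    """Percentage of the sequence length covered by perfect tandem repeats."""
    covered = sum(end - start for start, end, _, _ in perfect_tandem_repeats(seq, **kwargs))
    return 100.0 * covered / len(seq)


example = "GGATATATATCC"  # toy sequence, not from any plastome
print(perfect_tandem_repeats(example))
print(round(tandem_repeat_percent(example), 1))
```

Because the scan is greedy and reports the shortest unit it can find at each position, its counts would not be expected to match those of a dedicated tool.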

TABLE 1. Comparison of length, total reads used, percentage GC, percentage coding and noncoding sequence, number of genes, number of tRNAs, and length of the small (SSC), large (LSC) single copy, and inverted repeat (IR) regions for each plastome sequenced. Some values are estimates due to noncircularization of the plastome.

Character | Saruma henryi | Asarum epigynum | A. canadense | A. sieboldii | A. minus | A. megacalyx | A. delavayi
Length (sum) of contigs (bp) | 159,914 | 176,682 | 183,617 | 184,923 | 168,938 | 136,106 | 184,279
Number of contigs | 1 | 8 | 7 | 7 | 5 | 10 | 8
Total reads | 950,661 | 1,265,453 | 2,955,635 | 728,079 | 3,123,141 | 993,146 | 301,581
% GC | 38.8 | ~37.6 | ~37.4 | ~37.7 | ~37.9 | ~37.8 | ~37.8
% Coding and structural sequence | 56.8 | ~57.2 | ~57.0 | ~57.8 | ~55.9 | ~55.5 | ~58.0
% Noncoding sequence | 43.2 | ~42.8 | ~43.0 | ~42.2 | ~44.1 | ~44.5 | ~42.0
Number of genes | 130 | ≥137 | ≥143 | ≥147 | ≥133 | >113 | ≥146
5′ Gene regions at IR boundaries | 2 | 2 | 1 | 1 | 2 | ≥2 | 2
Number of tRNAs | 37 | ≥37 | ≥38 | ≥38 | ≥37 | ≥30 | ≥38
LSC length (bp) | 88,643 | >89,484 | >91,459 | >88,243 | >89,255 | >86,887 | >86,187
SSC length (bp) | 19,504 | 10,724 | 0 | 0 | 17,145 | ~15,832 | 14
IR length (bp) | 51,767 | 76,494 | 90,448 | >96,680 | 62,538 | >32,411 | 96,440
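The values in Table 1 allow a quick check of how much of each plastome the IR occupies. The sketch below uses only figures copied from the table; the dictionary layout and function name are illustrative. It assumes that the IR row gives the combined length of both IR copies, which is consistent with the Saruma henryi column, where LSC + SSC + IR equals the total length.

```python
# Values (bp) copied from Table 1. Only species whose total and IR lengths
# are given without ">" or "~" qualifiers are included.
table1 = {
    "Saruma henryi": {"total": 159_914, "lsc": 88_643, "ssc": 19_504, "ir": 51_767},
    "Asarum canadense": {"total": 183_617, "ir": 90_448},
    "Asarum delavayi": {"total": 184_279, "ir": 96_440},
}


def ir_percent(entry):
    """IR length as a percentage of total length.

    Assumes the IR value is the sum of both IR copies (see lead-in).
    """
    return 100.0 * entry["ir"] / entry["total"]


# Consistency check for the one fully circularized plastome.
saruma = table1["Saruma henryi"]
print(saruma["lsc"] + saruma["ssc"] + saruma["ir"] == saruma["total"])

for species, entry in table1.items():
    print(f"{species}: IR = {ir_percent(entry):.1f}% of total length")
```

Under that assumption, the IR makes up roughly a third of the Saruma plastome but about half of the A. canadense and A. delavayi assemblies.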

standard massively parallelized sequencing and assembly techniques. Saruma henryi forms a fully circularized master circle 159,914 bp long. The plastomes of Asarum species could not be circularized. We present sets of contigs and scaffolds for each Asarum species, due to the diminishing returns of completing the circular genomes by designing primers for Sanger sequencing and optimizing PCR protocols for extremely lengthy and troublesome AT-repeat regions that we estimated to have very low melting points. Our best estimates of plastome length, based on sums of contig lengths, ranged from 168,938 bp in A. minus to 184,923 bp in A. sieboldii var. sieboldii. The sum of contig lengths for the remainder of our samples was 176,121 bp in A. epigynum, 183,617 bp in A. canadense, 184,279 bp in A. delavayi, and 136,106 bp in A. megacalyx. The relatively low sum of contig lengths reported for A. megacalyx reflects our inability to confidently demarcate the IR and SSC boundaries. Due to an unusual paucity of reads extending beyond the 3′ portion of ndhF in A. megacalyx, we were unable to assemble sequence downstream of that gene. The position of the IR–SSC boundary in A. megacalyx was inferred through differences in sequencing depth near the 3′ portion of ycf1. Comparisons of plastome length, GC content, and spacer and gene regions are in Table 1. It is important to reiterate that, with the exception of Saruma henryi, the plastome lengths and GC content of certain regions presented are estimates due to our inability to fully circularize these genomes.

Excluding gene duplication events due to IR expansion, the gene content of Asarum and Saruma plastomes is identical to that in published magnoliid plastomes (Cai et al., 2006; Kuang et al., 2011; Li et al., 2013; Zhou et al., 2017). Inverted repeat boundaries are located in sequence representing the 5′ portions of ndhA, rpl2, rpl14, rpl16, rps19, and ycf1 that have been incorporated into the IR.

The tRNA genes trnT-GGU and trnH-GUG were not recovered in any Asarum contig containing other plastome genes. The average nucleotide sequencing depth of contigs containing each of these tRNAs was lower than expected for plastome contigs for each species, which is likely due to their short length, low complexity, and PCR bias during library enrichment. In spite of the different sequencing depth of these contigs, lengthy AT-rich repeats flanking trnT-GGU are consistent with other regions throughout the plastomes presented here. trnH-GUG is normally present in both the mitochondrial genome and the plastome. Sequencing depth of each assembled copy of this gene and relationships inferred in an ML framework together provide evidence that both mitochondrial and plastid copies were recovered from each sampled species (Appendix S1, see Supplemental Data with this article).

Sequence alignment and phylogenetic inference

The alignment of plastomes from across the magnoliids resulted in a data matrix of 173,387 characters, of which 116,547 (67.2%) were invariant, 56,840 (32.8%) were variable, and 33,887 (19.5%) were parsimony informative. Spacer regions contained a relatively large number of characters that were parsimony informative, but the states of many characters could not be confidently assigned due to ambiguous alignment and/or a large amount of missing data. Processing our magnoliid alignment with Gblocks reduced the number of characters to 99,693, of which 70,346 (70.6%) were invariant, 29,347 (29.4%) were variable, and 18,056 (18.1%) were parsimony informative, the relative proportions of which were similar to those of the original alignment.

The alignment of the plastomes of Saruma henryi and Asarum species resulted in a data matrix of 148,143 characters, of which 143,421 (96.8%) were invariant, 4722 (3.2%) were variable, and 720 (0.5%) were parsimony informative. Processing this alignment with Gblocks reduced the number of characters to 128,732, of which 126,170 (97.9%) were invariant, 2562 (1.9%) were variable, and 385 (0.3%) were parsimony informative.

Maximum likelihood and MP analyses of our magnoliid matrix recovered nearly identical optimal topologies; the majority of nodes were recovered in each BS and/or JK replicate (Fig. 1). Chloranthus spicatus is strongly supported as sister to the magnoliids (BS and JK = 100), which comprised two major clades corresponding to the Canellales + Piperales (BS and JK = 100) and the Laurales + Magnoliales (BS and JK = 100). Piper + Aristolochia were recovered as sister to Saruma + Asarum (JK = 100) under the MP optimality criterion. In contrast, Piper was recovered as sister to a clade comprising Aristolochia and Saruma + Asarum under ML (BS = 77). Annona cherimola was recovered as sister to the remainder of the Magnoliales. Each of the four magnoliid orders, the Laurales, Canellales, Magnoliales, and Piperales, was recovered as a clade in all resampled topologies (BS and JK = 100).
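The character counts reported for the magnoliid matrix can be turned into the quoted percentages with simple arithmetic. The sketch below is only a bookkeeping aid with an illustrative function name; it uses the magnoliid counts given in the text before and after Gblocks filtering and checks that invariant plus variable characters equal the matrix length.

```python
def character_summary(total, invariant, variable, informative):
    """Percentages of invariant, variable, and parsimony-informative characters."""
    if invariant + variable != total:
        raise ValueError("invariant + variable characters must equal the total")
    return {
        "invariant": round(100.0 * invariant / total, 1),
        "variable": round(100.0 * variable / total, 1),
        "informative": round(100.0 * informative / total, 1),
    }


# Magnoliid matrix, counts as reported in the text.
full_alignment = character_summary(173_387, 116_547, 56_840, 33_887)
after_gblocks = character_summary(99_693, 70_346, 29_347, 18_056)

print(full_alignment)
print(after_gblocks)
```

The two summaries differ by only a few percentage points in each category, which is the sense in which the proportions after filtering resemble those of the original alignment.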

FIGURE 2. The 3′ portion of ndhF and accompanying downstream sequence in Saruma and Asarum plastomes plotted alongside a topology inferred by both maximum likelihood and maximum parsimony (MP) using Saruma and Asarum plastomes. The alternative positions of A. canadense and A. epigynum in our two most parsimonious trees are depicted. Maximum likelihood bootstrap and MP jackknife support values below 100 are plotted at the nodes, respectively. Possible secondary structures of cruciform DNA in the 3′ portion of ndhF in S. henryi and A. canadense (courtesy of Eric Knox) are shown, and the corresponding regions are annotated in the gene map. Sequence shown in red is located in the SSC region; black sequence indicates sequence in the IR.

Topological incongruence among bootstrap and jackknife replicates was attributable to the poorly supported placement of Aristolochia, and the relationship of either A. epigynum or A. canadense as sister to the remainder of Asarum. Aristolochia and Piper comprised a clade (JK = 100) in the single most parsimonious topology, but Aristolochia is sister to the Saruma + Asarum clade (ML = 77) in the ML tree. Asarum canadense was sister to the remainder of Asarum in 84% and 93% of ML bootstrap and MP jackknife replicates, respectively. The remainder of Asarum was recovered as two clades, Asarum minus + A. delavayi (BS and JK = 100) and A. megacalyx + A. sieboldii (BS and JK = 100).

Analyses of our Saruma and Asarum matrix failed to more confidently resolve relationships within Asarum, relative to analyses of plastomes from across the magnoliids. Asarum canadense was recovered as sister to the remainder of the genus in 65% and 49% of ML bootstrap and MP jackknife replicates, respectively. Maximum parsimony analysis identified two most parsimonious trees. These MP trees differ only in their placement of either A. canadense or A. epigynum as sister to the remainder of the genus.

The IR, SSC, and shifts of their boundaries

The IR boundaries of Asarum plastomes are highly variable and experienced positional shifts at both the SSC and LSC borders. The greatest shift in the position of the LSC-IR boundary involved the incorporation of rps19, rpl22, rps3, and rpl16 into the IR, the boundary of which is found within rpl14. Encroachment of the IR boundary into the LSC is evident in all Asarum plastomes, with the exception of A. canadense where the boundary retracted and is found within the rpl2 intron. In the Saruma plastome, the IR likewise expanded into the LSC, and the boundary is located within rps19; on the basis of the phylogenetic results presented here (Figs. 1, 2), we hypothesize this shift to be the precursor of the destabilization of the trnH-GUG region, and ultimately the inversion of a portion of the LSC. We later discuss this in depth and propose mechanisms underlying the observed plastome disruptions.

The position of the IR-LSC boundary is variable between Asarum and Saruma, and the location of the SSC-IR boundary differs dramatically in Asarum. Differential expansion of the SSC-proximal portion of the IR has disproportionately influenced the length of many Asarum plastomes. Providing critical polarity is the plastome of Saruma henryi, which contains the typical SSC, the boundary of which is found near the 5′ end of ycf1. In Asarum, expansion of the IR into the SSC has resulted in the incorporation of ycf1 through the 5′ portion of ndhA exon 2 in A. epigynum, the 5′ portion of ndhF in A. delavayi, and all genes of the SSC in A. canadense and A. sieboldii into the IR region. Overlapping open reading frames corresponding to ndhF were identified in plastomes where all genes of the SSC were incorporated into the IR (Fig. 2). The formerly separate IR regions of A. canadense and A. sieboldii are continuous and palindromic; the longest of these contiguous IR regions is 96,680 bp in A. sieboldii. The incorporation of the SSC into the IR identified in our assemblies was further documented using relative coverage depth through read mapping (Fig. 3, Appendix S2).

Tandem repeat analysis results

The use of Phobos to discover repetitive motifs (tandem repeats) resulted in the identification of tandem repeats that vary greatly in their number and relative percentage of total genome content in a survey of a sample of syntenically intact and syntenically disrupted plastomes from throughout the angiosperm phylogeny (Table 2). We found that the absolute number of tandem repeats identified is not necessarily reflective of plastome disruption, since the cumulative length of the 216 tandem repeats identified in the highly reduced plastome of Hydnora visseri accounts for 10.8% of

FIGURE 3. Sequencing depth depicted on linear forms of Saruma and Asarum plastomes with one copy of the IR removed. Asarum contigs were split and rearranged to match the synteny of the Saruma plastome. Note the increased sequencing depth indicating the IR regions. Plastome regions where sequencing depth was less than 2× are depicted with red boxes.
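The reasoning behind Figure 3 is that, with one IR copy removed from the reference, reads from both IR copies map to the remaining copy, so IR sequence appears at elevated depth relative to single-copy sequence. The sketch below illustrates that idea on a list of per-window depths. The function name, the 1.5× ratio used to call a window IR-like, and the example depths are all assumptions made for illustration; only the below-2× low-depth flag follows the figure caption.

```python
from statistics import median


def classify_depth(depths, low_cutoff=2, ir_ratio=1.5):
    """Label each window as 'low', 'IR-like', or 'single-copy'.

    low_cutoff mirrors the <2x threshold in the Figure 3 caption.
    ir_ratio is a hypothetical threshold: windows at or above
    ir_ratio * median depth are treated as IR-like.
    """
    baseline = median(depths)
    labels = []
    for depth in depths:
        if depth < low_cutoff:
            labels.append("low")
        elif depth >= ir_ratio * baseline:
            labels.append("IR-like")
        else:
            labels.append("single-copy")
    return labels


# Invented per-window depths, not data from the study.
window_depths = [100, 105, 98, 1, 0, 102, 210, 205, 198, 101]
print(classify_depth(window_depths))
```

A median baseline only works when most windows are single-copy; for assemblies in which the IR makes up about half of the sequence, a different baseline would be needed.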

its total genome length. However, more than 3% of the total length of each plastome was attributable to tandem repeats, with values of 5% or greater identified in some syntenically disrupted plastomes (5.0–10.8%), with the exception of the syntenically disrupted Pelargonium x hortorum (4.0%). The greatest number of tandem repeats was identified in Cypripedium japonicum (1473), accounting for 11.7% of total genome length. The plastomes of Asarum species had the next greatest number of tandem repeats, with the most in Asarum canadense (1005), accounting for 8.1% of the total genome length. An exception to this trend is the syntenically intact plastomes of Saruma henryi, Aristolochia contorta, and Aristolochia debilis, wherein 5.3, 5.9, and 6.0% of the genome length, respectively, is due to tandem repeats.

DISCUSSION

A mechanistic hypothesis for the incorporation of the SSC region into the IR

The IR regions have been hypothesized to confer structural stability to the circularized plastid genome (Palmer and Thompson, 1982). However, many of the largest and most highly rearranged plastomes sequenced to date contain IR regions that have expanded and contain a large portion of what was once the SSC region (Chumley et al., 2006; Blazier et al., 2016b). The longest Asarum plastomes are no exception to this trend, and the IR of some of these species accounts for approximately half of the total plastome length. The increased proportion of large, rearranged plastomes containing expanded IR regions is evidence that in some cases the development of destabilizing features can reduce overall plastome stability more than can be conferred by the IR (Knox, 2014; Blazier et al., 2016a).

Fluctuations in the position of the SSC–IR boundary often involve the 3′ portion of ndhF, a gene close to this boundary that has been implicated in its movement and in the subsequent variable loss of components of the ndh gene family across the orchids (Kim et al., 2015). The 3′ portion of ndhF and the presence of single-copy, downstream sequence vary greatly across the magnoliids, where the reading frame of the former often intrudes into the adjacent copy of the IR (Figs. 1, 2). The variable absence of single-copy sequence downstream of the ndhF stop codon is presumably due to differential resolution of independent recombinatory events throughout the magnoliid lineages. The 3′ portion of ndhF and the presence of single-copy downstream sequence vary most strikingly in Asarum plastomes; this variability can further our understanding of the molecular mechanisms underlying the incorporation of the SSC in the IR.

The development of nearly palindromic sequence in the 3′ portions of ndhF has led to the destabilization of the SSC region. The typical SSC region of S. henryi contains single-copy sequence downstream of ndhF, yet ndhF genes in Asarum have either lost accompanying downstream single-copy sequence or have been var-

TABLE 2. Total number of perfect tandem repeats composed of motifs of in the same region of the SSC-incorporated­ IR of A. canadense 2–1000 bp identified using Phobos from genomes used in this study and from (Fig. 2). The stability of the nearly palindromic motif in S. henryi selected plastomes from across the angiosperm tree of life. Values for Asarum (ATTTACTACGTAGTGAAT) is likely compromised due to an plastomes are estimates; only a single IR from Asarum megacalyx was analyzed A–C pairing within the stem region on the template strand. The to quantify tandem repeats for this species, due to the inability to accurately 3′ ndhF cruciform DNA structures of S. henryi and A. canadense delineate the IR-SSC­ boundary. Rearranged or otherwise divergent plastomes are denoted with an asterisk. differ by only four base pairs, but the variant inA . canadense is distinct in that the destabilizing A-C­ pairing found in the template Plastome Perfect tandem repeat content strand is replaced by an A–T pairing, and the GCA-CGT­ pairing at Length as % of the 3′ side of the branchpoint is absent. We propose that the com- Taxon Total no. genome pletely palindromic IR of A. canadense is the result of a DNA cru- Hydnora visseri* 216 10.8 ciform resolution event involving nonhomologous end-joining­ (see Calycanthus floridus var. glaucus 420 3.3 Lobachev et al., 2007, fig. 2; Kwon et al., 2010). Litsea glutinosa 453 3.6 Despite the apparent instability of the SSC, some Asarum spe- Machilus balansae 461 3.6 cies have retained single-copy­ genes separating the IRs. How- Persea americana 465 3.8 ever, a clear mechanism involving the 3′ portion of ndhF in each Phoebe sheareri 466 3.8 Cinnamomum micranthum f. kanehirae 467 3.9 SSC-­containing species is not apparent. In A. epigynum, a puta- Magnolia liliifera 488 3.8 tive recombinatory event has resulted in the truncation of ndhF Liriodendron chinense 503 3.8 by nine codons, compared to A. canadense. 
The reading frame of Liriodendron tulipifera 511 3.8 ndhF in A. epigynum now runs into an intact coding frame corre- Magnolia officinalis 512 4.0 sponding to 5′ndhA in the adjacent copy of the IR. Contrastingly, Pelargonium alternans* 536 4.9 Asarum minus has retained an intact SSC region and accompa- Piper cenocladum 544 4.3 nying single copy sequence downstream of ndhF. Unfortunately, Illicium oligandrum 549 4.9 the sequence downstream from ndhF in A. minus and S. henryi is Chloranthus spicatus 565 4.6 AT-­rich and impedes our ability to clearly assessment homology Piper kadsura 588 4.8 and the possible retention of this putatively ancestral single copy Drimys granadensis 601 4.8 Vaccinium macrocarpon* 612 4.6 sequence. Saruma henryi 632 5.3 The unique SSC region of A. delavayi was likely established Vitis vinifera 635 5.0 during resolution of a cruciform-­mediated DNA break. The Asarum megacalyx* >641 ~6.8 majority of the SSC region of A. delavayi has been incorpo- Pelargonium ×hortorum* 659 4.0 rated into the IR, resulting in the duplication of all but the last Annona cherimola* 720 5.9 nucleotide of ndhF and 13 other single copy nucleotides. The Aristolochia debilis 726 5.9 14-­bp SSC region of A. delavayi bears no semblance to the SSC Aristolochia contorta 753 6.0 of other magnoliids, because this region is an inversion of an Amborella trichopoda 732 5.6 intact sequence of ndhF codons, rather than retained, ancestral Asarum minus* >888 ~7.7 downstream sequence. Asarum delavayi SSC sequence is a per- Asarum delavayi* >903 ~7.1 Asarum sieboldii* >909 ~7.1 fect inversion of the second and third positions of the Y codon Asarum epigynum* >959 ~8.0 and following VSFF codons that are found in the corresponding Asarum canadense* >1004 ~8.1 portion of ndhF in S. henryi and all other species of Asarum, Cypripedium japonicum* 1473 11.7 with the exception of A. megacalyx. The SSC of A. 
delavayi was likely established independently from other known SSC regions, possibly due to differential resolution of a similar recombinatory event that established the palindromic, SSC-containing­ IR of A. iously incorporated in the IR, the latter of which is often associated canadense. with palindromic sequence. Palindromic, or nearly palindromic, Nucleotide motifs underlying cruciform secondary structures sequence in close proximity can pair and form stem and loop struc- can be modified or lost during resolution of nonhomologous tures on each of the coding and template strands, which collectively recombination (Lobachev et al., 2007). A cruciform DNA structure comprise a cruciform DNA structure. The influence of cruciform is absent from the 3′ portion of ndhF contained within the palin- DNA structures on processes ranging from translation to disease dromic IR of A. sieboldii and may have been deleted during reso- pathology has been well documented in diverse lineages (see Brázda lution of a recombinatory event. In A. sieboldii, ndhF is truncated et al., 2011). Cruciform DNA structures are also known to reduce such that a stop codon is now found at the position where the cruci- chromosomal stability and serve as hotspots for rearrangements form structure presumably once began. We hypothesize that a ndhF (Lobachev et al., 2007), roles that hold critical explanatory power cruciform structure was also responsible for the destabilization of with regards to the SSC region in Asarum plastomes. the A. sieboldii SSC, but that the structure itself did not persist after The variable length and presence of the SSC region throughout the recombination event. Asarum is best explained by recombination involving cruciform A lack of high-quality­ reads spanning the 3′ portions of ndhF DNA structures. Intact cruciform structures of varying degrees of in A. megacalyx obscures the nature of the SSC-IR­ boundary. The stability are evident in the 3′ portions of ndhF in S. 
henryi and A. last 13 codons of ndhF in A. megacalyx are highly divergent from canadense. A nearly palindromic cruciform DNA structure is pres- the 3′ portions of ndhF in S. henryi and other Asarum species, ent in the 3′ portion of ndhF in the typical SSC of S. henryi, while a and few high-quality­ Illumina reads span the stop codon of ndhF, completely palindromic and putatively more stable variant is found prohibiting the confident assembly of this region. Reads extend- January 2018, Volume 105 • Sinn et al.—Plastome rearrangement and IR instability in Asarum • 79

taining A. canadense when nuclear or plastid data were analyzed alone or in combination. Three of the six Asarum plastomes sequenced have IR regions that contain nearly all of the nucleotides that are usually found in the SSC. When these states are mapped onto our phy- logram, three independent incorporations of the SSC into the IR is the most parsimonious reconstruction of events (Fig. 1).

A mechanistic hypothesis of the destabiliza- tion and rearrangement of the LSC

We propose that the plastome rearrangement in Asarum occurred because of destabilization of the trnH-­GUG and trnT-­GGU regions due to the generation of low-­complexity sequence, followed by recombination resulting in the inversion of a large portion of the LSC before the diversification of Asarum. Our proposed model is outlined in Fig. 5. The propagation of a repetitive motif caused by a shift of the IR-LSC­ boundary likely destabilized the trnH-­GUG spacer region and ultimately resulted in the inver- sion of a portion of the LSC in the ancestral Asarum plastome. In Saruma, the IR bound- ary is located within a region corresponding to the 5′ portion of rps19. We hypothesize that the low-­complexity trnH-­GUG region FIGURE 4. Possible structure and functional compartmentalization and Saruma and Asarum is due to the establishment of an AAT repeat, plastomes, based on stereotypical plastome circularization. which developed into AT-rich­ motifs, that propagated from within 5′rps19 in the LSC-­ ing downstream­ of ndhF contain palindromic sequences similar proximal portion of the IR of Saruma henryi (Fig. 6A). This motif to those found in the read pools of A. canadense and A. sieboldii, is also found upstream of trnH-­GUG in Saruma henryi, but an yet the low relative coverage depth of this region is not indica- accompanying low-­complexity progenitor to this repeat is not pres- tive of duplication of ndhF. This palindromic sequence has been ent in the plastomes of Piper or Drimys sampled here. Alignment added to the A. megacalyx map in Fig. 2, but we caution that this of the 3′ end of the rps19 pseudogene and the region downstream sequence may not accurately characterize the majority of plastid of trnH-­GUG reveals that these two sequences are highly similar genomes in this species in vivo. The lack of high-­quality reads (Fig. 6B, C). 
The sequence separating these direct repeats is com- spanning the SSC-IR­ boundary could be due to the presence of posed solely of A or T, with the exception of the span containing low-complexity­ sequence dropout due to PCR or sequencing trnH-­GUG. Due to the presumably stochastic nature of the prop- bias. agation of such low complexity motifs, we propose that the repeti- The plastomes of someAsarum species are functionally bipartite tive, low-complexity­ direct repeats could have extended or become (Fig. 4), due to the incorporation of the SSC into the IR. Although even more numerous between the common ancestor of Asarum and unipartite plastomes have been reported from the eudicots, which Saruma and extant Asarum. due to the loss of one copy of the IR have lost their functionally Direct and inverted repeats, which are present in the flanking tripartite nature (see Ruhlman and Jansen, 2014, and Knox, 2014 regions of trnH-­GUG and trnT-GGU­ in the S. henryi plastome, for thorough reviews), the plastomes described here are the first have long been known to undergo recombination (Kolodner and to owe their reduction in functional compartmentalization to IR Tewari, 1979; Palmer, 1983). In addition to the low-complexity­ expansion rather than IR reduction. sequence shown in Fig. 6, a search in the Saruma henryi plas- Our phylogenetic results (Figs. 1, 4) indicate three independent tome for a motif commonly found in this repetitive spacer region events leading to the incorporation of at least a portion of all for- (AATAT) recovered three direct repeats, two 75 bp and 64 bp merly single copy SSC genes into the IR. With the exception of the downstream of trnH-GUG­ and the other 175 bp upstream of trnT-­ poorly supported relationship of A. epigynum to the remainder of GGU. 
The development of both localized and dispersed poten- Asarum, the relationships presented here are consistent with those tial recombination substrates in Saruma and the maintenance inferred from the analysis of several plastid loci, but are incongruent and duplication of these same repetitive motifs in the flanks of with those inferred using nuclear rDNA sequence (Kelly, 1998; Sinn syntenically-­disrupted regions in Asarum plastomes are evidence et al., 2015a). Sinn et al. (2015a, b) variously recovered A. epigynum that proliferation of tandem repeats and recombination can dis- as sister to the remainder of the genus or as sister to a clade con- rupt plastome stability. 80 • American Journal of Botany
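Two simple sequence checks underlie the arguments in this Discussion: whether a motif can pair with itself as a hairpin stem, and where a short motif recurs in direct or inverted orientation. The following is a minimal sketch of both, not the analysis used in this study. The function names are hypothetical, the stem check treats the motif as a loop-less hairpin (a simplification of the folding shown in Fig. 2), and `TOY_SEQ` is an invented sequence; only the S. henryi motif and the AATAT motif are taken from the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def stem_mismatches(seq):
    """Pair base i with base len-1-i (loop-less hairpin) and list the
    non-Watson-Crick pairs as (pos_5prime, pos_3prime, base, base), 1-based."""
    n = len(seq)
    return [
        (i + 1, n - i, seq[i], seq[n - 1 - i])
        for i in range(n // 2)
        if COMPLEMENT[seq[i]] != seq[n - 1 - i]
    ]


def motif_hits(seq, motif):
    """1-based start positions of the motif (direct) and of its
    reverse complement (inverted) on the strand as written."""
    def starts(pattern):
        return [i + 1 for i in range(len(seq) - len(pattern) + 1)
                if seq.startswith(pattern, i)]
    return {"direct": starts(motif), "inverted": starts(reverse_complement(motif))}


# Nearly palindromic 3' ndhF motif reported for S. henryi.
S_HENRYI_MOTIF = "ATTTACTACGTAGTGAAT"
print(stem_mismatches(S_HENRYI_MOTIF))  # [(4, 15, 'T', 'G')]

# Hypothetical toy sequence, used only to illustrate the repeat search.
TOY_SEQ = "GGAATATCCCAATATGGATATTCC"
print(motif_hits(TOY_SEQ, "AATAT"))  # {'direct': [3, 11], 'inverted': [18]}
```

Under this simplification the S. henryi motif has a single non-complementary pair, T opposite G on the strand as written, which is A opposite C on the complementary strand and is consistent with the A–C pairing described for the template strand.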

The lengthy, low-complexity regions found flanking the Asarum LSC inversion are more than sufficient to destabilize the plastome. For example, Odahara et al. (2015) demonstrated that the loss of recG, a DNA repair and recombination-suppressing gene in the nucleus, led to atypical recombination between 12 bp direct repeats in plastomes of knockout lines. Direct repeats 12 bp in length (AATATAAATAAT) are found in both the trnH-GUG and trnT-GGU regions of the Saruma henryi plastome. These same repeats are present at five locations in the highly degraded plastome of the parasitic Hydnora visseri, but not in either of the Piper plastomes included in this study. Although between one and three copies of this 12-bp sequence were identified in the plastomes of Drimys granadensis, Liriodendron spp., and Magnolia spp., only those identified in the plastomes of Saruma and Asarum were direct repeats, suggesting that direct repeats of this motif are not common in magnoliid plastomes. Recombination involving the low-complexity and highly complementary repeat-containing regions flanking trnH-GUG and trnT-GGU in both Saruma and Asarum is most likely responsible for the inversion of a portion of the LSC in Asarum plastomes.

FIGURE 5. Proposed model of plastome disruption in Asarum. (1) A shift in the IR–LSC boundary into the LSC incorporates a portion of the rps19 intron, (2) propagation of AT-rich repeats in the duplicated rps19 intron is mirrored upstream of trnH-GUG; similar sequence develops in an intergenic spacer of trnT-GGU, (3) highly repetitive AT-rich sequence upstream and downstream of trnH-GUG become complementary with each other and similar to sequence near trnT-GGU, destabilizing the LSC; a lengthy portion of the LSC undergoes an inversion due to intramolecular recombination involving the trnH-GUG and trnT-GGU flanking regions. ATAT//ATAT labels represent regions of low complexity containing tandem repeats that are predominantly composed of AT-rich sequence.

The roles of recombination, slipped-strand replication, and error repair in the evolution of the plastome

Many studies and reviews have reported that repetitive and low-complexity motifs are commonly found in the flanking regions of rearrangements, losses, and duplications in the plastomes of angiosperms (Aii et al., 1997; Chumley et al., 2006; Cai et al., 2008; Haberle et al., 2008; Wang et al., 2008; Ruhlman and Jansen, 2014). Coincidently, our understanding of the molecular mechanisms that generate these potential disruption-associated sequences has greatly increased (Levinson and Gutman, 1987; Maréchal et al., 2009; Vaughn and Bennetzen, 2014; Iyer et al., 2015; Odahara et al., 2015). In light of the present states of Asarum plastomes and the conclusions of Guisinger et al. (2008, 2010, 2011), Zhang et al. (2016), and Ruhlman et al. (2017), we hypothesize that an interplay between differential resolution of recombination events between highly similar regions and a coincident reduction in the efficacy of nuclear-encoded proteins that promote faithful replication forms a compelling model upon which to frame future investigations of the underlying causes of plastome disruption. Future investigations of proteins and pathways implicated in recombination, repair, and selection in and replication of organellar genomes represent fruitful avenues of research that can further our understanding of the constraints and predispositions of plastome disruption (Maréchal and Brisson, 2010; Kwon et al., 2010; Bock et al., 2014; Sloan et al., 2014).

Although dispersed tandem repeats are characteristic of plastomes (Ruhlman and Jansen, 2014) and duplications of these motifs in close proximity to one another have been reported elsewhere (Thomas et al., 2004), the mechanisms responsible for the generation of these repetitive motifs are not fully agreed upon (Vaughn and Bennetzen, 2014). Regardless of how repetitive regions are generated, we present evidence that is consistent with the work of other researchers (Guisinger et al., 2010, 2011) who have hypothesized that replication fidelity and recombination can lead to plastome disruption on an evolutionary timescale (Do and Kim, 2017), in addition to generating the expected genomic variability that can be found between geographic or otherwise isolated populations of the same species (Gurdon and Maliga, 2014; Kim and Kim, 2014).

Although the association of hypermutable (Magee et al., 2010) and repetitive sequence has been implicated in the syntenic disruption of plastomes in previously published studies (Chumley et al., 2006; Guisinger et al., 2011; Knox, 2014), the density of taxon sampling is rarely sufficient to provide clear polarity of change. Characterizing the role of individual regions of low complexity or repetitive sequence in many lineages is crucial to improving our understanding of plastome disruption at a finer resolution and can most effectively be accomplished for nonmodel lineages through dense taxon sampling and a phylogenetic framework. Only when we have sequenced plastomes from the majority of taxa in multiple disparate lineages will we be able to understand whether atypical plastome features are precursors to, or symptoms of, plastome disruption. We argue that the evolutionary history of Asarum and Saruma provides one such case and that future sequencing efforts in this lineage can further our understanding of the nature of plastome change.

FIGURE 6. Comparisons of the trnH-GUG tRNA region. (A) Low-complexity sequence located upstream of trnH-GUG and downstream of 5′rps19 comprising strings of tandem repeats. (B) Predicted DNA folding of the trnH-GUG region illustrating the complementarity of the upstream (start indicated by the nucleotide circled in blue) and the downstream (start indicated by the nucleotide circled in red) flanking regions. (C) DNA alignment of forward and reversed regions immediately upstream (top) and downstream (bottom) of trnH-GUG, respectively. Note that the majority of sequence that is not identical is complementary.

ACKNOWLEDGEMENTS

The authors thank the Botanical Society of America (BSA) Genetics Section for a Student Travel Award to B.T.S., which supported the presentation of this work, and those involved in the many useful conversations at BSA meetings. The authors thank Craig Barrett for critiquing an early draft of this manuscript and Mark P. Simmons and an anonymous reviewer for their detailed review and comments. The authors also thank Eric Knox for many insights and discussions that did much to improve this work, especially with regard to the cruciform DNA structure in the 3′ portion of ndhF. This work was supported by the NSF DDIG program (grant number DEB-1406732 to B.T.S. and J.V.F.).

DATA ACCESSIBILITY

The respective databases and accession numbers of data used or generated by this study can be found in Appendix 3. Alignments are archived on the Dryad Digital Repository: https://doi.org/10.5061/dryad.n976p.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

LITERATURE CITED

Aii, J., Y. Kishima, T. Mikami, and T. Adachi. 1997. Expansion of the IR in the chloroplast genomes of buckwheat species is due to incorporation of an SSC sequence that could be mediated by an inversion. Current Genetics 31: 276–279.
Angiosperm Phylogeny Group. 2016. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Botanical Journal of the Linnean Society 181: 1–20.
Barrett, C. F., J. V. Freudenstein, D. R. Mayfield-Jones, L. Perez, J. C. Pires, and C. Santos. 2014. Investigating the path of plastid genome degradation in an early-transitional clade of heterotrophic orchids, and implications for heterotrophic angiosperms. Molecular Biology and Evolution 31: 3095–3112.
Blazier, J. C., R. K. Jansen, J. P. Mower, M. Govindu, J. Zhang, M. Weng, and T. A. Ruhlman. 2016a. Variable presence of the inverted repeat and plastome stability in Erodium. Annals of Botany 117: 1209–1220.
Blazier, J. C., T. A. Ruhlman, M.-L. Weng, S. K. Rehman, J. S. M. Sabir, and R. K. Jansen. 2016b. Divergence of RNA polymerase α subunits in angiosperm plastid genomes is mediated by genomic rearrangement. Scientific Reports 6: 24595.

Bock, D. G., R. L. Andrew, and L. H. Rieseberg. 2014. On the adaptive vale of Katoh, K., K. Misawa, K.-I. Kuma, and T. Miyata. 2002. MAFFT: a novel method cytoplasmic genomes in plants. Molecular Ecology 23: 4899–4911. for rapid multiple sequence alignment based on fast Fourier transform. Brázda, V. R., C. Laister, E. B. Jagelská, and C. Arrowsmith. 2011. Cruciform Nucleic Acids Research 30: 3059–3066. structures are a common DNA feature important for regulating biological Kelly, L. M. 1998. Phylogenetic relationships in Asarum (Aristolochiaceae) processes. BMC Molecular Biology 12: 33. based on morphology and ITS sequences. American Journal of Botany 85: Cai, Z., M. Guisinger, H.-G. Kim, E. Ruck, J. C. Blazier, V. McMurtry, J. V. Keuhl, 1454–1467. et al. 2008. Extensive reorganization of the plastid genome of Trifolium Kim, J. S., H. T. Kim, and J.-H. Kim. 2014. The largest plastid genome of mono- ­subterraneum (Fabaceae) is associated with numerous repeated sequences cots: a novel genome type containing AT residue repeats in the slipper orchid and novel DNA insertions. Journal of Molecular Evolution 67: 696–704. Cypripedium japonicum. Plant Molecular Biology Reporter 33: 1210–1220. Cai, Z., C. Penaflor, J. V. Kuehl, J. Leebens-Mack, J. E. Carlson, C. W. dePamphi- Kim, S., C.-W. Park, Y.-D. Kim, and Y. Suh. 2001. Phylogenetic relationships lis, J. L. Boore, et al. 2006. Complete plastid genome sequences of Drimys, in family Magnoliaceae inferred from ndhF sequences. American Journal of Liriodendron, and Piper: implications for the phylogenetic relationships of Botany 88: 717–728. magnoliids. BMC Evolutionary Biology 6: 77. Kim, T. K., J. S. Kim, M. J. Moore, K. M. Neubig, N. H. Williams, W. M. Whitten, Castresana, J. 2000. Selection of conserved blocks from multiple alignments for and J.-H. Kim. 2015. Seven new complete plastome sequences reveal ram- their use in phylogenetic analysis. Molecular Biology and Evolution 17: 540–552. 
pant independent loss of the ndh gene family across orchids and associated Chumley, T. W., J. D. Palmer, J. P. Mower, H. M. Fourcade, P. J. Calie, J. L. Boore, instability of the inverted repeat/small single-­copy region boundaries. PloS and R. K. Jansen. 2006. The complete chloroplast genome sequence of Pel- One 10: e0142215. argonium × hortorum: organization and evolution of the largest and most Kim, T. K., and K.-J. Kim. 2014. Chloroplast genome differences between Asian highly rearranged chloroplast genome of land plants. Molecular Biology and and American Equisetum arvense (Equisetaceae) and the origin of the Evolution 23: 2175–2190. hypervariable trnY-trnE intergenic spacer. PLoS ONE 9: e103898. Computational Research and Development Group, Broad Institute. DISCOVAR Knox, E. B. 2014. The dynamic history of plastid genomes in the Campanulaceae de novo (release 52325). Available from ftp://ftp.broadinstitute.org/pub/crd/ sensu lato is unique among angiosperms. Proceedings of the National Acad- Discovar/. emy of Sciences, USA 111: 11097–11102. Do, H. D. K., and J.-H. Kim. 2017. A dynamic tandem repeat in monocotyledons Kolodner, R., and K. K. Tewari. 1979. Inverted repeats in chloroplast DNA from inferred from a comparative analysis of chloroplast genomes in Melanthia- higher plants. Proceedings of the National Academy of Sciences, USA 76: 41–45. ceae. Frontiers in Plant Science 8: 693. Kuang, D.-Y., H. Wu, Y.-L. Wang, L.-M. Gao, S.-Z. Zhang, and L. Lu. 2011. Com- Doyle, J. J., and J. L. Doyle. 1987. A rapid DNA isolation procedure for small plete chloroplast genome of Magnolia kwangiensis (Magnoliaceae): implica- quantities of fresh leaf tissue. Phytochemical Bulletin 19: 11–15. tion for DNA barcoding and population genetics. Genome 54: 663–673. Drummond, A. J., B. Ashton, S. Buxton, M. Cheung, A. Cooper, C. Duran, M. Kwon, T., E. Huq, and D. L. Herrin. 2010. Microhomology-mediated­ and non- Field, et al. 2011. Geneious, version 5.3+. 
Biomatters, Auckland, New ­Zealand. homologous repair of a double-strand­ break in the chloroplast genome Fajardo, D., D. Senalik, M. Ames, H. Zhu, S. A. Steffan, R. Harbut, J. Polashock, of Arabidopsis. Proceedings of the National Academy of Sciences, USA 107: et al. 2013. Complete plastid genome sequence of Vaccinium macrocarpon: 13954–13959. structure, gene content, and rearrangements revealed by next generation Lee, J. H., I. S. Choi, B. H. Choi, S. Yang, and G. Choi. 2016. The complete plastid sequencing. Tree Gene Genomes 9: 489–498. genome of Piper kadsura (Piperaceae), an East Asian woody vine. Mitochon- Farris, J. S., V. A. Albert, M. Källersjö, D. Lipscomb, and A. G. Kluge. 1996. Parsi- drial DNA, part A 27: 3555–3556. mony jackknifing outperforms neighbor-­joining. Cladistics 12: 99–124. Levinson, G., and G. A. Gutman. 1987. Slipped-strand­ mispairing: a major Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum like- mechanism for DNA sequence evolution. Molecular Biology and Evolution lihood approach. Journal of Molecular Evolution 17: 368–376. 4: 203–221. Guisinger, M. M., T. W. Chumley, J. V. Kuehl, J. L. Boore, and R. K. Jansen. 2010. Li, X.-W., H.-H. Gao, Y.-T. Wang, J.-Y. Song, R. Henry, H.-Z. Wu, Z.-G. Hu, et al. Implications of the plastid genome sequence of Typha (Typhaceae, Poales) 2013. Complete chloroplast genome sequence of Magnolia grandiflora and for understanding genome evolution in Poaceae. Journal of Molecular Evo- comparative analysis with related species. Science Life Sciences 56(2): lution 70: 9317. 189–198. Guisinger, M. M., J. V. Kuehl, J. L. Boore, and R. K. Jansen. 2008. Genome-wide­ Lobachev, K. S., A. Rattray, and V. Narayanan. 2007. Hairpin- ­and cruciform-­ analyses of Geraniaceae plastid DNA reveal unprecedented patterns of mediated chromosome breakage: causes and consequences in eukaryotic increased nucleotide substitutions. Proceedings of the National Academy of cells. 
Frontiers in Bioscience 12: 4208–4220. Sciences, USA 105: 18424–18429. Logacheva, M. D., M. I. Schelkunov, M. S. Nuraliev, T. H. Samigull, and A. A. Guisinger, M. M., J. V. Kuehl, J. L. Boore, and R. K. Jansen. 2011. Extreme recon- Penin. 2014. The plastid genome of mycoheterotrophic monoct Petrosavia figuration of plastid genomes in the angiosperm family Geraniaceae: rear- stellaris exhibits both gene losses and multiple rearrangements. Genome Biol- rangements, repeats, and codon usage. Molecular Biology and Evolution 28: ogy and Evolution 6: 238–246. 583–600. Luo, R., B. Liu, Y. Xie, Z. Li, W. Huang, J. Yuan, G. He, et al. 2012. SOAPdenovo2: Gurdon, C., and P. Maliga. 2014. Two distinct plastid genome configurations an empirically improved memory-­efficient short-­read de novo assembler. and unprecedented intraspecies length variation in the accD coding region GigaScience 1: 18. in Medicago truncatula. DNA Research 21: 417–427. Magee, A. M., S. Aspinall, D. W. Rice, B. P. Cusack, M. Sémon, A. S. Perry, S. Haberle, R. C., H. M. Fourcade, J. L. Boore, and R. K. Jansen. 2008. Extensive Stefanović, et al. 2010. Localized hypermutation and associated gene losses rearrangements in the chloroplast genome of Trachelium caeruleum are in legume chloroplast genomes. Genome Research 20: 1700–1710. associated with repeats and rRNA genes. Journal of Molecular Evolution 66: Maréchal, A., and N. Brisson. 2010. Recombination and the maintenance of 350–361. plant organelle genome stability. New Phytologist 186: 299–317. Harris, R. S. 2007. Improved pairwise alignment of genomic DNA. Ph.D. disser- Maréchal, A., J.-S. Parent, F. Véronneau-Lafortune, A. Joyeux, B. F. Lang, and N. tation, Pennsylvania State University, University Park, PA, USA. Brisson. 2009. Whirly proteins maintain plastid genome stability in Arabidop- Iyer, R. R., A. Pluciennik, M. Napierala, and R. D. Wells. 2015. DNA triplet repeat sis. Proceedings of the National Academy of Sciences, USA 106: 14693–14698. 
expansion and mismatch repair. Annual Reviews in Biochemistry 84: 8.1–8.28. Mayer, C. 2006–2010. Phobos 3.3.12. Available from http://www.ruhr-unibo- Jansen, R. K., Z. Cai, L. A. Raubeson, H. Daniell, C. W. dePamphilis, J. Lee- chum.de/ecoevo/cm/cm_phobos.htm. bens-Mack, K. F. Müller, et al. 2007. Analysis of 81 genes from 64 plastid Moore, M. J., P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis. 2010. Phy- genomes resolves relationships in angiosperms and identifies genome-scale­ logenetic analysis of 83 plastid genes further resolves the early diversifica- evolutionary patterns. Proceedings of the National Academy of Sciences, USA tion of eudicots. Proceedings of the National Academy of Sciences, USA 107: 104: 19369–19374. 4623–4628. January 2018, Volume 105 • Sinn et al.—Plastome rearrangement and IR instability in Asarum • 83

Naumann, J., J. P. Der, E. K. Wafula, S. S. Jones, S. T. Wagner, J. A. Honaas, P. E. and phylogenetic relationships with other Lauraceae. Canadian Journal of Ralph, et al. 2016. Detecting and characterizing the highly divergent plastid Research 46: 1293–1301. genome of the nonphotosynthetic parasitic plant Hydnora visseri (Hydnora- Stamatakis, A. 2006. RAxML-­VI-­HPC: Maximum likelihood-­based phyloge- ceae). Genome Biology and Evolution 8: 345–363. netic analyses with thousands of taxa and mixed models. Bioinformatics 22: Odahara, M., Y. Masuda, M. Sato, M. Wakazaki, C. Harada, K. Toyooka, and Y. 2688–2690. Sekine. 2015. RECG maintains plastid and mitochondrial genome stability Stevens, P. F. 2001 [onward]. Angiosperm Phylogeny website, version 12, July by suppressing extensive recombination between short dispersed repeats. 2012 [and more or less continuously updated since], http://www.mobot.org/ PLOS Genetics 11: e1005080. MOBOT/research/APWeb/ Ott, M., J. Zola, S. Aluru, and A. Stamatakis. 2007. Large-scale maximum like- Swofford, D. L. 2002. PAUP*: phylogenetic analysis using parsimony (*and other lihood-based phylogenetic analysis on the IBM BlueGene/L. In SC ‘07: Pro- methods), version 4.0a152. Sinauer, Sunderland, MA, USA. ceedings of the ACM/IEEE Supercomputing Conference 2007, Reno, NV, USA. Thomas, E. E., N. Srebro, J. Sebat, N. Navin, J. Healy, B. Mishra, and M. https://doi.org/10.1145/1362622.1362628 Wigler. 2004. Distribution of short paired duplications in mammalian Palmer, J. D., 1983. Chloroplast DNA exists in two orientations. Nature 301: genomes. Proceedings of the National Academy of Sciences, USA 101: 92–93. 10349–10354. Palmer, J. D., and W. F. Thompson. 1982. Chloroplast DNA rearrangements are Vaughn, J. N., and J. L. Bennetzen. 2014. Natural insertions in rice commonly more frequent when a large inverted repeat sequence is lost. Cell 29: 537–550. form tandem duplications indicative of patch-­mediated double-­strand break Ross, T. G., C. F. 
Barrett, M. S. Gomez, V. K. Y. Lam, C. L. Henriquez, D. H. Les, induction and repair. Proceedings of the National Academy of Sciences, USA J. I. Davis, et al. 2015. Plastid phylogenomics and molecular evolution of 111: 6684–6689. Alismatales. Cladistics 32: 160–178. Wang, R.-J., C.-L. Cheng, C.-C. Chang, C.-L. Wu, T.-M. Su, and S.-M. Chaw. Ruhfel, B. R., M. A. Gitzendanner, P. S. Soltis, D. E. Soltis, and J. G. Burleigh. 2008. Dynamics and evolution of the inverted repeat-­large single copy junc- 2014. From algae to angiosperms – inferring the phylogeny of green plants tions in the chloroplast genomes of monocots. BMC Evolutionary Biology (Viridiplantae) from 360 plastid genomes. BMC Evolutionary Biology 14: 23. 8: 36. Ruhlman, T. A., and R. K. Jansen. 2014. The plastid genomes of flowering plants. Wicke, S., G. M. Schneeweiss, C. W. dePamphilis, K. F. Müller, and D. Quandt. In P. Maliga [ed.], Chloroplast biotechnology: methods and protocols, 3–38. 2011. The evolution of the plastid chromosome in land plants: gene content, Springer, New York, NY, USA. gene order, gene function. Plant Molecular Biology 76: 273–297. Ruhlman, T. A., J. Zhang, J. C. Blazier, J. S. M. Sabier, and R. K. Jansen. 2017. Zerbino, D. R., and E. Birney. 2008. Velvet: algorithms for de novo short read Recombination dependent replication and gene conversion homogenize assembly using de Bruijn graphs. Genome Research 18: 821–829. repeat sequences and diversify plastid genome structure. American Journal Zhang, J., T. A. Ruhlman, J. S. M. Sabir, J. C. Blazier, M.-L. Weng, S. Park, and of Botany 104: 559–572. R. K. Jansen. 2016. Coevolution between nuclear-encoded­ DNA replication, Sinn, B. T., L. M. Kelly, and J. V. Freudenstein. 2015a. Phylogenetic relationships recombination, and repair genes and plastid genome complexity. Genome in Asarum: effect of data partitioning and a revised classification. American Biology and Evolution 8: 622–634. Journal of Botany 102: 765–779. Zhou, J., X. Chen, Y. 
APPENDIX 1. Herbarium voucher information for material used.

Taxon; Collector initials and number; Collection locality; Herbarium (OS = The Ohio State University Herbarium; K = Royal Botanic Gardens, Kew).

Asarum canadense L.; JVF2392; , Missouri; OS. A. delavayi Franch.; USDA 58431*H; collection locality unknown; OS. A. epigynum Hayata; LK1049; Taiwan; K. A. megacalyx (F.Maek.) T.Sugaw.; USDA 58437*L; collection locality unknown; OS. A. minus Ashe; BTS 1060; United States, North Carolina; OS. A. sieboldii Miq. var. sieboldii; USDA 56474*DH; collection locality unknown; OS. Saruma henryi Oliv.; USDA 49482*J; collection locality unknown; OS.

APPENDIX 2. Primers and GenBank accessions of Sanger sequences

| Amplified region | Name | Primer sequence (5′–3′) | Plastome including sequence (GenBank Accession No.) |
|---|---|---|---|
| trnS-GCU – trnG-UUC spacer | 10181F | TAGCAATCCGCCGCTTTAGT | A. canadense (MG544845); A. epigynum (MG554448); A. minus (MG554424) |
| | 1128R | CCGTGGGAGGACAGAATAGG | |
| pafI intron 1 | 45989F | CCCGAATCACGTCTCTTTCTC | A. epigynum (MG554446) |
| | 46532R | CGGGAGAAAAGGAGGCATTT | |
| ndhE–ndhG spacer | 126696F | TGCAATTGCTATGGCTCGTC | A. minus (MG554423) |
| | 127167R | GGCACTCAAAACAAGTACATGCT | |
| petD–rpoA | 83371F | ACGGATCGTGCTAAAGATTCA | A. minus (MG554423) |
| | 83758R | TTGGACATTCTACAGAAGGATTTCAC | |

APPENDIX 3. Taxonomic and accession information for samples sequenced by or used in the study for phylogenetic analysis

| Taxon | Order, Family | GenBank |
|---|---|---|
| Amborella trichopoda | Amborellales, Amborellaceae | NC_005086.1 |
| Annona cherimola | Magnoliales, Annonaceae | NC_030166.1 |
| Aristolochia contorta | Piperales, Aristolochiaceae | NC_036152.1 |
| Aristolochia debilis | Piperales, Aristolochiaceae | NC_036153.1 |
| Asarum canadense | Piperales, Aristolochiaceae | MG544845–MG544851 |
| Asarum delavayi | Piperales, Aristolochiaceae | MG554426–MG554433 |
| Asarum epigynum | Piperales, Aristolochiaceae | MG554441–MG554448 |
| Asarum megacalyx | Piperales, Aristolochiaceae | MG554449–MG554458 |
| Asarum minus | Piperales, Aristolochiaceae | MG554421–MG554425 |
| Asarum sieboldii | Piperales, Aristolochiaceae | MG554434–MG554440 |
| Calycanthus floridus var. glaucus | Laurales, Calycanthaceae | NC_004993.1 |
| Chloranthus spicatus | Chloranthales, Chloranthaceae | NC_009598.1 |
| Cinnamomum micranthum f. kanehirae | Laurales, Lauraceae | KR014245.1 |
| Drimys granadensis | Canellales, Winteraceae | NC_008456.1 |
| Liriodendron chinense | Magnoliales, Magnoliaceae | NC_030504.1 |
| Liriodendron tulipifera | Magnoliales, Magnoliaceae | NC_008326.1 |
| Litsea glutinosa | Laurales, Lauraceae | KU382356.1 |
| Machilus balansae | Laurales, Lauraceae | NC_028074.1 |
| Magnolia liliifera | Magnoliales, Magnoliaceae | NC_023238.1 |
| Magnolia officinalis | Magnoliales, Magnoliaceae | NC_020316.1 |
| Persea americana | Laurales, Lauraceae | NC_031189.1 |
| Phoebe sheareri | Laurales, Lauraceae | NC_031191.1 |
| Piper cenocladum | Piperales, Piperaceae | NC_008457.1 (modified) |
| Piper kadsura | Piperales, Piperaceae | NC_027941.1 |
| Saruma henryi | Piperales, Aristolochiaceae | MG520100 |
