Transposable elements drive rapid phenotypic variation in rubella

Xiao-Min Niua,b,1, Yong-Chao Xua,b,1, Zi-Wen Lia, Yu-Tao Biana,b, Xing-Hui Houa,b, Jia-Fu Chena,b, Yu-Pan Zoua,b, Juan Jianga,b, Qiong Wua, Song Gea,b, Sureshkumar Balasubramanianc, and Ya-Long Guoa,b,2

aState Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, 100093 Beijing, China; bUniversity of Chinese Academy of Sciences, 100049 Beijing, China; and cSchool of Biological Sciences, Monash University, VIC 3800, Australia

Edited by Ian T. Baldwin, Max Planck Institute for Chemical Ecology, Jena, Germany, and approved February 19, 2019 (received for review July 4, 2018) Rapid phenotypic changes in traits of adaptive significance are of invasion with a much wider distribution than that of its out- crucial for organisms to thrive in changing environments. How crossing sister species (16). C. rubella provides such phenotypic variation is achieved rapidly, despite limited an excellent opportunity to unravel the underlying mechanisms, in genetic variation in species that experience a genetic bottleneck addition to its transition to selfing that helps with rapid coloni- is unknown. Capsella rubella, an annual and inbreeding forb (Bras- zation, for the genetic paradox of invasion (14, 15, 17). sicaceae), is a great system for studying this basic question. Its In this study, we reasoned that, if TEs are critical for rapid distribution is wider than those of its congeneric species, despite phenotypic variation and adaptation, their distribution in the C. an extreme genetic bottleneck event that severely diminished its rubella genome might reflect this. Through population genomics genetic variation. Here, we demonstrate that transposable ele- analyses, we show that TEs are highly enriched in promoters and ments (TEs) are an important source of genetic variation that could downstream regions of genes in C. rubella compared with C. account for its high phenotypic diversity. TEs are (i) highly grandiflora. TEs are also highly polymorphic in natural pop- enriched in C. rubella compared with its outcrossing sister species ulations of C. rubella, and 4.2% of polymorphic TE insertions are Capsella grandiflora, and (ii) 4.2% of polymorphic TEs in C. rubella associated with significant changes in expression levels of their are associated with variation in the expression levels of their ad- adjacent genes. In particular, we demonstrate that frequent TE jacent genes. Furthermore, we show that frequent TE insertions at insertions at the FLOWERING LOCUS C (FLC) locus in natural FLOWERING LOCUS C (FLC) in natural populations of C. rubella populations of C. rubella affect its expression and could explain could explain 12.5% of the natural variation in flowering time, a 12.5% of the variation in flowering time, one of the most important key life history trait correlated with fitness and adaptation. In life history traits correlated with adaptation. We also reveal that a ′ particular, we show that a recent TE insertion at the 3 UTR of TE insertion in the 3′ UTR affects mRNA stability. Overall, our FLC affects mRNA stability, which results in reducing its steady- results indicate that TEs play a crucial role in rapid phenotypic state expression levels, to promote the onset of flowering. Our variation, which could potentially promote adaptation to changing results highlight that TE insertions can drive rapid phenotypic var- iation, which could potentially help with adaptation to changing environments in a species with limited standing genetic variation. Significance

Brassicaceae | Capsella rubella | natural variation | rapid phenotypic The mechanisms underlying rapid adaptation to changing en- variation | transposable elements vironments in species with reduced genetic variation, referred to as the “genetic paradox of invasion,” are unknown. We re- port that transposable elements (TEs) are highly enriched in the apid phenotypic changes in traits of adaptive significance gene promoter regions of Capsella rubella compared with its provide organisms with potentials for local adaptation, which R outcrossing sister species Capsella grandiflora. We also show is critical for both survival and range expansion of organisms. that a number of polymorphic TEs in C. rubella are associated This becomes even more crucial in the context of global climate with changes in gene expression. Frequent TE insertions at change (1). Species often experience a genetic bottleneck after FLOWERING LOCUS C of C. rubella affect flowering-time vari- speciation or introduction into a new area that diminishes their ation, an important life history trait correlated with fitness. genetic variation, but rapidly adapt and become invasive there- These results indicate that TE insertions drive rapid phenotypic after, which is referred to as the “genetic paradox of invasion” – variation, which could potentially help adapting to novel en- (2 4). However, the underlying mechanisms mediating these vironments in species with limited genetic variation. rapid diversification phenotypes, in traits that are correlated with adaptation, are unclear. Author contributions: Y.-L.G. designed research; X.-M.N., Y.-C.X., Z.-W.L., Y.-T.B., X.-H.H., Transposable elements (TEs) are among the most variable J.-F.C., and J.J. performed research; Y.-L.G. contributed new reagents/analytic tools; components of the genome, which can replicate and integrate X.-M.N., Y.-C.X., Z.-W.L., Y.-P.Z., and Q.W. analyzed data; and X.-M.N., S.G., S.B., and into new positions. Changes in the environment, such as climate Y.-L.G. wrote the paper. change, can alter both the copy number of TEs and their effects The authors declare no conflict of interest. on gene regulation, generating novel genetic and phenotypic This article is a PNAS Direct Submission. variation of potential adaptive significance (5). Therefore, TEs This open access article is distributed under Creative Commons Attribution-NonCommercial- have the potential to quickly create abundant genetic diversity NoDerivatives License 4.0 (CC BY-NC-ND). and thus be agents of rapid adaptation (6–10). Despite extensive Data deposition: The data described in the paper have been deposited in the National Center for Biotechnology Information Sequence Read Archive under accession nos. studies on their phenotypic effects, the extent to which TEs can PRJNA392709, PRJNA392711, and PRJNA511520 and in GenBank under accession nos. contribute to the process of rapid adaptation is largely unknown MF422379–MF422413, MF422414–MF422449, MF422450–MF422525, MF142149–MF142156, (11–13). and MG492014–MG492020. To explore whether TEs could drive rapid phenotypic di- 1X.-M.N. and Y.-C.X. contributed equally to this work. versification in species with limited genetic variation, we focused 2To whom correspondence should be addressed. Email: [email protected]. on Capsella rubella (), an annual and inbreeding This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. forb that experienced a genetic bottleneck during speciation (14, 1073/pnas.1811498116/-/DCSupplemental. 15). C. rubella provides a typical example of the genetic paradox Published online March 15, 2019.

6908–6913 | PNAS | April 2, 2019 | vol. 116 | no. 14 www.pnas.org/cgi/doi/10.1073/pnas.1811498116 Downloaded by guest on October 1, 2021 environments in and could explain the genetic paradox regions (SI Appendix,Fig.S2B). The number of polymorphic TEs of invasion. in these accessions ranged from 1,024 to 1,688, and 1,731 in the reference MTE (SI Appendix, Fig. S3 and Table S1). The poly- Results morphic TEs that are present in at least two accessions were TEs Are Highly Enriched in C. rubella and Affect the Expression Levels considered as common. In total, of the 3,808 polymorphic TEs, of Its Adjacent Genes. We hypothesized that if TEs are among the 2,558 (67.17%) were present in at least two accessions, suggesting key determinants of phenotypic variation, their distribution may a common allelic variation (Fig. 1C). differ between C. rubella and its outcrossing congeneric sister To assess the functional implications of this variation, we species C. grandiflora. To test this, we compared the distribution performed transcriptome sequencing for three representative of TEs in 19,795 orthologous gene pairs between C. rubella and accessions of C. rubella, MTE, 86IT1, and 879 (SI Appendix, C. grandiflora. TEs are highly enriched in different genic regions Table S2), based on their positions on the phylogenetic tree of 28 of the C. rubella compared with the corresponding regions of C. C. rubella accessions (SI Appendix, Fig. S1). In these three ac- grandiflora (Fig. 1A). Our analysis with orthologous regions cessions there are 1,309 polymorphic TEs, of which 5.45% were unmasked these divergent distribution patterns compared with located in genic regions and 94.55% were in intergenic regions. the previous study (18), since the genomes of these two species We found that 55 of 1,309 polymorphic TE loci in these three have varied coverages/assemblies. In addition, TEs are highly accessions (4.2%) were associated with significant changes in the enriched in the promoter and downstream regions of genes in C. expression levels of their adjacent genes [Wilcoxon sum test, rubella compared with C. grandiflora (Fig. 1B), indicating that false discovery rate (FDR) corrected, P < 0.05, SI Appendix, TEs could potentially contribute to the diversification of gene Table S3]. Among these TEs that are associated with changes in expression in C. rubella. gene expression, retrotransposons are more common (36.37%, To assess whether TEs are polymorphic in C. rubella,we SI Appendix, Fig. S4). A Gene Ontology enrichment analysis (21) scanned 28 C. rubella genomes, which included 27 accessions (18, of the genes that are affected by the TEs using the Arabidopsis 19) (SI Appendix, Table S1), with MTE accession as a reference orthologous genes suggested that these genes are involved in (20). A phylogenetic tree based on whole-genome sequences lateral root formation and defense response (SI Appendix, Fig. suggested that these 28 C. rubella accessions are closely related (SI S5). Taken together, these findings suggest that polymorphic Appendix,Fig.S1). We identified 3,808 polymorphic TEs among TEs can diversify gene expression, which has the potential to

these genomes (SI Appendix,Fig.S2A). As reported earlier (18), shape the phenotypic variation in C. rubella as in other plants (8, EVOLUTION these polymorphic TEs are enriched (89.84%) in intergenic 22, 23).

Fig. 1. Transposable elements cause phenotypic diversification and changes in gene expression in C. rubella.(A) Number of TEs in different genic regions of the C. rubella and C. grandiflora genomes. (B) Enrichment of TEs in 100-bp bins in the regions 2 kb upstream of the start codon and 1 kb downstream of the stop codon between orthologous gene pairs of C. rubella and C. grandiflora. The x axis indicates the distance from TEs to the start codon (stop codon) of the closest gene; the y axis indicates the number of genes with TEs at different distances to the start codon (stop codon). (C) Frequency distribution of polymorphic TEs in the genomes of 27 C. rubella natural accessions with MTE as a reference. The x axis indicates the number of accessions with the same TE insertions. (D) Insertion of TEs in FLC in C. rubella populations. Thick lines in different colors and lengths indicate the classification and length of the inserted TEs. Numbers in parentheses indicate the sequence lengths of inserted TEs; the names before the parentheses indicate the accessions with the inserted TEs; when there are two accessions, accession names are separated by a backslash (/). (E) Flowering-time variation between accessions with or without a TE insertion at the CrFLC locus. (F) Relative expression levels of CrFLC, CrFT, and CrSOC1 in ac- cessions with TE insertions. Four biological replicates, each with three technical replicates, were analyzed. Expression data were normalized with CrTubulin and then to the levels of 879 plants and are shown as means ± SD; *P < 0.05; **P < 0.01; NS, insignificant.

Niu et al. PNAS | April 2, 2019 | vol. 116 | no. 14 | 6909 Downloaded by guest on October 1, 2021 Frequent TE Insertions at CrFLC Affect Natural Variation in Flowering the CrFLC expression level was strongly correlated with the TE Time. The geographic distribution of C. grandiflora is much nar- insertion in the 3′ UTR in 2-wk-old seedlings of the F2 pop- rower than that of C. rubella, which has colonized most areas of ulation (Pearson’s correlation coefficient, r2 = 0.462, P < 0.01). southern and western Europe with varied environmental conditions These findings suggested that the 3′ UTR TE potentially af- (24). Flowering time is an important trait for adaptation to changing fected CrFLC expression in this cross. climates and novel environments (25, 26). Interestingly, we found Sequencing of a pool of early flowering plants from the F2 that FLC, a determinant of natural variation in flowering time of C. population mapped the causal loci to an ∼863-kb region on rubella (27), has TE insertions in its genic region (27). To assess the chromosome 6 (from position 2.78–3.64 Mb) (Fig. 2C). We frequency of TE insertions at CrFLC, we sequenced the CrFLC further used 825 individuals from the BC2F2 population and genomic region (from 2 kb upstream of the start codon to the end narrowed the causal region down to a 130-kb interval (SI Ap- of 3′ UTR) in eight accessions by Sanger sequencing and compared pendix, Table S6). This mapping interval contained 39 ORFs, them with 27 other previously published CrFLC genomic sequences including CrFLC (Fig. 2D). Sanger sequencing revealed that (SI Appendix,TableS4). We found five different TE insertions in 7 CrFLC has one deletion of 2 bp in the promoter, two SNPs in its ′ of 35 accessions (20%). We found a TE insertion in the 3 UTR in intron, one SNP, and a 1,958-bp Helitron TE insertion in the 3′ two accessions (879 and 1208) and four additional TE insertions UTR that differentiated MTE from 879 (Fig. 2 D and E). The in intron 1 in five accessions (39.1, 690, 1204, 844, and TAAL) (Fig. CrFLC expression level in 879 was a quarter of that in MTE (Fig. 1D). The TE insertions ranged from 498 to 1,958 bp, and most were 1F), making it a prime candidate for the observed flowering-time Helitron, except one insertion in the TAAL accession, which was a variation. DNA transposon. All these seven accessions with TE insertions are To assess whether the TE insertion in intron 1 of CrFLC is early flowering compared with the reference accession MTE and also the causal mutation for early flowering, we constructed an accessions without TEs (Fig. 1E and SI Appendix,TableS4). The × < ’ F2 population of 844 MTE. Using a similar sequencing ap- expression levels of CrFLC are lower (P 0.05, Student s t test) proach, we identified an ∼650-kb region on chromosome 6 based than MTE in all of the other six accessions except for 1204 (Fig. on 501 F2 individuals (SI Appendix, Fig. S9 A and B). We nar- 1F). This is also reflected in the inversely correlated expression rowed down the mapping interval to a 210-kb region using 561 levels of the CrFLC-downstream genes FT and SOC1 (Fig. 1F). more F individuals and additional markers, which included the Furthermore, 12.5% of the variation in flowering time could be 2 CrFLC locus (SI Appendix, Fig. S9 C and D and Table S7). The explained by allelic variation of the TE insertions at CrFLC in C. location of TEs in intron 1 was roughly similar to the position of rubella (Pearson’s correlation coefficient, r2 = 0.125, P < 0.05). Ler in A. thaliana, in which a TE insertion regulates histone These results revealed that TE insertions in CrFLC are associated modification via siRNA, thereby regulating flowering time (35). with early flowering in C. rubella. Therefore, it is likely that a similar mechanism may mediate the Given that each TE insertion is present in only a few C. rubella onset of early flowering in 844. accessions at different regions of the FLC locus, we concluded that However, the TE insertion at the 3′ UTR in accession 879 is these TE insertions originated recently and independently (SI Ap- unique (SI Appendix, Fig. S6). To confirm whether the TE in- pendix,Fig.S1). In the Brassicaceae family, there are frequent TE ′ insertions at FLC (SI Appendix,Fig.S6A). An analysis of a few ac- sertion in the 3 UTR modulates flowering time, we generated cessions from C. grandiflora and C. orientalis and 76 accessions of constructs with genomic CrFLC from MTE and 879 and an ad- Arabidopsis thaliana (SI Appendix,TableS5)alsosuggestedthatthe3′ ditional construct of MTE-CrFLC in which the Helitron TE from 879 was introduced (MTE-Ins) and compared their ability to UTR TE insertion in accession 879 is a recent mutation that is sf2 present only in two C. rubella accessions (SI Appendix,Fig.S7). If the complement FRI flc-3 A. thaliana plants (Fig. 3A). CrFLC TE insertions have any potential to be of adaptive significance, one expression was higher in transgenic plants harboring the MTE might expect that these accessions could show some geographic construct than in those with 879 or MTE-Ins constructs (Fig. 3B; patterns. To assess this, we compared the origin of these accessions. see SI Appendix, Fig. S10, for 10 randomly selected independent Intriguingly, accessions with TE insertions clustered in the south- transgenic lines). Consistent with these findings, levels of the ernmost habitat of this species around the Mediterranean region (SI CrFLC downstream genes CrFT and CrSOC1 were inversely Appendix,Fig.S6B and Table S4). This region harbors a Mediter- correlated with the CrFLC expression level (Fig. 3B). Flowering ranean climate, which is characterized by high precipitation, warm time was correlated with CrFLC expression. Transgenic plants winters, and dry and hot summers (28–31). Plants in the area are with the MTE genomic sequence were late flowering, while usually early flowering to avoid dry and hot summers (28–31). The transgenic plants with either 879 or MTE with the TE insertion high TE insertion frequency, associated with early flowering in C. were earlier flowering (all P < 0.001, Student’s t test; Fig. 3C). rubella, is consistent with this hypothesis. To assess whether there is Taken together, these results suggested that the allelic variation significant niche divergence between C. rubella accessions with caused by the TE insertion at the 3′ UTR of CrFLC was re- or without the TE insertion, we performed ecological niche sponsible for the flowering-time difference between 879 and modeling and found significant niche divergence between the MTE accessions of C. rubella. two groups (100 random permutations, one-sample t test, P < 0.001, SI Appendix,Fig.S8), consistent with the possibility that TE in 3′ UTR of CrFLC Causes Alternative Polyadenylation and Affects TEs in C. rubella show the potential to cause phenotypic variation Its mRNA Stability. TE insertions can change gene expression by of adaptive significance. altering transcription and/or by modulating mRNA stability (8). To test whether TE insertions in the 3′ UTR and intron 1 are To assess whether the differential CrFLC expression between indeed the causal mutations for early flowering, we performed MTE and 879 is caused by transcriptional differences, we genetic mapping. First, we analyzed an F2 population (879 × quantified transcription rates by measuring unspliced mRNA MTE) derived from a cross between the late-flowering, se- abundance (18, 36). The abundance of unspliced CrFLC nascent quenced accession MTE (20) and the early flowering accession transcript is similar between MTE and 879 (Fig. 4A). Consistent 879 that harbors a TE in the 3′ UTR. Flowering time correlates with this, in both accessions, low levels of DNA methylation were significantly with FLC expression in the Brassicaceae family (32– detected around the CrFLC transcription start site and the end 34). Here, we found that the distribution of flowering time is of intron 1, two regions important for the regulation of FLC consistent with the distribution of CrFLC expression in F2 pop- expression (37) (SI Appendix, Fig. S11). These findings suggest ulations (Fig. 2 A and B). The F2 individuals with lower CrFLC that transcriptional differences do not account for the observed expression belong to the 879 genotype (Fig. 2B). Furthermore, changes in gene expression.

6910 | www.pnas.org/cgi/doi/10.1073/pnas.1811498116 Niu et al. Downloaded by guest on October 1, 2021 FRIsf2 flc-3 A. thaliana plants, in which the poly(A) signal was mutated (879-PAS-mut) (SI Appendix,Fig.S12). These plants were also early flowering like transgenic plants of 879-CrFLC compared with transgenic plants of MTE-CrFLC, which suggests that, while the TE insertion at the 3′ UTR causes APA, other mechanisms also contribute to the reduction in gene expression. To assess mRNA stability, we measured CrFLC expression in the presence of cordycepin, which inhibits transcription (40), in parental lines. These experiments revealed that 879-CrFLC mRNA de- graded faster than MTE-CrFLC mRNA (Fig. 4C). To further confirm the effect of the 3′ UTR, we generated GUS-reporter constructs in FRIsf2 flc-3 A. thaliana plants driven by the CrFLC promoter from MTE with either MTE or 879 3′ UTRs (Fig. 4D). We found that the MTE 3′ UTR led to an almost threefold increase in expression compared with that of 879. These results are consis- tent with findings from Arabidopsis, indicating that the 3′ UTR could affect the stability of FLC mRNA (36). MTE and 879 also differ in AU-rich elements (AREs) and GU-rich elements (GREs) before their poly(A) tails (Fig. 4B). These elements, which are present only in 879, have the potential to affect RNA stability (41). To assess whether AREs or GREs could mediate changes in gene expression, we compared ex- pression levels of GUS after specifically mutagenizing these el- ements in transgenic lines (Fig. 4D). While the loss of the first ARE had a significant impact on gene expression, the loss of the GRE and the second ARE did not, suggesting that the first ARE at the 3′ UTR of 879 is critical for modulating mRNA expression EVOLUTION

Fig. 2. QTL mapping analysis of flowering time. (A) Distribution of flow-

ering time of the 557 individuals of the 879 × MTE F2 population. Averages and ranges of flowering times for the two parental lines are shown above.

(B) Histogram of CrFLC expression levels variations among F2 population (n = 122) with TE insertion (879 genotype, shown in yellow), without TE insertion (MTE genotype, purple), and heterozygote (gray) at its 3′ UTR. Averages and ranges of CrFLC expression levels for the two parental lines are shown above. Expression data were normalized against CrTubulin.(C) SHOREmap analysis of flowering time. The homozygosity estimator is 0 for even allele frequencies for both parents, 1 when homozygous for the late-flowering accession MTE, and −1 when homozygous for the early-flowering acces- sion 879. (D) Fine mapping by a genetic-linkage analysis and sequence var- iation in the candidate gene CrFLC. The number of recombinants between the markers and the causal locus is indicated below the linkage map. CrFLC is marked in red. Dashed red lines indicate sequence variation between MTE and 879; red line indicates deleted sequence. (E) Sequence variation in the CrFLC 3′ UTR and the inserted Helitron structure. The Helitron structure is illustrated at the top, composed of the 5′ TC and 3′ CTRR termini (shown in red), loop and two short palindromic sequences close to the 3′ terminus that could form a 20-bp hairpin (shown in blue and purple) and are inserted within a host dinucleotide AT (shown in green). The end sequences of the CrFLC 3′ UTR Helitrons are shown at the bottom in the same color. Fig. 3. TE insertion at CrFLC explains variation in flowering time. (A) Dia- grams of constructs. Red vertical lines indicate variation between MTE and 879. Pink rectangles indicate the TE insertion. (B) Expression of CrFLC and TE insertions at the 3′ UTR can give rise to alternative pol- two downstream flowering regulators (FT, SOC1)inA. thaliana transgenic yadenylation (APA), which in turn could modulate mRNA sta- plants, as assayed using a pool of 25 independent transgenic lines. Expres- bility (38, 39). The Helitron insertion in 879 introduces a sion data were normalized against AtTubulin and then to the levels of ′ transgenic plants with 879-CrFLC and are shown as means ± SD; **P < 0.01. canonical poly(A) signal (AAUAAA). The 3 RACE assays (C) Flowering time for transgenic plants. Flowering times for T1 transgenic revealed that there is APA in 879-CrFLC (Fig. 4B). To assess lines with FRIsf2 and flc-3 alleles in the A. thaliana Col-0 background are whether APA could fully account for the observed phenotypic shown; the number of independent transgenic lines scored for each con- differences, we generated an additional 879-CrFLC construct in struct is given above the bar graph. DTF, days to flowering.

Niu et al. PNAS | April 2, 2019 | vol. 116 | no. 14 | 6911 Downloaded by guest on October 1, 2021 Fig. 4. TE insertion-induced alternative polyadenylation regulates CrFLC expression by modulating RNA stability. (A) Expression levels of unspliced CrFLC. Five biological replicates, each with three technical replicates, were analyzed. Expression data were normalized against CrTubulin and are shown as means ± SD. (B) Alternative polyadenylation at the 3′ UTR between 879 and MTE and a schematic model. The MTE 3′ UTR is shown in dark gray. Pink rectangles indicate part of Helitron. The ARE (purple rectangles) and GRE (blue rectangles) sites and the polyadenylation signal (PAS, orange rectangles) domain are shown. Red line indicates variation. (C) RNA stability of CrFLC. CrATEXLA1 was used as a control, representing an unstable transcript. Three biological replicates, each with three technical replicates, were analyzed. (D) Various deletion constructs with different 3′ UTRs and their effects on gene expression in transgenic lines. The MTE promoter is shown in light gray; GUS genes are in white; the same 3′ UTR region is shown in dark gray, and the latter half of the MTE 3′ UTR is shown in gray; the deleted regions are shown in dashed lines. The vertical red line indicates the mutation of the ARE motif, and the vertical blue line indicates the mutation of the GRE motif. The pooled number of independent transgenic lines is given next to the bar graphs. Expression data were normalized against AtTubulin and are shown as means ± SD; *P < 0.05; **P < 0.01; NS, insignificant.

in these transgenic lines. An analysis of transgenic lines har- extreme genetic bottleneck but has a wide distribution and is boring reporter constructs with partial UTRs from MTE sug- therefore a good model system to address this question. TEs gested that the latter part of the MTE 3′ UTR is more critical as have been suggested as an agent of rapid adaptation because the deletion of the first half of the MTE 3′ UTR (MTEΔ2) they can produce abundant genetic variation in a limited time resulted in the same expression level as that of MTE, but the (5–9) and can affect phenotypic variation (11–13). For example, deletion of the latter half (MTEΔ1) significantly reduced the in peppered moths, a TE insertion within intron 1 of the cortex expression level (Fig. 4D). These results indicate that the pres- gene increases its expression level and induces melanization (13). ence of AREs and the absence of the latter part of the MTE 3′ However, the extent to which TEs contribute to rapid phenotypic UTR in 879-CrFLC might reduce CrFLC expression probably by diversification and the mechanisms by which they influence affecting mRNA stability (Fig. 4D). phenotypes are unknown. Our results show that TEs play a crucial role in rapid phe- Discussion notypic variation and suggest that they can potentially aid in Invasive species with limited genetic variation often rapidly ac- adaptation to changing environments. Previous studies in other quire diversification of phenotypes to adapt to changing envi- systems have showed that TEs in 3′ UTRs can alter gene ex- ronments and become successful in a process referred to as the pression via the regulation of methylation in Arabidopsis (42) and genetic paradox of invasion (2). How this process occurs is a can change the translation efficiency in rice (43). We have found fundamental question in biology. C. rubella experienced an that a TE insertion in the 3′ UTR affects mRNA stability. This

6912 | www.pnas.org/cgi/doi/10.1073/pnas.1811498116 Niu et al. Downloaded by guest on October 1, 2021 mechanism also has the potential to affect gene expression and pBI121 vector using the SbfI and XmaI sites, and the 3′ UTR was cloned using phenotypes. While our study hints at a correlation with potential the SacI and EcoRI sites. adaptive significance, future studies should explore whether such RNA-seq was performed on total RNA samples extracted from flower bud phenotypic diversification ultimately confers an adaptive ad- tissues with three biological repeats for each accession (SI Appendix, Table S2). Expression levels of genes were estimated by fragments per kilobase of vantage and whether TEs can drive adaptation. exon per million reads mapped. RT-qPCR was performed to estimate RNA expression levels. RNA stability was measured using RT-qPCR to quantify Materials and Methods gene expression levels at different time points after incubation of 3- More detailed information on the materials and methods used in this study is deoxyadenosine (cordycepin). DNA after bisulfite conversion was se- provided in SI Appendix, Supplementary Information Text. The C. rubella quenced and analyzed to measure DNA methylation levels with the online accessions were described in our previous study (14, 27, 32) or this study (SI tool Kismeth (48). All primers are listed in SI Appendix, Table S8. All statis- Appendix, Table S4). Flowering time was assayed as days to flowering after tical analyses were performed in R (www.r-project.org/). planting. TEs in C. grandiflora (20) and C. rubella (20) were annotated using The data described in this paper have been deposited in the NCBI Sequence RepeatMasker v.4.0.6 (www.repeatmasker.org). The polymorphic TEs Read Archive under accession nos. PRJNA392709, PRJNA392711, and PRJNA511520 were identified using TEPID (44) based on the raw reads of 27 resequenced and in GenBank under accession nos. MF422379–MF422413, MF422414–MF422449, genomes of C. rubella accessions (18, 19) (SI Appendix, Table S1) with the MF422450–MF422525, MF142149–MF142156, and MG492014–MG492020. MTE genome as the reference. A neighbor-joining tree of 28 C. rubella ac- cessions was constructed using PHYLIP (version 3.696) (45). Ecological niche ACKNOWLEDGMENTS. We thank Detlef Weigel, John Bowman, Xiaofeng modeling was performed to measure niche similarity using ENMTools 1.3 Cao, and Wolfgang Busch for valuable suggestions and discussions; Li Yang

(46). Pooled leaves of early-flowering 879 × MTE (844 × MTE) F2 plants were and Mao-Fu Li for assistance with the experiments; and especially the sequenced to map the causal loci using SHOREmap (47). For the comple- anonymous reviewers for their help in improving the manuscript. This study was funded by National Natural Science Foundation of China Grant mentation assay and the poly(A) signal mutation assay of CrFLC in A. thali- 91731306 (to Y.-L.G.); Strategic Priority Research Program of the Chinese ana, the fragments with (or without) mutations were cloned into the Academy of Sciences Grant XDB27010305 (to Y.-L.G.); the Innovative pCAMBIA1300 vector using the KpnI and SalI sites and introduced into A. Academy of Seed Design, Chinese Academy of Sciences (Y.-L.G.); Chinese sf2 thaliana with FRI and flc-3 alleles in the Col-0 background. For the GUS- Academy of Sciences President’s International Fellowship for Visiting Scien- reporter assay, the promoter was amplified from MTE and cloned into the tists (S.B.); and Monash University Larkins Fellowship (to S.B.).

1. Scheffers BR, et al. (2016) The broad footprint of climate change from genes to bi- 26. Anderson JT, Inouye DW, McKinney AM, Colautti RI, Mitchell-Olds T (2012) Pheno- omes to people. Science 354:719. typic plasticity and adaptive evolution contribute to advancing flowering phenology

2. Estoup A, et al. (2016) Is there a genetic paradox of biological invasion? Annu Rev Ecol in response to climate change. Proc Biol Sci 279:3843–3852. EVOLUTION Evol Syst 47:51–72. 27. Yang L, et al. (2018) Parallel evolution of common allelic variants confers flowering 3. Kolbe JJ, et al. (2004) Genetic variation increases during biological invasion by a Cu- diversity in Capsella rubella. Cell 30:1322–1336. ban lizard. Nature 431:177–181. 28. Kigel J, et al. (2011) Relationships between flowering time and rainfall gradients 4. Frankham R (2005) Resolving the genetic paradox in invasive species. Heredity (Edinb) across Mediterranean-desert Transects. Isr J Ecol Evol 57:91–109. 94:385. 29. Wolfe MD, Tonsor SJ (2014) Adaptation to spring heat and drought in northeastern 5. Stapley J, Santure AW, Dennis SR (2015) Transposable elements as agents of rapid adap- Spanish Arabidopsis thaliana. New Phytol 201:323–334. tation may explain the genetic paradox of invasive species. Mol Ecol 24:2241–2252. 30. Shavrukov Y, et al. (2017) Early flowering as a drought escape mechanism in plants: 6. Schrader L, et al. (2014) Transposable element islands facilitate adaptation to novel How can it aid wheat production? Front Plant Sci 8:1950. environments in an invasive species. Nat Commun 5:5495. 31. Exposito-Alonso M, et al. (2018) Genomic basis and evolutionary potential for ex- 7. Casacuberta E, González J (2013) The impact of transposable elements in environ- treme drought adaptation in Arabidopsis thaliana. Nat Ecol Evol 2:352–358. – mental adaptation. Mol Ecol 22:1503 1517. 32. Guo YL, Todesco M, Hagmann J, Das S, Weigel D (2012) Independent FLC mutations as 8. Chuong EB, Elde NC, Feschotte C (2017) Regulatory activities of transposable ele- causes of flowering-time variation in Arabidopsis thaliana and Capsella rubella. – ments: From conflicts to benefits. Nat Rev Genet 18:71 86. Genetics 192:729–739. 9. Barrón MG, Fiston-Lavier AS, Petrov DA, González J (2014) Population genomics of 33. Slotte T, Holm K, McIntyre LM, Lagercrantz U, Lascoux M (2007) Differential expres- – transposable elements in Drosophila. Annu Rev Genet 48:561 581. sion of genes important for adaptation in Capsella bursa-pastoris (Brassicaceae). Plant 10. Li ZW, et al. (2018) Transposable elements contribute to the adaptation of Arabi- Physiol 145:160–173. dopsis thaliana. Genome Biol Evol 10:2140–2150. 34. Lempe J, et al. (2005) Diversity of flowering responses in wild Arabidopsis thaliana 11. Studer A, Zhao Q, Ross-Ibarra J, Doebley J (2011) Identification of a functional strains. PLoS Genet 1:109–118. transposon insertion in the maize domestication gene tb1. Nat Genet 43:1160–1163. 35. Liu J, He Y, Amasino R, Chen X (2004) siRNAs targeting an intronic transposon in the 12. Martin A, et al. (2009) A transposon-induced epigenetic change leads to sex de- regulation of natural flowering behavior in Arabidopsis. Genes Dev 18:2873–2878. termination in melon. Nature 461:1135–1138. 36. Csorba T, Questa JI, Sun Q, Dean C (2014) Antisense COOLAIR mediates the co- 13. Van’t Hof AE, et al. (2016) The industrial melanism mutation in British peppered ordinated switching of chromatin states at FLC during vernalization. Proc Natl Acad moths is a transposable element. Nature 534:102–105. Sci USA 111:16160–16165. 14. Guo YL, et al. (2009) Recent speciation of Capsella rubella from Capsella grandiflora, 37. Sheldon CC, Conn AB, Dennis ES, Peacock WJ (2002) Different regulatory regions are associated with loss of self-incompatibility and an extreme bottleneck. Proc Natl Acad required for the vernalization-induced repression of FLOWERING LOCUS C and for the Sci USA 106:5246–5251. epigenetic maintenance of repression. Plant Cell 14:2527–2537. 15. Foxe JP, et al. (2009) Recent speciation associated with the evolution of selfing in 38. Feschotte C (2008) Transposable elements and the evolution of regulatory networks. Capsella. Proc Natl Acad Sci USA 106:5241–5245. – 16. Hurka H, Neuffer B (1997) Evolutionary processes in the genus Capsella (Brassicaceae). Nat Rev Genet 9:397 405. Plant Syst Evol 206:295–316. 39. Di Giammartino DC, Nishida K, Manley JL (2011) Mechanisms and consequences of – 17. Tang C, et al. (2007) The evolution of selfing in Arabidopsis thaliana. Science 317:1070–1072. alternative polyadenylation. Mol Cell 43:853 866. 18. Ågren JA, et al. (2014) Mating system shifts and transposable element evolution in 40. Golisz A, Sikorski PJ, Kruszka K, Kufel J (2013) Arabidopsis thaliana LSM proteins – the plant genus Capsella. BMC Genomics 15:602. function in mRNA splicing and degradation. Nucleic Acids Res 41:6232 6249. 19. Wu Q, et al. (2017) Long-term balancing selection contributes to adaptation in Ara- 41. Al-Ahmadi W, Al-Ghamdi M, Al-Haj L, Al-Saif M, Khabar KS (2009) Alternative poly- bidopsis and its relatives. Genome Biol 18:217. adenylation variants of the RNA binding protein, HuR: Abundance, role of AU-rich – 20. Slotte T, et al. (2013) The Capsella rubella genome and the genomic consequences of elements and auto-regulation. Nucleic Acids Res 37:3612 3624. rapid mating system evolution. Nat Genet 45:831–835. 42. Saze H, Kakutani T (2007) Heritable epigenetic mutation of a transposon-flanked Arabi- 21. Huang da W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of dopsis gene due to lack of the chromatin-remodeling factor DDM1. EMBO J 26:3641–3652. large gene lists using DAVID bioinformatics resources. Nat Protoc 4:44–57. 43. Shen J, et al. (2017) Translational repression by a miniature inverted-repeat trans- 22. Hollister JD, et al. (2011) Transposable elements and small RNAs contribute to gene posable element in the 3′ untranslated region. Nat Commun 8:14651. expression divergence between Arabidopsis thaliana and Arabidopsis lyrata. Proc 44. Stuart T, et al. (2016) Population scale mapping of transposable element diversity Natl Acad Sci USA 108:2322–2327. reveals links to gene regulation and epigenomic variation. eLife 5:e20777. 23. Wei L, Cao X (2016) The effect of transposable elements on phenotypic variation: 45. Feisenstein J (1989) PHYLIP-phylogeny inference package (version 3.2). Cladistics 5:164–166. Insights from plants to humans. Sci China Life Sci 59:24–37. 46. Warren DL, Glor RE, Turelli M (2010) ENMTools: A toolbox for comparative studies of 24. Hurka H, Friesen N, German DA, Franzke A, Neuffer B (2012) ‘Missing link’ species environmental niche models. Ecography 33:607–611. Capsella orientalis and Capsella thracica elucidate evolution of model plant genus 47. Schneeberger K, et al. (2009) SHOREmap: Simultaneous mapping and mutation Capsella (Brassicaceae). Mol Ecol 21:1223–1238. identification by deep sequencing. Nat Methods 6:550–551. 25. Weigel D (2012) Natural variation in Arabidopsis: From molecular genetics to eco- 48. Gruntman E, et al. (2008) Kismeth: Analyzer of plant methylation states through logical genomics. Plant Physiol 158:2–22. bisulfite sequencing. BMC Bioinformatics 9:371.

Niu et al. PNAS | April 2, 2019 | vol. 116 | no. 14 | 6913 Downloaded by guest on October 1, 2021