Correlated Variation and Population Differentiation in Satellite DNA Abundance Among Lines of Drosophila Melanogaster

Correlated variation and population differentiation in satellite DNA abundance among lines of Drosophila melanogaster Kevin H.-C. Wei, Jennifer K. Grenier, Daniel A. Barbash, and Andrew G. Clark1 Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853-2703 Contributed by Andrew G. Clark, November 18, 2014 (sent for review October 1, 2014; reviewed by Giovanni Bosco and Keith A. Maggert) Tandemly repeating satellite DNA elements in heterochromatin repeats of simple sequences (≤10 bp); the most abundant include occupy a substantial portion of many eukaryotic genomes. Al- AAGAG (aka GAGA-satellite), AACATAGAAT (aka 2L3L), though often characterized as genomic parasites deleterious to and AATAT (11, 14, 15). In comparison, the genome of its sister the host, they also can be crucial for essential processes such as species Drosophila simulans,fromwhichD. melanogaster di- chromosome segregation. Adding to their interest, satellite DNA verged ∼2.5 mya, is estimated to have only 5% satellite DNA, elements evolve at high rates; among Drosophila, closely related more than 10-fold less AAGAG, and little to no AACATA- species often differ drastically in both the types and abundances GAAT (16). For further contrast, nearly 50% of the Drosophila of satellite repeats. However, due to technical challenges, the evo- virilis and less than 0.5% of Drosophila erecta genomes are sat- lutionary mechanisms driving this rapid turnover remain unclear. ellite DNA (16, 17). Strikingly, such rapid changes in genomes have been implicated in postzygotic isolation of species in the Here we characterize natural variation in simple-sequence repeats – of 2–10 bp from inbred Drosophila melanogaster lines derived form of hybrid incompatibility in several species of flies (18 20), from multiple populations, using a method we developed called demonstrating the critical role satellite DNA has on the evolu- k-Seek that analyzes unassembled Illumina sequence reads. In ad- tion of genomes and species. The expansions and contractions of satellite sequences are dition to quantifying all previously described satellite repeats, we thought to result from a combination of molecular events such as BIOLOGY identified many novel repeats of low to medium abundance. Many POPULATION unequal crossing over (21), rolling circle replication (22), and of the repeats show population differentiation, including two that polymerase slippage (23). Early population genetic studies as- are present in only some populations. Interestingly, the popula- sumed that satellite DNA has no function and that small changes tion structure inferred from overall satellite quantities does not in copy number are neutral, although total abundance may be recapitulate the expected population relationships based on the under constraint due to the potential burden on metabolism, demographic history of D. melanogaster. We also find that some nuclear volume, and DNA replication (24). Under such assump- satellites of similar sequence composition are correlated across tions, early simulation studies demonstrated that unequal crossing lines, revealing concerted evolution. Moreover, correlated satel- over, drift, and reduced recombination are sufficient to generate lites tend to be interspersed with each other, further suggesting long stretches of satellite DNA from random sequences (21, 25). that concerted change is partially driven by higher order structure. Nevertheless, selection also appears to play an important role in Surprisingly, we identified negative correlations among some sat- shaping satellite DNA. For example, Stephan and Cho (1993) ellites, suggesting antagonistic interactions. Our study demon- showed that selection is important in determining the length and strates that current genome assemblies vastly underestimate the heterogeneity of satellites, suggesting that the drastic interspecific complexity, abundance, and variation of highly repetitive satellite DNA and presents approaches to understand their rapid evolution- Significance ary divergence. Most eukaryotic genomes harbor large amounts of highly re- satellite DNA | population differentiation | rapid evolution petitive satellite DNA primarily in centromeric regions. Closely related Drosophila species have nearly complete turnover of eterochromatin occupies a substantial portion of most the types and quantities of simple sequence repeats. How- Heukaryotic genomes and contains vast quantities of tan- ever, the detailed dynamics of turnover remains unclear, in demly repeating, noncoding DNA elements known as satellite part due to technical challenges in examining these highly DNA. These sequences, along with transposable elements, are repetitive sequences. We present a method (k-Seek) that often described as selfish elements or genomic parasites, as they identifies and quantifies simple sequence repeats from whole can increase their copy numbers irrespective of host fitness (1, 2). genome sequences. By characterizing natural variation in Indeed, they can be highly deleterious for the host genome; for tandem repeats within Drosophila melanogaster, we identi- example, ectopic recombination between homologous satellite fied many novel repeats and found that geographically iso- repeats can lead to devastating chromosomal rearrangements (3, lated populations show differentiation patterns that are, 4). Consequently, these elements are mostly sequestered in re- unexpectedly, incongruous with demographic history. More- pressive chromatin environments around the centromeres and over, repeats undergo correlated change in abundance, pro- telomeres where there is minimal recombination and transcrip- viding additional insight into the dynamics of satellite DNA tional activity. However, paradoxically, repetitive sequences are and genome evolution. also crucial components of euchromatic genomes, as they recruit the centromeric histone H3 variant to form centromeres in many Author contributions: K.H.-C.W., D.A.B., and A.G.C. designed research; K.H.-C.W. per- species (5, 6), thereby affecting the fidelity of chromosome seg- formed research; K.H.-C.W., J.K.G., D.A.B., and A.G.C. contributed new reagents/analytic regation (7, 8). tools; K.H.-C.W. analyzed data; and K.H.-C.W., D.A.B., and A.G.C. wrote the paper. Adding to the perplexity, satellite DNA turns over at re- Reviewers: G.B., Dartmouth College; and K.A.M., Texas A&M University. markably high rates between species (9, 10). In Drosophila mel- The authors declare no conflict of interest. anogaster, satellite DNA is estimated to occupy over 20% of the 1To whom correspondence should be addressed. Email: [email protected]. genome. With the exception of the 359-bp (11), responder (12), This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. and dodeca (13) satellites, most known satellites are tandem 1073/pnas.1421951112/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1421951112 PNAS | December 30, 2014 | vol. 111 | no. 52 | 18793–18798 Downloaded by guest on September 26, 2021 differences may not be neutral (26). Since then, multiple authors A 5mer repeats non-repeat have emphasized the importance of both genetic drift and natural selection in the evolution of repetitive DNA (27, 28). Further- Group2 Group3 k-finder more, recent studies have shown that repetitive sequences can Group1 have remarkable effects on the rest of the genome. For example, k-counter natural variation in the Y chromosome, which is nearly entirely B perfect subs indels heterochromatic, can modulate differential gene expression and 1.0 cause variable phenotypes including differences in immune re- 0.9 sponse (29, 30). These results reveal that changes in repetitive 0.8 identification 0.7 sequences can have fitness consequences on which selection will Freq. of correct 2mer 3mer 4mer 5mer 6mer 7mer 8mer 9mer 10mer act. Meiotic drive models have also been proposed, in which C D centromeric satellites that bias the rate of transmission in female 0.2 meiosis will quickly fix in the population (8, 31), providing an 1.2 0 additional mechanism for rapid turnover of satellite DNA. 0.8 However, technical challenges have hindered research on heterochromatin. Because heterochromatic regions do not recom- 0.4 -0.2 (Relative probe signal) bine, common genetic manipulations are mostly ineffective. 10 r = 0.559 Coefficient of variation 0.2 p < 0.005 Repetitive sequences, particularly low complexity satellite DNA, Log -0.4 present severe challenges for making sequence assemblies and 100 10000 100000 -0.4 -0.2 0 0.2 0.4 Normalized mean (bp) Log (Relative abundance) unique alignments (32). A handful of techniques have been ap- 10 plied to study heterochromatin. High-density cesium–chloride Fig. 1. The k-Seek package identifies and quantifies tandem kmers. (A) gradient centrifugation has been instrumental in identifying major k-finder identifies kmers de novo by fragmenting short reads and grouping satellite blocks of different buoyancy, but fails to isolate less them. k-counter then quantifies the number of tandem occurrences. (B) abundant repeats (11). Hybridization approaches can only label k-Seek applied to simulated 100-bp reads containing tandem arrays of kmers known repeats and are often difficult to quantify precisely. More of different lengths. Simulated tandem arrays contained either perfect recently, flow cytometry has been used to indirectly estimate repeats, up to four substitutions (subs), or indels. Frequency of correct iden- heterochromatic content, but it cannot distinguish

Load more