Access to DNA establishes a secondary target site bias for the yeast retrotransposon Ty5

Joshua A. Baller1, Jiquan Gao1, and Daniel F. Voytas2

Department of Genetics, Cell Biology, and Development, and Center for Genome Engineering, University of Minnesota, Minneapolis, MN 55455

Edited by Joan Curcio, Wadsworth Center, New York State Department of Health, Albany, NY, and accepted by the Editorial Board June 28, 2011 (received for review March 11, 2011)

Integration sites for many retrotransposons and retroviruses are determined by interactions between retroelement-encoded integrases and specific DNA-bound proteins. The Saccharomyces retrotransposon Ty5 preferentially integrates into heterochromatin because of interactions between Ty5 integrase and the heterochromatin protein silent information regulator 4. We mapped over 14,000 Ty5 insertions onto the S. cerevisiae genome, 76% of which occurred in heterochromatin, which is consistent with the known target site bias of Ty5. Using logistic regression, associations were assessed between Ty5 insertions and various chromosomal features such as genome-wide distributions of nucleosomes and histone modifications. Sites of Ty5 insertion, regardless of whether they occurred in heterochromatin or euchromatin, were strongly associated with DNase hypersensitive, nucleosome-free regions flanking genes. Our data support a model wherein silent information regulator 4 tethers the Ty5 integration machinery to domains of heterochromatin, and then, specific target sites are selected based on DNA access, resulting in a secondary target site bias. For insertions in euchromatin, DNA access is the primary determinant of target site choice. One consequence of the secondary target site bias of Ty5 is that insertions in coding sequences occur infrequently, which may preserve genome integrity.

This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, “Telomerase and Retrotransposons: Reverse Transcriptases That Shaped Genomes” held September 29 and 30, 2010, at the Arnold and Mabel Beckman Center of the National Academies of Sciences and Engineering in Irvine, CA. The complete program and audio files of most presentations are available on the NAS Web site at www.nasonline.org/telomerase_and_retrotransposons.
Author contributions: J.A.B., J.G., and D.F.V. designed research; J.A.B. and J.G. performed research; J.A.B. and J.G. contributed new reagents/analytic tools; J.A.B., J.G., and D.F.V. analyzed data; and J.A.B., J.G., and D.F.V. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. J.C. is a guest editor invited by the Editorial Board.
1J.A.B. and J.G. contributed equally to this work.
2To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1103665108/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1103665108 PNAS | December 20, 2011 | vol. 108 | no. 51 | 20351–20356

The insertion of mobile genetic elements into new chromosomal sites profoundly impacts genome structure and evolution. For many mobile elements, integration sites are not chosen randomly. Target site biases are particularly well-documented for the LTR retrotransposons and retroviruses (1–3). These retroelements replicate by reverse-transcribing mRNA into cDNA and then inserting the cDNA into their host’s genome using an element-encoded integrase (IN). Retrotransposons are among the most abundant interspersed repeats in eukaryotic genomes, and retroviruses are often used as vectors for gene therapy. Understanding mechanisms of retroelement target site choice, therefore, has value for both basic and applied research.

In the best studied cases, retroelement target site choice is dictated by interactions between IN and specific DNA-bound proteins. HIV IN, for example, interacts with the transcription coactivator lens epithelial-derived growth factor (4), and sites of HIV integration are influenced by sites of this protein’s chromosomal occupancy (5). The role of chromatin in target site choice is also well-established for model yeast retrotransposons. The Schizosaccharomyces pombe Tf1 element inserts preferentially into regions upstream of some genes transcribed by RNA polymerase (pol) II (6). Tf1 IN interacts with the transcription factor Atf1p (7), and at the fbp1 promoter, Atf1p alone mediates target site choice (8). The Saccharomyces cerevisiae Ty1 and Ty3 retrotransposons prefer to integrate upstream of genes transcribed by RNA pol III, likely because of interactions between IN and components of the pol III machinery or associated chromatin (9, 10). In the case of Ty3, critical factors for targeting are the TATA binding protein and Brf (also called TFIIIB70) (11, 12).

The first retroelement for which a targeting mechanism was described in detail was the Saccharomyces retrotransposon Ty5. Ty5 integrates preferentially into heterochromatin, which in yeast, is found near the telomeres and silent mating loci (HML and HMR) (13–15). Ty5 IN selects integration sites using a 6-aa motif at the IN C terminus (16, 17). This IN targeting domain interacts with a protein component of heterochromatin, namely silent information regulator 4 (Sir4) (16, 18). The Ty5 IN/Sir4 interaction tethers the integration complex to target sites and results in the primary target site bias of Ty5.

In this study, we applied high-throughput DNA sequencing to characterize a large number of Ty5 insertions that we mapped to the S. cerevisiae genome. Whereas the majority of Ty5 elements integrated as predicted in heterochromatin, a secondary target site bias was revealed for both euchromatic and heterochromatic insertions. Logistic regression established that this secondary bias was influenced by chromosomal features characteristic of open chromatin, including DNase hypersensitivity, lack of nucleosomes, presence of transcription factors, and epigenetic marks associated with gene transcription. We provide evidence suggesting that this secondary target site bias reflects sites that can be easily accessed by the Ty5 integration complex during integration.

Results

Ty5 Insertion Dataset. To observe genome-wide patterns of Ty5 integration, we created an integrant library of ∼400,000 independent transposition events. This library was derived from 16 separate Ty5 transposition assays: 8 assays using the WT YPH499 haploid strain and 8 assays using the isogenic WT diploid YPH501. Ty5/host DNA junction fragments were recovered from each of the 16 populations using linker-mediated PCR. Linkers were ligated to genomic DNA that had been digested with restriction enzymes. Four enzymes (each recognizing four bases) were used to maximize potential to recover sites and minimize recovery bias. The genomic sequence at each insertion site was determined by pyrosequencing using the 454 GS FLX platform.

In total, ∼337,000 sequencing reads were obtained (Table 1). Specific barcode sequences in the PCR primers made it possible to assign reads to 1 of 16 transposition assays. Reads were excluded that (i) did not have a perfect match to a barcode and surrounding DNA or (ii) had more than four mismatches to the primer. Furthermore, insertions at a given position and orientation were only counted once in each pool. In total, ∼160,000 reads passed our filters. Sequences sharing more than 98% sequence identity to a single site on the S. cerevisiae genome were designated as unambiguous insertions. Because Ty5 integrates preferentially into repetitive, subtelomeric regions, reads mapping to multiple sites in the genome (greater than 98% sequence identity) were also considered. These ambiguous insertions were down-weighted by a factor equal to the number of sites to which the read mapped (i.e., each ambiguous site was assigned a fraction of an integration event); 40% of the high-quality reads were ambiguous.

Table 1. Ty5 insertion sites recovered by pyrosequencing

Strain name and                                    Base pairs hosting    Base pairs hosting
pool number      Ploidy    All reads  Clean reads  ambiguous alignments  unambiguous alignments
YPH499-1         Haploid   21,960     10,368       468                   743
YPH499-2         Haploid   22,050     11,082       423                   847
YPH499-3         Haploid   22,559     10,356       370                   673
YPH499-4         Haploid   23,351     10,868       444                   766
YPH499-5         Haploid   22,102     10,525       400                   719
YPH499-6         Haploid   21,367     9,161        361                   637
YPH499-7         Haploid   21,361     9,816        540                   912
YPH499-8         Haploid   21,779     10,749       568                   987
YPH501-1         Diploid   18,605     9,127        348                   389
YPH501-2         Diploid   20,365     9,485        207                   228
YPH501-3         Diploid   19,292     8,680        264                   214
YPH501-4         Diploid   20,889     9,903        222                   287
YPH501-5         Diploid   19,572     9,182        212                   234
YPH501-6         Diploid   21,967     10,460       205                   279
YPH501-7         Diploid   21,014     10,542       346                   450
YPH501-8         Diploid   19,237     8,906        205                   243
Not assigned               6,399      -            -                     -

Primary Target Site Bias of Ty5. The majority of Ty5 insertions mapped to the ends of all 16 S. cerevisiae chromosomes (Fig. 1 and Fig. S1). Thus, the primary pattern of Ty5 integration matched what we predicted based on our previous work showing the key role played by heterochromatin in target site choice (15, 18).

a 30-kb buffer between heterochromatin and euchromatin to ensure that signals were distinct. The euchromatin and buffer regions constituted 88% and 7% of the genome, respectively. The rDNA and MAT were excluded from euchromatin, because the former is not accurately represented in the reference genome and the latter contained many ambiguous insertions because of duplicated sequences at the silent mating loci.

Selection could influence the distribution of Ty5 insertions; for example, insertions may not be recovered if they occur in essential genes in haploid strains. To assess impacts of selection, Ty5 insertion sites were compared between the haploid and diploid populations. Both the haploid and diploid chromosomal distributions were nearly identical, with a Pearson’s correlation