MIR Retrotransposon Sequences Provide Insulators to the Human Genome
Jianrong Wang^a, Cristina Vicente-García^b,c, Davide Seruggia^b,c, Eduardo Moltó^b,c, Ana Fernandez-Miñán^d, Ana Neto^d, Elbert Lee^e, José Luis Gómez-Skarmeta^d, Lluís Montoliu^b,c, Victoria V. Lunyak^e, and I. King Jordan^a,f,1
^a School of Biology, Georgia Institute of Technology, Atlanta, GA 30332; ^b Department of Molecular and Cellular Biology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CSIC), 28049 Madrid, Spain; ^c Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III (ISCIII), 28029 Madrid, Spain; ^d Centro Andaluz de Biología del Desarrollo CSIC-Universidad Pablo de Olavide (UPO), 41013 Sevilla, Spain; ^e Aelan Cell Technologies, Inc., San Francisco, CA 94107; and ^f PanAmerican Bioinformatics Institute, Santa Marta, Magdalena, Colombia
Edited by Nancy L. Craig, The Johns Hopkins University School of Medicine, Baltimore, MD, and approved June 26, 2015 (received for review April 21, 2015)
Insulators are regulatory elements that help to organize eukaryotic chromatin via enhancer-blocking and chromatin barrier activity. Although there are several examples of transposable element (TE)-derived insulators, the contribution of TEs to human insulators has not been systematically explored. Mammalian-wide interspersed repeats (MIRs) are a conserved family of TEs that have substantial regulatory capacity and share sequence characteristics with tRNA-related insulators. We sought to evaluate whether MIRs can serve as insulators in the human genome. We applied a bioinformatic screen using genome sequence and functional genomic data from CD4+ T cells to identify a set of 1,178 predicted MIR insulators genome-wide. These predicted MIR insulators were computationally tested to serve as chromatin barriers and regulators of gene expression in CD4+ T cells. The activity of predicted MIR insulators was experimentally validated using in vitro and in vivo enhancer-blocking assays. MIR insulators are enriched around genes of the T-cell receptor pathway and reside at T-cell–specific boundaries of repressive and active chromatin. A total of 58% of the MIR insulators predicted here show evidence of T-cell–specific chromatin barrier and gene regulatory activity. MIR insulators appear to be CCCTC-binding factor (CTCF) independent and show a distinct local chromatin environment with marked peaks for RNA Pol III and a number of histone modifications, suggesting that MIR insulators recruit transcriptional complexes and chromatin modifying enzymes in situ to help establish chromatin and regulatory domains in the human genome. The provisioning of insulators by MIRs across the human genome suggests a specific mechanism by which TE sequences can be used to modulate gene regulatory networks.
transposable elements | insulators | chromatin | gene regulation | genomics
Significance
Insulators are genome sequence elements that help to organize eukaryotic genomes into coherent regulatory domains. Insulators can encode both enhancer-blocking activity, which prevents the interaction between enhancers and promoters located in distinct regulatory domains, and/or chromatin barrier activity that helps to delineate active and repressive chromatin domains. The origins and functional characteristics of insulator sequence elements are important, open questions in molecular biology and genomics. This report provides insight into these questions by demonstrating the origins of a number of human insulator sequences from a family of transposable element–derived repetitive sequence elements: mammalian-wide interspersed repeats (MIRs). Human MIR-derived insulators are characterized by distinct sequence, expression, and chromatin features that provide clues as to their potential mechanisms of action.
Author contributions: J.W., L.M., V.V.L., and I.K.J. designed research; J.W., C.V.-G., D.S., E.M., A.F.-M., A.N., E.L., and J.L.G.-S. performed research; C.V.-G., D.S., E.M., A.F.-M., A.N., E.L., J.L.G.-S., L.M., and V.V.L. contributed new reagents/analytic tools; J.W. and I.K.J. analyzed data; and J.W. and I.K.J. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
1To whom correspondence should be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1507253112/-/DCSupplemental.
E4428–E4437 | PNAS | Published online July 27, 2015 | www.pnas.org/cgi/doi/10.1073/pnas.1507253112
Insulators are regulatory sequence elements that help to organize eukaryotic chromatin into functionally distinct domains (1, 2). Insulators can encode two different functions: enhancer-blocking activity and chromatin barrier activity. Enhancer-blocking insulators prevent the interaction of enhancer and promoter elements located in distinct domains, and chromatin barrier insulators, also known as boundary elements (3, 4), protect active chromatin domains by blocking the spread of repressive chromatin. These two functional roles are not mutually exclusive; compound insulators may encode both enhancer-blocking and chromatin barrier activities (5).
Transposable element sequences are known to provide a variety of regulatory sequences to eukaryotic genomes (6), and there are several examples of transposable element (TE)-derived insulators. The best studied TE insulator comes from the Drosophila gypsy element (7–10). Gypsy is a long terminal repeat retrotransposon that contains an insulator sequence in its 5′ untranslated region. The gypsy insulator interacts with the suppressor of hairy wing [su(Hw)] and modifier of mdg4 [mod(mdg4)] proteins to block regulatory interactions between distal enhancer and proximal promoter sequences. This same insulator can also protect transgenes from position effects, indicating that it encodes chromatin barrier activity as well.
More recently, TE-derived insulator sequences have been discovered in mammalian genomes. The short interspersed nuclear element (SINE) B1 has insulator activity that is mediated by the binding of specific transcription factors along with the insulator-associated protein CCCTC-binding factor (CTCF) (11). A genome-wide analysis of CTCF binding sites in the human and mouse genomes discovered that many CTCF binding sites are derived from TE sequences (12), and a survey of six mammalian species revealed that lineage-specific expansions of retrotransposons have contributed numerous CTCF binding sites to their genomes (13). A number of these TE-derived CTCF binding sites in the mouse and rat genomes are capable of segregating domains enriched or depleted for acetylation of histone 2A lysine 5 (H2AK5ac), suggesting that they may encode insulator function. Interestingly, this same analysis did not detect retrotransposon-driven expansion of CTCF binding sites in the human genome (13).
Whereas subsets of CTCF binding sites are known to be associated with insulators, numerous insulators can function in a CTCF-independent manner. An important example comes from a mouse TE, the SINE B2 element, which serves as a developmentally regulated compound insulator, encoding both enhancer-blocking and chromatin barrier activity, at the growth hormone locus (14). B2 is a tRNA-derived SINE that encodes the B-box promoter element, which is bound by RNA polymerase III (RNA Pol III). The connection to tRNAs/Pol III binding is intriguing, given the fact that tRNA gene sequences/Pol III binding have been shown to encode insulators in yeast (15–18), mouse (19), and human (20, 21). The association of insulators with the binding of RNA Pol III, or transcription factor III C (TFIIIC) specifically, to B-box elements is widely observed in multiple species, suggesting that Pol III-related machinery represents another insulator mechanism in addition to CTCF binding. Because the human genome is made up of a substantial fraction of TE sequences, including numerous tRNA-derived SINE retrotransposons (22), it is highly possible that subsets of these tRNA-derived SINE sequences encode insulator functions. The discovery and characterization of such TE-derived insulators will help to augment the currently sparse insulator annotations in the human genome and also provide additional evidence regarding Pol III-related mechanisms of insulator activity.
Mammalian-wide interspersed repeats (MIRs) are an ancient family of TEs (23) that bear several features, suggesting that they
First, all MIR sequences in the human genome that contain intact B-boxes and are bound by RNA Pol III in CD4+ T cells were identified. Then, these MIRs were evaluated for their ability to partition active versus repressive chromatin using a previously described approach (21) that segregates histone modifications associated with expressed (active) versus silent (repressive) genomic regions. Broad genomic distributions of 39 histone modifications, with 34 characterized as active and 5 characterized as repressive, were evaluated to detect large contiguous regions (domains) of active and repressive chromatin. The B-box–containing and RNA Pol III-bound MIR elements found to be located between adjacent active versus repressive domains were then used for further analysis (SI Appendix, SI Methods). RNA-seq was then used to further reduce the list of putative MIR insulators to those that delineate high- versus low-expressed genomic regions. Finally, only MIR insulators