Du et al. Epigenetics & Chromatin (2016) 9:28 DOI 10.1186/s13072-016-0078-0 Epigenetics & Chromatin

RESEARCH Open Access

Chromatin variation associated with liver metabolism is mediated by transposable elements

Juan Du1,2, Amy Leung1, Candi Trac1, Michael Lee1,2, Brian W. Parks3, Aldons J. Lusis4, Rama Natarajan1,2 and Dustin E. Schones1,2*

Abstract

Background: Functional regulatory regions in eukaryotic genomes are characterized by the disruption of nucleosomes leading to accessible chromatin. The modulation of chromatin accessibility is one of the key mediators of transcriptional regulation, and variation in chromatin accessibility across individuals has been linked to complex traits and disease susceptibility. While mechanisms responsible for chromatin variation across individuals have been investigated, the overwhelming majority of chromatin variation remains unexplained. Furthermore, the processes through which the variation of chromatin accessibility contributes to phenotypic diversity remain poorly understood.

Results: We profiled chromatin accessibility in liver from seven strains of mice with phenotypic diversity in response to a high-fat/high-sucrose (HF/HS) diet and identified reproducible chromatin variation across the individuals. We found that sites of variable chromatin accessibility were more likely to coincide with particular classes of transposable elements (TEs) than sites with common chromatin signatures. Evolutionarily younger long interspersed nuclear elements (LINEs) are particularly likely to harbor variable chromatin sites. These younger LINEs are enriched for binding sites of immune-associated transcription factors, whereas older LINEs are enriched for liver-specific transcription factors. Genomic region enrichment analysis indicates that variable chromatin sites at TEs may function to regulate liver metabolic pathways. CRISPR-Cas9 deletion of a number of variable chromatin sites at TEs altered expression of nearby metabolic genes. Finally, we show that polymorphism of TEs and differential DNA methylation at TEs can both influence chromatin variation.

Conclusions: Our results demonstrate that specific classes of TEs show variable chromatin accessibility across strains of mice that display phenotypic diversity in response to an HF/HS diet. These results indicate that chromatin variation at TEs is an important contributor to phenotypic variation among populations.

Keywords: Chromatin accessibility, Transposable element, Transcription factor, DNA methylation, FAIRE-seq

*Correspondence: [email protected]
1 Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA
Full list of author information is available at the end of the article

Background
Accessible (open) chromatin is a common feature of active regulatory regions in eukaryotic genomes [1, 2]. The cell type-specific accessibility of chromatin allows regulatory factors to bind to the underlying DNA, leading to tightly regulated gene expression [1, 3, 4]. Accessible chromatin regions have been shown to be variable among different individuals [1, 5, 6], and these variable chromatin sites have been shown to be associated with complex traits and disease susceptibility [7]. However, the mechanisms underlying chromatin accessibility variation, and the processes through which this variation impacts phenotypic diversity, remain poorly understood.

Initial investigations into the relationship between variation of chromatin accessibility and genetic variation have begun to elucidate some principles. Examination of chromatin signatures in individuals with diverse ancestries revealed extensive variation in regulatory regions

© 2016 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

and evidence of heritability of these signatures [6]. Chromatin accessibility profiling in human lymphoblastoid cell lines revealed the association of chromatin accessibility signatures with genetic variants which are associated with the expression of nearby genes and potentially phenotypic diversity in humans [5, 8]. A study in erythroblasts from eight strains of inbred mice found that approximately 1/3 of variable open chromatin sites can be explained by single nucleotide variants and that these variants were associated with complex traits and disease [7]. While these pioneering studies have provided some insight into the drivers of chromatin variation, the majority of chromatin variation across the genome remains unexplained.

In addition to single nucleotide variants, transposable elements (TEs) constitute a major portion of genomic variation [9, 10]. Approximately 50 % of the human genome and 40 % of the mouse genome are derived from TEs [11, 12]. TEs can affect nearby gene activity and have been linked to complex traits and diseases, including cancer and diabetes [13, 14]. Due to the deleterious nature of TE transposition, mammalian systems have a number of transcriptional and posttranscriptional mechanisms to silence TEs [15]. The major mechanisms responsible for the suppression of TE transposition are DNA methylation, histone methylation and RNA interference [15–17]. Most DNA methylation in mammals occurs within TE sequences in order to transcriptionally suppress TE activities [17, 18]. Indeed, in somatic cells, most TEs are epigenetically silenced by DNA methylation [19]. However, studies have shown that specific TEs can be derepressed in a tissue-specific manner [19–21]. For example, tissue-specific DNA hypomethylation within TEs has been shown to contribute to novel regulatory networks [19].

There is growing evidence that TEs have evolved for the benefit of the host, contributing to host genome expansion and genetic innovation [22]. TEs can regulate gene expression by functioning as distal enhancers, alternative promoters or alternative splicing signals [19, 20, 23, 24]. Chromatin accessibility at TEs has been associated with the transcription of nearby genes in a tissue-specific manner [25, 26]. Many binding sites for transcription factors (TFs) have been characterized within specific TE sequences [26, 27]. Analysis of TE-associated TF binding sites in different species has further suggested that the expansion of the mammalian TF binding repertoire has been mediated by TE transposition [24, 27]. Given the prevalence of TE sequences and their potential regulatory functions, we hypothesized that TEs can play a regulatory role in mouse liver, and that chromatin accessibility variation at TEs among different individuals may drive phenotypic diversity among them.

To study the roles of TEs in chromatin accessibility variation, we chose seven strains of inbred mice that have a differential response to a “western” high-fat, high-sucrose (HF/HS) diet [28] and performed genome-wide chromatin accessibility profiling in liver tissue using FAIRE-seq [29]. Given that TEs are typically repressed in somatic cells [15, 17], we expected that most TE sequences would be less accessible in mouse liver. Interestingly, we found that a substantial fraction of variable chromatin sites are at TEs. Furthermore, TE-associated regions of chromatin variation among different strains regulate nearby metabolic genes. Taken together, our study shows that TE loci are sources of chromatin accessibility variation and metabolic gene regulation among different inbred strains, which may further impact phenotypic diversity in livers of different strains of mice.

Results

Chromatin accessibility variation observed in livers of mice with differential phenotypes
Previous studies have reported strain-specific heterogeneity in physiological response to HF/HS diet feeding [28, 30]. In this study, we chose male mice from seven commonly used inbred strains of mice: A/J, AKR/J, BALB/cJ, C57BL/6J, C3H/HeJ, CBA/J and DBA/2J. These mice display diverse body fat percentage change after 8 weeks of HF/HS feeding, ranging from an average increase of 70 % (BALB/cJ) to over 200 % (C57BL/6J) [28]. We also observed significant variation of liver phenotypic markers, including liver triglyceride content (Additional file 1: Figure S1) [31], as expected given the important metabolic functions of the liver [32].

To profile chromatin accessibility at a genome-wide level, we performed FAIRE-seq [29] in livers from male mice of the seven strains after 8 weeks of HF/HS feeding (two biological replicates for each strain). In order to mitigate alignment biases, we created strain-specific pseudo-genomes using known single nucleotide polymorphisms (SNPs) [33], as described previously, and mapped the reads for each strain to the corresponding pseudo-genome [7]. With the aligned reads, we utilized F-seq [34] with the IDR framework [35] to identify reproducible peaks from our FAIRE-seq data for each of the seven strains. Using this approach, on average 29,752 reproducible accessible chromatin sites were identified in each individual strain (Additional file 1: Table S1). Combining the sites from the seven strains, we found a union set of 50,775 open chromatin sites. To identify sites that display variation in chromatin accessibility among the strains, we compared quantile-normalized read counts at the union set of sites using the DESeq package [7, 36]. We ranked sites by their adjusted p values (Additional file 1: Figure S2a) and selected the top 5 % as the most variable set of

sites (2539 sites; adjusted p < 1.21e-9). Similarly, we classified the bottom 5 % of sites as the common set of sites. Variable sites display substantial heterogeneity in patterns of chromatin accessibility across the strains (Additional file 1: Figure S2b), indicating that the observed variability is not due to one strain being dramatically different from the others. Examples of variable and common chromatin sites are shown in Fig. 1.

Fig. 1 Chromatin accessibility across inbred strains of mice. Genome browser view of FAIRE-seq tracks for seven strains and RefSeq genes illustrating a a variable chromatin site (boxed in red) at the Adi1 locus and b a common chromatin site (boxed in blue) at the Lipc locus

Given that SNPs have been shown to contribute to chromatin variation in mouse erythroblasts [7], we first tested whether SNPs are associated with chromatin variation in the liver (see “Methods” section), and found that 30 % (764/2539) of the most variable chromatin sites have underlying SNPs that are associated with chromatin variation among the seven strains (Additional file 1: Figure S3a, b). This result is consistent with a previous study using erythroblasts from eight strains of inbred mice [7]. While this analysis provides a genetic explanation for ~1/3 of chromatin variation, the majority of chromatin variation among the inbred strains remained unexplained.

Chromatin variability at TEs across inbred strains
Previous studies have shown that TEs contribute to regulatory networks in mammalian genomes [26]. We therefore reasoned that TEs could influence chromatin accessibility variation among inbred strains. Given that TEs are typically repressed/silenced in somatic cells [15, 17], we expected that TEs would be less enriched at sites of chromatin accessibility compared with random sites. To test this, we examined the prevalence of TEs in all accessible chromatin sites utilizing the RepeatMasker [37] annotation of TEs. As expected, sites of accessible chromatin are less likely to overlap instances of four classes of TEs (DNA transposons and the retrotransposon classes of LINEs (long interspersed nuclear elements), SINEs (short interspersed nuclear elements) and LTRs (long terminal repeats)) compared to random sites in the genome (34 vs. 54 %, p < 2.2e-16, Fisher’s exact test; Additional file 1: Figure S4a, b). These percentages are comparable to a previous study using DNase I hypersensitivity data sets from human tissues [26].

Interestingly, although TE sequences generally display less accessible chromatin, there are more TEs observed at variable chromatin sites than at common chromatin sites (37 vs 32 %, p = 0.001, Fisher’s exact test; Additional file 1: Figure S4c, d). Furthermore, two specific classes of retrotransposons, LINEs and LTRs, are significantly more enriched at variable chromatin sites compared with common chromatin sites (Fig. 2a, b; LINE: p = 3.7e-13; LTR: p = 4.5e-13, Fisher’s exact test). As an example, the variable chromatin site at the Adi1 locus in Fig. 1a coincides with an LTR (Additional file 1: Figure S4e). In contrast, SINEs are more enriched at common chromatin sites compared with variable sites (Fig. 2c, p = 0.00017, Fisher’s exact test). DNA transposons are not enriched at either variable or common sites (Additional file 1: Figure S5a, p = 0.37, Fisher’s exact test).

Variable chromatin sites are enriched at evolutionarily younger LINEs
Since specific subfamilies of TEs can play specific roles in gene regulation [27, 38], we next investigated whether variable chromatin sites are enriched for specific families of TEs. Similar to previous analysis [24], we used the RepeatMasker [37] annotation of TE families and subfamilies and tabulated the occurrences of TEs from each subfamily at variable or common chromatin sites (Additional file 2: Table S2). Intriguingly, we found that several L1Md subfamilies are significantly enriched at the variable sites compared with common sites (Fig. 2d, e). These L1Md subfamilies of TEs are evolutionarily younger compared with other TEs, with the average age of L1Md_T, L1Md_F2, L1Md_A, L1Md_F and L1Md_F3 being 8.27, 15.06, 8.05, 30.29 and 12.05 million years, respectively (see “Methods” section) [27, 37]. Furthermore, the accessibility of chromatin at younger L1Md subfamilies seems to be strain specific, where strains with higher chromatin accessibility at one young L1Md subfamily also showed higher accessibility for other young L1Md subfamilies (Additional file 1: Figure S6). Given that evolutionarily younger LINEs have diverged less and therefore contain less unique sequence, we assessed the potential of mapping biases by generating mappability tracks for the reference genome and representative pseudo-genomes for

Fig. 2 Specific subfamilies of TEs are enriched for variable chromatin sites. a–c Percentage of variable and common chromatin sites overlapping a LINEs, b LTRs and c SINEs (**indicates p < 0.001, Fisher’s exact test). d Numbers of variable (red) or common (blue) chromatin sites overlapping certain subfamilies of TEs (*p < 0.05, **p < 0.001, Fisher’s exact test). e Genome browser view of a variable chromatin site in an L1Md_T element (chr1:176,871,990-176,881,238). The 170-mer mappability tracks for the reference genome and one representative pseudo-genome are shown below. f–h Age distribution of all TEs and TEs at variable or common chromatin sites for different classes of TEs, including f LINEs, g LTRs and h SINEs. Myrs million years
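The enrichment comparisons in Fig. 2a–d are reported with Fisher’s exact test. As a rough illustration of that calculation, the sketch below computes a two-sided Fisher’s exact p value for a 2 × 2 table using only the Python standard library. It is not the authors’ code; the function names and the example counts are hypothetical and chosen only to show the shape of the input.

```python
import math


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are site sets (e.g. variable, common); columns are
    "overlaps the TE class" and "does not overlap".
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    log_denom = _log_comb(total, col1)

    def log_prob(k: int) -> float:
        # Hypergeometric log-probability of k overlapping sites in row 1.
        return _log_comb(row1, k) + _log_comb(row2, col1 - k) - log_denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = log_prob(a)
    # Sum every table that is no more likely than the observed one.
    p = sum(
        math.exp(lp)
        for lp in (log_prob(k) for k in range(k_min, k_max + 1))
        if lp <= observed + 1e-7
    )
    return min(1.0, p)


# Hypothetical counts, for illustration only: 300 of 2539 variable sites
# and 200 of 2539 common sites overlapping a given TE class.
p_value = fisher_exact_two_sided(300, 2539 - 300, 200, 2539 - 200)
print(f"p = {p_value:.3g}")
```

Working in log space avoids overflow in the binomial coefficients when the site sets contain thousands of entries.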

170-mers, the average length of our mapped fragments (see “Methods” section).

To begin to assess the potential association between genotype and strain-specific accessibility at young L1Mds, we profiled chromatin accessibility from two recombinant inbred strains, BXH2/TyJ and BXH19/TyJ, derived from C57BL/6J and C3H/HeJ. C57BL/6J has higher accessibility at younger L1Mds than does C3H/HeJ (Additional file 1: Figure S6). Interestingly, BXH2/TyJ has similar accessibility at young L1Mds compared with that of C57BL/6J, while BXH19/TyJ is more similar to C3H/HeJ (Additional file 1: Figure S7a). We further assessed whether the L1Mds that are commonly accessible in C57BL/6J and BXH2/TyJ but not C3H/HeJ and BXH19/TyJ can be explained by local genetic variants. We found 35 % (14/40) of these L1Mds are regions where C57BL/6J and BXH2/TyJ share a genotype at the locus, while C3H/HeJ and BXH19/TyJ share a different genotype (Additional file 1: Supplementary methods). Given the known roles of suppressor proteins and epigenetic modifications in controlling chromatin accessibility [15], it is not surprising that local genetic variation does not

explain all of chromatin variation. Nevertheless, we did find examples where accessibility of LINE corresponds to the genotype (Additional file 1: Figure S7b).

Given that sites of variable chromatin are enriched for evolutionarily younger families of LINE elements compared with common sites, we next asked whether TEs at variable sites are in general evolutionarily younger than those at common sites. We again separated TEs into four classes (DNA transposons and SINE, LINE and LTR retrotransposons) and plotted the distribution of the evolutionary age of all elements as well as those at variable and common sites separately in each of the classes (Fig. 2f–h; Additional file 1: Figure S5b). Strikingly, we found that LINEs at variable chromatin sites display a bimodal distribution for age, with one subgroup of evolutionarily younger LINEs being prominently variable (Fig. 2f). In contrast, LINEs that overlap common chromatin sites are in general evolutionarily older (Fig. 2f, p < 2.2e-16). This difference between variable and common sites was further exemplified when we grouped individual LINEs into subfamilies (Additional file 1: Figure S8a). We observed a similar, albeit less dramatic, trend for LTR elements (Fig. 2g, p = 2.7e-7, Wilcoxon’s rank-sum test). However, for SINE elements, there was no significant age difference observed between variable and common chromatin sites (Fig. 2h, p = 0.22, Wilcoxon’s rank-sum test). DNA transposons that overlap variable chromatin show slight enrichment at older elements (Additional file 1: Figure S5b, p = 0.049, Wilcoxon’s rank-sum test). However, DNA transposons contribute to a much smaller population of variable chromatin sites as compared to other classes of TEs (Fig. 2a–c, Additional file 1: Figure S5a). These results indicate that younger TEs, especially LINEs, display increased variation in regulatory potential across strains of mice and therefore may be involved in more recent adaptations of regulatory networks.

Increased chromatin accessibility at younger LINEs
In order to understand the regulatory roles of younger LINEs, we examined the chromatin accessibility differences at all LINEs ranked by their evolutionary age (Fig. 3a). To better examine the coverage of all mappable (but not necessarily unique) reads from FAIRE-seq data at repetitive elements, we mapped FAIRE-seq reads to the mouse genome using bowtie2 [39], which is capable of mapping non-unique reads from highly similar TE elements to a given subfamily of TE [40] (see “Methods” section). To examine chromatin accessibility differences among younger and older LINEs, we ranked all RepeatMasker-annotated LINEs by their evolutionary age and then plotted C57BL/6J liver FAIRE-seq read counts upstream and downstream of the annotated 5’ start and 3’ end of all LINEs (Fig. 3a). Interestingly, we found enriched chromatin accessibility at younger LINEs compared with older LINEs (Fig. 3a). To further examine the profiles of chromatin accessibility across entire LINE elements, we stratified LINEs into different groups based on their size and produced aggregate plots of the FAIRE-seq signal at LINEs and flanking regions (Fig. 3b). Consistent with the heatmap analysis (Fig. 3a), younger LINEs have higher chromatin accessibility compared with older LINEs, regardless of size (Fig. 3b). We also found that longer LINEs have more enriched chromatin accessibility compared with shorter ones (Fig. 3b), likely because the longer intact LINEs tend to be evolutionarily younger. Chromatin accessibility in another strain of mice, A/J, reveals a similar trend (Additional file 1: Figure S9). It has previously been shown that intact, longer LINEs can be transcribed [20]. However, we did not detect increased RNA transcripts from younger LINEs compared with older LINEs (Fig. 3c). To ensure that the differential chromatin accessibility and uniform transcription profiles at younger vs older LINEs were not due to mapping biases, we repeated analysis of FAIRE-seq and RNA-seq enrichment at LINE families using TEtranscripts [41], a software package designed for including TEs in the analysis of sequencing datasets. This analysis supported our conclusions that differential chromatin accessibility exists at younger vs older LINEs but there is no differential transcription (Additional file 1: Supplementary methods, Figure S10). However, we cannot rule out the possibility that transcripts from TEs are subjected to posttranscriptional suppression that affects RNA stability [15]. Nevertheless, these results indicate that a group of evolutionarily younger LINEs have potential regulatory features in mouse liver, while not producing stable transcripts.

Differential transcription factor binding sites at younger and older LINEs
TEs have been shown to contain transcription factor binding sites, and contribute to the evolution of the mammalian TF binding repertoire [24, 27, 42]. We examined the potential regulatory roles of LINEs by scanning for binding sites of known TFs in LINE-associated variable chromatin sites stratified by age (see “Methods” section). Intriguingly, we found that different TF binding motifs are enriched at sites overlapping older LINEs compared with those overlapping younger LINEs (Fig. 4a). The motif for HNF4α, a liver TF, is the top enriched motif in variable chromatin sites containing older LINEs (Fig. 4a). HNF4α ChIP-seq data from C57BL/6J liver [43] also validated the enrichment of HNF4α binding at older LINEs compared to younger ones (Fig. 4b; p < 2.2e-16, Wilcoxon’s rank-sum test). In addition, the binding motif for another liver TF, C/EBPα, is also enriched at variable sites containing older LINEs (Fig. 4a). Notably, 59 %

Fig. 3 Differential chromatin accessibility profiles at younger and older LINEs. a Heatmap showing FAIRE-seq read counts from C57BL/6J mouse liver surrounding the 5′ and 3′ borders of LINEs sorted by their evolutionary age. Black triangles denote the 5′ or the 3′ end of LINEs, with counts extending 2500 bp upstream and downstream. b Aggregate plots of average FAIRE-seq read counts upstream, downstream and within LINEs, stratified by size of LINEs (200–500 bp, 500 bp–1 kb, 1–2 kb, 2–5 kb and > 5 kb; younger LINEs: < 40 Myrs, older LINEs: ≥ 40 Myrs). c Heatmap showing RNA-seq read counts from C57BL/6J mouse liver surrounding the 5′ and 3′ border of LINEs sorted by their evolutionary age. Myrs million years
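The aggregate plots in Fig. 3b average read counts across many LINEs after splitting them by age. A minimal sketch of that averaging step is given below, assuming each element has already been reduced to a list of read counts in equally sized positional bins. The function name, the input layout and the toy numbers are hypothetical; only the 40 Myrs cutoff is taken from the figure legend.

```python
from statistics import mean

AGE_CUTOFF_MYRS = 40  # younger: < 40 Myrs; older: >= 40 Myrs (Fig. 3 legend)


def aggregate_profiles(elements):
    """Average per-bin read counts separately for younger and older LINEs.

    `elements` is an iterable of (age_myrs, counts) pairs, where `counts`
    holds read counts in equally sized bins across a window around a LINE
    boundary. All `counts` lists must have the same length.
    """
    groups = {"younger": [], "older": []}
    for age_myrs, counts in elements:
        key = "younger" if age_myrs < AGE_CUTOFF_MYRS else "older"
        groups[key].append(counts)
    profiles = {}
    for key, rows in groups.items():
        # zip(*rows) regroups the counts bin by bin across elements.
        profiles[key] = [mean(column) for column in zip(*rows)] if rows else []
    return profiles


# Hypothetical toy input: three LINEs, four bins each.
toy = [
    (8.3, [2, 6, 9, 4]),
    (15.1, [4, 8, 11, 6]),
    (62.0, [1, 1, 2, 1]),
]
profiles = aggregate_profiles(toy)
print(profiles)
```

Stratifying by element size as well, as in Fig. 3b, would only require adding the size group to the dictionary key.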

(67/114) of variable chromatin sites overlapping older LINEs are bound by the two liver TFs, HNF4α and/or C/EBPα. To serve as a control, we searched for the sites that are bound by CTCF [4], a non-liver-specific TF. Compared with HNF4α and/or C/EBPα, we found only 11 % (12/114) of variable chromatin sites overlapping older LINEs to be bound by CTCF, indicating the important role of older LINEs in liver-specific transcription regulation (Fig. 4c, p = 7.5e-15, Fisher’s exact test).

Intriguingly, variable chromatin sites containing younger LINEs are most enriched for the binding motif of STAT proteins (Fig. 4a), which have been shown to play an important role in response to inflammation in liver [44]. In addition, we noticed that several other enriched motifs contain a half GAS motif (TTC or GAA), to which STATs can also bind [45, 46]. To further investigate the presence of specific TF binding at specific LINEs, we used the occurrence of motifs at accessible chromatin sites in C57BL/6J mouse liver as a predictor of binding [47]. Of the predicted HNF4α binding sites, 87 % (7209/8296) have HNF4α ChIP-seq peaks in mouse liver [43]. We furthermore found that compared with older LINEs that are enriched for HNF4α binding sites, younger LINEs at accessible chromatin regions are enriched for STAT binding sites (Fig. 4d; p = 2.3e-9, Wilcoxon’s rank-sum test). To confirm the results, we performed chromatin immunoprecipitation (ChIP) using antibodies targeting STAT3, a member of the STAT protein family known to be active in the liver [48]. Using quantitative PCR (qPCR), we found that STAT3 binds to an L1Md_F2 element in a strain-specific manner (Fig. 4e, f). These results indicate that younger LINEs may play a role in the STAT-mediated immune response in the liver.

TE-associated variable chromatin sites contribute to liver metabolic pathways
To further investigate the impact of variable chromatin at TEs on phenotypic diversity among strains, we used the genomic regions enrichment of annotations tool (GREAT) [49] to identify enriched biological functions of accessible chromatin sites overlapping TEs. We found that variable chromatin sites with TEs are enriched in liver metabolic pathways, including gluconeogenesis, insulin secretion and lipid storage (Fig. 5a). Variable chromatin sites containing younger LINEs are enriched in the negative regulation of gluconeogenesis (Additional file 1:

Fig. 4 Specific TFs bind to younger and older LINEs. a Top known motifs found in variable chromatin sites overlapping older (top) or younger (bottom) LINEs. Numbers in parentheses represent p-values of enrichment of motif occurrence in the given set of sequences compared with background, and the percentage of sequences with the motif. Older LINEs (≥ 40 Myrs): HNF4α (1e-4, 36.8%), C/EBP (1e-2, 28.1%), ESRRB (1e-2, 36.8%). Younger LINEs (< 40 Myrs): STAT (1e-5, 20.7%), ERE (1e-3, 14.6%), ETS (1e-3, 13.4%), PGR (1e-2, 13.4%), TCF3 (1e-2, 17.0%), NFAT (1e-2, 39.6%). b Age distribution of all LINEs as well as LINEs containing HNF4α ChIP-seq peaks. Myrs million years. c Percentage of older-LINE-associated variable sites bound by liver TFs (HNF4α and C/EBPα) or CTCF (**indicates p < 0.001, Fisher’s exact test). d Age distribution of all LINEs as well as LINEs containing STAT or HNF4α motif within accessible chromatin sites in C57BL/6J mouse liver. e Genome browser view of an L1Md_F2 located upstream of Ugt2b37. The GAS motif site is overlapping a site that is accessible in C57BL/6J but not C3H/HeJ. The 170-mer mappability tracks for the reference genome and the C3H/HeJ pseudo-genome are shown below. f Mean relative value of STAT3 ChIP-qPCR compared to IgG control of liver samples from C57BL/6J or C3H/HeJ mice. Error bars represent the SEM (n = 3)
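The text notes that several enriched motifs contain a half GAS motif (TTC or GAA). The sketch below shows one simple way to locate such half-sites in a DNA sequence with regular expressions. It is not the motif-enrichment procedure used in the study; the function name and the example sequence are hypothetical, and the full-site pattern TTCNNNGAA is an assumption based on the commonly cited GAS consensus rather than something stated in this article.

```python
import re

HALF_SITES = re.compile(r"(?=(TTC|GAA))")      # half GAS motif, either strand
FULL_GAS = re.compile(r"(?=TTC[ACGT]{3}GAA)")  # assumed TTCNNNGAA consensus


def scan_gas(sequence: str):
    """Return (half_sites, full_sites) found in a DNA sequence.

    half_sites: list of (0-based start, matched trinucleotide).
    full_sites: list of 0-based starts of TTCNNNGAA matches.
    Lookahead patterns are used so that overlapping matches are reported.
    """
    seq = sequence.upper()
    half = [(m.start(), m.group(1)) for m in HALF_SITES.finditer(seq)]
    full = [m.start() for m in FULL_GAS.finditer(seq)]
    return half, full


# Hypothetical sequence, not taken from the L1Md_F2 element in Fig. 4e.
half_sites, full_sites = scan_gas("ATTCCAGGAAT")
print(half_sites, full_sites)
```

Because GAA is the reverse complement of TTC, scanning one strand for both trinucleotides covers half-sites on either strand.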

Table S3). In contrast, variable chromatin sites at unique sequences (without TE or other repeats) of the genome are only enriched for filopodium assembly and antigen processing pathways (Fig. 5a). In addition, variable chromatin sites overlapping other types of repeat show no enrichment for biological functions and comprise only a small percentage of variable chromatin sites. To serve as a control, we also searched for enriched biological processes in common chromatin sites with or without TEs. Not surprisingly, both groups of common chromatin sites are enriched for liver metabolic processes, including triglyceride metabolic process and cellular response to oxidative stress (Additional file 1: Table S3), indicating that these liver metabolic pathways are conserved and tightly regulated in all the strains. Taken together, these results suggest that the variation of chromatin accessibility among different strains is associated with liver metabolic pathways through specific TE sequences.

As an example, several TE-associated variable chromatin sites are found in the major urinary protein (MUP) gene locus. Two LINE-associated variable sites proximal to Mup19 and Mup5 are shown in Fig. 5b. MUP family proteins are expressed mainly in the liver and bind to small lipophilic molecules, including fatty acids [50]. MUPs have been shown to play important roles in glucose and lipid metabolism and are highly polymorphic in mice [50–52]. Our results here suggest that TEs are involved in this polymorphic feature of MUPs in the mouse genome.

Fig. 5 TE-associated chromatin variation impacts the expression of liver metabolic genes. a Top enriched biological processes of variable chromatin sites overlapping TEs, other repeats or non-repetitive (unique) sequences. TEs (37%): negative regulation of cellular carbohydrate metabolic process; regulation of gluconeogenesis; negative regulation of carbohydrate metabolic process; negative regulation of insulin secretion involved in cellular response to glucose stimulus; regulation of lipid storage. Other repeats (20%): no enriched terms. Unique (43%): filopodium assembly; antigen processing and presentation of peptide antigen via MHC Class I. The full list of enriched biological pathways is in Additional file 1: Table S3. Genomic coordinates of accessible chromatin sites were used as input for genomic regions enrichment of annotations tool (GREAT) analysis (see “Methods” section). b Genome browser view of TE-associated variable chromatin sites within the MUP locus. c Genome browser view of an LTR-associated variable chromatin site upstream of Ugt2b37. The guide RNAs designed for CRISPR-Cas9 deletion, and 170-mer mappability tracks of BALB/cJ pseudo-genome are shown below. d Quantitative polymerase chain reaction (qPCR) of Ugt2b37 expression levels in control or ΔLTR.Ugt2b37 H2.35 cells (*indicates p < 0.05, Student’s t test)
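The expression comparison in Fig. 5d is reported with Student’s t test. As a small worked illustration, the sketch below computes the two-sample t statistic with pooled variance and its degrees of freedom using the Python standard library. The function name and the relative expression values are hypothetical and are not measurements from the study; turning the statistic into a p value would additionally require the t distribution, which is left out here.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Converting t to a p-value needs the
    t distribution, which the standard library does not provide.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical relative expression values (control set to about 1).
control = [1.00, 1.10, 0.90]
deletion = [0.60, 0.70, 0.50]
t_stat, dof = student_t(control, deletion)
print(f"t = {t_stat:.2f}, df = {dof}")
```

The pooled-variance form assumes similar variance in the two groups, which is the assumption behind the classical Student’s test.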

To validate the function of TEs at variable chromatin sites in regulating nearby genes, we used the CRISPR-Cas9 system to generate deletion of TE sequences in H2.35 cells, a cell line derived from BALB/c hepatocytes. We first tested an LTR located 1 kb upstream of UDP glucuronosyltransferase 2 family, polypeptide B37 (Ugt2b37), a member of the UGT family (Fig. 5c). UGT gene family members encode enzymes in detoxification pathways and are upregulated in steatotic liver tissue from obese mice [53]. The deletion of this LTR leads to significant reduction in Ugt2b37 expression (Fig. 5d, p = 0.03, Student’s t test). We further tested two additional variable LINEs and showed that the deletion of each leads to the dysregulation of nearby metabolic genes. Deletion of an L1Md_F2 located 11 kb upstream of Ugt2b37 also leads to significant reduction in Ugt2b37 expression (Additional file 1: Figure S11a, p = 0.004, Student’s t test). We also deleted an Lx8 LINE element 6 kb downstream of suppressor of defective silencing 3 homolog (Suds3). Suds3 encodes a protein component of the SIN3

histone deacetylase (HDAC) corepressor complex, which has been shown to play a regulatory role in metabolic control in the liver [54]. Deletion of the Lx8 leads to increased expression of Suds3, which indicates a potential suppressor function of the Lx8 (Additional file 1: Figure S11b, p = 0.02, Student’s t test). These results validated that variable TEs contribute to the regulation of metabolic genes in liver cells.

Given that sites displaying chromatin variation at TEs are enriched in metabolic pathways, we hypothesized that variable TE sequences regulate nearby metabolic genes in response to diet. Our previous work has demonstrated that HF/HS diet leads to chromatin remodeling at regulatory regions in the liver [55]. We therefore examined chromatin accessibility differences in control-fed and HF/HS diet-fed C57BL/6J male mice from [55]. Intriguingly, TE-associated variable chromatin sites have increased accessibility in response to HF/HS diet, compared with common sites or random sites (Fig. 6a). Examining the accessibility of LINE elements that were unique to either diet condition revealed that accessible LINEs in HF/HS-fed mice, but not control-fed mice, are enriched for lipid metabolic pathways (Fig. 6b) and are proximal to metabolic genes with altered expression in response to HF/HS diet (Fig. 6c, d). These results indicate that TEs contribute to regulatory changes in the liver in response to diet.

TE polymorphic variants contribute to regulatory variation across inbred strains

Given the widespread contribution of TEs to regulatory networks, we were further interested in characterizing the potential mechanisms responsible for TE-driven regulatory variation among different strains. One possible mechanism whereby TEs could contribute to chromatin accessibility variation is TE polymorphism, where a TE is present in one genome and not in another (Fig. 7a). A previous study has characterized TE polymorphism across 18 strains of mice, including the seven strains in our study [10]. Figure 7b shows an example of a polymorphic LTR variant associated with chromatin variation. The LTR element present in C57BL/6J, CBA/J and DBA/2J genomes [10] contains a strain-specific accessible chromatin region. Interestingly, the accessible chromatin site within the LTR also shows evidence of binding by liver TFs, including HNF4α, C/EBPα and FOXA1 (Fig. 7b). This region is within the intron of the Enpp1 gene, which encodes a pyrophosphatase, and has been shown to be related to type 2 diabetes [56]. These results indicate that polymorphic TE-associated chromatin sites may play a strain-specific regulatory role for Enpp1. Altogether, we found that approximately 30 % of polymorphic TE sites are bound by liver TFs (Fig. 7c), suggesting that these polymorphic TE-associated variable chromatin sites are playing regulatory roles. While we found that only 6 % (59/934) of the TEs that overlap with variable chromatin sites are polymorphic among the strains, this is likely an underestimate given the difficulty in genome assembly at repetitive regions of the genome [10].

Differential DNA methylation at TEs contributes to regulatory variation across inbred strains

It has been previously demonstrated that TEs are subject to regulation through epigenetic mechanisms, including DNA methylation and histone modifications [17]. In human somatic cells, DNA hypomethylation has been found within specific TE subfamilies that are associated with enhancer marks [19]. We therefore reasoned that TEs with differential chromatin accessibility not classified as polymorphic could be differentially regulated through epigenetic mechanisms, such as DNA methylation (Fig. 8a). An example of a TE-containing variable chromatin locus with negatively associated CpG methylation levels is shown in Fig. 8b. Interestingly, strain-specific (A/J vs C57BL/6J) binding of liver TFs [43] indicates that this region is differentially bound by liver TFs as well (Fig. 8b). Bisulfite sequencing at the region highlighted in Fig. 8b in livers from A/J and C57BL/6J revealed differential methylation of this region (Fig. 8c). To examine the impact of differential methylation at TEs on chromatin variation across the genome, we utilized reduced representation bisulfite sequencing (RRBS) data from liver tissue of the same strains of mice [57]. Interestingly, variable chromatin sites at TEs have a greater degree of DNA methylation variation across strains as compared to variable chromatin sites at other regions (Fig. 8d). These results indicate that differential epigenetic suppression of TEs contributes to chromatin accessibility variation across the strains.

To further validate that the epigenetic variation at TEs in liver is not restricted to the seven inbred strains of mice, we compared the CpG methylation levels from livers of 25 inbred mouse strains [57]. Consistent with the results presented above, the differentially methylated (DM) CpG sites among inbred strains are significantly enriched for TEs compared to other CpG sites (Fig. 8e, p < 2.2 × 10^−16, Fisher’s exact test). These results suggest that widespread chromatin variation at TEs is a general feature in mouse liver.

Discussion

While previous studies have identified a genetic component to chromatin variation [7], the mechanisms underlying the majority of chromatin variation have remained unexplained. We report here that TEs are a major contributor to chromatin variation in liver tissue and furthermore that TE-driven chromatin variation is important for metabolic phenotypes.

[Figure 6. Panel a: log2 (HFHS/Control) for Variable TE, Common TE and Random sites in C57BL/6J. Panel b: GO Biological Process terms enriched for HF/HS-specific accessible LINEs, with −log10 (hypergeometric p-values): Lipid homeostasis 8.37; Fatty acid metabolic process 7.02; Alpha-amino acid metabolic process 6.48; Cellular amino acid catabolic process 5.61; Triglyceride metabolic process 5.61; Alcohol metabolic process 5.55; Acylglycerol metabolic process 5.32; Cholesterol metabolic process 5.02; Positive regulation of lipid storage 4.97; Intrinsic apoptotic signaling pathway 4.96. Panels c, d: RNA-seq and FAIRE tracks for control and HF/HS livers at Dusp1 (FPKM 4.2 control, 9.2 HF/HS) and Herpud1 (FPKM 56.5 control, 105.2 HF/HS), with TE annotation and C57BL/6J 170-mer mappability.]

Fig. 6 Variable TEs are associated with diet-induced gene expression changes. a Fold change (HF/HS over control) of C57BL/6J FAIRE-seq reads at variable TE sites, common TE sites or random accessible sites. Box plots show the median value, and whiskers show distribution of first and third quartile (**indicates p < 0.0001, Wilcoxon’s rank-sum test). b Enriched biological processes of accessible LINEs specific for HF/HS-fed C57BL/6J mice (4339 regions). Genomic coordinates were used as input for GREAT analysis. c, d Genome browser views of LINE-associated diet-induced chromatin remodeled sites near diet-induced metabolic genes, Dusp1 (c) and Herpud1 (d). Numbers in parentheses for RNA-seq tracks represent FPKM values
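To make the fold-change quantities in this figure concrete, here is a minimal Python sketch of a log2 (HF/HS over control) calculation. It is an illustration only, not the authors' pipeline: the function name and the optional pseudocount are assumptions, and the FPKM values are those printed in parentheses on the RNA-seq tracks of panels c and d, assumed to read left to right as Dusp1 then Herpud1.

```python
import math

def log2_fold_change(treated, control, pseudocount=0.0):
    """Return log2(treated / control).

    The pseudocount (hypothetical, default 0) is added to both values and
    can be raised to guard against division by zero in sparse data.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))

# (control FPKM, HF/HS FPKM) as printed on the Fig. 6c, d RNA-seq tracks.
fpkm = {"Dusp1": (4.2, 9.2), "Herpud1": (56.5, 105.2)}

for gene, (control, hfhs) in fpkm.items():
    lfc = log2_fold_change(hfhs, control)
    print(f"{gene}: log2(HF/HS / control) = {lfc:.2f}")
```

Both genes come out with a positive log2 fold change, corresponding to roughly a 2.2-fold (Dusp1) and 1.9-fold (Herpud1) increase in FPKM under the HF/HS diet.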

We have previously shown that variation in chromatin accessibility across three strains of mice in response to diet depends on genetic factors [55, 58]. We have now extended our study to a total of seven inbred strains that have significant variability in liver phenotypes in response to a HF/HS diet. The variability of the phenotype in these mice resembles the diversity of diet response in humans [59]. Although accessible chromatin sites are less likely to overlap TEs in liver tissue in general (Additional file 1: Figure S4a, b), we found that chromatin sites with higher variability in different strains are enriched for TEs, specifically evolutionarily younger LINEs. We furthermore demonstrated that strains with higher accessibility for a given young L1Md subfamily also display higher accessibility for other young L1Md subfamilies. One explanation for this is that certain strains have less faithful silencing of younger LINEs compared with others. Further studies examining the strain-specific regulation of young LINEs will be enlightening. In addition to potential long-range effects, variability of chromatin accessibility at TEs might also be influenced by local genetic variation, as indicated by the regions where genotype and chromatin accessibility correspond (Additional file 1: Supplementary methods, Figure S7b, Figure S14).

Epigenetic variability can occur both inter-strain and inter-individual. In our study, we used duplicates of each strain of mice for chromatin accessibility profiling and employed a computational pipeline (see “Methods” section) to identify reproducible chromatin variation among different strains of mice. A previous study on C57BL/6J mice showed inter-individual variation of DNA methylation at TEs [60]. We used the 356 regions identified as inter-individual differentially methylated regions [60] and found less than 1 % (16/2539) of our variable sites contain inter-individual variability, indicating that the majority of the variable chromatin sites we identified represent sites of variability among different strains of mice.
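The comparison above amounts to counting how many variable chromatin sites overlap a set of inter-individual differentially methylated regions. The sketch below shows one simple way such an overlap fraction could be computed in Python; it is an assumption-laden illustration (half-open coordinates, a brute-force scan, and entirely hypothetical toy intervals), not the method used in the study.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) a and b share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(sites, regions):
    """Fraction of sites that overlap at least one region (O(n*m) scan)."""
    hits = sum(1 for s in sites if any(overlaps(s, r) for r in regions))
    return hits / len(sites)

# Hypothetical toy coordinates, for illustration only.
variable_sites = [("chr1", 100, 500), ("chr1", 900, 1300), ("chr2", 50, 450)]
inter_individual_dmrs = [("chr1", 450, 600)]

# Only the first site overlaps the single region, so the fraction is 1/3.
print(fraction_overlapping(variable_sites, inter_individual_dmrs))

# The proportion reported in the text, 16 of 2539 variable sites.
print(f"{16 / 2539:.2%}")
```

For genome-scale inputs a sorted sweep or an interval-tree lookup would be the more usual choice; the quadratic scan is kept here only for readability.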

[Figure 7. Panel b: ChIP-seq tracks (FOXA1, C/EBPα, HNF4α, input; C57BL/6J) and FAIRE-seq tracks for A/J, AKR/J, BALB/cJ, C3H/HeJ, C57BL/6J, CBA/J and DBA/2J at the Enpp1 locus, with the LTR annotated as IAPEz-int and C57BL/6J and C3H/HeJ 170-mer mappability. Panel c: percentage bound by HNF4α/C/EBPα/FOXA1 in C57BL/6J and A/J.]

Fig. 7 TE polymorphism accounts for a small percentage of chromatin variation. a A model of TE contribution to chromatin variation through TE polymorphism. b Genome browser view of a polymorphic LTR variant with variable chromatin accessibility. Strains with FAIRE-seq tracks colored in red contain the LTR element, while the strains in black do not have the LTR element. ChIP-seq tracks of three liver TFs and input from C57BL/6J mouse liver are also shown. c Percentage of polymorphic TE-associated variable chromatin sites shown to be bound by liver TFs in C57BL/6J or A/J mice livers

TEs have been shown to play an important role in expanding the TF binding repertoire during mammalian evolution [24, 27]. Supportive of this, we found that younger and older LINEs have differential chromatin accessibility and are bound by different TFs. Evolutionarily older LINEs are enriched for binding sites of liver regulatory factors (HNF4α and C/EBPα), indicating their important regulatory roles in the liver. In contrast, younger LINEs are enriched for the binding sites of TFs involved in immune response, such as STATs. The relationship between STAT and TEs is intriguing; a recent study demonstrated that specific TEs play a functional role in immune pathways in human HeLa cells [21]. It is possible that the variable sites uncovered by our studies also contribute to immune pathways regulated by STAT proteins. STATs have been shown to be involved in the development of hepatosteatosis [61], which can also be induced by HF diet [62]. Given that these STAT-bound LINEs are at variable chromatin sites, STAT binding to the young LINEs could be a source of chromatin variation. Importantly, our results indicate that specific LINE elements of different evolutionary age have contributed unique elements to regulatory networks.

We further investigated possible mechanisms of chromatin variation at TEs. TE polymorphism explains at least 6 % of the TE contribution to chromatin variation. Previous work has shown that less than 10 % of these structural variants result in detectable gene expression changes [10]. However, we found approximately 30 % of polymorphic TE sites to be bound by liver TFs, indicating that they play a regulatory role in liver. This discrepancy may be due to the highly stringent threshold used in the previous work [10]. It is also possible that liver-TF-bound sites are not directly regulating nearby gene targets [63].

Epigenetic mechanisms, such as DNA methylation, have been shown to play an important role in suppression of TE activity in somatic cells [17]. We show here that variation of CpG methylation at TEs contributes to chromatin accessibility variation. DNA hypomethylation at specific TEs has been shown to be associated with enhancer activity [19]. Therefore, these TE-associated

[Figure 8. Panel b: ChIP-seq tracks (FOXA1, C/EBPα, HNF4α, input) and FAIRE-seq tracks for A/J and C57BL/6J at the Suds3 locus, with CpG sites, TE annotation and C57BL/6J 170-mer mappability. Panel c: bisulfite sequencing clones, labeled A/J (30%) and C57BL/6J (89%). Panel d: variance of CpG methylation at variable chromatin sites overlapping TEs, other repeats or unique sequences. Panel e: percentage of DM and other CpGs in TEs.]

Fig. 8 Differential DNA methylation at TEs can impact chromatin accessibility variation across inbred strains. a A model of TE contribution to chromatin variation through differential DNA methylation. b Genome browser view of differentially methylated TEs with variable chromatin accessibility. ChIP-seq tracks of three liver TFs and input from C57BL/6J and A/J livers are also shown. CpG sites within the highlighted accessible chromatin sites are shown in blue. c Bisulfite sequencing in A/J and C57BL/6J liver of the highlighted region in b. Filled circles represent methylated CpGs, whereas open circles represent unmethylated CpGs. d Boxplot of CpG methylation variance among the seven strains within variable chromatin sites that contain TEs, other repeat elements or unique sequences. Box plots show the median value, and whiskers show distribution of first and third quartile. e Percentage of differentially methylated (DM) vs other CpGs in TE sequences. Differential methylation was defined from RRBS data of 25 inbred strains (see “Methods” section) (**indicates p < 0.001, Fisher’s exact test)
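The DM definition referenced in this caption is given in the Methods as a variance greater than 0.05 together with a range of methylation differences greater than 0.75. A minimal Python sketch of that rule follows. It is an illustration under stated assumptions, not the authors' code: methylation levels are taken as fractions between 0 and 1, the variance is computed as a population variance (the source does not say which estimator was used), and the function name and example values are hypothetical.

```python
from statistics import pvariance

def is_differentially_methylated(levels, min_variance=0.05, min_range=0.75):
    """Apply the variance and range thresholds stated in the Methods.

    levels: methylation fractions (0..1) for one CpG, one value per strain.
    Population variance is an assumption; the estimator is not specified.
    """
    spread = max(levels) - min(levels)
    return pvariance(levels) > min_variance and spread > min_range

# Hypothetical per-strain methylation fractions for two CpG sites.
bimodal_cpg = [0.05, 0.10, 0.90, 0.95, 0.90, 0.10, 0.05]
stable_cpg = [0.80, 0.85, 0.90, 0.82, 0.88, 0.86, 0.84]

print(is_differentially_methylated(bimodal_cpg))  # high variance, wide range
print(is_differentially_methylated(stable_cpg))   # narrow range, not DM
```

Requiring both conditions means a site must differ strongly between at least two strains and also vary across the panel as a whole, rather than being driven by a single mildly divergent strain.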

chromatin sites may have differential enhancer activity in different strains. Future studies on histone modifications may explain more of the impact of TE epigenetic regulation on chromatin accessibility.

Our finding that TEs contribute to chromatin variation and metabolic gene regulation suggests that the phenotypic diversity observed across the strains is at least partially due to the regulatory role of TEs. One of the classical models of TE contribution to phenotypic diversity is the agouti viable yellow (Avy) gene, for which a TE exists upstream of the Avy gene [64]. Variation of DNA methylation at this TE regulates the expression of the Avy gene and therefore leads to differential coat color and obesity susceptibility [64, 65]. Further experimental validation on the TE-associated variable chromatin sites may lead to the identification of more examples like this. Studies in different tissue types and disease systems may further reveal the impact of TEs on phenotypic diversity.

Conclusions

In summary, our study has revealed that specific classes of TEs, especially younger LINEs, can impact chromatin accessibility variation in liver of different inbred strains. We further demonstrate that TEs regulate tissue-specific genes, which may result in downstream phenotypic diversity.

Methods

Animal

Mice were obtained from The Jackson Laboratory and were bred at the University of California, Los Angeles. Male A/J, AKR/J, BALB/cJ, C57BL/6J, C3H/HeJ, CBA/J and DBA/2J mice were maintained on a chow diet (Ralston Purina Company) until 8 weeks of age. Then they were given a high-fat, high-sucrose diet (Research diets D12266B, 16.8 % kcal protein, 51.4 % kcal carbohydrate and 31.8 % kcal fat) for 8 weeks. During the feeding period, body fat percentage was tracked as described previously [28]. Mice were then humanely euthanized and livers were harvested. All animal study protocols in this study were approved by the Institutional Animal Care and Use Committee (IACUC) at University of California, Los Angeles and by the Institutional Animal Care and Use Committee (IACUC) at the City of Hope.

Phenotypic characterization of mice

Hematoxylin and eosin (H&E) staining and Oil red O staining were performed on liver sections by the

Pathology Core at the City of Hope using standard procedures.

FAIRE-seq and alignment

Formaldehyde-assisted isolation of regulatory elements (FAIRE) was performed on flash frozen liver tissues from two biological replicates in each strain as previously described [29]. Isolated FAIRE DNA fragments from each sample were barcoded and sequenced on the Illumina HiSeq 2500 to produce 100 × 100 bp paired-end reads.

In order to eliminate the mapping biases caused by inter-strain sequence variation, we first generated a pseudo-genome for each non-reference strain by introducing SNPs from each strain into the reference mouse genome (mm9) [33]. We then mapped FAIRE-seq reads from each replicate to the appropriate pseudo-genome using bowtie1 [66] and only reads that could be mapped to a single location in the genome were retained. Aligned reads were further filtered to exclude improperly paired reads and PCR duplicates. Overall, we obtained around 17 million uniquely mapped non-duplicate reads in each sample (Additional file 1: Table S1). Wiggle tracks were generated for visualization on the UCSC Genome Browser [67].

For the analysis of FAIRE-seq and RNA-seq coverage at LINEs (Fig. 3), we mapped reads to the reference genome using bowtie2 with the local alignment option [39], as described previously [40]. Unlike bowtie1 with unique mapping mode, the bowtie2 alignment method keeps reads with multiple alignments and reports the best alignments [39]. Therefore, the reads from highly similar TE elements can be mapped to a given subfamily of TE.

Mappability score

In order to mitigate mapping biases, we generated mappability scores for the reference (C57BL/6J) and non-reference pseudo-genomes. We used the genome multitool (GEM) mapper [68] to generate mappability scores. The average length of paired-end FAIRE-seq fragments for the seven strains was 170 ± 3 bp (mean ± standard deviation). Therefore, we generated 170-mer mappability scores with up to two mismatches allowed. The mappability score (M) measures how often the sequence found at the particular location will align within the whole genome. M = 1 means a unique match in the genome, M = 0.5 means two matches in the genome, and so on. All the tracks shown here are in the form of signals ranging from 0 to 1.

Accessible chromatin detection and analysis

To identify accessible chromatin sites from FAIRE-seq reads for each library, F-seq was used with default parameters and a 400 bp feature length [34]. To find reproducible peaks across replicates, we utilized the irreproducible discovery rate (IDR) framework [35]. To obtain a union set of accessible chromatin sites from the seven strains, we used the mergeBed function with default parameters [69].

To identify variable chromatin sites among different strains, we first counted the FAIRE-seq reads from each FAIRE-seq library at the union set of accessible chromatin sites. We normalized the read counts using quantile normalization [70]. We then used DESeq [36] to identify variable chromatin sites among the seven strains, as has been applied previously [7]. We ranked the accessible chromatin sites by adjusted p-values from DESeq. The 5 % with the smallest adjusted p-values were considered as variable chromatin sites, whereas the 5 % with the largest adjusted p-values were considered as least variable (common) chromatin sites among the seven strains (Additional file 1: Figure S2).

Association between SNP genotype and chromatin accessibility

The correlation of FAIRE-seq signal and local sequence variation (Additional file 1: Figure S3) was analyzed as previously described [7]. Briefly, we translated the genotypes for all the seven strains at a certain SNP into a vector and evaluated the correlation of this vector to FAIRE-seq read counts at the overlapping accessible chromatin site by linear regression.

Identification of TE-associated chromatin sites

To identify accessible chromatin sites at TE sequences, we used intersectBed [69] to find the accessible chromatin sites that overlap with TEs as annotated by RepeatMasker [37] for the mouse genome (mm9). The age of TEs was calculated as: age = divergence/substitution rate, as previously described [27]. The divergence rates (number of mismatches) for all TEs were obtained from the RepeatMasker annotation file [37]. We used a substitution rate of 4.5 × 10^−9 per site per year for the mouse genome [11, 27].

Motif scanning

To characterize TF motifs in LINEs, we used HOMER (version 4.8) (findMotifsGenome.pl) [71] to identify motifs of known TFs in variable chromatin sites containing younger (<40 million years (Myrs)) or older (≥40 Myrs) LINEs as compared to random sequences with matched GC %. Motifs with p-value of enrichment less than 0.01 that occurred in more than 10 % of the target sequences were selected. Highly similar motifs were combined by using the joinmotifs tool [72], and only one of the similar motifs is reported. HOMER was further used to scan for occurrences (scanMotifGenomeWide.pl) [71] of the HNF4α and STAT motif genome wide. Putative binding

sites were defined by motif occurrences within accessible chromatin regions identified in C57BL/6J mouse liver, similar to what has been reported before [47].

ChIP-Quantitative PCR

Chromatin immunoprecipitation (ChIP) was performed with an anti-STAT3 antibody (sc-482X, Santa Cruz Biotechnology) and IgG control using standard ChIP protocols. Fragmented chromatin was assessed for enrichment at specific sites by quantitative PCR. The ΔΔCt method was utilized to evaluate enrichment of target DNA and normalized to input DNA. qPCR primer sequences at the L1Md_F2 are in Additional file 1: Table S4. Based on in silico PCR, the primer set can bind to seven L1s in the genome. However, six of the potentially targeted regions contain a STAT motif and display similar chromatin accessibility variation as shown in Fig. 4e.

CRISPR-Cas9 genomic deletion

For each TE tested, two guide RNAs (gRNAs) were designed to generate specific deletion of the TE sequence in H2.35 cells, a cell line derived from BALB/c hepatocytes. All gRNAs (Additional file 1: Figure S12, Table S4) were verified to be unique targets in the mouse genome by using BLAT against the mouse reference genome. We also avoided any gRNA targets that contained annotated SNPs in BALB/cJ mice. gRNA oligos were cloned into pSpCas9(BB)-2A-GFP and pSpCas9(BB)-2A-Puro vectors following a published protocol [73]. pSpCas9(BB)-2A-GFP (PX458) and pSpCas9(BB)-2A-Puro (PX459) V2.0 were gifts from Feng Zhang (Addgene plasmid #48138, #62988). H2.35 cells were co-transfected (Invitrogen, Lipofectamine 2000) with both gRNA constructs and placed under puromycin (1 μg/ml) selection for 3 days. Control cells were co-transfected with vectors without gRNA insertions. From these cells and control cells transfected with pSpCas9(BB)-2A-GFP and pSpCas9(BB)-2A-Puro, genomic DNA (Epicentre, Quickextract) and RNA were extracted (Trizol, Life Technologies) as recommended by the manufacturer. Genomic deletion was verified by PCR using flanking primer pairs at the expected deletion site (Additional file 1: Figure S12, Table S4). Reverse transcription quantitative PCR (RT-qPCR) was performed to determine the expression change in nearby gene(s) (primer sequences are in Additional file 1: Table S4).

Bisulfite Sanger sequencing

Genomic DNA from liver tissue was bisulfite-treated according to the manufacturer’s instructions (EpiTect Bisulfite Kit, QIAGEN, USA). Converted genomic DNA was used for PCR (primer sequences are in Additional file 1: Table S4). Purified PCR products were cloned into pDrive Cloning vector (PCR Cloning Kit, QIAGEN, USA). White colonies were selected through blue/white screening and analyzed with Sanger sequencing.

DNA methylation data

DNA methylation data from reduced representation bisulfite sequencing were obtained from GEO (accession number GSE67507 [57]). Similar to previous analyses [57], only CpG sites were included for analysis. For simplicity, we removed the small number of polymorphic CpGs in the seven strains from the methylation data. Differentially methylated (DM) regions were identified as those with a variance greater than 0.05 and range of methylation differences greater than 0.75.

Gene ontology (GO) analysis

In order to investigate the enriched biological function of the genes near accessible chromatin sites, we used genomic coordinates (UCSC mm9) of accessible chromatin sites as input for genomic regions enrichment of annotations tool (GREAT) version 3.0.0 [49]. Gene regulatory regions were defined using default parameters (5 kb upstream, 1 kb downstream and up to 1000 kb distal) and included significant associations for “GO Terms Biological Process”. Only terms that were below a false discovery rate (FDR) of 0.01 were reported.

RNA-seq and ChIP-seq data

RNA-seq data from livers of C57BL/6J and A/J mice fed with HF/HS diet were obtained from GEO (accession numbers GSE55581 [55] and GSE75984 [58]). ChIP-seq sites of liver TFs (HNF4α, C/EBPα, and FOXA1) from C57BL/6J and A/J liver tissues were downloaded from ArrayExpress (accession number E-MTAB-1414 [43]). CTCF ChIP-seq sites were obtained from GEO (accession number GSM918715 [4]).

Additional files

Additional file 1: Supplementary methods. Figure S1. Phenotypic diversity in different inbred strains. Figure S2. Chromatin variability across inbred strains of mice. Figure S3. Association between chromatin variation and SNPs. Figure S4. Accessible chromatin sites and TE sequences. Figure S5. DNA transposons and chromatin accessibility variation. Figure S6. Differential accessibility at young L1Md subfamilies across different strains. Figure S7. Chromatin accessibility at younger L1Md subfamilies in recombinant inbred strains. Figure S8. Chromatin variability and age of LINE subfamilies. Figure S9. Differential chromatin accessibility profile at younger and older LINEs in A/J mice liver. Figure S10. Accessibility and transcription of LINE subfamilies. Figure S11. CRISPR-Cas9 deletion of additional TEs. Figure S12. Guide RNA and genotyping primers used for CRISPR-Cas9 genome editing. Figure S13. Genotyping for TE deletions. Figure S14. Example of an eQTL that associated with variable chromatin accessibility at a LTR. Table S1. Summary of FAIRE-seq data sets in all the strains in this study. Table S3. Enriched biological process from GREAT analysis of accessible chromatin sites. Table S4. Sequences used in this study.

Additional file 2: Table S2. Counts of TEs in subfamilies.
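As an illustration of the TE age estimate given under “Identification of TE-associated chromatin sites” (age = divergence/substitution rate) and the 40 Myrs split used for motif scanning, the following is a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, and the divergence is assumed to be supplied as a fraction of mismatched sites; annotation files commonly report it on a different scale (for example a percentage), so the caller would need to convert it first.

```python
# Substitution rate stated in the Methods: 4.5 x 10^-9 per site per year.
MOUSE_SUBSTITUTION_RATE = 4.5e-9
# Cutoff stated in the Methods: younger (<40 Myrs) vs older (>=40 Myrs) LINEs.
YOUNG_LINE_CUTOFF_MYR = 40.0

def te_age_myr(divergence_fraction, rate=MOUSE_SUBSTITUTION_RATE):
    """Estimated age in million years = divergence / substitution rate.

    divergence_fraction: mismatches per site as a fraction (assumed scale).
    """
    return divergence_fraction / rate / 1e6

def age_class(divergence_fraction):
    """Label an element 'younger' or 'older' using the 40 Myrs cutoff."""
    age = te_age_myr(divergence_fraction)
    return "younger" if age < YOUNG_LINE_CUTOFF_MYR else "older"

# Hypothetical divergence values for two elements.
for name, div in [("element_A", 0.05), ("element_B", 0.25)]:
    print(f"{name}: {te_age_myr(div):.1f} Myrs ({age_class(div)})")
```

Under these assumptions the 40 Myrs cutoff corresponds to a divergence of about 18 % (0.18 / 4.5 × 10^−9 per year).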

Abbreviations

TE: transposable element; FAIRE: formaldehyde-assisted isolation of regulatory elements; HF/HS: high fat, high sucrose; SNP: single nucleotide polymorphism; LINE: long interspersed nuclear element; SINE: short interspersed nuclear element; LTR: long terminal repeat; TF: transcription factor; TG: triglyceride; H&E: hematoxylin and eosin; Adi1: acireductone dioxygenase 1; Lipc: lipase, hepatic; Myrs: million years; HNF4α: hepatocyte nuclear factor 4 alpha; C/EBPα: CCAAT/enhancer binding protein alpha; FOXA1: forkhead box A1; STAT: signal transducers and activators of transcription; GAS: gamma-activated sequence; Ugt2b37: UDP glucuronosyltransferase 2 family, polypeptide B37; Enpp1: ectonucleotide pyrophosphatase/phosphodiesterase 1; Suds3: suppressor of defective silencing 3 homolog (S. cerevisiae); Dusp1: dual-specificity phosphatase 1; Herpud1: homocysteine-inducible, endoplasmic reticulum stress-inducible, ubiquitin-like domain member 1; FPKM: fragments per kilobase of transcript per million mapped reads; DM: differentially methylated.

Authors’ contributions

JD, AL and DES designed the study. JD, CT, ML and BWP performed the experiments. JD, AL and DES conducted the analysis. JD, AJL, RN and DES prepared the manuscript. All authors read and approved the final manuscript.

Author details

1 Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA. 2 Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, CA, USA. 3 Department of Nutritional Sciences, University of Wisconsin-Madison, Madison, WI, USA. 4 Department of Medicine, University of California, Los Angeles, CA, USA.

Acknowledgements

We would like to acknowledge other members of the Schones and Lusis laboratories for helpful discussions and comments. We thank Beisi Xu for generating mappability tracks. We thank Parijat Senapati, Sadhan Das, Xiaoxiao Ma and Kaniel Cassady for suggestions on the manuscript.

Competing interests

The authors declare that they have no competing interests.

Availability of data and material

The sequence data have been deposited in the NCBI GEO repository (GSE75770).

Ethics approval and consent to participate

All animal study protocols in this study were approved by the Institutional Animal Care and Use Committee (IACUC) at University of California, Los Angeles and by the Institutional Animal Care and Use Committee (IACUC) at the City of Hope.

Funding

This work was supported by Shaeffer endowment funds, T32DK007571-24 (AL), K01DK104993 (AL), K99HL123021 (BP), R01HL106089, R01HL087864 and RO1DK065073 (RN). Research reported in this publication included work performed in the Pathology and Integrative Genomics Cores supported by the National Cancer Institute of the National Institutes of Health under award number P30CA33572.

Received: 18 April 2016 Accepted: 29 June 2016

References

1. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489:75–82.
2. Song LY, Zhang ZC, Grasfeder LL, Boyle AP, Giresi PG, Lee BK, Sheffield NC, Graf S, Huss M, Keefe D, et al. Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. Genome Res. 2011;21:1757–67.
3. Voss TC, Hager GL. Dynamic regulation of transcriptional states by chromatin and transcription factors. Nat Rev Genet. 2014;15:69–81.
4. Stamatoyannopoulos JA, Snyder M, Hardison R, Ren B, Gingeras T, Gilbert DM, Groudine M, Bender M, Kaul R, Canfield T, et al. An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome Biol. 2012;13:418.
5. McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A, et al. Heritable individual-specific and allele-specific chromatin signatures in humans. Science. 2010;328:235–9.
6. Kasowski M, Kyriazopoulou-Panagiotopoulou S, Grubert F, Zaugg JB, Kundaje A, Liu Y, Boyle AP, Zhang QC, Zakharia F, Spacek DV, et al. Extensive variation in chromatin states across humans. Science. 2013;342:750–2.
7. Hosseini M, Goodstadt L, Hughes JR, Kowalczyk MS, de Gobbi M, Otto GW, Copley RR, Mott R, Higgs DR, Flint J. Causes and consequences of chromatin variation between inbred mice. PLoS Genet. 2013;9:e1003570.
8. Degner JF, Pai AA, Pique-Regi R, Veyrieras JB, Gaffney DJ, Pickrell JK, De Leon S, Michelini K, Lewellen N, Crawford GE, et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature. 2012;482:390–4.
9. Ewing AD, Kazazian HH Jr. High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes. Genome Res. 2010;20:1262–70.
10. Nellaker C, Keane TM, Yalcin B, Wong K, Agam A, Belgard TG, Flint J, Adams DJ, Frankel WN, Ponting CP. The genomic landscape shaped by selection on transposable elements across 18 mouse strains. Genome Biol. 2012;13:R45.
11. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–62.
12. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
13. Jern P, Coffin JM. Effects of retroviruses on host genome function. Annu Rev Genet. 2008;42:709–32.
14. Hancks DC, Kazazian HH Jr. Active human retrotransposons: variation and disease. Curr Opin Genet Dev. 2012;22:191–203.
15. Slotkin RK, Martienssen R. Transposable elements and the epigenetic regulation of the genome. Nat Rev Genet. 2007;8:272–85.
16. Mita P, Boeke JD. How retrotransposons shape genome regulation. Curr Opin Genet Dev. 2016;37:90–100.
17. Levin HL, Moran JV. Dynamic interactions between transposable elements and their hosts. Nat Rev Genet. 2011;12:615–27.
18. Yoder JA, Walsh CP, Bestor TH. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 1997;13:335–40.
19. Xie MC, Hong CB, Zhang B, Lowdon RF, Xing XY, Li DF, Zhou X, Lee HJ, Maire CL, Ligon KL, et al. DNA hypomethylation within specific transposable element families associates with tissue-specific enhancer landscape. Nat Genet. 2013;45:836–41.
20. Faulkner GJ, Kimura Y, Daub CO, Wani S, Plessy C, Irvine KM, Schroder K, Cloonan N, Steptoe AL, Lassmann T, et al. The regulated retrotransposon transcriptome of mammalian cells. Nat Genet. 2009;41:563–71.
21. Chuong EB, Elde NC, Feschotte C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science. 2016;351:1083–7.
22. Cordaux R, Batzer MA. The impact of retrotransposons on human genome evolution. Nat Rev Genet. 2009;10:691–703.
23. Cowley M, Oakey RJ. Transposable elements re-wire and fine-tune the transcriptome. PLoS Genet. 2013;9:e1003234.
24. Sundaram V, Cheng Y, Ma Z, Li D, Xing X, Edge P, Snyder MP, Wang T. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 2014;24:1963–76.
25. Marino-Ramirez L, Jordan IK. Transposable element derived DNaseI-hypersensitive sites in the human genome. Biol Direct. 2006;1:20.
26. Jacques PE, Jeyakani J, Bourque G. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet. 2013;9:e1003504.
27. Bourque G, Leong B, Vega VB, Chen X, Lee YL, Srinivasan KG, Chew JL, Ruan Y, Wei CL, Ng HH, Liu ET. Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res. 2008;18:1752–62.
28. Parks BW, Nam E, Org E, Kostem E, Norheim F, Hui ST, Pan C, Civelek M, Rau CD, Bennett BJ, et al. Genetic control of obesity and gut microbiota

composition in response to high-fat, high-sucrose diet in mice. Cell 51. Cheetham SA, Smith AL, Armstrong SD, Beynon RJ, Hurst JL. Limited Metab. 2013;17:141–52. variation in the major urinary proteins of laboratory mice. Physiol Behav. 29. Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD. FAIRE (formaldehyde- 2009;96:253–61. assisted isolation of regulatory elements) isolates active regulatory 52. Thoss M, Luzynski KC, Ante M, Miller I, Penn DJ. Major urinary protein elements from human chromatin. Genome Res. 2007;17:877–85. (MUP) profiles show dynamic changes rather than individual ‘barcode’ 30. Montgomery MK, Hallahan NL, Brown SH, Liu M, Mitchell TW, Cooney signatures. Front Ecol Evol. 2015;3:71. GJ, Turner N. Mouse strain-dependent variation in obesity and 53. Xu JL, Kulkarni SR, Li LY, Slitt AL. UDP-glucuronosyltransferase expression glucose homeostasis in response to high-fat feeding. Diabetologia. in mouse liver is increased in obesity—and fasting-induced steatosis. 2013;56:1129–39. Drug Metab Dispos. 2012;40:259–66. 31. Hui ST, Parks BW, Org E, Norheim F, Che N, Pan C, Castellani LW, Charu- 54. Mihaylova MM, Shaw RJ. Metabolic reprogramming by class I and II gundla S, Dirks DL, Psychogios N, et al. The genetic architecture of NAFLD histone deacetylases. Trends Endocrinol Metab. 2013;24:48–57. among inbred strains of mice. Elife. 2015;4:e05607. 55. Leung A, Parks BW, Du J, Trac C, Setten R, Chen Y, Brown K, Lusis AJ, 32. Cohen JC, Horton JD, Hobbs HH. Human fatty liver disease: old questions Natarajan R, Schones DE. Open chromatin profiling in mice livers reveals and new insights. Science. 2011;332:1519–23. unique chromatin variations induced by high fat diet. J Biol Chem. 33. Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, Heger 2014;289:23557–67. A, Agam A, Slater G, Goodson M, et al. Mouse genomic variation and its 56. Besic V, Stubbs RS, Hayes MT. Liver ENPP1 protein increases with remis- effect on phenotypes and gene regulation. Nature. 
2011;477:289–94. sion of type 2 diabetes after gastric bypass surgery. BMC Gastroenterol. 34. Boyle AP, Guinney J, Crawford GE, Furey TS. F-Seq: a feature den- 2014;14:222. sity estimator for high-throughput sequence tags. Bioinformatics. 57. Orozco LD, Morselli M, Rubbi L, Guo W, Go J, Shi H, Lopez D, Furlotte 2008;24:2537–8. NA, Bennett BJ, Farber CR, et al. Epigenome-wide association of liver 35. Li QH, Brown JB, Huang HY, Bickel PJ. Measuring reproducibility of high- methylation patterns and complex metabolic traits in mice. Cell Metab. throughput experiments. Ann Appl Stat. 2011;5:1752–79. 2015;21:905–17. 36. Anders S, Huber W. Differential expression analysis for sequence count 58. Leung A, Trac C, Du J, Natarajan R, Schones DE. Persistent chromatin data. Genome Biol. 2010;11:R106. modifications induced by high fat diet. J Biol Chem. 2016;291:10446–55. 37. RepeatMasker Open-3.0. http://www.repeatmasker.org. 59. Zeevi D, Korem T, Zmora N, Israeli D, Rothschild D, Weinberger A, Ben- 38. Su M, Han DL, Boyd-Kirkup J, Yu XM, Han JDJ. Evolution of Alu elements Yacov O, Lador D, Avnit-Sagi T, Lotan-Pompan M, et al. Personalized toward enhancers. Cell Reports. 2014;7:376–85. nutrition by prediction of glycemic responses. Cell. 2015;163:1079–94. 39. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat 60. Oey H, Isbel L, Hickey P, Ebaid B, Whitelaw E. Genetic and epigenetic vari- Methods. 2012;9:357–9. ation among inbred mouse littermates: identification of inter-individual 40. Criscione SW, Zhang Y, Thompson W, Sedivy JM, Neretti N. Transcriptional differentially methylated regions. Epigenetics Chromatin. 2015;8:54. landscape of repetitive elements in normal and cancer human cells. BMC 61. Baik M, Yu JH, Hennighausen L. Growth hormone-STAT5 regulation of Genom. 2014;15:583. growth, hepatocellular carcinoma, and liver metabolism. Ann NY Acad 41. Jin Y, Tam OH, Paniagua E, Hammell M. TEtranscripts: a package for includ- Sci. 2011;1229:29–37. 
ing transposable elements in differential expression analysis of RNA-seq 62. Meli R, Raso GM, Irace C, Simeoli R, Di Pascale A, Paciello O, Pagano datasets. Bioinformatics. 2015;31:3593–9. TB, Calignano A, Colonna A, Santamaria R. High fat diet induces liver 42. Kunarso G, Chia NY, Jeyakani J, Hwang C, Lu XY, Chan YS, Ng HH, Bourque steatosis and early dysregulation of iron metabolism in Rats. PLOS One. G. Transposable elements have rewired the core regulatory network of 2013;8(6):e66570. human embryonic stem cells. Nat Genet. 2010;42:631–4. 63. MacQuarrie KL, Fong AP, Morse RH, Tapscott SJ. Genome-wide transcrip- 43. Stefflova K, Thybert D, Wilson MD, Streeter I, Aleksic J, Karagianni P, tion factor binding: beyond direct target regulation. Trends Genet. Brazma A, Adams DJ, Talianidis I, Marioni JC, et al. Cooperativity and rapid 2011;27:141–8. evolution of cobound transcription factors in closely related mammals. 64. Duhl DMJ, Vrieling H, Miller KA, Wolff GL, Barsh GS. Neomorphic agouti Cell. 2013;154:530–40. mutations in obese yellow mice. Nat Genet. 1994;8:59–65. 44. Gao B, Wang H, Lafdil F, Feng D. STAT proteins—key regulators of anti- 65. Morgan HD, Sutherland HGE, Martin DIK, Whitelaw E. Epigenetic inherit- viral responses, inflammation, and tumorigenesis in the liver. J Hepatol. ance at the agouti locus in the mouse. Nat Genet. 1999;23:314–8. 2012;57:430–41. 66. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory- 45. Kang K, Robinson GW, Hennighausen L. Comprehensive meta-analysis of efficient alignment of short DNA sequences to the human genome. signal transducers and activators of transcription (STAT) genomic bind- Genome Biol. 2009;10:R25. ing patterns discerns cell-specific cis-regulatory modules. BMC Genom. 67. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, 2013;14:4. Haussler D. The human genome browser at UCSC. Genome Res. 46. Bonham AJ, Wenta N, Osslund LM, Prussin AJ 2nd, Vinkemeier U, Reich 2002;12:996–1006. NO. 
STAT1:DNA sequence-dependent binding modulation by phos- 68. Marco-Sola S, Sammeth M, Guigo R, Ribeca P. The GEM mapper: fast, accu- phorylation, protein:protein interactions and small-molecule inhibition. rate and versatile alignment by filtration. Nat Methods. 2012;9:1185–8. Nucleic Acids Res. 2013;41:754–63. 69. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing 47. Pique-Regi R, Degner JF, Pai AA, Gaffney DJ, Gilad Y, Pritchard JK. Accurate genomic features. Bioinformatics. 2010;26:841–2. inference of transcription factor binding from DNA sequence and chro- 70. Hicks SC, Irizarry RA. Quantro: a data-driven approach to guide the choice matin accessibility data. Genome Res. 2011;21:447–55. of an appropriate normalization method. Genome Biol. 2015;16:117. 48. Inoue H, Ogawa W, Ozaki M, Haga S, Matsumoto M, Furukawa K, Hashi- 71. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, moto N, Kido Y, Mori T, Sakaue H, et al. Role of STAT-3 in regulation of Singh H, Glass CK. Simple combinations of lineage-determining transcrip- hepatic gluconeogenic genes and carbohydrate metabolism in vivo. Nat tion factors prime cis-regulatory elements required for macrophage and Med. 2004;10:168–74. B cell identities. Mol Cell. 2010;38:576–89. 49. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, 72. Schones DE, Sumazin P, Zhang MQ. Similarity of position frequency matri- Bejerano G. GREAT improves functional interpretation of cis-regulatory ces for transcription factor binding sites. Bioinformatics. 2005;21:307–13. regions. Nat Biotechnol. 2010;28:495–501. 73. Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, Zhang F. Genome engi- 50. Zhou Y, Rui L. Major urinary protein regulation of chemical communica- neering using the CRISPR-Cas9 system. Nat Protoc. 2013;8:2281–308. tion and nutrient metabolism. Vitam Horm. 2010;83:151–63.