1 HIV Tat controls RNA Polymerase II and the epigenetic landscape
2 to transcriptionally reprogram target immune cells
3
4 Jonathan E. Reeder,1,2,* Youn-Tae Kwak,2,3,* Ryan P. McNamara2, Christian V. Forst4 and Iván
5 D’Orso2,5
6
7 1University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080
8 2Department of Microbiology, University of Texas Southwestern Medical Center, 5323 Harry
9 Hines Boulevard, Dallas, TX 75390
10 3Present address: Department of Biochemistry, University of Texas Southwestern Medical
11 Center, 5323 Harry Hines Boulevard, Dallas, TX 75390
12 4Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology,
13 Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029
14 5Correspondence to: [email protected]
15 *These authors contributed equally to this work
1 16 Abstract
17 HIV encodes Tat, a small protein that facilitates viral transcription by binding an RNA structure
18 (TAR) formed on nascent viral pre-mRNAs. Besides this well characterized mechanism, Tat
19 appears to modulate cellular transcription, but the target genes and molecular mechanisms
20 remain poorly understood. We report here that Tat uses unexpected regulatory mechanisms to
21 reprogram target immune cells to promote viral replication and rewire pathways beneficial for
22 the virus. Tat functions through master transcriptional regulators bound at promoters and
23 enhancers, rather than through cellular “TAR-like” motifs, to both activate and repress gene sets
24 sharing common functional annotations. Despite the complexity of transcriptional regulatory
25 mechanisms in the cell, Tat precisely controls RNA Polymerase II recruitment and pause
26 release to fine-tune the initiation and elongation steps at target genes. We propose that a virus
27 with a limited coding capacity optimized its genome by evolving a small but “multitasking”
28 protein to simultaneously control viral and cellular transcription.
2 29 Introduction
30 Transcription of protein coding genes by RNA Polymerase (Pol) II is regulated at several steps
31 including initiation, elongation and termination (Adelman and Lis, 2012; Fuda et al., 2009;
32 Grunberg and Hahn, 2013; Sims et al., 2004; Zhou et al., 2012). Transcription factors
33 coordinate the activation and/or repression of key regulatory programs by acting at one or
34 multiple steps in this regulatory circuitry. While one group of transcription factors recruit Pol II to
35 their target genes to induce transcription initiation, another class functions by promoting the
36 transcriptional pause release from the promoter-proximal state to allow Pol II transition to the
37 productive elongation phase (Feinberg et al., 1991; Gomes et al., 2006; Peterlin and Price,
38 2006; Rahl et al., 2010; Zhou et al., 2012). Thus, promoter-proximal pausing has been
39 identified as a general feature of transcription control by Pol II in metazoan cells and a key
40 regulatory step during differentiation, cell development and induction of stem cell pluripotency
41 (Adelman and Lis, 2012; Core and Lis, 2008; Smith and Shilatifard, 2013; Zeitlinger et al., 2007;
42 Zhou et al., 2012).
43 In addition to Pol II and the basal transcription machinery, the epigenetic landscape
44 plays another critical role in transcription regulation, mediated by covalent modifications to the
45 N-terminal tails of histones. The study of chromatin modifications has revealed fundamental
46 concepts in the regulation of transcription. Histone marks, in general, associate with different
47 genomic domains (promoters, coding units and enhancers) and provide evidence of their
48 transcriptional status (Barski et al., 2007; Creyghton et al., 2010; Guenther et al., 2007; Li et al.,
49 2007; Tessarz and Kouzarides, 2014; Zhou et al., 2011). While active promoters are marked
50 with Histone 3 Lysine 4 trimethylation (H3K4me3), active transcription units are marked with
51 H3K79me3 and H3K36me3 (Kouzarides, 2007; Li et al., 2007). Thus, most Pol II regulated
52 genes (protein-coding and long non-coding RNAs) contain a K4/K36 signature that positively
53 correlates with active gene expression (Guttman and Rinn, 2012; Martin and Zhang, 2005). In
54 addition to promoters, distal genomic elements referred to as enhancers modulate gene activity
3 55 through gene looping or long-range chromatin interactions and further regulate the location,
56 timing, and levels of gene expression (Bulger and Groudine, 2011). The genomic locations of
57 these enhancers (intergenic or intragenic) usually correlate with high H3K4me1 and low
58 H3K4me3 content and their activity is proportional to the levels of H3K27Ac (Bulger and
59 Groudine, 2011; Creyghton et al., 2010; Kim et al., 2010b; Kowalczyk et al., 2012; Smallwood
60 and Ren, 2013). Additionally, H3K27Ac also appears to modulate two temporally separate
61 events: it enhances the search kinetics of transcriptional activators, and accelerates the
62 transition from initiation into elongation leading to a robust and potentially tunable transcriptional
63 response (Stasevich et al., 2014).
64 Viruses have evolved strategies to precisely orchestrate shifts in regulatory programs.
65 In addition to sustaining transcription of their own genomes, certain viral transcription factors
66 directly or indirectly alter existing cellular programs in ways that promote viral processes
67 (replication and spread) (Ferrari et al., 2008; Horwitz et al., 2008), and/or modulate programs in
68 the infected target cell. It is well established that the HIV Tat protein relieves promoter-proximal
69 pausing at the viral promoter by recruiting the positive transcription elongation factor b (P-TEFb)
70 to the trans-activating RNA (TAR) stem-loop formed at the 5’-end of viral nascent pre-mRNAs
71 (D'Orso and Frankel, 2010; Mancebo et al., 1997; Ott et al., 2011; Zhou et al., 2012; Zhu et al.,
72 1997). Much recently, it was discovered that Tat recruits P-TEFb as part of a larger complex
73 referred to as Super Elongation Complex (SEC), which is composed by the MLL-fusion partners
74 involved in leukemia (AF9, AFF4, AFF1, ENL, and ELL), and PAF1. Although SEC formation
75 relies on P-TEFb, optimal P-TEFb kinase activity towards the Pol II C-terminal domain (CTD) is
76 AF9 dependent, and the MLL-fusion partners and PAF1 are required for Tat transactivation (He
77 et al., 2010; Luo et al., 2012; Sobhian et al., 2010).
78 Moreover, Tat stimulates transcription complex assembly through recruitment of TATA-
79 binding protein (TBP) in the absence of TBP-associated factors (TAFs) (Raha et al., 2005),
80 implying that Tat controls both the initiation and elongation steps of transcription, in agreement
4 81 with early proposals of Tat increasing transcription initiation, stabilizing elongation and
82 precluding anti-termination (Kao et al., 1987; Laspia et al., 1989; Rice and Mathews, 1988).
83 Thus, Tat has the ability to control multiple stages in the HIV transcriptional cycle to robustly
84 increase transcript synthesis to promote viral replication.
85 In addition to controlling HIV transcription, Tat appears to modulate cellular gene
86 expression to generate a permissive environment for viral replication and spread, and alter or
87 evade immune responses (Izmailova et al., 2003; Kim et al., 2010a; Kim et al., 2013; Lopez-
88 Huertas et al., 2010; Marban et al., 2011). One example of Tat-mediated down-regulation is the
89 mannose receptor in macrophages and immature dendritic cells, which plays a key role in host
90 defense against pathogens by mediating their internalization (Caldwell et al., 2000). A similar
91 case is Tat’s repression of the MHC class I gene promoter, which depends on Tat binding to
92 complexes containing the TBP-associated factor TAFII250/TAF1 (Weissman et al., 1998). The
93 interaction of the C-terminal domain and TAF1 suggests that Tat mediates repression functions
94 through the transcription initiation complex. These, and many other examples, suggest that Tat
95 is capable of altering cellular gene expression via association with factors bound to promoters.
96 More recently, Chip-on-chip and ChIP-seq approaches have revealed that Tat has the
97 ability to bind target genes in the human genome. While these previous studies have provided
98 early glimpses about Tat recruitment to host cell chromatin, a comprehensive description of the
99 direct target genes and the molecular mechanisms remain poorly understood. To address
100 those gaps in knowledge, we aimed at: (i) identifying Tat target genes in the human genome, (ii)
101 delineating the molecular mechanisms by which Tat reprograms cellular transcription, and (iii)
102 defining how Tat is recruited to host cell chromatin. One of the main challenges in the field is to
103 obtain a high-quality, comprehensive ChIP-seq that will provide functional insights. To address
104 this challenge, we analyzed the genome-wide distribution of Tat in the human genome using a
105 technically improved ChIP-seq compared to the previous studies. Moreover, we investigated
106 the molecular mechanisms using a global analysis of chromatin signatures that demarcate the
5 107 position and activity of genomic domains (enhancers, promoters and coding units), as well as
108 Pol II recruitment and activity (Barski et al., 2007; Creyghton et al., 2010). We provide, for the
109 first time, evidence that the direct Tat target genes share functional annotations and are
110 regulated by common master transcriptional regulators such as T-cell identity factors. While the
111 Tat stimulated genes (TSG) show a positive role in activating T cells, favoring cell proliferation
112 to promote viral replication and spread, the Tat down-regulated genes (TDG) show critical roles
113 in blunting immune system responses, nucleic acid biogenesis (splicing and translation), and
114 proteasome control. Strikingly, Tat functions as both activator and repressor by modulating Pol
115 II recruitment and/or pause release as well as controlling the activity of chromatin-modifying
116 enzymes to reprogram the epigenetic landscape of the host cell.
117 Taken together, we propose that a virus with a limited coding capacity has optimized its
118 genome by evolving a small protein (Tat) to perform multiple functions throughout the viral life
119 cycle. Beyond controlling HIV transcription, Tat has evolved unique properties to occupy
120 precise genomic domains (promoters and enhancers) to reprogram cellular transcription using
121 unexpected regulatory mechanisms. We provide the molecular basis of an unprecedented
122 paradigm with critical roles in host-pathogen interactions.
6 123 Results
124 Genomic domains occupied and regulated by Tat in CD4+ T cells
125 Previous studies have proposed that Tat modulates cellular gene expression to generate an
126 environment hospitable for viral replication and spread (Kim et al., 2010a; Kim et al., 2013; Li et
127 al., 1997; Lopez-Huertas et al., 2010; Marban et al., 2011). However, the molecular
128 mechanisms remain poorly understood. To elucidate how Tat performs these functions we
129 aimed at: (i) identifying genomic domains occupied and regulated by Tat in the human genome,
130 and (ii) defining the mechanisms of transcriptional control. We generated high-quality chromatin
131 immunoprecipitation sequencing (ChIP)-seq datasets in two Jurkat CD4+ T cell lines, inducibly
132 expressing FLAG-tagged Tat or GFP used as negative control (Figure 1A). Importantly, the cell
133 line used expresses low and physiologic levels of Tat, which mimic those detected during HIV
134 infection (Figure 1–figure supplement 1). We utilized this minimalistic system in order to more
135 precisely examine the effects of Tat alone rather than in an infection setting. The latter
136 approach may compromise the investigation due to the introduction of other viral products that
137 may also affect cellular behavior and/or alter Tat activity (Frankel and Young, 1998).
138 We established a ChIP-seq pipeline using both a Tat antibody (Ab) previously used in
139 ChIP-seq (Marban et al., 2011) and a FLAG Ab. We identified genomic domains enriched with
140 statistically significant occupancy relative to input DNA using the model-based analysis of ChIP-
141 seq (MACS) algorithm with a stringent cutoff (FDR<0.05) (Zhang et al., 2008). Unexpectedly,
142 we did not find a significant number of overlapping peaks between the Tat and FLAG ChIP-seq
143 datasets, suggesting that the Tat Ab performs poorly in ChIP, providing a molecular explanation
144 for the low-quality dataset generated by Marban et al. (Marban et al., 2011) (Figure 1–figure
145 supplement 2A–E). We thus focused on FLAG, which enabled us to generate a better quality,
146 more comprehensive ChIP-seq dataset compared to previous studies (Kim et al., 2010a;
147 Marban et al., 2011), which served as the key basis to elucidate novel Tat functions in the
148 control of cellular transcription. To precisely map Tat binding sites in the human genome, we
7 149 called FLAG peaks in both the Tat and GFP cell lines, then discarded those peaks in the
150 intersection of both datasets, considering only the FLAG peaks unique to the Tat cell line to be
151 true Tat binding events. We found that the majority of FLAG peaks in the Tat cell line represent
152 bona fide Tat binding sites because they are not detected in the GFP cell line (Figure 1–figure
153 supplement 2A and C–E). Importantly, the ChIP signal for FLAG was barely detectable in the
154 GFP cell line or in cells expressing inactivating Tat mutations in the activation domain (such as
155 C22A) that abolish Tat recruitment to target genes and/or impair gene expression activation
156 (D'Orso et al., 2012) (Figure 1–figure supplement 3). In contrast to the C22A non-functional
157 Tat mutant, synonymous mutations within the RNA-binding domain (such as K50Q and
158 R52R53K) have less pronounced effects on Tat binding to chromatin (Figure 1–figure
159 supplement 3).
160 Genome-wide distribution analysis based on the ENCODE annotation (Consortium,
161 2011), revealed that Tat binding is heavily enriched at promoters (24%, p-value 3.4x10-324) and
162 5’-UTR (1.2%, p-value 5.3x10-193) relative to their genomic proportions. These genomic
163 domains are underrepresented in the genome (about 1%) making Tat’s relative abundance at
164 these sites highly significant (Figure 1B). Although Tat also binds introns (36.4%, p-value
165 5.1x10^-4) and intergenic domains (35%, below the expected level of enrichment), defined as
166 sites located more than 1 Kb from the nearest annotated gene, these two genomic domains are
167 highly represented in the genome and the relative Tat binding frequency here is not nearly as
168 striking as at promoters and promoter-proximal regions (Figure 1B and Table 1).
169 To gain insights into the roles of Tat in cellular transcriptional control, we integrated the
170 FLAG ChIP-seq dataset with a whole transcriptome generated by RNA-seq (Figure 1C). RNA-
171 seq revealed 2013 differentially expressed genes (DEGs) using a q-value cutoff < 0.05, 456 of
172 which also appeared in the set of Tat-bound genes generated by our ChIP-seq analysis. We
173 refer to this dataset as direct targets. Remarkably, inactivating mutations that abolish Tat
174 recruitment to host cell chromatin also impair gene expression changes, indicating that the
8 175 effect of Tat on target gene transcription is direct and requires Tat binding and activity.
176 Specifically, reduced Tat C22A binding to promoters of Tat target genes correlates with
177 decreased gene expression changes (Figure 1–figure supplement 3), thus providing direct
178 evidence of Tat function.
179 Besides the direct target genes, we identified another set referred to as indirect targets.
180 These are genes that are differentially expressed in the presence of Tat (RNA-seq), but are not
181 directly bound by Tat (Figure 1C). Tat might regulate these genes through downstream effects
182 or alternative mechanisms (i.e. signaling pathways) or could be targets not identified by ChIP-
183 seq because of the high-confidence threshold used during peak calling. In this work we only
184 focused on the direct target genes to study the mechanisms of transcriptional control by Tat.
185 Further analysis of the direct Tat targets revealed that 244 genes are up-regulated and
186 212 genes down-regulated, and we refer to them as Tat stimulated genes (TSG) or Tat down-
187 regulated genes (TDG), respectively (Figure 1C). Importantly, we validated the expression of
188 several direct target genes using quantitative real-time PCR (qRT-PCR) assays, confirming the
189 reliability of RNA-seq (Figure 1D). Notably, protein expression analysis revealed that the
190 changes detected at the RNA level are also reflected at the protein level (for example CD69
191 expression at the cell surface in the presence of Tat) indicating that gene expression changes
192 are functional (Figure 1–figure supplement 4).
193 We further analyzed the distribution of Tat binding sites within the direct target genes to
194 probe for enrichment in particular genomic domains. We found that Tat is equally recruited to
195 both promoters (39%) and intragenic domains (39%) at TSG, whereas the majority of occupied
196 domains at TDG are at promoters (61%) (Figure 1–figure supplement 5). Inspection of
197 genome browser tracks showed that, in addition to binding TSG promoters (CD69), Tat is also
198 recruited to gene body regions with enrichment at introns (ADCYAP1) (Figure 1E). Notably,
199 this mode of Tat binding appears to be functionally relevant because it correlates with RNA
200 abundance changes as revealed by RNA-seq. In addition, Tat binds the target genes using
9 201 discrete (CD69 and ADCYAP1) or broad (CD1E) distribution patterns, probably due to the
202 different modes or mechanisms of recruitment to chromatin (Figure 1E). Importantly, we have
203 validated the FLAG ChIP-seq dataset by performing extensive ChIP-qPCR on several direct
204 targets including tow TSG (CD69 and ADCYAP1) and two TDG (CD1E and RAG1) in the Tat
205 and GFP cell lines (Figure 1–figure supplement 6).
206 Given that our model was built on the ectopic expression of Tat in target cells, we
207 performed an infection experiment to test whether Tat reprograms cellular transcription in a
208 similar way in the context of infection. Importantly, the TSG (CD69, FAM46C and PPM1H) and
209 TDG (CD1E, EOMES and FBLN2) tested are also modulated early during HIV infection of
210 Jurkat T cells, albeit with different kinetics (Figure 1–figure supplement 7), supporting the view
211 that the cellular reprogramming by Tat is functional and not simply an artifact of RNA-seq or the
212 ectopic expression of Tat outside the context of the virus.
213 Given that we proposed that Tat effects on host cell gene expression might be relevant
214 to normal HIV biology we asked whether the TSG and TDG are also modulated in response to
215 viral infection of primary CD4+ T cells. To this end, we isolated naïve CD4+ T cells from the
216 blood of healthy donors and generated central memory T cells (TCM) (Figure 1–figure
217 supplement 8A). After infection of TCM with replication competent X4 trophic virus (NL-GFP) or
218 mock infection and sorting infected cells, we isolated RNA and performed qRT-PCR analysis on
219 several TSG and TDG. We observed that HIV infected TCM cells showed the differential gene
220 expression signature (at least for the 12 direct targets examined) that we previously observed
221 with the ectopic expression of Tat or HIV infection of Jurkat T cells (Figure 1–figure
222 supplement 8B). This ex vivo experiment clearly demonstrates the robustness of our
223 minimalistic setting to study Tat functions in the host cell.
224
225 The direct Tat target genes share common functional annotations and are enriched in
226 pathways beneficial for the virus
10 227 If the genes directly modulated by Tat are involved in biologically relevant processes then we
228 would expect them to share functional annotations. To explore whether the TSG and TDG have
229 any common biological functions we examined their gene ontology (Figure 1F). To provide
230 statistical robustness, we used cluster analysis and a control set of genes depleted in the Tat
231 ChIP-seq experiment. Gene categories significantly enriched in the set of TSG include positive
232 regulation of immune system process, cell activation and regulation of lymphocyte
233 differentiation, while TDG include negative regulation of cell aging, regulation of myeloid cell
234 differentiation and processes of DNA/RNA biogenesis (Figure 1F). Consistently, network
235 analysis indicates that TSG are significantly enriched in T-cell receptor (TCR) pathway, cell
236 cycle and focal adhesion, while TDG enrich processes relevant for DNA/RNA processes,
237 ribosome and proteasome control, among others (Figure 1–figure supplement 9).
238 With respect to T-cell activation, CD69 exhibits a rather central role, because its
239 upregulation promotes T-cell stimulation and differentiation (TCR Pathway cluster) (Sancho et
240 al., 2005). Another stimulated process involves components of the cell cycle (CDK6) together
241 with cyclinD3 (CCND3) and cyclin-dependent kinase inhibitor 1B (CDKN1B) (Cell Cycle cluster)
242 that appear to be controlled by phosphorylation via the LCK kinase from the TCR complex, as
243 one of the central node in the network. Another controller node assembles the ataxia-
244 telangiectasia-mutated (ATM) serine/threonine kinase, which is best known for its role as an
245 activator of the DNA damage response (HIV Infection cluster). The activity of HIV integrase
246 stimulates an ATM-dependent DNA damage response, and ATM deficiency sensitizes cells to
247 retrovirus-induced cell death. In addition, ATM inhibition is capable of suppressing the
248 replication of both wild-type and drug-resistant HIV (Lau et al., 2005), thus demonstrating the
249 importance of this TSG in controlling viral processes. With respect to down-regulated
250 processes, ribosomal proteins centered around RPS9 (Ribosome cluster), together with
251 translation initiation factors (EIF3b) and the nucleolar and coiled-body phosphoprotein NOLC1,
252 as well as components of the spliceosome such as SF3B5, SNRPB, SNRNP200, LSM4, and
11 253 PCBP2 (Spliceosome cluster), suggest negative regulation of these processes (Figure 1–figure
254 supplement 9).
255 Because we proposed that the predicted GO biological processes of the direct Tat target
256 genes are essential for the viral life cycle (to promote a permissive state for viral replication), we
257 further analyzed whether they are retained in the context of infection. To test this, we
258 assembled a collection of 62 publicly available datasets including 48 gene sets from 13
259 publications containing information on DEGs identified during HIV infection together with 14
260 datasets from the Molecular Signatures Database (MSigDB) of the Broad Institute
261 (Subramanian et al., 2005) (Supplementary file 1A). After executing gene set enrichment
262 analysis with Bonferroni correction for multiple testing, we identified 5 datasets that significantly
263 enriched the TSG (with FDR ≤ 0.05) (Supplementary file 1B). In addition, TDG also enrich two
264 gene sets from the HIV relevant MSigDB sets (Figure 1–figure supplement 10A and
265 Supplementary file 1B). Together, the analysis provides evidence that the direct Tat target
266 genes (and thus the predicted GO biological processes) are retained in the context of viral
267 infection, supporting the model that Tat-mediated host cell reprogramming occurs during
268 infection. Notably, this proposal is also consistent with our infection data on Jurkat and primary
269 TCM cells (Figure 1–figure supplements 7 and 8).
270 Our network analysis indicated that TSG and TDG were enriched in specific biological
271 processes that might promote viral functions (including replication) (Figure 1–figure
272 supplement 9 and 10). To answer how those functional annotations could promote viral
273 infection, we employed a variety of methods to identify the functions of the direct Tat target
274 genes, and relate them to the biology of HIV. Functional annotation by GO classes provides an
275 overview of biological processes and functions enriched by the gene sets (Figure 1–figure
276 supplement 10B). Furthermore, we have obtained additional information from the “canonical
277 pathway” collection of MSigDB. By annotating clusters of the response network with GO and
278 MSigDB pathways we related biological processes with network content. For example, the TCR
12 279 Pathway cluster annotated with MSigDB includes ITK, LCK, LCP2, PRKCQ and VAV3, among
280 other targets. This cluster not only reveals that those targets interact with each other, but also
281 with the Tyrosine protein kinase LCK, which is known to phosphorylate both LCP2 and PRKCQ,
282 implying physical and functional interactions.
283 The data clearly indicates that TSG are enriched in datasets from many pathways that
284 correlate with stimulation of viral replication, such as the ‘TCR pathway/stimulation’ (CD3,
285 CD247, INPP5D, LCP2, and PTEN) and ‘downstream TCR signaling/pathway’ (PIK3CA and
286 PRKCQ), ‘T-cell co-stimulation’ (CTLA4 and CD28), ‘cell motility signaling/pathway’ (RAC1),
287 ‘generation of second messengers’ (EVL, ITK, and LCK), and ‘phosphatidylinositol signaling’
288 (INPP4A, INPP5D, PIK3CA, PLCB1, PRKCB, and PTEN), among others (Figure 1–figure
289 supplement 10B and Supplementary file 1B). The prime example of pathways enriched in
290 TSG is T-cell signaling/activation and co-stimulatory signals (CD28) that provide additional
291 control mechanisms to prevent inappropriate and hazardous T-cell activation. The data is
292 consistent with the fact that in early stages of infection the viral encoded proteins (particularly
293 Tat) mimic T-cell signaling pathways, resulting in sustained viral replication within infected T
294 cells. This T-cell activation provides new targets for HIV replication, creating a favorable
295 environment for further virus mediated damage to the immune system and chronic consumption
296 of the pools of naïve and resting memory cells (Hazenberg et al., 2000; Hellerstein et al., 2003).
297 Our data suggest that HIV uses Tat to directly induce several pathways (including T-cell
298 signaling) for productive infection of immune cells. Both the biological data and our prediction
299 suggest that TSG play important roles in these processes. On the other hand, TDG are
300 primarily enriched in datasets related to ‘metabolism of RNA’ (HSPA8, LSM4, and PRMT5), in
301 particular ‘ribosome’ (RPL and RPS variants), ‘proteasome’ (PSMB3, PSMB4, PSMB5, PSMC2,
302 PSMD1, and PSMD8), and ‘regulation of apoptosis’ (DAPK1, together with above proteasomal
303 genes), among others identified by the HIV life cycle/host interaction gene sets from Reactome
304 (Croft et al., 2014) (Figure 1–figure supplement 10B and Supplementary files 1C and D).
13 305 However, further studies will be required to precisely define the role of individual TSG and TDG
306 on viral replication.
307 Taken together, we have established the first comprehensive framework of target genes
308 in the human genome directly bound and regulated by Tat. Below we study the molecular
309 mechanisms of transcriptional control by examining genome-wide changes of Pol II and
310 chromatin signatures.
311
312 Global analysis of chromatin signatures indicates that Tat stimulates transcription at
313 both the initiation and elongation steps
314 With these functional insights we focused on the direct target genes to delineate the
315 mechanisms by which Tat regulates cellular transcription. Profiling of histone modifications has
316 revealed fundamental concepts in the regulation of transcription (Barski et al., 2007; Zhou et al.,
317 2011). To investigate whether Tat acts at the initiation or elongation steps, we performed ChIP-
318 seq of several histone modifications (in the absence and presence of Tat) that demarcate
319 distinct genomic domains such as promoters, coding units and enhancers and provide (in
320 general) information regarding their transcriptional status (Table 2).
321 Active promoters usually have adjacent nucleosomes bearing an H3K4me3 transcription
322 initiation mark (Bernstein et al., 2002; Guenther et al., 2007; Mikkelsen et al., 2007; Santos-
323 Rosa et al., 2002), and genes marked by H3K4me3 display significant amounts of
324 transcriptionally competent Pol II at promoters (Min et al., 2011). Thus, we first determined the
325 levels of H3K4me3 in both cell lines to test whether Tat modulates transcription initiation
326 (Figure 2A). After fixing TSS selection for a subset of Tat stimulated genes (TSG) such as
327 ADCYAP1 and ARHGEF7, where transcription starts at a short, internal isoform (Figure 2–
328 figure supplement 1A–B), we observed that the majority of TSG (excluding ATP9A, CD244,
329 ADCYAP1, CD226, SERINC2) are already marked with variable (low-to-high) H3K4me3 levels
330 and, as expected, its distribution mirrors the location of the promoter-adjacent nucleosomes
14 331 (Figure 2B). We thus defined two groups of genes based on H3K4me3 fold change levels in
332 the presence and absence of Tat (Tat/GFP). While the first group of genes (n = 17), referred to
333 as class I TSG, has undetectable or low H3K4me3 levels in the absence of Tat (Figure 2A and
334 Figure 2–figure supplement 1C–D), the second group of genes, referred to as class II TSG,
335 shows medium-to-high H3K4me3 levels surrounding the TSS (Figures 2A and B).
336 In the presence of Tat, class I TSG experiences a large increase in H3K4me3 density at
337 promoter-proximal regions (2-to-10–fold change) (Figures 2A and B, and Figure 2–figure
338 supplement 1C–D). Interestingly, Tat can selectively promote de novo transcription initiation at
339 a small subset of genes (such as ADCYAP1, ATP9A and CD244), as revealed by the large fold
340 changes in H3K4me3, and selection of a non-canonical or novel TSS (Figure 2–figure
341 supplement 1A, C and D). For example, Tat induces H3K4me3 at an internal site (intron 4) of
342 ADCYAP1 but not at the canonical TSS, which coincides with the production of a novel, short
343 isoform of a yet unknown function (Figure 2–figure supplement 1A). In contrast to class I
344 TSG, class II shows no change or, unexpectedly, a slight reduction in H3K4me3 levels in the
345 presence of Tat (~1.5-2–fold decrease) (Figures 2A and B), suggesting a post-initiation role for
346 Tat in activating these genes (see below).
347 The H3K4me3 density surrounding the TSS of class I TSG was typically lower (at least
348 5-fold less) than the signal in actively transcribed genes. As expected, these genes are
349 transcriptionally inactive or show low RNA levels (<10 Fragments Per Kilobase of transcript per
350 Million mapped reads (FPKM)) such as signaling peptide hormones (ADCYAP1), transcription
351 factors involved in immune system maturation (ZNF521, RORβ, ETV6) and genes essential for
352 T-cell maturation and responses (CD69, CD244). Interestingly, we observed a positive
353 correlation (R2=0.764) between the increase in transcript levels and H3K4me3 density nearby
354 the promoters of these genes, and that highly stimulated genes (ADCYAP1 and CD69)
355 experience larger increase in H3K4me3 density compared with other genes (Figure 2–figure
356 supplement 2A). On the other hand, class II TSG shows a much lower correlation (R2=0.177)
15 357 between the increase in transcript levels and H3K4me3 density surrounding promoters (Figure
358 2–figure supplement 2B), consistent with the idea that they are regulated at a post-initiation
359 step (see below).
360 If increased transcription initiation at these genes correlates with a productive increase in
361 RNA levels, then we would expect to find nucleosome modifications associated with
362 transcription elongation throughout the coding units (Li et al., 2007). Previous studies have
363 elucidated that in actively transcribed genes, H3K79me2/3-modified nucleosomes are present at
364 their highest levels shortly downstream of the TSS and that H3K36me3 modifications occupy
365 the entire gene body, with increasing density towards the 3’-end of the gene (Guenther et al.,
366 2007; Kolasinska-Zwierz et al., 2009; Li et al., 2007; Seila et al., 2008; Zhou et al., 2011). To
367 examine transcription states in detail, we carried out ChIP-seq experiments of H3K79me3 and
368 H3K36me3 using validated antibodies (Egelhofer et al., 2011). To assess Tat’s role in coupling
369 transcription initiation with elongation at class I TSG, we calculated the fold change in
370 H3K79me3 and H3K36me3 tag density in the presence and absence of Tat (Tat/GFP) as well
371 as changes in their gene average distribution in both class I and II TSG (Figures 2C–F). As
372 expected, we observed that class I TSG (activated at the initiation step) also shows evidence of
373 transcription elongation, based on the increase in H3K79me3/H3K36me3, total Pol II and
374 elongating Pol II levels, as well as recruitment of the P-TEFb kinase at two TSG (CD69 and
375 ADCYAP1) (Figure 2–figure supplement 3). While in the absence of Tat, class I TSG are
376 devoid of H3K79me3 throughout the gene, Tat promotes a robust increase in H3K79me3 (~6.2-
377 fold over GFP) just downstream of the TSS with a progressive decline towards the transcription
378 termination site (TTS) (Figures 2C–F, and Figure 2–figure supplement 4). Similarly, levels of
379 H3K36me3 at class I TSG are low in the gene body and increase towards the 3’-end of the gene
380 in the presence of Tat (~5.3-fold over GFP) (Figure 2F). Remarkably, the average distribution
381 patterns of both H3K79me3 and H3K36me3 in these genes are consistent with previous
382 genome-wide distribution analysis (Kouzarides, 2007; Li et al., 2007).
16 383 Because H3K4me3 levels do not increase in the class II TSG, we reasoned that these
384 genes are regulated by Tat at a post-initiation step. If this were the case, then we would expect
385 to find an increase in nucleosome modifications associated with transcription elongation marks
386 in the presence of Tat, despite the lack of H3K4me3 increase. To test this possibility, we
387 examined H3K36me3 and H3K79me3 distribution and density in these genes. We ignored
388 genes that lack H3K79me3 irrespective of Tat presence as well as intronless genes, which
389 complicated the density and distribution calculations, and thus ended with a more cohesive and
390 consistently behaved group of class II TSG (n = 43). As expected, most class II TSG showed
391 increased H3K79me3 (100%) and H3K36me3 (~82%) within their gene bodies (Figures 2C–F)
392 consistent with a role of Tat in promoting transcription elongation. Together, this analysis
393 suggests that the majority of target genes that do not experience transcription initiation changes
394 in response to Tat do show evidence of increased levels of promoter escape and transition to
395 active elongation based on the global analysis of chromatin signatures (Figure 2–figure
396 supplement 4) and Pol II (see below). Although the increase in H3K79me3 is significant,
397 H3K36me3 changes are smaller, probably because class II TSG is active and their transcribing
398 units already demarcated by H3K36me3 (Figure 2C–F). Again, the average gene distribution of
399 H3K79me3 and H3K36me3 correlates with previous studies suggesting that H3K79me3 is most
400 enriched shortly downstream of the TSS and H3K36me3 increases towards the middle and end
401 of the gene (Figures 2 D and F) (Kouzarides, 2007; Li et al., 2007).
402 Inspection of individual gene tracks from our genome-wide study reveals examples of
403 these two regulatory mechanisms (Figures 2G and H). At the neuropilin and tolloid-like 1
404 (NETO1) locus (class I TSG), which encodes a transmembrane receptor that plays a critical role
405 in spatial learning and memory, Tat binding at the promoter-proximal region correlates with a
406 sharp increase in H3K4me3 (~8-fold) with concomitant increases in H3K79me3 and H3K36me3
407 downstream of the TSS (Figure 2G). Conversely, at the VAV3 guanine nucleotide exchange
408 factor locus (class II TSG), the H3K4me3 signature surrounding the TSS does not appear to
17 409 change, while the markers of active transcription in coding units (H3K79me3 and H3K36me3)
410 showed large increases in the presence of Tat, indicating a role in transcription elongation
411 (Figure 2H). Heatmap representation of the density of all three chromatin signatures at
412 promoter and gene body of class I and class II TSG demonstrate that this is common for all
413 target genes, albeit with differences in fold change (Figure 2–figure supplement 4A–E), which
414 is in agreement with the broad distribution of gene expression changes (Figure 1D).
415 Given that we defined two classes of genes using the minimalistic Tat ectopic
416 expression system, we asked whether these genes are also modified in a similar manner (at the
417 initiation or elongation steps) during HIV infection. To test this, we analyzed the levels of
418 initiating (promoter-proximal) and elongating (promoter-distal) transcripts in primary TCM cells
419 infected with replication-competent HIV versus mock infection. Interestingly, we observed that
420 two class I TSG (CD69 and PPM1H) and two class II TSG (VAV3 and ANXA1) showed
421 evidence of increased initiation or elongation, respectively (Figure 1–figure supplements 8C
422 and D). This implies that the model of cellular reprogramming by the ectopic expression of Tat
423 alone is mirrored (at least for the target genes examined) during HIV infection of primary T cells.
424 Several enzymes are known to regulate histone modifications associated with distinct
425 epigenetic states. If Tat promotes cellular reprogramming by modifying the epigenetic
426 landscape then we would expect Tat to interact with chromatin-modifying enzymes associated
427 with the respective histone modifications. While H3K79me3 is generated by the Dot1L complex
428 (Nguyen and Zhang, 2011), the H3K36me3 mark is imposed by the SetD2 methyltransferase
429 recruited by elongating Pol II and enriched within the body of transcriptionally active genes
430 (Guenther et al., 2007; Krogan et al., 2003a; Krogan et al., 2003b; Nguyen and Zhang, 2011;
431 Pokholok et al., 2005). To test if Tat does indeed interact with these enzymes, we first affinity
432 purified (AP) Strep-tagged Tat (or GFP as negative control) from nuclear fractions of the Jurkat
433 T-cell lines and observed that Tat binds both Dot1L and SetD2 as well as the P-TEFb kinase
434 (CDK9) used as positive control in this assay (Figure 2–figure supplement 5A). To further test
18 435 whether changes in the epigenetic landscape directly correlate with the recruitment of these
436 chromatin modifiers to specific genes, we performed ChIP followed by quantitative PCR (ChIP-
437 qPCR) and found that the Tat-mediated recruitment of Dot1L and SetD2 to the CD69 locus
438 (class I TSG) correlates well with the increase in the histone modifications associated with
439 transcription elongation as well as Pol II (Figure 2–figure supplement 5B–C). To provide a
440 functional link between the Tat-mediated recruitment of these chromatin modifiers and gene
441 expression changes, we used short hairpin RNAs (shRNAs) to target Dot1L and SetD2 by RNA
442 interference (Figure 2–figure supplement 5D–H). Interestingly, we noted that after knockdown
443 of these chromatin-modifying enzymes as well as of CDK9, the increase in CD69 RNA levels
444 (but not RPL19) in response to Tat is virtually abolished, implying that Tat-mediated recruitment
445 of Dot1L, SetD2 and P-TEFb to TSG is a requisite for their increased transcription elongation
446 levels, in agreement with the ChIP data (Figure 2–figure supplements 3 and 5).
447 The discovery of these key enzymes as Tat targets and the evidence that they play
448 important roles in Tat-mediated cellular gene expression alterations suggest that this regulatory
449 mechanism is part of the reprogramming of target immune cells by Tat. Collectively, we have
450 described a set of human genes stimulated by Tat at two different steps in the transcription
451 cycle (initiation and elongation) using selective chromatin-modifying enzymes and the
452 transcription elongation machinery.
453
454 Tat promotes Pol II recruitment and pause release to induce initiation and elongation at
455 different gene classes
456 Pol II regulates the control of transcription initiation and elongation in the context of chromatin
457 (Fuda et al., 2009). In fact, the deposition of histone modifications associated with initiation and
458 elongation at promoters and gene bodies has been functionally linked to levels of recruited and
459 transcriptionally engaged Pol II, respectively (Adelman and Lis, 2012). Therefore, if Tat
460 promotes transcription initiation and elongation as proposed (Figure 2), we would expect Pol II
19 461 levels to fluctuate in response to Tat in ways that reflect the specific mode of activation. Pol II is
462 recruited to gene promoters to initiate transcription, but also tends to occupy promoter-proximal
463 regions of genes that show evidence of initiation but no or inefficient elongation, which is
464 referred to as promoter-proximal pausing (Adelman and Lis, 2012; Fuda et al., 2009; Rahl et al.,
465 2010; Wade and Struhl, 2008). To further investigate whether Tat promotes Pol II recruitment
466 (indicative of transcription initiation) or increases Pol II levels within gene bodies (indicative of
467 transcription elongation) we used ChIP-seq to examine Pol II distribution in the absence and
468 presence of Tat (Figure 3).
469 We first analyzed levels of Pol II at both Tat binding sites (Tat peak) and promoters of
470 target genes in cases where Tat only binds to promoter-distal sites. Interestingly, we found that
471 Tat stimulates Pol II recruitment at both the Tat binding site and promoter-proximal region of
472 class I TSG (Figures 3A–C), in perfect agreement with the Tat-mediated increase in H3K4me3
473 at those gene promoters (Figures 2A and B). Notably, this observation is consistent with the
474 model that Tat mediates transcription initiation at class I TSG. Conversely, as expected, we did
475 not see higher levels of Pol II at the promoters of class II TSG in response to Tat (Figures 3A,
476 B and D). This lack of Pol II recruitment further supports our proposed model based on the
477 global analysis of chromatin signatures that class II TSG are not regulated at the initiation step
478 (Figure 2).
479 Interestingly, a metagene analysis of class I TSG in the absence of Tat showed very low
480 Pol II density levels in both the promoter and transcribing unit. Conversely, Tat induces an
481 increase in Pol II density throughout the gene body with a noticeable peak at the promoter-
482 proximal region, consistent with Pol II recruitment to promoters and transition into elongation
483 (Figure 3E) and with the increase in H3K4me3 at the promoter-proximal regions associated
484 with those genes (Figures 2A and B, and Figure 2–figure supplement 3). Genome browser
485 views of individual class I TSG such as NETO1 exemplify the metagene analysis depicting that
486 Pol II is strongly recruited to the intragenic Tat binding site and promoter, marked with
20 487 H3K4me3 in the presence of Tat (Figure 3F), and that transcription initiation correlates with
488 increase in chromatin signatures (H3K79me3 and H3K36me3) associated with active
489 transcription in gene bodies, as well as transcribing Pol II (Figure 3–figure supplement 1A).
490 We have previously defined class II TSG as being primarily regulated at the elongation
491 step because H3K4me3 and Pol II density at the promoter-proximal region do not increase in
492 the presence of Tat (Figures 2 and 3A–D). Examination of two class II TSG (VAV3 and CD82)
493 using Chip-qPCR clearly demonstrates that Tat induces Pol II elongation by recruiting the P-
494 TEFb kinase at those target genes (Figure 3–figure supplement 2A–B). Consistently, a
495 metagene analysis shows that Tat largely increases Pol II density throughout the gene body and
496 3’-end of class II TSG, but not at the promoter, in agreement with a role of Tat in promoting the
497 transition to elongation (Figure 3G). Therefore, in the absence of Tat stimulation, the majority
498 of Pol II at class II TSG accumulates in the promoter-proximal region with a peak just
499 downstream of the TSS while Tat strongly induces increased Pol II density in the gene body but
500 not in the promoter-proximal region (Figure 3G). Genome browser views of individual class II
501 TSG such as VAV3 are consistent with the elongation function (Figure 3H), and are also in
502 perfect agreement with the analysis of chromatin signatures related to elongation (H3K79me3
503 and H3K36me3) (Figure 2 and Figure 3–figure supplement 1B).
504 Collectively, the data indicate that Tat controls both Pol II recruitment and pause release
505 to promote initiation and elongation in different gene classes.
506
507 Tat stimulates transcription initiation from intragenic enhancers by inducing gene
508 looping
509 Tat is recruited to two different genomic domains (promoters and intragenic) irrespective of the
510 transcription step regulated (initiation or elongation) (Figure 4A), implying that the site of Tat
511 recruitment to its target genes does not dictate the mechanism of transcription activation. To
512 further elucidate how Tat promotes transcription by binding to promoters or intragenic sites, we
21 513 examined how chromatin signatures associated with these genomic domains change in
514 response to Tat. Promoters bound by Tat are marked with the expected signature: high
515 H3K4me3 and low H3K27Ac (Figure 4B) (Creyghton et al., 2010; Heintzman and Ren, 2009).
516 Conversely, intragenic sites bound by Tat are marked with the enhancer signature: high
517 H3K27Ac and low H3K4me3 (Figure 4B), as well as high H3K4me1 (data not shown)
518 (Creyghton et al., 2010; Heintzman and Ren, 2009; Zhou et al., 2011). This suggests that the
519 intragenic Tat-binding sites appear to be intragenic enhancers, which have been previously
520 proposed to function as alternative elements required for gene activation (Kowalczyk et al.,
521 2012).
522 To better define the roles of Tat in activating transcription from these promoter-distal,
523 intragenic sites, we sorted class I and class II TSG based on the H3K27Ac density surrounding
524 the Tat peak. Notably, we observed that in the absence of Tat, the intragenic sites at class I
525 TSG have low or undetectable levels of H3K27Ac (Figure 4C), consistent with the idea that
526 these genes are inactive or only minimally transcribed in the basal state without Tat (Figure 2).
527 However, Tat increases H3K27Ac density near its intragenic binding sites by ~2–10-fold
528 depending on the gene (Figures 4C and D). Conversely, in the absence of Tat, H3K27Ac
529 levels at the intragenic sites of class II TSG are high and Tat increases their density, albeit with
530 a lower fold-change than in class I TSG (Figures 4C and D). This is consistent with the model
531 that these genes are active in the basal state and Tat activates a post-initiation step, namely
532 transcription elongation. Metagene analysis of class I TSG indicates that the H3K27Ac mark at
533 intragenic binding sites is virtually absent at the immediate binding site itself but high in the
534 nucleosomes directly surrounding the Tat peak, progressively declining further up and
535 downstream from the binding site (Figure 4E). At these sites, Tat increased H3K27Ac levels an
536 average of ~3-fold. A similar H3K27Ac distribution pattern is observed in class II TSG, even
537 though the magnitude of H3K27Ac increase is smaller (~1.5-fold) because these genes are
538 already active in the absence of Tat (Figure 4F).
22 539 Given that the histone acetyl transferase p300/CBP is a well-known Tat interactor and
540 that it is recruited to enhancers to facilitate transcription activation through chromatin acetylation
541 (including H3K27Ac) (Hottiger and Nabel, 1998; Jager et al., 2012; Kim et al., 2010b), we asked
542 whether Tat recruits p300 to these intragenic sites to trigger H3K27Ac. To test this possibility,
543 we performed ChIP assays on one class I TSG (PPM1H) and one class II TSG (CD82).
544 Notably, we observed that in the presence of Tat, the increase in H3K27Ac levels at class I
545 intragenic sites in response to Tat mirrors the recruitment of p300 (Figure 4G). However, levels
546 of both H3K27Ac and p300 detected at the intragenic sites of class II TSG are already high in
547 the basal state and are slightly induced (~1.2-1.4-fold) in response to Tat (Figure 4H).
548 Remarkably, we observed a sharp correlation (R2=0.853) between increased H3K27Ac
549 density at the intragenic site and gene expression levels at class I TSG (Figure 4–figure
550 supplement 1A), supporting a model in which Tat is recruited to these sites to induce de novo
551 transcription activation through recruitment of p300 at internal sites. However, the correlation
552 between H3K27Ac density at the intragenic site and gene expression levels at class II TSG is
553 quite low (R2=0.0728) because these genes are already marked by H3K27Ac in the absence of
554 Tat (Figure 4–figure supplement 1B). These results support the model that Tat binds at
555 intragenic sites of non-productive genes (class I TSG) to induce their transcription initiation or is
556 recruited to productive genes (actively transcribed) such as class II TSG to augment
557 transcription elongation levels.
558 Because these Tat binding sites are located distally from the promoter in the majority of
559 class I TSG (15 out of 17), we hypothesized a model in which Tat controls gene looping to
560 induce spatial proximity between the intragenic site (putative enhancer) and the promoter. To
561 further test this possibility in detail we selected one class I TSG (PPM1H), which is about 300 kb
562 in length, thus facilitating the analysis of long-range chromatin interactions (Figure 4I). We
563 performed chromosome conformation capture (3C) to assess the association of two intragenic
564 Tat target sites and the promoter. We also performed a similar analysis between the gene
23 565 promoter and distal intergenic sites located at a similar distance from the promoter but not
566 bound by Tat. We used anchoring points near the gene promoter to measure the extent of
567 chromatin looping between the promoter and the intragenic Tat-bound sites or the distal control
568 domain. Interestingly, this analysis revealed that prior to Tat stimulation there is no obvious
569 interaction between the gene promoter and the two intragenic Tat-bound sites. However, Tat
570 promotes a strong association between the two sites, albeit with different crosslinking
571 efficiencies (Figure 4I), and this gene looping needs a functional Tat, because the C22A non-
572 functional Tat mutant does not promote this effect (Figure 4–figure supplement 2). Moreover,
573 the effect of Tat on chromatin looping is specific as there was no detectable association when a
574 control anchor primer was placed about 20 kb upstream from the PPM1H promoter (Figure 4–
575 figure supplement 2).
576 Gene looping could be a direct consequence of Tat activity at promoters and enhancers,
577 or Tat can simply help activate enhancers, causing them to increase looping/proximity to their
578 target promoters (Bulger and Groudine, 2011). To distinguish between these two possibilities
579 we used flavopiridol (FP), a CDK9 inhibitor that, in addition to blocking transcription elongation,
580 also inhibits the production of eRNAs (enhancer-derived RNAs), without affecting other
581 molecular indicators of enhancer activity (such as Pol II binding, H3K4me1 levels, and gene
582 looping) (Hah et al., 2013). We used FP in the Jurkat-GFP and -Tat cell lines and performed 3C
583 experiments to uncouple the assembly of enhancer complexes and gene looping per se from
584 eRNA production. The data suggests that Tat promotes gene looping even in the absence of
585 eRNA synthesis (Figure 4–figure supplement 3A–D), implying that eRNA synthesis occurs
586 after assembly of Tat and the transcription machinery at the enhancer, and enhancer-promoter
587 communication.
588 We have suggested that class II TSG have paused Pol II, and it is known that promoter-
589 enhancer loops are associated with paused Pol II (Ghavi-Helm et al., 2014). We thus wished to
590 test whether gene looping is also taking place at class II TSG, and if Tat plays any role. We first
24 591 analyzed the percentage of class II TSG containing intragenic Tat-bound sites (n = 34 out of 43,
592 79%) and then examined whether gene looping is critical for their expression and whether Tat
593 promotes the looping (Figure 4–figure supplement 4). For this we selected one class II TSG
594 (CD82), which shows a strong Pol II peak promoter-proximally, has one Tat-bound site
595 intragenically, and shows evidence of looping between the promoter and the intragenic Tat-
596 bound site. We found no evidence that Tat modulates gene looping at the CD82 loci thereby
597 indicating that Tat's effects at class II TSG is at a step post-formation of long-range chromatin
598 interactions (Figure 4–figure supplement 4), in agreement with a role of Tat in promoting
599 elongation.
600 In conclusion, class I TSG is transcriptionally inactive (or transcribed at a low rate) and
601 Tat binds to intragenic sites to induce chromatin looping and transcription activation. On the
602 other hand, class II TSG is actively transcribed and Tat promotes a post-initiation step (namely
603 Pol II elongation) by binding to both promoters and/or intragenic sites without affecting gene
604 looping.
605
606 Implementation of a modified traveling ratio algorithm indicates that Tat promotes
607 transcription elongation at class II TSG
608 While at class I TSG Tat promotes the initiation step, at class II TSG Tat appears to induce Pol
609 II transition into the elongation phase (Figures 2 and 3). Previously, an algorithm that
610 computes Pol II levels in promoter-proximal regions and gene body (termed Traveling Ratio
611 (TR)) has been devised to examine the transition from initiation into elongation (Figure 5A)
612 (Rahl et al., 2010; Reppas et al., 2006; Wade and Struhl, 2008; Zeitlinger et al., 2007). We
613 thought to apply such an algorithm to examine Tat functions in initiation and elongation by
614 comparing Pol II levels found in promoter-proximal regions (-50 to +1000 respective to the TSS)
615 versus levels within the gene body. Unexpectedly, we found in our ChIP-seq datasets that in
616 both the presence and absence of Tat, Pol II levels tend to peak not only near the promoter of
25 617 many genes, but also at certain intronic domains marked by H3K27Ac and H3K4me1,
618 sometimes at densities similar to those found at the promoter-proximal region (Figure 5B).
619 These sites do not contain any known annotation in the UCSC genome browser and does not
620 appear to contain a non-canonical TSS because the H3K4me3 levels are low-to-undetectable.
621 Thus, to examine if these Pol II forms were technical artifacts, we performed ChIP-seq with
622 other Pol II Abs (total Pol II, CTD, Ser5P-CTD and Ser2P-CTD), both in the absence and
623 presence of Tat, and found that, irrespective of the Ab used, this stably paused Pol II form was
624 detected in the gene body, albeit with variable density levels (Figure 5–figure supplement 1).
625 Similarly, genome browser searches revealed that intragenically poised Pol II is also detected in
626 primary CD4+ T cells, thus ruling out the possibility that it is an artifact of Pol II distribution
627 present in immortalized T cell lines growing in tissue culture. Moreover, examination of our
628 ChIP-seq chromatin signature tracks indicated that the sites of intragenic Pol II pausing
629 correspond to potential enhancers because they are marked with the corresponding enhancer
630 signature (high H3K4me1/H3K27Ac and low H3K4me3) (Figure 5B) (Heintzman and Ren,
631 2009).
632 Given that we found intragenic enhancers containing a form of Pol II that appears to be
633 paused, and that enhancer-promoter interactions are frequently associated with paused Pol II
634 (Ghavi-Helm et al., 2014), we speculated that the original TR algorithm might not accurately
635 describe Tat’s role in transcription elongation in our dataset. The problem arises because the
636 TR algorithm treats all Pol II in the gene body as actively elongating. This assumption is invalid
637 in cases where Pol II appears paused intragenically, or is detected at an intragenic enhancer
638 because of chromatin looping. In order to circumvent this problem we devised a custom-made
639 algorithm referred to as “modified TR” (mTR) to identify intragenic enhancers marked by
640 H3K4me1/H3K27Ac and paused Pol II. mTR accurately categorizes these sites to more
641 precisely calculate the promoter-proximal and gene body Pol II counts (Figure 5C). We defined
642 paused Pol II (pPol II) as Pol II located in both the promoter-proximal region (-50/+1000 from the
26 643 TSS) and at intragenic sites marked with H3K27Ac (as determined by MACS2, with a peak
644 cutoff of p-value < 0.05) and having a read density/nt greater than five times that of the average
645 density/nt within the gene body. Of note, we used a window of -50/+1000 from the TSS rather
646 than -50/+300 because several genes contained accumulation of Pol II beyond the +100, and
647 up to the +1000 position. Transcribing Pol II (tPol II) is defined as Pol II in the remainder of the
648 gene that is not associated with the enhancer signature. Then, the mTR is the ratio of average
649 pPol II (promoter+gene body) density to average tPol II (gene body) density (Figure 5C). As
650 expected, class II TSG showed decreased mTR in the presence of Tat compared to the GFP
651 cell line (Figure 5D).
652 Despite this compelling observation, the mTR could be altered in response to Tat either
653 by fluctuations in the levels of pPol II, tPol II, or both. Therefore, to more clearly explore Tat’s
654 role in controlling elongation, we computed densities for both Pol II classes (pPol II and tPol II)
655 and compared them separately in the GFP and Tat cell lines. For each gene, we calculated the
656 ratio of average pPol II and tPol II densities in the two cell lines (Tat/GFP) and plotted them on
657 the X- and Y-axis, respectively (Figure 5E). This tool plot demonstrates clearly the Pol II
658 behavior in response to Tat, both in terms of changes to initiation and elongation. While class I
659 TSG showed profound increase in pPol II and tPol II as consequence of transcription
660 initiation/elongation activation, class II TSG shows, albeit with gene-specific differences, higher
661 evidence of elongation (increased tPol II form in the presence of Tat) (Figure 5E).
662 Together, the data suggest that Tat has evolved to fine-tune both transcription initiation
663 and elongation steps at functionally different gene classes. At class I TSG, Tat directly
664 mediates Pol II recruitment to gene promoters or binds to intragenic sites to induce chromatin
665 looping (promoter-enhancer communication) to trigger transcription initiation (Figure 5F). At
666 class II TSG, Tat binds to already engaged Pol II to promote its escape into the productive
667 elongation phase (Figure 5G).
668
27 669 Global analysis of chromatin signatures and Pol II distribution indicate that Tat
670 downregulates transcription by blocking both initiation and elongation
671 Global analysis of chromatin signatures in stimulated genes revealed that Tat induces both
672 transcription initiation and elongation (Figure 2). Thus, we computationally examined chromatin
673 signatures throughout promoter-proximal and -distal regions of Tat downregulated genes (TDG)
674 to determine at which step in the transcription cycle Tat blocks gene activation. Despite the
675 identification of genes having marked changes in both chromatin initiation (H3K4me3) and
676 elongation signatures (H3K79me3 and H3K36me3) such as CD1E and FBLN2, or chromatin
677 elongation signatures only such as EOMES and HSPA8, this distinction did not become as clear
678 for genes having low gene expression changes (<3-fold). Therefore, we used a combined
679 chromatin/Pol II signature to identify initiation- (class I) and elongation-regulated (class II) TDG
680 (Figure 6). We defined class I TDG as genes having chromatin and Pol II signatures consistent
681 with inhibited transcription initiation, with at least 60% decrease in H3K4me3 surrounding the
682 TSS and 30% decrease in promoter-proximal Pol II with Tat (fold H3K4me3 Tat/GFP<0.4 and fold
683 Pol IITat/GFP<0.7). Conversely, class II TDG are inhibited at later stages of transcription because
684 H3K4me3 and Pol II promoter-proximal levels remain virtually unchanged in the presence of Tat
685 (fold H3K4me3 Tat/GFP>0.7 and fold Pol IITat/GFP>0.75). Therefore, we defined class II TDG as
686 genes having less than 30% decrease H3K4me3 surrounding the TSS and less than 25%
687 decrease in promoter-proximal Pol II level in response to Tat. Using this signature, we identified
688 11 class I TDG and 14 class II TDG that strictly pass the above mentioned criteria. Remarkably,
689 these genes also appear to be controlled at the initiation and elongation level in primary TCM
690 cells infected with HIV (Figure 1–figure supplement 8E–F). Although further research is
691 needed to pinpoint the details, the rest of the genes could be a mixed class I-II population in
692 which Tat simultaneously interfere with both the initiation and elongation steps (see Discussion).
693 Class I TDG experience a sharp decrease in H3K4me3 (~4.2-fold change) and in Pol II
694 (see below) in the presence of Tat (Figures 6A and B), consistent with a role of Tat in blocking
28 695 transcription initiation. In support of this view, we observed a statistically significant correlation
696 between the decrease in H3K4me3 levels surrounding the TSS and RNA levels at class I TDG
697 but not class II (data not shown). To further examine how Tat blocks the initiation step, we
698 performed detailed ChIP-qPCR assays to monitor the occupancy of subunits of the transcription
699 pre-initiation complex. Interestingly, we observed that Tat recruitment to the CD1E gene
700 promoter blocks the step of PIC assembly as denoted by the sharp decrease in the occupancy
701 levels of TBP, Mediator (MED1), Pol II and the P-TEFb kinase (CDK9) (Figure 6–figure
702 supplement 1A), in agreement with the virtual removal of H3K4me3 (Figures 6A and G). On
703 the contrary, class II TDG showed no significant change in H3K4me3 density surrounding the
704 TSS (Figures 6B and H), in agreement with the lack of Tat effects at the transcription initiation
705 level. Consistently, at the class II TDG EOMES, Tat does not interfere with PIC assembly (as
706 revealed by identical levels of TBP and MED1 at the promoter). However, Tat blocks Pol II
707 transition into productive elongation due to dismissal of P-TEFb from both the promoter and
708 transcription unit (Figure 6–figure supplement 1B), which affects levels of total Pol II and the
709 elongating form (Ser2P) in the gene body but not at the promoter-proximal region at two class II
710 TDG (EOMES and HSPA8) (Figure 3–figure supplement 2C–D).
711 If Tat blocks transcription initiation, then we would predict that the chromatin elongation
712 signatures would also decrease, as there is no recruited Pol II available for elongation. In
713 agreement, class I TDG showed a virtual elimination of H3K79me3 and H3K36me3 levels in the
714 transcribed unit, reflecting potent inhibition of these genes (Figures 6C–F). The metagene
715 analysis clearly indicated that all class I TDG showed reduced levels of both chromatin
716 elongation signatures (Figures 6D and F). However, the difference in magnitude at different
717 genes was very broad, most likely due to the fact that these genes are transcribed at different
718 levels and their abundance (based on RNA-seq) is quite disparate (Figures 6C and E).
719 Genome browser inspection of individual genes like CD1E, which is expressed at high levels,
720 provide evidence that Tat is a very potent transcription initiation repressor, as revealed by the
29 721 large drop in all chromatin signatures profiled (Figure 6G) and inhibition of PIC assembly at the
722 promoter, as revealed by the dismissal of TBP, MED1, CDK9 and Pol II at the CD1E loci
723 (Figure 6–figure supplement 1A).
724 In contrast to class I TDG, class II showed no significant change in H3K4me3 levels
725 surrounding the TSS (Figures 6A–B). However, a subset of class II genes such as EOMES,
726 HSPA8 and CDK6, showed reduced chromatin elongation signatures (H3K79me3 and
727 H3K36me3) throughout the transcribing unit (Figures 6C, E, and H). Genome browser
728 inspection of individual class II TDG like EOMES demonstrate that Tat blocks the transition to
729 elongation because Pol II density largely decreases at the transcribing unit without significant
730 alterations in the promoter-proximal region, consistent with decreased chromatin elongation
731 signatures but no H3K4me3 effects (Figure 6H). However, surprisingly, a few genes showed
732 no change in H3K79me3 modification, or even a small increase, right after the TSS (Figure 6C),
733 possibly related to the fact that Dot1L-mediated establishment of H3K79me3 also has been
734 linked with transcriptional repression (Nguyen and Zhang, 2011; Onder et al., 2012). However,
735 this alternative function of Dot1L in transcription repression will require further investigation.
736 Taken together, it is evident that Tat functions as a transcriptional repressor blocking Pol
737 II recruitment and pause release as well as promoting the removal of chromatin modifications
738 associated with initiation and elongation at different gene classes.
739
740 Tat blocks Pol II recruitment and pause release to repress transcription initiation and
741 elongation
742 The fact that Tat modifies the epigenetic landscape to repress cellular transcription prompted us
743 to examine Tat effects on Pol II distribution changes (Figure 7). If Tat represses transcription
744 initiation by blocking PIC assembly such as in the CD1E gene then we would expect Pol II
745 levels to decrease at gene promoters and other functionally relevant sites in the presence of
746 Tat. We first analyzed levels of Pol II at both Tat binding sites (Tat peak) and promoters of
30 747 associated genes for both class I and class II TDG. Interestingly, we found that Tat blocks Pol II
748 recruitment at both the Tat binding site and promoter-proximal region of class I TDG (Figures
749 7A–C), in perfect agreement with the Tat-mediated decrease in H3K4me3 at those promoters
750 (Figure 6). Remarkably, these observations are in agreement with the model that Tat blocks
751 transcription initiation at class I TDG. Conversely, as expected based on the global analysis of
752 chromatin signatures, Tat does not appear to largely block Pol II recruitment at the Tat peak and
753 promoter of class II TDG (Figures 7A, B and D), which we proposed to be regulated at the
754 elongation level based on the global analysis of chromatin signatures and Pol II levels (Figure
755 6). In fact, this is in agreement with the lack of H3K4me3 density changes at promoters of class
756 II TDG in the presence of Tat.
757 Metagene analysis of class I TDG showed high Pol II density levels in both the gene and
758 transcribing unit in the absence of Tat. However, Tat reduces Pol II density at the promoter,
759 consistent with a drop in transcribing Pol II at the gene body (Figure 7E). Genome browser
760 views of individual class I TDG such as CD1E clearly exemplify the metagene analysis depicting
761 that Pol II recruitment at the promoter is strongly blocked, leading to the virtual disappearance of
762 Pol II signal throughout the gene body. The transcription initiation signature H3K4me3 is also
763 eliminated (Figure 7F), and the blockage of transcription initiation correlates with decrease in
764 chromatin signatures associated with transcription elongation (H3K79me3 and H3K36me3)
765 throughout the gene body (Figure 6).
766 We have defined class II TDG as being primarily controlled at the elongation step
767 because H3K4me3 and Pol II density at the promoter-proximal region does not appear to be
768 modified by Tat (Figures 6 and 7). Consistently, a metagene analysis shows that Tat
769 decreases Pol II density throughout the gene body and 3’-end, but not at the promoter-proximal
770 region, in agreement with a role of Tat in blocking the transition to elongation (Figure 7G).
771 Genome browser views of individual class II TDG such as EOMES are consistent with the block
772 of Pol II pause release (Figure 7H), and with the global analysis of chromatin signatures related
31 773 to elongation (Figure 6). Reduced elongation is due to the block of P-TEFb recruitment to
774 promoters at class II TDG (EOMES and HSPA8) (Figure 3–figure supplement 2). Moreover,
775 simultaneous analysis of the poised and transcribing Pol II forms (pPol II and tPol II) using our
776 mTR algorithm provides further evidence that, at the majority of class II TDG, Tat dampens the
777 level of tPol II without largely modifying or even increasing pPol II (Figure 7–figure supplement
778 1), consistent with a role in primarily blocking the transition into elongation.
779 In conclusion, the data indicate that Tat precisely modulates Pol II recruitment and
780 transition into the gene body to block transcription initiation or elongation, respectively, at
781 different gene classes.
782
783 Enrichment of motifs for master transcriptional regulators but not of TAR-like motifs in
784 the Tat binding sites
785 Given that Tat activates transcription from the HIV promoter by associating with the TAR
786 structure that is formed at the nascent chain of viral pre-mRNAs (Frankel and Young, 1998)
787 (Figure 8A), we investigated the possibility that Tat is recruited to the host chromatin through
788 interaction with TAR-like structures. To test whether the Tat sites at the direct target genes
789 (both TSG and TDG) contain TAR-like structures, we searched for TAR-like motifs using a
790 custom algorithm and the input sequence X(2)GATX(1,2)GAX(4,40)TCTCX(2) as query (Figure 8B),
791 where X denotes any nucleotide, and the numbers in brackets represent the minimal and
792 maximal allowed positions with the corresponding secondary structure pattern XX((XX((X(4–40)-
793 X))))XX including the di-/tri-nucleotide bulge within the stem and loop (both critical determinants
794 for TAR binding) (Frankel and Young, 1998). Locations of TAR-like motifs near target genes
795 were cataloged and compared to distribution of Tat peaks in our ChIP-seq dataset. While ~20%
796 of the TSG and TDG contain TAR-like motifs within a very large window (<10 Kb) from the Tat
797 site, an insignificant number (~1%) contain a TAR-like motif in close proximity (<0.1 Kb) to the
798 Tat site identified by ChIP-seq (Figures 8C and D). The data suggest that there is no
32 799 significant presence of TAR-like motifs at or near Tat peaks, and only minimal examples of a
800 TAR-like motif within 10 Kb of a Tat peak. Thus, it seems unlikely that the more prevalent
801 mechanism of Tat recruitment to its target genes is through interaction with HIV TAR mimics.
802 Given that Tat’s interaction with its target genes appears not to rely on TAR-like motifs,
803 we reasoned that Tat might be recruited through interaction with master transcriptional
804 regulators. To identify candidates, we submitted 200-bp windows surrounding Tat peaks from
805 TSG and TDG to the DREME (Discriminative Regular Expression Motif) analysis tool (Bailey,
806 2011), which returned motifs matching T cell master regulators (ETS1, RUNX1) and GATA3
807 transcription factors for TSG, and ETS1 and RUNX1 for TDG, with statistically significant higher
808 p-values in TSG (2.5x10-18 and 9.6x10-14, respectively) (Figures 8E and F), without enrichment
809 of other factors expressed in T cells such as members of the Signal Transducer and Activator of
810 Transcription (STAT) family. Remarkably, enrichment of both ETS1 and RUNX1 at TSG
811 correlates with the observed functional annotation and their known role in modulating T cell
812 activation (Figure 1) (Hollenhorst et al., 2009; Hollenhorst et al., 2011).
813 Given that ETS1 motifs are present at both TSG and TDG, we examined whether the
814 presence of these motifs inform about the mode of Tat effect on cellular genes. To test this, we
815 looked to see whether ETS1 motifs are more prevalent in class I or class II TSG or TDG, and
816 observed that the number of target genes containing ETS1 binding sites within 100-bp from the
817 center of the Tat peak is: 11/17 (64.7%) for class I TSG, 36/43 (83.72%) for class II TSG, 8/14
818 (57.1%) for class I TDG, and 6/11 (54.5%) for class II TSG. This analysis indicates that the
819 presence of an ETS1 motif at the Tat target genes is not by itself sufficient to determine the
820 mechanism of action (stimulation or downregulation) or whether Tat modulates the initiation or
821 elongation step of transcription. ETS1 may help recruit Tat to chromatin but it needs another
822 determinant for specificity or mode of action, most likely related to the transcriptional activity
823 status or yet unidentified co-factors (protein and/or long-non coding RNA), which might function
824 in a gene-specific manner.
33 825 Given that the highest-confidence motif returned for both TSG and TDG is ETS1, we
826 examined whether this was indicative of co-occupancy of ETS1 with Tat in these genes. We
827 first retrieved ETS1 ChIP-seq data in Jurkat T cells (Hollenhorst et al., 2009) and compared
828 locations of ETS1 peaks to the locations of Tat peaks in our dataset (Figures 8G and H).
829 Contrary to our results when comparing TAR-like motifs sites to Tat ChIP-seq peaks, we
830 discovered that 73% of TSG and 80% of TDG contain an ETS1 ChIP-seq peak within 100-bp of
831 a Tat ChIP-seq peak. Together, the data suggest a model where Tat is recruited to host cell
832 chromatin through interaction with T-cell identity factors such as ETS1 and not through TAR-like
833 motifs, thus revealing unique and unexpected recruitment mechanisms. ETS1 ChIP-seq
834 revealed 19049 sites in the genome (Hollenhorst et al., 2009). Interestingly, nearly 52% of the
835 Tat peaks detected by ChIP-seq (3203/6117) harbor an ETS1 binding event within 100-bp,
836 significantly more than expected from random occurrence (p-value 1x10-8536, Hypergeometric
837 test).
838 Although a large number of both TSG and TDG contain ETS1 motifs, the enriched motifs
839 are different between both classes (5’-AGGAAG/AT/C-3’ and 5’-CNGGAA-3’, respectively)
840 (Figures 8E and F), even though both contain the signature motif for ETS1 binding (5’-GGAA-
841 3’). Thus, it would be interesting to determine whether these different motifs dictate a diverse
842 binding mode and could inform about the mechanism of activation and repression through the
843 same transcription factor, and whether the DNA binding site directs recruitment of specific
844 cofactors differentially targeted by Tat at TSG and TDG.
845 Given that we found no significant evidence of Tat interaction with TAR mimics in the
846 human genome as well as evidence of Tat interaction with ETS1, we reasoned that one clear
847 mechanism by which Tat is recruited to chromatin is through direct protein-protein interactions.
848 To biochemically test that Tat is recruited to chromatin in a RNA-independent manner, we
849 performed a cellular fractionation by sequential salt extraction in the absence and presence of
850 RNase (Figure 8–figure supplement 1A) and observed that: (i) Tat is present in the
34 851 nucleoplasm (100 mM salt fraction) as well as bound to chromatin eluted at (600 mM salt
852 fraction) from Jurkat nuclear extracts, while GFP is mainly detected in the 100mM salt fraction;
853 and (ii) the interaction between Tat and chromatin is primarily dictated by protein-protein
854 interaction, although a minor fraction appears to be RNA-dependent (Figure 8–figure
855 supplement 1B). To test this possibility further, we performed ChIP-qPCR assays at select
856 TSG and TDG by preparing samples in the presence and absence of RNase and observed that
857 the treatment does not affect Tat binding to four different target gene promoters (Figure 8–
858 figure supplement 1C). Although we cannot strictly rule out that Tat can combinatorially
859 interact with target proteins and RNA species present at different genomic domains, our data
860 indicate that Tat interaction with chromatin is primarily dictated by protein-protein interactions.
861
862 ETS1 recruits Tat to chromatin to reprogram cellular transcription
863 Given that we found a significant enrichment of ETS1 motifs in Tat sites (Figure 8) we reasoned
864 that ETS1 mediates Tat recruitment to chromatin. To test whether the two proteins interact we
865 performed Strep affinity purifications (AP) using nuclear fractions prepared from the Jurkat T-cell
866 lines followed by western blot analysis. The data indicate that Tat, but not the C22A non-
867 functional mutant or GFP, interacts with endogenous ETS1, as well as with the P-TEFb kinase
868 (CDK9) used as positive control, thereby showing that the protein-protein interaction is specific
869 (Figure 9A and Figure 9–figure supplement 1A). The facts that Tat and ETS1 interact, ETS1
870 motifs are enriched at the Tat target genes, and ETS1 binds these genes (as revealed by ETS1
871 ChIP-seq in Jurkat T cells (Hollenhorst et al., 2009)) prompted us to test whether both proteins
872 co-occupy target genes. To test this possibility we performed ChIP assays on the Jurkat GFP
873 and Tat cell lines using primer-pairs that amplify both promoter-proximal and promoter-distal of
874 different Tat target genes. We found that Tat and ETS1 co-occupy the CD69 promoter but not
875 gene body (Figure 9B), consistent with the motif enrichment analysis and a previous ETS1
876 ChIP-seq dataset (Hollenhorst et al., 2009). Furthermore, Tat binding at the CD69 promoter
35 877 does not appear to alter the occupancy levels of ETS1, since the density of ETS1 in the
878 absence and presence of Tat is similar, if not identical (Figure 9B). In addition, Tat binds some
879 target genes lacking ETS1 motifs such us CD1E (Figure 9C). These results provide evidence
880 that Tat also is recruited in an ETS1-independent manner thereby indicating that additional
881 recruitment mechanisms exist.
882 Given that Tat and ETS1 interact and co-occupy target gene promoters, we asked
883 whether ETS1 (already bound to DNA elements) recruits Tat to chromatin to reprogram gene
884 transcription. To test this, we generated Jurkat-GFP and -Tat cell lines expressing ETS1-
885 specific shRNAs to efficiently knockdown ETS1 or a non-target (NT) shRNA as negative control
886 (Figure 9D). Remarkably, we observed that ETS1 knockdown consistently diminishes (~6-fold)
887 ETS1 density at the CD69 promoter in both GFP and Tat cell lines with a concomitant loss of
888 Tat signal (~6-fold) at the promoter (Figure 9E). Importantly, ETS1 knockdown does not alter
889 the recruitment of Tat to the CD1E promoter, which is regulated in an ETS1-independent
890 manner (Figure 9F).
891 Given that Tat binds ETS1 and both co-occupy promoter-proximal regions of selected
892 target genes, we examined whether ETS1 is critical for transcriptional activation of CD69 and
893 three other TSG and observed that ETS1 knockdown interferes with the Tat-mediated increase
894 in target gene stimulation for both class I and class II TSG (Figure 9–figure supplement 1B–E)
895 without affecting RNA steady-state levels of non-target genes such as RPL19 and 7SK (Figure
896 9–figure supplement 1F–G). The evidence that ETS1 is critical for Tat's transcriptional
897 activation of these four selected TSG correlates with the Tat-mediated increase in the levels of
898 Pol II and the chromatin marks coinciding with transcription initiation and elongation at both
899 class I and II TSG (Figures 2–5).
900 Together, we provide compelling evidence that Tat is recruited to chromatin through
901 interaction with the T-cell identity factor ETS1 to reprogram cellular transcription. Further
902 studies are needed to determine whether ETS1 functions as a scaffold to promote Tat
36 903 recruitment or whether Tat induces protein conformational changes to activate or repress ETS1
904 regulatory transcriptional programs (see Discussion).
37 905 Discussion
906 Several studies have reported that Tat binds the human genome to modulate cellular gene
907 expression to alter the biology of immune cells (dendritic and CD4+ T) and generate a
908 permissive environment for viral replication and/or spread (Huang et al., 1998; Izmailova et al.,
909 2003; Kim et al., 2010a; Kim et al., 2013; Kukkonen et al., 2014; Li et al., 1997; Marban et al.,
910 2011). However, a comprehensive description of the direct target genes and the nature of the
911 regulatory mechanisms have yet to be discovered. In this article, we report a technically
912 improved Tat ChIP-seq assay that dramatically increases sensitivity in comparison with previous
913 methodologies. Furthermore, by simultaneously assaying transcriptome changes and Tat’s
914 genome-wide distribution we combinatorially identified direct Tat targets in the human genome
915 with high confidence. In contrast to previous studies, we were able to elucidate a large
916 proportion of Tat binding sites in the human genome that correlate with marked gene expression
917 changes (activation or repression) at both the RNA and protein levels. Importantly, the Tat
918 target genes shared functional annotations, and are regulated (for the most part) by a common
919 set of master transcriptional regulators.
920 Transcription factors coordinate the activation and maintenance of transcriptional
921 programs by regulating one or multiple steps in the transcriptional cycle. While some sequence-
922 specific DNA binding transcription factors recruit Pol II and the basal transcription apparatus to
923 promote transcription initiation (Hahn, 2004), others function at the elongation step by allowing
924 Pol II transition from a promoter paused state to the productive elongation phase (Adelman and
925 Lis, 2012; Rahl et al., 2010; Zhou et al., 2012). By performing a global analysis of chromatin
926 signatures that are generally associated with different genomic domains (promoter, coding units
927 and enhancers) together with genome-wide Pol II distribution data in the absence and presence
928 of Tat, we described, for the first time, the precise nature of the mechanisms of transcriptional
929 activation and repression by Tat. Strikingly, we found that Tat directly controls both the initiation
930 and elongation steps to transcriptionally reprogram the cell. Tat promotes Pol II recruitment at
38 931 promoters and transcribing units to stimulate the transcription initiation and elongation steps,
932 respectively. Conversely, Tat blocks Pol II recruitment to promoters or transcribing units to
933 prevent the initiation and elongation steps, respectively, thereby leading to gene repression.
934 Remarkably, the global analysis of chromatin signatures is consistent (for the most part) with the
935 proposed mechanisms based on Pol II occupancy changes. For example, class I TSG
936 (regulated at the initiation step) showed a large increase in both Pol II and H3K4me3 at
937 promoter-proximal regions, consistent with their functional link in activating gene transcription
938 (Shilatifard, 2012). Although we have not searched for the H3K4 methylase, it would be
939 interesting to define whether Tat hijacks one or more of Set/MLL methylases and whether
940 DoT1L and the SEC are co-recruited to link transcription initiation with elongation (Luo et al.,
941 2012; Nguyen and Zhang, 2011; Shilatifard, 2012). In addition, the mechanism by which Tat
942 reduces H3K4me3 density at some class II TSG remains yet unknown, but it may be possible
943 that Tat recruitment to the promoter region interferes with the H3K4me3 signature, as has been
944 seen with the NS1 protein during influenza virus infection (Marazzi et al., 2012). Although we
945 have described that Tat associates with and recruits chromatin-modifying enzymes to target
946 genes, one important aspect of the mechanism that we have not clarified yet is that the histone
947 modifications may represent (in some cases) indirect effects of changes in transcription and not
948 a direct consequence of Tat function on the genome.
949 Despite the canonical view of transcriptional regulation through transcription factor-
950 promoter DNA interaction, it has recently become evident that internal promoters, intragenic
951 enhancers and other genomic elements contribute to specify transcriptional programs through
952 the formation of three-dimensional structures (Bulger and Groudine, 2011; Creyghton et al.,
953 2010; Ghavi-Helm et al., 2014; Heintzman et al., 2007; Jin et al., 2013). Consistent with these
954 discoveries, we found that Tat exploits the human genome by binding not only at promoters but
955 also at intergenic and intragenic sites to modulate long-range chromatin interactions and
956 transcription activation from these promoter-distal sites. We observed that at a large number of
39 957 class I TSG (regulated at the initiation step) Tat binds at intragenic sites to modulate enhancer
958 activity. In the absence of Tat, the nucleosomes surrounding these sites contain low or
959 undetectable histone modifications related to enhancer activity (H3K4me1 and H3K27Ac). Tat
960 binding increases their density in a manner that is proportional to transcriptional levels. Notably,
961 this increase also correlates with the recruitment of chromatin-modifying enzymes (p300/CBP)
962 at the Tat site, chromatin looping between the internal site and the promoter, and Pol II
963 recruitment to trigger de novo transcription initiation. The role of intragenic Pol II pausing
964 requires further investigation but it may be possible that this form maintains activate
965 transcription elongation by promoting template DNA circularization. This Pol II form might not
966 have been observed in embryonic stem cells because most, if not all, genes are incompetent for
967 elongation and Pol II is primarily paused in the promoter-proximal region.
968 Recent evidence suggested that internal sites marked with H3K27Ac appeared to be a
969 sort of intragenic enhancers (Kowalczyk et al., 2012), implying that Tat directly dictates gene
970 activation by binding at these sites and is likely to be a major determinant of the overall
971 architecture and the composition of histone modifications of these internal sites. The gene
972 looping hypothesis provides a mechanistic explanation for the molecular effects observed at the
973 promoter (namely Pol II recruitment and increase in H3K4me3 density) in response to Tat
974 binding at promoter-distal sites, even without detectable Tat at the promoter-proximal region.
975 Despite we have shown that the C22A non-functional mutant (which does not dimerize) does
976 not promote gene looping, it would be interesting to further test whether the Tat-mediated long-
977 range chromatin interactions are controlled by protein homo-dimerization, a feature of Tat that
978 was originally described by Frankel and Pabo (Frankel et al., 1988), but whose function has
979 since then remained largely elusive. Furthermore, the role of intragenic enhancers in activating
980 gene expression is an attractive possibility, and might help recruit chromatin-modifying enzymes
981 for compartmentalization purposes and/or ‘on site’ activation.
982 Although in this report we did not thoroughly examine Tat binding at intergenic regions
40 983 and their functional role in transcriptional control, it is plausible that Tat binds these genomic
984 domains (enhancers) to control long-range interactions and the assembly of transcription
985 complexes to modulate transcription activation/repression. Given that transcription from
986 enhancers is a widespread regulatory mechanism to modulate the activity of nearby genes,
987 further research is needed to understand the role of Tat in controlling transcription through
988 enhancers and the potential role of the newly identified class of enhancer-derived long-non-
989 coding RNAs in transcription activation or repression (Kim et al., 2010b; Kim and Shiekhattar,
990 2015). Although we have detected Tat-induced RNAs from enhancers that are co-regulated
991 with nearby genes, further investigation is also needed to define the molecular mechanisms by
992 which these Tat-induced non-coding RNAs function during the reprogramming process.
993 Although the majority of transcription factors described to date contact DNA directly, Tat
994 is unique because it binds the nascent RNA structure (TAR) formed at the HIV promoter.
995 Surprisingly, in contrast to activation of the HIV genome, we found insignificant enrichment of
996 TAR-like motifs at Tat binding sites in the human genome. Although we cannot completely
997 exclude the possibility that nascent RNA chains (folding or not into TAR-like structures) help Tat
998 recruitment to host cell chromatin, we unexpectedly found that Tat occupies sites bound by
999 master transcriptional regulators (ETS1) with high frequency. Notably, we provided biochemical
1000 and genetic evidence that ETS1 recruits Tat to chromatin to modulate (activate or repress) gene
1001 transcription. ETS1 is one member of a large family of transcription factors (ETS) that play
1002 important roles in T-cell stimulation and differentiation (Hollenhorst et al., 2011). ETS1 requires
1003 combinatorial interactions with other factors (such as RUNX1) to activate transcription. We
1004 propose that, like RUNX1, Tat uses ETS1 as a scaffold to promote its recruitment to target
1005 genes and modulate gene transcription. Given that ETS1 is found at both activated and
1006 repressed genes, it is evident that ETS1 does not dictate per se the mode of regulation
1007 (activation or repression). For the mechanism of gene activation, a likely scenario would be that
1008 Tat is recruited to ETS1 to relieve its auto-inhibition and recruit Pol II, elongation factors and
41 1009 chromatin-modifying enzymes. For the mechanism of gene repression, Tat might compete off
1010 factors pre-associated with ETS1 (such as RUNX1) and block PIC assembly (in the case of
1011 transcription initiation blockage) or prevent the action of elongation factors such as P-TEFb (in
1012 the case of transcription elongation repression). Although ETS1 appears to play a central role
1013 in the recruitment of Tat to several direct target genes (irrespective of the transcriptional
1014 outcome), it is completely possible that several other co-factors including long non-coding RNAs
1015 and/or proteins function combinatorially to specify target loci identification and Tat function.
1016 Although we have provided important insights into the recruitment mechanisms, further
1017 investigation is needed to clarify the molecular basis of the Tat-ETS1 protein-protein interaction
1018 in the regulation of gene activation and repression. One particular observation derived from the
1019 motif analysis is that the ETS1 motif identified at both TSG and TDG contain a common
1020 sequence (5’-GGAA-3’) but differ in the -1/-2 positions. Of note, transcription factor binding
1021 sites slightly differing in sequence have been shown to modulate various steps of the
1022 transcription cycle including transcription factor binding affinity and conformational changes
1023 upon their recruitment to target sequences, as well as co-factor recruitment (Meijsing et al.,
1024 2009). Therefore, the difference in the ETS1 motifs at TSG and TDG might contain information
1025 utilized by Tat (and/or Tat co-factors) to activate or repress transcription. Undoubtedly, further
1026 research is required to elucidate the precise molecular basis.
1027 Despite the widespread role of ETS1 in recruiting Tat to chromatin, it is most likely that
1028 other recruitment strategies exist because not all Tat target genes identified are regulated by
1029 ETS1. Defining all the details will require higher-resolution approaches such as ChIP-exo and
1030 ChIP-nexus in order to improve the definition of Tat target sites and mechanisms of recruitment
1031 to chromatin (He et al., 2015; Rhee and Pugh, 2011). In addition, the analysis of transcriptome
1032 changes using RNA-seq (which measures steady-state RNA levels and not transcription per se)
1033 has some caveats, for example difficulties in detecting low abundance or highly unstable RNAs.
1034 Potentially, this may help explain why we also found a large fraction of binding events that do
42 1035 not have a correlate with gene expression changes. Other tools such as GRO-seq (Core et al.,
1036 2008), which measure levels of nascent RNAs, will also be needed to further improve the
1037 definition of direct and indirect target genes. Moreover, a more recent and powerful technique
1038 named NET-seq yields transcriptional activity at nucleotide resolution and thus outperforms
1039 compared to GRO-seq (Mayer et al., 2015). Certainly, these tools will provide much higher-
1040 resolution to define target sites, modes of Tat recruitment and improved functional insights.
1041 Nonetheless, our work provides the first comprehensive view of how Tat modulates the biology
1042 of immune target cells by mediating key transcriptional changes. Interestingly, Tat can
1043 potentially perform different functions depending on the target cell type. Thus, it would be
1044 informative to determine how host adaptability is harnessed by a viral protein to selectively
1045 reprogram transcription in a cell lineage-specific manner.
1046 In conclusion, despite the complexity of transcriptional regulatory mechanisms in the
1047 cell, Tat precisely controls Pol II recruitment and pause release to fine-tune the initiation and
1048 elongation steps, respectively. It is possible that the diversity of mechanisms employed by Tat
1049 to reprogram the host cell arise from the intrinsic complexity of transcriptional regulatory
1050 strategies of human genes. Finally, our data provide yet another example on how a virus with a
1051 limited coding capacity optimized its genome by evolving a small but “multi-tasking” protein to
1052 simultaneously control viral and cellular transcription using distinct regulatory strategies.
43 1053 Materials and methods
1054
1055 Experimental analysis
1056 Cell culture
1057 HEK293T cells were maintained in DMEM with 10% fetal bovine serum (FBS), 100 U/ml
1058 penicillin and 100 μg/ml streptomycin (Life Technologies). CD4+ Jurkat T cells (clone E6.1)
1059 were cultured in RPMI 1640 (HyClone) with 10% FBS, 100 U/ml penicillin and 100 μg/ml
1060 streptomycin (Life Technologies). The Jurkat HIV E4 clone was kindly provided by Dr. J. Karn
1061 and described elsewhere (Pearson et al., 2008). Jurkat T-REx cells were maintained in the
1062 same conditions as for Jurkat but in the presence of 10 μg/ml Blasticidin as indicated by the
1063 manufacturer's instructions (Life Technologies). The derived Jurkat T-REx clones (GFP and Tat
1064 bearing the Strep/Flag (SF) epitope: GFP-SF and Tat-SF, respectively) were selected with 300
1065 μg/ml of zeocin for four weeks and later cultured with 10 μg/ml Blasticidin and 100 μg/ml zeocin
1066 (Life Technologies). The Tat cell line was cultured for less than 3 months because it otherwise
1067 loses Tat expression and responsiveness. To generate the stable GFP:SF and Tat:SF
1068 expressing cell lines, the parental Jurkat T-REx was electroporated with 2 μg of Ssp1-linearized
1069 pcDNA4/TO vector (Life Technologies) bearing either insert using a nucleofector kit V and a
1070 Nucleofector II electroporator (Lonza). To induce expression of GFP:SF and Tat:SF, the stable
1071 cell lines were treated with 1 μg/ml doxycycline (DOX, Sigma) for 16 hrs (or as indicated). We
1072 chose to utilize a SF-tagged Tat protein and immunoprecipitate Tat:SF using an anti-FLAG
1073 antibody due to our inability to efficiently immunoprecipitate Tat under denaturing conditions
1074 with the available commercial anti-Tat antibodies from Covance (catalogue number MMS-116)
1075 and Abcam (ab43014).
1076
1077 Primary CD4+ T cell isolation, stimulation and HIV infection
44 1078 Blood from healthy donors was obtained from the Gulf Coast Regional Blood Center from
1079 Houston, TX. PBMCs were isolated by Ficoll-Hypaque density gradient centrifugation (Stem
1080 cell) and Naïve CD4+ T cells were then purified by negative selection using the magnetic beads
1081 Human Naïve CD4 T Cell Enrichment Set (Becton Dickinson, 558521). Cell purity was
1082 determined by surface staining: APC-conjugated anti-CD4 (Becton Dickinson, 555349) and
1083 FITC-conjugated CD45RA (Becton Dickinson, 555488). To generate CD4 central memory T
1084 cells (TCM), Naïve CD4 T cells were activated at day 0 (Figure 1–figure supplement 8) in 96-
1085 well plates precoated with anti-CD3/CD28 [2.5 μg/ml anti-CD3 (OKT3), 2.5 μg/ml anti-CD28
1086 (Biolegend, 302914)] in the presence of anti-IL-12 (4 μg/1x106 cells; R&D, MAB219), anti-IL-4 (2
1087 μg/1x106 cells; R&D, MAB204) and TGF-β1 (0.8 μg/1x106 cells; Peprotech, 100-21) (Bosque
1088 and Planelles, 2011; Messi et al., 2003). After activation cells were maintained at 1x106 cells/ml
1089 in RPMI 1640 with L-Glutamine (HyClone) containing 10% FBS, 100 U/ml penicillin and 100
1090 μg/ml streptomycin (Life Technologies) in the presence of 30 IU/ml of human IL-2 (AIDS
1091 Research and Reference Reagent Program). For infection of TCM with replication competent X4
1092 virus (pNL-4-3–GFP Nef(–), MOI=0.5), activated cells (day 7) were spinoculated (1x106 cells in
1093 1 ml; 2 hrs, 2900 rpm) in the presence of 8 μg/mL polybrene (Hexadimethrine bromide, Sigma).
1094 After infection, the supernatant was removed, cells were resuspended in complete RPMI and
1095 crowded for three days in 96-well plates (round bottom) at a concentration of 1X106 cells/ml (0.1
1096 ml/well). Three days post-infection GFP+ cells were sorted at the UTSW Flow Cytometry Core.
1097 At this point supernatant and cells were harvested for p24 ELISA and RNA isolation,
1098 respectively, as indicated above.
1099
1100 Protein-protein interaction assays
1101 Tat-associated proteins were purified using Strep-Tactin Superflow beads (IBA Life Sciences).
1102 Briefly, Jurkat (1x109 cells) were lysed using the Dignam method with brief modifications
45 1103 (Dignam et al., 1983). Strep-Tactin beads were equilibrated with IP lysis buffer, mixed with cell
1104 lysates and rotated at 4°C for 2 hrs. After centrifugation at 3000 rpm for 1 min, unbound
1105 proteins were removed and beads washed four times with IP wash buffer (20 mM Tris-HCl pH =
1106 7.5, 1.5 mM MgCl2, 250 mM NaCl, 0.2% NP-40, 5% glycerol, 1 mM DTT, protease inhibitor
1107 (Roche)). Specifically bound proteins were eluted with elution buffer containing desthiobiotin
1108 (IBA Life Sciences) for 30 min at 4°C.
1109
1110 Cell fractionation by sequential salt extraction
1111 1x108 cells (Jurkat-GFP and -Tat) were collected from 100 ml of culture media (grown at 1x106
1112 cells/ml) and washed with 2 ml of cold 1x PBS plus EDTA-free Protease Inhibitor (PI) Cocktail
1113 (Roche). The cell pellet (PCV=0.1 ml) was resuspended in 1 ml Buffer A (10 mM KCl, 10 mM
1114 HEPES pH 7.9, 0.1 mM EDTA, 1 mM DTT, 0.4% NP-40, plus PI) and incubated on ice for 5
1115 min. Nuclei were pelleted by centrifugation (5 min spin at 2,000 g at 4°C), the supernatant
1116 saved as the cytoplasmic extract (~0.5-0.6 ml), and the nuclear pellet washed twiced with 1 ml
1117 buffer A. Nuclei were resuspended in 0.25 ml Buffer B (20 mM HEPES pH 7.9, 100 mM NaCl, 1
1118 mM EDTA, 1 mM DTT, 1% NP-40 plus PI) using 10 strokes in a 1-ml dounce homogenizer and
1119 centrifuged 5 min at 6000 g. The supernatant (~0.4 ml) was designated as nuclear extract (100
1120 mM). The pellet was similarly resuspended in 2x pellet volume of Buffer B containing 600 mM
1121 NaCl, vortexed for 10 sec, treated with RNAse A (Roche) or BSA (mock treatment) at a
1122 concentration of 10 μg/ml for 60 min at 4°C and then spun at 6,000 g for 5 min.
1123
1124 Western blotting and antibodies
1125 Protein samples were resolved in SDS-PAGE, transferred to 0.45 μm nitrocellulose (Bio-Rad)
1126 membranes, blocked in Tris-buffered saline (TBS) containing 5% non fat dry milk for 1 hr, and
1127 incubated with primary antibodies overnight at 4°C. Primary antibodies used for western blot:
46 1128 FLAG M2 (F1804, Sigma); STREP-Tag II (71591, Novagen); CDK9 (sc-484, Santa Cruz); ETS1
1129 (sc-350X, Santa Cruz); β-actin (sc-47778, Santa Cruz); Tat (ab43014, Abcam); Dot1L (A300-
1130 954A, Bethyl); SetD2 (ab69836, Abcam); histone H3 (ab1791, Abcam). Secondary antibodies
1131 coupled to horseradish peroxidase (HRP) were donkey anti-rabbit IgG-HRP (sc-2313, Santa
1132 Cruz), goat anti-mouse IgG-HRP (sc-2005, Santa Cruz), and donkey anti-goat IgG-HRP (sc-
1133 2020, Santa Cruz) and were incubated at 1:10,000 dilutions for 1 hr and blots developed using
1134 Clarity Western ECL Substrate kit (Bio-Rad).
1135
1136 Flow cytometry
1137 For cell surface marker staining, Jurkat cells were washed twice with PBS/0.5% BSA, and
1138 incubated with 50 μl of the antibody diluted in PBS/0.5% BSA for 30 min in the dark. Antibodies
1139 used include CD69-PE (12-0699, eBioscience) or Mouse IgG1 K isotype control PE (12-4714,
1140 eBioscience) and CD4-APC (MHCD0405, Caltag). Cells were washed twice with PBS and then
1141 resuspended in 200 μL PBS/2.5% formalin, washed twice with PBS and twice with PBS/0.5%
1142 BSA. Fixed cells were subjected to flow cytometry analysis on a FACS Calibur in the Flow
1143 Cytometry Core Facility in UT Southwestern Medical Center. Data analysis was performed
1144 using FlowJo version 9.6.1.
1145
1146 Enzyme-linked immunosorbent assay (ELISA)
1147 Viral stocks and kinetics of HIV p24 antigen expression were monitored by ELISA following the
1148 manufacturer’s instructions. Plates and reagents were obtained from SAIC Frederick, Inc. from
1149 the National Cancer Institute’s (NCI) Operations and Technical Support Contractor to the
1150 Frederick National Laboratory for Cancer Research (FNLCR).
1151
1152 RNAi
47 1153 Lentiviral particles were made by transfecting HEK293T cells with three plasmids: the shRNA
1154 transfer vector pLKO.1 (Sigma), pMD2.G (VSV-g) and psPAX2 (gag-pol). Supernatants were
1155 harvested two days post-transfection, quantified by ELISA and stored at -80°C. Jurkat cells
1156 (2x105) were spinoculated at 2900 rpm for 2 hrs with lentiviral particles (50-200 μl, ~1x107
1157 transducing units (TU) determined by p24 ELISA) in the presence of 8 μg/ml polybrene.
1158 Efficiently transduced cells were selected with 1 μg/ml puromycin for 5 days. At this point, cells
1159 were used for validation of RNAi efficiency by qRT-PCR and/or western blot using the
1160 corresponding primer-pair and primary antibody, respectively. shRNAs used for RNAi are:
1161 pLKO.1-NT shRNA (SHC002, Sigma); pLKO.1-Dot1L shRNA (TRCN0000020209, Sigma);
1162 pLKO.1-SetD2 shRNA (TRCN0000003030, Sigma); pLKO.1-CDK9 shRNA (TRCN0000000494,
1163 Sigma); pLKO-puro-IPTG-3xLacO-NT shRNA (SHC332, Sigma); pLKO-puro-IPTG-3xLacO-
1164 ETS1 shRNA (TRCN0000005591, Sigma).
1165
1166 Chromosome conformation capture (3C) assay
1167 Chromatin interaction was determined using a 3C assay (Hagege et al., 2007). For the 3C
1168 assay in the absence/presence of CDK9 inhibitor, cells were induced with DOX for 6 hours
1169 (minimal detectable time before the onset of protein induction) followed by 2 hr incubation with
1170 FP or DMSO (vehicle), and cells were then processed as follows. 1x107 Jurkat cells were fixed
1171 with 1.5% methanol-free formaldehyde (Thermo Fisher) at room temperature for 10 min, the
1172 crosslinking was quenched with 0.125 M glycine and then cells washed twice with cold PBS.
1173 Cell pellets were homogenized in cold lysis buffer (10 mM Tris-HCl pH 8.0, 10 mM NaCl, 0.2%
1174 NP-40) using a dounce homogeneizer (10 cycles) and incubated for 90 min at 4°C with constant
1175 rotation. After centrifugation for 5 min at 400 g at 4°C, 1x106 nuclei were re-suspended in 0.5
1176 ml of 1.2 X restriction enzyme buffer for DNA digestion followed by incubation with 7.5 μl of 20%
1177 (w/v) SDS for 1 hr at 37°C with constant shaking. 50 μl of 20% (v/v) Triton X-100 were treated
48 1178 for 1 hr at 37°C with constant shaking. Chromatin DNA was digested with 600 units of the
1179 indicated restriction enzyme (EcoRI) overnight at 37°C with constant shaking. Samples were
1180 incubated with 40 μl of 20% (w/v) SDS for 30 min at 65°C to stop the reaction. Digested nuclei
1181 were transferred to 50 ml Falcon tube and incubated with 6.125 ml of 1.15 X ligation buffer and
1182 375 µl of 20% (v/v) Triton X-100 for 1 hr at 37°C with constant shaking. DNA ligation was
1183 performed by incubation with 2,000 units of T4 DNA ligase (New England Biolabs) for 4 hrs at
1184 16°C followed by 30 min at room temperature. Reverse crosslinking was done by incubation
1185 with 25 μl (20 mg/ml) of proteinase K (Roche) at 65°C overnight. DNA samples were then
1186 treated with 5 μl of RNase A (Roche) for 30 min at 37°C, and then mixed with 7 ml of phenol–
1187 chloroform vigorously and centrifuged for 15 min at 2,200 g at room temperature. The
1188 supernatant was transferred into a 50 ml tube and mixed with 7 ml of distilled water, 1.5 ml of 2
1189 M sodium acetate pH 5.6, and 35 ml of ethanol to precipitate DNA by centrifuging at 12,000g for
1190 30 min at 4°C. The DNA pellet was dissolved in 150 μl of 10 mM Tris-HCl pH 7.5. Random
1191 ligation matrix was prepared by digestion with restriction enzyme and ligation with BAC clones
1192 containing the respective genes examined from the Children’s Hospital Oakland Research
1193 Institute (CHORI). DNA concentration of 3C samples was determined by qRT-PCR assays with
1194 GAPDH-specific primers and using Fast SYBR Green Master Mix in a 7500 Real-Time PCR
1195 System (Applied Biosystems). Primers used for 3C will be provided upon request.
1196
1197 RNA extraction and qRT-PCR assays
1198 Total RNA from the Jurkat cell lines was isolated using TRIzol (Life Technologies) and RNeasy
1199 mini kit (Qiagen). 1 μg of total RNA was used for synthesis of first strand cDNA with M-MLV
1200 (New England Biolabs), and qPCR was performed using a Fast SYBR Green Master Mix in a
1201 7500 Fast Real-Time PCR System (Applied Biosystems). Experiments were done in biological
1202 triplicates and error bars represent the SEM as indicated in all figure legends. The primers used
49 1203 for qRT-PCR analysis are as follow (Gene, forward primer, reverse primer):
1204 (CCCCCCGGGCCGTCTTCCCCTC, TGAGGATGCCTCTCTTGCTCTG)
1205 RPL19 (ATCGATCGCCACATGTATCA, GCGTGCTTCCTTGGTCTTAG)
1206 7SK (TAAGAGCTCGGATGTGAGGGCGATCTG, GGAGCGGTGAGGGAGGAAG)
1207 CD69 (ACTGTGAAGAGGAGCTGG, GTGTTCCTCTCTACCTGCGTATCG)
1208 ADCYAP1 (GGGATCTTCACGGACAGCTA, CGGCGTCCTTTGTTTTTAAC)
1209 FAM46C (GCCTAGGGGCTGTAGAGGTCG, CTGCTCTCCTCTGCCATCTT)
1210 PPM1H (GCACACACAATGAAGACCAAGCCA, CAATAGTGGCAGGAAACACCCTCG)
1211 ANXA1 (ACGCTTTGCTTTCTCTTGCTAAGG, AAGGCCCTGGCATCTGAATCAGCC)
1212 ZNF83 (AGCCACCAAGAAGACCAAAGAAA, TATAAAGCCCTCTGTGCAGGGTTCA)
1213 CD1E (AAAGCCTTCTTGGTCACACCT, GCTTTGGGTAGAATCCTGAGAC)
1214 FBLN2 (GCCGATGGCTATATCCTCAAT, CAGGTGAGTGCCTTGTAGCA)
1215 EOMES (AAATTCCACCGCCACCAAACTGAG, TTGTAGTGGGCAGTGGGATTGAGT)
1216 CDK6 (ATGTGTGCACAGTGTCACGAACAG, TTAGATCGCGATGCACTACTCGGT)
1217 PRAME (ATTTGCGGAGGTTTTCACAGGA, TATCGGCTCTGAATGGAACCCC)
1218 FAM133B (TCGGGTGGCCTATATGAACCCAAT, GCCAAAGCCTTGGAGCCTTTCTTT)
1219 Dot1L (CCACCAACTGCAAACATCACT, GAGGAAATCGCCTCTCTCCAAT)
1220 SetD2 (CTCTCACCACCCTCTTCTGCCTA, CCACCTCTTGCTCAAACAACTTCC)
1221 CDK9 (GTGTTCGACTTCTGCGAGCATGAC, CTATGCAGGATCTTGTTTCTGTGG)
1222 PPM1H eRNA (AGCCTGGAGTGGAGAAGAAA, TCTGATTCCCTTCCATCCTC)
1223 GAPDH (CCCTGTGCTCAACCAGT, CTCACCTTGACACAAGCC)
1224 CD82 (GGGCTCAGCCTGTATCAAAG, CAGGACAGAGATGAAACTGCTC)
1225 VAV3 (CAACATCCTCCCTCCTTAGC, CCCTGACAATGACAGACTGG)
1226 HSPA8 (CAGTACGGAGGCGTCTTACA, CCACTTGGGTGGAGAAGATT)
1227 CD69-In (TCTTGTCCACTCTCCGGATGC, AGACTCAACAAGAGCTCCAGC)
1228 PPM1H-In (CTCCTTGAGGATGAGGATGG, CTCAGGACGAGGTGGAGTG)
50 1229 VAV3-In (GCAGAGCAGGACTCCATCGCGG, AGCCGCGTCCCGGAGCCGTCG)
1230 ANXA1-In (CAAACAGAAGGCAGCCAAT, CAGTGGAGACTTGGGCTTCT)
1231 CD1E-In (CAGTTAGTGCAATTAGGGAG, AAGCCACACTGCCTCGC)
1232 FBLN2-In (CGCGCACACAGCCAGGGG, GCGGACCGACGGGGAT)
1233 EOMES-In (ACGCTGGAAGAAGGTGACTT, TCAACTTGACCGATGCTTTG)
1234 CDK6-In (AGAGTTCCAGTTCCGTTTGG, GATCCTGCTTCCACGTATCC)
1235
1236 RNA-seq library preparation
1237 For RNA-seq, total RNA samples (10 μg) were prepared from biological duplicate samples with
1238 TRIzol followed by depletion of rRNA using the Ribominus isolation kit (Life Technologies).
1239 Briefly, 500 ng of rRNA-depleted RNA were used for library preparation using the SOLiD Total
1240 RNA-seq kit (Applied Biosystems). The RNA was fragmented and adapters ligated before
1241 cDNA synthesis. The cDNA was then size selected, amplified, purified with AmpureXP beads
1242 (Beckton Coulter) and quality checked on a Bioanalyzer (Agilent). Samples were quantified by
1243 qPCR using SOLiD adapter primers per the manufacturer's instructions (Applied Biosystems)
1244 and the EZ-Bead system was used to amplify the libraries onto the sequencing beads for high-
1245 throughput sequencing on a SOLiD 5500xl instrument (Applied Biosystems).
1246
1247 Chromatin immunoprecipitation (ChIP) assay
1248 Chromatin was cross-linked with 1% formaldehyde for 10 min at room temperature in culture
1249 media, and the reaction was stopped by the addition of glycine (125 mM final). Cells were
1250 washed twice with PBS and re-suspended in 1 ml of nuclei extraction buffer (5 mM PIPES pH
1251 8.0, 85 mM KCl, 0.5% NP-40, 1 mM PMSF, protease inhibitor (Roche)) at 2x107 cells/ml and
1252 incubated for 10 min at 4°C. Cell nuclei were collected by centrifugation at 3000 rpm for 5 min
1253 at 4°C, and re-suspended in Szak's RIPA buffer (50 mM Tris-HCl pH 8.0, 1% NP-40, 150 mM
51 1254 NaCl, 0.5% deoxycholate, 0.1% SDS, 5 mM EDTA, 0.5 mM PMSF, and EDTA-free protease
1255 inhibitor cocktail (Roche)) at 5x107 nuclei/ml. Nuclear pellets were sonicated using the Bioruptor
1256 water bath (Diagenode) using the high setting with 30 sec on 30 sec off for 45 min to generate
1257 100-300 bp DNA fragments. Chromatin DNA was quantified with NanoDrop 1000 (Agilent). For
1258 the ChIP step, 1–3 μg of antibody was conjugated with 15 μl of protein G dynabeads (Life
1259 Technologies) and blocked with 0.16% bovine serum albumin for 16 hrs at 4°C. Antibody-
1260 conjugated dynabeads were incubated with 30–50 μg of chromatin DNA for 2 hrs at 4°C. Then
1261 the beads were sequentially washed two times with the following buffers: low salt buffer (20 mM
1262 Tris-HCl pH 8.0, 150 mM NaCl, 1% Triton, 0.1% SDS, 2 mM EDTA), high salt buffer (20 mM
1263 Tris-HCl pH 8.0, 500 mM NaCl, 1% Triton, 0.1% SDS, 2 mM EDTA), LiCl wash buffer (0.25M
1264 LiCl, 1% NP-40, 1% deoxycholate, 1 mM EDTA, 20 mM Tris-HCl, pH 8.0), and 1 mM Tris-EDTA
1265 pH 8.0. Chromatin immunocomplexes were eluted by incubation for 10 min at 65°C with 1%
1266 SDS and 100 mM NaHCO3, and cross-linking was reversed by incubation in the solution
1267 adjusted to 200 mM NaCl and proteinase K (20 μg) for 1 hr at 65°C. Antibodies used for ChIP:
1268 RNA polymerase II unphosphorylated/Ser5P-CTD (39097, Active Motif); RNA polymerase II
1269 total P-CTD (61081, Active Motif); RNA polymerase II Ser5P-CTD (ab5131, Abcam); RNA
1270 polymerase II Ser2P-CTD (ab5095; Abcam); Normal rabbit IgG (sc-2027; Santa Cruz); CycT1
1271 (sc-899X, Santa Cruz); Cdk9 (sc-8338X, Santa Cruz); H3K4me1 (ab8895, Abcam); H3K4me3
1272 (ab8580, Abcam); H3K9me3 (ab8898, Abcam); H3K27me3 (ab6002, Abcam); H3K36me3
1273 (ab9050, Abcam); H3K79me3 (ab2621, Abcam); H3K36me3 (ab4729, Abcam); H3 (ab1971,
1274 Abcam) Dot1L (A300-954A, Bethyl); SetD2 (ab69836, Abcam); MED1 (A300-793A, Bethyl);
1275 TBP (ab28450, Abcam); p300 (sc-585X, Santa Cruz); FLAG M2 (F1804, Sigma); Tat (ab43014;
1276 Abcam). qRT-PCR was performed on a 7500 ABI Fast Real-Time PCR System (Applied
1277 Biosystems) as indicated above using the following primers (forward, reverse) with the sign (-/+)
1278 and number indicating the position respective to the TSS:
52 1279 CD69 -17913 (TTGTTGGCCTGAAGTTTTCC, ATACGGATTCACAGCCGAAC)
1280 CD69 -3955 (AATCCAGGGTGAGACGTCAG, GGAAGTCTGTGGTCCCTGTT)
1281 CD69 -63 (AATCCCACTTTCCTCCTGCT, GCCGCCTACTTGCTTGACTA)
1282 CD69 +2046 (GGTAAACTGGACCAAGAGAAGTTGCC, AATCTGAGTGCCGATCTGTGATGG)
1283 CD69 +5897 (ACTGTGAAGAGGAGCTGG, GTGTTCCTCTCTACCTGCGTATCG)
1284 CD1E -1499 (GTCCTTGTGATAGTTTGCTGAGAAT, CAAATGTCCAACAATGATAGACTG)
1285 CD1E +4 (CACAAGAGCAGGGAGAAAATCTGGA, GACAGACAAACCTCTCCCTCTG)
1286 CD1E +1341 (GGGCCCAGAACATCTGTAAAG, AGAGCAGGTGCTGCATCTTT)
1287 CD1E +5215 (GGCCAAAGCTCAGAAGAGTGCAG, AGAGCAGGTGCTGCATCTTT)
1288 ADCYAP1 -8465 (TCTCCCCTTCCTGATGATTG, CCCAGAGCACACAGCTAACA)
1289 ADCYAP1 +1 (AGACACCAACGCCAGACG, GGAGAGCTGCCAGGTAGGAC)
1290 ADCYAP1 +3713 (TTCCAGGGAGGTTTTGCTATAA, CTGTTTGGGTCCATAAGTGTCC)
1291 ADCYAP1 +6242 (AGCTCCAACAGACCCTGAGA, ATCTGATTGCTGGGTGAAGG)
1292 RAG1 -3544 (GCCAGTTCATGGATTCAACAAC, TGCGGGCTTTCTGATTTTAGCTAC)
1293 RAG1 +8 (TGAGAAACAAGAGGGCAAGGAGAGA, CCTGGCCAAAGTGTTGAGATGTCTG)
1294 RAG1 +5761 (AAGGTTTTCCGGATCGATGTG, GGGCACTGCTAAACTTCCTGTGCAT)
1295 RAG1 +12971 (TCCAAGATGCAATGGTGGTA, GTGGGACATGGTGTCAACAG)
1296 PPM1H +3755 (CCAGGCTAACACACCATGAC, TTGTTAATGCATGGGACAGG)
1297 EOMES -5777 (AAGGCACCCTTAACTGGATG, TCTTGCTCTGCACTTGCTCT)
1298 EOMES -57 (GGGCTGTCACTAGCTGCTTT, CAGGCGACTTGATCCAATTA)
1299 EOMES +3289 (GGGTGTTGTTGTTATTTGCG, TGTATGTTCACCCAGAGTCTCC)
1300 EOMES +9699 (GGAGACAGGACAGTCGAGGT, CTAACTGTGTAGCGCGGAAA)
1301 FAM46C -182 (AGGTGTCCCACTAACTCCGA, TGGTTGCCAAGAGACGAGTA)
1302 VAV3 -3936 (TCCTACTGCCCTTGAGTCCT, AAAGTGCCCAGAAGAGTGCT)
1303 VAV3 +91 (GTAGGAAACGCCAAAGTGGT, TCATGCGGCTAATTTCTTTG)
1304 VAV3 +102770 (ACAGGCACAACAGTCAAAGG, TGGTTAATTCCAGTGGGTGA)
53 1305 CD82 -10659 (TTAGGCCAGCTGTCTCATTG, AGGGCAGACATGGAATTAGG)
1306 CD82 +4180 (CCCACCTGAAGCACCTTTAT, GGGCCGAGATCTCAACTCTA)
1307 CD82 +28894 (GGATCCTAACTGCCAAGCAT, CAGAGGCCCAGAGAGGTTAG)
1308 HSPA8 -3885 (CTGGTAGTAGAGCCCTTGCC, CGGACAATATGATCTGCCAA)
1309 HSPA8 -103 (CTGCCCTTACAAGACCCAAT, GGTGAGTGCGTTATCGTGAG)
1310 HSPA8 +3576 (TTAGCTAGACGCCCTTAGGC, TGTGGACAAGAGTACGGGAA)
1311 FBLN2 + 622 (CACACACGCACTCACACAAG, GAGAGGGAGGATGTGGTGAC)
1312
1313 ChIP-seq library preparation and sequencing
1314 ChIP DNA was quantified on a Qubit® 2.0 Fluorometer (Life Technologies) and ~10 ng DNA
1315 were submitted to the McDermott Center Sequencing Core at UT Southwestern Medical Center
1316 for library preparation and high-throughput sequencing using the SOLiDTM ChIP-seq Kit with
1317 modifications for the 5500xl instrument (Applied Biosystems). Samples were end repaired, 3’-
1318 end adenylated and barcoded with multiplex adapters (Applied Biosystems). After purification
1319 with Ampure XP beads (Beckton Coulter), samples were PCR amplified (~14 cycles), size
1320 selected with Ampure XP beads, and quantified on the Agilent 2100 Bioanalyzer. Emulsion
1321 PCR was then performed and beads enriched on the EZ-Bead system. Between 20-30x106 50
1322 nt sequencing reads were generated for all ChIP experiments in the two cell lines. We uniquely
1323 mapped between 70-80% reads to the human genome. High-throughput sequencing was
1324 analyzed as described in the computational analysis section below.
1325
1326 Computational analysis
1327 All scripting was performed using python 2.7.6. All ChIP-seq binding events were loaded into a
1328 custom MySQL database to allow for efficient comparisons of multiple factors’ binding loci.
1329
1330 Mapping and sorting
54 1331 The output from the SOLiD ChIP-seq was aligned to the reference human genome
1332 (GRCh37/hg19) using Bowtie v1.0.0. (Langmead et al., 2009), allowing 1 mismatch (-v 1).
1333 Sequences aligning to multiple locations in the reference genome were discarded. -S was
1334 specified to deliver.sam output, which was then sorted and converted to binary (.bam) using
1335 Samtools (Li et al., 2009).
1336
1337 ChIP-seq peak calling
1338 The .sam files for all ChIP-seq marks were analyzed for binding events using peak calling in the
1339 MACS2 software (Zhang et al., 2008). When determining peaks, duplicate read alignments
1340 were discarded to avoid amplification artifacts from the PCR. An FDR threshold of < 0.05 was
1341 applied for statistical significance. This methodology was used to call FLAG peaks in both the
1342 Tat and GFP cell lines, and only FLAG peaks present in the Tat, but not in the GFP, cell line
1343 were considered valid Tat binding events. The GFP:SF cell line was used to control for the
1344 presence of the FLAG epitope as well as any inherent chromatin biases in the Jurkat cell lines.
1345
1346 ChIP-seq data visualization
1347 MACS2 was run in the callpeak mode with the –B and –SPMR flags set to generate signal
1348 pileup tracks in bedGraph format on a per million reads basis. This allows for direct comparison
1349 between the GFP and Tat cell lines, regardless of any differences in depth of the sequencing
1350 experiment. The pileup tracks were visualized and images were generated using Integrative
1351 Genomics Viewer (IGV) (Robinson et al., 2011). These normalized pileup tracks also served as
1352 the input for many data visualizations as noted below.
1353
1354 Modified traveling ratio (mTR) algorithm
1355 Rahl et al. utilized a ratio of the levels of promoter-proximal and distal Pol II within a gene as a
1356 probe to determine rate of transcription phase change from initiation to elongation on a per-gene
55 1357 basis (Rahl et al., 2010). In our Pol II ChIP-seq dataset, we noted that Pol II density is elevated
1358 above active transcription levels not only at the promoter-proximal location, but also at discrete
1359 positions within bodies of actively transcribed genes. These positions are often accompanied
1360 by H3K27Ac chromatin marks. We found that these positions are three-dimensionally
1361 positioned near the TSS and are actually representative of a form of paused Pol II, not actively
1362 transcribing Pol II. Therefore, our modified traveling ratio (mTR) is calculated as follows:
1363 promoter-proximal Pol II ChIP reads / promoter-distal Pol II ChIP reads. Promoter-proximal Pol
1364 II ChIP reads are considered to be all reads aligned between -30 to +1000 bp of the TSS and all
1365 reads present in distal locations having a H3K27Ac peak (as determined by the method
1366 provided above) within 500 bp and having a density greater than 5X the average Pol II density
1367 between TSS+1001 bp to TTS. Distal Pol II reads are considered to be all those from
1368 TSS+1001 bp to TTS that do not fit the criteria for promoter-proximal Pol II as indicated above.
1369 The mTR is a custom software and is publicly available. Source code is included in ‘calculate-
1370 mTR.py’ source data file.
1371
1372 Cis-regulatory motif discovery
1373 To determine conserved DNA motifs associated with Tat, 200 bp fragments of DNA were
1374 selected, centered on the summits of Tat peaks found by MACS2 (Zhang et al., 2008). These
1375 fragments were submitted to the MEME suite (Bailey et al., 2009) and motifs were considered
1376 significant when having an e-value less than 0.05.
1377
1378 Generation of Marban et al. pileup track
Marban et al. (Marban et al., 2011) did not publish processed visualization data such as bigwig
or bedGraph pileup files. Therefore, to visually compare the fidelity of our Tat ChIP-seq data
with the data generated by Marban et al., we retrieved Marban’s raw ChIP-seq read data
published in GEO (GSE30738) and processed it using the pipeline described above for mapping
56 and peak calling/pileup track generation. Notably, using the strict p-value cutoff we chose for
peak calling with our dataset, MACS2 did not return any peaks when processing the Marban
data.
1379
1380 Annotation of ChIP-seq peaks to genes
1381 HOMER (Heinz et al., 2010) annotatePeaks.pl was used to relate ChIP-seq peak location
1382 information to the corresponding genes in which those peaks are found. Annotations were
1383 made based on the GRCh37/hg19 annotation reference.
1384
1385 Generation of ChIP-seq heatmaps
1386 We used HOMER (Heinz et al., 2010) annotatePeaks.pl to generate density matrices using
1387 normalized bedGraph pileup tracks of ChIP-seq marks as input. The –hist flag was set with a
1388 parameter of 50 bins and the –ghist flag was also set to generate a density matrix. Certain
1389 marks’ heatmaps were generated using a fixed length window (i.e. –size 6000) while others
1390 were generated based on the entire transcribed region of a set of genes that differ in length (–
1391 size given with a peakfile containing a bed-style list of TSS and TTS coordinates for each gene
1392 in the set).
1393
1394 Generation of ChIP-seq metagene plots
1395 HOMER (Heinz et al., 2010) annotatePeaks.pl was used to generate metagene plots based on
1396 the average profile of a series of ChIP-seq marks in normalized bedGraph pileup tracks. The –
1397 hist flag was set using 50 bins for all metagene profiles. For certain marks, we were only
1398 interested in a fixed-length window, so the –size flag was set to a fixed number of base pairs,
1399 i.e. H3K4me3 was plotted centered on TSS and extending -/+ 3kb (-size 6000). Other markers
1400 show the profile of the entire transcribed regions of a set of genes that differ in length (-size
1401 given with a peakfile containing a bed style list of TSS and TTS coordinates for each gene in the
57 1402 set). In these cases, the profile is generated based on relative positions within the genes rather
1403 than fixed windows (TSS + 2%, TSS + 4%, etc). In these profiles, we also generated a leading
1404 and trailing profile of a fixed 2 kb length.
1405
1406 Generation of box plots
1407 HOMER (Heinz et al., 2010) annotatePeaks.pl was used to generate average density figures for
1408 particular ChIP-seq marks using the respective normalized bedGraph pileup tracks as input.
1409 This is accomplished by passing the –size flag alone in addition to the required parameters. In
1410 the case of fixed windows, an integer parameter is passed along with the –size flag (-size 6000),
1411 and in the case of whole genes, the –size given option is used, indicating that the peakfile
1412 contains coordinates of the TSS and TTS for a set of genes. The resulting output contains a set
1413 of average densities for the windows specified and these data were plotted in box plot form
1414 using GraphPad Prism.
1415
1416 RNA-seq data analysis
1417 The 50 nt RNA-seq outputs reads from the sequencer Fastq files were aligned to the UCSC
1418 hg19 genome using Tophat v2.0.10 overlaid on Bowtie 1.0.0. Cufflinks v1.3.0 was used to
1419 assemble exons discovered by Tophat into competent mRNA transcripts associated with the
1420 hg19 annotation (Trapnell et al., 2013). The GFP cell line was used as a baseline transcriptome
1421 to which to compare that of the Tat cell line. The Tat transcriptome assembled by Cufflinks was
1422 compared to the GFP transcriptome using Cuffdiff v1.3.0 to generate a list of genes that were
1423 significantly differentially expressed, using a q-value cutoff of 0.05. About 30x106 50 nt
1424 sequencing reads were generated for the GFP and Tat samples. For the GFP cell line we
1425 uniquely mapped 67.82% and 69.95% of reads (values for the GFP1 and GFP2 duplicate
1426 samples) to the human genome. For the Tat cell line we uniquely mapped 72.46% and 77.35%
1427 of reads (values for the Tat1 and Tat2 duplicate samples) to the human genome. FPKM per
58 1428 genes were then produced by counting the number of unique mapped reads that fell within
1429 exonic regions for all the genes annotated in the Ensemble hg19. The total length of exonic
1430 regions was used to normalize read counts per gene.
1431
1432 Unbiased identification of TAR-like motifs in the human genome
1433 To identify TAR-like signatures we first extracted genomic sequences of the TSG and TDG
1434 including up- and down-stream regions of 10 Kb. In a first-pass filtering approach using seedtop
1435 (Shiryev et al., 2007) from the NCBI BLAST package, we searched for the following pattern:
1436 X(2)-G-A-T-X(1,2)-G-A-X(4,40)-T-C-T-C-X(2), yielding 51559 sequence signatures. After subjecting
1437 these sequences to RNA Fold from the Vienna RNA package (Lorenz et al., 2011) and
1438 calculating the partition function and base-pairing probability matrix, we used the maximum
1439 expected accuracy (MEA) structure and isolated sequences that fold in the corresponding
1440 bulged hairpin. In detail, we required the following conserved secondary structure (in dot-
1441 bracket notation, with x denoting either dot or bracket): X(2)-G-A-T-X(1,2)-G-A-X(4,40)-T-C-T-C-X(2)
1442 and X(2)-(-(-.-.(1,2)-(-(-X(4,40)-)-)-)-)-X(2), resulting in 745 signatures covering 225 genes (Table 3).
1443
1444 Response network
1445 A response network was constructed based on published protein interaction and gene-
1446 regulatory data using RNA-seq FPKM values as well as the 438 TSG/TDG targets as seed
1447 nodes as previously described (Mata et al., 2011). Quantitative expression data was
1448 superimposed with network information. The response network was inferred from the large
1449 network by calculating k-shortest weighed paths between seed-node pairs. A detailed
1450 description of the algorithm can be found here (Cabusora et al., 2005).
1451
1452 Gene set enrichment analysis
1453 Simple, non-weighted gene set enrichment analysis has been performed by using a variety of
59 1454 different gene set databases including GO and GSEA/MSigDB (Ashburner et al., 2000;
1455 Subramanian et al., 2005). As standard procedure we employed the Fisher Exact Test and
1456 used Bonferroni correction for multiple testing. We also performed quantitative enrichment
1457 analysis using the GSEA package together with MSigDB (Subramanian et al., 2005).
60 1458 Acknowledgements
1459 We are grateful to Vanessa Schmid and the McDermott Center at UT Southwestern for the
1460 library preparation and high-throughput sequencing, and the Texas Advanced Computing
1461 Center (TACC) for data storage and use of computing resources. We thank Amparo Estrada,
1462 Kelly Vollbracht, Jennifer McCann and Blake Richardson for preliminary work related to qRT-
1463 PCR assays and cloning; Leonardo Estrada for advice in flow cytometry; Lee Kraus and Ralf
1464 Kittler for discussions about ChIP-seq and dataset analysis; and Ana Beatriz Silva and Vicente
1465 Planelles for advice on protocols to obtain central memory T cells. Research reported in this
1466 publication was supported by the National Institute of Allergy and Infectious Diseases of the NIH
1467 under Award Numbers R00AI083087, R56AI106514, and R01AI114362, and The Welch
1468 Foundation grant I-1782 to I.D.; and NIH training grant T32 2T32AI007520-16 to R.P.M. J.E.R.
1469 was funded through the Green Fellows program at The University of Texas at Dallas and the
1470 SURF Program at UT Southwestern.
1471
1472 Additional information
1473 Competing interests
1474 The authors declare no conflicts of interest.
1475
1476 Author contributions
1477 YTK, JER, ID, Conception and design, Acquisition of data, Analysis and interpretation of data;
1478 JER, ID, Writing and drafting the article; CVF, RPM, Acquisition of data.
1479
1480 Data availability
1481 The following data has been deposited in the NCBI GEO (under accession no. GSE65689):
1482 ChIP-seq raw sequence tags from Jurkat CD4+ T cells and gene expression (RNA-seq) data.
61 1483 References
1484 Adelman, K., and Lis, J.T. (2012). Promoter-proximal pausing of RNA polymerase II: emerging 1485 roles in metazoans. Nature reviews Genetics 13, 720-731. 1486 1487 Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., 1488 Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000). Gene ontology: tool for the unification of 1489 biology. The Gene Ontology Consortium. Nat Genet 25, 25-29. 1490 1491 Bailey, T.L. (2011). DREME: motif discovery in transcription factor ChIP-seq data. 1492 Bioinformatics 27, 1653-1659. 1493 1494 Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., and 1495 Noble, W.S. (2009). MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 1496 37, W202-208. 1497 1498 Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G., Chepelev, I., 1499 and Zhao, K. (2007). High-resolution profiling of histone methylations in the human genome. 1500 Cell 129, 823-837. 1501 1502 Bernstein, B.E., Humphrey, E.L., Erlich, R.L., Schneider, R., Bouman, P., Liu, J.S., Kouzarides, 1503 T., and Schreiber, S.L. (2002). Methylation of histone H3 Lys 4 in coding regions of active 1504 genes. Proc Natl Acad Sci U S A 99, 8695-8700. 1505 1506 Bosque, A., and Planelles, V. (2011). Studies of HIV-1 latency in an ex vivo model that uses 1507 primary central memory T cells. Methods 53, 54-61. 1508 1509 Bulger, M., and Groudine, M. (2011). Functional and mechanistic diversity of distal transcription 1510 enhancers. Cell 144, 327-339. 1511 1512 Cabusora, L., Sutton, E., Fulmer, A., and Forst, C.V. (2005). Differential network expression 1513 during drug and stress response. Bioinformatics 21, 2898-2905. 1514 1515 Caldwell, R.L., Egan, B.S., and Shepherd, V.L. (2000). HIV-1 Tat represses transcription from 1516 the mannose receptor promoter. J Immunol 165, 7035-7041. 1517 1518 Consortium, E.P. (2011). A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS 1519 Biol 9, e1001046. 1520 1521 Core, L.J., and Lis, J.T. (2008). Transcription regulation through promoter-proximal pausing of 1522 RNA polymerase II. Science 319, 1791-1792. 1523 1524 Core, L.J., Waterfall, J.J., and Lis, J.T. (2008). Nascent RNA sequencing reveals widespread 1525 pausing and divergent initiation at human promoters. Science 322, 1845-1848. 1526 1527 Creyghton, M.P., Cheng, A.W., Welstead, G.G., Kooistra, T., Carey, B.W., Steine, E.J., Hanna, 1528 J., Lodato, M.A., Frampton, G.M., Sharp, P.A., et al. (2010). Histone H3K27ac separates active 1529 from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A 107, 21931- 1530 21936. 1531 1532 Croft, D., Mundo, A.F., Haw, R., Milacic, M., Weiser, J., Wu, G., Caudy, M., Garapati, P.,
62 1533 Gillespie, M., Kamdar, M.R., et al. (2014). The Reactome pathway knowledgebase. Nucleic 1534 Acids Res 42, D472-477. 1535 1536 D'Orso, I., and Frankel, A.D. (2010). RNA-mediated displacement of an inhibitory snRNP 1537 complex activates transcription elongation. Nat Struct Mol Biol 17, 815-821. 1538 1539 D'Orso, I., Jang, G.M., Pastuszak, A.W., Faust, T.B., Quezada, E., Booth, D.S., and Frankel, 1540 A.D. (2012). Transition step during assembly of HIV Tat:P-TEFb transcription complexes and 1541 transfer to TAR RNA. Mol Cell Biol 32, 4780-4793. 1542 1543 Dignam, J.D., Lebovitz, R.M., and Roeder, R.G. (1983). Accurate transcription initiation by RNA 1544 polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res 11, 1475- 1545 1489. 1546 1547 Egelhofer, T.A., Minoda, A., Klugman, S., Lee, K., Kolasinska-Zwierz, P., Alekseyenko, A.A., 1548 Cheung, M.S., Day, D.S., Gadel, S., Gorchakov, A.A., et al. (2011). An assessment of histone- 1549 modification antibody quality. Nat Struct Mol Biol 18, 91-93. 1550 1551 Feinberg, M.B., Baltimore, D., and Frankel, A.D. (1991). The role of Tat in the human 1552 immunodeficiency virus life cycle indicates a primary effect on transcriptional elongation. Proc 1553 Natl Acad Sci U S A 88, 4045-4049. 1554 1555 Ferrari, R., Pellegrini, M., Horwitz, G.A., Xie, W., Berk, A.J., and Kurdistani, S.K. (2008). 1556 Epigenetic reprogramming by adenovirus e1a. Science 321, 1086-1088. 1557 1558 Frankel, A.D., Bredt, D.S., and Pabo, C.O. (1988). Tat protein from human immunodeficiency 1559 virus forms a metal-linked dimer. Science 240, 70-73. 1560 1561 Frankel, A.D., and Young, J.A. (1998). HIV-1: fifteen proteins and an RNA. Annual review of 1562 biochemistry 67, 1-25. 1563 1564 Fuda, N.J., Ardehali, M.B., and Lis, J.T. (2009). Defining mechanisms that regulate RNA 1565 polymerase II transcription in vivo. Nature 461, 186-192. 1566 1567 Ghavi-Helm, Y., Klein, F.A., Pakozdi, T., Ciglar, L., Noordermeer, D., Huber, W., and Furlong, 1568 E.E. (2014). Enhancer loops appear stable during development and are associated with paused 1569 polymerase. Nature 512, 96-100. 1570 1571 Goel, R., Muthusamy, B., Pandey, A., and Prasad, T.S. (2011). Human protein reference 1572 database and human proteinpedia as discovery resources for molecular biotechnology. Mol 1573 Biotechnol 48, 87-95. 1574 1575 Gomes, N.P., Bjerke, G., Llorente, B., Szostek, S.A., Emerson, B.M., and Espinosa, J.M. 1576 (2006). Gene-specific requirement for P-TEFb activity and RNA polymerase II phosphorylation 1577 within the p53 transcriptional program. Genes & development 20, 601-612. 1578 1579 Grunberg, S., and Hahn, S. (2013). Structural insights into transcription initiation by RNA 1580 polymerase II. Trends in biochemical sciences 38, 603-611. 1581 1582 Guenther, M.G., Levine, S.S., Boyer, L.A., Jaenisch, R., and Young, R.A. (2007). A chromatin 1583 landmark and transcription initiation at most promoters in human cells. Cell 130, 77-88.
63 1584 1585 Guttman, M., and Rinn, J.L. (2012). Modular regulatory principles of large non-coding RNAs. 1586 Nature 482, 339-346. 1587 1588 Hagege, H., Klous, P., Braem, C., Splinter, E., Dekker, J., Cathala, G., de Laat, W., and Forne, 1589 T. (2007). Quantitative analysis of chromosome conformation capture assays (3C-qPCR). Nat 1590 Protoc 2, 1722-1733. 1591 1592 Hah, N., Murakami, S., Nagari, A., Danko, C.G., and Kraus, W.L. (2013). Enhancer transcripts 1593 mark active estrogen receptor binding sites. Genome Res 23, 1210-1223. 1594 1595 Hahn, S. (2004). Structure and mechanism of the RNA polymerase II transcription machinery. 1596 Nat Struct Mol Biol 11, 394-403. 1597 1598 Hazenberg, M.D., Stuart, J.W., Otto, S.A., Borleffs, J.C., Boucher, C.A., de Boer, R.J., 1599 Miedema, F., and Hamann, D. (2000). T-cell division in human immunodeficiency virus (HIV)-1 1600 infection is mainly due to immune activation: a longitudinal analysis in patients before and 1601 during highly active antiretroviral therapy (HAART). Blood 95, 249-255. 1602 1603 He, N., Liu, M., Hsu, J., Xue, Y., Chou, S., Burlingame, A., Krogan, N.J., Alber, T., and Zhou, Q. 1604 (2010). HIV-1 Tat and host AFF4 recruit two transcription elongation factors into a bifunctional 1605 complex for coordinated activation of HIV-1 transcription. Mol Cell 38, 428-438. 1606 1607 He, Q., Johnston, J., and Zeitlinger, J. (2015). ChIP-nexus enables improved detection of in vivo 1608 transcription factor binding footprints. Nat Biotechnol 33, 395-401. 1609 1610 Heintzman, N.D., and Ren, B. (2009). Finding distal regulatory elements in the human genome. 1611 Curr Opin Genet Dev 19, 541-549. 1612 1613 Heintzman, N.D., Stuart, R.K., Hon, G., Fu, Y., Ching, C.W., Hawkins, R.D., Barrera, L.O., Van 1614 Calcar, S., Qu, C., Ching, K.A., et al. (2007). Distinct and predictive chromatin signatures of 1615 transcriptional promoters and enhancers in the human genome. Nat Genet 39, 311-318. 1616 1617 Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., 1618 Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription 1619 factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 1620 576-589. 1621 1622 Hellerstein, M.K., Hoh, R.A., Hanley, M.B., Cesar, D., Lee, D., Neese, R.A., and McCune, J.M. 1623 (2003). Subpopulations of long-lived and short-lived T cells in advanced HIV-1 infection. J Clin 1624 Invest 112, 956-966. 1625 1626 Hollenhorst, P.C., Chandler, K.J., Poulsen, R.L., Johnson, W.E., Speck, N.A., and Graves, B.J. 1627 (2009). DNA specificity determinants associate with distinct transcription factor functions. PLoS 1628 Genet 5, e1000778. 1629 1630 Hollenhorst, P.C., McIntosh, L.P., and Graves, B.J. (2011). Genomic and biochemical insights 1631 into the specificity of ETS transcription factors. Annual review of biochemistry 80, 437-471. 1632 1633 Horwitz, G.A., Zhang, K., McBrian, M.A., Grunstein, M., Kurdistani, S.K., and Berk, A.J. (2008). 1634 Adenovirus small e1a alters global patterns of histone modification. Science 321, 1084-1085.
64 1635 1636 Hottiger, M.O., and Nabel, G.J. (1998). Interaction of human immunodeficiency virus type 1 Tat 1637 with the transcriptional coactivators p300 and CREB binding protein. J Virol 72, 8252-8256. 1638 1639 Huang, L., Bosch, I., Hofmann, W., Sodroski, J., and Pardee, A.B. (1998). Tat protein induces 1640 human immunodeficiency virus type 1 (HIV-1) coreceptors and promotes infection with both 1641 macrophage-tropic and T-lymphotropic HIV-1 strains. J Virol 72, 8952-8960. 1642 1643 Izmailova, E., Bertley, F.M., Huang, Q., Makori, N., Miller, C.J., Young, R.A., and Aldovini, A. 1644 (2003). HIV-1 Tat reprograms immature dendritic cells to express chemoattractants for activated 1645 T cells and macrophages. Nat Med 9, 191-197. 1646 1647 Jager, S., Cimermancic, P., Gulbahce, N., Johnson, J.R., McGovern, K.E., Clarke, S.C., Shales, 1648 M., Mercenne, G., Pache, L., Li, K., et al. (2012). Global landscape of HIV-human protein 1649 complexes. Nature 481, 365-370. 1650 1651 Jin, F., Li, Y., Dixon, J.R., Selvaraj, S., Ye, Z., Lee, A.Y., Yen, C.A., Schmitt, A.D., Espinoza, 1652 C.A., and Ren, B. (2013). A high-resolution map of the three-dimensional chromatin interactome 1653 in human cells. Nature 503, 290-294. 1654 1655 Kao, S.Y., Calman, A.F., Luciw, P.A., and Peterlin, B.M. (1987). Anti-termination of transcription 1656 within the long terminal repeat of HIV-1 by tat gene product. Nature 330, 489-493. 1657 1658 Kerrien, S., Aranda, B., Breuza, L., Bridge, A., Broackes-Carter, F., Chen, C., Duesbury, M., 1659 Dumousseau, M., Feuermann, M., Hinz, U., et al. (2012). The IntAct molecular interaction 1660 database in 2012. Nucleic Acids Res 40, D841-846. 1661 1662 Kim, N., Kukkonen, S., Gupta, S., and Aldovini, A. (2010a). Association of Tat with promoters of 1663 PTEN and PP2A subunits is key to transcriptional activation of apoptotic pathways in HIV- 1664 infected CD4+ T cells. PLoS Pathog 6, e1001103. 1665 1666 Kim, N., Kukkonen, S., Martinez-Viedma Mdel, P., Gupta, S., and Aldovini, A. (2013). Tat 1667 engagement of p38 MAP kinase and IRF7 pathways leads to activation of interferon-stimulated 1668 genes in antigen-presenting cells. Blood 121, 4090-4100. 1669 1670 Kim, T.K., Hemberg, M., Gray, J.M., Costa, A.M., Bear, D.M., Wu, J., Harmin, D.A., Laptewicz, 1671 M., Barbara-Haley, K., Kuersten, S., et al. (2010b). Widespread transcription at neuronal 1672 activity-regulated enhancers. Nature 465, 182-187. 1673 1674 Kim, T.K., and Shiekhattar, R. (2015). Architectural and Functional Commonalities between 1675 Enhancers and Promoters. Cell 162, 948-959. 1676 1677 Kolasinska-Zwierz, P., Down, T., Latorre, I., Liu, T., Liu, X.S., and Ahringer, J. (2009). 1678 Differential chromatin marking of introns and expressed exons by H3K36me3. Nat Genet 41, 1679 376-381. 1680 1681 Konig, R., Stertz, S., Zhou, Y., Inoue, A., Hoffmann, H.H., Bhattacharyya, S., Alamares, J.G., 1682 Tscherne, D.M., Ortigoza, M.B., Liang, Y., et al. (2010). Human host factors required for 1683 influenza virus replication. Nature 463, 813-817. 1684 1685 Kouzarides, T. (2007). Chromatin modifications and their function. Cell 128, 693-705.
65 1686 1687 Kowalczyk, M.S., Hughes, J.R., Garrick, D., Lynch, M.D., Sharpe, J.A., Sloane-Stanley, J.A., 1688 McGowan, S.J., De Gobbi, M., Hosseini, M., Vernimmen, D., et al. (2012). Intragenic enhancers 1689 act as alternative promoters. Mol Cell 45, 447-458. 1690 1691 Krogan, N.J., Dover, J., Wood, A., Schneider, J., Heidt, J., Boateng, M.A., Dean, K., Ryan, 1692 O.W., Golshani, A., Johnston, M., et al. (2003a). The Paf1 complex is required for histone H3 1693 methylation by COMPASS and Dot1p: linking transcriptional elongation to histone methylation. 1694 Mol Cell 11, 721-729. 1695 1696 Krogan, N.J., Kim, M., Tong, A., Golshani, A., Cagney, G., Canadien, V., Richards, D.P., 1697 Beattie, B.K., Emili, A., Boone, C., et al. (2003b). Methylation of histone H3 by Set2 in 1698 Saccharomyces cerevisiae is linked to transcriptional elongation by RNA polymerase II. Mol Cell 1699 Biol 23, 4207-4218. 1700 1701 Kukkonen, S., Martinez-Viedma Mdel, P., Kim, N., Manrique, M., and Aldovini, A. (2014). HIV-1 1702 Tat second exon limits the extent of Tat-mediated modulation of interferon-stimulated genes in 1703 antigen presenting cells. Retrovirology 11, 30. 1704 1705 Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient 1706 alignment of short DNA sequences to the human genome. Genome Biol 10, R25. 1707 1708 Laspia, M.F., Rice, A.P., and Mathews, M.B. (1989). HIV-1 Tat protein increases transcriptional 1709 initiation and stabilizes elongation. Cell 59, 283-292. 1710 1711 Lau, A., Swinbank, K.M., Ahmed, P.S., Taylor, D.L., Jackson, S.P., Smith, G.C., and O'Connor, 1712 M.J. (2005). Suppression of HIV-1 infection by a small molecule inhibitor of the ATM kinase. Nat 1713 Cell Biol 7, 493-500. 1714 1715 Li, B., Carey, M., and Workman, J.L. (2007). The role of chromatin during transcription. Cell 1716 128, 707-719. 1717 1718 Li, C.J., Ueda, Y., Shi, B., Borodyansky, L., Huang, L., Li, Y.Z., and Pardee, A.B. (1997). Tat 1719 protein induces self-perpetuating permissivity for productive HIV-1 infection. Proc Natl Acad Sci 1720 U S A 94, 8116-8120. 1721 1722 Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., 1723 Durbin, R., and Genome Project Data Processing, S. (2009). The Sequence Alignment/Map 1724 format and SAMtools. Bioinformatics 25, 2078-2079. 1725 1726 Lopez-Huertas, M.R., Callejas, S., Abia, D., Mateos, E., Dopazo, A., Alcami, J., and Coiras, M. 1727 (2010). Modifications in host cell cytoskeleton structure and function mediated by intracellular 1728 HIV-1 Tat protein are greatly dependent on the second coding exon. Nucleic Acids Res 38, 1729 3287-3307. 1730 1731 Lorenz, R., Bernhart, S.H., Honer Zu Siederdissen, C., Tafer, H., Flamm, C., Stadler, P.F., and 1732 Hofacker, I.L. (2011). ViennaRNA Package 2.0. Algorithms Mol Biol 6, 26. 1733 1734 Luo, Z., Lin, C., and Shilatifard, A. (2012). The super elongation complex (SEC) family in 1735 transcriptional control. Nat Rev Mol Cell Biol 13, 543-547. 1736
66 1737 Mancebo, H.S., Lee, G., Flygare, J., Tomassini, J., Luu, P., Zhu, Y., Peng, J., Blau, C., Hazuda, 1738 D., Price, D., et al. (1997). P-TEFb kinase is required for HIV Tat transcriptional activation in 1739 vivo and in vitro. Genes & development 11, 2633-2644. 1740 1741 Marazzi, I., Ho, J.S., Kim, J., Manicassamy, B., Dewell, S., Albrecht, R.A., Seibert, C.W., 1742 Schaefer, U., Jeffrey, K.L., Prinjha, R.K., et al. (2012). Suppression of the antiviral response by 1743 an influenza histone mimic. Nature 483, 428-433. 1744 1745 Marban, C., Su, T., Ferrari, R., Li, B., Vatakis, D., Pellegrini, M., Zack, J.A., Rohr, O., and 1746 Kurdistani, S.K. (2011). Genome-wide binding map of the HIV-1 Tat protein to the human 1747 genome. PLoS One 6, e26894. 1748 1749 Martin, C., and Zhang, Y. (2005). The diverse functions of histone lysine methylation. Nat Rev 1750 Mol Cell Biol 6, 838-849. 1751 1752 Mata, M.A., Satterly, N., Versteeg, G.A., Frantz, D., Wei, S., Williams, N., Schmolke, M., Pena- 1753 Llopis, S., Brugarolas, J., Forst, C.V., et al. (2011). Chemical inhibition of RNA viruses reveals 1754 REDD1 as a host defense factor. Nat Chem Biol 7, 712-719. 1755 1756 Mayer, A., di Iulio, J., Maleri, S., Eser, U., Vierstra, J., Reynolds, A., Sandstrom, R., 1757 Stamatoyannopoulos, J.A., and Churchman, L.S. (2015). Native elongating transcript 1758 sequencing reveals human transcriptional activity at nucleotide resolution. Cell 161, 541-554. 1759 Meijsing, S.H., Pufall, M.A., So, A.Y., Bates, D.L., Chen, L., and Yamamoto, K.R. (2009). DNA 1760 binding site sequence directs glucocorticoid receptor structure and activity. Science 324, 407- 1761 410. 1762 Messi, M., Giacchetto, I., Nagata, K., Lanzavecchia, A., Natoli, G., and Sallusto, F. (2003). 1763 Memory and flexibility of cytokine gene expression as separable properties of human T(H)1 and 1764 T(H)2 lymphocytes. Nat Immunol 4, 78-86. 1765 1766 Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., 1767 Brockman, W., Kim, T.K., Koche, R.P., et al. (2007). Genome-wide maps of chromatin state in 1768 pluripotent and lineage-committed cells. Nature 448, 553-560. 1769 1770 Min, I.M., Waterfall, J.J., Core, L.J., Munroe, R.J., Schimenti, J., and Lis, J.T. (2011). Regulating 1771 RNA polymerase pausing and transcription elongation in embryonic stem cells. Genes & 1772 development 25, 742-754. 1773 1774 Nguyen, A.T., and Zhang, Y. (2011). The diverse functions of Dot1 and H3K79 methylation. 1775 Genes & development 25, 1345-1358. 1776 1777 Onder, T.T., Kara, N., Cherry, A., Sinha, A.U., Zhu, N., Bernt, K.M., Cahan, P., Marcarci, B.O., 1778 Unternaehrer, J., Gupta, P.B., et al. (2012). Chromatin-modifying enzymes as modulators of 1779 reprogramming. Nature 483, 598-602. 1780 1781 Ott, M., Geyer, M., and Zhou, Q. (2011). The control of HIV transcription: keeping RNA 1782 polymerase II on track. Cell Host Microbe 10, 426-435. 1783 1784 Pearson, R., Kim, Y.K., Hokello, J., Lassen, K., Friedman, J., Tyagi, M., and Karn, J. (2008). 1785 Epigenetic silencing of human immunodeficiency virus (HIV) transcription by formation of 1786 restrictive chromatin structures at the viral long terminal repeat drives the progressive entry of 1787 HIV into latency. J Virol 82, 12291-12303.
67 1788 1789 Peterlin, B.M., and Price, D.H. (2006). Controlling the elongation phase of transcription with P- 1790 TEFb. Mol Cell 23, 297-305. 1791 1792 Pokholok, D.K., Harbison, C.T., Levine, S., Cole, M., Hannett, N.M., Lee, T.I., Bell, G.W., 1793 Walker, K., Rolfe, P.A., Herbolsheimer, E., et al. (2005). Genome-wide map of nucleosome 1794 acetylation and methylation in yeast. Cell 122, 517-527. 1795 1796 Raha, T., Cheng, S.W., and Green, M.R. (2005). HIV-1 Tat stimulates transcription complex 1797 assembly through recruitment of TBP in the absence of TAFs. PLoS Biol 3, e44. 1798 1799 Rahl, P.B., Lin, C.Y., Seila, A.C., Flynn, R.A., McCuine, S., Burge, C.B., Sharp, P.A., and 1800 Young, R.A. (2010). c-Myc regulates transcriptional pause release. Cell 141, 432-445. 1801 1802 Reppas, N.B., Wade, J.T., Church, G.M., and Struhl, K. (2006). The transition between 1803 transcriptional initiation and elongation in E. coli is highly variable and often rate limiting. Mol 1804 Cell 24, 747-757. 1805 1806 Rhee, H.S., and Pugh, B.F. (2011). Comprehensive genome-wide protein-DNA interactions 1807 detected at single-nucleotide resolution. Cell 147, 1408-1419. 1808 1809 Rice, A.P., and Mathews, M.B. (1988). Transcriptional but not translational regulation of HIV-1 1810 by the tat gene product. Nature 332, 551-553. 1811 1812 Robinson, J.T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and 1813 Mesirov, J.P. (2011). Integrative genomics viewer. Nat Biotechnol 29, 24-26. 1814 1815 Sancho, D., Gomez, M., and Sanchez-Madrid, F. (2005). CD69 is an immunoregulatory 1816 molecule induced following activation. Trends Immunol 26, 136-140. 1817 1818 Santos-Rosa, H., Schneider, R., Bannister, A.J., Sherriff, J., Bernstein, B.E., Emre, N.C., 1819 Schreiber, S.L., Mellor, J., and Kouzarides, T. (2002). Active genes are tri-methylated at K4 of 1820 histone H3. Nature 419, 407-411. 1821 1822 Seila, A.C., Calabrese, J.M., Levine, S.S., Yeo, G.W., Rahl, P.B., Flynn, R.A., Young, R.A., and 1823 Sharp, P.A. (2008). Divergent transcription from active promoters. Science 322, 1849-1851. 1824 1825 Shilatifard, A. (2012). The COMPASS family of histone H3K4 methylases: mechanisms of 1826 regulation in development and disease pathogenesis. Annual review of biochemistry 81, 65-95. 1827 1828 Shiryev, S.A., Papadopoulos, J.S., Schaffer, A.A., and Agarwala, R. (2007). Improved BLAST 1829 searches using longer words for protein seeding. Bioinformatics 23, 2949-2951. 1830 1831 Sims, R.J., 3rd, Belotserkovskaya, R., and Reinberg, D. (2004). Elongation by RNA polymerase 1832 II: the short and long of it. Genes & development 18, 2437-2468. 1833 1834 Smallwood, A., and Ren, B. (2013). Genome organization and long-range regulation of gene 1835 expression by enhancers. Curr Opin Cell Biol 25, 387-394. 1836 1837 Smith, E., and Shilatifard, A. (2013). Transcriptional elongation checkpoint control in 1838 development and disease. Genes & development 27, 1079-1088.
68 1839 1840 Sobhian, B., Laguette, N., Yatim, A., Nakamura, M., Levy, Y., Kiernan, R., and Benkirane, M. 1841 (2010). HIV-1 Tat assembles a multifunctional transcription elongation complex and stably 1842 associates with the 7SK snRNP. Mol Cell 38, 439-451. 1843 1844 Stasevich, T.J., Hayashi-Takanaka, Y., Sato, Y., Maehara, K., Ohkawa, Y., Sakata-Sogawa, K., 1845 Tokunaga, M., Nagase, T., Nozaki, N., McNally, J.G., et al. (2014). Regulation of RNA 1846 polymerase II activation by histone acetylation in single living cells. Nature 516, 272-275. 1847 1848 Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., 1849 Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment 1850 analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc 1851 Natl Acad Sci U S A 102, 15545-15550. 1852 1853 Tessarz, P., and Kouzarides, T. (2014). Histone core modifications regulating nucleosome 1854 structure and dynamics. Nat Rev Mol Cell Biol 15, 703-708. 1855 1856 Trapnell, C., Hendrickson, D.G., Sauvageau, M., Goff, L., Rinn, J.L., and Pachter, L. (2013). 1857 Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol 31, 1858 46-53. 1859 1860 Wade, J.T., and Struhl, K. (2008). The transition from transcriptional initiation to elongation. Curr 1861 Opin Genet Dev 18, 130-136. 1862 1863 Weissman, J.D., Brown, J.A., Howcroft, T.K., Hwang, J., Chawla, A., Roche, P.A., Schiltz, L., 1864 Nakatani, Y., and Singer, D.S. (1998). HIV-1 tat binds TAFII250 and represses TAFII250- 1865 dependent transcription of major histocompatibility class I genes. Proc Natl Acad Sci U S A 95, 1866 11601-11606. 1867 1868 Zeitlinger, J., Stark, A., Kellis, M., Hong, J.W., Nechaev, S., Adelman, K., Levine, M., and 1869 Young, R.A. (2007). RNA polymerase stalling at developmental control genes in the Drosophila 1870 melanogaster embryo. Nat Genet 39, 1512-1516. 1871 1872 Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., 1873 Myers, R.M., Brown, M., Li, W., et al. (2008). Model-based analysis of ChIP-Seq (MACS). 1874 Genome Biol 9, R137. 1875 1876 Zhou, Q., Li, T., and Price, D.H. (2012). RNA polymerase II elongation control. Annual review of 1877 biochemistry 81, 119-143. 1878 1879 Zhou, V.W., Goren, A., and Bernstein, B.E. (2011). Charting histone modifications and the 1880 functional organization of mammalian genomes. Nature reviews Genetics 12, 7-18. 1881 1882 Zhu, Y., Pe'ery, T., Peng, J., Ramanathan, Y., Marshall, N., Marshall, T., Amendt, B., Mathews, 1883 M.B., and Price, D.H. (1997). Transcription elongation factor P-TEFb is required for HIV-1 tat 1884 transactivation in vitro. Genes & development 11, 2622-2632. 1885
69 1886 Figure legends
1887
1888 Figure 1. Genomic domains occupied and regulated by Tat in CD4+ T cells. (A) Western
1889 blot of Jurkat-GFP and -Tat cell lines treated (+) or not (–) with doxycycline (DOX) using the
1890 indicated antibodies. (B) Genome-wide distribution of Tat across the human genome. (C)
1891 Integration of the FLAG ChIP-seq and RNA-seq datasets defines a set of genes directly
1892 regulated by Tat. Direct targets, genes directly bound and regulated by Tat. TSG, Tat stimulated
1893 genes; TDG, Tat downregulated genes. (D) Validation of the RNA-seq dataset using qRT-PCR
1894 on the indicated TSG, TDG or non-target genes as negative controls (mean ± SEM; n = 3). (E)
1895 Individual tracks showing FLAG ChIP-seq and the corresponding RNA-seq dataset in the GFP
1896 and Tat cell lines. TSS, transcription start site. (F) Functional annotation of biological processes
1897 enriched at TSG and TDG. This figure is associated with Figure1–figure supplements 1 to 10.
1898
1899 Figure 2. Global analysis of chromatin signatures reveals that Tat activates the
1900 transcription initiation and elongation steps. (A) Dot plots of H3K4me3 log2 fold change in
1901 the region encompassing -3/+3 Kb from the TSS of all TSG. Genes are divided into two
1902 clusters: class I and class II based on increased (>1.5-fold change) or no change/decreased
1903 H3K4me3 levels in the presence of Tat. Selected TSG examples are indicated in red. (B)
1904 Metagene plots centered on TSS showing H3K4me3 occupancy profiles at both class I (n = 17)
1905 and class II (n = 43) TSG in the presence of Tat or GFP. (C) Dot plots of H3K79me3 log2 fold
1906 change in the region from TSS to TTS (see methods). (D) Metagene analysis showing average
1907 H3K79me3 ChIP-seq signals at both class I and class II TSG in the presence of Tat or GFP.
1908 Units are mean tags per million ChIP-Seq reads per bin across the transcribed region of each
1909 gene with 2 kb upstream and downstream flanking regions. (E) Dot plots of H3K36me3 log2 fold
1910 change in the region from TSS to TTS. (F) Metagene plots showing average H3K36me3 ChIP-
1911 seq signals at both class I and class II TSG in the presence of Tat or GFP. (G) Genome browser
70 1912 views showing ChIP-seq signal at a class I TSG (NETO1) in the GFP and Tat cell lines. (H)
1913 Genome browser views showing ChIP-seq signal at a class II TSG (VAV3) in the GFP and Tat
1914 cell lines. This figure is associated with Figure 2–figure supplements 1 to 5.
1915
1916 Figure 3. Tat promotes Pol II recruitment and pause release at two distinct TSG classes.
1917 (A) Tat binding promotes Pol II recruitment at class I but not class II TSG. Box plots of
1918 normalized Pol II tag density at the Tat peak of class I (n = 17) or II (n = 43) TSG in the GFP
1919 and Tat cell lines. (B) Tat binding at class I TSG induces Pol II occupancy at promoters. Box
1920 plots of normalized Pol II tag density at the promoter of class I or II TSG in the GFP and Tat cell
1921 lines. (C) Pol II normalized tag density relative to the Tat peak at class I TSG. (D) Pol II
1922 normalized tag density relative to the Tat peak at class II TSG. (E) Pol II distribution at class I
1923 TSG (Metagene plots) in the Tat and GFP cell lines. (F) Genome browser views of ChIP-seq
1924 data in the Tat and GFP cell lines at a class I TSG (NETO1). The arrow indicates the position of
1925 the FLAG peak called in the Tat cell lines by MACS. (G) Pol II distribution at class II TSG
1926 (Metagene plots) in the Tat and GFP cell lines. (H) Genome browser views of ChIP-seq data in
1927 the Tat and GFP cell lines at a class II TSG (VAV3). This figure is associated with Figure 3–
1928 figure supplements 1 and 2.
1929
1930 Figure 4. Tat induces transcription initiation from distal sites by inducing gene looping.
1931 (A) Heatmap representation of Tat distribution at TSG promoter or intragenic sites. (B) Heatmap
1932 representation of H3K4me3 and H3K27Ac tag density centered on the Tat peak at both TSG
1933 promoter and intragenic sites in the GFP and Tat cell lines. Promoter sites are marked with high
1934 H3K4me3 and low H3K27Ac levels, while intragenic sites are marked by low H3K4me3 and
1935 high H3K27Ac levels. (C) Heatmap representation of H3K27Ac at class I and class II TSG
1936 centered on the Tat peak in the GFP and Tat cell lines. Genes are ranked based on H3K27Ac
1937 density. (D) Tat binding sharply increases H3K27Ac levels at class I TSG. Box plots showing
71 1938 normalized H3K27Ac density at class I and II TSG in the GFP and Tat cell lines. (E) Metagene
1939 analysis of H3K27Ac levels surrounding Tat peaks at class I TSG in the GFP and Tat cell lines.
1940 (F) Metagene analysis of H3K27Ac levels surrounding Tat peaks at class II TSG in the GFP and
1941 Tat cell lines. (G) ChIP of H3K27Ac and p300 recruitment at an intragenic Tat site at the
1942 PPM1H gene. (H) ChIP of H3K27Ac and p300 recruitment at an intragenic Tat site at the CD82
1943 gene. (I) Top, genome browser views of ChIP-seq at the PPM1H locus in the GFP and Tat cell
1944 lines. The arrows indicate the position of two intragenic Tat binding sites. Bottom, 3C assay
1945 showing the relative crosslinking efficiency at the PPM1H locus in the GFP and Tat cell lines.
1946 The position of the primers used in qPCR assays, restriction sites and fragment generated after
1947 digestion are indicated. This figure is associated with Figure 4–figure supplements 1 to 4.
1948
1949 Figure 5. A modified traveling ratio algorithm reveals Tat roles in transcription
1950 elongation. (A) Traveling ratio (TR) algorithm used to calculate levels of promoter-proximal
1951 paused and elongation Pol II as previously described (Rahl et al., 2010; Zeitlinger et al., 2007).
1952 (B) Genome browser views of ChIP-seq data at the CD82 locus in the GFP and Tat cell lines.
1953 (C) Modified traveling ratio (mTR) algorithm that enables the identification of intragenically
1954 paused Pol II at sites of high H3K27Ac levels. (D) mTR decreases at TSG in the Tat cell line.
1955 Box plots showing mTR calculations at TSG as indicated in panel (D). (E) Dual-axis plot of the
1956 ratio of paused Pol II (pPol II) and traveling Pol II (tPol II) at class I (n = 17) and II (n = 43) TSG
1957 in the presence and absence of Tat (Tat/GFP). (F) Model of Tat-mediated Pol II recruitment and
1958 gene looping at class I TSG. Tat binds at promoter and/or intragenic regions (in some cases
1959 inducing gene looping), and recruits Pol II to promote transcription initiation from both promoter-
1960 proximal and promoter-distal sites. (G) Model of Tat-mediated Pol II elongation at class II TSG.
1961 Class II TSG are already activated in the absence of Tat, and Tat binds to these genes to
1962 primarily stimulate transcription elongation. These genes contain promoter-bound Pol II. This
1963 figure is associated with Figure 5–figure supplement 1.
72 1964
1965 Figure 6. Global analysis of chromatin signatures reveals the basis of transcriptional
1966 repression by Tat. (A) Dot plots of H3K4me3 fold change in the region encompassing -3/+3 Kb
1967 from the TSS of all TDG. TDG are divided into two classes: class I (n = 11) and class II (n = 14)
1968 based on decreased, or unchanged levels of H3K4me3 in the presence of Tat, respectively.
1969 Selected TDG are indicated in red. (B) Metagene plots centered on TSS showing H3K4me3
1970 occupancy profiles at both class I and class II TDG in the presence of Tat or GFP. (C) Dot plots
1971 of H3K79me3 fold change from TSS to TTS. (D) Metagene plots showing average H3K79me3
1972 ChIP-seq signals at both class I and class II TDG in the presence of Tat or GFP (see methods).
1973 (E) Dot plots of H3K36me3 fold change in the region from TSS to TTS. (F) Metagene plots
1974 showing average H3K36me3 ChIP-seq signals at both class I and class II TDG in the presence
1975 of Tat or GFP. (G) Genome browser views showing ChIP-seq signal at a class I TDG (CD1E) in
1976 the GFP and Tat cell lines. (H) Genome browser views showing ChIP-seq signal at a class II
1977 TDG (EOMES) in the GFP and Tat cell lines. This figure is associated with Figure 6–figure
1978 supplement 1.
1979
1980 Figure 7. Tat blocks Pol II recruitment and pause release to repress gene transcription at
1981 different gene classes. (A) Tat binding blocks Pol II recruitment primarily at class I TDG. Box
1982 plots of normalized Pol II tag density at the Tat peak of class I or II TDG in the GFP and Tat cell
1983 lines. (B) Tat binding at class I TSG blocks Pol II density at promoters. Box plots of normalized
1984 Pol II tag density at the promoter of class I (n = 11) or class II (n = 14) TDG in the GFP and Tat
1985 cell lines. (C) Normalized Pol II tag density relative to the Tat peak at class I TDG. (D)
1986 Normalized Pol II tag density relative to the Tat peak at class II TDG. (E) Pol II distribution at
1987 class I TDG (Metagene plots) in the Tat and GFP cell lines. (F) Genome browser views of ChIP-
1988 seq data at a class I TDG (CD1E) in the Tat and GFP cell lines. (G) Pol II distribution at class II
1989 genes (Metagene plots) in the Tat and GFP. (H) Genome browser views of ChIP-seq data at a
73 1990 class II TDG (EOMES) in the Tat and GFP cell lines. This figure is associated with Figure 7–
1991 figure supplement 1.
1992
1993 Figure 8. Enrichment of master transcriptional regulators, but not TAR-like motifs on Tat
1994 sites at the direct target genes. (A) TAR sequence and scheme of the secondary structure.
1995 (B) Schematic of the sequence query used to search for TAR-like motifs within the direct Tat
1996 target genes. (C) TAR-like motifs are not found at or significantly near Tat peaks in TSG.
1997 Fraction of TSG containing TAR-like motifs at <10kb, <1kb or <0.1kb from the Tat peak. (D)
1998 TAR-like motifs are not found at or significantly near Tat peaks in TDG. Fraction of TDG
1999 containing TAR-like motifs at <10kb, <1kb or <0.1kb from the Tat peak. (E) MEME analysis of
2000 200-bp windows surrounding Tat peaks in TSG reveals high-confidence motifs related to
2001 transcription factors ETS1 and RUNX1. (F) MEME analysis of 200-bp windows surrounding Tat
2002 peaks in TDG reveal different motifs for ETS1 and RUNX1. (G) Comparison of ETS1 ChIP-seq
2003 peak locations in Jurkat, as determined by Hollenhorst et al. (Hollenhorst et al., 2009), to Tat
2004 peak locations reveal that 73% of TSG contain an ETS1 peak within 100-bp of a Tat peak. (H)
2005 Comparison of ETS1 ChIP-seq peak locations to Tat peak locations reveal that 80% of TDG
2006 contain an ETS1 peak within 100-bp of a Tat peak. This figure is associated with Figure 8–figure
2007 supplement 1.
2008
2009 Figure 9. Tat is recruited to its target genes through interaction with the master
2010 transcriptional regulator ETS1. (A) Western blots showing interactions between Tat and
2011 ETS1. CDK9 was used as a positive control in the interaction. Strep-tagged Tat and GFP were
2012 affinity purified (AP) from the Jurkat cell lines using Strep beads, and analyzed by western blot
2013 using the indicated antibodies. (B) Tat is recruited to target genes marked by ETS1. ChIP
2014 assays showing that Tat and ETS1 co-occupy the promoter-proximal but not promoter-distal
2015 region of CD69, in agreement with the location of Tat and ETS1 peaks found by ChIP-seq and
74 2016 the presence of ETS1 motifs (as revealed by enrichment analysis). (C) Tat is also recruited to
2017 target genes using ETS1-independent mechanisms. ChIP assays showing that Tat but not
2018 ETS1 occupies the promoter-proximal region of the CD1E target gene, in agreement with the
2019 Tat and ETS1 ChIP-seq dataset and motif prediction analysis. (D) Generation of Jurkat GFP
2020 and Tat cell lines expressing non-target (NT) or ETS1 shRNAs. (–) denotes the parental
2021 untransduced Jurkat cell lines. Protein lysates were analyzed by western blot using the
2022 indicated antibodies to verify for the RNAi efficiency. (E) ETS1 knockdown impairs Tat
2023 recruitment to the CD69 promoter. ChIP-qPCR assays showing the density of ETS1 or FLAG at
2024 the CD69 promoter in the Jurkat GFP or Tat cell lines transduced with the NT or ETS1 shRNAs.
2025 (F) ETS1 knockdown does not abolish Tat recruitment to the CD1E promoter. ChIP-qPCR
2026 assays showing the density of ETS1 or FLAG at the CD1E promoter in the Jurkat GFP or Tat
2027 cell lines transduced with the NT or ETS1 shRNAs. This figure is associated with Figure 9–
2028 figure supplement 1.
75 2029 Table 1. Genome-wide distribution of Tat in CD4+ T cells.
Genomic Domain Peaks Percent Genome ChIP fraction (p-value) Total 6117 100 Intergenic 2141 35 52.7% Intragenic Promoter 1469 24 1.1% 3.4x10-323 (-1kb– TSS) Exon 128 2.1 1.9% 1.4x10-3 Intron 2227 36.4 42.4% 5.1x10-4 5’ UTR 73 1.2 0.4% 5.3x10-193 3’ UTR 79 1.3 1.5% 1.4x10-2 2030
76 2031 Table 2. Genome-wide distribution and function of histone modifications.
Histone modification Location Function H3K4me3 Promoters Transcription activation H3K79me3 Gene bodies Transcription activation H3K36me3 Gene bodies Transcription activation H3K4me1 Enhancers Does not demarcate active status, but location H3K27Ac Enhancers Transcription activation H3K9me3 Ubiquitous Transcription repression 2032
77 2033 Table 3. Statistics of the TAR signature selection.
TSG TDG Total Targets 247 191 438 Genomic Sequences (with duplicates) 1089 797 1886 Isolated target sequences (with 34739 16820 51559 duplicates) With correct secondary structure 493 253 746 Corresponding genes 146 86 232 2034
78 2035 Supplementary figure legends
2036
2037 Figure 1–figure supplement 1. Tat protein expression levels in the Jurkat Tat-SF model
2038 matches the levels of Tat detected during HIV infection. Western blot (FLAG and Tat) of
2039 Jurkat Tat-SF cell line treated with (+) or without (–) doxycycline (DOX), and Jurkat cells latently
2040 infected with HIV (clone E4) induced with (+) or without (–) TNF-α for 24 hours. A β-actin
2041 western blot is shown as loading control.
2042
2043 Figure 1–figure supplement 2. Technical improvement of Tat ChIP-seq in CD4+ T cells.
2044 (A) Comparison of ChIP-seq Tat peak numbers in the Jurkat GFP and Tat cell lines using the
2045 Tat and FLAG antibodies. The number of Tat peaks in a similar Jurkat Tat cell line from the
2046 study of Marban et al. (Marban et al., 2011) is indicated. (B) Overlay of Tat peaks from this
2047 study and those identified by Marban et al. using ChIP-seq (Marban et al., 2011). The number of
2048 common Tat peaks between our ChIP-seq data and Marban’s at a distance of <0.1kb or <1kb
2049 from the Tat sites is shown. (C–E) Genome browser views of FLAG ChIP-seq tracks in the GFP
2050 and Tat cell lines along with the FLAG peaks called by MACS in the Tat cell line and the Tat
2051 track by Marban et al. (Marban et al., 2011).
2052
2053 Figure 1–figure supplement 3. Non-functional Tat mutants have compromised chromatin
2054 interaction and modulation of cellular gene expression. (A) Scheme of Tat showing the
2055 position of its activation domain (AD), RNA-binding domain (RBD) and C-terminal domain (Ct)
2056 along with the location of the mutated residues. (B) Western blots of Jurkat CD4+ T cell lines
2057 expressing GFP, wild-type Tat or non-functional mutants (C22A, K50Q, R52R53K) (D'Orso et
2058 al., 2012). Cells from panel (B) were used in ChIP assays to analyze the occupancy of GFP, Tat
2059 or the non-functional mutants at class I TSG promoters (C), class II TSG promoters (D), class I
2060 TDG promoters (E) and class II TDG promoters (F). Values represent the average of three
79 2061 independent experiments (mean ± SEM are shown; n = 3). Cells from panel (B) were used to
2062 isolate total RNA and the expression of class I TSG (G), class II TSG (H), class I TDG (I) and
2063 class II TDG (J) was measured by qRT-PCR, normalized to RPL19, and plotted as fold RNA
2064 change over the GFP line arbitrarily set at 1 (mean ± SEM are shown; n = 3).
2065
2066 Figure 1–figure supplement 4. Tat-induced transcriptome changes are also observed at
2067 the protein level. Flow cytometry analysis of Jurkat GFP and Tat cell lines induced with
2068 doxycycline for 24 hours and stained with CD4-APC and CD69-PE antibodies or unstained
2069 (negative control) to monitor levels of CD4 and CD69 proteins expressed at the cell surface.
2070 Note the increase in the CD69+ population in the presence of Tat.
2071
2072 Figure 1–figure supplement 5. Distribution of Tat occupancy at promoter and/or
2073 intragenic domains in TSG and TDG. The percentage of sequences from the total is
2074 indicated.
2075
2076 Figure 1–figure supplement 6. FLAG ChIP-qPCR analysis on the indicated genomic loci.
2077 ChIP assay to analyze the distribution of Tat or GFP at the CD69 (A), ADCYAP1 (B), CD1E (C)
2078 and RAG1 (D) locus in the GFP (green) and Tat (black) cell lines. The position of the amplicons
2079 used in ChIP-qPCR and their distance to the TSS (arrow) is shown with the schematic of each
2080 locus. The schemes are not in real scale. Values represent the average of three independent
2081 experiments (mean ± SEM are shown; n = 3).
2082
2083 Figure 1–figure supplement 7. The genes modulated by ectopic expression of Tat are
2084 also detected during a time-course HIV infection experiment. (A) Jurkat T cells were
2085 infected with HIV (NL4-3) and levels of p24/Capsid protein was quantified using ELISA at
2086 different time points post-infection (0, 3 and 7 hours; 1, 2, 4, 6, 8, 10, and 12 days). Values
80 2087 represent the average of three independent experiments (mean ± SEM are shown; n = 3). Cells
2088 from panel (A) were used to isolate total RNA and the expression of three TSG: CD69 (B),
2089 FAM46C (C), and PPM1H (D); and three TDG: CD1E (E), EOMES (F) and FBLN2 (G)
2090 normalized to RPL19 was measured by qRT-PCR and plotted as fold RNA change over the
2091 GFP cell line arbitrarily set at 1 (mean ± SEM are shown; n = 3). The points in the curve were
2092 fitted to a non-linear regression in GraphPad Prism.
2093
2094 Figure 1–figure supplement 8. HIV infection of central memory CD4+ T cells triggers
2095 deregulation of TSG and TDG detected in the genome-wide approaches. (A) Scheme of
2096 the pipeline used to generate primary central memory T cells (TCM) and infect with replication
2097 competent HIV to identify differentially expressed genes. (B) qRT-PCR analysis on the indicated
2098 class I and II TSG, TDG and non-target genes (mean ± SEM are shown; n = 3). Cells from
2099 panel (A) were used to isolate total RNA and the expression of initiating (In) and elongating (El)
2100 transcripts for class I TSG (C), class II TSG (D), class I TDG (E) and class II TDG (F) was
2101 measured by qRT-PCR, normalized to RPL19, and plotted as fold RNA change: HIV
2102 infection/mock infection (mean ± SEM are shown; n = 3).
2103
2104 Figure 1–figure supplement 9. Response network of TSG and TDG. A response network
2105 was constructed based on RNA-seq data using a p-value cut off p = 0.01. The network consists
2106 of 138 nodes and 161 edges. Multiple edges between nodes indicate multiple evidence from
2107 different protein-protein interaction datasets (Human Protein Reference Database (HPRD) (Goel
2108 et al., 2011); IntAct (Kerrien et al., 2012); NetworKin, BHMRSS, CORUM, Hynet (Konig et al.,
2109 2010) and NCBIs HIV-1 interaction databases). Green and red nodes denote TSG and TDG,
2110 respectively. Edges are directional according to the respective databases. Edges with circles as
2111 ‘arrow tips’ denote phosphorylation reactions, ‘diamond shaped tips’ refer to general activation,
2112 and ‘arrows’ describe other reactions. Non-directional edges indicate binding.
81 2113
2114 Figure 1–figure supplement 10. Enrichment of TSG and TDG in publicly available
2115 datasets of differentially expressed genes identified in HIV infection and replication
2116 experiments. (A) The enrichment analysis with 48 publicly available differentially expressed
2117 gene datasets identified in HIV and replication experiments and 14 HIV relevant pathways from
2118 MSigDB are shown. (B) The functional annotation by the MSigDB canonical pathways (set
2119 c2.cp) is depicted. 12 significantly TSG enriched gene sets and 43 significantly TDG enriched
2120 gene-sets have been identified. P-values are Bonferroni adjusted for multiple testing. The
2121 vertical line indicates FDR = 0.05.
2122
2123 Figure 2–figure supplement 1. Tat specifies TSS selection and synthesis of alternate
2124 isoforms. (A) Genome browser views of FLAG, H3K4me3 and Pol II ChIP-seq tracks in the
2125 GFP and Tat cell lines along with the Refseq track for the ADCYAP1 locus. The position of the
2126 canonical and Tat-induced ‘novel’ TSS is indicated with arrows. (B) Genome browser views of
2127 FLAG, H3K4me3 and Pol II ChIP-seq tracks in the GFP and Tat cell lines along with the Refseq
2128 track for the ARHGEF7 locus. The position of the canonical and Tat-induced ‘novel’ TSS is
2129 indicated with arrows. (C) Normalized H3K4me3 tag density (-/+ 3 Kb respective to the TSS) for
2130 class I TSG in the Tat (Black) and GFP (Green) cell lines. (D) Genome browser views of
2131 H3K4me3 ChIP-seq tracks in the GFP and Tat cell lines for four class I TSG from panel (C).
2132
2133 Figure 2–figure supplement 2. Correlation between gene expression levels and H3K4me3
2134 density surrounding the TSS at class I and class II TSG. (A) Correlation plot between
2135 normalized H3K4me3 tag density (-/+ 3 kb respective to the TSS) and total gene expression
2136 levels (based on RNA-seq FPKM) at class I TSG in the presence and absence of Tat (Tat/GFP).
2137 (B) Correlation plot between normalized H3K4me3 tag density (-/+ 3 kb respective to the TSS)
2138 and total gene expression levels (based on RNA-seq) at class II TSG in the presence and
82 2139 absence of Tat (Tat/GFP).
2140
2141 Figure 2–figure supplement 3. Evidence that Tat increases Pol II and P-TEFb recruitment,
2142 and chromatin marks coinciding with transcription initiation and elongation at class I
2143 TSG. (A) Jurkat-GFP and -Tat cell lines were used in ChIP assays to analyze the occupancy of
2144 GFP and Tat (FLAG), H3K4me3, H3K79me3, H3K36me3, Pol II (total), Pol II (Ser2P-CTD form)
2145 and P-TEFb (CDK9) at the CD69 locus with the three indicated amplicons. The numbers
2146 indicate the position of the amplicons respective to the TSS. (B) Jurkat-GFP and -Tat cell lines
2147 were used in ChIP assays to analyze the occupancy of GFP and Tat (FLAG), H3K4me3,
2148 H3K79me3, H3K36me3, Pol II (total), Pol II (Ser2P-CTD form) and P-TEFb (CDK9) at the
2149 ADCYAP1 locus with the three indicated amplicons. The numbers indicate the position of the
2150 amplicons respective to the TSS. For both panels, IP DNA (% Input) values represent the
2151 average of three independent experiments (mean ± SEM are shown; n = 3).
2152
2153 Figure 2–figure supplement 4. Transcription initiation correlates with increased
2154 elongation chromatin markers. (A) Heatmap representation of ChIP-seq binding for H3K4me3
2155 (blue), H3K79me3 (red), and H3K36me3 (brown) at class I and class II TSG, rank ordered from
2156 lowest to most H3K4me3 density increase from the GFP to the Tat cell line. The asterisk
2157 denotes the position of the TTS. (B) Genome browser views of H3K4me3, H379me3 and
2158 H3K36me3 ChIP-seq tracks in the GFP and Tat cell lines along with the Refseq track for the
2159 ADCYAP1 locus (class I TSG). The position of the TSS is indicated with an arrow. (C) Genome
2160 browser views of H3K4me3, H379me3 and H3K36me3 ChIP-seq tracks in the GFP and Tat cell
2161 lines along with the Refseq track for the ATP9A locus (class I TSG). (D) Genome browsers of
2162 H3K4me3, H379me3 and H3K36me3 ChIP-seq tracks in the GFP and Tat cell lines along with
2163 the Refseq track for the CD82 locus (class II TSG). (E) Genome browser views of H3K4me3,
2164 H379me3 and H3K36me3 ChIP-seq tracks in the GFP and Tat cell lines along with the Refseq
83 2165 track for the FAM46C locus (class II TSG).
2166
2167 Figure 2–figure supplement 5. Tat recruits chromatin-modifying enzymes and elongation
2168 factors at selected target genes to promote transcription elongation. (A) Strep-tagged
2169 GFP and Tat were affinity purified (AP) from nuclear preparations of the Jurkat cell lines (1x10^9
2170 total cells) and interacting partners were analyzed by western blot with the indicated antibodies.
2171 CDK9 was used as a positive protein interacting control. (B) ChIP assay to analyze the
2172 distribution of histone marks (H3K79me3 and H3K36me3) and chromatin-modifying enzymes
2173 (Dot1L and SetD2) at the CD69 locus of the GFP (green) and Tat (black) cell lines. The position
2174 of the amplicons used in ChIP-qPCR is shown with the schematic of the locus. Values represent
2175 the average of three independent experiments (mean ± SEM are shown; n = 3). (C) Genome
2176 browsers of FLAG, Pol II, H3K4me3, H379me3 and H3K36me3 ChIP-seq tracks in the GFP and
2177 Tat cell lines along with the Refseq track for the CD69 locus. The position of the TSS is
2178 indicated with an arrow. (D) Knockdown of Dot1L, SetD2 and CDK9 blocks Tat-mediated CD69
2179 transcription activation. qRT-PCR of CD69 in the Jurkat-GFP and -Tat cell lines expressing non-
2180 target (NT), Dot1L, SetD2 and CDK9 shRNAs (mean ± SEM are shown; n = 3). (E) Knockdown
2181 of Dot1L, SetD2 and CDK9 does not alter RNA steady state levels of RPL19. qRT-PCR of
2182 RPL19 in the Jurkat-GFP and -Tat cell lines expressing non-target (NT), Dot1L, SetD2 and
2183 CDK9 shRNAs (mean ± SEM are shown; n = 3). (F, G, H) qRT-PCR validation of Dot1L, SetD2
2184 and CDK9 knockdown in the Jurkat-GFP and -Tat cell lines using gene-specific primers and
2185 normalized to RPL19 (mean ± SEM are shown; n = 3).
2186
2187 Figure 3–figure supplement 1. Tat recruitment induces Pol II and chromatin signatures
2188 controlling transcription initiation or elongation at different gene classes. (A) Genome
2189 browser views of FLAG, Pol II, H3K4me3, H379me3 and H3K36me3 ChIP-seq tracks in the
2190 GFP and Tat cell lines along with the Refseq track for the NETO1 locus (class I TSG). The
84 2191 position of the TSS is indicated with an arrow. (B) Genome browsers of FLAG, Pol II, H3K4me3,
2192 H379me3 and H3K36me3 ChIP-seq tracks in the GFP and Tat cell lines along with the Refseq
2193 track for the VAV3 locus (class II TSG). The position of the TSS is indicated with an arrow.
2194
2195 Figure 3–figure supplement 2. Tat controls P-TEFb and Pol II recruitment at class II TSG
2196 and class II TDG. (A–B) Tat promotes P-TEFb recruitment and Pol II elongation at class II
2197 TSG. (C–D) Tat precludes P-TEFb recruitment to block Pol II elongation at class II TDG. Jurkat-
2198 GFP and -Tat cell lines were used in ChIP assays to analyze the occupancy of GFP and Tat
2199 (FLAG), P-TEFb and Pol II at the indicated target genes using the indicated amplicons. Values
2200 represent the average of three independent experiments (mean ± SEM are shown; n = 3).
2201
2202 Figure 4–figure supplement 1. Stimulation of gene expression by Tat correlates with
2203 increased H3K27Ac density at the Tat peak. (A) Correlation plot between normalized
2204 H3K27Ac (-/+ 1 kb respective to the Tat peak) and total gene expression levels (based on RNA-
2205 seq FPKM) at class I TSG in the presence and absence of Tat (Tat/GFP). (B) Correlation plot
2206 between normalized H3K27Ac (-/+ 1 kb respective to the Tat peak) and total gene expression
2207 levels (based on RNA-seq FPKM) at class II TSG in the presence and absence of Tat
2208 (Tat/GFP).
2209
2210 Figure 4–figure supplement 2. Wild-type Tat but not the C22A non-functional mutant
2211 induces gene looping between the promoter and intragenic sites. (A) Genome browser
2212 views of ChIP-seq at the PPM1H locus in the GFP and Tat cell lines. The arrows indicate the
2213 position of two intragenic Tat binding sites. (B) 3C assay showing the relative crosslinking
2214 efficiency between a promoter anchor primer (top) or an upstream control primer (bottom) and
2215 several distal sites at the PPM1H locus in the GFP and Tat cell lines. The position of the primers
2216 used in qPCR assays, restriction sites and fragment generated after digestion are indicated
85 2217 (mean ± SEM are shown; n = 3).
2218
2219 Figure 4–figure supplement 3. Tat-mediated gene looping does not require transcription
2220 activity at enhancers. (A) Genome browser views of ChIP-seq at the PPM1H locus in the GFP
2221 and Tat cell lines. The arrows indicate the position of two intragenic Tat binding sites. (B) 3C
2222 assay showing the relative crosslinking efficiency between a promoter anchor primer and
2223 intragenic sites at the PPM1H locus in the GFP and Tat cell lines treated with DMSO (-FP) or
2224 flavopiridol (+FP). The position of the primers used in qPCR assays, restriction sites and
2225 fragment generated after digestion are indicated. (C) The GFP and Tat cell lines were treated
2226 with DMSO (-FP) or FP (+FP), and RNA isolated to measure levels of PPM1H mRNAs. (D) The
2227 GFP and Tat cell lines were treated with DMSO (-FP) or FP (+FP), and RNA isolated to
2228 measure levels of the intragenic enhancer-derived RNAs (eRNAs) (mean ± SEM are shown; n =
2229 3).
2230
2231 Figure 4–figure supplement 4. Class II TSG contain stable gene loops between promoter-
2232 enhancer that remain unaltered in response of Tat. (A) Genome browser views of ChIP-seq
2233 at the CD82 locus in the GFP and Tat cell lines. The arrows indicate the position of two
2234 intragenic Tat binding sites. (B) 3C assay showing the relative crosslinking efficiency at the
2235 CD82 locus in the GFP and Tat cell lines using a promoter anchor primer (top) or an upstream
2236 control primer (bottom). The position of the primers used in qPCR assays, restriction sites and
2237 fragment generated after digestion are indicated (mean ± SEM are shown; n = 3).
2238
2239 Figure 5–figure supplement 1. Intragenically paused Pol II is detected in CD4+ T cell lines
2240 and primary cells at sites with high H3K27Ac and low H3K4me3. Genome browser views of
2241 H3K4me3, H3K27Ac and different Pol II forms (total CTD (CTD), Ser5P-CTD and Ser2P-CTD)
2242 in the Jurkat-GFP cell line and primary CD4+ T cells along with the Refseq track for the ABCG1
86 2243 locus (class II TSG). The position of the different isoforms is indicated.
2244
2245 Figure 6–figure supplement 1. Different modes of Tat repression at class I and class II
2246 TDG. (A) Tat blocks transcription initiation of CD1E by interfering with assembly of the pre-
2247 initiation complex at the promoter. (B) Tat prevents elongation of EOMES by precluding P-TEFb
2248 loading at the promoter. ChIP-qPCR experiments with the indicated antibodies, in the GFP and
2249 Tat cell lines. The position of the amplicons used in ChIP-qPCR and their distance to the TSS
2250 (arrow) is indicated. Values represent the average of three independnt experiments. The SEM is
2251 less than 10% and not shown for simplicity. IP, immunoprecipitation.
2252
2253 Figure 7–figure supplement 1. mTR reveals that Tat blocks both Pol II recruitment and
2254 pause release at TDG. Calculations of the ratio of paused Pol II (pPol II) and traveling Pol II
2255 (tPol II) at class I and class II TDG in the presence and absence of Tat (Tat/GFP).
2256
2257 Figure 8–figure supplement 1. The interaction between Tat and host cell chromatin
2258 appears to be primary dictated by protein-protein interactions. (A) Fractionation scheme of
2259 Jurkat-GFP and Jurkat-Tat cells by increased salt extraction in the absence (–) or presence (+)
2260 of RNase A. Ch denotes the chromatin fraction. (B) Western blots of the samples prepared as in
2261 panel (A) with the indicated antibodies. The efficiency of RNase treatment was verified by
2262 electrophoresis of the purified RNA in an agarose gel stained with ethidium bromide. (C) ChIP
2263 assays to analyze the occupancy of GFP and Tat (FLAG) at the CD69 promoter (-63 amplicon)
2264 in the absence (–) and presence (+) of RNase. (D) ChIP assays to analyze the occupancy of
2265 GFP and Tat (FLAG) at the FAM46C promoter (-182 amplicon) in the absence (–) and presence
2266 (+) of RNase. (E) ChIP assays to analyze the occupancy of GFP and Tat (FLAG) at the CD1E
2267 promoter (+4 amplicon) in the absence (–) and presence (+) of RNase. (F) ChIP assays to
2268 analyze the occupancy of GFP and Tat (FLAG) at the EOMES promoter (-57 amplicon) in the
87 2269 absence (–) and presence (+) of RNase (mean ± SEM are shown; n = 3).
2270
2271 Figure 9–figure supplement 1. The C22A non-functional Tat mutant fails to bind ETS1
2272 and evidence that ETS1 is critical for transcription activation of Tat target genes. (A)
2273 Western blots showing interactions between Tat (but not GFP or the C22A non-functional Tat
2274 mutant) and ETS1. CDK9 was used as a positive control in the interaction. Strep-tagged GFP,
2275 Tat, and C22A were affinity purified (AP) from the Jurkat cell lines using Strep beads, and
2276 analyzed by western blot using the indicated antibodies. (B) qRT-PCR analysis of CD69
2277 expression in the indicated cell lines. (C) qRT-PCR analysis of ADCYAP1 expression in the
2278 indicated cell lines. (D) qRT-PCR analysis of VAV3 expression in the indicated cell lines. (E)
2279 qRT-PCR analysis of FAM46C expression in the indicated cell lines. (F) qRT-PCR analysis of
2280 RPL19 expression in the indicated cell lines. (G) qRT-PCR analysis of 7SK expression in the
2281 indicated cell lines. Expression of the indicated genes (panels B–G) was normalized to ACTB.
2282 Values represent the average of three independent experiments (mean ± SEM are shown; n =
2283 3).
88 Figure 1 A C D Jurkat T-cell line Combinatorial analysis 32 qRT-PCR 16 GFP Tat ChIP-seq RNA-seq DOX –+ –+ 8 2751 2013 4 Tat bound genes DE genes 2 1 FLAG -2 -4 2295 456 1557 -8 -16 β-actin Log2 fold change (Tat/GFP) -32 B 7SK ACTB CD69 CD1E CDK6 Indirect RPL19 ANXA1ZNF83 FBLN2 Direct FAM46CPPM1H EOMES PRAME targets ADCYAP1 FAM133B Exon5’UTR (2.1%) targets Intron Intergenic Promoter 3’UTR TSG TDG 36.4 35 24 Non- Direct targets 0 50 100 244 TSG 212 TDG targets Percent of sequences E F TSG GO Biological Process TSG Regulation of immune system process CD69 Immune system process Positive regulation of immune system process GFP Regulation of immune response Cell activation Tat Positive regulation of immune response Positive regulation of cell activation ChIP-seq RefSeq TSS Positive regulation of leukocyte activation Leukocyte activation Intron Promoter Positive regulation of lymphocyte activation Immune response-regulating cell surface receptor signaling Antigen receptor-mediated signaling pathway Immune response-activating cell surface receptor signaling GFP Regulation of cell activation Lymphocyte differentiation Tat Regulation of leukocyte activation
RNA-seq Leukocyte differentiation Regulation of lymphocyte activation Lymphocyte activation ADCYAP1 Regulation of B cell proliferation
GFP 10e-05 10e-10 10e-15 10e-20 Tat Binomial p-value
ChIP-seq RefSeq TSS GO Biological Process TDG Intron 3’-UTR Negative regulation of cell aging Regulation of cell aging Negative regulation of myeloid cell differentiation GFP Interphase Regulation of myeloid cell differentiation Tat Endocrine pancreas development
RNA-seq Cell cycle process Chromatin assembly or disassembly Nucleosome assembly Chromatin assembly Nucleosome organization TDG Cell cycle phase Protein-DNA complex subunit organization CD1E Cell cycle Protein-DNA complex assembly GFP Pancreas development Regulation of gene silencing Tat DNA packaging S phase RefSeq DNA conformation change TSS 10e-25 10e-50 10e-75 10e-100 Binomial p-value GFP Tat RNA-seq ChIP-seq Figure 2
A H3K4me3 B H3K4me3 8 NETO1 ATP9A Class I Class II ADCYAP1 0.035 0.1 4 ZNF83 CD244 0.03 PPM1H 0.08 RORβ 0.025 FLT1 ZNF521 2 PREX2 CD69 0.06 IQGAP2 ETV6 0.02 VAV3 0.015 0.04 1 0.01 0.02 0.005 Tat GFP
Normalized tag density Normalized tag density Tat -2 GFP -3-2 -1 0 +1 +2 +3 -3-2 -1 0 +1 +2 +3
Log2 fold change (Tat/GFP) Distance from TSS (Kb) Distance from TSS (Kb) -4 C D H3K79me3 H3K79me3 64 CD69 ADCYAP1 5 Class I1.2 Class II 32 1.0 VAV3 4 16 CD82 ETV6 0.8 3 8 0.6 2 0.4 4 1 0.2 Tat 2 Normalized tag density Tat Normalized tag density GFP GFP 1 TSS TTS TSS TTS Log2 fold change (Tat/GFP) -2
E H3K36me3 F H3K36me3 16 5 Class I2.5 Class II RORβ 8 NETO1 ADCYAP1 2.0 ZNF83 4 ZNF521 CD69 3 1.5 4 PPM1H ETV6 2 1.0 VAV3 Tat 2 1 0.5 GFP Tat Normalized tag density Normalized tag density 1 GFP TSS TTS TSS TTS Log2 fold change (Tat/GFP) -2
GHNETO1 (Class I) VAV3 (Class II) GFP GFP H3K4me3 H3K4me3 Tat Tat GFP GFP H3K79me3 H3K79me3 Tat Tat
GFP GFP H3K36me3 H3K36me3 Tat Tat Refseq TSS Refseq TSS Figure 3
ABPol II density at Tat peak Pol II density at promoter 2.0 Class I 44.0Class II 1.2 Class I 2.0 Class II
1.5 33.0 0.9 1.5
1.0 22.0 0.6 1.0
0.5 11.0 0.3 0.5 Normalized tag density Normalized tag density Normalized tag density Normalized tag density
GFP Tat GFP Tat GFP Tat GFP Tat C D 0.8 Class I 2.0 Class II
0.6 1.5
0.4 1.0
0.2 0.5 Tat Tat GFP GFP Normalized tag density -1 -0.5 0+1+0.5 Normalized tag density -1 -0.5 0+1+0.5 Distance from Tat peak (Kb) Distance from Tat peak (Kb) E F Tat Class I NETO1 (Class I) 0.25 GFP 0.20 FLAG Tat 0.15 GFP 0.10 Pol II Tat 0.05 Tat GFP GFP H3K4me3 Normalized tag density Tat TSS TTS Refseq TSS G H Class II VAV3 (Class II) 0.7 GFP 0.6 FLAG 0.5 Tat 0.4 GFP 0.3 Pol II Tat 0.2 Tat 0.1 GFP GFP H3K4me3 Normalized tag density Tat TSS TTS Refseq
TSS Figure 4 ADB C H3K4me3 H3K27Ac H3K27Ac H3K27Ac Tat Tat GFP Tat GFP Tat GFP Tat 4 Class I 5 Class II
H3K4me3 high Class I 4 H3K27Ac low 3 Promoter Promoter 3 2 2 H3K4me3 low 1 Normalized tag density H3K27Ac high Normalized tag density 1 Class II Ranked based on K27 density Intragenic Intragenic GFP Tat GFP Tat
(Centered on Tat peak) (Centered on Tat peak) TSS
E G I Tat Tat PPM1H (Class I) PPM1H Class I 3 GFP FLAG 5 +3755 Tat 2 4 GFP Pol II 3 1 Tat 2 GFP 1 H3K4me3 IP DNA (% Input) DNA IP Tat
Normalized tag density -3-2 -1 0 1 2 3 Distance from Tat peak (Kb) GFP Tat Tat GFP GFP H3K27Ac Tat H3K27Ac p300 Refseq
F H 300 kb 200 kb 100 kb TSS Class II CD82 (Class II) Primers 3 Restriction Anchor +4180 2 10 Fragments 12 34 5 67 8 9 8 12
1 6 4 9 2 IP DNA (% Input) DNA IP Normalized tag density -3-2 -1 0 1 2 3 6 Distance from Tat peak (Kb)
Tat Tat GFP GFP 3 H3K27Ac p300 Crosslinking efficiency Figure 5 A B Traveling Ratio (TR) CD82 (Class II) Pol II promoter density GFP TR = Pol II gene body density FLAG Tat Promoter Gene Body GFP Pol II Tat
GFP H3K4me3 Tat
GFP H3K4me1
ChIP-seq enrichment Tat
GFP TSS TTS H3K27Ac -50bp +1000bp Tat Refseq TSS
C Modified Traveling Ratio (mTR) D E 8 pPol II (promoter+gene body density) 128 mTR = tPol II (gene body density) 7 64 Promoter Gene Body 6 Intragenic enhancer 32 (H3K27Ac+) 5 16 4
8 3 tPol II (Tat/GFP) Normalized mTR 4 2
ChIP-seq enrichment 2 1 Class I GFP Tat Class II TSS TTS 1234567 8 -50bp +1000bp pPol II (Tat/GFP) F G Class I TSG Class II TSG
Tat Pol II Pol II Tat Pol II Pol II Promoter TSS
TSS Pol II Pol II Intragenic Tat
Pol II Pol II Pol II
TSS TSS Figure 6 A B H3K4me3 H3K4me3 2 Class I Class II 1 0.10 0.05
-2 0.08 0.04
-4 0.06 0.03 THEMIS DAB2IP AOX1 -8 MYO7B 0.04 0.02
-16 FBLN2 0.02 0.01 Tat GFP GFP Tat -32 Normalized tag density Normalized tag density -3-2 -1 0 +1 +2 +3 -3-2 -1 0 +1 +2 +3 Log2 fold change (Tat/GFP) CD1E -64 Distance from TSS (Kb) Distance from TSS (Kb) C D H3K79me3 H3K79me3 4 Class I Class II 2 0.025 0.025
1 0.020 0.020
0.015 0.015 -2 0.010 0.010 FBLN2 -4 EOMES 0.005 0.005 GFP Tat Normalized tag density -8 Tat Normalized tag density GFP
Log2 fold change (Tat/GFP) CD1E TSS -16 TTS TSS TTS
E H3K36me3 F H3K36me3 2
0.05 Class I0.01 Class II
0.04 0.008 1 0.03 0.006
0.02 0.004 GFP GFP -2 EOMES 0.01 Tat 0.002 Tat FBLN2 Normalized tag density Normalized tag density CD1E Log2 fold change (Tat/GFP) TSS TTS TSS TTS -4 GH CD1E (Class I) EOMES (Class II) GFP GFP FLAG FLAG Tat Tat GFP GFP H3K4me3 H3K4me3 Tat Tat GFP GFP H3K79me3 H3K79me3 Tat Tat
GFP GFP H3K36me3 H3K36me3 Tat Tat Refseq TSS Refseq TSS Figure 7
ABPol II density at Tat peak Pol II density at promoter 8.0 Class I 16.0 Class II 4.0 Class I 4.0 Class II 8.0 4.0 2.0 2.0 4.0 2.0 1.0 1.0 2.0 1.0 0.5 0.5 1.0 0.5 0.25 0.25 0.5 0.25 0.125 0.125
Normalized tag density Normalized tag density 0.25 Normalized tag density Normalized tag density
0.125 0.125 0.0625 0.0625 GFP Tat GFP Tat GFP Tat GFP Tat CD 1.2 Class I 1.2 Class II
0.9 0.9
0.6 0.6 GFP GFP 0.3 0.3 Tat Tat Normalized tag density Normalized tag density -1 -0.5 0+1+0.5 -1 -0.5 0+1+0.5 Distance from Tat peak (Kb) Distance from Tat peak (Kb) E F CD1E (Class I) 0.12 Class I GFP 0.10 FLAG Tat 0.08 GFP 0.06 GFP Pol II 0.04 Tat Tat 0.02 GFP H3K4me3 Normalized tag density Tat TSS TTS Refseq TSS G H EOMES (Class II) 0.10 Class II GFP 0.08 FLAG Tat 0.06 GFP Pol II 0.04 GFP Tat 0.02 Tat GFP H3K4me3 Normalized tag density Tat TSS TTS Refseq
TSS Figure 8 A B TAR TAR-like (query)
GG Loop U G N(4-40) C A CG NN GC NN AU AU GC GC Bulge N(1-2) N(1-2) U U AU AU GC GC AU NN CG NN CG NN C D 1.0 TSG 1.0 TDG
0.8 0.8
0.6 0.6
0.4 0.4
TAR-like motifs TAR-like 0.2 motifs TAR-like 0.2 Fraction of genes with Fraction of genes with
<10kb <1kb <0.1kb <10kb <1kb <0.1kb Distance from Tat peak Distance from Tat peak EF TSG TDG Transcription Transcription Motif P-value Motif P-value factor factor
-18 -9 ETS1 2.5x10 ETS1 1.1x10
-14 -7 RUNX1 9.6x10 RUNX1 2.0x10
-12 GATA3 1.9x10 G H 1.0 TSG 1.0 TDG
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2 ETS1 ChIP-seq peaks ETS1 ChIP-seq peaks Fraction of genes with Fraction of genes with
<10kb <1kb <0.1kb <10kb <1kb <0.1kb Distance from Tat peak Distance from Tat peak Figure 9 A B C Strep AP CD69 CD1E 3’ Inputs Elutions Tat (ChIP-Seq) Tat (ChIP-Seq) ETS1 (ChIP-Seq) ETS1 (ChIP-Seq) ETS1 motif prediction ETS1 motif prediction GFPTat GFPTat -63 +5897 +4 +5215 Q-PCR amplicons Q-PCR amplicons Promoter- Promoter- Promoter- Promoter- GFP proximal distal proximal distal 0.8 1.6
FLAG 0.6 1.2
0.4 0.8 Tat FLAG FLAG 0.2 0.4
ETS1 (% Input) DNA IP (% Input) DNA IP
CDK9 2.0 2.0
1.5 1.5
1.0 ETS1 ETS1 1.0 0.5 0.5 IP DNA (% Input) DNA IP IP DNA (% Input) DNA IP
Tat Tat Tat Tat GFP GFP GFP GFP D E F Jurkat cell line CD69 CD1E GFP Tat 3’ -63 +4 2.5 2.0 shRNA – NT ETS1– NT ETS1 2.0 1.5 GFP 1.5 ETS1 1.0 ETS1 FLAG 1.0 0.5 0.5 IP DNA (% Input) DNA IP (% Input) DNA IP
Tat shRNA NT NT shRNA NT NT ETS1 ETS1 ETS1 ETS1 ETS1 GFP Tat GFP Tat
1.0 2.0
β-actin 0.8 1.5 0.6 FLAG 1.0 FLAG 0.4 0.5 0.2 IP DNA (% Input) DNA IP IP DNA (% Input) DNA IP
shRNA NT NT shRNA NT NT ETS1 ETS1 ETS1 ETS1 GFP Tat GFP Tat