1 HIV Tat controls RNA Polymerase II and the epigenetic landscape

2 to transcriptionally reprogram target immune cells

3

4 Jonathan E. Reeder,1,2,* Youn-Tae Kwak,2,3,* Ryan P. McNamara2, Christian V. Forst4 and Iván

5 D’Orso2,5

6

7 1University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080

8 2Department of Microbiology, University of Texas Southwestern Medical Center, 5323 Harry

9 Hines Boulevard, Dallas, TX 75390

10 3Present address: Department of Biochemistry, University of Texas Southwestern Medical

11 Center, 5323 Harry Hines Boulevard, Dallas, TX 75390

12 4Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology,

13 Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029

14 5Correspondence to: [email protected]

15 *These authors contributed equally to this work

1 16 Abstract

17 HIV encodes Tat, a small that facilitates viral transcription by binding an RNA structure

18 (TAR) formed on nascent viral pre-mRNAs. Besides this well characterized mechanism, Tat

19 appears to modulate cellular transcription, but the target and molecular mechanisms

20 remain poorly understood. We report here that Tat uses unexpected regulatory mechanisms to

21 reprogram target immune cells to promote viral replication and rewire pathways beneficial for

22 the virus. Tat functions through master transcriptional regulators bound at promoters and

23 enhancers, rather than through cellular “TAR-like” motifs, to both activate and repress sets

24 sharing common functional annotations. Despite the complexity of transcriptional regulatory

25 mechanisms in the cell, Tat precisely controls RNA Polymerase II recruitment and pause

26 release to fine-tune the initiation and elongation steps at target genes. We propose that a virus

27 with a limited coding capacity optimized its genome by evolving a small but “multitasking”

28 protein to simultaneously control viral and cellular transcription.

2 29 Introduction

30 Transcription of protein coding genes by RNA Polymerase (Pol) II is regulated at several steps

31 including initiation, elongation and termination (Adelman and Lis, 2012; Fuda et al., 2009;

32 Grunberg and Hahn, 2013; Sims et al., 2004; Zhou et al., 2012). Transcription factors

33 coordinate the activation and/or repression of key regulatory programs by acting at one or

34 multiple steps in this regulatory circuitry. While one group of transcription factors recruit Pol II to

35 their target genes to induce transcription initiation, another class functions by promoting the

36 transcriptional pause release from the promoter-proximal state to allow Pol II transition to the

37 productive elongation phase (Feinberg et al., 1991; Gomes et al., 2006; Peterlin and Price,

38 2006; Rahl et al., 2010; Zhou et al., 2012). Thus, promoter-proximal pausing has been

39 identified as a general feature of transcription control by Pol II in metazoan cells and a key

40 regulatory step during differentiation, cell development and induction of stem cell pluripotency

41 (Adelman and Lis, 2012; Core and Lis, 2008; Smith and Shilatifard, 2013; Zeitlinger et al., 2007;

42 Zhou et al., 2012).

43 In addition to Pol II and the basal transcription machinery, the epigenetic landscape

44 plays another critical role in transcription regulation, mediated by covalent modifications to the

45 N-terminal tails of histones. The study of chromatin modifications has revealed fundamental

46 concepts in the regulation of transcription. Histone marks, in general, associate with different

47 genomic domains (promoters, coding units and enhancers) and provide evidence of their

48 transcriptional status (Barski et al., 2007; Creyghton et al., 2010; Guenther et al., 2007; Li et al.,

49 2007; Tessarz and Kouzarides, 2014; Zhou et al., 2011). While active promoters are marked

50 with Histone 3 Lysine 4 trimethylation (H3K4me3), active transcription units are marked with

51 H3K79me3 and H3K36me3 (Kouzarides, 2007; Li et al., 2007). Thus, most Pol II regulated

52 genes (protein-coding and long non-coding RNAs) contain a K4/K36 signature that positively

53 correlates with active gene expression (Guttman and Rinn, 2012; Martin and Zhang, 2005). In

54 addition to promoters, distal genomic elements referred to as enhancers modulate gene activity

3 55 through gene looping or long-range chromatin interactions and further regulate the location,

56 timing, and levels of gene expression (Bulger and Groudine, 2011). The genomic locations of

57 these enhancers (intergenic or intragenic) usually correlate with high H3K4me1 and low

58 H3K4me3 content and their activity is proportional to the levels of H3K27Ac (Bulger and

59 Groudine, 2011; Creyghton et al., 2010; Kim et al., 2010b; Kowalczyk et al., 2012; Smallwood

60 and Ren, 2013). Additionally, H3K27Ac also appears to modulate two temporally separate

61 events: it enhances the search kinetics of transcriptional activators, and accelerates the

62 transition from initiation into elongation leading to a robust and potentially tunable transcriptional

63 response (Stasevich et al., 2014).

64 Viruses have evolved strategies to precisely orchestrate shifts in regulatory programs.

65 In addition to sustaining transcription of their own genomes, certain viral transcription factors

66 directly or indirectly alter existing cellular programs in ways that promote viral processes

67 (replication and spread) (Ferrari et al., 2008; Horwitz et al., 2008), and/or modulate programs in

68 the infected target cell. It is well established that the HIV Tat protein relieves promoter-proximal

69 pausing at the viral promoter by recruiting the positive transcription elongation factor b (P-TEFb)

70 to the trans-activating RNA (TAR) stem-loop formed at the 5’-end of viral nascent pre-mRNAs

71 (D'Orso and Frankel, 2010; Mancebo et al., 1997; Ott et al., 2011; Zhou et al., 2012; Zhu et al.,

72 1997). Much recently, it was discovered that Tat recruits P-TEFb as part of a larger complex

73 referred to as Super Elongation Complex (SEC), which is composed by the MLL-fusion partners

74 involved in leukemia (AF9, AFF4, AFF1, ENL, and ELL), and PAF1. Although SEC formation

75 relies on P-TEFb, optimal P-TEFb kinase activity towards the Pol II C-terminal domain (CTD) is

76 AF9 dependent, and the MLL-fusion partners and PAF1 are required for Tat transactivation (He

77 et al., 2010; Luo et al., 2012; Sobhian et al., 2010).

78 Moreover, Tat stimulates transcription complex assembly through recruitment of TATA-

79 binding protein (TBP) in the absence of TBP-associated factors (TAFs) (Raha et al., 2005),

80 implying that Tat controls both the initiation and elongation steps of transcription, in agreement

4 81 with early proposals of Tat increasing transcription initiation, stabilizing elongation and

82 precluding anti-termination (Kao et al., 1987; Laspia et al., 1989; Rice and Mathews, 1988).

83 Thus, Tat has the ability to control multiple stages in the HIV transcriptional cycle to robustly

84 increase transcript synthesis to promote viral replication.

85 In addition to controlling HIV transcription, Tat appears to modulate cellular gene

86 expression to generate a permissive environment for viral replication and spread, and alter or

87 evade immune responses (Izmailova et al., 2003; Kim et al., 2010a; Kim et al., 2013; Lopez-

88 Huertas et al., 2010; Marban et al., 2011). One example of Tat-mediated down-regulation is the

89 mannose receptor in macrophages and immature dendritic cells, which plays a key role in host

90 defense against pathogens by mediating their internalization (Caldwell et al., 2000). A similar

91 case is Tat’s repression of the MHC class I gene promoter, which depends on Tat binding to

92 complexes containing the TBP-associated factor TAFII250/TAF1 (Weissman et al., 1998). The

93 interaction of the C-terminal domain and TAF1 suggests that Tat mediates repression functions

94 through the transcription initiation complex. These, and many other examples, suggest that Tat

95 is capable of altering cellular gene expression via association with factors bound to promoters.

96 More recently, Chip-on-chip and ChIP-seq approaches have revealed that Tat has the

97 ability to bind target genes in the . While these previous studies have provided

98 early glimpses about Tat recruitment to host cell chromatin, a comprehensive description of the

99 direct target genes and the molecular mechanisms remain poorly understood. To address

100 those gaps in knowledge, we aimed at: (i) identifying Tat target genes in the human genome, (ii)

101 delineating the molecular mechanisms by which Tat reprograms cellular transcription, and (iii)

102 defining how Tat is recruited to host cell chromatin. One of the main challenges in the field is to

103 obtain a high-quality, comprehensive ChIP-seq that will provide functional insights. To address

104 this challenge, we analyzed the genome-wide distribution of Tat in the human genome using a

105 technically improved ChIP-seq compared to the previous studies. Moreover, we investigated

106 the molecular mechanisms using a global analysis of chromatin signatures that demarcate the

5 107 position and activity of genomic domains (enhancers, promoters and coding units), as well as

108 Pol II recruitment and activity (Barski et al., 2007; Creyghton et al., 2010). We provide, for the

109 first time, evidence that the direct Tat target genes share functional annotations and are

110 regulated by common master transcriptional regulators such as T-cell identity factors. While the

111 Tat stimulated genes (TSG) show a positive role in activating T cells, favoring cell proliferation

112 to promote viral replication and spread, the Tat down-regulated genes (TDG) show critical roles

113 in blunting immune system responses, nucleic acid biogenesis (splicing and translation), and

114 proteasome control. Strikingly, Tat functions as both activator and repressor by modulating Pol

115 II recruitment and/or pause release as well as controlling the activity of chromatin-modifying

116 enzymes to reprogram the epigenetic landscape of the host cell.

117 Taken together, we propose that a virus with a limited coding capacity has optimized its

118 genome by evolving a small protein (Tat) to perform multiple functions throughout the viral life

119 cycle. Beyond controlling HIV transcription, Tat has evolved unique properties to occupy

120 precise genomic domains (promoters and enhancers) to reprogram cellular transcription using

121 unexpected regulatory mechanisms. We provide the molecular basis of an unprecedented

122 paradigm with critical roles in host-pathogen interactions.

6 123 Results

124 Genomic domains occupied and regulated by Tat in CD4+ T cells

125 Previous studies have proposed that Tat modulates cellular gene expression to generate an

126 environment hospitable for viral replication and spread (Kim et al., 2010a; Kim et al., 2013; Li et

127 al., 1997; Lopez-Huertas et al., 2010; Marban et al., 2011). However, the molecular

128 mechanisms remain poorly understood. To elucidate how Tat performs these functions we

129 aimed at: (i) identifying genomic domains occupied and regulated by Tat in the human genome,

130 and (ii) defining the mechanisms of transcriptional control. We generated high-quality chromatin

131 immunoprecipitation sequencing (ChIP)-seq datasets in two Jurkat CD4+ T cell lines, inducibly

132 expressing FLAG-tagged Tat or GFP used as negative control (Figure 1A). Importantly, the cell

133 line used expresses low and physiologic levels of Tat, which mimic those detected during HIV

134 infection (Figure 1–figure supplement 1). We utilized this minimalistic system in order to more

135 precisely examine the effects of Tat alone rather than in an infection setting. The latter

136 approach may compromise the investigation due to the introduction of other viral products that

137 may also affect cellular behavior and/or alter Tat activity (Frankel and Young, 1998).

138 We established a ChIP-seq pipeline using both a Tat antibody (Ab) previously used in

139 ChIP-seq (Marban et al., 2011) and a FLAG Ab. We identified genomic domains enriched with

140 statistically significant occupancy relative to input DNA using the model-based analysis of ChIP-

141 seq (MACS) algorithm with a stringent cutoff (FDR<0.05) (Zhang et al., 2008). Unexpectedly,

142 we did not find a significant number of overlapping peaks between the Tat and FLAG ChIP-seq

143 datasets, suggesting that the Tat Ab performs poorly in ChIP, providing a molecular explanation

144 for the low-quality dataset generated by Marban et al. (Marban et al., 2011) (Figure 1–figure

145 supplement 2A–E). We thus focused on FLAG, which enabled us to generate a better quality,

146 more comprehensive ChIP-seq dataset compared to previous studies (Kim et al., 2010a;

147 Marban et al., 2011), which served as the key basis to elucidate novel Tat functions in the

148 control of cellular transcription. To precisely map Tat binding sites in the human genome, we

7 149 called FLAG peaks in both the Tat and GFP cell lines, then discarded those peaks in the

150 intersection of both datasets, considering only the FLAG peaks unique to the Tat cell line to be

151 true Tat binding events. We found that the majority of FLAG peaks in the Tat cell line represent

152 bona fide Tat binding sites because they are not detected in the GFP cell line (Figure 1–figure

153 supplement 2A and C–E). Importantly, the ChIP signal for FLAG was barely detectable in the

154 GFP cell line or in cells expressing inactivating Tat mutations in the activation domain (such as

155 C22A) that abolish Tat recruitment to target genes and/or impair gene expression activation

156 (D'Orso et al., 2012) (Figure 1–figure supplement 3). In contrast to the C22A non-functional

157 Tat mutant, synonymous mutations within the RNA-binding domain (such as K50Q and

158 R52R53K) have less pronounced effects on Tat binding to chromatin (Figure 1–figure

159 supplement 3).

160 Genome-wide distribution analysis based on the ENCODE annotation (Consortium,

161 2011), revealed that Tat binding is heavily enriched at promoters (24%, p-value 3.4x10-324) and

162 5’-UTR (1.2%, p-value 5.3x10-193) relative to their genomic proportions. These genomic

163 domains are underrepresented in the genome (about 1%) making Tat’s relative abundance at

164 these sites highly significant (Figure 1B). Although Tat also binds introns (36.4%, p-value

165 5.1x10^-4) and intergenic domains (35%, below the expected level of enrichment), defined as

166 sites located more than 1 Kb from the nearest annotated gene, these two genomic domains are

167 highly represented in the genome and the relative Tat binding frequency here is not nearly as

168 striking as at promoters and promoter-proximal regions (Figure 1B and Table 1).

169 To gain insights into the roles of Tat in cellular transcriptional control, we integrated the

170 FLAG ChIP-seq dataset with a whole transcriptome generated by RNA-seq (Figure 1C). RNA-

171 seq revealed 2013 differentially expressed genes (DEGs) using a q-value cutoff < 0.05, 456 of

172 which also appeared in the set of Tat-bound genes generated by our ChIP-seq analysis. We

173 refer to this dataset as direct targets. Remarkably, inactivating mutations that abolish Tat

174 recruitment to host cell chromatin also impair gene expression changes, indicating that the

8 175 effect of Tat on target gene transcription is direct and requires Tat binding and activity.

176 Specifically, reduced Tat C22A binding to promoters of Tat target genes correlates with

177 decreased gene expression changes (Figure 1–figure supplement 3), thus providing direct

178 evidence of Tat function.

179 Besides the direct target genes, we identified another set referred to as indirect targets.

180 These are genes that are differentially expressed in the presence of Tat (RNA-seq), but are not

181 directly bound by Tat (Figure 1C). Tat might regulate these genes through downstream effects

182 or alternative mechanisms (i.e. signaling pathways) or could be targets not identified by ChIP-

183 seq because of the high-confidence threshold used during peak calling. In this work we only

184 focused on the direct target genes to study the mechanisms of transcriptional control by Tat.

185 Further analysis of the direct Tat targets revealed that 244 genes are up-regulated and

186 212 genes down-regulated, and we refer to them as Tat stimulated genes (TSG) or Tat down-

187 regulated genes (TDG), respectively (Figure 1C). Importantly, we validated the expression of

188 several direct target genes using quantitative real-time PCR (qRT-PCR) assays, confirming the

189 reliability of RNA-seq (Figure 1D). Notably, protein expression analysis revealed that the

190 changes detected at the RNA level are also reflected at the protein level (for example CD69

191 expression at the cell surface in the presence of Tat) indicating that gene expression changes

192 are functional (Figure 1–figure supplement 4).

193 We further analyzed the distribution of Tat binding sites within the direct target genes to

194 probe for enrichment in particular genomic domains. We found that Tat is equally recruited to

195 both promoters (39%) and intragenic domains (39%) at TSG, whereas the majority of occupied

196 domains at TDG are at promoters (61%) (Figure 1–figure supplement 5). Inspection of

197 genome browser tracks showed that, in addition to binding TSG promoters (CD69), Tat is also

198 recruited to gene body regions with enrichment at introns (ADCYAP1) (Figure 1E). Notably,

199 this mode of Tat binding appears to be functionally relevant because it correlates with RNA

200 abundance changes as revealed by RNA-seq. In addition, Tat binds the target genes using

9 201 discrete (CD69 and ADCYAP1) or broad (CD1E) distribution patterns, probably due to the

202 different modes or mechanisms of recruitment to chromatin (Figure 1E). Importantly, we have

203 validated the FLAG ChIP-seq dataset by performing extensive ChIP-qPCR on several direct

204 targets including tow TSG (CD69 and ADCYAP1) and two TDG (CD1E and RAG1) in the Tat

205 and GFP cell lines (Figure 1–figure supplement 6).

206 Given that our model was built on the ectopic expression of Tat in target cells, we

207 performed an infection experiment to test whether Tat reprograms cellular transcription in a

208 similar way in the context of infection. Importantly, the TSG (CD69, FAM46C and PPM1H) and

209 TDG (CD1E, EOMES and FBLN2) tested are also modulated early during HIV infection of

210 Jurkat T cells, albeit with different kinetics (Figure 1–figure supplement 7), supporting the view

211 that the cellular reprogramming by Tat is functional and not simply an artifact of RNA-seq or the

212 ectopic expression of Tat outside the context of the virus.

213 Given that we proposed that Tat effects on host cell gene expression might be relevant

214 to normal HIV biology we asked whether the TSG and TDG are also modulated in response to

215 viral infection of primary CD4+ T cells. To this end, we isolated naïve CD4+ T cells from the

216 blood of healthy donors and generated central memory T cells (TCM) (Figure 1–figure

217 supplement 8A). After infection of TCM with replication competent X4 trophic virus (NL-GFP) or

218 mock infection and sorting infected cells, we isolated RNA and performed qRT-PCR analysis on

219 several TSG and TDG. We observed that HIV infected TCM cells showed the differential gene

220 expression signature (at least for the 12 direct targets examined) that we previously observed

221 with the ectopic expression of Tat or HIV infection of Jurkat T cells (Figure 1–figure

222 supplement 8B). This ex vivo experiment clearly demonstrates the robustness of our

223 minimalistic setting to study Tat functions in the host cell.

224

225 The direct Tat target genes share common functional annotations and are enriched in

226 pathways beneficial for the virus

10 227 If the genes directly modulated by Tat are involved in biologically relevant processes then we

228 would expect them to share functional annotations. To explore whether the TSG and TDG have

229 any common biological functions we examined their (Figure 1F). To provide

230 statistical robustness, we used cluster analysis and a control set of genes depleted in the Tat

231 ChIP-seq experiment. Gene categories significantly enriched in the set of TSG include positive

232 regulation of immune system process, cell activation and regulation of lymphocyte

233 differentiation, while TDG include negative regulation of cell aging, regulation of myeloid cell

234 differentiation and processes of DNA/RNA biogenesis (Figure 1F). Consistently, network

235 analysis indicates that TSG are significantly enriched in T-cell receptor (TCR) pathway, cell

236 cycle and focal adhesion, while TDG enrich processes relevant for DNA/RNA processes,

237 ribosome and proteasome control, among others (Figure 1–figure supplement 9).

238 With respect to T-cell activation, CD69 exhibits a rather central role, because its

239 upregulation promotes T-cell stimulation and differentiation (TCR Pathway cluster) (Sancho et

240 al., 2005). Another stimulated process involves components of the cell cycle (CDK6) together

241 with cyclinD3 (CCND3) and cyclin-dependent kinase inhibitor 1B (CDKN1B) (Cell Cycle cluster)

242 that appear to be controlled by phosphorylation via the LCK kinase from the TCR complex, as

243 one of the central node in the network. Another controller node assembles the ataxia-

244 telangiectasia-mutated (ATM) serine/threonine kinase, which is best known for its role as an

245 activator of the DNA damage response (HIV Infection cluster). The activity of HIV integrase

246 stimulates an ATM-dependent DNA damage response, and ATM deficiency sensitizes cells to

247 retrovirus-induced cell death. In addition, ATM inhibition is capable of suppressing the

248 replication of both wild-type and drug-resistant HIV (Lau et al., 2005), thus demonstrating the

249 importance of this TSG in controlling viral processes. With respect to down-regulated

250 processes, ribosomal centered around RPS9 (Ribosome cluster), together with

251 translation initiation factors (EIF3b) and the nucleolar and coiled-body phosphoprotein NOLC1,

252 as well as components of the spliceosome such as SF3B5, SNRPB, SNRNP200, LSM4, and

11 253 PCBP2 (Spliceosome cluster), suggest negative regulation of these processes (Figure 1–figure

254 supplement 9).

255 Because we proposed that the predicted GO biological processes of the direct Tat target

256 genes are essential for the viral life cycle (to promote a permissive state for viral replication), we

257 further analyzed whether they are retained in the context of infection. To test this, we

258 assembled a collection of 62 publicly available datasets including 48 gene sets from 13

259 publications containing information on DEGs identified during HIV infection together with 14

260 datasets from the Molecular Signatures Database (MSigDB) of the Broad Institute

261 (Subramanian et al., 2005) (Supplementary file 1A). After executing gene set enrichment

262 analysis with Bonferroni correction for multiple testing, we identified 5 datasets that significantly

263 enriched the TSG (with FDR ≤ 0.05) (Supplementary file 1B). In addition, TDG also enrich two

264 gene sets from the HIV relevant MSigDB sets (Figure 1–figure supplement 10A and

265 Supplementary file 1B). Together, the analysis provides evidence that the direct Tat target

266 genes (and thus the predicted GO biological processes) are retained in the context of viral

267 infection, supporting the model that Tat-mediated host cell reprogramming occurs during

268 infection. Notably, this proposal is also consistent with our infection data on Jurkat and primary

269 TCM cells (Figure 1–figure supplements 7 and 8).

270 Our network analysis indicated that TSG and TDG were enriched in specific biological

271 processes that might promote viral functions (including replication) (Figure 1–figure

272 supplement 9 and 10). To answer how those functional annotations could promote viral

273 infection, we employed a variety of methods to identify the functions of the direct Tat target

274 genes, and relate them to the biology of HIV. Functional annotation by GO classes provides an

275 overview of biological processes and functions enriched by the gene sets (Figure 1–figure

276 supplement 10B). Furthermore, we have obtained additional information from the “canonical

277 pathway” collection of MSigDB. By annotating clusters of the response network with GO and

278 MSigDB pathways we related biological processes with network content. For example, the TCR

12 279 Pathway cluster annotated with MSigDB includes ITK, LCK, LCP2, PRKCQ and VAV3, among

280 other targets. This cluster not only reveals that those targets interact with each other, but also

281 with the Tyrosine protein kinase LCK, which is known to phosphorylate both LCP2 and PRKCQ,

282 implying physical and functional interactions.

283 The data clearly indicates that TSG are enriched in datasets from many pathways that

284 correlate with stimulation of viral replication, such as the ‘TCR pathway/stimulation’ (CD3,

285 CD247, INPP5D, LCP2, and PTEN) and ‘downstream TCR signaling/pathway’ (PIK3CA and

286 PRKCQ), ‘T-cell co-stimulation’ (CTLA4 and CD28), ‘cell motility signaling/pathway’ (RAC1),

287 ‘generation of second messengers’ (EVL, ITK, and LCK), and ‘phosphatidylinositol signaling’

288 (INPP4A, INPP5D, PIK3CA, PLCB1, PRKCB, and PTEN), among others (Figure 1–figure

289 supplement 10B and Supplementary file 1B). The prime example of pathways enriched in

290 TSG is T-cell signaling/activation and co-stimulatory signals (CD28) that provide additional

291 control mechanisms to prevent inappropriate and hazardous T-cell activation. The data is

292 consistent with the fact that in early stages of infection the viral encoded proteins (particularly

293 Tat) mimic T-cell signaling pathways, resulting in sustained viral replication within infected T

294 cells. This T-cell activation provides new targets for HIV replication, creating a favorable

295 environment for further virus mediated damage to the immune system and chronic consumption

296 of the pools of naïve and resting memory cells (Hazenberg et al., 2000; Hellerstein et al., 2003).

297 Our data suggest that HIV uses Tat to directly induce several pathways (including T-cell

298 signaling) for productive infection of immune cells. Both the biological data and our prediction

299 suggest that TSG play important roles in these processes. On the other hand, TDG are

300 primarily enriched in datasets related to ‘metabolism of RNA’ (HSPA8, LSM4, and PRMT5), in

301 particular ‘ribosome’ (RPL and RPS variants), ‘proteasome’ (PSMB3, PSMB4, PSMB5, PSMC2,

302 PSMD1, and PSMD8), and ‘regulation of apoptosis’ (DAPK1, together with above proteasomal

303 genes), among others identified by the HIV life cycle/host interaction gene sets from Reactome

304 (Croft et al., 2014) (Figure 1–figure supplement 10B and Supplementary files 1C and D).

13 305 However, further studies will be required to precisely define the role of individual TSG and TDG

306 on viral replication.

307 Taken together, we have established the first comprehensive framework of target genes

308 in the human genome directly bound and regulated by Tat. Below we study the molecular

309 mechanisms of transcriptional control by examining genome-wide changes of Pol II and

310 chromatin signatures.

311

312 Global analysis of chromatin signatures indicates that Tat stimulates transcription at

313 both the initiation and elongation steps

314 With these functional insights we focused on the direct target genes to delineate the

315 mechanisms by which Tat regulates cellular transcription. Profiling of histone modifications has

316 revealed fundamental concepts in the regulation of transcription (Barski et al., 2007; Zhou et al.,

317 2011). To investigate whether Tat acts at the initiation or elongation steps, we performed ChIP-

318 seq of several histone modifications (in the absence and presence of Tat) that demarcate

319 distinct genomic domains such as promoters, coding units and enhancers and provide (in

320 general) information regarding their transcriptional status (Table 2).

321 Active promoters usually have adjacent nucleosomes bearing an H3K4me3 transcription

322 initiation mark (Bernstein et al., 2002; Guenther et al., 2007; Mikkelsen et al., 2007; Santos-

323 Rosa et al., 2002), and genes marked by H3K4me3 display significant amounts of

324 transcriptionally competent Pol II at promoters (Min et al., 2011). Thus, we first determined the

325 levels of H3K4me3 in both cell lines to test whether Tat modulates transcription initiation

326 (Figure 2A). After fixing TSS selection for a subset of Tat stimulated genes (TSG) such as

327 ADCYAP1 and ARHGEF7, where transcription starts at a short, internal isoform (Figure 2–

328 figure supplement 1A–B), we observed that the majority of TSG (excluding ATP9A, CD244,

329 ADCYAP1, CD226, SERINC2) are already marked with variable (low-to-high) H3K4me3 levels

330 and, as expected, its distribution mirrors the location of the promoter-adjacent nucleosomes

14 331 (Figure 2B). We thus defined two groups of genes based on H3K4me3 fold change levels in

332 the presence and absence of Tat (Tat/GFP). While the first group of genes (n = 17), referred to

333 as class I TSG, has undetectable or low H3K4me3 levels in the absence of Tat (Figure 2A and

334 Figure 2–figure supplement 1C–D), the second group of genes, referred to as class II TSG,

335 shows medium-to-high H3K4me3 levels surrounding the TSS (Figures 2A and B).

336 In the presence of Tat, class I TSG experiences a large increase in H3K4me3 density at

337 promoter-proximal regions (2-to-10–fold change) (Figures 2A and B, and Figure 2–figure

338 supplement 1C–D). Interestingly, Tat can selectively promote de novo transcription initiation at

339 a small subset of genes (such as ADCYAP1, ATP9A and CD244), as revealed by the large fold

340 changes in H3K4me3, and selection of a non-canonical or novel TSS (Figure 2–figure

341 supplement 1A, C and D). For example, Tat induces H3K4me3 at an internal site (intron 4) of

342 ADCYAP1 but not at the canonical TSS, which coincides with the production of a novel, short

343 isoform of a yet unknown function (Figure 2–figure supplement 1A). In contrast to class I

344 TSG, class II shows no change or, unexpectedly, a slight reduction in H3K4me3 levels in the

345 presence of Tat (~1.5-2–fold decrease) (Figures 2A and B), suggesting a post-initiation role for

346 Tat in activating these genes (see below).

347 The H3K4me3 density surrounding the TSS of class I TSG was typically lower (at least

348 5-fold less) than the signal in actively transcribed genes. As expected, these genes are

349 transcriptionally inactive or show low RNA levels (<10 Fragments Per Kilobase of transcript per

350 Million mapped reads (FPKM)) such as signaling peptide hormones (ADCYAP1), transcription

351 factors involved in immune system maturation (ZNF521, RORβ, ETV6) and genes essential for

352 T-cell maturation and responses (CD69, CD244). Interestingly, we observed a positive

353 correlation (R2=0.764) between the increase in transcript levels and H3K4me3 density nearby

354 the promoters of these genes, and that highly stimulated genes (ADCYAP1 and CD69)

355 experience larger increase in H3K4me3 density compared with other genes (Figure 2–figure

356 supplement 2A). On the other hand, class II TSG shows a much lower correlation (R2=0.177)

15 357 between the increase in transcript levels and H3K4me3 density surrounding promoters (Figure

358 2–figure supplement 2B), consistent with the idea that they are regulated at a post-initiation

359 step (see below).

360 If increased transcription initiation at these genes correlates with a productive increase in

361 RNA levels, then we would expect to find nucleosome modifications associated with

362 transcription elongation throughout the coding units (Li et al., 2007). Previous studies have

363 elucidated that in actively transcribed genes, H3K79me2/3-modified nucleosomes are present at

364 their highest levels shortly downstream of the TSS and that H3K36me3 modifications occupy

365 the entire gene body, with increasing density towards the 3’-end of the gene (Guenther et al.,

366 2007; Kolasinska-Zwierz et al., 2009; Li et al., 2007; Seila et al., 2008; Zhou et al., 2011). To

367 examine transcription states in detail, we carried out ChIP-seq experiments of H3K79me3 and

368 H3K36me3 using validated antibodies (Egelhofer et al., 2011). To assess Tat’s role in coupling

369 transcription initiation with elongation at class I TSG, we calculated the fold change in

370 H3K79me3 and H3K36me3 tag density in the presence and absence of Tat (Tat/GFP) as well

371 as changes in their gene average distribution in both class I and II TSG (Figures 2C–F). As

372 expected, we observed that class I TSG (activated at the initiation step) also shows evidence of

373 transcription elongation, based on the increase in H3K79me3/H3K36me3, total Pol II and

374 elongating Pol II levels, as well as recruitment of the P-TEFb kinase at two TSG (CD69 and

375 ADCYAP1) (Figure 2–figure supplement 3). While in the absence of Tat, class I TSG are

376 devoid of H3K79me3 throughout the gene, Tat promotes a robust increase in H3K79me3 (~6.2-

377 fold over GFP) just downstream of the TSS with a progressive decline towards the transcription

378 termination site (TTS) (Figures 2C–F, and Figure 2–figure supplement 4). Similarly, levels of

379 H3K36me3 at class I TSG are low in the gene body and increase towards the 3’-end of the gene

380 in the presence of Tat (~5.3-fold over GFP) (Figure 2F). Remarkably, the average distribution

381 patterns of both H3K79me3 and H3K36me3 in these genes are consistent with previous

382 genome-wide distribution analysis (Kouzarides, 2007; Li et al., 2007).

16 383 Because H3K4me3 levels do not increase in the class II TSG, we reasoned that these

384 genes are regulated by Tat at a post-initiation step. If this were the case, then we would expect

385 to find an increase in nucleosome modifications associated with transcription elongation marks

386 in the presence of Tat, despite the lack of H3K4me3 increase. To test this possibility, we

387 examined H3K36me3 and H3K79me3 distribution and density in these genes. We ignored

388 genes that lack H3K79me3 irrespective of Tat presence as well as intronless genes, which

389 complicated the density and distribution calculations, and thus ended with a more cohesive and

390 consistently behaved group of class II TSG (n = 43). As expected, most class II TSG showed

391 increased H3K79me3 (100%) and H3K36me3 (~82%) within their gene bodies (Figures 2C–F)

392 consistent with a role of Tat in promoting transcription elongation. Together, this analysis

393 suggests that the majority of target genes that do not experience transcription initiation changes

394 in response to Tat do show evidence of increased levels of promoter escape and transition to

395 active elongation based on the global analysis of chromatin signatures (Figure 2–figure

396 supplement 4) and Pol II (see below). Although the increase in H3K79me3 is significant,

397 H3K36me3 changes are smaller, probably because class II TSG is active and their transcribing

398 units already demarcated by H3K36me3 (Figure 2C–F). Again, the average gene distribution of

399 H3K79me3 and H3K36me3 correlates with previous studies suggesting that H3K79me3 is most

400 enriched shortly downstream of the TSS and H3K36me3 increases towards the middle and end

401 of the gene (Figures 2 D and F) (Kouzarides, 2007; Li et al., 2007).

402 Inspection of individual gene tracks from our genome-wide study reveals examples of

403 these two regulatory mechanisms (Figures 2G and H). At the neuropilin and tolloid-like 1

404 (NETO1) locus (class I TSG), which encodes a transmembrane receptor that plays a critical role

405 in spatial learning and memory, Tat binding at the promoter-proximal region correlates with a

406 sharp increase in H3K4me3 (~8-fold) with concomitant increases in H3K79me3 and H3K36me3

407 downstream of the TSS (Figure 2G). Conversely, at the VAV3 guanine nucleotide exchange

408 factor locus (class II TSG), the H3K4me3 signature surrounding the TSS does not appear to

17 409 change, while the markers of active transcription in coding units (H3K79me3 and H3K36me3)

410 showed large increases in the presence of Tat, indicating a role in transcription elongation

411 (Figure 2H). Heatmap representation of the density of all three chromatin signatures at

412 promoter and gene body of class I and class II TSG demonstrate that this is common for all

413 target genes, albeit with differences in fold change (Figure 2–figure supplement 4A–E), which

414 is in agreement with the broad distribution of gene expression changes (Figure 1D).

415 Given that we defined two classes of genes using the minimalistic Tat ectopic

416 expression system, we asked whether these genes are also modified in a similar manner (at the

417 initiation or elongation steps) during HIV infection. To test this, we analyzed the levels of

418 initiating (promoter-proximal) and elongating (promoter-distal) transcripts in primary TCM cells

419 infected with replication-competent HIV versus mock infection. Interestingly, we observed that

420 two class I TSG (CD69 and PPM1H) and two class II TSG (VAV3 and ANXA1) showed

421 evidence of increased initiation or elongation, respectively (Figure 1–figure supplements 8C

422 and D). This implies that the model of cellular reprogramming by the ectopic expression of Tat

423 alone is mirrored (at least for the target genes examined) during HIV infection of primary T cells.

424 Several enzymes are known to regulate histone modifications associated with distinct

425 epigenetic states. If Tat promotes cellular reprogramming by modifying the epigenetic

426 landscape then we would expect Tat to interact with chromatin-modifying enzymes associated

427 with the respective histone modifications. While H3K79me3 is generated by the Dot1L complex

428 (Nguyen and Zhang, 2011), the H3K36me3 mark is imposed by the SetD2 methyltransferase

429 recruited by elongating Pol II and enriched within the body of transcriptionally active genes

430 (Guenther et al., 2007; Krogan et al., 2003a; Krogan et al., 2003b; Nguyen and Zhang, 2011;

431 Pokholok et al., 2005). To test if Tat does indeed interact with these enzymes, we first affinity

432 purified (AP) Strep-tagged Tat (or GFP as negative control) from nuclear fractions of the Jurkat

433 T-cell lines and observed that Tat binds both Dot1L and SetD2 as well as the P-TEFb kinase

434 (CDK9) used as positive control in this assay (Figure 2–figure supplement 5A). To further test

18 435 whether changes in the epigenetic landscape directly correlate with the recruitment of these

436 chromatin modifiers to specific genes, we performed ChIP followed by quantitative PCR (ChIP-

437 qPCR) and found that the Tat-mediated recruitment of Dot1L and SetD2 to the CD69 locus

438 (class I TSG) correlates well with the increase in the histone modifications associated with

439 transcription elongation as well as Pol II (Figure 2–figure supplement 5B–C). To provide a

440 functional link between the Tat-mediated recruitment of these chromatin modifiers and gene

441 expression changes, we used short hairpin RNAs (shRNAs) to target Dot1L and SetD2 by RNA

442 interference (Figure 2–figure supplement 5D–H). Interestingly, we noted that after knockdown

443 of these chromatin-modifying enzymes as well as of CDK9, the increase in CD69 RNA levels

444 (but not RPL19) in response to Tat is virtually abolished, implying that Tat-mediated recruitment

445 of Dot1L, SetD2 and P-TEFb to TSG is a requisite for their increased transcription elongation

446 levels, in agreement with the ChIP data (Figure 2–figure supplements 3 and 5).

447 The discovery of these key enzymes as Tat targets and the evidence that they play

448 important roles in Tat-mediated cellular gene expression alterations suggest that this regulatory

449 mechanism is part of the reprogramming of target immune cells by Tat. Collectively, we have

450 described a set of human genes stimulated by Tat at two different steps in the transcription

451 cycle (initiation and elongation) using selective chromatin-modifying enzymes and the

452 transcription elongation machinery.

453

454 Tat promotes Pol II recruitment and pause release to induce initiation and elongation at

455 different gene classes

456 Pol II regulates the control of transcription initiation and elongation in the context of chromatin

457 (Fuda et al., 2009). In fact, the deposition of histone modifications associated with initiation and

458 elongation at promoters and gene bodies has been functionally linked to levels of recruited and

459 transcriptionally engaged Pol II, respectively (Adelman and Lis, 2012). Therefore, if Tat

460 promotes transcription initiation and elongation as proposed (Figure 2), we would expect Pol II

19 461 levels to fluctuate in response to Tat in ways that reflect the specific mode of activation. Pol II is

462 recruited to gene promoters to initiate transcription, but also tends to occupy promoter-proximal

463 regions of genes that show evidence of initiation but no or inefficient elongation, which is

464 referred to as promoter-proximal pausing (Adelman and Lis, 2012; Fuda et al., 2009; Rahl et al.,

465 2010; Wade and Struhl, 2008). To further investigate whether Tat promotes Pol II recruitment

466 (indicative of transcription initiation) or increases Pol II levels within gene bodies (indicative of

467 transcription elongation) we used ChIP-seq to examine Pol II distribution in the absence and

468 presence of Tat (Figure 3).

469 We first analyzed levels of Pol II at both Tat binding sites (Tat peak) and promoters of

470 target genes in cases where Tat only binds to promoter-distal sites. Interestingly, we found that

471 Tat stimulates Pol II recruitment at both the Tat binding site and promoter-proximal region of

472 class I TSG (Figures 3A–C), in perfect agreement with the Tat-mediated increase in H3K4me3

473 at those gene promoters (Figures 2A and B). Notably, this observation is consistent with the

474 model that Tat mediates transcription initiation at class I TSG. Conversely, as expected, we did

475 not see higher levels of Pol II at the promoters of class II TSG in response to Tat (Figures 3A,

476 B and D). This lack of Pol II recruitment further supports our proposed model based on the

477 global analysis of chromatin signatures that class II TSG are not regulated at the initiation step

478 (Figure 2).

479 Interestingly, a metagene analysis of class I TSG in the absence of Tat showed very low

480 Pol II density levels in both the promoter and transcribing unit. Conversely, Tat induces an

481 increase in Pol II density throughout the gene body with a noticeable peak at the promoter-

482 proximal region, consistent with Pol II recruitment to promoters and transition into elongation

483 (Figure 3E) and with the increase in H3K4me3 at the promoter-proximal regions associated

484 with those genes (Figures 2A and B, and Figure 2–figure supplement 3). Genome browser

485 views of individual class I TSG such as NETO1 exemplify the metagene analysis depicting that

486 Pol II is strongly recruited to the intragenic Tat binding site and promoter, marked with

20 487 H3K4me3 in the presence of Tat (Figure 3F), and that transcription initiation correlates with

488 increase in chromatin signatures (H3K79me3 and H3K36me3) associated with active

489 transcription in gene bodies, as well as transcribing Pol II (Figure 3–figure supplement 1A).

490 We have previously defined class II TSG as being primarily regulated at the elongation

491 step because H3K4me3 and Pol II density at the promoter-proximal region do not increase in

492 the presence of Tat (Figures 2 and 3A–D). Examination of two class II TSG (VAV3 and CD82)

493 using Chip-qPCR clearly demonstrates that Tat induces Pol II elongation by recruiting the P-

494 TEFb kinase at those target genes (Figure 3–figure supplement 2A–B). Consistently, a

495 metagene analysis shows that Tat largely increases Pol II density throughout the gene body and

496 3’-end of class II TSG, but not at the promoter, in agreement with a role of Tat in promoting the

497 transition to elongation (Figure 3G). Therefore, in the absence of Tat stimulation, the majority

498 of Pol II at class II TSG accumulates in the promoter-proximal region with a peak just

499 downstream of the TSS while Tat strongly induces increased Pol II density in the gene body but

500 not in the promoter-proximal region (Figure 3G). Genome browser views of individual class II

501 TSG such as VAV3 are consistent with the elongation function (Figure 3H), and are also in

502 perfect agreement with the analysis of chromatin signatures related to elongation (H3K79me3

503 and H3K36me3) (Figure 2 and Figure 3–figure supplement 1B).

504 Collectively, the data indicate that Tat controls both Pol II recruitment and pause release

505 to promote initiation and elongation in different gene classes.

506

507 Tat stimulates transcription initiation from intragenic enhancers by inducing gene

508 looping

509 Tat is recruited to two different genomic domains (promoters and intragenic) irrespective of the

510 transcription step regulated (initiation or elongation) (Figure 4A), implying that the site of Tat

511 recruitment to its target genes does not dictate the mechanism of transcription activation. To

512 further elucidate how Tat promotes transcription by binding to promoters or intragenic sites, we

21 513 examined how chromatin signatures associated with these genomic domains change in

514 response to Tat. Promoters bound by Tat are marked with the expected signature: high

515 H3K4me3 and low H3K27Ac (Figure 4B) (Creyghton et al., 2010; Heintzman and Ren, 2009).

516 Conversely, intragenic sites bound by Tat are marked with the enhancer signature: high

517 H3K27Ac and low H3K4me3 (Figure 4B), as well as high H3K4me1 (data not shown)

518 (Creyghton et al., 2010; Heintzman and Ren, 2009; Zhou et al., 2011). This suggests that the

519 intragenic Tat-binding sites appear to be intragenic enhancers, which have been previously

520 proposed to function as alternative elements required for gene activation (Kowalczyk et al.,

521 2012).

522 To better define the roles of Tat in activating transcription from these promoter-distal,

523 intragenic sites, we sorted class I and class II TSG based on the H3K27Ac density surrounding

524 the Tat peak. Notably, we observed that in the absence of Tat, the intragenic sites at class I

525 TSG have low or undetectable levels of H3K27Ac (Figure 4C), consistent with the idea that

526 these genes are inactive or only minimally transcribed in the basal state without Tat (Figure 2).

527 However, Tat increases H3K27Ac density near its intragenic binding sites by ~2–10-fold

528 depending on the gene (Figures 4C and D). Conversely, in the absence of Tat, H3K27Ac

529 levels at the intragenic sites of class II TSG are high and Tat increases their density, albeit with

530 a lower fold-change than in class I TSG (Figures 4C and D). This is consistent with the model

531 that these genes are active in the basal state and Tat activates a post-initiation step, namely

532 transcription elongation. Metagene analysis of class I TSG indicates that the H3K27Ac mark at

533 intragenic binding sites is virtually absent at the immediate binding site itself but high in the

534 nucleosomes directly surrounding the Tat peak, progressively declining further up and

535 downstream from the binding site (Figure 4E). At these sites, Tat increased H3K27Ac levels an

536 average of ~3-fold. A similar H3K27Ac distribution pattern is observed in class II TSG, even

537 though the magnitude of H3K27Ac increase is smaller (~1.5-fold) because these genes are

538 already active in the absence of Tat (Figure 4F).

22 539 Given that the histone acetyl transferase p300/CBP is a well-known Tat interactor and

540 that it is recruited to enhancers to facilitate transcription activation through chromatin acetylation

541 (including H3K27Ac) (Hottiger and Nabel, 1998; Jager et al., 2012; Kim et al., 2010b), we asked

542 whether Tat recruits p300 to these intragenic sites to trigger H3K27Ac. To test this possibility,

543 we performed ChIP assays on one class I TSG (PPM1H) and one class II TSG (CD82).

544 Notably, we observed that in the presence of Tat, the increase in H3K27Ac levels at class I

545 intragenic sites in response to Tat mirrors the recruitment of p300 (Figure 4G). However, levels

546 of both H3K27Ac and p300 detected at the intragenic sites of class II TSG are already high in

547 the basal state and are slightly induced (~1.2-1.4-fold) in response to Tat (Figure 4H).

548 Remarkably, we observed a sharp correlation (R2=0.853) between increased H3K27Ac

549 density at the intragenic site and gene expression levels at class I TSG (Figure 4–figure

550 supplement 1A), supporting a model in which Tat is recruited to these sites to induce de novo

551 transcription activation through recruitment of p300 at internal sites. However, the correlation

552 between H3K27Ac density at the intragenic site and gene expression levels at class II TSG is

553 quite low (R2=0.0728) because these genes are already marked by H3K27Ac in the absence of

554 Tat (Figure 4–figure supplement 1B). These results support the model that Tat binds at

555 intragenic sites of non-productive genes (class I TSG) to induce their transcription initiation or is

556 recruited to productive genes (actively transcribed) such as class II TSG to augment

557 transcription elongation levels.

558 Because these Tat binding sites are located distally from the promoter in the majority of

559 class I TSG (15 out of 17), we hypothesized a model in which Tat controls gene looping to

560 induce spatial proximity between the intragenic site (putative enhancer) and the promoter. To

561 further test this possibility in detail we selected one class I TSG (PPM1H), which is about 300 kb

562 in length, thus facilitating the analysis of long-range chromatin interactions (Figure 4I). We

563 performed conformation capture (3C) to assess the association of two intragenic

564 Tat target sites and the promoter. We also performed a similar analysis between the gene

23 565 promoter and distal intergenic sites located at a similar distance from the promoter but not

566 bound by Tat. We used anchoring points near the gene promoter to measure the extent of

567 chromatin looping between the promoter and the intragenic Tat-bound sites or the distal control

568 domain. Interestingly, this analysis revealed that prior to Tat stimulation there is no obvious

569 interaction between the gene promoter and the two intragenic Tat-bound sites. However, Tat

570 promotes a strong association between the two sites, albeit with different crosslinking

571 efficiencies (Figure 4I), and this gene looping needs a functional Tat, because the C22A non-

572 functional Tat mutant does not promote this effect (Figure 4–figure supplement 2). Moreover,

573 the effect of Tat on chromatin looping is specific as there was no detectable association when a

574 control anchor primer was placed about 20 kb upstream from the PPM1H promoter (Figure 4–

575 figure supplement 2).

576 Gene looping could be a direct consequence of Tat activity at promoters and enhancers,

577 or Tat can simply help activate enhancers, causing them to increase looping/proximity to their

578 target promoters (Bulger and Groudine, 2011). To distinguish between these two possibilities

579 we used flavopiridol (FP), a CDK9 inhibitor that, in addition to blocking transcription elongation,

580 also inhibits the production of eRNAs (enhancer-derived RNAs), without affecting other

581 molecular indicators of enhancer activity (such as Pol II binding, H3K4me1 levels, and gene

582 looping) (Hah et al., 2013). We used FP in the Jurkat-GFP and -Tat cell lines and performed 3C

583 experiments to uncouple the assembly of enhancer complexes and gene looping per se from

584 eRNA production. The data suggests that Tat promotes gene looping even in the absence of

585 eRNA synthesis (Figure 4–figure supplement 3A–D), implying that eRNA synthesis occurs

586 after assembly of Tat and the transcription machinery at the enhancer, and enhancer-promoter

587 communication.

588 We have suggested that class II TSG have paused Pol II, and it is known that promoter-

589 enhancer loops are associated with paused Pol II (Ghavi-Helm et al., 2014). We thus wished to

590 test whether gene looping is also taking place at class II TSG, and if Tat plays any role. We first

24 591 analyzed the percentage of class II TSG containing intragenic Tat-bound sites (n = 34 out of 43,

592 79%) and then examined whether gene looping is critical for their expression and whether Tat

593 promotes the looping (Figure 4–figure supplement 4). For this we selected one class II TSG

594 (CD82), which shows a strong Pol II peak promoter-proximally, has one Tat-bound site

595 intragenically, and shows evidence of looping between the promoter and the intragenic Tat-

596 bound site. We found no evidence that Tat modulates gene looping at the CD82 loci thereby

597 indicating that Tat's effects at class II TSG is at a step post-formation of long-range chromatin

598 interactions (Figure 4–figure supplement 4), in agreement with a role of Tat in promoting

599 elongation.

600 In conclusion, class I TSG is transcriptionally inactive (or transcribed at a low rate) and

601 Tat binds to intragenic sites to induce chromatin looping and transcription activation. On the

602 other hand, class II TSG is actively transcribed and Tat promotes a post-initiation step (namely

603 Pol II elongation) by binding to both promoters and/or intragenic sites without affecting gene

604 looping.

605

606 Implementation of a modified traveling ratio algorithm indicates that Tat promotes

607 transcription elongation at class II TSG

608 While at class I TSG Tat promotes the initiation step, at class II TSG Tat appears to induce Pol

609 II transition into the elongation phase (Figures 2 and 3). Previously, an algorithm that

610 computes Pol II levels in promoter-proximal regions and gene body (termed Traveling Ratio

611 (TR)) has been devised to examine the transition from initiation into elongation (Figure 5A)

612 (Rahl et al., 2010; Reppas et al., 2006; Wade and Struhl, 2008; Zeitlinger et al., 2007). We

613 thought to apply such an algorithm to examine Tat functions in initiation and elongation by

614 comparing Pol II levels found in promoter-proximal regions (-50 to +1000 respective to the TSS)

615 versus levels within the gene body. Unexpectedly, we found in our ChIP-seq datasets that in

616 both the presence and absence of Tat, Pol II levels tend to peak not only near the promoter of

25 617 many genes, but also at certain intronic domains marked by H3K27Ac and H3K4me1,

618 sometimes at densities similar to those found at the promoter-proximal region (Figure 5B).

619 These sites do not contain any known annotation in the UCSC genome browser and does not

620 appear to contain a non-canonical TSS because the H3K4me3 levels are low-to-undetectable.

621 Thus, to examine if these Pol II forms were technical artifacts, we performed ChIP-seq with

622 other Pol II Abs (total Pol II, CTD, Ser5P-CTD and Ser2P-CTD), both in the absence and

623 presence of Tat, and found that, irrespective of the Ab used, this stably paused Pol II form was

624 detected in the gene body, albeit with variable density levels (Figure 5–figure supplement 1).

625 Similarly, genome browser searches revealed that intragenically poised Pol II is also detected in

626 primary CD4+ T cells, thus ruling out the possibility that it is an artifact of Pol II distribution

627 present in immortalized T cell lines growing in tissue culture. Moreover, examination of our

628 ChIP-seq chromatin signature tracks indicated that the sites of intragenic Pol II pausing

629 correspond to potential enhancers because they are marked with the corresponding enhancer

630 signature (high H3K4me1/H3K27Ac and low H3K4me3) (Figure 5B) (Heintzman and Ren,

631 2009).

632 Given that we found intragenic enhancers containing a form of Pol II that appears to be

633 paused, and that enhancer-promoter interactions are frequently associated with paused Pol II

634 (Ghavi-Helm et al., 2014), we speculated that the original TR algorithm might not accurately

635 describe Tat’s role in transcription elongation in our dataset. The problem arises because the

636 TR algorithm treats all Pol II in the gene body as actively elongating. This assumption is invalid

637 in cases where Pol II appears paused intragenically, or is detected at an intragenic enhancer

638 because of chromatin looping. In order to circumvent this problem we devised a custom-made

639 algorithm referred to as “modified TR” (mTR) to identify intragenic enhancers marked by

640 H3K4me1/H3K27Ac and paused Pol II. mTR accurately categorizes these sites to more

641 precisely calculate the promoter-proximal and gene body Pol II counts (Figure 5C). We defined

642 paused Pol II (pPol II) as Pol II located in both the promoter-proximal region (-50/+1000 from the

26 643 TSS) and at intragenic sites marked with H3K27Ac (as determined by MACS2, with a peak

644 cutoff of p-value < 0.05) and having a read density/nt greater than five times that of the average

645 density/nt within the gene body. Of note, we used a window of -50/+1000 from the TSS rather

646 than -50/+300 because several genes contained accumulation of Pol II beyond the +100, and

647 up to the +1000 position. Transcribing Pol II (tPol II) is defined as Pol II in the remainder of the

648 gene that is not associated with the enhancer signature. Then, the mTR is the ratio of average

649 pPol II (promoter+gene body) density to average tPol II (gene body) density (Figure 5C). As

650 expected, class II TSG showed decreased mTR in the presence of Tat compared to the GFP

651 cell line (Figure 5D).

652 Despite this compelling observation, the mTR could be altered in response to Tat either

653 by fluctuations in the levels of pPol II, tPol II, or both. Therefore, to more clearly explore Tat’s

654 role in controlling elongation, we computed densities for both Pol II classes (pPol II and tPol II)

655 and compared them separately in the GFP and Tat cell lines. For each gene, we calculated the

656 ratio of average pPol II and tPol II densities in the two cell lines (Tat/GFP) and plotted them on

657 the X- and Y-axis, respectively (Figure 5E). This tool plot demonstrates clearly the Pol II

658 behavior in response to Tat, both in terms of changes to initiation and elongation. While class I

659 TSG showed profound increase in pPol II and tPol II as consequence of transcription

660 initiation/elongation activation, class II TSG shows, albeit with gene-specific differences, higher

661 evidence of elongation (increased tPol II form in the presence of Tat) (Figure 5E).

662 Together, the data suggest that Tat has evolved to fine-tune both transcription initiation

663 and elongation steps at functionally different gene classes. At class I TSG, Tat directly

664 mediates Pol II recruitment to gene promoters or binds to intragenic sites to induce chromatin

665 looping (promoter-enhancer communication) to trigger transcription initiation (Figure 5F). At

666 class II TSG, Tat binds to already engaged Pol II to promote its escape into the productive

667 elongation phase (Figure 5G).

668

27 669 Global analysis of chromatin signatures and Pol II distribution indicate that Tat

670 downregulates transcription by blocking both initiation and elongation

671 Global analysis of chromatin signatures in stimulated genes revealed that Tat induces both

672 transcription initiation and elongation (Figure 2). Thus, we computationally examined chromatin

673 signatures throughout promoter-proximal and -distal regions of Tat downregulated genes (TDG)

674 to determine at which step in the transcription cycle Tat blocks gene activation. Despite the

675 identification of genes having marked changes in both chromatin initiation (H3K4me3) and

676 elongation signatures (H3K79me3 and H3K36me3) such as CD1E and FBLN2, or chromatin

677 elongation signatures only such as EOMES and HSPA8, this distinction did not become as clear

678 for genes having low gene expression changes (<3-fold). Therefore, we used a combined

679 chromatin/Pol II signature to identify initiation- (class I) and elongation-regulated (class II) TDG

680 (Figure 6). We defined class I TDG as genes having chromatin and Pol II signatures consistent

681 with inhibited transcription initiation, with at least 60% decrease in H3K4me3 surrounding the

682 TSS and 30% decrease in promoter-proximal Pol II with Tat (fold H3K4me3 Tat/GFP<0.4 and fold

683 Pol IITat/GFP<0.7). Conversely, class II TDG are inhibited at later stages of transcription because

684 H3K4me3 and Pol II promoter-proximal levels remain virtually unchanged in the presence of Tat

685 (fold H3K4me3 Tat/GFP>0.7 and fold Pol IITat/GFP>0.75). Therefore, we defined class II TDG as

686 genes having less than 30% decrease H3K4me3 surrounding the TSS and less than 25%

687 decrease in promoter-proximal Pol II level in response to Tat. Using this signature, we identified

688 11 class I TDG and 14 class II TDG that strictly pass the above mentioned criteria. Remarkably,

689 these genes also appear to be controlled at the initiation and elongation level in primary TCM

690 cells infected with HIV (Figure 1–figure supplement 8E–F). Although further research is

691 needed to pinpoint the details, the rest of the genes could be a mixed class I-II population in

692 which Tat simultaneously interfere with both the initiation and elongation steps (see Discussion).

693 Class I TDG experience a sharp decrease in H3K4me3 (~4.2-fold change) and in Pol II

694 (see below) in the presence of Tat (Figures 6A and B), consistent with a role of Tat in blocking

28 695 transcription initiation. In support of this view, we observed a statistically significant correlation

696 between the decrease in H3K4me3 levels surrounding the TSS and RNA levels at class I TDG

697 but not class II (data not shown). To further examine how Tat blocks the initiation step, we

698 performed detailed ChIP-qPCR assays to monitor the occupancy of subunits of the transcription

699 pre-initiation complex. Interestingly, we observed that Tat recruitment to the CD1E gene

700 promoter blocks the step of PIC assembly as denoted by the sharp decrease in the occupancy

701 levels of TBP, Mediator (MED1), Pol II and the P-TEFb kinase (CDK9) (Figure 6–figure

702 supplement 1A), in agreement with the virtual removal of H3K4me3 (Figures 6A and G). On

703 the contrary, class II TDG showed no significant change in H3K4me3 density surrounding the

704 TSS (Figures 6B and H), in agreement with the lack of Tat effects at the transcription initiation

705 level. Consistently, at the class II TDG EOMES, Tat does not interfere with PIC assembly (as

706 revealed by identical levels of TBP and MED1 at the promoter). However, Tat blocks Pol II

707 transition into productive elongation due to dismissal of P-TEFb from both the promoter and

708 transcription unit (Figure 6–figure supplement 1B), which affects levels of total Pol II and the

709 elongating form (Ser2P) in the gene body but not at the promoter-proximal region at two class II

710 TDG (EOMES and HSPA8) (Figure 3–figure supplement 2C–D).

711 If Tat blocks transcription initiation, then we would predict that the chromatin elongation

712 signatures would also decrease, as there is no recruited Pol II available for elongation. In

713 agreement, class I TDG showed a virtual elimination of H3K79me3 and H3K36me3 levels in the

714 transcribed unit, reflecting potent inhibition of these genes (Figures 6C–F). The metagene

715 analysis clearly indicated that all class I TDG showed reduced levels of both chromatin

716 elongation signatures (Figures 6D and F). However, the difference in magnitude at different

717 genes was very broad, most likely due to the fact that these genes are transcribed at different

718 levels and their abundance (based on RNA-seq) is quite disparate (Figures 6C and E).

719 Genome browser inspection of individual genes like CD1E, which is expressed at high levels,

720 provide evidence that Tat is a very potent transcription initiation repressor, as revealed by the

29 721 large drop in all chromatin signatures profiled (Figure 6G) and inhibition of PIC assembly at the

722 promoter, as revealed by the dismissal of TBP, MED1, CDK9 and Pol II at the CD1E loci

723 (Figure 6–figure supplement 1A).

724 In contrast to class I TDG, class II showed no significant change in H3K4me3 levels

725 surrounding the TSS (Figures 6A–B). However, a subset of class II genes such as EOMES,

726 HSPA8 and CDK6, showed reduced chromatin elongation signatures (H3K79me3 and

727 H3K36me3) throughout the transcribing unit (Figures 6C, E, and H). Genome browser

728 inspection of individual class II TDG like EOMES demonstrate that Tat blocks the transition to

729 elongation because Pol II density largely decreases at the transcribing unit without significant

730 alterations in the promoter-proximal region, consistent with decreased chromatin elongation

731 signatures but no H3K4me3 effects (Figure 6H). However, surprisingly, a few genes showed

732 no change in H3K79me3 modification, or even a small increase, right after the TSS (Figure 6C),

733 possibly related to the fact that Dot1L-mediated establishment of H3K79me3 also has been

734 linked with transcriptional repression (Nguyen and Zhang, 2011; Onder et al., 2012). However,

735 this alternative function of Dot1L in transcription repression will require further investigation.

736 Taken together, it is evident that Tat functions as a transcriptional repressor blocking Pol

737 II recruitment and pause release as well as promoting the removal of chromatin modifications

738 associated with initiation and elongation at different gene classes.

739

740 Tat blocks Pol II recruitment and pause release to repress transcription initiation and

741 elongation

742 The fact that Tat modifies the epigenetic landscape to repress cellular transcription prompted us

743 to examine Tat effects on Pol II distribution changes (Figure 7). If Tat represses transcription

744 initiation by blocking PIC assembly such as in the CD1E gene then we would expect Pol II

745 levels to decrease at gene promoters and other functionally relevant sites in the presence of

746 Tat. We first analyzed levels of Pol II at both Tat binding sites (Tat peak) and promoters of

30 747 associated genes for both class I and class II TDG. Interestingly, we found that Tat blocks Pol II

748 recruitment at both the Tat binding site and promoter-proximal region of class I TDG (Figures

749 7A–C), in perfect agreement with the Tat-mediated decrease in H3K4me3 at those promoters

750 (Figure 6). Remarkably, these observations are in agreement with the model that Tat blocks

751 transcription initiation at class I TDG. Conversely, as expected based on the global analysis of

752 chromatin signatures, Tat does not appear to largely block Pol II recruitment at the Tat peak and

753 promoter of class II TDG (Figures 7A, B and D), which we proposed to be regulated at the

754 elongation level based on the global analysis of chromatin signatures and Pol II levels (Figure

755 6). In fact, this is in agreement with the lack of H3K4me3 density changes at promoters of class

756 II TDG in the presence of Tat.

757 Metagene analysis of class I TDG showed high Pol II density levels in both the gene and

758 transcribing unit in the absence of Tat. However, Tat reduces Pol II density at the promoter,

759 consistent with a drop in transcribing Pol II at the gene body (Figure 7E). Genome browser

760 views of individual class I TDG such as CD1E clearly exemplify the metagene analysis depicting

761 that Pol II recruitment at the promoter is strongly blocked, leading to the virtual disappearance of

762 Pol II signal throughout the gene body. The transcription initiation signature H3K4me3 is also

763 eliminated (Figure 7F), and the blockage of transcription initiation correlates with decrease in

764 chromatin signatures associated with transcription elongation (H3K79me3 and H3K36me3)

765 throughout the gene body (Figure 6).

766 We have defined class II TDG as being primarily controlled at the elongation step

767 because H3K4me3 and Pol II density at the promoter-proximal region does not appear to be

768 modified by Tat (Figures 6 and 7). Consistently, a metagene analysis shows that Tat

769 decreases Pol II density throughout the gene body and 3’-end, but not at the promoter-proximal

770 region, in agreement with a role of Tat in blocking the transition to elongation (Figure 7G).

771 Genome browser views of individual class II TDG such as EOMES are consistent with the block

772 of Pol II pause release (Figure 7H), and with the global analysis of chromatin signatures related

31 773 to elongation (Figure 6). Reduced elongation is due to the block of P-TEFb recruitment to

774 promoters at class II TDG (EOMES and HSPA8) (Figure 3–figure supplement 2). Moreover,

775 simultaneous analysis of the poised and transcribing Pol II forms (pPol II and tPol II) using our

776 mTR algorithm provides further evidence that, at the majority of class II TDG, Tat dampens the

777 level of tPol II without largely modifying or even increasing pPol II (Figure 7–figure supplement

778 1), consistent with a role in primarily blocking the transition into elongation.

779 In conclusion, the data indicate that Tat precisely modulates Pol II recruitment and

780 transition into the gene body to block transcription initiation or elongation, respectively, at

781 different gene classes.

782

783 Enrichment of motifs for master transcriptional regulators but not of TAR-like motifs in

784 the Tat binding sites

785 Given that Tat activates transcription from the HIV promoter by associating with the TAR

786 structure that is formed at the nascent chain of viral pre-mRNAs (Frankel and Young, 1998)

787 (Figure 8A), we investigated the possibility that Tat is recruited to the host chromatin through

788 interaction with TAR-like structures. To test whether the Tat sites at the direct target genes

789 (both TSG and TDG) contain TAR-like structures, we searched for TAR-like motifs using a

790 custom algorithm and the input sequence X(2)GATX(1,2)GAX(4,40)TCTCX(2) as query (Figure 8B),

791 where X denotes any nucleotide, and the numbers in brackets represent the minimal and

792 maximal allowed positions with the corresponding secondary structure pattern XX((XX((X(4–40)-

793 X))))XX including the di-/tri-nucleotide bulge within the stem and loop (both critical determinants

794 for TAR binding) (Frankel and Young, 1998). Locations of TAR-like motifs near target genes

795 were cataloged and compared to distribution of Tat peaks in our ChIP-seq dataset. While ~20%

796 of the TSG and TDG contain TAR-like motifs within a very large window (<10 Kb) from the Tat

797 site, an insignificant number (~1%) contain a TAR-like motif in close proximity (<0.1 Kb) to the

798 Tat site identified by ChIP-seq (Figures 8C and D). The data suggest that there is no

32 799 significant presence of TAR-like motifs at or near Tat peaks, and only minimal examples of a

800 TAR-like motif within 10 Kb of a Tat peak. Thus, it seems unlikely that the more prevalent

801 mechanism of Tat recruitment to its target genes is through interaction with HIV TAR mimics.

802 Given that Tat’s interaction with its target genes appears not to rely on TAR-like motifs,

803 we reasoned that Tat might be recruited through interaction with master transcriptional

804 regulators. To identify candidates, we submitted 200-bp windows surrounding Tat peaks from

805 TSG and TDG to the DREME (Discriminative Regular Expression Motif) analysis tool (Bailey,

806 2011), which returned motifs matching T cell master regulators (ETS1, RUNX1) and GATA3

807 transcription factors for TSG, and ETS1 and RUNX1 for TDG, with statistically significant higher

808 p-values in TSG (2.5x10-18 and 9.6x10-14, respectively) (Figures 8E and F), without enrichment

809 of other factors expressed in T cells such as members of the Signal Transducer and Activator of

810 Transcription (STAT) family. Remarkably, enrichment of both ETS1 and RUNX1 at TSG

811 correlates with the observed functional annotation and their known role in modulating T cell

812 activation (Figure 1) (Hollenhorst et al., 2009; Hollenhorst et al., 2011).

813 Given that ETS1 motifs are present at both TSG and TDG, we examined whether the

814 presence of these motifs inform about the mode of Tat effect on cellular genes. To test this, we

815 looked to see whether ETS1 motifs are more prevalent in class I or class II TSG or TDG, and

816 observed that the number of target genes containing ETS1 binding sites within 100-bp from the

817 center of the Tat peak is: 11/17 (64.7%) for class I TSG, 36/43 (83.72%) for class II TSG, 8/14

818 (57.1%) for class I TDG, and 6/11 (54.5%) for class II TSG. This analysis indicates that the

819 presence of an ETS1 motif at the Tat target genes is not by itself sufficient to determine the

820 mechanism of action (stimulation or downregulation) or whether Tat modulates the initiation or

821 elongation step of transcription. ETS1 may help recruit Tat to chromatin but it needs another

822 determinant for specificity or mode of action, most likely related to the transcriptional activity

823 status or yet unidentified co-factors (protein and/or long-non coding RNA), which might function

824 in a gene-specific manner.

33 825 Given that the highest-confidence motif returned for both TSG and TDG is ETS1, we

826 examined whether this was indicative of co-occupancy of ETS1 with Tat in these genes. We

827 first retrieved ETS1 ChIP-seq data in Jurkat T cells (Hollenhorst et al., 2009) and compared

828 locations of ETS1 peaks to the locations of Tat peaks in our dataset (Figures 8G and H).

829 Contrary to our results when comparing TAR-like motifs sites to Tat ChIP-seq peaks, we

830 discovered that 73% of TSG and 80% of TDG contain an ETS1 ChIP-seq peak within 100-bp of

831 a Tat ChIP-seq peak. Together, the data suggest a model where Tat is recruited to host cell

832 chromatin through interaction with T-cell identity factors such as ETS1 and not through TAR-like

833 motifs, thus revealing unique and unexpected recruitment mechanisms. ETS1 ChIP-seq

834 revealed 19049 sites in the genome (Hollenhorst et al., 2009). Interestingly, nearly 52% of the

835 Tat peaks detected by ChIP-seq (3203/6117) harbor an ETS1 binding event within 100-bp,

836 significantly more than expected from random occurrence (p-value 1x10-8536, Hypergeometric

837 test).

838 Although a large number of both TSG and TDG contain ETS1 motifs, the enriched motifs

839 are different between both classes (5’-AGGAAG/AT/C-3’ and 5’-CNGGAA-3’, respectively)

840 (Figures 8E and F), even though both contain the signature motif for ETS1 binding (5’-GGAA-

841 3’). Thus, it would be interesting to determine whether these different motifs dictate a diverse

842 binding mode and could inform about the mechanism of activation and repression through the

843 same transcription factor, and whether the DNA binding site directs recruitment of specific

844 cofactors differentially targeted by Tat at TSG and TDG.

845 Given that we found no significant evidence of Tat interaction with TAR mimics in the

846 human genome as well as evidence of Tat interaction with ETS1, we reasoned that one clear

847 mechanism by which Tat is recruited to chromatin is through direct protein-protein interactions.

848 To biochemically test that Tat is recruited to chromatin in a RNA-independent manner, we

849 performed a cellular fractionation by sequential salt extraction in the absence and presence of

850 RNase (Figure 8–figure supplement 1A) and observed that: (i) Tat is present in the

34 851 nucleoplasm (100 mM salt fraction) as well as bound to chromatin eluted at (600 mM salt

852 fraction) from Jurkat nuclear extracts, while GFP is mainly detected in the 100mM salt fraction;

853 and (ii) the interaction between Tat and chromatin is primarily dictated by protein-protein

854 interaction, although a minor fraction appears to be RNA-dependent (Figure 8–figure

855 supplement 1B). To test this possibility further, we performed ChIP-qPCR assays at select

856 TSG and TDG by preparing samples in the presence and absence of RNase and observed that

857 the treatment does not affect Tat binding to four different target gene promoters (Figure 8–

858 figure supplement 1C). Although we cannot strictly rule out that Tat can combinatorially

859 interact with target proteins and RNA species present at different genomic domains, our data

860 indicate that Tat interaction with chromatin is primarily dictated by protein-protein interactions.

861

862 ETS1 recruits Tat to chromatin to reprogram cellular transcription

863 Given that we found a significant enrichment of ETS1 motifs in Tat sites (Figure 8) we reasoned

864 that ETS1 mediates Tat recruitment to chromatin. To test whether the two proteins interact we

865 performed Strep affinity purifications (AP) using nuclear fractions prepared from the Jurkat T-cell

866 lines followed by western blot analysis. The data indicate that Tat, but not the C22A non-

867 functional mutant or GFP, interacts with endogenous ETS1, as well as with the P-TEFb kinase

868 (CDK9) used as positive control, thereby showing that the protein-protein interaction is specific

869 (Figure 9A and Figure 9–figure supplement 1A). The facts that Tat and ETS1 interact, ETS1

870 motifs are enriched at the Tat target genes, and ETS1 binds these genes (as revealed by ETS1

871 ChIP-seq in Jurkat T cells (Hollenhorst et al., 2009)) prompted us to test whether both proteins

872 co-occupy target genes. To test this possibility we performed ChIP assays on the Jurkat GFP

873 and Tat cell lines using primer-pairs that amplify both promoter-proximal and promoter-distal of

874 different Tat target genes. We found that Tat and ETS1 co-occupy the CD69 promoter but not

875 gene body (Figure 9B), consistent with the motif enrichment analysis and a previous ETS1

876 ChIP-seq dataset (Hollenhorst et al., 2009). Furthermore, Tat binding at the CD69 promoter

35 877 does not appear to alter the occupancy levels of ETS1, since the density of ETS1 in the

878 absence and presence of Tat is similar, if not identical (Figure 9B). In addition, Tat binds some

879 target genes lacking ETS1 motifs such us CD1E (Figure 9C). These results provide evidence

880 that Tat also is recruited in an ETS1-independent manner thereby indicating that additional

881 recruitment mechanisms exist.

882 Given that Tat and ETS1 interact and co-occupy target gene promoters, we asked

883 whether ETS1 (already bound to DNA elements) recruits Tat to chromatin to reprogram gene

884 transcription. To test this, we generated Jurkat-GFP and -Tat cell lines expressing ETS1-

885 specific shRNAs to efficiently knockdown ETS1 or a non-target (NT) shRNA as negative control

886 (Figure 9D). Remarkably, we observed that ETS1 knockdown consistently diminishes (~6-fold)

887 ETS1 density at the CD69 promoter in both GFP and Tat cell lines with a concomitant loss of

888 Tat signal (~6-fold) at the promoter (Figure 9E). Importantly, ETS1 knockdown does not alter

889 the recruitment of Tat to the CD1E promoter, which is regulated in an ETS1-independent

890 manner (Figure 9F).

891 Given that Tat binds ETS1 and both co-occupy promoter-proximal regions of selected

892 target genes, we examined whether ETS1 is critical for transcriptional activation of CD69 and

893 three other TSG and observed that ETS1 knockdown interferes with the Tat-mediated increase

894 in target gene stimulation for both class I and class II TSG (Figure 9–figure supplement 1B–E)

895 without affecting RNA steady-state levels of non-target genes such as RPL19 and 7SK (Figure

896 9–figure supplement 1F–G). The evidence that ETS1 is critical for Tat's transcriptional

897 activation of these four selected TSG correlates with the Tat-mediated increase in the levels of

898 Pol II and the chromatin marks coinciding with transcription initiation and elongation at both

899 class I and II TSG (Figures 2–5).

900 Together, we provide compelling evidence that Tat is recruited to chromatin through

901 interaction with the T-cell identity factor ETS1 to reprogram cellular transcription. Further

902 studies are needed to determine whether ETS1 functions as a scaffold to promote Tat

36 903 recruitment or whether Tat induces protein conformational changes to activate or repress ETS1

904 regulatory transcriptional programs (see Discussion).

37 905 Discussion

906 Several studies have reported that Tat binds the human genome to modulate cellular gene

907 expression to alter the biology of immune cells (dendritic and CD4+ T) and generate a

908 permissive environment for viral replication and/or spread (Huang et al., 1998; Izmailova et al.,

909 2003; Kim et al., 2010a; Kim et al., 2013; Kukkonen et al., 2014; Li et al., 1997; Marban et al.,

910 2011). However, a comprehensive description of the direct target genes and the nature of the

911 regulatory mechanisms have yet to be discovered. In this article, we report a technically

912 improved Tat ChIP-seq assay that dramatically increases sensitivity in comparison with previous

913 methodologies. Furthermore, by simultaneously assaying transcriptome changes and Tat’s

914 genome-wide distribution we combinatorially identified direct Tat targets in the human genome

915 with high confidence. In contrast to previous studies, we were able to elucidate a large

916 proportion of Tat binding sites in the human genome that correlate with marked gene expression

917 changes (activation or repression) at both the RNA and protein levels. Importantly, the Tat

918 target genes shared functional annotations, and are regulated (for the most part) by a common

919 set of master transcriptional regulators.

920 Transcription factors coordinate the activation and maintenance of transcriptional

921 programs by regulating one or multiple steps in the transcriptional cycle. While some sequence-

922 specific DNA binding transcription factors recruit Pol II and the basal transcription apparatus to

923 promote transcription initiation (Hahn, 2004), others function at the elongation step by allowing

924 Pol II transition from a promoter paused state to the productive elongation phase (Adelman and

925 Lis, 2012; Rahl et al., 2010; Zhou et al., 2012). By performing a global analysis of chromatin

926 signatures that are generally associated with different genomic domains (promoter, coding units

927 and enhancers) together with genome-wide Pol II distribution data in the absence and presence

928 of Tat, we described, for the first time, the precise nature of the mechanisms of transcriptional

929 activation and repression by Tat. Strikingly, we found that Tat directly controls both the initiation

930 and elongation steps to transcriptionally reprogram the cell. Tat promotes Pol II recruitment at

38 931 promoters and transcribing units to stimulate the transcription initiation and elongation steps,

932 respectively. Conversely, Tat blocks Pol II recruitment to promoters or transcribing units to

933 prevent the initiation and elongation steps, respectively, thereby leading to gene repression.

934 Remarkably, the global analysis of chromatin signatures is consistent (for the most part) with the

935 proposed mechanisms based on Pol II occupancy changes. For example, class I TSG

936 (regulated at the initiation step) showed a large increase in both Pol II and H3K4me3 at

937 promoter-proximal regions, consistent with their functional link in activating gene transcription

938 (Shilatifard, 2012). Although we have not searched for the H3K4 methylase, it would be

939 interesting to define whether Tat hijacks one or more of Set/MLL methylases and whether

940 DoT1L and the SEC are co-recruited to link transcription initiation with elongation (Luo et al.,

941 2012; Nguyen and Zhang, 2011; Shilatifard, 2012). In addition, the mechanism by which Tat

942 reduces H3K4me3 density at some class II TSG remains yet unknown, but it may be possible

943 that Tat recruitment to the promoter region interferes with the H3K4me3 signature, as has been

944 seen with the NS1 protein during influenza virus infection (Marazzi et al., 2012). Although we

945 have described that Tat associates with and recruits chromatin-modifying enzymes to target

946 genes, one important aspect of the mechanism that we have not clarified yet is that the histone

947 modifications may represent (in some cases) indirect effects of changes in transcription and not

948 a direct consequence of Tat function on the genome.

949 Despite the canonical view of transcriptional regulation through transcription factor-

950 promoter DNA interaction, it has recently become evident that internal promoters, intragenic

951 enhancers and other genomic elements contribute to specify transcriptional programs through

952 the formation of three-dimensional structures (Bulger and Groudine, 2011; Creyghton et al.,

953 2010; Ghavi-Helm et al., 2014; Heintzman et al., 2007; Jin et al., 2013). Consistent with these

954 discoveries, we found that Tat exploits the human genome by binding not only at promoters but

955 also at intergenic and intragenic sites to modulate long-range chromatin interactions and

956 transcription activation from these promoter-distal sites. We observed that at a large number of

39 957 class I TSG (regulated at the initiation step) Tat binds at intragenic sites to modulate enhancer

958 activity. In the absence of Tat, the nucleosomes surrounding these sites contain low or

959 undetectable histone modifications related to enhancer activity (H3K4me1 and H3K27Ac). Tat

960 binding increases their density in a manner that is proportional to transcriptional levels. Notably,

961 this increase also correlates with the recruitment of chromatin-modifying enzymes (p300/CBP)

962 at the Tat site, chromatin looping between the internal site and the promoter, and Pol II

963 recruitment to trigger de novo transcription initiation. The role of intragenic Pol II pausing

964 requires further investigation but it may be possible that this form maintains activate

965 transcription elongation by promoting template DNA circularization. This Pol II form might not

966 have been observed in embryonic stem cells because most, if not all, genes are incompetent for

967 elongation and Pol II is primarily paused in the promoter-proximal region.

968 Recent evidence suggested that internal sites marked with H3K27Ac appeared to be a

969 sort of intragenic enhancers (Kowalczyk et al., 2012), implying that Tat directly dictates gene

970 activation by binding at these sites and is likely to be a major determinant of the overall

971 architecture and the composition of histone modifications of these internal sites. The gene

972 looping hypothesis provides a mechanistic explanation for the molecular effects observed at the

973 promoter (namely Pol II recruitment and increase in H3K4me3 density) in response to Tat

974 binding at promoter-distal sites, even without detectable Tat at the promoter-proximal region.

975 Despite we have shown that the C22A non-functional mutant (which does not dimerize) does

976 not promote gene looping, it would be interesting to further test whether the Tat-mediated long-

977 range chromatin interactions are controlled by protein homo-dimerization, a feature of Tat that

978 was originally described by Frankel and Pabo (Frankel et al., 1988), but whose function has

979 since then remained largely elusive. Furthermore, the role of intragenic enhancers in activating

980 gene expression is an attractive possibility, and might help recruit chromatin-modifying enzymes

981 for compartmentalization purposes and/or ‘on site’ activation.

982 Although in this report we did not thoroughly examine Tat binding at intergenic regions

40 983 and their functional role in transcriptional control, it is plausible that Tat binds these genomic

984 domains (enhancers) to control long-range interactions and the assembly of transcription

985 complexes to modulate transcription activation/repression. Given that transcription from

986 enhancers is a widespread regulatory mechanism to modulate the activity of nearby genes,

987 further research is needed to understand the role of Tat in controlling transcription through

988 enhancers and the potential role of the newly identified class of enhancer-derived long-non-

989 coding RNAs in transcription activation or repression (Kim et al., 2010b; Kim and Shiekhattar,

990 2015). Although we have detected Tat-induced RNAs from enhancers that are co-regulated

991 with nearby genes, further investigation is also needed to define the molecular mechanisms by

992 which these Tat-induced non-coding RNAs function during the reprogramming process.

993 Although the majority of transcription factors described to date contact DNA directly, Tat

994 is unique because it binds the nascent RNA structure (TAR) formed at the HIV promoter.

995 Surprisingly, in contrast to activation of the HIV genome, we found insignificant enrichment of

996 TAR-like motifs at Tat binding sites in the human genome. Although we cannot completely

997 exclude the possibility that nascent RNA chains (folding or not into TAR-like structures) help Tat

998 recruitment to host cell chromatin, we unexpectedly found that Tat occupies sites bound by

999 master transcriptional regulators (ETS1) with high frequency. Notably, we provided biochemical

1000 and genetic evidence that ETS1 recruits Tat to chromatin to modulate (activate or repress) gene

1001 transcription. ETS1 is one member of a large family of transcription factors (ETS) that play

1002 important roles in T-cell stimulation and differentiation (Hollenhorst et al., 2011). ETS1 requires

1003 combinatorial interactions with other factors (such as RUNX1) to activate transcription. We

1004 propose that, like RUNX1, Tat uses ETS1 as a scaffold to promote its recruitment to target

1005 genes and modulate gene transcription. Given that ETS1 is found at both activated and

1006 repressed genes, it is evident that ETS1 does not dictate per se the mode of regulation

1007 (activation or repression). For the mechanism of gene activation, a likely scenario would be that

1008 Tat is recruited to ETS1 to relieve its auto-inhibition and recruit Pol II, elongation factors and

41 1009 chromatin-modifying enzymes. For the mechanism of gene repression, Tat might compete off

1010 factors pre-associated with ETS1 (such as RUNX1) and block PIC assembly (in the case of

1011 transcription initiation blockage) or prevent the action of elongation factors such as P-TEFb (in

1012 the case of transcription elongation repression). Although ETS1 appears to play a central role

1013 in the recruitment of Tat to several direct target genes (irrespective of the transcriptional

1014 outcome), it is completely possible that several other co-factors including long non-coding RNAs

1015 and/or proteins function combinatorially to specify target loci identification and Tat function.

1016 Although we have provided important insights into the recruitment mechanisms, further

1017 investigation is needed to clarify the molecular basis of the Tat-ETS1 protein-protein interaction

1018 in the regulation of gene activation and repression. One particular observation derived from the

1019 motif analysis is that the ETS1 motif identified at both TSG and TDG contain a common

1020 sequence (5’-GGAA-3’) but differ in the -1/-2 positions. Of note, transcription factor binding

1021 sites slightly differing in sequence have been shown to modulate various steps of the

1022 transcription cycle including transcription factor binding affinity and conformational changes

1023 upon their recruitment to target sequences, as well as co-factor recruitment (Meijsing et al.,

1024 2009). Therefore, the difference in the ETS1 motifs at TSG and TDG might contain information

1025 utilized by Tat (and/or Tat co-factors) to activate or repress transcription. Undoubtedly, further

1026 research is required to elucidate the precise molecular basis.

1027 Despite the widespread role of ETS1 in recruiting Tat to chromatin, it is most likely that

1028 other recruitment strategies exist because not all Tat target genes identified are regulated by

1029 ETS1. Defining all the details will require higher-resolution approaches such as ChIP-exo and

1030 ChIP-nexus in order to improve the definition of Tat target sites and mechanisms of recruitment

1031 to chromatin (He et al., 2015; Rhee and Pugh, 2011). In addition, the analysis of transcriptome

1032 changes using RNA-seq (which measures steady-state RNA levels and not transcription per se)

1033 has some caveats, for example difficulties in detecting low abundance or highly unstable RNAs.

1034 Potentially, this may help explain why we also found a large fraction of binding events that do

42 1035 not have a correlate with gene expression changes. Other tools such as GRO-seq (Core et al.,

1036 2008), which measure levels of nascent RNAs, will also be needed to further improve the

1037 definition of direct and indirect target genes. Moreover, a more recent and powerful technique

1038 named NET-seq yields transcriptional activity at nucleotide resolution and thus outperforms

1039 compared to GRO-seq (Mayer et al., 2015). Certainly, these tools will provide much higher-

1040 resolution to define target sites, modes of Tat recruitment and improved functional insights.

1041 Nonetheless, our work provides the first comprehensive view of how Tat modulates the biology

1042 of immune target cells by mediating key transcriptional changes. Interestingly, Tat can

1043 potentially perform different functions depending on the target cell type. Thus, it would be

1044 informative to determine how host adaptability is harnessed by a viral protein to selectively

1045 reprogram transcription in a cell lineage-specific manner.

1046 In conclusion, despite the complexity of transcriptional regulatory mechanisms in the

1047 cell, Tat precisely controls Pol II recruitment and pause release to fine-tune the initiation and

1048 elongation steps, respectively. It is possible that the diversity of mechanisms employed by Tat

1049 to reprogram the host cell arise from the intrinsic complexity of transcriptional regulatory

1050 strategies of human genes. Finally, our data provide yet another example on how a virus with a

1051 limited coding capacity optimized its genome by evolving a small but “multi-tasking” protein to

1052 simultaneously control viral and cellular transcription using distinct regulatory strategies.

43 1053 Materials and methods

1054

1055 Experimental analysis

1056 Cell culture

1057 HEK293T cells were maintained in DMEM with 10% fetal bovine serum (FBS), 100 U/ml

1058 penicillin and 100 μg/ml streptomycin (Life Technologies). CD4+ Jurkat T cells (clone E6.1)

1059 were cultured in RPMI 1640 (HyClone) with 10% FBS, 100 U/ml penicillin and 100 μg/ml

1060 streptomycin (Life Technologies). The Jurkat HIV E4 clone was kindly provided by Dr. J. Karn

1061 and described elsewhere (Pearson et al., 2008). Jurkat T-REx cells were maintained in the

1062 same conditions as for Jurkat but in the presence of 10 μg/ml Blasticidin as indicated by the

1063 manufacturer's instructions (Life Technologies). The derived Jurkat T-REx clones (GFP and Tat

1064 bearing the Strep/Flag (SF) epitope: GFP-SF and Tat-SF, respectively) were selected with 300

1065 μg/ml of zeocin for four weeks and later cultured with 10 μg/ml Blasticidin and 100 μg/ml zeocin

1066 (Life Technologies). The Tat cell line was cultured for less than 3 months because it otherwise

1067 loses Tat expression and responsiveness. To generate the stable GFP:SF and Tat:SF

1068 expressing cell lines, the parental Jurkat T-REx was electroporated with 2 μg of Ssp1-linearized

1069 pcDNA4/TO vector (Life Technologies) bearing either insert using a nucleofector kit V and a

1070 Nucleofector II electroporator (Lonza). To induce expression of GFP:SF and Tat:SF, the stable

1071 cell lines were treated with 1 μg/ml doxycycline (DOX, Sigma) for 16 hrs (or as indicated). We

1072 chose to utilize a SF-tagged Tat protein and immunoprecipitate Tat:SF using an anti-FLAG

1073 antibody due to our inability to efficiently immunoprecipitate Tat under denaturing conditions

1074 with the available commercial anti-Tat antibodies from Covance (catalogue number MMS-116)

1075 and Abcam (ab43014).

1076

1077 Primary CD4+ T cell isolation, stimulation and HIV infection

44 1078 Blood from healthy donors was obtained from the Gulf Coast Regional Blood Center from

1079 Houston, TX. PBMCs were isolated by Ficoll-Hypaque density gradient centrifugation (Stem

1080 cell) and Naïve CD4+ T cells were then purified by negative selection using the magnetic beads

1081 Human Naïve CD4 T Cell Enrichment Set (Becton Dickinson, 558521). Cell purity was

1082 determined by surface staining: APC-conjugated anti-CD4 (Becton Dickinson, 555349) and

1083 FITC-conjugated CD45RA (Becton Dickinson, 555488). To generate CD4 central memory T

1084 cells (TCM), Naïve CD4 T cells were activated at day 0 (Figure 1–figure supplement 8) in 96-

1085 well plates precoated with anti-CD3/CD28 [2.5 μg/ml anti-CD3 (OKT3), 2.5 μg/ml anti-CD28

1086 (Biolegend, 302914)] in the presence of anti-IL-12 (4 μg/1x106 cells; R&D, MAB219), anti-IL-4 (2

1087 μg/1x106 cells; R&D, MAB204) and TGF-β1 (0.8 μg/1x106 cells; Peprotech, 100-21) (Bosque

1088 and Planelles, 2011; Messi et al., 2003). After activation cells were maintained at 1x106 cells/ml

1089 in RPMI 1640 with L-Glutamine (HyClone) containing 10% FBS, 100 U/ml penicillin and 100

1090 μg/ml streptomycin (Life Technologies) in the presence of 30 IU/ml of human IL-2 (AIDS

1091 Research and Reference Reagent Program). For infection of TCM with replication competent X4

1092 virus (pNL-4-3–GFP Nef(–), MOI=0.5), activated cells (day 7) were spinoculated (1x106 cells in

1093 1 ml; 2 hrs, 2900 rpm) in the presence of 8 μg/mL polybrene (Hexadimethrine bromide, Sigma).

1094 After infection, the supernatant was removed, cells were resuspended in complete RPMI and

1095 crowded for three days in 96-well plates (round bottom) at a concentration of 1X106 cells/ml (0.1

1096 ml/well). Three days post-infection GFP+ cells were sorted at the UTSW Flow Cytometry Core.

1097 At this point supernatant and cells were harvested for p24 ELISA and RNA isolation,

1098 respectively, as indicated above.

1099

1100 Protein-protein interaction assays

1101 Tat-associated proteins were purified using Strep-Tactin Superflow beads (IBA Life Sciences).

1102 Briefly, Jurkat (1x109 cells) were lysed using the Dignam method with brief modifications

45 1103 (Dignam et al., 1983). Strep-Tactin beads were equilibrated with IP lysis buffer, mixed with cell

1104 lysates and rotated at 4°C for 2 hrs. After centrifugation at 3000 rpm for 1 min, unbound

1105 proteins were removed and beads washed four times with IP wash buffer (20 mM Tris-HCl pH =

1106 7.5, 1.5 mM MgCl2, 250 mM NaCl, 0.2% NP-40, 5% glycerol, 1 mM DTT, protease inhibitor

1107 (Roche)). Specifically bound proteins were eluted with elution buffer containing desthiobiotin

1108 (IBA Life Sciences) for 30 min at 4°C.

1109

1110 Cell fractionation by sequential salt extraction

1111 1x108 cells (Jurkat-GFP and -Tat) were collected from 100 ml of culture media (grown at 1x106

1112 cells/ml) and washed with 2 ml of cold 1x PBS plus EDTA-free Protease Inhibitor (PI) Cocktail

1113 (Roche). The cell pellet (PCV=0.1 ml) was resuspended in 1 ml Buffer A (10 mM KCl, 10 mM

1114 HEPES pH 7.9, 0.1 mM EDTA, 1 mM DTT, 0.4% NP-40, plus PI) and incubated on ice for 5

1115 min. Nuclei were pelleted by centrifugation (5 min spin at 2,000 g at 4°C), the supernatant

1116 saved as the cytoplasmic extract (~0.5-0.6 ml), and the nuclear pellet washed twiced with 1 ml

1117 buffer A. Nuclei were resuspended in 0.25 ml Buffer B (20 mM HEPES pH 7.9, 100 mM NaCl, 1

1118 mM EDTA, 1 mM DTT, 1% NP-40 plus PI) using 10 strokes in a 1-ml dounce homogenizer and

1119 centrifuged 5 min at 6000 g. The supernatant (~0.4 ml) was designated as nuclear extract (100

1120 mM). The pellet was similarly resuspended in 2x pellet volume of Buffer B containing 600 mM

1121 NaCl, vortexed for 10 sec, treated with RNAse A (Roche) or BSA (mock treatment) at a

1122 concentration of 10 μg/ml for 60 min at 4°C and then spun at 6,000 g for 5 min.

1123

1124 Western blotting and antibodies

1125 Protein samples were resolved in SDS-PAGE, transferred to 0.45 μm nitrocellulose (Bio-Rad)

1126 membranes, blocked in Tris-buffered saline (TBS) containing 5% non fat dry milk for 1 hr, and

1127 incubated with primary antibodies overnight at 4°C. Primary antibodies used for western blot:

46 1128 FLAG M2 (F1804, Sigma); STREP-Tag II (71591, Novagen); CDK9 (sc-484, Santa Cruz); ETS1

1129 (sc-350X, Santa Cruz); β-actin (sc-47778, Santa Cruz); Tat (ab43014, Abcam); Dot1L (A300-

1130 954A, Bethyl); SetD2 (ab69836, Abcam); histone H3 (ab1791, Abcam). Secondary antibodies

1131 coupled to horseradish peroxidase (HRP) were donkey anti-rabbit IgG-HRP (sc-2313, Santa

1132 Cruz), goat anti-mouse IgG-HRP (sc-2005, Santa Cruz), and donkey anti-goat IgG-HRP (sc-

1133 2020, Santa Cruz) and were incubated at 1:10,000 dilutions for 1 hr and blots developed using

1134 Clarity Western ECL Substrate kit (Bio-Rad).

1135

1136 Flow cytometry

1137 For cell surface marker staining, Jurkat cells were washed twice with PBS/0.5% BSA, and

1138 incubated with 50 μl of the antibody diluted in PBS/0.5% BSA for 30 min in the dark. Antibodies

1139 used include CD69-PE (12-0699, eBioscience) or Mouse IgG1 K isotype control PE (12-4714,

1140 eBioscience) and CD4-APC (MHCD0405, Caltag). Cells were washed twice with PBS and then

1141 resuspended in 200 μL PBS/2.5% formalin, washed twice with PBS and twice with PBS/0.5%

1142 BSA. Fixed cells were subjected to flow cytometry analysis on a FACS Calibur in the Flow

1143 Cytometry Core Facility in UT Southwestern Medical Center. Data analysis was performed

1144 using FlowJo version 9.6.1.

1145

1146 Enzyme-linked immunosorbent assay (ELISA)

1147 Viral stocks and kinetics of HIV p24 antigen expression were monitored by ELISA following the

1148 manufacturer’s instructions. Plates and reagents were obtained from SAIC Frederick, Inc. from

1149 the National Cancer Institute’s (NCI) Operations and Technical Support Contractor to the

1150 Frederick National Laboratory for Cancer Research (FNLCR).

1151

1152 RNAi

47 1153 Lentiviral particles were made by transfecting HEK293T cells with three plasmids: the shRNA

1154 transfer vector pLKO.1 (Sigma), pMD2.G (VSV-g) and psPAX2 (gag-pol). Supernatants were

1155 harvested two days post-transfection, quantified by ELISA and stored at -80°C. Jurkat cells

1156 (2x105) were spinoculated at 2900 rpm for 2 hrs with lentiviral particles (50-200 μl, ~1x107

1157 transducing units (TU) determined by p24 ELISA) in the presence of 8 μg/ml polybrene.

1158 Efficiently transduced cells were selected with 1 μg/ml puromycin for 5 days. At this point, cells

1159 were used for validation of RNAi efficiency by qRT-PCR and/or western blot using the

1160 corresponding primer-pair and primary antibody, respectively. shRNAs used for RNAi are:

1161 pLKO.1-NT shRNA (SHC002, Sigma); pLKO.1-Dot1L shRNA (TRCN0000020209, Sigma);

1162 pLKO.1-SetD2 shRNA (TRCN0000003030, Sigma); pLKO.1-CDK9 shRNA (TRCN0000000494,

1163 Sigma); pLKO-puro-IPTG-3xLacO-NT shRNA (SHC332, Sigma); pLKO-puro-IPTG-3xLacO-

1164 ETS1 shRNA (TRCN0000005591, Sigma).

1165

1166 Chromosome conformation capture (3C) assay

1167 Chromatin interaction was determined using a 3C assay (Hagege et al., 2007). For the 3C

1168 assay in the absence/presence of CDK9 inhibitor, cells were induced with DOX for 6 hours

1169 (minimal detectable time before the onset of protein induction) followed by 2 hr incubation with

1170 FP or DMSO (vehicle), and cells were then processed as follows. 1x107 Jurkat cells were fixed

1171 with 1.5% methanol-free formaldehyde (Thermo Fisher) at room temperature for 10 min, the

1172 crosslinking was quenched with 0.125 M glycine and then cells washed twice with cold PBS.

1173 Cell pellets were homogenized in cold lysis buffer (10 mM Tris-HCl pH 8.0, 10 mM NaCl, 0.2%

1174 NP-40) using a dounce homogeneizer (10 cycles) and incubated for 90 min at 4°C with constant

1175 rotation. After centrifugation for 5 min at 400 g at 4°C, 1x106 nuclei were re-suspended in 0.5

1176 ml of 1.2 X restriction enzyme buffer for DNA digestion followed by incubation with 7.5 μl of 20%

1177 (w/v) SDS for 1 hr at 37°C with constant shaking. 50 μl of 20% (v/v) Triton X-100 were treated

48 1178 for 1 hr at 37°C with constant shaking. Chromatin DNA was digested with 600 units of the

1179 indicated restriction enzyme (EcoRI) overnight at 37°C with constant shaking. Samples were

1180 incubated with 40 μl of 20% (w/v) SDS for 30 min at 65°C to stop the reaction. Digested nuclei

1181 were transferred to 50 ml Falcon tube and incubated with 6.125 ml of 1.15 X ligation buffer and

1182 375 µl of 20% (v/v) Triton X-100 for 1 hr at 37°C with constant shaking. DNA ligation was

1183 performed by incubation with 2,000 units of T4 DNA ligase (New England Biolabs) for 4 hrs at

1184 16°C followed by 30 min at room temperature. Reverse crosslinking was done by incubation

1185 with 25 μl (20 mg/ml) of proteinase K (Roche) at 65°C overnight. DNA samples were then

1186 treated with 5 μl of RNase A (Roche) for 30 min at 37°C, and then mixed with 7 ml of phenol–

1187 chloroform vigorously and centrifuged for 15 min at 2,200 g at room temperature. The

1188 supernatant was transferred into a 50 ml tube and mixed with 7 ml of distilled water, 1.5 ml of 2

1189 M sodium acetate pH 5.6, and 35 ml of ethanol to precipitate DNA by centrifuging at 12,000g for

1190 30 min at 4°C. The DNA pellet was dissolved in 150 μl of 10 mM Tris-HCl pH 7.5. Random

1191 ligation matrix was prepared by digestion with restriction enzyme and ligation with BAC clones

1192 containing the respective genes examined from the Children’s Hospital Oakland Research

1193 Institute (CHORI). DNA concentration of 3C samples was determined by qRT-PCR assays with

1194 GAPDH-specific primers and using Fast SYBR Green Master Mix in a 7500 Real-Time PCR

1195 System (Applied Biosystems). Primers used for 3C will be provided upon request.

1196

1197 RNA extraction and qRT-PCR assays

1198 Total RNA from the Jurkat cell lines was isolated using TRIzol (Life Technologies) and RNeasy

1199 mini kit (Qiagen). 1 μg of total RNA was used for synthesis of first strand cDNA with M-MLV

1200 (New England Biolabs), and qPCR was performed using a Fast SYBR Green Master Mix in a

1201 7500 Fast Real-Time PCR System (Applied Biosystems). Experiments were done in biological

1202 triplicates and error bars represent the SEM as indicated in all figure legends. The primers used

49 1203 for qRT-PCR analysis are as follow (Gene, forward primer, reverse primer):

1204  (CCCCCCGGGCCGTCTTCCCCTC, TGAGGATGCCTCTCTTGCTCTG)

1205 RPL19 (ATCGATCGCCACATGTATCA, GCGTGCTTCCTTGGTCTTAG)

1206 7SK (TAAGAGCTCGGATGTGAGGGCGATCTG, GGAGCGGTGAGGGAGGAAG)

1207 CD69 (ACTGTGAAGAGGAGCTGG, GTGTTCCTCTCTACCTGCGTATCG)

1208 ADCYAP1 (GGGATCTTCACGGACAGCTA, CGGCGTCCTTTGTTTTTAAC)

1209 FAM46C (GCCTAGGGGCTGTAGAGGTCG, CTGCTCTCCTCTGCCATCTT)

1210 PPM1H (GCACACACAATGAAGACCAAGCCA, CAATAGTGGCAGGAAACACCCTCG)

1211 ANXA1 (ACGCTTTGCTTTCTCTTGCTAAGG, AAGGCCCTGGCATCTGAATCAGCC)

1212 ZNF83 (AGCCACCAAGAAGACCAAAGAAA, TATAAAGCCCTCTGTGCAGGGTTCA)

1213 CD1E (AAAGCCTTCTTGGTCACACCT, GCTTTGGGTAGAATCCTGAGAC)

1214 FBLN2 (GCCGATGGCTATATCCTCAAT, CAGGTGAGTGCCTTGTAGCA)

1215 EOMES (AAATTCCACCGCCACCAAACTGAG, TTGTAGTGGGCAGTGGGATTGAGT)

1216 CDK6 (ATGTGTGCACAGTGTCACGAACAG, TTAGATCGCGATGCACTACTCGGT)

1217 PRAME (ATTTGCGGAGGTTTTCACAGGA, TATCGGCTCTGAATGGAACCCC)

1218 FAM133B (TCGGGTGGCCTATATGAACCCAAT, GCCAAAGCCTTGGAGCCTTTCTTT)

1219 Dot1L (CCACCAACTGCAAACATCACT, GAGGAAATCGCCTCTCTCCAAT)

1220 SetD2 (CTCTCACCACCCTCTTCTGCCTA, CCACCTCTTGCTCAAACAACTTCC)

1221 CDK9 (GTGTTCGACTTCTGCGAGCATGAC, CTATGCAGGATCTTGTTTCTGTGG)

1222 PPM1H eRNA (AGCCTGGAGTGGAGAAGAAA, TCTGATTCCCTTCCATCCTC)

1223 GAPDH (CCCTGTGCTCAACCAGT, CTCACCTTGACACAAGCC)

1224 CD82 (GGGCTCAGCCTGTATCAAAG, CAGGACAGAGATGAAACTGCTC)

1225 VAV3 (CAACATCCTCCCTCCTTAGC, CCCTGACAATGACAGACTGG)

1226 HSPA8 (CAGTACGGAGGCGTCTTACA, CCACTTGGGTGGAGAAGATT)

1227 CD69-In (TCTTGTCCACTCTCCGGATGC, AGACTCAACAAGAGCTCCAGC)

1228 PPM1H-In (CTCCTTGAGGATGAGGATGG, CTCAGGACGAGGTGGAGTG)

50 1229 VAV3-In (GCAGAGCAGGACTCCATCGCGG, AGCCGCGTCCCGGAGCCGTCG)

1230 ANXA1-In (CAAACAGAAGGCAGCCAAT, CAGTGGAGACTTGGGCTTCT)

1231 CD1E-In (CAGTTAGTGCAATTAGGGAG, AAGCCACACTGCCTCGC)

1232 FBLN2-In (CGCGCACACAGCCAGGGG, GCGGACCGACGGGGAT)

1233 EOMES-In (ACGCTGGAAGAAGGTGACTT, TCAACTTGACCGATGCTTTG)

1234 CDK6-In (AGAGTTCCAGTTCCGTTTGG, GATCCTGCTTCCACGTATCC)

1235

1236 RNA-seq library preparation

1237 For RNA-seq, total RNA samples (10 μg) were prepared from biological duplicate samples with

1238 TRIzol followed by depletion of rRNA using the Ribominus isolation kit (Life Technologies).

1239 Briefly, 500 ng of rRNA-depleted RNA were used for library preparation using the SOLiD Total

1240 RNA-seq kit (Applied Biosystems). The RNA was fragmented and adapters ligated before

1241 cDNA synthesis. The cDNA was then size selected, amplified, purified with AmpureXP beads

1242 (Beckton Coulter) and quality checked on a Bioanalyzer (Agilent). Samples were quantified by

1243 qPCR using SOLiD adapter primers per the manufacturer's instructions (Applied Biosystems)

1244 and the EZ-Bead system was used to amplify the libraries onto the sequencing beads for high-

1245 throughput sequencing on a SOLiD 5500xl instrument (Applied Biosystems).

1246

1247 Chromatin immunoprecipitation (ChIP) assay

1248 Chromatin was cross-linked with 1% formaldehyde for 10 min at room temperature in culture

1249 media, and the reaction was stopped by the addition of glycine (125 mM final). Cells were

1250 washed twice with PBS and re-suspended in 1 ml of nuclei extraction buffer (5 mM PIPES pH

1251 8.0, 85 mM KCl, 0.5% NP-40, 1 mM PMSF, protease inhibitor (Roche)) at 2x107 cells/ml and

1252 incubated for 10 min at 4°C. Cell nuclei were collected by centrifugation at 3000 rpm for 5 min

1253 at 4°C, and re-suspended in Szak's RIPA buffer (50 mM Tris-HCl pH 8.0, 1% NP-40, 150 mM

51 1254 NaCl, 0.5% deoxycholate, 0.1% SDS, 5 mM EDTA, 0.5 mM PMSF, and EDTA-free protease

1255 inhibitor cocktail (Roche)) at 5x107 nuclei/ml. Nuclear pellets were sonicated using the Bioruptor

1256 water bath (Diagenode) using the high setting with 30 sec on 30 sec off for 45 min to generate

1257 100-300 bp DNA fragments. Chromatin DNA was quantified with NanoDrop 1000 (Agilent). For

1258 the ChIP step, 1–3 μg of antibody was conjugated with 15 μl of protein G dynabeads (Life

1259 Technologies) and blocked with 0.16% bovine serum albumin for 16 hrs at 4°C. Antibody-

1260 conjugated dynabeads were incubated with 30–50 μg of chromatin DNA for 2 hrs at 4°C. Then

1261 the beads were sequentially washed two times with the following buffers: low salt buffer (20 mM

1262 Tris-HCl pH 8.0, 150 mM NaCl, 1% Triton, 0.1% SDS, 2 mM EDTA), high salt buffer (20 mM

1263 Tris-HCl pH 8.0, 500 mM NaCl, 1% Triton, 0.1% SDS, 2 mM EDTA), LiCl wash buffer (0.25M

1264 LiCl, 1% NP-40, 1% deoxycholate, 1 mM EDTA, 20 mM Tris-HCl, pH 8.0), and 1 mM Tris-EDTA

1265 pH 8.0. Chromatin immunocomplexes were eluted by incubation for 10 min at 65°C with 1%

1266 SDS and 100 mM NaHCO3, and cross-linking was reversed by incubation in the solution

1267 adjusted to 200 mM NaCl and proteinase K (20 μg) for 1 hr at 65°C. Antibodies used for ChIP:

1268 RNA polymerase II unphosphorylated/Ser5P-CTD (39097, Active Motif); RNA polymerase II

1269 total P-CTD (61081, Active Motif); RNA polymerase II Ser5P-CTD (ab5131, Abcam); RNA

1270 polymerase II Ser2P-CTD (ab5095; Abcam); Normal rabbit IgG (sc-2027; Santa Cruz); CycT1

1271 (sc-899X, Santa Cruz); Cdk9 (sc-8338X, Santa Cruz); H3K4me1 (ab8895, Abcam); H3K4me3

1272 (ab8580, Abcam); H3K9me3 (ab8898, Abcam); H3K27me3 (ab6002, Abcam); H3K36me3

1273 (ab9050, Abcam); H3K79me3 (ab2621, Abcam); H3K36me3 (ab4729, Abcam); H3 (ab1971,

1274 Abcam) Dot1L (A300-954A, Bethyl); SetD2 (ab69836, Abcam); MED1 (A300-793A, Bethyl);

1275 TBP (ab28450, Abcam); p300 (sc-585X, Santa Cruz); FLAG M2 (F1804, Sigma); Tat (ab43014;

1276 Abcam). qRT-PCR was performed on a 7500 ABI Fast Real-Time PCR System (Applied

1277 Biosystems) as indicated above using the following primers (forward, reverse) with the sign (-/+)

1278 and number indicating the position respective to the TSS:

52 1279 CD69 -17913 (TTGTTGGCCTGAAGTTTTCC, ATACGGATTCACAGCCGAAC)

1280 CD69 -3955 (AATCCAGGGTGAGACGTCAG, GGAAGTCTGTGGTCCCTGTT)

1281 CD69 -63 (AATCCCACTTTCCTCCTGCT, GCCGCCTACTTGCTTGACTA)

1282 CD69 +2046 (GGTAAACTGGACCAAGAGAAGTTGCC, AATCTGAGTGCCGATCTGTGATGG)

1283 CD69 +5897 (ACTGTGAAGAGGAGCTGG, GTGTTCCTCTCTACCTGCGTATCG)

1284 CD1E -1499 (GTCCTTGTGATAGTTTGCTGAGAAT, CAAATGTCCAACAATGATAGACTG)

1285 CD1E +4 (CACAAGAGCAGGGAGAAAATCTGGA, GACAGACAAACCTCTCCCTCTG)

1286 CD1E +1341 (GGGCCCAGAACATCTGTAAAG, AGAGCAGGTGCTGCATCTTT)

1287 CD1E +5215 (GGCCAAAGCTCAGAAGAGTGCAG, AGAGCAGGTGCTGCATCTTT)

1288 ADCYAP1 -8465 (TCTCCCCTTCCTGATGATTG, CCCAGAGCACACAGCTAACA)

1289 ADCYAP1 +1 (AGACACCAACGCCAGACG, GGAGAGCTGCCAGGTAGGAC)

1290 ADCYAP1 +3713 (TTCCAGGGAGGTTTTGCTATAA, CTGTTTGGGTCCATAAGTGTCC)

1291 ADCYAP1 +6242 (AGCTCCAACAGACCCTGAGA, ATCTGATTGCTGGGTGAAGG)

1292 RAG1 -3544 (GCCAGTTCATGGATTCAACAAC, TGCGGGCTTTCTGATTTTAGCTAC)

1293 RAG1 +8 (TGAGAAACAAGAGGGCAAGGAGAGA, CCTGGCCAAAGTGTTGAGATGTCTG)

1294 RAG1 +5761 (AAGGTTTTCCGGATCGATGTG, GGGCACTGCTAAACTTCCTGTGCAT)

1295 RAG1 +12971 (TCCAAGATGCAATGGTGGTA, GTGGGACATGGTGTCAACAG)

1296 PPM1H +3755 (CCAGGCTAACACACCATGAC, TTGTTAATGCATGGGACAGG)

1297 EOMES -5777 (AAGGCACCCTTAACTGGATG, TCTTGCTCTGCACTTGCTCT)

1298 EOMES -57 (GGGCTGTCACTAGCTGCTTT, CAGGCGACTTGATCCAATTA)

1299 EOMES +3289 (GGGTGTTGTTGTTATTTGCG, TGTATGTTCACCCAGAGTCTCC)

1300 EOMES +9699 (GGAGACAGGACAGTCGAGGT, CTAACTGTGTAGCGCGGAAA)

1301 FAM46C -182 (AGGTGTCCCACTAACTCCGA, TGGTTGCCAAGAGACGAGTA)

1302 VAV3 -3936 (TCCTACTGCCCTTGAGTCCT, AAAGTGCCCAGAAGAGTGCT)

1303 VAV3 +91 (GTAGGAAACGCCAAAGTGGT, TCATGCGGCTAATTTCTTTG)

1304 VAV3 +102770 (ACAGGCACAACAGTCAAAGG, TGGTTAATTCCAGTGGGTGA)

53 1305 CD82 -10659 (TTAGGCCAGCTGTCTCATTG, AGGGCAGACATGGAATTAGG)

1306 CD82 +4180 (CCCACCTGAAGCACCTTTAT, GGGCCGAGATCTCAACTCTA)

1307 CD82 +28894 (GGATCCTAACTGCCAAGCAT, CAGAGGCCCAGAGAGGTTAG)

1308 HSPA8 -3885 (CTGGTAGTAGAGCCCTTGCC, CGGACAATATGATCTGCCAA)

1309 HSPA8 -103 (CTGCCCTTACAAGACCCAAT, GGTGAGTGCGTTATCGTGAG)

1310 HSPA8 +3576 (TTAGCTAGACGCCCTTAGGC, TGTGGACAAGAGTACGGGAA)

1311 FBLN2 + 622 (CACACACGCACTCACACAAG, GAGAGGGAGGATGTGGTGAC)

1312

1313 ChIP-seq library preparation and sequencing

1314 ChIP DNA was quantified on a Qubit® 2.0 Fluorometer (Life Technologies) and ~10 ng DNA

1315 were submitted to the McDermott Center Sequencing Core at UT Southwestern Medical Center

1316 for library preparation and high-throughput sequencing using the SOLiDTM ChIP-seq Kit with

1317 modifications for the 5500xl instrument (Applied Biosystems). Samples were end repaired, 3’-

1318 end adenylated and barcoded with multiplex adapters (Applied Biosystems). After purification

1319 with Ampure XP beads (Beckton Coulter), samples were PCR amplified (~14 cycles), size

1320 selected with Ampure XP beads, and quantified on the Agilent 2100 Bioanalyzer. Emulsion

1321 PCR was then performed and beads enriched on the EZ-Bead system. Between 20-30x106 50

1322 nt sequencing reads were generated for all ChIP experiments in the two cell lines. We uniquely

1323 mapped between 70-80% reads to the human genome. High-throughput sequencing was

1324 analyzed as described in the computational analysis section below.

1325

1326 Computational analysis

1327 All scripting was performed using python 2.7.6. All ChIP-seq binding events were loaded into a

1328 custom MySQL database to allow for efficient comparisons of multiple factors’ binding loci.

1329

1330 Mapping and sorting

54 1331 The output from the SOLiD ChIP-seq was aligned to the reference human genome

1332 (GRCh37/hg19) using Bowtie v1.0.0. (Langmead et al., 2009), allowing 1 mismatch (-v 1).

1333 Sequences aligning to multiple locations in the reference genome were discarded. -S was

1334 specified to deliver.sam output, which was then sorted and converted to binary (.bam) using

1335 Samtools (Li et al., 2009).

1336

1337 ChIP-seq peak calling

1338 The .sam files for all ChIP-seq marks were analyzed for binding events using peak calling in the

1339 MACS2 software (Zhang et al., 2008). When determining peaks, duplicate read alignments

1340 were discarded to avoid amplification artifacts from the PCR. An FDR threshold of < 0.05 was

1341 applied for statistical significance. This methodology was used to call FLAG peaks in both the

1342 Tat and GFP cell lines, and only FLAG peaks present in the Tat, but not in the GFP, cell line

1343 were considered valid Tat binding events. The GFP:SF cell line was used to control for the

1344 presence of the FLAG epitope as well as any inherent chromatin biases in the Jurkat cell lines.

1345

1346 ChIP-seq data visualization

1347 MACS2 was run in the callpeak mode with the –B and –SPMR flags set to generate signal

1348 pileup tracks in bedGraph format on a per million reads basis. This allows for direct comparison

1349 between the GFP and Tat cell lines, regardless of any differences in depth of the sequencing

1350 experiment. The pileup tracks were visualized and images were generated using Integrative

1351 Genomics Viewer (IGV) (Robinson et al., 2011). These normalized pileup tracks also served as

1352 the input for many data visualizations as noted below.

1353

1354 Modified traveling ratio (mTR) algorithm

1355 Rahl et al. utilized a ratio of the levels of promoter-proximal and distal Pol II within a gene as a

1356 probe to determine rate of transcription phase change from initiation to elongation on a per-gene

55 1357 basis (Rahl et al., 2010). In our Pol II ChIP-seq dataset, we noted that Pol II density is elevated

1358 above active transcription levels not only at the promoter-proximal location, but also at discrete

1359 positions within bodies of actively transcribed genes. These positions are often accompanied

1360 by H3K27Ac chromatin marks. We found that these positions are three-dimensionally

1361 positioned near the TSS and are actually representative of a form of paused Pol II, not actively

1362 transcribing Pol II. Therefore, our modified traveling ratio (mTR) is calculated as follows:

1363 promoter-proximal Pol II ChIP reads / promoter-distal Pol II ChIP reads. Promoter-proximal Pol

1364 II ChIP reads are considered to be all reads aligned between -30 to +1000 bp of the TSS and all

1365 reads present in distal locations having a H3K27Ac peak (as determined by the method

1366 provided above) within 500 bp and having a density greater than 5X the average Pol II density

1367 between TSS+1001 bp to TTS. Distal Pol II reads are considered to be all those from

1368 TSS+1001 bp to TTS that do not fit the criteria for promoter-proximal Pol II as indicated above.

1369 The mTR is a custom software and is publicly available. Source code is included in ‘calculate-

1370 mTR.py’ source data file.

1371

1372 Cis-regulatory motif discovery

1373 To determine conserved DNA motifs associated with Tat, 200 bp fragments of DNA were

1374 selected, centered on the summits of Tat peaks found by MACS2 (Zhang et al., 2008). These

1375 fragments were submitted to the MEME suite (Bailey et al., 2009) and motifs were considered

1376 significant when having an e-value less than 0.05.

1377

1378 Generation of Marban et al. pileup track

Marban et al. (Marban et al., 2011) did not publish processed visualization data such as bigwig

or bedGraph pileup files. Therefore, to visually compare the fidelity of our Tat ChIP-seq data

with the data generated by Marban et al., we retrieved Marban’s raw ChIP-seq read data

published in GEO (GSE30738) and processed it using the pipeline described above for mapping

56 and peak calling/pileup track generation. Notably, using the strict p-value cutoff we chose for

peak calling with our dataset, MACS2 did not return any peaks when processing the Marban

data.

1379

1380 Annotation of ChIP-seq peaks to genes

1381 HOMER (Heinz et al., 2010) annotatePeaks.pl was used to relate ChIP-seq peak location

1382 information to the corresponding genes in which those peaks are found. Annotations were

1383 made based on the GRCh37/hg19 annotation reference.

1384

1385 Generation of ChIP-seq heatmaps

1386 We used HOMER (Heinz et al., 2010) annotatePeaks.pl to generate density matrices using

1387 normalized bedGraph pileup tracks of ChIP-seq marks as input. The –hist flag was set with a

1388 parameter of 50 bins and the –ghist flag was also set to generate a density matrix. Certain

1389 marks’ heatmaps were generated using a fixed length window (i.e. –size 6000) while others

1390 were generated based on the entire transcribed region of a set of genes that differ in length (–

1391 size given with a peakfile containing a bed-style list of TSS and TTS coordinates for each gene

1392 in the set).

1393

1394 Generation of ChIP-seq metagene plots

1395 HOMER (Heinz et al., 2010) annotatePeaks.pl was used to generate metagene plots based on

1396 the average profile of a series of ChIP-seq marks in normalized bedGraph pileup tracks. The –

1397 hist flag was set using 50 bins for all metagene profiles. For certain marks, we were only

1398 interested in a fixed-length window, so the –size flag was set to a fixed number of base pairs,

1399 i.e. H3K4me3 was plotted centered on TSS and extending -/+ 3kb (-size 6000). Other markers

1400 show the profile of the entire transcribed regions of a set of genes that differ in length (-size

1401 given with a peakfile containing a bed style list of TSS and TTS coordinates for each gene in the

57 1402 set). In these cases, the profile is generated based on relative positions within the genes rather

1403 than fixed windows (TSS + 2%, TSS + 4%, etc). In these profiles, we also generated a leading

1404 and trailing profile of a fixed 2 kb length.

1405

1406 Generation of box plots

1407 HOMER (Heinz et al., 2010) annotatePeaks.pl was used to generate average density figures for

1408 particular ChIP-seq marks using the respective normalized bedGraph pileup tracks as input.

1409 This is accomplished by passing the –size flag alone in addition to the required parameters. In

1410 the case of fixed windows, an integer parameter is passed along with the –size flag (-size 6000),

1411 and in the case of whole genes, the –size given option is used, indicating that the peakfile

1412 contains coordinates of the TSS and TTS for a set of genes. The resulting output contains a set

1413 of average densities for the windows specified and these data were plotted in box plot form

1414 using GraphPad Prism.

1415

1416 RNA-seq data analysis

1417 The 50 nt RNA-seq outputs reads from the sequencer Fastq files were aligned to the UCSC

1418 hg19 genome using Tophat v2.0.10 overlaid on Bowtie 1.0.0. Cufflinks v1.3.0 was used to

1419 assemble exons discovered by Tophat into competent mRNA transcripts associated with the

1420 hg19 annotation (Trapnell et al., 2013). The GFP cell line was used as a baseline transcriptome

1421 to which to compare that of the Tat cell line. The Tat transcriptome assembled by Cufflinks was

1422 compared to the GFP transcriptome using Cuffdiff v1.3.0 to generate a list of genes that were

1423 significantly differentially expressed, using a q-value cutoff of 0.05. About 30x106 50 nt

1424 sequencing reads were generated for the GFP and Tat samples. For the GFP cell line we

1425 uniquely mapped 67.82% and 69.95% of reads (values for the GFP1 and GFP2 duplicate

1426 samples) to the human genome. For the Tat cell line we uniquely mapped 72.46% and 77.35%

1427 of reads (values for the Tat1 and Tat2 duplicate samples) to the human genome. FPKM per

58 1428 genes were then produced by counting the number of unique mapped reads that fell within

1429 exonic regions for all the genes annotated in the Ensemble hg19. The total length of exonic

1430 regions was used to normalize read counts per gene.

1431

1432 Unbiased identification of TAR-like motifs in the human genome

1433 To identify TAR-like signatures we first extracted genomic sequences of the TSG and TDG

1434 including up- and down-stream regions of 10 Kb. In a first-pass filtering approach using seedtop

1435 (Shiryev et al., 2007) from the NCBI BLAST package, we searched for the following pattern:

1436 X(2)-G-A-T-X(1,2)-G-A-X(4,40)-T-C-T-C-X(2), yielding 51559 sequence signatures. After subjecting

1437 these sequences to RNA Fold from the Vienna RNA package (Lorenz et al., 2011) and

1438 calculating the partition function and base-pairing probability matrix, we used the maximum

1439 expected accuracy (MEA) structure and isolated sequences that fold in the corresponding

1440 bulged hairpin. In detail, we required the following conserved secondary structure (in dot-

1441 bracket notation, with x denoting either dot or bracket): X(2)-G-A-T-X(1,2)-G-A-X(4,40)-T-C-T-C-X(2)

1442 and X(2)-(-(-.-.(1,2)-(-(-X(4,40)-)-)-)-)-X(2), resulting in 745 signatures covering 225 genes (Table 3).

1443

1444 Response network

1445 A response network was constructed based on published protein interaction and gene-

1446 regulatory data using RNA-seq FPKM values as well as the 438 TSG/TDG targets as seed

1447 nodes as previously described (Mata et al., 2011). Quantitative expression data was

1448 superimposed with network information. The response network was inferred from the large

1449 network by calculating k-shortest weighed paths between seed-node pairs. A detailed

1450 description of the algorithm can be found here (Cabusora et al., 2005).

1451

1452 Gene set enrichment analysis

1453 Simple, non-weighted gene set enrichment analysis has been performed by using a variety of

59 1454 different gene set databases including GO and GSEA/MSigDB (Ashburner et al., 2000;

1455 Subramanian et al., 2005). As standard procedure we employed the Fisher Exact Test and

1456 used Bonferroni correction for multiple testing. We also performed quantitative enrichment

1457 analysis using the GSEA package together with MSigDB (Subramanian et al., 2005).

60 1458 Acknowledgements

1459 We are grateful to Vanessa Schmid and the McDermott Center at UT Southwestern for the

1460 library preparation and high-throughput sequencing, and the Texas Advanced Computing

1461 Center (TACC) for data storage and use of computing resources. We thank Amparo Estrada,

1462 Kelly Vollbracht, Jennifer McCann and Blake Richardson for preliminary work related to qRT-

1463 PCR assays and cloning; Leonardo Estrada for advice in flow cytometry; Lee Kraus and Ralf

1464 Kittler for discussions about ChIP-seq and dataset analysis; and Ana Beatriz Silva and Vicente

1465 Planelles for advice on protocols to obtain central memory T cells. Research reported in this

1466 publication was supported by the National Institute of Allergy and Infectious Diseases of the NIH

1467 under Award Numbers R00AI083087, R56AI106514, and R01AI114362, and The Welch

1468 Foundation grant I-1782 to I.D.; and NIH training grant T32 2T32AI007520-16 to R.P.M. J.E.R.

1469 was funded through the Green Fellows program at The University of Texas at Dallas and the

1470 SURF Program at UT Southwestern.

1471

1472 Additional information

1473 Competing interests

1474 The authors declare no conflicts of interest.

1475

1476 Author contributions

1477 YTK, JER, ID, Conception and design, Acquisition of data, Analysis and interpretation of data;

1478 JER, ID, Writing and drafting the article; CVF, RPM, Acquisition of data.

1479

1480 Data availability

1481 The following data has been deposited in the NCBI GEO (under accession no. GSE65689):

1482 ChIP-seq raw sequence tags from Jurkat CD4+ T cells and gene expression (RNA-seq) data.

61 1483 References

1484 Adelman, K., and Lis, J.T. (2012). Promoter-proximal pausing of RNA polymerase II: emerging 1485 roles in metazoans. Nature reviews Genetics 13, 720-731. 1486 1487 Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., 1488 Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000). Gene ontology: tool for the unification of 1489 biology. The Gene Ontology Consortium. Nat Genet 25, 25-29. 1490 1491 Bailey, T.L. (2011). DREME: motif discovery in transcription factor ChIP-seq data. 1492 Bioinformatics 27, 1653-1659. 1493 1494 Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., and 1495 Noble, W.S. (2009). MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 1496 37, W202-208. 1497 1498 Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G., Chepelev, I., 1499 and Zhao, K. (2007). High-resolution profiling of histone methylations in the human genome. 1500 Cell 129, 823-837. 1501 1502 Bernstein, B.E., Humphrey, E.L., Erlich, R.L., Schneider, R., Bouman, P., Liu, J.S., Kouzarides, 1503 T., and Schreiber, S.L. (2002). Methylation of histone H3 Lys 4 in coding regions of active 1504 genes. Proc Natl Acad Sci U S A 99, 8695-8700. 1505 1506 Bosque, A., and Planelles, V. (2011). Studies of HIV-1 latency in an ex vivo model that uses 1507 primary central memory T cells. Methods 53, 54-61. 1508 1509 Bulger, M., and Groudine, M. (2011). Functional and mechanistic diversity of distal transcription 1510 enhancers. Cell 144, 327-339. 1511 1512 Cabusora, L., Sutton, E., Fulmer, A., and Forst, C.V. (2005). Differential network expression 1513 during drug and stress response. Bioinformatics 21, 2898-2905. 1514 1515 Caldwell, R.L., Egan, B.S., and Shepherd, V.L. (2000). HIV-1 Tat represses transcription from 1516 the mannose receptor promoter. J Immunol 165, 7035-7041. 1517 1518 Consortium, E.P. (2011). A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS 1519 Biol 9, e1001046. 1520 1521 Core, L.J., and Lis, J.T. (2008). Transcription regulation through promoter-proximal pausing of 1522 RNA polymerase II. Science 319, 1791-1792. 1523 1524 Core, L.J., Waterfall, J.J., and Lis, J.T. (2008). Nascent RNA sequencing reveals widespread 1525 pausing and divergent initiation at human promoters. Science 322, 1845-1848. 1526 1527 Creyghton, M.P., Cheng, A.W., Welstead, G.G., Kooistra, T., Carey, B.W., Steine, E.J., Hanna, 1528 J., Lodato, M.A., Frampton, G.M., Sharp, P.A., et al. (2010). Histone H3K27ac separates active 1529 from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A 107, 21931- 1530 21936. 1531 1532 Croft, D., Mundo, A.F., Haw, R., Milacic, M., Weiser, J., Wu, G., Caudy, M., Garapati, P.,

62 1533 Gillespie, M., Kamdar, M.R., et al. (2014). The Reactome pathway knowledgebase. Nucleic 1534 Acids Res 42, D472-477. 1535 1536 D'Orso, I., and Frankel, A.D. (2010). RNA-mediated displacement of an inhibitory snRNP 1537 complex activates transcription elongation. Nat Struct Mol Biol 17, 815-821. 1538 1539 D'Orso, I., Jang, G.M., Pastuszak, A.W., Faust, T.B., Quezada, E., Booth, D.S., and Frankel, 1540 A.D. (2012). Transition step during assembly of HIV Tat:P-TEFb transcription complexes and 1541 transfer to TAR RNA. Mol Cell Biol 32, 4780-4793. 1542 1543 Dignam, J.D., Lebovitz, R.M., and Roeder, R.G. (1983). Accurate transcription initiation by RNA 1544 polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res 11, 1475- 1545 1489. 1546 1547 Egelhofer, T.A., Minoda, A., Klugman, S., Lee, K., Kolasinska-Zwierz, P., Alekseyenko, A.A., 1548 Cheung, M.S., Day, D.S., Gadel, S., Gorchakov, A.A., et al. (2011). An assessment of histone- 1549 modification antibody quality. Nat Struct Mol Biol 18, 91-93. 1550 1551 Feinberg, M.B., Baltimore, D., and Frankel, A.D. (1991). The role of Tat in the human 1552 immunodeficiency virus life cycle indicates a primary effect on transcriptional elongation. Proc 1553 Natl Acad Sci U S A 88, 4045-4049. 1554 1555 Ferrari, R., Pellegrini, M., Horwitz, G.A., Xie, W., Berk, A.J., and Kurdistani, S.K. (2008). 1556 Epigenetic reprogramming by adenovirus e1a. Science 321, 1086-1088. 1557 1558 Frankel, A.D., Bredt, D.S., and Pabo, C.O. (1988). Tat protein from human immunodeficiency 1559 virus forms a metal-linked dimer. Science 240, 70-73. 1560 1561 Frankel, A.D., and Young, J.A. (1998). HIV-1: fifteen proteins and an RNA. Annual review of 1562 biochemistry 67, 1-25. 1563 1564 Fuda, N.J., Ardehali, M.B., and Lis, J.T. (2009). Defining mechanisms that regulate RNA 1565 polymerase II transcription in vivo. Nature 461, 186-192. 1566 1567 Ghavi-Helm, Y., Klein, F.A., Pakozdi, T., Ciglar, L., Noordermeer, D., Huber, W., and Furlong, 1568 E.E. (2014). Enhancer loops appear stable during development and are associated with paused 1569 polymerase. Nature 512, 96-100. 1570 1571 Goel, R., Muthusamy, B., Pandey, A., and Prasad, T.S. (2011). Human protein reference 1572 database and human proteinpedia as discovery resources for molecular biotechnology. Mol 1573 Biotechnol 48, 87-95. 1574 1575 Gomes, N.P., Bjerke, G., Llorente, B., Szostek, S.A., Emerson, B.M., and Espinosa, J.M. 1576 (2006). Gene-specific requirement for P-TEFb activity and RNA polymerase II phosphorylation 1577 within the p53 transcriptional program. Genes & development 20, 601-612. 1578 1579 Grunberg, S., and Hahn, S. (2013). Structural insights into transcription initiation by RNA 1580 polymerase II. Trends in biochemical sciences 38, 603-611. 1581 1582 Guenther, M.G., Levine, S.S., Boyer, L.A., Jaenisch, R., and Young, R.A. (2007). A chromatin 1583 landmark and transcription initiation at most promoters in human cells. Cell 130, 77-88.

63 1584 1585 Guttman, M., and Rinn, J.L. (2012). Modular regulatory principles of large non-coding RNAs. 1586 Nature 482, 339-346. 1587 1588 Hagege, H., Klous, P., Braem, C., Splinter, E., Dekker, J., Cathala, G., de Laat, W., and Forne, 1589 T. (2007). Quantitative analysis of chromosome conformation capture assays (3C-qPCR). Nat 1590 Protoc 2, 1722-1733. 1591 1592 Hah, N., Murakami, S., Nagari, A., Danko, C.G., and Kraus, W.L. (2013). Enhancer transcripts 1593 mark active estrogen receptor binding sites. Genome Res 23, 1210-1223. 1594 1595 Hahn, S. (2004). Structure and mechanism of the RNA polymerase II transcription machinery. 1596 Nat Struct Mol Biol 11, 394-403. 1597 1598 Hazenberg, M.D., Stuart, J.W., Otto, S.A., Borleffs, J.C., Boucher, C.A., de Boer, R.J., 1599 Miedema, F., and Hamann, D. (2000). T-cell division in human immunodeficiency virus (HIV)-1 1600 infection is mainly due to immune activation: a longitudinal analysis in patients before and 1601 during highly active antiretroviral therapy (HAART). Blood 95, 249-255. 1602 1603 He, N., Liu, M., Hsu, J., Xue, Y., Chou, S., Burlingame, A., Krogan, N.J., Alber, T., and Zhou, Q. 1604 (2010). HIV-1 Tat and host AFF4 recruit two transcription elongation factors into a bifunctional 1605 complex for coordinated activation of HIV-1 transcription. Mol Cell 38, 428-438. 1606 1607 He, Q., Johnston, J., and Zeitlinger, J. (2015). ChIP-nexus enables improved detection of in vivo 1608 transcription factor binding footprints. Nat Biotechnol 33, 395-401. 1609 1610 Heintzman, N.D., and Ren, B. (2009). Finding distal regulatory elements in the human genome. 1611 Curr Opin Genet Dev 19, 541-549. 1612 1613 Heintzman, N.D., Stuart, R.K., Hon, G., Fu, Y., Ching, C.W., Hawkins, R.D., Barrera, L.O., Van 1614 Calcar, S., Qu, C., Ching, K.A., et al. (2007). Distinct and predictive chromatin signatures of 1615 transcriptional promoters and enhancers in the human genome. Nat Genet 39, 311-318. 1616 1617 Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., 1618 Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription 1619 factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 1620 576-589. 1621 1622 Hellerstein, M.K., Hoh, R.A., Hanley, M.B., Cesar, D., Lee, D., Neese, R.A., and McCune, J.M. 1623 (2003). Subpopulations of long-lived and short-lived T cells in advanced HIV-1 infection. J Clin 1624 Invest 112, 956-966. 1625 1626 Hollenhorst, P.C., Chandler, K.J., Poulsen, R.L., Johnson, W.E., Speck, N.A., and Graves, B.J. 1627 (2009). DNA specificity determinants associate with distinct transcription factor functions. PLoS 1628 Genet 5, e1000778. 1629 1630 Hollenhorst, P.C., McIntosh, L.P., and Graves, B.J. (2011). Genomic and biochemical insights 1631 into the specificity of ETS transcription factors. Annual review of biochemistry 80, 437-471. 1632 1633 Horwitz, G.A., Zhang, K., McBrian, M.A., Grunstein, M., Kurdistani, S.K., and Berk, A.J. (2008). 1634 Adenovirus small e1a alters global patterns of histone modification. Science 321, 1084-1085.

64 1635 1636 Hottiger, M.O., and Nabel, G.J. (1998). Interaction of human immunodeficiency virus type 1 Tat 1637 with the transcriptional coactivators p300 and CREB binding protein. J Virol 72, 8252-8256. 1638 1639 Huang, L., Bosch, I., Hofmann, W., Sodroski, J., and Pardee, A.B. (1998). Tat protein induces 1640 human immunodeficiency virus type 1 (HIV-1) coreceptors and promotes infection with both 1641 macrophage-tropic and T-lymphotropic HIV-1 strains. J Virol 72, 8952-8960. 1642 1643 Izmailova, E., Bertley, F.M., Huang, Q., Makori, N., Miller, C.J., Young, R.A., and Aldovini, A. 1644 (2003). HIV-1 Tat reprograms immature dendritic cells to express chemoattractants for activated 1645 T cells and macrophages. Nat Med 9, 191-197. 1646 1647 Jager, S., Cimermancic, P., Gulbahce, N., Johnson, J.R., McGovern, K.E., Clarke, S.C., Shales, 1648 M., Mercenne, G., Pache, L., Li, K., et al. (2012). Global landscape of HIV-human protein 1649 complexes. Nature 481, 365-370. 1650 1651 Jin, F., Li, Y., Dixon, J.R., Selvaraj, S., Ye, Z., Lee, A.Y., Yen, C.A., Schmitt, A.D., Espinoza, 1652 C.A., and Ren, B. (2013). A high-resolution map of the three-dimensional chromatin interactome 1653 in human cells. Nature 503, 290-294. 1654 1655 Kao, S.Y., Calman, A.F., Luciw, P.A., and Peterlin, B.M. (1987). Anti-termination of transcription 1656 within the long terminal repeat of HIV-1 by tat gene product. Nature 330, 489-493. 1657 1658 Kerrien, S., Aranda, B., Breuza, L., Bridge, A., Broackes-Carter, F., Chen, C., Duesbury, M., 1659 Dumousseau, M., Feuermann, M., Hinz, U., et al. (2012). The IntAct molecular interaction 1660 database in 2012. Nucleic Acids Res 40, D841-846. 1661 1662 Kim, N., Kukkonen, S., Gupta, S., and Aldovini, A. (2010a). Association of Tat with promoters of 1663 PTEN and PP2A subunits is key to transcriptional activation of apoptotic pathways in HIV- 1664 infected CD4+ T cells. PLoS Pathog 6, e1001103. 1665 1666 Kim, N., Kukkonen, S., Martinez-Viedma Mdel, P., Gupta, S., and Aldovini, A. (2013). Tat 1667 engagement of p38 MAP kinase and IRF7 pathways leads to activation of interferon-stimulated 1668 genes in antigen-presenting cells. Blood 121, 4090-4100. 1669 1670 Kim, T.K., Hemberg, M., Gray, J.M., Costa, A.M., Bear, D.M., Wu, J., Harmin, D.A., Laptewicz, 1671 M., Barbara-Haley, K., Kuersten, S., et al. (2010b). Widespread transcription at neuronal 1672 activity-regulated enhancers. Nature 465, 182-187. 1673 1674 Kim, T.K., and Shiekhattar, R. (2015). Architectural and Functional Commonalities between 1675 Enhancers and Promoters. Cell 162, 948-959. 1676 1677 Kolasinska-Zwierz, P., Down, T., Latorre, I., Liu, T., Liu, X.S., and Ahringer, J. (2009). 1678 Differential chromatin marking of introns and expressed exons by H3K36me3. Nat Genet 41, 1679 376-381. 1680 1681 Konig, R., Stertz, S., Zhou, Y., Inoue, A., Hoffmann, H.H., Bhattacharyya, S., Alamares, J.G., 1682 Tscherne, D.M., Ortigoza, M.B., Liang, Y., et al. (2010). Human host factors required for 1683 influenza virus replication. Nature 463, 813-817. 1684 1685 Kouzarides, T. (2007). Chromatin modifications and their function. Cell 128, 693-705.

65 1686 1687 Kowalczyk, M.S., Hughes, J.R., Garrick, D., Lynch, M.D., Sharpe, J.A., Sloane-Stanley, J.A., 1688 McGowan, S.J., De Gobbi, M., Hosseini, M., Vernimmen, D., et al. (2012). Intragenic enhancers 1689 act as alternative promoters. Mol Cell 45, 447-458. 1690 1691 Krogan, N.J., Dover, J., Wood, A., Schneider, J., Heidt, J., Boateng, M.A., Dean, K., Ryan, 1692 O.W., Golshani, A., Johnston, M., et al. (2003a). The Paf1 complex is required for histone H3 1693 methylation by COMPASS and Dot1p: linking transcriptional elongation to histone methylation. 1694 Mol Cell 11, 721-729. 1695 1696 Krogan, N.J., Kim, M., Tong, A., Golshani, A., Cagney, G., Canadien, V., Richards, D.P., 1697 Beattie, B.K., Emili, A., Boone, C., et al. (2003b). Methylation of histone H3 by Set2 in 1698 Saccharomyces cerevisiae is linked to transcriptional elongation by RNA polymerase II. Mol Cell 1699 Biol 23, 4207-4218. 1700 1701 Kukkonen, S., Martinez-Viedma Mdel, P., Kim, N., Manrique, M., and Aldovini, A. (2014). HIV-1 1702 Tat second exon limits the extent of Tat-mediated modulation of interferon-stimulated genes in 1703 antigen presenting cells. Retrovirology 11, 30. 1704 1705 Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient 1706 alignment of short DNA sequences to the human genome. Genome Biol 10, R25. 1707 1708 Laspia, M.F., Rice, A.P., and Mathews, M.B. (1989). HIV-1 Tat protein increases transcriptional 1709 initiation and stabilizes elongation. Cell 59, 283-292. 1710 1711 Lau, A., Swinbank, K.M., Ahmed, P.S., Taylor, D.L., Jackson, S.P., Smith, G.C., and O'Connor, 1712 M.J. (2005). Suppression of HIV-1 infection by a small molecule inhibitor of the ATM kinase. Nat 1713 Cell Biol 7, 493-500. 1714 1715 Li, B., Carey, M., and Workman, J.L. (2007). The role of chromatin during transcription. Cell 1716 128, 707-719. 1717 1718 Li, C.J., Ueda, Y., Shi, B., Borodyansky, L., Huang, L., Li, Y.Z., and Pardee, A.B. (1997). Tat 1719 protein induces self-perpetuating permissivity for productive HIV-1 infection. Proc Natl Acad Sci 1720 U S A 94, 8116-8120. 1721 1722 Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., 1723 Durbin, R., and Genome Project Data Processing, S. (2009). The Sequence Alignment/Map 1724 format and SAMtools. Bioinformatics 25, 2078-2079. 1725 1726 Lopez-Huertas, M.R., Callejas, S., Abia, D., Mateos, E., Dopazo, A., Alcami, J., and Coiras, M. 1727 (2010). Modifications in host cell cytoskeleton structure and function mediated by intracellular 1728 HIV-1 Tat protein are greatly dependent on the second coding exon. Nucleic Acids Res 38, 1729 3287-3307. 1730 1731 Lorenz, R., Bernhart, S.H., Honer Zu Siederdissen, C., Tafer, H., Flamm, C., Stadler, P.F., and 1732 Hofacker, I.L. (2011). ViennaRNA Package 2.0. Algorithms Mol Biol 6, 26. 1733 1734 Luo, Z., Lin, C., and Shilatifard, A. (2012). The super elongation complex (SEC) family in 1735 transcriptional control. Nat Rev Mol Cell Biol 13, 543-547. 1736

66 1737 Mancebo, H.S., Lee, G., Flygare, J., Tomassini, J., Luu, P., Zhu, Y., Peng, J., Blau, C., Hazuda, 1738 D., Price, D., et al. (1997). P-TEFb kinase is required for HIV Tat transcriptional activation in 1739 vivo and in vitro. Genes & development 11, 2633-2644. 1740 1741 Marazzi, I., Ho, J.S., Kim, J., Manicassamy, B., Dewell, S., Albrecht, R.A., Seibert, C.W., 1742 Schaefer, U., Jeffrey, K.L., Prinjha, R.K., et al. (2012). Suppression of the antiviral response by 1743 an influenza histone mimic. Nature 483, 428-433. 1744 1745 Marban, C., Su, T., Ferrari, R., Li, B., Vatakis, D., Pellegrini, M., Zack, J.A., Rohr, O., and 1746 Kurdistani, S.K. (2011). Genome-wide binding map of the HIV-1 Tat protein to the human 1747 genome. PLoS One 6, e26894. 1748 1749 Martin, C., and Zhang, Y. (2005). The diverse functions of histone lysine methylation. Nat Rev 1750 Mol Cell Biol 6, 838-849. 1751 1752 Mata, M.A., Satterly, N., Versteeg, G.A., Frantz, D., Wei, S., Williams, N., Schmolke, M., Pena- 1753 Llopis, S., Brugarolas, J., Forst, C.V., et al. (2011). Chemical inhibition of RNA viruses reveals 1754 REDD1 as a host defense factor. Nat Chem Biol 7, 712-719. 1755 1756 Mayer, A., di Iulio, J., Maleri, S., Eser, U., Vierstra, J., Reynolds, A., Sandstrom, R., 1757 Stamatoyannopoulos, J.A., and Churchman, L.S. (2015). Native elongating transcript 1758 sequencing reveals human transcriptional activity at nucleotide resolution. Cell 161, 541-554. 1759 Meijsing, S.H., Pufall, M.A., So, A.Y., Bates, D.L., Chen, L., and Yamamoto, K.R. (2009). DNA 1760 binding site sequence directs glucocorticoid receptor structure and activity. Science 324, 407- 1761 410. 1762 Messi, M., Giacchetto, I., Nagata, K., Lanzavecchia, A., Natoli, G., and Sallusto, F. (2003). 1763 Memory and flexibility of cytokine gene expression as separable properties of human T(H)1 and 1764 T(H)2 lymphocytes. Nat Immunol 4, 78-86. 1765 1766 Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., 1767 Brockman, W., Kim, T.K., Koche, R.P., et al. (2007). Genome-wide maps of chromatin state in 1768 pluripotent and lineage-committed cells. Nature 448, 553-560. 1769 1770 Min, I.M., Waterfall, J.J., Core, L.J., Munroe, R.J., Schimenti, J., and Lis, J.T. (2011). Regulating 1771 RNA polymerase pausing and transcription elongation in embryonic stem cells. Genes & 1772 development 25, 742-754. 1773 1774 Nguyen, A.T., and Zhang, Y. (2011). The diverse functions of Dot1 and H3K79 methylation. 1775 Genes & development 25, 1345-1358. 1776 1777 Onder, T.T., Kara, N., Cherry, A., Sinha, A.U., Zhu, N., Bernt, K.M., Cahan, P., Marcarci, B.O., 1778 Unternaehrer, J., Gupta, P.B., et al. (2012). Chromatin-modifying enzymes as modulators of 1779 reprogramming. Nature 483, 598-602. 1780 1781 Ott, M., Geyer, M., and Zhou, Q. (2011). The control of HIV transcription: keeping RNA 1782 polymerase II on track. Cell Host Microbe 10, 426-435. 1783 1784 Pearson, R., Kim, Y.K., Hokello, J., Lassen, K., Friedman, J., Tyagi, M., and Karn, J. (2008). 1785 Epigenetic silencing of human immunodeficiency virus (HIV) transcription by formation of 1786 restrictive chromatin structures at the viral long terminal repeat drives the progressive entry of 1787 HIV into latency. J Virol 82, 12291-12303.

67 1788 1789 Peterlin, B.M., and Price, D.H. (2006). Controlling the elongation phase of transcription with P- 1790 TEFb. Mol Cell 23, 297-305. 1791 1792 Pokholok, D.K., Harbison, C.T., Levine, S., Cole, M., Hannett, N.M., Lee, T.I., Bell, G.W., 1793 Walker, K., Rolfe, P.A., Herbolsheimer, E., et al. (2005). Genome-wide map of nucleosome 1794 acetylation and methylation in yeast. Cell 122, 517-527. 1795 1796 Raha, T., Cheng, S.W., and Green, M.R. (2005). HIV-1 Tat stimulates transcription complex 1797 assembly through recruitment of TBP in the absence of TAFs. PLoS Biol 3, e44. 1798 1799 Rahl, P.B., Lin, C.Y., Seila, A.C., Flynn, R.A., McCuine, S., Burge, C.B., Sharp, P.A., and 1800 Young, R.A. (2010). c-Myc regulates transcriptional pause release. Cell 141, 432-445. 1801 1802 Reppas, N.B., Wade, J.T., Church, G.M., and Struhl, K. (2006). The transition between 1803 transcriptional initiation and elongation in E. coli is highly variable and often rate limiting. Mol 1804 Cell 24, 747-757. 1805 1806 Rhee, H.S., and Pugh, B.F. (2011). Comprehensive genome-wide protein-DNA interactions 1807 detected at single-nucleotide resolution. Cell 147, 1408-1419. 1808 1809 Rice, A.P., and Mathews, M.B. (1988). Transcriptional but not translational regulation of HIV-1 1810 by the tat gene product. Nature 332, 551-553. 1811 1812 Robinson, J.T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and 1813 Mesirov, J.P. (2011). Integrative genomics viewer. Nat Biotechnol 29, 24-26. 1814 1815 Sancho, D., Gomez, M., and Sanchez-Madrid, F. (2005). CD69 is an immunoregulatory 1816 molecule induced following activation. Trends Immunol 26, 136-140. 1817 1818 Santos-Rosa, H., Schneider, R., Bannister, A.J., Sherriff, J., Bernstein, B.E., Emre, N.C., 1819 Schreiber, S.L., Mellor, J., and Kouzarides, T. (2002). Active genes are tri-methylated at K4 of 1820 histone H3. Nature 419, 407-411. 1821 1822 Seila, A.C., Calabrese, J.M., Levine, S.S., Yeo, G.W., Rahl, P.B., Flynn, R.A., Young, R.A., and 1823 Sharp, P.A. (2008). Divergent transcription from active promoters. Science 322, 1849-1851. 1824 1825 Shilatifard, A. (2012). The COMPASS family of histone H3K4 methylases: mechanisms of 1826 regulation in development and disease pathogenesis. Annual review of biochemistry 81, 65-95. 1827 1828 Shiryev, S.A., Papadopoulos, J.S., Schaffer, A.A., and Agarwala, R. (2007). Improved BLAST 1829 searches using longer words for protein seeding. Bioinformatics 23, 2949-2951. 1830 1831 Sims, R.J., 3rd, Belotserkovskaya, R., and Reinberg, D. (2004). Elongation by RNA polymerase 1832 II: the short and long of it. Genes & development 18, 2437-2468. 1833 1834 Smallwood, A., and Ren, B. (2013). Genome organization and long-range regulation of gene 1835 expression by enhancers. Curr Opin Cell Biol 25, 387-394. 1836 1837 Smith, E., and Shilatifard, A. (2013). Transcriptional elongation checkpoint control in 1838 development and disease. Genes & development 27, 1079-1088.

68 1839 1840 Sobhian, B., Laguette, N., Yatim, A., Nakamura, M., Levy, Y., Kiernan, R., and Benkirane, M. 1841 (2010). HIV-1 Tat assembles a multifunctional transcription elongation complex and stably 1842 associates with the 7SK snRNP. Mol Cell 38, 439-451. 1843 1844 Stasevich, T.J., Hayashi-Takanaka, Y., Sato, Y., Maehara, K., Ohkawa, Y., Sakata-Sogawa, K., 1845 Tokunaga, M., Nagase, T., Nozaki, N., McNally, J.G., et al. (2014). Regulation of RNA 1846 polymerase II activation by histone acetylation in single living cells. Nature 516, 272-275. 1847 1848 Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., 1849 Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment 1850 analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc 1851 Natl Acad Sci U S A 102, 15545-15550. 1852 1853 Tessarz, P., and Kouzarides, T. (2014). Histone core modifications regulating nucleosome 1854 structure and dynamics. Nat Rev Mol Cell Biol 15, 703-708. 1855 1856 Trapnell, C., Hendrickson, D.G., Sauvageau, M., Goff, L., Rinn, J.L., and Pachter, L. (2013). 1857 Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol 31, 1858 46-53. 1859 1860 Wade, J.T., and Struhl, K. (2008). The transition from transcriptional initiation to elongation. Curr 1861 Opin Genet Dev 18, 130-136. 1862 1863 Weissman, J.D., Brown, J.A., Howcroft, T.K., Hwang, J., Chawla, A., Roche, P.A., Schiltz, L., 1864 Nakatani, Y., and Singer, D.S. (1998). HIV-1 tat binds TAFII250 and represses TAFII250- 1865 dependent transcription of major histocompatibility class I genes. Proc Natl Acad Sci U S A 95, 1866 11601-11606. 1867 1868 Zeitlinger, J., Stark, A., Kellis, M., Hong, J.W., Nechaev, S., Adelman, K., Levine, M., and 1869 Young, R.A. (2007). RNA polymerase stalling at developmental control genes in the Drosophila 1870 melanogaster embryo. Nat Genet 39, 1512-1516. 1871 1872 Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., 1873 Myers, R.M., Brown, M., Li, W., et al. (2008). Model-based analysis of ChIP-Seq (MACS). 1874 Genome Biol 9, R137. 1875 1876 Zhou, Q., Li, T., and Price, D.H. (2012). RNA polymerase II elongation control. Annual review of 1877 biochemistry 81, 119-143. 1878 1879 Zhou, V.W., Goren, A., and Bernstein, B.E. (2011). Charting histone modifications and the 1880 functional organization of mammalian genomes. Nature reviews Genetics 12, 7-18. 1881 1882 Zhu, Y., Pe'ery, T., Peng, J., Ramanathan, Y., Marshall, N., Marshall, T., Amendt, B., Mathews, 1883 M.B., and Price, D.H. (1997). Transcription elongation factor P-TEFb is required for HIV-1 tat 1884 transactivation in vitro. Genes & development 11, 2622-2632. 1885

69 1886 Figure legends

1887

1888 Figure 1. Genomic domains occupied and regulated by Tat in CD4+ T cells. (A) Western

1889 blot of Jurkat-GFP and -Tat cell lines treated (+) or not (–) with doxycycline (DOX) using the

1890 indicated antibodies. (B) Genome-wide distribution of Tat across the human genome. (C)

1891 Integration of the FLAG ChIP-seq and RNA-seq datasets defines a set of genes directly

1892 regulated by Tat. Direct targets, genes directly bound and regulated by Tat. TSG, Tat stimulated

1893 genes; TDG, Tat downregulated genes. (D) Validation of the RNA-seq dataset using qRT-PCR

1894 on the indicated TSG, TDG or non-target genes as negative controls (mean ± SEM; n = 3). (E)

1895 Individual tracks showing FLAG ChIP-seq and the corresponding RNA-seq dataset in the GFP

1896 and Tat cell lines. TSS, transcription start site. (F) Functional annotation of biological processes

1897 enriched at TSG and TDG. This figure is associated with Figure1–figure supplements 1 to 10.

1898

1899 Figure 2. Global analysis of chromatin signatures reveals that Tat activates the

1900 transcription initiation and elongation steps. (A) Dot plots of H3K4me3 log2 fold change in

1901 the region encompassing -3/+3 Kb from the TSS of all TSG. Genes are divided into two

1902 clusters: class I and class II based on increased (>1.5-fold change) or no change/decreased

1903 H3K4me3 levels in the presence of Tat. Selected TSG examples are indicated in red. (B)

1904 Metagene plots centered on TSS showing H3K4me3 occupancy profiles at both class I (n = 17)

1905 and class II (n = 43) TSG in the presence of Tat or GFP. (C) Dot plots of H3K79me3 log2 fold

1906 change in the region from TSS to TTS (see methods). (D) Metagene analysis showing average

1907 H3K79me3 ChIP-seq signals at both class I and class II TSG in the presence of Tat or GFP.

1908 Units are mean tags per million ChIP-Seq reads per bin across the transcribed region of each

1909 gene with 2 kb upstream and downstream flanking regions. (E) Dot plots of H3K36me3 log2 fold

1910 change in the region from TSS to TTS. (F) Metagene plots showing average H3K36me3 ChIP-

1911 seq signals at both class I and class II TSG in the presence of Tat or GFP. (G) Genome browser

70 1912 views showing ChIP-seq signal at a class I TSG (NETO1) in the GFP and Tat cell lines. (H)

1913 Genome browser views showing ChIP-seq signal at a class II TSG (VAV3) in the GFP and Tat

1914 cell lines. This figure is associated with Figure 2–figure supplements 1 to 5.

1915

1916 Figure 3. Tat promotes Pol II recruitment and pause release at two distinct TSG classes.

1917 (A) Tat binding promotes Pol II recruitment at class I but not class II TSG. Box plots of

1918 normalized Pol II tag density at the Tat peak of class I (n = 17) or II (n = 43) TSG in the GFP

1919 and Tat cell lines. (B) Tat binding at class I TSG induces Pol II occupancy at promoters. Box

1920 plots of normalized Pol II tag density at the promoter of class I or II TSG in the GFP and Tat cell

1921 lines. (C) Pol II normalized tag density relative to the Tat peak at class I TSG. (D) Pol II

1922 normalized tag density relative to the Tat peak at class II TSG. (E) Pol II distribution at class I

1923 TSG (Metagene plots) in the Tat and GFP cell lines. (F) Genome browser views of ChIP-seq

1924 data in the Tat and GFP cell lines at a class I TSG (NETO1). The arrow indicates the position of

1925 the FLAG peak called in the Tat cell lines by MACS. (G) Pol II distribution at class II TSG

1926 (Metagene plots) in the Tat and GFP cell lines. (H) Genome browser views of ChIP-seq data in

1927 the Tat and GFP cell lines at a class II TSG (VAV3). This figure is associated with Figure 3–

1928 figure supplements 1 and 2.

1929

1930 Figure 4. Tat induces transcription initiation from distal sites by inducing gene looping.

1931 (A) Heatmap representation of Tat distribution at TSG promoter or intragenic sites. (B) Heatmap

1932 representation of H3K4me3 and H3K27Ac tag density centered on the Tat peak at both TSG

1933 promoter and intragenic sites in the GFP and Tat cell lines. Promoter sites are marked with high

1934 H3K4me3 and low H3K27Ac levels, while intragenic sites are marked by low H3K4me3 and

1935 high H3K27Ac levels. (C) Heatmap representation of H3K27Ac at class I and class II TSG

1936 centered on the Tat peak in the GFP and Tat cell lines. Genes are ranked based on H3K27Ac

1937 density. (D) Tat binding sharply increases H3K27Ac levels at class I TSG. Box plots showing

71 1938 normalized H3K27Ac density at class I and II TSG in the GFP and Tat cell lines. (E) Metagene

1939 analysis of H3K27Ac levels surrounding Tat peaks at class I TSG in the GFP and Tat cell lines.

1940 (F) Metagene analysis of H3K27Ac levels surrounding Tat peaks at class II TSG in the GFP and

1941 Tat cell lines. (G) ChIP of H3K27Ac and p300 recruitment at an intragenic Tat site at the

1942 PPM1H gene. (H) ChIP of H3K27Ac and p300 recruitment at an intragenic Tat site at the CD82

1943 gene. (I) Top, genome browser views of ChIP-seq at the PPM1H locus in the GFP and Tat cell

1944 lines. The arrows indicate the position of two intragenic Tat binding sites. Bottom, 3C assay

1945 showing the relative crosslinking efficiency at the PPM1H locus in the GFP and Tat cell lines.

1946 The position of the primers used in qPCR assays, restriction sites and fragment generated after

1947 digestion are indicated. This figure is associated with Figure 4–figure supplements 1 to 4.

1948

1949 Figure 5. A modified traveling ratio algorithm reveals Tat roles in transcription

1950 elongation. (A) Traveling ratio (TR) algorithm used to calculate levels of promoter-proximal

1951 paused and elongation Pol II as previously described (Rahl et al., 2010; Zeitlinger et al., 2007).

1952 (B) Genome browser views of ChIP-seq data at the CD82 locus in the GFP and Tat cell lines.

1953 (C) Modified traveling ratio (mTR) algorithm that enables the identification of intragenically

1954 paused Pol II at sites of high H3K27Ac levels. (D) mTR decreases at TSG in the Tat cell line.

1955 Box plots showing mTR calculations at TSG as indicated in panel (D). (E) Dual-axis plot of the

1956 ratio of paused Pol II (pPol II) and traveling Pol II (tPol II) at class I (n = 17) and II (n = 43) TSG

1957 in the presence and absence of Tat (Tat/GFP). (F) Model of Tat-mediated Pol II recruitment and

1958 gene looping at class I TSG. Tat binds at promoter and/or intragenic regions (in some cases

1959 inducing gene looping), and recruits Pol II to promote transcription initiation from both promoter-

1960 proximal and promoter-distal sites. (G) Model of Tat-mediated Pol II elongation at class II TSG.

1961 Class II TSG are already activated in the absence of Tat, and Tat binds to these genes to

1962 primarily stimulate transcription elongation. These genes contain promoter-bound Pol II. This

1963 figure is associated with Figure 5–figure supplement 1.

72 1964

1965 Figure 6. Global analysis of chromatin signatures reveals the basis of transcriptional

1966 repression by Tat. (A) Dot plots of H3K4me3 fold change in the region encompassing -3/+3 Kb

1967 from the TSS of all TDG. TDG are divided into two classes: class I (n = 11) and class II (n = 14)

1968 based on decreased, or unchanged levels of H3K4me3 in the presence of Tat, respectively.

1969 Selected TDG are indicated in red. (B) Metagene plots centered on TSS showing H3K4me3

1970 occupancy profiles at both class I and class II TDG in the presence of Tat or GFP. (C) Dot plots

1971 of H3K79me3 fold change from TSS to TTS. (D) Metagene plots showing average H3K79me3

1972 ChIP-seq signals at both class I and class II TDG in the presence of Tat or GFP (see methods).

1973 (E) Dot plots of H3K36me3 fold change in the region from TSS to TTS. (F) Metagene plots

1974 showing average H3K36me3 ChIP-seq signals at both class I and class II TDG in the presence

1975 of Tat or GFP. (G) Genome browser views showing ChIP-seq signal at a class I TDG (CD1E) in

1976 the GFP and Tat cell lines. (H) Genome browser views showing ChIP-seq signal at a class II

1977 TDG (EOMES) in the GFP and Tat cell lines. This figure is associated with Figure 6–figure

1978 supplement 1.

1979

1980 Figure 7. Tat blocks Pol II recruitment and pause release to repress gene transcription at

1981 different gene classes. (A) Tat binding blocks Pol II recruitment primarily at class I TDG. Box

1982 plots of normalized Pol II tag density at the Tat peak of class I or II TDG in the GFP and Tat cell

1983 lines. (B) Tat binding at class I TSG blocks Pol II density at promoters. Box plots of normalized

1984 Pol II tag density at the promoter of class I (n = 11) or class II (n = 14) TDG in the GFP and Tat

1985 cell lines. (C) Normalized Pol II tag density relative to the Tat peak at class I TDG. (D)

1986 Normalized Pol II tag density relative to the Tat peak at class II TDG. (E) Pol II distribution at

1987 class I TDG (Metagene plots) in the Tat and GFP cell lines. (F) Genome browser views of ChIP-

1988 seq data at a class I TDG (CD1E) in the Tat and GFP cell lines. (G) Pol II distribution at class II

1989 genes (Metagene plots) in the Tat and GFP. (H) Genome browser views of ChIP-seq data at a

73 1990 class II TDG (EOMES) in the Tat and GFP cell lines. This figure is associated with Figure 7–

1991 figure supplement 1.

1992

1993 Figure 8. Enrichment of master transcriptional regulators, but not TAR-like motifs on Tat

1994 sites at the direct target genes. (A) TAR sequence and scheme of the secondary structure.

1995 (B) Schematic of the sequence query used to search for TAR-like motifs within the direct Tat

1996 target genes. (C) TAR-like motifs are not found at or significantly near Tat peaks in TSG.

1997 Fraction of TSG containing TAR-like motifs at <10kb, <1kb or <0.1kb from the Tat peak. (D)

1998 TAR-like motifs are not found at or significantly near Tat peaks in TDG. Fraction of TDG

1999 containing TAR-like motifs at <10kb, <1kb or <0.1kb from the Tat peak. (E) MEME analysis of

2000 200-bp windows surrounding Tat peaks in TSG reveals high-confidence motifs related to

2001 transcription factors ETS1 and RUNX1. (F) MEME analysis of 200-bp windows surrounding Tat

2002 peaks in TDG reveal different motifs for ETS1 and RUNX1. (G) Comparison of ETS1 ChIP-seq

2003 peak locations in Jurkat, as determined by Hollenhorst et al. (Hollenhorst et al., 2009), to Tat

2004 peak locations reveal that 73% of TSG contain an ETS1 peak within 100-bp of a Tat peak. (H)

2005 Comparison of ETS1 ChIP-seq peak locations to Tat peak locations reveal that 80% of TDG

2006 contain an ETS1 peak within 100-bp of a Tat peak. This figure is associated with Figure 8–figure

2007 supplement 1.

2008

2009 Figure 9. Tat is recruited to its target genes through interaction with the master

2010 transcriptional regulator ETS1. (A) Western blots showing interactions between Tat and

2011 ETS1. CDK9 was used as a positive control in the interaction. Strep-tagged Tat and GFP were

2012 affinity purified (AP) from the Jurkat cell lines using Strep beads, and analyzed by western blot

2013 using the indicated antibodies. (B) Tat is recruited to target genes marked by ETS1. ChIP

2014 assays showing that Tat and ETS1 co-occupy the promoter-proximal but not promoter-distal

2015 region of CD69, in agreement with the location of Tat and ETS1 peaks found by ChIP-seq and

74 2016 the presence of ETS1 motifs (as revealed by enrichment analysis). (C) Tat is also recruited to

2017 target genes using ETS1-independent mechanisms. ChIP assays showing that Tat but not

2018 ETS1 occupies the promoter-proximal region of the CD1E target gene, in agreement with the

2019 Tat and ETS1 ChIP-seq dataset and motif prediction analysis. (D) Generation of Jurkat GFP

2020 and Tat cell lines expressing non-target (NT) or ETS1 shRNAs. (–) denotes the parental

2021 untransduced Jurkat cell lines. Protein lysates were analyzed by western blot using the

2022 indicated antibodies to verify for the RNAi efficiency. (E) ETS1 knockdown impairs Tat

2023 recruitment to the CD69 promoter. ChIP-qPCR assays showing the density of ETS1 or FLAG at

2024 the CD69 promoter in the Jurkat GFP or Tat cell lines transduced with the NT or ETS1 shRNAs.

2025 (F) ETS1 knockdown does not abolish Tat recruitment to the CD1E promoter. ChIP-qPCR

2026 assays showing the density of ETS1 or FLAG at the CD1E promoter in the Jurkat GFP or Tat

2027 cell lines transduced with the NT or ETS1 shRNAs. This figure is associated with Figure 9–

2028 figure supplement 1.

75 2029 Table 1. Genome-wide distribution of Tat in CD4+ T cells.

Genomic Domain Peaks Percent Genome ChIP fraction (p-value) Total 6117 100 Intergenic 2141 35 52.7% Intragenic Promoter 1469 24 1.1% 3.4x10-323 (-1kb– TSS) Exon 128 2.1 1.9% 1.4x10-3 Intron 2227 36.4 42.4% 5.1x10-4 5’ UTR 73 1.2 0.4% 5.3x10-193 3’ UTR 79 1.3 1.5% 1.4x10-2 2030

76 2031 Table 2. Genome-wide distribution and function of histone modifications.

Histone modification Location Function H3K4me3 Promoters Transcription activation H3K79me3 Gene bodies Transcription activation H3K36me3 Gene bodies Transcription activation H3K4me1 Enhancers Does not demarcate active status, but location H3K27Ac Enhancers Transcription activation H3K9me3 Ubiquitous Transcription repression 2032

77 2033 Table 3. Statistics of the TAR signature selection.

TSG TDG Total Targets 247 191 438 Genomic Sequences (with duplicates) 1089 797 1886 Isolated target sequences (with 34739 16820 51559 duplicates) With correct secondary structure 493 253 746 Corresponding genes 146 86 232 2034

78 2035 Supplementary figure legends

2036

2037 Figure 1–figure supplement 1. Tat protein expression levels in the Jurkat Tat-SF model

2038 matches the levels of Tat detected during HIV infection. Western blot (FLAG and Tat) of

2039 Jurkat Tat-SF cell line treated with (+) or without (–) doxycycline (DOX), and Jurkat cells latently

2040 infected with HIV (clone E4) induced with (+) or without (–) TNF-α for 24 hours. A β-actin

2041 western blot is shown as loading control.

2042

2043 Figure 1–figure supplement 2. Technical improvement of Tat ChIP-seq in CD4+ T cells.

2044 (A) Comparison of ChIP-seq Tat peak numbers in the Jurkat GFP and Tat cell lines using the

2045 Tat and FLAG antibodies. The number of Tat peaks in a similar Jurkat Tat cell line from the

2046 study of Marban et al. (Marban et al., 2011) is indicated. (B) Overlay of Tat peaks from this

2047 study and those identified by Marban et al. using ChIP-seq (Marban et al., 2011). The number of

2048 common Tat peaks between our ChIP-seq data and Marban’s at a distance of <0.1kb or <1kb

2049 from the Tat sites is shown. (C–E) Genome browser views of FLAG ChIP-seq tracks in the GFP

2050 and Tat cell lines along with the FLAG peaks called by MACS in the Tat cell line and the Tat

2051 track by Marban et al. (Marban et al., 2011).

2052

2053 Figure 1–figure supplement 3. Non-functional Tat mutants have compromised chromatin

2054 interaction and modulation of cellular gene expression. (A) Scheme of Tat showing the

2055 position of its activation domain (AD), RNA-binding domain (RBD) and C-terminal domain (Ct)

2056 along with the location of the mutated residues. (B) Western blots of Jurkat CD4+ T cell lines

2057 expressing GFP, wild-type Tat or non-functional mutants (C22A, K50Q, R52R53K) (D'Orso et

2058 al., 2012). Cells from panel (B) were used in ChIP assays to analyze the occupancy of GFP, Tat

2059 or the non-functional mutants at class I TSG promoters (C), class II TSG promoters (D), class I

2060 TDG promoters (E) and class II TDG promoters (F). Values represent the average of three

79 2061 independent experiments (mean ± SEM are shown; n = 3). Cells from panel (B) were used to

2062 isolate total RNA and the expression of class I TSG (G), class II TSG (H), class I TDG (I) and

2063 class II TDG (J) was measured by qRT-PCR, normalized to RPL19, and plotted as fold RNA

2064 change over the GFP line arbitrarily set at 1 (mean ± SEM are shown; n = 3).

2065

2066 Figure 1–figure supplement 4. Tat-induced transcriptome changes are also observed at

2067 the protein level. Flow cytometry analysis of Jurkat GFP and Tat cell lines induced with

2068 doxycycline for 24 hours and stained with CD4-APC and CD69-PE antibodies or unstained

2069 (negative control) to monitor levels of CD4 and CD69 proteins expressed at the cell surface.

2070 Note the increase in the CD69+ population in the presence of Tat.

2071

2072 Figure 1–figure supplement 5. Distribution of Tat occupancy at promoter and/or

2073 intragenic domains in TSG and TDG. The percentage of sequences from the total is

2074 indicated.

2075

2076 Figure 1–figure supplement 6. FLAG ChIP-qPCR analysis on the indicated genomic loci.

2077 ChIP assay to analyze the distribution of Tat or GFP at the CD69 (A), ADCYAP1 (B), CD1E (C)

2078 and RAG1 (D) locus in the GFP (green) and Tat (black) cell lines. The position of the amplicons

2079 used in ChIP-qPCR and their distance to the TSS (arrow) is shown with the schematic of each

2080 locus. The schemes are not in real scale. Values represent the average of three independent

2081 experiments (mean ± SEM are shown; n = 3).

2082

2083 Figure 1–figure supplement 7. The genes modulated by ectopic expression of Tat are

2084 also detected during a time-course HIV infection experiment. (A) Jurkat T cells were

2085 infected with HIV (NL4-3) and levels of p24/Capsid protein was quantified using ELISA at

2086 different time points post-infection (0, 3 and 7 hours; 1, 2, 4, 6, 8, 10, and 12 days). Values

80 2087 represent the average of three independent experiments (mean ± SEM are shown; n = 3). Cells

2088 from panel (A) were used to isolate total RNA and the expression of three TSG: CD69 (B),

2089 FAM46C (C), and PPM1H (D); and three TDG: CD1E (E), EOMES (F) and FBLN2 (G)

2090 normalized to RPL19 was measured by qRT-PCR and plotted as fold RNA change over the

2091 GFP cell line arbitrarily set at 1 (mean ± SEM are shown; n = 3). The points in the curve were

2092 fitted to a non-linear regression in GraphPad Prism.

2093

2094 Figure 1–figure supplement 8. HIV infection of central memory CD4+ T cells triggers

2095 deregulation of TSG and TDG detected in the genome-wide approaches. (A) Scheme of

2096 the pipeline used to generate primary central memory T cells (TCM) and infect with replication

2097 competent HIV to identify differentially expressed genes. (B) qRT-PCR analysis on the indicated

2098 class I and II TSG, TDG and non-target genes (mean ± SEM are shown; n = 3). Cells from

2099 panel (A) were used to isolate total RNA and the expression of initiating (In) and elongating (El)

2100 transcripts for class I TSG (C), class II TSG (D), class I TDG (E) and class II TDG (F) was

2101 measured by qRT-PCR, normalized to RPL19, and plotted as fold RNA change: HIV

2102 infection/mock infection (mean ± SEM are shown; n = 3).

2103

2104 Figure 1–figure supplement 9. Response network of TSG and TDG. A response network

2105 was constructed based on RNA-seq data using a p-value cut off p = 0.01. The network consists

2106 of 138 nodes and 161 edges. Multiple edges between nodes indicate multiple evidence from

2107 different protein-protein interaction datasets (Human Protein Reference Database (HPRD) (Goel

2108 et al., 2011); IntAct (Kerrien et al., 2012); NetworKin, BHMRSS, CORUM, Hynet (Konig et al.,

2109 2010) and NCBIs HIV-1 interaction databases). Green and red nodes denote TSG and TDG,

2110 respectively. Edges are directional according to the respective databases. Edges with circles as

2111 ‘arrow tips’ denote phosphorylation reactions, ‘diamond shaped tips’ refer to general activation,

2112 and ‘arrows’ describe other reactions. Non-directional edges indicate binding.

81 2113

2114 Figure 1–figure supplement 10. Enrichment of TSG and TDG in publicly available

2115 datasets of differentially expressed genes identified in HIV infection and replication

2116 experiments. (A) The enrichment analysis with 48 publicly available differentially expressed

2117 gene datasets identified in HIV and replication experiments and 14 HIV relevant pathways from

2118 MSigDB are shown. (B) The functional annotation by the MSigDB canonical pathways (set

2119 c2.cp) is depicted. 12 significantly TSG enriched gene sets and 43 significantly TDG enriched

2120 gene-sets have been identified. P-values are Bonferroni adjusted for multiple testing. The

2121 vertical line indicates FDR = 0.05.

2122

2123 Figure 2–figure supplement 1. Tat specifies TSS selection and synthesis of alternate

2124 isoforms. (A) Genome browser views of FLAG, H3K4me3 and Pol II ChIP-seq tracks in the

2125 GFP and Tat cell lines along with the Refseq track for the ADCYAP1 locus. The position of the

2126 canonical and Tat-induced ‘novel’ TSS is indicated with arrows. (B) Genome browser views of

2127 FLAG, H3K4me3 and Pol II ChIP-seq tracks in the GFP and Tat cell lines along with the Refseq

2128 track for the ARHGEF7 locus. The position of the canonical and Tat-induced ‘novel’ TSS is

2129 indicated with arrows. (C) Normalized H3K4me3 tag density (-/+ 3 Kb respective to the TSS) for

2130 class I TSG in the Tat (Black) and GFP (Green) cell lines. (D) Genome browser views of

2131 H3K4me3 ChIP-seq tracks in the GFP and Tat cell lines for four class I TSG from panel (C).

2132

2133 Figure 2–figure supplement 2. Correlation between gene expression levels and H3K4me3

2134 density surrounding the TSS at class I and class II TSG. (A) Correlation plot between

2135 normalized H3K4me3 tag density (-/+ 3 kb respective to the TSS) and total gene expression

2136 levels (based on RNA-seq FPKM) at class I TSG in the presence and absence of Tat (Tat/GFP).

2137 (B) Correlation plot between normalized H3K4me3 tag density (-/+ 3 kb respective to the TSS)

2138 and total gene expression levels (based on RNA-seq) at class II TSG in the presence and

82 2139 absence of Tat (Tat/GFP).

2140

2141 Figure 2–figure supplement 3. Evidence that Tat increases Pol II and P-TEFb recruitment,

2142 and chromatin marks coinciding with transcription initiation and elongation at class I

2143 TSG. (A) Jurkat-GFP and -Tat cell lines were used in ChIP assays to analyze the occupancy of

2144 GFP and Tat (FLAG), H3K4me3, H3K79me3, H3K36me3, Pol II (total), Pol II (Ser2P-CTD form)

2145 and P-TEFb (CDK9) at the CD69 locus with the three indicated amplicons. The numbers

2146 indicate the position of the amplicons respective to the TSS. (B) Jurkat-GFP and -Tat cell lines

2147 were used in ChIP assays to analyze the occupancy of GFP and Tat (FLAG), H3K4me3,

2148 H3K79me3, H3K36me3, Pol II (total), Pol II (Ser2P-CTD form) and P-TEFb (CDK9) at the

2149 ADCYAP1 locus with the three indicated amplicons. The numbers indicate the position of the

2150 amplicons respective to the TSS. For both panels, IP DNA (% Input) values represent the

2151 average of three independent experiments (mean ± SEM are shown; n = 3).

2152

2153 Figure 2–figure supplement 4. Transcription initiation correlates with increased

2154 elongation chromatin markers. (A) Heatmap representation of ChIP-seq binding for H3K4me3

2155 (blue), H3K79me3 (red), and H3K36me3 (brown) at class I and class II TSG, rank ordered from

2156 lowest to most H3K4me3 density increase from the GFP to the Tat cell line. The asterisk

2157 denotes the position of the TTS. (B) Genome browser views of H3K4me3, H379me3 and

2158 H3K36me3 ChIP-seq tracks in the GFP and Tat cell lines along with the Refseq track for the

2159 ADCYAP1 locus (class I TSG). The position of the TSS is indicated with an arrow. (C) Genome

2160 browser views of H3K4me3, H379me3 and H3K36me3 ChIP-seq tracks in the GFP and Tat cell

2161 lines along with the Refseq track for the ATP9A locus (class I TSG). (D) Genome browsers of

2162 H3K4me3, H379me3 and H3K36me3 ChIP-seq tracks in the GFP and Tat cell lines along with

2163 the Refseq track for the CD82 locus (class II TSG). (E) Genome browser views of H3K4me3,

2164 H379me3 and H3K36me3 ChIP-seq tracks in the GFP and Tat cell lines along with the Refseq

83 2165 track for the FAM46C locus (class II TSG).

2166

2167 Figure 2–figure supplement 5. Tat recruits chromatin-modifying enzymes and elongation

2168 factors at selected target genes to promote transcription elongation. (A) Strep-tagged

2169 GFP and Tat were affinity purified (AP) from nuclear preparations of the Jurkat cell lines (1x10^9

2170 total cells) and interacting partners were analyzed by western blot with the indicated antibodies.

2171 CDK9 was used as a positive protein interacting control. (B) ChIP assay to analyze the

2172 distribution of histone marks (H3K79me3 and H3K36me3) and chromatin-modifying enzymes

2173 (Dot1L and SetD2) at the CD69 locus of the GFP (green) and Tat (black) cell lines. The position

2174 of the amplicons used in ChIP-qPCR is shown with the schematic of the locus. Values represent

2175 the average of three independent experiments (mean ± SEM are shown; n = 3). (C) Genome

2176 browsers of FLAG, Pol II, H3K4me3, H379me3 and H3K36me3 ChIP-seq tracks in the GFP and

2177 Tat cell lines along with the Refseq track for the CD69 locus. The position of the TSS is

2178 indicated with an arrow. (D) Knockdown of Dot1L, SetD2 and CDK9 blocks Tat-mediated CD69

2179 transcription activation. qRT-PCR of CD69 in the Jurkat-GFP and -Tat cell lines expressing non-

2180 target (NT), Dot1L, SetD2 and CDK9 shRNAs (mean ± SEM are shown; n = 3). (E) Knockdown

2181 of Dot1L, SetD2 and CDK9 does not alter RNA steady state levels of RPL19. qRT-PCR of

2182 RPL19 in the Jurkat-GFP and -Tat cell lines expressing non-target (NT), Dot1L, SetD2 and

2183 CDK9 shRNAs (mean ± SEM are shown; n = 3). (F, G, H) qRT-PCR validation of Dot1L, SetD2

2184 and CDK9 knockdown in the Jurkat-GFP and -Tat cell lines using gene-specific primers and

2185 normalized to RPL19 (mean ± SEM are shown; n = 3).

2186

2187 Figure 3–figure supplement 1. Tat recruitment induces Pol II and chromatin signatures

2188 controlling transcription initiation or elongation at different gene classes. (A) Genome

2189 browser views of FLAG, Pol II, H3K4me3, H379me3 and H3K36me3 ChIP-seq tracks in the

2190 GFP and Tat cell lines along with the Refseq track for the NETO1 locus (class I TSG). The

84 2191 position of the TSS is indicated with an arrow. (B) Genome browsers of FLAG, Pol II, H3K4me3,

2192 H379me3 and H3K36me3 ChIP-seq tracks in the GFP and Tat cell lines along with the Refseq

2193 track for the VAV3 locus (class II TSG). The position of the TSS is indicated with an arrow.

2194

2195 Figure 3–figure supplement 2. Tat controls P-TEFb and Pol II recruitment at class II TSG

2196 and class II TDG. (A–B) Tat promotes P-TEFb recruitment and Pol II elongation at class II

2197 TSG. (C–D) Tat precludes P-TEFb recruitment to block Pol II elongation at class II TDG. Jurkat-

2198 GFP and -Tat cell lines were used in ChIP assays to analyze the occupancy of GFP and Tat

2199 (FLAG), P-TEFb and Pol II at the indicated target genes using the indicated amplicons. Values

2200 represent the average of three independent experiments (mean ± SEM are shown; n = 3).

2201

2202 Figure 4–figure supplement 1. Stimulation of gene expression by Tat correlates with

2203 increased H3K27Ac density at the Tat peak. (A) Correlation plot between normalized

2204 H3K27Ac (-/+ 1 kb respective to the Tat peak) and total gene expression levels (based on RNA-

2205 seq FPKM) at class I TSG in the presence and absence of Tat (Tat/GFP). (B) Correlation plot

2206 between normalized H3K27Ac (-/+ 1 kb respective to the Tat peak) and total gene expression

2207 levels (based on RNA-seq FPKM) at class II TSG in the presence and absence of Tat

2208 (Tat/GFP).

2209

2210 Figure 4–figure supplement 2. Wild-type Tat but not the C22A non-functional mutant

2211 induces gene looping between the promoter and intragenic sites. (A) Genome browser

2212 views of ChIP-seq at the PPM1H locus in the GFP and Tat cell lines. The arrows indicate the

2213 position of two intragenic Tat binding sites. (B) 3C assay showing the relative crosslinking

2214 efficiency between a promoter anchor primer (top) or an upstream control primer (bottom) and

2215 several distal sites at the PPM1H locus in the GFP and Tat cell lines. The position of the primers

2216 used in qPCR assays, restriction sites and fragment generated after digestion are indicated

85 2217 (mean ± SEM are shown; n = 3).

2218

2219 Figure 4–figure supplement 3. Tat-mediated gene looping does not require transcription

2220 activity at enhancers. (A) Genome browser views of ChIP-seq at the PPM1H locus in the GFP

2221 and Tat cell lines. The arrows indicate the position of two intragenic Tat binding sites. (B) 3C

2222 assay showing the relative crosslinking efficiency between a promoter anchor primer and

2223 intragenic sites at the PPM1H locus in the GFP and Tat cell lines treated with DMSO (-FP) or

2224 flavopiridol (+FP). The position of the primers used in qPCR assays, restriction sites and

2225 fragment generated after digestion are indicated. (C) The GFP and Tat cell lines were treated

2226 with DMSO (-FP) or FP (+FP), and RNA isolated to measure levels of PPM1H mRNAs. (D) The

2227 GFP and Tat cell lines were treated with DMSO (-FP) or FP (+FP), and RNA isolated to

2228 measure levels of the intragenic enhancer-derived RNAs (eRNAs) (mean ± SEM are shown; n =

2229 3).

2230

2231 Figure 4–figure supplement 4. Class II TSG contain stable gene loops between promoter-

2232 enhancer that remain unaltered in response of Tat. (A) Genome browser views of ChIP-seq

2233 at the CD82 locus in the GFP and Tat cell lines. The arrows indicate the position of two

2234 intragenic Tat binding sites. (B) 3C assay showing the relative crosslinking efficiency at the

2235 CD82 locus in the GFP and Tat cell lines using a promoter anchor primer (top) or an upstream

2236 control primer (bottom). The position of the primers used in qPCR assays, restriction sites and

2237 fragment generated after digestion are indicated (mean ± SEM are shown; n = 3).

2238

2239 Figure 5–figure supplement 1. Intragenically paused Pol II is detected in CD4+ T cell lines

2240 and primary cells at sites with high H3K27Ac and low H3K4me3. Genome browser views of

2241 H3K4me3, H3K27Ac and different Pol II forms (total CTD (CTD), Ser5P-CTD and Ser2P-CTD)

2242 in the Jurkat-GFP cell line and primary CD4+ T cells along with the Refseq track for the ABCG1

86 2243 locus (class II TSG). The position of the different isoforms is indicated.

2244

2245 Figure 6–figure supplement 1. Different modes of Tat repression at class I and class II

2246 TDG. (A) Tat blocks transcription initiation of CD1E by interfering with assembly of the pre-

2247 initiation complex at the promoter. (B) Tat prevents elongation of EOMES by precluding P-TEFb

2248 loading at the promoter. ChIP-qPCR experiments with the indicated antibodies, in the GFP and

2249 Tat cell lines. The position of the amplicons used in ChIP-qPCR and their distance to the TSS

2250 (arrow) is indicated. Values represent the average of three independnt experiments. The SEM is

2251 less than 10% and not shown for simplicity. IP, immunoprecipitation.

2252

2253 Figure 7–figure supplement 1. mTR reveals that Tat blocks both Pol II recruitment and

2254 pause release at TDG. Calculations of the ratio of paused Pol II (pPol II) and traveling Pol II

2255 (tPol II) at class I and class II TDG in the presence and absence of Tat (Tat/GFP).

2256

2257 Figure 8–figure supplement 1. The interaction between Tat and host cell chromatin

2258 appears to be primary dictated by protein-protein interactions. (A) Fractionation scheme of

2259 Jurkat-GFP and Jurkat-Tat cells by increased salt extraction in the absence (–) or presence (+)

2260 of RNase A. Ch denotes the chromatin fraction. (B) Western blots of the samples prepared as in

2261 panel (A) with the indicated antibodies. The efficiency of RNase treatment was verified by

2262 electrophoresis of the purified RNA in an agarose gel stained with ethidium bromide. (C) ChIP

2263 assays to analyze the occupancy of GFP and Tat (FLAG) at the CD69 promoter (-63 amplicon)

2264 in the absence (–) and presence (+) of RNase. (D) ChIP assays to analyze the occupancy of

2265 GFP and Tat (FLAG) at the FAM46C promoter (-182 amplicon) in the absence (–) and presence

2266 (+) of RNase. (E) ChIP assays to analyze the occupancy of GFP and Tat (FLAG) at the CD1E

2267 promoter (+4 amplicon) in the absence (–) and presence (+) of RNase. (F) ChIP assays to

2268 analyze the occupancy of GFP and Tat (FLAG) at the EOMES promoter (-57 amplicon) in the

87 2269 absence (–) and presence (+) of RNase (mean ± SEM are shown; n = 3).

2270

2271 Figure 9–figure supplement 1. The C22A non-functional Tat mutant fails to bind ETS1

2272 and evidence that ETS1 is critical for transcription activation of Tat target genes. (A)

2273 Western blots showing interactions between Tat (but not GFP or the C22A non-functional Tat

2274 mutant) and ETS1. CDK9 was used as a positive control in the interaction. Strep-tagged GFP,

2275 Tat, and C22A were affinity purified (AP) from the Jurkat cell lines using Strep beads, and

2276 analyzed by western blot using the indicated antibodies. (B) qRT-PCR analysis of CD69

2277 expression in the indicated cell lines. (C) qRT-PCR analysis of ADCYAP1 expression in the

2278 indicated cell lines. (D) qRT-PCR analysis of VAV3 expression in the indicated cell lines. (E)

2279 qRT-PCR analysis of FAM46C expression in the indicated cell lines. (F) qRT-PCR analysis of

2280 RPL19 expression in the indicated cell lines. (G) qRT-PCR analysis of 7SK expression in the

2281 indicated cell lines. Expression of the indicated genes (panels B–G) was normalized to ACTB.

2282 Values represent the average of three independent experiments (mean ± SEM are shown; n =

2283 3).

88 Figure 1 A C D Jurkat T-cell line Combinatorial analysis 32 qRT-PCR 16 GFP Tat ChIP-seq RNA-seq DOX –+ –+ 8 2751 2013 4 Tat bound genes DE genes 2 1 FLAG -2 -4 2295 456 1557 -8 -16 β-actin Log2 fold change (Tat/GFP) -32 B 7SK ACTB CD69 CD1E CDK6 Indirect RPL19 ANXA1ZNF83 FBLN2 Direct FAM46CPPM1H EOMES PRAME targets ADCYAP1 FAM133B Exon5’UTR (2.1%) targets Intron Intergenic Promoter 3’UTR TSG TDG 36.4 35 24 Non- Direct targets 0 50 100 244 TSG 212 TDG targets Percent of sequences E F TSG GO Biological Process TSG Regulation of immune system process CD69 Immune system process Positive regulation of immune system process GFP Regulation of immune response Cell activation Tat Positive regulation of immune response Positive regulation of cell activation ChIP-seq RefSeq TSS Positive regulation of leukocyte activation Leukocyte activation Intron Promoter Positive regulation of lymphocyte activation Immune response-regulating cell surface receptor signaling Antigen receptor-mediated signaling pathway Immune response-activating cell surface receptor signaling GFP Regulation of cell activation Lymphocyte differentiation Tat Regulation of leukocyte activation

RNA-seq Leukocyte differentiation Regulation of lymphocyte activation Lymphocyte activation ADCYAP1 Regulation of B cell proliferation

GFP 10e-05 10e-10 10e-15 10e-20 Tat Binomial p-value

ChIP-seq RefSeq TSS GO Biological Process TDG Intron 3’-UTR Negative regulation of cell aging Regulation of cell aging Negative regulation of myeloid cell differentiation GFP Interphase Regulation of myeloid cell differentiation Tat Endocrine pancreas development

RNA-seq Cell cycle process Chromatin assembly or disassembly Nucleosome assembly Chromatin assembly Nucleosome organization TDG Cell cycle phase Protein-DNA complex subunit organization CD1E Cell cycle Protein-DNA complex assembly GFP Pancreas development Regulation of gene silencing Tat DNA packaging S phase RefSeq DNA conformation change TSS 10e-25 10e-50 10e-75 10e-100 Binomial p-value GFP Tat RNA-seq ChIP-seq Figure 2

A H3K4me3 B H3K4me3 8 NETO1 ATP9A Class I Class II ADCYAP1 0.035 0.1 4 ZNF83 CD244 0.03 PPM1H 0.08 RORβ 0.025 FLT1 ZNF521 2 PREX2 CD69 0.06 IQGAP2 ETV6 0.02 VAV3 0.015 0.04 1 0.01 0.02 0.005 Tat GFP

Normalized tag density Normalized tag density Tat -2 GFP -3-2 -1 0 +1 +2 +3 -3-2 -1 0 +1 +2 +3

Log2 fold change (Tat/GFP) Distance from TSS (Kb) Distance from TSS (Kb) -4 C D H3K79me3 H3K79me3 64 CD69 ADCYAP1 5 Class I1.2 Class II 32 1.0 VAV3 4 16 CD82 ETV6 0.8 3 8 0.6 2 0.4 4 1 0.2 Tat 2 Normalized tag density Tat Normalized tag density GFP GFP 1 TSS TTS TSS TTS Log2 fold change (Tat/GFP) -2

E H3K36me3 F H3K36me3 16 5 Class I2.5 Class II RORβ 8 NETO1 ADCYAP1 2.0 ZNF83 4 ZNF521 CD69 3 1.5 4 PPM1H ETV6 2 1.0 VAV3 Tat 2 1 0.5 GFP Tat Normalized tag density Normalized tag density 1 GFP TSS TTS TSS TTS Log2 fold change (Tat/GFP) -2

GHNETO1 (Class I) VAV3 (Class II) GFP GFP H3K4me3 H3K4me3 Tat Tat GFP GFP H3K79me3 H3K79me3 Tat Tat

GFP GFP H3K36me3 H3K36me3 Tat Tat Refseq TSS Refseq TSS Figure 3

ABPol II density at Tat peak Pol II density at promoter 2.0 Class I 44.0Class II 1.2 Class I 2.0 Class II

1.5 33.0 0.9 1.5

1.0 22.0 0.6 1.0

0.5 11.0 0.3 0.5 Normalized tag density Normalized tag density Normalized tag density Normalized tag density

GFP Tat GFP Tat GFP Tat GFP Tat C D 0.8 Class I 2.0 Class II

0.6 1.5

0.4 1.0

0.2 0.5 Tat Tat GFP GFP Normalized tag density -1 -0.5 0+1+0.5 Normalized tag density -1 -0.5 0+1+0.5 Distance from Tat peak (Kb) Distance from Tat peak (Kb) E F Tat Class I NETO1 (Class I) 0.25 GFP 0.20 FLAG Tat 0.15 GFP 0.10 Pol II Tat 0.05 Tat GFP GFP H3K4me3 Normalized tag density Tat TSS TTS Refseq TSS G H Class II VAV3 (Class II) 0.7 GFP 0.6 FLAG 0.5 Tat 0.4 GFP 0.3 Pol II Tat 0.2 Tat 0.1 GFP GFP H3K4me3 Normalized tag density Tat TSS TTS Refseq

TSS Figure 4 ADB C H3K4me3 H3K27Ac H3K27Ac H3K27Ac Tat Tat GFP Tat GFP Tat GFP Tat 4 Class I 5 Class II

H3K4me3 high Class I 4 H3K27Ac low 3 Promoter Promoter 3 2 2 H3K4me3 low 1 Normalized tag density H3K27Ac high Normalized tag density 1 Class II Ranked based on K27 density Intragenic Intragenic GFP Tat GFP Tat

(Centered on Tat peak) (Centered on Tat peak) TSS

E G I Tat Tat PPM1H (Class I) PPM1H Class I 3 GFP FLAG 5 +3755 Tat 2 4 GFP Pol II 3 1 Tat 2 GFP 1 H3K4me3 IP DNA (% Input) DNA IP Tat

Normalized tag density -3-2 -1 0 1 2 3 Distance from Tat peak (Kb) GFP Tat Tat GFP GFP H3K27Ac Tat H3K27Ac p300 Refseq

F H 300 kb 200 kb 100 kb TSS Class II CD82 (Class II) Primers 3 Restriction Anchor +4180 2 10 Fragments 12 34 5 67 8 9 8 12

1 6 4 9 2 IP DNA (% Input) DNA IP Normalized tag density -3-2 -1 0 1 2 3 6 Distance from Tat peak (Kb)

Tat Tat GFP GFP 3 H3K27Ac p300 Crosslinking efficiency Figure 5 A B Traveling Ratio (TR) CD82 (Class II) Pol II promoter density GFP TR = Pol II gene body density FLAG Tat Promoter Gene Body GFP Pol II Tat

GFP H3K4me3 Tat

GFP H3K4me1

ChIP-seq enrichment Tat

GFP TSS TTS H3K27Ac -50bp +1000bp Tat Refseq TSS

C Modified Traveling Ratio (mTR) D E 8 pPol II (promoter+gene body density) 128 mTR = tPol II (gene body density) 7 64 Promoter Gene Body 6 Intragenic enhancer 32 (H3K27Ac+) 5 16 4

8 3 tPol II (Tat/GFP) Normalized mTR 4 2

ChIP-seq enrichment 2 1 Class I GFP Tat Class II TSS TTS 1234567 8 -50bp +1000bp pPol II (Tat/GFP) F G Class I TSG Class II TSG

Tat Pol II Pol II Tat Pol II Pol II Promoter TSS

TSS Pol II Pol II Intragenic Tat

Pol II Pol II Pol II

TSS TSS Figure 6 A B H3K4me3 H3K4me3 2 Class I Class II 1 0.10 0.05

-2 0.08 0.04

-4 0.06 0.03 THEMIS DAB2IP AOX1 -8 MYO7B 0.04 0.02

-16 FBLN2 0.02 0.01 Tat GFP GFP Tat -32 Normalized tag density Normalized tag density -3-2 -1 0 +1 +2 +3 -3-2 -1 0 +1 +2 +3 Log2 fold change (Tat/GFP) CD1E -64 Distance from TSS (Kb) Distance from TSS (Kb) C D H3K79me3 H3K79me3 4 Class I Class II 2 0.025 0.025

1 0.020 0.020

0.015 0.015 -2 0.010 0.010 FBLN2 -4 EOMES 0.005 0.005 GFP Tat Normalized tag density -8 Tat Normalized tag density GFP

Log2 fold change (Tat/GFP) CD1E TSS -16 TTS TSS TTS

E H3K36me3 F H3K36me3 2

0.05 Class I0.01 Class II

0.04 0.008 1 0.03 0.006

0.02 0.004 GFP GFP -2 EOMES 0.01 Tat 0.002 Tat FBLN2 Normalized tag density Normalized tag density CD1E Log2 fold change (Tat/GFP) TSS TTS TSS TTS -4 GH CD1E (Class I) EOMES (Class II) GFP GFP FLAG FLAG Tat Tat GFP GFP H3K4me3 H3K4me3 Tat Tat GFP GFP H3K79me3 H3K79me3 Tat Tat

GFP GFP H3K36me3 H3K36me3 Tat Tat Refseq TSS Refseq TSS Figure 7

ABPol II density at Tat peak Pol II density at promoter 8.0 Class I 16.0 Class II 4.0 Class I 4.0 Class II 8.0 4.0 2.0 2.0 4.0 2.0 1.0 1.0 2.0 1.0 0.5 0.5 1.0 0.5 0.25 0.25 0.5 0.25 0.125 0.125

Normalized tag density Normalized tag density 0.25 Normalized tag density Normalized tag density

0.125 0.125 0.0625 0.0625 GFP Tat GFP Tat GFP Tat GFP Tat CD 1.2 Class I 1.2 Class II

0.9 0.9

0.6 0.6 GFP GFP 0.3 0.3 Tat Tat Normalized tag density Normalized tag density -1 -0.5 0+1+0.5 -1 -0.5 0+1+0.5 Distance from Tat peak (Kb) Distance from Tat peak (Kb) E F CD1E (Class I) 0.12 Class I GFP 0.10 FLAG Tat 0.08 GFP 0.06 GFP Pol II 0.04 Tat Tat 0.02 GFP H3K4me3 Normalized tag density Tat TSS TTS Refseq TSS G H EOMES (Class II) 0.10 Class II GFP 0.08 FLAG Tat 0.06 GFP Pol II 0.04 GFP Tat 0.02 Tat GFP H3K4me3 Normalized tag density Tat TSS TTS Refseq

TSS Figure 8 A B TAR TAR-like (query)

GG Loop U G N(4-40) C A CG NN GC NN AU AU GC GC Bulge N(1-2) N(1-2) U U AU AU GC GC AU NN CG NN CG NN C D 1.0 TSG 1.0 TDG

0.8 0.8

0.6 0.6

0.4 0.4

TAR-like motifs TAR-like 0.2 motifs TAR-like 0.2 Fraction of genes with Fraction of genes with

<10kb <1kb <0.1kb <10kb <1kb <0.1kb Distance from Tat peak Distance from Tat peak EF TSG TDG Transcription Transcription Motif P-value Motif P-value factor factor

-18 -9 ETS1 2.5x10 ETS1 1.1x10

-14 -7 RUNX1 9.6x10 RUNX1 2.0x10

-12 GATA3 1.9x10 G H 1.0 TSG 1.0 TDG

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2 ETS1 ChIP-seq peaks ETS1 ChIP-seq peaks Fraction of genes with Fraction of genes with

<10kb <1kb <0.1kb <10kb <1kb <0.1kb Distance from Tat peak Distance from Tat peak Figure 9 A B C Strep AP CD69 CD1E 3’ Inputs Elutions Tat (ChIP-Seq) Tat (ChIP-Seq) ETS1 (ChIP-Seq) ETS1 (ChIP-Seq) ETS1 motif prediction ETS1 motif prediction GFPTat GFPTat -63 +5897 +4 +5215 Q-PCR amplicons Q-PCR amplicons Promoter- Promoter- Promoter- Promoter- GFP proximal distal proximal distal 0.8 1.6

FLAG 0.6 1.2

0.4 0.8 Tat FLAG FLAG 0.2 0.4

ETS1 (% Input) DNA IP (% Input) DNA IP

CDK9 2.0 2.0

1.5 1.5

1.0 ETS1 ETS1 1.0 0.5 0.5 IP DNA (% Input) DNA IP IP DNA (% Input) DNA IP

Tat Tat Tat Tat GFP GFP GFP GFP D E F Jurkat cell line CD69 CD1E GFP Tat 3’ -63 +4 2.5 2.0 shRNA – NT ETS1– NT ETS1 2.0 1.5 GFP 1.5 ETS1 1.0 ETS1 FLAG 1.0 0.5 0.5 IP DNA (% Input) DNA IP (% Input) DNA IP

Tat shRNA NT NT shRNA NT NT ETS1 ETS1 ETS1 ETS1 ETS1 GFP Tat GFP Tat

1.0 2.0

β-actin 0.8 1.5 0.6 FLAG 1.0 FLAG 0.4 0.5 0.2 IP DNA (% Input) DNA IP IP DNA (% Input) DNA IP

shRNA NT NT shRNA NT NT ETS1 ETS1 ETS1 ETS1 GFP Tat GFP Tat