A structure-based mechanism for HEXIM displacement from 7SK

Vincent Pham Harvard University Michael Gao Harvard University Jennifer Meagher University of Michigan Janet Smith University of Michigan https://orcid.org/0000-0002-0664-9228 Victoria D'Souza (  [email protected] ) Harvard University

Article

Keywords: Transcriptional Elongation, Conformational Plasticity, Remodeling Site, Local Destabilization, Competitive Regulation

Posted Date: February 2nd, 2021

DOI: https://doi.org/10.21203/rs.3.rs-198962/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License 1 A structure-based mechanism for HEXIM displacement from 7SK

2 Vincent V. Pham1, Michael Gao1, Jennifer L. Meagher2, Janet L. Smith2,3 & Victoria M. 3 D’Souza1*

4 1Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 5 02138, USA. 2Life Sciences Institute, University of Michigan, Ann Arbor, MI, 48109, USA. 6 3Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, 7 USA. *Correspondence and requests for materials should be addressed to V.M.D (email: 8 [email protected]).

9

10 Productive transcriptional elongation of many cellular and viral mRNAs requires

11 transcriptional factors to extract pTEFb from the 7SK snRNP by modulating the

12 association between the HEXIM and the 7SK snRNA. Here we report the

13 structure of the HEXIM arginine rich motif in complex with the apical stemloop-1 of

14 7SK (7SK-SL1apical) and detail how the HIV transcriptional regulator Tat from various

15 subtypes overcome the structural constraints required to displace HEXIM. While the

16 majority of interactions between 7SK and HEXIM and Tat are similar, critical

17 differences exist that guide function. First, the conformational plasticity of 7SK

18 enables the formation of three different configurations at a critical

19 remodeling site, which allows for the modulation required for HEXIM binding and its

20 subsequent displacement by Tat. Furthermore, the specific sequence variations

21 observed in various Tat subtypes all converge on remodeling 7SK at this region.

22 Second, we show that HEXIM primes its own displacement by causing specific local

23 destabilization upon binding — a feature that is then exploited by Tat to bind 7SK

24 more efficiently. Overall, our study details the molecular environment presented by

25 HEXIM and uncovers a destabilization-driven displacement strategy that increases the

26 conformational sampling of 7SK-snRNP, which may allow diverse transcriptional

27 factors to competitively regulate pTEFb. 28 Transcription of all class II is a highly regulated process within cells. Shortly

29 after promoter clearance, RNA Polymerase II is inhibited by the negative elongation

30 factors1-5. Release from this stalled state requires all components to be phosphorylated by

31 the positive elongation factor pTEFb, a heterodimeric complex consisting of

32 and the cyclin-dependent kinase Cdk96-11. However, most of the pTEFb is kept

33 catalytically inactive in the nucleus by the 7SK small nuclear ribonucleoprotein (7SK

34 snRNP) through its interactions with the HEXIM adapter protein and the 5’ stemloop-1

35 of the 7SK RNA 12-18 (7SK-SL1; Fig. 1a). Thus, productive transcriptional elongation of

36 many genes requires transcriptional factors to extract pTEFb from the 7SK snRNP — a

37 process that involves manipulating the interaction between HEXIM and 7SK. This

38 association between 7SK and HEXIM tightly controls the balance between active and

39 inactive pTEFb, and dysregulation of this interaction can have serious biological

40 consequences including cardiac hypertrophy and breast and pancreatic cancers19-21.

41 Furthermore, as many viruses rely on the host transcriptional machinery to produce

42 mRNA and genomes, they have also evolved mechanisms to capture pTEFb22-24. One such

43 unique case is the human immunodeficiency virus (HIV), which utilizes the viral Tat

44 protein to extract pTEFb by binding to the same region of 7SK as HEXIM and directly

45 displacing it 24. Structural insights into the consequence of HEXIM binding to 7SK and

46 how positive transcriptional factors like Tat compete with it are therefore important for

47 understanding HEXIM’s potency as a critical negative regulator.

48 To date, two HEXIM have been identified that are capable of carrying out

49 the same function and both bind 7SK with identical regions of their Arginine Rich Motifs

50 (ARMs) (residues 151-159 in HEXIM1 and 89-97 in HEXIM2)24-27. Although HEXIM binds

51 7SK as a dimer, only one ARM directly contacts 7SK by engaging the apical region of

apical 28-33 52 stemloop-1 (G26 to C85, 7SK-SL1 ; Fig. 1a) . Both in vitro and in vivo studies have 53 shown that this represents the sole interaction between the two molecules that must be

54 modulated in order to release pTEFb27-29,31,32.

55 Our previous work showed that 7SK-SL1apical is enriched in arginine sandwich

56 motifs (ASMs)34. ASMs are defined by two nucleotides that stack in a manner that allows

57 for intercalation of arginine guanidinium moieties between the aromatic rings of the

58 bases34-37 (Fig. 2a). While a bulge pyrimidine forms the cap by engaging in a triple base

59 interaction with an n + 2 base pair in the stem, a Watson-Crick base paired nucleotide

60 preceding the bulge forms the base of the interaction. In the free 7SK-SL1apical, three such

61 bulges fold into preformed arginine sandwich motifs (ASM1, ASM2 and ASM4) poised for

62 arginine guanidinium moieties to dock into them. A fourth bulge folds into a pseudo

63 configuration (pseudo-ASM3) where U40 is able to form a triple base interaction with the

64 A43-U66 base pair to form the cap, but the base of the sandwich is sequestered in a reverse

65 Hoogsteen interaction, excluding it from use as a classical ASM. Our work also showed

66 that HIV-1 Tat NL4-3 (TatNL4-3) uses its arginine-rich motif to intercalate arginines not only

67 into the three preformed ASMs, but also to remodel the pseudo-ASM into a classical

34 68 ASM . This structural remodeling of pseudo-ASM3 is a key mechanism through which

69 Tat displaces HEXIM.

70 However, without the structure of the HEXIM:7SK-SL1apical interaction, it is

71 currently unclear what structural constraints Tat would need to overcome in order to

72 access pTEFb. Furthermore, while the Tat ARM is highly conserved, sequence variations

73 exist in different strains that allow for HEXIM displacement. For example, the ARM of

Fin 74 Tat Finland (Tat ; KR52KHRRR) differs from HEXIM (KK151KHRRR) by only a single

75 amino acid and would lack one of the ASM interactions from the previously described

G 76 Tat NL4-3 strain (KR52RQRRR). Additionally, while Tat subtype G (Tat ; KR52R53HRRR)

77 has an equivalent number of arginines as TatNL4-3, the critical linker sequence connecting

78 ASM3/ASM4 and ASM1/ASM2 interactions is the same as HEXIM. In this study, we present 79 the structure of the 7SK-SL1apical in complex with the HEXIM, TatFin, and TatG ARMs.

80 Despite sequence variations, the structures show deep major groove intercalations of all

81 ARMs, albeit with differential interactions with pseudo-ASM3 and ASM4. Furthermore,

82 we show that HEXIM causes local destabilization of ASM4, enhancing Tat’s affinity for

83 7SK. These studies thus uncover a feature in which HEXIM facilitates its own

84 displacement by increasing conformational sampling, which may be a more general

85 mechanism of pTEFb capture.

86

87 Results

88 Comparative binding affinities of HEXIM and Tat to 7SK.

89 As a first step toward identifying the atomic determinants of 7SK recognition by

90 HEXIM, we performed binding studies using isothermal titration calorimetry (ITC).

91 Studies have shown that the first 108 nucleotides comprising stemloop-1 of 7SK (G1-C108;

92 7SK-SL1Full) are required for binding full length HEXIM as a dimer both in vivo and in

93 vitro24,27,32. ITC traces of full-length HEXIM1 with 7SK-SL1Full corroborate these studies (N

94 = 2.1 ± 0.2, Kd = 209 ± 34 nM; Fig. 1b). Additionally, ITC traces of full-length HEXIM1 to

Full-AGU apical 95 7SK-SL1 and 7SK-SL1 (G26 to C85), both containing an AGU triloop engineered

96 to prevent low levels of dimerization and thus aid in increasing quality of NMR spectra,

97 also reveal dimeric HEXIM binding to both constructs with comparable binding affinities

Full 98 to the full-length HEXIM1: 7SK-SL1 complex (N = 2 ± 0.1, Kd = 200 ± 20 nM, N = 1.8 ±

99 0.03, Kd = 206 ± 58 nM, and N = 2.1 ± 0.2, Kd = 209 ± 34 nM, respectively; Fig. 1c, d),

100 indicating that the loop does not play a significant role in HEXIM1 binding and

101 confirming that the HEXIM:7SK interaction is confined to the apical region stem of 7SK-

102 SL1. 103 Previous studies have also shown that while HEXIM binds 7SK as a dimer, the

104 ARM of only one HEXIM monomer is responsible for interacting with 7SK both in vivo

17,32,33 105 and in vitro . ITC traces of the N- terminal ARM residues (R146QLGKKKHRRR156;

106 HEXIMN-ARM) to 7SK-SL1apical with the AGU triloop confirm these findings as the binding

107 affinities between the full-length dimeric HEXIM1:7SK-SL1apical and HEXIMN-ARM:7SK-

apical 108 SL1 complexes are similar (N = 1.8 ± 0.03, Kd = 206 ± 58 nM and N = 1 ± 0.1, Kd = 238 ±

109 35 nM, respectively; Fig. 1d, e). NMR studies comparing these two complexes show that

110 binding of either full-length HEXIM or the N-ARM gives rise to the same nuclear

111 Overhauser effect (NOE) shifts in 7SK-SL1apical, indicating that the HEXIMN-ARM:7SK-

112 SL1apical interaction represents the biologically relevant mode of HEXIM binding to 7SK

113 (Supplementary Fig. 1).

NL4-3 114 Finally, our previous work showed that the Tat (KR52RQRRR) ARM represents

115 the interaction domain between Tat and 7SK-SL1apical and has an approximately two-fold

116 increased affinity over the HEXIMN-ARM. ITC traces show that Tat Subtype G’s ARM

apical 117 (KR52RHRRR), which also has two N-terminal arginines, binds 7SK-SL1 with a Kd of

118 91 ± 32 nM (N = 1.1 ± 0.1; Fig. 1f), which is an approximately 2.6-fold increased binding

119 affinity over HEXIMN-ARM (Supplementary Table 1). On the other hand, Tat Finland’s

120 ARM (KR52KHRRR), despite have an additional N-terminal arginine compared to the

121 HEXIMN-ARM (R52 and K151, respectively), does not have a statistically significant increase

122 in binding affinity (Kd of 198 ± 16 nM; Fig. 1g). Overall, these results highlight the need

123 for understanding the HEXIM-bound 7SK; while the increased TatG affinity would allow

124 for HEXIM displacement, it is unclear how TatFin can achieve the same biological output.

125

126 127 Preformed configurations of ASM1 and ASM2 provide a common mode of interaction

128 with C-terminal residues.

129 To understand how the HEXIMN-ARM and the various Tat ARMs interact with 7SK-

130 SL1apical, we utilized a combination of small-angle X-ray scattering (SAXS) and NMR. All

131 reconstructed ab initio SAXS envelopes showed no major overall global changes between

132 peptide-bound and free 7SK-SL1apical (Fig. 2b, c). Numerous intermolecular NOEs place

133 both HEXIM and Tat arginine rich motifs into the major groove of the RNA and allow us

134 to define their interactions with all ASM regions. Base pairs in the lower part of the stem-

135 loop below the G79-U32 base pair, as well as the CAGUG pentaloop do not give any

136 intermolecular NOEs, indicating that the interactions are contained within a single turn

137 of the helix (Fig. 2d-f).

apical 138 In the free 7SK-SL1 , ASM1 and ASM2 are placed in a tandem orientation and

139 upon titration of the various ARMs, all expected NOEs for such configurations are

140 retained. Unlike a typical ASM where the following nucleotide after the bulge is in a

141 canonical Watson-Crick base pair, in ASM1, the residue A77 is configured into an A34, A77

142 pair. A NOE from the A77 H8 proton to the H1′ of C75 positions this residue under the C75

143 cap (Supplementary Fig. 2). This confirms a planar orientation of C75 with the C33-G78 base

144 pair and configures A77 in such a way that it is perfectly positioned to interact with the

145 guanidinium moiety of R156 in HEXIMN-ARM and R57 in TatFin and TatG, which intercalate

146 between C75 and G74 in a manner identical to canonical ASMs (Supplementary Fig. 2-5).

+ 147 Similarly, in ASM2, the C71 base also retains its protonation at the N3 position as

148 evidenced by a downfield shift of the N4 amino protons (Supplementary Fig. 6). The

N-ARM Fin G 149 guanidinium moiety of R155 in HEXIM and R56 of Tat and Tat interact with G73

+ 150 by intercalating between the C71 cap and G70 base of the motif (Supplementary Fig. 3-5).

+ 151 Additionally, intermolecular NOEs from the aromatic protons of the C75 and C71 caps and 152 the G74 and G70 bases of ASM1 and ASM2 to the H훾 and the Hδ protons confirm that

153 consecutive arginines R156 and R155 interact in a ladder-like configuration with the

154 tandem preformed motifs ASM1 and ASM2, respectively (Fig. 3a and Supplementary Fig.

155 4c). Such NOEs are also observed in both the TatFin and the TatG-bound complexes,

156 confirming the similar placement of the C-terminal R57 and R56 into the tandem ASM1

157 and ASM2, respectively (Fig. 3a and Supplementary Fig. 5a, b, d, e). Taken together, the

158 structures reveal a common mode of interaction between the non-varying C-terminal

159 arginines and the tandem ASMs.

160

161 Rearrangement of pseudo-ASM3 allows for HEXIM N-terminal interactions.

apical 162 In the free 7SK-SL1 , pseudo-ASM3 and ASM4 adopt a pseudo-symmetrical

163 architecture where the two motifs are spatially opposed. Upon HEXIMN-ARM binding, the

164 pseudo-ASM3 maintains its U40:A43-U66 triple base interaction although the base of the

165 sandwich, A39, rearranges from a reverse Hoogsteen interaction with U68 into a cis

166 Hoogsteen/sugar interaction, giving rise to an alternate pseudo configuration. (Fig. 3b,f).

167 This is evidenced both by NOEs from the U68 imino proton to the A39 amino protons and

168 NOEs from the U68 H2′ and H3′ protons to the A39 H8 proton (Supplementary Fig. 6b).

169 This frees up the U68 imino proton to engage the backbone carbonyl of K152 while

170 simultaneously bringing the N1 proton acceptor of A39 into the major groove to hydrogen

171 bond with the side chain Hε protons of K151 (Fig. 3b). Thus, both K151 and 152 can enter

172 deep into the major groove by remodeling the pseudo-ASM3.

173 The amino side chain of K151 is within hydrogen-bonding distance of the A39 N1

174 nitrogen as evidenced by NOEs from the K151 H훾 and Hβ protons to the C37 H6 and H5

175 protons, respectively and from the K151 Hε protons to the C38 H6 and H5 protons

176 (Supplementary Fig. 4f). Additionally, NOEs between the K152 Hβ protons with the U68 177 H5 proton, the K152 Hδ protons with the C67 and U66 H5 protons, and the K152 Hε protons

178 with the C67 H5 and H6 protons position the amino side chain of K152 within hydrogen-

179 bonding distance of the C67 backbone (Fig. 3b and Supplementary Fig. 4e, f). This gives

180 rise to a forked configuration of the two lysines, orienting the side chain amino groups

181 towards the phosphate backbones on opposite ends of the groove.

182 Unlike the other three ASMs, where the NOEs clearly define a single predominant

183 structural configuration as described above, multiple dynamic states exist for ASM4 (see

184 below). In the most abundant form, the preformed nature found in the free state is

185 retained as evidenced by a direct imino to imino connectivity between U44 and U63, along

186 with maintenance of the G46-C62 Watson-Crick base pair (Supplementary Fig. 6c). In fact,

187 this interaction is stabilized by K150, which displays NOEs between the Hε protons with

188 the U63 and the U40 H5 protons, positioning the amino side chain within hydrogen

189 bonding distance of the O4 atoms of both U63 and U40 (Supplementary Fig. 4d). Additional

190 intermolecular interactions between the U40 H5 proton and the U63 H5 and H1′ protons

191 with the K150 Hδ, H훾, and Hβ protons places the K150 directly under the U63 cap of ASM4

192 (Supplementary Fig. 4d, e). Taken together, these data show that despite the lack of

193 arginines, the lysine-rich N-terminus of HEXIMN-ARM is able to be accommodated by 7SK:

194 the Watson-Crick face of A39 turns from the minor into the major groove to interact with

195 K151 and 152, which then positions the K150 to interact with the oxygen-rich

196 environment of the U63 and U40 caps.

197

198 Conformational plasticity of the ASM3/ASM4 region provides differential mode of

199 interactions with N-terminal and spacer residues.

200 Our previous study showed that TatNL4-3 displaces HEXIM by remodeling the

34 201 pseudo-ASM3 into a canonical ASM3 to allow for arginine intercalation . Furthermore, 202 an additional arginine docks into the preformed ASM4. While the mechanism of

Fin G 203 remodeling pseudo-ASM3 is conserved upon binding of both Tat and Tat ARMs (Fig.

204 3c, e, g, h; Supplementary Fig. 6), both the drivers of the conformational switch and the

205 engagement of the ASM4 vary depending on differences in amino acid sequences.

206 While TatFin has two major differences from TatNL4-3 (K53 to R53 and spacer H54 to

207 Q54, respectively), it only differs in a single amino acid from HEXIM (R52 and K151,

NL4-3 208 respectively). Like Tat , R52 is responsible for remodeling pseudo-ASM3 (Fig. 3c;

NL4-3 209 Supplementary Fig. 5a). However, while R53 in Tat flips over R52 and engages ASM4,

210 the equivalent K53 stays in the spacer region between the ASM1/ASM2 and ASM3/ASM4

211 regions in a manner similar to HEXIM as evidenced by NOEs between the K53 Hβ

212 protons with the U68 H5 proton, the K53 Hδ protons with the C67 and U66 H5 protons, and

213 the K53 Hε protons with the C67 H5 and H6 protons, which position the amino side chain

214 of K53 within hydrogen-bonding distance of the C67 backbone (Fig. 3j; Supplementary

215 Fig. 5a).

Fin 216 Like HEXIM, ASM4 remains unoccupied upon binding Tat and the structure

217 shows that the K51 amino side chain is positioned to hydrogen-bond with the U63 ribose

218 ring in a stabilizing interaction (Fig. 3c). This is evidenced by NOEs between the K51 Hδ

219 protons with the U63 H5 and H1′ protons and the K51 Hε protons with the U63 2′ hydroxyl

220 proton (Supplementary Fig. 5c). Furthermore, the N-terminal K50 exits near the apical

221 loop with NOEs observed between the K50 Hδ and Hε protons with the C38 and C37 H5

222 and H1′ protons position the amino side chain of K50 to the C38 phosphate backbone

223 (Supplementary Fig. 5a).

224 Finally, in evaluating the structural consequences of the spacer substitution, we

Fin 225 see that H54 and R55 in Tat remain near ASM1 and ASM2, similar to what is found in

226 HEXIM. This is evidenced by NOEs between the H54 (H153 in HEXIM) Hβ protons with 227 the C35, C36, and C37 H5 protons, placing H54 near ASM2 whereas the R55 (R154 in HEXIM)

228 Hδ protons display NOEs with the A34 H1′ proton and the C33 H1’, H5, and H6 protons,

229 positioning this spacer residue near ASM1 (Fig. 3i, j; Supplementary Fig. 4, 5). This is in

NL4-3 230 contrast with the binding mode of Tat in which the intercalation of R53 into ASM4

231 drags both the Q54 and R55 spacer residues towards the apical ASMs.

232 The importance of the histidine H54 spacer is even more evident in the TatG strain

233 where it represents the only difference between TatNL4-3. This single difference changes

234 the identity of the arginine that remodels pseudo-ASM3. In this ARM, the positioning of

235 H54 near ASM2 precludes R53 from reaching ASM4 to accomplish the inverse

236 intercalation seen in NL4-3 (Fig. 3d, e, k and Supplementary Fig. 5d, e). The interactions

237 with the apical ASMs thus occur in a ladder-like manner where R53 intercalates into the

238 remodeled ASM3 whereas R52 intercalates into ASM4 (Fig. 3d, e and Supplementary Fig.

239 5a, d, e). K51 makes the final stabilizing interaction with NOEs seen between the Hε

240 protons with the U63 2′ hydroxyl proton, indicating a hydrogen-bonding interaction

241 between the K51 amino side chain with the U63 ribose ring (Fig. 3d and Supplementary

242 Fig. 5f). Taken together, these studies show that arginine sandwich motifs provide mini

243 domains that arginine rich motifs of proteins can differentially interact with in order to

244 achieve deep major groove binding into the stem of in 7SK-SL1apical.

245

246 HEXIM allows for increased conformational sampling of apical ASMs.

247 While titration of all four arginine rich motifs stabilize the majority of 7SK-SL1apical

248 into one predominant configuration, the HEXIM ARM is an outlier wherein binding

249 causes ASM4 to become destabilized and exhibit multiple conformations. In such

250 conformations, the NOEs between the imino protons of U63 and U44 disappear, indicating

251 the disruption of the U63:U44-A65 triple and loss of ASM4 (Supplementary Fig. 6c). The 252 destabilization of this region is also indicated by the line-broadening of K150, which

253 interacts with U63 in the folded configuration (Supplementary Fig. 4e).

254 The destabilization of 7SK-SL1apical only by HEXIM is further evident when

255 comparing the thermodynamic profiles between Tat and HEXIM. The binding of TatFin

256 and TatG strains are enthalpically driven (ΔH = -7.5 ± 0.2 and -8.9 ± 2.2 kcal mol-1,

257 respectively; Fig. 4a) with a modest entropic contribution (-TΔS = -1.7 ± 0.3 and -2 ± 0.8

258 kcal mol-1, respectively; Fig. 4a). On the other hand, HEXIM binding is entropically

259 enhanced by approximately 2.5-fold over both Tat strains (-TΔS = -4.6 ± 0.8 kcal mol-1, ΔH

260 = -4.4 ± 0.7 kcal mol-1; Fig. 4a).

261 To evaluate the implication of HEXIM’s ability to locally destabilize ASM4 in the

262 context of its displacement required for transcriptional regulation, we compared TatFin

263 and TatG binding to 7SK both free and in the presence of HEXIM. Due to the modest

264 differences in binding energetics between the different ARMs, competition experiments

265 using ITC were not tractable. As seen in Fig. 4b, a 1:1 titration of either HEXIM or Tat into

266 7SK in the NMR shows the presence of both bound and free forms of 7SK. However, upon

267 titration of TatFin and TatG into the HEXIM:7SK complex, we not only observe completion

268 of complex formation but also a total displacement of HEXIM in both cases. This is

269 especially striking given that the binding affinities of TatFin and HEXIM for free 7SK are

270 equivalent. Taken together, these data indicate that Tat has a greater affinity for a HEXIM-

271 bound 7SK complex. Finally, ITC data of full-length HEXIM bound to full-length 7SK

272 snRNA (N= 1.9 ± 0.1; Fig. 4b) show that an entropy-driven interaction is maintained, and

273 in fact, is even more pronounced (-TΔS = -6.4 ± 1.3 kcal mol-1, ΔH = -2.6 ± 1 kcal mol-1Fig.

274 4a), suggesting that HEXIM binding may globally increase the conformational space

275 sampled by the 7SK-snRNP complex. These studies suggest that destabilization by

276 HEXIM may play an important role in how transcription factors access 7SK for pTEFb

277 capture. 278 Discussion

279 The 7SK snRNP represents a central biomolecule that a wide range of

280 transcriptional factors need to interact with to access pTEFb to control transcriptional

281 elongation. In particular, pTEFb extraction by HIV Tat from this complex requires

282 manipulating the interaction between the 7SK snRNA and the HEXIM adapter protein.

283 In this study, we solved the structures of the RNA binding domains of HEXIM and Tat

284 bound to 7SK and gain several insights into their functional significance, including the

285 malleability of 7SK, the local destabilization by HEXIM, and the specific sequence

286 variations of Tat.

287 The structures show that both HEXIM and Tat regulate pTEFb by directly binding

288 the stem of 7SK-SL1apical through intercalation of arginine rich motifs into an entire helical

289 turn of the major groove. This is unusual as RNA major grooves are deep and narrow,

290 making them generally inaccessible for protein binding. The architecture of the four

291 sandwich motifs in 7SK allows for transcriptional regulators to differentially utilize their

292 ARMs. The tandem preformed ASMs, ASM1 and ASM2, remain unchanged from their

293 free configuration upon encountering C-terminal arginines of Tat and HEXIM. On the

294 other hand, the apical pseudo-symmetrical ASMs, pseudo-ASM3 and ASM4, reconfigure

295 depending on their binding partners. The structures show that the ASM3 region can adopt

296 at least three different base pair interactions: a reverse Hoogsteen in the free state, a cis

297 Hoogsteen/sugar interaction upon HEXIM binding, and a Watson-Crick base pair upon

298 Tat binding. The cis Hoogsteen/sugar interaction is especially significant because it

299 allows HEXIM to enter the major groove despite the lack of arginines in the N-terminus.

300 Similarly, while ASM4 retains its preformed configuration found in the free state upon

301 Tat binding, it can be destabilized in the presence of HEXIM and adopt multiple flexible 302 states. Taken together, these studies show that 7SK is adaptable in its ASM architecture,

303 which can be modulated upon encountering different transcription factors.

304 Comparative analyses of HEXIM and Tat provide insights into how both positive

305 and negative regulators can manipulate 7SK to carry out their transcription roles. Our

306 studies implicate HEXIM as potentially having a dual structural role. On the one hand, it

307 can bind with high affinity to the apical portion of 7SK-stemloop 1 to sequester pTEFb

308 and on the other hand, it simultaneously causes local destabilization of this region,

309 enhancing the binding affinities of positive regulators for pTEFb capture. In comparison

310 to Tat, the thermodynamic profile and solution-state characteristics of HEXIM binding

311 show an entropy-driven mode of interaction that is particularly attributed to the

312 destabilization of U63 in the ASM4 region. Indeed, mutational studies have shown that

27,33 313 deletion of U63 significantly reduces HEXIM binding . This expansion in the dynamic

314 state of 7SK surrounding the ASM4 region is also supported both by in-vivo SHAPE

315 analysis where U63 becomes ultra-reactive upon HEXIM:pTEFb binding, and structural

316 and molecular dynamics modeling34,38-40. Furthermore, we show that Tat capitalizes on

317 this increased dynamic state, binding with greater affinity to the HEXIM-bound complex

318 than to free 7SK. While the use of a HEXIM-displacement mechanism for pTEFb capture

319 by binding to 7SK-SL1 has yet to be discovered for cellular factors, the destabilization-

320 driven preparation of 7SK snRNP for pTEFb extraction may be a general feature exploited

321 by specialized transcriptional factors.

322 Comparative analysis of HEXIM and Tat also shed light on the sequence

323 requirements of ARMs for pTEFb regulation. While N-terminal lysines of HEXIM allow

324 for destabilization of ASM4, the anchoring required to enter the major groove can only be

325 provided by the stacking of C-terminal arginines within ASM1 and ASM2. Indeed, the

326 importance of these C-terminal arginines for HEXIM binding is supported by their nearly

327 complete conservation across metazoan species41. On the other hand, the equivalent 328 arginines in HIV-1 Tat occur as a consecutive pair only in approximately 50% of reported

329 strains, albeit with the strong requirement of at least one arginine. The structures show

330 that these variations may be possible due to the anchoring provided by the arginines that

331 intercalate into the apical ASMs. Nevertheless, when two arginines are present in Tat, the

332 interactions with the tandem ASMs mirror HEXIM.

333 Furthermore, differences in N-terminal and spacer ARM residues orchestrate the

334 structural modulations of the apical ASMs. In order to accommodate the continuation of

335 the HEXIM ARM chain from the ASM1/ASM2 to the ASM3/ASM4 region required for U63

336 destabilization, K152 induces the reconfiguration of pseudo-ASM3 from a reverse

337 Hoogsteen to a cis-Hoogsteen/sugar interaction. In all variations of N-terminal Tat

338 sequences studied, binding is concomitant with the rearrangement of pseudo-ASM3 into

339 a canonical ASM3 through the intercalation of an arginine.

340 The structures also provide insights into specific sequence variations that occur in

341 the highly conserved Tat ARM to displace HEXIM. When two arginines are available in

342 the N-terminal residues, both are involved in arginine sandwich interactions, providing

343 a 2-fold increase in affinity; however, either R52 (TatNL4-3) or R53 (TatG) can act as the

344 remodeler. This can be explained by the presence of either a glutamine or histidine spacer,

345 respectively, which is the only amino acid difference between the two strains. As

346 glutamine (75%) and histidine (15%) make up the majority of the sequence variation in

347 this spacer, the structures show that these two spacer residues drive the differential

348 positioning of the arginine remodeler. In the TatFin strain, which has a histidine spacer, it

349 is the R52 that acts as a remodeler. In this case, the R53K substitution provides the

350 stabilizing interactions to reposition the single R52 arginine near pseudo-ASM3.

351 Furthermore, it is also interesting to compare the mode of binding of TatFin to

352 HEXIM. First, the single residue difference (R52 vs K151) provides TatFin with the 353 additional ASM intercalation required for displacement. Thus, Tat has evolved specific

354 sequence variations that allow for the reconfiguration of pseudo-ASM3, even in cases

355 where there is only a single variation from HEXIM. Second, despite both ARMs having

356 lysines positioned near ASM4, only HEXIM leads to local destabilization. Our studies

357 therefore provide HEXIM as an example of a negative regulator that primes its own

358 displacement by locally destabilizing 7SK (Fig. 4c). Overall, these studies have broader

359 implications for 7SK-snRNP mediated regulation. Given that the destabilization-driven

360 displacement is a robust mechanism, it is possible that other yet to be identified cellular

361 and viral transcriptional regulators recruit pTEFb through direct intercalation of ARMs

362 into 7SK-SL1apical. Furthermore, as a destabilized state of 7SK snRNP is what is presented

363 to all transcriptional regulators, the mechanisms necessary to extract pTEFb may

364 converge on capitalizing on this conformational heterogeneity.

365

366 Methods

367 RNA sample preparation: RNA samples used for biophysical experiments were

368 synthesized by in vitro transcription using T7 RNA polymerase with either plasmid DNA

369 or with synthetic DNA templates containing 2′-O-methylated (Integrated DNA

370 Technologies) containing the T7 promoter and the desired sequences. Plasmid DNA for

371 7SK-SL1Full-WT and 7SK-SL1Full-AGU contain the T7 promoter, insert, and SmaI sequence

372 were cloned by Genscript in between the EcoRI and BamHI restriction sites of a puc19

373 vector. Plasmid DNA was prepared for in vitro transcription from a 5mL overnight culture

374 of NEB 5α Competent E.coli (C29871) transformed with the plasmid using Qiaprep Spin

375 Miniprep Kit (Qiagen 27104). 10 µL of purified DNA was combined with 25 µL of 2′-O-

376 methylated reverse primer at 100 µM (5′-mGmGAGCGGTGAGG GAGGAAG-3′ where

377 m indicates 2′ O-methylated nucleotide), 2 µL of forward primer at 100 µM (5′- 378 GACAAGCCCGTCAGGG-3′), 2.44 mL of water, and two tubes of EconoTaq PLUS 2X

379 Master Mix (Lucigen 30035-2). The 5 mL mixture was then aliquoted into 50 µL

380 increments in a 96-well PCR plate and the template for in vitro transcription reactions

381 were amplified using the following PCR protocol: 95 °C for 5 minutes, 34 cycles of (95 °C

382 for 30 seconds, 50 °C for 1 minute, and 68 °C for 90 seconds), and 68 °C for 5 minutes.

383 After PCR amplification, reactions were pooled into 5 mL volume in a 50 mL Falcon tube

384 and 0.5 mL of 3M sodium acetate, pH 5 and 32 mL of 100% ethanol was added to the

385 mixture and chilled at -80°C for at least 30 minutes before spinning down at 9000 x g for

386 10 minutes at 4 °C. The ethanol was decanted and the pellet was left to dry overnight

387 before in vitro transcription use. Template preparation for 7SK-SL1apical using 2′-O-

388 methylated reverse primers in order to suppress the heterogeneity at the 3′ end of the

389 transcripts involved combining 15 mL of both forward (5′-

390 TAATACGACTCACTATAGGGATCTGTCACCCCATTGATCGCCAGTGGCTGATCT

391 GGCTGGCTAGGCGGGTCCC-3′) and reverse (5′-mGmGGACCCGCCTAGCCAG

392 CCAGATCAGCCACTGGCGATCAATGGGGTGACAGATCCCTATAGTGAGTCGTAT

393 TA-3′ where m indicates 2′ O-methylated nucleotide) primers at 1 mM stock solution with

394 470 mL of water. The mixture was heated at 95 °C for five minutes and cooled at room

395 temperature for 30 minutes before assembling the in vitro transcription reaction. Samples

396 were either unlabeled, or residue-specifically labeled with 13C/15N- or 2H (Cambridge

397 Isotope Laboratories, Inc.). After transcription, RNA samples were heat denatured and

398 purified by using urea-denaturing polyacrylamide gels.

399 HEXIM ARM and Tat ARM peptide preparation: Unlabeled HEXIMN-ARM

400 (GISYGRQLGKKKHRRRAHQ), TatFin ARM (GISYGRKKRKHRRRAHQ), and TatG ARM

401 (GISYGRKKRRHRRRAHQ) peptides were purchased from Tufts University Core

402 Facility at a 0.1 mmol scale. Tat adapters were placed around the HEXIMN-ARM sequence

403 to prevent non-physiological aggregation in solution state NMR studies. HEXIMN-ARM 13 15 404 peptides containing selective C/ N-labeled residues, underlined,

405 (GISYGRQLGKKKHRRRAHQ and GISYGRQLGKKKHRRRAHQ) were purchased from

406 New England Peptide.

407 Full-length HEXIM1 preparation: Synthetic DNA encoding HEXIM1 (2-359) was cloned

408 into a bacterial pMCSG7 expression vector 42 [JLS1] encoding an N-terminal tobacco etch

409 virus (TEV) protease-cleavable His6 tag and was expressed in E. coli BL21 AI cells in an

410 overnight culture at 20°C. Cells were lysed by sonication in buffer containing 50 mM Tris

411 pH 8.0, 500 mM NaCl, 0.1% b-mercaptoethanol, 50 mM (NH4)2SO4 and protease inhibitor

412 aprotinin and leupeptin. His6-HEXIM1 was purified from the cleared cell lysate using

413 Ni-NTA resin (Qiagen) and the His6 tag was cleaved with TEV protease. The HEXIM was

414 run over a second Ni-NTA column, followed by anion exchange on a 5 mL HiTrap Q HP

415 column (Cytiva) and gel filtration on a Superdex 200 16/60 column (Cytiva) in a final

416 buffer containing 25 mM Hepes pH 7.5, 200 mM NaCl, 5% glycerol, 1 mM TCEP. HEXIM

417 was flash frozen in liquid nitrogen and stored at -80° C.

418 Isothermal titration calorimetry: Binding constants for the interactions of 7SK-SL1apical

419 with the HEXIMN-ARM and TatFin and TatG ARMs and full-length HEXIM1 with 7SK-

420 SL1apical, 7SK-SL1Full-WT, and 7SK-SL1Full-AGU were measured using an ITC-200

421 microcalorimeter (MicroCal). 68 µM HEXIMN-ARM peptide was titrated into 5 µM

422 solutions of 7SK-SL1apical in 10 mM sodium phosphate, 70 mM NaCl, 0.1mM EDTA, pH

423 5.2 at 25 °C. Titrations of Tat ARMs into 7SK-SL1apical were also performed in the same

424 buffer conditions as the HEXIMN-ARM titration although the Tat ARM concentration was

425 at 2.5 µM and the 7SK-SL1apical concentration was at 45 µM. Titrations with full-length

426 HEXIM1 were done at 100 µM of HEXIM1 into 3 µM of either 7SK-SL1apical, 7SK-SL1Full-WT,

427 and 7SK-SL1Full-AGU in a buffer of 25mM HEPES pH 7.5, 200mM NaCl, 5% glycerol, and

428 1mM TCEP. Titration curves were analyzed using ORIGIN (OriginLab) and all

429 thermodynamic parameters are reported with n=3 experiments. 430 Small angle X-ray scattering: SAXS data for the 7SK-SL1apical:HEXIMN-ARM, 7SK-SL1apical:

431 TatFin ARM, and 7SK-SL1apical:TatG ARM complexes were obtained at SIBYLS beamline of

432 Advanced Light Source at Lawrence Berkeley National Laboratory. Measurements were

433 performed in a buffer containing 10 mM sodium phosphate, 70 mM NaCl, 0.1mM EDTA,

434 pH 5.2 and the background scattering was subtracted from the sample scattering to

435 obtain the scattering intensity from the solute molecules. Data from three different

436 concentrations (50, 75, and 100 µM) were compared with scattering intensities at q = 0 Å-

437 1 [I(0)], as determined by Guinier analysis, to detect possible interparticle interactions.

438 Data was analyzed by using ScÅtter software, and the ab initio envelope structures were

439 reconstructed by using DAMMIF/DAMMIN software.

440 NMR data acquisition, resonance assignment and structural calculations: For NMR

441 experiments, the Tat ARM/HEXIMN-ARM:7SK-SL1apical complexes were dissolved in a buffer

442 containing 10 mM potassium phosphate, 70 mM NaCl, and 0.1mM EDTA, pH 5.2

443 whereas the full-length HEXIM1:7SK-SL1apical complex was in a buffer with 25mM HEPES

444 pH 7.5, 200mM NaCl, 5% 2H-glycerol, and 1mM TCEP. All NMR experiments were

445 acquired by using Bruker 700 or 800 MHz instruments equipped with cryogenic probes.

446 Spectra for observing non-exchangeable protons were collected at 298K in 99.96% D2O,

447 whereas those for exchangeable protons were at 283K and 298K in 10% D2O. For NOESY

448 experiments, mixing times were set to 200 ms. To help unambiguously assign the

449 intermolecular NOEs of the HEXIMN-ARM with 7SK-SL1apical, we used both specifically

450 protonated GA, AC, and GU samples of 7SK-SL1apical and two HEXIMN-ARM peptides

451 synthesized by with different combinations of 13C/15N-labeled amino acids. Samples of

452 the 7SK-SL1apical:HEXIMN-ARM was prepared at 1:0.9 equivalents whereas the 7SK-

453 SL1apical:TatFin ARM, 7SK-SL1apical:Tat G ARM, and full-length HEXIM1:7SK-SL1apical

454 complexes were prepared at a 1:0.3 equivalents to avoid any non-specific binding of the

455 peptides or aggregation of the proteins to the RNA. Assignments for non-exchangeable 456 1H, 13C, 15N signals of 7SK-SL1apical in complex with HEXIMN-ARM and Tat ARMs were

457 obtained by analyzing two-dimensional 1H-1H NOESY recorded with non-labeled

458 samples and two-dimensional 13C-HMQC and 15N-HSQC and three-dimensional 13C-

459 edited HMQC-NOESY spectra for labeled samples.

460 Initial structural models were generated using manually assigned restraints in

461 CYANA where upper-limit distance restraints of 2.7 A, 3.3 A, and 5.0 A were employed

462 for direct NOE cross-peaks of strong, medium and weak intensities, respectively43.

463 However, for crosspeaks pairs associated with the intra-residue H8/6 to H2’ and H3’,

464 upper distance limits of 4.2 Å and 3.2 Å were employed for NOEs of medium and strong

465 intensity, respectively. To prevent the generation of structures with collapsed major

466 grooves, cross-helix P–P distance restraints (with 20% weighting coefficient) were

467 employed for A-form helical segments. Standard torsion angle restraints were used for

468 regions of A-helical geometry, allowing for ± 50º deviations from ideality (α=−62°, β=

469 180°, γ= 48°, δ= 83°,ɛ=−152°,ζ=−73°)44. Standard hydrogen bonding restraints with an

470 approximately linear NH–N and NH–O bond distances of 1.85 ± 0.05 Å and N–N and N–

471 O bond distances of 3.00 ± 0.05 Å, and two lower-limit restraints per base pair (G–C base

472 pairs: G-C4 to C-C6 ≥ 8.3 Å and G-N9 to C-H6 ≥ 10.75 Å; A–U base pairs: A-C4 to U-C6 ≥

473 8.3 Å and A-N9 to U-H6 ≥ 10.75 Å) were employed in order to weakly enforce base-pair

474 planarity (20% weighting coefficient).

475 The CYANA structure with the lowest target function was used as the initial

476 model for structure calculations Xplor-NIH to incorporate electrostatic constraints. First,

477 structures were calculated using annealing from 2000°C to 25°C in steps of 12.5°C.

478 Standard energy potential terms for bonds, angles, torsion angles, van der Waals

479 interactions and interatomic repulsions were included. The statistical backbone H-bond

480 potential was utilized for protein residues. Energy potentials for NOEs, hydrogen bonds,

481 and planarity were incorporated with restraints derived from NMR data. All restraints 482 used in CYANA were included with the exception of phosphate-phosphate distances.

483 The structures were sorted by energy using bond, angle, dihedral, and NOE energy

484 potential terms, and the ten percent of the structures with the lowest sort energy were

485 further minimized with SAXS terms to incorporate orientation restraints. For this step,

486 minimization started at 1500°C to 25°C in steps of 12.5°C. The lowest ten percent of these

487 were deposited in the RCSB databank.

488 489

490

491

492

493

494

495

496

497

498

499

500 7SK-snRNP b c SL3

SL2 a

3’ 5’ N = 2.1 ± 0.2 N = 2 ± 0.1 K = 209 ± 34 nM K = 200 ± 20 nM SL4 d d

7SK-SL1Full(native): 7SK-SL1Full(AGU): HEXIM HEXIM pTEFb d e HEXIM SL1 Tat

N = 1.8 ± 0.03 N = 1 ± 0.1 HEXIM K = 206 ± 58 nM Kd = 238 ± 35 nM SL3 d

apical apical SL2 7SK-SL1 (AGU): 7SK-SL1 (AGU): HEXIM HEXIM N-ARM g 3’ 5’ f

SL4

pTEFb N = 1.1 ± 0.1 N = 1 ± 0.03 K = 91 ± 35 nM K = 189 ± 16 nM Tat d d SL1 7SK-SL1apical(AGU): 7SK-SL1apical(AGU): Tat Subtype G ARM Tat Finland ARM 501 Figure 1. Characterization of HEXIM and Tat binding to 7SK: 502 (a) Cartoon representation of the HEXIM dimer and pTEFb heterodimer binding to the 503 7SK snRNP (top). Upon introduction of Tat, HEXIM is displaced from the snRNP (below). 504 Not depicted are MEPCE and LARP7. Representative ITC data for: full-length HEXIM1 Full 505 bound to 7SK-SL1 (G1-C108) with a wild-type loop (b) or an AGU tri loop (c) show that 506 the loop does not play a significant role in dimeric HEXIM binding; full-length HEXIM1 Full apical 507 shows a similar binding affinity to both 7SK-SL1 (c) and 7SK-SL1 (G26-C85) (d); the 508 full-length HEXIM dimer (d) and HEXIMN-ARM (e) bind with similar affinities and 509 stoichiometry to 7SK-SL1apical, indicating that the HEXIMN-ARM:7SK-SL1apical complex 510 represents the minimal binding interaction. Representative ITC traces of the TatFin (f) and 511 TatG (g) ARMs into 7SK-SL1apical show an approximately 1.2 and 2.6-fold increased 512 binding affinity compared to HEXIMN-ARM. All reported values are for n=3 replicates. a e

Arginine Arginine Sandwich Motif (ASM)

Cap d U40 U63 ASM3 Base ASM4 C62 A39

G70 U40 ASM2 b C71 G A U C G pseudo- G74 C G U63 ASM1 G C ASM4 ASM3 U 63 C75 C G 44U A ASM4 A U66 A39 C UG C62 U 40 A U pseudo-C G ASM3 C GASM2 C 71 U C G G70 35C G ASM1 7SK-SL1apical:TatFin C 75 U ASM2 A A C G f U G G C U G C71 C G U G A U G C 85 25G C G C G74 U40 U63 ASM3 c ASM1 ASM4 7SK-SL1apical C62 A39 7SK-SL1:Tat Fin C75 G G70 7SK-SL1:Tat ASM2 7SK-SL1:HEXIM N-ARM C71 G74

C75 ASM1

7SK-SL1apical:HEXIMN-ARM

7SK-SL1apical:TatG 513 Figure 2. 7SK-SL1apical in complex with HEXIMN-ARM, TatFin, and TatG ARMs. 514 (a) Cartoon depicting an arginine sandwich motif. (b) Secondary structure of free 7SK- apical 515 SL1 with a modified AGU tri loop. The base and cap residues forming ASM1, ASM2, 516 pseudo-ASM3, and ASM4 colored in orange, green, magenta, and blue, respectively. The 517 dashed arcs represent the triple-base interactions from the bulge to the stem, giving rise 518 to the caps of the sandwiches. (c) Overlay of reconstructed ab initio SAXS envelopes of 519 free 7SK-SL1apical (gray) and bound to HEXIMN-ARM (orange), TatFin (blue), and TatG (red) 520 ARMs, demonstrating the lack of global rearrangement of the RNA. Representative NMR 521 structures of 7SK-SL1apical bound to (d) HEXIMN-ARM, (e) TatFin, and (f) TatG shows 522 engagement with all ASMs. a d C62 ARM Sequences: R56 C71

N-ARM HEXIM : KK151KHRRR U63 Fin Tat : KR52KHRRR G74 G Tat : KR52RHRRR R52 C75 R57 K51

b c U63 e U63 U40 U40 A39 R53 K150 U68 R52 A39 K51

K151 g f U68 U68 h U68 A39 A39

A39

i j k

K53 R55 H153

R154 H54 C67 H54

R55

7SK-SL1apical:HEXIMN-ARM 7SK-SL1apical:TatFin 7SK-SL1apical:TatG 523 Figure 3. Details of intermolecular interactions between HEXIM, TatFin, and TatG ARMs apical 524 with 7SK-SL1 and the rearrangement of the U68-A39 base pair. 525 C-terminal arginines of all ARMs dock into ASM1 and ASM2 with identical tertiary 526 structures; representative ARM of TatFin is shown in (a). K150 and K151 in HEXIM (b), 527 R52 and K51 in TatFin (c), and K51, R52 and R53 in TatG (d,e) interact with the apical ASMs. 528 The U68-A39 base pair rearranges into a cis-Hoogsteen/sugar interaction upon HEXIM Fin G 529 binding (f), while Tat (g) and Tat (h) both remodel ASM3 by rearranging the U68-A39 530 base pair into a Watson-Crick interaction. (i-k) Spacer residues between the ASM1/ASM2 Fin 531 and ASM3/ASM4 regions are positioned near ASM1 and ASM2. In the case of Tat (j), K53 532 also acts as a spacer residue in order to allow for the remodeling of ASM3 by R52. 533 a 6.6/apical 6.VQ51$ ǻ* G Fin Tat ARM Tat ARM HEXIM ARM HEXIM ǻ+ 7ǻ6 -2

-4 b

-6 kcal/mole -8

-10

-12 1 “

.d “Q0 c

39 39

65 (bound) 65 (free) 65 (free)

65 (bound)

Free TatG Free TatFin HEXIM HEXIM + TatG HEXIM HEXIM + TatFin 534 535 536 Figure 4. Comparative thermodynamic analyses and competition experiments between 537 Tat and HEXIM: 538 (a) Comparison of enthalpic and entropic contributions between TatG, TatFIN, and 539 HEXIMN-ARM in complex with 7SK-SL1apical, and full length HEXIM in complex with 7SK 540 snRNA. Entropy values were calculated using a T value of 298K. The reversal in the 541 entropic and enthalpic contribution for Tat ARM compared to HEXIM is evident with 542 HEXIM having a entropically-driven binding profile. (b) Representative ITC data for full- 543 length HEXIM bound to 7SK snRNP demonstrating expected stoichiometry and specific 544 binding. (c) NMR competition titration analysis showing binding of 7SK by Tat 545 concomitant with the total displacement of HEXIM. Data are shown for the A39 (left) and 546 A65 (right) h2-c2 correlations. The increase in Tat affinity for the HEXIM-bound complex 547 is evident by lack of free-RNA populations for the A65 resonance in the competition 548 experiment compared to binding to free 7SK. 549 550

HEXIM TatFin TatG

NMR distance and dihedral constraints Distance restraints Total NOE 560 571 570 Intraresidue 229 229 229 Inter-residue 331 342 341 Sequential (|i – j| = 1) 152 152 152 Nonsequential (|i – j| > 1) 179 190 189 Hydrogen bonds 152 157 163 Total dihedral-angle restraints 376 376 376 Base pair 99 99 99 Sugar pucker 112 112 112 Backbone 127 127 127 Based on A-form geometry

Structure statistics Violations (mean ± s.d.) 323 ± 14 342 ± 14 348 ± 17 Distance constraints (Å) 0.28 ± 0.007 0.28 ± 0.01 0.26 ± 0.007

Dihedral-angle constraints (°) 0.22 ± 0.05 0.16 ± 0.05 0.31 ± 0.08 Max. dihedral-angle violation (°) 6.12 ± 1.01 7.89 ± 2.10 14.6 ± 1.96 Max. distance-constraint 1.69 ± 0.13 1.62 ± 0.16 2.15 ± 0.28 violation (Å)

Deviations from idealized geometry Bond lengths (Å) 0.006 0.006 0.006 Bond angles (°) 1.04 ± 0.009 1.06 ± 0.02 1.06 ± 0.005 Impropers (°) 0.68 ± 0.02 0.79 ± 0.06 0.94 ± 0.03

Average pairwise r.m.s. deviation (Å)a NOE-restrained RNA and 0.44 ± 0.08 0.54 ± 0.06 0.38 ± 0.05 protein residues b aPairwise r.m.s. deviation was calculated among ten refined structures. bThese are residues 24:87 for the RNA and 150:157 for the peptides. 551 552 Table 1. NMR restraints and structure statistics for HEXIM, TatFin, and TatG ARMs in 553 complex with 7SK-SL1apical 554 a b

140 apical 81 7SK-SL1 -AGU 112.20 83 79 112.00 69 78 31 25 4642 26 50 112.60 150 74 112.50 64 61 70 827360 113.00 113.00

G78.h1’ 160 G73.h1’ G42.h1’ G46.h1’ U68.h1’ G25.h1’ G69.h1’ U28.h1’

112.50 40 63 170 32 28 51 113.00 30 113.00 66 84 113.40 180 44 C ppm N ppm 13 15 140 apical 7SK-SL1 -WT 83 81 109.60 79 109.00 69 Gloop 25 4642 26 78 74 31 110.00 145 64 73 109.40 61 70 82 60 110.40

150 U30.h1’ U66.h1’ U44.h1’ U51.h1’

63 109.50 108.40 40 155 32 28 110.00 30 108.80 110.50 66 84 160 44

1H ppm 1H ppm 555

556 Supplementary Figure 1: 7SK-SL1apical construct design and comparison of 7SK-SL1apical 557 binding to full-length HEXIM1 and the HEXIM N-ARM. 558 (a) Two-dimensional 1H-15N HSQC spectra for 7SK-SL1apical with an AGU triloop (top) and 559 a native loop (bottom) showing that all major motifs in the apical stem are unaffected by 560 the change in the loop. (b) Two-dimensional 1H-13C HMQC spectra of GU-labeled 7SK- 561 SL1apical binding to 0.3 equivalents of full-length HEXIM1 (black) and 0.9 equivalents of 562 HEXIM N-ARM (magenta). While there are slight differences in both carbon and proton 563 chemical due to the salt concentration of the buffers, the HEXIM N-ARM makes same 564 interactions with 7SK-SL1apical as the full-length HEXIM1. 565 5.00 35

67.h1’ 5.20 38 67 45.h1’ 47 33 37 86 45 5.40 36 85

43.h2 48 29 5.60 34.h2 85.h1’ 75 87 62 49 65.h2 27.h2 71 G 5.80 34.h1’ 43 34 27.h1’ A U 43.h1’ 39 27 65.h1’ 65 C G 6.00 77 71 C G AC-7SK-SL1apical:HEXIM ARM G C ASM4 U 63 C G 5.10 44U A A U66 28 G C 68 U 5.30 U 50 32 40 A U 44 pseudo- C G 40 69 ASM3 ASM2 5.50 C G 84 46 C 71 H ppm 31 30 63 66 1 U 64 73 81 70 24 41 61 C G 5.70 79 35 G ASM1 74 82 C 75 83 26 C 42 78 60 U 25 A A 5.90 72 76 apical C G 51 GU-7SK-SL1 :HEXIM ARM U G G C U G 5.20 apical C G GA-7SK-SL1 :HEXIM ARM U G 50 A U 5.40 G C 85 25G C 69 G C 31 46 5' 3' 5.60 70 64 61 73 81 26 24 79 49 42 82 43.h2 74 83 5.80 34.h1’ 39 43 34.h2 34 27.h2 60 27.h1’ 78 25 65.h1’ 65 77 77.h2 65.h2 27 6.00 77.h1’

1H ppm 566 Supplementary Figure 2: Non-exchangeable RNA proton assignments of the HEXIM- 567 N-ARM: 7SK-SL1apical complex. 568 Portion of two-dimensional 1H–1H NOESY spectra of the HEXIM-N-ARM: 7SK-SL1apical 569 interaction using AC (top), GU (middle), and GA (lower) protonated samples with the 570 other nucleotides being deuterated. GISYGR146Q147L148G149K150K151K152H153R154R155R156AHQ

108.0 44.0 150.hz 146.nh 154.nh

46.0

110.0 152.nh N ppm 156.nh 148.nh 15 150.nh 152.hz 48.0

112.0 50.0

1H ppm

GISYGR146Q147L148G149K150K151K152H153R154R155R156AHQ

149.nh 110.0 151.nh

155.nh

115.0 N ppm 15

120.0 147.hz 147.nh

147.he21 147.he22 125.0

1H ppm 571 Supplementary Figure 3: Assignments of amino acid backbone and sidechain protons 572 of the HEXIM N-ARM:7SK-SL1apical interaction. 573 HSQC spectra of the two selectively 13C/15N- labeled HEXIM N-ARMs in complex with 574 7SK-SL1apical. Bolded residues in the sequences are the amino acids that have been 575 selectively labeled. All backbone amide protons and the side chain amino protons were 576 unambiguously assigned with the help of this selective labeling strategy. a b c 2.95 7.12 7.12 153.hβ 156.hη 155.hη 3.00 7.14 7.14

H ppm 3.05 1 7.16 7.16 155.hb 3.10 156.hb 71.h5 73.h8 75.h5 34.h1’ 77.h8 1H ppm 1H ppm 1H ppm alt d alt e G alt 40.h5 63.h5 63.h5 63.h5 I 63.h5 e 150.ha 1.30 G S A U Y 2.80 63.h1’ 40.h5 66.h1’ C 63.h5 G G 1.40 C G 150.h¡ G ASM4 R C 63 146 2.85 U H ppm 1.50 Q 1 C G 147 152.hb 44 150.hb U A L148 A U66 G 2.90 1.60 G C 149 150.hβ U K U 150 1 40 A U K H ppm 1.70 pseudo-C G 151 ASM2 K 1 ASM3 C G 152 H ppm C 71 H U 153 C G R f g 35C G ASM1 154 75 R 152.ha C 155 1.20 U R156 A A A C G 2.95 U G H 1.30 G C Q 153.hβ3 U G HEXIM N-ARM

1.40 33.h5 67.h5 3.00 C G 38.h5 U G 153.hβ2 A H ppm

U 1 G C 85 1.50 25G C 3.05 G C 5' 3' 1.60 152.hb 7SK-SL1apical 151.hb 154.hβ

1.70 36.h6 35.h6

1H ppm 1H ppm 577 Supplementary Figure 4: Assignments of the interactions between the HEXIM N-ARM 578 and 7SK-SL1apical: Secondary structure of the 7SK-SL1apical showing connectivities to the N-ARM 579 HEXIM . NOEs between: (a) A77 H8 proton and R156 guanidinium protons indicate 580 R156 docks into the preformed ASM1; (b) G74 and G73 with the guanidinium protons of 581 R156 and R155, respectively, and (c) 156 and 155 Hd protons and H5 protons of the C75 + 582 and C71 caps support entry of these arginines into ASM1 and ASM2; (d) K150 He protons 583 and the U63 and U40 caps position K150 within hydrogen-bonding distance to the O4 of 584 both residues. (e) K150 interacts with both the U63 and U40 caps of pseudo-ASM3 and ASM4 585 whereas interactions between the U66 H1’ protons with the K152 Hd protons positions 586 K152 as a spacer residue. (f) Interactions of K152 and K151 with the C67 and C38 H5 587 protons, respectively, and NOEs between the R154 Hb proton with C33 determine the 588 relative position of this spacer residue. (g) Positioning of the H153 Hb protons with the 589 H6 protons of C35 and C36. G G A U I a b 67.h5 38.h5 63.h1’ 66.h5 C G S 37.h5 36.h1’ C G G C ASM4 Y 56.hb U 63 G C G 2.80 3.20 55.hb 44U A R 50.h¡ 66 51.h¡ A U 54.hβ3 C K50 UG 2.90 U K 54.hβ2 40 A U 51 53.h¡3 3.30 pseudo-C G R ASM3 ASM2 52 59.hβ3 C G 71 C K53 3.00 53.h¡2

H ppm 3.40

U 1 C G H 35C G ASM1 54 52.h 75 R b C 55 3.10 56.hb 57.hb U 3.50 59.hβ2 A A R56 C G 55.hb U G R 57 G C A 3.20 3.60 U G C G H U G Q 33.h1’ 75.h5 40.h5 34.h1’ 36.h5 31.h1’ 30.h5 A U 71.h5 33.h5 71.h1’ 37.h5 G C 85 Tat Finland RBD 1 1 25G C H ppm H ppm 5'G C3' c 63.h02’ G G 2.70 U e A I d C G 33.h5 63.h1’ 2.80 C G S G C ASM4 2.90 51.h¡ U 63 Y 2.90 2.90 C G G 51.h¡3 3.00 44U A 66 H ppm A U R 51.h¡2 1 C 51.h¡2 1H ppm UG K 3.00 56.hb 3.00 U 40 A U K 57.hb pseudo- 51 C G R 52.hb 55.hb ASM3 C GASM2 52 3.10 3.10 53.hb C 71 R f

U 53 63.h02’ C G 35 ASM1 H H ppm C G 54 1 2.90 C 75 3.20 54.hβ33.20 54.hβ3 U R55 A A 54.hβ2 C G R 54.hβ2 56 3.30 59.hβ3 3.30 59.hβ3 51.h¡ U G R 3.00 G C 57 59.hβ2 U G A 59.hβ2 C G U G H 3.40 3.40 A U Q G C 85 1H ppm 25G C Tat Subype G RBD 75.h5 71.h5 35.h5 63.h5 30.h5 37.h5 29.h5 40.h5 5'G C3'

apical 1 1H ppm 590 7SK-SL1 H ppm 591 Supplementary Figure 5: Interactions between 7SK-SL1apical and TatFin and TatG RBDs: 592 Secondary structure of the 7SK-SL1apical showing connectivities to the RBDs. NOEs Fin + 593 between: (a) R57, R56, and R52 Hd of Tat protons and the C75, C71 , and U40 caps position 594 then into ASM1, ASM2, and remodeled ASM3; K53 He and C67 and U66, and R55 Hd and 595 C33 position these spacer residues in-between the lower and apical ASMs; (b) K51 He and 596 U63 and U66, and K50 He and C38 and C37 make up the final interactions as the peptide exits 597 the groove; H54 Hb and the C37 and C36 H5 provide information for the final spacer Fin 598 residue of the Tat ; (c) U63 2’ hydroxyl and K51 He indicate a hydrogen bonding G + 599 interaction; (d) R56, R52 Hd of Tat and the C71 and U63 caps show interaction with ASM2 600 and ASM1; (e) R57, R52 Hd and the C75 and U40 caps show interaction with ASM1 and the 601 formation of ASM3 by R52; (f) U63 2’ hydroxyl proton and K51 He indicate K51 hydrogen 602 bond with the U63 ribose ring. a b

71.h42 4.24 4.26 68.h2’ 68.h3’ H ppm 1 4.28 G 9.0 A U 39.h8 H ppm

C G 1 C G 39.h61 39.h62 G C ASM4 68.h3 U 63 10.0 71.h41 13.5 C G 44U A A U66 H, ppm 1H ppm G C U U 40 A U 30 pseudo- c 145.0 63 C G 40 32 28 ASM2 ASM3 C G 155.0 C 71 N ppm 15 U 10.0 C G 82-81 30-81 35C G ASM1 C 75 U A A C G 11.0 U G 84-83 31-79 32-79 G C 82-83 82-30 U G 84-28 44-63 C G 82-28 U G 12.0 66-40 74-73 A U 84-26 25-26 85 G C 44-64

25 H ppm 66-42 G C 1 5'G C3' 13.0 69-70 60-61 68-69

14.0

1H ppm

603 Supplementary Figure 6: Characterization of the A39-U68 base pair: 1 1 + 604 Portions of the H– H 2D NOESY spectra. (a) The C71 amino protons are downshifted 605 due to their participation in a triple base-interaction. (b) NOEs between the U68 imino 606 proton with the A39 amino protons and the U68 H2’ and H3’ protons with the A39 H8 607 proton confirms that the A39-U68 base pair is in a cis Hoogsteen/sugar interaction. (c). 608 Portion of the 1H– 15N 2D HSQC spectrum for 15N, 13C-labeled 7SK-SL1apical supports the 609 assignments of the U40 and U63 cap imino protons. 610

611 612 613 Supplementary Table 1. ITC derived binding parameter: 614

615

616

617

618

619

620

621

622 623 Acknowledgements

624 This work was supported by HHMI grant 55108516 (to V.M.D’S) and NIH grant U54

625 AI50470 (formerly U54 GM103297) (to V.M.D. and J.L.S.). SAXS data was collected at

626 the Advanced Light Source (ALS), SIBYLS beamline on behalf of US DOE-BER, through

627 the Integrated Diffraction Analysis Technologies (IDAT) program. Additional support

628 comes from the NIGMS project ALS-ENABLE (P30 GM124169) and a High-End

629 Instrumentation Grant S10OD018483. 630

631 Accession codes

632 Atomic coordinates have been deposited in the under accession codes

633 PDB XXXX (7SK-SL1apical:HEXIMN-ARM), PDB XXX (7SK-SL1apical:TatFin), and PDB XXX

634 (7SK-SL1apical:TatG). Chemical shifts have been deposited in the Biological Magnetic

635 Resonance Data Bank under accession codes XXX (7SK-SL1apical:HEXIMN-ARM), XXXX

636 (7SK-SL1apical:TatFin), and XXXX (7SK-SL1apical:TatG).

637

638 Author contributions

639 V.V.P, M.G., and V.M.D’S. conceived and designed the experiments. V.V.P., M.G. and

640 J.L.M. purified the samples and V.V.P performed the NMR, ITC, and SAXS experiments.

641 V.V.P., M.G., and V.M.D’S. performed the structural analyses, interpreted the data, and

642 wrote the manuscript. V.V.P, M.G., J.L.M., J.L.S. and V.M.D. analyzed data and edited

643 the manuscript. 644

645 Competing financial interests

646 The authors declare no competing financial interests. 647 References

648 1 Wada, T. et al. DSIF, a novel transcription elongation factor that regulates RNA 649 polymerase II processivity, is composed of human Spt4 and Spt5 homologs. 650 Genes Dev 12, 343-356, doi:10.1101/gad.12.3.343 (1998). 651 2 Yamaguchi, Y. et al. NELF, a multisubunit complex containing RD, cooperates 652 with DSIF to repress RNA polymerase II elongation. Cell 97, 41-51, 653 doi:10.1016/s0092-8674(00)80713-8 (1999). 654 3 Missra, A. & Gilmour, D. S. Interactions between DSIF (DRB sensitivity inducing 655 factor), NELF (negative elongation factor), and the Drosophila RNA polymerase 656 II transcription elongation complex. Proc Natl Acad Sci U S A 107, 11301-11306, 657 doi:10.1073/pnas.1000681107 (2010). 658 4 Renner, D. B., Yamaguchi, Y., Wada, T., Handa, H. & Price, D. H. A highly 659 purified RNA polymerase II elongation control system. J Biol Chem 276, 42601- 660 42609, doi:10.1074/jbc.M104967200 (2001). 661 5 Ehara, H. et al. Structure of the complete elongation complex of RNA polymerase 662 II with basal factors. Science 357, 921-924, doi:10.1126/science.aan8552 (2017). 663 6 Marshall, N. F. & Price, D. H. Purification of P-TEFb, a transcription factor 664 required for the transition into productive elongation. J Biol Chem 270, 12335- 665 12338, doi:10.1074/jbc.270.21.12335 (1995). 666 7 Peng, J., Zhu, Y., Milton, J. T. & Price, D. H. Identification of multiple cyclin 667 subunits of human P-TEFb. Genes Dev 12, 755-762, doi:10.1101/gad.12.5.755 668 (1998). 669 8 Kim, J. B. & Sharp, P. A. Positive transcription elongation factor B 670 phosphorylates hSPT5 and RNA polymerase II carboxyl-terminal domain 671 independently of cyclin-dependent kinase-activating kinase. J Biol Chem 276, 672 12317-12323, doi:10.1074/jbc.M010908200 (2001). 673 9 Fujinaga, K. et al. Dynamics of human immunodeficiency virus transcription: P- 674 TEFb phosphorylates RD and dissociates negative effectors from the 675 transactivation response element. Mol Cell Biol 24, 787-795, 676 doi:10.1128/mcb.24.2.787-795.2004 (2004). 677 10 Vos, S. M. et al. Structure of activated transcription complex Pol II-DSIF-PAF- 678 SPT6. Nature 560, 607-612, doi:10.1038/s41586-018-0440-4 (2018). 679 11 Yamada, T. et al. P-TEFb-mediated phosphorylation of hSpt5 C-terminal repeats 680 is critical for processive transcription elongation. Mol Cell 21, 227-237, 681 doi:10.1016/j.molcel.2005.11.024 (2006). 682 12 Yang, Z., Zhu, Q., Luo, K. & Zhou, Q. The 7SK small nuclear RNA inhibits the 683 CDK9/cyclin T1 kinase to control transcription. Nature 414, 317-322, 684 doi:10.1038/35104575 (2001). 685 13 Nguyen, V. T., Kiss, T., Michels, A. A. & Bensaude, O. 7SK small nuclear RNA 686 binds to and inhibits the activity of CDK9/cyclin T complexes. Nature 414, 322- 687 325, doi:10.1038/35104581 (2001). 688 14 Michels, A. A. et al. MAQ1 and 7SK RNA interact with CDK9/cyclin T complexes 689 in a transcription-dependent manner. Mol Cell Biol 23, 4859-4869, 690 doi:10.1128/mcb.23.14.4859-4869.2003 (2003). 691 15 Yik, J. H. et al. Inhibition of P-TEFb (CDK9/Cyclin T) kinase and RNA 692 polymerase II transcription by the coordinated actions of HEXIM1 and 7SK 693 snRNA. Mol Cell 12, 971-982, doi:10.1016/s1097-2765(03)00388-5 (2003). 694 16 Chen, R., Yang, Z. & Zhou, Q. Phosphorylated positive transcription elongation 695 factor b (P-TEFb) is tagged for inhibition through association with 7SK snRNA. J 696 Biol Chem 279, 4153-4160, doi:10.1074/jbc.M310044200 (2004). 697 17 Michels, A. A. et al. Binding of the 7SK snRNA turns the HEXIM1 protein into a 698 P-TEFb (CDK9/cyclin T) inhibitor. EMBO J 23, 2608-2619, 699 doi:10.1038/sj.emboj.7600275 (2004). 700 18 Kobbi, L. et al. An evolutionary conserved Hexim1 peptide binds to the Cdk9 701 catalytic site to inhibit P-TEFb. Proc Natl Acad Sci U S A 113, 12721-12726, 702 doi:10.1073/pnas.1612331113 (2016). 703 19 Sano, M. et al. Activation and function of cyclin T-Cdk9 (positive transcription 704 elongation factor-b) in cardiac muscle-cell hypertrophy. Nat Med 8, 1310-1317, 705 doi:10.1038/nm778 (2002). 706 20 Kretz, A. L. et al. CDK9 is a prognostic marker and therapeutic target in 707 pancreatic cancer. Tumour Biol 39, 1010428317694304, 708 doi:10.1177/1010428317694304 (2017). 709 21 Schlafstein, A. J. et al. CDK9 Expression Shows Role as a Potential Prognostic 710 Biomarker in Breast Cancer Patients Who Fail to Achieve Pathologic Complete 711 Response after Neoadjuvant Chemotherapy. Int J Breast Cancer 2018, 6945129, 712 doi:10.1155/2018/6945129 (2018). 713 22 Cho, W. K., Jang, M. K., Huang, K., Pise-Masison, C. A. & Brady, J. N. Human T- 714 lymphotropic virus type 1 Tax protein complexes with P-TEFb and competes for 715 Brd4 and 7SK snRNP/HEXIM1 binding. J Virol 84, 12801-12809, 716 doi:10.1128/JVI.00943-10 (2010). 717 23 Zhou, M. et al. Tax interacts with P-TEFb in a novel manner to stimulate human 718 T-lymphotropic virus type 1 transcription. J Virol 80, 4781-4791, 719 doi:10.1128/JVI.80.10.4781-4791.2006 (2006). 720 24 Muniz, L., Egloff, S., Ughy, B., Jady, B. E. & Kiss, T. Controlling cellular P-TEFb 721 activity by the HIV-1 transcriptional transactivator Tat. PLoS Pathog 6, e1001152, 722 doi:10.1371/journal.ppat.1001152 (2010). 723 25 Byers, S. A., Price, J. P., Cooper, J. J., Li, Q. & Price, D. H. HEXIM2, a HEXIM1- 724 related protein, regulates positive transcription elongation factor b through 725 association with 7SK. J Biol Chem 280, 16360-16367, doi:10.1074/jbc.M500424200 726 (2005). 727 26 Yik, J. H., Chen, R., Pezda, A. C. & Zhou, Q. Compensatory contributions of 728 HEXIM1 and HEXIM2 in maintaining the balance of active and inactive positive 729 transcription elongation factor b complexes for control of transcription. J Biol 730 Chem 280, 16368-16376, doi:10.1074/jbc.M500912200 (2005). 731 27 Egloff, S., Van Herreweghe, E. & Kiss, T. Regulation of polymerase II 732 transcription by 7SK snRNA: two distinct RNA elements direct P-TEFb and 733 HEXIM1 binding. Mol Cell Biol 26, 630-642, doi:10.1128/MCB.26.2.630-642.2006 734 (2006). 735 28 Li, Q. et al. Analysis of the large inactive P-TEFb complex indicates that it 736 contains one 7SK molecule, a dimer of HEXIM1 or HEXIM2, and two P-TEFb 737 molecules containing Cdk9 phosphorylated at threonine 186. J Biol Chem 280, 738 28819-28826, doi:10.1074/jbc.M502712200 (2005). 739 29 Dulac, C. et al. Transcription-dependent association of multiple positive 740 transcription elongation factor units to a HEXIM multimer. J Biol Chem 280, 741 30619-30629, doi:10.1074/jbc.M502471200 (2005). 742 30 Schonichen, A. et al. A flexible bipartite coiled coil structure is required for the 743 interaction of Hexim1 with the P-TEFB subunit cyclin T1. Biochemistry 49, 3083- 744 3091, doi:10.1021/bi902072f (2010). 745 31 Blazek, D., Barboric, M., Kohoutek, J., Oven, I. & Peterlin, B. M. Oligomerization 746 of HEXIM1 via 7SK snRNA and coiled-coil region directs the inhibition of P- 747 TEFb. Nucleic Acids Res 33, 7000-7010, doi:10.1093/nar/gki997 (2005). 748 32 Martinez-Zapien, D. et al. Intermolecular recognition of the non-coding RNA 7SK 749 and HEXIM protein in perspective. Biochimie 117, 63-71, 750 doi:10.1016/j.biochi.2015.03.020 (2015). 751 33 Lebars, I. et al. HEXIM1 targets a repeated GAUC motif in the riboregulator of 752 transcription 7SK and promotes base pair rearrangements. Nucleic Acids Res 38, 753 7749-7763, doi:10.1093/nar/gkq660 (2010). 754 34 Pham, V. V. et al. HIV-1 Tat interactions with cellular 7SK and viral TAR RNAs 755 identifies dual structural mimicry. Nat Commun 9, 4266, doi:10.1038/s41467-018- 756 06591-6 (2018). 757 35 Chen, L. & Frankel, A. D. An RNA-binding peptide from bovine 758 immunodeficiency virus Tat protein recognizes an unusual RNA structure. 759 Biochemistry 33, 2708-2715, doi:10.1021/bi00175a046 (1994). 760 36 Puglisi, J. D., Tan, R., Calnan, B. J., Frankel, A. D. & Williamson, J. R. 761 Conformation of the TAR RNA-arginine complex by NMR spectroscopy. Science 762 257, 76-80, doi:10.1126/science.1621097 (1992). 763 37 Puglisi, J. D., Chen, L., Blanchard, S. & Frankel, A. D. Solution structure of a 764 bovine immunodeficiency virus Tat-TAR peptide-RNA complex. Science 270, 765 1200-1203, doi:10.1126/science.270.5239.1200 (1995). 766 38 Martinez-Zapien, D. et al. The crystal structure of the 5 functional domain of the 767 transcription riboregulator 7SK. Nucleic Acids Res 45, 3568-3579, 768 doi:10.1093/nar/gkw1351 (2017). 769 39 Bourbigot, S. et al. Solution structure of the 5'-terminal hairpin of the 7SK small 770 nuclear RNA. RNA 22, 1844-1858, doi:10.1261/rna.056523.116 (2016). 771 40 Brogie, J. E. & Price, D. H. Reconstitution of a functional 7SK snRNP. Nucleic 772 Acids Res 45, 6864-6880, doi:10.1093/nar/gkx262 (2017). 773 41 Marz, M. et al. Evolution of 7SK RNA and its protein partners in metazoa. Mol 774 Biol Evol 26, 2821-2830, doi:10.1093/molbev/msp198 (2009). 775 42 Stols, L. et al. A new vector for high-throughput, ligation-independent cloning 776 encoding a tobacco etch virus protease cleavage site. Protein Expr Purif 25, 8-15, 777 doi:10.1006/prep.2001.1603 (2002). 778 43 Guntert, P., Mumenthaler, C. & Wuthrich, K. Torsion angle dynamics for NMR 779 structure calculation with the new program DYANA. J Mol Biol 273, 283-298, 780 doi:10.1006/jmbi.1997.1284 (1997). 781 44 Tolbert, B. S. et al. Major groove width variations in RNA structures determined 782 by NMR and impact of 13C residual chemical shift anisotropy and 1H-13C 783 residual dipolar coupling on refinement. J Biomol NMR 47, 205-219, 784 doi:10.1007/s10858-010-9424-x (2010). 785

786

787 Figures

Figure 1

Characterization of HEXIM and Tat binding to 7SK: (a) Cartoon representation of the HEXIM dimer and pTEFb heterodimer binding to the 7SK snRNP (top). Upon introduction of Tat, HEXIM is displaced from the snRNP (below). Not depicted are MEPCE and LARP7. Representative ITC data for: full-length HEXIM1 bound to 7SK-SL1Full (G1-C108) with a wild-type loop (b) or an AGU tri loop (c) show that the loop does not play a signicant role in dimeric HEXIM binding; full-length HEXIM1 shows a similar binding anity to both 7SK-SL1Full(c) and 7SK-SL1apical (G26-C85) (d); the full-length HEXIM dimer (d) and HEXIMN-ARM (e) bind with similar anities and stoichiometry to 7SK-SL1apical, indicating that the HEXIMN-ARM:7SK- SL1apical complex represents the minimal binding interaction. Representative ITC traces of the TatFin (f) and TatG (g) ARMs into 7SK-SL1apical show an approximately 1.2 and 2.6-fold increased binding anity compared to HEXIMN-ARM. All reported values are for n=3 replicates.

Figure 2 7SK-SL1apical in complex with HEXIMN-ARM, TatFin, and TatG ARMs. (a) Cartoon depicting an arginine sandwich motif. (b) Secondary structure of free 7SK SL1apical with a modied AGU tri loop. The base and cap residues forming ASM1, ASM2, pseudo-ASM3, and ASM4 colored in orange, green, magenta, and blue, respectively. The dashed arcs represent the triple-base interactions from the bulge to the stem, giving rise to the caps of the sandwiches. (c) Overlay of reconstructed ab initio SAXS envelopes of free 7SK-SL1apical (gray) and bound to HEXIMN-ARM (orange), TatFin (blue), and TatG (red) ARMs, demonstrating the lack of global rearrangement of the RNA. Representative NMR structures of 7SK- SL1apical bound to (d) HEXIMN-ARM, (e) TatFin, and (f) TatG shows engagement with all ASMs. Figure 3

Details of intermolecular interactions between HEXIM, TatFin, and TatG ARMs with 7SK-SL1apical and the rearrangement of the U68-A39 base pair. C-terminal arginines of all ARMs dock into ASM1 and ASM2 with identical tertiary structures; representative ARM of TatFin is shown in (a). K150 and K151 in HEXIM (b), R52 and K51 in TatFin (c), and K51, R52 and R53 in TatG (d,e) interact with the apical ASMs. The U68- A39 base pair rearranges into a cis-Hoogsteen/sugar interaction upon HEXIM binding (f), while TatFin (g) and TatG (h) both remodel ASM3 by rearranging the U68-A39 base pair into a Watson-Crick interaction. (i- k) Spacer residues between the ASM1/ASM2 and ASM3/ASM4 regions are positioned near ASM1 and ASM2. In the case of TatFin (j), K53 also acts as a spacer residue in order to allow for the remodeling of ASM3 by R52.

Figure 4

Comparative thermodynamic analyses and competition experiments between Tat and HEXIM: (a) Comparison of enthalpic and entropic contributions between TatG, TatFIN, and HEXIMN-ARM in complex with 7SK-SL1apical, and full length HEXIM in complex with 7SK snRNA. Entropy values were calculated using a T value of 298K. The reversal in the entropic and enthalpic contribution for Tat ARM compared to HEXIM is evident with HEXIM having a entropically-driven binding prole. (b) Representative ITC data for full length HEXIM bound to 7SK snRNP demonstrating expected stoichiometry and specic binding. (c) NMR competition titration analysis showing binding of 7SK by Tat concomitant with the total displacement of HEXIM. Data are shown for the A39 (left) and A65 (right) h2-c2 correlations. The increase in Tat anity for the HEXIM-bound complex is evident by lack of free-RNA populations for the A65 resonance in the competition experiment compared to binding to free 7SK.