STRUCTURAL INSIGHTS INTO 7SK SNRNP COMPLEX AND ITS IMPLICATION FOR HIV-1 TRANSCRIPTIONAL CONTROL

by

LE LUO

Submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Thesis Advisor Blanton S. Tolbert, Ph.D.

Department of Chemistry

CASE WESTERN RESERVE UNIVERSITY

January, 2019

CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

Le Luo

candidate for the degree of Doctor of Philosophy.

Committee Chair

Mary Barkley, Ph.D.

Committee Member

Paul Carey, Ph.D.

Committee Member

Fu-Sen Liang, Ph.D.

Committee Member

Blanton S. Tolbert, Ph.D.

Date of Defense November 26, 2018

*We also certify that written approval has been obtained for any proprietary material contained therein.

Dedicated to my family and friends

Table of Contents

TITLE PAGE

COMMITTEE APPROVAL SHEET

DEDICATION

LIST OF FIGURES

ACKNOWLEDGEMENTS

LIST OF ABBREVIATIONS

ABSTRACT

CHAPTER 1: INTRODUCTION

1.1 History of AIDS/HIV discovery

1.2 HIV-1 Life Cycle

1.3 The HIV-1 structure and proteins

Major Structural Proteins

Regulatory Proteins

Accessory Proteins

1.4 The AIDS/HIV-1 Epidemics and Treatment

1.5 HIV transcription by RNAPII

1.6 Transcriptional regulation of HIV-1

1.7 7SK snRNPs in P-TEFb regulation

Reference

CHAPTER 2: DETERMINATION OF THE SECONDARY STRUCTURE OF 7SK SNRNA BY DMS-MAPSEQ

2.1 Abstract

2.2 Introduction

2.3 Materials and Method

T7 RNA Polymerase preparation

7SK snRNA Preparation

DMS modification

RT-PCR

Next generation Sequencing

Processing of DMS-MaPseq data

Data validation and normalization

RNAStructure

2.4 Result

Population average model of DMS-MaPseq

Preliminary Trial of Clustering on DMS_491

2.5 Discussion

Reference

CHAPTER 3: STRUCTURAL INSIGHTS INTO 7SK SL3 AND ITS INTERACTION WITH HNRNP A1

3.1 Abstract

3.2 Introduction

3.3 Materials and methods

RNA synthesis and purification

UP1 purification

DMS-MaPseq

Differential DMS-MaPseq

ITC

SAXS data acquisition and analysis

NMR data acquisition

NMR titrations of UP1-7SK SL3up

Structural modeling

3.4 Results

7SK snRNA-hnRNP A1 (1:10) interaction

Titration study of hnRNP A1-7SK complexes

ITC experiment

1H-1H NOESY and 1H-15N HSQC spectra of SL3S

SEC-SAXS model of SL3D and its complex

7SK SL3D model fitted to SEC-SAXS model

Reference

CHAPTER 4: CONCLUSIONS AND FUTURE STUDIES

4.1 Conclusions

4.2 Future studies

High resolution structural model of full length 7SK snRNA

7SK associated snRNP study in vitro

High resolution structure of 7SK SL3 - hnRNP A1 complex

Reference

APPENDIX

BIBLIOGRAPHY

List of Figures

[Figure 1.1] The HIV Life Cycle.

[Figure 1.2] Structure of HIV Virion.

[Figure 1.3] Transcriptional Regulation of HIV by P-TEFb.

[Figure 1.4] Major 7SK snRNP Complexes Involved in P-TEFb Regulation.

[Figure 2.1] Watson-Crick Base Pairing of RNA.

[Figure 2.2] Secondary Structure Model Determined Using Chemical and Enzymatic Probing by the J. Steitz Group.

[Figure 2.3] Secondary Structure Model Determined Using SHAPE Method by the D. Price Group.

[Figure 2.4] Secondary Structure Model Determined by DMS-MaPseq.

[Figure 2.5] Two Preliminary Clustering Models Determined by DMS-MaPseq.

[Figure 3.1] Structural Features of HnRNP A1.

[Figure 3.2] HnRNP A1/UP1 Titration into 7SK SL3up by EMSA.

[Figure 3.3] Differential DMS-MaPseq of 7SK snRNA upon hnRNP A1 Binding.

[Figure 3.4] Differential DMS Reactivity of 7SK-hnRNP A1 Complex in SL3D Region.

[Figure 3.5] Secondary Structures of SL3 Constructs Used in the Study.

[Figure 3.6] HnRNP A1 Titration into SL3up RNA Measured by ITC.

[Figure 3.7] H2O NMR Spectrum of SL3S.

[Figure 3.8] HnRNP A1 Titration to 7SK SL3up Measurement by 1H-15N HSQC.

[Figure 3.9] SEC-SAXS Data of SL3D and Its 1:1 Complex with HnRNP A1.

[Figure 3.10] The Tertiary Structure Models for the 7SK SL3D Fitted into the Molecular Density Envelope Calculated from SEC-SAXS Data.

ACKNOWLEDGEMENTS

I would like to begin by thanking my Ph.D. advisor, Dr. Blanton S. Tolbert, for his guidance throughout my Ph.D. career, beginning at Miami University. He is a creative and talented scientist who always encouraged me to try out and learn new techniques, which made the work presented in this thesis possible. More importantly, his support allowed me to learn and practice state-of-the-art techniques and to learn from great scientists, which transformed me from a student into a researcher.

To my dearest new family, the Tolbert group: we come from everywhere, and together we have been a great family, each contributing to it and being nurtured by one another. I sincerely appreciate the time we spent together and all the help and support you provided. In particular, I want to acknowledge a former group member, Dr. Christopher E. Morgan, for his help with the structure calculation method and the NMR experimental setup. Liang-Yuan Chiu also helped with NMR. I also want to thank my undergraduate trainees, Marie-Louise Kloster, Kaixuan Zheng, and Andrew Sugarman (Oberlin College), for their contributions to this work.

Over the years of my Ph.D., I have greatly appreciated the opportunity to meet and collaborate with many great minds. I would like to thank Dr. Jonathan Karn (School of Medicine, CWRU) for the training opportunity in his lab and the help I received from his group. I would like to thank Dr. Silvi Rouskin (Whitehead Biomedical Institute, MIT) for the hands-on training in the DMS-MaPseq technique and for help with collecting and interpreting data. I would like to thank Dr. Srinivas Charkravarthy (BioCAT, Argonne National Laboratory) for help with collecting and analyzing our SAXS data.

I would like to thank all my committee members, Dr. Mary Barkley, Dr. Paul Carey, and Dr. Fu-Sen Liang, for their comments and revision of the thesis.

Last but not least, I thank my family and friends. I cannot imagine how I could have come this far without their love and support.

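Thank you, Ms. Dai Jianghong. As a doctor, your professionalism inspired me to set my own career goals and to work hard toward them; as my mother, your constant support and love, and even your doubts, have encouraged and spurred me to become a better version of myself. Half the credit for my thesis belongs to you. I will always love you.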

List of Abbreviations

AIDS: acquired immune deficiency syndrome

ART: antiretroviral therapy

ARV: AIDS-related viruses

ASCII: American Standard Code for Information Interchange

AZT: Zidovudine

Atm: atmosphere (pressure)

AS: alternative splicing

ATP/A: adenosine triphosphate/adenosine

BME: 2-mercaptoethanol

CA: capsid

CCR5: C-C motif chemokine receptor 5

CD4: cluster of differentiation 4

CDK9: cyclin-dependent kinase 9

CXCR4: C-X-C chemokine receptor 4

CTP/C: cytidine triphosphate/cytidine

CTD: C-terminal repeat domain

DSIF: DRB sensitivity inducing factor

dNTP: deoxyribonucleoside triphosphate

Dmax: maximum dimension

DMS: dimethyl sulfate


DMS-MaPseq: DMS mutation profiling with sequencing

DTT: dithiothreitol

ESCRT: endosomal sorting complexes required for transport

EDTA: ethylenediaminetetraacetic acid

EMSA: electrophoretic mobility shift assay

Env: envelope

FDC: HIV fixed dose combination therapeutics

FPLC: fast protein liquid chromatography

GPU: graphics processing unit

GTP/G: guanosine triphosphate/guanosine

Gp120: envelope glycoprotein 120 kDa

HAART: highly active antiretroviral therapy

HIV-1: human immunodeficiency virus type 1

HTLV: human T-cell leukemia-lymphoma virus

HEXIM: hexamethylene bis-acetamide inducible protein

HMQC: heteronuclear multiple quantum correlation

hnRNP A1: heterogeneous nuclear ribonucleoprotein A1

HSQC: heteronuclear single quantum correlation

IN: integrase

ITC: isothermal titration calorimetry

IPTG: isopropyl β-D-1-thiogalactopyranoside

Kcal: kilocalorie

Kd: dissociation constant

kDa/kD: kilodalton

LB: Lysogeny broth

LTR: long terminal repeat

LAV: Lymphadenopathy associated virus

MA: matrix

MD: molecular dynamics

MES: 2-(N-morpholino)ethanesulfonic acid

miRNA: micro-RNA

mRNA: messenger RNA

MW: molecular weight

NC: nucleocapsid

NE: nuclear export

NELF: negative elongation factor

N: any nucleotide (purine or pyrimidine)

nM: nanomolar

NOE: nuclear Overhauser effect

NOESY: nuclear Overhauser enhancement spectroscopy

ns: nanosecond

nt: nucleotide

OD: optical density

p(r): pairwise distance distribution

P-TEFb: positive transcription elongation factor b

PTM: post-translational modification

PIC: pre-integration complex

PR: HIV protease

PAGE: polyacrylamide gel electrophoresis

PCR: polymerase chain reaction

pdb: Protein Data Bank

ppm: parts per million

R: purine

Rg: radius of gyration

RNA: ribonucleic acid

RNAPII: Class II RNA polymerase

RRE: Rev-responsive element

RT: reverse transcription or reverse transcriptase

rNTP: ribonucleoside triphosphate

RTC: reverse transcription complex

SB: sodium borate

SIV: simian immunodeficiency virus

snRNP: small nuclear ribonucleoprotein

RRM1: RNA recognition motif 1

SAXS: small angle X-ray scattering

SBS: sequencing by synthesis


SEC: size exclusion chromatography

SHAPE: selective 2'-hydroxyl acylation analyzed by primer extension

SL: stem loop

SR: serine-arginine rich

ssDNA: single-stranded deoxyribonucleic acid

TFII: transcription factors for RNAPII

TGIRT-III: thermostable group II intron reverse transcriptase, Class III

TCEP: Tris (2- carboxyethyl) phosphine hydrochloride

Tm: Mixing time

Tris: tris (hydroxymethyl) aminomethane

TBE: Tris-Borate-EDTA

UNAIDS: United Nations Program on HIV and AIDS

UP1: unwinding protein 1

UTP/U: uridine triphosphate/uridine

wt: wild-type

Y: pyrimidine

ΔG: Gibbs free energy

ΔH: enthalpy

ΔS: entropy


Structural Insights into 7SK SnRNP Complex and

Its Implication for HIV-1 Transcriptional Control

Abstract by

LE LUO

AIDS/HIV remains a major global health threat; while treatment and prevention are available, there is currently no cure. The major challenge in developing a cure is to understand the mechanism of HIV latency and to identify and eradicate the latent reservoir. Although substantial insights into HIV latency have been gained, the mechanism remains under debate and is not well established. Among the proposed mechanisms, transcriptional repression resulting from reduced levels and activity of P-TEFb plays a significant role. The availability of P-TEFb is closely regulated by an abundant non-coding RNA, 7SK, which is about 330 nt long and is transcribed by RNAPIII.

The studies presented in this thesis provide structural insights into the secondary structure of 7SK snRNA and its interaction with hnRNP A1, in the hope of elucidating the mechanism of transcriptional control as a basis for reversing HIV latency and eradicating the virus.


In this thesis, the 7SK secondary structure model was determined using RNAStructure in combination with DMS-MaPseq data as pseudo-energy restraints. The binding site of hnRNP A1 on 7SK snRNA was identified within the upper region of SL3 by DMS-MaPseq. The complexes of hnRNP A1 with SL3 transcripts of various lengths were further characterized using biophysical approaches. The secondary structure of SL3 determined by DMS-MaPseq was confirmed by NMR. The stoichiometry and binding affinity were measured using ITC, and the binding event was confirmed by HSQC titration with selective 15N labeling. The shapes of SL3 and of the 1:1 complex were also characterized using SEC-SAXS. The molecular density model showed that SL3 forms a kinked cylindrical shape and that the complex forms a similar shape with an enlarged middle region. The measured molecular weights agree with the expected molecular weights. The tertiary structure of SL3 was calculated using MD simulation with restraints from the DMS-MaPseq and NMR experiments. The highest-scoring structures fit into the molecular density model.


Chapter 1: Introduction


1.1 History of AIDS/HIV discovery

The first clinical reports of Acquired Immune Deficiency Syndrome (AIDS) date back to 1981, when groups of homosexual men and intravenous drug users in the United States began developing opportunistic infections that are commonly seen among severely immunocompromised patients1. Over the following years, the occurrence of these infections and the subsequent deaths of patients spiked, and patients were reported to experience a syndrome of lymphadenopathy and chronic fatigue2. The term AIDS was initially coined to define clinically the various manifestations of the disease3-4.

By the end of 1982, AIDS had exploded throughout the United States. No longer restricted to homosexual men and drug users, the disease had spread to hemophiliacs, blood transfusion recipients, Haitian immigrants, sex partners of risk group members, and children born to mothers at risk1. These observations strongly suggested the etiological involvement of a transmissible agent spread through genital secretions and blood. A new retrovirus was an attractive candidate, since some animal retroviruses (e.g., feline leukemia virus) were known to induce immune deficiency as well as leukemia/lymphoma. There were also similarities between the previously discovered human retroviruses (HTLV-I and HTLV-II) and the putative AIDS agent with respect to host cell tropism and modes of transmission.


In 1983, a human retrovirus was discovered and reported as the causative agent of AIDS5. The Barré-Sinoussi group in France detected reverse transcriptase, cytopathic effect, and virus particles in phytohemagglutinin (PHA)- and IL-2-stimulated lymphocytes from a patient with lymphadenopathy, which is a frequent prodrome of AIDS6. Other similar cultures were subsequently reported. The virus was designated LAV (lymphadenopathy-associated virus) at the time7. However, because no purified virus reagent was available to type the virus from the various cultures or the sera of different AIDS patients, the causal association between LAV and AIDS remained equivocal8. In 1984, the Gallo group reported the first long-term propagation of viruses from several AIDS patients in permanent CD4+ T-cell lines9. This allowed, for the first time, the isolation of the virus and the development of the highly purified and concentrated viral reagents necessary for viral characterization and for the serological detection of exposed individuals. The virus was named HTLV-III, based on its tropism7. Another team, headed by Levy, reported the isolation of similar viruses, which were designated AIDS-related viruses (ARV)10. Later, HTLV-III, LAV, and ARV were found to be variants of the same virus4-5. In 1986, an international committee recommended the name human immunodeficiency virus (HIV) for all isolates of the AIDS virus, and the isolated virus was named human immunodeficiency virus type 1 (HIV-1)4.


In the same year, a morphologically similar but genetically distinct virus, now termed HIV-2, was observed in infected patients in western Africa7. Soon thereafter, numerous simian immunodeficiency viruses were discovered in various primate species. Interestingly, close simian relatives of HIV-1 and HIV-2 were found in chimpanzees and sooty mangabeys, respectively8, 11. A subsequent study further confirmed that HIV-1 and HIV-2 were transmitted to humans through independent cross-species transmission events from these primates12.


1.2 HIV-1 Life Cycle

Retroviruses are RNA-containing viruses that replicate through a DNA intermediate using a virally encoded RNA-dependent DNA polymerase, reverse transcriptase (RT). The retrovirus family is divided into three subfamilies: Oncovirus, Lentivirus, and Spumavirus. Oncovirus includes all the oncogenic retroviruses and many closely related non-oncogenic viruses. Lentivirus (lente, Latin for "slow") is a genus of retroviruses that cause chronic and deadly diseases characterized by long incubation periods, most notably Human Immunodeficiency Virus (HIV), the causative agent of AIDS. The third subfamily, Spumavirus (spuma, Latin for "foam"), is a genus of exogenous viruses that have a distinctive morphology with prominent surface spikes; they induce persistent infections without any clinical disease but cause vacuolization of cultured cells13.

HIV uses CD4 immune cells to replicate itself. Each infected CD4 cell produces hundreds of new HIV particles, and each life cycle lasts only 1 to 2 days. Understanding the HIV life cycle is fundamentally important for controlling the progression of HIV, as the identification of key enzymes and mechanisms has provided targets for the development of new drugs.

The HIV life cycle includes seven steps: binding, fusion, reverse transcription, integration, replication, assembly, and budding (Figure 1.1). Following entry into target cells, HIV reverse transcribes its genome into double-stranded DNA (dsDNA). The viral DNA is then translocated into the nucleus of the cell, where it integrates into the host genome. To replicate, HIV exclusively exploits the host cellular machinery to transcribe and translate its genome, producing viral proteins as well as genomic material for new viruses. This process of effective integration and packaging of new virions is known as transduction or productive infection. The life cycle of productive infection is illustrated in Figure 1.1 and described in detail below.

Binding and fusion mark the start of the virus replication cycle, which begins with attachment of the HIV Env glycoprotein to its cognate receptor on target cells and finishes with fusion of the viral and host cell membranes. The Env trimer is heavily glycosylated and composed of gp120-gp41 heterodimers. The gp120 subunit mediates attachment of HIV to CD4, its primary receptor necessary for entry14-16. Env binding to CD4 induces rearrangement of the gp120 subunit, which exposes the gp120 binding site for one of the two co-receptors necessary for entry, CCR5 or CXCR416. Following binding of gp120 to its specific co-receptor, rearrangements in Env expose the hydrophobic gp4117. Gp41 inserts into the host cell membrane and mediates fusion of the target cell and viral membranes, resulting in the release of the viral core into the target cell cytoplasm.


The cone-shaped core of HIV-1 includes two copies of the positive-sense ssRNA genome; the enzymes reverse transcriptase (RT), integrase (IN), and protease (PR); some minor proteins; and the major core protein18. Once inside the cytoplasm, the packaged RT begins to reverse transcribe the ssRNA into dsDNA while the core transits to the nucleus. Once the viral DNA has been transported into the nucleus as part of the viral nucleoprotein complex known as the pre-integration complex (PIC), it is inserted into chromosomal DNA in a reaction catalyzed by IN19. IN processes the 3’ ends of the HIV dsDNA and facilitates strand transfer of these ends onto the chromosomal site of integration20. The integrated viral DNA is called the provirus.

In principle, integration of the HIV genome could occur anywhere along the genome of the target cell, but in reality integration site selection is not a random process. Instead, HIV integrates preferentially into actively transcribed genes within gene-dense regions21.

Replication consists of two major steps, transcription and translation. HIV can exist in a transcriptionally active or latent state following integration, primarily depending on the chromatin environment of the integration site and the availability of the Tat protein, which is fundamental for the efficient transcriptional elongation of provirus.

HIV transcription is initiated from the promoter within the 5’ LTR22. Cellular RNA polymerase II begins transcription by generating short, nonpolyadenylated transcripts. The transcriptional complex is paused at an early stage until HIV Tat recruits the cellular positive transcription elongation factor b (P-TEFb) complex to the HIV 5’ LTR through binding to the TAR element on these short transcripts. CDK9, a protein within the P-TEFb complex, hyperphosphorylates the C-terminal domain (CTD) of RNA polymerase II to stimulate transcription elongation so that full-length transcripts can be generated23-24. Interestingly, resting CD4 T cells contain almost no Cyclin T1, the other component of the P-TEFb complex along with CDK9, thus restricting the availability of P-TEFb to stimulate proviral transcription25-26.

Complete expression of the viral genome requires a number of post-transcriptional processes. Alternative splicing (AS) plays a critical role in the viral replication cycle27-30. As a result of HIV-1 AS, the primary transcript is spliced into over 100 variant mRNAs with the aid of host regulatory proteins (HRPs)31-33. These mRNAs are grouped by size and splicing pattern into three pools: the unspliced RNA (9 kb), the incompletely spliced RNA (or singly spliced RNA, 4 kb), and the completely spliced RNA (or multiply spliced RNA, 2 kb)34. HIV transcripts are heavily spliced prior to export from the nucleus. The HIV protein Rev assists in nuclear export of unspliced or singly spliced transcripts by binding to the Rev responsive element (RRE) on these RNAs30.

Translation of fully spliced transcripts primarily produces the regulatory and accessory proteins needed to transcribe proviral DNA and aid in infectivity, while singly spliced and unspliced transcripts primarily produce the structural and enzymatic proteins necessary for replication as well as the genomic RNA for packaging into progeny virions.

After the virus has generated all the components required for infectivity, viral assembly occurs at the plasma membrane, within specialized membrane microdomains. This event is mediated by the viral protein Gag. The newly packaged virion includes two copies of the genomic viral ssRNA, the viral envelope (Env) protein, the Gag polyprotein, and the three viral enzymes: protease (PR), reverse transcriptase (RT), and integrase (IN). Although Gag itself can bind membranes and assemble into spherical particles, the budding event that releases the virion from the plasma membrane is mediated by the host endosomal sorting complexes required for transport (ESCRT) machinery. The released, matured virions go on to infect new targets and continue the viral replication cycle.


1.3 The HIV-1 structure and proteins

Structural studies can provide detailed information on biological mechanisms and aid in the development of therapeutic interventions. In HIV-1, key components, including the envelope glycoproteins, the capsid, and the replication enzymes RT, IN, and PR, have been scrutinized at near-atomic resolution. Moreover, structural analyses of the interactions between viral and host cell components have yielded fundamental insights into the mechanisms of virus entry, chromosomal integration, transcription, and budding from cells. Structural studies of the virion itself, and more importantly of the viral proteins, have been fundamentally important for the discovery of potential sites of therapeutic intervention.

The mature HIV-1 virion is spherical, measuring approximately 120 nm in diameter. It is an enveloped virus enclosed by a lipid bilayer that is studded with envelope (Env) glycoprotein spikes35. Two copies of the genomic viral ssRNA, the nucleocapsid (NC) protein, the viral enzymes reverse transcriptase, integrase, and protease, and particular viral accessory proteins are all contained within the core16.

In HIV-1, all viral proteins are expressed from a single 9 kb polycistronic transcript. The integrated double-stranded DNA form of HIV-1, also known as the provirus, is approximately 9,800 base pairs long. The provirus is flanked on both ends by repeat structures known as long terminal repeats (LTRs), with the promoter region located within the 5’ LTR. Transcription of the viral genome is regulated by constitutive cellular transcription factors as well as by virally encoded regulatory proteins. In all, 15 mature proteins are encoded by the HIV-1 genome; based on their functions, they are divided into three major classes: major structural proteins, such as Gag, Gag-Pol, and Env; regulatory proteins, such as Tat and Rev; and accessory proteins, such as Nef and Vif. In the following sections, I review the major proteins of each class, starting with the major structural proteins.

Major Structural Proteins

• Gag (group specific antigen)

The gag gene produces a 55 kDa polyprotein, known as p55, which is myristoylated at its N-terminus to direct it to the plasma membrane following translation30, 36. There, it recruits two copies of the viral RNA to the plasma membrane. Following budding of the immature virion from the plasma membrane, the Gag polyprotein is cleaved by the viral protease into its components, matrix (MA)37, capsid (CA)18, 38, nucleocapsid (NC)39-40, and p6, in a process known as maturation30. MA is the myristoylated region of Gag, which remains associated with the inner leaflet of the viral envelope, whereas CA forms the conical core of the virus. The NC region of Gag associates with viral RNA to incorporate the viral genome into budding virions. Finally, the p6 region of Gag is necessary for Vpr incorporation into the budding virus as well as for recruitment of the proteins necessary for complete budding of virions from the target cell surface19, 41-44.

• Gag-Pol (Gag-Polymerase fusion polyprotein)

About five percent of the time during translation of unspliced HIV-1 mRNA, a ribosomal frameshift occurs, causing read-through translation of a Gag-Pol polyprotein45. This Gag-Pol fusion contains the viral PR, IN, and RT45-46. During maturation, viral PR cleaves the polyprotein into these individual components47. HIV-1 RT has both RNA-dependent DNA polymerase activity and RNase H activity, which reverse transcribe the viral RNA to DNA and degrade RNA within DNA-RNA hybrids, respectively26, 48-49. Finally, the IN protein mediates integration of the viral DNA into target cell chromosomes21, 50.

• Env (envelope)

The HIV-1 envelope protein is expressed from singly spliced mRNA15. The env gene encodes the 160 kDa HIV envelope glycoprotein precursor known as gp16017. A cellular protease cleaves gp160 into the gp120 and gp41 subunits. Gp120 forms the extracellular domain, while gp41 is associated with the viral membrane and forms the transmembrane domain of Env51. The two subunits are held together on the envelope through noncovalent interactions10, 17, 52. Upon viral infection, the gp120 subunit mediates the interaction of the virion with the CD4 receptor on T cells52. Binding of gp120 to the CD4 receptor induces conformational changes in gp120 that expose the fusion domain of gp41, which in turn triggers fusion of HIV-1 with the cellular membrane. Gp120 has five highly variable domains, designated V1 through V5, whose amino acid sequences can vary greatly among HIV-1 isolates. While the other domains are mainly involved in CD4 binding, the V3 loop interacts with the co-receptor CXCR4 or CCR5 on the cell surface and defines the tropism of a specific HIV strain8, 35.

Regulatory Proteins

The HIV-1 regulatory proteins, Tat and Rev, are viral proteins that differ from the host regulatory proteins. These two proteins play important roles in viral replication.

• Tat

Tat is a transcriptional transactivator that is essential for HIV-1 transcription53. The two isoforms of Tat are 72 and 101 amino acids long and are expressed from early fully spliced mRNAs and late incompletely spliced HIV mRNAs, respectively. Unlike conventional transcription factors that interact with DNA, Tat is an RNA binding protein54-55. Tat binds to a short stem-loop structure known as the transactivation response element (TAR), which is located at the 5' terminus of HIV RNA, immediately downstream of the promoter in the 5’ LTR. Tat promotes the elongation phase of HIV-1 transcription by recruiting the positive transcription elongation factor b (P-TEFb) to the TAR region, so that the full-length transcript (the viral genome) can be produced56-57. RNAPII elongation is promoted through the recruitment of a serine kinase that phosphorylates the C-terminal domain of RNA polymerase II. This kinase, known as cyclin-dependent kinase 9 (CDK9), is a part of P-TEFb that binds directly to Tat55. Tat function also requires a cellular cofactor within P-TEFb, known as Cyclin T1, which facilitates recognition of the TAR loop region by the Cyclin T1-Tat complex58.

• Rev

Rev is a 13 kDa sequence-specific RNA binding protein encoded by a completely spliced mRNA46. Rev induces the transition between the early and late phases of HIV gene expression59. Rev binds to a 240-base region of RNA called the Rev response element (RRE)60. The RRE has a complex secondary structure that contains a non-Watson-Crick G-G base pair within a double-stranded RNA helix61. This structure, known as the Rev high-affinity binding site, is located in stem loop 2 of the RRE.

The binding of Rev to the RRE facilitates export of unspliced and incompletely spliced viral RNAs from the nucleus to the cytoplasm. Normally, RNAs that contain introns (i.e., unspliced or incompletely spliced RNAs) are retained in the nucleus, and only completely spliced RNAs are transported to the cytoplasm, where translation occurs. In the early phase, only completely spliced RNA, which encodes Tat, Rev, and Nef, is translated. With Tat and Rev expression, more unspliced and incompletely spliced mRNAs accumulate within the nucleus; meanwhile, Rev exports these intron-containing viral RNAs to the cytoplasm. Together, these events make available the mRNAs that encode the whole spectrum of viral proteins. As expression shifts to the late phase, the amount of RNA available for complete splicing decreases, which in turn reduces the level of Rev expression. The ability of Rev to decrease the rate of splicing of viral RNA therefore generates a negative feedback loop whereby Rev expression levels are tightly regulated62. Rev is required for HIV-1 replication: proviruses that lack Rev function are transcriptionally active but do not express viral late genes and thus do not produce virions63.

Accessory Proteins

Accessory proteins are found only in lentiviruses. They are not necessary for HIV replication in vitro, but their evolutionary retention suggests their significance for viral infectivity and HIV disease progression. These proteins have a variety of functions related to HIV pathogenesis, including evasion of host immunity, degradation of cellular antiviral proteins, and enhancement of HIV infectivity.

• Nef


Nef (negative regulatory factor) is a 27 kD myristoylated protein that is encoded by a single exon that extends into the 3' LTR. Nef is an early gene of HIV, expressed during the early phase like Tat and Rev, and is the first viral protein to accumulate to detectable levels in a cell following HIV-1 infection64. Its name is a consequence of early reports claiming that Nef down-regulated the transcriptional activity of the HIV-1 LTR65. However, Nef is no longer believed to have a direct effect on HIV gene expression; instead, it has been shown to have multiple activities throughout the replication cycle66-67. The most widely observed activities include down-regulation of the cell surface expression of CD4, perturbation of T cell activation, and stimulation of HIV infectivity68-69. CD4 down-regulation appears to be advantageous to viral production because an excess of CD4 on the cell surface has been found to inhibit Env incorporation and virion budding69-71. Additionally, it has been reported that the Nef protein of simian immunodeficiency virus (SIV) is required for high-titer growth and the typical development of disease in adult animals72. It is possible, however, for Nef-defective mutants of SIV to cause disease in newborn animals73. Further, Nef-defective virions do cause an AIDS-like disease in infected animals, although onset is delayed74.

• Vif

Vif (virion infectivity factor) is a 23 kD protein that is essential for the replication of HIV in peripheral blood lymphocytes, macrophages, and certain cell lines 75. However, functional studies of this protein are not conclusive. In most cell lines, Vif is not required, suggesting that these cells may express a protein that can complement Vif function. On the other hand, the observations that Vif must be present during virion assembly 76 and that it is incorporated into HIV virions 77-78 suggest that it is nonetheless necessary. This incorporation might be nonspecific, however, because Vif is also incorporated into heterologous retroviruses such as murine leukemia virus (MLV) 19, 76. It has also been shown that Vif-defective HIV strains can enter cells but cannot efficiently synthesize proviral DNA, and that Vif mutant virions have improperly packed nucleoprotein cores 79. These observations suggest a model in which Vif interacts with an antiviral cellular factor rather than with a viral component.

• Vpr

The 14 kD Vpr (viral protein R) protein is incorporated into viral particles. Approximately 100 copies of Vpr are associated with each virion 80. Incorporation of Vpr into virions is mediated through specific interactions with p6, which corresponds to the C-terminal region of p55 Gag 81. Although the function of Vpr is not fully understood, it is known both to arrest the cell cycle in the G2 phase and to target cellular proteins related to DNA repair for proteasomal degradation 82-83.

• Vpu

The 16 kD Vpu (viral protein U) protein is an integral membrane phosphoprotein that is primarily localized in the internal membranes of the cell 42. Vpu is expressed from the mRNA that also encodes Env. Vpu is translated from this mRNA at levels ten-fold lower than those of Env because the Vpu translation initiation codon is not efficient 84.

The two functions of Vpu are the down-modulation of CD4 and the enhancement of virion release 85. Vpu antagonizes the ability of host tetherin to restrict HIV-1 budding. In the absence of Vpu, large numbers of virions can be seen attached to the surface of infected cells 86.

1.4 The AIDS/HIV-1 Epidemics and Treatment

HIV continues to be a major global public health issue, having claimed more than 35 million lives thus far. According to the most recent World Health Organization statistics, approximately 36.9 million people were living with HIV at the end of 2017. In 2017 alone, 1.8 million people were newly infected, and 1 million people died from HIV-related causes globally. The data also showed that 59% of adults and 52% of children living with HIV had access to lifelong antiretroviral therapy (ART), and that ART coverage for pregnant and breastfeeding women living with HIV was high, at 80%. Africa remained the most affected region, with 70% of the world’s diagnosed cases, and accounted for over two-thirds of new HIV infections in 2017. Additionally, an estimated 25% of people infected with HIV are unaware of their status. The report also stated that between 2000 and 2017, the number of new HIV infections decreased by 36% and HIV-related deaths decreased by 38%, with 11.4 million lives saved by ART over the same period.

The Joint United Nations Programme on HIV/AIDS (UNAIDS) is leading the global effort to end the AIDS epidemic by 2030. Powerful momentum is building toward a new narrative on HIV treatment and a target for 2020: 90% of all people living with HIV will know their HIV status; 90% of all people diagnosed with HIV will receive sustained antiretroviral therapy; and 90% of all people receiving antiretroviral therapy will achieve viral suppression.
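Because each of the three 90% targets applies only to the group that met the previous one, the targets compound. The short sketch below works through that arithmetic to show what share of all people living with HIV would be virally suppressed if every target were met exactly. The function name and the dictionary labels are illustrative choices, not part of any UNAIDS tool.

```python
# Compound the three 90-90-90 targets into the share of ALL people living
# with HIV reached at each stage, assuming every target is met exactly.
TARGETS = {
    "know their HIV status": 0.90,   # of all people living with HIV
    "receive sustained ART": 0.90,   # of those diagnosed
    "are virally suppressed": 0.90,  # of those receiving ART
}

def cascade(targets):
    """Return (stage, cumulative fraction) pairs for successive stages."""
    cumulative = []
    fraction = 1.0
    for stage, share in targets.items():
        fraction *= share  # each target applies to the previous group only
        cumulative.append((stage, fraction))
    return cumulative

for stage, fraction in cascade(TARGETS):
    print(f"{fraction:.1%} of all people living with HIV {stage}")
```

Under these assumptions the final stage corresponds to 0.9 x 0.9 x 0.9, or about 73% of all people living with HIV.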

Although many strategies are needed to end the epidemic, one thing is certain: it will be impossible to end the epidemic without bringing HIV treatment to all who need it. Over the last three decades of the global fight against HIV, the prevalence of HIV has expanded across the globe despite advances in treatment.

During the long battle against AIDS over the last several decades, different approaches have been proposed in attempts to treat or cure the disease. In March 1987, Zidovudine (AZT), the first FDA-approved antiretroviral treatment for HIV-1 infection, was introduced, marking the first successful counterattack in the war against AIDS 87-88. Despite tremendous efforts, the quest for a safe and effective HIV vaccine has been remarkably long and winding 74, 89-92.

Of all the accomplishments achieved during the battle against AIDS, effective antiretroviral therapy (ART) has been the most successful weapon against the virus. Not only has ART transformed AIDS from a once universally fatal disease into a manageable chronic infection 93, it has also shown great promise as an approach to preventing the spread of HIV among those people at greatest risk 94. HIV infection has a very complex pathogenesis that varies substantially from case to case. Because the infection is host-specific, the specificity of pathogenesis often complicates the treatment options that are currently available for individual HIV infections 95.

Effective management of HIV infections is possible using different combinations of available drugs. This method of treatment is collectively known as combination antiretroviral therapy (ART or cART). Standard ART usually comprises a combination of at least three medicines (termed “highly active antiretroviral therapy” or HAART). Effective ART helps control the multiplication of HIV in infected patients and increases the count of CD4 cells, thus prolonging the asymptomatic phase of infection, slowing the progression of the disease, and reducing the risk of transmission. Since 1996, ART has been widely recommended for the treatment of HIV-1 infection and has proven highly effective in suppressing viral replication in HIV-1 infected patients as well as in preventing new infections.

Since the approval of AZT, the FDA has approved 42 other antiretroviral drugs, including 29 single tablets and 14 fixed-dose combination therapeutics. Most of them can be classified into six major categories of antiretrovirals (ARVs): nucleoside reverse transcriptase inhibitors (NRTIs), non-nucleoside reverse transcriptase inhibitors (NNRTIs), protease inhibitors (PIs), integrase strand transfer inhibitors (INSTIs), fusion inhibitors (FIs), and entry inhibitors (EIs). These ARVs selectively target key steps of the HIV life cycle, including binding, fusion, reverse transcription, integration, and budding.

No ARVs have yet been developed to target the replication stage of the virus, in which HIV-1 hijacks the host cell machinery to produce viral transcripts and proteins. The main reason is that direct inhibition would also disrupt cellular replication and therefore kill the host cells. However, the replication stage remains a major interest of HIV research, in the hope of identifying new potential sites for therapeutic intervention.

1.5 HIV transcription by RNAPII

Because HIV uses the host cell system to replicate its viral components, understanding the transcription mechanism of host cells, as well as the regulatory mechanism of viral transcription, is critical to developing novel therapeutic interventions in the HIV life cycle.

Transcription is the first essential step of replication, whereby the genomic DNA is transcribed into RNA. In the case of HIV, the integrated dsDNA (provirus) is transcribed into different classes of RNA. In humans, as in all eukaryotes, RNA is very versatile and plays a plethora of roles, including, but not limited to, translation (e.g., messenger RNA, transfer RNA), catalytic roles in RNA processing (e.g., the spliceosome and tRNA maturation), chromatin structure (e.g., XIST-mediated coating and silencing of an X chromosome), transcription regulation (e.g., guide RNA and 7SK RNA), and RNA silencing (e.g., miRNAs) 96.

The job of transcription is shared among three RNA polymerases (RNA polymerase I, II, and III) 97, and each polymerase is responsible for a specific subset of genes with highly evolved regulatory mechanisms. For example, RNA polymerase I transcribes the 28S, 18S, and 5.8S ribosomal RNAs 98-99; RNA polymerase II (RNAPII) produces messenger RNA (mRNA), small nuclear RNA, and miRNA 99; and RNA polymerase III generates 5S ribosomal RNA, transfer RNA, and other small non-coding RNAs 100. Although the polymerases are partially conserved, each associates with specific transcription factors and has unique levels of regulation 97. The HIV provirus is transcribed by RNAPII.

RNAPII is a ~520 kDa protein complex composed of 12 subunits, RPB1-12 99, 101. Because RNAPII is the central enzyme that controls expression of protein-coding genes, cells have evolved a series of regulatory events to control transcription activity 99, 102. Transcription efficiency is regulated not only by RNAPII itself but also by the recruitment of other proteins to the transcription site during transcription 102. These proteins are called transcription factors and are further classified as activators or repressors, depending on their impact on transcription. The presence of activators at the transcription site promotes transcription, while repressors block or inhibit it.

Transcription can be roughly divided into three stages: initiation, elongation, and termination 98, 103. Each stage has unique control mechanisms that coordinate RNAPII activity.

Efficient initiation by RNAPII requires the aid of the transcription factors for RNAPII (TFIIs): TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. The ordered assembly of these TFIIs leads to the formation of the pre-initiation complex (PIC), whereby RNAPII is recruited to the promoter. Initiation begins upon phosphorylation of Serine 5 within the CTD by the cyclin-dependent kinase 7 (CDK7) subunit of TFIIH, while another TFIIH subunit hydrolyzes ATP to induce DNA melting 104-105.

In theory, once elongation starts, RNAPII should be able to transcribe the entire template until termination signals are reached. In practice, however, efficient elongation is not an easy task. In HIV-1, the early elongation complex containing RNAPII transcribes the provirus until it reaches a highly structured RNA element called the transactivation response (TAR) element. Upon reaching TAR, the elongation complex is trapped by two negative factors, the DRB sensitivity-inducing factor (DSIF) and the negative elongation factor (NELF), with the result that transcription is stalled at TAR 54, 106. To resume transcription, both NELF and DSIF need to be released from the elongation complex, which requires phosphorylation of both multi-domain negative factors 107-108. The HIV regulatory protein Tat (trans-activator of transcription) acts as a chaperone that brings the positive transcription elongation factor b (P-TEFb) to the complex, where the CDK9 kinase subunit of P-TEFb phosphorylates NELF-E, a subunit of NELF, and Spt5, a subunit of DSIF, as well as the CTD of RNAPII, allowing RNAPII to resume elongation 108-109. In this process, both NELF and DSIF are phosphorylated; however, their fates are different. Phosphorylated NELF is forced to dissociate from the elongation complex 110, while phosphorylated Spt5 (DSIF) is separated from the rest of the complex and converted into a positive elongation factor that stabilizes transcription complexes at terminator sequences 111-112. The transition between pausing and elongation is a critical step in gene expression. On the one hand, this transition becomes the rate-limiting step of transcription; on the other hand, it provides a crucial quality-control checkpoint for transcription. As the RNAPII complex transcribes the provirus, the initiation factors are released from RNAPII in exchange for elongation factors, which aid efficient and complete transcription 113-114.

Once RNAPII reaches the end of the provirus, termination occurs. Unlike initiation and elongation, termination is less well studied, largely because of the difficulty in assigning a specific function to each termination factor.

1.6 Transcriptional regulation of HIV-1

Gene expression of HIV-1 by RNAPII is mostly regulated at the transcriptional and posttranscriptional levels 107. Immediately after infection, transcription of HIV-1 produces only completely spliced mRNAs that encode the viral regulatory proteins Tat and Rev 31. As the infection proceeds, aided by these two viral regulatory proteins, transcription processivity increases sharply, and other classes of mRNA are transcribed and later translated into the major structural proteins and accessory proteins of HIV-1 29. The full-length unspliced transcript, which acts both as the virion genomic RNA and as the mRNA for the Gag-Pol polyprotein, is also transcribed later 29, 115.

The two regulatory proteins regulate the process at different levels and through different mechanisms. Tat regulates gene expression at the transcriptional level, activating viral transcription by promoting efficient elongation from the 5’ LTR, the transcription promoter 116. Rev regulates the process at the posttranscriptional level, transporting the unspliced and incompletely spliced mRNAs encoding the structural proteins from the nucleus to the cytoplasm, so that a full spectrum of viral proteins can be translated 60. If the host cell is a factory for HIV-1, then Tat is the supervisor of the transcription workshop, whose primary job is to maximize the processivity of transcription, and Rev is the supply manager of the downstream translation workshop, who is in charge of optimizing the transcript supply based on the needs of new virion production.

In the following part, I will focus on the regulatory mechanism of HIV-1 transcription which is closely regulated by viral protein Tat through P-TEFb.

The release of RNAPII from the transcriptional pause has emerged as a critical step in the regulation of gene expression, in which P-TEFb functions as a core contributor that phosphorylates the negative transcription factors NELF and DSIF, thereby enabling efficient transcriptional elongation. Regulation of P-TEFb is therefore essential to overall transcriptional control 109. The activity of P-TEFb is regulated at multiple levels 107, 117-118. The first is regulation of the expression of the kinase itself 119. The active kinase subunit, CDK9, is expressed as two isoforms, with molecular masses of 42 kD and 55 kD, as a result of alternative promoter usage 120-121. While the larger isoform appears to undergo constitutive expression, the smaller isoform is up-regulated upon extracellular signals and activation 122. In addition, each of the two CDK9 isoforms can interact with four different cyclins (T1, T2a, T2b, and K, with the two Cyclin T2 isoforms resulting from alternative splicing of a single gene 122). These isoforms can form up to eight different complexes; however, the exact contribution of each to the regulation of P-TEFb function is not fully understood. Another level of regulation is posttranslational modification (PTM) 123. The regulatory Cyclin T subunit can be acetylated by p300, which releases active P-TEFb from the inhibitory snRNP complex; CDK9 also possesses a T-loop that includes a critical threonine (Threonine 186) proximal to the active site, which requires phosphorylation for full activity 124. The most notable level of P-TEFb control occurs through association with the 7SK small nuclear RNA (snRNA) 125-128. Indeed, this mechanism, in which 7SK snRNA serves as a platform to form a transcriptionally inactive ribonucleoprotein complex (RNP) with P-TEFb and hexamethylene bis-acetamide inducible protein (HEXIM) 125-126, is the predominant mechanism governing the equilibrium between the active and inactive pools of P-TEFb. Depending on the cell type, approximately 50-90% of the total P-TEFb is sequestered within these 7SK snRNPs 129.
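The count of eight possible P-TEFb complexes mentioned above follows from pairing each of the two CDK9 isoforms with each of the four cyclins. The sketch below simply enumerates those pairings. It assumes, as a simplification, that every kinase isoform can partner with every cyclin; the labels and the function name are illustrative only.

```python
from itertools import product

# Isoform names as given in the text. The rule that every CDK9 isoform can
# pair with every cyclin is the simplifying assumption of this sketch.
CDK9_ISOFORMS = ["CDK9 (42 kD)", "CDK9 (55 kD)"]
CYCLINS = ["Cyclin T1", "Cyclin T2a", "Cyclin T2b", "Cyclin K"]

def ptefb_complexes(kinases, cyclins):
    """Return every kinase/cyclin pairing as a (kinase, cyclin) tuple."""
    return list(product(kinases, cyclins))

complexes = ptefb_complexes(CDK9_ISOFORMS, CYCLINS)
for kinase, cyclin in complexes:
    print(f"{kinase} + {cyclin}")
print(f"{len(complexes)} possible complexes")
```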

The recruitment of P-TEFb to the paused polymerase complex does not occur by simple diffusion of the active kinase. Instead, it is regulated via specific interactions with trans-acting transcription factors such as the viral protein Tat 23, 130. The crystal structure of a Tat-P-TEFb complex was determined in 2010 by David Price and his colleagues 131. The structure shows that Tat forms extensive contacts with both the Cyclin T1 subunit of P-TEFb and the T-loop of the CDK9 subunit. Other recruiter proteins and mechanisms have also been discovered. For example, the human bromodomain protein Brd4 was shown to interact directly with the inactive P-TEFb in the 7SK snRNP in a manner similar to Tat, releasing P-TEFb from the restriction and chaperoning the enzyme to the paused RNAPII complex 132-133. Tat-directed resumption of transcriptional elongation is called transactivation, in which Tat binds to P-TEFb and induces significant conformational changes in CDK9 that constitutively activate the kinase 121, 131, 134. The transactivation mechanism involves a complex set of phosphorylation events mediated by the Tat-activated P-TEFb, which modify transcription factors such as DSIF and NELF as well as the CTD of RNAPII 108.

1.7 7SK snRNPs in P-TEFb regulation

Human 7SK snRNA is an abundant (~2 x 10^5 copies per cell 135) non-coding transcript that is transcribed by RNA polymerase III from a gene located on chromosome 6 136. The 331 nt snRNA folds into several stem loops (SL) that provide a platform for RNA binding proteins; however, the function of this snRNA remained unknown for a long time, until researchers discovered that it associates with, and inhibits, P-TEFb 125-126.

Although they are not fully understood, three different and mutually exclusive 7SK snRNPs have been identified in vivo. The assembly and disassembly of two of the 7SK snRNPs directly regulate the availability of active P-TEFb, while the third snRNP helps suppress RNAPII initiation and elongation at enhancer regions 129, 137-138.

The central complex that directly regulates P-TEFb availability is the 7SK-HEXIM-P-TEFb snRNP complex, which contains the host accessory factor hexamethylene bis-acetamide inducible protein (HEXIM). In the host cell, two isoforms (HEXIM1 and HEXIM2) have been identified, arising from adjacent genes on chromosome 17 139-141. The isoforms appear to have overlapping functions, as a compensatory up-regulation of HEXIM2 is observed when the HEXIM1 gene is knocked down 140.

HEXIM can form a homodimer or heterodimer through a C-terminal coiled-coil motif; however, the homodimer is the most common form 139, 142. Dimerization is a prerequisite for association with 7SK snRNA, as the monomer was not observed in the complex in vivo 139. Additionally, association of the HEXIM1 dimer with 7SK snRNA is required for the binding and inhibition of P-TEFb 139; a conformational change upon interaction with 7SK snRNA unmasks the acidic region of HEXIM1, so that the N-terminal arginine rich motif (ARM) is available to bind P-TEFb 137. The ARM is also responsible for mediating dsRNA interactions 137.

Upon binding to the distal portion of SL1 of 7SK RNA 143, the C-terminal acidic portion interacts with the Cyclin T1 subunit of P-TEFb 128. HEXIM1 contains two aromatic residues, Phenylalanine 208 (F208) and Tyrosine 271 (Y271), that most likely bind within the ATP-binding pocket of CDK9 to restrict the kinase activity of P-TEFb 143.

P-TEFb and HEXIM1 are released sequentially from the 7SK-P-TEFb-HEXIM1 snRNP complex as the viral protein Tat recruits P-TEFb to TAR, where RNAPII is paused 25, 109, 144. The dissociation of P-TEFb and HEXIM1 from 7SK snRNA does not result in their degradation. Instead, various heterogeneous nuclear ribonucleoproteins (hnRNPs) bind to the remaining 7SK snRNP to form the 7SK-hnRNP snRNP complexes. Of the twenty hnRNPs, only hnRNP A1, A2/B1, Q1 and Q3, R, and K have been observed to bind 7SK snRNA 129, 145. Surprisingly, these proteins form two mutually exclusive complexes, with hnRNP A1 and A2 binding separately from hnRNP Q and R 146. It is unknown whether hnRNP K binds to either of the two 7SK-hnRNP snRNPs or forms a unique 7SK-hnRNP snRNP 147.

More interestingly, P-TEFb and the hnRNPs form mutually exclusive complexes with 7SK snRNA, establishing an equilibrium between the active and inactive pools of P-TEFb and implying a potential regulatory mechanism for P-TEFb 135, 145, 148. While the precise function of the 7SK-hnRNP snRNPs is unknown, some evidence suggests that they may play multiple roles in HIV-1 gene expression. First, because the hnRNP family is one of the major families of splicing factors, the 7SK-hnRNP snRNPs may contribute to regulating alternative splicing by balancing the availability of different classes of hnRNPs. Consistent with this, 7SK-associated hnRNPs have been observed to participate in splice site selection 149.

More importantly, as I mentioned earlier, the balance between the formation of the 7SK-P-TEFb snRNP and the 7SK-hnRNP snRNPs coordinates the availability of active P-TEFb. This is supported by the observed correlation between the down-regulation of P-TEFb and the increased formation of 7SK-hnRNP snRNPs under transcriptional stress. Conversely, knock-down of hnRNP A1 and A2 or of hnRNP K increased the formation of the 7SK-P-TEFb RNP 145. These observations imply that the equilibrium between the interactions of 7SK snRNA with these two sets of proteins is potentially a previously unrecognized mechanism of P-TEFb regulation.
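The qualitative effect of the hnRNP knock-down can be illustrated with a textbook competitive-binding expression. This is a toy model only: it treats 7SK snRNA as a single shared site, treats HEXIM/P-TEFb and the hnRNPs as two competing ligands at equilibrium, and uses placeholder concentrations and dissociation constants that are not measured values. For ligand A competing with ligand B, the occupied fraction is ([A]/Kd_A) / (1 + [A]/Kd_A + [B]/Kd_B).

```python
def fraction_bound(conc_a, kd_a, conc_b, kd_b):
    """Fraction of a single shared site occupied by ligand A when ligand B
    competes for the same site (mutually exclusive equilibrium binding)."""
    term_a = conc_a / kd_a
    term_b = conc_b / kd_b
    return term_a / (1.0 + term_a + term_b)

# Hypothetical free concentrations and dissociation constants in arbitrary
# units. These are placeholders chosen for illustration, not measurements.
PTEFB_HEXIM, KD_PTEFB_HEXIM = 1.0, 0.1
KD_HNRNP = 0.1

for hnrnp in (1.0, 0.1):  # before and after a notional hnRNP knock-down
    occupied = fraction_bound(PTEFB_HEXIM, KD_PTEFB_HEXIM, hnrnp, KD_HNRNP)
    print(f"free hnRNP = {hnrnp}: {occupied:.2f} of 7SK carries P-TEFb")
```

In this sketch, lowering the free hnRNP level raises the fraction of 7SK that carries P-TEFb, which is the direction of the knock-down result described above. The real system involves several proteins and binding sites, so the model should be read as an intuition aid rather than a quantitative description.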

My major interest is to establish structural insights into the 7SK-hnRNP snRNP, more specifically the 7SK-hnRNP A1 snRNP complex, in order to uncover this mechanism of P-TEFb regulation and, furthermore, to better understand the transcriptional regulation of HIV-1. In the following parts of my thesis, I will describe projects focused on the determination of the 7SK secondary structure by DMS-MaPseq (Chapter 2) and on the structural study of SL3 of 7SK, including its interaction with hnRNP A1 (Chapter 3), to provide structural insights into the topic.

[Figure 1.1]: The HIV Life Cycle. The HIV life cycle includes seven stages: binding, fusion, reverse transcription, integration, replication, assembly, and budding. Six major classes of ART drugs are available to target stages of the life cycle (https://aidsinfo.nih.gov/understanding-hiv-aids/fact-sheets/19/73/the-hiv-life-cycle; UNAIDS data sheet).

[Figure 1.2] Structure of HIV Virion. The mature HIV-1 virion is spherical, measuring approximately 120 nm in diameter. It is an enveloped virus enclosed by a lipid bilayer that is studded with the envelope (Env) glycoprotein complex consisting of gp41 and gp120. The core contains two copies of the genomic viral ssRNA, the nucleocapsid (NC) protein, the viral enzymes reverse transcriptase (RT), integrase (IN), and protease (PR), as well as particular viral accessory proteins, such as Tat 16. (License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0); Creator: Thomas Splettstoesser (www.scistyle.com), from Wikimedia; for figure detail, refer to Appendix 1)

[Figure 1.3] Transcriptional Regulation of HIV Gene by P-TEFb. (A) Transcription by RNAPII is initiated at the 5’ LTR of the HIV-1 provirus. (B) Early transcription stalls once the viral TAR RNA is transcribed, where the negative factors NELF and DSIF bind to the transcriptional complex. (C) The HIV viral protein Tat transports P-TEFb to the stalled RNAPII complex. (D) P-TEFb phosphorylates the E subunit of NELF, the Spt5 subunit of DSIF, and the CTD of RNAPII. (E) As a result of phosphorylation, NELF is released from the transcription complex, whereas DSIF is converted to a positive elongation factor and, along with P-TEFb, ensures efficient transcriptional elongation.

[Figure 1.4] Major 7SK SnRNP Complexes Involved in P-TEFb Regulation. 7SK snRNA is stabilized by MePCE and LARP7 at its 5’ and 3’ ends, respectively. The equilibrium between the active and inactive pools of P-TEFb is regulated by the HIV protein Tat. In the inactive pool, P-TEFb is inhibited by a HEXIM1 dimer bound to 7SK snRNA. Tat dissociates P-TEFb from the complex and activates P-TEFb. HEXIM1 is displaced after P-TEFb is released, and hnRNP proteins bind to 7SK snRNA to form 7SK-hnRNP complexes. The hnRNPs prevent 7SK snRNA from reforming the inactive complex with P-TEFb. The two sets of 7SK snRNP complexes are mutually exclusive.

Reference

1. Centers for Disease Control, Update on acquired immune deficiency syndrome (AIDS) among patients with hemophilia A. MMWR Morb Mortal Wkly Rep 1982, 31 (48), 644-6, 652.
2. Centers for Disease Control, Update on acquired immune deficiency syndrome (AIDS)--United States. MMWR Morb Mortal Wkly Rep 1982, 31 (37), 507-8, 513-4.
3. Gallo, R.; Wong-Staal, F.; Montagnier, L.; Haseltine, W. A.; Yoshida, M., HIV/HTLV . Nature 1988, 333 (6173), 504.
4. Laurence, J., Update: HIV-1 gene nomenclature. AIDS Res Hum Retroviruses 1988, 4 (6), vii-viii.
5. Gelmann, E. P.; Popovic, M.; Blayney, D.; Masur, H.; Sidhu, G.; Stahl, R. E.; Gallo, R. C., Proviral DNA of a retrovirus, human T-cell leukemia virus, in two patients with AIDS. Science 1983, 220 (4599), 862-5.
6. Barre-Sinoussi, F.; Chermann, J. C.; Rey, F.; Nugeyre, M. T.; Chamaret, S.; Gruest, J.; Dauguet, C.; Axler-Blin, C.; Vezinet-Brun, F.; Rouzioux, C.; Rozenbaum, W.; Montagnier, L., Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS). Science 1983, 220 (4599), 868-71.
7. Clavel, F.; Brun-Vezinet, F.; Guetard, D.; Chamaret, S.; Laurent, A.; Rouzioux, C.; Rey, M.; Katlama, C.; Rey, F.; Champelinaud, J. L.; et al., [LAV type II: a second retrovirus associated with AIDS in West Africa]. C R Acad Sci III 1986, 302 (13), 485-8.
8. Goudsmit, J.; Debouck, C.; Meloen, R. H.; Smit, L.; Bakker, M.; Asher, D. M.; Wolff, A. V.; Gibbs, C. J., Jr.; Gajdusek, D. C., Human immunodeficiency virus type 1 neutralization epitope with conserved architecture elicits early type-specific antibodies in experimentally infected chimpanzees. Proc Natl Acad Sci U S A 1988, 85 (12), 4478-82.
9. Gallo, R. C.; Sarin, P. S.; Gelmann, E. P.; Robert-Guroff, M.; Richardson, E.; Kalyanaraman, V. S.; Mann, D.; Sidhu, G. D.; Stahl, R. E.; Zolla-Pazner, S.; Leibowitch, J.; Popovic, M., Isolation of human T-cell leukemia virus in acquired immune deficiency syndrome (AIDS). Science 1983, 220 (4599), 865-7.
10. Wyatt, R.; Kwong, P. D.; Desjardins, E.; Sweet, R. W.; Robinson, J.; Hendrickson, W. A.; Sodroski, J. G., The antigenic structure of the HIV gp120 envelope glycoprotein. Nature 1998, 393 (6686), 705-11.
11. Hirsch, V. M.; Dapolito, G.; McGann, C.; Olmsted, R. A.; Purcell, R. H.; Johnson, P. R., Molecular cloning of SIV from sooty mangabey monkeys. J Med Primatol 1989, 18 (3-4), 279-85.
12. Hahn, B. H.; Shaw, G. M.; De Cock, K. M.; Sharp, P. M., AIDS as a zoonosis: scientific and public health implications. Science 2000, 287 (5453), 607-14.

13. Weiss, R. A., Retrovirus classification and cell interactions. J Antimicrob Chemother 1996, 37 Suppl B, 1-11.
14. Camerini, D.; Seed, B., A CD4 domain important for HIV-mediated syncytium formation lies outside the virus binding site. Cell 1990, 60 (5), 747-54.
15. Capon, D. J.; Ward, R. H., The CD4-gp120 interaction and AIDS pathogenesis. Annu Rev Immunol 1991, 9, 649-78.
16. Feng, Y.; Broder, C. C.; Kennedy, P. E.; Berger, E. A., HIV-1 entry cofactor: functional cDNA cloning of a seven-transmembrane, G protein-coupled receptor. Science 1996, 272 (5263), 872-7.
17. Kwong, P. D.; Wyatt, R.; Robinson, J.; Sweet, R. W.; Sodroski, J.; Hendrickson, W. A., Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing human antibody. Nature 1998, 393 (6686), 648-59.
18. Franke, E. K.; Yuan, H. E.; Luban, J., Specific incorporation of cyclophilin A into HIV-1 virions. Nature 1994, 372 (6504), 359-62.
19. Camaur, D.; Trono, D., Characterization of human immunodeficiency virus type 1 Vif particle incorporation. J Virol 1996, 70 (9), 6106-11.
20. Lapadat-Tapolsky, M.; De Rocquigny, H.; Van Gent, D.; Roques, B.; Plasterk, R.; Darlix, J. L., Interactions between HIV-1 nucleocapsid protein and viral DNA may have important functions in the viral life cycle. Nucleic Acids Res 1993, 21 (4), 831-9.
21. Bushman, F. D., Tethering human immunodeficiency virus 1 integrase to a DNA site directs integration to nearby sequences. Proc Natl Acad Sci U S A 1994, 91 (20), 9233-7.
22. Brother, M. B.; Chang, H. K.; Lisziewicz, J.; Su, D.; Murty, L. C.; Ensoli, B., Block of Tat-mediated transactivation of tumor necrosis factor beta gene expression by polymeric-TAR decoys. Virology 1996, 222 (1), 252-6.
23. Zhu, Y.; Pe'ery, T.; Peng, J.; Ramanathan, Y.; Marshall, N.; Marshall, T.; Amendt, B.; Mathews, M. B.; Price, D. H., Transcription elongation factor P-TEFb is required for HIV-1 tat transactivation in vitro. Genes Dev 1997, 11 (20), 2622-32.
24. Sedore, S. C.; Byers, S. A.; Biglione, S.; Price, J. P.; Maury, W. J.; Price, D. H., Manipulation of P-TEFb control machinery by HIV: recruitment of P-TEFb from the large form by Tat and binding of HEXIM1 to TAR. Nucleic Acids Res 2007, 35 (13), 4347-58.
25. Molle, D.; Maiuri, P.; Boireau, S.; Bertrand, E.; Knezevich, A.; Marcello, A.; Basyuk, E., A real-time view of the TAR:Tat:P-TEFb complex at HIV-1 transcription sites. Retrovirology 2007, 4, 36.
26. Muniz, L.; Egloff, S.; Ughy, B.; Jady, B. E.; Kiss, T., Controlling cellular P-TEFb activity by the HIV-1 transcriptional transactivator Tat. PLoS Pathog 2010, 6 (10), e1001152.

27. Stoltzfus, C. M.; Madsen, J. M., Role of viral splicing elements and cellular RNA binding proteins in regulation of HIV-1 alternative RNA splicing. Curr HIV Res 2006, 4 (1), 43-55.
28. Dowling, D.; Nasr-Esfahani, S.; Tan, C. H.; O'Brien, K.; Howard, J. L.; Jans, D. A.; Purcell, D. F.; Stoltzfus, C. M.; Sonza, S., HIV-1 infection induces changes in expression of cellular splicing factors that regulate alternative viral splicing and virus production in macrophages. Retrovirology 2008, 5, 18.
29. Stoltzfus, C. M., Chapter 1. Regulation of HIV-1 alternative RNA splicing and its role in virus replication. Adv Virus Res 2009, 74, 1-40.
30. Gottlinger, H. G.; Sodroski, J. G.; Haseltine, W. A., Role of capsid precursor processing and myristoylation in morphogenesis and infectivity of human immunodeficiency virus type 1. Proc Natl Acad Sci U S A 1989, 86 (15), 5781-5.
31. Schwartz, S.; Felber, B. K.; Benko, D. M.; Fenyo, E. M.; Pavlakis, G. N., Cloning and functional analysis of multiply spliced mRNA species of human immunodeficiency virus type 1. J Virol 1990, 64 (6), 2519-29.
32. Purcell, D. F.; Martin, M. A., Alternative splicing of human immunodeficiency virus type 1 mRNA modulates viral protein expression, replication, and infectivity. J Virol 1993, 67 (11), 6365-78.
33. Favre, M.; Butticaz, C.; Stevenson, B.; Jongeneel, C. V.; Telenti, A., High frequency of alternative splicing of human genes participating in the HIV-1 life cycle: a model using TSG101, betaTrCP, PPIA, INI1, NAF1, and PML. J Acquir Immune Defic Syndr 2003, 34 (2), 127-33.
34. Muesing, M. A.; Smith, D. H.; Cabradilla, C. D.; Benton, C. V.; Lasky, L. A.; Capon, D. J., Nucleic acid structure and expression of the human AIDS/lymphadenopathy retrovirus. Nature 1985, 313 (6002), 450-8.
35. Hwang, S. S.; Boyle, T. J.; Lyerly, H. K.; Cullen, B. R., Identification of the envelope V3 loop as the primary determinant of cell tropism in HIV-1. Science 1991, 253 (5015), 71-4.
36. Bryant, M.; Ratner, L., Myristoylation-dependent replication and assembly of human immunodeficiency virus 1. Proc Natl Acad Sci U S A 1990, 87 (2), 523-7.
37. Lewis, P.; Hensel, M.; Emerman, M., Human immunodeficiency virus infection of cells arrested in the cell cycle. EMBO J 1992, 11 (8), 3053-8.
38. Bukovsky, A. A.; Weimann, A.; Accola, M. A.; Gottlinger, H. G., Transfer of the HIV-1 cyclophilin-binding site to simian immunodeficiency virus from Macaca mulatta can confer both cyclosporin sensitivity and cyclosporin dependence. Proc Natl Acad Sci U S A 1997, 94 (20), 10943-8.
39. Harrison, G. P.; Lever, A. M., The human immunodeficiency virus type 1 packaging signal and major splice donor region have a conserved stable secondary structure. J Virol 1992, 66 (7), 4144-53.


40. Franke, E. K.; Luban, J., Inhibition of HIV-1 replication by cyclosporine A or related compounds correlates with the ability to disrupt the Gag-cyclophilin A interaction. Virology 1996, 222 (1), 279-82. 41. Gallay, P.; Swingler, S.; Song, J.; Bushman, F.; Trono, D., HIV nuclear import is governed by the phosphotyrosine-mediated binding of matrix to the core domain of integrase. Cell 1995, 83 (4), 569-76. 42. Sato, A.; Igarashi, H.; Adachi, A.; Hayami, M., Identification and localization of vpr gene product of human immunodeficiency virus type 1. Virus Genes 1990, 4 (4), 303-12. 43. Re, F.; Braaten, D.; Franke, E. K.; Luban, J., Human immunodeficiency virus type 1 Vpr arrests the cell cycle in G2 by inhibiting the activation of p34cdc2-cyclin B. J Virol 1995, 69 (11), 6859-64. 44. Paxton, W.; Connor, R. I.; Landau, N. R., Incorporation of Vpr into human immunodeficiency virus type 1 virions: requirement for the p6 region of gag and mutational analysis. J Virol 1993, 67 (12), 7229-37. 45. Jacks, T.; Power, M. D.; Masiarz, F. R.; Luciw, P. A.; Barr, P. J.; Varmus, H. E., Characterization of ribosomal frameshifting in HIV-1 gag-pol expression. Nature 1988, 331 (6153), 280-3. 46. Zapp, M. L.; Green, M. R., Sequence-specific RNA binding by the HIV-1 Rev protein. Nature 1989, 342 (6250), 714-6. 47. Ashorn, P.; McQuade, T. J.; Thaisrivongs, S.; Tomasselli, A. G.; Tarpley, W. G.; Moss, B., An inhibitor of the protease blocks maturation of human and simian immunodeficiency viruses and spread of infection. Proc Natl Acad Sci U S A 1990, 87 (19), 7472-6. 48. Kohlstaedt, L. A.; Wang, J.; Friedman, J. M.; Rice, P. A.; Steitz, T. A., Crystal structure at 3.5 A resolution of HIV-1 reverse transcriptase complexed with an inhibitor. Science 1992, 256 (5065), 1783-90. 49. Harrich, D.; Ulich, C.; Gaynor, R. B., A critical role for the TAR element in promoting efficient human immunodeficiency virus type 1 reverse transcription. J Virol 1996, 70 (6), 4017-27. 50. Pryciak, P. 
M.; Varmus, H. E., Nucleosomes, DNA-binding proteins, and DNA sequence modulate retroviral integration target site selection. Cell 1992, 69 (5), 769-80. 51. Bernstein, H. B.; Tucker, S. P.; Kar, S. R.; McPherson, S. A.; McPherson, D. T.; Dubay, J. W.; Lebowitz, J.; Compans, R. W.; Hunter, E., Oligomerization of the hydrophobic heptad repeat of gp41. J Virol 1995, 69 (5), 2745-50. 52. Landau, N. R.; Warton, M.; Littman, D. R., The envelope glycoprotein of the human immunodeficiency virus binds to the immunoglobulin-like domain of CD4. Nature 1988, 334 (6178), 159-62. 53. Ruben, S.; Perkins, A.; Purcell, R.; Joung, K.; Sia, R.; Burghoff, R.; Haseltine, W. A.; Rosen, C. A., Structural and functional characterization of human immunodeficiency virus tat protein. J Virol 1989, 63 (1), 1-8. 54. Feng, S.; Holland, E. C., HIV-1 tat trans-activation requires the loop sequence within tar. Nature 1988, 334 (6178), 165-7.


55. Roy, S.; Delling, U.; Chen, C. H.; Rosen, C. A.; Sonenberg, N., A bulge structure in HIV-1 TAR RNA is required for Tat binding and Tat-mediated trans-activation. Genes Dev 1990, 4 (8), 1365-73. 56. Kao, S. Y.; Calman, A. F.; Luciw, P. A.; Peterlin, B. M., Anti-termination of transcription within the long terminal repeat of HIV-1 by tat gene product. Nature 1987, 330 (6147), 489-93. 57. Feinberg, M. B.; Baltimore, D.; Frankel, A. D., The role of Tat in the human immunodeficiency virus life cycle indicates a primary effect on transcriptional elongation. Proc Natl Acad Sci U S A 1991, 88 (9), 4045-9. 58. Tiley, L. S.; Madore, S. J.; Malim, M. H.; Cullen, B. R., The VP16 transcription activation domain is functional when targeted to a promoter- proximal RNA sequence. Genes Dev 1992, 6 (11), 2077-87. 59. Kim, S. Y.; Byrn, R.; Groopman, J.; Baltimore, D., Temporal aspects of DNA and RNA synthesis during human immunodeficiency virus infection: evidence for differential gene expression. J Virol 1989, 63 (9), 3708-13. 60. Malim, M. H.; Hauber, J.; Le, S. Y.; Maizel, J. V.; Cullen, B. R., The HIV-1 rev trans-activator acts through a structured target sequence to activate nuclear export of unspliced viral mRNA. Nature 1989, 338 (6212), 254-7. 61. Bartel, D. P.; Zapp, M. L.; Green, M. R.; Szostak, J. W., HIV-1 Rev regulation involves recognition of non-Watson-Crick base pairs in viral RNA. Cell 1991, 67 (3), 529-36. 62. Felber, B. K.; Drysdale, C. M.; Pavlakis, G. N., Feedback regulation of human immunodeficiency virus type 1 expression by the Rev protein. J Virol 1990, 64 (8), 3734-41. 63. Schwartz, S.; Felber, B. K.; Pavlakis, G. N., Mechanism of translation of monocistronic and multicistronic human immunodeficiency virus type 1 mRNAs. Mol Cell Biol 1992, 12 (1), 207-19. 64. Garcia, J. V.; Miller, A. D., Downregulation of cell surface CD4 by nef. Res Virol 1992, 143 (1), 52-5. 65. Goldsmith, M. A.; Warmerdam, M. T.; Atchison, R. E.; Miller, M. D.; Greene, W. 
C., Dissociation of the CD4 downregulation and viral infectivity enhancement functions of human immunodeficiency virus type 1 Nef. J Virol 1995, 69 (7), 4112-21. 66. Pandori, M. W.; Fitch, N. J.; Craig, H. M.; Richman, D. D.; Spina, C. A.; Guatelli, J. C., Producer-cell modification of human immunodeficiency virus type 1: Nef is a virion protein. J Virol 1996, 70 (7), 4283-90. 67. Schwartz, O.; Marechal, V.; Danos, O.; Heard, J. M., Human immunodeficiency virus type 1 Nef increases the efficiency of reverse transcription in the infected cell. J Virol 1995, 69 (7), 4053-9. 68. Miller, M. D.; Warmerdam, M. T.; Gaston, I.; Greene, W. C.; Feinberg, M. B., The human immunodeficiency virus-1 nef gene product: a positive factor for viral infection and replication in primary lymphocytes and macrophages. J Exp Med 1994, 179 (1), 101-13.


69. Lama, J.; Mangasarian, A.; Trono, D., Cell-surface expression of CD4 reduces HIV-1 infectivity by blocking Env incorporation in a Nef- and Vpu- inhibitable manner. Curr Biol 1999, 9 (12), 622-31. 70. Aiken, C.; Konner, J.; Landau, N. R.; Lenburg, M. E.; Trono, D., Nef induces CD4 endocytosis: requirement for a critical dileucine motif in the membrane- proximal CD4 cytoplasmic domain. Cell 1994, 76 (5), 853-64. 71. Ross, T. M.; Oran, A. E.; Cullen, B. R., Inhibition of HIV-1 progeny virion release by cell-surface CD4 is relieved by expression of the viral Nef protein. Curr Biol 1999, 9 (12), 613-21. 72. Kestler, H. W., 3rd; Ringler, D. J.; Mori, K.; Panicali, D. L.; Sehgal, P. K.; Daniel, M. D.; Desrosiers, R. C., Importance of the nef gene for maintenance of high virus loads and for development of AIDS. Cell 1991, 65 (4), 651-62. 73. Baba, T. W.; Jeong, Y. S.; Pennick, D.; Bronson, R.; Greene, M. F.; Ruprecht, R. M., Pathogenicity of live, attenuated SIV after mucosal infection of neonatal macaques. Science 1995, 267 (5205), 1820-5. 74. Collins, K. L.; Nabel, G. J., Naturally attenuated HIV--lessons for AIDS vaccines and treatment. N Engl J Med 1999, 340 (22), 1756-7. 75. Strebel, K.; Daugherty, D.; Clouse, K.; Cohen, D.; Folks, T.; Martin, M. A., The HIV 'A' (sor) gene product is essential for virus infectivity. Nature 1987, 328 (6132), 728-30. 76. von Schwedler, U.; Song, J.; Aiken, C.; Trono, D., Vif is crucial for human immunodeficiency virus type 1 proviral DNA synthesis in infected cells. J Virol 1993, 67 (8), 4945-55. 77. Liu, H.; Wu, X.; Newman, M.; Shaw, G. M.; Hahn, B. H.; Kappes, J. C., The Vif protein of human and simian immunodeficiency viruses is packaged into virions and associates with viral core structures. J Virol 1995, 69 (12), 7630- 8. 78. Simon, J. H.; Miller, D. L.; Fouchier, R. A.; Soares, M. A.; Peden, K. W.; Malim, M. 
H., The regulation of primate immunodeficiency virus infectivity by Vif is cell species restricted: a role for Vif in determining virus host range and cross-species transmission. EMBO J 1998, 17 (5), 1259-67. 79. Hoglund, S.; Ohagen, A.; Lawrence, K.; Gabuzda, D., Role of vif during packing of the core of HIV-1. Virology 1994, 201 (2), 349-55. 80. Cohen, E. A.; Dehni, G.; Sodroski, J. G.; Haseltine, W. A., Human immunodeficiency virus vpr product is a virion-associated regulatory protein. J Virol 1990, 64 (6), 3097-9. 81. Thali, M.; Bukovsky, A.; Kondo, E.; Rosenwirth, B.; Walsh, C. T.; Sodroski, J.; Gottlinger, H. G., Functional association of cyclophilin A with HIV-1 virions. Nature 1994, 372 (6504), 363-5. 82. Jowett, J. B.; Planelles, V.; Poon, B.; Shah, N. P.; Chen, M. L.; Chen, I. S., The human immunodeficiency virus type 1 vpr gene arrests infected T cells in the G2 + M phase of the cell cycle. J Virol 1995, 69 (10), 6304-13.


83. He, J.; Choe, S.; Walker, R.; Di Marzio, P.; Morgan, D. O.; Landau, N. R., Human immunodeficiency virus type 1 viral protein R (Vpr) arrests cells in the G2 phase of the cell cycle by inhibiting p34cdc2 activity. J Virol 1995, 69 (11), 6705-11. 84. Schwartz, S.; Felber, B. K.; Fenyo, E. M.; Pavlakis, G. N., Env and Vpu proteins of human immunodeficiency virus type 1 are produced from multiple bicistronic mRNAs. J Virol 1990, 64 (11), 5448-56. 85. Schubert, U.; Bour, S.; Ferrer-Montiel, A. V.; Montal, M.; Maldarell, F.; Strebel, K., The two biological activities of human immunodeficiency virus type 1 Vpu protein involve two separable structural domains. J Virol 1996, 70 (2), 809-19. 86. Willey, R. L.; Maldarelli, F.; Martin, M. A.; Strebel, K., Human immunodeficiency virus type 1 Vpu protein induces rapid degradation of CD4. J Virol 1992, 66 (12), 7193-200. 87. Kolata, G., FDA approves AZT. Science 1987, 235 (4796), 1570. 88. Nakashima, H.; Tochikura, T.; Kobayashi, N.; Matsuda, A.; Ueda, T.; Yamamoto, N., Effect of 3'-azido-2',3'-dideoxythymidine (AZT) and neutralizing antibody on human immunodeficiency virus (HIV)-induced cytopathic effects: implication of giant cell formation for the spread of virus in vivo. Virology 1987, 159 (1), 169-73. 89. Beyrer, C., The HIV/AIDS vaccine research effort: an update. Hopkins HIV Rep 2003, 15 (1), 6-7. 90. Shafran, S. D.; Mashinter, L. D.; Lindemulder, A.; Taylor, G. D.; Chiu, I., Poor efficacy of intradermal administration of recombinant hepatitis B virus immunization in HIV-infected individuals who fail to respond to intramuscular administration of hepatitis B virus vaccine. HIV Med 2007, 8 (5), 295-9. 91. Nath, A., HIV/AIDS Vaccine: An Update. Indian J Community Med 2010, 35 (2), 222-5. 92. Shin, S. Y., Recent update in HIV vaccine development. Clin Exp Vaccine Res 2016, 5 (1), 6-11. 93. Cohen, M. S.; Gay, C. L., Treatment to prevent transmission of HIV-1. Clin Infect Dis 2010, 50 Suppl 3, S85-95. 94. Smith, M. 
K.; Westreich, D.; Liu, H.; Zhu, L.; Wang, L.; He, W.; Zhou, J.; Miller, W. C.; Cohen, M. S.; Wang, N., Treatment to Prevent HIV Transmission in Serodiscordant Couples in Henan, China, 2006 to 2012. Clin Infect Dis 2015, 61 (1), 111-9. 95. Bhatti, A. B.; Usman, M.; Kandi, V., Current Scenario of HIV/AIDS, Treatment Options, and Major Challenges with Compliance to Antiretroviral Therapy. Cureus 2016, 8 (3), e515. 96. Morris, K. V.; Mattick, J. S., The rise of regulatory RNA. Nat Rev Genet 2014, 15 (6), 423-37. 97. Arimbasseri, A. G.; Rijal, K.; Maraia, R. J., Comparative overview of RNA polymerase II and III transcription cycles, with focus on RNA polymerase III termination and reinitiation. Transcription 2014, 5 (1), e27639.


98. Goodfellow, S. J.; Zomerdijk, J. C., Basic mechanisms in RNA polymerase I transcription of the ribosomal RNA genes. Subcell Biochem 2013, 61, 211- 36. 99. Cramer, P.; Armache, K. J.; Baumli, S.; Benkert, S.; Brueckner, F.; Buchen, C.; Damsma, G. E.; Dengl, S.; Geiger, S. R.; Jasiak, A. J.; Jawhari, A.; Jennebach, S.; Kamenski, T.; Kettenberger, H.; Kuhn, C. D.; Lehmann, E.; Leike, K.; Sydow, J. F.; Vannini, A., Structure of eukaryotic RNA polymerases. Annu Rev Biophys 2008, 37, 337-52. 100. Turowski, T. W.; Tollervey, D., Transcription by RNA polymerase III: insights into mechanism and regulation. Biochem Soc Trans 2016, 44 (5), 1367- 1375. 101. Sainsbury, S.; Bernecky, C.; Cramer, P., Structural basis of transcription initiation by RNA polymerase II. Nat Rev Mol Cell Biol 2015, 16 (3), 129-43. 102. Fuda, N. J.; Ardehali, M. B.; Lis, J. T., Defining mechanisms that regulate RNA polymerase II transcription in vivo. Nature 2009, 461 (7261), 186-92. 103. Shandilya, J.; Roberts, S. G., The transcription cycle in eukaryotes: from productive initiation to RNA polymerase II recycling. Biochim Biophys Acta 2012, 1819 (5), 391-400. 104. Bataille, A. R.; Jeronimo, C.; Jacques, P. E.; Laramee, L.; Fortin, M. E.; Forest, A.; Bergeron, M.; Hanes, S. D.; Robert, F., A universal RNA polymerase II CTD cycle is orchestrated by complex interplays between kinase, phosphatase, and isomerase enzymes along genes. Mol Cell 2012, 45 (2), 158-70. 105. Chapman, R. D.; Conrad, M.; Eick, D., Role of the mammalian RNA polymerase II C-terminal domain (CTD) nonconsensus repeats in CTD stability and cell proliferation. Mol Cell Biol 2005, 25 (17), 7665-74. 106. Zhang, Z.; Klatt, A.; Gilmour, D. S.; Henderson, A. J., Negative elongation factor NELF represses human immunodeficiency virus transcription by pausing the RNA polymerase II complex. J Biol Chem 2007, 282 (23), 16981-8. 107. Karn, J.; Stoltzfus, C. M., Transcriptional and posttranscriptional regulation of HIV-1 gene expression. 
Cold Spring Harb Perspect Med 2012, 2 (2), a006916. 108. Yamaguchi, Y.; Takagi, T.; Wada, T.; Yano, K.; Furuya, A.; Sugimoto, S.; Hasegawa, J.; Handa, H., NELF, a multisubunit complex containing RD, cooperates with DSIF to repress RNA polymerase II elongation. Cell 1999, 97 (1), 41-51. 109. Fujinaga, K.; Irwin, D.; Huang, Y.; Taube, R.; Kurosu, T.; Peterlin, B. M., Dynamics of human immunodeficiency virus transcription: P-TEFb phosphorylates RD and dissociates negative effectors from the transactivation response element. Mol Cell Biol 2004, 24 (2), 787-95. 110. Narita, T.; Yamaguchi, Y.; Yano, K.; Sugimoto, S.; Chanarat, S.; Wada, T.; Kim, D. K.; Hasegawa, J.; Omori, M.; Inukai, N.; Endoh, M.; Yamada, T.; Handa, H., Human transcription elongation factor NELF: identification of


novel subunits and reconstitution of the functionally active complex. Mol Cell Biol 2003, 23 (6), 1863-73. 111. Yamada, T.; Yamaguchi, Y.; Inukai, N.; Okamoto, S.; Mura, T.; Handa, H., P-TEFb-mediated phosphorylation of hSpt5 C-terminal repeats is critical for processive transcription elongation. Mol Cell 2006, 21 (2), 227-37. 112. Yamaguchi, Y.; Inukai, N.; Narita, T.; Wada, T.; Handa, H., Evidence that negative elongation factor represses transcription elongation through binding to a DRB sensitivity-inducing factor/RNA polymerase II complex and RNA. Mol Cell Biol 2002, 22 (9), 2918-27. 113. Weil, P. A.; Luse, D. S.; Segall, J.; Roeder, R. G., Selective and accurate initiation of transcription at the Ad2 major late promotor in a soluble system dependent on purified RNA polymerase II and DNA. Cell 1979, 18 (2), 469- 84. 114. Li, B.; Carey, M.; Workman, J. L., The role of chromatin during transcription. Cell 2007, 128 (4), 707-19. 115. Finley, J., Reactivation of latently infected HIV-1 viral reservoirs and correction of aberrant alternative splicing in the LMNA gene via AMPK activation: Common mechanism of action linking HIV-1 latency and Hutchinson-Gilford progeria syndrome. Med Hypotheses 2015, 85 (3), 320- 32. 116. Nabel, G.; Baltimore, D., An inducible transcription factor activates expression of human immunodeficiency virus in T cells. Nature 1987, 326 (6114), 711-3. 117. Fu, T. J.; Peng, J.; Lee, G.; Price, D. H.; Flores, O., Cyclin K functions as a CDK9 regulatory subunit and participates in RNA polymerase II transcription. J Biol Chem 1999, 274 (49), 34527-30. 118. Renner, D. B.; Yamaguchi, Y.; Wada, T.; Handa, H.; Price, D. H., A highly purified RNA polymerase II elongation control system. J Biol Chem 2001, 276 (45), 42601-9. 119. Paparidis, N. F.; Durvale, M. C.; Canduri, F., The emerging picture of CDK9/P-TEFb: more than 20 years of advances since PITALRE. Mol Biosyst 2017, 13 (2), 246-276. 120. Peng, J.; Marshall, N. F.; Price, D. 
H., Identification of a cyclin subunit required for the function of Drosophila P-TEFb. J Biol Chem 1998, 273 (22), 13855-60. 121. Wei, P.; Garber, M. E.; Fang, S. M.; Fischer, W. H.; Jones, K. A., A novel CDK9-associated C-type cyclin interacts directly with HIV-1 Tat and mediates its high-affinity, loop-specific binding to TAR RNA. Cell 1998, 92 (4), 451-62. 122. Shore, S. M.; Byers, S. A.; Maury, W.; Price, D. H., Identification of a novel isoform of Cdk9. Gene 2003, 307, 175-82. 123. Mbonye, U.; Wang, B.; Gokulrangan, G.; Shi, W.; Yang, S.; Karn, J., Cyclin- dependent kinase 7 (CDK7)-mediated phosphorylation of the CDK9


activation loop promotes P-TEFb assembly with Tat and proviral HIV reactivation. J Biol Chem 2018, 293 (26), 10009-10025. 124. Baumli, S.; Lolli, G.; Lowe, E. D.; Troiani, S.; Rusconi, L.; Bullock, A. N.; Debreczeni, J. E.; Knapp, S.; Johnson, L. N., The structure of P-TEFb (CDK9/cyclin T1), its complex with flavopiridol and regulation by phosphorylation. EMBO J 2008, 27 (13), 1907-18. 125. Nguyen, V. T.; Kiss, T.; Michels, A. A.; Bensaude, O., 7SK small nuclear RNA binds to and inhibits the activity of CDK9/cyclin T complexes. Nature 2001, 414 (6861), 322-5. 126. Yang, Z.; Zhu, Q.; Luo, K.; Zhou, Q., The 7SK small nuclear RNA inhibits the CDK9/cyclin T1 kinase to control transcription. Nature 2001, 414 (6861), 317-22. 127. Yik, J. H.; Chen, R.; Nishimura, R.; Jennings, J. L.; Link, A. J.; Zhou, Q., Inhibition of P-TEFb (CDK9/Cyclin T) kinase and RNA polymerase II transcription by the coordinated actions of HEXIM1 and 7SK snRNA. Mol Cell 2003, 12 (4), 971-82. 128. Michels, A. A.; Fraldi, A.; Li, Q.; Adamson, T. E.; Bonnet, F.; Nguyen, V. T.; Sedore, S. C.; Price, J. P.; Price, D. H.; Lania, L.; Bensaude, O., Binding of the 7SK snRNA turns the HEXIM1 protein into a P-TEFb (CDK9/cyclin T) inhibitor. EMBO J 2004, 23 (13), 2608-19. 129. Krueger, B. J.; Jeronimo, C.; Roy, B. B.; Bouchard, A.; Barrandon, C.; Byers, S. A.; Searcey, C. E.; Cooper, J. J.; Bensaude, O.; Cohen, E. A.; Coulombe, B.; Price, D. H., LARP7 is a stable component of the 7SK snRNP while P-TEFb, HEXIM1 and hnRNP A1 are reversibly associated. Nucleic Acids Res 2008, 36 (7), 2219-29. 130. Mancebo, H. S.; Lee, G.; Flygare, J.; Tomassini, J.; Luu, P.; Zhu, Y.; Peng, J.; Blau, C.; Hazuda, D.; Price, D.; Flores, O., P-TEFb kinase is required for HIV Tat transcriptional activation in vivo and in vitro. Genes Dev 1997, 11 (20), 2633-44. 131. Tahirov, T. H.; Babayeva, N. D.; Varzavand, K.; Cooper, J. J.; Sedore, S. C.; Price, D. H., Crystal structure of HIV-1 Tat complexed with human P- TEFb. 
Nature 2010, 465 (7299), 747-51. 132. Yang, Z.; Yik, J. H.; Chen, R.; He, N.; Jang, M. K.; Ozato, K.; Zhou, Q., Recruitment of P-TEFb for stimulation of transcriptional elongation by the bromodomain protein Brd4. Mol Cell 2005, 19 (4), 535-45. 133. Huang, H.; Liu, S.; Jean, M.; Simpson, S.; Huang, H.; Merkley, M.; Hayashi, T.; Kong, W.; Rodriguez-Sanchez, I.; Zhang, X.; Yosief, H. O.; Miao, H.; Que, J.; Kobie, J. J.; Bradner, J.; Santoso, N. G.; Zhang, W.; Zhu, J., A Novel Bromodomain Inhibitor Reverses HIV-1 Latency through Specific Binding with BRD4 to Promote Tat and P-TEFb Association. Front Microbiol 2017, 8, 1035. 134. Isel, C.; Karn, J., Direct evidence that HIV-1 Tat stimulates RNA polymerase II carboxyl-terminal domain hyperphosphorylation during transcriptional elongation. J Mol Biol 1999, 290 (5), 929-41.


135. Diribarne, G.; Bensaude, O., 7SK RNA, a non-coding RNA regulating P- TEFb, a general transcription factor. RNA Biol 2009, 6 (2), 122-8. 136. Humphries, P.; Russell, S. E.; McWilliam, P.; McQuaid, S.; Pearson, C.; Humphries, M. M., Observations on the structure of two human 7SK pseudogenes and on homologous transcripts in vertebrate species. Biochem J 1987, 245 (1), 281-4. 137. Barboric, M.; Kohoutek, J.; Price, J. P.; Blazek, D.; Price, D. H.; Peterlin, B. M., Interplay between 7SK snRNA and oppositely charged regions in HEXIM1 direct the inhibition of P-TEFb. EMBO J 2005, 24 (24), 4291-303. 138. Egloff, S.; Van Herreweghe, E.; Kiss, T., Regulation of polymerase II transcription by 7SK snRNA: two distinct RNA elements direct P-TEFb and HEXIM1 binding. Mol Cell Biol 2006, 26 (2), 630-42. 139. Li, Q.; Price, J. P.; Byers, S. A.; Cheng, D.; Peng, J.; Price, D. H., Analysis of the large inactive P-TEFb complex indicates that it contains one 7SK molecule, a dimer of HEXIM1 or HEXIM2, and two P-TEFb molecules containing Cdk9 phosphorylated at threonine 186. J Biol Chem 2005, 280 (31), 28819-26. 140. Yik, J. H.; Chen, R.; Pezda, A. C.; Zhou, Q., Compensatory contributions of HEXIM1 and HEXIM2 in maintaining the balance of active and inactive positive transcription elongation factor b complexes for control of transcription. J Biol Chem 2005, 280 (16), 16368-76. 141. Li, Q.; Cooper, J. J.; Altwerger, G. H.; Feldkamp, M. D.; Shea, M. A.; Price, D. H., HEXIM1 is a promiscuous double-stranded RNA-binding protein and interacts with RNAs in addition to 7SK in cultured cells. Nucleic Acids Res 2007, 35 (8), 2503-12. 142. Blazek, D.; Barboric, M.; Kohoutek, J.; Oven, I.; Peterlin, B. M., Oligomerization of HEXIM1 via 7SK snRNA and coiled-coil region directs the inhibition of P-TEFb. Nucleic Acids Res 2005, 33 (22), 7000-10. 143. Lebars, I.; Martinez-Zapien, D.; Durand, A.; Coutant, J.; Kieffer, B.; Dock- Bregeon, A. 
C., HEXIM1 targets a repeated GAUC motif in the riboregulator of transcription 7SK and promotes base pair rearrangements. Nucleic Acids Res 2010, 38 (21), 7749-63. 144. D'Orso, I.; Jang, G. M.; Pastuszak, A. W.; Faust, T. B.; Quezada, E.; Booth, D. S.; Frankel, A. D., Transition step during assembly of HIV Tat:P-TEFb transcription complexes and transfer to TAR RNA. Mol Cell Biol 2012, 32 (23), 4780-93. 145. Barrandon, C.; Bonnet, F.; Nguyen, V. T.; Labas, V.; Bensaude, O., The transcription-dependent dissociation of P-TEFb-HEXIM1-7SK RNA relies upon formation of hnRNP-7SK RNA complexes. Mol Cell Biol 2007, 27 (20), 6996-7006. 146. Van Herreweghe, E.; Egloff, S.; Goiffon, I.; Jady, B. E.; Froment, C.; Monsarrat, B.; Kiss, T., Dynamic remodelling of human 7SK snRNP controls the nuclear level of active P-TEFb. EMBO J 2007, 26 (15), 3570-80.


147. Hogg, J. R.; Collins, K., RNA-based affinity purification reveals 7SK RNPs with distinct composition and regulation. RNA 2007, 13 (6), 868-80. 148. Peterlin, B. M.; Brogie, J. E.; Price, D. H., 7SK snRNA: a noncoding RNA that plays a major role in regulating eukaryotic transcription. Wiley Interdiscip Rev RNA 2012, 3 (1), 92-103. 149. Chaudhury, A.; Chander, P.; Howe, P. H., Heterogeneous nuclear ribonucleoproteins (hnRNPs) in cellular processes: Focus on hnRNP E1's multifunctional regulatory roles. RNA 2010, 16 (8), 1449-62.


Chapter 2: Determination of the Secondary Structure of 7SK snRNA by DMS-MaPseq


2.1 Abstract

Human 7SK snRNA is an abundant 330-332 nt transcript that is transcribed by RNA polymerase III (RNAPIII)1-2. It is found mostly in three different and mutually exclusive 7SK snRNPs that function collaboratively to regulate transcriptional initiation and elongation3-6. In humans, most functional 7SK snRNA is associated with and stabilized by the MePCE and LARP7 proteins at the 5’ and 3’ ends of the RNA, respectively5, 7-9. The most notable fate of 7SK snRNA is to form the 7SK-HEXIM-P-TEFb snRNP complex, through which 7SK snRNA closely regulates the availability of active P-TEFb3-4, 10-11. While the detailed mechanism by which the HIV Tat protein sequesters P-TEFb is unclear, the existence of the mutually exclusive 7SK-hnRNP snRNP complex suggests a potential pathway for P-TEFb release1, 10-12.

Determining the secondary structure of 7SK snRNA is especially important for studying the 7SK-hnRNP complex and the other 7SK-associated snRNP complexes6, 13-17. A secondary structure model of 7SK snRNA was first reported in 1991 by the Steitz group13. Over the past decade, the development of chemical probing methods coupled with massively parallel sequencing has transformed the accuracy and efficiency of RNA secondary structure probing, and the secondary structures of numerous RNAs have been determined both in vitro and in vivo18-21. Although the secondary structure of 7SK snRNA has been determined previously, several preliminary constructs designed from the Steitz model failed to fold into a single conformation, which makes the Steitz model an uncertain starting point for structural studies of individual 7SK stem loops.

In this chapter, we present a secondary structure model of 7SK snRNA determined with the recently developed DMS-MaPseq method18, 22. 7SK snRNA was transcribed in vitro, free of any associated protein factors. The DMS-modified transcript was reverse transcribed with a specialized reverse transcriptase, TGIRT-III, then PCR amplified and sequenced on the Illumina HiSeq 2000 system18, 22-23. Initial data processing followed the previously published method; the mutation distribution data were then normalized and supplied as pseudoenergy restraints to the RNAStructure folding program to guide determination of the 7SK snRNA secondary structure model24-27. The new model revealed many interesting structural features of 7SK snRNA; however, functional understanding was limited because the structure was interpreted from a population-average dataset22, 28-29. To examine the conformational heterogeneity of 7SK snRNA, one of the DMS-MaPseq datasets (DMS_491) was further analyzed with a recently developed method in which sequencing reads are filtered and clustered according to the conservation of their mutational patterns. Although the proposed cluster structures require further validation, the preliminary data suggest a putative equilibrium between two conformations that differ in the middle region, which might change the overall dynamics of 7SK snRNA. Although the SL2 region does not bind the protein factors directly, a conformational change in this region might alter the accessibility of the other stem loops to the protein factors. SL2 constructs based on the DMS cluster model are planned to be designed and studied using NMR, SAXS and cryo-EM coupled with MD simulation, towards the goal of a high-resolution structural model of full-length 7SK snRNA30-33.


2.2 Introduction

Human 7SK snRNA is an abundant 330-332 nt transcript generated by RNA polymerase III (RNAPIII). The human genome contains hundreds of pseudogenes that encode 7SK RNA; however, only the copy located on chromosome 6 is functional15. 7SK RNA is expressed from a strong constitutive promoter that relies on transcription factors, and transcription terminates at a canonical poly-uracil sequence1, 34.

After transcription by RNAPIII, 7SK snRNA undergoes multiple posttranscriptional modifications. An exonuclease trims one to three uracils before an unknown enzyme adds a single adenosine, producing a heterogeneous population of 330-332 nt transcripts35. The 3’ end is protected from further degradation by genuine La protein, while the 5’ end is capped with a single methyl group on the gamma phosphate by the methylphosphate capping enzyme (MePCE), also known as bicoid-interacting protein 3 (BCDIN3)3, 8-9, 36. After methylation, MePCE remains bound to the 5’ end of 7SK RNA, while genuine La protein is replaced by La-related protein 7 (LARP7), forming the core 7SK snRNP5. Because over 90% of LARP7 is observed in association with 7SK snRNA, the sole function of the protein appears to be stabilizing 7SK snRNA and the associated snRNPs8.


The posttranscriptional processing events mediated by MePCE and LARP7 are critical for 7SK RNA stability. Knockdown of either LARP7 1, 36-37 or MePCE 1, 9, 36 dramatically reduces 7SK snRNA levels in vivo. More interestingly, LARP7 and MePCE also interact directly with each other, and 7SK snRNA strengthens this interaction36. The LARP7–MePCE interaction has two major consequences: 1) LARP7 suppresses MePCE catalysis, preventing removal of the monomethyl phosphate cap9; 2) direct association of the two proteins while bound to the RNA helps bring the 5’ and 3’ ends together, forming the core 7SK snRNP12, 36. With the help of these two proteins, 7SK snRNA serves as a scaffold that facilitates the formation of multiple snRNPs, which work together to regulate RNAPII transcription.

7SK snRNA is found mostly in three different and mutually exclusive 7SK snRNPs that function collaboratively to regulate transcriptional initiation and elongation. Its major function, and most notable fate, is to form the 7SK-HEXIM-P-TEFb snRNP complex, in which 7SK snRNA directly sequesters P-TEFb together with an accessory factor, hexamethylene bis-acetamide inducible protein (HEXIM)14, 38. Moreover, this complex has been reported to associate with the P-TEFb auxiliary factors AFF1 and AFF4, which help the super elongation complex promote efficient transcriptional elongation39-40.

Multiple heterogeneous nuclear ribonucleoproteins (hnRNPs) bind to the remaining core 7SK snRNP to form the 7SK-hnRNP snRNPs. Of the twenty hnRNPs, only hnRNP A1, A2/B1, Q1, Q3, R, and K have been observed to bind 7SK snRNA4, 41-42. Both the 7SK-P-TEFb and 7SK-hnRNP snRNPs contain MePCE and LARP7, whereas the third 7SK snRNP, interestingly, contains neither. This 7SK-BAF snRNP, which makes up a small percentage of the 7SK snRNPs, is composed of 7SK snRNA and the BAF chromatin remodeling complex43. Without the canonical stabilization by MePCE and LARP7, the complex must be stabilized through a different mechanism. The 7SK-BAF snRNP is predominantly recruited to enhancer regions, where the BAF complex positions nucleosomes to prevent RNAPII transcription of enhancer RNAs5, 9, 36.

Previous structural studies of 7SK snRNA include the chemical and ribonuclease probing experiments performed by the J. Steitz group, which produced the initial secondary structure model of 7SK snRNA containing four stem loops (SL1-SL4)13 [Figure 2.2]. Later, in 2009, a phylogenetic study of 7SK RNAs across metazoans, combined with comprehensive computational analysis, suggested an improved secondary structure model composed of eight highly conserved motifs (M1-M8)16. More recently, in 2017, the D. Price group determined a secondary structure model using selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) that further improved on the Steitz model, with major differences in the SL2 and SL3 regions44. For the purposes of this thesis, I will compare the Steitz model [Figure 2.2] with the Price model determined by SHAPE [Figure 2.3]13, 44.


In the Steitz model [Figure 2.2], the native 331 nt 7SK snRNA folds into four stem loops. Specifically, nucleotides 1-108 at the 5’ end fold into SL1, followed by SL2 and SL3, which comprise nucleotides 116-171 and 203-273, respectively, and nucleotides 296-331 at the 3’ end fold into SL4. Long unpaired linker regions lie between SL2 and SL3 and between SL3 and SL4. The more recent model determined using SHAPE mostly agrees with the Steitz model on SL1 and SL4 at the two ends; in the middle region, however, it contains SL2A followed by two shorter stem loops, SL2B and SL2C, and an SL3 that is longer than in the Steitz model.
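The stem-loop boundaries quoted above can be kept as a small lookup table, which makes it easy to see which nucleotides fall in the unpaired linkers. The following Python sketch is illustrative only: the coordinates are those stated in the text for the Steitz model, while the helper names `linker_regions` and `region_of` are hypothetical and are not part of the thesis workflow.

```python
# Stem-loop boundaries of the Steitz model as quoted in the text
# (1-based, inclusive nucleotide positions of the 331 nt RNA).
STEITZ_STEM_LOOPS = {
    "SL1": (1, 108),
    "SL2": (116, 171),
    "SL3": (203, 273),
    "SL4": (296, 331),
}
RNA_LENGTH = 331


def linker_regions(stem_loops, length):
    """Return the (start, end) ranges that are not covered by any stem loop."""
    linkers = []
    previous_end = 0
    for start, end in sorted(stem_loops.values()):
        if start > previous_end + 1:
            linkers.append((previous_end + 1, start - 1))
        previous_end = max(previous_end, end)
    if previous_end < length:
        linkers.append((previous_end + 1, length))
    return linkers


def region_of(position, stem_loops):
    """Name the stem loop containing a position, or 'linker' if none does."""
    for name, (start, end) in stem_loops.items():
        if start <= position <= end:
            return name
    return "linker"


if __name__ == "__main__":
    for start, end in linker_regions(STEITZ_STEM_LOOPS, RNA_LENGTH):
        print(f"linker {start}-{end}: {end - start + 1} nt")
```

With these coordinates the two longest gaps fall between SL2 and SL3 and between SL3 and SL4, which matches the description of long linker regions in the middle and 3’ portions of the RNA.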

In this chapter, I present my project to determine the secondary structure of 7SK snRNA in vitro, mostly using dimethyl sulfate (DMS) modification followed by mutational profiling by sequencing (MaPseq).

DMS is one of the oldest chemical modification reagents commonly used to probe Watson-Crick base pairing in nucleic acids45-47. Applying DMS to double-stranded DNA or RNA results in methylation mainly at the N7 atom of guanine (G) and adenine (A) within the major groove, and at the N3 atom of G and A within the minor groove of the helix [Figure 2.1]. The N1 position of A and the N3 position of cytidine (C) are protected by base pairing. In single-stranded DNA or RNA, the N1 position of A and the N3 position of C are readily methylated by DMS, in addition to the N3 and N7 positions of G and A47. A powerful innovation came a few years later, when DMS modification was coupled with primer extension by reverse transcriptase (RT): extension is blocked at the two nucleotides (A and C) that DMS methylates in single-stranded RNA, and the resulting stops report on the secondary structure of the RNA48.

Further advances came when this versatile method was coupled with a different RT, the thermostable group II intron reverse transcriptase (TGIRT), and with next-generation sequencing18, 22. The resulting method, developed by our collaborator, Dr. Silvi Rouskin of MIT, was named DMS-MaPseq; in it, TGIRT reverse transcribes the modified RNA and introduces mismatches at methylated A and C. The probing result provides structural restraints for structure determination when it is incorporated into a structure prediction algorithm as pseudoenergy restraints24, 27. RNAStructure (available as an application and as a webserver), developed by the David Mathews group at the University of Rochester, is a commonly used structure prediction tool based mainly on free energy calculations. Incorporating experimental DMS-MaPseq data greatly increases the accuracy of the predicted secondary structure of a large RNA construct 49-50.

2.3 Materials and Method

T7 RNA Polymerase preparation

The recombinant T7 RNA polymerase was overexpressed in BL21(DE3) electrocompetent cells in LB broth containing 200 µg/mL ampicillin for an additional 5 h at 37 °C after induction with IPTG at an OD of 0.6-1.0. The culture was pelleted and resuspended in lysis buffer (10 mM Na2HPO4, 20 mM imidazole, 1.2 M NaCl, pH 7.5). Sonication and centrifugation were then performed under the recommended conditions. The lysate was further purified using an AKTA start protein purification system with a HisTrap HP histidine-tagged protein purification column. The eluate was loaded onto a HiPrep 16/60 Sephacryl S-100 HR column to exchange it into the storage buffer (20 mM Tris, 170 mM NaCl, 1 mM EDTA, 1 mM DTT, pH 8.0). The enzyme was stored in 50% v/v glycerol at -20 °C until use.

7SK snRNA Preparation

The native sequence of 7SK snRNA was subcloned into the pUC19 vector at the 3’ EcoRI site (gift from Dr. Jonathan Karn). The plasmid template was cut with EcoRI-HF (NEB) at the 3’ end of the template, followed by purification to remove uncut plasmid and restriction enzyme. The linearized plasmid template was desalted and stored at -20 °C. The fully protonated nucleotides (Sigma-Aldrich) were mixed according to the ratio of each nucleotide in 7SK snRNA (A:C:G:U = 66:96:99:75). This customized NTP mix gave a higher yield than an equimolar NTP mix in the in vitro transcription. Trial reactions were performed prior to the large-scale reaction, and urea-containing polyacrylamide gel electrophoresis (Urea-PAGE) was used to choose the optimal conditions based on the relative intensity of the RNA bands. Following synthesis, the transcription reaction was treated with RNase-free DNase I (Roche) following the manufacturer’s protocol and then desalted using a 100 kD centrifugal filter (Millipore Amicon), which also removed the abortive transcription products. This alternative purification procedure, used in place of the conventional FPLC method, was suitable only for this construct, because Urea-PAGE showed very few large abortive transcription products (the molecular weight of the RNA is 102.55 kD, just above the filter cutoff). The purified, desalted RNA was stored at -20 °C until use.
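As a rough illustration of how a composition-matched NTP mix can be worked out, the sketch below splits a total NTP concentration in proportion to the nucleotide counts quoted above. The 20 mM total and the function name are illustrative assumptions, not values taken from the protocol.

```python
# Nucleotide counts of the transcribed construct, as quoted in the text.
COUNTS = {"A": 66, "C": 96, "G": 99, "U": 75}


def ntp_mix(counts, total_mm):
    """Split a total NTP concentration (mM) in proportion to base counts."""
    length = sum(counts.values())
    return {base: total_mm * n / length for base, n in counts.items()}


# Hypothetical total of 20 mM NTPs; the working value would come from trial reactions.
mix = ntp_mix(COUNTS, total_mm=20.0)
for base, conc in mix.items():
    print(f"{base}TP: {conc:.2f} mM")
```

The counts sum to 336 nt, which matches the length of the in vitro transcript including the 3’ EcoRI site.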

DMS modification

The 7SK snRNA concentration was pre-adjusted to around 200 ng/µL. 5 µL (1 µg) of purified RNA was annealed in a 1.5 mL tube by heating the sample to 95 °C for 15 s and flash cooling on ice for 2 min. Then 95 µL of DMS modification buffer (100 mM sodium cacodylate, 140 mM KCl, 3 mM MgCl2, pH 7.5) was added to each sample. The samples were incubated in the DMS modification buffer at room temperature for 30 min; 2-5% DMS was then added, and the samples were incubated at 37 °C with shaking at 500 rpm for 5 or 10 min. The methylation reaction was terminated by adding 60 µL of BME (Sigma-Aldrich) to each sample. The samples were then desalted using an RNA cleanup and concentrator-5 column (Zymo Research) to recover RNA > 200 nt following the manufacturer’s instructions. Untreated samples, used as negative controls, were prepared by replacing the 2-5% DMS with RNase-free water. The yield of RNA after cleanup was around 100 ng/µL in a 6 µL elution.

RT-PCR

20–50 ng of methylated RNA was used for reverse transcription with 100 U of thermostable group II intron reverse transcriptase, 3rd generation (TGIRT-III, InGex) for 2 h at 57 °C, per the manufacturer’s instructions. The reverse primer (5’-CAT GCA GCG CCT CAT TTG-3’) for 7SK snRNA was commercially synthesized (IDT). The RNA templates were digested using RNase H (NEB) for 20 min at 37 °C, per the manufacturer’s instructions. The reverse-transcribed DNA was subsequently PCR amplified using Phusion DNA polymerase (NEB) or Phusion Flash High-Fidelity PCR Master Mix (Thermo Scientific). The polymerase chain reaction (PCR) program with the primer set (forward primer: 5’-GGA TGT GAG GGC GAT CTG-3’; reverse primer: 5’-CAT GCA GCG CCT CAT TTG-3’) began with an initial denaturation for 30 s at 98 °C, followed by 25 PCR cycles of denaturation for 5 s at 98 °C, annealing for 10 s at 65 °C, and extension for 15 s at 72 °C, and ended with a final extension for 5 min at 72 °C. The PCR products were desalted using a DNA cleanup and concentrator-5 column kit (Zymo Research). All cleaned-up samples were analyzed on an agarose gel to check the yield and specificity of the PCR amplification before the samples were sent for sequencing.

Next Generation sequencing

Sequencing was performed on an Illumina HiSeq 2000 system, which uses cluster generation and sequencing by synthesis (SBS) chemistry. The sequencing libraries were constructed using the Beckman Coulter SPRIworks to adenylate each end and ligate adaptors to the amplicons. A PCR reaction was then used to generate indexed libraries. Libraries were quantified using the Fragment Analyzer (Advanced Analytical) and qPCR before being loaded for paired-end sequencing.

Processing of DMS-MaPseq data

• Initial process at sequencing core

The raw sequencing data from the Illumina HiSeq 2000 system are output as multiplexed .fastq files. A .fastq file is a text file that contains the sequence data from the clusters that pass the filter on a flow cell. Because the samples were multiplexed, the first step was to de-multiplex the clusters into separate, simpler .fastq files based on each cluster’s index sequence. Each de-multiplexed file includes both the sequence letters and the corresponding quality scores (QS), each encoded as a single ASCII character. To ensure the authenticity and quality of the sequences, reads were processed with the FASTX-Toolkit Clipper and Quality Filter functions, respectively, requiring that 80% of sequenced bases have a QS over 25. All nonuniquely aligned reads were then removed.

For detailed manuals, please refer to (http://hannonlab.cshl.edu/fastx_toolkit/).
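The quality criterion described above can be illustrated with a short sketch. It is not the FASTX-Toolkit implementation; it assumes Phred+33 quality encoding and a plain four-line FASTQ layout, and the function names are hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def passes_filter(quality_line, min_qs=25, min_fraction=0.80):
    """True if at least min_fraction of bases have a quality score above min_qs."""
    scores = phred_scores(quality_line)
    if not scores:
        return False
    good = sum(1 for q in scores if q > min_qs)
    return good / len(scores) >= min_fraction


def filter_fastq_records(lines):
    """Yield (header, sequence, quality) for 4-line FASTQ records that pass."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (s.rstrip("\n") for s in lines[i:i + 4])
        if passes_filter(qual):
            yield header, seq, qual
```

In practice the records would be read from the de-multiplexed .fastq file, for example with `open(path).readlines()`.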

• Indexation

After the .fastq files were downloaded from the sequencing core, the .fasta file of 7SK snRNA (1-331 nt) was indexed using bowtie2 ("bowtie2 -L 12 --local --no-unal --no-discordant -x " + ref + ' -1 ' + file1 + ' -2 ' + file2 + " -S " + out + " -p 3") to convert the reference sequence into an index suitable for the following alignment.

• Alignment

The index files and .fastq files were placed in the same folder. The .fastq files were aligned against the index of the full-length 7SK sequence (1-331 nt) using ExecuteBow.py to generate a .sam file that included the alignment of every read.
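The indexing and alignment steps can be sketched as two command builders. The alignment options are the ones quoted in the text; the use of `bowtie2-build` for the index, the file names, and the helper functions are assumptions for illustration and are not taken from ExecuteBow.py.

```python
import shlex


def build_index_cmd(ref_fasta, index_base):
    """Command that builds a bowtie2 index from a reference FASTA."""
    return ["bowtie2-build", ref_fasta, index_base]


def build_align_cmd(index_base, fastq1, fastq2, out_sam, threads=3):
    """Paired-end local alignment with the options quoted in the text."""
    return [
        "bowtie2",
        "-L", "12",          # seed length
        "--local",           # local alignment mode
        "--no-unal",         # do not report reads that fail to align
        "--no-discordant",   # do not report discordant pair alignments
        "-x", index_base,
        "-1", fastq1,
        "-2", fastq2,
        "-S", out_sam,
        "-p", str(threads),
    ]


# Hypothetical file names, for illustration only.
index_cmd = build_index_cmd("7SK.fasta", "7SK_index")
align_cmd = build_align_cmd("7SK_index", "sample_R1.fastq", "sample_R2.fastq", "sample.sam")
print(shlex.join(index_cmd))
print(shlex.join(align_cmd))
```

With bowtie2 installed and on the PATH, each list could be passed to `subprocess.run(cmd, check=True)`.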

• Output

The following output process was based on the .sam file. At this step, a specific alignment region can be set: for example, if only SL3 of 7SK is to be analyzed, the coordinates can be set to the SL3 region in order to recover reads that were discarded during alignment because of partial coverage but that cover the SL3 region completely. The output files include:

1. Population average file: shows the mutational distribution across the whole sequence for all the sequencing reads selected in the analysis. The y-axis of the distribution is the ratiometric DMS intensity, calculated as the number of mismatches divided by the sequencing depth. This file and the corresponding text file are the direct representation of the sequencing result, the MaPseq.

2. Read coverage file: as a quality control, shows the region covered by each sequencing read selected in the analysis. Uneven read coverage, caused by unexpected truncation of the transcript during DMS modification, would bias the distribution, because the uncovered region would show “false negatives”.

3. Jackpot file: as a quality control, reports the standard deviation of the number of DMS modifications per sequence. Overmethylation of the RNA could lead to nonspecific truncation of the transcript; moreover, uneven methylation could lead to misrepresentation of the RNA. In an ideal experiment, each sequencing read of the selected dataset would show an equal number of modifications.

4. Bitvector file: records all the mutations of each read. As an internal quality control, the untreated sample should show a low average mutation rate (the background mutational rate), while the DMS-treated samples should show modified A and C signals much higher than the background mutational rate.
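The relationship between the bitvector file and the population average can be sketched as follows. The bitvector encoding used here ('1' for a mismatch, '0' for a match, '.' for an uncovered position) is a hypothetical one chosen for illustration; the actual file format of the pipeline may differ.

```python
def population_average(bitvectors):
    """Per-position mutation fraction = mismatches / sequencing depth.

    Each bitvector is a string over {'0', '1', '.'}: '1' = mismatch,
    '0' = match, '.' = position not covered by the read (assumed encoding).
    Returns (fractions, depth); a fraction is None where depth is zero.
    """
    length = len(bitvectors[0])
    mismatches = [0] * length
    depth = [0] * length
    for bv in bitvectors:
        for i, symbol in enumerate(bv):
            if symbol == ".":
                continue
            depth[i] += 1
            if symbol == "1":
                mismatches[i] += 1
    fractions = [m / d if d else None for m, d in zip(mismatches, depth)]
    return fractions, depth


def mutations_per_read(bitvectors):
    """Number of mismatches in each read, the quantity a jackpot-style check inspects."""
    return [bv.count("1") for bv in bitvectors]


reads = ["0100", "0110", ".100", "00.."]
fractions, depth = population_average(reads)
```

The `depth` list corresponds to the read coverage check, and the spread of `mutations_per_read` corresponds to the jackpot check described in the list above.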

Data validation and normalization

Despite the template-switching capability of the TGIRT-III enzyme, no substantial number of chimeric reads was detected in our dataset; hence, no additional processing step beyond alignment was applied to remove such reads18, 22. Reads with poor coverage of the full length were removed from the dataset. The mutational distribution along the full-length 7SK snRNA was documented to ensure positive signals over the background signals (untreated samples). The population average files were further analyzed as follows:

1. The minimal mutational signals in the regions overlapping the 5’ and 3’ primers were set to null, and the minimal signals from T and G in the sequence were set to null as well.

2. Signals from A and C were 98% Winsorized and normalized as described previously 18.

3. Each dataset was internally normalized to its highest mutational signal, which was set to 1, and output as a .txt file. For the subsequent incorporation of the file into RNAStructure as pseudoenergy restraints, the null signals of G and T were set to -999 (no signal information)24-25.
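The normalization steps listed above can be sketched in a few lines. This is a minimal illustration, not the published procedure: it assumes that 98% Winsorization means clipping A and C signals to the 1st and 99th percentiles, and the percentile helper and function names are hypothetical.

```python
def percentile(sorted_vals, pct):
    """Linear-interpolation percentile of an already sorted list (pct in 0-100)."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    rank = (len(sorted_vals) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)


def normalize_profile(sequence, signal, lower_pct=1.0, upper_pct=99.0, no_data=-999.0):
    """Winsorize A/C signals, scale the maximum to 1, and mask all other bases."""
    ac_values = sorted(s for base, s in zip(sequence, signal) if base in "AC")
    if not ac_values:
        raise ValueError("sequence contains no A or C positions")
    low = percentile(ac_values, lower_pct)
    high = percentile(ac_values, upper_pct)
    clipped = [min(max(s, low), high) for s in signal]
    top = max(c for base, c in zip(sequence, clipped) if base in "AC")
    if top == 0:
        raise ValueError("all A/C signals are zero")
    # G and T/U positions carry no DMS information and are masked with -999.
    return [c / top if base in "AC" else no_data for base, c in zip(sequence, clipped)]
```

Primer-overlapping positions could be masked in the same way by overwriting their entries with -999 after normalization.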

RNAStructure

The command line interface version was downloaded and used in this work, but similar functions are available through the webserver51-52. The full-length 7SK snRNA sequence (7SK.FASTA) was folded with the restraint file (7SK_DMS.txt) generated from DMS-MaPseq. The command was [./Fold ~/Desktop/7SK/7SK.FASTA ~/Desktop/7SK/7SK.ct -dms ~/Desktop/7SK/7SK_DMS.txt], and the folded structure was drawn with [./draw ~/Desktop/7SK/7SK.ct ~/Desktop/7SK/7SK.ps].
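A restraint file of the kind passed to Fold can be prepared with a short helper. The sketch assumes a two-column layout (1-based nucleotide position, reactivity) with -999 marking positions without data; the helper names are hypothetical, and the layout should be checked against the RNAstructure documentation for the installed version.

```python
def restraint_lines(reactivities, no_data=-999.0):
    """Format per-nucleotide reactivities as 'position<TAB>value' lines (1-based)."""
    lines = []
    for pos, value in enumerate(reactivities, start=1):
        if value is None or value == no_data:
            lines.append(f"{pos}\t-999")
        else:
            lines.append(f"{pos}\t{value:.4f}")
    return lines


def fold_commands(fasta, ct, dms_file, ps):
    """Argument lists mirroring the Fold and draw calls quoted in the text."""
    fold = ["Fold", fasta, ct, "-dms", dms_file]
    draw = ["draw", ct, ps]
    return fold, draw


example = restraint_lines([0.5, -999.0, None, 1.0])
fold_cmd, draw_cmd = fold_commands("7SK.FASTA", "7SK.ct", "7SK_DMS.txt", "7SK.ps")
```

Writing `"\n".join(example)` to 7SK_DMS.txt would produce a file in the assumed layout.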

2.4 Result

Population average model of DMS-MaPseq

As mentioned in Materials and Methods, DMS modification was optimized between 2% and 5% with an incubation of 5-10 min 22. After optimization, 5% DMS was used for 10 min before the reaction was terminated in this experiment. The DMS-MaPseq experiment was repeated, and three individual experiments were selected to determine the population average model of the RNA. The three datasets were averaged following internal normalization, and the averaged data were used as the pseudoenergy restraints for RNAStructure 25, 27.

The population average model of DMS-MaPseq is shown in Figure 2.4. In this model, 7SK snRNA folded into four major stem loops (SL1, SL2A, SL3 and SL4) with two minor stem loops (SL2B and SL2C) [Figure 2.4A]. Regions where the secondary structure determined by DMS-MaPseq differs from the Steitz model are highlighted in red.

• SL1

The Steitz model showed that G13 is base paired with C93, and A14 is base paired with U92. In the normalized DMS-MaPseq data, C93 showed a medium DMS reactivity, with an average of 0.395 out of 1, which indicated that this nucleotide was unlikely to be paired, as the DMS reactivities of base-paired As and Cs in the same DMS-MaPseq dataset are normally under 0.2. Reactivity information for A14 is missing because this region is part of the primer region in the PCR step.

• SL2

The Steitz model included only one stem loop (116-171 nt); the following region (172-200 nt) was not base paired. The DMS-MaPseq model showed major disagreement in this region. C114, which was originally located in the short linker region between SL1 and SL2, showed an extremely low reactivity of 0.065 out of 1 and should be considered base paired. C141, which was originally assigned as base paired, showed a high DMS reactivity of 0.525. C150 was originally assigned as unpaired, while the DMS-MaPseq data showed a low reactivity of 0.08. For the original linker region, where all the nucleotides were unpaired (157-160 nt), the average DMS reactivity was 0.08 for C157-C160, 0.17 for A172, 0.29 and 0.12 for C173 and C174, respectively, and 0.12 for C181. The nucleotides with low DMS reactivity were folded into two minor stem loops, SL2B and SL2C.

• SL3

The DMS-MaPseq data suggested that SL3 was longer than in the original model, as part of the flexible linker between SL2 and SL3 would pair with the linker region between SL3 and SL4. While the DMS-MaPseq model includes a flexible bulge in the lower region, the DMS reactivities of C282 and C284 (both under 0.1) did indicate base pairing in this region. The middle region of SL3 agreed with the Steitz model; however, the top region (U223-U250) showed some different base pairing patterns: the average DMS reactivity was 0.20 for C225-A228, 0.53 for C229-A231, and 0.17 for C233. The apical loop (C235, C237, A238 and A239) showed high reactivities from 0.32 to 1. C243-A245 also showed strong reactivities from 0.39 to 0.95. These data produced a secondary structure model that maximized the incorporation of both the DMS-MaPseq data and the energy-based structure calculation, which is shown in Figure 2.4.

• SL4

The secondary structure of SL4 determined here is the same as in the Steitz model. As the region mostly overlapped with the reverse primer region, DMS-MaPseq provided little information about this stem loop. The model adopted the secondary structure of the solved SL4 structure 30. However, as mentioned earlier, the data indicated that part of the linker between the original SL3 and SL4 (C282 and C284) folded into SL3 in the DMS-MaPseq model. An average DMS reactivity of 0.43 indicated that no additional minor stem loops formed in the original linker region.

• Comparison to the previous models

The comparison of the DMS-MaPseq model to the Steitz model indicated that the SL2 region was highly flexible and that its secondary structure differed between the two models. While the Price model agreed with the DMS-MaPseq model in general, 7SK snRNA adopted different secondary structures in some regions, mostly within the SL2 region [Figure 2.4B]. In both models, SL2 folded into one major stem loop (SL2A) and two minor ones (SL2B and SL2C). For SL2A, the Price model paired U120 with A143, whereas the DMS-MaPseq model indicated that U120 was more likely paired with A144, leaving A143 unpaired (Figure 2.4B, shown in green). Although A143 was consistently more reactive than A144 in all the DMS-MaPseq experiments, the average DMS reactivities were 0.39 and 0.36, respectively, a very small difference. The two models determined SL2B and SL2C differently. In the Price model, the region from C159 to A166 formed an apical loop with no base pairs in SL2B; in the DMS-MaPseq model, because the average DMS reactivities of C159 and C160 were as low as 0.04 and 0.11, respectively, close to the reactivities of the base-paired C157 and C158, the apical loop was formed from G161 to A166. As a consequence of fewer base pairs forming within SL2B in the Price model, the 3’ side of SL2B shifted to form two more base pairs in SL2C. One interesting observation was that C185 showed very low DMS reactivity but was not base paired in the DMS-MaPseq model. In the Price model, G171 was paired with C185 within SL2C, whereas in the DMS-MaPseq model, G171 was paired with C157 in the SL2B region. Stem loop formation in the region that forms SL2B and SL2C in the two models depended solely on how G170 and G171 were base paired.

When G170 and G171 were paired with U184 and C185, as the Price model suggested, G167 and G168 would be paired with C158 and C157, respectively. In the other scenario, indicated by the DMS-MaPseq model, when G170 and G171 were paired with C158 and C157, the following two Cs, C159 and C160, would be shifted into the apical loop in SL2B, leaving U184 and C185 unpaired. While the two models suggested different base pairing based on chemical reactivity profiles, it is plausible that in vitro prepared 7SK snRNA exists in equilibrating conformations.
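The reactivity-based reasoning used in the stem-loop comparisons above can be made concrete with a simple cutoff rule. The 0.2 cutoff and the reactivity values are taken from the text; treating the cutoff as a hard classifier is a simplification introduced here for illustration only.

```python
PAIRED_CUTOFF = 0.2  # from the text: base-paired As and Cs are "normally under 0.2"

# Average normalized reactivities quoted in this section.
REACTIVITY = {
    "C93": 0.395, "C114": 0.065, "C141": 0.525, "C150": 0.08,
    "A143": 0.39, "A144": 0.36, "C159": 0.04, "C160": 0.11,
}


def likely_paired(reactivity, cutoff=PAIRED_CUTOFF):
    """Heuristic call: low DMS reactivity suggests a protected (paired) base."""
    return reactivity < cutoff


calls = {nt: likely_paired(value) for nt, value in REACTIVITY.items()}
for nt, paired in calls.items():
    print(f"{nt}: {'likely paired' if paired else 'likely unpaired'}")
```

A144 falls above the cutoff even though the DMS-MaPseq model pairs it with U120, which shows why the reactivities are used as pseudoenergy restraints alongside the energy-based calculation rather than as hard calls.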

Preliminary Trial of Clustering on DMS_491

The in vitro transcribed 7SK snRNA used for DMS-MaPseq was tested using native PAGE. The RNA of one experimental replicate, DMS_491, showed two conformations. Further analysis showed that the R2 values from linear regression of DMS_491 against the other two datasets were 0.966 and 0.964, while the R2 between the other two datasets was 0.983 (data not shown). DMS_491 therefore differed more from the other two datasets than they differed from each other 18.
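For two replicate reactivity profiles, the R2 of a simple linear regression equals the squared Pearson correlation, which can be computed as in the sketch below. The function name and the toy profiles in the usage note are illustrative; the actual replicate data are not reproduced here.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("profiles must not be constant")
    return (sxy * sxy) / (sxx * syy)
```

For example, `r_squared([1, 2, 3, 4], [1, 3, 2, 4])` evaluates to 0.64; positions masked with -999 should be excluded from both profiles before the comparison.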

However, folding with the DMS_491 data alone yielded the same secondary structure as the other datasets. To better understand the impact of the minor differences between the datasets, DMS_491 was analyzed further in an attempt to identify and quantify potential alternative conformations of 7SK snRNA, with help from our collaborator, Dr. Silvi Rouskin. In this clustering method, instead of pooling all the aligned sequencing reads, the reads are subgrouped into several datasets based on the conservation of each sequencing read [Figure 2.5]. Further discussion of the methodology will be available in the forthcoming publication by Dr. Rouskin. In the preliminary trial of the clustering analysis, the whole dataset was reprocessed after the alignment step and clustered into two datasets. One of them, representing the majority of the 7SK snRNA [Figure 2.5A], adopted the same conformation as the Price model, while the other adopted a similar secondary structure with only minor differences in SL2 (the structures of each cluster are not available at this time).

2.5 Discussion

From the population average model determined by DMS-MaPseq, we noticed differences compared to the previous models. The Steitz model used chemical and enzymatic methods followed by conventional sequencing. The chemicals used in that method probed specific nucleotides (DMS for A and C; CMCT for U; kethoxal for G) directly at their base pairing faces. The Price model used a chemical that selectively acylates the 2’-hydroxyl groups in the RNA backbone and thus probes the base pairing of the nucleotides indirectly. While both methods probe the secondary structure of the RNA, the difference in modification site could introduce variations between the two models. Our method, DMS-MaPseq, combines the base-pairing probing reagent DMS with a state-of-the-art sequencing method, mutational profiling by sequencing.

The three models presented similar secondary structures at the 5’ and 3’ ends of the RNA (SL1 and SL4), which might indicate that the secondary structure of the RNA does not depend on the protein factors MePCE and LARP7, even though they stabilize the RNA and form the core 7SK snRNP. The major structural differences between the models lie within the middle region of 7SK snRNA, which suggests the potential coexistence of different conformations. Beyond the population average model of 7SK snRNA determined by DMS-MaPseq, further analysis is required to better study the conformational differences of 7SK snRNA and understand the biological functions of the RNA.

The issue with the population average data, and likely with the other models, comes from the underlying assumption that 7SK snRNA adopts only one secondary structure, despite the finding that functional RNAs often adopt multiple equilibrating conformations, which can function differently or cooperatively in biological activities. A population average model can be uninformative and somewhat misleading without a proper understanding of the RNA conformational heterogeneity. The clustering method under development can identify and quantify different RNA conformations from a single sequencing dataset through a computational algorithm.

As part of the project to determine the structure of full-length 7SK snRNA at high resolution, further steps have been planned based on the population average model of 7SK snRNA. Currently, 3D structural models of SL1 and SL4 have been determined by the D’Souza group30-31, and the structural model of SL3 has been determined (Chapter 3) with a hybrid method of NMR, SAXS and MD simulations. The challenges for SL2 structural characterization come from conformational heterogeneity. SL2 has not been observed to interact with protein factors, suggesting that this stem loop does not directly participate in binding events; however, it is possible that conformational change in this region ultimately controls the availability of the binding sites of 7SK snRNA to protein factors. In the equilibrium of 7SK snRNPs with P-TEFb and hnRNP A1, it is possible that P-TEFb selectively binds to the compact conformation (Figure 2.5A) so that the HEXIM1 dimer bound to SL1 is able to interact with and inhibit the kinase. Meanwhile, the interaction of HEXIM1 and P-TEFb locks 7SK snRNA in the compact form. When the HIV protein Tat dislodges P-TEFb, the complex is destabilized and HEXIM1 is therefore released from the complex. As a result, 7SK snRNA tends to adopt the loose conformation (Figure 2.5B), in which space is opened up within the SL3 region for hnRNP A1 interactions.

With the help of the clustering method, different conformations of 7SK snRNA can be identified and quantified. We propose to design constructs that “lock” the RNA in each of the conformations identified by the clustering method, so that each conformation can be characterized individually. Toward the goal of determining the high-resolution 3D structure of each stem loop of 7SK snRNA and of the full-length 7SK snRNA, future work is proposed to acquire shape information on each conformation using NMR, SEC-SAXS and cryo-EM methods in combination with MD simulation.

[Figure 2.1] Watson-Crick Base Pairing of RNA. N1 of adenine and N3 of cytosine are involved in W-C base pairing. While all the imino nitrogens (red) can be methylated by DMS in single strands, in base-paired regions N1 (A) and N3 (C) are protected and not accessible to DMS. Based on the methylation status of N1 (A) and N3 (C), the secondary structure of a region can be probed.

[Figure 2.2] Secondary Structure Model Determined by Chemical and Enzymatic Probing by the J. Steitz Group. The full length 7SK snRNA was 331 nt and expressed in vivo. In the model adopted from the original probing, Watson-Crick base pairs and G-U wobble base pairs were included while noncanonical base pairs were not considered. The full length 7SK snRNA folded into 4 stem loops with long linker regions between SL2, SL3 and SL4. As a visual aid, the sequence is circled every 10 nt and the sequence number is labeled every 30 nt (this numbering does not indicate any structural information).

[Figure 2.3] Secondary Structure Model Determined Using SHAPE by the D. Price Group. The full length 7SK snRNA was 332 nt and expressed in vivo. In the model adopted from the original probing, Watson-Crick base pairs and G-U wobble base pairs were included while noncanonical base pairs were not considered. As a visual aid, the sequence is circled every 10 nt and the sequence number is labeled every 30 nt.

[Figure 2.4] Secondary Structure Model Determined by DMS-MaPseq. The full length 7SK snRNA was 336 nt (3’ EcoRI site, 5’-GAATTC-3’) and expressed in vitro by T7 RNA polymerase. The model was determined by incorporating DMS-MaPseq data as pseudoenergy restraints to guide the folding using RNAStructure. In the model, Watson-Crick base pairs and G-U wobble base pairs were included while noncanonical base pairs were not considered. As a visual aid, the sequence is circled every 10 nt and the sequence number is labeled every 30 nt. To help compare the DMS-MaPseq model to the previous models: (A) the regions highlighted in red indicate secondary structure that differs from the Steitz model; (B) the regions highlighted in green indicate secondary structure that differs from the Price model.

[Figure 2.5] Two Preliminary Clustering Models Determined by DMS-MaPseq. The population average DMS-MaPseq dataset (DMS_491) was processed (method not published) and clustered into subgroups representing heterogeneous conformations of 7SK snRNA. Models were generated individually by incorporating the clustered DMS-MaPseq data as pseudoenergy restraints in RNAStructure. By default, W-C base pairs and G-U wobble base pairs were included while noncanonical base pairs were not considered. As a visual aid, the sequence is circled every 10 nt and the sequence number is labeled every 30 nt. (A) DMS_491_Cluster_1 model. (B) DMS_491_Cluster_2 model.

Reference

1. Peterlin, B. M.; Brogie, J. E.; Price, D. H., 7SK snRNA: a noncoding RNA that plays a major role in regulating eukaryotic transcription. Wiley Interdiscip Rev RNA 2012, 3 (1), 92-103.
2. Shumiatskii, G. P.; Tillib, S. V.; Dramerov, D. A., [B2 RNA and 7SK RNA, transcripts of RNA-polymerase III, have a cap-like structure at the 5'-end]. Mol Biol (Mosk) 1990, 24 (6), 1686-94.
3. Jeronimo, C.; Forget, D.; Bouchard, A.; Li, Q.; Chua, G.; Poitras, C.; Therien, C.; Bergeron, D.; Bourassa, S.; Greenblatt, J.; Chabot, B.; Poirier, G. G.; Hughes, T. R.; Blanchette, M.; Price, D. H.; Coulombe, B., Systematic analysis of the protein interaction network for the human transcription machinery reveals the identity of the 7SK capping enzyme. Mol Cell 2007, 27 (2), 262-74.
4. Van Herreweghe, E.; Egloff, S.; Goiffon, I.; Jady, B. E.; Froment, C.; Monsarrat, B.; Kiss, T., Dynamic remodelling of human 7SK snRNP controls the nuclear level of active P-TEFb. EMBO J 2007, 26 (15), 3570-80.
5. Krueger, B. J.; Jeronimo, C.; Roy, B. B.; Bouchard, A.; Barrandon, C.; Byers, S. A.; Searcey, C. E.; Cooper, J. J.; Bensaude, O.; Cohen, E. A.; Coulombe, B.; Price, D. H., LARP7 is a stable component of the 7SK snRNP while P-TEFb, HEXIM1 and hnRNP A1 are reversibly associated. Nucleic Acids Res 2008, 36 (7), 2219-29.
6. Krueger, B. J.; Varzavand, K.; Cooper, J. J.; Price, D. H., The mechanism of release of P-TEFb and HEXIM1 from the 7SK snRNP by viral and cellular activators includes a conformational change in 7SK. PLoS One 2010, 5 (8), e12335.
7. He, N.; Jahchan, N. S.; Hong, E.; Li, Q.; Bayfield, M. A.; Maraia, R. J.; Luo, K.; Zhou, Q., A La-related protein modulates 7SK snRNP integrity to suppress P-TEFb-dependent transcriptional elongation and tumorigenesis. Mol Cell 2008, 29 (5), 588-99.
8. Bayfield, M. A.; Yang, R.; Maraia, R. J., Conserved and divergent features of the structure and function of La and La-related proteins (LARPs). Biochim Biophys Acta 2010, 1799 (5-6), 365-78.
9. Xue, Y.; Yang, Z.; Chen, R.; Zhou, Q., A capping-independent function of MePCE in stabilizing 7SK snRNA and facilitating the assembly of 7SK snRNP. Nucleic Acids Res 2010, 38 (2), 360-9.
10. Barboric, M.; Yik, J. H.; Czudnochowski, N.; Yang, Z.; Chen, R.; Contreras, X.; Geyer, M.; Matija Peterlin, B.; Zhou, Q., Tat competes with HEXIM1 to increase the active pool of P-TEFb for HIV-1 transcription. Nucleic Acids Res 2007, 35 (6), 2003-12.

11. Muniz, L.; Egloff, S.; Ughy, B.; Jady, B. E.; Kiss, T., Controlling cellular P-TEFb activity by the HIV-1 transcriptional transactivator Tat. PLoS Pathog 2010, 6 (10), e1001152.
12. Mbonye, U.; Wang, B.; Gokulrangan, G.; Shi, W.; Yang, S.; Karn, J., Cyclin-dependent kinase 7 (CDK7)-mediated phosphorylation of the CDK9 activation loop promotes P-TEFb assembly with Tat and proviral HIV reactivation. J Biol Chem 2018, 293 (26), 10009-10025.
13. Wassarman, D. A.; Steitz, J. A., Structural analyses of the 7SK ribonucleoprotein (RNP), the most abundant human small RNP of unknown function. Mol Cell Biol 1991, 11 (7), 3432-45.
14. Michels, A. A.; Fraldi, A.; Li, Q.; Adamson, T. E.; Bonnet, F.; Nguyen, V. T.; Sedore, S. C.; Price, J. P.; Price, D. H.; Lania, L.; Bensaude, O., Binding of the 7SK snRNA turns the HEXIM1 protein into a P-TEFb (CDK9/cyclin T) inhibitor. EMBO J 2004, 23 (13), 2608-19.
15. Humphries, P.; Russell, S. E.; McWilliam, P.; McQuaid, S.; Pearson, C.; Humphries, M. M., Observations on the structure of two human 7SK pseudogenes and on homologous transcripts in vertebrate species. Biochem J 1987, 245 (1), 281-4.
16. Marz, M.; Donath, A.; Verstraete, N.; Nguyen, V. T.; Stadler, P. F.; Bensaude, O., Evolution of 7SK RNA and its protein partners in metazoa. Mol Biol Evol 2009, 26 (12), 2821-30.
17. Barboric, M.; Kohoutek, J.; Price, J. P.; Blazek, D.; Price, D. H.; Peterlin, B. M., Interplay between 7SK snRNA and oppositely charged regions in HEXIM1 direct the inhibition of P-TEFb. EMBO J 2005, 24 (24), 4291-303.
18. Rouskin, S.; Zubradt, M.; Washietl, S.; Kellis, M.; Weissman, J. S., Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 2014, 505 (7485), 701-5.
19. Silverman, I. M.; Berkowitz, N. D.; Gosai, S. J.; Gregory, B. D., Genome-Wide Approaches for RNA Structure Probing. Adv Exp Med Biol 2016, 907, 29-59.
20. Quarrier, S.; Martin, J. S.; Davis-Neulander, L.; Beauregard, A.; Laederach, A., Evaluation of the information content of RNA structure mapping data for secondary structure prediction. RNA 2010, 16 (6), 1108-17.
21. Xu, Z.; Culver, G., RNA structure experimental analysis--chemical modification. Methods Enzymol 2013, 530, 363-80.
22. Zubradt, M.; Gupta, P.; Persad, S.; Lambowitz, A. M.; Weissman, J. S.; Rouskin, S., DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo. Nat Methods 2017, 14 (1), 75-82.
23. Qin, Y.; Yao, J.; Wu, D. C.; Nottingham, R. M.; Mohr, S.; Hunicke-Smith, S.; Lambowitz, A. M., High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases. RNA 2016, 22 (1), 111-28.
24. Mathews, D. H.; Turner, D. H.; Watson, R. M., RNA Secondary Structure Prediction. Curr Protoc Nucleic Acid Chem 2016, 67, 11 2 1-11 2 19.

25. Xu, Z. Z.; Mathews, D. H., Experiment-Assisted Secondary Structure Prediction with RNAstructure. Methods Mol Biol 2016, 1490, 163-76. 26. Xu, Z. Z.; Mathews, D. H., Secondary Structure Prediction of Single Sequences Using RNAstructure. Methods Mol Biol 2016, 1490, 15-34. 27. Mathews, D. H., RNA Secondary Structure Analysis Using RNAstructure. Curr Protoc Bioinformatics 2014, 46, 12 6 1-25. 28. Novikova, I. V.; Hennelly, S. P.; Sanbonmatsu, K. Y., Tackling structures of long noncoding RNAs. Int J Mol Sci 2013, 14 (12), 23672-84. 29. Ueda, K.; Seki, T.; Kudo, T.; Yoshida, T.; Kataoka, M., Two distinct mechanisms cause heterogeneity of 16S rRNA. J Bacteriol 1999, 181 (1), 78-82. 30. Durney, M. A.; D'Souza, V. M., Preformed protein-binding motifs in 7SK snRNA: structural and thermodynamic comparisons with retroviral TAR. J Mol Biol 2010, 404 (4), 555-67. 31. Pham, V. V.; Salguero, C.; Khan, S. N.; Meagher, J. L.; Brown, W. C.; Humbert, N.; de Rocquigny, H.; Smith, J. L.; D'Souza, V. M., HIV-1 Tat interactions with cellular 7SK and viral TAR RNAs identifies dual structural mimicry. Nat Commun 2018, 9 (1), 4266. 32. Qi, X.; Zhang, F.; Su, Z.; Jiang, S.; Han, D.; Ding, B.; Liu, Y.; Chiu, W.; Yin, P.; Yan, H., Programming molecular topologies from single-stranded nucleic acids. Nat Commun 2018, 9 (1), 4579. 33. Schulze-Gahmen, U.; Echeverria, I.; Stjepanovic, G.; Bai, Y.; Lu, H.; Schneidman-Duhovny, D.; Doudna, J. A.; Zhou, Q.; Sali, A.; Hurley, J. H., Insights into HIV-1 proviral transcription from integrative structure and dynamics of the Tat:AFF4:P-TEFb:TAR complex. Elife 2016, 5. 34. Diribarne, G.; Bensaude, O., 7SK RNA, a non-coding RNA regulating P- TEFb, a general transcription factor. RNA Biol 2009, 6 (2), 122-8. 35. Zhang, H.; Rigo, F.; Martinson, H. G., Poly(A) Signal-Dependent Transcription Termination Occurs through a Conformational Change Mechanism that Does Not Require Cleavage at the Poly(A) Site. Mol Cell 2015, 59 (3), 437-48. 36. 
Muniz, L.; Egloff, S.; Kiss, T., RNA elements directing in vivo assembly of the 7SK/MePCE/Larp7 transcriptional regulatory snRNP. Nucleic Acids Res 2013, 41 (8), 4686-98. 37. Markert, A.; Grimm, M.; Martinez, J.; Wiesner, J.; Meyerhans, A.; Meyuhas, O.; Sickmann, A.; Fischer, U., The La-related protein LARP7 is a component of the 7SK ribonucleoprotein and affects transcription of cellular and viral polymerase II genes. EMBO Rep 2008, 9 (6), 569-75. 38. Yik, J. H.; Chen, R.; Nishimura, R.; Jennings, J. L.; Link, A. J.; Zhou, Q., Inhibition of P-TEFb (CDK9/Cyclin T) kinase and RNA polymerase II transcription by the coordinated actions of HEXIM1 and 7SK snRNA. Mol Cell 2003, 12 (4), 971-82. 39. Lu, H.; Li, Z.; Xue, Y.; Schulze-Gahmen, U.; Johnson, J. R.; Krogan, N. J.; Alber, T.; Zhou, Q., AFF1 is a ubiquitous P-TEFb partner to enable Tat

extraction of P-TEFb from 7SK snRNP and formation of SECs for HIV transactivation. Proc Natl Acad Sci U S A 2014, 111 (1), E15-24. 40. Schulze-Gahmen, U.; Upton, H.; Birnberg, A.; Bao, K.; Chou, S.; Krogan, N. J.; Zhou, Q.; Alber, T., The AFF4 scaffold binds human P-TEFb adjacent to HIV Tat. Elife 2013, 2, e00327. 41. Barrandon, C.; Bonnet, F.; Nguyen, V. T.; Labas, V.; Bensaude, O., The transcription-dependent dissociation of P-TEFb-HEXIM1-7SK RNA relies upon formation of hnRNP-7SK RNA complexes. Mol Cell Biol 2007, 27 (20), 6996-7006. 42. Hogg, J. R.; Collins, K., RNA-based affinity purification reveals 7SK RNPs with distinct composition and regulation. RNA 2007, 13 (6), 868-80. 43. Flynn, R. A.; Do, B. T.; Rubin, A. J.; Calo, E.; Lee, B.; Kuchelmeister, H.; Rale, M.; Chu, C.; Kool, E. T.; Wysocka, J.; Khavari, P. A.; Chang, H. Y., 7SK-BAF axis controls pervasive transcription at enhancers. Nat Struct Mol Biol 2016, 23 (3), 231-8. 44. Brogie, J. E.; Price, D. H., Reconstitution of a functional 7SK snRNP. Nucleic Acids Res 2017, 45 (11), 6864-6880. 45. Shapiro, R.; Law, D. C.; Weisgras, J. M., A new chemical probe for single- stranded RNA. Biochem Biophys Res Commun 1972, 49 (2), 358-63. 46. Moore, G., Chemical modification of ribosomes with dimethyl sulfate: a probe to the structural organization of ribosomal proteins and RNA. Can J Biochem 1975, 53 (3), 328-37. 47. Yamakawa, M.; Shatkin, A. J.; Furuichi, Y., Chemical methylation of RNA and DNA viral genomes as a probe of in situ structure. J Virol 1981, 40 (2), 482-90. 48. Tijerina, P.; Mohr, S.; Russell, R., DMS footprinting of structured RNAs and RNA-protein complexes. Nat Protoc 2007, 2 (10), 2608-23. 49. Lusvarghi, S.; Sztuba-Solinska, J.; Purzycka, K. J.; Rausch, J. W.; Le Grice, S. F., RNA secondary structure prediction using high-throughput SHAPE. J Vis Exp 2013, (75), e50243. 50. Low, J. T.; Weeks, K. M., SHAPE-directed RNA secondary structure prediction. Methods 2010, 52 (2), 150-8. 51. 
Bellaousov, S.; Reuter, J. S.; Seetin, M. G.; Mathews, D. H., RNAstructure: Web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res 2013, 41 (Web Server issue), W471-4. 52. Reuter, J. S.; Mathews, D. H., RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 2010, 11, 129.

Chapter 3: Structural Insights into 7SK SL3 and Its Interaction with HnRNP A1

3.1 Abstract

Entry into productive transcriptional elongation requires transactivation of the paused RNAPII complex by P-TEFb, whose activity is primarily inhibited through association with 7SK snRNA and the inhibitory factor HEXIM1, forming the 7SK-HEXIM1-P-TEFb snRNP complex 1-4. Upon release of P-TEFb, 7SK snRNA nucleates an alternative set of protein factors that includes various heterogeneous nuclear ribonucleoproteins (hnRNPs) 5-7. So far, hnRNP A1, A2/B1, Q1 and Q3, R, and K have been observed in association with 7SK snRNA 8-9. It is still unclear why and how these mutually exclusive 7SK-hnRNP snRNPs interconvert. While the mechanism requires further study, P-TEFb and hnRNPs form mutually exclusive complexes with the core 7SK snRNP, which contains LARP7 and MePCE, establishing an equilibrium between the active and inactive pools of P-TEFb 8-10. This transition implies a potential regulatory mechanism for P-TEFb. To help elucidate these issues, we started with one of the most widely studied hnRNPs, hnRNP A1, in the hope of understanding the interactions within the 7SK-hnRNP A1 snRNP complex.

Different 7SK snRNA constructs were used for the different structural approaches. Full-length 7SK snRNA was used for the differential DMS-MaPseq study; SL3D (97 nt, Figure 3.5A) was used to acquire the tertiary shape of SL3 by SEC-SAXS; SL3S (75 nt, Figure 3.5B) was used for NMR experiments to validate the secondary structure determined by DMS-MaPseq (Chapter 2); and SL3up (57 nt, Figure 3.5C) was used for EMSA, ITC and HSQC titration studies to reduce the chance of nonspecific binding during the interaction with hnRNP A1. For the protein, unwinding protein 1 (UP1, hnRNP A1-RRM12) (Figure 3.1) was used in place of full-length hnRNP A1 to avoid complications from the potential precipitation and dimerization of the full-length protein. EMSA suggested that this replacement did not significantly change the binding pattern of the RNA (Figure 3.2).

In this study, we find by differential DMS-MaPseq that hnRNP A1 binds specifically to SL3 within full-length 7SK snRNA (Figure 3.3). The DMS-MaPseq model predicted the binding sites, which were included in the segmental 7SK snRNA constructs (Figure 3.5). A combination of EMSA (Figure 3.2) and ITC (Figure 3.6) data validates the hnRNP A1 binding sites revealed by differential DMS-MaPseq. The 15N-1H HSQC titration with hnRNP A1 suggested local base pairing rearrangement of SL3up upon hnRNP A1 binding (Figure 3.8). Lastly, integrating the results from NMR (Figure 3.7) and SEC-SAXS (Figure 3.9) with MD simulation, we calculated a 3D structural model of SL3 (Figure 3.10).

3.2 Introduction

HnRNPs are RNA-binding proteins that are almost ubiquitously expressed in vertebrates 11-12 and are mostly found in complex with heterogeneous nuclear RNA (hnRNA). Many hnRNPs share general features but differ in domain composition and functional properties. This multifunctional protein family is involved in most stages of RNA metabolism, including assisting in the modification and maturation of newly formed hnRNAs (pre-mRNAs) into messenger RNAs (mRNAs), stabilizing mRNAs during their cellular transport, and controlling their translation 9, 11. While all hnRNPs are present in the nucleus, some appear to shuttle between the nucleus and the cytoplasm.

Because the transcripts encoding most hnRNPs undergo a series of alternative splicing (AS) events and the resulting proteins undergo post-translational modifications (PTMs), this protein family has evolved remarkably diverse functions and works in concert in normal and pathological gene regulation 13. Considering their functional diversity and complexity, the role of hnRNPs in regulating gene expression has been widely studied, especially in disease research 11, 14-15. The expression level of hnRNPs is highly correlated with the stage of tumorigenesis in many types of cancer. In addition to cancer, many hnRNPs have also been linked to various neurodegenerative diseases, such as spinal muscular atrophy (SMA), amyotrophic lateral sclerosis (ALS), and Alzheimer’s disease (AD). In AIDS/HIV-1, hnRNPs are actively involved in AS, regulating the gene expression of HIV 11, 15.

In this chapter, we focus on the role of one of the best-known members of this protein family, hnRNP A1, in HIV-1 gene expression. Although initially discovered as a DNA-binding protein, hnRNP A1 has been identified as one of the most abundant hnRNPs in RNA metabolism. The protein contains two tandem, antiparallel RNA recognition motifs (RRMs), RRM1 and RRM2 16, a glycine-rich C-terminus used for protein-protein interactions 17, and a non-canonical nuclear export/import signal (M9) (Figure 3.1A) 18. Proteolytic cleavage isolates the tandem RRMs, which were first identified by their unwinding effect on RNA upon binding and hence named “unwinding protein 1” (UP1) 16, 19-20. HnRNP A1 recognizes single-stranded nucleic acids, and the crystal structure of UP1 has been solved in complex with both DNA and RNA 16, 19, 21. While the exact binding preferences reported in different studies vary slightly, systematic evolution of ligands by exponential enrichment (SELEX) identified 5′-YAG-3′ (Y = C or U) as the “winner” sequence; recognition is achieved through ionic interactions and base stacking with aromatic residues such as F17 in RRM1 (Figure 2.2b) 22. In recognizing the HIV exon splicing silencer 3 stem loop, only RRM1, aided by aromatic stacking through the linker, contacts and binds the RNA 19. However, domain swapping, deletion, and duplication experiments have shown that RRM1 and RRM2 are non-redundant in their binding properties 23. Indeed, global analysis of hnRNP A1 interactions highlights unique binding preferences not only for RNA sequence but also for RNA secondary structure 22.

While hnRNP A1 is widely studied in basic cellular and pathological RNA processing events such as AS, mRNA stabilization, nuclear export (NE) and translation, its function in transcriptional elongation is poorly understood 24. One poorly characterized aspect of hnRNP A1 function is how it regulates 7SK snRNA availability. As introduced in the previous chapter, transcription by RNAPII is closely regulated by P-TEFb, most of which is sequestered in the 7SK-P-TEFb-HEXIM snRNP complex. This complex can transition to a 7SK-hnRNP snRNP, thereby releasing P-TEFb. HnRNP A1 has been shown to bind 7SK snRNA and to promote dissociation of P-TEFb from the 7SK-P-TEFb-HEXIM snRNP complex. P-TEFb in turn assembles onto the inactive RNA polymerase II (RNAPII) transcription complex to activate transcriptional elongation. Previous studies suggest that hnRNP A1 is important for releasing P-TEFb from the 7SK snRNP complex by competing with HEXIM1; however, the series of molecular events that promote the release of P-TEFb is poorly understood. This event is fundamentally important for HIV gene expression.

While previous crosslinking immunoprecipitation coupled with sequencing (CLIP-seq) and co-immunoprecipitation experiments demonstrated that hnRNP A1 associates with SL3 of 7SK snRNA in cells 25-26, it is not clear whether the association is direct or indirect. We predicted that the interaction is direct, because hnRNP A1 tends to bind directly to consensus 5′-YAG-3′ motifs and its specificity is centered on the conformationally exposed 5′-AG-3′ dinucleotide, although binding affinities depend on the surrounding sequence and structural context. The 7SK snRNA sequence contains a total of 24 5′-AG-3′ motifs, 14 of which are located in the SL3 region. However, most of these are buried within base-paired regions, which hinders recognition by hnRNP A1 27-28.

To elucidate the molecular mechanism of the transition between 7SK snRNPs, we designed studies to focus on the specific interaction between hnRNP A1 and 7SK snRNA. We focused on the UP1 domain of hnRNP A1 because the glycine-rich C-terminal region of the full-length protein significantly increases protein-protein interactions, which promote cooperative protein binding 17, 27, dimerization and precipitation.

To start, we studied the 7SK-hnRNP A1 complex using differential DMS-MaPseq. This method is a comparative study in which the RNA is probed and sequenced twice, once in its native form and once in its protein-bound form. In general, a bound protein shields the nucleotides it contacts, so that they become less accessible to DMS modification and show lower DMS reactivity in the sequencing profile than in the profile of the native RNA. However, the accessibility is very sensitive: slight changes in experimental conditions can change the patterns in the result. Moreover, the unwinding effect of UP1 changes the RNA base pairing upon binding, so that more DMS modification is detected or the shielding effect is offset.

To find the ideal conditions, a DMS titration was performed along with an hnRNP A1 (UP1) titration to optimize the binding and modification conditions (data not shown). Through this method, we identified putative binding sites of hnRNP A1 on SL3 of 7SK snRNA, which in general agreed with previous reports. For the structural experiments, smaller RNA constructs that include the two specific binding sites on SL3 of 7SK snRNA were designed based on the DMS-MaPseq model. In total, three constructs sharing the overlapping region from C210 to G264 were designed for different types of experiments: SL3up (57 nt, Figure 3.5C), SL3S (75 nt, Figure 3.5B) and SL3D (97 nt, Figure 3.5A).

We first performed electrophoretic mobility shift assays (EMSA) on SL3up using hnRNP A1 (Figure 3.2A) and UP1 (Figure 3.2B) as ligands. As expected, UP1 binds to SL3up with a Kd of 42±3 nM, similar to hnRNP A1. Therefore, the study of 7SK snRNA with hnRNP A1 was performed with UP1 in place of hnRNP A1.

To validate the secondary structure of the construct by NMR, we began with SL3D, which showed only very weak NOEs that could not be assigned. However, we were able to assign a majority of the NOEs collected with the SL3S construct (Figure 3.7A), except for the bottom internal loops located at the end of SL3D. The assignment was aided by the 15N-1H HSQC spectrum of a selectively 15N-labeled G and U SL3S sample (Figure 3.7B), in which only base-paired G and U appear. A 15N-labeled G and U SL3up sample was also titrated with UP1, and the spectra were collected using SOFAST 1H-15N HSQC. Signals from three Gs in GC base pairs emerged in the spectrum (labeled with red arrows), which indicated local base pairing rearrangement within the construct upon UP1 binding (Figure 3.8).

To study the thermodynamics of the interaction, isothermal titration calorimetry (ITC) was performed with the SL3up construct and UP1. The result validated the binding sites of hnRNP A1 that were revealed by DMS-MaPseq. The binding affinity (Kd) was 42±3 nM, and the binding stoichiometry of UP1 to SL3up was around 2.16, consistent with the result from differential DMS-MaPseq.

To build the 3D model for the region, small angle X-ray scattering in line with size exclusion chromatography (SEC-SAXS) was performed on SL3D as well as on its 1:1 complex with UP1. The overall molecular envelopes of the free SL3D RNA and its 1:1 complex are shown in Figure 3.9.

Finally, we combined the results from several structure determination methods, including DMS-MaPseq, NMR and SAXS, with molecular dynamics (MD) simulations to generate a calculated structural model for SL3D (Figure 3.10).

3.3 Materials and methods

RNA synthesis and purification

The full-length 7SK snRNA and SL3D were subcloned into the pUC19 vector using the methods detailed in the previous chapter. The linearized plasmid template was desalted and stored at -20 °C. For SL3up and SL3S, synthetic DNA oligo templates were purchased (IDT). Uniformly 13C/15N-labeled uridine (UTP) and guanosine (GTP) triphosphates (Cambridge Isotope Laboratories) and fully protonated adenosine (ATP) and cytidine (CTP) triphosphates (Sigma-Aldrich) were used to prepare the labeled SL3up and SL3S RNA samples for SOFAST HSQC titration, while the RNA samples for all other studies were prepared with fully protonated NTPs (Sigma-Aldrich).

Transcription reactions were optimized in individual trials following the protocol detailed in the previous chapter. Large-scale synthesis was scaled up according to the optimized condition for each construct. The SL3 constructs were purified to homogeneity by 8-10% urea-PAGE and electroeluted in Tris-borate-EDTA (TBE) buffer. The RNA samples were desalted and adjusted to under 20 mM, as measured by NanoDrop (ThermoFisher Scientific). All desalted RNA samples were annealed in RNase-free water by heating to 95 °C for 2 min followed by flash cooling on ice for more than 15 min. The annealed samples were concentrated using a centrifugal filtration system (Amicon) and purified on a size exclusion column (SEC), exchanging into the buffer specific to each downstream application: for NMR, including NOESY and HSQC, 10 mM K2HPO4, 50 mM KCl, pH 6.5, 10% D2O; for ITC, 10 mM K2HPO4, 120 mM KCl, 1 mM TCEP, 0.5 mM EDTA, pH 6.5; for SEC-SAXS, 50 mM KCl and 5 mM MES, pH 6.5. The refolded samples were prepared fresh for the experiments and kept at 4 °C if needed.

UP1 purification

The C-terminal (His)6-tagged UP1 protein (residues 1-196) was prepared as previously described 19. In short, the UP1 construct was overexpressed in BL21(DE3) cells and purified by nickel affinity chromatography on HiTrap columns (GE Biosciences) using high-salt binding, washing and elution buffers (1.2 M NaCl, 20 mM Na2HPO4, pH 7.5) containing 10 mM, 20 mM and 250 mM imidazole, respectively. Once eluted, the UP1 construct was exchanged into ITC buffer (120 mM KCl, 10 mM K2HPO4, 1 mM TCEP and 0.5 mM EDTA, pH 6.5) by FPLC using a HiPrep 16/60 Sephacryl S-100 column (Pharmacia Biotech). Fractions coinciding with a single peak were collected. The final sample was tested for purity on a 10% SDS denaturing gel and stored at 4 °C until use. Freshly purified sample was preferred for the experiments.

DMS-MaPseq

The detailed protocol for DMS-MaPseq is given in the previous chapter, including the DMS modification, RT-PCR, Next Generation Sequencing and data processing.

For the UP1 titration samples, the complexes were formed and incubated in the DMS modification buffer at room temperature for 30 min before adding 2-5% DMS. For each complex sample, 1 µg of native 7SK snRNA (stocked at 200 ng/µL) was refolded before adding buffers and UP1. The final concentration of the RNA was adjusted to 100 nM (1 µg/100 µL). Based on the titration ratio, UP1 was pre-diluted to the required concentration: 1 µM, 2 µM, 4 µM, 10 µM, or 25 µM. For each sample, 10 µL of UP1, 5 µL of RNA solution and 80 µL of DMS buffer were combined to form the complex. The titration samples were examined by EMSA to ensure that the complex had formed.
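
The concentrations quoted above can be checked with a short calculation. The sketch below is illustrative only: the 7SK snRNA length (331 nt) and the average nucleotide mass (320.5 g/mol) are assumptions that are not stated in this chapter, and the nominal 100 µL final volume is taken from the text even though the listed component volumes sum to 95 µL.

```python
# Assumed values (not given in this chapter): RNA length and mean nucleotide mass.
RNA_LENGTH_NT = 331      # assumed length of full-length 7SK snRNA
AVG_NT_MASS = 320.5      # assumed average mass per nucleotide, g/mol


def rna_concentration_nM(mass_ug, volume_ul, length_nt=RNA_LENGTH_NT):
    """Convert an RNA mass in a given volume to a molar concentration in nM."""
    mol_weight = length_nt * AVG_NT_MASS        # g/mol
    moles = mass_ug * 1e-6 / mol_weight         # mol
    litres = volume_ul * 1e-6                   # L
    return moles / litres * 1e9                 # nM


def final_concentration_nM(stock_uM, added_ul, total_ul):
    """Concentration after diluting `added_ul` of stock into `total_ul`."""
    return stock_uM * 1000.0 * added_ul / total_ul


# 1 µg of RNA in a nominal 100 µL reaction.
rna_nM = rna_concentration_nM(1.0, 100.0)

# 10 µL of each UP1 pre-dilution in a nominal 100 µL reaction,
# expressed relative to the nominal 100 nM RNA concentration.
up1_stocks_uM = [1, 2, 4, 10, 25]
ratios = [final_concentration_nM(s, 10.0, 100.0) / 100.0 for s in up1_stocks_uM]

print(f"RNA concentration: {rna_nM:.1f} nM")
print("UP1:RNA ratios:", ratios)
```

Under these assumptions the RNA concentration comes out slightly below the nominal 100 nM, and the five UP1 pre-dilutions correspond to nominal UP1:RNA ratios of 1, 2, 4, 10 and 25.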

Differential DMS-MaPseq

All DMS-MaPseq profiles were processed using the conventional protocol, including the internal normalization detailed in the previous chapter. To obtain the differential DMS reactivity between the native 7SK snRNA dataset and the 7SK-UP1 snRNP complex dataset, z-scores were calculated for each dataset individually to normalize the overall DMS reactivity of the three samples. For individual background subtraction, the untreated sample of the same titration was used to correct for the background mismatch rate. The normalized datasets for each titration point were then used to obtain the differential DMS reactivity upon UP1 binding.
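
The normalization described above can be sketched in a few lines. This is a minimal illustration rather than the pipeline used in this work: the function names and the toy mismatch rates are hypothetical, and flooring the background-subtracted values at zero is an assumption of the sketch.

```python
from statistics import mean, pstdev


def background_subtract(treated, untreated):
    """Subtract the untreated mismatch rate per position (floored at zero; an assumption)."""
    return [max(t - u, 0.0) for t, u in zip(treated, untreated)]


def z_scores(values):
    """Centre a profile on its mean and scale by its population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("profile has no variation; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


def differential_reactivity(free_treated, free_untreated,
                            bound_treated, bound_untreated):
    """Per-position difference (bound minus free) of z-scored reactivities."""
    free = z_scores(background_subtract(free_treated, free_untreated))
    bound = z_scores(background_subtract(bound_treated, bound_untreated))
    return [b - f for b, f in zip(bound, free)]


# Hypothetical mismatch rates for four positions (not experimental data).
free_treated = [0.010, 0.050, 0.020, 0.060]
free_untreated = [0.002, 0.002, 0.002, 0.002]
bound_treated = [0.010, 0.020, 0.050, 0.060]
bound_untreated = [0.002, 0.002, 0.002, 0.002]

delta = differential_reactivity(free_treated, free_untreated,
                                bound_treated, bound_untreated)
print([round(d, 2) for d in delta])
```

In this toy profile the second position becomes less reactive in the complex (negative value) and the third becomes more reactive (positive value), while the other two positions are unchanged.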

ITC

All calorimetric titrations were performed at 25 °C on a VP-ITC calorimeter (MicroCal, LLC) as described previously 19. UP1 was exchanged into ITC buffer by gel filtration. 7SK SL3up was prepared in titration buffer using a centrifugal filter device (Millipore Amicon) and subsequently annealed by heating at 95 °C for 2 min, followed by flash cooling on ice. UP1 at 50 µM was titrated into ~1.4 mL of 2.0-2.5 µM SL3up over 42 injections of 6 µL each. Prior to non-linear least-squares fitting in Origin v7.0, the raw data were corrected for dilution by subtracting the average heats from the saturated upper asymptotes.
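
As a rough check of the titration design, the cumulative UP1:SL3up molar ratio after each injection can be estimated from the quantities above. The sketch below is a simplification and not the fitting model used in Origin: it ignores the volume displaced from the calorimeter cell with each injection, and the function name is hypothetical.

```python
def molar_ratios(n_injections, injection_ul, titrant_uM, cell_ul, cell_uM):
    """Cumulative titrant:cell molar ratio after each injection.

    Simplification: displaced cell volume is ignored, so the amount of RNA
    in the cell is treated as constant.
    """
    cell_pmol = cell_ul * cell_uM                   # µL x µM = pmol
    per_injection_pmol = injection_ul * titrant_uM  # pmol of UP1 per injection
    return [i * per_injection_pmol / cell_pmol
            for i in range(1, n_injections + 1)]


# 42 injections of 6 µL of 50 µM UP1 into ~1.4 mL of 2.0 or 2.5 µM SL3up.
low_rna = molar_ratios(42, 6.0, 50.0, 1400.0, 2.0)
high_rna = molar_ratios(42, 6.0, 50.0, 1400.0, 2.5)

print(f"final ratio at 2.0 µM RNA: {low_rna[-1]:.2f}")
print(f"final ratio at 2.5 µM RNA: {high_rna[-1]:.2f}")
```

Under this simplification the titration ends at a UP1:SL3up ratio between roughly 3.6 and 4.5, that is, past a 2:1 equivalence point.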

SAXS data acquisition and analysis

SL3D (188-284 nt) RNA for size exclusion chromatography in line with small angle X-ray scattering (SEC-SAXS) was prepared as described above using fully protonated, unlabeled NTPs. SEC-SAXS experiments were performed at BioCAT (Beamline 18-ID) at the Advanced Photon Source (Argonne National Laboratory; Lemont, IL). To minimize structure factor effects such as aggregation or repulsion, SAXS experiments were performed in 50 mM KCl and 5 mM MES buffer (pH 6.5) at a concentration of ~3 mg/mL in a 200 µL load volume for SEC. The SAXS data were collected with 0.5 s exposures every 3 s for the duration of the SEC run. Ten points coinciding with a single SEC peak were taken as sample + buffer, and 100 points coinciding with the SEC baseline directly prior to the sample peak were taken as buffer only. Buffer-only scattering was subtracted from sample + buffer scattering to obtain the solution scattering from the RNA. After initial processing, PRIMUS from the ATSAS suite of small angle X-ray scattering programs was used to visualize the data. Guinier fitting (Rg × q < 1.3) was used to check for non-negligible structure factors (aggregation or repulsion) and to determine the radius of gyration (Rg). GNOM was used to fit the SAXS data and generate the pairwise distance distribution (p(r)), from which the maximum particle dimension (Dmax) was determined. The molecular envelope of the SL3D construct was determined using DAMMIF in fast mode. In short, ab initio models were generated from the GNOM fit to the SAXS data. The models were then averaged using DAMAVER, and the most populated (probable) models were determined using DAMFILT. The overall normalized spatial discrepancy (NSD) for all 32 models was ~0.6. The final ab initio molecular envelope was overlaid onto the atomic model for visualization using SUPCOMB, allowing for enantiomers and fitting in fast mode.
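
The Guinier analysis mentioned above rests on the approximation ln I(q) ≈ ln I(0) − (Rg² q²)/3 at low q. The sketch below illustrates that fit on synthetic data; it is not the PRIMUS implementation, the function name and the iterative choice of the fitting range are assumptions of the sketch, and the synthetic Rg of 30 Å is not a measured value.

```python
import math


def guinier_fit(q, intensity, qrg_max=1.3, max_iter=20):
    """Estimate Rg and I(0) by linear regression of ln I against q^2.

    Assumes q is sorted in ascending order. The fitting range is shrunk
    iteratively until every point used satisfies q * Rg < qrg_max.
    """
    def line_fit(qs, values):
        x = [v * v for v in qs]
        y = [math.log(v) for v in values]
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((a - mx) ** 2 for a in x)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        slope = sxy / sxx
        return slope, my - slope * mx

    n_use = len(q)
    for _ in range(max_iter):
        slope, intercept = line_fit(q[:n_use], intensity[:n_use])
        if slope >= 0:
            raise ValueError("non-negative Guinier slope; Rg is undefined")
        rg = math.sqrt(-3.0 * slope)
        n_new = sum(1 for v in q if v * rg < qrg_max)
        if n_new < 3:
            raise ValueError("too few points inside the Guinier range")
        if n_new == n_use:
            break
        n_use = n_new
    return rg, math.exp(intercept), n_use


# Synthetic, noise-free Guinier curve (Rg = 30 Å, I(0) = 100); not real data.
q = [0.005 + 0.001 * i for i in range(96)]                  # Å^-1
intensity = [100.0 * math.exp(-(v * 30.0) ** 2 / 3.0) for v in q]

rg, i0, n_used = guinier_fit(q, intensity)
print(f"Rg = {rg:.2f} Å, I(0) = {i0:.2f}, points used = {n_used}")
```

With real data, curvature or an upturn of ln I versus q² within this range would point to the aggregation or repulsion effects that the Guinier check is meant to detect.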

NMR data acquisition

All NMR experiments were performed on Avance 900/800/700 MHz high-field NMR spectrometers (Bruker) equipped with cryogenically cooled HCN triple-resonance probes and a z-axis pulsed-field gradient accessory. After collection, all data were processed using NMRPipe/NMRDraw 22 and analyzed and assigned using NMRViewJ 23. All NMR experiments on SL3 constructs were conducted in the NMR buffer. Hydrogen bonding was assigned by collecting exchangeable 1H-1H (imino) spectra in 90% H2O/10% D2O at 283 K using a Watergate NOESY (Tm = 200 ms) pulse sequence on fully protonated SL3S. 1H-15N HSQC spectra were collected to verify the imino assignments using a selectively 15N/13C-labeled G and U (fully protonated A and C) SL3S sample.

NMR titrations of UP1-7SK SL3up

SOFAST HSQC titrations were carried out using the selectively 15N/13C-labeled 7SK SL3up construct and unlabeled UP1. Spectra were collected at four protein:RNA molar ratios (0.25, 0.5, 0.75 and 1) in 10 mM K2HPO4, 50 mM KCl, pH 6.50, at 288 K.

Structural modeling

Nucleotide sequence and secondary structure information were submitted to RNAComposer for coarse tertiary structure modeling 29. Ten structures output in PDB format were then chosen for further simulations in Amber with the ff99OL3 force field 30. The selected structures were subjected to energy minimization in Amber, followed by a 50 ns MD simulation, cooling to 0 K, and fitting to the SAXS data for 100 ps with the Emap function in Amber.

Minimization and subsequent calculations were performed in implicit solvent with the generalized Born model (igb=1). Structures were prepared for simulation in Amber using tLEaP. Energy minimization consisted of 4000 cycles in total, 2000 cycles of steepest descent followed by 2000 cycles of conjugate gradient (ntmin=1, ncyc=2000). Following minimization, simulations were conducted using GPU-accelerated pmemd on NVIDIA Tesla P100 GPUs. Simulations employed a 999.9 Å non-bonded cutoff with a 10 Å cutoff for the calculation of Born radii. Structures were simulated for 50 ns with a 2 fs timestep (dt=0.002). To control the temperature, Langevin dynamics with a collision frequency of 2.0 ps⁻¹ was used 19.
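
For orientation, the settings listed above can be collected into Amber-style &cntrl namelists. The sketch below is an assumption-laden illustration, not the input files used in this work: only the values quoted in the text are taken from it (igb, maxcyc, ncyc, ntmin, the two cutoffs, the timestep and the Langevin collision frequency); imin, ntb, ntt, nstlim and the helper function are additions of the sketch; and settings not stated in the text (for example target temperature, bond constraints and output frequency) are deliberately left out.

```python
def render_cntrl(title, params):
    """Render a dict of settings as an Amber-style &cntrl namelist (sketch only)."""
    body = ",\n".join(f"  {key} = {value}" for key, value in params.items())
    return f"{title}\n &cntrl\n{body},\n /\n"


# Shared implicit-solvent settings quoted in the text.
implicit_solvent = {
    "ntb": 0,         # no periodic box (assumed, as is usual for generalized Born)
    "igb": 1,         # generalized Born model
    "cut": 999.9,     # non-bonded cutoff, Å
    "rgbmax": 10.0,   # cutoff for the Born radii calculation, Å
}

minimization = {
    "imin": 1,        # run minimization
    "maxcyc": 4000,   # total minimization cycles
    "ncyc": 2000,     # steepest descent cycles before conjugate gradient
    "ntmin": 1,
    **implicit_solvent,
}

total_ns = 50
dt_ps = 0.002                                  # 2 fs timestep
nstlim = round(total_ns * 1000 / dt_ps)        # number of MD steps for 50 ns

production = {
    "imin": 0,        # run molecular dynamics
    "nstlim": nstlim,
    "dt": dt_ps,
    "ntt": 3,         # Langevin thermostat
    "gamma_ln": 2.0,  # collision frequency, ps^-1
    **implicit_solvent,
}

minimization_in = render_cntrl("Minimization (sketch)", minimization)
production_in = render_cntrl("50 ns implicit-solvent MD (sketch)", production)
print(minimization_in)
print(production_in)
```

A 50 ns trajectory at a 2 fs timestep corresponds to 25,000,000 integration steps, which is the value the sketch assigns to nstlim.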

The final frame of each simulation was extracted using cpptraj, producing the initial structures for further refinement. For the final refinement using both NMR and SAXS data, structures were simulated for 100 ps at 0 K with sander. The non-bonded cutoff was reduced to 24 Å, and the simulation was not run as a restart from the cool-down. SAXS data were converted into a restraint file by creating a SAXS molecular envelope with DAMMIF.

The molecular envelope was then aligned to each structure using SUPCOMB, creating a restraint map specific to each individual structure. Each aligned molecular density map was incorporated as a shape restraint using the Emap function in Amber. The Emap function employed an fcons of 0.005, with a 5 Å map around the SAXS envelope as the restraint map shape. The final frame of each Emap-restrained simulation was then aligned in PyMOL for visual representation of the solution structure (Figure 3.10).

3.4 Results

Preliminary study of 7SK snRNP to UP1 interaction by differential DMS-MaPseq

In the differential DMS reactivity distribution, a positive change indicates higher DMS accessibility of the nucleotide, resulting directly from relaxed secondary structure in the region; a negative change indicates lower DMS accessibility of the nucleotide, likely resulting from shielding through direct protein binding or from new base pairing. As mentioned earlier, unwinding and shielding have opposing effects on the differential DMS profile.

Because hnRNP A1 can unwind RNA base pairing as a result of binding, the apparent DMS reactivity in a single profile reflects the net result of these two opposing effects. Without further interpretation and background, it is hard to tell whether a particular nucleotide with decreased DMS reactivity is a specific binding site or part of a nonspecifically bound region that has unwound.

For the preliminary experiment, free full-length 7SK snRNA and the 7SK-hnRNP A1 snRNP complex ([7SK]:[hnRNP A1] = 1:10) were prepared for DMS-MaPseq. The differential DMS reactivity distribution was calculated following the described protocol. The region corresponding to SL3 (188-284 nt) showed more frequent changes in DMS reactivity upon hnRNP A1 binding than the other regions of full-length 7SK snRNA (Figure 3.1). While the majority of the changes were positive, the negative changes mostly corresponded to the single-stranded regions of SL3D.

The 7SK snRNA sequence contains 24 5′-AG-3′ motifs, 14 of which are located in the SL3 region, including 4 5′-YAG-3′ motifs, previously reported to be the preferred binding sequence of hnRNP A1. Although the preliminary result agreed with the previous report, it left open the question of which of these 14 5′-AG-3′ motifs are the specific binding sites.
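
A motif count of this kind can be reproduced with a simple sequence scan. The sketch below is illustrative: the function names are hypothetical, motif positions are reported as the 1-based index of the A in each 5′-AG-3′, and the input is a short made-up sequence rather than the 7SK snRNA sequence.

```python
def ag_sites(seq):
    """Return (position, is_yag) for every 5'-AG-3' dinucleotide.

    Positions are 1-based and refer to the A. is_yag is True when the
    preceding nucleotide is C or U, i.e. the site is a 5'-YAG-3' motif.
    """
    seq = seq.upper().replace("T", "U")
    sites = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "AG":
            is_yag = i > 0 and seq[i - 1] in "CU"
            sites.append((i + 1, is_yag))
    return sites


def count_in_region(sites, start, end):
    """Count motifs whose A lies within the 1-based inclusive region."""
    return sum(1 for pos, _ in sites if start <= pos <= end)


# Hypothetical toy sequence (not 7SK snRNA).
toy = "GCAGUAGGAAGCUAGA"
sites = ag_sites(toy)

print("AG sites:", sites)
print("YAG sites:", [pos for pos, is_yag in sites if is_yag])
print("AG sites in region 5-12:", count_in_region(sites, 5, 12))
```

Applied to the full 7SK snRNA sequence with the SL3 boundaries (188-284 nt), the same scan would give the total and SL3-specific motif counts; a sequence scan alone does not indicate which motifs are buried in base pairs.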

We hypothesized that, at a low binding ratio, hnRNP A1 initially binds to the specific binding sites and slightly unwinds the local region around them, exposing nearby potential binding sites that carry the 5′-AG-3′ motif. As more hnRNP A1 is titrated into the RNA, more protein gains access to the RNA and recognizes and binds motifs that were initially buried in base-paired regions. As the titration continues, binding and unwinding proceed in succession until the RNA is saturated with hnRNP A1, at which point the RNA is completely unwound and all the 5′-AG-3′ motifs are bound. Under the extreme condition of hnRNP A1 oversaturation, this RNA-binding protein would even bind nonspecifically to random sites.

In this experiment, we sought to identify the specific binding sites of hnRNP A1 that appear at lower binding ratios, avoiding the nonspecific binding caused by oversaturation. To find the binding ratio at which all specific binding sites are exposed and bound without detecting binding sites exposed by unwinding, a titration of hnRNP A1 into full-length 7SK snRNA was performed.

DMS-MaPseq titration study of UP1 to 7SK snRNA

Zooming into the SL3 region of the 1:10 complex, the heat map showed that some base-paired A (A200, A271, and A281) and C (C202, C273, C282, and C284) nucleotides became highly reactive in the complex. We interpreted this as an indication that base pairing was disrupted and that the secondary structure of 7SK SL3D had changed due to oversaturation at the 1:10 ratio, so that nonspecific binding sites were exposed and bound by hnRNP A1. Experiments at lower binding ratios were therefore needed, in which base pairing is only locally rearranged upon hnRNP A1 binding, without total unwinding of the stem loop.

From the differential DMS reactivity distribution map of the 1:10 complex, we also noticed that the nucleotides surrounding the binding sites, whose DMS reactivity was extremely low in the RNA-only sample, mostly became more reactive. Moreover, the end regions located at the bottom of SL3 also became highly reactive in the complex. These observations indicated that SL3 was mostly unwound, such that 5′-AG-3′ motifs within base pairs and even nonspecific binding sites were exposed due to oversaturation with hnRNP A1.

In the following titrations, samples of 7SK snRNA-hnRNP A1 complexes at 1:1, 1:2 and 1:4 molar ratios were prepared. The 1:1 complex indicated two binding sites (A219 and A239) located in the upper region of the stem loop, and the 1:2 complex showed very little difference from the 1:1 complex (data not shown). As more hnRNP A1 was titrated into the RNA, the 1:4 complex showed a stronger binding pattern at A219 and A239.

As shown in the differential DMS-MaPseq data of the 1:4 complex, the end regions showed very little change (color coded as light yellow or green). Meanwhile, the regions surrounding A219 and A239 showed only a minimal increase in reactivity due to the unwinding effect of hnRNP A1, without disruption of the lower region of the stem loop. Together, the titration experiments suggested that hnRNP A1 binds to the top region of SL3, more precisely to A219 and A239, with no preference between the two (as shown in the 1:1 complex), and that continued titration of hnRNP A1 leads to local secondary structure rearrangement within the top region of SL3 (as shown in the 1:4 complex).

ITC experiments confirm hnRNP A1 binds specifically to the upper region of SL3 with Kd of 42±3 nM and stoichiometry of 2.16±0.02


The SL3 region of 7SK snRNA was previously shown to interact with hnRNP A1 through CLIP-seq studies12. Furthermore, differential DMS-MaPseq located the binding sites within the upper region of SL3. To validate this result and better understand the interaction, including its overall stoichiometry and binding affinity, ITC was performed using the upper region of SL3, SL3up. In the experiment, two binding sites were observed (stoichiometry of 2.16), showing that hnRNP A1 interacts with SL3 to form an apparent 2:1 complex. The two interactions with A219 and A239 were not distinguishable in the ITC titration curve (Figure 3.6), indicating that both interactions have similar binding affinities (~40 nM) at the salt concentration and temperature (25 °C) tested. The interactions are driven by a large favorable change in enthalpy and are opposed entropically, in agreement with similar studies of hnRNP A1 binding to structured RNA stem loops.
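To put the measured affinity on an energy scale, the dissociation constant can be converted into a standard binding free energy with ΔG° = RT ln(Kd). The short sketch below is an illustration only and is not part of the original analysis; it assumes a 1 M standard state, uses the Kd (42 nM) and temperature (25 °C) reported above, and the function name `binding_free_energy` is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar, temp_kelvin=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R*T*ln(Kd) with Kd expressed in mol/L (1 M standard state).
    """
    return R_KCAL * temp_kelvin * math.log(kd_molar)


# Kd of 42 nM at 25 degrees C, as reported for hnRNP A1 binding to SL3up.
dg = binding_free_energy(42e-9)
print(f"dG = {dg:.2f} kcal/mol")
```

Because the two sites were indistinguishable in the titration curve, this value is best read as an apparent per-site free energy rather than a property of either site individually.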

1H-1H NOESY and 1H-15N HSQC spectra confirm the secondary structure of SL3S

To validate the secondary structure of SL3 determined by DMS-MaPseq, 1H-1H Watergate NOESY and 1H-15N HSQC experiments were performed to examine the hydrogen bonding pattern adopted by the stem loop. As the 1H-1H Watergate NOESY spectrum of SL3D showed very weak NOEs, a smaller construct, SL3S, was designed to validate the hydrogen bonding pattern. The detailed 1H-1H Watergate NOESY assignment of SL3S is shown in Figure 3.7. While no NOEs could be assigned to the base pairs in the top region near the apical loop in the 1H-1H NOESY spectrum of SL3S, at least one of the GU base pairs in this region showed NOEs in a supporting spectrum collected on SL3up under the same conditions (spectrum not shown). No NOEs were assigned for the U208-A266 base pair, which might be relaxed owing to the proximal internal loop. With a few exceptions, continuous NOE walks were traced for the majority of SL3S. Most of the signals in the 1H-1H NOESY spectrum were assignable with the aid of the corresponding 1H-15N HSQC assignment [Figure 3.7B].

SEC-SAXS provided molecular density model of SL3D and its complex

SAXS provided structural insight into the global shape of SL3D as well as its complex with hnRNP A1. Size exclusion chromatography in line with small-angle X-ray scattering (SEC-SAXS) was applied to estimate the global shape of SL3D. The inverted parabolic shape of the Kratky plot (Figure 3.9A) indicates that SL3D folds into a stable structure, consistent with the 1H-1H NOESY results. The radii of gyration (Rg) of SL3D and its 1:1 complex, calculated from the linear region of the Guinier plot, are 38.4 Å and 44.3 Å, respectively. The shape of the pair distance distribution function p(r) reveals that SL3D (Figure 3.9C) adopts a kinked cylindrical structure with a maximum dimension of ~130 Å, and the complex (Figure 3.9D) adopts a less cylindrical shape with a maximum dimension of ~160 Å.
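For readers unfamiliar with how Rg is read from a Guinier plot, the sketch below illustrates the principle: at low q, ln I(q) ≈ ln I(0) − (Rg²/3)·q², so a straight-line fit of ln I against q² yields Rg from the slope. This is a minimal illustration on synthetic, noise-free data and is not the processing pipeline used in this work; the function name `guinier_rg`, the q grid, and the I(0) value are assumptions chosen for the example.

```python
import math


def guinier_rg(q, intensity):
    """Least-squares line through (q^2, ln I); returns (Rg, I0).

    The Guinier approximation gives ln I(q) = ln I0 - (Rg^2 / 3) * q^2,
    so Rg = sqrt(-3 * slope). The fit is only meaningful at low q
    (commonly q * Rg below about 1.3) and requires a negative slope.
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic low-q data built from the Rg reported for SL3D (38.4 angstroms).
rg_true = 38.4
q = [0.005 + 0.002 * i for i in range(15)]  # inverse angstroms, max q*Rg ~ 1.27
intensity = [100.0 * math.exp(-((qi * rg_true) ** 2) / 3.0) for qi in q]

rg_fit, i0_fit = guinier_rg(q, intensity)
print(f"Rg = {rg_fit:.1f} A, I0 = {i0_fit:.1f}")
```

With real scattering data the choice of the linear region matters, which is why the text specifies that Rg was taken from the linear part of the Guinier plot.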

Twenty ab initio molecular reconstructions of SL3D were used to determine the final DAMMIN refined model (Figure 3.9C, Figure 3.9D). The molecular weight determined from the excluded volume is 28 kD, which agrees well with the expected molecular weight (29.6 kD), within about 5.4 percent error. The structural model of the 1:1 complex of SL3D and hnRNP A1 was determined similarly; the measured molecular weight of the complex is 50 kD, which agrees well with the expected molecular weight (52.6 kD), with 4.9 percent error.
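The percent errors quoted above follow directly from the measured and expected molecular weights. The arithmetic can be reproduced with the small sketch below, in which the helper name `percent_error` is hypothetical and the error is taken relative to the expected value.

```python
def percent_error(measured, expected):
    """Absolute deviation of the measured value, as a percentage of the expected value."""
    return abs(measured - expected) / expected * 100.0


# Molecular weights in kD, as reported in the text.
sl3d_error = percent_error(28.0, 29.6)
complex_error = percent_error(50.0, 52.6)
print(round(sl3d_error, 1), round(complex_error, 1))
```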

The tertiary structure models for the SL3D fitted to the molecular density envelope calculated from SEC-SAXS data

The three best-fit tertiary structure models for SL3D were superimposed on the molecular density envelope. Ten initial tertiary structure models were generated in RNAComposer, using base-pairing information from the differential DMS-MaPseq model, and then simulated at 300 K for 50 ns in Amber at physiological salt concentration. The results of these simulations were filtered against the SAXS data, and the twenty best-fit structures were minimized, then briefly simulated and cooled to 0 K in Amber. Structural refinement of the 3 best-fit structures was conducted in Amber with the Supcomb-aligned SAXS molecular density map as an Emap restraint. A219 and A239 are colored red as the binding sites of hnRNP A1 according to the differential DMS-MaPseq data. Processing of raw SAXS data for filtering and generation of the molecular envelope was conducted with the software Primus and DAMMIF. The overall size of the SAXS reconstruction easily accommodates the NMR determined model (Figure 3.10).


[Figure 3.1] Structural Features of HnRNP A1. (A) The domain organization of full-length hnRNP A1 showing the N-terminal RNA binding domains (RRM1 = yellow, inter-RRM linker = gray, and RRM2 = blue) and the C-terminal domain (light gray) with the M9 nuclear localization signal depicted as a black box. The tandem RRMs of hnRNP A1 collectively make up the UP1 protein, residues 1–196. The RNP1 and RNP2 submotifs are also depicted for each RRM. (B) The solution NMR structure (2LYV) of UP1, color coded as in Panel A. (C) A zoomed view of the alpha-helical side of UP1 showing the conserved salt bridge interactions that stabilize the relative orientation of RRM1 and RRM2. (Reprinted from Elsevier Semin Cell Dev Biol, Copyright 2018; for license details, please refer to Appendix 2)


[Figure 3.2] HnRNP A1/UP1 Titration into 7SK SL3up by EMSA. The concentration of the 7SK SL3up RNA was constant at 40 nM. The protein concentrations in the lanes are 0, 4 nM, 40 nM, 80 nM, 160 nM, 320 nM, 640 nM, 1 µM, 2 µM, and 4 µM (left to right). (A) HnRNP A1 titrated into 7SK SL3up RNA. (B) UP1 titrated into 7SK SL3up RNA.


[Figure 3.3] Differential DMS-MaPseq of 7SK SnRNA upon HnRNP A1 Binding. The experiment was performed with hnRNP A1 present at a 10-fold molar excess during DMS modification. (A) Differential DMS reactivity distribution. The relative DMS reactivity is normalized using z-scores. The intensity change represents the change in DMS reactivity compared with the sample without hnRNP A1. The boxed region corresponds to SL3D. (B) Linear regression test of the 7SK-hnRNP A1 complex (Y axis) vs 7SK snRNA (X axis). The data points highlighted in red represent the corresponding nucleotides as labeled.


[Figure 3.4] Differential DMS Reactivity of 7SK-hnRNP A1 Complex in SL3D Region. Although all differential DMS-MaPseq studies were performed on full-length 7SK snRNA, significant changes were observed only in the hnRNP A1 binding region, SL3D. From left to right, hnRNP A1 was titrated into 7SK snRNA at 7SK:hnRNP A1 molar ratios of 1:1, 1:4, and 1:10 before DMS modification. The differential DMS reactivities are color coded on the tricolor scale shown on the right, which runs from more protected to more accessible.

[Figure panels: differential DMS-MaPseq of 7SK SL3 (188-284) titrated with hnRNP A1, with A219 and A239 labeled.]


[Figure 3.5] Secondary Structures of SL3 Constructs Used in the Study. The secondary structures are color coded as for the 7SK:hnRNP A1 molar ratio of 1:4 in Figure 3.4. (A) SL3D, 97 nt, covering the entire SL3 region in the DMS-MaPseq model of the 7SK secondary structure. (B) SL3S, 75 nt, excluding the large internal loops in the lower region. In this construct, A200 was mutated to G200 to form 5’-GG- for T7 transcription in vitro. (C) SL3up, 57 nt, spanning C210 to G264. Two consecutive G residues were added at the 5’ end for T7 transcription in vitro; no complementary 3’ C residues were designed for this construct.


[Figure 3.6] HnRNP A1 Titration of SL3up RNA Measured by ITC. (A) Titration curve of hnRNP A1 into 7SK SL3up. The concentration of SL3up is 2 µM, and the concentration of hnRNP A1 titrant is 40 µM. (B) Triplicate titrations were performed.

[ITC plot: upper panel, heat rate (µcal/sec) versus time (min); lower panel, integrated heat per injection versus molar ratio.]


[Figure 3.7] H2O NMR Spectrum of SL3S. (A) 1H-1H H2O NOESY spectrum of SL3S. The NOEs were assigned in the spectrum; the nucleotides colored in the secondary structure were assigned. (B) 1H-15N HSQC spectrum of SL3S. The signals are labeled with the same colors in Panel A and Panel B.

[Figure 3.7 image: panels A and B show the SL3S secondary structure with the assigned G and U residues labeled on each spectrum.]

[Figure 3.8] HnRNP A1 Titration of 7SK SL3up by 1H-15N HSQC. The concentration of the selectively 15N-labeled (G and U) SL3up RNA sample was held constant at 150 µM. Spectra were collected for free RNA (black) and at [SL3up]:[hnRNP A1] ratios of 1:0.5 (red), 1:0.75 (purple) and 1:1.25 (green). Assignments were made with reference to the 1H-1H NOESY spectrum of SL3up (not shown). Signals that emerge upon hnRNP A1 titration are marked with red arrows.


[Figure 3.9] SEC-SAXS Data of SL3D and Its 1:1 Complex with HnRNP A1. (A) Kratky plot for 7SK SL3D. The inverted parabolic shape of this plot indicates that 7SK SL3D folds into a stable structure. (B) Pair distribution of SL3D (blue) and its 1:1 complex with hnRNP A1 (orange). The radius of gyration (Rg) calculated from the linear region of the Guinier plot is 38.4 Å for SL3D. The shape of the pair distance distribution function p(r) reveals that 7SK SL3 adopts a kinked cylindrical structure with a maximum dimension (Dmax) of ~130 Å. The Rg for the SL3D-hnRNP A1 1:1 complex is 44.3 Å, and the Dmax of the complex is ~160 Å. (C) The final DAMMIN refined model of SL3D. (D) The final DAMMIN refined model of the SL3D-hnRNP A1 complex.


[Figure 3.10] The Tertiary Structure Models for the 7SK SL3D Fitted into the Molecular Density Envelope Calculated from SEC-SAXS Data. The three structures with the lowest scores were calculated in Amber and fitted to the molecular density envelope calculated from SEC-SAXS data.


Reference

1. Michels, A. A.; Fraldi, A.; Li, Q.; Adamson, T. E.; Bonnet, F.; Nguyen, V. T.; Sedore, S. C.; Price, J. P.; Price, D. H.; Lania, L.; Bensaude, O., Binding of the 7SK snRNA turns the HEXIM1 protein into a P-TEFb (CDK9/cyclin T) inhibitor. EMBO J 2004, 23 (13), 2608-19.
2. Blazek, D.; Barboric, M.; Kohoutek, J.; Oven, I.; Peterlin, B. M., Oligomerization of HEXIM1 via 7SK snRNA and coiled-coil region directs the inhibition of P-TEFb. Nucleic Acids Res 2005, 33 (22), 7000-10.
3. Li, Q.; Price, J. P.; Byers, S. A.; Cheng, D.; Peng, J.; Price, D. H., Analysis of the large inactive P-TEFb complex indicates that it contains one 7SK molecule, a dimer of HEXIM1 or HEXIM2, and two P-TEFb molecules containing Cdk9 phosphorylated at threonine 186. J Biol Chem 2005, 280 (31), 28819-26.
4. Li, Q.; Cooper, J. J.; Altwerger, G. H.; Feldkamp, M. D.; Shea, M. A.; Price, D. H., HEXIM1 is a promiscuous double-stranded RNA-binding protein and interacts with RNAs in addition to 7SK in cultured cells. Nucleic Acids Res 2007, 35 (8), 2503-12.
5. Barrandon, C.; Bonnet, F.; Nguyen, V. T.; Labas, V.; Bensaude, O., The transcription-dependent dissociation of P-TEFb-HEXIM1-7SK RNA relies upon formation of hnRNP-7SK RNA complexes. Mol Cell Biol 2007, 27 (20), 6996-7006.
6. Van Herreweghe, E.; Egloff, S.; Goiffon, I.; Jady, B. E.; Froment, C.; Monsarrat, B.; Kiss, T., Dynamic remodelling of human 7SK snRNP controls the nuclear level of active P-TEFb. EMBO J 2007, 26 (15), 3570-80.
7. Hogg, J. R.; Collins, K., RNA-based affinity purification reveals 7SK RNPs with distinct composition and regulation. RNA 2007, 13 (6), 868-80.
8. Diribarne, G.; Bensaude, O., 7SK RNA, a non-coding RNA regulating P-TEFb, a general transcription factor. RNA Biol 2009, 6 (2), 122-8.
9. Peterlin, B. M.; Brogie, J. E.; Price, D. H., 7SK snRNA: a noncoding RNA that plays a major role in regulating eukaryotic transcription. Wiley Interdiscip Rev RNA 2012, 3 (1), 92-103.
10. Krueger, B. J.; Jeronimo, C.; Roy, B. B.; Bouchard, A.; Barrandon, C.; Byers, S. A.; Searcey, C. E.; Cooper, J. J.; Bensaude, O.; Cohen, E. A.; Coulombe, B.; Price, D. H., LARP7 is a stable component of the 7SK snRNP while P-TEFb, HEXIM1 and hnRNP A1 are reversibly associated. Nucleic Acids Res 2008, 36 (7), 2219-29.
11. Levengood, J. D.; Tolbert, B. S., Idiosyncrasies of hnRNP A1-RNA recognition: Can binding mode influence function. Semin Cell Dev Biol 2018.
12. Chaudhury, A.; Chander, P.; Howe, P. H., Heterogeneous nuclear ribonucleoproteins (hnRNPs) in cellular processes: Focus on hnRNP E1's multifunctional regulatory roles. RNA 2010, 16 (8), 1449-62.
13. Mbonye, U.; Wang, B.; Gokulrangan, G.; Shi, W.; Yang, S.; Karn, J., Cyclin-dependent kinase 7 (CDK7)-mediated phosphorylation of the CDK9 activation loop promotes P-TEFb assembly with Tat and proviral HIV reactivation. J Biol Chem 2018, 293 (26), 10009-10025.
14. Stoltzfus, C. M.; Madsen, J. M., Role of viral splicing elements and cellular RNA binding proteins in regulation of HIV-1 alternative RNA splicing. Curr HIV Res 2006, 4 (1), 43-55.
15. Dowling, D.; Nasr-Esfahani, S.; Tan, C. H.; O'Brien, K.; Howard, J. L.; Jans, D. A.; Purcell, D. F.; Stoltzfus, C. M.; Sonza, S., HIV-1 infection induces changes in expression of cellular splicing factors that regulate alternative viral splicing and virus production in macrophages. Retrovirology 2008, 5, 18.
16. Ding, J.; Hayashi, M. K.; Zhang, Y.; Manche, L.; Krainer, A. R.; Xu, R. M., Crystal structure of the two-RRM domain of hnRNP A1 (UP1) complexed with single-stranded telomeric DNA. Genes Dev 1999, 13 (9), 1102-15.
17. Cartegni, L.; Maconi, M.; Morandi, E.; Cobianchi, F.; Riva, S.; Biamonti, G., hnRNP A1 selectively interacts through its Gly-rich domain with different RNA-binding proteins. J Mol Biol 1996, 259 (3), 337-48.
18. Siomi, H.; Dreyfuss, G., A nuclear localization domain in the hnRNP A1 protein. J Cell Biol 1995, 129 (3), 551-60.
19. Morgan, C. E.; Meagher, J. L.; Levengood, J. D.; Delproposto, J.; Rollins, C.; Stuckey, J. A.; Tolbert, B. S., The First Crystal Structure of the UP1 Domain of hnRNP A1 Bound to RNA Reveals a New Look for an Old RNA Binding Protein. J Mol Biol 2015, 427 (20), 3241-3257.
20. Jokan, L.; Dong, A. P.; Mayeda, A.; Krainer, A. R.; Xu, R. M., Crystallization and preliminary X-ray diffraction studies of UP1, the two-RRM domain of hnRNP A1. Acta Crystallogr D Biol Crystallogr 1997, 53 (Pt 5), 615-8.
21. Shamoo, Y.; Krueger, U.; Rice, L. M.; Williams, K. R.; Steitz, T. A., Crystal structure of the two RNA binding domains of human hnRNP A1 at 1.75 A resolution. Nat Struct Biol 1997, 4 (3), 215-22.
22. Jain, N.; Lin, H. C.; Morgan, C. E.; Harris, M. E.; Tolbert, B. S., Rules of RNA specificity of hnRNP A1 revealed by global and quantitative analysis of its affinity distribution. Proc Natl Acad Sci U S A 2017, 114 (9), 2206-2211.
23. Mayeda, A.; Munroe, S. H.; Xu, R. M.; Krainer, A. R., Distinct functions of the closely related tandem RNA-recognition motifs of hnRNP A1. RNA 1998, 4 (9), 1111-23.
24. Michael, W. M.; Choi, M.; Dreyfuss, G., A nuclear export signal in hnRNP A1: a signal-mediated, temperature-dependent nuclear protein export pathway. Cell 1995, 83 (3), 415-22.
25. Harlen, K. M.; Churchman, L. S., The code and beyond: transcription regulation by the RNA polymerase II carboxy-terminal domain. Nat Rev Mol Cell Biol 2017, 18 (4), 263-273.
26. Burd, C. G.; Dreyfuss, G., RNA binding specificity of hnRNP A1: significance of hnRNP A1 high-affinity binding sites in pre-mRNA splicing. EMBO J 1994, 13 (5), 1197-204.
27. Wong, K. H.; Jin, Y.; Struhl, K., TFIIH phosphorylation of the Pol II CTD stimulates mediator dissociation from the preinitiation complex and promoter escape. Mol Cell 2014, 54 (4), 601-12.
28. Allen, B. L.; Taatjes, D. J., The Mediator complex: a central integrator of transcription. Nat Rev Mol Cell Biol 2015, 16 (3), 155-66.
29. Malaby, A. W.; Chakravarthy, S.; Irving, T. C.; Kathuria, S. V.; Bilsel, O.; Lambright, D. G., Methods for analysis of size-exclusion chromatography-small-angle X-ray scattering and reconstruction of protein scattering. J Appl Crystallogr 2015, 48 (Pt 4), 1102-1113.
30. Giambasu, G. M.; York, D. M.; Case, D. A., Structural fidelity and NMR relaxation analysis in a prototype RNA hairpin. RNA 2015, 21 (5), 963-74.


Chapter 4: Conclusions and Future Studies


4.1 Conclusions

AIDS/HIV currently affects approximately 37 million people globally, and about 60% of people living with HIV have access to effective HIV treatment, ART (reviewed in Chapter 1). While tremendous efforts have been made toward controlling the epidemic over more than 30 years, AIDS/HIV is preventable and treatable but not yet curable. Currently, the lifelong regimens of oral therapeutics for infected individuals can lead to low compliance; moreover, the high cost of the drugs is burdensome for patients, especially those from developing countries and rural regions. This limited treatment accessibility severely hinders the progress of AIDS/HIV prevention globally.

Towards the goal of ending the AIDS/HIV epidemic by 2030, momentum has been built around a more specific goal for 2020: 90% of all people living with HIV will know their HIV status; 90% of all people diagnosed with HIV will receive sustained antiretroviral therapy; and 90% of all people receiving antiretroviral therapy will achieve viral suppression. This goal sets targets for disease detection, treatment accessibility, and treatment effectiveness.

Based on the current status, although steady and promising progress has been made, we still fall behind.


In order to bridge the gap in achieving viral suppression in 90% of ART recipients, better ART regimens are required.

As mentioned in the preceding chapters, one step of the life cycle, transcription, is currently not targeted by any form of ART, making it a viable target for a new class of antiretroviral drugs. The difficulty of targeting transcription in HIV treatment is rooted in the limited understanding of how transcriptional regulation differs between the host cell and HIV. HIV gene expression has been widely studied; however, very little progress has been made by directly targeting the viral proteins that facilitate this process, mostly because of the functional and structural similarities between the viral and host regulatory factors. In this thesis, we attempt to understand the transcriptional regulatory mechanism through P-TEFb, an essential transcription factor for both the virus and the host cell.

To establish a fundamental understanding of the regulatory mechanism of P-TEFb activity, I started my studies with the secondary structure of 7SK snRNA, an abundant non-coding RNA in host cells that inhibits P-TEFb by forming a ternary 7SK-HEXIM1-P-TEFb snRNP complex. Furthermore, to understand the transition between the active and inactive pools of P-TEFb, I studied the 7SK-hnRNP A1 snRNP complex, whose formation is mutually exclusive with that of the 7SK-HEXIM1-P-TEFb snRNP complex. While the mechanism of transition between these two complexes is not directly explained by this research, my studies provide a solid structural understanding of 7SK snRNA as well as the 7SK-hnRNP A1 complex. In the future, this structural information could help explain the transition between the 7SK-hnRNP A1 snRNP and 7SK-HEXIM1-P-TEFb snRNP complexes.

In this thesis, aided by the powerful DMS-MaPseq method, we revisited the secondary structure of 7SK snRNA. The secondary structure of non-coding RNAs such as 7SK snRNA is extremely important for understanding their functions, because they do not encode functional proteins but instead facilitate biological processes through direct or indirect interactions with regulatory factors. In our method, we transcribed full-length 7SK snRNA in vitro with T7 RNA polymerase and purified the RNA without denaturing its secondary structure. DMS modification and the subsequent RT-PCR were performed under the optimized conditions described in Chapter 2. The model was constructed using the RNA folding tool RNAStructure, developed by Dr. David Mathews' group, with the normalized DMS reactivity profile incorporated as a pseudoenergy restraint to guide RNA folding. In this process, the DMS reactivity profile enters the base-pairing calculation and thereby guides the RNA structure determination.

In the population average DMS-MaPseq model (Figure 2.3), 7SK snRNA folds into 6 stem loops: 4 major stem loops and 2 minor stem loops. Following the previous nomenclature, we named the 4 major stem loops SL1, SL2A, SL3, and SL4, and the 2 minor stem loops SL2B and SL2C, as they are located immediately downstream of SL2A. The identification of SL2B and SL2C explains the coexistence of multiple conformations of the SL2 construct that was designed based on the Steitz model.

Comparing the DMS-MaPseq model with the previous secondary structure models of 7SK snRNA, one conclusion is that SL1 and SL4 are structurally conserved regardless of association with the LARP7 and MePCE proteins1-2. Both previous models (the Steitz model determined in 1991 and the Price model determined in 2017) were based on RNA transcribed in vivo, and it was speculated that the stability of SL1 (5’ end region, bound by MePCE) and SL4 (3’ end region, bound by LARP7) depended on association with those two proteins, because MePCE and LARP7 were abundantly found associated with the 7SK snRNA core and knockout of MePCE or LARP7 decreased the abundance of functional 7SK snRNA in vivo 1, 3-4. However, our RNA was transcribed in vitro without LARP7 or MePCE and was found to have the same secondary structure within those regions. That is to say, MePCE and LARP7 tend to stabilize 7SK snRNA functionally. MePCE adds a methylphosphate cap at the 5’ end of 7SK snRNA to facilitate the recognition of the 7SK snRNP by LARP7 at the 3’ end, and the association of LARP7 with 7SK snRNA protects the 3’ end of the RNA from ribonucleases. Through direct binding to 7SK snRNA, in addition to binding between the two proteins, the functional 7SK snRNA core is stabilized. However, the stability of the secondary structure of 7SK snRNA does not depend on MePCE or LARP7.


The major difference between the DMS-MaPseq model and the Steitz model lies in the SL2 region (Figure 2.3A). The Price model suggests a folding pattern for the SL2 region similar to that of our model, with a minor base-pairing rearrangement between SL2B and SL2C (Figure 2.3B). This minor difference is determined by the base-pairing status of C159 and C160, located in SL2B, and C185, located at the 3’ end of SL2C. Although the DMS reactivities of all three cytosines are very low, in the DMS-MaPseq model C159 and C160 were determined to be paired and C185 unpaired. By comparison, in the Price model C159 and C160 were determined to be unpaired and C185 paired, leading to the minor rearrangement between the two models. While it is highly possible that the differences between these two models are due to differences in sample preparation, we are also looking for alternative explanations for the rearrangement.

In the population average model from our experiments, the structural restraints were averaged over 3 independent DMS-MaPseq experiments that were highly reproducible, with R2 values of 96.4%, 96.6% and 98.3% in the linear regression test. However, one dataset, DMS_491, which had slightly lower R2 values against the other two datasets (96.4% and 96.6%), showed the potential for an alternative secondary structure of the RNA within the SL2 region. The published method recommends an R2 of over 95% for DMS-MaPseq data. It is rather striking that a 2% difference in R2 changed the secondary structure folding of the average model.
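The reproducibility metric used here, R2 from a linear regression of one replicate's per-nucleotide reactivities against another's, can be sketched as follows. This is an illustrative calculation only: the function name `r_squared` and the toy reactivity values are assumptions, not data from the DMS-MaPseq experiments.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x.

    For a one-variable least-squares fit this equals the squared Pearson
    correlation between the two series.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# Toy per-nucleotide DMS reactivities for two replicates (hypothetical values).
replicate_a = [0.02, 0.10, 0.35, 0.05, 0.60, 0.01]
replicate_b = [0.03, 0.12, 0.30, 0.04, 0.65, 0.02]
print(f"R2 = {100 * r_squared(replicate_a, replicate_b):.1f}%")
```

A single R2 value summarizes agreement over the whole transcript, so two datasets can pass a 95% threshold while still differing in a short region such as SL2, which is consistent with the behavior observed for DMS_491.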


Inspired by this observation, we decided to analyze experiment DMS_491 in more depth. Upon checking the migration pattern of the sample by electrophoresis, we found that the DMS_491 sample had adopted two or more conformations, whereas the other samples migrated as a single band. The MaPseq data collected from this sample could therefore represent these two major conformations. To better understand the impact of the minor difference between the datasets, further analysis of DMS_491 was attempted to identify and quantify the potential alternative conformations of 7SK snRNA, with help from our collaborator, Dr. Silvi Rouskin. In this clustering method, instead of pooling all the aligned sequencing reads, the sequencing reads were sub-grouped into several datasets based on the conservation of each sequencing read before alignment [Figure 2.5].

While the clustering method is still in development and more optimization and validation are needed, we had solid evidence that the clustered structures could better represent the native form of the RNA. Instead of averaging and showing the RNA secondary structure as one single conformation, the clustering method groups and quantifies the sequencing data based on the conservation of the region, so that individual conformations can be identified and analyzed. Details of the methodology are not discussed in this thesis, as it is still under development and will be described in forthcoming publications from our collaborator, Dr. Silvi Rouskin.


In the preliminary secondary structure clustering experiment on 7SK snRNA, two major conformations were identified. One structure (DMS_491_Cluster_2, Figure 2.5B) is identical to the Price model determined by Dr. Price in 2017; the other (DMS_491_Cluster_1, Figure 2.5A) represents another conformation in which SL2B and SL2C shift and fold as one stem loop. As the method is still being developed, the models need to be further validated by other structure determination methods, as discussed in Future Studies.

Thanks to the high sensitivity of the DMS-MaPseq method to RNA secondary structure (Chapter 2), RNA-protein binding can also be detected using a similar method, differential DMS-MaPseq. In this method, untreated 7SK snRNA, free 7SK snRNA and the 7SK-hnRNP A1 complex were sequenced in parallel. After normalization between datasets, direct subtraction reveals the change in DMS reactivity upon protein binding, thus identifying the binding sites of the protein. In our case, the scenario was complicated by the fact that the RRM12 region (UP1) of hnRNP A1 has an unwinding effect on RNA upon binding. Because DMS-MaPseq directly detects the overall accessibility of nucleotides, protein binding would in general shield the nucleotides and limit access, while the unwinding associated with this binding would relax the secondary structure in the local regions around the binding sites, exposing the nucleotides. The binding sites determined may therefore not be accurate, as some sites located within double-stranded regions, which hnRNP A1 cannot otherwise access, become accessible upon unwinding. To minimize the chance of false detection of binding sites due to unwinding, we performed a titration of hnRNP A1 into 7SK snRNA. The titration showed that hnRNP A1 binds to 7SK snRNA at A219 and A239, located in the upper helix of SL3, at ratios as low as 1:1, and that binding becomes stronger as more hnRNP A1 is titrated in. At a 7SK snRNA:hnRNP A1 ratio of 1:4, the two binding sites (A219 and A239) approach saturation, the neighborhoods of these two nucleotides become unwound, and the local regions show increased DMS reactivity (Figure 3.4). At this point in the titration, the overall secondary structure of SL3 has not been disrupted compared with unbound RNA. As hnRNP A1 is further titrated in, SL3 continues to be unwound by the additional hnRNP A1. At a ratio of 1:10, the entire SL3 has been unwound. In the differential DMS-MaPseq heat map, the DMS reactivity of each nucleotide changes dramatically (positively or negatively): all the potential binding sites formerly buried in base pairs were exposed (indicated by increased DMS reactivity) and bound by the oversaturating hnRNP A1 (indicated by decreased DMS reactivity). For identifying the specific binding sites, the 7SK snRNA:hnRNP A1 ratio of 1:4 was ideal, because the specific binding sites could be identified without disrupting the secondary structure of SL3, so that only a minimal unwinding effect was detected.
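The normalize-then-subtract logic of differential DMS-MaPseq described above can be sketched in a few lines. The example is a simplified illustration under stated assumptions, not the processing code used in this work: it assumes each dataset is a list of per-nucleotide DMS reactivities for the same A and C positions, applies z-score normalization within each dataset (as noted in the Figure 3.3 caption), and uses hypothetical function names and toy values.

```python
from statistics import mean, pstdev


def zscores(values):
    """Z-score normalize one dataset so that datasets become comparable."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def differential_reactivity(bound, free):
    """Per-nucleotide difference of z-scored reactivities (bound minus free).

    Positive values mean the nucleotide is more accessible in the complex
    (for example after unwinding); negative values mean it is more protected
    (for example shielded by the bound protein).
    """
    z_bound = zscores(bound)
    z_free = zscores(free)
    return [b - f for b, f in zip(z_bound, z_free)]


# Toy reactivities for five A/C positions (hypothetical values).
free_rna = [0.40, 0.05, 0.30, 0.02, 0.10]
complex_rna = [0.10, 0.06, 0.32, 0.03, 0.35]
delta = differential_reactivity(complex_rna, free_rna)
print([round(d, 2) for d in delta])
```

In this toy example the first position becomes more protected and the last more accessible, mirroring the two opposing effects (shielding and unwinding) that made the titration series necessary.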

Based on the binding sites identified by differential DMS-MaPseq, three SL3 constructs of different lengths, each including the upper helix of SL3 (Figure 3.5), were designed for additional structural and dynamic studies. SL3up (57 nt) was used to compare the binding of UP1 and hnRNP A1 by EMSA, which confirmed that UP1 can replace full-length hnRNP A1 in binding studies. SL3up was also used for a thermodynamic study by ITC, which measured UP1 binding to SL3up with a stoichiometry of 2.16 and a binding affinity of 41.66 nM. The ITC result agreed with the result from the differential DMS-MaPseq method. Assignment of the secondary structure of SL3S (75 nt) through 1H-1H water NOESY and 15N-1H HSQC with selective G and U labeling also confirmed that the DMS-MaPseq model is accurate within the SL3S region (nt 200-274). Additionally, we acquired SEC-SAXS data for the SL3D RNA construct (97 nt) and for the 1:1 SL3D-hnRNP A1 complex to gain structural insight into the global shape of 7SK SL3D and of its protein complex. The raw SAXS data were filtered and the molecular envelope was generated with the software Primus and DAMMIF. The overall size of the SAXS reconstruction easily accommodates the NMR-determined model (Figure 3.10).
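To put the ITC numbers quoted above in energetic terms, the short sketch below converts the reported affinity into a standard binding free energy. It assumes that the 41.66 nM value is a dissociation constant (Kd), that the measurement temperature was 25 °C, and that the sites are equivalent and independent; none of these assumptions is stated in this section, so the result is an estimate only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy dG = R*T*ln(Kd), relative to a 1 M standard state."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


kd = 41.66e-9          # M; assumes the reported affinity is a Kd
stoichiometry = 2.16   # UP1 per SL3up, as reported by ITC

dg = binding_free_energy(kd)   # kcal/mol per binding event (assumed 25 degrees C)
sites = round(stoichiometry)   # nearest whole number of UP1 molecules per RNA

print(f"dG = {dg:.2f} kcal/mol, about {sites} UP1 per SL3up")
```

Under these assumptions the free energy is roughly -10 kcal/mol per site, and a stoichiometry near 2 matches the two binding sites (A219 and A239) identified by differential DMS-MaPseq.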

The study of the secondary structure of 7SK snRNA and the structural study of the interaction between 7SK SL3 and hnRNP A1 provide solid structural insights into the mechanism by which 7SK snRNA regulates P-TEFb. Several follow-up studies are proposed to elucidate this regulatory mechanism further.

4.2 Future studies

Transcriptional regulation of P-TEFb is fundamentally important in HIV-1 gene expression and is therefore a potential therapeutic target for the next generation of ART 5-6. My research focuses on the 7SK snRNA-associated mechanism that regulates P-TEFb availability. Toward a better understanding of how 7SK snRNA regulates P-TEFb, the following future studies are proposed:

High-resolution structural model of full length 7SK snRNA

The population average model of 7SK snRNA provides a reliable starting point for future structural studies of the individual stem loops. The clustering method under development can strengthen the secondary structure model as a guide for RNA construct design. As planned, multiple constructs of SL2 will be designed and compared by NMR so that the different conformations of the SL2 region can be confirmed. SEC-SAXS data will be collected and analyzed for each construct to provide the global shape of the region in each conformation. To date, molecular density distribution models of SL1, SL3 and SL4 have been calculated from SEC-SAXS data. Once the SEC-SAXS analysis of SL2 is complete, the global shapes of all individual stem loops will be available for building a molecular density distribution model of full-length 7SK snRNA.
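As a small illustration of the kind of size information SEC-SAXS provides for each construct, the sketch below performs a Guinier fit, in which ln I(q) is fit against q² at low angle and the slope gives the radius of gyration (Rg). It is a generic textbook calculation on synthetic, noise-free data and is not the Primus or DAMMIF procedure used in this work; the Rg of 30 Å, the forward intensity and the q range are hypothetical values chosen so that q·Rg stays below the commonly cited limit of about 1.3.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares line through ln I versus q^2; returns (Rg, I0).

    Guinier approximation: ln I(q) = ln I0 - (q^2 * Rg^2) / 3.
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic curve generated from the Guinier law itself (hypothetical values).
true_rg, true_i0 = 30.0, 100.0                 # angstrom, arbitrary units
q = [0.005 + 0.002 * k for k in range(15)]     # 1/angstrom; max q*Rg is about 0.99
intensity = [true_i0 * math.exp(-((qi * true_rg) ** 2) / 3.0) for qi in q]

rg, i0 = guinier_fit(q, intensity)
print(f"Rg = {rg:.2f} A, I0 = {i0:.2f}")
```

With real data the fit range would need to be chosen by inspecting the low-q region for linearity, which is why SEC-coupled measurement of monodisperse samples matters.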

In addition, higher-resolution structures of SL2 and SL3 will be determined using a hybrid method that combines NMR, SAXS, cryo-EM and MD simulation. The structures of SL1 and SL4 were determined previously. In collaboration with Dr. D'Souza's group, we plan to incorporate the existing high-resolution structural information for each stem loop to generate a high-resolution structural model of full-length 7SK snRNA. This work is planned as part of my upcoming postdoctoral research.

Currently, the cryo-EM study of SL3 is in preparation. We have purified the three SL3 constructs (Figure 3.5) as well as full-length 7SK snRNA in vitro, and have formed stable hnRNP A1 complexes purified by FPLC. The full NMR assignment of SL3 is also planned. As mentioned in the preceding chapters, the design of SL2 constructs depends heavily on the clustering model of full-length 7SK snRNA. Once the method has been validated, SL2 constructs locked in each conformation will be designed, and their high-resolution structures will be determined using NMR and SEC-SAXS. Once the structure of each stem loop has been determined individually, the overall orientation between stem loops can be determined using orientational restraints in the form of RDCs from NMR together with the global shape from SEC-SAXS.

Determining the high-resolution structure of full-length 7SK snRNA will allow us to examine the equilibrium among different 7SK snRNPs from a full-length perspective, providing a greater understanding of the transitions between mutually exclusive snRNPs.
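To show why RDCs can restrain the relative orientation of stem loops, the sketch below evaluates the standard expression for a residual dipolar coupling as a function of the bond vector's polar angles (θ, φ) in the principal frame of the alignment tensor, D = Da[(3cos²θ − 1) + (3/2)R sin²θ cos 2φ]. The axial magnitude `da` and rhombicity `rhombicity` used here are hypothetical values, not measurements; in practice they would be fit to the RDCs measured for each stem loop.

```python
import math


def rdc(theta_deg, phi_deg, da, rhombicity):
    """Residual dipolar coupling for a bond vector at polar angles (theta, phi).

    D = Da * [(3*cos^2(theta) - 1) + 1.5 * R * sin^2(theta) * cos(2*phi)]
    Angles are given in degrees in the alignment tensor principal frame.
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    axial = 3.0 * math.cos(theta) ** 2 - 1.0
    rhombic = 1.5 * rhombicity * math.sin(theta) ** 2 * math.cos(2.0 * phi)
    return da * (axial + rhombic)


DA_HZ = 10.0   # hypothetical axial component of the alignment tensor, in Hz
RHOMB = 0.2    # hypothetical rhombicity

parallel = rdc(0.0, 0.0, DA_HZ, RHOMB)        # bond along the tensor z axis
perpendicular = rdc(90.0, 0.0, DA_HZ, RHOMB)  # bond along the tensor x axis

print(f"parallel: {parallel:.1f} Hz, perpendicular: {perpendicular:.1f} Hz")
```

Because the coupling depends only on the orientation of each bond vector relative to a common alignment frame, helices in separately determined stem loops can be rotated until their predicted RDCs agree with one shared tensor, which is the sense in which RDCs act as orientational restraints.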

7SK associated snRNP study in vitro

To address the central question of how P-TEFb availability is regulated, many experimental approaches could shed light on the overall mechanism. hnRNP A1 is a sensible starting point because it is the most common and most widely studied hnRNP protein 7. However, because 7SK snRNA forms more than one complex with hnRNP proteins, studying the interactions of 7SK snRNA with other hnRNP proteins could provide important information about the overall regulatory mechanism. Previous studies reported that transcriptional pausing is affected only upon knock-down of all hnRNP A members, whereas knock-down of hnRNP A1 alone does not significantly affect transcription. In other words, hnRNP A2/B1 can compensate for the regulatory role of hnRNP A1 when the latter is unavailable. This indicates that hnRNP A2/B1 and hnRNP A1 binding play similar roles 8.

Future investigations could examine the roles of hnRNP Q and R, which were previously reported to interact with SL1 and SL3 of 7SK snRNA 9. HnRNP Q and R were reported to bind SL1 and SL3 simultaneously, forming a bridge between them. Alternatively, one protein could bind only one stem loop, either SL1 or SL3, while another hnRNP Q or R interacts with the other stem loop 10. Interestingly, hnRNP Q and R do not bind the same complex as hnRNP A1 or A2/B1 9, so the 7SK-hnRNP Q/R RNP may play a specific regulatory role independent of the 7SK-hnRNP snRNP containing hnRNP A1 or A2/B1. In light of the clustering model of full-length 7SK snRNA, the binding patterns of hnRNP Q and R might be related to conformational changes in SL2, which alter the distance between SL1 and SL3 and hence affect the binding ability of the protein. Whereas most binding studies have been done in vivo, in vitro studies provide a more isolated model of the complex, free of potentially unknown facilitation by other factors in vivo.

The 7SK-hnRNP snRNP and the 7SK-P-TEFb complex are mutually exclusive. Although in vitro studies comparing these complexes are not complete, the structures of several 7SK snRNP complexes have already been solved 11-13. Together, these data will allow us to examine the intimate relationship between 7SK snRNA and the protein factors that work together to regulate P-TEFb availability.

High resolution structure of 7SK SL3 - hnRNP A1 complex

In this study, the structure of SL3D was determined using a DMS-MaPseq-guided approach that combines NMR, SEC-SAXS and MD simulation, as presented in Chapter 3. This hybrid method allows the overall structure to be studied in solution; however, its resolution depends heavily on the quality of the NMR data, since the SEC-SAXS model of the global shape of the complex is generally of low resolution 14. If the complex is sufficiently stable, cryo-EM could be used to determine its high-resolution structure 15. Cryo-EM has advanced greatly in the last several years thanks to progress in electron detection and image processing 15. The new detectors yield density maps of unprecedented quality, while the processing tools correct for sample movement, allowing atomic structures to be deduced for a range of specimens. The resolution of cryo-EM is now beginning to rival that of X-ray crystallography and is no longer limited to large complexes or low-resolution models 16. A recent publication presented the structure of the HIV-1 RNA dimerization signal, around 30 kDa, using a method that combines cryo-EM, NMR and MD simulation 16-17. The SL3-hnRNP A1 1:1 complex is roughly 55 kDa and the 1:2 complex is around 80 kDa; with the help of NMR and MD simulation, high-resolution structures of these complexes could be determined.

Reference

1. Krueger, B. J.; Jeronimo, C.; Roy, B. B.; Bouchard, A.; Barrandon, C.; Byers, S. A.; Searcey, C. E.; Cooper, J. J.; Bensaude, O.; Cohen, E. A.; Coulombe, B.; Price, D. H., LARP7 is a stable component of the 7SK snRNP while P-TEFb, HEXIM1 and hnRNP A1 are reversibly associated. Nucleic Acids Res 2008, 36 (7), 2219-29.

2. Xue, Y.; Yang, Z.; Chen, R.; Zhou, Q., A capping-independent function of MePCE in stabilizing 7SK snRNA and facilitating the assembly of 7SK snRNP. Nucleic Acids Res 2010, 38 (2), 360-9.

3. Wassarman, D. A.; Steitz, J. A., Structural analyses of the 7SK ribonucleoprotein (RNP), the most abundant human small RNP of unknown function. Mol Cell Biol 1991, 11 (7), 3432-45.

4. Brogie, J. E.; Price, D. H., Reconstitution of a functional 7SK snRNP. Nucleic Acids Res 2017, 45 (11), 6864-6880.

5. Barboric, M.; Yik, J. H.; Czudnochowski, N.; Yang, Z.; Chen, R.; Contreras, X.; Geyer, M.; Matija Peterlin, B.; Zhou, Q., Tat competes with HEXIM1 to increase the active pool of P-TEFb for HIV-1 transcription. Nucleic Acids Res 2007, 35 (6), 2003-12.

6. Karn, J.; Stoltzfus, C. M., Transcriptional and posttranscriptional regulation of HIV-1 gene expression. Cold Spring Harb Perspect Med 2012, 2 (2), a006916.

7. Levengood, J. D.; Tolbert, B. S., Idiosyncrasies of hnRNP A1-RNA recognition: Can binding mode influence function. Semin Cell Dev Biol 2018.

8. Lemieux, B.; Blanchette, M.; Monette, A.; Mouland, A. J.; Wellinger, R. J.; Chabot, B., A Function for the hnRNP A1/A2 Proteins in Transcription Elongation. PLoS One 2015, 10 (5), e0126654.

9. Van Herreweghe, E.; Egloff, S.; Goiffon, I.; Jady, B. E.; Froment, C.; Monsarrat, B.; Kiss, T., Dynamic remodelling of human 7SK snRNP controls the nuclear level of active P-TEFb. EMBO J 2007, 26 (15), 3570-80.

10. Diribarne, G.; Bensaude, O., 7SK RNA, a non-coding RNA regulating P-TEFb, a general transcription factor. RNA Biol 2009, 6 (2), 122-8.

11. Durney, M. A.; D'Souza, V. M., Preformed protein-binding motifs in 7SK snRNA: structural and thermodynamic comparisons with retroviral TAR. J Mol Biol 2010, 404 (4), 555-67.

12. Pham, V. V.; Salguero, C.; Khan, S. N.; Meagher, J. L.; Brown, W. C.; Humbert, N.; de Rocquigny, H.; Smith, J. L.; D'Souza, V. M., HIV-1 Tat interactions with cellular 7SK and viral TAR RNAs identifies dual structural mimicry. Nat Commun 2018, 9 (1), 4266.

13. Martinez-Zapien, D.; Legrand, P.; McEwen, A. G.; Proux, F.; Cragnolini, T.; Pasquali, S.; Dock-Bregeon, A. C., The crystal structure of the 5′ functional domain of the transcription riboregulator 7SK. Nucleic Acids Res 2017, 45 (6), 3568-3579.

14. Mertens, H. D. T.; Svergun, D. I., Combining NMR and small angle X-ray scattering for the study of biomolecular structure and dynamics. Arch Biochem Biophys 2017, 628, 33-41.

15. Bai, X. C.; McMullan, G.; Scheres, S. H., How cryo-EM is revolutionizing structural biology. Trends Biochem Sci 2015, 40 (1), 49-57.

16. Bayfield, M. A.; Yang, R.; Maraia, R. J., Conserved and divergent features of the structure and function of La and La-related proteins (LARPs). Biochim Biophys Acta 2010, 1799 (5-6), 365-78.

17. Zhang, K.; Keane, S. C.; Su, Z.; Irobalieva, R. N.; Chen, M.; Van, V.; Sciandra, C. A.; Marchant, J.; Heng, X.; Schmid, M. F.; Case, D. A.; Ludtke, S. J.; Summers, M. F.; Chiu, W., Structure of the 30 kDa HIV-1 RNA Dimerization Signal by a Hybrid Cryo-EM, NMR, and Molecular Dynamics Approach. Structure 2018, 26 (3), 490-498 e3.

Appendix 1:

This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license. (For further information on the license, refer to https://creativecommons.org/licenses/by-sa/4.0/legalcode)

Page URL: https://commons.wikimedia.org/wiki/File:HI-virion-structure_en.svg

File URL: https://upload.wikimedia.org/wikipedia/commons/5/5e/HI-virion-structure_en.svg

Attribution: Thomas Splettstoesser (www.scistyle.com) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], from Wikimedia Commons

Appendix 2:

12/3/2018 RightsLink Printable License

ELSEVIER LICENSE TERMS AND CONDITIONS

Dec 03, 2018

This Agreement between Case Western Reserve University -- Le Luo ("You") and Elsevier ("Elsevier") consists of your license details and the terms and conditions provided by Elsevier and Copyright Clearance Center.

License Number: 4481360556728
License date: Dec 03, 2018
Licensed Content Publisher: Elsevier
Licensed Content Publication: Seminars in Cell & Developmental Biology
Licensed Content Title: Idiosyncrasies of hnRNP A1-RNA recognition: Can binding mode influence function
Licensed Content Author: Jeffrey D. Levengood, Blanton S. Tolbert
Licensed Content Date: Available online 9 April 2018
Licensed Content Volume: n/a
Licensed Content Issue: n/a
Licensed Content Pages: 1
Start Page: 0
End Page: 0
Type of Use: reuse in a thesis/dissertation
Portion: figures/tables/illustrations
Number of figures/tables/illustrations: 1
Format: both print and electronic
Are you the author of this Elsevier article? No
Will you be translating? No
Original figure numbers: Figure 1
Title of your thesis/dissertation: structural Insights into 7SK snRNP complex and its implication of HIV-1 transcriptional control
Expected completion date: Dec 2018
Estimated size (number of pages): 170
Requestor Location: Case Western Reserve University, 2074 Adelbert Rd, Millis G27, CLEVELAND, OH 44106, United States, Attn: Case Western Reserve University
Publisher Tax ID: 98-0397604
Total: 0.00 USD

Terms and Conditions

Bibliography

Aiken, C.; Konner, J.; Landau, N. R.; Lenburg, M. E.; Trono, D., Nef induces CD4 endocytosis: requirement for a critical dileucine motif in the membrane-proximal CD4 cytoplasmic domain. Cell 1994, 76 (5), 853-64.

Allen, B. L.; Taatjes, D. J., The Mediator complex: a central integrator of transcription. Nat Rev Mol Cell Biol 2015, 16 (3), 155-66.

Arimbasseri, A. G.; Rijal, K.; Maraia, R. J., Comparative overview of RNA polymerase II and III transcription cycles, with focus on RNA polymerase III termination and reinitiation. Transcription 2014, 5 (1), e27639.

Ashorn, P.; McQuade, T. J.; Thaisrivongs, S.; Tomasselli, A. G.; Tarpley, W. G.; Moss, B., An inhibitor of the protease blocks maturation of human and simian immunodeficiency viruses and spread of infection. Proc Natl Acad Sci U S A 1990, 87 (19), 7472-6.

Baba, T. W.; Jeong, Y. S.; Pennick, D.; Bronson, R.; Greene, M. F.; Ruprecht, R. M., Pathogenicity of live, attenuated SIV after mucosal infection of neonatal macaques. Science 1995, 267 (5205), 1820-5.

Barboric, M.; Kohoutek, J.; Price, J. P.; Blazek, D.; Price, D. H.; Peterlin, B. M., Interplay between 7SK snRNA and oppositely charged regions in HEXIM1 direct the inhibition of P-TEFb. EMBO J 2005, 24 (24), 4291-303.

Barboric, M.; Yik, J. H.; Czudnochowski, N.; Yang, Z.; Chen, R.; Contreras, X.; Geyer, M.; Matija Peterlin, B.; Zhou, Q., Tat competes with HEXIM1 to increase the active pool of P-TEFb for HIV-1 transcription. Nucleic Acids Res 2007, 35 (6), 2003-12.

Barrandon, C.; Bonnet, F.; Nguyen, V. T.; Labas, V.; Bensaude, O., The transcription-dependent dissociation of P-TEFb-HEXIM1-7SK RNA relies upon formation of hnRNP-7SK RNA complexes. Mol Cell Biol 2007, 27 (20), 6996-7006.

Barre-Sinoussi, F.; Chermann, J. C.; Rey, F.; Nugeyre, M. T.; Chamaret, S.; Gruest, J.; Dauguet, C.; Axler-Blin, C.; Vezinet-Brun, F.; Rouzioux, C.; Rozenbaum, W.; Montagnier, L., Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS). Science 1983, 220 (4599), 868-71.

Bartel, D. P.; Zapp, M. L.; Green, M. R.; Szostak, J. W., HIV-1 Rev regulation involves recognition of non-Watson-Crick base pairs in viral RNA. Cell 1991, 67 (3), 529-36.

Bataille, A. R.; Jeronimo, C.; Jacques, P. E.; Laramee, L.; Fortin, M. E.; Forest, A.; Bergeron, M.; Hanes, S. D.; Robert, F., A universal RNA polymerase II CTD cycle is orchestrated by complex interplays between kinase, phosphatase, and isomerase enzymes along genes. Mol Cell 2012, 45 (2), 158-70.

Baumli, S.; Lolli, G.; Lowe, E. D.; Troiani, S.; Rusconi, L.; Bullock, A. N.; Debreczeni, J. E.; Knapp, S.; Johnson, L. N., The structure of P-TEFb (CDK9/cyclin T1), its complex with flavopiridol and regulation by phosphorylation. EMBO J 2008, 27 (13), 1907-18.

Bayfield, M. A.; Yang, R.; Maraia, R. J., Conserved and divergent features of the structure and function of La and La-related proteins (LARPs). Biochim Biophys Acta 2010, 1799 (5-6), 365-78.

Bellaousov, S.; Reuter, J. S.; Seetin, M. G.; Mathews, D. H., RNAstructure: Web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res 2013, 41 (Web Server issue), W471-4.

Bernstein, H. B.; Tucker, S. P.; Kar, S. R.; McPherson, S. A.; McPherson, D. T.; Dubay, J. W.; Lebowitz, J.; Compans, R. W.; Hunter, E., Oligomerization of the hydrophobic heptad repeat of gp41. J Virol 1995, 69 (5), 2745-50.

Beyrer, C., The HIV/AIDS vaccine research effort: an update. Hopkins HIV Rep 2003, 15 (1), 6-7.

Bhatti, A. B.; Usman, M.; Kandi, V., Current Scenario of HIV/AIDS, Treatment Options, and Major Challenges with Compliance to Antiretroviral Therapy. Cureus 2016, 8 (3), e515.

Blazek, D.; Barboric, M.; Kohoutek, J.; Oven, I.; Peterlin, B. M., Oligomerization of HEXIM1 via 7SK snRNA and coiled-coil region directs the inhibition of P-TEFb. Nucleic Acids Res 2005, 33 (22), 7000-10.

Brogie, J. E.; Price, D. H., Reconstitution of a functional 7SK snRNP. Nucleic Acids Res 2017, 45 (11), 6864-6880.

Brother, M. B.; Chang, H. K.; Lisziewicz, J.; Su, D.; Murty, L. C.; Ensoli, B., Block of Tat-mediated transactivation of tumor necrosis factor beta gene expression by polymeric-TAR decoys. Virology 1996, 222 (1), 252-6.

Bryant, M.; Ratner, L., Myristoylation-dependent replication and assembly of human immunodeficiency virus 1. Proc Natl Acad Sci U S A 1990, 87 (2), 523-7.

Bukovsky, A. A.; Weimann, A.; Accola, M. A.; Gottlinger, H. G., Transfer of the HIV-1 cyclophilin-binding site to simian immunodeficiency virus from Macaca mulatta can confer both cyclosporin sensitivity and cyclosporin dependence. Proc Natl Acad Sci U S A 1997, 94 (20), 10943-8.

Burd, C. G.; Dreyfuss, G., RNA binding specificity of hnRNP A1: significance of hnRNP A1 high-affinity binding sites in pre-mRNA splicing. EMBO J 1994, 13 (5), 1197-204.

Bushman, F. D., Tethering human immunodeficiency virus 1 integrase to a DNA site directs integration to nearby sequences. Proc Natl Acad Sci U S A 1994, 91 (20), 9233-7.

Camaur, D.; Trono, D., Characterization of human immunodeficiency virus type 1 Vif particle incorporation. J Virol 1996, 70 (9), 6106-11.

Camerini, D.; Seed, B., A CD4 domain important for HIV-mediated syncytium formation lies outside the virus binding site. Cell 1990, 60 (5), 747-54.

Capon, D. J.; Ward, R. H., The CD4-gp120 interaction and AIDS pathogenesis. Annu Rev Immunol 1991, 9, 649-78.

Cartegni, L.; Maconi, M.; Morandi, E.; Cobianchi, F.; Riva, S.; Biamonti, G., hnRNP A1 selectively interacts through its Gly-rich domain with different RNA-binding proteins. J Mol Biol 1996, 259 (3), 337-48.

Centers for Disease Control, Update on acquired immune deficiency syndrome (AIDS) among patients with hemophilia A. MMWR Morb Mortal Wkly Rep 1982, 31 (48), 644-6, 652.

Centers for Disease Control, Update on acquired immune deficiency syndrome (AIDS)--United States. MMWR Morb Mortal Wkly Rep 1982, 31 (37), 507-8, 513-4.

Chapman, R. D.; Conrad, M.; Eick, D., Role of the mammalian RNA polymerase II C-terminal domain (CTD) nonconsensus repeats in CTD stability and cell proliferation. Mol Cell Biol 2005, 25 (17), 7665-74.

Chaudhury, A.; Chander, P.; Howe, P. H., Heterogeneous nuclear ribonucleoproteins (hnRNPs) in cellular processes: Focus on hnRNP E1's multifunctional regulatory roles. RNA 2010, 16 (8), 1449-62.

Clavel, F.; Brun-Vezinet, F.; Guetard, D.; Chamaret, S.; Laurent, A.; Rouzioux, C.; Rey, M.; Katlama, C.; Rey, F.; Champelinaud, J. L.; et al., [LAV type II: a second retrovirus associated with AIDS in West Africa]. C R Acad Sci III 1986, 302 (13), 485-8.

Cohen, E. A.; Dehni, G.; Sodroski, J. G.; Haseltine, W. A., Human immunodeficiency virus vpr product is a virion-associated regulatory protein. J Virol 1990, 64 (6), 3097-9.

Cohen, M. S.; Gay, C. L., Treatment to prevent transmission of HIV-1. Clin Infect Dis 2010, 50 Suppl 3, S85-95.

Collins, K. L.; Nabel, G. J., Naturally attenuated HIV--lessons for AIDS vaccines and treatment. N Engl J Med 1999, 340 (22), 1756-7.

Cramer, P.; Armache, K. J.; Baumli, S.; Benkert, S.; Brueckner, F.; Buchen, C.; Damsma, G. E.; Dengl, S.; Geiger, S. R.; Jasiak, A. J.; Jawhari, A.; Jennebach, S.; Kamenski, T.; Kettenberger, H.; Kuhn, C. D.; Lehmann, E.; Leike, K.; Sydow, J. F.; Vannini, A., Structure of eukaryotic RNA polymerases. Annu Rev Biophys 2008, 37, 337-52.

D'Orso, I.; Jang, G. M.; Pastuszak, A. W.; Faust, T. B.; Quezada, E.; Booth, D. S.; Frankel, A. D., Transition step during assembly of HIV Tat:P-TEFb transcription complexes and transfer to TAR RNA. Mol Cell Biol 2012, 32 (23), 4780-93.

Ding, J.; Hayashi, M. K.; Zhang, Y.; Manche, L.; Krainer, A. R.; Xu, R. M., Crystal structure of the two-RRM domain of hnRNP A1 (UP1) complexed with single-stranded telomeric DNA. Genes Dev 1999, 13 (9), 1102-15.

Diribarne, G.; Bensaude, O., 7SK RNA, a non-coding RNA regulating P-TEFb, a general transcription factor. RNA Biol 2009, 6 (2), 122-8.

Dowling, D.; Nasr-Esfahani, S.; Tan, C. H.; O'Brien, K.; Howard, J. L.; Jans, D. A.; Purcell, D. F.; Stoltzfus, C. M.; Sonza, S., HIV-1 infection induces changes in expression of cellular splicing factors that regulate alternative viral splicing and virus production in macrophages. Retrovirology 2008, 5, 18.

Durney, M. A.; D'Souza, V. M., Preformed protein-binding motifs in 7SK snRNA: structural and thermodynamic comparisons with retroviral TAR. J Mol Biol 2010, 404 (4), 555-67.

Egloff, S.; Van Herreweghe, E.; Kiss, T., Regulation of polymerase II transcription by 7SK snRNA: two distinct RNA elements direct P-TEFb and HEXIM1 binding. Mol Cell Biol 2006, 26 (2), 630-42.

Favre, M.; Butticaz, C.; Stevenson, B.; Jongeneel, C. V.; Telenti, A., High frequency of alternative splicing of human genes participating in the HIV-1 life cycle: a model using TSG101, betaTrCP, PPIA, INI1, NAF1, and PML. J Acquir Immune Defic Syndr 2003, 34 (2), 127-33.

Feinberg, M. B.; Baltimore, D.; Frankel, A. D., The role of Tat in the human immunodeficiency virus life cycle indicates a primary effect on transcriptional elongation. Proc Natl Acad Sci U S A 1991, 88 (9), 4045-9.

Felber, B. K.; Drysdale, C. M.; Pavlakis, G. N., Feedback regulation of human immunodeficiency virus type 1 expression by the Rev protein. J Virol 1990, 64 (8), 3734-41.

Feng, S.; Holland, E. C., HIV-1 tat trans-activation requires the loop sequence within tar. Nature 1988, 334 (6178), 165-7.

Feng, Y.; Broder, C. C.; Kennedy, P. E.; Berger, E. A., HIV-1 entry cofactor: functional cDNA cloning of a seven-transmembrane, G protein-coupled receptor. Science 1996, 272 (5263), 872-7.

Finley, J., Reactivation of latently infected HIV-1 viral reservoirs and correction of aberrant alternative splicing in the LMNA gene via AMPK activation: Common mechanism of action linking HIV-1 latency and Hutchinson-Gilford progeria syndrome. Med Hypotheses 2015, 85 (3), 320-32.

Flynn, R. A.; Do, B. T.; Rubin, A. J.; Calo, E.; Lee, B.; Kuchelmeister, H.; Rale, M.; Chu, C.; Kool, E. T.; Wysocka, J.; Khavari, P. A.; Chang, H. Y., 7SK-BAF axis controls pervasive transcription at enhancers. Nat Struct Mol Biol 2016, 23 (3), 231-8.

Franke, E. K.; Luban, J., Inhibition of HIV-1 replication by cyclosporine A or related compounds correlates with the ability to disrupt the Gag-cyclophilin A interaction. Virology 1996, 222 (1), 279-82.

Franke, E. K.; Yuan, H. E.; Luban, J., Specific incorporation of cyclophilin A into HIV-1 virions. Nature 1994, 372 (6504), 359-62.

Fu, T. J.; Peng, J.; Lee, G.; Price, D. H.; Flores, O., Cyclin K functions as a CDK9 regulatory subunit and participates in RNA polymerase II transcription. J Biol Chem 1999, 274 (49), 34527-30.

Fuda, N. J.; Ardehali, M. B.; Lis, J. T., Defining mechanisms that regulate RNA polymerase II transcription in vivo. Nature 2009, 461 (7261), 186-92.

Fujinaga, K.; Irwin, D.; Huang, Y.; Taube, R.; Kurosu, T.; Peterlin, B. M., Dynamics of human immunodeficiency virus transcription: P-TEFb phosphorylates RD and dissociates negative effectors from the transactivation response element. Mol Cell Biol 2004, 24 (2), 787-95.

Gallay, P.; Swingler, S.; Song, J.; Bushman, F.; Trono, D., HIV nuclear import is governed by the phosphotyrosine-mediated binding of matrix to the core domain of integrase. Cell 1995, 83 (4), 569-76.

Gallo, R. C.; Sarin, P. S.; Gelmann, E. P.; Robert-Guroff, M.; Richardson, E.; Kalyanaraman, V. S.; Mann, D.; Sidhu, G. D.; Stahl, R. E.; Zolla-Pazner, S.; Leibowitch, J.; Popovic, M., Isolation of human T-cell leukemia virus in acquired immune deficiency syndrome (AIDS). Science 1983, 220 (4599), 865-7.

Gallo, R.; Wong-Staal, F.; Montagnier, L.; Haseltine, W. A.; Yoshida, M., HIV/HTLV gene nomenclature. Nature 1988, 333 (6173), 504.

Garcia, J. V.; Miller, A. D., Downregulation of cell surface CD4 by nef. Res Virol 1992, 143 (1), 52-5.

Gelmann, E. P.; Popovic, M.; Blayney, D.; Masur, H.; Sidhu, G.; Stahl, R. E.; Gallo, R. C., Proviral DNA of a retrovirus, human T-cell leukemia virus, in two patients with AIDS. Science 1983, 220 (4599), 862-5.

Goldsmith, M. A.; Warmerdam, M. T.; Atchison, R. E.; Miller, M. D.; Greene, W. C., Dissociation of the CD4 downregulation and viral infectivity enhancement functions of human immunodeficiency virus type 1 Nef. J Virol 1995, 69 (7), 4112-21.

Goodfellow, S. J.; Zomerdijk, J. C., Basic mechanisms in RNA polymerase I transcription of the ribosomal RNA genes. Subcell Biochem 2013, 61, 211-36.

Gottlinger, H. G.; Sodroski, J. G.; Haseltine, W. A., Role of capsid precursor processing and myristoylation in morphogenesis and infectivity of human immunodeficiency virus type 1. Proc Natl Acad Sci U S A 1989, 86 (15), 5781-5.

Goudsmit, J.; Debouck, C.; Meloen, R. H.; Smit, L.; Bakker, M.; Asher, D. M.; Wolff, A. V.; Gibbs, C. J., Jr.; Gajdusek, D. C., Human immunodeficiency virus type 1 neutralization epitope with conserved architecture elicits early type-specific antibodies in experimentally infected chimpanzees. Proc Natl Acad Sci U S A 1988, 85 (12), 4478-82.

Hahn, B. H.; Shaw, G. M.; De Cock, K. M.; Sharp, P. M., AIDS as a zoonosis: scientific and public health implications. Science 2000, 287 (5453), 607-14.

Harlen, K. M.; Churchman, L. S., The code and beyond: transcription regulation by the RNA polymerase II carboxy-terminal domain. Nat Rev Mol Cell Biol 2017, 18 (4), 263-273.

Harrich, D.; Ulich, C.; Gaynor, R. B., A critical role for the TAR element in promoting efficient human immunodeficiency virus type 1 reverse transcription. J Virol 1996, 70 (6), 4017-27.

Harrison, G. P.; Lever, A. M., The human immunodeficiency virus type 1 packaging signal and major splice donor region have a conserved stable secondary structure. J Virol 1992, 66 (7), 4144-53.

He, J.; Choe, S.; Walker, R.; Di Marzio, P.; Morgan, D. O.; Landau, N. R., Human immunodeficiency virus type 1 viral protein R (Vpr) arrests cells in the G2 phase of the cell cycle by inhibiting p34cdc2 activity. J Virol 1995, 69 (11), 6705-11.

He, N.; Jahchan, N. S.; Hong, E.; Li, Q.; Bayfield, M. A.; Maraia, R. J.; Luo, K.; Zhou, Q., A La-related protein modulates 7SK snRNP integrity to suppress P-TEFb-dependent transcriptional elongation and tumorigenesis. Mol Cell 2008, 29 (5), 588-99.

Hirsch, V. M.; Dapolito, G.; McGann, C.; Olmsted, R. A.; Purcell, R. H.; Johnson, P. R., Molecular cloning of SIV from sooty mangabey monkeys. J Med Primatol 1989, 18 (3-4), 279-85.

Hogg, J. R.; Collins, K., RNA-based affinity purification reveals 7SK RNPs with distinct composition and regulation. RNA 2007, 13 (6), 868-80.

Hoglund, S.; Ohagen, A.; Lawrence, K.; Gabuzda, D., Role of vif during packing of the core of HIV-1. Virology 1994, 201 (2), 349-55.

Huang, H.; Liu, S.; Jean, M.; Simpson, S.; Huang, H.; Merkley, M.; Hayashi, T.; Kong, W.; Rodriguez-Sanchez, I.; Zhang, X.; Yosief, H. O.; Miao, H.; Que, J.; Kobie, J. J.; Bradner, J.; Santoso, N. G.; Zhang, W.; Zhu, J., A Novel Bromodomain Inhibitor Reverses HIV-1 Latency through Specific Binding with BRD4 to Promote Tat and P-TEFb Association. Front Microbiol 2017, 8, 1035.

Humphries, P.; Russell, S. E.; McWilliam, P.; McQuaid, S.; Pearson, C.; Humphries, M. M., Observations on the structure of two human 7SK pseudogenes and on homologous transcripts in vertebrate species. Biochem J 1987, 245 (1), 281-4.

Hwang, S. S.; Boyle, T. J.; Lyerly, H. K.; Cullen, B. R., Identification of the envelope V3 loop as the primary determinant of cell tropism in HIV-1. Science 1991, 253 (5015), 71-4.

Isel, C.; Karn, J., Direct evidence that HIV-1 Tat stimulates RNA polymerase II carboxyl-terminal domain hyperphosphorylation during transcriptional elongation. J Mol Biol 1999, 290 (5), 929-41.

Jacks, T.; Power, M. D.; Masiarz, F. R.; Luciw, P. A.; Barr, P. J.; Varmus, H. E., Characterization of ribosomal frameshifting in HIV-1 gag-pol expression. Nature 1988, 331 (6153), 280-3.

Jain, N.; Lin, H. C.; Morgan, C. E.; Harris, M. E.; Tolbert, B. S., Rules of RNA specificity of hnRNP A1 revealed by global and quantitative analysis of its affinity distribution. Proc Natl Acad Sci U S A 2017, 114 (9), 2206-2211.

Jeronimo, C.; Forget, D.; Bouchard, A.; Li, Q.; Chua, G.; Poitras, C.; Therien, C.; Bergeron, D.; Bourassa, S.; Greenblatt, J.; Chabot, B.; Poirier, G. G.; Hughes, T. R.; Blanchette, M.; Price, D. H.; Coulombe, B., Systematic analysis of the protein interaction network for the human transcription machinery reveals the identity of the 7SK capping enzyme. Mol Cell 2007, 27 (2), 262-74.

Jokan, L.; Dong, A. P.; Mayeda, A.; Krainer, A. R.; Xu, R. M., Crystallization and preliminary X-ray diffraction studies of UP1, the two-RRM domain of hnRNP A1. Acta Crystallogr D Biol Crystallogr 1997, 53 (Pt 5), 615-8.

Jowett, J. B.; Planelles, V.; Poon, B.; Shah, N. P.; Chen, M. L.; Chen, I. S., The human immunodeficiency virus type 1 vpr gene arrests infected T cells in the G2 + M phase of the cell cycle. J Virol 1995, 69 (10), 6304-13.

Kao, S. Y.; Calman, A. F.; Luciw, P. A.; Peterlin, B. M., Anti-termination of transcription within the long terminal repeat of HIV-1 by tat gene product. Nature 1987, 330 (6147), 489-93.

Karn, J.; Stoltzfus, C. M., Transcriptional and posttranscriptional regulation of HIV-1 gene expression. Cold Spring Harb Perspect Med 2012, 2 (2), a006916.

Kestler, H. W., 3rd; Ringler, D. J.; Mori, K.; Panicali, D. L.; Sehgal, P. K.; Daniel, M. D.; Desrosiers, R. C., Importance of the nef gene for maintenance of high virus loads and for development of AIDS. Cell 1991, 65 (4), 651-62.

Kim, S. Y.; Byrn, R.; Groopman, J.; Baltimore, D., Temporal aspects of DNA and RNA synthesis during human immunodeficiency virus infection: evidence for differential gene expression. J Virol 1989, 63 (9), 3708-13.

Kohlstaedt, L. A.; Wang, J.; Friedman, J. M.; Rice, P. A.; Steitz, T. A., Crystal structure at 3.5 A resolution of HIV-1 reverse transcriptase complexed with an inhibitor. Science 1992, 256 (5065), 1783-90.

Kolata, G., FDA approves AZT. Science 1987, 235 (4796), 1570.

Krueger, B. J.; Jeronimo, C.; Roy, B. B.; Bouchard, A.; Barrandon, C.; Byers, S. A.; Searcey, C. E.; Cooper, J. J.; Bensaude, O.; Cohen, E. A.; Coulombe, B.; Price, D. H., LARP7 is a stable component of the 7SK snRNP while P-TEFb, HEXIM1 and hnRNP A1 are reversibly associated. Nucleic Acids Res 2008, 36 (7), 2219-29.

Krueger, B. J.; Varzavand, K.; Cooper, J. J.; Price, D. H., The mechanism of release of P-TEFb and HEXIM1 from the 7SK snRNP by viral and cellular activators includes a conformational change in 7SK. PLoS One 2010, 5 (8), e12335.

Kwong, P. D.; Wyatt, R.; Robinson, J.; Sweet, R. W.; Sodroski, J.; Hendrickson, W. A., Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing human antibody. Nature 1998, 393 (6686), 648-59.

Lama, J.; Mangasarian, A.; Trono, D., Cell-surface expression of CD4 reduces HIV-1 infectivity by blocking Env incorporation in a Nef- and Vpu-inhibitable manner. Curr Biol 1999, 9 (12), 622-31.

Landau, N. R.; Warton, M.; Littman, D. R., The envelope glycoprotein of the human immunodeficiency virus binds to the immunoglobulin-like domain of CD4. Nature 1988, 334 (6178), 159-62.

Lapadat-Tapolsky, M.; De Rocquigny, H.; Van Gent, D.; Roques, B.; Plasterk, R.; Darlix, J. L., Interactions between HIV-1 nucleocapsid protein and viral DNA may have important functions in the viral life cycle. Nucleic Acids Res 1993, 21 (4), 831-9.

Laurence, J., Update: HIV-1 gene nomenclature. AIDS Res Hum Retroviruses 1988, 4 (6), vii-viii.

Lebars, I.; Martinez-Zapien, D.; Durand, A.; Coutant, J.; Kieffer, B.; Dock-Bregeon, A. C., HEXIM1 targets a repeated GAUC motif in the riboregulator of transcription 7SK and promotes base pair rearrangements. Nucleic Acids Res 2010, 38 (21), 7749-63.

Lemieux, B.; Blanchette, M.; Monette, A.; Mouland, A. J.; Wellinger, R. J.; Chabot, B., A Function for the hnRNP A1/A2 Proteins in Transcription Elongation. PLoS One 2015, 10 (5), e0126654.

Levengood, J. D.; Tolbert, B. S., Idiosyncrasies of hnRNP A1-RNA recognition: Can binding mode influence function. Semin Cell Dev Biol 2018.

Lewis, P.; Hensel, M.; Emerman, M., Human immunodeficiency virus infection of cells arrested in the cell cycle. EMBO J 1992, 11 (8), 3053-8.

Li, B.; Carey, M.; Workman, J. L., The role of chromatin during transcription. Cell 2007, 128 (4), 707-19.

Li, Q.; Cooper, J. J.; Altwerger, G. H.; Feldkamp, M. D.; Shea, M. A.; Price, D. H., HEXIM1 is a promiscuous double-stranded RNA-binding protein and interacts with RNAs in addition to 7SK in cultured cells. Nucleic Acids Res 2007, 35 (8), 2503-12.

Li, Q.; Price, J. P.; Byers, S. A.; Cheng, D.; Peng, J.; Price, D. H., Analysis of the large inactive P-TEFb complex indicates that it contains one 7SK molecule, a dimer of HEXIM1 or HEXIM2, and two P-TEFb molecules containing Cdk9 phosphorylated at threonine 186. J Biol Chem 2005, 280 (31), 28819-26.

Liu, H.; Wu, X.; Newman, M.; Shaw, G. M.; Hahn, B. H.; Kappes, J. C., The Vif protein of human and simian immunodeficiency viruses is packaged into virions and associates with viral core structures. J Virol 1995, 69 (12), 7630-8.

Low, J. T.; Weeks, K. M., SHAPE-directed RNA secondary structure prediction. Methods 2010, 52 (2), 150-8.

Lu, H.; Li, Z.; Xue, Y.; Schulze-Gahmen, U.; Johnson, J. R.; Krogan, N. J.; Alber, T.; Zhou, Q., AFF1 is a ubiquitous P-TEFb partner to enable Tat extraction of P-TEFb from 7SK snRNP and formation of SECs for HIV transactivation. Proc Natl Acad Sci U S A 2014, 111 (1), E15-24.

Lusvarghi, S.; Sztuba-Solinska, J.; Purzycka, K. J.; Rausch, J. W.; Le Grice, S. F., RNA secondary structure prediction using high-throughput SHAPE. J Vis Exp 2013, (75), e50243.

Malim, M. H.; Hauber, J.; Le, S. Y.; Maizel, J. V.; Cullen, B. R., The HIV-1 rev trans-activator acts through a structured target sequence to activate nuclear export of unspliced viral mRNA. Nature 1989, 338 (6212), 254-7.

Mancebo, H. S.; Lee, G.; Flygare, J.; Tomassini, J.; Luu, P.; Zhu, Y.; Peng, J.; Blau, C.; Hazuda, D.; Price, D.; Flores, O., P-TEFb kinase is required for HIV Tat transcriptional activation in vivo and in vitro. Genes Dev 1997, 11 (20), 2633-44.

Markert, A.; Grimm, M.; Martinez, J.; Wiesner, J.; Meyerhans, A.; Meyuhas, O.; Sickmann, A.; Fischer, U., The La-related protein LARP7 is a component of the 7SK ribonucleoprotein and affects transcription of cellular and viral polymerase II genes. EMBO Rep 2008, 9 (6), 569-75.

Martinez-Zapien, D.; Legrand, P.; McEwen, A. G.; Proux, F.; Cragnolini, T.; Pasquali, S.; Dock-Bregeon, A. C., The crystal structure of the 5′ functional domain of the transcription riboregulator 7SK. Nucleic Acids Res 2017, 45 (6), 3568-3579.

Marz, M.; Donath, A.; Verstraete, N.; Nguyen, V. T.; Stadler, P. F.; Bensaude, O., Evolution of 7SK RNA and its protein partners in metazoa. Mol Biol Evol 2009, 26 (12), 2821-30.

Mathews, D. H., RNA Secondary Structure Analysis Using RNAstructure. Curr Protoc Bioinformatics 2014, 46, 12.6.1-25.

Mathews, D. H.; Turner, D. H.; Watson, R. M., RNA Secondary Structure Prediction. Curr Protoc Nucleic Acid Chem 2016, 67, 11.2.1-11.2.19.

Mayeda, A.; Munroe, S. H.; Xu, R. M.; Krainer, A. R., Distinct functions of the closely related tandem RNA-recognition motifs of hnRNP A1. RNA 1998, 4 (9), 1111-23.

Mbonye, U.; Wang, B.; Gokulrangan, G.; Shi, W.; Yang, S.; Karn, J., Cyclin-dependent kinase 7 (CDK7)-mediated phosphorylation of the CDK9 activation loop promotes P-TEFb assembly with Tat and proviral HIV reactivation. J Biol Chem 2018, 293 (26), 10009-10025.

Michael, W. M.; Choi, M.; Dreyfuss, G., A nuclear export signal in hnRNP A1: a signal-mediated, temperature-dependent nuclear protein export pathway. Cell 1995, 83 (3), 415-22.

Michels, A. A.; Fraldi, A.; Li, Q.; Adamson, T. E.; Bonnet, F.; Nguyen, V. T.; Sedore, S. C.; Price, J. P.; Price, D. H.; Lania, L.; Bensaude, O., Binding of the 7SK snRNA turns the HEXIM1 protein into a P-TEFb (CDK9/cyclin T) inhibitor. EMBO J 2004, 23 (13), 2608-19.

Miller, M. D.; Warmerdam, M. T.; Gaston, I.; Greene, W. C.; Feinberg, M. B., The human immunodeficiency virus-1 nef gene product: a positive factor for viral infection and replication in primary lymphocytes and macrophages. J Exp Med 1994, 179 (1), 101-13.

Molle, D.; Maiuri, P.; Boireau, S.; Bertrand, E.; Knezevich, A.; Marcello, A.; Basyuk, E., A real-time view of the TAR:Tat:P-TEFb complex at HIV-1 transcription sites. Retrovirology 2007, 4, 36.

Moore, G., Chemical modification of ribosomes with dimethyl sulfate: a probe to the structural organization of ribosomal proteins and RNA. Can J Biochem 1975, 53 (3), 328-37.

Morgan, C. E.; Meagher, J. L.; Levengood, J. D.; Delproposto, J.; Rollins, C.; Stuckey, J. A.; Tolbert, B. S., The First Crystal Structure of the UP1 Domain of hnRNP A1 Bound to RNA Reveals a New Look for an Old RNA Binding Protein. J Mol Biol 2015, 427 (20), 3241-3257.

Morris, K. V.; Mattick, J. S., The rise of regulatory RNA. Nat Rev Genet 2014, 15 (6), 423-37.

Muesing, M. A.; Smith, D. H.; Cabradilla, C. D.; Benton, C. V.; Lasky, L. A.; Capon, D. J., Nucleic acid structure and expression of the human AIDS/lymphadenopathy retrovirus. Nature 1985, 313 (6002), 450-8.

Muniz, L.; Egloff, S.; Kiss, T., RNA elements directing in vivo assembly of the 7SK/MePCE/Larp7 transcriptional regulatory snRNP. Nucleic Acids Res 2013, 41 (8), 4686-98.

Muniz, L.; Egloff, S.; Ughy, B.; Jady, B. E.; Kiss, T., Controlling cellular P-TEFb activity by the HIV-1 transcriptional transactivator Tat. PLoS Pathog 2010, 6 (10), e1001152.

Nabel, G.; Baltimore, D., An inducible transcription factor activates expression of human immunodeficiency virus in T cells. Nature 1987, 326 (6114), 711-3.

Nakashima, H.; Tochikura, T.; Kobayashi, N.; Matsuda, A.; Ueda, T.; Yamamoto, N., Effect of 3'-azido-2',3'-dideoxythymidine (AZT) and neutralizing antibody on human immunodeficiency virus (HIV)-induced cytopathic effects: implication of giant cell formation for the spread of virus in vivo. Virology 1987, 159 (1), 169-73.

Narita, T.; Yamaguchi, Y.; Yano, K.; Sugimoto, S.; Chanarat, S.; Wada, T.; Kim, D. K.; Hasegawa, J.; Omori, M.; Inukai, N.; Endoh, M.; Yamada, T.; Handa, H., Human transcription elongation factor NELF: identification of novel subunits and reconstitution of the functionally active complex. Mol Cell Biol 2003, 23 (6), 1863-73.

Nath, A., HIV/AIDS Vaccine: An Update. Indian J Community Med 2010, 35 (2), 222-5.

Nguyen, V. T.; Kiss, T.; Michels, A. A.; Bensaude, O., 7SK small nuclear RNA binds to and inhibits the activity of CDK9/cyclin T complexes. Nature 2001, 414 (6861), 322-5.

Novikova, I. V.; Hennelly, S. P.; Sanbonmatsu, K. Y., Tackling structures of long noncoding RNAs. Int J Mol Sci 2013, 14 (12), 23672-84.

Pandori, M. W.; Fitch, N. J.; Craig, H. M.; Richman, D. D.; Spina, C. A.; Guatelli, J. C., Producer-cell modification of human immunodeficiency virus type 1: Nef is a virion protein. J Virol 1996, 70 (7), 4283-90.

Paparidis, N. F.; Durvale, M. C.; Canduri, F., The emerging picture of CDK9/P-TEFb: more than 20 years of advances since PITALRE. Mol Biosyst 2017, 13 (2), 246-276.

Paxton, W.; Connor, R. I.; Landau, N. R., Incorporation of Vpr into human immunodeficiency virus type 1 virions: requirement for the p6 region of gag and mutational analysis. J Virol 1993, 67 (12), 7229-37.

Peng, J.; Marshall, N. F.; Price, D. H., Identification of a cyclin subunit required for the function of Drosophila P-TEFb. J Biol Chem 1998, 273 (22), 13855-60.

Peterlin, B. M.; Brogie, J. E.; Price, D. H., 7SK snRNA: a noncoding RNA that plays a major role in regulating eukaryotic transcription. Wiley Interdiscip Rev RNA 2012, 3 (1), 92-103.

Pham, V. V.; Salguero, C.; Khan, S. N.; Meagher, J. L.; Brown, W. C.; Humbert, N.; de Rocquigny, H.; Smith, J. L.; D'Souza, V. M., HIV-1 Tat interactions with cellular 7SK and viral TAR RNAs identifies dual structural mimicry. Nat Commun 2018, 9 (1), 4266.

Pryciak, P. M.; Varmus, H. E., Nucleosomes, DNA-binding proteins, and DNA sequence modulate retroviral integration target site selection. Cell 1992, 69 (5), 769-80.

Purcell, D. F.; Martin, M. A., Alternative splicing of human immunodeficiency virus type 1 mRNA modulates viral protein expression, replication, and infectivity. J Virol 1993, 67 (11), 6365-78.

Qi, X.; Zhang, F.; Su, Z.; Jiang, S.; Han, D.; Ding, B.; Liu, Y.; Chiu, W.; Yin, P.; Yan, H., Programming molecular topologies from single-stranded nucleic acids. Nat Commun 2018, 9 (1), 4579.

Qin, Y.; Yao, J.; Wu, D. C.; Nottingham, R. M.; Mohr, S.; Hunicke-Smith, S.; Lambowitz, A. M., High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases. RNA 2016, 22 (1), 111-28.

Quarrier, S.; Martin, J. S.; Davis-Neulander, L.; Beauregard, A.; Laederach, A., Evaluation of the information content of RNA structure mapping data for secondary structure prediction. RNA 2010, 16 (6), 1108-17.

Re, F.; Braaten, D.; Franke, E. K.; Luban, J., Human immunodeficiency virus type 1 Vpr arrests the cell cycle in G2 by inhibiting the activation of p34cdc2-cyclin B. J Virol 1995, 69 (11), 6859-64.

Renner, D. B.; Yamaguchi, Y.; Wada, T.; Handa, H.; Price, D. H., A highly purified RNA polymerase II elongation control system. J Biol Chem 2001, 276 (45), 42601-9.

Reuter, J. S.; Mathews, D. H., RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 2010, 11, 129.

Ross, T. M.; Oran, A. E.; Cullen, B. R., Inhibition of HIV-1 progeny virion release by cell-surface CD4 is relieved by expression of the viral Nef protein. Curr Biol 1999, 9 (12), 613-21.

Rouskin, S.; Zubradt, M.; Washietl, S.; Kellis, M.; Weissman, J. S., Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 2014, 505 (7485), 701-5.

Roy, S.; Delling, U.; Chen, C. H.; Rosen, C. A.; Sonenberg, N., A bulge structure in HIV-1 TAR RNA is required for Tat binding and Tat-mediated trans-activation. Genes Dev 1990, 4 (8), 1365-73.

Ruben, S.; Perkins, A.; Purcell, R.; Joung, K.; Sia, R.; Burghoff, R.; Haseltine, W. A.; Rosen, C. A., Structural and functional characterization of human immunodeficiency virus tat protein. J Virol 1989, 63 (1), 1-8.

Sainsbury, S.; Bernecky, C.; Cramer, P., Structural basis of transcription initiation by RNA polymerase II. Nat Rev Mol Cell Biol 2015, 16 (3), 129-43.

Sato, A.; Igarashi, H.; Adachi, A.; Hayami, M., Identification and localization of vpr gene product of human immunodeficiency virus type 1. Virus Genes 1990, 4 (4), 303-12.

Schubert, U.; Bour, S.; Ferrer-Montiel, A. V.; Montal, M.; Maldarelli, F.; Strebel, K., The two biological activities of human immunodeficiency virus type 1 Vpu protein involve two separable structural domains. J Virol 1996, 70 (2), 809-19.

Schulze-Gahmen, U.; Echeverria, I.; Stjepanovic, G.; Bai, Y.; Lu, H.; Schneidman-Duhovny, D.; Doudna, J. A.; Zhou, Q.; Sali, A.; Hurley, J. H., Insights into HIV-1 proviral transcription from integrative structure and dynamics of the Tat:AFF4:P-TEFb:TAR complex. Elife 2016, 5.

Schulze-Gahmen, U.; Upton, H.; Birnberg, A.; Bao, K.; Chou, S.; Krogan, N. J.; Zhou, Q.; Alber, T., The AFF4 scaffold binds human P-TEFb adjacent to HIV Tat. Elife 2013, 2, e00327.

Schwartz, O.; Marechal, V.; Danos, O.; Heard, J. M., Human immunodeficiency virus type 1 Nef increases the efficiency of reverse transcription in the infected cell. J Virol 1995, 69 (7), 4053-9.

Schwartz, S.; Felber, B. K.; Benko, D. M.; Fenyo, E. M.; Pavlakis, G. N., Cloning and functional analysis of multiply spliced mRNA species of human immunodeficiency virus type 1. J Virol 1990, 64 (6), 2519-29.

Schwartz, S.; Felber, B. K.; Fenyo, E. M.; Pavlakis, G. N., Env and Vpu proteins of human immunodeficiency virus type 1 are produced from multiple bicistronic mRNAs. J Virol 1990, 64 (11), 5448-56.

Schwartz, S.; Felber, B. K.; Pavlakis, G. N., Mechanism of translation of monocistronic and multicistronic human immunodeficiency virus type 1 mRNAs. Mol Cell Biol 1992, 12 (1), 207-19.

Sedore, S. C.; Byers, S. A.; Biglione, S.; Price, J. P.; Maury, W. J.; Price, D. H., Manipulation of P-TEFb control machinery by HIV: recruitment of P-TEFb from the large form by Tat and binding of HEXIM1 to TAR. Nucleic Acids Res 2007, 35 (13), 4347-58.

Shafran, S. D.; Mashinter, L. D.; Lindemulder, A.; Taylor, G. D.; Chiu, I., Poor efficacy of intradermal administration of recombinant hepatitis B virus immunization in HIV-infected individuals who fail to respond to intramuscular administration of hepatitis B virus vaccine. HIV Med 2007, 8 (5), 295-9.

Shamoo, Y.; Krueger, U.; Rice, L. M.; Williams, K. R.; Steitz, T. A., Crystal structure of the two RNA binding domains of human hnRNP A1 at 1.75 Å resolution. Nat Struct Biol 1997, 4 (3), 215-22.

Shandilya, J.; Roberts, S. G., The transcription cycle in eukaryotes: from productive initiation to RNA polymerase II recycling. Biochim Biophys Acta 2012, 1819 (5), 391-400.

Shapiro, R.; Law, D. C.; Weisgras, J. M., A new chemical probe for single-stranded RNA. Biochem Biophys Res Commun 1972, 49 (2), 358-63.

Shin, S. Y., Recent update in HIV vaccine development. Clin Exp Vaccine Res 2016, 5 (1), 6-11.

Shore, S. M.; Byers, S. A.; Maury, W.; Price, D. H., Identification of a novel isoform of Cdk9. Gene 2003, 307, 175-82.

Shumiatskii, G. P.; Tillib, S. V.; Dramerov, D. A., [B2 RNA and 7SK RNA, transcripts of RNA-polymerase III, have a cap-like structure at the 5'-end]. Mol Biol (Mosk) 1990, 24 (6), 1686-94.

Silverman, I. M.; Berkowitz, N. D.; Gosai, S. J.; Gregory, B. D., Genome-Wide Approaches for RNA Structure Probing. Adv Exp Med Biol 2016, 907, 29-59.

Simon, J. H.; Miller, D. L.; Fouchier, R. A.; Soares, M. A.; Peden, K. W.; Malim, M. H., The regulation of primate immunodeficiency virus infectivity by Vif is cell species restricted: a role for Vif in determining virus host range and cross-species transmission. EMBO J 1998, 17 (5), 1259-67.

Siomi, H.; Dreyfuss, G., A nuclear localization domain in the hnRNP A1 protein. J Cell Biol 1995, 129 (3), 551-60.

Smith, M. K.; Westreich, D.; Liu, H.; Zhu, L.; Wang, L.; He, W.; Zhou, J.; Miller, W. C.; Cohen, M. S.; Wang, N., Treatment to Prevent HIV Transmission in Serodiscordant Couples in Henan, China, 2006 to 2012. Clin Infect Dis 2015, 61 (1), 111-9.

Stoltzfus, C. M., Chapter 1. Regulation of HIV-1 alternative RNA splicing and its role in virus replication. Adv Virus Res 2009, 74, 1-40.

Stoltzfus, C. M.; Madsen, J. M., Role of viral splicing elements and cellular RNA binding proteins in regulation of HIV-1 alternative RNA splicing. Curr HIV Res 2006, 4 (1), 43-55.

Strebel, K.; Daugherty, D.; Clouse, K.; Cohen, D.; Folks, T.; Martin, M. A., The HIV 'A' (sor) gene product is essential for virus infectivity. Nature 1987, 328 (6132), 728-30.

Tahirov, T. H.; Babayeva, N. D.; Varzavand, K.; Cooper, J. J.; Sedore, S. C.; Price, D. H., Crystal structure of HIV-1 Tat complexed with human P-TEFb. Nature 2010, 465 (7299), 747-51.

Thali, M.; Bukovsky, A.; Kondo, E.; Rosenwirth, B.; Walsh, C. T.; Sodroski, J.; Gottlinger, H. G., Functional association of cyclophilin A with HIV-1 virions. Nature 1994, 372 (6504), 363-5.

Tijerina, P.; Mohr, S.; Russell, R., DMS footprinting of structured RNAs and RNA-protein complexes. Nat Protoc 2007, 2 (10), 2608-23.

Tiley, L. S.; Madore, S. J.; Malim, M. H.; Cullen, B. R., The VP16 transcription activation domain is functional when targeted to a promoter-proximal RNA sequence. Genes Dev 1992, 6 (11), 2077-87.

Turowski, T. W.; Tollervey, D., Transcription by RNA polymerase III: insights into mechanism and regulation. Biochem Soc Trans 2016, 44 (5), 1367-1375.

Ueda, K.; Seki, T.; Kudo, T.; Yoshida, T.; Kataoka, M., Two distinct mechanisms cause heterogeneity of 16S rRNA. J Bacteriol 1999, 181 (1), 78-82.

Van Herreweghe, E.; Egloff, S.; Goiffon, I.; Jady, B. E.; Froment, C.; Monsarrat, B.; Kiss, T., Dynamic remodelling of human 7SK snRNP controls the nuclear level of active P-TEFb. EMBO J 2007, 26 (15), 3570-80.

von Schwedler, U.; Song, J.; Aiken, C.; Trono, D., Vif is crucial for human immunodeficiency virus type 1 proviral DNA synthesis in infected cells. J Virol 1993, 67 (8), 4945-55.

Wassarman, D. A.; Steitz, J. A., Structural analyses of the 7SK ribonucleoprotein (RNP), the most abundant human small RNP of unknown function. Mol Cell Biol 1991, 11 (7), 3432-45.

Wei, P.; Garber, M. E.; Fang, S. M.; Fischer, W. H.; Jones, K. A., A novel CDK9-associated C-type cyclin interacts directly with HIV-1 Tat and mediates its high-affinity, loop-specific binding to TAR RNA. Cell 1998, 92 (4), 451-62.

Weil, P. A.; Luse, D. S.; Segall, J.; Roeder, R. G., Selective and accurate initiation of transcription at the Ad2 major late promotor in a soluble system dependent on purified RNA polymerase II and DNA. Cell 1979, 18 (2), 469-84.

Weiss, R. A., Retrovirus classification and cell interactions. J Antimicrob Chemother 1996, 37 Suppl B, 1-11.

Willey, R. L.; Maldarelli, F.; Martin, M. A.; Strebel, K., Human immunodeficiency virus type 1 Vpu protein induces rapid degradation of CD4. J Virol 1992, 66 (12), 7193-200.

Wong, K. H.; Jin, Y.; Struhl, K., TFIIH phosphorylation of the Pol II CTD stimulates mediator dissociation from the preinitiation complex and promoter escape. Mol Cell 2014, 54 (4), 601-12.

Wyatt, R.; Kwong, P. D.; Desjardins, E.; Sweet, R. W.; Robinson, J.; Hendrickson, W. A.; Sodroski, J. G., The antigenic structure of the HIV gp120 envelope glycoprotein. Nature 1998, 393 (6686), 705-11.

Xu, Z. Z.; Mathews, D. H., Experiment-Assisted Secondary Structure Prediction with RNAstructure. Methods Mol Biol 2016, 1490, 163-76.

Xu, Z. Z.; Mathews, D. H., Secondary Structure Prediction of Single Sequences Using RNAstructure. Methods Mol Biol 2016, 1490, 15-34.

Xu, Z.; Culver, G., RNA structure experimental analysis--chemical modification. Methods Enzymol 2013, 530, 363-80.

Xue, Y.; Yang, Z.; Chen, R.; Zhou, Q., A capping-independent function of MePCE in stabilizing 7SK snRNA and facilitating the assembly of 7SK snRNP. Nucleic Acids Res 2010, 38 (2), 360-9.

Yamada, T.; Yamaguchi, Y.; Inukai, N.; Okamoto, S.; Mura, T.; Handa, H., P-TEFb-mediated phosphorylation of hSpt5 C-terminal repeats is critical for processive transcription elongation. Mol Cell 2006, 21 (2), 227-37.

Yamaguchi, Y.; Inukai, N.; Narita, T.; Wada, T.; Handa, H., Evidence that negative elongation factor represses transcription elongation through binding to a DRB sensitivity-inducing factor/RNA polymerase II complex and RNA. Mol Cell Biol 2002, 22 (9), 2918-27.

Yamaguchi, Y.; Takagi, T.; Wada, T.; Yano, K.; Furuya, A.; Sugimoto, S.; Hasegawa, J.; Handa, H., NELF, a multisubunit complex containing RD, cooperates with DSIF to repress RNA polymerase II elongation. Cell 1999, 97 (1), 41-51.

Yamakawa, M.; Shatkin, A. J.; Furuichi, Y., Chemical methylation of RNA and DNA viral genomes as a probe of in situ structure. J Virol 1981, 40 (2), 482-90.

Yang, Z.; Yik, J. H.; Chen, R.; He, N.; Jang, M. K.; Ozato, K.; Zhou, Q., Recruitment of P-TEFb for stimulation of transcriptional elongation by the bromodomain protein Brd4. Mol Cell 2005, 19 (4), 535-45.

Yang, Z.; Zhu, Q.; Luo, K.; Zhou, Q., The 7SK small nuclear RNA inhibits the CDK9/cyclin T1 kinase to control transcription. Nature 2001, 414 (6861), 317-22.

Yik, J. H.; Chen, R.; Nishimura, R.; Jennings, J. L.; Link, A. J.; Zhou, Q., Inhibition of P-TEFb (CDK9/Cyclin T) kinase and RNA polymerase II transcription by the coordinated actions of HEXIM1 and 7SK snRNA. Mol Cell 2003, 12 (4), 971-82.

Yik, J. H.; Chen, R.; Pezda, A. C.; Zhou, Q., Compensatory contributions of HEXIM1 and HEXIM2 in maintaining the balance of active and inactive positive transcription elongation factor b complexes for control of transcription. J Biol Chem 2005, 280 (16), 16368-76.

Zapp, M. L.; Green, M. R., Sequence-specific RNA binding by the HIV-1 Rev protein. Nature 1989, 342 (6250), 714-6.

Zhang, H.; Rigo, F.; Martinson, H. G., Poly(A) Signal-Dependent Transcription Termination Occurs through a Conformational Change Mechanism that Does Not Require Cleavage at the Poly(A) Site. Mol Cell 2015, 59 (3), 437-48.

Zhang, Z.; Klatt, A.; Gilmour, D. S.; Henderson, A. J., Negative elongation factor NELF represses human immunodeficiency virus transcription by pausing the RNA polymerase II complex. J Biol Chem 2007, 282 (23), 16981-8.

Zhu, Y.; Pe'ery, T.; Peng, J.; Ramanathan, Y.; Marshall, N.; Marshall, T.; Amendt, B.; Mathews, M. B.; Price, D. H., Transcription elongation factor P-TEFb is required for HIV-1 tat transactivation in vitro. Genes Dev 1997, 11 (20), 2622-32.

Zubradt, M.; Gupta, P.; Persad, S.; Lambowitz, A. M.; Weissman, J. S.; Rouskin, S., DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo. Nat Methods 2017, 14 (1), 75-82.
