Mechanism of Antitermination by NusG-like Proteins and the Role of RNAP Conformational Mobility in Transcription Cycle

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Anastasia Sevostiyanova, M.S.

Microbiology Graduate Program

The Ohio State University

2010

Dissertation Committee:

Dr. Irina Artsimovitch, Advisor

Dr. Michael Ibba

Dr. Kurt Fredrick

Dr. Mark Foster

Copyright by

Anastasia Sevostiyanova

2010

Abstract

Uninterrupted synthesis of complete RNA chains, up to a million nucleotides long, by multi-subunit RNA polymerases (RNAPs) requires accessory proteins that help RNAP bypass the numerous roadblocks it encounters along the way. Antitermination factors are found in all organisms from bacteria to humans and share the ability to switch the elongating RNAP into a highly processive state. Their molecular mechanisms, and in most cases even their binding sites on the transcription elongation complex (TEC), remain unknown.

Diverse elongation factors, including λ N and Q proteins, HIV Tat, and regulators from the NusG family, help RNAP to bypass various pause and termination signals, thereby increasing its processivity. Bacterial factor RfaH, an operon-specific paralog of the general NusG, is an excellent model for studies of the antitermination mechanism. RfaH acts as a canonical antiterminator: it increases the apparent elongation rate in vitro, reduces pausing, and facilitates bypass of some terminators. Studies of RfaH have already provided many insights into its function: we have obtained the RfaH structure and identified its binding sites on RNAP and the non-template (NT) DNA, identified many RfaH-controlled genes, and elucidated some aspects of the RfaH mechanism. I aimed to dissect the molecular contacts in the RfaH-modified TEC and to study the cellular context of RfaH action using a combination of in vivo and in vitro techniques.

We demonstrated that RfaH increases the apparent elongation rate by preventing the TEC isomerization into an off-pathway state. Since RfaH binds 75 Å away from the RNAP active site, where the structural rearrangements accompanying this isomerization occur, we envisioned that RfaH may transmit an allosteric signal to the active site. The complex and dynamic architecture of RNAP allows for many conformational changes whose regulatory importance we have just started to explore, and several hypothetical allosteric pathways have been proposed. In particular, five elements in the β’ subunit, switch-2 (SW2), clamp, F-loop, bridge helix and trigger loop (TL), appear to adopt different conformations in crystals and in solution. RfaH binds to the clamp, which is in turn directly linked to the SW2. Our data suggest that folding of the SW2 is crucial for transition from a catalytically incompetent intermediate to the open complex, and we are pursuing its role in elongation and termination. Our recent data led us to propose that RfaH binds simultaneously to the β and β’ regions that constitute the clamp, thereby locking it in a closed conformation in which RNAP tightly encircles the nucleic acid chains. This closed conformation likely corresponds to the processive, pause-resistant state of the TEC. The presented work is focused on the mechanism of the NusG-like proteins and the conformational mobility of regulatory elements in RNAP that may play important roles throughout the transcription cycle.


Acknowledgements

First and foremost, I would like to thank my advisor Irina for creating an excellent working environment, for her high professional standards and the degree of scientific freedom she offered me, and for her leadership, dedication to science, friendship, support, patience, driving lessons and my beloved cat.

I also want to thank my Committee members, Dr. Kurt Fredrick, Dr. Mark Foster and Dr. Mike Ibba for their valuable feedback and support over the years.

I thank the organizers of the memorable Mountain Lake meeting in 2008 for creating an intellectually stimulating environment and support.

I want to express my gratitude to everyone who tried to control my life and failed.

I thank Andrey Feklistov, Agus Muñoz Garcia, Amit Dashottar, Daniel Alpern and Olga Karicheva for their support, long talks and midnight walks.

I want to thank Ran Furman, Noah Reynolds and Kiley Dare for their friendship and support.

I want to thank my father for his support and encouragement to advance my career.


This work is dedicated to the memory of my mother Elena.


Vita

2000-2005: College of Biological Sciences, Moscow State University, Moscow, Russia.

2005-2006: Junior Research Scientist, Laboratory of Molecular Genetics of Microorganisms, Institute of Molecular Genetics, Moscow, Russia

2006 to date: Department of Microbiology, Ohio State University, Columbus, OH

Publications

• Sevostyanova A and Artsimovitch I (2010) Nucleic Acids Res. Jul 17, PMID: 20639538 Functional analysis of Thermus thermophilus transcription factor NusG.

• Pupov D, Miropolskaya N, Sevostyanova A, Bass I, Artsimovitch I & Kulbachinskiy A (2010) Nucleic Acids Res. May 10, PMID: 2045775 Multiple roles of the RNA polymerase β′ SW2 region in transcription initiation, promoter escape, and RNA elongation.

• Belogurov GA, Sevostyanova A, Svetlov V, Artsimovitch I (2010) Mol Microbiol 76(2), 286-301. Functional regions of the N-terminal domain of the antiterminator RfaH.

• Belogurov GA, Vassylyeva MN, Sevostyanova A, Appleman JR, Xiang AX, Lira R, Webber SE, Klyuyev S, Nudler E, Artsimovitch I & Vassylyev DG. (2009) Nature 457(7227), 332-35. Transcription inactivation through local refolding of the RNA polymerase structure.


• Sevostyanova A, Svetlov V, Vassylyev D G & Artsimovitch I (2008) PNAS 105, 865-70. The elongation factor RfaH and the initiation factor sigma bind to the same site on the transcription elongation complex.

• Sevostyanova A, Feklistov A, Barinova N, Heyduk E, Bass I, Klimasauskas S, Heyduk T & Kulbachinskiy A (2007) J Biol Chem 282, 22033-9. Specific recognition of the -10 promoter element by the free RNA polymerase sigma subunit.

• Sevostyanova A, Djordjevic M, Kuznedelov K, Naryshkina T, Gelfand MS, Severinov K & Minakhin L (2006) J Mol Biol 366, 420-35. Temporal regulation of viral transcription during development of T. thermophilus bacteriophage ϕYS40.

• Feklistov A., Barinova N., Sevostyanova A., Heyduk E., Bass I., Vvedenskaya I., Kuznedelov K., Merkiene E., Stavrovskaya E., Klimasauskas S., Nikiforov V., Heyduk T., Severinov K. & Kulbachinskiy A. (2006) Mol Cell 23, 97-107. A basal promoter element recognized by free RNA polymerase sigma subunit determines promoter recognition by RNA polymerase holoenzyme.

Fields of study

Major Field: Microbiology


Table of Contents

Abstract ...... ii

Acknowledgements ...... iv

Vita ...... vi

Publications ...... vi

Fields of study ...... vii

Table of Contents ...... viii

List of Figures ...... xiii

List of Tables ...... xvii

List of Symbols and Abbreviations ...... xviii

Chapter 1: Introduction ...... 1

Basics of Transcription ...... 1

RNA Polymerase Architecture ...... 1

Transcription Cycle ...... 4

Nucleotide Addition Cycle...... 6

Transcription Initiation ...... 12

Initial Promoter Recognition ...... 12

Intermediates in Open Complex Formation ...... 13

Transition to Elongation ...... 16

Transcription Pausing and Termination ...... 18

Role of Pausing in Transcription Regulation ...... 18


Cellular Elongation Factors ...... 18

Pausing Mechanism ...... 22

Intrinsic Termination ...... 28

Factor-dependent Termination ...... 30

Antitermination Systems ...... 33

N of Phage λ ...... 33

Q Protein of Phage λ ...... 34

Put RNA of Phage HK022 ...... 34

Ribosomal Antitermination ...... 37

NusG Family of Elongation Factors ...... 37

RfaH as a Paradigm of Antipausing Control ...... 40

RfaH Structure and the Recruitment Mechanism ...... 40

A Ubiquitous Mechanism for Antitermination? ...... 43

Chapter 2: The elongation factor RfaH and the initiation factor sigma bind to the same site on the transcription elongation complex ...... 45

Introduction ...... 45

Results ...... 49

RfaH Abrogates σ-Induced Pausing...... 49

RfaH Likely Prevents σ Recruitment Directly...... 52

Contacts to CH and NT DNA Are Essential for the RfaH and σ Action during Elongation ...... 52

RfaH and σ70 Recognize Distinct Determinants on the β’CH...... 56

RfaH and σ70 Do Not Compete During Initiation...... 57

Discussion ...... 61

Materials and Methods...... 66

Proteins and Reagents...... 66

Pause Assays...... 66

Halted A26 Complex Formation...... 66

Sample Analysis...... 67

Chapter 3: Functional regions of the N-terminal domain of the antiterminator RfaH ...... 69

Introduction ...... 69

Results ...... 73

RfaHN Mediates All Transcriptional Activities of RfaH In Vitro ...... 73

In Vitro Assay for Ops Binding and AP Activities of RfaH ...... 74

Substitutions That Compromise RfaHN Contacts with Ops ...... 81

Substitutions That Compromise the AP Activity of RfaHN ...... 86

In Vivo Effects of RfaHN Substitutions ...... 94

Discussion ...... 99

The Functions of the RfaH Domains ...... 99

A Cluster of Polar and Charged Residues Mediates RfaHN Binding to DNA ...... 101

A Hydrophobic Surface of the N-terminal Domain Mediates its Binding to RNAP ...... 102

The HTT Motif as an AP Module ...... 102

Materials and Methods...... 105

Plasmids and Strains ...... 105

Proteins and Reagents ...... 105

Halted Complex Formation ...... 106

Single Round Pause Assays ...... 106

Intrinsic Termination Assay at Thly ...... 106

Rho-Dependent Termination Assays ...... 107

Sample Analysis ...... 107

In Vivo Assays ...... 107

Western Blotting ...... 108

Chapter 4: Functional analysis of Thermus thermophilus transcription factor NusG ..... 113

Introduction ...... 113

Results ...... 119

Tth NusG Slows Down the Already “Fast” Tth RNAP ...... 119

Different NusG Proteins Have Small and Distinct Effects on Intrinsic Termination ...... 124

Tth NusG Reduces σA-Induced Pausing by Tth RNAP ...... 127

Tth NusG Binds to the NT DNA Strand in the TEC ...... 130

Tth NusG Stabilizes the Post-Translocated State of the TEC ...... 132

Discussion ...... 134

NusG Interactions with the TEC ...... 134

NusG Role in Transcriptional Pausing ...... 137

What is the Main Role of NusG in the Cell? ...... 138

Materials and Methods...... 140

Plasmids and Reagents ...... 140

Protein Expression and Purification ...... 141

Transcription Elongation Assays ...... 142

Sigma Competition Assay ...... 142

KMnO4 Footprinting ...... 143

Sample Analysis ...... 144

Chapter 5: The β subunit Gate loop mediates antitermination modification of RNA polymerase ...... 145

Introduction ...... 145

Results ...... 148

RfaH Binding to β’CH is Insufficient for AP ...... 148

RNAP with GL Deletion Does not Respond to RfaH In Vivo...... 150

RNAP with GL Deletion Does not Respond to RfaH In Vitro...... 152

Deletion of βGL Does not Abolish RfaH Binding ...... 155

Discussion ...... 158

Materials and Methods...... 161

Proteins and Reagents ...... 161

Sigma Competition Assay ...... 161

qRT-PCR ...... 162

Viability Assay ...... 163

Elongation Rate Assay ...... 163

Rho-Dependent Termination Assay ...... 163


Gel Mobility Shift Assay ...... 164

Chapter 6: Transcription inactivation through local refolding of the RNA polymerase structure ...... 166

Introduction ...... 166

Results ...... 168

Crystal Structure of RNAP Complexed with dMyx ...... 168

Mutational Analysis of SW2 region ...... 172

Myxopyronin Traps a Partially Melted Promoter Complex Intermediate ...... 180

Discussion ...... 189

Materials and Methods...... 192

Single Nucleotide Addition Initiation Assay ...... 194

Isolation and Assay of Mutant E. coli RNAPs ...... 195

Footprinting Analysis ...... 196

Chapter 7: Conclusions and perspective ...... 197

References ...... 206


List of Figures

Fig. 1. RNAP holoenzyme architecture...... 3

Fig. 2. Schematic representation of transcription cycle and its regulation...... 5

Fig. 3. Nucleotide addition cycle...... 7

Fig. 4. Catalysis of nucleotide addition by RNAP (residue numbers correspond to E. coli)...... 8

Fig. 5. Position of the incoming NTP in pre-insertion (A) vs. insertion (B) complexes...... 11

Fig. 6. Schematics of transcription initiation pathway...... 15

Fig. 7. The regulatory map of TEC and interaction relationships between elongation factors that bind to the same site on the RNAP surface...... 21

Fig. 8. Alternative conformations of 3’-rNMP in X-ray structures of yeast RNAP PolII...... 24

Fig. 9. Structural features and kinetic relationship of different elongation complexes...... 26

Fig. 10. Rho hexamer structure in open and closed conformations...... 31

Fig. 11. Schematic representation of λN- (A) and λQ- (B) modified TEC...... 36

Fig. 12. Models of antipausing activity of put (adapted from (193))...... 36

Fig. 13. Structural comparison of NusG (A) and RfaH (B)...... 41

Fig. 14. RfaH recruitment to the TEC...... 41

Fig. 15. The heterologous model of RfaH binding to the TEC...... 42


Fig. 16. RfaH and σ70 bind to topologically similar targets on the TEC...... 47

Fig. 17. RfaH reduces σ-dependent pausing downstream from the consensus -10 element....50

Fig. 18. Contacts with the TEC are critical for RfaH and σ effects during elongation...... 51

Fig. 19. Bound to the NT DNA σ70 prevents RfaH recruitment to the adjacent ops site...... 53

Fig. 20. RfaH effect on RNAPs with the substitutions in the β’CH...... 58

Fig. 21. β’ R275Q RNAP fails to respond to σ during elongation...... 59

Fig. 22. RfaH does not compete with σ70 during initiation...... 60

Fig. 23. Contacts to the NT DNA strand...... 65

Fig. 24. The structural context of the RfaH action...... 72

Fig. 25. RfaHN effects on intrinsic termination...... 75

Fig. 26. RfaHN effects on Rho-dependent termination...... 76

Fig. 27. Effects of RfaH on transcription elongation in vitro...... 78

Fig. 28. Effects of GreB on pausing at the ops site...... 79

Fig. 29. Effects of RfaHN substitutions on pausing at the ops site...... 84

Fig. 30. Alignment of the wild-type and a representative set of “mutant” RfaH structures in context of the wild-type structural ensemble...... 85

Fig. 31. Effects of RfaHN substitutions on pausing at the hisP site...... 88

Fig. 32. Alignment of the wild-type and all “mutant” RfaH structures...... 92

Fig. 33. Circular Dichroism (CD) spectra of selected RfaH variants do not reveal any major structural perturbations compared to the wild-type protein...... 93

Fig. 34. The in vivo reporter assay for the RfaH activity...... 97

Fig. 35. The functional contacts between RfaH and the TEC...... 104

Fig. 36. Structural conservation in the NusG family...... 115

Fig. 37. The RNAP-binding surface and the domain architecture of the NusG-like proteins...... 116

Fig. 38. Effects of temperature on transcript elongation of Eco and Tth RNAPs...... 120

Fig. 39. Effects of the NusG proteins on the elongation rate of Eco and Tth RNAPs...... 122

Fig. 40. Effect of the Tth NusG on transcription on a “pause-free” pIA146 template...... 123

Fig. 41. Transcription termination by Eco and Tth RNAPs...... 125

Fig. 42. Tth NusG inhibits σA-induced pausing by Tth RNAP...... 129

Fig. 43. Tth NusGN binds to the Tth TEC...... 131

Fig. 44. Tth NusG favors forward translocation...... 133

Fig. 45. A model for Tth NusG interactions...... 135

Fig. 46. A model of RfaHN bound to the TEC...... 147

Fig. 47. RfaH variants with substitutions in HTT motif still bind to TEC...... 149

Fig. 48. ∆GL RNAP does not respond to RfaH in vivo...... 151

Fig. 49. ∆GL RNAP does not respond to RfaH in vitro ...... 153

Fig. 50. Deletion of βGL does not compromise NusG effect on Rho-dependent termination...... 154

Fig. 51. ∆GL RNAP supports viability in laboratory condition...... 156

Fig. 52. RfaH abolishes σ-dependent pause during elongation by the ΔGL RNAP...... 156

Fig. 53. RfaH binds to the ΔGL RNAP...... 157

Fig. 54. Bacterial two-hybrid assay (BacterioMatchTM) of RfaH- βGL interactions...... 157

Fig. 55. Model of antipausing modification by NusG-like proteins...... 160

Fig. 56. Myxopyronin inhibits transcription initiation by bacterial RNAPs...... 169

Fig. 57. Structure of the RNAP/dMyx complex, the overall view...... 169


Fig. 58. The quality of the RNAP/dMyx structure...... 170

Fig. 59. The RNAP domain rearrangement induced by the dMyx binding...... 170

Fig. 60. The dimensions of the RNAP main channel in the RNAP/dMyx and apo-RNAP structures...... 171

Fig. 61. Myxopyronin binds to a conserved SW2 element...... 174

Fig. 62. The dMyx binding determinants in the RNAP/dMyx complex structure...... 175

Fig. 63. Refolding of SW2...... 176

Fig. 64. Effect of RNAP mutations on dMyx activity...... 177

Fig. 65. The entry of the dMyx binding site...... 178

Fig. 66. Modeling of the DNA template to the RNAP/dMyx complex...... 179

Fig. 67. Myx inhibits transcription only if added before RPO formation...... 181

Fig. 68. dMyx alters the contacts between RNAP and λPR promoter DNA...... 182

Fig. 69. Myxopyronin inhibits transcription from both the natural, double-stranded (left) and artificially melted (right) λPR promoter templates...... 183

Fig. 70. Footprinting analysis of the RNAP variants with changes in the SW2 regions...... 185

Fig. 71. DNaseI footprinting analysis of promoter complexes formed by SW2 variants...... 186

Fig. 72. Step-by-step schematic of open complex formation...... 191

List of Tables

Table 1. Characteristics of promoter-RNAP complexes in initiation pathway……. 14

Table 2. Plasmids and templates. ……………………………………………………….68

Table 3. Predicted effects of selected substitutions in RfaH.…………………………90

Table 4. Plasmids and templates.………………………………………………………112

Table 5. Plasmids and templates.……..…………………………………………..……165

Table 6. Collection of structural data and refinement statistics.……………………193


List of Symbols and Abbreviations

General:

NT strand – non-template, coding DNA strand

T strand – template, non-coding DNA strand

WT – wild type

nt – nucleotide

bp – base pairs

aa – amino acid

PPi – inorganic pyrophosphate

NTP – nucleotide triphosphate

NMP – nucleotide monophosphate

NAC – nucleotide addition cycle

Transcription complexes:

RPc – RNAP/promoter closed initiation complex

I1 – first initiation intermediate

I2 – second initiation intermediate

RPo – RNAP/promoter open initiation complex

TEC – transcription elongation complex

HC – halted complex

Transcription sites and signals:

ops – operon polarity suppressor, DNA element

opsP – ops-dependent pause site

σP – σ-dependent pause site

RO – run-off transcript

T – termination site or the released transcript

Proteins:

RNAP – RNA polymerase

RfaHC – C-terminal domain of RfaH

RfaHN – N-terminal domain of RfaH

NusGN – N-terminal domain of NusG

NusGC – C-terminal domain of NusG

β'CH – clamp-helices domain of RNAP β' subunit

βGL – gate-loop fragment of RNAP β subunit

ΔGL – RNAP variant with βGL deletion (aa 368-376)

BH – bridge helix

TL – trigger loop

TH – trigger helices

α2 – dimer of α-subunits of RNAP

Antibiotics:

dMyx, Myx – myxopyronin

Stl – streptolydigin

Rif – rifampicin


Chapter 1: Introduction

Basics of Transcription

In the orchestra of molecular processes in a living cell, the synthesis of RNA is the first step in the expression of genetic information stored in DNA. RNA polymerase (RNAP) and its interaction partners recognize intrinsic signals encoded in the template and select which sets of genes will be transcribed in a given environmental and developmental state of the cell. Multiple biochemical cross-talks connect transcription to all essential cellular processes, such as DNA replication and repair, translation, and cell wall biogenesis, to ensure that the program always meets the cell’s needs.

Depending on where and how often RNAP decides to initiate transcription, where it stops, and how fast it elongates RNA in between, a unique pool of transcripts is generated.

RNA Polymerase Architecture

All classes of RNA messages in bacteria are synthesized by a single enzyme with a unique architecture (1,2). The RNA polymerase core enzyme has a multisubunit composition, α2ββ’ω. The active site (3,4) and the nucleic acid-binding channel are formed by the β and β’ subunits (1), the biggest proteins in E. coli (Eco) (Fig. 1). The overall shape of RNAP is reminiscent of a crab claw that clamps on nucleic acids, with β forming one pincer and β’ the other. A long bridge helix (BH) extends across the cleft and splits it into the main channel that accommodates DNA and a narrow secondary channel that likely serves as a route for NTP substrates (1,5-7).

The α-dimer (α2) ties β and β’ together and is located on the periphery of the complex (1,5). α2 plays important regulatory roles during initiation: the α C-terminal domain (αCTD) interacts with the UP elements (see Initial Promoter Recognition, p. 12) in promoter DNA and with many transcriptional activators, thereby increasing expression of many genes (8-10), but it is not directly involved in catalysis (11,12). The small ω subunit is dispensable for RNAP function in vitro (13) but is thought to participate in complex assembly in vivo (13) and is known to be required for stringent response regulation (14,15).

The core enzyme is fully capable of RNA synthesis but unable to initiate transcription on double-stranded DNA. Sequence-specific recognition of a promoter requires formation of a holo RNAP complex that includes the σ subunit (16,17).

Promoter elements -10 and -35 are recognized by σ regions 2 and 4, respectively (17). In a structure of the T. thermophilus (Tth) holoenzyme (5), σ is bound mostly on the surface of core RNAP, except for a short linker called σ3.2, positioned between regions 2 and 4, which protrudes towards the active site (Fig. 1). The portion of σA (the major σ factor in Tth, analogous to σ70 in Eco) resolved in the crystal structure (5) lacks an N-terminal part (σ1-73) that includes a conserved region named 1.1. The following segment forms several helix-turn-helix motifs that contact both sides of the main channel, connecting the RNAP jaws. This segment contains the σ2 region responsible for specific recognition of the -10 promoter element (18). In the structure, σ2 wraps around a region called the β’ clamp helices (β’CH, Fig. 1), a part of the large β’ clamp that forms one of the walls of the DNA-binding channel. The second promoter recognition region, σ4, binds to the β flap element.
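The division of labor between σ2 (the -10 element) and σ4 (the -35 element) can be illustrated with a toy sequence scan. This is only a sketch under stated assumptions: the consensus hexamers TTGACA and TATAAT and the 16-18 bp spacer are the commonly cited values for Eco σ70 promoters and are not taken from this text, and the function names and the example sequence are hypothetical.

```python
# Assumed Eco sigma70 consensus hexamers (not given in the text above).
MINUS_35 = "TTGACA"  # element contacted by sigma region 4
MINUS_10 = "TATAAT"  # element contacted by sigma region 2


def matches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def best_promoter(seq: str, spacers=(16, 17, 18)):
    """Return (score, start of -35 hexamer, spacer length) for the best hit.

    The score is the number of matches to both hexamers (maximum 12).
    Returns None if the sequence is too short to hold both elements.
    """
    best = None
    for i in range(len(seq) - 11):
        for spacer in spacers:
            j = i + 6 + spacer  # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                continue
            score = matches(seq[i:i + 6], MINUS_35) + matches(seq[j:j + 6], MINUS_10)
            if best is None or score > best[0]:
                best = (score, i, spacer)
    return best


# Hypothetical test sequence: perfect elements separated by a 17-bp spacer.
example = "CC" + "TTGACA" + "G" * 17 + "TATAAT" + "CC"
print(best_promoter(example))  # (12, 2, 17)
```

A real promoter search would use position weight matrices rather than exact-match counting; the sketch only shows that two separate elements at a constrained spacing are scored together.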

Fig. 1. RNAP holoenzyme architecture.

Labels in the figure: β’, β, α2, ω, σ2, σ3.2, σ4, β flap, β’CH, bridge helix, active center, DNA-binding channel, secondary channel.

RNAP subunits (PDB #2A6E) are shown as cartoons; β is light green, β’ is dark green, α2 is light blue, ω is blue, and σ is purple. Some functionally important structural elements are highlighted: the β’CH domain is orange, the β’BH is brown. The position of the σ3.2 loop is indicated by a magenta arrow; the main and the secondary channels are indicated by black arrows. σ-core contacts cover 10,000 Å2 in the holoenzyme, including interactions of the σ2 domain with the β’ clamp and of the σ4 domain with the β flap.


Transcription Cycle

At the first step of the transcription cycle, one σ subunit must associate with a core RNAP to form a holoenzyme that is capable of binding to a promoter DNA sequence (Fig. 2). Promoter recognition triggers a number of conformational changes in the DNA/holo RNAP complex that ultimately lead to separation of DNA strands inside the main channel. When the transcription bubble propagates to the initiation site, RNAP starts synthesizing RNA in a template-dependent manner. Several cycles of abortive transcription follow, in which RNAP makes and releases short transcripts without dissociating from the template (19,20). The efficiency of promoter escape (the process of transition from initiation to elongation accompanied by breaking of σ-core contacts) is dictated primarily by intrinsic properties of the promoter DNA sequence but can be affected by certain protein factors, such as Gre (21,22).

Shortly after the transcription complex enters the elongation phase, the σ subunit dissociates from RNAP; however, it may re-bind later at a promoter-like sequence, resulting in promoter-proximal pausing (23,24).

Even after promoter escape, RNAP never travels alone; its movement along the template is regulated not only by substrate concentration but also by many competing protein factors (such as NusA, NusG, RfaH, Gre etc.) that affect its response to different signals encoded by nucleic acids (both DNA and RNA, see Pausing Mechanism, p. 22) (25). The combinatorial effect of these intrinsic and extrinsic regulatory factors defines whether the nascent RNA will be terminated prematurely or synthesized as a full-length product. Upon termination, RNAP dissociates from the DNA template and releases RNA to repeat the cycle.


Fig. 2. Schematic representation of transcription cycle and its regulation.

(adapted with changes from (25))

The core RNAP is shown in gray, the initiation σ factor is purple. RNA and DNA are shown in red and black, respectively; the active site Mg2+ ion is indicated by a red circle.

This color scheme is used throughout the document unless indicated otherwise.


Nucleotide Addition Cycle

During elongation RNAP is highly processive, capable of synthesizing transcripts tens of thousands of nucleotides (nt) long. The high-resolution structure of a bacterial elongation complex (consisting of the core enzyme and short oligonucleotides that mimic the positions of T DNA, NT DNA and RNA in transcribing RNAP) provided important insight into the basis of elongation: the nucleotide addition cycle (NAC) (26). A complete cycle of nucleotide addition consists of NTP binding, phosphoryl transfer, PPi release, and translocation (Fig. 3). RNAP repeats this cycle many thousands of times to complete the synthesis of a nascent RNA chain, while remaining bound to both a DNA template and a growing RNA transcript.

Phosphoryl transfer occurs by Mg2+-dependent, SN2 nucleophilic attack of the RNA 3'-hydroxyl on the α-phosphate of the incoming NTP (Fig. 4) (27,28). The active site is located deep inside the main channel and consists of two parts: the i-site, in which the 3’-end of the nascent RNA resides, and the i+1 site, to which the incoming NTP substrate binds prior to catalysis. Both the β and β’ subunits donate residues to coordinate the incoming NTP and two magnesium ions crucial for catalysis (28). A high-affinity Mg2+ ion (labeled as "1" in Fig. 4 and "Mg1" in Figs. 3 and 5) is coordinated by aspartate residues from the invariant DxDGD motif (2) and contacts both the primer and the substrate, whereas a low-affinity Mg2+ ion (labeled as "2" in Fig. 4 and "Mg2" in Figs. 3 and 5) forms fewer direct contacts with amino acid side chains at the active site; instead, it likely binds all three phosphate moieties of the incoming NTP. After nucleotide addition, Mg2 leaves the active site bound to PPi. A new Mg2 ion is brought in with each incoming NTP at the beginning of each cycle. The nucleotide addition reaction can be reversed if the concentration of inorganic pyrophosphate is high enough, a process known as pyrophosphorolysis (29).

Alternatively, Mg2 can be stabilized by phosphate groups of a backtracked RNA (30) or acidic residues of transcript cleavage factors (31). In this case, the active site of RNAP can catalyze a different type of reaction: hydrolysis of a nascent RNA (32,33).

Fig. 3. Nucleotide addition cycle.

(adapted with changes from (26))

TL – trigger loop, TH – trigger helices, BH – bridge helix. Template DNA is red, non-template DNA is black, RNA is yellow, Mg2+ ions in the active site are indicated by magenta circles.

Fig. 4. Catalysis of nucleotide addition by RNAP (residue numbers correspond to E. coli)

(adapted from (28))

Conserved aspartate residues (“catalytic triad”: D460, D462 and D464) of the β’ subunit coordinate the high-affinity Mg2+ ion (#1, Kd ~100 µM (29)) that forms contacts with the α-phosphate of a substrate and the 3’-OH of a primer. The low-affinity Mg2+ ion (#2, Kd ~10 mM (29)) comes in with each NTP substrate and leaves with PPi each cycle; in the active site it is coordinated by residues provided by both β and β’ subunits (β’ D460 and β E813).

The reaction commences with nucleophilic attack of the 3’-OH of the primer on the α-phosphate of the substrate. Mg1 is thought to lower the pKa of the hydroxyl group and to stabilize a pentacoordinate transition state along with Mg2. Mg2 compensates for a build-up of a negative charge on the phosphate groups and facilitates the leaving of the β and γ phosphates as PPi.
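The 100-fold difference between the two Kd values quoted in the legend can be made concrete with a simple binding isotherm. This is a rough sketch that assumes independent single-site binding of free Mg2+, occupancy = [Mg] / (Kd + [Mg]); in the cycle described here Mg2 actually arrives bound to the NTP, so the second number is only an order-of-magnitude illustration. The function name and the 1 mM Mg2+ concentration are hypothetical.

```python
def fractional_occupancy(ligand_molar: float, kd_molar: float) -> float:
    """Fraction of sites occupied for simple one-site binding."""
    return ligand_molar / (kd_molar + ligand_molar)


KD_MG1 = 100e-6  # high-affinity site, ~100 uM (value quoted in the legend)
KD_MG2 = 10e-3   # low-affinity site, ~10 mM (value quoted in the legend)

free_mg = 1e-3   # hypothetical free Mg2+ concentration, 1 mM

print(round(fractional_occupancy(free_mg, KD_MG1), 3))  # 0.909
print(round(fractional_occupancy(free_mg, KD_MG2), 3))  # 0.091
```

Under these assumptions the Mg1 site is mostly filled while the Mg2 site is mostly empty, which is consistent with Mg2 being delivered with each substrate rather than residing permanently in the active site.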


Intrinsic transcript cleavage rates are typically low; however, hydrolysis is greatly stimulated in the presence of transcript cleavage factors, such as Gre in bacteria or TFIIS in eukaryotes (33).

The NAC (Fig. 3) begins with the NTP binding to the post-translocated complex in a so-called pre-insertion state in which the NTP is base-paired to the DNA template strand (T DNA), but is not properly aligned for catalysis with the 3’-OH (Figs. 3 and 5A); in this state, the active site is in an open conformation (26). The open conformation allows entrance of the incoming NTP. This step is followed by the closure of the active site which is mediated by a key element of the β’ subunit called the trigger loop (TL).

The TL is disordered in the Tth holoenzyme structure (5) but in the elongation complex it refolds into an α-helical hairpin called the trigger helix (TH) in the presence of the NTP substrate (26), with which it makes multiple contacts. The TH forms a triple-helical bundle with the catalytic bridge helix. This structural change closes the active site and moves the NTP into the insertion position, in line for nucleophilic attack (Figs. 3, 4 and 5B). In the pre-insertion state, the phosphate moieties are displaced from Mg1 but most of the contacts with the base and the sugar moiety of the incoming NTP are already formed (Fig. 5A), which allows for substrate binding with high affinity and specificity (26). Indeed, substitutions in the TL have a modest effect on substrate affinity but severely decrease the catalytic rate (34). The cycle may be interrupted by molecules that affect the structural transition of the TL. For example, the antibiotic streptolydigin prevents folding of the TL even in the presence of the substrate and allows direct visualization of the pre-insertion state (Fig. 5A) (26).

After formation of a new phosphodiester bond, the 3’-end of the transcript is positioned in the i+1 site, thereby blocking the substrate from binding (Fig. 3). Thus, RNAP has to translocate one nt forward on the DNA template and clear the i+1 site for the next NTP to be aligned (28,29,35-37). Translocation requires unwinding of one base pair (bp) of the downstream DNA duplex and the last bp of the RNA/DNA heteroduplex and reannealing one bp of the upstream DNA duplex. Single-molecule analysis of RNAP movement along the template suggests that forward translocation occurs in a stochastic manner, driven by thermal motions (38). After addition of NMP, RNAP diffuses back and forth on the template, oscillating between pre- and post-translocated states (38). Binding of NTP would shift the equilibrium to the post-translocated complex. Applied forces in either direction did not change the velocities of individual RNAP molecules, indicating that translocation is not a rate-limiting step in the NAC (39).
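The statement that NTP binding shifts the pre/post-translocated equilibrium can be sketched with a minimal linked-equilibrium calculation. The assumptions are mine, not the text's: the complex rapidly equilibrates between the pre- and post-translocated states with equilibrium constant K = [post]/[pre], and NTP binds only the post-translocated state with dissociation constant Kd. All parameter values below are hypothetical.

```python
def post_translocated_fraction(k_trans: float, ntp_molar: float, kd_ntp_molar: float) -> float:
    """Fraction of complexes in the post-translocated state (NTP-free plus NTP-bound).

    k_trans      -- assumed equilibrium constant [post]/[pre] without NTP
    ntp_molar    -- NTP concentration
    kd_ntp_molar -- assumed Kd for NTP binding to the post-translocated state
    """
    # Statistical weight of the post state relative to the pre state (weight 1).
    weight_post = k_trans * (1.0 + ntp_molar / kd_ntp_molar)
    return weight_post / (1.0 + weight_post)


# Hypothetical numbers: unbiased oscillation (K = 1) and Kd = 10 uM.
print(round(post_translocated_fraction(1.0, 0.0, 10e-6), 3))    # 0.5
print(round(post_translocated_fraction(1.0, 90e-6, 10e-6), 3))  # 0.909
```

In this picture the substrate does not push RNAP forward; it selectively stabilizes one of two thermally interconverting states, which is the sense in which binding of NTP shifts the equilibrium.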

In a recent report from Landick's group, the folding of trigger helices was linked to the formation of non-backtracked pauses (see Pausing Mechanism, p. 22) (40). The authors proposed that the principal contribution of TL folding is a steric alignment of all reactive groups in the catalytic site of RNAP for catalysis rather than direct involvement in catalysis: the conformation of the TL was found to be important for nucleotidyl transfer and pyrophosphorolysis (reactions that require the precise alignment of phosphate moieties for catalysis) but not for intrinsic or factor-assisted transcript hydrolysis (34).

The NAC entails conformational changes in other RNAP elements. Superposition of transcription elongation complex (TEC) structures in the presence and the absence of the NTP substrate revealed a displacement of the β pincer that causes a substantial opening of the DNA-binding channel upon NTP binding (Fig. 3)(26). With the RNAP jaws opened, DNA loses several polar and van der Waals contacts with the main channel and appears to be more mobile. No such opening occurs in the complex formed in the presence of streptolydigin, which stabilizes the inactive pre-insertion intermediate, suggesting that small changes in the active site may cause larger domain movements on the periphery of the complex. There is an intriguing possibility that the converse may also be true: proteins that bind on the outside may affect the mobility of the RNAP jaws, thereby changing the kinetics of nucleotide addition (41).

[Fig. 5, panels A and B: active-site views. Labels in the image: Stl, TL, TH, BH, T DNA, AMPcPP, Mg1, Mg2, register positions i-1, i and i+1, and residues Nβ’737, Dβ’739, Dβ’741, Dβ’743, Eβ685, Tβ’1088, Mβ’1238, Rβ’1239 and Hβ’1242.]

Fig. 5. Position of the incoming NTP in pre-insertion (A) vs. insertion (B) complexes.

Numbers correspond to Tth RNAP. Mg1 and Mg2 are shown as magenta spheres, T DNA bases as red sticks, the RNA primer in yellow, the substrate analog AMPcPP in blue, and streptolydigin (Stl) in black. The β’ TL (β’1221-1266) is shown in cyan and the BH (β’1066-1103) in orange; a fragment of β’ that contains the catalytic triad (D739, D741, D743; sticks) and the ribodiscriminator N737 (sticks) is dark grey; Eβ685, which participates in coordination of Mg2, is shown as light grey sticks.

A. A snapshot of the active site from the TEC/AMPcPP/Stl structure (PDB#2PPB). The TL is trapped in a partially unfolded state in the presence of streptolydigin, and AMPcPP is in the pre-insertion site. The phosphate moieties of the substrate are rotated ~35° away from Mg1, adopting a position incompatible with catalysis.

B. A snapshot of the active site from the TEC/AMPcPP structure (PDB#2O5J). The folded TH forms contacts with the phosphate moieties of the substrate and stabilizes it in the insertion conformation. Mβ’1238 (TH) and Tβ’1088 (BH) form contacts with the substrate and template bases, whereas Rβ’1239 and Hβ’1242 contact the phosphate moieties of the substrate, positioning them for catalysis.


Transcription Initiation

Initial Promoter Recognition

At the onset of initiation, σ binds to promoter elements in the context of the holoenzyme (a complex of core RNAP and σ; subunit composition α2ββ’ωσ) and nucleates DNA strand separation (Fig. 6)(42,43). Sequence comparison revealed two major types of σ factors: the σ70 superfamily (named after the σ70 factor that directs transcription of housekeeping genes in E. coli), whose members do not require any additional co-factors to initiate transcription, and the σ54 family, whose members depend on ATP-dependent activators for promoter opening (43). Typically, a bacterial genome encodes several σ70-family factors that have different promoter specificities, allowing for fast switching between genetic programs in response to environmental changes (44). A typical promoter for a primary (“housekeeping”) σ factor consists of two elements named -10 and -35, each 5-6 nt in length, separated by a 17 bp (±1 bp) spacer (43). In addition, a number of adjacent elements may affect holo RNAP recruitment, such as an AT-rich sequence element called the UP element, which interacts with α-CTD and is typically positioned ~60 bp upstream of the transcription start site, or binding sites for transcriptional activator proteins located at a similar distance from the transcription start site (8,12,45-48).

The process of promoter recognition by the σ70 holoenzyme has been studied in detail. Sequence-specific contacts with the -10 and -35 elements are established by helix-turn-helix motifs from regions 2 and 4, respectively (Fig. 1)(5,49-51). Region 3.2 (σ3.2), located between the promoter-recognition domains, adopts a hairpin conformation that protrudes towards the active site in the holoenzyme and occupies a part of the RNA exit channel (51,52). This region was shown to facilitate binding of the substrate nucleotide in the active site after open complex formation, as well as to affect promoter escape. The position of the dispensable region σ1 in the promoter-bound RNAP is not clear; some studies suggest that it is loosely bound in the main channel in the closed complex and is displaced by the downstream DNA duplex at a later step of open complex formation (53,54).


Intermediates in Open Complex Formation

The process of open complex formation results in protein-assisted DNA melting inside the RNAP from position -12 to +1 relative to the transcription start site (51). During initiation, the promoter-holoenzyme complex undergoes several conformational changes, from a loosely associated closed complex (RPC, Fig. 6) to a stable initiation-competent open complex (RPO). It is assumed that open complex formation occurs by a common mechanism at different promoters; however, the relative occupancy of structural intermediates varies. In other words, the kinetic intermediates in the pathway may correspond to different structural complexes at different promoters (55).

For bacteriophage promoters T7A1 and λPR, at least two kinetically significant intermediates have been identified, I1 and I2 (56-58). Table 1 summarizes some important characteristics of promoter complexes in the initiation pathway for the λPR promoter.

RPC, the first complex in the scheme, is kinetically insignificant and can be detected only in melting-deficient RNAP variants (59). However, at certain promoters, such as the ribosomal rrnB P1, RPC is well-populated (60). The biochemical features of this complex are a short DNaseI footprint, extending approximately from -48 to +1 relative to the start site, and the absence of strand separation. The hypothetical conformation of RPC is shown in Fig. 6.

The current model for promoter opening suggests that strand separation starts in I1 from a σ-induced (or rather σ-stabilized) kink centered at -11/-12 (9,61). I1 is relatively stable and can be characterized at low temperatures. DNaseI protection extends downstream to +22, and is generally attributed to interactions of DNA with the β upstream lobe domain (a part of the β pincer). The sharp bend allows downstream DNA to enter the RNAP jaws, which together form the main channel (62,63). It is believed that promoter bending and loading into the main channel are coupled with the nucleation of melting; however, DNA separation at this step cannot be detected by permanganate modification (64). Potassium permanganate reacts selectively with T residues in DNA unless they are protected by stacking interactions with adjacent bases in a double helix or with protein residues, causing an irreversible modification that can be mapped with 1 nt resolution. In the current model, -11A of the non-template DNA strand (NT DNA) flips out and is then captured by a stacking interaction with tyrosine 430 in σ2.3 (62,65,66). Numerous studies have demonstrated that for λPR this step requires wrapping of upstream DNA, up to position -81, around the RNAP (67). Record and colleagues proposed that upstream wrapping critically facilitates loading of the downstream DNA into the jaws (67).

| | RPC | I1 | I2 | RPO |
|---|---|---|---|---|
| Kinetic parameters | Kinetically insignificant | KB = 6×10^6 M^-1; ΔC˚ = -1.4 kcal K^-1; ΔH˚ = -27 kcal | k2 = 6.6×10^-1 s^-1; ΔC˚ = ~0; Eact = 34 kcal | Half-life > 8 h (62) |
| Protection from DNaseI | -48 to +1 | -80 to -56; -48 to +22 | Not determined | -48 to +22 |
| Strand separation | No separation | Nucleated, -12 to -9; controversial | -11 to -1 (69) | -12 to +2 (68) |
| Sensitivity to competitors | Sensitive | Sensitive | Controversial | Resistant |
| Key events | Establishing of sequence-specific contacts with promoter DNA and σ | σ-induced kink at the -10 region, flipping out of the -11 base, wrapping of upstream DNA | Engulfing of downstream DNA to the main channel, propagation of DNA melting | Transcription bubble reaches the +1 position, accommodation of DNA in the main channel |

Table 1. Characteristics of promoter-RNAP complexes in transcription initiation.

Using rapid quench kinetics, the rate of RPO formation was measured as a function of [RNAP] over a range of temperatures, and k1 and k2 were determined (70). KB = k1/k-1, binding constant at 37 ˚C; k2, isomerization rate constant (I1 to I2) at 37 ˚C; ΔC˚, activation heat capacity change; Eact, Arrhenius activation energy; ΔH˚, change in enthalpy at 37 ˚C.
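To make the constants in Table 1 more tangible, the following sketch computes the observed rate of open complex formation at a given RNAP concentration and the upper bound on the RPO decay rate implied by its half-life. It assumes a simplified scheme in which binding (KB) equilibrates rapidly and is followed by a rate-limiting isomerization (k2); this hyperbolic form is an assumption made for illustration, and the function names are hypothetical.

```python
import math

K_B = 6e6   # M^-1, binding constant for I1 formation at 37 C (Table 1)
K_2 = 0.66  # s^-1, I1 -> I2 isomerization rate constant at 37 C (Table 1)


def k_obs(rnap_molar, k_b=K_B, k_2=K_2):
    """Observed rate constant (s^-1), assuming rapid-equilibrium binding
    followed by rate-limiting isomerization (assumed scheme)."""
    return k_2 * k_b * rnap_molar / (1.0 + k_b * rnap_molar)


def max_decay_rate(min_half_life_hours):
    """Upper bound on a first-order decay constant (s^-1) for a complex
    whose half-life is at least the given number of hours."""
    return math.log(2) / (min_half_life_hours * 3600.0)


for conc_nm in (10, 100, 1000):  # nM RNAP, illustrative concentrations
    print(f"[RNAP] = {conc_nm:5d} nM -> k_obs = {k_obs(conc_nm * 1e-9):.3f} s^-1")
print(f"RPO decay rate constant < {max_decay_rate(8.0):.1e} s^-1")
```

Under this assumed scheme, k_obs reaches half of k2 when [RNAP] equals 1/KB (about 170 nM) and saturates at k2 at high RNAP concentrations.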


Fig. 6. Schematic of the steps of transcription initiation.

Positions -35 and -10 of the promoter elements are indicated by black arrows, position of transcription start is indicated by a red arrow. For λPR, intermediate complexes I* and I** correspond to I1 and I2.


For λPR, the formation of I1 is characterized by a large negative change in heat capacity (indicating that the surface removed from water is predominantly non-polar) and an unusually low binding constant for a specific binding interaction over an extensive surface (Table 1)(62). It has been proposed that DNA binding is coupled to its bending, so that the largest portion of the binding energy compensates for unfavorable conformational changes in the protein and the promoter that occur upon DNA loading into the main channel (62).

Structural analysis suggests that a second kink around the +1 position in the promoter DNA is required to form the initiation-competent RPO; this kink is believed to form in I2 (71). I2 isomerizes into RPO very rapidly and, owing to its short half-life, is not amenable to biochemical analysis. The transition from I1 to I2 is the rate-limiting step in RPO formation for λPR, and is characterized by a large temperature-independent Arrhenius activation energy (E2act = 34 (±2) kcal)(62). This kinetic signature suggests that formation of the transition state (I1–I2)‡ involves large conformational changes and the exposure of a polar and/or charged surface to water. Surprisingly, it does not exhibit a measurable change in activation heat capacity (Table 1). A plausible explanation offered by the Record group is that the opposite effects of changes in polar and non-polar water-accessible surfaces on the heat capacity would compensate each other if 70% of the surface area exposed upon the I1–I2 transition is polar or charged (62).
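The practical meaning of an activation energy of this size can be shown with a short Arrhenius calculation. The sketch below assumes that the value of 34 kcal is expressed per mole and that the activation energy is temperature-independent, as stated above; the comparison temperatures (25 and 37 ˚C) are chosen only for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def arrhenius_ratio(e_act_kcal_per_mol, t1_celsius, t2_celsius):
    """Return k(T2)/k(T1) for a temperature-independent activation energy."""
    t1 = t1_celsius + 273.15
    t2 = t2_celsius + 273.15
    exponent = -(e_act_kcal_per_mol / R_KCAL) * (1.0 / t2 - 1.0 / t1)
    return math.exp(exponent)


ratio = arrhenius_ratio(34.0, 25.0, 37.0)
print(f"k2(37 C) / k2(25 C) = {ratio:.1f}")
```

With these assumptions the isomerization is roughly nine times faster at 37 ˚C than at 25 ˚C, which illustrates why the I1 to I2 step is so sensitive to temperature.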

Transition to Elongation

The transition from initiation to elongation requires breaking of most core-σ and σ-DNA contacts. Prior to the transition, RNAP repetitively synthesizes and releases short 3-12 nt RNAs, a process known as abortive initiation (Fig. 2). To explain the apparent contradiction between the unchanged position of the upstream DNA (determined by its contacts with σ) relative to the RNAP and the active site translocation upon synthesis of short RNAs, a "scrunching" model was proposed (19). The model postulates that RNAP maintains protein-promoter contacts while the transcription bubble propagates further downstream, forming an unstable stressed intermediate complex. Single-molecule studies provided evidence for scrunching during abortive initiation (20). In a stressed complex, ~1 turn of DNA is unwound and contracted, pulled by RNAP into itself. Accumulated tension can be resolved in two ways. In abortive initiation, RNAP loses its grip on the downstream duplex, allowing single-stranded DNA to reanneal in front of it with extrusion of the nascent RNA. In productive initiation, RNAP breaks contacts with the upstream duplex and preserves the RNA/DNA hybrid. In this scenario, the transcription bubble is reduced to a size typical for the TEC by partial collapse of the single-stranded DNA bulges at the downstream part of the hybrid. The energy accumulated in the stressed intermediate is thought to compensate for the disruption of RNAP-promoter contacts upon the transition to elongation.

Promoter escape is accompanied by a number of rearrangements in the transcription complex. When a growing RNA reaches 5-7 nt, it clashes with σ3.2, which already occupies the RNA exit channel (Fig. 1)(51,52). As the RNA becomes even longer (16-17 nt), it promotes disruption of σ contacts with the β flap. The only σ-RNAP contact that remains compatible with changes in the transcription complex upon the transition to elongation is the interaction of σ region 2 and the β’ clamp helices (β’CH, Fig. 1)(5,23,72).

In contrast to the single-subunit T7 RNAP, bacterial RNAP does not undergo large-scale refolding upon formation of a processive transcription elongation complex (TEC) (73-75). Instead, the displacement of the σ factor manifests the transition from initiation to elongation. The lack of structural information on the initiation complex for bacterial RNAP obscures the assessment of conformational changes that drive the formation of the TEC.

The TEC is extraordinarily stable in up to 1M KCl, temperature up to 60 °C (E. coli) and pressure up to 180 MPa (76,77). Dissociation of the TEC under physiological conditions can be achieved only at certain sites on the template. Intrinsic interactions between RNAP, nascent RNA and/or DNA can give rise to altered TEC conformations that differ in their stabilities and catalytic properties. These conformations can also be targeted by protein factors that can affect the termination efficiency at a particular position (25).


Transcription Pausing and Termination

Role of Pausing in Transcription Regulation

Even at saturating NTP concentrations, RNAP translocates in leaps, with its fast movement along the template punctuated by pauses (78). At a pause site, elongating RNAP stops temporarily but does not dissociate from the nucleic acids, undergoing a reversible isomerization into a catalytically-inactive state. Upon addition of a nucleotide, RNAP can escape and continue chain elongation. Kinetically, pauses can be characterized by their efficiency, the fraction of RNAP molecules that enter the paused state at a given position, and longevity, the pause half-life (79).
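The two kinetic descriptors of a pause defined above can be written out explicitly. The sketch below assumes a simple branched scheme in which a complex arriving at the pause site either elongates (rate `k_elong`) or enters the paused state (rate `k_pause`), and in which paused complexes escape with first-order kinetics (rate `k_escape`); all rate names and values are hypothetical and serve only to illustrate the definitions.

```python
import math


def pause_efficiency(k_pause, k_elong):
    """Fraction of complexes that enter the paused state at a site,
    assuming kinetic partitioning between pausing and elongation."""
    return k_pause / (k_pause + k_elong)


def pause_half_life(k_escape):
    """Half-life (s) of a paused state that decays with first-order kinetics."""
    return math.log(2) / k_escape


def fraction_paused(t, efficiency, k_escape):
    """Fraction of all complexes still paused at time t (s) after arrival."""
    return efficiency * math.exp(-k_escape * t)


eff = pause_efficiency(k_pause=1.0, k_elong=3.0)  # hypothetical rates, s^-1
t_half = pause_half_life(k_escape=0.05)           # hypothetical rate, s^-1
print(f"efficiency = {eff:.2f}, half-life = {t_half:.1f} s")
print(f"fraction paused after one half-life = {fraction_paused(t_half, eff, 0.05):.3f}")
```

In a time-course experiment, the efficiency corresponds to the amplitude of the paused fraction extrapolated to zero time, and the half-life to the time required for that fraction to fall by half.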

Pausing plays many regulatory roles, is an obligate step in termination pathways, and likely controls the overall rate of RNA chain elongation (80). Whereas core E. coli RNAP can extend the nascent RNA at 500 nt/sec in vitro (37,39), it moves only at 20-90 nt/sec in vivo (81). This relatively slow rate may be necessary for timely recruitment of regulatory factors, such as RfaH and λQ (82,83), attenuation control (84), co-transcriptional folding of the RNA (85,86), and efficient coupling of transcription and translation. Both general and operon-specific regulators control the fate of an emerging transcript, fine-tuning gene expression in response to environmental signals (25). These regulators often function as a part of multi-component ribonucleoprotein complexes.

Regulation of pausing and termination is a key feature of such diverse bacterial systems as (87), attenuation (88), temporal expression of the phage λ genome (83,89) and control of ribosomal operon expression (78). Emerging studies of pausing in eukaryotes suggest that it plays roles in Tat-mediated regulation of HIV-1 transcription (90), polyadenylation (91,92) and splice site selection (93,94).

Cellular Elongation Factors

Factors that affect RNAP progression along the template can be divided into two groups: locus-specific factors, which change the pausing or termination properties of the TEC only at a unique site, and general elongation factors, which display a broader specificity because they mostly target a particular conformation of the TEC. Interaction of general elongation factors with the TEC provides important insight into the mechanism of pausing and termination and will be the focus of further discussion.

Gre factors bind to arrested TECs through the secondary channel and stimulate the hydrolytic cleavage of the nascent transcript by RNAP (95-97). Such an arrest occurs if RNAP slides back on the template and the 3’-end of the growing transcript occludes the active site, preventing nucleotide addition (see Pausing Mechanism, p. 22)(98,99). Gre-assisted RNA hydrolysis generates a new 3’-OH primer in the active site, rescuing the arrested complex. The activity of Gre factors can be easily monitored in vitro by the release of short RNA fragments. A structural signature of this family of factors, which includes eukaryotic TFIIS (100), is an elongated domain with two conserved acidic residues on its tip that are inserted into the secondary channel and coordinate the second Mg ion required for catalysis (Figs. 4 and 7)(31,101). However, the role of this activity in vivo is not well understood. Gre factors have been implicated in transcription initiation and promoter escape (21,102) as well as proofreading during elongation (30,103); however, none of these activities appears to be essential, as deletion of both Gre factors in E. coli is not lethal (104).

The essential elongation factor NusA has an even broader range of transcriptional activities. It was identified in E. coli as a factor required for phage λ gene expression and thus named N utilization substance A (105). In vitro, NusA reduces the rate of elongation and enhances hairpin-induced pauses and termination at intrinsic terminators (41,106). At the same time, NusA is an essential component of large antitermination complexes that assemble at λQ- and λN-regulated , as well as ribosomal operons (89,107,108). NusA is a large extended protein with multiple RNA-binding motifs, an N-terminal domain that interacts with the β-flap and a C-terminal domain that interacts with α-CTD and λN (Figs. 7 and 11)(109-112). The mechanism by which NusA stimulates or suppresses pausing and termination remains elusive. Its position near the RNA exit channel and direct crosslinking to a pause hairpin (112) suggest that NusA can affect β-flap/RNA hairpin interactions that were shown to be important for the efficiency of pausing. The essential role of NusA in the cell is somehow linked to Rho action, although different studies provide contradictory evidence on the exact nature of their interaction. In one case, construction of the ∆nusA strain became possible after deletion of the cryptic rac prophage (113), normally suppressed by Rho, which implies cooperation between the two factors. However, in a different study, lethality of ∆nusA was suppressed by compensatory mutations in rho that reduced the activity of the factor (114). Moreover, NusA inhibits Rho-dependent termination in vitro (115).

NusG is another essential component of the ribosomal and λN antitermination systems (116). NusG is ubiquitous in bacteria and its homologs are present in archaea and eukaryotes, but which of its functions makes it essential is not well understood (117-123). Just like NusA, NusG is essential in wild-type E. coli strains (124) but can be inactivated if the rac prophage is deleted (113). In vitro studies of E. coli NusG identified two apparently contradictory activities: on one hand, NusG suppresses pausing at class II sites and increases the elongation rate (41,125); on the other hand, it stimulates Nun- and Rho-dependent termination (126-128). Many organisms also encode specialized forms of NusG, such as RfaH, that control specific sets of genes (a more detailed discussion of NusG proteins is provided in NusG Family of Elongation Factors, p. 37)(129,130).

The NusE:NusB heterodimer is also required for assembly of fully functional λN and ribosomal antitermination complexes (131,132). In addition to its involvement in transcription, NusE (ribosomal protein S10) also participates in translation as a component of the 30S ribosomal subunit. NusB recognizes a specific sequence in the nascent transcript called boxA and loads the NusE/NusB heterodimer on the TEC; this interaction is further stabilized by NusE contacts with α-CTD (133,134). Recent studies revealed that NusE also forms direct contacts with NusG, specifically with its C-terminal domain (135). Isotope-filtered NOESY NMR spectra revealed that the NusGC/NusE binding interface does not overlap with the NusE/NusB dimerization surface, or with determinants that mediate NusE interaction with the ribosome. Thus, NusE/NusG may act as a bridge between the elongating RNAP and the translational machinery.


Fig. 7. The regulatory map of TEC and interaction relationships between elongation factors that bind to the same site on the RNAP surface.

Interaction of factors that target the same binding site can occur in different ways: the binding of one factor may exclude the binding of “competitors”, two factors may enhance the effects of each other, or display no apparent competition or cooperation.

Factors that bind to β’CH (NusG, RfaH, σ) exclude each other from the TEC (136,137). NusA and σ both form contacts with the β-flap but do not compete (138,139), likely because interaction with β’CH is sufficient for σ retention. Binding of the NusE:NusB heterodimer has a cooperative effect on NusA binding (105); so does ppGpp on DksA (140). GreA and GreB compete for binding to the secondary channel (96), whereas GreB and DksA do not (AS, unpublished).


Pausing Mechanism

Although pausing is caused by sequence-dependent signals, no consensus pause sequence exists. Single-molecule studies revealed that bacterial RNAP pauses frequently (every 100-200 bp) for 1-6 sec on average (39,141). Comparison of elongation profiles of individual RNAP molecules on a gene without strong pause sites showed that their rates are remarkably similar during periods of uninterrupted movement (141).

The observed heterogeneity in overall rates arises mostly from short stochastic pause events that occur randomly throughout the template. The rate distribution curves consistently showed only two peaks for each profile: one that corresponds to the rate of transcription between pauses and another at 0 nt/sec, corresponding to the pause itself. “Slow” molecules appear to spend a larger fraction of time in a paused state than “fast” RNAPs, rather than having a distinct elongation rate or pausing more frequently. Other studies have reported a wider distribution of the elongation rates between pauses among individual molecules; however, these rates remain constant for each molecule over the course of an experiment (39).
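How short stochastic pauses lower the overall elongation rate without changing the rate between pauses can be estimated with a back-of-the-envelope calculation. The sketch below uses the pause spacing (100-200 bp) and duration (1-6 sec) quoted above; the pause-free rate of 20 nt/sec is a hypothetical value chosen for illustration, and pauses are treated as regularly spaced events of fixed length rather than as a stochastic process.

```python
def mean_elongation_rate(pause_free_rate, bp_between_pauses, pause_duration):
    """Average rate (nt/s) when uninterrupted runs alternate with pauses."""
    run_time = bp_between_pauses / pause_free_rate
    return bp_between_pauses / (run_time + pause_duration)


def fraction_time_paused(pause_free_rate, bp_between_pauses, pause_duration):
    """Fraction of total time that the complex spends in the paused state."""
    run_time = bp_between_pauses / pause_free_rate
    return pause_duration / (run_time + pause_duration)


PAUSE_FREE_RATE = 20.0  # nt/s, hypothetical rate between pauses

for spacing, duration in ((200.0, 1.0), (100.0, 6.0)):
    rate = mean_elongation_rate(PAUSE_FREE_RATE, spacing, duration)
    frac = fraction_time_paused(PAUSE_FREE_RATE, spacing, duration)
    print(f"pause every {spacing:.0f} bp for {duration:.0f} s: "
          f"mean rate = {rate:.1f} nt/s, time paused = {frac:.0%}")
```

Two molecules with the same pause-free rate can therefore differ considerably in their overall rates solely because of the time spent in the paused state, in line with the behavior of “slow” and “fast” molecules described above.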

Pausing arises when a fraction of the pre-translocated complexes isomerizes into a catalytically inactive off-pathway intermediate (termed an elemental pause, or a common intermediate) whose existence was inferred from the analysis of RNAP response to regulatory pause signals and elongation factors (41,75). Formation of the elemental pause involves fraying of the 3’-terminal nucleotide from the template base, which slows down phosphoryl transfer and the isomerization into a post-translocated state. Misalignment of the RNA 3’-end in the active site has been shown by crosslinking (40), site-directed mutagenesis (41) and X-ray crystallography (142). Consistent with the observations for strong pause sites (bacterial RNAPs), two uridine residues at the 3’-end were found to favor a frayed conformation (Fig. 8D). The base and the sugar moiety of the ultimate nucleotide are not resolved in the structure, indicating a relaxed conformation of the 3’-UMP. The frayed conformation can be further stabilized by a preceding misincorporation event, such as formation of a wobble dT-rU pair (Fig. 8A, B), allowing crystallographic characterization. In a frayed complex, the RNA 3’-NMP is flipped away from the template DNA and the active site, compromising the correct positioning of the incoming nucleotide.

For bacterial RNAP, formation of an elemental pause state has been proposed to be linked to a modest opening of a mobile clamp domain (also called β' pincer) that contacts nucleic acids in the TEC (6,7,41). If opening of the jaws and altering of RNAP/DNA contacts indeed correspond to the folded TH and the closure of the active site, such a conformation would favor isomerization into an off-pathway intermediate from the pre-translocated state: the incoming NTP, which stabilizes a post-translocated complex, requires an open pre-insertion site.

The elemental pause can escape to the elongation pathway, isomerize into long-lived paused states or give rise to the termination complex. A proposed branched mechanism of pausing is shown in Fig. 9. Longer-lived pauses require additional rearrangements in the TEC, such as backtracking (class II pauses), stabilization by a nascent RNA hairpin (class I pauses) or a bound protein factor (41). Paused complexes may restore the correct alignment at the active site, after addition of NMP for the hairpin-stabilized pauses or by hydrolytic cleavage of the backtracked RNA for class II pauses, and continue productive elongation. Alternatively, the arrested TEC can release the transcript or be removed by termination factors like Rho or Mfd (see Factor-Dependent Termination, p. 30)(143-145).

Although class I and II pauses are mechanistically distinct and differ in their response to elongation factors (Fig. 9), they share some common features, such as their response to particular mutations in RNAP (41). Most class I signals were found in untranslated regions of biosynthetic operons in Enterobacteria, such as the his pause in E. coli (hisP) and the trp pause in Serratia marcescens (146,147).

[Fig. 8, panels A-D: active-site views. Labels in the image: TL, BH, N479, D485, R766, Y769, αP, and template bases dT/dA (A), dT/dC (B), dT/dA (C) and dA/dA (D).]

Fig. 8. Alternative conformations of 3’-rNMP in X-ray structures of yeast RNA polymerase II (Pol II).

The same color scheme as in Fig. 5 is used. Asterisks indicate a frayed 3’-rNMP.


Penultimate dT-rU wobble base pair induces fraying of the following 3'-NMP:

A. dT-rU mismatch is followed by a canonical dA-rU bp (PDB#3HOW);

B. dT-rU mismatch is followed by a canonical dC-rG bp (PDB#3HOZ);

Both frayed rU and rG occupy the NTP-binding site but adopt different conformations:

U is parallel to the axis of the DNA/RNA hybrid and forms contacts with R766 (blue sticks) of Rpb2, normally involved in substrate binding; G is perpendicular to the hybrid axis and is stabilized by stacking interactions with Y769.

Two uridine residues at the 3’-end destabilize base pairing and favor a frayed conformation:

C. Canonical dT-rA bp (not shown) followed by dA-rU bp (PDB# 3HOV);

Base pairing is normal, complex is in a post-translocated state.

D. Canonical dA-rU bp followed by another dA-rU (PDB# 3HOV).

3'-rU does not base pair with the template and is not resolved in a structure (only αP is visible, indicated by a black arrow). Complex is in a pre-translocated state.


Fig. 9. Structural features and kinetic relationship of different elongation complexes.

The TEC oscillates between pre- and post-translocated states. The pre-translocated complex may spontaneously isomerize into an inactive intermediate, the elemental pause. A hallmark of the elemental pause is the fraying of the 3’-terminal nucleotide.

This short-lived complex can be re-activated upon binding of NTP to continue the cycle of nucleotide addition. Alternatively, the frayed complex may undergo further rearrangements (backtracking, formation of an RNA hairpin, etc., see text) into one of the long-lived paused or termination complexes. Formation of the termination complex is irreversible, whereas paused complexes may eventually escape to elongation. Class I and II pauses differ in their response to elongation factors. Hairpin-dependent pauses are stimulated by NusA and suppressed by RfaH; backtracked pauses are sensitive to NusG, RfaH and Gre factors.


At least four different elements have been found to affect class I pausing: i) a nascent RNA hairpin, ii) the 10- or 11-nt long region between the hairpin base and the 3’-end of the RNA, iii) the two bases in the active site and iv) ~14 bp of dsDNA downstream from the pause site (148,149). All four components were shown to slow down nucleotide addition. Reducing the distance between the hairpin and the 3’-OH to 8 nt or less causes transcript release, but only from a stalled complex (150). A more detailed dissection of the hisP-RNAP complex demonstrated that the RNA hairpin contacts a flexible β flap domain located at the RNA exit channel and likely affects nucleotide addition allosterically (112). The TEC stalled at the his pause appears to be in the pre-translocated state with a frayed RNA 3’-end, and is thought to be stabilized by the inhibitory conformation of the TL (34,40).

Class II pauses have been found in both bacteria and eukaryotes, and they are characterized by a reverse translocation (backtracking) of RNAP along the DNA and RNA chains (90). Backtracking does not require any specific protein-nucleic acid interactions and reflects the natural property of RNAP to slide back and forth on the DNA/RNA scaffold. In a backtracked complex, the unpaired 3’-end of the nascent transcript protrudes toward the secondary channel, where it occludes the binding of an incoming nucleotide (99). Backtracking is facilitated by a weak RNA/DNA hybrid in the transcription bubble (151). RNAP may escape the arrested state by forward translocation or by generation of a free 3’-OH through intrinsic or factor-induced RNA cleavage in the active site (33,95,98,152). Stable secondary structures in the RNA or a protein that binds behind the TEC (anything that prevents RNAP sliding) would prevent backtracking. A recent finding that efficient progression of ribosomes during nascent RNA translation increases the speed of RNAP suggests that backtracking controls the overall rate of RNA synthesis and transcription-translation coupling (153).

Recently, another type of pausing has been characterized in which the TEC is in the post-translocated state, capable of binding the incoming NTP (154). This pause, encoded by the T7A1 D111 template, does not depend on the secondary RNA structure and may be caused by i) misalignment of the 3’-end or an incoming NTP, or ii) RNAP interactions with the non-template DNA strand.

Intrinsic Termination

Unlike dissociation of the DNA replication machinery, dissociation of the transcription complex is irreversible. Termination requires structural rearrangements in the normally stable TEC that weaken RNAP's grip on nucleic acids and allow for RNA release. Although the factors that trigger destabilization of the TEC are numerous and the proposed mechanisms of their action are even more abundant, they all involve forces imposed on the upstream part of the transcription bubble (155). Typically, terminators are classified based on whether or not they require a protein co-factor.

An intrinsic terminator is encoded in the nucleic acid and typically includes a GC-rich hairpin followed by a U-rich segment (25,156). Transcript release occurs 7-8 nt downstream of the hairpin base. Termination efficiency cannot be accurately predicted from the hairpin length, GC content or the sequence of the 3’-end, which may vary significantly (157). Although the exact mechanism of hairpin-dependent termination is unknown, several models have been proposed. The most straightforward “rigid-body” models postulate that termination is triggered by the shortening of the RNA/DNA hybrid inside the TEC. The formation of a hairpin is thought to unwind the upstream edge of the transcription bubble already weakened by an unstable poly(rU:dA) hybrid.

In vitro, termination can be induced by a complementary oligo added in trans that simulates an upstream portion of the hairpin, implying that the interactions between the hairpin loop and RNAP are not required for transcript release (150,158).
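The sequence signature of an intrinsic terminator described above (a GC-rich hairpin followed by a U-rich segment) can be located in an RNA sequence with a simple scan. The sketch below is a toy illustration, not a validated prediction tool: it considers only perfect Watson-Crick stems of fixed length, expects an uppercase sequence of A, C, G and U, and uses thresholds (`stem`, `loop_min`, `loop_max`, `tail`, `min_gc`, `min_u`) that are hypothetical. As noted above, termination efficiency cannot be predicted from such features.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_terminator_candidates(rna, stem=5, loop_min=3, loop_max=8,
                               tail=8, min_gc=0.6, min_u=0.6):
    """Return (start, end) index pairs of hairpin + U-rich tail candidates.

    Only perfect Watson-Crick stems are considered (no G-U pairs, bulges
    or mismatches); all thresholds are hypothetical.
    """
    hits = []
    for i in range(len(rna) - stem + 1):
        left = rna[i:i + stem]
        # The 3' arm must be the reverse complement of the 5' arm.
        rev_comp = "".join(COMPLEMENT[base] for base in reversed(left))
        for loop in range(loop_min, loop_max + 1):
            j = i + stem + loop  # start of the 3' arm of the stem
            right = rna[j:j + stem]
            tail_seq = rna[j + stem:j + stem + tail]
            if len(right) < stem or len(tail_seq) < tail:
                continue
            if right != rev_comp:
                continue
            gc_fraction = sum(base in "GC" for base in left) / stem
            u_fraction = tail_seq.count("U") / tail
            if gc_fraction >= min_gc and u_fraction >= min_u:
                hits.append((i, j + stem + tail))
    return hits


# Synthetic example: 5-bp GC-rich stem, 4-nt loop, 8-nt U tract.
example = "AAAA" + "GCGGC" + "UUCG" + "GCCGC" + "UUUUUUUU" + "AA"
print(find_terminator_candidates(example))
```

A realistic search would additionally allow imperfect stems and variable stem lengths and would score hairpin stability thermodynamically; those refinements are deliberately omitted here.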

Scenarios for the last step of RNA extraction from the active site also vary from study to study. In a forward translocation model, strong hybrids behind the RNAP

(upstream DNA duplex and RNA stem) and weak hybrids inside and in front of the TEC create conditions for translocation without nucleotide addition. Forward translocation results in a loss of 3’-OH from the active site and shortens the DNA/RNA hybrid, which further destabilizes the complex (157,159,160). In support for this model, intermolecular

crosslinking of the downstream duplex was found to prevent transcript release at the well-characterized t82 terminator from a lambdoid phage 82 (161). In the RNA pullout model, a propagating hairpin stem can extract RNA from the hybrid without RNAP translocation (162). Certain sequences, such as the homopolymeric tract of the T7 terminator (T7 phage), were shown to induce RNAP slippage, resulting in a longer polyU tail at the 3’-end than the one encoded by the template (156).

Allosteric models stress the importance of the interactions between a hairpin and

RNAP structures in proximity of the RNA exit channel. Such interactions are proposed to trigger extended conformational changes in RNAP, including clamp opening and active site rearrangement, ultimately leading to RNA release (112,163,164). For example, analysis of RNA release at the λtR2 terminator (lambda phage) was not consistent with the forward translocation model: the increased stability of the downstream hybrid stimulated RNA release and no change was detected in the crosslinking pattern of the 3’-end in the trapped termination complex (164).

Most likely, both allosteric effects and hairpin invasion contribute to termination.

Recent single-molecule analysis indicates that termination at different signals may occur via distinct mechanisms. Exertion of external force along the template was found to stimulate transcript release for the t500 terminator (a mutant derivative of t82 (158)), consistent with the forward translocation model. Neither assisting nor hindering forces applied to DNA affected his* (mutant derivative of histidine biosynthetic operon leader region from S. typhimurium) or λtR2 terminators; pulling RNA, however, did make a

difference. Interestingly, forces that caused a dramatic increase in efficiency of λtR2 and

his* terminators were much lower than those that were required for hairpin unfolding or

extraction of the U-track from the hybrid. The authors attributed the observed effect to

the suppression of secondary structures in RNA that normally compete for residues

involved in the hairpin formation. The same effect can be achieved in bulk solution with

a complementary oligonucleotide that anneals immediately upstream from the stem-

loop structure. Thus, the probability of hairpin formation is a defining factor for the

efficiency of λtR2 and his* terminators. No translocation larger than 1 bp was observed prior to transcript release on λtR2 and his*; RNA appears to move independently of

DNA as it is being extracted from the TEC (165).

Factor-dependent Termination

At least two protein-mediated termination mechanisms have been characterized in bacteria. Rho is an ATP-dependent helicase/translocase responsible for ~50% of termination events in E. coli (166). Rho is essential in E. coli, targeting mostly prophages

and other horizontally acquired genes (113). Rho binds to a nascent RNA at the rut (Rho

utilization) site and translocates in the 5’ to 3’ direction until it catches up with the TEC

and triggers its dissociation (167). The exact basis of Rho sequence-specificity is not well

understood, which makes the prediction of Rho targets almost impossible. The rut site is

typically enriched in C residues that form specific contacts with a primary binding site

located at the outer surface of the Rho hexamer (Fig. 10A)(168). Interaction with rut

causes the hexamer ring to open and let RNA pass through the central hole, where it

forms contacts with mobile Q- and R-loops in a secondary binding site (Fig. 10B, C),

inducing the ATPase activity of Rho (169,170). The ATP-binding sites are bipartite: a

nucleotide-binding pocket on one protomer is positioned next to a γ-phosphate sensor of

another protomer (171). This architecture is essential for the mechanochemical coupling

of ATP hydrolysis to the conformational movements in the hexamer.

Rho is thought to act by shortening the upstream part of an RNA/DNA hybrid,

pulling the 3'-portion of RNA out of the hybrid, analogous to the stem-loop invasion in

intrinsic termination (155). Rho is not a very processive helicase (172); its movement

along ssRNA can be compromised by strong secondary structures (173) or large protein

complexes, such as a ribosome. At the same time, certain protein factors, such as NusG,

can tether Rho to the elongating polymerase and facilitate Rho-mediated RNA release

(174,175).


Fig. 10. Rho hexamer structure in open and closed conformations.

A. Rho with RNA bound to the primary site adopts an open conformation (PDB#1PVO), side view. Only two-nucleotide fragments bound to each protomer are seen in the structure (red sticks). The position of ATP bound at the ATP-hydrolysis site of one protomer is indicated by a blue arrow.

B. Rho with RNA bound to the secondary site adopts a closed conformation (PDB#3ICE), side view. RNA (red) in the central channel adopts a helical conformation.

C. Top view of B. RNA in the central channel is bound by the Q-loops donated by each protomer. ATP (blue sticks) is bound in the catalytic site at the subunit interface.


Mfd acts in transcription-coupled DNA repair where it senses RNAPs stalled at damaged DNA, destabilizes the TEC, and recruits excision repair enzymes (176,177).

Mfd is a large (130 kDa) protein composed of several domains with distinct functions: an

ATP-dependent DNA translocase, an RNAP-binding domain, a UvrA-binding domain and a C-terminal regulatory domain (176). Mfd was first identified as an agent that reduces mutagenesis in genetic analysis (178) and was shown to remove elongating

RNAP stalled by lesions in the template DNA strand (177). In vitro Mfd works on any stalled elongation complex. TECs stopped by a physical block or substrate deprivation are disrupted by Mfd. In the simplest model, Mfd simultaneously binds to ~20 bp of the upstream DNA and a site on the β subunit of RNAP and pushes TEC forward until the transcription bubble collapses due to forward translocation (143). However, in the presence of nucleotides it can facilitate escape of arrested, backtracked TEC to productive RNA synthesis (143), presumably through limited translocation.


Antitermination Systems

λN Protein

Antitermination by λN protein was discovered in early operons of bacteriophage

λ containing promoter-proximal terminator sites that prevent the expression of distal genes (179). Although λN can affect RNAP even in the absence of cellular factors (180), the complex is not stable; persistent antitermination requires the assembly of a multicomponent complex that includes NusA, B, G and E (Fig. 11A)(105,131).

Recruitment of λN occurs early in elongation and depends on the nut RNA element that consists of a recognition site for the NusB:NusE heterodimer (boxA) and an inverted palindrome that forms an RNA hairpin recognized by the N-terminal part of λN

(boxB)(181). λN protein also forms contacts with NusA and RNAP at unknown sites; furthermore, NusA and NusG directly bind to RNAP (Figs. 7 and 11A). The simplest model for λN action is that, together with NusA, it interacts with an emerging transcript and prevents Rho invasion and hairpin formation, consistent with reports that λN suppresses both intrinsic and Rho-dependent termination (107,182).

Interestingly, another λ-related phage HK022 encodes a protein called Nun that targets the same site as λN (nut) and interacts with the same protein partners but induces premature termination rather than antitermination (183). Interplay between

Nun and λN is thought to be an example of molecular adaptation of competing bacteriophages that prevents superinfection (116). Nun binds to RNAP in a Zn-dependent manner and anchors the complex to the DNA through sequence-specific contacts with boxB and via stacking interactions between a critical Trp residue and the downstream DNA duplex (184). Nun blocks RNAP translocation (in both forward and backward directions) but does not abolish its ability to perform pyrophosphorolysis or hydrolysis; however, Nun-arrested TEC cannot be reactivated by Gre (183). Termination and transcript release from Nun-modified complexes observed in vivo require Mfd (144).


λQ Protein

Protein factors of the λQ family allow expression of the late operons in λ-like

phages. Once recruited, λQ becomes a stable subunit of the TEC, conferring resistance to

both intrinsic and factor-dependent termination and increased rate of elongation

(83,185). The recruitment mechanism of λQ has been studied in detail. λQ binds

simultaneously to the elongating RNAP stalled at a σ-dependent promoter-proximal

pause (~25 nt downstream from transcription start site) and a specific region in the

promoter DNA duplex (qut, Fig. 11B)(24,186). Interaction with λQ results in displacement of σ4 from the position characteristic for the open complex, specifically

from the β flap domain (187). The presence of σ70 is essential for complex assembly: TEC lacking σ stopped by nucleotide deprivation at the same recruitment site cannot be modified by λQ (185). σ likely provides crucial contacts for λQ binding and/or creates a

specific “scrunched” conformation of the TEC that λQ can recognize. Q action is greatly

stimulated by NusA (89). As with λN, the binding site of Q-like factors on the TEC is still

elusive: mutations that impair Q function are located mostly in β and β' subunits far

from each other (188). While the precise molecular mechanism of λQ antitermination has not been determined, it likely includes two components: first, Q-modified complexes may transcribe too fast for some initial steps of termination to take place; second, the bound protein may shield the emerging RNA from stem-loop formation or from invasion by Rho (158).

Put RNA of Phage HK022

Put is an example of a cis-acting antiterminator that does not require any additional factors to modify RNAP into the pause- and termination-resistant state (189).

The antiterminator transcripts are encoded downstream from early promoters of HK022 phage (190). Structural and mutational analysis and enzymatic probing suggest that put

RNA folds into two stem-loop structures whose integrity is crucial for antipausing and antitermination (191). The 3’-proximal loop binds directly to a β’-Zn finger region on

RNAP (located near the RNA exit channel, on the opposite side from the β flap) and

maintains its interaction throughout elongation (189). There is no evidence for a direct contact between RNAP and the 5’-loop; it likely favors tethering of the 3’-hairpin to the

TEC indirectly (192). Put elements increase the overall rate of elongation and reduce hairpin-dependent pausing and intrinsic termination. Generally, put does not affect backtracking, with the exception of one U-rich site located in the immediate vicinity of the 3’-put hairpin. Several models were considered to explain this phenomenon (Fig. 12)(193). The first model, which proposes that both antipausing and antitermination activities of put result from the same allosteric changes in the TEC, was ruled out by the fact that suppression of backtracking (but not termination) strongly depends on the distance between put and a pause site. Insertion of only a few nucleotides between the U-track and the 3'-stem-loop dramatically increased the efficiency of arrest. A mechanistic stabilization of the nascent RNA by hairpin formation also could not explain the experimental data: mimicking of the 3'-stem-loop by antisense oligonucleotides did not suppress backtracking. Thus, formation of the put RNA structure is not sufficient for antipausing at U-tracks. Rather, put prevents backtracking by physically restraining the nascent RNA from re-entry into the exit channel, using its interaction with the β’-Zn finger as an “anchor”.

Recently, Irnov and Winkler described a processive, cis-acting regulatory RNA element named EAR that regulates biofilm formation in Bacillus subtilis (194). Although, unlike put, EAR appears to require cellular co-factors which have not been identified yet, it is the first example of a long-distance antiterminator in Gram-positive bacteria.


Fig. 11. Schematic representation of λN- (A) and λQ- (B) modified TECs.

(adapted with changes from (116)).

Fig. 12. Models of antipausing activity of put. (adapted from (193)).

Top: put binds to RNAP and suppresses both backtracking and termination by the same yet unknown allosteric mechanism; middle: backtracking is suppressed mechanistically by hairpin formation, but termination depends on put-RNAP interactions; bottom: backtracking is prevented by the short RNA spacer between the hybrid and put anchored to RNAP, but termination depends on the interaction with the hairpin itself.


Ribosomal Antitermination

The ribosomal operons are the most highly transcribed units in E. coli (up to 68 transcripts per minute in exponential phase)(108). As rRNAs are not translated, they become potential subjects for Rho-dependent termination despite their extensive secondary structure. A ribosomal antitermination complex closely resembles that of λN and includes factors NusA, B, G, E and the nut site (78); but no N-like protein has been implicated. The precise role of the system is not completely understood. Measurement of rRNA synthesis rates in vivo confirmed that the presence of the antitermination complex increases these rates at least two-fold (108); however, the complex does not appear to be absolutely essential for viability (113).

The biological rationale for Rho-dependent termination in ribosomal operons is

also unclear. A computational analysis of RNAP traffic at the ribosomal operons allowed

evaluation of the quantitative effects of pausing, termination and antitermination on

rRNA transcription (145). A stochastic model of rRNA elongation was generated using

data from single-molecule and in vivo studies in E. coli. It predicts that maintaining a high

elongation rate is crucial for fast growth, when RNAP traffic is especially dense and the

risk of collision and subsequent arrest is high. Rho may remove stalled RNAPs if a traffic

jam has occurred, allowing rRNA transcription to resume.

NusG Family of Elongation Factors

The role of NusG-like factors in cell survival and virulence was recognized almost 20 years ago but very little progress has been made in unraveling the molecular mechanism of their action until recently. The key to understanding the function of NusG is in its structure: it consists of two domains connected by a flexible linker that act independently from each other and mediate different activities of the factor (119,175).

The N-terminal domain (NusGN) binds to the β’CH (Fig. 7) and is sufficient for

antipausing and rate enhancing effects on the TEC, whereas the C-terminal domain

(NusGC) can directly interact with either Rho or NusE and physically tether them to a

complex (135,175). Cooperation with Rho in silencing of foreign DNA in E. coli is thought to be an essential function of NusG (113).

Recently, a crystal structure of the yeast Spt4/Spt5 complex, an essential elongation factor that functions in the control of RNAPII processivity, has been reported (120). These studies revealed a high level of structural similarity between the Spt5 and NusG/RfaH N-terminal domains (Fig. 13), which allowed the authors to suggest that this structural motif is likely a conserved regulator of transcription elongation in all life forms. At the same time, in different species the universal RNAP-binding domain of

NusG is often fused or forms stable complexes with other functional modules. Bacterial

NusGs have a C-terminal domain, KOW, with the same RNA-binding motif found in ribosomal proteins (119). In eukaryotes, Spt5 contains multiple KOW motifs along with the conserved acidic N-terminal region and C-terminal repeats (120); furthermore, it forms a stable heterodimer with Spt4. Archaeal NusG is structurally similar to the bacterial one; it was found to interact with the RNAP subunit RpoE, a protein similar to

Spt4 in sequence (121,195). In some bacterial species, like Thermoanaerobacter tengcongensis, NusG was also shown to form a heterologous complex with a replication initiator DnaA (196). The presence of specialized paralogs of NusG, such as RfaH, further increases functional diversity within the family (197). In the most extreme case known to date, Bacteroides fragilis encodes eight different NusG-like factors that control eight different polysaccharide biosynthetic operons (130).

Although NusG is ubiquitous, it is not essential in Bacillus, for example (198), where Rho is also dispensable. Interestingly, NusG in B. subtilis stimulates rather than inhibits pausing and intrinsic termination in a regulatory leader (117).

RfaH is an operon-specific paralog of NusG. It is recruited to the TEC at specific sites called ops early in elongation and becomes an RNAP subunit, increasing expression of distal genes in several long operons in E. coli and other γ-Proteobacteria (199). RfaH was first identified in E. coli and S. enterica where it regulates biosynthesis of LPS, O-antigen, and other virulence and fertility determinants. Although RfaH is dispensable in

standard laboratory conditions, it is essential for virulence in animal models (200). Both

RfaH and NusG increase the overall RNA elongation rate in vitro; however, RfaH is a more efficient antiterminator (Fig. 7): it suppresses pausing at both backtracked and hairpin-dependent sites, decreases termination efficiency at intrinsic sites and, unlike

NusG, disfavors Rho-dependent termination (Fig. 9)(129,201). Moreover, RfaH outcompetes NusG in vivo and in vitro (129).

Clearly, the elongation factors from the NusG family display considerable functional flexibility. Some NusG-like factors may affect the processivity of transcription directly, through their RNAP-binding domain, or indirectly, by recruiting other regulatory proteins to the elongating RNAP.


RfaH as a Paradigm of Antipausing Control

RfaH Structure and the Recruitment Mechanism

Despite the overall sequence similarity between RfaH and NusG, their structures are not that similar: both proteins are composed of two domains connected by a linker, but while the N-terminal domains (RfaHN or NusGN) can be almost superimposed, the folds of the C-terminal domains (RfaHC or NusGC) differ dramatically (Fig. 13)(118,197).

In NusG, the two domains fold and function independently, whereas RfaHC acts essentially as an inhibitor of the RfaHN, which, similarly to NusGN, appears to be

sufficient for all RfaH effects on transcript elongation (197).

RfaH binds neither free RNAP nor RNAP engaged in initiation. Its recruitment depends on the conserved sequence element called ops (GGCGGTAGnnTG; operon polarity suppressor) positioned on the NT strand in the TEC (82). Comparison of known functional ops sites, together with in vivo and in vitro analysis of mutants, indicates that nine out of 12 nucleotide positions in ops are highly conserved (202)(Artsimovitch I., unpublished).

However, the preliminary analysis of the RfaH/ops DNA structure showed that the

protein makes only four base-specific contacts with the DNA at the upstream edge of the

ops element. Together, these data suggest that ops has roles beyond serving as the RfaH-binding site.
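As a small illustration of the ops consensus quoted above, the following Python sketch scans a DNA sequence for exact matches to GGCGGTAGnnTG. The function name and the example fragment are hypothetical. The caller is assumed to supply the non-template strand, because the text states that ops functions only in that strand. Exact consensus matching is a simplification, since only nine of the 12 positions are reported to be highly conserved.

```python
import re

# Consensus ops element as given in the text; "n" denotes any nucleotide.
OPS_CONSENSUS = "GGCGGTAGnnTG"

def find_ops_sites(nt_strand):
    """Return 0-based start positions of exact consensus ops matches
    in the supplied non-template (NT) strand sequence."""
    pattern = OPS_CONSENSUS.replace("n", "[ACGT]")
    # A lookahead is used so that overlapping matches would not be missed.
    regex = re.compile("(?=(" + pattern + "))")
    return [match.start() for match in regex.finditer(nt_strand.upper())]

example = "ATGGCGGTAGCGTGCA"  # hypothetical NT-strand fragment
print(find_ops_sites(example))  # [2]
```

Scanning the reverse complement as well would report elements located in the template strand, which according to the text do not mediate RfaH recruitment.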

Based on structural and preliminary biochemical data a model of RfaH binding to the TEC was built (197). According to the model, the recruitment of RfaH to TEC occurs when RNAP pauses at an ops site (Fig. 14). Interaction of the ops bases with RfaHN triggers domain dissociation and allows RfaHN to establish contacts with the β’CH (Fig.

15). After recruitment, RfaH maintains contacts with RNAP throughout elongation.

RfaHC is dispensable for RfaH effects on elongation; it confers dependence on ops but does not bind it directly. Instead, RfaHC masks the RNAP-binding site on RfaHN prior to its activation via ops binding (Fig. 14).


Fig. 13. Structural comparison of NusG (A) and RfaH (B).

Structure of Aquifex aeolicus NusG, PDB#1M1G (119), structure of E. coli RfaH,

PDB#2OUG (197). RfaHN and NusGN are green, RfaHC is red, NusGC is blue.

Fig. 14. RfaH recruitment to the TEC.

RNAP is grey, RfaHN is green, RfaHC is red, β’CH is shown as two cylinders, NT DNA containing the ops sequence is highlighted in green.

RfaH recruitment is a multiple-step process. 1) Initial recognition of the ops by RfaHN.

Domains are closed, the RNAP-interacting surface on RfaHN is blocked by RfaHC;

2) Interaction between the RfaHN and ops exposed on the TEC surface triggers domain separation; 3) Domain opening allows RfaHN to establish contacts with the β'CH.


Fig. 15. The heterologous model of RfaH binding to the TEC.

RfaH binds 75Å away from the active site (magenta sphere), to the β’CH (orange) and

the NT DNA (blue). RNAP is grey, RfaHN is green, RfaHC is red, RNA is yellow, T DNA is red.


Analysis of the RfaH structure also revealed that the residues which form the hydrophobic core in the β-barrel of NusG are turned ‘inside-out’ and end up on the surface of RfaHC. In contrast to NusG, in RfaH the C-terminal domain closes over the

hydrophobic cavity on RfaHN; the buried inter-domain interface is large (~1800 Å²) and

>80% nonpolar (197). The separation of the domains is absolutely required for the RfaH

action, as was demonstrated by intramolecular crosslinking. At the same time, deletion

of the C-terminal domain leaves the RNAP-binding cavity on RfaHN always accessible, conferring an ops-independent phenotype in vitro without any effect on the antipausing activity (197).

Mapping of the RfaH binding site did not provide a straightforward explanation for its activity. It binds too far from the active site to modulate catalytic properties of

RNAP directly and also too far from the RNA exit channel to prevent hairpin formation.

A Ubiquitous Mechanism for Antitermination?

Antiterminators differ in their mechanisms of recruitment. λN binds the nascent

RNA structure (203), λQ is recruited to the double-stranded DNA near the promoter

(83), RfaH binds to the single-stranded NT DNA during elongation (82). The sites on the

TEC to which these proteins bind are currently unknown for most regulators, but are likely also distinct. Yet in spite of their differences, all antiterminators accelerate RNAP, suggesting that they induce similar changes in the TEC. However, these changes have not been characterized, and the molecular mechanism(s) by which elongation factors or substitutions in RNAP make the enzyme faster or slower is not known: they may control nucleotide addition at every template position by affecting the common rate-limiting step, or prevent the TEC isomerization into off-pathway states at pause and termination sites.

RfaH acts as a canonical processive antiterminator – it increases the overall elongation rate in vitro, reduces pausing at both class I and II sites, and facilitates bypass of some terminators. However, unlike other well-studied antiterminators (83,181), RfaH does not require any accessory proteins (e.g. NusA or NusG), and its action is not

dramatically affected by addition of the cellular extract (82). These features allow us to dissect RfaH effects in a highly purified model system and yield insights into the mechanisms of other elongation factors.


Chapter 2: The elongation factor RfaH and the initiation factor sigma bind to the same site on the transcription elongation complex

Introduction

Bacterial RNA polymerase (RNAP) is a principal target for numerous accessory proteins and small ligands that fine-tune gene expression profiles to match the cell needs. Competition (or cooperation) among these regulators for the finite number of targets on the RNAP surface determines the patterns of gene expression. The classical paradigm for the partitioning of the regulatory space is σ-competition (204) with different initiation σ factors competing for binding to the core enzyme and, when

successful, directing it to a subset of σ-specific promoters. The σ subunit makes many

contacts to the core RNAP among which the β’ subunit clamp helices (a coiled-coil motif

comprised of residues 260-309 in the enzyme) are thought to constitute

the major σ binding site in the free RNAP (5,72) as well as in TEC (23). Our recent

finding that the β'CH is also required for recruitment of the elongation factor RfaH (197)

suggested that competition for this site may regulate gene expression far beyond σ-

specific promoter recognition.

RfaH reduces pausing and termination, thereby suppressing transcriptional polarity in long operons encoding virulence and fertility determinants (82,202). RfaH action depends on ops DNA sequence elements (GGCGGTAGnnTG) located in the transcribed regions of RfaH-controlled operons (82). In vitro, the ops element indeed mediates RfaH binding to the TEC but only if it is placed in the NT DNA strand exposed on the surface of RNAP (82). RfaH recruitment is thought to occur in two steps: (i)

sequence-specific binding of the N-terminal domain to DNA triggers displacement of the stably bound C-terminal domain to expose the RNAP-binding site on the N-domain; and (ii) interactions of the RfaHN with β'CH on one side and (nonspecific) interactions with the NT strand on the other allow for the stable retention of RfaH on the TEC throughout elongation (197).

Several lines of evidence support this view. First, the isolated N-domain no longer requires the ops element for function. Second, RfaH reduces pausing at all sites in vitro (82) yet increases pausing at the ops element located downstream from the identical ops site that mediates RfaH recruitment to the TEC (unpublished observations), indicating that RfaHN retains the ability to interact with ops. Third, the RNAP variant

missing the CH tip fails to respond to either full-length RfaH or RfaHN (197), arguing that the contacts to the CH are required regardless of the recruitment mechanism. Last, RfaH does not dissociate from the RNAP following its recruitment to ops (129).

Although RfaH and σ lack any sequence or structure similarity and recognize very different DNA elements, their targets on the TEC are topologically similar (Fig. 16).

σ70 binding to the TATA-like element, which is also located in the NT strand, mediates promoter recognition during initiation (50) and RNAP pausing at promoter-proximal (83,205,206) and downstream sequences (207). Moreover, RfaH (197) and σ (5) likely bind to the adjacent sites on the β'CH (Fig. 16B). Thus, RfaH would be expected to ‘insulate’ the TEC from σ re-binding since the β’CH is thought to be the only part of core that contacts σ during elongation (23). In contrast, RfaH is not likely to interfere with σ action

during initiation when σ makes numerous interactions that engage ~10,000 Å² of the core surface.


Fig. 16. RfaH and σ70 bind to topologically similar targets on the TEC.

A. In the TEC, core RNAP (gray) is bound to DNA strands (black) that are separated in

front of the active site to form a transcription bubble, in which the template DNA strand

is paired with the nascent RNA (red) to form an 8-9 bp RNA:DNA hybrid, whereas the

NT DNA strand is exposed on the TEC surface. RfaH binding to the TEC requires

specific interactions of RfaHN (green) with the ops element (dark green) and the hydrophobic tip of the β’CH. The C-domain (red) is dispensable for effects on RNA chain elongation in vitro and makes no contacts to TEC. Contacts of σ regions 2 and 3

with the β’CH and the -10-like element in the NT DNA (magenta) were proposed to

mediate σ rebinding to the TEC (23), while other contacts (made by σ1.1, σ3-4 linker, and

σ4) should be lost upon transition to elongation.


B. Interactions between the β’CH (with the N-terminal and C-terminal halves colored in purple and cyan, respectively) and RfaHN (left, in green) vs σ (right, in magenta); the

two views are related by a ~180° rotation of the CHs. The residues whose substitutions

eliminate effects of RfaHN or σ on elongation are shown as ball-and-stick models. Two residues on the tip of the CH, β’ Ile 290 and 291 (cyan), are engaged in hydrophobic interactions with the N-domain of RfaH; Tyr8 residue (orange) is located at the RfaH/ β’

interface, whereas Arg73 (light green) is a part of the DNA-binding region of RfaH. In

contrast, σ makes many polar contacts to the CH including β’ Arg278 (purple); σ Leu402 and Glu407 residues (yellow) are required for σ-dependent pausing (208). This figure was prepared with PyMol (DeLano Scientific LLC).


Results

RfaH Abrogates σ-Induced Pausing.

To test if RfaH can prevent σ70–dependent pausing (and likely recruitment) at the

–10 element during elongation, we first constructed templates with the extended –10

(TGcTATAAT) element positioned downstream from the consensus ops site element that

has been shown to mediate efficient RfaH recruitment in vitro (82,202). We prepared halted radiolabeled G37 TECs and monitored RNA chain extension upon addition of the

NTP substrates in the presence of RfaH, σ, or both. Addition of the wild-type σ70 to 1 µM

(at or below its physiological concentration (209)) induced pausing at position 118, at the same distance from the –10 element as observed in earlier studies of σ-induced pauses

(205-207); 14.5% of TECs remained paused at this site (called σP hereafter) after a 16-min incubation. The full-length RfaH (at 40 nM) increased the rate of elongation (as seen from the accumulation of the run-off transcript) and, when present with σ70, reduced the fraction of TECs stalled at the σP site 2.3-fold, to 6.3%.
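The fold effect quoted above follows directly from the two paused fractions. The short Python sketch below shows the arithmetic; the function names and the band-intensity inputs are hypothetical, and the pause fraction is assumed to be the signal at the pause position expressed as a percentage of total RNA in the lane, as in the figure legends.

```python
def pause_fraction(paused_signal, lane_total):
    """Percent of total RNA in a lane that is found at the pause position."""
    if lane_total <= 0:
        raise ValueError("lane_total must be positive")
    return 100.0 * paused_signal / lane_total

def fold_reduction(before_pct, after_pct):
    """Ratio of the paused fraction without a factor to that with it."""
    if after_pct <= 0:
        raise ValueError("after_pct must be positive")
    return before_pct / after_pct

# Values reported in the text: 14.5% with sigma alone, 6.3% with RfaH added.
print(round(fold_reduction(14.5, 6.3), 1))  # 2.3
```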

The inability of RfaH to prevent σ-induced pausing completely is consistent with our earlier proposal that the full-length RfaH readily dissociates in vitro (202). Upon dissociation, the full-length RfaH switches into the inactive ‘closed’ state, in which its two domains are tightly bound to each other (197), and cannot rebind the RNAP that has moved beyond the ops site. In vivo, other proteins may interact with the C-domain, precluding its re-association with the N-domain and favoring stable binding of RfaH to the TEC. If this were the case, using the isolated N-domain would abolish the effects attributed to the closed state formation. We found that the RfaHN indeed acted as a more effective competitor, completely abrogating the σ-pause on both the consensus ops (Fig.

17) and scrambled ops (Fig. 18A) templates.


Fig. 17. RfaH reduces σ-dependent pausing downstream from the consensus -10 element.

The linear DNA template shown on top with the transcription start site (+1) and end

(224), consensus ops and the extended –10 element indicated. The assays were performed as depicted, at least thrice for each combination of the conditions tested; WT = wild type.

A representative 6% denaturing gel is shown below. Positions of opsP, hisP, and σP pause sites were mapped in the presence of chain-terminating NTPs (data not shown).

The fraction of RNA at the σP site after a 960-sec incubation (in % of total RNA in the lane) is shown below each panel. Interestingly, RfaHN reduced the TEC fraction at the σP site to 1.0%, below that observed with the reconstituted (2:1 ratio of σ to core) holoenzyme alone (1.8%) - the residual pause in the latter case is apparently due to σ70 being present at 60 nM. The dramatic effect of RfaHN suggests that the apparent competition with σ70 requires stable RfaH recruitment to the TEC.


Fig. 18. Contacts with the TEC are critical for RfaH and σ effects during elongation.

A. Analysis of the NT DNA interactions on templates with variant -10 or ops elements shown above each panel.

B. Analysis of the β' CH determinants on the WT ops — WT -10 template with three different core RNAPs. The assays were performed as in Fig. 17 with protein variants indicated below each panel. To conserve space, only the relevant portions of the gels are shown.


RfaH Likely Prevents σ Recruitment Directly.

RfaH could eliminate the σ-induced pause directly, by making stable contacts to the TEC and sterically excluding σ, or indirectly, through its general antipausing activity (82). We

favor the first possibility because such a dramatic effect of RfaH on pausing has not yet

been seen at any other pause/arrest sites; on the template used in Fig. 17 and many

others the isolated RfaHN reduces pausing (e.g., at the hisP site) on average by 3-fold. To exclude the indirect model, we utilized a ‘fast’ RNAP variant (β’∆943-1130) that is defective in both pausing and response to RfaH, yet retains the ability to bind to RfaH (210). The mutant enzyme paused at σP as efficiently as the wild-type RNAP (Fig. 17) and this

pause was abolished by RfaHN. These observations argue against the antipausing mechanism of RfaH action at σ-induced pauses and suggest competition for the same binding site. In support of this model, we found that σ70 apparently interfered with

RfaH recruitment to the ops site positioned immediately downstream from the extended

–10 element (Fig. 19).

Contacts to CH and NT DNA Are Essential for the RfaH and σ Action during Elongation.

Molecular modeling suggests that RfaH makes only a few contacts with the short exposed NT DNA segment and the adjacent regions on RNAP (197). Likewise, the

σ/TEC interface is also limited, as most of the σ contacts to core are lost upon the

transition from initiation to elongation (23). Thus, the recruitment of RfaH and σ and

their apparent competition during elongation should be very sensitive to changes in

interactions with the NT strand and the β’CH. We probed the role of the NT DNA

interactions using substitutions in ops, the -10 hexamer, and RfaH and σ70 residues that

are thought to make sequence-specific contacts to the DNA. As expected, full-length

RfaH did not affect elongation or interfere with the σ action on the template with the

scrambled ops (82) whereas RfaHN abrogated the σP (Fig. 18A); the same result was obtained on templates with single substitutions in the ops (data not shown).


Fig. 19. σ70 bound to the NT DNA prevents RfaH recruitment to the adjacent ops site. To

test if σ70 bound to the extended –10 would prevent RfaH recruitment to a downstream ops site we designed a transcription template on which the extended –10 element (red box) is immediately followed by a consensus ops signal (green box). In this and other studies(205-207), σ induced pausing 12 nt downstream from the TATAAT box, likely because its specific contacts with the non-template DNA must be broken to allow for

RNAP translocation. The distance between the –10 element and the σ pause site is similar to that between the –10 promoter element and the peak of abortive products released during initiation, and just like in the course of abortive synthesis (20,211), the

persistent σ/DNA contacts may induce scrunching during elongation. We reasoned that on this template, σ70 bound to the extended –10 element will occupy the CH, thereby preventing stable RfaH recruitment to the ops site located downstream. RfaH binding to the ops DNA is expected to be disfavored by (i) the core RNAP occluding the ops element and (ii) possibly by an altered DNA conformation (scrunching). The schematic diagram of the assay and a representative gel are shown. Positions of the pause sites, opsP, hisP, and

σP, and the transcript end are shown with arrows; σ70 and RfaH induced pausing at

positions 74 and 76, respectively. RfaH and σ70 were present at 40 nM or 1 µM where

indicated. Addition of RfaH or σ70 alone induced RNAP pausing at ops or σP. The RfaHN

abolished σP and delayed RNAP escape from opsP, as expected. In contrast, the full-

length RfaH neither abolished the pausing at σP nor induced pausing at the ops site. This

effect was reproducible in several independent experiments, and is consistent with the

σ/RfaH competition for the same site on the TEC.


A single base substitution in the extended -10 element, -12T to C, eliminated pausing by the wild-type σ70; the pause was partially restored by the addition of σ2.4

Q437H variant that acts as an allele-specific suppressor of the -12T to C mutation in vivo

(212). Consistent with the in vivo data, the Q437H variant was less effective in recognizing the wild-type -10 element (14.5±0.7 vs 8.3±1.9% of σP-paused TECs). Lastly, we showed that R73A, a substitution in the putative DNA-binding site that greatly reduces both RfaH binding to the ops and its post-recruitment effects on elongation

(manuscript in preparation), also abolished RfaH effect on σ-dependent pausing (Fig.

18A). Taken together, these data highlight the importance of RfaH and σ contacts to the

NT strand.

Next, we turned to the analysis of the CH contacts with σ and RfaH. We

reasoned that σ Glu407 should be required for σ-dependent pausing: this residue is

located near σ Leu402 that is essential for promoter-proximal σ-dependent pausing(208),

interacts with the CH directly, and E407K substitution has been shown to destabilize σ

interactions with the CH (213,214). Indeed, σ70 E407K failed to induce pausing at the σP

site (1.2±0.1%; Fig. 18B). We next tested whether σ38, which shares the –10 recognition

determinants(215) with σ70, will recognize the –10 element as a pause signal. In

agreement with(216), we found that σ38 did not induce RNAP pausing significantly above background (2.4±0.6%; Fig. 18B). This observation is consistent with the report

that σ38 binds less tightly to the core enzyme and dissociates more rapidly after initiation

(217), as well as with the sequence differences between the two σ factors both in the CH recognition region and in the σ1.2 region, reported to affect –10 recognition allosterically

(218). Interestingly, σ38 has a Glu residue at the position that corresponds to Gln406 in

σ70; Q406A substitution confers a defect in binding to core RNAP comparable to that of

E407K (213). To probe the importance of the RfaH/CH interface, we utilized Y8A, a

substitution at the proposed RfaH-RNAP interface (Fig. 16B) that is expected to weaken

hydrophobic contacts between the N-domain and RNAP. We found that Y8A

substitution greatly reduced RfaH effects on σ-dependent pausing (11.5±2.7%; Fig. 18B)

and elongation (data not shown). Thus, weakening of RfaH contacts with the NT DNA or the β' CH (by RfaH Y8A or β’ I290R substitution, see below) compromises its anti-σ

activity. In turn, weakening of σ contacts with the NT DNA or the β' CH reduces

pausing, making RfaH competition irrelevant.

RfaH and σ70 Recognize Distinct Determinants on the β’CH.

The modes of σA (σ70 homolog in Thermus thermophilus) and RfaH binding to the

CH are quite different (Fig. 16B): σ forms a network of polar interactions with the CH

covering nearly all their surface exposed in the core enzyme (5), whereas RfaHN is

predicted to make predominantly van der Waals contacts (197) with two hydrophobic

residues at the very tip of the β’CH (Ile290 and Ile291 in E. coli). This suggests that it may

be possible to selectively destabilize the CH contacts with RfaH or σ70.

Indeed, substitutions of Ile290 and Ile291 for Arg abolish regulation by RfaH, but

do not prevent transcription initiation (197). We found that β’ I290R RNAP paused at

the σP site whether or not RfaH was present (Fig. 18B), albeit with a lower efficiency

than the wild-type enzyme – thus, it appears that this substitution did not obliterate the

σ/CH interaction while completely abolishing RfaH action (and presumably binding to

the TEC).

In search for a substitution in the CH that would severely destabilize σ70 contacts during elongation, we constructed two β’ variants which were reported to have a reduced affinity to σ70, R293Q and R275Q (72). The purified β’ R293Q core RNAP did not behave any differently from the wild-type enzyme in any assay used (Fig. 20); in fact, in the T. thermophilus holoenzyme (5) the corresponding β’ residue (Arg568) makes no contacts to σ. In contrast, the β’ R275Q enzyme was defective in σ-pausing (Fig. 21) but displayed no other defects in transcript elongation, including response to RfaH (Fig. 20 and data not shown). Given the dramatic effect of σ E407K on pausing (Fig. 18B), we

also designed a core variant with the substitution of β’ Arg278, a residue that interacts

with σ Glu407 in the holoenzyme, for Glu. As expected, this substitution had an effect

comparable to the σ E407K change: β’ R278E RNAP failed to pause at the σP site, yet was

similar to the wild-type enzyme in its elongation pattern and response to RfaH (Fig. 18B and Fig. 20). These data lead us to conclude that the β’CH determinants that mediate its binding to σ and RfaH are not identical.

RfaH and σ70 Do Not Compete During Initiation.

During elongation, both σ and RfaH are thought to make only a few interactions

with the TEC (23,197). In contrast, in the initiation complex σ makes multiple contacts to

both the core enzyme and the promoter DNA elements – thus a loss of just one of these

contacts may still allow for σ function. Indeed, the E407K variant, which was unable to induce pausing at the perfect –10 element (Fig. 18B), supported transcription initiation. We therefore reasoned

that RfaH would not be able to interfere with σ function at promoters, given that it

requires a specific sequence for recruitment and binds to core RNAP weakly, and only

when its diffusion is limited by immobilization on a matrix (82). Indeed, RfaH (either

full-length or the N-domain) did not inhibit transcription initiation from the λPR

promoter (Fig. 22) under conditions that are expected to favor RfaH over σ: a large

molar excess of RfaH was pre-incubated with core RNAP prior to the addition of σ70 and

only a single round of transcription (formation of a halted A26 TEC) was measured.

RfaH (at 20 µM) also failed to compete with σ70 for binding to free core RNAP (IA, data not shown). Consistently, RfaH has never been shown to affect initiation and all its effects were linked to elongation. These observations are reminiscent of the findings by

Gill et al. that a large excess of NusA, which is thought to compete for σ binding site on core RNAP (219), did not affect σ function during initiation (220).



Fig. 20. RfaH effect on RNAPs with the substitutions in the β’CH.

The single round pause assays were performed on the pIA349 template (82) schematically depicted on top. The altered enzymes were tested for response to 40 nM full-length RfaH. RfaH has several effects on transcription on this template. Within the ops element, RfaH reduces pausing at the U43 site and delays RNAP at the U45 site; the latter effect is a consequence of the RfaH binding to the non-template DNA (since these contacts have to be broken to allow forward movement of the TEC) but is not a requirement for the TEC modification. At downstream pause sites, such as hisP, RfaH reduces pausing ~3-fold (82,197), thereby accelerating transcription. All RNAP variants shown displayed a qualitatively similar response to the wild-type RfaH; interestingly, RfaH was particularly effective in reducing pausing (at U43, hisP, and the intermediate positions) by the σ-defective β’R278E core RNAP. Sizes of the α32P-labeled pBR322 MspI restriction fragments used as molecular markers are indicated.

Fig. 21. β’ R275Q RNAP fails to respond to σ during elongation.

Performed as in Fig. 18B.


Fig. 22. RfaH does not compete with σ70 during initiation.

Core RNAP was pre-incubated with increasing concentrations of RfaH (full-length or the

N-domain alone) before addition of σ70, template encoding λPR promoter, NTP

substrates, and an α32P-labeled DNA oligonucleotide used as a loading control. Samples were analyzed on a 12% denaturing gel; a representative gel is shown. The fraction of halted A26 complex (corrected using the 45-mer as standard) formed in a single-round assay was quantified relative to that in the absence of RfaH. The assay was repeated five times; the fraction of A26 RNA was between 94 and 102% and independent of RfaH concentration.
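The loading correction described in this legend can be illustrated with a minimal sketch. It assumes that the A26 signal in each lane is divided by the signal of the 45-mer standard in the same lane and then expressed relative to the no-RfaH lane; the function name and the band volumes are hypothetical and are not taken from the actual quantification.

```python
def relative_a26(a26_signal, standard_signal, a26_ref, standard_ref):
    """Percent of A26 RNA relative to the no-RfaH lane after loading correction.

    a26_signal / standard_signal: band volumes in the lane of interest.
    a26_ref / standard_ref: band volumes in the reference (no RfaH) lane.
    """
    if min(standard_signal, standard_ref, a26_ref) <= 0:
        raise ValueError("reference and standard signals must be positive")
    corrected = a26_signal / standard_signal      # normalize to the loading standard
    corrected_ref = a26_ref / standard_ref        # same correction for the reference lane
    return 100.0 * corrected / corrected_ref


# Hypothetical band volumes (arbitrary units), for illustration only.
no_rfah = relative_a26(1000.0, 500.0, 1000.0, 500.0)
with_rfah = relative_a26(940.0, 500.0, 1000.0, 500.0)
print(f"no RfaH: {no_rfah:.0f}%, with RfaH: {with_rfah:.0f}%")
```

Dividing by the standard first removes lane-to-lane differences in sample recovery and loading, so that any remaining change can be attributed to the amount of A26 complex formed.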


Discussion

In this work, we demonstrate that the elongation factor RfaH antagonizes σ

activity during elongation but not initiation of transcription. This anti-σ activity of RfaH

requires its stable association with the TEC and is mediated by the CH domain of the

RNAP β’ subunit and by the NT DNA strand. Interestingly, the β’CH utilizes a different

set of interactions to bind to σ and RfaH, extensive polar contacts in the first and a

hydrophobic patch in the second case (Fig. 16B). We show that these two sets of

interactions can be selectively disrupted by substitutions in the β’CH - we argue that

RfaH competes with σ indirectly, by sterically occluding the σ target site and thus

preventing its reloading onto the TEC at sites which resemble the -10 consensus. On the

other hand, although their recognition DNA sequences are very different (Fig. 16A),

RfaH and σ could compete for binding to the NT DNA in the TEC directly, using

essentially the same target, the fork-junction between the upstream DNA duplex and the

surface-exposed NT strand.

We used the recently determined structure of a bacterial TEC(221) to visualize the contacts of σ and RfaH within the TEC (Fig. 23). We assumed that the protein structure remains largely unaltered and modeled the NT strand in a conformation resembling that in the DNA duplex, with most bases exposed and stacking on each other while avoiding close contacts of its phosphate backbone with the protein. The modeling was aided by the restraints imposed on the positions of the first annealed DNA bp and the first unpaired NT strand nt in the transcription bubble: the TEC structure and a recent biochemical study (222) indicate the 9 bp-long RNA/DNA hybrid, with only one

DNA bp melted in the active site. The model of the RfaH N-domain was fitted to the tip of the CH as described previously (197). The TEC/σ model was generated through superposition of the β’CH domain in the holoenzyme (5) with that in the TEC (221); the

CH appears substantially displaced (by ~7Å) towards the main channel in the latter structure. The TEC/σ modeling reveals that only σ region 2 is likely to maintain stable contacts with the core enzyme. As noted above, contacts of the σ3.2 and σ4 observed in

the holoenzyme are incompatible with the TEC structure. Moreover, the σ domain that encompasses σ3.0-3.1 likely also loses its contacts with RNAP because its binding site in the

TEC appears stably blocked by the upstream DNA duplex. This view agrees with the previous model (23) and the experimental data implicating the β' CH as the major σ binding site on the TEC.

These models are hypothetical and cannot be used for detailed structural analysis: we cannot rule out alterations in the RNAP structure upon binding of σ or

RfaH (e.g. additional displacement of the mobile CH) or a somewhat distinct conformation of the ab initio modeled NT DNA strand (e.g. some bases may be trapped in protein pockets rather than exposed). Assuming, however, that these putative changes are not very dramatic, the models suggest three implications. First, consistent with our data (Fig. 17), RfaH and σ are expected to compete for the binding site on the TEC

during elongation. Second, both proteins may establish base-specific contacts with two-

three exposed bases of the NT strand at the upstream edge of the transcription bubble.

Last, although the upstream NT bases are most proximal to σ/RfaH, they are not directly

accessible to σ Gln437 and RfaH Arg73, the residues that are thought to form sequence-

specific contacts with the DNA. This observation suggests that the -10 and ops sequence

elements may favor formation of a DNA loop within the melted NT strand to allow for

the base-specific contacts with σ and RfaH, respectively. Following this initial

recruitment, further compaction of the DNA may be induced by protein-DNA

interactions in both types of paused complexes; these structural transitions may

represent scrunching occurring during elongation(223) rather than abortive initiation

(20).

The β’CH has been long known to play an important regulatory role by recruiting

the σ initiation factors to core RNAP. We show that, in spite of its rather limited size (<50

residues), the β’CH exhibits the potential for unexpected mechanistic and functional

diversity that allows it to recruit regulators that act during elongation and have no

apparent similarity to σ. We found that RfaH prevents σ-induced pausing at a –10-like

element in vitro (Fig. 17); other regulatory proteins that target the β’CH during elongation would be expected to compete with σ as well. Obvious candidates include various RfaH paralogs, such as the elongation factor NusG and ActX in E. coli, and yet

unknown proteins that bind to the NT DNA, either specifically or non-specifically (e.g.,

during transcription-dependent AID-mediated cytidine deamination (224)).

RfaH increases expression of distal genes in long operons by facilitating bypass

of many consecutive roadblocks as RNAP travels the entire length of the operon. Both

nucleic acid signals and DNA-bound proteins can delay RNAP progression along the

template, prompting emergence of regulators that enable transcription through both

types of obstacles. Our results argue that RfaH plays a dual antipausing role: it prevents

the TEC isomerization into off-pathway states at factor-independent pause signals (210)

and insulates the TEC from spurious rebinding of σ to -10-like sequences that triggers

pausing. Consistent with the proposed RfaH/σ competition, the second mechanism is

highly specific towards σ in context of the TEC, as RfaH is unable to facilitate

transcription through an EcoRIQ111 roadblock pre-formed on double-stranded DNA (IA, unpublished). Notably, σ -induced pauses hinder transcript elongation yet are

unavoidable since they depend on the same set of contacts that mediate the

indispensable function of σ at promoters. Interference with σ -dependent pausing (but

not initiation) may thus constitute an essential part of the RfaH regulatory function. This

‘competition’ could work both ways: whereas RfaH would inhibit σ loading during

elongation, σ re-binding to the termination complex triggered by conformational

changes in RNAP (225) may induce RfaH release.

Our findings also have implications for σ release and re-binding to the

transcription complex. The release of σ was proposed to occur deterministically, after the

nascent RNA reaches a defined length of at least 8 nt and actively displaces σ or

stochastically, after entering the elongation phase when most of the σ contacts to core are

lost (reviewed in Ref. (23)). Even though the association constant of σ70 drops from 2×10⁹ to 5×10⁵ M⁻¹ upon transition from initiation to elongation (220), the estimated in vivo concentration of σ70 is ~15 µM (209), and its effective concentration may be higher due to macromolecular crowding. Thus, σ could stay bound to the TEC, a scenario supported by some studies (138,226). While the vast majority of σ70 appears to be released from the elongating RNAP rapidly in vivo (217,227), in some operons σ remains bound (228), suggesting that the TEC/σ association can be regulated by environmental

conditions, specific sequences, or accessory factors. For example, the elongation factor

NusA was proposed to bind to the CH (229) and thus could trigger σ displacement. We

note, however, that although NusA and σ may compete for binding to the β flap

(112,213), their competition for the CH appears unlikely: NusA does not eliminate σ-

dependent pausing (207), does not compete with RfaH (IA, data not shown) or NusG

(230), and may co-exist with σ within the TEC (138,185). Instead, our data suggest that

NusG and its paralogs play the role originally assigned to NusA: RfaH would exclude σ

from RNAP molecules transcribing the ops-containing operons whereas NusG is expected to act on the rest of the transcriptome.
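The argument that σ could remain bound to the TEC despite its weakened affinity can be made concrete with a rough estimate. The sketch below assumes simple 1:1 equilibrium binding and treats the entire ~15 µM pool of σ70 as free; both are simplifying assumptions introduced here for illustration, and the second likely overestimates occupancy. The association constants are the values quoted in the text (2×10⁹ M⁻¹ for initiation and 5×10⁵ M⁻¹ for elongation); the function and variable names are hypothetical.

```python
def fractional_occupancy(k_assoc_per_molar, free_ligand_molar):
    """Fraction of binding sites occupied for simple 1:1 equilibrium binding."""
    if k_assoc_per_molar < 0 or free_ligand_molar < 0:
        raise ValueError("inputs must be non-negative")
    x = k_assoc_per_molar * free_ligand_molar
    return x / (1.0 + x)


SIGMA70_MOLAR = 15e-6   # ~15 µM, treated as entirely free (assumption)
KA_INITIATION = 2e9     # M^-1, sigma70 bound to core during initiation
KA_ELONGATION = 5e5     # M^-1, sigma70 bound to the TEC

occ_initiation = fractional_occupancy(KA_INITIATION, SIGMA70_MOLAR)
occ_elongation = fractional_occupancy(KA_ELONGATION, SIGMA70_MOLAR)
print(f"initiation: {occ_initiation:.5f}, elongation: {occ_elongation:.2f}")
```

Under these assumptions the product of the association constant and the σ70 concentration is 7.5 during elongation, which corresponds to an equilibrium occupancy of about 0.88. This is an upper-bound style estimate that ignores competition from elongation factors such as RfaH and NusG and the kinetics of σ release.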

The enthusiastic but short-lived demise of the σ -release paradigm painted a new

picture of transcription, overloaded with “memories” of past regulatory decisions in

form of idle σ subunits forever bound to their core RNAPs. The ChIP-on-chip data

(217,227), together with our findings that elongation factors can "insulate" the TEC from

rebinding of σ, restore σ release to its place in the transcription cycle. Perhaps even more important than preventing σ-induced pausing, this insulatory effect may reset

the “memory” of past initiation events and increase responsiveness of σ competition as a

regulatory/developmental switch.

Contributions: Anastasia Sevostiyanova carried out mutant characterization, pause assays, cloning and purification of RpoDQ437H, and analyzed the data. Vladimir Svetlov

purified RNAP and RfaH variants. Dmitry Vassylyev provided structural modeling.

Irina Artsimovitch performed initiation assays, constructed transcriptional templates

and rpoC mutants.


Fig. 23. Contacts to the NT DNA strand.

Structural models of RfaH (A) and σ (B) bound to the TEC. The RNAP core is shown in gray with the CH highlighted in cyan. The template and non-template strands are shown in red and blue, respectively. The registers (relative to the active site) from -9 to -6 represent the single-stranded NT DNA, while those of -10, -11, etc. correspond to the upstream DNA duplex; the numbering does not correspond to positions in the nascent

RNA transcript because RfaH- and σ-paused TECs are backtracked, placing the 3' end

of the RNA ahead of the active site. This figure was prepared with Molscript (231).


Materials and Methods

Proteins and Reagents.

All general reagents were obtained from Sigma and Fisher; NTPs, from GE Healthcare; PCR reagents, from Fermentas and Roche; restriction and modification enzymes, from NEB; [α32P]-NTPs, from GE Healthcare. Oligonucleotides were obtained from

Integrated DNA Technologies. DNA purification kits were from Promega. Substitutions

in the E. coli rpoC gene (encoding the β’ subunit) were constructed by site-directed

mutagenesis; sequences of all constructs were verified at the OSU PMGF. All

plasmid constructs used in this work are listed in Table 2. RfaH variants, σ70, and

altered RNAPs were purified as described previously(197). σ38 was a gift of Jay Gralla.

Pause Assays.

Linear DNA template generated by PCR amplification (40 nM), holo RNAP (30 nM), ApU (100 µM), and starting NTP subsets (indicated in figure legends; the NTP used for labeling at 1 µM, two others at 5 µM) were mixed on ice in GBB buffer (20 mM

Tris-HCl, 20 mM NaCl, 14 mM MgCl2, 5% glycerol, 14 mM 2-mercaptoethanol, 0.1 mM

EDTA, pH 7.9). Halted radiolabeled TECs were formed at 37 °C for 15 min, and incubated with RfaH variants and σ70 (at 40 nM and 1 µM, respectively, where indicated) for 3 min at 37 °C. Elongation was restarted by the addition of NTPs (150 µM ATP, CTP,

UTP, 10 µM GTP) and rifapentin (25 µg/ml). Aliquots were withdrawn at 15, 30, 60, 120,

480, and 960 seconds.

Halted A26 Complex Formation.

Core RNAP (40 nM) was pre-incubated with different concentrations of RfaH in

25 µl of GBB buffer at 37 °C for 25 min. An equal volume of the pre-warmed mix of the linear pIA253 DNA template (200 nM), σ70 (40 nM), ApU (200 µM), ATP and UTP (10 µM), GTP (2 µM), 10 µCi of [α32P]-GTP (3,000 Ci/mmol), and 2 nM of the ssDNA standard in the GBB buffer was added, followed by a 25 min incubation at 37 °C.


Sample Analysis.

Reactions were stopped by the addition of an equal volume of STOP buffer (10 M

urea, 50 mM EDTA, 45 mM Tris-borate; pH 8.3, 0.1% bromphenol blue, 0.1% xylene

cyanol). Samples were heated for 2 min at 90 °C and separated by electrophoresis in

denaturing 6% acrylamide (19:1) gels (7 M Urea, 0.5X TBE). The gels were dried and

RNA products were visualized and quantified using a Molecular Dynamics

Phosphorimaging System, ImageQuant Software, and Microsoft Excel.
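The pause fractions reported in the figure legends (in % of total RNA in the lane, as mean ± standard deviation across repeats) can be computed from quantified band volumes along the following lines. This is a minimal sketch rather than the actual ImageQuant/Excel workflow: the function name, band labels, and numerical values are hypothetical, and the band volumes are assumed to be background-subtracted.

```python
from statistics import mean, stdev


def pause_fraction(band_counts, pause_band):
    """Signal in one band as a percentage of all RNA signal in the lane."""
    total = sum(band_counts.values())
    if total <= 0:
        raise ValueError("lane contains no signal")
    return 100.0 * band_counts[pause_band] / total


# Hypothetical background-subtracted band volumes for three replicate lanes.
replicates = [
    {"sigmaP": 150.0, "hisP": 50.0, "runoff": 800.0},
    {"sigmaP": 140.0, "hisP": 60.0, "runoff": 800.0},
    {"sigmaP": 160.0, "hisP": 40.0, "runoff": 800.0},
]

fractions = [pause_fraction(lane, "sigmaP") for lane in replicates]
summary = (mean(fractions), stdev(fractions))  # mean and sample standard deviation
print(f"sigmaP: {summary[0]:.1f} +/- {summary[1]:.1f}% of total RNA")
```

Because the radiolabel is incorporated while the halted complexes are formed, each transcript is assumed to carry the same amount of label regardless of its final length, so band volumes can be compared directly without a length correction.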


Name          Description                                            Source or note
pIA253        λPR promoter – A26 transcription template              (232)
pIA349        T7 A1 promoter–G37–ops pause transcription template    (82)
pIA807        T7 A1 promoter–G37–ops pause–[extended-10]             this work
pIA808        T7 A1 promoter–G37–scrambled ops–[extended-10]         this work
pIA810        λPR promoter–A26–[extended-10]–consensus ops site      this work
pAS30         T7 A1 promoter–G37–ops pause–[extended-10 12T->C]      this work
pVS10         PT7–rpoA–rpoB–rpoC[His6]; rpoZ                         (197)
pVS14         PT7–rpoA–rpoB Δ943-1130–rpoC[His6]; rpoZ               (210)
pIA778        β’ L283D in pVS10                                      this work
pIA795        β’ R293Q in pVS10                                      this work
pIA803        β’ I290R in pVS10                                      (197)
pIA816        β' R278E in pVS10                                      this work
pIA817        β' R275Q in pVS10                                      this work
pIA238        PT7–[His6]rfaH                                         (82)
pIA677        RfaH Y8A in pIA238                                     this work
pVS66         RfaH R73A in pIA238                                    this work
pCL391E407K   PT7–[His6]rpoD E407K                                   (213)
pAS31         PT7–[His6]rpoD Q437H                                   this work

Table 2. Plasmids and templates.


Chapter 3: Functional regions of the N-terminal domain of the

antiterminator RfaH

Introduction

RfaH is an operon-specific paralog of widely conserved general elongation factor

NusG. NusG is essential in wild-type E. coli (124) and is associated with RNAP transcribing most of the E. coli MG1655 genes (137). Recent studies (113) demonstrate that NusG becomes dispensable when the rac prophage kil gene is deleted, and suggest that NusG limits transcription of the horizontally transferred DNA by enhancing Rho-dependent termination (124,126). Additionally, E. coli NusG cooperates with NusA,

NusB, NusE and other factors to form specialized antitermination complexes that are resistant to pause and termination signals (132,233).

By contrast, RfaH appears to act independently of other proteins and targets only those operons that have an ops element in their untranslated leader regions (129). The ops element is required for RfaH recruitment to RNAP (82,129,234): it mediates sequence-specific binding of RfaH to the NT DNA exposed on the surface of the TEC and may induce TEC isomerization into a distinct state necessary for recruitment of RfaH.

Modeling suggests that the DNA needs to be deformed to allow for productive contacts with RfaH(210); the ops may be an example of a “scrunchable” sequence (223). Our recent analysis identified several ops-containing operons that are enriched for RfaH (129) in MG1655; these operons are devoid of NusG and do not encode essential functions.

Consistently, RfaH is dispensable for growth of the commensal E. coli (201). However,


RfaH activates expression of several virulence and fitness genes, such as LPS(235), capsule (236,237), and hemolysin (238,239) biosynthesis genes, and is essential for virulence (200,240).

RfaH and NusG increase the transcript elongation rate by suppressing RNAP pausing at backtracked sites in vitro (41,82). However, some of their effects are different even in a purified system: for example, RfaH also reduces pausing at hairpin-dependent sites, whereas NusG does not (41,82). Most importantly, NusG increases whereas RfaH reduces Rho-dependent termination (82,137); this difference underlies the opposite regulatory functions of RfaH and NusG in the cell. RfaH inhibits Rho action and thus activates expression of laterally acquired genes (129), whereas NusG appears to act in concert with Rho to inhibit the expression of foreign genes (113).

These opposite functions may be partially explained by different architectures of the two proteins (Fig. 24). Both proteins consist of two domains connected by a flexible linker (197). The N-terminal domains (RfaHN and NusGN) are structurally similar and mediate RNAP binding and antipausing (AP) activities of both proteins (137,197). The C-terminal domains are drastically different (Fig. 24A): a short α-helical hairpin in RfaH, a

β-barrel Tudor domain in NusG. Strikingly, RfaHC sequence can be computationally

fitted into a NusGC-like structure (129). The two domains are tightly associated in a free

RfaH, and the interdomain interface masks a hydrophobic surface on RfaHN that likely

serves as an RNAP-binding site; we have proposed that RfaH binding to an ops element triggers the domain separation and allows RfaH binding to the RNAP (197). In contrast, the two NusG domains do not interact, implying that the RNAP-binding surface on

NusGN is always accessible (175); indeed, NusG associates with most transcribed

operons (137).

In a model of RfaH bound to the TEC, one side of RfaHN binds to the exposed

segment of the NT strand and the other, to the tip of the β’CH domain, whereas RfaHC

domain does not make any contacts to RNAP or DNA (Fig. 24B). This model is

supported by (i) zero-length UV-crosslinking of RfaH to the NT DNA (82); (ii) the ability

of RfaHN to compete with σ70 for binding to the β’CH during elongation (136); (iii) the loss of ops binding conferred by substitutions of adjacent basic residues in RfaH (see

Results); (iv) the deleterious effects of substitutions of the hydrophobic residues in RfaH and in β’CH on RfaH association with the TEC (136,197).

Here, we show that RfaHN also supports anti-termination at Rho-dependent and intrinsic terminators in vitro. Taken together with our previous reports that RfaHN is

sufficient for the AP activity and σ70 exclusion in vitro (136,210), these data indicate that

RfaHN contains all the functional elements required for AP and antitermination activities

of RfaH. In this work, we set out to dissect these elements. Our modeling suggested that

RfaHN has at least two separate regions (Fig. 24A) that mediate sequence-specific contacts to the ops bases and binding to the β’CH, respectively. We constructed a set of single residue substitutions in RfaHN and tested their phenotypes in vitro where we can

distinguish defects in binding and defects in function. We report that the two previously

hypothesized ops-binding and the RNAP-binding regions are indeed required for RfaH

recruitment to, and retention on, the TEC. Unexpectedly, our analysis also identified the

third RfaHN cluster which is apparently dispensable for binding to the TEC but is

essential for the AP activity of RfaH.


Fig. 24. The structural context of the RfaH action.

A. The E. coli RfaH model with RfaHN shown in grey and the RfaHC - in cyan. The aromatic residues at the domain interface (orange) and the polar residues on the opposite side of the RfaHN domain (blue) are shown as sticks. The E. coli NusG model is

shown for comparison, the structurally distinct NusGC domain is shown in dark red.

B. A model of RfaH bound to the TEC. The T. thermophilus RNAP (5) is shown as green

lines, β’ BH highlighted in magenta, T DNA is red, NT DNA is blue, RNA is yellow, the

active site is a small magenta sphere. RfaHN is bound to the NT DNA and to β’CH, whereas RfaHC makes no contacts to the RNAP.


Results

RfaHN Mediates All Transcriptional Activities of RfaH In Vitro

We reported that RfaHN is sufficient for the AP activity of RfaH at factor-

independent pause signals (197) and at σ-dependent pause sites (136). Similarly, Mooney

et al. have found that NusGN is sufficient for NusG’s effects on RNA chain elongation

(137). However, NusGN does not support increased Rho-dependent termination. Thus, we wanted to test whether RfaHN alone can mediate the effects of full-length RfaH on

Rho-dependent termination. We performed this analysis in vitro because RfaHN, which

possesses an extensive hydrophobic surface that is masked either by RfaHC (in a free

RfaH) or by the β’CH (when bound to the TEC), was insoluble when expressed separately. To circumvent this problem, we introduced a TEV protease cleavage site into the interdomain linker of the C-terminally His-tagged RfaH, cleaved the purified full-length protein using the His-tagged TEV protease, and removed both the protease and the RfaHC domain by adsorption to the Ni-Sepharose resin. The RfaHN isolated in this way is poorly soluble and prone to aggregation but, when present at low concentration, acts similarly to the full-length RfaH (197).

We tested the effect of RfaHN at the intrinsic (factor-independent) Thly terminator,

which has been shown to respond to RfaH in vivo (241) and in vitro (202). During single-

round in vitro transcription in the absence of RfaH ~40% of transcripts were terminated

at Thly (Fig. 25), whereas addition of RfaH decreased termination efficiency more than two-fold, to 18%; the same effect has been reported previously (202). Consistent with its key role in modification of RNAP into a pause-resistant state, the isolated RfaHN domain had the same effect on termination at Thly (18%).
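The termination efficiencies quoted above are fractions of total RNA. A minimal sketch of that arithmetic follows; it assumes that only two RNA species (terminated and run-off) contribute to the total, and the band intensities are hypothetical values chosen to reproduce the reported 40% and 18%.

```python
def termination_efficiency(terminated, run_off):
    """Fraction of transcripts that end at the terminator.

    Both arguments are background-corrected band intensities
    (arbitrary units). Assumes no other RNA species contribute.
    """
    total = terminated + run_off
    if total <= 0:
        raise ValueError("total RNA signal must be positive")
    return terminated / total


# Hypothetical intensities matching the efficiencies reported in the text.
no_factor = termination_efficiency(terminated=40.0, run_off=60.0)
with_rfah = termination_efficiency(terminated=18.0, run_off=82.0)

# Fold decrease in termination caused by RfaH (or RfaHN).
fold_decrease = no_factor / with_rfah
print(f"{no_factor:.2f} -> {with_rfah:.2f} ({fold_decrease:.1f}-fold decrease)")
```

With these inputs the ratio is about 2.2, consistent with the "more than two-fold" decrease described above.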

To assay RfaHN effect on Rho-mediated RNA release, we used a template that

encodes the opsP followed by a well-characterized phage λtR1 Rho-dependent termination signal (Fig. 26A, pIA267 (82)). On this template, RfaH and NusG had opposite effects on Rho-dependent RNA release: consistent with its synergy with Rho,


NusG shifted the distribution of RNA species upstream (towards shorter transcripts), whereas RfaH favored synthesis of longer RNAs (Fig. 26B), presumably by reducing

RNAP pausing, and thus Rho-mediated termination. The RfaHN domain displayed the

same effect but was able to act at lower concentrations; the enhanced activity of RfaHN

was also observed in pause assays (197) and is likely due to the higher stability of

RfaHN/TEC complex downstream from the ops site. We conclude that the isolated RfaHN

domain is sufficient for all documented in vitro activities of RfaH.

In Vitro Assay for Ops Binding and AP Activities of RfaH

RfaH has two distinct effects in an in vitro transcription assay: it inhibits backtracking and accelerates escape from hairpin-dependent pause sites. The former effect RfaH shares with NusG, whereas the latter effect is specific for RfaH. The other distinctive property of RfaH is its ability to bind with high affinity (with Kd in a low nM range, ~1,000-fold higher than for an ops DNA oligonucleotide) to the TEC paused at the conserved opsP site (opsP1, position U43 in pIA349 transcript; Fig. 27A) where RfaH specifically interacts with the ops bases in the NT DNA (82). Upon binding, RfaH accelerates escape from the opsP1 site but delays a small fraction of RNAP (~15-20%) two nucleotides downstream of opsP1 (i.e., at the opsP2 site, position C45 in pIA349 transcript).

Experiments performed in the presence of GreB (Fig. 28) suggest that RNAP paused at opsP2 is backtracked by two nucleotides, so that the active site is at the opsP1 site. RfaH therefore likely maintains specific contacts with the NT DNA at least two nucleotides downstream from its recruitment site, which in turn suggests that RNA extension following RfaH recruitment is accompanied by moderate scrunching of the DNA.

Structural modeling indicates that only four ops nucleotides at the upstream edge of the transcription bubble are exposed on the surface of the elongating RNAP (197) and are available for direct contacts with RfaH detected by crosslinking (82). Thus, RNAP translocation by two nucleotides would inevitably occlude the RfaH-binding site on the

NT DNA. The hypothetical scrunched state can be resolved either by breakage of the

RfaH/NT DNA contacts or by reannealing of scrunched DNA.


Fig. 25. RfaHN effects on intrinsic termination.

Transcript generated on a linear pIA416 DNA template; transcription start site

(+1), the ops element (boxed), Thly terminator structure, terminated and run-off RNA

products are shown on top. Halted [α32P]CMP-labeled G37 TECs were formed at 60 nM

with Eco RNAP and challenged with NTPs (10 µM UTP, 200 µM ATP, CTP, GTP) and

rifapentin at 25 µg/ml in the absence or in the presence of full-length RfaH or RfaHN. The reactions were incubated for 15 minutes at 37 °C, quenched, and analyzed in a 6%

denaturing gel along with the [γ32P]ATP-labeled pBR322 MspI digest as a molecular

weight standard (the sizes of fragments are indicated). Termination efficiency (219-nt

long RNA as a fraction of total RNA) was determined in three independent experiments.


Fig. 26. RfaHN effects on Rho-dependent termination.

A. Transcript generated on a linear pIA267 DNA template; transcription start site (+1), ops, Rho-dependent RNA release sites, and transcript end are indicated.

B. Halted, [α32P]GMP-labeled TECs were formed at 40 nM with Eco RNAP. Rho, NusG,

RfaH or RfaHN were added at indicated concentrations, followed by addition of NTPs

and rifapentin. The reactions were incubated for 15 minutes at 37 °C, quenched, and analyzed in a 6% denaturing gel. A representative gel and four selected traces for Rho alone (gray), full-length RfaH at 300 nM (red), RfaHN at 60 nM (blue), and NusG at 40 nM (green) are shown.


The latter scenario is phenotypically indistinguishable from backtracking to the recruitment site as observed on the pIA349 template (Fig. 28).

Importantly, the RfaH-induced delay is only observed when the third nucleotide downstream from opsP1 is G and the GTP concentration is low (data not shown), indicating that this effect has little relevance to RfaH function in vivo. Here, we use

RfaH-induced pausing at opsP2 as a tool for evaluating the strength of interactions between variant RfaHs and the NT DNA. To simultaneously assay the ops recognition and the AP activity of RfaH, we used a template that encodes the tandem ops and his pause signals located downstream from a T7A1 promoter. The initial transcribed region was designed to allow for the formation of radiolabeled TECs stalled after incorporation of a G residue at position 37 (G37) when transcription is initiated in the absence of UTP

(Fig. 27A). Upon addition of all four NTP substrates and rifapentin (to block re-initiation), RNAP elongated the nascent RNA at a characteristic rate, pausing at opsP1, opsP2 and hisP sites (Fig. 27B). The wild-type (WT) RfaH increased RNAP pausing at the opsP2 site, but accelerated transcription downstream, reducing the apparent efficiency and the longevity of pauses (Fig. 27C).

The main goal of this work is to identify the RfaH determinants that mediate its recruitment to the TEC and the consequent AP modification of RNAP. We focused on mutational analysis of RfaHN, since it is responsible for all the direct effects of RfaH on

transcript elongation and termination. We targeted both the surface exposed residues

that are highly conserved in NusG-RfaH superfamily (e.g., a subset of hydrophobic

residues that likely bind to RNAP) and those residues that are divergent between

RfaH and NusG (e.g., the positively charged residues that may constitute the DNA-

binding site in RfaH). Below, we summarize the properties of selected single

substitutions in RfaH at the opsP2 and hisP sites.


Fig. 27. Effects of RfaH on transcription elongation in vitro.

A. A schematic representation of a linear template pIA349 with the ops element, the start site (+1), transcript end (run-off), the pause sites that occur after the addition of U43

(opsP1), C45 (opsP2), and U145 nucleotides (hisP), and hisT terminator indicated.

B. WT RfaH accelerates TEC escape from the U43 and the hisP sites but delays the RNAP escape from the C45 site in a single-round elongation assay. Halted radiolabeled G37

TECs (see Materials and Methods) were pre-incubated with RfaH at 50 nM or storage buffer for 5 min at 37 °C, and then challenged with rifapentin at 100 µg/ml and NTPs (10

µM GTP, 150 µM ATP, CTP, UTP). Aliquots were withdrawn at times ranging from 5 to

1200 sec and analyzed on an 8% denaturing gel.

C. The fractions of RNA at the hisP (red squares), at or beyond the hisP site (blue circles), and at the opsP site (U43+G44+C45; green triangles) were quantified from the gel in (B) and used to kinetically model the pauses in the absence (top) and in the presence

(bottom) of the WT RfaH.


Fig. 28. Effects of GreB on pausing at the ops site.

A. Top: A schematic representation of a linear template pIA349 with the ops element, the start site (+1), transcript end (run-off), the pause sites that occur after the addition of U43 (opsP1),

C45 (opsP2), and U145 nucleotides (the hisP pause), and the hisT terminator are indicated.


Bottom: Halted radiolabeled G37 TECs (see Experimental procedures) were pre-incubated with RfaH at 50 nM or storage buffer for 5 min at 37 °C, and then challenged with rifapentin at 100 µg/ml and NTPs (10 µM GTP, 150 µM ATP, CTP, UTP). Where indicated, wild-type E. coli GreB or the cleavage-deficient variant GreBD41N (96) were included in transcription reactions at 500 nM. Aliquots were withdrawn at times ranging from 10 to 1200 sec and analyzed on an 8% denaturing gel.

B. The fraction of RNA at opsP (U43+G44+C45) as a function of time, quantified from the gel

in (A). Wild-type GreB but not the cleavage deficient variant completely eliminates the slow

escaping fraction of RNAP observed in the presence of RfaH.

C. AP activity of RfaH at hisP expressed as khisP/EffhisP is not affected by GreB, suggesting that

RfaH is properly recruited and retained with the TEC in the presence of GreB.

D. The fractions of RNAP retained at U43, G44 and C45 position as functions of time quantified from gel in (A). GreB dramatically reduces the RNAP fractions at C45 and G44 but increases the fraction at U43 compared to those observed in the presence of RfaH alone.

Considering that GreB effects at pause sites are conventionally attributed to acceleration of

RNA cleavage in backtracked TECs (242), the data pattern above suggests that ~15% of

RNAP is backtracked at positions C45 and G44 with the U43 positioned in the active site.

E. The model of GreB and RfaH action at opsP. DNA strands and RNA are depicted as black and red lines, respectively. NT DNA nucleotides interacting with RfaH are colored yellow, catalytic Mg2+ is represented by a magenta sphere. RfaH (purple oval) recruits to the TEC

paused at U43 and maintains contacts with its binding site in NT DNA for two successive

nucleotide addition cycles resulting in moderate scrunching of DNA. In ~15% of cases the

scrunched state resolves by DNA reannealing downstream from the active site, thus

generating a backtracked TEC with U43 in the active site. GreB (depicted as green scissors)

regenerates active U43 TEC, which rapidly resumes elongation. Since the efficiencies of

pauses at G44 and C45 are low, thus recovered TECs have a good chance to avoid getting

trapped again, and GreB action ultimately results in rapid clearance of the opsP site.


Substitutions That Compromise RfaHN Contacts with Ops

In our analysis, we treated three consecutive positions within the opsP site (U43, G44 and C45) as a single pause site (see Supporting information). In the absence of RfaH, the majority of RNAP molecules occupied the opsP1 site (U43) and escaped following a monoexponential function with a rate constant of ~0.1 s⁻¹. In the presence of WT RfaH, the U43 site became depopulated and the RNAP occupancy at G44 and C45 positions was increased; a slowly escaping fraction (~17%, rate constant 0.003 s⁻¹) emerged at C45, necessitating the use of a biexponential function. The effects of various RfaH substitutions on RNAP delay at C45 are summarized in Fig. 29. K10F and R73D variants were transcriptionally silent at opsP: RNAP displayed monoexponential escape kinetics and the pausing pattern was indistinguishable from that observed in the absence of RfaH. Thus, these variants are absent from Fig. 29A.
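The mono- and biexponential escape functions referred to above can be written out explicitly. The sketch below is illustrative only and is not the fitting procedure used in this work: the function names are hypothetical, and the fast-phase rate constant in the presence of RfaH is not quoted in the text, so the RfaH-free value of 0.1 s⁻¹ is reused as a placeholder.

```python
import math


def fraction_paused(t, k_fast, f_slow=0.0, k_slow=0.0):
    """Fraction of RNAP still at the pause at time t (seconds).

    With f_slow = 0 this is a monoexponential decay; otherwise a
    biexponential decay with a slowly escaping fraction f_slow.
    """
    fast = (1.0 - f_slow) * math.exp(-k_fast * t)
    slow = f_slow * math.exp(-k_slow * t)
    return fast + slow


def half_life(k):
    """Half-life (s) of a first-order process with rate constant k (1/s)."""
    return math.log(2) / k


# Rate constants quoted in the text.
t_half_fast = half_life(0.1)    # escape without RfaH
t_half_slow = half_life(0.003)  # slowly escaping fraction at C45

# Placeholder k_fast = 0.1 1/s (assumption); f_slow and k_slow from the text.
late = fraction_paused(120.0, k_fast=0.1, f_slow=0.17, k_slow=0.003)
print(f"t1/2 fast = {t_half_fast:.1f} s, t1/2 slow = {t_half_slow:.0f} s")
print(f"fraction still paused at 120 s = {late:.3f}")
```

The half-lives (roughly 7 s versus roughly 230 s) illustrate why a single exponential cannot describe the data once the slow fraction appears: at late time points essentially all of the remaining signal comes from the slow phase.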

Several variants retained partial activity: they delayed 8-40% of RNAP molecules

at C45 and required biexponential function for accurate fit of escape kinetics. Five

substitutions (T72A, K10A, H20A, R73A and R16A) resulted in strong defects in

interactions with the NT DNA; escape rate constant increased 3-9 fold, whereas the

fraction of delayed RNAP was similar or reduced relative to the WT RfaH. These

residues form a tight cluster (Fig. 29B) that we propose constitutes the DNA-binding site

on RfaHN. Lys10, Arg16, His20 and Arg73 side chains face the protein exterior and possibly

make direct contact with the DNA. In contrast, Thr72 faces the interior and therefore likely affects DNA binding indirectly, by controlling the position of the helix-loop-helix region accommodating Arg73.

H65A, T66A, Q13A, R23A and T68A variants appeared mildly defective in NT

DNA binding: RNAP escape rate constant was 1.5-2 fold lower and the fraction of delayed RNAP was similar or decreased as compared to WT RfaH. Gln13 and Arg23 are adjacent to the residues forming a putative DNA-binding cluster, and thus may either directly contribute to DNA binding or affect positioning of the DNA-binding residues.

The effects of H65A, T66A and T68A substitutions are likely indirect and may result

from compromised RfaH interactions with RNAP (see Discussion) or involve long-range conformational changes.

Three substitutions (E19A, R43A and E48A) were more effective than WT RfaH:

they reduced escape rate constant two- to three-fold and increased the fraction of C45-

delayed RNAP by ~30%. The apparent stabilization of the RfaH-DNA contacts by E19A

substitution may be due to the removal of the negatively charged Glu side chain from

the vicinity of the DNA-binding residues. Glu48 may interact with the invariant Arg178 from RfaHC in free RfaH and with β’CH residues in the RfaH-TEC complex. Changes in both the interdomain and intermolecular contacts upon E48A substitution may contribute to the observed effect, because escape of at least some RNAP from the backtracked state at C45 may involve RfaH dissociation from the TEC. The effect of

R43A substitution at the tip of a β-loop is difficult to interpret; this loop is predicted to be highly mobile even in RfaH that is not bound to RNAP (Fig. 30). The β-loop may be

involved in interaction with the β’CH or with the DNA fork junction and thus contribute

to the RfaH ability to reduce backtracking (210). In E. coli NusG, the size but not the

sequence of this “mini-domain” is important for activity (243).

Finally, we classified Y54F and F56L into a separate group since these variants

delayed as much as 35-40% RNAP at C45 and were thus clearly distinct from the WT

and all other variants. Tyr54 is one of the most conserved residues in NusG and all of its paralogs; in contrast, Phe56 is only conserved in RfaH-like proteins, whereas most NusGs

have a Leu at this position (129). The two substitutions also had different effects on

escape rate, similar to WT for F56L and four times greater in the case of Y54F. Both Tyr54

and Phe56 are located in the hydrophobic cavity on the RfaHN surface thought to

constitute the β’CH binding site (197). We therefore hypothesize that their substitutions

cause RfaH to bind RNAP in an aberrant way, which increases the backtracking

propensity of the RfaH-TEC complex at C45. The F56L protein, which mimics the NusG

configuration, remains tightly bound to the TEC and has a “wild-type” effect on RNAP

escape. In contrast, Y54F likely dissociates rapidly from the backtracked complex,

thereby allowing TEC to translocate forward and resume elongation.


Fig. 29. Effects of RfaHN substitutions on pausing at the ops site.

A. Effects of altered RfaH proteins on pause efficiency at (top) and escape rate from

(bottom) the C45 position (Fig. 27). See text for details.

B. Substitutions are shown on the RfaHN structure as sticks and colored according to their phenotypes (color coded as in A). The position of a hypothetical DNA-binding region is indicated.


Fig. 30. Alignment of the wild-type and a representative set of “mutant” RfaH structures in context of the wild-type structural ensemble.

The tube radius for the wild-type ground state was set to 0.5, for “mutants” - to 0.15, for the higher energy WT structures - to 0.05. Cartoon is colored according to activity of

RfaH variants in Fig. 29. Variants that possess near-wild type activity (F51A, H20A,

K37A) are colored yellow; those displaying significant defects (R16A, T66A, W4F) are colored red; the wild-type structural ensemble is shown in black for visibility. The coordinates for alignment were obtained through CONCOORD-PBSA modeling, alignment and image were generated using PyMOL (DeLano Scientific LLC).


Substitutions That Compromise the AP Activity of RfaHN

After RfaH is recruited to RNAP at the ops site, it remains bound to the enzyme for thousands of nucleotide addition cycles (129), preventing NusG binding, inhibiting backtracking and reducing the efficiency and longevity of all known types of pauses. The effect of RfaH at the hisP site is commonly used as an in vitro reporter of RfaH AP activity and is traditionally analyzed by fitting the decay part of the pausing curve to a monoexponential function. Such analysis yields well-defined escape rate and pause efficiency (extrapolated to zero time) parameters, but serves as a very rough approximation of the actual events. Indeed, in our assays RNAP started to arrive at hisP only after 5-10 s and continued to simultaneously arrive and escape for tens or even hundreds of seconds. In addition, different RfaH mutants produced different proportions of fast and slow (opsP2-delayed) fractions of RNAP and also generated various degrees of asynchrony in RNAP arrival at the hisP site. We thus employed a more complex kinetic model that accounts for the concurrent arrival and escape of both the fast and the slow fractions of RNAP at hisP. Our analysis demonstrates that only the ratio of the escape rate constant to the efficiency (KhisP/EffhisP), but not the individual parameters, can be accurately determined for all datasets. This limitation is inherent in the experimental setup (concurrent arrival and escape at hisP with similar rate constants) and is not a consequence of model overparameterization. We therefore used KhisP/EffhisP as a measure of AP activity of RfaH variants (Fig. 31).
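To illustrate what concurrent arrival and escape means for the pausing curve, the sketch below solves a deliberately simplified scheme with a single RNAP population that arrives at the pause with rate constant k_arr, enters the paused state with probability eff, and escapes with rate constant k_esc. This is not the model used in this work, which additionally treats fast and slow fractions, and all parameter values are hypothetical.

```python
import math


def paused_fraction(t, eff, k_arr, k_esc):
    """Fraction of all RNAP occupying the pause at time t (seconds).

    Solves dU/dt = -k_arr*U, dP/dt = eff*k_arr*U - k_esc*P
    with U(0) = 1 and P(0) = 0.
    """
    if math.isclose(k_arr, k_esc):
        # Limiting form when arrival and escape rates coincide.
        return eff * k_arr * t * math.exp(-k_arr * t)
    scale = eff * k_arr / (k_esc - k_arr)
    return scale * (math.exp(-k_arr * t) - math.exp(-k_esc * t))


def ap_metric(k_esc, eff):
    """Ratio of escape rate constant to pause efficiency."""
    return k_esc / eff


# Hypothetical parameters with similar arrival and escape rates.
eff, k_arr, k_esc = 0.6, 0.05, 0.04
times = range(0, 301, 5)
peak = max(paused_fraction(t, eff, k_arr, k_esc) for t in times)

print(f"true efficiency = {eff:.2f}, peak occupancy = {peak:.2f}")
print(f"k_esc/eff = {ap_metric(k_esc, eff):.3f} per second")
```

In this scheme the observed occupancy never reaches the true efficiency, because escape begins before arrival is complete. This is one way to see why extrapolating a monoexponential decay to zero time is only a rough approximation.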

Q13A, F51A, E19A, H94A, T68A, Q2A, Q24A, K37A, R23A, H20A variants displayed KhisP/EffhisP ratio indistinguishable from that of the WT RfaH within the margin

of error, suggesting that the altered residues are not essential for TEC binding and AP

activity. In contrast, H65A, F56L, R16A, R73A, W4F, T67A, K42A, T66A variants

displayed KhisP/EffhisP ratio similar to that observed with NusG and two- to three-fold lower than that observed with the WT RfaH. K10F, Y54F and R73D were transcriptionally silent at hisP as they did not elevate KhisP/EffhisP ratio above that observed

in the absence of added elongation factors. Finally, Y8F, K10A, V63D, N70A, E48A,


T72A, Y5F, A71N and R43A variants displayed mild defects, with KhisP/EffhisP ratio in the range of 60-80% of the WT RfaH value. We note that mildly defective variants cannot be unambiguously differentiated from the WT group due to a continuum in decreasing

KhisP/EffhisP values. The substitutions producing a mild phenotype are usually adjacent to those causing strong defects (Fig. 31B), suggesting that both types of substitutions interfere, directly or indirectly, with the same interaction in the RfaH-TEC complex.

We then focused on pinpointing the causes of diminished AP activity in strongly defective variants. The diminished activity may arise from (i) inefficient recruitment at opsP; (ii) dissociation of RfaH prior to arrival at hisP; and (iii) failure of the TEC-bound

RfaH to reduce pausing. At concentrations above 2 µM, RfaH does not require opsP for

function, implying that RfaH can be recruited to RNAP nonspecifically (data not

shown). Thus, both the inefficient recruitment and the subsequent dissociation should be

compensated by increasing the concentration of the RfaH variant. Indeed, at 3 µM F56L,

R73A and K42A displayed KhisP/EffhisP ratios indistinguishable from that observed in the

presence of WT RfaH (Fig. 31, inset), whereas partial compensation (1.5-fold increase in

KhisP/EffhisP ratio) was observed with R16A and W4F. These results suggest that defects in

recruitment or retention on the TEC were the primary causes of diminished AP activity

for W4F, R16A, K42A, F56L, and R73A variants.

In contrast, the low KhisP/EffhisP ratios of H65A, T67A and T66A were not markedly affected by a 60-fold increase in concentration, suggesting that these substitutions eliminated AP activity. This is supported by the observation that H65A, T67A and T66A variants remain tightly associated with the TEC (A.S., unpublished data). Finally, increasing Y54F RfaH concentration to 3 µM resulted in 1.5-fold higher AP activity, but the KhisP/EffhisP ratio remained three-fold below that of WT RfaH and equal to that of

NusG. We conclude that the Y54F change not only compromises RfaH retention on the TEC, but likely also completely abolishes RfaH-TEC interactions essential for its AP activity at hisP (which NusG lacks).


Fig. 31. Effects of RfaHN substitutions on pausing at the hisP site.

A. Effects of altered RfaH proteins on pausing at the hairpin-dependent hisP site at 50

nM protein (Fig. 27). Inset: the AP activity measured at 3 µM RfaH variant.

B. Substitutions are shown on the RfaHN structure as sticks and colored according to their phenotypes (color coded as in A). The positions of a presumed β’CH-binding site and an HTT motif are indicated, together with that of the hypothetical DNA-binding region (Fig. 29B).


To assess the impact of substitutions on the RfaH structure, we first utilized a

CONCOORD-PBSA molecular mechanics approach (244). The predicted effects of most substitutions on stability and overall structure were small (ΔΔG ± 2 kcal/mol and root mean square values < 2 Å relative to WT RfaH, respectively; Table 3) and did not correlate with a given mutant’s AP activity, although variants with an increased stability tend to have increased or near-WT levels of activity. Alignment of the structural ensembles for each variant and the starting RfaH structure (PDB ID 2OUG) demonstrated that predicted changes in flexibility were insignificant in the case of the closed RfaH conformation, the state for which structural information exists (Fig. 30 and

32). This suggests that the detrimental effects of substitutions analyzed in this work are not mediated by gross changes in the RfaH structure or folding, but rather stem from changes in side-chain interactions with the NT DNA and RNAP.

To verify the modeling predictions experimentally, we carried out CD analysis of selected RfaH variants (Fig. 33). This analysis revealed no gross structural alterations conferred by substitutions. In particular, the spectrum for the Y54F protein, which displays dramatic defects both in vitro (Figs. 29 and 31) and in vivo (see the next section), was indistinguishable from that of WT RfaH.


RfaH      ΔΔGcalc*     ΔΔGLG        ΔΔGes        ΔΔGsa        -TΔΔS        RMS
variant   (kcal/mol)   (kcal/mol)   (kcal/mol)   (kcal/mol)   (kcal/mol)   (Å)
F56L       1.92         0.996       -0.242        1.21        -0.0478      1.291
H20A       2.55         1.95        -0.337        1.46        -0.529       1.630
R16A      -0.772        0.256       -0.677        0.332       -0.683       1.478
R43A      -0.5         -0.0833      -0.245        0.288       -0.46        1.361
K37A       1.39         1.49        -0.448        0.963       -0.614       1.331
E19A      -1.43        -0.634       -0.326       -0.0899      -0.384       1.569
H65A       2.62         2.02         0.063        1.19        -0.653       2.162
T66A       4.28         3.16        -0.0695       1.57        -0.384       1.537
T67A       1.98         1.42         0.0634       0.806       -0.306       2.495
Y54A       3.66         2.48        -0.727        2.51        -0.597       1.252
Y8A        3.49         2.86        -0.8          1.93        -0.505       1.382
T72A       1.93         1.65        -0.472        1.14        -0.395       1.144
W4A        6.45         4.6         -1.22         3.46        -0.392       1.797
W4F        3.44         1.83         0.0563       1.34         0.217       1.030
Y54F      -1.02        -0.67        -0.279       -0.155        0.0868      1.779
F51A       4.69         2.71         0.229        2.47        -0.71        1.749
H94A       2.84         1.95        -0.762        2.21        -0.562       2.048

* ΔΔGcalc = αΔΔGes + βΔΔGLG + γΔΔGsa − τTΔΔS


Table 3. Predicted effects of selected substitutions in RfaH.

To assess the impact of amino acid substitutions used throughout this work on the RfaH structure and stability, we utilized the CONCOORD-PBSA molecular mechanics approach (244; http://ccpbsa.bioinformatik.uni-saarland.de/ccpbsa/index.php) using the E. coli RfaH structure (PDB ID 2OUG) as an input. For 12 variants, the predicted effects of substitutions (ΔΔG) were within 3 kcal/mol.


Five substitutions (in bold) at the domain interface in the closed RfaH conformation, the only state for which structural information exists, were predicted to induce a greater decrease in stability (up to 6.45 kcal/mol for the W4A variant); however, the structure of the mutants is predicted to be largely intact, whereas the domain interface is destroyed after RfaH recruitment to the TEC, and the same region is thought to instead bind to the β’CH domain of RNAP (197). Destabilization of the closed conformation might actually increase RfaH activity, but the detailed analysis of the effects of these substitutions would require a high-resolution experimental model of

RfaH-TEC interactions. The predicted effects on stability did not correlate with a given mutant’s antipausing activity, although variants with an increased stability tend to have increased or near-wild type levels of activity. The overall structure is also not predicted to change as a result of these substitutions: the total molecule RMS (root mean square) values deviated less than 2Å from the wild-type RfaH and they do not correlate with activity (Fig. 32). Predicted lowest energy structures for mutants fit quite well within WT structural ensemble (Fig. 30) further confirming that they belong to the same structural ensemble, accessible in solution. Flexibility changes were assessed by alignment of the sample structural ensembles for each mutant and the starting PDB structure and were found to be insignificant in the closed conformation of RfaH (data not shown).

Altogether, our modeling indicates that the impact of each single amino acid substitution on RfaH structure, flexibility and stability (∆∆G) is rather moderate and is unlikely to account for the majority of the noted defects in the mutants’ activity.
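The footnote to Table 3 expresses ΔΔGcalc as a weighted sum of four terms. The sketch below checks how the tabulated columns relate to one another. It rests on the assumption, suggested by the numbers but not stated in the legend, that each component column already includes its weighting coefficient, so that the four components should simply add up to ΔΔGcalc within rounding.

```python
# Rows copied from Table 3 (kcal/mol):
# (variant, ddG_calc, ddG_LG, ddG_es, ddG_sa, minus_T_ddS)
TABLE_3 = [
    ("F56L", 1.92, 0.996, -0.242, 1.21, -0.0478),
    ("H20A", 2.55, 1.95, -0.337, 1.46, -0.529),
    ("R16A", -0.772, 0.256, -0.677, 0.332, -0.683),
    ("R43A", -0.5, -0.0833, -0.245, 0.288, -0.46),
    ("K37A", 1.39, 1.49, -0.448, 0.963, -0.614),
    ("E19A", -1.43, -0.634, -0.326, -0.0899, -0.384),
    ("H65A", 2.62, 2.02, 0.063, 1.19, -0.653),
    ("T66A", 4.28, 3.16, -0.0695, 1.57, -0.384),
    ("T67A", 1.98, 1.42, 0.0634, 0.806, -0.306),
    ("Y54A", 3.66, 2.48, -0.727, 2.51, -0.597),
    ("Y8A", 3.49, 2.86, -0.8, 1.93, -0.505),
    ("T72A", 1.93, 1.65, -0.472, 1.14, -0.395),
    ("W4A", 6.45, 4.6, -1.22, 3.46, -0.392),
    ("W4F", 3.44, 1.83, 0.0563, 1.34, 0.217),
    ("Y54F", -1.02, -0.67, -0.279, -0.155, 0.0868),
    ("F51A", 4.69, 2.71, 0.229, 2.47, -0.71),
    ("H94A", 2.84, 1.95, -0.762, 2.21, -0.562),
]

# Difference between the tabulated total and the sum of its components.
deviations = {row[0]: row[1] - sum(row[2:]) for row in TABLE_3}
largest = max(abs(d) for d in deviations.values())

for variant, dev in deviations.items():
    print(f"{variant:5s} total - sum(components) = {dev:+.4f}")
print(f"largest absolute deviation: {largest:.4f} kcal/mol")
```

Under that assumption every row agrees to within about 0.01 kcal/mol, which is the precision expected from values rounded to three significant figures.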


Fig. 32. Alignment of the wild-type and all “mutant” RfaH structures.

Tube cartoon representations (tube radius for WT was set to 0.5, for “mutants” to 0.15).

Cartoon is colored according to RfaH activity in Fig. 31: the wild-type, K37A, E19A,

H20A, F51A, H94A are shown in yellow; mildly defective R43A and T72A – in orange; very defective R16A, H65A, T66A, T67A, F56L, Y54A, Y8A, Y54F, W4F are shown in red.

The coordinates for the alignment were obtained through CONCOORD-PBSA modeling; the alignment and image were generated using PyMOL (DeLano Scientific LLC).


Fig. 33. Circular Dichroism (CD) spectra of selected RfaH variants do not reveal any major structural perturbations compared to the wild-type protein.

Protein samples were dialyzed into 10 mM Tris-HCl pH 8.0, 250 mM NaCl, 1% glycerol and diluted to 19.5 µM prior to the experiment. CD spectra were recorded with an Aviv

62A DS CD spectrophotometer at 25 °C. Data were collected in a 1 mm quartz cuvette from 260 to 200 nm with a 1 nm step and 10 s averaging at each step. Mean residual ellipticity (MRE) was calculated as (θ × 100)/(0.1 cm × [P] × n), where θ is raw ellipticity, [P] is protein concentration (µM) and n is the number of amino acid residues.
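The MRE expression in the legend can be transcribed directly. The sketch below implements it exactly as written, without any unit conversion. The function name is hypothetical, and the raw ellipticity and residue count in the usage line are placeholder values, not data from this work; only the 0.1 cm path length and the 19.5 µM concentration come from the legend.

```python
def mean_residue_ellipticity(theta, conc, n_residues, path_cm=0.1):
    """MRE as written in the legend: (theta * 100) / (path * [P] * n).

    theta      -- raw ellipticity, in the instrument's units
    conc       -- protein concentration, in the units used in the legend
    n_residues -- number of amino acid residues in the protein
    path_cm    -- optical path length in cm (1 mm cuvette = 0.1 cm)
    """
    if path_cm <= 0 or conc <= 0 or n_residues <= 0:
        raise ValueError("path, concentration and residue count must be positive")
    return (theta * 100.0) / (path_cm * conc * n_residues)


# Placeholder inputs: theta and n_residues are hypothetical.
example = mean_residue_ellipticity(theta=-10.0, conc=19.5, n_residues=100)
print(f"MRE = {example:.3f}")
```

Because the normalization divides by both concentration and residue count, spectra of variants measured at slightly different concentrations remain directly comparable.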


In Vivo Effects of RfaHN Substitutions

In the case of NusG, a good correlation between in vitro and in vivo effects of selected substitutions has been reported. We wished to test whether the RfaH residues implicated by our in vitro assays are functionally important in the cell. We designed an assay system (Fig. 34A) consisting of three components. First, we constructed an rfaH⁻ strain by a targeted disruption of the rfaH gene in the E. coli DH5α strain using the

Targetron protocol (see Materials and Methods). Second, we constructed a compatible low copy number plasmid (pIA957) with the rfaH gene cloned under the control of an

IPTG-inducible Ptrc promoter and a lacIQ1 variant, which contains promoter mutations that increase the expression of lac (245). We also made vectors lacking rfaH

(pIA947) or containing substitutions in the RfaHN domain (see Table 4 for the complete list). Third, we constructed a reporter that carries a Photorhabdus luminescens (246) luxCDABE operon under control of the PBAD promoter/araC cassette (247) and the ops site

(pGB83) and a control plasmid without the ops element (pGB63). This is a medium copy

number plasmid and since all natural RfaH targets are transcribed at very low levels we

did not use arabinose induction in our assays.

The lux operon encodes the luciferase (the products of the luxA and luxB genes), which oxidizes FMN-H2 and a long-chain aliphatic aldehyde in the presence of O2 to yield a luminescence signal.

The aldehyde is subsequently regenerated by a multi-enzyme reductase complex encoded by the luxC, luxD, and luxE genes. The lux system is a sensitive (over at least five orders of magnitude) and simple bioreporter capable of autonomous light emission: it encodes all the components required to generate the bioluminescent signal, which is then directly measured in the cell culture.

In the absence of RfaH, expression of the lux operon was quite low (400 units,

Fig. 34B). Ectopically expressed RfaH increased the signal more than 700-fold when the ops+ template was used, and ~20-fold with the control plasmid that lacks ops upstream from luxC. Thus, unexpectedly, the lux expression was strongly dependent on RfaH but less dependent on the ops element. Sequence analysis revealed an ops-like element in the

luxC gene (GGCGGTAGAGca, marked as ops* in Fig. 34A, left). In the ops* sequence, all

the bases that likely interact with RfaH are conserved (bases shown in capital letters fit

the ops consensus), and the only two divergent bases correspond to the location of the

site where RNAP pauses in the absence of RfaH (U43 shown in Fig. 27A). The TG element is the preferred combination for natural pauses, yet it is possible that either the pause is not absolutely required for RfaH recruitment, or that the CA combination may induce pausing at sufficient, even if reduced, levels. To determine whether the ops* element confers the dependence on RfaH, we constructed a variant with two silent substitutions in luxC that compromise the function of the canonical ops sequence

(unpublished). We found that this “defective” ops* reporter maintained its dependence on RfaH and exhibited the same activity as the original vector. Thus, the dependence on

RfaH is not due to the putative ops element; one possibility is that the documented effects of RfaH on cell wall integrity (248) underlie the observed effect. For the purposes of this analysis, it is important to note that although this reporter is not ops-dependent, the measured light emission is absolutely dependent on RfaH function, the parameter under study.

We observed that all seven RfaH variants tested were apparently defective in lux operon expression; their activities were increased 5-10 fold in the presence of ops and ranged from 1% to ~50% of the WT protein (Fig. 34B). The most defective variant, Y54F, carries a conservative substitution in the putative RNAP-binding site, and is also severely defective in the ability to reduce pausing at the hisP site (Fig. 31A). We note that there is no perfect correlation between the in vitro AP activity of the RfaH variants and their in vivo effects on gene expression. These differences are most likely due to the nature of signals at which RfaH acts to increase the expression of the lux operon: while the hairpin-dependent hisP is an excellent model to study the mechanism of AP, most

pause sites present on natural templates are short-lived pauses that control the overall

rate of transcription and serve as precursors to termination. In addition, the presence of

other cellular elongation factors or ongoing translation may affect RfaH function.


To ascertain that the in vivo defects of RfaH variants are not merely a result of their reduced expression or stability, we measured the cellular level of these proteins by Western analysis with polyclonal anti-RfaH antibodies (129). These antibodies recognize the RfaHC domain (A.S., unpublished observations) and thus their interactions with the target should not be directly altered by substitutions in RfaHN. We found that all the tested variants were present at the same or a slightly higher level than the WT RfaH (Fig. 34C), strongly suggesting that their defects are directly conferred by the substitutions.


Fig. 34. The in vivo reporter assay for the RfaH activity

A. The assay system. We constructed an rfaH knock-out strain and two plasmids to assay the effects of ops and RfaH variants in vivo. The first plasmid (pGB083; left) has a ColE1 origin of replication and contains the entire Photorhabdus luminescens (246) luxCDABE operon under the control of the AraC-controlled PBAD promoter and an ops element. The second, compatible (P15A ori) plasmid (right; pIA957) has the E. coli rfaH gene cloned under the control of the Ptrc promoter; the plasmid also carries an engineered lacIQ1 gene.

B. Analysis of the effects of selected RfaH substitutions on lux operon expression in the presence (solid bars) and the absence (striped bars) of the ops element. The results are expressed as luminescence corrected for the cell densities of individual cultures. The data represent the average of at least four independent experiments.


C. Western analysis of cell extracts expressing RfaH variants (as in panel B; see Materials and Methods) performed in parallel with the lux assay with polyclonal RfaHC-specific antibodies.


Discussion

RfaH is recruited to the TEC through specific contacts to the DNA and RNAP and remains bound to the enzyme until it completes the synthesis of the entire operon. While bound, RfaH increases the apparent rate of RNA synthesis on natural templates by suppressing pausing and reduces termination. In a purified system, all these activities are mediated by the N-terminal, RfaHN domain. We report that RfaHN contains three separate regions that mediate DNA recognition, retention on the RNAP throughout transcription, and AP modification.

The Functions of the RfaH Domains

Our previous (197,210) and present (Fig. 24 and 25) analyses show that RfaHN acts in vitro at least as efficiently as the full-length protein. RfaHN is both necessary and sufficient for binding to RNAP and to the ops DNA and mediates all the effects on elongation: an overall increase in the elongation rate, reduction of pausing and termination at intrinsic and Rho-dependent sites. However, the isolated RfaHN is insoluble and thus nonfunctional in vivo; furthermore, it lacks the key regulatory feature of RfaH which distinguishes it from NusG in vitro: in contrast to the full-length protein, RfaHN recruitment to the TEC is independent of the ops element.

RfaHC is transcriptionally silent but plays several modulatory roles (129). First, RfaHC renders the full-length protein soluble by masking an extensive hydrophobic surface on RfaHN, which acts as an RNAP-binding site (see below). Second, RfaHC restricts the RfaH action to a small number of ops-containing operons by preventing binding to RNAP except at the ops sites, where RfaHN contacts with the DNA trigger domain dissociation. Third, RfaHC may be engaged in cross-talk with the translation and secretion machineries; the last feature may be particularly important because RfaH-controlled operons are laterally acquired (and thus are likely poorly translated) and mediate synthesis of various extracytoplasmic molecules.

Our studies are in agreement with a recent analysis of the E. coli NusG domains (137), which demonstrated that NusGN is sufficient for the AP effect in vitro, but fails to support the enhancement of Rho-dependent termination. NusGC interacts with Rho and other partners: NusG is required for Nun-dependent termination and for the assembly of λ phage and E. coli rRNA antitermination complexes. Different NusGC substitutions eliminate the effect on either Rho- or Nun-dependent termination, suggesting that these proteins interact with distinct regions on NusGC (137).

RfaH has an opposite (and likely indirect) effect on Rho-dependent termination and does not interact with Rho; the most straightforward explanation is that RfaHC structure is drastically different from that of NusGC (essentially turned inside-out) and the Rho-interacting residues that are located on the surface of NusGC are inaccessible in RfaH (Fig. 23). While it is possible that RfaHC may refold into a β-barrel after recruitment, we argued that substitutions of residues in an RfaH-like protein that contact Rho in NusG (and whose identity is yet unknown) must have happened early after the ancestral nusG duplication to eliminate Rho binding, the key functional difference between the two paralogs (129). We propose that RfaHC may interact with other cellular proteins and may be essential for coupling transcription of RfaH-controlled operons to translation and finally to secretion. In support of this hypothesis, Bailey et al. reported that RfaH and ops nucleate the formation of a high molecular weight complex whose assembly requires RNAP and associated elongation factors, ribosomes, and the cytoplasmic membrane fraction (249).

C-terminal domains in eukaryotic NusG paralogs are more complex and likely mediate interactions with the components of the transcriptional and post-transcriptional complexes. For example, a plant-specific SPT5-like protein has a long carboxy-terminal extension that interacts with AGO4 to direct RNA-directed DNA methylation by polV (250) and transcriptional silencing of retrotransposons and repetitive elements.


A Cluster of Polar and Charged Residues Mediates RfaHN Binding to DNA

Our previous studies indicate that RfaH directly and specifically binds to the NT DNA (82). Based on molecular modeling, we proposed that a polar region on RfaHN interacts with ops (197) in the RfaH/TEC complex. Importantly, this model was rather speculative: it was built using the structure of Tth RNAP, the only bacterial species for which high-resolution structures are available, but which differs from the E. coli enzyme in many surface features that may be essential for RfaH action. Furthermore, the path of the NT DNA was not constrained by any experimental data; none of the TEC structures currently available contains the NT DNA in the transcription bubble, and it is quite possible that both the ops sequence and the bound RfaHN constrain its path on the RNAP surface.

The results of the mutational analysis that we carried out to evaluate this model (Fig. 29) are consistent with our predictions. Five residues that form a patch on the RfaHN side opposite the interdomain interface, Lys10, Arg16, His20, Thr72 and Arg73, are required for RfaH-ops DNA interactions (Fig. 29). The effects of substitutions at these residues are unlikely to result from alterations in the protein structure (Supporting information); consistently, their defects were alleviated by an increase in RfaH concentration (Fig. 31).

RfaHN does not have any recognizable DNA-binding motif, which is not surprising given that it binds to a rather unusual target: the single-stranded DNA strand exposed on, and interacting with, the surface of RNAP (251). The ops element spans 12 nt among which 10 are highly conserved; however, the modeling predicts that only ~four ops bases are available for direct contacts with RfaHN, too few to explain the high specificity of RfaH towards its targets observed in vivo (129). The remaining ops bases likely mediate a conformational change in the TEC that is required for RfaH recruitment.


A Hydrophobic Surface of the N-terminal Domain Mediates its Binding to RNAP

The post-recruitment activity of RfaH depends on its persistent contacts to RNAP. We propose that these contacts are established between a hydrophobic patch on RfaH and the tip of β’CH. We reported that the deletion of the tip, or substitutions of the Ile290 or Ile291 residues, eliminated the ability of RfaH to enhance elongation (197). We also showed that RfaH Y8A and β’ I290R substitutions decreased the ability of RfaH to compete with σ, presumably by destabilizing the RfaH/β’CH contacts and allowing σ to bind to β’CH instead (136). Here we show that substitutions of several residues in this region decrease the AP activity of RfaH (e.g., W4F and F56L; Fig. 31). The effect of these substitutions can also be alleviated by an increase in RfaH concentration, consistent with the reduced affinity of the altered proteins. Mutagenesis of E. coli nusG, in vitro analysis of NusG variants with substitutions of residues in the homologous patch (137), and two-hybrid assays (252) all suggest that NusG interacts with the β’CH, supporting our assumption that RfaHN and NusGN bind to RNAP in a similar fashion (129).

The HTT Motif as an AP Module

The common activity of RfaHN and NusGN domains is to increase the rate of RNA chain elongation. This rate is limited by sequence-dependent signals that induce formation of an elemental pause state, in which the 3' end of the nascent RNA is misaligned in the active site (41,142); this isomerization is thought to occur from a pre-translocated state (40). RfaH fails to affect transcription on pause-free templates and with pause-resistant RNAPs, suggesting that it prevents isomerization into the elemental pause (210). In the simplest scenario, RfaH interactions with β’CH and the NT strand would be sufficient to induce forward translocation of RNAP, which may explain its AP activity. Although the specific RfaH/DNA contacts are not needed for its AP activity, RfaH likely maintains non-specific contacts with the NT strand at the upstream part of the transcription bubble where it could promote strand reannealing, and thus translocation. In this scenario, RfaH should be able to act as long as it remains bound to the TEC.


Our data identify an additional HTT motif required for AP by RfaH, but distinct from the DNA- and RNAP-binding motifs. The defects of substitutions in the HTT motif are not suppressed at high RfaH concentration (Fig. 31), suggesting that these changes are unlikely to cause the loss of affinity. In search of an alternative explanation, we re-examined the RfaH/TEC model (197). We found that the HTT motif makes a hypothetical contact with the β gate loop (β GL, Fig. 35), a conserved loop that has been proposed to play a key role in the DNA loading in the course of the promoter complex formation (5). We are not aware of any functional data pertaining to the β GL role in transcription.

A large caveat of this modeling is that it has been performed with the T. thermophilus RNAP, in which the β GL sequence is quite distinct from that in the E. coli RNAP and which is not “designed” to interact with RfaH, which is absent from many Bacteria, including Thermus. However, if this contact can indeed be established in the E. coli TEC, it would offer a plausible hypothesis for the RfaH AP effects: the β GL belongs to a mobile domain that likely moves in concert with the β’ clamp during isomerization into a paused state (40). During pausing and termination, the clamp is thought to partially open (41); RfaH could restrict the mobility of the β and β’ parts of the clamp, thereby preventing its opening. We are currently testing if RfaH and the β GL interact, and whether this interaction has functional consequences.

Contributions: Anastasia Sevostiyanova performed analysis of RfaH mutants in vivo (lux reporter assays, Western) and in vitro (CD analysis, Rho-dependent and factor-independent termination assays). Georgiy Belogurov constructed some and purified all RfaH variants and carried out in vitro analysis of their kinetic properties. Vladimir Svetlov constructed three RfaH variants and carried out their initial analysis. Irina Artsimovitch constructed most of the expression vectors for RfaH variants.


Fig. 35. The functional contacts between RfaH and the TEC.

We propose that RfaH function depends on three separate contacts with the NT DNA strand (blue), the β’CH (orange), and the β GL (magenta). The side chains of the RfaH residues that are proposed to make the key contacts with these elements are shown as spheres in the matching colors. The ops elements from selected operons that are associated with RfaH in MG1655 (129) are conserved with the exception of the two highlighted bases. Sequences of the β’CH and the β GL are highly and moderately conserved, respectively; the residues that differ from the E. coli (Eco) sequence are shaded in light green. The abbreviations are: Tth, Thermus thermophilus; Hpy, Helicobacter pylori; Bsu, Bacillus subtilis; Mtu, Mycobacterium tuberculosis; Mge, Mycoplasma genitalium.


Materials and Methods

Plasmids and Strains

Plasmids used in this work are listed in Table 4. Sequences of all plasmid constructs were verified at the OSU PMGF centre and are available at our lab web site, www.osumicrobiology.org/homepages/artsimovitch/sequences/pIA_plasmids_list.htm.

Disruption of the RfaH ORF was carried out in the E. coli DH5α (λDE3) strain using the TargeTron™ Gene Knockout System (Sigma-Aldrich, St. Louis, MO) according to the manufacturer's protocol. Briefly, the RfaH ORF sequence was submitted to the proprietary search engine (http://www.sigma-genosys.com/targetron/) to identify potential target sites for intron insertion, and one of the top ranked targets (for intron insertion at position 21) was selected due to its proximity to the start codon. The intron insertion plasmid, pACD4K-C, was retargeted to this site on rfaH using PCR with a set of oligonucleotides generated according to the TargeTron algorithm. Disruption of rfaH by the retrohoming intron was induced by addition of IPTG to the culture transformed with the retargeted plasmid and application of the kanamycin selection. RfaH disruption was confirmed by genomic PCR and sequencing, followed by "curing" the strain of the plasmid; the resulting ΔrfaH strain was named IA149.

Proteins and Reagents

Oligonucleotides were obtained from Integrated DNA Technologies (Coralville, IA); NTPs and [α-32P]NTPs were from GE Healthcare (Piscataway, NJ); restriction and modification enzymes were from NEB (Ipswich, MA); PCR reagents were from Roche (Indianapolis, IN); other chemicals were from Sigma (St. Louis, MO) and Fisher (Pittsburgh, PA). Plasmid DNAs and PCR products were purified using spin kits from Qiagen (Valencia, CA) and Zymo Research (Orange, CA). Rho protein was a gift from Rachel A. Mooney. The full-length RfaH variants, the RfaHN domain, and RNAP were purified as described in (197).


Halted Complex Formation

Linear templates for in vitro transcription were generated by PCR amplification. TECs were formed with 40 nM of linear DNA template and 50 nM RNAP holoenzyme in 20-100 µl of transcription buffer (20 mM Tris-chloride, 20 mM NaCl, 2 mM MgCl2, 14 mM 2-mercaptoethanol, 0.1 mM EDTA, 5% glycerol, pH 7.9). To make the elongation complexes halted after addition of G37 on pIA349 and pIA416 templates, transcription was initiated in the absence of UTP, with ApU at 150 µM, ATP and GTP at 2.5 µM, CTP at 1 µM, with 32P derived from [α-32P]CTP (3000 Ci/mmol). Halted complexes were formed for 15 minutes at 37 °C and stored on ice prior to use.
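As an illustration of the arithmetic behind assembling such a reaction, the following minimal Python sketch computes how much of each stock is needed to reach the stated final concentrations (40 nM template, 50 nM RNAP) in a 50 µl reaction, using C(stock) x V(stock) = C(final) x V(final). The stock concentrations (400 nM template, 1000 nM RNAP) and the function name are assumptions made for this example only; they are not taken from this work.

```python
def stock_volume_ul(final_conc_nm, final_volume_ul, stock_conc_nm):
    """Volume of stock (µl) needed to reach final_conc_nm in final_volume_ul.

    Uses C_stock * V_stock = C_final * V_final.
    """
    if stock_conc_nm <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volumes must be positive")
    if stock_conc_nm < final_conc_nm:
        raise ValueError("stock is more dilute than the requested final concentration")
    return final_conc_nm * final_volume_ul / stock_conc_nm


# Final concentrations are those given in the text; stocks are hypothetical.
reaction_volume_ul = 50.0
template_ul = stock_volume_ul(40.0, reaction_volume_ul, stock_conc_nm=400.0)
rnap_ul = stock_volume_ul(50.0, reaction_volume_ul, stock_conc_nm=1000.0)

# Whatever volume remains is made up with buffer and the other components.
remaining_ul = reaction_volume_ul - template_ul - rnap_ul

print(f"template: {template_ul} µl, RNAP: {rnap_ul} µl, remainder: {remaining_ul} µl")
```

The same function applies to the 20 µl and 30 µl reactions described for the termination assays, given the appropriate stock concentrations.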

Single Round Pause Assays

Halted [32P]CTP-labeled elongation complexes were prepared in 50 µl of transcription buffer. Elongation factors were added, followed by a 3-min incubation at 37 °C. Transcription was restarted by addition of GTP to 15 µM, CTP, ATP and UTP to 150 µM, and rifapentin to 25 µg/ml. Samples were removed at 10, 20, 40, 60, 90, 120, 180, 300, 600 and 1200 sec and after a final 5-min incubation with 200 µM GTP (chase), and quenched by addition of an equal volume of STOP buffer (10 M urea, 50 mM EDTA, 45 mM Tris-borate, pH 8.3, 0.1% bromophenol blue, 0.1% xylene cyanol).

Intrinsic Termination Assay at Thly

Halted complexes were prepared in 20 µl of transcription buffer with 25 nM of linear DNA pIA416 template and 60 nM of RNAP holoenzyme. Full-length RfaH or RfaHN (or storage buffer) was added, followed by a 3-min incubation at 37 °C. Elongation was restarted by addition of NTPs (10 µM UTP, 200 µM ATP, CTP, GTP) and rifapentin. Reactions were incubated at 37 °C for 15 min and quenched as above.


Rho-Dependent Termination Assays

Halted complexes (A26) were prepared on pIA267 template in the absence of CTP in 30 µl of Rho buffer (40 mM Tris-HCl, 50 mM KCl, 5 mM MgCl2, 0.1 mM dithiothreitol, 3% glycerol, pH 7.9) supplemented with ApU at 150 µM, ATP and UTP at 2.5 µM, GTP at 1 µM, and 5 µCi of [α-32P]GTP (3000 Ci/mmol) during a 15-min incubation at 37 °C. Full-length RfaH, RfaHN, or NusG (or storage buffer) was added, followed by a 3-min incubation at 37 °C. Transcription was restarted by addition of GTP to 15 µM, CTP, ATP and UTP to 150 µM, and rifapentin to 25 µg/ml. Reactions were incubated at 37 °C for 15 min and stopped as above.

Sample Analysis

Samples were heated for 2 min at 90 °C and separated by electrophoresis in denaturing acrylamide (19:1) gels (7 M urea, 0.5X TBE) of various concentrations (6-10%). RNA products were visualized and quantified using a PhosphorImager Storm 820 System (GE Healthcare), ImageQuant Software, and Microsoft Excel. Kinetic analysis of pause assays is described in detail in the Supporting information.

In Vivo Assays

A promoter-less lux reporter vector, pIA874, in which several unique restriction sites were engineered by site-directed mutagenesis, was constructed from pSB417 (246). During these manipulations, we found that the actual sequence of pSB417 differs from the predicted one, and therefore sequenced the redesigned vector completely. To create a PBAD-lux fusion pGB063, a fragment containing the araC gene and the PBAD promoter was PCR amplified from pBAD30 (247) and cloned into the NotI and XhoI sites of pIA874. The ops+ plasmid pGB083 was constructed by cloning a PCR fragment containing the ops element upstream of rfbB (amplified from E. coli DH5α genomic DNA) between the NheI and XhoI sites of pGB063. A compatible plasmid (pIA947) for testing the RfaH effects in trans was constructed by first cloning a PCR fragment containing the PlacIQ1-lacI region from pIA249 (the lacIQ1 variant was introduced into the primer) between the EagI and HindIII sites of pACYC184. To construct pIA957, an NdeI-HindIII fragment bearing wild-type RfaH was excised from pIA432 and cloned between the same sites of pIA947. Altered RfaH variants were recloned from pIA432-like plasmids listed in Table 4.

Plasmids carrying RfaH variants were co-transformed with a lux reporter vector (pGB083) into the IA149 strain and plated on selective media (100 µg/ml carbenicillin, 50 µg/ml chloramphenicol). Single colonies were inoculated into 3 ml of LB media supplemented with antibiotics and incubated at 37 °C with aeration. After 6 hours of growth, cultures were diluted into fresh LB containing antibiotics and 0.1% glucose to OD600~0.05 and allowed to grow for an additional 6 hours. Neither construct required induction (with IPTG and arabinose) since background expression of rfaH and the lux operon was sufficient to generate the signal; this observation is consistent with very low expression levels of both rfaH and the operons it controls in E. coli (129). Luminescence was measured in 200 µl aliquots in triplicate on a FLUOstar OPTIMA plate reader (BMG LABTECH GmbH, Offenburg, Germany) and normalized by cell density. Results were analyzed using Microsoft Excel.
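The normalization described above can be sketched as follows. This is only an illustrative Python example: the analysis in this work was carried out in Microsoft Excel, and the function names and all numerical readings below are hypothetical. The sketch averages triplicate luminescence readings, divides by the culture cell density (OD600), and expresses a variant as a percentage of the wild-type signal.

```python
from statistics import mean


def normalized_luminescence(raw_readings, od600):
    """Average technical replicates and correct the signal for cell density."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return mean(raw_readings) / od600


def percent_of_wt(variant_signal, wt_signal):
    """Express a normalized variant signal as a percentage of wild type."""
    return 100.0 * variant_signal / wt_signal


# Hypothetical triplicate readings (arbitrary units) and culture densities.
wt = normalized_luminescence([1200.0, 1180.0, 1220.0], od600=0.5)
variant = normalized_luminescence([130.0, 120.0, 110.0], od600=0.5)

print(f"WT: {wt:.1f}, variant: {variant:.1f}, "
      f"variant activity: {percent_of_wt(variant, wt):.1f}% of WT")
```

Dividing by OD600 before comparing strains keeps differences in growth from being mistaken for differences in reporter expression.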

Western Blotting

Derivatives of the ΔrfaH strain transformed with plasmids carrying RfaH variants or an empty vector were grown in the same conditions as for the Lux assay. Cell samples were collected by centrifugation and resuspended in 1 ml of Lysis buffer (50 mM Tris-HCl, 500 mM NaCl, 5% glycerol, 0.1 mM β-ME; pH 7.9) containing 0.1 mg/ml lysozyme. Cells were sonicated and extracts were cleared by centrifugation. Extract samples containing 22 µg of protein (as determined by Bradford assay) were loaded on a 10% SDS Bis-Tris gel (Invitrogen, Carlsbad, CA). Protein transfer was performed in Tris-Glycine buffer, pH 8.3, containing 20% methanol onto Hybond™ ECL membrane (GE Healthcare, Piscataway, NJ) at 300 mA for 2 hours in a Mini Trans-Blot Cell (Bio-Rad Laboratories, Hercules, CA). Blocking of non-specific sites was carried out overnight at 4 °C in PBS-T buffer (1x PBS pH 7.5, 0.2% Tween-20) containing 5% nonfat dry milk. The membrane was incubated with rabbit polyclonal antibodies against RfaH (129) (specificity confirmed) diluted 1/4000 in PBS-T for 1 hour with agitation at room temperature. After five washes with PBS-T, the membrane was incubated with anti-rabbit IgG for 1 hour (1/10000 dilution in PBS-T, obtained from GE Healthcare), washed again and exposed to ECL Plus detection reagents (GE Healthcare). The image was obtained using blue fluorescent mode on a Storm-840 Phosphorimager (GE Healthcare).

Plasmid | Description | AbR | Reference/source

RfaH overexpression vectors (pET derivatives): T7 gene 10 promoter-His6-thrombin cleavage site-rfaH, lacIq, ColEI ori, kanamycin resistance
pIA238 | E. coli rfaH between NdeI and HindIII sites of pET28b | Kn | (82)
pIA432 | silent BamHI site in rfaH in pIA238 | Kn | (202)
pIA507 | silent BamHI and XhoI sites in rfaH in pIA432 | Kn | this work
pIA674 | RfaH W4A; site-directed mutagenesis of pIA238 | Kn | this work
pIA675 | RfaH Y5A; site-directed mutagenesis of pIA238 | Kn | this work
pIA676 | RfaH W4F; site-directed mutagenesis of pIA238 | Kn | this work
pIA677 | RfaH Y8A; site-directed mutagenesis of pIA238 | Kn | this work
pIA702 | RfaH R40A; site-directed mutagenesis of pIA432 | Kn | this work
pIA703 | RfaH K37A; site-directed mutagenesis of pIA432 | Kn | this work
pIA707 | RfaH F51A; site-directed mutagenesis of pIA432 | Kn | this work
pIA710 | RfaH K10A; site-directed mutagenesis of pIA432 | Kn | this work
pIA711 | RfaH R11A; site-directed mutagenesis of pIA432 | Kn | this work
pIA733 | RfaH H94A; site-directed mutagenesis of pIA432 | Kn | this work
pIA744 | RfaH Q24A; site-directed mutagenesis of pIA507 | Kn | this work
pIA756 | RfaH K42A; site-directed mutagenesis of pIA432 | Kn | this work
pIA757 | RfaH H65A; site-directed mutagenesis of pIA432 | Kn | this work
pIA758 | RfaH T67A; site-directed mutagenesis of pIA432 | Kn | this work
pIA760 | RfaH H20A; site-directed mutagenesis of pIA766 | Kn | this work
pIA761 | RfaH E22A; site-directed mutagenesis of pIA766 | Kn | this work
pIA763 | RfaH E48A; site-directed mutagenesis of pIA238 | Kn | this work
pIA764 | RfaH T66A; site-directed mutagenesis of pIA507 | Kn | this work
pIA766 | silent SacI, BamHI and XhoI sites in rfaH in pIA507 | Kn | this work
pIA767 | RfaH Q2A; site-directed mutagenesis of pIA507 | Kn | this work
pIA768 | RfaH T68A; site-directed mutagenesis of pIA507 | Kn | this work
pIA772 | RfaH V63D; site-directed mutagenesis of pIA507 | Kn | this work
pIA773 | RfaH Q13A; site-directed mutagenesis of pIA507 | Kn | this work
pIA783 | RfaH E19A; site-directed mutagenesis of pIA507 | Kn | this work
pGB009 | RfaH R16A; site-directed mutagenesis of pIA238 | Kn | this work
pGB012 | RfaH Y54A; site-directed mutagenesis of pIA238 | Kn | this work
pGB013 | RfaH T72A; site-directed mutagenesis of pIA238 | Kn | this work
pGB030 | RfaH F56L; site-directed mutagenesis of pIA238 | Kn | this work
pGB038 | RfaH Y54F; site-directed mutagenesis of pIA238 | Kn | this work
pAL16 | RfaH R43A; site-directed mutagenesis of pIA432 | Kn | this work
pVS61 | RfaH R23A; site-directed mutagenesis of pIA432 | Kn | this work
pVS62 | RfaH R73D; site-directed mutagenesis of pIA432 | Kn | this work
pVS66 | RfaH R73A; site-directed mutagenesis of pIA432 | Kn | this work

RfaH expression vectors (pACYC derivatives): lacIQ1 promoter, P15A ori, lacIq, chloramphenicol resistance
pIA249 | PlacI-lacI and Ptrc-E. coli rfaH in ptrc99 | Amp | (202)
pIA947 | PlacI-lacIQ1 from pIA249 cloned between EagI and HindIII sites of pACYC184 | Cm | this work
pIA957 | rfaH cloned between NdeI and HindIII sites of pIA947 | Cm | this work
pIA1001 | RfaH T66A in pIA957; recloned from pIA764 | Cm | this work
pIA1002 | RfaH K10A in pIA957; recloned from pIA710 | Cm | this work
pIA1003 | RfaH R73D in pIA957; recloned from pVS62 | Cm | this work
pIA1004 | RfaH H20A in pIA957; recloned from pIA760 | Cm | this work
pIA1005 | RfaH R16A in pIA957; recloned from pGB009 | Cm | this work
pIA1006 | RfaH Y54F in pIA957; recloned from pGB038 | Cm | this work
pIA1007 | RfaH T67A in pIA957; recloned from pIA758 | Cm | this work
pIA1008 | RfaH T72A in pIA957; recloned from pGB013 | Cm | this work
pIA1009 | RfaH F51A in pIA957; recloned from pIA707 | Cm | this work
pIA1010 | RfaH R43A in pIA957; recloned from pAL16 | Cm | this work

Transcription templates
pIA267 | λ PR promoter–A26–opsP–λtr1 Rho-dependent terminator transcription template | Amp | (82)
pIA349 | T7 A1 promoter–G37–opsP–hisP pause transcription template | Amp | (82)
pIA416 | T7 A1 promoter–G37–opsP–Thly terminator transcription template | Amp | (202)

Photorhabdus luminescens luxCDABE reporter vectors (pSB417 (246) derivatives): ColEI ori, ampicillin resistance
pIA874 | a polylinker cloned into pSB417 in place of the Plac promoter, the entire plasmid was sequenced | Amp | this work
pGB063 | araC-PBAD promoter cassette from pBAD30 cloned into pIA874 | Amp | this work
pGB083 | an ops element from the rfbB gene cloned downstream from PBAD in pGB063 | Amp | this work

Table 4. Plasmids and templates


Chapter 4: Functional analysis of Thermus thermophilus transcription factor NusG

Introduction

Transcription elongation factors from the NusG family are ubiquitous from bacteria to humans and play diverse roles in regulation of gene expression. These proteins consist of at least two domains. The N-terminal domains directly bind to the largest subunit of RNA polymerase (β’ in bacteria), whereas the C-terminal domains interact with other cellular components and serve as platforms for the assembly of large nucleoprotein complexes. Escherichia coli NusG and its paralog RfaH modify RNAP into a fast, pause-resistant state, but the detailed molecular mechanism of this modification remains unclear since no high-resolution structural data are available for the E. coli system. NusG was identified in E. coli on the basis of its requirement for phage λ N-dependent gene expression and thus named N utilization substance G (253). Subsequent studies demonstrated that E. coli (Eco) NusG affects Rho-dependent termination (124), transcriptional arrest by HK022 Nun protein (183), RNA chain elongation (125) and translation (254), and is also a key component of the rRNA antitermination complex (255). NusG is essential in wild-type E. coli (256) and is associated with RNAP transcribing most of the E. coli MG1655 genes (137). However, recent studies (113) demonstrate that NusG becomes dispensable when the rac prophage kil gene is deleted and suggest that the essential role of NusG in E. coli is to enhance Rho-dependent termination within the horizontally transferred operons, thereby limiting their expression. In support of this hypothesis, Eco NusG directly interacts with Rho (175,257) to increase Rho-dependent termination at suboptimal sites (126,258).

Given the variety of functions that have been assigned to NusG, it is difficult to infer which of these functions is (are) the most important. The majority of the functional data were collected with Eco NusG (41,124,126,137,175,253,257); Bacillus subtilis (Bsu) (117) and Thermotoga maritima (259) NusGs have also been partially characterized. Despite their high sequence and structural conservation, NusG proteins from different bacteria appear to have somewhat different functions and interaction partners. In contrast to E. coli, NusG is dispensable in B. subtilis (198) and Staphylococcus aureus (260) and does not bind to Rho in T. thermophilus (118). Many species also encode specialized NusG paralogs. In E. coli, RfaH activates expression of LPS and capsule biosynthesis operons, hemolysin, and fertility genes by a post-initiation mechanism that has been compared to antitermination (82). In Bacteroides fragilis, eight RfaH-like operon-specific UpxY antiterminators regulate expression of capsular polysaccharides (130). The diverse range of binding partners/activities suggests that, after diverging from a common ancestor, different NusG-like proteins became adapted to playing specialized regulatory roles.

Functional analysis of this family of proteins has been greatly accelerated by the availability of several structures (118,119,121,175,197,261) solved by X-ray crystallography and NMR (Figs. 13 and 36). The N-terminal domains of the NusG-like proteins are similar, whereas their C-terminal domain structures are quite divergent. The differences range from the presence of additional domains (Fig. 37B) to a complete refolding of the C-terminal domain of RfaH (RfaHC) relative to that of NusG (NusGC). The SH3 β-barrel in NusG is transformed into an α-helical hairpin in RfaH (129), yet the RfaHC sequence can be easily threaded into a NusG-like structure (Fig. 36).


Fig. 36. Structural conservation in the NusG family.

Structural models of the N-terminal domains of E. coli NusG (119) and yeast Spt5 (120) proteins. Secondary structure elements are colored and indicated on the sequence alignment shown at the bottom; the side chain of a highly conserved Trp residue is shown as red spheres. The conserved residues are indicated in red on the alignment based on multiple sequences; only Tth NusG, Eco NusG, Eco RfaH and yeast Spt5 are shown, the numbering corresponds to Tth NusG. Out of 35 residues conserved among bacterial homologs, the vast majority are hydrophobic (210), six are structural (Pro and Gly). One of two positively charged conserved residues is located in the β loop; deletion of this loop (but not multiple substitutions that remove the charge) eliminates Eco NusG effects on elongation and Rho-dependent termination (243).


Fig. 37. The RNAP-binding surface and the domain architecture of the NusG-like proteins.

A. The proposed RNAP-binding surface (197) of Eco NusG lined with conserved hydrophobic residues (including Trp9, spheres) is facing the viewer.

B. Domain architecture of NusG-like proteins in the three domains of life; the experimentally confirmed binding partners are shown. NGN, NusGN; KOW, Kyrpides-Ouzounis-Woese domain; see (119) and references therein; CTR, C-terminal repeats.

This figure was made using PyMol (DeLano Scientific, Palo Alto, CA).


Comparison of RfaH and Eco NusG (175,197) has shed light on the similarities and differences between their molecular mechanisms. Like NusG, RfaH increases the rate of RNA chain elongation (82) and reduces pausing and termination at sites where RNAP is prone to backtracking in vitro; these effects are likely due to the stabilization of a forward-translocated state (210,262) of the TEC.

Unlike NusG, RfaH reduces pausing at hairpin-dependent sites (197), does not bind to Rho (IA, data not shown), and only acts on operons that encode a 12-nt long ops element (82). During recruitment to the TEC, RfaH specifically recognizes the ops bases in the NT DNA strand transiently exposed on the surface of the moving enzyme. This interaction triggers domain dissociation that unmasks the RNAP-binding site located at the domain interface in free RfaH (197). In contrast, the RNAP-binding surface is always exposed in NusG (175), which can be recruited to RNAP at any site on a template. Following their recruitment, NusG (137) and RfaH (129) remain associated with RNAP throughout elongation in vivo.

The RfaH and NusG N-terminal domains (RfaHN and NusGN) are sufficient for their antipausing effects and likely bind to the β’CH of RNAP, as does Methanococcus jannaschii (Mja) NusG (263). In fact, even though the isolated RfaHN recognizes the ops sequence during elongation, it no longer requires ops for recruitment to the TEC. Thus, the ops element serves to restrict the RfaH action to a subset of E. coli operons. The C-terminal domains play protein-specific roles: RfaHC indirectly confers the requirement for the ops element (197) and may bind to the ribosome (129), NusGC interacts with Rho (175), and Spt5C serves as an assembly platform for proteins that promote transcription elongation and modification (121).

We are particularly interested in the molecular mechanism of the antipausing

modification of RNAP by NusG-like proteins. These proteins bind ~75 Å away from the

RNAP active site (175,197,263) and may act directly, by binding to and stabilizing the

upstream DNA fork junction, or allosterically, by altering conformational dynamics of

the trigger loop and the bridge helix, the two key catalytic elements in the β’ subunit

(264). These mechanisms are not exclusive, and high-resolution structural data will be required to elucidate the fine details of NusG/TEC interactions. Sequence and structural conservation of NusG and its target on RNAP, together with a recent analysis of archaeal

NusG (263), suggest that bacterial model systems recapitulate all the aspects of antipausing modification (as opposed to a complex network of regulatory interactions of

Spt5 in eukaryotes; see Discussion). The high-resolution structures of the TEC and the full-length NusG are available only for T. thermophilus, which lacks the better-studied

RfaH. We wanted to ascertain that Tth NusG can be used as a model for the structure/function analysis. Here, we report that, similarly to RfaH, Tth NusG binds to, and stabilizes the forward translocated state of, the TEC and competes with Tth σA during elongation. Thus, the architecture of the NusG-bound TEC appears to be conserved, justifying the choice of Tth NusG as a model for this family of regulators.

Results

Tth NusG Slows Down the Already “Fast” Tth RNAP

We wanted to test if Tth NusG increases the rate of RNA chain extension in vitro.

Although it is the most readily observable in vitro phenotype of Eco NusG (41,230,265),

several considerations suggest that it may not represent the universal effect of NusG-like

factors. First, we argued that RfaH (or the isolated RfaHN) acts by disfavoring

isomerization into a paused state: (i) RfaHN decreases the elongation rate on templates

that do not encode strong pauses and (ii) the “fast” RNAP variants that do not pause

readily are relatively resistant to the RfaH action (210). Second, similarly to Bsu enzyme,

which does not pause at several E. coli pause signals (232), Tth RNAP is missing the β’

SI3 domain; deletion of this domain makes Eco RNAP “fast” (210). If the Tth enzyme were

resistant to falling into a paused state, it would fail to respond to NusG. Third, Bsu

NusG increases pausing at a hairpin-dependent site (117).

It is currently unknown whether regulatory pause signals exist in Thermus. We

first compared recognition of a canonical hairpin-dependent his pause signal by the Eco

and Tth enzymes. We used a template that has a phage λPR promoter, which is readily

recognized by both enzymes (266), followed by a 26-nt long region that does not encode

C residues (Fig. 38A). When transcription is initiated in the absence of CTP, RNAP molecules stall after addition of AMP at position 26. Upon addition of NTPs and heparin

(to prevent re-initiation), a single round of elongation can be monitored. Because transcription initiation is quite sensitive to the reaction temperature, we formed halted radiolabeled A26 TECs at near-physiological conditions, 37°C and 55°C, for Eco and Tth

RNAPs, respectively. We then monitored RNA chain extension upon addition of NTP

substrates and heparin; in this case, different temperatures can be used since TECs retain

their activity over a broader range of conditions.

Fig. 38. Effects of temperature on transcript elongation of Eco and Tth RNAPs.

A. Transcript generated from λPR promoter on a linear pIA226 DNA template; transcription start site (a bent arrow), C-less region, hisP pause signal (pause after U77), and transcript end are indicated.

B. Halted A26 TECs were formed at 50 nM with Tth RNAP (left panel) or Eco RNAP

(right panel) as described in Materials and Methods. Elongation was restarted upon

addition of NTPs and heparin at 37°C or 55°C. Aliquots were withdrawn at times

indicated above each lane (in seconds) and analyzed on 8% denaturing gels. Positions of

the halted (A26), paused (hisP), and run-off transcripts are indicated with arrows. Sizes

of the 32P-labeled DNA markers used as molecular weight standards (M; pBR322 MspI

digest) are indicated on the right.

We found that Eco RNAP paused at the hisP site and elongated the nascent RNA at roughly the same rate at either temperature. Similarly to the T. aquaticus enzyme, Tth

RNAP was “cold-sensitive” (267) and its elongation rate was dramatically slower at

37°C; however, it still did not recognize the hisP signal efficiently (Fig. 38B). Pausing at the hisP site has been proposed to depend in part on interactions between a pause RNA hairpin and the β flap domain of Eco RNAP (112); a failure of Tth RNAP to pause at this site may be due to its inability to establish these contacts.

We next tested the response of the Eco and Tth RNAP to both NusG proteins on a template that encodes a hairpin-independent (class II) pause signal, opsP (Fig. 39). This signal induces backtracking, a process that is relatively independent of specific interactions between RNAP and the nucleic acid chains in the TEC (41), and thus could be recognized by different RNAPs. While Eco RNAP paused at the ops site efficiently at both 37°C and 55°C (left panel), Tth RNAP did not recognize this signal (right panel) even at 37°C; in contrast, Bsu enzyme pauses at ops, albeit weakly (232). Addition of Eco

NusG moderately increased the elongation rate (by 20%, consistent with many published reports; (243) and references therein) of the Eco enzyme, but had no effect on

Tth RNAP. Conversely, Tth NusG reduced (by ~60%) the overall elongation rate of its cognate enzyme while having no effect on Eco RNAP. We could not detect any significant site-specific effects of Tth NusG (e.g., an appearance of a new pause species) on either pIA692 (Fig. 39) or pIA226 template (data not shown).

To directly compare the effects of Tth and Eco NusGs on the elongation rate, we utilized a well-characterized pIA146 template (Fig. 40) that encodes the E. coli rpoB fragment which is devoid of strong pauses. We monitored the overall elongation rate by accumulation of the 1225-nt run-off RNA. The Eco NusG effects on this template have been measured both in bulk (175) and in single-molecule (262) experiments; NusG conferred a moderate (10% to 20%) rate increase under a variety of conditions, including at near-physiological NTP concentrations. However, even under conditions that favor RNAP pausing (low [GTP]), Tth NusG reduced the mean rate ~two-fold (Fig. 40).
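The rate estimate behind this comparison can be illustrated with a short sketch. The procedure actually used is the one cited in the Fig. 40 legend (210) and is not reproduced here; the version below is only an assumed stand-in that takes the time at which the run-off RNA reaches half of its final level and divides the distance transcribed (from the halted A29 complex to the 1225-nt run-off) by that time. The function names and both time courses are hypothetical and were invented for illustration.

```python
def time_to_half_max(times_s, runoff_fraction):
    """Interpolate the time (s) at which run-off RNA reaches half of its final level."""
    half = runoff_fraction[-1] / 2.0
    points = list(zip(times_s, runoff_fraction))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        # Find the first interval that brackets the half-maximal level.
        if f0 <= half <= f1 and f1 > f0:
            return t0 + (half - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("half-maximal level is not bracketed by the time course")


def mean_rate_nt_per_s(times_s, runoff_fraction, start_nt=29, runoff_nt=1225):
    """Assumed estimator: distance transcribed divided by the half-maximal time."""
    return (runoff_nt - start_nt) / time_to_half_max(times_s, runoff_fraction)


# Hypothetical time courses (fraction of RNA in the run-off band), not measured data.
times = [0, 30, 60, 90, 120, 180, 240]
without_nusg = [0.0, 0.0, 0.2, 0.6, 0.8, 0.8, 0.8]
with_nusg = [0.0, 0.0, 0.0, 0.0, 0.2, 0.6, 0.8]

rate_minus = mean_rate_nt_per_s(times, without_nusg)
rate_plus = mean_rate_nt_per_s(times, with_nusg)
print(f"-NusG: {rate_minus:.1f} nt/s, +NusG: {rate_plus:.1f} nt/s")
```

An estimator of this kind reports only an average over the whole template; it cannot separate slower stepping from more frequent pausing.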

Fig. 39. Effects of the NusG proteins on the elongation rate of Eco and Tth RNAPs.

A. Transcript generated from λ PR promoter on a linear pIA692 DNA template;

transcription start site (a bent arrow), C-less region, opsP pause site (+84), rrnB T1

terminator (release site at +247), and transcript end are indicated.

B. Halted A26 TECs were formed at 50 nM with Eco RNAP (left panel) or Tth RNAP

(right panel) as described in Materials and Methods. Elongation was restarted upon

addition of NTPs and heparin in the absence or in the presence of 1 µM NusG variant (as

indicated above each panel); assays were carried out at 37°C (left panels) or 55°C (right

panels). Aliquots were withdrawn at times indicated above each lane and analyzed on

8% denaturing gels. Positions of the halted (A26), paused (opsP), terminated (rrnB T1)

and run-off transcripts are indicated with arrows.

Fig. 40. Effect of the Tth NusG on transcription on a “pause-free” pIA146 template.

Halted radiolabeled A29 TECs were extended with NTPs in the absence or in the presence of Tth NusG, aliquots were withdrawn at indicated times and analyzed on 5% denaturing gels; only the gel region near the run-off transcript is shown. The mean rate was calculated from three independent experiments as described previously (210).

Different NusG Proteins Have Small and Distinct Effects on Intrinsic Termination

Next, we wanted to find out whether NusG proteins also differ in their effect on intrinsic termination. We used a set of seven templates (A through G) in which the λ PR promoter is followed by a terminator (Fig. 41A and C); five of these signals (C through G) induce termination by Eco RNAP. Two additional candidates (A and B) were cloned from the HB8 T. thermophilus genome (NC_006461; KEGG): both have a canonical

terminator structure with a hairpin followed by a U-rich region and are located at the

end of the transcription units. T1672 (template B) is located after the gene encoding

isocitrate lyase (genome coordinates 1409340 - 1409359); T1969 (template A) is found at

the end of the tRNA-Asn gene (1653421 - 1653440).

Single-round termination assays were carried out at 37°C and 55°C for the Eco

and Tth enzymes, respectively (Fig. 41B). The two enzymes behaved very differently

(Fig. 41D). While two strong E. coli terminators, hisT and rrnB T1, were recognized even

better by Tth RNAP, this enzyme terminated much less efficiently at another strong (T7

Te, C) and two weak (T3 Te, D and P14, E) E. coli terminators. Two “test” T. thermophilus

terminators triggered weak termination by Eco RNAP; Tth RNAP essentially bypassed

T1969 (A) and terminated with low (and comparable to the Eco enzyme) efficiency at the

T1672 (B) signal. While a thorough comparison is impossible to make from such a small

data set (and such a comparison was not intended in this work), some preliminary

conclusions may be made. At two signals (F and G) that trigger the high-efficiency

(>55%) termination by Tth RNAP, the four terminal bases in the transcript are Us. By

contrast, at four out of five signals that Tth enzyme bypasses easily (B, C, D and E; less

than 20% termination), the residues at which termination occurs (as measured by the

transcript release from the immobilized TECs; IA, data not shown) are GC, CG, CG, and

CU, respectively. The terminal U residues are thought to favor fraying of the transcript

3'-end from the template DNA during the formation of the elemental pause state, which

is a precursor for termination.

Fig. 41. Transcription termination by Eco and Tth RNAPs.

A. Transcript generated from the λ PR promoter on a linear pIA747 DNA (template A); transcription start site (a bent arrow), C-less region (residues 1-26), Tth T1969 terminator

(release at 227, a red dot) and transcript end (325, a green dot) are indicated.

B. Halted A26 TECs were formed at 50 nM on templates A-G (indicated below) with Eco

or Tth RNAP. Termination was assayed in single-round A26 RNA extension by addition

of all four NTPs (to 200 µM) and heparin (at 10 µg/ml) in the absence or in the presence

of 1 µM NusG. The reactions were incubated for 10 minutes at 55°C (for Tth RNAP, top

panel) or at 37°C (for Eco RNAP, bottom panel), and quenched. Products were analyzed

on 6% denaturing gels. Positions of terminated (red dots) and run-off (green dots) RNAs

are shown on the left. Sizes of the 32P-labeled DNA markers used as molecular weight

standards (M; pBR322 MspI digest) are indicated on the right.

C. Terminators used in this study. Release occurs between the underlined positions (at two sites in case of hisT).

D. Termination efficiency (terminated transcript as a fraction of total RNA) was determined in four independent experiments. Templates are indicated below each set of bars, the key is shown in the figure.
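The definition in panel D can be written out as a small calculation. The sketch below assumes that total RNA is the sum of the terminated and run-off bands in a lane (minor species are ignored) and summarizes replicate experiments as a mean and standard deviation. The function name and the band intensities are hypothetical and are not taken from Fig. 41.

```python
from statistics import mean, stdev


def termination_efficiency(terminated, runoff):
    """Terminated transcript as a fraction of total (terminated + run-off) RNA."""
    total = terminated + runoff
    if total <= 0:
        raise ValueError("lane contains no quantifiable RNA")
    return terminated / total


# Hypothetical band intensities (arbitrary units) from four independent experiments,
# given as (terminated, run-off) pairs.
replicates = [(620, 380), (600, 400), (640, 360), (610, 390)]
efficiencies = [termination_efficiency(t, r) for t, r in replicates]
print(f"TE = {mean(efficiencies):.2f} +/- {stdev(efficiencies):.2f}")
```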

As compared to the Eco enzyme, Tth RNAP is characterized by a higher apparent rate and reduced pausing (Figs. 38 and 39), suggesting that it may be relatively resistant to fraying of the 3'-nucleotide, an effect that would be augmented by the presence of non-U residues at the end of the nascent RNA. The putative Tth T1969 signal appears to be an exception; it induced less than 10% termination by both enzymes despite the presence of a perfect run of eight U residues at the end of the terminated RNA. In this case, one may consider the possibility that the predicted structure does not form during transcription. For example, an alternative stable RNA structure may form upstream of the putative terminator hairpin, thereby precluding folding of the latter; we did not assess this possibility experimentally.

Response to NusG also differed between the two enzymes. While the effects were generally modest, Eco NusG reproducibly reduced termination at strong rrnB T1 and T7

Te sites, and showed small effects at T3 Te and hisT signals. In contrast, Tth NusG did

not reduce termination at any of the terminators by Tth RNAP, and slightly increased

termination at rrnB T1 and Tth T1672 signals. We did not test the NusG effects on

heterologous RNAPs because we did not observe any such effects during elongation

assays (Fig. 39).

We conclude that (i) NusG proteins from E. coli and T. thermophilus have modest

effects on termination by their cognate RNAPs (Fig. 41), and (ii) these effects parallel

those observed during transcript elongation (Fig. 39).

Tth NusG Reduces σA-Induced Pausing by Tth RNAP

Interactions of the E. coli initiation factor σ70 with the NT DNA strand and the

β’CH domain apparently induce RNAP pausing at promoter-proximal and downstream sequences (83,206,207) that bear resemblance to the TATAAT hexamer. Pausing is likely caused by energetically favorable σ/DNA contacts that have to be broken before RNAP

can move forward. In the RfaH/TEC model (197), RfaHN domain simultaneously binds to the NT DNA strand and the β’CH domain, the same targets that σ uses for

recruitment to the TEC. We showed that full-length Eco RfaH and the isolated RfaHN

abrogate σ-dependent pausing of Eco RNAP, presumably through steric competition

(136). Since both sets of interactions required for the σ-induced pausing are conserved

between the Eco and Tth RNAPs, we reasoned that σA, the primary σ factor from T.

thermophilus, should induce pausing at the TATAAT consensus element. Furthermore,

the high degree of conservation of the NusGN-like domains (Figs. 13, 36 and 37) suggests that Tth NusG may also inhibit σ-dependent pausing.

To test these predictions, we constructed a template with the extended –10

(TGcTATAAT) sequence located downstream from the λPR promoter and the C-less region (Fig. 33A). We prepared halted A26 TECs and monitored RNA chain extension upon addition of the NTP substrates. Addition of σA to 500 nM induced Tth RNAP pausing downstream from the -10 hexamer (Fig. 42B). Since this concentration is far below the cellular level of the E. coli σ70 (209), this effect is likely physiologically relevant;

σA concentration is expected to fall in the same range. Both the site and the efficiency of

σA-induced pausing (Fig. 42B) were similar to those observed under identical experimental

conditions for the σ70-induced pausing of Eco RNAP (136). This finding is not surprising: primary σ factors recognize the consensus -10 element, and the determinants that ensure specific promoter recognition (the bases on the NT strand and the β’CH residues) are also accessible within the TEC to mediate σ-induced stalling at promoter-like sites

during elongation (207). When present at 500 nM, Tth NusG reduced the fraction of σA-stalled TECs ~three-fold (Fig. 42B). Given that Tth NusG decreases the overall rate of elongation (Fig. 39), its effect at the σ-dependent site cannot be explained by an Eco

NusG-like antipausing activity and is most likely due to the steric competition.

Interestingly, in contrast to RfaHN (197), Tth NusGN was less effective in reducing σA-induced pausing (data not shown), suggesting that it has a lower affinity for the TEC than the full-length protein; the same effect was reported for Eco NusGN (175).

Fig. 42. Tth NusG inhibits σA-induced pausing by Tth RNAP.

A. The linear DNA template is shown on top with the transcription start site (a bent arrow), the extended –10 motif, and the end indicated.

B. Single-round pause assays were performed in the absence or in the presence of

σA and Tth NusG (at 500 nM each), where indicated. A representative 6% denaturing gel

is shown. The position of the σP pause site was mapped in the presence of chain-terminating

NTPs (data not shown). The fraction of RNA at the σP site after a 240-sec incubation (as

% of total RNA) is presented below each panel.

Tth NusG Binds to the NT DNA Strand in the TEC

In the RfaH/TEC model (210), RfaHN interacts with the upstream fork-junction

where the NT DNA bends sharply (~90°) to reanneal with template strand. A highly

similar NusGN is expected to bind to the same target. To examine whether the fork-junction accessibility is affected by Tth NusGN, we used footprinting by KMnO4, which reacts with unstacked or unpaired thymines (Fig. 43). We chose to use the isolated

NusGN domain in place of the full-length protein because (i) we expected that the

presence of a flexibly tethered NusGC (118) could complicate structural studies and (ii)

the N-terminal domains of E. coli RfaH and NusG are sufficient for their effects on

elongation (175,197). The structure of Tth TEC (221) suggests that in halted A26

complexes the fork junction lies between +14 and +15 residues, and that the NT strand T

residues at +15, +16, +18 and +20 are single-stranded (Fig. 43A, TEC). For comparison, in open promoter complexes (Fig. 43A, RPo) the NT strand T residues at positions -10, -7, -4, and -3 are expected to be single-stranded and +2 unstacked, assuming that σA and σ70 make similar contacts to the -10 element. However, σ70 contacts with the -10 and -7 bases

block KMnO4 access to the plane of these bases, which thereby appear resistant to

modification (268). Indeed, the expected patterns were observed in both RPo and TEC formed by Tth RNAP (Fig. 43B).

We found that addition of Tth NusGN to the halted TEC led to a partial protection of the upstream part of the bubble (see Materials and Methods). In the presence of Tth

NusGN, the permanganate reactivity at positions +15 and +16 was reduced by ~40-60%

and 30-50%, respectively. By contrast, accessibility of the T residues at +18 (<15% change)

and +20 (no detectable change) was not affected. Bsu NusG has also been shown to alter

KMnO4 reactivity of the NT strand in vivo and in vitro (117).
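The protection values above are relative changes in band intensity. One way to compute them, offered only as an assumed sketch because the quantification procedure is described in Materials and Methods rather than in this section, is to normalize each T band to a reference signal in the same lane (for example, the full-length uncut DNA) and then compare lanes with and without NusGN. The function name, the choice of reference, and all intensities below are hypothetical.

```python
def protection(band_minus, ref_minus, band_plus, ref_plus):
    """Fractional loss of normalized KMnO4 reactivity upon NusGN addition."""
    if min(band_minus, ref_minus, ref_plus) <= 0:
        raise ValueError("intensities used as denominators must be positive")
    normalized_minus = band_minus / ref_minus  # reactivity without NusGN
    normalized_plus = band_plus / ref_plus     # reactivity with NusGN
    return 1.0 - normalized_plus / normalized_minus


# Hypothetical intensities (arbitrary units): {position: (without NusGN, with NusGN)}.
bands = {15: (100.0, 50.0), 16: (80.0, 48.0), 18: (60.0, 54.0), 20: (50.0, 50.0)}
ref_minus, ref_plus = 1000.0, 1000.0  # hypothetical full-length DNA signal per lane

for position, (minus, plus) in bands.items():
    change = protection(minus, ref_minus, plus, ref_plus)
    print(f"T+{position}: {change:.0%} protection")
```

Normalizing within each lane guards against loading differences, but it cannot distinguish direct shielding of a base from an indirect change in DNA conformation.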

Fig. 43. Tth NusGN binds to the Tth TEC.

A. Transcription complexes were formed on the λ PR promoter template; the top (NT) strand is labeled. The start site (+1) is indicated in blue. The NT strand T residues known

(268) or expected to be modified by KMnO4 are shown in green. In RPo, the T residues at -10 and -7 positions are expected to be protected by the bound σ. In Eco (210) and Tth (IA, data not shown) TEC halted after addition of A at position +26, the RNA is in a pre-translocated state and T28 would be expected to remain inaccessible.

B. Analysis of KMnO4 modification patterns of RPo and TECs formed with and without Tth NusGN (see Materials and Methods). Positions of hypersensitive (in one of the complexes) T residues and the full-length, uncut DNA are indicated with arrows.

C. Trace analysis of the data shown in B. The low-level (but above the background) modification of Ts at -4, -3, +2, +4 and +7 positions in elongation complexes is likely due to the persistence of open complexes and abortive complexes.

Tth NusG Stabilizes the Post-Translocated State of the TEC

RfaH binds to a similar position on the nucleic acids within the TEC and favors forward translocation by the Eco RNAP (210). We argued that this activity may explain an observed decrease in pausing conferred by RfaH. A recent single-molecule study

(262) has demonstrated that Eco NusG increases Eco RNAP velocity and decreases the entry into backtracked paused states. Both effects can be readily explained by an effect on translocation; indeed, the same result was achieved by applying assisting force. To test whether this is a general feature of the NusG-like proteins, we examined the effect of

Tth NusG on two types of RNA cleavage reactions. Pyrophosphorolysis, a reversal of the nucleotide addition reaction, and intrinsic RNA hydrolysis occur in pre-translocated

TECs; sensitivity of the nascent RNA to these reactions can be used to infer the position of the nascent RNA in the active site (Fig. 44).

The A26 TECs formed by either Eco (210) or Tth RNAP (101) are relatively sensitive to PPi and GreB, indicating that these complexes are in pre-translocated and backtracked states. We formed halted radiolabeled A26 complexes, removed NTPs by gel filtration, and then incubated these complexes for 1-20 min at 55°C; PPi (200 μM) and

NusG (200 nM) were added where indicated (Fig. 44). As expected, A26 complexes

were sensitive to the PPi-induced cleavage, with a half-life of ~1.3 min at 55°C. Tth, but

not Eco (210), RNAP also displayed a high level of intrinsic hydrolytic activity (the left

panel). Addition of NusG increased the half-life of A26 approximately two-fold in both

cases, from 3.7 to 7.6 min in the absence of PPi and from 1.3 to 2.8 min in the presence of

PPi. The two reactions differ in their mechanisms, phosphoryl transfer and water-mediated, transcript-assisted hydrolysis (30), but are similarly affected by Tth NusG.

These results suggest that, similarly to Eco NusG and RfaH, Tth NusG favors forward translocation of RNAP along the DNA.
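The half-lives reported above translate into rate constants if cleavage of the A26 RNA is treated as a single first-order decay, which is an assumption made here for illustration rather than a statement about how the values were fitted. Under that assumption k = ln 2 / t(1/2), and the NusG effect is simply the ratio of the half-lives. The function names are hypothetical; the four half-lives are the ones quoted in the text.

```python
import math


def rate_constant(half_life_min):
    """First-order rate constant (per min) from a half-life: k = ln(2) / t_half."""
    return math.log(2) / half_life_min


def fraction_remaining(half_life_min, t_min):
    """Fraction of intact A26 RNA left after t_min, assuming first-order decay."""
    return math.exp(-rate_constant(half_life_min) * t_min)


# Half-lives (min) quoted in the text, as (without NusG, with NusG).
half_lives = {
    "hydrolysis (no PPi)": (3.7, 7.6),
    "pyrophosphorolysis (+PPi)": (1.3, 2.8),
}

for reaction, (minus, plus) in half_lives.items():
    fold = plus / minus
    print(
        f"{reaction}: k {rate_constant(minus):.2f} -> {rate_constant(plus):.2f} per min, "
        f"half-life up {fold:.1f}-fold"
    )
```

Because both ratios come out close to two, the sketch makes explicit why the two mechanistically different reactions are described as similarly affected.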

Fig. 44. Tth NusG favors forward translocation.

(Top) TECs may interconvert between states in which the 3'-end of the nascent RNA occupies different positions in the RNAP active site. In the post-translocated configuration, the 3'-OH is in the i site, the i+1 site is poised to bind the incoming NTP, and the complex is resistant to cleavage. In the pre-translocated state, the 3'-end occupies the i+1 site and the nascent RNA is sensitive to PPi- and H2O-mediated cleavage.

(Bottom) Tth NusG effect on RNA cleavage. Halted Tth A26 complexes were incubated

at 55°C for the times shown; NusG (200 nM) and PPi (200 µM) were present where

indicated. The reactions were analyzed on a 12% denaturing gel.

Discussion

In this work we demonstrate that T. thermophilus elongation factor NusG (i) apparently binds to the upstream fork junction of Tth TEC; (ii) stimulates forward

translocation of RNAP; and (iii) competes with the initiation factor σ during elongation;

these phenotypes and the binding site on RNAP are shared by E. coli RfaH and NusG.

On the other hand, in contrast to RfaH and NusG which both decrease RNAP pausing,

Tth NusG does not facilitate transcript elongation by its cognate RNAP. Thus, it appears

that the well-documented acceleration of Eco RNAP by NusG may not be an essential

activity of this universally conserved transcription factor. Together with recent reports

from other groups, our data suggest that even though all NusG-like proteins likely bind

to the same site on TEC (Fig. 45), their regulatory outcomes may depend on the intrinsic

properties of the affected RNAP and the identity of their interaction partners.

NusG Interactions with the TEC

In E. coli RfaH (197) and NusG (175), Mja NusG (263), and human DSIF (269), the

binding to RNAP is likely mediated by interactions between the CH domain and a

hydrophobic patch on the N-terminal domain. Thus, the mode of recruitment to RNAP

is likely common within the NusG superfamily. However, the details of interactions

with nucleic acids may vary significantly among different proteins. Eco RfaH requires

an ops DNA sequence in the NT DNA strand exposed on the RNAP surface during

recruitment and directly crosslinks to the NT strand in the TEC (82). RfaH likely

maintains nonspecific interactions with the DNA throughout elongation; RfaH

recognizes an ops site positioned far downstream from its site of recruitment. In contrast,

Eco NusG does not crosslink to DNA (or RNA); consistently, the region that mediates

RfaH binding to ops is the least conserved between the two proteins (129). Two NusG orthologs, from Aquifex aeolicus and Thermotoga maritima, bind to nucleic acids nonspecifically (119,259); this property could be attributed to large positively charged domains inserted into a flexible β loop in the NusGN domain (Fig. 36).

Fig. 45. A model for Tth NusG interactions.

In the TEC, core RNAP (grey) is bound to the T (red) and NT (blue) DNA strands that are separated in front of the active site (white circles, shown with two catalytic Mg ions and the substrate NTP) to form a transcription bubble. The single-stranded T strand is paired with the nascent RNA (yellow) in an 8-9 bp RNA:DNA hybrid, whereas the NT strand is exposed on the RNAP surface. NusGN (green) interacts with the hydrophobic tip of the β’ CH (dark grey cylinder) near the upstream fork junction. NusGC (cyan) is

connected to NusGN via a flexible linker and may interact with Rho (as shown),

upstream DNA or other regulators. In NusG-bound TEC, the primary binding site for σ

(magenta) composed of the NT strand nucleotides and the β’CH is blocked.

Finally, the action of Bsu NusG is sensitive to sequence alterations (270), and Eco NusG exhibits differential effects at different pause sites (262). It is, however, unclear whether these effects are mediated by base-specific contacts between NusG and nucleic acids or are conferred by changes in the TEC induced by altered RNA-DNA interactions.

The observed protection of the residues at the fork-junction by Tth NusG (Fig.

43B) could be interpreted in several ways. First, NusG could make contacts to the DNA

bases that would directly shield Ts from KMnO4 attack. Second, NusG could induce changes in stacking interactions or backbone distortion that indirectly result in altered sensitivity to KMnO4. Third, as suggested earlier (41,258) and recently supported by the single-molecule data for Eco NusG (262), Tth NusG could favor forward translocation of

RNAP. In the latter case, one base pair would reanneal at the upstream part of the bubble, leading to protection of T+15 against KMnO4, and one base pair would become separated ahead of the active site. The +2 base (T28) is predicted to be unstacked but may or may not become sensitive to modification by KMnO4; in different transcription complexes, the +2 position may appear accessible or protected.

The observed pattern is inconsistent with base-specific contacts between NusG and the NT DNA; such contacts would be expected to confer strong protection (e.g., of the -10 and -7 T residues by the bound σ in RPo; Fig. 43B). Competition with σ (Fig. 42) suggests that Tth NusG may sterically hinder KMnO4 access to the bases; however,

additional effects on the DNA structure cannot be excluded. This pattern of protection,

together with the reduction in PPi cleavage rate (Fig. 44), is consistent with forward

translocation induced by NusG bound to the upstream fork junction.

The NusG/NT DNA interactions may (i) serve as specific signals for a regulator

(e.g., RfaH) recruitment; (ii) insulate the TEC from factors (e.g., σ) that target the NT

DNA; (iii) constrain the path/conformation of the NT strand on the RNAP surface; and

(iv) help to stabilize the fork junction structure; A. aeolicus NusG was proposed to induce

partial melting of duplex DNA (119). However, at least in some cases the NT contacts

appear to be dispensable; for example, archaeal NusG still functions in the absence of the

NT DNA (263).

NusG Role in Transcriptional Pausing

We propose that the NusG-like proteins play a dual role in regulation of transcriptional pausing: they insulate any elongating RNAP from re-binding of σ (and other factors that target the NT DNA or the β’CH) and prevent isomerization of some, pause-prone, RNAP molecules into the paused state.

The ability of Eco NusG to increase the elongation rate was noted early and proposed to be an important part of its mechanism. Similarly, RfaH is thought to act by decreasing RNAP pausing. Pausing is triggered by signals that induce formation of an elemental pause state, in which the nascent RNA 3'-end is misaligned (40,142); isomerization into the paused state is thought to occur from a pre-translocated state (41).

RfaH fails to affect transcription on pause-free templates and with pause-resistant

RNAPs (232). Both RfaH and NusG appear to favor the forward-translocated state of the

TEC, thereby inhibiting isomerization into the paused state (210,258).

Recent data suggest, however, that the antipausing activity is not shared by all bacterial NusGs: Tth (Fig. 38) and Bsu (117) NusGs increase rather than decrease pausing. These differences may reflect the properties of the RNAP species: while Eco

RNAP pauses frequently, at least during transcription in vitro, Bsu (232) and Tth enzyme

(Figs. 38 and 39) do not recognize pause signals that hinder Eco RNAP, even when

moving at approximately the same rate. Pyrophosphorolysis in static TECs (e.g., A26)

reveals that the nascent RNA in Tth TECs is significantly more resistant to cleavage as

compared to the Eco RNAP (the half-life of 13 sec at 25 µM PPi (210)), suggesting that

Tth RNAP may be stabilized in the post-translocated state. If this were true, Tth RNAP would be expected to ignore pauses and transcribe at a faster rate, as seen in bulk elongation assays (Figs. 38 and 39). These observations suggest that translocation may

not be rate limiting for Tth RNAP; in this case, no “stimulatory” effect of NusG on

elongation would be expected.

RNAPs that are not accelerated by NusG (or RfaH) share one common property: they transcribe at an overall faster rate in the absence of accessory factors. This could be due to sequence-specific differences between the catalytic elements which confer many enzyme-specific properties (267). While it is difficult to point out the single underlying reason, our attempts to convert Eco RNAP into a Bsu-like enzyme through removal of large E. coli-specific insertions demonstrated that the β’ SI3 domain is responsible for many differences observed between Eco and Bsu enzymes in vitro (271). Eco ΔSI3 enzyme forms unstable promoter complexes, transcribes at a faster rate, pauses and terminates less efficiently, and is slowed down by RfaHN (210). Similarly, Bsu (117) and Tth (Fig. 38)

NusGs reduce the elongation rate of their cognate RNAPs, which are already inherently fast. It is currently unclear how Bsu and Tth NusGs slow RNAP down; single-molecule analysis would be required to distinguish between the effects on pausing

(isomerization into, or escape from, the off-pathway states) and pause-free elongation between the short-lived pauses. The available data appear to suggest that one of the roles of NusG could be to maintain a certain rate of RNA synthesis rather than to speed the RNAP up. In bacteria, this requirement may be imposed by the need to couple transcription to translation (272).

What is the Main Role of NusG in the Cell?

NusG-like proteins are present in all three kingdoms of life and have been implicated in various essential cellular processes. Eco NusG reduces RNAP pausing and intrinsic termination (125), recruits Rho to the TEC (126,257), and participates in formation of multi-protein complexes (175) that mediate antitermination modification of

RNAP transcribing rrn and phage λ genes and assist termination by HK022 Nun protein.

Recent reports (135,153) suggest that Eco NusG may bind to the ribosome; we have proposed a similar role for RfaH (129). Due to this functional diversity, it is difficult to point out which of these activities is responsible for the ubiquity of NusG. In E. coli, transcriptional repression of foreign DNA through direct interactions with Rho has been proposed to constitute the essential role of NusG (113). However, both NusG and Rho

are dispensable in B. subtilis (198) and Tth NusG does not interact with Rho (118).

Binding to RNAP and modulation of its rate are mediated by NusGN whereas

NusGC domain is thought to establish contacts with other partners (Fig. 45). We

hypothesize that the primary role of NusGN may be to tether NusGC to the TEC, whereas

NusGC is essential for the assembly and function of the regulatory complexes that include one (Rho) or many (other Nus factors and nut RNA) components.

We also propose that the roles of these domains are conserved in all kingdoms.

In Archaea and eukaryotes, NusG homolog Spt5 forms a heterodimer with Spt4; the

Spt4/5 complex (called DSIF in humans) enhances RNAP II processivity (273). The N-terminal domain of Mja NusG likely binds to the same site on RNAP (263). The C-terminal domains contain several KOW motifs (as compared to just one in NusG) and repeats, which, upon phosphorylation by cyclin-dependent kinases (P-TEFb in humans and Bur in yeast), appear to nucleate assembly of large protein complexes with diverse regulatory functions. C-terminal repeats promote RNAP II elongation, recruitment of the

PAF complex, histone H2B K123 monoubiquitination and histone H3 K4 and K36 trimethylation, and suppress Rad26-independent transcription-coupled nucleotide excision repair (121,273), and may couple transcription by RNAP I to rRNA processing and ribosome assembly (274).

Contributions: Anastasia Sevostiyanova performed σ-competition assay, cleavage assay,

elongation rate assay, footprinting analysis, vector construction, purification of Tth

NusG NTD. Irina Artsimovitch performed pause assays and termination efficiency

assays and vector construction.

Materials and Methods

Plasmids and Reagents

pTYB12 was obtained from NEB (Ipswich, MA). All general reagents were obtained from Sigma Aldrich (St. Louis, MO) and Fisher (Pittsburgh, PA); NTPs, [γ32P]-

ATP and [α32P]-GTP, from GE Healthcare (Piscataway, NJ) and Perkin Elmer (Boston,

MA); PCR reagents, restriction and modification enzymes, from NEB, Roche

(Indianapolis, IN) and Epicentre (Madison, WI). Chitin beads and Ni-sepharose were

from NEB and GE Healthcare, respectively. Oligonucleotides were obtained from

Integrated DNA Technologies (Coralville, IA) and Sigma Aldrich. DNA purification kits were from Qiagen (Valencia, CA) and Promega (Madison, WI).

Protein Expression and Purification

XJb (λDE3) strain transformed with pVS58 (Tth NusG ORF cloned between NdeI and NotI sites of pTYB12) was inoculated into LB (Miller) + 0.1 mg/L carbenicillin + auto-induction reagents as described by W.F. Studier (32). The culture was grown with agitation at 32 °C until stationary phase (~20 h; OD600~5). Arabinose was added to 0.06% after 12-16 h to induce the expression of endolysin. Cells were collected by centrifugation and frozen at -80 °C. Pellet was resuspended in IMPACT-CN500 buffer

(50 mM Tris-HCl, pH 8.8, 500 mM NaCl, 1 mM EDTA) + 1X Complete® EDTA-free

Protease Inhibitors Cocktail (Roche) + 0.1% Tween-20. Cells were lysed by ultrasonication, followed by centrifugation (2X 30 min at 29500 g, 4 °C), and the cleared lysate was loaded on chitin beads equilibrated with IMPACT-CN500. The column was washed with 10 volumes of IMPACT-CN500, 3 volumes of IMPACT-CN500+50 mM

DTT, and incubated at 22 °C for 18 h. The cleaved-off protein was eluted with IMPACT-CN500, polished by gel-filtration over a Sephacryl S-200HR column (GE Healthcare),

dialysed against storage buffer (50% glycerol, 100 mM NaCl, 10 mM Tris-HCl pH 7.9, 0.1

mM EDTA, 0.1 mM DTT) and stored at -20°C.

pIA885 containing residues 1-117 of Tth NusG (NusGN) fused to a His6 tag and a TEV

recognition site under the T7 promoter was transformed into XJb (λDE3) strain. An overnight culture was diluted 1/100 into fresh LB (Miller) medium and grown at 37°C.

IPTG was added to 1mM at OD600 ~0.4, cells were grown for 3.5 h at 30 °C, and collected

by centrifugation. The pellet was resuspended in Lysis buffer (500 mM NaCl, 50 mM

Tris-HCl pH 6.9, 5% glycerol, 0.1 mM EDTA, 1 mM βME, with Complete® cocktail) and

disrupted by ultrasonication. The extract was cleared by centrifugation and subjected to

heat shock for 20 min at 70 °C. Precipitate was removed by centrifugation; supernatant

was filtered and loaded onto a Ni-sepharose (GE Healthcare) gravity column pre-

equilibrated with Lysis buffer. Column was washed with 10 volumes of Lysis buffer, 10

volumes of HepA buffer (50 mM Tris-HCl pH 6.9, 5% glycerol, 1 mM βME) and 10

volumes of HepA+20 mM imidazole. Protein was eluted with HepA+100mM imidazole

and loaded onto a HiTrap Heparin HP column (GE Healthcare). Bound proteins were

eluted with a NaCl gradient; NusGN eluted as a single peak at 12 mSi (~220 mM NaCl).

NusGN fractions were concentrated on Amicon filtration device MWCO 5 kDa

(Millipore, Billerica, MA) and NaCl concentration was adjusted to 500 mM. His6 tagged

TEV protease (100 µg) was incubated with the protein sample (~8 mg) at 4°C for 48 h.

The cleaved-off His6 tag, the uncut His6-NusGN, and (His-tagged) TEV were removed by

absorption to Ni-sepharose. NusGN was dialyzed into storage buffer (as above) and

stored at -20°C.

Transcription Elongation Assays

Linear DNA template generated by PCR amplification (30 nM), holo RNAP (40 nM), ApU (100 µM), and starting NTP subsets (1 µM GTP, 5 µM ATP and UTP, 10 µCi

[α32P]-GTP, 3000 Ci/mmol) were mixed in 100 µl of TGA10 (20 mM Tris-acetate, 20 mM

Na-acetate, 10 mM Mg-acetate, 5% glycerol, 1 mM DTT, 0.1 mM EDTA, pH 7.9).

Reactions were incubated for 10 min at 37 °C or 55 °C for Eco and Tth RNAP,

respectively; thus halted TECs were stored on ice. Transcription was restarted by

addition of nucleotides (10 µM GTP, 150 µM ATP, CTP, and UTP) and heparin to 10

µg/ml at either 37 °C or 55 °C. Samples were removed at desired time points (as

indicated in the figures) and quenched by addition of an equal volume of STOP buffer

(10 M urea, 20 mM EDTA, 45 mM Tris-borate; pH 8.3).
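The labeling conditions above imply a small molar contribution of radiolabeled GTP relative to the unlabeled GTP. The following is a minimal sketch of that arithmetic, assuming the nominal specific activity (no correction for radioactive decay) and assuming that the labeled GTP is added on top of the 1 µM unlabeled GTP; the function and variable names are illustrative only.

```python
def labeled_ntp_nM(activity_uCi, specific_activity_Ci_per_mmol, volume_ul):
    """Concentration (nM) contributed by a radiolabeled NTP at its nominal specific activity."""
    mmol = (activity_uCi * 1e-6) / specific_activity_Ci_per_mmol  # Ci / (Ci/mmol) = mmol
    mol = mmol * 1e-3
    liters = volume_ul * 1e-6
    return mol / liters * 1e9  # M -> nM

# Values from the halted-complex reaction: 10 µCi at 3000 Ci/mmol in 100 µl.
hot_gtp_nM = labeled_ntp_nM(10, 3000, 100)

# Assumption: the 1 µM GTP listed in the text is unlabeled and is added separately.
cold_gtp_nM = 1000.0
labeled_fraction = hot_gtp_nM / (hot_gtp_nM + cold_gtp_nM)

print(f"labeled GTP: {hot_gtp_nM:.1f} nM; fraction of total GTP: {labeled_fraction:.3f}")
```

Under these assumptions the labeled nucleotide contributes roughly 33 nM, about 3% of the total GTP in the reaction.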

Pause-free elongation assays

Halted radiolabeled A29 TECs (40 nM) were formed on a linear template PCR-amplified from the pIA146 DNA template containing the pause-less rpoB gene in 30 µl of

TGA2 buffer with ATP and CTP at 2.5 µM, GTP at 1 µM, ApU at 150 µM, and 20 µCi of

[α-32P]GTP (3000 Ci/mmol; Perkin Elmer) for 10 min at 55°C, then diluted eight-fold and split into six aliquots. Samples were incubated with NusG (200 nM) or storage buffer for 1 min, and transcription was restarted by addition of NTPs (150 µM ATP, CTP, UTP and

10 µM GTP). Aliquots were withdrawn at indicated times, quenched, and analyzed on

5% denaturing gels.

Sigma Competition Assay

The assay was performed on a linear pAS33 DNA template (50 nM) containing λPR promoter followed by a C-less region, σ-dependent pause (σP) and the hisT terminator.

Halted TECs were prepared in 50 µl of TGA2 buffer (20 mM Tris-acetate, 20 mM Na-acetate, 2 mM Mg-acetate, 5% glycerol, 1 mM DTT, 0.1 mM EDTA, pH 7.9) with holo Tth

RNAP (50 nM), ApU (100 µM), and starting NTPs (1 µM GTP, 5 µM ATP and UTP, 10

µCi [α32P]GTP, 3000 Ci/mmol) at 55°C for 10 min. Tth NusG and/or σA was added to 0.5

µM followed by a 1-min incubation at 55°C. Transcription was restarted by the addition of all four NTPs to 40 µM and rifapentin to 25 µg/ml. Samples were removed at 7, 15, 30,

45, 60, 120 and 240 sec and quenched by the addition of an equal volume of STOP buffer.

KMnO4 Footprinting

Linear 153-bp DNA fragment containing λ PR promoter was made by PCR

amplification using pIA226 as a template with primers 17 (5'-

CGTTAAATCTATCACCGCAAGGG) and 138 (5'-ATCGCCTGAAAGACTAGTCAGG).

The NT DNA strand primer (#17) was end-labeled with [32P]-γATP (Perkin Elmer) and

PNK (Epicentre, Madison, WI) and purified using G-50 spin columns (GE Healthcare).

PCR products were gel-purified using a Wizard® SV kit (Promega). Sequencing reactions

were performed using the same labeled primer with SequiTherm kit (Epicentre). Open

complexes were formed with holo Tth RNAP (200 nM) pre-incubated with the labeled

promoter fragment (100 nM) and ApU (100 µM) for 10 min at 55 °C in GBB buffer (20 mM Tris-HCl, 20 mM NaCl, 14 mM MgCl2, 5% glycerol, and 0.1 mM EDTA; pH 7.9) in the presence of 0.03% DMSO. To form halted complexes, the reaction was supplemented with 1 µM GTP, 5 µM ATP and 5 µM UTP. NusGN was added to 2 µM where indicated.

Samples were shifted to room temperature and treated with 10 mM KMnO4 for 60 sec.

After addition of an equal volume of quench mix (1.5 M NaAc, pH 5.2, 80 mM EDTA, 6

M β-mercaptoethanol), samples were subjected to phenol-chloroform extraction and precipitated with ethanol. Pellets were dissolved in 20 µl of water and incubated with 100 µl of 0.5 M piperidine at 95 °C for 20 min. Following ethanol precipitation, DNA was

dissolved in 96% formamide.

The changes in reactivity of the accessible (single-stranded or unstacked) T

residues in the NT DNA between positions -10 to +20 relative to the transcription start

site were evaluated by ImageQuant software. As the shape of the peaks did not vary

dramatically, we used their heights to evaluate the relative KMnO4 reactivity at each

position. If one assumes that the area below a peak can be roughly approximated by the

area of a triangle, two possibilities can be considered: these triangles are (i) isosceles and similar or (ii) isosceles with equal bases. A change in the height of the two triangles by a factor of n would correspond to a change in their areas by a factor of n² or n, respectively.

We used the range from n to n² to estimate the effect of Tth NusGN on the accessibility of

each residue.
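The triangle approximation described above can be sketched as a short calculation. This is a minimal illustration, not the analysis code used for the figures; the function name and the example peak heights are hypothetical placeholders, and the only input taken from the text is the rule that a height ratio n bounds the area ratio between n (equal bases) and n² (similar triangles).

```python
def accessibility_change_range(height_with_factor, height_without_factor):
    """Return (n, low, high): the peak-height ratio n and the bounds on the
    corresponding area ratio under the two triangle models.

    Equal-base isosceles triangles: area scales as n.
    Similar isosceles triangles:    area scales as n squared.
    """
    n = height_with_factor / height_without_factor
    low, high = sorted((n, n * n))  # for n < 1 the squared value is the lower bound
    return n, low, high

# Hypothetical peak heights (arbitrary units), for illustration only.
n_up, low_up, high_up = accessibility_change_range(150.0, 100.0)
n_down, low_down, high_down = accessibility_change_range(50.0, 100.0)

print(f"increase: n = {n_up}, area ratio between {low_up} and {high_up}")
print(f"decrease: n = {n_down}, area ratio between {low_down} and {high_down}")
```

Sorting the two bounds keeps the reported range ordered for both increases (n > 1) and decreases (n < 1) in reactivity.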

Transcript cleavage

Linear pIA226 DNA template, holo Tth RNAP (200 nM), ApU (100 μM) and starting NTPs (1 μM GTP, 5 μM ATP and UTP, 10 μCi [α32P]-GTP, 3000 Ci/mmol) were mixed in 30 μl of buffer TGA2 and incubated for 10 min at 55 °C. Halted A26 complex

was purified by gel filtration through G-50 spin columns equilibrated in TGA2, diluted four-fold and stored on ice. Reactions were initiated by shifting samples to 55 °C. PPi

(1/10 volume of 2 mM stock) and Tth NusG were added where indicated. Samples were

quenched with the STOP buffer at the selected times.

Sample Analysis

Samples were heated for 2-3 min at 95 °C and separated by electrophoresis in

denaturing 6-10% acrylamide (19:1) gels (7 M Urea, 0.5X TBE). The gels were dried and

RNA products were visualized and quantified using a Molecular Dynamics Storm 820

Phosphorimaging System, ImageQuant Software, and Microsoft Excel.

Chapter 5: The β subunit Gate loop mediates antitermination modification of RNA polymerase.

Introduction

Uninterrupted synthesis of complete, up to a million nucleotides long, RNA chains by multisubunit RNA polymerases (RNAPs) requires accessory proteins that help

RNAP bypass numerous roadblocks it encounters along the way. These

“antitermination” factors are found in all organisms from bacteriophages to humans and share the ability to switch the elongating RNAP into a highly processive state. Their molecular mechanisms, and in most cases even their binding sites on the transcription complex, remain unknown. Proteins from the NusG family have been implicated in

transcription regulation, RNA processing, and RNA silencing (175,250,274-276). NusG

is a general transcription factor that is essential in wild-type E. coli and associates with

RNAP transcribing most genes (137). NusG increases the rate of RNA synthesis in vivo

and in vitro (125), directly interacts with Rho (175) to limit expression

of horizontally acquired operons (113), and is a component of multi-protein rRNA

antitermination complexes (255). RfaH, a sequence-specific paralog of NusG,

preferentially increases expression of the distal genes in operons that have ops DNA

elements. The ops sequence mediates RfaH binding to the elongating RNAP (82,197) and

thus restricts the RfaH action to just a few E. coli operons (129); consistently, rfaH is

dispensable for cell viability. In vitro, RfaH reduces RNAP pausing, and in turn

termination (82,210).

We argued that RfaH and other antiterminators, such as the phage λ N and Q

proteins (277), maintain the TEC in a rapidly-moving mode by preventing its isomerization into an off-pathway state (210) called the elemental pause (39). From this state, the TEC is thought to isomerize into a stabilized paused state (upon backtracking or formation of a nascent RNA hairpin) or a termination complex (41). Changes that underlie the formation of this state occur near the RNAP active site and likely involve fraying of the RNA 3' end, which occurs after nucleotide addition but before RNAP translocation (40,142). A regulatory factor could act by promoting RNAP translocation or by preventing conformational changes that allow fraying of the RNA 3’-OH end, but the molecular mechanism of the antipausing (AP) modification remains unknown for any factor.

RfaH is an excellent model to address this key question because RfaH structure, the mechanism of recruitment, the DNA- and RNAP-binding regions, and its binding site on RNAP have been determined (197). RfaH binds to the β’CH and the NT DNA strand at the upstream end of the transcription bubble (Fig. 46). In the simplest scenario,

RfaH may favor DNA strand reannealing, and thus propel RNAP forward (210). RfaH may also trigger an allosteric signal that will be transmitted over 75Å to the RNAP active site. In the first case, RfaH binding to the TEC should be sufficient for its action. In contrast, the second mechanism predicts that other elements in RNAP and RfaH should be necessary for the AP modification.

Fig. 46. A model of RfaHN bound to the TEC.

The T. thermophilus RNAP (5) is shown as tubes with the bridge helix (β’ BH) highlighted in cyan, T DNA – in black, NT DNA – in yellow, RNA – in red. Position of the RNAP active site is marked by the catalytic Mg2+ ion (a yellow sphere). RfaHN (blue) is bound to the NT strand and to β’CH (green); Tyr54 (forest green) interacts with β’CH. The HTT cluster is positioned next to a mobile βGL (magenta). In the alignment of the GL sequences, the highlighted residues differ from those in the E. coli β. The abbreviations are: Eco, E. coli; Tth, Thermus thermophilus; Hpy, Helicobacter pylori; Bsu, Bacillus subtilis;

Mtu, Mycobacterium tuberculosis; Mge, Mycoplasma genitalium.

The model was prepared with PyMOL (DeLano Scientific LLC).

Results

RfaH Binding to β’CH is Insufficient for AP

The first model can be refuted by identification of RfaH and RNAP variants that maintain their interaction but fail to support RfaH function. Mutational analysis of the

N-terminal RfaH domain, RfaHN, which is sufficient for RNAP modification, revealed three functional regions: (i) a cluster of residues that likely interact with the NT DNA; (ii) a hydrophobic surface that makes van der Waals contacts with the β’CH tip, and (iii) a cluster of three residues (HTT, 65-67) whose substitutions do not compromise RfaH contacts with DNA, yet reduce its AP activity (264).

In vivo and in vitro data suggest that after the initial recruitment at the ops site RfaH must stay bound to RNAP until the nascent RNA chain is completed (129,264). Even though the HTT residues are not predicted to interact with either DNA or β’CH, a trivial explanation is that these substitutions compromise RfaH binding to RNAP indirectly, and the altered protein dissociates from the TEC soon after recruitment. Alternatively, RfaH remains bound to the TEC but cannot establish a key contact with RNAP that is ultimately required for the AP modification. To distinguish between these scenarios, we used a σ competition assay (Fig. 47), which relies on the ability of RfaH to sterically block σ recruitment to the TEC because both proteins bind to the β’CH domain (136). In the absence of RfaH, σ binds to a -10-like sequence and delays RNAP escape from a downstream site (designated as σP in Fig. 47). Wild-type RfaH interfered with σ-induced pausing and increased the rate of RNA synthesis. In contrast, an RfaH Y54F variant with a substitution at the RfaH/β’CH interface (Fig. 46) was initially recruited at the ops site (as reflected by RNAP delay at C45) but failed to maintain stable post-recruitment interactions: it neither accelerated transcription nor competed with σ. On the other hand, T66A and T67A (and H65A, not shown) RfaH variants reduced pausing at σP, but not at the hairpin-dependent hisP site (Fig. 47). We conclude that RfaH interactions with the β’CH and NT DNA are sufficient for its binding to the TEC, but not for the AP function.

Fig. 47. RfaH variants with substitutions in the HTT motif still bind to the TEC.

The RfaH retention assay was performed on a linear pIA807 DNA template (40 nM) containing T7A1 promoter followed by a U-less region, RfaH recruitment site (ops), sigma-dependent pause (σP) and hisP site (shown on top). The transcription start site (+1) and end (224), the ops and -10 elements, and the hisP signal are indicated. Halted [α32P]CMP labeled G37 TECs (40 nM) were incubated with σ70 (400 nM) and an RfaH variant (100 nM) where indicated; transcription was restarted upon addition of the NTP substrates (136). Positions of the G37, ops, σP and his paused RNAs and run-off (RO) transcripts are indicated with arrows. The fraction of RNA at the σP site after a 960 sec incubation (as % of total RNA) is presented below each panel; the errors (±SD) were calculated from three independent experiments.

RNAP with the GL Deletion Does not Respond to RfaH in Vivo.

The RfaH/TEC model allows for a plausible contact of the HTT motif with the β subunit gate loop (βGL, Fig. 46), a moderately conserved mobile element that has been proposed to play a key role in DNA loading during initiation (5) but, to our knowledge, has not been studied experimentally. The βGL is located in the narrowest part of the main channel, on the opposite side from β’CH. To test whether βGL is a functional partner of the HTT motif, we substituted the E. coli β residues 368-376 with two glycines and determined effects of this deletion on RfaH function.

RfaH is thought to act by reducing polarity in long, poorly expressed operons such as the 11-kb rfb; we showed that RfaH is recruited to RNAP upstream from the first

(rfbB) gene and remains bound to the enzyme until the end of this operon (129). To assess polarity within the rfb operon and the role of βGL therein, we measured expression of the rfbB and the 8th (wbbI) genes by qRT-PCR (see Materials and Methods).

In control samples, disruption of the rfaH gene reduced the levels of rfbB and wbbI transcripts by factors of 4 and 130, respectively (Fig. 48A), consistent with a model in which RfaH preferentially increases the expression of distal genes, e.g. by antitermination at an intergenic site (241). The absence of RfaH increases the ratio of rfbB to wbbI transcripts more than 32-fold (from 5.3 to 173). The observed RfaH effect on transcription as early as within ~ 400 nt from its recruitment site (a promoter-proximal RT-PCR probe covers

+40 to +363 region from the rfbB start codon) has not been reported before; however, qRT-PCR has never been used to evaluate the expression of RfaH-controlled genes. One of the possible explanations arises from the recent genome-wide ChIP-chip analysis of elongation factors associated with RNAP in wild-type E. coli (129). Analysis of the rfb

operon revealed a high level of Rho and NusG present in the promoter-proximal region

just before the RfaH-recruitment site (Fig. 48C). The presence of RfaH reduces levels of

Rho and NusG in the body of the operon. However, in the ∆rfaH strain, Rho and NusG

may target the rfb operon early in elongation, causing premature termination of rfbB

RNA.

Fig. 48. ∆GL RNAP does not respond to RfaH in vivo.

Polarity within the rfb operon evaluated by qRT-PCR. Total RNA was isolated from cells expressing WT (A) or βGL deletion (B) RNAP in a wild-type or ΔrfaH background and the amount of RNA message in the rfbB region (red) was compared to that in the wbbI region (blue). Positions of RT-PCR probes are marked as red and blue rectangles for rfbB and wbbI, respectively, in a schematic of the rfb operon (C). Positions of putative promoters are indicated with arrows, position of terminator – with a stop sign, position of ops is shown relative to the translation start of the first gene (rfbB). Occupancy of RNAP, NusG, RfaH and Rho within the rfb operon evaluated by ChIP-chip is shown below. The patterns are aligned with the schematic above.

Deletion of the βGL dramatically decreased RfaH effect on expression of the rfb operon (Fig. 48B). The distal wbbI transcript level showed only a modest 3.2-fold reduction, whereas the level of rfbB message was even increased by a factor of two. The

RfaH effect on polarity was also reduced when the wild-type β was replaced by the βGL deletion variant: RfaH had only a ~ six-fold (from 11 to 69) effect on relative levels of rfbB and wbbI messages, consistent with an essential role of the βGL in RNAP modification by RfaH.
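The polarity figures quoted for the wild-type and βGL deletion enzymes can be cross-checked from the reported fold changes. The sketch below is a consistency check only, assuming that each fold change compares the ΔrfaH strain with the rfaH+ strain and that polarity is expressed as the rfbB/wbbI ratio; the function name is illustrative.

```python
def polarity_change(proximal_fold_reduction, distal_fold_reduction):
    """Fold change in the proximal/distal (rfbB/wbbI) ratio upon loss of RfaH.

    Each argument is the factor by which the transcript level drops in the
    absence of RfaH; a value below 1 means the level increased.
    """
    return distal_fold_reduction / proximal_fold_reduction

# Wild-type RNAP: rfbB reduced 4-fold, wbbI reduced 130-fold.
wt_change = polarity_change(4, 130)

# ΔβGL RNAP: rfbB increased 2-fold (reduction factor 0.5), wbbI reduced 3.2-fold.
dgl_change = polarity_change(0.5, 3.2)

# Predicted rfbB/wbbI ratios in the absence of RfaH, starting from the reported
# ratios in its presence (5.3 for wild-type, 11 for ΔβGL).
wt_predicted = 5.3 * wt_change
dgl_predicted = 11 * dgl_change

print(f"WT:   {wt_change:.1f}-fold, predicted ratio {wt_predicted:.0f} (reported 173)")
print(f"ΔβGL: {dgl_change:.1f}-fold, predicted ratio {dgl_predicted:.0f} (reported 69)")
```

Within rounding of the reported values, the individual fold changes reproduce the ~32-fold and ~six-fold polarity effects stated in the text.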

RNAP with GL Deletion Does not Respond to RfaH In Vitro.

We first tested if RNA chain elongation by the deletion enzyme responds to RfaH and NusG in vitro (Fig. 49A). Consistent with the published data, NusG and RfaHN increased the apparent elongation rate by the wild-type enzyme (from 1.47 nt/sec to 2.15 and 2.61 nt/sec, respectively) (82,125). In contrast, the deletion enzyme failed to respond to RfaHN and was less sensitive to NusG. The intrinsic rate of RNA synthesis appeared to be slightly higher for the mutant enzyme (2.09 nt/sec vs. 1.47 nt/sec); however, the presence of RfaHN decreased the rate of elongation to 1.52 nt/sec. This phenomenon has been observed before for some “fast” RNAP variants (210). These RNAPs were thought to be resistant to acceleration by RfaH because they were intrinsically insensitive to pause signals. To rule out the possibility that the same effect underlies ΔβGL RNAP resistance to RfaH action, we compared its response to a model hisP signal with that of the well-characterized “fast” RNAP that lacks a large SI3 insertion in the trigger loop

(β’Δ943-1130) (210). We found that ΔβGL RNAP pauses at hisP more efficiently and dwells longer than Δβ’SI3 RNAP (Fig. 49B). The kinetic characteristics of ΔβGL RNAP, such as the pause half-life, were very similar to those of the wild-type enzyme (t1/2 ~46.5 sec vs. t1/2 ~52.9 sec), whereas Δβ’SI3 RNAP escaped from hisP more than twice as fast (210). We concluded that the complete resistance of ΔβGL RNAP to RfaH cannot be attributed to the modest increase in its intrinsic elongation rate.
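To put the rates and half-lives quoted above on a common scale, the sketch below expresses each elongation rate relative to the unmodified wild-type enzyme and converts the two reported pause half-lives into apparent escape rate constants. It assumes single-exponential pause escape (k = ln 2 / t1/2), which is an assumption of this illustration rather than a statement about how the half-lives were fitted; all names are illustrative.

```python
import math

def relative_rate(rate, reference):
    """Elongation rate expressed as a fold change over a reference rate."""
    return rate / reference

def escape_rate_constant(half_life_s):
    """Apparent pause escape rate constant (per second), assuming single-exponential decay."""
    return math.log(2) / half_life_s

WT_RATE = 1.47  # nt/sec, wild-type RNAP without factors

fold_nusg = relative_rate(2.15, WT_RATE)       # wild-type + NusG
fold_rfah = relative_rate(2.61, WT_RATE)       # wild-type + RfaHN
fold_dgl = relative_rate(2.09, WT_RATE)        # deletion enzyme without factors
fold_dgl_rfah = relative_rate(1.52, 2.09)      # deletion enzyme + RfaHN vs. deletion enzyme alone

# The two hisP pause half-lives reported in the text (seconds).
k_fast = escape_rate_constant(46.5)
k_slow = escape_rate_constant(52.9)

print(f"NusG: {fold_nusg:.2f}x, RfaHN: {fold_rfah:.2f}x, deletion enzyme: {fold_dgl:.2f}x")
print(f"deletion enzyme + RfaHN: {fold_dgl_rfah:.2f}x of its own rate")
print(f"escape rate constants: {k_fast:.4f} and {k_slow:.4f} per sec")
```

The two escape rate constants differ by about 14%, whereas an enzyme that escapes more than twice as fast would show a constant more than double either value.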

Fig. 49. ∆GL RNAP pauses but does not respond to RfaH in vitro.

A. Elongation rate assays were performed on pIA146 template as described previously (210) with RfaHN (80 nM) or NusG (80 nM) at 20 µM NTPs. Only the region near the full-length transcript (RO, 1225 nt) is shown. Numbers below each panel correspond to the calculated elongation rate, nt per second.

B. Pause assay performed on the pIA171 template as described previously (232). Only the region near the hisP site is shown.

Fig. 50. Deletion of βGL does not compromise NusG effect on Rho-dependent termination.

Rho termination assays were performed on pIA267 template as described (129). Halted, [α32P]GMP labeled TECs were formed at 40 nM with the WT or βGL deletion RNAP. Rho (50 nM) and NusG (100 nM) were added where indicated. The run-off (RO) transcript and the Rho-dependent termination region (TERM) are indicated.

We next tested Rho-dependent termination within the phage λ tR1 region (Fig. 50). In support of our hypothesis, we found that the ΔβGL RNAP responded to Rho action and to the NusG enhancement thereof: the addition of NusG shifted the termination window upstream and reduced the run-off RNA from 77 to 29% for the wild-type enzyme, and from 68 to 21% for the mutant RNAP.
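The run-off percentages above can be restated as apparent termination efficiencies to compare the NusG response of the two enzymes. This is a minimal sketch that assumes all RNA other than run-off is terminated product (that is, it ignores any paused or halted complexes), which is a simplification made here for illustration; the names are hypothetical.

```python
def termination_efficiency(run_off_percent):
    """Apparent termination efficiency (%), assuming all non-run-off RNA is terminated."""
    return 100.0 - run_off_percent

# Run-off RNA (%) with Rho alone and with Rho + NusG, as reported in the text.
run_off = {
    "WT": (77.0, 29.0),
    "dGL": (68.0, 21.0),  # βGL deletion RNAP
}

nusg_effect = {}
for enzyme, (rho_only, rho_nusg) in run_off.items():
    te_rho = termination_efficiency(rho_only)
    te_nusg = termination_efficiency(rho_nusg)
    nusg_effect[enzyme] = te_nusg - te_rho  # gain in percentage points
    print(f"{enzyme}: {te_rho:.0f}% -> {te_nusg:.0f}% (+{nusg_effect[enzyme]:.0f} points)")
```

Under this simplification NusG raises the apparent termination efficiency by 48 percentage points for the wild-type enzyme and 47 for the deletion enzyme, in line with the conclusion that the two enzymes respond similarly.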

Altogether, these results argue that the main function of RfaH is to reduce pausing, and that the βGL is necessary for RfaH action in vivo (Fig. 48) and in vitro (Fig. 49A). By contrast, the βGL appears dispensable for cell viability (Fig. 51) and the NusG-mediated enhancement of Rho-dependent termination (Fig. 50), a function independent from the antipausing activity of NusG (175).

Deletion of βGL Does not Abolish RfaH Binding

The GL may be required for RfaH/NusG to remain bound to RNAP throughout transcription or to modify RNAP into a pause-resistant state. Several lines of evidence argue against the first model. Hypothetical RfaH/βGL contacts inferred from the model are weak, and disruption of these contacts by substitutions in the HTT motif abolishes the AP activity of RfaH but not σ exclusion (Fig. 47). Furthermore, NusG retains the ability to enhance Rho-dependent termination by, and thus apparently binds to, the ΔβGL RNAP. We sought to obtain more direct evidence in support of these considerations. First, while RfaHN was unable to accelerate transcription by the ΔGL enzyme, it did prevent σ binding during elongation (Fig. 52). Second, we showed that although RfaH interactions with the ΔGL RNAP may be somewhat weakened (as judged by reduced RNAP retention at the ops site), RfaH did bind to the ΔGL TEC, but not to the TEC with a single substitution in the β’CH, the main RfaH target (Fig. 53). Finally, using a bacterial two-hybrid assay (Fig. 54), we detected an apparently specific interaction between RfaHN and the N-terminal domain of β (E. coli residues 2-454) that was compromised by the βGL deletion and the H65A substitution in RfaH. However, this interaction was rather weak, and thus unlikely to make a major contribution to binding affinity.

Fig. 51. ∆GL RNAP supports viability in laboratory conditions.

Viability of the E. coli cells carrying the plasmid-encoded rpoB alleles (wild-type, rifampicin-resistant D516V and D516V+Δ368-376GG) expressed from the Ptrc promoter.

Serial dilutions of cultures were spotted on LB plates supplemented with carbenicillin

and IPTG in the absence or in the presence of rifapentin.

Fig. 52. RfaH abolishes σ-dependent pause during elongation by the ΔGL RNAP.

The σ-competition assay was performed as in Fig. 47.

Fig. 53. RfaH binds to the ΔGL RNAP.

Gel mobility shift assay was performed as in (82). TECs assembled from oligonucleotides and RNAP variants were incubated with [γ32P]ATP-labeled RfaH (left panel) or

[α32P]GTP, the next nucleotide specified by the template (right panel), and loaded onto a

3% agarose gel.

Fig. 54. Bacterial two-hybrid assay (BacterioMatch™) of RfaH-βGL interactions.

Vectors containing fusions of λ cI to RfaHN (WT or H65A mutant) were co-transformed with an αNTD-fusion to the rpoB fragment encompassing residues 2-454 (WT or ΔGL variant) into a reporter strain containing the lacZ gene downstream from the test promoter. Expression of the fusion proteins was induced with IPTG and β-galactosidase activity was measured. The results shown (Miller units) are from at least three independent experiments.

Discussion

In this work we identify the site on the β subunit of bacterial RNAP that mediates

its modification by NusG-like proteins conserved in all domains of life. We found that

interaction of the HTT motif in RfaH with the βGL is crucial for antipausing by RfaH

and NusG (Figs. 47 and 49). Despite its functional significance, the βGL is dispensable

for RfaH binding to RNAP (Figs. 52 and 53). Interaction between the HTT motif in RfaH

and the βGL appears to be weak and transient (Fig. 54), whereas the β' CH, located on

the opposite side of the channel (Fig. 46), is a major affinity determinant for RfaH

binding (Fig. 53 and (197)).

Contrary to our expectations, the RNAP lacking the βGL was transcriptionally

competent both in vivo (Figs. 48 and 51) and in vitro (Fig. 49), implying that the βGL is

not required for either initiation or the (essential) function of NusG. The N-terminal

domains of RfaH and NusG are structurally homologous, are predicted to make the

same contacts with RNAP, and are sufficient to reduce pausing (129,175). However, full-

length RfaH and NusG appear to play opposite regulatory roles in E. coli: NusG recruits

Rho to terminate transcription of horizontally transferred operons (113), whereas RfaH

promotes expression of a few ops-containing operons, in part by reducing Rho-

dependent termination (129,278). If we assume that the βGL is required for the shared,

AP function of RfaH and NusG, our data support the idea that cooperation with Rho

may be the principal role of NusG in the cell (113), and suggest that the βGL is

dispensable for the NusG/Rho interactions.

In summary, our present data suggest a model of antipausing modification by

RfaH (Fig. 55). For RfaH to work, it needs to simultaneously contact the β'CH and the

βGL located on the opposite sides of the clamp. The alternative positions of the clamp

observed in many crystallographic studies suggest that this domain can move relative to

other structures in the RNAP (1,5-7,26), but the functional significance of the clamp

movement is still debated. The modest opening of the clamp may alter positions of

nucleic acids inside the main channel and was proposed to favor formation of the elemental

pause (41). The recent finding that small changes in the active site induced by substrate binding can cause large changes in the periphery of the complex (26) further supports this model. We propose that binding of RfaH to both “jaws” restricts mobility of the

RNAP clamp, thereby locking it in a processive state (Fig. 55). In the “locked” conformation, the RNAP maintains tight contacts with RNA and DNA that promote isomerization into the post-translocated complex, thereby inhibiting formation of an off-pathway intermediate that arises only from the pre-translocated state.

Although a link between the formation of the elemental pause and clamp movement was proposed more than a decade ago, our findings provide the first experimental evidence of an elongation factor that may act by controlling this movement. We propose that other, structurally unrelated factors may also act by a similar mechanism.

Contributions: Anastasia Sevostiyanova carried out in vitro (pause assays, gel mobility

shift assay, Rho-dependent and intrinsic termination assays, elongation rate analysis)

and in vivo (qRT-PCR, bacterial 2-hybrid, viability tests) analyses. Irina Artsimovitch

constructed the GL deletion and the 2-hybrid vectors. Georgiy Belogurov provided RfaH

mutants for this analysis.

Fig. 55. Model of antipausing modification by NusG-like proteins.

Antipausing factor (AP) binds to both sides of the main channel and restricts the clamp opening. The opened conformation of clamp is thought to correspond to the folded state of the TL (yellow) (26), an element that was shown to play a central role in active site rearrangement during pausing (40). Unfolding of the TL is required for opening of the active site and binding of an incoming nucleotide in the pre-insertion site, which is thought to stabilize a pause-resistant post-translocated state.

By locking the TEC in the closed conformation, RfaH and other regulators may favor the correct alignment of the nucleic acids inside the main channel that corresponds to the productive, pause-resistant state of the enzyme.

Materials and Methods

Proteins and Reagents

Oligonucleotides were obtained from Integrated DNA Technologies (Coralville,

IA) or Sigma Aldrich (St. Louis, MO), NTPs and [α32P]-NTPs were from Perkin Elmer

(Boston, MA), restriction and modification enzymes – from NEB (Ipswich, MA), PCR

reagents – from Roche (Indianapolis, IN), other chemicals - from Sigma (St. Louis, MO)

and Fisher (Pittsburgh, PA). Plasmid DNAs and PCR products were purified using spin

kits from Qiagen (Valencia, CA) and Promega (Madison, WI). Unless indicated

otherwise, reagents for RNA manipulations and qRT-PCR were from Epicentre (Madison, WI).

Rho was purified as described in (175). The full-length RfaH variants, the RfaHN domain, and RNAP were purified as described in (197). Plasmids are listed in Table 5.

Sigma Competition Assay

Halted elongation complexes were prepared in 50 µl of TGA buffer (20 mM Tris-HCl, 2 mM MgCl2, 20 mM NaCl, 5% glycerol, and 0.1 mM EDTA; pH 7.9) with E. coli RNAP (30 nM), ApU (100 µM), and starting NTPs (1 µM CTP, 5 µM ATP and GTP, 10 µCi [α32P]CTP, 3000 Ci/mmol). Elongation factors (1 µM σ, 70 nM RfaH) were added, followed by a 3-min incubation at 37°C. Transcription was restarted by the addition of GTP to 15 µM, CTP, ATP and UTP to 150 µM, and rifapentin to 25 µg/ml. Samples were removed at 10, 20, 40, 90, 180 and 360 sec and quenched by the addition of an equal volume of STOP buffer (10 M urea, 50 mM EDTA, 45 mM Tris-borate; pH 8.3, 0.1% bromophenol blue, 0.1% xylene cyanol). Samples were analyzed on an 8% denaturing PAGE gel. The RNA products were visualized and quantified using a PhosphorImager and ImageQuant Software (GE Healthcare).

qRT-PCR

To test the effect of the gate-loop deletion on expression of RfaH-controlled genes, total RNA was isolated from cells expressing wild-type or ΔGL RNAP in the presence or absence of RfaH, and the amount of RNA message in the rfbB region (oligos 1069 and 1070, which amplify the +40 to +363 region from the rfbB start codon) was compared to that in the wbbI region (oligos 1065 and 1066, which amplify the +664 to +967 region from the wbbI start codon).

Overnight cultures of DH5α or IA149 (a derivative of DH5α, Targetron™ disruption of rfaH) E. coli strains (264) transformed with pIA898 (ΔGL; D516V rpoB) or pIA183 (D516V rpoB) were diluted 1/100 into LB and grown for 3 h before addition of 0.1 mM IPTG. After 2 h of induction, rifapentin was added to 200 µg/ml, and cells were grown for 1 h. Cells were collected and total RNA samples were isolated using Nucleic Acid Isolation Kit. Samples were treated with DNaseI according to the manufacturer's instructions, and the RNA quality was evaluated on an agarose gel. Control PCR with specific oligos without the RT step was performed to ensure the absence of DNA contamination. qRT-PCR analysis was performed using a MiniOpticon cycler (BioRad; Hercules, CA) and the MasterAmp™ GREEN Real-Time RT-PCR kit. Total RNA samples (1 µg) were added to 24 µl of reaction mix and analyzed in triplicate. For each sample, at least 3 repeats in two independent experiments (starting from cell growth and RNA isolation) were performed.

To ensure an accurate quantification of RNA message in the rfbB and wbbI regions, the qRT-PCR assay was calibrated on in vitro synthesized RNA. Regions that correspond to the rfbB or wbbI genes were PCR amplified from E. coli genomic DNA using an upstream oligo with the T7 gene 10 promoter sequence overhang. In vitro transcription reactions were prepared in 40 µl of T7 transcription buffer (40 mM Tris-HCl pH 7.5, 6 mM MgCl2, 10 mM NaCl, 10 mM DTT) using 1.5 µg of PCR product containing the rfbB or wbbI regions, 2 mM rNTP substrates, and 500 U of T7 RNAP, and incubated at 37°C for 90 min followed by 30 min of DNaseI treatment. DNaseI was heat inactivated according to the manufacturer's instructions; RNA was purified with a ZYMO kit (Orange, CA) and diluted to 80 ng/µl.

Serial dilutions were prepared and used for calibration qRT-PCR. The amounts of in vitro synthesized RNA applied per reaction were 64 pg, 16 pg, 4 pg, 1 pg, 0.25 pg and 0.0625 pg. Tc (threshold cycle) values from four independent runs were plotted against concentration and fitted using Scientist 3.0 software (Micromath). The resulting equations were used to quantify the amount of rfbB and wbbI messages in total RNA samples.
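
As an illustration of how such a calibration can be applied, the following Python sketch fits a standard curve to the dilution series listed above and inverts it to estimate the amount of message in a sample. The log-linear form Tc = intercept + slope × log10(amount) is an assumption (the model fitted in Scientist 3.0 is not specified here), the function names are hypothetical, and the Tc values are synthetic numbers chosen only to make the example run; they are not data from this work.

```python
import math

# Amounts of in vitro synthesized RNA per reaction (pg), as listed in the text:
# a 4-fold dilution series from 64 pg down to 0.0625 pg.
STANDARD_PG = [64.0 / 4 ** i for i in range(6)]


def fit_standard_curve(amounts_pg, tc_values):
    """Least-squares fit of Tc = intercept + slope * log10(amount).

    Assumed model; returns (slope, intercept).
    """
    xs = [math.log10(a) for a in amounts_pg]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(tc_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, tc_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_tc(tc, slope, intercept):
    """Invert the standard curve: RNA amount (pg) for a measured Tc."""
    return 10 ** ((tc - intercept) / slope)


if __name__ == "__main__":
    # Synthetic Tc values for illustration only (2 cycles per 4-fold dilution).
    example_tc = [15.0 + 2.0 * i for i in range(6)]
    slope, intercept = fit_standard_curve(STANDARD_PG, example_tc)
    print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
    print("amount at Tc = 18.5 (pg):", amount_from_tc(18.5, slope, intercept))
```

Fitting one curve per amplicon (rfbB and wbbI) and converting each sample Tc to an amount before taking the ratio avoids assuming that the two primer pairs amplify with equal efficiency.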

Viability Assay

DH5α cells were transformed with pIA898 (Ptrc-rpoBD516V+ΔGL), pIA183 (Ptrc-rpoBD516V), or pIA160 (Ptrc-rpoB). Two different colonies from each plate were inoculated into LB + carbenicillin (100 µg/ml) and grown overnight, diluted 1/50 into fresh media, and grown for 3 h before the addition of 1 mM IPTG. After 1 h of induction, 7 µl of serial dilutions (10^0 to 10^-5) were plated onto LB agar plates containing carbenicillin and 1 mM IPTG, with or without 50 µg/ml rifapentin. Plates were incubated at 37°C for 48 h and scanned.
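
The dilution arithmetic behind this spot assay can be sketched as follows. The plates in this work were scanned rather than counted, so the counting step, the function names, and the example colony counts are assumptions made only for illustration.

```python
SPOT_VOLUME_UL = 7.0                         # volume plated per dilution (from the text)
DILUTION_EXPONENTS = [0, -1, -2, -3, -4, -5]  # 10^0 to 10^-5 (from the text)


def cfu_per_ml(colonies, dilution_exponent, volume_ul=SPOT_VOLUME_UL):
    """Colony-forming units per ml of undiluted culture.

    colonies: number of colonies counted in one spot (hypothetical input).
    dilution_exponent: e.g. -3 for the 10^-3 dilution.
    """
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    dilution = 10.0 ** dilution_exponent
    return colonies / (volume_ul / 1000.0 * dilution)


def plating_efficiency(cfu_selective, cfu_permissive):
    """Fraction of cells that form colonies on the selective plate."""
    return cfu_selective / cfu_permissive


if __name__ == "__main__":
    # Hypothetical counts, not data from this work.
    without_rif = cfu_per_ml(colonies=21, dilution_exponent=-4)
    with_rif = cfu_per_ml(colonies=14, dilution_exponent=-2)
    print("CFU/ml without rifapentin:", without_rif)
    print("CFU/ml with rifapentin:", with_rif)
    print("plating efficiency:", plating_efficiency(with_rif, without_rif))
```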

Elongation Rate Assay

Halted radiolabeled A29 TECs were formed with the wild-type or βGL deletion RNAP on a linear template PCR amplified from pIA146, which contains the pause-less rpoB gene. The TECs were incubated with RfaHN (80 nM) or NusG (80 nM), and transcription was restarted by addition of 20 µM NTPs. Aliquots were withdrawn at the indicated times and analyzed on 5% denaturing gels.

Rho-Dependent Termination Assay

Halted A26 TECs were formed on a linear template PCR amplified from pIA267 at 40 nM in Rho transcription buffer (40 mM Tris-HCl, 50 mM KCl, 5 mM MgCl2, 0.1 mM DTT, and 3% glycerol; pH 7.9) with ATP and UTP at 2.5 µM, GTP at 1 µM, ApU at 150 µM, and 20 µCi of [α-32P]GTP (3000 Ci/mmol; Perkin Elmer) for 15 min at 37°C. NusG was added to 100 nM where indicated. Elongation was resumed by addition of 150 µM each ATP, CTP, UTP, 15 µM GTP containing 50 µg rifapentin/ml and 50 nM of Rho factor where indicated. Reactions were stopped after 15 min and analyzed on 6% denaturing gels.

Gel Mobility Shift Assay

The assay was performed essentially as in (82). TECs were assembled using either partially or fully complementary DNA oligonucleotides, RNA primer, and core RNAP. The wild-type (ops element is shown in italics) TEC was assembled from the non-template DNA (CACCACCACGCGGGCGGTAGCGTGCTTTTTTCGATCTTCCAGTG), the template DNA (CACTGGAAGATCGAAAAAAGCACGCTACCGCCCGCGTGGTGGTG), and the 14-mer RNA primer (GCGGGCGGUAGCGU). The template strand and RNA primer were annealed in 20 mM Tris-HCl (pH 7.9), 20 mM NaCl, 0.1 mM EDTA, mixed with core RNAP (at 100 nM) and incubated in the transcription buffer (20 mM Tris-HCl; pH 7.9, 20 mM NaCl, 5% glycerol, 5 mM MgCl2, 0.1 mM DTT, and 50 µg/ml BSA) for 10 min at 22°C. The non-template DNA strand oligo was added at 2.5-fold molar excess for 10 min at 37°C. To obtain radiolabeled TECs, 10 µCi [α-32P]GTP (3000 Ci/mmol; Perkin Elmer) was added to the assembled complex. Purified RfaH (500 pmoles) that carries the RRASV motif was labeled at the Ser residue with the heart muscle kinase catalytic subunit (NEB) in a 25 µl reaction (20 mM Tris-HCl; pH 8.0; 150 mM NaCl, 20 mM MgCl2, 0.1 mM EDTA, 10 µCi [α-32P]ATP (3000 Ci/mmol; Perkin Elmer); 20 U of protein kinase) for 45 min at 22°C. The unincorporated label was removed using a size exclusion G50 spin column (GE Healthcare). Reconstituted TECs were mixed with radiolabeled RfaH at 50 nM, incubated for 5 min at 37°C, and loaded onto 3% NuSieve agarose gels in 0.5X TBE. After electrophoresis at room temperature at 5 V/cm for 4 hr, the gels were exposed to phosphorimager screens.
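
The strand relationships in the scaffold described above can be checked with a few lines of Python. This is a minimal sketch with hypothetical helper names; it tests whether the template oligo is the reverse complement of the non-template oligo and whether the RNA primer matches a stretch of the non-template strand (and is therefore complementary to the template).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def clean(seq):
    """Remove whitespace (e.g. from line wrapping) and upper-case the sequence."""
    return "".join(seq.split()).upper()


def reverse_complement(dna):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(clean(dna)))


def rna_to_dna(rna):
    """DNA-sense version of an RNA sequence (U replaced by T)."""
    return clean(rna).replace("U", "T")


# Oligonucleotide sequences as given in the text.
NONTEMPLATE = "CACCACCACGCGGGCGGTAGCGTGCTTTTTTCGATCTTCCAGTG"
TEMPLATE = "CACTGGAAGATCGAAAAAAGCACGCTACCGCCCGCGTGGTGGTG"
RNA_PRIMER = "GCGGGCGGUAGCGU"

if __name__ == "__main__":
    fully_complementary = reverse_complement(NONTEMPLATE) == clean(TEMPLATE)
    primer_site = clean(NONTEMPLATE).find(rna_to_dna(RNA_PRIMER))
    print("template is reverse complement of non-template:", fully_complementary)
    print("primer length:", len(clean(RNA_PRIMER)))
    print("primer start in non-template strand (0-based, -1 if absent):", primer_site)
```

The same check applied to a partially complementary scaffold would report the mismatch rather than fail, which makes it a convenient sanity test before ordering oligonucleotides.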

Name      Description                                            Source or note

TRANSCRIPTION TEMPLATES
pIA146    T7A1 promoter-A29-rpoB elongation rate template        (279)
pIA267    λ PR promoter-A26-λ tR1 terminator template            (82)
pIA807    T7 A1 promoter–G37–ops pause–[extended-10] template    (136)

PROTEIN EXPRESSION VECTORS
pVS10     PT7–rpoA–rpoB–rpoCHis6; rpoZ                           (197)
pIA160    Ptrc–His6rpoB                                          (82)
pIA183    Ptrc–His6rpoB[D516VRifR]                               (82)
pIA803    PT7–rpoA–rpoB–rpoC[I290R]His6; rpoZ                    (136)
pIA898    Ptrc–His6rpoB[D516VRifR+Δ368-376→GG] in pIA183         this work
pIA899    PT7–rpoA–His6rpoB[D516V+Δ368-376→GG]–rpoC; rpoZ        this work
pIA1039   PT7–rpoA–His6rpoB[Δ368-376→GG]–rpoC; rpoZ              this work
pIA238    PT7–His6rfaH                                           (82)
pIA758    PT7–His6rfaH [H65A]                                    (264)
pIA764    PT7–His6rfaH [T66A]                                    (264)
pIA757    PT7–His6rfaH [T67A]                                    (264)
pGB34     PT7–His6rfaH [Y54F]                                    (264)
pIA270    PT7–His6+RRASVrfaH                                     (82)

BACTERIAL 2-HYBRID VECTORS
pBT       Bait vector; λ cI                                      Stratagene (La Jolla, CA)
pTRG      Target vector, rpoA[1-248]                             Stratagene (La Jolla, CA)
pIA960    rpoB[2-454] in pBT                                     this work
pIA963    RfaHN[1-99] in pTRG                                    this work
pIA997    rpoB[2-454/Δ368-376→GG] in pBT                         this work
pIA1013   RfaHN[H65A] in pTRG                                    this work

Table 5. Plasmids and templates

Chapter 6: Transcription inactivation through local refolding of the RNA polymerase structure

Introduction

Structural studies of antibiotics not only provide a short-cut to medicine by allowing rational structure-based drug design, but may also capture snapshots of dynamic intermediates that become “frozen” upon inhibitor binding (26,280). Myxopyronin inhibits bacterial RNAP by an unknown mechanism (281). We report the structure of dMyx, a desmethyl derivative of myxopyronin B (282), complexed with a bacterial RNAP holoenzyme. The antibiotic binds to a pocket deep inside the RNAP clamp head domain, which interacts with the DNA template in the transcription bubble (43,283). Strikingly, binding of dMyx stabilizes refolding of the β’-subunit switch-2 (SW2) segment, resulting in a configuration that might indirectly compromise binding to, or directly clash with, the melted template DNA strand. Consistently, footprinting data reveal that antibiotic binding does not prevent nucleation of promoter DNA melting but blocks its propagation towards the active site. Myxopyronins are thus the first class of antibiotics that target formation of the pre-catalytic transcription initiation complex, the decisive step in gene expression control. Interestingly, mutations designed in SW2 mimic the dMyx effects on promoter complexes in the absence of antibiotic. Overall, our results suggest a plausible mechanism of dMyx action and a step-wise pathway of open complex formation in which the core enzyme mediates the final stage of DNA melting near the transcription start site, while SW2 might serve as a molecular checkpoint for DNA loading in response to regulatory signals or antibiotics. The universally conserved SW2 may play the same role in all multi-subunit RNAPs.

Results

Crystal Structure of RNAP Complexed with dMyx

Our data show that myxopyronins efficiently inhibit formation of transcription initiation complexes by both E. coli and T. thermophilus RNAPs (Fig. 56). The crystal structure of dMyx bound to T. thermophilus RNAP holoenzyme (Figs. 57 and 58) has been refined at 2.7Å resolution to the final R-factor/R-free=0.240/0.270. Comparison with the apo-RNAP (PDB ID 2A6E) revealed a subtle but systematic, dMyx-induced ~2.0-3.7Å shift of the β’-subunit N-terminal domain (residues β’1-600) and the σ subunit in the RNAP/dMyx complex (Fig. 59). This shift, however, did not change either the pattern of the σ/core-RNAP interactions or the arrangement of, and the distance between, the σ regions 2 and 4, which recognize the -10 and -35 promoter elements (43). The configuration of the fork formed by the σ regions 2.3-2.4 and 2.5-3.1 at the upstream entrance to the RNAP main channel (where melting of the DNA duplex is thought to commence (283-285)) is also largely unaffected by dMyx. In the RNAP/dMyx complex, the width of the main channel is reduced by ~4Å as compared to that in one of the two crystallographically independent molecules in the apo-RNAP. If systematic, this constriction could affect accommodation of the upstream DNA duplex and/or DNA melting. However, the second apo-RNAP molecule also exhibits the narrowed channel, implying the inherent flexibility of the neighboring domains and suggesting that the main channel width is unlikely to be a limiting factor for open complex formation (Fig. 60).

Fig. 56. Myxopyronin inhibits transcription initiation by bacterial RNAPs.

E. coli and T. thermophilus enzymes are inhibited by myxopyronin, as assayed by the steady-state abortive synthesis of ApUpG RNA. The inhibitor (diluted into 50% EtOH) was preincubated with 20 nM RNAP holoenzyme (E. coli or T. thermophilus) at the concentrations indicated above the gel for 10 min at 37°C or 55°C, respectively.

Fig. 57. Structure of the RNAP/dMyx complex, the overall view.

The σ-subunit, bridge helix, TL, and the remainder of the RNAP molecule are in blue, magenta, cyan, and gray, respectively. The SW2 segment is green, dMyx is in black, and the Mg2+ ion is shown as a magenta sphere. The same color scheme is used in all figures throughout this manuscript. CH – clamp helices, CC – coiled-coil.

Fig. 58. The quality of the RNAP/dMyx structure.

The final slow annealing (Fobs-Fcalc) omit electron density (blue) for dMyx (3.0Å level) (a), and SW2 segment (2.4Å level) (b) in the complex structure.

Fig. 59. The RNAP domain rearrangement induced by the dMyx binding.

The N-terminal β’-subunit domains are shown in grey and yellow, while the σ-subunits are in blue and red for the RNAP/dMyx complex and dMyx-free RNAP, respectively.

Fig. 60. The dimensions of the RNAP main channel in the RNAP/dMyx and apo-RNAP structures.

The narrowest place of the channel is indicated by red arrows and distances in Å.

A. RNAP/dMyx complex (σ-subunit – blue; core enzyme – grey).

B, C. The apo-RNAP structure. The first (B) and second (C) crystallographically independent molecules are shown (σ-subunit – orange; core enzyme – yellow).

dMyx binds in the pocket deep inside the RNAP clamp head domain (Figs. 57 and 61A), which constitutes the wall of the main channel opposite the catalytic center and forms crucial contacts with the DNA template strand in the EC (100,221). Although hydrophobic contacts likely play a dominant role in binding, most of the dMyx polar groups also form specific interactions with the protein (Figs. 61B and 62). The major and most striking change observed in the presence of dMyx is refolding of the highly conserved β’ SW2 segment (β’602-621; Fig. 61C): the α-helix, interrupted in the middle by four flipped-out residues, is straightened, while its C-terminal portion (~2 helical turns) unwinds and refolds into a loop (Fig. 63). This loop extends toward the active site, where it approaches the σ hairpin loop (53) (σ317-333; Figs. 57 and 61A).

Mutational Analysis of SW2 region

To verify the dMyx-binding determinants revealed by the structure and to probe the role of SW2 refolding in the dMyx mechanism, we performed in vitro mutational analysis of E. coli RNAP (see Materials and Methods). As anticipated, substitutions of three residues (Ser β1322, Glu β1279, and Lys β’345, numbered as in E. coli β'; Fig. 61B) making crucial interactions with dMyx conferred resistance to the antibiotic (Fig. 64). On the other hand, upon SW2 refolding Lys334 forms only weak van der Waals interactions with dMyx, while Arg337 and Arg339 do not interact with the inhibitor at all (Fig. 61B). Consistently, substitutions of these residues for Ala do not substantially affect inhibition by dMyx (Fig. 64). To design SW2 variants with altered refolding properties, we used the following considerations. First, we selected residues without essential direct contacts with dMyx. Second, deletion of two flipped-out residues, Lys334 and Gln335, whose integration into an α-helix likely initiates refolding (Fig. 63A), would prevent both opening of the inhibitor-binding site and formation of the C-terminal loop. Third, in the structures without dMyx, Phe338 is integrated into the hydrophobic core that likely stabilizes the original SW2 conformation. The Phe/Ala substitution, as well as a deletion of residues 338-341, would weaken these interactions, thereby presumably favoring refolding. Fourth, although in both conformations Gly336 is located at the junction between the α-helical and unfolded portions, its main chain angles (φ,ψ) appear in the disallowed and favorable (for the amino acids with side chains) regions of the Ramachandran plot for the original and refolded configurations, respectively, suggesting that its substitution for Ala would favor the refolded conformation. In support of these structural considerations, the Δ334-335 variant was resistant, while the Δ338-341, F338A and G336A variants were hypersensitive to dMyx (Fig. 64).

SW2 refolding may be pivotal for the dMyx action. First, refolding opens the entry to the otherwise inaccessible dMyx-binding site (Fig. 65). Second, Arg β’610 and Gln β’611 are flipped out of the helix in the original SW2 configuration and form hydrogen bonds with the DNA template in the EC (and likely in the initiation complex, where they may be crucial for stability of the transcription bubble) but lose these contacts upon refolding (Fig. 63). This change may inhibit DNA melting beyond register -3. Finally, the newly formed C-terminal loop would clash with the DNA template strand if melting propagates to register +1 (Fig. 66). This clash can hardly be avoided: while the upstream DNA (registers -2 to -10, etc.) may exhibit relatively large deviations between the initiation and elongation complexes, the position of the acceptor template (i+1, where the major clash is predicted, Fig. 66) is strongly restrained by base pairing with the incoming substrate and thus is likely identical in both states. This suggests that steric competition between the refolded switch-2 and the template DNA strand underlies the dMyx mechanism of inhibition.

Fig. 61. Myxopyronin binds to a conserved SW2 element.

A. The close-up view of the dMyx binding site.

B. The schematic drawing of the protein/dMyx interactions. The SW2 segments in the Myx-free and Myx-bound structures are in orange and green, respectively. The polar and van der Waals interactions are shown as solid arrows and dashed lines, respectively. The mutated residues are indicated by the red boxes.

C. Sequence alignment of the SW2 segment from bacterial (E. coli, eco; T. thermophilus, tt; Bacillus subtilis, bsu; Mycobacterium tuberculosis, mtu), archaeal (Pyrococcus furiosus, pfu), and yeast Saccharomyces cerevisiae pol II (scII) enzymes. Substitutions constructed in this work are shown above the sequence in green.

Fig. 62. The dMyx binding determinants in the RNAP/dMyx complex structure.

A. Sequence alignment of the β and β’-subunit fragments containing the dMyx binding residues (marked by the black boxes) from bacterial (E. coli, Eco; T. thermophilus, Tt; Bacillus subtilis, Bsu; Mycobacterium tuberculosis, Mtu), archaeal (Pyrococcus furiosus, Pfu), chloroplast (Arabidopsis thaliana, Ath) and yeast Saccharomyces cerevisiae (Sce) RNAPII enzymes. The residues are numbered (above the sequence) as in the T. thermophilus sequence. The hydrophobic, basic, acidic and polar residues are shown in yellow, blue, red and white, respectively.

B. Stereo view of the dMyx (black) binding site. The hydrogen bonds between the protein residues and dMyx are shown by the cyan dashed lines.

Fig. 63. Refolding of SW2.

Conformations of the SW2 segment in the Myx-free (A) and Myx-bound (B) holo-RNAP structures.

Fig. 64. Effect of RNAP mutations on dMyx activity.

IC50s were measured in vitro with purified RNAP variants (see Materials and methods).

A. The fraction of RNA synthesis (compared to 1 in the absence of dMyx) plotted against dMyx concentration for the wild-type, resistant (β' ∆334-335), and hypersensitive (β'F338A) E. coli enzymes.

B. The data for all variants tested in this study. The IC50 could not be determined for the highly resistant β' K345A variant.
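
One simple way to extract an IC50 from a titration of this kind is sketched below in Python. It assumes a one-site inhibition model, fraction = 1/(1 + [dMyx]/IC50), which is not necessarily the model used for the fits in this work; the function names and the example titration are hypothetical and serve only to illustrate the calculation.

```python
import math


def fraction_active(conc, ic50):
    """Fraction of RNA synthesis remaining under the assumed one-site model."""
    return 1.0 / (1.0 + conc / ic50)


def estimate_ic50(concs, fractions):
    """Estimate IC50 as the geometric mean of per-point estimates.

    Under the assumed model each point gives IC50 = c * f / (1 - f).
    Points with c <= 0 or with f outside (0, 1) carry no information
    about the IC50 in this form and are skipped.
    """
    logs = []
    for c, f in zip(concs, fractions):
        if c > 0 and 0.0 < f < 1.0:
            logs.append(math.log(c * f / (1.0 - f)))
    if not logs:
        raise ValueError("no usable data points")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Hypothetical titration (arbitrary concentration units), not data from this work.
    concs = [0.5, 1.0, 2.0, 4.0, 8.0]
    fractions = [fraction_active(c, ic50=2.0) for c in concs]
    print("estimated IC50:", estimate_ic50(concs, fractions))
```

For a variant that retains nearly full activity at the highest inhibitor concentration tested, every point has a fraction close to 1 and the estimate becomes unreliable, which is consistent with an IC50 being reported as undeterminable for a highly resistant enzyme.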

Fig. 65. The entry of the dMyx binding site.

The closed (A) and open (B) configuration of the SW2 motif in the apo-RNAP (orange) and RNAP/dMyx complex (green), respectively. In the closed state, dMyx access to its binding site appears largely blocked.

Fig. 66. Modeling of the DNA template to the RNAP/dMyx complex.

The main chain of the residues located at the tip of the refolded SW2 segment (β’613-616) clashes with the phosphate backbone of the template (T) DNA in the vicinity of the active site. In particular, the main and side chains of Arg β’615 appear to completely block access of the acceptor template base (i+1) to its binding site in the active transcription complex. Importantly, the resolution of the RNAP/dMyx structure is high enough for feasible modeling. Furthermore, the refolded SW2 segment is well resolved, allowing us to predict the potential competition; the interference between the protein and nucleic acids appears quite severe and cannot be avoided even assuming that the conformation of the nucleic acids might be somewhat different in the initiation complex as compared to the EC. While we may expect relatively large deviations in the positions of the DNA template upstream of the active site (registers -2 to -10, etc.), we do not anticipate significant alterations in the position of the acceptor template (i+1), where the major clash between the SW2 and DNA is predicted by the modeling. Indeed, the position of the acceptor template is strongly restrained by the proper base pairing with the incoming substrate, which is likely identical in the initiation and elongation complexes.

Myxopyronin Traps a Partially Melted Promoter Complex Intermediate

To compensate for the lack of structural information on the DNA conformation in the dMyx-inhibited complex, we tested this model using biochemical approaches. First, the model implies that the properly positioned template strand would preclude refolding, and thus Myx binding; indeed, dMyx failed to inhibit transcription if added to the preformed open promoter complex (Fig. 67). Second, we tested the dMyx effect on RNAP/DNA contacts and the DNA strand separation in λPR promoter complexes using DNaseI and KMnO4 footprinting, respectively. Consistent with published data (57,67,69), the non-template strand T residues at positions -4, -3, and +2 were hypersensitive to KMnO4 modification in the absence of dMyx (Fig. 68). In the presence of the inhibitor, the +2 position became strongly protected. DNaseI probing showed that dMyx induced a loss of protection (4-5 bp) at the downstream footprint boundary (on both DNA strands; Fig. 68 and data not shown). Similar patterns were observed in complexes trapped at intermediate steps of open complex formation (59,69,284).

Our structural data did not reveal any significant antibiotic-dependent alterations of the RNAP structure that may affect DNA loading into the main channel at the upstream (-10) promoter region. Consistently, our footprinting analysis demonstrates that dMyx does not prevent RNAP binding to the promoter, nucleation of melting at ~-11, or entry of the double-stranded downstream DNA into the enzyme. The antibiotic imposes a block to DNA melting only beyond register -3, where the direct interactions with SW2 are predicted by modeling (assuming that the DNA trajectory is not dramatically changed between the initiation and elongation complexes). Importantly, dMyx inhibited transcription on the artificially melted promoters (Fig. 69), indicating that dMyx not only blocks DNA melting near the active site, but also precludes the correct loading of the template strand into the main channel.

Fig. 67. Myx inhibits transcription only if added before RPO formation.

The template was linear pIA171 DNA carrying a T7A1 promoter followed by a 29-nt long "U-less cassette". The key features of this template are shown on top; the run-off transcript is 55-nt long. E. coli RNAP core and σ70 were used in these experiments. The point of dMyx addition was varied as indicated in the reaction schematic. In reactions 1-4, only three substrate NTPs were present, whereas UTP was omitted to allow formation of the transcription elongation complex halted prior to addition of UMP at position 30 (A29). In reaction 5, all four NTPs were present, allowing formation of the full-length, run-off RNA. The reactions were quenched and analyzed as in Fig. 56. We conclude that dMyx acts before the open promoter complex formation and cannot inhibit transcription if added after the stable open complex (RPo↔) is formed (reaction 4). The "intermediate" effect in reaction 3 (dMyx added to RPo) is likely due to the relative (e.g., compared to λPR) instability of RPo formed at the T7A1 promoter; these complexes are at equilibrium with dMyx-sensitive closed and intermediate complexes. Thus, to form a "stable" RPo↔ complex, we added ApU primer to the RPo in reaction

Fig. 68. dMyx alters the contacts between RNAP and λPR promoter DNA.

A linear DNA fragment encompassing positions -81 through +70 of the λPR promoter was generated by PCR; the non-template DNA strand was end-labeled with [32P]-αATP. The sequence from -44 to +23 is shown. The -35 and -10 hexamers are indicated by black boxes, the start site (+1) is shown by a black dot. Top panel shows probing of the transcription bubble by piperidine-induced cleavage of the permanganate-modified T residues (indicated next to the gel). The modification pattern is summarized above the promoter sequence where black and white arrows indicate high and low reactivity, respectively. Bottom panel shows protection of the non-template DNA strand from DNaseI digestion. The footprint boundaries within the promoter region shown are indicated on the gel and by black (RNAP alone) and white (RNAP with dMyx) bars below the promoter sequence; the dideoxy-sequencing ladder is shown for reference. In the gels shown, independent reaction repeats were analyzed for consistency.

Fig. 69. Myxopyronin inhibits transcription from both the natural, double-stranded (left) and artificially melted (right) λPR promoter templates.

The melted region is indicated. Synthesis of ApUpG abortive RNA was followed as a function of the inhibitor concentration shown below each gel; the assays were carried out with the E. coli RNAP holoenzyme as described in Materials and Methods.

Altogether, our data support a steric occlusion mechanism, in which local dMyx-stabilized refolding of ~20 RNAP residues blocks formation of the open complex by triggering displacement of the downstream DNA and inhibiting the strand separation at the transcription start site (i+1). Additional allosteric effects (for example, on the clamp opening/closing) of dMyx binding cannot be ruled out, but our study fails to reveal any indications of their importance in antibiotic action. A more detailed understanding of the mechanism awaits the high resolution structures of the RNAP open complex with and without dMyx. Elucidation of the dMyx-binding determinants and the mechanism of its action would guide rational design of more potent dMyx derivatives.

Our results, together with conformational transitions in SW2 observed in eukaryotic RNAP (6,7), suggest that this region is inherently flexible and may influence the open complex formation in the absence of inhibitors. We tested this hypothesis using the SW2 variants with altered refolding properties (see above). At the λPR promoter, the wild-type RNAP formed a stable complex in which the +2 position is accessible (Fig. 67). In contrast, two mutants lacking the Phe338 side chain (that likely stabilizes the original SW2 configuration) demonstrated prominent phenotypes. The Δ338-341 enzyme alone behaves as the wild-type RNAP in the presence of dMyx: DNA melting at +2 is blocked (Fig. 70A), the DNaseI footprint is shortened (Fig. 70B), and the complex is heparin sensitive (Fig. 71). In F338A RNAP, the pattern of permanganate sensitivity is shifted further upstream: the +2 position is not melted, and the -10 position becomes unprotected (Fig. 70). In contrast, β’Δ309-325, which removes the entire rudder loop (which is inserted in the same helix as switch-2, but is unlikely to interfere with the nucleic acids), has no effect on DNA melting, suggesting that a melting defect of a different rudder deletion (286) might be due to changes in the adjacent switch-2 instead. Remarkably, addition of dMyx shifted all complexes into the same state (Fig. 70).

Fig. 70. Footprinting analysis of the RNAP variants with changes in the SW2 regions.

Experiments were performed essentially as in Fig. 68.

A. Potassium permanganate footprinting: WT and mutant RNAPs differ in their patterns of reactivity in the absence of dMyx (top traces) but are nearly identical in the presence of 10 mM dMyx (bottom traces). Notably, β’ Δ309–325 that removes the entire rudder loop (which is inserted in the same helix as SW2, but is unlikely to interfere with the nucleic acids) has no effect on DNA melting.

B. DNaseI footprinting analysis of promoter complexes formed by SW2 variants. The downstream footprint boundaries are indicated by black or white circles. The upstream boundary (-42) is the same in all cases. The β’Δ334–335 variant is resistant to dMyx, whereas the β’Δ338–341 and F338A variants are hypersensitive to the inhibitor.

Fig. 71. DNaseI footprinting analysis of promoter complexes formed by SW2 variants.

A linear PCR fragment encompassing positions -81 through +70 of the λPR promoter was generated with the [32P]-labeled non-template DNA strand. The promoter fragment was incubated with the RNAP variant indicated above each set of lanes in the presence of 10 µM dMyx or in the absence of the inhibitor (but with the EtOH/DMSO mixture that is used as the dMyx solvent); the leftmost lane contains no RNAP. After 15 min incubation at 37°C, heparin was added to 10 µg/ml (where indicated) for 1 min, followed by DNaseI digestion (0.01 U for 1 min). The downstream footprint boundaries are indicated by black (∆338-341 RNAP with and without dMyx, other enzymes alone) or white (all RNAPs + dMyx) circles. The upstream boundary (-42) is the same in all cases. In the gel shown, independent reaction repeats were analyzed for consistency.

The observed effect at -10 might originate from a hypothetical allosteric connection between the SW2 and the nearby clamp helices, which serve as the major σ-binding site and whose repositioning may be required for proper σ/base interactions with the -10 element in the non-template strand; the F338A mutation may block this putative shift, while the dMyx binding would restore it.

We propose that dMyx stabilizes an intermediate (I2, Fig. 72) in which the DNA strand separation is nucleated (by the action of σ) but does not extend to the active site. dMyx likely traps the SW2 in the refolded state, blocking DNA melting at the downstream (register +2) and repositioning the clamp helices at the upstream (register -10) edges of the bubble. The clamp helices serve as the major σ-binding site and their movement may be required for proper σ/base interactions with the -10 element in the non-template strand; the F338A mutation may block this shift, while the dMyx binding would restore it (Fig. 70). Further bubble propagation may require structural transitions of the SW2 and is inhibited by Δ338-341. By this reasoning, the state that predominates in F338A may be an earlier pathway state, or a different state in which σ/base contacts or stacking is altered.

We cannot definitively prove that the SW2 state observed in the presence of

dMyx resembles a "physiological" (dMyx-independent) intermediate, as opposed to an

antibiotic-induced dead-end complex. However, several arguments favor this

interpretation. First, dMyx binding likely requires SW2 refolding because even a partial

insertion of dMyx into its binding site, which in principle could initiate the SW2

refolding, is blocked in its original configuration (Fig. 65); consistently, substitutions that

favor refolding confer hypersensitivity (see above). Thus, to induce the alternate SW2

conformation, dMyx presumably should first bind somewhere near (but not overlapping

with) its major binding site and interact with SW2 to promote its refolding. Given that

dMyx is hydrophobic, this putative “pre-insertion” site should be largely hydrophobic

and complement in shape to some unique portion of the antibiotic to distinguish it from

other compounds. However, our modeling does not reveal any site that could play such

187 a role. In particular, the RNAP surface near the SW2 segment at the entrance of the major dMyx binding site appears substantially charged. Second, changes in the SW2 designed to promote its re-folding stabilize promoter complexes in states different from

RPo (Fig. 70) but active upon addition of substrates. In particular, β'Δ338-341 RNAP is

both dMyx-hypersensitive (Fig. 64B) and stabilized in the same (by footprinting criteria)

state as the dMyx-bound enzyme (Fig. 70); in other words, it mimics the effects of dMyx,

but reversibly. Similar intermediates trapped by altering reaction conditions (10,13,15) or a

large deletion in β subunit (59) are commonly referred to as on-pathway states since they

can give rise to active complexes.


Discussion

The detailed understanding of the effects of the substitutions used in this work requires further structural and biochemical characterization. However, our present

findings support a mechanism of open complex formation that involves sequential

bending of the promoter DNA upstream and downstream of the RNAP main channel as

the pivotal transitions. The upstream bend is thought to originate from the σ-

subunit/DNA contacts and induces upstream DNA melting ~60Å away from the

catalytic center (51,67,284). In contrast, the subsequent step is likely σ-independent and

is set near the active site, where interactions of DNA with the core enzyme introduce

a sharp kink in the template strand (100,221) that may facilitate opening of the DNA

duplex at the downstream edge, thereby finalizing the opening of the transcription

bubble (Fig. 72). In this process, the SW2 might serve as a molecular checkpoint that can

permit or restrict DNA loading into the active site in response to regulatory signals (yet

unknown) or antibiotics (dMyx, corallopyronin, and perhaps others).

Thus, by suggesting an active role for the core enzyme in open complex

formation, our data provide an important addition to the traditional “σ-centric” model,

in which the σ-subunit is entirely responsible for the DNA strand separation, whereas

the core enzyme serves as a scaffold and becomes active only at a later, catalytic step.

Furthermore, while the mechanisms of initiation of DNA melting are vastly different

between bacterial and eukaryotic enzymes, the final (core-dependent) step of the DNA

melting and the template strand loading into the active site is likely fundamentally

conserved, in order to give rise to essentially identical “final” states observed in the

structures of the active transcription complexes.


Contributions: Anastasia Sevostiyanova performed DNaseI and KMnO4 footprinting

analysis of the initiation complexes formed in the presence of dMyx by the wild-type

and altered RNAPs on different promoters, analyzed sensitivity of initiation complexes

formed in the presence of dMyx by the wild-type and altered RNAPs and performed

initiation assays on artificially melted promoters. Irina Artsimovitch carried out vector

construction and performed biochemical assays. Georgiy Belogurov constructed,

purified and analyzed the properties of mutationally altered RNAPs. Dmitriy Vassylyev

determined, refined, and analyzed the structure.


Fig. 72. Step-by-step schematic of open complex formation.

Formation of the transcriptionally-active open promoter complex (RPo) from an initial closed complex (RPc) proceeds through several kinetic intermediates (I1, I2,...), which differ in the RNAP/DNA interactions and the state of the transcription bubble (57,67).

Binding of dMyx apparently traps an intermediate transcription complex, in which the

DNA melting is blocked prematurely, upstream of the transcription start site.

Conversely, dMyx cannot bind to RPo in which the DNA strand separation is complete.
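The logic of the scheme in Fig. 72 can be illustrated with a toy kinetic model. The sketch below is not a fit to any data in this work: it assumes an irreversible linear pathway RPc → I1 → I2 → RPo with first-order steps, the rate constants k1, k2, and k3 are arbitrary illustrative values, and dMyx is modeled simply as setting the rate of the I2 → RPo step to zero.

```python
def simulate_pathway(k1, k2, k3, t_end=60.0, dt=0.001):
    """Euler integration of RPc -> I1 -> I2 -> RPo (irreversible, first order).

    Returns the fraction of promoter complexes in each state at t_end.
    """
    rpc, i1, i2, rpo = 1.0, 0.0, 0.0, 0.0
    for _ in range(round(t_end / dt)):
        f1 = k1 * rpc * dt  # RPc -> I1 flux during this time step
        f2 = k2 * i1 * dt   # I1 -> I2 flux
        f3 = k3 * i2 * dt   # I2 -> RPo flux (the step assumed to be dMyx-sensitive)
        rpc -= f1
        i1 += f1 - f2
        i2 += f2 - f3
        rpo += f3
    return {"RPc": rpc, "I1": i1, "I2": i2, "RPo": rpo}


# Hypothetical rate constants (s^-1), chosen only for illustration.
free = simulate_pathway(k1=1.0, k2=0.5, k3=0.2)
trapped = simulate_pathway(k1=1.0, k2=0.5, k3=0.0)  # dMyx modeled as k3 = 0
```

Under these assumptions, nearly all complexes reach RPo in the uninhibited run, whereas in the inhibited run they accumulate in I2 and no RPo is formed, which is the behavior that the footprinting data are interpreted to reflect.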


Materials and Methods

DATA COLLECTION

Space group P65

Unit cell parameters (Å) a = b = 235.0, c = 255.0

Resolution (Å) 40.0-2.7 (2.80 – 2.70)*

Reflections (Total/Unique) 1102413/212777

I/σ(I) 14.3 (2.5)

Rmerge (%) 8.4 (44.4)

Completeness (%) 97.7 (93.2)

REFINEMENT

Space group¶ P32

Twinning (%) 50.0

Twinning operator -h,-k,l

Resolution (Å) 40.0–2.7 (2.80 – 2.70)

Reflections used 422394

Rfactor (%) 24.0 (29.9)

Rfree (%) 27.0 (32.3)

Overall B-factor/RMSD (Å2) 57.6/1.6

Cross-validated sigma-A coordinate error (Å) 0.44


Number of protein atoms 52790

Number of water molecules 4496

Number of dMyx atoms 60

Number of Zn2+ ions 4

Number of Mg2+ ions 2

MODEL QUALITY

RMSD bond length (Å) 0.015

RMSD bond angles (°) 1.99

RMSD improper angles (°) 1.22

RAMACHANDRAN PLOT

Number of residues (%)

Most favorable regions 84.6

Allowed region 15.1

Generously allowed regions 0.3

Disallowed region 0.0

Table 6. Collection of structural data and refinement statistics

$R_{merge} = \sum_{hkl}\sum_{j} |I_j(hkl) - \langle I(hkl)\rangle| \,/\, \sum_{hkl}\sum_{j} \langle I(hkl)\rangle$, where $I_j(hkl)$ and $\langle I(hkl)\rangle$ are the intensity of measurement j and the mean intensity for the reflection with indices hkl, respectively.

$R_{factor,free} = \sum_{hkl} \big||F_{calc}(hkl)| - |F_{obs}(hkl)|\big| \,/\, \sum_{hkl} |F_{obs}(hkl)|$, where the crystallographic R-factor and the free R-factor are calculated from the reflections included in and excluded from the refinement, respectively. The free reflections constituted 5% of the total number of reflections. RMSD – root mean square deviation.

I/σ(I) – ratio of mean intensity to a mean standard deviation of intensity. *The data for

the highest resolution shell are shown in brackets. ¶The refinement was first carried out in the P65 space group. However, although the procedure of zonal scaling provided a substantially better match between the experimental and model structure factor amplitudes, we were still unable to obtain an R-factor below 35.6% at 2.7Å resolution, whereas the electron density (ED) map remained quite noisy albeit showing clear ED in the protein and

“omit” regions. Inability to improve the R-factor and to provide a high quality ED suggested that the data are likely affected by merohedral twinning, as was also observed in other projects in our lab (5,36,101,140,287,288). Indeed, the calculations of the intensity statistics proposed by Yeates (289) and implemented in the CNS program (290) indicated the presence of perfect merohedral twinning, thereby suggesting that the proper space group of the crystals is P32, rather than P65. At the same time, the fact that we were still

able to obtain an interpretable ED using the phases calculated in the P65 space group led us to conclude that, as observed previously (5,36,101,140,287,288), merohedral twinning mimicking the P65 space group is likely coupled with the non-crystallographic

symmetry that also closely resembles the P65 crystallographic symmetry operators.

Therefore, to obtain a high-quality ED map, we carried out the twinning refinement in the P32 space group using the CNS program (290). For this, before the

refinement we have expanded the crystallographic data processed in the P65 space group to that of P32 and have generated the two molecules initially related by the

corresponding crystallographic symmetry operator. The rigid body twinning refinement

using the two molecules in the P32 space group converged from the initial value of 35.6%

to ~30.7% at 2.7Å resolution. The resulting |2Fobs – Fcalc| ED map was of substantially better quality than the one obtained for the P65 space group and allowed us to improve

the model and to easily refine the structure to the crystallographic standards

corresponding to the 2.7Å resolution data.
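The effect of perfect merohedral twinning on intensity statistics can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the procedure used in this work: it takes the second moment ⟨I²⟩/⟨I⟩² as the test statistic (the text does not state which statistic was evaluated), draws simulated acentric intensities from an exponential (Wilson) distribution, and treats twin-related reflections as statistically independent. The twin fraction of 0.5 corresponds to the value reported in Table 6.

```python
import random


def second_moment_ratio(intensities):
    """Return <I^2> / <I>^2 for a list of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)


def twinned_intensity(i_a, i_b, alpha):
    """Observed intensity when a reflection overlaps its twin mate.

    alpha is the twin fraction; alpha = 0.5 is perfect twinning.
    """
    return (1.0 - alpha) * i_a + alpha * i_b


rng = random.Random(0)  # fixed seed so the simulation is reproducible
n = 200_000

# Simulated acentric intensities (exponential distribution, mean 1).
true_a = [rng.expovariate(1.0) for _ in range(n)]
true_b = [rng.expovariate(1.0) for _ in range(n)]  # twin-related mates

untwinned_ratio = second_moment_ratio(true_a)
perfect_twin_ratio = second_moment_ratio(
    [twinned_intensity(a, b, 0.5) for a, b in zip(true_a, true_b)]
)
```

For untwinned acentric data the ratio is expected to be close to 2.0, whereas averaging two independent intensities with equal weights lowers it to approximately 1.5. With a twin fraction of exactly 0.5 the two twin-related observations are identical, so the data cannot be detwinned algebraically and the twinning must instead be modeled during refinement.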

Single Nucleotide Addition Initiation Assay

A mix containing linear λPR promoter (which is recognized efficiently by either enzyme) template obtained by PCR amplification of pIA226 (2 nM), ApU (100 µM), and


[α32P]-GTP in 20 mM Tris-acetate, 20 mM Na-acetate, 2 mM Mg-acetate, 5% glycerol, 1

mM DTT, 0.1 mM EDTA, pH 7.9 was pre-equilibrated at the target temperature and added to the

RNAP/inhibitor mixture, followed by a 15-min incubation at 37 °C (E. coli) or 55 °C (T.

thermophilus). The reaction was quenched by addition of an equal volume of saturated

urea in 90 mM Tris-borate, pH 8.3, 50 mM EDTA. Products were analyzed on a 7 M urea,

12% (w/v) acrylamide:bisacrylamide (19:1) denaturing gel.

Isolation and Assay of Mutant E. coli RNAPs

Core wild-type and mutationally altered RNAPs were purified as described previously (197), except that the ionic strength was maintained at or above 0.2 M at all chromatographic steps. Overexpression plasmids for β' R339A (pIA830), βE1279A

(pIA870), βS1322E (pIA878), β' G336A (pIA880), β' F338A (pIA881), β' K345A (pIA882), β'

∆333-335G (pIA883), and β' R337A (pGB055) were constructed by site-directed mutagenesis and the sequenced fragments were recloned into pVS10-based vectors(197).

Holoenzymes were reconstituted with a two-fold molar excess of σ70. For steady-state abortive initiation assays, holo RNAP (20 nM) in 16 µl of 20 mM Tris-acetate, 20 mM Na-acetate, 2 mM Mg-acetate, 5% glycerol, 1 mM DTT, 0.1 mM EDTA, pH 7.9, 1 µM σ70, was supplemented with the desired concentration of dMyx (2 µl) and incubated for 15 min at 37 °C. Transcription was initiated by adding linear T7A1 promoter template (100 nM),

ApU (200 µM), CTP (25 µM) and 3 µCi [α32P]-CTP (final reaction volume 20 µl).

Reactions were allowed to proceed for 15 min at 37 °C and quenched by addition of an

equal volume of saturated urea in 90 mM Tris-borate, pH 8.3, 20 mM EDTA. Products

were analyzed on 7 M urea, 12% (w/v) acrylamide:bisacrylamide (19:1) denaturing gels

and RNA quantities were determined from Phosphorimager scans of the gels. dMyx IC50 values for the wild-type and variant RNAPs were determined by fitting the concentration dependencies to a hyperbolic function. The assay was repeated at least three times for each variant tested.
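The IC50 determination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the fitting procedure used in this work: the text specifies only a "hyperbolic function", so the particular form A(c) = A0 / (1 + c/IC50), the grid-search least-squares routine (`fit_ic50`), and the titration values are all hypothetical.

```python
import math


def hyperbola(conc, a0, ic50):
    """Assumed inhibition model: the signal is halved when conc == ic50."""
    return a0 / (1.0 + conc / ic50)


def _best_a0_and_sse(concs, signals, ic50):
    """For a fixed IC50 the model is linear in a0, so a0 has a closed form."""
    g = [1.0 / (1.0 + c / ic50) for c in concs]
    a0 = sum(y * gi for y, gi in zip(signals, g)) / sum(gi * gi for gi in g)
    sse = sum((y - a0 * gi) ** 2 for y, gi in zip(signals, g))
    return a0, sse


def fit_ic50(concs, signals, lo=1e-3, hi=1e3, rounds=6, points=41):
    """Least-squares fit by repeated log-spaced grid search over IC50."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    best = None
    for _ in range(rounds):
        step = (log_hi - log_lo) / (points - 1)
        grid = [log_lo + i * step for i in range(points)]
        scored = [(_best_a0_and_sse(concs, signals, 10 ** x), x) for x in grid]
        (a0, _sse), x_best = min(scored, key=lambda s: s[0][1])
        best = (a0, 10 ** x_best)
        # Narrow the search window around the current best grid point.
        log_lo, log_hi = x_best - step, x_best + step
    return best


# Hypothetical titration (µM inhibitor, arbitrary signal units); not data from this work.
concs = [0.0, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
signals = [hyperbola(c, 100.0, 2.0) for c in concs]
a0_fit, ic50_fit = fit_ic50(concs, signals)
```

On these noise-free synthetic points the routine recovers the parameters used to generate them. With real Phosphorimager data, a standard nonlinear least-squares package would normally be preferred, and the fitted IC50 values from replicate assays would be averaged.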


Footprinting Analysis

A linear 153-bp DNA fragment containing the λ PR promoter was made by PCR

amplification using pIA226(291) as a template with primers 17 (5'-

CGTTAAATCTATCACCGCAAGGG) and 138 (5'-ATCGCCTGAAAGACTAGTCAGG).

The top (non-template) DNA strand primer (#17) was end-labeled with [32P]-γATP

(Perkin Elmer) and PNK (Epicentre) and purified using G-50 spin columns (GE Healthcare).

PCR products were gel-purified using a kit (Promega). Sequencing reactions were performed using the same labeled primer with the SequiTherm kit (Epicentre). For DNaseI protection experiments, wild-type holo E. coli RNAP (400 nM) was pre-incubated with

1 µM Myx or an equal volume of 50% ethanol for 15 min at 37 ºC in GBB buffer (20 mM Tris-HCl, 14 mM MgCl2, 20 mM NaCl, 5% glycerol, 1 mM DTT, 0.1 mM EDTA, pH 7.9) supplemented with 1 mM CaCl2. The labeled λ PR promoter fragment was added (at 20 nM) and the reaction was incubated for an additional 10 minutes. Samples were shifted to room temperature (22 °C) and treated with 0.002 U of DNaseI (Roche, 10 U/µl) for 1 min.

The reaction was stopped by addition of an equal volume of buffer containing 15 mM EDTA

and 8 M urea. For potassium permanganate probing, holo RNAP (400 nM) was

pre-incubated with 1 µM myxopyronin or an equal volume of 50% ethanol:0.5% DMSO for 15 min

at 37 ºC in GBB buffer without reducing agents. The labeled λ PR promoter fragment was

added (at 20 nM), and the reaction was incubated for an additional 10 minutes. Samples

were shifted to room temperature and treated with KMnO4 at a final concentration of 10

mM for 60 sec. The reaction was stopped by addition of 5x stop buffer (1.5 M NaAc pH 5.2,

80 mM EDTA, 6 M β-mercaptoethanol); samples were subjected to phenol-chloroform

extraction and precipitated with ethanol. The pellet was dissolved in 20 µl of water and

incubated with 100 µl of 0.5 M piperidine at 95 ºC for 20 min. After another round of

ethanol precipitation, DNA was dissolved in 96% formamide. Samples were heated at

95ºC for 3 minutes and analyzed on 7 M urea, 8% (w/v) acrylamide:bisacrylamide (19:1)

denaturing gels.


Chapter 7: Conclusions and perspectives

For many years, promoter utilization by RNAPs has been thought to be a major, if not the only, regulatory target present in all kingdoms of life. Consequently, studies of transcription regulation by accessory proteins were focused on those factors that help or hinder RNAP binding to a promoter and the consequent formation of a productive initiation complex. However, it has become increasingly clear that the ability of RNAP to synthesize long RNA messages and to cross-talk with other cellular processes is equally important for a proper gene expression program in all organisms. To carry out these tasks,

RNAP recruits numerous transcription elongation factors that can act as general regulators, which affect expression of many genes under a variety of conditions, or carry out highly specific functions affecting only a handful of genes. These factors act at post-initiation steps to alter RNAP propensity to pause, arrest, or terminate synthesis of a nascent RNA.

In bacteria, RNAP pausing at specific sites has been shown to mediate attenuation, RNA folding, termination, recruitment of transcription factors, coupling of transcription to translation, etc. (85,90,116,153). In eukaryotes, pausing at promoter-

proximal sites was discovered more than 25 years ago (295) but has been seen as an

exceptional mode of regulation for a small number of inducible genes. In the last few

years, however, a number of whole-genome ChIP-chip screening studies revealed

widespread stalling of pre-activated Pol II just downstream of the promoter (298-300,302,306).

Permanganate probing of the selected genes and a global analysis of short

RNAs confirmed that most of these sites indeed contain stalled elongating complexes


(297,298). By a modest estimation, at least 10% of genes in Drosophila (298) and up to 30% in humans (299,300) experience transcription initiation but no detectable elongation due to promoter-proximal pausing, suggesting that RNAP stalling may be a common phenomenon (301). The well-characterized genes that follow this pattern include genes involved in development (302), hematopoiesis (303), and response to stimuli, such as heat shock (304) and infection (305). In zebrafish, the defects in transcriptional pausing have been shown to cause a developmental arrest due to aberrant splicing, which in turn resulted from uncoupling of the RNA synthesis and processing machineries (294).

Recently, a key regulator of cellular proliferation, c-Myc, has been shown to play a major role in pause release rather than in Pol II recruitment at its target genes (296).

Regulation at post-initiation steps likely affords a faster response to an environmental signal, because the paused RNAP has already bypassed the and the subsequent assembly of a large multi-component transcription initiation complex, the two steps that are pre-requisites for efficient initiation in eukaryotes (306,307). Thus, regulation at the level of elongation would confer an advantage when the gene expression must be reprogrammed rapidly − for example, during the heat shock. In Drosophila, an uninduced hsp70 gene is characterized by a very low level of transcription but a high Pol II occupancy within 50 nt downstream from the transcription start site; after a 10-min heat shock, Pol II can be detected throughout the body of hsp70 associated with elongation factors Paf1, Spt5, and Spt6 (308,309). On the other hand, stalled Pol II would preclude the passage of additional RNAP molecules, serving as a repressor. In some cases, such as the innate immune response, prevention of basal transcription is required: the uncontrolled production of proinflammatory cytokines can cause autoimmune diseases or even toxic shock syndrome (305).

Thus, RNAP pausing plays many essential roles in gene expression, from determining an overall rate of transcription to poising RNAP for recruitment of auxiliary factors. However, frequent pausing may hinder synthesis of long RNAs: pausing is a precursor to termination, and stochastic premature termination of RNA synthesis (a

polarity effect) can reduce expression of distal genes in bacterial operons by tens, or even hundreds of times (Fig. 48A). This problem is exacerbated in eukaryotes, where genes are typically much longer than bacterial operons. The processivity mechanisms that enable RNAP to reach the end of a transcription unit are still unknown.
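The scale of the polarity effect follows from simple compounding of small termination probabilities. The sketch below is a hypothetical illustration only: it assumes that RNAP encounters a series of independent sites, each with the same probability of premature termination, and the numbers of sites and the per-site probability are invented for the example rather than measured.

```python
def fraction_reaching(n_sites, p_term):
    """Fraction of RNAPs that pass n_sites independent sites,
    each of which terminates transcription with probability p_term."""
    return (1.0 - p_term) ** n_sites


def fold_reduction(n_sites, p_term):
    """Fold decrease in expression of a gene located beyond n_sites."""
    return 1.0 / fraction_reaching(n_sites, p_term)


# Hypothetical values: a 2% chance of termination at each site.
proximal = fold_reduction(n_sites=150, p_term=0.02)
distal = fold_reduction(n_sites=300, p_term=0.02)
```

Under these assumptions, expression drops by roughly 20-fold after 150 sites and by more than 400-fold after 300 sites, so even a modest per-site termination probability is sufficient to silence distal genes in a long transcription unit.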

Another class of enzymes that need to synthesize long nucleotide chains are the replicative DNA polymerases. These enzymes bind to accessory clamp subunits that lock around the DNA duplex to ensure the processivity of replication. Interestingly, no such factors have been described for RNAPs, which instead contain a built-in clamp composed of parts of the β and β’ subunits (1) that encircle the DNA in order to afford processivity. The clamp is one of the most flexible parts of RNAP: its position varies in structures of RNAP in different complexes from bacteria and yeast (5,6,7,221). Opening of the clamp has been proposed to occur during pausing and termination (41), where the loss of stable interactions with the nucleic acids mediates enzyme stalling or dissociation. Conversely, factors that lock the clamp in a closed state increase RNAP processivity.

My work has been focused on the NusG family of proteins which are ubiquitous from bacteria to humans. The common function of the NusG proteins is to control RNAP progression along the DNA through altering RNAP response to pause signals. Our studies suggest that NusG factors play two roles in transcriptional regulation. First, they control mobility of the RNAP clamp, and in turn RNAP progression along the DNA, pausing, and termination (Chapters 4 and 5). Second, they interact with a variety of other factors, e.g., to promote transcription-translational coupling (272) and mediate the formation of large nucleoprotein transcription complexes that act on specific operons to change the efficiency of their expression in bacteria (116). In eukaryotes, NusG homologs mediate interactions of the transcription complex with chromatin remodeling complexes and RNA processing enzymes that perform mRNA capping, splicing and polyadenylation (292,293).


NusG-like factors consist of at least two domains. The N-terminal domains directly bind to the largest subunit of RNAP, β’ in bacteria, via interactions with the

β’CH, whereas the C-terminal domains interact with other cellular components and serve as platforms for the assembly of nucleoprotein complexes. The general elongation factor NusG is associated with most operons in E. coli (129). The N-terminal domain is necessary and sufficient for the antipausing activity of NusG, while the C-terminal domain enhances Rho-dependent termination by tethering Rho to the TEC (175).

RfaH, an operon-specific paralog of NusG, targets only a few operons containing an ops sequence in their untranslated regions (82,234). The structure of the N-terminal domain of RfaH is almost superimposable on that of NusG; consistently, their effects on elongation are very similar (129,197). The structures of their C-terminal domains, however, could not be more different: two α-helices in RfaH and a β-sheet in

NusG. One of the outcomes of such a dramatic refolding is that RfaH does not interact with Rho; instead, the C-terminal domain forms an extensive hydrophobic interface with the N-domain, making its RNAP-binding site inaccessible in the absence of ops (197).

Spt5, a homolog of NusG in eukaryotes, is associated with most actively transcribed genes (123,310). Like all proteins from the NusG family, Spt5 binds to the

RNAP clamp helices (263) and reduces pausing both in vivo and in vitro (263,311). Spt5 has been also shown to interact directly with capping enzymes in yeast and Argonaute proteins in plants (250,312,313). In a complex with negative elongation factor (NELF) and

Spt4 co-factor, Spt5 induces promoter-proximal pausing of Pol II, acting as a transcriptional repressor. Transition from negative to positive regulation requires phosphorylation of Spt5 C-terminal repeats by P-TEFb kinase, the same factor that hyperphosphorylates the CTD of Pol II. Whereas the role of the latter modification in the processivity of transcription has been studied very extensively over the last decade, the regulatory effect of Spt5 phosphorylation has received very little attention until now.

Activation of Spt5 promotes dissociation of NELF from the paused complex and subsequent release of stalled Pol II to productive elongation (306). Once phosphorylated,

the Spt5/Spt4 complex reduces pausing, increases processivity of elongation and recruits capping enzymes − functions similar to those of the hyperphosphorylated CTD

(314,315).

Although our understanding of the role of pausing as a rate-limiting step in global transcription regulation is just beginning to emerge, it is clear that NusG-like factors play a central role in this mechanism. In fact, proteins from the NusG family are the only transcriptional factors that are present in all three domains of life (316). Thus, outcomes from studies of RNAP-NusG interaction in a simple bacterial system can have much broader applications. Current data suggest that NusG, RfaH, and Spt5 bind to the same site on RNAP (175,197,263) and help the enzyme to bypass some pause signals, thereby increasing its processivity. However, the detailed mechanism of their antipausing action is not well understood.

We used RfaH as a paradigm to understand the molecular mechanism of the

NusG factors. RfaH is recruited to the TEC through specific interactions with a DNA element called ops. Following recruitment, RfaH remains bound to RNAP and acts as an antiterminator by reducing pausing and termination. We aimed to determine the molecular determinants important for RfaH activity. Functional analysis of single substitutions in this domain reported here identifies three separate RfaH regions that mediate i) binding to ops, ii) binding to RNAP, and iii) antitermination modification of the TEC (Chapter 3, (264)).

We set out to study the role of the β’CH, which was proposed to serve as the binding site for RfaH based on structural modeling, as a regulatory target (197). The

β’CH has long been known to recruit the initiation factor σ to core RNAP. Interestingly,

both RfaH and σ interact with the non-template DNA strand in transcription complexes

and thus may interfere with each other’s activity. We showed that RfaH did not inhibit

transcription initiation but, once recruited to RNAP, abolished σ-dependent pausing

during elongation. We argue that this apparent competition is due to steric exclusion of

σ by RfaH that is stably bound to the non-template DNA and clamp helices, both of

which are necessary for the recruitment of σ to the transcription complex. Our findings highlight the key regulatory role played by the β’CH during both initiation and elongation (Chapter 2, (136)). The apparent exclusion of σ by RfaH, and by NusG (175),

might also provide a resolution to a long-standing controversy over σ retention during

elongation (23,139,207,217,227). Although σ can readily re-bind to TEC in vitro at

promoter-like sequences, this does not occur in vivo, as has been shown by a whole-

genome microarray analysis (137).

Our data, together with the report from Mooney et al., suggest that RfaH and

NusG together insulate TEC from σ during elongation, thereby enhancing RNAP

processivity. RfaH also sterically excludes NusG from the ops-containing operons (129)

to inhibit Rho-dependent termination (82,136). However, the general antipausing

activity of RfaH is independent from its competition with other factors. We argued that

this activity requires modification of RNAP into a fast, pause-resistant state. In support

of this idea, RNAP locked in a fast state by substitutions is resistant to RfaH action (210).

Initial analysis of RfaH mutants revealed that binding to the β’CH is necessary

but not sufficient for antipausing activity (Chapter 3, (264)). This suggested that an

additional set of interactions with the TEC is required for RNAP modification. We found

that three consecutive residues forming an “antitermination” HTT motif were

dispensable for binding but required for RfaH function. In the heterologous model of the

RfaH/TEC complex, the HTT motif is positioned near the βGL, a flexible module that

was proposed to control DNA loading during initiation (5), located on the opposite side

of the main channel from the clamp. To our surprise, deletion of the GL had little effect

on the initiation, elongation and termination properties of RNAP (Chapter 5 and data

not shown). However, it abolished the RfaH- and NusG-mediated increase in the rate of

RNA chain elongation. qRT-PCR analysis of RfaH-controlled operon expression driven

by the wild-type or GL-deletion RNAP in the presence or absence of RfaH confirmed the in vitro

observations. Finally, we showed that GL deletion does not abolish stimulation of Rho-

dependent termination by NusG, a function that is thought to be independent from

antipausing activity of NusG and based on the physical tethering of Rho to the TEC.

Based on our data, we propose that the NusG-like proteins restrict conformational mobility of RNAP by simultaneously binding to, and “gluing” together the two flexible RNAP modules to lock the clamp. Thus, binding of NusG (or RfaH) essentially restores a "missing wall" in the RNAP major channel. Formation of such a ring structure around nucleic acids by RNAP and NusG may functionally resemble the processivity clamp of replicases. Since the overall RNAP structure, and the mobile clamp in particular, is conserved in all organisms, we argue that this simple mechanism of antitermination modification can be employed by structurally diverse regulators in all kingdoms of life.

Despite the fact that different positions of the clamp have been observed in many

X-ray structures from different organisms, there is no experimental evidence that would connect movement of the clamp to any particular function in transcription. To further test our hypothesis, future single-molecule studies would be required to monitor the hypothetical opening and closing of the main channel (by measuring the distance between the jaws) and its correlation with pauses.

To assess directly what kind of conformational changes NusG-like proteins cause in RNAP, a tertiary structure of a TEC bound to RfaH or NusG would be required. Since no TEC structure is available for E. coli and RfaH is absent from T. thermophilus, for

which high-resolution structural data has been obtained, we wanted to investigate

whether T. thermophilus NusG can be used as a model for structural studies of this family

of regulators. We purified and performed the initial biochemical analysis of Tth NusG

(Chapter 4, (317)). We showed that Tth NusG slows down rather than facilitates

transcript elongation by its cognate RNAP. On the other hand, similarly to the E. coli

regulators, Tth NusG apparently binds near the upstream end of the transcription

bubble, competes with σA, and favors forward translocation by RNAP. Our data suggest

that the mechanism of NusG recruitment to RNAP is universally conserved even though

the regulatory outcomes among its homologs may appear distinct. A subsequent X-ray

analysis of the Tth TEC co-crystallized with the factor would greatly improve our model of antipausing modification of RNAP by NusG.

The clamp domain is also a target for different inhibitors that affect transcription initiation (268,318). In collaboration with the Vassylyev lab, we described the crystal structure of RNAP with an antibiotic that we propose traps the initiation complex in a partially melted state. Myx binds to the RNAP clamp domain and stabilizes refolding of the

β’ SW2 segment, which constitutes the wall of the main channel opposite the catalytic center and forms crucial contacts with the DNA template strand in the TEC. In the presence of

Myx, the α-helix of the SW2 is straightened, while its C-terminal portion unwinds and

refolds into a loop that would clash with the DNA template strand if melting propagated to

the transcription start site (Chapter 6, (268)). In our lab, we validated the Myx binding site

through mutational analysis of the SW2 region. We also showed that in the presence of

antibiotic, the open promoter complex pathway is interrupted in the middle: the

upstream part of the transcription bubble is formed, but DNA melting is not propagated

downstream to include the active site. We constructed SW2 variants with altered

refolding properties that were supposed to favor unfolded conformation. Footprinting

analysis revealed that open complex formation by these RNAP variants in the absence of

Myx is interrupted in the same manner as was observed for the wild-type RNAP

complexed with Myx. The existence of multiple ways to achieve the same conformation

(through substitutions or binding of the antibiotic) strongly suggested that the trapped

partially melted complex can correspond to a natural intermediate along the initiation

pathway. This finding provided important insights into the role of the core enzyme in

strand separation during initiation and illustrated how local conformational changes in

the RNAP structure may regulate the transcriptional cycle. Studying RNAP variants

with altered refolding properties of switch-2, we characterized a collection of mutants

defective presumably in the transition from I2 to RPo, each with a unique “footprint

signature” (Fig. 71). The structural diversity among complexes formed by these RNAP

variants (as judged by different footprint patterns) highlights the problem of correlation

between structural and kinetic intermediates. A thorough analysis of the kinetics of DNA

melting by SW2 mutants would allow us to establish how kinetic intermediates correspond to structural complexes in the mutants characterized by altered SW2 refolding properties.

The presented study has three major implications: it improves our understanding of the mechanism of pausing and its regulation (Chapters 4 and 5), highlights the role of

RNAP conformational mobility in the transcription cycle (Chapters 5 and 6), and provides a deeper insight into the role of the NusG family of factors in the regulation of

RNA synthesis in different organisms (Chapters 2, 3, and 4). In addition, we have developed many useful tools and provided the background for future studies of RfaH as a virulence factor, its role in the coupling of transcription, translation and secretion processes; and studies of the molecular evolution and structural-functional diversity within the NusG family. We have also made an important contribution to our understanding of the mechanism of transcription initiation and the role of clamp flexing at different steps of RNA synthesis.


References

1. Zhang G, Campbell EA, Minakhin L, Richter C, Severinov K and Darst SA. (1999) Crystal structure of Thermus aquaticus core RNA polymerase at 3.3 A resolution. Cell, 98, 811-824.

2. Iyer LM, Koonin EV and Aravind L. (2003) Evolutionary connection between the catalytic subunits of DNA-dependent RNA polymerases and eukaryotic RNA-dependent RNA polymerases and the origin of RNA polymerases. BMC Struct Biol, 3, 1.

3. Markovtsov V, Mustaev A and Goldfarb A. (1996) Protein-RNA interactions in the active center of transcription elongation complex. PNAS USA, 93, 3221-3226.

4. Zaychikov E, Martin E, Denissova L, Kozlov M, Markovtsov V, Kashlev M, Heumann H, Nikiforov V, Goldfarb A and Mustaev A. (1996) Mapping of catalytic residues in the RNA polymerase active center. Science, 273, 107-109.

5. Vassylyev DG, Sekine S, Laptenko O, Lee J, Vassylyeva MN, Borukhov S and Yokoyama S. (2002) Crystal structure of a bacterial RNA polymerase holoenzyme at 2.6 A resolution. Nature, 417, 712-719.

6. Gnatt AL, Cramer P, Fu J, Bushnell DA and Kornberg RD. (2001) Structural basis of transcription: an RNA polymerase II elongation complex at 3.3 A resolution. Science, 292, 1876-1882.

7. Cramer P, Bushnell DA and Kornberg RD. (2001) Structural basis of transcription: RNA polymerase II at 2.8 angstrom resolution. Science, 292, 1863-1876.

8. Strainic MG, Jr., Sullivan JJ, Velevis A and deHaseth PL. (1998) Promoter recognition by Escherichia coli RNA polymerase: effects of the UP element on open complex formation and promoter clearance. Biochemistry, 37, 18074-18080.

9. deHaseth PL, Zupancic ML and Record MT. (1998) RNA polymerase-promoter interactions: the comings and goings of RNA polymerase. J Bacteriol, 180, 3019-3025.

10. Busby S and Ebright RH. (1994) Promoter structure, promoter recognition, and transcription activation in prokaryotes. Cell, 79, 743-746.

11. Ross W, Gosink KK, Salomon J, Igarashi K, Zou C, Ishihama A, Severinov K and Gourse RL. (1993) A third recognition element in bacterial promoters: DNA binding by the alpha subunit of RNA polymerase. Science, 262, 1407-1413.

206

12. Benoff B, Yang H, Lawson CL, Parkinson G, Liu J, Blatter E, Ebright YW, Berman HM and Ebright RH. (2002) Structural basis of transcription activation: the CAP-alpha CTD-DNA complex. Science, 297, 1562-1566. 13. Minakhin L, Bhagat S, Brunning A, Campbell EA, Darst SA, Ebright RH and Severinov K. (2001) Bacterial RNA polymerase subunit omega and eukaryotic RNA polymerase subunit RPB6 are sequence, structural, and functional homologs and promote RNA polymerase assembly. PNAS USA, 98, 892-897. 14. Vrentas CE, Gaal T, Ross W, Ebright RH and Gourse RL. (2005) Response of RNA polymerase to ppGpp: requirement for the omega subunit and relief of this requirement by DksA. Genes Dev, 19, 2378-2387. 15. Chatterji D, Ogawa Y, Shimada T and Ishihama A. (2007) The role of the omega subunit of RNA polymerase in expression of the relA gene in Escherichia coli. FEMS Microbiol Lett, 267, 51-55. 16. Burgess RR, Travers AA, Dunn JJ and Bautz EK. (1969) Factor stimulating transcription by RNA polymerase. Nature, 221, 43-46. 17. Gross CA, Chan C, Dombroski A, Gruber T, Sharp M, Tupy J and Young B. (1998) The functional and regulatory roles of sigma factors in transcription. Cold Spring Harb Symp Quant Biol, 63, 141-155. 18. Fenton MS, Lee SJ and Gralla JD. (2000) Escherichia coli promoter opening and -10 recognition: mutational analysis of sigma70. Embo J, 19, 1130-1137. 19. Hsu LM. (2002) Promoter clearance and escape in prokaryotes. Biochim Biophys Acta, 1577, 191-207. 20. Revyakin A, Liu C, Ebright RH and Strick TR. (2006) Abortive initiation and productive initiation by RNA polymerase involve DNA scrunching. Science, 314, 1139-1143. 21. Hsu LM, Vo NV and Chamberlin MJ. (1995) Escherichia coli transcript cleavage factors GreA and GreB stimulate promoter escape and gene expression in vivo and in vitro. PNAS USA, 92, 11588-11592. 22. Hatoum A and Roberts J. (2008) Prevalence of RNA polymerase stalling at Escherichia coli promoters after open complex formation. 
Mol Microbiol, 68, 17-28. 23. Mooney RA, Darst SA and Landick R. (2005) Sigma and RNA polymerase: an on-again, off-again relationship? Mol Cell, 20, 335-345. 24. Ring BZ, Yarnell WS and Roberts JW. (1996) Function of E. coli RNA polymerase sigma 70 in promoter-proximal pausing. Cell, 86, 485-493. 25. Mooney RA, Artsimovitch I and Landick R. (1998) Information processing by RNA polymerase: recognition of regulatory signals during RNA chain elongation. J Bacteriol, 180, 3265-3275. 26. Vassylyev DG, Vassylyeva MN, Zhang J, Palangat M, Artsimovitch I and Landick R. (2007) Structural basis for substrate loading in bacterial RNA polymerase. Nature, 448, 163-168.

207

27. Erie DA, Yager TD and von Hippel PH. (1992) The single-nucleotide addition cycle in transcription: a biophysical and biochemical perspective. Annu Rev Biophys Biomol Struct, 21, 379-415. 28. Steitz TA. (1998) A mechanism for all polymerases. Nature, 391, 231-232. 29. Sosunov V, Sosunova E, Mustaev A, Bass I, Nikiforov V and Goldfarb A. (2003) Unified two-metal mechanism of RNA synthesis and degradation by RNA polymerase. Embo J, 22, 2234-2244. 30. Zenkin N, Yuzenkova Y and Severinov K. (2006) Transcript-assisted transcriptional proofreading. Science, 313, 518-520. 31. Sosunova E, Sosunov V, Kozlov M, Nikiforov V, Goldfarb A and Mustaev A. (2003) Donation of catalytic residues to RNA polymerase active center by transcription factor Gre. PNAS USA, 100, 15469-15474. 32. Surratt CK, Milan SC and Chamberlin MJ. (1991) Spontaneous cleavage of RNA in ternary complexes of Escherichia coli RNA polymerase and its significance for the mechanism of transcription. PNAS USA, 88, 7983-7987. 33. Orlova M, Newlands J, Das A, Goldfarb A and Borukhov S. (1995) Intrinsic transcript cleavage activity of RNA polymerase. PNAS USA, 92, 4596-4600. 34. Zhang J, Palangat M and Landick R. (2010) Role of the RNA polymerase trigger loop in catalysis and pausing. Nat Struct Mol Biol, 17, 99-104. 35. Landick R. (2004) Active-site dynamics in RNA polymerases. Cell, 116, 351-353. 36. Temiakov D, Zenkin N, Vassylyeva MN, Perederina A, Tahirov TH, Kashkina E, Savkina M, Zorov S, Nikiforov V, Igarashi N, Matsugaki N, Wakatsuki S, Severinov K and Vassylyev DG. (2005) Structural basis of transcription inhibition by antibiotic streptolydigin. Mol Cell, 19, 655-666. 37. Wang D, Bushnell DA, Westover KD, Kaplan CD and Kornberg RD. (2006) Structural basis of transcription: role of the trigger loop in substrate specificity and catalysis. Cell, 127, 941-954. 38. Abbondanzieri EA, Greenleaf WJ, Shaevitz JW, Landick R and Block SM. (2005) Direct observation of base-pair stepping by RNA polymerase. 
Nature, 438, 460-465. 39. Neuman KC, Abbondanzieri EA, Landick R, Gelles J and Block SM. (2003) Ubiquitous transcriptional pausing is independent of RNA polymerase backtracking. Cell, 115, 437-447. 40. Toulokhonov I, Zhang J, Palangat M and Landick R. (2007) A central role of the RNA polymerase trigger loop in active-site rearrangement during transcriptional pausing. Mol Cell, 27, 406-419. 41. Artsimovitch I and Landick R. (2000) Pausing by bacterial RNA polymerase is mediated by mechanistically distinct classes of signals. PNAS USA, 97, 7090-7095.

208

42. Tomsic M, Tsujikawa L, Panaghie G, Wang Y, Azok J and deHaseth PL. (2001) Different roles for basic and aromatic amino acids in conserved region 2 of Escherichia coli sigma(70) in the nucleation and maintenance of the single-stranded DNA bubble in open RNA polymerase-promoter complexes. J Biol Chem, 276, 31891-31896. 43. Paget MS and Helmann JD. (2003) The sigma70 family of sigma factors. Genome Biol, 4, 203. 44. Hengge-Aronis R. (2002) Recent insights into the general stress response regulatory network in Escherichia coli. J Mol Microbiol Biotechnol, 4, 341-346. 45. Typas A and Hengge R. (2005) Differential ability of sigma(s) and sigma70 of Escherichia coli to utilize promoters containing half or full UP-element sites. Mol Microbiol, 55, 250-260. 46. Ma J and Howe MM. (2004) Binding of the C-terminal domain of the alpha subunit of RNA polymerase to the phage mu middle promoter. J Bacteriol, 186, 7858-7864. 47. Neufing PJ, Shearwin KE, Camerotto J and Egan JB. (1996) The CII protein of bacteriophage 186 establishes lysogeny by activating a promoter upstream of the lysogenic promoter. Mol Microbiol, 21, 751-761. 48. Szalewska-Palasz A, Wegrzyn A, Obuchowski M, Pawlowski R, Bielawski K, Thomas MS and Wegrzyn G. (1996) Drastically decreased transcription from CII-activated promoters is responsible for impaired lysogenization of the Escherichia coli rpoA341 mutant by bacteriophage lambda. FEMS Microbiol Lett, 144, 21-27. 49. Roberts CW and Roberts JW. (1996) Base-specific recognition of the nontemplate strand of promoter DNA by E. coli RNA polymerase. Cell, 86, 495-501. 50. Marr MT and Roberts JW. (1997) Promoter recognition as measured by binding of polymerase to nontemplate strand oligonucleotide. Science, 276, 1258-1260. 51. Murakami KS, Masuda S, Campbell EA, Muzzin O and Darst SA. (2002) Structural basis of transcription initiation: an RNA polymerase holoenzyme-DNA complex. Science, 296, 1285-1290. 52. Kulbachinskiy A and Mustaev A. 
(2006) Region 3.2 of the sigma subunit contributes to the binding of the 3'-initiating nucleotide in the RNA polymerase active center and facilitates promoter clearance during initiation. J Biol Chem, 281, 18273-18276. 53. Murakami KS, Masuda S and Darst SA. (2002) Structural basis of transcription initiation: RNA polymerase holoenzyme at 4 A resolution. Science, 296, 1280-1284. 54. Mekler V, Kortkhonjia E, Mukhopadhyay J, Knight J, Revyakin A, Kapanidis AN, Niu W, Ebright YW, Levy R and Ebright RH. (2002) Structural organization of bacterial RNA polymerase holoenzyme and the RNA polymerase-promoter open complex. Cell, 108, 599-614. 55. Haugen SP, Ross W and Gourse RL. (2008) Advances in bacterial promoter recognition and its control by factors that do not bind DNA. Nat Rev Microbiol, 6, 507-519.

209

56. Sclavi B, Zaychikov E, Rogozina A, Walther F, Buckle M and Heumann H. (2005) Real- time characterization of intermediates in the pathway to open complex formation by Escherichia coli RNA polymerase at the T7A1 promoter. PNAS USA, 102, 4706-4711. 57. Craig ML, Tsodikov OV, McQuade KL, Schlax PE, Jr., Capp MW, Saecker RM and Record MT (1998) DNA footprints of the two kinetically significant intermediates in formation of an RNA polymerase-promoter open complex: evidence that interactions with start site and downstream DNA induce sequential conformational changes in polymerase and DNA. J Mol Biol, 283, 741-756. 58. Roe JH and Record MT (1985) Regulation of the kinetics of the interaction of Escherichia coli RNA polymerase with the lambda PR promoter by salt concentration. Biochemistry, 24, 4721-4726. 59. Severinov K and Darst SA. (1997) A mutant RNA polymerase that forms unusual open promoter complexes. PNAS USA, 94, 13481-13486. 60. Rutherford ST, Villers CL, Lee JH, Ross W and Gourse RL. (2009) Allosteric control of Escherichia coli rRNA promoter complexes by DksA. Genes Dev, 23, 236-248. 61. Tsujikawa L, Tsodikov OV and deHaseth PL. (2002) Interaction of RNA polymerase with forked DNA: evidence for two kinetically significant intermediates on the pathway to the final complex. PNAS USA, 99, 3493-3498. 62. Saecker RM, Tsodikov OV, McQuade KL, Schlax PE, Jr., Capp MW and Record MT (2002) Kinetic studies and structural models of the association of E. coli sigma(70) RNA polymerase with the lambdaP(R) promoter: large scale conformational changes in forming the kinetically significant intermediates. J Mol Biol, 319, 649-671. 63. Schroeder LA, Karpen ME and deHaseth PL. (2008) Threonine 429 of Escherichia coli sigma 70 is a key participant in promoter DNA melting by RNA polymerase. J Mol Biol, 376, 153-165. 64. Craig ML, Suh WC and Record MT (1995) HO. 
and DNase I probing of E sigma 70 RNA polymerase--lambda PR promoter open complexes: Mg2+ binding and its structural consequences at the transcription start site. Biochemistry, 34, 15624-15632. 65. Schroeder LA, Choi AJ and DeHaseth PL. (2007) The -11A of promoter DNA and two conserved amino acids in the melting region of sigma70 both directly affect the rate limiting step in formation of the stable RNA polymerase-promoter complex, but they do not necessarily interact. Nucleic Acids Res, 35, 4141-4153. 66. Schroeder LA, Gries TJ, Saecker RM, Record MT, Harris ME and DeHaseth PL. (2009) Evidence for a tyrosine- stacking interaction and for a short-lived open intermediate subsequent to initial binding of Escherichia coli RNA polymerase to promoter DNA. J Mol Biol, 385, 339-349. 67. Davis CA, Bingman CA, Landick R, Record MT and Saecker RM. (2007) Real-time footprinting of DNA in the first kinetically significant intermediate in open complex formation by Escherichia coli RNA polymerase. PNAS USA, 104, 7833-7838.

210

68. Helmann JD and deHaseth PL. (1999) Protein-nucleic acid interactions during open complex formation investigated by systematic alteration of the protein and DNA binding partners. Biochemistry, 38, 5959-5967. 69. Suh WC, Ross W and Record MT (1993) Two open complexes and a requirement for Mg2+ to open the lambda PR transcription start site. Science, 259, 358-361. 70. Sullivan JJ, Bjornson KP, Sowers LC and deHaseth PL. (1997) Spectroscopic determination of open complex formation at promoters for Escherichia coli RNA polymerase. Biochemistry, 36, 8005-8012. 71. Kontur WS, Saecker RM, Capp MW and Record MT (2008) Late steps in the formation of E. coli RNA polymerase-lambda P R promoter open complexes: characterization of conformational changes by rapid [perturbant] upshift experiments. J Mol Biol, 376, 1034-1047. 72. Arthur TM, Anthony LC and Burgess RR. (2000) Mutational analysis of beta '260-309, a sigma 70 binding site located on Escherichia coli core RNA polymerase. J Biol Chem, 275, 23113-23119. 73. Durniak KJ, Bailey S and Steitz TA. (2008) The structure of a transcribing T7 RNA polymerase in transition from initiation to elongation. Science, 322, 553-557. 74. Cheetham GM, Jeruzalmi D and Steitz TA. (1999) Structural basis for initiation of transcription from an RNA polymerase-promoter complex. Nature, 399, 80-83. 75. Cheetham GM and Steitz TA. (1999) Structure of a transcribing T7 RNA polymerase initiation complex. Science, 286, 2305-2309. 76. Erijman L and Clegg RM. (1998) Reversible stalling of transcription elongation complexes by high pressure. Biophys J, 75, 453-462. 77. Uptain SM and Chamberlin MJ. (1997) Escherichia coli RNA polymerase terminates transcription efficiently at rho-independent terminators on single-stranded DNA templates. PNAS USA, 94, 13548-13553. 78. Condon C, Liveris D, Squires C, Schwartz I and Squires CL. (1995) rRNA operon multiplicity in Escherichia coli and the physiological implications of rrn inactivation. 
J Bacteriol, 177, 4152-4156. 79. Landick R, Wang D and Chan CL. (1996) Quantitative analysis of transcriptional pausing by Escherichia coli RNA polymerase: his leader pause site as paradigm. Methods Enzymol, 274, 334-353. 80. Friedman DI and Court DL. (1995) Transcription antitermination: the lambda paradigm updated. Mol Microbiol, 18, 191-200. 81. Foster JE, Holmes SF and Erie DA. (2001) Allosteric binding of nucleoside triphosphates to RNA polymerase regulates transcription elongation. Cell, 106, 243-252. 82. Artsimovitch I and Landick R. (2002) The transcriptional regulator RfaH stimulates RNA chain synthesis after recruitment to elongation complexes by the exposed nontemplate DNA strand. Cell, 109, 193-203.

211

83. Roberts JW, Yarnell W, Bartlett E, Guo J, Marr M, Ko DC, Sun H and Roberts CW. (1998) Antitermination by bacteriophage lambda Q protein. Cold Spring Harb Symp Quant Biol, 63, 319-325. 84. Wickiser JK, Winkler WC, Breaker RR and Crothers DM. (2005) The speed of RNA transcription and metabolite binding kinetics operate an FMN . Mol Cell, 18, 49-60. 85. Pan T, Artsimovitch I, Fang XW, Landick R and Sosnick TR. (1999) Folding of a large ribozyme during transcription and the effect of the elongation factor NusA. PNAS USA, 96, 9545-9550. 86. Pan T and Sosnick T. (2006) RNA folding during transcription. Annu Rev Biophys Biomol Struct, 35, 161-175. 87. Henkin TM. (2008) Riboswitch RNAs: using RNA to sense cellular metabolism. Genes Dev, 22, 3383-3390. 88. Yakhnin H, Yakhnin AV and Babitzke P. (2006) The trp RNA-binding attenuation protein (TRAP) of Bacillus subtilis regulates translation initiation of ycbK, a gene encoding a putative efflux protein, by blocking ribosome binding. Mol Microbiol, 61, 1252-1266. 89. Grayhack EJ, Yang XJ, Lau LF and Roberts JW. (1985) Phage lambda gene Q antiterminator recognizes RNA polymerase near the promoter and accelerates it through a pause site. Cell, 42, 259-269. 90. Palangat M, Meier TI, Keene RG and Landick R. (1998) Transcriptional pausing at +62 of the HIV-1 nascent RNA modulates formation of the TAR RNA structure. Mol Cell, 1, 1033-1042. 91. Park NJ, Tsao DC and Martinson HG. (2004) The two steps of poly(A)-dependent termination, pausing and release, can be uncoupled by truncation of the RNA polymerase II carboxyl-terminal repeat domain. Mol Cell Biol, 24, 4092-4103. 92. Yonaha M and Proudfoot NJ. (1999) Specific transcriptional pausing activates polyadenylation in a coupled in vitro system. Mol Cell, 3, 593-600. 93. Nogues G, Kadener S, Cramer P, de la Mata M, Fededa JP, Blaustein M, Srebrow A and Kornblihtt AR. (2003) Control of alternative pre-mRNA splicing by RNA Pol II elongation: faster is not always better. 
IUBMB Life, 55, 235-241. 94. Robson-Dixon ND and Garcia-Blanco MA. (2004) MAZ elements alter transcription elongation and silencing of the fibroblast growth factor 2 exon IIIb. J Biol Chem, 279, 29075-29084. 95. Borukhov S, Sagitov V and Goldfarb A. (1993) Transcript cleavage factors from E. coli. Cell, 72, 459-466. 96. Laptenko O, Lee J, Lomakin I and Borukhov S. (2003) Transcript cleavage factors GreA and GreB act as transient catalytic components of RNA polymerase. Embo J, 22, 6322-6334.

212

97. Opalka N, Chlenov M, Chacon P, Rice WJ, Wriggers W and Darst SA. (2003) Structure and function of the transcription elongation factor GreB bound to bacterial RNA polymerase. Cell, 114, 335-345. 98. Komissarova N and Kashlev M. (1997) RNA polymerase switches between inactivated and activated states By translocating back and forth along the DNA and the RNA. J Biol Chem, 272, 15329-15338. 99. Komissarova N and Kashlev M. (1997) Transcriptional arrest: Escherichia coli RNA polymerase translocates backward, leaving the 3' end of the RNA intact and extruded. PNAS USA, 94, 1755-1760. 100. Kettenberger H, Armache, K.J. & Cramer, P. (2004) Complete RNA polymerase II elongation complex structure and its interactions with NTP and TFIIS. Mol Cell, 16, 955- 965. 101. Symersky J, Perederina A, Vassylyeva MN, Svetlov V, Artsimovitch I and Vassylyev DG. (2006) Regulation through the RNA polymerase secondary channel. Structural and functional variability of the coiled-coil transcription factors. J Biol Chem, 281, 1309-1312. 102. Rutherford ST, Lemke JJ, Vrentas CE, Gaal T, Ross W and Gourse RL. (2007) Effects of DksA, GreA, and GreB on transcription initiation: insights into the mechanisms of factors that bind in the secondary channel of RNA polymerase. J Mol Biol, 366, 1243-1257. 103. Erie DA, Hajiseyedjavadi O, Young MC and von Hippel PH. (1993) Multiple RNA polymerase conformations and GreA: control of the fidelity of transcription. Science, 262, 867-873. 104. Stepanova E, Lee J, Ozerova M, Semenova E, Datsenko K, Wanner BL, Severinov K and Borukhov S. (2007) Analysis of promoter targets for Escherichia coli transcription elongation factor GreA in vivo and in vitro. J Bacteriol, 189, 8772-8785. 105. Mason SW and Greenblatt J. (1991) Assembly of transcription elongation complexes containing the N protein of phage lambda and the Escherichia coli elongation factors NusA, NusB, NusG, and S10. Genes Dev, 5, 1504-1512. 106. Schmidt MC and Chamberlin MJ. 
(1987) nusA protein of Escherichia coli is an efficient transcription termination factor for certain terminator sites. J Mol Biol, 195, 809-818. 107. Gusarov I and Nudler E. (2001) Control of intrinsic transcription termination by N and NusA: the basic mechanisms. Cell, 107, 437-449. 108. Vogel U and Jensen KF. (1997) NusA is required for ribosomal antitermination and for modulation of the transcription elongation rate of both antiterminated RNA and mRNA. J Biol Chem, 272, 12265-12271. 109. Mah TF, Li J, Davidson AR and Greenblatt J. (1999) Functional importance of regions in Escherichia coli elongation factor NusA that interact with RNA polymerase, the bacteriophage lambda N protein and RNA. Mol Microbiol, 34, 523-537.

213

110. Gopal B, Haire LF, Gamblin SJ, Dodson EJ, Lane AN, Papavinasasundaram KG, Colston MJ and Dodson G. (2001) Crystal structure of the transcription elongation/anti-termination factor NusA from Mycobacterium tuberculosis at 1.7 A resolution. J Mol Biol, 314, 1087-1095. 111. Yang X, Molimau S, Doherty GP, Johnston EB, Marles-Wright J, Rothnagel R, Hankamer B, Lewis RJ and Lewis PJ. (2009) The structure of bacterial RNA polymerase in complex with the essential transcription elongation factor NusA. Embo J, 10, 997-1002. 112. Toulokhonov I, Artsimovitch I and Landick R. (2001) Allosteric control of RNA polymerase by a site that contacts nascent RNA hairpins. Science, 292, 730-733. 113. Cardinale CJ, Washburn RS, Tadigotla VR, Brown LM, Gottesman ME and Nudler E. (2008) Termination factor Rho and its cofactors NusA and NusG silence foreign DNA in E. coli. Science, 320, 935-938. 114. Zheng C and Friedman DI. (1994) Reduced Rho-dependent transcription termination permits NusA-independent growth of Escherichia coli. PNAS USA, 91, 7543-7547. 115. Sigmund CD and Morgan EA. (1988) Nus A protein affects transcriptional pausing and termination in vitro by binding to different sites on the transcription complex. Biochemistry, 27, 5622-5627. 116. Roberts JW, Shankar S and Filter JJ. (2008) RNA polymerase elongation factors. Annu Rev Microbiol, 62, 211-233. 117. Yakhnin AV, Yakhnin H and Babitzke P. (2008) Function of the Bacillus subtilis transcription elongation factor NusG in hairpin-dependent RNA polymerase pausing in the trp leader. PNAS USA, 105, 16131-16136. 118. Reay P, Yamasaki K, Terada T, Kuramitsu S, Shirouzu M and Yokoyama S. (2004) Structural and sequence comparisons arising from the solution structure of the transcription elongation factor NusG from Thermus thermophilus. Proteins, 56, 40-51. 119. Steiner T, Kaiser JT, Marinkovic S, Huber R and Wahl MC. 
(2002) Crystal structures of transcription factor NusG in light of its nucleic acid- and protein-binding activities. Embo J, 21, 4641-4653. 120. Guo M, Xu F, Yamada J, Egelhofer T, Gao Y, Hartzog GA, Teng M and Niu L. (2008) Core structure of the yeast spt4-spt5 complex: a conserved module for regulation of transcription elongation. Structure, 16, 1649-1658. 121. Zhou H, Liu Q, Gao Y, Teng M and Niu L. (2009) Crystal structure of NusG N-terminal (NGN) domain from Methanocaldococcus jannaschii and its interaction with rpoE''. Proteins, 76, 787-793. 122. Heinrich T, Schroder W, Erdmann VA and Hartmann RK. (1992) Identification of the gene encoding transcription factor NusG of Thermus thermophilus. J Bacteriol, 174, 7859-7863. 123. Hartzog GA, Wada T, Handa H and Winston F. (1998) Evidence that Spt4, Spt5, and Spt6 control transcription elongation by RNA polymerase II in Saccharomyces cerevisiae. Genes Dev, 12, 357-369.

214

124. Sullivan SL and Gottesman ME. (1992) Requirement for E. coli NusG protein in factor- dependent transcription termination. Cell, 68, 989-994. 125. Burova E, Hung SC, Sagitov V, Stitt BL and Gottesman ME. (1995) Escherichia coli NusG protein stimulates transcription elongation rates in vivo and in vitro. J Bacteriol, 177, 1388-1392. 126. Burns CM and Richardson JP. (1995) NusG is required to overcome a kinetic limitation to Rho function at an intragenic terminator. PNAS USA, 92, 4738-4742. 127. Burova E and Gottesman ME. (1995) NusG overexpression inhibits Rho-dependent termination in Escherichia coli. Mol Microbiol, 17, 633-641. 128. Burns CM, Nowatzke WL and Richardson JP. (1999) Activation of Rho-dependent transcription termination by NusG. Dependence on terminator location and acceleration of RNA release. J Biol Chem, 274, 5245-5251. 129. Belogurov GA, Mooney RA, Svetlov V, Landick R and Artsimovitch I. (2009) Functional specialization of transcription elongation factors. Embo J, 28, 112-122. 130. Chatzidaki-Livanis M, Coyne MJ and Comstock LE. (2009) A family of transcriptional antitermination factors necessary for synthesis of the capsular polysaccharides of Bacteroides fragilis. J Bacteriol, 191, 7288-7295. 131. Mogridge J, Mah TF and Greenblatt J. (1995) A protein-RNA interaction network facilitates the template-independent cooperative assembly on RNA polymerase of a stable antitermination complex containing the lambda N protein. Genes Dev, 9, 2831-2845. 132. Torres M, Balada JM, Zellars M, Squires C and Squires CL. (2004) In vivo effect of NusB and NusG on rRNA transcription antitermination. J Bacteriol, 186, 1304-1310. 133. Mogridge J, Mah TF and Greenblatt J. (1998) Involvement of boxA nucleotides in the formation of a stable ribonucleoprotein complex containing the bacteriophage lambda N protein. J Biol Chem, 273, 4143-4148. 134. Burmann BM, Luo X, Rosch P, Wahl MC and Gottesman ME. (2010) Fine tuning of the E. 
coli NusB:NusE complex affinity to BoxA RNA is required for processive antitermination. Nucleic Acids Res, 38, 314-326. 135. Burmann BM, Schweimer K, Luo X, Wahl MC, Stitt BL, Gottesman ME and Rosch P. (2010) A NusE:NusG complex links transcription and translation. Science, 328, 501-504. 136. Sevostyanova A, Svetlov V, Vassylyev DG and Artsimovitch I. (2008) The elongation factor RfaH and the initiation factor sigma bind to the same site on the transcription elongation complex. PNAS USA, 105, 865-870. 137. Mooney RA, Davis SE, Peters JM, Rowland JL, Ansari AZ and Landick R. (2009) Regulator trafficking on units in vivo. Mol Cell, 33, 97-108. 138. Bar-Nahum G and Nudler E. (2001) Isolation and characterization of sigma(70)- retaining transcription elongation complexes from Escherichia coli. Cell, 106, 443-451.

215

139. Kapanidis AN, Margeat E, Laurence TA, Doose S, Ho SO, Mukhopadhyay J, Kortkhonjia E, Mekler V, Ebright RH and Weiss S. (2005) Retention of transcription initiation factor sigma70 in transcription elongation: single-molecule analysis. Mol Cell, 20, 347-356. 140. Perederina A, Svetlov V, Vassylyeva MN, Tahirov TH, Yokoyama S, Artsimovitch I and Vassylyev DG. (2004) Regulation through the secondary channel--structural framework for ppGpp-DksA synergism during transcription. Cell, 118, 297-309. 141. Adelman K, La Porta A, Santangelo TJ, Lis JT, Roberts JW and Wang MD. (2002) Single molecule analysis of RNA polymerase elongation reveals uniform kinetic behavior. PNAS USA, 99, 13538-13543. 142. Sydow JF, Brueckner F, Cheung AC, Damsma GE, Dengl S, Lehmann E, Vassylyev D and Cramer P. (2009) Structural basis of transcription: mismatch-specific fidelity mechanisms and paused RNA polymerase II with frayed RNA. Mol Cell, 34, 710-721. 143. Park JS, Marr MT and Roberts JW. (2002) E. coli Transcription repair coupling factor (Mfd protein) rescues arrested complexes by promoting forward translocation. Cell, 109, 757-767. 144. Washburn RS, Wang Y and Gottesman ME. (2003) Role of E.coli transcription-repair coupling factor Mfd in Nun-mediated transcription termination. J Mol Biol, 329, 655-662. 145. Klumpp S and Hwa T. (2008) Stochasticity and traffic jams in the transcription of ribosomal RNA: Intriguing role of termination and antitermination. PNAS USA, 105, 18159-18164. 146. Winkler ME and Yanofsky C. (1981) Pausing of RNA polymerase during in vitro transcription of the tryptophan operon leader region. Biochemistry, 20, 3738-3744. 147. Yanofsky C. (1981) Attenuation in the control of expression of bacterial operons. Nature, 289, 751-758. 148. Chan CL and Landick R. (1993) Dissection of the his leader pause site by base substitution reveals a multipartite signal that includes a pause RNA hairpin. J Mol Biol, 233, 25-42. 149. Chan CL, Wang D and Landick R. 
(1997) Multiple interactions stabilize a single paused transcription intermediate in which hairpin to 3' end spacing distinguishes pause and termination pathways. J Mol Biol, 268, 54-68. 150. Artsimovitch I and Landick R. (1998) Interaction of a nascent RNA structure with RNA polymerase is required for hairpin-dependent transcriptional pausing but not for transcript release. Genes Dev, 12, 3110-3122. 151. Nudler E, Mustaev A, Lukhtanov E and Goldfarb A. (1997) The RNA-DNA hybrid maintains the register of transcription by preventing backtracking of RNA polymerase. Cell, 89, 33-41.

216

152. Rudd MD, Izban MG and Luse DS. (1994) The active site of RNA polymerase II participates in transcript cleavage within arrested ternary complexes. PNAS USA, 91, 8057-8061. 153. Proshkin S, Rahmouni AR, Mironov A and Nudler E. (2010) Cooperation between translating ribosomes and RNA polymerase in transcription elongation. Science, 328, 504-508. 154. Kireeva ML and Kashlev M. (2009) Mechanism of sequence-specific pausing of bacterial RNA polymerase. PNAS USA, 106, 8900-8905. 155. Park JS and Roberts JW. (2006) Role of DNA bubble rewinding in enzymatic transcription termination. PNAS USA, 103, 4870-4875. 156. Macdonald LE, Zhou Y and McAllister WT. (1993) Termination and slippage by bacteriophage T7 RNA polymerase. J Mol Biol, 232, 1030-1047. 157. von Hippel PH. (1998) An integrated model of the transcription complex in elongation, termination, and editing. Science, 281, 660-665. 158. Yarnell WS and Roberts JW. (1999) Mechanism of intrinsic transcription termination and antitermination. Science, 284, 611-615. 159. Martin FH and Tinoco I, Jr. (1980) DNA-RNA hybrid duplexes containing oligo(dA:rU) sequences are exceptionally unstable and may facilitate termination of transcription. Nucleic Acids Res, 8, 2295-2299. 160. Ryder AM and Roberts JW. (2003) Role of the non-template strand of the elongation bubble in intrinsic transcription termination. J Mol Biol, 334, 205-213. 161. Santangelo TJ and Roberts JW. (2004) Forward translocation is the natural pathway of RNA release at an intrinsic terminator. Mol Cell, 14, 117-126. 162. Toulokhonov I and Landick R. (2003) The flap domain is required for pause RNA hairpin inhibition of catalysis by RNA polymerase and can modulate intrinsic termination. Mol Cell, 12, 1125-1136. 163. Gusarov I and Nudler E. (1999) The mechanism of intrinsic transcription termination. Mol Cell, 3, 495-504. 164. Epshtein V, Cardinale CJ, Ruckenstein AE, Borukhov S and Nudler E. (2007) An allosteric path to transcription termination. 
Mol Cell, 28, 991-1001. 165. Larson MH, Greenleaf WJ, Landick R and Block SM. (2008) Applied force reveals mechanistic and energetic details of transcription termination. Cell, 132, 971-982. 166. Richardson JP. (2002) Rho-dependent termination and ATPases in transcript termination. Biochim Biophys Acta, 1577, 251-260. 167. Chen CY and Richardson JP. (1987) Sequence elements essential for rho-dependent transcription termination at lambda tR1. J Biol Chem, 262, 11292-11299.

217

168. Skordalakes E and Berger JM. (2003) Structure of the Rho transcription terminator: mechanism of mRNA recognition and helicase loading. Cell, 114, 135-146. 169. Skordalakes E and Berger JM. (2006) Structural insights into RNA-dependent ring closure and ATPase activation by the Rho termination factor. Cell, 127, 553-564. 170. Brennan CA, Dombroski AJ and Platt T. (1987) Transcription termination factor rho is an RNA-DNA helicase. Cell, 48, 945-952. 171. Thomsen ND and Berger JM. (2009) Running in reverse: the structural basis for translocation polarity in hexameric . Cell, 139, 523-534. 172. Kalarickal NC, Ranjan A, Kalyani BS, Wal M and Sen R. (2010) A bacterial transcription terminator with inefficient molecular motor action but with a robust transcription termination function. J Mol Biol, 395, 966-982. 173. Morgan WD, Bear DG, Litchman BL and von Hippel PH. (1985) RNA sequence and secondary structure requirements for rho-dependent transcription termination. Nucleic Acids Res, 13, 3739-3754. 174. Nehrke KW and Platt T. (1994) A quaternary transcription termination complex. Reciprocal stabilization by and NusG protein. J Mol Biol, 243, 830-839. 175. Mooney RA, Schweimer K, Rosch P, Gottesman M and Landick R. (2009) Two structurally independent domains of E. coli NusG create regulatory plasticity via distinct interactions with RNA polymerase and regulators. J Mol Biol, 391, 341-358. 176. Deaconescu AM, Chambers AL, Smith AJ, Nickels BE, Hochschild A, Savery NJ and Darst SA. (2006) Structural basis for bacterial transcription-coupled DNA repair. Cell, 124, 507-520. 177. Selby CP and Sancar A. (1993) Molecular mechanism of transcription-repair coupling. Science, 260, 53-58. 178. Witkin EM. (1994) Mutation frequency decline revisited. Bioessays, 16, 437-444. 179. Weisberg RA and Gottesman ME. (1999) Processive antitermination. J Bacteriol, 181, 359-367. 180. 
Das A, Pal M, Mena JG, Whalen W, Wolska K, Crossley R, Rees W, von Hippel PH, Costantino N, Court D, Mazzulla M, Altieri AS, Byrd RA, Chattopadhyay S, DeVito J and Ghosh B. (1996) Components of multiprotein-RNA complex that controls transcription elongation in Escherichia coli phage lambda. Methods Enzymol, 274, 374-402. 181. Mason SW, Li J and Greenblatt J. (1992) Host factor requirements for processive antitermination of transcription and suppression of pausing by the N protein of bacteriophage lambda. J Biol Chem, 267, 19418-19426. 182. Vieu E and Rahmouni AR. (2004) Dual role of boxB RNA motif in the mechanisms of termination/antitermination at the lambda tR1 terminator revealed in vivo. J Mol Biol, 339, 1077-1087.


183. Hung SC and Gottesman ME. (1995) Phage HK022 Nun protein arrests transcription on phage lambda DNA in vitro and competes with the phage lambda N antitermination protein. J Mol Biol, 247, 428-442.
184. Watnick RS, Herring SC, Palmer AG, 3rd and Gottesman ME. (2000) The carboxyl terminus of phage HK022 Nun includes a novel zinc-binding motif and a tryptophan required for transcription termination. Genes Dev, 14, 731-739.
185. Yarnell WS and Roberts JW. (1992) The phage lambda gene Q transcription antiterminator binds DNA in the late gene promoter as it modifies RNA polymerase. Cell, 69, 1181-1189.
186. Yang XJ, Hart CM, Grayhack EJ and Roberts JW. (1987) Transcription antitermination by phage lambda gene Q protein requires a DNA segment spanning the RNA start site. Genes Dev, 1, 217-226.
187. Nickels BE, Roberts CW, Roberts JW and Hochschild A. (2006) RNA-mediated destabilization of the sigma(70) region 4/beta flap interaction facilitates engagement of RNA polymerase by the Q antiterminator. Mol Cell, 24, 457-468.
188. Santangelo TJ, Mooney RA, Landick R and Roberts JW. (2003) RNA polymerase mutations that impair conversion to a termination-resistant complex by Q antiterminator proteins. Genes Dev, 17, 1281-1292.
189. King RA, Banik-Maiti S, Jin DJ and Weisberg RA. (1996) Transcripts that increase the processivity and elongation rate of RNA polymerase. Cell, 87, 893-903.
190. Oberto J, Clerget M, Ditto M, Cam K and Weisberg RA. (1993) Antitermination of early transcription in phage HK022. Absence of a phage-encoded antitermination factor. J Mol Biol, 229, 368-381.
191. Banik-Maiti S, King RA and Weisberg RA. (1997) The antiterminator RNA of phage HK022. J Mol Biol, 272, 677-687.
192. Sen R, King RA and Weisberg RA. (2001) Modification of the properties of elongating RNA polymerase by persistent association with nascent antiterminator RNA. Mol Cell, 7, 993-1001.
193. Komissarova N, Velikodvorskaya T, Sen R, King RA, Banik-Maiti S and Weisberg RA. (2008) Inhibition of a transcriptional pause by RNA anchoring to RNA polymerase. Mol Cell, 31, 683-694.
194. Irnov I and Winkler WC. (2010) A regulatory RNA required for antitermination of biofilm and capsular polysaccharide operons in Bacillales. Mol Microbiol.
195. Ponting CP. (2002) Novel domains and orthologues of elongation factors. Nucleic Acids Res, 30, 3643-3652.
196. Liu J, Pei H, Mei S, Li J, Zhou L and Xiang H. (2008) Replication initiator DnaA interacts with an anti-terminator NusG in T. tengcongensis. Biochem Biophys Res Commun.


197. Belogurov GA, Vassylyeva MN, Svetlov V, Klyuyev S, Grishin NV, Vassylyev DG and Artsimovitch I. (2007) Structural basis for converting a general transcription factor into an operon-specific virulence regulator. Mol Cell, 26, 117-129.
198. Ingham CJ, Dennis J and Furneaux PA. (1999) Autogenous regulation of transcription termination factor Rho and the requirement for Nus factors in Bacillus subtilis. Mol Microbiol, 31, 651-663.
199. Bailey MJ, Hughes C and Koronakis V. (1996) Increased distal gene transcription by the elongation factor RfaH, a specialized homologue of NusG. Mol Microbiol, 22, 729-737.
200. Nagy G, Dobrindt U, Schneider G, Khan AS, Hacker J and Emody L. (2002) Loss of regulatory protein RfaH attenuates virulence of uropathogenic Escherichia coli. Infect Immun, 70, 4406-4413.
201. Farewell A, Brazas R, Davie E, Mason J and Rothfield LI. (1991) Suppression of the abnormal phenotype of Salmonella typhimurium rfaH mutants by mutations in the gene for transcription termination factor Rho. J Bacteriol, 173, 5188-5193.
202. Carter HD, Svetlov V and Artsimovitch I. (2004) Highly divergent RfaH orthologs from pathogenic proteobacteria can substitute for Escherichia coli RfaH both in vivo and in vitro. J Bacteriol, 186, 2829-2840.
203. Rees WA, Weitzel SE, Yager TD, Das A and von Hippel PH. (1996) Bacteriophage lambda N protein alone can induce transcription antitermination in vitro. PNAS USA, 93, 342-346.
204. Gruber TM and Gross CA. (2003) Multiple sigma subunits and the partitioning of bacterial transcription space. Annu Rev Microbiol, 57, 441-466.
205. Brodolin K, Zenkin N, Mustaev A, Mamaeva D and Heumann H. (2004) The sigma 70 subunit of RNA polymerase induces lacUV5 promoter-proximal pausing of transcription. Nat Struct Mol Biol, 11, 551-557.
206. Nickels BE, Mukhopadhyay J, Garrity SJ, Ebright RH and Hochschild A. (2004) The sigma 70 subunit of RNA polymerase mediates a promoter-proximal pause at the lac promoter. Nat Struct Mol Biol, 11, 544-550.
207. Mooney RA and Landick R. (2003) Tethering sigma70 to RNA polymerase reveals high in vivo activity of sigma factors and sigma70-dependent pausing at promoter-distal locations. Genes Dev, 17, 2839-2851.
208. Ko DC, Marr MT, Guo J and Roberts JW. (1998) A surface of Escherichia coli sigma 70 required for promoter function and antitermination by phage lambda Q protein. Genes Dev, 12, 3276-3285.
209. Grigorova IL, Phleger NJ, Mutalik VK and Gross CA. (2006) Insights into transcriptional regulation and sigma competition from an equilibrium model of RNA polymerase binding to DNA. PNAS USA, 103, 5332-5337.


210. Svetlov V, Belogurov GA, Shabrova E, Vassylyev DG and Artsimovitch I. (2007) Allosteric control of the RNA polymerase by the elongation factor RfaH. Nucleic Acids Res, 35, 5694-5705.
211. Kapanidis AN, Margeat E, Ho SO, Kortkhonjia E, Weiss S and Ebright RH. (2006) Initial transcription by RNA polymerase proceeds through a DNA-scrunching mechanism. Science, 314, 1144-1147.
212. Waldburger C, Gardella T, Wong R and Susskind MM. (1990) Changes in conserved region 2 of Escherichia coli sigma 70 affecting promoter recognition. J Mol Biol, 215, 267-276.
213. Sharp MM, Chan CL, Lu CZ, Marr MT, Nechaev S, Merritt EW, Severinov K, Roberts JW and Gross CA. (1999) The interface of sigma with core RNA polymerase is extensive, conserved, and functionally specialized. Genes Dev, 13, 3015-3026.
214. Young BA, Anthony LC, Gruber TM, Arthur TM, Heyduk E, Lu CZ, Sharp MM, Heyduk T, Burgess RR and Gross CA. (2001) A coiled-coil from the RNA polymerase beta' subunit allosterically induces selective nontemplate strand binding by sigma(70). Cell, 105, 935-944.
215. Gaal T, Ross W, Estrem ST, Nguyen LH, Burgess RR and Gourse RL. (2001) Promoter recognition and discrimination by EsigmaS RNA polymerase. Mol Microbiol, 42, 939-954.
216. Marr MT, Datwyler SA, Meares CF and Roberts JW. (2001) Restructuring of an RNA polymerase holoenzyme elongation complex by lambdoid phage Q proteins. PNAS USA, 98, 8972-8978.
217. Raffaelle M, Kanin EI, Vogt J, Burgess RR and Ansari AZ. (2005) Holoenzyme switching and stochastic release of sigma factors from RNA polymerase in vivo. Mol Cell, 20, 357-366.
218. Zenkin N, Kulbachinskiy A, Yuzenkova Y, Mustaev A, Bass I, Severinov K and Brodolin K. (2007) Region 1.2 of the RNA polymerase sigma subunit controls recognition of the -10 promoter element. Embo J, 26, 955-964.
219. Greenblatt J and Li J. (1981) Interaction of the sigma factor and the nusA gene protein of E. coli with RNA polymerase in the initiation-termination cycle of transcription. Cell, 24, 421-428.
220. Gill SC, Weitzel SE and von Hippel PH. (1991) Escherichia coli sigma 70 and NusA proteins. I. Binding interactions with core RNA polymerase in solution and within the transcription complex. J Mol Biol, 220, 307-324.
221. Vassylyev DG, Vassylyeva MN, Perederina A, Tahirov TH and Artsimovitch I. (2007) Structural basis for transcription elongation by bacterial RNA polymerase. Nature, 448, 157-162.
222. Kashkina E, Anikin M, Brueckner F, Lehmann E, Kochetkov SN, McAllister WT, Cramer P and Temiakov D. (2007) Multisubunit RNA polymerases melt only a single DNA base pair downstream of the active site. J Biol Chem, 282, 21578-21582.


223. Roberts JW. (2006) Biochemistry. RNA polymerase, a scrunching machine. Science, 314, 1097-1098.
224. Chaudhuri J, Khuong C and Alt FW. (2004) Replication protein A interacts with AID to promote deamination of somatic hypermutation targets. Nature, 430, 992-998.
225. Arndt KM and Chamberlin MJ. (1988) Transcription termination in Escherichia coli. Measurement of the rate of enzyme release from Rho-independent terminators. J Mol Biol, 202, 271-285.
226. Mukhopadhyay J, Kapanidis AN, Mekler V, Kortkhonjia E, Ebright YW and Ebright RH. (2001) Translocation of sigma(70) with RNA polymerase during transcription: fluorescence resonance energy transfer assay for movement relative to DNA. Cell, 106, 453-463.
227. Reppas NB, Wade JT, Church GM and Struhl K. (2006) The transition between transcriptional initiation and elongation in E. coli is highly variable and often rate limiting. Mol Cell, 24, 747-757.
228. Wade JT and Struhl K. (2004) Association of RNA polymerase with transcribed regions in Escherichia coli. PNAS USA, 101, 17777-17782.
229. Laptenko O, Kim SS, Lee J, Starodubtseva M, Cava F, Berenguer J, Kong XP and Borukhov S. (2006) pH-dependent conformational switch activates the inhibitor of transcription elongation. Embo J, 25, 2131-2141.
230. Burns CM, Richardson LV and Richardson JP. (1998) Combinatorial effects of NusA and NusG on transcription elongation and Rho-dependent termination in Escherichia coli. J Mol Biol, 278, 307-316.
231. Kyrpides NC, Woese CR and Ouzounis CA. (1996) KOW: a novel motif linking a bacterial transcription factor with ribosomal proteins. Trends Biochem Sci, 21, 425-426.
232. Artsimovitch I, Svetlov V, Anthony L, Burgess RR and Landick R. (2000) RNA polymerases from Bacillus subtilis and Escherichia coli differ in recognition of regulatory signals in vitro. J Bacteriol, 182, 6027-6035.
233. Zhou Y, Filter JJ, Court DL, Gottesman ME and Friedman DI. (2002) Requirement for NusG for transcription antitermination in vivo by the lambda N protein. J Bacteriol, 184, 3416-3418.
234. Bailey MJ, Hughes C and Koronakis V. (1997) RfaH and the ops element, components of a novel system controlling bacterial transcription elongation. Mol Microbiol, 26, 845-851.
235. Wandersman C and Letoffe S. (1993) Involvement of lipopolysaccharide in the secretion of Escherichia coli alpha-haemolysin and Erwinia chrysanthemi proteases. Mol Microbiol, 7, 141-150.
236. Rahn A and Whitfield C. (2003) Transcriptional organization and regulation of the Escherichia coli K30 group 1 capsule biosynthesis (cps) gene cluster. Mol Microbiol, 47, 1045-1060.


237. Stevens MP, Hanfling P, Jann B, Jann K and Roberts IS. (1994) Regulation of Escherichia coli K5 capsular polysaccharide expression: evidence for involvement of RfaH in the expression of group II capsules. FEMS Microbiol Lett, 124, 93-98.
238. Bailey MJ, Koronakis V, Schmoll T and Hughes C. (1992) Escherichia coli HlyT protein, a transcriptional activator of haemolysin synthesis and secretion, is encoded by the rfaH (sfrB) locus required for expression of sex factor and lipopolysaccharide genes. Mol Microbiol, 6, 1003-1012.
239. Leeds JA and Welch RA. (1996) RfaH enhances elongation of Escherichia coli hlyCABD mRNA. J Bacteriol, 178, 1850-1857.
240. Nagy G, Danino V, Dobrindt U, Pallen M, Chaudhuri R, Emody L, Hinton JC and Hacker J. (2006) Down-regulation of key virulence factors makes the Salmonella enterica serovar Typhimurium rfaH mutant a promising live-attenuated vaccine candidate. Infect Immun, 74, 5914-5925.
241. Koronakis V, Cross M and Hughes C. (1988) Expression of the E. coli hemolysin secretion gene hlyB involves transcript anti-termination within the hly operon. Nucleic Acids Res, 16, 4789-4800.
242. Marr MT and Roberts JW. (2000) Function of transcription cleavage factors GreA and GreB at a regulatory pause site. Mol Cell, 6, 1275-1285.
243. Richardson LV and Richardson JP. (2005) Identification of a structural element that is essential for two functions of transcription factor NusG. Biochim Biophys Acta, 1729, 135-140.
244. Benedix A, Becker CM, de Groot BL, Caflisch A and Bockmann RA. (2009) Predicting free energy changes using structural ensembles. Nat Methods, 6, 3-4.
245. Muller-Hill B, Crapo L and Gilbert W. (1968) Mutants that make more lac repressor. PNAS USA, 59, 1259-1264.
246. Winson MK, Swift S, Hill PJ, Sims CM, Griesmayr G, Bycroft BW, Williams P and Stewart GS. (1998) Engineering the luxCDABE genes from Photorhabdus luminescens to provide a bioluminescent reporter for constitutive and promoter probe plasmids and mini-Tn5 constructs. FEMS Microbiol Lett, 163, 193-202.
247. Guzman LM, Belin D, Carson MJ and Beckwith J. (1995) Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. J Bacteriol, 177, 4121-4130.
248. Lindberg AA and Hellerqvist CG. (1980) Rough mutants of Salmonella typhimurium: immunochemical and structural analysis of lipopolysaccharides from rfaH mutants. J Gen Microbiol, 116, 25-32.
249. Bailey MJ, Hughes C and Koronakis V. (2000) In vitro recruitment of the RfaH regulatory protein into a specialised transcription complex, directed by the nucleic acid ops element. Mol Gen Genet, 262, 1052-1059.


250. Bies-Etheve N, Pontier D, Lahmy S, Picart C, Vega D, Cooke R and Lagrange T. (2009) RNA-directed DNA methylation requires an AGO4-interacting member of the SPT5 elongation factor family. Embo Rep, 10, 649-654.
251. Wang D and Landick R. (1997) Nuclease cleavage of the upstream half of the nontemplate strand DNA in an Escherichia coli transcription elongation complex causes upstream translocation and transcriptional arrest. J Biol Chem, 272, 5989-5994.
252. Nickels BE. (2009) Genetic assays to define and characterize protein-protein interactions involved in gene regulation. Methods, 47, 53-62.
253. Li J, Horwitz R, McCracken S and Greenblatt J. (1992) NusG, a new Escherichia coli elongation factor involved in transcriptional antitermination by the N protein of phage lambda. J Biol Chem, 267, 6012-6019.
254. Zellars M and Squires CL. (1999) Antiterminator-dependent modulation of transcription elongation rates by NusB and NusG. Mol Microbiol, 32, 1296-1304.
255. Squires CL, Greenblatt J, Li J, Condon C and Squires CL. (1993) Ribosomal RNA antitermination in vitro: requirement for Nus factors and one or more unidentified cellular components. PNAS USA, 90, 970-974.
256. Sullivan SL, Ward DF and Gottesman ME. (1992) Effect of Escherichia coli nusG function on lambda N-mediated transcription antitermination. J Bacteriol, 174, 1339-1344.
257. Li J, Mason SW and Greenblatt J. (1993) Elongation factor NusG interacts with termination factor rho to regulate termination and antitermination of transcription. Genes Dev, 7, 161-172.
258. Pasman Z and von Hippel PH. (2000) Regulation of rho-dependent transcription termination by NusG is specific to the Escherichia coli elongation complex. Biochemistry, 39, 5573-5585.
259. Liao D, Lurz R, Dobrinski B and Dennis PP. (1996) A NusG-like protein from Thermotoga maritima binds to DNA and RNA. J Bacteriol, 178, 4089-4098.
260. Xia M, Lunsford RD, McDevitt D and Iordanescu S. (1999) Rapid method for the identification of essential genes in Staphylococcus aureus. Plasmid, 42, 144-149.
261. Knowlton JR, Bubunenko M, Andrykovitch M, Guo W, Routzahn KM, Waugh DS, Court DL and Ji X. (2003) A spring-loaded state of NusG in its functional cycle is suggested by X-ray crystallography and supported by site-directed mutants. Biochemistry, 42, 2275-2281.
262. Herbert KM, Zhou J, Mooney RA, Porta AL, Landick R and Block SM. (2010) E. coli NusG inhibits backtracking and accelerates pause-free transcription by promoting forward translocation of RNA polymerase. J Mol Biol, 399, 17-30.
263. Hirtreiter A, Damsma GE, Cheung AC, Klose D, Grohmann D, Vojnic E, Martin AC, Cramer P and Werner F. (2010) Spt4/5 stimulates transcription elongation through the RNA polymerase clamp coiled-coil motif. Nucleic Acids Res.


264. Belogurov GA, Sevostyanova A, Svetlov V and Artsimovitch I. (2010) Functional regions of the N-terminal domain of the antiterminator RfaH. Mol Microbiol.
265. Burova E, Hung SC, Chen J, Court DL, Zhou JG, Mogilnitskiy G and Gottesman ME. (1999) Escherichia coli nusG mutations that block transcription termination by coliphage HK022 Nun protein. Mol Microbiol, 31, 1783-1793.
266. Artsimovitch I, Patlan V, Sekine S, Vassylyeva MN, Hosaka T, Ochi K, Yokoyama S and Vassylyev DG. (2004) Structural basis for transcription regulation by alarmone ppGpp. Cell, 117, 299-310.
267. Miropolskaya N, Artsimovitch I, Klimasauskas S, Nikiforov V and Kulbachinskiy A. (2009) Allosteric control of catalysis by the F loop of RNA polymerase. PNAS USA, 106, 18942-18947.
268. Belogurov GA, Vassylyeva MN, Sevostyanova A, Appleman JR, Xiang AX, Lira R, Webber SE, Klyuyev S, Nudler E, Artsimovitch I and Vassylyev DG. (2009) Transcription inactivation through local refolding of the RNA polymerase structure. Nature, 457, 332-335.
269. Wenzel S, Martins BM, Rosch P and Wohrl BM. (2010) Crystal structure of the human transcription elongation factor DSIF hSpt4 subunit in complex with the hSpt5 dimerization interface. Biochem J, 425, 373-380.
270. Yakhnin AV and Babitzke P. (2010) Mechanism of NusG-stimulated pausing, hairpin-dependent pause site selection and intrinsic termination at overlapping pause and termination sites in the Bacillus subtilis trp leader. Mol Microbiol, 76, 690-705.
271. Artsimovitch I, Svetlov V, Murakami KS and Landick R. (2003) Co-overexpression of Escherichia coli RNA polymerase subunits allows isolation and analysis of mutant enzymes lacking lineage-specific sequence insertions. J Biol Chem, 278, 12344-12355.
272. Roberts JW. (2010) Molecular biology. Syntheses that stay together. Science, 328, 436-437.
273. Chen Y, Yamaguchi Y, Tsugeno Y, Yamamoto J, Yamada T, Nakamura M, Hisatake K and Handa H. (2009) DSIF, the Paf1 complex, and Tat-SF1 have nonredundant, cooperative roles in RNA polymerase II elongation. Genes Dev, 23, 2765-2777.
274. Schneider DA, French SL, Osheim YN, Bailey AO, Vu L, Dodd J, Yates JR, Beyer AL and Nomura M. (2006) RNA polymerase II elongation factors Spt4p and Spt5p play roles in transcription elongation by RNA polymerase I and rRNA processing. PNAS USA, 103, 12707-12712.
275. Peterlin BM and Price DH. (2006) Controlling the elongation phase of transcription with P-TEFb. Mol Cell, 23, 297-305.
276. Wang MB and Dennis ES. (2009) SPT5-like, a new component in plant RdDM. Embo Rep, 10, 573-575.
277. Nudler E and Gottesman ME. (2002) Transcription termination and anti-termination in E. coli. Genes Cells, 7, 755-768.


278. Stevens MP, Clarke BR and Roberts IS. (1997) Regulation of the Escherichia coli K5 capsule gene cluster by transcription antitermination. Mol Microbiol, 24, 1001-1012.
279. Ederth J, Artsimovitch I, Isaksson LA and Landick R. (2002) The downstream DNA jaw of bacterial RNA polymerase facilitates both transcriptional initiation and pausing. J Biol Chem, 277, 37456-37463.
280. Brueckner F and Cramer P. (2008) Structural basis of transcription inhibition by alpha-amanitin and implications for RNA polymerase II translocation. Nat Struct Mol Biol, 15, 811-818.
281. Irschik H, Gerth K, Hofle G, Kohl W and Reichenbach H. (1983) The myxopyronins, new inhibitors of bacterial RNA synthesis from Myxococcus fulvus (Myxobacterales). J Antibiot (Tokyo), 36, 1651-1658.
282. Lira R, Xiang AX, Doundoulakis T, Biller WT, Agrios KA, Simonsen KB, Webber SE, Sisson W, Aust RM, Shah AM, Showalter RE, Banh VN, Steffy KR and Appleman JR. (2007) Syntheses of novel myxopyronin B analogs as potential inhibitors of bacterial RNA polymerase. Bioorg Med Chem Lett, 17, 6797-6800.
283. Artsimovitch I, Kahmeyer-Gabbe M and Howe MM. (1996) Distortion in the spacer region of Pm during activation of middle transcription of phage Mu. PNAS USA, 93, 9408-9413.
284. Chen YF and Helmann JD. (1997) DNA-melting at the Bacillus subtilis flagellin promoter nucleates near -10 and expands unidirectionally. J Mol Biol, 267, 47-59.
285. Li XY and McClure WR. (1998) Stimulation of open complex formation by nicks and apurinic sites suggests a role for nucleation of DNA melting in Escherichia coli promoter function. J Biol Chem, 273, 23558-23566.
286. Kuznedelov K, Korzheva N, Mustaev A and Severinov K. (2002) Structure-based analysis of RNA polymerase function: the largest subunit's rudder contributes critically to elongation complex stability and is not involved in the maintenance of RNA-DNA hybrid length. Embo J, 21, 1369-1378.
287. Artsimovitch I, Vassylyeva MN, Svetlov D, Svetlov V, Perederina A, Igarashi N, Matsugaki N, Wakatsuki S, Tahirov TH and Vassylyev DG. (2005) Allosteric modulation of the RNA polymerase catalytic reaction is an essential component of transcription control by rifamycins. Cell, 122, 351-363.
288. Vassylyeva MN, Lee J, Sekine SI, Laptenko O, Kuramitsu S, Shibata T, Inoue Y, Borukhov S, Vassylyev DG and Yokoyama S. (2002) Purification, crystallization and initial crystallographic analysis of RNA polymerase holoenzyme from Thermus thermophilus. Acta Crystallogr D Biol Crystallogr, 58, 1497-1500.
289. Yeates TO. (1997) Detecting and overcoming crystal twinning. Methods Enzymol, 276, 344-358.


290. Brunger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T and Warren GL. (1998) Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr D Biol Crystallogr, 54, 905-921.
291. Vassylyev DG, Svetlov V, Vassylyeva MN, Perederina A, Igarashi N, Matsugaki N, Wakatsuki S and Artsimovitch I. (2005) Structural basis for transcription inhibition by tagetitoxin. Nat Struct Mol Biol, 12, 1086-1093.
292. Sims RJ, 3rd, Belotserkovskaya R and Reinberg D. (2004) Elongation by RNA polymerase II: the short and long of it. Genes Dev, 18, 2437-2468.
293. Nag A, Narsinh K and Martinson HG. (2007) The poly(A)-dependent transcriptional pause is mediated by CPSF acting on the body of the polymerase. Nat Struct Mol Biol, 14, 662-669.
294. Barboric M, Lenasi T, Chen H, Johansen EB, Guo S and Peterlin BM. (2009) 7SK snRNP/P-TEFb couples transcription elongation with alternative splicing and is essential for vertebrate development. PNAS USA, 106, 7798-7803.
295. Gilmour DS and Lis JT. (1986) RNA polymerase II interacts with the promoter region of the noninduced hsp70 gene in Drosophila melanogaster cells. Mol Cell Biol, 6, 3984-3989.
296. Rahl PB, Lin CY, Seila AC, Flynn RA, McCuine S, Burge CB, Sharp PA and Young RA. (2010) c-Myc regulates transcriptional pause release. Cell, 141, 432-445.
297. Nechaev S, Fargo DC, dos Santos G, Liu L, Gao Y and Adelman K. (2010) Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in Drosophila. Science, 327, 335-338.
298. Muse GW, Gilchrist DA, Nechaev S, Shah R, Parker JS, Grissom SF, Zeitlinger J and Adelman K. (2007) RNA polymerase is poised for activation across the genome. Nat Genet, 39, 1507-1511.
299. Guenther MG, Levine SS, Boyer LA, Jaenisch R and Young RA. (2007) A chromatin landmark and transcription initiation at most promoters in human cells. Cell, 130, 77-88.
300. Core LJ, Waterfall JJ and Lis JT. (2008) Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science, 322, 1845-1848.
301. Nechaev S and Adelman K. (2008) Promoter-proximal Pol II: when stalling speeds things up. Cell Cycle, 7, 1539-1544.
302. Zeitlinger J, Stark A, Kellis M, Hong JW, Nechaev S, Adelman K, Levine M and Young RA. (2007) RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat Genet, 39, 1512-1516.
303. Bai X, Kim J, Yang Z, Jurynec MJ, Akie TE, Lee J, LeBlanc J, Sessa A, Jiang H, DiBiase A, Zhou Y, Grunwald DJ, Lin S, Cantor AB, Orkin SH and Zon LI. (2010) TIF1gamma controls erythroid cell fate by regulating transcription elongation. Cell, 142, 133-143.


304. Li B, Weber JA, Chen Y, Greenleaf AL and Gilmour DS. (1996) Analyses of promoter-proximal pausing by RNA polymerase II on the hsp70 heat shock gene promoter in a Drosophila nuclear extract. Mol Cell Biol, 16, 5433-5443.
305. Adelman K, Kennedy MA, Nechaev S, Gilchrist DA, Muse GW, Chinenov Y and Rogatsky I. (2009) Immediate mediators of the inflammatory response are poised for gene activation through RNA polymerase II stalling. PNAS USA, 106, 18207-18212.
306. Core LJ and Lis JT. (2008) Transcription regulation through promoter-proximal pausing of RNA polymerase II. Science, 319, 1791-1792.
307. Margaritis T and Holstege FC. (2008) Poised RNA polymerase II gives pause for thought. Cell, 133, 581-584.
308. Adelman K, Wei W, Ardehali MB, Werner J, Zhu B, Reinberg D and Lis JT. (2006) Drosophila Paf1 modulates chromatin structure at actively transcribed genes. Mol Cell Biol, 26, 250-260.
309. Lis J. (1998) Promoter-associated pausing in promoter architecture and postinitiation transcriptional regulation. Cold Spring Harb Symp Quant Biol, 63, 347-356.
310. Wada T, Takagi T, Yamaguchi Y, Ferdous A, Imai T, Hirose S, Sugimoto S, Yano K, Hartzog GA, Winston F, Buratowski S and Handa H. (1998) DSIF, a novel transcription elongation factor that regulates RNA polymerase II processivity, is composed of human Spt4 and Spt5 homologs. Genes Dev, 12, 343-356.
311. Zhu W, Wada T, Okabe S, Taneda T, Yamaguchi Y and Handa H. (2007) DSIF contributes to transcriptional activation by DNA-binding activators by preventing pausing during transcription elongation. Nucleic Acids Res, 35, 4064-4075.
312. Karlowski WM, Zielezinski A, Carrere J, Pontier D, Lagrange T and Cooke R. (2010) Genome-wide computational identification of WG/GW Argonaute-binding proteins in Arabidopsis. Nucleic Acids Res.
313. Pei Y and Shuman S. (2002) Interactions between fission yeast mRNA capping enzymes and elongation factor Spt5. J Biol Chem, 277, 19639-19648.
314. Schneider S, Pei Y, Shuman S and Schwer B. (2010) Separable functions of the fission yeast Spt5 carboxyl-terminal domain (CTD) in capping enzyme binding and transcription elongation overlap with those of the RNA polymerase II CTD. Mol Cell Biol, 30, 2353-2364.
315. Bourgeois CF, Kim YK, Churcher MJ, West MJ and Karn J. (2002) Spt5 cooperates with human immunodeficiency virus type 1 Tat by preventing premature RNA release at terminator sequences. Mol Cell Biol, 22, 1079-1093.
316. Werner F. (2007) Structure and function of archaeal RNA polymerases. Mol Microbiol, 65, 1395-1404.
317. Sevostyanova A and Artsimovitch I. (2010) Functional analysis of Thermus thermophilus transcription factor NusG. Nucleic Acids Res.


318. Mukhopadhyay J, Das K, Ismail S, Koppstein D, Jang M, Hudson B, Sarafianos S, Tuske S, Patel J, Jansen R, Irschik H, Arnold E and Ebright RH. (2008) The RNA polymerase "switch region" is a target for inhibitors. Cell, 135, 295-307.
