ANALYZING AND CLASSIFYING BIMOLECULAR INTERACTIONS: I. EFFECTS OF METAL BINDING ON AN IRON-SULFUR CLUSTER SCAFFOLD II. AUTOMATIC ANNOTATION OF RNA-PROTEIN INTERACTIONS FOR NDB

Poorna Roy

A Dissertation

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

August 2017

Committee:

Neocles Leontis, Committee Co-Chair

Andrew Torelli, Committee Co-Chair

Vipaporn Phuntumart, Graduate Faculty Representative

H. Peter Lu

© 2017

Poorna Roy

All Rights Reserved

ABSTRACT

Neocles B. Leontis and Andrew T. Torelli, Committee co-chairs

This dissertation comprises two distinct parts; however, the two research agendas are thematically linked by their complementary approaches to investigating the nature of important intermolecular interactions. The first part is a study of interactions between an iron-sulfur cluster scaffold protein, IscU, and different transition metal ions. Interactions between IscU and specific metal ions are investigated and compared with those of SufU, a homologous Fe-S cluster biosynthesis protein from Gram-positive bacteria whose metal-dependent conformational behavior remains unclear. These studies were extended with additional metal ions, selected to determine whether coordination geometry at the active sites of IscU and its homolog influences metal ion selectivity. Comparing the conformational behavior of these proteins and their affinity for different transition metal ions suggested that the metal-dependent conformational transitions exhibited by IscU may be a recurring strategy among U-type proteins involved in Fe-S cluster biosynthesis.

The second part of the thesis focuses on the automated detection and annotation of specific interactions between RNA nucleotides and amino acid residues in RNA-protein complexes. RNA-protein interactions play crucial roles in all stages of transcription, translation and regulation. To systematically detect, annotate, and query these non-covalent interactions, we developed programs that are integrated into the RNA.BGSU.EDU data pipeline to provide RNA-protein interaction annotations to the NDB website. We then used these programs to identify RNA-protein interactions in the mammalian mitochondrial (mmt) ribosome. Mmt ribosomes have evolved from ancestral bacterial ribosomes through large-scale reduction of ribosomal RNAs (rRNAs), loss of guanine (G) nucleotides, and an increased prevalence of ribosomal proteins. Systematic comparisons of recently solved structures of mmt small-subunit (SSU) ribosomes with those of bacteria, together with high-quality rRNA sequence alignments, allowed us to deduce rules for folding a complex RNA with far fewer Gs. Via specific RNA-protein interactions, mmt rProteins (i) substitute for truncated rRNA helices, (ii) maintain the mutual spatial orientations of the remaining helices, (iii) compensate for lost RNA-RNA interactions, (iv) reduce the solvent accessibility of exposed bases, and (v) stabilize RNA loop motifs that lack the Gs conserved in bacteria.

Dedicated to my parents and my grandmother, who instilled the value of education in me. Someday I hope to do the same for my baby niece, Shubhangi.

ACKNOWLEDGMENTS

I am truly thankful to my advisors Prof. Neocles B. Leontis and Prof. Andrew T. Torelli for their continuous support and guidance during my entire Ph.D. study. They not only guided my dissertation projects, but also helped me build up the skills of critical thinking.

I would also like to thank my committee members, Prof. H. Peter Lu and Prof. Vipaporn Phuntumart, for their valuable suggestions throughout my studies, and our collaborators, Prof. Craig Zirbel, Prof. Eric Westhof and Prof. Marie Sissler, for their generous help and support.

I am grateful to my lab mates Blake Sweeney, Maryam, Sri, Mary, Hayfa, Geetha and Mike for making my PhD work such a great learning experience. I spent most of my PhD days in a windowless office with Blake and Maryam, but you both made the atmosphere exciting.

I am thankful to my family, dada, bourani, KD, Puntai, papa, mummy, bhaiya, bhabhi and Pihu, for keeping me healthy and sane during this challenging period. Whenever I have had doubts, my brother has led me not just by words but also by example, showing what hard work and determination can achieve. My Kgp family has always been a constant source of support, as have the two goofballs, Ani and Shilu. My BG friends, Arpan, Sarasij, Nibedita and Kaustav, thank you for the constant encouragement. Thanks to my soul sister, Debarati, for listening to my rants during frustrating times and cherishing my accomplishments. Last but not least, my husband Vishal. You are my strength, my support, and my voice of reason. I am pretty sure you were not as worried about getting your PhD as you were about mine. We did it!

TABLE OF CONTENTS

CHAPTER 1. OVERVIEW OF STRUCTURAL HETEROGENEITY IN Fe-S CLUSTER BIOSYNTHESIS PROTEINS

1.1 Importance of Fe-S clusters as protein cofactors

1.2 Properties of Fe-S clusters

1.3 Fe-S cluster biosynthesis systems

1.4 IscU as a metamorphic protein

1.5 Role of IscU conformational heterogeneity in Fe-S cluster biosynthesis

1.6 Sequence elements implicated in conformational heterogeneity of IscU and SufU homologues from Gram-negative and Gram-positive bacteria

1.7 Interactions of IscU with metal ions and their biological relevance

1.8 Interactions of toxic metals with Fe-S biosynthetic systems

1.9 Specific aims and significance of the work

REFERENCES

CHAPTER 2. EFFECT OF TOXIC METAL STRESS ON CONFORMATIONS OF U-TYPE PROTEINS

2.1 IscU and SufU as metalloproteins

2.1.1 Fe-S clusters targeted by transition metals

2.2 Methods

2.2.1 Gene cloning and protein overexpression construct design

2.2.2 Protein expression and IMAC purification

2.2.3 Iron ion content determination

2.2.4 Zinc ion content determination

2.2.5 Removal of coordinated metal ions

2.2.6 Circular Dichroism (CD) experiments

2.2.7 Thermofluor assay

2.2.8 Steady state fluorescence measurements

2.3 Results

2.3.1 Change in secondary structure in response to metal addition

2.3.2 Probing thermal stability of the proteins with addition of metals

2.3.3 Assessment of solvent accessibility of the protein active site upon metal binding

2.4 Discussion

2.4.1 Metal ion binding by Fe-S cluster biosynthesis proteins

2.4.2 Preparation of proteins for metal binding studies

2.4.3 Zn2+ alters the secondary structure profile of IscU and SufU

2.4.4 Confirming the effect of Zn2+ on IscU and SufU with Cd2+

2.4.5 IscU and SufU discriminate between transition metals with different coordination geometries

2.4.6 Confirming the structural effect and discrimination of different transition metals

2.4.7 Affinity of IscU and SufU for Zn2+ ions

2.5 Conclusion

REFERENCES

APPENDIX A FIGURES

CHAPTER 3. ROLE OF PUTATIVE LIGANDS ON THE ACTIVE SITE OF IscU

3.1 Introduction

3.1.1 Metals associated with proteins

3.1.2 Amino acids as ligands for metal ions

3.1.3 Coordination geometry of metals in metalloproteins

3.1.4 Conservation of residues at the active site of IscU superfamily

3.1.5 Metal coordination at active site of IscU

3.1.6 Role of active site residues in conformational transitions of IscU

3.2 Methods

3.2.1 Gene cloning and protein overexpression construct design

3.2.2 Site-directed mutagenesis of IscU

3.2.3 Protein expression and IMAC purification

3.2.4 Iron ion content determination

3.2.5 Zinc ion content determination

3.2.6 Removal of coordinated metal ions

3.2.7 Circular Dichroism (CD) experiments

3.2.8 Thermofluor assay

3.3 Results

3.3.1 Effect of mutations at the active site on the conformation of IscU

3.3.2 Effect of mutations on metal coordination in IscU

3.3.3 Probing thermal stability of the proteins with addition of metals

3.4 Discussion

3.4.1 D39A and D39H mutations stabilize a single structural conformation independent of metal ion binding

3.4.2 H105A mutation diminishes the stabilizing effect of added Co2+ ions

3.5 Conclusion

REFERENCES

APPENDIX B FIGURES

CHAPTER 4. INTRODUCTION TO RECURRENT INTERACTIONS IN NUCLEOPROTEIN COMPLEXES

4.1 Components of RNA nucleotides

4.1.1 Bases and base edges

4.1.2 Ribose sugars

4.1.3 Phosphate groups

4.2 RNA Backbone conformations

4.3 Components of amino acid residues

4.3.1 Peptide backbone and torsion angles

4.3.2 Amino acid sidechains

4.4 Properties of amino acids and nucleotides

4.4.1 Electronic structure of amino acids

4.4.1.1 pKa values, corresponding delta G

4.4.2 Electronic properties of nucleotides

4.4.3 Tautomeric and protonated forms of bases

4.4.4 Relevance of protonation of A and C to non-Watson-Crick base pairing

4.5 Hydrogen bonding in base pairs

4.5.1 Base pairing interactions

4.5.2 The Sugar Edge

4.5.3 Isostericity and sequence variation

4.5.4 Base pair frequency

4.6 Orbitals available in amino acids for hydrogen or metal binding

4.6.1 Amino acid sidechains with lone pair of electrons

4.7 Recurrent elementary interactions in RNA-protein complexes

4.7.1 Hydrogen bonding

4.7.2 Electrostatic interactions

4.7.3 Van der Waals interactions

4.7.4 Hydrophobic interactions

4.8 Stabilization of RNA quaternary structure by proteins

4.9 Resources for exploring RNA 3D structure

4.9.1 Sources of structural data

4.9.2 Tools for evaluating structures

4.9.3 Tools for searching structures

4.10 Conclusion

REFERENCES

CHAPTER 5. AUTOMATED DETECTION AND ANNOTATION OF RNA-PROTEIN INTERACTIONS

5.1 Early evidence of specific nucleic acid-protein recognition

5.2 Differences between RNA and DNA Recognition by proteins

5.3 Binding sites on RNA: Major and minor grooves

5.4 Types of RNA-protein interactions

5.4.1 Electrostatic interaction

5.4.2 Hydrogen bonding

5.4.3 Stacking interactions

5.4.4 Bidentate interactions

5.4.5 Hydrophobic interactions

5.4.6 Perpendicular stacking and cation-pi interactions

5.5 Methods

5.5.1 Deriving interaction pairs from representative list of crystal structures

5.5.2 Defining components of RNA-protein interactions

5.5.3 Parsing structural files

5.5.4 Geometric conditions for annotating interactions

5.5.5 Annotations for interacting base component

5.6 Results and discussion

5.6.1 Statistics of occurrence of different types of interactions

5.6.2 Propensity of amino acids to participate in certain types of interactions

5.6.3 Specific recognition of bases by amino acids

5.6.4 Usage in bioinformatics

5.7 Conclusion

REFERENCES

CHAPTER 6. THE EVOLUTIONARY PATH OF THE MAMMALIAN MITOCHONDRIAL RIBOSOME: HOW TO FOLD RNA WITH LESS GUANINES

6.1 Mitochondria: The site of ROS production

6.2 ROS damage to mitochondrial ribosomes

6.3 Methods

6.3.1 Analysis of 3D Structures

6.3.2 Assessment of degree of conservation

6.3.3 Visualization of helical elements in 2D and 3D

6.4 Results and discussion

6.4.1 Corresponding interaction networks in bacterial and mmt SSU

6.4.2 Selective changes in composition of mmt-SSU rRNA

6.4.3 Distribution of Gs by Structural Context

6.4.4 Transition from an RNA to an RNP world

6.4.5 Shielding of ribosomal RNA from solvent borne ROS

6.5 Conclusion

REFERENCES

LIST OF FIGURES

1.1 Three common iron-sulfur cluster species

1.2 Comparison of genes involved in Fe-S cluster biosynthesis in Escherichia coli and Streptococcus mutans

1.3 Conformational heterogeneity in IscU

1.4 IscU in solution adopts an equilibrium of partially disordered (D) and structured (S) states that can be shifted under different conditions

1.5 Multiple sequence alignment of IscU from Gram-negative bacteria and SufU from Gram-positive bacteria

1.6 A comparison of experimental structures of IscU representing the structured and partially-disordered states highlights differences in secondary structure

2.1 Far-UV circular dichroism spectra of 10 μM (left) E. coli IscU and (right) S. mutans SufU with 50 μM of different metal ions

2.2 Thermal melting curves for (A) IscU and (B) SufU in absence (dotted line) and presence (solid line) of different divalent metals

2.3 Effect of zinc addition on tryptophan fluorescence emission

3.1 Protein amino acid sidechains involved in metal binding and their metal (M) binding patterns

3.2 Conservation of electrostatics in IscU based on multiple sequence alignments

3.3 Crystal structures and model of SufU and IscU with disordered regions predicted by PONDR

3.4 Far-UV circular dichroism spectra of 10 μM E. coli IscU and its mutants in absence of metals (apo-form)

3.5 Far-UV circular dichroism spectra of 10 μM E. coli IscU and its mutants in presence of metals (Zn2+ and Fe3+)

3.6 Far-UV circular dichroism spectra of 10 μM E. coli IscU and its mutants in presence of toxic transition metals (Cd2+, Co2+ and Cu2+)

3.7 Melting temperature of IscU variants in absence and presence of different metal ions, as analyzed by the thermofluor assay

4.1 Hydrogen bonding in RNA nucleotides

4.2 Fischer projections of D and L enantiomers of alanine

4.3 Dihedral angles in peptide backbone

4.4 Tautomers of and their hydrogen bonding patterns

4.5 Nucleotides to print on transparencies

4.6 A graphical summary of the base pair occurrence frequencies within each base pair family, obtained from rRNA sequence data

4.7 Structure and resonance in basic amino acids

4.8 Structure and resonance in Asparagine and Glutamine

4.9 Structure and resonance in acidic amino acids

4.10 Structure and resonance in polar amino acids with a hydroxyl functional group in their sidechains

4.11 Structure and resonance in nominally polar amino acids

4.12 Components of a Hydrogen bond (HB)

4.13 Amino Acid Interactions for Loop vs. Helix Nucleotides

5.1 Different conformations of DNA

5.2 Electrostatic interaction in RNA between anionic phosphate (-1 charge) and protonated Lysine sidechain (+1 charge) in Flock House virus B2-dsRNA Complex

5.3 Hydrogen bond donor and acceptor groups in (A) amino acid sidechains and (B) RNA nucleotides

5.4 Pseudopair observed between an Arginine and in E. coli small subunit ribosome

5.5 Stacking interactions of RNA bases with (a) Arginine and (b) Tryptophan residues

5.6 Bidentate interaction involving residues in T. thermophilus ribosomal large subunit

5.7 Perpendicular edge interaction in mammalian mitochondrial small subunit ribosome

5.8 Perpendicular interactions showing (A) cation-pi interaction between Arginine and Cytosine and (B) perpendicular stacking of Phenylalanine with Adenosine

5.9 Segmentation of amino acid and nucleotide residue to facilitate user-directed screening of interactions

5.10 Pseudopair detection conditions

5.11 Stacking criterion for amino acids with non-planar functional groups

5.12 Classification of pseudopairs along three different edges of nucleobases

5.13 Classification of stacking interactions according to base faces

5.14 A Flowchart showing parsing of nucleoprotein complex structural data and initial screening condition to isolate base-amino acid pairs

5.14 B Flowchart showing geometric conditions applied to detect and annotate different types of interaction

5.15 Column graph indicating percentage of base-amino acid (functional group) stacking interactions in different ribosomal structures

5.16 Interaction of amino acid functional groups with different bases

5.17 Conserved bidentate interaction involving equivalent residues in E. coli, T. thermophilus and D. radiodurans LSU

6.1 Comparative 2D structures of SSU rRNAs in bacteria and mammalian mitochondria

6.2 Nucleotide composition of aligned rRNA for bacterial (dotted lines) and mmt-SSUs (continuous lines)

6.3 RNA-protein interactions of different types involving RNA bases in the porcine mitochondrial and E. coli ribosomal SSU

LIST OF TABLES

1.1 Different conditions shown to stabilize the structured and partially-disordered states of IscU, respectively

2.1 Melting temperatures of apo-IscU and apo-SufU in presence of various metal ions

3.1 Classification and characteristics of biologically-relevant metal ions and ligands according to the Hard Soft Acid Base theory

3.2 Amino acid residues that commonly bind metals in proteins

3.3 Common geometries and corresponding hybridization for 4-, 5- and 6-coordinate metal ions

3.4 Melting temperatures of IscU variants in presence of various divalent metal ions

4.1 Classification of amino acids according to nature of sidechains

4.2 pKa value of different sidechains of amino acids and the corresponding Ka and ΔG values

4.3 Intrinsic pKa values of RNA bases in aqueous solution

4.4 Twelve geometric families of RNA base pairs

4.5 General Hybridization Scheme and Hybrid orbital composition

4.6 Energy values for different types of interactions prevalent in nucleoprotein complexes

4.7 A list of resources available for visualization, validation and analysis of nucleoprotein structures

5.1 Hybridization of atoms constituting amino acid backbone and side-chains

5.2 Partition of each (A) nucleotide into base, ribose and phosphate, and (B) amino acid into linkers and functional groups for annotation

5.3 Different types of interactions annotated for components of nucleotides and amino acid residues

5.4 Nucleotide and edge specificity of base-aa_fg interactions as observed in our limited dataset analysis

5.5 Comparison of RNA-protein interactions in E. coli and T. thermophilus small subunit ribosome

6.1 Nucleotide composition in mmt and bacterial SSU alignments

6.2 Correlations of % G composition in mmt and bacterial SSU alignments in aligned positions

6.3 Specific RNA-protein interactions detected in E. coli and porcine SSU ribosome

6.4 Solvent accessible surface area of residues categorized by %G composition

CHAPTER 1. OVERVIEW OF STRUCTURAL HETEROGENEITY IN Fe-S CLUSTER BIOSYNTHESIS PROTEINS

1.1 Importance of Fe-S clusters as protein cofactors

Proteins have evolved to bind and leverage the unique chemical properties of cofactors that afford functionality beyond that available from the standard complement of amino acid residues.

Iron-sulfur (Fe-S) clusters are a well-known class of cofactors that expand the function of proteins involved in multiple cellular roles. Fe-S clusters are inorganic cofactors consisting of Fe3+ or Fe2+ ions coordinated to inorganic sulfide anions (S2-). Almost one-third of all enzymes in biological systems require metal cofactors for catalysis, and Fe-S clusters represent a significant fraction of these cofactors [2]. Fe-S proteins are essential in the photosynthetic and respiratory electron transport chains and in nitrogen fixation, as well as in processes that involve electron transfer, catalysis (e.g., as Lewis acids), redox sensing and even stabilization of protein structure [3-5].

The reduction potentials of Fe-S clusters can range from -0.6 V to +0.45 V [8,9], which allows Fe-S proteins to participate in a wide variety of electron-transfer reactions, in proteins ranging from ferredoxins to nitrogenases and hydrogenases.
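To put this range of potentials in energetic terms, the standard relation ΔG = -nFE converts a reduction potential into the Gibbs free energy change of the corresponding reduction half-reaction. The sketch below is illustrative only: it assumes one-electron couples and potentials referenced to the standard hydrogen electrode, and the function name `gibbs_from_potential` is hypothetical rather than part of any analysis in this work.

```python
# Illustrative sketch: Delta G = -n * F * E for a reduction half-reaction.
# Assumes one-electron couples and potentials referenced to the SHE.

FARADAY = 96485.332  # Faraday constant, C/mol


def gibbs_from_potential(e_volts: float, n_electrons: int = 1) -> float:
    """Return Delta G in kJ/mol for an n-electron reduction at potential E (V)."""
    return -n_electrons * FARADAY * e_volts / 1000.0


if __name__ == "__main__":
    # Endpoints of the Fe-S reduction potential range quoted in the text.
    for e in (-0.6, 0.45):
        print(f"E = {e:+.2f} V -> Delta G = {gibbs_from_potential(e):+.1f} kJ/mol")
```

For n = 1, the two endpoints correspond to roughly +57.9 and -43.4 kJ/mol, a spread of about 100 kJ/mol that the protein environment tunes for each Fe-S protein.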

1.2 Properties of Fe-S clusters

The Fe-S centers most commonly used by proteins are [2Fe-2S], [3Fe-4S] and [4Fe-4S] clusters, as shown in Figure 1.1. Cysteine residues are the most prevalent ligands; however, Fe-S clusters coordinated by histidine, serine, aspartate, and arginine residues have also been reported [10]. Fe-S clusters contain a core of iron atoms with bridging inorganic sulfides. Each bridging sulfide is assigned an oxidation number of -2, and the oxidation state of each iron is considered to be +2, +3, or +2.5, depending on the extent of valence delocalization [6]. The electronic, vibrational, magnetic and redox properties of each type of cluster depend on the core oxidation state, the coordination environment, the overall protein structure and the extent of cluster solvent exposure. For example, Rieske-type [2Fe-2S] centers (Figure 1.1), in which the reducible Fe site is ligated by two histidine residues, generally have much higher [2Fe-2S]2+/1+ midpoint potentials than all-cysteinyl-ligated [2Fe-2S] clusters [11].

Figure 1.1 Three common iron-sulfur cluster species. Atoms are colored as follows: iron, orange; sulfur, yellow; protein atoms responsible for binding the iron-sulfur cluster, grey.

The diverse functions of Fe-S proteins arise from the varied electronic and chemical properties of the constituent iron and sulfur atoms [12,13]; however, these elements also pose a risk, since free iron and sulfide species are toxic to cells. Although Fe-S clusters form readily and spontaneously in vitro when iron and sulfide species are combined in solution, a limited number of dedicated Fe-S cluster biosynthesis systems are all but universally conserved, most likely owing to the toxicity of “free” iron and sulfide in the cell. These cellular machineries assemble Fe-S clusters and traffic them into the appropriate proteins while minimizing the risk of release of, or exposure to, toxic species. Efforts to fully understand the biosynthesis of Fe-S clusters are relevant to various human diseases arising from perturbed iron homeostasis, to bacterial pathogenesis, and to bioengineering. A key remaining question concerns the mechanisms by which scaffold proteins first assemble nascent Fe-S clusters and then temper their affinity for these clusters to promote transfer to recipient cellular proteins.

1.3 Fe-S cluster biosynthesis systems

Considerable research over the past two decades has focused on Fe-S cluster biosynthesis in bacteria. In vivo assembly of Fe-S clusters is accomplished by three phylogenetically distinct biosynthesis systems: ISC (iron-sulfur cluster), SUF (sulfur mobilization), and NIF (nitrogen fixation). All three systems are found in bacteria, and the ISC and SUF systems are also conserved in eukaryotes. The mitochondrial ISC system shares significant similarities with the bacterial ISC system, and mitochondria play a crucial role in the maturation of Fe-S proteins in eukaryotes [14,15].

Escherichia coli illustrates a common scenario in which more than one Fe-S biosynthesis system is present in an organism. In this case, both the ISC and SUF systems are present, with the corresponding genes organized into two operons, iscSUA-hscBA-fdx and sufABCDSE, respectively [16,17]. Regulation of each system helps the bacterium adapt to normal and stress conditions. Similarly, the diazotroph Azotobacter vinelandii harbors both the ISC and NIF systems, which support Fe-S cluster biosynthesis for general cellular processes and maturation of the complex metal cofactor of nitrogenase, respectively.

Unlike E. coli, most Gram-positive bacteria, particularly Firmicutes, contain only the suf operon, sufCDSUB, which differs from the suf operon of E. coli in that it lacks the sufE gene and sufA is located elsewhere in the genome [18-20]. Studies in two Gram-negative bacteria and in Firmicutes have established a detailed view of prokaryotic Fe-S cluster biosynthesis systems [2,21-23]. The three systems share several common features. A PLP-dependent cysteine desulfurase is universally required and initiates Fe-S cluster biosynthesis by mobilizing the sulfur atom from L-cysteine and transferring it as a persulfide to the site of Fe-S cluster assembly on a “scaffold” protein. The cysteine desulfurases present in the NIF, ISC and SUF systems are classified as Type I or Type II desulfurases according to their signature sequences; Type II desulfurases additionally require a protein binding partner to enhance their catalytic activity [24]. A “scaffold” protein or protein complex is also essential to nucleate and support the assembly of nascent Fe-S clusters. Scaffolds are either “U-type” proteins, such as the prototypical NifU and IscU, or protein complexes, such as the SufBCD proteins that collaborate to serve as the scaffold in E. coli [25,26].

Importantly, IscU and its eukaryotic homologs have been shown to adopt either a structured state (S) or a partially disordered state (D); these states influence protein-protein interactions and affinity for nascent Fe-S clusters during the biosynthesis cycle [23,27,28]. Additional proteins in the operons have been ascribed roles as alternative scaffolds, iron and electron donors, and chaperones that assist in the transfer of clusters to cellular proteins [29].

Figure 1.2 Comparison of genes involved in Fe-S cluster biosynthesis in (A) Escherichia coli (Gram-negative Proteobacteria) and (B) Streptococcus mutans (Gram-positive Firmicutes).

Colors indicate gene products expected to have similar functions. The gene encoding SufU is colored red and blue to reflect evidence of its possible role as an enhancer of SufS activity, as well as the possibility that its function overlaps with that of an Fe-S cluster scaffold protein. The SufBCD genes in the Firmicutes SUF system are not colored, reflecting uncertainty in their function due to key sequence differences from the E. coli proteins [1].

Compared to Fe-S cluster biosynthesis in Gram-negative bacteria, analogous pathways in Gram-positive bacteria have been studied only more recently, mainly in Firmicutes.

Firmicutes are a bacterial phylum that includes the model bacterium Bacillus subtilis as well as multiple human pathogens. Because Fe-S cluster biosynthesis has been implicated in pathogenicity, there is significant interest in elucidating how it is accomplished in Firmicutes [1,30-33]. In Firmicutes such as B. subtilis and Enterococcus faecalis, Fe-S cluster biosynthesis is accomplished by a single SUF system that exhibits similarities to the SUF system of Gram-negative bacteria (Figure 1.2B). Genes encoding the SufB, C, D and S proteins are common to the SUF systems of both Gram-positive and Gram-negative bacteria, although the specific functions of these proteins in Firmicutes remain to be determined, especially in light of key amino acid differences between the SufB sequences of Firmicutes and those of Gram-negative Proteobacteria [1]. The SufS cysteine desulfurases from both Firmicutes and E. coli belong to the Type II family of desulfurases on the basis of sequence signatures and their requirement for an enhancer protein to achieve maximal catalytic rate [1]. SufE serves this role in the canonical E. coli SUF system, enhancing the rate of E. coli SufS activity 40- to 60-fold [25,34]. In Gram-positive bacteria, the suf operon does not include a gene encoding SufE. Instead, it contains a gene for a unique SufU protein that is capable of enhancing the catalytic rate of its cognate SufS desulfurase by more than 200-fold [35]. SufU also shares strong structural and sequence similarity with IscU [36], the central U-type scaffold in the ISC Fe-S cluster biosynthesis system. This has led to investigations into whether SufU can function as a scaffold protein during Fe-S cluster biosynthesis [20,37], although conflicting evidence remains to be resolved by additional studies [38].

1.4 IscU as a metamorphic protein

E. coli IscU is a monomeric protein of 127 residues, characterized by an αβ fold comprising a three-stranded antiparallel β-sheet and four α-helices [57,58]. In the absence of a metal ligand or Fe-S cluster, IscU exists in an equilibrium between a partially disordered state (D-state) and a structured state (S-state) that interconvert on a subsecond timescale [59]. Although the D-state is partially disordered, it contains a fold that stabilizes two high-energy cis peptide bonds critical for the conformational transition [27,60]. The transition is facilitated by the conversion of these two peptidyl-prolyl bonds from trans in the S-state to cis in the D-state. Upon transition to the structured (S) state, the partially disordered (D) conformation of IscU, which contains three helices and a long unstructured loop, gains an additional helix, and other secondary structure elements are extended (Figure 1.3).

One of the two states is preferred at each stage of the Fe-S cluster biosynthesis cycle, and the two states have different affinities for partner proteins, metal ligands and assembled Fe-S clusters [23,42,59,61]. Understanding how IscU interacts with metal ions such as Zn2+ is crucial to understanding how ligand binding affects conformational transitions in the IscU structure, as well as the contributions of specific amino acid residues at the active site. It is therefore important to probe how these residues participate in metal ion-dependent conformational transitions and in metal ion selectivity.

Figure 1.3 Conformational heterogeneity in IscU. (Left) PONDR-predicted disordered regions of IscU are highlighted in grey. (Right) Superposition of Escherichia coli IscU structures putatively representing the structured (S) (brown) and partially disordered (D) (blue) states.

1.5 Role of IscU conformational heterogeneity in Fe-S cluster biosynthesis

An important question that arises from the similarities between SufU and IscU is whether SufU also exhibits structural transitions that may be important for its function. IscU has long been recognized to exhibit conformational heterogeneity [62], and E. coli IscU is now well characterized as interconverting between partially disordered (D) and structured (S) states in equilibrium [23,29,59] (Figure 1.4A). Briefly, the S-state of IscU is predicted to dominate upon interaction with IscS, and this state has higher affinity for Fe-S clusters, supporting their nucleation and assembly. After formation of a nascent Fe-S cluster, interaction with the HscB/HscA chaperone proteins and subsequent ATP hydrolysis by HscA induce IscU to revert predominantly to the D-state [59]. Because the D-state of IscU has diminished affinity for Fe-S clusters, cluster transfer to cellular apo-proteins is favored. The transfer of Fe-S clusters is facilitated through distinct interactions between IscU and multiple recipient proteins. Figure 1.4B summarizes this model, which is described in more detail elsewhere [23,29,59]. The mechanism may also be preserved in eukaryotic cells [63]. IscS forms a persulfide intermediate on a conserved cysteine residue and transfers this sulfur to IscU through an α2β2 heterotetrameric complex intermediate [57]. The intermediate is finally converted to two [2Fe–2S]2+ clusters bound to each IscU dimer, which are subsequently transferred to appropriate proteins either directly or with the aid of chaperones [39,64,65]. Importantly, the ability of IscU to adopt different conformational states with distinct affinities for binding partners and Fe-S clusters is essential to its function in Fe-S cluster biosynthesis.

Figure 1.4 A) IscU in solution adopts an equilibrium of partially disordered (D) and structured (S) states that can be shifted under different conditions. B) Summarized Fe-S biosynthesis cycle involving transitions between the D- and S-states of IscU. The variation between states is influenced by cognate biosynthesis protein binding partners and serves to modulate IscU’s affinity for Fe-S clusters, thereby alternately favoring Fe-S cluster assembly and transfer to apo-proteins [23,27].

In the absence of a ligand or binding partner, the “apo” form of IscU from Gram-negative bacteria exists in a dynamic equilibrium of two conformational states: a structured (S) state and a partially disordered (D) state [23,27,42,59] (Figure 1.4). Of the U-type proteins, IscU has been by far the most extensively characterized in terms of its conformational behavior, and multiple conditions have been identified in vitro that favor either the S- or the D-state (Table 1.1). For example, addition of Zn2+ ions to solution was found to shift the equilibrium of IscU to predominantly favor the S-state [66]. Less is known about the role of conformational transitions in Fe-S biosynthesis mediated by other U-type proteins; however, conformational heterogeneity has been identified in NifU [67] (the NIF system scaffold protein), ISCU [28] (the human ortholog of IscU), and SufU from B. subtilis [38].
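An idealized two-state model helps picture how Zn2+ shifts the D/S equilibrium. This is an illustrative assumption, not an analysis from this work. It treats D and S as the only protein species and assumes, for simplicity, that Zn2+ binds only the S-state with dissociation constant K_d. Under that assumption, any free Zn2+ raises the apparent equilibrium constant above K_conf and moves the population toward S.

```latex
\begin{align}
D \rightleftharpoons S, \qquad K_{\mathrm{conf}} &= \frac{[S]}{[D]}, \qquad f_S = \frac{K_{\mathrm{conf}}}{1 + K_{\mathrm{conf}}} \\
[S{\cdot}\mathrm{Zn}] &= \frac{[S]\,[\mathrm{Zn^{2+}}]_{\mathrm{free}}}{K_d} \\
K_{\mathrm{app}} = \frac{[S] + [S{\cdot}\mathrm{Zn}]}{[D]} &= K_{\mathrm{conf}}\left(1 + \frac{[\mathrm{Zn^{2+}}]_{\mathrm{free}}}{K_d}\right), \qquad f_{S,\mathrm{total}} = \frac{K_{\mathrm{app}}}{1 + K_{\mathrm{app}}}
\end{align}
```

In this simplified picture, f_S,total counts both free and Zn-bound S-state protein. A real analysis would also need to consider Zn2+ binding to the D-state and depletion of free zinc by the protein.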

Table 1.1 Different conditions shown to stabilize the structured and partially-disordered states of IscU, respectively.

Condition            Variant or value                       Predominant state
Mutation [60,61,66]  D39A, D39L, D39V, N90A, S107A, E111A   Structured
                     K89A, N90D                             Partially disordered
pH [59]              High (7.5–9.5)                         Structured
                     Low (5–7)                              Partially disordered
Temperature [59]     15–30 °C                               Structured
                     5–15 °C, 30–45 °C                      Partially disordered
Zn2+ [66]            Bound Zn2+                             Structured
                     Absence of Zn2+                        Partially disordered

1.6 Sequence elements implicated in conformational heterogeneity of IscU and SufU homologues from Gram-negative and Gram-positive bacteria

Analysis of the alignment of IscU and SufU sequences from Gram-negative and Gram-positive bacteria identified a signature sequence region comprising an 18–21 amino acid insertion between the second and third active site cysteine residues (Figure 1.5). This insertion is largely conserved across different Firmicutes species and is referred to as the Gram-positive region (GPR) [1,36]. The GPR bears the conserved signature sequence xxFSxxxQGxExxxxLG. A closer look at the multiple sequence alignments of IscU and SufU reveals other conserved or functionally relevant sequence elements that are implicated in metal coordination.
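A signature such as xxFSxxxQGxExxxxLG, where x stands for any residue, can be scanned against a sequence with a regular expression. The minimal Python sketch below assumes one-letter amino acid codes and ungapped sequences. The function names and the toy sequence are hypothetical and were not used in this work.

```python
import re
from typing import List, Tuple

# GPR signature as written in the text; "x" denotes any residue.
GPR_SIGNATURE = "xxFSxxxQGxExxxxLG"


def signature_to_regex(signature: str) -> str:
    """Translate an x-wildcard signature into a regular expression."""
    return "".join("." if ch in "xX" else re.escape(ch) for ch in signature)


def find_signature(sequence: str, signature: str = GPR_SIGNATURE) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched segment) for each non-overlapping hit."""
    pattern = re.compile(signature_to_regex(signature))
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence built to contain one match; not a real SufU sequence.
toy = "MKAAFSTDKQGVENPIALGRR"
print(find_signature(toy))  # [(3, 'AAFSTDKQGVENPIALG')]
```

The signature begins with two wildcards, so the reported start position is that of the first x, not the F. Sequences taken from an alignment would need their gap characters removed before scanning.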

Specifically, the three cysteine residues that form the site of Fe-S cluster assembly in IscU are conserved in SufU, likely with the same function [68]. Other putative metal-coordinating residues in IscU also have counterparts in SufU: D39 is conserved as D42, and H105 corresponds to K127. These residues are highlighted in Figure 1.5.

Figure 1.5 Multiple sequence alignment of IscU from Gram-negative bacteria (Azotobacter vinelandii, Haemophilus influenzae, Escherichia coli) and SufU from Gram-positive bacteria (Bacillus subtilis, Streptococcus mutans, and Streptococcus pyogenes). Secondary structure assignments from the X-ray crystal structure of E. coli IscU (PDB: 3LVL [69], chain A) are denoted with helix (h) and sheet (s) symbols above the aligned sequences. Residues that are disordered in the D-state (PDB: 2L4X) but adopt a definite secondary structure in the S-state are indicated with pink “h” (α-helix) or “s” (β-strand) letters above the aligned sequences. E. coli IscU amino acids predicted by PONDR to be disordered are indicated by red boxes. Active site cysteine residues are colored red and underlined. The LPPVK motif in IscU is shaded blue and the Gram-positive region (GPR) in SufU is shaded yellow. The putative metal-binding residues D39 and H105 in Gram-negative bacteria, and Asp41 and Lys126 in Gram-positive bacteria, are colored blue.

Based on earlier molecular dynamics calculations, the GPR in Firmicutes SufU was predicted to be prone to conformational flexibility and potentially important for interaction with target proteins, similar to how the dynamic LPPVK motif of IscU mediates interaction with HscA and HscB [36,42]. We considered the possible relationship between conformational heterogeneity and other protein functions such as ligand coordination. The first step was to compare the structures of IscU representing the D and S states to identify the amino acid regions exhibiting the largest changes in apparent secondary structure between the two states (Figure 1.6, pink highlights). As shown in Figure 1.5, the first and second active site cysteine residues lie within, or very close to, sequence regions that this comparison identified as undergoing conformational transition in IscU. On the other hand, PONDR analysis of IscU predicted disorder in different regions, differing in both the extent and location of the implicated sequence regions (Figure 1.5). For example, some residues predicted by PONDR to be disordered retain structure in both the D and S states. These include the putative binding site residue D39 and the LPPVK motif, both of which appear structured in the D and S IscU structures. However, previous mutational studies showed that single-site mutations of D39, N90, S107 or E111 (conserved residues that lie within or close to disordered regions predicted by PONDR) strongly favor the S-state of E. coli IscU [66]. Furthermore, the proline residues in the LPPVK motif (P100, P101) undergo a cis/trans isomerization implicated in the conformational transition of IscU [27]. Therefore, although PONDR analysis and the comparison of IscU NMR and X-ray crystal structures do not agree completely, PONDR appears to offer useful insights into sequence regions and residues that may be involved in conformational variability.
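The D-state versus S-state comparison can be expressed as a simple per-residue test. The minimal Python sketch below assumes that secondary-structure assignments for both structures have already been obtained and aligned residue by residue. For example, these could be one-letter codes from a DSSP-style tool applied to 3LVL chain A and one 2L4X conformer. The text does not state how its comparison was computed, and the function name and toy strings are hypothetical.

```python
from typing import List, Optional, Tuple


def differing_regions(ss_a: str, ss_b: str, first_residue: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive residue ranges where two per-residue secondary-structure
    assignment strings (e.g. 'H' helix, 'E' strand, '-' coil) disagree.

    Both strings must be aligned so that position i refers to the same residue.
    """
    if len(ss_a) != len(ss_b):
        raise ValueError("assignments must cover the same residues")
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (a, b) in enumerate(zip(ss_a, ss_b)):
        if a != b and start is None:
            start = i  # a run of disagreement begins
        elif a == b and start is not None:
            regions.append((start + first_residue, i - 1 + first_residue))
            start = None  # the run has ended
    if start is not None:  # a run extends to the last residue
        regions.append((start + first_residue, len(ss_a) - 1 + first_residue))
    return regions


# Hypothetical toy assignments for a 16-residue segment (not real IscU data).
s_state = "--EEEE--HHHHHH--"
d_state = "--EE----HHH-----"
print(differing_regions(s_state, d_state))  # [(5, 6), (12, 14)]
```

The resulting residue ranges could then be intersected with the PONDR-predicted disordered residues to quantify how far the two approaches agree.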

Figure 1.6 A comparison of experimental structures of IscU representing the structured and partially disordered states highlights differences in secondary structure. IscU exists in a dynamic equilibrium of two conformational states that interconvert on a timescale of seconds [27]. The S-state is represented by the X-ray crystal structure of IscU in complex with IscS (PDB ID: 3LVL [69]). The partially disordered state (D-state) is represented by one of the twenty conformers in the deposited NMR structure ensemble for apo-IscU, which contains clear loop and “extended” regions that lack structure compared to the S-state (PDB ID: 2L4X [60]). Equivalent residues whose conformations and secondary structure assignments differ between the S- and D-states are colored pink.

In light of the similarities between IscU and SufU, we sought to compare their structural responses to the addition of Zn2+ as a stimulus. We selected SufU from Streptococcus mutans, a well-studied Firmicute that is a primary causative agent of dental caries and is known for its robust ability to form biofilms [70-72]. SufU from S. mutans shares 43% sequence similarity with E. coli IscU. We report here structural transitions of E. coli IscU induced by the addition of zinc and compare them with the features and responses of S. mutans SufU under the same conditions. This study investigates whether both IscU and SufU exhibit conformational flexibility, in order to explore whether this phenomenon is a widespread strategy in Fe-S cluster biosynthesis mediated by U-type proteins.

1.7 Interactions of IscU with metal ions and their biological relevance

Because IscU lacks a rigid fold in solution, probing its three-dimensional structure has been difficult. Attempts to resolve the structural heterogeneity of IscU succeeded upon the addition of metal ions. In other studies, IscU has also been characterized as a “complex-orphan protein” that is susceptible to unfolding in the absence of a cofactor or interacting protein [73].

In vivo, IscU binds a [2Fe-2S] cluster [39,74]. In vitro studies have demonstrated cluster assembly on IscU in the presence of IscS. However, apo-IscU does not bind Fe2+ or Fe3+ ions, indicating that assembly of an Fe-S cluster requires additional partner proteins [58]. Most recently, it was shown that addition of Zn2+ to apo-IscU in solution favors the S-state and confers greater stability than does formation of an IscU:IscS complex with an Fe2+ ion [61].

Strong interactions between U-type proteins and Zn2+ ions have also been demonstrated in other cases. SufU from B. subtilis has been reported to be a zinc-dependent protein (Kd ~ 10^-17 M) [38]; however, the biological significance of zinc binding to IscU remains unclear. While total cellular concentrations of zinc can be in the range of 10^-4 M, the amount of “available” zinc ions (not bound by proteins or sequestered in cellular reserves) is thought to be exceedingly low [75]. A possible counterpoint is that during oxidative stress, zinc ions were found to mismetallate iron-binding sites in mononuclear iron enzymes, decreasing the activity of the affected enzymes [76]. Alternatively, metal ion trafficking can dramatically elevate the apparent concentration of metal ions to which a protein is exposed; however, we are not aware of an established role for zinc ions (or zinc trafficking) in bacterial Fe-S cluster biosynthesis [75]. Nevertheless, as noted in an earlier study, zinc is expected to promote rather than impede Fe-S cluster delivery by influencing interactions of IscU with IscS [40]. Despite the unconfirmed biological relevance of zinc binding in Fe-S cluster biosynthesis, Zn2+ ions have been recognized to stimulate conformational changes in vitro that, in the case of IscU, appear to represent states of the protein that are important during the Fe-S biosynthesis cycle [59].
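The occupancy of a single binding site, θ = [Zn2+]free / (Kd + [Zn2+]free), shows why the amount of available zinc matters so much. The minimal Python sketch below uses the Kd of about 10^-17 M reported for B. subtilis SufU [38]. The free Zn2+ concentrations are illustrative assumptions, not measured values, and the function is hypothetical.

```python
def fraction_bound(free_ligand_m: float, kd_m: float) -> float:
    """Fractional occupancy of a single, independent site: [L] / (Kd + [L]).

    Assumes the free ligand concentration is known (or that ligand is in large
    excess, so free ~ total); tight binders under ligand-limited conditions
    need the full quadratic treatment instead.
    """
    if free_ligand_m < 0 or kd_m <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    return free_ligand_m / (kd_m + free_ligand_m)


KD_SUFU_ZN = 1e-17  # M, order of magnitude reported for B. subtilis SufU [38]

# Illustrative free Zn2+ concentrations (assumed values, not measurements).
for free_zn in (1e-19, 1e-17, 1e-15):
    occupancy = fraction_bound(free_zn, KD_SUFU_ZN)
    print(f"[Zn2+]free = {free_zn:.0e} M -> fraction bound = {occupancy:.3f}")
```

Under these assumptions the site is half-occupied when free Zn2+ equals Kd and about 99% occupied when free Zn2+ is 100-fold higher. Even trace amounts of available zinc could therefore, in principle, load a site this tight.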

1.8 Interactions of toxic metals with Fe-S biosynthesis systems

Fe-S clusters are frequently targeted by soft metals with high affinity for sulfur ligands. Metals reported to disrupt Fe-S biosynthesis include Cu(I) [55] and Co(II) [56], and several other metals are known to attack exposed Fe-S clusters in proteins such as dehydratases [76,77].

These metals may act by stimulating Fenton reactions that produce reactive oxygen species (ROS), which damage Fe-S clusters and other cellular components. Other metals such as Cd(II) and Al(III) are also known to affect Fe-S enzymes, further propagating the damage via free radical formation [78-80]. In most reported toxic metal stress studies, the primary mechanism of toxicity is unknown. It has been speculated that some, if not all, toxic metals directly disrupt Fe-S biosynthesis and damage-repair processes by impairing the biosynthesis proteins or the iron donor required for repair of damaged clusters.

[4Fe–4S]2+ clusters present in Fe-S enzymes of the TCA cycle have one solvent-exposed iron atom that can interact adventitiously with a superoxide anion (O2•−) [81-84]. Transfer of one electron from the cluster to O2•− forms H2O2 and converts the [4Fe–4S]2+ cluster to the unstable [4Fe–4S]3+ state, which loses one Fe2+ to yield a [3Fe–4S]1+ cluster and thus an inactive enzyme. Furthermore, the released Fe2+ can react with H2O2 to generate hydroxyl radical (OH•), which can damage DNA [84]. ROS can also directly affect Fe–S cluster biogenesis in aerobic bacteria, in pathogens inside hosts, and in bacteria treated with antibiotics [85,86]. Moreover, mutations impairing Fe–S biosynthesis enhance resistance to antibiotics in bacteria [86-88].
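The damage sequence described above can be written as three reactions. The two protons in the first step are not mentioned in the text; they are included here only to balance charge, giving a net charge of 3+ on each side. The last reaction is the Fenton reaction.

```latex
\begin{align}
[\mathrm{4Fe\text{-}4S}]^{2+} + \mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+} &\longrightarrow [\mathrm{4Fe\text{-}4S}]^{3+} + \mathrm{H_2O_2} \\
[\mathrm{4Fe\text{-}4S}]^{3+} &\longrightarrow [\mathrm{3Fe\text{-}4S}]^{1+} + \mathrm{Fe^{2+}} \\
\mathrm{Fe^{2+}} + \mathrm{H_2O_2} &\longrightarrow \mathrm{Fe^{3+}} + \mathrm{OH^-} + \mathrm{OH^{\bullet}}
\end{align}
```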

Under such oxidative or toxic metal stress, the SUF system is induced in Firmicutes [26,36,43,89]. In Gram-negative bacteria such as E. coli, the sufABCDSE operon is upregulated under stress conditions [87,89]. It is unclear how the Fe-S biosynthesis proteins IscU and SufU are affected by toxic metal stress. Pertinent questions include (i) which residues are involved in metal binding, (ii) how toxic metal binding affects the conformation of U-type proteins, and (iii) whether toxic metal ions bind at the site of Fe-S cluster assembly.

1.9 Specific aims and significance of the work

The goal of this study was to extend studies of the interactions of IscU variants with different metal ions and to compare them systematically with the responses of S. mutans SufU to the same metals. The specific aims are:

i. To compare the effects of transition metal ion binding on IscU and SufU conformations

ii. To probe the role of two conserved active site residues in metal-dependent structural transitions and metal ion-binding preference

Because so many cellular processes depend on Fe–S proteins, biosynthesis and maintenance of Fe-S clusters are imperative to cell survival. Any damage to Fe–S cofactors or to the Fe-S cluster biosynthetic systems can have deleterious effects on cellular physiology. Because Fe-S cluster biosynthesis proteins have evolved to bind metals, mismetallation of these proteins is a common mechanism that impairs Fe-S cluster biosynthesis. As conformational transitions are essential to the function of IscU as a metamorphic protein, this work focuses on the impact of toxic metal binding on the conformation of IscU and its SufU homolog. Metal-protein interactions are examined via independent experimental techniques that monitor changes in the secondary structure of the protein, the thermal stability of the protein upon metal binding, and the thermodynamics of the interaction. The outcomes of this study delineate similarities in the behavior of the U-type proteins IscU and SufU and clarify the metal ion selectivity exhibited by these proteins.


REFERENCES

1. Riboldi GP, de Mattos EP, Frazzon J (2013) Biogenesis of [Fe–S] cluster in Firmicutes: an unexploited field of investigation. Antonie van Leeuwenhoek 104: 283-300.
2. Roche B, Aussel L, Ezraty B, Mandin P, Py B, et al. (2013) Iron/sulfur proteins biogenesis in prokaryotes: formation, regulation and diversity. Biochimica et Biophysica Acta (BBA)-Bioenergetics 1827: 923-937.
3. Bian S, Cowan J (1999) Protein-bound iron–sulfur centers. Form, function, and assembly. Coordination chemistry reviews 190: 1049-1066.
4. Py B, Moreau PL, Barras F (2011) Fe–S clusters, fragile sentinels of the cell. Current opinion in microbiology 14: 218-223.
5. Beinert H (2000) Iron-sulfur proteins: ancient structures, still full of surprises. Journal of Biological Inorganic Chemistry 5: 2-15.
6. Beinert H, Holm RH, Münck E (1997) Iron-sulfur clusters: nature's modular, multipurpose structures. Science 277: 653-659.
7. Cambray J, Lane R, Wedd A, Johnson RW, Holm R (1977) Chemical and electrochemical interrelationships of the 1-Fe, 2-Fe, and 4-Fe analogs of the active sites of iron-sulfur proteins. Inorganic Chemistry 16: 2565-2571.
8. Capozzi F, Ciurli S, Luchinat C (1998) Coordination sphere versus protein environment as determinants of electronic and functional properties of iron-sulfur proteins. In: Hill HAO, Sadler PJ, Thomson AJ, editors. Metal Sites in Proteins and Models Redox Centres. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 127-160.
9. Beinert H (2000) Iron-sulfur proteins: ancient structures, still full of surprises. JBIC Journal of Biological Inorganic Chemistry 5: 2-15.
10. Py B, Barras F (2010) Building Fe–S proteins: bacterial strategies. Nat Rev Micro 8: 436-446.
11. Liu J, Chakraborty S, Hosseinzadeh P, Yu Y, Tian S, et al. (2014) Metalloproteins containing cytochrome, iron–sulfur, or copper redox centers. Chemical Reviews 114: 4366-4469.
12. Beinert H (2000) A tribute to sulfur. European Journal of Biochemistry 267: 5657-5664.
13. Jensen KP (2006) Iron–sulfur clusters: Why iron? Journal of inorganic biochemistry 100: 1436-1439.
14. Lill R (2009) Function and biogenesis of iron–sulphur proteins. Nature 460: 831-838.
15. Lill R, Mühlenhoff U (2005) Iron–sulfur-protein biogenesis in eukaryotes. Trends in biochemical sciences 30: 133-141.
16. Ayala-Castro C, Saini A, Outten FW (2008) Fe-S cluster assembly pathways in bacteria. Microbiology and Molecular Biology Reviews 72: 110-125.
17. Fontecave M, De Choudens SO, Py B, Barras F (2005) Mechanisms of iron–sulfur cluster assembly: the SUF machinery. JBIC Journal of Biological Inorganic Chemistry 10: 713-721.

18. Santos JA, Alonso-García N, Macedo-Ribeiro S, Pereira PJB (2014) The unique regulation of iron-sulfur cluster biogenesis in a Gram-positive bacterium. Proceedings of the National Academy of Sciences 111: E2251-E2260.
19. Riboldi GP, Verli H, Frazzon J (2009) Structural studies of the Enterococcus faecalis SufU [Fe-S] cluster protein. BMC biochemistry 10: 3.
20. Albrecht AG, Netz DJ, Miethke M, Pierik AJ, Burghaus O, et al. (2010) SufU is an essential iron-sulfur cluster scaffold protein in Bacillus subtilis. Journal of bacteriology 192: 1643-1651.
21. Fontecave M, Ollagnier-de-Choudens S (2008) Iron–sulfur cluster biosynthesis in bacteria: mechanisms of cluster assembly and transfer. Archives of biochemistry and biophysics 474: 226-237.
22. Ayala-Castro C, Saini A, Outten FW (2008) Fe-S cluster assembly pathways in bacteria. Microbiol Mol Biol Rev 72: 110-125, table of contents.
23. Kim JH, Bothe JR, Alderson TR, Markley JL (2015) Tangled web of interactions among proteins involved in iron–sulfur cluster assembly as unraveled by NMR, SAXS, chemical crosslinking, and functional studies. Biochimica et Biophysica Acta (BBA)-Molecular Cell Research 1853: 1416-1428.
24. Black KA, Dos Santos PC (2015) Shared-intermediates in the biosynthesis of thio-cofactors: Mechanism and functions of cysteine desulfurases and sulfur acceptors. Biochimica et Biophysica Acta (BBA)-Molecular Cell Research 1853: 1470-1480.
25. Outten FW, Wood MJ, Muñoz FM, Storz G (2003) The SufE protein and the SufBCD complex enhance SufS cysteine desulfurase activity as part of a sulfur transfer pathway for Fe-S cluster assembly in Escherichia coli. Journal of Biological Chemistry 278: 45713-45719.
26. Wollers S, Layer G, Garcia-Serres R, Signor L, Clemancey M, et al. (2010) Iron-sulfur (Fe-S) cluster assembly: the SufBCD complex is a new type of Fe-S scaffold with a flavin redox cofactor. Journal of Biological Chemistry 285: 23331-23341.
27. Dai Z, Tonelli M, Markley JL (2012) Metamorphic protein IscU changes conformation by cis–trans isomerizations of two peptidyl–prolyl peptide bonds. Biochemistry 51: 9595-9602.
28. Cai K, Frederick RO, Kim JH, Reinen NM, Tonelli M, et al. (2013) Human mitochondrial chaperone (mtHSP70) and cysteine desulfurase (NFS1) bind preferentially to the disordered conformation, whereas co-chaperone (HSC20) binds to the structured conformation of the iron-sulfur cluster scaffold protein (ISCU). Journal of Biological Chemistry 288: 28755-28770.
29. Blanc B, Gerez C, de Choudens SO (2015) Assembly of Fe/S proteins in bacterial systems: Biochemistry of the bacterial ISC system. Biochimica et Biophysica Acta (BBA)-Molecular Cell Research 1853: 1436-1447.

30. Lo M, Murray GL, Khoo CA, Haake DA, Zuerner RL, et al. (2010) Transcriptional response of Leptospira interrogans to iron limitation and characterization of a PerR homolog. Infection and immunity 78: 4850-4859.
31. Ellermeier JR, Slauch JM (2008) Fur regulates expression of the Salmonella pathogenicity island 1 type III secretion system through HilD. Journal of bacteriology 190: 476-486.
32. Cheng H, Chen Y, Wu C, Chang H, Lai Y, et al. (2010) RmpA regulation of capsular polysaccharide biosynthesis in Klebsiella pneumoniae CG43. Journal of bacteriology 192: 3144-3158.
33. Brickman TJ, Anderson MT, Armstrong SK (2007) Bordetella iron transport and virulence. Biometals: an international journal on the role of metal ions in biology, biochemistry, and medicine 20: 303-322.
34. Dai Y, Kim D, Dong G, Busenlehner LS, Frantom PA, et al. (2015) SufE D74R Substitution Alters Active Site Loop Dynamics To Further Enhance SufE Interaction with the SufS Cysteine Desulfurase. Biochemistry 54: 4824-4833.
35. Selbach B, Earles E, Dos Santos PC (2010) Kinetic Analysis of the Bisubstrate Cysteine Desulfurase SufS from Bacillus subtilis. Biochemistry 49: 8794-8802.
36. Riboldi GP, Verli H, Frazzon J (2009) Structural studies of the Enterococcus faecalis SufU [Fe-S] cluster protein. BMC biochemistry 10: 1.
37. Riboldi GP, De Oliveira JS, Frazzon J (2011) Enterococcus faecalis SufU scaffold protein enhances SufS desulfurase activity by acquiring sulfur from its cysteine-153. Biochimica et Biophysica Acta (BBA)-Proteins and Proteomics 1814: 1910-1918.
38. Selbach BP, Chung AH, Scott AD, George SJ, Cramer SP, et al. (2013) Fe-S cluster biogenesis in Gram-positive bacteria: SufU is a zinc-dependent sulfur transfer protein. Biochemistry 53: 152-160.
39. Mansy SS, Wu G, Surerus KK, Cowan JA (2002) Iron-Sulfur Cluster Biosynthesis: Thermatoga maritima IscU is a structured Iron-Sulfur cluster assembly protein. Journal of Biological Chemistry 277: 21397-21404.
40. Iannuzzi C, Adrover M, Puglisi R, Yan R, Temussi PA, et al. (2014) The role of zinc in the stability of the marginally stable IscU scaffold protein. Protein Science 23: 1208-1219.
41. Roche B, Aussel L, Ezraty B, Mandin P, Py B, et al. (2013) Iron/sulfur proteins biogenesis in prokaryotes: Formation, regulation and diversity. Biochimica et Biophysica Acta (BBA)-Bioenergetics 1827: 455-469.
42. Kim JH, Tonelli M, Frederick RO, Chow DC-F, Markley JL (2012) Specialized Hsp70 chaperone (HscA) binds preferentially to the disordered form, whereas J-protein (HscB) binds preferentially to the structured form of the iron-sulfur cluster scaffold protein (IscU). Journal of Biological Chemistry 287: 31406-31413.
43. Outten FW (2015) Recent advances in the Suf Fe–S cluster biogenesis pathway: Beyond the Proteobacteria. Biochimica et Biophysica Acta (BBA)-Molecular Cell Research 1853: 1464-1469.

44. Outten FW, Djaman O, Storz G (2004) A suf operon requirement for Fe–S cluster assembly during iron starvation in Escherichia coli. Molecular microbiology 52: 861-872.
45. Rybniker J, Pojer F, Marienhagen J, Kolly GS, Chen JM, et al. (2014) The cysteine desulfurase IscS of Mycobacterium tuberculosis is involved in iron-sulfur cluster biogenesis and oxidative stress defence. Biochemical Journal 459: 467-478.
46. Huet G, Daffé M, Saves I (2005) Identification of the Mycobacterium tuberculosis SUF machinery as the exclusive mycobacterial system of [Fe-S] cluster assembly: evidence for its implication in the pathogen's survival. Journal of bacteriology 187: 6137-6146.
47. Helaine S, Kugelberg E (2014) Bacterial persisters: formation, eradication, and experimental systems. Trends in microbiology 22: 417-424.
48. Graves DB (2012) The emerging role of reactive oxygen and nitrogen species in redox biology and some implications for plasma applications to medicine and biology. Journal of Physics D: Applied Physics 45: 263001.
49. Kohanski MA, Dwyer DJ, Collins JJ (2010) How antibiotics kill bacteria: from targets to networks. Nature Reviews Microbiology 8: 423-435.
50. Kohanski MA, Dwyer DJ, Wierzbowski J, Cottarel G, Collins JJ (2008) Mistranslation of membrane proteins and two-component system activation trigger antibiotic-mediated cell death. Cell 135: 679-690.
51. Runyen-Janecky L, Daugherty A, Lloyd B, Wellington C, Eskandarian H, et al. (2008) Role and regulation of iron-sulfur cluster biosynthesis genes in Shigella flexneri virulence. Infection and immunity 76: 1083-1092.
52. Nachin L, El Hassouni M, Loiseau L, Expert D, Barras F (2001) SoxR-dependent response to oxidative stress and virulence of Erwinia chrysanthemi: the key role of SufC, an orphan ABC ATPase. Molecular microbiology 39: 960-972.
53. Rincon-Enriquez G, Crété P, Barras F, Py B (2008) Biogenesis of Fe/S proteins and pathogenicity: IscR plays a key role in allowing Erwinia chrysanthemi to adapt to hostile conditions. Molecular microbiology 67: 1257-1273.
54. Glasner JD, Yang C-H, Reverchon S, Hugouvieux-Cotte-Pattat N, Condemine G, et al. (2011) Genome sequence of the plant pathogenic bacterium Dickeya dadantii 3937. Journal of bacteriology.
55. Chillappagari S, Seubert A, Trip H, Kuipers OP, Marahiel MA, et al. (2010) Copper Stress Affects Iron Homeostasis by Destabilizing Iron-Sulfur Cluster Formation in Bacillus subtilis. Journal of bacteriology 192: 2512-2524.
56. Ranquet C, Ollagnier-de-Choudens S, Loiseau L, Barras F, Fontecave M (2007) Cobalt Stress in Escherichia coli: The effect on the iron-sulfur proteins. Journal of Biological Chemistry 282: 30442-30451.
57. Ramelot TA, Cort JR, Goldsmith-Fischman S, Kornhaber GJ, Xiao R, et al. (2004) Solution NMR structure of the iron–sulfur cluster assembly protein U (IscU) with zinc bound at the active site. Journal of molecular biology 344: 567-583.

58. Adinolfi S, Rizzo F, Masino L, Nair M, Martin SR, et al. (2004) Bacterial IscU is a well folded and functional single domain protein. European Journal of Biochemistry 271: 2093-2100.
59. Markley JL, Kim JH, Dai Z, Bothe JR, Cai K, et al. (2013) Metamorphic protein IscU alternates conformations in the course of its role as the scaffold protein for iron–sulfur cluster biosynthesis and delivery. FEBS letters 587: 1172-1179.
60. Kim JH, Tonelli M, Kim T, Markley JL (2012) Three-dimensional structure and determinants of stability of the iron–sulfur cluster scaffold protein IscU from Escherichia coli. Biochemistry 51: 5557-5563.
61. Kim JH, Füzéry AK, Tonelli M, Ta DT, Westler WM, et al. (2009) Structure and Dynamics of the Iron−Sulfur Cluster Assembly Scaffold Protein IscU and Its Interaction with the Cochaperone HscB. Biochemistry 48: 6062-6071.
62. Bertini I, Cowan J, Del Bianco C, Luchinat C, Mansy SS (2003) Thermotoga maritima IscU. Structural characterization and dynamics of a new class of metallochaperone. Journal of molecular biology 331: 907-924.
63. Yoon T, Cowan J (2003) Iron-sulfur cluster biosynthesis. Characterization of frataxin as an iron donor for assembly of [2Fe-2S] clusters in ISU-type proteins. Journal of the American Chemical Society 125: 6078-6084.
64. Wu G, Mansy SS, Wu S-p, Surerus KK, Foster MW, et al. (2002) Characterization of an iron-sulfur cluster assembly protein (ISU1) from Schizosaccharomyces pombe. Biochemistry 41: 5024-5032.
65. Agar JN, Krebs C, Frazzon J, Huynh BH, Dean DR, et al. (2000) IscU as a scaffold for iron-sulfur cluster biosynthesis: sequential assembly of [2Fe-2S] and [4Fe-4S] clusters in IscU. Biochemistry 39: 7856-7862.
66. Kim JH, Tonelli M, Markley JL (2012) Disordered form of the scaffold protein IscU is the substrate for iron-sulfur cluster assembly on cysteine desulfurase. Proceedings of the National Academy of Sciences 109: 454-459.
67. Li J, Ding S, Cowan J (2013) Thermodynamic and structural analysis of human NFU conformational chemistry. Biochemistry 52: 4904-4913.
68. Kornhaber GJ, Snyder D, Moseley HN, Montelione GT (2006) Identification of zinc-ligated cysteine residues based on 13Cα and 13Cβ chemical shift data. Journal of biomolecular NMR 34: 259-269.
69. Shi R, Proteau A, Villarroya M, Moukadiri I, Zhang L, et al. (2010) Structural basis for Fe–S cluster assembly and tRNA thiolation mediated by IscS protein–protein interactions. PLoS Biol 8: e1000354.
70. Lemos JA, Quivey Jr RG, Koo H, Abranches J (2013) Streptococcus mutans: a new Gram-positive paradigm? Microbiology 159: 436-445.
71. Krzyściak W, Jurczak A, Kościelniak D, Bystrowska B, Skalniak A (2014) The virulence of Streptococcus mutans and the ability to form biofilms. European Journal of Clinical Microbiology & Infectious Diseases 33: 499-515.

72. Banas JA (2004) Virulence properties of Streptococcus mutans. Front Biosci 9: 1267-1277.
73. Prischi F, Pastore C, Carroni M, Iannuzzi C, Adinolfi S, et al. (2010) Of the vulnerability of orphan complex proteins: The case study of the E. coli IscU and IscS proteins. Protein expression and purification 73: 161-166.
74. Nuth M, Cowan J (2009) Iron–sulfur cluster biosynthesis: characterization of IscU–IscS complex formation and a structural model for sulfide delivery to the [2Fe–2S] assembly site. JBIC Journal of Biological Inorganic Chemistry 14: 829-839.
75. Outten CE, O'Halloran TV (2001) Femtomolar sensitivity of metalloregulatory proteins controlling zinc homeostasis. Science 292: 2488-2492.
76. Imlay JA (2014) The mismetallation of enzymes during oxidative stress. Journal of Biological Chemistry 289: 28121-28128.
77. Xu FF, Imlay JA (2012) Silver (I), mercury (II), cadmium (II), and zinc (II) target exposed enzymic iron-sulfur clusters when they toxify Escherichia coli. Applied and environmental microbiology 78: 3614-3621.
78. Lemire JA, Harrison JJ, Turner RJ (2013) Antimicrobial activity of metals: mechanisms, molecular targets and applications. Nature Reviews Microbiology 11: 371-384.
79. Helbig K, Grosse C, Nies DH (2008) Cadmium toxicity in glutathione mutants of Escherichia coli. Journal of bacteriology 190: 5439-5454.
80. Calderon IL, Elías AO, Fuentes EL, Pradenas GA, Castro ME, et al. (2009) Tellurite-mediated disabling of [4Fe–4S] clusters of Escherichia coli dehydratases. Microbiology 155: 1840-1846.
81. Dixon SJ, Stockwell BR (2014) The role of iron and reactive oxygen species in cell death. Nature chemical biology 10: 9-17.
82. Imlay JA (2013) The molecular mechanisms and physiological consequences of oxidative stress: lessons from a model bacterium. Nature Reviews Microbiology 11: 443-454.
83. Jang S, Imlay JA (2010) Hydrogen peroxide inactivates the Escherichia coli Isc iron-sulphur assembly system, and OxyR induces the Suf system to compensate. Molecular microbiology 78: 1448-1467.
84. Imlay JA, Chin SM, Linn S (1988) Toxic DNA damage by hydrogen peroxide through the Fenton reaction in vivo and in vitro. Science 240: 640.
85. Maisonneuve E, Gerdes K (2014) Molecular mechanisms underlying bacterial persisters. Cell 157: 539-548.
86. Kohanski MA, Dwyer DJ, Hayete B, Lawrence CA, Collins JJ (2007) A common mechanism of cellular death induced by bactericidal antibiotics. Cell 130: 797-810.
87. Dwyer DJ, Belenky PA, Yang JH, MacDonald IC, Martell JD, et al. (2014) Antibiotics induce redox-related physiological alterations as part of their lethality. Proceedings of the National Academy of Sciences 111: E2100-E2109.
88. Yeom J, Imlay JA, Park W (2010) Iron homeostasis affects antibiotic-mediated cell death in Pseudomonas species. Journal of Biological Chemistry 285: 22689-22695.

89. Lee K-C, Yeo W-S, Roe J-H (2008) Oxidant-responsive induction of the suf operon, encoding a Fe-S assembly system, through Fur and IscR in Escherichia coli. Journal of bacteriology 190: 8244-8247.

24

CHAPTER 2. EFFECT OF TOXIC METAL STRESS ON CONFORMATIONS OF U-TYPE PROTEINS

2.1 IscU and SufU as metalloproteins

U-type proteins are widely observed among the different Fe-S cluster biosynthesis systems [1].

This family includes the archetypal members NifU and IscU, which cooperate with cysteine desulfurase enzymes to mediate the assembly and transfer of nascent Fe-S clusters in their respective biosynthesis systems [2-4]. A more recent protein of interest is SufU, found in Firmicutes such as Bacillus subtilis and Enterococcus faecalis, which shares 43% sequence similarity with IscU [5] and is grouped by the NCBI Conserved Domains Database [6] in the same family of “IscU-like” proteins. Despite their sequence and structural similarities, the two proteins exhibit distinct behaviors. For example, SufU appears not to function as a typical Fe-S cluster scaffold, but rather enhances the catalytic rate of its cognate SufS desulfurase by nearly 200-fold [7,8]. Despite these differences, both IscU and SufU can bind a Zn2+ ion at their active sites, which alters the structures of the proteins. In the case of IscU, a bound Zn2+ ion shifts the protein from a partially disordered (D) state to a more structured (S) state, essentially mimicking the cluster-loaded conformation of IscU [9-11]. SufU likewise requires a bound Zn2+ ion to enhance the activity of its cognate binding partner SufS [7].

2.1.1 Fe-S clusters targeted by transition metals

In their native states, both proteins are associated with metals. Specifically, both IscU and SufU exhibit a strong affinity for zinc. In the absence of an Fe-S cluster, a Zn2+ ion can bind at the active site of IscU through coordination by its conserved constellation of three cysteine residues (Cys37, Cys63, and Cys106, E. coli numbering). Two other residues, D39 and H105, lie in close proximity and are implicated in metal coordination at the active site [12-17]. The SufU active site comprises three characteristic cysteine residues (Cys41, Cys66, and Cys128, B. subtilis numbering) and an aspartate residue (D43) that is the counterpart of D39 in E. coli IscU [5]. These residues in SufU can participate in tetrahedral coordination of a Zn2+ ion (PDB: 2AZH [18]). Notably, the reliance on conserved cysteine residues in both IscU and SufU may render the proteins susceptible to binding thiophilic soft transition metal ions, with deleterious effects from “mismetallation” [19,20]. Metal ion species formed from Cu, Ag, Hg, Cd, and Co can bind to the active-site cysteine residues of dehydratases, leading to loss of function [19].

Co2+ and Cu+ ions have specifically been reported to affect expression of genes encoding ISC and SUF iron-sulfur cluster biogenesis proteins, and to destabilize Fe-S clusters on cluster-loaded proteins [21,22].

Our goal was to determine whether transition metals reported to be detrimental to Fe-S cluster proteins disrupt Fe-S cluster biosynthesis by binding directly to proteins essential for that process. We tested the interaction of five different metal ions, Zn2+, Fe3+, Cu2+, Co2+, and Cd2+, with apo E. coli IscU (IscU) and SufU from Streptococcus mutans (SufU), a well-studied Firmicute that is a primary causative agent of dental caries and is known for its robust ability to form biofilms [28-30]. SufU from S. mutans shares 43% sequence similarity with E. coli IscU. Our results confirm the structural transitions exhibited by IscU and SufU in the presence of zinc and suggest that IscU and SufU discriminate differently among transition metal ions.


2.2 Methods

2.2.1 Gene cloning and protein overexpression construct design

The genes encoding the desired proteins were PCR-amplified from genomic DNA purchased from Sigma (Escherichia coli K12 strain) or ATCC (Streptococcus mutans UA159). The amplified gene sequences were purified from agarose gels and then treated with NdeI and XhoI restriction enzymes (New England Biolabs) before ligation into either a modified pET vector designed to produce the target protein with a fused N-terminal 6-histidine purification tag cleavable by tobacco etch virus (TEV) protease (IscU) or a standard pET28a vector with an N-terminal 6-histidine purification tag cleavable by thrombin (SufU). All expression plasmids were submitted for nucleotide sequencing at the Chicago DNA Sequencing Facility prior to use.

2.2.2 Protein expression and IMAC purification

Competent E. coli BL21(DE3) cells (EMD Millipore) were transformed with the protein overexpression plasmids and plated on lysogeny broth (LB) agar plates containing kanamycin (65 μg/ml). Single colonies were used to inoculate 80 ml of sterile LB medium with 65 μg/ml kanamycin and grown overnight at 37 °C on an orbital shaker set to 250 rpm. The overnight cultures were diluted 1:200 into 1.5 L of sterile LB medium with 65 μg/ml kanamycin in baffled flasks and incubated in an orbital shaker at 37 °C and 250 rpm. When cell growth reached an OD600 of 0.5-0.7, the culture flasks were transferred to ice-water baths for 30 minutes, induced by the addition of 0.2 mM (final concentration) isopropyl β-D-1-thiogalactopyranoside (IPTG), and returned to the orbital shaker for an additional 16 hours at 15 °C and 110 rpm.

Approximately 3 g of cell pellet per liter of culture was harvested by centrifugation at 6000 rcf for 15 min, flash-frozen in liquid nitrogen, and stored at -80 °C. For purification, cell pellets were thawed and resuspended in lysis buffer comprising 0.025 M Tris-Cl pH 8.0, 0.500 M NaCl, and 2 μl (500 units) Benzonase nuclease (Sigma Aldrich). The lysis mixture was stirred with a magnetic stir bar and sonicated for 120 cycles of 2-second pulses separated by 3-second rests. Crude lysate was clarified by centrifugation at 25000 rcf for 30 minutes, and the supernatant was loaded onto a pre-packed 5 ml His-Trap immobilized nickel column (GE Healthcare) using a peristaltic pump. The column was washed extensively with wash buffer (identical to lysis buffer without added Benzonase). After washing, the bound protein was eluted with elution buffer (0.025 M Tris-Cl pH 7.8, 0.500 M NaCl, and 0.250 M imidazole). Samples taken at each step were analyzed by SDS-PAGE and visualized with Coomassie Brilliant Blue to identify fractions with optimal yield and purity for pooling and subsequent analysis. The pooled SufU fractions were buffer exchanged into 0.020 M Tris-Cl pH 7.5 by 4 cycles of concentration and dilution using Amicon Ultra-15 centrifugal filter units (EMD Millipore). The proteins were cleaved from their purification tags by addition of TEV protease (IscU) or thrombin (SufU), followed by overnight incubation. The reaction mixtures were passed over fresh columns of immobilized nickel resin, and the cleaved proteins were collected in the flow-through. Protein purity was confirmed by SDS-PAGE.

2.2.3 Iron ion content determination

The iron ion content of each purified protein was determined according to a published method using ferrozine purchased from Thermo Fisher Scientific [31]. Absorbance at the 560 nm maximum was recorded on a Cary60 spectrophotometer (Agilent Technologies). The amount of iron present in each sample was estimated from a standard curve of Fe3+ solutions ranging from 1 μg/ml to 6 μg/ml.
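Because the iron standards are expressed in μg/ml while protein concentrations are molar, the ferrozine calibration involves both a linear standard curve and a unit conversion. The Python sketch below is illustrative only: the function names and example absorbances are hypothetical, and it assumes a linear (Beer-Lambert) response across the 1-6 μg/ml standards.

```python
import numpy as np

FE_MOLAR_MASS = 55.845  # g/mol, standard atomic weight of iron


def fit_standard_curve(conc_ug_per_ml, absorbance):
    """Least-squares line A = slope * c + intercept through the iron standards."""
    slope, intercept = np.polyfit(conc_ug_per_ml, absorbance, 1)
    return slope, intercept


def iron_per_protein(a_sample, slope, intercept, protein_uM):
    """Convert a sample absorbance to mol Fe per mol protein (hypothetical helper)."""
    fe_ug_per_ml = (a_sample - intercept) / slope
    # 1 ug/ml = 1 mg/L; dividing by g/mol gives mmol/L, and x1000 gives umol/L (uM)
    fe_uM = fe_ug_per_ml / FE_MOLAR_MASS * 1000.0
    return fe_uM / protein_uM


# Hypothetical standards spanning the 1-6 ug/ml range used in the assay
standards = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
a560 = 0.2 * standards + 0.01
slope, intercept = fit_standard_curve(standards, a560)
ratio = iron_per_protein(0.12169, slope, intercept, protein_uM=10.0)
```

For a preparation that has been stripped of metal ions, this ratio should approach zero within the detection limit of the assay.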

2.2.4 Zinc ion content determination

Zinc ion content of the proteins was determined using a spectrophotometric assay with the chromophoric chelator 4-(2-pyridylazo)resorcinol (PAR) [32]. PAR forms a complex with zinc ions that exhibits an absorbance maximum at 497 nm. Standard solutions with Zn2+ (1-8 μM) and protein solutions were prepared in 0.05 M HEPES pH 7.4 containing 4.0 M guanidine hydrochloride, followed by addition of 0.05 mM PAR and incubation for 30 minutes. Spectra (350-650 nm) were recorded for all samples using the Cary60 spectrophotometer (Agilent Technologies), and the quantity of zinc ions present in each protein sample was determined from the calibration plot.
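Because full spectra were recorded, A497 has to be read from each spectrum before calibration. The following is a minimal sketch that assumes a reagent blank (PAR in buffer without zinc) is subtracted and that the Zn-PAR signal is linear over the 1-8 μM standards. The helper names and the synthetic spectra are hypothetical.

```python
import numpy as np


def absorbance_at(wavelengths_nm, spectrum, target_nm=497.0):
    """Read a spectrum at target_nm by linear interpolation.

    np.interp needs increasing x values, so spectra scanned from long to
    short wavelength are sorted first.
    """
    wl = np.asarray(wavelengths_nm, dtype=float)
    sp = np.asarray(spectrum, dtype=float)
    order = np.argsort(wl)
    return float(np.interp(target_nm, wl[order], sp[order]))


def zinc_uM(sample_spec, blank_spec, std_uM, std_specs, wavelengths_nm):
    """Blank-subtracted A497 calibration; returns the sample Zn2+ concentration in uM."""
    a_blank = absorbance_at(wavelengths_nm, blank_spec)
    a_std = np.array([absorbance_at(wavelengths_nm, s) for s in std_specs]) - a_blank
    slope, intercept = np.polyfit(std_uM, a_std, 1)
    a_sample = absorbance_at(wavelengths_nm, sample_spec) - a_blank
    return (a_sample - intercept) / slope


# Synthetic example: a Gaussian Zn-PAR band at 497 nm on a flat background
wl = np.arange(650.0, 349.5, -0.5)  # scanned from 650 down to 350 nm


def band(c_uM):
    return 0.05 + 0.03 * c_uM * np.exp(-((wl - 497.0) / 40.0) ** 2)


std_uM = np.arange(1.0, 9.0)  # 1-8 uM standards
zn = zinc_uM(band(3.5), band(0.0), std_uM, [band(c) for c in std_uM], wl)
```

Dividing the result by the protein concentration of the same sample gives the Zn:protein stoichiometry.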

2.2.5 Removal of coordinated metal ions

Coordinated metal ions were removed from purified proteins by passage over a column of Ni-NTA resin (Thermo Fisher Scientific) pre-treated with 0.1 M EDTA (pH 8.0) to expose open metal coordination sites that compete with protein-bound metal ions [33]. Before the protein was added, the EDTA-treated Ni-NTA resin was washed extensively with purified water, followed by 20 mM Tris-Cl buffer (pH 7.8). Purified protein was applied to the column and the flow-through was collected. The column was then washed repeatedly with 20 mM Tris-Cl to elute the apo-protein. Eluted apo-protein was concentrated using Amicon Ultra-15 centrifugal filter units (EMD Millipore) with a 3 kDa nominal molecular weight cutoff and diluted to its original concentration with 20 mM Tris-Cl buffer. This step was repeated 4-5 times to ensure removal of any residual EDTA. Removal of metal ions was confirmed by ferrozine and PAR assays prior to other experiments (Figures S2 and S3).
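Repeated concentrate-and-dilute steps remove a freely filtering small molecule such as EDTA geometrically, because each cycle retains only the fraction V_retained/V_full of the solute. The following sketch shows that arithmetic. The 15 ml and 0.5 ml volumes and the 1 mM (1000 μM) starting EDTA level are illustrative assumptions, not values from these experiments. The model also assumes that EDTA passes the 3 kDa membrane freely and is not bound by the protein.

```python
def residual_after_exchanges(c0, v_full_ml, v_retained_ml, cycles):
    """Concentration of a freely filtering solute after repeated concentrate/dilute cycles.

    Concentrating to v_retained_ml leaves the solute concentration unchanged in the
    retentate; diluting back to v_full_ml then scales it by v_retained_ml / v_full_ml.
    """
    if not 0 < v_retained_ml < v_full_ml:
        raise ValueError("retained volume must be between 0 and the full volume")
    return c0 * (v_retained_ml / v_full_ml) ** cycles


# Illustrative numbers only: 1000 uM residual EDTA, 15 ml device, 0.5 ml retentate
for n in (4, 5):
    print(n, "cycles:", residual_after_exchanges(1000.0, 15.0, 0.5, n), "uM")
```

With these assumed volumes, each cycle lowers the EDTA concentration 30-fold, so 4-5 cycles reduce it by roughly six to seven orders of magnitude.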

2.2.6 Circular Dichroism (CD) Experiments

Conformational changes induced by the addition of metal ions to the apo-proteins were monitored by circular dichroism spectroscopy. Stock solutions (0.10 M) of ZnCl2, CoCl2, and CdCl2 were freshly prepared in water and subsequently diluted into a buffer containing 0.02 M Tris-Cl (pH 7.7) and 0.5 mM tris(hydroxymethyl)phosphine (THP) to form the “working stock solution” (final concentration 50 μM). Samples of 20 μM IscU and SufU were prepared in a buffer containing 0.02 M Tris-Cl (pH 7.7) and 0.5 mM THP. Although Tris buffer with chloride counter-ions is known to interfere with CD measurements at wavelengths shorter than 200 nm, it was selected for its compatibility with the different metal salts used in parallel experiments. A blank spectrum of buffer alone was recorded and subtracted from all subsequent measurements. Protein:zinc solutions were prepared by combining equal volumes of protein and zinc chloride solutions in 20 mM Tris-Cl buffer (pH 7.7). Final sample composition was 10 μM apo-protein and 50 μM metal in 20 mM Tris-Cl. Protein and buffer samples were incubated for 1 h and then centrifuged for 10 minutes at 14000 rcf. Slight cloudiness was observed for SufU when the added zinc concentration exceeded approximately 40 μM; IscU solutions remained clear in the presence of zinc. Far-UV CD spectra of protein-metal complexes were recorded at 25 °C on an Aviv 62DS spectrometer upgraded to the equivalent of an Aviv Model 202 circular dichroism spectrometer. Data were collected every 0.5 nm with a 2 s averaging time. Three scans were averaged and the appropriate buffer baseline was subtracted. All spectra were smoothed and plotted in Igor Pro.
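The scan averaging, baseline subtraction, and conversion to molar ellipticity described above can be sketched as follows. This illustration makes several assumptions: the cell path length is not stated here, so 0.1 cm is used; the example readings are invented; and the conversion uses the standard relation [θ] = θ_obs(mdeg) / (10 · c · l), with c in mol/L and l in cm. Dividing by the number of residues gives mean residue ellipticity if that normalization is preferred. The Igor Pro smoothing step is not reproduced.

```python
import numpy as np


def molar_ellipticity(scans_mdeg, baseline_mdeg, conc_M, path_cm, n_residues=None):
    """Average repeat scans, subtract the buffer baseline, convert to deg cm^2 dmol^-1.

    [theta] = theta_obs(mdeg) / (10 * c[M] * l[cm]); dividing by n_residues
    gives mean residue ellipticity instead of molar ellipticity.
    """
    scans = np.asarray(scans_mdeg, dtype=float)
    theta = scans.mean(axis=0) - np.asarray(baseline_mdeg, dtype=float)
    molar = theta / (10.0 * conc_M * path_cm)
    return molar / n_residues if n_residues else molar


# Three hypothetical scans at two wavelengths (e.g., 208 and 222 nm),
# 10 uM protein, and an assumed 0.1 cm cell
scans = [[-10.0, -12.0], [-12.0, -13.0], [-8.0, -11.0]]
baseline = [0.5, 0.0]
result = molar_ellipticity(scans, baseline, conc_M=10e-6, path_cm=0.1)
```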

2.2.7 Thermofluor assay

Thermofluor experiments were carried out using a real-time PCR (RT-PCR) instrument from Applied Biosystems. Protein-metal solutions in 0.020 M HEPES, pH 7.5 (25 μL) were dispensed into 96-well polypropylene PCR microplates (Abgene) and sealed with transparent adhesive tape to prevent evaporation. Each well contained 10 μM (final concentration) apoIscU or apoSufU, 0.020 M HEPES buffer (pH 7.4), SYPRO Orange, and ZnCl2, CoCl2, or CdCl2 at a final concentration of 0.05 mM. Plates were robotically loaded onto the RT-PCR thermal block. The instrument was programmed first with a 5 min equilibration at 5 °C to allow the SYPRO Orange to diffuse and reach temperature equilibrium, lowering the initial background fluorescence [34]. The plate was then heated from 5 °C to 95 °C in stepwise increments of 1 °C per minute, with fluorescence read at settings optimized for SYPRO Orange: 485/20 nm (excitation) and 530/30 nm (emission).

Each protein-metal solution was measured in triplicate, and the resulting melting temperatures (Tm) were averaged to obtain a mean Tm. Reference wells contained buffer and metal solutions without protein.
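As described in the Results, Tm values can be taken from the global maximum of the first derivative of fluorescence with respect to temperature and then averaged over the three replicate wells. The following is a minimal NumPy sketch that assumes reference-well background has already been subtracted. The helper names and the synthetic sigmoidal curves are hypothetical.

```python
import numpy as np


def tm_from_first_derivative(temps_c, fluorescence):
    """Tm = temperature at the global maximum of dF/dT (central differences)."""
    t = np.asarray(temps_c, dtype=float)
    f = np.asarray(fluorescence, dtype=float)
    dfdt = np.gradient(f, t)
    return float(t[np.argmax(dfdt)])


def mean_tm(replicates):
    """Mean and sample standard deviation of Tm over (temps, fluorescence) replicates."""
    tms = np.array([tm_from_first_derivative(t, f) for t, f in replicates])
    return float(tms.mean()), float(tms.std(ddof=1))


# Synthetic sigmoidal melts on the 5-95 C, 1 C-step ramp used in the assay
temps = np.arange(5.0, 96.0, 1.0)


def melt(tm):
    return 1.0 / (1.0 + np.exp(-(temps - tm) / 2.0))


mean_value, sd = mean_tm([(temps, melt(63.0)), (temps, melt(63.0)), (temps, melt(64.0))])
```

Because the ramp uses 1 °C steps, this simple argmax resolves Tm only to the nearest degree. Fitting a smooth curve around the peak would be one way to refine it.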

2.2.8 Steady State Fluorescence measurements

Tryptophan fluorescence emission was recorded with a Fluorolog-3 (Horiba Scientific) from 10 μM apoSufU or apoIscU in 20 mM Tris-Cl buffer pH 7.7 with 0.5 mM THP in a quartz fluorescence cuvette. Spectra were recorded following incremental additions of 2.5 μM ZnCl2 from a 5 mM ZnCl2 working stock solution. Zinc was added incrementally until slight cloudiness was observed for both proteins above 50 μM Zn2+. Data were recorded from 305-450 nm with a 2 nm slit width and 0.5 s acquisition time. The excitation wavelength was 295 nm. Fractional saturation (f) for Zn2+ binding to each protein was determined by equation 1:

$$ f = \frac{F - F_0}{F_{\max} - F_0} \qquad (1) $$

where F is the fluorescence intensity measured from the protein solution at a particular zinc concentration, F0 is the fluorescence intensity of the apo-protein, and Fmax is the maximal fluorescence intensity achieved, beyond which no further change was observed. For fluorescence quenching, the signs of the numerator and denominator were reversed; because both terms change sign together, f remains positive and runs from 0 (apo-protein) to 1 (saturation). Binding curves were generated by fitting the fractional saturation versus zinc ion concentration to a one-site specific binding model using equation 2, as implemented in Prism 7 (GraphPad) (Figure 2.3, inset):

$$ y = \frac{B_{\max}\, x}{K_d + x} \qquad (2) $$

where y is the fractional saturation f, x is the zinc ion concentration, Bmax is the maximum specific binding in the same units as y, and Kd is the dissociation constant, corresponding to the zinc ion concentration at which half of the maximal change in fluorescence is observed for each protein.
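Equations 1 and 2 can be reproduced outside Prism with SciPy's `curve_fit`. This substitutes a generic least-squares fit for Prism's implementation, and the titration below is synthetic (2.5 μM steps up to 50 μM, as in the protocol, with a hypothetical Kd of 5 μM). Like equation 2, the sketch treats the added zinc concentration as the free concentration. When the protein concentration (10 μM here) is comparable to or larger than Kd, ligand depletion makes that approximation poor, so a fitted Kd in that regime is best read as an apparent value.

```python
import numpy as np
from scipy.optimize import curve_fit


def fractional_saturation(F, F0, Fmax):
    """Equation 1. For quenching, F - F0 and Fmax - F0 are both negative, so f stays positive."""
    return (np.asarray(F, dtype=float) - F0) / (Fmax - F0)


def one_site(x, bmax, kd):
    """Equation 2: y = Bmax * x / (Kd + x)."""
    return bmax * x / (kd + x)


def fit_one_site(zn_uM, f):
    """Least-squares fit of equation 2; returns Bmax, Kd and their standard errors."""
    p0 = [1.0, float(np.median(zn_uM))]
    popt, pcov = curve_fit(one_site, zn_uM, f, p0=p0,
                           bounds=([0.0, 0.0], [np.inf, np.inf]))
    return popt[0], popt[1], np.sqrt(np.diag(pcov))


# Synthetic quenching titration: F0 = 100, Fmax = 60, hypothetical Kd = 5 uM
zn = np.arange(2.5, 50.01, 2.5)
F = 100.0 + (60.0 - 100.0) * one_site(zn, 1.0, 5.0)
f = fractional_saturation(F, F0=100.0, Fmax=60.0)
bmax, kd, se = fit_one_site(zn, f)
```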

2.3 Results

2.3.1 Change in secondary structure in response to metal addition

Circular dichroism (CD) was employed to assess the structural responses of apoIscU and apoSufU upon separate titration with five different transition metal ions: Fe3+, Zn2+, Cu2+, Co2+, and Cd2+. Whereas addition of Fe3+ and Cu2+ ions did not alter the CD profiles of either protein, addition of Zn2+ and Cd2+ ions had clear effects. Specifically, addition of Zn2+ and Cd2+ ions enhanced the negative signal at 222 nm for IscU, consistent with prior reports [15]. In the case of SufU, addition of Zn2+ and Cd2+ ions resulted in a shallower negative band centered at 208 nm, as has also been observed previously [7]. These changes in the CD spectra suggest an increase in α-helical content for IscU and an increase in β-sheet content for SufU upon addition of Zn2+ and Cd2+ ions.


Figure 2.1 Far-UV circular dichroism spectra of 10 μM E. coli IscU (left) and S. mutans SufU (right) with 50 μM of different metal ions. Molar ellipticity is in deg cm² dmol⁻¹. All spectra were recorded at 25 °C with 10 μM protein samples in 20 mM Tris-Cl at pH 7.5.

The two proteins appear to respond differently to the addition of Co2+ ions. In the case of IscU, the CD spectrum recorded upon addition of Co2+ ions resembles those recorded following addition of Zn2+ and Cd2+ ions, whereas in the case of SufU, addition of Co2+ ions did not significantly alter the CD spectrum compared with that of apoSufU.

2.3.2 Probing thermal stability of the proteins with addition of metals

Ligands can increase the thermal stability of proteins by enhancing favorable interactions associated with a given structural conformation. To further probe possible IscU and SufU metal:protein interactions, we monitored changes in the thermal stability of apo- and metal-bound IscU and SufU using a fluorescence-based thermal shift assay [35,36]. Dye fluorescence is monitored in the presence of protein as the solution is slowly heated. As the protein unfolds, exposed hydrophobic moieties associate with the dye and enhance its fluorescence. The fluorescence is plotted as a function of temperature, and the resulting melting curve can be analyzed to obtain Tm values under different experimental conditions. To rule out any influence of the buffer or metal on dye fluorescence, negative controls included the dye in buffer (no protein or metal added) and the dye with metal salt solution (no protein added). Fe3+ and Cu2+ ions are known to quench the fluorescence of the SYPRO Orange dye used in the thermofluor assay and were therefore excluded from this experiment.

Figure 2.2 Thermal melting curves for (A) IscU and (B) SufU in the absence (dotted line) and presence (solid line) of different divalent metal ions

Melting curves obtained for the apo-proteins (dashed line, Figure 2.2) did not show the transition characteristic of thermal melting of a structured protein; hence, no meaningful melting temperature could be determined.

The results of the thermofluor assay were interpreted based both on the shape of the melting curves obtained for the proteins with metal and on the first derivative of the melting curves, whose global maximum gives the Tm. Based on the thermal melting curves of the proteins in the presence of metal ions, addition of Cd2+ and Zn2+ enhanced the thermal stabilities of both apo-IscU and apo-SufU (Table 2.1), whereas addition of Co2+ ions produced a thermally stable IscU:Co2+ complex but had no effect on the Tm of SufU.

Table 2.1. Melting temperatures of apo-IscU and apo-SufU in the presence of various metal ions

Tm (°C)        IscU    SufU
Apo-protein    -       -
Zn2+           63.2    57.8
Cd2+           59.5    55.7
Co2+           54.2    -

(-: no meaningful Tm could be determined)

2.3.3 Assessment of solvent accessibility of the protein active site upon metal binding

Zinc binding to IscU and SufU was studied by monitoring the fluorescence of the single intrinsic tryptophan in each protein during gradual titration with ZnCl2. Upon excitation at 295 nm, Trp76 in E. coli IscU exhibited maximal emission at 355 nm, which was enhanced by addition of Zn2+ ions. The end-point of the titration was reached at 50 μM ZnCl2. By comparison, maximal emission for S. mutans SufU was observed at 335 nm, and addition of ZnCl2 quenched the fluorescence signal.

Figure 2.3 Effect of zinc addition on tryptophan fluorescence emission. Normalized fluorescence spectra at various concentrations of zinc ions are shown. (A) E. coli IscU exhibits enhancement of fluorescence. (B) S. mutans SufU exhibits quenching of fluorescence. The apo-protein spectrum is shown in blue and the spectrum at the maximal zinc concentration in red. Inset: Binding curves generated from the fluorescence recorded during sequential addition of Zn2+ ions to a 10 μM solution of the respective apo-protein, showing the normalized fractional change in fluorescence intensity (fractional saturation) for apo-IscU (left; enhancement) and apo-SufU (right; quenching) during titration with zinc ions.

2.4 Discussion

2.4.1 Metal ion binding by Fe-S cluster biosynthesis proteins

Fe-S cluster biosynthesis proteins depend on their ability to bind and respond to specific transition metal ions. To accomplish this critical role, U-type Fe-S cluster biosynthesis proteins have evolved a conserved constellation of cysteine, aspartate, and histidine residues. These residues coordinate iron ions delivered during the formation of nascent Fe-S clusters; however, metal ion selectivity poses a challenge. Fe-S cluster proteins with similar ligands can bind other thiophilic soft metal ions with dysfunctional outcomes [19,21,22,37-41]. IscU and SufU have specifically been implicated as susceptible targets of Co2+ and Cu1+ ions, respectively [21,22]. Additionally, both IscU and SufU can bind a zinc ion at their active site, which, in the case of IscU, has been shown to shift the equilibrium of conformational states adopted by IscU in solution [9,13,42], although evidence that zinc binding to IscU has no effect on its structural state has also been reported [43]. In the case of SufU, zinc binding is apparently crucial for its ability to enhance the activity of its cognate Fe-S biosynthesis binding partner, SufS. Taken together, Fe-S cluster biosynthesis systems, and in particular the homologous U-type proteins IscU and SufU, exhibit important metal ion binding properties with the potential to serve as entry points for toxic metal stress. The goal of this study was to shed light on the influence of metal ion binding on two representative IscU and SufU proteins by measuring effects on protein secondary structure, local structural changes, and thermal stability. A side-by-side comparison of IscU and SufU was carried out to examine the influence of zinc ions, as well as other divalent transition metal ions chosen for their different coordination geometries, in order to explore the selectivity of these two U-type proteins involved in Fe-S cluster biosynthesis.

2.4.2 Preparation of proteins for metal binding studies

Our approach to studying the effects of metal binding on the structures of IscU and SufU in solution was to remove metals bound during overexpression and purification, and then to monitor the proteins for structural changes upon serial addition of transition metals implicated in prior literature. Careful studies reported by the Pastore group provided evidence that the treatment history of IscU can affect protein behavior [15]. In the reported study, IscU was observed to bind tightly to zinc ions (estimated Kd of 1 × 10⁻¹² M) and was compared under two conditions: “as purified” (presumably bound to zinc ions encountered during overexpression and purification) and in a “metal free” state prepared by addition of a 30-fold excess of EDTA. Importantly, the authors reported that “posthumous” addition of zinc ions following treatment with EDTA led to distinct behavior of the IscU protein, including precipitation, leading to the conclusion that sample history is an important parameter.

Introducing different transition metals to IscU and SufU required preliminary treatment to remove any metal ions bound to the proteins during purification. To avoid consequences such as the “posthumous” behavior that can follow the use of high concentrations of EDTA, we employed an alternative method that has been reported to be milder and more compatible with the treated protein [33]. E. coli IscU (EcIscU) and S. mutans SufU (SmSufU) protein samples were passed through columns of metal affinity resin that had themselves been pre-treated with EDTA to remove all metal ions, leaving a high concentration of empty metal-coordinating sites. After passage over the columns, the IscU and SufU samples were confirmed to have undetectable levels of zinc ions (Figure S1). These “metal-free” EcIscU and SmSufU samples were monitored by circular dichroism, intrinsic tryptophan fluorescence, and thermofluor analysis upon addition of zinc and other transition metal ions to determine which species could alter the protein secondary structure. Control experiments described below confirmed that the EcIscU and SmSufU proteins behaved as expected.

2.4.3 Zn2+ alters the secondary structure profile of IscU and SufU

It is well established that both IscU and SufU bind Zn2+ ions: zinc ions were present in early structures determined by X-ray crystallography [44] and NMR [12], and more recent studies have reported remarkably high binding affinities for this ion in both IscU and SufU [7,15]. A number of reports also indicated that the presence or absence of zinc ions was associated with distinct secondary structure profiles [7,15,24]. We first set out to reproduce the changes in secondary structure profiles that follow addition of Zn2+ ions to the protein samples by monitoring their circular dichroism spectra.

Addition of Zn2+ ions led to a distinct spectral transition for both “metal free” EcIscU and SmSufU. In earlier NMR studies with IscU, zinc addition shifted the equilibrium between the D- and S-states to predominantly favor the S-state [24]. For B. subtilis SufU, apo-SufU and SufU reconstituted with Zn2+ exhibited distinct secondary structures, a conclusion the authors likewise drew from recorded CD spectra [7]. Our results agree with these findings: in the presence of Zn2+, both EcIscU and SmSufU show CD spectra characteristic of higher secondary structure content. For comparison, Fe3+ was included as a negative control. As expected from earlier reports [15], addition of Fe3+ did not affect the secondary structure of IscU or SufU, as seen from their unchanged CD spectra (Figure 2.1).

2.4.4 Confirming the effect of Zn2+ on IscU and SufU with Cd2+

Our preliminary finding that addition of Zn2+ ions (but not Fe3+ ions) to both IscU and SufU resulted in distinct secondary structure profiles is consistent with previous studies and with the proposal that IscU can adopt D- and S-states. However, in light of reports that sample history is an important experimental parameter, we sought to confirm the effect of Zn2+ on the secondary structure of EcIscU and SmSufU proteins that had first been treated to remove metal ions bound during expression or purification. To do this, we used Cd2+ to test whether a small, thiophilic [45] transition metal ion with tetrahedral coordination could alter the secondary structure profile of EcIscU and SmSufU. The CD profiles recorded following addition of Cd2+ were nearly identical to those recorded following titration with Zn2+, as expected given the similarities between the two metal ions, including size and coordination geometry. We concluded that the observed structural transition can be reproducibly achieved upon addition of a tetrahedral transition metal ion, but not one with expanded coordination such as Fe3+.

Although Zn2+ and Cd2+ ions appeared to affect the secondary structure content of both IscU and SufU, we also considered the possibility that our use of the “stripped metal affinity column” method [33] to remove bound metal ions yielded protein samples with altered behavior, particularly in light of the report cautioning against the use of “posthumous” IscU samples [15].

We therefore compared the CD spectra of wild-type apo- and zinc-bound EcIscU with the CD spectrum of an EcIscU D39A mutant that we prepared. The D39A mutation has been shown to stabilize the structure and mimic the zinc-bound state of wild-type EcIscU [24]. The CD spectrum recorded for the EcIscU D39A mutant overlaid closely with that of the wild-type protein after zinc ions were added to a sample initially prepared in our hands with the “stripped metal affinity column” method (Chapter 3). An equivalent mutation in SufU has been prepared (D43A, using Bacillus subtilis numbering [46]), but we were unable to express sufficient quantities of SmSufU D43A for study. Nevertheless, the comparison with the EcIscU D39A mutant suggests that EcIscU and SmSufU proteins prepared as described above respond to the addition of either Zn2+ or Cd2+ ions by adopting a more structured state.

As a final check, we employed a thermofluor assay as an orthogonal means to validate the stabilizing effect of Zn2+ and Cd2+ ion interactions with the EcIscU and SmSufU proteins. Thermofluor assays monitor unfolding of the proteins in the presence and absence of metal ions. The structural responses of EcIscU and SmSufU recorded by CD upon addition of different metal ions correlated well with the enhanced thermal stability of the metal-bound proteins determined with the thermofluor assay. In the protein melting curves (Figure 2.2), EcIscU and SmSufU start out with high fluorescence in the absence of metal ions. Because apo EcIscU and apo SmSufU may be partially disordered, this relatively high initial fluorescence may result from the dye binding to exposed hydrophobic moieties in their unstructured regions. By comparison, in the presence of Zn2+ and Cd2+ ions, the plots of fluorescence versus temperature for both EcIscU and SmSufU look as expected for the thermofluor assay of a folded protein.

2.4.5 IscU and SufU discriminate between transition metals with different coordination geometries

That both Zn2+ and Cd2+ ions bind and stabilize the structures of EcIscU and SmSufU suggests that these two proteins prefer tetrahedral coordination. We selected two other divalent metal ions with different coordination geometries to test this hypothesis. The first was Cu2+, which is expected to adopt trigonal pyramidal or square planar coordination geometry in proteins [47,48]. In contrast to the change in CD profiles upon titration with zinc ions, addition of Cu2+ ions did not affect the secondary structure of either EcIscU or SmSufU. This observation agrees with an earlier report that, although Cu2+ ions can exert negative effects on bacterial growth, they did not directly interact with SufU from B. subtilis [21]. It further suggests that the structural effects attributed to Zn2+ and Cd2+ binding are not reproduced by just any divalent metal ion.

Interestingly, addition of Co2+ ions revealed a difference in the behavior of EcIscU and SmSufU. Whereas SmSufU did not undergo any change in secondary structure upon addition of cobalt ions, EcIscU exhibited a slight change (Figure 2.1). This structural effect is further supported by the thermal stability data: the presence of Co2+ ions enhanced the thermal stability of EcIscU only (Table 2.1). The different behavior of EcIscU and SmSufU toward Co2+ ions may reflect differences in the metal binding sites of the two proteins. The active site of EcIscU comprises three cysteine residues, an aspartic acid residue, and a histidine residue, with recognized “plasticity” [15] for coordinating metal ions. By comparison, the metal binding site of SmSufU lacks the conserved histidine, which is replaced with a lysine residue. Although Co2+ has an ionic radius similar to that of Zn2+ [49,50], Co2+ most commonly adopts octahedral rather than tetrahedral coordination in proteins. As recognized previously [15], an additional ligand at the active site may confer the “plasticity” needed to accommodate either a tetrahedral or an octahedral divalent metal ion at the active site of EcIscU.

To further investigate whether His105 in EcIscU supports octahedral coordination of Co2+, we prepared an H105A mutant of EcIscU and analyzed it by CD and thermofluor in parallel with a D39A mutant. D39A does not exhibit a marked change in its CD profile upon addition of Zn2+ or Co2+ ions. H105A, by contrast, undergoes a pronounced structural transition in response to both Zn2+ and Co2+ ions (Chapter 3). This suggests that H105 is not crucial for binding either Zn2+ or Co2+ in the EcIscU structure. This observation does not support the inclusion of H105 among the five putative ligand residues. However, it does expand the octahedral coordination hypothesis to include residues that lie close to the active site of EcIscU but have not yet been considered. In a detailed study by Markley and co-workers, mutations of several key residues near the putative active site of EcIscU (K89, N90, S107, and E111) were observed to preferentially stabilize either the S-state or the D-state [24].
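Thermofluor melting temperatures are commonly taken as the inflection point of the melt curve, that is, the temperature at which the first derivative of the fluorescence signal is maximal. The sketch below illustrates that convention and shows how a metal-induced shift (ΔTm) would be read off. It uses synthetic two-state (Boltzmann) curves. The Tm values, slope, and temperature grid are hypothetical; they are not data from this study, and the sketch does not necessarily reproduce the analysis software actually used.

```python
import math

def boltzmann(t, tm, slope, f_min=0.0, f_max=1.0):
    """Two-state sigmoid often used to model thermal-shift melt curves."""
    return f_min + (f_max - f_min) / (1.0 + math.exp((tm - t) / slope))

def tm_from_derivative(temps, signal):
    """Temperature at which the central-difference dF/dT is largest."""
    best_t, best_d = None, -math.inf
    for i in range(1, len(temps) - 1):
        d = (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
        if d > best_d:
            best_t, best_d = temps[i], d
    return best_t

# Synthetic (hypothetical) melt curves: apo protein vs. a metal-stabilized protein
temps = [25 + 0.5 * i for i in range(101)]  # 25-75 °C in 0.5 °C steps
apo = [boltzmann(t, tm=48.0, slope=2.0) for t in temps]
bound = [boltzmann(t, tm=53.0, slope=2.0) for t in temps]

delta_tm = tm_from_derivative(temps, bound) - tm_from_derivative(temps, apo)
print(f"ΔTm = {delta_tm:.1f} °C")
```

With real data, the derivative is usually smoothed first, because numerical differentiation amplifies noise.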

In conclusion, EcIscU and SmSufU preferentially interact with Zn2+ and Cd2+ ions, thiophilic transition metal ions with tetrahedral coordination geometry that stabilize the structures of both proteins. In addition, both EcIscU and SmSufU discriminate between different transition metal ions, most likely on the basis of metal ion coordination geometry. EcIscU may nevertheless also accommodate certain octahedral metal ions, owing to the “plasticity” of its expanded complement of putative metal ligand residues at its active site[15,43].

2.4.6 Confirming the structural effect and discrimination of different transition metals

Besides conformational transitions, the solvent accessibility of the active site is another major consideration for Fe-S proteins. In the cell, Fe-S clusters are routinely damaged by solvent-borne reactive oxygen species (ROS). Solvent-exposed iron atoms interact with superoxide anions (O2⁻), reducing O2⁻ to H2O2 and leaving unstable Fe-S clusters[51-53]. To assay the solvent accessibility of the apo-IscU and apo-SufU active sites in the presence of metal, we measured the fluorescence emission of the intrinsic Trp residue of each protein at increasing concentrations of Zn2+.

In the absence of zinc, IscU exhibited a fluorescence emission maximum at 355 nm, indicating a solvent-exposed tryptophan residue[54]. The enhancement of fluorescence upon addition of Zn2+ implies greater protection of the tryptophan residue from solvent quenching. This enhancement is consistent with solvent accessible surface area (SASA) calculations for each residue in holo-IscU (S state; PDB ID: 3LVL[55], chain A) and apo-IscU (D state; PDB ID: 2L4X[10]). These calculations used Gerstein's algorithm for accessible surface area[56], as implemented on the NIH High-Performance Computing server. The SASA of Trp76 was calculated to be 132 Å² in apo-IscU and 67 Å² in holo-IscU, indicating lower solvent accessibility of the tryptophan residue in the holo (structured) form.
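To illustrate what a per-atom SASA value means, the sketch below implements the Shrake-Rupley point-sampling method. This is a different and simpler algorithm than Gerstein's procedure used above, swapped in only because it is short. Each atom is inflated by a solvent probe radius, and test points are scattered on the inflated sphere. A point counts as accessible only if it lies outside every other inflated atom. The 1.4 Å probe, the 1.8 Å atomic radius, and the coordinates are illustrative assumptions.

```python
import math

def sphere_points(n):
    """Approximately uniform points on a unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(k * golden_angle), r * math.sin(k * golden_angle), z))
    return points

def shrake_rupley(atoms, probe=1.4, n_points=960):
    """Per-atom SASA (Å²) for atoms given as (x, y, z, radius) tuples in Å.

    A test point on an atom's probe-expanded sphere is accessible unless it
    falls inside the probe-expanded sphere of any other atom.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        accessible = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i
            )
            if not buried:
                accessible += 1
        # Each test point represents an equal share of the inflated sphere's area
        areas.append(4.0 * math.pi * big_r ** 2 * accessible / n_points)
    return areas

# Hypothetical atoms: one isolated atom, then two atoms 3 Å apart that occlude each other
isolated = shrake_rupley([(0.0, 0.0, 0.0, 1.8)])[0]
pair = shrake_rupley([(0.0, 0.0, 0.0, 1.8), (3.0, 0.0, 0.0, 1.8)])
print(f"isolated: {isolated:.1f} Å²; pair: {pair[0]:.1f}, {pair[1]:.1f} Å²")
```

Applied to the Trp76 values reported above, (132 − 67)/132 ≈ 0.49. Roughly half of the residue's accessible surface therefore becomes buried in the structured form.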

In S. mutans SufU, the fluorescence emission at 335 nm is quenched by increasing concentrations of zinc. This suggests that the reporter Trp residue either experiences a more polar microenvironment or becomes more exposed to solvent. No crystal, NMR, or cryo-EM structures of S. mutans SufU representing both the “structured” and “partially disordered” states are available, so SASA calculations for SufU could not be performed. However, a recent H/D exchange study of Zn2+-bound B. subtilis SufU showed increased deuterium uptake by the loop containing active-site Cys41 in the presence of SufS[57]. Thus, part of the active site in zinc-bound SufU can be exposed to solvent.

2.4.7 Affinity of IscU and SufU for Zn2+ ions

Despite the contrasting effects of zinc on the solvent accessibility of their Trp residues, both IscU and SufU exhibit changes in intrinsic tryptophan fluorescence intensity. This indicates that both proteins undergo conformational changes in response to similar concentrations of added Zn2+. Both experimental approaches used to probe the conformational transitions of IscU and SufU indicated transitions occurring in a low-micromolar Zn2+ concentration regime (Figure S3), which was further confirmed by ITC (Figure S4).
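The protein concentrations used here (e.g., 0.02 mM in the ITC experiments) are comparable to a low-micromolar dissociation constant. Free Zn2+ therefore cannot be approximated by total Zn2+. The sketch below shows the usual quadratic (ligand-depletion) model for a single 1:1 site, together with a simple grid-search fit. It assumes that the spectroscopic signal is proportional to the fraction of protein bound. The titration values are synthetic and noise-free, and this is not the NITPIC/SEDPHAT analysis used for Figure S4.

```python
import math

def fraction_bound(p_total, l_total, kd):
    """Fraction of protein bound for a 1:1 site, accounting for ligand depletion.

    Solves [PL]^2 - (P + L + Kd)[PL] + P*L = 0 and keeps the physical root.
    """
    b = p_total + l_total + kd
    pl = (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0
    return pl / p_total

def fit_kd(p_total, l_totals, observed, kd_grid):
    """Grid search for the Kd that minimizes the sum of squared residuals."""
    def sse(kd):
        return sum((fraction_bound(p_total, l, kd) - y) ** 2
                   for l, y in zip(l_totals, observed))
    return min(kd_grid, key=sse)

# Hypothetical titration (all concentrations in µM): 20 µM protein, 0.5-100 µM Zn2+
p = 20.0
zn = [0.5, 5, 10, 15, 20, 30, 40, 60, 80, 100]
signal = [fraction_bound(p, l, 3.0) for l in zn]  # generated with Kd = 3 µM, no noise
grid = [0.1 * k for k in range(1, 201)]           # candidate Kd values, 0.1-20 µM
best = fit_kd(p, zn, signal, grid)
print(f"best-fit Kd ≈ {best:.1f} µM")
```

A real analysis would fit the signal amplitude and baseline together with Kd, typically by nonlinear least squares rather than a grid.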

Detailed spectroscopic studies[7,15] have reported extremely tight binding of zinc to IscU and SufU (Kd = 10⁻¹⁷ M for SufU [7] and an estimated Kd of 10⁻¹³ M for IscU [15]). Such high affinity would imply a constitutively bound zinc ion, which would not be expected to induce conformational switching of the proteins upon metal addition. This raises the possibility that the binding event we observed is distinct from the tightly bound zinc ion that is required for SufU activity [7] or has been associated with IscU. Interestingly, the only NMR structure available for zinc-bound IscU is from H. influenzae. In that structure, broadening of signals was observed around the metal binding site, which the authors interpreted as “transient or weak binding of Zn2+”[12].

The existing reports and experimental results are insufficient to establish a biological role for such “transient” or weak zinc binding in IscU- or SufU-mediated Fe-S cluster biosynthesis in vivo. We recognize that these proteins are challenging to work with and that sample history is an important parameter. Therefore, our main conclusion is that, under the same treatment conditions, IscU and SufU behave similarly in two respects. The first is the structural impact of binding zinc (and cadmium) ions in solution. The second is their ability to discriminate between metal ions that adopt different coordination geometries. IscU also has the distinct ability to bind cobalt, a property that may arise from the different architecture of its active site. Cu2+, however, does not influence either protein, although Cu+ has been implicated in the literature as affecting SufU-mediated biosynthesis [21].
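The argument that a Kd in the 10⁻¹⁷ to 10⁻¹³ M range implies constitutively bound zinc can be made concrete with the simple 1:1 occupancy relation θ = [Zn]free / (Kd + [Zn]free). The free Zn2+ concentration of 1 pM used below is an arbitrary illustrative value, not a measured one.

```python
def occupancy(free_ligand_m, kd_m):
    """Fraction of sites occupied at equilibrium for a 1:1 site, given free ligand (M)."""
    return free_ligand_m / (kd_m + free_ligand_m)

FREE_ZN = 1e-12  # hypothetical free Zn2+ concentration (1 pM), for illustration only

for kd in (1e-17, 1e-13, 1e-6):
    print(f"Kd = {kd:.0e} M: occupancy at 1 pM free Zn2+ = {occupancy(FREE_ZN, kd):.6f}")
```

With a femtomolar-or-tighter Kd, the site is essentially always filled, even at trace free zinc. A micromolar site is essentially empty under the same conditions, so it can respond to added metal.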

2.5 Conclusion

Details of the mechanisms of Fe-S cluster biosynthesis mediated by IscU and SufU, and in particular the effects of zinc ion binding by these two proteins, remain active areas of research. Our objective in this study was to compare, side by side, the selectivity and structural effects of metal ion binding by representative IscU and SufU proteins. EcIscU and SmSufU behave nearly identically with respect to zinc (and cadmium) binding, in terms of both the changes in secondary structure and the approximate binding affinity. One important difference is that IscU also has the distinct ability to bind Co2+ ions. This property may arise from the different architecture of its active site, specifically the presence of histidine 105 as a compatible ligand for metal ion coordination (Reference Pastore).

The micromolar affinities reported here are supported by circular dichroism, intrinsic tryptophan fluorescence, and isothermal titration calorimetry. They appear consistent with the notion that these proteins may shift between accessible conformational states that differ by modest free energy changes, as specifically proposed in a model for IscU-mediated Fe-S cluster biosynthesis. On the other hand, careful studies by other groups have measured significantly higher zinc ion affinities for both IscU from Escherichia coli and SufU from Bacillus subtilis, which suggest that the zinc ions are essentially constitutively bound.
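To put “modest free energy changes” in perspective, a dissociation constant can be converted to a standard binding free energy with ΔG° = RT ln Kd (1 M standard state). The sketch below compares a hypothetical 1 µM affinity with the literature values cited in the Discussion (10⁻¹³ M for IscU [15] and 10⁻¹⁷ M for SufU [7]). The temperature of 298.15 K is an assumption.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1

def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kJ/mol) from Kd, 1 M standard state: ΔG° = RT ln Kd."""
    return R_KJ * temp_k * math.log(kd_molar)

cases = {"hypothetical 1 µM": 1e-6, "IscU estimate [15]": 1e-13, "SufU [7]": 1e-17}
for label, kd in cases.items():
    print(f"{label}: ΔG° ≈ {binding_free_energy(kd):.1f} kJ/mol")
```

At 298 K, each tenfold change in Kd corresponds to about RT ln 10 ≈ 5.7 kJ/mol. The literature affinities therefore sit roughly 60 kJ/mol below the micromolar regime observed here.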

Importantly, the sample history of these refractory proteins has been shown to be an important parameter (reference Pastore). Although precautions were taken accordingly, measured metal ion binding affinities may depend on the conditions used. We therefore conclude that, under the same treatment conditions, IscU and SufU behave similarly with regard to the structural impact of binding zinc (and cadmium) ions in solution. In addition, both proteins can discriminate between metal ions that adopt different coordination geometries, and EcIscU, but not SmSufU, responds to Co2+ ions added in solution. These results support the expectation that the homology between IscU and SufU is manifested in protein structures that behave similarly, specifically with regard to metal ion-dependent conformational changes and binding. This will in turn inform future studies of U-type proteins involved in Fe-S cluster biosynthesis.


REFERENCES

1. Bandyopadhyay S, Chandramouli K, Johnson MK (2008) Iron–sulfur cluster biosynthesis. Biochemical Society Transactions 36: 1112-1119.
2. Zheng L, Cash VL, Flint DH, Dean DR (1998) Assembly of Iron-Sulfur Clusters: identification of an iscSUA-hscBA-fdx gene cluster from Azotobacter vinelandii. Journal of Biological Chemistry 273: 13264-13272.
3. Agar JN, Zheng L, Cash VL, Dean DR, Johnson MK (2000) Role of the IscU protein in iron-sulfur cluster biosynthesis: IscS-mediated assembly of a [Fe2S2] cluster in IscU. Journal of the American Chemical Society 122: 2136-2137.
4. Fu W, Jack RF, Morgan TV, Dean DR, Johnson MK (1994) nifU gene product from Azotobacter vinelandii is a homodimer that contains two identical [2Fe-2S] clusters. Biochemistry 33: 13455-13463.
5. Riboldi GP, Verli H, Frazzon J (2009) Structural studies of the Enterococcus faecalis SufU [Fe-S] cluster protein. BMC Biochemistry 10: 1.
6. Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, et al. (2015) CDD: NCBI's conserved domain database. Nucleic Acids Research 43: D222-226.
7. Selbach BP, Chung AH, Scott AD, George SJ, Cramer SP, et al. (2013) Fe-S cluster biogenesis in Gram-positive bacteria: SufU is a zinc-dependent sulfur transfer protein. Biochemistry 53: 152-160.
8. Selbach B, Earles E, Dos Santos PC (2010) Kinetic Analysis of the Bisubstrate Cysteine Desulfurase SufS from Bacillus subtilis. Biochemistry 49: 8794-8802.
9. Markley JL, Kim JH, Dai Z, Bothe JR, Cai K, et al. (2013) Metamorphic protein IscU alternates conformations in the course of its role as the scaffold protein for iron–sulfur cluster biosynthesis and delivery. FEBS Letters 587: 1172-1179.
10. Kim JH, Tonelli M, Kim T, Markley JL (2012) Three-dimensional structure and determinants of stability of the iron–sulfur cluster scaffold protein IscU from Escherichia coli. Biochemistry 51: 5557-5563.
11. Mansy SS, Wu S-p, Cowan J (2004) Iron-Sulfur Cluster Biosynthesis: Biochemical Characterization of the Conformational Dynamics of Thermotoga maritima IscU and the Relevance for Cellular Cluster Assembly. Journal of Biological Chemistry 279: 10469-10475.
12. Ramelot TA, Cort JR, Goldsmith-Fischman S, Kornhaber GJ, Xiao R, et al. (2004) Solution NMR structure of the iron–sulfur cluster assembly protein U (IscU) with zinc bound at the active site. Journal of Molecular Biology 344: 567-583.
13. Kim JH, Bothe JR, Alderson TR, Markley JL (2015) Tangled web of interactions among proteins involved in iron–sulfur cluster assembly as unraveled by NMR, SAXS, chemical crosslinking, and functional studies. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1853: 1416-1428.
14. Kim JH, Tonelli M, Frederick RO, Chow DC-F, Markley JL (2012) Specialized Hsp70 chaperone (HscA) binds preferentially to the disordered form, whereas J-protein (HscB) binds preferentially to the structured form of the iron-sulfur cluster scaffold protein (IscU). Journal of Biological Chemistry 287: 31406-31413.
15. Iannuzzi C, Adrover M, Puglisi R, Yan R, Temussi PA, et al. (2014) The role of zinc in the stability of the marginally stable IscU scaffold protein. Protein Science 23: 1208-1219.
16. Huang J, Cowan J (2009) Iron–sulfur cluster biosynthesis: role of a semi-conserved histidine. Chemical Communications: 3071-3073.
17. Foster MW, Mansy SS, Hwang J, Penner-Hahn JE, Surerus KK, et al. (2000) A Mutant Human IscU Protein Contains a Stable [2Fe-2S]2+ Center of Possible Functional Significance. Journal of the American Chemical Society 122: 6805-6806.
18. Kornhaber GJ, Snyder D, Moseley HN, Montelione GT (2006) Identification of zinc-ligated cysteine residues based on 13Cα and 13Cβ chemical shift data. Journal of Biomolecular NMR 34: 259-269.
19. Xu FF, Imlay JA (2012) Silver(I), mercury(II), cadmium(II), and zinc(II) target exposed enzymic iron-sulfur clusters when they toxify Escherichia coli. Applied and Environmental Microbiology 78: 3614-3621.
20. Quintal SM, dePaula QA, Farrell NP (2011) Zinc finger proteins as templates for metal ion exchange and ligand reactivity. Chemical and biological consequences. Metallomics: Integrated Biometal Science 3: 121-139.
21. Chillappagari S, Seubert A, Trip H, Kuipers OP, Marahiel MA, et al. (2010) Copper Stress Affects Iron Homeostasis by Destabilizing Iron-Sulfur Cluster Formation in Bacillus subtilis. Journal of Bacteriology 192: 2512-2524.
22. Ranquet C, Ollagnier-de-Choudens S, Loiseau L, Barras F, Fontecave M (2007) Cobalt Stress in Escherichia coli: The effect on the iron-sulfur proteins. Journal of Biological Chemistry 282: 30442-30451.
23. di Maio D, Chandramouli B, Yan R, Brancato G, Pastore A (2017) Understanding the role of dynamics in the iron sulfur cluster molecular machine. Biochimica et Biophysica Acta (BBA) - General Subjects 1861: 3154-3163.
24. Kim JH, Tonelli M, Markley JL (2012) Disordered form of the scaffold protein IscU is the substrate for iron-sulfur cluster assembly on cysteine desulfurase. Proceedings of the National Academy of Sciences 109: 454-459.
25. Yan R, Kelly G, Pastore A (2014) The Scaffold Protein IscU Retains a Structured Conformation in the -1686.
26. Li J, Ding S, Cowan J (2013) Thermodynamic and structural analysis of human NFU conformational chemistry. Biochemistry 52: 4904-4913.
27. Cai K, Frederick RO, Kim JH, Reinen NM, Tonelli M, et al. (2013) Human mitochondrial chaperone (mtHSP70) and cysteine desulfurase (NFS1) bind preferentially to the disordered conformation, whereas co-chaperone (HSC20) binds to the structured conformation of the iron-sulfur cluster scaffold protein (ISCU). Journal of Biological Chemistry 288: 28755-28770.
28. Lemos JA, Quivey Jr RG, Koo H, Abranches J (2013) Streptococcus mutans: a new Gram-positive paradigm? Microbiology 159: 436-445.
29. Krzyściak W, Jurczak A, Kościelniak D, Bystrowska B, Skalniak A (2014) The virulence of Streptococcus mutans and the ability to form biofilms. European Journal of Clinical Microbiology & Infectious Diseases 33: 499-515.
30. Banas JA (2004) Virulence properties of Streptococcus mutans. Frontiers in Bioscience 9: 1267-1277.
31. Stookey LL (1970) Ferrozine: a new spectrophotometric reagent for iron. Analytical Chemistry 42: 779-781.
32. Säbel CE, Shepherd JL, Siemann S (2009) A direct spectrophotometric method for the simultaneous determination of zinc and cobalt in metalloproteins using 4-(2-pyridylazo)resorcinol. Analytical Biochemistry 391: 74-76.
33. Carrer C, Stolz M, Lewitzki E, Rittmeyer C, Kolbesen BO, et al. (2006) Removing coordinated metal ions from proteins: a fast and mild method in aqueous solution. Analytical and Bioanalytical Chemistry 385: 1409-1413.
34. Lo M-C, Aulabaugh A, Jin G, Cowling R, Bard J, et al. (2004) Evaluation of fluorescence-based thermal shift assays for hit identification in drug discovery. Analytical Biochemistry 332: 153-159.
35. Ericsson UB, Hallberg BM, DeTitta GT, Dekker N, Nordlund P (2006) Thermofluor-based high-throughput stability optimization of proteins for structural studies. Analytical Biochemistry 357: 289-298.
36. Pantoliano MW, Petrella EC, Kwasnoski JD, Lobanov VS, Myslik J, et al. (2001) High-density miniaturized thermal shift assays as a general strategy for drug discovery. Journal of Biomolecular Screening 6: 429-440.
37. Fantino JR, Py B, Fontecave M, Barras F (2010) A genetic analysis of the response of Escherichia coli to cobalt stress. Environmental Microbiology 12: 2846-2857.
38. Macomber L, Imlay JA (2009) The iron-sulfur clusters of dehydratases are primary intracellular targets of copper toxicity. Proceedings of the National Academy of Sciences 106: 8344-8349.
39. Thorgersen MP, Downs DM (2007) Cobalt targets multiple metabolic processes in Salmonella enterica. Journal of Bacteriology 189: 7774-7781.
40. Imlay JA (2014) The mismetallation of enzymes during oxidative stress. Journal of Biological Chemistry 289: 28121-28128.
41. Gu M, Imlay JA (2013) Superoxide poisons mononuclear iron enzymes by causing mismetallation. Molecular Microbiology 89: 123-134.
42. Dai Z, Kim JH, Tonelli M, Ali IK, Markley JL (2014) pH-Induced Conformational Change of IscU at Low pH Correlates with Protonation/Deprotonation of Two Conserved Histidine Residues. Biochemistry 53: 5290-5297.
43. Adrover M, Howes BD, Iannuzzi C, Smulevich G, Pastore A (2015) Anatomy of an iron-sulfur cluster scaffold protein: Understanding the determinants of [2Fe–2S] cluster stability on IscU. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1853: 1448-1456.
44. Liu J, Oganesyan N, Shin DH, Jancarik J, Yokota H, et al. (2005) Structural characterization of an iron–sulfur cluster assembly protein IscU in a zinc-bound form. Proteins: Structure, Function, and Bioinformatics 59: 875-881.
45. Helbig K, Grosse C, Nies DH (2008) Cadmium toxicity in glutathione mutants of Escherichia coli. Journal of Bacteriology 190: 5439-5454.
46. Albrecht AG, Netz DJ, Miethke M, Pierik AJ, Burghaus O, et al. (2010) SufU is an essential iron-sulfur cluster scaffold protein in Bacillus subtilis. Journal of Bacteriology 192: 1643-1651.
47. Leigh G (2004) Comprehensive Coordination Chemistry II: From Biology to Nanotechnology. Elsevier.
48. Ramakrishnan C, Geetha YS (1990) Analysis of the coordination geometry in copper complexes. Proceedings of the Indian Academy of Sciences - Chemical Sciences 102: 481-496.
49. Maret W, Li Y (2009) Coordination Dynamics of Zinc in Proteins. Chemical Reviews 109: 4682-4707.
50. Marcus Y (1988) Ionic radii in aqueous solutions. Chemical Reviews 88: 1475-1498.
51. Dixon SJ, Stockwell BR (2014) The role of iron and reactive oxygen species in cell death. Nature Chemical Biology 10: 9-17.
52. Jang S, Imlay JA (2010) Hydrogen peroxide inactivates the Escherichia coli Isc iron-sulphur assembly system, and OxyR induces the Suf system to compensate. Molecular Microbiology 78: 1448-1467.
53. Imlay JA (2006) Iron-sulphur clusters and the problem with oxygen. Molecular Microbiology 59: 1073-1082.
54. Ghisaidoobe AB, Chung SJ (2014) Intrinsic tryptophan fluorescence in the detection and analysis of proteins: A focus on Förster resonance energy transfer techniques. International Journal of Molecular Sciences 15: 22518-22538.
55. Shi R, Proteau A, Villarroya M, Moukadiri I, Zhang L, et al. (2010) Structural basis for Fe–S cluster assembly and tRNA thiolation mediated by IscS protein–protein interactions. PLoS Biology 8: e1000354.
56. Gerstein M (1992) A resolution-sensitive procedure for comparing protein surfaces and its application to the comparison of antigen-combining sites. Acta Crystallographica Section A: Foundations of Crystallography 48: 271-276.
57. Blauenburg B, Mielcarek A, Altegoer F, Fage CD, Linne U, et al. (2016) Crystal Structure of Bacillus subtilis Cysteine Desulfurase SufS and Its Dynamic Interaction with Frataxin and Scaffold Protein SufU. PLoS ONE 11: e0158749.
58. Keller S, Vargas C, Zhao H, Piszczek G, Brautigam CA, et al. (2012) High-precision isothermal titration calorimetry with automated peak-shape analysis. Analytical Chemistry 84: 5066-5073.
59. Houtman JC, Brown PH, Bowden B, Yamaguchi H, Appella E, et al. (2007) Studying multisite binary and ternary protein interactions by global analysis of isothermal titration calorimetry data in SEDPHAT: application to adaptor protein complexes in cell signaling. Protein Science 16: 30-42.
60. Brautigam CA (2015) Chapter Five - Calculations and Publication-Quality Illustrations for Analytical Ultracentrifugation Data. Methods in Enzymology 562: 109-133.

APPENDIX A FIGURES

Figure S1. Quantification of the zinc content of proteins using PAR. Low-micromolar concentrations of zinc (1-10 μM) are complexed with the chromophoric chelator 4-(2-pyridylazo)resorcinol (PAR), and absorbance spectra are recorded in the visible range (350-600 nm). (A) Absorbance spectra of as-purified IscU and apo-IscU are shown in pink and black, respectively. The PAR:Zn complex shows maximum absorbance at 497 nm. (B) Absorbance at 497 nm for each spectrum plotted against zinc concentration to construct the calibration plot. As-purified SufU contained 1 zinc ion per 20 protein molecules, and as-purified IscU contained 1 zinc ion per 25 protein molecules. Following removal of coordinated metal ions, the absorbance values for the apo-proteins were below the detection limit of the assay.
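The metal:protein ratios in Figures S1 and S2 follow from a linear calibration: fit absorbance against standard concentration, invert the line for each sample, and divide by the protein concentration. The sketch below shows that arithmetic for the PAR assay using ordinary least squares. The standard absorbances, sample reading, and 20 µM protein concentration are hypothetical values chosen for illustration, not data from this study.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def metal_per_protein(a_sample, slope, intercept, protein_um):
    """Convert a sample absorbance to µM metal via the calibration line, then to metal:protein."""
    metal_um = (a_sample - intercept) / slope
    return metal_um / protein_um

# Hypothetical Zn2+ standards (µM) and their A497 readings
std_zn = [1, 2, 4, 6, 8, 10]
std_a497 = [0.066, 0.132, 0.264, 0.396, 0.528, 0.660]
m, b = linear_fit(std_zn, std_a497)

ratio = metal_per_protein(0.066, m, b, protein_um=20.0)  # hypothetical sample reading
print(f"{ratio:.3f} Zn per protein (about 1 per {1 / ratio:.0f})")
```

Readings that fall outside the range of the standards, or below the detection limit as for the apo-proteins, should not be extrapolated from the line.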

Figure S2. Quantification of the iron content of proteins using the ferrozine assay. (A) Absorbance spectra of the Fe2+–ferrozine complex formed with increasing concentrations of the standard FeCl3 solution (in μg/mL). Samples of as-purified IscU and of IscU after removal of coordinated metal ions (apo-IscU) were treated with ferrozine in the same way as the standard FeCl3 aliquots. Maximal absorbance of the Fe2+–ferrozine complex was recorded at 560 nm. (B) Calibration graph of the peak absorbance of the standard FeCl3 aliquots, used to determine the iron content of the proteins as purified and following removal of coordinated metal ions. As purified, IscU contained 1 iron ion per 5 protein molecules and SufU contained 1 iron ion per 4 protein molecules. Following removal of coordinated metal ions, both apo-IscU and apo-SufU contained 1 iron ion per 40 protein molecules.
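Because the ferrozine standards are expressed in μg/mL, the iron concentration must be converted to molar units before it can be compared with the protein concentration. A μg/mL is the same as a mg/L, so dividing by the molar mass of iron (55.845 g/mol) gives mmol/L. The sketch below performs that conversion. The example concentrations are hypothetical.

```python
FE_MOLAR_MASS = 55.845  # g/mol

def ug_per_ml_to_um(conc_ug_ml, molar_mass=FE_MOLAR_MASS):
    """µg/mL (= mg/L) divided by g/mol gives mmol/L; multiply by 1000 for µM."""
    return conc_ug_ml / molar_mass * 1000.0

def protein_per_iron(fe_ug_ml, protein_um):
    """Number of protein molecules per iron atom."""
    return protein_um / ug_per_ml_to_um(fe_ug_ml)

# Hypothetical sample: 0.2 µg/mL iron measured in a 17.9 µM protein solution
print(f"1 Fe per {protein_per_iron(0.2, 17.9):.1f} protein molecules")
```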

Figure S3. Effect of zinc addition on the secondary structure of proteins. Far-UV circular dichroism spectra of (A) E. coli IscU and (B) S. mutans SufU with increasing concentrations of zinc ions. Molar ellipticity is given in deg cm² dmol⁻¹. All spectra were recorded at 25 °C with protein samples at pH 7.5.
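The caption reports molar ellipticity in deg cm² dmol⁻¹. The sketch below assumes that the instrument reports the observed ellipticity θ in millidegrees. Under that assumption, the standard conversion is [θ] = θobs / (10 · c · l), with c in mol/L and l in cm. Dividing further by the number of residues gives mean residue ellipticity. The caption does not state which normalization was used, so both are shown. The example numbers are hypothetical.

```python
def molar_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues=None):
    """Convert observed ellipticity (mdeg) to deg cm^2 dmol^-1.

    [θ] = θ_obs(mdeg) / (10 · c(M) · l(cm)); if n_residues is given, the result
    is additionally divided by it to give mean residue ellipticity.
    """
    theta = theta_mdeg / (10.0 * conc_molar * path_cm)
    return theta / n_residues if n_residues else theta

# Hypothetical reading: -10 mdeg for a 20 µM protein in a 0.1 cm cuvette
print(molar_ellipticity(-10.0, 20e-6, 0.1))
print(molar_ellipticity(-10.0, 20e-6, 0.1, n_residues=100))
```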

Figure S4. ITC measurements of zinc ion binding to apo-IscU and apo-SufU, carried out at 27 °C on a MicroCal titration calorimeter. All samples were in degassed 20 mM Tris-Cl buffer (pH 7.5), and measurements were made over two concentration ranges. (A) A 1.5 mM stock of Zn2+ was titrated into 0.02 mM IscU (E. coli). (B) A 1.0 mM stock of Zn2+ was titrated into 0.02 mM SufU (S. mutans). Data were collected automatically, analyzed with NITPIC/SEDPHAT[58,59], and plotted using GUSSI[60].
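A common design check for ITC experiments is the Wiseman parameter c = n[M]cell/Kd. Binding isotherms are generally considered well determined when c lies roughly between 1 and 1000. The sketch below applies this check to the 0.02 mM protein concentration in Figure S4, assuming one site (n = 1). The Kd values are the micromolar regime observed here and the literature estimates discussed in the text, used only for illustration. A Kd near 10⁻¹³ M would give a step-like isotherm at this concentration, from which the affinity could not be resolved directly.

```python
def wiseman_c(cell_conc_m, kd_m, n_sites=1.0):
    """Wiseman parameter c = n * [macromolecule in cell] / Kd (both concentrations in M)."""
    return n_sites * cell_conc_m / kd_m

CELL_CONC = 20e-6  # 0.02 mM protein in the cell, as in Figure S4

for kd in (1e-6, 1e-13, 1e-17):
    c = wiseman_c(CELL_CONC, kd)
    window = "within" if 1 <= c <= 1000 else "outside"
    print(f"Kd = {kd:.0e} M -> c = {c:.1e} ({window} the ~1-1000 window)")
```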

CHAPTER 3. ROLE OF PUTATIVE LIGANDS AT ACTIVE SITE OF IscU

3.1 Introduction

3.1.1 Metals associated with proteins

Proteins perform a variety of functions in biological systems. Each protein-specific function arises from the arrangement and chemical properties of the amino acid residues that govern the structure of the protein, particularly at its active site. Proteins composed of the 20 naturally occurring amino acids are enormously diverse, but there are nevertheless limits: certain chemistries are not possible with amino acids alone. Consequently, proteins have evolved to coordinate metal ions or metal cofactors that enable functions beyond what amino acids alone can achieve. In fact, nearly half of all enzymes must associate with a specific metal species to perform their function [3].

Certain metal ions, such as Na+, K+, Mg2+, and Ca2+, are present at high concentrations and function mainly as charge carriers. These ions are mobile and associate only weakly with proteins. Transition metal ions such as Co2+, Zn2+, Fe3+, and Mn2+ bind proteins with moderate to strong affinity and are used in a diverse range of functions. These trace metals are all Lewis acids, i.e., they can accept an electron pair from a donor. Based on the ratio between the charge of a metal ion and its ionic radius, these metal ions can be classified as hard or soft acids. Table 3.1 shows the general characteristics of hard and soft acids [4]. According to the Hard Soft Acid Base (HSAB) theory [5], hard acids interact with hard bases, while intermediate and soft acids form stable complexes with soft bases.

Hard-hard interactions are mainly ionic in nature, whereas soft-soft interactions are governed by interactions involving the expanded metal orbitals (usually d orbitals).

Table 3.1 Classification and characteristics of biologically-relevant metal ions and ligands according to the Hard Soft Acid Base theory

Lewis acid classification | Metal ions (acceptors) | Ligands (donors) | Properties of acceptor | Properties of donor
Hard | Na+, K+, Mg2+, Ca2+, Cr3+, Fe3+, Co3+ | H2O, OH⁻, CO2⁻ (carboxylate), CO3²⁻, NO3⁻, PO4³⁻, Cl⁻ | High charge density, small ionic radius | Low polarisability, high electronegativity, hard to oxidize
Intermediate | Fe2+, Co2+, Ni2+, Cu2+, Zn2+ | NO2⁻, SO3²⁻, Br⁻, imidazole | Intermediate between hard and soft | Intermediate between hard and soft
Soft | Cu+, Hg+, Ag+, Cd2+ | RSH, RS⁻, CN⁻ | Low charge density, large ionic radius | High polarisability, low electronegativity, easily oxidized
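The qualitative HSAB pairing rule stated above can be encoded as a lookup over a subset of Table 3.1. This is a sketch under stated assumptions. It treats intermediate acids as compatible with intermediate or soft donors, which is consistent with imidazole being listed as an intermediate donor. Like HSAB itself, it describes tendencies rather than strict rules.

```python
# Subset of Table 3.1 (acceptor and donor classes as listed there)
ACIDS = {
    "Na+": "hard", "K+": "hard", "Mg2+": "hard", "Ca2+": "hard", "Fe3+": "hard",
    "Fe2+": "intermediate", "Co2+": "intermediate", "Ni2+": "intermediate",
    "Cu2+": "intermediate", "Zn2+": "intermediate",
    "Cu+": "soft", "Ag+": "soft", "Cd2+": "soft",
}
BASES = {
    "H2O": "hard", "OH-": "hard", "carboxylate": "hard", "Cl-": "hard",
    "Br-": "intermediate", "imidazole": "intermediate",
    "RSH": "soft", "RS-": "soft", "CN-": "soft",
}

def hsab_match(metal, ligand):
    """True if the acid/base classes follow the qualitative HSAB pairing tendency."""
    acid, base = ACIDS[metal], BASES[ligand]
    if acid == "hard":
        return base == "hard"
    if acid == "soft":
        return base == "soft"
    # Assumption: intermediate acids pair with intermediate or soft donors
    return base in ("intermediate", "soft")

print(hsab_match("Cd2+", "RS-"), hsab_match("Cd2+", "H2O"), hsab_match("Zn2+", "imidazole"))
```

Biological exceptions are common. For example, Fe3+ (a hard acid in Table 3.1) is routinely bound by cysteine thiolates in Fe-S proteins, a pairing this lookup would flag as a mismatch.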

3.1.2 Amino acids as ligands for metal ions

Only a few of the 20 amino acids can serve as metal ligands in proteins, because a ligand group must have an available lone pair of electrons. Amino acid residues such as His, Cys, Asp, Glu, Tyr, and Ser can serve as ligands for metal ions. By contrast, Arg, Asn, and Gln, whose electron pairs are delocalized, are less likely to associate with metals. Rarer still as metal ligands are the amino group of Lys and the thioether group of Met. Amino acid residues and their coordinating moieties are shown in Table 3.2, and recurrent metal binding patterns of commonly observed amino acid residues are shown in Figure 3.1.


Table 3.2 Amino acid residues that commonly bind metals in proteins. Coordinating atoms/moieties are also indicated.

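Amino acid | Side chain | Coordinating atom(s)
Histidine | -CH2-imidazole | N
Cysteine | -CH2-SH | S
Serine | -CH2-OH | O
Methionine | -CH2-CH2-S-CH3 | S
Tyrosine | -CH2-C6H4-OH | O
Aspartic acid / Glutamic acid | -CH2-COOH / -CH2-CH2-COOH | O
Asparagine / Glutamine | -CH2-CONH2 / -CH2-CH2-CONH2 | O, N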

Figure 3.1 Protein amino acid sidechains involved in metal binding and their metal (M) binding patterns. Adapted from [4].

3.1.3 Coordination geometry of metals in metalloproteins

The valence shell electrons of both the metal and the amino acid functional group are attracted by both nuclei; however, the proximity of electrons also leads to repulsive forces. Hence, the coordination geometry of a metal ion in a complex reflects a delicate balance between attractive and repulsive forces. The final geometry is determined by the number of available lone pairs of electrons, which adopt a specific spatial orientation that minimizes electrostatic repulsion.

Hybridization of the valence shell orbitals determines the coordination number of the metal ion, i.e., the total number of points of attachment to the central metal. In inorganic coordination complexes, the coordination number can vary from 2 to 16, but in biology it is usually limited to 4-6 [6]. The coordination number is also influenced by the ionic potential of the metal ion, defined as the charge-to-radius ratio (q/r). Table 3.3 lists the geometries corresponding to the most common coordination numbers in biological systems, together with the corresponding hybridization of the valence shell orbitals and example metal ions.
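The ionic potential q/r can be computed directly. The radii below are approximate Shannon six-coordinate ionic radii (high-spin for Fe3+ and Co2+), included for illustration. They should be checked against a primary compilation of ionic radii before any quantitative use. The example also shows why Co2+ and Zn2+ are often compared: their radii, and hence their ionic potentials, are nearly identical. Note that q/r alone does not reproduce the hard/soft classes of Table 3.1 (e.g., Cd2+ has a slightly higher q/r than Ca2+), because polarizability also matters.

```python
# (charge, approximate six-coordinate ionic radius in Å); illustrative values
RADII_6 = {
    "Na+": (1, 1.02), "Mg2+": (2, 0.72), "Ca2+": (2, 1.00), "Zn2+": (2, 0.74),
    "Co2+": (2, 0.745), "Cd2+": (2, 0.95), "Fe3+": (3, 0.645),
}

def ionic_potential(charge, radius_angstrom):
    """Charge-to-radius ratio q/r (elementary charges per Å)."""
    return charge / radius_angstrom

# List ions from highest to lowest ionic potential
for ion, (q, r) in sorted(RADII_6.items(), key=lambda kv: ionic_potential(*kv[1]), reverse=True):
    print(f"{ion:5s} q/r = {ionic_potential(q, r):.2f}")
```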


Table 3.3 Common geometries and corresponding hybridization for 4-, 5- and 6-coordinate metal ions

Coordination number | Hybridization of valence shell | Geometry | Example metal ion
4 | sp3 | Tetrahedral | Zn2+
4 | dsp2 | Square planar | Cu2+
5 | sp3d | Trigonal bipyramidal | Cu+
6 | sp3d2 | Octahedral | Fe3+, Co2+
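The geometries in Table 3.3 differ in the ligand-metal-ligand angles they impose. The sketch below builds idealized unit-vector ligand positions around a metal at the origin for each geometry and lists the distinct L-M-L angles. These are textbook idealizations; real metal sites in proteins are distorted from them.

```python
import itertools
import math

s = 1.0 / math.sqrt(3.0)
# Idealized ligand directions around a metal at the origin
GEOMETRIES = {
    "tetrahedral": [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s)],
    "square planar": [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)],
    "trigonal bipyramidal": [(0, 0, 1), (0, 0, -1), (1, 0, 0),
                             (-0.5, math.sqrt(3) / 2, 0), (-0.5, -math.sqrt(3) / 2, 0)],
    "octahedral": [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)],
}

def ligand_angles(vectors):
    """Sorted distinct L-M-L angles in degrees (rounded to 0.01)."""
    angles = set()
    for a, b in itertools.combinations(vectors, 2):
        cos = sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))
        cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
        angles.add(round(math.degrees(math.acos(cos)), 2))
    return sorted(angles)

for name, vecs in GEOMETRIES.items():
    print(f"{name}: {ligand_angles(vecs)}")
```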

3.1.4 Conservation of residues at the active site of the IscU superfamily

The active site of IscU comprises three cysteine residues (Cys37, Cys63, and Cys106) that are involved in metal ion coordination and are conserved across the IscU protein domain family. Additionally, two other residues, D39 and H105 (E. coli numbering), are found in proximity to the conserved cysteines and are implicated in metal coordination at the active site [7,8]. Conserved charged residues are also concentrated around the active site of IscU.

Figure 3.2. Conservation of electrostatics in IscU based on multiple sequence alignments. Multiple sequence alignments for the IscU-like superfamily of proteins with 100 aligned sequences (right) were used to determine conserved electrostatics, mapped onto the crystal structure of holo-IscU (PDB: 3LVL). Red indicates negatively charged residues, and blue indicates positively charged residues. Polar residues are shown in lighter grey, while darker grey indicates non-polar (hydrophobic) residues. Both panels show high conservation of charge around the putative binding site (circled).

These conserved charged residues are also reported to play a crucial role in metal ion coordination at the active site. Notably, D39 is invariant in the multiple sequence alignment of the IscU_like superfamily (Figure 3.2B). The D39A mutation has also been reported to decouple Fe-S cluster assembly from Fe-S cluster transfer [9-11]. H105 is likewise spatially close to the conserved cysteines and has been implicated in zinc binding in IscU [7]. However, in the multiple sequence alignment of the IscU_like superfamily from the Conserved Domain Database [12], the position corresponding to H105 is occupied by residues with positive and negative charges, as well as by polar and non-polar residues. Histidine is the most common residue at that position, accounting for 70% of the composition among the most diverse sequences in the alignment. Surprisingly, among more closely related sequences, histidine is only 30% conserved at this position, while lysine accounts for 35%.
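Residue percentages like those above come from counting residue identities in a single alignment column. The sketch below does this in two steps. First, it maps a residue number in a reference sequence (e.g., H105 in E. coli numbering) to its alignment column by skipping gaps. Second, it reports the percent composition at that column, ignoring gaps. The toy alignment is hypothetical and is not the CDD IscU_like alignment. Whether gaps should be excluded from the denominator is an assumption of this sketch.

```python
from collections import Counter

def column_for_residue(aligned_ref, residue_number, first_residue=1):
    """0-based alignment column holding residue `residue_number` of the gapped reference."""
    count = first_residue - 1
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            count += 1
            if count == residue_number:
                return col
    raise ValueError("residue not present in the aligned region")

def column_composition(alignment, column):
    """Percent composition of residues at one alignment column, ignoring gaps."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.most_common()}

# Hypothetical toy alignment; the first sequence is the reference
toy = ["AC-HC", "ACKHC", "AC-KC", "AC--C", "ACRHC", "AC-EC"]
col = column_for_residue(toy[0], 3)  # third residue of the reference (the H)
print(col, column_composition(toy, col))
```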

3.1.5 Metal coordination at the active site of IscU

In addition to the three conserved cysteine residues, the fourth ligand required for coordination of Zn2+ was identified as D39 in a crystal structure of IscU from T. thermophilus determined in the presence of Zn2+ (PDB: 2QQ4). However, in other structures, H105 was observed to coordinate Zn2+[7,13]. Zn2+ has been independently observed to bind at the active sites of SufU from B. subtilis[14] and S. pyogenes[15], where it coordinates to D40, the residue in Firmicutes corresponding to D39 of IscU. Sequence comparison with other U-type proteins shows high conservation of all three cysteines and D39 (D41 in SufU). It also reveals a conserved K103 (replaced by R124 in Gram-positive bacteria), another potentially important residue close to the active site. K103 was initially implicated in metal ligation in zinc-bound H. influenzae IscU [7] and is also in close proximity to C65 in zinc-bound SufU from S. pyogenes [15]. In E. coli IscU, K103 contacts HscA and has a significant effect on the kinetics of its chaperone activity [16,17].

Recently, it has been proposed that D39 and H105 can both coordinate Zn2+, depending on the pH. Using quantum mechanical calculations based on a homology model of zinc-bound E. coli IscU, the Pastore group predicted the coordination at high pH. In this model, the deprotonated cysteines C37 and C63, H105, and the side-chain carboxylate of D39 coordinate Zn2+ in a tetrahedral geometry, while C106 points away from the metal ion[8,17]. In the quantum mechanical model of [2Fe-2S] cluster-loaded wild-type IscU, one of the Fe atoms is axially sandwiched between D39 and H105 in a trigonal bipyramidal geometry[17]. The presence of five possible ligands in IscU has been hypothesized to create a “tug of war” between residues competing for coordination at the active site, resulting in destabilization of the cluster [8]. Mutation of either residue (D39A or H105A) has been shown to stabilize cluster formation [17].

Interestingly, sequence regions near the active sites of both IscU and SufU are predicted to exhibit inherent disorder. This raises the question of whether ligand binding may influence their structure or conformation. The regions of disorder predicted for E. coli IscU and S. mutans SufU (Figure 1.3) were compared using PONDR, a server that predicts naturally disordered regions in proteins [18]. This comparison indicates a comparable region near the active site in the two proteins (circled in Figure 3.3). A recent study used hydrogen/deuterium exchange to identify a similar region of conformational flexibility (residues 98-108) in SufU from B. subtilis[19]. It is possible that stabilization of the active-site cysteine residues by zinc binding (Figure 3.3, bottom panel) affects the residues implicated in conformational heterogeneity. In particular, the U-type proteins appear to share a common structural region that may be important for the conformational transition induced by zinc binding at the active site. Notably, this shared region of predicted disorder overlaps with the LPPVK and GPR regions in IscU and SufU, respectively. These regions have been implicated in protein interactions that may affect the conformational states of IscU [20] and SufU [21].
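PONDR itself is a trained predictor whose models are not reproduced here. As a much cruder proxy, the sketch below computes a Kyte-Doolittle sliding-window hydropathy profile and flags windows of low mean hydropathy. This is a clearly different method from PONDR; low hydropathy is only one of the sequence features associated with intrinsic disorder. The window size and threshold are arbitrary assumptions, and the test sequence is synthetic.

```python
# Kyte-Doolittle hydropathy scale
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each full window, keyed by the window's center (1-based)."""
    half = window // 2
    return {i + half + 1: sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)}

def low_hydropathy_positions(seq, window=9, threshold=-1.0):
    """Window centers whose mean hydropathy falls below `threshold` (a crude flag only)."""
    return [pos for pos, h in hydropathy_profile(seq, window).items() if h < threshold]

# Synthetic sequence: hydrophobic flanks around a charged, hydrophilic stretch
seq = "I" * 9 + "K" * 9 + "I" * 9
print(low_hydropathy_positions(seq))
```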

Figure 3.3 Crystal structures and model of SufU and IscU with disordered regions predicted by PONDR highlighted in red (upper) and active site residues depicted in ball-and-stick (lower). (A) S. pyogenes SufU, PDB: 1SU0[15] (cyan) with a bound Zn2+ ion (grey sphere) (B) S. mutans SufU threaded model using 1SU0 as template (slate) (C) E. coli IscU, PDB: 3LVL[22], chain A (green). The sole tryptophan residue in each structure (Trp132 in S. pyogenes and S. mutans SufU and Trp76 in E. coli IscU) is depicted in ball-and-stick representation (dark blue). Orange circles highlight the GPR and LPPVK sequence regions in SufU and IscU, respectively, that are predicted to be flexible and were implicated in protein:protein interactions.

3.1.6 Role of active site residues in conformational transitions of IscU

Currently, there is debate about the determinants of structural disorder in IscU. Markley and co-workers have reported that IscU exists in an equilibrium of structured and partially disordered states, as described in previous sections [23-25]. They have further demonstrated that IscS preferentially binds to the disordered form of IscU [26]. However, recent reports by Pastore and co-workers have concluded that the biologically relevant state of IscU is primarily structured [17], albeit with an inherent plasticity at its active site that is capable of binding mononuclear metal ions (e.g. Zn2+) or Fe-S clusters [8]. Therefore, investigation of metal coordination at the active site of IscU is pertinent to the study of its structural disorder. Chapter 2 detailed how E. coli IscU coordinates both zinc and cadmium ions (tetrahedral coordination), as well as cobalt ions (octahedral coordination). In this chapter, the roles of residues D39 and H105 in binding specific metal ions and in supporting the metal ion-dependent conformational changes of IscU are investigated. To achieve this goal, the ability to coordinate selected metal ion species, as well as the resulting changes in protein structure and stability, were compared among wild-type IscU from E. coli and its D39A and H105A mutants.

3.2 Methods

3.2.1 Gene cloning and protein overexpression construct design

The genes encoding the desired proteins were PCR-amplified from genomic DNA purchased from Sigma (Escherichia coli K12 strain) or ATCC (Bacillus subtilis). The amplified gene sequences were purified from agarose gels and then digested with the NdeI and XhoI restriction enzymes (New England Biolabs) before ligation. The IscU gene was ligated into a modified pET vector designed to produce the target protein with a fused N-terminal 6-histidine purification tag that is cleavable with tobacco etch virus (TEV) protease, whereas the SufU gene was cloned into a pET28a vector. All expression plasmids were submitted for nucleotide sequencing at the Chicago DNA Sequencing Facility prior to use.


3.2.2 Site-directed mutagenesis of IscU

A plasmid containing the E. coli iscU gene was mutated to generate the IscU mutants D39A, D39H, H105A and H105D. The expression plasmids for the IscU variants were created by applying a stepwise site-directed mutagenesis protocol to the pTHT-IscU expression vector. Plasmid (5 ng/μL) was amplified by polymerase chain reaction with either the forward or the reverse mutagenesis primer in separate “half-reactions”. After 5 amplification cycles (30 sec at 95 °C, 60 sec at 51 °C, 7.5 min at 68 °C), the half-reactions were combined and the amplification cycle was repeated 16 times. The amplified product was purified using a Qiagen PCR purification kit and transformed into NovaBlue competent cells (EMD Millipore). Mutated transformants were verified by colony PCR and plasmid sequencing.

3.2.3 Protein expression and IMAC purification

Competent E. coli BL21 (DE3) cells (EMD Millipore) were transformed with the protein overexpression plasmids and plated on lysogeny broth (LB) agar plates containing kanamycin (65 μg/ml). Single colonies were used to inoculate 80 ml of sterile LB medium with 65 μg/ml kanamycin and grown overnight at 37 °C on an orbital shaker set to 250 rpm. The overnight cultures were diluted 1:200 into 1.5 L of sterile LB medium with 65 μg/ml kanamycin in baffled flasks and incubated in an orbital shaker at 37 °C and 250 rpm. When cell growth reached an OD600 of 0.5-0.7, the culture flasks were transferred to ice-water baths for 30 minutes, induced by the addition of 0.2 mM (final concentration) isopropyl β-D-1-thiogalactopyranoside (IPTG), and returned to the orbital shaker for an additional 16 hours at 15 °C and 110 rpm. Cells were harvested by centrifugation at 6000 rcf for 15 min (approximately 3 g of cell pellet per liter of culture), flash-frozen in liquid nitrogen, and stored at -80 °C. For purification, cell pellets were thawed and resuspended in lysis buffer comprising 0.025 M Tris-Cl pH 8.0, 0.50 M NaCl, and 2 μl (500 units) Benzonase nuclease (Sigma Aldrich). The lysis mixture was stirred with a magnetic stir bar and sonicated for 120 cycles of 2-second pulses separated by 3-second rests. Crude lysate was clarified by centrifugation at 25000 rcf for 30 minutes, and the supernatant was loaded onto a pre-packed 5 ml His-Trap immobilized nickel column (GE Healthcare) using a peristaltic pump. The column was washed extensively with wash buffer (identical to lysis buffer without added Benzonase). After washing, the bound protein was eluted with elution buffer (0.025 M Tris-Cl pH 7.8, 0.50 M NaCl, and 0.25 M imidazole). Samples taken at each step were analyzed by SDS-PAGE and visualized with Coomassie Brilliant Blue to identify fractions with optimal yield and purity for pooling and subsequent analysis. The pooled SufU fractions were buffer exchanged into 0.020 M Tris-Cl pH 7.5 by 4 cycles of concentration followed by dilution, using Amicon Ultra-15 centrifugal filter units (EMD Millipore). The collected proteins were cleaved from their purification tags by addition of TEV protease (IscU) or thrombin (SufU), followed by overnight incubation. The reaction mixture was passed over a fresh column of immobilized nickel resin, and the cleaved proteins were collected in the flow-through. Protein purity was confirmed by SDS-PAGE.

3.2.4 Iron ion content determination

As described in Chapter 2, the iron content of the purified protein was determined according to a published method using ferrozine purchased from Thermo Fisher Scientific [27]. Absorbance at the 560 nm maximum was recorded on a Cary60 spectrophotometer (Agilent Technologies). The amount of iron present in each sample was estimated from a standard curve prepared with Fe3+ standards containing 1 μg to 6 μg of iron.
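Converting the iron mass read off the ferrozine standard curve into an iron-to-protein ratio is simple arithmetic, but it is easy to get wrong by a factor of 1000. The following Python sketch shows one way to do it; the function name, protein concentration and assay volume are hypothetical illustration values, not parameters used in this work.

```python
"""Hypothetical helper: iron mass (from a ferrozine standard curve) -> Fe per protein."""

FE_MOLAR_MASS_G_PER_MOL = 55.845  # standard atomic weight of iron


def fe_per_protein(fe_mass_ug: float, protein_conc_uM: float, sample_volume_mL: float) -> float:
    """Return mol Fe per mol protein for one assayed sample.

    fe_mass_ug       -- iron mass in the sample, read from the standard curve (ug)
    protein_conc_uM  -- protein concentration of the assayed solution (uM)
    sample_volume_mL -- volume of protein solution taken into the assay (mL)
    """
    fe_nmol = fe_mass_ug / FE_MOLAR_MASS_G_PER_MOL * 1000.0  # ug / (g/mol) = umol, x1000 -> nmol
    protein_nmol = protein_conc_uM * sample_volume_mL         # umol/L x mL = nmol
    return fe_nmol / protein_nmol


if __name__ == "__main__":
    # Illustrative numbers only: 0.056 ug Fe measured for 0.1 mL of 100 uM protein.
    ratio = fe_per_protein(0.056, 100.0, 0.1)
    print(f"Fe per protein: {ratio:.3f}")  # values near 0 are expected for apo-protein
```

Keeping the units explicit in the parameter names makes it harder to mix micrograms with micromoles when the same helper is reused for other metals.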

3.2.5 Zinc ion content determination

The Zn2+ content of the proteins was determined using a spectrophotometric assay with the chromophoric chelator 4-(2-pyridylazo)resorcinol (PAR) [28]. PAR forms a complex with zinc ions that exhibits an absorbance maximum at 497 nm. Standard solutions of Zn2+ (1-8 μM) and protein solutions were prepared in 0.05 M HEPES at pH 7.4 containing 4 M guanidine hydrochloride, followed by addition of 50 μM PAR and incubation for 30 minutes. Spectra (350-650 nm) were recorded for all samples using the Cary60 spectrophotometer (Agilent Technologies), and the quantity of Zn2+ in each protein sample was determined from the calibration plot.
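Reading Zn2+ concentrations off the PAR calibration plot amounts to fitting a straight line to the standards and inverting it. The sketch below assumes an ordinary least-squares fit of A497 against standard concentration and uses made-up absorbance values; it also refuses to extrapolate beyond the 1-8 μM standard range, which is one reasonable (but not the only) design choice.

```python
"""Minimal sketch of a PAR standard curve (hypothetical absorbance data)."""


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def zinc_uM(a497, slope, intercept, lo=1.0, hi=8.0):
    """Invert the standard curve; reject readings outside the calibrated range."""
    conc = (a497 - intercept) / slope
    if not lo <= conc <= hi:
        raise ValueError(f"{conc:.2f} uM lies outside the {lo}-{hi} uM standards; dilute or re-assay")
    return conc


# Hypothetical standards (uM Zn2+) and their A497 readings.
STANDARDS_UM = [1.0, 2.0, 4.0, 6.0, 8.0]
A497_STANDARDS = [0.086, 0.152, 0.284, 0.416, 0.548]

if __name__ == "__main__":
    slope, intercept = fit_line(STANDARDS_UM, A497_STANDARDS)
    zn = zinc_uM(0.350, slope, intercept)
    protein_uM = 5.0  # hypothetical protein concentration in the assay
    print(f"Zn2+ = {zn:.2f} uM; Zn per protein = {zn / protein_uM:.2f}")
```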

3.2.6 Removal of coordinated metal ions

Coordinated metal ions were removed from purified proteins by passage over a column of Ni-NTA resin (Thermo Fisher Scientific) pre-treated with 0.1 M EDTA (pH 8.0) to expose open metal coordination sites that compete with the protein for bound metal ions [29]. Before the protein was added, the EDTA-treated Ni-NTA resin was washed extensively with purified water and 20 mM Tris-Cl buffer (pH 7.8). Purified protein was applied to the column and the flow-through was collected. The column was then washed repeatedly with 20 mM Tris-Cl to elute the apo-protein. Eluted apo-protein was concentrated using Amicon Ultra-15 centrifugal filter units (EMD Millipore) with a 3 kDa nominal molecular weight cutoff and diluted to its original concentration with 20 mM Tris-Cl buffer. This step was repeated 4-5 times to ensure removal of any residual EDTA. Removal of metal ions was confirmed by the ferrozine and PAR assays prior to other experiments (Figures S2 and S3).
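Why 4-5 concentrate-and-dilute cycles should suffice to remove EDTA can be checked with a short calculation. Assuming ideal mixing, a freely permeating small molecule, and no retention by the membrane or the protein, each cycle that concentrates a volume V0 down to Vr and dilutes it back to V0 leaves a fraction Vr/V0 of the small molecule behind. The volumes used below (15 mL concentrated to 0.5 mL) are illustrative assumptions, not recorded values.

```python
"""Idealized residual-EDTA estimate for repeated concentrate/dilute cycles."""


def residual_fraction(start_mL: float, retained_mL: float, cycles: int) -> float:
    """Fraction of a freely permeating solute left after `cycles` rounds.

    Each round concentrates start_mL down to retained_mL (the solute leaves with
    the filtrate at unchanged concentration) and then dilutes back to start_mL.
    """
    return (retained_mL / start_mL) ** cycles


def cycles_needed(start_mL: float, retained_mL: float, target_fraction: float) -> int:
    """Smallest number of rounds that brings the residual fraction to <= target."""
    if not 0.0 < retained_mL < start_mL:
        raise ValueError("retained volume must be positive and smaller than the start volume")
    n = 0
    while residual_fraction(start_mL, retained_mL, n) > target_fraction:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, f"{residual_fraction(15.0, 0.5, n):.2e}")
    print("rounds for 1e-5 residual:", cycles_needed(15.0, 0.5, 1e-5))
```

In practice, dead volume in the filter unit and any EDTA bound to the protein make real removal less efficient than this ideal estimate, which is one reason to confirm the result with the assays rather than rely on the calculation.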

3.2.7 Circular Dichroism (CD) experiments

Conformational changes induced by the addition of Zn2+ to the apo-proteins were monitored by circular dichroism spectroscopy. Zinc chloride stock solutions (0.10 M) were freshly prepared in water and subsequently diluted into a buffer containing 0.02 M Tris-Cl (pH 7.8) and 0.5 mM tris(hydroxymethyl)phosphine (THP) to form the “working stock solutions”. Samples of IscU and SufU at 10 μM (final concentration) were prepared in the same buffer (0.02 M Tris-Cl, pH 7.8, 0.5 mM THP). Despite its known interference with CD measurements at wavelengths shorter than 200 nm, Tris buffer with chloride counter-ions was selected for its compatibility with the different metal salts used in parallel experiments. A “blank” spectrum of buffer alone was recorded and subtracted from all subsequent measurements. Protein:zinc solutions were prepared by combining equal volumes of protein and zinc chloride solutions, each at twice the desired concentration in 20 mM Tris-Cl buffer (pH 7.8). Protein and buffer samples were incubated for 1 hr and then centrifuged for 10 minutes at 14000 rcf. Slight cloudiness was observed for SufU in the presence of more than 40 μM Zn2+; IscU solutions remained clear in the presence of Zn2+ ions. Far-UV CD spectra of protein-metal complexes were recorded at 25 °C with an Aviv 62DS spectrometer upgraded to the equivalent of an Aviv Model 202 CD spectrometer. Data were collected every 0.5 nm with a 2 sec averaging time. Three scans were averaged and the appropriate buffer baseline was subtracted. All spectra were smoothed and plotted in Igor Pro.
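The CD figures report molar ellipticity, whereas the instrument records raw ellipticity in millidegrees. The sketch below averages replicate scans, subtracts the buffer baseline, and converts the result using [θ] = θobs / (10 · C · l), where θobs is in mdeg, C is the molar protein concentration and l is the path length in cm, giving deg cm2 dmol-1; dividing further by the number of residues gives the mean residue ellipticity. The 0.1 cm path length and the residue count in the example are assumptions for illustration, since the section does not state which normalization was used for the figures.

```python
"""Sketch: scan averaging, baseline subtraction and ellipticity unit conversion."""


def mean_scan(scans):
    """Point-by-point average of equally spaced scans (a list of equal-length lists)."""
    return [sum(values) / len(values) for values in zip(*scans)]


def baseline_corrected(sample_scans, blank_scans):
    """Average the sample and buffer-blank scans, then subtract the blank."""
    return [s - b for s, b in zip(mean_scan(sample_scans), mean_scan(blank_scans))]


def molar_ellipticity(theta_mdeg, conc_M, path_cm):
    """Molar ellipticity in deg cm^2 dmol^-1 from raw ellipticity in mdeg."""
    return theta_mdeg / (10.0 * conc_M * path_cm)


def mean_residue_ellipticity(theta_mdeg, conc_M, path_cm, n_residues):
    """Molar ellipticity divided by the number of residues."""
    return molar_ellipticity(theta_mdeg, conc_M, path_cm) / n_residues


if __name__ == "__main__":
    # Three hypothetical scans at two wavelengths (mdeg) and a buffer blank.
    sample = [[-10.2, -9.8], [-9.9, -10.1], [-9.9, -10.1]]
    blank = [[0.0, 0.1], [0.0, -0.1], [0.0, 0.0]]
    corrected = baseline_corrected(sample, blank)
    # Assumed: 10 uM protein, 0.1 cm cuvette, 128 residues (illustrative values).
    print([round(mean_residue_ellipticity(t, 10e-6, 0.1, 128), 1) for t in corrected])
```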


3.2.8 Thermofluor assay

ThermoFluor experiments [30] were carried out using a Johnson & Johnson Pharmaceutical Research & Development, LLC RT-PCR instrument. Protein-metal solutions in 20 mM HEPES, pH 7.5 (25 μL) were dispensed into 96-well polypropylene PCR microplates (Abgene) and sealed with transparent adhesive tape to prevent evaporation. Protein solutions contained apo-IscU or apo-SufU at 10 μM final concentration, 20 mM HEPES buffer (pH 7.4), SYPRO Orange, and a metal salt (ZnCl2, CoCl2 or CdCl2) at 50 μM final concentration. Thermocycler plates were robotically loaded onto the thermostatically controlled PCR-type thermal block. The RT-PCR instrument was first programmed with a 5 min equilibration at 5 °C to allow SYPRO Orange to diffuse and reach temperature equilibrium, lowering the initial background fluorescence [31]. The plate was then heated from 5 °C to 95 °C in stepwise increments of 1 °C per minute, with fluorescence readings taken using filters optimized for SYPRO Orange (excitation 485/20 nm, emission 530/30 nm). Each protein-metal combination was measured in triplicate, and the resulting melting temperatures (Tm) were averaged to obtain the mean Tm. Reference wells contained buffer and metal solutions without protein.
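The section does not specify how Tm was extracted from each melting curve. A common approach, sketched below as an assumption rather than the procedure actually used, is to subtract the protein-free reference well, take the temperature at which the first derivative dF/dT is largest, and then average the replicate Tm values. Because aggregation causes the signal to fall after unfolding, only the maximum positive slope is considered; curves with no positive slope are reported as having no transition, analogous to the "-" entries in Table 3.4. The synthetic curves are hypothetical.

```python
"""Sketch: Tm from the maximum of dF/dT, averaged over replicate wells."""
import math
from statistics import mean


def melting_temperature(temps, sample, reference):
    """Return the temperature of maximum positive dF/dT, or None if none exists.

    temps     -- temperatures in deg C, strictly increasing
    sample    -- fluorescence of the protein + metal + dye well
    reference -- fluorescence of the matching protein-free reference well
    """
    signal = [s - r for s, r in zip(sample, reference)]
    best_t, best_slope = None, 0.0
    for i in range(1, len(temps) - 1):
        # Central-difference estimate of the first derivative.
        slope = (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


def mean_tm(replicate_tms):
    """Average the replicate Tm values that showed a transition."""
    found = [t for t in replicate_tms if t is not None]
    return mean(found) if found else None


def synthetic_curve(temps, tm, width=2.0):
    """Hypothetical sigmoidal unfolding curve with a post-transition (aggregation) decline."""
    return [1.0 / (1.0 + math.exp(-(t - tm) / width)) - 0.01 * max(0.0, t - (tm + 15))
            for t in temps]


if __name__ == "__main__":
    temps = list(range(5, 96))  # 5-95 deg C in 1 deg C steps
    blank = [0.05] * len(temps)
    wells = [[f + 0.05 for f in synthetic_curve(temps, tm)] for tm in (60, 61, 60)]
    tms = [melting_temperature(temps, w, blank) for w in wells]
    print(tms, mean_tm(tms))
```

Fitting a Boltzmann sigmoid to the rising part of each curve is a frequently used alternative; the derivative approach is shown because it needs no fitting library.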

3.3 Results

3.3.1 Effect of mutations at the active site on the conformation of IscU

Secondary structures of wild-type IscU and its four mutants (D39A, H105A, D39H and H105D) were probed using circular dichroism (CD) spectroscopy. Far-UV spectra of IscU exhibited distinct negative peaks at 208 nm and 222 nm, consistent with the CD profile observed for IscU in earlier reports [26]. Deconvolution of the CD spectrum reported earlier [32] yielded the following secondary structure composition: approximately 41 % α-helix, 14 % β-strand, and 20 % turns. All mutants exhibited differences in secondary structure when compared to wild-type IscU, but to varying degrees. The greatest change was observed for D39A, and the least for D39H. Confirming earlier reports, the D39A mutant exhibited a CD profile indicative of the S-state [26] (Figure 3.4, in red), which also resembled the spectrum of zinc-bound wild-type IscU (Supplementary Information of [26]). H105A (Figure 3.4, in blue) displayed a more pronounced negative feature in the 200-210 nm region with a shift towards lower wavelength, possibly due to an increased random coil content with a concomitant decrease in α-helix/β-strand content. As mentioned above, D39H (Figure 3.4, in green) produced only a small change in the CD spectrum compared to that of wild-type IscU in the absence of metal ions. Finally, H105D (Figure 3.4, in pink) showed a complete lack of secondary structure, and the sample formed aggregates during the experiment, suggesting that the proximity of two negatively charged groups (D39 and H105D) at the active site disrupts the secondary structure of the protein. H105D was therefore not used for further studies.

Figure 3.4 Far-UV circular dichroism spectra of 10 μM E. coli IscU and its mutants in the absence of metals (apo-form). Molar ellipticity is given in deg cm2 dmol-1. All spectra were taken at 25 °C with 10 μM protein samples in 20 mM Tris-Cl at pH 7.5.

3.3.2 Effect of mutations on metal coordination in IscU

The effects of adding transition metal ions to IscU were monitored using CD spectroscopy. The transition metal ions were divided into two categories: physiologically important metals, such as Fe(III), which is part of the Fe-S cluster in IscU and related scaffold proteins, and Zn(II), which binds to IscU and is crucial for the function of SufU [14]; and potentially toxic metals that can perturb Fe-S cluster proteins, such as Cd(II), Co(II) and Cu(II) [33].

Figure 3.5 Far-UV circular dichroism spectra of 10 μM E. coli IscU and its mutants in the presence of metals (Zn2+ and Fe3+). Molar ellipticity is given in deg cm2 dmol-1. All spectra were taken at 25 °C with 10 μM protein samples in 20 mM Tris-Cl at pH 7.5 and a final metal ion concentration of 50 μM.

Addition of Fe3+ ions did not affect the wild-type protein or the mutants. Addition of Zn2+ had minimal effect on the secondary structure of D39H and D39A, but H105A showed a distinct CD spectrum with a deepening of the negative signal at 222 nm (Figure 3.5 D), similar to the change observed for WT IscU with added Zn2+ ions (Figure 3.5 A) and in earlier reports [8]. The observed spectral changes suggest an increase in α-helical content for wild-type and H105A IscU.
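The qualitative statement that the 222 nm signal reports on α-helical content can be made semi-quantitative with a two-state interpolation between reference ellipticities for a fully helical and a fully disordered chain. The reference values below (about -36,000 and +3,000 deg cm2 dmol-1 per residue) are commonly cited approximations and are treated here as adjustable assumptions. The estimate ignores β-strand and aromatic contributions, so it is only a rough indicator and no substitute for the spectral deconvolution cited above [32]. The input ellipticities are hypothetical.

```python
"""Two-state estimate of helical fraction from mean residue ellipticity at 222 nm."""

THETA_HELIX = -36000.0  # assumed [theta]222 for a fully helical chain
THETA_COIL = 3000.0     # assumed [theta]222 for a fully disordered chain


def helix_fraction(theta_222, theta_helix=THETA_HELIX, theta_coil=THETA_COIL):
    """Linear interpolation between coil (0) and helix (1), clipped to [0, 1]."""
    fraction = (theta_222 - theta_coil) / (theta_helix - theta_coil)
    return min(max(fraction, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical mean residue ellipticities at 222 nm (deg cm^2 dmol^-1).
    apo, zinc_bound = -12000.0, -15000.0
    change = helix_fraction(zinc_bound) - helix_fraction(apo)
    print(f"apo {helix_fraction(apo):.2f}, +Zn2+ {helix_fraction(zinc_bound):.2f}, change {change:+.2f}")
```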

Figure 3.6 Far-UV circular dichroism spectra of 10 μM E. coli IscU and its mutants in the presence of toxic transition metals (Cd2+, Co2+ and Cu2+). Molar ellipticity is given in deg cm2 dmol-1. All spectra were taken at 25 °C with 10 μM protein samples in 20 mM Tris-Cl at pH 7.5 and varying metal concentrations.

In the toxic metals test group, addition of Cu2+ had no effect on any of the mutants or on wild-type IscU. Cd2+ bound wild-type and H105A IscU, as indicated by CD spectra that resembled those collected from WT IscU with added zinc ions. Addition of Co2+ induced a conformational change in wild-type IscU and H105A. D39H showed a distinct spectral transition upon addition of Co2+ ions, whereas addition of Cd2+ ions led to loss of secondary structure (Figure 3.6).

3.3.3 Probing thermal stability of the proteins with addition of metals

To assess the thermal stability of metal-bound IscU and its mutants, a thermofluor assay was performed. In this procedure, a hydrophobic fluorescent dye is added to the protein solution to monitor unfolding. The dye's fluorescence is quenched when it is fully solvated, but when the dye molecules associate with hydrophobic moieties they become desolvated, leading to enhanced fluorescence. When the solution is monitored while the temperature is increased stepwise, the fluorescent signal can therefore reveal protein unfolding: fluorescence increases as the solution is heated, because more of the protein's hydrophobic interior becomes exposed and available for binding to the dye. As the temperature is increased beyond the melting point of the protein, the protein molecules start to aggregate and the dye dissociates from the protein, leading to a decrease in fluorescence. To rule out any influence of the buffer or metal on the fluorescence of the dye, negative controls were included: dye with buffer (no protein or metal added) and dye with metal salt solution (no protein added). Since Fe3+ and Cu2+ are known to quench the fluorescence of SYPRO Orange, the dye used in the thermofluor assay, they were excluded from this experiment.


Figure 3.7 Melting temperatures of IscU variants in the absence and presence of different metal ions, as analyzed by the thermofluor assay.

Among the apo-proteins, only D39A exhibited a “peak” indicating a putative melting transition. This suggests that the other apo-proteins may have lacked a sufficiently uniform structural conformation and therefore did not display the characteristic “two-state” unfolding transition. Upon addition of metals, Cd2+ and Zn2+ ions enhanced the thermal stabilities of IscU and its mutants (Table 3.4). Although the D39A mutant did not show remarkable changes in secondary structure upon addition of metal ions, its thermal stability increased, as indicated by an elevated Tm. H105A was distinctly stabilized in the presence of Zn2+ and Cd2+ ions, but showed very little change upon addition of Co2+ ions. D39H exhibited only modest changes in thermal stability in the presence of metal ions, possibly indicating weak binding.

Table 3.4 Melting temperatures of IscU variants in the presence of various divalent metal ions

Tm (°C)    No metal   Zn2+   Cd2+   Co2+
WT-IscU    -          71.3   68.3   64.2
D39A       53.3       60.1   58.3   54.0
H105A      -          59.2   58.1   -
D39H       -          54.7   52.3   -

("-": no Tm determined)
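Because only D39A showed a measurable apo-state transition, metal-induced Tm shifts (ΔTm = Tm(metal) - Tm(apo)) can be computed for that variant alone; for the other variants the table supports only comparisons among metals. The short Python sketch below encodes the values of Table 3.4 (treating "-" as missing) and performs both calculations; the function names are illustrative.

```python
"""Metal-induced Tm shifts computed from Table 3.4 (None = no Tm determined)."""

TM_C = {
    "WT-IscU": {"none": None, "Zn2+": 71.3, "Cd2+": 68.3, "Co2+": 64.2},
    "D39A": {"none": 53.3, "Zn2+": 60.1, "Cd2+": 58.3, "Co2+": 54.0},
    "H105A": {"none": None, "Zn2+": 59.2, "Cd2+": 58.1, "Co2+": None},
    "D39H": {"none": None, "Zn2+": 54.7, "Cd2+": 52.3, "Co2+": None},
}
METALS = ("Zn2+", "Cd2+", "Co2+")


def delta_tm(variant, metal):
    """Tm(metal) - Tm(apo) in deg C, or None when either value is missing."""
    apo, holo = TM_C[variant]["none"], TM_C[variant][metal]
    if apo is None or holo is None:
        return None
    return round(holo - apo, 1)


def rank_metals(variant):
    """Metals that gave a Tm for this variant, ordered by Tm, highest first."""
    measured = [m for m in METALS if TM_C[variant][m] is not None]
    return sorted(measured, key=lambda m: TM_C[variant][m], reverse=True)


if __name__ == "__main__":
    for metal in METALS:
        print("D39A", metal, delta_tm("D39A", metal))
    for variant in TM_C:
        print(variant, rank_metals(variant))
```

For D39A this gives shifts of 6.8, 5.0 and 0.7 °C for Zn2+, Cd2+ and Co2+, respectively, and for every variant Zn2+ yields the highest Tm.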

3.4 Discussion

Previous studies of structural disorder in IscU have explored the roles of the putative active-site residues D39 and H105 [8,17,22,23]. To gain further understanding of the effect of metal binding on the structure of IscU, and in particular of the influence of these two active-site residues, the responses of IscU variants with mutations at these two positions to added transition metal ions were monitored by CD and a thermofluor melting assay.

Homologous proteins such as SufU have subtle differences in their active sites that may give rise to different metal preferences or be important for their different functions. In Chapter 2, it was noted that IscU undergoes structural changes and enhanced stability in the presence of added Co2+ ions, whereas SufU remains unchanged. One possible explanation is that the five putative ligands at the active site of IscU can accommodate a metal ion with octahedral coordination, whereas SufU has a more limited inventory of putative metal ion ligands at its active site. Here, we investigated the effects of mutating the D39 and H105 residues on the ability of added transition metal ions to alter the structure and stability of IscU from E. coli.

3.4.1 D39A and D39H mutations stabilize a single structural conformation independent of metal ion binding

The D39A mutation, or its equivalent in IscU homologs, is known to increase the stability of the bound cluster in Azotobacter vinelandii NifU [34] as well as in yeast [35]. NMR studies have shown that the D39A mutation causes E. coli IscU to exist primarily in the S-state [25]. Our results confirmed the enhanced thermal stability and increased secondary structure content (Figure 3.4) of a D39A variant of IscU from E. coli. The results reported here demonstrate that, unlike in wild-type IscU, the D39A mutation decouples metal binding from the structural transition. Specifically, no remarkable change in secondary structure was observed by CD for the IscU D39A mutant upon addition of Zn2+, Cd2+ and Co2+ ions (Figures 3.5 and 3.6). However, the thermal stability of the metal-bound protein was slightly enhanced compared to the apo-protein, suggesting that metal binding is accompanied by an increase in intramolecular interactions (Table 3.4).

The tetrahedral coordination of zinc ions at the active site of IscU has been suggested to arise from the thiol ligands of Cys37 and Cys63, Nε2 of H105 and the carboxylate group of D39, with Cys106 pointing away from Zn2+ [8]. Although the reported pKa values of the Nε2 group of H105 and the active-site cysteine residues imply that these groups should be protonated, the authors note that the locally charged environment of the IscU active site would conceivably promote their deprotonated states [8]. At physiological pH, the D39 side chain is likely to be deprotonated and therefore negatively charged (i.e., high electron density). In the absence of a metal ion, the Asp side chain prevents co-localization with the otherwise electron-rich thiol (or thiolate) cysteine ligands. In other words, without a metal ion, the Asp side chain prevents a compact, folded active site until a metal ion binds, neutralizes the excess electron density, and brings together the cysteine and Asp residues through metal coordination.

The D39A mutation removes the negatively charged aspartate group, so the active site is capable of “pre-forming” in the absence of the metal ion. This is plausible because the cysteine side chains would be expected to remain protonated (i.e., neutral) until a metal ion is present, at which point their charge would be neutralized by coordination anyway. Metal ion binding therefore does not change the protein structure, but it does enhance intramolecular interactions, thus increasing stability.

By comparison, H105 carries less electron density (it is polar but neutral), so its presence does not preclude co-localization of the other active-site residues. Therefore, mutating the His residue (H105A) does not produce a “pre-formed” active site.

The D39H mutant shows an interesting behavior. This mutant did not show a remarkable change in secondary structure in the presence of metal ions (Figures 3.5 and 3.6). On the basis of the similarity of the two CD profiles, the D39H mutation appears to stabilize the protein in a state that resembles that of apo-WT IscU. However, the thermofluor assay showed that stabilization of the protein structure following metal ion binding was diminished for this mutant compared to that observed for the D39A or WT proteins. This was interpreted as an indication of comparatively weak binding of the added metal ions. The results reported here suggest that even though the D39H mutation provides a potential ligand at position 39, the resulting set of two cysteines and the Nε2 atoms of two histidines (H39 and H105) is unable to bind zinc in a manner that enhances intramolecular interactions as effectively as in the WT protein.


3.4.2 H105A mutation diminishes the stabilizing effect of added Co2+ ions

In Chapter 2, the hypothesis was discussed that the unique ability of IscU, but not SufU, to bind Co2+ ions may be due to differences in the availability of coordinating residues at their active sites. It is plausible that IscU can accommodate an octahedrally coordinated metal ion owing to the presence of five potentially competent metal ligand residues at its active site. By comparison, SufU, with four active-site residues, is limited to binding metal ions with tetrahedral geometry. This hypothesis led us to study the binding of Co2+ ions to the IscU D39A and H105A mutants. Importantly, the CD trace recorded for the H105A mutant exhibited a distinct structural transition upon addition of Co2+ ions (Figure 3.6). This suggests that H105 is not required for binding octahedrally coordinated Co2+ ions. On the other hand, a comparison of the thermal melting curves reveals that the H105A mutant experiences diminished stabilization in the presence of Co2+ ions compared with Zn2+ ions (Figure 3.7).

It is interesting to note that the D39A and H105A mutant proteins exhibited contrasting changes upon metal binding with regard to Co2+ ions. While D39A did not exhibit any further change in secondary structure upon interaction with Zn2+, Cd2+ and Co2+ ions, its thermal stability was nevertheless slightly enhanced. On the other hand, addition of Zn2+ and Cd2+ ions to the H105A mutant protein did result in structural changes observed by CD, as well as enhanced stability evidenced by the thermofluor assay. However, the ability of this mutant protein to interact specifically with Co2+ ions was diminished, both in terms of changes in secondary structure and of structural stabilization. Our results support the conclusion that the absolute conservation of the active-site aspartate residue may be due to its involvement in governing conformational changes in the protein structure upon ligand (metal ion) binding, whereas the less strict conservation of the H105 residue may reflect a more peripheral role in ligand selectivity. Future structural data for D39H and H105A IscU in their metal-bound forms should help clarify these results.

3.5 Conclusion

The interaction of IscU variants with different transition metals presented here will inform studies into the determinants of metal ion coordination and metal-ion dependent structural transitions exhibited by IscU. The results indicate that residues D39 and H105 contribute differently to changes in the conformational state adopted by IscU following metal ion binding, and also that binding to metal ions with octahedral coordination geometry in IscU likely arises from the presence of an additional residue, H105, that is not strictly conserved in the active site required for Fe-S cluster assembly.


REFERENCES

1. Chillappagari S, Seubert A, Trip H, Kuipers OP, Marahiel MA, et al. (2010) Copper Stress Affects Iron Homeostasis by Destabilizing Iron-Sulfur Cluster Formation in Bacillus subtilis. Journal of bacteriology 192: 2512-2524.
2. Ranquet C, Ollagnier-de-Choudens S, Loiseau L, Barras F, Fontecave M (2007) Cobalt Stress in Escherichia coli: The effect on the iron-sulfur proteins. Journal of Biological Chemistry 282: 30442-30451.
3. Waldron KJ, Rutherford JC, Ford D, Robinson NJ (2009) Metalloproteins and metal sensing. Nature 460: 823-830.
4. Crichton RR (2012) Biological Inorganic Chemistry: A New Introduction to Molecular Structure and Function: Elsevier.
5. Pearson RG (1963) Hard and soft acids and bases. Journal of the American Chemical Society 85: 3533-3539.
6. Cho AE, Goddard WA III (2015) Metalloproteins: theory, calculations, and experiments: CRC Press.
7. Ramelot TA, Cort JR, Goldsmith-Fischman S, Kornhaber GJ, Xiao R, et al. (2004) Solution NMR structure of the iron–sulfur cluster assembly protein U (IscU) with zinc bound at the active site. Journal of molecular biology 344: 567-583.
8. Iannuzzi C, Adrover M, Puglisi R, Yan R, Temussi PA, et al. (2014) The role of zinc in the stability of the marginally stable IscU scaffold protein. Protein Science 23: 1208-1219.
9. Unciuleac M-C, Chandramouli K, Naik S, Mayer S, Huynh BH, et al. (2007) In Vitro Activation of Apo-Aconitase Using a [4Fe-4S] Cluster-Loaded Form of the IscU [Fe−S] Cluster Scaffolding Protein. Biochemistry 46: 6812-6821.
10. Agar JN, Krebs C, Frazzon J, Huynh BH, Dean DR, et al. (2000) IscU as a scaffold for iron-sulfur cluster biosynthesis: sequential assembly of [2Fe-2S] and [4Fe-4S] clusters in IscU. Biochemistry 39: 7856-7862.
11. Agar JN, Zheng L, Cash VL, Dean DR, Johnson MK (2000) Role of the IscU protein in iron-sulfur cluster biosynthesis: IscS-mediated assembly of a [Fe2S2] cluster in IscU. Journal of the American Chemical Society 122: 2136-2137.
12. Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, et al. (2015) CDD: NCBI's conserved domain database. Nucleic Acids Res 43: D222-226.
13. Shimomura Y, Wada K, Fukuyama K, Takahashi Y (2008) The asymmetric trimeric architecture of [2Fe–2S] IscU: implications for its scaffolding during iron–sulfur cluster biosynthesis. Journal of molecular biology 383: 133-143.
14. Selbach BP, Chung AH, Scott AD, George SJ, Cramer SP, et al. (2013) Fe-S cluster biogenesis in Gram-positive bacteria: SufU is a zinc-dependent sulfur transfer protein. Biochemistry 53: 152-160.
15. Liu J, Oganesyan N, Shin DH, Jancarik J, Yokota H, et al. (2005) Structural characterization of an iron–sulfur cluster assembly protein IscU in a zinc-bound form. Proteins: Structure, Function, and Bioinformatics 59: 875-881.
16. Hoff KG, Cupp-Vickery JR, Vickery LE (2003) Contributions of the LPPVK motif of the iron-sulfur template protein IscU to interactions with the Hsc66-Hsc20 chaperone system. Journal of Biological Chemistry 278: 37582-37589.
17. Adrover M, Howes BD, Iannuzzi C, Smulevich G, Pastore A (2015) Anatomy of an iron-sulfur cluster scaffold protein: Understanding the determinants of [2Fe–2S] cluster stability on IscU. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1853: 1448-1456.
18. Xue B, Dunbrack RL, Williams RW, Dunker AK, Uversky VN (2010) PONDR-FIT: A Meta-Predictor of Intrinsically Disordered Amino Acids. Biochimica et biophysica acta 1804: 996-1010.
19. Blauenburg B, Mielcarek A, Altegoer F, Fage CD, Linne U, et al. (2016) Crystal Structure of Bacillus subtilis Cysteine Desulfurase SufS and Its Dynamic Interaction with Frataxin and Scaffold Protein SufU. PloS one 11: e0158749.
20. Kim JH, Tonelli M, Frederick RO, Chow DC-F, Markley JL (2012) Specialized Hsp70 chaperone (HscA) binds preferentially to the disordered form, whereas J-protein (HscB) binds preferentially to the structured form of the iron-sulfur cluster scaffold protein (IscU). Journal of Biological Chemistry 287: 31406-31413.
21. Riboldi GP, Verli H, Frazzon J (2009) Structural studies of the Enterococcus faecalis SufU [Fe-S] cluster protein. BMC biochemistry 10: 1.
22. Shi R, Proteau A, Villarroya M, Moukadiri I, Zhang L, et al. (2010) Structural basis for Fe–S cluster assembly and tRNA thiolation mediated by IscS protein–protein interactions. PLoS Biol 8: e1000354.
23. Bothe JR, Tonelli M, Ali IK, Dai Z, Frederick RO, et al. (2015) The Complex Energy Landscape of the Protein IscU. Biophysical journal 109: 1019-1025.
24. Kim JH, Bothe JR, Alderson TR, Markley JL (2015) Tangled web of interactions among proteins involved in iron–sulfur cluster assembly as unraveled by NMR, SAXS, chemical crosslinking, and functional studies. Biochimica et Biophysica Acta (BBA)-Molecular Cell Research 1853: 1416-1428.
25. Markley JL, Kim JH, Dai Z, Bothe JR, Cai K, et al. (2013) Metamorphic protein IscU alternates conformations in the course of its role as the scaffold protein for iron–sulfur cluster biosynthesis and delivery. FEBS letters 587: 1172-1179.
26. Kim JH, Tonelli M, Markley JL (2012) Disordered form of the scaffold protein IscU is the substrate for iron-sulfur cluster assembly on cysteine desulfurase. Proceedings of the National Academy of Sciences 109: 454-459.
27. Stookey LL (1970) Ferrozine - a new spectrophotometric reagent for iron. Analytical chemistry 42: 779-781.
28. Säbel CE, Shepherd JL, Siemann S (2009) A direct spectrophotometric method for the simultaneous determination of zinc and cobalt in metalloproteins using 4-(2-pyridylazo) resorcinol. Analytical biochemistry 391: 74-76.
29. Carrer C, Stolz M, Lewitzki E, Rittmeyer C, Kolbesen BO, et al. (2006) Removing coordinated metal ions from proteins: a fast and mild method in aqueous solution. Analytical and bioanalytical chemistry 385: 1409-1413.
30. Ericsson UB, Hallberg BM, DeTitta GT, Dekker N, Nordlund P (2006) Thermofluor-based high-throughput stability optimization of proteins for structural studies. Analytical biochemistry 357: 289-298.
31. Lo M-C, Aulabaugh A, Jin G, Cowling R, Bard J, et al. (2004) Evaluation of fluorescence-based thermal shift assays for hit identification in drug discovery. Analytical biochemistry 332: 153-159.
32. Adinolfi S, Rizzo F, Masino L, Nair M, Martin SR, et al. (2004) Bacterial IscU is a well folded and functional single domain protein. European Journal of Biochemistry 271: 2093-2100.
33. Py B, Moreau PL, Barras F (2011) Fe–S clusters, fragile sentinels of the cell. Current opinion in microbiology 14: 218-223.
34. Foster MW, Mansy SS, Hwang J, Penner-Hahn JE, Surerus KK, et al. (2000) A Mutant Human IscU Protein Contains a Stable [2Fe-2S]2+ Center of Possible Functional Significance. Journal of the American Chemical Society 122: 6805-6806.
35. Wu G, Mansy SS, Wu S-p, Surerus KK, Foster MW, et al. (2002) Characterization of an iron–sulfur cluster assembly protein (ISU1) from Schizosaccharomyces pombe. Biochemistry 41: 5024-5032.


APPENDIX B FIGURES

Figure S1. SDS-PAGE showing His-tagged IscU after IMAC purification. Lane 1 shows the molecular weight marker; the lowest band is 10 kDa. His-tagged IscU migrates at ~17 kDa.

Figure S2. Iron assay of Apo-N13C-N48C IscU. UV-vis absorbance spectra of FeCl3 stock solutions and Apo IscU variants are overlaid.


CHAPTER 4. INTRODUCTION TO RECURRENT INTERACTIONS IN NUCLEOPROTEIN COMPLEXES

Parts of this chapter have been adapted from [1], which I co-authored.

From the assembly of macromolecular nucleoprotein complexes such as the ribosome to transient interactions between tRNAs and aminoacyl-tRNA synthetases, contacts between individual nucleotides (nts) and amino acids (aa) are mediated by a variety of non-covalent interactions that vary in specificity and interaction energy. Classified from a structural perspective, they include edgewise interactions, face-to-face stacking, and electrostatic interactions between the sugar-phosphate backbone of nucleic acids and proteins. These interactions enable cognate recognition between a specific nucleic acid sequence and a DNA- or RNA-binding protein. Reliable recognition and annotation of these and other recurrent nucleotide-level interactions in atomic-resolution 3D structures is fundamental for understanding RNA folding, function, and evolution, because these interactions are widespread and serve as crucial building blocks of RNA 3D motifs and complex RNA architectures.

I begin with definitions of the components of RNA and proteins: RNA nucleotides and amino acids. I describe the electronic and chemical properties of these building blocks, which govern these fundamental interactions. I then provide detailed descriptions and examples of the most important interactions from experimental 3D structures and analyze them to identify the underlying physical forces. Finally, I review online resources available through the NDB web portal (http://ndbserver.rutgers.edu/) and related resources that provide access to visualizations and comprehensive lists of interactions for all atomic-resolution RNA structures in the PDB, tools for structure search, and compilations of 3D motifs organized by structural similarity.

4.1 Components of RNA nucleotides

The nucleotide is the basic unit of RNA structure. It is also the synthetic unit or “synthon” from which RNA is produced in vivo. Each nucleotide is composed of one of the four RNA bases, adenine (A), cytosine (C), guanine (G), or uracil (U), attached to D-ribose, a five-carbon sugar in the furanose (five-membered ring) form; the riboses of adjacent nucleotides are in turn joined by phosphodiester linkages at their 3’ and 5’ carbons.

Two features distinguish RNA from DNA and have important structural implications: the -OH group at the 2’-carbon of the ribose ring in RNA, in place of the -H of the deoxyribose ring in DNA, and the substitution of uracil (U) in RNA for thymine (T) in DNA. The 2’-OH facilitates H-bonding along the “Sugar edge” of each RNA base. U lacks the 5-methyl group located on the Hoogsteen edge of thymine (T); consequently U, but not T, can form basepairs along its Hoogsteen edge. These structural differences make RNA more versatile than DNA in forming interactions that support complex structures.

4.1.1 Bases and base edges

Each base is a nitrogen-rich, heterocyclic aromatic ring system that is planar in its equilibrium geometry and quite rigid, due to delocalization of the pi-electrons of its carbon and nitrogen atoms. RNA bases are of two types: the pyrimidines (U and C), composed of one six-membered ring, and the larger purines (A and G), composed of fused five- and six-membered heterocyclic rings. Structures of the nts with the numbering of base positions are shown in Figure 4.1. Ring atoms are numbered from 1 to 9 in purines and from 1 to 6 in pyrimidines. Exocyclic groups and attached hydrogens are numbered according to ring position. Ribose carbon atoms are numbered with primes, from 1’ to 5’, to distinguish them from the atoms of the base.

The planar RNA bases present three “edges” studded with hydrogen-bond donor and acceptor groups, along which H-bonds can form with the edges of other bases, as well as with phosphate and ribose groups, with amino acid sidechains or backbone atoms of proteins, and with small molecules.

These edges are called the Watson-Crick (W), Hoogsteen (H) and Sugar (S) edges [2]. It is therefore useful to represent RNA bases, purines as well as pyrimidines, as triangles, or more precisely as oblate triangular prisms [3], to describe and classify the diverse interactions observed in RNA molecules [4]. The base edges are indicated for each RNA base in Figure 4.1. Further support for representing bases as triangles comes from the observation that a base can pair edge-to-edge with up to three other bases at the same time, but no more than three [5].


Figure 4.1 Hydrogen bonding in RNA nucleotides. RNA nts G, C, A, and U each have three edges along which H-bonding takes place. Hoogsteen, Watson-Crick, and Sugar Edges are marked with dotted lines. Base ring atoms are numbered from 1 to 9 for purines and 1 to 6 for pyrimidines. Exocyclic groups and attached hydrogens are numbered according to ring position. H-bond donors are highlighted with blue and H-bond acceptors with red.

In the triangles representing the RNA bases, two of the vertices coincide with exocyclic H-bonding functional groups. The H and W edges meet at the vertex defined by the N4 and N6 amino groups of C and A and the O4 and O6 carbonyl oxygens of U and G. The W and S edges meet at the vertex defined by the O2 carbonyl group of C and U, the polarized C2-H2 group of A, and the N2 amino group of G. The glycosidic bond connecting each base to the C1’ carbon of the ribose defines the third vertex of the triangle, where the H and S edges meet. The 2’-OH group, unique to RNA, extends the H-bonding capability of the Sugar edge of each nucleotide.

The RNA bases are shown in Figure 4.1 with H-bond donor groups shaded with blue “clouds” and the H-bond acceptor groups shaded with red “clouds,” representing centers of positive and negative charge, respectively. The ribose 2’-OH group is shaded in purple because it serves both as an H-bond donor and H-bond acceptor.
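The triangle model described above can be captured in a small lookup table. The sketch below is a minimal Python illustration, not part of the annotation programs described in this dissertation; the names `BASE_VERTICES` and `vertex_atom` are hypothetical, and using the glycosidic nitrogen (N9 for purines, N1 for pyrimidines) to stand for the H/S vertex is our own assumption, since the text defines that vertex by the glycosidic bond itself.

```python
# Hypothetical lookup table for the "triangle" model of RNA bases.
# Each vertex is keyed by the pair of edges (W, H, S) that meet there.
# Assumption: the H/S vertex is represented by the glycosidic nitrogen
# (N9 in purines, N1 in pyrimidines); the text defines it by the glycosidic bond.
BASE_VERTICES = {
    "A": {frozenset("HW"): "N6", frozenset("WS"): "C2", frozenset("HS"): "N9"},
    "G": {frozenset("HW"): "O6", frozenset("WS"): "N2", frozenset("HS"): "N9"},
    "C": {frozenset("HW"): "N4", frozenset("WS"): "O2", frozenset("HS"): "N1"},
    "U": {frozenset("HW"): "O4", frozenset("WS"): "O2", frozenset("HS"): "N1"},
}


def vertex_atom(base: str, edge1: str, edge2: str) -> str:
    """Return the atom at the vertex where two different edges of `base` meet."""
    if edge1 == edge2:
        raise ValueError("a vertex is defined by two different edges")
    # frozenset makes the lookup independent of edge order (W/S is the same as S/W)
    return BASE_VERTICES[base][frozenset((edge1, edge2))]


print(vertex_atom("G", "S", "W"))  # N2
print(vertex_atom("U", "W", "H"))  # O4
```

Keying each vertex by an unordered pair of edges mirrors the way the text names vertices ("where the H and S edges meet") rather than by position.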

4.1.2 Ribose sugars

RNA bases are covalently attached to the 1’-carbons of their respective ribose rings by single C-N bonds, the “beta-glycosidic” bonds. In the beta configuration, each base is attached to the same side of its ribose as the 5’-carbon atom. The ribose ring is a five-carbon aldose sugar present in the furanose form. The beta-glycosidic bond allows relatively free rotation of the base relative to the ribose moiety. Rotation of the base gives rise to two distinct conformational classes of the nucleotide, called anti and syn. In the anti conformation, the Watson-Crick edge of the base points away from the 5’-phosphate group. In the syn conformation, the base is rotated ~180° about the glycosidic bond so that the WC edge faces the 5’-phosphate. In G, the base most often observed in the syn conformation [6], the N2 amino group is then positioned to H-bond to the phosphate group attached to the 5’ carbon. In polynucleotides, however, the anti conformation is much more common. The anti conformation is stabilized by weak H-bonding of purine H8 or pyrimidine H6 atoms to the 5’-phosphate.

Whether a base is syn or anti affects the kinds of basepairs it can form with nearby bases. One of the most common discrepancies between published RNA 3D structures of the same molecule is the glycosidic bond conformation of corresponding nucleotides [3].


4.1.3 Phosphate groups

Phosphate groups are derived from phosphoric acid (H3PO4), a weak acid with three dissociable protons that can form up to three phospho-ester linkages. In RNA and DNA, there is one phosphate per base-sugar unit or “nucleoside”, and each phosphate links two nucleosides by forming two phospho-ester bonds. By convention, each phosphate is assigned to the nt whose 5’-hydroxyl group it is attached to. The phosphate thus links adjacent base-ribose units by forming a second linkage to the 3’-hydroxyl of the preceding nucleotide in the linear chain. To link two nucleosides together, each phosphate loses two protons (H+) and forms two electrically neutral phospho-ester linkages. The remaining proton is acidic and largely dissociates at neutral pH. Therefore each phosphate group in RNA carries a full negative (-1) charge at neutral pH. This negative charge is delocalized over the two non-bridging oxygen atoms of the phosphate group, both of which consequently are strong H-bond acceptors. This fact provides the basis for structurally important base-phosphate interactions, which can be as strong as GC basepairs [7]. Large RNA molecules such as 16S rRNA therefore carry a substantial negative charge, which must be neutralized by mobile positive ions in order for the RNA to fold into its functional structure. In vivo, charge neutralization depends on mobile K+ and Mg2+ ions, many of which bind site-specifically [8], as well as on basic proteins and polyamines [9].

4.2 RNA Backbone conformations

The backbone of the RNA chain is very flexible and able to assume many different conformations because each nt contributes six covalent single bonds: the C3’-C4’ bond of the ribose ring, the exocyclic C4’-C5’ bond, and two sets of C-O and O-P bonds comprising the two phospho-ester linkages. The conformation of each nt is defined by the values of the dihedral angles about these bonds together with that of the glycosidic bond. The backbone dihedral angles of each nucleotide are assigned Greek letters, α to ζ, starting with the P-O5’ (α) bond of the 5’-phosphate and continuing consecutively with the O5’-C5’ (β), C5’-C4’ (γ), C4’-C3’ (δ) and C3’-O3’ (ε) bonds, and ending with the O3’-P (ζ) bond of the 3’-phosphate. The glycosidic bond (χ) contributes the seventh dihedral angle. These seven dihedral angles define a very large, seven-dimensional conformational space for each nucleotide [10]. However, there are extensive correlations between the values of the dihedral angles, so that relatively few regions of this seven-dimensional space are populated to a significant extent in observed or theoretically possible structures. Clustering of conformations is most conveniently carried out by parsing the RNA backbone into overlapping “suites” that extend from one sugar to the next, rather than from one phosphate group to the next. Analysis of atomic-resolution experimental data using this approach showed that the backbone conformations of structured RNA molecules are “rotameric” [11]. In other words, most observed conformations can be assigned to well-defined clusters of conformations, and these clusters are characteristic of particular structural motifs.
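All the torsions named above (α to ζ and χ for nucleotides, and ϕ, ψ and ω for proteins in Section 4.3.1) are ordinary dihedral angles computed from four consecutive atoms. As a minimal sketch in plain Python, assuming Cartesian atom coordinates are already available, the signed angle about the bond between the second and third atoms can be computed with the standard atan2 formulation. The function name `dihedral` is ours, and mapping a particular torsion to its four atoms is left to the caller.

```python
import math
from typing import Sequence

Point = Sequence[float]


def _sub(a: Point, b: Point) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Point, b: Point) -> tuple:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed dihedral angle in degrees (-180 to 180) about the p1-p2 bond.

    Positive values mean the p1-p0 bond must be rotated clockwise, viewed
    along p1 -> p2, to eclipse the p2-p3 bond.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Last atom rotated 90 degrees about the z axis relative to the first
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))  # 90.0
```

Using atan2 rather than an arccosine of the normalized dot product keeps the sign of the angle and avoids precision loss near 0° and 180°.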

This analysis identified 42 recurrent rotamer clusters in RNA structures, each of which was assigned a two-symbol representation. Although the experimental data are still incomplete and additional energetically accessible conformations (rotamers) are expected to be discovered, these are likely to be rare or similar to ones already reported. Conformational analysis of each suite in each structure is now a standard reporting function of NDB [12].
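Parsing the backbone into suites is mainly a matter of regrouping per-nucleotide torsions. In the sketch below we assume, following the suite definition used in RNA backbone rotamer analysis as we understand it, that suite i runs from the sugar of nucleotide i-1 to the sugar of nucleotide i and is described by the seven angles δ(i-1), ε(i-1), ζ(i-1), α(i), β(i), γ(i) and δ(i). The function name and the dictionary input format are hypothetical, and assigning each suite to one of the 42 named rotamers would additionally require the published cluster definitions, which are not reproduced here.

```python
from typing import Dict, List, Sequence

# Order of the seven torsions that describe one sugar-to-sugar suite.
SUITE_ANGLES = ("delta_prev", "epsilon_prev", "zeta_prev",
                "alpha", "beta", "gamma", "delta")


def build_suites(torsions: Sequence[Dict[str, float]]) -> List[Dict[str, float]]:
    """Regroup per-nucleotide backbone torsions (degrees) into suites.

    `torsions[i]` holds 'alpha' ... 'zeta' for nucleotide i of one chain.
    Each suite combines delta, epsilon, zeta of nucleotide i-1 with
    alpha, beta, gamma, delta of nucleotide i, so n nucleotides give n-1 suites.
    """
    suites = []
    for prev, cur in zip(torsions, torsions[1:]):
        values = (prev["delta"], prev["epsilon"], prev["zeta"],
                  cur["alpha"], cur["beta"], cur["gamma"], cur["delta"])
        suites.append(dict(zip(SUITE_ANGLES, values)))
    return suites
```

Note that δ appears in two consecutive suites, which is what makes the suites "overlapping" in the sense used in the text.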

4.3 Components of amino acid residues

Proteins are universally composed of α-amino acids, which comprise both an amino group and a carboxylic acid group. Because four different chemical groups (amino, carboxylic acid, hydrogen and sidechain) are attached to a central carbon atom (the α-carbon), the central carbon atom of an amino acid is chiral. Hence, every amino acid except glycine, the simplest amino acid, whose sidechain is a single hydrogen atom, can occur in two isomeric forms: the two enantiomers (mirror-image stereoisomers) around the central carbon atom (Figure 4.2). By convention, these are called the D- and L- forms, according to the naming convention devised by Emil Fischer.

Figure 4.2 Fischer projections of D and L enantiomers of alanine

The D/L labels were originally defined with reference to a standard molecule, glyceraldehyde, HOCH2CH(OH)CHO. The enantiomer of glyceraldehyde that rotates plane-polarized light clockwise (+), i.e. the dextrorotatory enantiomer, was labeled D, while the enantiomer that rotates the light counterclockwise (-), the levorotatory enantiomer, was labeled L. Other chiral molecules, including amino acids, are assigned to the D or L series by relating their configuration to that of glyceraldehyde, not by the direction in which they themselves rotate polarized light. In modern stereochemical nomenclature, the D/L convention has been supplemented by the R/S convention, which is based on absolute configuration. In cells, only L-amino acids are incorporated into ribosomally synthesized proteins. Certain D-amino acids are found in the cell walls of bacteria, but not in bacterial or eukaryal proteins. In the following sections, I describe how the geometrical and chemical properties of the amino acid components direct the structure and functional properties of proteins. The secondary structure of a protein is determined by the conformation of its peptide backbone, stabilized by recurrent H-bonding patterns involving backbone -NH and C=O groups. The chemical properties of a protein, on the other hand, depend largely on the nature of the sidechains of its amino acid residues.

4.3.1 Peptide backbone and torsion angles

Amino acids are linked by peptide bonds, which are amide bonds formed by condensation of the amino group of one amino acid with the carboxyl group of another. This reaction releases a water molecule (H2O) and is thermodynamically unfavorable under cellular (aqueous) conditions and pH. The resulting amide bond has partial double-bond character, due to delocalization of the nitrogen lone pair of electrons into the π-bond of the C=O group, which restricts rotation around the peptide bond. At physiological temperature and in aqueous solvent, this energy barrier can be overcome with the help of enzymes to allow interconversion of the cis and trans isomers of peptide bonds; otherwise the isomerization is very slow. The planar cis and trans isomers are the most stable because they allow maximal orbital overlap [13].

Hydrogen bonding between atoms of the peptide backbone leads to the formation of protein secondary structure elements such as α-helices and β-sheets. Note that the amide nitrogen can act as an H-bond donor but not as an H-bond acceptor, because its non-bonding lone pair of electrons is delocalized into the carbonyl group. The carbonyl oxygen can accept two H-bonds, 120° apart.

Each amino acid residue has three backbone dihedral angles, phi (ϕ), psi (ψ) and omega (ω), as shown in Figure 4.3. ϕ describes rotation about the N-C(α) bond and is defined by the atoms C(O)-N-C(α)-C(O); ψ describes rotation about the C(α)-C(O) bond and is defined by the atoms N-C(α)-C(O)-N; ω describes rotation about the C(O)-N bond and is defined by the atoms C(α)-C(O)-N-C(α).

Figure 4.3 Dihedral angles in the peptide backbone. The phi angle is the angle of right-handed rotation around the N-CA bond, the value being zero if the CA-C bond is cis to the C-N bond (range: -180 to 180 degrees). The psi angle is the angle of right-handed rotation around the CA-C bond, the value being zero if the C-N bond is cis to the N-CA bond (range: -180 to 180 degrees). The omega angle is the angle of right-handed rotation about the C-N bond, the value being zero if the CA-C bond of the preceding residue is cis to the N-CA bond. Most residues in a typical protein are involved in the formation of two peptide bonds. The peptide bond formed by residues i and i + 1 is assigned to residue i + 1, and the same applies to the omega angle; for that reason no omega angle is assigned to the first residue. Image adapted from (source)

The planarity of the peptide bond restricts ω to approximately 180° (trans) in very nearly all main-chain peptide bonds. In rare cases ω is approximately 0° for a cis peptide bond, which usually involves proline.
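In practice, the cis/trans state of each peptide bond can be read off its ω angle. The sketch below assumes ω has already been computed in degrees (for example, from the atoms C(α)-C(O)-N-C(α) defined above); the function name, the 30° tolerance and the label "distorted" for intermediate values are arbitrary illustrative choices, not a published standard.

```python
def classify_peptide_bond(omega: float, tolerance: float = 30.0) -> str:
    """Label a peptide bond 'trans', 'cis' or 'distorted' from omega (degrees)."""
    # Wrap the angle into [-180, 180) so that e.g. 350 degrees becomes -10 degrees.
    w = (omega + 180.0) % 360.0 - 180.0
    if abs(abs(w) - 180.0) <= tolerance:
        return "trans"  # omega near +/-180 degrees: the usual case
    if abs(w) <= tolerance:
        return "cis"  # omega near 0 degrees: rare, usually involving proline
    return "distorted"


for angle in (179.2, -176.5, 4.0, 350.0, 95.0):
    print(angle, classify_peptide_bond(angle))
```

Wrapping the angle first means that values reported in either the 0 to 360 or the -180 to 180 convention are classified the same way.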

4.3.2 Amino acid sidechains

The 20 standard amino acids found in proteins exhibit a significant range of chemical types because of the chemical nature of their different sidechains. The properties used to characterize amino acid sidechains include: (i) size or steric volume, which determines the strategic positioning of certain sidechains in protein secondary (2°) or tertiary (3°) structure; (ii) ionizability, as determined by the presence of functional groups with dissociable protons; (iii) presence of aromatic rings; (iv) polarity, which influences the ability of the amino acid to form H-bonds; and (v) hydropathy, or the ability to form hydrophobic interactions.

a) Protein volumes: Amino acid volumes play a crucial role in the spatial arrangement of residues in a protein, thereby determining intra- and intermolecular interactions of protein residues with each other and with neighboring molecules. The volume occupied by a protein in space depends on its amino acid composition and is called its compositional volume [14]. The volume of each amino acid is calculated theoretically by approximating each atom in the functional group as a hard sphere, whose volume is given by the standard Equation 1.1:

$$V = \frac{4}{3}\pi r_w^{3} \tag{1.1}$$

where $r_w$ is the van der Waals radius of each atom. Accordingly, amino acid sidechains can be categorized into five size groups: very small, small, medium, large and very large. The van der Waals volume of each amino acid sidechain [15] and its size classification are shown in Table 4.1.

b) Ionizability: All amino acids have two ionizable groups: the amino group (-NH3+) and the carboxylate group (-COO-). At physiological pH (pH 7.3), certain key amino acids are negatively (Asp, Glu) or positively (Lys, Arg) charged. The combination of charges on individual amino acid residues determines the net charge carried by the protein. The pH at which the net charge on the protein is zero is called the isoelectric point, or pI, of the protein. At a pH above the pI, the protein is negatively charged, and at a pH below the pI, it has a net positive charge. The pKa values of individual amino acids and the relationship between pH, pKa and charge are crucial for the function of a protein and will be discussed in greater detail later in this chapter.

c) Presence of aromatic rings: Four amino acids have aromatic rings in their sidechains.

These are phenylalanine, tyrosine, tryptophan and histidine. Phe and Tyr each have a phenyl group, Trp has an indole sidechain, and His, the smallest, has an imidazole sidechain. Despite having an aromatic sidechain, histidine is usually not grouped with the other aromatic amino acids because its imidazole ring, which can exist in two tautomeric forms with the ring proton on either nitrogen, shows a pronounced separation of partial charge between its two nitrogens. Due to this charge separation, histidine has an electric dipole moment of 3.67 D and is highly polar. It is also amphoteric, i.e. it can act both as an acid and as a base. The other aromatic amino acids are quite different from histidine and share certain common properties. The presence of an aromatic ring has a deep impact on the properties of these amino acids. The rules governing aromaticity ensure that these sidechains are planar and have overlapping p-orbitals on three or more atoms, and consequently extensive delocalization of electrons [16]. This delocalization confers extra stability on these sidechains, making them less likely to participate in interactions such as hydrogen bonding. Tyrosine is the only one of these aromatic amino acids that has an ionizable sidechain (-OH). Large aromatic rings also provide substantial hydrophobic surface area, which contributes to the hydrophobic effect during protein folding. Aromatic amino acids also participate in

d) Polarity: A bond between atoms of different electronegativity has an uneven charge distribution that creates an electric dipole. An electronegative atom in a molecule pulls the shared electron cloud towards itself, so that it carries a partial negative charge (δ-), while the atom bonded to it carries a partial positive charge (δ+). Electronegative atoms such as oxygen, nitrogen and, to a lesser extent, sulfur and phosphorus make a bond polar. A bond is usually considered polar when the difference in electronegativity between its two atoms ranges from 0.4 to 1.7, i.e. when one atom exerts significantly more pull on the shared electrons than the other. In a non-polar bond, the difference in electronegativity between the two atoms is less than 0.4, so that both atoms exert approximately the same pull on the shared electrons and no significant dipole results [17]. Amino acids fall into four classes according to their sidechains: (1) non-polar and neutral, (2) polar and neutral, (3) acidic and polar, and (4) basic and polar. The distribution of polar and non-polar amino acids in a protein often controls its folding and function. For example, in a water-soluble protein, non-polar amino acids tend to be found in the center of the molecule, where they stabilize the structure, while polar amino acids tend to be located on the protein surface, where they can interact with water. In membrane-bound proteins, this trend is reversed: non-polar amino acids tend to be located on the regions of the surface in contact with the membrane, while polar amino acids generally line interior pores to create hydrophilic channels.

Groups containing only carbon and hydrogen atoms are non-polar. Accordingly, amino acids are classified by the polarity of their sidechains as shown in Table 4.1.

e) Hydropathy: Amino acid sidechains have distinct chemical natures that influence their behavior in aqueous solvents. Sidechains of amino acids that contain aliphatic chains (Ala, Leu, Ile, Val, Met) are hydrophobic and avoid aqueous environments. Hence, these amino acids are usually found buried within the hydrophobic core of the protein, or within the lipid portion of the membrane. Hydrophobicity or hydrophilicity scales are used to indicate the relative hydrophobicity of amino acid residues. To characterize the chemical nature of different parts of a protein, each residue in the protein is assigned a characteristic hydrophobicity value. Several hydrophobicity scales exist; one of the most commonly used is the Kyte-Doolittle hydropathy scale [18], the basis of Kyte-Doolittle hydropathy plots. This scale is based on the transfer free energies from water to vapor of model compounds for the amino acid sidechains, as well as on the exterior-interior distribution of amino acids in protein structures. In their original article, Kyte and Doolittle described the use of hydropathy analysis to identify α-helices in membrane proteins. The more positive the value assigned to an amino acid, the more hydrophobic it is; the more negative the value, the more hydrophilic it is. Characteristic hydropathy values for the different sidechains are listed in Table 4.1. In other hydrophobicity scales the values for the different sidechains may differ, but the general order of amino acids (from most to least hydrophobic) is similar.
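A Kyte-Doolittle hydropathy plot is obtained by averaging the scale values over a window that slides along the sequence. The Python sketch below uses the Kyte-Doolittle values listed in Table 4.1 and assumes a one-letter sequence of the 20 standard residues. The default window of 9 residues is an adjustable assumption (wider windows are commonly used when searching for membrane-spanning segments), and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values, as listed in Table 4.1.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list:
    """Mean hydropathy of each full window; entry i is centred on residue i + window // 2."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


profile = hydropathy_profile("MKTLLILAVVAAALA", window=5)
print([round(v, 2) for v in profile])
```

Peaks in the resulting profile mark hydrophobic stretches; an odd window keeps each average centred on a single residue.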


Table 4.1 Classification of amino acids according to the nature of their sidechains

Amino acid (abbreviation) | Sidechain | Van der Waals volume, Å3 (size class) | Hydropathy | Charge, polarity
Glycine (Gly) | H | 48 (VS) | -0.4 | Neutral, non-polar
Alanine (Ala) | CH3 | 67 (VS) | 1.8 | Neutral, non-polar
Serine (Ser) | CH2OH | 73 (VS) | -0.8 | Neutral, polar
Cysteine (Cys) | CH2SH | 86 (S) | 2.5 | Neutral, slightly polar
Proline (Pro) | (CH2)3, cyclized onto the backbone N | 90 (S) | -1.6 | Neutral, non-polar
Aspartic acid (Asp) | CH2CO2H | 91 (S) | -3.5 | Acidic, polar
Threonine (Thr) | CH(CH3)OH | 93 (S) | -0.7 | Neutral, polar
Asparagine (Asn) | CH2CONH2 | 96 (S) | -3.5 | Neutral, polar
Valine (Val) | CH(CH3)2 | 105 (M) | 4.2 | Neutral, non-polar
Glutamic acid (Glu) | CH2CH2CO2H | 109 (M) | -3.5 | Acidic, polar
Glutamine (Gln) | CH2CH2CONH2 | 114 (M) | -3.5 | Neutral, polar
Histidine (His) | CH2-imidazole | 118 (M) | -3.2 | Basic, polar
Isoleucine (Ile) | CH(CH3)CH2CH3 | 124 (M) | 4.5 | Neutral, non-polar
Leucine (Leu) | CH2CH(CH3)2 | 124 (M) | 3.8 | Neutral, non-polar
Methionine (Met) | CH2CH2SCH3 | 124 (M) | 1.9 | Neutral, non-polar
Phenylalanine (Phe) | CH2C6H5 | 135 (L) | 2.8 | Neutral, non-polar
Lysine (Lys) | (CH2)4NH2 | 135 (L) | -3.9 | Basic, polar
Tyrosine (Tyr) | CH2C6H4OH | 141 (L) | -1.3 | Neutral, polar
Arginine (Arg) | (CH2)3NHC(=NH)NH2 | 148 (L) | -4.5 | Basic, polar
Tryptophan (Trp) | CH2-indole | 163 (VL) | -0.9 | Neutral, slightly polar

Size classes: VS, very small; S, small; M, medium; L, large; VL, very large. Hydropathy values are on the Kyte-Doolittle scale [18].
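Equation 1.1 from item (a) can be evaluated directly for each atom. In the sketch below the van der Waals radii are Bondi's commonly quoted values, an assumption not taken from this text or from [15], and the function names are hypothetical. Simply adding up isolated spheres ignores the overlap between bonded atoms, so the naive sum overestimates the true van der Waals volume of a group and is not expected to reproduce the values in Table 4.1.

```python
import math

# Assumed van der Waals radii in angstroms (Bondi values), not from the text.
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def sphere_volume(radius: float) -> float:
    """Volume of a hard sphere, V = (4/3) * pi * r**3 (Equation 1.1), in cubic angstroms."""
    return 4.0 / 3.0 * math.pi * radius ** 3


def naive_volume(elements: list) -> float:
    """Sum of isolated atomic spheres; overlap is ignored, so this is an upper bound."""
    return sum(sphere_volume(VDW_RADIUS[e]) for e in elements)


print(round(sphere_volume(VDW_RADIUS["C"]), 2))      # 20.58
print(round(naive_volume(["C", "H", "H", "H"]), 1))  # methyl group, overlap ignored
```

Accurate group volumes require subtracting the sphere-sphere overlaps (or integrating numerically over the union of spheres), which is beyond this sketch.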

4.4 Properties of amino acids and nucleotides

4.4.1 Electronic structure of amino acids

Electronic and steric properties of amino acids govern the interactions of proteins with other molecules such as other proteins, RNA, DNA and ligands. Furthermore, it has been posited that the electronic properties of amino acids, including inductive effects, can play an important role in the preference of a protein to adopt a particular secondary structure [19], which in turn dictates interaction motifs. The charge on a protein affects its behavior in ion-exchange chromatography. Proteins contain many ionizable groups on their amino acid sidechains as well as at their amino and carboxyl termini. These include basic groups on the sidechains of lysine, arginine and histidine and acidic groups on the sidechains of glutamate, aspartate, cysteine and tyrosine. The pH of the solution, the pKa of the sidechain and the local environment around the sidechain influence the charge on each sidechain.

4.4.1.1 pKa values and corresponding ΔG

Amino acids are weak acids and, like all weak acids, they partially dissociate in water. In water, a weak acid HA exists in a dissociation equilibrium with protonated water (the hydronium ion, H3O+) and its conjugate base A-:

$$\mathrm{HA} + \mathrm{H_2O} \rightleftharpoons \mathrm{H_3O^+} + \mathrm{A^-} \tag{1}$$

The ratio of product to reactant concentrations is constant under fixed analytical conditions (temperature, pressure, ionic strength, etc.) and is referred to as the acid dissociation constant (Ka), defined by Equation 2:

$$K_a = \frac{[\mathrm{H_3O^+}][\mathrm{A^-}]}{[\mathrm{HA}]} \tag{2}$$

In Equation 2, square brackets indicate the concentrations of the respective components. Ka, the acid dissociation constant, measures how readily the acid releases a proton, i.e. the strength of the molecule as an acid. Ka is related to the standard free energy change for releasing the proton, ΔG°, by Equation 3: the smaller ΔG°, the larger Ka and the stronger the acid. In addition, the second equation below, where Q is the reaction quotient, shows how the dissociation state of a weak acid varies with the [H3O+] level in the solution.

$$\Delta G^\circ = -RT \ln K_a \tag{3}$$

$$\Delta G = \Delta G^\circ + RT \ln Q$$

Expressing acidity in terms of Ka is inconvenient because of the large range of values (from about 10^1 to 10^-14). Therefore, pKa was introduced as a logarithmic index of the acidity of weak acids. pKa is defined by Equation 4a, and the relationship between pH, pKa and charge for individual amino acids is described by the Henderson-Hasselbalch equation (Equation 4b):

$$\mathrm{p}K_a = -\log_{10} K_a \tag{4a}$$

$$\mathrm{pH} = \mathrm{p}K_a + \log_{10}\frac{[\mathrm{A^-}]}{[\mathrm{HA}]} \tag{4b}$$
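Equations 3, 4a and 4b can be combined to regenerate the Ka and ΔG° columns of Table 4.2 from a pKa value. The Python sketch below uses hypothetical function names and assumes R = 8.314 J mol-1 K-1 and a temperature of 300 K; 300 K is our inference, as it is the temperature at which ΔG° = -RT ln Ka reproduces the tabulated values (for example, 5.67 × 10^4 J mol-1 for the α-amino group of Ala, pKa 9.87).

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1 (assumed value)


def ka_from_pka(pka: float) -> float:
    """Invert Equation 4a: Ka = 10**(-pKa)."""
    return 10.0 ** (-pka)


def delta_g_standard(pka: float, temperature: float = 300.0) -> float:
    """Equation 3, Delta G° = -RT ln Ka, in J/mol (temperature in kelvin)."""
    return -R * temperature * math.log(ka_from_pka(pka))


def base_to_acid_ratio(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch (Equation 4b) rearranged: [A-]/[HA] = 10**(pH - pKa)."""
    return 10.0 ** (ph - pka)


pka = 9.87  # alpha-amino group of Ala, Table 4.2
print(f"Ka = {ka_from_pka(pka):.2e}, dG0 = {delta_g_standard(pka):.2e} J/mol")
# Ka = 1.35e-10, dG0 = 5.67e+04 J/mol
```

Because ln Ka = -2.303 pKa, ΔG° is simply proportional to pKa (about 5.74 kJ/mol per pKa unit at 300 K), which is why the ΔG° columns of Table 4.2 track the pKa columns.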

Each amino acid has at least two ionizable groups, the amino group and the carboxylate group. In addition, the sidechains of all amino acids, even glycine, may in principle be ionizable under suitable solvent conditions [20]. For amino acids with hydrocarbon sidechains (Ala, Val, Leu, Ile, Phe), a theoretical pKa value for sidechain dissociation exists but is of no practical relevance. For the other amino acids, Table 4.2 shows the pKa values of their ionizable groups, with the corresponding Ka and ΔG° values.

Table 4.2 pKa values of the ionizable groups of amino acids and the corresponding Ka and ΔG° values

Amino acid | α-NH3+ pKa | Ka | ΔG° | α-COOH pKa | Ka | ΔG° | Sidechain pKa | Ka | ΔG°
Ala | 9.87 | 1.35E-10 | 5.67E+04 | 2.35 | 4.47E-03 | 1.35E+04 | - | - | -
Arg | 8.99 | 1.02E-09 | 5.16E+04 | 1.82 | 1.51E-02 | 1.05E+04 | 12.48 | 3.31E-13 | 7.17E+04
Asn | 8.72 | 1.91E-09 | 5.01E+04 | 2.14 | 7.24E-03 | 1.23E+04 | 14.9 | 1.26E-15 | 8.56E+04
Asp | 9.9 | 1.26E-10 | 5.69E+04 | 1.99 | 1.02E-02 | 1.14E+04 | 3.9 | 1.26E-04 | 2.24E+04
Cys | 10.7 | 2.00E-11 | 6.15E+04 | 1.92 | 1.20E-02 | 1.10E+04 | 8.37 | 4.27E-09 | 4.81E+04
Gln | 9.13 | 7.41E-10 | 5.24E+04 | 2.17 | 6.76E-03 | 1.25E+04 | 14.9 | 1.26E-15 | 8.56E+04
Glu | 9.47 | 3.39E-10 | 5.44E+04 | 2.1 | 7.94E-03 | 1.21E+04 | 4.07 | 8.51E-05 | 2.34E+04
Gly | 9.78 | 1.66E-10 | 5.62E+04 | 2.35 | 4.47E-03 | 1.35E+04 | - | - | -
His | 9.33 | 4.68E-10 | 5.36E+04 | 1.8 | 1.58E-02 | 1.03E+04 | 6.04 | 9.12E-07 | 3.47E+04
Ile | 9.76 | 1.74E-10 | 5.61E+04 | 2.32 | 4.79E-03 | 1.33E+04 | - | - | -
Leu | 9.74 | 1.82E-10 | 5.59E+04 | 2.33 | 4.68E-03 | 1.34E+04 | - | - | -
Lys | 9.06 | 8.71E-10 | 5.20E+04 | 2.16 | 6.92E-03 | 1.24E+04 | 10.54 | 2.88E-11 | 6.05E+04
Met | 9.28 | 5.25E-10 | 5.33E+04 | 2.13 | 7.41E-03 | 1.22E+04 | - | - | -
Phe | 9.31 | 4.90E-10 | 5.35E+04 | 2.2 | 6.31E-03 | 1.26E+04 | - | - | -
Pro | 10.64 | 2.29E-11 | 6.11E+04 | 1.95 | 1.12E-02 | 1.12E+04 | 9.3 | 5.01E-10 | 5.34E+04
Ser | 9.21 | 6.17E-10 | 5.29E+04 | 2.19 | 6.46E-03 | 1.26E+04 | 13 | 1.00E-13 | 7.47E+04
Thr | 9.1 | 7.94E-10 | 5.23E+04 | 2.09 | 8.13E-03 | 1.20E+04 | 13 | 1.00E-13 | 7.47E+04
Trp | 9.41 | 3.89E-10 | 5.40E+04 | 2.46 | 3.47E-03 | 1.41E+04 | - | - | -
Tyr | 9.21 | 6.17E-10 | 5.29E+04 | 2.2 | 6.31E-03 | 1.26E+04 | 10.46 | 3.47E-11 | 6.01E+04
Val | 9.74 | 1.82E-10 | 5.59E+04 | 2.29 | 5.13E-03 | 1.32E+04 | - | - | -

ΔG° values are in J mol-1 and correspond to ΔG° = -RT ln Ka (Equation 3) evaluated at T ≈ 300 K.

At a pH above their pKa, carboxylic acid groups lose a proton and acquire a negative charge; at a pH below their pKa, the aspartic acid and glutamic acid sidechains are uncharged. At a pH below their pKa, the lysine, arginine and histidine sidechains accept an H+ ion (proton) and become positively charged; these sidechains are therefore basic. Based on their pKa values, five amino acids are fully or partially charged at physiological pH (7-8): Asp and Glu (acidic) and Lys, Arg and His (basic). The sidechain pKa of His is close to physiological pH, so His is not completely ionized: with the free amino acid pKa of 6.04 listed in Table 4.2, Equation 4b predicts only about 5% protonation at pH 7.3, although the sidechain pKa of His in a protein can be shifted considerably by its environment. The other four amino acids are essentially completely ionized at pH 7.3 and carry a full +1 or -1 charge. Quantum topological atomic charges have been calculated for each of these five amino acids to show the location of the charges at physiological pH [21]. The majority of the molecular charge is found on the sidechain (81–100%), with a large percentage of the charge located on the functional group undergoing protonation/deprotonation. Charge delocalization in Arg, His, Asp and Glu is shown in Figures 4.6 and 4.7.
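The Henderson-Hasselbalch relationship also gives the fraction of each group that is protonated at a given pH and, summed over all groups, the net charge and isoelectric point introduced in item (b) of Section 4.3.2. The sketch below is a simplified model that treats every ionizable group as independent and uses free amino acid pKa values from Table 4.2; as noted above, pKa values inside a protein can differ considerably. The function names and the "acid"/"base" labelling scheme are hypothetical.

```python
from typing import Iterable, Tuple

Group = Tuple[float, str]  # (pKa, "acid" or "base")


def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a group carrying its proton, from Equation 4b."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def net_charge(ph: float, groups: Iterable[Group]) -> float:
    """Sum of fractional charges: bases are +1 when protonated, acids -1 when not."""
    charge = 0.0
    for pka, kind in groups:
        f = fraction_protonated(ph, pka)
        if kind == "base":
            charge += f
        elif kind == "acid":
            charge -= 1.0 - f
        else:
            raise ValueError(f"unknown group type: {kind}")
    return charge


def isoelectric_point(groups: Iterable[Group], tol: float = 1e-6) -> float:
    """Bisection for the pH of zero net charge (the charge falls as pH rises)."""
    group_list = list(groups)
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(mid, group_list) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# His sidechain (pKa 6.04, Table 4.2) at pH 7.3
print(round(fraction_protonated(7.3, 6.04), 3))  # 0.052
# Free glycine: alpha-NH3+ (pKa 9.78) and alpha-COOH (pKa 2.35) from Table 4.2
print(round(isoelectric_point([(9.78, "base"), (2.35, "acid")]), 2))
```

Bisection is sufficient here because the net charge decreases monotonically with pH; for glycine it converges to the average of the two pKa values, about 6.07.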

4.4.2 Electronic properties of nucleotides

The electronic properties and electron density distribution of nucleotides are relevant for understanding the hydrogen bonding propensities of the ring nitrogen atoms and the exocyclic functional groups. They are also responsible for the planar geometries of the bases that facilitate base stacking interactions. Hydrogen bonding between nucleotides enables base complementarity, while London forces and suitably oriented dipoles between stacked base pairs in RNA provide a pathway for rapid, one-dimensional charge separation [22,23]. Extensive quantum mechanical calculations of DNA and RNA bases have been carried out using increasingly large basis sets and improved methods [23]. These calculations provide the electron density and the equilibrium (lowest-energy) atomic coordinates of the nucleobases. From the electron density, all other chemical properties are in principle derivable, including the magnitude and direction of the dipole moment, the charge distribution and the electric potential. The dipole moments of the bases lie in their planes and affect stacking geometries. The charge distribution determines where H-bonds can form. The polarizability of the electron density affects the London dispersion forces that give rise to so-called π-π bonding.

Quantum mechanical calculations have shown that there is charge transfer from the PO4- groups of DNA superhelices to the positively charged sidechains of the arginines and lysines of the histones at 120 sites of the superhelix. This results in the formation of holes in DNA and of excess electrons in the proteins [24]. The reverse process, electron transfer from aromatic amino acids to nucleobases, helps repair oxidative damage to nucleic acids. This concept will be revisited in chapter 3, when I discuss the adaptations made by the mitochondrial ribosome to sustain its structure and function in a highly oxidizing environment.

4.4.3 Tautomeric and protonated forms of bases

Each of the four bases of RNA can exist in at least two tautomeric forms. The bases selected by nature for RNA and DNA are unusual in that one tautomer is significantly lower in energy than the others [25]. This natural bias ensures fidelity in their hydrogen bonding patterns. Cyclic amidines such as adenine and cytosine can exist in either amino or imino forms (Figure 4.4A), while cyclic amides such as guanine, thymine and uracil can exist in either keto or enol forms. The tautomeric forms increase the number of possible base pairing combinations. For example, the tWW A–U and tWW G–C pairs are not isosteric. However, when C is present as the imino tautomer or G as the enol tautomer, they can form a tWW G=C pair that is isosteric to the tWW A–U pair [26].

The tautomeric forms of each base exist in equilibrium, but the amino and keto tautomers are more stable by ~22.8 kJ/mol and therefore predominate under physiological conditions; the ratio of minor to major tautomer concentration is about 1:10^4 [27-29]. The rings remain unsaturated and planar in each tautomer. These variations in base combinations, together with the isostericity of different base pairs, limit the geometric selection of complementary Watson-Crick pairs that DNA polymerases use to ensure fidelity in replication [25].
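The quoted tautomer ratio follows from the Boltzmann relation between a free-energy gap and an equilibrium constant, K = exp(-ΔG/RT). The short Python sketch below assumes a simple two-state equilibrium at 25 °C and treats the ~22.8 kJ/mol gap as a free-energy difference.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def minor_to_major_ratio(delta_g_kj_per_mol: float, temp_k: float = 298.15) -> float:
    """Equilibrium ratio [minor]/[major] for a two-state tautomer equilibrium,
    K = exp(-dG/RT), where dG is the free-energy gap in kJ/mol."""
    return math.exp(-delta_g_kj_per_mol * 1000.0 / (R * temp_k))


if __name__ == "__main__":
    ratio = minor_to_major_ratio(22.8)
    print(f"minor:major is about 1:{1.0 / ratio:.0f}")
```

The result is roughly 1:10^4, matching the ratio given above; raising the temperature to 310 K increases the minor-tautomer fraction only slightly.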


Figure 4.4 Tautomers of nucleobases and their hydrogen bonding patterns. (A) Tautomers of the four standard nucleobases of RNA. (B) Standard complementary U-A (left) and C=G (right) Watson-Crick pairs and the isosteric C~A and U~G pairs formed using the imino (for C and A, left) and enol (for U and G, right) tautomers. Image adapted from [26,30].

Another common chemical change in nucleobases is protonation or deprotonation. Although protonated forms of RNA bases are less often discussed, they are observed in crystal and solution structures [31-35]. The pKa values of nucleobases have been calculated using the Poisson–Boltzmann equation [36]. Table 4.3 lists the intrinsic pKa values of the RNA bases in aqueous solution [6]. Mechanistic studies have indicated that the ring nitrogens of the nucleoside bases can function as general acid-base catalysts [37]. Ionization of a base depends not only on its intrinsic pKa but also on the microenvironment created around the group by the protein or RNA structure, which shifts the intrinsic pKa to a functional or apparent pKa.

The dielectric constant of the medium, together with the interactions of the ionizable group with other fully or partially charged groups, determines the extent of the pKa shift.

Table 4.3 Intrinsic pKa values of RNA bases in aqueous solution

RNA base | Ionization equilibrium | pKa
Adenine | protonation at N1 | 3.5
Cytosine | protonation at N3 | 4.2
Guanine | deprotonation at N1 | 9.4
Uracil | deprotonation at N3 | 9.4

It has further been observed that RNA bases shift their pKa values to form folded structural motifs with conserved electrostatic interactions. In particular cases, selective protonation of nucleotides enables rearrangement of the phosphodiester backbone to concentrate negative electrostatic potential in certain regions. The role of nucleotide protonation in base pairing is discussed below.

4.4.4 Relevance of protonation of A and C to non-Watson-Crick base pairing

The bases most readily protonated are A and C, on their Watson-Crick edges; their pKa values are ~4 and therefore protonation is accessible under physiological conditions. A and C, which are normally unprotonated and act as H-bond acceptors at these positions, can be protonated at a modest energetic cost when required by the structural context, converting these groups into H-bond donors. This allows certain base edges to form hydrogen-bonded geometries that are impossible in the unprotonated form. Protonation confers a positive charge on the resulting base pair, which can help stabilize accumulations of negative charge, as occur, for example, in the close packing of phosphate groups or during chemical reactions. Allowing for H-bonding to the 2’-OH and for protonation of the A and C Watson-Crick edges expands the number of basepairs one can construct.
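To put a number on this "modest energetic cost", the free energy of protonation at a fixed pH can be estimated from the intrinsic pKa as ΔG = 2.303RT(pH − pKa). The Python sketch below assumes ideal solution behavior and the unshifted pKa values of Table 4.3; in a folded RNA the local environment can shift them substantially.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol*K)


def protonation_cost(pka: float, ph: float, temp_k: float = 298.15) -> float:
    """Free energy (kJ/mol) of protonating a group at a fixed pH:
    dG = 2.303*R*T*(pH - pKa). Positive values mean the protonated form is
    disfavored at that pH."""
    return math.log(10.0) * R * temp_k * (ph - pka)


if __name__ == "__main__":
    # Intrinsic pKa values (adenine N1, cytosine N3) from Table 4.3.
    for base, pka in {"A": 3.5, "C": 4.2}.items():
        cost = protonation_cost(pka, ph=7.0)
        print(f"{base}: {cost:.1f} kJ/mol ({cost / 4.184:.1f} kcal/mol)")
```

At pH 7 this gives roughly 20 kJ/mol (about 4.8 kcal/mol) for A and 16 kJ/mol for C, a cost that favorable H-bonding and electrostatic interactions in the folded structure must offset for the protonated pair to form.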

4.5 Hydrogen bonding in base pairs

Within each basepair family, different base combinations (A with U, G with C, U with U, etc.) can form geometrically similar basepairs, depending on the arrangement of H-bond donor and acceptor groups on each base edge. To help the reader explore the base-pairing possibilities of RNA, I have prepared planar structures of the four nucleotides that can be photocopied onto colored transparencies and cut out individually (see Figure 4.5). Each nt is reproduced in two orientations, related by flipping out of the plane, to aid in forming cis as well as trans basepairs without having to turn the cut-outs upside down. New basepairs can be formed by aligning base edges in various combinations to identify potentially stable H-bonding arrangements. To guide the reader in this exercise, lone-pair electrons have been included on the carbonyl oxygen and imine nitrogen atoms; these serve as electron-rich H-bond acceptors and are colored red. Lone pairs of exocyclic amino groups (GN2, CN4 and AN6) are not shown in these diagrams because these electrons are delocalized into the aromatic rings and are generally not available as H-bond acceptors, except in special cases [38].

Figure 4.5 Nucleotides to print on transparencies. RNA nts in two orientations to print on transparencies for making base pairs by juxtaposing H-bonding donor (blue) and acceptor (red) groups.

The H-bond acceptor groups are colored red, to reflect their overall negative charge. H-bond donor groups, comprising H atoms covalently bonded to electronegative oxygen or nitrogen atoms, are colored blue, reflecting their overall positive charge. The criterion for obtaining stable basepairs when manipulating the colored transparencies is to juxtapose H-bond donors and acceptors so as to form at least two H-bonds. Each red functional group should partly overlap a blue functional group, while any red-with-red or blue-with-blue juxtaposition is avoided. The 2’-OH (hydroxyl) groups are colored purple to indicate that they can serve either as H-bond donors or acceptors. Moreover, a single hydroxyl group can simultaneously interact with an H-bond acceptor and at least one H-bond donor.
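The juxtaposition rule used with the transparencies can be expressed as a small check. The Python sketch below is a deliberately simplified, hypothetical model, not software used in this work. Each edge is written as a list of groups in the order in which they face the partner edge: "D" for a donor, "A" for an acceptor, "B" for a group such as the 2’-OH that can act as either, and None for a position without an H-bonding group. The model ignores distances, angles and the sliding of edges that produces wobble pairs.

```python
from typing import Optional, Sequence, Tuple

Group = Optional[str]  # "D" donor, "A" acceptor, "B" both (e.g. 2'-OH), None = no H-bonding group


def score_alignment(edge1: Sequence[Group], edge2: Sequence[Group]) -> Tuple[int, int]:
    """Count (h_bonds, clashes) for two edges written in facing order."""
    h_bonds = clashes = 0
    for g1, g2 in zip(edge1, edge2):
        if g1 is None or g2 is None:
            continue  # nothing to pair at this position
        if "B" in (g1, g2) or g1 != g2:
            h_bonds += 1  # donor facing acceptor, or a dual-role group
        else:
            clashes += 1  # donor-donor or acceptor-acceptor juxtaposition
    return h_bonds, clashes


def is_plausible_pair(edge1: Sequence[Group], edge2: Sequence[Group], min_h_bonds: int = 2) -> bool:
    """Transparency rule: at least two H-bonds and no like-with-like contacts."""
    h_bonds, clashes = score_alignment(edge1, edge2)
    return h_bonds >= min_h_bonds and clashes == 0


# Watson-Crick edges in facing order (simplified; C2-H of A treated as non-bonding).
A_WC = ["D", "A", None]  # N6-H, N1, C2-H
U_WC = ["A", "D", "A"]   # O4, N3-H, O2
G_WC = ["A", "D", "D"]   # O6, N1-H, N2-H
C_WC = ["D", "A", "A"]   # N4-H, N3, O2

if __name__ == "__main__":
    print("A-U:", score_alignment(A_WC, U_WC), is_plausible_pair(A_WC, U_WC))
    print("G-C:", score_alignment(G_WC, C_WC), is_plausible_pair(G_WC, C_WC))
    print("G-U:", score_alignment(G_WC, U_WC), is_plausible_pair(G_WC, U_WC))
```

Direct alignment of the G and U Watson-Crick edges produces two like-with-like contacts. This is why the G·U wobble pair requires the bases to slide relative to each other, as the reader will find when sliding the cut-outs.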

4.5.1 Base pairing interactions

RNA bases can form many types of basepairs in addition to the well-known WC pairs, because they have three distinct edges available for H-bonding [2]: the Watson-Crick edge (W), the Hoogsteen edge (H) and the Sugar edge (S), as shown in Figure 4.1. The base edges interact with each other in all combinations (W with W, W with H, W with S, H with H, H with S, and S with S) and in each of two orientations, cis and trans, to create twelve geometrically distinct types of basepairs [4]. These twelve basepairing geometries are shown schematically in Table 4.4, using right triangles to represent RNA bases. The hypotenuse of each triangle represents the H edge and the marked vertex indicates the location of the ribose sugar. Table 4.4 also shows symbols developed to mark basepairs in extended 2D diagrams. In these symbols, circles represent Watson-Crick edges, squares Hoogsteen edges and triangles Sugar edges. Basepairs involving distinct edges are represented by two symbols linked by a line. In this way, crucial 3D interaction information can be conveyed by annotating 2D diagrams, at least for local interactions. Representing all long-range interactions in the 3D structure is more difficult, because the 2D diagram can become very cluttered with symbols if these are not drawn carefully.
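The count of twelve follows from combining three edges into six unordered edge pairs, each in two orientations. The Python sketch below enumerates the families using the cWW / tHS style of abbreviation that appears in this chapter; it is only a bookkeeping illustration.

```python
from itertools import combinations_with_replacement
from typing import List

EDGES = ("W", "H", "S")    # Watson-Crick, Hoogsteen and Sugar edges
ORIENTATIONS = ("c", "t")  # cis and trans orientations of the glycosidic bonds


def basepair_families() -> List[str]:
    """Enumerate geometric families as orientation + edge pair, e.g. 'cWW' or 'tHS'.
    Edge pairs are unordered, so 'WH' and 'HW' denote the same family."""
    return [o + e1 + e2
            for e1, e2 in combinations_with_replacement(EDGES, 2)
            for o in ORIENTATIONS]


if __name__ == "__main__":
    families = basepair_families()
    print(len(families), families)
```

Treating edge pairs as unordered is what reduces the nine ordered combinations of three edges to six.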


Table 4.4 Twelve geometric families of RNA base pairs

4.5.2 The Sugar Edge

A key concept for understanding sugar-edge basepairing is the role of the ribose 2’-OH group, which can serve as an H-bond donor or acceptor, and often both simultaneously, on account of the free rotation of the -OH group about the C2’-O2’ single bond and the presence of non-bonding orbitals on the oxygen. Consequently, many different pairs can form using the sugar edges of nts.

4.5.3 Base pair isostericity and sequence variation

The classification of basepairs into geometric families based on edges provides the framework for explaining the base substitutions observed in structural alignments of recurrent RNA 3D motifs and in sequence alignments of RNA homologs. RNA helices are regular precisely because AU and GC pairs can substitute for each other with little or no distortion of the helical geometry. They are said to be isomorphic, or “isosteric,” to each other in the sense of occupying the same space between the backbone atoms of the two strands of the helix. Isostericity can be quantified using a measure called the Iso-Discrepancy Index (IDI), which depends on three distinct geometric features of basepairs [39]. The geometric classification of basepairs is useful because only basepairs belonging to the same family are isosteric by qualitative and quantitative (i.e., IDI) criteria. Isosteric pairs substitute for each other without distorting the local RNA 3D structure. The IDI was calibrated by statistical analysis of the IDI values of isosteric and near-isosteric AU, GC and GU basepairs extracted from high-quality structures. Quantitative analysis with IDI values confirms that two basepairs must belong to the same geometric family to be isosteric [39].

4.5.4 Base pair frequency

Analysis of atomic-resolution 3D structures of RNA complexes can reveal intrinsic propensities of certain base combinations to occur in some basepair geometries more frequently than in others.

Stombaugh et al. analyzed 3D structures corresponding to 954 sequences of 5S, 16S and 23S ribosomal RNA to calculate the frequency of each base combination in the different basepair geometries, as shown in Figure 4.6.

Figure 4.6 A graphical summary of the base pair occurrence frequencies within each base pair family, obtained from rRNA sequence data. For cWW, tHH, tWH, tHS, tWS and tSS, one base combination accounts for >50% of instances. The gray boxes in each matrix indicate base combinations that do not form that type of base pair. Adapted from [39].

As observed for many of the non-WC basepair families, only one or two base combinations account for most of the occurrences in the family. For example, in the cHH family, AG accounts for 49.6% of basepairs, while the remaining ~50% are GG basepairs. This implies that, to form a cHH basepair, one of the nucleotides must be a G and the other can be any purine. tHS, tWS and tSS are other geometries favored by AG basepairs. From Figure 4.6 it is evident that certain base combinations are common in a given geometry, while others are very rare, if they occur at all. Besides geometry, the energetics of basepairs can explain the predominance of certain base combinations in particular geometries: some combinations are more stable than others owing to the type or number of hydrogen bonds they form and to differences in their stacking energies.

4.6 Orbitals available in amino acids for hydrogen or metal binding

Several amino acids have side chains that are hydrogen bond donors or acceptors. The ability to donate a lone pair of electrons or to accept a hydrogen atom depends on the hybridization of the orbitals around the donor or acceptor atom. The number of bonds an atom makes in a molecule determines the number of hybridized orbitals required. Determining which hybrid orbitals form the most stable bonds in a molecule requires counting the regions of electron density around each atom. The general hybridization scheme and the composition of hybrid orbitals are shown in Table 4.5.


Table 4.5 A. General Hybridization Scheme (Adapted from UCLA)

Number of regions of electron density | Hybridization | Number of unhybridized p or d orbitals | Shape
2 | sp | 2 | Linear
3 | sp2 | 1 | Trigonal planar
4 | sp3 | 0 | Tetrahedral
5 | sp3d | 4 | Trigonal bipyramidal
6 | sp3d2 | 3 | Octahedral

Table 4.5 B: Hybrid orbital composition

Hybrid orbital | Orbitals combined | Resulting orbitals
sp | s-orbital + 1 p-orbital | 2 sp orbitals + 2 p-orbitals
sp2 | s-orbital + 2 p-orbitals | 3 sp2 orbitals + 1 p-orbital
sp3 | s-orbital + 3 p-orbitals | 4 sp3 orbitals (no p-orbitals)

A lone pair of electrons that is localized on an atom, rather than delocalized through resonance, is usually available for dative bonds, as in metal coordination or hydrogen bonding. Exceptions exist, and it is necessary to understand the cases where lone pairs are not available for coordination or hydrogen bonding. A summarized list of “rules” for determining hybridization, resonance and the availability of lone pairs is given below:

1. Based on the regions of electron density, an atom might in some cases be deemed sp3 hybridized. However, when an atom appears sp3 hybridized but has possible resonance structures, the atom is in fact sp2 hybridized. Resonance requires an open p orbital. Unlike an sp2-hybridized atom, an sp3-hybridized atom has four sp3 hybrid orbitals and no open p orbital, and therefore cannot participate in resonance.

2. Among amino acids, side chains with multiple resonance structures are common (Arg, Gln, Asn, Asp, Glu, Tyr, Phe, Trp, His), since structures with resonance are more stable than similar structures without resonance.

3. When looking for canonical structures in a group, note that atoms with definite sp3 orbitals (such as a carbon with four bonds) prevent resonance from occurring through them because they lack a p orbital. For example, in methionine (-CH2CH2SCH3), even though the sulfur has two lone pairs, they cannot engage in resonance, since the sulfur is bonded to two sp3-hybridized carbons with no available p orbitals.

4. When determining resonance involving a lone pair, the hybridization of the atom carrying the lone pair must be taken into account. Only a lone pair in a p orbital can be delocalized by resonance, conjugation or aromaticity.

5. An atom with four σ bonds has to be sp3 hybridized to accommodate these bonds. Such an atom has no p orbital and cannot participate in resonance (except in the very rare cases involving σ bonds), conjugation or aromaticity.
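Rules 1 and 5 amount to a simple decision procedure. The minimal Python sketch below assumes that the caller supplies the number of σ bonds and lone pairs on the atom (a double bond counts as one σ bond, i.e. one region of electron density) and whether a lone pair is conjugated with an adjacent π system. The function and its inputs are illustrative, not part of any software used in this work.

```python
HYBRIDIZATION = {2: "sp", 3: "sp2", 4: "sp3", 5: "sp3d", 6: "sp3d2"}


def hybridization(sigma_bonds: int, lone_pairs: int, lone_pair_conjugated: bool = False) -> str:
    """Assign hybridization from regions of electron density (sigma bonds + lone pairs),
    then apply rule 1: an apparently sp3 atom whose lone pair takes part in
    resonance is treated as sp2, with that lone pair in an unhybridized p orbital."""
    regions = sigma_bonds + lone_pairs
    if regions not in HYBRIDIZATION:
        raise ValueError(f"unsupported number of electron-density regions: {regions}")
    result = HYBRIDIZATION[regions]
    if result == "sp3" and lone_pairs > 0 and lone_pair_conjugated:
        result = "sp2"
    return result


if __name__ == "__main__":
    examples = {
        "Met SD (two sp3 neighbours, 2 lone pairs)": (2, 2, False),
        "Asn ND2 (amide N, lone pair conjugated with C=O)": (3, 1, True),
        "Asn OD1 (carbonyl O)": (1, 2, False),
        "Lys NZ (NH3+, no lone pair)": (4, 0, False),
    }
    for label, args in examples.items():
        print(f"{label}: {hybridization(*args)}")
```

The methionine sulfur stays sp3 (rule 3), the amide nitrogen is reassigned to sp2 (rule 1), and the charged lysine nitrogen, with four σ bonds, remains sp3 (rule 5).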

From this analysis of the orbitals involved in H-bonding in nucleic acids and proteins, it appears that lone pairs in unhybridized p orbitals, which are delocalized by resonance, are generally not available for forming hydrogen bonds.

4.6.1 Amino acid sidechains with lone pair of electrons

Hydrogen bonding occurs when there is a significant difference in electronegativity between the hydrogen atom and the donor atom to which it is covalently bonded. Hydrogen bonds can be classified by their strengths, as in conventional X−H···:Y (X, Y = N, O) H-bonds and weaker C−H···:O bonds, both of which are observed in proteins. NH groups in peptide backbones can donate a single hydrogen bond, and the peptide carbonyl group can accept two, via the two lone pairs on the sp2-hybridized oxygen. Detailed studies have shown that most main-chain NH and C=O groups are involved in hydrogen bonding [40,41]. Nearly all side chains capable of hydrogen bonding are indeed involved in at least one hydrogen bond [40-43]. Under certain conditions, a hydrogen atom is attracted to two atoms instead of only one, so that it may be considered to act as a bridge between them. Most amino acids with hydrocarbon side chains do not participate in metal coordination or hydrogen bonding. Although there are instances of C-H···O hydrogen bonding, first noted in purines [44], the resulting hydrogen bond is very weak and not commonly observed. C-H bonds are significantly affected by adjacent atoms. For example, C-H groups in aromatic side chains (Phe, Tyr, Trp) are better hydrogen bond donors than aliphatic C-H groups. A neighboring nitrogen atom, particularly in a charged group, greatly activates a C-H group, making it a stronger hydrogen bond donor.

Hence, C-H groups in lysine and arginine are stronger donors than those in Trp, since the neighboring N is charged in the former two amino acids. The strongest C-H donors in proteins are the CE1-H and CD2-H groups of histidine, particularly when the histidine is protonated [45]. Still, C-H hydrogen bond donors are not common in protein-RNA or protein-protein interactions. Moreover, hydrocarbon side chains, aliphatic or aromatic, tend to be hydrophobic. Hence, they are found buried in protein cores and do not commonly participate in solvent (water)-mediated hydrogen bonding. Exceptions include methionine, which can act as a hydrogen bond acceptor through the lone pairs on its sulfur atom, and tryptophan, which can act as a donor through its indole N-H group. Such ‘weak’ hydrogen bonds or bridges are specific interactions with distinct structural consequences, and their presence or absence can cause a cascade of changes.

Amino acids with polar side chains, uncharged or charged, form the hydrogen bonds most commonly observed in proteins. Depending on pH and the local environment, these amino acids can be in a protonated or deprotonated state. All hydrogen-bonding amino acid side chains are shown in Figures 4.7-4.11, where H-bond donors are denoted in blue and H-bond acceptors in red.

Arginine and lysine: The side chains of the basic amino acids arginine (side-chain pKa = 12.5) and lysine (side-chain pKa = 10.5) are always protonated, and hence positively charged, at physiological pH. Their side chains are therefore hydrogen bond donors. Arginine has three nitrogen atoms that can donate hydrogen bonds, NE, NH1 and NH2, of which the last two can each donate two hydrogens. The arginine side chain can form a bifurcated hydrogen bond, in which two of its N-H groups donate to a single acceptor atom, or a bidentate hydrogen bond with two acceptors (L. Shimoni and J.P. Glusker, 1995). The various modes of hydrogen bonding by the arginine side chain are shown in Figure 4.7.

Histidine: The side-chain pKa of histidine is close to physiological pH and can vary depending on the local environment. With a pKa of ~6, only about 4% of histidine side chains are charged at pH 7.4 (the neutral form outnumbers the charged form roughly 25:1). However, when the local environment raises the side-chain pKa to ~7, about 28% of the imidazole side chains are protonated, giving a significant partial positive charge at pH 7.4. Thus the histidine side chain, specifically atoms Nδ1 and Nε2, can be a hydrogen bond donor or acceptor depending on its environment. However, since an imidazole nitrogen cannot donate and accept a hydrogen bond at the same time, each nitrogen contributes to only one hydrogen bond (McDonald and Thornton, 1994). In the neutral (uncharged) form, only one of the two nitrogens is bonded to a hydrogen, so that one is a donor and the other an acceptor. In the charged state, both nitrogens are bonded to a hydrogen, making both of them donors. Because of this flexibility, histidine can act as a pH-dependent proton switch, inducing changes in protein structure in response to protonation or deprotonation as the pH changes.

Figure 4.7 Structure and resonance in basic amino acids. H-bond donors are highlighted with blue and H-bond acceptors with red. Delocalized orbitals are shown in green.

Asparagine and glutamine can act both as hydrogen bond donors, through their side-chain amide nitrogens (Nδ2 in Asn, Nε2 in Gln), and as acceptors, through their amide oxygens (Oδ1 in Asn, Oε1 in Gln). Recurrent patterns of hydrogen bonding are shown in Figure 4.8.

Figure 4.8 Structure and resonance in Asparagine and Glutamine. Modes of binding in Arginine and Asparagine sidechains are shown according to [46].

Glutamate and aspartate have side-chain pKa values of ~4 and are generally deprotonated, and hence negatively charged, at physiological pH (Figure 4.9). In their deprotonated form, they accept hydrogen bonds via their carboxylate oxygens (Oδ1 and Oδ2 in Asp; Oε1 and Oε2 in Glu). At low pH, these side chains may become partially or fully protonated.

Figure 4.9 Structure and resonance in acidic amino acids. H-bond donors are highlighted with blue and H-bond acceptors with red. Delocalized orbitals are shown in green.

Cysteine has a thiol group with a pKa of ~8.4. Free cysteine side chains will therefore be slightly negatively charged at physiological pH (~10% deprotonated at pH 7.4). However, compared to histidine, cysteine is much less likely to act as a pH-dependent switch, because a decrease in pH will only protonate the ~10% of cysteine side chains that are deprotonated.

Serine, threonine and tyrosine have a hydroxyl group as the functional moiety of their side chains. The hydroxyl oxygen (Oγ in Ser, Oγ1 in Thr, OH in Tyr) can donate a hydrogen bond through its O-H group and accept hydrogen bonds through its lone pairs. The acceptor capability of tyrosine is somewhat diminished because a lone pair of the hydroxyl oxygen, in a p orbital, participates in the delocalization of electrons in the aromatic ring.

Serine and threonine play crucial roles in maintaining protein secondary structure as well as bimolecular interactions. In helices, 70% of serine residues and at least 85% (potentially 100%) of threonine residues make hydrogen bonds to carbonyl oxygen atoms in the preceding turn of the helix [47].

Figure 4.10 Structure and resonance in polar amino acids with a hydroxyl functional group in their side chains. H-bond donors are highlighted with blue and H-bond acceptors with red.

Amino acids such as tryptophan, proline and methionine have lone pairs on side-chain atoms; however, the hybridization of these orbitals (sp3) makes them unsuitable for donation.

Tryptophan NE1 has partial p character and can occasionally act as a hydrogen bond donor.

Figure 4.11 Structure and resonance in nominally polar amino acids. Side chains contain atoms with lone pairs, but the lone pairs are in an sp3 orbital and thus unavailable for donation.

4.7 Recurrent elementary interactions in RNA-protein complexes

Nucleoprotein complexes form as a result of numerous short- and long-range, energetically favorable interactions, many of which are sequence-specific. Such specific interactions stabilize functional structures or mediate interactions with other molecules. These interactions can be transient or long-lived and are usually non-covalent. In this section, I discuss the fundamental molecular interactions that stabilize the structural assemblies of these complexes.

4.7.1 Hydrogen bonding

Hydrogen bonding is possibly the most important of all intermolecular interactions. A hydrogen bond is a favorable interaction between an atom with a basic lone pair of electrons (a Lewis base) and a hydrogen atom that is covalently bound to an electronegative atom (N, O, or S). Essentially, the hydrogen atom is partially stripped of its electrons and is shared by two electronegative atoms, mainly O and N, as in D-H···A. In a hydrogen bond, the Lewis base is the hydrogen bond acceptor (A), and the partially exposed proton is bound to the hydrogen bond donor (D-H). It is a special case of dipole-dipole interaction, which is electrostatic for weak and moderate bonds and acquires covalent character when it is strong. A hydrogen bond is not an acid-base reaction: there is no transfer of a proton (H+) from one molecule to another leading to the formation of D− and HA+. In a hydrogen bond, the H+ is partially transferred from D-H toward A but remains covalently attached to D; the D-H bond remains intact (Figure 4.12).

Figure 4.12 Components of a Hydrogen bond (HB). HB acceptor and HB donor are shown as A and D, the lone pair and the acidic proton are shown in their red and blue orbitals respectively. N, O, S are the predominant hydrogen bonding atoms (A & D) in biological systems.

Hydrogen bonds are short-range and directional, with specific bond angles.

The strength of a hydrogen bond depends on (i) the dielectric constant of the medium, (ii) the electronegativity of the atoms involved and (iii) charge. Concepts from acid-base theory can be applied to hydrogen bond donors and acceptors to estimate the strength of a hydrogen bond.

Electronegative atoms strip hydrogens of their electron density (like acids) and make the hydrogen atom available to an acceptor, which shares its lone pair with the hydrogen atom (like a base). Hydrogen bonds in biological systems generally involve oxygen and nitrogen atoms as A and D. Keto groups (=O), amines (R3N), imines (R=N-R) and hydroxyl groups (-OH) are the most common hydrogen bond acceptors in DNA, RNA, proteins and complex carbohydrates.

Hydroxyl groups and amines/imines are the most common hydrogen bond donors. Hydroxyls and amines/imines can both donate and accept hydrogen bonds.
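Because hydrogen bonds are short-range and directional, structure-annotation programs typically detect them with distance and angle cutoffs. The Python sketch below illustrates such a test. The 3.5 Å donor-acceptor cutoff and the 120° minimum D-H···A angle are assumed, commonly used values rather than the criteria of any particular program, and coordinates are in Å.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def dha_angle(d: Vec, h: Vec, a: Vec) -> float:
    """Angle D-H...A in degrees, measured at the hydrogen atom."""
    v1, v2 = _sub(d, h), _sub(a, h)
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def is_hbond(d: Vec, h: Vec, a: Vec, max_da: float = 3.5, min_angle: float = 120.0) -> bool:
    """Geometric H-bond test: D...A distance within cutoff and a near-linear D-H...A angle."""
    return math.dist(d, a) <= max_da and dha_angle(d, h, a) >= min_angle


if __name__ == "__main__":
    donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print(is_hbond(donor, hydrogen, (2.9, 0.0, 0.0)))  # True: linear, 2.9 A
    print(is_hbond(donor, hydrogen, (1.0, 2.5, 0.0)))  # False: 90 degrees at H
```

The second case shows why the angle matters: the acceptor is close enough to the donor, but it sits off to the side of the D-H bond.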

4.7.2 Electrostatic interactions

Electrostatic interactions occur between and among cations and anions and can be either attractive or repulsive, depending on the signs of the charges: like charges repel, while unlike charges attract. Because the electrostatic energy falls off only as 1/r, unlike charges experience an attractive electrostatic (Coulombic) force even at long range. When the Coulombic energy between two molecules with charges Z1e and Z2e separated by a distance r12 is larger than the thermal energy at room temperature, the electrostatic interaction can be considered strong. This energy is given by

$$E = \frac{Z_1 e \, Z_2 e}{4\pi\varepsilon_0 \varepsilon \, r_{12}} = \frac{k \, Z_1 Z_2 e^2}{\varepsilon \, r_{12}}$$

where k = 1/(4πε0) is Coulomb's constant and ε is the dielectric constant of the medium.

In RNA-protein complexes, the most prevalent electrostatic interaction is between positively charged amino acid side chains (such as Lys or Arg) and the negatively charged phosphate backbone of the nucleic acid. Similarly, metal ions such as Mg2+ or Na+ can also bind to phosphates. Coulombic forces depend strongly on the dielectric constant of the medium; thus the attractive force between two ions in water is very different from that in benzene. In biomolecular systems, the local dielectric constant may be affected by solvent exclusion. The energy of electrostatic interaction between an ion pair exposed to the solvent may be half that of an ion pair buried in a hydrophobic core.
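The dependence on the dielectric constant and the comparison with thermal energy can be made concrete with the Coulomb equation. The Python sketch below assumes point charges in a uniform medium; the separation of 4 Å and the dielectric constants of 80 (water) and 4 (a protein interior) are conventional illustrative values, not measurements from this work.

```python
COULOMB_KCAL = 332.0637     # e^2/(4*pi*eps0) in kcal*Angstrom/mol
R_KCAL = 1.987204e-3        # gas constant, kcal/(mol*K)
RT_298 = R_KCAL * 298.15    # thermal energy at room temperature, kcal/mol


def coulomb_energy(z1: int, z2: int, r_angstrom: float, dielectric: float) -> float:
    """Coulomb energy (kcal/mol) between point charges z1*e and z2*e separated by r
    in a uniform medium of relative dielectric constant `dielectric`."""
    return COULOMB_KCAL * z1 * z2 / (dielectric * r_angstrom)


if __name__ == "__main__":
    for label, eps in {"water (eps ~ 80)": 80.0, "protein interior (eps ~ 4)": 4.0}.items():
        e = coulomb_energy(+1, -1, 4.0, eps)
        print(f"+1/-1 pair at 4 A in {label}: {e:.1f} kcal/mol ({abs(e) / RT_298:.1f} RT)")
```

In water the +1/-1 pair at 4 Å is only about 1 kcal/mol, less than twice RT, while the same pair in a low-dielectric medium is twenty times stronger. This idealized model overstates the contrast, because it ignores the desolvation cost of burying charges and the non-uniform dielectric of real complexes.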

Favorable electrostatic interactions can also occur within proteins, between anionic and cationic amino acid side chains. Such ion pairs form “salt bridges” when the charged groups of cationic (Arg, Lys) and anionic (Asp, Glu) amino acids are separated by 3.0 to 5.0 Å. In salt bridges, hydrogen bonding combined with Coulombic attraction strengthens the interaction.

4.7.3 Van der Waals interactions

In polar molecules, there is a separation of charge, with greater electron density concentrated on the more electronegative atom. The extent of charge separation within a molecule is characterized by the dipole moment μ. Both the magnitude and the orientation of the dipole moment are crucial in influencing the environment around the molecule. The magnitude of a dipole moment is determined by the magnitudes of the partial charges and by the distances between them. When a polar or charged molecule approaches another, their electronic wavefunctions are mutually perturbed, which can strengthen or weaken the interaction. If a polar or charged molecule is near a non-polar molecule, the wavefunction of the non-polar molecule is changed in such a way that it now bears partial charges. Such polarization introduces new forms of possible interactions. A polarizable molecule develops an induced dipole when exposed to an electric field E:

$$\mu_{\text{ind}} = \alpha E$$

where α is the polarizability. Thus a polar molecule has a dipole and can induce a dipole moment in a polarizable molecule.

Interactions between dipoles and ions are called charge-dipole (or ion-dipole) interactions. Dipoles also interact with other dipoles (dipole-dipole interactions) and induce charge redistribution (polarization) in surrounding molecules (dipole-induced dipole interactions). All these interactions are collectively described as van der Waals forces. Induced-dipole interactions include dispersion forces (interactions between instantaneous dipoles), ion-induced dipole and dipole-induced dipole interactions. These are short-range interactions; for dispersion and dipole-induced dipole interactions, the associated energy is inversely proportional to the sixth power of the distance:

$$E_{\text{vdW}} \propto \frac{1}{r^{6}}$$

Van der Waals forces can be both attractive and repulsive. In RNA-protein complexes, base stacking and base-amino acid stacking are examples of attractive vdW forces. Repulsive vdW forces set the minimum separation of ~3.4 Å between stacked bases. The strength of a vdW interaction is a function of the contact surface area and the polarizability of the electron shells: the larger the surface area, the stronger the interaction. Thus stacking of purines is more favorable than stacking of pyrimidines. By the same principle, stacking of aromatic amino acids on RNA bases should provide greater stabilization.
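The balance between attraction and repulsion described here is often modeled with a 12-6 Lennard-Jones potential, whose attractive term falls off as 1/r^6. The Python sketch below uses that generic model, which is not derived from this work; the well depth (0.2 kcal/mol) and the contact distance sigma (3.4 Å) are arbitrary illustrative values.

```python
def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 Lennard-Jones energy: the -(sigma/r)**6 term models attractive dispersion
    and the (sigma/r)**12 term models short-range repulsion."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_minimum(sigma: float) -> float:
    """Separation of lowest energy, r_min = 2**(1/6) * sigma."""
    return 2.0 ** (1.0 / 6.0) * sigma


if __name__ == "__main__":
    eps, sig = 0.2, 3.4  # kcal/mol and Angstrom; illustrative values only
    for r in (3.0, 3.4, lj_minimum(sig), 4.5, 6.0):
        print(f"r = {r:.2f} A: E = {lennard_jones(r, eps, sig):+.3f} kcal/mol")
```

The energy is strongly repulsive inside sigma, zero at sigma, reaches its minimum (-epsilon) at about 1.12 sigma and then fades quickly, which illustrates why vdW contacts are both short-range and weak.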

4.7.4 Hydrophobic interactions

Hydrophobic interactions are interactions favored by excluding water from the interaction surface. The term “hydrophobic interaction” is misleading, since the “interaction” results from rearrangement of the aqueous network around a non-polar molecule; it is better called the hydrophobic effect. The hydrophobic effect is observed when a non-polar molecule is inserted into water and interrupts its loose H-bond network. Water molecules are forced to form H-bonds only with other water molecules, creating a cage around the solute. The new bonds are stronger, and this causes an entropy loss. Association between two hydrophobic molecules decreases the number of water molecules trapped in such cages.

The total entropy increases and compensates for the enthalpy loss. Thus the hydrophobic effect essentially depends on the unique cohesive properties of water. Cohesive interactions between water molecules are not disrupted by dissolved non-polar molecules: interactions between water molecules in the interfacial region (in contact with hydrocarbon) are just as extensive and enthalpically favorable as those between water molecules in bulk water, surrounded only by water. There is no net change in the extent of favorable molecular interactions when non-polar molecules are mixed into water or when they leave the aqueous phase.

In RNA-protein complexes, RNA bases and hydrophobic sidechains of amino acids can orient themselves in a variety of ways to exclude water from the interacting surface.

The strengths of the interactions discussed here vary. Some operate at short range, while others predominate over long distances. A summary of the energies associated with each of the above interactions and their operative distances is provided in Table 4.6.

Table 4.6 Energy values for different types of interactions prevalent in nucleoprotein complexes

Interaction | Energy (kcal/mol) | Long-range or short-range?
Hydrophobic | <10 | Long range
Electrostatic | 1-20 | Long range
Hydrogen bond | 2-30 | Short range
π-π aromatic stacking | 0-10 | Short range
Van der Waals | 0.1-1 | Short range

For an RNA chain to fold into a distinct 3D structure, its nucleotides must form specific and energetically favorable non-covalent contacts. RNA nucleotides can interact with each other in many different ways, which can be broadly classified as 1) base with base, 2) base with sugar, 3) base with phosphate, 4) sugar with sugar, 5) sugar with phosphate and 6) phosphate with phosphate. Although the negatively charged phosphate groups repel each other, precluding direct contact, phosphates can interact indirectly through bridging divalent metal ions, especially Mg2+, which is maintained at millimolar concentrations in cells [48]. When large RNAs fold into compact structures, negatively charged phosphate groups are brought into close proximity. Cations help neutralize the negative charge density associated with the RNA phosphate backbone. Such charge neutralization occurs with cationic polyamines, basic proteins [49] and inorganic cations [50]. Divalent cations also aid in RNA self-cleavage in ribozymes [51,52]. Mg2+ is a particularly suitable divalent cation for facilitating rRNA compaction and catalysis because (a) it is the most abundant intracellular multivalent cation and (b) it has the highest charge density of all biologically available ions, owing to its small ionic radius (0.6 Å) [53]. Mg2+ associates preferentially with the oxyanions of RNA phosphates rather than with base and ribose atoms, predominantly forming monodentate and bidentate complexes. Metal-bridging interactions have been reviewed in greater detail elsewhere [50,54].
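Point (b) can be checked by comparing charge per unit ionic volume for common biological cations. The Python sketch below assumes approximate six-coordinate ionic radii from the standard Shannon tables (0.72 Å for Mg2+, slightly larger than the 0.6 Å quoted above); the ranking, not the absolute numbers, is the point.

```python
import math

# Approximate six-coordinate ionic radii in Angstrom (assumed reference values).
IONS = {"Na+": (1, 1.02), "K+": (1, 1.38), "Mg2+": (2, 0.72), "Ca2+": (2, 1.00)}


def charge_density(charge: int, radius_angstrom: float) -> float:
    """Charge per unit ionic volume, z / (4/3 * pi * r**3), in e per cubic Angstrom."""
    return charge / (4.0 / 3.0 * math.pi * radius_angstrom ** 3)


if __name__ == "__main__":
    # Print ions from highest to lowest charge density.
    for ion, (z, r) in sorted(IONS.items(), key=lambda kv: -charge_density(*kv[1])):
        print(f"{ion}: {charge_density(z, r):.3f} e/A^3")
```

Mg2+ comes out more than twice as charge-dense as Ca2+ and several times denser than Na+ or K+, consistent with its strong, specific binding to phosphate oxyanions.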

4.8 Stabilization of RNA quaternary structure by proteins

Most, if not all, cellular RNAs interact specifically with one or more proteins. E. coli 16S rRNA associates with 21 different ribosomal proteins (r-proteins) to form the small (30S) ribosomal subunit, or “SSU,” and interacts transiently with several translation factors. Figure 4.12 shows a histogram of SSU protein-RNA interactions for loop and helix nts in 16S rRNA (PDB file 2AW7). About 60% of nt-amino acid interactions involve loop nts, even though loop nts constitute just 42% of all 16S nts, demonstrating once again the important role of loop nts in the functional interactions of structured RNA.

Figure 4.13 Amino Acid Interactions for Loop vs. Helix Nucleotides. Histogram of the number of amino acids within 4 Å of nucleotides in loops (blue) vs. helices (red) in the E. coli 30S ribosome.
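The kind of count plotted in the histogram can be sketched as follows. This is an illustration under stated assumptions, not the code used to make the figure: atoms are hypothetical `(residue_id, (x, y, z))` tuples, the loop/helix label of each nucleotide is supplied by the caller, and an amino acid counts as "within 4 Å" if any of its atoms lies within 4 Å of any nucleotide atom.

```python
from collections import Counter, defaultdict
from math import dist

CUTOFF = 4.0  # Å, the distance used in the figure caption


def group_by_residue(atoms):
    """Group (residue_id, (x, y, z)) tuples into {residue_id: [coords, ...]}."""
    grouped = defaultdict(list)
    for residue_id, xyz in atoms:
        grouped[residue_id].append(xyz)
    return grouped


def count_nearby_amino_acids(nt_coords, aa_residues, cutoff=CUTOFF):
    """Number of amino acids with at least one atom within cutoff of any nucleotide atom."""
    return sum(
        1
        for aa_coords in aa_residues.values()
        if any(dist(a, b) <= cutoff for a in nt_coords for b in aa_coords)
    )


def contact_histogram(nt_atoms, aa_atoms, context):
    """Return {context: Counter(number of nearby amino acids -> number of nucleotides)}.

    `context` maps each nucleotide id to a label such as "loop" or "helix".
    """
    nts, aas = group_by_residue(nt_atoms), group_by_residue(aa_atoms)
    histogram = defaultdict(Counter)
    for nt_id, coords in nts.items():
        histogram[context[nt_id]][count_nearby_amino_acids(coords, aas)] += 1
    return histogram


# Toy one-atom "residues" for illustration only.
nt_atoms = [("U10", (0.0, 0.0, 0.0)), ("G11", (10.0, 0.0, 0.0)), ("A12", (20.0, 0.0, 0.0))]
aa_atoms = [("Arg5", (0.0, 3.0, 0.0)), ("Lys6", (0.0, -3.5, 0.0)), ("Ser7", (10.0, 5.0, 0.0))]
context = {"U10": "loop", "G11": "helix", "A12": "loop"}
hist = contact_histogram(nt_atoms, aa_atoms, context)
print(dict(hist))
```

The all-pairs distance check is fine for a toy example; for a full ribosome a spatial grid or k-d tree would be the usual way to find neighbors.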

The 30S particle also interacts with the large (50S) ribosomal subunit, or LSU, to form the functional 70S ribosome. Non-covalent interactions, or “bridges,” that form between the subunits, many of which involve RNA, stabilize the assembled 70S ribosome [55]. Most of the 16S nts interacting with the LSU are loop nts or helix nts adjacent to loops. About thirteen of the approximately 58 “bulged out” nts that do not interact with other nts in E. coli 16S rRNA interact with SSU proteins. From structures that contain bound mRNA and tRNA, we observe that three others interact with tRNA and three with mRNA, and that A532, which is bulged out in 2AW7, forms a bridge to the head domain of 16S when the head clamps down on the mRNA and the tRNA bound to the 16S A-site. Another bulged base, U723, forms a base-phosphate interaction with the mRNA-Shine-Dalgarno helix. A702, which bulges out of the H23 kink-turn, interacts with 23S rRNA. Five others of these 58 “looped out” bases form perpendicular base-base interactions, and eight form base-ribose interactions. In summary, only about half of the 58 nts (fewer than thirty) are truly bulged out, and of these, ~21 facilitate tight turns in the 16S backbone, 7 of which are in UNCG loops.

The conclusion of this overview of the prevalence and roles of loop nts, exemplified by 16S rRNA, is that almost all nts, whether they belong to “loops,” linkers, or helices in the 2D structure, form some kind of pairing, stacking or base-backbone interaction. Moreover, interactions of loop nts constitute most of the crucial long-range pairing, stacking and base-phosphate interactions that stabilize domain structures and architectures and mediate most interactions with proteins and other RNAs. Most of the very small number of truly “looped out” bases play a role in facilitating formation of sharp ~180° turns in the RNA backbone.

For completeness, we conclude this section by briefly explaining the criteria used to assign nts to helices and loops in a consistent way that can be applied to analyze other structures and to automatically extract loop motifs for detailed comparison and analysis. This will help readers better understand how motifs are extracted from structures for the RNA 3D Motif Atlas [56], now accessible through the revised NDB website [12].

4.9 Resources for Exploring RNA 3D Structure

Visualizing the 3D structure reveals the architecture of RNA molecules and the local and long-range interactions that stabilize them. The 2D structure of an RNA is the basic road map for navigating the 3D structure, because in the 2D it is easy to find each nt and the nts close to it in sequence and secondary structure. The 3D structure is much more complicated and challenging to view. Coordinated color coding of the 2D and 3D structures, as in Figure 4, greatly facilitates finding nts and their neighbors in the 3D structure. In the 3D structure, helical elements that are distant in the 2D may be brought together. Color coding makes it easy to identify which helical elements contact each other and to which element each interacting nt belongs.
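One simple way to obtain consistent colors is to derive helical elements from the 2D structure and use the same per-nucleotide color in both views. The sketch below assumes a single chain in dot-bracket notation without pseudoknots, defines a "helix" simply as a run of stacked pairs (so a bulge starts a new helix), and uses an arbitrary palette; these are illustrative choices, not the criteria behind the figures or the Motif Atlas.

```python
from itertools import cycle

PALETTE = ["red", "orange", "green", "blue", "purple", "brown"]  # arbitrary choice
LOOP_COLOR = "gray"


def pairs_from_dot_bracket(structure):
    """Map each paired position (0-based) to its partner; only '(', ')' and '.' allowed."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return partner


def helix_labels(structure):
    """Helix index for each paired position; None for unpaired (loop) positions."""
    partner = pairs_from_dot_bracket(structure)
    labels = [None] * len(structure)
    next_label = 0
    for i in range(len(structure)):
        j = partner.get(i)
        if j is None or j < i:
            continue  # unpaired, or the closing side of a pair that is already labeled
        if i > 0 and partner.get(i - 1) == j + 1:
            label = labels[i - 1]  # stacked on pair (i-1, j+1): same helix
        else:
            label, next_label = next_label, next_label + 1
        labels[i] = labels[j] = label
    return labels


def nucleotide_colors(structure):
    """One color per helix and gray for loop nucleotides, reusable in 2D and 3D views."""
    palette, helix_color, colors = cycle(PALETTE), {}, []
    for label in helix_labels(structure):
        if label is None:
            colors.append(LOOP_COLOR)
        else:
            if label not in helix_color:
                helix_color[label] = next(palette)
            colors.append(helix_color[label])
    return colors


print(helix_labels("((..))..((...))"))
print(nucleotide_colors("((..))..((...))"))
```

Because the colors are computed per nucleotide position, the same list can be passed to a 2D diagram and to a 3D viewer selection, which is what keeps the two views in register.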

If one is interested in finding a structure based on its protein content rather than its RNA, the search facilities of the Protein Data Bank (PDB), found at http://www.pdb.org/, can be used.

The PDB is the archival database for experimentally determined 3D structures of biomolecules, including proteins, carbohydrates, and nucleic acids. It is integrated with a variety of resources, such as data validation parameters, protein secondary structure annotations, and genetic sequences. The PDB's search tools allow the user to find all structures containing a particular protein, which is often useful when looking for a protein/RNA complex.
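Once a candidate complex has been downloaded, a quick first step is to see which chains are RNA and which are protein. The sketch below reads the fixed-column ATOM records of the legacy PDB format (residue name in columns 18-20, chain ID in column 22) and labels each chain by the majority kind of its residues. It is a simplification under stated assumptions: HETATM records (ligands, water and many modified nucleotides) are skipped, and the `atom_line` helper exists only to build the usage example. Large structures such as ribosomes are distributed in PDBx/mmCIF rather than the legacy format, so real work would rely on the entity annotations and an mmCIF parser.

```python
from collections import Counter, defaultdict

RNA = {"A", "C", "G", "U"}
DNA = {"DA", "DC", "DG", "DT"}
PROTEIN = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def residue_kind(res_name):
    if res_name in RNA:
        return "RNA"
    if res_name in DNA:
        return "DNA"
    if res_name in PROTEIN:
        return "protein"
    return "other"


def classify_chains(lines):
    """Return {chain_id: majority residue kind} from ATOM records (HETATM skipped)."""
    seen = set()
    kinds = defaultdict(Counter)
    for line in lines:
        if not line.startswith("ATOM  "):
            continue
        res_name = line[17:20].strip()      # columns 18-20
        chain = line[21]                    # column 22
        residue_key = (chain, line[22:27])  # residue number (23-26) + insertion code (27)
        if residue_key in seen:
            continue  # count each residue once, not once per atom
        seen.add(residue_key)
        kinds[chain][residue_kind(res_name)] += 1
    return {chain: counts.most_common(1)[0][0] for chain, counts in kinds.items()}


def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: build a minimal ATOM record (atom-name alignment simplified)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


lines = [
    atom_line(1, "P", "G", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "C1'", "G", "A", 1, 1.0, 1.0, 1.0),
    atom_line(3, "P", "U", "A", 2, 5.0, 0.0, 0.0),
    atom_line(4, "CA", "LYS", "B", 10, 9.0, 9.0, 9.0),
]
print(classify_chains(lines))
```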

However, understanding an entire structure by visual inspection alone is exceedingly difficult, even for experienced researchers. A variety of online resources have been created to make structures easier to understand. Table 4.7 summarizes the online resources available for visualization, validation and analysis of RNA/protein structures.

4.9.1 Sources of Structural Data

There are two primary sources of structural data: the PDB and the NDB. All macromolecular 3D structures (RNA, DNA and protein) are deposited in the Protein Data Bank (PDB). This resource, found at http://www.rcsb.org/pdb/home/home.do, serves as an archive for data and is integrated with other sites such as the NDB and the Structural Biology Knowledgebase. It is an excellent place to find raw data for structures.

The Nucleic Acids Database (NDB), located at http://ndbserver.rutgers.edu/, is the other main source of 3D data. It differs from the PDB in being more than just an archive and in focusing only on nucleic acids. It contains information about structure quality as well as structural annotations. For example, users can retrieve several different types of annotations about the interactions in a structure. The site has recently undergone a major overhaul [12] that added new types of data, such as pairwise interaction annotations and motif annotations; new ways of exploring the data, such as restricting exploration to non-redundant structures; and an updated interface.

In addition, this site provides a powerful search facility that allows users to find structures by both structural features and biological criteria. Experienced users can explore the advanced search option, where RNA structures can be queried in terms of 3D motifs, 3D interactions, experimental restrictions such as crystal cell dimensions and space group, chemical modifications, or structural conformation.

Much of the data on the NDB comes from the RNA 3D Hub (.bgsu.edu/rna3dhub/ [57]). Unlike the NDB, this site provides a history of all data. This allows users to see how the loops clustered into motifs have changed over time, as well as to refer unambiguously to any version of the data.

4.9.2 Tools for evaluating structures

Determining the conformation of the sugar-phosphate backbone from medium-resolution X-ray structures (2.5 Å to 3.5 Å resolution) is challenging. While the positions of the electron-rich and geometrically rigid nucleobases and phosphate groups can be determined more reliably, the conformations of the flexible ribose moieties are harder to determine. Consequently, modeling errors are more frequent in the sugar-phosphate backbone. The MolProbity software and online web service (http://molprobity.biochem.duke.edu/) were developed to identify these errors [58,59].
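One geometric check used in MolProbity-style RNA validation, as we understand it, estimates the ribose pucker from a robustly placed atom instead of the ribose itself: the perpendicular distance ("Pperp") from the 3' phosphorus (the P of the next nucleotide) to the line through the glycosidic bond (C1'-N9 for purines, C1'-N1 for pyrimidines). Larger distances indicate C3'-endo and smaller ones C2'-endo. The 2.9 Å threshold below is the commonly cited value and should be treated as an assumption to verify against the MolProbity documentation; the function names are hypothetical.

```python
from math import sqrt


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def perpendicular_distance(point, line_a, line_b):
    """Distance from `point` to the infinite line through `line_a` and `line_b`."""
    d = _sub(line_b, line_a)
    v = _sub(point, line_a)
    t = _dot(v, d) / _dot(d, d)  # projection parameter along the line
    foot = tuple(a + t * c for a, c in zip(line_a, d))
    offset = _sub(point, foot)
    return sqrt(_dot(offset, offset))


def pucker_from_pperp(c1_prime, n_glycosidic, p_next, threshold=2.9):
    """Classify ribose pucker as C3'-endo or C2'-endo from the Pperp distance (Å).

    threshold: assumed cutoff; check the value used by the validation tool.
    """
    pperp = perpendicular_distance(p_next, c1_prime, n_glycosidic)
    return "C3'-endo" if pperp >= threshold else "C2'-endo"


# Toy coordinates, with the glycosidic bond placed along the x axis.
print(pucker_from_pperp((0.0, 0.0, 0.0), (1.47, 0.0, 0.0), (5.0, 3.3, 0.0)))
print(pucker_from_pperp((0.0, 0.0, 0.0), (1.47, 0.0, 0.0), (5.0, 2.2, 0.0)))
```

The appeal of this kind of test is that phosphorus and base positions are the best-determined parts of a medium-resolution RNA model, so the pucker call does not depend on the poorly resolved sugar atoms.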

4.9.3 Tools for searching structures

Several tools are available for searching 3D structures for instances of a particular motif. Here I mention tools that are available as web servers. A complete set of non-redundant lists of RNA structures is provided at the NDB website [56]. These lists are updated automatically and group all structures by type of molecule. To find a particular structure in this comprehensive list, users search by the functional or structural category to which the RNA of interest belongs. In addition to structural and experimental details, the results provide interaction annotations for all structures as well as tools to explore the 2D and 3D information.

Other tools available as web servers are RNA FRABASE [60] (http://rnafrabase.ibch.poznan.pl), FRASS (http://protein.bio.unipd.it/frass/) and WebFR3D [61] (http://rna.bgsu.edu/main/webapps/webfr3d/). RNA FRABASE uses any combination of sequence and secondary structure constraints to search for 3D motifs. FRASS searches 3D structures using constraints on the shape of the backbone: it creates a simplified representation of the backbone and searches other structures for regions with a similar shape. Although this is not discussed here, backbone conformations can be representative of motifs, which makes FRASS a powerful tool for finding some motifs. WebFR3D searches 3D structures using an example motif, constraints on interactions, or both. Its key feature is that it allows the user to perform a geometric as well as a symbolic search. The geometric search looks for local and composite recurrent structural motifs in RNA 3D structures using the selected geometric criteria, while the symbolic search allows the user to look for any RNA fragment that matches user-defined parameters, such as specific nucleotide identities, interactions between nucleotides, and distances along the polyribonucleotide strand. WebFR3D is embedded in the NDB website so that users can query the extensive nucleic acid database. The standalone version of WebFR3D, FR3D, has been used to generate the Motif Atlas [56], a comprehensive resource of RNA 3D motifs. Several useful computational methods are also available for identifying the amino acids in an RNA-binding protein that directly contact RNA; a detailed review of these methods is available elsewhere [62].
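The difference between a symbolic and a geometric search can be illustrated with a toy search over annotated nucleotides. This is not FR3D's code or data model: the `Nt` record, the annotation dictionary and the function names are hypothetical. The symbolic step filters candidates by base identity, required pairwise annotation and sequence span. The geometric step scores survivors by how well their intra-motif base-center distances match the query, a rotation-invariant stand-in chosen for brevity; FR3D's own discrepancy measure is different and also accounts for base orientations.

```python
from dataclasses import dataclass
from itertools import permutations
from math import dist, sqrt


@dataclass(frozen=True)
class Nt:
    nt_id: str
    base: str      # "A", "C", "G" or "U"
    position: int  # index along the chain
    center: tuple  # (x, y, z) base center in Å


def symbolic_match(cand, bases, required, annotations, max_seq_gap):
    """Check allowed base identities, required pairwise annotations and sequence span."""
    if any(nt.base not in allowed for nt, allowed in zip(cand, bases)):
        return False
    for (i, j), family in required.items():
        if annotations.get((cand[i].nt_id, cand[j].nt_id)) != family:
            return False
    span = max(nt.position for nt in cand) - min(nt.position for nt in cand)
    return span <= max_seq_gap


def discrepancy(cand, query):
    """RMS difference of intra-motif base-center distances (rotation/translation invariant)."""
    n = len(query)
    sq = [
        (dist(cand[i].center, cand[j].center) - dist(query[i].center, query[j].center)) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sqrt(sum(sq) / len(sq)) if sq else 0.0


def search(nts, query, bases, required, annotations, max_seq_gap, cutoff):
    """Brute force over ordered candidate tuples; adequate for toy inputs only."""
    hits = []
    for cand in permutations(nts, len(query)):
        if symbolic_match(cand, bases, required, annotations, max_seq_gap):
            d = discrepancy(cand, query)
            if d <= cutoff:
                hits.append((round(d, 3), tuple(nt.nt_id for nt in cand)))
    return sorted(hits)


# Toy query: a G paired "cWW" with a C whose base center lies 6.0 Å away.
query = (Nt("q1", "G", 1, (0.0, 0.0, 0.0)), Nt("q2", "C", 12, (6.0, 0.0, 0.0)))
nts = [
    Nt("G5", "G", 5, (0.0, 0.0, 0.0)),
    Nt("C20", "C", 20, (6.1, 0.0, 0.0)),
    Nt("C9", "C", 9, (0.0, 6.0, 0.0)),
    Nt("U7", "U", 7, (3.0, 3.0, 0.0)),
]
annotations = {("G5", "C20"): "cWW", ("C20", "G5"): "cWW"}
hits = search(nts, query, bases=({"G"}, {"C"}), required={(0, 1): "cWW"},
              annotations=annotations, max_seq_gap=30, cutoff=0.5)
print(hits)
```

Here C9 is geometrically just as close to G5 as C20 is, but it is rejected by the symbolic constraint because no cWW annotation links it to G5; combining both kinds of constraint is what makes such searches selective.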

Table 4.7 A list of resources available for visualization, validation and analysis of nucleoprotein structures

Resource | Description | Link

Folding
UNAFold/mfold | A tool to fold RNA and DNA sequences using thermodynamic parameters | http://mfold.rna.albany.edu/?q=mfold/RNA-Folding-Form
RNAfold | A tool that predicts secondary structures of single-stranded RNA or DNA sequences | http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi
ViennaRNA Web Services | Tools for folding RNA sequences and alignments | http://rna.tbi.univie.ac.at/

2D Visualization
RNAbow | A program for visualizing RNA base-pairing probabilities | http://rna.williams.edu/rnabows/
RiboVision | A modern open-source web application for ribosome-information viewing | http://apollo.chemistry.gatech.edu/RiboVision/
SAVoR | A web application that allows the user to visualize RNA-seq data and other genomic annotations on RNA secondary structures | http://tesla.pcbi.upenn.edu/savor/

Structure Collections
Protein Data Bank | The comprehensive resource for all 3D structures of biomolecules, including RNA, DNA and proteins | http://www.pdb.org
Nucleic Acids Database | A comprehensive collection of all 3D structures that contain nucleic acids, annotated with quality, base pairs, and annotations from a variety of sources | http://ndbserver.rutgers.edu/
Non-redundant Database | Non-redundant lists of atomic-resolution RNA structures at different resolution thresholds | http://rna.bgsu.edu/rna3dhub/nrlist

Evaluating Structures
MolProbity | A tool to validate 3D structures | http://molprobity.biochem.duke.edu/
W3DNA | Web server for the analysis, reconstruction, and visualization of 3D nucleic acid-containing structures | http://w3dna.rutgers.edu/

Structural Searching
WebFR3D | A web server to find RNA motifs in 3D structures by symbolic and geometric searches | http://rna.bgsu.edu/main/webapps/webfr3d/
NASSAM | A web server that searches for motifs and formations of nucleic acid bases in 3D space independent of base order | http://27.126.156.150/nassam//
FRASS | A web server for pairwise comparison of RNA structures or searching for similar structures | http://protein.bio.unipd.it/frass/
RNA FRABASE | A tool to search 3D structures using both sequence and structure constraints | http://rnafrabase.ibch.poznan.pl

Structural Feature Collections
Basepair Catalog | Catalog of all RNA base-pair families with exemplars | http://rna.bgsu.edu/FR3D/basepairs/
RNA Base Triple Atlas | A collection of three-base interactions in RNA structures | http://rna.bgsu.edu/Triples/triples.php
3D Motif Atlas | A collection of RNA 3D internal and hairpin loop motifs | http://rna.bgsu.edu/rna3dhub/motifs
Comparative RNA Site | A large collection of RNA alignments and secondary structures | http://www.rna.ccbb.utexas.edu

3D Structure Alignment
T-Coffee | A web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension | http://www.tcoffee.org/
SARA | A web server for fully automated RNA structure alignment | http://structure.biofold.org/sara/
SETTER | A web server for RNA tertiary structure comparison | http://setter.projekty.ms.mff.cuni.cz/
INTARNA, EXPARNA and LOCARNA | Web server that integrates tools for prediction of RNA–RNA interaction, exact RNA matching and alignment of RNA | http://rna.informatik.uni-freiburg.de/
R3D Align | An application for detailed nucleotide-to-nucleotide alignments of RNA 3D structures | http://rna.bgsu.edu/r3dalign/
R3D2MSA | |

Miscellaneous
RNA Puzzles | A CASP-like evaluation of RNA three-dimensional structure prediction | http://paradise-ibmc.u-strasbg.fr/rnapuzzles/
EteRNA | An interactive game that crowdsources the computationally challenging tasks of RNA design and structure prediction | http://eterna.cmu.edu/web/

4.10 Conclusion

Here I have provided a comprehensive introduction to recurrent bimolecular interactions in nucleoprotein complexes that stabilize the 3D architecture and make possible the specific binding of proteins, ligands, and other RNA molecules. I have discussed how specific properties of nucleotides and amino acids guide RNA-protein interactions and the underlying physical forces that govern them. Finally, I have provided links to resources that allow readers to deepen their understanding of RNA 3D structure and access specific information about RNAs of interest to them. New ways of integrating 2D and 3D representations of structured RNAs will make it easier for students and scientists to explore and comprehend the structures, functions, and evolution of these amazing, ancient molecules. New bioinformatic tools will make it possible to predict ever more reliably the 2D and 3D structures, possible functions, and interactions of new RNA molecules.

REFERENCES

1. Sweeney BA, Roy P, Leontis NB (2014) An introduction to recurrent nucleotide interactions in RNA. Wiley Interdisciplinary Reviews: RNA: n/a-n/a.
2. Leontis NB, Westhof E (1998) Conserved geometrical base-pairing patterns in RNA. Quarterly reviews of biophysics 31: 399-455.
3. Brown JW, Birmingham A, Griffiths PE, Jossinet F, Kachouri-Lafond R, et al. (2009) The RNA structure alignment ontology. RNA 15: 1623-1631.
4. Leontis NB, Westhof E (2001) Geometric nomenclature and classification of RNA base pairs. RNA 7: 499-512.
5. Almakarem ASA, Petrov AI, Stombaugh J, Zirbel CL, Leontis NB (2011) Comprehensive survey and geometric classification of base triples in RNA structures. Nucleic acids research: gkr810.
6. Egli M, Saenger W (2013) Principles of nucleic acid structure: Springer Science & Business Media.
7. Zirbel CL, Šponer JE, Šponer J, Stombaugh J, Leontis NB (2009) Classification and energetics of the base-phosphate interactions in RNA. Nucleic acids research 37: 4898-4918.
8. Tan Z-J, Chen S-J (2007) RNA helix stability in mixed Na+/Mg2+ solution. Biophysical journal 92: 3615-3632.
9. Misra VK, Draper DE (2002) The linkage between magnesium binding and RNA folding. Journal of molecular biology 317: 507-521.
10. Richardson JS, Schneider B, Murray LW, Kapral GJ, Immormino RM, et al. (2008) RNA backbone: consensus all-angle conformers and modular string nomenclature (an RNA Ontology Consortium contribution). RNA 14: 465-481.
11. Murray LJ, Arendall WB, Richardson DC, Richardson JS (2003) RNA backbone is rotameric. Proceedings of the National Academy of Sciences 100: 13904-13909.
12. Narayanan BC, Westbrook J, Ghosh S, Petrov AI, Sweeney B, et al. (2014) The Nucleic Acid Database: new features and capabilities. Nucleic acids research 42: D114-D122.
13. Barrett GC, Elmore DT (1998) Amino acids and peptides: Cambridge University Press.
14. Zamyatnin A (1984) Amino acid, peptide, and protein volume in solution. Annual review of biophysics and bioengineering 13: 145-165.
15. Zamyatnin A (1972) Protein volume in solution. Progress in biophysics and molecular biology 24: 107-123.
16. Badger GM (1969) Aromatic character and aromaticity.
17. Resnick R, Halliday D, Walker J (1988) Fundamentals of physics: John Wiley.
18. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. Journal of molecular biology 157: 105-132.
19. Dwyer DS (2001) Electronic properties of the amino acid side chains contribute to the structural preferences in protein folding. Journal of Biomolecular Structure and Dynamics 18: 881-892.
20. Wolfenden R, Andersson L, Cullis P, Southgate C (1981) Affinities of amino acid side chains for solvent water. Biochemistry 20: 849-855.
21. Kasende OE, Matondo A, Muzomwe M, Muya JT, Scheiner S (2014) Computational and Theoretical Chemistry.
22. Eley D, Spivey D (1962) Semiconductivity of organic substances. Part 9.—Nucleic acid in the dry state. Transactions of the Faraday Society 58: 411-415.
23. Sponer J, Sponer JE, Mladek A, Jurecka P, Banas P, et al. (2013) Nature and magnitude of aromatic base stacking in DNA and RNA: Quantum chemistry, molecular mechanics, and experiment. Biopolymers 99: 978-988.
24. Ladik J, Bende A, Bogár F (2008) The electronic structure of the four nucleotide bases in DNA, of their stacks, and of their homopolynucleotides in the absence and presence of water. The Journal of chemical physics 128: 03B602.
25. Westhof E, Yusupov M, Yusupova G (2014) Recognition of Watson-Crick base pairs: constraints and limits due to geometric selection and tautomerism. F1000Prime Rep 6: 19.
26. Westhof E (2014) Isostericity and tautomerism of base pairs in nucleic acids. FEBS letters 588: 2464-2469.
27. Katritzky A, Waring A (1962) 299. Tautomeric azines. Part I. The tautomerism of 1-methyluracil and 5-bromo-1-methyluracil. Journal of the Chemical Society (Resumed): 1540-1544.
28. Dreyfus M, Bensaude O, Dodin G, Dubois J (1976) Tautomerism in cytosine and 3-methylcytosine. A thermodynamic and kinetic study. Journal of the American Chemical Society 98: 6338-6349.
29. Wolfenden RV (1969) Tautomeric equilibria in inosine and adenosine. Journal of molecular biology 40: 307-310.
30. Singh V, Fedeles BI, Essigmann JM (2015) Role of tautomerism in RNA biochemistry. RNA 21: 1-13.
31. Blanchard SC, Puglisi JD (2001) Solution structure of the A loop of 23S ribosomal RNA. Proceedings of the National Academy of Sciences 98: 3720-3725.
32. Courtois Y, Fromageot P, Guschlbauer W (1968) Protonated Polynucleotide Structures. European Journal of Biochemistry 6: 493-501.
33. Ravindranathan S, Butcher SE, Feigon J (2000) Adenine protonation in domain B of the hairpin ribozyme. Biochemistry 39: 16026-16032.
34. Asensio JL, Lane AN, Dhesi J, Bergqvist S, Brown T (1998) The contribution of cytosine protonation to the stability of parallel DNA triple helices. Journal of molecular biology 275: 811-822.
35. Gao X, Patel D (1987) NMR studies of AC mismatches in DNA dodecanucleotides at acidic pH. Wobble A(anti)·C(anti) pair formation. Journal of Biological Chemistry 262: 16973-16984.
36. Tang CL, Alexov E, Pyle AM, Honig B (2007) Calculation of pKas in RNA: On the structural origins and functional roles of protonated nucleotides. Journal of molecular biology 366: 1475-1496.
37. Harris TK, Turner GJ (2002) Structural basis of perturbed pKa values of catalytic groups in enzyme active sites. IUBMB life 53: 85-98.
38. Šponer J, Šponer JE, Mládek A, Jurečka P, Banáš P, et al. (2013) Nature and magnitude of aromatic base stacking in DNA and RNA: Quantum chemistry, molecular mechanics, and experiment. Biopolymers 99: 978-988.
39. Stombaugh J, Zirbel CL, Westhof E, Leontis NB (2009) Frequency and isostericity of RNA base pairs. Nucleic acids research 37: 2294-2312.
40. McDonald IK, Thornton JM (1994) Satisfying hydrogen bonding potential in proteins. Journal of molecular biology 238: 777-793.
41. Baker E, Hubbard R (1984) Hydrogen bonding in globular proteins. Progress in biophysics and molecular biology 44: 97-179.
42. Guerin J, Stickle W (1992) Effects of salinity gradients on the tolerance and bioenergetics of juvenile blue crabs (Callinectes sapidus) from waters of different environmental salinities. Marine Biology 114: 391-396.
43. Torshin IY, Weber IT, Harrison RW (2002) Geometric criteria of hydrogen bonds in proteins and identification of 'bifurcated' hydrogen bonds. Protein engineering 15: 359-363.
44. Sutor DJ (1963) 204. Evidence for the existence of C–H···O hydrogen bonds in crystals. Journal of the Chemical Society (Resumed): 1105-1110.
45. Desiraju GR, Steiner T (2001) The weak hydrogen bond: in structural chemistry and biology: Oxford University Press on Demand.
46. Shimoni L, Glusker JP (1995) Hydrogen bonding motifs of protein side chains: descriptions of binding of arginine and amide groups. Protein Science 4: 65-74.
47. Gray TM, Matthews BW (1984) Intrahelical hydrogen bonding of serine, threonine and cysteine residues within α-helices and its relevance to membrane-bound proteins. Journal of molecular biology 175: 75-81.
48. Gupta RK, Benovic JL, Rose ZB (1978) The determination of the free magnesium level in the human red blood cell by 31P NMR. Journal of Biological Chemistry 253: 6172-6176.
49. Klein D, Moore P, Steitz T (2004) The roles of ribosomal proteins in the structure assembly, and evolution of the large ribosomal subunit. Journal of molecular biology 340: 141-177.
50. Bowman JC, Lenz TK, Hud NV, Williams LD (2012) Cations in charge: magnesium ions in RNA folding and catalysis. Current opinion in structural biology 22: 262-272.
51. Holm NG (2012) The significance of Mg in prebiotic geochemistry. Geobiology 10: 269-279.
52. Petrov AS, Gulen B, Norris AM, Kovacs NA, Bernier CR, et al. (2015) History of the ribosome and the origin of translation. Proceedings of the National Academy of Sciences 112: 15396-15401.
53. Klein DJ, Moore PB, Steitz TA (2004) The contribution of metal ions to the structural stability of the large ribosomal subunit. RNA 10: 1366-1379.
54. Woodson SA (2005) Metal ions and RNA folding: a highly charged topic with a dynamic future. Current Opinion in Chemical Biology 9: 104-109.
55. Yusupov MM, Yusupova GZ, Baucom A, Lieberman K, Earnest TN, et al. (2001) Crystal structure of the ribosome at 5.5 Å resolution. Science 292: 883-896.
56. Petrov AI, Zirbel CL, Leontis NB (2013) Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas. RNA 19: 1327-1340.
57. Rahrig RR, Petrov AI, Leontis NB, Zirbel CL (2013) R3D Align web server for global nucleotide to nucleotide alignments of RNA 3D structures. Nucleic acids research: gkt417.
58. Davis IW, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, et al. (2007) MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic acids research 35: W375-W383.
59. Hintze BJ, Lewis SM, Richardson JS, Richardson DC (2016) Molprobity's ultimate rotamer-library distributions for model validation. Proteins: Structure, Function, and Bioinformatics 84: 1177-1189.
60. Popenda M, Szachniuk M, Blazewicz M, Wasik S, Burke EK, et al. (2010) RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures. BMC bioinformatics 11: 231.
61. Petrov AI, Zirbel CL, Leontis NB (2011) WebFR3D—a server for finding, aligning and analyzing recurrent RNA 3D motifs. Nucleic acids research: gkr249.
62. Walia RR, EL-Manzalawy Y, Honavar VG, Dobbs D (2017) Sequence-Based Prediction of RNA-Binding Residues in Proteins. Prediction of Protein Secondary Structure: 205-235.

CHAPTER 5. AUTOMATED DETECTION AND ANNOTATION OF RNA-PROTEIN INTERACTIONS

5.1 Early evidence of specific nucleic acid-protein recognition

The central dogma of molecular biology, as stated by Francis Crick, describes the flow of genetic information within a biological system [1]. In the most common sequence, the steps of this flow are: (i) DNA replication, (ii) transcription of genetic information from DNA to messenger RNA (mRNA), (iii) translation of mRNA by the ribosome, resulting in protein synthesis, and (iv) degradation of mRNA. Each step of this flow is regulated by specific recognition of DNA and RNA sequences by proteins. Regulation of transcription by transcription factors (DNA-binding proteins) led to the proposal of the repressor hypothesis [2], which in turn guided studies showing that a DNA-binding protein can bind very selectively to unique DNA sequences out of the whole genome [3]. The first structures of nucleic acid-binding proteins solved at atomic resolution were those of the Cro and cI repressors of bacteriophage lambda and of the catabolite activator protein (CAP) of Escherichia coli [4-6], each of which binds specifically to genetic sequences to regulate transcription. Despite differences in size, 3D structure and architecture, each of these proteins binds to operator DNA in a similar manner. Each interacts as a dimer and uses α-helices to contact adjacent major grooves along one face of the double-helical DNA. Interactions between amino acid (aa) side chains and nucleobases in the major groove are bolstered by adjacent α-helices that contact the DNA backbone and orient the protein properly for recognition of DNA [7].

The first RNA-binding protein (RBP) structures, of tRNA-bound aminoacyl-tRNA synthetase complexes and of complexes of MS2 coat protein with RNA hairpins, demonstrated a recurrent RNA-protein interaction motif in which the proteins bound to the major grooves of A-form RNA. The amino acid side chains appeared to recognize both the specific sequence of bases and the shape or dimensions of the groove. Studies from the mid-1980s to the late 1990s focused on recognizing patterns of RNA/DNA-protein interactions. For example, in groove-binding proteins, the chemical nature of the RNA bases in these grooves governs the RNA-protein contacts.

Purine-rich grooves favor arginine-rich peptide sequences, as shown by alanine-scanning mutation analysis [8] and by Arg → Lys mutation studies that probed whether positive charge alone was sufficient for binding specificity [9]. Another category of RBPs comprises β-sheet-rich proteins that bind to specific sequences of single-stranded RNA [10]. RBPs such as human U1A protein and Rho factor form binding pockets in which unstacked bases, such as those in hairpin loops, interact with amino acids [11]. Detailed study of RNA-protein interactions and of the specific recognition of nucleic acids by proteins is still an active area of research.

Recognition of specific base sequences involves edge-to-edge hydrogen bonding (“pseudopair” formation) and face-to-face van der Waals interactions (“stacking”) between amino acid side chains and RNA bases. These specific interactions, together with backbone and electrostatic interactions, stabilize protein-RNA complexes. With the availability of atomic-resolution X-ray, cryo-EM and NMR structures of nucleic acid-protein complexes, analyzing RNA-protein interactions at the residue level has become more feasible. Here, we present a method for the automated detection and classification of RNA-protein interactions.
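Face-to-face stacking between a base and an aromatic side chain can be recognized from simple ring geometry. The sketch below computes each ring's centroid and plane normal, then reports the centroid distance, the angle between the ring planes and the in-plane (lateral) offset. It is an illustration only: the cutoffs (4.5 Å, 30°, 2.5 Å) are assumed values, not the criteria used by the annotation programs in this chapter, and `hexagon` is a hypothetical helper that builds toy rings. A large inter-plane angle corresponds to the perpendicular arrangements that are classed separately from stacking.

```python
from math import acos, cos, degrees, pi, sin, sqrt


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def centroid(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def ring_normal(points):
    """Unit normal of a planar ring from two centroid-to-atom vectors."""
    c = centroid(points)
    n = _cross(_sub(points[0], c), _sub(points[1], c))
    length = sqrt(_dot(n, n))
    return tuple(x / length for x in n)


def stacking_geometry(ring_a, ring_b):
    """(centroid distance, angle between planes in degrees, lateral offset) in Å/degrees."""
    d = _sub(centroid(ring_b), centroid(ring_a))
    distance = sqrt(_dot(d, d))
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    angle = degrees(acos(min(1.0, abs(_dot(na, nb)))))  # 0 degrees = parallel planes
    vertical = abs(_dot(d, na))                          # separation along ring A's normal
    lateral = sqrt(max(0.0, distance ** 2 - vertical ** 2))
    return distance, angle, lateral


def is_stacked(ring_a, ring_b, max_distance=4.5, max_angle=30.0, max_lateral=2.5):
    """Assumed illustrative cutoffs; not the chapter's annotation criteria."""
    distance, angle, lateral = stacking_geometry(ring_a, ring_b)
    return distance <= max_distance and angle <= max_angle and lateral <= max_lateral


def hexagon(center, radius=1.4, plane="xy"):
    """Hypothetical helper: a regular six-membered ring in the xy or xz plane."""
    cx, cy, cz = center
    points = []
    for k in range(6):
        u, v = radius * cos(k * pi / 3), radius * sin(k * pi / 3)
        points.append((cx + u, cy + v, cz) if plane == "xy" else (cx + u, cy, cz + v))
    return points


base_ring = hexagon((0.0, 0.0, 0.0))
print(is_stacked(base_ring, hexagon((1.0, 0.0, 3.5))))               # parallel, offset 1 Å
print(is_stacked(base_ring, hexagon((1.0, 0.0, 3.5), plane="xz")))   # perpendicular
```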

5.2 Differences between RNA and DNA Recognition by proteins

Like DNA, double-stranded RNA is composed of two polynucleotide strands arranged in an antiparallel double helix. Duplex nucleic acid helices can adopt three forms: A, B and Z (Figure 5.1). For most DNA at neutral pH and physiological salt concentrations, the predominant form is the B-form helix, in which the helix axis passes through the base pairs. DNA-binding proteins bind to cognate sequences within duplex DNA in the B-form. In contrast, under most conditions RNA duplexes adopt A-form helices, in which the planes of the base pairs are tilted with respect to the helix axis. RNA chains form more complex and varied three-dimensional shapes. These can consist of duplex A-form helices, often stacked coaxially, as well as single-stranded regions forming loops and bulged bases. The third form of duplex DNA is the Z-form, which occurs in stretches of alternating purines and pyrimidines (e.g., GCGCGC), especially in negatively supercoiled DNA. It has a strikingly different, left-handed helical structure whose biological role is not yet fully understood.

Figure 5.1 Different conformations of DNA. Side (top panel) and top (bottom panel) views of B-form (left), A-form (middle) and Z-DNA (right). Image adapted from [12].


A major difference between A-form and B-form nucleic acid is the placement of base-pairs within the duplex. In B-form, the base-pairs are almost centered over the helical axis (Figure 5.1 bottom panel), but in A-form, they are displaced away from the central axis and closer to the major groove. The result is a ribbon-like helix with a more open cylindrical core in A-form.

DNA in the B-form can accommodate α-helices or antiparallel β-sheets in its wide major groove, with the exposed base edges contacting amino acid side chains of the protein [13]. RNA in the A-form has a broad, shallow minor groove that can bind proteins with great specificity. Although the deep, narrow major groove of RNA cannot accommodate long α-helical protein interfaces, RNA helices are often interrupted by single-base bulges or loop regions that have more flexibility to contact proteins.

5.3 Binding sites on RNA: Major and minor grooves

In A-form helices, the arrangement of the two phosphate backbones creates a narrow, deep major groove and a broad, shallow minor groove [14]. Energetically, protein binding to either the major or the minor groove is favorable. However, interactions of proteins with these two types of grooves involve distinct thermodynamic changes. Binding to the major groove is essentially enthalpy driven (favorable ΔH), whereas association with the minor groove is characterized by an unfavorable enthalpy change (ΔH > 0) that is compensated by a favorable entropy change (ΔS > 0) [15].
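The two binding modes can be contrasted with the Gibbs relation. The block below uses no measured values; it only restates the sign argument of the preceding paragraph, with the subscript "bind" denoting the change on complex formation.

```latex
\Delta G_{\mathrm{bind}} = \Delta H_{\mathrm{bind}} - T\,\Delta S_{\mathrm{bind}} < 0
\quad\Longleftrightarrow\quad \Delta H_{\mathrm{bind}} < T\,\Delta S_{\mathrm{bind}}

\text{Major groove (enthalpy driven): } \Delta H_{\mathrm{bind}} < 0,
\text{ and binding remains favorable as long as any unfavorable term } -T\,\Delta S_{\mathrm{bind}} > 0
\text{ is smaller than } |\Delta H_{\mathrm{bind}}|.

\text{Minor groove (entropy driven): } \Delta H_{\mathrm{bind}} > 0,
\text{ so } \Delta G_{\mathrm{bind}} < 0 \text{ requires } T\,\Delta S_{\mathrm{bind}} > \Delta H_{\mathrm{bind}} > 0.
```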

The difference in thermodynamic compensation stems from the contrasting hydration properties of the two grooves. The major groove exposes much of the nucleic acid surface to solvent (water). In RNA in particular, helices are rarely more than half a turn long, so most of the major groove surface is accessible from the helix ends. Additionally, bulged bases increase the conformational flexibility of RNA, allowing protein secondary structure elements to fit into the major groove [11]. Consequently, major groove-binding proteins make stabilizing interactions with a favorable enthalpy. Strong nucleotide-amino acid (nt-aa) hydrogen bonds and stacking interactions are common in major grooves. In DNA, hydrophobic interactions in the major groove allow protein side chains to distinguish between thymine and cytosine [16]. Minor grooves are usually AU-rich, deep and narrow, and the water molecules within them are arranged in a highly ordered state. Hence, the driving force of RNA-protein interaction in the minor groove is the displacement of this ordered water, which makes a significant positive contribution to the binding entropy. Water-mediated interactions with proteins, which are observed in major grooves, are rarely observed in the minor groove [16].

5.4 Types of RNA-protein interactions

Before attempting a systematic detection of RNA-protein interactions, it is important to recognize the recurrent modes of non-covalent nucleotide-aa interaction observed in nucleoprotein complexes. These are (i) electrostatic interactions, (ii) pseudopairs, (iii) stacking, (iv) bidentate interactions, (v) hydrophobic contacts, and (vi) perpendicular stacking and cation-π interactions.

5.4.1 Electrostatic interactions

Electrostatic interactions are among the most common interactions observed in RNA-protein complexes.

Favorable electrostatic interactions between the negatively charged phosphate oxygens of RNA and cationic amino acid sidechains stabilize compactly folded RNA structures by charge neutralization (Figure 5.2). Several amino acids are fully ionized at physiological pH, and two of them, Arg and Lys, are cationic. Quantum mechanical calculations suggest that the positive charge of Arg resides on the guanidinium group of its sidechain, where the +1 charge is delocalized over the three nitrogen atoms. For Lys, the protonated amino group carries a +1 charge centered on the nitrogen. Because of their high positive charge density, the electrostatic interactions of these cationic sidechains with the RNA backbone are nonspecific [17]; that is, the affinity of Arg or Lys sidechains for the phosphate backbone is no more specific than that of polyamines such as spermidine or spermine. The following sections describe specific interactions of amino acids with nucleotides.

Figure 5.2 Electrostatic interaction in RNA between anionic phosphate oxygens (-1 charge) and protonated Lysine sidechain (+1 charge) in Flock House virus B2-dsRNA Complex (PDB: 2AZ2).

5.4.2 Hydrogen bonding

Taking their cue from the prevalence of hydrogen bonding between nucleotides, Seeman et al. made the first effort to analyze and annotate DNA nucleotide-amino acid pairing [18]. In their proposed scheme, basic amino acid residues such as Lys and Arg interact with the negatively charged phosphate backbone of DNA. Pseudopairs result from hydrogen-bonding interactions between the edges of an RNA base and an amino acid functional group (aa_fg); they are a consequence of the geometric regularity of the RNA bases and of the presence of at least two H-bond donor or acceptor groups on the aa_fg. According to Westhof et al. [19], who coined the term “pseudopairs”, having two hydrogen bonds from the same functional group facilitates one-to-one base-amino acid pairing by fixing the positions of the two bonds relative to each other.

H-bonds are attractive electrostatic interactions between H-atoms covalently bonded to highly electronegative atoms (primarily O and N in biomolecules) and electronegative atoms bearing lone pairs of electrons (also mostly O or N). The lone pairs on carbonyl oxygen and imine nitrogen atoms make these atoms electron rich and thus H-bond acceptors. The availability of a lone pair depends on the hybridization of the orbital that accommodates it. A rule of thumb (from Chapter 4) is that a lone pair is available for H-bonding only if it occupies an sp2 or sp3 hybridized orbital; electrons in unhybridized p-orbitals do not participate in hydrogen bonding. The hybridization of the constituent atoms of the amino acids is indicated in Table 5.1.


Table 5.1 Hybridization of atoms constituting amino acid backbone and side-chains.

Amino acid   sp2                                          sp3
Backbone     C, O, N                                      CA
Ala          -                                            CB
Arg          CZ, NH1, NH2                                 CB, CG, NE
Asn          CG, OD1, ND2                                 CB
Asp          CG, OD1, OD2                                 CB
Cys          -                                            CB, SG
Gln          CD, OE1, NE2                                 CB, CG
Glu          CD, OE1, OE2                                 CB, CG
Gly          -                                            -
His          CG, ND1, CE1, NE2, CD2                       CB
Ile          -                                            CB, CG1, CG2, CD1
Leu          -                                            CB, CG, CD1, CD2
Lys          -                                            CB, CG, CD, CE, NZ
Met          -                                            CB, CG, SD, CE
Phe          CG, CD1, CD2, CE1, CE2, CZ                   CB
Pro          -                                            CA, CB, CG, CD, N
Ser          -                                            CB, OG
Thr          -                                            CB, OG1, CG2
Trp          CG, CD1, CD2, NE1, CE2, CE3, CZ2, CZ3, CH2   CB
Tyr          CG, CD1, CD2, CE1, CE2, CZ, OH               CB
Val          -                                            CB, CG1, CG2

Hydrogen atoms bound to electronegative atoms such as oxygen and nitrogen are electron deficient, so the groups that carry them serve as H-bond donors. To distinguish the H-bond accepting and donating atoms of the different amino acid sidechains, donor and acceptor groups in Figure 5.3A are colored blue and red, respectively, to reflect their partial positive or negative charge.

Delocalized lone pairs, such as that of the endocyclic imino nitrogen of Trp, and the lone pairs of the Met sulfur and the Pro ring nitrogen, are not available for H-bond acceptance. Figure 5.3B shows H-bond acceptors and donors in the planar structures of the four RNA nucleotides. H-bond acceptor groups are colored red to reflect their partial negative charge. H-bond donor groups, comprising H-atoms covalently bonded to electronegative oxygen or nitrogen atoms, are colored blue to reflect their partial positive charge. The 2’-OH (hydroxyl) groups are colored purple to indicate that they can serve either as H-bond donors or as acceptors. Moreover, a single hydroxyl group can simultaneously interact with an H-bond acceptor and at least one H-bond donor.


Figure 5.3 Hydrogen bond donor and acceptor groups in (A) amino acid sidechains and (B) RNA nucleotides. H-bond donors are highlighted in blue and H-bond acceptors in red. Delocalized orbitals are shown in green. The 2’ hydroxyl groups of the ribose are colored purple since they can act as both H-bond donor and acceptor. RNA base ring atoms are numbered from 1 to 9 for purines and 1 to 6 for pyrimidines. Exocyclic groups and attached hydrogens are numbered according to ring position.


Amino acids can form H-bonds via their sidechains or their backbone atoms. Although nearly all amino acids (Pro lacks a backbone N–H) can form similar pseudopairs using the donor (N–H) and acceptor (C=O) groups of their peptide backbone, only some amino acids can form pseudopairs through their sidechains. Amino acids with acidic (Asp, Glu), basic (Arg, His) and polar (Asn, Gln) sidechains have planar functional groups with hydrogen bond donor and/or acceptor atoms similar to those present in the nucleotide bases. Hence, they can interact in-plane with bases, forming pseudopairs through two hydrogen bonds (Figure 5.3).

Not all amino acids can form pseudopairs in the classical sense [19]. Some amino acids, such as Ser, Thr, Tyr and Lys, have only one group containing an H-bond donor or acceptor. Hence, they can form only single interactions, in which one hydrogen bond links the amino acid and the base, or, in rare cases, bifurcated hydrogen bonds, in which a single hydrogen atom from a nucleotide or an amino acid sidechain participates in two hydrogen bonds (Figure 5.4). Bifurcated hydrogen bonds are weaker than canonical hydrogen bonds, having 60% and 50% of the energy of canonical H-bonds [20].

Figure 5.4 Pseudopair observed between an Arginine and Cytosine in E. coli small subunit ribosome (PDB: 4YBB, chain AA).


The energy of a hydrogen bond depends on the dielectric constant of the medium. In aqueous solution, the net contribution of an intramolecular hydrogen bond is only ~1 kcal/mol, because water molecules can substitute for macromolecular hydrogen-bond donors and acceptors.

Individual hydrogen bonds are highly directional but relatively weak non-covalent interactions. Therefore, stable association between the edge of a base and an aa_fg generally requires two or more H-bonds. Specificity arises because H-bonds are directional and require the juxtaposition of complementary H-bond donors and acceptors, as exemplified by the H-bonding between the Watson-Crick edges of G and C, or of A and U, to form base pairs. RNA bases can form many types of pseudopairs because they have three distinct edges available for H-bonding [21]: the Watson-Crick edge (W), the Hoogsteen edge (H) and the Sugar edge (S). Accordingly, nt-aa pseudopairs are classified by the edge of the base involved in hydrogen bonding, in the same manner as nucleotide-ligand interactions [22].

5.4.3 Stacking interaction

The energetically most stabilizing contributions to RNA structure are provided by the van der Waals and hydrophobic forces that mediate the stacking of RNA bases on each other [23]. Such stacking interactions are also observed repeatedly in RNA-binding proteins (RBPs). A base and an amino acid are said to be stacked on one another if they lie in roughly parallel planes, one on top of the other.

Stacking interactions are further classified according to the participating amino acid: Arginine stacking and aromatic stacking (involving Tryptophan, Tyrosine, Phenylalanine and Histidine).

Because RNA bases lack rotational symmetry, their two faces can be distinguished from one another. Therefore, an amino acid sidechain and an RNA base can stack face-to-face in distinct ways, depending on which base face is in contact. The faces are distinguished by reference to the usual orientation of each base in a Watson–Crick helix, in which all bases are in the anti-glycosidic conformation; the “5′-face” is the face that points toward the 5′-end of the strand, and the “3′-face” the one that points toward the 3′-end [24].

Figure 5.5 Stacking interactions of RNA bases with (a) Arginine and (b) Tryptophan residues.

Amino acid sidechains can also stack on the ribose sugars of nucleotides. Aliphatic carbon atoms of arginine and of hydrophobic sidechains make non-polar contacts with the endo faces of backbone riboses.

5.4.4 Bidentate interactions

A notable category of nucleotide-amino acid interaction involves an amino acid that simultaneously hydrogen bonds with two stacked bases, as shown in Figure 5.6. We term these bidentate interactions. They are mostly limited to amino acids whose sidechains can make multiple hydrogen bonds, such as Arg, Asn, Gln, Asp and Glu. Amino acids whose sidechains can form bifurcated H-bonds, such as Lys, Ser and Thr, can also form bidentate interactions, although these are observed less often. The participating nucleotides can belong to the same strand and be stacked on top of each other, or they can belong to different strands and be positioned diagonally (Figure 5.6).

Figure 5.6 Bidentate interaction involving residues in the T. thermophilus ribosomal large subunit (PDB ID: 4QCN). An Arginine residue hydrogen bonds with two bases that are stacked diagonally.

5.4.5 Hydrophobic interactions

The bases are the most hydrophobic parts of an RNA. When an RNA helix forms, a substantial part of the base surface area is shielded from solvent, and the center of the helix forms a hydrophobic core [25]. Amino acids with hydrophobic sidechains (Pro, Leu, Ile, Cys, Phe, Trp, Tyr) are found to interact predominantly in the minor groove of RNA helices. They also participate in parallel or perpendicular stacking (Figure 5.7). These residues either stack or interact along the sugar edge of bases, maximizing van der Waals contacts.


Figure 5.7 Perpendicular edge interaction in the mammalian mitochondrial small subunit ribosome (PDB: 5AJ3). A Phenylalanine residue is stacked perpendicularly to the edges of two parallel stacked bases.

5.4.6 Perpendicular stacking and cation-pi interactions

All bases contain an aromatic ring with a delocalized pi electron cloud above and below the plane of the ring. Cations such as the sidechains of Lysine or Arginine can align themselves over the faces of aromatic rings, thus establishing an electrostatic interaction termed a cation-pi interaction.

Aromatic amino acids can also interact with bases in an edge-to-face manner (“perpendicular stacking”). In crystals of aromatic hydrocarbons such as benzene, a herringbone structure is observed in which the molecules are stacked alternately parallel and perpendicular to the layer above [26]. Perpendicular stacking thus involves hydrophobic aromatic residues that stack normal to the plane of the nucleotide base.

Figure 5.8 Perpendicular interactions showing (A) cation-pi interaction between Arginine and Cytosine and (B) perpendicular stacking of Phenylalanine with Adenosine

Owing to the constraints of their analytical methods, previous studies of RNA-protein interactions have used limited datasets of high-resolution crystal structures, often omitting large nucleoprotein complexes such as the ribosome. In addition, most analyses focus on a few specific interactions, such as hydrogen bonding or electrostatic interactions. Lastly, and most importantly, no existing database compiles the different types of RNA-protein interactions; an attempt by Hoffman et al. [27] to construct such a database has since been discontinued.

5.5 Methods

Here, I present a fully automated process for detecting and classifying the different types of amino acid-nucleotide interactions. The algorithm accounts for all major forms of nucleotide-amino acid interaction, including hydrogen bonding, electrostatic, van der Waals, cation-pi and stacking interactions. The first step is to partition each nucleotide and amino acid residue into three parts. Each nucleotide is divided into base, ribose sugar and phosphate. Each amino acid is divided into the functional group at the end of its sidechain (hereafter referred to as the functional group), the “linker” connecting the functional group to the α-carbon, and the peptide backbone comprising Cα and the NH and CO groups (Figure 5.9). The atoms assigned to the functional group (fg) and linker of each amino acid are shown in Table 5.2. The linker does not participate in the interactions and is not considered during annotation. During analysis, the user can select which parts of each residue to query for interactions. Interactions are categorized as (i) amino acid functional group-base, (ii) amino acid functional group-ribose, (iii) amino acid functional group-phosphate, (iv) peptide backbone-base and (v) peptide backbone-phosphate.
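The partition step can be illustrated with a minimal, self-contained sketch. The dictionary and function names are hypothetical, and the atom lists are illustrative (adenosine only, and three amino acids whose functional groups were chosen so that the leftover sidechain atoms match the linkers in Table 5.2); they are not a copy of the dictionaries in definitions.py.

```python
# Hypothetical component dictionaries; atom lists are illustrative, not copied from FR3D.
NT_COMPONENTS = {
    "A": {
        "base": ["N9", "C8", "N7", "C5", "C6", "N6", "N1", "C2", "N3", "C4"],
        "sugar": ["C1'", "C2'", "O2'", "C3'", "O3'", "C4'", "O4'", "C5'"],
        "phosphate": ["P", "OP1", "OP2", "O5'"],
    },
}

AA_BACKBONE = {"N", "CA", "C", "O"}

# Functional groups chosen so that the remaining sidechain atoms match the
# linkers of Table 5.2: Arg -(CH2)3, Lys -(CH2)4, Asn -CH2.
AA_FG = {
    "ARG": {"NE", "CZ", "NH1", "NH2"},
    "LYS": {"NZ"},
    "ASN": {"CG", "OD1", "ND2"},
}


def split_amino_acid(resname, atom_names):
    """Assign each atom to 'backbone', 'fg' or 'linker' (leftover sidechain atoms).

    Residues missing from AA_FG end up with their whole sidechain in 'linker'.
    """
    fg = AA_FG.get(resname, set())
    parts = {"backbone": [], "fg": [], "linker": []}
    for name in atom_names:
        if name in AA_BACKBONE:
            parts["backbone"].append(name)
        elif name in fg:
            parts["fg"].append(name)
        else:
            parts["linker"].append(name)
    return parts


def split_nucleotide(resname, atom_names):
    """Assign each atom to 'base', 'sugar' or 'phosphate'; unlisted atoms are skipped."""
    components = NT_COMPONENTS[resname]
    parts = {comp: [] for comp in components}
    for name in atom_names:
        for comp, members in components.items():
            if name in members:
                parts[comp].append(name)
    return parts


arg = split_amino_acid("ARG", ["N", "CA", "C", "O", "CB", "CG", "CD",
                               "NE", "CZ", "NH1", "NH2"])
print(arg["linker"])  # ['CB', 'CG', 'CD']
```

Because the linker is simply whatever is left over, it can be ignored downstream, as the text specifies.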

Figure 5.9 Segmentation of amino acid and nucleotide residues to facilitate user-directed screening of interactions. Each amino acid (left) is segmented into a functional group (the functional end of the sidechain participating in hydrogen bonding), a linker and the peptide backbone. Each nucleotide is segmented into base, ribose and phosphate backbone.

Table 5.2 Partition of each (A) nucleotide into base, ribose and phosphate, and (B) amino acid into linkers and functional groups for annotation. Hydrogen donor and acceptor atoms are also defined. Indicated between parentheses is the number of hydrogen bonds that an atom can donate or accept, if more than one.

(A) Nucleotides

Component   H-bond donor atoms   H-bond acceptor atoms
A           N6                   N1, N3, N7
C           N4                   O2, N3
G           N1, N2               N3, O6, N7
U           N3                   O2, O4
Ribose      O2’                  O3’
Phosphate   none                 OP1, OP2

(B) Amino acids

Amino acid                Linker           Functional group   H-bond donor and acceptor atoms
Asn, Gln                  -CH2, -CH2CH2                       donors: ND2 (2) (Asn), NE2 (2) (Gln)
Asp, Glu (deprotonated)   -CH2, -CH2CH2                       acceptors: OD1 (2), OD2 (2) (Asp); OE1 (2), OE2 (2) (Glu)
Asp, Glu (neutral)        -CH2, -CH2CH2                       OD1, OD2 (Asp); OE1, OE2 (Glu)
Arg                       -(CH2)3                             NE, NH (1), NH2 (2)
Lys                       -(CH2)4          -NH3+              donor: NZ (3)
His                       -CH2                                donors: ND1, NE2; acceptors: ND1, NE2
Tyr                       -CH2                                donor: OH; acceptor: OH
Trp                       -CH2                                donor: NE1
Phe                       -CH2
Ser, Thr                  -CH2, -CH2CH2    -OH                OG
Cys                       -CH2             -SH                SG
Met                       -CH2             -S(CH3)
Leu, Ile, Val, Pro
Ala                                        -CH3
Gly                                        -H

Our dataset includes high-resolution (3 Å or better) crystal and cryo-EM structures of nucleic acid-protein complexes, which we use to investigate whether there is any underlying propensity for (i) amino acids of one type to interact with bases of a specific type and (ii) amino acids to interact with a specific edge of the nucleotide (Watson-Crick, Hoogsteen, Sugar). We will show that our method accurately annotates the interactions made by ribosomal proteins; it should be useful for studying the recognition of ribosomal RNA and for modeling amino acids when fitting electron density data from X-ray crystallography or cryo-EM.

5.5.1 Deriving interaction pairs from representative list of crystal structures

The BGSU RNA group website (http://rna.bgsu.edu/rna3dhub/nrlist) hosts representative sets of RNA-containing 3D structures, which we used to remove redundancy bias from the deduced interaction frequency patterns, so that the resulting statistics are meaningful. To extract high-quality structural data, only high-resolution crystal structures of RNA-protein complexes (3 Å or better) were used, and their mmCIF files were retrieved from the PDB. Since an mmCIF file may contain more than one model of the same molecule, only one representative model was considered for any structure whose file contained multiple identical models.

5.5.2 Defining components of RNA-protein interactions

To computationally segment the nucleotide and amino acid residues, we stored all relevant information in a Python file named “definitions.py”, available on GitHub at https://github.com/BGSU-RNA/fr3d-python/tree/develop/fr3d. This file contains Python dictionaries that define the structural and functional components of RNA-protein interactions. The dictionaries include atom names that define the following components:

- aa_fg: amino acid functional group
- aa_backbone: peptide backbone of each amino acid
- RNAbaseheavyatoms: nitrogenous base of each nucleotide
- nt_sugar: ribose atoms
- nt_phosphate: phosphate group of each nucleotide

Additionally, the dictionary named “RNAbasecoordinates” contains quantum-mechanically optimized coordinates for each of the four bases in a standard orientation. The types of RNA-protein interactions detected for each pair of components are summarized in Table 5.3. Other dictionaries in definitions.py include atom names that define the plane of the nucleobases and of planar amino acid functional groups (“planar_atoms”), and a plane for the puckered ribose ring (“planar sugar”). For amino acids with non-planar functional groups, such as Pro, Ile, Leu and Val, the approximate tilt of the functional group with respect to the plane of the nucleobase is calculated. The cut-offs for this tilt measurement for each amino acid are defined in the Python dictionary “tilt_cutoff”. The Python files that calculate bond distances and bond angles use these dictionaries to apply the appropriate geometric restrictions when classifying interactions.
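The plane-based tests rely on a normal vector for each planar group and on the angle between two normals. The standard-library sketch below uses Newell's method on ring atoms listed in ring order; this is our choice of technique for illustration (the FR3D code may fit planes differently), and the PLANAR_ATOMS entry and function names are hypothetical.

```python
import math

# Hypothetical example entry: Phe ring atoms listed in ring order.
PLANAR_ATOMS = {"PHE": ["CG", "CD1", "CE1", "CZ", "CE2", "CD2"]}


def newell_normal(ring):
    """Unit normal of a (near-)planar polygon given as ordered (x, y, z) points."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(ring):
        x2, y2, z2 = ring[(i + 1) % len(ring)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def angle_between(u, v):
    """Angle in degrees (0 to 180) between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def fold_to_90(angle):
    """Plane normals have no preferred sign, so treat 170 degrees like 10 degrees."""
    return min(angle, 180.0 - angle)
```

The direction of the normal depends on the order in which the ring atoms are listed, which is why a helper such as fold_to_90 is useful when only the relative orientation of two planes matters.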

Other dictionaries in definitions.py contain:

- Atom names that define each of the three edges of each base, i.e. the Watson-Crick, Hoogsteen and Sugar edges
- Atom names and connections for drawing nucleotides and amino acids in matplotlib


Table 5.3 Different types of interactions annotated for components of nucleotides and amino acid residues. The amino acids participating in each type of interaction are specified as Python sets. For pseudopairs, amino acids capable of forming bifurcated hydrogen bonds are indicated between parentheses.

Nucleotide   Interacting   Amino acid            Type of                 Participating amino acids
component    base part     component             interaction
Base         Edge          Aa functional group   Pseudopair              Asp, Glu, Asn, Gln, Arg, His, (Tyr, Lys)
                           or peptide backbone   Bidentate               His, Arg, Asp, Glu, Asn, Gln
                                                 Perpendicular edge      Phe, Tyr
                                                 Hydrophobic             Ile, Leu, Val, Pro, Ala
                                                 Single hydrogen bonds   Lys, Ser, Thr, Tyr
Base         Face          Aa functional group   Stacking                Trp, Tyr, Phe, His, Arg, Asp, Glu, Asn, Gln
                                                 Cation-pi               His, Arg, Lys
Ribose       -             Aa functional group   Stacking                Trp, Tyr, Phe, His, Arg, Asp, Glu, Asn, Gln
                                                 Single hydrogen bonds   Asp, Glu, Asn, Gln, Arg, His, Tyr, Lys, Ser, Thr
Phosphate    -             Aa functional group   Electrostatic           Arg, His, Lys
                           Peptide backbone      Single hydrogen bonds   Asp, Glu, Asn, Gln, Arg, His, (Tyr, Lys)
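Since the participating amino acids are stored as Python sets, the base rows of Table 5.3 can be transcribed into a lookup that tells the annotator which geometric tests are worth running for a given residue. The dictionary and function names below are hypothetical; only the set contents come from the table.

```python
# Base rows of Table 5.3 (edge and face interactions). Tyr and Lys are included in
# "pseudopair" because the table lists them as bifurcated-H-bond partners.
BASE_INTERACTION_SETS = {
    "pseudopair": {"ASP", "GLU", "ASN", "GLN", "ARG", "HIS", "TYR", "LYS"},
    "bidentate": {"HIS", "ARG", "ASP", "GLU", "ASN", "GLN"},
    "perpendicular edge": {"PHE", "TYR"},
    "hydrophobic": {"ILE", "LEU", "VAL", "PRO", "ALA"},
    "single H-bond": {"LYS", "SER", "THR", "TYR"},
    "stacking": {"TRP", "TYR", "PHE", "HIS", "ARG", "ASP", "GLU", "ASN", "GLN"},
    "cation-pi": {"HIS", "ARG", "LYS"},
}


def candidate_base_interactions(resname):
    """Interaction types from Table 5.3 that a residue may form with a base."""
    return sorted(kind for kind, members in BASE_INTERACTION_SETS.items()
                  if resname in members)


print(candidate_base_interactions("PHE"))  # ['perpendicular edge', 'stacking']
```

A residue with an empty result, such as Gly, can be skipped before any geometry is computed.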

5.5.3 Parsing structural files

A series of Python scripts is used to parse a user-supplied structure of interest in mmCIF format. To enable fast and accurate detection of nucleotide-amino acid interactions, FR3D modules were imported into Python. The fr3d.cif.reader module, which reads and parses mmCIF files, is installed as part of fr3d-python. This module relies on two additional Python files, data.py and definitions.py, to create lists of atoms for each nucleotide or amino acid residue that correspond to each part of the residue as defined above. A ‘center’ for each nucleotide or amino acid is calculated by averaging the coordinates of the atoms that constitute the nitrogenous base of a nucleotide, or the functional group or backbone of an amino acid residue. Candidate interactions between the base of a nucleotide and the functional group or backbone of an amino acid residue are then identified from the center-to-center distance between the two. Nucleotide-amino acid pairs are considered possible candidates for an interaction if this distance is within a preset cut-off (7 Å); all other pairs of residues are rejected.
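The screening step can be sketched with the standard library alone. The function names and input layout are hypothetical; only the averaging of component coordinates and the 7 Å cut-off come from the text.

```python
import math

CENTER_CUTOFF = 7.0  # Å, the screening cut-off stated in the text


def center(coords):
    """Geometric center: the mean of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def candidate_pairs(bases, amino_acids, cutoff=CENTER_CUTOFF):
    """Return (base_id, aa_id, distance) for every pair whose centers lie within cutoff.

    bases / amino_acids map a residue id to the coordinates of the component
    being screened (e.g. base atoms, or functional-group atoms).
    """
    base_centers = {bid: center(xyz) for bid, xyz in bases.items()}
    aa_centers = {aid: center(xyz) for aid, xyz in amino_acids.items()}
    pairs = []
    for bid, bc in base_centers.items():
        for aid, ac in aa_centers.items():
            d = math.dist(bc, ac)
            if d <= cutoff:
                pairs.append((bid, aid, d))
    return pairs
```

This double loop compares every base with every amino acid; for a structure as large as a ribosome, a spatial grid or k-d tree would be a natural way to reduce the number of comparisons, though that is a suggestion rather than a description of the FR3D code.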

5.5.4 Geometric conditions for annotating interactions

Screening conditions for nucleotide-amino acid interactions are based on the interatomic distances between the residues involved and the angle between the normals to their planes. The cut-offs for each condition depend on the type of interaction to be detected and on the amino acid residue involved. Each base is represented geometrically by the position of its glycosidic nitrogen in 3D space and by the rotation matrix that describes its orientation with respect to a standard reference frame. Each amino acid paired with the base is transformed with the base rotation matrix to retain its orientation relative to the base. The steps followed in the code are represented in a flowchart (Figure 5.14). The criteria used for each type of interaction are explained below using RNA-protein interactions observed in the mammalian mitochondrial small subunit ribosome [28] (PDB: 5AJ3).

a) Pseudopairs:

Since hydrogen bonding is the primary basis of pseudopair formation, the geometric conditions enforced by our algorithm follow those of hydrogen bonding. The nature of hydrogen bonding ranges from covalent (as in HF and proton sponges) to electrostatic (as in carboxylic acids, nucleotides and proteins), so instead of strict bond length and angle criteria, broader ranges are used: (i) hydrogen bond donor (D) and acceptor (A) atoms must be within 4 Å of each other, and (ii) the donor-hydrogen-acceptor angle, when it can be calculated, should be between 150° and 180° (Figure 5.10). Since most macromolecular crystal structures do not resolve electron density for hydrogen atoms, the second criterion is replaced by a criterion on the angle between the normal to the plane of the nucleotide (base or ribose sugar) and the normal to the plane of the amino acid (functional group or peptide backbone). The distance cut-off is applied between the H-bond donor and acceptor atoms defined in Table 5.2.
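The two hydrogen-bond criteria can be expressed as a small geometric test. This is a hypothetical sketch, not the thesis code: it applies the 4 Å donor-acceptor cut-off and the 150° to 180° D-H···A window when a hydrogen position is available, and it does not reproduce the plane-normal substitute used when hydrogens are missing.

```python
import math

MAX_DA_DISTANCE = 4.0        # Å, donor-acceptor cut-off from the text
DHA_RANGE = (150.0, 180.0)   # degrees, D-H...A angle window from the text


def angle_at(vertex, a, c):
    """Angle a-vertex-c in degrees."""
    va = [a[i] - vertex[i] for i in range(3)]
    vc = [c[i] - vertex[i] for i in range(3)]
    cos_theta = sum(x * y for x, y in zip(va, vc)) / (math.hypot(*va) * math.hypot(*vc))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def passes_hbond_geometry(donor, acceptor, hydrogen=None):
    """Distance test always; D-H...A angle test only when the H position is known."""
    if math.dist(donor, acceptor) > MAX_DA_DISTANCE:
        return False
    if hydrogen is None:
        # No resolved hydrogen: the text substitutes a plane-normal angle test,
        # which is not reproduced in this sketch.
        return True
    low, high = DHA_RANGE
    return low <= angle_at(hydrogen, donor, acceptor) <= high
```

For a linear N-H···O arrangement the angle is 180° and the test passes; a 90° arrangement at the same distance is rejected.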

Figure 5.10 Pseudopair detection conditions. Pseudopairs are detected using a combination of distance, planarity and angle criteria. (A) Sugar edge pseudopair between U28 and Gln 57 (chain L). (B) C59 and Gln 133 (chain j) and (C) G357 and Gln 100 (chain K) do not meet the criteria and are rejected.

Additionally, as a final validation step, a function called HB_count counts the number of H-bonds formed between the H-bond donor or acceptor atoms of the amino acid and the base. If two hydrogen bonds are formed, the interaction is a pseudopair. If only one hydrogen bond is formed, the interaction is accepted and annotated only if the amino acid is Lys, Ser, Thr or Tyr.

b) Stacking interaction:

A base and an amino acid are said to be stacked on one another if they lie in roughly parallel planes, with their geometric centers within 3 Å of one another. The functional groups of amino acids such as Lysine, Serine, Threonine and Glycine are not considered for stacking, since they do not have enough atoms to define a plane.

For amino acid residues with non-planar functional groups, the “tilt” of the functional group relative to the plane of the nucleobase is calculated. The tilt is measured as the difference between the maximum and minimum z-distances of the functional group atoms from the geometric center of the nucleobase (Figure 5.11).
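The stacking and tilt criteria can be sketched on coordinates that have already been expressed in a base-centred frame (origin at the base's geometric center, z axis along the base normal); that frame is an assumption of this sketch. Two further assumptions are labeled in the code: the 3 Å condition is read as the in-plane offset of the functional-group center from the base center (stacked planes are typically about 3.4 Å apart, so a full 3D center-to-center distance would rarely pass), and tilt_cutoff stands in for the per-residue values of the “tilt_cutoff” dictionary, which are not reproduced here.

```python
import math

NO_STACKING = {"LYS", "SER", "THR", "GLY"}  # too few atoms to define a plane (from the text)


def tilt(fg):
    """max z - min z of functional-group atoms in a base-centred frame (z = base normal)."""
    zs = [z for _, _, z in fg]
    return max(zs) - min(zs)


def is_parallel_stack(resname, fg, tilt_cutoff, max_offset=3.0):
    """Hypothetical parallel-stacking test on base-centred coordinates.

    Assumption: the 3 Å criterion is applied to the in-plane (x, y) offset of
    the functional-group center; tilt_cutoff is supplied per residue type.
    """
    if resname in NO_STACKING:
        return False
    cx = sum(x for x, _, _ in fg) / len(fg)
    cy = sum(y for _, y, _ in fg) / len(fg)
    return tilt(fg) <= tilt_cutoff and math.hypot(cx, cy) <= max_offset
```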

Figure 5.11 Stacking criterion for amino acids with non-planar functional groups. The maximum (d1) and minimum (d2) distances between atoms of the functional group and the base center (blue dot) are calculated. Their difference (d1-d2) gives an approximate measure of the tilt of the amino acid functional group with respect to the plane of the nucleobase. In this figure, Leu104 (chain m) is not stacked parallel to U961, whereas Leu35 (chain N) is annotated as parallel stacked on C599.

c) Perpendicular interaction:

All bases comprise an aromatic ring with a delocalized pi electron cloud above and below the plane of the ring. Cations such as the sidechains of Lysine or Arginine can align themselves over the faces of aromatic rings, thus establishing an electrostatic interaction termed a cation-pi interaction. To account for such interactions in the ribosome, the functional groups of Lys and Arg are annotated as “perpendicular” if their plane is roughly perpendicular to the plane of the base (i.e. the angle between the normal vectors to the two planes is between 80° and 100°) and the aa_fg is positioned within 2 Å of the geometric center of the base.
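The perpendicular criterion combines an angle window with a proximity condition. The hypothetical sketch below assumes the same base-centred frame as above (z axis along the base normal) and reads “within 2 Å of the geometric center” as the in-plane offset; both readings are assumptions. Because the 80° to 100° window is symmetric about 90°, flipping the sign of either normal does not change the result.

```python
import math

BASE_NORMAL = (0.0, 0.0, 1.0)  # base-centred frame: z axis is the base normal (assumed)


def normal_angle(n1, n2):
    """Angle in degrees between two unit normals."""
    dot = sum(a * b for a, b in zip(n1, n2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def is_perpendicular(fg_normal, fg_center, low=80.0, high=100.0, max_offset=2.0):
    """80-100 degree window from the text; max_offset applied in-plane (assumption)."""
    angle = normal_angle(fg_normal, BASE_NORMAL)
    offset = math.hypot(fg_center[0], fg_center[1])
    return low <= angle <= high and offset <= max_offset
```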

For perpendicular stacking, the aa_fg of amino acids such as Phe, Tyr, Trp and His (aromatic stacking) or Leu, Ile and Val (aliphatic stacking) must be positioned perpendicular and close to the edge of the base.

d) Bidentate interaction:

Annotating this complex class of interaction involves two additional steps. First, base-amino acid pairs that pass (i) the center-to-center distance cut-off (7 Å) and (ii) the angle and planarity criteria for pseudopairs are passed to the HB_count function. If the amino acid makes a single hydrogen bond to the base, the pair is added to a single list of such instances. Since bidentate interactions are made by a single amino acid interacting with two bases, in the second step this base-aa list is scanned to identify all instances of one amino acid residue making hydrogen bonds with two different bases. These instances are then annotated as bidentate interactions.
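The H-bond counting rule and the two-step bidentate search can be sketched together. Here hb_count is a hypothetical, simplified stand-in for the HB_count function: it counts complementary donor/acceptor pairs within 4 Å and ignores angles. The classification follows the rules stated for HB_count (two H-bonds give a pseudopair; a single H-bond is accepted only for Lys, Ser, Thr or Tyr); treating more than two H-bonds as a pseudopair is our assumption.

```python
import math
from collections import defaultdict

SINGLE_HBOND_AA = {"LYS", "SER", "THR", "TYR"}  # from the text


def hb_count(aa_atoms, base_atoms, cutoff=4.0):
    """Count complementary donor/acceptor pairs within cutoff (distance only).

    Atoms are (name, role, (x, y, z)) with role "D", "A" or "DA".
    """
    count = 0
    for _, role1, p1 in aa_atoms:
        for _, role2, p2 in base_atoms:
            complementary = ("D" in role1 and "A" in role2) or ("A" in role1 and "D" in role2)
            if complementary and math.dist(p1, p2) <= cutoff:
                count += 1
    return count


def classify(resname, n_hbonds):
    """Two or more H-bonds: pseudopair; one: accepted only for Lys/Ser/Thr/Tyr."""
    if n_hbonds >= 2:
        return "pseudopair"
    if n_hbonds == 1 and resname in SINGLE_HBOND_AA:
        return "single H-bond"
    return None


def find_bidentate(single_hbond_pairs):
    """Step 2: group (base_id, aa_id) single-H-bond pairs by amino acid and keep
    amino acids H-bonded to two or more different bases."""
    bases_per_aa = defaultdict(set)
    for base_id, aa_id in single_hbond_pairs:
        bases_per_aa[aa_id].add(base_id)
    return {aa: sorted(b) for aa, b in bases_per_aa.items() if len(b) >= 2}
```

Note that the single-H-bond list used by find_bidentate collects every one-bond pair regardless of residue type, so an Arg that makes one H-bond to each of two stacked bases is found even though classify would reject each contact on its own.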

5.5.5 Annotations for interacting base component

All interactions are further annotated according to the structural features of the interacting base. Pseudopairs and bidentate interactions are annotated to specify the edge of the base participating in the interaction. For example, pseudopairs involving base-aa_fg interactions are annotated as PsPfgW, PsPfgH or PsPfgS (Figure 5.12), where PsP denotes a pseudopair, fg denotes the amino acid functional group, and W, H and S stand for the Watson-Crick, Hoogsteen and Sugar edges of the base. A pseudopair with the sugar edge of a base can involve an H-bond with the 2’-OH of the nucleotide's ribose (Figure 5.12 C). To determine the edge, three angular sectors are defined around the geometric center of the base (Figure 5.12 A and B). To be assigned to a given edge, the geometric center and the normal vector of the amino acid component (aa_fg or peptide) must fall inside the corresponding sector. Note that in Figure 5.12 A and B the circular sectors overlap slightly.
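Assigning an edge by angular sector amounts to computing the polar angle of the amino acid center in the base plane and checking which sectors contain it. In the hypothetical sketch below, the sector boundaries are placeholders chosen only to illustrate wrap-around and overlap; they are NOT the cut-offs used in this work, which are shown only graphically in Figure 5.12. The sketch also tests only the center, not the normal vector.

```python
import math


def polar_angle(x, y):
    """Angle of (x, y) in the base plane, in degrees within [0, 360)."""
    return math.degrees(math.atan2(y, x)) % 360.0


def in_sector(angle, start, end):
    """Inclusive sector test; a sector with start > end wraps through 0 degrees."""
    if start <= end:
        return start <= angle <= end
    return angle >= start or angle <= end


def classify_edge(aa_center, sectors):
    """Return every edge whose sector contains the in-plane angle of aa_center.

    aa_center is in a base-centred frame; overlapping sectors can give two edges.
    """
    angle = polar_angle(aa_center[0], aa_center[1])
    return sorted(edge for edge, (s, e) in sectors.items() if in_sector(angle, s, e))


# Placeholder boundaries for illustration only (NOT the thesis cut-offs).
HYPOTHETICAL_SECTORS = {"W": (300.0, 60.0), "H": (50.0, 180.0), "S": (170.0, 310.0)}
```

Returning a list rather than a single label makes points in an overlap region visible, so a caller can decide how to break ties.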

Cut-offs are particularly difficult to specify for the Hoogsteen edge of pyrimidines (C and U). In a helix, the Hoogsteen edges of the bases line the major groove, where proteins are often found to bind sequence-specifically [29].

Figure 5.12 Classification of pseudopairs along the three edges of nucleobases. Gray circular sectors defining each edge for (A) pyrimidines and (B) purines are based on angular cut-offs from the geometric center of the base (yellow dot). To classify the edge of interaction, the geometric center of the amino acid component must fall within a particular circular sector. (C) Sugar edge of A196 paired with Arg27 (chain d), (D) Hoogsteen edge of G791 paired with Arg277 (chain I) and (E) Watson-Crick edge of C136 paired with Arg198 (chain O).

Parallel and perpendicular stacking interactions are differentiated according to the particular face of the RNA base interacting with the amino acid. The two faces of an RNA base are distinguished by reference to the usual orientation of each base in the Watson–Crick helix, in which all bases are in the anti-glycosidic conformation; the “5′-face” is the face that points toward the 5′-end of the strand and the “3′-face” the one toward the 3′-end of the strand [24].

Accordingly, an aa_fg can stack on the 5’- or 3’-face of the base (Figure 5.13).
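Deciding which face an amino acid stacks on reduces to the sign of the projection of the amino acid center onto the base normal. In the hypothetical sketch below, which face the normal points toward is passed in explicitly (normal_points_to), because the actual sign convention depends on how the base reference frame is defined; that mapping is an assumption, not taken from the thesis code.

```python
def stacking_face(base_center, base_normal, aa_center, normal_points_to="3'"):
    """Return "5'" or "3'" for the face of the base that aa_center lies on.

    Assumption: base_normal points toward the face named by normal_points_to.
    """
    side = sum((a - b) * n for a, b, n in zip(aa_center, base_center, base_normal))
    if side == 0:
        return None  # in the base plane: not a face contact
    other = "5'" if normal_points_to == "3'" else "3'"
    return normal_points_to if side > 0 else other
```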

Figure 5.13 Classification of stacking interactions according to base faces. (A) Trp61 (chain c) stacks on the 5’-face of A104 and (B) Arg96 (chain p) stacks on the 3’-face of A272.

A simplified overview of the steps involved in the base-aa interaction classification is presented in Figure 5.14. The algorithm is divided into two broad flowcharts for easier perusal. Figure 5.14A shows the parsing of data and the initial screening of all combinations of base and amino acid residues according to a center-to-center distance cut-off (7 Å). Following this initial screening, base-aa pairs fulfilling the criterion are translated and rotated so that the base is positioned in a standard orientation, and the associated amino acid coordinates are transformed accordingly.
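The translate-and-rotate step can be sketched as follows. This is a hypothetical helper, not the FR3D implementation; it assumes the base rotation matrix is stored as a 3x3 nested list whose columns are the base-frame axes expressed in lab coordinates, with the glycosidic nitrogen as the origin. Under that convention each point p maps to rotation^T (p - origin).

```python
def to_base_frame(points, origin, rotation):
    """Express lab-frame points in a base's local frame.

    origin: position of the glycosidic nitrogen; rotation: 3x3 nested list
    whose columns are the base-frame axes in lab coordinates (assumed convention).
    Computes rotation^T (p - origin), i.e. (p - origin) @ rotation for row vectors.
    """
    local = []
    for p in points:
        d = [p[k] - origin[k] for k in range(3)]
        local.append(tuple(sum(d[j] * rotation[j][i] for j in range(3)) for i in range(3)))
    return local
```

Because a rotation preserves distances and angles, applying the same transformation to the base and to its paired amino acid keeps their relative geometry intact while making the later tests (tilt, face, edge sector) simple comparisons of coordinates.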

Figure 5.14B describes the actual detection of interactions and their annotation according to predefined geometric conditions and restrictions, based on the structural properties of each type of interaction and the chemical properties of the residues involved. The Python code for detecting base-aa_fg and base-peptide interactions is available at https://github.com/poorna11/Python-scripts/blob/master/RNA-protein_new.py. Code for detecting aa_fg interactions with the RNA backbone (ribose stacking and interactions with phosphates) is available at https://github.com/poorna11/Python-scripts/blob/master/RNA-peptide_bb.py.


Figure 5.14 A Flowchart showing parsing of nucleoprotein complex structural data and initial screening condition to isolate base-amino acid pairs.


Figure 5.14 B Flowchart showing geometric conditions applied to detect and annotate different types of interaction.

5.6 Results and discussion

For a test run, we selected a limited dataset of bacterial SSUs (PDB ID: 3I8G, 2AW7), LSUs (PDB ID: 4QCN, 2QBG, 4IO9), mitochondrial SSUs (PDB ID: 5AJ3, 3J9M), an archaeal LSU (PDB ID: 1S72) and structures of entire ribosomes (PDB ID: 4YBB, 4V6F). For all figures, high-resolution structures from disparate sources were used (E. coli: 4YBB; T. thermophilus: 4V6F; S. scrofa mitochondrial small subunit: 5AJ3; H. marismortui large subunit: 1S72). The nucleotide-amino acid interactions detected were annotated as pseudopair, stacked, bidentate or perpendicular.

5.6.1 Statistics of occurrence of different types of interactions

a) Amino acid functional group interaction with RNA backbone:

Interactions between amino acid functional groups and the RNA backbone dominate RNA-protein interactions. Although they show little specificity, most of these interactions are electrostatic in nature: positively charged (Arg, Lys) or polar (Asn, Gln, Ser, Thr) amino acids in a protein interact with the sugar-phosphate backbone of the RNA, neutralizing its long chain of negative charges and enabling compact folding of the RNA. Some stacking interactions are also observed between the ribose of a nucleotide and aromatic amino acids, Arg or His.

b) Amino acid functional group interaction with RNA base:

This category of interaction is the most significant because of its underlying specificity. The specific nature of these interactions is reflected in (i) the preference of amino acids to interact more frequently with certain bases than with others, (ii) for pseudopairs, the preference of amino acids to interact along specific edges (Watson-Crick, Hoogsteen or Sugar) of a base, and (iii) the propensity of certain types of amino acids to participate predominantly in particular interactions (discussed previously).

5.6.2 Propensity of amino acids to participate in certain types of interactions

Among the twenty naturally occurring amino acids (Gly being excluded for its lack of a sidechain), two polar uncharged ones (Asn, Gln), two acidic ones (Asp, Glu) and a basic one (Arg) possess planar functional groups with hydrogen bond donor and/or acceptor atoms. Hence, these amino acids can efficiently hydrogen bond with nucleotide bases to form pseudopairs via two hydrogen bonds. Other amino acids, such as Ser, Thr, Tyr and Lys, possess only one hydrogen bond donor/acceptor group and can form only single or bifurcated hydrogen bonds with nucleotide bases.

For base-amino acid stacking, the aromatic amino acids (Phe, Tyr, Trp) are most frequently observed stacking on bases, with phenylalanine being the most prominent, closely followed by Arg (Figure 5.15).

Figure 5.15 Column graph showing the percentage of base-amino acid (functional group) stacking interactions in different ribosomal structures: E. coli (4YBB), T. thermophilus (4V6F) and S. scrofa mitochondrial small subunit (5AJ3).

5.6.3 Specific recognition of bases by amino acids

a) Sequence specificity:

Based on a limited dataset, it is difficult to conclusively formulate a set of cardinal rules for specific base-amino acid pairing, as exists for base pairing in RNA or DNA. However, preferences for particular pairings of amino acids and bases are clearly evident in our output.

Arginines strongly prefer Guanosines followed by (Figure 5.16). Histidines are found to prefer purines to pyrimidines. To a lesser extent, Asn and Gln also appear to favor purines.


Figure 5.16 Interaction of amino acid functional groups with different bases. All bases are superposed to a and transformed to a standard geometric center and orientation. Each amino acid paired with a base is transformed by the same operation as its base. The base part of the nucleotide is shown in blue and the ribose-phosphate backbone in green; the side chain of the amino acid is shown in red and the peptide backbone in yellow. These images were generated with Matplotlib.
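The superposition used for Figure 5.16 can be sketched with the standard Kabsch (SVD) least-squares fit: each base is fitted onto a reference base, and the same rotation and translation are then applied to the paired amino acid. This is a minimal sketch under assumptions: the atoms are matched one-to-one and in the same order, the reference base coordinates are supplied by the user, and the function names are hypothetical rather than taken from our programs.

```python
import numpy as np


def kabsch_transform(mobile: np.ndarray, reference: np.ndarray):
    """Return (R, t) such that mobile @ R.T + t best fits reference (least squares).

    mobile, reference: (N, 3) arrays of matched atom coordinates (e.g. base ring atoms).
    """
    mobile_center = mobile.mean(axis=0)
    reference_center = reference.mean(axis=0)
    # Covariance matrix of the two centered coordinate sets
    h = (mobile - mobile_center).T @ (reference - reference_center)
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = reference_center - mobile_center @ rotation.T
    return rotation, translation


def superpose_pair(base_xyz, amino_acid_xyz, reference_base_xyz):
    """Fit a base onto the reference base and carry its partner amino acid along."""
    rotation, translation = kabsch_transform(base_xyz, reference_base_xyz)
    return (base_xyz @ rotation.T + translation,
            amino_acid_xyz @ rotation.T + translation)
```

Because the amino acid only inherits the base's transformation, its position in the plot reflects its geometry relative to the base, which is what allows many base-amino acid pairs to be overlaid in one frame.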

b) Base edge specificity:

Each base has three edges (Watson-Crick, Hoogsteen and Sugar) along which amino acids can form hydrogen bonds when forming pseudopairs. In ribosomal RNA, the Watson-Crick edge of a base is frequently used in base pairing, making it unavailable for pseudopair formation. Only bases involved in non-Watson-Crick base pairs, or located in loops or junctions, can form pseudopairs along their Watson-Crick edge. Arginines appear to strongly prefer the Watson-Crick edge of Cytosines but the Hoogsteen edge of Guanosines (Figure 4 a), while the other bases (A, U) do not appear to show any edge specificity.

Although sugar edges of bases have not been reported to form pseudopairs frequently [19], our dataset suggests that Histidines, Asparagines and Glutamines strongly favor the sugar edge of Guanosines, followed by Cytosines and Uracils (Figure 5.16).

We have also categorized interactions based on the participating part of the amino acid or nucleotide residue. Our primary interest has been the interactions of bases with amino acid functional groups, because of their apparent base sequence and edge specificity. From our preliminary results, we have identified preferences of certain amino acids to interact with certain bases (Table 5.4).

Table 5.4 Nucleotide and edge specificity of base-aa_fg interactions as observed in our limited dataset analysis

Amino acid | Interacting part | Nucleotide | Interacting part | Interacting edge
Arginine | Functional group | Guanosine | Base | Hoogsteen
Histidine | Functional group | Guanosine/Cytosine | Base | Sugar
Glutamine | Functional group | Guanosine/Cytosine | Base | Sugar
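Preferences of the kind summarized in Table 5.4 come from tallying annotated contacts by amino acid, base and edge. The sketch below shows one way to do that tally. It assumes the annotations have already been reduced to (amino acid, base, edge) tuples, and the toy records are invented purely to exercise the code; they are not our results.

```python
from collections import Counter, defaultdict


def preference_table(annotations):
    """Fraction of each amino acid's base contacts falling in each (base, edge) class.

    annotations: iterable of (amino_acid, base, edge) tuples, e.g. ("ARG", "G", "Hoogsteen").
    """
    counts = defaultdict(Counter)
    for amino_acid, base, edge in annotations:
        counts[amino_acid][(base, edge)] += 1
    table = {}
    for amino_acid, counter in counts.items():
        total = sum(counter.values())
        table[amino_acid] = {key: n / total for key, n in counter.items()}
    return table


def preferred(table, amino_acid):
    """Most frequent (base, edge) combination for one amino acid."""
    return max(table[amino_acid].items(), key=lambda item: item[1])[0]


# Invented toy records, only to illustrate the calculation
toy = [("ARG", "G", "Hoogsteen"), ("ARG", "G", "Hoogsteen"),
       ("ARG", "C", "Watson-Crick"), ("HIS", "G", "Sugar")]
t = preference_table(toy)
print(preferred(t, "ARG"))  # ('G', 'Hoogsteen')
```

With a small dataset, such fractions should be read together with the raw counts, since a single contact can dominate the table for a rare amino acid.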


● Peptide backbone interaction with RNA base

The peptide backbone of every amino acid contains a peptide group in which the carbonyl O can act as a hydrogen bond acceptor and the amide nitrogen as a hydrogen bond donor, making the peptide group chemically similar to the side-chain amide of Asparagine or Glutamine. As expected, the base preferences of peptide backbone interactions are similar to those of Asn and Gln.

Certain amino acids, such as Serine and Threonine, use both functional group and peptide backbone atoms to make hydrogen bonds with the sugar edge of bases. Interestingly, Tyrosine cannot make similar interactions, because its bulkier phenyl ring would clash sterically with base and ribose atoms.

In addition to the interactions described above, we observe cation-pi interactions between positively charged amino acid functional groups and the pi electron clouds of bases, although such interactions are rare. We also observe van der Waals contacts leading to pseudopairs, particularly between hydrophobic amino acids and bases.

5.6.4 Usage in bioinformatics

a) Validation of structure:

The cornerstone of structural bioinformatics is the quality of the structures. Several techniques currently generate high-resolution structures of RNA-protein complexes, including X-ray crystallography, NMR spectroscopy and cryo-EM, and each has distinct advantages and caveats. At the resolutions typical of macromolecular structures, X-ray diffraction can neither resolve the positions of hydrogen atoms nor distinguish between atoms with similar numbers of electrons, such as carbon, nitrogen and oxygen (6, 7 and 8 electrons, respectively). Hence, it is difficult to differentiate between Asp and Asn, Glu and Gln, etc., and modeling the orientation of an amino acid side chain depends largely on the chemical environment of the residue in the protein [30]. A parameter that measures how well a structural model fits the underlying experimental electron density is the real-space R-value (RSR). However, there are certain issues with using RSR as a structure quality parameter, as discussed by Tickle [31]. To overcome these issues, the PDB validation pipeline computes RSR-Z, a normalized version of RSR [32]. RSR-Z provides a Z-score for each residue in a structure, allowing a more detailed, residue-level assessment of structure quality.
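RSR-Z expresses a residue's RSR as a Z-score against a reference distribution. As we understand it, the wwPDB pipeline derives its reference statistics per residue type and resolution range [32]; the sketch below simply takes a reference mean and standard deviation as given inputs. The threshold RSRZ > 2 for flagging an outlier is assumed here to match the wwPDB convention and should be checked against the validation report documentation. The class, function names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResidueFit:
    chain: str
    residue: str  # residue type plus number, e.g. "ASN11"
    rsr: float    # real-space R-value of the residue


def rsr_z(rsr: float, reference_mean: float, reference_sd: float) -> float:
    """Z-score of a residue's RSR relative to a reference distribution."""
    return (rsr - reference_mean) / reference_sd


def flag_outliers(residues, reference, threshold=2.0):
    """Return (chain, residue, RSR-Z) for residues whose RSR-Z exceeds the threshold.

    reference maps residue type (first three letters, e.g. "ASN") to (mean, sd);
    these statistics are assumed inputs, not computed here.
    """
    flagged = []
    for r in residues:
        mean, sd = reference[r.residue[:3]]
        z = rsr_z(r.rsr, mean, sd)
        if z > threshold:
            flagged.append((r.chain, r.residue, round(z, 2)))
    return flagged


# Hypothetical numbers, only to illustrate the calculation
reference = {"ASN": (0.15, 0.05)}
residues = [ResidueFit("R", "ASN11", 0.12), ResidueFit("K", "ASN11", 0.30)]
flagged = flag_outliers(residues, reference)
print(flagged)  # [('K', 'ASN11', 3.0)]
```

Applied to the residues in an annotated interaction, such a filter would mark annotations that rest on poorly fitted side chains.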

Another widely used technique for determining 3D structures is NMR, which captures the solution-state structure of a molecule. In solution, amino acid side chains can move rapidly, giving rise to poorly resolved structural data. How closely the different models, which depict different conformational states of the molecule in solution, agree with one another is expressed as the root mean square deviation (RMSD) between them. For every deposited structure, the PDB provides several structure quality parameters, such as resolution, backbone RMSD, clashscore, RSR and RSRZ [32]. When studying an annotated interaction, it is therefore important to be aware of the structural quality of the residues involved. For instance, in Figure 5.17A, a conserved bidentate interaction in the 23S rRNA of E. coli and Thermus thermophilus shows a good fit of the model to the electron density data in the PDB validation report (colored green in Figure 5.17B).

The equivalent residues in the 23S rRNA of Deinococcus radiodurans show a rotated amino acid side chain, leading to a poor fit to the electron density (red, Figure 5.17B).
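The RMSD used to describe the spread of an NMR ensemble, mentioned above, can be computed as follows. This is a minimal sketch assuming the models are already superposed and list their atoms in the same order; averaging over all pairs of models is one common convention (another compares each model to the mean structure), and the function names are hypothetical.

```python
import numpy as np


def rmsd(model_a: np.ndarray, model_b: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays with atoms in the same order.

    Assumes the models are already superposed, as deposited NMR ensembles usually are.
    """
    diff = model_a - model_b
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def mean_pairwise_rmsd(models):
    """Average RMSD over all distinct pairs of models in an ensemble."""
    values = [rmsd(models[i], models[j])
              for i in range(len(models))
              for j in range(i + 1, len(models))]
    return sum(values) / len(values)
```

Restricting the coordinate arrays to backbone atoms gives a backbone RMSD, while including side-chain atoms exposes the greater mobility of side chains in solution.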

Figure 5.17 Conserved bidentate interaction involving equivalent residues in E. coli, T. thermophilus and D. radiodurans LSU. In E. coli (PDB: 2QBG), Asn11 (chain N, protein L17) makes a bidentate interaction with A1652 and G1653; in T. thermophilus (PDB: 4QCN), Asn11 (chain R, protein L17) makes the equivalent interaction with A1699 and G1700. The side chain of Asn11 (chain K, protein L17) in D. radiodurans (PDB: 4IOA) is rotated and fails to make a similar bidentate interaction with A1669 and G1670. PDB validation reports for (left) T. thermophilus (4QCN), chain DR, protein L17, where residue Asn11 is green, denoting no outliers in the fit to the electron density data, and (right) D. radiodurans, chain K, protein L17, where residue Asn11 is highlighted in red, denoting a high number of outliers, i.e. a poor fit.

There are web-enabled services that evaluate and help correct protein crystal structures. One widely used web server is MolProbity [33,34]. It provides global and residue-level assessment of protein models using optimized hydrogen placement and all-atom contact analysis, along with geometric criteria such as bond lengths and torsion angles. The resulting diagnostics help the user rebuild the model with suggested corrections (such as amino acid side-chain flips) and the addition of hydrogen atoms. A future goal of our group is to automate structural validation by MolProbity before structures are used as input for the RNA-protein interaction annotation algorithms.

b) Identifying phylogenetically conserved motifs:

Systematic annotations are excellent tools for recognizing phylogenetically conserved RNA-protein interaction motifs. For instance, E. coli and T. thermophilus are related bacteria and share homologous ribosomal proteins. Table 5.5 shows a partial listing of RNA-protein interactions in the 30S subunits of E. coli (PDB: 4YBB, chain AA) and T. thermophilus (PDB: 4V6F, chain BA) ribosomes. Of the 88 interactions annotated in Thermus, 74 have equivalent interactions in E. coli, and 15 of these 74 are poorly modeled in E. coli. No annotation errors were found when the results were verified by manual inspection of the crystal structures.
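A comparison of this kind reduces to matching annotated interactions across structures. With the counts above, 74 of 88 Thermus interactions (about 84%) have an E. coli equivalent, and 15 of those 74 (about 20%) are poorly modeled in E. coli. The sketch below assumes interactions are keyed by 16S residue and rProtein, so that differing amino acid numbering (for example Gln29 vs. Phe32 in uS12) does not prevent a match. The optional nucleotide mapping stands in for a residue correspondence we do not specify here, and the function name is hypothetical. The example records are a few rows of Table 5.5.

```python
def compare_annotations(thermus, ecoli, nt_map=None):
    """Split Thermus interactions into those with and without an E. coli equivalent.

    thermus, ecoli: dicts mapping (nucleotide, rProtein) to an interaction type.
    nt_map: optional dict translating Thermus nucleotide IDs to E. coli IDs
            (hypothetical; identical numbering is assumed when an ID is absent).
    """
    nt_map = nt_map or {}
    equivalent, missing = [], []
    for (nt, protein), kind in thermus.items():
        partner = (nt_map.get(nt, nt), protein)
        if partner in ecoli:
            equivalent.append((nt, protein, kind, ecoli[partner]))
        else:
            missing.append((nt, protein, kind))
    return equivalent, missing


# A few rows of Table 5.5, keyed by 16S residue and rProtein
thermus = {("A33", "uS12"): "pseudopair", ("U189", "uS17"): "cation stacking",
           ("A729", "uS15"): "bidentate"}
ecoli = {("A33", "uS12"): "pseudopair", ("A729", "uS15"): "bidentate",
         ("A560", "uS5"): "aromatic stacking"}
equivalent, missing = compare_annotations(thermus, ecoli)
print(len(equivalent), missing)  # 2 [('U189', 'uS17', 'cation stacking')]
```

Keeping both interaction types in each matched record makes it easy to spot equivalent contacts whose classification differs between species, such as C875 with uS8 in Table 5.5.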


Table 5.5 Comparison of RNA-protein interactions in E. coli and T. thermophilus small subunit ribosome

E. coli 16S rRNA residue | rProtein | rProtein residue | Interaction type | Edge | T. thermophilus 16S rRNA residue | rProtein | rProtein residue | Interaction type | Edge
A33 | uS12 | Gln29 | Pseudopair | Sugar edge | A33 | uS12 | Phe32 | Pseudopair | Sugar edge
G108 | bS20 | Arg10 | cation stacking | parallel stacked | G108 | bS20 | Arg15 | cation-pi | perpendicular stacked
No equivalent nt | - | - | - | - | U189 | uS17 | Arg63 | cation stacking | parallel stacked
A560 | uS5 | Tyr128 | Aromatic stacking | - | U560 | uS5 | Absent | - | -
C620 | uS4 | Ile132 | stacking | parallel stacked | C620 | uS4 | Leu135 | stacking | parallel stacked
G667 | uS15 | His42 | Minor groove | - | G667 | uS15 | Absent | - | -
A729 | uS15 | His51 | bidentate | Hoogsteen edge | A729 | uS15 | His51 | bidentate | Hoogsteen edge
G730 | uS15 | His51 | bidentate | Hoogsteen edge | G730 | uS15 | His51 | bidentate | Hoogsteen edge
C875 | uS8 | Asn16 | Minor groove | Sugar edge | C875 | uS8 | Asn15 | bidentate | Sugar edge

c) Identifying adaptive features in response to extreme environment:

Detailed analysis of RNA-protein interactions can also provide useful insights into adaptations made by organisms living in extreme environments. A well-known example is the increased number of hydrophobic RNA-protein contacts in thermophiles, which increases the thermal stability of their ribosomes. Another interesting test case is the mammalian mitochondrial (mmt) ribosome, which is exposed to high concentrations of reactive oxygen species (ROS) that damage ribosomal RNA. To protect the rRNA from solvent-borne ROS, ribosomal proteins in the mmt ribosome are positioned around the rRNA to minimize its solvent exposure. rProteins also compensate for missing RNA elements in the mmt ribosome. The adaptive features of the mmt ribosome and the role of RNA-protein interactions in it are the main focus of Chapter 6 of this dissertation.

5.7 Conclusion

Here I have presented an algorithm to systematically detect and annotate nucleotide-amino acid interactions. Our preliminary results suggest that amino acids interact with nucleotides through (i) direct or water-mediated hydrogen bonding, forming pseudopairs; (ii) van der Waals interactions, arising either from stacking of amino acid side chains on the bases or from atomic contacts, particularly between hydrophobic amino acids and bases; and (iii) cation-pi interactions, although these form a minor fraction of the repertoire of possible interactions.

We hope this work not only elucidates the details of specific recognition of ribosomal RNA by ribosomal proteins at the residue level, but also aids the fitting of amino acids to electron density data, which is often compromised in ribosomal structures. Additionally, we expect that efficient detection and annotation of nucleotide-amino acid interactions will lead to the discovery of recurrent motifs that enforce RNA-protein recognition; the recurrence of such motifs would signify their importance in strengthening RNA-protein interactions. I am currently automating RNA-protein interaction annotation in the FR3D pipeline, which stores annotated base-amino acid pair information in the BGSU RNA database. In the future, we plan to integrate nucleotide-amino acid interaction detection into our current suite of programs that search for recurrent RNA 3D motifs (http://rna.bgsu.edu/webfr3d/geometric.php), enabling users to search for nucleotide-amino acid 3D motifs in a similar fashion.

REFERENCES

1. Crick FH (1958) On protein synthesis. pp. 8.
2. Jacob F, Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. Journal of molecular biology 3: 318-356.
3. Ptashne M (1967) Specific binding of the lambda phage repressor to lambda DNA. Nature 214: 232-234.
4. Pabo CO, Lewis M (1982) The operator-binding domain of lambda repressor: structure and DNA recognition. Nature 298: 443-447.
5. Anderson WF, Ohlendorf DH, Takeda Y, Matthews BW (1981) Structure of the cro repressor from bacteriophage lambda and its interaction with DNA. Nature 290: 754-758.
6. McKay DB, Steitz TA (1981) Structure of catabolite gene activator protein at 2.9 A resolution suggests binding to left-handed B-DNA. Nature 290: 744-749.
7. Ohlendorf D, Anderson W, Fisher R, Takeda Y, Matthews B (1982) The molecular basis of DNA–protein recognition inferred from the structure of cro repressor.
8. Tan R, Chen L, Buettner JA, Hudson D, Frankel AD (1993) RNA recognition by an isolated alpha helix. Cell 73: 1031-1040.
9. Battiste JL, Tan R, Frankel AD, Williamson JR (1994) Binding of an HIV Rev peptide to Rev responsive element RNA induces formation of purine-purine base pairs. Biochemistry 33: 2741-2747.
10. Oubridge C, Ito N, Evans PR, Teo CH, Nagai K (1994) Crystal structure at 1.92 A resolution of the RNA-binding domain of the U1A spliceosomal protein complexed with an RNA hairpin. Nature 372: 432-438.
11. Draper DE (1999) Themes in RNA-protein recognition. Journal of molecular biology 293: 255-270.
12. García-Ramos JC, Galindo-Murillo R, Cortés-Guzmán F, Ruiz-Azuara L (2013) Metal-based drug-DNA interactions. Journal of the Mexican Chemical Society 57: 245-259.
13. Steitz TA (1993) Similarities and differences between RNA and DNA recognition by proteins. Cold Spring Harbor Monograph Series 24: 219-219.
14. Parge HE, Schneider M, Hahn V, Saenger W, Altschmied L, et al. (1984) Crystallization of and preliminary X-ray diffraction data for TET-repressor and the TET-repressor-tetracycline complex. Journal of molecular biology 180: 1189-1191.
15. Privalov PL, Dragan AI, Crane-Robinson C, Breslauer KJ, Remeta DP, et al. (2007) What drives proteins into the major or minor grooves of DNA? Journal of molecular biology 365: 1-9.
16. Rohs R, Jin X, West SM, Joshi R, Honig B, et al. (2010) Origins of specificity in protein-DNA recognition. Annual review of biochemistry 79: 233-269.
17. Burd CG, Dreyfuss G (1994) Conserved structures and diversity of functions of RNA-binding proteins. Science 265: 615-620.
18. Seeman NC, Rosenberg JM, Rich A (1976) Sequence-specific recognition of double helical nucleic acids by proteins. Proceedings of the National Academy of Sciences 73: 804-808.
19. Kondo J, Westhof E (2011) Classification of pseudo pairs between nucleotide bases and amino acids by analysis of nucleotide–protein complexes. Nucleic acids research: gkr452.
20. Feldblum ES, Arkin IT (2014) Strength of a bifurcated H bond. Proceedings of the National Academy of Sciences 111: 4085-4090.
21. Leontis NB, Westhof E (1998) Conserved geometrical base-pairing patterns in RNA. Quarterly reviews of biophysics 31: 399-455.
22. Kondo J, Westhof E (2010) Base pairs and pseudo pairs observed in RNA–ligand complexes. Journal of Molecular Recognition 23: 241-252.
23. Šponer J, Šponer JE, Petrov AI, Leontis NB (2010) Quantum chemical studies of nucleic acids: can we construct a bridge to the RNA structural biology and bioinformatics communities? The journal of physical chemistry B 114: 15723.
24. Sarver M, Zirbel CL, Stombaugh J, Mokdad A, Leontis NB (2008) FR3D: finding local and composite recurrent structural motifs in RNA 3D structures. Journal of mathematical biology 56: 215-252.
25. Moore PB (1999) The RNA folding problem. Cold Spring Harbor Monograph Series 37: 381-402.
26. Desiraju GR, Gavezzotti A (1989) Crystal structures of polynuclear aromatic hydrocarbons. Classification, rationalization and prediction from molecular structure. Acta Crystallographica Section B: Structural Science 45: 473-482.
27. Hoffman MM, Khrapov MA, Cox JC, Yao J, Tong L, et al. (2004) AANT: The amino acid–nucleotide interaction database. Nucleic acids research 32: D174-D181.
28. Greber BJ, Ban N (2016) Structure and function of the mitochondrial ribosome. Annual review of biochemistry.
29. Benner SA (2016) Unusual Hydrogen Bonding Patterns and the Role of the Backbone in Nucleic Acid Information Transfer. ACS Publications.
30. Rhodes G (2010) Crystallography made crystal clear: a guide for users of macromolecular models. Academic press.
31. Tickle IJ (2012) Statistical quality indicators for electron-density maps. Acta Crystallographica Section D: Biological Crystallography 68: 454-467.
32. Gore S, Velankar S, Kleywegt GJ (2012) Implementing an X-ray validation pipeline for the Protein Data Bank. Acta Crystallographica Section D: Biological Crystallography 68: 478-483.
33. Deis L, Verma V, Videau L, Prisant M, Moriarty N, et al. (2013) Phenix/MolProbity hydrogen parameter update. Comput Crystallogr Newsl 4: 9-10.
34. Chen VB, Arendall WB, Headd JJ, Keedy DA, Immormino RM, et al. (2010) MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallographica Section D: Biological Crystallography 66: 12-21.


CHAPTER 6. THE EVOLUTIONARY PATH OF THE MAMMALIAN MITOCHONDRIAL RIBOSOME: HOW TO FOLD RNA WITH LESS GUANINES

This chapter is adapted from a manuscript in progress by Poorna Roy, Maryam Hosseini, Eric Westhof, Marie Sissler and Neocles Leontis

6.1 Mitochondria: The site of ROS production

Mitochondria are the sites of cellular respiration and are responsible for generating 90% of the energy used by mammalian cells [1]. During oxidative phosphorylation, proteins in the electron transport chain reduce oxygen to water, one electron at a time. This generates partially reduced intermediates such as the superoxide anion (O2•−), the hydroxyl radical (•OH) and hydrogen peroxide (H2O2) [2]. These intermediates, collectively known as reactive oxygen species (ROS), are produced in the inner mitochondrial membrane, which is also the site of protein synthesis by mitochondrial ribosomes. These ribosomes are distinct from those in the cytosol. In mammals, they are responsible for translating just thirteen core hydrophobic polypeptides, subunits of the respiratory chain complexes that are embedded in the mitochondrial inner membrane [3]. A major peculiarity of the mammalian mitochondrial (mmt) translational apparatus is that all of its RNA components are encoded by the mitochondrial genome (mmtDNA), while all required proteins (including ribosomal proteins and translation factors) are encoded by the nuclear DNA, translated in the cytosol, and imported into mitochondria [4-6]. Notably, the two genomes accumulate mutations at different rates [7,8]. The high mutation rate and extensive compaction of mmtDNA have led to large-scale reduction or alteration of all mmtDNA-encoded RNA components: i) the thirteen mRNAs lack, or have drastically reduced, 5'- and 3'-UTRs; ii) many of the 22 mmt-tRNAs have reduced D- and T-loops and lack some of the otherwise highly conserved residues that promote 3D folding in canonical tRNAs; iii) the two mmt-rRNAs (12S SSU and 16S LSU rRNA) are significantly reduced in size compared to their ancestral bacterial homologs (16S SSU and 23S LSU rRNA), the 12S mmt rRNA, for example, being just half the length of the bacterial 16S rRNA; and iv) most of the mmt-RNAs are significantly enriched in A, U and C nucleotides at the expense of G.

6.2 ROS damage to mitochondrial ribosomes

Mmt ribosomes have garnered much interest because of their unusual characteristics. They present a unique case study in the molecular evolution of a truncated, G-poor RNA in a highly oxidizing environment. RNA, like DNA, is subject to oxidative damage from exposure to ROS, and oxidative damage to RNA (mRNA, rRNA or tRNA) can inactivate the molecule [9]. Because mitochondrial ribosomes are constantly exposed to ROS, they are susceptible to the various deleterious processes that result from excessive ROS concentrations. Apparently to avoid the damage that would result from synthesis of defective proteins, mmt ribosomes are turned over much more rapidly than cytosolic ribosomes (3-5 hours vs. 5 days) [10]. Neither the level of damage that triggers recycling of mmt ribosomes nor the mechanism that detects this damage is known.

Recent cryo-EM structures of human and porcine mmt-ribosomes at near-atomic resolution provide striking structural data and a basis for a comprehensive biochemical understanding of this evolution. Our aim in this contribution is to address a central question: how is it possible to fold a complex RNA with fewer Gs? First, we analyze how architectural RNA features are maintained in the mmt small subunit (SSU) 12S rRNA despite the loss of several RNA parts and contacts, with the goal of delineating the limits of reduction of the mmt-RNAs and the mechanisms of potential compensation through increased protein content. This knowledge should add to our present views of RNA structural modules and of how they interact with other ribosomal components and substrates to maintain folding and stability. Second, we analyze the changes in base composition of the mmt-SSU relative to the bacterial SSU from which it is derived, most notably the massive decrease in Gs in the mmt-SSU and the dramatic changes in the distribution of the remaining conserved Gs. In addition, we analyze how individual amino acids stabilize RNA folding through specific interactions with the remaining, G-poor RNA elements.

6.3 Methods

6.3.1 Analysis of 3D Structures

We analyzed high-quality representative structures of the E. coli SSU (4YBB, 2.8 Å [11]; 2AW7, 3.5 Å [12]) and of T. thermophilus (4V6F, 2.5 Å), identified in the representative set of non-redundant structures [13]. The T. thermophilus structure includes three tRNAs, one in each of the three standard functional sites. As no high-resolution X-ray structures of mitochondrial ribosomes are available, we examined cryo-EM structures of bovine (3J6V, 7 Å [14]), porcine (5AJ3, 3.6 Å [15]) and human (3J9M, 3.5 Å [16]) mt-ribosomes, the porcine and human structures being at near-atomic nominal resolution (3-4 Å). Structures were visualized in SwissPDBViewer, and homologous proteins and RNA helices were colored identically in each structure to facilitate superposition and visualization of common elements.

Analysis of Sequence Alignments. Curated rRNA sequence alignments maintained by Robin Gutell's laboratory (http://www.rna.icmb.utexas.edu/DAT/3C/Alignment/) were accessed using the newly developed R3D-2-MSA web service [17]. The bacterial SSU alignment and the mitochondrial SSU alignments were used. This server makes it possible to access the columns of the alignments using nucleotide identifiers from 3D PDB structures. The mitochondrial alignments consist of 899 sequences, of which 308 are from mammalian mitochondria, comprising diverse and non-redundant representatives of mammalian evolution. The bacterial SSU alignment consists of 35,998 sequences, of which 1,228 are from E. coli.

The R3D-2-MSA Alignment Server was used to determine the frequency of each base at each position of the mammalian mitochondrial and bacterial alignments. The PDB identifier (2AW7 for the E. coli SSU and 5AJ3 for the mmt-SSU in our study) and the nucleotide number were given as input for the sequence search, and the output was the percentage occurrence of each type of nucleobase (A, U, G or C) at that position in the entire bacterial or mmt alignment. To simplify the search, a Python program was developed to connect to the R3D-2-MSA Alignment Server and retrieve the nucleotide variation at all positions in the alignment. Documentation for programmatic access to the R3D-2-MSA Alignment Server and for searching sequence variation is available on the BGSU RNA Bioinformatics website (link).
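The quantity the server reports for each position, the percentage of A, U, G and C in one alignment column, can be illustrated locally. This sketch does not call the R3D-2-MSA service or reproduce its API. It assumes the aligned sequences are already in memory as equal-length strings, ignores gaps and ambiguity codes, and treats T as U; the function name and toy sequences are hypothetical.

```python
from collections import Counter


def column_base_percentages(aligned_sequences, column):
    """Percentage of A, U, G and C at one alignment column (0-based index).

    Gaps and ambiguity codes are ignored and T is counted as U.
    This is a local illustration, not the R3D-2-MSA web service.
    """
    counts = Counter()
    for seq in aligned_sequences:
        base = seq[column].upper().replace("T", "U")
        if base in ("A", "U", "G", "C"):
            counts[base] += 1
    total = sum(counts.values())
    return {b: (100.0 * counts[b] / total if total else 0.0) for b in "AUGC"}


# Toy alignment, invented for illustration
seqs = ["AUG-C", "AUGAC", "ACG-C", "GUG-C"]
print(column_base_percentages(seqs, 1))  # {'A': 0.0, 'U': 75.0, 'G': 0.0, 'C': 25.0}
```

Excluding gaps from the denominator is a design choice; including them would instead report how often a position is absent from the alignment.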

To compare the conservation of nucleobases in bacterial SSU rRNA and mmt-SSU rRNA, the equivalent nucleotides in the E. coli SSU (PDB entry: 2AW7) and the mmt-SSU (PDB entry: 5AJ3) were determined manually using Swiss PDB Viewer. Nucleotides that had an equivalent nucleotide at the same position in the other structure were grouped as “Core Elements”, and those with no equivalent nucleotide were grouped as “Peripheral Nucleotides”. All nucleobases were also manually categorized by the type of secondary structure element they belong to: flanking base pair, inter-helical base pair, hairpin loop, internal loop or multi-helix junction loop nucleotides.

Replacement of eliminated and truncated elements by rProteins: Bacterial and mmt-SSU ribosomes (PDB entries: 5AJ3 for mmt-SSU and 4YBB, chain AA for E. coli SSU) were superposed to determine which rRNA helices or rProteins that are eliminated or truncated in the mmt-SSU spatially coincide with rProteins in the mmt-SSU. Distinct elements were selected for accurate superposition of the two ribosomal structures: for “head” elements, helix 44 from the bacterial and mmt-SSU was superposed, while for body elements, h30 was superposed. The volumes of the rRNA helical elements and of the replacing rProtein elements were determined with the 3V (Voss Volume Voxelator) web server [18], using a probe radius of 1.4 Å and a high grid resolution. A similar volume calculation was used to determine the extent of expansion of mitochondrial rProteins homologous to bacterial rProteins.

6.3.2 Assessment of degree of conservation

The degree of conservation of RNA-RNA interactions is assessed by comparative sequence and structure analysis. The sequences of equivalent motifs in bacterial and mmt-rRNA are compared on the basis of isostericity matrices (IM) for non-Watson-Crick base pairs, as previously described [19], to determine whether base substitutions in loops of the mmt-SSU can potentially form base pairs isosteric to those observed in the parent 3D structure, thus preserving the structures of internal and hairpin loops and therefore the geometries of long-range interactions that they form.

Geometric analysis of RNA-protein interactions. Specific interactions between nucleotide and amino acid residues were determined on the basis of the stereochemistry of the amino acid side chain or peptide backbone and the part of the nucleotide participating in the interaction. A suite of Python programs was developed to detect, classify and annotate residue-level RNA-protein interactions in ribosomes (Roy et al., manuscript in preparation). Our programs analyze mmCIF files (PDB entries: 5AJ3 for mmt-SSU and 4YBB, chain AA for E. coli SSU) to detect and classify RNA-protein interactions as (i) RNA base-amino acid stacking interactions; (ii) complex “bidentate” interactions, in which an amino acid residue binds to two stacked bases simultaneously; and (iii) edge-to-edge interactions forming “pseudopairs” through two hydrogen bonds from the same part of the amino acid to the nucleotide. We further categorize these three types of interactions by the part of the nucleotide (base or sugar-phosphate backbone) and the part of the amino acid (side chain or peptide backbone) that participates.
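The three-way classification described above can be illustrated with a simplified, rule-based sketch. It assumes that candidate hydrogen bonds and base stacking have already been detected by other means, and that the set of mutually stacked nucleotide pairs is known. The input format and function name are hypothetical and do not reflect the data model of our programs.

```python
from collections import Counter


def classify_contact(hbonds, stacked_on_base, stacked_nucleotide_pairs):
    """Return labels for one amino acid's contacts with RNA (simplified sketch).

    hbonds: list of (amino_acid_part, nucleotide_id), one entry per H-bond made by
            the residue; amino_acid_part is "sidechain" or "backbone".
    stacked_on_base: True if the residue stacks on a base (detected elsewhere).
    stacked_nucleotide_pairs: set of frozensets of nucleotide IDs stacked on each other.
    """
    labels = []
    if stacked_on_base:
        labels.append("stacking")
    nucleotides = {nt for _, nt in hbonds}
    # Bidentate: the residue H-bonds to two nucleotides that stack on each other
    if any(frozenset((a, b)) in stacked_nucleotide_pairs
           for a in nucleotides for b in nucleotides if a < b):
        labels.append("bidentate")
    # Pseudopair: two H-bonds from the same part of the residue to one nucleotide
    if any(n >= 2 for n in Counter(hbonds).values()):
        labels.append("pseudopair")
    return labels


print(classify_contact([("sidechain", "A1652"), ("sidechain", "G1653")], False,
                       {frozenset(("A1652", "G1653"))}))  # ['bidentate']
```

Returning a list of labels rather than a single class allows a residue to carry more than one annotation, for example stacking on one base while hydrogen bonding to another.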

Geometric analysis of protein-protein interactions. Protein-protein contacts were determined by careful inspection of the three-dimensional structures of the porcine mitochondrial SSU ribosome (PDB entry 5AJ3) and the E. coli SSU ribosome (PDB entry 4YBB). Amino acid residues within 4 Å of any atom of another protein chain were defined as being in contact.
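The 4 Å contact criterion can be applied automatically as a distance check between the atoms of two chains. This is a minimal sketch assuming atom coordinates and per-atom residue labels have already been parsed from the mmCIF file. It uses a dense NumPy distance matrix, which is adequate for pairs of single protein chains; a spatial index such as a k-d tree would scale better. The function name and toy coordinates are hypothetical.

```python
import numpy as np


def contacting_residues(coords_a, residues_a, coords_b, cutoff=4.0):
    """Residues of chain A with any atom within `cutoff` Å of any atom of chain B.

    coords_a: (N, 3) atom coordinates of chain A; residues_a: N residue labels, one
    per atom; coords_b: (M, 3) atom coordinates of chain B.
    """
    # Dense N x M matrix of interatomic distances
    dists = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    close_atoms = np.any(dists <= cutoff, axis=1)
    return sorted({residues_a[i] for i in np.flatnonzero(close_atoms)})


# Toy coordinates (Å), invented for illustration
coords_a = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 3.5, 0.0]])
residues_a = ["K12", "K12", "R40"]
coords_b = np.array([[0.0, 0.0, 3.9], [0.0, 3.5, 1.0]])
print(contacting_residues(coords_a, residues_a, coords_b))  # ['K12', 'R40']
```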

Analysis of Solvent Accessible Surface Areas. The solvent accessible surface area (SASA) of the 12S rRNA from the mmt-SSU (PDB entry: 5AJ3) and the 16S rRNA from the E. coli SSU (PDB entry: 4YBB, chain AA) was calculated for each atom in the presence and absence of the associated rProteins, using Gerstein's accessible surface algorithm (REF) available on the High-Performance Computing server at NIH. The probe radius used for solvent accessibility calculations was 1.4 Å, typical for a water molecule. The calculated per-atom SASA values were then summed with a Python program to obtain the total SASA of each RNA nucleobase (excluding sugar-phosphate backbone atoms).

SASA for each type of nucleobase (A, U, G or C) was normalized against the SASA for that base in isolation from the ribosome to determine the percent SASA of the nucleobase in the ribosome.
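The summation and normalization steps above can be sketched as follows. The sketch assumes per-atom SASA values are available as a dictionary keyed by standard PDB/mmCIF atom names for one nucleotide, with hydrogens absent. The reference SASA of the isolated base must be computed separately with the same probe radius. The function names and numbers are hypothetical and are not our program.

```python
# Sugar-phosphate atom names (PDB/mmCIF nomenclature); all other heavy atoms count as base.
BACKBONE_ATOMS = {"P", "OP1", "OP2", "OP3", "O5'", "C5'", "C4'", "O4'",
                  "C3'", "O3'", "C2'", "O2'", "C1'"}


def base_sasa(atom_sasa):
    """Sum per-atom SASA (Å^2) over base atoms only, for one nucleotide."""
    return sum(area for name, area in atom_sasa.items() if name not in BACKBONE_ATOMS)


def percent_base_sasa(atom_sasa, isolated_base_sasa):
    """Base SASA in the ribosome as a percentage of the isolated base's SASA."""
    return 100.0 * base_sasa(atom_sasa) / isolated_base_sasa


# Invented values for one purine nucleotide, only to illustrate the arithmetic
example = {"P": 10.0, "C1'": 5.0, "N9": 4.0, "C8": 6.0, "N7": 10.0}
print(percent_base_sasa(example, isolated_base_sasa=80.0))  # 25.0
```

Running the same calculation with and without the rProtein chains present gives the fraction of base surface shielded by proteins.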


6.3.3 Visualization of helical elements in 2D and 3D

Corresponding (homologous) helices of all structures are colored in a consistent way, to facilitate comparison of superposed structures. The 3D structures, colored as shown in Figure 6 and suitable for viewing with SwissPDBViewer or with PyMol, are available in the supplemental material. Eight distinct colors suffice to color most RNA structures so that no two helices, in contact in 3D space or adjacent in the 2D diagram, share the same color. The same color scheme was used as previously presented [20].
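The requirement that no two helices in contact in 3D, or adjacent in the 2D diagram, share a color is a graph-coloring problem. The text does not state how the colors were assigned; the scheme of [20] may well have been curated by hand. The sketch below swaps in a standard greedy coloring with Welsh-Powell ordering (helices visited in order of decreasing number of neighbors) as one possible way to automate it. The adjacency data and function name are hypothetical, and greedy coloring is not guaranteed to stay within eight colors, so the function raises an error when it cannot.

```python
def greedy_helix_coloring(adjacency, n_colors=8):
    """Assign colors 0..n_colors-1 so that neighboring helices get different colors.

    adjacency: dict mapping each helix to the set of helices it contacts in 3D or
    neighbors in the 2D diagram (assumed symmetric).
    Raises ValueError if the greedy pass needs more than n_colors colors.
    """
    coloring = {}
    # Welsh-Powell order: most-connected helices first, ties broken by name
    for helix in sorted(adjacency, key=lambda h: (-len(adjacency[h]), h)):
        used = {coloring[n] for n in adjacency[helix] if n in coloring}
        free = [c for c in range(n_colors) if c not in used]
        if not free:
            raise ValueError(f"greedy coloring needs more than {n_colors} colors at {helix}")
        coloring[helix] = free[0]
    return coloring


# Hypothetical contact graph: h1, h2 and h3 touch one another; h4 touches only h3
adjacency = {"h1": {"h2", "h3"}, "h2": {"h1", "h3"},
             "h3": {"h1", "h2", "h4"}, "h4": {"h3"}}
print(greedy_helix_coloring(adjacency))  # {'h3': 0, 'h1': 1, 'h2': 2, 'h4': 1}
```

Mapping the integer color classes onto the fixed palette of [20] would then keep homologous helices consistently colored across structures.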

6.4 Results and discussion

Our analysis couples comparative study of RNA sequence alignments with 2D and 3D structural analysis to identify the conserved and novel components of the mmt-ribosome. It is confined to the SSU rRNA, which mediates the crucial contacts between mRNA and tRNA to decode the mitochondrial mRNAs and ensure smooth translocation after peptide bond formation.

6.4.1 Corresponding interaction networks in bacterial and mmt SSU

Comparison of the 2D structures of bacterial and mmt-SSU rRNA, represented by E. coli 16S and S. scrofa 12S rRNA, reveals that only about half of the helical elements are conserved, while the others are significantly reduced or completely lost (gray regions in Figure 6.1). All helical elements that contain the binding sites for tRNA and mRNA (boxed in Figure 6.1) are conserved in size and 3D structure. In the mmt-SSU, the greatest losses of RNA occur in the body (~400 nts), while only ~160 nts are lost from the head. Entire helical elements are lost only from peripheral regions: h8-h10 in the lower body, h16-h17 along its side and h21 from the back; h33 and h38-h40 from the periphery of the head. By contrast, helical elements that form multi-helix junctions with conserved helices retain sufficient nucleotides to stabilize junction geometries, ensuring correct coaxial stacking. Helical elements that are reduced, but not eliminated, include h6, h7, h12-h14, h22, h26, h33, h37, h38 and h41.

Figure 6.1 Comparative 2D structures of SSU rRNAs in bacteria and mammalian mitochondria. (Left) Bacterial SSU rRNA (E. coli); (Right) mammalian mitochondrial SSU rRNA (S. scrofa). Helical elements that are eliminated or drastically reduced in the mmt-SSU rRNA are colored gray. rProteins replacing rRNA helices are shown in yellow. Long-range interactions are shown by arrows, whose color indicates the type of elements mediating the interaction. Nucleotides that bind to mRNA and tRNA are boxed. Bacterial and mmt-SSU ribosomes (PDB entries: 5AJ3 for mmt-SSU and 4YBB, chain AA for E. coli SSU) were superposed to determine which rRNA helices or rProteins that are eliminated or truncated in mmt spatially coincide with rProteins in the mmt-SSU.

Tertiary RNA-RNA interactions in 16S rRNA, 20 in the body and 12 in the head (dark arrows in Fig. 6.1), form dense networks that stabilize the head and body. Loss or reduction of RNA helical elements in 12S rRNA reduces the 32 all-RNA interactions in bacteria to just 8 in the body of the mmt-SSU and 7 in the head (black arrows in Fig. 6.1). Remarkably, the interaction networks in the mmt-SSU are maintained by replacing lost RNA-only interactions with 8 RNA-protein interactions (blue arrows in Fig. 6.1) and 6 protein-protein interactions (red arrows in Fig. 6.1), for a total of 18 conserved long-range contacts in the mmt body and 11 in the head. Just 3 interactions are absent (gray arrows in Fig. 6.1). The networks are maintained by recruitment of new mmt-rProteins or mmt-specific extensions of bacterial homologues, many of which replace lost RNA elements, in whole or in part.
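The counts above are internally consistent. The retained RNA-RNA contacts plus the new RNA-protein and protein-protein contacts equal the conserved totals for body and head, and adding the absent interactions recovers the 32 bacterial contacts:

```latex
\underbrace{(8+7)}_{\text{RNA-RNA}} + \underbrace{8}_{\text{RNA-protein}} + \underbrace{6}_{\text{protein-protein}}
  = 29 = \underbrace{18}_{\text{body}} + \underbrace{11}_{\text{head}},
\qquad
29 + \underbrace{3}_{\text{absent}} = 32 = \underbrace{20}_{\text{body}} + \underbrace{12}_{\text{head}}
```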

Where multiple helices are eliminated, as at the 4WJ formed by h7-h10, a new mmt-specific rProtein, mS34, replaces the junction and maintains several long-range interactions with other parts of the 12S rRNA. Where both a helix and the rProtein that binds to it are eliminated, as for h21 (one of the most exposed helices in bacterial SSU) and uS8, the new rProtein mS25 replaces the RNA contacts (93% volume substitution of h21) and the protein contacts (15% volume substitution of uS8) of both eliminated elements. In some cases, a new protein replaces two lost elements that interact in 16S, effectively replacing the RNA-RNA interaction with a protein. An example is the h8-h44 interaction, which is replaced by protein mS34.
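The volume analysis itself is described in Methods and is not reproduced here. As a hedged illustration of what a figure such as "93% volume substitution of h21" can mean, the sketch below estimates, on a voxel grid, the fraction of the space occupied by a lost element that a replacing protein fills once the bacterial and mmt structures have been superposed. The function names, the single hard-sphere atomic radius of 1.8 Å and the 1 Å grid spacing are hypothetical choices for this example, not the parameters of the analysis reported in this chapter.

```python
import math
from itertools import product

# Hypothetical sketch: every atom is treated as a hard sphere of one fixed radius.
def occupied_voxels(atoms, radius=1.8, spacing=1.0):
    """Return the set of grid cells whose centers lie within `radius` of any atom."""
    cells = set()
    reach = math.ceil(radius / spacing)
    for x, y, z in atoms:
        cx, cy, cz = (round(c / spacing) for c in (x, y, z))
        for dx, dy, dz in product(range(-reach, reach + 1), repeat=3):
            i, j, k = cx + dx, cy + dy, cz + dz
            # Distance from the atom to the center of voxel (i, j, k).
            center = (i * spacing, j * spacing, k * spacing)
            if math.dist((x, y, z), center) <= radius:
                cells.add((i, j, k))
    return cells

def volume_substitution(lost_element, replacement, radius=1.8, spacing=1.0):
    """Fraction (0-1) of the lost element's voxels that the replacement also occupies.

    Both inputs are lists of (x, y, z) coordinates from already superposed structures.
    """
    lost = occupied_voxels(lost_element, radius, spacing)
    new = occupied_voxels(replacement, radius, spacing)
    return len(lost & new) / len(lost) if lost else 0.0
```

Passing the same coordinates as both arguments returns 1.0, and a replacement placed far from the lost element returns 0.0. Per-element radii or a finer grid would be natural refinements of this simplification.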

Of the 15 homologous rProteins in bacterial and mmt-SSU, 7 are considerably larger in mitochondria owing to the acquisition of additional protein domains, and five of these substitute for lost rRNA domains (as determined by the volume analysis described in Methods) and bolster RNA-protein interactions, as shown in Figure 6.1. Some homologous proteins, such as bS16m, bind to RNA elements that are reduced or eliminated in mitochondria (h17), and the mito-specific extension compensates for the missing helical element. Four mmt-SSU proteins are roughly the same size in bacterial and mmt-SSU: bS6, uS12, uS14 and bS21, all of which bind to functionally important primary or secondary elements of the SSU rRNA at conserved multi-helix junctions. Some rProteins that bind to RNA elements eliminated in mmt-SSU are also lost, including uS8, which binds to h21 in bacterial SSU and is eliminated along with h21 in mmt-SSU. It is also striking that the five rProteins missing from mmt-SSU (uS4, uS8, uS13, bS19 and bS20) have few protein-protein contacts (uS4: 1, uS8: 0, uS13: 1, bS19: 2 and bS20: 0). By contrast, mmt-SSU relies on extensive protein-protein contacts to stabilize the ribosome. When two RNA elements that interact in bacterial SSU are eliminated or truncated in mmt, a protein-protein interaction compensates for the lost RNA-RNA interaction.

For example, mS26, which partly replaces the truncated h13, contacts mS34, which replaces the truncated region of h44. Other proteins bind to remnants of remodeled or reduced hairpins: mS23 binds to the remodeled h25 hairpin, and mS34 contacts the new, reduced HL of h6 and the exposed GNRA HL of h15. Of the 15 new rProteins in mmt-SSU, 6 substitute for missing RNA elements. mS33 binds to the remnant of h33, and mS34 replaces the direct contact between h13 and h44. mS35, which lies very close to the eliminated h39, and uS9m, which substitutes for 20% of h39, partly replace the contacts between h39 and h41. mS23 replaces h26 as well as the anti-Shine-Dalgarno helix (which does not form in mitochondria). Two proteins, mS35 and mS29, do not directly replace any RNA element but bind very close to the sites of missing or truncated RNA elements, h39 and h41 respectively, and strengthen the protein-protein network.

In summary, comparing the interaction networks in Figure 6.1, we see that all of the important long-range RNA-RNA interactions in bacterial SSU are either conserved in mmt-SSU or replaced by RNA-protein or protein-protein interactions, thus preserving the interaction networks in mmt-SSU. Additional protein-protein interactions further stabilize the mmt-SSU ribosome and protect the RNA from solvent (see below).

6.4.2 Selective changes in nucleotide composition of mmt-SSU rRNA

In addition to losing entire helical regions and sub-domains, the mmt-SSU rRNAs have undergone a large-scale loss of guanosines (Gs) and changes in the distribution of the Gs that remain. Figure 6.2 plots nucleotide composition against nucleotide position in the curated bacterial and mmt SSU rRNA alignments [21], with positions sorted separately for each nucleotide from highest to lowest percentage (left to right along the x-axis), so as to reveal the unusual distribution of G in mmt compared to bacterial SSU sequences.

Figure 6.2 Nucleotide composition of aligned rRNA for bacterial (dotted lines) and mmt-SSUs (continuous lines). Along the X-axis is the number of positions in the sequence alignments normalized by the total number of positions in the reference rRNA, 1530 and 960 nts for E. coli and S. scrofa, respectively. These positions are sorted by decreasing fractional nucleotide composition. Positions for each type of nucleotide (G, in red, or A, in purple) have been sorted separately. The Y-axis corresponds to the composition of each type of nucleotide by column in the alignment.

As Figure 6.2 shows, about 13% of aligned positions in mmt have 95% or higher G composition, compared to 21% for bacteria. However, just 16% of positions have >50% G (cf. 28% for bacteria) and just 23% have >15% G (cf. 43% for bacteria). G is the most abundant nucleotide in the bacterial alignment (red dotted line) but the least abundant in the mmt alignment, in which A predominates (purple solid line). Furthermore, the curve for G in mmt-SSU drops more steeply than that of any other nucleotide in either the mmt or bacterial alignments, indicating that most Gs occur at positions with high sequence conservation (>95%). Graphs for C and U are provided in the supplemental material.
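The sorted composition curves in Figure 6.2 and the threshold percentages above can be reproduced, in outline, from a column-wise count of each nucleotide. The sketch below assumes the alignment is a list of equal-length strings, ignores gap characters ('-' and '.') when computing each column's composition, and normalizes counts by a reference length supplied by the caller (1530 for E. coli or 960 for S. scrofa in Figure 6.2). These conventions and the function names are assumptions, not the exact procedure used to draw the figure.

```python
from collections import Counter

GAPS = "-."  # assumed gap symbols in the alignment

def column_fractions(alignment, base="G"):
    """Fraction of non-gap residues equal to `base` in each alignment column."""
    fractions = []
    for col in range(len(alignment[0])):
        residues = [seq[col].upper() for seq in alignment if seq[col] not in GAPS]
        counts = Counter(residues)
        fractions.append(counts[base] / len(residues) if residues else 0.0)
    return fractions

def sorted_profile(alignment, base="G"):
    """Per-column fractions sorted from highest to lowest, as plotted in Figure 6.2."""
    return sorted(column_fractions(alignment, base), reverse=True)

def share_of_columns_above(alignment, base, threshold, n_reference):
    """Number of columns with composition > threshold, divided by the reference length."""
    hits = sum(1 for f in column_fractions(alignment, base) if f > threshold)
    return hits / n_reference
```

For a toy alignment `["GGA-", "GGC-", "GAAU", "GCAU"]`, `column_fractions` gives `[1.0, 0.5, 0.0, 0.0]` for G, so one of the four columns exceeds 95% G.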

Interestingly, the loss of Gs is not accompanied by a reduction in Cs; rather, mmt sequences show a large increase in the frequency of As and a modest increase in Us (see Table 6.1). Furthermore, the variance of G composition in mmt is considerably smaller than that of the other nucleotides in mmt and of all nucleotides in bacteria.

Table 6.1 Nucleotide composition in mmt and bacterial SSU alignments. The average number of Gs in curated mmt 12S rRNA sequences is just 175±11 out of 955±10.5 total nts (18.3±1.15%), compared to 477±23 out of 1529±26 (31.4% ± 1.5%) for bacteria [17,21]

Nucleotide   Bacterial SSU alignment   Mmt SSU alignment
G            31.2 ± 1.52%              18.3 ± 1.15%
A            25.3 ± 1.55%              35.7 ± 1.47%
C            22.7 ± 1.43%              22.7 ± 1.64%
U            20.8 ± 1.40%              23.2 ± 1.55%
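The mean ± standard deviation values in Table 6.1 summarize composition across individual sequences. A minimal sketch of such a summary follows; it assumes ungapped sequences, counts T as U, ignores any other symbols, and uses the sample standard deviation. All of these are assumptions, since the exact procedure behind the table is not stated in this section, and the function names are hypothetical.

```python
from statistics import mean, stdev

BASES = "GACU"

def composition(seq):
    """Percentage of G, A, C and U in one sequence (T counted as U; other symbols ignored)."""
    residues = [b for b in seq.upper().replace("T", "U") if b in BASES]
    return {b: 100.0 * residues.count(b) / len(residues) for b in BASES}

def summarize(sequences):
    """Mean and sample standard deviation of each nucleotide's percentage across sequences."""
    per_seq = [composition(s) for s in sequences]
    summary = {}
    for b in BASES:
        values = [c[b] for c in per_seq]
        summary[b] = (mean(values), stdev(values))
    return summary
```

Applied separately to the curated bacterial and mmt sequence sets, `summarize` returns one (mean, SD) pair per nucleotide, which is the shape of each column in Table 6.1.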

6.4.3 Distribution of Gs by Structural Context

This unusual distribution of G suggests that positions conserved as G in both the mmt and bacterial alignments (>95%) have structural and functional requirements for G that are reflected in the types and numbers of interactions in both structures. Conversely, positions in which Gs occur in one alignment but not in the other should also be informative. To analyze the distribution of Gs in mmt vs. bacterial SSU, we used the superposed 3D structures to identify all nucleotide positions that are structurally equivalent and therefore alignable at the nucleotide level across bacterial and mmt rRNA sequence alignments. We identified 837 positions (out of 960) in the S. scrofa mmt-SSU rRNA that are structurally alignable to E. coli 16S rRNA. The non-alignable positions occur in regions where elements are either reduced or lost (shown in gray shading in Fig. 6.1). To analyze the sequence conservation of the aligned positions, we partitioned them into nine disjoint sets according to the degree of G conservation in each of the two alignments, high (>95%), medium (15-95%) and low (<15%), as shown in Table 6.2. These sets are color coded in Table 6.2 for use in the 2D diagrams and to facilitate discussion. We further divided each of the nine groups into three structural contexts: helix interior; flanking Watson-Crick base pairs (i.e. base pairs beginning or ending helices); and hairpin, internal or junction loops, including single-stranded linkers that join RNA domains.

The sub-totals of aligned positions by structural context are: (1) 199 positions inside helices (24% of aligned positions), (2) 264 flanking positions (31%), and (3) 374 positions in loop regions (45%). The sub-totals by %G composition in the bacterial alignments are 183 positions with >95% G, 147 with 15%-95% G, and 517 with <15% G. Of the highly conserved positions, only 40% (74) are also >95% G in the mmt alignments (colored blue in the 2D diagrams). However, this is considerably more than the number expected from a random distribution (25.1, see Table 6.2). We discuss each of these categories, identified by their colors, in the order shown in Table 6.2, paying special attention to those with significantly larger or smaller counts than expected. The numbers show that, by and large, at positions where Gs are “optional” in bacteria (or even avoided), they also tend to be so in the mmt alignments (“white” positions). The exceptions are the red, brown, and orange positions, all of which, however, occur at or below the frequencies expected randomly. Conversely, mmt positions that have >95% G in bacteria tend to have intermediate to high G composition (blue and cyan positions). Here the exceptions (yellow) occur less often than expected.
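How the "expected" counts in Table 6.2 were obtained is not stated in this section. Assuming they follow the usual independence model for a contingency table (row total × column total / grand total), the blue-cell value can be reproduced from numbers given in the text: 183 positions with >95% G in bacteria, 116 positions with >95% G in mmt (74 blue + 19 red + 23 brown, counts given later in this section), and a grand total of 847, the sum of the bacterial sub-totals 183 + 147 + 517. (Using the 837 structurally aligned positions as the total instead gives about 25.4.) The sketch also shows the three-level binning; the function names and the treatment of the exact 15% and 95% boundaries are hypothetical.

```python
def g_class(fraction):
    """Bin a column's G fraction into the three levels of Table 6.2 (boundary handling assumed)."""
    if fraction > 0.95:
        return "high"
    if fraction >= 0.15:
        return "medium"
    return "low"

def crosstab(bact_fracs, mmt_fracs):
    """Count aligned positions in each of the nine (bacterial, mmt) class combinations."""
    table = {}
    for b, m in zip(bact_fracs, mmt_fracs):
        key = (g_class(b), g_class(m))
        table[key] = table.get(key, 0) + 1
    return table

def expected_count(row_total, col_total, grand_total):
    """Expected cell count if bacterial and mmt G classes were independent."""
    return row_total * col_total / grand_total

# Blue cell: >95% G in bacteria (row) and >95% G in mmt (column).
blue_expected = expected_count(183, 74 + 19 + 23, 183 + 147 + 517)
print(round(blue_expected, 1))  # 25.1
```

The observed blue count (74) is roughly three times this expectation, which is the comparison the text draws.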


Table 6.2 Correlations of % G composition in mmt and bacterial SSU alignments in aligned positions


a) Gs conserved in both bacterial and mmt-SSU (blue positions):

As mentioned above, many of these occur within or close to the primary functional sites that bind directly to substrates. Most others occur at sites flanking 3D motifs, i.e. within hairpin, internal and MHJ loops. It is well known that GC pairs at the ends of helices stabilize them by reducing “fraying”, but an additional reason for conserving some Gs is that specific non-Watson-Crick interactions often occur at these positions. For example, G1068 in the first WC pair of h37 is universally conserved. It forms a conserved tSS tertiary pair with A1191, which is part of the same 3WJ, an interaction also present in mmt-SSU. Moreover, the combination of tertiary interactions and helical stacking serves to protect Gs such as G1068 from the reactive environment of the . Overall, sugar-edge interactions are very common at MHJs, and their maintenance appears to contribute to the positive selection for Gs at such positions.

Most blue Gs occur in flanking base pairs (41/74), forming mainly GC cWW pairs but also some GU and GA pairs. GC pairs are known to stabilize the ends of RNA helices by reducing fraying. The Turner nearest-neighbor thermodynamic parameters predict that this stabilization is greatest when the G occupies the 5’-position, regardless of which base pair follows or which base (C, A, or U) it is paired with. We find that 23% (62 of 264) of flanking positions have >95% G (corresponding to blue, red, and brown positions). Of these, 36 have 5’-G and 18 have 3’-G.

An additional explanation for why G is highly conserved at some positions is the occurrence of specific tertiary interactions, including non-WC pairs and base-backbone interactions, that are most favorable when the interacting base is G. In five base-pair families, Gs form some of the most frequent and stable non-Watson-Crick pairs: GA in tHS (70%), UG in cHS (47%), GG in cWH (33%), GA in tSS (62%) and GA in tWS (57%) pairs [22]. The high frequency of occurrence is taken to indicate greater stability. These five families account for over half of the non-WC base pairs in structured RNAs. In each of these families, isosteric base combinations in which G is replaced by A, C, or U also occur, with some loss of stability [19].

Most of the blue loop positions (9 of 16) occur within or adjacent to the primary functional sites of SSU that directly interact with tRNA and mRNA substrates. These regions are indicated in the 2D diagrams using orange (A-site), green (P-site) and magenta (E-site) boxes (Figure 6.1). These sites also contain several blue flanking positions. All blue G’s participate in extensive interactions, including WC and non-WC base pairs and base-phosphate interactions. Blue G’s in loops participate in local non-WC base pairs and account for the highest percentage of base-phosphate interactions (72% local and 17% long-range base-phosphates).

Cyan Positions (>95% conserved in bacteria and 15-95% in mmt): These positions occur more often than expected, especially in loops. In mmt, Gs are often replaced by As, and only rarely is there a rough equipartition among the four bases. These positions occur in flanking base pairs and in modules.

Yellow Positions (>95% conserved in bacteria and <15% in mmt): These positions are highly conserved as G in bacteria but are occupied by other bases in mmt. They occur less often than expected. Of the 68 positions, 25 make no tertiary interactions and 15 make tertiary interactions in bacteria with helical elements that are missing in mmt. The remaining 28 nts make 35 tertiary interactions, of which 11 have the same base-pair geometry in the two structures and 8 have different geometries. There are 16 positions that make a non-WC base pair in bacteria but not in mmt. Interactions of yellow Gs in loops are predominantly with proteins: of the 26 yellow Gs in loops, 10 (38%) interact with proteins.

Red and brown Gs: Red and brown Gs are the positions that have >95% G composition in mmt alignments but 15-95% and <15% G composition in bacterial alignments, respectively. In other words, these positions are usually not occupied by Gs in bacteria but are conserved as G in mmt. Among the 19 red and 23 brown Gs in mmt-SSU, 2 red Gs and 3 brown Gs are within or near the substrate-binding sites (boxed in Figure 6.1). Other helical elements that include several red and brown Gs are h13, h30, h34 and the hairpin loops of h43 and h45, all of which directly contact the substrate-binding sites. Seven of the 19 red Gs form tertiary interactions that are present in both bacterial and mmt structures. Two of these seven tertiary interactions are found only in T. thermophilus and not in E. coli, suggesting that mmt-SSU favors more thermally stable interactions. Two of the red Gs (G203 and G947) participate in bidentate interactions with proteins.

Seven of the 23 brown Gs form tertiary interactions, while 14 form cWW base pairs. Five of the brown Gs make GC base pairs, as seen in T. thermophilus, where E. coli has AU base pairs at the aligned positions. Another interesting observation about the brown Gs is that in 8 cWW base pairs, a CG base pair in the bacterial structure is flipped into a GC base pair in the mmt structures.

In terms of structural context, the most striking characteristic of these positions is that they are rare in loops. In both the red and brown groups, just one position from each group is in a loop, while the rest are located in helix interiors and flanking base pairs. All 19 red Gs and 22 of the 23 brown Gs engage in local WC base pairs. Two red Gs and five brown Gs make non-WC base pairs. Three red Gs (G203, G715 and G947) participate in local base-phosphate interactions, and two red Gs (G203 and G947) in long-range base-phosphate interactions.


b) Is there clustering of positions with G?

There appears to be a tendency for greater clustering of conserved Gs in mmt-SSU, either adjacent to other conserved positions along the sequence or by stacking. Most striking is the significant number of runs of three conserved paired Gs that occur in mmt-SSU, something not observed to the same degree in bacterial SSU: in h1, G6, G7, and G18; in h7, G102, G103 and G105; in h13, G165-G167; in h19, G288, G289, G485, and G391; in h28, G516, G517, G836, and G519; in h30, G553, G717, G716, and G715; in h34, G682, G683 and G618; and in h35 and h36, G642, G643, and G651.

c) How to deal with loss of G from RNA 3D Modules:

The extensive loss of bacterially conserved Gs from 3D motifs in hairpin and internal loops is striking and raises several questions. Do the 3D motifs conserved in bacterial structures retain their 3D structures in mmt-SSU? If so, how do they do so without the stabilizing interactions provided by the conserved G? In cases where this is not possible, what alternative structures are formed to mediate conserved tertiary interactions? This is an important issue, because the integrity of RNA 3D motifs is crucial to maintaining long-range contacts. Many of the contacts between primary and secondary functional helical elements involve 3D motifs, and most of these are found in hairpin and internal loops.

6.4.4 Transition from an RNA to an RNP world

Mmt ribosomes contain nearly twice as many ribosomal proteins (rProteins) as their bacterial counterparts. The mmt-SSU rProteins comprise two broad classes: those that have homologs in bacteria and those that are unique to mitochondria. New proteins and mmt-specific extensions of bacterial homologues compensate for lost RNA or protein domains. Protein uS5m (chain E) has an additional N-terminal domain that extends to the head region and substitutes for the lost domain of uS3m to interact with h34. uS9 occupies some of the space left by the disappearance of h39 and h40 and the reduction of h41, and its two additional N-terminal domains interact with the new mmt-specific protein mS35.

The new rProteins in the mmt-SSU maintain bacterial interactions with the rRNA and form new protein-protein contacts. For example, mS22 partly replaces h17, mS23 binds to the remodeled h25 hairpin, mS25 replaces the lost helical element h21 (one of the most exposed helices in bacterial SSU). Other proteins bind to remnants of reduced hairpins.

Using a suite of Python programs (Roy et al., manuscript in preparation), we analyzed specific residue-level interactions between nucleotides and amino acids in the bacterial and mmt ribosomes.

In mmt-SSU, amino acid-nucleotide interactions of all types are much more extensive (Figure 6.3): pseudo-pairs (previously described by Kondo and Westhof [23]) involving H-bonding of RNA bases with sidechains or backbone atoms; amino acid sidechain stacking; “bidentate” interactions, in which one amino acid H-bonds to two stacked bases in sequence-specific ways; minor groove interactions, in which amino acids with hydrophobic sidechains interact in the minor groove of RNA helices; perpendicular stacking by aromatic amino acids, as observed in crystal structures of benzene; and cation-pi interactions. Such RNA-protein interactions are also observed in bacterial SSU, though they are far fewer in number.
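The programs referred to above are not described in this section, so the following is only a hedged illustration of how parallel stacking and perpendicular stacking between a base ring and an aromatic side-chain ring can be told apart geometrically. It assumes ring-atom coordinates have already been extracted, takes each ring normal from its first three atoms, and uses hypothetical cutoffs (centroid distance at most 5.0 Å; interplanar angle at most 30° for stacking, at least 60° for perpendicular). A fuller treatment would fit a least-squares plane through all ring atoms and also check the lateral offset between centroids; none of these names or thresholds come from the programs used in this work.

```python
import math

# Hypothetical cutoffs for illustration only.
MAX_CENTROID_DIST = 5.0     # angstroms
PARALLEL_MAX_DEG = 30.0
PERPENDICULAR_MIN_DEG = 60.0

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def centroid(ring):
    """Mean position of the ring atoms."""
    return tuple(sum(p[i] for p in ring) / len(ring) for i in range(3))

def unit_normal(ring):
    """Unit normal of the plane through the first three ring atoms (a simplification)."""
    n = _cross(_sub(ring[1], ring[0]), _sub(ring[2], ring[0]))
    length = math.sqrt(_dot(n, n))
    return tuple(x / length for x in n)

def classify_stack(base_ring, side_chain_ring):
    """Return 'stacking', 'perpendicular', or None for a base/side-chain ring pair."""
    if math.dist(centroid(base_ring), centroid(side_chain_ring)) > MAX_CENTROID_DIST:
        return None
    # Angle between ring planes, folded into 0-90 degrees.
    cos_angle = abs(_dot(unit_normal(base_ring), unit_normal(side_chain_ring)))
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    if angle <= PARALLEL_MAX_DEG:
        return "stacking"
    if angle >= PERPENDICULAR_MIN_DEG:
        return "perpendicular"
    return None
```

Two parallel hexagons 3.5 Å apart are classified as stacking, while a ring rotated 90° about an in-plane axis and placed 4 Å away is classified as perpendicular.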

An important question regarding mmt-SSU is the extent to which the loss of RNA-RNA stacking interactions involving HL loops, resulting from the loss of RNA elements, is compensated by a gain of RNA-protein stacking interactions. There is a large increase in stacking interactions in mmt-SSU that compensates for the loss of RNA-RNA stacking interactions. Nucleotide bases exposed as a result of RNA reductions are protected by stacking of aromatic or cationic amino acid side chains originating from augmented or entirely new polypeptide chains. The number of stacking interactions involving exposed or terminal bases is 12 to 14, depending on the bacterial SSU structure, and this number increases roughly fourfold, to 44 or 48, in the two mmt-SSU structures studied. Several of these contacts are conserved between the two mmt structures, and most of them occur in the proteins unique to mmt-SSU.

While in bacteria the nucleotides interacting with amino acids are roughly equally distributed among the four bases (around 3% each), in mmt-SSU around 10% of A and C nucleotides participate in protein interactions. The modes of interaction between amino acids and nucleotides are remarkably diverse; here we emphasize two of them: stacking interactions involving aromatic amino acids and arginine residues, and pseudo-pair formation between nucleotides and amino acid side chains or the peptide backbone. There are between 21 and 25 pseudo-pairs in the mmt-SSU structures. There is a strong correlation between the number of new amino acid/nucleotide contacts and the remnants of helices that have been eliminated or reduced and therefore expose RNA bases to solvent.
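Pseudo-pairs and bidentate interactions are both defined by hydrogen bonds between base atoms and amino acid atoms. The rule set below is a deliberately simplified, hypothetical illustration, not the criteria of the programs used in this work or of Kondo and Westhof [23]: it treats any pair of polar heavy atoms within 3.5 Å as an H-bond, labels an amino acid a pseudo-pair partner if it contacts at least two atoms of one base, and labels it bidentate if it contacts two different nucleotides (the caller is assumed to pass only consecutive, stacked nucleotides). Donor/acceptor typing, H-bond angles and coplanarity checks are omitted.

```python
import math
from collections import defaultdict

# Hypothetical cutoff: maximum heavy-atom distance (angstroms) counted as an H-bond.
HBOND_MAX = 3.5

def hbond_contacts(base_atoms, aa_atoms, cutoff=HBOND_MAX):
    """Return (nucleotide, base_atom, aa_atom) triples within `cutoff`.

    base_atoms: {(nt_id, atom_name): (x, y, z)} for polar (N/O) base atoms
    aa_atoms:   {atom_name: (x, y, z)} for polar atoms of a single amino acid
    """
    contacts = []
    for (nt, b_name), b_xyz in base_atoms.items():
        for a_name, a_xyz in aa_atoms.items():
            if math.dist(b_xyz, a_xyz) <= cutoff:
                contacts.append((nt, b_name, a_name))
    return contacts

def classify_contacts(contacts):
    """Simplified label: 'bidentate', 'pseudopair' or None (hypothetical rules).

    Assumes the caller passed only consecutive, stacked nucleotides.
    """
    base_atoms_by_nt = defaultdict(set)
    for nt, b_name, _ in contacts:
        base_atoms_by_nt[nt].add(b_name)
    if len(base_atoms_by_nt) >= 2:
        return "bidentate"
    if any(len(names) >= 2 for names in base_atoms_by_nt.values()):
        return "pseudopair"
    return None
```

For example, an arginine whose NH1 and NH2 atoms lie 2.9 Å from a guanine's N7 and O6 is labeled a pseudo-pair, while a lysine NZ within 3.5 Å of the N7 atoms of two stacked purines is labeled bidentate.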

Table 6.3 Specific RNA-protein interactions detected in E. coli and porcine SSU ribosome

Interaction type                      Escherichia coli   S. scrofa
Pseudopair                            10                 17
Aromatic stacking                     1                  19
Cation stacking                       6                  18
Perpendicular stacking                0                  2
Hydrophobic stacking                  2                  11
Minor groove interactions             4                  10
Bidentate                             9                  22
Peptide backbone                      5                  12
TOTAL                                 37                 111
No. of nucleotides                    1530               960
% Nts interacting with amino acids    2.5%               11.5%

All protein interactions involving nucleobases in mmt and bacterial rRNA are mapped onto their respective 2D diagrams (Figure 6.3 A and B). As these figures show, the RNA-protein interactions are concentrated in loops and multi-helix junctions, suggesting protection of unpaired bases in these regions. Furthermore, conserved bases in loops and helices in mmt interact more with protein than in bacteria. Gs conserved in bacteria and replaced by other bases in mmt frequently interact with rProteins, where amino acids appear to bolster the bases with additional protein-RNA interactions.


Figure 6.3 RNA-protein interactions of different types involving RNA bases in the porcine mitochondrial and E. coli ribosomal SSU. Different modes of interaction are denoted by different symbols as annotated in the legend inset. Interacting amino acids are categorized by the nature of their sidechain.

None of the Gs participate in stacking interactions, though several As and Cs that replace conserved Gs of the bacterial SSU stack on aromatic amino acids. Nucleobases such as A and G have been reported to stack on tryptophan and tyrosine residues to facilitate electron transfer [24-26], so that when transiently oxidized nucleobases are produced, they can accept electrons from nearby Tyr or Trp residues and return to their fully reduced state. By contrast, only three aromatic amino acids in E. coli SSU stack onto nucleobases, and only one of these interacts at a position equivalent to one in mmt-SSU. This observation provides evidence for an adaptation in mmt-SSU to repair oxidative damage to rRNA by positioning aromatic amino acids in close proximity to the bases.

6.4.5 Shielding of ribosomal RNA from solvent-borne ROS

From the structure of mmt-SSU (PDB: 5AJ3), it is evident that the rRNA is reduced to a compact core that is heavily shielded by an extensive network of rProteins. We tested the hypothesis that rProteins in mmt-SSU envelop the 12S rRNA to minimize its exposure to the solvent and hence to solvent-borne ROS. We compared the solvent accessible surface area (SASA) of 12S rRNA from mmt-SSU and 16S rRNA from E. coli SSU, both with and without the associated rProteins, to determine the extent of rRNA shielding provided by the rProteins. These data are correlated with %G composition and the structural context of nucleotide positions in Table 6.4. The SASA of conserved residues in helices is comparable in the bacterial and mmt structures.

Helical residues that are not conserved appear to be 5-7% more solvent exposed in mmt-SSU than in bacterial SSU. On the other hand, conserved residues in loops and multi-helix junctions have lower SASA and are more protected from solvent, particularly residues conserved only in mmt-SSU, whereas loop and junction residues that are not conserved in mmt-SSU show greater exposure to solvent. Thus, positions where bases are conserved exclusively in mmt tend to be less exposed to the solvent.

We further note that conserved G’s in loops are the most protected from solvent by proteins, followed by A’s. By contrast, bases conserved only in bacteria show less protection from solvent by proteins. We also find that bases exposed to solvent as a result of the loss of RNA elements are likely to be stabilized and protected by specific interactions (pseudo-pairs, stacking or bidentate interactions) with amino acid sidechains or peptide backbones. When we map the bases in the 12S rRNA that show a 30% or greater decrease in SASA upon addition of rProteins and correlate them with the protein interaction data, ~65% of these bases interact specifically with amino acids.
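The ~65% figure combines two per-residue quantities: the fractional drop in SASA when rProteins are included, and membership in the set of bases with a specific amino acid contact. A minimal sketch of that combination follows, assuming SASA values have already been computed per residue, with and without the proteins, by some external program and stored in dictionaries keyed by residue identifier. The function names and data layout are hypothetical.

```python
def relative_decrease(sasa_rna_only, sasa_with_protein):
    """Fractional SASA drop for each residue when rProteins are added.

    Residues with zero SASA in the RNA-only calculation are skipped.
    """
    drops = {}
    for res, free in sasa_rna_only.items():
        if free > 0:
            bound = sasa_with_protein.get(res, free)
            drops[res] = (free - bound) / free
    return drops

def shielded_overlap(sasa_rna_only, sasa_with_protein, interacting, threshold=0.30):
    """Return (shielded residues, fraction of them with a specific amino acid contact)."""
    drops = relative_decrease(sasa_rna_only, sasa_with_protein)
    shielded = {res for res, d in drops.items() if d >= threshold}
    if not shielded:
        return shielded, 0.0
    return shielded, len(shielded & set(interacting)) / len(shielded)
```

Here `interacting` would be the set of nucleotides annotated with pseudo-pair, stacking or bidentate contacts; the returned fraction corresponds to the ~65% reported above when run on the 12S rRNA data.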


Table 6.4 Solvent accessible surface area of residues categorized by %G composition

In conclusion, owing to the new mmt-SSU ribosomal proteins, most exposed rRNA terminal helices are protected from direct contact with the solution. The exceptions are primary functional elements that must make direct contact with tRNA or mRNA or with the LSU. Helices that are highly exposed in bacteria and are important for SSU function are handled in mmt in two major ways, exemplified by h27 and h22. In the case of h27, a new protein wraps around the helix to stabilize and protect its 3D motifs. In the case of h22, which is especially long and exposed, the entire element is replaced by an equally extended protein.


6.5 Conclusion

Atomic-resolution structures of mmt-ribosomes facilitate detailed analysis of the secondary and tertiary structures of mmt-rRNA and the interacting rProteins. Our main aim in this chapter is to account for the remarkable differences between the structure and composition of the mmt-SSU ribosome and those of prokaryotes, as represented by E. coli SSU. The key results of our study can be summarized as follows: (i) The mmt-rRNA undergoes significant truncation, particularly in peripheral RNA elements; all helical elements that directly interact with the tRNA and mRNA substrates are largely conserved in the mmt-SSU ribosome. (ii) rProteins recurrently replace the lost rRNA elements to maintain indirect connectivity with substrate (tRNA/mRNA) binding sites while reducing exposure of rRNA to ROS. (iii) There is a dramatic change in the overall nucleotide composition of mmt-rRNA, with far fewer conserved G’s, which are the most prone to oxidation, and more A’s. Almost all remaining G’s in mmt occur in helices rather than loops. (iv) There are fewer conserved bases in mmt loops than in bacterial loops, especially G’s and U’s. Almost all Gs that are highly conserved in bacterial rRNA “loops” are changed to other bases in mmt-rRNA. Such replaced bases frequently interact with rProteins, where the amino acids appear to stabilize the loop conformation with additional protein-RNA interactions and also shield the rRNA elements from solvent. (v) A’s in mmt-SSU that substitute for conserved Gs in bacteria frequently stack on Trp residues, so that when transiently oxidized As are produced, they can accept electrons from nearby Tyr or Trp residues and return to their fully reduced state. (vi) There is much greater protection of rRNA from ROS and solvent by rProteins in mmt than in bacterial SSU. Our results highlight the concerted evolutionary adaptations in mmt-ribosomes in regulating nucleotide composition and the distribution of nucleobases in the structure, and in expanding the role of rProteins in functional networking and in protecting the rRNA from ROS exposure.


REFERENCES

1. Koc EC, Haque M, Spremulli LL (2010) Current views of the structure of the mammalian mitochondrial ribosome. Israel Journal of Chemistry 50: 45-59.
2. Turrens JF (2003) Mitochondrial formation of reactive oxygen species. The Journal of Physiology 552: 335-344.
3. Greber BJ, Ban N (2016) Structure and function of the mitochondrial ribosome. Annual Review of Biochemistry 85: 103-132.
4. Bar-Yaacov D, Blumberg A, Mishmar D (2012) Mitochondrial-nuclear co-evolution and its effects on OXPHOS activity and regulation. Biochimica et Biophysica Acta (BBA)-Gene Regulatory Mechanisms 1819: 1107-1111.
5. Allen JF, Raven JA (1996) Free-radical-induced mutation vs redox regulation: costs and benefits of genes in organelles. Journal of Molecular Evolution 42: 482-492.
6. Pesole G, Gissi C, De Chirico A, Saccone C (1999) Nucleotide substitution rate of mammalian mitochondrial genomes. Journal of Molecular Evolution 48: 427-434.
7. Brown WM, George M, Wilson AC (1979) Rapid evolution of animal mitochondrial DNA. Proceedings of the National Academy of Sciences 76: 1967-1971.
8. Castellana S, Vicario S, Saccone C (2011) Evolutionary patterns of the mitochondrial genome in Metazoa: exploring the role of mutation and selection in mitochondrial protein-coding genes. Genome Biology and Evolution 3: 1067-1079.
9. Kong Q, Lin CL (2010) Oxidative damage to RNA: mechanisms, consequences, and diseases. Cell Mol Life Sci 67: 1817-1829.
10. Gelfand R, Attardi G (1981) Synthesis and turnover of mitochondrial ribonucleic acid in HeLa cells: the mature ribosomal and messenger ribonucleic acid species are metabolically unstable. Molecular and Cellular Biology 1: 497-511.
11. Noeske J, Wasserman MR, Terry DS, Altman RB, Blanchard SC, et al. (2015) High-resolution structure of the Escherichia coli ribosome. Nature Structural & Molecular Biology 22: 336-341.
12. Schuwirth BS, Borovinskaya MA, Hau CW, Zhang W, Vila-Sanjurjo A, et al. (2005) Structures of the bacterial ribosome at 3.5 Å resolution. Science 310: 827-834.
13. Leontis NB, Zirbel CL (2012) Nonredundant 3D structure datasets for RNA knowledge extraction and benchmarking. RNA 3D structure analysis and prediction: Springer. pp. 281-298.
14. Kaushal PS, Sharma MR, Booth TM, Haque EM, Tung C-S, et al. (2014) Cryo-EM structure of the small subunit of the mammalian mitochondrial ribosome. Proceedings of the National Academy of Sciences 111: 7284-7289.
15. Greber BJ, Ban N (2016) Structure and function of the mitochondrial ribosome. Annual Review of Biochemistry.
16. Amunts A, Brown A, Toots J, Scheres SH, Ramakrishnan V (2015) The structure of the human mitochondrial ribosome. Science 348: 95-98.
17. Cannone JJ, Sweeney BA, Petrov AI, Gutell RR, Zirbel CL, et al. (2015) R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server. Nucleic Acids Research: gkv543.
18. Voss NR, Gerstein M (2010) 3V: cavity, channel and cleft volume calculator and extractor. Nucleic Acids Research: gkq395.
19. Westhof E (2014) Isostericity and tautomerism of base pairs in nucleic acids. FEBS Letters 588: 2464-2469.
20. Sweeney BA, Roy P, Leontis NB (2014) An introduction to recurrent nucleotide interactions in RNA. Wiley Interdisciplinary Reviews: RNA: n/a-n/a.
21. Gutell RR, Lee JC, Cannone JJ (2002) The accuracy of ribosomal RNA comparative structure models. Current Opinion in Structural Biology 12: 301-310.
22. Brown JW, Birmingham A, Griffiths PE, Jossinet F, Kachouri-Lafond R, et al. (2009) The RNA structure alignment ontology. RNA 15: 1623-1631.
23. Kondo J, Westhof E (2011) Classification of pseudo pairs between nucleotide bases and amino acids by analysis of nucleotide–protein complexes. Nucleic Acids Research: gkr452.
24. Morozova OB, Kiryutin AS, Yurkovskaya AV (2008) Electron transfer between guanosine radicals and amino acids in aqueous solution. II. Reduction of guanosine radicals by tryptophan. The Journal of Physical Chemistry B 112: 2747-2754.
25. Kawai H, Tarui M, Doi M, Ishida T (1995) Enhancement of aromatic amino acid-nucleic acid base stacking interaction by metal coordination to base: fluorescence study on a tryptophan-Pt (II)-guanine ternary complex. FEBS Letters 370: 193-196.
26. Helene C, Montenay-Garestier T (1971) Reflectance and luminescence studies of molecular complex formation between tryptophan and nucleic acid components in frozen aqueous solutions. Biochemistry 10: 300-306.