Research Collection
Doctoral Thesis
Structural and Functional Studies of mRNA Stability Regulators
Author(s): Ripin, Nina
Publication Date: 2018-11
Permanent Link: https://doi.org/10.3929/ethz-b-000303696
Rights / License: In Copyright - Non-Commercial Use Permitted
DISS. ETH NO. 25327
Structural and functional studies of mRNA stability regulators
A thesis submitted to attain the degree of
DOCTOR OF SCIENCES of ETH ZÜRICH
(Dr. sc. ETH Zürich)
presented by
NINA RIPIN
Diplom-Biochemikerin, Goethe University, Frankfurt, Germany
Born on 06.08.1986
citizen of Germany
accepted on the recommendation of
Prof. Dr. Frédéric Allain
Prof. Dr. Stefanie Jonas
Prof. Dr. Michael Sattler
Prof. Dr. Witold Filipowicz
2018
“Success consists of going from failure to failure without loss of enthusiasm.”
Winston Churchill
Summary
Posttranscriptional gene regulation (PTGR) is the process by which every step of the life cycle of an mRNA following transcription (maturation, transport, translation, subcellular localization and decay) is tightly regulated. This is accomplished by a complex network of RNA binding proteins (RBPs) that bind to specific mRNA elements. Such cis-acting elements include, or are found within, the 5’ cap, the 5’ untranslated region (UTR), the open reading frame (ORF), the 3’UTR and the poly(A) tail at the 3’ end of the mRNA. Adenylate-uridylate-rich elements (AU-rich elements; AREs) are heavily investigated regulatory cis-acting elements within 3’UTRs. They are found in short-lived mRNAs and function as a signal for rapid degradation. AREs are present in 5-8% of human genes, which are involved in the regulation of many essential cellular processes such as stress response, cell cycle regulation and apoptosis, and must therefore be tightly regulated. In the cytoplasm, trans-acting ARE binding proteins regulate the transport, localization, stability and translation of these mRNAs. One of these factors is the embryonic lethal abnormal visual (ELAV)/Human antigen R (HuR) protein. It increases the stability and/or the translation of many important cellular mRNAs. Another cis-acting element is the poly(A) tail of mRNAs, which protects them from degradation. It is bound by multifunctional poly(A)-binding proteins (PABPs), which play a central role in translation initiation, translation termination and mRNA decay.
In this study, we have investigated the mRNA stability regulators HuR and PABPC1, both of which contain multiple RNA recognition motifs (RRMs). Excitingly, some single RRMs have several functions owing to additional binding interfaces that allow them to bind both RNA and other factors. We characterized the C-terminal RRM of HuR, which is hypothesized to be involved in RNA binding, homo-dimerization and protein-protein interactions. We present the first crystal structure of HuR RRM3 bound to several short ARE motifs, at 1.9 Å resolution. Our structure reveals the presence of a homodimer. A combination of several biophysical techniques validates the homo-dimerization and the promiscuous RNA binding in solution. Additionally, the canonical AUUUA pentameric motifs, found in the majority of AREs, can be bound through the recognition of two registers. Excitingly, RRM3 homo-dimerization increases the affinity for RNA, highlighting the cooperativity between the two binding surfaces. Moreover, despite the known stabilizing role of HuR, we provide evidence that RRM3 counteracts this effect in a Huh7 cell-based ARE reporter assay containing multiple AUUUA motifs. Finally, we investigated the mechanism by which RRM1 of the cytoplasmic PABP binds to poly(A) and to the anti-proliferative B-cell translocation gene (BTG2) protein. BTG2 recruits the CCR4-associated factor 1 (CAF1), a subunit of the CCR4-NOT deadenylase complex, to induce deadenylation of mRNAs. We show that PABPC1 RRM1 uses its α1 helix to bind BTG2 while simultaneously binding the poly(A) RNA. This interaction seems to orient the poly(A) 3’ end such that it is close to the CAF1 enzymatic pocket.
Our findings provide new details of the HuR RRM3-RNA recognition and homo-dimerization as well as the PABPC1 RRM1-poly(A)-BTG2 binding to recruit CAF1 and thus highlight the diversity of RRMs.
Summary (translated from the German Zusammenfassung)
Posttranscriptional gene regulation (PTGR) is a process following transcription in which every step of the life cycle of an mRNA (processing, transport, translation, subcellular localization and decay) is tightly regulated. This is achieved by a complex network of RNA-binding proteins (RBPs) that bind several specific mRNA elements. Such cis-acting elements are, or lie within, the 5' cap, the 5' untranslated region (UTR), the open reading frame (ORF), the 3'UTR and the poly(A) tail at the 3' end. Adenylate-uridylate-rich elements (AU-rich elements; AREs) are heavily investigated regulatory cis-acting elements within the 3' untranslated regions (3'UTRs). They are found in short-lived mRNAs and act as a signal for rapid mRNA decay. AREs are present in 5-8% of human genes, which are involved in the regulation of many essential cellular processes such as stress response, cell cycle regulation and apoptosis, and must therefore be tightly controlled. In the cytoplasm, trans-acting ARE-binding proteins control the localization, stability and translation of these mRNAs. One of these proteins is the embryonic lethal abnormal visual (ELAV)/human antigen R (HuR) protein. It increases the stability and/or translation of many of these cellular mRNAs. Another cis-acting element is the poly(A) tail of mRNAs, which protects them from degradation. It is bound by multifunctional poly(A)-binding proteins (PABPs), which play a central role in translation initiation, translation termination and mRNA decay.
The aim of this doctoral thesis is the investigation of the mRNA-binding proteins HuR and PABPC1. Both contain several RNA-binding domains, the RNA recognition motifs (RRMs). Interestingly, some of these RRMs have numerous functions owing to the presence of multiple binding sites, which enable both RNA and protein binding. We characterized the C-terminal RRM of HuR, which is thought to be involved in RNA binding, homodimerization and protein-protein interaction. We present the first 1.9 Å crystal structure of HuR RRM3 bound to several short ARE motifs. Our structure shows the presence of a homodimer. By combining several biophysical methods, we validate the homodimerization and the RNA binding in solution. In addition, binding of the canonical AUUUA pentamer motifs, which are found in most AREs, is possible through the recognition of two binding registers. RRM3 homodimerization increases the affinity for RNA and highlights the cooperativity between the two binding surfaces. Finally, despite the known stabilizing role of HuR, we provide evidence that RRM3 counteracts this effect in a Huh7 cell-based ARE reporter assay. Furthermore, we investigated the mechanism of the cytoplasmic PABP RRM1 in binding to poly(A) and to the anti-proliferative B-cell translocation gene (BTG2) protein. BTG2 recruits the CCR4-associated factor 1 (CAF1), a subunit of the CCR4-NOT deadenylase complex, to induce deadenylation of mRNAs. We show that PABPC1 RRM1 uses α1 to bind BTG2 while simultaneously binding the poly(A) RNA. This interaction appears to orient the poly(A) 3' end such that it lies close to the CAF1 active site.
Our results provide interesting information about HuR RRM3 RNA recognition and homodimerization as well as PABPC1 RRM1-BTG2 binding to recruit CAF1, and thereby highlight the diversity of RRMs.
Acknowledgment
My time during the doctorate was the toughest challenge I ever faced. I grew as a scientist but also as a person. For that, a special thanks to Fred. You took me in under special circumstances and for that, I am very grateful. The time in your group was and still is one of the best and most valuable experiences I have had in my life. Thank you for all your support. I would like to acknowledge Prof. Stefanie Jonas, Prof. Michael Sattler and Prof. Witold Filipowicz for joining as co-referees for my thesis. Michael, I really enjoyed our discussions and your challenging questions in Parpan. Another very big thanks goes to Malgosia. You have always been there for me. Your wisdom is highly precious and I learned a lot from you. Fred D, thanks a lot for teaching me how to set up NMR experiments and the basics behind them, and especially for correcting my English whenever it was needed. Thanks to Julien for introducing me to ITC, for your support on the HuR project and for all the discussions we had. Nana and Irene, it was fun working with you. One cannot imagine better colleagues and friends in the lab. Ahmed, you are always there to help and support others. Thank you for being such a good listener. Yaro, thanks for the fun working evenings and late discussions. Thea, Gerry, Fred D, Alvar and Simon, thank you for keeping our NMR spectrometers running. Stefanie, many thanks for all your advice on the cell culture experiments and more. I wish I had met you earlier. You would have saved me from all the struggle I faced in the cell lab. To all my office mates over the years: first L24, Johannes, Georg, Kyle, Nana, Irene and Esteban, thanks a lot for all the discussions and the fun time. Then, L12 office: thanks to Laurent, Tebbe, Leonidas and Elisabeth for the great and fun atmosphere and your patience with my nagging during my thesis-writing period. Elisabeth, you do an amazing job with the AKTA! Cristina, thanks for trying to motivate me to go to yoga.
Thanks to all former and present colleagues Grégoire, Christine, Dominik, Sébastien, Antoine and our newest lab members Xing and Pengzhi, as well as all the members of the Gossert and Jonas group. It was and is always a pleasure to come to work and one of the reasons is the amazing atmosphere because of all you guys. Big thanks to Roddy and Naomi, two great students I supervised. I enjoyed and had a lot of fun working with you. I learned a lot from you too! Thank you Isabelle for your amazing help with all the administration, and Iwona and Gabi for keeping the institute running.
Additionally, I thank all my collaborators, namely Fabienne Mauxion (Institut de Génétique et de Biologie Moléculaire et Cellulaire, France) for the collaboration on the BTG2 project, Jiří Šponer and Miroslav Krepl (Institute of Biophysics of the Czech Academy of Sciences) for molecular dynamics data, Robert Schneider and Abhilash Gadi (NYU School of Medicine, USA) for validating some of our luciferase assay data, and Nicole Meisner-Kober, Alexandra Hinniger and Michael Faller (Novartis, Basel), especially Alexandra and Michael for teaching me X-ray crystallography and always being there for me. Additionally, thanks to further colleagues during my time at Novartis, César and Sascha (CPC) and Katrin, Julian, Wolf, Lena, Dominik, Anja and Cornelia (DMP), and to my flatmates from the Basler Murbi WG, Cedi, Lena and Christof.
Thanks to the ladies from the FSSB and the ladies from the NCCR peer mentoring group for all the valuable discussions and advice on the scientific environment. Thanks to all my colleagues and friends from the scientific staff associations AMB and AVETH, especially Shady, Betty, Michaela, Tanja, Markus, Rebekka, Elisa, Florian, Arik, Martin, Alok, Jenna, Michael and Alina. Also a big thanks to Antonio, Wilfred and Maryvonne. We were a great team and I learned a lot from you. It was fun working with all of you and we had some politically exciting years! Let’s see what comes next! Lastly, thanks to my climbing partners Florian, Daniel and Bettina and especially my friend Lori for always being there for me.
Finally, I would like to thank all my cousins, aunts and uncles in Germany and Russia for all their support and motivation. Above all, I thank my parents Tatjana and Julius and my sister Kristine. Without your patience, trust and support I would never have come this far or achieved so much. And once again: I never shirked the housework, I really was working! NMR protein "assignment" is not a computer game! In addition, I would like to thank my grandmother Nina. She passed away last year. She was a wonderful person, strong, self-confident and always there for her family. Thank you for everything you did for me.
Table of Contents
1. Introduction
1.1 mRNA biology
1.1.1 Post-transcriptional gene regulation
1.1.2 Translation and mRNA turnover
1.1.3 mRNA decay regulation by cis-acting AU-rich elements
1.1.4 Subcellular localization and phase separation
1.1.5 The RNA recognition motif (RRM) – a multifunctional binding scaffold
1.1.6 ELAV protein family and HuR
1.1.7 Interplay of PABP and BTG2/Tob family in mRNA decay
1.1.8 mRNA stability regulators and human disease
1.2 Investigation of protein-protein and protein-RNA interactions
1.2.1 Structural studies of macromolecules
1.2.2 Structure determination by X-ray Crystallography
1.2.3 Characterization of protein-RNA and protein-protein complexes by NMR Spectroscopy
1.2.4 Prediction of protein complexes by HADDOCK
1.2.5 ITC to study protein-RNA and protein-protein interactions
1.2.6 Dual Luciferase Reporter Assay
1.3 Research Objectives
2. Molecular basis for AU-rich element recognition and dimerization by the HuR C-terminal RRM
2.1 Abstract
2.2 Introduction
2.3 Results
2.4 Discussion
2.5 Materials and Methods
2.6 Acknowledgements
3. Structural basis of the PABPC1 RRM1-BTG2 interaction to recruit CAF1 deadenylase
3.1 Abstract
3.2 Introduction
3.3 Results
3.4 Outlook
3.5 Discussion
3.6 Materials and Methods
3.7 Acknowledgements
4. Concluding remarks
4.1 RNA recognition motifs: boring? Not at all!
4.2 The multitasking RRM3 of HuR
4.3 The multitasking RRM1 of PABPC1 binds poly(A) and BTG2 to induce deadenylation
4.4 Towards understanding mRNA stability regulators
5. Appendix
5.1 A2. Supplementary Tables Chapter 2
5.2 A2. Supplementary Figures Chapter 2
5.3 A2. Supplementary Materials and Methods Chapter 2
5.4 A3. Supplementary Tables Chapter 3
5.5 A3. Supplementary Figures Chapter 3
6. References
7. Curriculum Vitae
Abbreviations
ARE (AU)-rich element (Adenylate-uridylate-rich element)
Cryo-EM Cryo-Electron Microscopy
dsRBD double-stranded RNA binding domain
EPR Electron Paramagnetic Resonance
FL Firefly luciferase
HADDOCK High Ambiguity Driven protein-protein Docking
HEK293 human embryonic kidney cells 293
hetNOE heteronuclear nuclear Overhauser effect
HSQC Heteronuclear Single Quantum Coherence spectroscopy
Huh7 Human hepatocarcinoma cell line
IDR Intrinsically disordered region
ITC Isothermal titration calorimetry
LCD Low complexity domain
MR Molecular Replacement
miRNA microRNA
miRISC miRNA-induced silencing complex
mRNA messenger RNA
mRNP messenger ribonucleoprotein particles
MW molecular weight
nt nucleotides
NLS nuclear localization signal
NMR Nuclear magnetic resonance
NOESY Nuclear Overhauser Enhancement Spectroscopy
ORF Open reading frame
p pocket
PTGR Posttranscriptional gene regulation
PB p-bodies/processing bodies
PBS Phosphate buffered saline
ppm parts per million
PRE Paramagnetic relaxation enhancement
RBD RNA binding domain
RBP RNA binding protein
RDC Residual dipolar coupling
RISC RNA-induced silencing complex
RL Renilla luciferase
RNA Ribonucleic acid
RNP Ribonucleoprotein
RRM RNA recognition motif
RT Room temperature
SANS Small-angle neutron scattering
SAXS Small-angle X-ray scattering
SG Stress granules
SR Serine-arginine-rich protein
TNF Tumor necrosis factor
TOCSY Total Correlation Spectroscopy
UTR Untranslated region
WT Wild type
Less frequent abbreviations are defined upon their first use in the text
1. Introduction
1.1 mRNA biology
All the genetic information is stored in our DNA, which is tightly packed and protected in the cell nucleus. To execute genetic instructions, the DNA sequence is transcribed into a complementary pre-messenger RNA (pre-mRNA). The pre-mRNA is processed through multiple steps in the nucleus and the mature mRNA is transported to the cytoplasm to fulfill its functions (Figure 1.1). From the very beginning, multiple factors, including RNA binding proteins (RBPs), bind mRNAs and regulate them throughout their entire life cycle, a process called posttranscriptional gene regulation (PTGR).
Figure 1.1. mRNPs and post-transcriptional regulation. RBPs are trans-acting factors that bind cis-acting elements within mRNAs. RBPs regulate alternative mRNA splicing, maturation, transport, subcellular location, lifetime, and translation. Adapted from (García-Mauriño et al., 2017).
1.1.1 Post-transcriptional gene regulation
After transcription in the nucleus, every step in the life cycle of an mRNA is tightly regulated to produce the mature mRNA for translation in the cytoplasm. During the nuclear processing events, a 7-methylguanosine cap is added at the 5’ end, introns are spliced out and the 3’ end is polyadenylated (Figure 1.1). The mature mRNA is composed of a 5’ cap, the 5’ untranslated region (UTR), the open reading frame (ORF), the 3’UTR and the poly(A) tail at the 3’ end (Figure 1.2). These mRNAs are exported to the cytoplasm, where they are translated into proteins, stored in cytoplasmic bodies or targeted for degradation (Figure 1.1). All these steps are regulated by dynamic interactions with multiple RBPs and the formation of mRNA-protein complexes (RNPs).
Figure 1.2. Composition of mRNAs. mRNA elements and interacting proteins form mRNPs. Translation initiation factors, such as eIF4E and eIF4G, interact with the 5’UTR-cap structure, and factors including PABP interact with the 3’ poly(A) tail. Trans-acting regulatory factors recognize specific cis-elements, e.g. AREs within the 3’UTRs of certain mRNAs. Adapted from (Rissland, 2017).
1.1.2 Translation and mRNA turnover
The mRNA cap and poly(A) tail protect mRNAs from degradation. They are involved in two major processes affecting all mRNAs: translation and decay (Rissland, 2017). Many RBPs recognize these two structural elements and link both pathways. In the cytoplasm, the translation initiation factor eIF4E binds and protects the 5’ cap structure (Figure 1.2). The cytoplasmic poly(A)-binding protein 1 (PABP), in turn, binds the poly(A) tail and the translation initiation factor eIF4G, which is itself bound to eIF4E, leading to the closed-loop structure (Wells et al., 1998). This facilitates translation initiation but also regulates mRNA stabilization and degradation. PABP interacts with members of the degradation machinery (Webster et al., 2018; Yi et al., 2018) or with proteins that recruit the degradation machinery (Ezzeddine et al., 2007; Stupfler et al., 2016). The anti-proliferative B-cell translocation gene (BTG)/transducer of ERBB2 (Tob) family members bind PABP and thereby recruit the CCR4-associated factor 1 (CAF1; also known as CNOT7), a subunit of the CCR4-NOT deadenylase complex (Ezzeddine et al., 2007; Stupfler et al., 2016). Shortening of the poly(A) tail is the first major step that triggers mRNA decay. It is catalyzed by deadenylases such as the PAN2-PAN3 and CCR4-NOT complexes. Once the poly(A) tail has been shortened to a certain length, the 5’ cap is removed by DCP1/2 and the 5’ to 3’ exonuclease XRN1 degrades the mRNA body (Clark et al., 2009; Heck and Wilusz, 2018; Wahle and Winkler, 2013). Alternatively, in a second decay pathway following deadenylation, some mRNAs are degraded in the 3’ to 5’ direction by the exosome. Afterwards, the scavenger decapping complex removes the 7-methylguanosine cap (Figure 1.3) (Clark et al., 2009; Heck and Wilusz, 2018; Wahle and Winkler, 2013).
Figure 1.3. Simplified illustration of the mRNA ARE degradation pathway. Cis-acting AREs within the 3’UTR provide binding sites for various ARE binding proteins, such as HuR, AUF-1, TTP or KSRP, which regulate mRNA turnover. The 3’ to 5’ decay pathway (above the mRNA) comprises deadenylation by the PAN2-PAN3 or the CCR4-NOT complex, followed by degradation of the mRNA body from the 3’ end by the exosome and 5’ end processing by the scavenger decapping complex (DCPS). The major 5’ to 3’ decay pathway (below the mRNA) involves the same initial deadenylation step. Afterwards, the decapping complex (DCP1-DCP2) removes the 5’ cap and the nuclease XRN1 degrades the mRNA body. Adapted from (Clark et al., 2009).
Destabilizing elements such as adenylate/uridylate (AU)-rich elements in the 3’UTRs (Barreau et al., 2005; Chen and Shyu, 1995), elements in protein coding regions (Chang et al., 2004; Grosset et al., 2000), premature stop codons (Chen and Shyu, 2003) and miRNA binding sites (Behm-Ansmant et al., 2006; Wu et al., 2006) all induce deadenylation and mRNA decay.
1.1.3 mRNA decay regulation by cis-acting AU-rich elements
(AU)-rich elements (AREs) are regulatory cis-acting elements found in 5-8% of human genes (Bakheet et al., 2006). AREs function as a signal for rapid degradation and are found mainly in the 3’UTRs of short-lived mRNAs (Bakheet et al., 2006). mRNAs containing AREs are involved in the regulation of many important cellular processes such as stress response, cell cycle regulation, inflammation, immune cell activation, apoptosis and carcinogenesis (Bakheet et al., 2006). AREs range in size from 50 to 150 nucleotides and are classified by the presence of AUUUA motif repeats. Classes I and II contain several AUUUA motifs, while class III is U-rich (Benjamin and Moroni, 2007). Six well-known RBPs regulate the transport, stability and translation of ARE-containing mRNAs: Tristetraprolin (TTP), AU-binding Factor 1 (AUF-1), KH-type splicing regulatory protein (KSRP), Human antigen R (HuR), T-cell intracellular antigen 1 (TIA-1) and TIA-1-related protein (TIAR). While TTP, AUF-1 and KSRP mainly function in mRNA degradation and TIA-1 and TIAR in silencing of translation, HuR stabilizes mRNAs and/or upregulates translation. However, to a small extent, opposite functions have been reported for all of these proteins on various mRNAs (García-Mauriño et al., 2017). Most of these proteins are able to shuttle between the nucleus and the cytoplasm and compete for the same RNA binding sites; therefore, they regulate similar mRNA targets (Figure 1.3).
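As a rough illustration of how AUUUA pentamers could be located in a 3’UTR sequence, the following minimal Python sketch scans an RNA string for overlapping occurrences of the motif. The function name `find_auuua_motifs` and the example sequence are hypothetical and chosen only for illustration; the sketch does not reproduce the ARE classification criteria of Benjamin and Moroni (2007) or the analysis of Bakheet et al. (2006).

```python
import re


def find_auuua_motifs(rna: str) -> list[int]:
    """Return 1-based start positions of all AUUUA pentamers (overlaps included)."""
    # Normalize: upper case, and treat DNA-style T as U (an assumption for convenience).
    seq = rna.upper().replace("T", "U")
    # A zero-width lookahead reports overlapping motifs, e.g. AUUUAUUUA yields two hits.
    return [m.start() + 1 for m in re.finditer(r"(?=AUUUA)", seq)]


# Hypothetical 3'UTR fragment, not taken from any real transcript.
example = "GCAUUUAUUUAUUUACCGUUUUUUUGG"
print(find_auuua_motifs(example))  # -> [3, 7, 11]
```

Overlapping matches are reported on purpose, because consecutive AUUUA pentamers in AREs frequently share their flanking adenosine. A purely U-rich stretch, as in the end of the example sequence, yields no hit.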
1.1.4 Subcellular localization and phase separation
Besides being translated or degraded, mRNAs can be stored in subcellular compartments, such as stress granules (SGs) or processing bodies (p-bodies or PBs) (Figure 1.1). These are membrane-less granules composed of proteins, nucleic acids and other molecular factors, which form in response to various stimuli. Both SGs and PBs form in response to stress, under which the translation of mRNAs is strongly repressed (Uversky, 2017). The hallmarks of proteins enriched in such granules are the presence of RNA-binding domains or motifs and of intrinsically disordered regions (IDRs), sequences that lack a defined 3D structure (“disordered”) (Calabretta and Richard, 2015). A subset of IDRs, called low complexity domains (LCDs), is defined by regions of more than 100 residues that are composed of repeating amino acids with low overall diversity. In vitro, such IDRs drive phase transitions. These sequences are rich in uncharged polar side chains (glutamine, asparagine, glycine, serine, proline), charged amino acids (arginine, lysine, glutamic acid, aspartic acid), or aromatic residues (phenylalanine and tyrosine) and mediate interactions through electrostatic, dipole–dipole, pi–pi, cation–pi, hydrophobic, and hydrogen bonding interactions (Boeynaems et al., 2018). Most of the proteins involved in translation and decay, such as eIF4B, eIF4G, eIF4E, TTP, TIA-1, XRN1, PABP, TIAR and HuR, can be found in such granules (Uversky, 2017). In addition, RNA alone is able to undergo phase transition in vitro. Recent studies showed that RNAs containing repetitive sequences can phase separate through intermolecular base-pairing (Jain and Vale, 2017).
In recent years, multiple discoveries have created a completely new field in cell biology that focuses on understanding how these membrane-less organelles form, what they are composed of and how they affect biological function and disease (Boeynaems et al., 2018).
1.1.5 The RNA recognition motif (RRM) – a multifunctional binding scaffold
RBPs are diverse and vary in their structure and function. They represent 7.5% of all protein-coding genes in humans (Gerstberger et al., 2014). Many RBPs contain characteristic domains that bind to single-stranded or double-stranded RNA: the RNA recognition motif (RRM) (also known as RBD (RNA binding domain) or RNP (ribonucleoprotein domain)), zinc finger domains (ZnF), K-homology (KH) domains, cold shock domains (CSD) and double-stranded RNA-binding domains (dsRBD) (Lunde et al., 2007). These domains can specifically bind RNA through hydrogen bonding, electrostatic interactions and hydrophobic and aromatic stacking interactions with the nucleobases. Non-sequence-specific contacts are mediated by the sugar-phosphate backbone.
The RNA recognition motif (RRM) is the most abundant RNA binding domain and is present in 0.5-1% of human genes (Venter et al., 2001). Proteins containing RRMs are involved in all steps of PTGR. The RRM comprises around 90 amino acids, which fold into a β1α1β2β3α2β4 topology. Four β-strands are packed against two α-helices (Figure 1.4). On the β-sheet surface, three aromatic side chains are often located in the conserved RNP1 (β3-strand) and RNP2 (β1-strand) motifs. RRMs are able to recognize two to eight nucleotides. Two RNA bases stack on the aromatic rings in β1 (RNP2, position 2) and in β3 (RNP1, position 5). A third aromatic ring, located in β3 (RNP1, position 3), is often inserted between the two RNA sugar rings (Cléry et al., 2008). Excitingly, despite the conserved RRM fold and similar binding surfaces, proteins show differences in RNA recognition. These deviations from the canonical RNA binding mode are possible owing to N- and C-terminal extensions, loops and interdomain linkers, or to the binding of other factors, all of which affect the number of bound nucleotides and the RNA specificity. Some RRMs do not contain the canonical aromatic residues on their β-sheet surface and have adopted different RNA binding strategies. The RRMs of the heterogeneous nuclear ribonucleoprotein (hnRNP) F, an alternative splicing and polyadenylation regulator, recognize RNA G-tracts through a β-hairpin, the β1-α1 loop and the β2-β3 loop. Such RRMs are called quasi-RRMs (qRRMs) (Dominguez and Allain, 2006). Another alternative-splicing regulator, the SR protein SRSF1, also contains an RRM lacking the conserved RNP (Clery et al., 2013). The structure of this so-called pseudo-RRM (ΨRRM) reveals the involvement of the patch where α1 packs against β2.
RRM-containing proteins use a wide set of additional mechanisms to modulate RNA recognition. RNA can be recognized through unique contacts between amino acids and specific nucleotides. This leads to precise binding at a certain position within the target mRNA. However, RRMs can also recognize degenerate or repetitive RNA sequence motifs, such as poly-pyrimidine or poly-adenosine tracts (Banerjee et al., 2003; Deo et al., 1999; Mackereth et al., 2011). On such targets, RRMs are able to recognize multiple binding registers, which enhances the overall binding affinity. This was shown for the polyU binding protein hnRNP C and the pre-mRNA splicing regulator U2AF65 (Cieniková et al., 2014; Mackereth et al., 2011). Dynamic binding, by multiple-register binding or sliding along the RNA, is an additional mechanism by which RRMs fine-tune their affinity.
Figure 1.4. Canonical RRM β-sheet-RNA interaction. (A) Example of an RRM structure in complex with a nucleic acid (hnRNP A1 RRM2 in complex with single-stranded telomeric DNA). (B) Schematic representation of the four-stranded β-sheet surface with the main conserved RNP1 and RNP2 aromatic residues indicated in green. RNP1 and RNP2 consensus sequences of RRMs are shown (X is for any amino acid). Figure from (Cléry et al., 2008).
Further variability in RNA recognition comes from the existence of multiple copies of RRMs or a combination of different domains within one protein (Figure 1.5). The domains can be in tandem or separated by short linkers or longer unstructured regions. Tandem RRMs show higher specificity and affinity compared to separate RRMs. The RRMs can interact with each other, sometimes involving their interdomain linkers, to create an extended RNA binding surface or a deep cleft for the interaction with the RNA, as shown for the ARE binding proteins Sex-lethal, HuD and HuR RRM12 (Handa et al., 1999; H. Wang et al., 2013; Wang and Tanaka Hall, 2001). Many RRM-containing proteins are multifunctional. Aside from RNA binding, they also participate in other macromolecular assemblies. Structural studies reveal that protein recognition by RRMs is very diverse. Interactions can form between two RRMs, between an RNA-binding RRM and a non-RRM domain, and between RRMs that do not bind RNA and another protein (Cléry et al., 2008; Muto and Yokoyama, 2012). RRM-RRM interactions can induce RNA looping, as shown for the polypyrimidine tract binding protein (PTB) RRM3-RRM4, hnRNPA1 RRM12 and hnRNPL RRM34 (Barraud and Allain, 2013; Beusch et al., 2017; Oberstrass et al., 2005; Vitali et al., 2006; Zhang et al., 2013). RRM-protein interactions can cooperatively affect RNA affinity. This is observed for PABP RRM2: when PABP RRM12 is in complex with an eIF4G fragment, it binds poly(A) RNA with 10-fold higher affinity than in its absence (Safaee et al., 2012).
Figure 1.5. RRM organization of key proteins involved in PTGR. RRMs are shown in yellow, Gly-rich sequences in green and the MLLE domain in PABP in grey.
RNA recognition and affinity can thus be influenced by additional protein-protein interactions. These diverse binding modes help us to understand the basis for RRM-RNA or RRM-protein recognition. However, new modes of interaction are still being discovered. This highlights the versatility of the RRMs and that more investigations are needed to understand the potential code for RRM-RNA/ protein recognition (Cléry et al., 2008).
1.1.6 ELAV protein family and HuR
The embryonic lethal abnormal visual (ELAV)/Human antigen (Hu) protein family consists of three members found in neurons, HuB (Hel-N1), HuC and HuD, and one ubiquitously expressed member, HuR (HuA) (Antic and Keene, 1997; Fan and Steitz, 1998b; King et al., 1994; Ma et al., 1996). Of this family of proteins, HuR is under intensive investigation. It is a pivotal regulator of ARE-containing mRNAs, which play a role in essential biological processes including cell growth, differentiation, apoptosis, signal transduction, hematopoiesis and metabolism. HuR acts by stabilizing a large number of transcripts such as cyclin A, cyclin B1, p21, p53, tumor necrosis factor alpha (TNF-α), interleukin-3 (IL-3) and vascular endothelial growth factor (VEGF) (Dean et al., 2001; Levy et al., 1998; Ming et al., 2001; W. Wang et al., 2000; Wengong Wang et al., 2000; Zou et al., 2006). Moreover, HuR promotes translation, as reported for glucose transporter 1 (GLUT1), cationic amino acid transporter 1 (CAT1) and prothymosin alpha (ProTα) (Gantt et al., 2006; Lal et al., 2005; Yaman et al., 2002). It is also known to destabilize a small number of mRNAs and/or to suppress their translation (Cammas et al., 2014; Kim et al., 2009; Leandersson et al., 2006; Meng et al., 2005). HuR binds U-rich rather than AU-rich targets (Lebedeva et al., 2011; López de Silanes et al., 2004). HuR is mostly localized in the nucleus but undergoes cytoplasmic translocation under various cellular and stress conditions (J. Wang et al., 2013). HuR localizes in stress granules when cells are stressed by heat shock or arsenate treatment (Gallouzi et al., 2000; Yoon et al., 2013). HuR and other Hu family members are composed of three highly conserved canonical RNA recognition motifs (RRMs). The first two RRMs (RRM12) are in tandem and are separated from the C-terminal RRM (RRM3) by a ~50-residue unstructured basic region (hinge region). A nucleocytoplasmic shuttling element within the hinge region is responsible for the translocation between the nucleus and the cytoplasm (Fan and Steitz, 1998a). RRM12 is mainly responsible for ARE binding (Chen et al., 2002), while the exact function of RRM3 is still not fully characterized.
So far, only the crystal structure of the free HuR RRM12, crystal structures of HuR and HuD RRM12 bound to RNA and NMR structures of HuC RRM1 and HuC RRM2 have been solved (Inoue et al., 2000; H. Wang et al., 2013; Wang and Tanaka Hall, 2001). In the free form, HuR RRM12 has an open conformation with no inter-domain contacts (Figure 1.6A) (H. Wang et al., 2013). The crystal structure of HuR RRM12 in complex with AUUUUUAUUUU shows a closed shape mediated by hydrogen bonds between the two domains and the inter-domain linker. These inter-domain interactions are also present in HuD RRM12 when bound to RNA (Wang and Tanaka Hall, 2001). Both RRMs create a deep cleft for the RNA. RRM1 binds five nucleotides (U5-U8 and U10), the inter-domain linker interacts with U9 and RRM2 binds U3-U4 (Figure 1.6B) (Wang and Tanaka Hall, 2001).
RRM3 and the hinge region are involved in homo-dimerization (Scheiba et al., 2014; Toba and White, 2008), protein-protein interactions (Brennan et al., 2000; Cho et al., 2010), cooperative binding of multiple HuRs to long AREs and counteracting miRNA mediated repression to promote miRISC release from target mRNAs (Kundu et al., 2012; Mukherjee et al., 2016). Due to the insolubility and instability of the RRM3 domain in vitro, structural studies have remained challenging. A recent NMR study revealed that RRM3 dimerizes through helix α1, which is located opposite to the RNA binding interface (Scheiba et al., 2014), highlighting a new RRM-RRM interaction mode. Although that study provided a structural model of the free RRM3 and two potential RRM3 dimer models, the lack of atomic resolution structures of the free and RNA-bound forms prevents a complete understanding of HuR RRM3 dimerization and RNA recognition.
Figure 1.6. RRM orientation of HuR RRM12 in the free form and in complex with RNA. Cartoon representation of (A) the free HuR RRM12, which shows an open conformation (pdb code 4EGL), and (B) HuR RRM12 bound to 5’-AUUUUUAUUUU-3’. RNA binding induces a closed conformation of HuR RRM12, creating a deep cleft for the RNA (pdb code 4ED5).
1.1.7 Interplay of PABP and BTG2/ Tob family in mRNA decay
The poly(A)-binding protein (PABP) family plays a role in both translation and mRNA stability by binding to the mRNA 3’ poly(A) tails. In addition to the canonical cytoplasmic PABP (PABPC1), there are four other PABP genes in humans. They have a similar domain architecture but differ in their expression patterns. Three of the cytoplasmic PABPs consist of four RNA recognition motifs (RRMs) followed by an extended C-terminus, while one PABP protein lacks the C-terminal region. There is also a nuclear PABP (PABPN1) comprising only one RRM, flanked by an acidic N-terminus and an arginine-rich C-terminal domain (Mangus et al., 2003). The cytoplasmic PABP C-terminus contains a conserved MLLE domain, also known as poly(A)-binding protein C-terminal domain (PABC) (Kozlov et al., 2001), which recognizes the PABP-interacting motif 2 (PAM2) found in a wide set of proteins to recruit them to the poly(A) tails (Kozlov et al., 2001; Lim et al., 2006; Okochi et al., 2005). Two crystal structures of the tandem PABP RRM12 with poly(A) RNA reveal that the two tandem RRMs contact each other to create an extended β-sheet surface and bind a single-stranded RNA motif (Figure 1.7A) (Deo et al., 1999; Safaee et al., 2012). The adenines are recognized by multiple contacts with the sugar-phosphate backbone and the ribose moieties (Deo et al., 1999). Further structural studies show that RRM2 α1 and β4 form hydrophobic interactions, hydrogen bonds and salt bridges with eIF4G (Figure 1.7B, C) (Safaee et al., 2012). PABP protects poly(A) tails from deadenylation. However, interaction with members of the anti-proliferative B-cell translocation gene (BTG)/ transducer of ERBB2 (Tob) family recruits the CCR4-associated factor 1 (CAF1), a subunit of the CCR4-NOT deadenylase complex, which induces poly(A) tail shortening (Ezzeddine et al., 2007; Stupfler et al., 2016). In detail, Tob contains a PAM2 motif, which binds the PABPC1 C-terminal MLLE domain (Ezzeddine et al., 2007). Interestingly, BTG2 lacks such a PAM2 motif. Instead, the interaction is mediated by the BTG2 APRO domain and the PABPC1 RRM1 (Stupfler et al., 2016), suggesting a novel mode of RRM-domain interaction.
Figure 1.7. RRM domain orientations of PABPC1 RRM12. Cartoon representation of (A) the poly(A) bound PABPC1 RRM12, showing an extended β-sheet surface (pdb code 1CVJ) and (B) PABPC1 RRM12 bound to poly(A) and the eIF4G fragment (pdb code 4F02). RRMs are colored in grey, eIF4G in blue. RNA is shown as stick representation in yellow. (C) Polar contacts (red dashed lines) (left) and hydrophobic interactions (right) contribute to eIF4G-RRM2 binding.
1.1.8 mRNA stability regulators and human disease
All post-transcriptional steps in gene expression need to be tightly regulated. Aberrations, such as gene mutations, differential abundance of mRNAs or proteins, or changes in protein behavior could lead to undesirable pathologic effects. RNPs, among them ARE binding proteins, are fundamental players that control the stability and translation of ARE containing mRNAs. The ARE is a signal for rapid degradation located in the 3’UTRs of short-lived mRNAs. Such mRNAs code for proteins that are involved in all essential biological processes, including cell growth, differentiation, apoptosis, signal transduction, hematopoiesis and metabolism (Khabar, 2005). After the mRNAs have fulfilled their functions, they need to be degraded. Prolonged stabilization of such ARE mRNAs causes continuous responses. In the case of HuR, elevated levels increase the upregulation of mRNAs that cause tumor growth and disease progression in various cancer types, e.g. breast, colon, ovarian, prostate, pancreatic and oral cancer (Kotta-Loizou et al., 2016; Srikantan and Gorospe, 2012; J. Wang et al., 2013). Additionally, HuR expression levels correlate with viral infections, cardiovascular diseases, neurological pathologies and muscular disorders (Di Marco et al., 2005; Farooq et al., 2009; Figueroa et al., 2003; Li et al., 2009; Misquitta et al., 2001; Sokoloski et al., 2010; Van Der Giessen et al., 2003). Consequently, understanding the structure and function of mRNPs will help us to develop new biomarkers for disease prognosis and new therapeutic drug targets.
1.2 Investigation of protein-protein and protein-RNA interactions
1.2.1 Structural studies of macromolecules
Structural studies of biomolecules are necessary to understand their functions. Atomic models of enzymes or macromolecular machines help us elucidate their mechanism of action and their architecture. Biomolecular structures also enable targeted modification and engineering, as well as structure-based drug design to generate new drugs.
Multiple methods can be used to derive atomic resolution structures of biomolecules. Nuclear Magnetic Resonance (NMR) Spectroscopy, for example, can be used to study molecules in solution. This method is based on the response of nuclear spins to magnetic fields and provides structural information as well as insights into the dynamics of a system. However, NMR is limited by the size of the molecule and becomes challenging beyond 30 kDa (Ikeya et al., 2018). To determine structures of larger molecules, NMR can be combined with other methods such as Electron Paramagnetic Resonance (EPR) Spectroscopy (Duss et al., 2015). There, longer distance restraints are derived, which can be used for structure calculation. NMR is widely used for structure determination of RRMs or RRM-RNA complexes, to investigate their dynamics and for binding studies to RNA (Dominguez et al., 2011).
Another technique to determine structures is X-ray crystallography, where one measures the X-ray diffraction pattern of a crystalline sample. To be able to resolve a structure, this method requires that biomolecules form diffracting crystals. The derived atomic structures represent the packed state of the molecule in the crystal. This provides us with an instantaneous view of the biomolecule. In some cases this might lead to faulty interpretations of biomolecular function, for example protein dimers that form only due to crystal packing but do not exist in solution, or an overall domain arrangement that differs from the one in solution (Mackereth and Sattler, 2012). Proteins which are too flexible or are present in multiple conformations often do not form crystals, and shorter flexible regions or sidechains are not observable.
Recently, Single Particle Cryo-Electron Microscopy (Cryo-EM) has become a favored method for studying large assemblies at atomic resolution. One of the advantages of Cryo-EM is that it can determine large and complex structures that cannot be crystallized for X-ray crystallography or are too large for NMR spectroscopy. Samples are directly imaged in vitrified solutions. Structures starting from around 64 kDa can thus be studied at resolutions around 3-4 Å (Murata and Wolf, 2018).
Small-angle X-ray scattering (SAXS) and small-angle neutron scattering (SANS) are additional methods to gain insights into the structure of biomolecules, albeit at lower resolution. Small-angle scattering of X-rays or neutrons provides information about the overall shape, the relative position of the binding partners and binding stoichiometries. The advantage of SAXS and SANS is that crystallization of the sample is not needed and that both methods can be combined with high-resolution structural information obtained from X-ray crystallography, NMR or Electron Microscopy (EM) to tackle complex multi-subunit complexes (Trewhella, 2016).
In the following studies, we used X-ray crystallography for protein-RNA complex structure determination and the docking program HADDOCK (High Ambiguity Driven protein-protein Docking) to generate models of a protein-protein-complex. NMR and Isothermal titration calorimetry (ITC) helped us to validate and characterize protein-protein and protein-RNA binding in solution, while a cellular luciferase assay highlighted the biological relevance in Huh7 cells. Thus, these methods are described in more detail.
1.2.2 Structure determination by X-ray Crystallography
To obtain a structure by X-ray crystallography, molecules need to form crystals. Usually, different crystallization conditions are tested. The X-ray diffraction pattern of a crystal is determined by the crystal packing of the molecules. If the diffraction pattern indicates a good resolution, data are recorded and processed. After determining the electron density, the structure can be obtained and validated (Figure 1.8).
1.2.2.1 Crystallization screening
For crystallization, a high quality, pure and homogeneous sample is usually needed. Crystals form from a supersaturated solution. When molecules change from a solution state into a solid state, they can either become amorphous and precipitate, or become ordered and form a crystal. To induce supersaturation and crystal formation, a precipitant is added to the solution. Over time, the total drop volume decreases until the evaporation/condensation between the drop and the reservoir reaches an equilibrium. Thereby, both protein and precipitant concentration increase and reach a critical concentration, at which the protein goes out of solution and crystallizes. Nowadays, crystal screens are performed at high throughput in 96-well plates using commercially available suites. Common screening suites are JCSG+, PACT, PEGS, AmSO4 and Classics (Hampton or Qiagen), which cover different pH values, precipitants (salts, polyethylene glycol, organic solvents, etc.) and additives. Additionally, sample concentration and temperature affect the crystallization process. Pipette robots set up plates by mixing the screening suite solution (precipitants) and protein samples in either hanging drop or sitting drop format (Chayen and Saridakis, 2008). A good starting point is to screen multiple conditions, such as 3-4 different screening suites at two temperatures (4°C and 20°C) and two different protein concentrations. Crystal growth can be observed after a few days but also after a few weeks or months.
Figure 1.8. Work flow for an X-ray crystallographic structure determination. After crystal growth, diffraction is measured and the data processed to derive an electron density map and build a model.
1.2.2.2 Data collection and Processing
X-ray diffraction patterns of crystals are measured at an in-house X-ray source/ diffractometer, if available, or at a synchrotron light source. Beforehand, crystals can be cryo-protected, so that the solvent around them is vitrified. This prevents ice formation, which would lead to loss of diffraction. Various cryo-protectants (glycerol, ethylene glycol, sucrose, etc.) should be tested. Then, crystals are frozen in liquid nitrogen. Cryo-cooling helps to protect the crystals from radiation damage. Additionally, to avoid radiation damage, crystals are measured under a nitrogen gas stream.
Crystals are exposed to X-rays and the diffraction is recorded when it hits the detector. X-rays are scattered by the electrons at certain angles. These angles are derived by considering the diffraction to be reflections from parallel planes of atoms in a crystal. The Miller indices hkl define these parallel planes. Thus, any reflection (spot on the detector) in the diffraction pattern is characterized by its index (hkl) and the reflection intensity (I). Crystals are rotated to obtain a complete data set (Garman and Schneider, 1997). The degree of rotation needed depends on the internal symmetry of the crystal. The symmetry is described by a crystal lattice, which, like a coordinate system, defines the position of the atoms within a molecule. The lattice is composed of a set of repeating unit cells. The unit cell, described by the dimensions a, b, c (in Å) and angles α, β, γ (in °), is the scattering unit of the crystal. Therefore, the unit cell is derived directly from the diffraction pattern. The molecules within one unit cell are related by crystal symmetry. The asymmetric unit is the smallest part of the unit cell, which itself contains no crystallographic symmetry elements (translation, rotation). From such an asymmetric unit, the entire crystal lattice can be built by applying symmetry elements. A combination of symmetry elements is referred to as a space group. For each crystal, the unit cell dimensions and the space group need to be determined (Smyth and Martin, 2000).
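The description of diffraction as reflection from parallel lattice planes is commonly formalized as Bragg's law, nλ = 2d sin θ, which is not written out in the text. The following is a minimal sketch, assuming an orthorhombic cell (α = β = γ = 90°) and hypothetical cell dimensions and wavelength, of how the Miller indices and the unit cell determine the plane spacing d and the scattering angle 2θ:

```python
import math

def d_spacing_orthorhombic(h, k, l, a, b, c):
    """Spacing (in Å) of the (hkl) planes; valid only if all cell angles are 90°."""
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d2)

def bragg_two_theta(d, wavelength, n=1):
    """Scattering angle 2θ in degrees from Bragg's law, n·λ = 2·d·sin(θ)."""
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        raise ValueError("reflection cannot be reached at this wavelength")
    return 2.0 * math.degrees(math.asin(s))

# Hypothetical cell (Å) and wavelength (Å), chosen for illustration only
a, b, c = 40.0, 50.0, 60.0
wavelength = 1.0
for hkl in [(1, 0, 0), (2, 3, 1), (10, 10, 10)]:
    d = d_spacing_orthorhombic(*hkl, a, b, c)
    print(hkl, round(d, 2), round(bragg_two_theta(d, wavelength), 2))
```

Reflections with higher indices correspond to smaller plane spacings and appear at larger angles, which is why high-resolution information is found at the edge of the detector.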
The positions of the diffraction spots depend on the size of the unit cell as well as the position of the molecules in the unit cell. Thus, each spot corresponds to distances between molecules in the crystal. The size and shape of the molecules are encoded in the phases and intensities of the diffraction spots. To obtain the three-dimensional electron density map, the structure factor F and the Miller indices h, k, l of each reflection must be determined. Software such as HKL-2000 (Otwinowski and Minor, 1997) or XDS (Kabsch, 1993) identifies all reflections by comparing high intensity spots with the background. Based on the positions of all reflections, the unit cell dimensions of the crystal can be obtained (which is called “indexing”). The signal intensity and hkl values are obtained by integrating all reflections. Subsequently, “merging and scaling” are performed: all peaks that appear in more than one image are identified (merging) and scaled such that they have consistent intensities (scaling). The structure factor contains information about the amplitude and the phase of a wave, both of which are needed to generate an electron density map by Fourier Transformation (FT) (Smyth and Martin, 2000). The square of the amplitude is proportional to the signal intensity (which has been measured in the diffraction image), but the phase cannot be measured directly, leading to what is called the “phase problem”.
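To illustrate why both amplitude and phase are needed for the Fourier transformation, the sketch below uses a one-dimensional toy "crystal" with two point atoms. The atom positions, scattering powers and the helper names are hypothetical and only serve the illustration; real structure factors are three-dimensional and use atomic form factors.

```python
import cmath
import math

def structure_factor(h, atoms):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for a 1D toy cell (fractional x)."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)

def density(x, factors):
    """Fourier synthesis: rho(x) ~ sum_h F(h) * exp(-2*pi*i*h*x)."""
    return sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items()).real

# Hypothetical toy structure: (fractional position, scattering power)
atoms = [(0.2, 6.0), (0.5, 8.0)]
h_max = 10
factors = {h: structure_factor(h, atoms) for h in range(-h_max, h_max + 1)}

grid = [i / 100 for i in range(100)]
with_phases = [density(x, factors) for x in grid]

# A detector records only intensities, i.e. |F|^2; the phases are lost
amplitudes_only = {h: abs(F) for h, F in factors.items()}
without_phases = [density(x, amplitudes_only) for x in grid]

peak_with = grid[with_phases.index(max(with_phases))]
peak_without = grid[without_phases.index(max(without_phases))]
print("strongest peak with phases:   ", peak_with)
print("strongest peak without phases:", peak_without)
```

With the correct phases the strongest density peak sits on the heavier atom, whereas a synthesis from amplitudes alone places it at the origin and no longer reflects the atomic positions.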
Several parameters enable us to judge the quality of the processed data: resolution, Rmerge (accuracy), I/σ(I) (signal-to-noise ratio), redundancy and completeness. At high resolution (in Å), structural elements are better visible and a more precise model can be built. Rmerge (or Rsym) describes the accuracy of the data set. It is derived from the differences in intensity between repeated and symmetry-related measurements of a reflection, which should have identical intensities. An overall Rsym of 5% is very good, whereas values of more than 20% indicate severe problems with the data. I/σ(I) is the signal-to-noise ratio, where I is the intensity of a unique reflection and σ(I) its deviation/error. The redundancy of the data indicates how many times a unique reflection has been measured. The completeness of the data compares the number of measured unique reflections (not symmetry related) with the theoretical number for a given unit cell and space group. The overall completeness should be 95-100% (Otwinowski and Minor, 1997; Wlodawer et al., 2008).
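As a rough illustration of these statistics, the sketch below computes Rmerge, redundancy, completeness and a mean I/σ(I) for a made-up set of reflections. It assumes the common definition Rmerge = Σ|I_i − ⟨I⟩| / ΣI_i, summed over all observations of all unique reflections; the function name and the data are hypothetical, and processing programs apply additional corrections that are not modeled here.

```python
def merging_statistics(observations, n_possible):
    """observations: dict mapping (h, k, l) -> list of (I, sigma) measurements
    of one unique reflection; n_possible: theoretical number of unique
    reflections for the cell, space group and resolution limit."""
    num = den = 0.0
    n_obs = 0
    i_over_sigma = []
    for measurements in observations.values():
        intensities = [i for i, _ in measurements]
        mean_i = sum(intensities) / len(intensities)
        n_obs += len(intensities)
        num += sum(abs(i - mean_i) for i in intensities)
        den += sum(intensities)
        # Simplification: signal-to-noise is averaged over single observations
        i_over_sigma.extend(i / s for i, s in measurements)
    n_unique = len(observations)
    return {
        "r_merge": num / den,
        "redundancy": n_obs / n_unique,
        "completeness": n_unique / n_possible,
        "mean_i_over_sigma": sum(i_over_sigma) / len(i_over_sigma),
    }

# Made-up toy data: three unique reflections measured out of four possible
toy = {
    (1, 0, 0): [(100.0, 5.0), (110.0, 5.0)],
    (0, 1, 0): [(50.0, 5.0), (50.0, 5.0)],
    (0, 0, 1): [(200.0, 10.0), (180.0, 10.0)],
}
stats = merging_statistics(toy, n_possible=4)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In practice these values are reported both for the complete data set and for the highest resolution shell, where the signal is weakest.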
1.2.2.3 Phase Determination
The structure factor contains information about the amplitude and the phase. To generate the electron density map, both are needed. However, only the amplitude can be determined directly. The phase can be determined by various methods. In Multiple Isomorphous Replacement (MIR), crystals are soaked or co-crystallized with a heavy-metal compound (mercury, platinum, uranium, etc.). The strong scattering from such heavy atoms changes the reflection intensities, which allows their location to be determined. Soaks with at least two different heavy-metal compounds are required for a reliable phase determination. In Multiple Wavelength Anomalous Dispersion (MAD), different wavelengths around the absorption edge of a certain atom (anomalous scatterer) are used. The differences in the resulting diffraction patterns can then be used to reconstruct the phase. Besides soaking with a heavy metal, incorporation of selenium is often chosen. The protein is expressed in the presence of the amino acid seleno-methionine, which is incorporated at the positions of the methionines (Taylor, 2003). If a homologous structure is available, the phase can be determined by Molecular Replacement (MR). A related structure gives the orientation and position of the molecules within the unit cell. This is used to estimate an initial phase, which helps to generate the necessary electron density map (Taylor, 2003).
1.2.2.4 Model building and Structure quality
To build the structure into the electron density map, the software COOT (Emsley and Cowtan, 2004) is used. Software such as REFMAC5 (Vagin et al., 2004) or Buster (Smart et al., 2012) can be used to refine the structure and to check its quality by calculating the R-factors R and Rfree after each manual change. The R-factor indicates how much the structure factors calculated from the model (Fcalc) deviate from the observed structure factors (Fobs). The parameter Rfree is determined analogously to the normal R-factor, but from a random subset of reflections that is excluded from refinement. Rfree is an important validation parameter and indicates over-fitting of the experimental data. Both values should decrease if refinement is proceeding sensibly. Desirable R values are between 10 and 30%, depending on the resolution. Rfree should not deviate from the R-factor by more than 7% (Wlodawer et al., 2008).
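The R-factor is commonly written as R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|. The sketch below, with made-up amplitudes and hypothetical function names, shows this calculation and the random split into a working set and a free set. The free-set fraction is an assumption (a few percent is typical), and the scaling between Fobs and Fcalc performed by refinement programs is omitted.

```python
import random

def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    num = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return num / sum(abs(o) for o in f_obs)

def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag a free set that is never used during refinement."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(free_fraction * n_reflections))
    return sorted(indices[n_free:]), sorted(indices[:n_free])

# Made-up amplitudes for illustration only
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0, 90.0, 70.0, 50.0, 30.0, 10.0]
f_calc = [95.0, 84.0, 57.0, 44.0, 18.0, 99.0, 63.0, 55.0, 27.0, 12.0]

# A large free fraction is used here only because the toy set is tiny
work, free = split_work_free(len(f_obs), free_fraction=0.2)
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because the free reflections never influence the model, a growing gap between R and Rfree during refinement points to over-fitting.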
Occupancy is an additional parameter included in the refinement. The occupancy of an atom indicates the fraction of molecules in the crystal in which this atom occupies the given position. If the position of an atom is identical in all molecules, then the occupancy is 1. In the case of two conformations, for example a sidechain being 50% in one conformation and 50% in the other, the occupancy is 0.5. Lower occupancy is observed for certain key nucleotides in chapter 2. Very dynamic protein regions are not visible in the electron density map. They occupy multiple positions, which are averaged out in the electron density maps (Wlodawer et al., 2008). Therefore, loops or long sidechains that are too flexible and not stabilized by crystal contacts are often missing in crystal structures. Moreover, the root-mean-square deviation (RMSD) indicates how much the model differs from ideal geometrical parameters. Bond length RMSDs are expected to be around 0.02 Å. Additionally, further deviations of stereo-chemical parameters need to be checked, such as the peptide planarity. The Ramachandran plot indicates outliers of the φ/ψ torsion angles of the polypeptide backbone. 98% should lie in the allowed region (Wlodawer et al., 2008). Final structures are uploaded to the protein data bank (PDB). Papers showing crystal structures include the statistics of the data (resolution, Rsym, completeness, I/σ(I), etc. for the overall set and for the highest resolution shell) and the refinement parameters (R/Rfree, RMSDs, Ramachandran outliers, etc.) to judge the quality of the data.
1.2.3 Characterization of protein-RNA and protein-protein complexes by NMR Spectroscopy
NMR Spectroscopy can provide a wide set of information about structural, mechanistic, thermodynamic and kinetic aspects of a biomolecular interaction (Waudby et al., 2016). NMR is based on the behavior of nuclei with spin ½ in a magnetic field. In biomolecules, 1H and 31P are naturally abundant and, as they are spin ½ nuclei, they can be measured using NMR. On the other hand, the majority of naturally occurring nitrogen and carbon atoms are 14N and 12C, which have spin 1 and 0, respectively, so isotope labelling methods are needed to increase the fraction of 13C and 15N with spin ½. Isotopic labeling of proteins can be done by using a recombinant expression system and growing cells, for example E. coli, in a minimal medium supplemented with 15N-NH4Cl and 13C-glucose as the only nitrogen and carbon sources. In addition, deuterated proteins can be obtained by growing E. coli in a medium containing D2O instead of H2O. To produce labelled RNA, 15N and 13C labeled nucleotides are used during in vitro transcription.
Active nuclei behave like magnets, align with the magnetic field and start to precess around the magnetic field with a frequency
$\nu_0 = \omega_0 / 2\pi$,
called the Larmor frequency. The sum of the spins is called bulk magnetization. Radio frequency pulses can manipulate the orientation of the nuclear spins. The angular velocity ω0 of the precession depends on the static magnetic field B0 and is proportional to the gyromagnetic ratio γ:
$\omega_0 = \gamma B_0$.
γ is an intrinsic property of a nucleus. Due to the dependence on the magnetic field B0, NMR frequencies are difficult to compare when measured at different spectrometers. Therefore, these frequencies are referenced to a specific reference compound to give the chemical shift:
$\delta = 10^6 \cdot (\nu_{spin} - \nu_{ref}) / \nu_{ref}$,
expressed in parts per million (ppm) (Keeler, J., Understanding NMR Spectroscopy, 2nd ed., 2010). The chemical shift depends on the chemical environment. Within a protein, every amino acid atom has a different chemical environment and a characteristic chemical shift. When an atom experiences a change in chemical environment, for example when the amino acid is involved in ligand binding or the protein unfolds, the chemical shift changes.
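The two relations above can be checked numerically. The sketch below uses rounded literature values of γ/2π (an assumption of this example, not taken from the text) and a hypothetical amide proton at 8 ppm to show that the frequency offset in Hz depends on the field strength, while the chemical shift in ppm does not.

```python
# Gyromagnetic ratios divided by 2*pi in MHz per tesla (rounded literature values)
GAMMA_OVER_2PI_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": -4.316, "31P": 17.235}

def larmor_frequency_mhz(nucleus, b0_tesla):
    """nu0 = omega0 / (2*pi) = gamma * B0 / (2*pi), returned in MHz."""
    return GAMMA_OVER_2PI_MHZ_PER_T[nucleus] * b0_tesla

def chemical_shift_ppm(nu_spin_hz, nu_ref_hz):
    """delta = 1e6 * (nu_spin - nu_ref) / nu_ref."""
    return 1e6 * (nu_spin_hz - nu_ref_hz) / nu_ref_hz

# Two hypothetical magnets, often referred to by their approximate 1H frequency
for b0 in (14.1, 21.1):
    nu_ref = larmor_frequency_mhz("1H", b0) * 1e6  # reference frequency in Hz
    nu_spin = nu_ref * (1 + 8.0e-6)                # hypothetical proton at 8 ppm
    offset_hz = nu_spin - nu_ref
    print(f"B0 = {b0} T: nu_ref = {nu_ref / 1e6:.1f} MHz, "
          f"offset = {offset_hz:.0f} Hz, "
          f"delta = {chemical_shift_ppm(nu_spin, nu_ref):.2f} ppm")
```

The negative value for 15N reflects its negative gyromagnetic ratio, which reverses the sense of precession.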
Two types of spin-spin interactions, scalar coupling and dipolar coupling, are observable by NMR. The scalar coupling, also called J coupling, is indirect and mediated through chemical bonds, while the dipolar coupling is based on the direct interaction of the two dipoles. The latter depends on the distance between the two spins and their orientation relative to the magnetic field. Due to this orientation dependence, dipolar coupling cannot be observed in solution. There, molecules tumble rapidly and adopt different orientations, such that the dipolar coupling is averaged out.
1.2.3.1 The chemical exchange
Protein-RNA or protein-protein interactions can be monitored by various NMR titration experiments. Unlabeled ligand or protein is titrated into a labelled protein in small steps until saturation is reached. Usually, 1H-15N Heteronuclear Single Quantum Coherence (HSQC) experiments are recorded. The 1H-15N HSQC spectrum is a ‘fingerprint’ of a protein showing one peak for each amino-acid NH group, where the position of the signal represents its local chemical environment. To determine which signal corresponds to which amino acid in the protein, a protein backbone assignment needs to be done (section 1.2.3.3).
Figure 1.9. Exchange regimes. Different exchange regimes of a protein-ligand interaction indicated on 1D 1H spectra (A) and 2D 1H-15N HSQC spectra (B). Simulated spectra for a protein-ligand interaction showing line shapes under different exchange regimes. Adapted from Waudby et al., 2016.
Upon ligand or protein binding, the local chemical environment and thus the position of the peak changes (Waudby et al., 2016). The appearance and position of the signal depend on the exchange rate kex between the free and bound conformation, relative to their frequency difference, Δω. When the exchange rate is much smaller than the difference in frequency (kex ≪ Δω), two signals are observed, one corresponding to the free and one to the bound form. With increasing ligand concentrations, the intensity decreases for the free form and increases for the bound form. This exchange regime is called “slow exchange” (Figure 1.9, top). When the exchange rate is much larger than the difference in frequency (kex ≫ Δω), only one signal is observed, at the population-weighted average frequency of the bound and free forms. With increasing ligand concentrations, a progressive change in peak position is observed. This exchange regime is called “fast exchange” (Figure 1.9, bottom). If the exchange rate is close to the difference in frequency (kex ≈ Δω), a more complex behavior is observed. The signals are broadened due to exchange, resulting in lower or invisible signal intensities. This is called “intermediate exchange” (Figure 1.9, middle) (Waudby et al., 2016). If intermediate exchange is observed, it is possible to shift the exchange regime by a change in temperature (increasing or decreasing the exchange rate) or magnetic field (changing the frequency difference). When the protein-RNA interaction is in fast exchange, the change of the signal position can be easily followed (Williamson, 2013). To quantify the chemical shift change and map the binding surface, the combined chemical shift difference (ΔCS) between the free and bound state is calculated according to: