
CHARACTERIZATION OF A SINGLE TRAM DOMAIN RNA-BINDING PROTEIN FROM THE ANTARCTIC METHANOGEN METHANOCOCCOIDES BURTONII

Taha

A thesis in fulfilment of the requirements for the degree of Doctor of Philosophy

School of Biotechnology and Biomolecular Sciences, Faculty of Science, The University of New South Wales

March 2016

THE UNIVERSITY OF NEW SOUTH WALES Thesis/Dissertation Sheet

Surname or Family name: TAHA

First name: Other name/s:

Abbreviation for degree as given in the University calendar: PhD

School: Biotechnology and Biomolecular Sciences Faculty: Faculty of Science

Title: Characterization of a single TRAM domain RNA-binding protein from the Antarctic methanogen Methanococcoides burtonii

Abstract 350 words maximum: (PLEASE TYPE)

TRAM domain proteins present in Archaea and Bacteria have a β-barrel shape with anti-parallel β-sheets that form a nucleic acid binding surface; a structure also present in cold shock proteins (Csps). This thesis explores evolutionary, biophysical, structural and nucleic acid binding properties of a single TRAM domain protein, Ctr3 (cold-responsive TRAM domain protein 3), from the Antarctic archaeon Methanococcoides burtonii. Ctr3 binds RNA and unfolds reversibly with a two-state mechanism (Tm of ~50°C). An on-column in vitro binding assay was used to capture M. burtonii RNA targets of Ctr3. Identification of the captured RNA using RNA-seq revealed that Ctr3 bound M. burtonii RNA with a preference for tRNA and 5S rRNA, and a potential binding motif was identified. In tRNA, the motif represented the C loop; a region that is conserved in tRNA from all domains of life and appears to be solvent exposed, potentially providing access for Ctr3 to bind. In 5S rRNA, the motif represented one side of the stem and loop C, which also appears to be solvent exposed, providing possible access to Ctr3. At low temperatures, nucleic acids are prone to form stable secondary structures, which consequently impede transcriptional and translational processes in the cell. In Bacteria, a family of Csps has been postulated to resolve inhibitory structures of nucleic acids, thereby facilitating transcription and translation at low temperatures. However, while being ubiquitous in bacterial genomes, only a few csp homologs have been identified in Archaea. The tertiary structures of Ctr3 and Csps are similar. The broad representation of single TRAM domain proteins within Archaea compared to their apparent absence in Bacteria, and the scarcity of Csps in Archaea but prevalence in Bacteria, suggest they represent distinct evolutionary lineages of functionally equivalent RNA-binding proteins. Although there is little sequence identity among TRAM domain and Csp proteins, based on evolutionary and tertiary structural analyses, both proteins are inferred to play important roles in cold adaptation, including low temperature translation.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).

Signature ………………  Witness Signature ………………  Date 31/03/2016

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.


ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed ……………………………………………......

Date ...……………………………………………......


Abstract

Methanococcoides burtonii, a methanogen that was isolated from Ace Lake, Antarctica, has proven to be a useful model for studying the molecular mechanisms of cold adaptation in Archaea. In Ace Lake, M. burtonii only experiences temperatures below 4°C, and nucleic acid binding proteins have been inferred to play important roles in cold adaptation. In this thesis, the three M. burtonii ctr (cold-responsive TRAM domain) genes (proteins) Mbur_0304, Mbur_0604 and Mbur_1445 are referred to as ctr1 (Ctr1), ctr2 (Ctr2) and ctr3 (Ctr3), respectively. The thesis reports the first experimental studies to assess the evolutionary, biophysical and structural properties of Ctr3 and the unique characteristics of its nucleic acid binding, including specific interactions that point to putative roles of Ctr3 in the cell.

During purification, E. coli nucleic acids consistently co-purified with recombinant Ctr3. The nucleic acid liberated from the proteins was digested by RNase, indicating that the bound nucleic acid was RNA. The bound RNA was removed by treating the recombinant proteins with mild urea to partially unfold them, eluting them across a NaCl gradient, and refolding them by dialysis. Purification of recombinant RNA-free proteins allowed thorough assessment of the structure-function-stability relationship of Ctr3. Ctr3 unfolded reversibly with a two-state mechanism (Tm of ~50°C). The predicted three-dimensional structure of Ctr3 exhibited substantial structural similarity to the TRAM domain of the RumA protein (RlmD) from E. coli. The aromatic residues of Ctr3, particularly the four phenylalanine residues, appeared to be surface exposed and in close proximity to each other on the putative RNA-binding surface, similar to the aromatic residues in the RumA TRAM domain, suggesting similar roles in RNA interaction. An on-column in vitro binding assay was used to capture M. burtonii RNA targets of Ctr3, which were analysed relative to a complete reconstruction of the M. burtonii transcriptome obtained from total RNA. Identification of the captured RNA using RNA-seq revealed that Ctr3 bound M. burtonii RNA with a preference for tRNA and 5S rRNA, and a potential binding motif was identified. In tRNA, the motif represented the C loop; a region that is conserved in tRNA from all domains of life and appears to be solvent exposed, potentially providing access for Ctr3 to bind. In 5S rRNA, the motif represented one side of the stem and loop C, which also appears to be solvent exposed, providing possible access to Ctr3.


At low temperatures, nucleic acids are prone to form stable secondary structures, which consequently impede transcriptional and translational processes in the cell. In Bacteria, a family of cold shock proteins (Csps) has been postulated to resolve inhibitory structures of nucleic acids, thereby facilitating transcription and translation at low temperatures. However, while being ubiquitous in bacterial genomes, only a few csp homologs have been identified in psychrophilic Archaea, and csp genes are absent from the M. burtonii genome. The first insight into which genes in Archaea may perform an analogous function to csp genes came from proteomic analyses of M. burtonii, in which an increased abundance of Ctr proteins at low temperature was identified (Williams et al., 2011). The highest levels of these Ctr proteins occurred at 1 and -2°C and, in particular, Ctr3 exhibited the highest increase (9-fold) at -2°C. Ctr3 and Csps both form a β-barrel shape with anti-parallel β-sheets that form a nucleic acid binding surface. The broad representation of single TRAM domain proteins within Archaea compared to their apparent absence in Bacteria, and the scarcity of Csps in Archaea but prevalence in Bacteria, suggest they represent distinct evolutionary lineages of functionally equivalent RNA-binding proteins. Although there is little sequence identity among TRAM domain and Csp proteins, based on evolutionary and tertiary structural analyses, both proteins are inferred to play important roles in cold adaptation, including low temperature translation.


List of Publications

Taha, Siddiqui, K.S., Campanaro, S., Najnin, T., Deshpande, N., Williams, T.J., Aldrich-Wright, J., Wilkins, M., Curmi, P.M.G., and Cavicchioli, R. (2016) Single TRAM domain RNA-binding proteins in Archaea: functional insight from Ctr3 from the Antarctic methanogen Methanococcoides burtonii. Environmental Microbiology. doi: 10.1111/1462-2920.13229.

Najnin, T., Siddiqui, K.S., Taha, Elkaid, N., Kornfeld, G., Curmi, P.M.G., and Cavicchioli, R. (2016) Temperature-sensing in the Antarctic archaeon, Methanococcoides burtonii: role of a cold-active sensor kinase from a two component regulatory system. Scientific Reports 6:24278. doi: 10.1038/srep24278.

Conference presentation

Taha, Campanaro, S., Siddiqui, K.S., Curmi, P.M.G., Deshpande, N., Wilkins, M., and Cavicchioli, R. (2015) Phylogenetic, structural and nucleic acid binding properties of a novel type of RNA binding (TRAM) protein from an Antarctic archaeon. 6th International Conference on Polar and Alpine Microbiology, České Budějovice, Czech Republic, 6-10 September.


Acknowledgements

Alhumdulillah! I would like to begin by thanking Allah for all His blessings. It has been an exciting journey; numerous ups and downs, but coming out on top successfully surely makes me forget all the difficult times.

I will always be grateful to Prof Rick Cavicchioli. He has been kind and patient during some of my most stressful times in Sydney. Without his support, I would not have been able to complete my degree. I extend my prayers and best wishes to him and his family.

I would also like to express my immense gratitude to Sohail. He has always been the first person to help and give me potential solutions to most of my work-related problems. There is always something new to learn from him. It has been a privilege to know a scientist of his calibre.

Very special thanks to Prof Paul Curmi and Dr. Stefano Campanaro for all their valuable input and support. Thank you for making time despite your busy schedules.

My acknowledgement will not be complete without thanking all my labmates: Sohaila for helping me settle down in the lab in my first year; Tim for helping me with the phylogenetic analyses; Haluk for his moral support; Benny and Yan for being amazing friends on the Czech Republic trip; and Tahria for being an amazing sister. I am grateful to Khalid Daud for his moral guidance and encouragement. I would like to take this opportunity to also mention Anne, Nandan and Janice for their assistance.

To my parents, my brother and sister, I could not have asked for a better family. I hope I have been able to make you guys proud. And finally, Easha, you are the kindest and the most patient person I have ever met in my entire life. I think I finally have some time to take you to the zoo!


Table of Contents

Declaration of originality

Abstract

List of Publications

Acknowledgements

List of Abbreviations

List of Figures

List of Tables

Chapter 1. General Introduction

1.1 Archaea

1.2 Methanococcoides burtonii

1.2.1 Genomic analysis of M. burtonii

1.2.2 Proteomic analysis of M. burtonii

1.2.3 Transcriptomic analysis of M. burtonii

1.2.4 Proteins from M. burtonii

1.3 Cold adaptation in microorganisms

1.3.1 Cold adaptation in Bacteria

1.3.2 Cold adaptation in Archaea

1.4 Binding modules in different RNA-binding proteins

1.5 TRAM domain proteins

1.6 RNA-Protein interactions

1.7 RNA-seq and its implications in transcriptome analysis

1.8 Aims of project

Chapter 2. Phylogenetic analysis of Ctr3

Abstract

2.1 Introduction

2.2 Materials and Methods

2.3 Results

2.4 Discussion

Chapter 3. Overexpression and purification of Ctr proteins

Abstract

3.1 Introduction

3.2 Materials and Methods

3.2.1 Synthesis of ctr genes

3.2.2 E. coli competent cells preparation and transformation

3.2.3 Overexpression and purification of Ctr proteins

3.2.3.1 Purification of nucleic acid-bound Ctr proteins (protocol A)

3.2.3.2 Purification of nucleic acid-free Ctr proteins (protocol B)

3.2.4 SDS PAGE analysis

3.2.5 Protein quantification

3.2.6 Protein Identification

3.2.7 Nucleic acid identification assay

3.3 Results

3.3.1 Expression and protein purification

3.3.2 Quantification of nucleic acid content of protocol A vs protocol B

3.3.3 Quantification of Ctr proteins

3.3.4 Identification of the bound nucleic acids

3.4 Discussion

Chapter 4. Biophysical and structural analysis of Ctr3

Abstract

4.1 Introduction

4.2 Materials and Methods

4.2.1 Far-UV circular dichroism (CD) spectrometry

4.2.2 Near-UV circular dichroism (CD) spectrometry

4.2.3 Intrinsic fluorescence spectrometry

4.2.4 Extrinsic fluorescence spectrometry

4.2.5 Differential Scanning Calorimetry (DSC)

4.2.6 Transverse urea gradient gel electrophoresis (TUG-GE)

4.2.7 Homology modelling and structure prediction

4.3 Results

4.3.1 Far-UV CD spectrometric analysis of Ctr3

4.3.2 Near-UV CD spectrometric analysis of Ctr3

4.3.3 Intrinsic fluorescence spectrometric analysis of Ctr3

4.3.4 Extrinsic fluorescence spectrometric analysis of Ctr3

4.3.5 DSC analysis of Ctr3

4.3.6 TUG-GE analysis of Ctr3

4.3.7 Thermodynamic parameters of Ctr3

4.3.8 Structural analysis of Ctr3

4.4 Discussion

4.4.1 Reversibility of Ctr3

4.4.2 Thermodynamics of unfolding of secondary vs tertiary structures

4.4.3 Possible effects of protein-RNA interactions on structural stability

4.4.4 Refolding of Ctr3 to a different conformation

4.4.5 Structural features of Ctr3

4.4.6 Conclusion

Chapter 5. Temperature-dependent gene expression and transcriptome analysis of M. burtonii

Abstract

5.1 Introduction

5.2 Materials and Methods

5.2.1 M. burtonii cultures and RNA extraction

5.2.2 RNA-seq and data analyses

5.3 Results

5.4 Discussion

Chapter 6. RNA binding specificity and possible cellular functions of Ctr3

Abstract

6.1 Introduction

6.2 Materials and Methods

6.2.1 RNA extraction and in vitro binding to Ctr3

6.2.2 RNA-seq and data analyses

6.3 Results

6.3.1 Ctr3 binds specific M. burtonii transcripts

6.3.2 Binding motif of Ctr3

6.4 Discussion

6.4.1 Preference for structured RNA

6.4.2 Possible role in translation

6.4.3 Evolutionary implications of single TRAM domain proteins

Chapter 7. General discussion and future perspectives

7.1 Relationship between RNP and TRAM RNA-binding modules

7.2 Roles of Phe residues in the nucleic acid binding surface

7.3 Characteristics of Csp and TRAM proteins: possibility of common roles

7.4 Evolution of cold shock and TRAM domain proteins

7.5 Future work and concluding remarks

Bibliography

Appendix 1. List of Archaea and Bacteria used for the phylogenetic analysis.

Appendix 2. Complete 16S rRNA tree

Appendix 3. Alignment of amino acid sequences of TRAM domains from Bacteria and Archaea.

Appendix 4. Alignment of nucleic acid sequences of TRAM domains from Bacteria and Archaea.

Appendix 5. List of all operons, TSS and TTS from M. burtonii genome identified from the RNA-seq data.

Appendix 6. Complete list of transcripts bound by Ctr3 at 4 and 23°C, and M. burtonii genes upregulated at 4 and 23°C.

Appendix 7. List of all RNA targets from 4 and 23°C RNA containing the 41 nucleotide full-length 4C_M1 sequence.

Appendix 8. List of all RNA targets from 4 and 23°C RNA containing the nine nucleotide core 4C_M1 sequence.

Appendix 9. Multiple sequence alignment of M. burtonii tRNA bound by Ctr3 from 4°C and 23°C grown cultures.

Appendix 10. Multiple sequence alignment of M. burtonii 5S rRNA bound by Ctr3 from 4°C and 23°C grown cultures.

Appendix 11. Alignment of tRNA genes from selected Archaea, Bacteria and Eucarya.


List of Abbreviations

(NH4)2Fe(SO4)2    Ammonium ferrous sulphate
µg    Microgram
µl    Microlitre
µM    Micromolar
µm    Micrometre
2DE    Two-dimensional gel electrophoresis
ABC    ATP-binding cassette
AlK(SO4)2    Aluminium potassium sulphate
ANS    8-anilinonaphthalene-1-sulphonate
APS    Ammonium persulphate
Arg    Arginine
atm    Atmosphere
ATP    Adenosine triphosphate
AU    Adenylate-uridylate
BLAST    Basic local alignment search tool
bp    Base pair
CaCl2    Calcium chloride
CD    Circular dichroism
cDNA    Complementary deoxyribonucleic acid
ChIP    Chromatin immunoprecipitation
CO2    Carbon dioxide
CoSO4    Cobalt sulphate
CSD    Cold shock domain
Csp    Cold shock protein
Ctr    Cold-responsive TRAM domain
CuSO4.5H2O    Copper sulphate with 5 molecules of water
DSC    Differential scanning calorimetry
ECM    Extracellular matrix
EDTA    Ethylenediaminetetraacetic acid
EF-2    Elongation factor 2
Ɛmax    Emission maximum
FeSO4.7H2O    Ferrous sulphate with 7 molecules of water
Fu    Fraction of unfolded proteins
GC    Guanine-cytosine
Gly    Glycine
GTP    Guanosine-5'-triphosphate
h    Hour
H2O    Water
H3BO3    Boric acid
HCl    Hydrogen chloride
HEPES    4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
His    Histidine
His-tags    Polyhistidine-tag consisting of at least six histidine residues at the N-terminal
HTV    High-tension voltage
IF    Initiation factor
Ile    Isoleucine
IMAC    Immobilized metal affinity chromatography
IPTG    Isopropyl β-D-1-thiogalactopyranoside
K2HPO4    Potassium hydrogen phosphate
kcal    Kilocalorie
KCl    Potassium chloride
kDa    Kilodalton
KH    K homology
kJ    Kilojoule
L    Litre
LB    Luria broth
LC-MS    Liquid chromatography–mass spectrometry
LC-MS/MS    Liquid chromatography–tandem mass spectrometry
Leu    Leucine
Lys    Lysine
mA    Milliampere
mg    Milligram
mg/ml    Milligram per millilitre
MgCl2.6H2O    Magnesium chloride with 6 molecules of water
MgSO4.7H2O    Magnesium sulphate with 7 molecules of water
Milli-Q    Ultra-pure water (Millipore)
Min    Minutes
ml    Millilitre
mM    Millimolar
MnSO4.7H2O    Manganese sulphate with 7 molecules of water
mol    Moles
MOPS    3-(N-morpholino) propanesulphonic acid
mRNA    Messenger ribonucleic acid
N2    Nitrogen
Na2MoO4.2H2O    Sodium molybdate with 2 molecules of water
Na2S    Sodium sulphide
NaCl    Sodium chloride
NCBI    National Center for Biotechnology Information
ng    Nanogram
NH4Cl    Ammonium chloride
Ni    Nickel
NJ    Neighbour joining
nm    Nanometre
NMR    Nuclear magnetic resonance
OB    Oligonucleotide/oligosaccharide-binding fold
OD    Optical density
ORF    Open reading frame
PCR    Polymerase chain reaction
PE    Paired-end
PEI    Polyethylenimine
Phe    Phenylalanine
PNP    Purine nucleoside phosphorylase
PUA    Pseudouridine synthase and archaeosine transglycosylase
qPCR    Quantitative PCR
RIP    Ribonucleoprotein immunoprecipitation
RNA    Ribonucleic acid
RNAP    RNA polymerase
RNA-seq    RNA sequencing
RNP    RNA-recognition motif
RpoEF    RNA polymerase subunits E and F
rRNA    Ribosomal ribonucleic acid
SDS-PAGE    Sodium dodecyl sulphate polyacrylamide gel electrophoresis
Sec    Seconds
SEC    Size exclusion chromatography
SELEX    Systematic evolution of ligands by exponential enrichment
SR    Serine and arginine
TEMED    N,N,N',N'-tetramethylethylenediamine
TF    Trigger factor
THUMP    Thiouridine synthases, RNA methylases and pseudouridine synthases
Tm    Melting temperature
tRNA    Transfer ribonucleic acid
TSS    Transcription start site
TTS    Transcription termination site
TUG-GE    Transverse urea gradient-gel electrophoresis
Tyr    Tyrosine
UTR    Untranslated region
UV    Ultra-violet
v/v    Volume/volume
Val    Valine
ZnSO4    Zinc sulphate
ΔCp    Change in heat capacity
ΔG    Free energy change
ΔHcal    Calorimetric enthalpy change
ΔHvH    Van’t Hoff enthalpy change
ΔS    Entropy change
λmax    Wavelength maximum


List of Figures

1.1 Image showing ultrastructures of M. burtonii cells grown at 4°C using scanning electron microscopy.

2.1 Amino acid neighbor-joining (NJ) tree.

2.2 Nucleic acid neighbor-joining (NJ) tree.

2.3 Phylogenetic relationship of TRAM domain proteins.

3.1 Plasmid maps for pJexpress with ctr1 (Mbur_0304), ctr2 (Mbur_0604) and ctr3 (Mbur_1445) inserts.

3.2 Induction gel for overexpression of ctr3 in E. coli Rosetta (DE3) at 15°C.

3.3 Ni-column chromatography and SEC purification profiles for Ctr proteins.

3.4 SDS-PAGE protein bands of recombinant Ctr.

3.5 Amino acid sequence of Ctr proteins confirmed by mass spectrometry.

3.6 Identification of the nucleic acids bound to Ctr3.

4.1 Far-UV CD spectrometry analyses of Ctr3.

4.2 Near-UV CD spectrometry analyses of Ctr3.

4.3 Intrinsic fluorescence spectrometric analyses of Ctr3.

4.4 Blue shift of emission maximum of ANS fluorescence.

4.5 Fluorescence emission of ANS excited at 380 nm.

4.6 Stability and unfolding characteristic of RNA-free Ctr3; DSC plots for Ctr3.

4.7 Stability and unfolding pattern of RNA-bound Ctr3; DSC plots.

4.8 TUG-GE of Ctr3.

4.9 ΔG for Ctr3 unfolding as a function of temperature (45 to 52°C).

4.10 ΔHcal and ΔHvH determined from DSC thermogram.

4.11 Plot of ΔG(T) versus temperature.

4.12 Structural comparison of M. burtonii Ctr3 with E. coli RumA (RlmD).

4.13 Comparison between Ctr3 and TRAM domain protein structures from Archaea.

4.14 Alignment of Ctr3 from M. burtonii, MM_1357 from M. mazei and RumA-TRAM domain from E. coli.

4.15 Biophysical analyses of Ctr3.

5.1 Linear plot of RNA-seq and microarray data.

5.2 Overview of M. burtonii transcriptome data.

5.3 Example of consensus promoters (TATA) within ORFs of an operon.

5.4 CS-box 1-like elements within the 5’-UTRs of ctr genes from M. burtonii.

6.1 RNA bound by Ctr3 and total M. burtonii RNA used for RNA-seq electrophoresed on a non-denaturing 1.5% agarose gel.

6.2 RNA species bound by Ctr3 from 4°C and 23°C grown cultures that contain both the full length and core sequence of the 4C_M1 motif.

6.3 Location of full-length 4C_M1 motif within the predicted structures of M. burtonii 5S rRNA and tRNA.

6.4 Location of the nine nucleotide core sequence of 4C_M1 in the tertiary structures of 5S rRNA and tRNA.

6.5 Relative abundance of tRNA and 5S rRNA transcripts in total RNA at 4°C compared to 23°C.

6.6 Model showing putative cellular functions of Csps, L5 and Ctr3.

7.1 Primary sequence alignment of Csp homologs from E. coli and M. frigidum, and Ctr3 from M. burtonii.

7.2 Homology model of Ctr3.


List of Tables

4.1 Analysis of thermal unfolding of secondary structure of Ctr3.

4.2 Analysis of thermal unfolding of tertiary structure of Ctr3.

4.3 Analysis of free energy change (ΔG) of secondary vs tertiary structure of Ctr3.

5.1 Summary of genes identified in M. burtonii RNA-seq analysis.

6.1 Transcripts bound by Ctr3 which contain both the full 4C_M1 motif and the core sequence of the motif.

6.2 Comparison between Csp proteins from Bacteria and Ctr3/TRAM proteins from Archaea.


Chapter 1

General introduction


1.1 Archaea

Archaea were recognised as a separate domain only around 25 years ago, when Woese et al. (1990) formally introduced them as the third domain of life. This was achieved through comprehensive rRNA gene analysis aimed at constructing a universal tree of life (Woese et al., 1990). Prior to this, phylogenetic studies were based on macromolecules such as proteins (i.e. phenotypic characteristics). As a result, the living world was divided into either Prokaryotes or Eucaryotes (Woese and Fox, 1977). This division did not address the different primeval branches from the common line of descent, and hence the living world was not accurately classified (Woese and Fox, 1977). By analysing the functionally and structurally conserved small-subunit rRNA across all species, a new view of the taxonomy of life was put forward (Woese and Fox, 1977). In 1990, the three domains of life were formally introduced: Archaea, Bacteria and Eucarya (Woese et al., 1990).

Archaea have several features that distinguish them from Bacteria and Eucarya (Cavicchioli, 2011). For example, the fundamental difference between archaeal and bacterial/eucaryotic membrane phospholipids is in the composition of the glycerol phosphate backbone. In Archaea, phospholipids are constructed from glycerol-1-phosphate and the branched hydrocarbon chains are bound by ether linkages, whereas in Bacteria and Eucarya, phospholipids are made up of glycerol-3-phosphate and mostly straight hydrocarbon chains (although some are branched), linked by ester linkages (Cavicchioli, 2011). Also, Archaea are the only life forms known to produce biological methane via methanogenesis (Cavicchioli, 2011). However, Archaea also share some similarities with Bacteria and Eucarya. For example, archaeal transcriptional components (such as TATA-box binding protein, transcription factors B and E, and DNA-dependent RNA polymerase) are all similar to those of Eucarya (Reiter et al., 1990; Hausner et al., 1991; Ouzounis and Sander, 1992; Langer et al., 1995; Hanzelka et al., 2001), whereas archaeal gene regulation systems are generally bacterial-like (Dennis, 1997; Bell et al., 1999). Generally, at a cytological level, Archaea are like Bacteria in that they are single-celled organisms containing a circular chromosome. However, some Bacteria (e.g. Streptomyces and spirochetes) possess linear chromosomes (Hinnebusch and Tilly, 1993) and some Archaea (e.g. Sulfolobus solfataricus) have more than one origin of replication, analogous to the multi-origin arrangement of linear eukaryotic chromosomes (Lundgren et al., 2004; Robinson et al., 2004). Like Archaea, some Eucarya are also unicellular, e.g. yeast, protozoans, slime moulds and some forms of algae.

Archaea have been isolated from a variety of environments; from anoxic sediments (Falz et al., 1999; Simankova et al., 2001), hot springs (Barns et al., 1994; Kublanov et al., 2009; Reigstad et al., 2010), hydrothermal vents (Jeanthon et al., 1998), hypersaline lakes (Walsby, 1980; Bolhuis et al., 2004; Baliga et al., 2004), acid mines (Bruneel et al., 2008; Edwards et al., 2000), marine (Karner et al., 2001) and cold environments (Franzmann et al., 1992; 1997; Zhang et al., 2009). Archaea are abundant in cold (below 5°C) ecosystems (Cavicchioli, 2006). Cold-adapted or psychrophilic Archaea have been isolated from alpine and polar habitats, deep oceans, terrestrial and ocean surfaces (Cavicchioli, 2006). Psychrophilic Archaea play important roles in global biogeochemical cycles; e.g. nitrification of soil and cycling of simple carbon compounds via methanogenesis and anaerobic oxidation of methane (Leininger et al., 2006; Cavicchioli, 2006; Cavicchioli et al., 2007).

Despite the diversity of psychrophilic Archaea in the environment, they have long been understudied. This is probably because of the difficulties in culturing psychrophilic Archaea under laboratory conditions (Cavicchioli, 2006). However, a few have been isolated and characterized: Cenarchaeum symbiosum (not in pure culture) (Schleper et al., 1997; 1998), Halorubrum lacusprofundi (Franzmann et al., 1988; Gibson et al., 2005), Methanogenium frigidum (Franzmann et al., 1997; Saunders et al., 2003; Cavicchioli, 2006; Giaquinto et al., 2007) and Methanococcoides burtonii (Franzmann et al., 1992; Cavicchioli, 2006; Williams et al., 2011 and references therein). Most of the advances in understanding the biology of psychrophilic Archaea have been obtained from studies on the Antarctic methanogen, M. burtonii (Cavicchioli, 2006).

1.2 Methanococcoides burtonii

M. burtonii was isolated from the anoxic bottom water of Ace Lake, which is located in the Vestfold Hills region (68°S, 78°E), close to Davis Station, in the Australian Antarctic Territory (Franzmann et al., 1992). The lake system is meromictic (permanently stratified): ice covers the upper layer, salinity increases with depth, and the anoxic region extends from 12 to 25 metres (the maximum depth of the lake). Ace Lake is covered with ice for almost 11 months of the year and has a constant temperature of 1-2°C (Franzmann et al., 1992). Aside from the methanogenic Archaea (M. burtonii and M. frigidum), the microbial community of Ace Lake also includes anaerobic sulphate-reducing and sulphur-oxidizing Bacteria (Rankin et al., 1999; Cavicchioli, 2015).

M. burtonii is an obligate methylotrophic methanogen that utilizes only methylamines and methanol as substrates for methanogenesis to generate cellular energy and produce biomass; hence, M. burtonii does not compete with hydrogen-utilizing, sulphate-reducing Bacteria in Ace Lake. M. burtonii is flagellated and motile, and can grow across a range of temperatures from -2 to 29°C, with the maximum growth rate occurring at 23°C (Franzmann et al., 1992; Cavicchioli, 2006; Reid et al., 2006; Williams et al., 2011). M. burtonii is classified as a eurypsychrophile due to its ability to grow over a wide range of temperatures (-2 to 29°C), as opposed to a stenopsychrophile, which is restricted to growth at low temperatures, e.g. M. frigidum (Cavicchioli, 2006).

Figure 1.1. Image showing ultrastructures of M. burtonii cells grown at 4°C using scanning electron microscopy. This figure has been adapted from Campanaro et al. (2011).

M. burtonii is amenable to laboratory cultivation (Cavicchioli, 2006) and has proven to be an excellent model for studying cold adaptation in psychrophilic Archaea. To date, many studies have been performed on M. burtonii to extensively investigate cellular properties related to cold adaptation. A range of genomic, proteomic and transcriptomic analyses, together with studies focusing on specific proteins, have provided insights into the mechanisms of cold adaptation in M. burtonii and, more broadly, in psychrophilic Archaea (for example, Allen et al., 2009; Williams et al., 2011; Campanaro et al., 2011). The following sections describe findings from previous studies on M. burtonii.

1.2.1 Genomic analysis of M. burtonii

In 2003, Saunders and colleagues constructed a draft genome of M. burtonii by performing comparative genomic analyses on methanogens and 16 closed archaeal genomes (available at that time). Several proteins were identified that potentially played roles in cold adaptation, e.g. two cold shock domain (CSD) fold proteins and a winged-helix DNA binding protein (Saunders et al., 2003). The genome of M. burtonii was finally completed by Allen et al. (2009). It was reported that M. burtonii exhibited a genomic capacity to accommodate highly skewed amino acid content. This was reflected by the cold adapted proteins having a higher degree of amino acid skew. However, the codon usage in M. burtonii was similar to that of mesophilic Methanosarcina spp. A comprehensive analysis of the whole genome reported an over-representation of polysaccharide biosynthetic, cell wall, cell membrane, envelope biosynthetic, signal transduction, replication, recombination and repair genes, and a large number of transposase genes (Allen et al., 2009). Several Bacteria-like central metabolic and signal transduction genes were also identified that were indicative of horizontal gene transfer from Epsilonproteobacteria and Deltaproteobacteria. The presence of a large number of transposases in M. burtonii suggests that its genome has undergone rearrangement, possibly leading to its adaptation to low temperature. Overall, the analysis revealed that nucleotide skew, horizontal gene transfer and transposase activity may allow M. burtonii to evolve through genome plasticity and adapt to cold environments (Allen et al., 2009).

1.2.2 Proteomic analysis of M. burtonii

A comprehensive proteomic analysis of M. burtonii was first performed by Goodchild et al. (2004a). Tandem liquid chromatography-tandem mass spectrometry (LC/LC-MS/MS) was used to analyse cell extracts of M. burtonii grown at 4°C. A total of 528 proteins were identified, including 133 hypothetical proteins. However, because proteins were identified at a single temperature (4°C), processes or proteins directly involved in cold adaptation were not identified. Two-dimensional gel electrophoresis (2DE) followed by mass spectrometry was then used to perform a comparative proteomic analysis at 4 vs 23°C (Goodchild et al., 2004b). A total of 43 differentially abundant proteins were identified. To complement the 2DE analysis, isotope coded affinity tag chromatography was performed and an additional 11 proteins were identified. Furthermore, another comparative proteomic analysis was performed to analyse secreted proteins and determine differentially abundant proteins at 4 vs 23°C (Saunders et al., 2003). Overall, these proteomic analyses revealed that proteins involved in transcription, protein folding, protein transport, translation, metabolism and cellular interactions are likely to play important roles in cold adaptation of M. burtonii.

A global proteomic analysis of soluble, insoluble and supernatant fractions from cell extracts of M. burtonii was performed to examine the effects of different growth temperatures (4°C vs 23°C) and different methylated growth substrates (methanol vs trimethylamine) (Williams et al., 2010). This study reported a number of nucleic acid binding proteins (involved in transcription and translation), isomerases (involved in protein folding) and surface layer proteins (involved in maintaining the cell envelope) that were upregulated at 4°C, and it was postulated that these proteins were involved in cold adaptation (Williams et al., 2010). Another proteomic analysis assessed global protein levels across the full growth temperature range of M. burtonii: at -2, 1, 4, 10, 16, 23 and 28°C (Williams et al., 2011). From the analysis, M. burtonii was identified as having three different temperature-dependent physiological states: cold stressed (at -2°C), cold adapted (at 1, 4, 10 and 16°C) and heat stressed (at 23 and 28°C). At -2°C, numerous oxidative stress proteins were identified, and at 23°C, proteins involved in the heat stress response were upregulated. At 1 and 4°C, proteomic profiles were similar. Therefore, it was postulated that protein abundance at 4°C provides a reliable estimation of gene expression at the native growth temperature of M. burtonii (1-2°C). At 10 and 16°C, cells did not appear stressed and consequently proteins involved in methanogenesis were abundant. Overall, these proteomic studies provided a better understanding of cold stress, heat stress and regular growth of M. burtonii in terms of protein abundance.

1.2.3 Transcriptomic analysis of M. burtonii

To determine low temperature transcript abundance and the arrangement of operons in M. burtonii, a microarray was developed by Campanaro et al. (2011). It was reported that approximately 55% of the genes are arranged in operons that range from 2 to 23 genes. A positive correlation was observed between operon length and mRNA abundance, and longer operons appeared to contain genes with related functions. At 4°C, genes for tRNA modifying proteins, nucleic acid binding proteins and ribosomal proteins were upregulated, and at 23°C, genes involved in methanogenesis, metabolism and transporter proteins exhibited notable upregulation. Furthermore, transcriptomic analysis suggested that transcriptional rather than translational regulation is mostly responsible for controlling gene expression in M. burtonii (Campanaro et al., 2011).

1.2.4 Proteins from M. burtonii

Comparative proteomic analyses have identified several proteins that have been postulated to play roles in cold adaptation of M. burtonii. To date, only a few have been characterized. These include RNA helicase (Lim et al., 2000), elongation factor 2 (EF-2) (Thomas et al., 2001; Siddiqui et al., 2002; Thomas and Cavicchioli, 2002), RNA polymerase subunits E and F (RpoEF) (DeFrancisci et al., 2011) and chaperonins (Pilak et al., 2011).

At low temperature, the RNA helicase gene (Mbur_1950) from M. burtonii is transcribed with a long 5'-untranslated region (UTR) (Lim et al., 2000). The long 5'-UTR contains a sequence with high identity to cold-box elements found in the 5'-UTR of genes encoding CspA, CspB, CspG and CspI (Phadtare et al., 1999) and CsdA (RNA helicase) (Jones et al., 1996) in Escherichia coli and CrhC (RNA helicase) in Anabaena sp. (Chamot et al., 1999). It was postulated that bacterial-like gene regulation may exist for Mbur_1950 and that the helicase can reduce secondary structure formation in mRNA transcripts, and hence allow the ribosomes to access the transcripts (Lim et al., 2000; Williams et al., 2011).

EF-2 is a GTPase that is involved in the translocation of ribosomes and plays an important role in protein synthesis. Structural features of EF-2 from M. burtonii indicate that the protein is flexible in nature due to the presence of fewer salt bridges and a loosely packed hydrophobic core (Thomas and Cavicchioli, 1998). EF-2 showed greater activity at low temperature and lower thermal stability at high temperatures (Thomas and Cavicchioli, 1998). The presence of ribosomes and intracellular compatible solutes improved the in vitro GTPase activity and stability of EF-2 from M. burtonii. Therefore, intracellular components (such as intracellular solutes and ribosomes) influence the thermal characteristics and functional properties of EF-2 at low temperatures (Thomas et al., 2001; Siddiqui et al., 2002; Thomas and Cavicchioli, 2002). In a separate study, the thermodynamic activation properties of EF-2 were assessed (Siddiqui et al., 2002). EF-2 from M. burtonii exhibited irreversible unfolding and required less activation energy for GTP hydrolysis compared to EF-2 from its thermophilic counterpart, Methanosarcina thermophila.

All 12 components of RNA polymerase (RNAP) in M. burtonii are expressed (Goodchild et al., 2004a); however, only subunit E of RNAP is upregulated at 4°C compared to 23°C (Goodchild et al., 2004b; Goodchild et al., 2005). In order to assess whether RpoEF from M. burtonii binds to specific cellular targets that may have potential roles in cold adaptation, the nucleic acid binding characteristics of RpoEF were determined (DeFrancisci et al., 2011). An in vitro method was used to capture specific M. burtonii RNA targets, and a total of 117 genes (4% of the total) representing 48 regions of the genome were identified as being bound by RpoEF (DeFrancisci et al., 2011). These included a number of functional classes: methanogenesis, cofactor biosynthesis, nucleotide metabolism, transcription, translation and transport. Most of these target genes are arranged in operons and are located relatively close to the putative origin of replication, perhaps to facilitate efficient gene expression (DeFrancisci et al., 2011).

Chaperonins are a subclass of molecular chaperones that aid in the refolding of unfolded or partially folded proteins (Hartl and Hayer-Hartl, 2002). M. burtonii contains three group II chaperonins (MbCpns) that are expressed irrespective of growth temperature (Pilak et al., 2011). The crystal structure of MbCpn1960 (one of the three chaperonins) was solved using X-ray crystallography. It was revealed that, both in vivo and in vitro, MbCpn1960 exists predominantly as monomers, whereas in other Archaea chaperonins exist in oligomeric forms. MbCpn1960 was also reported to contain a fully open nucleotide binding site, which appeared to be a unique feature of group II chaperonins from M. burtonii. It was postulated that these unique monomeric chaperonins might play roles in cold adaptation of M. burtonii (Pilak et al., 2011).

1.3 Cold adaptation in microorganisms

Microorganisms growing in cold environments have evolved features (both genetic and physiological) suitable for cellular functions at low temperatures. Cellular response to low temperature varies widely between different microorganisms. The following sections describe some cold adaptation mechanisms in Bacteria and Archaea.

1.3.1 Cold adaptation in Bacteria

Upon temperature downshift, bacterial cells usually enter a transient arrest phase (acclimation phase) and consequently, production of most proteins stops (Polissi et al., 2003). After the cells have adapted to the low temperature, they resume growth but at a slower rate (Phadtare, 2004). Low temperature can decrease membrane fluidity (Cao-Hoang et al., 2010). In Bacillus subtilis, cold shock instigates desaturation of membrane fatty acids. At low temperature, induction of the fatty acid desaturase, sensor kinase and response regulator genes subsequently introduces double bonds into fatty acid molecules of cell membranes (Aguilar et al., 2001; Albanesi et al., 2004).

Most of the cold-inducible proteins in Bacteria play roles in improving RNA metabolism during cold shock (Baria et al., 2013). Examples of such proteins are DEAD-box RNA helicases and exonucleases (such as RNase R and PNPase). RNA helicases can potentially unwind secondary structures of RNA whilst RNase R and PNPase can degrade them (Phadtare, 2011). In E. coli, PNPase abundance has been reported to increase 2-fold during cold shock, and PNPase is important for cell survival at low temperatures. PNPase helps to repress the production of Csp proteins at the end of the transient acclimation phase. RNase R abundance increases 10-fold following cold shock (Cairrão et al., 2003; Andrade et al., 2009). In E. coli, RNase R can digest secondary structures without the assistance of a helicase (Awano et al., 2010). In Bacteria, the helicase activity of DEAD-box RNA helicases has also been shown to be involved in ribosome biogenesis (Martin et al., 2013).

Cold shock proteins (Csp) are among the most important cold-inducible proteins. In Bacteria, cold temperature induces expression of Csp proteins that facilitate survival and growth (Baria et al., 2013). At low temperatures, DNA and RNA are more prone to form stable secondary structures that impede transcriptional and translational processes (Phadtare et al., 1999). One of the major putative roles of Csp proteins is to prevent the formation of inhibitory structures in nucleic acids, thereby facilitating transcription and translation at low temperatures. Bacterial Csp proteins have been shown to bind single-stranded DNA and RNA (Graumann and Marahiel, 1994; Lopez and Makhatadze, 2000). Binding single-stranded RNA improves protein stability and decreases nucleic acid stability. As a result, Csp proteins become less sensitive to proteolytic degradation (Schindler et al., 1999) and the bound RNA becomes more sensitive to ribonuclease digestion (Jiang et al., 1997) at low temperatures.

At low temperatures, DNA can become more negatively supercoiled (Baria et al., 2013) due to the influence of the histone-like HU protein and gyrase. Supercoiling can, in turn, initiate transcription of cold-induced proteins by mechanisms which are not yet fully understood. Csp proteins have been proposed to maintain chromosome structure (Chaikam and Karlson, 2010). For example, overexpression of CspE can promote or protect chromosome folding (Hu et al., 1996).

Ribosomes can preferentially act on mRNA from cold-induced genes (Giuliodori et al., 2004). Levels of translation initiation factors (IF) increase during cold shock. IF1 and IF3 are important for the translation of cold-inducible genes; IF3 enhances mRNA translation whereas IF1 enhances the efficiency of IF3 (Gualerzi et al., 2003). IF2 and Hsc66 proteins are involved in correcting protein misfolding at low temperatures (Lelivelt and Kawula, 1995; Caldas et al., 2000). Trigger factor (TF) also plays an important role in protein folding; TF levels increase as temperature falls (Kandror and Goldberg, 1997). In B. subtilis, TF is involved in protein folding at low temperatures (Graumann et al., 1996). In E. coli, TF is induced at low temperatures and has been reported to enhance the viability of cells (Kandror and Goldberg, 1997). Therefore, molecular responses such as modifications in RNA (or nucleic acid) metabolism are critical for low temperature survival.

1.3.2 Cold adaptation in Archaea

Analysis of several archaeal genomes has provided insights into the amino acid composition of cold-adapted proteins (Saunders et al., 2003). Studies focusing on three-dimensional structures and homology models of psychrophilic proteins revealed that solvent-accessible areas of the proteins consist of more glutamine, threonine and hydrophobic residues. Exposed hydrophobic residues are likely to destabilize the surfaces of psychrophilic proteins, thereby increasing the flexibility of the protein, reducing the activation energy of the protein-substrate transition state and improving catalytic efficiency at low temperatures (Saunders et al., 2003; Siddiqui and Cavicchioli, 2006). In addition, compared to their mesophilic counterparts, psychrophilic proteins possess an increased number and clustering of glycine residues and fewer proline residues in their loops, thus increasing the flexibility of the secondary structures (Gerday et al., 1997; Feller and Gerday, 1997; Georlette et al., 2004; Siddiqui and Cavicchioli, 2006). These structural strategies adopted by psychrophilic proteins contribute to the overall flexibility of the protein and consequently improve low temperature function and stability.

In psychrophilic Bacteria, post-transcriptional incorporation of dihydrouridine increases the flexibility of tRNA at low temperatures (Dalluge et al., 1996; 1997). It was speculated that GC content does not play a role in maintaining tRNA flexibility in psychrophilic Archaea; rather, it was proposed that incorporation of dihydrouridine can increase the flexibility of tRNA. Putative dihydrouridine synthase genes were identified in the genomes of M. burtonii and M. frigidum, indicating that the incorporation of dihydrouridine is perhaps a characteristic of psychrophilic Archaea (Saunders et al., 2003; Allen et al., 2009). In M. burtonii, dihydrouridine synthase (Mbur_2154) is upregulated at 4°C (Campanaro et al., 2011).

At low temperatures, the lipid bilayers of cell membranes tend to become rigid and, as a result, membrane fluidity is reduced (Cavicchioli, 2006). This hinders nutrient uptake, electron transport, permeability and other membrane-associated functions. In cold-adapted Bacteria, membrane fluidity is maintained by an increase in the production of unsaturated fatty acids; this is achieved by increasing desaturase enzyme activity or by de novo fatty acid biosynthesis (Wada et al., 1989; Russell and Fukunaga, 1990). Proteomic analysis of M. burtonii indicated that unsaturated lipids are generated by selective saturation (rather than by a bacterial-like desaturase mechanism), which increases membrane lipid unsaturation (Nichols et al., 2004). Similar observations were made for H. lacusprofundi, where the proportion of unsaturated lipids was correlated with growth temperature (Gibson et al., 2005). Hence, an increase in the abundance of unsaturated lipids to maintain membrane fluidity might be a general feature of psychrophilic Archaea.

1.4 Binding modules in different RNA-binding proteins

Almost all transcribed RNA binds to proteins to form ribonucleoprotein complexes. This not only provides stability, but also facilitates processing, nuclear export, transport and localization of RNA (Dreyfuss et al., 2002). There are various kinds of RNA-binding proteins, which allows for diverse functions in cells. However, the RNA-binding modules in these proteins are not very diverse. The principles of some of these modules are described in this section.

The RNA-recognition motif (RNP) is composed of 80–90 amino acids that form a four-stranded anti-parallel β-sheet with two α-helices packed against the β-sheet (Oubridge et al., 1994). RNA recognition occurs on the surface of the β-sheet (Lunde et al., 2007). An RNP can recognize between 4 and 8 nucleotides. Binding is modulated by conserved residues: an arginine or lysine and two aromatic amino acids (usually phenylalanines) that can form stacking interactions with nucleotides. In E. coli and B. subtilis, Csp proteins possess two RNP motifs (Clery et al., 2008). RNPs generally do not bind specific sequences (Dreyfuss et al., 2002).

The K homology (KH) domain, first identified in heterogeneous nuclear ribonucleoprotein K, has been reported to bind both single-stranded DNA and single-stranded RNA (Braddock et al., 2002; Backe et al., 2005). KH domains are present in all domains of life. The KH domain is approximately 70 amino acids long and contains a three-stranded β-sheet packed against three α-helices. Binding recognition by the KH domain is modulated by hydrogen bonding, electrostatic interactions and shape complementarity (Lunde et al., 2007). KH domains are also present in PNPases and exosomes in Bacteria and Archaea.

Zinc fingers are mainly DNA-binding domains that can also bind RNA (Lunde et al., 2007). They are typically classified based on the residues used to coordinate zinc and are often found as multiple repeats within a protein. For example, transcription factor TFIIIA possesses 9 zinc fingers. Nucleic acid recognition by zinc fingers is mainly modulated by hydrogen bonds to Watson–Crick base pairs in the major groove of nucleic acids (Wolfe et al., 2000). Although zinc fingers appear to be non-specific in most RNA-protein interactions, some zinc finger modules can recognize specific sequences. For example, recognition of an AU-rich RNA element by TIS11d is mainly achieved through hydrogen bonding to the protein backbone (Hudson et al., 2004).

S1 domains were initially identified in the ribosomal protein S1. They are composed of approximately 70 amino acids and are also found in several exonucleases (Subramanian, 1983). The S1 domain is formed from a five-stranded antiparallel β-barrel capped by a short α-helix (Bycroft et al., 1997). This structure is similar to the oligonucleotide/oligosaccharide binding (OB)-fold superfamily (Murzin, 1993). The recognition module of the S1 domain is also similar to the OB-fold binding surface. Both modules contain several conserved aromatic residues located on the β-sheets that can form stacking interactions with the nucleic acid bases. The stacking interaction is further supported by interactions provided by the surrounding loops and secondary structure elements (Bycroft et al., 1997; Schubert et al., 2004).

Often an RNA-binding protein consists of several RNA-binding modules. The combined effect of several modules translates to higher specificity and affinity for the RNA substrate. The advantage of having multiple domains is that it allows RNA-binding proteins to recognize long stretches of RNA (Lunde et al., 2007). Overall, the recognition modules in RNA-binding proteins appear to strongly influence the functional properties of the proteins. In other words, RNA-binding specificity can directly affect the biological activity of RNA-binding proteins.

1.5 TRAM domain proteins

Modifications of nucleotides in tRNA and rRNA species are important for their cellular functions (Anantharaman et al., 2001). Some of these modifications include addition of methyl groups, pseudouridine formation and thiolation. RNA modification proteins typically consist of a catalytic domain and a separate RNA-binding domain (Aravind and Koonin, 1999). The primary sequences of these RNA-binding domains are more divergent than those of the catalytic domains (Aravind and Koonin, 1999). Nevertheless, by combining sensitive profile search techniques (e.g. PSI-BLAST and HMM-based methods), several RNA-binding domains have been identified. These include S4, PUA, THUMP, NusB, KH and certain OB-fold domains (Gibson et al., 1993; Wolf et al., 1999; Aravind and Koonin, 1999; 2001; Staker et al., 2000; Koonin et al., 2000). Genome-wide analysis of proteins involved in RNA metabolism revealed another potential RNA-binding domain. The domain was called 'TRAM', named after the two families of tRNA modifying proteins in which it was identified: TRM2 and MiaB. These families are uridine methylases and enzymes involved in thiolation and methylation of tRNA, respectively (Anantharaman et al., 2001; Pierrel et al., 2004). Uridine methylation and adenosine thiolation are almost universal modifications of tRNA. Based on structural analysis of proteins containing TRAM domains, it was postulated that TRAM domains may have roles in translation regulation (Anantharaman et al., 2001).

The N-terminal extension of the TRM2 ortholog from E. coli was used to search for TRAM domains in other proteins. TRAM domains were found in TRM2 proteins from diverse Bacteria and Eucarya. Apart from TRM2 orthologs, TRAM domains were also identified in methylases, translation initiation factors, ribosomal proteins and as single proteins in Archaea (Anantharaman et al., 2001). TRAM domains span 60-70 amino acids and have been identified in the N-terminal regions of all TRM2 orthologs, except for the single proteins from Archaea. This indicated that the N-terminal extension which contained the TRAM domain is an evolutionary mobile domain (Anantharaman et al., 2001).

MiaB proteins are ubiquitous in nature with at least one ortholog present in all genomes. MiaB proteins are involved in the formation of 2-thioadenines in tRNA. In these proteins, the TRAM domain is located in the C-terminal region. To assess the relationship between the N-terminal extension of TRM2 and the C-terminal region of MiaB orthologs, predicted secondary structures were aligned. Interestingly, there were strong similarities across the predicted structural elements of TRAM domains from TRM2 and MiaB. Therefore, it was postulated that an ancient conserved domain may have evolved to give rise to the N-terminal and C-terminal segments of TRM2 and MiaB proteins, respectively (Anantharaman et al., 2001).

The predicted secondary structures of TRAM domains exhibited a β-barrel shape corresponding to five β-strands. The canonical β-barrel structure is also present in OB-folds (Murzin, 1993; Koonin et al., 2000). It was suggested that TRAM domains adopt a simple β-barrel structure that occurs i) in the N-terminal region of TRM2 proteins, ii) in the C-terminal region of MiaB proteins, or iii) as single TRAM proteins. However, the single TRAM proteins appeared unique to Archaea. The occurrence of TRAM domains in these different proteins, together with the formation of the canonical β-barrel structure, suggested that TRAM is a putative RNA-binding domain that delivers various RNA modifying proteins to their potential targets (Anantharaman et al., 2001).

Although MiaB family proteins are ubiquitous in all domains of life, TRM2-like methylases are rarely present in Archaea. Perhaps the scarcity of the modified nucleoside 5-methyluridine (ribothymidine) in Archaea explains the rarity of TRM2-like methylases in Archaea. The exact functions of the stand-alone single TRAM proteins from Archaea are unknown. However, the presence of TRAM domains in the 23S rRNA-specific uracil O-methyltransferase (FtsJ) in Halobacterium (Caldas et al., 2000) and in translation initiation factor eIF-2β in Thermoplasma suggested that TRAM domains perform roles in the regulation of rRNA methylation and translation initiation in Archaea (Anantharaman et al., 2001).

Overall, the TRAM domain was identified as a potential RNA-binding domain which appeared to adopt the canonical β-barrel structure. The presence of this 60-70 residue domain in the nearly universal RNA-modifying enzyme MiaB and in TRM2 tRNA methylases suggests that the TRAM domain is perhaps an evolutionary mobile domain that may have independently combined with different catalytic domains on several occasions in evolution, similar to other RNA-binding domains such as S4, PUA and THUMP (Anantharaman et al., 2001).

1.6 RNA-Protein interactions

RNA-protein interactions are crucial for the regulation of gene expression (Jankowsky and Harris, 2015). RNA-protein interactions form a complex network inside cells. Association between RNA and protein can be both specific and non-specific. Specific interactions involve preferential binding to a defined RNA sequence or structural motif, whereas non-specific associations do not depend on any specific sequence or structure. Proteins involved in RNA degradation, translation elongation and initiation interact with RNA non-specifically, as these proteins deal with a variety of RNA species (Aitken and Lorsch, 2012; Parker and Song, 2004). Many RNA-binding proteins possess multiple binding domains (Singh and Valcárcel, 2005). Interestingly, these RNA-binding domains can also perform other functions besides RNA interactions. For example, RNA-binding domains present in nucleotidyltransferases can function as RNA ligases as well as RNA polymerases (Jankowsky and Harris, 2015). However, there are many RNA-binding domains that only bind RNA and do not perform other functions.

Multiple proteins can bind to an individual RNA (Licatalosi and Darnell, 2010). For example, a large number of CspA molecules have been detected binding to the same mRNA molecule (Goldstein et al., 1990). Conversely, RNA-binding proteins can also bind different RNA species. For example, translation elongation factors (EF-G and EF-Tu) can bind to all charged tRNA (Agirrezabala and Frank, 2009). There are some proteins that do not bind RNA directly; rather, they bind with the aid of another RNA-binding protein. For example, SR protein-specific kinases can regulate splicing in the nucleus with the aid of SR proteins (Zhou and Fu, 2013).

Several methods are available to determine protein binding sites on RNA. Proteins can be crosslinked to RNA by UV radiation followed by immunoprecipitation, and the crosslinked RNA fragment can be identified via next generation sequencing (Hafner et al., 2010). Another method to determine binding sites in RNA is to perform in vitro selection by systematic evolution of ligands by exponential enrichment (SELEX) followed by next generation sequencing (Campbell et al., 2012). Large numbers of RNA variants are provided to the RNA-binding proteins in vitro and SELEX identifies RNA species that are preferentially bound by the protein. However, SELEX identifies only those RNA species with the highest affinity towards the protein. In the RNA Bind-n-Seq technique, a pool of RNA species is incubated in vitro with RNA-binding proteins (Lambert et al., 2014). The proteins are pulled down (e.g. using affinity columns) and the bound RNA is identified using microarray or next generation sequencing. High-throughput sequencing kinetics measures the binding of functional RNA-binding proteins to RNA. In this process, a pool of RNA is incubated with RNA-binding proteins and subsequently two groups of RNA are produced: i) processed RNA (bound by the RNA-binding proteins) and ii) unprocessed RNA (not bound by the RNA-binding proteins). The mixture of processed and unprocessed RNA is then separated by gel electrophoresis. The ratios of processed vs unprocessed RNA species over time are analysed by next generation sequencing, which provides information regarding binding kinetics (Guenther et al., 2013). The above methods have been successfully used to identify different RNA species bound by RNA-binding proteins.

In vitro assessment of protein-RNA interactions often shows that an RNA-binding protein binds to a pool of RNA species. However, the binding affinities of these RNA species for the protein vary. To describe the entire range of affinities of the bound RNA species, affinity distribution plots are quite useful (Guenther et al., 2013). An affinity distribution plot is a histogram representation of substrate variants with similar affinities (Jankowsky and Harris, 2015). In one study, the RNA affinity distribution was measured for RNase P from E. coli and it was revealed that physiologically preferred binding sites cluster at the high-affinity region of the distribution (Guenther et al., 2013). The high-affinity region is defined by the presence of the binding motif, and only a fraction of the total bound RNA contains the motif. The remainder of the distribution consists of non-specific RNA species without the binding motif (Guenther et al., 2013). This illustrates that in in vitro analyses, RNA-binding proteins inherently bind to a range of RNA species and only a fraction of the total bound RNA is likely to represent the protein's targets.
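The idea of an affinity distribution can be illustrated with a minimal Python sketch. It is not taken from the cited studies: the function name `affinity_distribution`, the bin width and the dissociation constants are hypothetical values chosen only for illustration. Each substrate is placed in a bin according to the logarithm of its affinity relative to the weakest binder, so that a few tight binders appear as a small high-affinity tail.

```python
import math
from collections import Counter

def affinity_distribution(kd_values, bin_width=0.5):
    """Bin substrates by log10 relative affinity (weakest Kd / Kd)."""
    reference = max(kd_values)  # weakest binder has relative affinity 1
    bins = Counter()
    for kd in kd_values:
        log_rel = math.log10(reference / kd)
        # Left edge of the bin that this substrate falls into
        bins[math.floor(log_rel / bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))

# Hypothetical Kd values (nM): many weak binders and a few tight binders
kds = [1000, 900, 800, 500, 400, 250, 80, 60, 7, 0.8]
for left_edge, count in affinity_distribution(kds).items():
    print(f"{left_edge:4.1f} {'#' * count}")
```

In this toy data set most substrates fall in the lowest-affinity bin, which mirrors the description above of a large non-specific population and a small high-affinity fraction.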

Most RNA-binding proteins recognize motifs consisting of 3-8 nucleotides, and such short sequences can occur frequently even in small genomes. Variation in the sequences of binding sites often does not affect RNA binding for most RNA-binding proteins (Jankowsky and Harris, 2015). Hence, many specific RNA-binding proteins can also bind RNA non-specifically. The availability of RNA substrates can also affect RNA binding; at low concentrations of specific RNA substrates, RNA species with low affinity and non-consensus binding motifs can bind to the protein to a greater extent.
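How often a short motif is expected to occur by chance can be estimated with a simple calculation. The sketch below assumes a random sequence with equal base frequencies, which real genomes do not have, and uses a hypothetical sequence length of 2.5 million nucleotides; the function name `expected_motif_sites` is likewise an illustrative choice.

```python
def expected_motif_sites(sequence_length, motif_length):
    """Expected occurrences of one fixed motif in a random sequence.

    Assumes all four bases are equally frequent and independent,
    which is a simplification.
    """
    positions = sequence_length - motif_length + 1
    return positions / 4 ** motif_length

SEQUENCE_LENGTH = 2_500_000  # hypothetical small genome, in nucleotides
for k in range(3, 9):
    print(k, round(expected_motif_sites(SEQUENCE_LENGTH, k)))
```

Under these assumptions even an 8-nucleotide motif is expected to occur tens of times, and shorter motifs thousands of times, which is consistent with the point that short motifs alone cannot define a unique target.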

Aside from specific sequence-based motifs, RNA-binding proteins can also bind to structural motifs of RNA. For example, ribosomal proteins in Bacteria bind RNA by recognizing specific structural motifs (Perederina et al., 2002). The intrinsic specificity of an RNA-binding protein can also depend on the ratio of the rate constants for substrate association and dissociation (Jankowsky and Harris, 2015). The resulting binding also relies on the available concentrations of the protein and RNA. Therefore, a highly specific RNA-binding protein can change its apparent specificity through changes in the rate constants or in the RNA or protein concentrations.
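The effect of substrate concentration on apparent specificity can be sketched with a simple equilibrium competition model. This is an illustration under stated assumptions rather than a method from the cited work: one binding site per protein, RNA in excess over protein, and each RNA described by a dissociation constant Kd (the ratio of the dissociation and association rate constants). The function name `occupancy` and all concentrations are hypothetical.

```python
def occupancy(rna_pool):
    """Fraction of protein bound to each RNA species at equilibrium.

    rna_pool maps a species name to (concentration, Kd) in the same units.
    Assumes one binding site per protein and RNA in excess over protein.
    """
    weights = {name: conc / kd for name, (conc, kd) in rna_pool.items()}
    total = 1.0 + sum(weights.values())  # the 1.0 term is unbound protein
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical values in nM: the specific RNA binds 100-fold more tightly
for specific_conc in (100.0, 1.0):
    pool = {
        "specific": (specific_conc, 10.0),
        "non_specific": (1000.0, 1000.0),
    }
    fractions = occupancy(pool)
    print(specific_conc, {k: round(v, 3) for k, v in fractions.items()})
```

With these illustrative numbers the protein is mostly bound to the specific RNA when that RNA is plentiful, but mostly bound to the low-affinity RNA when the specific RNA is scarce, even though the dissociation constants themselves are unchanged.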

1.7 RNA-seq and its implications in transcriptome analysis

The transcriptome is the complete set of transcripts in the cell. Analysis of the transcriptome of an organism allows identification of the functional elements of the genome. It also provides information regarding mRNAs, non-coding RNAs and small RNAs; operon structures, transcription start and stop sites, splicing patterns and post-transcriptional modifications; and expression levels under different growth conditions (Wang et al., 2009). There are various methods to analyse a transcriptome. For example, hybridization-based techniques use fluorescently labelled cDNA with custom-designed microarrays. Hybridization techniques are relatively inexpensive and allow assessment of large genomes; however, they also have limitations. The major disadvantages are that a genomic sequence is required, there is high background interference from cross-hybridization and the technique has a restricted dynamic range (Wang et al., 2009).

Sequence-based techniques can directly determine cDNA sequences using Sanger sequencing (Boguski et al., 1994; Gerhard et al., 2004). However, this process is expensive and generally produces non-quantitative data. To overcome these limitations, tag-based methods were developed. Although tag-based methods give high-throughput and quantitative assessment of gene expression, they still use expensive Sanger sequencing. One of the major disadvantages of this technique is that the short reads cannot be accurately mapped onto the reference genome. Hence, correct structures of transcriptomes cannot be obtained using this method (Wang et al., 2009).

The development of high-throughput RNA sequencing methods (RNA-seq) offers advantages over previous methods. A population of RNA is converted to cDNA and, with or without amplification, can be directly sequenced in a high-throughput manner (Wang et al., 2009). Generally, any high-throughput sequencing platform can be used, e.g. Illumina IG, Applied Biosystems SOLiD and Roche 454 Life Science. After sequencing, the resulting reads can be aligned to a reference genome or reference transcripts, or assembled de novo without any genomic sequence.

Some of the advantages of RNA-seq include: prior knowledge of genomic sequences is not required, RNA-seq can locate transcripts at single-base resolution, and short reads can be accurately mapped to give information regarding adjoining exons. Furthermore, background signal from RNA-seq is substantially lower compared to hybridization techniques. The large dynamic range of detection allows a comprehensive assessment of the expression levels of transcripts. For example, a total of 16 million reads were generated from a transcriptomic study on Saccharomyces cerevisiae and expression levels spanning a greater than 9000-fold range were predicted (Wilhelm et al., 2008). Quantification of gene expression levels by RNA-seq has also been shown to be accurate when compared with quantitative PCR (qPCR) (Wilhelm et al., 2008). Collectively, RNA-seq is the first sequencing-based method developed that allows assessment of the whole transcriptome of an organism in a high-throughput and quantitative manner.

However, the RNA-seq method also poses some challenges. Large RNA molecules require fragmentation before they are compatible with the different sequencing platforms. Fragmentation processes such as hydrolysis or nebulization can introduce bias in the samples (Wang et al., 2009). For example, fragmentation processes that are biased towards the 3’ ends of transcripts will provide more accurate information about 3’ ends compared to 5’ ends. Sometimes short reads are difficult to distinguish from one another, which can distort estimates of the true abundance of RNA species. In order to obtain correct transcriptome annotations, construction of strand-specific cDNA libraries is required. Strand-specific reads are essential to resolve regions with overlapping transcription from opposite directions (David et al., 2006). However, strand-specific cDNA libraries are laborious to construct, i.e. they require many steps or the use of RNA-RNA ligation methods, which can be inefficient (Lister et al., 2008; Cloonan et al., 2008). As a result, successful preparation of strand-specific cDNA libraries is difficult to achieve. Another issue with RNA-seq is cost; greater sequence coverage translates into higher cost. In general, larger genomes are more complex and require more sequencing depth.

Overall, RNA-seq has the dynamic capacity to yield unparalleled global assessment of the transcriptome and organization of genes. The single base resolution of RNA-seq can provide more accurate annotations, define transcript boundaries and identify small, non-coding RNA species. Since RNA-seq can yield quantitative data, RNA expression levels can be compared across different conditions to assess relevant biological aspects of any organism.
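As an illustration of how read counts can be turned into expression levels that are comparable between genes and between conditions, the following minimal sketch computes reads per kilobase per million mapped reads (RPKM). RPKM is one commonly used normalization and is assumed here for illustration only; it is not stated to be the measure used in this thesis, and the gene names, counts and lengths are hypothetical.

```python
def rpkm(read_counts, gene_lengths_bp):
    """Return RPKM for each gene.

    RPKM = reads * 1e9 / (gene length in bp * total mapped reads),
    which corrects for both gene length and sequencing depth.
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads")
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_reads)
        for gene, count in read_counts.items()
    }


# Hypothetical counts and gene lengths (illustration only).
counts = {"geneA": 500, "geneB": 1500}
lengths = {"geneA": 1000, "geneB": 3000}
values = rpkm(counts, lengths)
```

In this toy example geneB has three times as many reads as geneA but is also three times longer, so both genes receive the same normalized value.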


1.8 Aims of project

One of the key features that emerged from the genomic, proteomic and transcriptomic analyses performed on M. burtonii is the upregulation of nucleic acid binding proteins at low temperatures (Allen et al., 2007; Williams et al., 2011; Campanaro et al., 2011). These analyses provided only a global assessment of the gene, protein and transcript content of nucleic acid binding proteins at low temperatures; the functions of many nucleic acid binding proteins have not been characterized experimentally. Clearly, these proteins have functions that are critical to cold adaptation of M. burtonii. To date, only one study, on RpoEF, has determined nucleic acid binding characteristics with the aim of identifying cellular RNA targets (DeFrancisci et al., 2011).

The first insight into which genes in Archaea may perform an analogous function to csp genes came from proteomic analyses of M. burtonii where small proteins, each composed of a single TRAM domain (Mbur_0304, Mbur_0604, Mbur_1445), were found to have increased abundance during low temperature growth (4 vs 23°C; Williams et al., 2010). Subsequently, Williams et al. (2011) determined that the highest levels of these Ctr (cold-responsive TRAM domain) proteins occurred at 1 and -2°C. It was also observed that the abundance of the ATP-dependent RNA helicase Mbur_1950 increased with decreasing growth temperature down to 1°C, but cellular abundance dropped at -2°C (Williams et al., 2011). It was rationalised that the levels of the RNA helicase may be down regulated at the minimum temperature for growth (-2°C) because cellular ATP would be limited, and the increased abundance of Ctr proteins may compensate for the reduced levels of RNA helicase activity (Williams et al., 2011). While the proteomic studies (Williams et al., 2010; Williams et al., 2011) pointed to a role for Ctr proteins in unravelling inhibitory RNA secondary structures formed at low temperature (Williams et al., 2011), experimental data characterizing their functional properties were lacking. In this study, the three M. burtonii TRAM domain genes (proteins) Mbur_0304, Mbur_0604 and Mbur_1445 are referred to as ctr1 (Ctr1), ctr2 (Ctr2) and ctr3 (Ctr3), respectively.

Ctr3 exhibited the highest abundance during low temperature growth of M. burtonii, with the largest increase of ~ 9-fold at -2°C (Williams et al., 2011). Although a putative role of Ctr3 at low temperature was suggested in the transcriptomic and proteomic studies (Campanaro et al., 2011; Williams et al., 2011), experimental data characterizing its functional properties are lacking. The aim of my research was to assess the evolutionary, biophysical and structural properties of Ctr3 and its nucleic acid binding characteristics, including specific interactions that describe the molecular roles of Ctr3 in the cell. This was achieved by addressing the following specific aims:

• Determination of the evolutionary relationship of TRAM proteins across Archaea and Bacteria (Chapter 2).
• Development and optimization of recombinant expression and purification methods for producing nucleic acid free Ctr proteins (Chapter 3).
• Assessment of the biophysical properties of nucleic acid bound vs nucleic acid free forms of Ctr3 proteins. Generation of a homology model for Ctr3 in order to identify structural features that could conceivably facilitate nucleic acid binding (Chapter 4).
• Reconstruction of the M. burtonii transcriptome using RNA-seq data (Chapter 5).
• Development of a method to capture M. burtonii cellular nucleic acid targets for Ctr3, followed by identification using RNA-seq and inference of the cellular function of Ctr3 based on the specific RNA species bound by the protein (Chapter 6).


Chapter 2

Phylogenetic analysis of Ctr3


Abstract

Single TRAM proteins are unique to Archaea. However, the association of the TRAM domain with other catalytic domains is prevalent in both Archaea and Bacteria. TRAM proteins exhibit affinity towards RNA and adopt the canonical β-barrel fold which is typically observed in OB-fold and CSD proteins. TRAM domains are present in MiaB homologs, TRM2 methylases, translation initiation factors and ribosomal proteins. For the purpose of this study, sequences with similarity to Ctr3 within genomes of Archaea and Bacteria were identified and phylogenetic trees were constructed to gain an understanding of the evolutionary relationship of Ctr3 and TRAM domain proteins from Archaea and Bacteria. Single TRAM domain proteins from Archaea clustered together while 23S rRNA methyltransferases and translation IF2 proteins from Archaea, 23S rRNA methyltransferase sequences from Bacteria, and tRNA modification enzymes from both Archaea and Bacteria fell into distinct clades. Many archaeal species have multiple stand-alone TRAM proteins, the phylogeny of which approximates that of the organisms based on their 16S rRNA genes. These results indicate that the genes encoding single TRAM domain proteins arose late in evolution and that the distribution of multiple TRAM domain proteins in archaeal species (e.g. M. burtonii) arose as a result of duplication within the genome.


2.1 Introduction

A previously undetected RNA binding domain was first proposed as a separate domain by Anantharaman et al. (2001). The domain was called 'TRAM', named after two families of tRNA modifying proteins: TRM2 and MiaB. These families are uridine methylases and enzymes involved in thiolation and methylation of tRNA, respectively (Anantharaman et al., 2001; Pierrel et al., 2004). Uridine methylation and adenosine thiolation are almost universal modifications observed in tRNAs (Anantharaman et al., 2001). Besides these two families, TRAM was also identified in translation initiation factors, ribosomal proteins and as stand-alone archaeal TRAM proteins (Anantharaman et al., 2001). Anantharaman et al. (2001) also postulated that these previously uncharacterized archaeal TRAM domain proteins might have roles in tRNA modification or translation. The same study predicted that TRAM domains adopt a β-barrel shape. The unpublished three-dimensional NMR structures of TRAM domain proteins from M. mazei (PDB ID: 1YEZ) and Methanococcus maripaludis (PDB ID: 1YVC) also exhibited β-barrel structures. This canonical β-barrel shape is observed in OB-fold and cold shock domain (CSD) proteins (Murzin, 1993; Sawyer et al., 2015). OB-folds are found in a variety of proteins including nucleases, tRNA synthetases, enterotoxins, verotoxins, ribosomal proteins, phosphorylases, and the translation initiation factors IF1 and eIF2 (Murzin, 1993; Graumann and Marahiel, 1998). These OB-fold proteins have five-stranded β-sheets coiled together to form a closed β-barrel structure (Murzin, 1993), a similar architecture to single TRAM proteins from M. mazei and M. maripaludis; and more importantly to Ctr3 from M. burtonii (see Chapter 4). However, these OB-fold and CSD proteins are formed from different/unrelated primary amino acid sequences (Murzin, 1993), which possibly explains why primary sequences of RNA binding domains are more divergent than any associated partner catalytic domains (Aravind and Koonin, 1999). Perhaps this is why the TRAM domain remained unidentified prior to the Anantharaman et al. (2001) study.

The cold shock domain (CSD) is present in Bacteria in a class of small nucleic acid binding proteins called cold shock proteins (Csps), which are upregulated during low temperature stress to cope with cold shock (Gao et al., 2006; Bergholz et al., 2009; Siddiqui et al., 2013). These Csp proteins adopt the canonical β-barrel structure (Feng et al., 1998). Although both Csp and TRAM domains share structural similarities (common OB-fold), the primary sequences of these proteins are very different, a feature common in RNA-binding proteins (Aravind et al., 1999). Both TRAM domain and Csp proteins are postulated to have roles in translation (Anantharaman et al., 2001; Baria et al., 2013).

To determine the evolutionary relationship of TRAM proteins across Archaea and Bacteria, with regard to Ctr3, phylogenetic analyses were performed. Sequences with similarity to Ctr3 were retrieved from genomes of Archaea and Bacteria and were used to construct neighbour-joining phylogenetic trees. This is the first broad phylogenetic analysis of TRAM proteins from Archaea and Bacteria.

2.2 Materials and Methods

Phylogenetic trees were constructed from amino acid and nucleic acid sequences of TRAM domain proteins identified from blastp searches with ≥40% identity to Ctr3 (Mbur_1445) present in genome sequences of representatives of the Archaea and Bacteria that were available in the IMG database (Appendix 1) (Markowitz et al., 2010). For proteins with multiple domains, only the TRAM domain sequences were used; if full sequences were used, multiple sequence alignment results were inconclusive (data not shown). Multiple sequence alignments were performed using MUSCLE (Edgar, 2004) on the T-coffee platform (Notredame et al., 2000) and the alignment file was uploaded into MEGA 6.0 (Kumar et al., 2013) to generate a neighbour-joining (NJ) tree using an interior branch test of phylogeny (10,000 bootstrap replicates) that was rooted to a hypothetical protein from Desulfurococcus mucosus (<20% identity to Ctr3). An NJ 16S rRNA gene tree was constructed using sequences of all species represented in the TRAM domain amino acid and nucleic acid trees, with the tree rooted using the 18S rRNA gene sequence of S. cerevisiae (Appendix 2). All constructed phylogenetic trees were further manipulated using Figtree v1.3.1 (Rambaut, 2008). Multiple sequence alignments of amino acid and nucleic acid sequences of TRAM proteins are shown in Appendices 3 and 4, respectively.
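The identity cut-off described above can be sketched as follows. This is a simplified illustration only: blastp reports identity over its own local alignments, whereas this sketch assumes the sequences have already been aligned to equal length with '-' as the gap character. The function names and the short sequences are hypothetical placeholders, not real Ctr3 or TRAM sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def filter_by_identity(reference, candidates, threshold=40.0):
    """Keep candidates whose identity to the reference is at least the threshold."""
    return {
        name: seq
        for name, seq in candidates.items()
        if percent_identity(reference, seq) >= threshold
    }


# Placeholder sequences (illustration only).
reference = "MKVGDTYEVK"
candidates = {
    "hit_close": "MKVGETYDVK",  # 8 of 10 positions identical
    "hit_far": "AKLS-TWQRK",    # 3 of 9 ungapped positions identical
}
kept = filter_by_identity(reference, candidates)
```

With the 40% threshold only "hit_close" is retained; "hit_far" falls below the cut-off and would be excluded from the tree.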


2.3 Results

Phylogenetic analysis revealed that Ctr3 clustered with other sequences of single TRAM domain proteins from Archaea in both the amino acid and nucleic acid trees (Figures 2.1 and 2.2). The remaining sequences fell into distinct clades represented by 23S rRNA methyltransferases and translation IF2 proteins from Archaea, 23S rRNA methyltransferase sequences from Bacteria, and tRNA modification enzymes from both Archaea and Bacteria (Figures 2.1 and 2.2). As in M. burtonii, which has three single TRAM domain genes (ctr1, ctr2, ctr3), multiple genes encoding single TRAM domain proteins tend to be present within many archaeal genomes.

The single TRAM domain proteins (exclusive to Archaea) were represented by relatively shorter branch lengths compared to the branch lengths of 23S rRNA methyltransferases and translation IF2 proteins from Archaea, 23S rRNA methyltransferase sequences from Bacteria, and tRNA modification enzymes from both Archaea and Bacteria, indicative of fewer amino acid and nucleotide substitutions (Figures 2.1 and 2.2, respectively). The phylogeny resolves archaeal TRAM domain proteins as nested within methyltransferases from archaeal methanogens and haloarchaea (Figure 2.3A,B).

For the methanogens most closely related to M. burtonii, two genes are present in Methanohalophilus mahii, three in Methanolobus tindarius and four in Methanolobus psychrophilus (Figure 2.3A,B). A NJ 16S rRNA gene tree was also constructed using sequences of all species represented in the TRAM domain amino acid and nucleic acid tree with the tree rooted using the 18S rRNA gene sequence of S. cerevisiae (Appendix 2). The phylogeny of these TRAM sequences from M. burtonii, M. mahii, M. tindarius and M. psychrophilus approximates that of the organisms based on their 16S rRNA genes (Figure 2.3), indicating that the genes encoding single TRAM domain proteins likely arose from duplication within genomes late in the evolution of each species.


Figure 2.1. Amino acid neighbor-joining (NJ) tree. Phylogenetic relationship of TRAM domain proteins showing formation of distinct functional clades which are marked accordingly. The tree is rooted to a hypothetical protein from D. mucosus. The bar at the bottom represents the amount of genetic change (amino acid substitutions per site) of 0.1.

Figure 2.2. Nucleic acid neighbor-joining (NJ) tree. Phylogenetic relationship of TRAM domain proteins showing formation of distinct functional clades which are marked accordingly. The tree is rooted to the nucleic acid sequence of a hypothetical protein from D. mucosus. The bar at the bottom represents the amount of genetic change (nucleotide substitutions per site) of 0.05.


Figure 2.3. Phylogenetic relationship of TRAM domain proteins. (A) Amino acid neighbor-joining (NJ) tree and (B) nucleic acid neighbor-joining (NJ) tree highlighting single TRAM domain proteins related to the three M. burtonii proteins. Individual genes from each archaeal species have been assigned a number, and individual methanogen species have been labeled with letters (A-G) to assist in their identification. (C) 16S rRNA gene NJ tree. M. burtonii, M. mahii, M. tindarius and M. psychrophilus share 93 – 97% 16S rRNA gene identity. The bar at the bottom represents the amount of genetic change (amino acid or nucleotide substitutions per site) of (A) 0.1, (B) 0.05 and (C) 0.05.

2.4 Discussion

The primary sequences of TRAM domains from Archaea and Bacteria are quite divergent (Aravind et al., 1999; Anantharaman et al., 2001). Therefore, Ctr3 was used as the reference while searching for TRAM homologs across genome sequences of representatives of the Archaea and Bacteria, and only proteins with at least 40% identity to Ctr3 were considered. This allowed assessment of the evolutionary relationship of TRAM proteins across Archaea and Bacteria, with particular regard to Ctr3 from M. burtonii.

The genetics of microorganisms are influenced by various environmental factors. In adapting to these environmental factors, nucleic acid sequences are potentially more likely to acquire codon bias in response to evolutionary pressure (Hasegawa et al., 1993). The amino acid sequences of these organisms, on the other hand, remain more protected from changes (Hasegawa et al., 1993). Codon bias introduced in nucleic acid sequences can affect phylogenetic analysis (Hasegawa et al., 1993). In order to assess the extent of codon bias in nucleic acid sequences of TRAM proteins, a nucleic acid tree was constructed and compared to the amino acid tree (Figure 2.1 vs 2.2). The amino acid tree and nucleic acid tree were largely congruent (Figures 2.1 and 2.2). This suggests that codon bias, if present in the nucleic acid sequences, has little or perhaps no effect on the phylogeny inferred for TRAM proteins from Archaea and Bacteria.

The phylogenetic analyses allowed a broad understanding of the evolutionary relationship of TRAM proteins across Archaea and Bacteria. The archaeal TRAM domain proteins are nested within methyltransferases from methanogens and haloarchaea (Figure 2.3A,B). Such assembly and/or divergence of an evolutionarily distinct module is thought to be quite typical of RNA-modifying proteins (Anantharaman et al., 2001). The presence of TRAM domains (e.g. within tRNA modification enzymes) across Archaea and Bacteria suggests that these domains may have been disseminated across species via horizontal gene transfer (Figures 2.1 and 2.2). The most parsimonious interpretation of the phylogenetic analyses presented here is that stand-alone TRAM proteins from Archaea arose from methyltransferases that lost the catalytic domain, for reasons yet to be determined. This further suggests that stand-alone TRAM proteins of Archaea diverged relatively recently as single domain proteins.

The hypothetical ancient RNA binding domain (or nucleic acid binding domain) was likely a polypeptide with a simple fold which later diverged into various families of proteins, including OB-fold, CSD and other RNA binding proteins (Graumann and Marahiel, 1998). Although these families have gained distinct functions, the overall folding and structure (β-barrel like shape) of many remain essentially similar (Murzin, 1993). However, within these RNA-binding protein families, there is significant variation in primary amino acid sequences (Graumann and Marahiel, 1998), suggesting that conservation of structural features may correspond to functional advantages. The extreme divergence in primary amino acid sequence between TRAM, OB-fold and CSD precludes these families from being included in the same phylogenetic analysis. The RNA-binding protein ancestral to all RNA-binding domains (including TRAM, OB-fold and CSD) has been proposed to be single domain (Graumann and Marahiel, 1998); nevertheless, the phylogenetic analysis presented here suggests that the single TRAM domain is derived, and evolved from proteins that lost their catalytic domains.

Most Bacteria have multiple genes for Csp proteins in their genomes (Horn et al., 2007); e.g. E. coli has nine homologs of Csp proteins (Yamanaka, 1999). A small number of csp genes have also been identified in Archaea (Cavicchioli, 2006); however, these genes are absent from the M. burtonii genome, and broadly from many Archaea (Giaquinto et al., 2007; Allen et al., 2009; Lauro et al., 2011). Many Archaea possess multiple single TRAM protein genes in their genomes (Appendix 4); e.g. three genes are present in M. burtonii, two in M. mahii, three in M. tindarius and four in M. psychrophilus (Figure 2.3A,B). The prevalence of single TRAM proteins in Archaea in contrast to their complete absence in Bacteria, and the presence of Csp in Bacteria in contrast to the paucity of Csp in Archaea, indicates that they represent distinct evolutionary lineages of functionally equivalent RNA-binding proteins. Such divergence from an evolutionarily ancient and distinct/conserved module is predicted to be quite typical of RNA-modifying proteins (Anantharaman et al., 2001).

In E. coli, not all Csp proteins are upregulated at low temperature; only four (CspA, CspB, CspG and CspI) have increased abundance during cold-shock (Yamanaka, 1999). The other Csp proteins have been reported to perform functions that are not related to low temperature adaptation (Yamanaka and Inouye, 1997), e.g. CspD is induced upon carbon starvation (Yamanaka and Inouye, 1997) and CspE is involved in chromosome condensation (Hu et al., 1996) and transcriptional regulation (Hanna and Liu, 1998; Bae et al., 2000). Similar to Csp homologs in E. coli (or broadly Bacteria), it is possible that not all single TRAM proteins within these archaeal species play roles that are vital for low temperature adaptation; rather, some may play roles in cellular functions unrelated to cold adaptation. This is an area of research that requires more attention.

In conclusion, Ctr3 appears to have diverged from an evolutionarily ancient RNA-binding module and emerged relatively recently as a single domain protein. Although Ctr3 is structurally similar to OB-fold and CSD proteins, the divergence in primary sequence between these nucleic acid-binding proteins is striking (Graumann and Marahiel, 1998). Broadly, the function of stand-alone TRAM proteins from Archaea is not clear; however, they have been postulated to have roles in tRNA modification or translation (Anantharaman et al., 2001). Chapter 6 describes the first study to experimentally assess the possible cellular function of Ctr3, a single TRAM domain protein from M. burtonii.


Chapter 3

Overexpression and purification of Ctr proteins


Abstract

Ctr proteins (Ctr1, Ctr2 and Ctr3) from M. burtonii are single domain proteins which are predicted to bind nucleic acids. The poly-histidine tagged recombinant Ctr proteins were expressed in an E. coli expression system and purified to homogeneity using immobilized metal affinity chromatography and size exclusion chromatography. E. coli nucleic acids consistently co-purified with all three recombinant Ctr proteins. Nucleic acid liberated from the proteins could be digested with RNase but not DNase, indicating that the bound nucleic acid was RNA. A separate purification strategy was developed to produce recombinant Ctr proteins devoid of RNA. The bound nucleic acids could be removed by treating the recombinant proteins with mild urea to partially unfold the protein, eluting the protein across a NaCl gradient, and refolding it by dialysis. The method yielded large quantities of soluble heterologous proteins.


3.1 Introduction

Purification of psychrophilic proteins is highly influenced by the approach chosen (Cavicchioli et al., 2006). Psychrophilic proteins are usually purified either from the native psychrophile or expressed in non-native mesophilic hosts. Both strategies have certain advantages and disadvantages. Purifying psychrophilic proteins from native organisms can yield correctly folded and post-translationally modified proteins; on the other hand, the production/abundance of the protein may be limited due to difficulties in obtaining sufficient biomass (Cavicchioli et al., 2006). In contrast, heterologous expression of psychrophilic proteins in mesophilic bacterial systems (e.g. E. coli) can yield large quantities of recombinant proteins; conversely, correct folding of the recombinant psychrophilic proteins may be affected in mesophilic hosts. For example, multi-domain α-amylase from the Antarctic psychrophile Pseudoalteromonas haloplanktis is natively synthesized at 0 ± 2°C; overexpression of α-amylase in a mesophilic host (E. coli) at 18°C yielded correctly folded protein, whereas at 37°C, correct folding of α-amylase was not achieved (D’Amico et al., 2003). Similarly, EF-2 from M. burtonii was purified with high yield when overexpressed in E. coli at 14°C compared to at 23 or 30°C (Thomas and Cavicchioli, 2000; 2002). However, lowering the growth temperature of E. coli (optimal at 37°C) to 10°C or below significantly reduces growth rate and subsequently impedes protein synthesis (Goldstein et al., 1990). Therefore, a compromise between psychrophilic protein stability and mesophilic growth rate should be reached in order to achieve optimal purification (Feller et al., 1998).

In E. coli, proteolytic degradation can severely affect overexpression of recombinant proteins (Cavicchioli et al., 2006). However, several protease-negative E. coli strains, such as BL21 and Rosetta, have been engineered to prevent degradation of overexpressed recombinant proteins (Gräslund et al., 2008). EF-2 (Thomas and Cavicchioli, 2000) and RpoEF (De Francisci et al., 2011) from M. burtonii, and Csp from M. frigidum (Giaquinto et al., 2007), were successfully overexpressed in the BL21 strain.

The presence of rare codons in psychrophilic genes and unavailability of the cognate tRNA can also affect successful overexpression of recombinant proteins in a non-native host, such as E. coli. However, codon optimization by modifying coding sequences of the target gene can eliminate rare codons. This not only allows expression of the target gene but also improves translational proficiency in the heterologous host (Gustafsson et al., 2004). Alternatively, plasmids can be engineered to supply tRNAs for the rare codons. For example, plasmid pRARE supplies tRNAs for the codons AUA, AGG, AGA, CUA, CCC, and GGA; these codons are rarely used in E. coli (Sharp and Li, 1987; Sharp et al., 1988; Sharp and Matassi, 1994).
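To illustrate how a coding sequence might be screened for the rare codons listed above before deciding between codon optimization and a tRNA-supplementing host, a minimal sketch follows. The codon set is taken from the text; the function name and the short test sequence are hypothetical and do not represent any ctr gene.

```python
# Codons rarely used in E. coli for which pRARE supplies tRNAs (from the text).
RARE_CODONS = {"AUA", "AGG", "AGA", "CUA", "CCC", "GGA"}


def count_rare_codons(coding_sequence):
    """Count in-frame occurrences of each rare codon in a DNA or RNA coding sequence."""
    seq = coding_sequence.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = {}
    for start in range(0, len(seq), 3):
        codon = seq[start:start + 3]
        if codon in RARE_CODONS:
            counts[codon] = counts.get(codon, 0) + 1
    return counts


# Hypothetical sequence: ATG AGA ATA CCC GGA TAA
example = "ATGAGAATACCCGGATAA"
rare = count_rare_codons(example)
```

Here four of the six codons in the toy sequence are rare in E. coli, the kind of result that would favour either codon optimization or expression in a strain carrying the supplementary tRNA genes.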

λDE3 lysogen strains of E. coli [e.g. Rosetta (DE3), BL21 (DE3)] are used to control overexpression of target genes (and prevent leaky expression of target genes). These strains are genetically modified to encode T7 RNA polymerase, the expression of which is, under normal conditions, repressed by the lac repressor (Gräslund et al., 2008). Isopropyl β-D-1-thiogalactopyranoside (IPTG) can bind to the lac repressor, inactivate it and induce transcription of the T7 RNA polymerase gene (Studier et al., 1990). The expressed T7 polymerase can initiate expression of the target gene in plasmid systems or vectors that are specifically controlled by the T7 lacO promoter system (such as pJexpress and pET vectors) (Dubendorff et al., 1991). Expression of the target genes, and subsequently synthesis of the target protein in the heterologous host cell, can therefore be induced by IPTG (Studier et al., 1990). BL21 (DE3) has proven useful for overexpression of EF-2 (Thomas and Cavicchioli, 2000), chaperonins (Pilak et al., 2011) and RpoEF (DeFrancisci et al., 2011) from M. burtonii, and Csp from M. frigidum (Giaquinto et al., 2007).

Affinity tags are frequently used to facilitate purification of many recombinant proteins (Gräslund et al., 2008). The hexa-histidine or His-tag is a small affinity tag that has several benefits: i) His-tagged recombinant proteins can be easily purified using immobilized metal affinity chromatography (IMAC), ii) His-tags do not affect the solubility of the proteins and iii) His-tags usually do not alter the characteristics of recombinant proteins (Uhlén et al., 1992; Gräslund et al., 2008). His-tags were successfully used previously to purify psychrophilic proteins such as chaperonins (Pilak et al., 2011) and RpoEF (DeFrancisci et al., 2011) from M. burtonii.

Strong buffer components are essential to stabilize recombinant proteins during purification (Gräslund et al., 2008). A typical strong buffer system usually contains phosphate, HEPES, Tris-HCl or MOPS to overcome interference of host lysate during purification; high ionic strength (NaCl) to enhance protein solubility and stability; and protease inhibitors and reducing agents (such as tris(2-carboxyethyl)phosphine hydrochloride) to prevent proteolysis and oxidation of the protein, respectively (Gräslund et al., 2008). Additionally, nucleases can be used to degrade nucleic acids to reduce the viscosity of the sample (Gräslund et al., 2008). Such strong buffer components have previously been used to purify archaeal recombinant proteins from M. burtonii (EF-2, chaperonins and RpoEF) and M. frigidum (Csp) (Thomas and Cavicchioli, 2001; Pilak et al., 2011; DeFrancisci et al., 2011; Giaquinto et al., 2007).

Nucleic acid binding proteins are known to possess a range of affinities for different species of nucleic acids (Guenther et al., 2013; Duss et al., 2014; Jankowsky and Harris, 2015); and when they are overexpressed in a non-native host, nucleic acids from the host cell are often co-purified during the purification process (Marenchino et al., 2009). Removal of these contaminants is extremely important for downstream biological and structural analysis of target proteins. Nucleic acid contaminants (co-purified with protein) can reportedly be removed by a variety of approaches, e.g. ethanol precipitation from crude cell extracts (Nalin et al., 1990), polyethylenimine (PEI) treatment (Zillig et al., 1970), nuclease digestion (Rabilloud, 1999) and heparin-sepharose affinity chromatography (Xiong et al., 2008). PEI treatment has been successful in removing non-specifically bound nucleic acid contaminants from His-tagged HIV-1 Rev (a protein that is essential for the regulation of HIV-1 protein expression) overexpressed in E. coli (Marenchino et al., 2009). However, a one-step affinity purification method was shown to be more effective in removing nucleic acid contaminants from recombinant HIV-1 Rev protein. The process involved the use of affinity chromatography under denaturing conditions (urea) combined with on-column refolding to produce proteins without nucleic acid contaminants (Marenchino et al., 2009).

M. burtonii has three single TRAM domain genes: Mbur_0304, Mbur_0604 and Mbur_1445 (Allen et al., 2009). It was postulated that TRAM proteins bind RNA (Campanaro et al., 2011; Williams et al., 2011). For the purpose of this study, the three M. burtonii TRAM domain genes (proteins) Mbur_0304, Mbur_0604 and Mbur_1445 are referred to as ctr1 (Ctr1), ctr2 (Ctr2) and ctr3 (Ctr3), respectively. Ctr stands for cold-responsive TRAM domain protein. This chapter describes optimal strategies developed for purification of Ctr1, Ctr2 and Ctr3 proteins from M. burtonii. Structural and functional properties of the purified Ctr proteins were determined. These properties were related to cold adaptation of M. burtonii in subsequent chapters (6 and 7).

3.2 Materials and Methods

3.2.1 Synthesis of ctr genes

All ctr genes were codon optimized and synthesized in pJexpress vectors by DNA 2.0 (Figure 3.1). All vectors had the following features: an ampicillin resistance marker, a T7 promoter inducible by IPTG and conventional (6X) His-tags at the N-termini of the target genes. The ctr1 (Mbur_0304), ctr2 (Mbur_0604) and ctr3 (Mbur_1445) DNA fragments were all sequence verified by DNA 2.0 and independently by Micromon (Monash University, Australia).


Figure 3.1. Plasmid maps for pJexpress with (A) ctr1 (Mbur_0304), (B) ctr2 (Mbur_0604) and (C) ctr3 (Mbur_1445) inserts. The inserts are shown highlighted in red. Different features and restriction cutting sites are marked accordingly. Details of pJexpress vector are available at www.dna20.com.

3.2.2 E. coli competent cells preparation and transformation

Two strains of E. coli competent cells were used in this study: DH5α and Rosetta (DE3) (Novagen). DH5α was used for cryogenic storage whilst Rosetta (DE3) was used for overexpression of ctr genes. Competent cells were prepared according to the following protocol. From laboratory stocks, cells were streaked on LB agar plates and incubated overnight at 37°C. A single colony was picked, inoculated into fresh 50 ml LB media and incubated overnight under agitation at 37°C. The overnight culture was 50X diluted in fresh LB media. Under agitation at 37°C, the fresh culture was allowed to reach OD600 of 0.5-0.6. The culture was then put on ice for 15 min followed by centrifugation at 5000 x g for 10 min at 4°C. The supernatant was carefully discarded and the cell pellet was resuspended in half-culture volume of chilled 0.1 M CaCl2 and incubated on ice for 20 min. The suspension was centrifuged at 5000 x g for 10 min at 4°C. The supernatant was discarded and the pellet was resuspended in one-tenth culture-volume of cold 0.1 M CaCl2. Aliquots were made and stored at -80°C.

Plasmids harbouring ctr genes (2 µl) (see Section 3.2.1) were added to 50 µl of E. coli competent cells and incubated on ice for 30 min. The mixture was then subjected to heat-shock treatment at 42°C for 90 s and quickly transferred back onto ice. After the heat shock, the mixture was allowed to stand for a further 15-20 min. Fresh LB broth (1 ml) was added to the mixture, which was then incubated under agitation at 37°C for ~ 2 h. The turbid culture was then centrifuged at 13000 x g for 2 min. Approximately 90% of the supernatant was discarded and the cell pellet was resuspended in the remaining supernatant (~ 10%). From the suspension, 5 µl was spread on LB plates. Ampicillin (1 mM) was used for selection of the DH5α strain, and ampicillin (1 mM) and chloramphenicol (1 mM) were used for the Rosetta (DE3) strain. Controls were also set up using non-transformed E. coli cells and empty agar plates. All plates were incubated overnight at 37°C.

DH5α was used for cryogenic storage of pJexpress vectors harbouring M. burtonii ctr gene inserts (ctr1, ctr2 and ctr3) whilst Rosetta (DE3) strain was used for overexpression of recombinant proteins.

3.2.3 Overexpression and purification of Ctr proteins

3.2.3.1 Purification of nucleic acid-bound Ctr proteins (protocol A)

E. coli Rosetta (DE3) cells harbouring pJexpress vectors with ctr inserts were grown overnight at 37°C in 100 ml LB medium supplemented with 1 mM ampicillin and 1 mM chloramphenicol. From the overnight culture, 5 ml was used to inoculate 500 ml of fresh LB medium containing 1 mM ampicillin and 1 mM chloramphenicol in a 2 litre flask. The culture was incubated at 37°C under agitation until the OD600 reached 0.5-0.6. During this incubation period, the flask was fitted with a loose cotton plug to ensure sufficient aeration. At an OD600 of 0.5-0.6, the temperature was lowered to 15°C and the culture was allowed to grow for a further 1 h. Overexpression was induced by the addition of 1.3 mM IPTG (final concentration). Cells were harvested after 10 h by centrifugation at 5000 x g at 4°C for 20 min. The cell pellet was resuspended in 30 ml chilled lysis buffer A (20 mM Tris pH 7.4, 500 mM NaCl) containing protease inhibitor (Roche cOmplete ULTRA, EDTA free) and Benzonase® nuclease (Novagen). A control experiment was also set up: transformed E. coli Rosetta (DE3) cells cultured under similar conditions without any IPTG induction. All resuspended cell pellets (in lysis buffer A) were stored at -20°C.

Cells were lysed using a French press (Thermo Scientific) according to the manufacturer’s protocol. The cell suspension was passed 2-3 times through the French press to ensure thorough lysis of the cells. Samples were kept on ice during the whole process. Soluble cell-free extracts were separated from insoluble fractions by centrifugation at 23000 x g at 4°C for 25 min. The soluble cell-free extracts were filtered through a 0.45 µm filter unit (Millipore). The insoluble fractions were resuspended in 8 M urea for analysis.

Ctr proteins were purified on pre-equilibrated (with lysis buffer A) 5 ml nickel (Ni) columns (HiTrap Chelating HP, GE Healthcare) fitted to an ÄKTA purification system (GE Healthcare) at 4°C. Briefly, a program was run using Unicorn 5.0 software (GE Healthcare) for the following process: pre-equilibration was performed by washing the column with 10 bed volumes of chilled lysis buffer A; filtered cell extracts were passed slowly (at a flow rate of 1 ml min⁻¹) through the column to ensure maximum binding; the column was then washed with 20 bed volumes of lysis buffer to remove all unbound material; and recombinant Ctr proteins were eluted by applying increasing step gradients of imidazole (20 mM, 50 mM, 100 mM and 500 mM). Ctr proteins typically eluted at the 500 mM step. Fractions corresponding to the different steps were collected and assessed by SDS-PAGE analysis (see below). The fractions that contained eluted Ctr proteins were concentrated (to a final volume of 5 ml) using centrifugal filtration units (Amicon Ultra - 3K) and dialyzed (Novagen dialyzer, 3.5 kDa molecular weight cut-off) overnight against lysis buffer A to remove all traces of imidazole.

Size exclusion chromatography (SEC) was performed on a Superdex 75 pg column fitted to an ÄKTA purification system (GE Healthcare) at 4°C. Pre-equilibration was done in 50 mM HEPES, pH 7.4, 500 mM NaCl. Prior to SEC, the concentrated and dialyzed (overnight) protein sample (obtained from Ni column purification) was centrifuged at 12000 x g at 4°C for 15 min and filtered through 0.22 µm centrifugal filter units (Millipore) to remove particulate material. Pre-equilibration of the column and SEC were both performed at a constant flow rate of 1 ml min⁻¹. The system was controlled by Unicorn 5.0 software (GE Healthcare). Approximately 500 µg of protein was applied to the column using a 5 ml sample loop. Eluted proteins were detected at 280 nm.

All collected fractions (obtained from both Ni column and SEC purification) corresponding to Ctr proteins were assessed by SDS-PAGE analysis (see below). Prior to electrophoresis, samples for SDS-PAGE were prepared and stored at 4°C.

3.2.3.2 Purification of nucleic acid-free Ctr proteins (protocol B)

Overexpression and protein purification were performed using methods similar to those described in section 3.2.3.1. Briefly, overexpression was performed in E. coli Rosetta (DE3) in 500 ml LB medium at 37°C in the presence of 1 mM ampicillin and 1 mM chloramphenicol. Cells were grown to an OD600 of 0.5-0.6, the temperature was lowered to 15°C, the cells were grown for a further 1 h, expression was induced by the addition of 1.3 mM IPTG (final concentration) and cells were harvested after 10 h by centrifugation (5000 x g for 20 min) at 4°C. The cell pellet was resuspended in 30 ml chilled lysis buffer B (20 mM Tris pH 7.4, 1 M NaCl, 1 M urea) containing protease inhibitor (Roche cOmplete ULTRA, EDTA free), Benzonase® nuclease (Novagen) was added, and the cells were lysed using a French press (Thermo Spectronic). Soluble and insoluble fractions were separated by centrifugation at 23000 x g at 4°C for 25 min. The insoluble fractions were resuspended in 8 M urea for analysis. The soluble extract was filtered through a 0.45 µm filter unit (Millipore) and treated with Benzonase® nuclease (Novagen) for a further 30 min. Ctr was purified on a pre-equilibrated (with lysis buffer B) Ni column (HiTrap Chelating HP, GE Healthcare) fitted to an ÄKTA purification system (GE Healthcare) at 4°C. A linear salt gradient from 2 M to 1 M NaCl was applied at a flow rate of 2.5 ml min⁻¹. Recombinant Ctr was eluted by applying increasing concentrations of imidazole (stepwise: 20, 50, 100 and 500 mM), with Ctr typically eluting at 100 mM. The corresponding fractions were collected and concentrated using centrifugal filtration units (Amicon Ultra, 3 kDa molecular weight cut-off). The concentrate was subjected to overnight dialysis (Novagen dialyzer, 3.5 kDa molecular weight cut-off) to remove imidazole and urea. Prior to SEC, the sample was centrifuged at 12000 x g at 4°C for 15 min and filtered through a 0.22 µm filtration unit (Millipore) to remove precipitates. SEC was performed on Superdex 75 pg columns fitted to an ÄKTA purification system (GE Healthcare) at 4°C. Pre-equilibration was performed in 50 mM HEPES, pH 7.4, 500 mM NaCl at a constant flow rate of 1 ml min⁻¹. Protein purity was checked by SDS-PAGE using a 15% (w/v) acrylamide gel.

3.2.4 SDS PAGE analysis

All protein samples were qualitatively checked using denaturing SDS-PAGE according to Laemmli (1970). Samples were mixed with NuPAGE® LDS Sample Buffer (Novex) at a 4:1 ratio. All sample mixtures were heated at 95°C for 5 min using a heating block. Proteins were separated on 4-12% NuPAGE® Bis-Tris precast gradient gels (Novex) by electrophoresis for 1 h at 200 V in the NuPAGE® MOPS SDS Running Buffer system. Novex® Sharp Pre-stained Protein Standard (size range 3.5-260 kDa) was used for molecular weight estimation. The gels were stained with Coomassie staining solution (60 mg/L Brilliant Blue R in 10% (v/v) acetic acid) for ~ 45 min followed by destaining with destaining solution (H2O, methanol and acetic acid in a ratio of 5:4:1) for ~ 45 min. The intensity of gel bands is usually proportional to the amount of protein in the sample (Bradford, 1976).


3.2.5 Protein quantification

Proteins were quantified using the Quick Start™ Bradford Protein Assay (Bio-Rad), according to the manufacturer’s instructions. BSA (Sigma) protein standards were used from 0 to 2.5 mg/ml at intervals of 0.5 mg/ml. All protein assays were performed in triplicate. Absorbance values of the BSA standards at 590 nm were plotted against their corresponding concentrations. Unknown protein concentrations were subsequently determined from the standard curve.
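The standard-curve step can be sketched in a few lines of Python. This is only an illustration and not the analysis used in this study: the absorbance readings are hypothetical placeholder values, the function names are invented for the example, and a simple linear fit is assumed across the whole standard range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to read off a concentration (mg/ml)."""
    return (absorbance - intercept) / slope


# BSA standards from 0 to 2.5 mg/ml at 0.5 mg/ml intervals (as in the text).
bsa_mg_per_ml = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
# HYPOTHETICAL mean A590 values of triplicate readings, for illustration only.
a590 = [0.05, 0.25, 0.45, 0.65, 0.85, 1.05]

slope, intercept = fit_line(bsa_mg_per_ml, a590)
# HYPOTHETICAL reading for an unknown sample.
unknown_mg_per_ml = concentration_from_absorbance(0.55, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} "
      f"unknown={unknown_mg_per_ml:.2f} mg/ml")
```

In practice the fit should be restricted to the range over which the assay responds linearly, and unknowns reading above the highest standard should be diluted and re-measured.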

Protein concentrations were also independently determined using a Direct Detect spectrometer (Millipore) based on infra-red spectrum of light. All dilutions and buffer concentrations were adjusted according to the manufacturer’s instructions.

Both the colorimetric (Bradford) and infrared-based (Direct Detect) assays yielded very similar results. However, Direct Detect spectrometry had certain advantages over the Bradford assay, including shorter assay time, smaller sample volume and easier sample handling.

Fractions containing Ctr proteins were pooled, further concentrated (>1 mg/ml) (Amicon Ultra, 3 kDa molecular weight cut-off), flash-frozen in liquid nitrogen and stored at -80°C. Proteins concentrated above 1 mg/ml tend to be more stable to freeze-thaw cycles (Gräslund et al., 2008).

3.2.6 Protein Identification

The molecular mass and amino acid sequence of all Ctr proteins were assessed by performing liquid chromatography–mass spectrometry (LC-MS) and liquid chromatography–tandem mass spectrometry (LC-MS/MS) at the Bioanalytical Mass Spectrometry Facility (BMSF) (UNSW, Australia).

3.2.7 Nucleic acid identification assay

Nucleic acids (from E. coli) bound to Ctr proteins were extracted by the phenol-chloroform procedure (Sambrook et al., 1989). The liberated nucleic acids were then subjected to digestion trials. Approximately 10 µg of liberated nucleic acids were separately digested with DNase I (Invitrogen), RNase A (Invitrogen) and Benzonase® nuclease (Novagen) according to the manufacturer’s instructions. The digested products were run on non-denaturing 1.5% agarose gels for assessment. Benzonase® nuclease was used as a positive control in the assay.

3.3 Results

3.3.1 Expression and protein purification

E. coli nucleic acids consistently co-purified with all three recombinant Ctr proteins. The strategy that gave the best yield of all three Ctr proteins bound to nucleic acids (from E. coli) is described in section 3.2.3.1 (protocol A). The purified nucleic acid-bound proteins were soluble and appeared to be folded (assessed by far- and near-UV CD spectrometry; described in Chapter 4). The biggest challenge in developing purification strategies was devising a technique that produced high yields of soluble nucleic acid-free Ctr proteins. The aim was to develop a single purification method that could be applied to all three Ctr proteins from M. burtonii.

Purification strategies that failed to eliminate bound E. coli nucleic acids from Ctr proteins included PEI treatment, nuclease digestion and heparin-sepharose affinity chromatography (data not shown). Briefly, PEI is a positively charged polyelectrolyte. PEI was added to the protein sample at a final concentration of 0.5% (w/v). PEI-nucleic acid complexes were removed by centrifugation and the supernatant was incubated overnight with 75% ammonium sulphate. The suspension was centrifuged and the pellet containing the protein was resuspended in lysis buffer (20 mM Tris pH 7.4, 500 mM NaCl). A further purification step using IMAC Ni-charged resins was applied and proteins were eluted with 500 mM imidazole. The nuclease digestion technique involved incubation of nucleic acid-bound Ctr proteins with nuclease at 37°C for ~ 30 min (Rabilloud, 1999). After nuclease treatment, proteins were subjected to a further purification step using IMAC Ni-charged resins to remove the nuclease. Heparin-sepharose operates as a negatively charged affinity ligand and can mimic the nucleic acid backbone. Briefly, protein samples were loaded onto the heparin-sepharose column. The nature of immobilized heparin-sepharose allows protein to bind to the column by displacing nucleic acids. The proteins were then eluted using a continuous NaCl gradient (0.2 M to 1 M). The failure to remove contaminant E. coli nucleic acids by these methods (PEI, nuclease treatment and heparin chromatography) suggested a tight association of nucleic acids with Ctr proteins.

The urea denaturation/on-column refolding method has been successful in removing non-specifically bound nucleic acid contaminants from His-tagged HIV-1 Rev overexpressed in E. coli (Marenchino et al., 2009). This technique is a one-step method to simultaneously purify tagged proteins and eliminate bound host nucleic acids. Briefly, lysate containing urea (8 M) is loaded onto a sepharose Ni column. A linear gradient from 8 to 0 M urea is applied to achieve on-column protein renaturation. Finally, proteins are eluted with 500 mM imidazole (Marenchino et al., 2009). When this strategy was applied to remove RNA from Ctr proteins, it was not effective. In addition to giving a low yield of protein, E. coli RNA remained bound to the proteins (data not shown). However, modifications made to the original method developed by Marenchino et al. (2009) proved successful in removing contaminant E. coli nucleic acids from Ctr proteins. Mild urea (1 M) was used to partially unfold Ctr proteins, which were loaded onto a sepharose Ni column. The bound RNA was removed by treating the partially unfolded proteins with a NaCl gradient (2 M to 1 M). Proteins were eluted with 500 mM imidazole and refolding/renaturation was achieved by removing urea by dialysis (section 3.2.3.2; protocol B). The purified nucleic acid-free proteins were soluble and appeared to be folded (assessed by far- and near-UV CD spectrometry; described in Chapter 4), with a yield of ~ 2 mg/L. This strategy (section 3.2.3.2; protocol B) was successfully applied to all three Ctr proteins (Ctr1, Ctr2 and Ctr3).


Figure 3.2. Induction gel for overexpression of ctr3 in E. coli Rosetta (DE3) at 15°C. Protein standards are marked accordingly in kDa. Lanes 1-3 are soluble fractions and Lanes 4-6 are insoluble fractions. Lane 1 and Lane 4 represent cell extracts before induction, Lane 2 and Lane 5 represent the negative control (no IPTG induction) and Lane 3 and Lane 6 represent cell extracts after induction (1.3 mM IPTG). The SDS-PAGE protein band marked by a red circle is the overexpressed soluble recombinant Ctr3.

Ctr proteins without nucleic acids (using protocol B) typically eluted from the Ni column at the 100 mM imidazole step (Figure 3.3A,C,E). This trend was identical for all three Ctr proteins (Figure 3.3). SEC profiles showed that all Ctr proteins eluted as single peaks with similar elution volumes/times (Figure 3.3B,D,F). Reasonably symmetric SEC elution profiles were indicative of monodisperse, homogeneous proteins (Figure 3.3B,D,F). Proteins obtained from SEC were collected from fractions that corresponded to the top part of the peaks, i.e. fractions within the black vertical dotted lines (Figure 3.3B,D,F), and assessed using SDS-PAGE (Figure 3.4).


Figure 3.3. Ni-column chromatography and SEC purification profiles for Ctr proteins. Panels A-B represent Ctr1, panels C-D represent Ctr2 and panels E-F represent Ctr3. Panels A, C and E are Ni column purification profiles for Ctr1, Ctr2 and Ctr3 purified without nucleic acids, respectively. Panels B, D and F represent SEC profiles for Ctr1, Ctr2 and Ctr3, respectively; fractions enclosed within vertical lines were collected as final purified Ctr proteins. Blue represents the spectrometric UV trace, green represents the imidazole steps (20 mM, 50 mM, 100 mM and 500 mM, as marked) and red represents fraction collection numbers. In (C), the brown plot represents the logbook trace of the purification.


Figure 3.4. SDS-PAGE protein bands of recombinant Ctr. Protein standards are marked accordingly in kDa. Lanes 1-2 represent Ctr1, lanes 3-4 represent Ctr2 and lanes 5-6 represent Ctr3 obtained from Ni column and SEC purification, in that order.

3.3.2 Quantification of nucleic acid content of protocol A vs protocol B

Purification using protocol A yielded Ctr proteins bound to nucleic acids whereas protocol B produced proteins largely devoid of nucleic acids. In order to quantify the amount of nucleic acids removed by protocol B, molar ratios were estimated for protein and bound RNA (protocol A vs protocol B). The bound nucleic acids were extracted (phenol-chloroform) from all three Ctr proteins (Ctr1, Ctr2 and Ctr3). The mass of the extracted RNA was quantified using a Nanodrop spectrometer (measured at 260 and 280 nm): ~ 40000 ng of RNA (protocol A) and ~ 3000 ng of RNA (protocol B) were extracted from 1 mg of each Ctr protein. The average length of the bound RNA was taken to be 100 nucleotides (the approximate average length of tRNA and 5S rRNA; RNA binding partners of Ctr3, described in Chapter 6). Approximate molecular weights of bound RNA were calculated using the following formula:

(Number of nucleotides x 320.5) + 159.0 (http://www.thermofisher.com/).

Where, "320.5" is the average molecular weight of each nucleotide within the polynucleotide and addition of "159" takes into account the molecular weight of 5' triphosphate.

The number of moles (mass/molecular weight) of bound RNA was then calculated. Molecular weights of all three Ctr proteins were obtained from LC-MS analysis (section 3.3.3), and the number of moles of Ctr proteins was calculated using the same formula (mass/molecular weight). Avogadro’s constant (6.022 × 10²³) was then used to calculate the number of molecules for both bound RNA and protein (number of moles × 6.022 × 10²³), and the ratios between protein and bound RNA were subsequently determined. The molar ratio of Ctr protein molecules to single RNA nucleotides changed from ~ 1:1 (protocol A) to ~ 14:1 (protocol B). This indicates that after purification using protocol B, at most, a single RNA nucleotide was bound per 14 Ctr protein molecules. If the bound RNA is in the form of a polynucleotide, then an even larger proportion of Ctr molecules is completely free of nucleotides. The molar ratios of protein:RNA determined for all three Ctr proteins (Ctr1, Ctr2 and Ctr3) were similar.
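The arithmetic behind these ratios can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the exact calculation performed: the Ctr3 mass of 7.6 kDa is taken from the LC-MS result, the per-nucleotide ratio is assumed to use the average nucleotide weight of 320.5 alone, and the function names are invented for the example. Avogadro's constant cancels when a ratio of molecule numbers is taken, so mole ratios are used directly.

```python
NUCLEOTIDE_MW = 320.5    # average g/mol per nucleotide (formula in the text)
TRIPHOSPHATE_MW = 159.0  # g/mol correction for the 5' triphosphate


def rna_mw(n_nucleotides):
    """Approximate molecular weight of an RNA of the given length."""
    return n_nucleotides * NUCLEOTIDE_MW + TRIPHOSPHATE_MW


def protein_per_nucleotide(protein_g, protein_mw, rna_g):
    """Mole ratio of protein molecules to single RNA nucleotides."""
    return (protein_g / protein_mw) / (rna_g / NUCLEOTIDE_MW)


def protein_per_rna(protein_g, protein_mw, rna_g, n_nucleotides=100):
    """Mole ratio of protein molecules to RNA chains of the given length."""
    return (protein_g / protein_mw) / (rna_g / rna_mw(n_nucleotides))


CTR3_MW = 7600.0   # g/mol, from LC-MS (7.6 kDa)
PROTEIN_G = 1e-3   # 1 mg of Ctr protein
RNA_A_G = 40e-6    # ~ 40000 ng RNA recovered with protocol A
RNA_B_G = 3e-6     # ~ 3000 ng RNA recovered with protocol B

ratio_a = protein_per_nucleotide(PROTEIN_G, CTR3_MW, RNA_A_G)
ratio_b = protein_per_nucleotide(PROTEIN_G, CTR3_MW, RNA_B_G)
print(f"protocol A: {ratio_a:.1f} protein molecules per nucleotide")
print(f"protocol B: {ratio_b:.1f} protein molecules per nucleotide")
print(f"protocol B, 100-nt RNA: "
      f"{protein_per_rna(PROTEIN_G, CTR3_MW, RNA_B_G):.0f} per RNA chain")
```

With these inputs the per-nucleotide ratios round to 1 and 14, in line with the values reported above; treating the RNA as 100-nucleotide chains increases the protein:RNA ratio roughly a hundredfold.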

3.3.3 Quantification of Ctr proteins

The lack of tryptophan residues in all three Ctr proteins from M. burtonii made it difficult to quantify the purified recombinant proteins. The Nanodrop spectrometer (Thermo Scientific) failed to quantify these proteins accurately (data not shown). Each Ctr protein has only a single tyrosine and 4-5 phenylalanine residues (five Phe residues in Ctr1 and Ctr2; four Phe residues in Ctr3) (Figure 3.5); the total absence of tryptophan residues gave low absorbance values around 260-280 nm (measured by Nanodrop). To overcome this drawback, two independent techniques were used to measure protein concentrations: the classical Bradford assay (Bradford, 1976) and Direct Detect Infrared Spectrometry (Millipore). The Bradford assay is a colorimetric technique whilst Direct Detect Infrared Spectrometry (Millipore) is an infrared-based technique that measures amide bonds in protein chains (Bradford, 1976; Strug et al., 2014). Both techniques gave almost identical results, with a consensus yield of ~ 2 mg protein from 1 litre of culture.
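Why the absorbance is so low can be illustrated by estimating the molar extinction coefficient at 280 nm from the amino acid composition. The sketch below is not part of the original analysis: it assumes the widely used per-residue approximation of about 5500 M⁻¹ cm⁻¹ for Trp and 1490 M⁻¹ cm⁻¹ for Tyr (values not taken from this thesis), uses the His-tagged Ctr3 sequence confirmed by mass spectrometry in this chapter, and takes the 7.6 kDa mass from LC-MS.

```python
# His-tagged Ctr3 sequence as reported in this chapter.
CTR3_HIS = ("MHHHHHHMESTAPVEAGESYDVTIEDTAREGDGIARVSGFVIFVPNTSVGDEVTIKVTK"
            "VARKFAFGEVV")

# ASSUMED per-residue extinction coefficients at 280 nm (M^-1 cm^-1).
EPS_TRP = 5500.0
EPS_TYR = 1490.0

CTR3_MW = 7600.0  # g/mol, from LC-MS (7.6 kDa)

n_trp = CTR3_HIS.count("W")
n_tyr = CTR3_HIS.count("Y")
n_phe = CTR3_HIS.count("F")

# Phe contributes negligibly at 280 nm, so it is left out of the estimate.
eps_280 = n_trp * EPS_TRP + n_tyr * EPS_TYR

# Expected absorbance of a 1 mg/ml (1 g/L) solution in a 1 cm path length.
a280_1mg_per_ml = eps_280 / CTR3_MW

print(f"Trp={n_trp} Tyr={n_tyr} Phe={n_phe}")
print(f"estimated eps280 = {eps_280:.0f} M^-1 cm^-1")
print(f"estimated A280 at 1 mg/ml = {a280_1mg_per_ml:.2f}")
```

Under these assumptions a 1 mg/ml solution is expected to absorb only about 0.2 at 280 nm, so small amounts of residual RNA, which absorbs strongly at 260 nm, could easily dominate the reading.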

Protein samples were subjected to LC-MS and LC-MS/MS analysis at a concentration of 1 mg/ml to assess the molecular mass and amino acid sequence of Ctr proteins, respectively. Due to the presence of His-tags at the N-termini of the Ctr proteins, the molecular weights determined by LC-MS were higher than the theoretical molecular weights. Molecular weights reported by LC-MS analysis were: Ctr1, 8.3 kDa; Ctr2, 8.2 kDa; and Ctr3, 7.6 kDa. Amino acid sequence analysis by LC-MS/MS showed no discrepancy (i.e. exhibited a 100% match) (Figure 3.5).

(A) Ctr1 (Mbur_0304)
MHHHHHHLFNNMESTAPVEAGETYEVTIEDIAREGDGIARVSGFVVFVPNTSVGDEVTIKVTKVMRKFAFGEVAE

(B) Ctr2 (Mbur_0604)
MHHHHHHLFNNMESTAPVEAGETYDVTIEDIAREGDGIARVSGFVIFVPEASVGDEVTIKVTKVMSKFAFGELV

(C) Ctr3 (Mbur_1445)
MHHHHHHMESTAPVEAGESYDVTIEDTAREGDGIARVSGFVIFVPNTSVGDEVTIKVTKVARKFAFGEVV

Figure 3.5. Amino acid sequences of Ctr proteins confirmed by mass spectrometry. (A) Ctr1, (B) Ctr2 and (C) Ctr3 amino acid sequences are shown accordingly. N-terminal His-tags in all Ctr proteins are underlined.

3.3.4 Identification of the bound nucleic acids

The bound nucleic acids were extracted (phenol-chloroform) from all Ctr proteins, digested separately with DNase, RNase and Benzonase nuclease (control) and assessed on a 1.5% non-denaturing agarose gel. The nucleic acids were digested completely by RNase whilst DNase had no effect (Figure 3.6), suggesting that the nucleic acids that co-purified with Ctr proteins were RNA.

Figure 3.6. Identification of the nucleic acids bound to Ctr3. Ctr3-bound E. coli nucleic acids were extracted (phenol-chloroform) and electrophoresed on a non-denaturing 1.5% agarose gel. Lane A: 100 bp molecular marker; Lane B: E. coli nucleic acids (~ 5 µg) extracted from Ctr3; Lane C: DNase I treated nucleic acids (~ 5 µg); Lane D: RNase A treated nucleic acids (~ 5 µg); Lane E: Benzonase nuclease treated nucleic acids (~ 5 µg). Digestion by RNase A and Benzonase but not by DNase I identified the nucleic acids as RNA.

3.4 Discussion

Overexpression of recombinant proteins in mesophilic hosts (e.g. E. coli) at lower temperatures (e.g. ~ 15°C) results in a slower rate of protein production, which conceivably allows recombinant proteins more time to fold accurately and yield soluble proteins (Vera et al., 2007). Induction of target genes at ~ 15°C has proven useful for expression of M. burtonii proteins (Pilak et al., 2011; DeFrancisci et al., 2011; Thomas and Cavicchioli, 2001). Lower induction temperatures were also shown to increase protein yield and limit proteolytic degradation of the recombinant psychrophilic proteins EF-2 and RpoEF from M. burtonii (Thomas et al., 2002; DeFrancisci et al., 2011). Overexpression of Ctr proteins from M. burtonii at 15°C increased protein yield compared to overexpression at 30°C (data not shown).

His-tags have been shown to improve recombinant protein production in mesophilic hosts (e.g. E. coli). Recombinant Ctr proteins were produced as fusions to N-terminal His-tags (6X) and His-tags did not negatively affect the solubility of Ctr proteins. Purification using IMAC Ni-charged resins and SEC yielded large quantities of soluble heterologous proteins. Fusion to N-terminal His-tags has already been proven to be a useful strategy in purifying M. burtonii proteins such as Chaperonins and RpoEF in large quantities (Pilak et al., 2011; DeFrancisci et al., 2011). Overall, purification strategies used for M. burtonii Ctr proteins were similar to the strategies applied successfully for proteins from M. burtonii: EF-2 (Thomas and Cavicchioli, 2001), chaperonins (Pilak et al., 2011) and RpoEF (DeFrancisci et al., 2011) and a Csp protein from M. frigidum (Giaquinto et al., 2007).

Recombinant RNA-binding proteins purified from E. coli frequently contain nucleic acid contamination (Marenchino et al., 2009). As expected, E. coli RNA consistently co-purified with all three recombinant Ctr proteins. Conventional purification steps (PEI treatment, RNase digestion and heparin-sepharose affinity chromatography) failed to eliminate the bound RNA from Ctr proteins, which suggested tight association of RNA with Ctr proteins. However, the bound nucleic acids could be removed by treating the recombinant proteins with mild urea (1 M) to partially unfold the protein, eluting the protein across a NaCl gradient, and refolding it by dialysis. E. coli lysate containing recombinant Ctr proteins was prepared in high ionic strength (1 M NaCl) buffer. The use of high ionic strength (1 M) has been shown to reduce the affinity of RNA-binding proteins for cellular nucleic acids (Marenchino et al., 2009). This simple and efficient purification method combined chemical denaturation (via urea) and on-column removal of RNA (via a NaCl gradient), and allowed purification of Ctr proteins free of E. coli RNA contaminants. The protocol (section 3.2.3.2; protocol B) was reproducible and was applicable to all three Ctr proteins (Ctr1, Ctr2 and Ctr3).

In conclusion, the purification strategy developed (protocol B) effectively removed E. coli RNA from Ctr proteins. This strategy was successfully applied to all three Ctr proteins (Ctr1, Ctr2 and Ctr3) to produce heterologous soluble proteins in folded states. Ctr proteins were purified with higher yield when overexpressed at 15°C compared to at 30°C. It will be interesting to assess whether protocol B can be successfully applied to other psychrophilic nucleic acid binding proteins.


Chapter 4

Biophysical and structural analysis of Ctr3


Abstract

The structure-function-stability relationship of Ctr3 from M. burtonii was investigated using various spectrometric (circular dichroism, intrinsic and extrinsic fluorescence), calorimetric (differential scanning calorimetry) and electrophoretic (transverse urea gradient gel electrophoresis) methods. It was determined that the single TRAM domain protein exhibited reversible two-state unfolding (Tm of ~ 50°C), as demonstrated by a ΔHcal/ΔHvH ratio close to unity. To understand the effect of RNA on the unfolding-refolding properties and conformational stability of Ctr3, RNA-bound vs RNA-free proteins were assessed. RNA exhibited increased affinity towards the denatured state of Ctr3 instead of the native state. Calorimetric analysis showed that Tm decreased by ~ 13°C for RNA-bound (Tm 37°C) vs RNA-free (Tm 50°C) protein. The thermostability of Ctr3 was affected by repeated unfolding and refolding cycles, which suggested that high temperature may produce covalent changes leading to the formation of a different and more stable conformation. A homology model of Ctr3 was also analysed. The predicted three-dimensional structure of Ctr3 exhibited substantial structural similarities with the TRAM domain of the RumA protein (RlmD) from E. coli. The aromatic residues, particularly the four phenylalanine residues of Ctr3, appeared to be surface exposed and in close proximity to each other on the putative RNA-binding surface, similar to the aromatic residues in the RumA TRAM domain, suggesting similar roles in RNA interaction. These characteristics provided valuable information regarding biophysical and structural features of RNA-bound vs RNA-free forms of Ctr3.


4.1 Introduction

At low temperatures, protein activity is significantly compromised (Feller and Gerday, 1997; Siddiqui et al., 2002; Feller and Gerday, 2003); to compensate for this, proteins from psychrophiles have developed features (e.g. higher structural flexibility) that allow them to function (e.g. catalysis) in the cold (Feller et al., 1997; Gerday et al., 1997; Russell, 2000; Lonhienne et al., 2000; Johns and Somero, 2004). However, higher flexibility also results in low stability because of activity-stability trade-off (Siddiqui and Cavicchioli, 2006; Siddiqui, 2016). Studies on cold-adapted proteins can provide important insights into cold adaptation of psychrophiles as many cold-adapted proteins drive many essential cellular processes at low temperature (e.g. transcription, translation) (Feller and Gerday, 1997; Siddiqui et al., 2002; Feller and Gerday, 2003; D'amico et al., 2006).

A number of biophysical techniques are available to study the structure and stability of proteins. Spectrometric techniques such as circular dichroism (CD) in the near-UV region (260 to 320 nm) and intrinsic fluorescence provide information on the environment around specific amino acid residues (Phe, Tyr, Trp and cysteine residues for near-UV CD; Tyr and Trp for intrinsic fluorescence); hence both of these spectrometric analyses are suitable for studying the tertiary structure of a protein (Cavicchioli et al., 2006). CD in the far-UV region (below 240 nm) gives information regarding the secondary structure of the protein. These methods are useful for assessing structural (secondary or tertiary) unfolding/refolding transitions and the reversibility of unfolding of a protein (Cavicchioli et al., 2006). Differential scanning calorimetry (DSC) is a technique that measures changes in the molar heat capacity (Cp) of a protein as a function of temperature, with concomitant determination of the enthalpy of the unfolding/refolding transition of the overall structure (Cavicchioli et al., 2006). Transverse urea gradient-gel electrophoresis (TUG-GE) is an electrophoretic technique based on the separation of various protein conformations due to changes in hydrodynamic volume as a result of structural unfolding. It is used to study unfolding/folding transitions and to differentiate various conformational states induced by a chemical denaturant such as urea (Cavicchioli et al., 2006). The benefits of TUG-GE over spectrometric and calorimetric methods include analysis of stability as well as separation of various structural and conformational protein forms in the same experiment (Giaquinto et al., 2007). TUG-GE can be used to study both reversible and irreversible unfolding. In the case of reversible unfolding, the unfolding mechanism (two-state vs non-two-state) and the thermodynamic stability (free energy of unfolding, ΔG) of a protein can also be determined (Siddiqui et al., 2005; Cavicchioli et al., 2006; Giaquinto et al., 2007). Collectively, the information generated from spectrometric, calorimetric and electrophoretic methods can provide a detailed assessment of the structure-stability relationship of a protein.
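The two-state mechanism mentioned above can be illustrated with a minimal Python sketch of the textbook van't Hoff description, in which a protein is either native (N) or unfolded (U) and the equilibrium constant depends on temperature through a single enthalpy. This is a generic illustration and not the fitting procedure used in this work: the Tm of 50°C is the value reported for Ctr3 in this chapter's abstract, the enthalpy values are hypothetical, and any heat capacity change on unfolding is assumed to be zero.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def fraction_unfolded(temp_c, tm_c, delta_h_vh):
    """Fraction of unfolded protein for a two-state N <-> U transition.

    Uses K(T) = exp(-(dH_vH / R) * (1/T - 1/Tm)) with temperatures in
    kelvin, and f_U = K / (1 + K). Assumes dCp = 0.
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    k = math.exp(-(delta_h_vh / R) * (1.0 / t - 1.0 / tm))
    return k / (1.0 + k)


def cooperativity_ratio(delta_h_cal, delta_h_vh):
    """dHcal / dHvH; a value near 1 is consistent with two-state unfolding."""
    return delta_h_cal / delta_h_vh


TM_C = 50.0           # reported Tm of RNA-free Ctr3
DELTA_H_VH = 200e3    # HYPOTHETICAL van't Hoff enthalpy, J/mol
DELTA_H_CAL = 200e3   # HYPOTHETICAL calorimetric enthalpy, J/mol

for temp in (30.0, 50.0, 70.0):
    f_u = fraction_unfolded(temp, TM_C, DELTA_H_VH)
    print(f"{temp:.0f} C: fraction unfolded = {f_u:.3f}")
print(f"dHcal/dHvH = {cooperativity_ratio(DELTA_H_CAL, DELTA_H_VH):.2f}")
```

At the melting temperature the native and unfolded populations are equal, which is how Tm is defined in this model; a ΔHcal/ΔHvH ratio that departs substantially from 1 would point to intermediates or to association between molecules during unfolding.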

M. burtonii grows at cold temperatures (≤4°C) and has proven to be an excellent model for studying molecular mechanisms of cold adaptation (Cavicchioli, 2006; Williams et al., 2011). Several studies have been performed on proteins from M. burtonii in order to obtain insights into the protein structure-function-stability relationship in the context of cold adaptation (EF-2: Thomas et al., 2001; chaperonin: Pilak et al., 2011; RpoEF: DeFrancisci et al., 2011). Previously, a number of spectrometric (circular dichroism, fluorescence), calorimetric (differential scanning calorimetry) and electrophoretic (transverse urea gradient gel electrophoresis) methods were used to assess the unfolding-refolding properties, conformational stability and kinetic/thermodynamic parameters of cold-adapted proteins from M. burtonii, including EF-2 (Tm = 51°C), chaperonins [MbCpn0972 (Tm = 39°C), MbCpn1960 (Tm = 49°C) and MbCpn2146 (Tm = 54°C)] and RpoEF (Tm = 51°C) (Thomas and Cavicchioli, 2000; Pilak et al., 2011; DeFrancisci et al., 2011). It is important to note that the melting temperatures (Tm) reported for these proteins are above the upper temperature for M. burtonii viability (~ 28°C) (Franzmann et al., 1992), a characteristic common to many psychrophilic proteins (Pilak et al., 2011; DeFrancisci et al., 2011).

Nucleic acid binding proteins are postulated to play important roles in the cold adaptation of M. burtonii (Allen et al., 2009; Campanaro et al., 2011). The single TRAM proteins (Ctr1, Ctr2 and Ctr3) from M. burtonii preferentially bind RNA (Chapter 3) and were shown to be up-regulated at low temperature (Williams et al., 2010a; Williams et al., 2011; Campanaro et al., 2011). Many RNA-binding proteins have the characteristic OB-fold (Murzin, 1993). OB-folds generally consist of anti-parallel β-sheets that form a β-barrel structure (Murzin, 1993; Sawyer et al., 2015). Nucleic acid-binding surfaces of OB-fold proteins are typically enriched with basic amino acids that provide an overall positive electrostatic potential to facilitate interactions with the negatively charged nucleic acids (Murzin, 1993; Schindelin et al., 1993; 1994; Schröder et al., 1995; Feng et al., 1998; Schindler et al., 1999; Lee et al., 2005; Sawyer et al., 2015). The binding surfaces of these proteins are also characteristically enriched with surface-exposed aromatic amino acids, which can interact with RNA directly and play critical roles in protein stability and specific binding (Murzin, 1993; Schindelin et al., 1993; 1994; Schröder et al., 1995; Hillier et al., 1998; Lee et al., 2005; Sawyer et al., 2015). For example, RumA (RlmD), a methyltransferase from E. coli which possesses a TRAM domain (OB-fold) as the first of its three domains (Lee et al., 2004), exhibits a conformational change upon RNA interaction. For RumA, when the TRAM domain interacts with an RNA hairpin structure, the free energy of binding enhances the catalytic efficiency of the methyltransferase (Lee et al., 2005). The inherent flexibility of the TRAM domain in RumA is important for enabling the conformational change necessary for binding to RNA (Lee et al., 2004), and such flexibility appears to be a general characteristic of OB-fold proteins (Mihailovich et al., 2010; Sawyer et al., 2015).

Chapter 3 describes the successful heterologous expression and purification of Ctr proteins that were largely free of nucleic acid (RNA). Among the three single TRAM proteins from M. burtonii, Ctr3 has the highest abundance at -2°C compared to 23°C; an upregulation of ~ 9-fold compared to Ctr1 and Ctr2 (Williams et al., 2011). Based on abundance, proteomic studies (Williams et al., 2010a,b; Williams et al., 2011) indicated that Ctr3 might have roles in facilitating translation and transcription by unravelling inhibitory RNA secondary structures formed at low temperature (Williams et al., 2011). Therefore, Ctr3 (rather than Ctr1 or Ctr2) was chosen for biophysical characterization. In order to determine features of the Ctr3 protein-RNA interaction, different biophysical properties were assessed using spectrometric (circular dichroism, fluorescence), calorimetric (differential scanning calorimetry) and electrophoretic (transverse urea gradient gel electrophoresis) methods. Unfolding-refolding properties and conformational stability of RNA-bound vs RNA-free Ctr3 protein were compared. A homology model of Ctr3 was also analysed, and putative structural features that could conceivably facilitate RNA binding were identified. Overall, this chapter describes the assessment of structural characteristics of Ctr3 related to RNA binding.


4.2 Materials and Methods

4.2.1 Far-UV circular dichroism (CD) spectrometry

Far-UV spectra were generated for both RNA-bound and RNA-free Ctr3 proteins in the range of 190–260 nm at 4°C in 50 mM HEPES, pH 7.4 buffer. A JASCO J-810 CD spectropolarimeter under constant nitrogen flow, connected to a Peltier temperature controller, was used for all scans and melting experiments. Spectra were obtained using a 0.1 cm path length and a protein concentration of 0.1 mg ml⁻¹. Melting experiments were performed at 215 nm (for β-sheets). Buffer scans were subtracted from the protein scans and the data were smoothed using software supplied by JASCO (Cavicchioli et al., 2006). In all experiments, the high-tension voltage (HTV) was kept below 600. Mean residue ellipticity (θ) was calculated using cell path length, molecular weight, concentration and number of residues (Kelly et al., 2005).
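The conversion from observed ellipticity to mean residue ellipticity can be sketched as follows. This is a minimal illustration of the standard relation described by Kelly et al. (2005), [θ]MRW = θ × MRW / (10 × d × c), with MRW = M / (N − 1); it is not a record of the processing carried out in the instrument software. The function name is hypothetical, and the signal, molecular weight and residue count in the example are placeholders rather than values for Ctr3.

```python
def mean_residue_ellipticity(theta_mdeg, molar_mass_da, n_residues,
                             path_cm, conc_mg_per_ml):
    """Convert observed ellipticity (mdeg) to mean residue ellipticity.

    Returns [theta]_MRW in deg cm^2 dmol^-1.
    """
    # Mean residue weight: molar mass divided by the number of peptide bonds.
    mrw = molar_mass_da / (n_residues - 1)
    # With theta in mdeg and concentration in mg/ml the factors of 1000
    # cancel, leaving the usual factor of 10 in the denominator.
    return theta_mdeg * mrw / (10.0 * path_cm * conc_mg_per_ml)


if __name__ == "__main__":
    # Path length and concentration follow the far-UV settings above; the
    # signal, molar mass and residue count are HYPOTHETICAL placeholders.
    value = mean_residue_ellipticity(
        theta_mdeg=-5.0, molar_mass_da=7000.0, n_residues=71,
        path_cm=0.1, conc_mg_per_ml=0.1)
    print(f"[theta]_MRW = {value:.0f} deg cm^2 dmol^-1")
```

The same function applies to the near-UV measurements if the path length and concentration are changed to the values used there.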

4.2.2 Near-UV circular dichroism (CD) spectrometry

Near-UV spectra were recorded for both RNA-bound and RNA-free Ctr3 proteins. All scans and melting experiments were carried out in the range of 240-400 nm at 4°C in 50 mM HEPES, pH 7.4 buffer under constant nitrogen flow. A JASCO J-810 CD spectropolarimeter connected to a Peltier temperature controller was used for all experiments, with a path length of 1 cm and a protein concentration of 1 mg ml⁻¹. Buffer scans were subtracted from the protein scans and the data were smoothed using software supplied by JASCO (Cavicchioli et al., 2006). Melt profiles were followed at 258 nm (for Phe). In all experiments, the HTV was kept below 600. Mean residue ellipticity (θ) was calculated using cell path length, molecular weight, concentration and number of residues (Kelly et al., 2005).

4.2.3 Intrinsic fluorescence spectrometry

Both RNA-bound and RNA-free Ctr3 proteins were prepared at a concentration of 1 mg ml⁻¹ in 50 mM HEPES, 500 mM NaCl, pH 7.4 buffer. A Perkin Elmer LS-50B Luminescence Spectrometer connected to a Peltier temperature controller was used for all experiments. Samples were excited at 280 nm; emission scans were carried out in the range 290-400 nm at 4°C, and thermal melts used a scan rate of 1.1°C min⁻¹. Buffer signals were subtracted from the protein scans (Cavicchioli et al., 2006). The emission wavelength of Tyr is ~ 305 nm; therefore, protein melts were obtained at 305 nm.

4.2.4 Extrinsic fluorescence spectrometry

Experiments were performed on a Perkin Elmer LS-50B Luminescence Spectrometer connected to a Peltier temperature controller. Samples of both RNA-bound and RNA-free Ctr3 proteins were prepared at a concentration of 1 mg ml⁻¹ in 50 mM HEPES, 500 mM NaCl, pH 7.4 buffer, and 8-anilinonaphthalene-1-sulfonic acid (ANS) was used as the fluorescent dye at a concentration of 50 µM. Samples were excited at 380 nm and all emission scans were carried out in the range 400-620 nm from 4°C to 90°C at intervals of ~ 5°C. Buffer signals were subtracted from the protein scans (Cavicchioli et al., 2006) and emission at 480 nm was plotted against temperature.

4.2.5 Differential Scanning Calorimetry (DSC)

RNA-bound and RNA-free Ctr3 proteins in 50 mM HEPES, 500 mM NaCl, pH 7.4 buffer were concentrated to 10 mg ml⁻¹ using centrifugal filter units (Amicon Ultra - 3K). The filtrate collected at the bottom of the centrifugal filter tubes was used as the reference in DSC experiments (Ashutosh et al., 2000). All samples and buffers were centrifuged at 21000 x g for 10 min at 4°C to remove any precipitate and particulate matter, followed by degassing under vacuum using a Thermovac unit (Microcal) for at least 20 min at 4°C prior to DSC thermal scans. A NanoDSC (TA Instruments) with a cell volume of 0.3 ml and a pressure of 3 atm was used to perform all experiments. The cells of the DSC instrument were repeatedly washed with Milli-Q water prior to equilibration. Thermal equilibration was achieved by running at least two buffer-buffer thermal scans before the protein scans.

For each calorimetric run, a series of scans with identical parameters (4°C to 80°C and back to 4°C at a scan rate of 1.1°C min⁻¹ before starting the next scan) was set up. The thermograms were recorded and analysed using NanoAnalyze software. The thermogram for the first buffer-buffer scan was not considered for analysis due to inadequate thermal history. Melting temperature (Tm) corresponded to the maximum heat capacity, Cp. A minimum of four scans were carried out: two buffer-buffer scans and two successive protein scans. Additionally, multiple (>2) scans were performed on each sample to determine the reversibility of unfolding.


4.2.6 Transverse urea gradient gel electrophoresis (TUG-GE)

TUG-GE was performed based on methods described previously (Siddiqui et al., 2005; Cavicchioli et al., 2006; Giaquinto et al., 2007). A lower urea gradient (0 - 6.64 M) was used for the cold adapted Ctr3 protein (Giaquinto et al., 2007). Four solutions (V1-V4) were prepared as follows:

Solutions                                         V1       V2       V3       V4
0.86 M imidazole-HEPES (pH 7.4) (ml)              1        1.63     1.63     1.63
40% acrylamide (C = 3.3%; 29:1) (Bio-Rad) (ml)    4        6.5      8.9      8.9
Urea (g)                                          8.53     12.75    -        -
0.4% bromophenol blue dye (µl)                    -        50       -        -
Distilled water, up to final volume (ml)          20       32.5     32.5     32.5
10% APS (µl)                                      23       42       32       52
TEMED (Sigma) (µl)                                4.5      9        6        9
Volume used (ml)                                  14       32.5     32.5     25

Bromophenol blue was added only to solution V2 to monitor gradient formation during gel preparation. APS and TEMED were added to all solutions just before pouring of the gels to avoid premature polymerization. The Hoefer Multiple Gel system (Fisher Scientific) was set up based on instructions described by Siddiqui et al. (2005). The gel sandwich pairs (with single-well comb inserts) were positioned so that the spacer-protected sides were aligned at the bottom and the top of the gel caster. V1 was slowly poured along the sidewalls of the gel caster. Solutions V2 and V3 were poured into the two separate chambers of a Hoefer Gradient Mixer (Fisher Scientific). A peristaltic pump was then used to transfer the mixture (urea gradient superimposed on an inverse acrylamide gradient) into the gel caster. A magnetic stirrer placed in the V2 solution helped mix the V3 solution with the V2 solution. The blue intensity (due to bromophenol blue) decreased from the bottom to the top of the gel. The remaining gel caster space was filled up with V4 solution. The gels were allowed to polymerize overnight at 15°C. After polymerization, gels were taken out of the gel caster, excess polyacrylamide was discarded and the single-well combs were carefully removed to yield TUG-GE gels with a urea gradient of 0-6.64 M and an inverse acrylamide gradient of 15-11%.

To study unfolding (folded ↔ unfolded), protein samples were prepared (with and without urea) as follows:

Solutions                               With urea    Without urea
50 mM imidazole-HEPES (pH 7.4) (µl)     3.5          3.5
Glycerol (µl)                           -            20
Urea (g)                                0.027        -
0.4% bromophenol blue dye (µl)          1            1
Protein (35 µg) (µl)                    5            5
Distilled water (µl)                    ~30          40
Total sample volume (µl)                70           70

The protein samples were incubated on ice for ~ 2 h. While the samples were on ice, the gels were assembled in a Hoefer SE-250 electrophoresis unit (Bio-Rad). The temperature of the gels was kept at around 7°C by connecting the electrophoresis unit to a temperature-controlled water bath (MultiTemp III; Pharmacia Biotech). Gels were subjected to pre-runs at 10 mA for ~ 40 min to remove unreacted reagents such as cyanate ions. Samples were then carefully loaded as a single band and electrophoresis was performed in 0.86 M imidazole-HEPES (pH 7.4) buffer at 10 mA for ~ 3 h. Gels were removed from the plates, washed with Milli-Q water and stained with Coomassie Blue R250. Gel images were taken using an LAS3000 (Fujifilm, Melbourne, Australia) with ImageGauge v4.0 software. ΔG (stability of the folded form relative to the unfolded form) and [urea]1/2 (urea concentration corresponding to the mid-point of the curve between the fully folded and fully unfolded forms of the protein) of Ctr3 were determined as described previously (Siddiqui et al., 2005; Cavicchioli et al., 2006).
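The determination of ΔG and [urea]1/2 from a transition curve can be illustrated with a short sketch. It assumes the widely used linear extrapolation approach for a two-state transition: ΔG = -RT ln[Fu / (1 - Fu)] is computed at each urea concentration in the transition region and fitted to a straight line, ΔG = ΔG(water) - m[urea], so that [urea]1/2 = ΔG(water) / m. The function names, the m-value terminology and the Fu readings are hypothetical and are not taken from the gels in this chapter; the cited protocols should be consulted for the exact procedure.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def urea_stability(urea_molar, frac_unfolded, temp_c):
    """Return (dG in water in J/mol, m-value in J/mol/M, [urea]1/2 in M).

    frac_unfolded holds Fu values (strictly between 0 and 1) read from the
    transition region of the curve at the given urea concentrations.
    """
    t_kelvin = temp_c + 273.15
    dg = [-R * t_kelvin * math.log(fu / (1.0 - fu)) for fu in frac_unfolded]
    slope, intercept = linear_fit(urea_molar, dg)
    m_value = -slope  # dG falls as the urea concentration rises
    return intercept, m_value, intercept / m_value


if __name__ == "__main__":
    # HYPOTHETICAL Fu readings for a gel run at 7 degrees C.
    urea = [2.0, 2.25, 2.5, 2.75, 3.0]
    fu = [0.30, 0.39, 0.50, 0.61, 0.70]
    dg_water, m_value, urea_half = urea_stability(urea, fu, temp_c=7.0)
    print(f"dG(water) = {dg_water / 1000:.2f} kJ/mol, "
          f"m = {m_value / 1000:.2f} kJ/mol/M, [urea]1/2 = {urea_half:.2f} M")
```

Only points within the transition region should be passed in, because Fu values of exactly 0 or 1 make the logarithm undefined.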

4.2.7 Homology modelling and structure prediction

A protein homology model was determined for Ctr3 using the SWISS-MODEL (Biasini et al., 2014) server and homologous structures from the Protein Data Bank (PDB). Methanosarcina mazei MM_1357 (PDB: 1YEZ) single TRAM domain protein (~ 68% identity to Ctr3) and the TRAM domain from E. coli RumA (RlmD) (EC01304_4012) (PDB: 2BH2) 23S rRNA uracil-5-methyltransferase (~ 25% identity to Ctr3) were used as templates for alignments. Visualization and comparative analyses were performed using the PyMOL Molecular Graphics System, Version 1.5 Schrödinger, LLC. The amino acid sequences were aligned using MUSCLE (Edgar, 2004) on the T-coffee platform (Notredame et al., 2000).

4.3 Results

4.3.1 Far-UV CD spectrometric analysis of Ctr3

Circular dichroism spectrometry is a technique that measures the difference in absorbance between the two circularly polarized components (L and R) of plane-polarized light and reports the difference as ellipticity (θ) in degrees. “R” refers to rotation in the clockwise direction and “L” to rotation in the anticlockwise direction; the two components are equal in magnitude. When the L and R components are absorbed to different extents by the sample, the resulting radiation is described as possessing ellipticity; however, if L and R are not absorbed by the sample (or are absorbed to equal extents), the two components recombine to give radiation polarized in the original plane (Kelly et al., 2005). A CD signal is detected if a molecule is chiral (e.g. a carbon atom bonded to four different groups) or if the environment around the chromophore is asymmetric. For proteins, a number of absorption regions serve as useful indicators of structural features: absorption below 240 nm represents peptide bonds; absorption between 260 and 320 nm represents aromatic amino acid side chains; and absorption around 260 nm represents disulphide bonds (Kelly et al., 2005). Absorption below 240 nm (far-UV) is particularly useful for determining secondary structure composition by studying the peptide backbone (Cavicchioli et al., 2006). In this study, far-UV CD spectrometry was used to infer information about the secondary structures of RNA-bound vs RNA-free Ctr3.

The secondary structures of RNA-bound and RNA-free Ctr3 were assessed in the far-UV region (Figure 4.1). Both forms of Ctr3 (RNA-bound and RNA-free) displayed maximum peaks around 215 nm, a characteristic of β-sheets (Figure 4.1A,C). Almost identical profiles were observed after thermal melting and cooling, indicating that the secondary structure of Ctr3 (RNA-bound and RNA-free) unfolded reversibly (Figure 4.1A,C). Thermal unfolding of Ctr3 was monitored by far-UV CD at 215 nm (Figure 4.1B,D). The Tm of the secondary structures of Ctr3 (RNA-bound and RNA-free) was determined from the far-UV melting profiles; in all instances Tm was ~ 50°C (Figure 4.1B,D). Ctr3, whether bound to E. coli or M. burtonii RNA (Figure 4.1A,C), gave almost identical thermal unfolding profiles and Tm (only data for Ctr3 bound to M. burtonii RNA are shown).

Figure 4.1. Far-UV CD spectrometry analyses of Ctr3. (A) and (B) represent profiles for RNA-bound Ctr3; and (C) and (D) represent profiles for RNA-free Ctr3. (A) and (C) Far-UV CD spectra at 4°C showing that Ctr3 is a folded protein that is rich in β-sheets. The spectra obtained after thermal melting and cooling indicate that Ctr3 folds reversibly. (B) and (D) Thermal unfolding of Ctr3 as monitored by far-UV CD signal at 215 nm at 4°C. After complete unfolding, the sample was cooled and melted a second time, showing reversible unfolding (Tm ~ 50°C) across the temperature range (4°C to 80°C).

4.3.2 Near-UV CD spectrometric analysis of Ctr3

While far-UV CD spectrometry provided information regarding the secondary structure of Ctr3, the nature of the tertiary structure of the protein was assessed using near-UV CD spectrometry. CD spectrometry measures changes in the environment of molecules; an asymmetric molecule, therefore, produces a stronger near-UV CD signal than a symmetric molecule exposed to an isotropic environment. Generally, an unfolded protein does not give any detectable near-UV CD signal; however, a folded protein with amino acid side chains immobilized in an asymmetric environment yields a detectable near-UV CD signal (Cavicchioli et al., 2006; Woody, 1996).

Ctr3 does not have any Trp or Cys residues, but has one Tyr and four Phe residues (Figure 3.6C). Near-UV CD spectrometry was therefore used to assess whether purified recombinant Ctr3 was folded (Chapter 3). Sharp peaks were observed at ~ 258 nm for Ctr3 (RNA-bound and RNA-free), a typical near-UV signal for Phe residues (Figure 4.2A,C), suggesting that the purified Ctr3 was folded. After thermal unfolding, peaks from the first and second scans corresponding to Phe residues were almost superimposable for RNA-bound Ctr3 (RNA from M. burtonii) (Figure 4.2A); but for RNA-free Ctr3, the first thermal melt profile differed from the subsequent unfolding-refolding scans (Figure 4.2C). After the first unfolding-refolding cycle, the near-UV spectrum corresponding to the Phe residues changed and then remained essentially the same for subsequent cycles (Figure 4.2C). This result indicated that the environment around the Phe residues changed once the protein had been unfolded, and then remained unchanged (Figure 4.2C).

The presence of RNA can substantially influence spectrometric measurements at ~ 260 nm. Therefore, in experiments where RNA was bound to proteins, the protein signal at ~ 260 nm was difficult to distinguish from the signal from RNA. In near-UV CD of RNA-bound Ctr3, signals obtained around 300 nm did not reach baseline (Figure 4.2A). In addition, thermal unfolding of RNA-bound Ctr3 (monitored by near-UV CD signal at 258 nm at 4°C) resulted in a scattered/noisy ellipticity trace in the temperature range of 4°C to 80°C (Figure 4.2B). However, in near-UV CD of RNA-free Ctr3, signals obtained around 300 nm reached the baseline (Figure 4.2C), and samples that were cooled and unfolded again after complete unfolding showed well-defined reversible unfolding across the temperature range (4°C to 80°C) (Figure 4.2D). These results indicated that the bound RNA is likely to interfere with near-UV CD assessment at ~ 260 nm, which is also the absorbance wavelength of RNA. Therefore, information derived from near-UV CD of RNA-bound Ctr3 was not considered reliable. The Tm determined by near-UV CD spectrometry of RNA-free Ctr3 was almost identical to the Tm determined by far-UV CD spectrometry (of RNA-bound and RNA-free Ctr3), i.e. ~ 50°C.

Figure 4.2. Near-UV CD spectrometry analyses of Ctr3. (A) and (B) represent profiles for RNA-bound Ctr3; and (C) and (D) represent profiles for RNA-free Ctr3. (A) and (C) Near-UV CD spectra at 4°C showing the typical phenylalanine peak at 258 nm, indicating both forms of Ctr3 are folded. The spectra obtained at 4°C from samples that were previously unfolded showed that Ctr3 refolds reversibly. (B) RNA-bound Ctr3 gave noisy near-UV CD signals, possibly due to interference caused by RNA at ~ 260 nm. (D) Thermal unfolding as monitored by near-UV CD at 258 nm showing definite reversible unfolding of RNA-free Ctr3 across the temperature range (4°C to 80°C).

4.3.3 Intrinsic fluorescence spectrometric analysis of Ctr3

All aromatic amino acids absorb light in the UV range. In intrinsic fluorescence spectrometry, aromatic side chains are excited at a wavelength which corresponds to their λmax. When excited electrons return to the ground state, they emit detectable fluorescence, and the emission wavelength of the sample is always longer than the absorption/excitation wavelength (Cavicchioli et al., 2006). Intrinsic fluorescence spectrometry is sensitive to changes in the environment around aromatic amino acid residues, such as a transfer between polar and non-polar protein regions. An intrinsic fluorescence emission spectrum can be used to assess conformational changes in a protein, such as those due to structural folding/unfolding (Cavicchioli et al., 2006).

Among the three aromatic amino acid residues, Trp has the highest quantum yield due to its strong absorption (Cavicchioli et al., 2006). However, Ctr3 does not have any Trp residues, but contains a single Tyr residue (Figure 3.6C). When samples were excited at 280 nm (λmax of Tyr), Ctr3 (RNA-bound and RNA-free) produced emission peaks at ~ 305 nm (Figure 4.3A,C). The first thermal melt profile differed from all the subsequent rounds of unfolding-refolding (a property also observed for near-UV CD spectrometric analyses of RNA-free Ctr3; Figure 4.2C). From the spectra (Figure 4.3A,C) it appeared that the Tyr residue did not return to its original environment after the first round of unfolding-refolding; however, once refolded (after the second round), the Tyr residue seemed to be retained in its new environment. Additionally, the spectra (Figure 4.3A,C) showed slight red-shifts for both forms of Ctr3 (with and without RNA), accompanied by a noticeable increase in fluorescence intensity of the Tyr residue. This increased fluorescence might have originated from a lack of H-bonds, which the Tyr residue is unable to form, with concomitant transfer of energy to other amino acids (Siddiqui and Cavicchioli, 2006).

Thermal unfolding of Ctr3 was monitored at an emission wavelength of 305 nm at 4°C (Figure 4.3B,D). Monotonic decreases in intensity vs temperature were obtained, which was suggestive of a temperature-dependent decrease in intensity as Ctr3 unfolded.

Structural analysis of the homology model of Ctr3 showed that the sole Tyr residue is exposed on the surface (section 4.3.8). The fluorescence intensity of a surface-exposed aromatic amino acid of a protein generally exhibits a gradual decrease with increasing temperature (Schmid, 1990). Therefore, the melting profiles obtained by intrinsic fluorescence spectrometry are consistent with the Tyr residue of Ctr3 being surface exposed. The intrinsic fluorescence spectra of Ctr3 (RNA-bound and RNA-free) were characteristic of folded proteins (Figure 4.3A,C). However, no other useful information, such as Tm or reversibility, could be obtained from the thermal melt plots (Figure 4.3B,D).

Figure 4.3. Intrinsic fluorescence spectrometric analyses of Ctr3. (A) and (B) represent profiles for RNA-bound Ctr3; and (C) and (D) represent profiles for RNA-free Ctr3. (A) and (C) Intrinsic fluorescence spectra at 4°C showing the tyrosine peak at 303 nm. (B) and (D) Thermal unfolding as monitored by intrinsic fluorescence spectrometry at 303 nm across the temperature range (4°C to 95°C).

4.3.4 Extrinsic fluorescence spectrometric analysis of Ctr3

Extrinsic fluorescent dyes can be used to characterize hydrophobic sites of a protein. For example, the strong hydrophobicity of ANS, a large negatively charged aromatic molecule, makes it an excellent probe for studying the hydrophobic nature of a protein (Poklar et al., 1997). ANS exhibits high fluorescence intensity when bound to hydrophobic binding sites of the native protein. As the protein thermally unfolds, a blue shift in the ANS fluorescence emission maximum is detected; in addition, an increase in intensity demonstrates exposure of hydrophobic sites of the protein (Hawe et al., 2008). Blue shifts of ANS fluorescence are usually dependent on the dielectric constant of the solvent (Stryer, 1965). A decrease in dielectric constant corresponds to an increase in quantum yield and results in a blue shift of ANS fluorescence, i.e. the emission maximum is shifted to shorter wavelengths (Hawe et al., 2008).

In order to further assess the structural stability of Ctr3 (RNA-bound and RNA-free), temperature-dependent unfolding was monitored using extrinsic fluorescence spectrometry. ANS was used as the fluorescent probe. Maximum emissions at 480 nm and the wavelength maximum (λmax) obtained at various temperatures (refer to Figure 4.4 legend) were recorded for each experiment; both parameters (emission intensity and λmax) were then plotted on the same graph against temperature (Figure 4.5). The intersecting point corresponded to the protein's Tm (Figure 4.5). These plots show that ANS molecules begin interacting with the hydrophobic residues of Ctr3 at ~ 40°C, and by ~ 60°C most of the hydrophobic residues of Ctr3 had been exposed to ANS. The Tm values determined for both forms of Ctr3 (RNA-bound and RNA-free) were similar (~ 50°C). The maximum emission intensities recorded at 480 nm for RNA-bound Ctr3 appeared to be lower than the intensities recorded for RNA-free Ctr3 (Figure 4.5), suggesting that RNA perhaps prevents ANS from interacting with the hydrophobic side chains of the protein upon their exposure.


Figure 4.4. Blue shift of emission maximum of ANS fluorescence. Plots represent (A) RNA-free Ctr3 and (B) RNA-bound Ctr3. Plots with different colours correspond to scans performed at 4, 7, 11.4, 15.8, 19.1, 23, 27, 31, 34, 37, 41, 43, 47, 51, 54.5, 60, 64, 70, 75 and 84.8°C in the range 420-620 nm. Scan numbers and colours are given next to the plots.


Figure 4.5. Fluorescence emission of ANS excited at 380 nm. Plots represent (A) RNA-free Ctr3 and (B) RNA-bound Ctr3. The left Y-axis depicts maximum emissions at 480 nm and the right Y-axis shows λmax corresponding to 4, 7, 11.4, 15.8, 19.1, 23, 27, 31, 34, 37, 41, 43, 47, 51, 54.5, 60, 64, 70, 75 and 84.8°C. Tm is marked accordingly with a drop-down arrow.

4.3.5 DSC analysis of Ctr3

DSC measures the molar heat capacity (Cp) of a protein as a function of temperature (Freire, 1995). Cp refers to the amount of heat required to raise the temperature of one mole of a substance by 1°C. Measuring the Cp of a protein as a function of temperature provides useful information about the unfolding and stability of the protein, along with relevant kinetic or thermodynamic parameters depending on whether the protein unfolds irreversibly or reversibly, respectively (Freire et al., 1990; Cavicchioli et al., 2005). Additionally, DSC can be used for determining the stabilization energy (ΔG) of reversibly unfolded proteins (Freire, 1995). From the plot of Cp vs temperature, Tm (corresponding to the maximum Cp), ΔHcal (the protein's calorimetric enthalpy) and ΔHvH (van't Hoff enthalpy) can be determined. If the ratio between ΔHcal and ΔHvH is not equal to 1, the unfolding-refolding can be considered a non-two-state process (Cavicchioli et al., 2006).
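The ΔHcal/ΔHvH criterion can be illustrated numerically. The sketch below is not the NanoAnalyze procedure; it assumes a baseline-subtracted excess heat capacity curve, takes ΔHcal as the area under that curve, and uses the common peak-height estimate ΔHvH = 4RTm²Cp,max / ΔHcal. The thermogram is synthetic, generated from an ideal two-state model with no heat capacity change, and the parameter values (273 kJ/mol, Tm of 50°C) are borrowed only to give realistic magnitudes. All function names are hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def two_state_excess_cp(t_kelvin, dh, tm):
    """Excess Cp (J mol^-1 K^-1) of an ideal two-state transition (dCp = 0)."""
    k = math.exp(-dh / R * (1.0 / t_kelvin - 1.0 / tm))
    return dh ** 2 * k / (R * t_kelvin ** 2 * (1.0 + k) ** 2)


def analyse_thermogram(temps, cp_excess):
    """Return (Tm in K, dH_cal in J/mol, dH_vH in J/mol)."""
    # Tm is taken as the temperature of maximum excess heat capacity.
    i_max = max(range(len(cp_excess)), key=cp_excess.__getitem__)
    tm, cp_max = temps[i_max], cp_excess[i_max]
    # Calorimetric enthalpy: area under the curve (trapezoid rule).
    dh_cal = sum(0.5 * (cp_excess[i] + cp_excess[i + 1])
                 * (temps[i + 1] - temps[i])
                 for i in range(len(temps) - 1))
    # van't Hoff enthalpy estimated from the peak height.
    dh_vh = 4.0 * R * tm ** 2 * cp_max / dh_cal
    return tm, dh_cal, dh_vh


if __name__ == "__main__":
    # SYNTHETIC thermogram from 4 to 80 degrees C in 0.05 K steps.
    temps = [277.15 + 0.05 * i for i in range(1521)]
    cp = [two_state_excess_cp(t, dh=273e3, tm=323.15) for t in temps]
    tm, dh_cal, dh_vh = analyse_thermogram(temps, cp)
    print(f"Tm = {tm - 273.15:.1f} C, dHcal/dHvH = {dh_cal / dh_vh:.3f}")
```

For the ideal two-state input the ratio is close to 1; a ratio that departs clearly from 1 on real data would point to intermediates or oligomerisation, subject to the quality of the baseline subtraction.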

DSC was performed on RNA-bound and RNA-free Ctr3. The Tm determined for RNA-free Ctr3 was ~ 50°C (Figure 4.6). The first unfolding-refolding profile differed from subsequent rounds of unfolding-refolding. After the initial unfolding-refolding cycle, the melting profile and Tm changed (from ~ 50°C to ~ 52°C) and then remained essentially the same for subsequent unfolding-refolding cycles. The calorimetric ΔH calculated by NanoAnalyze software (TA Instruments) was 273 kJ/mol and 312 kJ/mol for the first and second conformation, respectively, demonstrating that the second conformation required more heat to unfold relative to the initial conformation (Figure 4.6). The Tm determined for RNA-bound Ctr3 (RNA from E. coli) was ~ 37°C (Figure 4.7). Tm shifted to ~ 42°C after the first round of unfolding-refolding; this shift of Tm after the first round of unfolding-refolding was similar to that observed for RNA-free Ctr3 (Figure 4.6). The profiles of RNA-bound Ctr3 (RNA from E. coli) then remained essentially the same for subsequent cycles (Figure 4.7A). The changes in enthalpy (ΔH) obtained from the first and second melt profiles were 206 kJ/mol and 272 kJ/mol, respectively (Figure 4.7B), again showing that the second conformation required more heat to unfold relative to the initial conformation. Interestingly, reintroduction of RNA (from M. burtonii) to RNA-free Ctr3 (see Chapter 6) resulted in the recovery of the lower Tm of ~ 37°C (Figure 4.7C,D), illustrating that RNA may have caused the reduction in Tm from ~ 50°C to ~ 37°C. After the initial unfolding-refolding cycle of RNA-bound Ctr3 (RNA from M. burtonii), the melting profile and Tm changed (from ~ 37°C to ~ 42°C) and then remained essentially the same for subsequent unfolding-refolding cycles (Figure 4.7C). The data from DSC indicate that RNA is likely to have a strong influence on the unfolding and stability of Ctr3.


Figure 4.6. Stability and unfolding characteristic of RNA-free Ctr3. DSC plots for Ctr3: (A) rounds of melting illustrating reversible unfolding of the possible second conformation of Ctr3; (B) baseline-subtracted curves illustrating the shift of Tm from ~ 50°C to ~ 52°C that resulted from the first round of unfolding-refolding.

Figure 4.7. Stability and unfolding pattern of RNA-bound Ctr3. DSC plots: (A) and (B) Ctr3 bound to E. coli RNA; and (C) and (D) Ctr3 bound to M. burtonii RNA. These plots illustrate the behaviour of Tm upon reintroduction of RNA to the RNA-free state of Ctr3 and are indicative of RNA affecting the stability of Ctr3. Plots (B) and (D) illustrate baseline-subtracted curves of Ctr3.

4.3.6 TUG-GE analysis of Ctr3

TUG-GE is used to assess the unfolding/refolding transitions of a protein induced by a chemical denaturant such as urea (Cavicchioli et al., 2006). By applying a urea gradient, TUG-GE can be used to study reversible/irreversible unfolding with concomitant determination of the thermodynamic (for reversibly unfolded proteins) or kinetic (for irreversibly unfolded proteins) stability of a protein (Siddiqui et al., 2005; Cavicchioli et al., 2006; Giaquinto et al., 2007). A folded protein has a smaller hydrodynamic volume than an unfolded protein. Therefore, during electrophoresis, the unfolded protein migrates more slowly than the folded protein. Typically in TUG-GE, the urea gradient is applied perpendicular to the direction of electrophoresis. The folded and unfolded forms separate across the gradient while the proteins progressively move in the direction of electrophoresis (Siddiqui et al., 2005; Cavicchioli et al., 2006; Giaquinto et al., 2007). Due to electrophoretic separation, this method allows simultaneous analysis of various structural and conformational states of a protein (e.g. determination of different intermediate folded states of a protein).

The unfolding and refolding transitions of Ctr3 were assessed using TUG-GE at a constant temperature (7°C) to approximately match the native temperature of Ctr3 (≤4°C). The transition curves of RNA-bound and RNA-free Ctr3 were monitored in the presence of a transverse urea gradient (0 - 6.64 M) (Figure 4.8). Almost identical transition profiles were obtained for both unfolding and refolding processes of RNA-bound and RNA-free Ctr3 (Figure 4.8). It is likely that low molecular weight RNA separated from the protein early during electrophoresis, thereby resulting in identical profiles for RNA-bound and RNA-free Ctr3 (Figure 4.8). This result suggests that RNA, when bound to Ctr3, has a strong influence on the unfolding and stability of the protein, as revealed by spectroscopic and calorimetric methods. Both curves illustrating unfolding and refolding of Ctr3 were continuous, without any break between the completely folded and completely unfolded parts of the curves (Figure 4.8A vs B; C vs D). Furthermore, both profiles were superimposable (Figure 4.8A vs B; C vs D), indicating a two-state reversible unfolding of Ctr3 (Cavicchioli et al., 2006). It is important to note that protein variants (structural and/or conformational) were not observed, as indicated by the clear absence of multiple bands in all TUG gels (Giaquinto et al., 2007) (Figure 4.8).

Figure 4.8. TUG-GE of Ctr3. (A) Refolding and (B) unfolding of RNA-bound Ctr3. (C) Refolding and (D) unfolding of RNA-free Ctr3. The urea gradient (0 – 6.64 M) is counterbalanced by the inverse acrylamide gradient (15 – 11%) as indicated in the figure. The graphical extrapolation of lines to obtain values for ΔG and [urea]1/2 is shown accordingly. Fu, fraction of unfolded molecules; ΔG, free-energy difference between folded and unfolded states; [urea]1/2, urea concentration at the mid-point of the unfolding transition; R, universal gas constant (8.314 J mol⁻¹ K⁻¹); T, absolute temperature.

4.3.7 Thermodynamic parameters of Ctr3

Since the possibility of bound RNA alone influencing DSC thermograms or spectrometric analyses (CD and fluorescence) was not assessed, data obtained for RNA-free Ctr3 were used to calculate thermodynamic parameters of Ctr3: secondary (far-UV CD) vs tertiary (DSC) structure. The unfolding/refolding profile of RNA-free Ctr3 appeared to be reversible, as shown by spectrometric (far- and near-UV CD) and electrophoretic (TUG-GE) methods; hence equilibrium thermodynamics was applied to measure the conformational stability of Ctr3.

Fraction of unfolded proteins (fu), equilibrium constant (K) and Gibbs free energy (ΔG) at any given temperature were calculated from the first melt of RNA-free Ctr3 from the far-UV CD spectrum (Figure 4.1D). The following equations were used (Cavicchioli et al., 2006):

fU = (yF - y) / (yF - yU) (1)

K = fU / (1 - fU) (2)

ΔG = -RT ln K (3)

At any temperature, fF + fU = 1, where fF and fU represent the fractions of folded and unfolded protein, respectively. Values on the y-axis (Figure 4.1D), characteristic of the folded-unfolded state of the protein, are the ellipticity measured in the far-UV region (Kelly et al., 2005). Values of yF and yU in the pre-transition and post-transition regions, respectively, were determined by extrapolating lines to the y-intercepts. The calculation of fU, K and ΔG is illustrated in Table 4.1. From the plot of ΔG vs temperature, it can be seen that ΔG varies linearly with temperature (Figure 4.9). The magnitude of the slope of the ΔG vs temperature plot gives ΔSm, the entropy change at Tm (Cavicchioli et al., 2006).
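Equations (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis used in the thesis: the function names are hypothetical, the baselines yF and yU are passed in as constants (in practice they come from the extrapolated pre- and post-transition lines), the standard two-state form of equation (1) with (yF - yU) in the denominator is assumed, and the numerical inputs are invented for demonstration only.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def fraction_unfolded(y, y_folded, y_unfolded):
    """Equation (1): fraction of unfolded protein from the observed signal y."""
    return (y_folded - y) / (y_folded - y_unfolded)


def equilibrium_constant(f_u):
    """Equation (2): K = fU / (1 - fU), valid for 0 < fU < 1."""
    return f_u / (1.0 - f_u)


def delta_g(k, temp_kelvin):
    """Equation (3): free energy of unfolding in J mol^-1."""
    return -R * temp_kelvin * math.log(k)


# Hypothetical baselines and signal (illustrative values, not thesis data).
f_u = fraction_unfolded(y=4.4, y_folded=4.0, y_unfolded=5.2)
k = equilibrium_constant(f_u)
dg = delta_g(k, 273.15 + 50.0)
print(f"fU = {f_u:.3f}, K = {k:.3f}, dG = {dg / 1000:.2f} kJ/mol")
```

Keeping the three steps as separate functions mirrors the order of the equations and makes it easy to substitute temperature-dependent baselines if they are available.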


Table 4.1. Analysis of thermal unfolding of secondary structure of Ctr3. Molar ellipticity values measured from far-UV CD spectrometry are presented; T represents temperature, fu represents the fraction of unfolded protein, K represents the equilibrium constant; and ΔG, ΔSm and ΔHm are free energy, entropy and enthalpy change, respectively, at Tm. ΔG and ΔHm are presented in kJ mol−1 and ΔSm is presented as J mol−1 K−1.

T (°C)   Molar ellipticity   fu      K       ΔG      ΔSm   ΔHm
45       4.1                 0.111   0.124   4.76
46       4.2                 0.143   0.167   4.08
47       4.2                 0.176   0.214   3.51
48       4.3                 0.235   0.307   2.69
49       4.4                 0.316   0.462   1.76
50       4.4                 0.333   0.499   1.58    817   267
51       4.6                 0.538   1.165   -0.35
52       4.7                 0.600   1.500   -0.92

Figure 4.9. ΔG for Ctr3 unfolding as a function of temperature (45 to 52°C). Values of ΔG were calculated from equation (3). The slope of the plot, which represents ΔSm, is 816.95 J mol−1 K−1.
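The slope described for Figure 4.9 can be checked with an ordinary least-squares fit of the ΔG values listed in Table 4.1. The sketch below is illustrative only; the helper name linear_fit is hypothetical and the fitting method actually used for the figure is not stated. Because ΔG is tabulated in kJ mol−1, the fitted slope is multiplied by 1000 to express ΔSm in J mol−1 K−1, and the sign is reversed because ΔG decreases as temperature rises.

```python
# Temperatures (deg C) and dG values (kJ mol^-1) as listed in Table 4.1.
temps_c = [45, 46, 47, 48, 49, 50, 51, 52]
dg_kj = [4.76, 4.08, 3.51, 2.69, 1.76, 1.58, -0.35, -0.92]


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = linear_fit(temps_c, dg_kj)
delta_s_m = -slope * 1000.0      # J mol^-1 K^-1 (a 1 deg C step equals 1 K)
t_zero_c = -intercept / slope    # temperature at which the fitted dG is zero
print(f"dSm = {delta_s_m:.0f} J/mol/K, fitted dG = 0 at {t_zero_c:.1f} deg C")
```

With the tabulated values the fit returns a ΔSm of about 817 J mol−1 K−1, in line with Table 4.1.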

In order to assess the thermodynamic parameters of the tertiary structure of Ctr3, ΔHvH and ΔHcal were determined from the DSC thermogram of the first melt of RNA-free Ctr3 (Figure 4.10A). ΔHcal was recorded directly from the DSC thermogram and ΔHvH was calculated from the plot of ln Keq vs 1/T, where the slope of the line = -ΔHvH/R; Keq is the equilibrium constant, T is absolute temperature and R is the gas constant (8.31 J mol−1 K−1) (Freire, 1995) (Figure 4.10B). ln Keq values at any temperature were obtained from the DSC thermogram of the first melt, where Keq is the ratio between the area under the curve for denatured protein and the area under the curve for the native protein (Figure 4.10A). The ratio between ΔHvH (271 kJ/mol) and ΔHcal (273 kJ/mol) was ~ 1, implying that the unfolding-refolding of Ctr3 was a two-state process. ΔSm at Tm (50°C) and ΔG at any temperature were also calculated using the same thermogram (Figure 4.6B). ΔSm and ΔCp were calculated according to the following equations (Pace et al., 1990; Schmid, 1990; D'amico et al., 2003b; Struvay and Feller, 2012):

ΔG = ΔH - TΔS (4)

ΔHm = Tm × ΔSm (5)

ΔCp = 12 cal mol−1 K−1 × 70 (number of amino acids in Ctr3) (6)
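The van't Hoff estimate described above (slope of ln Keq vs 1/T equal to -ΔHvH/R) and the ΔCp estimate of equation (6) can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the ln K values are synthetic, generated from a known enthalpy under the assumption of a temperature-independent ΔH, purely to show that the slope recovers the input. They are not the measured DSC data.

```python
R = 8.314            # gas constant, J mol^-1 K^-1
CAL_TO_J = 4.184     # thermochemical calorie in joules


def vant_hoff_enthalpy(temps_kelvin, ln_k):
    """Estimate dH_vH (J mol^-1) from the least-squares slope of ln K vs 1/T."""
    xs = [1.0 / t for t in temps_kelvin]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ln_k) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ln_k))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope * R  # because slope = -dH_vH / R


def delta_cp(n_residues, per_residue_cal=12.0):
    """Equation (6): dCp in cal mol^-1 K^-1 from the number of residues."""
    return per_residue_cal * n_residues


# Synthetic ln K values generated from a known enthalpy (not measured data).
true_dh = 271e3      # J mol^-1
t_m = 323.15         # K
temps = [t_m + d for d in (-3, -2, -1, 0, 1, 2, 3)]
ln_k = [-(true_dh / R) * (1.0 / t - 1.0 / t_m) for t in temps]

dh_vh = vant_hoff_enthalpy(temps, ln_k)
dcp_cal = delta_cp(70)
print(f"dH_vH = {dh_vh / 1000:.0f} kJ/mol")
print(f"dCp = {dcp_cal:.0f} cal/mol/K = {dcp_cal * CAL_TO_J:.0f} J/mol/K")
```

Converting ΔCp to joules before it is combined with enthalpies in kJ or J avoids mixing calorie- and joule-based quantities in later calculations.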

ΔG values at temperatures from 4oC to 50oC (at intervals of ~ 5oC) were determined using the following formula (Cavicchioli et al., 2006):

ΔG(T) = ΔHm (1 - T/Tm) - ΔCp [(Tm - T) + T ln(T/Tm)] (7)

A plot of ΔG(T) vs temperature was then used to assess the stability of Ctr3 (Figure 4.11). The stability curve was a parabola (data points not shown) that intersected the x-axis (temperature) at two points; the right intercept corresponded to the Tm (~ 50°C) of Ctr3 (where ΔG = 0) and the left intercept (parabola extrapolated below 4°C) corresponded to cold denaturation of Ctr3, which was found to be ~ -22°C (Cavicchioli et al., 2006) (Figure 4.11). At the maximum ΔG (~ 8 kJ/mol), Tmax was ~ 15°C (Figure 4.11). For all calculations of ΔSm and ΔG(T), the ΔHcal value was used. The calculated values of ΔHvH, ΔHcal, ΔCp and ΔSm are presented in Table 4.2.
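The stability curve of equation (7) can be explored numerically with the sketch below. Several inputs are assumptions made here for illustration: ΔHm is taken as Tm × ΔSm (equation 5) using the ΔSm listed in Table 4.2, ΔCp from equation (6) is converted to joules, and the helper names (stability, bisect_root) are hypothetical. With these assumed inputs the sketch returns values broadly in line with those quoted above, but it is not expected to reproduce Figure 4.11 exactly, since the precise inputs behind the figure are not fully specified.

```python
import math

CAL_TO_J = 4.184                 # thermochemical calorie in joules
T_M = 273.15 + 50.0              # Tm in K (about 50 deg C)
DELTA_S_M = 419.0                # J mol^-1 K^-1, as listed in Table 4.2
DELTA_H_M = T_M * DELTA_S_M      # equation (5), J mol^-1 (assumption of this sketch)
DELTA_CP = 12.0 * 70 * CAL_TO_J  # equation (6) converted to J mol^-1 K^-1


def stability(temp_kelvin):
    """Equation (7): dG of unfolding (J mol^-1) at a given temperature."""
    return (DELTA_H_M * (1.0 - temp_kelvin / T_M)
            - DELTA_CP * ((T_M - temp_kelvin)
                          + temp_kelvin * math.log(temp_kelvin / T_M)))


def bisect_root(func, low, high, tol=1e-6):
    """Find a root in [low, high]; func(low) and func(high) must differ in sign."""
    f_low = func(low)
    while high - low > tol:
        mid = 0.5 * (low + high)
        f_mid = func(mid)
        if (f_mid > 0) == (f_low > 0):
            low, f_low = mid, f_mid
        else:
            high = mid
    return 0.5 * (low + high)


# Coarse scan (0.01 K steps) for the temperature of maximum stability.
grid = [200.0 + 0.01 * i for i in range(int((T_M - 200.0) / 0.01) + 1)]
t_max = max(grid, key=stability)
t_cold = bisect_root(stability, 200.0, t_max)  # cold-denaturation temperature

print(f"Tmax = {t_max - 273.15:.1f} deg C, dGmax = {stability(t_max) / 1000:.1f} kJ/mol")
print(f"cold denaturation near {t_cold - 273.15:.1f} deg C")
print(f"dG at 7 deg C = {stability(280.15) / 1000:.1f} kJ/mol")
```

A grid scan followed by bisection is used instead of a closed-form solution so that the same code works unchanged if a different ΔHm or ΔCp is substituted.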

Table 4.2. Analysis of thermal unfolding of tertiary structure of Ctr3. ΔHvH and ΔHcal are presented in kJ mol−1, ΔCp is presented as cal mol−1 K−1 and ΔSm is presented as J mol−1 K−1.

ΔHvH   ΔHcal   ΔCp   ΔSm
271    273     840   419


Figure 4.10. ΔHcal and ΔHvH determined from the DSC thermogram. (A) ΔHcal recorded directly off DSC (first melt of RNA-free Ctr3); (B) ΔHvH calculated from the DSC thermogram. ΔHcal and ΔHvH are indicated accordingly.

Figure 4.11. Plot of ΔG(T) versus temperature. The parabola cuts the x-axis (temperature) at two points; the right intercept corresponds to the Tm (~ 50°C) of Ctr3 (where ΔG = 0) and the left intercept corresponds to cold denaturation of Ctr3, which is ~ -22°C. At 7°C (marked with dotted lines), the ΔG value obtained from this plot matches the ΔG calculated from TUG-GE analysis.


The conformational stability of RNA-free Ctr3 was also assessed using the TUG gels (Figure 4.8C,D). The urea gradient gel profiles were used to obtain information regarding the thermodynamics of the unfolding transition of the tertiary structure of Ctr3. ΔG and [urea]1/2 were calculated at 7°C (the temperature at which TUG-GE was performed). The free energy of unfolding at 0 M urea was determined by extrapolating the line defining the transition area (Figure 4.8C,D) and using the following formula (Cavicchioli et al., 2006):

ΔG = - (y-axis value) x RT (8)

where R is the gas constant (8.31 J mol−1 K−1) and T is absolute temperature.

[urea]1/2 was determined in the region where ΔG = 0; this was achieved by graphically extrapolating lines across both axes (Figure 4.8C,D). Calculated values are as follows: ΔG = 7.7 kJ mol−1; [urea]1/2 = 3.25 M. The ΔG obtained from TUG-GE at 7°C was almost identical to the ΔG obtained from DSC at the same temperature (Figure 4.11).
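The arithmetic behind equation (8) can be sketched as below. Two points are assumptions of this illustration rather than statements from the text: the y-axis value is treated as ln K extrapolated to 0 M urea (by analogy with equation 3), and ΔG is taken to fall linearly with urea concentration to zero at [urea]1/2, which gives a slope often called the m value. The function names are hypothetical; the ΔG and [urea]1/2 inputs are the values reported in the text.

```python
R = 8.314  # gas constant, J mol^-1 K^-1


def dg_from_intercept(ln_k_intercept, temp_kelvin):
    """Equation (8): dG (J mol^-1) from the extrapolated y-axis value at 0 M urea."""
    return -ln_k_intercept * R * temp_kelvin


def m_value(dg_water, urea_half):
    """Slope of a linear dG vs [urea] relationship, assuming dG = 0 at [urea]1/2."""
    return dg_water / urea_half


temp = 273.15 + 7.0   # TUG-GE temperature in K
dg_water = 7.7e3      # J mol^-1, value reported in the text
urea_half = 3.25      # M, value reported in the text

# y-axis intercept (ln K at 0 M urea) implied by the reported dG.
ln_k0 = -dg_water / (R * temp)
m = m_value(dg_water, urea_half)
print(f"ln K at 0 M urea = {ln_k0:.2f}, m = {m / 1000:.2f} kJ/mol/M")
```

Working backwards from the reported ΔG in this way only illustrates how the quantities relate; the intercept itself was read graphically from the gels.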

4.3.8 Structural analysis of Ctr3

The protein with a crystal structure that has the highest amino acid identity to Ctr3 is RumA (RlmD) from E. coli, which catalyses the transfer of a methyl group from S-adenosylmethionine to yield 5-methyluridine at a specific uridine residue in 23S ribosomal RNA (Lee et al., 2004; Lee et al., 2005). Ctr3 has ~ 25% amino acid identity with the RumA-TRAM domain (Figure 4.12A) and a protein homology model shows five β-strands arranged in a β-barrel structure (Figure 4.12B) with a similar representation of aromatic amino acids and residues involved in hydrogen bonding on the RNA binding surface (Figure 4.12A,C). Ctr3 has four Phe residues (Phe33, Phe36, Phe57 and Phe59) comprising the predicted RNA binding surface: Phe36 and Phe57 in Ctr3 correspond to Phe40 and Tyr61, respectively, in the RumA (RlmD) crystal structure; Phe33 and Phe59 in Ctr3 correspond to Lys37 and Arg63, respectively, in the RumA (RlmD) crystal structure. All four Ctr3 Phe side chains correspond to side chains in the RumA:RNA complex that are involved in direct contacts with RNA bases, suggesting that these aromatic residues in Ctr3 play a similar role in interaction with RNA. From the homology model, the sole tyrosine of Ctr3 appears to be solvent exposed.

Ctr3 has ~ 68% sequence identity with MM_1357 (unknown function) from Methanosarcina mazei (Figure 4.13), and the characteristic β-barrel shape was essentially identical to that of the RumA-TRAM domain (Figure 4.12B). The residues identified as interacting with RNA in the RumA:RNA complex (through both aromatic stacking and hydrogen bonding interactions) appear to be largely conserved between Ctr3, MM_1357 and RumA (RlmD) (Figure 4.14). These structural similarities further suggest that the binding mechanism is likely to be similar in Ctr3, MM_1357 and RumA (RlmD).

(A)
β1 β2 β3 β4 β5
Ctr3 residue numbers shown: 23 33 36 57 59
RumA residue numbers shown: 27 37 40 61 63
M. bur Ctr3  MESTAPVEAGESYDVTIEDTAREGDGIARVSGFVIFVPNTSVGDEVTIKVTKVARKFAFGEVV--
RumA TRAM    RTTTR-----QIITVSVNDLDSFGQGVARHNGKTLFIPGLLPQENAEVTVTEDKKQYARAKVVR-
               :*      :   *:::*    *:*:** .* .:*:*.    ::. :.**:  :::* .:**

(B) (C) [Structure images. Labels shown: β1 to β5; RNA nucleotides A1953, G1954, U1955, U1956; residues F27, F33, F36, F40, F57, F59, Y61.]

Figure 4.12. Structural comparison of M. burtonii Ctr3 with E. coli RumA (RlmD). (A) Primary sequence alignment. Residues that reside in β-strands (underlined); residues involved in stacking interactions with RNA (grey highlight); residues involved in hydrogen bond interactions with RNA (green highlight). (B) Structural alignment of Ctr3 protein homology model (magenta) with RumA (RlmD) (turquoise) showing the five β-strands of the β-barrel. (C) Interaction between TRAM domains and RNA. Ctr3 (magenta); RumA (RlmD) (turquoise); RNA (red); stacking interaction between RNA and protein (orange arrows); also shown are aromatic amino acids involved (RumA-TRAM domain) or predicted to be involved (Ctr3) in RNA-binding.

(A)

Ctr3     ME----STAPVEAGESYDVTIEDTAREGDGIARVSGFVIFVPNTSVGDEVTIKVTKVARKFAFGEVV-  63
MM_1357  MFREESRSVPVEEGEVYDVTIQDIARQGDGIARIEGFVIFVPGTKVGDEVRIKVERVLPKFAFASVVE  68
         *      :.*** ** *****:* **:******:.*******.*.***** *** :*  ****..**
β1 β2 β3 β4 β5

(B)

Figure 4.13. Comparison between Ctr3 and TRAM domain protein structures from Archaea. (A) Alignment of Ctr3 and MM_1357 highlighting residues within β strands (underlined) and conserved Phe residues (red). (B) Alignment of MM_1357. Ctr3 and MM_1357 showed ~ 68% sequence identity. Shown are residues shared between the two TRAM proteins that are identical (magenta), residues that are different (green), and conserved phenylalanine residues (red).
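The ~ 68% identity quoted for Ctr3 and MM_1357 can be checked directly from the aligned sequences in Figure 4.13A. The sketch below is illustrative: the function name is hypothetical, and counting identities only over columns in which neither sequence has a gap is an assumed convention, since the normalisation used for the quoted value is not stated.

```python
# Aligned sequences as given in Figure 4.13A ('-' marks an alignment gap).
CTR3 = "ME----STAPVEAGESYDVTIEDTAREGDGIARVSGFVIFVPNTSVGDEVTIKVTKVARKFAFGEVV-"
MM_1357 = "MFREESRSVPVEEGEVYDVTIQDIARQGDGIARIEGFVIFVPGTKVGDEVRIKVERVLPKFAFASVVE"


def percent_identity(seq_a, seq_b, gap="-"):
    """Return (identical, compared, percent) over columns without a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    identical = sum(1 for a, b in pairs if a == b)
    return identical, len(pairs), 100.0 * identical / len(pairs)


identical, compared, pct = percent_identity(CTR3, MM_1357)
print(f"{identical}/{compared} identical positions = {pct:.1f}%")
```

Under this convention the two sequences share 43 of 63 compared positions, consistent with the ~ 68% stated in the caption; dividing by the full alignment length instead would give a lower figure.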

Ctr3     ---MES-TAPVEAGESYDVTIEDTAREGDGIARVSGFVIFVPNTSVGDEVTIKVTKVARKFAFGEVV-
MM_1357  MFREESRSVPVEEGEVYDVTIQDIARQGDGIARIEGFVIFVPGTKVGDEVRIKVERVLPKFAFASVVE
RumA     ------RTTTRQIITVSVNDLDSFGQGVARHNGKTLFIPGLLPQENAEVTVTEDKKQYARAKVVR
         . : *:::* *:*:** .* .:*:*. ::. :.* . ::* ..**

Figure 4.14. Alignment of Ctr3 from M. burtonii, MM_1357 from M. mazei and RumA-TRAM domain from E. coli.


4.4 Discussion

4.4.1 Reversibility of Ctr3

Generally, the unfolding of small, single domain proteins (<10 kDa) has been observed to follow a two-state mechanism (Pace et al., 1990). For example, cold shock protein (CspB) from Bacillus subtilis and Csp from the psychrophile M. frigidum unfold with a two-state mechanism (Garcia-Mira et al., 2004; Giaquinto et al., 2007). Ctr3 from M. burtonii, Csp from M. frigidum and CspB from B. subtilis are single domain proteins that adopt similar β-barrel structures and are of similar molecular weights (~ 8-10 kDa) (Schindelin et al., 1993; Giaquinto et al., 2007). A large multidomain protein, cold-adapted α-amylase from the Antarctic bacterium Pseudoalteromonas haloplanktis, has also been reported to exhibit reversible two-state unfolding (Feller et al., 1999; Siddiqui et al., 2005). Unfolding of many psychrophilic proteins is reversible, possibly due to fewer electrostatic interactions combined with weak hydrophobicity of the core clusters, which may prevent aggregation (Feller, 2013). RNA-free Ctr3 also exhibited two-state reversible unfolding, as determined by spectrometric, calorimetric and electrophoretic methods. The change in enthalpy (ΔH) determined by the van't Hoff relationship was almost identical to ΔH determined calorimetrically (DSC). Similarly, TUG-GE profiles illustrating unfolding and refolding of Ctr3 were super-imposable, indicating reversible two-state unfolding (Figure 4.8C vs D). The near-UV CD spectrum of RNA-free Ctr3 also showed well-defined reversible unfolding when samples were cooled and melted a second time. In addition, the secondary structure of Ctr3 probed by far-UV CD exhibited well-defined reversible unfolding (Figure 4.1D).

4.4.2 Thermodynamics of unfolding of secondary vs tertiary structures

Spectrometric (far-UV CD) and calorimetric (DSC) data were used to compare the thermodynamic transition parameters for unfolding of the secondary and tertiary structures of RNA-free Ctr3, respectively. The free-energy change (ΔG) of reversible protein unfolding is composed of enthalpic (ΔH) and entropic (ΔS) components according to the following relationship: ΔG = ΔH - TΔS (Struvay and Feller, 2012). Table 4.3 reports a gradual decrease in ΔG values towards zero for both secondary and tertiary structures as the temperature approaches Tm (~ 50°C). From 45 to 49°C, at each temperature, the secondary structure exhibits a higher free energy than the tertiary structure. This indicates that more energy is needed to disrupt secondary than tertiary structure and that the secondary structure is more stable than the tertiary structure. At Tm, ΔG = 0 (Cavicchioli et al., 2006), and the ΔHm values of the secondary and tertiary structures are 267 kJ mol−1 and 273 kJ mol−1, respectively, implying that the unfolding of secondary and tertiary structures is concurrent and that similar amounts of heat are needed to disrupt both types of structure. Although the enthalpy of unfolding is essentially the same at Tm, the entropy of unfolding at Tm varies considerably. The ΔSm values of the secondary and tertiary structures are 817 J mol−1 K−1 and 419 J mol−1 K−1, respectively. This indicates that at Tm, the tertiary structure is more ordered than the secondary structure. Hydrophobic molecules can interact with water molecules via van der Waals interactions and, as a result, water molecules can arrange around the hydrophobic molecules to form cage-like structures (Mikheev et al., 2007). This increases order in the surrounding water molecules and consequently decreases the entropy of the system (Mikheev et al., 2007). Therefore, in solution, hydrophobic regions of the tertiary structure of Ctr3 perhaps interact with surrounding water molecules, which results in lower entropy compared to the secondary structure.

Table 4.3. Analysis of free energy change (ΔG) of secondary vs tertiary structure of Ctr3. Equations 3 and 7 were used to calculate ΔG values for secondary and tertiary structures, respectively. Values of ΔG are presented in kJ mol−1.

T (°C)   Secondary structure ΔG   Tertiary structure ΔG
45       4.76                     2.12
46       4.08                     1.75
47       3.51                     1.36
48       2.69                     0.97
49       1.76                     0.56

Proteins from M. burtonii that have been reported to date exhibited irreversible unfolding: EF2 (Thomas and Cavicchioli, 2000), chaperonin (Pilak et al., 2011) and RNA polymerase E/F (DeFrancisci et al., 2011). Therefore, Ctr3 is the first protein from M. burtonii to be assessed that exhibits reversible unfolding. Csp from the psychrophile M. frigidum is a single domain protein that exhibits a reversible two-state unfolding mechanism (Giaquinto et al., 2007). ΔG and [urea]1/2 (calculated at 7°C) were 11.4 kJ mol−1 and 3.2 M, respectively, for M. frigidum Csp. Ctr3 from M. burtonii exhibited a similar [urea]1/2 (3.3 M); however, its ΔG value (7.7 kJ mol−1) was less than that of Csp from M. frigidum, indicating that M. frigidum Csp is more stable at 7°C than Ctr3. In TUG-GE analysis of M. frigidum Csp, two structural variants of the full length Csp were detected at high urea concentrations (Giaquinto et al., 2007). However, no indication of protein variants (structural and/or conformational) was observed in urea-induced unfolding of Ctr3 (Figure 4.8), suggesting the absence of post-translational modifications of recombinant Ctr3 in E. coli.

4.4.3 Possible effects of protein-RNA interactions on structural stability

To assess fundamental biophysical properties of Ctr3, CD, fluorescence spectroscopy, DSC and TUG-GE were performed on the RNA-bound and RNA-free (RNA-depleted) forms of the protein. While the near-UV spectra of the RNA-bound and RNA-free forms were characteristic of folded proteins, the intensity of the Phe peak (Ɛmax) at 258 nm was stronger for the bound form (Figure 4.15A). The intrinsic fluorescence spectra also had a more intense Tyr peak (Ɛmax) at 303 nm for the bound form, accompanied by a small red shift (Figure 4.15B). The increases in intensity are indicative of RNA binding altering the bonding pattern of Tyr and Phe residues, consistent with a change in the environment around these aromatic amino acids on the putative RNA binding surface (see Figure 4.12; also discussed in section 4.4.3). That RNA might influence protein structure was also supported by extrinsic fluorescence measurements, with a more intense peak occurring in the absence of RNA, indicative of greater penetration of ANS into the hydrophobic core (Figure 4.15C). The DSC thermograms exhibited a similar trend; comparing the first round of unfolding-refolding, Tm was ~ 13°C lower for RNA-bound (Tm 37°C; Figure 4.15D) than for RNA-free Ctr3 (Tm 50°C; Figure 4.15D), and calorimetric ΔH was higher for the RNA-free (273 kJ/mol) than for the RNA-bound (206 kJ/mol) form of Ctr3. TUG-GE performed on both RNA-bound and RNA-free Ctr3 provided some evidence on whether RNA binding affects the unfolding and stability of Ctr3. As TUG-GE removes highly negatively charged RNA from the protein during electrophoresis, both proteins were expected to show super-imposable protein curves. The results (Figure 4.8) clearly show superimposable curves, implying that the differences in the properties of the two forms (RNA-free and RNA-bound) are perhaps solely due to the binding of RNA.

Most ligands tend to bind tightly and specifically to the native states of their respective proteins, without having any affinity for the denatured states, thereby stabilizing the native state (Freire, 1990; Freire et al., 1995). For example, binding of DNA increases the Tm of methionine repressor protein (metJ) (from 54 to 59°C) (Cooper, 2000). In contrast, if the ligand has more affinity for the denatured state, then it stabilizes the ligand-denatured protein complex (Freire, 1990; Freire et al., 1995). As the presence of RNA destabilizes Ctr3 (as suggested by the various biophysical studies), RNA may have more affinity for the denatured state of the protein. An example of such stabilization of a ligand-denatured protein complex is observed for the GLUT-1 protein (Epand et al., 2001). The glucose transporter GLUT-1 is an integral membrane protein that facilitates passive glucose transport. Results from DSC studies indicate that the thermal denaturation temperature of GLUT-1 is significantly lower in the presence of ATP, showing that ATP destabilizes the native structure (Epand et al., 2001).

Figure 4.15. Biophysical analyses of Ctr3. (A) Near-UV CD showing Phe signal variation upon RNA binding; (B) Intrinsic fluorescence showing change in Tyr signal upon RNA binding; (C) Extrinsic fluorescence showing penetration of ANS into the hydrophobic core upon RNA binding. (D) Differential scanning calorimetry (DSC) showing shift of Tm upon RNA association.

4.4.4 Refolding of Ctr3 to a different conformation

DSC analysis of Ctr3 revealed that once unfolded and refolded, the protein did not return to its original conformation; rather, it appeared to attain a second conformation, as indicated by the shift of Tm (Figures 4.6A and 4.7A,C). This second conformation had a higher Tm than the first conformation and essentially exhibited complete reversibility, as opposed to the first conformation (Figures 4.6A and 4.7A). This feature was common to both RNA-bound and RNA-free Ctr3. For RNA-free Ctr3, the Tm values of the first and second conformations were ~ 50°C and ~ 52°C, respectively (Figure 4.6), whilst for RNA-bound Ctr3, they were ~ 37°C and ~ 42°C, respectively (Figure 4.7). The calorimetric ΔH was also higher for the second conformation than for the first (Figures 4.6 and 4.7). Interestingly, near-UV CD and intrinsic fluorescence spectrometric analyses also supported the possibility of a second conformation of Ctr3 (after the first round of unfolding-refolding). Near-UV CD analysis of RNA-free Ctr3 showed that the intensity of the Phe signal (at 258 nm) increased after the first round of unfolding-refolding (Figure 4.2C) and remained the same for the subsequent rounds of unfolding-refolding. The Tyr residue behaved similarly; the intrinsic fluorescence intensity (at 305 nm) increased after the first round of unfolding-refolding and remained the same for the subsequent rounds (Figure 4.3A,C); both RNA-bound and RNA-free Ctr3 exhibited a similar trend (Figure 4.3A,C). Structural changes can occur after the first round of unfolding/refolding due to irreversible covalent changes at high temperature that are likely to lead to a different conformation (Freire, 1990). Therefore, it is possible that Ctr3 undergoes such covalent changes at high temperatures and is unable to return to its original conformation.

4.4.5 Structural features of Ctr3

Despite the distinct primary sequences of TRAM domain proteins (e.g. Ctr3 and E. coli RumA) and CSD proteins (e.g. B. subtilis CspB and NAB1 from Chlamydomonas reinhardtii), all have a tertiary structure with a common OB-fold. The RNA binding surface of these OB-fold proteins is enriched with basic amino acids, providing an overall positive electrostatic potential which facilitates interactions with the negatively charged backbone of RNA molecules (Murzin, 1993; Schindelin et al., 1993; 1994; Schröder et al., 1995; Feng et al., 1998; Schindler et al., 1999; Lee et al., 2005; Sawyer et al., 2015). The aromatic amino acids afford direct RNA-protein interactions and are critical to protein stability and the specificity of nucleic acid recognition and binding (Murzin, 1993; Schindelin et al., 1993; 1994; Schröder et al., 1995; Hillier et al., 1998; Lee et al., 2005; Sawyer et al., 2015). For NAB1, which regulates translation of light harvesting complex mRNA in the alga C. reinhardtii (Sawyer et al., 2015), and for E. coli RumA (Lee et al., 2005), interaction with RNA induces a protein conformational change. For RumA (RlmD), when the TRAM domain interacts with an RNA hairpin structure, the free energy of binding enhances the catalytic efficiency of the methyltransferase (Lee et al., 2005). The inherent flexibility of the TRAM domain in RumA (RlmD) is important for enabling the conformational change necessary for binding to RNA (Lee et al., 2004), and such flexibility appears to be a general characteristic of OB-fold proteins (Mihailovich et al., 2010; Sawyer et al., 2015). Given the conserved characteristics of OB-fold proteins (Figure 4.12), the binding and flexibility properties of the RumA-TRAM domain are likely to be similar in Ctr3.

4.4.6 Conclusion

Ctr3 unfolds reversibly with a two-state mechanism (Tm of ~ 50°C) without the formation of any intermediate protein variants (structural and/or conformational). The unfolding of secondary and tertiary structures is concurrent. Thermal unfolding perhaps results in irreversible covalent changes in Ctr3 that likely lead to the formation of a different conformation. RNA substrates appear to have more affinity for the denatured state than for the native state of Ctr3. The homology model of Ctr3 indicated that the surface exposed Phe residues on the putative RNA-binding surface are likely to be involved in direct RNA-protein interaction. Overall, these characteristics provided valuable insights into the biophysical and structural features of RNA-bound vs RNA-free forms of Ctr3 from M. burtonii.


Chapter 5

Temperature-dependent gene expression and transcriptome analysis of M. burtonii

Statement

Sections of this chapter contain analyses performed by Dr Stefano Campanaro from the Department of Biology, University of Padua, Padova, Italy. Contributions by Stefano Campanaro that are presented in this chapter are as follows: transcriptome reconstruction, which includes identification of transcription start and termination sites, operon structures and promoters; global assembly and mapping of reads against the M. burtonii genome (strain DSM6242); and assessment of global transcript abundance at 4 vs 23°C from RNA-seq and microarray data.

My contributions are as follows: MFM media preparation and growing M. burtonii cultures at 4 and 23°C; RNA extraction; sample preparation for RNA-seq quality control; comprehensive analysis of ctr genes; promoter identification; comparative analysis of transcripts at 4 vs 23°C from RNA-seq and microarray data; and interpretation of RNA-seq data.

Abstract

M. burtonii dwells in the permanently cold environment (below 4°C) of Ace Lake in Antarctica; however, in the laboratory it is capable of growth at temperatures from -2°C to 28°C. In this study, RNA-seq was performed on RNA isolated from cells grown at low (4°C) and high (23°C) temperatures. All operons and differentially abundant genes (at 4 or 23°C) were consistent with previous expression profiling using microarrays. Genes that exhibited differential abundance of 2-fold or more in both RNA-seq and microarray experiments had good correspondence, suggesting that the expression data (4 vs 23°C) obtained from RNA-seq were reliable. Consensus promoters were frequently present within operons, indicative of independent transcription initiation from internal sites; a feature not previously identified in the microarray study. A bacterial cold box-like element was identified in the 5’-untranslated regions of ctr genes. Cold-box elements are present in the 5’-untranslated regions of genes encoding CspA, CspB and CsdA in E. coli and CspB and CspC in B. subtilis, and are thought to be involved in csp gene regulation at low temperatures. This chapter describes temperature-dependent (4 vs 23°C) transcriptional regulation of global gene expression and provides a general overview of the M. burtonii transcriptome obtained from RNA-seq.


5.1 Introduction

About 80% of the Earth’s surface is occupied by cold (below 5°C) ecosystems (Feller, 2003) in which Archaea are abundant and represent a significant proportion of the total biomass (Karner et al., 2001; Feller and Gerday, 2003; Keller and Zengler, 2004). Through evolution, cold-adapted or psychrophilic Archaea have evolved capacities to grow at low temperatures. For example, it was inferred that the increased synthesis of unsaturated lipids improves membrane fluidity in M. burtonii which consequently facilitates growth at low temperature (Nichols et al., 2004).

Genomic and proteomic analyses performed on psychrophilic Archaea have proven useful in providing a strong basis for understanding genomic features (such as DNA sequence and annotations of functional elements) and protein synthesis in the cell (Cavicchioli et al., 2006). However, only a limited number of genomic and proteomic studies have been performed on psychrophilic Archaea (Casanueva et al., 2010). For example, comparative proteomic analyses performed on M. burtonii growing at 4 and 23°C enabled the identification of 528 proteins at 4°C (Goodchild et al., 2004a; 2005). Combining genomic and proteomic data of M. burtonii allowed functions to be inferred for 55 proteins that were previously identified as hypothetical proteins (Saunders et al., 2005). Nevertheless, neither a genomic nor a proteomic study can provide an exhaustive list of all expressed genes or synthesized proteins in the cell (Campanaro et al., 2011). Transcriptome analysis, on the other hand, provides expression data for all RNA transcripts that are abundant under specific conditions (e.g. temperature), thus providing a link between genes and proteins via expression profiling. Only a few transcriptomic analyses have been performed on psychrophilic Archaea, for example transcriptomic characterization of Methanolobus psychrophilus (Chen et al., 2012; Li et al., 2015) and transcriptomic analysis of temperature-dependent global gene expression in M. burtonii (Campanaro et al., 2011).

Transcriptomic analysis allows i) detailed assessment of gene regulation (e.g. whether it is transcription or translation directed); ii) determination of the relationship between gene organization and transcript abundance; and iii) identification of operons (Campanaro et al., 2011). For example, an Agilent custom microarray developed to assess transcript abundance differences between M. burtonii cells growing at 4 and 23°C found that approximately 55% of genes were arranged in operons and that the abundance of mRNA increased with operon length (Campanaro et al., 2011). This relationship between operon length and mRNA abundance was also observed in a mesophilic archaeon, Halobacterium salinarum (Koide et al., 2009).

Deep transcriptome sequencing, or RNA-seq, is a modern technique that uses next-generation sequencing to study genome-wide RNA-based regulatory mechanisms (Wang et al., 2009). Briefly, a population of RNA is converted to cDNA with adapters attached to one or both ends. The cDNA fragments are then amplified and sequenced using high-throughput technology (such as Illumina-based technologies) to produce short sequences or reads (typically between 30-400 bp from Illumina platforms) (Wang et al., 2009). RNA-seq has been utilized successfully in many recent bacterial transcriptomic studies and has demonstrated its effectiveness in determining precise transcription start and termination sites and operon structures, and in broadly mapping the whole transcriptome of the organism (Passalacqua et al., 2009; Perkins et al., 2009; Yoder-Himes et al., 2009; Wurtzel et al., 2010). However, only a limited number of archaeal transcriptomes have been characterized using RNA-seq. A demonstration of the capacity of RNA-seq can be observed in the reconstruction of the primary transcriptome of a thermophilic archaeon, Sulfolobus solfataricus (Wurtzel et al., 2010). Compared to previous studies, which were based on cloning of size-selected individual transcripts (Tang et al., 2005; Zago et al., 2005), over 1000 transcription start sites and operon structures, over 300 non-coding RNA and 80 new protein-coding genes were accurately identified. In addition, over 5% of all genes were correctly annotated in the genome of S. solfataricus (Wurtzel et al., 2010). To date, the transcriptome of only one psychrophilic archaeon (M. psychrophilus) has been studied using the RNA-seq technique, in two separate studies reporting transcriptome characterization and global mapping of transcriptional start sites (Chen et al., 2012; Li et al., 2015).

An Agilent custom microarray, the same as the one used for transcriptome analysis of M. burtonii (Campanaro et al., 2011), was constructed to identify target RNA species bound by RpoEF from M. burtonii (DeFrancisci et al., 2011). Signals obtained from the RpoEF array were compared to genes that were found to be expressed at a significant level in M. burtonii (Campanaro et al., 2011). The ratio of fluorescence intensity for RpoEF-bound RNA to total RNA was determined; oligonucleotides exhibiting a fluorescence value over 1000 units for the bound RNA relative to the total RNA were considered as potential targets for RpoEF. The total RNA expression data were successfully used as a reference for the bound RNA to identify 588 differentially abundant potential RNA targets (DeFrancisci et al., 2011; Campanaro et al., 2011).

RNA-seq has a higher dynamic range than the microarray technique (Wang et al., 2009). Therefore, in order to achieve a more comprehensive assessment of the differential abundance of transcripts at 4 vs 23°C, including transcripts likely to have remained undetected in the microarray analysis (Campanaro et al., 2011), RNA-seq was performed on total RNA isolated from M. burtonii. Expression profiling using RNA-seq data allowed assessment of all 2493 genes (Allen et al., 2009), including all tRNA, ribosomal RNA and genes that were not included in the microarray analysis (a total of 2239 genes were covered) (Campanaro et al., 2011). Putative transcription start and termination sites, intergenic regions and operon structures were identified, which were not previously determined by the microarray analysis. The RNA-seq data for total RNA were also used as a reference for the bound RNA to identify differentially abundant potential RNA targets of Ctr3 (Chapter 6). This chapter provides an overview of the M. burtonii transcriptome reconstructed using RNA-seq data.


5.2 Materials and Methods

5.2.1 M. burtonii cultures and RNA extraction

MFM media, a modification of MGM (Franzmann et al., 1992), was prepared according to the protocol set by Thomas and Cavicchioli (2000). Distilled water (800 ml) was added to a 1 L Schott bottle and the following chemicals were then added: 0.335 g KCl, 6 g MgCl2.6H2O, 1 g MgSO4.7H2O, 0.25 g NH4Cl, 0.14 g CaCl2.2H2O, 23.32 g NaCl, 2 µg (NH4)2Fe(SO4)2, 0.14 g K2HPO4, 1 µg resazurin, 5 g trimethylamine HCl, 2 g yeast extract, 10 µl vitamin solution, 10 µl mineral solution and 0.1 g sodium acetate (C2H3NaO2). The vitamin and mineral stock solutions were prepared at 100X concentration as described by Goodchild (2004). The mineral salt stock solution (100X) contained 1.5 g nitrilotriacetic acid, 3 g MgSO4.7H2O, 0.5 g MnSO4.7H2O, 1 g NaCl, 0.1 g FeSO4.7H2O, 0.1 g CoSO4, 0.1 g CaCl2.2H2O, 1 g ZnSO4, 10 mg CuSO4.5H2O, 10 mg AlK(SO4)2, 10 mg H3BO3 and 10 mg Na2MoO4.2H2O; and the vitamin stock solution (100X) contained 2 mg biotin, 2 mg folic acid, 2 mg pyridoxine HCl, 10 mg thiamine HCl, 5 mg riboflavin, 5 mg nicotinic acid, 5 mg DL-Ca pantothenate, 0.1 mg vitamin B12, 5 mg p-aminobenzoic acid and 5 mg lipoic acid. The mineral and vitamin stock solutions were prepared as 1 L, filtered through a 0.22 µm filter unit (Millipore) and stored at -20°C. The components of the MFM media were then dissolved using a magnetic stirrer and the volume was made up to 1 L. The solution was then sparged with N2 for 15 min and again with an N2:CO2 mixture at a ratio of 80:20 for a further 15 min. Cysteine HCl was added (0.5 g dissolved in 1 ml H2O). The solution was left to stir while being bubbled with N2:CO2 for 20 min. The pH of the solution was adjusted to 6.8 using 32% hydrochloric acid before the media was transferred to serum bottles (approximately 100 ml per bottle). The media in the serum bottles was then sparged with N2:CO2 for 15 min, ensuring that the gas outlet was submerged in the liquid. The bottles were sealed with rubber stoppers to ensure anaerobic conditions, the head space was sparged and the media was autoclaved at 121°C for 15 min. The autoclaved media was then placed on a shaker overnight at room temperature to dissolve any precipitate. A solution of 2.5% (w/v) Na2S (1 ml) was injected into the media and left for at least 2 h on the shaker before inoculation. Separate cultures were grown at 4 and 23°C by inoculating 1:100 from cells grown under similar conditions. Cells were harvested at OD620 0.5-0.6 (late logarithmic phase) by centrifugation at 3200 x g for 35 min at the culture growth temperature. Total RNA was extracted using the TRIzol® Plus RNA Purification Kit (Life Technologies), RNA concentration was measured using a Nanodrop spectrometer (Thermo Scientific), and RNA integrity was analysed using a 2100 Bioanalyzer (Agilent).

5.2.2 RNA-seq and data analyses

Transcriptome experiments were performed based on methods described previously (Campanaro et al., 2011; DeFrancisci et al., 2011; Campanaro et al., 2012). Total RNA was extracted from six independent M. burtonii cultures (three grown at 4°C and three at 23°C) and sequenced on a HiSeq 2000 (Illumina) instrument at the Ramaciotti Centre for Genomics (UNSW, Australia), generating a total of 150 million non-strand-specific paired-end (PE) reads with an average read length of 100 bp. The sequenced reads were evaluated for quality using FastQC (version 0.11.2) (Andrews, 2010). The SolexaQA package (Cox et al., 2010) was used to calculate quality statistics and remove low quality reads. Raw PE reads were trimmed with BWA trimming mode at a threshold of Q13 (P = 0.05) using DynamicTrim of SolexaQA. Reads shorter than 25 bp were discarded using LengthSort. The quality-filtered reads were mapped and aligned against the M. burtonii genome (Genbank: NR_076415) using bowtie/2-2.0.0-beta7 (Langmead et al., 2009) with default parameters; only reads with ≤ 2 mismatches were considered. HTSeq-count (v0.5.3p9) (Anders et al., 2015) was used to generate a list of per-gene read counts for each sample. The BioConductor package edgeR v3.10.2 (Robinson et al., 2010) was used in the R programming environment to sort genes according to their differential abundance across temperatures (4 vs 23°C). Standard normalization procedures were performed on all data. A false discovery rate calculation was employed to filter quartiles of genes with lowest counts. A threshold log2 ratio of 1 (2-fold) was imposed on 4 vs 23°C total RNA to minimize background reads. All transcripts and reads that exhibited differential abundance were visualized and manipulated using IGV 2.3 (Robinson et al., 2011) and Artemis (Rutherford et al., 2000) software.
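The two numerical thresholds quoted above can be illustrated with a minimal Python sketch. It is not the edgeR workflow used in this study and does not reproduce edgeR normalization or statistics; the function names, the pseudocount and the abundance values are hypothetical and serve only to show how a Phred score of Q13 corresponds to an error probability of about 0.05, and how a log2 ratio of 1 corresponds to a 2-fold difference.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def classify_by_log2_ratio(mean_4c, mean_23c, threshold=1.0, pseudocount=1.0):
    """Label a gene from the log2 ratio of normalized abundance at 4 vs 23 degrees C.

    The pseudocount is an assumption of this sketch; it avoids division by zero
    for genes without reads in one condition.
    """
    log2_ratio = math.log2((mean_4c + pseudocount) / (mean_23c + pseudocount))
    if log2_ratio >= threshold:
        return "higher at 4C", log2_ratio
    if log2_ratio <= -threshold:
        return "higher at 23C", log2_ratio
    return "unchanged", log2_ratio


if __name__ == "__main__":
    # Q13 corresponds to an error probability close to 0.05
    print(round(phred_to_error_probability(13), 3))
    # Hypothetical normalized abundances for one gene at 4 and 23 degrees C
    print(classify_by_log2_ratio(399, 99))
```

A symmetric threshold is used here so that genes more abundant at either temperature are reported, matching the two classes listed in Table 5.1.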

The identification of 5’- and 3’-UTRs used approaches described previously (Campanaro et al., 2011; Campanaro et al., 2012). Briefly, custom PERL scripts were used to analyse both upstream and downstream coverage of open reading frames (ORFs) and to calculate the mean coverage of each ORF. The transcription start site (TSS) in the upstream region and the transcription termination site (TTS) in the downstream region were allocated to the precise positions where the coverage dropped to 1/20th of the calculated average value of the ORF. To avoid an overestimation of UTR size, upstream and downstream regions with coverage lower than 1/5th of the average coverage of the gene were not considered to be part of the gene’s UTR. The predicted TSS and TTS from three replicates at 4°C were compared, and those confirmed by at least two experiments with a length difference smaller than 10 bp were considered for further analysis. The same was performed on 23°C total RNA data. In all predictions, UTRs longer than 300 bases were discarded because the coverage profile of the TSS and TTS could overlap the coverage of adjacent genes on the same or opposite strands. All identified UTRs were manually assessed. Core promoters in the entire M. burtonii genome were predicted by analysing 90 bp upstream of the TSS and 10 bp inside the 5’-end of all transcripts. This analysis was performed using Motif Sampler (Thompson et al., 2007) and FIMO (Grant et al., 2011) software.
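The coverage-drop rule for locating a TSS or TTS can be sketched as follows. This is an illustrative Python reimplementation under stated assumptions, not the PERL scripts used in this study: the function names and the layout of the coverage lists are hypothetical, and the additional 1/5th coverage filter is omitted for brevity.

```python
def utr_length(flank_coverage, orf_coverage, drop_fraction=1 / 20, max_utr=300):
    """Estimate the UTR length on one side of an ORF.

    flank_coverage: per-base read coverage ordered outward from the ORF boundary
                    (index 0 is the base adjacent to the ORF).
    orf_coverage:   per-base read coverage inside the ORF.
    Returns the number of flanking bases before coverage first drops to
    drop_fraction of the mean ORF coverage, or None if no such drop is found
    or the resulting UTR is longer than max_utr bases.
    """
    cutoff = drop_fraction * sum(orf_coverage) / len(orf_coverage)
    for distance, depth in enumerate(flank_coverage):
        if depth <= cutoff:
            return distance if distance <= max_utr else None
    return None


def confirmed_by_replicates(lengths, max_difference=10, min_support=2):
    """True if at least min_support replicate predictions differ by < max_difference bases."""
    called = [x for x in lengths if x is not None]
    for i, a in enumerate(called):
        support = 1 + sum(
            1 for j, b in enumerate(called) if j != i and abs(a - b) < max_difference
        )
        if support >= min_support:
            return True
    return False


if __name__ == "__main__":
    orf = [100] * 50                        # hypothetical ORF with mean coverage 100
    upstream = [90, 80, 60, 30, 10, 4, 0]   # hypothetical coverage moving away from the ORF
    print(utr_length(upstream, orf))
    print(confirmed_by_replicates([42, 47, None]))
```

Returning None for both missing and over-long UTRs keeps the replicate comparison simple, since discarded predictions then never count as support.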

Operons were predicted separately for each experiment. The three replicates of total RNA at 4°C were compared and operons that were not predicted in all three experiments were removed from further analysis; the same was performed for experiments at 23°C. Operons were identified using procedures similar to those described previously (Campanaro et al., 2011, 2012). Briefly, the coverage variability between different regions of the same gene was analysed using ORA software (Sardu et al., 2014). The log2 ratio between the mean coverage of the 5’-half of the gene and the 3’-half of the same gene was calculated, and the standard deviation of the distribution of these values was determined. PERL scripts developed by Campanaro et al. (2012) were used to analyze whether the log2 ratio of the coverage of two neighboring genes on the same strand was less than one standard deviation. Genes separated by regions of zero coverage in the intergenic regions were not considered part of the same operon, and genes with very low coverage (lower than 2) were discarded from further analysis. All predictions were confirmed by comparison with the previously determined M. burtonii operons from microarray transcriptomics (Campanaro et al., 2011). All the PERL scripts used for the identification of UTRs and prediction of operons are deposited in SourceForge, including a manual describing the procedures and example files (https://sourceforge.net/projects/trb/).
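The operon criteria described above can be summarized in a short Python sketch. It is an illustration only and does not reproduce the ORA software or the PERL scripts of Campanaro et al. (2012): the data layout, the function names, the pseudocount and the example coverage values are all hypothetical, and genes are assumed to be supplied in order along one strand with coverage listed 5’ to 3’.

```python
import math
import statistics


def mean(values):
    return sum(values) / len(values)


def log2_ratio(a, b, pseudocount=1.0):
    # The pseudocount is an assumption of this sketch; it avoids log2(0).
    return math.log2((a + pseudocount) / (b + pseudocount))


def half_ratio(coverage):
    """log2 ratio between mean coverage of the 5'-half and the 3'-half of a gene."""
    mid = len(coverage) // 2
    return log2_ratio(mean(coverage[:mid]), mean(coverage[mid:]))


def predict_transcription_units(genes, min_coverage=2.0):
    """Group neighbouring genes into transcription units (single genes or operons)."""
    usable = [g for g in genes if mean(g["coverage"]) >= min_coverage]
    if not usable:
        return []
    # Within-gene variability sets the tolerance for between-gene differences.
    sd = statistics.pstdev([half_ratio(g["coverage"]) for g in usable])
    units, current = [], []
    for gene in genes:
        if mean(gene["coverage"]) < min_coverage:
            # Very low coverage genes are discarded and end the current unit.
            if current:
                units.append([g["name"] for g in current])
                current = []
            continue
        if current:
            previous = current[-1]
            similar = abs(log2_ratio(mean(previous["coverage"]), mean(gene["coverage"]))) < sd
            connected = gene["gap_min_coverage"] > 0  # no zero-coverage gap upstream
            if not (similar and connected):
                units.append([g["name"] for g in current])
                current = []
        current.append(gene)
    if current:
        units.append([g["name"] for g in current])
    return units


# Hypothetical example: seven neighbouring genes on one strand.
EXAMPLE_GENES = [
    {"name": "A", "coverage": [60] * 5 + [40] * 5, "gap_min_coverage": 0},
    {"name": "B", "coverage": [40] * 5 + [60] * 5, "gap_min_coverage": 5},
    {"name": "C", "coverage": [50] * 10, "gap_min_coverage": 3},
    {"name": "D", "coverage": [400] * 10, "gap_min_coverage": 4},
    {"name": "E", "coverage": [1] * 10, "gap_min_coverage": 1},
    {"name": "F", "coverage": [50] * 10, "gap_min_coverage": 2},
    {"name": "G", "coverage": [50] * 10, "gap_min_coverage": 0},
]

if __name__ == "__main__":
    print(predict_transcription_units(EXAMPLE_GENES))
```

In this toy input, genes A to C are grouped, D is separated by its much higher coverage, E is discarded for low coverage, and G is separated from F by a zero-coverage intergenic region.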

RNA-seq data are available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive under accession number SRR2866535.

5.3 Results

RNA-seq data were compared to previous microarray data using a 2-fold cutoff for significance to match the microarray data (Campanaro et al., 2011). The two methods exhibited good correspondence (R² = 0.61; Figure 5.1) and gave somewhat similar proportions of genes with higher abundance at 4 and 23°C: 326 and 240 from RNA-seq and 90 and 87 from microarray data, respectively (Table 5.1). This correlation between RNA-seq and microarray data provided evidence that RNA-seq gave a reliable and comprehensive assessment of global transcript abundance in M. burtonii.
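The comparison summarized by R² can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each dataset is available as a mapping from gene identifier to log2 ratio (4 vs 23°C); the gene names and values below are invented for illustration and are not data from this study. For a simple linear fit, R² equals the square of the Pearson correlation coefficient, which is what the function computes.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear fit (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R squared is undefined when one variable is constant")
    return sxy ** 2 / (sxx * syy)


def r_squared_for_shared_genes(rnaseq, microarray, threshold=1.0):
    """Compare only genes with an absolute log2 ratio >= threshold in both datasets."""
    shared = [
        gene
        for gene in rnaseq
        if gene in microarray
        and abs(rnaseq[gene]) >= threshold
        and abs(microarray[gene]) >= threshold
    ]
    x = [rnaseq[gene] for gene in shared]
    y = [microarray[gene] for gene in shared]
    return shared, r_squared(x, y)


if __name__ == "__main__":
    # Hypothetical log2 ratios, for illustration only
    rnaseq = {"g1": 2.0, "g2": -1.5, "g3": 0.2, "g4": 3.0}
    microarray = {"g1": 1.0, "g2": -1.0, "g3": 2.0, "g4": 1.5, "g5": 4.0}
    print(r_squared_for_shared_genes(rnaseq, microarray))
```

Restricting the comparison to genes that pass the 2-fold cutoff in both datasets mirrors the selection described for Figure 5.1.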

Figure 5.1. Linear plot of RNA-seq and microarray data. Data is for genes that have differential abundance (DA) ≥2-fold in both RNA-seq and microarray datasets. Microarray data from: Campanaro et al. (2011).


Table 5.1. Summary of genes identified in M. burtonii RNA-seq analysis.

Dataset        Category                                        Number of genes    Percentage relative to total number of genes in the M. burtonii genome
-              Total number of M. burtonii genes               2493               100%
RNA-seq        Transcripts more abundant at 4°C (> 2-fold)     326                13%
RNA-seq        Transcripts more abundant at 23°C (> 2-fold)    240                10%
Microarray*    Transcripts more abundant at 4°C (> 2-fold)     90                 4%
Microarray*    Transcripts more abundant at 23°C (> 2-fold)    87                 4%

* microarray data from Campanaro et al. (2011).

The M. burtonii transcriptome was characterized by numerous relatively short intergenic regions. This made it challenging to predict operons; however, identification of a consensus promoter helped to define them (Figure 5.2E). Based on a conservative evaluation of operons (i.e. identified in all three replicates), for RNA from 4°C grown cells, of a total of 690 transcripts, 481 represented single genes and 209 represented operons of two or more genes (Figure 5.2D). For RNA from 23°C grown cells, of 395 transcripts, 248 represented single genes and 147 represented operons (Figure 5.2D). All operons defined from RNA-seq data were previously identified in the microarray data (Campanaro et al., 2011) (Appendix 5). TSS, TTS and operon structures were also predicted from the RNA-seq data (Figure 5.2). A total of 50 genes at 4°C and 13 genes at 23°C were identified that lacked a 5’-UTR, and a total of 32 genes at 4°C and 15 genes at 23°C lacked a 3’-UTR (Figure 5.2A).


Figure 5.2. Overview of M. burtonii transcriptome data. (A) Summary of transcription start sites (TSS) and transcription termination sites (TTS) predicted from 4°C and 23°C grown cells. Location of TSS and TTS from their respective ORFs for 23°C (B) and 4°C (C) grown cells. (D) Number of operons of different lengths for RNA from 4°C and 23°C grown cells. (E) Consensus promoter sequence generated using MEME software with the vertical intensity of each letter illustrative of the frequency of occurrence at that position.

A total of 755 core promoters were identified in the whole M. burtonii genome from the RNA-seq data. Aside from promoters located near the TSS of genes, promoters were also frequently identified within small intergenic regions occurring inside operons. An example is shown in Figure 5.3; Mbur_0346 and Mbur_0347 are separated from Mbur_0348-Mbur_0354 by a small intergenic region. Despite this small intergenic region between Mbur_0346-Mbur_0347 and Mbur_0348-Mbur_0354, the software frequently considered Mbur_0346-Mbur_0354 to be part of a single operon. However, the presence of a core promoter upstream of Mbur_0348 suggests that Mbur_0348-Mbur_0354 can be expressed as an independent transcriptional unit. Interestingly, there is another core promoter in the small intergenic region upstream of Mbur_0352, suggesting that Mbur_0352-Mbur_0354 can constitute another independent transcriptional unit. This was suggestive of mechanisms of regulation involving independent transcription initiation from internal sites. Numerous such operons were identified in the M. burtonii genome.

Figure 5.3. Example of consensus promoters (TATA) within ORFs of an operon (Mbur_0346-Mbur_0354) that has increased abundance at 4°C, showing the main operon promoter (outlined by a box) and additional promoters (outlined by a dotted circle).

Cold shock protein genes cspB and cspC from Bacillus subtilis have a relatively long 5’-UTR that contains a ‘cold-box’ (CS-box 1) sequence (Jiang et al., 1996; Graumann et al., 1997; Fang et al., 1998). These genes from B. subtilis are low temperature regulated (Graumann et al., 1997). From the M. burtonii RNA-seq data, it was found that ctr genes also have relatively long 5’-UTRs (Figure 5.4C). Interestingly, sequences similar to the B. subtilis CS-box 1 sequence were found within two of the three M. burtonii ctr 5’-UTRs (Figure 5.4). The presence of putative cold-box sequences within the 5’-UTRs of the low temperature upregulated ctr genes indicated that their expression may involve a regulatory mechanism in common with B. subtilis csp genes.


Figure 5.4. CS-box 1-like elements within the 5’-UTRs of ctr genes from M. burtonii. (A) Graphical representation of the 5’-UTR of ctr3 (Mbur_1445) and its components. (B) Sequence alignment of CS-box 1 elements from M. burtonii and B. subtilis. (C) Sequence alignment of 5’-UTRs of ctr genes from M. burtonii highlighting the CS-box 1-like elements (yellow highlight) in the 5’-UTR of ctr2 and ctr3.

5.4 Discussion

Capabilities such as detection of novel transcriptomic/genomic features, definition of operon structures and accurate annotation of genes make RNA-seq an ideal method to comprehensively assess transcriptome structures (Zhao et al., 2014). RNA-seq data of M. burtonii total RNA were compared to previous microarray data (Campanaro et al., 2011) and the data for the two methods showed good correspondence (R² = 0.61). Many recent studies that used RNA-seq and microarray in parallel have reported similar observations (e.g. Fu et al., 2009; Bottomly et al., 2011; Sîrbu et al., 2012 and references therein). Moreover, in these studies, RNA-seq demonstrated increased detection of transcripts compared to microarray (Marioni et al., 2008). For example, using RNA-seq, over 300 non-coding RNA and 80 new protein-coding genes were identified in the transcriptome analysis of S. solfataricus (Wurtzel et al., 2010). For M. burtonii total RNA, microarray data reported a total of 177 genes (90 at 4°C and 87 at 23°C) exhibiting differential abundance over 2-fold, whereas from RNA-seq data a total of 566 genes (326 at 4°C and 240 at 23°C) exhibited differential abundance over 2-fold, thus demonstrating the typically higher dynamic range of RNA-seq compared to microarray.

The microarray analysis showed that several genes for tRNA modifying proteins were upregulated at low temperature (Campanaro et al., 2011). These included archaeosine biosynthesis protein (Mbur_1272), tRNA-guanine transglycosylase (Mbur_1772) and tRNA-dihydrouridine synthase (Mbur_2154). Modification of nucleosides can stabilize tRNA. For example, dihydrouridine is a specific modified nucleoside that can enhance tRNA flexibility, and M. burtonii has a higher proportion of dihydrouridine per tRNA molecule compared to hyperthermophilic Archaea (Noon et al., 2003). The upregulation of Mbur_2154 at 4°C suggested that incorporation of dihydrouridine was mediated post-transcriptionally (Campanaro et al., 2011). In addition to the tRNA modifying proteins identified from the microarray data, RNA-seq identified several more tRNA modifying proteins that showed upregulation at 4°C; these included glutamyl-tRNA amidotransferase (Mbur_1026), tRNA-guanylyltransferase (Mbur_1464) and tRNA-cys synthetase (Mbur_0796). tRNA genes were not included in the microarray analysis, and therefore the abundance of tRNA was not assessed at low vs high temperatures. Inclusion of tRNA data from RNA-seq allowed further assessment of tRNA and their corresponding modifying proteins at low temperatures. Transcript abundance for the majority of tRNA genes was similar at 4 and 23°C, with only six tRNA more abundant at 4°C, and one tRNA species more abundant at 23°C (Figure 6.6). Therefore it can be postulated that although overall tRNA content does not increase at low temperature, the upregulation of tRNA modifying enzymes perhaps tailors tRNA with molecular features that allow low temperature stability and function.

In the RNA-seq data, a number of nucleic acid binding proteins were reported to be upregulated at either 4 or 23°C (Appendix 6). All three genes for TRAM proteins (ctr1, ctr2 and ctr3) exhibited higher abundance at 4°C, consistent with microarray (Campanaro et al., 2011) and previous proteomic data (Williams et al., 2010; 2011) (Appendix 6). Overall, the RNA-seq data reported higher transcript abundance of genes encoding nucleic acid binding proteins at 4°C compared to 23°C, indicating an important role of these proteins in cold adaptation of M. burtonii.

From the microarray data, several genes encoding ribosomal proteins exhibited higher abundance at 4°C. For example, L29P (an important component of the ribosome involved in protein secretion) and S3P (a protein involved in interaction with mRNA) exhibited upregulation at 4°C (Baranov et al., 1999; Kramer et al., 2002; Campanaro et al., 2011). However, transcript abundance for 23S, 16S and 5S rRNA genes at 4 vs 23°C was not assessed in the microarray analysis. RNA-seq data provided assessment of these rRNA genes. Transcript abundance for 23S and 16S rRNA genes was similar at 4 and 23°C. However, three of a total of six 5S rRNA species (Figure 6.6) and several genes for ribosomal proteins, including L29P and S3P, were more abundant at 4°C. One proteomic study examined seven different growth temperatures (-2, 1, 4, 10, 16, 23 and 28°C) that spanned the complete growth temperature range of M. burtonii (Williams et al., 2011). An important finding from this study was that the abundance of most ribosomal subunits peaked at 1 and 4°C, indicative of an adaptive mechanism to maintain effective translation at environmental temperatures. The higher transcript abundance of genes for ribosomal proteins at 4°C reported in the microarray, proteomic and now RNA-seq analyses (except for 16S and 23S rRNA) emphasizes the roles of ribosomal proteins in facilitating protein synthesis of M. burtonii at low temperatures.

In general, from the RNA-seq data, genes for nucleic acid binding, tRNA modifying and cell surface proteins exhibited the most upregulation at 4°C (Appendix 6). The notable increase of genes for nucleic acid binding proteins and tRNA modifying proteins is indicative of the importance of maintaining RNA in a state suitable for translation and translation initiation at low temperatures. Genes for cell surface proteins involved in glycosylation, protein secretion and expression of adhesion proteins are differentially expressed at 4 and 23°C (Campanaro et al., 2011). This is reflected in different compositions of the extracellular matrix (ECM) of M. burtonii at 4 and 23°C (Campanaro et al., 2011; Williams et al., 2011). It was postulated that modulation of the composition of the ECM at low temperatures may facilitate gene exchange and nutrient capture (Campanaro et al., 2011). From the RNA-seq data, genes involved in methanogenesis, metabolism and transporter proteins exhibited notable upregulation at 23°C (Appendix 6). Upregulation of these genes can perhaps be explained by the capacity of M. burtonii to grow faster at higher temperature (i.e. a higher growth rate at 23 vs 4°C). At high temperatures, faster growth can be characterised by increased rates of methanogenesis, energy generation and metabolism, and increased levels of transporter proteins (Campanaro et al., 2011). For example, upregulation of the phosphate ATP-binding cassette transporter protein (Mbur_0752) perhaps meets the increased demand for phosphorus at a higher growth rate. Overall, RNA-seq data regarding upregulation of genes for specific classes of proteins at 4 vs 23°C were in agreement with the microarray analysis of transcript abundance at 4 vs 23°C (Campanaro et al., 2011), again suggesting that the expression data obtained from RNA-seq were reliable and reproducible.

The prevalence of conditional promoters appears to be an inherent characteristic of archaeal genomes. In the H. salinarum NRC-1 genome, many such operons are present that allow conditionally altered expression of constituent genes (Koide et al., 2009). H. salinarum NRC-1 can grow under the influence of various environmental factors such as different pH, oxygen concentration and nutrition level (Schmid et al., 2007), which affect expression of ~ 63% of the total genes (Koide et al., 2009). It was also postulated that intragenic binding sites could re-associate transcription factors to aid RNA polymerase in the process of transcription (Reppas et al., 2006; Lee et al., 2008). M. burtonii is capable of growth using different substrates (e.g. methanol or trimethylamine in complex medium; Williams et al., 2010b) and at different temperatures (-2 to 28°C; Williams et al., 2010a; Williams et al., 2011). The transcriptome structures of M. burtonii and H. salinarum share some similar characteristics: larger transcripts generally have higher cellular levels (Campanaro et al., 2011), and both have many leaderless transcripts (i.e. no 5'-UTR present) and numerous intergenic regions in their respective genomes (from M. burtonii RNA-seq data; Koide et al., 2009). From the RNA-seq data of M. burtonii total RNA, numerous putative core promoters were identified within operons, indicating mechanisms of regulation involving independent transcription initiation from internal sites. Therefore, it is possible that under specific environmental conditions, dynamic transcriptional regulatory mechanisms affect gene expression of M. burtonii. It will be interesting to assess how operon structures and expression profiles vary under different environmental factors such as temperature or nutrition.

B. subtilis has three cold shock proteins, CspB, CspC and CspD, which bind RNA (Graumann et al., 1997). Their homologous counterparts in E. coli (CspA, CspB and CsdA) possess conserved motifs (cold-box elements) in their unusually long 5’-UTR that are responsible for gene regulation during low temperature stress (Jiang et al., 1996; Fang et al., 1998). Interestingly, low temperature regulated cspB and cspC genes from B. subtilis also possess a relatively long 5’-UTR that contains a ‘cold-box’ (CS-box 1) sequence (Jiang et al., 1996; Graumann et al., 1997; Fang et al., 1998). However, there is no significant sequence similarity between the B. subtilis CS-box and the E. coli cold-box. B. subtilis Csp proteins exhibit strong binding affinity towards CS-box 1 elements and are postulated to be involved in regulation of csp genes at low temperatures (Graumann et al., 1997). A cold-box element was previously identified in the M. burtonii genome. At low temperature, the RNA helicase gene (Mbur_1950) from M. burtonii is transcribed with a long 5'-UTR (Lim et al., 2000) that contains a sequence with high identity to cold-box elements found in the 5'-UTR of E. coli (Phadtare et al., 1999) and Anabaena sp. (Chamot et al., 1999). The RNA-seq data of M. burtonii total RNA also confirmed the higher abundance of Mbur_1950 at 4°C and the presence of the cold-box element in the 5'-UTR of Mbur_1950. A search for cold-box sequences within 5’-UTR regions of low temperature regulated genes (from the RNA-seq data) revealed sequences similar to the B. subtilis CS-box 1 sequence within two of the three M. burtonii ctr 5’-UTRs: ctr2 and ctr3 (Figure 5.4). In E. coli, the 5'-UTR of cspA has been reported to play roles in autoregulation (Jiang et al., 1996; Bae et al., 1997; Fang et al., 1997, 1998) and CspE has been suggested to bind to the cold-box of the cspA gene and act as a negative regulator of expression (Bae et al., 1999).
It is therefore possible that Ctr proteins also function as autoregulators, or even as negative regulators, of their own gene expression. The presence of such conserved motifs suggests a correlation between low temperature gene regulation across Bacteria and Archaea. These regulatory elements may have been acquired by horizontal gene transfer from Bacteria to M. burtonii, consistent with genomic analyses which identified the importance of genome plasticity in the evolution of M. burtonii to its Antarctic environment, including gene transfer from Epsilonproteobacteria and Deltaproteobacteria (Allen et al., 2009). The exact mechanism of regulation of low temperature regulated M. burtonii genes (e.g. ctr and Mbur_1950) needs further assessment in order to obtain more knowledge regarding low temperature gene regulation in Bacteria and Archaea.

In conclusion, RNA-seq performed on RNA isolated from M. burtonii provided a more comprehensive assessment of differentially abundant transcripts at 4 vs 23°C. Moreover, the differentially expressed genes identified from the RNA-seq data showed good correspondence with those identified by microarray, suggesting that expression data obtained for M. burtonii from RNA-seq were consistent and reliable. A general overview of the reconstructed transcriptome of M. burtonii provided some insights into operon structures and identified features that were not previously reported in the microarray analysis (Campanaro et al., 2011), such as the presence of alternate promoters within operons. Bacterial cold-box-like elements were identified in the 5’-UTRs of ctr2 and ctr3 genes, suggesting that a bacterial-like mode of gene regulation may exist for these genes. A comprehensive study combining knowledge obtained from genomic, transcriptomic and proteomic analyses of M. burtonii at high vs low temperatures may greatly benefit the understanding of cold adaptation of M. burtonii, and of psychrophiles more broadly.


Chapter 6

RNA binding specificity and possible cellular functions of Ctr3

Statement

Sections of this chapter contain analyses performed by Dr Stefano Campanaro, Dr Nandan Despande from the School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia and Prof Paul Curmi from the School of Physics, University of New South Wales, Sydney, Australia. Contributions by Stefano Campanaro are as follows: global assembly and mapping of RNA-seq reads against M. burtonii genome; computational analysis of secondary structures; motif searches and all statistical analyses. Contributions by Nandan Despande are as follows: global assembly and mapping of RNA-seq reads against M. burtonii genome. Contributions by Paul Curmi are as follows: analysis of crystal structures of 5S rRNA from Haloarcula marismortui and tRNA-Phe from Saccharomyces cerevisiae.

My contributions are as follows: in vitro RNA binding assay; RNA extraction; sample preparation for RNA-seq quality control; all analyses and data interpretation of binding data from RNA-seq.


Abstract

This chapter explores the possible functional properties of Ctr3 from M. burtonii. An on-column in vitro binding assay was used to capture M. burtonii RNA targets of Ctr3. Identification of the captured RNA using RNA-seq revealed that Ctr3 preferentially bound tRNA and 5S rRNA. A potential binding motif for Ctr3 was identified. In tRNA, the motif represented the C loop; a region that is conserved in tRNA from all domains of life and appears to be solvent exposed, potentially providing access for Ctr3 to bind. In 5S rRNA, the motif represented one side of the stem and loop C which also appears to be solvent exposed providing possible access to Ctr3. Ctr3 and Csps are structurally similar and are both inferred to function in low temperature translation. The broad representation of single TRAM domain proteins within Archaea compared to their apparent absence in Bacteria, and scarcity of Csps in Archaea but prevalence in Bacteria, suggests they represent distinct evolutionary lineages of functionally equivalent RNA-binding proteins.


6.1 Introduction

Growth in the cold impedes transcriptional and translational processes in the cell (Feller and Gerday, 2003). In psychrophilic Archaea, nucleic acid binding proteins have been suggested to play important roles in cold adaptation (Saunders et al., 2003; Campanaro et al., 2011). For example, in M. burtonii, upregulation of an RNase J-like protein (involved in 5’-UTR processing of genes) at low temperatures has been inferred to enhance transcript stability and translation of mRNA species (Williams et al., 2010a). The TATA-box binding protein from M. burtonii has higher abundance at low temperatures (highest at 1°C, which is the native growth temperature of M. burtonii) and has been suggested to have putative roles in the recruitment of the multisubunit RNAP (Williams et al., 2010a). In a proteomic study of M. burtonii, it was revealed that the abundance of most ribosomal subunits peaked at 1 and 4°C, indicative of an adaptive mechanism to maintain effective translation at low temperatures (Williams et al., 2011). In M. frigidum, a cold shock protein (CspA homolog) was identified that was able to complement cold sensitivity and rescue the growth defect of E. coli BX04 (a cold sensitive strain) at low temperatures (Giaquinto et al., 2007). Csps (e.g. CspA family proteins) from Bacteria have been suggested to resolve inhibitory secondary structures of RNA and facilitate translation at low temperatures (Phadtare, 2011). Both translation and transcription (e.g. premature termination) are likely to be compromised at low temperature due to the increased stability of adventitious RNA secondary structures; therefore it is possible that RNA chaperones in concert with RNA helicases may play roles in overcoming these problems in psychrophiles (Williams et al., 2011).

One of the important features that emerged from the genomic, proteomic and transcriptomic analyses performed on M. burtonii is the upregulation of nucleic acid binding proteins at low temperatures (Allen et al., 2007; Williams et al., 2011; Campanaro et al., 2011; RNA-seq data from Chapter 5). While these analyses provided a global assessment of gene, protein and transcript contents of nucleic acid binding proteins at low temperatures, the functions of many nucleic acid binding proteins have not been characterized experimentally. To date, only one study has been performed to determine the nucleic acid binding characteristics of RpoEF with the aim of identifying cellular RNA targets. Subunits E and F of the RNAP can interact with general transcription factors and facilitate transcription (Hirata et al., 2008a). RNA targets for RpoEF were captured in vitro by immobilizing a mixture of M. burtonii total RNA and His-tagged recombinant RpoEF proteins on a nickel column. The unbound RNA was washed off, the bound RNA-RpoEF was eluted and the bound RNA was extracted (DeFrancisci et al., 2011). The identity of the captured RNA was determined using an Agilent custom microarray, the same as the one used for transcriptome analysis of M. burtonii (Campanaro et al., 2011). The total RNA expression data were successfully used as a reference for the bound RNA (Campanaro et al., 2011; DeFrancisci et al., 2011). The in vitro method to capture RNA targets identified transcripts for 117 genes that were bound by RpoEF, which included a number of functional classes: methanogenesis, cofactor biosynthesis, nucleotide metabolism, transcription, translation and transport (DeFrancisci et al., 2011).

Most Bacteria have multiple genes for Csp proteins in their genomes (Horn et al., 2007); e.g. E. coli has nine homologs of Csp proteins (Yamanaka, 1999). In Bacteria, Csp proteins are upregulated during cold stress and/or during low temperature growth (Gao et al., 2006; Bergholz et al., 2009; Lauro et al., 2011; Siddiqui et al., 2013). However, not all Csp proteins are upregulated at low temperature. For example, in E. coli, only four (CspA, CspB, CspG and CspI) have increased abundance during cold shock (Yamanaka, 1999). Nevertheless, Csp proteins have been suggested to function by chaperoning unwound RNA (Jiang et al., 1997), performing antitermination (La Teana et al., 1991) or mediating transcriptional activity (Bae et al., 2000). However, while csp genes are ubiquitous in bacterial genomes, only a few csp homologs have been identified in psychrophilic Archaea (e.g. M. frigidum, Cenarchaeum symbiosum, H. lacusprofundi); csp genes are totally absent from the M. burtonii genome (Giaquinto et al., 2007; Allen et al., 2009).

Ctr proteins (Ctr1, Ctr2 and Ctr3) bind RNA (Chapter 3) and have been reported to exhibit higher abundance at 4°C compared to 23°C (Williams et al., 2010a; Campanaro et al., 2011; RNA-seq data from Chapter 5). In addition, one proteomic analysis revealed that the highest levels of Ctr proteins occurred at 1 and -2°C (Williams et al., 2011). In the same study, it was reported that the abundance of the ATP-dependent RNA helicase (Mbur_1950) increased with decreasing growth temperature down to 1°C, but cellular abundance dropped at -2°C (Williams et al., 2011). The levels of the RNA helicase may be downregulated at the minimum temperature for growth (-2°C) because cellular ATP may become limiting. Therefore, it was inferred that the increased abundance of Ctr proteins may compensate for the reduced levels of RNA helicase activity (Williams et al., 2011).

Ctr3 exhibited the highest abundance during low temperature growth of M. burtonii (Williams et al., 2011). Although a putative role of Ctr3 at low temperature was suggested in the transcriptomic and proteomic studies (Campanaro et al., 2011; Williams et al., 2011), experimental data characterizing its functional properties are lacking. This chapter reports findings from the first experimental analysis performed to infer the cellular function of Ctr3 based on the specific RNA species bound by the protein.

6.2 Materials and Methods

6.2.1 RNA extraction and in vitro binding to Ctr3

RNA binding was performed based on methods described previously (DeFrancisci et al., 2011; Campanaro et al., 2011; Jankowsky and Harris, 2015). M. burtonii total RNA (1 mg) was incubated with purified Ctr3 (10 mg) in binding buffer (40 mM HEPES, pH 7.4, 100 mM K acetate, 40 U of RNaseOUT: Life Technologies) for 1 h. The incubation was performed in two stages: 0.5 mg of total RNA was incubated for 30 min followed by the addition of another aliquot of 0.5 mg of total RNA with incubation for a further 30 min, and the suspension was carefully mixed every 5 min. Ni-NTA resin (300 µl) (Qiagen) was equilibrated with binding buffer on a disposable chromatography column (Bio-Rad), the protein-RNA mixture was incubated with the pre-equilibrated Ni resin on the column for 1 h, and the column was washed with 10 bed volumes of binding buffer (without RNaseOUT) to remove all unbound material. A 2 ml volume of elution buffer (40 mM HEPES, pH 7.4, 100 mM K acetate, 500 mM imidazole) was added to elute the protein, the eluate was extracted using the TRIzol® Plus RNA Purification Kit (Life Technologies) to separate M. burtonii RNA from Ctr3, and the resultant RNA was resuspended in 30 µl nuclease-free water (Life Technologies). The RNA concentration was measured using a Nanodrop spectrometer (Thermo Scientific), and the RNA integrity was analysed using a 2100 Bioanalyzer (Agilent). The binding assay was repeated a total of ten times using five independent M. burtonii cultures grown at 4°C and five at 23°C, with six as test experiments (three at each temperature) and two control experiments for each temperature (one without the addition of Ctr3 and one without the addition of RNA).

6.2.2 RNA-seq and data analyses

The bound RNA was sequenced on a HiSeq 2000 (Illumina) at the Ramaciotti Centre for Genomics (UNSW, Australia), generating a total of 150 million non-strand-specific PE reads with an average read length of 100 bp. The sequenced reads were evaluated for quality using FastQC (version 0.11.2) (Andrews, 2010). The SolexaQA package (Cox et al., 2010) was used to calculate quality statistics and remove low quality reads. Raw PE reads were trimmed with BWA trimming mode at a threshold of Q13 (P = 0.05) using DynamicTrim of SolexaQA. Reads less than 25 bp were discarded using LengthSort. The quality filtered reads were mapped and aligned against the M. burtonii genome (Genbank: NR_076415) using Bowtie 2 (version 2.0.0-beta7) (Langmead et al., 2009) with default parameters; only reads with ≤ 2 mismatches were considered. HTSeq-count (v0.5.3p9) (Anders et al., 2015) was used to generate a list of per-gene read counts for each sample. The BioConductor package edgeR v3.10.2 (Robinson et al., 2010) was used in the R programming environment to sort genes according to their differential abundance in binding assays. Standard normalization procedures were performed on all data. A false discovery rate calculation was employed to filter quartiles of genes with lowest counts. Differential abundance of individual transcripts identified in binding assays was calculated as the log2 fold change of the ratio between transcripts in the binding experiments and total RNA at each temperature. Thresholds of log2 ratio of ≥ 2 (4-fold) were imposed on binding assay data sets to minimize background reads. Transcripts for 16S rRNA and 23S rRNA genes were not enriched (differentially abundant) in the Ctr3 binding experiments and were discarded prior to data analysis. All transcripts and reads that exhibited differential abundance in binding experiments were visualized and manipulated using IGV 2.3 (Robinson et al., 2011) and Artemis (Rutherford et al., 2000) software.
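The enrichment criterion described above (a log2 ratio of bound to total RNA of at least 2 in all replicates) can be illustrated with a minimal Python sketch. It is not the edgeR workflow used in this study: the simple counts-per-million scaling, the pseudocount of 0.5 and the function names cpm and enriched_genes are assumptions made only for illustration.

```python
import math

def cpm(counts):
    """Scale raw per-gene read counts to counts per million (simple library-size scaling)."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

def enriched_genes(bound_replicates, total_rna, min_log2=2.0, pseudo=0.5):
    """Return genes with log2(bound cpm / total cpm) >= min_log2 in every replicate.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads.
    """
    reference = cpm(total_rna)
    replicates = [cpm(rep) for rep in bound_replicates]
    hits = []
    for gene, ref_value in reference.items():
        ratios = [
            math.log2((rep.get(gene, 0.0) + pseudo) / (ref_value + pseudo))
            for rep in replicates
        ]
        if all(r >= min_log2 for r in ratios):
            hits.append(gene)
    return sorted(hits)

# Hypothetical toy counts, not data from this study.
total = {"tRNA_x": 100, "mRNA_y": 900}
bound = [{"tRNA_x": 800, "mRNA_y": 200}, {"tRNA_x": 500, "mRNA_y": 500}]
print(enriched_genes(bound, total))
```

Requiring the threshold to be met in every replicate, rather than on the replicate mean, mirrors the "in all replicates" condition applied to the binding data.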

RNAFOLD from the Vienna package (Hofacker, 2004) was used to predict the number of bases that can form secondary structures and the associated average folding energy. The free energy was not normalized for transcript length and is therefore a measure of the average fraction of bases in secondary structures. Comparisons were performed between Ctr3-bound transcripts and transcripts for all genes in the M. burtonii genome. Comparisons included calculations of the average (-18573 kcal mol-1) and median (-14990 kcal mol-1) free energy of the bound transcripts vs the average (-26772 kcal mol-1) and median (-22951 kcal mol-1) for unbound transcripts, and the average (59.6%) and median (60.1%) fraction of bases in secondary structures for bound transcripts vs the average (58.9%) and median (63.1%) for unbound transcripts.
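The fraction of bases in secondary structures can be derived from a predicted structure written in dot-bracket notation, in which parentheses mark paired bases and dots mark unpaired bases. The following Python sketch assumes that structures are available as such strings; the function names are hypothetical and the example structures are not taken from the data.

```python
from statistics import mean, median

def paired_fraction(dot_bracket):
    """Fraction of positions that are base-paired in a dot-bracket string."""
    if not dot_bracket:
        raise ValueError("empty structure")
    paired = sum(ch in "()" for ch in dot_bracket)
    return paired / len(dot_bracket)

def summarize(structures):
    """Mean and median percentage of paired bases for a set of structures."""
    percentages = [100.0 * paired_fraction(s) for s in structures]
    return mean(percentages), median(percentages)

# Hypothetical structures for a "bound" and an "unbound" set of transcripts.
bound = ["((((....))))", "((..))..", "(((...)))"]
unbound = ["........", "((....))", "(())"]
print(summarize(bound), summarize(unbound))
```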

MEME (Bailey et al., 2009) and GLAM2 (Frith et al., 2008) software were used to identify conserved motifs in transcripts bound by Ctr3. All predicted motifs were checked for overrepresentation in transcripts that were not bound by Ctr3. Positions of the motifs in the transcripts were determined using FIMO (Bailey et al., 2009; Grant et al., 2011). MEME was used to illustrate the compositions of identified motifs. Statistical over-representation of tRNA and 5S rRNA sequences in the Ctr3-bound data was assessed using procedures described previously (Treu et al., 2014). Briefly, 100000 random samplings were performed using 64 and 53 genes (representing transcripts defined from 4 and 23°C cultures, respectively) across the entire M. burtonii genome sequence (2497 genes) using a custom Perl script that used the Perl 'rand()' function. The fraction of random samples in which the number of tRNA or 5S rRNA genes was equal to or higher than N (where N is the number of tRNA/5S rRNA genes in the group of 64 or 53 genes) was then calculated. If this fraction was lower than the significance level (α = 0.05) then the enrichment of tRNA/5S rRNA genes was considered significant.
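The resampling test can be sketched in Python as follows. This is not the Perl script used in the study: sampling without replacement, the fixed random seed and the function name enrichment_p_value are assumptions of the sketch, and the gene counts in the usage example are hypothetical.

```python
import random

def enrichment_p_value(gene_classes, sample_size, target_class, observed,
                       n_iter=100_000, seed=0):
    """Fraction of random gene sets containing at least `observed` genes of `target_class`.

    gene_classes: one class label per gene in the genome.
    sample_size:  number of genes in the bound list (e.g. 64 or 53).
    """
    rng = random.Random(seed)
    flags = [1 if label == target_class else 0 for label in gene_classes]
    hits = 0
    for _ in range(n_iter):
        # Draw a random gene set of the same size as the bound list.
        if sum(rng.sample(flags, sample_size)) >= observed:
            hits += 1
    return hits / n_iter

# Hypothetical genome: 2497 genes, of which 60 are labelled "tRNA".
genome = ["tRNA"] * 60 + ["other"] * 2437
p = enrichment_p_value(genome, sample_size=64, target_class="tRNA",
                       observed=24, n_iter=2000)
print(p < 0.05)
```

The returned fraction is compared with the significance level (α = 0.05) in the same way as described above.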

RNA-seq data are available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive under accession number SRR2866535.


6.3 Results

6.3.1 Ctr3 binds specific M. burtonii transcripts

To determine the ability of Ctr3 to bind to specific cellular RNA, recombinant Ctr3 was incubated with M. burtonii total RNA from 4 and 23°C grown cells. For each condition (4 and 23°C), experiments were performed in triplicate (Figure 6.1). The RNA-protein mixture was immobilized on Ni-NTA affinity columns. The use of full length transcripts in the binding experiments preserved the secondary structures of RNA. The in vitro binding assay ensured that no cellular factors influenced protein-RNA interactions. Control experiments were set up to assess whether RNA alone binds to the Ni-NTA affinity column. No RNA bound to the affinity column unless it was combined with Ctr3, consistent with the RpoEF microarray study in which RNA also did not exhibit affinity towards the Ni-NTA affinity column (DeFrancisci et al., 2011). In the other control experiment, only RNA-free Ctr3 proteins were immobilized on Ni-NTA affinity columns to assess whether bound RNA influences Ctr3 binding to the Ni-NTA column. In both instances (RNA-bound and RNA-free), the same amount of protein (~ 10 mg) was immobilized and eluted. This confirmed that bound RNA did not influence Ctr3 binding to the Ni-NTA column.

Based on a stringent cut-off (2-fold in all replicates), Ctr3 bound to a total of 353 transcripts from 4°C cultures and 255 from 23°C grown cells; ~ 1.4-fold more from 4 than 23°C RNA (data not shown). Since RNA-binding proteins naturally possess a range of affinities for different species of RNA (Guenther et al., 2013; Duss et al., 2014; Jankowsky and Harris, 2015), a more stringent cut-off (4-fold in all replicates) for binding was used to minimize non-specific binding data. This caused a large decrease, resulting in a total of 64 transcripts from 4°C cultures and 53 from 23°C grown cells (Appendix 6). In both instances (cut-off values of 2-fold or 4-fold in all replicates), the classes of transcripts that were clearly most enriched were tRNA and 5S rRNA. From the 2-fold cut-off transcript list, a total of 36 tRNA from 4°C compared to 13 from 23°C grown cells were obtained; and from the 4-fold cut-off transcript list, a total of 24 tRNA from 4°C compared to 12 from 23°C. Clearly, Ctr3 bound to more tRNA species from the 4°C grown cells compared to 23°C cultures. Almost equal numbers of 5S rRNA genes were enriched using 2-fold (all six 5S rRNA from both 4 and 23°C cultures) and 4-fold (all six 5S rRNA from 4°C and five from 23°C grown cells) cut-off values.

Overall, Ctr3 preferentially bound to tRNA and 5S rRNA species from 4 and 23°C grown cells regardless of the cut-off threshold (2 vs 4-fold). Therefore, in order to focus on the species most strongly enriched by Ctr3, the 4-fold cut-off transcript list was selected.

The over-representation of tRNA and 5S rRNA in Ctr3-bound transcripts was further confirmed using a random resampling approach (Treu et al., 2014; see Materials and Methods). Moreover, the representation of transcripts predicted to form secondary structures using RNAFOLD (see Materials and Methods) was equivalent in the Ctr3-bound data compared to all genes in the genome, thereby indicating that the over-representation of tRNA and 5S rRNA in the Ctr3-bound data was not simply explained by the RNA being structured.

Figure 6.1. RNA bound by Ctr3 and total M. burtonii RNA used for RNA-seq electrophoresed on a non-denaturing 1.5% agarose gel. Lane A and D: 100 bp molecular marker; Lane B and C: total RNA from 4°C and 23°C, respectively; Lane E-G: replicates of RNA from independently grown 4°C cultures captured by Ctr3; Lane H-J: replicates of RNA from independently grown 23°C cultures captured by Ctr3.

6.3.2 Binding motif of Ctr3

To attempt to identify motifs that can potentially serve as specific binding sites for Ctr3, a total of 64 transcripts from 4°C cultures and 53 from 23°C grown cells were analysed using MEME and GLAM2 software. A 41-nucleotide motif, 4C_M1 (Figure 6.3A), was identified in 29 sequences from the 4°C transcripts, including 23 tRNA and all 5S rRNA transcripts, and 18 transcripts from 23°C RNA, including 12 tRNA and five 5S rRNA transcripts (Appendix 7). The 4C_M1 motif contained a nine nucleotide core sequence, GUUCXXXUC (Figure 6.4A). The whole dataset (64 transcripts from 4°C and 53 from 23°C grown cells) was re-interrogated using FIMO for the presence of just the nine nucleotide core motif of 4C_M1, resulting in the identification of 30 sequences from 4°C RNA (24 tRNA and six 5S rRNA) and 18 transcripts from 23°C RNA (12 tRNA and five 5S rRNA) (Appendix 8). Both the full length and core 4C_M1 motifs were present in a total of 27 tRNA and all rRNA species that were bound by Ctr3 from both 4 and 23°C grown cells (Table 6.1, Figure 6.2).
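The location of the nine nucleotide core sequence within a transcript can be illustrated with a simple pattern search. The Python sketch below uses exact string matching and treats X as any nucleotide, which is an assumption; it is a simplification of the position-weight-matrix scoring performed by FIMO, and the example sequence is hypothetical.

```python
import re

# GUUC, any three nucleotides, then UC; the lookahead allows overlapping matches.
CORE = re.compile(r"(?=(GUUC[ACGU]{3}UC))")

def find_core_motif(seq):
    """Return 1-based start positions of GUUCNNNUC in an RNA or DNA sequence."""
    rna = seq.upper().replace("T", "U")
    return [match.start() + 1 for match in CORE.finditer(rna)]

# Hypothetical sequence, not an M. burtonii transcript.
print(find_core_motif("AAGUUCGAAUCCC"))
```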

Figure 6.2. RNA species bound by Ctr3 from 4°C and 23°C grown cultures that contain both the full length and core sequence of the 4C_M1 motif. The quantitative enrichment of tRNA and 5S rRNA by Ctr3 is shown as log2-fold increase relative to transcript levels for total RNA-seq data for cultures grown at 4°C (blue) or 23°C (red), with the 4-fold cut-off used for defining enrichment (in all replicates) shown by an arrow.


Table 6.1. Transcripts bound by Ctr3 which contain both the full 4C_M1 motif and the core sequence of the motif. Transcripts containing the motif from RNA prepared from 4°C or 23°C grown cells are shown with differential abundance levels for total RNA for comparison.

                                    Differential abundance,   Differential abundance,
                                    Ctr3 bound (a)            total RNA (b)
Gene ID       Gene annotation       4°C        23°C           4°C        23°C

tRNA genes
Mbur_R0001    tRNA-Ser              5.5        -              -          -
Mbur_R0002    tRNA-Leu              220        -              -          -
Mbur_R0007    tRNA-Phe              12         -              -          -
Mbur_R0008    tRNA-Val              49         23             -          -
Mbur_R0009    tRNA-Met              7.7        -              -          -
Mbur_R0012    tRNA-Ser              11         -              -          -
Mbur_R0015    tRNA-Val              5.7        4.3            -          -
Mbur_R0016    tRNA-Leu              5.3        -              -          -
Mbur_R0017    tRNA-Val              4.3        -              -          -
Mbur_R0019    tRNA-Cys              -          4.3            -          -
Mbur_R0022    tRNA-Ala              5.0        7.9            4.5        -
Mbur_R0027    tRNA-Lys              6.3        4.7            -          -
Mbur_R0028    tRNA-Thr              -          4.2            -          -
Mbur_R0030    tRNA-Asp              7.5        -              -          -
Mbur_R0031    tRNA-Tyr              20         12             -          -
Mbur_R0034    tRNA-Ser              9.6        -              -          -
Mbur_R0038    tRNA-Pro              72         -              -          -
Mbur_R0040    tRNA-Ala              6.4        -              -          -
Mbur_R0043    tRNA-Cys              -          14             5.2        -
Mbur_R0045    tRNA-Arg              6.7        5.4            3.5        -
Mbur_R0047    tRNA-His              11         -              -          -
Mbur_R0048    tRNA-Lys              13         -              -          -
Mbur_R0051    tRNA-Ile              4.9        71             96         -
Mbur_R0054    tRNA-Leu              -          12             5.2        -
Mbur_R0056    tRNA-Ala              6.1        -              3.3        -
Mbur_R0059    tRNA-Cys              4.4        5.8            -          -
Mbur_R0063    tRNA-Met              14         -              -          -

5S rRNA genes
Mbur_R0042    5S ribosomal RNA      13         6.3            -          -
Mbur_R0060    5S ribosomal RNA      12         -              -          -
Mbur_R0020    5S ribosomal RNA      11         8.8            2.3        -
Mbur_R0058    5S ribosomal RNA      10         9.1            2.6        -
Mbur_R0044    5S ribosomal RNA      6.6        7.4            -          -
Mbur_R0018    5S ribosomal RNA      4.1        8.8            7.4        -

a. Differential abundance is shown as fold change calculated to two significant figures from average cpm values (counts per million) of individual genes in binding experiments for RNA from 4°C or 23°C grown cells compared to total RNA for the same growth temperature.

b. Differential abundance is shown as fold change calculated to two significant figures from average cpm values for total RNA from 4°C vs 23°C grown cells, for differential abundance increases (at least 2-fold in all replicates) at 4°C or 23°C. Fold changes that are below the 4-fold and 2-fold cut-off in Ctr3 bound and total RNA, respectively, are shown as a dash (-).

The position of 4C_M1 core sequence in tRNA and 5S rRNA species was further assessed. The M. burtonii tRNA sequences vary in length from 72 to 109 nucleotides and have as little as 35% nucleotide identity to each other. The nine nucleotide core sequence is located near the 3'-end of their sequences (Appendix 9A,B). In contrast, all six 5S rRNA sequences are of almost equal lengths (~ 120 nucleotides long) and exhibit ≥95% identity. The nine nucleotide core sequence is located near the 5'-end of their sequences (Appendix 10C,D). The complete motif, 4C_M1, was visualized in the structures in tRNA and 5S rRNA (Figure 6.3). Nuclear magnetic resonance (NMR) or crystal structures of 5S rRNA and tRNA from M. burtonii have not been solved yet. A crystal structure of an archaeal 5S rRNA (AF034620) from Haloarcula marismortui (PDB ID: 2QA4) has been solved (Szymanski et al., 2002). Alignment of primary sequences of M. burtonii 5S rRNA (Mbur_R0020) and H. marismortui 5S rRNA (AF034620) allowed direct prediction of structural features of M. burtonii 5S rRNA (Figure 6.3A,B). For M. burtonii tRNA, tRNAscan-SE software was used to predict the secondary structure of tRNA-Ala (Mbur_R0022) (Figure 6.3C). In 5S rRNA, the full 4C_M1 motif spans parts of domain α, helix 2, loop B and helix 3, and in tRNA it covers the anticodon loop and C loop (Figure 6.3). The complete 4C_M1 motif occurs within structured regions of tRNA and 5S rRNA that contain hairpin loops, although the structures in tRNA and 5S rRNA are not identical (Figure 6.3). The core sequence itself is represented by the hairpin loop of the tRNA C loop, and one side of the stem and loop of the 5S rRNA C loop. Strikingly, the core sequence is conserved in the C loop of tRNA from all domains of life (Appendix 11).

It is important that the binding motif is solvent exposed in order to ensure access for Ctr3. Crystal structures of 5S rRNA from H. marismortui (Archaea) (PDB ID: 2QA4) and tRNA-Phe from S. cerevisiae (PDB ID: 1EHZ) were analyzed (Figure 6.4). The structures show that the region of rRNA and tRNA that contains the core motif (GUUCXXXUC) is solvent exposed and would therefore potentially provide access for Ctr3 to the RNA to allow it to bind.

Figure 6.3. Location of the full-length 4C_M1 motif within the predicted structures of M. burtonii 5S rRNA and tRNA. (A) Putative binding motif, 4C_M1. The vertical intensity of each letter is proportional to the frequency of occurrence at that position calculated by MEME. (B) 5S rRNA. Upper portion: Primary sequence alignment of the M. burtonii 5S rRNA (Mbur_R0020) and H. marismortui 5S rRNA (AF034620) showing the locations of known loop structures (dotted boxes) from the H. marismortui 5S rRNA structure (Ban et al., 2000; Szymanski et al., 2002) and the location of the 4C_M1 motif (yellow). Lower portion: Location of the 4C_M1 motif from M. burtonii (yellow) superimposed on the H. marismortui 5S rRNA secondary structure; bases that are part of the 4C_M1 motif and form part of the hairpin structures in loop B and loop C (blue circles); tertiary structure interactions (black lines between bases depicted in white lettering in black circles). (C) tRNA-Ala. The secondary structure of Mbur_R0022, tRNA-Ala, was predicted using tRNAscan-SE and is depicted to show the location of the 4C_M1 motif (yellow) and bases that form the anticodon loop (secondary structure anticodon sequence) and C loop (primary sequence, underlined; secondary structure, blue circles).


Figure 6.4. Location of the nine nucleotide core sequence of 4C_M1 in the tertiary structures of 5S rRNA and tRNA. (A) Core sequence of 4C_M1 motif. The vertical intensity of each letter is proportional to the frequency of occurrence at that position. Tertiary structure of (B) C loop of 5S rRNA from H. marismortui (PDB ID: 2QA4) and (C) loop C of tRNA-Phe from S. cerevisiae (PDB ID: 1EHZ). RNA bases in the core motif are shown in stick representation using atomic colours and they are linked by lines (5S rRNA, purple; tRNA, cyan) to the motif. The portions of the two structures that face the viewer (foreground) are solvent exposed, thus, the ribbon representation appears brighter due to depth cuing in the image. The RNA backbone is represented as a ribbon shown in orange. Ribosomal proteins in (B) are shown in pale pink and green.


6.4 Discussion

6.4.1 Preference for structured RNA

Ctr3 preferentially bound to M. burtonii tRNA and 5S rRNA from both 4 and 23°C grown cells. These RNA species contained the full length motif, 4C_M1, with a solvent exposed core sequence (GUUCXXXUC) (Table 6.1, Figure 6.2). The total number of transcripts bound by Ctr3 included a larger number of tRNA species (24 from 4°C vs 12 from 23°C) and a similar number of 5S rRNA species (6 from 4°C vs 5 from 23°C) (Appendix 6, Figure 6.2). However, cellular abundance does not explain the preferential binding of tRNA. At both temperatures (4 and 23°C), the majority of tRNA were at similar levels with only six tRNA species more abundant at 4°C, and one tRNA species more abundant at 23°C (Figure 6.5, Table 6.1, Appendix 6). Therefore, additional features such as temperature-dependent modifications may play roles in preferential binding of tRNA.

The core motif of 4C_M1 represents the C loop of tRNA which appears to be solvent exposed in tRNA-Phe from S. cerevisiae (PDB ID: 1EHZ) (Figure 6.4). The core sequence is conserved in the C loop of tRNA from all domains of life (Appendix 11). In the C loop of tRNA-Phe from S. cerevisiae, uridine residues at positions 54 and 55 and the adenine residue at 58 are modified. These modifications include: methylation of uridine at position 54, modification of uridine to pseudouridine at position 55 and methylation of adenine at position 58. Modifications in nucleosides are very common in tRNA (Motorin and Grosjean, 2005) and can affect the biology of tRNA. For example, anti-codon stem and loop modifications facilitate accurate codon selection and possibly affect gene regulation (Gustilo et al., 2008). Pseudouridines (Ψ) are usually present in the hinge regions of tRNA and can play roles in structural stabilization (Charette and Grey, 2000; Motorin and Grosjean, 2005). Ψ also affects the local structure of tRNA domains (Auffinger and Westhof, 1998). In one mutational analysis in E. coli, it was shown that changing Ψ55 to U55 affects the tRNA (Du et al., 2003). During translation, Ψ are thought to modulate some of the many interactions between tRNA, rRNA and mRNA (Charette and Grey, 2000). Methylation of adenine and uridine residues can help avoid misfolding of tRNA (Motorin and Grosjean, 2005). Modification of residues in the C loop is very common and largely conserved across all domains of life (Björk, 1984; Shigi et al., 2002). In E. coli tRNA-Leu, interactions occur between specific bases in the D and C loops that affect the tertiary structure of the tRNA and its interactions with aminoacyl-tRNA synthetases, which in turn influence aminoacylation and editing reactions (Du et al., 2003). The presence of modified nucleosides in tRNA can increase the surface area by approximately 20%, which promotes effective interactions of the tRNA with cognate proteins (Björk and Kohli, 1990). Therefore, modified residues play important roles in structure stabilization and function of tRNA (Gustilo et al., 2008).

M. burtonii tRNA was previously found to have an overall low level of tRNA modification (e.g. compared to hyperthermophiles) but possessed the highest levels of dihydrouridine in tRNA for any microorganism examined at the time (Noon et al., 2003). The relative abundance of this nucleoside modification, which enables maintenance of polynucleotide flexibility at low temperatures, was similar for 4 and 23°C grown cells (Noon et al., 2003); so some other modification(s) not examined in the 2003 study perhaps affected Ctr3 binding. Genes for several tRNA modifying proteins were shown to have higher abundance at 4°C (Chapter 5; RNA-seq analysis of M. burtonii total RNA) and it was postulated that modifications of tRNA species at low temperature can facilitate low temperature function. Therefore, it is possible that the specific sites of modification are regulated by growth temperature, with changes occurring within the 4C_M1 motif that allow Ctr3 to preferentially bind to tRNA.

Three of the six 5S rRNA species are more abundant at 4°C (Figure 6.5). Although cellular abundance perhaps explains the preferential binding of 5S rRNA by Ctr3, additional features, e.g. site specific modifications, may also play roles in preferential binding. Similar to tRNA, nucleosides in 5S rRNA also exhibit modifications (Szymanski et al., 2000). For example, Ψ residues found in 5S rRNA may influence both rRNA folding and ribosome assembly (Ofengand et al., 1995; 1998). Ψ can probably also contribute to the stabilization of RNA-protein interaction (Ofengand et al., 1995).

Figure 6.5. Relative abundance of tRNA and 5S rRNA transcripts in total RNA at 4°C compared to 23°C. RNA from cultures grown at 4°C (green) and 23°C (purple). The relative abundance of tRNA and 5S rRNA is shown as a log2-fold increase for higher levels at 4°C (green) or 23°C (purple), with the 2-fold cut-off used for defining a significant increase (in all replicates) shown by an arrow. The small number of species with higher abundance from 4°C vs 23°C grown cells does not account for the larger number of tRNA transcripts bound by Ctr3 from 4°C vs 23°C.

During the purification of recombinant Ctr3, RNA from E. coli consistently co-purified with Ctr3, requiring special procedures to yield RNA-free proteins (Chapter 3). The ability of Ctr3 to capture structured M. burtonii RNA on the affinity column, and to bind and retain E. coli RNA during purification, suggests that Ctr3 might bind to similar species of RNA from E. coli. Therefore, it will be interesting to assess whether Ctr3 also binds to similar species of RNA (i.e. tRNA and 5S rRNA) from E. coli using an in vitro binding assay similar to the ones used for Ctr3 and RpoEF (DeFrancisci et al., 2011), with identification of bound RNA via RNA-seq.


6.4.2 Possible role in translation

In Bacteria, Csps have been proposed to function as transcriptional and translational activators (including autoregulation), antiterminators, and chaperones that potentially unwind or coat nascent or full-length RNA to prevent secondary structures from forming and impacting on translation (Jiang et al., 1997; Bae et al., 2000; Phadtare et al., 2002; Phadtare et al., 2010; Phadtare, 2011; Baria et al., 2013) (Figure 6.6). The cellular levels of Csps can reach 10% of total protein synthesis following cold shock treatment (Goldstein et al., 1990; Jiang et al., 1997). It was therefore concluded that Csps may function as chaperones as well as gene regulators. However, studies defining the capacity of Csps to bind specific species of whole cell RNA do not appear to have been reported in the literature. It will be interesting to assess whether any Csps can bind to specific cellular RNA targets similar to Ctr3. Based on similarities, parallels can be drawn between Csps and TRAM proteins (Table 6.2). Single TRAM proteins are prevalent in Archaea whilst they are completely absent in Bacteria. In contrast, there is scarcity of csp genes in Archaea vs prevalence in Bacteria. There are significant structural similarities between TRAM proteins and Csps; e.g. both TRAM and Csp are small proteins (60-70 kDa) that unfold reversibly and adopt the canonical β-barrel structure with a cluster of aromatic amino acids on the RNA-binding surface. Given the increased cellular abundance of both Csps and Ctr3 at low temperature, and Ctr3’s preferential binding to tRNA and 5S rRNA, it is possible that both play roles in facilitating translation at low temperature.

L5, a ribosomal protein from Bacteria, binds to 5S rRNA to form stable ribonucleoprotein complexes prior to ribosome assembly and plays an important role in the formation of the central protuberance during ribosomal assembly (Barciszewska et al., 2001; Dinman et al., 2005). L5 can bind to loop C of 5S rRNA (Perederina et al., 2002) as well as interact with the docked tRNA at the P-site (Nissen et al., 2000). L5 together with 5S rRNA is therefore believed to play important roles in stabilization of ribosome bound tRNA and to enhance ribosomal activity (Moore, 1995). Structural studies show that L5 forms a β-barrel like structure and two β strands interact with the RNA substrate (Perederina et al., 2002). Ctr3 from M. burtonii does not have any significant primary sequence similarity with L5. However, there is marked functional similarity in their predicted binding of both tRNA and 5S rRNA via the C loop (for Ctr3 via the 4C_M1 motif) (Figure 6.6). It has been suggested that archaeal L5-5S rRNA recognition is structurally mediated and this module is highly conserved (Perederina et al., 2002). Similar structurally mediated binding modules have also been previously reported in CTC-5S rRNA and S8-16S rRNA complexes (Fedorov et al., 2001; Tishchenko et al., 2001). For Ctr3, the core binding motif of 4C_M1 covers hairpin loops of tRNA and 5S rRNA (Figure 6.4). Therefore it is possible that structural recognition modules can mediate Ctr3-RNA interactions.

In the E. coli RumA-TRAM domain, the aromatic amino acids afford direct RNA-protein interactions (Lee et al., 2005). The preformed 3’ hairpin structure of the RNA is recognized by the N-terminal TRAM domain of RumA (RlmD) and the free energy of binding from the RNA-protein interaction is utilized by the catalytic domain to enhance overall catalytic efficiency (Lee et al., 2005). Based on the biophysical properties (RNA destabilizes protein) determined for Ctr3, its preferred RNA binding partners, and the structural and functional similarities Ctr3 has with both L5 and Csp proteins, three possible intracellular molecular roles for Ctr3 can be proposed: 1) binding to secondary structures of 5S rRNA and tRNA during pre-RNA processing/maturation steps with the free energy generated from protein-RNA interactions used to facilitate proper folding; 2) binding to mature 5S rRNA and tRNA (via structural motifs) with the free energy associated with interaction utilized for proficient 5S rRNA and tRNA assembly in the large ribosomal subunit; 3) binding to the 5S rRNA and tRNA that is already assembled in the ribosome with the free energy driving translation; a process that becomes inherently less efficient with decreasing growth temperature. Overall, the fact that Ctr3 has increasingly higher cellular abundance with decreasing growth temperature, with abundance peaking at -2°C (Tmin) when the cells are most cold stressed (Williams et al., 2011), is consistent with Ctr3 interacting specifically with tRNA and 5S rRNA to facilitate low temperature translation. This may be achieved by promoting low temperature ribosome biogenesis (5S rRNA), peptide elongation (tRNA) or the overall effectiveness of the translation machinery (5S rRNA and tRNA).


[Figure 6.6 diagram. Panel (A): DNA with transcription start site and open reading frame (5’ to 3’), RNAP, inhibitory secondary structure, nascent mRNA, mRNA, helicase, ribosome, translation and protein, with steps 1* to 6* marked. Panel (B): L5. Panel (C): Ctr3.]

Figure 6.6. Model showing putative cellular functions of Csps, L5 and Ctr3. (A) Proposed functional role in: 1*, transcription (La Teana et al., 1991; Brandi et al., 1994; Bae et al., 1997); 2*, auto-regulation and stabilization of secondary structures in the 5’-UTR (Jiang et al., 1996; Bae et al., 1999); 3*, anti-termination; 4*, chaperoning nascent RNA; 5*, chaperoning preformed transcripts and reducing the stability of secondary structures (La Teana et al., 1991; Bae et al., 1997; Jiang et al., 1997; Bae et al., 2000); 6*, translation (Phadtare et al., 2010; Phadtare, 2011; Baria et al., 2013). (B) L5 binding to tRNA and 5S rRNA. (C) Ctr3 binding to tRNA and 5S rRNA. The interaction of Csps, L5 and Ctr3 with their nucleic acid substrates is predicted to lead to effective translation in Bacteria and Archaea.


Table 6.2. Comparison between Csp proteins from Bacteria and Ctr3/TRAM proteins from Archaea. Some characteristics listed are not unique features. For example, a small number of csp genes have been identified in Archaea; not all csp genes are known to possess long 5’-UTRs containing a cold-box.

Feature                            Bacteria/Csp                            Archaea/Ctr3

csp gene                           Yes                                     No
Single TRAM domain gene            No                                      Yes
Molecular weight                   60 – 70 kDa                             60 – 70 kDa
Structure                          β-barrel with anti-parallel β-sheets    β-barrel with anti-parallel β-sheets
Unfolding                          Reversible                              Reversible
Nucleic acid binding surface       Cluster of conserved amino acids        Cluster of conserved amino acids
Catalytic domain                   None                                    None
Gene regulation                    Cold-box within long 5’-UTR             Cold-box within long 5’-UTR
Known protein binding substrate    DNA/RNA species                         tRNA and 5S rRNA
Cellular function                  Translation                             Translation

6.4.3 Evolutionary implications of single TRAM domain proteins

A long 5’-UTR with a cold-box sequence (shared with E. coli and Anabaena sp.) was identified for the low temperature regulated RNA helicase gene of M. burtonii, Mbur_1950 (Lim et al., 2000). In this study, a B. subtilis-like cold-box element was identified in the 5’-UTRs of the ctr2 and ctr3 genes (Chapter 5). Therefore, it can be speculated that bacterial-like mechanisms of low temperature gene regulation may have been acquired by horizontal gene transfer from Bacteria to M. burtonii. This is consistent with genomic analyses which identified the importance of genome plasticity in the evolution of M. burtonii to its Antarctic environment, including gene transfer from Epsilonproteobacteria and Deltaproteobacteria (Allen et al., 2009). In contrast to the apparent acquisition of specific mechanisms of gene regulation by horizontal gene transfer, the genes encoding TRAM domain proteins themselves are evolutionarily ancient with distinct clades diverging early within the Archaea and Bacteria (Chapter 2). The single TRAM lineage has by and large been vertically inherited (mimicking that of 16S rRNA) with gene duplication occurring relatively recently (Chapter 2). The broad representation of single TRAM domain proteins within Archaea vs apparent absence in Bacteria, in contrast to the scarcity of csp genes in Archaea vs prevalence in Bacteria, suggests they represent distinct evolutionary lineages of functionally equivalent RNA-binding proteins.

In E. coli, Csps have been proposed to assist transcription and translation at low temperatures (Jiang et al., 1997; Bae et al., 2000; Phadtare et al., 2002). Cold shock proteins from B. subtilis were reported to be crucial for protein synthesis during cold shock (Graumann et al., 1997). The expression data for M. burtonii indicate that the three TRAM proteins play roles that are particularly important for cold adaptation (Williams et al., 2011). In M. burtonii, the cellular level of Ctr3 is highest at Tmin (-2 °C), and it has been hypothesized that Ctr3 (as well as Ctr1 and Ctr2) compensates for reduced RNA helicase activity at Tmin (Williams et al., 2011). CsdA (RNA helicase) from E. coli has been shown to promote translation by resolving inhibitory secondary structures of mRNAs (Phadtare, 2011). Therefore, it is possible that Ctr3 (as well as Ctr1 and Ctr2) also has chaperone-like activity at low temperatures that can compensate for the reduced RNA helicase activity and promote translation. Nevertheless, this is clearly an area of research that requires further investigation.


Chapter 7

General discussion and future perspectives


7.1 Relationship between RNP and TRAM RNA-binding modules

The structural analysis of Csp proteins revealed that the two RNA binding motifs, RNP1 and RNP2, are located on the β-2 and β-3 strands, respectively (Bandziulis et al., 1989; Burd and Dreyfuss, 1994). The nucleic acid binding surface of Csp exposes a cluster of aromatic amino acids that are surrounded by positively charged (basic) amino acids, although the overall charge of Csp proteins is negative. This structural arrangement allows negatively charged nucleic acids to approach Csp proteins via electrostatic attraction (from the basic residues) and subsequently bind to the aromatic RNP side chains of the protein (Phadtare and Severinov, 2010). RNP1 and RNP2 are composed of Lys-Gly-Phe-Gly-Phe-Ile and Val-Phe-Val-His-Phe, located in the antiparallel strands β-2 and β-3, respectively (Feng et al., 1998). The four surface exposed Phe residues are involved in interactions with nucleic acids (Newkirk et al., 1994; Phadtare et al., 2002). The homology model of Ctr3 from M. burtonii also exhibited four surface exposed Phe residues (Figure 4.12; Chapter 4). These Phe residues in Ctr3 were postulated to be involved in RNA-binding.

CspA from E. coli and Csp from M. frigidum are structurally homologous and share 59% sequence identity (Giaquinto et al., 2007). The aromatic residues are generally conserved among Csp proteins (Feng et al., 1998). On the nucleic acid binding surface, EcCspA has five basic and four acidic residues, and MfCsp has six basic and five acidic residues (Giaquinto et al., 2007). It was postulated that a higher ratio of basic to acidic residues in Csp proteins allows these cold-adapted proteins to function at low temperatures (Giaquinto et al., 2007). Ctr3 exhibits 21% and 25% sequence identity with MfCsp and EcCspA, respectively (Figure 7.1). The ratio of basic to acidic residues predicted on the putative binding surface of Ctr3 is 5:4 (Figure 7.2), similar to the ratios found in CspA from E. coli and Csp from M. frigidum. Furthermore, two putative RNP-like motifs were identified in Ctr3: an RNP1-like motif on strand β-5 and an RNP2-like motif on strand β-3 (Figure 7.1). The arrangement of amino acid residues in the RNP1-like motif in Ctr3 (Arg-Lys-Phe-Ala-Phe-Gly) is similar to the arrangement in RNP1 in EcCspA (Lys-Gly-Phe-Gly-Phe-Ile). Both of these motifs have a hydrophobic residue (Ala in Ctr3 and Gly in CspA) sandwiched between Phe residues and surrounded by hydrophobic/basic residues to facilitate nucleic acid binding. The arrangement of amino acid residues in the RNP2-like motif in Ctr3 (Gly-Phe-Val-Ile-Phe) is also similar to the arrangement in RNP2 in EcCspA (Val-Phe-Val-His-Phe). The order of RNP1 and RNP2 can vary between CspA-like proteins and other RNP proteins (Feng et al., 1998).
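The similarity between the RNP motifs of EcCspA and the RNP-like motifs of Ctr3 can be illustrated with a position-by-position comparison. The following is a minimal Python sketch and not part of the original analysis: it uses only the motif sequences quoted in the text (converted to one-letter code), performs a simple ungapped comparison, and the helper names `positional_identity` and `summarize` are hypothetical.

```python
# Motifs as quoted in the text, converted to one-letter amino acid codes.
MOTIFS = {
    "RNP1": {"Ctr3": "RKFAFG", "EcCspA": "KGFGFI"},
    "RNP2": {"Ctr3": "GFVIF", "EcCspA": "VFVHF"},
}


def positional_identity(a: str, b: str) -> list[int]:
    """Return the 0-based positions at which two equal-length motifs match."""
    if len(a) != len(b):
        raise ValueError("motifs must have equal length for an ungapped comparison")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x == y]


def summarize(motifs=MOTIFS) -> dict[str, tuple[int, int]]:
    """Map each motif name to (number of identical positions, motif length)."""
    return {
        name: (len(positional_identity(p["Ctr3"], p["EcCspA"])), len(p["Ctr3"]))
        for name, p in motifs.items()
    }


if __name__ == "__main__":
    for name, (same, total) in summarize().items():
        print(f"{name}: {same}/{total} positions identical")
```

In this ungapped comparison the identical positions are the two Phe residues in RNP1, and the two Phe residues plus one Val in RNP2, which is in line with the emphasis placed on the conserved aromatic residues.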

RNP2 RNP1
MbCtr3 MESTAPVEAGESYDVTIEDTAREGDG-IARVSG-FVIFVPNTSV------GDEVTIKVTKVARKFAFGEVV-- 63
MfCsp MEVSYMTGKVKWFN------SEKGYGFITTDEG-QDIFAHYSQIQKDGFKSLEEGERVSFEVVDGAKGPQASDITSL 70
EcCspA -MSGKMTGIVKWFN------ADKGFGFITPDDGSKDVFVHFSAIQNDGYKSLDEGQKVSFTIESGAKGPAAGNVTSL 70
. : :: : :* * *: .* :*. : : *: *:: : . *. .::.
β-1 β-2 β-3 β-4 β-5
RNP1 RNP2

Figure 7.1. Primary sequence alignment of Csp homologs from E. coli and M. frigidum, and Ctr3 from M. burtonii. β-strands are represented by dark underlines; negatively and positively charged residues are highlighted in grey and turquoise, respectively. Red highlights RNP1 and RNP2 from E. coli and M. frigidum, and orange highlights the putative RNP1-like and RNP2-like motifs in Ctr3.

Figure 7.2. Homology model of Ctr3. The putative nucleic acid binding surface is surrounded by 5 surface exposed basic residues: three Lys and two Arg (highlighted in turquoise); and 4 surface exposed acidic residues: two Asp and two Glu (highlighted in grey). Orange highlights the Phe residues that are likely to participate in interaction with RNA.


Mutational analyses of CspB-oligonucleotide interactions (from B. subtilis) and CspA-single-stranded DNA interactions (from E. coli) have illustrated the roles that the surface exposed Phe residues play in protein-nucleic acid association (Schindelin et al., 1997; Schröder et al., 1995). The nucleic acid binding surface projects a hydrophobic dock allowing interactions with nucleic acids. These aromatic residues, together with the surface exposed basic amino acid residues, have been shown to be strongly conserved in Bacteria (Feng et al., 1998). Interestingly, Ctr3 also demonstrates similar structural features (Chapter 4). Therefore, perhaps through evolution, RNP motifs and the RNA-binding motifs in TRAM domains have converged from different RNA-binding domains to produce similar molecular recognition modules for binding.

7.2 Roles of Phe residues in the nucleic acid binding surface

Substitutions of the Phe residues in E. coli CspA, or of the corresponding residues in B. subtilis CspB, led to the loss of nucleic acid binding ability (Schröder et al., 1995; Hillier et al., 1998). It was postulated that these residues play important roles in protein function, i.e. the ability to bind and melt nucleic acids. When Phe residues in CspE were mutated to Arg (Phadtare et al., 2002), it was found that not all Phe residues played similar roles in binding and melting of nucleic acids. Substitution of the two Phe residues at positions 17 and 30 (one in each RNP) did not affect nucleic acid binding but abolished the ability to melt nucleic acids, whereas substitution of the Phe residues at positions 19 and 33 (one in each RNP) did not affect nucleic acid melting. It was concluded that perhaps the basic residues surrounding the aromatic amino acid residues were primarily involved in attracting nucleic acids, and the association of two Phe residues (at positions 19 and 33) with nucleic acids was essential for function, whilst the other Phe residues assisted in the RNA-protein interaction or played other roles, e.g. stabilization of the RNA-protein complex (Phadtare et al., 2002). Hence, it is possible that Phe residues have specific roles in the nucleic acid binding surfaces of Csp proteins. In the structural analysis of Ctr3 (Figure 4.12; Chapter 4), alignment of the RumA-TRAM domain with Ctr3 suggested that only two Phe residues (from a total of four) were involved in direct RNA-protein interaction. They were at positions 36 and 57 (Figure 4.12), one from each putative RNP (Figure 7.1), whereas the two other Phe residues at positions 33 and 59 (one in each putative RNP) were in close vicinity of Phe36 and Phe57 but were not involved in direct protein-RNA interaction. Therefore, it is possible that not all Phe residues in Ctr3 play similar roles in the nucleic acid binding surface. While two Phe residues might be involved in direct interaction with RNA, the other Phe residues may function in protein-RNA complex stabilization by assisting proper accommodation of tRNA and 5S rRNA species. Future mutational analysis of Phe residues in Ctr3 in complex with RNA substrates might help to clarify the roles of individual Phe residues.

7.3 Characteristics of Csp and TRAM proteins: possibility of common roles

E. coli encodes a total of nine csp genes (cspA to cspI), of which only four (cspA, cspB, cspG and cspI) are induced during cold shock (Baria et al., 2013). By deleting the four cold induced genes (cspA, cspB, cspG and cspI), a cold sensitive strain of E. coli, BX04, was developed that was unable to form colonies at 15 °C (Xia et al., 2001). Overexpressing any of the E. coli csp genes (except for cspI) suppressed the cold sensitivity of BX04 cells. This cold sensitive strain has been successfully used in many gene complementation analyses. In the study performed by Giaquinto et al. (2007), the Mfcsp gene and a CSD gene from M. burtonii exhibited the ability to complement the cold sensitive growth defect of BX04 cells. Based on the possible role of Ctr3 in cold adaptation (Chapter 6) and its structural similarities with MfCsp and CSD, it can be speculated that ctr genes from M. burtonii are likely to complement the cold sensitivity of E. coli BX04.

In E. coli, cspA mRNA undergoes structural rearrangement via stabilization of an RNA folding intermediate at low temperatures, and this structure favours efficient translation of cspA mRNA at cold temperatures (Giuliodori et al., 2010). Furthermore, it was suggested that cspA mRNA may respond to low temperatures and consequently adopt a functionally distinct structure (Giuliodori et al., 2010). In M. burtonii, Ctr proteins were upregulated during cold stress (-2 °C) (Williams et al., 2011). It will be interesting to see whether ctr mRNA can also respond to low temperatures and undergo structural rearrangement to facilitate low temperature translation. Whether ctr mRNA undergoes temperature dependent structural transitions can be assessed using temperature dependent gel electrophoresis across a temperature gradient (possibly between 4 and 23 °C). If the electrophoretic profile displays nonlinear, temperature-dependent mobility variation, i.e. lower mobility at low temperature but increased mobility at high temperature, the occurrence of structural rearrangement of the mRNA can be inferred (Giuliodori et al., 2010).

CspB and CspC from B. subtilis have been shown to bind to the cold-box sequences within the 5’-UTRs of cspB and cspC (Graumann et al., 1997). Binding of CspB and CspC appears to be favoured at low temperatures (Lopez et al., 1999). The 5’-UTR is prone to form secondary structure at low temperatures, in which the stem-loop structure is composed of the cold-box sequence (Bae et al., 1997; Giuliodori et al., 2010). It was postulated that binding to cold-boxes may prevent formation of inhibitory secondary structures of 5’-UTRs at low temperatures and thereby facilitate translation of CspB and CspC. Similar observations were reported for E. coli, where the 5’-UTR was shown to be responsible for cspA mRNA stability at low temperature (Mitta et al., 1997). However, the sequence content of the 5’-UTRs of csp genes from E. coli and B. subtilis differs (Lopez et al., 2001). The sequence of the cold-box from E. coli (UGACGUACAGA) is very different from the sequence of the cold-box from B. subtilis (AUUAUUUUUGUUC).
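A search for cold-box-like elements of either type in a 5’-UTR can be sketched as an ungapped scan that tolerates a small number of mismatches. The Python example below is illustrative only and is not the procedure used in Chapter 5: the two cold-box sequences are those quoted above, whereas the example UTR, the mismatch threshold of two, and the function names are assumptions introduced for illustration.

```python
# Cold-box sequences as quoted in the text (RNA alphabet).
COLD_BOXES = {
    "E. coli": "UGACGUACAGA",
    "B. subtilis": "AUUAUUUUUGUUC",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_cold_box_like(utr: str, motif: str, max_mismatches: int = 2):
    """Yield (start, window, mismatches) for ungapped near-matches of motif in utr.

    max_mismatches is an arbitrary, assumed tolerance, not a published cutoff.
    """
    utr = utr.upper().replace("T", "U")  # accept DNA or RNA input
    k = len(motif)
    for start in range(len(utr) - k + 1):
        window = utr[start:start + k]
        mismatches = hamming(window, motif)
        if mismatches <= max_mismatches:
            yield start, window, mismatches


if __name__ == "__main__":
    # Hypothetical 5'-UTR: an exact E. coli cold-box flanked by made-up sequence.
    example_utr = "GGCAAUCC" + COLD_BOXES["E. coli"] + "CCGGAUAC"
    for species, motif in COLD_BOXES.items():
        print(species, list(find_cold_box_like(example_utr, motif)))
```

A mismatch-tolerant scan is used here because cold-box-like elements are described as similar to, rather than identical with, the bacterial sequences; a position weight matrix or a dedicated motif-search tool would be a more sensitive alternative.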

Differences in cold-box content may influence the functions of Csp proteins in cold adaptation; e.g. CspA from E. coli has been suggested to facilitate transcription antitermination, whereas CspB from B. subtilis may prevent mRNA folding and facilitate translation at low temperatures (Lopez et al., 2001). Interestingly, both types of cold-box-like sequence have been identified in M. burtonii. The 5’-UTR of the RNA helicase gene Mbur_1950 has an E. coli-like cold-box element (Lim et al., 2000), whereas the 5’-UTRs of ctr2 and ctr3 contain B. subtilis-like cold-box elements (Chapter 5). Therefore, it is possible that the roles of these cold-box-like elements in Mbur_1950 and in ctr2 and ctr3 are different and consequently influence specific functions of the helicase and Ctr proteins in M. burtonii.


7.4 Evolution of cold shock and TRAM domain proteins

In a phylogenetic analysis of Csp and CSD proteins (from Bacteria and Eucarya), formation of separate functional classes of Csp and CSD proteins was not observed (Graumann and Marahiel, 1998). Instead, several Csp proteins grouped with Csps from unrelated organisms rather than with those from related species. A number of eukaryotic CSD proteins were also grouped within Csp proteins. In addition, the phylogenetic tree constructed on the basis of Csp and CSD sequences was different from the tree that was constructed based on rRNA sequences (Graumann and Marahiel, 1998). It was concluded that the groupings observed in the phylogenetic tree of Csp and CSD proteins were due to the strong sequence conservation among Csp and CSD proteins (Graumann and Marahiel, 1998). Sequence conservation of Csp and CSD proteins suggests that similar functions of these proteins may have been obtained through selective evolutionary pressure. For example, Csp proteins can play a variety of roles based on a conserved mode of function; e.g. stress response in Bacteria, viability of B. subtilis and development of Caenorhabditis elegans (Graumann and Marahiel, 1998). In contrast, sequence conservation is not observed in TRAM domain proteins (Anantharaman et al., 2001). The phylogenetic tree constructed on the basis of TRAM domain sequences from different bacterial and archaeal species exhibited formation of separate functional clades (Chapter 2). Furthermore, the TRAM domain tree approximated the phylogenetic tree constructed based on rRNA sequences (Chapter 2).

Duplication events or divergence of multiple Csp proteins within or across organisms, based on evolutionary distance, could not be determined from the reported Csp-CSD phylogenetic analysis (Graumann and Marahiel, 1998). In contrast, evolutionary distances obtained from the phylogenetic analysis of TRAM proteins provided some indication that the duplication/divergence events of single TRAM domain proteins are likely to be relatively recent (Chapter 2). Sequences of Csp and CSD remained conserved during evolution, perhaps to ensure conserved modes of function of the proteins in different organisms, whereas the lack of sequence conservation among different functional classes of TRAM domain proteins suggests that TRAM domains may have different roles in various proteins (e.g. in tRNA modifying proteins, ribosomal proteins, translation initiation factors, RNA methylases and as single TRAM domain proteins). For example, to date, only two studies have been performed to assess RNA binding properties of TRAM domains: the RumA-TRAM domain (from Bacteria) and Ctr3 (from Archaea) (Lee et al., 2005; Chapter 6). These domains share ~ 25% sequence identity and bind to different RNA species (RumA: 23S rRNA; Ctr3: 5S rRNA and tRNA). Although both TRAM domains bind to structured regions of RNA substrates, Ctr3 appears to recognize a sequence based motif (Chapter 6) whilst the RumA-TRAM domain binds to the 3’ hairpin segment which is distal to the active site (Lee et al., 2005). In contrast, Csp proteins from Bacteria and Archaea (EcCsp and MfCsp) share ~ 70% sequence identity and appear to have similar functions in cold adaptation. In addition, binding to nucleic acids possibly occurs through almost identical modes of nucleic acid-protein interaction (Giaquinto et al., 2007). Therefore, future investigations of the functions of different TRAM domains may reveal whether the lack of sequence conservation is reflected in different modes of action of TRAM domains in various proteins.

7.5 Future work and concluding remarks

In order to further validate the ability of Ctr3 to preferentially bind tRNA and 5S rRNA species, the binding affinity (Ka) of Ctr3 for RNA fragments with and without the 4C_M1 motif can be assessed using isothermal titration calorimetry (ITC). ITC provides direct measurement of the heat generated or absorbed when molecules interact (Gilbert and Batey, 2009). If Ctr3 preferentially recognizes the motif, RNA fragments containing the 4C_M1 motif would be expected to produce higher Ka values than fragments without the motif. Aside from analysing binding affinities of Ctr3, ITC can also measure ΔH and ΔS of protein-RNA interactions and determine the molar ratio (stoichiometry) of RNA-protein binding. Determination of ΔH and ΔS, along with ΔG, from ITC results can also be used to obtain insights into the mechanism of binding.
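The thermodynamic quantities mentioned above are linked by the standard relations ΔG = -RT ln Ka and ΔG = ΔH - TΔS, so that ΔG and ΔS follow once Ka and ΔH have been measured. The Python sketch below illustrates this bookkeeping only; the Ka and ΔH values are hypothetical and are not measurements for Ctr3, Ka is assumed to be expressed relative to a 1 M standard state, and the function name is introduced for illustration.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def binding_thermodynamics(ka_per_molar: float, delta_h_kj: float, temp_k: float):
    """Derive dG and dS from an ITC-style Ka and dH.

    Uses dG = -R*T*ln(Ka) and dG = dH - T*dS.
    Returns (dG in kJ/mol, dS in J/(mol*K), Kd in M).
    """
    if ka_per_molar <= 0 or temp_k <= 0:
        raise ValueError("Ka and temperature must be positive")
    delta_g_kj = -R * temp_k * math.log(ka_per_molar) / 1000.0
    delta_s_j = (delta_h_kj - delta_g_kj) * 1000.0 / temp_k
    return delta_g_kj, delta_s_j, 1.0 / ka_per_molar


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not measured values for Ctr3.
    for label, ka in [("with motif", 1e6), ("without motif", 1e4)]:
        dg, ds, kd = binding_thermodynamics(ka, delta_h_kj=-40.0, temp_k=298.15)
        print(f"{label}: dG = {dg:.1f} kJ/mol, dS = {ds:.1f} J/(mol K), Kd = {kd:.1e} M")
```

A larger Ka corresponds to a more negative ΔG, and comparing the ΔH and -TΔS contributions between motif-containing and motif-free fragments is one way the enthalpic or entropic character of binding could be examined.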

To better understand the cellular roles of Ctr3, in vivo experiments, such as RIP-Chip, can be performed (Mardis, 2007; Townley-Tilson et al., 2006). RIP-Chip is immunoprecipitation of RNA-binding proteins coupled to reverse transcription and RNA-seq. RIP-Chip allows in vivo capture of target RNA and can be performed under a range of different conditions (e.g. high vs low temperatures, logarithmic vs stationary phase, different nutrient levels). Performing in vivo experiments will provide insights into overall cellular effects on Ctr3-RNA interactions.

To assess whether Ctr3 can rescue growth defects of E. coli, complementation studies can be performed using the E. coli BX04 strain. BX04 cells harbouring ctr3 can be incubated for various time intervals at several temperatures to assess the ability of Ctr3 to complement cold sensitivity of BX04 under different conditions. If the ctr3 gene can complement the growth defect of BX04 cells at low temperatures, it can be postulated that Ctr3 is biologically active and is able to fulfil functional roles related to cold adaptation in this bacterial host.

Crystal structures of Ctr3 can be of great value in identifying RNA-binding features of the protein. In particular, structures in complex with a single-stranded RNA oligonucleotide containing the 4C_M1 motif can reveal hydrogen bonds and stacking interactions between RNA bases and aromatic side chains that characterize the binding site. Psychrophilic proteins are difficult to crystallize due to the innate flexibility of cold adapted proteins. However, a group II chaperonin protein from M. burtonii has previously been successfully crystallized and analysed using X-ray crystallography, and can serve as a precedent for future crystallization trials of Ctr3 (Pilak et al., 2011).

Methods and results described in this study should facilitate the assessment of Ctr1 and Ctr2 proteins using similar approaches. A comprehensive analysis of all three single TRAM domain proteins from M. burtonii will increase our understanding of the roles of these novel proteins in M. burtonii and their possible cellular functions in cold adaptation. The in vitro binding assays to capture RNA targets coupled with RNA-seq can be effectively used in comparative studies. Identifying RNA targets for single TRAM domain proteins from different psychrophilic (e.g. M. burtonii), halophilic (e.g. H. lacusprofundi) and thermophilic (e.g. Methanococcus jannaschii) Archaea can provide a more thorough assessment of single TRAM domain proteins in Archaea.

Collectively, the results and discussion presented in this study explored various features of Ctr3, from developing a method to purify RNA-free protein and identifying biophysical and structural characteristics relevant to RNA binding, to determining specific RNA targets and inferring cellular roles of Ctr3. The phylogenetic analysis of TRAM proteins also provided an assessment of the evolutionary relationship of TRAM proteins and other RNA binding proteins. The in vitro binding assay to capture RNA targets followed by identification using RNA-seq has proven to be a useful technique for characterizing Ctr3. A general reconstruction of the M. burtonii transcriptome using RNA-seq data was also performed as part of this study, making M. burtonii only the second psychrophilic archaeon, after M. psychrophilus (Chen et al., 2012; Li et al., 2015), whose transcriptome has been studied using RNA-seq. In conclusion, the studies presented here provide a strong foundation for future studies on TRAM domain proteins.


Bibliography

Agirrezabala, X., and Frank, J. (2009) Elongation in translation as a dynamic interaction among the ribosome, tRNA, and elongation factors EF‑G and EF‑Tu. Q Rev Biophys 42: 159-200.

Aguilar, P.S., Hernandez-Arriaga, A.M., Cybulski, L.E., Erazo, A.C., and de Mendoza, D. (2001) Molecular basis of thermosensing: a two-component signal transduction thermometer in Bacillus subtilis. EMBO J 20: 1681-1691.

Aitken, C.E., and Lorsch, J.R. (2012) A mechanistic overview of translation initiation in eukaryotes. Nat Struct Mol Biol 19: 568-576.

Albanesi, D., Mansilla, M.C., and de Mendoza, D. (2004) The membrane fluidity sensor DesK of Bacillus subtilis controls the signal decay of its cognate response regulator. J Bacteriol 186: 2655-2663.

Allen, M., Lauro, F., Williams, T., Burg, D., Siddiqui, K., DeFrancisci, D., et al. (2009) The genome sequence of the psychrophilic archaeon, Methanococcoides burtonii: the role of genome evolution in cold adaptation. ISME J 3: 1012-1035.

Anantharaman, V., Koonin, E., and Aravind, L. (2001) TRAM, a predicted RNA-binding domain, common to tRNA uracil methylation and adenine thiolation enzymes. FEMS Microbiol Lett 197: 215-221.

Anders, S., Pyl, P.T., and Huber, W. (2015) HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics 31: 166-169.

Andrade, J.M., Pobre, V., Silva, I.J., Domingues, S., and Arraiano, C.M. (2009) The role of 3′–5′ exoribonucleases in RNA degradation. Prog Mol Biol Transl Sci 85: 187-229.

Andrews, S. (2010) FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.

Aravind, L., and Koonin, E.V. (1999) Novel predicted RNA-binding domains associated with the translation machinery. J Mol Evol 48: 291-302.

Aravind, L., and Koonin, E.V. (2001) THUMP–a predicted RNA-binding domain shared by 4-thiouridine, pseudouridine synthases and RNA methylases. Trends Biochem Sci 26: 215-217.

Ashutosh, T., Prasanna, K., and Bhata, R. (2000) An efficient and cost-effective procedure for preparing sample for differential scanning calorimetry experiments. Anal Chem 284: 406-408.

Auffinger, P., and Westhof, E. (1998) Location and distribution of modified nucleotides in tRNA. In Modification and editing of RNA. Grosjean, H., and Benne, R. (eds). Washington, DC: ASM Press, pp. 569-576.

Awano, N., Rajagopal, V., Arbing, M., Patel, S., Hunt, J., Inouye, M., and Phadtare, S. (2010) Escherichia coli RNase R has dual activities, helicase and RNase. J Bacteriol 192: 1344-1352.

Backe, P.H., Messias, A.C., Ravelli, R.B., Sattler, M., and Cusack, S. (2005) X-ray crystallographic and NMR studies of the third KH domain of hnRNP K in complex with single-stranded nucleic acids. Structure 13: 1055-1067.

Bae, W., Jones, P., and Inouye, M. (1997) CspA, the major cold shock protein of Escherichia coli, negatively regulates its own gene expression. J Bacteriol 179: 7081-7088.

Bae, W., Phadtare, S., Severinov, K., and Inouye, M. (1999) Characterization of Escherichia coli cspE, whose product negatively regulates transcription of cspA, the gene for the major cold shock protein. Mol Microbiol 31: 1429-1441.

Bae, W., Xia, B., Inouye, M., and Severinov, K. (2000) Escherichia coli CspA-family RNA chaperones are transcription antiterminators. Proc Natl Acad Sci USA 97: 7784–7789.

Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., and Noble, W.S. (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37: W202-208.

Baliga, N.S., Bonneau, R., Facciotti, M.T., Pan, M., Glusman, G., Deutsch, E.W., et al. (2004) Genome sequence of Haloarcula marismortui: a halophilic archaeon from the Dead Sea. Genome Res 14: 2221-2234.

Ban, N., Nissen, B., Hansen, J., Moore, P., and Steitz, T. (2000) The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 289: 905-919.

Bandziulis, R.J., Swanson, M.S., and Dreyfuss, G. (1989) RNA-binding proteins as developmental regulators. Genes Dev 3: 431-437.

Baranov, P.V., Kubarenko, A.V., Gurvich, O.L., Shamolina, T.A., and Brimacombe, R. (1999) The database of ribosomal cross-links: an update. Nucleic Acids Res 27: 184-185.

Barciszewska, M., Szymañski, M., Erdmann, V., and Barciszewski, J. (2001) Structure and functions of 5S rRNA. Acta Biochim Pol 48: 191-198.

Baria, C., Malecki, M., and Arraiano, C.M. (2013) Bacterial adaptation to cold. Microbiology 159: 2437-2443.

Barns, S.M., Fundyga, R.E., Jeffries, M.W., and Pace, N.R. (1994) Remarkable archaeal diversity detected in a Yellowstone National Park hot spring environment. Proc Natl Acad Sci USA 91: 1609-1613.

Bell, S.D., Cairns, S.S., Robson, R.L., and Jackson, S.P. (1999) Transcriptional regulation of an archaeal operon in vivo and in vitro. Mol Cell 4: 971-982.

Bergholz, P., Bakermans, C., and Tiedje, J. (2009) Psychrobacter arcticus 273-4 uses resource efficiency and molecular motion adaptations for subzero temperature growth. J Bacteriol 191: 2340-2352.

Björk, G.R. (1984) Transfer RNA modification in different organisms. Chem Scripta 26B: 91-95.

Björk, G.R., and Kohli, J. (1990) Synthesis and Function of Modified Nucleosides in tRNA. In Chromatography and Modification of Nucleosides. Part B. Biological Roles and Function of Modification. Gehrke, C., and Kuo, K. (eds). Amsterdam: Elsevier, pp. B13-B67.

Boguski, M.S., Tolstoshev, C.M., and Bassett, D.E.Jr. (1994) Gene discovery in dbEST. Science 265: 1993-1994.

Bolhuis, H., Poele, E.M., and Rodriguez-Valera, F. (2004) Isolation and cultivation of Walsby's square archaeon. Environ Microbiol 6: 1287-1291.

Bottomly, D., Walter, N.A., Hunter, J.E., Darakjian, P., Kawane, S., Buck, K.J., et al. (2011) Evaluating gene expression in C57BL/6J and DBA/2J mouse striatum using RNA-Seq and microarrays. PLoS One 6: e17820.

Braddock, D.T., Baber, J.L., Levens, D., and Clore, G.M. (2002) Molecular basis of sequence-specific single-stranded DNA recognition by KH domains: solution structure of a complex between hnRNP K KH3 and single-stranded DNA. EMBO J 21: 3476–3485.

Bradford, M.M. (1976) A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 72: 248-254.

Brandi, A., Pon, C., and Gualerzi, C. (1994) Interaction of the main cold shock protein CS7.4 (CspA) of Escherichia coli with the promoter region of hns. Biochimie 76: 1090-1098.

Braun, P., and LaBaer, J. (2003) High throughput protein production for functional proteomics. Trends Biotechnol 21: 383–388.

Bruneel, O., Pascault, N., Egal, M., Bancon-Montigny, C., Goñi-Urriza, M.S., Elbaz-Poulichet, F., et al. (2008) Archaeal diversity in a Fe–As rich acid mine drainage at Carnoulès (France). Extremophiles 12: 563-571.

Burd, C.G., and Dreyfuss, G. (1994) Conserved structures and diversity of functions of RNA-binding proteins. Science 265: 615-621.

Burg, D.W., Lauro, F.M., Williams, T.J., Raftery, M.J., Guilhaus, M., and Cavicchioli, R. (2010) Analyzing the hydrophobic proteome of the antarctic archaeon Methanococcoides burtonii using differential solubility fractionation. J Proteome Res 9: 664-676.

Bycroft, M., Hubbard, T.J., Proctor, M., Freund, S.M., and Murzin, A.G. (1997) The solution structure of the S1 RNA binding domain: a member of an ancient nucleic acid-binding fold. Cell 88: 235-242.

Cairrão, F., Cruz, A., Mori, H., and Arraiano, C.M. (2003) Cold shock induction of RNase R and its role in the maturation of the quality control mediator SsrA/tmRNA. Mol Microbiol 50: 1349-1360.

Caldas, T., Binet, E., Bouloc, P., Costa, A., Desgres, J., and Richarme, G. (2000) The FtsJ/RrmJ heat shock protein of Escherichia coli is a 23S ribosomal RNA methyltransferase. J Biol Chem 275: 16414-16419.

Caldas, T., Laalami, S., and Richarme, G. (2000) Chaperone properties of bacterial elongation factor EF-G and initiation factor IF2. J Biol Chem 275: 855–860.

Campanaro, S., Pascale, F., Telatin, A., Schiavon, R., Bartlett, D., and Valle, G. (2012) The transcriptional landscape of the deep-sea bacterium Photobacterium profundum in both a toxR mutant and its parental strain. BMC Genomics 13: 567.

Campanaro, S., Williams, T., DeFrancisci, D., Treu, L., Lauro, F., and Cavicchioli, R. (2011) Temperature-dependent global gene expression in the Antarctic archaeon, Methanococcoides burtonii. Environ Microbiol 13: 2018-2038.

Campbell, Z.T., Bhimsaria, D., Valley, C.T., Rodriguez-Martinez, J.A., Menichelli, E., Williamson, J.R., et al. (2012) Cooperativity in RNA-protein interactions: global analysis of RNA binding specificity. Cell Rep 1: 570-581.

Cao-Hoang, L., Dumont, F., Marechal, P.A., and Gervais, P. (2010) Inactivation of Escherichia coli and Lactobacillus plantarum in relation to membrane permeabilization due to rapid chilling followed by cold storage. Arch Microbiol 192: 299-305.

Casanueva, A., Tuffin, M., Cary, C., and Cowan, D.A. (2010) Molecular adaptations to psychrophily: the impact of ‘omic’ technologies. Trends Microbiol 18: 374-381.

Cavicchioli, R. (2006) Cold-adapted archaea. Nat Rev Microbiol 4: 331-343.

Cavicchioli, R. (2011) Archaea–timeline of the third domain. Nat Rev Microbiol 9: 51-61.

Cavicchioli, R. (2015) Microbial ecology of Antarctic aquatic systems. Nat Rev Microbiol 13: 691-706.

Cavicchioli, R., Curmi, P., Siddiqui, K., and Thomas, T. (2006) Proteins from psychrophiles. In Methods in Microbiology. Rainey, F.A., and Oren, A. (eds). London, UK: Academic Press, Elsevier, pp. 395–436.

Cavicchioli, R., DeMaere, M.Z., and Thomas, T. (2007) Metagenomic studies reveal the critical and wide‐ranging ecological importance of uncultivated archaea: the role of ammonia oxidizers. BioEssays 29: 11-14.

Chaikam, V., and Karlson, D.T. (2010) Comparison of structure, function and regulation of plant cold shock domain proteins to bacterial and animal cold shock domain proteins. BMB Rep 43: 1-8.

Chamot, D., Magee, W.C., Yu, E., and Owttrim, G.W. (1999) A cold shock-induced cyanobacterial RNA helicase. J Bacteriol 181: 1728-1732.

Charette, M., and Gray, M. (2000) Pseudouridine in RNA: what, where, how and why. IUBMB Life 49: 341-351.

Chen, Z., Yu, H., Li, L., Hu, S., and Dong, X. (2012) The genome and transcriptome of a newly described psychrophilic archaeon, Methanolobus psychrophilus R15, reveal its cold adaptive characteristics. Environ Microbiol Rep 4: 633-641.

Clery, A., Blatter, M., and Allain, F.H. (2008) RNA recognition motifs: boring? Not quite. Curr Opin Struc Biol 18: 290-298.

Cloonan, N., Forrest, A.R., Kolle, G., Gardiner, B.B., Faulkner, G.J., Brown, M.K., et al. (2008) Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods 5: 613-619.

Cooper, A. (2000) Microcalorimetry of protein-DNA interactions. In DNA-Protein Interactions. Travers, A., and Buckle, M. (eds). Oxford, UK: Oxford University Press, pp. 125-139.

Cox, M.P., Peterson, D.A., and Biggs, P.J. (2010) SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinf 11: 485.

Dalluge, J.J., Hamamoto, T., Horikoshi, K., Morita, R.Y., Stetter, K.O., and McCloskey, J.A. (1997) Posttranscriptional modification of tRNA in psychrophilic bacteria. J Bacteriol 179: 1918-1923.

Dalluge, J.J., Hashizume, T., Sopchik, A.E., McCloskey, J.A., and Davis, D.R. (1996) Conformational flexibility in RNA: the role of dihydrouridine. Nucleic Acids Res 24: 1073-1079.

D'Amico, S., Collins, T., Marx, J.C., Feller, G., and Gerday, C. (2006) Psychrophilic microorganisms: challenges for life. EMBO Rep 7: 385-389.

D'Amico, S., Gerday, C., and Feller, G. (2003) Temperature adaptation of proteins: engineering mesophilic-like activity and stability in a cold-adapted α-amylase. J Mol Biol 332: 981-988.

D'Amico, S., Marx, J.C., Gerday, C., and Feller, G. (2003b) Activity-stability relationship in extremophilic enzymes. J Biol Chem 278: 7891-7896.

David, L., Huber, W., Granovskaia, M., Toedling, J., Palm, C.J., Bofkin, L., et al. (2006) A high-resolution map of transcription in the yeast genome. Proc Natl Acad Sci USA 103: 5320-5325.

DeFrancisci, D., Campanaro, S., Kornfeld, G., Siddiqui, K., Williams, T., Ertan, H., Treu, L., Pilak, O., et al. (2011) The RNA polymerase subunits E/F from the Antarctic archaeon Methanococcoides burtonii bind to specific species of mRNA. Environ Microbiol 13: 2039-2055.

Dennis, P.P. (1997) Ancient ciphers: translation in Archaea. Cell 89: 1007-1010.

Dinman, J. (2005) 5S rRNA: Structure and function from head to toe. Int J Biomed Sci 1: 2-7.

Dreyfuss, G., Kim, V.N., and Kataoka, N. (2002) Messenger-RNA-binding proteins and the messages they carry. Nature Rev Mol Cell Biol 3: 195-205.

Du, X., and Wang, E. (2003) Tertiary structure base pairs between D- and TΨC-loops of Escherichia coli tRNALeu play important roles in both aminoacylation and editing. Nucleic Acids Res 31: 2865-2872.

Dubendorff, J.W., and Studier, F.W. (1991) Controlling basal expression in an inducible T7 expression system by blocking the target T7 promoter with lac repressor. J Mol Biol 219: 45-59.

Duss, O., Michel, E., Diarra dit Konté, N., Schubert, M., and Allain, F. (2014) Molecular basis for the wide range of affinity found in Csr/Rsm protein–RNA recognition. Nucleic Acids Res 42: 5332-5346.

Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.

Edwards, K.J., Bond, P.L., Gihring, T.M., and Banfield, J.F. (2000) An archaeal iron-oxidizing extreme acidophile important in acid mine drainage. Science 287: 1796-1799.

Epand, R.F., Epand, R.M., and Jung, C.Y. (2001) Ligand‐modulation of the stability of the glucose transporter GLUT 1. Protein Sci 10: 1363-1369.

Falz, K.Z., Holliger, C., Grosskopf, R., Liesack, W., Nozhevnikova, A.N., Müller, B., et al. (1999) Vertical distribution of methanogens in the anoxic sediment of Rotsee (Switzerland). Appl Environ Microbiol 65: 2402-2408.

Fang, L., Hou, Y., and Inouye, M. (1998) Role of the cold-box region in the 5' untranslated region of the cspA mRNA in its transient expression at low temperature in Escherichia coli. J Bacteriol 180: 90-95.

Fang, L., Jiang, W., Bae, W., and Inouye, M. (1997) Promoter independent cold-shock induction of cspA and its derepression at 37°C by mRNA stabilisation. Mol Microbiol 23: 355-364.

Fedorov, R., Meshcheryakov, V., Gongadze, G., Fomenkova, N., Nevskaya, N., Selmer, M., et al. (2001) Structure of ribosomal protein TL5 complexed with RNA provides new insights into the CTC family of stress proteins. Acta Crystallogr Section D: Biol Crystallogr 57: 968-976.

Feller, G. (2013) Psychrophilic enzymes: from folding to function and biotechnology. Scientifica. doi: http://dx.doi.org/10.1155/2013/512840.

Feller, G., Bussy, O., and Gerday, C. (1998) Expression of psychrophilic genes in mesophilic hosts: assessment of the folding state of a recombinant α-Amylase. Appl Environ Microbiol 64: 1163-1165.

Feller, G., and Gerday, C. (1997) Psychrophilic enzymes: molecular basis of cold adaptation. Cell Mol Life Sci 53: 830-841.

Feller, G., and Gerday, C. (2003) Psychrophilic enzymes: hot topics in cold adaptation. Nature Rev Microbiol 1: 200-208.

Feller, G., D'Amico, D., and Gerday, C. (1999) Thermodynamic stability of a cold-active α-amylase from the Antarctic bacterium Alteromonas haloplanctis. Biochemistry 38: 4613-4619.

Feller, G., Zekhnini, Z., Lamotte-Brasseur, J., and Gerday, C. (1997) Enzymes from cold adapted microorganisms. The class C beta-lactamase from the Antarctic psychrophile Psychrobacter immobilis A5. Eur J Biochem 244: 186-191.

Feng, W., Tejero, R., Zimmerman, D.E., Inouye, M., and Montelione, G.T. (1998) Solution NMR structure and backbone dynamics of the major cold-shock protein (CspA) from Escherichia coli: evidence for conformational dynamics in the single-stranded RNA-binding site. Biochemistry 37: 10881-10896.

Franzmann, P., Springer, N., Ludwig, W., Conway de Macario, E., and Rohde, M. (1992) A methanogenic archaeon from Ace Lake, Antarctica: Methanococcoides burtonii sp. nov. Syst Appl Microbiol 15: 573-581.

Franzmann, P.D., Liu, Y., Balkwill, D.L., Aldrich, H.C., De Macario, E.C., and Boone, D.R. (1997) Methanogenium frigidum sp. nov., a psychrophilic, H2-using methanogen from Ace Lake, Antarctica. Int J Syst Evol Microbiol 47: 1068-1072.

Franzmann, P.D., Stackebrandt, E., Sanderson, K., Volkman, J.K., Cameron, D.E., Stevenson, P.L., et al. (1988) Halobacterium lacusprofundi, sp. nov., a halophilic bacterium isolated from Deep Lake, Antarctica. Syst Appl Microbiol 11: 20-27.

Freire, E. (1995) Thermal denaturation methods in the study of protein folding. Methods Enzymol 259: 144-168.

Freire, E., Osdol, W.V., Mayorga, O.L., and Sanchez-Ruiz, J.M. (1990) Calorimetrically determined dynamics of complex unfolding transitions in proteins. Annu Rev Biophys Biophys Chem 19: 159-188.

Frith, M.C., Saunders, N.F.W., Kobe, B., and Bailey, T.L. (2008) Discovering sequence motifs with arbitrary insertions and deletions. PLoS Comp Biol 4: e1000071.

Fu, X., Fu, N., Guo, S., Yan, Z., Xu, Y., Hu, H., et al. (2009) Estimating accuracy of RNA-Seq and microarrays with proteomics. BMC Genomics 10: 161.

Gao, H., Yang, Z., Wu, L., Thompson, D., and Zhou, J. (2006) Global transcriptome analysis of the cold shock response of Shewanella oneidensis MR-1 and mutational analysis of its classical cold shock proteins. J Bacteriol 188: 4560-4569.

Garcia-Mira, M.M., Boehringer, D., and Schmid, F.X. (2004) The folding transition state of the cold shock protein is strongly polarized. J Mol Biol 339: 555-569.

Georlette, D., Blaise, V., Collins, T., D'Amico, S., Gratia, E., Hoyoux, A., et al. (2004) Some like it cold: biocatalysis at low temperatures. FEMS Microbiol Rev 28: 25-42.

Gerday, C., Aittaleb, M., Arpigny, J.L., Baise, E., Chessa, J.P., Garsoux, G., et al. (1997) Psychrophilic enzymes: a thermodynamic challenge. Biochim Biophys Acta 1342: 119-131.

Gerhard, D.S., Wagner, L., Feingold, E.A., Shenmen, C.M., Grouse, L.H., Schuler, G., et al. (2004) The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC). Genome Res 14: 2121-2127.

Giaquinto, L., Curmi, P., Siddiqui, K., Poljak, A., DeLong, E., DasSarma, S., and Cavicchioli, R. (2007) Structure and function of cold shock proteins in archaea. J Bacteriol 189: 5738-5748.

Gibson, J.A., Miller, M.R., Davies, N.W., Neill, G.P., Nichols, D.S., and Volkman, J.K. (2005) Unsaturated diether lipids in the psychrotrophic archaeon Halorubrum lacusprofundi. Syst Appl Microbiol 28: 19-26.

Gibson, T.J., Thompson, J.D., and Heringa, J. (1993) The KH domain occurs in a diverse set of RNA-binding proteins that include the antiterminator NusA and is probably involved in binding to nucleic acid. FEBS Lett 324: 361-366.

Gilbert, S.D., and Batey, R.T. (2009) Monitoring RNA-ligand interactions using isothermal titration calorimetry. Methods Mol Biol 540: 97-114.

Giuliodori, A.M., Brandi, A., Gualerzi, C.O., and Pon, C.L. (2004) Preferential translation of cold-shock mRNAs during cold adaptation. RNA 10: 265-276.

Giuliodori, A.M., Di Pietro, F., Marzi, S., Masquida, B., Wagner, R., Romby, P., et al. (2010) The cspA mRNA is a thermosensor that modulates translation of the cold-shock protein CspA. Mol Cell 37: 21-33.

Glasel, J. (1995) Validity of nucleic acid purities monitored by 260/280 absorbance ratios. BioTechniques 18: 62-63.

Goldstein, J., Pollitt, N., and Inouye, M. (1990) Major cold shock protein of Escherichia coli. Proc Natl Acad Sci USA 87: 283-287.

González, J.M., Masuchi, Y., Robb, F.T., Ammerman, J.W., Maeder, D.L., Yanagibayashi, M., et al. (1998) Pyrococcus horikoshii sp. nov., a hyperthermophilic archaeon isolated from a hydrothermal vent at the Okinawa Trough. Extremophiles 2: 123-130.

Goodchild, A., Raftery, M., Saunders, N.F., Guilhaus, M., and Cavicchioli, R. (2004a) Biology of the cold adapted archaeon, Methanococcoides burtonii determined by proteomics using liquid chromatography-tandem mass spectrometry. J Proteome Res 3: 1164-1176.

Goodchild, A., Raftery, M., Saunders, N.F., Guilhaus, M., and Cavicchioli, R. (2005) Cold adaptation of the Antarctic archaeon, Methanococcoides burtonii assessed by proteomics using ICAT. J Proteome Res 4: 473-480.

Goodchild, A., Saunders, N.F., Ertan, H., Raftery, M., Guilhaus, M., Curmi, P.M., and Cavicchioli, R. (2004b) A proteomic determination of cold adaptation in the Antarctic archaeon, Methanococcoides burtonii. Mol Microbiol 53: 309-321.

Grant, C.E., Bailey, T.L., and Noble, W.S. (2011) FIMO: scanning for occurrences of a given motif. Bioinformatics 27: 1017-1018.

Gräslund, S., Nordlund, P., Weigelt, J., Hallberg, B.M., Bray, J., Gileadi, O., et al. (2008) Protein production and purification. Nat Methods 5: 135-146.

Graumann, P., and Marahiel, M.A. (1994) The major cold shock protein of Bacillus subtilis CspB binds with high affinity to the ATTGG- and CCAAT sequences in single stranded oligonucleotides. FEBS Lett 338: 157-160.

Graumann, P., Schröder, K., Schmid, R., and Marahiel, M.A. (1996) Cold shock stress-induced proteins in Bacillus subtilis. J Bacteriol 178: 4611-4619.

Graumann, P., Wendrich, T., Weber, M., Schröder, K., and Marahiel, M. (1997) A family of cold shock proteins in Bacillus subtilis is essential for cellular growth and for efficient protein synthesis at optimal and low temperatures. Mol Microbiol 25: 741-756.

Graumann, P.L., and Marahiel, M.A. (1998) A superfamily of proteins that contain the cold-shock domain. Trends Biochem Sci 23: 286-290.

Gualerzi, C.O., Giuliodori, A.M., and Pon, C.L. (2003) Transcriptional and post-transcriptional control of cold-shock genes. J Mol Biol 331: 527-539.

Guenther, U., Yandek, L., Niland, C., Campbell, F., Anderson, D., Anderson, V., Harris, M., and Jankowsky, E. (2013) Hidden specificity in an apparently nonspecific RNA-binding protein. Nature 502: 385-388.

Gustafsson, C., Govindarajan, S., and Minshull, J. (2004) Codon bias and heterologous protein expression. Trends Biotechnol 22: 346-353.

Gustilo, E., Vendeix, F., and Agris, P. (2008) tRNA’s modifications bring order to gene expression. Curr Opin Microbiol 11: 134-140.

Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., et al. (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141: 129-141.

Hanna, M.M., and Liu, K. (1998) Nascent RNA in transcription complexes interact with CspE, a small protein in E. coli implicated in chromatin condensation. J Mol Biol 282: 227-239.

Hanzelka, B.L., Darcy, T.J., and Reeve, J.N. (2001) TFE, an archaeal transcription factor in Methanobacterium thermoautotrophicum related to eucaryal transcription factor TFIIEalpha. J Bacteriol 183: 1813-1818.

Hartl, F.U., and Hayer-Hartl, M. (2002) Molecular chaperones in the cytosol: From nascent chain to folded protein. Science 295: 1852-1858.

Hasegawa, M., and Hashimoto, T. (1993) Ribosomal RNA Trees Misleading? Nature 361: 23.

Hausner, W., Frey, G., and Thomm, M. (1991) Control regions of an archaeal gene. A TATA box and an initiator element promote cell-free transcription of the tRNA(Val) gene of Methanococcus vannielii. J Mol Biol 222: 495-508.

Hawe, A., Sutter, M., and Jiskoot, W. (2008) Extrinsic fluorescent dyes as tools for protein characterization. Pharm Res 25: 1487-1499.

Hillier, B.J., Rodriguez, H.M., and Gregoret, L.M. (1998) Coupling protein stability and protein function in Escherichia coli CspA. Fold Des 3: 87-93.

Hinnebusch, J., and Tilly, K. (1993) Linear plasmids and chromosomes in bacteria. Mol Microbiol 10: 917-922.

Hirata, A., Klein, B.J., and Murakami, K.S. (2008a) The X-ray crystal structure of RNA polymerase from Archaea. Nature 451: 851-854.

Hoeijmakers, W.A., Bártfai, R., and Stunnenberg, H.G. (2013) Transcriptome analysis using RNA-Seq. Malaria: Methods and Protocols 221-239.

Hofacker, I.L. (2004) RNA secondary structure analysis using the Vienna RNA package. Curr Protoc Bioinformatics Chapter 12: Unit 12.2.

Horn, G., Hofweber, R., Kremer, W., and Kalbitzer, H.R. (2007) Structure and function of bacterial cold shock proteins. Cell Mol Life Sci 64: 1457-1470.

Hu, K.H., Liu, E., Dean, K., Gingras, M., DeGraff, W., and Trun, N.J. (1996) Overproduction of three genes leads to camphor resistance and chromosome condensation in Escherichia coli. Genetics 143: 1521-1532.

Hudson, B.P., Martinez-Yamout, M.A., Dyson, H.J., and Wright, P.E. (2004) Recognition of the mRNA AU-rich element by the zinc finger domain of TIS11d. Nature Struct Mol Biol 11: 257-264.

Jankowsky, E., and Harris, E. (2015) Specificity and nonspecificity in RNA-protein interactions. Nature Rev Mol Cell Biol 16: 533-544.

Jeanthon, C., L'Haridon, S., Reysenbach, A.L., Vernet, M., Messner, P., Sleytr, U.B., and Prieur, D. (1998) Methanococcus infernus sp. nov., a novel hyperthermophilic lithotrophic methanogen isolated from a deep-sea hydrothermal vent. Int J Syst Bacteriol 48: 913-919.

Jiang, W., Fang, L., and Inouye, M. (1996) The role of the 5’-end untranslated region of the mRNA for CspA, the major cold-shock protein of Escherichia coli, in cold-shock adaptation. J Bacteriol 178: 4919-4925.

Jiang, W., Hou, Y., and Inouye, M. (1997) CspA, the major cold-shock protein of Escherichia coli, is an RNA chaperone. J Biol Chem 272: 196-202.

Johns, G.C., and Somero, G.N. (2004) Evolutionary convergence in adaptation of proteins to temperature: A4-lactate dehydrogenases of Pacific damselfishes (Chromis spp.). Mol Biol Evol 21: 314-320.

Jones, P.G., Mitta, M., Kim, Y., Jiang, W., and Inouye, M. (1996) Cold-shock induces a major ribosomal associated protein that unwinds double stranded RNA in Escherichia coli. Proc Natl Acad Sci USA 93: 76-80.

Kandror, O., and Goldberg, A.L. (1997) Trigger factor is induced upon cold shock and enhances viability of Escherichia coli at low temperatures. Proc Natl Acad Sci USA 94: 4978-4981.

Karner, M.B., DeLong, E.F., and Karl, D.M. (2001) Archaeal dominance in the mesopelagic zone of the Pacific Ocean. Nature 409: 507-510.

Keller, M., and Zengler, K. (2004) Tapping into microbial diversity. Nat Rev Microbiol 2: 141-150.

Kelly, S.M., Jess, T.J., and Price, N.C. (2005) How to study proteins by circular dichroism. Biochim Biophys Acta 1751: 119-139.

Koide, T., Reiss, D.J., Bare, J.C., Pang, W.L., Facciotti, M.T., Schmid, A.K., et al. (2009) Prevalence of transcription promoters within archaeal operons and coding sequences. Mol Syst Biol 5: 285.

Koonin, E.V., Wolf, Y.I., and Aravind, L. (2000) Protein fold recognition using sequence profiles and its application in structural genomics. Adv Protein Chem 54: 245-275.

Kramer, G., Rauch, T., Rist, W., Vorderwulbecke, S., Patzelt, H., Schulze-Specking, A., et al. (2002) L23 protein functions as a chaperone docking site on the ribosome. Nature 419: 171-174.

Kublanov, I.V., Bidjieva, S., Mardanov, A.V., and Bonch-Osmolovskaya, E.A. (2009) Desulfurococcus kamchatkensis sp. nov., a novel hyperthermophilic protein-degrading archaeon isolated from a Kamchatka hot spring. Int J Syst Evol Microbiol 59: 1743-1747.

Kumar, S., Nei, M., Dudley, J., and Tamura, K. (2008) MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform 9: 299-306.

La Teana, A., Brandi, A., Falconi, M., Spurio, R., Pon, C., and Gualerzi, C. (1991) Identification of a cold shock transcriptional enhancer of the Escherichia coli gene encoding nucleoid protein H-NS. Proc Natl Acad Sci USA 88: 10907-10911.

Laemmli, U.K. (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227: 680-685.

Lambert, N., Robertson, A., Jangi, M., McGeary, S., Sharp, P.A., and Burge, C.B. (2014) RNA Bind-n-Seq: quantitative assessment of the sequence and structural binding specificity of RNA binding proteins. Mol Cell 54: 887-900.

Langer, D., Hain, J., Thuriaux, P., and Zillig, W. (1995) Transcription in archaea: similarity to that in eucarya. Proc Natl Acad Sci USA 92: 5768-5772.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25.

Lauro, F.M., Allen, M., Wilkins, D., Williams, T.J., and Cavicchioli, R. (2010) Genetics, genomics and evolution of psychrophiles. In Extremophiles Handbook. Horikoshi, K. (ed.). Tokyo, Japan: Springer Japan KK.

Lauro, F.M., DeMaere, M.Z., Yau, S., Brown, M.V., Ng, C., Wilkins, D., et al. (2011) An integrative study of a meromictic lake ecosystem in Antarctica. ISME J 5: 879-895.

Lee, H.J., Jeon, H.J., Ji, S.C., Yun, S.H., and Lim, H.M. (2008) Establishment of an mRNA gradient depends on the promoter: an investigation of polarity in gene expression. J Mol Biol 378: 318-327.

Lee, T., Agarwalla, S., and Stroud, R. (2004) Crystal structure of RumA, an iron-sulfur cluster containing E. coli ribosomal RNA 5-methyluridine methyltransferase. Structure 12: 397-407.

Lee, T., Agarwalla, S., and Stroud, R. (2005) A unique RNA fold in the RumA-RNA-cofactor ternary complex contributes to substrate selectivity and enzymatic function. Cell 120: 599-611.

Leininger, S., Urich, T., Schloter, M., Schwark, L., Qi, J., Nicol, G.W., et al. (2006) Archaea predominate among ammonia-oxidizing prokaryotes in soils. Nature 442: 806-809.

Lelivelt, M.J., and Kawula, T.H. (1995) Hsc66, an Hsp70 homolog in Escherichia coli, is induced by cold shock but not by heat shock. J Bacteriol 177: 4900-4907.

Li, J., Qi, L., Guo, Y., Yue, L., Li, Y., Ge, W., et al. (2015) Global mapping transcriptional start sites revealed both transcriptional and post-transcriptional regulation of cold adaptation in the methanogenic archaeon Methanolobus psychrophilus. Scientific reports. doi:10.1038/srep09209.

Licatalosi, D.D., and Darnell, R.B. (2010) RNA processing and its regulation: global insights into biological networks. Nat Rev Genet 11: 75-87.

Lim, J., Thomas, T., and Cavicchioli, R. (2000) Low temperature regulated DEAD-box RNA helicase from the Antarctic archaeon, Methanococcoides burtonii. J Mol Biol 297: 553-567.

Lister, R., O'Malley, R.C., Tonti-Filippini, J., Gregory, B.D., Berry, C.C., Millar, A.H., and Ecker, J.R. (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133: 523-536.

Lonhienne, T., Gerday, C., and Feller, G. (2000) Psychrophilic enzymes: revisiting the thermodynamic parameters of activation may explain local flexibility. Biochim Biophys Acta 1543: 1-10.

Lopez, M.M., and Makhatadze, G.I. (2000) Major cold shock proteins, CspA from Escherichia coli and CspB from Bacillus subtilis, interact differently with single-stranded DNA templates. Biochim Biophys Acta 1479: 196-202.

Lopez, M.M., Yutani, K., and Makhatadze, G.I. (1999) Interactions of the major cold shock protein of Bacillus subtilis CspB with single-stranded DNA templates of different base composition. J Biol Chem 274: 33601-33608.

Lopez, M.M., Yutani, K., and Makhatadze, G.I. (2001) Interactions of the Cold Shock Protein CspB from Bacillus subtilis with Single-stranded DNA: importance of the T base content and position within the template. J Biol Chem 276: 15511-15518.

Lunde, B.M., Moore, C., and Varani, G. (2007) RNA-binding proteins: modular design for efficient function. Nat Rev Mol Cell Biol 8: 479-490.

Lundgren, M., Andersson, A., Chen, L., Nilsson, P., and Bernander, R. (2004) Three replication origins in Sulfolobus species: synchronous initiation of chromosome replication and asynchronous termination. Proc Natl Acad Sci USA 101: 7046-7051.

Mardis, E.R. (2007) ChIP-seq: welcome to the new frontier. Nat Methods 4: 613-614.

Marenchino, M., Armbruster, D.W., and Hennig, M. (2009) Rapid and efficient purification of RNA-binding proteins: application to HIV-1 Rev. Protein Expression Purif 63: 112-119.

Marioni, J.C., Mason, C.E., Mane, S.M., Stephens, M., and Gilad, Y. (2008) RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 18: 1509-1517.

Markowitz, V., Chen, I., Palaniappan, K., Chu, K., Szeto, E., Grechkin, Y., et al. (2010) The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res 38: 382-390.

Martin, R., Straub, A.U., Doebele, C., and Bohnsack, M.T. (2013) DExD/H-box RNA helicases in ribosome biogenesis. RNA Biol 10: 4-18.

Metpally, R.P.R., and Reddy, B.V.B. (2009) Comparative proteome analysis of psychrophilic versus mesophilic bacterial species: Insights into the molecular basis of cold adaptation of proteins. BMC Genomics 10: 1-10.

Mihaescu, R., Levy, D., and Pachter, L. (2009) Why neighbor-joining works. Algorithmica 54: 1-24.

Mihailovich, M., Militti, C., Gabaldón, T., and Gebauer, F. (2010) Eukaryotic cold shock domain proteins: highly versatile regulators of gene expression. BioEssays 32: 109-118.

Mikheev, Y.A., Guseva, L.N., Davydov, E.Y., and Ershov, Y.A. (2007) The hydration of hydrophobic substances. Russ J Phys Chem A 81: 1897-1913.

Mitta, M., Fang, L., and Inouye, M. (1997) Deletion analysis of cspA of Escherichia coli: requirement of the AT‐rich UP element for cspA transcription and the downstream box in the coding region for its cold shock induction. Mol Microbiol 26: 321-335.

Moore, P.B. (1995) Structure and function of 5S RNA. In Ribosomal RNA: Structure, Evolution, Processing and Function in Protein Synthesis. Zimmermann, R.A., and Dahlberg, A.E. (eds). Boca Raton, USA: CRC Press, pp. 199-236.

Motorin, Y., and Grosjean, H. (2005) Transfer RNA modification. In: eLS. John Wiley & Sons Ltd, Chichester. doi:10.1002/9780470015902.

Murzin, A. (1993) OB (oligonucleotide/oligosaccharide binding)-fold: common structural and functional solution for non-homologous sequences. EMBO J 12: 861-867.

Nalin, C.M., Purcell, R.D., Antelman, D., Mueller, D., Tomchak, L., Wegrzynski, B., et al. (1990) Purification and characterization of recombinant Rev protein of human immunodeficiency virus type 1. Proc Natl Acad Sci USA 87: 7593-7597.

Newkirk, K., Feng, W., Jiang, W., Tejero, R., Emerson, S.D., Inouye, M., and Montelione, G.T. (1994) Solution NMR structure of the major cold shock protein (CspA) from Escherichia coli: identification of a binding epitope for DNA. Proc Natl Acad Sci USA 91: 5114-5118.

Nichols, D.S., Miller, M.R., Davies, N.W., Goodchild, A., Raftery, M., and Cavicchioli, R. (2004) Cold adaptation in the Antarctic archaeon Methanococcoides burtonii involves membrane lipid unsaturation. J Bacteriol 186: 8508-8515.

Nissen, P., Hansen, J., Ban, N., Moore, P., and Steitz, T. (2000) The structural basis of ribosomal activity in peptide bond synthesis. Science 289: 920-930.

Noon, K., Guymon, R., Crain, P., McCloskey, J., Thomm, M., Lim, J., and Cavicchioli, R. (2003) Influence of temperature on tRNA modification in archaea: Methanococcoides burtonii (optimum growth temperature [Topt], 23 degrees C) and Stetteria hydrogenophila (Topt, 95 degrees C). J Bacteriol 185: 5483-5490.

Notredame, C., Higgins, D.G., and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 302: 205-217.

Ofengand, J., and Fournier, M. (1998) The pseudouridine residues of rRNA: number, location, biosynthesis, and function. In Modification and Editing of RNA. Grosjean, H., and Benne, R. (eds). Washington, DC: ASM Press, pp. 229-253.

Ofengand, J., Bakin, A., Wrzesinski, J., Nurse, K., and Lane, B. (1995) The pseudouridine residues of ribosomal RNA. Biochem Cell Biol 73: 915-924.

Oubridge, C., Ito, N., Evans, P.R., Teo, C.H., and Nagai, K. (1994) Crystal structure at 1.92 Å resolution of the RNA-binding domain of the U1A spliceosomal protein complexed with an RNA hairpin. Nature 372: 432-438.

Ouzounis, C., and Sander, C. (1992) TFIIB, an evolutionary link between the transcription machineries of archaebacteria and eukaryotes. Cell 71: 189-190.

Pace, C.N., Shirley, B.A., and Thomson, J.A. (1990) Measuring the conformational stability of a protein. In Protein Structure: A Practical Approach. Creighton, T.E. (ed.). Oxford: IRL Press, pp. 311-330.

Pace, N.R. (1997) A molecular view of microbial diversity and the biosphere. Science 276: 734-740.

Parker, R., and Song, H. (2004) The enzymes and control of eukaryotic mRNA turnover. Nat Struct Mol Biol 11: 121-127.

Passalacqua, K.D., Varadarajan, A., Ondov, B.D., Okou, D.T., Zwick, M.E., and Bergman, N.H. (2009) Structure and complexity of a bacterial transcriptome. J Bacteriol 191: 3203-3211.

Peng, X., Wang, X., Qi, W., Huang, R., Su, R., and He, Z. (2015) Deciphering the binding patterns and conformation changes upon the bovine serum albumin–rosmarinic acid complex. Food Funct 6: 2712-2726.

Perederina, A., Nevskaya, N., Nikonov, O., Nikulin, A., Dumas, P., Yao, M., et al. (2002) Detailed analysis of RNA–protein interactions within the bacterial ribosomal protein L5/5S rRNA complex. RNA 8: 1548-1557.

Perkins, T.T., Kingsley, R.A., Fookes, M.C., Gardner, P.P., James, K.D., Yu, L., et al. (2009) A strand-specific RNA–Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet 5: e1000569.

Phadtare, S. (2004) Recent developments in bacterial cold-shock response. Curr Issues Mol Biol 6: 125-136.

Phadtare, S. (2011) Unwinding activity of cold shock proteins and RNA metabolism. RNA Biol 8: 394-397.

Phadtare, S., Alsina, J., and Inouye, M. (1999) Cold-shock response and cold-shock proteins. Curr Opin Microbiol 2: 175-180.

Phadtare, S., and Severinov, K. (2010) RNA remodeling and gene regulation by cold shock proteins. RNA Biol 7: 788-795.

Phadtare, S., Inouye, M., and Severinov, K. (2002) The nucleic acid melting activity of Escherichia coli CspE is critical for transcription antitermination and cold acclimation of cells. J Biol Chem 277: 7239-7245.

Phadtare, S., Tyagi, S., Inouye, M., and Severinov, K. (2002) Three amino acids in Escherichia coli CspE surface-exposed aromatic patch are critical for nucleic acid melting activity leading to transcription antitermination and cold acclimation of cells. J Biol Chem 277: 46706-46711.

Pierrel, F., Douki, T., Fontecave, M., and Atta, M. (2004) MiaB protein is a bifunctional radical-S-adenosylmethionine enzyme involved in thiolation and methylation of tRNA. J Biol Chem 279: 47555-47563.

Pilak, O., Harrop, S., Siddiqui, K., Chong, K., DeFrancisci, D., Burg, D., et al. (2011) Chaperonins from an Antarctic archaeon are predominantly monomeric: crystal structure of an open state monomer. Environ Microbiol 13: 2232-2249.

Poklar, N., Lah, J., Salobir, M., Macek, P., and Vesnaver, G. (1997) pH and temperature-induced molten globule-like denatured states of equinatoxin II: a study by UV-melting, DSC, far- and near-UV CD spectroscopy, and ANS fluorescence. Biochemistry 36: 14345-14352.

Polissi, A., De Laurentis, W., Zangrossi, S., Briani, F., Longhi, V., Pesole, G., and Dehò, G. (2003) Changes in Escherichia coli transcriptome during acclimatization at low temperature. Res Microbiol 154: 573-580.

Rabilloud, T. (1999) Solubilization of proteins in 2-D electrophoresis. An outline. Methods Mol Biol 112: 9–19.

Rambaut, A. (2008) FigTree v1.3.1: Tree figure drawing tool. Available: http://tree.bio.ed.ac.uk/software/figtree/. Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom.

Rankin, L.M., Gibson, J.A.E., Franzmann, P.D., and Burton, H.R. (1999) The chemical stratification and microbial communities of Ace Lake, Antarctica: a review of the characteristics of a marine-derived meromictic lake. Polarforschung 66: 35-52.

Reid, I.N., Sparks, W.B., Lubow, S., McGrath, M., Livio, M., Valenti, J., et al. (2006) Terrestrial models for extraterrestrial life: methanogens and halophiles at Martian temperatures. Int J Astrobiol 5: 89-97.

Reid, K.L., Rodriguez, H.M., Hillier, B.J., and Gregoret, L.M. (1998) Stability and folding properties of a model β‐sheet protein, Escherichia coli CspA. Protein Sci 7: 470-479.

Reigstad, L.J., Jorgensen, S.L., and Schleper, C. (2010) Diversity and abundance of Korarchaeota in terrestrial hot springs of Iceland and Kamchatka. ISME J 4: 346-356.

Reiter, W.D., Hudepohl, U., and Zillig, W. (1990) Mutational analysis of an archaebacterial promoter: essential role of a TATA box for transcription efficiency and start-site selection in vitro. Proc Natl Acad Sci USA 87: 9509-9513.

Reppas, N.B., Wade, J.T., Church, G.M., and Struhl, K. (2006) The transition between transcriptional initiation and elongation in E. coli is highly variable and often rate limiting. Mol Cell 24: 747-757.

Robinson, J.T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. (2011) Integrative genomics viewer. Nat Biotechnol 29: 24-26.

Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140.

Robinson, N.P., Dionne, I., Lundgren, M., Marsh, V.L., Bernander, R., and Bell, S.D. (2004) Identification of two origins of replication in the single chromosome of the archaeon Sulfolobus solfataricus. Cell 116: 25-38.

Russell, N.J. (2008) Membrane components and cold sensing. In Psychrophiles: From Biodiversity to Biotechnology. Margesin, R., Schinner, F., Marx, J.C., and Gerday, C. (eds). Berlin Heidelberg, Germany: Springer, pp. 177-190.

Russell, N.J., and Fukunaga, N. (1990) A comparison of thermal adaptation of membrane lipids in psychrophilic and thermophilic bacteria. FEMS Microbiol Lett 75: 171-182.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A., and Barrell, B. (2000) Artemis: sequence visualization and annotation. Bioinformatics 16: 944-945.

Sambrook, J., Fritsch, E.F., and Maniatis, T. (1989) Molecular cloning: a laboratory manual 2nd edition. Cold Spring Harbor Laboratory Press 3: E3-E4.

Sardu, A., Treu, L., and Campanaro, S. (2014) Transcriptome structure variability in Saccharomyces cerevisiae strains determined with a newly developed assembly software. BMC Genomics 15: 1045.

Saunders, N.F., Goodchild, A., Raftery, M., Guilhaus, M., Curmi, P.M.G., and Cavicchioli, R. (2005) Predicted roles for hypothetical proteins in the low-temperature expressed proteome of the Antarctic archaeon Methanococcoides burtonii. J Proteome Res 4: 464-472.

Saunders, N.F., Ng, C., Raftery, M., Guilhaus, M., Goodchild, A., and Cavicchioli, R. (2006) Proteomic and computational analysis of secreted proteins with type I signal peptides from the Antarctic archaeon Methanococcoides burtonii. J Proteome Res 5: 2457-2464.

Saunders, N.F., Thomas, T., Curmi, P.M., Mattick, J.S., Kuczek, E., Slade, R., et al. (2003) Mechanisms of thermal adaptation revealed from the genomes of the Antarctic Archaea Methanogenium frigidum and Methanococcoides burtonii. Genome Res 13: 1580-1588.

Sawyer, A., Landsberg, M., Ross, I., Kruse, O., Mobli, M., and Hankamer, B. (2015) Solution structure of the RNA-binding cold shock domain of the Chlamydomonas reinhardtii NAB1 protein and insights into RNA recognition. Biochem J 469: 97-106.

Schindelin, H., Jiang, W., Inouye, M., and Heinemann, U. (1994) Crystal structure of CspA, the major cold shock protein of Escherichia coli. Proc Natl Acad Sci USA 91: 5119-5123.

Schindelin, H., Marahiel, M., and Heinemann, U. (1993) Universal nucleic acid-binding domain revealed by crystal structure of the B. subtilis major cold shock protein. Nature 364: 164-168.

Schindler, T., Graumann, P.L., Perl, D., Ma, S., Schmid, F.X., and Marahiel, M.A. (1999) The family of cold shock proteins of Bacillus subtilis: stability and dynamics in vitro and in vivo. J Biol Chem 274: 3407-3413.

Schindler, T., Perl, D., Graumann, P., Sieber, V., Marahiel, M.A., and Schmid, F.X. (1998) Surface‐exposed phenylalanines in the RNP1/RNP2 motif stabilize the cold‐shock protein CspB from Bacillus subtilis. Proteins: Struct, Funct, Bioinf 30: 401-406.

Schleper, C., DeLong, E.F., Preston, C.M., Feldman, R.A., Wu, K.Y., and Swanson, R.V. (1998) Genomic Analysis Reveals Chromosomal Variation in Natural Populations of the Uncultured Psychrophilic Archaeon Cenarchaeum symbiosum. J Bacteriol 180: 5003-5009.

Schleper, C., Swanson, R.V., Mathur, E.J., and DeLong, E.F. (1997) Characterization of a DNA polymerase from the uncultivated psychrophilic archaeon Cenarchaeum symbiosum. J Bacteriol 179: 7803-7811.

Schmid, A.K., Reiss, D.J., Kaur, A., Pan, M., King, N., Van, P.T., et al. (2007) The anatomy of microbial cell state transitions in response to oxygen. Genome Res 17: 1399-1413.

Schmid, F.X. (1990) Spectral methods of characterizing protein conformation and conformational changes. In Protein Structure: A Practical Approach. Creighton, T.E. (ed). Oxford: IRL Press, pp. 251-285.

Schröder, K., Graumann, P., Schnuchel, A., Holak, T., and Marahiel, M. (1995) Mutational analysis of the putative nucleic acid-binding surface of the cold-shock domain, CspB, revealed an essential role of aromatic and basic residues in binding of single-stranded DNA containing the Y-box motif. Mol Microbiol 16: 699-708.

Schubert, M., Edge, R.E., Lario, P., Cook, M.A., Strynadka, N.C., Mackie, G.A., and McIntosh, L.P. (2004) Structural characterization of the RNase E S1 domain and identification of its oligonucleotide-binding and dimerization interfaces. J Mol Biol 341: 37-54.

Sharp, P.M., and Li, W.H. (1987) The codon adaptation index: a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15: 1281-1295.

Sharp, P.M., and Matassi, G. (1994) Codon usage and genome evolution. Curr Opin Genet Dev 4: 851-860.

Sharp, P.M., Cowe, E., Higgins, D.G., Shields, D.C., Wolfe, K.H., and Wright, F. (1988) Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res 16: 8207-8211.

Shigi, N., Suzuki, T., Tamakoshi, M., Oshima, T., and Watanabe, K. (2002) Conserved bases in the TΨC loop of tRNA are determinants for thermophile-specific 2-thiouridylation at position 54. J Biol Chem 277: 39128-39135.

Siddiqui, K., Cavicchioli, R., and Thomas, T. (2002) Thermodynamic activation properties of elongation factor 2 (EF-2) proteins from psychrotolerant and thermophilic Archaea. Extremophiles 6: 143-150.

Siddiqui, K.S. (2016) Defying the activity–stability trade-off in enzymes: taking advantage of entropy to enhance activity and thermostability. Crit Rev Biotechnol. doi:10.3109/07388551.2016.1144045.

Siddiqui, K.S., and Cavicchioli, R. (2006) Cold-adapted enzymes. Annu Rev Biochem 75: 403-433.

Siddiqui, K.S., Poljak, A., DeFrancisci, D., Guerriero, G., Pilak, O., Burg, D., et al. (2010) A chemically modified α-amylase with a molten-globule state has entropically driven enhanced thermal stability. Protein Eng Des Sel 23: 769-780.

Siddiqui, K.S., Poljak, A., Guilhaus, M., Feller, G., D'Amico, S., Gerday, C., and Cavicchioli, R. (2005) Role of disulfide bridges in the activity and stability of a cold-active α-amylase. J Bacteriol 187: 6206-6212.

Siddiqui, K.S., Williams, T.J., Wilkins, D., Yau, S., Allen, M.A., Brown, M.V., Lauro, F.M., and Cavicchioli, R. (2013) Psychrophiles. Annu Rev Earth Planet Sci 41: 87-115.

Simankova, M.V., Parshina, S.N., Tourova, T.P., Kolganova, T.V., Zehnder, A.J., and Nozhevnikova, A.N. (2001) Methanosarcina lacustris sp. nov., a new psychrotolerant methanogenic archaeon from anoxic lake sediments. Syst Appl Microbiol 24: 362-367.

Singh, R., and Valcárcel, J. (2005) Building specificity with nonspecific RNA-binding proteins. Nat Struct Mol Biol 12: 645-653.

Sîrbu, A., Kerr, G., Crane, M., and Ruskin, H.J. (2012) RNA-Seq vs dual- and single-channel microarray data: sensitivity analysis for differential expression and clustering. PloS one 7: e50986.

Staker, B.L., Korber, P., Bardwell, J.C., and Saper, M.A. (2000) Structure of Hsp15 reveals a novel RNA-binding motif. EMBO J 19: 749-757.

Strug, I., Utzat, C., Cappione, A., Gutierrez, S., Amara, R., Lento, J., et al. (2014) Development of a univariate membrane-based mid-infrared method for protein quantitation and total lipid content analysis of biological samples. J Anal Methods Chem. doi: 10.1155/2014/657079.

Struvay, C., and Feller, G. (2012) Optimization to low temperature activity in psychrophilic enzymes. Int J Mol Sci 13: 11643-11665.

Stryer, L. (1965) The interaction of a naphthalene dye with apomyoglobin and apohemoglobin. A fluorescent probe of non-polar binding sites. J Mol Biol 13: 482-495.

Studier, F.W., Rosenberg, A.H., Dunn, J.J., and Dubendorff, J.W. (1990) Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol 185: 60-89.

Subramanian, A.R. (1983) Structure and functions of ribosomal protein S1. Prog Nucleic Acid Res Mol Biol 28: 101-142.

Szymanski, M., Barciszewska, M., Erdmann, V., and Barciszewski, J. (2002) 5S ribosomal RNA database. Nucleic Acids Res 30: 176-178.

Tang, T.H., Polacek, N., Zywicki, M., Huber, H., Brugger, K., Garrett, R., et al. (2005) Identification of novel non‐coding RNAs as potential antisense regulators in the archaeon Sulfolobus solfataricus. Mol Microbiol 55: 469-481.

Thomas, T., and Cavicchioli, R. (1998) Archaeal cold‐adapted proteins: structural and evolutionary analysis of the elongation factor 2 proteins from psychrophilic, mesophilic and thermophilic methanogens. FEBS Lett 439: 281-286.

Thomas, T., and Cavicchioli, R. (2002) Cold adaptation of archaeal elongation factor 2 (EF-2) proteins. Curr Protein Pept Sci 3: 223-230.

Thomas, T., and Cavicchioli, R. (2000) Effect of temperature on stability and activity of elongation factor 2 proteins from Antarctic and thermophilic methanogens. J Bacteriol 182: 1328-1332.

Thomas, T., Kumar, N., and Cavicchioli, R. (2001) Effects of ribosomes and intracellular solutes on activities and stabilities of elongation factor 2 proteins from psychrotolerant and thermophilic methanogens. J Bacteriol 183: 1974-1982.

Thompson, W., Conlan, S., McCue, L.A., and Lawrence, C.E. (2007) Using the Gibbs Motif Sampler for phylogenetic footprinting. Methods Mol Biol 395: 403-424.

Tishchenko, S.V., Nikulin, A., Fomenkova, N.P., Nevskaya, N., Nikonov, O., Dumas, P., et al. (2001) Detailed analysis of RNA–protein interactions within ribosomal protein S8–rRNA complex from the archaeon Methanococcus jannaschii. J Mol Biol 311: 311-324.

Townley-Tilson, W.H., Pendergrass, S.A., Marzluff, W.F., and Whitfield, M.L. (2006) Genome-wide analysis of mRNAs bound to the histone stem-loop binding protein. RNA 10: 1853-1867.

Treu, L., Toniolo, C., Nadai, C., Sardu, A., Giacomini, A., Corich, V., and Campanaro, S. (2014) The impact of genomic variability on gene expression in environmental Saccharomyces cerevisiae strains. Environ Microbiol 16: 1378-1397.

Uhlén, M., Forsberg, G., Moks, T., Hartmanis, M., and Nilsson, B. (1992) Fusion proteins in biotechnology. Curr Opin Biotechnol 3: 363-369.

Vera, A., Gonzalez-Montalban, N., Aris, A., and Villaverde, A. (2007) The conformational quality of insoluble recombinant proteins is enhanced at low growth temperatures. Biotechnol Bioeng 96: 1101-1106.

Wada, M., Fukunaga, N., and Sasaki, S. (1989) Mechanism of biosynthesis of unsaturated fatty acids in Pseudomonas sp. strain E-3, a psychrotrophic bacterium. J Bacteriol 171: 4267-4271.

Walsby, A.E. (1980) A square bacterium. Nature 283: 69-71.

Wang, Z., Gerstein, M., and Snyder, M. (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63.

Wilhelm, B.T., Marguerat, S., Watt, S., Schubert, F., Wood, V., Goodhead, I., et al. (2008) Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 453: 1239-1243.

Williams, T., Lauro, F., Ertan, H., Burg, D., Poljak, A., Raftery, M., and Cavicchioli, R. (2011) Defining the response of a microorganism to temperatures that span its complete growth temperature range (-2°C to 28°C) using multiplex quantitative proteomics. Environ Microbiol 13: 2186-2203.

Williams, T.J., Burg, D.W., Ertan, H., Raftery, M.J., Poljak, A., Guilhaus, M., and Cavicchioli, R. (2010b) Global proteomic analysis of the insoluble, soluble, and supernatant fractions of the psychrophilic archaeon Methanococcoides burtonii part II: the effect of different methylated growth substrates. J Proteome Res 9: 653-663.

Williams, T.J., Burg, D.W., Raftery, M.J., Poljak, A., Guilhaus, M., Pilak, O., and Cavicchioli, R. (2010a) Global proteomic analysis of the insoluble, soluble, and supernatant fractions of the psychrophilic archaeon Methanococcoides burtonii part I: the effect of growth temperature. J Proteome Res 9: 640-652.

Woese, C.R., and Fox, G.E. (1977) Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci USA 74: 5088-5090.

Woese, C.R., Kandler, O., and Wheelis, M.L. (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci USA 87: 4576-4579.

Wolf, Y.I., Aravind, L., Grishin, N.V., and Koonin, E.V. (1999) Evolution of aminoacyl-tRNA synthetases - analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res 9: 689-710.

Wolfe, S.A., Nekludova, L., and Pabo, C.O. (2000) DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct 29: 183-212.

Woody, R.W. (1996) Theory of circular dichroism of proteins. In Circular Dichroism and the Conformational Analysis of Biomolecules. Fasman, G.D. (ed). New York: Springer LLC, pp. 25-68.

Wurtzel, O., Sapra, R., Chen, F., Zhu, Y., Simmons, B.A., and Sorek, R. (2010) A single-base resolution map of an archaeal transcriptome. Genome Res 20: 133-141.

Xia, B., Ke, H., and Inouye, M. (2001) Acquirement of cold sensitivity by quadruple deletion of the cspA family and its suppression by PNPase S1 domain in Escherichia coli. Mol Microbiol 40: 179-188.

Xiong, S., Zhang, L., and He, Q.Y. (2008) Fractionation of proteins by heparin chromatography. 2D PAGE: Sample Preparation and Fractionation 424: 213-221.

Yamanaka, K. (1999) Cold shock response in Escherichia coli. J Mol Microbiol Biotechnol 1: 193-202.

Yamanaka, K., and Inouye, M. (1997) Growth-phase-dependent expression of cspD, encoding a member of the CspA family in Escherichia coli. J Bacteriol 179: 5126-5130.

Yoder-Himes, D.R., Chain, P.S.G., Zhu, Y., Wurtzel, O., Rubin, E.M., Tiedje, J.M., and Sorek, R. (2009) Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing. Proc Natl Acad Sci USA 106: 3976-3981.

Zago, M.A., Dennis, P.P., and Omer, A.D. (2005) The expanding world of small RNAs in the hyperthermophilic archaeon Sulfolobus solfataricus. Mol Microbiol 55: 1812-1828.

Zhang, L.M., Wang, M., Prosser, J.I., Zheng, Y.M., and He, J.Z. (2009) Altitude ammonia-oxidizing bacteria and archaea in soils of Mount Everest. FEMS Microbiol Ecol 70: 52-61.

Zhao, S., Fung-Leung, W.P., Bittner, A., Ngo, K., and Liu, X. (2014) Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells. PloS one 9: e78644.

Zhou, Z., and Fu, X.D. (2013) Regulation of splicing by SR proteins and SR protein-specific kinases. Chromosoma 122: 191-207.

Zillig, W., Zechel, K., and Halbwachs, H.J. (1970) A new method of large scale preparation of highly purified DNA-dependent RNA-polymerase from E. coli. Hoppe Seylers Z Physiol Chem 351: 221-224.

Appendix 1

List of Archaea and Bacteria used for the phylogenetic analysis.

Appendix 2

16S rRNA gene neighbour-joining tree. This tree was constructed using 16S rRNA sequences from all representative members from Archaea and Bacteria that were used for phylogenetic analysis of TRAM domain proteins.

Appendix 3

Alignment of amino acid sequences of TRAM domains from Bacteria and Archaea. Sequences are denoted by their respective locus tags. Multiple sequence alignment was performed on the T-Coffee platform (Notredame et al., 2000), where colors represent the quality of the alignment (bad, average or good).

Appendix 4

Alignment of nucleic acid sequences of TRAM domains from Bacteria and Archaea. Sequences are denoted by their respective locus tags. Multiple sequence alignment was performed on the T-Coffee platform (Notredame et al., 2000), where colors represent the quality of the alignment (bad, average or good).

Appendix 5

List of all operons, TSS and TTS from the M. burtonii genome identified from the RNA-seq data. File provided as an Excel worksheet and available online at http://onlinelibrary.wiley.com/doi/10.1111/1462-2920.13229/abstract;jsessionid=32EDD278AA2949E381999D9A45F44EA4.f02t01

Appendix 6

Complete list of transcripts bound by Ctr3 at 4 and 23°C, and M. burtonii genes upregulated at 4 and 23°C. File provided as an Excel worksheet and available online at http://onlinelibrary.wiley.com/doi/10.1111/1462-2920.13229/abstract;jsessionid=32EDD278AA2949E381999D9A45F44EA4.f02t01

Appendix 7

List of all RNA targets from 4 and 23°C RNA containing the 41-nucleotide full-length 4C_M1 sequence. File provided as an Excel worksheet and available online at http://onlinelibrary.wiley.com/doi/10.1111/1462-2920.13229/abstract;jsessionid=32EDD278AA2949E381999D9A45F44EA4.f02t01

Appendix 8

List of all RNA targets from 4 and 23°C RNA containing the nine-nucleotide core 4C_M1 sequence. File provided as an Excel worksheet and available online at http://onlinelibrary.wiley.com/doi/10.1111/1462-2920.13229/abstract;jsessionid=32EDD278AA2949E381999D9A45F44EA4.f02t01

Appendix 9

Multiple sequence alignment of M. burtonii tRNA bound by Ctr3. (A) All tRNA bound by Ctr3 from 4°C grown cultures. (B) All tRNA bound by Ctr3 from 23°C grown cultures. The position of the nine-nucleotide core (GUUCXXXUC) in the 4C_M1 motif is highlighted in yellow.

(A)

(B)

Appendix 10

Multiple sequence alignment of M. burtonii 5S rRNA bound by Ctr3. (A) All six 5S rRNA bound by Ctr3 from 4°C grown cultures. (B) All five 5S rRNA bound by Ctr3 from 23°C grown cultures. The position of the nine-nucleotide core (GUUCXXXUC) in the 4C_M1 motif is highlighted in yellow.

(A)

(B)

Appendix 11

Alignment of tRNA genes from selected Archaea, Bacteria and Eucarya. For each organism, the two most divergent tRNA sequences were selected. The most conserved region is highlighted in yellow. Archaea (red); Bacteria (grey); Eucarya (cyan).