CLONING, EXPRESSION, PURIFICATION AND CRYSTALLIZATION

OF Pf -1806301

by

LEI HUANG

(Under the direction of Dr. Bi-Cheng Wang)

ABSTRACT

Pf -1806301, a putative archaeal Lsm protein characterized in this thesis, shows sequence similarity to both archaeal Lsm and bacteria Hfq proteins. Sm/Lsm proteins are a large family of

RNA-binding proteins that have been identified in archaea, bacteria and eukarya. The determination of the three-dimensional structure of Pf -1806301 therefore will not only help reveal the functions of the Lsm proteins, but also offer in-depth implications to the relationship between Lsm and Hfq proteins.

In order to solve the crystal structure of Pf -1806301, the encoding Pf -1806301 was recloned into vector pET28, which provides a thrombin cleavage site for the removal of the His- tag after protein purification. The best Se-Met crystal diffracted up to 2.7Å, showing a significant improvement in diffraction compared with crystals grown from the protein with the

His-tag attached. In addition, diffraction data reveal that the Pf -1806301 complex is of a dimer of octamers.

INDEX WORDS: Pyrococcus furiosus, Archaea, Sm/Lsm, Hfq, CLONING, EXPRESSION, PURIFICATION AND CRYSTALLIZATION

OF Pf -1806301

by

LEI HUANG

B.S., NanKai University, P.R.China, 2000

A Thesis Submitted to the Graduate Faculty of The University of Georgia in Partial

Fulfillment of the Requirements for the Degree

MASTER OF SCIENCE

ATHENS, GEORGIA

2004

©2004

Lei Huang

All Rights Reserved

CLONING, EXPRESSION, PURIFICATION AND CRYSTALLIZATION

OF Pf -1806301

by

Lei Huang

Major Professor: Bi-Cheng Wang

Committee: John P. Rose William Lanzilotta

Electronic Version Approved:

Maureen Grasso Dean of the Graduate School The University of Georgia Demember 2004

ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my major professor Dr. Bi-Cheng Wang for his guidance, encouragement and support, which is essential for me to complete my master’s project. And I appreciate the knowledge and experience gained during the years.

I would like to thanks my committee members Dr. John Rose and Dr. William Lanzilotta for their suggestions and advice on the project; Dr. Zhi-Jie Liu and Jeff Habel for their help in crystallization and data collection.

I would also like to thank all my fellow members in Dr. Wang’s lab, especially Peter

Horanyi and Lirong Chen. Special thanks to my husband and my parents for their constant supports and encouragements. Also thanks to all the other colleagues for their advice and supports.

iv TABLE OF CONTENTS

Page

ACKNOWLEDGMENTS……………………………………………………………………..iv

LIST OF TABLES……………………………………………………………………………. vii

LIST OF FIGURES…………………………………………………………………………...viii

CHAPTER

1 INTRODUCTION

Hyperthermophiles and Pyrococcus furiosus ……………………………………1

Pf -1806301 is a putative Sm-like protein………………………………………. 3

Oligomerization and RNA-binding of Sm/Lsm proteins………………………...8

2 MATERIAL AND METHODS

Materials………………………………………………………………………...15

Sequence Analysis…...………………………………………………...... 16

Cloning and Expression…………………………………………………………16

Purification of Recombinant Pf -1806301………………………………..……..18

Thrombin Cleavage of His-tagged protein……………………………………...19

Dynamic Light Scattering ………………………………...... 20

Gel-Shift Assay………………………………...... 21

Expression and Purification of Se-Met protein………………………………….22

Crystallization…………………………………………………………………....22

Data Collection…………………………………………………………………..23

v 3 RESULTS

Sequence Analysis……………………………………………………………….24

Cloning and Expression………………………………………………………….24

Purification of Recombinant Pf -1806301……………………………………….28

Thrombin Cleavage of His-tagged protein……………………………………....28

Dynamic Light Scattering ……………………………………………………….28

Gel-Shift Assay…………………………………………………………………..31

Expression and Purification of Se-Met protein…………………………………..31

Crystallization…………………………………………………………...... 34

Data Collection and Processing………………………………………………….35

4 DISCUSSION……………………………………………………………………...... …..44

REFERENCES………………………………………………………………...... 47

vi LIST OF TABLES

Page

Table 1.1: The hyperthermophilic genera: organisms that grow optimally at 90 °C……………..2

Table 1.2: Classification of Sm/Lsm family in human, yeast, archaea and bacteria …………….6

Table 3.1: Statistics of current Sm, Lsm and Hfq crystal structures…………………………….26

Table 3.2: Data collection statistics of crystals soaked with heavy atom……………………….38

Table 3.3: Data collection parameters of P1 space group crystal………………………………..40

Table 3.4: Matthews coefficient of P1 space group crystal……………………………………...43

vii LIST OF FIGURES

Page

Figure 1.1: The rooted universal phylogenetic tree ………………………………………… 4

Figure 1.2: Crystal structure of Lsm1 monomer from P. abyssi (1H64)……………………..9

Figure 1.3: Sequence and structural comparison of the Sm-fold in Sm, Lsm and Hfq proteins form human, archaea and bacteria respectively………………………………………...... 10

Figure 1.4: Oligomeric organizations of Sm/Lsm and Hfq complexes……………………...11

Figure 1.5: The internal uridine-binding pocket of P. abyssi Lsm1 complex (1M8V)……...14

Figure 3.1: Sequence alignment of archaeal Sm-like proteins with human canonical Sm (hsm) proteins and bacterial Hfq proteins…………………………………………………………..25

Figure 3.2: Purification scheme……………………………………………………………...27

Figure 3.3: Pf -1806301 was purified over Ni-NTA agarose………………………………..29

Figure 3.4: Superdex-75 gel filtration purification of Pf -1806301…………………………30

Figure 3.5: Dynamic light scattering analysis of native Pf -1806301 in solution……….…..32

Figure 3.6: In vitro binding of the fluorescence labelled oligo U with Pf -1806301………..33

Figure 3.7: Crystallization of Pf -1806301…………………………………………………..36

Figure 3.8: The diffraction of the best Se-Met crystal……………………………………….37

Figure 3.9: Self-rotation function plot with κ = 45°…………………………………………41

Figure 3.10: Self-rotation function plot with κ = 180°……………………………………...42

viii CHAPTER 1

INTRODUCTION

Hyperthermophiles and Pyrococcus furiosus

Structural genomics (1) is a multidisciplinary research project that involves bioinformatics, molecular biology, biochemistry, and biophysical methods such as NMR spectroscopy and x-ray crystallography. The objective of structural genomics is to determine the unique three-dimensional structures of proteins throughout the genome in order to better understand the relationship between protein sequence, structure and function.

The Southeast Collaboratory for Structural Genomics (SECSG) is one of the original seven NIH-funded Pilot Centers for high throughput (HTP) structure determination. It consists of five collaboration institutes: The University of Georgia, the University of Alabama at

Birmingham, the University of Alabama at Huntsville, Georgia State University and Duke

University Medical Center (2).

One of the major model organisms selected by SECSG is Pyrococcus furiosus. The Latin name Pyrococcus stands for “the fireball”, which indicates its ability to grow at temperatures above 100°C. Based on its favorable growth temperature, Pyrococcus belongs to a group of organisms called hyperthermophiles. The first hyperthermophile was isolated from shallow marine volcanic vents by Stetter and coworkers in 1982 (3). Since then, more than 20 genera of hyperthermophiles have been discovered (Table 1.1) (4a). Hyperthermophilic organisms are distributed in the geothermally-heated ecosystems. They are mostly obligate anaerobes, and they

0 depend on electron donors like elementary sulfur (S ) instead of O2 for their energy conservation

(4b).

1 Table 1.1 The hyperthermophilic genera: organisms that grow optimally at 90 °C

(a) (b) (c) Genus Tmax (°C) Metabolism Substrate Acceptors S0-depentdent archaea Thermofilum 100 Hetero Pep S0, H+ Staphylothermus 98 Hetero Pep S0, H+ Thermodiscus 98 Hetero Pep S0, H+ Desulfurococcus 90 Hetero Pep S0, H+ 0 + Thermoproteus 92 Hetero (auto) Pep, CBH (H2) S , H 0 + Pyrodictum 110 Hetero (auto) Pep, CBH (H2) S , H Pyrococcus 105 Hetero Pep S0, H+ Thermococcus 97 Hetero Pep, CBH S0, H+ 0 + Hyperthermus 110 Hetero Pep (H2) S , H 0 2- Stetteria 103 Hetero Pep + H2 S , S2O3 0 - Pyrobaculum 102 Hetero (auto) Pep (H2) S , mO2, NO3 0 0 Acidianus 96 Auto S , H2 S , O2 S0-indepentdent archaea Sulfobococcus 95 Hetero Pep - Aeropyrum 100 Hetero Pep O2 2- - Pyrolobus 113 Auto H2 S2O3 NO3 mO2 Sulfate-reducing archaea 2- 2- Archaeoglobus 95 Hetero (auto) CBH (H2) SO4 S2O3 Iron-oxidizing archaea 2+ - 2- - Ferroglobus 95 Auto Fe , H2, S2 S2O3 NO3 Methanogenic archaea Methanococcus 91 Auto H2 CO2 Methanothermus 97 Auto H2 CO2 Methanopyrus 110 Auto H2 CO2 Bacteria Thermotoga 90 Hetero Pep, CBH S0, H+ 0 - Aquifex 95 Auto S (H2) NO3 mO2

From Roopali, R. Ph. D dissertation, University of Georgia, 2001 (a) Maxium growth temperature (b) Heterotrophic (hetero) of autotrophic (auto) growth mode (c) Growth substrates: peptides (Pep), carbohydrates (CBH), hydrogen (H2) and elemental sulfur (S0) as electron receptor

2 The classification of hyperthermophiles offers new insight into the origin of life. Based

on 16S-rRNA analyses of known hyperthermophilic microbes, except for two bacterial genera,

all of the hyperthermophile genera are classified as Archaea (formerly known as Archaeabacteria)

(5). In 1990, Woese et al. proposed a universal phylogenetic tree for all living organisms, in

which the living world is divided into three domains: the Bacteria, the Archaea, and the Eukarya

(5). The Archaea can further be divided into two kingdoms: the Crenarchaeota and the

Euryarchaeota. The Crenarchaeota contains a few thermophiles and all the methanogens,

whereas Euryarchaeota consist of a homogenous tight clustering of extreme thermophiles

including Pyrococcus (Figure 1.1).

From the phylogenetic tree, the archaea and eukarya are most likely evolved from a common ancestor that is not shared by bacteria. And the hyperthermophiles species appear to be the most slowly evolved organisms within both the archaea and the bacteria domain (6).

Therefore, hyperthermophilic organisms are believed to be most closely related to the ancestral form of life, which was first evolved under the high temperature on earth. Phylogenic studies show that the rest of the life forms result from evolutionary pressure to adapt to lower temperatures (5).

Pf -1806301 is a putative Sm-like protein

In addition to sharing common ancestor with Eukarya based on ribosomal RNA sequence comparisons (7), archaeal genomes encode many protein complexes characteristic of eukaryotic small nuclear ribonucleoproteins (). SnRNPs are small RNA-protein complexes found in

the eukaryotic nucleoplasm (8). They are crucial components in various RNA processing events,

including pre-mRNA splicing (9, 10), rRNA processing (9, 11), mRNA degradation (12, 13),

telomere synthesis (14) and pre-mRNA processing (15-17).

3 Bacteria Archaea Eukarya Animal Crenarchaeota Euryarchaeota Fungi 8 Pyrococcus 7 9 Plant 4 6 5 2 3

1 1. Aquifex 2. Thermotoga 3. Pyrodictum 4. Thermoproteus 5. Thermococcales 6. Methanococcales 7. Methanobacteriales 8. Methanomicrobiales 9. Extreme halophiles

Figure 1.1 The rooted universal phylogenetic tree based on 16 rRNA sequence analysis. Showing three domains: bacteria, archaea and eukarya. Hyperthermophilic genera are colored red. Position of Pyrococcus is pointed by arrow.

Modified from Woese et al., Proc Natl Acad Sci U S A. 1990 87(12):4576-9.

4 Each snRNP consists of a small nuclear RNA (snRNA), associated with several snRNP

specific proteins and a set of seven Sm or Sm-like (Lsm) proteins (18). During autoimmune

diseases, antibodies against the protein or RNA moiety of snRNPs are often generated for

unknown reason (19). In patients suffering from systematic lupus erythematosus, autoantibodies

were found to recognize an antigen called Sm (20). The Sm antigen was found present in U7

snRNP, spliceosomal snRNP U1, U2, U4 and U5, and some snRNPs with unknown functions.

The Sm antigen was identified as seven small homologous proteins B/ B´, D1, D2, D3, E, F and

G, which are also referred as the canonical Sm proteins (21).

Sequence analyses reveal that Sm proteins form a distinct family that share an

approximately 70-residue signature sequence called the Sm domain. This domain consists of two

conserved regions termed the Sm1 and Sm2 motif, which were linked by a loop region with

variable length (22- 24). Database search of shows that the Sm domain is present in a

large number of proteins. Some of them are highly homologous to human Sm proteins, hence

they might be the Sm counterpart in other eukaryotic species. Yet other proteins shared no clear

sequence similarity with Sm proteins, and these proteins were named Sm-like (Lsm) proteins (23,

24). Based on the relationship between the canonical Sm proteins and Sm-like proteins, nine subtypes of this Sm/Lsm protein family could be divided (Table 1.2). Among the nine Sm-like proteins identified in yeast, the Lsm2-Lsm8 complex directly associates with spliceosomal U6

RNA and pre-RNase P RNA, indicating that both Sm and Sm-like proteins are involved in RNA processing events (25).

The formation of Sm/Lsm complex involves the interaction between Sm/Lsm proteins and the specific recognition of cognate . In eukaryotic spliceosomal snRNPs, the seven canonical Sm proteins B/B´, D1, D2, D3, E, F and G assemble around snRNA at a conserved

5 Table 1.2 Classification of the Sm/Lsm family in human, yeast, archaea and bacteria

Classification 1 2 3 4 5 6 7 8 9

Human SmB SmD1 SmD2 SmD3 SmE SmF SmG

Lsm2 Lsm3 Lsm4 Lsm5 Lsm6 Lsm1

Yeast SmB SmD1 SmD2 SmD3 SmE SmF SmG

Lsm8 Lsm2 Lsm3 Lsm4 Lsm5 Lsm6 Lsm7 Lsm1 Lsm9

Archaea Lsm1 ( AF, PA, PH, MT, SS, AP, TA, HN, PyA, TV )

Lsm2 ( AF, AP, MT, SS, TA, TV )

Bacteria Hfq (~30 species)

AF, Archaeoglobus fulgidus; PA, Pyrococcus abyssi; PH, Pyrococcus horikoshi; MT, Methanobacterium thermoautotrophicum; SS, Sulfolovus solfataricus; AP, Aeropyrum pernix; TA, Thermoplasma acidophilum; HN, Halobacterium sp. NRC-1; PyA, Pyrobaculum aerophilum; TV, Thermoplasma volcanicum

6 uridine-rich site called Sm site to form the Sm core domain (26). The formation of the Sm core

domain plays an important role in snRNP biogenesis. It triggers the hypermethylation of the m7G

cap of snRNA (27), and also acts as one of the bipartite nuclear localization signal for the

transportation of the newly assembled snRNPs back to the nucleus (28). Electron micrographs of human canonical Sm core domains and purified human Lsm complex demonstrate that Sm/Lsm

complexes are doughnut-shaped ring structure with a diameter of about 70-80Å (29, 30).

However due to the complex composition of these eukaryotic Sm/Lsm complexes, the three-

dimensional structures of the complexes have not been determined yet.

Although the Sm/Lsm proteins in eukaryotic spliceosomal snRNPs have been well

studied for many years, their archaeal counterparts had not been identified until 1999, when

database searches were conducted on genomes of Methanobacterium thermoautotrophicum and

Archaeoglovus fulgidus (25). In contrast to eukaryotes, archaeal genomes were found to encode

at most two subtypes of Sm-like proteins, which were termed as Lsm1 and Lsm2 (25). Until now

Sm-like proteins have been identified in more than ten archaeal species. Some species encode

both Lsm1 and Lsm2 type proteins, and other species encode only Lsm1 type proteins (31)

(Table 1.2). The sequence comparison of these Lsm1 proteins reveals a high level of sequence

homology, suggesting that Lsm1 from different archaeal genomes are orthologues. Lsm2 type

proteins on the other hand, are more diverse in their sequences, having only 30% or less

sequences identity within Sm2 type and with Sm1 type proteins. Furthermore Hfq proteins, a

widespread and well-conserved family of bacterial factors involving in RNA modulation, were

recently found to share the characteristic Sm fold from the crystal structures (32, 33) (Figure 1.2).

Although the overall sequence similarity between the eukaryotic Sm proteins, archaeal Lsm

proteins and bacterial Hfq proteins are low, their core Sm-folds are strikingly conserved, with an

7 average root mean square deviation of 0.91± 0.28Å (32). This gives another good example that for proteins with similar biochemical function, their 3-D structures are more conserved than the sequences during the evolution (Figure 1.3).

The first crystal structures of archaeal Sm-like proteins were determined in 2001, revealing a donut-shaped heptameric ring structures comprised of seven identical Lsm subunits

(34-36). Each subunit consists of an N-terminal α-helix followed by five-twisted β-sheet, which can be divided into two sequence motifs: Sm1 (β strand 1,2 and 3) and Sm2 (β strand 4 and 5)

(Figure 1.2). These two parts are linked together by a loop region (L4) of variable length and composition. The archaeal Lsm fold was found highly conserved with human canonical Sm proteins and bacterial Hfq proteins, showing only slight differences at N-terminal and L4 region

(32). The whole complex is held together by backbone hydrogen bondings between β4 of one subunit and β5 of the adjacent subunit, as well as the hydrophobic and other side-chain interactions.

Oligomerization and RNA-binding of Sm/Lsm proteins

Sm and Sm-like proteins from eukarya, archaea and bacteria are organized in a ring structure. The reason is easily explained by the structure of the monomer, where the curved β strands form a compact hydrophobic core. By pairing with an adjacent subunit, the hydrophobic core is shielded from the hydrophilic environment. The formation of the ring structure allows the hydrophobic cores of all component subunits to be enclosed (34).

Sm/Lsm Sm and Sm-like protein tend to form cyclic oligomers (Figure 1.4). Biochemical and electron microscopic studies of snRNPs confirmed the formation of a eukaryotic hetero- heptamer Sm core structure (37, 38). Five crystallographic structures of archaeal Lsm proteins

8 A B

Figure 1.2 Crystal structure of Lsm1 monomer from P. abyssi (1H64). Figure A and B show the same structure viewed from different orientations. This structure represents a typical Sm-fold, consisting of a N-terminal α-helix followed by five β-strand, which are colored by red, orange, yellow, green, cyan and blue respectively.

9 A

B

From Sauter et al., (2003) Nucleic acids research 14, 4091-4098.

Figure 1.3 Sequence and structural comparison of the Sm-fold in Sm, Lsm and Hfq proteins form human, archaea and bacteria respectively. (A) Superposition of the Sm-fold in Hfq monomers from bacteria (green), Lsm monomers from archaea (blue) and Sm proteins from human (red) (B) The sequences of the Sm, Lsm and Hfq proteins whose structures are shown above.

10

A. Hexameric bacterial Hfq protein B. Heptameric archaeal Lsm protein

Sauter et al. Nucleic Acid Research, 2003 Collins et al. J. Mol. Biol, 2001

Figure 1.4 Oligomeric organizations of Sm/Lsm and Hfq complexes. (A) The front and side views of the hexameric E.Coli Hfq protein complex. (B) The front and side views of the heptameric Lsm1 protein complex from Methanobacteriaum thermoautotophicum.

11 have also been determined. Except for the only Lsm2 protein that is in a homo-hexameric

formation, all four Lsm1 proteins arrange in a homo-heptameric organization. Hfq proteins in bacterial organisms on the other hand seem to adopt exclusively the homo-hexameric formation

(32). Comparison of the sequences and structures of above Sm/Lsm proteins could not give a clear answer to what drives the preference for heptamer or hexamer formation. More structural data are needed to help answer this question.

Some archaeal Lsm proteins form an even higher degree of polymerization shown by their crystallographic structures. For example, Lsm1 complex from Pyrococcos abyssi is packed as two heptamers in a head-to-head orientation (head refers to the face containing the N-terminal

α-helix). This dimerization of the heptamer was also confirmed by EM and gel filtration studies.

It was suggested that the dimerization might imply the real function (39). Similar head-to-head organization of heptamers was also observed in Lsm1 proteins from Pyribaculum aerophilum and Methanobacterium thermautotrophicum (40).

In addition to their oligomeric formation, Sm and Sm-like proteins are characterized by being able to bind RNA at a uridine-rich site called Sm site. EM studies suggest that the eukaryotic Sm core proteins assemble around the RNA in a ring structure, and the RNA is channeled through the central cavity of the ring (37). Archaeal Lsm and bacterial Hfq complexes on the other hand were shown to act only as a docking site where cognate RNA molecules bind at the surface of the oligomers instead of threading through the central cavity. These differences were further verified by the facts that eukaryotic Sm proteins assemble into the heptameric core only in the presence of its target RNA, while archaeal Lsm and Hfq proteins exist as stable oligomers without binding to the RNA (32).

12 It was proposed that archaeal Lsm proteins have a generic single-stranded RNA binding

activity. Structures of several Lsm-RNA complexes have been determined, including Lsm1 from

P. abyssi, A. fulgidus and M.thermoautotrophicum (31, 39, 40). Although the length of their ligand RNAs ranges from single UMP to uridine heptamer (U7), the residues involved in uridine interaction are strictly conserved. The uridine-binding pocket includes histidine and asparagine in loop3 and arginine in loop5. The uracil ring is stacked firmly between the side chains of arginine and histidine, and form hydrogen bonds with the conserved asparagine (Figure 1.5).

In order to fully understand the function and structure of Pf -1806301, it will be of great importance to determine its X-ray structure. In this thesis, Pf -1806301, a putative Lsm protein in archaea P. furiosus, was identified to be an Lsm2 type protein by sequence comparison. Previous efforts in crystallization of recombinant Pf -1806301 with His-tag produced the crystals that diffracted to only ~8Å. Since a His-tag can sometimes interfere with protein interaction and that leads to low diffraction quality crystals, by recloning the cDNA of Pf -1806301 into vector pET-

28, we introduced a thrombin cleavage site downstream of the N-terminal His-tag. The recombinant protein was expressed and purified from E.Coli BL21 (DE3). The new recombinant

Pf -1806301 with no His-tag produced native crystals that diffracted to 2.7Å at the Advance

Photon Source (APS), Argonne National Laboratory. Intensive heavy atom soaking was carried out, and the best crystal diffracted to 3.0 Å at the synchrotron. However, the heavy atom sites could not be found in the anomalous difference Patterson map. In order to obtain stronger anomalous signal, Se-Met labeled Pf -1806301 was produced. The best crystal diffracted to 2.7Å using the home-source X-ray system. Self-rotation function and Matthews coefficient were calculated from the diffraction data collected at synchrotron, and the overall structure of Pf -

1806301 is predicted to be a dimer of octamers.

13

Figure 1.5 the internal uridine-binding pocket of P. abyssi Lsm1 complex (1M8V) The uracil is stacked between His37 (orange) and Arg63 (red), and also forms hydrogen bonds with Asn39 (yellow).

14 CHAPTER 2

MATERIALS AND METHODS

Materials

Bacto Tryptone and Bacto Yeast Extract were purchased from BD Biosciences; SDS-PAGE

molecular weight marker was purchased from Bio-Rad; Restriction enzymes (Nde I and Xho I ),

Bovine Serum Albumin (BSA), DNA ladder (200bp and 1kbp), 100mM dNTP mix and

Isopropyl-beta-D-thiogalactopyranoside (IPTG) were purchased from Promega; T4 DNA Ligase was purchased from Fisher Scientific; the pET-28 expression vector was from Novagen;

Bacteriological Agar was purchased from Oxoid; the Qiaquick PCR purification kit and the gel purification kit were purchased from Qiagen; Kanamycin and high pure plasmid isolation kit were purchased from Roche; Pfu Turbo DNA polymerase, BL21 GOLD® and BL21(ED3)

GOLD® competent cells were from Stratagene.

Media were prepared as follows:

LB (1L): 10g tryptone, 5g yeast extract, 10g NaCl.

SOC (1L): 20g Tryptone, 5g Yeast Extract, 0.5g NaCl, 5g MgSO4.7H2O, 20 mM glucose.

PA-0.5G (50ml): 46.13 ml H2O, 50 µl 1M MgSO4, 5 µl 1000x metal mix, 0.63 ml 40% glucose,

2.5 ml 20x NPS, 0.2 ml Methionine 25 mg/ml, 0.5 ml 17 amino acids mix 10mg/ml, 0.25ml

Kanamycin 10mg/ml, 50 µl 33mg/ml chloramphenicol.

1000x metal mix: 0.1 M FeCl3-6H2O, 1 M CaCl2, 1 M MnCl2-4H2O, 1 M ZnSO4-7H2O, 0.2 M

CoCl2-6H2O, 0.1 M CuCl2-2H2O, 0.2 M NiCl2-6H2O, 0.1 M Na2MoO4-5H2O, 0.1 M Na2SeO3-

5H2O and 0.1 M H3BO3.

15 20x NPS: 0.5 M (NH4)2SO4, 1 M KH2PO4 and 1 M Na2HPO4

17 amino acids mix: 10mg/ml 17 commonly used amino acids except Cys, Tyr and Met.

PASM-5052 (1L): 900ml H2O, 1.25 ml 1M MgSO4, l.25ml 1000x Metal Mix, 20ml 50x 5052,

50ml 20x NPS, 1ml 100 µM vitamin B12, 5ml 10mg/ml 17 amino acids mix, 400µl Methionine

25mg/ml, 5ml 25mg/ml Se-Methionine, 5ml Kanamycin 10mg/ml, 1ml 33mg/ml

chloramphenicol.

50x5052: 0.5% glycerol, 0.05% glucose and 0.2% α-lactose

Pyrococcus furiosus genomic DNA was provided by Dr. Michael Adams, University of Georgia.

Methods

Sequence Analysis

Using Pf -1806301 sequence as a query, PSI-BLAST

(http://www.ncbi.nlm.nih.gov/BLAST/ ) and Superfamily (http://supfam.mrc-

lmb.cam.ac.uk/SUPERFAMILY/ ) homology searches were carried out. The multiple sequence

alignment of Pf -1806301, Hfq and Sm/Lsm sequences was manually adjusted based on the

alignment previously presented (32). Also the sequence identities of Pf -1806301 with Hfq and

Sm/Lsm proteins were calculated by program 3D-PSSM (http://www.sbg.bio.ic.ac.uk/~3dpssm/)

using Pf -1806301 sequence as a query.

Cloning and Expression of Recombinant Pf -1806301

Primers were designed from the Pf -1806301 gene in P. furiosus. A Nde I site (underlined)

was placed in the forward primer (5′-GTTCTTCATATGGGTATGGAAAGCCTACTTG – 3′),

and a Xho I site (underlined) was placed in the reverse primer (5′- GTTCTTCTCGAGCTACT

CTAGTAGCATTATCCAGT-3′). Primers were synthesized and purified by Integrated DNA

16 Technologies (IDT). PCR reaction was carried out using a Peltier Thermal Cycler (MJ Research)

to amplify the 216-bp gene from the P. furiosus genomic DNA. The amplification protocol was

as follows: incubation for 4 minutes at 94°C, followed by 35 cycles of the following; 30 seconds

at 94°C, 30 seconds at 45°C and 30 seconds at 72°C, and finished by 10 minutes at 72°C. The

PCR product was visualized by electrophoresis on a 1% agarose gel using TAE (Tris-Acetate-

EDTA) buffer and ethidium bromide, followed by purification using a Qiagen PCR purification

kit. The pET-28 expression vector (Novagen) contains the following elements: a regular

promoter/operator, consisting of the E. coli phage T7 promoter and one lac operator sequence; a

ribosome binding site, a translation start codon; a 6xHis affinity tag coding sequence; a thrombin

recognition site coding sequence; a T7 tag coding sequence; a multi-cloning site; a second 6xHis

affinity tag coding sequence; translation stop codon and a transcriptional terminator. The PCR

product and pET-28 expression vector were then digested by another endonuclease restriction

enzyme Nde I (Promega) at 37°C for 48 hours, followed by Xho I (Promega) for 4 hours. The

resulting fragments were loaded onto 1% agarose gel stained by ethidium bromide, and purified

using Qiagen gel purification kit. The ligation reaction of the digested pET-28 vector and Pf-

1806301 DNA fragment was carried out at 14°C for 16 hours using T4 DNA Ligase (Fisher).

The whole ligation product was then transformed into BL21 GOLD® Competent cells

(Statagene). Five positive transformants were selected, and the insertion was confirmed by PCR

amplification using synthesized forward and reverse primers from IDT. Plasmids from two of

the transformants were then extracted using the High Pure Plasmid Isolation kit from Roche, and

later sent to the Sequencing and Synthesis Facility (University of Georgia, Athens, GA) for verification. The plasmid with the correct insertion was transformed into E.Coli BL21

(DE3)/pRIL® Competent cells (Statagene). The transformation was plated onto an LB agar plate

17 containing 50µg/ml Kanamycin and 35µg/ml Chloramphenicol, and the plate was incubated at

37°C for overnight. Transformants were selected by Kanamycin resistance, and several positive

transformants were grown in LB media with IPTG induction to test for the expression of

recombinant protein.

Purification of Recombinant Pf -1806301

Cells containing the recombinant Pf -1806301 construct were grown in 10 ml LB

medium containing 50µg/ml Kanamycin and 35µg/ml Chloramphenicol. After 5 hours of

incubation at 37°C with shaking at 200 rpm, the culture was used to inoculate 1liter of LB

medium. The 1L culture was incubated at 37°C on a shaker until the OD600 was 0.6 - 0.8. The

culture was then induced by adding IPTG to a final concentration of 1mM, and then incubated

for 16 hours at 28°C with shaking. Cells were harvested by centrifugation at 5,000 rpm for 15

min. The pellet (~ 5 grams) was resuspended thoroughly in 30 ml working buffer (25mM sodium

phosphate and 100 mM NaCl at pH 7.5). Also 1ml B-PER (Bacterial purification Extraction

Reagent) and 100 µl of PMSF (phenyl methyl sulfonyl fluoride) of 0.1 M stock solution were added to the cell solution to increase the lysis efficiency and decrease the protease activity. The solution was then sonicated on ice using a Branson Sonifier Cell Disruptor 450 at power level 7 with 6 bursts of 30 seconds on and 30 seconds off. The resulting lysate was centrifuged at 12,000 rpm for 30 min. The supernatant was transferred to another centrifuge tube and incubated in a

Fisher ISOTEMP 228 water bath at 70°C for 1 hour. The supernatant was again collected by

centrifugation at 12,000 rpm for 30 min, and then was put on ice ready for affinity

chromatography.

A 5 ml Hitrap affinity column from Amersham Biosciences was charged with Ni2+ by first washing the matrix with 2 bed volumes of 1 M NaOH followed by 10 bed volumes of water,

18 then washing with 2 bed volumes of 0.5 M EDTA followed by 10 bed volumes of water, and

charging the matrix with 2 bed volumes of 0.1 M NiCl2 followed by 10 bed volumes of water wash, and finally equilibrated with 5 bed volumes of sample loading buffer, containing 25mM

sodium phosphate and 100 mM NaCl at pH 7.5. Using the AKTA Prime chromatography system

from Amersham Biosciences, the supernatant was loaded onto the 5 ml Ni - NTA Agarose

column previously equilibrated. The column was washed with buffer containing 20 mM

Imidazole, until the UV absorption at 280nm wavelength was about zero, followed by elution

with a linear gradient from 20 mM to 500 mM Imidazole in 100 ml. Fractions were collected and

analyzed by SDS-PAGE.

Thrombin Cleavage of His-Tagged Protein

After the Ni-NTA purification, the recombinant protein was dialyzed against buffer

containing 20 mM Tris-HCl and 100 mM NaCl at pH 8.0 overnight to decrease imidazole concentration. 100 units of high activity human thrombin from Calbiochem were added to every

20 mg of protein solution to the final volume of 2~3 ml. The thrombin digestion reaction was carried out at 37°C for 24 hours. The digestion process was monitored by taking 10 µl aliquots at various time intervals, running the sample on an SDS-PAGE gel, and looking for the band shift.

Once the thrombin cleavage was complete, 1 mM freshly made PMSF was added to stop the reaction. The undigested protein with His-tag was separated from the digested protein by running

the whole mixture though another Ni-NTA column under the same condition as the first time,

and the flowthrough was collected. The recombinant Pf-1806301without C-terminus His-tag was

sent to the UGA Chemical and Biological Science Mass Spectrometry Facility to obtain a more

accurate molecular weight.

19 The recombinant protein obtained from the second NTA was loaded onto Hiload 16/60

Superdex 75 gel filtration column from Amersham Biosciences. Elution was carried out using

20mM Tris-HCl and 100mM NaCl, pH 8.0, at 1min/ml. The protein was collected and concentrated for future use.

SDS Polyacylamide Gel Electrophoresis

The 10-well precast 4-20% iGel was purchased from Gradipore and the 26-well 10-20%

Criterion® Precast gel was purchased from Bio-Rad. Protein samples were mixed with Laemmli

sample loading buffer containing 62.5 mM Tris-HCl, pH 6.8, 2% SDS, 25% glycerol and 0.01%

Bromophenol Blue in a 1:2 ratio, and boiled in water for 10 min. Samples were then loaded onto the gel of choice, and the SDS Polyacrylamide electrophoresis was run in buffer containing 25

mM Tris at pH 8.3, 192 mM glycine and 0.1% SDS at 150 Volt for 70 minutes. After washing

with H2O for 5 min, the gel was stained with Coomassie G-250 for about 1 hour, and then destained in buffer containing 5% ethanol and 7% acetic acid until the background was clear.

Dynamic Light Scattering (DLS) Analysis

DLS analysis of purified protein was carried out using a DynaPro® molecular Sizing

Instruments from Protein Solutions. The 12µl sample cuvet was first cleaned by rinsing with

0.1M HCl, pure ethanol and water. The scattering intensity of water was measured, which should

be less than 10k counts/sec to ensure the cuvet is clean. The protein sample (~20µl) was then

filtered through 0.02µm Anodisc 13 membrane (Whatman), and injected into the cuvet. The

scattering intensity curve of the protein should be able to remain constant and with little

fluctuation. Around 30 data points were collected for each experiment. The scattering data were

analyzed, and a molecular size model was calculated.

20 Gel-Shift Assay

Oligonucleotide UUUUUUUUUU (U10), purchased from Integrated DNA Technology

(IDT), was labelled with 6-carboxyfluorescein (6-FAM) at the 5' end and purified by HPLC to

ensure it was RNase-free. The absorbance and emission wavelength of the fluorescence labelled

RNA were 492 nm and 520 nm respectively. 62 nmole of pure RNA oligonucleotide was dissolved in 100 µl double distilled H2O, and the final concentration was determined to be 0.62

mM. For efficient RNA binding with Pf-1806301, 1 µl of U10 RNA was mixed with 7 µl of protein solution at 28 mg/ml, 1 µl of gel shift buffer (10X) containing 400mM HEPES-KOH pH

7.3, 1 M KCl, 100mM MgCl2, 80% Glycerol and 4mM EDTA, and 1 µl of BSA at 10 mg/ml.

The mixture was incubated at 65°C for 45min, and then mixed with loading buffer containing

80% glycerol and dye in a 1:1 ratio (31). Samples were loaded onto an 8-25% polyacrylamide

mini gel from Bio-Rad, and electrophoresis was run at 250 V for 40min, the whole experiment

was kept at 15°C using the PhastGel system. The native PAGE was visualized under filtered UV

light (filter: 570DF20-D) to locate the RNA band, and a picture was taken. The gel was then

stained with Coomassie blue to locate the protein bands, and another picture was taken after

distaining.

Reductive Methylation of Lysine

The protein solution was dialyzed against methylation buffer containing 20 mM HEPES

and 100 mM NaCl at pH7.6 using a Tube-O DIALYZER with 3000 Da cutoff. The methylation

reaction started by adding 20 µl of 1 M Dimethylamine-borane complex (DMAB) to 1 ml of

protein solution at a concentration of about 10 mg/ml, followed by adding 40 µl of 1 M

formaldehyde while shaking. The reaction mixture was incubated for 2 hours at 4°C in the dark

with shaking. This step was repeated twice, and another 10 µl of DMAB was added to the

21 mixture and incubated for 18 hours at 4°C in the dark with shaking. Buffer exchange was carried

out at last using 20mM Tris-HCl and 100mM NaCl to end the reaction.

Expression and Purification of Se-Met Protein

The plasmid containing the Pf -1806301 gene was transformed into BL21(DE3)/pRIPL

codon plus competent cell. The fresh transformants were inoculated into 50 ml of PA-0.5G

minimal medium, and grown at 37°C for ~ 8 hrs. The culture was then transferred to 5 L PASM-

5052 and incubated for ~ 20 hrs (A600 =1.37). After autoinduction, the cells were collected. The

Se-Met derivatized protein was purified by a 3-step process as described before, Ni-affinity

gradient purification followed by thrombin cleavage of N-terminal His-tag, and gel filtration

purification.

Crystallization of Recombinant Pf -1806301

Initial crystallization screens were set up using modified microbatch methods by mixing

0.5 µl of protein solution with 0.5 µl of the crystallization reagent, covering the drop with mixed oil containing 70% of Paraffin and 30% Silicone oil, and incubating the crystallization tray at 18

°C. Using the Oryx 6 crystallization robot from Douglas Instruments, the native recombinant protein without the N-terminus His-tag was first screened under six crystallization screens kits, including Hampton I and II, Wizard I and II, MP1 and PEG/ION. Favorable crystallization conditions were optimized. Crystals with good diffraction quality were soaked with K2PtCl6 and

HgCl2 powder. Also modification of the native recombinant Pf -1806301 was carried out by

reductive methylation of lysine residues to obtain better crystals.

Crystal screening of Se-Met labeled Pf -1806301 was carried out using the Honey bee

robot from Cartesian Technologies. 384 crystallization conditions from Screen I and II, Membfac,

PEG/ION and Cryo screens from Hampton research, and Wizard I and II screens from Emerald

22 BioStructures, and MemSys screen were used for the screen. The sitting-drop vapor diffusion method was used. Screens were set up by first adding 100 µl of crystallization reagents into the reservoirs of Greiner Crystal Quick 96-well, 3-drop plate, then mixing 200 nl of protein solution with 200 nl of the crystallization reagent, and then covering the whole tray with the ClearSeal

Film (Hampton Research).

Data Collection

Crystals that have been deemed suitable for X-ray diffraction analysis based on their size

(> 50 micron on edge) and morphology (no apparent signs of twinning) were harvested from the drop using an 18 mm Hampton CrystalCap® Copper magnetic pin with the loop size chosen to best match the crystal dimensions. Once harvested, the crystal was briefly (1-2 sec) immersed in a small drop of its well solution containing 30% glycerol (for the initial cryoprotection

experiment). The cryo-protected crystal was recovered from the cryoprotectant drop using the

same Hampton pin, flash-frozen and stored in LN2. Diffraction screening was carried out using

the Rigaku ACTOR/Director automated crystal-screening system consisting of an FR-D high

intensity rotating anode with Max-Flux confocal optics, an AFC-9 four-axis goniometer and a

Saturn 92 CCD detector.

The best-diffracting Pf -1806301 crystals were sent to the Southeast Regional

Collaborative Access Team (SER-CAT), Advance Photon Source (APS), Argonne National

Laboratory. Both native and Se-Met SAS data were collected on MarCCD 225 and MarCCD 300

detectors, and 720 images were collected with 0.5° oscillations. Data were indexed and

integrated by HKL2000.

23 CHAPTER 3

RESULTS

Sequence Analysis

The PSI-BLAST search using the Pf -1806301 sequence as a query failed to give a significant hit, partly because of Pf -1806301’s small molecular size and the relatively low sequence similarity with Sm/Lsm sequences. Further homology search using Superfamily (41) revealed that Pf -1806301 belongs to the Sm and Sm-like protein family, which includes eukaryotic Sm/Lsm, archaeal Lsm and bacterial Hfq proteins. This result was also consistent with TIGR annotation (www.tigr.org). The multiple alignments of Pf -1806301, Hfq and

Sm/Lsm sequences were shown in Figure 3.1, where Pf -1806301 displayed sequence homology to both Hfq and Lsm proteins. Pf -1806301 was therefore identified as an Lsm2 type protein,

which is denoted as Lsm2-PF in Figure 3.1. In addition, using the threading program 3D-PSSM

(42), the sequence identity of Pf -1806301 with Hfq and Sm/Lsm sequences could be calculated

(Table 3.1).

Cloning and Expression of Recombinant Pf-1806301

Using primers designed from the Pyrococcus furiosus ORF #1806301DNA sequence, we

were able to amplify the 216 bp gene from Pyrococcus furiosus genomic DNA. The PCR

product was digested by Nde I and Xho I, and ligated into the pET-28 expression vector by T4

DNA ligase. The main advantage of choosing pET-28 as an expression vector is that it includes a

thrombin cleavage site following the N-terminal 6xHis tag coding sequence. Therefore, we were

able to remove the His-tag once the protein was purified (Figure 3.2).

24

Lsm2-PF 1 ---MESLLEKVLKEWKG-—HKVAVSVGGD-HSFTGTLEDFDEE—VILLK------DVVDV------IGNRGKQMLIGLEDINWIMLLE 70 Hfq-EC 4 ---GQSLQDPFLNALRRERVPVSIYLVNG-IKLQGQIESFDQF—VILLK------NT------VSQMV-YKHAISTVVPSRPVSHHS 71 Hfq-SA 1 MIANENIQDKALENFKANQTEVTVFFLNG-FQMKGVIEEYDKY—VVSLN------SQG------KQHLIYKHAISTYTVETE--- 66 Lsm1-AF 2 ----PPRPLDVLNRSLK--SPVIVRLKGG-REFRGTLDGYDIHMNLVLLDAEEI------QNGEV------VRKVGSVVIRGDTVVFVSPAP-- 74 Lsm1-PA 9 ----FATLGATLQDSIG--KQVLVKLRDS-HEIRGILRSFDQHVNLLLEDAEEI------IDGN------VYKRGTMVVRGENVLFISPVPG 81 Lsm1-Py 1 MAERPLDVIHRSLD--KDVLVILKKG-FEFRGRLIGYDIHLNVVLADAEMI------QDGEV------VKRYGKIVIRGDNVLAISPTEE 73 Lsm1-PF 1 MAERPLDVIHKSLD—-KDVLVILKKG-FEFRGKLIGYDIHLNVVLANAELL------QDGEV------VKKYGKIVIRGDNVLAISPTEEG 74 Lsm1-MT 8 –RVNVQRPLDALGNSLN—-SPVIIKLKGD-REFRGVLKSFDLHMNLVLNDAEEL------EDGEV------TRRLGTVLIRGDNIVYISP 81 Lsm2-AF 1 ----MVLPNQMVKSMVG--KIIRVEMKGEENQLVGKLEGVDDYMNLYLTNAMEC------KGEE------KVRSLGEIVLRGNNVVLIQPQEE 73 hsm-B 2 ----TVGKSSKMLQHID--YRMRCILQDG-RIFIGTFKAFDKHMNLILCDCDEFRKIK--PKNSKQA—-EREEKRVLGLVLLRGENLVSMTVEGPPPKDTG 91 hsm-D1 2 ------KLVRFLMKLSH--ETVTIELKNG-TQVHGTITGVDVSMNTHLKAVKMT------LKNRE------PVQLETLSIRGNNIRYFILPDSLPLDTLLV 81 Hsm-D2 26 -----TGPLSVLTQSVKNNTQVLINCRNN-KKLLGRVKAFDRHCNMVLENVKEMWTEVPKSGKGKKKSKPVNKDRYISKMFLRGDSVIVVLRNPLIAGK 113 hsm-D3 2 ---SIGVPIKVLHEAEG--HIVTCETNTG-EVYRGKLIEAEDNMNCQMSNITVTY------RDG------RVAQLEQVYIRGCKIRFLILPD 75 * * * * ------Sm1 motif Sm2 motif

Figure 3.1 Sequence alignments of archaeal Sm-like proteins with human canonical Sm (hsm) proteins and bacterial Hfq proteins. Shown on top is the secondary structure assignment determined by X-ray analysis. Orange color arrow denotes α-helix, and blue color arrows denote β-strands. β1,2 and 3 strands constitute the first part of the Sm domain, and β4 and 5 strands form the second part. These two parts are linked together by a loop region (L4) with variable length and composition. Residues that almost fully conserved in all Sm/Lsm proteins are labeled in red, and residues highly conserved are in yellow. Residues in loop3 and 5 that involved in uridine binding are indicated by star. Pyrococcus furiosus Lsm2, also named as Pf -1806301, was colored in blue. This figure was adapted from Sauter et al., Nucleic Acids Research, 2003, 31, 4091-4098.

25 Table 3.1 Statistics of current Sm, Lsm and Hfq crystal structures at PDB

PDB Oligomeric Dimension Angle α/β/γ Reso- Space Sequence ID Formation a/b/c (Å) (°) lution group Identity* Lsm1-AF 1I4K heptamer 110.40/ 64.56/ 90.00/ 92.09/ 2.5 P21 17% 129.86 90.00

Lsm1-Py 1H64 heptamer 69.33/ 70.16/ 90.21/90.70/ 1.9 P1 17% 116.01 107.48 Lsm1-MT 1JBM heptamer 45.07/ 54.08/ 87.58/ 72.86/ 1.85 P1 19% 62.35 81.45 Lsm1-PA 1I8F heptamer 100.26/95.74/ 90.00/ 92.09/ 1.75 C2 30% 62.16 90.00 Lsm2-AF 1LJO hexamer 58.42/ 58.42/ 90.00/90.00/ 1.95 P6 22% 32.08 120.00 Hfq-EC 1HK9 hexamer 61.35/ 61.35/ 90.00/90.00/ 2.15 P61 28% 166.10 120.00 Hfq-SA 1KQ1 hexamer 67.03/ 89.95/ 90.00/ 97.95/ 1.55 P21 27% 67.77 90.00 hsm-D3 1D3B hexamer 107.35/108.45/ 90.00/ 90.00/ 2.00 P212121 17% 110.42 90.00 hsm-B 1D3B hexamer 107.35/108.45/ 90.00/ 90.00/ 2.00 P212121 14% 110.42 90.00 hsm-D1 1B34 dimer 75.30/ 75.30/ 90.00/90.00/ 2.50 P62 22% 91.99 120.00 hsm-D2 1B34 dimer 75.30/ 75.30/ 90.00/90.00/ 2.50 P62 19% 91.99 120.00 Lsm2-PF N/A octamer 69.57/ 69.64/ 80.55/80.64/ 2.8 P1 69.80 71.05 N/A 82.96/82.96/ 90.00/90.00/ 2.7 P6222/ 189.785 120.00 P6422

*Sequence identity was calculated by the program 3D-PSSM using the Lsm2-PF sequence as a query.

26

pET 28 T7 promoter RBS ATG 6xHis tag Thrombin recognition site Lsm2-PF gene

Translation

ATG 6xHis tag Thrombin recognition site Lsm2-PF protein LeuValProArgGlySer

Thrombin cleavage Lsm2-PF protein

Figure 3.2 Purification scheme. Lsm2-PF gene was cloned into the multiple cloning sites located just downstream of the thrombin recognition site in vector pET28. Lsm2-PF was expressed in E.Coli BL21 (DE3). The His-tag was removed by thrombin cleavage after Ni-affinity purification. The final purified Lsm-PF protein contains a short sequence of GSHM at the N- terminal.

27 Purification of Recombinant Pf -1806301

The results of the Ni-NTA column purification were shown in Figure 3.3. From the purification profile, it could be seen that a protein peak of ~ 200 mAu appeared during the

elution using a 20-500 mM imidazole gradient. The 4-20% gradient SDS-PAGE showed that the

recombinant protein was relatively pure, and the majority of the contamination was removed

after the heat treatment. Also, a special protein band with the molecular weight of 60~70 kDa

was shown always purified together with Pf -1806301 (10.13kDa). Since the appearance of the

band was dependent on both the concentrations of the pure Pf -1806301 sample and the

completeness of the denaturation of the protein during SDS-PADE, the 60-70kDa band was

considered to be a polymeric band of Pf -1806301.

Thrombin Cleavage of His-Tag Protein and Ni-affinity Purification

A high concentration of imidazole inhibits Thrombin protease, therefore buffer exchange

was necessary to remove excess imidazole after Ni-affinity purification. For a more complete

cleavage result, protein was incubated at 37°C for 24 hrs, and no multiple digestions occurred

(Figure 3.4). The Pf -1806301, without the His-tag, does not have the ability to bind to a Ni-

affinity column. Thus it was separated from the undigested protein by passage through the

column.

Dynamic Light Scattering (DLS) Analysis

In DLS the diffusion constant of the molecule in the solution was calculated from the

autocorrelation function of the scattered light. From the diffusion constant, the hydrodynamic

radius was then calculated. DLS was used to build the size distribution model of the sample,

hence giving information on degree of aggregation and molecular weight of the native protein

sample. In the case of Pf -1806301, the polydispersity index of the hydrodynamic radius was 2%,

28

A.

B.

85K polymer 39.7K 17.5K monomer 7.1K

Figure 3.3 Pf -1806301 was purified over Ni-NTA agarose. (A) The purification profile of the imidazole concentration is colored green and UV280 is colored blue. (B) SDS-PAGE analysis of selected fractions in the peak area. The molecular weight of the Pf -1806301 monomer is 10.13 kDa based on calculation. The polymeric bands with MW about 60-70k Da were also shown.

29 A B

17.5k 7.1k

Figure 3.4 Superdex-75 gel filtration purification of Pf -1806301 after the His-tag was removed. (A) Purification profile (B) SDS-PAGE shows the final purified protein without a His-tag, which has the MW of 8.25 kDa.

30 which is much smaller than 15%, indicating the sample was monodispersed (Figure 3.5). Also

the baseline level and the SOS (Sum of Squares) error were very low, showing the molecular

size distribution was monomodal, and there was no random aggregation. Interestingly, the

calculated molecular weight was 107 KDa, suggesting Pf -1806301 is organized as a 13-mer

given that the size of the monomer is 8.25 KDa. Given that Pf -1806301 was already

characterized as an archaeal Lsm protein, and the formation of a dimer of heptamers (14-mer)

were observed in some archaeal Lsm crystal structures (40), it is proper to suggest that

Pf-1806301 forms a dimer of hexamer (12-mer) or a dimer of heptamer (14-mer) in solution.

Gel-Shift Assay

The gel shift assay, or electrophorectic mobility shift assay, is a simple and rapid method

for studying the DNA/RNA-binding properties of the target proteins. This assay is based on the

fact that the DNA/RNA-protein complex migrates more slowly than the free DNA/RNA in the

non-denaturing polyacrylamide gel. Therefore a shift of the RNA band would be observed when

the protein did bind with its target RNA. In this thesis, the fluorescence bands shown in figure

3.6A indicate the position of ligand RNA and protein-RNA complex, and the Coomassie blue stained bands in figure 3.6B indicate the position of the protein and protein-RNA complex. Lane

1 and 3 shown in both gel pictures are loaded with RNA and protein control respectively.

Comparison of the two gel pictures showed no visible shift of the RNA band at lane 2. Therefore this experiment suggested that Pf -1806301 did not bind to the poly-uridine (U10).

Expression of Selenomethionine (Se-Met) labelled protein

In order to solve the protein crystal structure using the method of multiwavelength

anomalous diffraction (MAD), selenomethionine has to be incorporated into the recombinant

proteins to replace the original methionine residue. We hereby utilize PASM-5052 auto-inducing

31

Figure 3.5 Dynamic light scattering analysis of native Pf -1806301 in solution. The size distribution shows that the sample is highly homogenous. The calculated molecular weight based on the hydrodynamic radius shown above is 107kDa. The diagram was generated by program Exdynals.

32 A B

3 2 1 1 2 3

Pf -1806301(µl) 6-FAM labeled RNA(µl) Gel shift buffer(µl) BSA (µl) Total(µl) Lane 1 7 (H2O) 1 1 1 10 Lane 2 7 1 1 1 10 Lane 3 7 7 (H2O) 1 1 10

Figure 3.6 In vitro binding of the fluorescence labelled oligo U with Pf -1806301. (A) Gel was visualized under UV light, showing the position of the uridine oligomer. (B) The SDS-PAGE gel was stained with Coomassie blue, showing the position of the protein complex. Lane 1 and 3 are RNA and protein controls of the experiment respectively. In lane 2, 24 nmole of Pf -1806301 was incubated with 0.62 nmole of 6-FAM labeled oligo (U) for 45min at 60°C. Gel picture A was laid side by side with picture B to show the band shift.

33 medium to express the Se-Met labeled protein in E. Coli, which has succeeded in producing a

large number of Se-Met labelled P. furiosus proteins. Since the replacement of the methionine by selenomethionine is isomorphous, the protein structure is expected to be subject to little change.

Therefore the same purification procedures as with the native protein was used, and the yield was comparable.

Crystallization of Pf -1806301 without His-tag

There are a variety of methods that can be used to crystallize proteins. Modified microbatch is one of them, in which a small drop of a sample is mixed with the crystallization reagent of choice, and a layer of oil (usually a mixture of paraffin and silicon oil) is added above the droplet to let slow evaporation occur. Primary crystallization screening showed that small crystals were readily formed in Hampton I, Wizard I and PEG/ION screens, but optimization screens of selected crystallization conditions could not produce crystals with high enough quality for data collection. Methylation of native Pf -1806301 also did not improve the crystal quality

either. Promising initial crystallization screen conditions were then tested in combination with an

additive screen. Among them, Wizard I #45 which containing 20% PEG 3000 and 0.1M NaAc at

pH 4.5 yield diamond-shaped crystals in a large number of additive conditions. These crystals

were used for heavy atom soaking and later for data collection. The best crystal diffracted to 3.2

Å. (Figure 3.7). With the high mosaicity, cryo protectant optimization had been carried out, and

the significant improvement in crystal diffraction had been observed.

In addition, stronger anomalous diffraction is also required. High purity Se-Met labeled

Pf -1806301 was then used to screen the 384 conditions using the sitting-drop vapor diffusion

method. Different shapes of crystals appeared in more than 20 crystallization conditions within

two weeks. After testing the initial diffraction of the crystals using the home-source X-ray

34 instrument, it appeared that diamond-shaped crystals diffracted better than plate and needle-

shaped crystals. The best plate-shaped crystals grown from Hampton II #10 (0.2M NaCl, 0.1M

NaAc pH4.6, 30% MPD) diffracted to 2.8 Å at the APS. The best diamond-shaped crystals grown from Memfac #10 (0.1M NaAc pH4.6, 0.1M NaCl, 12% PEG 6000) diffracted to 2.7Å using the home-source X-ray instrument (Figure 3.8). Six optimization screens were made based on the crystallization conditions that produced the best diamond-shaped crystals. However, only condition 1F (0.1M NaAc pH3.5, 0.35M (NH4)2SO4, 10% PEG 4000) from Memfac #3

optimization screen produced crystals that could be mounted.

Data Collection and Processing

A. Data collection of Native protein crystal soaked with heavy atom

Native protein crystals grown from WI #45 with additive diffracted up to 2.7 Å. Intensive

heavy atom soaking was carried out and crystals soaked with K2PtCl6 diffract to ~ 3.0 Å at the

synchrotron. No heavy atom sites were observed in the anomalous difference Patterson map and

it was assumed that the heavy atoms were not specifically incorporated into the crystals. Table

3.2 shows the data collection statistics of these data sets.

B. Data collection of Se-Met protein crystal with P1 space group

Data was collected at 22-ID of the SER-CAT beamline, APS (Argonne National

Laboratory), using a MarCCD 300 detector. The data collection parameters are listed in Table

3.3. The data were indexed and integrated using program d*TREK (43) and scaled using 3dscale

(44).

The scaled isomorphous data were used to perform the self-rotation function calculation.

Program GLRF (45) was used in this calculation. The resolution and integration radius were

15.0 - 3.5 Å and 15.0 Å separately. Different κ angles (180°, 30° - 60°) were calculated while

35

Figure 3.7 Crystallization of Pf -1806301 Crystals were grown from Wizard I #45 ( 20% PEG 3000 and 0.1M NaAc at pH 4.5 ), diamond-shaped crystals were found in several additive conditions.

36

Figure 3.8 The best diffracted Se-Met crystal was grown from Memfac #10 (0.1M NaAc pH4.6,

0.1M NaCl, 12% PEG 6000) and diffracted up to 2.7Å with the home source X-ray instrument.

37 Table 3.2 Data collection statistics of crystals soaked with heavy atom

Rsym Comple I/σ Redun Mosaic Cell a/b/c (Å) Cell α/β/γ Space Resolu (%) -teness -dancy Spread (°) group -tion (%) (°) Pfu- 8.1 96.1 25.72 8.2 0.26 83.374/83.374/ 90/90/120 P6222/ 2.9 1806301- 190.046 P6422 001 78-dt 1.sca Pfu- 11.5 97 42.64 37.5 0.64 83.091/83.091/ 90/90/120 P6222/ 3.0 1806301- 189.001 P6422 001 79-dt 1.sca Pfu- 7.8 97.7 51.03 20.8 0.8 82.41/82.41/ 90/90/120 P6222/ 2.94 1806301- 190.094 P6422 001 81-dt 1.sca Pfu- 16.3 97.9 18.11 4.5 1.19 83.019/83.019/ 90/90/120 P6222/ 3.0 1806301- 189.511 P6422 001 85-dt 1.sca Pfu- 8.6 91.1 21.007 5.2 0.757 82.96/82.96/ 90/90/120 P6222/ 2.67 1806301- 189.785 P6422 001 87-dt 1.sca Pfu- 13.9 99.6 23.52 5.2 0.688 83.218/83.218/ 90/90/120 P6222/ 3.2 1806301- 189.438 P6422 001 88-dt 1.sca Pfu- 8.3 77.5 31.13 16.2 0.37 83.963/83.963/ 90/90/120 P6222/ 3.0 1806301- 189.143 P6422 001 104- dt 1.sca *

* Data collected from crystals grown from Se-Met protein.

38 scanning ψ and φ from 0° to 180°. A strong peak was observed at ψ = 102°, φ = 108° and κ =

45° (Figure 3.9). This indicated the existence of an octameric form of the Pf -1806301 molecules. Another self-rotation function calculation was made with κ = 180°. One strong peak and eight weaker peaks were observed in the self-rotation function plot (Figure 3.10). The strong peak was at the same location of the 8-fold axis in the spherical polar space (ψ = 102° and

φ = 108°). The other eight peaks represented the eight 2-fold axes that are perpendicular to the

2-fold (or 8-fold) axis of the octameric ring. Also based on the Matthews coefficient calculation

(Table 3.4), when there were 16 molecules per asymmetric unit, VM was 2.4 which is right in the

range of 1.7~4.0 (46). Therefore, the Pf -1806301 complex is predicted to be a dimer of octamers,

and the 8-fold axis is perpendicular to the eight 2-fold axes of the ring.

39 Table 3.3 Data collection parameters of P1 space group crystal

Distance(mm) 340.0 Oscillation (°) 0.5 Exposure (sec) 4.0 Wavelength (Å) 0.9688 Scan range (°) 360 Space group P1 a,b,c (Å) 69.57, 69.64, 69.80 α, β, γ (°) 80.55, 80.64, 71.05 Rsym(%) 8.0 Resolution (Å) 2.8

40 ( 102, 108 )

Figure 3.9 Self-rotation function plot with κ = 45°.

A strong peak was observed at ψ = 102°, φ = 108° and κ = 45°, indicating an 8-fold axis.

41 ( 15, 153 ) ( 15, 66 )

( 36, 36 )

( 57, 27 )

( 78, 21 )

( 102, 108 )

( 102, 15 )

( 123, 9 )

( 144, 0 )

Figure 3.10 Self-rotation function plot with κ = 180°. The strong peak at the center of the sphere indicates a 2-fold axis that coincides with the 8-fold axis. The other eight peaks represent the eight 2-fold axes that are perpendicular to the 2-fold (or 8-fold) axis.

42 Table 3.4 Matthews coefficient of P1 space group crystal

Nmol/asym Matthews Coeff %solvent ______1 38.5 96.8 2 19.2 93.6 3 12.8 90.3 4 9.6 87.1 5 7.7 83.9 6 6.4 80.7 7 5.5 77.5 8 4.8 74.2 9 4.3 71.0 10 3.8 67.8 11 3.5 64.6 12 3.2 61.4 13 3.0 58.1 14 2.7 54.9 15 2.6 51.7 16 2.4 48.5 17 2.3 45.3 18 2.1 42.0 19 2.0 38.8 20 1.9 35.6 21 1.8 32.4 22 1.7 29.1 23 1.7 25.9 24 1.6 22.7 25 1.5 19.5 26 1.5 16.3 27 1.4 13.0 28 1.4 9.8 29 1.3 6.6 30 1.3 3.4 31 1.2 0.2 ______

43 CHAPTER 4

DISCUSSION

Sequence comparison showed that Pf-1806301 shares with archaeal Lsm proteins the

conserved pattern of hydrophobic residues in β1, β2 and β4, as well as highly conserved residues

at Gly29 and Asp35. Also, analysis of the secondary structure of Pf -1806301 revealed sequence

motifs consistent with the Sm-fold. Interestingly Pf -1806301 does not contains the highly

conserved residues His37, Asn39 and Arg63 that are involved in uridine binding, but shares the

sequence of VILLK at β3 with Hfq proteins. Nevertheless, Pf -1806301 could not be

characterized as an Hfq protein, since it lacks the highly conserved YKHAI sequence at the loop

5 region of all the known Hfq proteins. Based on the fact that only two types of Lsm proteins are

present in each archaeal genome, and Lsm1 type protein was already identified in P. furiosus, it

is proper to consider Pf -1806301 an Lsm2 type protein.

In eukaryotes, spliceosomal Sm proteins bind to the snRNA at the Sm site with the

consensus sequence of PuAU4-6GPu. Archaeal Lsm proteins also show a clear preference for

uridine-rich RNA as determined by gel shift assays. Finally the 3-D structures of several archaeal

Lsm proteins demonstrate the existence of a conserved uridine-binding pocket (31, 39, 40).

However Pf -1806301, although we show evidence that this is an Lsm-PF, did not bind with poly

U sequence in the gel shift assay. Two reasons might explain this result. 1) Since Pf -1806301

lacks the three conserved residues (His37, Asn39 and Arg63) that form the uridine-binding

pocket, the protein itself may not have the ability to bind uridine-riched RNA, and it is also

possible that the protein may bind other type of sequences. Similar situations also happen to

Lsm2 from Sulfolobus solfataricus and Aeropyrum pernix. When Thr substituted His at position

37, the proteins show significantly weaker RNA binding (31). 2) In the gel-shift experiment,

44 radioactively labeled RNA is generally used for the detection of target RNA at even fmol concentrations (36). However the 6-FAM fluorescence labeled RNA used in Pf -1806301 experiment could only be detected at µmol concentration. Therefore, due to weak binding, even if the target RNA indeed binds with Pf -1806301, the shifted band may be too faint to detect.

In this thesis we have shown that the crystals can be improved by choosing the expression vector pET28. In this case, the His-tag could be easily removed by thrombin digestion after the Ni-affinity purification. Thrombin cleavage results in the retention of only four amino acids at the N-terminus of the target protein, which is much less than the ‘old version’ of the His- tagged Pf -1806301 clone. Small tags such as the His-tag are usually considered non-interfering with protein folding and function, and over 100 His-tagged proteins have been deposited into the

Proteins Data Bank (47). Yet crystals of His-tagged protein may vary slightly regarding their mosaicity and diffraction compared with native protein (48). In this thesis, the final purified native Pf -1806301 did yield better-diffracted crystals than the protein with His-tag attached.

EM and crystallographic studies showed that Sm/Lsm and Hfq complexes are organized in either hexameric or heptameric ring structures both in vivo and in vitro studies. The oligomerization of Pf-1806301 was first observed on SDS-PAGE gels, and further confirmed by dynamic light scattering analysis. Based on at these studies, Pf -1806301 was first assumed to be a dimer of hexamers or heptamers, which were also seen in the Lsm complex of P. abyssi, P. aerophilum, and M. thermautotrophicum. However, recent self-rotation studies on the diffraction data of Pf -1806301 revealed that the overall structure is actually a dimer of octamers. This result is significant in terms of protein structure and function, since none of the Sm/Lsm or Hfq complexes currently known adopt the octameric form. To reveal how and why this octameric ring is formed, better diffraction data are needed.

45 From an evolutionary point of view, hyperthermophilic archaea belong to the most slowly evolving organisms. In addition, archaea are considered closely related to the precursor of eukaryotic nuclei (47). Archaeal Lsm protein complexes therefore may represent an ancient form of eukaryotic snRNPs. Presently, the structures of four Lsm1 and one Lsm2 archaeal protein complexes have been determined. This structural information provided valuable insight into the formation and evolution of the eukaryotic Sm core complex (34, 35). Meanwhile Pf -1806301, an archaeal Lsm2 protein characterized in this thesis, shows sequence similarity to both archaeal

Lsm and bacteria Hfq proteins. The three-dimensional structure of Pf -1806301 therefore will not only help reveal the function of the protein, but also offer in-depth implications as to the relationship between Lsm and Hfq proteins.

46

REFERENCES

(1) Kim, S. H. (1998) Shining the light on structural genomics. Nature Struct Biol. 5, 643-645.

(2) Adams, M. W. W., Dailey, H. A., Delucas, L. J., Luo, M., Prestegard, J. H., Rose, J. P. and

Wang, B. C. (2003) The southeast collaboratory for structural genomics: a high-throughput gene to structure factory. Acc. Chem. Res. 36, 191-198

(3) Stetter, K. O. (1982) Ultrathin mycelia forming organisms from submarine volcanos areas having an optimum growth temperature of 105°C. Nature, 300, 258-60.

(4a) Roopali, R. (2001) Tungsten-containing aldehyde oxidoreductases: a novel family of enzymes from hyperthermophilic archaea. Ph. D dissertation, University of Georgia

(4b) Adams, M. W. W. (1993) Enzymes and proteins from organisms that grow near and above

100 degrees C. Annu Rev Microbiol. 47, 627-58.

(5) Woese, C.R., Kandler, O., Wheelis, M.L. (1990) Towards a Natural System of Organisms:

Proposal for the Domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. 87, 4576-4579.

(6) Stetter, K. O. (1996). Hyperthermophilic prokaryotes. FEMS Micro. Rev. 18, 149-158

(7) Magrum, L. J., Luehrsen, K. R., and Woese, C. R. (1978) Are extreme halophiles actually

"bacteria"? J. Mol. Evol. 11, 1-8.

(8) Burge, C. B., Tuschl, T. & Sharp, P. A. (1999). Splicing of precursors to mRNAs by the spliceosomes. In The RNA World (Gesteland, R. F., Cech, T. R. & Atkins, J. F., eds), pp. 525-

560, Cold Spring Harbor Press, Cold Spring Harbor, NY.

47 (9) Mattaj, I. W., Tollervey, D. and Seraphin,B. (1993) Small nuclear RNAs in messenger RNA and ribosomal RNA processing. FESEB J. 7, 47-53.

(10) He,W. and Parker, R. (2000) Functions of Lsm proteins in mRNA degradation and splicing.

Curr. Opin. Cell Biol. 12, 346-350.

(11) Kufel,J., Allmang,C., Petfalski,E., Beggs,J. and Tollervey,D. (2003) Lsm proteins are required for normal processing and stability of ribosomal RNAs. J. Biol. Chem. 278, 1239-1247.

(12) Bouveret, E., Rigaut, G., Shevchenko, A., Wilm, M. and Seraphin, B. (2000). A Sm-like protein complex that participate in mRNA degradation. EMBO J. 19, 1661-1671.

(13)Tharun, S., He, W., Mayes, A. E., Lennertz, P., Beggs, J. D. and Parker, R. (2000). Yeast

Sm-like proteins function in mRNA decapping and decay. Nature, 404, 515-518.

(14) Seto, A. G., Zaug, A. J., Sobel, S. G., Wolin, S. L. and Cech, T. R. (1999). Saccharomyces cerevisiae telomerase is an Sm small nuclear ribonucleoprotein particle. Science, 401, 177-180.

(15) Strub, K., Galli, G., Busslinger, M. and Birnstiel, M. L. (1984). The cDNA sequences of the sea urichin U7 small nuclear RNA suggest specific contacts between histone mRNA precursor and U7 RNA during RNA processing. EMBO J. 3, 2801-2807.

(16) Schaufele, F., Gilmartin, G. M., Bannwarth, W. and Birnstiel, M. L. (1986). Compensatory mutations suggest that base-pairing with a small nuclear RNA is required to form the 3’ end of messenger-RNA. Nature, 323, 777-781.

(17) Galli, G., Hofstetter, H., Stunnenberg, H. G. and Birnstiel, M. L. (1983). Biochemical complementation with RNA in the Xenopus oocyte: a small RNA is required for the generation of 3’ histone mRNA termini. Cell, 34, 823-828.

(18) Will, C. L., Lührmann, R. (1997) snRNP structure and function. In Eukaryotic mRNA

Processing. (Krainer, A. R.eds), pp.130-173, Oxford: IRL Press.

48 (19) Tan, E. M. (1989) Antinuclear antibodies: diagnostic markers for autoimmune diseases and probes for cell biology. Adv. Immunol., 44, 93-151.

(20) Lerner, M. R., and Steitz, J. A. (1979). Antibodies to small nuclear RNAs complexed with proteins are produced by patients with systematic lupus erythematosus. Proc. Natl Acad. Sci.

USA 76, 5495-5499.

(21) Lührmann, R., Kastner, B. and Bach, M. (1990) Structure of spliceosomal snRNPs and their role in pre-mRNA splicing. Biochem. Biophys. Acta, 1087,265-292.

(22) Cooper, M., Johnston, L. H. and Beggs, J. D. (1995). Identification and characterization of

Uss1p (Sdb23p): a novel U6 snRNA-associated protein with significant similarity to core proteins of small nuclear ribonucleoprotiens. EMBO J., 14, 2066-2075.

(23) Séraphin, B. (1995) Sm and Sm-like proteins belong to a large family: identification of proteins of the U6 as well as the U1, U2, U4 and U5 snRNPs. EMBO J. 14, 2089-2098.

(24) Hermann, H., Fabrzio, P., Raker, V. A., Foulaki, K., Hornig, H., Brahms, H. and Lührmann,

R. (1995) SnRNP Sm proteins share two evolutionarily motifs which are involved in Sm protein-protein interactions. EMBO J. 14, 2076-2088.

(25) Salgado-Garrido, J., Bragado-Nilsson, E., Kandels-lewis., and Séraphin, B. (1999). Sm an

Sm-like proteins assemble in two related complexes of deep evolutionary origin. EMBO J. 18,

3451-3462.

(26) Branlant, C., Krol, A., Ebel, J. P., Lazar, E., Haendler, B. and Jacob, M. (1982). U2 RNA shares a structural domain with U1, U4 and U5 RNAs. EMBO J. 1, 1259-1265.

23. Kambach, C., Walke, S., Young, R., Avis, J. M., de laFortelle, E., Raker, V. A., Lührmann,

R., Li, J., an Nagai, K. (1999). Crystal structures of two Sm protein complexes and their implications for the assembly of the spliceosomal snRNPs. Cell , 96, 375-387.

49 (27) Mattaj, I. W. (1986). Cap trimethylation of U snRNA is cytoplasmic and dependent on U

snRNP protein binding. Cell, 46, 905-911.

(28) Fischer, U., Sumpter, V., Sekine, M., Satoh, T., and Lührmann, R. (1993). Nucleo- cytoplasmic transport of U snRNPs: definition of a nuclear location signal in the Sm core domain that binds a transport receptor independently of the m3G cap. EMBO J. 12, 573-583.

(29) Kastner, B., Bach, M. and Lührmann, R. (1990). Electron microscopy of small nuclear ribonucleoprotein (snRNP) particles U2 and U5: evidence for a common structure-determining principle in the major U snRNP family. Proc. Natl Acad. Sci. USA, 87, 1710-1714.

(30) Achsel, T., Brahms, H., Kastner, B., Bachi, A., Wilm, M. and Lührmann, R. (1999). A

doughnut-shaped heteromer of human Sm-like proteins binds to the 3'-end of U6 snRNA,

thereby facilitating U4/U6 duplex formation in vitro. EMBO J. 18, 5789-5809.

(31) Törö, I., Basquin, J., Teo-Dreher, H. and Suck, D. (2002). Archaeal Sm proteins from

heptameric and hexameric complex: crystal structures of the Sm1 and Sm2 proteins form the

hyperthermophile Archaeoglobus fulgidus. J. Mol. Biol. 320, 129-142

(32) Sauter, C., Basquin, J. and Suck, D. (2003). Sm-like proteins in Eubacteria: the crystal

structure of the Hfq protein form Escherichia coli. Nucleic Acids Research. 31, 4091-4098.

(33) Schumacher, M. A., Pearson, R. F., Møller, T., Valentin-Hansen, P. and Brennan, R. G.

(2002). Structures of the pleiotropic translational regulator Hfq and Hfq-RNA complex: a bacterial Sm-like protein. The EMBO Journal. 21. 3546-3556.

(34) Collins, B. M., Harrop, S. J., Kornfeld, G. D., Dawes, I. w., Curmi, P. M. and Mabbutt, C. B.

(2001). Crystal structure of a heptameric Sm-like protein complex from Archaea: implications

for the structure and evolution of snRNPs. J. Mol. Biol. 309, 915-923.

50 (35) Mura, C., Cascio, D., Sawaya, M. R. and Eisenberg, D. S. (2001). The crystal structure of a heptameric archaeal Sm protein: implications for the eukaryotic snRNP core. Proc. Natl Acad.

Sci. USA, 98, 5532-5537.

(36) Törö, I., Thore, S., Mayer, C., Basquin, J., Séraphin, B. and Suck, D. (2001). RNA binding in an Sm core domain: X-ray structure and functional analysis of an archaeal Sm protein complex. EMBO J. 20, 2293-2303.

(37) Stark, H., Dube, P., Lührmann, R. and Kastner, B. (2001). Arrangement of RNA and proteins in the spliceosomal U1 small nuclear ribonucleoprotein particles. Nature, 409, 539-542.

(38) Raker, V. A., Hartmuth, K., Kastner, B. and Lührmann, R. (1999) Spliceosomal U snRNP core assembly: Sm proteins assemble onto an Sm site RNA nonanucleotide in a specific and thermodynamically stable manner. Mol. Cell. Biol. 19, 6554-6565.

(39) Thore, S., Mayer, C., Sauter, C., Weeks, S. and Suck, D. (2003) Crystal structures of the

Pyrococcus abyssi Sm core and its complex with RNA. J. Biol. Chem. 278, 1239-1247.

(40) Mura, C., Kozhukhovsky, A., Gingery, M., Phillips, M. and Eisenberg, D. (2003) The oligomerization and lignad-binding properties of Sm-like archaeal proteins (SmAPs). Protein

Science. 12, 832-847.

(41) Gough, J., Karplus, K., Hughey, R. and Clothia, C. (2001) Assignment of homology to genome sequences using a library of hidden markov models that represent all proteins of known structure. J. Mol. Biol., 313(4), 903-919.

(42) Kelley, L.A., MacCallum, R.M. and Sternberg M.J.E. (2000). Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299(2), 499-520.

(43) Pflugrath, J.W. (1999) The finer things in X-ray diffraction data collection. Acta Cryst. D55,

1718-1725.

51 (44) Fu, Z.-Q., Pressprich, M., Sparks, R., Foundling, S. and Phillips, J. (2000) Scaling Bio-

Macromolecular Crystal Diffraction Data Using 3-Dimensional Models with Free-R Test.

Abstracts of 2000 ACA Annual Meeting, St. Paul, MN, USA, July 21-28. P066.

(45) Tong, L. and Rossmann, M. G. (1997) Rotation function calculations with GLRF program.

Methods in Enzymology, 276, 594-611.

(46) Kantardjieff KA, Rupp B. (2003) Matthews coefficient probabilities: Improved estimates for unit cell contents of proteins, DNA, and protein-nucleic acid complex crystals. Protein

Science, 12, 1865-1871.

(47) Terpe, K. (2003) Overview of tag protein fusions: from molecular and biochemical fundamentals to commercial systems. Appl Microbiol Biotechnol. 60, 523-533.

(48) Hakansson, K., Broder, D., Wang, A. H., Miller, C. G. (2000) Crystallization of peptidase T from Salmonella typhimurium. Acta Crystallogr D Biol Crystallogr. 56, 924-926.

(49) Puhler, G., Leffers, H., Gropp, F., Palm, P., Klenk, H. P., Lottspeich, F., Garrett, R. A. and

Zillig, W. (1989) Archaebacterial DNA-dependent RNA polymerases testify to the evolution of the Eukaryotic nuclear genome. Proc. Natl Acad. Sci. USA, 86, 4569-4573.

52