View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Elsevier - Publisher Connector

Cell, Vol. 96, 375–387, February 5, 1999, Copyright 1999 by Cell Press Crystal Structures of Two Sm Protein Complexes and Their Implications for the Assembly of the Spliceosomal snRNPs

Christian Kambach,* Stefan Walke,* Robert Young,*§ RNA–RNA interactions to allow the trans-esterification

Johanna M. Avis,*k Eric de la Fortelle,* reactions to occur (Moore et al., 1993; Nilsen, 1994). Veronica A. Raker,† Reinhard Lu¨ hrmann,† The snRNPs are named after their RNA components: Jade Li,* and Kiyoshi Nagai*‡ the U1, U2, U4/U6, and U5 snRNPs contain U1, U2, U4/ *MRC Laboratory of Molecular Biology U6, and U5 small nuclear RNAs (snRNAs), respectively. Hills Road Their protein components are classified into two groups: Cambridge CB2 2QH specific proteins such as U1A, U1 70K, U2BЈЈ, and U2AЈ, England present only in a particular snRNP, and the core proteins † Institut fu¨ r Molekularbiologie und Tumorforschung common to U1, U2, U4/U6, and U5 snRNPs (Lu¨ hrmann Philipps-Universita¨ t Marburg et al., 1990; Nagai and Mattaj, 1994). In the spliceosomal Emil-Mannkopff Stra␤e2 snRNPs from HeLa cell nuclear extract, seven core pro-

D-35037 Marburg teins, referred to as the B/BЈ,D1,D2,D3,E,F,andG Germany proteins, have been identified. The core proteins are also called the Sm proteins due to their reactivity with autoantibodies of the Sm serotype from patients with Summary systemic lupus erythematosus (Lerner and Steitz, 1979). The Sm proteins form a distinct family characterized The U1, U2, U4/U6, and U5 small nuclear ribonucleo- by a conserved Sm sequence motif in two segments, protein particles (snRNPs) involved in pre-mRNA splic- Sm1 and Sm2, separated by a linker of variable length

,ing contain seven Sm proteins (B/B؅,D1,D2,D3,E,F, (Cooper et al., 1995; Hermann et al., 1995; Se´ raphin and G) in common, which assemble around the Sm 1995) (Figure 1A). The shortest G protein contains 75 site present in four of the major spliceosomal small residues and is almost made exclusively of the Sm motif nuclear RNAs (snRNAs). These proteins share a com- (Hermann et al., 1995), whereas the longest B and BЈ mon sequence motif in two segments, Sm1 and Sm2, proteins, which are alternatively spliced products of a separated by a short variable linker. Crystal structures single gene, contain the Sm motif near the N termini and

of two Sm protein complexes, D3B and D1D2, show differ only at their C termini by 11 residues (van Dam et that these proteins have a common fold containing al., 1989; Chu and Elkon, 1991). In the absence of U

an N-terminal helix followed by a strongly bent five- snRNA, three stable complexes of the Sm proteins, D3B

stranded antiparallel ␤ sheet, and the D1D2 and D3B (or D3BЈ), D1D2, and EFG, are found (Lehmeier et al., dimers superpose closely in their core regions, includ- 1994; Hermann et al., 1995; Raker et al., 1996). Deletion

ing the dimer interfaces. The crystal structures sug- mutagenesis of the B and D3 proteins demonstrated that gest that the seven Sm proteins could form a closed the Sm motif is necessary and sufficient for the formation

ring and the snRNAs may be bound in the positively of the D3B protein complex, suggesting that the Sm charged central hole. motif corresponds to a stable domain that mediates the Sm protein–protein interactions (Hermann et al., 1995). The core Sm proteins assemble around a conserved, Introduction uridyl-rich sequence called the Sm site present in the U1, U2, U4, and U5 snRNAs to form the core snRNP The protein coding sequence of the majority of eukary- domain. None of the Sm protein complexes, D B (or otic genes is interrupted by noncoding intervening se- 3 D3BЈ), D1D2, or EFG, can alone associate stably with the quences (introns). Following transcription into mRNA U snRNA (Fisher et al., 1985; Feeney et al., 1989; Raker precursors, the introns are excised to form continuous et al., 1996); however, the D D and EFG complexes coding sequences by two-step trans-esterification re- 1 2 together form a stable subcore snRNP complex with the actions within a macromolecular assembly called the U snRNA. Further binding of the D B complex to the (Steitz et al., 1988; Baserga and Steitz, 3 subcore domain yields the complete core domain (Raker 1993; Moore et al., 1993). The major components of the et al., 1996). spliceosome are four RNA–protein complexes, the U1, Formation of the core snRNP domain plays a critical U2, U4/U6, and U5 small nuclear ribonucleoprotein par- role in snRNP biogenesis. The U1, U2, U4, and U5 ticles (snRNPs). These snRNPs recognize short con- snRNAs are transcribed by RNA polymerase II and co- served sequences at the 5Ј and 3Ј exon–intron junctions transcriptionally acquire an N7-methyl-guanosine (m7G) and the branchpoint within the introns and assemble cap, also found in mRNAs. They are transported to the into catalytically active in which these cytoplasm, where the assembly of the Sm proteins sites are brought into close proximity by a network of around the Sm site takes place. The core domain is required for the hypermethylation of the m7G cap to a ‡ To whom correspondence should be addressed (e-mail: kn@ 2,2,7-trimethyl-guanosine (m3G) cap (Mattaj, 1986). The mrc-lmb.cam.ac.uk). nuclear import of the U snRNPs depends on a bipartite § Present address: Antisoma Research Laboratories, St. George’s Hospital Medical School, Cranmer Terrace, London SW17 0QS, UK. signal consisting of the complete core domain and the m G cap (Hamm et al., 1990; Fischer et al., 1993; Plessel k Present address: Department of Biomolecular Sciences, UMIST, 3 PO Box 88, Manchester M60 1QD, UK. et al., 1994), but the requirement for the m3G cap is less Cell 376

Figure 1. Primary Structure and Topology of the Secondary Structure of the Sm Proteins

(A) Amino acid sequence alignment of the human Sm (D1,D2,D3, B/BЈ, E, F, and G) proteins (Hermann et al., 1995) with secondary structure elements. Wavy line, helix; arrows, ␤ strands. The ␤ strands within the Sm1 and Sm2 motifs are colored blue and yellow, respectively. The ␤ strands and interconnecting loops are numbered consecutively from the N terminus. The conserved Sm1 and Sm2 motifs are indicated and the conserved residues within these motifs are highlighted in blue (hydrophobic), gray (hydrophobic, less well conserved), orange (neutral polar), red (basic), and green (acidic). (B) Topology of the secondary structure elements and pairing of ␤ strands in the B protein. (C) Topology of the secondary structure elements and pairing of ␤ strands in the D3 protein. The views in (B) and (C) are from the exterior of the bent ␤ sheet. The secondary structure elements and conserved residues are color coded as in (A), and main chain–main chain hydrogen bonds are shown as black arrows (intramolecular bonds) or red (intermolecular bonds) pointing from NH to CO. Side chain–main chain and side chain–side chain hydrogen bonds involving highly conserved residues are also included. stringent for the U4 and U5 snRNPs (Palacios et al., 1995; Hermann et al., 1995; Se´ raphin, 1995), and two 1997). The nuclear import of U snRNPs requires recep- Sm-like proteins (Uss1p and SmX3) have been shown tor-mediated binding of the bipartite signal to importin to be associated with U6 snRNA in yeast (Cooper et al.,

␤ (Palacios et al., 1997). A receptor for the m3G cap, 1995; Se´ raphin, 1995). A set of U6-associated Sm-like snurportin1, has been characterized (Huber et al., 1998). proteins have been identified by yeast two-hybrid sys- Based on sequence homology, canonical Sm and Sm- tems and are thought to form the U6-specific core do- like proteins have been identified in a variety of organ- main (J. Beggs and B. Se´ raphin, personal communica- isms, including yeast, plant, and worm (Cooper et al., tions). In addition to the canonical Sm proteins, the Crystal Structure of the Core snRNP Proteins 377

human U4/U6.U5 tri-snRNP also contains a set of Sm- 1995; Hermann et al., 1995; Se´ raphin, 1995) (Figure 1A). like proteins (T. Achsel and R. L., unpublished results). The majority of the conserved residues are hydrophobic Unlike other major spliceosomal snRNAs, the U6 snRNA (blue), but two glycines (orange), an asparagine (orange), is transcribed by RNA polymerase III and remains in the two aspartic/glutamic acids (green), and an arginine/ nucleus, where the assembly of the U6 core-like domain lysine (red) are highly conserved. The crystal structure hence takes place (Baserga and Steitz, 1993). reveals that the D3 and B proteins have a common fold We report here the crystal structures of two Sm protein containing an N-terminal ␣ helix (helix A) followed by a complexes. These structures not only reveal the fold of strongly bent five-stranded antiparallel ␤ sheet (Figures the Sm proteins but also suggest how they assemble 2A–2C). Strands ␤1, ␤2, and ␤3 are part of the Sm1 into the core domain together with snRNA. We have motif, whereas the Sm2 motif forms strands ␤4 and ␤5. reported previously the crystal structures of U snRNP– The topology of the secondary structure elements in the specific proteins U1A and U2BЈЈ/U2AЈ, bound to their D3 and B proteins are schematically shown in Figures respective cognate snRNA-binding sites (Oubridge et 1B and 1C with the same residue color code as in Figure al., 1994; Price et al., 1998). Together with these struc- 1A. The five-stranded antiparallel ␤ sheet is strongly tures, the crystal structures of the two Sm protein com- bent in the middle such that strand ␤4 at the top right- plexes represent an important further step toward un- hand corner in Figures 1B and 1C is joined to strand ␤5 derstanding of the architecture of the pre-mRNA splicing at the bottom left-hand corner. The side chains of the machinery. conserved hydrophobic residues (blue) pointing into the page in Figures 1B and 1C form a hydrophobic core. Results and Discussion The two halves of the sheet are staggered (Figures 2A and 2C), so that both surfaces of the ␤ sheet at the Structural Determination bottom right and top left corners in Figures 1B and 1C Fragments of the human D3 (residues 1–75) and B (resi- are exposed to the solvent. Strands ␤2, ␤3, and ␤4 are dues 1–91) proteins consisting predominantly of their strongly bent to allow the formation of the hydrophobic Sm motifs form a stable complex even in high salt. Crys- core, and the highly conserved Gly (orange) allows the tals of the complex in the orthorhombic space group severe bending of strand ␤2. As a result of this bending, ˚ ˚ ˚ P212121 (a ϭ 108.3 A, b ϭ 109.3 A, c ϭ 111.2 A) grown the regular pattern of alternating internal and external from citrate buffer diffracted to 2.0 A˚ resolution on a residues characteristic of the ␤ strand breaks up in the synchrotron source. The crystal structure was deter- middle of strands ␤3 and ␤4, where these strands form mined by multiple isomorphous replacement and has ␤ bulges with two consecutive external residues (D : been refined at 2.0 A˚ resolution to an R factor of 21.3% 3 Ser-44/Asn-45, Glu-59/Gln-60; B: Cys-43/Asp-44, Gly- and R of 26.5% with good geometry (Table 1). Crystals free 68/Leu-69). The B protein has a much longer L4 loop of the full-length D D protein complex in the hexagonal 1 2 than the D protein, and the pairing of strands ␤3 and space group P6 (a ϭ b ϭ 76.8 A˚ , c ϭ 90.0 A˚ ) were 3 2 ␤4 extends further into the L4 region (Figures 1C and grown from 25% PEG4000, 0.2 M sodium citrate (pH 2C). Loop L5 crosses over and closes the open end of 5.6), 0.2 M ammonium acetate, 15% glycerol. The crystal the bent ␤ sheet, and ␤5 pairs with ␤1 to form a ␤ structure of the D1D2 protein complex was solved by multiple isomorphous replacement and has been refined barrel–like structure. Although the Sm proteins bear some topological and structural similarities to proteins at 2.5 A˚ resolution to an R factor of 25.8% and Rfree of 29.0% with good geometry (Table 2). containing a strongly bent ␤ sheet or ␤ barrel (SH3; Gel filtration and dynamic light scattering experiments TFIIA) (Musacchio et al., 1992; Geiger et al., 1996; Tan et al., 1996), the absence of overall structural and se- indicated that both the D3B and D1D2 protein complexes are heterodimeric in solution. The asymmetric unit of quence similarities to these proteins shows that the Sm proteins form an evolutionarily distinct protein family. the D1D2 crystal contains one D1D2 complex, and the D1 protein contacts three D2 proteins in the crystal. One of 2 the D1D2 protein interfaces (interface area of 2070 A˚ )is much more extensive than the other two (interface area Conservation of the Sm Fold of 736 and 1087 A˚ 2) (Lo Conte et al., 1999) and is readily The D3 and B proteins were truncated essentially to the Sm motif (D3 residues 1–75, B residues 2–91), whereas identified as the D1D2 heterodimer interface in solution. the D1 and D2 proteins in the crystal are both full length, The asymmetric unit of the D3B protein crystals contains containing 119 and 118 residues, respectively. The last two rings. Each consists of three D3B heterodimers ar- ranged around a noncrystallographic three-fold axis 39 residues of the D1 protein, including 9 alternating Arg–Gly repeats at the very C-terminal end, are not or- such that the D3 and B proteins alternate around each dered and hence are excluded from the current model. ring. Hence, there are two distinct D3B protein interfaces. The packing of the Sm proteins is different between The first 27 residues of the D2 protein preceding helix the D1D2 and D3B crystals, but one of the D3B protein A as well as loop L4 (residues 76–90) are not ordered and interfaces is structurally identical to the most extensive excluded from the current model. The ordered regions of

D1D2 dimer interface. Therefore, we conclude that these the D1 and D2 proteins therefore consist predominantly

D1D2 and D3B complexes are the respective heterodim- of the Sm1 and Sm2 motifs. The structures of the bent ers in solution. ␤ sheet of the four core proteins made of the Sm1 and Sm2 motifs superimpose well (Figure 3A). The rms devia- The Sm Fold tion of C␣ atom positions within the Sm1 and Sm2 motifs The amino acid sequence alignment of the seven snRNP is less than 0.9 A˚ for all pairwise superpositions of these core proteins shows two conserved segments corre- four Sm proteins. For comparison, the rms deviation for sponding to the Sm1 and Sm2 motifs (Cooper et al., the Sm motif C␣ positions between all six copies of the Cell 378

Table 1. X-Ray Structure Determination for the D3B Protein Complex Data Collection Derivative 1 Derivative 2 Derivative 3

Native 1 Native 2 K2Pt(CN)6 Me3PbOAc PMB Resolution range 24.6–2.4 A˚ 25.2–2.0 A˚ 38.6–3.0 A˚ 36.5–3.0 A˚ 38.7–2.8 A˚ Wavelength (A˚ ) 0.8725 0.9900 1.5418 1.5418 0.8725 X-Ray source SRS, PX9.6 Elettra CuK␣ CuK␣ SRS, PX9.6 Unique reflections 50,252 88,556 25,921 25,665 29,747 Mean redundancy 3.3 4.8 3.5 3.4 6.1 Completeness (%) 99.1 (95.9) 99.1 (98.6) 98.3 (98.3) 98.1 (87.4) 97.6 (97.6) (outer shell)

Reflections with I Ͼ 3␴I (%) 100 69.3 100 88.7 100 a Rsym 0.110 0.097 0.083 0.112 0.100 b Rdiff (vs. Native 1) — — 0.137 0.128 0.153 Phasing Number of sites 9 11 21 Phasing power c Acentric 1.477 1.507 1.777 (Anomalous) 0.577 0.577 1.308 Centric 1.242 1.454 1.713 Cullis R factor, centricd 0.732 0.797 0.591 Mean FOMe Acentric 0.5044 Centric 0.5114 (for 24.5–2.8 A˚ ) Solvent Flattening Number of cycles 100 Resolution range 24.5–2.40 Mean overall FOMe 0.9436 Refinement and Model Correlation Resolution range 20–2.0 A˚ Number of atoms 8,132 (562 water molecules) Number of reflections 86,568 used in refinement R factor 0.213

Rfree factor 0.265 Number of reflections 4075 used for Rfree Average atomic B factor (A˚ 2) 32.0 Deviations from ideal geometry Bond lengths (A˚ ) 0.015 Angle distance (A˚ ) 0.034 Planes 0.045 a R ϭ⌺I ϪϽIϾ ⌺I, where I is the observed intensity and ϽIϾ is the average intensity from multiple measurements. sym | | b R ϭ⌺ F Ϫ F /⌺ F where F is the protein structure factor amplitude and F is the heavy-atom derivative structure factor amplitude. diff || PH| | P|| | P| | P| | PH| c Phasing power is root-mean-square ( Fh /E), where Fh is the heavy atom structure factor amplitude and E is the residual lack of closure | | | | error. d Cullis R factor ϭ ( E /D ), where D ϭ F Ϫ F . | | iso iso || PH| | P|| e FOM, figure of merit.

D3 protein in the asymmetric unit is 0.16 A˚ , and the chain amide of Gly-74 and the carboxyl group of the corresponding value for the B protein copies is 0.43 A˚ . highly conserved Asp-35 (Figures 1B and 2C). In the D1

The hydrogen bond network within the five-stranded ␤ and D2 proteins, these hydrogen bonds are conserved: in sheet observed in the D3 and B proteins (Figures 1B and the D1 protein, the side chain of Asn-37 forms hydrogen

1C) is essentially conserved in the D1 and D2 proteins. bonds with the amide group of Gly-62 and the ␥-carboxyl

However, the length of the N-terminal helix and its orien- group of Asp-33; and similarly in the D2 protein, the side tation with respect to the ␤ sheet is somewhat variable chain of Asn-64 is hydrogen bonded to the main chain (Figure 3A). The length of L4 is also variable (Figure 1A), amide group of Gly-103 and the ␥-carboxyl group of and loop L4 of the D2 protein, longest of the Sm proteins, Asp-60. These three residues are conserved in all the is ordered poorly in the crystal. The region between Sm proteins (Figure 1A) except in D3, and the hydrogen residues 74 and 80 of the D1 protein forms an extra bonding pattern seen in the B, D1, and D2 proteins is likely helix. The D3 and B proteins used for crystallization are to be conserved in the remainder of the Sm proteins (E, truncated, and it is likely that the C-terminal regions of F, and G proteins). The D3 protein is unique in that the these proteins have an extra structure in the full-length side chain of the absolutely conserved Asn-40 forms proteins. hydrogen bonds with the main chain amide of Gly-65 and In the B protein, the side chain of the absolutely con- the phenol oxygen of Tyr-62, which is in turn hydrogen served Asn-39 forms hydrogen bonds with the main bonded to Glu-36, which corresponds to Asp-35 in the Crystal Structure of the Core snRNP Proteins 379

Table 2. X-Ray Structure Determination for the D1D2 Protein Complex Data Collection Derivative 1 Derivative 2

Native 1 Native 2 MeHg K2Pt(CN)6 Resolution range 20.0–2.9 A˚ 18.5–2.5 A˚ 20.0–2.62 A˚ 24.2–3.0 A˚ Wavelength (A˚ ) 1.000 1.3412 1.000 0.8725 X-Ray source Elettra Elettra Elettra SRS PX9.6 Unique reflections 6,469 10,226 8,952 6,047 Mean redundancy 3.4 3.9 4.8 3.5 Completeness 97.1 (93.3) 98.7 (99.9) 95.4 (84.1) 98.5 (93.9) (outer shell)

Reflections with I Ͼ 3␴I (%) 85.5 71.3 74.2 73.3 a Rsym 0.083 0.051 0.049 0.061 b Rdiff (vs. Native 1) — — 0.345 0.156 Phasing Number of sites 2 2 Phasing powerc Acentric 1.2 1.0 (Anomalous) 1.88 0.60 Centric 1.1 0.84 Cullis R factor, centricd 0.84 0.80 Mean FOMe Acentric 0.38 Centric 0.46 (for 38.5–2.9 A˚ ) Solvent Flattening Number of cycles 100 Resolution range 22.5–2.62 Mean overal FOMe 0.890 Refinement and Model Correlation Resolution range 18.49–2.50 A˚ Number of atoms 1,316 (16 water molecules) Number of reflections 10,225 used in refinement R factor 0.258

Rfree factor 0.290 Number of reflections 431 used for Rfree Average atomic B factor (A˚ 2) 45.3 Deviations from ideal geometry Bond lengths (A˚ ) 0.010 Angle distance (A˚ ) 0.029 Planes 0.031 See footnotes to Table 1 for definition of symbols.

B protein (Figures 1C and 2A). These hydrogen bonds protein and Pro-6, Leu-10 (helix A), Val-18 (␤1), Leu-32 are important in reinforcing the ␤ sheet and the strong (␤2), Met-33 (loop L3), Ile-68, Leu-71, and Leu-73 (␤5) conservation of these residues as well, as the neigh- of the D3 protein (Figure 4C). Pro-6 in D3 also makes boring Arg-64 in D3 and Arg-73 in B suggests that they close contacts with Ala-33, the highly conserved Asp- may have an additional function such as RNA binding. 35 and Asn-39, as well as Ile-41 of the B protein (Figure

4C). Third, a salt bridge between Glu-21 of the D3 protein

The Formation of the D3B Protein Complex and Arg-65 of the B protein on the opposite side of the

In the D3B heterodimer, strand ␤4 of the B protein pairs ␤ sheet (Figure 4D) also stabilizes the dimer. These two with strand ␤5 of the D3 protein, forming an extended residues are conserved in the D1D2 complex but do not antiparallel ␤ sheet (Figure 4A). The D3 protein is there- form a salt bridge. The guanidinium groups of three fore referred to as the ␤4 neighbor of the B protein and arginine residues (Arg-49 and Arg-25 of B and Arg-69 the B protein as the ␤5 neighbor of the D3 protein. The of D3) show stacking interactions. intermolecular interactions within the heterodimer can be divided into three groups. First, the pairing of the Conserved Interfaces in the D1D2 two ␤ strands, which is stabilized by a hydrophobic and D3B Protein Complexes cluster involving Phe-70 and Ile-72 of D3 and Leu-67, In the D1D2 protein complex, strand ␤4 of the D2 protein

Val-70, Leu-72, and Phe-27 of B (Figure 4B). Second, pairs with strand ␤5 of the D1 protein through main chain the amphipathic helix A and parts of strands ␤2 and ␤3 hydrogen bonding in a manner analogous to the inter- of the D3 protein lie over the surface of the ␤ sheet of action seen in D3B protein complex (Figure 4B). Super- the B protein, forming a hydrophobic cluster involving position of the D3B and D1D2 complexes shows that the Ile-41, Cys-43 (␤3), Leu-69, and Leu-71 (␤4) of the B geometrical relationship between the two subunits within Cell 380

Figure 2. Ribbon Representations of the Crystal Structure of the D3 and B Proteins

(A) D3 protein (front view) with the hydrogen bonding network involving Tyr-62 and highly conserved residues Glu-36, Asn-40, Arg-64, and Gly-65.

(B) D3 protein (side view). (C) B protein (front view) including highly conserved residues Asp-35, Asn-39, Arg-73, and Gly-74. Strands are color coded as in Figure 1. Figure produced with SETOR (Evans, 1993).

the two heterodimeric complexes is also well conserved Further Assembly of the Core Proteins

(Figure 3B). The rms deviation of the C␣ positions within The ␤5 strand of the B protein within the heterodimer the Sm1 and Sm2 motifs is less than 0.9 A˚ between the is freely available to interact with the ␤4 strand of another

D1D2 and D3B protein complexes. The C-terminal end of core protein, and similarly the ␤4 strand of the D3 protein the D1 protein forms an additional helix not present in is available to interact with the ␤5 strand of another the three other core proteins studied here. The interface protein. Hence, each core protein can have both ␤4 and between the D1 and D2 proteins is more extensive than ␤5 neighbors simultaneously, and as a result the core proteins can form a multisubunit complex using the that between the D3 and B proteins, as both the N- and C-terminal helices as well as strands ␤2 and ␤3 of the same subunit interface as observed in the two core protein complexes. Modeling of a multisubunit complex D1 protein lie over the surface of the ␤ sheet of the D2 protein (Figure 3B), stabilizing the dimer by forming a based on the D1D2 complex structure shows that seven hydrophobic cluster. Leu-3, Phe-6, Leu-7, Leu-10 (helix core proteins could form a complete ring, as shown in A), Met-36 (␤3), Phe-68, Ile-69 (␤5), Leu-76, and Leu-80 Figure 5A. The angle between strands ␤4 and ␤5in (C-terminal helix) of the D protein and Ala-58 (␤2), Val- each protein determines the geometrical relationship 1 between adjacent subunits, hence the number of sub- 66 (␤3), and Phe-100 (␤4) of the D protein are involved 2 units within the ring. Similarly, the trp RNA-binding at- in the hydrophobic cluster. The D D protein complex is 1 2 tenuation protein (TRAP) from Bacillus subtilis forms also stabilized by hydrogen bonds between the ⑀-amino a complex consisting of 11 identical subunits through group of Lys-98 of the D protein and the main chain 2 intersubunit pairing of ␤ strands (Antson et al., 1995). carbonyl groups of Asp-72 and Leu-74 of the D1 protein. The conservation of the amino acid sequence of the core Sm proteins is thus reflected not only in the struc- EFG Protein Complex ture of the individual proteins (Figure 3A) but also in The E, F, and G proteins can be immunoprecipitated their subunit interfaces (Figure 3B). This conclusion can together with anti-F polyclonal antibody, although nei- be extended to the structure of the homologous E, F, ther E nor G alone nor F and G together could be precipi- and G proteins and allows us to build a model of a higher tated by the same antibody (Raker et al., 1996). E and order assembly. G or E and F can be coimmunoprecipitated with Y12 Crystal Structure of the Core snRNP Proteins 381

Figure 3. Structural Conservation of the Sm Proteins and Sm Protein Dimers

(A) Superposition of the C␣ backbones of D1,D2,D3, and B protein structures. L4 loops and N-terminal segments including helix A are the most variable parts of the structures. Only the ␣ carbon atoms of the Sm1 and Sm2 motif residues were used for the superposition. The poorly ordered loop L4 of the D2 protein is not included in the current model. B, blue; D1, green; D2, red; D3, orange.

(B) Superposition of the D3B and D1D2 complexes. Only the ␣ carbon atoms of the Sm1 and Sm2 motifs are used for superposition (rms deviation of the C␣ positions Ͻ 0.9 A˚ ). Proteins are color coded as in (A). Figure produced with SETOR (Evans, 1993). monoclonal antibody, whereas they are not precipitated trimeric complexes could associate into a hexameric individually. These results show that the E, F, and G ring through two F–G interfaces. As seven subunits are proteins form a complex in which the interaction be- needed to close the ring, the interfaces between the tween G and F is mediated by the E protein. The same two trimers may not be as stable as the D1D2 or D3B conclusion was drawn from experiments using the yeast contacts and may dissociate to interact with the D1D2 two-hybrid system (Camasses et al., 1997; Fury et al., protein during the core domain assembly. As no strong 1997). The mutagenesis experiment of yeast E protein F–F, G–G, or F–G interactions have been observed by the by Camasses et al. (1997) provided clues to identify the yeast two-hybrid experiment or by immunoprecipitation, ␤4 and ␤5 neighbors of the E protein. The Ile-90 Arg the interaction between the F and G proteins probably mutation in yeast E protein abolished the interaction→ occurs only after the formation of the FEG trimers. with G without affecting the F–E interaction. Ile-90 of yeast E protein, which corresponds to Leu-71 of human Architecture of the Core Domain

D3, is located on strand ␤5. This indicates that the G Based on the crystal structure of the D1D2 and D3B pro- protein is the ␤5 neighbor of the E protein. The Leu- tein complexes, we have shown that the Sm proteins 81 Pro, Asp-47 Ala, and Asp-47 His mutations in could form a seven-membered ring (Figure 5A). Is it → → → yeast E protein abolished the E–F interaction without possible to arrange the seven core proteins (B/BЈ,D1, affecting the E–G interaction. Leu-81 and Asp-47 of D2,D3, E, F, and G proteins) within a heptameric ring in yeast E protein, corresponding to Leu-81 and Asp-35 a manner consistent with the existing biochemical data? of the human B protein, are located at the D3B protein The B and BЈ differ only slightly at the C terminus (van interface and are important for the interaction with the Dam et al., 1989; Chu and Elkon, 1991) and are likely to ␤4 neighbor (Figure 3C). Hence, the F protein is the ␤4 be functionally indistinguishable during the assembly neighbor of the E protein. process. The three subcomplexes, the D3B, D1D2, and Plessel et al. (1997) and C. K. et al. (unpublished re- FEG-trimer complexes, are stable during purification in sults) showed that the E, F, and G proteins form a hexam- high salt and are likely to remain intact in the fully assem- eric subcomplex containing two copies of each subunit. bled core domain. The D3 protein is the ␤4 neighbor of

A negatively stained electron micrograph of this com- the B protein, and similarly the D1 protein is the ␤4 plex reveals a doughnut-shaped ring with outer diameter neighbor of the D2 protein. Mutagenesis data indicate of 65 A˚ , inner diameter and thickness of 20 A˚ . Two FEG that the F and G proteins are the ␤4 and ␤5 neighbors Cell 382

Figure 4. The Interaction between the D3 and B Proteins within the Heterodimer

(A) Stereo view of the D3B dimeric complex. D3, orange; B, blue. The ␤5 strand of the D3 protein pairs with the ␤4 strand of the B protein, forming a continuous antiparallel ␤ sheet. The loops L2, L3, L4, and L5 are extending in the same direction. (B) Ball-and-stick model (stereo view) of the main dimer interface stabilized by the paired ␤ strands and hydrophobic clustering of Phe-27

(strand ␤2) of the B protein and Phe-70 of the D3 protein. The hydrogen bond between the Gly-65 of D3 and Arg-73 of B stabilizes the conformation of loop L5.

(C) A close-up stereo view of the D3 and B protein interface including the hydrophobic cluster between residues on the ␤ sheet outer surface of the B protein and residues on helix A and strands ␤1, ␤2, and ␤5 of the D3 protein. Pro-6 of the D3 protein makes close contact with Ala- 33, the highly conserved Asp-35 and Asn-39, and Ile-41 of the B protein.

(D) The intersubunit salt bridge between Glu-21 (D3) and Arg-65 (B) and a cluster of arginines showing stacking of the guanidinium groups

(Arg-69 of D3 and Arg-25 and Arg-49 of B). The sigma A–weighted 2Fo Ϫ Fc map indicates the high quality of the electron density. Figure produced with SETOR (Evans, 1993). Crystal Structure of the Core snRNP Proteins 383

Figure 5. Proposed Higher-Order Assembly of the Human Core snRNP Proteins (A) Ribbon diagram of the heptamer model (Evans, 1993). The seven core Sm proteins

(B/BЈ,D1,D2,D3, E, F, and G) are arranged within the seven-membered ring based on the

crystal structures of the D1D2 and D3B com- plexes and pairwise interactions deduced from biochemical and genetic experiments. For each protein, its ␤4 neighbor is located in the clockwise direction. The subunit ar- rangement within the FEG trimer is based on a mutagenesis study of yeast E protein in- vestigating its interaction with the F and G proteins (Camasses et al., 1997). The interac-

tions between D2 and F as well as between

D3 and G have been reported based on a yeast two-hybrid experiment and, indepen- dently, by immunoprecipitation (Raker et al., 1996; Fury et al., 1997). (B) Surface represen- tation (Nicholls et al., 1991) of the heptameric ring model with electrostatic potential (blue, positive; red, negative) viewed as in (A). (C) Side view. (D) Bottom view. Note that the cen- tral hole is highly basic and the surface of the assembly is asymmetric both in shape and charge. Figure 5A produced with SETOR (Ev- ans, 1993), and Figures 5B–5D produced by GRASP (Nicholls et al., 1991).

of the E protein, respectively (Camasses et al., 1997), Figures 5A–5D accounts for only 58% of the total mass and the FEG protein trimer was modeled based on the of the core proteins. Some of the poorly ordered regions

D1D2 structure (see Experimental Procedures). For a not seen in the crystal structure are likely to become given protein, its ␤4 neighbor is located in the clockwise ordered in the fully assembled core domain and may direction in Figure 5A. The interaction between the D2 account for a substantial part of the other hemisphere and F proteins has been detected by the yeast two- together with the RNA and the C-terminal ca. 150 resi- hybrid experiments and immunoprecipitation, and we dues of the B/BЈ protein not included in the construct hence placed D2 and F next to each other within the used for crystallization. ring. The D3 and G proteins are also known to interact If the FEG hexamers observed by electron microscopy with each other and are placed next to each other in remain intact in the fully assembled core domain, an

Figure 5A. Weak interaction between the B and D1 pro- alternative model must be considered (Plessel et al., teins is also detected by immunoprecipitation (V. A. R. 1997). The FEG hexamer has a two-fold symmetry, and and R. L., unpublished results). Hence, all the Sm pro- two D1D2 complexes are likely to bind to the hexamer, teins can be arranged within the heptamer in a manner leaving only a space for a single Sm protein in between. consistent with all the pairwise subunit interactions de- Therefore, either one D3B complex replaces one of the tected by the immunoprecipitation and yeast two-hybrid D1D2 complexes, or two D3B complexes bind on the experiments (Raker et al., 1996; Camasses et al., 1997; opposite surface. These models are less plausible, as Fury et al., 1997). The modeled heptamer ring has an they require additional modes of subunit interactions outer diameter of 70 A˚ and a diameter of the central not observed in the D1D2 or D3B crystals and do not hole of 20 A˚ when only the main chain atoms are consid- satisfy the reported pairwise subunit interactions simul- ered, in good agreement with the dimensions of the core taneously (Raker et al., 1996; Camasses et al., 1997; domain estimated by electron microscopy (Kastner et Fury et al., 1997). al., 1990, 1992) (Figures 5B–5D). In negatively stained electron micrographs, the core Interactions between the Core Proteins domain often appeared to be round, but it is not yet and snRNA possible to conclude whether the core domain is a The electrostatic surface potential shown on the hep- sphere, a ring, or a hemisphere like our heptamer model tamer ring model (Figures 5B–5D) indicates that the cen-

(Figure 5C). The atomic models of the D3B and D1D2 tral hole is highly basic, suggesting that the snRNA is crystals contain approximately 70 residues of each pro- likely to bind to or go through this hole. The central hole tein chain, and therefore the heptamer model shown in is too small to accommodate a double-stranded RNA Cell 384

but large enough for a single-stranded RNA of the Sm Sm site (Lu¨ hrmann et al., 1990; Baserga and Steitz, 1993; sequence to bind. The polypeptide loops L2, L3, L4, and Moore et al., 1993; Frank et al., 1994; Nilsen, 1994; New- L5 of each core protein protrude into the central hole man, 1997). When these sites interact with the intron (Figures 3A and 5A) and, like the antigen-binding CDR and with each other in the fully assembled spliceosome, loops of immunoglobulin, they are well suited for binding the core domains are likely to be brought into proximity. ligands. The conserved feature of the Sm site is a run of five or six U’s preceded by A and followed by G. Unlike free adenosine, the N7 atom of this adenosine Evolution of Sm Proteins and the is readily methylated by dimethyl sulphate when it is Splicing Machinery incorporated in the subcore domain lacking the B and The Sm and Sm-like proteins are found in all eukaryotic

D3 proteins as well as in the fully assembled core domain cells throughout the plant and animal kingdoms (Cooper (Hartmuth et al., 1999). This shows that the chemical et al., 1995; Hermann et al., 1995; Se´ raphin, 1995), and environment of this adenosine is very similar in both therefore they must have appeared early in evolution. complexes. The AAU sequence present at the 5Ј end of The ancestral Sm protein probably formed homooligo- the Sm site of the U1 snRNA can be cross-linked to the mers, which gradually diverged into seven distinct pro- G protein by ultraviolet light (Heinrichs et al., 1992). The teins. The asymmetry of the assembly in both shape and AAU sequence is therefore in close contact with the G electrostatic potential allows the flanking RNA stems to protein. In the subcore domain, the protein subunits are interact with the surface of the core domain and extend arranged in the order of G-E-F-D2-D1 according to our in a specific direction. The long L4 loops in the B and model (Figure 5A), and therefore the Sm sequence is D2 proteins are strongly basic and may also interact with likely to bind along the inner surface of the protein com- one of the RNA stems flanking the Sm site. The core plex toward the D1 protein. The most highly conserved domain also provides an asymmetric platform to which amino acid residues located near loops L3 and L5 are specific protein components such as the U1 70K and likely to be involved in nucleotide binding. In the B pro- U1C protein can bind stably (Nelissen et al., 1994). The tein, the side chain of Asn-39 is hydrogen bonded to roles of the individual core Sm proteins remained largely the side chain of Asp-35 and the main chain amide of unchanged during eukaryotic evolution, as a homolog Gly-74 (Figure 2C). These three residues, as well as for each of the seven human core proteins can readily the neighboring Arg-73, are conserved in all the core be identified in yeast based on sequence similarities proteins except in D3 and may be involved in the binding (Cooper et al., 1995; Hermann et al., 1995; Se´ raphin, of U’s within the Sm site in each of these proteins. The 1995). Furthermore, the E protein gene is essential for

D3 protein is unique in that Glu-36 (corresponding to viability in yeast but can be replaced by its human coun- Asp-35 of B) is hydrogen bonded to Tyr-62, which is in terpart without affecting viability (Bordonne´ and Taras- turn hydrogen bonded to Asn-40 (corresponding to Asn- sov, 1996). 39 in B) (Figure 2A). The base of the last nucleotide (G) The Sm sites are found not only in the major spliceoso- of the Sm site might bind to the D3 protein through mal snRNAs (U1, U2, U4, and U5 snRNAs) but also in stacking interaction with Tyr-62 and hydrogen bond for- U11, U12, and U4atac snRNAs involved in the splicing mation with Glu-36 and Asn-40. Both RNA and DNA of minor AT-AC intron species (Tarn and Steitz, 1997). bases are often hydrogen bonded to Gln, Asn, Asp, The binding sites for the specific proteins in the U1 and Glu, or Arg side chains tethered by hydrogen bonds to U2 snRNPs are not conserved in the corresponding U11 another amino acid, and the stacking interaction be- and U12 snRNAs, and hence the core snRNP domain tween an RNA base and an aromatic amino acid side appears to be the only conserved feature in the major chain plays an important role in RNA–protein recognition and AT-AC spliceosomal snRNPs. This might imply that in the U1A and U2BЈЈ proteins (Oubridge et al., 1994; the core snRNPs have important conserved functions Price et al., 1998), MS2 phage coat protein (Valega˚ rd et such as mediating the interactions between the snRNPs al., 1994), and some aminoacyl-tRNA synthetases (Ca- within the spliceosome. varelli et al., 1993). Nuclear pre-mRNA splicing and group II intron self- The Sm site is flanked by double-stranded stems in splicing are mechanistically very similar, both involving the U1, U2, U4, and U5 snRNA (Lu¨ hrmann et al., 1990; a circular lariat intron intermediate. Hence, it has been Baserga and Steitz, 1993; Moore et al., 1993; Nagai and postulated that nuclear pre-mRNA splicing may have Mattaj, 1994). Do the snRNAs enter the central hole from evolved from group II intron self-splicing (Cech, 1985; one side of the ring and exit from the other? The electron Sharp, 1985; Weiner, 1993) and that the spliceosomal micrographs of U2 snRNP show that the domain con- snRNAs may have been derived from fragments of group taining the 5Ј end of U2 snRNA and the domain con- II self-splicing introns (Sharp, 1987; Newman and Nor- taining the 3Ј end of RNA and the U2BЈЈ–U2AЈ protein man, 1992; Weiner, 1993). The folding of the group II self- complex are located on the opposite sides of the core splicing intron into a catalytically active conformation is domain (Kastner et al., 1990), and this might suggest achieved through pairing of complementary stretches that U2 snRNA goes through the hole. It is interesting as well as tertiary interactions within the intron. The to note that the functional ends of the snRNAs, such as assembly of trans-acting RNA fragments of a group II the 5Ј end of the U1 snRNA that binds the 5Ј splice site intron into a “primitive spliceosome” was probably facili- of pre-mRNA, the binding sites in the U2 snRNA for the tated by their association with basic proteins, which branchpoint and U6 snRNA–binding sites, the con- initially merely reduced the electrostatic repulsion be- served loop of the U5 snRNA, and the U6 snRNA–binding tween the RNA components but later started playing region of U4 snRNA are all located on the 5Ј side of the more active roles in mediating the interactions between Crystal Structure of the Core snRNP Proteins 385

the primitive snRNPs. The ancestral core Sm proteins refined with REFMAC (Murshudov et al., 1997), with tight noncrys- may have played such a role, but the present spliceo- tallographic symmetry (NCS) restraints on the six copies of each somes have recruited many more protein components protein in the asymmetric unit. After three rounds of model building and refinement, the resulting model was used for a rigid-body refine- in order to increase both the efficiency and fidelity of ment against the higher resolution but nonisomorphous native data the splicing reaction (Baserga and Steitz, 1993; Moore set 2 (Na3-citrate crystallization condition). In five subsequent et al., 1993). rounds of refinement, NCS and geometric restraints were gradually

lowered to result in an R factor of 21.3% and Rfree of 26.5% with good geometry (Table 1). Experimental Procedures The D1D2 Complex A native data set (native 1) was collected from frozen crystals along Protein Preparation and Crystallization with a methylmercury derivative data set at Elettra synchrotron (␭ϭ The D B Protein Complex 3 1.0 A˚ ).AK2Pt(CN)6 derivative data set was collected at station PX9.6 An N-terminal fragment of a mutant (Ser66Cys) human D3 protein at Daresbury (␭ϭ0.87 A˚ ). All data were processed using programs (residues 1–75) fused to a hexa-histidine tag via a TEV protease of the CCP4 package (Dodson et al., 1997). The methylmercury cleavage site was coexpressed in E. coli with a fragment of the B derivative (CH3HgNO3) gave clear isomorphous and anomalous dif- protein (residues 1–91) using a pQE30 vector (Qiagen, Hamburg). ference Patterson peaks that were interpreted using SHELXS-90 (Shel- The proteins were purified as a complex on a Ni-NTA agarose (Qia- drick, 1991). Heavy atom refinement and phase calculations were gen) column using a 50–500 mM imidazole gradient and dialyzed performed in SHARP (de la Fortelle and Bricogne, 1997). The differ- against 20 mM NaHEPES (pH 7.5), 0.2 M NaCl, 5 mM ␤-mercaptoeth- ence Fourier method was used to locate the heavy atom sites in anol. The hexa-histidine tag was removed by overnight digestion at the Pt derivative and their positions were also refined in SHARP. 22ЊC with recombinant hexa-histidine-tagged TEV protease at the The resulting MIR map at 2.9 A˚ resolution was readily interpretable enzyme to substrate ratio of 1:50 (Parks et al., 1994). Uncleaved after solvent flattening with SOLOMON (Abrahams and Leslie, 1996) material, liberated histidine tag, and the tagged TEV protease were implemented in SHARP (de la Fortelle and Bricogne, 1997). An initial removed by a second Ni-NTA agarose chromatography. The eluted model was built with the program O (Jones and Kjeldgaard, 1994) D3B complex was dialyzed exhaustively against 10 mM Na/K phos- and refined with REFMAC (Murshudov et al., 1997). A higher resolu- phate (pH 5.0), 5 mM DTT, concentrated to 0.5 mM, and stored in tion native data set (native 2) collected at 2.5 A˚ resolution at ELET- liquid nitrogen. The D3B protein complex was crystallized in the TRA had significantly different cell parameters (a ϭ b ϭ 75.5 A˚ , c ϭ orthorhombic space group P212121 at 20ЊC by the sitting drop vapor 92.1 A˚ ), and hence a rigid-body refinement was carried out using diffusion method either from 1.8 M Na/K phosphate buffer (pH 8.5), REFMAC (Murshudov et al., 1997). The structure was then refined 15% glycerol, 10 mM spermidine (native 1 and derivatives 1–3; see by three rounds of model building and refinement with REFMAC to Table 1) (a ϭ 107.4 A˚ , b ϭ 108.5 A˚ , c ϭ 110.3 A˚ ) or 1.4 M Na -citrate 3 an R factor of 25.8% and Rfree of 29.0% with good geometry (Table (pH 8.5), 15% glycerol, 5 mM DTT (native 2) (a ϭ 108.3 A˚ , b ϭ 2). The current model includes residues 2–80 of the D1 protein (119 109.3 A˚ , c ϭ 111.2 A˚ ). residues in total), residues 27–75 and 91–118 of the D2 protein (118 The D1D2 Protein Complex residues in total), and 16 water molecules. Unaccounted electron A coding sequence of human D1 protein was designed using codons density contributes to somewhat higher R factors for this structure. found in highly expressed E. coli genes (Ikemura, 1982). A synthetic Model Building of a Higher-Order Structure D protein gene was assembled with chemically synthesized deoxy- 1 The D1 subunit of one coordinate file of the D1D2 complex was oligonucleotides and cloned into a T7 expression vector together superimposed onto the D2 subunit of another coordinate file using with the natural D2 cDNA sequence (Studier et al., 1990). Full-length LSQMAN (Jones and Kjeldgaard, 1994). This procedure creates a D1 and D2 proteins were overproduced in the BL21(DE3) pLysS strain third subunit interacting with the second subunit in the same way after induction with IPTG. A cleared lysate was applied to an SP- the second does with the first subunit. By repeating this procedure, Sepharose (Pharmacia) column that had been equilibrated with 0.5 it was found that seven subunits form a complete seven-membered M NaCl, 0.5 M urea, 10 mM DTT, 1 mM EDTA, 20 mM Na HEPES ring. The FEG subcomplex model was created by mutating residues (pH 7.4) and eluted with a linear gradient of 0.5–2 M NaCl. The D1D2 within the program O (Jones and Kjeldgaard, 1994). To complete protein complex was crystallized in the hexagonal space group P62 the heptameric ring model, two adjacent protein positions were then (a ϭ b ϭ 76.8 A˚ , c ϭ 90.0 A˚ ) using the sitting drop method with 0.1 replaced by the D3B dimer. M sodium citrate (pH 5.6), 0.2 M ammonium acetate, 15% glycerol, 25% PEG4000. Acknowledgments

Crystallographic Analysis We thank Drs. Alexey Murzin and for helpful discus- sion; Dr. Phil Evans for help with crystallographic analysis; Dr. Lore- The D3B Protein Complex The concentration of glycerol in the mother liquor was increased to dona Lo Conte for the analysis of protein interfaces; Ms. Hiang.-C. 25% in 5% increments (15 min each) prior to flash cooling, and all Teo for her contribution at the early stage of the project; the staff data collections were carried out at 100 K. Native 1 and p-hydroxy- at the Daresbury and ELETTRA synchrotrons for help with data mercurybenzoate (PMB) derivative (derivative 3) data sets were col- collection; Dr. S. Tan for TEV protease; and Drs. A. Klug, M. F. lected at station PX9.6, SRS, Daresbury, England (␭ϭ0.87 A˚ ). Deriv- Perutz, A. Newman, and P. R. Evans for helpful discussion. This atives 1 and 2 were collected on a laboratory X-ray source with Cu project was supported by the Medical Research Council and Human K␣ radiation. A 2.0 A˚ data set of native crystals grown from citrate Frontier Science Program. C. K. was supported by NATO and EU was collected at Elettra synchrotron in Trieste, Italy (native 2, ␭ϭ fellowships, S. W. by Boehringer Ingelheim and EU-Marie Curie stu- 0.99 A˚ ). All data were processed using programs of the CCP4 pack- dentships, and J. M. A. by an MRC training fellowship. age (Dodson et al., 1997). The K2Pt(CN)6 derivative (derivative 1) gave a Patterson solution that was interpreted using SHELXS-90 Received October 26, 1998; revised December 9, 1998. (Sheldrick, 1991). The Pt coordinates of the strongest SHELXS-90 solutions were fed into the program SHARP for phasing and detec- References tion of minor sites (de la Fortelle and Bricogne, 1997). By running SHARP in detection mode, the Pt phases were then used to find the Abrahams, J.P., and Leslie, A.G.W. (1996). Methods used in the Ϫ structure determination of bovine mitochondrial F1 ATPase. Acta (CH3)3PbCH3COO (Me3PbOAc) (derivative 2) and PMB (derivative 3) sites. The resulting MIR map was phase extended to 2.4 A˚ with a Crystallogr. D 52, 30–42. combined maximum likelihood/solvent flattening protocol (SOLO- Antson, A.A., Otridge, J., Brzozowski, A.M., Dodson, E.J., Dodson, MON) (Abrahams and Leslie, 1996) implemented in SHARP (de la G.G., Wilson, K.S., Smith, T.M., Yang, M., Kureski, T., and Gollnick, P. Fortelle and Bricogne, 1997) using data set native 1. An initial model (1995). The structure of trp RNA-binding attenuation protein. Nature was built with the program O (Jones and Kjeldgaard, 1994) and 374, 693–700. Cell 386

Baserga, S.J., and Steitz, J.A. (1993). The diverse world of small transfer-RNAs and the occurrence of the respective codons in pro- ribonucleoproteins. In The RNA World, R.F. Gesteland and J.F. At- tein genes—differences in synonymous codon choice patterns of kins, eds. (Cold Spring Harbor, New York: Cold Spring Harbor Labo- yeast and Escherichia coli with reference to the abundance of isoac- ratory Press), pp. 359–381. cepting transfer RNAs. J. Mol. Biol. 158, 573–597. Bordonne´ , R., and Tarassov, I. (1996). The yeast SmE gene encodes Jones, T.A., and Kjeldgaard, M.O. (1994). O—The Manual. (Uppsala the homologue of the human E core protein. Gene 176, 111–117. University, Uppsala, Sweden). Camasses, A., Bragado-Nilsson, E., Martin, R., Se´ raphin, B., and Kastner, B., Bach, M., and Lu¨ hrmann, R. (1990). Electron microscopy Bordonne´ , R. (1997). Interactions within the yeast Sm core complex: of small nuclear ribonucleoprotein (snRNP) particles U2 and U5: from proteins to amino acids. Mol. Cell. Biol. 18, 1956–1966. evidence for a common structure-determining principle in the major Cavarelli, J., Rees, B., Ruff, M., Thierry, J.C., and Moras, D. (1993). U snRNP family. Proc. Natl. Acad. Sci. USA 87, 1710–1714. Yeast tRNAAsp recognition by its cognate class II aminoacyl tRNA Kastner, B., Kornstadt, U., Bach, M., and Lu¨ hrmann, R. (1992). Struc- synthetase. Nature 362, 181–184. ture of the small nuclear RNP particle U1: identification of the two Cech, T.R. (1985). RNA splicing: three themes with variations. Cell structural protuberances with RNP-antigens A and 70K. J. Cell Biol. 43, 713–716. 116, 839–849. Chu, J.-L., and Elkon, K.B. (1991). The small nuclear ribonucleopro- Lehmeier, T., Raker, V., Hermann, H., and Lu¨ hrmann, R. (1994). cDNA cloning of the Sm proteins D2 and D3 from human small nuclear teins, SmB and BЈ are products of a single gene. Gene 97, 311–312. ribonucleoproteins: evidence for a direct D1-D2 interaction. Proc. Cooper, M., Johnston, L.H., and Beggs, J.D. (1995). Identification Natl. Acad. Sci. USA 91, 12317–12321. and characterisation of Uss1p (Sdb23p): a novel U6 snRNA-associ- Lerner, M.R., and Steitz, J.A. (1979). Antibodies to small nuclear ated protein with significant similarity to core proteins of small nu- RNAs complexed with proteins are produced by patients with sys- clear ribonucleoproteins. EMBO J. 14, 2066–2075. temic lupus erythematosus. Proc. Natl. Acad. Sci. USA 76, 5495– De la Fortelle, E., and Bricogne, G. (1997). Maximum-likelihood 5499. heavy atom parameter refinement for multiple isomorphous replace- Lo Conte, L., Chothia, C. and Janin, J. (1999). The atomic structure ment and multiwavelength anomalous diffraction methods. Methods of protein-protein recognition sites. J. Mol. Biol. 285, 2177–2198. Enzymol. 276, 472–494. Lu¨ hrmann, R., Kastner, B., and Bach, M. (1990). Structure of spliceo- Dodson, E.J., Winn, M., and Ralph, A. (1997). Collaborative Compu- somal snRNPs and their role in pre-mRNA splicing. Biochim. Bio- tational Project, Number 4: Providing Programs for Crystallography. phys. Acta 1087, 265–292. Methods Enzymol. 277, 620–634. Mattaj, I.W. (1986). Cap trimethylation of U snRNA is cytoplasmic Evans, S.V. (1993). SETOR: hardware-lighted three-dimensional and dependent on U snRNP protein binding. Cell 46, 905–911. solid model representations of macromolecules. J. Mol. Graph. 11, Moore, M.J., Query, C., and Sharp, P.A. (1993). Splicing of precursors 134–138. to mRNA by the spliceosome. In The RNA World, R.F. Gesteland Feeney, R.J., Sauterer, R.A., Feeney, J.L., and Zieve, G.W. (1989). and J.F. Atkins, eds. (Cold Spring Harbor, New York: Cold Spring Cytoplasmic assembly and nuclear accumulation of mature small Harbor Laboratory Press), pp. 303–357. nuclear ribonucleoprotein particles. J. Biol. Chem. 264, 5776–5783. Murshudov, G.N., Vagin, A.A., and Dodson, E.J. (1997). Refinement Fischer, U., Sumpter, V., Sekine, M., Satoh, T., and Lu¨ hrmann, R. of macromolecular structures by the maximum likelihood method. (1993). Nucleo-cytoplasmic transport of U snRNPs: definition of a Acta Crystallogr. D 53, 240–255. nuclear location signal in the Sm core domain that binds a transport Musacchio, A., Noble, M., Pauptit, R., Wierenga, R., and Saraste, receptor independently of the m3G cap. EMBO J. 12, 573–583. M. (1992). Crystal structure of a Src-homology-3 (SH3) domain. Na- Fisher, D.E., Conner, G.E., Reeves, N.H., Wisniewolski, R., and Blo- ture 359, 851–855. bel, G. (1985). Small nuclear ribonucleoprotein particle assembly in Nagai, K., and Mattaj, I.W. (1994). RNA-protein interactions in the vivo: demonstration of a 6S RNA-free core precursor and posttrans- splicing snRNPs. In RNA-Protein Interactions, K. Nagai and I.W. lational modification. Cell 42, 751–758. Mattaj, eds. (Oxford, Oxford University Press), pp. 150–177. Frank, D.N., Roiha, H., and Guthrie, C. (1994). Architecture of the Nelissen, R.L., Will, C.L., van Venrooij, W.J., and Lu¨ hrmann, R. (1994). U5 small nuclear RNA. Mol. Cell. Biol. 14, 2180–2190. The association of the U1-specific 70K and C proteins with U1 Fury, M.G., Zhang, W., Christodoulopoulos, I., and Zieve, G.W. snRNPs is mediated in part by common U snRNP proteins. EMBO (1997). Multiple protein:protein interactions between the snRNP J. 13, 4113–4125. common core proteins. Exp. Cell. Res. 237, 63–69. Newman, A.J. (1997). The role of U5 snRNP in pre-mRNA splicing. Geiger, J.H., Hahn, S., Lee, S., and Sigler, P.B. (1996). Crystal struc- EMBO J. 16, 5797–5800. ture of the yeast TFIIA/TBP/DNA complex. Science 272, 830–836. Newman, A., and Norman, C. (1992). U5 snRNA interacts with exon Hamm, J., Darzynkiewicz, E., Tahara, S.M., and Mattaj, I.W. (1990). sequences at 5Ј and 3Ј splice sites. Cell 68, 743–754. The trimethylguanosine cap structure of U1 snRNA is a component Nicholls, K.A., Sharp, B., and Honig, B. (1991). Protein folding and of a bipartite nuclear targeting signal. Cell 62, 569–577. association: insights from the interfacial and thermodynamic prop- Hartmuth, K., Raker, V.A., Huber, J., Branlant, C., and Lu¨ hrmann, R. erties of hydrocarbons. Proteins 11, 281–296. (1999). An unusual chemical reactivity of Sm site adenosines Nilsen, T.W. (1994) RNA-RNA interactions in the spliceosome: unrav- strongly correlates with proper assembly of core U snRNP particles. eling the ties that bind. Cell 78, 1–4. J. Mol. Biol. 285, 133–147. Oubridge, C., Ito, N., Evans, P.R., Teo, H.C., and Nagai, K. (1994). Heinrichs, V., Hackl, W., and Lu¨ hrmann, R. (1992). Direct binding of Crystal structure at 1.92 A˚ resolution of the RNA-binding domain of small nuclear ribonucleoprotein G to the Sm site of small nuclear the U1A spliceosomal protein complexed with an RNA hairpin. Na- RNA. Ultraviolet light cross-linking of protein G to the AAU stretch ture 372, 432–438. within the Sm site (AAUUUGUGG) of U1 small nuclear ribonucleo- Palacios, I., Hetzer, M., Adam, S.A., and Mattaj, I.W. (1997). Nuclear protein reconstituted in vitro. J. Mol. Biol. 227, 15–28. import of U snRNPs requires importin b. EMBO J. 16, 6783–6792. Hermann, H., Fabrizio, P., Raker, V.A., Foulaki, K., Hornig, H., Parks, T.D., Leuther, K.K., Howard, E.D., Johnson, S.A., and Dough- Brahms, H., and Lu¨ hrmann, R. (1995). snRNP Sm proteins share two erty, W.G. (1994). Release of proteins and peptides from fusion evolutionarily conserved sequence motifs which are involved in Sm proteins using a recombinant plant virus proteinase. Anal. Biochem. protein-protein interactions. EMBO J. 14, 2076–2088. 216, 413–417. Huber, J., Cronshagen, U., Kadokura, M., Marshallsay, C., Wada, Plessel, G., Fischer, U., and Lu¨ hrmann, R. (1994). m3G cap hyper- T., Seikine, M., and Lu¨ hrmann, R. (1998). Snurportin1, an m3G-cap- methylation of U1 small nuclear ribonucleoprotein (snRNP) in vitro: specific nuclear import receptor with a novel domain structure. evidence that the U1 small nuclear RNA-(guanosine-N2)-methyl- EMBO J. 17, 4114–4126. transferase is a non-snRNP cytoplasmic protein that requires a bind- Ikemura, T. (1982). Correlation between the abundance of yeast ing site on the Sm core domain. Mol. Cell. Biol 14, 4160–4172. Crystal Structure of the Core snRNP Proteins 387

Plessel, G., Lu¨ hrmann, R., and Kastner, B. (1997). Electron micros- copy of assembly intermediates of the snRNP core: morphological similarities between the RNA-free (E.F.G) protein heteromer and the intact snRNP core. J. Mol. Biol. 265, 87–94. Price, S.R., Evans, P.R., and Nagai, K. (1998). Crystal structure of the spliceosomal U2BЈЈ-U2AЈ protein complex bound to a fragment of U2 small nuclear RNA. Nature 394, 645–650. Raker, V.A., Plessel, G., and Lu¨ hrmann, R. (1996). The snRNP core assembly pathway: identification of stable core protein heteromeric complexes and an snRNP subcore particle in vitro. EMBO J. 15, 2256–2269. Se´ raphin, B. (1995). Sm and Sm-like proteins belong to a large family: identification of proteins of the U6 as well as the U1, U2, U4 and U5 snRNPs. EMBO J. 14, 2089–2098. Sharp, P.A. (1985). On the origins of RNA splicing and introns. Cell 42, 397–400. Sharp, P.A. (1987). Splicing of messenger RNA precursors. Science 23, 766–771. Sheldrick, G. (1991). Heavy atom location using SHELXS-90. In Iso- morphous Replacement and Anomalous Scattering, W. Wolf, P.R. Evans, and A.G.W. Leslie, eds. (Daresbury, CCP4), pp. 23–38. Steitz, J.A., Black, D.L., Gerke, V., Parker, K.A., Kra¨ mer, A., Frende- wey, D., and Keller, W. (1988). Functions of the abundant U-snRNPs. In Structure and Function of Major and Minor Small Nuclear Ribo- nucleoprotein Particles, M.L. Birnstiel, ed. (Berlin, Springer-Verlag), pp. 115–154. Studier, F.W., Rosenberg, A.H., Dunn, J.J., and Dubendorff, J.A. (1990). Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol. 185, 60–89. Tan, S., Hunziker, Y., Sargent, D.F., and Richmond, T.J. (1996). Crys- tal structure of a yeast TFIIA/TBP/DNA complex. Nature 381, 127–134. Tarn, W.Y., and Steitz, J.A. (1997). Pre-mRNA splicing: the discovery of a new spliceosome doubles the challenge. Trends Biochem. Sci. 22, 132–137. Valega˚ rd, K., Murray, J.B., Stockley, P.G., Stonehouse, N.J., and Lilijas, L. (1994). Crystal structure of an RNA bacteriophage coat protein-operator complex. Nature 371, 623–626. van Dam, A., Winkel, I., Zijlstra-Baalbergen, J., Smeenk, R., and Cuypers, H.T. (1989). Cloned human snRNP proteins B and B’ differ only in their caboxy-terminal part. EMBO J. 8, 3853–3860. Weiner, A.M. (1993). mRNA splicing and autocatalytic introns: dis- tant cousins or the products of chemical determinism. Cell 72, 161–164.

Brookhaven Protein Data Bank ID Codes

The coordinates of the D3B and D1D2 complexes have been depos- ited in the Brookhaven Protein Data Bank under ID codes 1d3b and 1b34, respectively.