Structural Basis of UGUA Recognition by the Nudix Protein CFI 25
Total Page:16
File Type:pdf, Size:1020Kb
Structural basis of UGUA recognition by the Nudix protein CFIm25 and implications for a regulatory role in mRNA 3′ processing Qin Yang, Gregory M. Gilmartin1, and Sylvie Doublié1 Department of Microbiology and Molecular Genetics, Stafford Hall, University of Vermont, Burlington, VT 05405 Edited by Joan A. Steitz, Howard Hughes Medical Institute, New Haven, CT, and approved April 20, 2010 (received for review January 21, 2010) — Human Cleavage Factor Im (CFIm) is an essential component of the C-terminal RS/RD alternating charge domain a structure simi- pre-mRNA 3′ processing complex that functions in the regulation of lar to that of the SR-protein family of splicing regulators. The 25 poly(A) site selection through the recognition of UGUA sequences small subunit (CFIm ) contains a Nudix domain, a protein do- upstream of the poly(A) site. Although the highly conserved 25 kDa main that most often participates in the hydrolysis of substrates subunit (CFIm25) of the CFIm complex possesses a characteristic containing a nucleotide diphosphate linked to a variable moiety α β α / / Nudix fold, CFIm25 has no detectable hydrolase activity. Here X (15). Found throughout all three kingdoms, Nudix proteins we report the crystal structures of the human CFIm25 homodimer in participate in a wide range of crucial housekeeping functions, in- complex with UGUAAA and UUGUAU RNA sequences. CFIm25 is the cluding the hydrolysis of mutagenic nucleotides, the modulation first Nudix protein to be reported to bind RNA in a sequence- of the levels of signaling molecules, and the monitoring of meta- 25 α β specific manner. The UGUA sequence contributes to binding speci- bolic intermediates (15). CFIm possesses the characteristic / / ficity through an intramolecular G:A Watson–Crick/sugar-edge α Nudix fold and is able to bind Ap4A (diadenosine tetraphos- base interaction, an unusual pairing previously found to be in- phate), but it has no hydrolase activity, due to the absence of volved in the binding specificity of the SAM-III riboswitch. The two of the four glutamate residues that coordinate the divalent structures, together with mutational data, suggest a novel cations important for substrate hydrolysis (16). mechanism for the simultaneous sequence-specific recognition of While an array of different protein domains have been identi- two UGUA elements within the pre-mRNA. Furthermore, the mu- fied that bind RNA in a sequence-specific manner, only a limited tually exclusive binding of RNA and the signaling molecule Ap4A subset functions in the sequence-specific recognition of single- (diadenosine tetraphosphate) by CFIm25 suggests a potential role stranded RNA (17). These domains include the ubiquitous for small molecules in the regulation of mRNA 3′ processing. RRM, hnRNP K homology domain (KH domain), zinc-binding domains, and the PUF domain. In this report, we present a pre- cleavage factor ∣ CPSF5 ∣ mRNA processing ∣ Protein-RNA complex ∣ RNA viously undescribed mechanism for the sequence-specific binding recognition of single-stranded RNA by the 25 kDa subunit of CFIm. Although the Nudix domains of the eukaryotic decapping enzymes (18), he transcriptome complexity of higher eukaryotes requires the bacterial 5′ pyrophophohydrolase (19), and the trypanosome 25 Tcoordinate recognition of an array of alternative pre-mRNA mitochondrial protein MERS1 (20) act on RNA, CFIm is processing signals in a developmental and tissue-specific manner unique among Nudix proteins in that it is capable of sequence- ′ 25 (1, 2). The sequences that direct pre-mRNA splicing and 3 pro- specific RNA binding. CFIm is highly conserved throughout the cessing are initially recognized within the nascent transcript in a eukaryotic kingdom (Fig. S1), yet, interestingly, it has been lost in process that is intimately coupled to transcription (3, 4). While a subset of protists, including both Saccharomyces cerevisiae and the recognition of exons within the pre-mRNA is mediated by Schizosaccharomyces pombe (Fig. S2). 25 both RNA:RNA and protein:RNA interactions (5), the 3′ proces- The structures of the CFIm homodimer in complex with sing of polyadenylated mRNAs appears to rely solely on the in- RNA presented here not only reveal a unique mechanism for teraction of protein factors (6) with unstructured RNA sequences sequence-specific RNA binding, but also provide an insight into (7) within the nascent transcript. the coordinate recognition of multiple poly(A) site upstream Vertebrate pre-mRNA 3′ processing signals are recognized elements, and the potential regulation of these interactions by by a tripartite mechanism through which a set of short RNA small molecules. sequences direct the cooperative binding of three multimeric ′ Results 3 processing factors, cleavage factor Im (CFIm), cleavage and polyadenylation specificity factor (CPSF), and cleavage stimula- Overall Structure of CFIm25 Bound to UGUA Element. Two 6-nucleo- tion factor (CstF) (8). CPSFand CstF bind the AAUAAA hexamer tide RNA sequences containing a UGUA element: 5'-UGUAAA- and downstream GU-rich elements that flank the poly(A) site, 3' and 5'-UUGUAU-3' were designed based on our previously respectively, whereas CFIm interacts with upstream sequences published SELEX results (11). The second oligonucleotide with that may function in the regulation of alternative polyadenylation the extra uracil at the 5′-end was used to confirm that the UGUA (9–11). SELEX and biochemical analyses have identified the sequence UGUAN (N ¼ A > U > G, C) as the preferred binding Author contributions: Q.Y., G.M.G., and S.D. designed research; Q.Y. and G.M.G. site of CFIm (11). In this report we have taken a structural ap- performed research; and Q.Y., G.M.G., and S.D. wrote the paper. proach to determine the mechanism of sequence-specific RNA The authors declare no conflict of interest. binding by CFI . m This article is a PNAS Direct Submission. CFIm is composed of a large subunit of 59, 68, or 72 kDa and a 25 Data deposition: The atomic coordinates and structure factor amplitudes have been small subunit of 25 kDa (CFIm , also referred to as CPSF5 or deposited in the Protein Data Bank, www.pdb.org (PDB ID codes 3MDG, 3MDI). NUDT21) (12, 13), both of which contribute to RNA binding 1To whom correspondence may be addressed: E-mail: [email protected] or Gregory. (14). The large subunit, encoded by either of two paralogs [email protected]. CPSF6 CPSF7 ( and ), contains an N-terminal RNA Recognition This article contains supporting information online at www.pnas.org/lookup/suppl/ Motif (RRM), an internal polyproline-rich region, and a doi:10.1073/pnas.1000848107/-/DCSupplemental. 10062–10067 ∣ PNAS ∣ June 1, 2010 ∣ vol. 107 ∣ no. 22 www.pnas.org/cgi/doi/10.1073/pnas.1000848107 Downloaded by guest on September 27, 2021 25 B core element was bound by CFIm specifically. The UGUAAA all the bases except for the first U (referred to as U0) (Fig. 1 ). and UUGUAU complex structures were solved to a resolution of The next four nucleotides, U1, G2, U3, and A4, are found in the 2.1 and 2.2 Å, respectively (Table S1). The overall protein archi- RNA binding pocket of Mol A. Right after A4, the RNA back- 25 tecture of both CFIm -RNA complexes is nearly identical to that bone bends ∼105° toward Mol Bs of an adjacent dimer, leading 25 of the previously published unliganded CFIm model [3BAP U5 to insert into the RNA binding pocket of Mol Bs. In the (16)] (RMSD 0.45 Å calculated on 194 Cα atoms). Briefly, UGUAAA-bound complex (Fig. 1C), the first three nucleotides, 25 CFIm is composed of a central domain encompassing residues U1, G2, and U3, interact with Mol A in the same manner as in the 77–202, which adopts a α/β/α fold common to all Nudix proteins UUGUAU complex. In contrast to the UUGUAU complex, how- 25 (21). In CFIm , the central Nudix domain is sandwiched be- ever, the phosphate backbone of the RNA is twisted by ∼95° after tween N-terminal and C-terminal structural elements, which the U3 nucleotide, flipping A4 and A5 into the RNA binding are major contributors to the dimer interface. The most notable pocket of Mol Bs of the adjacent dimer. Interestingly, right after difference between the apo and RNA-bound structures is the po- A5, the RNA strand twists back toward Mol A, enabling the in- sition of the N-terminal segment (residues 21–29), which swings teractions between G2 and A6, which are identical to the G2-A4 backward instead of leaning toward the other monomer. Another interactions observed in the UUGUAU-bound complex. All 25 β α interesting feature of CFIm is the loop connecting 2 and 1 these observations support our earlier SELEX and biochemical – 25 (residues 51 60) (Fig. 1 and 4). This loop acts like a strap that analyses indicating that CFIm specifically recognizes the occludes the canonical Nudix substrate-binding pocket. Contrary UGUA tetranucleotide sequence (11). to earlier predictions (22), the loop does not move away upon 25 RNA binding. Instead, it is an integral part of the RNA recogni- Sequence-Specific Recognition of UGUA by CFIm25. CFIm binds to tion pocket. RNA through a variety of interactions, including hydrogen-bond- 25 Even though the asymmetric unit contains a dimer of CFIm , ing via both main-chain and side-chain atoms, aromatic stacking, we observe only one bound RNA molecule. The RNA hexamer is and peptide bond stacking (17). Besides protein–RNA interac- bound specifically by one molecule (designated as molecule A), tions, intramolecular interactions also play a substantial role in and partially by molecule B of an adjacent dimer in the crystal RNA recognition. A schematic representation of the interactions s A 25 (designated B , for symmetry equivalent) (Fig. 1 ). In the between CFIm and each of the RNAs is shown in Fig.