The Structure of a Highly Conserved Picocyanobacterial Protein Reveals a Tudor Domain with an RNA Binding Function
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/542068; this version posted August 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The structure of a highly conserved picocyanobacterial protein reveals a Tudor domain with an RNA binding function Katherine M Bauer1, Rose Dicovitsky2, Maria Pellegrini2, Olga Zhaxybayeva3,4 and Michael J Ragusa1,2* 1Department of Biochemistry & Cell Biology, Geisel School of Medicine at Dartmouth, Hanover NH 03755; 2Department of Chemistry, Dartmouth College, Hanover NH 03755; 3Department of Biological Sciences, Dartmouth College, Hanover NH 03755; 4Department of Computer Science, Dartmouth College, Hanover NH 03755 Running Title: The structure of a highly conserved cyanobacterial protein *To whom correspondence should be addressed: Michael J. Ragusa: Department of Chemistry, Dartmouth College, Hanover NH 03755; [email protected] Keywords: NMR spectroscopy, Prochlorococcus, Synechococcus, nucleic acid binding, Tudor domains ABSTRACT stranded and hairpin RNAs. These results provide Cyanobacteria of the Prochlorococcus and the first insight into the structure and function of marine Synechococcus genera are the most PSHCP, demonstrating that PSHCP is an RNA abundant photosynthetic microbes in the ocean. binding protein that can recognize a broad array Intriguingly, the genomes of these bacteria are of RNA molecules. very divergent even within each genus, both in gene content and at the amino acid level of the encoded proteins. One striking exception to this With the mean annual average of 3.6 × is a 62 amino acid protein, termed 1027 cells, cyanobacteria from genera Prochlorococcus/Synechococcus Hyper Prochlorococcus (1) and Synechococcus (2) Conserved Protein (PSHCP). PSHCP is not only numerically dominate microbial communities of found in all sequenced Prochlorococcus and the global ocean (3). Because of such abundance, marine Synechococcus genomes but it is also these photosynthesizing bacteria are extremely nearly 100% identical in its amino acid sequence important players in the global carbon cycle (3). across all sampled genomes. Such universal The small genomes and limited gene content of distribution and sequence conservation suggests individual Prochlorococcus and Synechococcus an essential cellular role of the encoded protein in cells (1.6-2.9 Mb, encoding ~1700-3100 genes; these bacteria. However, the function of PSHCP (4)) are counterbalanced by a diverse collective is unknown. We used Nuclear Magnetic gene pool, which in the case of Prochlorococcus Resonance (NMR) spectroscopy to determine its is estimated to contain 80,000 genes (5,6). The structure. We found that 53 of the 62 amino acids genomes of Prochlorococcus and Synechococcus in PSHCP form a Tudor domain, while the are also surprisingly divergent: in pairwise remainder of the protein is disordered. NMR comparisons, the genome-wide average amino titration experiments revealed that PSHCP has acid identity (AAI; (7)) of the encoded proteins is only a weak affinity for DNA, but an 18.5-fold often below 60% even within each genus (8). higher affinity for tRNA, hinting at an Maintenance of such remarkable sequence and involvement of PSHCP in translation. Isothermal gene content divergence remains poorly titration calorimetry experiments further revealed understood (6). However, one protein-coding that PSHCP also binds single-stranded, double- gene is a curious exception, with its product 1 bioRxiv preprint doi: https://doi.org/10.1101/542068; this version posted August 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. showing almost 100% amino acid sequence Resonance (NMR) spectroscopy and investigated conservation across all currently available the protein’s interactions with nucleic acids in Prochlorococcus and marine Synechococcus vitro. genomes ((9,10) and Supplementary Datasets 1 and 2). This gene encodes a 62 amino acid protein RESULTS of unknown function, dubbed PSHCP for The structure of PSHCP Prochlorococcus/Synechococcus Hyper We began by screening different Conserved Protein, (9,10). Although at the time constructs of PSHCP for their behavior for of its discovery the gene encoding PSHCP was structure determination. An initial round of limited to the cyanobacterial clade consisting of construct screening revealed that residues 57-62 Prochlorococcus and marine Synechococcus, could be removed without altering the overall sequence similarity searches now identify the stability of the protein. This is in good agreement PSHCP gene in genomes of freshwater with disorder prediction using the IUPred server Synechococcus (11); several Cyanobium spp. (16), which suggests that residues 57-62 are from both freshwater and marine environments; disordered. PSHCP1-56 could be easily in cyanobacterial sponge symbiont “Candidatus concentrated above 1 mM making it ideal for Synechococcus spongarium” (12); in three structure determination. Secondary structure species of Paulinella, which is a photosynthetic probabilities as calculated using TALOS-N (17) protist with a chromatophore thought to be from 1H,13C,15N chemical shifts, show that the derived from marine Synechococcus spp. (13); first three residues of PSHCP1-56 are disordered and in metagenomically-assembled and that the remainder of the protein contains cyanobacteria from globally distributed marine, five β-strands (Fig. S1A). To further investigate 1 brackish and freshwater environments. Within the dynamics of PSHCP1-56 we performed a { H}- cyanobacteria, all characterized PSHCP- 15N heteronuclear Nuclear Overhauser Effect containing organisms form a clade (14) within the (NOE) experiment. In the {1H}-15N heteronuclear order Synechococcales (15). Hence, the PSHCP NOE experiment a peak intensity ratio closer to protein remains extremely conserved in a specific one indicates little motion of the N-H bond on the subgroup of cyanobacteria (Supplementary picosecond to nanosecond timescale while a Datasets 1 and 2) but is undetectable outside of smaller peak intensity ratio corresponds to more this subgroup. motion on this time scale and thus residues which The remarkable conservation of the are disordered. The {1H}-15N heteronuclear NOE PSHCP protein and retention of this gene even data recorded on PSHCP1-56 is in good agreement within extremely reduced chromatophore with the secondary structure probabilities genomes (13) suggest that this protein may confirming that the structured core of the protein encode an important housekeeping function. In corresponds to residues 4-56 (Fig. S1B). three examined strains of Prochlorococcus and For the structure calculation of PSHCP1- marine Synechococcus, the gene is expressed, and 56, 1195 NOE derived distance constraints and 44 its protein product is abundant (10). A large dihedral angle constraints were used (Table 1). proportion of positively charged amino acids Excluding the first three amino acids, this (16%) (9), a predicted isoelectric point of 11.3 resulted in 22.1 distance constraints per amino (9), and the protein’s association with the acid. The 20 lowest energy structures align with ribosomal protein L2 in pull-down assays (10) an average pairwise backbone RMSD of 0.72 ± prompted the hypotheses of PSHCP involvement 0.15 Å for residues 4-29 and 35-56 (Fig. 1A). in binding of either DNA or RNA, and its Residues 30-34 form a flexible loop between β- possible association with a ribosome. Yet, the strands two and three, and thus were excluded amino acid sequence of PSHCP shows no from the RMSD calculation. The overall fold of significant sequence similarity to any protein PSHCP contains five β-strands, which form a domain with a known function. single antiparallel β-sheet with an overall barrel- To gain more insight into the possible like architecture (Fig. 1B and 1C). function of PSHCP, we determined the structure To determine if PSHCP is part of a of the PSHCP protein using Nuclear Magnetic conserved domain family, we used the Dali server 2 bioRxiv preprint doi: https://doi.org/10.1101/542068; this version posted August 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. (18) to search the Protein Data Bank (PDB) for retinoblastoma-binding protein 1 (RBBP1) (23). proteins with structural similarity to PSHCP. When titrating dsGCDNA into PSHCP seven 1 15 PSHCP aligns well to a number of Tudor domains peaks shifted in the PSHCP1-56 H- N HSQC including the Tudor domain from SMN (Fig. S2) spectrum, demonstrating that PSHCP1-56 can bind (19). The highest structural similarity is to the to DNA (Fig. 4A). As only seven peaks shifted Tudor domain from the human TDRD3 protein: upon DNA binding, this suggests that PSHCP1-56 the 53 amino acids of PSHCP (residues 4-56) does not undergo large structural changes upon align to it with a backbone RMSD of 1.5 Å (Fig. DNA binding. The peaks for L14, E15, S16, G20 2A-C). Tudor domains are small, approximately and V22 all shifted upon DNA binding. These 60 amino acid long, domains containing four or five residues all sit in the loop between β-strand five β-strands in a barrel-like fold. Most known 1 and 2 suggesting that this region is important Tudor domains are found in eukaryotes (98.97% for DNA binding. of the InterPro records for Tudor domain), where To determine the dissociation constant they are often present in 2-3 adjacent copies (Kd) for PSHCP1-56-DNA binding, we monitored within large, multidomain proteins (20). In chemical shift perturbations (CSPs) at the four contrast, full length PSHCP consists entirely of a residues which undergo the largest CSPs in single Tudor domain (residues 4-56). PSHCP (E15, G20, V22, T46) over a concentration range of 50 µM to 500 µM PSHCP is unlikely to be involved in recognition dsGCDNA (Fig. 4B and Table 2). The calculated of methylated proteins Kd values ranged from 121.3 to 173.3 µM with an The best-characterized functions of average Kd of 140.6 µM.