Structural View of a Non Pfam Singleton and Crystal Packing Analysis

Chongyun Cheng1,#, Neil Shaw1,2,#, Xuejun Zhang3, Min Zhang4, Wei Ding1, Bi-Cheng Wang5, Zhi-Jie Liu1,2,*

1 National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China, 2 College of Stem Cell and Molecular Clinical Medicine, Kunming Medical University, Kunming, China, 3 Department of Immunology, Tianjin Medical University, Tianjin, China, 4 School of Life Sciences, Anhui University, Hefei, Anhui, China, 5 Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia, United States of America

Abstract

Background: Comparative genomic analysis has revealed that in each genome a large number of open reading frames have no homologues in other species. Such singleton genes have attracted the attention of biochemists and structural biologists as a potential untapped source of new folds. Cthe_2751 is a 15.8 kDa singleton from the anaerobic hyperthermophile Clostridium thermocellum. To gain insights into the architecture of the protein and obtain clues about its function, we decided to solve the structure of Cthe_2751.

Results: The protein crystallized in 4 different space groups that diffracted X-rays to 2.37 Å (P3121), 2.17 Å (P212121), 3.01 Å (P4122), and 2.03 Å (C2221) resolution, respectively. Crystal packing analysis revealed that the 3-D packing of Cthe_2751 dimers in P4122 and C2221 is similar, with a rotational difference of only 2.69° around the C axes. A new method developed to quantify the differences in packing of dimers in crystals from different space groups corroborated the findings of crystal packing analysis. Cthe_2751 is an all α-helical protein with a central hydrophobic core providing thermal stability via π:cation and π:π interactions. A ProFunc analysis retrieved a very low match with a splicing endonuclease, suggesting a role for the protein in the processing of nucleic acids.
Conclusions: Non-Pfam singleton Cthe_2751 folds into a known all α-helical fold. The structure has increased sequence coverage of non-Pfam proteins such that more protein sequences can be amenable to modelling. Our work on crystal packing analysis provides a new method to analyze dimers of the protein crystallized in different space groups. The utility of such an analysis can be expanded to oligomeric structures of other proteins, especially receptors and signaling molecules, many of which are known to function as oligomers.

Citation: Cheng C, Shaw N, Zhang X, Zhang M, Ding W, et al. (2012) Structural View of a Non Pfam Singleton and Crystal Packing Analysis. PLoS ONE 7(2): e31673. doi:10.1371/journal.pone.0031673

Editor: Petri Kursula, University of Oulu, Finland

Received October 28, 2011; Accepted January 11, 2012; Published February 20, 2012

Copyright: © 2012 Cheng et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the National Natural Science Foundation of China (grants: 30870483, 31070660, 31021062 and 31000334), Ministry of Science and Technology of China (grants: 2009DFB30310, 2009CB918803 and 2011CB911103), CAS Research Grant (YZ200839 and KSCX2-EW-J-3), the University of Georgia Research Foundation and the Georgia Research Alliance. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

# Chongyun Cheng and Neil Shaw contributed equally to this work.

PLoS ONE | www.plosone.org | February 2012 | Volume 7 | Issue 2 | e31673

Introduction

One of the perplexing outcomes of sequencing a number of genomes is the discovery of a large set of open reading frames (ORFs) in each genome that have no homologues in other species. Such ORFs, referred to as singletons or ORFans, have a codon usage pattern similar to those seen for other proteins, suggesting that these ORFs encode and express proteins [1]. Recently, singletons have attracted the attention of evolutionary biologists, biochemists and structural biologists regarding their origin, functional significance and the possibility that they may carry a relatively untapped source of new folds. Several hypotheses have been put forward to explain the lack of sequence identity and the origin of singletons, the most common explanation being that singletons are fast-evolving genes that have accumulated substitutions to such an extent that the sequence is no longer identical to the parent or any other known sequence [2].

Theoretical analysis of lineage-specific genes involved in adaptation of a species to a particular environment seems to suggest that these genes are fast evolving since they have a ‘‘substrate’’ to act on, and therefore a number of lineage-specific singletons have been postulated to play a role and confer an adaptive advantage on a particular species [3]. In contrast to this hypothesis, studies on the Drosophila genome show that singletons in Drosophila have similar rates of evolution as non-singletons, and therefore accumulation of mutations may not be the only method for the origin of Drosophila singletons. Instead, the singletons seem to have largely originated by de novo synthesis from non-coding regions like intergenic sequences [4]. In addition, insertion of transposon elements has resulted in completely new coding sequences. Such retro genes of viral origin have also been found to encode new proteins in primates [1], humans [5] and microbes [6]. The origin of singletons in mouse is partially attributed to frameshift mutations resulting in novel open reading frames [7]. Similarly, in Saccharomyces, ORFan domains are found at the C-termini of proteins and seem to have originated from frameshift mutations [8]. However, a majority of Saccharomyces ORFan domains are a result of de novo synthesis from non-coding DNA [8]. Thus, it seems that although different species might prefer one mechanism over another for the generation of singletons, they might still be using all of these methods (a faster rate of mutation, de novo synthesis from non-coding DNA, lateral gene transfers via transposons and frameshift mutations) to produce singletons.

One question that arises then is: are singletons merely aberrations of biological processes, or do they play a role in the survival and propagation of organisms? Attempts have been made to address this question, and there is now strong evidence that singletons are expressed as proteins. For instance, in a genome-wide study on Halobacterium, the authors could detect mRNA for 30 out of 39 paralogous singletons representing 13 out of 14 families identified in Halobacterium [9]. Similarly, singleton genes involved in immune response, oxygen stress, flight and circadian rhythm could be detected in the cDNA of Drosophila yakuba, suggesting singletons are expressed as legitimate proteins. A mutation in the singleton fln gene that encodes protein for a thick filament in flight muscle results in a viable but flightless fly [10]; a mutation in the circadian rhythm to gene produces a rhythm defective fly [11]. Interestingly, all these functions of singletons are expected to play a role in the fly’s response to specific ecological or environmental challenges. Although these examples underscore the fact that singletons are expressed as proteins and play a functional role, a vast majority of singletons still have unknown functions.

One way to gain functional insights is to solve the 3-dimensional structure of the protein and compare it with structures with known [...] #1499712) from M. jannaschii was annotated as a hypothetical protein with unknown function [13]. The primary sequence provided no clues about the function. When the crystal structure of the protein was solved, it revealed a methyl-transferase fold.

Figure 1. Characterization of Cthe_2751. (A) Sequence alignment of Cthe_2751 homologues. Only the top 11 matches with Cthe_2751 amino acids 1–134 are shown. Conservation is colored according to ClustalW convention. (B) Size exclusion profile of Cthe_2751 run on Hi Load 10/300 Superdex G75 gel filtration column equilibrated with 20 mM Tris-HCl, 100 mM NaCl, pH 8.0, reveals that the protein exists as a dimer in solution. SDS-PAGE picture (inset) showing the purity of Cthe_2751 before crystallization. (C) Sedimentation velocity experiments performed using an analytical ultracentrifuge suggested that Cthe_2751 forms a dimer in solution. The curve was generated using Sedfit software. doi:10.1371/journal.pone.0031673.g001

Figure 2. Structure of singleton Cthe_2751. (A) Topology diagram of the structure. (B) Cartoon representation of Cthe_2751 with helices shown as cylinders in (C) to depict the contour, pairing and stacking. (D) Cartoon representation of a dimer of Cthe_2751. doi:10.1371/journal.pone.0031673.g002

Figure 3. Two dimensional projection (along C axis direction) of 2-fold symmetry related molecules for space group P4122 and C2221. The transformation between space groups P4122 and C2221 is illustrated. (A) The projection of Cthe_2751 monomer (dark blue) and 7 symmetry related molecules along C axis. The homodimer of Cthe_2751 (dark and light blue molecules) is related by crystallographic 2-fold 2(x 0 0). Please note that 41 screw symmetry related molecules are not shown for the sake of only displaying the transformation between P4122 and C2221 space groups. (B) In order to illustrate the transformation between P4122 and C2221 space groups, Figure 3A is rotated 45 degrees clockwise around 41 axis. (C) The projection of Cthe_2751 dimer along C axis. The 4 Cthe_2751 dimers in C2221 space group have almost the same orientation as that of the 8 Cthe_2751 monomers (or 4 dimers) in the 45° rotated P4122 unit cell. doi:10.1371/journal.pone.0031673.g003
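The geometry behind the Figure 3 caption, where the P4122 cell is rotated 45 degrees about the 41 axis before comparison with the C2221 cell, can be sketched numerically. The Python fragment below is only an illustration under stated assumptions and is not the packing-analysis method developed in the paper: the cell edge of 60 Å and the dimer-axis vectors are hypothetical, the helper names are invented here, and the 2.69 degree offset is taken from the abstract as an input rather than recomputed from coordinates. It shows that a 45 degree clockwise rotation brings the diagonal of a primitive tetragonal cell onto an axis with length a√2, and how a residual rotation about the C axis between two orientations could be measured.

```python
import math


def rotate_about_c(point, degrees):
    """Rotate an (x, y, z) point about the c (z) axis.

    Positive angles are counter-clockwise when viewed down +z,
    so a clockwise rotation uses a negative angle.
    """
    theta = math.radians(degrees)
    x, y, z = point
    return (
        x * math.cos(theta) - y * math.sin(theta),
        x * math.sin(theta) + y * math.cos(theta),
        z,
    )


def in_plane_angle(vector):
    """Angle (degrees) of a vector's projection on the ab plane, from +x."""
    return math.degrees(math.atan2(vector[1], vector[0]))


def rotation_difference(angle_a, angle_b):
    """Smallest signed difference (degrees) going from angle_a to angle_b."""
    return (angle_b - angle_a + 180.0) % 360.0 - 180.0


# Hypothetical tetragonal cell edge in angstroms (illustration only).
a = 60.0

# Diagonal lattice vector a + b of the primitive tetragonal cell.
diagonal = (a, a, 0.0)

# Rotating 45 degrees clockwise aligns the diagonal with the x axis;
# its length, a * sqrt(2), is the in-plane edge of the doubled,
# C-centred description of the same lattice.
rotated = rotate_about_c(diagonal, -45.0)

# Hypothetical dimer-axis directions in the two (already aligned) frames.
# The 2.69 degree offset is the value quoted in the abstract, used as input.
axis_p4122 = (math.cos(math.radians(30.0)), math.sin(math.radians(30.0)), 0.0)
axis_c2221 = rotate_about_c(axis_p4122, 2.69)

diff = rotation_difference(in_plane_angle(axis_p4122), in_plane_angle(axis_c2221))

print("rotated diagonal:", rotated)
print("rotation about c between the two packings (degrees):", round(diff, 2))
```

In a real comparison the two direction vectors would come from the refined coordinates of the dimers in each crystal form; here they are placeholders chosen so that the arithmetic is easy to follow.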
