High Content of Proteins Containing 21st and 22nd Amino Acids
DigitalCommons@University of Nebraska - Lincoln, Vadim Gladyshev Publications, Biochemistry, Department of, July 2007. https://digitalcommons.unl.edu/biochemgladyshev/56

Nucleic Acids Research, 2007, Vol. 35, No. 15, 4952–4963
Published online 11 July 2007
doi:10.1093/nar/gkm514

High content of proteins containing 21st and 22nd amino acids, selenocysteine and pyrrolysine, in a symbiotic deltaproteobacterium of gutless worm Olavius algarvensis

Yan Zhang and Vadim N. Gladyshev*

Department of Biochemistry, University of Nebraska, Lincoln, NE 68588-0664, USA

Received May 24, 2007; Revised and Accepted June 14, 2007

*To whom correspondence should be addressed. Tel: +1 402 472 4948; Fax: +1 402 472 7842; Email: [email protected]

© 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Selenocysteine (Sec) and pyrrolysine (Pyl) are rare amino acids that are cotranslationally inserted into proteins and known as the 21st and 22nd amino acids in the genetic code. Sec and Pyl are encoded by UGA and UAG codons, respectively, which normally serve as stop signals. Herein, we report on unusually large selenoproteomes and pyrroproteomes in a symbiont metagenomic dataset of a marine gutless worm, Olavius algarvensis. We identified 99 selenoprotein genes that clustered into 30 families, including 17 new selenoprotein genes that belong to six families. In addition, several Pyl-containing proteins were identified in this dataset. Most selenoproteins and Pyl-containing proteins were present in a single deltaproteobacterium, δ1 symbiont, which contained the largest number of both selenoproteins and Pyl-containing proteins of any organism reported to date. Our data contrast with the previous observations that symbionts and host-associated bacteria either lose Sec utilization or possess a limited number of selenoproteins, and suggest that the environment in the gutless worm promotes Sec and Pyl utilization. Anaerobic conditions and consistent selenium supply might be the factors that support the use of amino acids that extend the genetic code.

INTRODUCTION

Selenium (Se) is an essential micronutrient due to its requirement for biosynthesis and function of the 21st amino acid, selenocysteine (Sec). This amino acid is typically found in the active sites of a small number of selenoproteins in all three domains of life: archaea, bacteria and eukaryotes (1–4). Biosynthesis of Sec and its cotranslational insertion into polypeptides require a complex molecular machinery that recodes in-frame UGA codons, which normally function as stop signals, to serve as Sec codons (5–9). Although the occurrence of selenoprotein genes is limited, the Sec UGA codon has become the first addition to the universal genetic code since the code was deciphered 40 years ago (10).

The mechanism of Sec insertion differs in the three domains of life. In bacteria, this process has been most thoroughly elucidated in Escherichia coli (1,2,6). Translation of bacterial selenoprotein mRNA requires both a selenocysteine insertion sequence (SECIS) element, which is a stem-loop structure immediately downstream of the Sec-encoding UGA codon (5,11,12), and trans-acting factors dedicated to Sec incorporation (8). In archaea and eukaryotes, SECIS elements are located in 3′-UTRs and some factors involved in Sec biosynthesis and insertion are different. Recent identification of Sec synthase, SecS, in eukaryotes, which is different from the bacterial Sec synthase, SelA, provided important insights into Sec biosynthesis in these organisms (13).

Recently, an additional rare amino acid, pyrrolysine (Pyl), was identified, which expanded the canonical genetic code to 22 amino acids (14,15). Pyl is inserted in response to the UAG codon in several methanogenic archaea (14). Although the mechanism of Pyl biosynthesis and incorporation into protein is not fully understood, the presence of a tRNA(Pyl) gene (pylT) with the CUA anticodon and of a class II aminoacyl-tRNA synthetase gene (pylS) argued for cotranslational incorporation of Pyl (15). In Desulfitobacterium hafniense, the single bacterium in which a Pyl-containing protein was found, PylS consists of two proteins: PylSn and PylSc (15).

In recent years, large-scale genome sequencing projects, including both organism-specific and environmental metagenomic projects, provided a large volume of gene and protein sequence information. However, selenoprotein genes are almost universally misannotated in these datasets because UGA has the dual function of encoding Sec and terminating translation, and only the latter function is recognized by current annotation programs. Several bioinformatics tools have been developed to address this problem and can be used to identify selenoprotein genes (16–22). These programs have successfully identified many new selenoproteins in both prokaryotic and eukaryotic genomes, as well as in the Sargasso Sea environmental samples (23).

Complex symbiotic relationships between bacteria and multicellular eukaryotes have evolved in several environments, but science has traditionally focused on interactions that are pathogenic (24). Recently, there has been increased recognition of symbiotic interactions that benefit both the microorganism and the host (25). A recent metagenomic analysis of the symbiotic microbial consortium of the marine oligochaete Olavius algarvensis, a worm lacking a mouth, gut and nephridia, revealed four major co-occurring symbionts, which belong to Deltaproteobacteria (δ1 and δ4) and Gammaproteobacteria (γ1 and γ3), as well as one minor Spirochaete species. Since some Deltaproteobacteria are selenoprotein-rich organisms (27), we analyzed the selenoproteomes of these symbionts to examine a possible relationship between selenium and symbiosis.

To characterize the selenoproteome in these symbionts, we adopted a Sec/cysteine (Cys) homology-based search approach, which has been successfully used to characterize the selenoproteomes of both prokaryotes (22) and one of the largest prokaryotic sequencing projects, the Sargasso Sea microbial sequencing project (23). We detected known selenoproteins present in this metagenomic dataset and identified several novel selenoproteins. Interestingly, one deltaproteobacterium, δ1 symbiont, contains at least 57 selenoproteins, which is the largest number of selenoproteins reported to date in any organism. In addition, several Pyl-containing proteins were identified and most were also found in the same δ1 symbiont. Our results provide new

TGA/TAG/TAA-containing homologs using TBLASTN with default parameters. Only local alignments in which Cys in the query protein was aligned with a TGA codon in the nucleotide sequence from the Olavius symbionts' metagenomic database were selected for further analysis. For each TGA-containing nucleotide sequence identified in the metagenomic database, regions upstream and downstream of the putative in-frame TGA codon were analyzed to identify a minimal ORF. If a stop codon was found between the in-frame TGA codon and an initiation codon (ATG or GTG), such a TGA-containing sequence was discarded.

Analyses of TGA-flanking regions and sequence clustering

We analyzed the conservation of TGA-flanking regions in all six reading frames using BLASTX. If the best hit, which covered the TGA codon with at least a 10-nt overlap, was in a different reading frame than the TGA codon, the corresponding sequence was filtered out. RPS-BLAST was then used to search against the conserved domains database (CDD). If the best hit that covered the TGA codon with at least a five-residue overlap was in a different reading frame, or additional stop codons appeared within the conserved domain in the same frame, the sequence was removed.

We used BL2SEQ to cluster the remaining protein sequences into different groups. If a local alignment of two proteins had an E-value below 10⁻⁴ and was at least 20 amino acids long, and the predicted Sec residues were located at the same position or very close (no more than three residues apart) in the alignment, the two proteins were assigned to the same cluster.

Cysteine conservation and selenoprotein classification

All clusters were automatically searched against NCBI NR and microbial databases using BLASTX and TBLASTX.
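Two of the filtering rules described above, the minimal ORF check upstream of the in-frame TGA and the criterion for placing two candidates in one cluster, can be illustrated with a short Python sketch. This is not the authors' code. The function names, argument names and the choice to keep sequences that reach the contig edge without meeting a stop codon are assumptions made for illustration; only the thresholds (start codons ATG or GTG, E-value below 10⁻⁴, alignment length of at least 20 residues, Sec positions no more than three residues apart) come from the text.

```python
# Illustrative sketch only; names and edge-case handling are assumptions,
# not the pipeline used in the paper.

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}  # initiation codons named in the text


def upstream_orf_is_open(seq: str, tga_pos: int) -> bool:
    """Walk upstream from the in-frame TGA, one codon at a time.

    Returns False if a stop codon is met before an initiation codon
    (the sequence would be discarded). Returns True if an initiation
    codon is met first, or if the start of the sequence is reached
    without meeting a stop (assumption: truncated contigs are kept).
    """
    seq = seq.upper()
    if seq[tga_pos:tga_pos + 3] != "TGA":
        raise ValueError("tga_pos does not point at a TGA codon")
    pos = tga_pos - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in START_CODONS:
            return True
        if codon in STOP_CODONS:
            return False
        pos -= 3
    return True


def same_cluster(evalue: float, aln_len: int,
                 sec_pos_a: int, sec_pos_b: int) -> bool:
    """Apply the pairwise clustering rule to one local alignment.

    sec_pos_a and sec_pos_b are the positions of the predicted Sec
    residues of the two proteins in alignment coordinates.
    """
    return (evalue < 1e-4
            and aln_len >= 20
            and abs(sec_pos_a - sec_pos_b) <= 3)
```

In this sketch the alignment statistics are assumed to have been parsed already from pairwise alignment output; how they are obtained is outside its scope.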