Diversity and functional plasticity of eukaryotic selenoproteins: Identification and characterization of the SelJ family

Sergi Castellano*†, Alexey V. Lobanov‡, Charles Chapple*, Sergey V. Novoselov‡, Mario Albrecht§, Deame Hua‡, Alain Lescure¶, Thomas Lengauer§, Alain Krol¶, Vadim N. Gladyshev‡, and Roderic Guigó*

*Grup de Recerca en Informàtica Biomèdica, Institut Municipal d’Investigació Mèdica, Universitat Pompeu Fabra and Centre de Regulació Genòmica, Carrer del Doctor Aiguader 80, 08003 Barcelona, Spain; ‡Department of Biochemistry, University of Nebraska, Lincoln, NE 65588; §Max Planck Institute for Informatics, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany; and ¶Institut de Biologie Moléculaire et Cellulaire, 15 Rue René Descartes, 67084 Strasbourg Cedex, France

Edited by Philip P. Green, University of Washington School of Medicine, Seattle, WA, and approved September 22, 2005 (received for review June 19, 2005)

Selenoproteins are a diverse group of proteins that contain selenocysteine (Sec), the 21st amino acid. In the genetic code, UGA serves as a termination signal and a Sec codon. This dual role has precluded the automatic annotation of selenoproteins. Recent advances in the computational identification of selenoprotein genes have provided a first glimpse of the size, functions, and phylogenetic diversity of eukaryotic selenoproteomes. Here, we describe the identification of a selenoprotein family named SelJ. In contrast to known selenoproteins, SelJ appears to be restricted to actinopterygian fishes and sea urchin, with Cys homologues only found in cnidarians. SelJ shows significant similarity to the jellyfish J1-crystallins and with them constitutes a distinct subfamily within the large family of ADP-ribosylation enzymes. Consistent with its potential role as a structural crystallin, SelJ has preferential and homogeneous expression in the eye lens in early stages of zebrafish development. A structural role for SelJ would be in contrast to the majority of known selenoenzymes. The unusually highly restricted phylogenetic distribution of SelJ, its specialization, and the comparative analysis of eukaryotic selenoproteomes reveal the diversity and functional plasticity of selenoproteins and point to a mosaic evolution of the use of Sec in proteins.

ADP-ribosylation | J1-crystallins | selenocysteine | selenium

Conflict of interest statement: No conflicts declared.
This paper was submitted directly (Track II) to the PNAS office.
Abbreviations: Sec, selenocysteine; SECIS, Sec insertion sequence; ARH, ADP-ribosylglycohydrolase; mjARH, ARH from Methanococcus jannaschii.
See Commentary on page 16123.
†To whom correspondence should be sent at the present address: Department of Molecular Biology and Genetics, Cornell University, 101 Biotechnology Building, Ithaca, NY 14853. E-mail: [email protected].
© 2005 by The National Academy of Sciences of the USA
BIOCHEMISTRY
16188–16193 | PNAS | November 8, 2005 | vol. 102 | no. 45 | www.pnas.org/cgi/doi/10.1073/pnas.0505146102

Selenocysteine (Sec) residues are found only in a small group of proteins called selenoproteins. Selenoproteins with characterized functions are enzymes involved in redox reactions. Sec is inserted by dynamic recoding of an in-frame UGA stop codon located upstream of the actual termination codon (1). Selenoproteins are present in eukaryotes and prokaryotes and, although their sets of selenoproteins (selenoproteomes) do not completely overlap (2), their intersection appears to be larger than previously thought (3). In eukaryotes, the Sec insertion sequence (SECIS), an RNA stem-loop located in the 3′ UTR of selenoprotein genes, recruits several transacting factors to recode UGA from termination to Sec insertion (4). The dual function of this codon confounds gene-finding programs and human curators, making selenoprotein identification a challenge (5–10).

In addition, Cys and Sec are structurally related amino acids, with Sec incorporating selenium in the position of sulfur in Cys. The functional resemblance of Sec and Cys is supported by the observation that their residues occupy equivalent positions in homologous proteins. As expected, mutation of the Sec residue to Cys results in variants that are active but have a lower catalytic efficiency (11–13), although they may have an enhanced translation (13). Furthermore, natural Cys-containing counterparts are usually inferior catalysts (14, 15), but similar reactivity has also been reported (16). Hence, Sec, per se, does not possess an essential role in protein function and these two residues seem to be partially interchangeable. Accordingly, Sec and Cys residues are alternatively found in orthologous proteins in different species, raising the issue of the evolutionary direction (if any) of Sec/Cys interconversion. This distribution of Sec and Cys residues across genomes hinders the identification of true Sec-containing proteins.

In consequence, the description of eukaryotic selenoproteomes is incomplete. The number, functional diversity, and phylogenetic distribution of eukaryotic selenoproteins are poorly known and, thus, the importance of Sec and selenium in protein function and evolution remains unclear, despite an increasing body of evidence linking selenium deficiency to a number of pathologies (reviewed in ref. 17). Recently, computational approaches have been developed that have greatly contributed to the characterization of selenoproteins in many eukaryotic genomes. Specific methods have been developed to predict SECIS elements in nucleotide sequences (5, 6). SECIS predictions, in turn, have been used to instruct modified gene prediction algorithms to ignore UGA codons as terminators in putative exon sequences when SECIS elements are predicted at the appropriate distance (7–9). In addition, comparative sequence analysis methods have proven to be very powerful in uncovering novel selenoproteins. Indeed, protein sequence conservation across a TGA codon between sequences at large phylogenetic distances is strongly indicative of the Sec coding function (10).

Here, we report the discovery by computational and experimental means of a selenoprotein family designated SelJ in the genome of Tetraodon nigroviridis. SelJ has a very restricted phylogenetic distribution and, in contrast to all known eukaryotic selenoproteins, does not exist in mammalian genomes, not even as a Cys homologue. In addition, although the majority of selenoproteins are assumed to have enzymatic functions, computational and experimental data suggest that SelJ could have a structural role. Comparison of eukaryotic selenoproteomes highlights the scattered phylogenetic distribution of selenoproteins, arguing for a mosaic evolution of specialized proteins in which the differential use of Sec and Cys shapes the size and function of eukaryotic selenoproteomes.

Methods

Gene and SECIS Prediction. The T. nigroviridis sequence data (18) were screened for previously uncharacterized selenoprotein genes using three independent computational methods (see Supporting Methods, which is published as supporting information on the PNAS web site). The SelJ (Fig. 1A) selenoprotein mRNA was found through a coordinated prediction of ORFs interrupted by in-frame UGA codons in cDNA sequences with the gene prediction program GENEID (19) and SECIS elements (Fig. 1B) with SECISEARCH (8, 9).

Fig. 1. SelJ gene and SECIS structure. (A) SelJ gene structure in T. nigroviridis. Coding exon regions are shown in blue. The annotated gene in Ensembl (GSTENG00015130001) has the frame changed and uses a different termination codon. Therefore, SelJ is reannotated with the Sec codon (TGA in the genomic sequence) and the SECIS element in the noncoding region of the last exon. (B) SelJ SECIS element.

75Se Labeling. The partial sequence of the zebrafish SelJ cDNA that included a region coding for an 18-kDa C-terminal portion of the protein and the 3′ UTR with a predicted SECIS element was amplified with 5′-AGTCGCTCGAGGTTGAAGAAGCAGTCCGTGTCAC-3′ and 5′-GCTTGGGATCCATTTTCCGCATGTCATGCTG-3′ primers and cloned into pEGFP-C3 (Clontech) vector by using XhoI and BamHI restriction sites to generate the GFP–SelJ construct, which codes for a fusion protein containing GFP followed by a C-terminal region of SelJ. The plasmid was purified with an EndoFree Maxi kit (Qiagen, Valencia, CA). NIH 3T3 cells were transfected with the GFP–SelJ construct using the Lipofectamine and Plus reagents (Invitrogen). Transfected cells were supplemented with 5–10 mCi (1 Ci = 37 GBq) of [75Se]selenite (University of Missouri Research Reactor, Columbia) per 60-mm cell culture dish. After 24 h, cells were collected, and protein samples were prepared, subjected to SDS/PAGE, and transferred onto a polyvinylidene difluoride membrane. Radioactive bands were detected by using a phosphorimager (Amersham Biosciences, which is now GE Healthcare). See Fig. 2.

Subcellular Localization. The Sec-encoding TGA codon was mutated to TGC, a Cys codon, by using 5′-CTGTGTTTCCAAATACCTGCGGTTTGCCTGGTGC-3′ and 5′-GGAATGCACCAGGCAAACCGCAGGTATTTGGAAA-3′ primers and a QuikChange Mutagenesis kit (Stratagene). The coding region of the resulting Cys mutant of SelJ was amplified with 5′-TCGATCTCGAGGATTTCAAGATGGCTCTTGC-3′ and 5′-CTAAGGAATCCCTTGTTTTTCCAGGAAGGTGGAATC-3′ and cloned upstream of GFP in pEGFP-N2 vector by using EcoRI and XhoI restriction sites. NIH 3T3 cells were transfected as described above, and fluorescence was detected with an Olympus FV500 confocal microscope at the Microscopy Core Facility at the University of Nebraska, Lincoln (see Fig. 3).

Homology Searches. TBLASTN (20) was used to query the following for SelJ homologues: the National Center for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov) collection of 62 eukaryotic (partial and complete), 25 archaeal, and 354 bacterial genomes, NCBI eukaryotic ESTs, and sets of eukaryotic transcripts (81 species) from The Institute for Genomic Research (www.tigr.org). Additional homologues were identified by using iterated PSI-BLAST (20) searches in the NCBI and UniProt (Universal Protein Resource, www.pir.uniprot.org) databases.

Fig. 2. 75Se labeling. The left lane shows cells transfected with a GFP–SelJ construct, and the right lane shows cells transfected with a control vector. Migration of endogenous selenoproteins TR1 and GPx1 is indicated by arrows on the right. Migration of the 45-kDa GFP–SelJ fusion selenoprotein is shown by an arrow on the left.

Fig. 3. Subcellular localization of SelJ. (Upper) The SelJ–GFP fusion protein, in which Sec was mutated to Cys, was expressed in NIH 3T3 cells. (Lower) A control in which the cells were transfected with a construct expressing GFP alone.
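The recoding logic behind the 75Se-labeling and Sec-to-Cys experiments can be made concrete with a minimal Python sketch. It reads an in-frame TGA as Sec (U) only at codon positions supplied by the caller and then applies the TGA-to-TGC substitution used for the Cys mutant. This is an illustration under stated assumptions, not the GENEID or SECISEARCH procedure: the 18-nt coding sequence is a made-up toy, the function names are hypothetical, and in a real screen the Sec positions would follow from a SECIS prediction in the 3′ UTR rather than being passed in by hand.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def inframe_tga(cds):
    """Return the codon indices (0-based) of in-frame TGA codons."""
    return [i // 3 for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] == "TGA"]


def translate(cds, sec_positions=()):
    """Translate a coding sequence up to the first stop codon.

    A TGA whose codon index is listed in sec_positions is read as Sec (U)
    instead of a stop; every other stop codon terminates translation.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        residue = CODON_TABLE.get(codon, "X")  # X marks an ambiguous codon
        if codon == "TGA" and i // 3 in sec_positions:
            residue = "U"
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def sec_to_cys(cds, codon_index):
    """Replace the TGA at codon_index with TGC (Cys), as in the mutant construct."""
    start = 3 * codon_index
    if cds[start:start + 3] != "TGA":
        raise ValueError("codon at this index is not TGA")
    return cds[:start] + "TGC" + cds[start + 3:]


# Hypothetical toy sequence: ATG GCT TGA GGT AAA TAA
toy_cds = "ATGGCTTGAGGTAAATAA"
print(inframe_tga(toy_cds))                # codon index of the in-frame TGA
print(translate(toy_cds))                  # conventional reading stops at TGA
print(translate(toy_cds, {2}))             # Sec-aware reading continues to TAA
print(translate(sec_to_cys(toy_cds, 2)))   # Cys mutant needs no recoding
```

With the toy sequence, the conventional reading truncates the product at the TGA, whereas the Sec-aware reading and the Cys mutant both yield the full-length product and differ only at the recoded position.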

Fig. 4. Expression pattern of the SelJ gene during development in zebrafish embryos. Shown are a gastrula (A), an embryo at middle somitogenesis (B), dorsoventral (C) and transversal (D) views of the 20-somite stage, a whole embryo 24 h after fertilization (E) and a closer view of the eyes (F), a whole embryo 36 h after fertilization (G), and an embryo 48 h after fertilization (H). All views are lateral, except C and D. (C) Territories in which SelJ is expressed are indicated in black, and the position of the midbrain–hindbrain boundary is shown in red.
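As a consistency check on the two mutagenesis primers listed under Subcellular Localization, the sketch below computes the reverse complement of the second primer and its overlap with the first. It assumes that the hyphens at line breaks inside the printed primers are typesetting artifacts and not part of the sequences; the helper names are hypothetical. The check says nothing about which TGC in the primers corresponds to the mutated codon, because the surrounding cDNA sequence is not given here.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def suffix_prefix_overlap(a, b):
    """Longest suffix of a that is also a prefix of b (empty string if none)."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return a[-n:]
    return ""


# Primer sequences as printed in Methods, with line-break hyphens removed
# (an assumption about the typesetting, see the lead-in).
forward = "CTGTGTTTCCAAATACCTGCGGTTTGCCTGGTGC"
reverse = "GGAATGCACCAGGCAAACCGCAGGTATTTGGAAA"

core = suffix_prefix_overlap(forward, revcomp(reverse))
print(len(forward), len(reverse), len(core))
print(core)
print("TGC" in core, "TGA" in core)
```

Under that assumption the two 34-nt primers are complementary over a shared 29-nt core, offset by 5 nt at either end, and the core carries TGC but no TGA, which is what one would expect for primers that introduce a Cys codon in place of the Sec codon.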

Analysis of Promoter Regions. The available J1A (289 nt), J1B (265 nt), and J1C (178 nt) promoter sequences were obtained from NCBI entries L05524, L05523, and L05522, respectively. SelJ promoter regions (1,000 nt upstream of the most 5′ transcript) were extracted for the fishes T. nigroviridis (chromosome 15), Tetraodon rubripes (scaffold_182), and Danio rerio (Zv4_scaffold126.1). BL2SEQ was used to compare these promoter regions at the sequence level. The PROMO server with TRANSFAC 8.3 (21, 22) was used to predict promoter elements (see Supporting Methods).

In Situ Hybridization in Zebrafish. A TBLASTN search in EST databases identified eight zebrafish sequences encoding homologues to the T. nigroviridis SelJ protein. These ESTs generated a 1,292-bp contiguous sequence containing the entire ORF and the 3′ UTR with the SECIS motif. A 1,300-bp DNA fragment encompassing positions 12–1313 of the zebrafish SelJ cDNA was obtained by SmaI–EcoRI digestion and cloned into pBluescriptSK(−). The antisense probe was synthesized with T7 RNA polymerase, and whole-mount in situ hybridizations were performed (23). The fully detailed protocol is available from the authors upon request (see Fig. 4 and Supporting Methods; see also Fig. 6, which is published as supporting information on the PNAS web site).

EST Expression Pattern. Tissue distribution of SelJ in different fishes was obtained from ESTs (indicated here with each species and the respective database accession nos.) in Ensembl (T. nigroviridis, GSTENG00015130001), UniGene (D. rerio, Dr.21881), and the Institute for Genomic Research databases [D. rerio, TC292667; Salmo salar, TC23857 (SelJa), TC23525 (SelJb), and TC33491 (SelJb); Oryzias latipes, TC38404 (SelJa) and TC38404 (SelJb); Fundulus heteroclitus, TC1043 and CN987014; Oncorhynchus mykiss, TC56113 and TC64467].

Structural and Functional Analysis. Protein domain architectures were obtained from the Pfam (Protein Families Database of Alignments and HMMs) and SCOP (Structural Classification of Proteins) databases. The protein structure of the ADP-ribosylglycohydrolase (ARH) from Methanococcus jannaschii (mjARH) was obtained from the Protein Data Bank (PDB ID code 1T5J), and its secondary structure was obtained from the DSSP (Definition of Secondary Structure of Proteins) database (24). Highly conserved regions among ARHs were mapped onto the 3D structure of mjARH to identify their locations with respect to putative binding sites (see Table 1 and Fig. 7, which are published as supporting information on the PNAS web site). The NCBI accession numbers and species of depicted protein sequences in Fig. 7 are listed in Table 2, which is published as supporting information on the PNAS web site. See Supporting Methods for details.

Distribution of Eukaryotic Selenoproteins. Human selenoproteins were compared (TBLASTN) to selected eukaryotic genomes and transcriptomes (E < 10⁻³) with different BLOSUM matrices (see Fig. 5).

Fig. 5. Distribution of eukaryotic selenoproteins and Cys homologues in eukaryotes. Sec proteins are shown in red, Cys proteins are shown in green, and unknown proteins are shown in yellow. Open boxes represent proteins absent from the genome. The genomes of S. purpuratus, Xenopus laevis, Sus scrofa, and Bos taurus are incomplete. The protein homologous to known disulfide isomerases with Sec instead of Cys in the haptophyte alga Emiliania huxleyi (39) is not depicted. See Table 3, which is published as supporting information on the PNAS web site, for full scientific and common names.

Results

T. nigroviridis Selenoproteome. The T. nigroviridis selenoproteome consists of 19 known selenoprotein families (18): 15kDa, DI, GPx, SelH, SelI, SelK, SelM, SelN, SelO, SelP, SelR, SelS, SelT, SelU, SelV, SelW, SPS2, and TR. Thus, all known vertebrate selenoproteins exist in true Sec form in this fish. In addition, this genome encodes SelJ on chromosome 15. SelJ has 9 exons, with a single Sec residue lying in exon 7 (Fig. 1A). The SelJ gene encodes a protein of 341 residues and has a type I SECIS (Fig. 1B). Type I SECIS elements, in contrast to type II, have no additional small stem-loop at the end of the apical loop (25).

SelJ Experimental Validation. To verify that SelJ is a true selenoprotein, 75Se-labeling of NIH 3T3 cells was undertaken. A 75Se-labeled band (Fig. 2) of the expected size for the protein product containing the in-frame UGA encoding Sec was observed. This band was absent in the control transfection experiments, whereas the endogenous mammalian selenoproteins were expressed in both samples. Thus, SelJ is a selenoprotein and has a functional SECIS element, which is used to insert Sec at an in-frame UGA codon. To determine the cellular location of SelJ, a SelJ–GFP fusion protein in which Sec was mutated to Cys was expressed in NIH 3T3 cells. The fusion protein, which was detected by GFP fluorescence, was distributed uniformly in the cell (Fig. 3).

SelJ Relationship to J1-Crystallins. A PSI-BLAST search with the zebrafish SelJ protein returned J1-crystallins (J1A, J1B, and J1C of the jellyfish Tripedalia cystophora) as the closest homologues (the best E value was 3 × 10⁻⁵⁷ for J1C-crystallin, and sequence identity was 36% for J1A-, J1B-, and J1C-crystallins). These proteins have Cys in place of Sec (Fig. 8, which is published as supporting information on the PNAS web site; see also Fig. 7) and are encoded by three intronless genes (26). Crystallins maintain the transparency and proper light diffraction in the eye lens across the taxa, but unrelated crystallins are found in vertebrates, cephalopods, and jellyfish. At the base of this polyphyletic origin lies the mechanism known to produce crystallin-acting proteins, that is, the independent recruitment of metabolic enzymes and stress-related proteins in the eye lens. There are two different pathways for crystallin recruitment: gene duplication and subsequent sequence divergence with or without loss of the original function (27), and gene sharing, which consists of the recruitment of a gene product to serve additional functions with neither duplication nor loss of the primary function (28). Gene sharing may, however, require a change of tissue specificity, usually a dramatic increase of expression in the eye lens or a change in developmental timing. J1-crystallins are of unknown origin.

To further investigate the potential role of SelJ as a crystallin, we have analyzed the promoter region of the fish SelJ and jellyfish J1-crystallins. As reported in ref. 26, the J1A-, J1B-, and J1C-crystallins have completely different promoter regions. Furthermore, the fish promoters do not show sequence similarity to the jellyfish promoter regions. However, T. nigroviridis and T. rubripes, two puffer fishes, have stretches of significant sequence similarity. We searched for binding sites in common in all promoter regions and, specifically, shared by T. nigroviridis and T. rubripes. Among hundreds of potential binding sites, we searched for motifs that could be indicative of eye function, for example, activator protein 1, musculoaponeurotic fibrosarcoma, and those in the Pax family. Pax-6 has been recognized as a master control gene for eye morphogenesis, and other members of this family also are related to eye differentiation (28). Pax-2 (T01823), Pax-2a (T00678), Pax-4a (T02983), Pax-6 (T00682), Pax-9a (T03593), and Pax-9b (T03594) TRANSFAC hits were found in all sequences. However, all hits had a high random expectation (>0.25). Therefore, the evidence for these elements is inconclusive. Furthermore, the hits in the T. nigroviridis and T. rubripes sequences did not correlate with their conserved regions, suggesting a high false-positive rate.

In addition, to assess whether SelJ in fishes has significant eye lens expression, we analyzed the tissue and temporal expression of SelJ in situ during zebrafish embryogenesis. An RNA probe complementary to the zebrafish SelJ cDNA was synthesized, and in situ hybridization was performed on whole zebrafish embryos from different developmental stages. The hybridization sites were revealed by a chromogenic reaction. The SelJ gene was not expressed during the primary developmental stages, up to middle somitogenesis (Fig. 4 A and B). The first transcripts were detected at the stage of 20 somites and were observed within the forming lens and the anterior neural crest cells flanking the midbrain–hindbrain boundary (Fig. 4 C and D). Twenty-four hours after fertilization, SelJ was predominantly expressed within different territories: the eye; the tectum, a highly proliferating tissue of the brain; the inner cell mass, which is the tissue of hematopoiesis; and the dorsal neural crest (Fig. 4). Neural crest cells are highly migrating cells with a large spectrum of differentiation potential, such as pigment cells, peripheral nervous system, or parts of facial cartilage. Within the eye, SelJ was expressed in the lens where the labeling was homogeneous (Fig. 4F). Later in development, the expression turned less restricted and spread to the whole embryo with higher levels detected in the blood and dorsal hindbrain (Fig. 4G). At 48 h of development, the gene

was widely expressed in all embryonic tissues (Fig. 4H). Additionally, tissue and temporal expression was explored through the identification of ESTs from various tissue origins in several actinopterygian fishes. SelJ showed no preferential expression pattern in later embryonic or adult stages and was observed in liver, kidney, heart, muscle, ovary, gut, spleen, eyes, testis, and brain in adult fishes. However, the EST data (76 sequences) might be too scarce to infer a reliable expression pattern. In conclusion, SelJ is widely expressed in advanced embryo and adult stages, but a more restricted pattern of expression in the eye lens is detected in early embryogenesis.

SelJ Relationship to ADP-Ribosylation Enzymes. The first PSI-BLAST iteration also retrieved significant hits to bacterial proteins (best E value, 5 × 10⁻¹⁹), all of which belong to the family of ADP-ribosylation enzymes (termed ADP_ribosyl_GH in Pfam). This search converged at the fifth iteration after retrieving >170 potential ADP-ribosylation enzymes from all domains of life, that is, bacteria, archaea, eukarya, and viruses, including bacteriophages and the mimivirus. We conclude that SelJ and J1-crystallins form a distinct subfamily within this large family of ADP-ribosylation enzymes. ADP-ribosylation is a reversible posttranslational modification of proteins involving the addition of an ADP-ribose moiety from nicotinamide adenine dinucleotide to acceptor amino acid(s). In particular, ARHs cleave ADP-ribose-L-Arg (29, 30) (EC 3.2.2.19). Other well studied family members are dinitrogenase reductase-activating glycohydrolases (EC 3.2.2.24) of the photosynthetic Rhodospirillum rubrum and related bacteria, which regulate nitrogen fixation by removing the ADP-ribosyl group of inactivated dinitrogenase reductases to recover their activity (31, 32).

To assess whether SelJ or the J1-crystallins may have an ADP-ribosylation function similar to the previously described ARHs, we first identified functionally relevant amino acids of these enzymes and then compared them to the corresponding residues of SelJ and J1-crystallin proteins (Table 1 and Fig. 7). The analysis of mjARH provided by Gogos et al. in the Protein Data Bank file identifies a catalytic pocket with two alternative adjacent locations of modeled Mg²⁺ ions, but the presence of different metals cannot be ruled out. Indeed, the activity of mammalian ARHs depends on DTT and Mg²⁺ (29, 30). It has also been shown experimentally that dinitrogenase reductase ADP-ribosyltransferases require Mg-ATP and a free divalent metal for their activity. Furthermore, a binuclear Mn²⁺ center, also found in arginases (33), has been detected in the active site of dinitrogenase reductase-activating glycohydrolases (31–33). Therefore, the modeled Mg²⁺ ions may really point to the approximate positions of Mg²⁺ or Mn²⁺ in vivo.

In general, Mg²⁺ and Mn²⁺ ions are frequently bound to and coordinated through Asp and Glu (33). Indeed, six highly conserved Asp residues and one Glu are present within the putative binding pocket of mjARH and are modeled to be involved in metal binding and coordination (Table 1 and Fig. 9, which is published as supporting information on the PNAS web site). This catalytic cleft is also clearly marked by large, negatively charged, protein-surface patches computed by GRASP2 for the crystal structure of mjARH (Fig. 10, which is published as supporting information on the PNAS web site). Visual inspection of this structure suggested further functional and structural roles for certain amino acids (Table 1). The functional relevance of Asp and other amino acids in and near the active site has also been demonstrated experimentally by inactivating mutations of dinitrogenase reductase-activating glycohydrolase and of human or rat ADP-ribosylarginine hydrolase (29–32). Interestingly, most functionally relevant Asp residues are not conserved in SelJ and J1-crystallin homologues or in six other bacterial proteins (Fig. 7, sequences in violet and brown; see also Table 1). However, these proteins conserve other residues (Cys/Ser and His) corresponding to active Asp residues in ARHs (Table 1 and Fig. 7). This high degree of conservation would not be expected if the catalytic site of Asp-lacking proteins was inactive. Therefore, SelJ, J1-crystallins, and other Asp-lacking proteins may still be involved in catalytic functions related to ADP-ribosylation, but these functions may be different from those of well known ARHs.

SelJ Phylogenetic Distribution. SelJ is also present in other fishes in either Sec or Cys form (see Figs. 7 and 8). However, no SelJ homologues were found in mammals, birds, amphibians, or other vertebrates. Interestingly, SelJ is also present in Strongylocentrotus purpuratus (sea urchin). In addition, Cys homologues are present in two cnidarians, the jellyfish T. cystophora and Hydra magnipapillata. SelJ appears to have a very restricted phylogenetic distribution and is known in vertebrates but is absent, even as a Cys homologue, in mammals.

Distribution of Eukaryotic Selenoproteins. The mapping of known selenoproteins across eukaryotes showed a scattered pattern of Sec/Cys distribution among proteins and taxa (Fig. 5). Sec-containing proteins seemed to accumulate in vertebrate or mammalian genomes (Fig. 5, unframed families); however, the recently identified families (Fig. 5, framed families) no longer followed this previously observed trend of Sec/Cys usage in eukaryotes. A greater diversity among selenoproteomes is thus observed.

Discussion

As new genome sequences become available and improved computational methods are being developed, a more accurate picture of the use and roles of Sec in eukaryotic proteins is emerging. Indeed, a much richer and more complex history of eukaryotic selenoproteins than suspected is taking shape. Recent findings have uncovered taxa-specific selenoproteins with unexpected functions as well as Sec-dependent homologues of well known Cys-containing genes.

Here, we have used computational methods to analyze the recently released genomic sequence of the actinopterygian fish T. nigroviridis and predicted a previously uncharacterized selenoprotein family, which we designated SelJ. We have shown by 75Se labeling that SelJ is a bona fide selenoprotein that possesses a functional SECIS element (Fig. 2). SelJ is widely distributed in actinopterygian fishes and present in at least one species of sea urchin. It is also found among cnidarians in J1-crystallins with Cys (Figs. 7 and 8). In contrast to all known eukaryotic selenoproteins, SelJ is absent, even as a Cys homologue, in mammals.

Our study indicates that SelJ and J1-crystallins may have been derived from ancestral ADP-ribosylation enzymes, suggesting that taxa-specific selenoproteins may have evolved specialized functions from them. Within the family of ADP-ribosylation enzymes, SelJ is more closely related to the J1-crystallins. Analyses of SelJ, J1-crystallin, and several bacterial proteins with Cys in place of Sec show common functional residues that are conserved but different from those found in the active sites of ADP-ribosylation enzymes (Table 1 and Figs. 7 and 9). This observation argues for the acquisition of a different protein function, although possibly related to ADP-ribosylation. As for the majority of eukaryotic selenoproteins, the function of SelJ is not fully clear. Because the Sec insertion mechanisms differ between bacteria and eukaryotes, the recombinant expression of selenoproteins is difficult and often results in a low amount of pure protein, which complicates biochemical analyses. However, whereas for most recently identified selenoproteins there are no clues as to which assays should be used to look for function, we expect the work presented here to motivate and lay the ground for the experimental test of ADP-ribosylation-related activity in SelJ.

In addition, the close relationship between SelJ (with Sec) and J1-crystallins (with Cys) suggests the possibility of SelJ also being a crystallin. Most selenoproteins and Cys-containing homologues are thought to have an exclusively enzymatic role, and only GPx4 (34) has been previously predicted to have a structural function. If

confirmed, the structural role of SelJ would strongly support a greater functional plasticity of Sec-containing proteins. In this regard, in situ hybridization of SelJ in zebrafish shows high expression almost exclusively in the eye lens in early embryogenesis but a wide expression (including in the eye lens) at later developmental stages. Analyses of EST distribution in several fishes support such expression in adult tissues. This pattern, the lack of SelJ paralogous genes in most fish genomes, and its similarity to bacterial proteins would be consistent with a crystallin role for SelJ in fishes with neither gene duplication nor the loss of a previous enzymatic function. Interestingly, in jellyfish, J1-crystallins are highly expressed in the eye lens and weakly expressed elsewhere, which is consistent also with crystallin and noncrystallin roles (35). In this regard, the analysis of the promoter regions of the fish SelJ is inconclusive. However, it cannot be ruled out that the presence of SelJ in the eye lens may be related exclusively to its original enzymatic function. Indeed, several antioxidant selenoproteins are typically expressed in the eye lens and, to our knowledge, perform no crystallin function (36).

The recent analysis of nonmammalian or nonvertebrate lineages has resulted in the identification of several selenoprotein families (Fig. 5, framed families) with restricted taxa distribution in Sec-containing form and a more widespread distribution in Cys form (10, 37–39). These results argue for a scattered mosaic-like evolution of selenoprotein genes rather than for an accumulation of them in a sort of evolutionary cul-de-sac taxon (10), as the mammalian lineage was thought to be from the Sec/Cys distribution of the first identified selenoprotein families (Fig. 5, unframed families). SelJ distribution is indeed more extreme because it does not exist, not even in Cys form, in mammals. This result further supports the hypothesis of a distinct evolutionary history of each selenoprotein gene.

The ad hoc use of selenium and Sec among taxa is further observed in the range of selenoproteome sizes. Well studied selenoproteomes, mainly in mammals (9), consist of small sets of selenoproteins (possibly <30 genes), which is in accordance with a limited supply of selenium in nature. Studies of the fly genome suggest the presence of three selenoproteins (7, 8), and a recent analysis of model nematodes limits their selenoproteome to a single protein (40). Furthermore, known yeast and land plants do not possess selenoproteins (nor the necessary selenoprotein machinery) but only alternative Cys variants. Thus, Sec is a rarely used amino acid in extant proteins and, although the selenoprotein sets are unknown for most species, it seems reasonable to expect them to be variable and rather small.

The presence of specialized Sec-containing proteins among lineages and the scattered distribution of Sec and Cys homologues of all selenoproteins are consistent with the highly diverse nature of selenoproteomes (Fig. 5). However, whether there is a general gain/loss trend of Sec underlying the observed mosaic distribution of extant selenoproteins in eukaryotes (Fig. 5) remains unknown. Furthermore, the dynamics of Sec/Cys interconversion for each selenoprotein family and lineage and whether the evolutionary forces driving it can be correlated to biological constraints (e.g., selenium availability, Sec incorporation efficiency, Sec reactivity, and Sec/Cys exchangeability) also are uncertain. Preliminary conclusions can be drawn from the study of specific taxa. In nematodes, the distribution of Sec-containing enzymes suggests that the sizes of selenoproteomes were reduced during evolution and, although some nematodes have few Sec enzymes, in an extreme reduction, Caenorhabditis elegans and Caenorhabditis briggsae have only one selenoprotein left (40).

In summary, although the distribution of selenoproteins we know is only a static snapshot of the current use of Sec among a subset of eukaryotes, the small but different selenoproteome sizes, their protein diversity and functional plasticity, with SelJ as an example, and the uneven distribution of Sec among proteins and taxa further support a landscape of mosaic evolution in the use of the 21st amino acid in proteins.

We thank R. J. Stillwell, M. Coromines, and M. Oertel for helpful discussions and C. and B. Thisse for expertise with zebrafish in situ hybridization. This work was supported by National Institutes of Health Grant GM061603 (to V.N.G.) and Plan Nacional de Investigación Científica Desarrollo e Innovación Tecnológica Grant BIO2000-1358-C02-02 from the Ministerio de Educación y Ciencia of Spain (to R.G.). The use of 75Se was supported by U.S. Department of Energy Grant DE-FG07-02ID14380. The work conducted at the Max Planck Institute for Informatics and, partially, at the Institut Municipal d’Investigació Mèdica was performed in the context of the BioSapiens Network of Excellence funded by the European Commission.

1. Atkins, J. F. & Gesteland, R. F. (2000) Nature 407, 463.
2. Kryukov, G. V. & Gladyshev, V. N. (2004) EMBO Rep. 5, 538–543.
3. Zhang, Y., Fomenko, D. E. & Gladyshev, V. N. (2005) Genome Biol. 6, R37.
4. Driscoll, D. M. & Copeland, P. R. (2003) Annu. Rev. Nutr. 23, 17–40.
5. Kryukov, G. V., Kryukov, V. M. & Gladyshev, V. N. (1999) J. Biol. Chem. 274, 33888–33897.
6. Lescure, A., Gautheret, D., Carbon, P. & Krol, A. (1999) J. Biol. Chem. 274, 38147–38154.
7. Castellano, S., Morozova, N., Morey, M., Berry, M. J., Serras, F., Corominas, M. & Guigó, R. (2001) EMBO Rep. 2, 697–702.
8. Martin-Romero, F. J., Kryukov, G. V., Lobanov, A. V., Carlson, B. A., Lee, B. J., Gladyshev, V. N. & Hatfield, D. L. (2001) J. Biol. Chem. 276, 29798–29804.
9. Kryukov, G. V., Castellano, S., Novoselov, S. V., Lobanov, A. V., Zehtab, O., Guigó, R. & Gladyshev, V. N. (2003) Science 300, 1439–1443.
10. Castellano, S., Novoselov, S. V., Kryukov, G. V., Lescure, A., Blanco, E., Krol, A., Gladyshev, V. N. & Guigó, R. (2004) EMBO Rep. 5, 71–77.
11. Axley, M. J., Böck, A. & Stadman, T. C. (1991) Proc. Natl. Acad. Sci. USA 88, 8450–8454.
12. Berry, M. J., Kieffer, J. D., Harney, J. W. & Larsen, P. R. (1991) J. Biol. Chem. 266, 14155–14158.
13. Berry, M. J., Mai, A. L., Kieffer, J., Harney, J. W. & Larsen, P. (1992) Endocrinology 131, 1448–1852.
14. Stadtman, T. C. (1996) Annu. Rev. Biochem. 65, 83–100.
15. Hatfield, D. L., Gladyshev, V. N., Park, J., Park, S. I., Chittum, H. S., Baek, H. J., Carlson, B. A., Yang, E. S., Moustafa, M. E. & Lee, B. J. (1999) Comp. Nat. Prod. Chem. 4, 353–380.
16. Gromer, S., Johansson, L., Bauer, H., Arscott, L. D., Rauch, S., Ballou, D. P., Williams, C. H., Jr., Schirmer, R. H. & Arner, E. S. (2003) Proc. Natl. Acad. Sci. USA 100, 12618–12623.
17. Hatfield, D. L. (2001) Selenium: Its Molecular Biology And Role In Human Health (Kluwer, New York).
18. Jaillon, O., Aury, J.-M., Brunet, F., Petit, J.-L., Stange-Thomann, N., Mauceli, E., Bouneau, L., Fischer, C., Ozouf-Costaz, C., Bernot, A., et al. (2004) Nature 431, 946–957.
19. Parra, G., Blanco, E. & Guigó, R. (2000) Genome Res. 10, 511–515.
20. Altschul, S. F., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. (1997) Nucleic Acids Res. 25, 3389–3402.
21. Messeguer, X., Escudero, R., Farré, D., Nuñez, O., Martínez, J. & Albà, M. (2002) Bioinformatics 18, 333–334.
22. Farré, D., Roset, R., Huerta, M., Adsuara, J. E., Roselló, L., Albà, M. & Messeguer, X. (2003) Nucleic Acids Res. 31, 3651–3653.
23. Thisse, C., Thisse, B., Schilling, T. F. & Postlethwait, J. H. (1993) Development (Cambridge, U.K.) 119, 1203–1215.
24. Kabsch, W. & Sander, C. (1983) Biopolymers 22, 2577–2637.
25. Grundner-Culemann, E., Martin, G. W., III, Harney, J. W. & Berry, M. J. (1999) RNA 5, 625–635.
26. Piatigorsky, J., Horwitz, J. & Norman, B. L. (1993) J. Biol. Chem. 268, 11894–11901.
27. Piatigorsky, J. (2003) J. Struct. Funct. Genomics 3, 131–137.
28. Nilsson, D.-E. (2004) Curr. Opin. Neurobiol. 14, 407–414.
29. Takada, T., Iida, K. & Moss, J. (1993) J. Biol. Chem. 268, 17837–17843.
30. Konczalik, P. & Moss, J. (1999) J. Biol. Chem. 274, 16736–16740.
31. Antharavally, B. S., Poyner, R. R., Zhang, Y., Roberts, G. P. & Ludden, P. W. (2001) J. Bacteriol. 183, 5743–5746.
32. Kim, K., Zhang, Y. & Roberts, G. P. (2004) FEBS Lett. 559, 84–88.
33. Christianson, D. W. & Cox, J. D. (1999) Annu. Rev. Biochem. 68, 33–57.
34. Ursini, F., Heim, S., Kiess, M., Maiorino, M., Roveri, A., Wissing, J. & Flohe, L. (1999) Science 285, 1393–1396.
35. Piatigorsky, J., Norman, B., Dishaw, L. J., Kos, L., Horwitz, J., Steinbach, P. J. & Kozmik, Z. (2001) Proc. Natl. Acad. Sci. USA 98, 12362–12367.
36. Flohe, L. (2005) Dev. Ophthalmol. 38, 89–102.
37. Fu, L. H., Wang, X. F., Eyal, Y., She, Y. M., Donald, L. J., Standing, K. G. & Ben-Hayyim, G. (2002) J. Biol. Chem. 277, 25983–25991.
38. Novoselov, S. V., Rao, M., Onoshko, N. V., Zhi, H., Kryukov, G. V., Xiang, Y., Weeks, D. P., Hatfield, D. L. & Gladyshev, V. N. (2002) EMBO J. 21, 3681–3693.
39. Obata, T. & Shiraiwa, Y. (2005) J. Biol. Chem. 280, 18462–18468.
40. Taskov, T., Chapple, C., Kryukov, G. V., Castellano, S., Lobanov, A. V., Korotkov, K. V., Guigó, R. & Gladyshev, V. N. (2005) Nucleic Acids Res. 33, 2227–2238.

Castellano et al. PNAS | November 8, 2005 | vol. 102 | no. 45 | 16193