Biochimica et Biophysica Acta 1742 (2004) 179–183
http://www.elsevier.com/locate/bba

Review

The EF-Handome: combining comparative genomic study using FamDBtool, a new bioinformatics tool, and the network of expertise of the European Calcium Society

Jacques Haiech*, Saad B.M. Moulhaye, Marie-Claude Kilhoffer

UMR 7034 CNRS and IFR 85 Gilbert Laustriat, 74, route du Rhin, 67401 ILLKIRCH France

Received 9 August 2004; received in revised form 1 October 2004; accepted 4 October 2004
Available online 20 October 2004

Abstract

By combining a bioinformatics tool (FamDBtool) with the expertise of a network of calcium-binding specialists (the European Calcium Society), we aim to accelerate and rationalize the curation of general public biological databases such as Swissprot with respect to specific protein families. In this paper, we show the feasibility of this approach by assembling and curating the human, mouse and rat sets of EF-Hand proteins, the EF-Handome.

© 2004 Elsevier B.V. All rights reserved.

Keywords: EF-Handome; FamDBtool; European Calcium Society

1. Introduction

Calcium is one of the most important second messengers in eukaryotic cells, and probably also in prokaryotic cells [1,2]. Upon cellular stimulation, the cytosolic calcium concentration undergoes either a transient increase or a set of calcium oscillations. To decipher such calcium signals, the cell possesses calcium-binding proteins that are able to detect either a transient increase in the amplitude of the calcium concentration or a frequency-type signal [3].

EF-hand proteins represent the most abundant family of such eukaryotic calcium-binding proteins [4,5]. These proteins evolved from an ancestral domain of 36 amino acids composed of two alpha-helical domains and one calcium-binding loop. More than 200 genes coding for proteins containing at least one EF-hand domain are found in the human genome. Most of these genes have orthologs in mouse and rat. The proteins encoded by these genes are divided into several subfamilies based on sequence similarities.

One of the largest subfamilies is constituted by the S100 proteins (see http://www.unizh.ch/~kispi/clinchem/calzium/index.html). Although several EF-hand databases exist (http://www.structbio.vanderbilt.edu/cabp_database/), it is difficult to obtain all the curated information in a single location. To address this, we have developed a new bioinformatics tool, FamDBtool, that allows us to create an individual database for a given protein family. This tool will be described elsewhere and can be obtained upon request. We have used this tool to build an EF-hand protein database (the EF-Handome) [6] that collects the human, mouse and rat genes. This database is used as a working tool among the members of the European Calcium Society (ECS), not only to allow a group of experts to help in the curation of the data from the public databases, but also to share information and resources. The database is accessible through the web server of the ECS (http://www.ulb.ac.be/assoc/ecs/) or upon request to haiech@pharma.u-strasbg.fr.

* Corresponding author. Tel.: +33 388 676889; fax: +33 388 664633. E-mail address: [email protected] (J. Haiech).

0167-4889/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.bbamcr.2004.10.001

2. Building the database

The EF-Hand database is built from a list of Swissprot/TREMBL or RefSeq identity numbers (Id list). The information needed to fill the different tables and fields of the database is then obtained from the main public databases. The information collected for each gene is presented in Fig. 1.

The form is divided into four parts: (i) general information (locusId from the NCBI Entrez Gene database, orthologous genes in mouse and rat, and paralogous genes), (ii) gene information (contig description using the information from the UCSC Genome Bioinformatics Site, http://www.genome.ucsc.edu/index.html?org=Human&db=hg17&hgsid=34709571), (iii) mRNA information from the RefSeq database and (iv) protein information from the UNIPROT database. All information displayed in blue in the gene form is hyperlinked and gives access to the primary database. Several tools have been implemented to curate the database and to check the coherency of the data.

The database is then expanded by searching for genes that present similarities to the starting set of genes. The user prevents the integration into the database of new genes that are not biologically pertinent. The database may therefore be steadily enlarged under user supervision.

The entire database or some specific genes are updated upon user request. A complete description of the database is given in the Master Thesis (FamDBTool, S. Moulhaye), which can be obtained upon request (haiech@pharma.u-strasbg.fr).

Fig. 1. Gene form for the S100a3 human gene.

3. The EF-Hand database

The EF-Hand database contains 230 human genes, 180 mouse genes and 74 rat genes, using the assemblies hg17, mm5 and rn3 from the UCSC center. These differences are correlated with the level of completion and annotation of the corresponding genomes.

Table 1 summarizes the number of human EF-Hand genes by chromosome. Three chromosomes present a higher number of EF-Hand genes than the others, namely chromosomes 1, 2 and 15.

Some of the genes are located close together. We consider genes to form a cluster when the distance between them is less than 100,000 base pairs. Eight putative clusters were found. Table 2 presents the information on these clusters.

A putative cluster was considered a real cluster when such a cluster also exists for the mouse orthologous genes. One cluster does not fulfill this rule (RYR1-ACTN4 on chromosome 19). The most important cluster is on chromosome 1, and all genes concerned are part of the S100 family. The family of human genes coding for proteins with an S100-like domain is composed of 29 genes; 23 of those genes are located on chromosome 1. An excellent review has been published recently on this family [7] (Table 3).

Fig. 2 represents the localization of these genes on chromosome 1. In fact, this cluster is composed of two subclusters:

- the S100A9 cluster, from the S100A1 to the S100A9 genes, located on dog chromosome 7,
- the S100A10 cluster, from the loricrin gene to S100A10, located on dog chromosome 17.

The S100A10 cluster contains the small proline-rich protein clusters involved in the keratinisation and cornification of the epidermis [8,9]. The genes in this cluster (with the exception of S100A10 and S100A11) code for large proteins with repeated domains that carry the S100-specific EF-hand motif at the N-terminus. Whether the S100 domains of those proteins should be considered members of the S100 family is a matter of discussion.

The gap between the two clusters is smaller in human than in mouse and rat. A rat S100 gene located in this gap has been cloned and has no ortholog among the human or mouse genes (S100 ventral prostate, NM_176076).

Fig. 3 compares these two clusters in human and mouse.

Table 1
EF-Hand human genes found on the different human chromosomes (Assembly Build 35 from NCBI)

Chromosome    Chromosome size (Mb)    Gene number    EF-Hand genes
1             246                     2544           42
2             243                     1772           19
3             199                     1406           11
4             191                     1036           9
5             181                     1233           9
6             170                     1247           10
7             158                     1383           12
8             146                     942            4
9             136                     1100           5
10            135                     1003           4
11            134                     1692           9
12            132                     1278           6
13            113                     506            1
14            105                     1168           5
15            100                     895            13
16            90                      1107           8
17            81                      1421           6
18            76                      396            3
19            63                      1621           12
20            63                      724            6
21            46                      355            2
22            49                      707            3
X             153                     1149           7
Y             50                      251            1

Table 2
EF-Hand human genes separated by less than 100,000 base pairs

Name of gene  LocusID   Contig                      Number of amino acids
S100A10       6281      chr1:148767467–148780369    96
S100A11       6282      chr1:148817055–148823584    105
THH           7062      chr1:148892937–148899629    1898
REPETIN       126638    chr1:148940296–148943438    784
HORNERIN      0         chr1:148998628–149008802    2850
FILAGGRIN     2312      chr1:149098662–149101005    591
FLJ22843      388698    chr1:149142696–149144433    213
C1orf10       49860     chr1:149193792–149200801    495
S100A9        6280      chr1:150142403–150147572    114
S100A12       6283      chr1:150158257–150162148    91
S100A8        6279      chr1:150174582–150177622    93
S100A7L1      338324    chr1:150201073–150209776    100
S100A7L2      127482    chr1:150222610–150223878    101
S100A7        6278      chr1:150242294–150247210    100
S100A6        6277      chr1:150319149–150322576    90
S100A5        6276      chr1:150321696–150328314    92
S100A4        6275      chr1:150328171–150332355    101
S100A3        6274      chr1:150331882–150335807    101
S100A2        6273      chr1:150345660–150352379    97
S100A16       140576    chr1:150391440–150399587    103
S100A14       57402     chr1:150398807–150402863    104
S100A13       6284      chr1:150404447–150412021    98
S100A1        6271      chr1:150412946–150418586    93
GUCA1A        2978      chr6:42230152–42256770      200
GUCA1B        2979      chr6:42259531–42271536      200
CALML5        51806     chr10:5529661–5532510       146
CALML3        810       chr10:5555924–5559225       148
CABP4         57010     chr11:66978394–66984261     275
CABP2         51475     chr11:67041994–67048443     219
DUOX1         53905     chr15:43137248–43174830     1551
DUOX2         50506     chr15:43171145–43194651     1548
FLJ20481      54947     chr16:54099455–54175650     427
CAPNS2        84290     chr16:54157085–54160093     248
MRCL3         10627     chr18:3236528–3247233       170
MRLC2         103910    chr18:3251135–3269179       172
RYR1          6261      chr19:43615203–43771042     5038
ACTN4         81        chr19:43829167–43914010     911
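The 100,000 base pair rule used to define putative clusters can be illustrated with a minimal Python sketch. It is not the FamDBtool implementation, which is not described here: the names `Gene`, `parse_contig` and `find_clusters` are hypothetical, and the sketch assumes that the distance between two genes is measured from the end of one gene to the start of the next gene on the same chromosome. The validation against mouse orthologs that turns a putative cluster into a real one is not covered.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def parse_contig(name: str, contig: str) -> Gene:
    """Parse a 'chrN:start-end' string; thousands separators and en dashes are tolerated."""
    chrom, span = contig.strip().split(":")
    start, end = span.replace("\u2013", "-").replace(",", "").split("-")
    return Gene(name, chrom, int(start), int(end))


def find_clusters(genes: List[Gene], max_gap: int = 100_000) -> List[List[Gene]]:
    """Return groups of two or more genes on one chromosome in which each gene
    starts less than max_gap bases after the end of the genes before it
    (assumed definition of 'distance'; overlapping genes count as adjacent)."""
    clusters: List[List[Gene]] = []
    current: List[Gene] = []
    current_end = 0
    for gene in sorted(genes, key=lambda g: (g.chrom, g.start)):
        same_chrom = bool(current) and gene.chrom == current[0].chrom
        if same_chrom and gene.start - current_end < max_gap:
            current.append(gene)
            current_end = max(current_end, gene.end)
        else:
            # Close the previous group; single genes are not clusters.
            if len(current) > 1:
                clusters.append(current)
            current = [gene]
            current_end = gene.end
    if len(current) > 1:
        clusters.append(current)
    return clusters


# A few rows copied from Table 2 (gene name, contig).
ROWS = [
    ("C1orf10", "chr1:149193792-149200801"),
    ("S100A9", "chr1:150142403-150147572"),
    ("S100A12", "chr1:150158257-150162148"),
    ("S100A8", "chr1:150174582-150177622"),
    ("GUCA1A", "chr6:42230152-42256770"),
    ("GUCA1B", "chr6:42259531-42271536"),
    ("RYR1", "chr19:43,615,203-43,771,042"),
    ("ACTN4", "chr19:43829167-43914010"),
]
GENES = [parse_contig(name, contig) for name, contig in ROWS]

for cluster in find_clusters(GENES):
    print(cluster[0].chrom, [g.name for g in cluster])
```

With these eight rows, the sketch groups S100A9, S100A12 and S100A8 together, pairs RYR1 with ACTN4 and GUCA1A with GUCA1B, and leaves C1orf10 on its own, because it lies more than 900,000 base pairs upstream of S100A9. Tracking the furthest end seen so far, rather than the end of the previous gene only, keeps a gene nested inside a longer one from breaking a group.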

In the S100A9 cluster, the human S100A7 gene family is composed of at least four members (S100A7, S100A7L3, S100A7L2 and S100A7L1, also named S100A15) [10]. From Fig. 3, the S100A9 gene cluster appears to be arranged into four groups:

- the S100A9, A12 and A8 group,
- the S100A7 group,
- the S100A3, A4, A5, A6 group. It is possible that the human S100A2 belongs to this group, although we did not find an orthologous gene in the mouse or rat genome,
- the S100A1, S100A13, A14, A16 group [11].

We may hypothesize that the genes belonging to a specific group present a coordinated expression profile. We have checked this point by the use of the GEO database (http://www.ncbi.nlm.nih.gov/geo/). The Gene Expression Omnibus is a high-throughput gene expression repository [12].

The S100A7 subcluster does not seem to exist in the mouse genome. Either we are missing a 100-kb piece of the mouse genome in this region, or this cluster has been eliminated from the mouse genome.

Table 3
Human proteins that contain a S100 domain

Name of gene  LocusID   Contig
S100A10       6281      chr1:148767467–148780369
S100A11       6282      chr1:148817055–148823584
THH           7062      chr1:148892937–148899629
repetin       126638    chr1:148940296–148943438
HORNERIN      388697    chr1:148998628–149008829
FILAGGRIN     2312      chr1:149098662–149101005
FLJ22843      388698    chr1:149142696–149144433
C1orf10       49860     chr1:149193792–149200801
S100A9        6280      chr1:150142403–150147572
S100A12       6283      chr1:150158257–150162148
S100A8        6279      chr1:150174582–150177622
S100A7L1      338324    chr1:150201073–150209776
S100A7L2      127482    chr1:150222610–150223878
S100A7        6278      chr1:150242294–150247210
S100A6        6277      chr1:150319149–150322576
S100A5        6276      chr1:150321696–150328314
S100A4        6275      chr1:150328171–150332355
S100A3        6274      chr1:150331882–150335807
S100A2        6273      chr1:150345660–150352379
S100A16       140576    chr1:150391440–150399587
S100A14       57402     chr1:150398807–150402863
S100A13       6284      chr1:150404447–150412021
S100A1        6271      chr1:150412946–150418586
S100P         6286      chr4:6812638–6817969
S100Z         170591    chr5:76180582–76210800
S100A11L      30013     chr7:102496398–102496703
S100A11Y      347701    chr7:34982076–34982387
S100B         6285      chr21:46841959–46850424
CALB3         795       chrX:16426938–16433448

4. Conclusions

Today, we are overwhelmed by an avalanche of biological data. Those data are often noisy and, therefore, difficult to analyze. It is of paramount importance to curate them. Whereas it is difficult to do this cleaning on a general basis, it has been shown that a network of experts may ease this curation when focusing on a specific class of proteins.

Fig. 2. Localization of the S100A9 and S100A10 gene clusters on human chromosome 1, mouse chromosome 3, rat chromosome 2 and dog chromosome 17 and 7.

Fig. 3. Localization on chromosome 1 of the human S100 genes dispatched into two clusters (S100A10 and S100A9 clusters) and comparison with the localization of the mouse orthologous genes.

By creating a simple bioinformatics tool that makes it possible to focus on a family of genes (in our case the calcium-binding proteins) and by using this tool to synergize the expertise of the specialists in a given scientific field (in our case the members of the European Calcium Society), we believe that we will help to curate the more general public databases.

Acknowledgements

This work was supported by project 42 of the "réseau GenHomme". We acknowledge the ECS board for fruitful scientific discussions and helpful comments.

References

[1] M.J. Berridge, M.D. Bootman, H.L. Roderick, Calcium signalling: dynamics, homeostasis and remodelling, Nat. Rev., Mol. Cell Biol. 4 (2003) 517–529.
[2] M.L. Herbaud, A. Guiseppi, F. Denizot, J. Haiech, M.C. Kilhoffer, Calcium signalling in Bacillus subtilis, Biochim. Biophys. Acta 1448 (1998) 212–226.
[3] M.C. Kilhoffer, J. Haiech, J.G. Demaille, Ion binding to calmodulin. A comparison with other intracellular calcium-binding proteins, Mol. Cell. Biochem. 51 (1983) 33–54.
[4] H. Kawasaki, S. Nakayama, R.H. Kretsinger, Classification and evolution of EF-hand proteins, BioMetals 11 (1998) 277–295.
[5] M. Goodman, J.F. Pechere, J. Haiech, J.G. Demaille, Evolutionary diversification of structure and function in the family of intracellular calcium-binding proteins, J. Mol. Evol. 13 (1979) 331–352.
[6] G. Manning, D.B. Whyte, R. Martinez, T. Hunter, S. Sudarsanam, The protein kinase complement of the human genome, Science 298 (2002) 1912–1934.
[7] I. Marenholz, C.W. Heizmann, G. Fritz, S100 proteins in mouse and man: from evolution to function and pathology, BBRC (2004) 204.
[8] S. Patel, T. Kartasova, J.A. Segre, Mouse Sprr locus: a tandem array of coordinately regulated genes, Mamm. Genome 14 (2003) 140–148.
[9] I. Marenholz, M. Zirra, D.F. Fischer, C. Backendorf, A. Ziegler, D. Mischke, Identification of Human Epidermal Differentiation Complex (EDC)-encoded genes by subtractive hybridization of entire YACs to a gridded keratinocyte cDNA library, Genome Res. 11 (2001) 341–355.
[10] J.K. Kulski, C.P. Lim, D.S. Dunn, M. Bellgard, Genomic and phylogenetic analysis of the S100A7 (Psoriasin) gene duplications within the region of the S100 gene cluster on human chromosome 1q21, J. Mol. Evol. 56 (2003) 397–406.
[11] I. Marenholz, C.W. Heizmann, S100A16, a ubiquitously expressed EF-hand protein which is up-regulated in tumors, Biochem. Biophys. Res. Commun. 313 (2004) 237–244.
[12] R. Edgar, M. Domrachev, A.E. Lash, Gene expression omnibus: NCBI gene expression and hybridization array data repository, Nucleic Acids Res. 30 (2002) 207–210.