The EF-Handome: Combining Comparative Genomic Study Using FamDBtool, a New Bioinformatics Tool, and the Network of Expertise of the European Calcium Society
Biochimica et Biophysica Acta 1742 (2004) 179–183
http://www.elsevier.com/locate/bba

Review

Jacques Haiech*, Saad B.M. Moulhaye, Marie-Claude Kilhoffer

UMR 7034 CNRS and IFR 85 Gilbert Laustriat, 74, route du Rhin, 67401 ILLKIRCH France

Received 9 August 2004; received in revised form 1 October 2004; accepted 4 October 2004
Available online 20 October 2004

* Corresponding author. Tel.: +33 388 676889; fax: +33 388 664633.
E-mail address: [email protected] (J. Haiech).

0167-4889/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.bbamcr.2004.10.001

Abstract

By combining a bioinformatics tool (FamDBtool) and the expertise of a network of specialists in calcium-binding proteins (European Calcium Society), we aim to accelerate and to rationalize the curation of general public biological databases such as Swissprot and Entrez Gene with respect to specific protein families. In this paper, we show the feasibility of this approach for setting up and curating the human, mouse and rat sets of EF-Hand genes, the EF-Handome.
© 2004 Elsevier B.V. All rights reserved.

Keywords: EF-Handome; FamDBtool; European Calcium Society

1. Introduction

Calcium is one of the most important second messengers in eukaryotic cells, and probably also in prokaryotic cells [1,2]. Upon cellular stimulation, the cytosolic calcium concentration undergoes either a transient increase or a set of calcium oscillations. In order to decipher such calcium signals, the cell possesses calcium-binding proteins that are able to detect either a transient increase in the calcium concentration amplitude or a frequency-type signal [3].

EF-hand proteins represent the most abundant family of such eukaryotic calcium-binding proteins [4,5]. These proteins evolved from an ancestral domain of 36 amino acids composed of two alpha-helical domains and one calcium-binding loop. More than 200 genes coding for proteins containing at least one EF-hand domain are found in the human genome. Most of these genes have orthologs in mouse and rat. The proteins coded by these genes are divided into several subfamilies based on sequence similarities.

One of the largest subfamilies consists of the S100 proteins (see http://www.unizh.ch/~kispi/clinchem/calzium/index.html).

Although several EF-hand databases exist (http://www.structbio.vanderbilt.edu/cabp_database/), it is difficult to obtain all the curated information in a single location. To meet this need, we have developed a new bioinformatics tool, FamDBtool, that allows us to create an individual database for a given protein family. This tool will be described elsewhere and can be obtained upon request. We have used this tool to build an EF-hand protein database (the EF-Handome) [6] that collects the human, mouse and rat genes. This database is used as a working tool among the members of the European Calcium Society (ECS), not only to allow a group of experts to help in the curation of the data from the public databases, but also to share information and resources. The database is accessible through the web server of the ECS (http://www.ulb.ac.be/assoc/ecs/) or upon request to haiech@pharma.u-strasbg.fr.

2. Building the database

The EF-Hand database is built from a list of Swissprot/TREMBL or RefSeq identity numbers (Id list). Then, the information needed to fill the different tables and fields of the database is obtained from the main public databases. The collected information for each gene is presented in Fig. 1.

The form is divided into four parts, namely (i) general information (locusId from the NCBI Entrez Gene database, orthologous genes in mouse and rat, and paralogous genes), (ii) gene information (contig description using the information from the UCSC Genome Bioinformatics Site, http://www.genome.ucsc.edu/index.html?org=Human&db=hg17&hgsid=34709571), (iii) mRNA information from the RefSeq database and (iv) protein information from the UNIPROT database. All information displayed in blue in the gene form is hyperlinked and gives access to the primary database.

Fig. 1. Gene form for the S100a3 human gene.

Several tools have been implemented in order to curate the database and to check the coherency of the data.

Then the database is expanded by searching for genes that present similarities to the starting set of genes. The user prevents the integration into the database of new genes that are not biologically pertinent. Therefore, the database may be steadily enlarged under user supervision.

The entire database or some specific genes are updated upon user request. A complete description of the database is given in the Master's thesis (FamDBTool, S. Moulhaye), which can be obtained upon request (haiech@pharma.u-strasbg.fr).

3. The EF-Hand database

The EF-Hand database contains 230 human genes, 180 mouse genes and 74 rat genes, using the assemblies hg17, mm5 and rn3 from the UCSC center. These differences are correlated with the level of completion and annotation of the corresponding genomes.

Table 1 summarizes the number of human EF-Hand genes by chromosome.

Three chromosomes present a higher number of EF-Hand genes than the others, namely chromosomes 1, 2 and 15.

Table 1
EF-Hand human genes found on the different human chromosomes (Assembly Build 35 from NCBI)

Chromosome   Chromosome size (Mb)   Gene number   EF-Hand genes
1            246                    2544          42
2            243                    1772          19
3            199                    1406          11
4            191                    1036          9
5            181                    1233          9
6            170                    1247          10
7            158                    1383          12
8            146                    942           4
9            136                    1100          5
10           135                    1003          4
11           134                    1692          9
12           132                    1278          6
13           113                    506           1
14           105                    1168          5
15           100                    895           13
16           90                     1107          8
17           81                     1421          6
18           76                     396           3
19           63                     1621          12
20           63                     724           6
21           46                     355           2
22           49                     707           3
X            153                    1149          7
Y            50                     251           1

Some of the genes are located close together. We consider genes to form a cluster when the distance between them is less than 100,000 base pairs. Eight putative clusters were found. Table 2 presents the information on these clusters.

Table 2
EF-Hand human genes separated by less than 100,000 base pairs

Name of gene   LocusID   Contig                        Number of amino acids
S100A10        6281      chr1:148767467–148780369      96
S100A11        6282      chr1:148817055–148823584      105
THH            7062      chr1:148892937–148899629      1898
REPETIN        126638    chr1:148940296–148943438      784
HORNERIN       0         chr1:148998628–149008802      2850
FILAGGRIN      2312      chr1:149098662–149101005      591
FLJ22843       388698    chr1:149142696–149144433      213
C1orf10        49860     chr1:149193792–149200801      495
S100A9         6280      chr1:150142403–150147572      114
S100A12        6283      chr1:150158257–150162148      91
S100A8         6279      chr1:150174582–150177622      93
S100A7L1       338324    chr1:150201073–150209776      100
S100A7L2       127482    chr1:150222610–150223878      101
S100A7         6278      chr1:150242294–150247210      100
S100A6         6277      chr1:150319149–150322576      90
S100A5         6276      chr1:150321696–150328314      92
S100A4         6275      chr1:150328171–150332355      101
S100A3         6274      chr1:150331882–150335807      101
S100A2         6273      chr1:150345660–150352379      97
S100A16        140576    chr1:150391440–150399587      103
S100A14        57402     chr1:150398807–150402863      104
S100A13        6284      chr1:150404447–150412021      98
S100A1         6271      chr1:150412946–150418586      93
GUCA1A         2978      chr6:42230152–42256770        200
GUCA1B         2979      chr6:42259531–42271536        200
CALML5         51806     chr10:5529661–5532510         146
CALML3         810       chr10:5555924–5559225         148
CABP4          57010     chr11:66978394–66984261       275
CABP2          51475     chr11:67041994–67048443       219
DUOX1          53905     chr15:43137248–43174830       1551
DUOX2          50506     chr15:43171145–43194651       1548
FLJ20481       54947     chr16:54099455–54175650       427
CAPNS2         84290     chr16:54157085–54160093       248
MRCL3          10627     chr18:3236528–3247233         170
MRLC2          103910    chr18:3251135–3269179         172
RYR1           6261      chr19:43615203–43771042       5038
ACTN4          81        chr19:43829167–43914010       911

A putative cluster was considered a real cluster when the same cluster also exists among the mouse orthologous genes. One cluster does not fulfill this rule (RYR1-ACTN4 on chromosome 19). The most important cluster is on chromosome 1, and all the genes concerned are part of the S100 family. The family of human genes coding for proteins with an S100-like domain is composed of 29 genes; 23 of those genes are located on chromosome 1. An excellent review has been published recently on this family [7] (Table 3).

Fig. 2 represents the localization of these genes on chromosome 1.

In fact, this cluster is composed of two subclusters:

• The S100A9 cluster, from the S100A1 to the S100A9 genes, on dog chromosome 7,
• The S100A10 cluster, starting from the loricrin gene to S100A10, localized on dog chromosome 17.

The S100A10 cluster contains the small proline-rich protein clusters involved in the keratinisation and cornification of the epidermis [8,9]. The genes in this cluster (with the exception of S100A10 and S100A11) code for large proteins with repeated domains carrying the S100-specific EF-hand motif at the N-terminus. Whether the S100 domains of those proteins should be considered members of the S100 family is a matter of discussion.

The gap between the two clusters is smaller in human than in mouse and rat. A rat S100 gene localized in this gap has been cloned and has no ortholog among the human or mouse genes (S100 ventral prostate, NM_176076).

Fig. 3 compares these two clusters in human and mouse.
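The clustering rule used in Section 3 (genes form a cluster when they are less than 100,000 base pairs apart) can be illustrated with a short Python sketch. This is not the FamDBtool implementation, which is not described in this paper. The function names, the choice to measure distance as the gap between the end of one gene and the start of the next on the same chromosome, and the chaining of neighbouring genes into one group are all assumptions made for illustration. The sample rows are copied from Table 2; CABP4 is deliberately listed without its neighbour CABP2 to show that a lone gene is not reported.

```python
import re
from collections import defaultdict

MAX_GAP_BP = 100_000  # threshold stated in the text

# A few rows copied from Table 2: (gene name, contig string).
TABLE2_SAMPLE = [
    ("GUCA1A", "chr6:42230152–42256770"),
    ("GUCA1B", "chr6:42259531–42271536"),
    ("CALML5", "chr10:5529661–5532510"),
    ("CALML3", "chr10:5555924–5559225"),
    ("DUOX1", "chr15:43137248–43174830"),
    ("DUOX2", "chr15:43171145–43194651"),
    ("CABP4", "chr11:66978394–66984261"),
]

# Accepts an en dash or a hyphen between coordinates, and optional
# thousands separators.
CONTIG_RE = re.compile(r"^(chr\w+):([\d,]+)[–-]([\d,]+)$")


def parse_contig(contig):
    """Split 'chr6:42230152–42256770' into ('chr6', 42230152, 42256770)."""
    match = CONTIG_RE.match(contig.strip())
    if match is None:
        raise ValueError(f"unrecognized contig: {contig!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def find_clusters(rows, max_gap=MAX_GAP_BP):
    """Group genes on the same chromosome whose gap is below max_gap.

    Assumption (hypothetical rule, not taken from the paper): the gap is
    the start of a gene minus the furthest end seen so far in the current
    group, so overlapping genes always fall in the same group.
    """
    by_chrom = defaultdict(list)
    for name, contig in rows:
        chrom, start, end = parse_contig(contig)
        by_chrom[chrom].append((start, end, name))

    clusters = []
    for chrom, genes in by_chrom.items():
        genes.sort()  # order genes by start coordinate
        group, group_end = [], None
        for start, end, name in genes:
            if group and start - group_end < max_gap:
                group.append(name)
                group_end = max(group_end, end)
            else:
                if len(group) > 1:  # keep only groups of two or more genes
                    clusters.append((chrom, group))
                group, group_end = [name], end
        if len(group) > 1:
            clusters.append((chrom, group))
    return clusters


if __name__ == "__main__":
    for chrom, names in find_clusters(TABLE2_SAMPLE):
        print(chrom, ", ".join(names))
```

A second pass over the mouse orthologs, not shown here, would be needed to apply the paper's criterion for promoting a putative cluster to a real one.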