A Database for the Analysis of Small Heat Shock Proteins Emmanuel Jaspard, Gilles Hunault
Total Page:16
File Type:pdf, Size:1020Kb
sHSPdb: a database for the analysis of small Heat Shock Proteins Emmanuel Jaspard, Gilles Hunault To cite this version: Emmanuel Jaspard, Gilles Hunault. sHSPdb: a database for the analysis of small Heat Shock Proteins. BMC Plant Biology, BioMed Central, 2016, 16, pp.1-11. 10.1186/s12870-016-0820-6. hal-01602002 HAL Id: hal-01602002 https://hal.archives-ouvertes.fr/hal-01602002 Submitted on 27 May 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Jaspard and Hunault BMC Plant Biology (2016) 16:135 DOI 10.1186/s12870-016-0820-6 DATABASE Open Access sHSPdb: a database for the analysis of small Heat Shock Proteins Emmanuel Jaspard1,2,3* and Gilles Hunault4 Abstract Background: small Heat Shock Proteins (sHSP) is a wide proteins family. SHSP are found in all kingdoms and they play critical roles in plant stress tolerance mechanisms (as well as in pathogenic microorganisms and are implicated in human diseases). Results: sHSPdb (small Heat Shock Proteins database) is an integrated resource containing non-redundant, full-length and curated sequences of sHSP, classified on the basis of amino acids motifs and physico-chemical properties. sHSPdb gathers data about sHSP defined by various databases (Uniprot, PFAM, CDD, InterPro). It provides a browser interface for retrieving information from the whole database and a search interface using various criteria for retrieving a refined subset of entries. Physicochemical properties, amino acid composition and combinations are calculated for each entry. sHSPdb provides automatic statistical analysis of all sHSP properties. Among various possibilities, sHSPdb allows BLAST searches, alignment of selected sequences and submission of sequences. Conclusions: sHSPdb is a new database containing information about sHSP from all kingdoms. sHSPdb provides a classification of sHSP, as well as tools and data for the analysis of the structure - function relationships of sHSP. Data are mainly related to various physico-chemical properties of the amino acids sequences of sHSP. sHSPdb is accessible at http://forge.info.univ-angers.fr/~gh/Shspdb/index.php. Keywords: Small heat shock proteins, Amino acids sequence, Physico-chemical properties Background dimmers [6–8]. A study of all sHSP merged have shown Heat Shock Proteins (HSP) are proteins whose expres- that the average length of ACD is 90 ± 10 residues [9] sion is increased when cells are exposed to elevated and this length is much more constant than the N- and temperatures or other stress. Among HSP, sHSP belong C-terminal domains. However, we have classified sHSP to the superfamily of protein chaperones. They coun- into 21 classes and this value of average length of ACD teract the irreversible aggregation of misfolded proteins. does not apply anymore when considering sHSP class by sHSP are ubiquitous proteins found in all living organ- class. The N-terminal part of ACD is likely not necessary isms, from bacteria to mammals and especially in for dimerization or chaperone activity, but seems required plants [1–4]. Moreover, sHSP display a variety of sub- for higher order aggregates formation [10]. cellular localization and/or tissue distribution [3, 5]. As members of the chaperone network, sHSP play sHSP contain : (i) a highly variable (in both length important roles in cells exposed to heat and other and in sequence) N-terminal domain; (ii) a very con- stress, but their functional molecular mechanisms are served central domain called the Alpha Crystallin not fully elucidated. sHSP monomers (12–42 kDa) usu- Domain (ACD) involved in dimerization leading to the ally associate into large homo-oligomers of mostly 24 building block of higher order sHSP structure; (iii) a subunits [9, 11]. sHSP quaternary structures have a short C-terminal domain that links monomers inside high degree of plasticity due to changes in their oligo- meric state under different cellular conditions or to * Correspondence: [email protected] exchange of their subunits. Loosening of subunit 1 Université d’Angers, UMR 1345 IRHS, SFR 4207 QUASAV, Angers, France organization leads to more dynamic properties thus en- 2INRA, UMR 1345 IRHS, Beaucouzé, France Full list of author information is available at the end of the article hancing the available chaperone sites for the client © 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Jaspard and Hunault BMC Plant Biology (2016) 16:135 Page 2 of 11 proteins. For example, it was shown that the sequence sHSP issue. The pertinence of the files annotation and that RLFDQ (found at the N-terminal part of ACD of of the sequences were checked using: (i) the sHSPdb « HSP27 - class number 12) contributes to the higher Search » option (see below); (ii) BLAST homology analysis order assembly of their subunits as well as their struc- and multiple alignments software; (iii) MOCAR (http:// tural stability [12]. Structural studies of numerous sHSP forge.info.univ-angers.fr/~gh/wstat/Mocar/), a program have brought details in monomer, transition of the olig- developed specifically for our sHSP classification based on omeric state [13–15] and sHSP chaperone mechanism pattern motif recognition in any Fasta sequences dataset; [12, 16]. (iv) the « Analyze »andthe«Statistical analysis »tools However, such approaches do not allow the compari- implemented in sHSPdb; (v) Finally, the motifs deter- sonofmorethanafewsHSPatatime.Onthecon- mined for each class of sHSP were further used to per- trary, sHSPdb offers access to large-scale analysis of form PHI-Blast allowing to get additional sequences. sHSP. Moreover, since sHSP is a very large protein fam- Sequence conflict (in particular in the case of AGI ily there is a wide discrepancy in annotation, definition entries for Arabidopsis thaliana) was checked using an- and terminology: sHSPdb provides a classification of notations from the Uniprot files. Identical sequences sHSP based on selective motifs, physicochemical prop- were removed using a program developed for sHSPdb erties and previous classification [17–19]. sHSPdb has feeding. This allowed detecting truncated, ambiguous been conceived with multiple functions to search and (e.g., annotated « unnamed protein », « hypothetical », describe the different sHSP. sHSPdb provides a global « unknown product »…, although some few populated vision of sHSP physicochemical properties, amino acid sHSP classes may contain such annotated files) or false usage, literature as well as statistical analysis on any sHSP sHSP sequences. Sequences containing the undefined dataset built by the user. Such a resource will be invalu- amino acid symbol (X) were removed. All files are able for computational analyses of sHSP structure - func- stored in sHSPdb but only non-redundant, full-length tion relationships, especially regarding the specific roles of and annotated ones are accessible. each of the three domains. sHSPdb is useful to gather The two holding chaperones (holdases) families, i.e., information about sHSPs from all kingdom. sHSPdb HSP31 and HSP33, cannot be considered formally as provides the largest set of sHSP sequences, a new clas- sHSP since they do not harbor the ACD domain, and sification and a precise description of each class of sHSP. interact with client proteins through Cys and disulfide Physico-chemical properties and statistical analysis pro- bonds formation [22]. However, these redox regulated vided by sHSPdb represent an actual plus-value. molecular chaperones have physico-chemical properties close to those of sHSP and also contribute to protein Construction and content homeostasis under stress conditions. Moreover, they are Data sources and characteristics of the dataset redox regulated molecular chaperones: they protect both To fill sHSPdb, we used a two-stage process. The first thermally unfolded and oxidatively damaged proteins step filled automatically the fields of the tables of the from irreversible aggregation and play an important role database using PHP and perl scripts, getting the infor- in the bacterial defense system toward oxidative stress. mation from text and XML files from the following da- Therefore, the two classes HSP31 and HSP33 were in- tabases: NCBI/Proteins, NCBI/CDD, NCBI/Taxonomy, corporated into sHSPdb. EBI/Picr, UNIPROT, Interpro, SANGER/Pfam, AMIGO. The main goal of sHSPdb is the analysis of the Starting with the NCBI accession number, we retrieved structure-function relationships of sHSP. This is why the « GenPept » file to derive the GI number, the even- only amino acids sequences