Identification and Analysis of the Mouse Basic/Helix-Loop-Helix
中国科技论文在线 http://www.paper.edu.cn

BBRC Biochemical and Biophysical Research Communications 350 (2006) 648–656
www.elsevier.com/locate/ybbrc

Identification and analysis of the mouse basic/Helix-Loop-Helix transcription factor family

Jing Li a,b, Qi Liu b, Mengsheng Qiu c, Yuchun Pan a,*, Yixue Li d,*, Tieliu Shi d,e,*

a School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 201101, China
b School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
c Department of Anatomical Sciences and Neurobiology, School of Medicine, University of Louisville, Louisville, KY 40292, USA
d Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
e Bioinformation Center, Shanghai University, Shanghai 200444, China

Received 23 August 2006
Available online 29 September 2006

Abstract

The basic/Helix-Loop-Helix (bHLH) proteins are a family of transcription factors that regulate a variety of biological processes. Based on a previously defined consensus motif, we identified the complete set of bHLH proteins in the mouse proteome databases and carried out a series of bioinformatics analyses. As a result, 124 mouse bHLH proteins were identified in this study, 28 of which were additional bHLH proteins beyond the previous report. These 124 mouse bHLH proteins were classified into groups A to F by nomenclature and phylogenetic analysis. Statistical analysis of the Gene Ontology annotation of these proteins showed that the bHLH proteins tend to perform functions related to cell differentiation and development. Gene function enrichment analysis among the six groups showed that the proteins in a given group tend to have particular biological functions, so that the molecular function of the uncharacterized proteins in each group could be inferred.

© 2006 Elsevier Inc. All rights reserved.
Keywords: Basic/Helix-Loop-Helix; Transcription factor; Predictive motif; Dynamic programming; Mouse

The interactions among transcription factors (TFs) that bind to cis-regulatory elements in DNA and additional co-factors are crucial for transcription, which is the initial step of gene expression [1]. Identifying the extent of transcription factors or transcription factor families on a genome-wide basis is an important step toward understanding the gene regulatory network. The basic/Helix-Loop-Helix (bHLH) proteins are a key TF family that regulates a variety of biological processes such as neurogenesis, myogenesis, cell proliferation, differentiation, and determination in eukaryotic organisms [2]. The bHLH proteins usually have two functionally distinct domains: the DNA-binding basic domain and the C-terminal HLH domain. The DNA-binding basic domain (15 amino acids) has a high number of basic residues; the C-terminal HLH domain (40 amino acids) is formed by two amphipathic α-helices connected by a loop of variable length [3]. Crystal structural studies have shown that the bHLH proteins dimerize via their HLH domains, adopt a scissors shape, and bind DNA via the basic domains [4,5]. Based on systematic phylogenetic analysis, the bHLH proteins in animals have been classified into six groups, A to F [2,6,7], according to their DNA-binding specificities and structural features.

Up to now, genome-wide prediction and evolutionary analysis of bHLH transcription factor families have been performed in Caenorhabditis elegans, Drosophila, yeast, and human by BLAST (Basic Local Alignment Search Tool) search [6,8]. A consensus predictive motif was applied to define 147 bHLH protein-encoding genes after a BLAST search in the Arabidopsis genome [9].

* Corresponding authors.
E-mail addresses: [email protected] (Y. Pan), [email protected] (Y. Li), [email protected] (T. Shi).
0006-291X/$ - see front matter © 2006 Elsevier Inc. All rights reserved.
doi:10.1016/j.bbrc.2006.09.114

Based on the pattern of sequence conservation, the predictive motif was established from the position association (pa) statistics of 242 bHLH domain sequences and includes 19 elements, of which only one is in the loop region and the others are in the basic and helix regions. This motif has been shown to identify bHLH domain-containing proteins accurately [9,10].

Before the completion of the mouse genome sequencing project [11], Ledent et al. [8] investigated the mouse bHLH protein family by BLASTP searches against the non-redundant database at the National Center for Biotechnology Information (NCBI) and discovered 102 mouse protein sequences containing a bHLH domain. The completion of the mouse genome sequencing project has now provided an opportunity to identify the complete set of mouse bHLH proteins from the mouse proteome by computational methods. In this study, we identified at least 124 bHLH proteins from the most recent mouse proteome database at NCBI. The existing nomenclature and phylogenetic analysis allowed us to classify those proteins into a few large groups. At the same time, we analyzed the enrichment of Gene Ontology hierarchy annotations among those distinct groups. The DNA-binding specificities and functional activities of the uncharacterized or novel bHLH proteins identified in our study could be inferred from the information on the group to which they belong.

Results

Identification of the mouse bHLH transcription factor family

In order to investigate the bHLH transcription factor family in the mouse genome exhaustively, we systematically identified bHLH proteins in the mouse proteome data in the same way as was used to define the Arabidopsis bHLH protein family [9]. As in that report [9], the 19-element predictive motif proposed by Atchley et al. [10] was used to define a bHLH protein in this research. Compared with the previous reports, the number of query sequences in our BLAST searches was increased from one to seven, because divergent animal bHLH proteins have been classified into six high-order groups named A, B, C, D, E, and F. Plant bHLHs were all included in group B [8]. The query sequence set was composed of one defined bHLH domain from each of groups A, C, D, E, and F, and two from group B. Two sequences were selected from group B because this group is more divergent: the similarity between its two query sequences was less than 25%, whereas the minimal similarity among members of each of the other groups was higher than 35%. Instead of manually counting matches and mismatches after multiple protein sequence alignments, a dynamic programming (DP) algorithm was employed here to search for the highest match score of a protein against the predictive motif. Each match in a trial alignment was given a score of 1, and each mismatch was given a score of 0.

We defined as a bHLH protein any mouse protein having more than 9 conserved amino acids among the 19 residues and more than 7 amino acids matching the 14 residues in the HLH region. This criterion was applied because of the conservation of the HLH domain. In our case, proteins with 9, 10, and 11 mismatches had an average of 6.7, 7.5, and 8 mismatches in the HLH region, respectively. In fact, among the mouse bHLH proteins identified by experiment, the degree of fit to the HLH region of the predictive motif ranges from zero mismatches for Myod1, Ascl1, and Hand1 to 7 mismatches for Ebf4 [12]. Proteins with more than 10 mismatches in the 19 conserved sites or more than 7 mismatches in the HLH region were not included in our analysis, because their mismatches in the conserved HLH region exceeded those of the most divergent of the defined mouse bHLH proteins.

According to the criterion defined above, 124 putative bHLH proteins were finally defined after redundancy was removed. The complete multiple sequence alignment of the bHLH domain of these 124 mouse proteins is shown in Fig. 1. The length of the loop region varied from three to seventeen. More than two-thirds of the putative bHLH proteins were conserved at more than 16 of the 19 sites. Nine mouse proteins matched the motif perfectly (with no mismatch), and about 98% of the putative bHLH proteins had fewer than 7 mismatches, although our cutoff allowed up to 10 mismatches. The bHLH domain sequences of some putative bHLH proteins are identical (Fig. 1), but BLAST showed that the full sequences of those proteins are different.

The important information on the 124 putative mouse bHLH proteins or genes was extracted from NCBI and is listed in Table 1, which includes the official gene symbol, GeneID (LocusLink), protein accession number, MGI (Mouse Genome Informatics) identifier, and chromosome location. When an alternatively spliced gene had more than one isoform, only one of them is shown in our table. From the chromosome locations, we found that the 124 mouse bHLH proteins are distributed throughout the whole mouse genome except chromosome Y, but the frequencies of their distribution on the chromosomes differ.

All putative mouse bHLH proteins previously reported by Ledent et al. [8] were identified in our results except for three invalid accession numbers (B43814, Q9H494, and BAA9469) and three redundant protein pairs. The gene names assigned by Ledent et al. [8] are also provided in Table 1. Therefore, 28 additional bHLH proteins or protein-encoding genes were found in this study. Twenty-one of the additional 28 mouse bHLH proteins have been reported in publications, or predicted, to have an HLH domain. For example, Bhlh4 was reported as a bHLH transcriptional regulator in pancreas and brain that marks the dimesencephalic boundary and is required for rod bipolar cell maturation [13,14]. Moreover, 10 of the additional members are hypothetical proteins, as labeled in Table 1. For instance, the product of A830053O21Rik gene is hypothetical protein
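As a rough illustration of the motif-matching step described in this section (a score of 1 per match, 0 per mismatch, with the best placement found by dynamic programming), the following minimal sketch may help. It is not the authors' implementation, which the text does not specify. The function names `best_match_score` and `passes_cutoff`, the representation of a motif element as a (minimum gap, maximum gap, allowed residues) triple, and the four-element toy motif are all assumptions made for illustration; the toy motif is not the 19-element motif of Atchley et al. [10]. Only the cutoffs (at most 10 mismatches overall and at most 7 in the HLH region) are taken from the text.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical motif element: (min_gap, max_gap, allowed residues).
# The gap is the number of residues skipped since the previous element;
# a range wider than one value models the variable-length loop.
Element = Tuple[int, int, FrozenSet[str]]


def best_match_score(seq: str, motif: List[Element]) -> int:
    """Return the largest number of motif elements matched by any placement."""
    n = len(seq)
    if n == 0 or not motif:
        return 0
    NEG = -1  # marks "no valid placement ends at this position"
    # prev[p]: best score with the most recent element placed at position p.
    # The gap bounds of the first element are ignored (free start position).
    prev = [1 if seq[p] in motif[0][2] else 0 for p in range(n)]
    for min_gap, max_gap, allowed in motif[1:]:
        cur = [NEG] * n
        for p in range(n):
            best_before = NEG
            for gap in range(min_gap, max_gap + 1):
                q = p - gap - 1  # position of the previous element
                if q >= 0 and prev[q] > best_before:
                    best_before = prev[q]
            if best_before != NEG:
                # Match scores 1, mismatch scores 0, as stated in the text.
                cur[p] = best_before + (1 if seq[p] in allowed else 0)
        prev = cur
    return max(prev + [0])  # 0 if the sequence is too short for the motif


def passes_cutoff(total_mismatches: int, hlh_mismatches: int,
                  max_total: int = 10, max_hlh: int = 7) -> bool:
    """Apply the exclusion rule quoted in the text: reject proteins with more
    than 10 mismatches overall or more than 7 in the HLH region."""
    return total_mismatches <= max_total and hlh_mismatches <= max_hlh


# Toy motif (assumption): R, R, then a loop of 2 to 4 residues, then L, K.
TOY_MOTIF: List[Element] = [
    (0, 0, frozenset("R")),
    (0, 0, frozenset("R")),
    (2, 4, frozenset("L")),
    (0, 0, frozenset("K")),
]

if __name__ == "__main__":
    score = best_match_score("AARRGGGLKAA", TOY_MOTIF)
    print(score, len(TOY_MOTIF) - score)  # matches, mismatches
```

The table-based search considers every admissible loop length at once, which is what replaces the manual counting of matches after a multiple alignment; counting HLH-region mismatches separately would require tracking which elements belong to that region, which this sketch leaves out.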