Identification of Eight Genes Encoding Chemokine-Like
Sciencepaper Online (中国科技论文在线) http://www.paper.edu.cn

Genomics 81 (2003) 609–617

Identification of eight genes encoding chemokine-like factor superfamily members 1–8 (CKLFSF1–8) by in silico cloning and experimental validation☆

Wenling Han,1 Peiguo Ding,1 Mingxu Xu, Lu Wang, Min Rui, Shuang Shi, Yanan Liu, Ying Zheng, Yingyu Chen, Tian Yang, and Dalong Ma*

Center for Human Disease Genomics, Peking University, 38 Xueyuan Road, Beijing 100083, China

Received 21 November 2002; accepted 14 March 2003

Abstract

TM4SF11 is only 102 kb from the chemokine gene cluster composed of SCYA22, SCYD1, and SCYA17 on chromosome 16q13. CKLF maps on chromosome 16q22. CKLFs have some characteristics associated with the CCL22/MDC, CX3CL1/fractalkine, CCL17/TARC, and TM4SF proteins. Bioinformatics based on CKLF2 cDNA and protein sequences in combination with experimental validation identified eight novel genes designated chemokine-like factor superfamily members 1–8 (CKLFSF1–8). CKLFSF1–8 form gene clusters; the sequence identities between CKLF2 and CKLFSF1–8 are from 12.5 to 39.7%. Most of the CKLFSFs have alternative RNA splicing forms. CKLFSF1 has a CC motif and higher sequence similarity with chemokines than with any of the other CKLFSFs. CKLFSF8 shares 39.3% amino acid identity with TM4SF11. CKLFSF1 links the CKLFSF family with chemokines, and CKLFSF8 links it with TM4SF. The characteristics of CKLFSF2–7 are intermediate between CKLFSF1 and CKLFSF8. This indicates that CKLFSF represents a novel gene family between the SCY and the TM4SF gene families.

© 2003 Elsevier Science (USA). All rights reserved.

Keywords: CKLF; CKLFSF; Bioinformatics; Gene cluster; Chemokine; TM4SF11

☆ Sequence data from this article have been deposited with the GenBank Data Library under Accession Nos. AY174118 (CKLFSF1), AF479260 (CKLFSF2), AF479813 (CKLFSF3), AF479814 (CKLFSF4), AF479262 (CKLFSF5), AF479261 (CKLFSF6), AF479263 (CKLFSF7), and AF474370 (CKLFSF8).
* Corresponding author. Fax: +1-8610-62091149. E-mail address: [email protected] (D. Ma).
1 These authors contributed equally to this paper.
2 Abbreviations used: CKLF, chemokine-like factor; SCY, small secreted cytokine gene; CCL, CC chemokine ligand; TARC, thymus- and activation-regulated chemokine; MDC, macrophage-derived chemokine; CX3CL, CX3C chemokine ligand; IL, interleukin; FXR, fragile-X-related protein; TM4SF, transmembrane 4 superfamily; HUGO, Human Genome Organization; HGMP, Human Genome Mapping Project.
0888-7543/03/$ – see front matter © 2003 Elsevier Science (USA). All rights reserved. doi:10.1016/S0888-7543(03)00095-8

Chemokines are small, secreted proteins that can be subdivided according to their NH2-terminal cysteine-motif into the CC, CXC, C, and CX3C classes. A systematic nomenclature for chemokines and their genes has gained wide acceptance, with 28 CC chemokines (CCL1–28),2 16 CXC chemokines (CXCL1–16), 2 C chemokines (XCL1–2), and one CX3C chemokine (CX3CL1/fractalkine) identified to date. Their corresponding genes have been named SCYA1–28, SCYB1–16, SCYC1–2, and SCYD1, respectively [1,2]. SCYA22 (CCL22/MDC), SCYD1 (CX3CL1/fractalkine), and SCYA17 (CCL17/TARC) are clustered on chromosome 16q13 [3]. TM4SF11, formerly named plasmolipin, is only 102 kb from SCYA22 [4]. Such a short distance suggests some relationship may exist between them.

Chemokine-like factor was isolated from PHA-stimulated U937 cells. It is composed of four exons and located on chromosome 16q22. It maintains four alternative RNA splicing forms, which we designated CKLF1, 2, 3, and 4. CKLF1 has a CC motif and the key amino acids around the motif are identical with those of CCL22/MDC and CCL17/TARC, though lacking the additional two C-terminus cysteines. Mouse and rat CKLFs have some RNA splicing forms similar to those of human CKLFs, but the key motif
of their encoded proteins is not CC, but rather a CX3C motif similar to that of CX3CL1/fractalkine. Both human and rat CKLF1 have chemotactic activities on a wide spectrum of leukocytes. CKLF2 is the full-length cDNA product encoding 152 amino acids. It can stimulate the proliferation and differentiation of C2C12 cells. In addition to possessing the CC motif, it has four putative transmembrane regions and the sequence identity between CKLF2 and TM4SF11 is 15.4% [5,6]. It seems that CKLF may be an interim gene between SCYA22, SCYD1, SCYA17, and TM4SF11.

When CKLF was cloned in 1998, there was no human gene product that had obvious sequence similarity with it. However, we suspected the existence of additional CKLF-related genes because of the high frequency of gene duplication within chemokine clusters [7,8] and the rapidly expanded TM4SF proteins [9]. With the current availability of large databases of expressed sequence tags (ESTs) and new bioinformatics methods, we hypothesized that analysis of the EST, nr, and HGP databases with multiple computational tools would allow us to identify novel genes belonging to the CKLF family. In silico gene identification from human genome sequence and EST databases has become critically important in many projects seeking to discover novel functional genes, especially structurally similar and clustered genes. For example, some new members of the FXR, IL-1, and β-defensin families have recently been cloned using a computational search strategy [10–12]. In the current report, we describe the use of this strategy for the identification of eight novel genes designated chemokine-like factor superfamily members 1–8 (CKLFSF1–8), linking SCYA22, SCYD1, SCYA17, and TM4SF11 further.

Results

Identification of the full-length cDNA sequences of CKLFSF1–8

We conducted bioinformatics analyses with the goal of identifying novel CKLF-related genes. By running TBLASTN searches against human EST databases with human CKLF2 as the query, we found dozens of EST fragments whose deduced amino acid sequences have obvious sequence identity with it. After EST assembly, we obtained a complete cDNA sequence with a putative amino acid sequence that contains a CC motif and has sequence similarity with CCL22/MDC, CX3CL1/fractalkine, and CCL17/TARC. Therefore, we propose that CKLF represents a novel gene family and that other related genes must exist. With the approval of the HUGO Gene Nomenclature Committee, novel genes related to CKLF have been designated chemokine-like factor superfamily members (CKLFSFs) (previously called chemokine-like factor-related proteins). This first isolated gene was named CKLFSF1 and was confirmed by PCR amplification and DNA sequencing. CKLFSF1 is a complex gene having at least 23 alternative RNA splicing forms (manuscript in preparation). Here, we report one of these splice variants, which encodes a protein of 169 amino acids (AY174118).

When the cDNA sequence of mouse CKLF2 was used in a BLASTN search, several mouse EST fragments with high sequence identity were identified. After EST assembly, a novel polypeptide with 169 amino acids was cloned (AY172740). After conducting a TBLASTN search against human EST databases with this protein sequence and EST assembly, we obtained its human homologue and named it CKLFSF2. It is tightly linked with CKLFSF1 on chromosome 16q13, justifying the designation CKLFSF2.

When mouse CKLF2 was used for BLASTP searches, four putative mouse proteins sharing obvious sequence similarity with it were found, but no human homologues of them existed in the public database at that time. TBLASTN searches against human EST databases with these sequences and the corresponding EST assemblies were performed. The accession numbers of the putative mouse genes used for TBLASTN analyses and the accession numbers for the corresponding EST assemblies may be found under Materials and methods. By using this approach, we successfully obtained the human homologues of these four putative mouse genes designated CKLFSF3, -5, -7, and -8.

For CKLFSF4, we referred to a putative human gene encoding 133 amino acids (GenBank Accession No. XP_085368) that was identified by BLASTP searches with CKLFSF5 and CKLFSF7. Analyses revealed that this sequence did not encode a complete gene product. By EST assembly, we obtained its full-length cDNA sequence encoding 234 amino acids. For human CKLFSF6, we consulted the hypothetical National Center for Biotechnology Information (NCBI) protein FLJ20396 (GenBank Accession Nos. BC002797, AK000403, and AAH02797), which has 31.1% sequence identity with CKLFSF4 on the overall amino acid sequence level. Using EST assembly based on the incomplete sequence information, we obtained the full-length cDNA sequence and named it CKLFSF6.

After iterative BLAST searches and EST assemblies, we obtained the full-length cDNA sequences of human CKLFSF1–8 with complete open reading frames. Most of them have a stop codon upstream of the translation initiation site, a polyadenylation site, and a poly(A) tail at the 3′ terminus. A promoter scan suggests there is a typical promoter sequence in the upstream region of CKLFSF7. These results indicate that we have obtained full-length cDNA sequences.

Molecular cloning of cDNA sequences of CKLFSF1–8

According to the predicted sequences, we designed specific primers and amplified the complete open reading frames of the putative genes from different cDNA libraries. DNA sequencing verified our prediction. We deposited them with GenBank. In total, this iterative, computational method has discovered eight novel human CKLFSF genes.

Fig. 1. Northern blot analyses of CKLFSF1–8.
CKLFSF1, -2, -3, -6, and -7 probes were hybridized with membranes prepared in our lab, and CKLFSF1, -5, and -8 were hybridized with membranes purchased from Clontech. The molecular weights of CKLFSF1–8 are in agreement with the predicted sizes. The expression of CKLFSF1, -2, -5, and -7 is restricted to some specific tissues, but CKLFSF3, -4, -6, and -8 are widely expressed.

Northern blot analyses

five gaps of ~117, 120, 192, 301, and 340 bp.
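
One of the full-length criteria reported in Results, a stop codon upstream of the translation initiation site, can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the text does not describe at this level. The function name `has_upstream_inframe_stop`, the decision to scan only the reading frame of the initiation codon, and both toy sequences are assumptions made for illustration.

```python
# Illustrative sketch only: checks whether an assembled cDNA carries an
# in-frame stop codon 5' of its putative initiation codon. The sequences
# below are hypothetical toy examples, not CKLFSF data.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_upstream_inframe_stop(cdna: str, start: int) -> bool:
    """Return True if a stop codon lies upstream of, and in frame with,
    the ATG located at index `start` (0-based) of `cdna`."""
    cdna = cdna.upper()
    if cdna[start:start + 3] != "ATG":
        raise ValueError("`start` must point at an ATG codon")
    # Step backwards one codon at a time so we stay in the ORF's frame.
    for pos in range(start - 3, -1, -3):
        if cdna[pos:pos + 3] in STOP_CODONS:
            return True
    return False


# Hypothetical 5' region with an in-frame TAA two codons before the ATG.
with_stop = "GG" + "TAA" + "GCC" + "ATG" + "GCT" + "TGA"
# Hypothetical 5' region with no in-frame stop before the ATG.
without_stop = "G" + "GCC" + "GCC" + "ATG" + "GCT" + "TGA"

print(has_upstream_inframe_stop(with_stop, 8))
print(has_upstream_inframe_stop(without_stop, 7))
```

An upstream in-frame stop argues that the reading frame does not extend further 5′, so the chosen ATG is unlikely to be an internal methionine of a truncated clone. A negative result does not by itself show that a clone is incomplete.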