Sciencepaper Online (中国科技论文在线) http://www.paper.edu.cn

Genomics 81 (2003) 609–617

Identification of eight genes encoding chemokine-like factor superfamily members 1–8 (CKLFSF1–8) by in silico cloning and experimental validation

Wenling Han,1 Peiguo Ding,1 Mingxu Xu, Lu Wang, Min Rui, Shuang Shi, Yanan Liu, Ying Zheng, Yingyu Chen, Tian Yang, and Dalong Ma*

Center for Human Disease Genomics, Peking University, 38 Xueyuan Road, Beijing 100083, China

Received 21 November 2002; accepted 14 March 2003

Abstract

TM4SF11 is only 102 kb from the chemokine gene cluster composed of SCYA22, SCYD1, and SCYA17 on chromosome 16q13. CKLF maps on chromosome 16q22. CKLFs have some characteristics associated with the CCL22/MDC, CX3CL1/fractalkine, CCL17/TARC, and TM4SF proteins. Bioinformatics based on CKLF2 cDNA and protein sequences in combination with experimental validation identified eight novel genes designated chemokine-like factor superfamily members 1–8 (CKLFSF1–8). CKLFSF1–8 form gene clusters; the sequence identities between CKLF2 and CKLFSF1–8 range from 12.5 to 39.7%. Most of the CKLFSFs have alternative RNA splicing forms. CKLFSF1 has a CC motif and higher sequence similarity with chemokines than with any of the other CKLFSFs. CKLFSF8 shares 39.3% amino acid identity with TM4SF11. CKLFSF1 links the CKLFSF family with chemokines, and CKLFSF8 links it with TM4SF. The characteristics of CKLFSF2–7 are intermediate between CKLFSF1 and CKLFSF8. This indicates that CKLFSF represents a novel gene family between the SCY and the TM4SF gene families. © 2003 Elsevier Science (USA). All rights reserved.

Keywords: CKLF; CKLFSF; Bioinformatics; Gene cluster; Chemokine; TM4SF11

Sequence data from this article have been deposited with the GenBank Data Library under Accession Nos. AY174118 (CKLFSF1), AF479260 (CKLFSF2), AF479813 (CKLFSF3), AF479814 (CKLFSF4), AF479262 (CKLFSF5), AF479261 (CKLFSF6), AF479263 (CKLFSF7), and AF474370 (CKLFSF8).
* Corresponding author. Fax: +1-8610-62091149. E-mail address: [email protected] (D. Ma).
1 These authors contributed equally to this paper.
2 Abbreviations used: CKLF, chemokine-like factor; SCY, small secreted gene; CCL, CC chemokine ligand; TARC, thymus- and activation-regulated chemokine; MDC, macrophage-derived chemokine; CX3CL, CX3C chemokine ligand; IL, interleukin; FXR, fragile-X-related protein; TM4SF, transmembrane 4 superfamily; HUGO, Human Genome Organization; HGMP, Human Genome Mapping Project.
0888-7543/03/$ – see front matter © 2003 Elsevier Science (USA). All rights reserved. doi:10.1016/S0888-7543(03)00095-8

Chemokines are small, secreted proteins that can be subdivided according to their NH2-terminal cysteine motif into the CC, CXC, C, and CX3C classes. A systematic nomenclature for chemokines and their genes has gained wide acceptance, with 28 CC chemokines (CCL1–28),² 16 CXC chemokines (CXCL1–16), 2 C chemokines (XCL1–2), and one CX3C chemokine (CX3CL1/fractalkine) identified to date. Their corresponding genes have been named SCYA1–28, SCYB1–16, SCYC1–2, and SCYD1, respectively [1,2]. SCYA22 (CCL22/MDC), SCYD1 (CX3CL1/fractalkine), and SCYA17 (CCL17/TARC) are clustered on chromosome 16q13 [3]. TM4SF11, formerly named plasmolipin, is only 102 kb from SCYA22 [4]. Such a short distance suggests some relationship may exist between them.

Chemokine-like factor was isolated from PHA-stimulated U937 cells. It is composed of four exons and located on chromosome 16q22. It maintains four alternative RNA splicing forms, which we designated CKLF1, 2, 3, and 4. CKLF1 has a CC motif and the key amino acids around the motif are identical with those of CCL22/MDC and CCL17/TARC, though lacking the additional two C-terminus cysteines. Mouse and rat CKLFs have some RNA splicing forms similar to those of human CKLFs, but the key motif

of their encoded proteins is not CC, but rather a CX3C motif similar to that of CX3CL1/fractalkine. Both human and rat CKLF1 have chemotactic activities on a wide spectrum of leukocytes. CKLF2 is the full-length cDNA product encoding 152 amino acids. It can stimulate the proliferation and differentiation of C2C12 cells. In addition to possessing the CC motif, it has four putative transmembrane regions and the sequence identity between CKLF2 and TM4SF11 is 15.4% [5,6]. It seems that CKLF may be an interim gene between SCYA22, SCYD1, SCYA17, and TM4SF11.

When CKLF was cloned in 1998, there was no human gene product that had obvious sequence similarity with it. However, we suspected the existence of additional CKLF-related genes because of the high frequency of gene duplication within chemokine clusters [7,8] and the rapidly expanded TM4SF proteins [9]. With the current availability of large databases of expressed sequence tags (ESTs) and new bioinformatics methods, we hypothesized that analysis of the EST, nr, and HGP databases with multiple computational tools would allow us to identify novel genes belonging to the CKLF family. In silico gene identification from human genome sequence and EST databases has become critically important in many projects seeking to discover novel functional genes, especially structurally similar and clustered genes. For example, some new members of the FXR, IL-1, and β-defensin families have recently been cloned using a computational search strategy [10–12]. In the current report, we describe the use of this strategy for the identification of eight novel genes designated chemokine-like factor superfamily members 1–8 (CKLFSF1–8), linking SCYA22, SCYD1, SCYA17, and TM4SF11 further.

Results

Identification of the full-length cDNA sequences of CKLFSF1–8

We conducted bioinformatics analyses with the goal of identifying novel CKLF-related genes. By running TBLASTN searches against human EST databases with human CKLF2 as the query, we found dozens of EST fragments whose deduced amino acids have obvious sequence identity with it. After EST assembly, we obtained a complete cDNA sequence with a putative amino acid sequence that contains a CC motif and has sequence similarity with CCL22/MDC, CX3CL1/fractalkine, and CCL17/TARC. Therefore, we propose that CKLF represents a novel gene family and that other related genes must exist. Under the approval of the HUGO Gene Nomenclature Committee, novel genes related to CKLF have been designated chemokine-like factor superfamily members (CKLFSFs) (previously called chemokine-like factor-related proteins). This first isolated gene was named CKLFSF1 and was confirmed by PCR amplification and DNA sequencing. CKLFSF1 is a complicated gene having at least 23 alternative RNA splicing forms (manuscript in preparation). Here, we report one of these splice variants, which encodes a protein of 169 amino acids (AY174118).

When the cDNA sequence of mouse CKLF2 was used in a BLASTN search, several mouse EST fragments with high sequence identity were identified. After EST assembly, a novel polypeptide with 169 amino acids was cloned (AY172740). After conducting a TBLASTN search against human EST databases with this protein sequence and EST assembly, we obtained its human homologue and named it CKLFSF2. It is tightly linked with CKLFSF1 on chromosome 16q13, justifying the designation CKLFSF2.

When mouse CKLF2 was used for BLASTP searches, four putative mouse proteins sharing obvious sequence similarity with it were found, but no human homologues of them existed in the public database at that time. TBLASTN searches against human EST databases with these sequences and the corresponding EST assemblies were performed. The accession numbers of the putative mouse genes used for TBLASTN analyses and the accession numbers for the corresponding EST assemblies may be found under Materials and methods. By using this approach, we successfully obtained the human homologues of these four putative mouse genes, designated CKLFSF3, -5, -7, and -8.

For CKLFSF4, we referred to a putative human gene encoding 133 amino acids (GenBank Accession No. XP_085368) that was identified by BLASTP searches with CKLFSF5 and CKLFSF7. Analyses revealed that this sequence did not encode a complete gene product. By EST assembly, we obtained its full-length cDNA sequence encoding 234 amino acids. For human CKLFSF6, we consulted the hypothetical National Center for Biotechnology Information (NCBI) protein FLJ20396 (GenBank Accession Nos. BC002797, AK000403, and AAH02797), which has 31.1% sequence identity with CKLFSF4 on the overall amino acid sequence level. Using EST assembly based on the incomplete sequence information, we obtained the full-length cDNA sequence and named it CKLFSF6.

After iterative BLAST searches and EST assemblies, we obtained the full-length cDNA sequences of human CKLFSF1–8 with complete open reading frames. Most of them have a stop codon upstream of the translation initiation site, a polyadenylation site, and a poly(A) tail at the 3′ terminus. A promoter scan suggests there is a typical promoter sequence in the upstream region of CKLFSF7. The above results indicate we have obtained full cDNA sequences.

Molecular cloning of cDNA sequences of CKLFSF1–8

According to the predicted sequences, we designed specific primers and amplified the complete open reading frames of the putative genes from different cDNA libraries. DNA sequencing verified our prediction. We deposited the sequences with GenBank. In total, this iterative, computational method has discovered eight novel human CKLFSF genes.
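The iterative search described above can be pictured as a worklist: every new hit becomes a query in turn, until no unseen sequences turn up. The sketch below is only an illustration of that loop under stated assumptions. The `search` callable, the `TOY_HITS` table, and all sequence names are hypothetical stand-ins, and no real BLAST run or EST assembly is performed.

```python
from collections import deque
from typing import Callable, Iterable


def iterative_search(seeds: Iterable[str],
                     search: Callable[[str], Iterable[str]],
                     max_rounds: int = 10) -> list[str]:
    """Expand a set of queries until no new hits appear (or max_rounds is hit)."""
    found = list(dict.fromkeys(seeds))      # keep order, drop duplicate seeds
    seen = set(found)
    frontier = deque((name, 0) for name in found)
    while frontier:
        query, depth = frontier.popleft()
        if depth >= max_rounds:
            continue                        # do not expand beyond the round limit
        for hit in search(query):
            if hit not in seen:             # only unseen hits become new queries
                seen.add(hit)
                found.append(hit)
                frontier.append((hit, depth + 1))
    return found


# Hypothetical stand-in for a database search: each name "hits" the listed names.
TOY_HITS = {
    "query_A": ["hit_B", "hit_C"],
    "hit_B": ["query_A", "hit_D"],
    "hit_C": [],
    "hit_D": ["hit_B"],
}


def toy_search(name: str) -> list[str]:
    return TOY_HITS.get(name, [])


result = iterative_search(["query_A"], toy_search)
print(result)
```

In practice each hit would still need manual inspection, as the authors describe; the loop only captures the bookkeeping of which sequences have already been used as queries.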


Fig. 1. Northern blot analyses of CKLFSF1–8. CKLFSF1, -2, -3, -6, and -7 probes were hybridized with membranes prepared in our lab, and CKLFSF1, -5, and -8 were hybridized with membranes purchased from Clontech. The molecular weights of CKLFSF1–8 are in agreement with the predicted sizes. The expression of CKLFSF1, -2, -5, and -7 is restricted to some specific tissues, but CKLFSF3, -4, -6, and -8 are widely expressed.

Northern blot analyses

Northern blot analyses were used to confirm that we had isolated the full-length cDNA sequences of CKLFSF1–8. The complete ORF of each gene was used as a template for probe labeling. Every probe was hybridized with membranes prepared by our lab and by Clontech (Palo Alto, CA, USA). Representative results are illustrated in Fig. 1. In the tested tissues, the expression patterns of CKLFSF1–8 vary. CKLFSF1 and CKLFSF2 have the highest expression levels in testis; CKLFSF5 and CKLFSF7 are highly expressed in brain and leukocytes, respectively. CKLFSF3, -4, -6, and -8 have a wide expression profile in many tissues. This is consistent with our RT-PCR analyses and Unigene searches (data not shown). There is more than one band in CKLFSF3, -4, -5, and -6, indicating alternative RNA splicing forms that have been confirmed by PCR amplification and DNA sequencing (data not shown). The predicted full-length cDNA sequences of CKLFSF1, -2, -3, -5, -6, -7, and -8 are 1.14, 1.07, 2.1, 1.22, 3.88, 1.37, and 1.18 kb, respectively, which are consistent with the major mRNA bands in the Northern blot analyses. For CKLFSF4, there are three bands whose molecular sizes are about 1.2, 3.4, and 8.2 kb. We obtained cDNA sequences corresponding to the two smaller forms that were ~1.11 and 3.43 kb, respectively. When conducting EST BLAST searches with the final intron of CKLFSF4 (4.7 kb), we found many EST matches that identified the five gaps of ~117, 120, 192, 301, and 340 bp. This suggests that a large CKLFSF4 isoform exists, with a molecular size of about 8.2 kb (as seen in the Northern blot). In this RNA splicing form, the intron is selected as a part of the 3′UTR. Further experiments are necessary to validate this proposal.

Sequence comparison of CKLFSF1 with CCL22/MDC, CX3CL1/fractalkine, and CCL17/TARC

Our previous studies have demonstrated that CKLFs share some characteristics associated with CCL17/TARC, CCL22/MDC, and CX3CL1/fractalkine. CKLFSF1 has a CC motif and shares a high level of amino acid sequence identity with CKLF2. Therefore, we did pair-wise sequence comparison of CKLFSF1 with the above three proteins. Unlike other chemokines, the predicted human CX3CL1/fractalkine is composed of 100 amino acids at the NH2-terminus of a 373-amino-acid protein that carries the chemokine domain atop an extended mucin-like stalk [13]. Therefore, we selected only the chemokine domain for sequence alignment. As illustrated in Fig. 2, the C-terminus region of CKLFSF1 shows sequence similarity with CCL22/MDC and CX3CL1/fractalkine. CKLFSF1 and CCL22/MDC are up to 47.0% similar, and some amino acid residues near the CC motif are identical. CKLFSF1 is most similar to CCL17/TARC in the NH2-terminus amino acids 12 to 118. The sequence similarities of CKLFSF2–8 with


Fig. 2. Local sequence alignments of CKLFSF1 with CCL22, CX3CL1, and CCL17. Pair-wise sequence comparisons of conserved regions were performed using the Needle software. The sequence identities and similarities between CKLFSF1 and CCL22/MDC, CX3CL1/fractalkine, and CCL17/TARC are 26.5 (22/83) and 47.0% (39/83), 22.5 (25/111) and 42.3% (47/111), and 28.1 (32/114) and 41.2% (47/114), respectively. The symbols |, :, and . indicate matched, similarly matched, and different amino acid residues, respectively.
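The percentages in the Fig. 2 caption follow directly from the residue counts given in parentheses. As a minimal sketch, they can be recomputed as below; the dictionary layout and the `percent` helper are illustrative and not part of the original analysis.

```python
# (identical residues, similar residues, aligned length) as quoted in the Fig. 2 caption
counts = {
    "CCL22/MDC": (22, 39, 83),
    "CX3CL1/fractalkine": (25, 47, 111),
    "CCL17/TARC": (32, 47, 114),
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Map each chemokine to (identity %, similarity %) relative to CKLFSF1.
table = {
    name: (percent(identical, length), percent(similar, length))
    for name, (identical, similar, length) in counts.items()
}

for name, (identity, similarity) in table.items():
    print(f"{name}: identity {identity}%, similarity {similarity}%")
```

With one-decimal rounding, 32/114 comes out as 28.1%.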

these three chemokines are lower. DNASTAR and HOMOLOGY software analyses gave us similar results (data not shown). Further experiments will be required to elucidate the chemotactic activity of CKLFSF1 that is suggested by its similarity to CCL17/TARC, CCL22/MDC, and CX3CL1/fractalkine.

Sequence alignment of CKLFSF1–8 with CKLF2 and TM4SF11

CKLFSF1–8 were identified by iterative BLAST searches and were found to share various identical sequences (Fig. 3A). When the amino acid sequence of CKLFSF8 was applied to BLASTP searches, we found that CKLFSF8 is more highly identical to Plasmolipin/TM4SF11 than to other CKLFSF members. Since all of these novel proteins were identified through searching with CKLF2, we performed multiple sequence alignments with both CKLF2 and TM4SF11. Figs. 3B and 3C illustrate that these 10 proteins constitute four subfamilies: CKLF2, CKLFSF1, and CKLFSF2 belong to one subfamily with CKLFSF7 closely related, while CKLFSF3 and CKLFSF5, CKLFSF4 and CKLFSF6, and CKLFSF8 and TM4SF11 are 39.7, 31.1, and 39.3% identical, respectively (Fig. 3B), and form another three subfamilies (Fig. 3C).

Transmembrane analyses of CKLFSF1–8

CKLF has four isoforms designated CKLF1, -2, -3, and -4; their protein products have two, four, one, and three putative transmembrane regions, respectively. CKLFSF8 has higher sequence identity with TM4SF11 than with other CKLFSF members. Most of the CKLFSF1–8 superfamily exists as alternative RNA splicing forms, and similar to CKLF, the protein products of different RNA splicing forms may have different transmembrane regions (data not shown). A representative transmembrane region analysis is shown in Table 1. Except for CKLFSF3, which has only three potential transmembrane regions, at least one isoform of each other gene has four putative transmembrane regions. This indicates that CKLFSF1–8 share some characteristics with TM4SF11.

Homology analysis of CKLFSF1–8 between human and mouse

DNA sequencing and Northern blot analyses validated the existence of human CKLFSF1–8. After this, we did sequence homology analyses of them with their mouse homologues. The mouse homologues of human CKLFSF1 and CKLFSF3–8 were designated mouse CKLFSF1 and CKLFSF3–8, respectively. Human CKLFSF2 is the common ortholog of mouse CKLFSF2a and CKLFSF2b. This systematic designation of mouse CKLFSF members was approved by the Mouse Genome Database Project. The Accession Nos. of mouse CKLFSF1, CKLFSF2a, CKLFSF2b, and CKLFSF3–8 are BAB24700, AY172740, AY162282, AAH03230, AF479815, BAB28947, AK030885, AF188504, and BAB28125. Mouse CKLFSF1 has 50% similarly matched sequence with human CKLFSF1 at the C-terminus. In addition, it also shares significant sequence identity with mouse CKLF and CKLFSF2a on the nucleotide level (data not shown). Human CKLFSF2 shows 46.1 and 48.1% similarity with mouse CKLFSF2a and CKLFSF2b on the overall amino acid level, respectively. The similarities between human and mouse CKLFSF3–8 are 91.8, 97.6, 90.4, 88.5, 88.6, and 95.4%, of which CKLFSF4 is the most conserved one.


Fig. 3. Sequence alignment of CKLFSF family members. (A) Amino acid sequence alignment of human CKLFSF1–8 with CKLF2 and TM4SF11. Sequence alignment was obtained using the ClustalW algorithm (Slow/Accurate, Identity) of the Lasergene software (DNASTAR, Inc., Madison, WI, USA). The boxed and shaded areas indicate matching residues. (B) Sequence comparison of human CKLFSF1–8, CKLF2, and TM4SF11. The numbers represent the percentage of sequence identity determined by the ClustalW method with the Slow/Accurate, Identity weight table in the Lasergene software. (C) Phylogenetic analysis of CKLFSF1–8, CKLF2, and TM4SF11. The inferred phylogenetic tree was generated based on the degree of amino acid sequence identity shown in B.
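A percent-identity value such as those in Fig. 3B is, at its core, a count of matching columns in an alignment. The sketch below shows one common convention, which is an assumption here: columns containing a gap in either sequence are skipped and the remaining columns form the denominator. The exact formula used by the Lasergene software is not stated in the text, and the two aligned strings are made-up examples rather than CKLFSF sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" or residue_b == "-":
            continue                        # skip gapped columns (assumed convention)
        compared += 1
        if residue_a == residue_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments, for illustration only.
seq_x = "MKT-LLVAC"
seq_y = "MRTALL-AC"
identity = round(percent_identity(seq_x, seq_y), 1)
print(identity)
```

Other conventions divide by the full alignment length or by the shorter sequence length, so values from different tools are not always directly comparable.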

Chromosomal mapping of CKLFSF1–8 and the related genes

From the above analyses, it seems that while some CKLFSF members share obvious characteristics with chemokines, others are much more similar to TM4SF proteins. When conducting BLAST searches of the draft human genomic sequence with all the assembled full-length cDNA, we noted some important phenomena. As shown in Fig. 4A, CKLF and CKLFSF1–4 form a gene

Table 1
Transmembrane regions of CKLF2 and CKLFSF1–8

Protein    First     Second     Third      Fourth     Amino acids
CKLF2      24–40     45–67      74–96      106–128    152
CKLFSF1    23–40     45–67      77–99      108–130    169
CKLFSF2    94–111    116–135    142–166    179–198    248
CKLFSF3    61–83     103–122    132–151    none       182
CKLFSF4    59–77     87–109     121–143    148–170    234
CKLFSF5    35–56     60–82      95–114     119–136    156
CKLFSF6    43–64     73–94      107–126    135–156    183
CKLFSF7    40–59     69–91      103–125    145–167    175
CKLFSF8    46–68     78–100     113–130    140–162    173

Note. CKLF2 and the representative protein forms of CKLFSF1–8 were selected for putative transmembrane analyses. Except for CKLFSF3 (with three transmembrane regions), all the others have similar transmembrane regions.
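The boundaries in Table 1 can be turned into simple summary numbers, such as the length of each putative segment and the share of the protein they cover. The sketch below does this for four of the rows; treating the coordinates as 1-based and inclusive is an assumption, and the dictionary and function names are illustrative.

```python
# Putative transmembrane segments (start, end) copied from Table 1.
# Coordinates are assumed to be 1-based and inclusive.
tm_regions = {
    "CKLF2": [(24, 40), (45, 67), (74, 96), (106, 128)],
    "CKLFSF1": [(23, 40), (45, 67), (77, 99), (108, 130)],
    "CKLFSF3": [(61, 83), (103, 122), (132, 151)],
    "CKLFSF8": [(46, 68), (78, 100), (113, 130), (140, 162)],
}
protein_length = {"CKLF2": 152, "CKLFSF1": 169, "CKLFSF3": 182, "CKLFSF8": 173}


def summarize(name: str) -> tuple[int, list[int], float]:
    """Return (segment count, segment lengths, fraction of residues in segments)."""
    segments = tm_regions[name]
    lengths = [end - start + 1 for start, end in segments]
    fraction = round(sum(lengths) / protein_length[name], 2)
    return len(segments), lengths, fraction


for protein in tm_regions:
    count, lengths, fraction = summarize(protein)
    print(f"{protein}: {count} segments, lengths {lengths}, fraction {fraction}")
```

Under these assumptions the predicted segments are roughly 17 to 23 residues long, and CKLFSF3 stands out only by lacking a fourth segment.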


Fig. 4. (A) The chromosomal location of CKLFSF members and related genes. CKLF and CKLFSF1–4 form a gene cluster on chromosome 16, and CKLFSF6–8 constitute another cluster on chromosome 3. The gene density of the first cluster is much higher than that of the second one. CKLFSF5 itself is mapped to chromosome 14. The cluster consisting of SCYA22, SCYD1, and SCYA17 is not far from TM4SF11. (B) Chromosomal location of mouse CKLFSF members and related genes. The chromosomal locations of mouse CKLFSF members and the related genes are similar to those of the human homologues. The dots between CKLF and SCYA17 indicate that this region is not drawn to scale. The solid lines between genes indicate that the distances are shown in proportion. The thick dashed lines in (A) and (B) between SCYA17 and CKLF indicate that the distances are reduced. The thin dashed lines in (B) show that there are gaps in the mouse genome draft map.

cluster on chromosome 16. It is known that SCYA22, SCYD1, and SCYA17 are clustered on 16q13. Notably, TM4SF11 is only 102 kb from SCYA22. CKLF is known to be located on chromosome 16q22; further genomic examination shows that the region between CKLF and CKLFSF1 is only 325 bp, and the distance between CKLFSF1 and CKLFSF2 is 312 bp. CKLFSF3 is only 870 bp from CKLFSF4. The gene density of this region is very high. CKLFSF5 is located on chromosome 14; CKLFSF6, -7, and -8 form a second gene cluster on chromosome 3. CKLFSF4 and CKLFSF6 have identical orientation; the orientations of others are the same. The above genomic localization patterns further verify that CKLFSF1–8 represent a novel gene family linking chemokine and TM4SF families.

The chromosomal arrangements of mouse CKLFSF members are similar to those of the human CKLFSF genes. As illustrated in Fig. 4B, except for some ambiguities in the genomic location of CKLFSF1, CKLFSF2a, and CKLFSF2b, the CKLF and CKLFSF1, -2a, -2b, -3, and -4 genes form a cluster on mouse chromosome 8; CKLFSF6, -7, and -8 are located on mouse chromosome 9, and similar to human CKLFSF5, mouse CKLFSF5 is also mapped to chromosome 14.


Discussion

In this report, we describe the discovery of human CKLFSF1–8. CKLFSF1 was obtained from human CKLF; CKLFSF2, -3, -5, -7, and -8 were acquired by TBLASTN with the amino acid sequences of their putative mouse homologues; by conducting iterative BLAST searches with the above genes, we obtained the complete cDNA sequences of CKLFSF4 and CKLFSF6 by EST assembly on the basis of putative genes that had been released in the public databases without any structural or functional annotation. After further analyses, we found they share sequence identity and similar transmembrane regions with other CKLFSF genes; more importantly, they are closely linked with CKLFSF3 and CKLFSF7 on chromosome 16 and chromosome 3, respectively, so we designated these two genes CKLFSF4 and CKLFSF6. At present, some laboratories have predicted CKLFSF1, -2, -3, -5, -7, and -8 as putative or hypothetical genes. These findings highlight the potential value of the complementary nature of database mining and EST assembly in identifying novel members of gene families. We show here that it is especially important to conduct searches across human and mouse databases. We believe that new genes of other families that have not been discovered by conventional methods can be detected by using the above strategies. By using multiple computational approaches, we have successfully cloned human CKLFSF1–8. Most of them have alternative RNA splicing forms; each cloned sequence has a complete open reading frame. Our belief that we have elucidated the complete cDNA sequences of CKLFSF1–8 is supported by promoter scans and the identification of satisfactory translation consensus sequences, stop codons upstream of the initial ATG, adenylation signals, and poly(A) tails. Northern blot analyses were used to validate further the existence of the cloned cDNA sequences.

Except for CKLFSF2 and CKLFSF3 (15.75 kb apart), the genes located on chromosome 16 are all within ~100 bp of their nearest neighbor. They are so closely linked that it is conceivable that some of their regulatory elements overlap, though this will need to be examined in further work. CKLFSF5 is located by itself on chromosome 14, but it is about 400 bp from IL-17E [14]. Further experiments are needed to evaluate the possibility of a relationship between the two. CKLFSF6, -7, and -8 fall into the second gene cluster of this family, mapped to chromosome 3. In considering the similarities among the family members, CKLF, CKLFSF1–8, and TM4SF11 form four subfamilies. CKLF, CKLFSF1, and CKLFSF2 exist as a subfamily that is tightly linked on chromosome 16. CKLFSF3 and CKLFSF5 form the second subfamily, CKLFSF4 and CKLFSF6 the third, and CKLFSF8 and TM4SF11 the fourth. Unlike the tightly linked members of the first subfamily, the members of the other three subfamilies are all located on different chromosomes. It is interesting that the sequence identity is inconsistent with the chromosome localization.

Bioinformatics analysis determines that CKLFSF1–8 are conserved during evolution. CKLFSF1 and CKLFSF2 are more active than others and CKLFSF4 is the most conserved one. The tight linking of CKLF, CKLFSF1, and CKLFSF2 on human chromosome 16 and some identical nucleotide fragments shared by mouse CKLF, CKLFSF1, CKLFSF2a, and CKLFSF2b and their close linking on chromosome 8 indicate that they may have derived from a common ancestral gene and expanded by duplication events. Human CKLFSF2 is the common ortholog of mouse CKLFSF2a and CKLFSF2b, which suggests that a gene conversion phenomenon may exist. At present, the draft mouse genome map cannot map the exact localizations of CKLFSF1, CKLFSF2a, and CKLFSF2b, most likely because of their overlapping sequences. Further experiments are necessary to obtain accurate arrangements of them. In addition, human CKLFSF3–8 are conserved both on the amino acid level and in chromosomal location.

Our previous studies have shown that CKLFs have some characteristics similar to CCL22/MDC, CX3CL1/fractalkine, and CCL17/TARC. In this paper, we show that CKLFSF1 shares conserved areas with these chemokines, which suggests that CKLF and CKLFSF1 are much more related to chemokines, whereas CKLFSF8 is more similar to TM4SF11. More importantly, TM4SF11 is not far from the SCYA22, SCYD1, and SCYA17 cluster on chromosome 16q13. It seems that CKLF and CKLFSF1 link the CKLFSF family with chemokines, and CKLFSF8 links it with the TM4SF family. The characteristics of CKLFSF2–7 are intermediate between CKLFSF1 and CKLFSF8. The average sequence identities among CKLFSF members are not high, but only a few amino acids are required to maintain the tertiary and quaternary structure, which is the most important determinant of protein function. All the members have some conserved amino acids and similar potential transmembrane regions. Therefore, it is reasonable to infer they are functionally related. We believe CKLF and CKLFSF1–8 represent a novel gene family associated with both chemokine and TM4SF families. This provides new opportunities to investigate the relationship of these three families and lays the groundwork for achieving a better understanding of the complicated interactions among them.

Materials and methods

BLAST-based searches

GenBank, nr, and EST databases as well as nonredundant, high-throughput genomic sequences were searched for CKLFSF family members by using the BLASTP, BLASTN, and TBLASTN programs on the NCBI Web site (http://www.ncbi.nlm.nih.gov/blast). The initial queries for the search used the cDNA and amino acid sequences of the human and mouse CKLF2. NCBI default parameters were used in the searches, and any potential hits were examined and logged manually. For each novel CKLF family gene


Table 2
Accession Nos. of the EST fragments for the full-length cDNA assembly

CKLFSF1: BI464188, AL038679, AI632227, AI242300, BG389971, AI540221, AI269738, AI217152, AI962574, AA921931, AA193255, AI825627, AI695900
CKLFSF2: BG826826, AA778552, BG212551, AI201364, AW663435, AI223025
CKLFSF3: AL542357, AL551269, AL514858, AL526558, BG744169, AL544210, AL550478, BG758446, BG748053, AL527977, BI199176, BG697860, BG323663, BG323505
CKLFSF4: AL530351, BF327262, AL554724, BQ016885, AW297066, BF331942, AI208468, BU675882, AL577363, BI256936, BQ004582
CKLFSF5: BF530674, BF346190, AI147740, AA452430, AI553750, AI567497, AW296102, AA452257, AI471151
CKLFSF6: BI598747, BQ219936, BU192458, BQ221096, BM980865
CKLFSF7: BG182013, BG189368, BG189930, BG193535, AW080832, BE869501, BF793712, AI693734, AI564525, BG213702, BG216957, BG656425
CKLFSF8: AL528672, AL570224, AL528671, AL543749, BG761693, BE740519, BG761693, BE740519, BG477819, BG340845, BG282018, BG478235, BI459684

identified, additional iterative BLAST searches were performed to identify other related sequences and search for EST fragments for gene assembly. The protein sequences of the following genes were used for subsequent TBLASTN searches in the human EST database: human CKLF2 and mouse CKLFSF2a (GenBank Accession Nos. AF135380 and AY172740) were used to identify human CKLFSF1 and CKLFSF2, respectively; the putative mouse proteins AAH03230, BAB28947, AF188504, and BAB28125 were used to search for human CKLFSF3, -5, -7, and -8, respectively. For human CKLFSF4 and CKLFSF6, we performed EST assembly based on the two putative human genes XP_085368 and AAH02797.

EST assembly of novel genes

Most of the cDNA sequences of CKLFSF members were obtained by EST assembly as described previously [5]. Table 2 gives the Accession Nos. of the EST fragments used for CKLFSF gene assembly.

Oligonucleotides and cDNA libraries

A series of gene-specific primers was designed based on the predicted cDNA sequences of CKLFSF1–8. The oligonucleotides and the cDNA libraries used for PCR amplification are shown in Table 3 and the PCR conditions are shown in Table 4. The cDNA libraries used were single-strand cDNA libraries from the Multiple Tissue cDNA Panel (Clontech).

DNA sequencing

The PCR products from each set of gene-specific primers were cloned into the pGEM-T-easy vector according to the manufacturer's directions (Promega, Madison, WI, USA). Positive colonies were selected and the plasmids were purified with the Qiagen Plasmid Mini Kit (Qiagen, Valencia, CA, USA) and sequenced on the ABI Prism 3100 DNA genetic analyzer using the BigDye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Foster City, CA, USA).

Northern blot analyses

Total RNA from human testis, leukocytes, and placenta was isolated using Trizol (Life Technologies, Rockville, MD, USA) according to the manufacturer's instructions. Aliquots (30 μg) of RNA were separated on 1.5% agarose–formaldehyde gels, transferred to Gene Screen-Plus nylon membranes (DuPont-NEN, Boston, MA, USA) by capillary transfer using 20× standard saline citrate, and cross-linked using UV radiation. Other premade membranes were obtained from Clontech for comparison. The complete ORFs of CKLFSF1–8 were radiolabeled with [32P]dCTP in a Klenow reaction with random priming (Life Technologies). Membranes were hybridized with the appropriate probes, washed under stringent conditions, and analyzed using the Cyclone Storage Phosphor System (Packard Instrument Co., USA).

Table 3
Primers and cDNA libraries for CKLFSF1–8 cDNA amplification

Gene       Upstream primer          Downstream primer            cDNA library
CKLFSF1    CACCATGGATCCTGAACACGC    AGCGATTCGACAGACACGTGC        Testis
CKLFSF2    AAGGACACCGAGTCAGTCATG    TCATTTCTTTCCCTTTGCTGGCC      Testis
CKLFSF3    CGCGAGAAGAGGGGAGCCAG     ATCTCGCAGTGCCCATAGCC         Placenta
CKLFSF4    GGGCGGCAGCATGCGG         GGCAGGTCCTCACGTGTCCAG        Spleen
CKLFSF5    GCCTCCATCTCTGCCTACATG    GGCTCCACTGTCCTCTCTGC         Brain
CKLFSF6    ACGATGGAGGAGCCGC         TCACTGTATGGTCCTGGATCTC       Prostate
CKLFSF7    CTGGGGCCGCGCAATG         TAACAAAGGCAGAGATGGAGAGGAG    Placenta
CKLFSF8    CACGATGGAGAACGGAGCGG     GGGCTCAGGCACCACAATG          Tonsil
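Basic properties of the primers in Table 3 are easy to check programmatically. The sketch below computes GC content, a rule-of-thumb melting temperature, and the reverse complement for two of the listed primers. The Wallace rule (2 °C per A/T, 4 °C per G/C) is a swapped-in approximation intended for short oligonucleotides; it is not stated to be the method the authors used, and the function names are illustrative.

```python
def gc_percent(primer: str) -> float:
    """GC content of a primer as a percentage, rounded to one decimal place."""
    primer = primer.upper()
    gc = sum(base in "GC" for base in primer)
    return round(100.0 * gc / len(primer), 1)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2 per A/T plus 4 per G/C."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


cklfsf1_upstream = "CACCATGGATCCTGAACACGC"      # from Table 3
cklfsf2_downstream = "TCATTTCTTTCCCTTTGCTGGCC"  # from Table 3

print(len(cklfsf1_upstream), gc_percent(cklfsf1_upstream), wallace_tm(cklfsf1_upstream))
print(reverse_complement(cklfsf2_downstream))
```

The reverse complement of the CKLFSF2 downstream primer ends in TGA, which would be consistent with a primer that spans a stop codon, although the text does not say so explicitly.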


Table 4
PCR conditions for CKLFSF1–8 amplification

Gene       Predenature     Denature      Anneal        Extend        Cycles    Further extension
CKLFSF1    94°C, 5 min     94°C, 20 s    58°C, 20 s    72°C, 30 s    35        72°C, 10 min
CKLFSF2    94°C, 5 min     94°C, 20 s    58°C, 20 s    72°C, 30 s    35        72°C, 10 min
CKLFSF3    94°C, 5 min     94°C, 20 s    64°C, 20 s    72°C, 70 s    35        72°C, 10 min
CKLFSF4    94°C, 5 min     94°C, 20 s    66°C, 20 s    72°C, 30 s    35        72°C, 10 min
CKLFSF5    94°C, 5 min     94°C, 20 s    55°C, 20 s    72°C, 30 s    35        72°C, 10 min
CKLFSF6    94°C, 5 min     94°C, 20 s    58°C, 20 s    72°C, 30 s    35        72°C, 10 min
CKLFSF7    94°C, 5 min     94°C, 20 s    64°C, 20 s    72°C, 60 s    35        72°C, 10 min
CKLFSF8    94°C, 5 min     94°C, 20 s    58°C, 20 s    72°C, 30 s    35        72°C, 10 min
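The programmed hold time of each reaction in Table 4 is the predenaturation step, plus the cycle count times the three per-cycle steps, plus the final extension. A minimal sketch of that arithmetic follows; the `PcrProgram` class is illustrative, and ramp times between temperatures, which depend on the thermocycler, are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class PcrProgram:
    """Step durations in seconds for one PCR program (temperatures omitted)."""
    predenature_s: int
    denature_s: int
    anneal_s: int
    extend_s: int
    cycles: int
    final_extension_s: int

    def hold_time_s(self) -> int:
        """Total programmed hold time, excluding temperature ramps."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.predenature_s + self.cycles * per_cycle + self.final_extension_s


# Values taken from the CKLFSF1 and CKLFSF3 rows of Table 4.
cklfsf1 = PcrProgram(300, 20, 20, 30, 35, 600)
cklfsf3 = PcrProgram(300, 20, 20, 70, 35, 600)

for name, program in (("CKLFSF1", cklfsf1), ("CKLFSF3", cklfsf3)):
    seconds = program.hold_time_s()
    print(f"{name}: {seconds} s ({seconds / 60:.1f} min)")
```

The longer extension steps for CKLFSF3 and CKLFSF7 add well over twenty minutes to those programs relative to the 30 s extension used for the other genes.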

Bioinformatics analysis of the protein sequence

Using the Needle software (HGMP, Hinxton, Cambridge, UK) to find conserved areas, CKLFSF family members were compared with CCL17/TARC, CCL22/MDC, and CX3CL1/fractalkine. The matrix used was EBLOSUM62; the gap open penalty was 4, and the gap extension penalty was 1.65 [15,16]. The sequence comparisons between human CKLFSF1–8 and their mouse homologues were conducted with the above software and parameters. The multiple sequence analyses of CKLFSFs were obtained using the ClustalW algorithm of the Lasergene software (DNASTAR, Inc., Madison, WI, USA). The transmembrane analyses were performed using the Expasy Web site (http://www.cbs.dtu.dk/services/TMHMM-2.0/).

Chromosomal localization analyses

To generate a continuous DNA sequence for some analyses, genomic contigs from the Human Genome Project were analyzed for the full-length assembled human CKLFSF cDNA sequences. The contig that was selected to analyze CKLF and CKLFSF1–4 was NT_010478.10; the contig that was used to analyze CKLFSF5 was NT_025892.9; the contig for CKLFSF6–8 analyses was NT_005817.7, and the contig for TM4SF11, SCYA22, SCYD1, and SCYA17 was NT_010463.10. For their mouse homologues, the genomic contig for mouse CKLF, CKLFSF1–4, TM4SF11, SCYA22, SCYD1, and SCYA17 was NW_000349.1, the contig for mouse CKLFSF5 was NW_000100.1, and the contig for mouse CKLFSF6–8 was NW_000361.1.

Acknowledgments

This work was supported by a grant from the National Natural Sciences Foundation of China (30000153) and was done under the auspices of the Chinese High Tech Program (863) (2001AA215061 and 2002BA711A01).

References

[1] A. Zlotnik, O. Yoshie, Chemokines: a new classification system and their role in immunity, Immunity 12 (2000) 121–127.
[2] K. Christopherson, R. Hromas, Chemokine regulation of normal and pathologic immune responses, Stem Cells 19 (5) (2001) 388–96.
[3] H. Nomiyama, T. Imai, J. Kusuda, R. Miura, D.F. Callen, O. Yoshie, Human chemokines fractalkine (SCYD1), MDC (SCYA22) and TARC (SCYA17) are clustered on chromosome 16q13, Cytogenet. Cell Genet. 81 (1998) 10–1.
[4] M. Hamacher, U. Pippirs, A. Kohler, H.W. Muller, F. Bosse, Plasmolipin: genomic structure, chromosomal localization, protein expression pattern, and putative association with Bardet–Biedl syndrome, Mamm. Genome 12 (2001) 933–7.
[5] W.L. Han, et al., Molecular cloning and characterization of chemokine-like factor 1 (CKLF1), a novel human cytokine with unique structure and potential chemotactic activity, Biochem. J. 357 (2001) 127–35.
[6] D.L. Xia, et al., Overexpression of chemokine-like factor 2 promotes the proliferation and survival of C2C12 skeletal muscle cells, Biochim. Biophys. Acta–Mol. Cell. Res. 1591 (1-3) (2002) 163–173.
[7] B.J. Rollins, Chemokines, Blood 90 (1997) 909–928.
[8] A. Zlotnik, J. Morales, J.A. Hedrick, Recent advances in chemokines and chemokine receptors, Crit. Rev. Immunol. 19 (1999) 1–47.
[9] J.P. Magyar, C. Ebensperger, N. Schaeren-Wiemers, U. Suter, Myelin and lymphocyte protein (MAL/MVP17/VIP17) and plasmolipin are members of an extended gene family, Gene 189 (2) (1997) 69–75.
[10] L.K. Laura, A.M. Kellie, L.N. David, Comparative genomic sequence analysis of the FXR gene family: FMR1, FXR1, and FXR2, Genomics 78 (3) (2001) 169–177.
[11] H. Lin, A.S. Ho, D. Haley-Vicente, J. Zhang, J. Bernal-Fussell, A.M. Pace, D. Hansen, K. Schweighofer, N.K. Mize, J.E. Ford, Cloning and characterization of IL-1HY2, a novel interleukin-1 family member, J. Biol. Chem. 276 (23) (2001) 20597–20602.
[12] C.S. Brian, P.M. Joseph, A.B. Jennifer, D.W. Jesse, P.J. Hong, J.W. Micheal, L.C. Thomas, B.M. Paul, Discovery of five conserved β-defensin gene clusters using a computational search strategy, Proc. Natl. Acad. Sci. USA 99 (4) (2002) 2129–2133.
[13] J.F. Bazan, et al., A new class of membrane-bound chemokine with a CX3C motif, Nature 385 (6617) (1997) 640–4.
[14] J. Lee, W.H. Ho, M. Maruoka, R.T. Corpuz, D.T. Baldwin, J.S. Foster, A.D. Goddard, D.G. Yansura, R.L. Vandlen, W.I. Wood, A.L. Gurney, IL-17E, a novel proinflammatory ligand for the IL-17 receptor homolog IL-17Rh1, J. Biol. Chem. 276 (2) (2001) 1660–4.
[15] E. Wallin, C. Wettergren, F. Hedman, G. von Heijine, Fast Needleman–Wunsch scanning of sequence databanks on a massively parallel computer, Comput. Appl. Biosci. 9 (1) (1993) 117–8.
[16] V.B. Streletc, I.N. Shindyalov, N.A. Kolchanov, L. Milanesi, Fast, statistically based alignment of amino acid sequences on the base of diagonal fragments of DOT-matrices, Comput. Appl. Biosci. 8 (6) (1992) 529–34.