GENOMICS 42, 46–54 (1997) ARTICLE NO. GE974709

Genomic Structure, Evolution, and Expression of Human FLII, a Gelsolin and Leucine-Rich-Repeat Family Member: Overlap with LLGL

Hugh D. Campbell,1 Shelley Fountain, Ian G. Young,* Charles Claudianos,† Jörg D. Hoheisel,‡ Ken-Shiung Chen,§ and James R. Lupski§

Molecular Evolution and Systematics Group and Centre for Molecular Structure and Function, Research School of Biological Sciences, The Australian National University, Canberra, ACT 2601, Australia; *Division of Biochemistry and Molecular Biology, John Curtin School of Medical Research, The Australian National University, Canberra, ACT 2601, Australia; †Division of Botany and Zoology, The Australian National University, and Division of Entomology, CSIRO, Canberra, ACT 2601, Australia; ‡Molecular-Genetic Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 506, D-69120 Heidelberg, Germany; and §Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030-3498

Received December 4, 1996; accepted March 6, 1997

The Drosophila melanogaster flightless-I gene is involved in cellularization processes in early embryogenesis and in the structural organization of indirect flight muscle. The encoded protein contains a gelsolin-like actin binding domain and an N-terminal leucine-rich repeat protein–protein interaction domain. The homologous human FLII gene encodes a 1269-residue protein with 58% amino acid sequence identity and is deleted in Smith–Magenis syndrome. We have cloned the FLII gene and determined its nucleotide sequence (14.1 kb). FLII has 29 introns, compared with 13 in Caenorhabditis elegans and 3 in D. melanogaster. The positions of several introns are conserved in FLII-related genes and in the domains and subdomains of the gelsolin-like regions, giving indications of gelsolin gene family evolution. In keeping with its function in indirect flight muscle in Drosophila, the human FLII gene was most highly expressed in muscle. The FLII gene lies adjacent to LLGL, the human homologue of the D. melanogaster tumor suppressor gene lethal(2) giant larvae. The 3′ end of the FLII transcript overlaps the 3′ end of the LLGL transcript, and the corresponding mouse genes Fliih and Llglh also overlap. The overlap region contains poly(A) signals for both genes and is strongly conserved between human and mouse. © 1997 Academic Press

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. U80184.

1 To whom correspondence should be addressed at Molecular Evolution and Systematics Group and Centre for Molecular Structure and Function, RSBS, The Australian National University, GPO Box 475, Canberra, ACT 2601, Australia. Telephone: 61-6-2495080. Fax: 61-6-2494437. E-mail: [email protected].

0888-7543/97 $25.00 Copyright © 1997 by Academic Press. All rights of reproduction in any form reserved.

INTRODUCTION

Depending on their severity, mutations in the Drosophila melanogaster flightless-I (fliI) gene (Homyk and Sheppard, 1977) cause flightlessness with abnormal myofibrillar arrangements in the indirect flight muscles (Deak et al., 1982; Miklos and de Couet, 1990) or lethality with abnormal gastrulation and only partial cellularization of the syncytial blastoderm (Perrimon et al., 1989). We have recently characterized the cDNA for Drosophila fliI (Campbell et al., 1993) and shown that the gene encodes a 1256-amino-acid protein with a gelsolin-like domain characteristic of proteins that interact with actin to cap and/or sever actin filaments (Hartwig and Kwiatkowski, 1991). The fliI protein also carries an N-terminal protein–protein interaction domain consisting of leucine-rich repeats (LRRs) (Kobe and Deisenhofer, 1995).

Highly conserved fliI homologues are also present in Caenorhabditis elegans and humans (49 and 58% amino acid sequence identity, respectively) (Campbell et al., 1993; Claudianos and Campbell, 1995), suggesting conservation of the function(s) of this protein over an evolutionary period of at least 500 million years. Interestingly, the human FLII gene, which maps to human chromosome 17p11.2, is deleted in Smith–Magenis syndrome (SMS), a relatively common (1 in 25,000 live births) microdeletion syndrome involving developmental abnormalities and mental retardation (Chen et al., 1995).

We report here the isolation of genomic cosmid clones spanning FLII, the nucleotide sequence of the gene, and the demonstration that it is most highly expressed in muscle. Conservation of intron positions within subunit domains and across other members of the gelsolin gene family gives some indications of the evolution of these genes, and this was also examined by phylogenetic analysis of the protein domains. The LLGL gene (Strand et al., 1995; Koyama et al., 1996), the human homologue of the D. melanogaster tumor suppressor gene lethal(2) giant larvae (l(2)gl), is shown to lie adjacent to the FLII gene in the opposite transcriptional orientation. The 3′ ends of the transcripts overlap for both the human FLII and LLGL genes and the corresponding murine genes Fliih and Llglh. The overlap region contains poly(A) signals for both genes and is highly conserved between human and mouse.

MATERIALS AND METHODS

Cloning of the FLII gene. Cosmids were isolated from a gridded library in SuperCos I (Stratagene) from Los Alamos National Laboratory (LA17NC01). The library was prepared from chromosome 17 flow-sorted from the mouse–human hybrid cell line 38L-27 (Kallioniemi et al., 1994). Cosmids were also isolated from the ICRF gridded chromosome 17 Reference Library (Zehetner and Lehrach, 1994). Screening was done with the 4.1-kb human FLII cDNA (Campbell et al., 1993) and cDNA probes from the 5′ (274-bp EcoRI) and 3′ (378-bp NotI) ends. DNA probes were labeled by random primer incorporation of [α-32P]dCTP. Cosmid DNA was isolated using Qiagen kits. NotI fragments were subcloned into pBluescript KS(+). Standard recombinant DNA methods were as described (Sambrook et al., 1989).

Sequence analysis. End sequencing was done using Applied Biosystems PRISM dye primer and dye terminator reagents. Sonicated fragments of the 13.7-kb NotI fragment were cloned into M13mp10 (Deininger, 1983) and sequenced with dye primer reagents. Sequences were determined on an Applied Biosystems 373A or 377 DNA sequencer. Specific primers were used to ensure coverage of both strands. The only exception was part of the final exon and a few bases 3′ to it that were determined on one strand only, using both dye primers and dye terminators. The sequence of the final exon agreed exactly with the cDNA sequence (Campbell et al., 1993). The overall sequencing redundancy level was >eightfold.

Computer methods. Sequence assembly and analysis used Staden (1987), Genetics Computer Group (Devereux et al., 1984), MacVector (Kodak), and EditView (Applied Biosystems) software. Sequence databases were searched using BLAST options at NCBI, Bethesda. Sequence alignments of proteins and their domains were obtained using the GCG programs Gap and PileUp, with gap weight 3.0 and gap length weight 0.1. End gaps were weighted like other gaps. Protein distance matrix analysis was performed with the Phylogenetic Inference Package, PHYLIP, Version 3.5c (Felsenstein, 1993). One hundred bootstrap replications were carried out via the program SEQBOOT. PROTDIST, in conjunction with the PAM001 matrix, produced distance datasets from which a neighbor-joining tree was calculated using NEIGHBOR (Saitou and Nei, 1987). Maximum parsimony analysis was carried out using PAUP (Swofford, 1993). Unrooted protein trees were rooted to severin as the ancestral taxon using the outgroup method.

Southern and Northern blot analysis. Human placental DNA (10 µg) was digested, separated on a 0.8% agarose gel, and blotted onto reinforced nitrocellulose membrane (Schleicher & Schuell, BA-S 85). The membrane was hybridized with the 32P-labeled 4.1-kb FLII cDNA fragment (Campbell et al., 1993). The final wash was at 65°C in 1× or 0.1× SSC, 0.1% SDS. A human MTN blot, Clontech Catalog No. 7760-1, was hybridized to the FLII cDNA probe or a human β-actin probe (Clontech) at 65°C in 5× SSC, 50 mM sodium phosphate buffer, pH 6.8, 10× Denhardt's solution, 2% SDS, 10 mM EDTA, 1 mM sodium pyrophosphate, 1 mM ATP, 100 µg/ml sonicated, denatured salmon sperm DNA and washed at 65°C in 1× SSC, 0.1% SDS. Autoradiography was performed with intensifying screens at −80°C. Signal intensity was quantified with a Molecular Dynamics PhosphorImager.

PCR. For nested PCR, 10 pmol of the outer primer 5′-AAG AGG ACA GGT GGG GGC CCA GCA C-3′ (LLGL cDNA, GenBank D50550, nucleotides 3487–3511) was used together with 10 pmol of an M13 reverse primer (5′-AGC GGA TAA CAA TTT CAC ACA GGA-3′) in a 50-µl reaction with 1 µl boiled human hippocampus cDNA library (Stratagene Catalog No. 936205) as template in a Perkin–Elmer system 9600 PCR machine using AmpliTaq DNA polymerase. After 1.5 min at 95°C, 40 cycles of 95°C, 30 s; 55°C, 30 s; and 72°C, 60 s were carried out, followed by 4.5 min at 72°C. An aliquot (1 µl) of this reaction was then used as template in a second PCR with the inner primer 5′-CAT CCT TCC CCC TCA CTT TGC AGA G-3′ (LLGL cDNA, GenBank D50550, nucleotides 3539–3563) and a T3 primer (5′-ATT AAC CCT CAC TAA AGG GA-3′), under the same reaction conditions. The 600-bp product was phosphorylated, purified on an agarose gel, and cloned into M13mp10 for dye primer sequencing.

RESULTS

Cloning and Analysis of FLII Genomic DNA

We previously reported the isolation of three cosmids spanning the 5′ and 3′ ends of the FLII gene (Chen et al., 1995). We have isolated further cosmids hybridizing to FLII cDNA probes from chromosome 17 cosmid libraries (Kallioniemi et al., 1994; Zehetner and Lehrach, 1994). A restriction map (EcoRI and NotI) of the cloned genomic DNA in 11 cosmids was generated, and hybridization to the FLII cDNA and probes from the 5′ and 3′ ends of the cDNA localized the FLII gene (not shown). Southern blot analysis of human genomic DNA (Fig. 1A) with the FLII cDNA probe showed that the gene was present as a single copy and was consistent with the map of the cloned FLII region and with the nucleotide sequence of the gene (see below), indicating the absence of major rearrangements in the cloned DNA. The same pattern of bands was observed on the genomic Southern blot at final wash stringencies of 1× or 0.1× SSC at 65°C, and with exposure times up to 68 h, indicating that there are no other closely related genes present detectable under these conditions.

Northern Analysis of FLII Gene Expression

A Northern blot containing 2 µg per lane of poly(A)+ RNA from human heart, brain, placenta, lung, liver, skeletal muscle, kidney, and pancreas was hybridized to the FLII cDNA probe. A band of ~4.4 kb was visible in all lanes after 2.4 h autoradiography (Fig. 1B). A strong band of this size was observed in all lanes after 19 h autoradiography, with no additional bands visible (not shown). After background subtraction, the signal intensities, relative to liver set as 1.0, were heart, 2.0; brain, 1.0; placenta, 0.8; lung, 1.5; liver, 1.0; skeletal muscle, 7.7; kidney, 0.8; pancreas, 0.8. Thus, FLII is expressed in all these tissues, with strongest expression in skeletal muscle, followed by heart, and then lung. Hybridization to a human β-actin probe (Fig. 1C) confirmed approximately equal loading of poly(A)+ RNA in each lane.

Sequence Analysis of the FLII Gene

Cosmid c5C2 contained a 13.7-kb NotI fragment that hybridized to the 5′ end cDNA probe and extended to the NotI site present near the 3′ end of the cDNA at nucleotide 3709 (Campbell et al., 1993). The 3′ end of the gene was present on an adjacent 9-kb NotI fragment. The complete nucleotide sequence of the 13.7-kb NotI fragment was determined on both strands by automated sequencing. One end of the 9-kb NotI fragment matched the 3′ end of the cDNA exactly, extending from the NotI site at 3709 in the cDNA sequence to the poly(A) attachment site (Campbell et al., 1993), with no introns present. The nucleotide sequence of the FLII gene that we have determined covers 14131 bp.

An EST clone, GenBank Accession No. R33910, extending 5′ to our previously determined cDNA sequence (Campbell et al., 1993), was identified and resequenced, extending the cDNA sequence a further 37 bp upstream across the putative ATG translation initiation codon. This is the first ATG in the open reading frame after an in-frame TAA stop codon upstream in the genomic sequence. The previous cDNA sequence was missing only the AT of the ATG translation initiation codon. Although we have not determined the exact position at which transcription of FLII begins, the results of Northern analysis suggest that the 5′ end of R33910 is probably not far from this point. Examination of the 207 bp of genomic sequence extending upstream from the 5′ end of R33910 reveals a very GC-rich region in which a TTAAA motif is embedded, possibly representing the TATA box for FLII. The TTAAA motif is located 61 bp upstream from the 5′ end of R33910. The 144 bp of the 5′ sequence extending from within the potential TATA box to the Sau3AI cloning site exactly matches the sequence of a CpG island clone (Cross et al., 1994; GenBank Accession No. Z59965), indicating that this is likely to represent at least a portion of the 5′ regulatory region of FLII.

Exon/Intron Structure of the FLII Gene

A schematic view of the human FLII gene is presented in Fig. 2, and a summary of the exon/intron boundaries is presented in Fig. 3. The FLII gene contains 29 introns, compared with 3 in the D. melanogaster fliI gene (GenBank Accession Nos. U01182 and U28044) (Campbell et al., 1993; de Couet et al., 1995) and 13 in the C. elegans gene (GenBank Accession Nos. U01183, M77697, L18807, and L07143) (Campbell et al., 1993; Sulston et al., 1992). The conservation of intron positions between the different characterized members of the FLII gene family (Fig. 4A) provides further strong support for the common evolutionary origin of these genes and indicates that a number of the present-day FLII gene introns were present in the FLII gene homologue in the latest common ancestor for humans, D. melanogaster, and C. elegans at least 500 million years ago.

Analysis of Exon/Intron Structure of Gelsolin Family Members

Apart from the FLII genes, the only other members of the gelsolin gene family for which detailed information is available on exon/intron structure are human villin (Pringault et al., 1991), human Cap G (Mishra et al., 1994), Dictyostelium discoideum protovillin (Hoffman et al., 1993), and human gelsolin (partial; Kwiatkowski et al., 1988; Witke et al., 1995). Recently, a novel member of the family has been found in C. elegans (GenBank Accession No. Z70755, reading frame K06A4.3) (Sulston et al., 1992).

The human villin gene (Pringault et al., 1991) contains 18 introns and spans 25 kb. A number of introns are conserved in position between human FLII and villin, and one of these corresponds to the single intron of D. discoideum protovillin (Hoffman et al., 1993) (Fig. 4). The C. elegans protein K06A4.3 (GenBank Accession No. Z70755) (Sulston et al., 1992) consists of domain 1 (subdomains 1, 2, and 3) followed by subdomain 6 (Fig. 4), and 5 of its 9 introns correspond in position with introns in human villin (Fig. 4A).

The human CAPG gene (Mishra et al., 1994) contains 9 introns and spans 16.6 kb. Intron 18 in domain 1 of the gelsolin-like portion of FLII is in exactly the same

FIG. 1. Southern blot analysis of human genomic DNA with the FLII cDNA probe and Northern blot analysis of FLII gene expression in various tissues. (A) High-molecular-weight human placental DNA (10 µg) was digested with restriction endonucleases EcoRI, HindIII, BamHI, PstI and HincII, separated on a 0.8% agarose gel, transferred to reinforced nitrocellulose membrane, and hybridized to the 4.1-kb FLII cDNA probe. The blot was exposed to X-ray film for 19.5 h at −80°C. (B) A Northern blot containing human poly(A)+ RNA (2 µg) from heart, brain, placenta, lung, liver, skeletal muscle (sk. muscle), kidney, and pancreas was hybridized to the 4-kb FLII cDNA probe. The blot was exposed to X-ray film for 2.4 h at −80°C. (C) Northern blot from B stripped and hybridized to a human β-actin probe. In all cases, the final wash conditions were 1× SSC, 0.1% SDS at 65°C. The sizes of markers in kilobases are indicated to the left of the blots.


position as intron 7 of CAPG (Fig. 4A). No other introns appear to be conserved in position between CAPG and FLII. None of the introns present near the 5′ end of the 70-kb human gelsolin gene (Kwiatkowski et al., 1988; Witke et al., 1995) are conserved in FLII. Mishra et al. (1994) state that "The first 5 exons of CAPG and gelsolin reveal nearly identical intron–exon junctions at 3 of 5 sites" based on unpublished information on the human and mouse gelsolin gene structures. Overall, the data indicate that Cap G is more closely related to domain 1 of gelsolin family members than to domain 2. The data also indicate that some present-day introns occurred in the latest common ancestor of gelsolin family members, prior to the divergence of lines leading to humans and Dictyostelium.

Analysis of Domains 1 and 2 of the Gelsolin-like Portion of Gelsolin Family Members

Gelsolins, adseverins, villins, and FLII proteins show evidence of an internal duplication, containing two copies of a unit similar to severin, fragmin, and Cap G (Campbell et al., 1993; Claudianos and Campbell, 1995; Way and Weeds, 1988; Bazari et al., 1988). We performed multiple alignments on the monomeric family members together with the separated domains 1 and 2 of dimeric family members and used these alignments to examine aspects of the evolution of the family. First, we examined the position of introns within the domains to see whether there is evidence of conservation of introns from the precursor from which domains 1 and 2 arose by gene duplication, and we found several cases where introns are in similar positions in domain 1 vs domain 2 of the gelsolin-like portions of family members (Fig. 4B). Second, we performed phylogenetic analyses on the separated domains to try to clarify the sequence of events by which the various family members have arisen. A distance matrix tree is shown in Fig. 5. Maximum parsimony methods give similar results (not shown). Cap G and the separated domains 1 all segregate together, and similarly, the separated domains 2 all segregate together. There is little evidence for gene conversion between the portions of the genes encoding domains 1 and 2 having played a significant role during the evolution of family members.

Overall, the results strongly indicate that the Cap G protein is more closely related to domain 1 of the dimeric family members and support the hypothesis that CAPG arose by loss of domain 2 from a gelsolin-like precursor gene (Claudianos and Campbell, 1995). In further support of this, the pattern of insertions and deletions (indels) in Cap G is identical (with one exception) with that of segment 1 of the gelsolins, whereas significant differences are present compared with all other segments 1 and all segments 2 (not shown), mirroring the phylogenetic trees (Fig. 5; Claudianos and Campbell, 1995). Domains 1 and 2 contain evidence of a triplication (Way and Weeds, 1988; Bazari et al., 1988). In this work, we have identified some introns that may have been present in the ancestral unit prior to the triplication (Fig. 4C).

The FLII and LLGL Transcripts Overlap

The opposite end of the 9-kb NotI fragment from that containing the 3′ portion of FLII was sequenced, and a very strong match to the previously described LLGL gene was found (Strand et al., 1995; Koyama et al., 1996). After a match of 106 bp extending from the NotI site, a short intron of 74 bp is present, and then the match resumes (Fig. 6). The close proximity of FLII and LLGL was also demonstrated by the fact that the FLII cDNA probe and a PCR-generated probe for LLGL (Strand et al., 1995; Koyama et al., 1996) identified five of the same cosmids (c5C2, c5F6, c92C10, c110H8, and c157D9) when used to screen a chromosome 17 cosmid library (Kallioniemi et al., 1994).

Reported cDNAs for LLGL are truncated at the 3′ end (Strand et al., 1995; Koyama et al., 1996). Therefore we used nested PCR on human hippocampus cDNA to extend the 3′ end of the LLGL cDNA. For this purpose, primers were designed to avoid an Alu element present at the 3′ end of the available LLGL sequence. As expected, one end of the 600-bp PCR product matches LLGL cDNA (GenBank Accession No. D50550) to the end of the available sequence (99.6% identity over 268 bases). The other end overlaps the 5′ end sequence (98.9% identity over 277 bases) from a single human EST clone (GenBank Accession No. W35195) that has been sequenced from both ends. The 3′ end sequence of this clone (GenBank Accession No. W23681), which contains a poly(A) signal and tail (original sequence trace downloaded from http://genome.wustl.edu/est/esthmpg.html), therefore represents the 3′ end of LLGL cDNA and was found to be 100% identical to the 3′ end of the FLII gene sequence we have determined (Fig. 7), establishing that the FLII and LLGL genes overlap.

We are also sequencing genomic Fliih, the mouse homologue of FLII. The sequence of 2.5 kb of Fliih 3′ to the final exon shows extensive sequence identity to mouse Llglh on the opposite strand, allowing for several introns (unpublished results). The sequence of Llglh cDNA includes the complete 3′ end (Tomotsune et al., 1993). The final exons of Fliih and Llglh overlap in the region of the poly(A) sites as illustrated in Fig. 7. Mouse brain Fliih cDNAs that we have analyzed are polyadenylated at the 5′-most of the poly(A) sites (Fig. 7), but human FLII EST cDNA clones from other tissues are also polyadenylated at the other site shown in Fig. 7, which also uses the variant ATTAAA poly(A)

FIG. 2. Schematic view of the structure of the human FLII gene. The exons of the gene are boxed and numbered 1–30. Shaded portions of the boxes represent untranslated regions, and the open boxes represent coding sequence. The ATG initiation codon and TAA termination codon are marked. The 3′ end of the LLGL gene (Strand et al., 1995; Koyama et al., 1996) is also shown overlapping with the 3′ end of FLII. The scale is indicated.

FIG. 3. The exon–intron boundaries of the human FLII gene. The numbers indicate the positions of the 5′ and 3′ nucleotides within the human FLII cDNA sequence (GenBank Accession No. U01184) (Campbell et al., 1993). On each line, the final six nucleotides of each exon are given, followed by the first eight bases of the intron. The intron number is given, with its total size, then the last eight bases of the intron, followed by the first six bases of the next exon. The GT and AG at the 5′ and 3′ ends of the introns are in boldface type and underlined.

FIG. 4. Conservation of intron position. Domains 1 and 2, subdomains 1–6, and the B, A, and C motifs of the subdomains are indicated schematically. hFLI, human FLII gene; celfli, C. elegans fliI gene; drofli, D. melanogaster fliI gene; hvil, human villin gene; protovil, D. discoideum protovillin gene; Z70755, C. elegans gene encoding reading frame K06A4.3 in GenBank entry Z70755; Cap G, human Cap G gene; hgel, human gelsolin gene. The intron number follows the abbreviation for each gene. Introns conserved within 1 base are indicated. (A) Introns conserved between different members of the gelsolin gene family. (B) Introns conserved between domains 1 and 2 of family members. The numbers in parentheses indicate the domains (1 or 2) within which the introns occur. (C) Introns conserved between the subdomains of family members. The numbers in parentheses indicate the subdomains (1–6) within which these introns occur.
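As a rough check on the identity values quoted in the text for the two ends of the 600-bp PCR product, the sketch below lists the mismatch counts that would round to a reported percentage. It assumes identity was computed as matching positions divided by alignment length and reported to one decimal place, which the text does not state; the function name is hypothetical.

```python
from typing import List


def mismatches_consistent_with(percent_identity: float, length: int) -> List[int]:
    """Return every mismatch count k for which 100 * (length - k) / length
    lies within half a reporting unit (0.05) of the stated percentage."""
    consistent = []
    for k in range(length + 1):
        identity = 100.0 * (length - k) / length
        if abs(identity - percent_identity) < 0.05:
            consistent.append(k)
    return consistent


# Values quoted in the text: 99.6% over 268 bases and 98.9% over 277 bases.
llgl_end = mismatches_consistent_with(99.6, 268)
est_end = mismatches_consistent_with(98.9, 277)
```

Under that assumption, 99.6% over 268 bases is consistent with a single difference and 98.9% over 277 bases with three; how gaps or ambiguous bases were scored is not known from the text.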


signal (unpublished results). Although we have only detected mouse brain Fliih cDNA clones polyadenylated at the 5′-most of the poly(A) sites (Fig. 7) (Campbell et al., 1993), it seems likely that mouse Fliih transcripts polyadenylated using the additional poly(A) site as in FLII (Fig. 7) may also occur, possibly in a tissue-specific manner.

DISCUSSION

In the present study we have cloned and sequenced the chromosomal FLII gene, the human homologue of the D. melanogaster fliI gene (Campbell et al., 1993). The FLII gene spans 14 kb of genomic DNA and is smaller than the genes for other mammalian members of the gelsolin gene family such as human gelsolin (70 kb) (Kwiatkowski et al., 1988), human villin (25 kb) (Pringault et al., 1991), and the monomeric Cap G (16.6 kb) (Mishra et al., 1994), although it contains many more introns than the latter two genes. The human gelsolin gene contains a minimum of 13 introns, but only the 5′ end has been characterized at the sequence level (Kwiatkowski et al., 1988).

FLII and its Drosophila and C. elegans homologues encode an N-terminal LRR domain found in more than 60 proteins (Kobe and Deisenhofer, 1995). In general, this domain is involved in protein–protein interactions (Kobe and Deisenhofer, 1995). Recently the first 3D structure of an LRR protein, that of the porcine ribonuclease inhibitor and of its complex with ribonuclease, has been determined (Kobe and Deisenhofer, 1995). The ligand for the LRR domain of the FLII protein is unknown, but may be a member of the ras family of proteins, based on analysis of the close relationship of the LRR domain of FLII proteins with LRR domains known to interact with ras, including those of yeast adenylate cyclase and the mammalian Rsu-1 proteins (Claudianos and Campbell, 1995).

The FLII proteins also contain a C-terminal gelsolin-like domain (Campbell et al., 1993). This internally duplicated domain is present in gelsolins, villins, adseverins, and the FLII proteins. A monomeric version of the domain is present in the slime mold proteins severin and fragmin and mammalian Cap G (Hartwig and Kwiatkowski, 1991; Mishra et al., 1994). Several introns conserved in position between domains 1 and 2 may have been present in the ancestral gene prior to the duplication. The monomeric domains contain evidence of an internal triplication (Way and Weeds, 1988; Bazari et al., 1988). Therefore, the evolution of the gelsolin-like domain has proceeded by way of a triplication, followed by a duplication (Way and Weeds, 1988; Bazari et al., 1988; Claudianos and Campbell, 1995). Some introns occur at common positions within the A and C motifs of the triplicated subdomains, suggesting they may have been present prior to the triplication. Although the concept that the monomeric Cap G protein represents an intermediate species on this evolutionary pathway is attractive, we suggested previously

FIG. 5. Distance matrix analysis of the separated domains 1 and 2 of members of the gelsolin gene family. Abbreviations of the names of the gelsolin family members are as in this paper or as used previously (Claudianos and Campbell, 1995), with 1 or 2 appended, as appropriate, to indicate gelsolin-like domain 1 or 2. drovil, D. melanogaster quail protein (Mahajan-Miklos and Cooley, 1994). Bootstrap values are indicated at the nodes. The node from which the segments 1 and 2 segregate is indicated by an asterisk.

FIG. 6. Mapping of LLGL near FLII. The nucleotide sequence of the first 200 bp of one end of the human genomic 9-kb NotI fragment from cosmid c5C2 is shown aligned with a portion of the nucleotide sequence of LLGL cDNA (GenBank Accession No. D50550) (Koyama et al., 1996) extending from the NotI site at 1446 in D50550. A short intron, which follows the GT/AG rule, is present in this region of LLGL. The N at nucleotide 86 in the 9-kb NotI sequence appears to be G in the trace file.
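The GT/AG rule mentioned for the short LLGL intron in Fig. 6 can be checked mechanically. The sketch below is illustrative only: the 106-bp match, the 74-bp intron, and the 200-bp window come from the text, but the sequences themselves are placeholders because the actual bases are not reproduced here, and the helper names are hypothetical.

```python
def follows_gt_ag(intron: str) -> bool:
    """True if the intron begins with GT (5' donor) and ends with AG (3' acceptor)."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def splice_out(genomic: str, intron_start: int, intron_length: int) -> str:
    """Remove one intron (0-based start) from a genomic sequence."""
    return genomic[:intron_start] + genomic[intron_start + intron_length:]


# Placeholder sequences (NOT the real LLGL bases): 106 bp of exon, a 74-bp
# intron obeying the GT/AG rule, then the remaining 20 bp of the 200-bp window.
upstream_exon = "A" * 106
intron = "GT" + "C" * 70 + "AG"
downstream_exon = "T" * 20
genomic_window = upstream_exon + intron + downstream_exon

rule_ok = follows_gt_ag(intron)
spliced = splice_out(genomic_window, intron_start=106, intron_length=74)
```

Removing the 74-bp intron from the 200-bp window leaves 126 bp that would be expected to align with the cDNA, 106 bp before the intron and 20 bp after it.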


that CAPG arose by loss of domain 2 from a gelsolin-like dimeric precursor gene (Claudianos and Campbell, 1995), based on phylogenetic analysis of family members. Evidence presented here strongly supports this conclusion. First, analysis of the exon/intron structure of family members indicates that Cap G is more closely related to domain 1 of the dimeric proteins than domain 2. A prediction from this is that the intron positions in Cap G will be found more similar to those of domains 1 than to domains 2 of gelsolin and adseverin when their complete genomic structures are available. Second, phylogenetic analysis of the separated domains 1 and 2 of family members (Fig. 5) strongly supports the close relationship of Cap G to domain 1 of gelsolin and adseverin. The duplication event must have occurred before the latest common ancestor of higher eukaryotes and Dictyostelium prior to 1.1 billion years ago, since Dictyostelium contains protovillin (Hoffman et al., 1993). We find no evidence for gelsolin-related sequences in the complete yeast genome, and the most closely related LRR sequence is that of adenylate cyclase (unpublished results). It is tempting to speculate that some of the positionally conserved introns (Fig. 4) may contain regulatory elements as proposed by Mattick (1994).

The gelsolin-like domain of family members is involved in the capping and severing of actin filaments (Hartwig and Kwiatkowski, 1991). Cap G caps actin filaments, but has no severing activity, although this can be restored by changing small divergent portions of the Cap G sequence back to the corresponding residues of gelsolin domain 1 (Southwick, 1995). It has recently been shown using expressed material that the gelsolin-like domain of the human FLII protein binds actin (Orloff et al., 1995).

The presence of an actin-binding domain and a domain that may bind a ras-like molecule (Campbell et al., 1993; Claudianos and Campbell, 1995), combined with the phenotype of homozygous lethal mutations in Drosophila fliI, suggests that the fliI protein is involved in rearrangements of the actin cytoskeleton during cellularization in early embryogenesis, possibly involving ras family member-mediated cytoskeletal regulatory processes. The fliI protein is also involved in the organization of indirect flight muscle (Deak et al., 1982; Miklos and de Couet, 1990). Presumably the human FLII protein is involved in functionally analogous processes. In this respect it is interesting that the human FLII gene is most highly expressed in muscle.

The FLII gene was the first protein coding gene mapped into the critical region deleted in SMS (Chen et al., 1995), a relatively common (≥1 in 25,000 live births) microdeletion syndrome with a wide range of physical, developmental, functional, and behavioral effects (Chen et al., 1995). Other genes mapping to this region include LLGL, the human homologue of the D. melanogaster lethal(2) giant larvae gene (l(2)gl) (Strand et al., 1995; Koyama et al., 1996). We have shown that the 3′ end of LLGL overlaps the 3′ end of FLII and that the 3′ end of the mouse homologue Llglh (Tomotsune et al., 1993) overlaps the 3′ end of Fliih. LLGL and FLII are in the opposite transcriptional orientation in a tail-to-tail arrangement (Figs. 2 and 7). Since Llglh maps to mouse chromosome 11 (Kuwabara et al., 1994), Fliih must also map there. In D. melanogaster, l(2)gl maps to 21A, whereas fliI maps to 19F, and in C. elegans, the homologue of l(2)gl, F56F10.4 (GenBank Accession No. U51993), maps to the X chromosome, whereas the fliI homologue maps to chromosome III (Campbell et al., 1993; Sulston et al., 1992).

3′ overlaps of mammalian genes are apparently rare (Tee et al., 1995; Bristow et al., 1993). The significance of the present instance of overlapping genes is unclear, although the conservation of the overlap region between human and mouse suggests the possibility of conserved function and also suggests that mutations in this region could affect expression of both genes. In this context, it is intriguing that the LLGL protein interacts with nonmuscle myosin II in a cytoskeletal network (Strand et al., 1995), while the FLII protein interacts with cytoskeletal actin (Campbell et al., 1993; Orloff et al., 1995).

The size of the FLII mRNA from Northern analysis is 4.4 kb, in agreement with the composite cDNA size

FIG. 7. Region of overlap of human FLII and LLGL and mouse Fliih and Llglh genes. Poly(A) signals and sites for which polyadenylated cDNAs have been observed are indicated by shaded text and black boxes, respectively. (+) indicates that the gene is on the (+) strand and (−) indicates that the gene is on the (−) strand.

of 4142 bp, allowing for a poly(A) tail of ~100–200 bp. It appeared likely from alignments with C. elegans and D. melanogaster fliI sequences that two bases of the ATG translation initiation codon were missing from the FLII cDNA sequence (Campbell et al., 1993). An EST clone, GenBank Accession No. R33910, extends 37 bp upstream and, with the genomic sequence, confirms the ATG initiation codon. A potential TATA box and a CpG island (Cross et al., 1994), indicative of a 5′ regulatory region, occur just further 5′ to this point.

The complete sequence of the human FLII gene will be of utility in the analysis of human genetic disorders due to mutations in this conserved gene. These may include muscle disorders, based on the muscle phenotype of the viable Drosophila alleles and the elevated level of expression of FLII in human skeletal muscle, and could include some features of SMS (Chen et al., 1995), if mutations in the FLII gene on the nondeleted chromosome 17 contribute to the phenotype. Alternatively, some SMS phenotypic features may result from FLII haploinsufficiency via the common deletion at 17p11.2 (Chen et al., 1995). If this is the case, loss-of-function mutations in FLII could also generate some SMS-like phenotypic features. Mouse Fliih is of similar size to the human FLII gene (unpublished results), and gene targeting using the 15-kb mouse sequence is planned. The generation of altered alleles of Fliih may provide important insight into the biological role of the FLII protein and into whether haploinsufficiency at the FLII locus contributes to the phenotype of SMS.

ACKNOWLEDGMENTS

We thank A. Gibbs for support, C. McCrae and P. Milburn of the ANU Biomolecular Resource Facility for running the sequencers, A. Lewis, T. Goodsell, and J. Whitehead for illustrations, and J. Wilson and M. Whittaker for photography.

REFERENCES

Bazari, W. L., Matsudaira, P., Wallek, M., Smeal, T., Jakes, R., and Ahmed, Y. (1988). Villin sequence and peptide map identify six homologous domains. Proc. Natl. Acad. Sci. USA 85: 4986–4990.
Bristow, J., Tee, M. K., Gitelman, S. E., Mellon, S. H., and Miller, W. L. (1993). Tenascin-X: A novel extracellular matrix protein encoded by the human XB gene overlapping P450c21B. J. Cell Biol. 122: 265–278.
Campbell, H. D., Schimansky, T., Claudianos, C., Ozsarac, N., Kasprzak, A. B., Cotsell, J. N., Young, I. G., de Couet, H. G., and Miklos, G. L. G. (1993). The Drosophila melanogaster flightless-I gene involved in gastrulation and muscle degeneration encodes gelsolin-like and leucine-rich-repeat domains, and is conserved in Caenorhabditis elegans and human. Proc. Natl. Acad. Sci. USA 90: 11386–11390.
Chen, K.-S., Gunaratne, P. H., Hoheisel, J. D., Young, I. G., Miklos, G. L. G., Greenberg, F., Shaffer, L. G., Campbell, H. D., and Lupski, J. R. (1995). The human homologue of the Drosophila melanogaster flightless-I gene (fliI) maps within the Smith–Magenis microdeletion critical region in 17p11.2. Am. J. Hum. Genet. 56: 175–182.
Claudianos, C., and Campbell, H. D. (1995). The novel flightless-I gene brings together two gene families, actin binding proteins related to gelsolin and leucine-rich-repeat proteins involved in ras signal transduction. Mol. Biol. Evol. 12: 405–414.
Cross, S. H., Charlton, J. C., Nan, X., and Bird, A. P. (1994). Purification of CpG islands using a methylated DNA binding column. Nature Genet. 6: 236–244.
de Couet, H. G., Fong, K. S. K., Weeds, A. G., McLaughlin, P. J., and Miklos, G. L. G. (1995). Molecular and mutational analysis of a gelsolin-family member encoded by the flightless I gene of Drosophila melanogaster. Genetics 141: 1049–1059.
Deak, I. I., Bellamy, P. R., Bienz, M., Dubuis, Y., Fenner, E., Gollin, M., Rähmi, A., Ramp, T., Reinhardt, C. A., and Cotton, B. (1982). Mutations affecting the indirect flight muscles of Drosophila melanogaster. J. Embryol. Exp. Morphol. 69: 61–81.
Deininger, P. (1983). Random subcloning of sonicated DNA: Application to shotgun DNA sequence analysis. Anal. Biochem. 129: 216–223.
Devereux, J., Haeberli, P., and Smithies, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12: 387–395.
Felsenstein, J. (1993). PHYLIP—Phylogeny inference package, University of Washington, Seattle, WA.
Hartwig, J., and Kwiatkowski, D. (1991). Actin-binding proteins. Curr. Opin. Cell Biol. 3: 87–97.
Hoffman, A., Noegel, A. A., Bomblies, L., Lottspeich, F., and Schleicher, M. (1993). The 100kDa F-actin capping protein of Dictyostelium amoebae is a villin prototype (‘protovillin’). FEBS Lett. 328: 71–76.
Homyk, T., Jr., and Sheppard, D. E. (1977). Behavioral mutants of Drosophila melanogaster. I. Isolation and mapping of mutations which decrease flight ability. Genetics 87: 95–104.
Kallioniemi, O.-P., Kallioniemi, A., Mascio, L., Sudar, D., Pinkel, D., Deaven, L., and Gray, J. (1994). Physical mapping of chromosome 17 cosmids by fluorescence in situ hybridization and digital image analysis. Genomics 20: 125–128.
Kobe, B., and Deisenhofer, J. (1995). Proteins with leucine-rich repeats. Curr. Opin. Struct. Biol. 5: 409–416.
Koyama, K., Fukushima, Y., Inazawa, J., Tomotsune, D., Takahashi, N., and Nakamura, Y. (1996). The human homologue of the murine Llglh gene (LLGL) maps within the Smith–Magenis syndrome region in 17p11.2. Cytogenet. Cell Genet. 72: 78–82.
Kuwabara, K., Takahashi, Y., Tomotsune, D., Takahashi, N., and Kominami, R. (1994). mgl-1, a mouse homologue of the Drosophila tumor-suppressor gene l(2)gl, maps to chromosome 11. Genomics 20: 337–338.
Kwiatkowski, D. J., Mehl, R., and Yin, H. L. (1988). Genomic organization and biosynthesis of secreted and cytoplasmic forms of gelsolin. J. Cell Biol. 106: 375–384.
Mahajan-Miklos, S., and Cooley, L. (1994). The villin-like protein encoded by the Drosophila quail gene is required for actin bundle assembly during oogenesis. Cell 78: 291–301.
Mattick, J. (1994). Introns: Evolution and function. Curr. Opin. Cell Biol. 4: 823–831.
Miklos, G. L. G., and de Couet, H. G. (1990). The mutations previously designated as flightless-I³, flightless-O² and standby are members of the W-2 lethal complementation group at the base of the X-chromosome of Drosophila melanogaster. J. Neurogenet. 6: 133–151.
Mishra, V. S., Henske, E. P., Kwiatkowski, D. J., and Southwick, F. S. (1994). The human actin-regulatory protein Cap G: Gene structure and chromosome location. Genomics 23: 560–565.
Orloff, G. J., Allen, P. G., Miklos, G. L. G., Young, I. G., Campbell, H. D., and Kwiatkowski, D. J. (1995). Human flightless-I has actin binding ability. Mol. Biol. Cell 6: 139.
Perrimon, N., Smouse, D., and Miklos, G. L. G. (1989). Developmental genetics of loci at the base of the X chromosome of Drosophila melanogaster. Genetics 121: 313–331.
Pringault, E., Robine, S., and Louvard, D. (1991). Structure of the human villin gene. Proc. Natl. Acad. Sci. USA 88: 10811–10815.
Saitou, N., and Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406–425.
Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989). “Molecular Cloning: A Laboratory Manual,” 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
Southwick, F. S. (1995). Gain-of-function mutations conferring actin-severing activity to human macrophage Cap G. J. Biol. Chem. 270: 45–48.
Staden, R. (1987). Computer handling of DNA sequencing projects. In “Nucleic Acid and Protein Sequence Analysis: a Practical Approach” (M. J. Bishop and C. J. Rawlings, Eds.), pp. 173–217, IRL Press, Oxford.
Strand, D., Unger, S., Corvi, R., Hartenstein, K., Schenkel, H., Kalmes, A., Merdes, G., Neumann, B., Krieg-Schneider, F., Coy, J. F., Poustka, A., Schwab, M., and Mechler, B. M. (1995). A human homologue of the Drosophila tumour suppressor gene l(2)gl maps to 17p11.2–12 and codes for a cytoskeletal protein that associates with nonmuscle myosin II heavy chain. Oncogene 11: 291–301.
Sulston, J., Du, Z., Thomas, K., Wilson, R., Hillier, L., Staden, R., Halloran, N., Green, P., Thierry-Mieg, J., Qiu, L., Dear, S., Coulson, A., Craxton, M., Durbin, R., Berks, M., Metzstein, M., Hawkins, T., Ainscough, R., and Waterston, R. (1992). The C. elegans genome sequencing project: A beginning. Nature 356: 37–41.
Swofford, D. (1993). PAUP: Phylogenetic analysis using parsimony, Version 3.1, Illinois Natural History Survey, Champaign, IL.
Tee, M. K., Thomson, A. A., Bristow, J., and Miller, W. L. (1995). Sequences promoting the transcription of the human XA gene overlapping P450c21A correctly predict the presence of a novel, adrenal-specific, truncated form of tenascin-X. Genomics 28: 171–178.
Tomotsune, D., Shoji, H., Wakamatsu, Y., Kondoh, H., and Takahashi, N. (1993). A mouse homologue of the Drosophila tumour-suppressor gene l(2)gl controlled by Hox-C8 in vivo. Nature 365: 69–72.
Way, M., and Weeds, A. (1988). Nucleotide sequence of pig plasma gelsolin. Comparison of protein sequence with human gelsolin and other actin-severing proteins shows strong homologies and evidence for large internal repeats. J. Mol. Biol. 203: 1127–1133.
Witke, W., Sharpe, A. H., Hartwig, J. H., Azuma, T., Stossel, T. P., and Kwiatkowski, D. J. (1995). Hemostatic, inflammatory, and fibroblast responses are blunted in mice lacking gelsolin. Cell 81: 41–51.
Zehetner, G., and Lehrach, H. (1994). The Reference Library System—sharing biological material and experimental data. Nature 367: 489–491.
