Copyright © 2003 by the Genetics Society of America

Phospholipase C-γ Contains Introns Shared by src Homology 2 Domains in Many Unrelated Proteins

Charlene M. Manning,1 Wendy R. Mathews,2 Leah P. Fico and Justin R. Thackeray3
Biology Department, Clark University, Worcester, Massachusetts 01610
Manuscript received September 10, 2002
Accepted for publication March 13, 2003

ABSTRACT

Many proteins with novel functions were created by exon shuffling around the time of the metazoan radiation. Phospholipase C-γ (PLC-γ) is typical of proteins that appeared at this time, containing several different modules that probably originated elsewhere. To gain insight into both PLC-γ evolution and structure-function relationships within the Drosophila PLC-γ encoded by small wing (sl), we cloned and sequenced the PLC-γ homologs from Drosophila pseudoobscura and D. virilis and compared their structure and predicted amino acid sequences with PLC-γ homologs in other animals. PLC-γ has been well conserved throughout, although structural differences suggest that the role of tyrosine phosphorylation in enzyme activation differs between vertebrates and invertebrates. Comparison of intron positions demonstrates that extensive intron loss has occurred during invertebrate evolution and also reveals the presence of conserved introns in both the N- and C-terminal PLC-γ SH2 domains that are present in SH2 domains in many other proteins. These and other conserved SH2 introns suggest that the SH2 domains in PLC-γ are derived from an ancestral domain that was shuffled not only into PLC-γ, but also into many other unrelated genes during animal evolution.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AF543827 and AF543828.
1 Present address: Department of Genetics, HHMI, Harvard Medical School, 200 Longwood Ave., Boston, MA 02115.
2 Present address: Department of Biology, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218.
3 Corresponding author: Biology Department, Clark University, 950 Main St., Worcester, MA 01610. E-mail: [email protected]

Genetics 164: 433–442 (June 2003)

Four distinct types of phospholipase C (PLC) protein, β, γ, δ, and ε, are produced in mammals by a rather diverse family of more than 10 genes. All PLCs catalyze the hydrolysis of the membrane phospholipid phosphatidyl inositol 4,5-bisphosphate [PI(4,5)P2] into inositol 1,4,5-trisphosphate (InsP3) and diacylglycerol. The former is bound by specific receptors embedded in the endoplasmic reticular membrane, leading to a transient increase in intracellular calcium, while the latter is a direct activator of protein kinase C. The pattern of expression, mechanism of activation, and cellular function vary considerably among the four types (reviewed by Rebecchi and Pentyala 2000; Lopez et al. 2001; Song et al. 2001). All PLC proteins have three domains in common: a C2 domain and catalytic domains X and Y; three of the PLC types, β, γ, and δ, also share an N-terminal pleckstrin homology (PH) domain and EF hands. PLC-δ may be the ancestral form, because β, γ, and ε types are absent from plants and simple eukaryotes such as yeast.

PLC-γ is particularly interesting from an evolutionary standpoint, because it has a central region between the X and Y catalytic domains that is unique among PLC subtypes. This central region contains one src homology 3 (SH3) and two SH2 domains within a split PH domain, implying a series of shufflings and duplications to produce the modern PLC-γ structure from an ancestral PLC form. These events were apparently completed before the parazoan-eumetazoan split, because the sponge Ephydatia fluviatilis has a PLC-γ homolog with identical structure to all other animal forms (Koyanagi et al. 1998). The SH2 and SH3 domains of PLC-γ are typical examples of the many widely distributed modules that are found in many different proteins involved in signal transduction; each domain is presumed to have arisen once and then been spread both by gene duplication and by being co-opted into existing genes by exon shuffling via retrotransposition, illegitimate recombination, or long interspersed nuclear element-mediated 3′ transduction (Long 2001). In the case of PLC-γ these acquired domains permit specific protein:protein interactions: the SH2 domains are thought to allow PLC-γ to bind to specific phosphorylated tyrosine residues on the cytoplasmic face of activated receptor tyrosine kinases, whereas the SH3 domain is essential in allowing PLC-γ to activate the phosphatidylinositol-3-OH kinase [PI(3)K] enhancer (Ye et al. 2002).

PLC-γ is involved in regulating many aspects of cell physiology, including proliferation, differentiation, and motility (reviewed by Rebecchi and Pentyala 2000). PLC-γ activation is triggered by the binding of a wide variety of growth factors, cytokines, and immunoglobulins to their membrane-bound receptor. Two distinct PLC-γ homologs encoded by different genes, PLC-γ1 and PLC-γ2, are present in mammals, whereas all invertebrates described to date have only a single homolog. In Drosophila, the PLC-γ homolog is encoded by small wing (sl; Emori et al. 1994; Thackeray et al. 1998), a gene first identified by Bridges in 1915 (Lindsley and Zimm 1992). Flies lacking sl function are viable, with mild defects in eye and wing development due to overactivation of the epidermal growth factor receptor signaling pathway (Thackeray et al. 1998). This weak phenotype contrasts markedly with the more significant effects resulting from loss of PLC-γ function in vertebrates: PLC-γ1 knockout mice die in early embryogenesis (Ji et al. 1997), while mice lacking PLC-γ2 are viable, but have defects in B cell development (Hashimoto et al. 2000; Wang et al. 2000). The amino acid sequence of Sl is generally similar to its mammalian homologs (Emori et al. 1994), suggesting that its mode of activation and cellular function are conserved. However, subtle differences in its activation may exist; for example, two of the three tyrosines in mammalian PLC-γ1, which become phosphorylated during activation, are missing in Sl. In an effort to determine the functional significance of these and other differences between vertebrate and invertebrate PLC-γ, we cloned and sequenced two additional Drosophila PLC-γ homologs. We present here a comparison of both the amino acid sequence and the gene structure of these sl homologs in Drosophila, as well as among a variety of other invertebrate and vertebrate PLC-γ homologs. Our analysis not only sheds light on PLC-γ function and evolution, but also provides evidence for a common origin of SH2 domains in all proteins.

MATERIALS AND METHODS

λ-library construction and screening: λ-libraries of Drosophila pseudoobscura and D. virilis genomic DNA were constructed in λFIXII (Stratagene, La Jolla, CA) and λGEM11 (Promega, Madison, WI), respectively. Briefly, genomic DNA was prepared from adults by standard methods, partially digested with DpnII, and separated on a 0.5% agarose gel. Fragments in the 10- to 20-kb range were recovered onto dialysis membrane and the ends partially filled in with dCTP and dTTP. The genomic fragments were then ligated overnight at 4° to XhoI-digested λDNA that had been partially filled in with dATP and dGTP. The ligated DNA was packaged in vitro using Packagene extracts according to the manufacturer's instructions (Stratagene). Because the D. virilis library did not contain any clones corresponding to the 3′ end of the Drosophila PLC-γ homolog (sl), we obtained a D. virilis λ-genomic library from Thomas Kaufman (Indiana University) that was originally constructed by Ronald Blackman (Blackman and Meselson 1986). Amplified libraries were initially screened with a 1.8-kb BamHI genomic fragment from D. melanogaster sl that includes the SH2 and SH3 domains of PLC-γ. About 100,000 plaques from each species were lifted onto nylon membranes and probed with the radiolabeled BamHI fragment in standard filter hybridizations (Sambrook et al. 1989). Posthybridization washes were performed at low stringency (2.5× SSC, 0.1% SDS at 65°). Positively hybridizing plaques were purified by two further rounds of plating and hybridization. Purified plaques were obtained and λ-genomic DNA was extracted by the plate lysate method (Sambrook et al. 1989).

DNA sequencing: Restriction fragments containing sl exons were identified by Southern blot hybridization using a variety of probes corresponding to the sl open reading frame (ORF). For the D. virilis gene, several of these fragments were subcloned into pBluescript KS+ (Stratagene) and sequenced on both strands by primer walking. Most of the D. pseudoobscura gene was sequenced by transposon tagging of a 4.8-kb PstI/HindIII subclone with the EZ::TN<KAN-2> kit, using the conditions recommended by the manufacturer (Epicentre Technologies, Madison, WI). The 5′ end of the D. pseudoobscura gene sequence was determined by primer walking within an overlapping 2.1-kb PstI/SacI fragment. For each species, the subcloned fragments included the entire sl homolog. Sequencing reactions on purified plasmid DNA were performed using the ABI Prism Big Dye terminator cycle sequencing kit (Applied Biosystems, Foster City, CA). Sequencing reactions were purified by ethanol precipitation with pellet paint (Novagen, Madison, WI) and separated on a Perkin-Elmer (Norwalk, CT) ABI Prism 377 DNA sequencer. Sequences were edited with Sequencher 4.1 (Gene Codes, Ann Arbor, MI) and further analyzed with MacVector 6.01 (Accelrys, Madison, WI).

Phylogenetic analysis: Amino acid sequences were aligned using either ClustalW 1.4 (Thompson et al. 1994) or ClustalX 1.6 (Thompson et al. 1997) and in some cases improved by eye. Unrooted trees were generated from the aligned sequences by PAUP (version 4.0b10; Swofford 2002) using bootstrap with heuristic search and 1000 replicates; gaps were treated as missing; parsimony was used as the optimality criterion; starting trees were obtained by stepwise addition; and branch swapping was by tree-bisection reconnection.

Tyrosine phosphorylation prediction: The probability of a given tyrosine being phosphorylated was determined using the NetPhos algorithm (http://www.cbs.dtu.dk/services/NetPhos/), which uses a neural network method to compare tyrosines and their local context to tyrosines known to be phosphorylated (Blom et al. 1999). A score >0.5 predicts that a tyrosine is in a context that will be phosphorylated in vivo.

RESULTS

Isolation of sl homologs from D. virilis and D. pseudoobscura: We constructed λ-libraries of D. pseudoobscura and D. virilis genomic DNA and screened them with various genomic probes from the D. melanogaster PLC-γ homolog encoded by sl. Several clones were isolated and mapped by Southern blotting and various restriction fragments were subcloned and sequenced from each species. We compared the two putative sl homologs with the D. melanogaster sl sequence by dot-matrix analysis and identified sequences corresponding to all four sl exons. Each presumed exon shows a high level of DNA sequence identity to D. melanogaster sl (data not shown) and the splice junctions all conform to the well-established consensus sequences previously described in Drosophila (Mount et al. 1992). The intron/exon structure has remained unchanged, with minor differences in length of introns 1 and 2 between the three species. The conserved exon/intron structure and very similar DNA sequences strongly suggest that the genes we describe here are the sl homologs in D. pseudoobscura and D. virilis.
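The dot-matrix comparison used to locate the sl exons can be illustrated with a minimal sketch. The source does not say which program or parameters were used, so the function name `dot_matrix`, the window size, the match threshold, and the toy sequences below are all assumptions made for illustration.

```python
def dot_matrix(seq_a, seq_b, window=5, min_matches=4):
    """Return (i, j) pairs where the windows starting at seq_a[i] and seq_b[j]
    agree at min_matches or more positions (a simple windowed dot plot)."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            window_a = seq_a[i:i + window]
            window_b = seq_b[j:j + window]
            matches = sum(a == b for a, b in zip(window_a, window_b))
            if matches >= min_matches:
                hits.append((i, j))
    return hits


# Hypothetical toy data: one conserved "exon-like" block embedded in
# unrelated flanking sequence of different lengths.
shared_block = "ATGGCGTTCAAG"
seq_a = "CCCCC" + shared_block + "TTTTT"
seq_b = "GGG" + shared_block + "AAAAAAA"

hits = dot_matrix(seq_a, seq_b)

# The conserved block shows up as a run of hits along one diagonal (i - j = 2),
# because it starts at index 5 in seq_a and index 3 in seq_b.
diagonal = [(i, j) for i, j in hits if i - j == 2]
print(diagonal)
```

In a real comparison the conserved exons would appear as long diagonals separated by gaps or offsets where intron lengths differ between species.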

Figure 1. (A) Phylogram of PLC-γ homologs. The number of changes between each sequence is represented by the length of each branch, and the single best tree is shown. Numbers at each node indicate bootstrap values. (B) Alignment of vertebrate and invertebrate PLC-γ proteins in the region where tyrosine phosphorylation occurs in mammalian PLC-γ. The amino acid sequence between the C-terminal SH2 domain and the SH3 domain was aligned using ClustalW 1.4 and improved by eye. Dark-gray shading indicates a position where at least half the proteins have the same amino acid; light-gray shading indicates a position where at least half the proteins have an amino acid with similar biochemical properties. The tyrosines known to be phosphorylated in PLC-γ1 and PLC-γ2 are indicated by a circled P above the alignment.
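Figure 1B covers the region that was scored with NetPhos. As a minimal sketch, the cutoff stated in Materials and Methods (a score above 0.5 predicts phosphorylation) can be applied to the scores reported in Results for this region. The scores are taken from the text; the dictionary labels, the constant `NETPHOS_CUTOFF`, and the function name are shorthand chosen here, not part of NetPhos itself.

```python
NETPHOS_CUTOFF = 0.5  # from Materials and Methods: a score > 0.5 predicts phosphorylation

# NetPhos scores as reported in Results; the keys are informal labels (assumed naming).
reported_scores = {
    "E. fluviatilis, near Y771 (N-terminal Tyr)": 0.595,
    "E. fluviatilis, near Y771 (C-terminal Tyr)": 0.787,
    "A. gambiae, Y783 position": 0.749,
    "P. lividus, Y783 position": 0.909,
    "E. fluviatilis, Y783 position": 0.491,
    "C. elegans, Y783 position": 0.420,
    "Drosophila (each species), Y783 position": 0.154,
}


def predicted_phosphorylated(score, cutoff=NETPHOS_CUTOFF):
    """Apply the strict cutoff: only scores above it count as predicted sites."""
    return score > cutoff


above_cutoff = [site for site, score in reported_scores.items()
                if predicted_phosphorylated(score)]

for site, score in reported_scores.items():
    label = "above cutoff" if predicted_phosphorylated(score) else "below cutoff"
    print(f"{site}: {score:.3f} ({label})")
```

Read strictly, the E. fluviatilis (0.491) and C. elegans (0.420) tyrosines at the Y783 position fall just under the cutoff, although both are far higher than the Drosophila score of 0.154, which is the contrast the text draws.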

Phylogenetic analysis of PLC-γ proteins: The PLC-γ homolog encoded by sl is the only one present in the now almost-complete D. melanogaster genome sequence, and only a single gene has been identified in the other invertebrate genomes sequenced to date. By contrast, a gene duplication event produced separate γ1 and γ2 subtypes at some point in the vertebrate lineage; each subtype has identical domain structure and similar sequence, but distinct functions. To reveal the relationships between PLC-γ homologs, we produced a translation of the putative ORFs from both D. pseudoobscura and D. virilis and compared them to all other complete PLC-γ homologs in the GenBank database, including D. melanogaster, the mosquito Anopheles gambiae, the sponge E. fluviatilis, the nematode Caenorhabditis elegans, the cow Bos taurus (γ1), the rat Rattus rattus (γ1 and γ2), and Homo sapiens (γ1 and γ2). The best tree from this comparison is shown in Figure 1A. The tree divides the taxa unambiguously into a mammalian clade and a dipteran clade and gives moderate support to placing the nematode C. elegans closer to the arthropods than to the mammals. The branching arrangement among species of Drosophila is consistent with previous molecular phylogenies of these species (e.g., Russo et al. 1995). The phylogram also suggests that the PLC-γ1/γ2 duplication event occurred long before the mammalian radiation, because there is an ~10-fold greater number of changes between the PLC-γ1/γ2 duplication and the mammalian radiation (383 and 401 changes in the γ1 and γ2 branches, respectively) compared to between 21 and 62 changes in each lineage since this radiation. Although we cannot assume that rates of evolution among paralogs in the vertebrate lineage have been constant, at the very least this disparity suggests that the PLC-γ1/γ2 duplication did not immediately precede the mammalian radiation.

Sequence conservation among PLC-γ proteins: The 11 PLC-γ homologs shown in Figure 1A are well conserved throughout the sequence, with the highest level in the X and Y catalytic domains (data not shown; full alignment is available at http://prism.clarku.edu/departments/biol/faculty/Thackeray/PLCgamma.pdf). This is particularly noticeable in the N-terminal half of region X, where there are two blocks of 9 and 10 amino acids, each with 100% identity among all 11 species, the two largest contiguous blocks anywhere in the alignment. The two PH domains show a lower level of conservation than the catalytic regions, especially in C. elegans, in which most of the sequence fails to align clearly with the other species. However, despite the low level of conservation, two PH domains are recognized in the C. elegans sequence by an InterPro motif scan (data not shown). The SH3, C2, and both of the SH2 domains all show consistently high levels of sequence identity, confirming their critical importance to PLC-γ function. The region with the weakest level of conservation lies between the N-terminal PH domain and region X, where four EF hand motifs were previously identified in PLC-γ.

Tyrosine phosphorylation sites in PLC-γ: PLC-γ1 becomes phosphorylated on three tyrosines, Y771, Y783, and Y1254, during its activation in growth factor signaling pathways (Kim et al. 1990; Wahl et al. 1990), although Y783 appears to be the only one required for activation in vivo (Kim et al. 1991) and no tyrosine corresponding to Y1254 exists in PLC-γ2. Figure 1B shows an alignment of a 50-amino-acid region from 12 PLC-γ homologs, centered on the region homologous to Y771/Y783 of human PLC-γ1. This alignment includes additional sequences homologous to PLC-γ from the toad Xenopus laevis and the purple sea urchin Paracentrotus lividus, which were excluded from the full alignment because they are incomplete; however, because both proteins contain the X and Y catalytic domains and all three SH domains, they are very likely to be true PLC-γ homologs. Although the alignment is excellent among the five vertebrate sequences, as would be expected given the demonstrated importance of tyrosine phosphorylation in regulating vertebrate PLC-γ activity, the alignment is surprisingly weak in the seven invertebrate proteins. With the exception of the sponge, E. fluviatilis, all of the invertebrates lack a tyrosine at the position homologous to Y771. Although the sponge sequence aligns poorly with the other proteins in this region, it has four tyrosines arranged as two paired residues in this region, two of which align near Y771 and are each in a context that suggests that they are likely to be phosphorylated in vivo (NetPhos scores are 0.595 and 0.787 for the N- and C-terminal residues, respectively). By contrast, the presence of a tyrosine in a conserved context at a position homologous to mammalian PLC-γ1 Y783 in all the invertebrates strongly suggests that this tyrosine acts as a regulatory site in all PLC-γ homologs. However, while each tyrosine in the A. gambiae, P. lividus, E. fluviatilis, and C. elegans sequences at this position has a relatively high NetPhos score that suggests that phosphorylation is likely (0.749, 0.909, 0.491, 0.420, respectively), each of the Drosophila tyrosines at this position has a much lower NetPhos score (0.154).

Evolution of gene structure in PLC-γ homologs: When comparing the Drosophila PLC-γ homologs, we noted that the second and third introns almost precisely bracket the central region containing the SH2-SH2-SH3 domain, which is unique among PLC subtypes. Although these two introns are not symmetric as would be expected if these domains were originally inserted as a unit (Long et al. 1995), there is evidence that "sliding" of introns occurs occasionally (Rogozin et al. 2000); this pair of introns might therefore be vestiges of the shuffling events that presumably created the γ-form from an ancestral PLC gene lacking SH domains. To reveal whether these introns are conserved in other PLC-γ genes, we examined the homologs in five species for which the complete gene structure is available: A. gambiae, C. elegans, Mus musculus (γ2), and H. sapiens (γ1 and γ2). The PLC-γ gene has changed in size quite considerably during evolution; the three invertebrate genes are relatively small at 4.2, 5.7, and 6.2 kb in D. melanogaster, A. gambiae, and C. elegans, respectively, whereas the mammalian genes range from 30.5 kb for mouse PLC-γ2 to 36.9 and 172 kb for human PLC-γ1 and PLC-γ2. The number of introns correlates well with gene size: only 3 introns are in the small Drosophila genes, 8 in the slightly larger Anopheles gene, 11 in C. elegans, and 31 in all three mammalian genes.

The position of every intron in all eight PLC-γ genes for which the gene structure has been determined is shown in Figure 2A. Because the level of conservation across much of the PLC-γ protein is high, in most cases it is possible to determine whether two introns at homologous positions are identical by descent or simply happen to have been inserted at similar locations since the species diverged. This can be determined by comparing both the amino acid alignment at the splice site and the intron phase. We found that all 31 introns in each of the three mammalian genes are at conserved locations; similarly, all three Drosophila introns are at invariant positions within that genus (data not shown). We also found that 7 of the 31 mammalian introns are present in one of the other species; these are indicated as introns a–g in Figure 2A, and their exact position is shown in alignments in Figure 2B. Four additional sites (marked with shaded triangles in Figure 2A) probably represent shared intron positions, because they are in the same phase at equivalent regions of the protein; however, the alignment in these regions is too weak to be certain whether they are identical by descent. One of this uncertain group is the intron at the C-terminal end of the SH3 domain that we first identified in D. melanogaster. Four of the 8 introns in A. gambiae (marked as a, c, f, and g in Figure 2, A and B) and 4 of the 11 in C. elegans (b, d, e, and g in Figure 2, A and B) are shared with an intron in the mammalian PLC-γ genes; one of this group (g) is shared between A. gambiae, C. elegans, and the mammalian genes. Although the phase 0 intron at the C-terminal end of the SH2-SH2-SH3 region may be conserved among PLC-γ genes, the one near the N-terminal end of the first SH2 domain seems to be unique to the Drosophila lineage.

Figure 2. (A) Intron positions in eight PLC-γ genes. The location of every intron in each species is shown by a triangle above a schematic representation of the PLC-γ protein; the phase of splicing for each intron is indicated by the number above it. Solid triangles indicate a conserved intron in more than one species; shaded triangles indicate introns that may be conserved, but poor local alignment prevents them from being definitively identified as conserved. Each conserved intron is given a letter beneath the protein. Because the length of the protein differs in each species, particularly at the C-terminal end, the length of each domain is approximate. "D. mel/pse/vir" indicates introns in D. melanogaster, D. pseudoobscura, and D. virilis; all three species have introns of the same phase at identical locations. Similarly, all 31 introns in the human PLC-γ1 and PLC-γ2 and in the mouse PLC-γ1 break the gene at the homologous codons in the same phase. The central conserved introns in the N- and C-terminal SH2 domains of the vertebrate genes are indicated by solid circles. (B) Protein alignments around the conserved introns. For each of the conserved introns shown in A, the ClustalW 1.4 alignment of amino acid sequence around it is shown, together with the position of an intron indicated by a circle. (C) Alignment of SH2 domain sequences around a conserved intron from human PLC-γ1 and PLC-γ2 proteins. The alignment was generated as described above; intron positions are indicated following the scheme used in B.

Conserved introns in the SH2 domains of PLC-γ and many diverse proteins: When we compared PLC-γ intron positions between species, we noted that an intron within the mammalian N-terminal SH2 domains breaks at the homologous residue and in the same phase as the conserved intron f in the C-terminal SH2 domain (each of these two sites is identified by a solid circle in Figure 2A). Because this intron breaks the seventh amino acid of the conserved C β-strand of SH2 domains (nomenclature of Kuriyan and Cowburn 1997), we refer to it as the βC7 intron. An amino acid alignment around the βC7 intron of both N- and C-terminal SH2 domains from mouse and human PLC-γ1 and PLC-γ2 is shown in Figure 2C. Although the codon split by this intron encodes different amino acids in the N-terminal and C-terminal SH2 domains (tryptophan and arginine, respectively), the identical phase and quality of the alignment on either side of the βC7 intron strongly suggest that the introns in question are homologous.

A conserved intron in the N- and C-terminal SH2 domains of PLC-γ implies that the two domains are derived from the same ancestral SH2 either by duplication within an ancestral PLC-γ gene or by two independent rounds of "exon shuffling" from exterior sources, which both happened to carry an intron at the same position. To distinguish between these two models, we compared both the protein sequence and the intron position from additional mammalian genes containing an SH2 domain, identified at random in the GenBank database. Human and mouse genes were targeted both because our data (above) and that of others (e.g., Schmitt and Brower 2001) suggest that there is less selective pressure to lose introns in mammals and because the genome projects from these species provide cDNA evidence for gene structures. From a random sample of 30 SH2-domain-containing proteins we identified 27 SH2 domains that contain at least one intron. An alignment of these SH2 domains together with the five in PLC-γ homologs that have an intron is shown in Figure 3, as well as the position of all introns within them. We found the βC7 intron in a further 9 SH2 domains: six adaptor proteins (Eat2, SAP, TSAd, Shb, Shd, and a Shb homolog), a nonreceptor tyrosine phosphatase (PTPN6), and both SH2 domains in the PI 3-kinase regulatory subunit, p85. This means that the presence of the βC7 intron in both of the PLC-γ SH2 domains is not proof of a duplication event having occurred within PLC-γ, because the prevalence of the βC7 intron strongly suggests that it is likely to have been present in an ancestral SH2 domain.

Figure 3. Position of all introns within 32 SH2 domains from 29 proteins. The SH2 domains were identified at random among mouse and human proteins from the GenBank database, and the gene structure was obtained from LocusLink. The domains were aligned using ClustalW 1.4 and improved in some places by eye; none of the changes affected regions around putative conserved introns. The position of each intron is indicated by a triangle, with a number indicating the phase of splicing. The affected amino acid is indicated by a circle in the case of phase 1 and 2 introns and by an oval between the affected amino acids in the case of phase 0 introns. Conserved introns are indicated by a solid triangle, possibly conserved introns are indicated by a shaded triangle, and open triangles represent introns that appear to be unique to one lineage. Blnk, mouse, NP_032554; Clnk, human, XP_093920; Eat2, human, XP_086490; Hck, human, XP_009539; IRS-5, mouse, NP_061295; Jak1, human, XM_059180; Jak2, human, AF058925; Nck2, human, XP_087122; PI3Kp85R, human, XP_027982; PLC-γ, A. gambiae, EAA05135; PLC-γ, C. elegans, NP_496205, Wormbase T01E8.3; PLC-γ, D. melanogaster, A53970, FBgn0003416; PLC-γ1, human, CAA18537; PTPN6, human, NP_536859; SAP, mouse, NP_035494; SH2D3C, human, NP_005480; ShbHom, human, XP_060121; Shb, human, XP_032304; Shc, human, NP_003020; Shd, human, XP_031857; Stap1, mouse, NP_064376; Stat1, human, NP_644671; Stat3, mouse, NP_035616; Supt6h, human, XP_017037; Syp, human, XM_069074; Tensin, human, XP_029631; TSAd, mouse, AAH34847; Tyk2, human, XP_008893; Zap70, human, XP_047776.

Another very common conserved intron, βA0, is present at the exact N-terminal end of the classically defined SH2 domain and is in the same phase as βC7 (Figure 3). The βA0 intron is present in 11 of the 32 SH2 domains and is spread among just as disparate a group of proteins as βC7: a nonreceptor tyrosine kinase (Hck), a nonreceptor tyrosine phosphatase (PTPN6), a phospholipase (PLC-γ), two transcription factors (Stat1 and -3), and several adaptor proteins (SH2D3C, Stap1, Shb, Shd, and a Shb homolog). In all, 53% of the SH2 domains in this group (17 out of 32) have one or both of the βC7 and βA0 introns. Five additional introns are shared by between 2 and 5 SH2 domains in each case. The pattern of shared intron positions shows that 22 of the 32 SH2 domains we examined can be related to one another by one or more shared intron sites (Figure 4). This demonstrates that SH2 domains not only are similar in sequence, but also share a common evolutionary heritage that is still visible in their gene structure.

Figure 4. Venn diagram showing interrelationships between SH2 domains determined by shared introns.

Relationship between PLC-γ SH2 domains: If the two PLC-γ SH2 domains were produced by an initial recruitment event from another gene followed by a duplication, they should be more similar to each other than to SH2 domains in other proteins; whereas if they were derived by two independent recruitment events, they should be no more alike than SH2 domains in other proteins. We compared the human PLC-γ SH2 domains to the SH2 domains from the set of 27 randomly selected proteins described above (i.e., excluding SH2 domains from other PLC-γ proteins) using PAUP (Figure 5). Probably because SH2 domains are short and the alignment is weak in many places, most nodes were supported very weakly by bootstrap analysis. Although this analysis did not separate the PLC-γ N- and C-terminal SH2 domains into a separate branch, they are each other's closest relatives in terms of the number of changes between them, consistent with a duplicative model of origin. The only SH2 domains grouped with a high level of confidence by the phylogenetic analysis were the Shd, Shb, and Shb homolog adaptors, the transcription factors Stat1 and Stat3, the N-terminal SH2 domains from the tyrosine phosphatases Syp and PTPN6, and the adaptors Eat-2 and SAP. We also examined whether the SH2 duplication in PLC-γ occurred more than once during animal evolution. When we compared all SH2 domains from human PLC-γ1, E. fluviatilis, D. melanogaster, and C. elegans, the N-terminal and C-terminal domains separate into distinct clades (not shown) with a moderate level of support (75) suggested by bootstrap analysis. This is consistent with a single SH2 duplication event occurring in the PLC-γ gene of a common ancestor of all these species.

Figure 5. Unrooted phylogram of 29 mammalian SH2 domains. The number of changes between each sequence is represented by the length of each branch. The phylogram shown is one of three best trees; nodes with a bootstrap value >50 are indicated and the corresponding branches are shown with thick lines. The distribution of the βC7 and βA0 introns is also indicated.

DISCUSSION

We isolated the PLC-γ homologs from D. pseudoobscura and D. virilis and compared both the gene structure and the protein sequence with other PLC-γ homologs from another insect, a nematode, and several vertebrates. PLC-γ is unusual among large proteins, in that almost the entire ORF is composed of recognized domains common to many other signaling proteins, which is reflected in a high level of conservation throughout. The most highly conserved parts of PLC-γ are the catalytic domains, particularly within region X, where several residues have been proposed to either coordinate a calcium ion or stabilize the transition state during catalysis in PLC-δ (Essen et al. 1997). This strongly suggests that the same catalytic mechanism is also used by the PLC-γ forms. The importance of this region is also underscored by the fact that two mutations of sl, which encodes the Drosophila PLC-γ homolog, also lie within these catalytic domains (Mankidy et al. 2003).

Although it has been known for some time that both PLC-γ1 and -γ2 become phosphorylated on tyrosine residues following activation, the function of this phosphorylation remains unclear (reviewed by Rebecchi and Pentyala 2000). Our analysis shows that the pattern of PLC-γ phosphorylation must differ between vertebrates and invertebrates, because none of the invertebrate PLC-γ homologs have tyrosines that correspond to Y771, which is phosphorylated in PLC-γ1, or to Y1254 in the C-terminal tail, which is phosphorylated in PLC-γ1. Our results are therefore consistent with reports that Y783 has the major role in regulation of PLC-γ activation (Kim et al. 1991) and suggest that Y771 and Y1254 are involved in vertebrate-specific modes of PLC-γ regulation. The relatively low NetPhos score for the tyrosine homologous to Y783 in all three Drosophila species may indicate different substrate preferences among Drosophila tyrosine kinases; it is also possible that this tyrosine is not phosphorylated in Drosophila. We are currently testing, by in vitro mutagenesis of this residue in a genomic transformation construct, whether phosphorylation of this tyrosine is necessary for Sl function (M. Staples and J. Thackeray, unpublished data).

An interesting and still unanswered question is the date of the PLC-γ gene duplication in the vertebrate lineage. Our phylogenetic analysis suggests that it occurred after the vertebrate/invertebrate split, but well before the mammalian radiation. The PLC-γ1 and PLC-γ2 isoforms have partially overlapping patterns of expression, but do show significant specialization; in particular, the PLC-γ2 isoform is a major contributor to immune system function. For example, transgenic mice carrying a knockout mutation in PLC-γ2 are viable, but show very specific defects in B-cell development (Hashimoto et al. 2000; Wang et al. 2000), whereas PLC-γ1 knockout mice die in early development (Ji et al. 1997). B cells play a key role in adaptive immunity, a physiological response to infection that invertebrates lack, which is thought to have arisen early in vertebrate evolution (Hughes and Yeager 1997). Although the evidence is currently circumstantial, it may be that early vertebrates found a use for a duplicated PLC-γ gene as a regulator of development in the many specialized cell types needed for the adaptive immune response.

There has been much discussion in the literature as to whether the recent evolution of genes primarily reflects loss or gain of introns. Comparison of orthologous genes between puffer fish and humans suggests that intron number and position have changed little in the vertebrate lineage, although intron size has expanded greatly in Homo (Elgar et al. 1996; Yeo et al. 1997). A recent study of β-integrin genes in the coral Acropora millepora, Drosophila, C. elegans, and humans suggests that intron loss has occurred frequently within the human lineage, but to an even greater extent within the fly and nematode lineages (Schmitt and Brower 2001). All but 1 of the 26 coral β-integrin introns were found in at least one other phylum, suggesting that the coral gene structure represents the ancestral state; without it, many of the introns in the remaining species would have appeared to be unique to one lineage. Our comparison of PLC-γ homologs fits remarkably well with this picture. First, the fact that the human PLC-γ1 and PLC-γ2 genes have 31 identical introns demonstrates a complete lack of change in gene structure for several hundred million years in the vertebrate lineage. Our

fling events occurred. Second, the 40-amino-acid region between βA0 and βC7 contains four three-dimensional features: three forming short β-sheets and one an α-helical domain, whereas the exon theory suggests that ancient introns are correlated with protein features 15, 22, or 30 amino acids long (Gilbert et al. 1997). Third, both the βA0 and the βC7 are phase 2 introns, whereas phase 0 introns are thought to have predominated in the original exon-shuffling events. Major exon shuffling involving many families of signaling proteins is thought to have occurred before the parazoan-eumetazoan split (Suga et al. 2001), so it may be that the βA0 and βC7 introns were present in a proto-SH2 domain before these reshufflings took place, allowing them to be spread to many varied and still extant proteins.
The fact that one of the con- ␤ analysis also shows that the last common ancestor of served introns, A0, is at the exact N-terminal end of nematodes, insects, and vertebrates must have had a the normally defined SH2 domain is highly unlikely to minimum of 8 PLC-␥ introns—the 7 shown in Figure be coincidental, marking the N-terminal boundary of 2A, plus the additional ␤C7 intron in the N-terminal the shuffled region. This raises the question of what SH2 domain. At the very least, therefore, we can say has happened at the C-terminal end; although we have that there has been substantial loss of introns in the found no evidence for a conserved intron at this end, Drosophila PLC-␥ lineage, because all 8 of these ances- it may be that the originally shuffled exon containing tral introns are missing. The structure of a PLC-␥ gene the C-terminal half of the SH2 domain contained some from a sponge or coral will be needed to determine unnecessary sequence that has drifted in size and se- whether the many apparently unique PLC-␥ introns are quence sufficiently that it can no longer be recognized. actually derived from an ancestral gene, as is the case In any ancient family of proteins, divergent evolution will inevitably lead to some members being difficult to in the ␤-integrin genes. identify as belonging to the group without resorting to During our comparison of PLC-␥ genes, we discov- comparisons of three-dimensional structure. This idea ered an intron, ␤C7, that is present not only in the has recently been applied to SH2 domains, demonstra- human and mosquito C-terminal SH2 domains, but also ting that domains in the Janus kinase (Jak) family are in the human C-terminal SH2 domain and SH2 domains indeed bona fide SH2 domains (Al-Lazikani et al. 2001). from nine other unrelated proteins, including a phos- Our data confirm that the SH2 domains in the Janus phatase, a kinase, and several adaptor proteins. 
Another kinase family members Jak1, Jak2 (and Jak3, because intron at the SH2 N-terminal end, ␤A0, was found at the introns in Jak2 and Jak3 are identical; data not almost exactly the same frequency in the SH2 domain ␥ shown), and Tyk2 can be linked to the main SH2 line sample, not only in PLC- and the other proteins just ␣ ␤ by the shared A2 intron. This demonstrates the utility mentioned for C7, but also in two transcription factors. of intron position comparison: it can confirm evolution- To be present in such a diverse group of proteins indi- ary connections even when overall amino acid similarity cates that these are very old introns indeed, present in is low. Another intriguing example in our data set is a very early SH2 domain that was subsequently co-opted SupT6H, which clearly shares a phase 0 intron with Blnk, by a varied collection of proteins during eukaryotic evo- a well-established adaptor protein with a functional SH2 lution. The majority of SH2 domains in our sample, 22 domain (Yablonski and Weiss 2001). SupT6H encodes of 32, can be linked together by one or more common a protein involved in transcriptional regulation and as- introns, demonstrating that most, if not all, SH2 do- sociates with both the phosphorylated form of RNA mains are the result of divergent and not convergent polymerase II and the actively expressed chromatin evolution. (Kaplan et al. 2000). A “degenerate” SH2 domain was How ancient are the conserved SH2 introns? Al- reported in the human SupT6H sequence when it was ␤ ␤ though C7 and A0 may well represent introns from first sequenced (Chiang et al. 
1996), and it is tempting one of the earliest SH2 domains to appear in evolution, to speculate that, although the SH2 domain has indeed there are three reasons to believe that these introns do diverged in sequence, in fact SupT6H contains an active not represent ancient introns in the sense implied by SH2 domain involved in binding to a protein involved the exon theory of genes (Gilbert et al. 1997). First, in transcriptional regulation, possibly RNA polymerase no SH2 domains have been identified in yeast (Hunter II itself. Tyrosine kinase activity has been demonstrated and Plowman 1997) and none have been found in in chromatin (Palangat and Roy 1995), so there may plants, suggesting that this domain arose more recently be a phosphotyrosine target for the SupT6H SH2 do- than the period during which these primal exon-shuf- main after all. 442 C. M. Manning et al.
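The idea of linking SH2 domains through shared introns, discussed above, amounts to finding connected groups in a set of domains where any two domains carrying the same intron are joined. The following is a minimal sketch of that grouping, not the procedure used in this study. The domain names and intron assignments in `toy` are invented purely for illustration; only the intron labels (written here as bC7, bA0, and aA2) echo the text. It also assumes that each intron has already been reduced to a single label; in practice that label would presumably encode both the aligned position and the phase.

```python
from collections import defaultdict


def link_by_shared_introns(introns_by_domain):
    """Group domains into clusters joined by at least one shared intron.

    introns_by_domain maps a domain name to a set of intron labels.
    Returns a list of sorted clusters, largest cluster first.
    """
    # Union-find: every domain starts as its own cluster root.
    parent = {domain: domain for domain in introns_by_domain}

    def find(domain):
        # Walk up to the cluster root, shortening the path as we go.
        while parent[domain] != domain:
            parent[domain] = parent[parent[domain]]
            domain = parent[domain]
        return domain

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Join each domain to the first domain seen with the same intron.
    first_seen = {}
    for domain, introns in introns_by_domain.items():
        for intron in introns:
            if intron in first_seen:
                union(first_seen[intron], domain)
            else:
                first_seen[intron] = domain

    clusters = defaultdict(set)
    for domain in introns_by_domain:
        clusters[find(domain)].add(domain)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical toy data: NOT the intron distribution reported in this paper.
toy = {
    "domain_1": {"bC7", "bA0"},
    "domain_2": {"bC7"},
    "domain_3": {"bA0", "aA2"},
    "domain_4": {"aA2"},
    "domain_5": {"other_1"},   # has an intron, but shares it with no one
    "domain_6": set(),         # no introns within the domain
}

clusters = link_by_shared_introns(toy)
linked = sum(len(c) for c in clusters if len(c) > 1)
print(clusters)
print(f"{linked} of {len(toy)} domains linked by shared introns")
```

In this toy set, domain_4 shares no intron with domain_1 directly, yet the two fall in the same cluster through domain_3. This is the same kind of indirect linkage that ties the Jak family to the main SH2 line via a single shared intron.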

We thank Thom Kaufman for sending us the D. virilis genomic library, Rishikesh Mankidy and Tetteh Abbeyquaye for experimental advice, Deborah Robertson and David Hibbett for assistance with the phylogenetic analysis and helpful discussions, and Manyuan Long for his comments on the manuscript. This work was supported by grant no. R15 GM-55883-01 from the National Institutes of Health to J.R.T.

LITERATURE CITED

Al-Lazikani, B., F. B. Sheinerman and B. Honig, 2001 Combining multiple structure and sequence alignments to improve sequence detection and alignment: application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. USA 98: 14796–14801.
Blackman, R. K., and M. Meselson, 1986 Interspecific nucleotide sequence comparisons used to identify regulatory and structural features of the Drosophila hsp82 gene. J. Mol. Biol. 188: 499–515.
Blom, N., S. Gammeltoft and S. Brunak, 1999 Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol. 294: 1351–1362.
Chiang, P. W., S. Wang, P. Smithivas, W. J. Song, S. Ramamoorthy et al., 1996 Identification and analysis of the human and murine putative chromatin structure regulator SUPT6H and Supt6h. Genomics 34: 328–333.
Elgar, G., R. Sandford, S. Aparicio, A. Macrae, B. Venkatesh et al., 1996 Small is beautiful: comparative genomics with the pufferfish (Fugu rubripes). Trends Genet. 12: 145–150.
Emori, Y., R. Sugaya, H. Akimaru, S.-I. Higashijima, E. Shishido et al., 1994 Drosophila phospholipase C-γ expressed predominantly in blastoderm cells at cellularization and in endodermal cells during later embryonic stages. J. Biol. Chem. 269: 19474–19479.
Essen, L. O., O. Perisic, M. Katan, Y. Wu, M. F. Roberts et al., 1997 Structural mapping of the catalytic mechanism for a mammalian phosphoinositide-specific phospholipase C. Biochemistry 36: 1704–1718.
Gilbert, W., S. J. de Souza and M. Long, 1997 Origin of genes. Proc. Natl. Acad. Sci. USA 94: 7698–7703.
Hashimoto, A., K. Takeda, M. Inaba, M. Sekimata, T. Kaisho et al., 2000 Cutting edge: essential role of phospholipase C-γ2 in B cell development and function. J. Immunol. 165: 1738–1742.
Hughes, A. L., and M. Yeager, 1997 Molecular evolution of the vertebrate immune system. Bioessays 19: 777–786.
Hunter, T., and G. D. Plowman, 1997 The protein kinases of budding yeast: six score and more. Trends Biochem. Sci. 22: 18–22.
Ji, Q. S., G. E. Winnier, K. D. Niswender, D. Horstman, R. Wisdom et al., 1997 Essential role of the tyrosine kinase substrate phospholipase C-γ1 in mammalian growth and development. Proc. Natl. Acad. Sci. USA 94: 2999–3003.
Kaplan, C. D., J. R. Morris, C. Wu and F. Winston, 2000 Spt5 and spt6 are associated with active transcription and have characteristics of general elongation factors in D. melanogaster. Genes Dev. 14: 2623–2634.
Kim, H. K., J. W. Kim, A. Zilberstein, B. Margolis, J. G. Kim et al., 1991 PDGF stimulation of inositol phospholipid hydrolysis requires PLC-γ1 phosphorylation on tyrosine residues 783 and 1254. Cell 65: 435–441.
Kim, J. W., S. S. Sim, U. H. Kim, S. Nishibe, M. I. Wahl et al., 1990 Tyrosine residues in bovine phospholipase C-γ phosphorylated by the epidermal growth factor receptor in vitro. J. Biol. Chem. 265: 3940–3943.
Koyanagi, M., K. Ono, H. Suga, N. Iwabe and T. Miyata, 1998 Phospholipase C cDNAs from sponge and hydra: antiquity of genes involved in the inositol phospholipid signaling pathway. FEBS Lett. 439: 66–70.
Kuriyan, J., and D. Cowburn, 1997 Modular peptide recognition domains in eukaryotic signaling. Annu. Rev. Biophys. Biomol. Struct. 26: 259–288.
Lindsley, D. L., and G. G. Zimm, 1992 The Genome of Drosophila melanogaster. Academic Press, San Diego.
Long, M., 2001 Evolution of novel genes. Curr. Opin. Genet. Dev. 11: 673–680.
Long, M., C. Rosenberg and W. Gilbert, 1995 Intron phase correlations and the evolution of the intron/exon structure of genes. Proc. Natl. Acad. Sci. USA 92: 12495–12499.
Lopez, I., E. C. Mak, J. Ding, H. E. Hamm and J. W. Lomasney, 2001 A novel bifunctional phospholipase C that is regulated by Gα12 and stimulates the Ras/mitogen-activated protein kinase pathway. J. Biol. Chem. 276: 2758–2765.
Mankidy, R., J. Hastings and J. R. Thackeray, 2003 Distinct phospholipase C-γ-dependent signaling pathways in the Drosophila eye and wing are revealed by a new small wing allele. Genetics 164: 553–563.
Mount, S. M., C. Burks, G. Hertz, G. D. Stormo, O. White et al., 1992 Splicing signals in Drosophila: intron size, information content, and consensus sequences. Nucleic Acids Res. 20: 4255–4262.
Palangat, M., and D. Roy, 1995 Phosphorylation of tyrosine residues of RNA polymerase II and other nuclear proteins by active chromatin tyrosine kinase(s). Biochem. Biophys. Res. Commun. 209: 356–364.
Rebecchi, M. J., and S. N. Pentyala, 2000 Structure, function, and control of phosphoinositide-specific phospholipase C. Physiol. Rev. 80: 1291–1335.
Rogozin, I. B., J. Lyons-Weiler and E. V. Koonin, 2000 Intron sliding in conserved gene families. Trends Genet. 16: 430–432.
Russo, C. A., N. Takezaki and M. Nei, 1995 Molecular phylogeny and divergence times of drosophilid species. Mol. Biol. Evol. 12: 391–404.
Sambrook, J., E. F. Fritsch and T. Maniatis, 1989 Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Schmitt, D. M., and D. L. Brower, 2001 Intron dynamics and the evolution of integrin β-subunit genes: maintenance of an ancestral gene structure in the coral, Acropora millepora. J. Mol. Evol. 53: 703–710.
Song, C., C. D. Hu, M. Masago, K. Kariyai, Y. Yamawaki-Kataoka et al., 2001 Regulation of a novel human phospholipase C, PLCε, through membrane targeting by Ras. J. Biol. Chem. 276: 2752–2757.
Suga, H., K. Katoh and T. Miyata, 2001 Sponge homologs of vertebrate protein tyrosine kinases and frequent domain shufflings in the early evolution of animals before the parazoan-eumetazoan split. Gene 280: 195–201.
Swofford, D. L., 2002 PAUP*. Phylogenetic Analysis Using Parsimony. Sinauer Associates, Sunderland, MA.
Thackeray, J. R., P. C. Gaines, P. Ebert and J. R. Carlson, 1998 small wing encodes a phospholipase C-γ that acts as a negative regulator of R7 development in Drosophila. Development 125: 5033–5042.
Thompson, J. D., D. G. Higgins and T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673–4680.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin and D. G. Higgins, 1997 The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24: 4876–4882.
Wahl, M. I., S. Nishibe, J. W. Kim, H. Kim, S. G. Rhee et al., 1990 Identification of two epidermal growth factor-sensitive tyrosine phosphorylation sites of phospholipase C-γ in intact HSC-1 cells. J. Biol. Chem. 265: 3944–3948.
Wang, D., J. Feng, R. Wen, J. C. Marine, M. Y. Sangster et al., 2000 Phospholipase Cγ2 is essential in the functions of B cell and several Fc receptors. Immunity 13: 25–35.
Yablonski, D., and A. Weiss, 2001 Mechanisms of signaling by the hematopoietic-specific adaptor proteins, SLP-76 and LAT and their B cell counterpart, BLNK/SLP-65. Adv. Immunol. 79: 93–128.
Ye, K., B. Aghdasi, H. R. Luo, J. L. Moriarity, F. Y. Wu et al., 2002 Phospholipase Cγ1 is a physiological guanine nucleotide exchange factor for the nuclear GTPase PIKE. Nature 415: 541–544.
Yeo, G. S., G. Elgar, R. Sandford and S. Brenner, 1997 Cloning and sequencing of complement component C9 and its linkage to DOC-2 in the pufferfish Fugu rubripes. Gene 200: 203–211.

Communicating editor: J. Hey