Reference: Biol. Bull. 205: 8-15. (August 2003) © 2003 Marine Biological Laboratory

250 Million Years of Bindin Evolution

KIRK S. ZIGLER^'2'* AND H. A. LESSIOS' ' Smithsonian Tropical Research Institute, Balboa, Panamá; and ^ Department of Biology, Duke University, Durham, North Carolina

Abstract. Bindin plays a central role in -egg attach- level, often within , sometimes within genera, but ment and fusion in sea urchins (echinoids). Previous studies rarely across an entire class. There are good reasons for this determined the DNA sequence of bindin in two orders of the focus: such studies are likely to uncover mutational changes class Echinoidea, representing 10% of all echinoid species. that are important in mate recognition and in speciation. We report sequences of mature bindin from five additional However, comparisons across broad taxonomic levels can genera, representing four new orders, including the distantly offer insights into the evolution of such molecules. They can related sand dollars, heart urchins, and pencil urchins. The reveal which features of these molecules are conserved (and six orders in which bindin is now known include 70% of all are thus essential for basic functions) and which features are echinoids, and indicate that bindin was present in the com- free to vary. For the parts that do vary, such comparisons mon ancestor of all extant sea urchins more than 250 million can determine common features of evolution. Most of all, years ago. Over this span of evolutionary time there has the comparisons can address the question of the universality been (1) remarkable conservation in the core region of of a particular molecule by asking how far back in evolution bindin, particularly in a stretch of 29 amino acids that has one needs to search to find the point at which a completely not changed at all; (2) conservation of a motif of basic different molecule has taken over the essential functions amino acids at the cleavage site between preprobindin and involved in gamete binding and fusion. mature bindin; (3) more than a twofold change in length of Echinoids (sea urchins, heart urchins, and sand dollars), mature bindin; and (4) emergence of high variation in the with their readily obtainable gametes, have long been model sequences outside the core, including the insertion of gly- organisms for fertilization studies. Because fertilization is cine-rich repeats in the bindins of some orders, but not external, the molecules involved in gamete recognition and others. fusion are associated exclusively with the gametes. Bio- chemical studies in sea urchins identified the first "gamete Introduction recognition protein," bindin (Vacquier and Moy, 1977). Various studies have shown that molecules involved in Bindin is the major insoluble component of the sperm reproduction (and particularly in gamete interactions) acrosomal vesicle and has been implicated in three molec- evolve rapidly, often under the influence of positive selec- ular interactions (Hofmann and Glabe, 1994). First, after the tion (reviewed in Swanson and Vacquier, 2002). Among acrosomal reaction, bindin self-associates, coating the acro- these proteins there are examples of both high (Metz and somal process. Second, it functions in sperm-egg attach- Palumbi, 1996) and low (Metz et ai, 1998b) levels of ment by binding to carbohydrates in the vitelline layer on intraspecific variation. In some cases a single molecule the egg surface. Third, it is involved in the fusion of sperm displays domains that are highly conserved and other do- and egg membranes (Ulrich et al, 1998, 1999). mains that are highly variable (Vacquier et al, 1995). Vari- Bindin is translated as a larger precursor, from which the ation in such proteins is usually studied at a low taxonomic N-terminal preprobindin portion is subsequently cleaved to produce mature bindin (Gao et al, 1986). The mature bindin molecule contains an amino acid core of about 55 residues Received 25 February 2003; accepted 3 June 2003. * To whom correspondence should be addressed. Current address: Fri- that is highly conserved among all bindins characterized to day Harbor Laboratories, University of Washington, 620 University Road, date (Vacquier et al, 1995). An 18 amino acid section of Friday Harbor, WA 98250. E-mail: [email protected] this conserved core (B18) has been shown to fuse lipid EVOLUTION OF BINDIN vesicles in vitro, suggesting that this region functions in differences in molecular size, amino acid sequence, and sperm-egg membrane fusion (Ulrich et ai, 1998, 1999). number of repeats. Thus far, bindin is known only from echinoids; no homol- ogous molecules have been identified in any other organism Materials and Methods (Vacquier, 1998). Samples To date, the nucleotide sequence of bindin has been determined in six genera of sea urchins. In Echinometra The pencil urchins (order ) were represented in (Metz and Palumbi, 1996), (Gao et al, our study by tribuloides, collected on the Atlantic 1986; Minor et al, 1991; Biermann, 1998; Debenham et al, coast of Panama; the order by an- 2000), and (Zigler et al, 2003), there are many tillarum, also from the Atlantic coast of Panamá. The sand sequence rearrangements among individuals and species, dollars (order Clypeasteroida) were represented by Encope and indications of positive selection in regions on either side stokesii from the Pacific coast of Panamá; the heart urchins of the core. In (Glabe and Clark, 1991; Metz et al, (order Spatangoida) by Moira clotho collected at the Perlas 1998a) and Tripneustes (Zigler and Lessios, 2003), there are Islands in the Bay of Panamá. Heliocidaris erythrogramma fewer sequence rearrangements and no evidence for positive (order Echinoida) was collected near Sydney, Australia. selection. In Lytechinus, only one sequence has been pub- lished (Minor et al, 1991), so the mode of evolution of the DNA isolation and sequencing molecule remains unknown. The five genera in which bindin was previously se- We injected various individuals of each species with 0.5 quenced belong to two echinoid orders, the Echinoida and M KCl until we encountered one that produced sperm. The testes of this ripe male were removed and used either the . These two orders contain only 10% of all directly for mRNA extraction, or after preservation in either extant echinoid species (Kier, 1977; Smith, 1984; Little- RNALater (Ambion Inc.) or in liquid nitrogen. The methods wood and Smith, 1995). The molecular structure of bindin for mRNA isolation, reverse transcription reactions, initial in the other 13 orders of the class Echinoida has not been polymerase chain reactions, 3' and 5' rapid amplification of studied. The only evidence that bindin is present outside the cDNA ends (RACE) reactions, and DNA sequencing were Echinoida and Arbacioida comes from Moy and Vacquier as described in Zigler and Lessios (2003), with the follow- (1979), who reported that an antibody to bindin of Strongy- ing modifications. (1) A fragment of the core region of locentrotus purpuratus reacted with sperm from one species bindin was amplified from the reverse transcriptase reac- of the order Phymosomatoida and two species of the order tion product or from genomic DNA, using primers Clypeasteroida. As Vacquier (1998) has pointed out, mole- MB1130+ (5'-TGCTSGGTGCSACSAAGATTGA-3') and cules that mediate fertilization•in contrast to those central either core200- (5'-TCYTCYTCYTCYTGCATIGC-3') or to other basic life processes•often differ between taxa. For corel57- (5'-CIGGRTCICCHATRTTIGC-3'). These prim- example, in the molluscan class Bivalvia, completely dif- ers correspond to amino acids VLGATKID, ANIGDP, and ferent proteins are involved in gamete recognition of oysters AMQEEEE, respectively (Vacquier et al, 1995). (2) When (Brandriff et al, 1978) and of mussels (Takagi et al, 1994). complete 5' mature bindin sequences were not obtained It is, therefore, not safe to assume without empirical evi- during the first round of 5' RACE, new primers were dence that bindin is present in all orders of echinoids, or that designed at the 5' end of the obtained sequence; then a it has the same general structure as in the taxa in which it second round of RACE amplification was conducted. (3) A has already been characterized. 5 ' preprobindin primer was designed based on a comparison As a first step in determining which orders of echinoids of preprobindin sequences of Moira clotho (this study) to possess bindin and, if they do, how its structure varies, we preprobindin sequences oi Arbacia (Glabe and Clark, 1991), cloned and sequenced mature bindin from five genera of sea Strongylocentrotus (Gao et al, 1986; Minor et al, 1991), and urchins, four of which belong to orders in which bindin was Lytechinus (Minor et al, 1991). This primer, prolSO (5'- previously unknown. We combined our data with those of AAGMGIKCL^GYSCIMGIAAGGG-3'), which corresponds previous studies of bindin in genera belonging to the orders to the conserved amino acids KR(A/S)S(A/P)RKG of the Echinoida and Arbacioida. The final data set includes bindin preprobindin, was used in combination with exact primers from 10 genera of sea urchins, pencil urchins, sand dollars, from the bindin core to amplify mature bindin sequences 5 ' and heart urchins, and the results indicate that the molecule of the core from Eucidaris tribuloides testis cDNA. (4) was present in the common ancestor of all extant echinoids Bindin sequences obtained from RACE were subsequently that diverged from each other over 250 million years ago. confirmed by amplification, cloning, and sequencing of full The core sequence has remained remarkably unchanged mature bindin sequences from testis cDNA. over this period of time, whereas the areas flanking the core Sequencing of both DNA strands was performed on have undergone substantial modification, resulting in great an ABI 377 automated sequencer, and sequences were 10 K. S. ZIGLER AND H. A. LESSIOS edited using Sequencher 4.1 (Gene Codes Corp.). Se- for the mature bindins both for the core region (10 se- quences have been deposited in GenBank (Accession num- quences, 55 amino acids per sequence) and for mature bers AY126482-AY126485, AF530406). Published mature bindin sequences outside the core (10 sequences of varying bindin sequences from a single exemplar from each of length for a total of 1909 amino acids). The program the five genera in which bindin had been previously se- CODONS (Lloyd and Sharp, 1992) was used to calculate quenced were taken from GenBank. These representatives the effective number of codons (ENC), a measure of codon were Strongylocentrotus purpuratus (Accession number: usage bias (Wright, 1990), for each sequence. ENC values M14487, Gao et ai, 1986), Lytechinus variegatus (M59489, can range from 20 to 61, with 61 indicating that all synon- Minor ei a/., 1991), (X54155, Glabe and ymous codons are used in equal frequency (no codon bias), Clark, 1991), Echinometra oblonga (U39503, Metz and and 20 indicating that only a single codon is used for each Palumbi, 1996), and Tripneustes ventricosus (AF520222, amino acid (maximum codon bias). The statistical analysis Zigler and Lessios, 2003). Three amino acids of the core of protein sequences (SAPS, http://www.isrec.isb-sib.ch/ region of the bindin of Lytechinus variegatus [numbers 367 software/SAPS_form.html) program was used to identify (N), 368 (L), and 385 (Y) in the alignment of Vacquier et separated repeats, simple tandem repeats, and periodic re- al, (1995)] were changed to A, V, and D, respectively, peats in each mature bindin sequence (Brendel et al, 1992). based on our own sequence data of Lytechinus bindin from 25 individuals representing 5 species; all 25 sequences had Results and Discussion these amino acids at the 3 sites (Zigler and Lessios, unpub.). In Echinometra oblonga, sequences for the extreme 3' end Figure 1 shows the phylogenetic relationships among the of preprobindin are not in GenBank. They were inferred echinoid orders from which bindin was sequenced, as they from the primer sequences used by Metz and Palumbi have been reconstructed from molecular, morphological, (1996) to amplify mature bindin sequences. and fossil evidence (Littlewood and Smith, 1995; Smith et al, 1995). As the figure indicates, bindin is present not only in the Echinoida and the Arbacioida (from which it was Sequence analysis previously known), but also in the sand dollars (Clypeas- We aligned the mature bindin amino acid sequences with teroida) and the heart urchins (Spatangoida), as well as the ClustalX ver. 1.81 (Thompson etal, 1997), and adjusted the phylogenetically much more distant Diadematoida and Ci- alignment by eye in Se-Al (ver. 2.0a5, Rambaut, 1996). We daroida. Along with the sequence of Heliocidaris, reported characterized the amino acid changes observed in the core in this paper, and the previously known sequences from region of bindin as either radical or conservative with re- Arbacia, Strongylocentrotus, Tripneustes, Lytechinus, and spect to charge and polarity (Taylor, 1986; Hughes et al, Echinometra, the data set covers orders that contain more 1990). The PROTPARAM tool of the EXPASY proteomics than 70% of all extant echinoid species (Kier, 1977). The server of the Swiss Institute for Bioinformatics (http://www. Cidaroida, the only extant order of the subclass Perischo- expasy.org) was used to calculate Kyte and Doolittle (1982) echinoidea, is the lineage most divergent from all other hydrophobicity plots (window size =11 amino acids) for echinoids. It was separated from the approx- each mature bindin sequence. The PROTSCALE tool of the imately 250 mya. Bindin's presence in both extant sub- same server was used to calculate amino acid composition classes of the Echinoidea indicates that it was present in

Millions of years ago Species Order Source

Eucidaris tribuloides Cidaroida this study Diadema antillarum Diadematoida this study Encope stokesii Clypeasteroida this study Moira dotho Spatangoida this study Arbacia punctulata Arbacioida Glabe and Clarli, 1991 Strongylocentrotus purpuratus Echinoida Gao ^r a/., 1986 Tripneustes ventricosus Echinoida Zigler and Lessios, 2003 Lytechinus variegatus Echinoida Minor ei a/., 1991 Heliocidaris erythrogramma Echinoida this study Echinometra oblonga Echinoida Metz and Palumbi, 1996

Figure 1. Phylogenetic relationships, divergence times, and systematic position of genera in which bindin has been sequenced. Echinoid phylogeny and divergence times are from Smith (1988) and Smith et al. (1995). Source of bindin sequence data is also indicated. EVOLUTION OF BINDIN 11 their common ancestor and that it has been evolving along gent regions. Over the past 250 my, the 55 residues of the each of the branches of the phylogenetic tree for core (amino acids 155-209) have been remarkably con- more than 250 my. Whether bindin is present in other served. This region does not contain any insertions or de- remains uncertain. Moy and Vacquier (1979) letions in any echinoid lineage. Of the 55 amino acids, 45 found that their antibody to Strongylocentrotus purpuratus are conserved across all of the 10 exemplars, including a bindin did not react with sperm from three species of sea stars, stretch of 29 residues in a row (amino acids 164-192). The and "zoo blots" using S. purpuratus bindin sequences to B18 sequence of 18 amino acids implicated in membrane probe genomic DNAs of a sea cucumber and a sea star were fusion (Ulrich et al, 1998, 1999) is part of this perfectly negative (Minor et al, 1991). No attempt has been made to conserved section. Seven amino acid sites in the core region determine bindin's presence in the ophiuroids or crinoids. exhibit a singleton amino acid change {i.e., a change found Figure 2 indicates that the aligned mature bindin se- in only one of the sequences). Four of these changes are quences are a mosaic of highly conserved and highly diver- conservative with respect to charge and polarity (amino

25 50 75 Eucidaris Iribuloides NG I T • - - YT RGGGHG YVA T RPGE I L P V YP GPGP AA P Diadema antUlarum YNPTGNVGRAGAQQGGGTR AA YPPA GRPN GPRANN PQPAYAG GA P R|Q|OQYMENPSLPS(^RANG A A P Encope síokesii QMYPMMM PN AA PSPY Q MPM MP P Q GGP V GV Moira clotho SRAAVMD YQGL GMPGDV AQEV PQMGL PVQGYQGN A VV GGlL PSGGL AGGGL P VGGL A GG Arbacia puiiclulata AQGA - • GGMQSGYG YPQ AGGAQY QPVQ MNQG P PMG A A A - GFG A PQ Strongylocentrotus purpuratus VNTM QYPQ AUSPQM V • GQPA Q GAO VGGG PMG Tripneustes venlricosus YGN R F PQ S RNQQM PGQ 0 ANQ VGGG S - • Lytechinus varieaatus YGNM N YPQ PMNQPM PGQP P A PQ VGGG GMG Heliocidaris ervtkroi>ramma YGNMM N YPO QMN PQM PGPG- - AMGQPA GV P V GGG GPP Echinometra oblonga YG YPQ AMN P PM PV PGQA PMGQPA AA P V GGG GGG

100 125 50 155 E.t. A PPAA PV PDN GGG DYIGNPSAGS ALDYYDE YSD TD SQ I - VA EDDG NA L L E KM KAVLGATKIDL DM. RQGA PSG I PG PA Y GPSYGPAPAGPVIPPAEGAGGDFDSVSRTMESDQPLHE YSL SSS ODD SA V L E K I KAVLGATK IDL] E.S. MNN I PS I ADS AA E MSSVADE YSN LS DOLE NA TME 1^ I KAV L GAT KVD L M.c. PV GGLAG GPP GGG LQGGGFQGGGLPGGGGFAETS ELSPSETED L SS AEITADLETGGE NA MME RA I L GAT K I D L A.p. PPVGQPI EAA GGG EFLGEPGVGG ESEFAE YSS SI GEGE NA VME KAVLGATKIDL S.p. ••-PPQF ALP PG - QADTD FGS SS SS- - VDGGDT SA VMD KAV L GAT K I D L r.v. ----NRG PVG GGG GGELSQAAS ENEMSTDDE YSS AST----EEGET SA VMOD KAVLGATKIDL i.V. V GAGAMG pve GGG GGMGGPVGGANGI GESV EDEMSVDSD YSS LG GET SA V I QD KAVLGATKIDL H.e. - - - T NQV N YG YSS SS DGET SD V L E KAV L GATK I D L: E.o. - - - GAMA P FG GGGlAMAGPV GGGGAGPPEFGGUPAAE GAEGEGDED YSS SV E EET SA VMD KAVLGATKIDL: c-c* • • * * c* *

•66 -1 75_. _200 225 250 255 E.I. PVD I ND P YDL GL L L RH L RHH SN L LAN I GO PEV HGËV L SAMQ E EE EE EERD A EAAV V LNQ I T RN D.a. PVD I NDP YD L GL L L RH L RHH SN L LAN I GO PDV RGQV LTAMQ ED E YE EERD AATGA LMA LN GGGEGAVERRHIRANODYSEQEEEES E.S. PVD I NDP YDL GL L L RHL RHHSN L LAN IDQPEV REQV LAAMQ EE E - - - EOD A - N - - ET LTG P LOAPHAGQ M.c PVD I ND P YD L GL LL RHL RHHSN L LAN I GD PEV RNQV L SAMQ EE EQE EEOD AANAV V L NNV AOR A.p. PVD 1 ND P YDL GL L L RHL RHHSN L LAN I GD PEV RNQV LTAMQ E E E E E EEQD AANGV V L NN L EG S.p. PVD I ND P YDL GL L L RHL RHHSN L LAN I GD PAV REQV L SAMQ EE E E E E EED AATGA V L - - - GN T.v. PVD I N D P YOL GL L L RHL RHHSN L LAN I GD P EV REQV L SAMO EE E EE EEOD AANGA V L NN L DN L.v. PVD I ND P YOL GL L L HHL RHHSN L LAN I GD PEV REQV LSAMQ EE EE E EEN D AANGV V LNN L He. PVD I. N D P YDL GL L L RHL RHHSN L LAN i GD PAV REQV LTAMQ EE E E E EQOD AANGV V L NN L GP • E.o. PVD I N D P YDLGL L L RHL RHHSN L LAN 1 GD P EV REQV L SAMO EEE E_E EEQD AANGV V L NN L AN • * R R * R • • RR- * R- • • * C*

300 325 345 E.t VGG AGAG GGG YTNGGAHG RGGY D.a. DINQDSOEEPSDYP NGE SQYQ NNWRSDFOQE|S QQVN RF RS P NORNAGFPFDDDYEADFGDGGMEE EONRSGFENE FA 00 ROND E.S. PQF MFN W GAA APNPYSG M.c. GiGPA QGQF GGN MGMV PSN QPQL GMGGY GGlMGAPVGMGNAGRYNNYAQGQAFGG AGGG- A.p. PC AA W MP PYPGGAQ GGMRV GGQPQNPMGG S.p. PCGQA GGGG GAM MS POOMG QPQGMIGQP OGMGFPHEGMGGPPOGMGMPHGGMGG P PQGMGMPP - r.v. AQG T F GGM 0 G T A - - • GGLGR MGNQGYGG QA PG L.v. GQG TQ GGM R G AG - - • GGM MGNQGMGG QG H.e. (^FGG MG GGM G G GPMGGGGPM GGGGM MGFQGMGG OP P E.o. GRA GF PG GGG G G MGMAGGGG GIGGIGGM MGL PlGMGG QV

46 350 375 400 425 428 E.t. Q - G GY - NG D.a. EQEDDYVDDDSDYDDEDEPEYQQEARRYGQRPQ G - P A Y - N GO EDNYHRQETRNNPYGQRQARAGROGGSRAGVSORRGR- E.s. G - Y PH - MG' M.c. A Y - QG* A.p. PMT GYR( QG* S.p. 0 GYL OG* T.v. GYR QG* L.v. GYM OG * He. GF R PG* E.o. N A GYR QG*

Figure 2. Mature bindin amino acid alignment. Tlie first four amino acids are the presumed cleavage site from preprobindin. Sites at which amino acids are identical in more than 50% of the genera are enclosed in boxes. Conservation across all 10 genera is indicated by an asterisk below the site. Dashes indicate deletions. The site for which an intron is known to exist in Echinometra, Arbacia, Strongylocentrotus, Lytechinus, Tripneustes, Heliocidaris, and Diadema is indicated by an "I" under the alignment. The core region is shaded. The B18 region of the core is indicated by a bar beneath the alignment. Sites in the core where radical amino acid changes have occurred are marked with an 'R' under the alignment; sites in the core where only conservative amino acid changes have occurred are marked with a 'C. Stop codons are indicated by an asterisk after the last amino acid. 12 K. S. ZIGLER AND H. A. LESSIOS acids at positions 155, 157, 164, and 208), and three are tend to have bindins that are of similar length. Indeed, it radical (positions 193, 194, and 200). Each of positions 196, cannot be assumed that the genera we have included in the 199, and 203 contain three amino acids across the 10 study are representative of their orders in this respect. The genera, indicating that there have been at least two changes regions on either side of the core were found to confer at each of these sites. At least one of the changes at each site species-specificity in Strongylocentrotus (Lopez et al, must have been a radical change. Thus, radical changes are 1993). If their variation reflects the requirements of this observed in only six amino acid positions of the core region, function, they can be expected to vary in a phylogenetically all of them concentrated in a small portion of the core close unpredictable fashion. to the C terminus (amino acids 193, 194, 196, 199, 200, and An intron located at a conserved position just 5' of the 203). The rest of the core (amino acids 155 through 192 and core region has been identified in the mature bindins of 204 through 209) contains only four conservative singleton Echinometra (Metz and Palumbi, 1996), Arbacia (Metz et amino acid substitutions. al, 1998a), Strongylocentrotus (Biermann, 1998), and Tri- A second conserved region is the cleavage site at the pneustes (Zigler and Lessios, 2003). In each of these genera, border between preprobindin and mature bindin (Fig. 2). In the intron is located at a conserved valine (amino acid 150 Strongylocentrotus purpuratus, the cleavage site is marked in Fig. 2). Comparison of sequences derived from both by a motif of four basic amino acids (RKKR) (Gao et al, cDNA and from genomic DNA in Heliocidaris (Zigler et 1986). Multibasic motifs are also present in the other nine al, 2003), Lytechinus, and Diadema revealed that in these genera (Fig. 2). Such multibasic motifs typically mark the genera the intron also exists at the same location and that its cleavage sites of proproteins from the mature molecule point of insertion is also a valine. We have made no attempt during the secretory process through the action of propro- to amplify bindin from genomic DNA in Eucidaris, Moira, tein convertases (Steiner, 1998; Seidah and Chretien, 1999). and Encope, so we do not know whether this intron is a The conservation of this multibasic motif in bindin rein- universal feature of all bindins. These three genera do not forces the idea that it functions as a signal for the cleavage have a valine in the site at which the intron is known to exist of preprobindin from mature bindin in all echinoids. in the others, but the significance of this pattern cannot be In contrast to the core and to the cleavage site, the rest of evaluated with the present data. the molecule is so variable between orders that we have Previous studies have identified both bindins with gly- little confidence that the alignment of these regions depicted cine-rich repeat structures and bindins that lack such struc- in Figure 2 is correct. There is a great amount of variation ture. Glycine-rich repeats were found in the bindins of in the length of mature bindin both on the 5 ' and on the 3 ' Lytechinus (Minor et al, 1991), Strongylocentrotus (Bier- side of the molecule (Table 1). This study identifies both the mann, 1998), Echinometra (Metz and Palumbi, 1996), and longest and the shortest bindins described to date. Bindin in Tripneustes (Zigler and Lessios, 2003), all members of the Diadema antillarum (418 amino acids) is more than twice order Echinoida. Consistent with the phylogenetic position as long as bindin in Encope stokesii (193 amino acids). of Heliocidaris, its bindin also contains glycine-rich repeat Bindin length 5' of the core ranges from 78 to 148 amino sequences, with MGGGN and VGGGGP on the 5' side of acids, while bindin length 3' of the core ranges from 56 to its core, and the series MGGG-MGGGGP-MGGGGP- 215 amino acids. There seems to be no discernible evolu- MGGGGM-MGFQG-MGGQPP on the 3' side. Although tionary trend in bindin length. Closely related orders do not Moira belongs to a different order, its bindin also contains extensive glycine-rich repeats, with the sequence PGGGL- PSGGL-AGGGL-PVGGL-AGGGL-PVGGL-AGGGF- Table 1 PGGGL-QGGGF-QGGGL-PGGGG found 5' of the core. Number of amino acids in three regions of the mature hindin in 10 Glabe and Clark (1991) noted that bindin from Arbacia genera punctulata lacked significant repeat structure, and this ob-

5' Core 3' Total servation was extended to three other species of Arbacia (Metz et al, 1998a). Eucidaris, Encope, and Diadema re- Eucidaris 101 55 60 216 semble Arbacia in containing only minimal tandem or sep- Diadema 148 55 215 418 arated repeats, the longest of which is PAAP-PPAP-PAAP Encope 82 55 56 193 in the region flanking the 5 ' side of the core in Eucidaris. Moira 138 55 94 287 Arbacia 105 55 73 233 Thus, glycine-rich repeat structure remains a common trait Lytechinus 103 55 60 218 of the bindin of the Echinoida, although, as the data from Tripneustes 88 55 68 211 the spatangoid Moira indicate, it is not a characteristic Strongylocentrotus 82 55 99 236 limited to this order or even to a closely aligned clade. Heliocidaris 78 55 73 206 There are no cysteine or tryptophan residues in any Echinometra 111 55 75 241 mature bindin. Disulfide bonds formed between cysteine 5' and 3' regions are defined relative to the conserved core. residues are often critical for protein structure, and in rap- EVOLUTION OF BINDIN 13 idly evolving proteins•such as toxins of cone snails (Duda and Palumbi, 1999) and pheromones of the marine ciliate Euplotes (Luporini et al, 1995)•cysteine residues are of- E.t. ten among the most conserved amino acids, serving as yvv/*V^^AA^ guides for aligning sequences. Thus, the lack of cysteine residues in bindin may have important structural conse- D.a. quences. When all sequences are pooled, glycine is by far VV^^^^/'VA^A^ the most common amino acid outside the core, constituting E.s. nearly a quarter of all residues. If the orders that possess v^vV^^ wv glycine-rich repeats (Echinoida and Spatangoida) are sepa- rated from those that do not, glycine remains the most M.c. w^^%v^^^Jv^^ common amino acid in both categories, constituting 29.6% of the non-core amino acids in the former and 16.4% of A.p. non-core residues in the latter. The six most common resi- dues outside the core (G, A, P, Q, N, and E) compose 63.9% of all non-core residues. Leucine is the most common amino S.p. A/^^\h^M\M^ acid in the core, present in 10 completely conserved amino acid positions, including 6 of the 18 amino acids in the B18 region. There is a much higher proportion of charged resi- T.v. dues in the core (31.8%) than in the rest of the molecule ^ypV"^/^ (15.6%). Each of the five charged amino acids (E, D, R, H, L.v. and K) is more common in the core. ,/w^^ Another common feature of all bindins is their lack of codon usage bias. ENC values among the 10 genera range H.e. :/^^i^jr^ from 61 (for Eucidaris and Diadema) to 48.1 (îoi Arbacia), with an average of 56.4. Low levels of codon usage bias E.o. ^>^\^^^^-\ have also been observed in sex-related genes in Drosophila (Civetta and Singh, 1998) and in the Chlamydomonas mat- ing-type locus genes Mid and Fusl (Ferris et al., 2002). Given the large divergence in amino acid sequence and Figure 3. Hydrophobicity plots of mature bindin in 10 genera. The length (and the uncertainties in alignments), it is not sur- genera are presented in the same order as in Figure 1 and abbreviated as follows: E.t.: Eucidaris tribuloides, D.a.: Diadema antillarum, E.s.: En- prising that hydrophobicity plots (Fig. 3) from these bindins cope stokesii, M.c: Moira clotho, A.p.: Arbacia punctulata, S.p.: Strongy- are diverse. The conserved amino acid sequence of the core locentrotus purpuratus, T.v.: Tripneustes ventricosus, L.V.: Lytechinus and its flanking regions causes all plots to be similar through variegatus, H.e.: Heliocidaris erythrogramma, and E.o.: Echinometra ob- the middle of the molecule. Plots of the closely related longa. A box encloses the core, and a bracket indicates the hydrophobic Tripneustes ventricosus, Lytechinus variegatus, Helioci- region in Arbacia. The scale bars above and below each zero line mark +1 and •1, respectively. daris erythrogramma, and Echinometra oblonga bindins are similar throughout their lengths. The rest of the hydropho- bicity plots are not clearly similar. One particularly distinct additional bindin sequences reported here reinforce the con- region is the long hydrophilic stretches in Diadema bindin clusions of Hellberg and Vacquier (1999) from comparisons along its extended length. A second is the highly hydropho- between the modes of evolution of these two proteins. bic region 3' of the core oi Arbacia bindin, noted by Glabe Although they are both involved in gamete recognition and and Clark (1991). both lack cysteine residues, they evolve in different fash- The only other gamete recognition protein that has been ions. There is no equivalent of a bindin core region in lysin; studied in marine invertebrates separated for as long as 250 amino acid substitutions are spread throughout the mole- my is the gastropod sperm protein lysin. Lysin opens a hole cule, with only three amino acids conserved between all in the vitelline envelope of free-spawning snails and thus Haliotis species and the two teguline genera. Instead of enables sperm to penetrate to the plasma membrane of the conserving a section of the molecule, lysin has maintained egg. It has been studied in the abalones (Haliotis) (Lee and its function by conserving secondary structure through con- Vacquier, 1992; Lee et al., 1995; Yang et al, 2000; re- servative amino acid substitutions (Hellberg and Vacquier, viewed in Kresge et ai, 2001) and in two genera of turban 1999). Length variation is another obvious difference. Ma- snails, Tegula and Norrisia (Hellberg and Vacquier, 1999). ture bindin length varies from 193 to 418 amino acids, but Abalones and turban snails diverged 250 mya, roughly the lysin length (at least in the two groups studied to date) only same time the cidaroids separated from the euechinoids. The from 126 to 138 amino acids. 14 K. S. ZIGLER AND H. A. LESSIOS

Conclusions sequence variation and selection in the bindin locus of the , Strongylocentrotus franciscanas. J. Mol. Evol. 51: 481-490. The comparisons of bindin from 10 genera of echinoids Duda, T. F., and S. R. Palumbi. 1999. Molecular genetics of ecological reveal the results of long-term evolution under two oppos- diversification: duplication and rapid evolution of toxin genes of the ing selective forces acting on gamete recognition molecules. venomous gastropod Conus. Proc. Nati. Acad. Sei. USA 96: 6820- 6823. The sections of the molecule involved in the basic functions Ferris, P. J., E. V. Armbrust, and U. W. Goodenough. 2002. Genetic of gamete fusion and post-translational cleaving of the structure of the mating-type locus of Chlamydomonas reinhardtii. preprobindin have been remarkably conserved over 250 my Genetics 160: 181-200. of evolution, presumably through purifying selection. The Gao, B., L. E. Klein, R. J. Britten, and E. H. Davidson. 1986. Se- sections involved in species recognition have been evolving quence of mRNA coding for bindin, a species-specific sperm protein rapidly in seemingly unpredictable directions, presumably required for fertilization. Proc. Nati. Acad. Sei. USA 83: 8634-8638. Glabe, C. G., and D. Clark. 1991. The sequence of the Arbacia punctu- under diversifying selection; such changes are likely to be lata bindin cDNA and implications for the structural basis of species- specific to each species. specific sperm adhesion and fertilization. Dev. Biol. 143: 282-288. A number of features identified by these comparisons are Hellberg, M. F., and V. D. Vacquier. 1999. Rapid evolution of fertil- in need of functional explanations. Among the conserved ization selectivity and lysin cDNA sequences in teguline gastropods. features, the lack of change in the core region is the only one Mol. Biol. Evol. 16: 839-848. Hofmann, A., and C. G. Glabe. 1994. Bindin, a multifunctional sperm that can be easily explained. We do not yet know whether ligand and the evolution of new species. Semin. Dev. Biol. 5: 233-242. there is a particular reason for the low codon usage bias of Hughes, A. L., T. Ota, and M. Nei. 1990. Positive Darwinian selection all bindins, for the absence of tryptophan or cysteine resi- promotes charge profile diversity in the antigen-binding cleft of class I dues, or for the absence of major hydrophobic regions in all major-histocompatibility-complex molecules. Mol. Biol. Evol. 7: 515- bindins except that of Arbacia. The differences between the 524. orders are equally puzzling. Is there a functional reason for Kier, P. M. 1977. The poor fossil record of the regular echinoid. Paleo- biology 3: 168-174. the length variation of the regions outside the core? Why do Kresge, N., V. D. Vacquier, and C. D. Stout. 2001. Abalone lysin: the the Echinoida and the Spatangoida have glycine-rich repeats dissolving and evolving sperm protein. Bioessays 23: 95-103. in the regions flanking the core, while other orders do not? Kyte, J., and R. F. Doolittle. 1982. A simple method for displaying the Comparisons alone cannot provide answers to these ques- hydrophobic character of a protein. J. Mol. Biol. 157: 105-132. tions; but they can identify features of the molecule that are Lee, Y.-H., and V. D. Vacquier. 1992. The divergence of species- specific abalone sperm lysins is promoted by positive Darwinian se- worthy of functional study. lection. Biol. Bull. 182: 97-104. Lee, Y.-H., T. Ota, and V. D. Vacquier. 1995. Positive selection is a Acknowledgments general phenomenon in the evolution of abalone sperm lysin. Mol. Biol. Evol. 12: 231-238. We are grateful to A. and L. Calderón for providing Littlewood, D. T. J., and A. B. Smith. 1995. A combined morphological support in the laboratory, to M. McCartney for primer and molecular phylogeny for sea urchins (Echinoidea: Echinodermata). design and advice on the RACE technique, to T. Duda for Philos. Trans. R. Soc. Lond. B 347: 213-234. Lloyd, A. T., and P. M. Sharp. 1992. CODONS: A microcomputer collecting Moira clotho, and to E. Popodi for providing program for codon usage analysis. J. Hered. 83: 239-240. testis RNA from Heliocidaris erythrogramma. Comments Lopez, A., S. J. Miraglia, and C. G. Glabe. 1993. Structure/function from C. Cunningham, D. McClay, R. Sponer, W. Swanson, analysis of the sea-urchin sperm adhesive protein bindin. Dev. Biol. V. Vacquier, and two anonymous reviewers improved the 156: 24-33. manuscript. This work was supported by National Science Luporini, P., A. Vallesi, C. Miceli, and R. A. Bradshaw. 1995. Chem- ical signaling in ciliates. J. Eukaryot. Microbiol. 42: 208-212. Foundation and Smithsonian predoctoral fellowships to Metz, F. C, and S. R. Palumbi. 1996. Positive selection and sequence KSZ, by the Duke University Department of Zoology, and rearrangements generate extensive polymorphism in the gamete recog- by the Smithsonian Molecular Evolution Program. nition protein bindin. Mol. Biol. Evol. 13: 397-406. Metz, F. C, G. Gomez-Gutierez, and V. D. Vacquier. 1998a. Mito- Literature Cited chondrial DNA and bindin gene sequence evolution among allopatric species of the sea urchin genus Arbacia. Mol. Biol. Evol. 15: 185-195. Biermann, C. H. 1998. The molecular evolution of sperm bindin in six Metz, E. C, R. Robles-Sikisaka, and V. D. Vacquier. 1998b. Nonsyn- species of sea urchins (Echinoida: ). Mol. Biol. onymous substitution in abalone sperm fertilization genes exceeds Evol. 15: 1761-1771. substitutions in introns and mitochondrial DNA. Proc. Nati. Acad. Sei. Brandriff, B., G. W. Moy, and V. D. Vacquier. 1978. Isolation of USA 95: 10,676-10,681. sperm bindin from the oyster (Crussostrea gigas). Gamete Res. 89: Minor, J. F., D. R. Fromson, R. J. Britten, and F. H. Davidson. 1991. 89-99. Comparison of the bindin proteins of Strongylocentrotus franciscanus, Brendel, V., P. Bücher, I. Nourbakhsh, B. E. Blaisdell, and S. Karlin. S. purpuratus, and Lytechinus variegatus: sequences involved in the 1992. Methods and algorithms for statistical analysis of protein se- species specificity of fertilization. Mol. Biol. Evol. 8: 781-795. quences. Proc. Nati. Acad. Sei. USA 89: 2002-2006. Moy, G. W., and V. D. Vacquier. 1979. Immunoperoxidase localization Civetta, A., and R. S. Singh. 1998. Sex-related genes, directional sexual of bindin during the adhesion of sperm to sea urchin eggs. Curr. Top. selection, and speciation. Mol. Biol. Evol. 15: 901-909. Dev. Biol. 13: 31-44. Debenham, P., M. A. Brzezinski, and K. R. Foltz. 2000. Evaluation of Rambaut, A. 1996. Se-Al: Sequence Alignment Editor. University of EVOLUTION OF BINDIN 15

Oxford, Oxford [Online]. Available: http://evolve.zoo.ox.ac.uk/ [ac- Ulrich, A. S., M. Otter, C. G. Glabe, and D. Hoekstra. 1998. Mem- cessed June 2003]. brane fusion is induced by a distinct peptide sequence of the sea urchin Seidah, N. G., and M. Chretien. 1999. Proprotein and prohormone fertilization protein bindin. J. Biol. Chem. 273: 16,748-16,755. convertases: a family of subtilases generating diverse bioactive polypep- Ulrich, A. S., W. Tichelaar, G. Forster, O. Zschornig, S. Weinkauf, and tides. Bruin Res. 848: 45-62. H. W. Meyer. 1999. Ultrastructural characterization of peptide-in- Smith, A. B. 1984. Echinoid Paleobiology. George Allen and Unwin, duced membrane fusion and peptide self-assembly in the lipid bilayer. London. 190 pp. Biophys. J. 77: 829-841. Smith, A. B. 1988. Phylogenetic relationship, divergence times, and Vacquier, V. D. 1998. Evolution of gamete recognition proteins. Science rates of molecular evolution for camarodont sea urchins. Mol. Biol. 281: 1995-1998. Evol. 5: 345-365. Vacquier, V. D., and G. W. Moy. 1977. Isolation of bindin: the protein Smith, A. B., D. T. J. Littlewood, and G. A. Wray. 1995. Comparing responsible for adhesion of sperm to sea urchin eggs. Proc. Nail. Acad. patterns of evolution: larval and adult life history stages and ribosomal Sei. USA 74: 2456-2460. RNA of post-Paleozoic echinoids. Philos. Trans. R. Soc. Land. B 349: Vacquier, V. D., W. J. Swanson, and M. E. Hellberg. 1995. What have 11-18. we learned about sea urchin sperm bindin? Dev. Growth Dijfer. 37: Steiner, D. F. 1998. The proprotein convertases. Curr. Opin. Chem. Biol. 1-10. 2: 31-39. Wright, F. 1990. The "effective number of codons" used in a gene. Gene Swanson, W. J., and V. D. Vacquier. 2002. Reproductive protein 87: 23-29. evolution. Annu. Rev. Ecol. Syst. 33: 161-179. Yang, Z., W. J. Swanson, and V. D. Vacquier. 2000. Maximum like- Takagi, T., A. Nakamura, R. Deguchi, and K. Kyozuka. 1994. Isola- lihood analysis of molecular adaptation in abalone sperm lysin reveals tion, characterization, and primary structure of three major proteins variable selective pressures among lineages and sites. Mol. Biol. Evol. obtained from Mytilus edulis sperm. J. Biochem. 116: 598-605. 17: 1446-1455. Taylor, W. R. 1986. The classification of amino acid conservation. /. Zigler, K. S., and H. A. Lessios. 2003. Evolution of bindin in the Theor Biol. 119: 205-218. pantropical sea urchin Tripneustes: comparisons to bindin of other Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. genera. Mol Biol. Evol. 20: 220-231. Higgins. 1997. The ClustalX windows interface: flexible strategies Zigler, K. S., E. C. Raff, E. Popodi, R. A. Raff, and H. A. Lessios. 2003. for multiple sequence alignment aided by quality analysis tools. Nu- Adaptive evolution of bindin is correlated with the shift to direct cleic Acids Res. 24: 4876-4882. development in the genus Heliocidaris. Evolution (In press).