Copyright © 2004 by the Genetics Society of America DOI: 10.1534/genetics.104.030478

Evolutionary Expressed Sequence Tag Analysis of Drosophila Female Reproductive Tracts Identifies Genes Subjected to Positive Selection

Willie J. Swanson,*,†,1 Alex Wong,† Mariana F. Wolfner† and Charles F. Aquadro†
*Department of Genome Sciences, University of Washington, Seattle, Washington 98195-7730 and †Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853-2703
Manuscript received April 23, 2004
Accepted for publication August 10, 2004

ABSTRACT Genes whose products are involved in reproduction include some of the fastest-evolving genes found within the genomes of several organisms. Drosophila has long been used to study the function and evolutionary dynamics of genes thought to be involved in competition and sexual conflict, two processes that have been hypothesized to drive the adaptive evolution of reproductive molecules. Several seminal fluid proteins (Acps) made in the Drosophila male reproductive tract show evidence of rapid adaptive evolution. To identify candidate genes in the female reproductive tract that may be involved in female–male interactions and that may thus have been subjected to adaptive evolution, we used an evolutionary bioinformatics approach to analyze sequences from a cDNA library that we have generated from Drosophila female reproductive tracts. We further demonstrate that several of these genes have been subjected to positive selection. Their expression in female reproductive tracts, presence of signal sequences/transmembrane domains, and rapid adaptive evolution indicate that they are prime candidates to encode female reproductive molecules that interact with rapidly evolving male Acps.

GENES whose products participate in reproduction often show signs of adaptive evolution (Swanson and Vacquier 2002). For example, two-dimensional gel electrophoresis has shown that proteins from Drosophila male and female reproductive organs are, on average, twice as diverse between species as those from nonreproductive tissues (Civetta and Singh 1995). A similar pattern has been found at the nucleotide level for Drosophila male accessory gland proteins (Acps) (Aguadé et al. 1992; Tsaur and Wu 1997; Aguadé 1998, 1999; Tsaur et al. 1998; Begun et al. 2000; Swanson et al. 2001a; Kern et al. 2004). Acps are important components of the seminal fluid of the male ejaculate and have been shown to have a variety of effects on the mated female (Wolfner 2002). Acps have been shown to increase the female's egg-laying rate (Herndon and Wolfner 1995; Soller et al. 1997, 1999; Heifetz et al. 2000, 2001; Chapman et al. 2003; Liu and Kubli 2003), reduce her receptivity to remating (Chen et al. 1988; Chapman et al. 2003; Liu and Kubli 2003), decrease the female's lifespan (Chapman et al. 1995), and be involved in sperm storage and utilization (Neubaum and Wolfner 1999; Tram and Wolfner 1999; Xue and Noll 2000). An analysis of expressed sequence tags (ESTs) derived from Drosophila simulans male accessory glands and compared to the completed D. melanogaster genome demonstrated that the genes encoding Acps are on average twice as divergent as non-Acp genes (Swanson et al. 2001a). Although no statistically significant departures from neutrality were observed in the tests applied in their study, 11% of the ESTs identified by Swanson et al. (2001a) showed a signature consistent with adaptive evolution by virtue of having a dN/dS ratio greater than one.

Although the nature and evolution of several reproductive molecules contributed by the male have been studied in detail, relatively little is known about the evolution of female reproductive molecules. The few cases studied so far suggest that adaptive evolution may also occur in female reproductive molecules. Positive selection on female reproductive molecules has been detected in mammals (Swanson et al. 2001b, 2003; Jansa et al. 2003) and abalone (Galindo et al. 2003). Here we present the first systematic attempt to identify genes encoding female reproductive proteins in Drosophila and to initiate evolutionary analyses of several such genes.

To this end, we have undertaken an evolutionary EST screen of the reproductive tract of female Drosophila. Proteins produced in the female reproductive tract carry out a variety of important physiological functions. Processes such as sperm storage, control of oogenesis and ovulation, and control over remating rate are likely to involve interactions between female molecules and molecules transferred from the male to the female. For example, the male seminal fluid proteins Acp36DE and Acp62F localize to the sperm storage organs following mating (Neubaum and Wolfner 1999; Lung et al. 2002), Acp26Aa localizes to the base of the ovary (Heifetz et al. 2000), and sex peptide (Acp70A) binds to receptors in the female genital tract (Ottiger et al. 2000). Thus, we expect that some proteins expressed in the female reproductive tract will interact molecularly with Acps, sperm, or other components of the male ejaculate. Molecules secreted into the female reproductive tract may also carry out a variety of functions, such as egg activation, lubrication, or defense against pathogens, that do not necessitate any molecular contribution from the male (Wolfner et al. 2004).

Our first goal in carrying out this EST screen was to identify a suite of genes whose products can be considered candidate female reproductive molecules. Since a recurring observation about reproductive proteins is that many show adaptive divergence (Swanson and Vacquier 2002), we also incorporate evolutionary information into our screen by deriving ESTs from D. simulans (a close relative of D. melanogaster) and aligning them to their putative orthologs in the completed D. melanogaster genome (Adams et al. 2000). We identified 526 genes that show enriched expression in the female reproductive tract, 169 of which encode predicted extracellular or cell surface molecules that could interact with male proteins during reproduction.

Our second goal, given the interspecific amino acid sequence diversity that has been observed for Drosophila male accessory gland genes (Tsaur and Wu 1997; Aguadé 1998, 1999; Tsaur et al. 1998; Begun et al. 2000; Swanson et al. 2001a; Kern et al. 2004), was to determine if there is a similar level of diversity among female Drosophila reproductive molecules. Analysis of nucleotide sequence polymorphism within and/or divergence among Drosophila species reveals statistically robust evidence that at least six genes expressed in the female reproductive tract show signs consistent with having been subjected to positive selection and identifies 25 additional candidates that may also show adaptive evolution upon further analysis. The identification of genes involved in male-female interactions during reproduction should provide important molecular insight into sperm precedence (Parker 1970), sexual conflict (Rice 1996; Gavrilets 2000), or cryptic female choice (Eberhard 1996), processes that have been proposed to account for the adaptive evolution of reproductive proteins.

1 Corresponding author: Department of Genome Sciences, University of Washington, Seattle, WA 98195-7730. E-mail: [email protected]

Genetics 168: 1457–1465 (November 2004)

MATERIALS AND METHODS

cDNA library preparation: Total RNA was purified by the guanidinium isothiocyanate/CsCl method (MacDonald et al. 1987) from 600 female reproductive tracts minus ovaries (oviducts, uterus, parovaria, spermathecae, and seminal receptacle) that had been dissected from mixed-age adult D. simulans flies from a bottle culture. mRNA was purified using QIAGEN (Valencia, CA) oligotex spin columns. Oligo(dT)-primed cDNA was synthesized using SuperScript reverse transcriptase and cloned into the pCMV-Sport6 vector (Invitrogen, San Diego). We did not perform in-solution subtractive hybridization or normalize the cDNA library because these methods typically result in truncated cDNAs, and we desired full-length cDNA for our evolutionary comparisons. The resulting library contained 130,000 CFUs, of which 99% were recombinant. The average insert size was 1.2 kb. Two sets of probes were utilized for differential hybridization. First, oligo(dT)-primed first-strand male cDNA was prepared from whole adult male D. simulans flies of mixed age and mating status using Bethesda Research Laboratories (Gaithersburg, MD) SuperScript II reverse transcriptase incorporating 32P-labeled dCTP and then denatured at 65° for 30 min in 0.3 M NaOH. Second, a random-primed probe was generated from a mixture of RT-PCR products from the three female yolk genes from D. melanogaster: YP1, YP2, and YP3 (Barnett et al. 1980). These genes were screened out of the library since yolk protein RNAs are abundantly expressed in the fat body, which is associated with the reproductive tract (Barnett et al. 1980) (they are also expressed in the ovary, which was removed). Hybridization was for 18 hr at 65° in 5× SSPE, 5× Denhardt's, 0.5% SDS, 0.2 mg/ml salmon sperm DNA. Final washes were at 65°, 0.1× SSPE for 10 min. Sequencing was from QIAGEN purified plasmid DNA using ABI big dye terminator sequencing chemistry analyzed on an ABI 3100 automated sequencer. EST sequences are deposited in GenBank under accession nos. CO391819–CO392724, CO408479, and CO408480.

Polymorphism survey: DNA was extracted using the PureGene DNA isolation kit from isofemale lines of D. melanogaster and D. simulans previously collected by C. Aquadro in Beltsville, Maryland. To maximize the power of our statistical tests, we focused our analyses on intron regions, which should maximize variation within and between species under neutrality. PCR primers and conditions are available as online supplementary material at http://www.genetics.org/supplemental/. PCR products were diluted eightfold with water and sequenced directly using ABI big dye terminator sequencing chemistry and analyzed on an ABI 3100 automated sequencer. Sequences are deposited in GenBank under accession nos. AY665365–AY665396.

Divergence study: We assessed DNA sequence divergence among five to eight increasingly divergent species of Drosophila for five genes. For each we used either all or overlapping subsets of the following species: D. erecta, D. eugracilis, D. lutescens, D. melanogaster, D. pseudoobscura, D. simulans, D. teissieri, and D. yakuba (detailed in results). We used two tree topologies [differing only in the placement of D. erecta (Ko et al. 2003)] and the results were consistent. The two topologies were: (pseudoobscura, lutescens, (eugracilis, (erecta, ((teissieri, yakuba), (melanogaster, simulans))))) and (pseudoobscura, lutescens, (eugracilis, ((erecta, (teissieri, yakuba)), (melanogaster, simulans)))). Sequences for D. melanogaster and D. pseudoobscura were obtained from public databases (http://genome.ucsc.edu/). Stocks for the other species (except our own D. simulans) were obtained from the Drosophila Species Stock Center in Tucson, Arizona. Since the analyses are based upon coding regions, we amplified the coding sequence from cDNA. Total RNA was extracted from mixed-age females using Trizol Reagent (Invitrogen). Random decamer primed cDNA was synthesized using MMLV-Reverse Transcriptase (Ambion, Austin, TX). Primers were designed in conserved regions of the genes of interest, which were identified by aligning the D. melanogaster gene sequences with their tblastn best hits in the genome of D. pseudoobscura. PCR primers and conditions are available as online supplementary material at http://www.genetics.org/supplemental/. PCR products were purified using the QIAquick PCR purification kit (QIAGEN) and sequenced using an ABI 3700 sequencer (Macrogen). Sequences are deposited in GenBank under accession nos. AY665365–AY665396.
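The two topologies used in the divergence study can be compared mechanically. The following is a minimal Python sketch, not part of the original analysis: it writes the two trees as Newick-style strings, collects the set of species under each internal node, and reports the groupings unique to each tree. The helper names `parse_newick` and `clades` are hypothetical, and the parser assumes well-formed, label-only trees without branch lengths.

```python
def parse_newick(text):
    """Parse a label-only Newick string into nested tuples of leaf names."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and optional ";"
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                          # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1                      # consume ","
                children.append(node())
            pos += 1                          # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]                # leaf label

    return node()


def clades(tree, found):
    """Return the leaf set under `tree`, recording each internal node's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, found) for child in tree))
    found.add(leaves)
    return leaves


# The two topologies as given in the text (species epithets only).
TOPOLOGY_A = ("(pseudoobscura, lutescens, (eugracilis, (erecta, "
              "((teissieri, yakuba), (melanogaster, simulans)))))")
TOPOLOGY_B = ("(pseudoobscura, lutescens, (eugracilis, ((erecta, "
              "(teissieri, yakuba)), (melanogaster, simulans))))")

clades_a, clades_b = set(), set()
taxa_a = clades(parse_newick(TOPOLOGY_A), clades_a)
taxa_b = clades(parse_newick(TOPOLOGY_B), clades_b)

only_a = clades_a - clades_b   # groupings present only in the first tree
only_b = clades_b - clades_a   # groupings present only in the second tree

print("same taxa:", taxa_a == taxa_b)
print("only in A:", [sorted(c) for c in only_a])
print("only in B:", [sorted(c) for c in only_b])
```

Under these assumptions each tree has exactly one grouping the other lacks: (teissieri, yakuba, melanogaster, simulans) in the first and (erecta, teissieri, yakuba) in the second, which is what a difference in the placement of D. erecta alone would produce.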

Evolutionary and bioinformatic analyses: The D. simulans EST sequences were aligned against the D. melanogaster predicted coding sequences, and the alignment was used to calculate dN/dS ratios using the maximum-likelihood methods (Goldman and Yang 1994) implemented in the program PAML (Yang 2000). Assessment of the significance of excess dN over dS was determined as follows. dN and dS were estimated as two free parameters by maximum likelihood (L1). The likelihood was also calculated for the null model having dN equal to dS (L0). The negative of twice the difference in the log-likelihood obtained from these two models (−2[log(L0) − log(L1)]) was compared to the chi-square distribution with 1 d.f. For the polymorphism survey, Tajima's D (Tajima 1989), Fu and Li's D (Fu and Li 1993), and Fay and Wu's H (Fay and Wu 2000) were calculated using DnaSP4.0 (Rozas and Rozas 1999). Significance was determined by coalescent simulations with R (recombination) estimated from the data by the method of Hudson (1987). These three statistics for polymorphism data analyze the frequency of alleles (frequency spectrum) within the sample. The departures from neutrality include an excess of rare alleles (Tajima 1989; Fu and Li 1993) or an excess of high-frequency-derived alleles (Fay and Wu 2000). These specific departures are expected to be associated with recent selection acting at or near a locus. During a selective sweep, in the presence of recombination, linked variation is dragged toward fixation, resulting in an excess of high-frequency-derived alleles in regions flanking the target of selection. The fixation of the favored variant results in the elimination of polymorphism at sites immediately surrounding the selected site (size of region is dependent upon recombination and the strength of selection). As new mutations occur in this region after the sweep and drift upward in frequency, there is an initial excess of rare alleles since every new mutation produces a new allele. The time to return to an equilibrium frequency distribution is a function of the population size and can be quite slow for large populations.

For the divergence analyses, we used PAML (Yang 2000) to calculate the likelihood of a neutral model where no codons could have a dN/dS ratio >1 (L0) and compared it to the likelihood of a model in which a subset of sites could have a dN/dS ratio >1 (L1) (Yang and Bielawski 2000). The negative of twice the difference in the log-likelihood obtained from these two models (−2[log(L0) − log(L1)]) was compared to the chi-square distribution with degrees of freedom equal to the difference in number of estimated parameters. Variation in the dN/dS ratio between sites was modeled using both discrete (PAML models M0 and M3) and β (PAML models M7 and M8) distributions. We consider the comparison of model M0 and M3 to be a test for variation in the dN/dS ratio between sites and not a robust test of adaptive evolution. The comparison of M7 and M8 is a robust test of adaptive evolution. To determine if the dN/dS ratio significantly exceeds 1, we compared the M8 model to the likelihood of a model (M8A) with the additional proportion of sites fixed at a dN/dS ratio of 1 (Swanson et al. 2003). Details of the distributions and test statistics can be found in Yang et al. (2000). Signal sequences were predicted using the program SignalP (http://www.cbs.dtu.dk/services/SignalP-2.0/; Nielsen et al. 1997). Transmembrane regions were predicted using the TMHMM methods (Sonnhammer et al. 1998), using the TMHMM server (http://www.cbs.dtu.dk/services/TMHMM-2.0/).

RESULTS

Evolutionary EST screen identifies candidate female reproductive genes: We constructed a cDNA library from dissected D. simulans female reproductive tracts minus ovaries. Ovaries were excluded because they express a diverse array of transcripts important for embryonic development, and we wished to enrich our cDNA library for candidate molecules expressed in, or secreted from, reproductive epithelia. We performed a differential hybridization screen of our cDNA library with 32P-labeled cDNA made from whole adult male D. simulans. Low and nonhybridizing clones were selected for further analysis to enrich the collection of ESTs to be analyzed for those with predominant expression in female reproductive tracts (although transcripts expressed at low levels in both sexes are still present). It is important to note the possibility that not all proteins important in reproduction are female specific or enriched. As such, our approach may have screened out some non-sex-specific genes whose products, in females, interact with male proteins. However, the need to screen out abundant general molecules like actin, tubulin, etc., made it critical to include this step in our screen. We selected 960 clones for sequencing. Of these, we were able to obtain sequence reads of >100 bp for 908 clones. These were used for further analyses.

The 908 ESTs corresponded to 526 independent genes. We focused on genes predicted to encode extracellular or cell surface molecules, since they could potentially be receptors or binding partners for Acps or sperm or be involved in male-independent extracellular processes. We used a bioinformatics approach to identify genes encoding proteins with a predicted secretory signal sequence and/or transmembrane domains. The identification of a signal sequence relies on a correct prediction of the first coding exon. Since initial exons are notoriously difficult to predict (Davuluri et al. 2001) and some proteins have internal secretory signals, we also included genes containing one or more predicted transmembrane regions. Thirty-five encoded proteins with a predicted signal sequence and transmembrane domain, 75 had just a predicted signal sequence, and 59 had predicted transmembrane domains but no predicted signal sequence.

Several male reproductive proteins show the molecular signature of adaptive evolution, and several hypotheses to account for that rapid evolution would predict a similar pattern for the female proteins with which they interact. We thus incorporated evolutionary information into our screen by deriving our ESTs from D. simulans, which allows comparison of them to their putative orthologs in the completed D. melanogaster genome. We then calculated the rate of synonymous (silent, dS) and nonsynonymous substitution (amino acid replacement changes, dN) using maximum-likelihood methods (Goldman and Yang 1994). The average dN/dS ratio of the 461 protein-coding ESTs is 0.15 ± 0.25 (with the average dN being 0.013 and dS being 0.091).

The signature of adaptive evolution is a dN/dS ratio significantly exceeding 1, as equal numbers of nonsynonymous and synonymous substitutions, normalized to the number of possible nonsynonymous and synonymous changes in the gene, are expected under strict neutrality. Our goal here, however, is to use a genomic screen to identify candidate genes that have been subjected to positive selection, possibly at only a small subset of their codons. For example, mammalian egg coat proteins (ZP proteins) have an overall dN/dS ratio of ~0.5, but upon detailed analysis incorporating variation in the dN/dS ratio between sites using maximum likelihood (Yang et al. 2000) it can be demonstrated that these genes are subjected to positive selection (Swanson et al. 2001b) with a class of codons having a dN/dS ratio >1. We therefore surveyed the literature for articles utilizing the method of Yang et al. (2000) for detecting adaptive evolution through analysis for variation in the dN/dS ratio between sites. We have plotted the proportion of genes with evidence of positive selection in relation to their overall dN/dS ratio in Figure 1. At a dN/dS ratio >0.5, 19 of 20 genes analyzed showed statistical evidence for adaptive evolution, suggesting this may be a reasonable value to identify candidate genes that may have been subjected to adaptive evolution. The genes in Figure 1 that fall between a dN/dS ratio of 0.3–0.5 also include a high proportion that show statistical evidence for adaptive evolution upon closer examination; however, these may be overrepresented in our analyses due to the lack of reports detailing negative results (and thus they are not included in our analysis). The genes, references, and summary information are available as online supplementary material at http://www.genetics.org/supplemental/ for the 70 genes analyzed in Figure 1. Although only 25% of the 70 genes reported failed to show statistical evidence for adaptive evolution in subsequent PAML analysis, the proportion of genes under positive selection is surely overestimated due to the lack of reports that failed to detect adaptive evolution. Nonetheless, genes with an overall dN/dS ratio >0.5 are more likely to have been subjected to adaptive evolution and are thus good candidates for further study. In our EST screen, 27 out of the total of 461 protein-coding genes have dN/dS ratios >0.5 (Figure 2), including eight of the candidate receptor proteins (containing signal sequences and/or transmembrane regions; Table 1).

Figure 1. Analysis of 70 genes, from published research articles on detecting adaptive evolution by analysis for variation in the dN/dS ratio between sites by the method of Yang et al. (2000). Additional information and references can be found as online supplementary material at http://www.genetics.org/supplemental/.

Figure 2. Plot of dN vs. dS for the 461 D. simulans ESTs that matched protein-coding regions of D. melanogaster genes. The solid line is the neutral expectation of dN/dS = 1. The dashed line is the cutoff of dN/dS = 0.5 used to identify candidate genes that may have been subjected to positive selection.

TABLE 1
Classification of ESTs based upon evolutionary and bioinformatics analyses

Classification           No. genes  No. cds  dS     dN     dN/dS  No. with dN/dS > 0.5
SS and TM                35         31       0.102  0.009  0.14   1
SS                       75         70       0.111  0.023  0.23   5
TM                       59         51       0.099  0.015  0.13   2
All candidates combined  169        152      0.105  0.017  0.17   8
Noncandidates            357        309      0.084  0.011  0.14   19
All                      526        461      0.091  0.013  0.15   27

SS, signal sequence; TM, transmembrane region; All candidates combined, those with SS and/or TM domains; Noncandidates, lacked TM and/or SS domains; All, all genes identified in the EST screen; No. cds, the number of ESTs containing protein-coding sequence.

Some of the genes identified by this female reproductive tract evolutionary EST approach have predicted ORF sequences consistent with likely functions for Drosophila reproductive proteins. Sixteen predicted peptidases and eight predicted protease inhibitors were found. At least two Drosophila male seminal fluid proteins that are transferred to females undergo proteolytic cleavage (Monsma et al. 1990; Bertram et al. 1996), and in at least one case this cleavage is dependent on contributions from the female as well as the male (Park and Wolfner 1995). Although the nature of the female contribution is unknown, it could involve proteases (to cleave) and protease inhibitors (to confine cleavage to appropriate sites in the protein) such as the predicted ones identified here. Additionally, there are 47 different proteins with putative transporter activity and 11 different putative signal transducer genes that could be involved in regulating the mated female's physiology (Table 2). For example, it has been hypothesized that a transporter moves the Acp70a (sex peptide) from the reproductive tract to the hemolymph, where it binds receptors in the nervous system of the female (Ding et al. 2003). Finally, there are several genes predicted to be involved in defense or immunity. These candidates are all prime targets for functional analyses. A summary of the molecular functions based upon the gene ontology classification (Ashburner et al. 2000) is provided in Table 2. Details of all genes identified in our screen can be found as online supplementary material at http://www.genetics.org/supplemental/.

TABLE 2
Gene Ontology Functions

Molecular function        No. from 526 independent genes
Unclassified              184
Catalytic activity        148
Binding                   95
Transport activity        47
Structural molecule       42
Enzyme regulator          20
Transcription regulator   17
Signal transducer         11
Translation regulator     9
Chaperone activity        9
Antioxidant activity      4
Motor activity            3
Defense/immunity protein  2
Unknown                   2
Cell adhesion             1
Apoptosis regulator       1
Protein tagging           1

Divergence and polymorphism studies demonstrate adaptive evolution: The evolutionary EST approach utilized here (isolating ESTs from one organism and comparing to the completed genome of a close relative; Swanson et al. 2001a) is aimed at identifying candidate genes for further tests for adaptive evolution. Each individual prediction of adaptive evolution needs to be independently verified. To test if any of the candidate genes identified herein have actually been subjected to positive selection, we performed a polymorphism survey of nine of the genes from D. melanogaster and D. simulans isofemale lines isolated from Maryland (Table 3) and divergence analyses on five of the same genes in five to eight Drosophila species (Table 4). Genes were chosen on the basis of predicted extracellular localization of the protein they encode and/or overall dN/dS ratio >0.5.

For the polymorphism survey, we analyzed the frequency spectrum (i.e., analysis of proportion of alleles at high vs. low frequencies) of the polymorphisms for departures from equilibrium neutral expectations (Aquadro 1997). In particular, we analyzed for an excess of rare alleles (i.e., singletons; Tajima 1989; Fu and Li 1993) or an excess of high-frequency-derived polymorphisms (Fay and Wu 2000). Either pattern could have resulted from a recent selective sweep or a population bottleneck. To maximize the power of our statistical tests, we focused our analyses on intron regions, which should maximize variation within and between species under neutrality. We ruled out any genome-wide confounding effects, such as demographics (e.g., population bottleneck), on these statistics, since three loci (Table 3) and additional

TABLE 3
Polymorphism survey identifies positive selection in several candidate genes

                                                             D. melanogaster                        D. simulans
Gene      bp   GO function     EST dN/dS  Rationale          N   π      Taj. D  F&L D   F&W H      N   π      Taj. D  F&L D   F&W H
CG16705   822  Protease        0.30       SS                 27  0.007  0.1     0.5     −0.4       11  0.011  0.0     0.5     −2.1
CG17108   731  Unknown         0.36       SS                 31  0.002  −1.6*   −3.0*   0.8        9   0.004  −1.9*   −0.3    −7.0*
CG4928    750  TM receptor     0.03       TM                 34  0.002  −1.3    0.2     −4.1*      6   0.013  −0.7    −4.3*   −0.1
CG10200‡  716  Unknown         1.26       SS, dN/dS          23  0.010  1.9*    2.9*    4.4        5   0.008  0.3     0.1     1.0
CG16707‡  830  Unknown         1.38       SS, TM, dN/dS      19  0.003  1.7*    3.3*    1.2        5   0.004  1.2*    0.1     2.7*
CG7415    859  Protease        0.05       Function           19  0.001  −1.8*   −2.2*   0.6        8   0.007  −0.7    −1.3    −0.34
CG8827    753  Protease        0.03       SS                 34  0.008  0.6     1.3     0.9        12  0.019  −0.3    0.2     3.2
CG11390   793  Ligand carrier  0.04       SS                 25  0.007  −0.4    −1.2    0.4        9   0.005  −0.2    −0.9    1.1
CG3066    788  Protease        0.17       SS, TM             13  0.007  −0.3    −0.3    −4.2*      11  0.013  0.2     0.0     3.1

bp, number of base pairs sequenced; GO function, gene ontology function (Ashburner et al. 2000); EST dN/dS, dN/dS ratio from EST screen; N, number of individuals sequenced; π, nucleotide diversity; Taj. D, Tajima's D; F&L D, Fu and Li's D with outgroup; F&W H, Fay and Wu's H. *P < 0.05. Rationale indicates why the gene was investigated: SS, signal sequence; TM, transmembrane; dN/dS, dN/dS > 0.5; and/or Function, predicted function.
‡Minus signs in this row were detached from their values in the source (four for CG10200, five for CG16707) and cannot be reassigned; the statistics are shown here without sign.
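As a rough illustration of the frequency-spectrum statistics reported in Table 3, the sketch below computes Tajima's D (Tajima 1989) from a set of aligned sequences in Python. It is a minimal sketch under stated assumptions, not the DnaSP implementation used in the study: the function name `tajimas_d` and the toy alignments are hypothetical, the sequences are assumed to be aligned, equal in length, and free of gaps or missing data, and no significance test (such as the coalescent simulations used for Table 3) is attempted.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for aligned, equal-length sequences (no gaps or missing data).

    Returns None when the statistic is undefined (fewer than four sequences
    or no segregating sites).
    """
    n = len(sequences)
    length = len(sequences[0])

    # S: number of segregating (polymorphic) sites in the sample.
    s = sum(1 for i in range(length) if len({seq[i] for seq in sequences}) > 1)
    if n < 4 or s == 0:
        return None

    # pi: mean number of pairwise differences between sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # D contrasts pi with Watterson's estimate S/a1, scaled by its standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignments (not data from this study).
rare_variants = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]   # every variant is a singleton
balanced_variants = ["AA", "AA", "TT", "TT"]               # variants at intermediate frequency

d_rare = tajimas_d(rare_variants)
d_balanced = tajimas_d(balanced_variants)
print(d_rare, d_balanced)
```

A negative value reflects an excess of rare variants relative to the neutral equilibrium expectation, the pattern the text associates with recovery from a selective sweep; a positive value reflects an excess of intermediate-frequency variants.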

unpublished studies of these samples (C. F. Aquadro, unpublished results) conform to equilibrium neutral expectations. We performed polymorphism surveys for nine loci and found evidence for selective sweeps at six of these loci (Table 3), suggesting the recent action of positive selection at or near these genes. Our results are bolstered by finding evidence for recent selective events using multiple statistics that utilize different regions of the frequency spectrum (i.e., high and low frequency). The genes under positive selection by this analysis include two putative proteases, a predicted transmembrane receptor, and three genes with unknown function.

For the divergence studies, we sequenced from several additional Drosophila species five of the genes identified from our polymorphism analysis as having evidence for a recent selective sweep in D. melanogaster and/or D. simulans. We then analyzed the sequence data using maximum-likelihood methods (Nielsen and Yang 1998; Yang et al. 2000) to detect variation in the dN/dS ratio between sites. Divergence analyses were not performed on CG17108 due to the biased amino acid and codon usage seen in this gene, which may induce errors in parameter estimations using codon models. Whereas the polymorphism-based tests are capable of detecting recent selection in a single species, the divergence analyses can detect repeated episodes of positive selection on the same codons in several species. A significant result using these latter methods suggests that a subset of codons in a gene has been subjected to positive selection in several species. We find evidence of variation in the dN/dS ratio for all five genes using the discrete model M3. Four of these genes have a class of sites with dN/dS >1. These four genes are still considered as only candidates for adaptive evolution since using a discrete model with three classes of dN/dS ratios compared to a single overall average dN/dS ratio is not a robust test of adaptive evolution (Swanson et al. 2001b) and should be considered as only a test for variable dN/dS ratios between sites. Using a more refined test with a beta distribution of dN/dS for "neutral" or functionally constrained codons that covers the interval 0–1, we find evidence of positive selection acting upon a subset of codons for two of the five genes studied (Table 4). In both cases the sites in this extra class have dN/dS ratios significantly >1, since a model (M8) with a freely estimated extra class is significantly better than a model where the extra class has a dN/dS ratio fixed at 1 (M8A; Table 4). One gene (CG3066) is a predicted trypsin-like serine protease. Several of the codons inferred to be under positive selection in this gene lie within the predicted trypsin catalytic domain. Furthermore, several putatively selected codons lie in the predicted clip domain, which may be involved in protein-protein interactions (Jiang and Kanost 2000). The second gene (CG16707) does not belong to any predicted functional class.

TABLE 4
Detection of positive selection by maximum-likelihood analysis

                                                                     M0 vs. M3      M7 vs. M8     M8 vs. M8A:
Gene     GO function  Species                                 dN/dS  ps       ω     ps     ω     probability
CG4928   TM receptor  ere, eug, lut, mel, pse, sim, tei       0.0    0.06**   0.24  0.37   0.0   -
CG10200  Unknown      ere, eug, mel, sim, tei, yak            0.4    0.20***  1.46  0.06   1.8   0.34
CG16707  Unknown      ere, mel, sim, tei, yak                 0.5    0.09**   5.5   0.09*  5.5   <0.01
CG7415   Protease     ere, lut, mel, sim, tei, yak            0.1    0.01***  1.2   0.14   0.3   -
CG3066   Protease     ere, eug, lut, mel, pse, sim, tei, yak  0.1    0.04***  1.8   0.03*  2.0   <0.05

GO function, gene ontology (Ashburner et al. 2000) function; Species, species from the set D. erecta (ere), D. eugracilis (eug), D. lutescens (lut), D. melanogaster (mel), D. pseudoobscura (pse), D. simulans (sim), D. teissieri (tei), and D. yakuba (yak); dN/dS, estimate of dN/dS assuming no rate heterogeneity; M0 vs. M3, M3 parameter estimates of dN/dS in the highest site class (ω) and the proportion of sites (ps) estimated to belong to that class; M7 vs. M8, M8 free parameter estimate of dN/dS (ω) and proportion of sites (ps) estimated to belong to that class; M8 vs. M8A: probability, probability that the dN/dS in model 8 is significantly >1. *P < 0.05; **P < 0.01; ***P < 0.001.

DISCUSSION

Adaptive evolution is becoming an increasingly common observation in the study of reproductive proteins. The vast majority of studies have focused on male-derived factors (Swanson and Vacquier 2002), perhaps in part because these are easier to characterize and more have been identified. However, it is clear that female genotype plays an important role during reproduction. In mammals, genes encoding the egg coat proteins ZP2 and ZP3 have been demonstrated to undergo adaptive evolution. Several of the sites predicted to be subjected to adaptive evolution are in regions implicated in sperm-egg binding (Swanson et al. 2001b; Jansa et al. 2003), indicating the selective pressure may relate to sperm-egg interaction. In Drosophila, it has been demonstrated that females play an important role in sperm competition (Price 1997; Clark and Begun 1998; Clark et al. 1999). The class of genes studied in this article includes several genes expressed in the female reproductive tract and subjected to adaptive evolution. The identification of candidate genes encoding Drosophila female reproductive proteins is a crucial step toward understanding, at the molecular level, the male and female interactions that occur during reproduction.

Proteins transferred with sperm to the female during copulation significantly influence the mated female's behavior and physiology in some animals, such as insects, as well as the reproductive success of the participant gametes (Wolfner 1997, 2002). The rapid amino acid sequence divergence of some of these male accessory gland proteins (Tsaur and Wu 1997; Aguadé 1998, 1999; Tsaur et al. 1998; Begun et al. 2000; Swanson et al. 2001a) begs explanation, and several competing (though not mutually exclusive) evolutionary hypotheses have been proposed to explain the rapid evolution (Parker 1970; Eberhard 1996; Rice 1996; Gavrilets et al. 2000; Swanson and Vacquier 2002). Evaluation of these hypotheses, and of the mechanism of action of these proteins, requires knowledge of the proteins in the female with which the male accessory gland proteins interact. To identify genes that could encode such Acp-interacting or -regulated proteins, we carried out an evolutionary EST screen of the female reproductive tract in D. simulans and D. melanogaster and identified 908 ESTs corresponding to 526 independent genes. These genes encode proteins predicted to mediate diverse biological functions and include a number of candidates for proteins in position to interact with Acps (by virtue of being secreted or having transmembrane domains). This screen complements a previous evolutionary EST screen of the male accessory gland (Swanson et al. 2001a). Together these screens provide two sets of genes that likely include partners in molecular interactions that modulate reproductive success in these species.

Of the genes we identified here from the female reproductive tract, 461 contained sufficient protein-coding sequence in the D. simulans EST to make a comparison of nonsynonymous (dN) and synonymous (dS) substitutions between species. Twenty-seven proteins had a ratio of nonsynonymous to synonymous substitutions >0.5, a level that we argue is a useful cutoff to identify genes likely to show evidence of positive selection on further more detailed analysis. Nine candidate proteins with signal sequences and/or transmembrane domains, including two with elevated dN/dS substitution ratios, were further examined for evidence of recent positive selection by analysis of DNA sequence polymorphism in population samples of both D. melanogaster and D. simulans. Six of the nine genes showed evidence of a recent adaptive fixation at or near the candidate locus (Table 3). Subsequent analysis of sequence divergence at five of these genes among five to eight species of Drosophila revealed significant evidence for positive selection accelerating amino acid sequence divergence at between 3 and 9% of the codons in two genes. One of these two genes encodes a serine protease, while the other encodes a protein of unknown function (Table 4).

It is worth noting that the two types of evolutionary analyses utilized here (polymorphism and divergence) are most powerful at detecting different kinds of adaptive evolution. The polymorphism surveys are most powerful at detecting recent selective events (Simonsen et al. 1995). The divergence analyses are most powerful when recurrent selection acts upon a subset of codons over most lineages studied (Anisimova et al. 2001). Importantly, detection of nonneutral patterns by either method should be considered evidence of adaptive evolution. The selective pressure driving the divergence of these genes remains unknown, and determination of the function of the molecules should shed light on the selective pressures.

As a group, the genes chosen as candidates on the

conflict (Rice 1996; Gavrilets 2000), and female choice (Eberhard 1996).

We thank J.
Rozas for a beta release of DnaSP 4.0, Jennifer Calkins basis of the presence of a signal sequence and/or trans- for help with Figure 1, Lawrence Harshman, Andy Clark, members of membrane region have a level of nonsynonymous se- the Swanson, Aquadro, and Wolfner labs, and reviewers for thoughtful suggestions. Support was provided by National Institutes of Health quence divergence (d N) that is 50% greater than that (NIH) grant HD42563, National Science Foundation grant DEB- of the noncandidate reproductive genes. While the level 0111613, and NIH National Research Service Award postdoctoral of synonymous divergence (d S) of the candidate genes fellowship GM20889 (to W.J.S.); and NIH grants GM036431 (to is consistent with the value expected for this species C.F.A.) and HD38921 (to M.F.W.). A.W. is a Howard Hughes Medical .Bauer and Aquadro 1997), it is greater Institute predoctoral fellow ;0.10ف) pair than that of the noncandidate loci by 25%. This differ- ence in d S likely reflects the previously seen positive correlation between protein sequence conservation and LITERATURE CITED synonymous site divergence reported by Akashi et al. Adams, M. D., S. E. Celniker, R. A. Holt, C. A. Evans, J. D. Gocayne (1996). The increased d N is similar to that observed for et al., 2000 The genome sequence of . the male accessory gland genes (Swanson et al. 2001a). Science 287: 2185–2195. Aguade´,M., 1998 Different forces drive the evolution of the Although a lower proportion of ESTs with d N/d S ratios Acp26Aa and Acp26Ab accessory gland genes in the Drosophila Ͼ0.5 was observed among the female reproductive tract melanogaster species complex. Genetics 150: 1079–1089. ESTs reported here than among male accessory gland Aguade´,M., 1999 Positive selection drives the evolution of the Acp29AB accessory gland protein in Drosophila. Genetics 152: ESTs from (Swanson et al. 2001a) analysis (6% female 543–551. vs. 19% male Acp), the total number of genes with Aguade´, M., N. 
Miyashita and C. H. Langley, 1992 Polymorphism Ͼ and divergence in the Mst26A male accessory gland gene region d N/d S ratios 0.5 was similar (27 female vs. 33 male in Drosophila. Genetics 132: 755–770. Acp). The polymorphism surveys of genes expressed Akashi, A., S. Ono, K. Kuwano and S. Arai, 1996 Proteins of 30 in the female reproductive tract were consistent with and 36 kilodaltons, membrane constituents of the Staphylococcus positive selection for six out of nine genes analyzed aureus L form, induce production of tumor necrosis factor alpha and activate the human immunodeficiency virus type 1 long termi- (Table 3). This is a higher proportion than that for nal repeat. Infect. Immun. 64: 3267–3272. surveys of male accessory glands, in which statistical Anisimova, M., J. P. Bielawski and Z. Yang, 2001 Accuracy and departures from neutrality based largely on frequency power of the likelihood ratio test in detecting adaptive molecular evolution. Mol. Biol. Evol. 18: 1585–1592. distributions were observed for three out of nine com- Aquadro, C. F., 1997 Insights into the evolutionary process from parisons in D. melanogaster (Begun et al. 2000) and zero patterns of DNA sequence variability. Curr. Opin. Genet. Dev. out of seven comparisons in D. simulans (Kern et al. 7: 835–840. Ashburner, M., C. A. Ball, J. A. Blake, D. Botstein, H. Butler et 2004). Variance in rates across lineages is consistent al., 2000 Gene ontology: tool for the unification of biology. The with positive selection in some Acp genes in the latter Gene Ontology Consortium. Nat. Genet. 25: 25–29. study. The discrepancy may be due to differences in Barnett, T., C. Pachl, J. P. Gergen and P. C. Wensink, 1980 The isolation and characterization of Drosophila yolk protein genes. selective pressures, sample sizes (the female polymor- Cell 21: 729–738. phism survey had more individuals), or populations ana- Bauer, V. L., and C. F. 
Aquadro, 1997 Rates of DNA sequence lyzed [female was cosmopolitan (this study) and male evolution are not sex-biased in Drosophila melanogaster and D. simulans. Mol. Biol. Evol. 14: 1252–1257. was African]. Begun, D. J., P. Whitley, B. L. Todd, H. M. Waldrip-Dail and A. G. The genes identified here are likely candidates to Clark, 2000 Molecular population genetics of male accessory be the ones encoding molecules made in the female gland proteins in Drosophila. Genetics 156: 1879–1888. Bertram, M. J., D. M. Neubaum and M. F. Wolfner, 1996 Localiza- reproductive tracts that may interact with male-derived tion of the Drosophila male accessory gland protein Acp36DE in factors and should be the target of future functional the mated female suggests a role in sperm storage. Bio- analyses. Prediction of candidate genes encoding repro- chem. Mol. Biol. 26: 971–980. Chapman, T., L. F. Liddle, J. M. Kalb, M. F. Wolfner and L. Par- ductive proteins will facilitate their functional character- tridge, 1995 Cost of mating in Drosophila melanogaster females is ization through allelic association studies and biochemi- mediated by male accessory gland products. Nature 373: 241–244. cal and genetic characterization. It is likely that some Chapman, T., J. Bangham, G. Vinti, B. Seifried, O. Lung et al., 2003 The sex peptide of Drosophila melanogaster: female post-mating of the genes identified in this screen are involved in responses analyzed by using RNA interference. Proc. Natl. Acad. female-specific functions, such as egg activation, lubrica- Sci. USA 100: 9923–9928. Chen, P. S., E. Stumm-Zollinger, T. Aigaki, J. Balmer, M. Bienz tion, or immunity. Moreover, some of the genes identi- et al., 1988 A male accessory gland peptide that regulates repro- fied here as having been subjected to positive selection ductive behavior of female Drosophila melanogaster. Cell 54: 291– may prove to be binding partners of male reproductive 298. Civetta, A., and R. S. 
Singh, 1995 High divergence of reproductive proteins, including the Acps and sperm surface pro- tract proteins and their association with postzygotic reproductive teins. It will be of particular interest to determine if isolation in Drosophila melanogaster and Drosophila virilis group the ligand and its receptor show similar evolutionary species. J. Mol. Evol. 41: 1085–1095. Clark, A. G., and D. J. Begun, 1998 Female genotypes affect sperm dynamics. These studies will help provide molecular displacement in Drosophila. Genetics 149: 1487–1493. insights into sperm precedence (Parker 1970), sexual Clark, A. G., D. J. Begun and T. Prout, 1999 Female x male Drosophila Female Evolutionary ESTs 1465

interactions in Drosophila sperm competition. Science 283: 217– Ottiger, M., M. Soller, R. F. Stocker and E. Kubli, 2000 Binding 220. sites of Drosophila melanogaster sex peptide pheromones. J. Davuluri, R. V., I. Grosse and M. Q. Zhang, 2001 Computational Neurobiol. 44: 57–71. identification of promoters and first exons in the human genome. Park, M., and M. F. Wolfner, 1995 Male and female cooperate Nat. Genet. 29: 412–417. in the prohormone-like processing of a Drosophila melanogaster Ding, Z., I. Haussmann, M. Ottiger and E. Kubli, 2003 Sex-pep- seminal fluid protein. Dev. Biol. 171: 694–702. tides bind to two molecularly different targets in Drosophila Parker, G. A., 1970 Sperm competition and its evolutionary conse- melanogaster females. J. Neurobiol. 55: 372–384. quences in the . Biol. Rev. 45: 525–567. Eberhard, W. G., 1996 Female Control: Sexual Selection by Cryptic Female Price, C. S., 1997 Conspecific sperm precedence in Drosophila. Choice. Princeton University Press, Princeton, NJ. Nature 388: 663–666. Fay, J. C., and C.-IWu, 2000 Hitchhiking under positive Darwinian Rice, W. R., 1996 Sexually antagonistic male adaptation triggered selection. Genetics 155: 1405–1413. by experimental arrest of female evolution. Nature 381: 232–234. Fu, Y. X., and W. H. Li, 1993 Statistical tests of neutrality of muta- Rozas, J., and R. Rozas, 1999 DnaSP version 3: an integrated pro- tions. Genetics 133: 693–709. gram for molecular population genetics and molecular evolution Galindo, B. E., V. D. Vacquier and W. J. Swanson, 2003 Positive analysis. Bioinformatics 15: 174–175. selection in the egg receptor for abalone sperm lysin. Proc. Natl. Simonsen, K. L., G. A. Churchill and C. F. Aquadro, 1995 Proper- Acad. Sci. USA 100: 4639–4643. ties of statistical tests of neutrality for DNA polymorphism data. Gavrilets, S., 2000 Rapid evolution of reproductive barriers driven Genetics 141: 413–429. by sexual conflict. Nature 403: 886–889. Soller, M., M. Bownes and E. 
Kubli, 1997 Mating and sex peptide Gavrilets, S., R. Acton and J. Gravner, 2000 Dynamics of specia- stimulate the accumulation of yolk in oocytes of Drosophila melano- 243: tion and diversification in a metapopulation. Evolution 54: 1493– gaster. Eur. J. Biochem. 732–738. Soller, M., M. Bownes and E. Kubli, 1999 Control of oocyte matu- 1501. ration in sexually mature Drosophila females. Dev. Biol. 208: 337– Goldman, N., and Z. Yang, 1994 A codon-based model of nucleotide 351. substitution for protein-coding DNA sequences. Mol. Biol. Evol. Sonnhammer, E. L., G. von Heijne and A. Krogh, 1998 A hidden 11: 725–736. Markov model for predicting transmembrane helices in protein Heifetz, Y., O. Lung, E. A. Frongillo, Jr. and M. F. Wolfner, 2000 sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6: 175–182. The Drosophila seminal fluid protein Acp26Aa stimulates release Swanson, W. J., and V. D. Vacquier, 2002 Rapid evolution of repro- of oocytes by the ovary. Curr. Biol. 10: 99–102. ductive proteins. Nat. Rev. Genet. 3: 137–144. Heifetz, Y., U. Tram and M. F. Wolfner, 2001 Male contributions Swanson, W. J., A. G. Clark, H. M. Waldrip-Dail, M. F. Wolfner to egg production: the role of accessory gland products and and C. F. Aquadro, 2001a Evolutionary EST analysis identifies sperm in Drosophila melanogaster. Proc. R. Soc. Lond. Ser. B Biol. rapidly evolving male reproductive proteins in Drosophila. Proc. Sci. 268: 175–180. Natl. Acad. Sci. USA 98: 7375–7379. Herndon, L. A., and M. F. Wolfner, 1995 A Drosophila seminal Swanson, W. J., Z. Yang, M. F. Wolfner and C. F. Aquadro, 2001b fluid protein, Acp26Aa, stimulates egg laying in females for 1 day Positive Darwinian selection drives the evolution of several female after mating. Proc. Natl. Acad. Sci. USA 92: 10114–10118. reproductive proteins in mammals. Proc. Natl. Acad. Sci. USA Hudson, R. R., 1987 Estimating the recombination parameter of a 98: 2509–2514. finite population model without selection. Genet. Res. 
50: 245– Swanson, W. J., R. Nielsen and Q. Yang, 2003 Pervasive adaptive 250. evolution in mammalian fertilization proteins. Mol. Biol. Evol. Jansa, S. A., B. L. Lundrigan and P. K. Tucker, 2003 Tests for 20: 18–20. positive selection on immune and reproductive genes in closely Tajima, F., 1989 Statistical method for testing the neutral mutation related species of the murine genus mus. J. Mol. Evol. 56: 294–307. hypothesis by DNA polymorphism. Genetics 123: 585–595. Jiang, H., and M. R. Kanost, 2000 The clip-domain family of serine Tram, U., and M. F. Wolfner, 1999 Male seminal fluid proteins proteinases in arthropods. Insect Biochem. Mol. Biol. 30: 95–105. are essential for sperm storage in Drosophila melanogaster. Genetics Kern, A. D., C. D. Jones and D. J. Begun, 2004 Molecular population 153: 837–844. genetics of male accessory gland proteins in the Drosophila sim- Tsaur, S. C., and C.-IWu, 1997 Positive selection and the molecular ulans complex. Genetics 167: 725–735. evolution of a gene of male reproduction, Acp26Aa of Drosophila. Ko, W. Y., R. M. David and H. Akashi, 2003 Molecular phylogeny Mol. Biol. Evol. 14: 544–549. of the Drosophila melanogaster species subgroup. J. Mol. Evol. Tsaur, S. C., C. T. Ting and C.-IWu, 1998 Positive selection driving 57: 562–573. the evolution of a gene of male reproduction, Acp26Aa, of Dro- Liu, H., and E. Kubli, 2003 Sex-peptide is the molecular basis of sophila: II. Divergence versus polymorphism. Mol. Biol. Evol. 15: the sperm effect in Drosophila melanogaster. Proc. Natl. Acad. Sci. 1040–1046. USA 100: 9929–9933. Wolfner, M. F., 1997 Tokens of love: functions and regulation of Lung, O., U. Tram, C. M. Finnerty, M. A. Eipper-Mains, J. M. Kalb Drosophila male accessory gland products. Insect Biochem. Mol. et al., 2002 The Drosophila melanogaster seminal fluid protein Biol. 27: 179–192. Acp62F is a protease inhibitor that is toxic upon ectopic expres- Wolfner, M. F., 2002 The gifts that keep on giving: physiological sion. 
Genetics 160: 211–224. functions and evolutionary dynamics of male seminal proteins MacDonald, R. J., G. H. Swift, A. E. Przybyla and J. M. Chirgwin, in Drosophila. Heredity 88: 85–93. 1987 Isolation of RNA using guanidinium salts. Methods Enzy- Wolfner, M. F., S. Applebaum and Y. Heifetz, 2004 Insect gonadal mol. 152: 219–227. glands and their gene products in Comprehensive Insect Physiology, Monsma, S. A., H. A. Harada and M. F. Wolfner, 1990 Synthesis Biochemistry, Pharmacology and Molecular Biology, edited by L. Gil- of two Drosophila male accessory gland proteins and their fate after bert,K.Iatrou and S. Gill. Elsevier, Amsterdam/New York. transfer to the female during mating. Dev. Biol. 142: 465–475. Xue, L., and M. Noll, 2000 Drosophila female sexual behavior induced by sterile males showing copulation complementation. Neubaum, D. M., and M. F. Wolfner, 1999 Mated Drosophila melano- Proc. Natl. Acad. Sci. USA 97: 3272–3275. gaster females require a seminal fluid protein, Acp36DE, to store Yang, Z., 2000 Phylogenetic Analysis by Maximum Likelihood (PAML). 153: sperm efficiently. Genetics 845–857. University College London, London. Nielsen, H., J. Engelbrecht, S. Brunak and G. von Heijne, 1997 Yang, Z., and J. P. Bielawski, 2000 Statistical methods for detecting A neural network method for identification of prokaryotic and molecular adaptation. Trends Ecol. Evol. 15: 496–503. eukaryotic signal peptides and prediction of their cleavage sites. Yang, Z., R. Nielsen, N. Goldman and A. M. Pedersen, 2000 Co- Int. J. Neural Syst. 8: 581–599. don-substitution models for heterogeneous selection pressure at Nielsen, R., and Z. Yang, 1998 Likelihood models for detecting posi- amino acid sites. Genetics 155: 431–449. tively selected amino acid sites and applications to the HIV-1 enve- lope gene. Genetics 148: 929–936. Communicating editor: L. Harshman
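
Illustrative note on the M8 vs. M8A comparison. The text reports that model M8, in which the extra class of codons has a freely estimated dN/dS, fits significantly better than M8A, in which that ratio is fixed at 1. The sketch below shows how such a likelihood ratio test could be computed from two log-likelihoods. It is a minimal illustration and not the authors' code. The function names and the log-likelihood values are hypothetical, and the choice of null distribution is an assumption: because the constraint lies on the boundary of the parameter space, a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 d.f. is often used, but this section does not state which null was applied.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 d.f.

    For 1 d.f., P(X > x) = erfc(sqrt(x / 2)), so no external library is needed.
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def lrt_m8_vs_m8a(lnl_m8: float, lnl_m8a: float, mixture: bool = True):
    """Likelihood ratio test of M8 (free extra class) against M8A (extra class fixed at 1).

    Returns (statistic, p_value). With mixture=True the p-value is halved for a
    positive statistic, which corresponds to the assumed 50:50 mixture null;
    with mixture=False the plain 1-d.f. chi-square is used instead.
    """
    # Twice the log-likelihood difference; floored at 0 to absorb optimizer noise.
    stat = max(0.0, 2.0 * (lnl_m8 - lnl_m8a))
    p_value = chi2_sf_df1(stat)
    if mixture and stat > 0.0:
        p_value *= 0.5
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only (not values from Table 4).
    stat, p = lrt_m8_vs_m8a(lnl_m8=-1000.0, lnl_m8a=-1003.0)
    print(f"2*delta lnL = {stat:.2f}, p = {p:.4f}")
```

Passing mixture=False gives the more conservative plain chi-square p-value, which is exactly twice the mixture value for any positive statistic.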