
Copyright © 2010 by the Genetics Society of America. DOI: 10.1534/genetics.110.116756

Gene Duplication, Gene Conversion and the Evolution of the Y Chromosome

Tim Connallon¹ and Andrew G. Clark
Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853-2703
Manuscript received March 17, 2010
Accepted for publication May 31, 2010

ABSTRACT Nonrecombining chromosomes, such as the Y, are expected to degenerate over time due to reduced efficacy of natural selection compared to chromosomes that recombine. However, gene duplication, coupled with gene conversion between duplicate pairs, can potentially counteract forces of evolutionary decay that accompany asexual reproduction. Using a combination of analytical and computer simulation methods, we explicitly show that, although gene conversion has little impact on the probability that duplicates become fixed within a population, conversion can be effective at maintaining the functionality of Y-linked duplicates that have already become fixed. The coupling of Y-linked gene duplication and gene conversion between paralogs can also prove costly by increasing the rate of nonhomologous crossovers between duplicate pairs. Such crossovers can generate an abnormal Y chromosome, as was recently shown to reduce male fertility in humans. The results represent a step toward explaining some of the more peculiar attributes of the Y as well as preliminary Y-linked sequence data from other mammals and Drosophila. The results may also be applicable to the recently observed pattern of tetraploidy and gene conversion in asexual, bdelloid rotifers.

NONRECOMBINING chromosomes are often associated with genetic degradation and a loss of functional genes, and nowhere is this pattern more exaggerated than on the Y chromosome (Charlesworth and Charlesworth 2000; Bachtrog 2006). However, in addition to the more widely recognized pattern of gene loss, genome sequences of mammals and Drosophila are also yielding evidence for Y-linked functional gene gain followed by amplification of duplicate genes (Skaletsky et al. 2003; Koerich et al. 2008; Carvalho et al. 2009; Krsticevic et al. 2009; Hughes et al. 2010). Duplication and retention of functional Y-linked gene copies is somewhat surprising because evolutionary theory predicts the opposite pattern. First, to the extent that gene duplicates are fixed via positive selection, they are less likely to become fixed on nonrecombining than on recombining chromosomes (Otto and Goldstein 1992; Clark 1994; Yong 1998; Otto and Yong 2002; Tanaka and Takahasi 2009). Second, regardless of whether Y-linked duplicates become fixed via genetic drift or by natural selection, the actions of Muller's ratchet, genetic hitchhiking, and background selection are expected to greatly increase the probability that Y-linked genes degenerate into nonfunctional pseudogenes (Charlesworth and Charlesworth 2000; Bachtrog 2006; Engelstadter 2008).

The issue is more complex when one considers data from the well-characterized human Y chromosome. A majority of functional Y-linked genes are members of duplicate gene pairs residing within large palindromes and are almost exclusively testis expressed (Skaletsky et al. 2003). In contrast to many of the single-copy genes with X-linked homologs, members of Y-linked gene families are apparently not degenerating, but rather have become fixed and maintained over many millions of years (Skaletsky et al. 2003; Yu et al. 2008). Although Y chromosomes are not well characterized in other taxa, currently available data suggest that duplication is a common feature of Y chromosomes in other mammals as well as Drosophila (Rozen et al. 2003; Verkaar et al. 2004; Murphy et al. 2006; Alföldi 2008; Wilkerson et al. 2008; Krsticevic et al. 2009; Geraldes et al. 2010). Thus, patterns of gene duplication and retention, for at least a subset of Y-linked genes, may be a general rule of Y chromosome evolution.

Another attribute of the mammalian Y appears to be relevant for duplicate gene evolution. Comparative analysis between humans and chimpanzees suggests ongoing recombination between the gene duplicate pairs that reside on the same Y chromosome. Such "intrachromosomal" recombination includes both nonreciprocal exchange (gene conversion) and reciprocal exchange (crossing over) between gene duplicate pairs (Rozen et al. 2003; Lange et al. 2009). Gene conversion between the duplicates potentially maintains gene function by counteracting stochastic forces of Y chromosome degeneration (Rozen et al. 2003; Charlesworth 2003; Noordam and Repping 2006). The rationale behind this hypothesis is subtle. As with other clonally inherited chromosomes, each evolutionary lineage of the Y is physically coupled to, and its evolutionary fate is influenced by, the presence of deleterious mutations. Mutation-bearing lineages represent evolutionary dead ends unless they can somehow remove or compensate for deleterious mutations. Recombination between duplicates can "rescue" functionality via gene conversion between functional and nonfunctional copies.

On the other hand, double-strand DNA breaks, which precede gene conversion events (Marais 2003), also precede crossing over. Crossovers between Y-linked genes can generate acentric and dicentric Y chromosomes, resulting in infertility and disruption of the sex determination pathway (e.g., Repping et al. 2002; Heinritz et al. 2005; Lange et al. 2009). Considering both gene conversion and crossing over on the Y, recombination can be viewed as a factor that either constrains (via gene conversion) or promotes (via crossing over) Y chromosome degeneration.

These observations concerning Y chromosome gene content and recombination raise interesting questions that have not been formally addressed by evolutionary theory (but see the recent study by Marais et al. 2010). First, what conditions favor the evolutionary invasion of Y-linked gene duplicates, and does recombination influence the probability that duplicates eventually become fixed within a population? Second, what effect does recombination have on Y-linked fitness and the maintenance of functional duplicate genes? To address these questions, we develop and analyze a series of population-genetic models of Y chromosome evolution. We show that, when direct selection on gene duplicates is weak, biased gene conversion can increase, whereas crossing over will decrease, their probability of fixation. For duplicates with larger fitness effects, the probability of fixation is largely independent of Y-linked recombination. Finally, gene conversion has a major impact on the retention of functional Y-linked genes that are already fixed within the population and maintains multiple gene copies with or without selection favoring these duplicates.

Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.110.116756/DC1.

¹Corresponding author: Department of Molecular Biology and Genetics, Cornell University, Biotechnology Bldg. (Room 227), Ithaca, NY 14853-2703. E-mail: [email protected]

Genetics 186: 277–286 (September 2010)

MODEL AND RESULTS

Gene conversion and the invasion of new gene duplicates: We first consider conditions favoring the evolutionary invasion of new Y-linked duplicate genes at low initial frequency within the population. Deterministic invasion dynamics are described for a two-locus model, and it is shown separately that the two-locus model characterizes duplicate gene invasion conditions on a Y chromosome carrying an arbitrary number of genes (see supporting information, File S1). We then develop and analyze a diffusion approximation and perform stochastic simulations to examine the probability that a rare gene duplicate eventually becomes fixed within a population of small size.

Invasion of a new gene duplicate: Consider a single Y-linked locus with a functional allele, A, and a nonfunctional allele, a. Mutation from A to a occurs at rate u per generation, and there is no back mutation. Introducing a duplication of the locus expands the population to five genotypic classes: the original single-copy classes (A and a), those with two functional gene copies (AA), those with one functional and one nonfunctional copy (Aa), and those with two nonfunctional copies (aa). As in the single-locus case, transitions between states (AA → Aa or aA; Aa or aA → aa) can occur by mutation, at rate u per locus; because there are now two loci, the mutation rate per chromosome is 2u.

For Y chromosomes carrying duplicates, recombination (crossing over and gene conversion) can potentially occur between loci. Throughout our analysis, we examine cases where recombination occurs at a rate of δ per paralog pair, per generation. The probability that a single recombination event is a crossover, which generates an abnormal (sterile) Y chromosome (e.g., Repping et al. 2002; Heinritz et al. 2005; Lange et al. 2009), is equal to the constant c. The remaining recombination events (a fraction 1 − c) are gene conversion events between duplicate pairs. Gene conversion involving Aa or aA individuals yields AA or aa sperm with probability b and 1 − b, respectively. Thus, b can be viewed as a biased gene conversion parameter, where the functional copy A preferentially replaces the nonfunctional a whenever b > 0.5 (there is no bias when b = 0.5).

Compared to individuals with two functional gene copies, individuals with zero functional copies suffer a fitness reduction of s, while those with one functional copy suffer a reduction of sh, where h is equivalent to a dominance coefficient. Complete masking of a nonfunctional allele occurs when h = 0, and there is then no direct fitness benefit of carrying two vs. one functional gene. Partial masking occurs when 1 > h > 0; in such cases, there is a fitness benefit of having two functional copies. Genotypes, genotypic fitness, and zygotic frequencies are described in Table 1.

TABLE 1
Parameterization for the gene duplicate invasion model

Genotype      Frequency   Fitness
AA            x11         1
Aa, aA        x10         1 − sh
A             x1          1 − sh
aa            x00         1 − s
a             x0          1 − s
Abnormal Y    xs          0
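The transition rules above can be made concrete with a short numerical sketch. The Python below is a minimal illustration under stated assumptions, not the authors' code: it applies selection, then mutation, then recombination to the six classes of Table 1, dropping terms of order u² as the text does, and it computes the leading eigenvalue of the linearized (x11, x10) subsystem at the duplicate-free equilibrium, where mean fitness is (1 − sh)(1 − u). That eigenvalue is algebraically the same as Equation 1a. The last function evaluates the diffusion approximation of Equation 2 with λ − 1 as the selection coefficient. All function names and the dictionary layout are hypothetical.

```python
import math


def fitness(s, h):
    """Table 1: two functional copies = 1, one = 1 - sh, none = 1 - s, abnormal Y = 0."""
    return {"x11": 1.0, "x10": 1 - s * h, "x00": 1 - s,
            "xs": 0.0, "x1": 1 - s * h, "x0": 1 - s}


def next_generation(x, u, s, h, delta, c, b):
    """One generation: selection -> mutation (u per locus) -> recombination (delta per pair).

    Hypothetical sketch; terms of order u**2 are dropped, as in the text.
    """
    w = fitness(s, h)
    wbar = sum(x[g] * w[g] for g in x)
    y = {g: x[g] * w[g] / wbar for g in x}       # post-selection frequencies
    conv, cross = delta * (1 - c), delta * c     # gene conversion vs. crossover
    return {
        "x11": y["x11"] * ((1 - 2 * u) * (1 - cross) + 2 * u * conv * b)
               + y["x10"] * (1 - u) * conv * b,
        "x10": (y["x11"] * 2 * u + y["x10"] * (1 - u)) * (1 - delta),
        "x00": y["x11"] * 2 * u * conv * (1 - b)
               + y["x10"] * (u * (1 - cross) + (1 - u) * conv * (1 - b))
               + y["x00"] * (1 - cross),
        "xs": (y["x11"] + y["x10"] + y["x00"]) * cross,   # abnormal (sterile) Y
        "x1": y["x1"] * (1 - u),
        "x0": y["x1"] * u + y["x0"],
    }


def leading_eigenvalue(u, s, h, delta, c, b):
    """Dominant eigenvalue of the linearized (x11, x10) system (same value as Eq. 1a)."""
    sh = s * h
    wbar = (1 - sh) * (1 - u)                    # mean fitness with duplicates absent
    a11 = (1 - 2 * u) * (1 - delta * c) + 2 * u * delta * (1 - c) * b   # AA -> AA
    a12 = (1 - sh) * (1 - u) * delta * (1 - c) * b                      # Aa -> AA
    a21 = 2 * u * (1 - delta)                                            # AA -> Aa
    a22 = (1 - sh) * (1 - u) * (1 - delta)                               # Aa -> Aa
    # (a11 - a22)^2 + 4*a12*a21 equals trace^2 - 4*det but avoids cancellation.
    disc = (a11 - a22) ** 2 + 4 * a12 * a21
    return (a11 + a22 + math.sqrt(disc)) / (2 * wbar)


def fixation_probability(lam, N):
    """Equation 2: selection coefficient lam - 1, initial frequency 1/N."""
    sel = lam - 1
    if sel == 0:
        return 1 / N                             # neutral limit
    # expm1 keeps precision when 2*sel is tiny; very large 2*N*|sel| may overflow.
    return math.expm1(-2 * sel) / math.expm1(-2 * N * sel)
```

Writing the discriminant as (a11 − a22)² + 4·a12·a21 rather than trace² − 4·det gives the same value but avoids cancellation when λ lies within about u of one, which is the regime of interest here. With δ = 0 the function reproduces Equation 1b, and for sh = 0, c = 0, and δ much smaller than u it tracks the linear approximation 1 + δ(2b − 1).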

For a sequence of events of (i) birth, (ii) selection, (iii) mutation, (iv) recombination, and (v) random mating (and ignoring terms of order u²), the frequency change of each genotype, per generation, is given by the following six recursions:

$$
\begin{aligned}
x_{11}' &= \frac{x_{11}}{\bar{w}}(1-2u)(1-\delta c) + \frac{x_{11}}{\bar{w}}\,2u\delta(1-c)b + \frac{x_{10}(1-sh)}{\bar{w}}(1-u)\delta(1-c)b \\
x_{10}' &= \frac{x_{11}}{\bar{w}}\,2u(1-\delta) + \frac{x_{10}(1-sh)}{\bar{w}}(1-u)(1-\delta) \\
x_{00}' &= \frac{x_{11}}{\bar{w}}\,2u\delta(1-c)(1-b) + \frac{x_{10}(1-sh)}{\bar{w}}\,u(1-\delta c) + \frac{x_{10}(1-sh)}{\bar{w}}(1-u)\delta(1-c)(1-b) + \frac{x_{00}(1-s)}{\bar{w}}(1-\delta c) \\
x_{s}' &= \frac{x_{11}}{\bar{w}}\,\delta c + \frac{x_{10}(1-sh)}{\bar{w}}\,\delta c + \frac{x_{00}(1-s)}{\bar{w}}\,\delta c \\
x_{1}' &= \frac{x_{1}(1-sh)}{\bar{w}}(1-u) \\
x_{0}' &= u\,\frac{x_{1}(1-sh)}{\bar{w}} + \frac{x_{0}(1-s)}{\bar{w}}
\end{aligned}
$$

where mean fitness is w̄ = x11 + (x10 + x1)(1 − sh) + (x00 + x0)(1 − s).

To describe conditions promoting the invasion of duplicates, we analyzed the stability of an evolutionary equilibrium in which duplicated genotypes are absent from the population. Under such a condition, the frequencies x1 and x0 equilibrate to x̂1 = 1 − u(1 − hs)/[s(1 − h)] = 1 − x̂0 (so that w̄ = (1 − sh)(1 − u)), and the leading eigenvalue of the stability matrix is

$$
\lambda = \frac{A + \sqrt{A^{2} - 4(1-2u)(1-\delta c)(1-\delta)(1-sh)(1-u)}}{2(1-sh)(1-u)}, \qquad A = (1-2u)(1-\delta c) + 2u\delta(1-c)b + (1-sh)(1-u)(1-\delta) \tag{1a}
$$

Selection favors the invasion of a duplicate when the leading eigenvalue is greater than one (Otto and Day 2007). The magnitude of the leading eigenvalue also represents the strength of selection acting in favor of a rare duplicate gene [i.e., the probability of fixation is proportional to λ − 1 (Otto and Bourguet 1999; Otto and Yong 2002); see below for additional details]. Without recombination (δ = 0), the leading eigenvalue reduces to

$$
\lambda = \frac{1}{2} + \frac{1 - 2u + \lvert sh(1-u) - u \rvert}{2(1-sh)(1-u)} \tag{1b}
$$

and evolutionary invasion of a duplicate-bearing Y is favored when sh > u/(1 − u). Duplicates are favored when the direct fitness benefit of additional functional gene copies outweighs the indirect consequences of doubling the deleterious mutation rate, as previously reported for both haploid and diploid systems without recombination (Clark 1994; Otto and Yong 2002; see also Otto and Goldstein 1992).

How does recombination alter the evolutionary dynamics of Y chromosomes? When duplicates do not directly increase fitness (sh = 0) and there is no recombination, selection never favors invasion (Equation 1b). We can ask whether gene conversion expands the conditions favorable to invasion of a duplicate in a way that is similar to previous models of gene duplication with crossing over (Otto and Yong 2002). By permitting Y-linked recombination between duplicates, and assuming that the crossover rate is zero (δc = 0; hence, all recombination is by gene conversion), the leading eigenvalue can be approximated for low rates of gene conversion (δ ≈ 0, per generation):

$$
\lambda = \lambda\big|_{\delta=0} + \delta \left.\frac{\partial \lambda}{\partial \delta}\right|_{\delta=0} + O(\delta^{2}) \approx 1 + \delta(2b-1) \tag{1c}
$$

which indicates that selection favors duplicates (λ > 1) when gene conversion is biased toward transmission of functional over nonfunctional gene copies (b > 0.5). Numerical evaluation of Equation 1a indicates that, although higher rates of gene conversion can increase the leading eigenvalue (and hence the probability of invasion), this positive relationship quickly saturates (the linear approximation in Equation 1c applies only while δ is small relative to u). Thus, a little gene conversion has about as much impact on the leading eigenvalue as a high rate of gene conversion does. Nevertheless, the strength of such positive selection (of magnitude λ − 1) is on the order of the mutation rate (u) and is therefore extremely weak. Stochastic simulations (see below) show that biased gene conversion alone has only a marginal influence on the probability of duplicate fixation.

Further analysis of Equation 1a shows that, as with the case of no recombination (Otto and Yong 2002), selection will favor duplicates if they directly increase fitness (sh > 0). Gene conversion (including unbiased gene conversion, b = 0.5) can increase the strength of selection favoring invasion of a duplicate (λ − 1; Figure 1). However, the relative impact of gene conversion is minor when sh ≫ u. In other words, gene conversion between paralogs noticeably enhances the strength of natural selection favoring Y-linked gene duplicates only when the direct benefits of having multiple gene copies are weak. This conclusion holds if the crossover rate between duplicate pairs (δc) is small (Figure 1). As the rate of crossing over increases, the production of abnormal Y haplotypes can generate purifying selection against Y chromosomes that carry gene duplicates.

Why should gene conversion broaden the conditions for duplicate invasion? An intuitive explanation can be reached by considering the recursion dynamics for a population fixed for the single-gene haplotype. Because this explanation is heuristic, we ignore crossovers and assume that they do not occur (c = 0). The rate of increase of a rare haplotype with two functional gene copies depends on its relative competitiveness against the resident, single-copy haplotype. For initial condition x11 = 1/N and x10 = x00 = 0, the expected proportion of functional duplicate haplotypes (x11) within the gamete pool is E[x11′] = x11[1 − 2u(1 − δb)]/w̄, and the duplicate is favored when [1 − 2u(1 − δb)]/w̄ > 1, where w̄ = (1 − sh)(1 − u) is the mean fitness of the resident population. Invasion is clearly facilitated by gene conversion (δb > 0). Nevertheless, because the term 2u(1 − δb) is extremely small, gene conversion will only marginally influence the probability of fixation whenever sh ≫ u.

Figure 1. Gene conversion can enhance the strength of positive selection for rare duplicate genes, whereas crossovers select against duplicates. Selection coefficient approximations (λ − 1) are based on the leading eigenvalue (Equation 1a), as described and justified in the text, and are presented as a ratio of selection with (δ > 0) vs. without recombination (δ = 0). Representative results are presented for u = 10⁻⁵ and assume that there is no gene conversion bias (i.e., b = 0.5).

Probability of duplicate fixation: The deterministic model presented above can be modified to describe the evolutionary dynamics in finite populations. Following Otto and Bourguet (1999) and Otto and Yong (2002), the selection coefficient for a rare gene duplicate can be approximated as λ − 1, where λ is the leading eigenvalue of the stability matrix (Equation 1a). Given this selection coefficient, the probability that a rare duplicate is eventually fixed can be estimated by diffusion approximation (Kimura 1957, 1962), with drift and diffusion coefficients M = (λ − 1)x(1 − x) and V = x(1 − x)/N, respectively, where x is the frequency of a duplicate-bearing Y haplotype and N is the Y chromosome effective population size. For an initial frequency of 1/N, the probability that a duplicate is fixed will be

$$
\Pr(\text{fixation}) = \frac{1 - e^{2(1-\lambda)}}{1 - e^{2N(1-\lambda)}} \approx \frac{2(\lambda - 1)}{1 - e^{2N(1-\lambda)}} \tag{2}
$$

To assess the validity of Equation 2, we conducted computer simulations that incorporate mutation, selection, and genetic drift. Each simulation was initiated at x11 = 1/N, x0 = u(1 − hs)/(s − sh), and x1 = 1 − x11 − x0. To generate genotypic frequencies for the next generation, N genotypes were randomly drawn, after selection, from a multinomial distribution over the six genotypes described above. Mutation–selection–drift recursions were iterated until the duplicate genotype was either fixed or lost from the population.

Equation 2 provides a good approximation for the probability of duplicate fixation over a broad range of parameter space (Figure 2 and Figure S1). As direct selection on a duplicate approaches zero (sh → 0), the probability of fixation approaches 1/N. As direct selection increases in strength (1 ≫ λ − 1 ≫ 1/N), the probability of fixation approaches 2(λ − 1).

Gene conversion had little impact on the probability of duplicate fixation (see Figure S1). As shown above, the leading eigenvalue of the stability matrix is not substantially influenced by gene conversion unless sh is of similar order to u. Even though the selection coefficient approximation (λ − 1) can increase with gene conversion, its absolute magnitude under weak direct selection (sh ≈ 0) will generally be too small for natural selection to be effective, unless Nu > 1, which is particularly unlikely for Y-linked loci. Thus, gene conversion is unlikely to significantly enhance the rate of duplicate gene fixation, but can potentially reduce the fixation rate of duplicates if the rate of deleterious crossovers between paralogs is high.

Figure 2. The probability of fixation for Y-linked duplicate genes. The solid line depicts the analytical approximation from Equation 2. Circles represent the proportion of duplicate genotypes (out of 100,000 replicate simulations for each data point) that eventually become fixed within the population. Results are shown for δ = 0, N = 1000, and u = 10⁻⁵, per locus, per generation. Values of δ > 0 yield approximately the same results (see Figure S1).

Gene conversion and the maintenance of gene duplicates: A major hypothesis inspired by the human Y chromosome is that gene conversion between duplicates may prevent the accumulation of mutations and ultimately prevent or slow down Y chromosome degeneration due to Muller's ratchet (Charlesworth 2003; Rozen et al. 2003; Noordam and Repping 2006). To formally evaluate this possibility, we considered two models for the maintenance of functional Y-linked genes. We first conducted simulations of our two-locus model with initial condition x11 = 1 (a pair of functional duplicates is initially fixed within the population) and analyzed whether gene conversion prevented the loss of one or both of the functional gene copies. Gene conversion between Y-linked paralogs decreased the rate of gene loss under a wide range of fitness conditions, including the extreme case where there was no direct benefit of having two, as opposed to one, functional gene copies (Figure S2). Although gene conversion can substantially reduce the rate of gene loss, the results indicate that loss of completely redundant genes (where sh = 0) will persist under gene conversion, albeit at a substantially reduced rate.

Prior models of Muller's ratchet generally find that the rate at which deleterious mutations become fixed depends upon both the strength of purifying selection and the number of loci evolving on an asexual chromosome (Charlesworth and Charlesworth 2000; Bachtrog 2008). To account for selection and gene conversion across many loci, we extended our model to describe the degeneration of Y chromosomes carrying an arbitrary number of genes. To permit gene conversion, we assumed that each Y initially carries n distinct gene types, each with a duplicate copy (for a total of 2n loci). Because the increased number of genes greatly expands the number of possible genotypic and fitness states (and consequently the matrix of transition probabilities between states), we made the simplifying assumption that each of the n gene types represents an essential male fertility factor. Males lacking a functional copy of one or more gene types are sterile and comprise a heterogeneous genotypic class with reproductive success of zero. Although the essentiality assumption is useful for modeling purposes, it will often be biologically reasonable because Y-linked genes, at least in mammals and Drosophila, are often essential for male fertility. For example, human Y chromosome microdeletions within Y-palindromic regions are often associated with spermatogenic failure (Noordam and Repping 2006; Lange et al. 2009). In Drosophila melanogaster, mutations in at least three of seven currently Y-annotated genes (kl-2, kl-3, and kl-5, as well as an additional set of unannotated genes: kl-1, ks-1, and ks-2; data obtained from http://flybase.org/) are known to cause male-sterile phenotypes. Nevertheless, the overall agreement between our multilocus and two-locus results (the latter does not assume essentiality; see Figure S2) suggests that a violation of the essentiality assumption is unlikely to strongly affect our conclusions.

For each paralog pair, there are three possible genotypes: both loci functional, one functional and one nonfunctional, and both nonfunctional. Transitions between genotypic states can occur by mutation, by gene conversion, or by crossing over, with a crossover yielding an abnormal Y chromosome. For individuals carrying a structurally normal Y, fitness follows the function w = (1 − sh)^k · 0^j (taking 0^0 = 1), where j refers to the number of gene pairs with both copies nonfunctional, and k refers to the number of pairs where one of the two gene copies is functional (0 ≤ k ≤ n). Individuals with j > 0 and individuals carrying abnormal Y chromosomes are sterile. After selection, the reproductive contribution of an individual with k Y-linked mutations is

$$
x_k^{S} = \frac{x_k w_k}{\bar{w}},
$$

where x_k is the zygotic frequency of k-bearing males, w_k = (1 − sh)^k is the fitness of a male with k mutations, and mean male fitness with respect to the Y is w̄ = Σ_{k=0}^{n} x_k w_k. (The reproductive contribution of sterile individuals is zero.)

To facilitate analytical tractability, we assume that the rates of recombination and mutation are both small enough to ignore multiple mutation and multiple recombination events per generation. In other words, there is a zero probability of an individual with k mutations producing a fertile son with k − 2 or k + 2 mutations. This assumption is justified as long as 2nu ≪ 1 and nδ ≪ 1, which requires that the mutation and recombination rates per locus are small and that the number of loci mutable to a nonfunctional allele is much smaller than the reciprocal of the mutation or gene conversion rate: n ≪ min[1/u, 1/δ]. Because n represents a small fraction of Y-linked nucleotides (i.e., it represents a very specific functional class), this assumption is biologically reasonable. Nevertheless, a violation of these assumptions is expected to make our results conservative by downwardly biasing the speed of Muller's ratchet (which is enhanced by a higher mutation rate) and minimizing the positive effect of gene conversion (higher gene conversion rates increasingly counteract Muller's ratchet).

Extending across the 2n loci, the probability that a Y chromosome experiences one mutation is Pr(M = 1) = 2nu = U.
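The multilocus transitions defined in this section can also be sketched in code alongside the recursions. The Python below is a minimal, hypothetical illustration (names and data layout are ours, not the authors'): it assumes, as the text does, at most one mutation per generation (probability U = 2nu, at a random one of the 2n loci) and at most one recombination event (probability D = nδ, at a random one of the n pairs), with a crossover (fraction c) producing an abnormal, sterile Y and a gene conversion in a half-mutated pair restoring it with probability b. Iterated deterministically with D = 0, it should approach the equilibria stated in the text (mean fitness 1 − U when sh > 0; x̂n ≈ 1 − U/2 when sh = 0).

```python
def multilocus_step(x, n, u, delta, sh, c, b):
    """One generation of the n-pair model: selection -> mutation -> recombination.

    x[k] = zygotic frequency of fertile males whose Y has k half-mutated pairs
    (0 <= k <= n); sterile zygotes are implicit (fitness 0).
    Returns (next fertile frequencies, next sterile frequency).
    """
    U, D = 2 * n * u, n * delta                      # Pr(M = 1), Pr(R = 1)
    weights = [xk * (1 - sh) ** k for k, xk in enumerate(x)]
    wbar = sum(weights)                              # sterile males contribute 0
    y = [wk / wbar for wk in weights]                # frequencies after selection

    # Mutation: at most one hit, at a random one of the 2n loci.
    z = [0.0] * (n + 1)
    for k in range(n + 1):
        z[k] += y[k] * ((1 - U) + U * k / (2 * n))   # no hit, or hit on a dead copy
        if k < n:
            z[k + 1] += y[k] * U * (n - k) / n       # hit inside an intact pair
        # remaining U*k/(2n): last functional copy of a pair lost -> sterile

    # Recombination: at most one event, at a random one of the n pairs.
    x_next = [0.0] * (n + 1)
    for k in range(n + 1):
        # no event, or a conversion inside an intact pair (no change)
        x_next[k] += z[k] * ((1 - D) + D * (1 - c) * (n - k) / n)
        if k > 0:
            # conversion in a half-mutated pair restores it with probability b
            x_next[k - 1] += z[k] * D * (1 - c) * b * k / n
        # crossovers (D*c) and conversions toward aa (1 - b) -> sterile
    return x_next, 1.0 - sum(x_next)


def mean_fitness(x, sh):
    """Mean Y-linked fitness, sum_k x_k (1 - sh)^k (sterile classes add 0)."""
    return sum(xk * (1 - sh) ** k for k, xk in enumerate(x))


def iterate_multilocus(n, u, delta, sh, c, b, generations):
    """Start from a mutation-free population and iterate deterministically."""
    x, xs = [1.0] + [0.0] * n, 0.0
    for _ in range(generations):
        x, xs = multilocus_step(x, n, u, delta, sh, c, b)
    return x, xs
```

To compare populations with and without gene conversion, the same iteration can be rerun with delta > 0 and the least-loaded frequency x[0] compared against the delta = 0 run; this sketch is not claimed to reproduce the published figures.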
The probability that zero mutations occur is Pr(M = 0) = 1 − U. The probability of a recombination event between one of the n paralog pairs is Pr(R = 1) = nδ = D, and the probability of no recombination is Pr(R = 0) = 1 − D.

Given a sequence of events of (i) birth, (ii) selection, (iii) mutation, (iv) recombination, and (v) random mating, the frequency of fertile males in the next generation (for 0 < k < n) follows the recursion

$$
\begin{aligned}
x_k' ={}& \left[\frac{x_{k-1}(1-sh)^{k-1}}{\bar{w}}\,\frac{U(n-k+1)}{n} + \frac{x_k(1-sh)^{k}}{\bar{w}}\,\frac{Uk + 2n(1-U)}{2n}\right]\left[1 - D + \frac{D(1-c)(n-k)}{n}\right] \\
&+ \left[\frac{x_k(1-sh)^{k}}{\bar{w}}\,\frac{U(n-k)}{n} + \frac{x_{k+1}(1-sh)^{k+1}}{\bar{w}}\,\frac{U(k+1) + 2n(1-U)}{2n}\right]\frac{Db(1-c)(k+1)}{n}
\end{aligned}
$$

The "least-loaded" (k = 0) and "most-loaded" (k = n) classes of fertile males follow the recursions

$$
x_0' = \frac{x_0\left[(1-U)(1-Dc)\,n + UD(1-c)b\right]}{\bar{w}\,n} + \frac{x_1(1-sh)}{\bar{w}}\left[\frac{U}{2n} + 1 - U\right]\frac{D(1-c)b}{n}
$$

and

$$
x_n' = \frac{x_{n-1}(1-sh)^{n-1}}{\bar{w}}\,\frac{U(1-D)}{n} + \frac{x_n(1-sh)^{n}}{\bar{w}}\,\frac{(2-U)(1-D)}{2},
$$

respectively. The frequency of sterile males in the next generation (via crossover, mutation, or gene conversion) will be

$$
x_s' = 1 - \sum_{k=0}^{n} x_k'.
$$

Figure 3. Gene conversion increases the frequency of Y chromosome haplotypes that carry zero deleterious mutations (i.e., the "least-loaded" genotypic class). The cost of a mutation eliminating function of a copy of each duplicate pair is represented by sh (this cost increases from left to right on the x-axis). The relative proportion of mutation-free Y chromosomes in recombining vs. nonrecombining populations is presented as a ratio of the two scenarios (gene conversion increases the proportion of mutation-free Y's when this ratio is greater than one). The number of distinct Y-linked genes is represented by n. Results are presented for c = 0, b = 0.5, u = 5 × 10⁻⁴ per locus per generation, and D = U = 2nu. Additional results are presented in Figure S3.

Deterministic equilibria and mean fitness of the Y: When there is no recombination between duplicates (D = 0), mean Y chromosome fitness as well as the distribution of mutations among individuals can be determined analytically. If mutations that eliminate duplicate gene function are deleterious (sh > 0), and the number of unique Y-linked genes is large (n ≫ U/sh), the population approaches the equilibrium x̂k ~ Pois(U/sh), x̂s ≈ 0, and w̄ ≈ 1 − U. This is analogous to the case of mutation–selection balance with incomplete dominance (sh > 0), with a Y-linked genetic load of L = U ≈ 1 − exp(−U) (e.g., Haldane 1937; Kimura and Maruyama 1966; Kondrashov and Crow 1988). When knocking out a duplicate yields no fitness effect (sh = 0), or the number of Y-linked genes is small (n ≪ U/sh), the population approaches the equilibrium x̂n ≈ 1 − U/2, x̂s ≈ U/2, and w̄ ≈ 1 − U/2. Under this scenario, the genetic load is reduced by a factor of 2, to L = U/2 ≈ 1 − exp(−U/2) (Haldane 1937).

Gene conversion between duplicates increases the frequency of the least-mutated class (Figure 3 and Figure S3), whether or not there is a gene conversion bias favoring functional over nonfunctional loci. The frequency of the least-loaded class is of particular importance for adaptation on clonally transmitted chromosomes such as the Y (Charlesworth and Charlesworth 2000). Without recombination, the unit of selection is the chromosome rather than the locus. Beneficial mutations that are associated with mutation-free genetic backgrounds are relatively likely to become fixed (Peck 1994; Orr and Kim 1998) and do not permit hitchhiking of deleterious mutations during a selective sweep (Rice 1987). However, as the frequency of the least-loaded class becomes small, virtually all beneficial mutations will arise in inferior genetic backgrounds. This will limit the adaptive potential of the Y chromosome. Because it increases the fraction of mutation-free Y chromosomes, gene conversion is expected to enhance the fixation probability of beneficial mutations and can reduce the deleterious consequences of hitchhiking.

By shifting the mutational distribution toward relatively mutation-free genotypes, gene conversion also increases mean Y chromosome fitness. This effect does not depend on a gene conversion bias, but is amplified when conversion events favor functional over nonfunctional variants (for models yielding similar conclusions about the genetic load, albeit by different approaches, see Bengtsson 1986, 1990, and especially Ohta 1989).

These long-term effects of gene conversion have a straightforward explanation. When the fitness cost of silencing both copies of a duplicate pair is much greater than the cost of silencing one of the copies (when duplicates partially or completely mask deleterious mutations: h < 0.5), selection across Y chromosomes mimics truncation selection, which is particularly efficient at removing deleterious mutations (e.g., Kondrashov 1988; Ohta 1989). Truncation selection arises because mutations on a relatively mutation-free Y will generally affect one copy of a pair, with the second, functional copy compensating for loss of the first. As the number of mutations on a Y increases, so does the probability of silencing the second copy of a pair. Consequently, the deleterious effect of each mutation increases faster than linearly with the number of mutations carried on a Y.

Without recombination, the accumulation of mutations is unidirectional, and the population will tend to evolve toward the edge of the truncation point (n mutations at distinct genes), particularly if masking by duplicates is strong (i.e., having two functional copies provides the same fitness as one copy). At the extreme of sh = 0 (complete masking), the population evolves to contain n functional genes, each distinct. Gene conversion restores variability by permitting bidirectional transitions (e.g., from k to k − 1 and to k + 1 mutations). Y chromosomes that are closer to the truncation point have a higher probability of transitioning (by mutation or recombination) beyond the truncation point, where they are removed by selection. Consequently, the population distribution shifts toward fewer mutations per Y. However, if selection in favor of functional duplicates is strong relative to the number of Y-linked genes (sh > 0; n large), most individuals will carry few mutations, the truncation point becomes irrelevant to Y chromosome evolution, selection shifts toward multiplicative epistasis, and gene conversion does not strongly influence mean fitness or the distribution of mutations among Y chromosomes. This explanation accounts for the decreasing impact of gene conversion on the frequency of mutation-free Y chromosomes as the strength of selection (sh) increases (Figure 3 and Figure S3).

Muller's ratchet and the accumulation of nonfunctional genes: The deterministic results presented above represent an upper limit for Y chromosome fitness. In finite populations, where Muller's ratchet operates, mean fitness can further decrease with each successive loss of "mutation-free" individuals. Once lost from the population, mutation-free genotypes are unlikely to be recovered by back mutation or positive selection, because they must initially arise within the current least-loaded class and subsequently avoid stochastic loss (Peck 1994; Orr and Kim 1998; Gordo and Charlesworth 2000).

To explore the influence of gene conversion on the rate and severity of Y chromosome degeneration via Muller's ratchet, we conducted a series of stochastic simulations, varying the selection and recombination parameters (u, h, n, δ, c, b). We first used the recursions presented above to bring the frequencies of each genotypic class to deterministic equilibrium. Convergence to equilibrium was followed by 100,000 generations of simulation under a mutation–selection–drift model with constant male population size. For each generation, genotype frequencies were sampled from a pseudorandom multinomial distribution (pseudorandom numbers generated with R; R Development Core Team 2005), with genotypes randomly sampled after selection, mutation, and recombination.

When there is no gene conversion between duplicates, Muller's ratchet can operate rapidly, causing Y-linked fitness decay and loss of functional genes. Representative simulation results are shown in Figures 4 and 5. In agreement with previous theory (Haigh 1978; Gordo and Charlesworth 2000; Bachtrog 2008), the impact of the ratchet is strongest when the ancestral Y carries many functional gene duplicates and when mutations have small individual fitness effects. Relatively low rates of gene conversion can rescue Y-linked genes from stochastic loss via Muller's ratchet and thereby increase mean fitness of the Y (Figures 4 and 5). Increasing the total mutation and gene conversion rates on the Y (U and D, respectively) amplifies the differences between recombining and nonrecombining chromosomes, whereas decreasing these compound parameters (U, D → 0) eliminates these long-term evolutionary differences. This effect occurs both with and without biased gene conversion between duplicates.

Gene conversion appears to constrain the accumulation of deleterious mutations in a way that is identical to crossing over in traditional models of Muller's ratchet. Under both models, the rate at which the ratchet "clicks" (i.e., the least-mutated class of individuals is lost) is highest when individual mutations are weakly deleterious

Figure 4.—Intrapalindrome gene conversion prevents the erosion of Y chromosome gene content and enhances adaptation on the Y. N represents the Y-linked effective size, sh is the fitness cost associated with mutations to one copy of each duplicate pair, t refers to the generation within the simulation, and n is the number of distinct genes on the chromosome (including duplicates, each Y carries 2n genes). Results are presented for c = 0, b = 0.5, and $u = 5 \times 10^{-4}$ per locus per generation. Each data point represents the average of 10 simulation replicates. Since estimates of gene conversion from human–chimp comparisons suggest that D may be considerably higher than the mutation rate (Rozen et al. 2003), the results, if anything, will underestimate the impact of gene conversion on functional gene retention.

and/or the chromosome-wide mutation rate (an increasing function of the mutation rate per locus and the number of loci) is high (Charlesworth and Charlesworth 2000; Bachtrog 2008). The similar consequences of gene conversion and crossing over are not surprising. Both processes permit chromosomal transitions from more to fewer mutations, and together with purifying selection this can counteract the steady accumulation of new deleterious mutations within a population.

DISCUSSION

Previous theory indicates that selection does not generally favor the invasion of a rare duplicate gene unless there is a direct benefit of carrying an additional gene copy (Clark 1994) or there is recombination between the paralogs (Yong 1998; Otto and Yong 2002; Tanaka and Takahasi 2009). We have shown that gene conversion between duplicates can broaden the parameter conditions favoring the invasion of duplicate genes from low initial frequency. Biased gene conversion, with conversion favoring undamaged over damaged gene copies, can generate positive selection for rare duplicates that do not provide a direct fitness benefit (that is, individuals with two functional copies have fitness equal to those with one). However, the strength of positive selection acting on such duplicates is weak (on the order of the mutation rate). This result is in agreement with a recent simulation study, which also found that gene conversion does not strongly promote the invasion of new Y-linked duplicates (Marais et al. 2010).

The invasion dynamics of rare duplicate genes bear some similarities to models of adaptation within gene families (Walsh 1985; Mano and Innan 2008), which show that gene conversion can enhance the probability that a weakly beneficial allele becomes fixed. In our model, gene conversion alone is unlikely to overpower genetic drift unless $Nu \gg 1$. This condition is rarely (if ever) expected to arise within populations, particularly for Y-linked loci, which have a reduced effective size relative to other nuclear genes. Furthermore, there is no biological reason to suspect that gene conversion will necessarily be biased against mutant copies of a particular gene. We therefore expect that Y-linked duplicates will most likely become fixed by genetic drift, unless they directly increase the fitness of those who carry them (for additional discussion of duplicate gene fixation, see Innan and Kondrashov 2010). Likewise, deleterious Y-linked crossover events can generate selection against gene duplicates. This factor will have little impact on the probability of fixation or loss unless the crossover rate is relatively high and direct selection on the duplicate is weak or absent.

Y chromosome recombination can exert a profound influence on the retention of functional copies of genes

Figure 5.—The proportion of loss-of-function duplicates following 100,000 generations of mutation, selection, and genetic drift. Parameters are described in the Figure 4 legend and throughout the text. Results are presented for c = 0, b = 0.5, $u = 5 \times 10^{-4}$ per locus per generation, and D on the order of the mutation rate, $D = U = 2nu$. Each point represents the average of 10 replicate simulations.

that have already become fixed within the population. Our simulations show that low rates of gene conversion are sufficient to maintain Y-linked genes and counteract degradation via Muller's ratchet. These results are conservative, as higher rates enhance the preservation of functional gene copies. Thus, once gene conversion has evolved, it can potentially provide a degree of stability on an otherwise evolutionarily unstable Y chromosome. Interestingly, Marais et al. (2010) observed that the rate of invasion for gene conversion modifier alleles does not greatly exceed neutral expectations unless they greatly increase the gene conversion rate. This suggests that, while low rates of conversion may slow the rate of Muller's ratchet, the conditions for the evolution of the gene conversion rate itself may be much more restrictive.

The large number of genes within the "ampliconic" region of the human Y (Skaletsky et al. 2003) should provide a large target for mutations, creating an opportunity for Muller's ratchet to act. This role of gene conversion on the Y is therefore likely to explain patterns of gene retention on the human Y chromosome. It is less clear whether similar patterns characterize other animal species. Current (albeit incomplete) data suggest that amplification and retention might be common Y chromosome attributes (Rozen et al. 2003; Verkaar et al. 2004; Murphy et al. 2006; Alföldi 2008; Wilkerson et al. 2008; Krsticevic et al. 2009). However, the prevalence of Y-linked gene conversion outside the human and chimp lineages is less clear (but see Geraldes et al. 2010). Future sequencing efforts, including evidence for gene conversion among Y-linked genes in nonhuman species, will help to determine the general relevance of the duplication and gene conversion model presented here.

Within-chromosome crossovers can generate an abnormal, sterility-inducing Y (Lange et al. 2009) and potentially represent a deleterious fitness consequence of Y-linked recombination. This cost also implies that the number of Y-linked duplicate genes (or, in humans, the size of Y-linked palindromes) will have an upper limit. As the number of Y-linked loci that interact via recombination increases, so too should the rate of deleterious crossovers. This suggests an upper limit to Y chromosome gene content, beyond which crossing over becomes unbearably costly. From this perspective, duplication and recombination represent a costly mechanism of Y chromosome preservation.

In addition to the Y chromosome, our findings have implications for asexually reproducing species. Recent reports suggest that the asexual bdelloid rotifers are tetraploid (Mark Welch et al. 2008) and that gene conversion occurs between gene copies (Hur et al. 2008; Mark Welch et al. 2008). Our model supports the verbal claim that gene conversion between homologous gene copies might aid in DNA damage repair and prevent the genomic degradation that is expected to accompany strict asexual reproduction. Unlike the Y chromosome scenario, crossovers between homologous, tetraploid chromosomes will tend to avoid deleterious chromosomal aberrations. The relative rate of nonhomologous crossovers is an empirical question that may be difficult to assess. Chromosome abnormalities are likely associated with embryonic death, which will lead to a pronounced bias toward "normal" chromosomes. On the other hand, crossing over between homologous chromosomes is likely to generate copy number polymorphism, which adds a level of complexity to the evolutionary dynamics of autosomal gene duplicates or gene families. Crossing over and gene conversion may therefore have different evolutionary consequences in asexual lineages than in the Y chromosome results reported here. This difference represents an interesting avenue for future theoretical research.

We are grateful to Roman Arguello, Clement Chow, Margarida Cardoso-Moreira, Qixin He, Lacey Knowles, Amanda Larracuente, Rich Meisel, Nadia Singh, and two anonymous reviewers for discussion and comments that substantially improved the quality of the manuscript. We also thank Sarah Otto for comments about the eigenvalue-selection-coefficient approximation and for sharing an unpublished manuscript. This work was supported by National Institutes of Health grant GM64590 to A.G.C. and A. B. Carvalho.

LITERATURE CITED

Alföldi, J. E., 2008 Sequence of mouse Y chromosome. Ph.D. Dissertation, MIT, Cambridge, MA.
Bachtrog, D., 2006 A dynamic view of sex chromosome evolution. Curr. Opin. Genet. Dev. 16: 578–585.
Bachtrog, D., 2008 The temporal dynamics of processes underlying Y chromosome degeneration. Genetics 179: 1513–1525.
Bengtsson, B. O., 1986 Biased conversion as the primary function of recombination. Genet. Res. 47: 77–80.
Bengtsson, B. O., 1990 The effect of biased conversion on the mutation load. Genet. Res. 55: 183–187.
Carvalho, A. B., L. B. Koerich and A. G. Clark, 2009 Origin and evolution of Y chromosomes: Drosophila tales. Trends Genet. 25: 270–277.
Charlesworth, B., 2003 The organization and evolution of the human Y chromosome. Genome Biol. 4: 226.
Charlesworth, B., and D. Charlesworth, 2000 The degeneration of Y chromosomes. Philos. Trans. Biol. Sci. 355: 1563–1572.
Clark, A. G., 1994 Invasion and maintenance of a gene duplication. Proc. Natl. Acad. Sci. USA 91: 2950–2954.
Engelstadter, J., 2008 Muller's ratchet and the degeneration of Y chromosomes: a simulation study. Genetics 180: 957–967.
Geraldes, J., T. Rambo, R. Wing, N. Ferrand and M. W. Nachman, 2010 Extensive gene conversion drives the concerted evolution of paralogous copies of the SRY gene in European rabbits. Mol. Biol. Evol. (in press).
Gordo, I., and B. Charlesworth, 2000 The degeneration of asexual haploid populations and the speed of Muller's ratchet. Genetics 154: 1379–1387.
Haigh, J., 1978 Accumulation of deleterious genes in a population—Muller's ratchet. Theor. Popul. Biol. 14: 251–267.
Haldane, J. B. S., 1937 The effect of variation on fitness. Am. Nat. 71: 337–349.
Heinritz, W., D. Kotzot, S. Heinze, A. Kujat, W. J. Eleemann et al., 2005 Molecular and cytogenetic characterization of a non-mosaic isodicentric Y chromosome in a patient with Klinefelter syndrome. Am. J. Med. Genet. A 132A: 198–201.
Hughes, J. F., H. Skaletsky, T. Pyntikova, T. A. Graves, S. K. M. van Daalen et al., 2010 Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content. Nature 463: 536–539.
Hur, J. H., K. Van Doninck, M. L. Mandigo and M. Meselson, 2008 Degenerate tetraploidy was established before bdelloid rotifer families diverged. Mol. Biol. Evol. 26: 375–383.
Innan, H., and F. A. Kondrashov, 2010 The evolution of gene duplications: classifying and distinguishing between models. Nat. Rev. Genet. 11: 97–108.
Kimura, M., 1957 Some problems of stochastic processes in genetics. Ann. Math. Stat. 28: 882–901.
Kimura, M., 1962 On the probability of fixation of mutant genes in a population. Genetics 47: 713–719.
Kimura, M., and T. Maruyama, 1966 Mutational load with epistatic gene interactions in fitness. Genetics 54: 1337–1351.
Koerich, L. B., X. Wang, A. G. Clark and A. B. Carvalho, 2008 Low conservation of gene content in the Drosophila Y chromosome. Nature 456: 949–951.
Kondrashov, A. S., 1988 Deleterious mutations and the evolution of sexual reproduction. Nature 336: 435–440.
Kondrashov, A. S., and J. F. Crow, 1988 King's formula for the mutation load with epistasis. Genetics 120: 853–856.
Krsticevic, F. J., H. L. Santos, S. Januario, C. G. Schrago and A. B. Carvalho, 2009 Functional copies of the Mst77F gene on the Y chromosome of Drosophila melanogaster. Genetics 184: 295–307.
Lange, J., H. Skaletsky, S. K. M. van Daalen, S. L. Embry, C. M. Korver et al., 2009 Isodicentric Y chromosomes and sex disorders as byproducts of homologous recombination that maintains palindromes. Cell 138: 855–869.
Mano, S., and H. Innan, 2008 The evolutionary rate of duplicate genes under concerted evolution. Genetics 180: 493–505.
Marais, G., 2003 Biased gene conversion: implications for genome and sex evolution. Trends Genet. 19: 330–338.
Marais, G., P. R. A. Campos and I. Gordo, 2010 Can intra-Y gene conversion oppose the degeneration of the human Y chromosome?: A simulation study. Genome Biol. Evol. 2: 347–357.
Mark Welch, D. B., J. L. Mark Welch and M. Meselson, 2008 Evidence for degenerate tetraploidy in bdelloid rotifers. Proc. Natl. Acad. Sci. USA 105: 5145–5149.
Murphy, W. J., A. J. P. Wilkerson, T. Raudsepp, R. Agarwala, A. A. Schaffer et al., 2006 Novel gene acquisition on carnivore Y chromosomes. PLoS Genet. 2: e43.
Noordam, M. J., and S. Repping, 2006 The human Y chromosome: a masculine chromosome. Curr. Opin. Genet. Dev. 16: 225–232.
Ohta, T., 1989 The mutational load of a multigene family with uniform members. Genet. Res. 53: 141–145.
Orr, H. A., and Y. Kim, 1998 An adaptive hypothesis for the evolution of the Y chromosome. Genetics 150: 1693–1698.
Otto, S. P., and D. Bourguet, 1999 Balanced polymorphisms and the evolution of dominance. Am. Nat. 153: 561–574.
Otto, S. P., and T. Day, 2007 A Biologist's Guide to Mathematical Modeling in Ecology and Evolution. Princeton University Press, Princeton, NJ.
Otto, S. P., and D. B. Goldstein, 1992 Recombination and the evolution of diploidy. Genetics 131: 745–751.
Otto, S. P., and P. Yong, 2002 The evolution of gene duplicates. Homol. Eff. 46: 451–483.
Peck, J. R., 1994 A ruby in the rubbish: beneficial mutations, deleterious mutations and the evolution of sex. Genetics 137: 597–606.
R Development Core Team, 2005 R: A Language and Environment for Statistical Computing, reference index version 2.2.1. R Foundation for Statistical Computing, Vienna.
Repping, S., H. Skaletsky, J. Lange, S. Silber, F. van der Veen et al., 2002 Recombination between palindromes P5 and P1 on the human Y chromosome causes massive deletions and spermatogenic failure. Am. J. Hum. Genet. 71: 906–922.
Rice, W. R., 1987 Genetic hitchhiking and the evolution of reduced genetic activity on the Y sex chromosome. Genetics 116: 161–167.
Rozen, S., H. Skaletsky, J. Lange, S. Silber, F. van der Veen et al., 2003 Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature 423: 873–876.
Skaletsky, H., T. Kuroda-Kawaguchi, P. J. Minx, H. S. Cordum, L. Hillier et al., 2003 The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423: 825–837.
Tanaka, K. M., and K. R. Takahasi, 2009 Enhanced fixation and preservation of a newly arisen duplicate gene by masking deleterious loss-of-function mutations. Genet. Res. 91: 267–280.
Verkaar, E. L. C., C. Zijlstra, E. M. van 't Veld, K. Boutaga, D. C. J. Boxtel et al., 2004 Organization and concerted evolution of the ampliconic Y-chromosomal TSPY genes from cattle. 84: 468–474.
Walsh, B., 1985 Interaction of selection and biased gene conversion in a multigene family. Proc. Natl. Acad. Sci. USA 82: 153–157.
Wilkerson, A. J. P., F. Raudsepp, T. Graves, D. Albracht, W. Warren et al., 2008 Gene discovery and comparative analysis of X-degenerate genes from the domestic cat Y chromosome. Genomics 92: 329–338.
Yong, P., 1998 Theoretical population genetic model of the invasion of an initial duplication. Honours Thesis, Department of Zoology, University of British Columbia, Vancouver, BC, Canada.
Yu, Y.-H., Y.-W. Lin, J.-F. Yu, W. Schempp and P. H. Yen, 2008 Evolution of the DAZ gene and AZFc region on Y chromosomes. BMC Evol. Biol. 8: 96.

Communicating editor: D. Charlesworth

Supporting Information http://www.genetics.org/cgi/content/full/genetics.110.116756/DC1

Gene Duplication, Gene Conversion and the Evolution of the Y Chromosome Tim Connallon and Andrew G. Clark

Copyright © 2010 by the Genetics Society of America DOI: 10.1534/genetics.110.116756

FILE S1

I. Invasion of gene duplicates on Y chromosomes that carry an arbitrary number of linked genes.

Y-linked duplicate genes evolve within the genetic background of the entire Y chromosome, which is likely to contain multiple functional genes, particularly during early stages of sex chromosome evolution. To determine the generality of the single gene duplication scenario in the main text, we developed a second model to examine the evolutionary dynamics of rare, Y-linked duplicates on ancestral chromosomes carrying an arbitrary number (n) of single-copy genes.

Consider a rare, Y-linked duplicate on a Y chromosome carrying n single-copy genes. By duplicating one of the n single-copy genes, the individual has n – 1 single-copy genes and a single duplicated pair. Expanding the number of loci greatly increases the number of possible genotypes to follow within the population. However, subsequent calculations can be simplified by making each gene essential. In other words, fitness drops to zero (s = 1) unless each of the n genes has at least one functional copy.
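The essentiality assumption also makes the main text's truncation-selection argument easy to quantify. The sketch below is our own illustration under simplifying assumptions, not a calculation from this model. It assumes a Y carrying n duplicated essential genes (2n copies), complete masking (sh = 0), and k mutations that each silence a different, still-intact copy chosen uniformly at random. The helper name `prob_all_genes_functional` is hypothetical. It returns the fitness of such a Y, which is the probability that no gene has lost both copies.

```python
import math


def prob_all_genes_functional(k, n):
    """Hypothetical helper: P(every gene keeps >= 1 functional copy) after k mutations.

    Assumes n duplicated essential genes (2n copies), complete masking (sh = 0),
    and k mutations that each silence a distinct, still-intact copy chosen
    uniformly at random.
    """
    if k > n:
        return 0.0  # pigeonhole: some gene must have lost both copies
    p = 1.0
    for i in range(k):
        # i copies are silenced, each in a different gene. Of the 2n - i intact
        # copies, 2n - 2i sit in genes that still have both copies.
        p *= (2 * n - 2 * i) / (2 * n - i)
    return p


n = 5
for k in range(n + 2):
    w = prob_all_genes_functional(k, n)
    log_w = math.log(w) if w > 0 else float("-inf")
    print(f"k = {k}: fitness = {w:.3f}, log-fitness = {log_w:.3f}")
```

Under these assumptions, each additional mutation lowers log-fitness by more than the one before, and fitness is exactly zero once k exceeds n. This is the faster-than-linear decline and the truncation point described in the main text.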

Given this simplification, there are four relevant genotypic classes within the population:

(i) individuals with $n$ functional singletons and no duplicates, at frequency $x_n$ and with fitness $w_n = 1 - sh$;

(ii) those with $n + 1$ functional genes ($n - 1$ singletons plus an intact duplicate pair), at frequency $x_{n1}$ and with fitness $w_{n1} = 1$;

(iii) those with $n + 1$ genes ($n - 1$ singletons plus a duplicate pair), of which $n$ are functional, at frequency $x_{n0}$ and with fitness $w_{n0} = 1 - sh$;

(iv) a class of sterile individuals, at frequency $x_s$ and with fitness $w_s = 1 - s = 0$, that either lack a functional copy of an essential gene or carry an abnormal Y chromosome.

Because $s = 1$, the fitness $1 - sh$ appears as $1 - h$ in the recursions below.

In an individual carrying $n$ singletons, the Y chromosome deleterious mutation rate per gamete per generation is $U = nu$. The number of new mutations per gamete is reasonably modeled as a Poisson variable with mean $nu$. However, the diploid genomic deleterious mutation rate is unlikely to be much greater than one, and Y chromosomes typically represent a tiny fraction of a genome. The number of new Y-linked mutations per generation should therefore be close to a Bernoulli variable: one mutation occurs with probability $U = nu$, and zero mutations occur with probability $1 - U$. For an individual carrying $n + 1$ total genes, the mutational target is slightly larger, and the Y chromosome mutation rate becomes $U_{dup} = U(n + 1)/n$ per generation. The presence of gene duplicates introduces an opportunity for gene conversion. As before, conversion is governed by the recombination rate ($d$), crossover ($c$), and conversion bias ($b$) parameters.
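The Poisson-to-Bernoulli simplification can be checked numerically. The short sketch below compares the probabilities of zero and one new Y-linked mutation under the two models for a few illustrative (hypothetical) values of n and u. The function name `mutation_count_probs` is ours.

```python
import math


def mutation_count_probs(n, u):
    """P(0) and P(1) new Y-linked mutations per generation under two models."""
    U = n * u  # Y-wide deleterious mutation rate
    poisson = (math.exp(-U), U * math.exp(-U))
    bernoulli = (1.0 - U, U)
    return poisson, bernoulli


for n, u in ((10, 1e-5), (50, 1e-4), (50, 2e-4)):
    (p0, p1), (b0, b1) = mutation_count_probs(n, u)
    print(f"U = {n * u:.4f}: P(0) {p0:.6f} vs {b0:.6f}; P(1) {p1:.6f} vs {b1:.6f}")
```

Both discrepancies are of order $U^2$: $e^{-U} - (1 - U) < U^2/2$ and $U(1 - e^{-U}) < U^2$. For the small Y-wide rates considered here, the Bernoulli approximation is therefore very close to the Poisson model.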

Following the order of events (i) birth, (ii) selection, (iii) mutation, (iv) recombination, and (v) fertilization, the Y chromosome recursions are:

$$x_{n1}' = \frac{x_{n1}\left[2Ud(1-c)b + (n - U - Un)(1 - dc)\right]}{\left[x_{n1} + (x_{n0} + x_n)(1-h)\right] n} + \frac{x_{n0}(1-h)(1-U)\,d(1-c)b}{x_{n1} + (x_{n0} + x_n)(1-h)}$$

$$x_{n0}' = \frac{2x_{n1}U(1-d)}{\left[x_{n1} + (x_{n0} + x_n)(1-h)\right] n} + \frac{x_{n0}(1-h)(1-U)(1-d)}{x_{n1} + (x_{n0} + x_n)(1-h)}$$

$$x_n' = \frac{x_n(1-h)(1-U)}{x_{n1} + (x_{n0} + x_n)(1-h)}$$

$$x_s' = 1 - x_{n1}' - x_{n0}' - x_n'$$
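The recursions translate directly into code. The sketch below is our own Python transcription, with variable names chosen for readability. It keeps s = 1, as in the model, so the fitness 1 − sh is written 1 − h. The sterile class is recovered as one minus the other three frequencies.

```python
def next_generation(x_n1, x_n0, x_n, n, u, h, d, c, b):
    """One generation of the SI recursions (s = 1, so 1 - sh is written 1 - h).

    x_n1: duplicate present, both copies functional; x_n0: duplicate present, one
    copy silenced; x_n: no duplicate. The sterile class has fitness 0, so it does
    not contribute to the next generation and is recovered as 1 minus the rest.
    """
    U = n * u
    w_bar = x_n1 + (x_n0 + x_n) * (1 - h)  # mean fitness
    x_n1_new = (x_n1 * (2 * U * d * (1 - c) * b + (n - U - U * n) * (1 - d * c)) / (w_bar * n)
                + x_n0 * (1 - h) * (1 - U) * d * (1 - c) * b / w_bar)
    x_n0_new = (2 * x_n1 * U * (1 - d) / (w_bar * n)
                + x_n0 * (1 - h) * (1 - U) * (1 - d) / w_bar)
    x_n_new = x_n * (1 - h) * (1 - U) / w_bar
    x_s_new = 1 - x_n1_new - x_n0_new - x_n_new
    return x_n1_new, x_n0_new, x_n_new, x_s_new


# Without a duplicate, one generation from x_n = 1 lands on the equilibrium 1 - U.
print(next_generation(0.0, 0.0, 1.0, n=10, u=1e-3, h=0.1, d=0.0, c=0.0, b=0.5))

# Introduce a rare duplicate and follow its total frequency for a few generations.
state = (1e-6, 0.0, 0.99 - 1e-6)
for t in range(5):
    state = next_generation(*state[:3], n=10, u=1e-3, h=0.1, d=0.01, c=0.0, b=0.5)
    print(t, state[0] + state[1])
```

Iterating from a rare duplicate gives a direct numerical view of the invasion condition that the eigenvalue analysis summarizes analytically. The parameter values above are arbitrary illustrations.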

Stability of the equilibrium $x_{n1} = x_{n0} = 0$, $\hat{x}_n = 1 - U = 1 - \hat{x}_s$, and $\bar{w} = (1 - U)(1 - h)$ is governed by the eigenvalue:

$$\lambda = \frac{2Ud(1-c)b + (n - U - Un)(1 - dc) + (1-h)(1-U)(1-d)n}{2(1-h)(1-U)n} + \frac{\sqrt{\left\{2Ud(1-c)b + (n - U - Un)(1 - dc) + (1-h)(1-U)(1-d)n\right\}^2 - 4(n - U - Un)(1 - dc)(1-d)(1-h)(1-U)n}}{2(1-h)(1-U)n}$$

When there is no recombination ($d = 0$), a rare gene duplicate is favored by selection when $sh > U/(n - nU)$. Substituting $U = nu$ yields $sh > u/(1 - nu)$. This result differs slightly from the previous model of a duplicate linked to a single essential gene, which predicts that a duplicate invades when $sh > u/(1 - u)$. Multiple Y-linked genes will therefore decrease opportunities for positive selection in favor of new duplicates.
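The $d = 0$ threshold can be checked numerically. The sketch below transcribes the closed-form eigenvalue given above, assuming s = 1 as in the model, so that h stands in for sh. The names `T`, `Q`, and `denom` are our own labels for pieces of the published expression. The function is evaluated just below and just above $h^* = u/(1 - nu)$ for illustrative values of n and u.

```python
import math


def leading_eigenvalue(n, u, h, d, c, b):
    """Closed-form leading eigenvalue from the SI (s = 1, so h stands in for sh)."""
    U = n * u
    no_mut = n - U - U * n  # n times P(no new mutation on a Y carrying the duplicate)
    # T, Q and denom are our own labels for the pieces of the published expression.
    T = 2 * U * d * (1 - c) * b + no_mut * (1 - d * c) + (1 - h) * (1 - U) * (1 - d) * n
    Q = no_mut * (1 - d * c) * (1 - d) * (1 - h) * (1 - U) * n
    denom = 2 * (1 - h) * (1 - U) * n
    # The discriminant is nonnegative analytically; max() only guards against rounding.
    return (T + math.sqrt(max(T * T - 4 * Q, 0.0))) / denom


n, u = 10, 1e-3
h_star = u / (1 - n * u)  # predicted invasion threshold at d = 0
for h in (0.8 * h_star, 1.2 * h_star):
    lam = leading_eigenvalue(n, u, h, d=0.0, c=0.0, b=0.5)
    print(f"h = {h:.6f} (h* = {h_star:.6f}): lambda = {lam:.6f}")
```

With $d = 0$ the linearized system is triangular. The leading eigenvalue is then the larger of 1 and the growth factor of the intact-duplicate class. It rises above one only when $h$ exceeds $h^*$.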

When selection is weak ($sh \to 0$), recombination can promote selection in favor of the duplicate. For $sh = c = 0$, the Taylor series approximation around $d = 0$ gives a leading eigenvalue of:

$$\lambda = \lambda\big|_{d=0} + d\,\frac{\partial \lambda}{\partial d}\bigg|_{d=0} + O(d^2) \approx 1 + d(2b - 1),$$

which is greater than one for $b > 0.5$, as in the previous model. Numerical evaluation of the leading eigenvalue across a broad range of parameter space shows that, as before, the opportunity for positive selection for a new duplicate is greater with recombination.
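The first-order expansion can be confirmed with a finite difference. The sketch below assumes s = 1, as in the model, and reuses the closed-form eigenvalue from this section (our own transcription). It estimates $(\lambda - 1)/d$ at $sh = c = 0$ for a very small $d$ and compares the result with $2b - 1$. The values n = 10 and u = 10⁻³ are illustrative.

```python
import math


def leading_eigenvalue(n, u, h, d, c, b):
    # Same closed-form eigenvalue as in the SI; s = 1, so h stands in for sh.
    U = n * u
    no_mut = n - U - U * n
    T = 2 * U * d * (1 - c) * b + no_mut * (1 - d * c) + (1 - h) * (1 - U) * (1 - d) * n
    Q = no_mut * (1 - d * c) * (1 - d) * (1 - h) * (1 - U) * n
    return (T + math.sqrt(max(T * T - 4 * Q, 0.0))) / (2 * (1 - h) * (1 - U) * n)


def slope_at_zero(n, u, b, d=1e-7):
    """Finite-difference estimate of (lambda - 1) / d at sh = c = 0."""
    return (leading_eigenvalue(n, u, 0.0, d, 0.0, b) - 1.0) / d


for b in (0.3, 0.5, 0.8):
    print(f"b = {b}: numerical slope = {slope_at_zero(10, 1e-3, b):.4f}, 2b - 1 = {2 * b - 1:.4f}")
```

The agreement depends on $d$ being small compared with $U/(n(1 - U))$, the gap between the two eigenvalues at $d = 0$. For n = 10 and u = 10⁻³ that gap is roughly 10⁻³, so the choice $d = 10^{-7}$ is well within the linear regime.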

II. Invasion Probability of Duplicate Genes with Gene Conversion

FIGURE S1.—The probability of fixation for Y-linked duplicate genes. The red line depicts the analytical approximation from Eq. (2). To facilitate comparison between these results and those of Fig. 2 from the main text, we show the approximation for N = 1000, s = 1, d = 0, and $u = 10^{-5}$, and present representative simulation results for d > 0 and various combinations of the remaining parameters (c, b). Circles represent the proportion of duplicate genotypes (out of 100,000 replicate simulations for each data point) that eventually become fixed within the population.
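The main-text Discussion notes that biased conversion generates positive selection of roughly the order of the mutation rate. It concludes that such selection overcomes drift only when $Nu \gg 1$. A rough way to see this uses the standard haploid diffusion approximation for the fixation probability of a single new copy (Kimura 1962). This formula is not Eq. (2) of the main text, which is not reproduced here. The sketch also assumes an effective selective advantage s ≈ u. The function name `fixation_probability` is ours.

```python
import math


def fixation_probability(N, s):
    """Haploid diffusion approximation (Kimura 1962) for one new copy among N Y chromosomes.

    s is an effective selective advantage; s = 0 returns the neutral value 1/N.
    """
    if s == 0:
        return 1.0 / N
    # (1 - exp(-2s)) / (1 - exp(-2Ns)), written with expm1 for accuracy at tiny s.
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)


for N, s in ((1000, 1e-5), (1000, 1e-3), (10**6, 1e-4)):
    ratio = N * fixation_probability(N, s)  # relative to the neutral expectation 1/N
    print(f"N = {N}, s = {s}: P_fix / (1/N) = {ratio:.3f}")
```

With N = 1000 and s on the order of $u = 10^{-5}$, as in this figure, the ratio to the neutral expectation is only about 1.01. Fixation is essentially neutral in that case. Only when Ns is large does the advantage substantially raise the fixation probability.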

III. Maintenance of Functional Gene Duplicates

FIGURE S2.—Gene conversion and the maintenance of functionally redundant paralogs. Results are presented for two extremes of selection: gene conversion between paralogs of an essential gene (s = 1) and between paralogs of a nonessential gene (s = 0.001). In each case, gene conversion is unbiased (b = 0.5) and the mutation rate is $u = 10^{-5}$. Under both essentiality and nonessentiality, fitness is maximized when at least one of the paralog copies is functional (i.e., masking of knockout mutations is complete: h = 0). Each point represents the fraction of 100 simulation replicates where both copies are maintained as functional within the population. For each simulation run, the population is initially fixed for two functional Y-linked genes, and then evolves under mutation, recombination, selection, and genetic drift for 100,000 generations.
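Simulations of this kind (and the many-gene runs behind Figures 4 and 5) can be sketched as a small Wright–Fisher model. The code below is a hypothetical individual-based illustration, not the authors' simulation code. The authors sampled genotype-class frequencies from a multinomial in R; here N individuals are drawn in proportion to fitness with `random.choices`, which is the analogous multinomial draw over individuals.

The sketch makes several assumptions of our own:

- fitness is multiplicative across genes, with a cost s per gene that has lost both copies and sh per gene that has lost one;
- crossovers are ignored (c = 0);
- d is the per-gene, per-generation probability of a conversion event.

```python
import random


def fitness(y, s, h):
    """Hypothetical multiplicative fitness of one Y chromosome.

    y[i] is the number of silenced copies (0, 1 or 2) of duplicated gene i.
    Each gene with both copies silenced costs s; each with one copy silenced costs s*h.
    """
    return (1.0 - s) ** y.count(2) * (1.0 - s * h) ** y.count(1)


def simulate_y(N, n, u, s, h, d, b, generations, seed=None):
    """Wright-Fisher sketch: selection + drift, then mutation, then gene conversion.

    Crossovers between paralogs are ignored (c = 0). Returns the final mean fitness
    and the fraction of genes for which some Y still carries two functional copies.
    """
    rng = random.Random(seed)
    pop = [[0] * n for _ in range(N)]  # start fixed for 2n functional copies
    for _ in range(generations):
        weights = [fitness(y, s, h) for y in pop]
        if sum(weights) == 0:
            break  # every Y is sterile; the population cannot reproduce
        # Drawing N parents in proportion to fitness is a multinomial sample.
        parents = rng.choices(pop, weights=weights, k=N)
        next_pop = []
        for parent in parents:
            y = list(parent)
            for i in range(n):
                # Mutation: each still-functional copy is silenced with probability u.
                for _ in range(2 - y[i]):
                    if rng.random() < u:
                        y[i] += 1
                # Conversion changes the state only when exactly one copy is silenced:
                # with probability b the functional copy overwrites the silenced one.
                if y[i] == 1 and rng.random() < d:
                    y[i] = 0 if rng.random() < b else 2
            next_pop.append(y)
        pop = next_pop
    mean_w = sum(fitness(y, s, h) for y in pop) / N
    intact = sum(any(y[i] == 0 for y in pop) for i in range(n)) / n
    return mean_w, intact


for d in (0.0, 1e-3):
    print(d, simulate_y(N=100, n=10, u=5e-4, s=0.05, h=0.1, d=d, b=0.5,
                        generations=300, seed=1))
```

The run lengths and parameter values above are far smaller than the 100,000-generation runs reported here and are chosen only so that the sketch runs quickly. Comparing d = 0 with d > 0 across many seeds is the kind of contrast summarized in this figure and in Figures 4 and 5.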

IV. Frequency of the ‘least loaded class’ under biased gene conversion.

FIGURE S3.—Gene conversion increases the frequency of Y chromosome haplotypes that carry zero deleterious mutations (i.e., the "least-loaded" genotypic class). Results use the same parameters as those of Fig. 3 with n = 50, and with the biased gene conversion parameter (b) permitted to vary.