<<

Evolutionary Effects of Hybridization

The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters

Citation Edelman, Nathaniel B. 2020. Evolutionary Effects of Hybridization. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Citable link https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37365704

Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA © 2020 - Nathaniel B Edelman All rights reserved. Thesis advisor: Professor James Mallet Nathaniel B Edelman

Evolutionary Effects of Hybridization

Abstract

Though any particular individual is extremely unlikely to mate with a member of

a different species, rare hybridization can have profound effects on adaptation and

evolution. As more whole genome comparative analyses are published, scientists are

learning that the genomes of many species have mosaic histories, with a substantial

share of alleles derived from interspecific introgression.

In Chapter 1, I review current literature on the prevalence of introgression in ge-

nomic studies, and examine mechanisms by which introgressed loci can be adaptive.

Researchers have used patterns of asymmetry and expectations of branch lengths

in local gene trees, as well as full network models, to identify such loci. The pro-

portion of genomes that have a history of hybridization can be as low as a single

gene and as high as 70%. Though it is not easy to assign adaptive significance to

these proportions, introgressed loci can be beneficial by contributing genetic vari-

ation, masking recessive deleterious alleles, or impacting particular traits. All of

these effects are greatly influenced by the particular demographic histories ofthe

participating populations, and at present it is difficult to estimate the impact of

introgression in adaptive evolution. Nonetheless, given that introgression appears

widespread, and significant fractions of the genome have been reported to be under

selection, the adaptive impact of introgression is likely to be high.

In Chapter 2, I aim to characterize introgressed variation across the butterfly radiation. Although hybridization has long been known to occur inthis group, only closely related species that hybridized in the relatively recent past had been systematically investigated, and then often only for a small subset of genes. An

iii Thesis advisor: Professor James Mallet Nathaniel B Edelman internationally collaborative study, this project consists of assembling the genomes of 20 Heliconiini species, generating a multi-species alignment, and characterizing the extent of interspecific gene sharing across the clade and throughout the history of the radiation. I construct a whole-genome species tree and find that there is a high level of phylogenetic discordance among trees generated from different segments of the genome. This leads me to re-conceptualize the evolution of Heliconius as a net- work instead of a bifurcating tree, and I generate reticulate evolutionary hypotheses for the group. I next describe how loci with alternative evolutionary histories are distributed across the genome for one tractable clade. Because we generated de novo assemblies, I am able to detect two major chromosomal inversions that have histories discordant with the species tree. One of these has a history of ancestral and incomplete lineage sorting (ILS), and I show the other to be an introgressed, convergent inversion that captures a color pattern switch locus. I next describe the general distribution of introgressed loci across the genome, in which introgression probability is strongly positively correlated with recombination rate.

In Chapter 3, I turn from large-scale genomic patterns of introgression to a case study of two sub-species of Heliconius pardalinus which, when mated together, yield sterile female offspring which produce no eggs. This phenomenon is consistent with

Haldane‘s rule, as females are the heterogametic sex in the lepidopteran ZZ/ZW system. Most incompatibility genes have been identified in model organisms such as Drosophila and Mus, both XX/XY sex determination systems. There is very little information about mechanisms of hybrid incompatibility in .

With collaborators, I generate a population of backcross individuals using the fer- tile F1 males. I dissect the ovaries of over 100 backcross females and score their morphological phenotypes. I also extract and sequenced RNA from a subset of the

iv Thesis advisor: Professor James Mallet Nathaniel B Edelman backcross and pure subspecies females. The intersection of RNA-sequencing, QTL mapping, and cross-referencing with known Lepidoptera oogenesis genes points to one candidate gene, tral, whose underexpression may be responsible for the hybrid

sterility phenotype. If this is validated, it will be the first molecular characterization

of hybrid sterility in a lepidopteran system.

v Contents

1 The prevalence and adaptive impact of introgression in 1 1.1 Abstract ...... 1 1.2 Introduction ...... 2 1.3 Quantifying the extent of introgressed loci in genomes ...... 3 1.4 Regimes of adaptive introgression ...... 9 1.5 Effects of demography and recombination ...... 14 1.6 Discussion ...... 17

2 Genomic architecture and introgression shape a butterfly ra- diation 19 2.1 Abstract ...... 19 2.2 Main Text ...... 20 2.3 Discussion ...... 31

3 Molecular basis of hybrid sterility in Heliconius pardalinus geo- graphic races 33 3.1 Abstract ...... 33 3.2 Introduction ...... 34 3.3 Materials and Methods ...... 37 3.4 Results ...... 45 3.5 Discussion ...... 51

4 Conclusion 57

References 62

vi Author List

The following authors contributed to Chapter 2: Paul B. Frandsen, Michael Miyagi, Bernardo Clavijo, John Davey, Rebecca Dikow, Gonzalo García-Accinelli, Steven M. Van Belleghem, Nick Patterson, Daniel E. Neafsey, Richard Challis, Sujai Kumar, Gilson R. P. Moreira, Camilo Salazar, Mathieu Chouteau, Brian A. Counterman, Riccardo Papa, Mark Blaxter, Robert D. Reed, Kanchon K. Dasmahapatra, Marcus Kronforst, Mathieu Joron, Chris D. Jiggins, W. Owen McMillan, Federica Di Palma, Andrew J. Blumberg, John Wakeley, David Jaffe, James Mallet. The following authors contributed to Chapter 3: Neil Rosser.

vii Listing of figures

1.3.1 Components of topologies in ILS and introgression gene trees . . . . 5

2.2.1 Phylogeny and phylogenetic networks of Heliconius show lack of sup- port for bifurcating tree...... 22 2.2.2 Local evolutionary history in the erato-sara clade is heterogeneous across the genome...... 24 2.2.3 Chromosomal architecture is strongly correlated with local topology. 27 2.2.4 Parallel evolution of a major inversion at the cortex supergene locus. 30

3.2.1 Distribution of H. pardalinus and crossing scheme ...... 38 3.3.1 Ovary score guide ...... 40 3.3.2 Comparison of scoring schemes ...... 41 3.4.1 Staging of developing oocytes ...... 47 3.4.2 Morphology score distribution ...... 48 3.4.3 Backcross differential expression ...... 50 3.4.4 QTL map ...... 52 3.5.1 Heliconius tral expression ...... 53 3.5.2 mRNA expression of piRNA-associated proteins ...... 55

viii Dedicated to Harriet Stern, my grandmother, who is no longer with us but has always been my role model.

ix Acknowledgments

This dissertation could not have been completed alone. Most obviously, I owe a great deal to my co-authors. As can be seen from the author list above, this is quite a group! Chapter 2 of this thesis, in particular, was the result of an effort conceived before I arrived at Harvard by an international consortium of Heliconius researchers. For the final product presented here, I particularly want to thank Paul Frandsen and Michael Miyagi. Paul was a partner in the project from the beginning, and we worked together throughout to craft the narrative and explore the data in exciting ways. Michael, though a later addition to the team, had the critical insight and theoretical ability to design the statistical analysis method QuIBL, which brought the manuscript to a higher level. I learned and grew immensely from my discussions and work with both of them. The third chapter has only a single co-author listed: Neil Rosser. Neil is a gifted naturalist and ecologist, and was the leader of my most extensive field season during my PhD. We spent many late nights working together in Tarapoto, and in addition to admiring his work ethic, I had a genuinely fun experience. Perhaps the most critical co-author of all is my advisor, Jim Mallet. Jim’s deep knowledge of the Heliconius literature, and ecological and evolutionary genetics more broadly, meant that he was ever ready to supply relevant readings and constructive criticisms to my projects. While his door was always open and he was happy to discuss my research, he also encouraged me to explore aspects of data that I found interesting. Jim did not force me in a particular direction, but did insist that my claims and analyses were well supported. Much of the discussion and honing of my research

x direction was done in lab meetings, and I would also like to gratefully acknowledge the other members of the Mallet group throughout my time who contributed to this process, and who made working in the lab a rewarding experience: Tianzhu Xiong, Fernando Seixas, Jorge Amaya Romero, Carolina Segami, and Liang Qiao. I was extremely lucky to have a group of three faculty committee members who were genuinely interested in my projects and who took the time to discuss them with me. Hopi Hoekstra could be counted upon to ask incisive questions and offer clear, practical advice. Naomi Pierce introduced me to entomology, and after each meeting I could be sure to leave with an idea connected to my work, but which I had never imagined before. John Wakeley was always encouraging and constructive with his comments, and my thinking as a scientist was heavily influenced by the five courses I took with him in my six years in the department. My memories of OEB will be dominated by my friendships, particularly with members of my cohort. The various retreats, brunches, parties, and other excuses to be with each other softened the pain of particularly difficult times and heightened the joy of successes. Outside of school, I was kept sane by the friendship of my teammates in the Goals Gone Wild soccer organization, of which I as a member for the final two and a half years of graduate school. I would also particularly liketo thank Audrey Burns and her mother Angie MacDonald, who were ever supportive and encouraging, and who believed in me throughout this sometimes seemingly endless process. Finally, I would like to thank Anna Newman, a source of joy and comfort who, at this very moment, is pushing me over the finish line. I am only in a position to write acknowledgements for anything at all because of my family. It is always a relief and pleasure to see my siblings, Rachel, Jonathan, and Andrew whenever we are together. My dad, Jim, has made a valiant effort to understand evolutionary biology over the past six years so that he can be properly excited about my research. My mom, Susan, has generally taken a pass, but could not be more proud and supportive of each step I’ve taken. Family Thanksgivings with my mom’s extended family and Passover Seders with my dad’s provide comfort and a welcome escape for the stresses of everyday life, and I am lucky to have such a loving family. I thank them all deeply. Finally, I would be amiss if I didn’t mention Kooskia the Peruvian street dog. I adopted Kooskia as a puppy in the Spring of my first semester, in the field in Peru. He’s been by my side ever since.

xi 1 The prevalence and adaptive impact of introgression in animals

1.1 Abstract

In the short term, hybridization is often detrimental to fitness. Early-generation hy- brids often suffer from sterility and inviability, or are maladapted to their environ- ments. However, genomic analyses have shown that a large share of genetic variation in modern-day populations has an evolutionary history that includes cross-species

1 . Here, we discuss ways that this widespread evidence of introgression can be interpreted in terms of evolutionary impact. We review methods for estimating the proportion of genomic material that arose via hybridization and their results, classify regimes of adaptive introgression, and interpret variation in introgressed loci across the genome due to recombination rate and demographic processes. We show that introgression is a major contributor of genetic variation, consistent with an important role in adaptive evolution. In many cases the long-term effects of intro- gressed loci are beneficial to recipient populations, but the presence or prevalence of introgression is difficult to translate directly to adaptive impact.

1.2 Introduction

In contrast to modern synthesis predictions that hybridization should have minimal adaptive impact on natural populations, interspecific gene flow is now a broadly accepted driver of genetic variability [5, 32, 34, 105, 108, 116]. Increasingly avail- able genomic data from non-model organisms support widespread introgressive hy- bridization across a broad array of species and throughout evolutionary history

[43, 45, 52, 53, 55, 61, 95, 100, 134, 140, 150, 158]. For example, a recent com- parative genomic analysis of woodcreepers identified five independent introgression events among nineteen species, the most ancient dating to almost five million years ago [150]. Such conclusions are based on a foundation of theoretical advances, most of which rely on a coalescent framework [89], and the understanding of and focus on each individual’s genome as a mosaic of their ancestral history [104, 135, 151]. These findings have inspired scientists to delve more deeply into investigating the signif- icance and role of hybridization in the evolutionary process. They have identified specific adaptively introgressed alleles such as insecticide resistance in mosquitos, as

2 well as more subtle adaptive introgression in the form of high-frequency alleles that were transferred from rainbow trout into westslope cutthroad trout [11, 128]. The probability that introgression will be adaptive depends on a nuanced, interconnected relationship between the divergence time of the parental populations, demographic history, and selection against linked introgressing alleles [86, 129, 166, 190]. Here, we review the literature on how common introgression is across genomes, the ways in which that introgression is detected, regimes of adaptive introgression, and the effects of demography and recombination.

1.3 Quantifying the extent of introgressed loci in genomes

Given the heterogeneity of ancestry within individuals, one first approximation of the importance of hybridization is to identify the proportion of any particular genome with a history of introgression. In this way, one can quantify the prevalence of introgressed loci, if not their function. A number of methods have been established to accomplish this goal, each of which relies on models describing the expected relationships of loci (gene trees) in the context of the evolutionary trajectories of populations (species trees).

Tracking lineages back through time, populations grow, shrink, and morph until merging with other lineages in common ancestral populations. These lineages may experience periods in which one or many individuals mate with members of an- other population or species, transferring genetic variation across species boundaries.

Because diploid genomes recombine each generation during meiosis, present-day in- dividuals contain within them multitudes of loci with unique genealogies and evolu- tionary histories. If hybridization has occurred, some loci will have ancestors which belonged to the donor species, and each local genealogy, or “gene tree”, provides

3 data for evolutionary inference [104, 135, 171].

Gene trees are comprised of two main types of information: the topology, which

describes the branching order of ancestral lineages leading to the sampled sequences,

and branch lengths, including coalescence times, which indicate when the sampled

lineages had a common ancestor [89]. In the absence of intra-locus recombination since the most recent common ancestor of all sampled individuals, each locus neces- sarily has a single bifurcating history. Every lineage has a single parent population, and every parent population has two daughter lineages (Fig. 1.3.1). In many in- stances, the true topology of a gene tree is expected to be different from that of the species tree. This phenomenon, known as gene tree – species tree discordance, occurs due to gene duplication or loss, incomplete lineage sorting (ILS), or introgres- sion [60, 104, 135, 171]. Therefore, one must use additional lines of evidence drawn from theoretical knowledge of the mechanistic differences between these processes to distinguish between them.

1.3.1 Tree asymmetry

In a bifurcating species tree framework, certain aspects of the tree are expected to

be symmetrical. For example, present-day sister species share a common ancestor

at some time, T , in the past. As long as the mutation rate per generation, µ, is the

same between the two species, and those mutations have no effect on fitness, the

expected number of substitutions that have occurred from the common ancestor to

each of these species is the same: T µ. Similarly, anything that affects the common

ancestor of these sister species will affect each species equally, and that is the case

for ILS [44, 104, 183] (Fig. 1.3.1, Center). Also known as deep coalescence, ILS occurs when homologous sequences at a particular locus sampled from sister species

4 Figure 1.3.1: Components of topologies in ILS and introgression gene trees Terminal branches of species recovered as sister are shown in red, and internal branches in blue. Left In the concordant, non-ILS case, the first coalescence takes place in AB, and the second in ABC. Center In the ILS case, both coalescence events take place in ABC. The terminal branches are relatively long, and the internal branch is relatively short. In ABC, all pairs of lineages are equally to be the first to coalesce. Right In the introgression case, the first coalescence takes place in C, and the second in ABC. The D and related statistics take advantage of the mutations that occur in the internal branch, and are therefore private to B and C [64, 93, 137, 156]. QuIBL and related methods take advantage of the fact that the long internal branch length has a different distribution of expected lengths compared to the internal branch in ILS gene trees [43].

(call them A and B) do not coalesce in the common ancestor private to those species

(AB). Instead, the AB remains polymorphic at this locus, and coalescence occurs in a population ancestral to A, B, and at least one more distantly related species (C).

In this case, if we have also sampled from C, the three possible gene tree topologies

[((A,B),C); ((A,C),B); ((B,C),A);] are all equally likely. Therefore, we are unlikely to obtain a gene tree topology concordant with the species tree, and we are equally likely to obtain either of the two discordant topologies. Considering a large set of loci, all of which have the property that no coalescence occurred until the most recent common ancestor of all three species, the theoretical expectation is that loci with gene trees supporting the two discordant topologies will exist in equal frequency

[89, 135, 192].

5 This logic drives the most commonly used statistical test for introgression, Pat-

terson’s D statistic [93, 137, 156]. This test distinguishes between the symmetric pattern described above and the asymmetric process of gene flow between one of the two sister species and the more distantly related third species (Fig. 1.3.1, Right).

With introgression, one expects significantly more loci to have gene tree topologies recovering the hybridizing species as sisters. Instead of estimating gene trees per se, the D statistic uses patterns of single nucleotide polymorphisms in which the third species and one of the sisters share a derived allele [156]. This is therefore a whole genome composite test for gene flow – certain regions of the genome will show affinity for one discordant topology or the other, but on average they will balance each other out. The D statistic and closely related f statistics [64, 137] can be used to estimate the fraction of a genome that has experienced introgression. In its first iteration, Green et al [64] used f statistics to estimate that approximately

2% of the human genome has a history of introgression from Neanderthals. Among animals, this class of tests has been successful in identifying interspecific gene flow among species groups as varied as Heliconius butterflies (28% between H. cydno

and H. melpomene in ) [111], killer whales (|D| as large as 0.13) [53], and cichlid fish (|D| as large as 0.22) [55]. Related symmetry-based measures have been developed to infer additional information such as the directionality of gene flow [139] and the extent of introgression at a particular locus [112].

1.3.2 Branch Length Expectations

Although gene trees that arise through ILS and introgression may have the same

topology, the expected coalescence times and therefore branch lengths are quite

different [89, 192]. Consider the conditions in which each coalescence event takes

6 place, again in the context of the 3-species system described above. In this model, there are two events in the species tree: one which splits ABC into AB

and Cwhich splits the population AB into A and B, and the other which splits the population AB into A and B (Fig. 1.3.1, Left). For ILS, the first coalescence event occurs in the common ancestral species, when all three lineages coexist in the same population. This means that the time from sampling until the first coalescence event is at least as long as the time until the most recent common ancestor of all three species. The time between the first and second event is approximately 2N generations after the first [192] (Fig. 1.3.1, Center). In the case of introgression, the first coalescence is likely to occur shortly after the introgression event. Because, by definition, the introgression event is closer to the present than the speciation time between the two sister species, the terminal branches in introgression trees are likely to be much shorter than that in ILS trees (Fig. 1.3.1, Right).

This kind of logic was used to infer that 70% of the Anopheles gambiae genome has a history of hybridization from A. arabiensis, and only 4% of the genome, almost exclusively on the X-chromosome, retained the original species tree topology [52].

Further, the second coalescence event in introgression trees cannot happen soon after the first, because the lineages are not in the same population. Instead, there isa waiting time equal to the difference between the time to the most common ancestor of all species and the time of hybridization, and the final coalescence will occur on average 2N generations after the lineages enter this common ancestral population.

Meyer et al measured the waiting time between coalescences to identify three major

introgression events in Lake Tanganyika Cichlid fishes [120]. Beyond absolute times to coalescence, the expected distributions of branch lengths among all loci in the genome with a given evolutionary history have characteristic shapes. These can be

7 leveraged to calculate the proportion of introgressed loci, as well as the probability of introgression at any particular locus. Using estimated gene trees, lengths of internal branch lengths were used to show that among loci with topologies discordant from the species tree in the erato-sara clade of Heliconius butterflies, 70% arose through introgression. This corresponds to approximately 20% of the whole genome [43].

1.3.3 Full network modeling

Both the symmetry argument used in the D and f statistics and estimates of branch

length distributions arise from a general model that fully describes the evolutionary

history of all loci in a genome. Ideally, one would infer introgression directly from

this general model, and there has been substantial work towards that goal. The an-

cestral recombination graph (ARG) records the evolutionary relatedness of all loci

sampled from a population, including both coalescence events and recombination

events [65, 66, 74]. Instead of a bifurcating tree, the ARG is a network in which one locus can have two or more parents which represent recombination events back though time. This representation is computationally expensive to calculate, though recent advances have made estimation of the ARG within a population more feasible

[84, 154]. Current methods that aim to apply a network model to species relation- ships are limited to relatively small numbers of loci and samples, but estimate a full species network with fitted parameters for branch lengths, introgression times, and introgression fractions [51, 174, 182, 195, 196]. Such methods have also been used to estimate introgression fractions, or the proportion of sampled genomes with a history of hybridization. Two introgression branches inferred on a phylogenetic network of Xiphophorous fish by SNAQ had introgression fractions estimated tobe

0.17 and 0.19 [174]. Using PhyloNet’s maximum likelihood method to infer reticulate

8 networks with previously estimated gene trees [200, InferNetwork_ML], Meyer et al

estimated that the three hybridization events in their cichlid phylogeny transferred

24, 41, and 9% of genetic variation from source to recipient populations [120]. Addi- tional development to incorporate larger datasets, including greater numbers of loci and taxa, will enable estimates of reticulate species histories that are more robust, firmly grounded in theory and less prone to confounding effects of unaccounted-for species interactions.

1.4 Regimes of adaptive introgression

These strategies to detect introgressed regions of a genome reveal that hybridization has contributed to genetic variation in organisms across the tree of life [108, 181].

The proportion of a genome corresponding to introgression has an enormous range - from around 2% in the snowshoe hare to over 70% in Anopheles mosquitos [52, 79].

However, it is not clear how to translate the prevalence of hybrid ancestry in a genome to the adaptive impact of introgression. Factors such as the function of introgressed loci and incompatibility between species will strongly effect the inter- pretation of any percentage of introgressed ancestry. Each of the methods to identify introgression discussed above assumes that alleles transferred from one species to another are selectively neutral. On their own, they therefore cannot explicitly iden- tify adaptive introgression, or provide answers about the impact of hybridization on populations’ abilities to adapt to their environments [138].

1.4.1 enhancing genetic variation

When species or populations are very closely related to one another, introgression

is in some ways akin to an increase in population size. Alleles that cross the species

9 boundary may have arisen neutrally in their own population, and when introduced into the other lineage may have benefits such as increasing genetic variation and masking deleterious recessive alleles [33, 54, 170]. A major goal of conservation biology is to protect and preserve endangered species which, by definition, have small population sizes [2]. Classical population genetics predicts that populations will accumulate deleterious alleles that have selective effects inversely proportional to their effective population size, Ne [9]. This is because selection is stronger than the stochastic effects of genetic drift when the selective coefficient of anallele, s, is greater than ∼1/Ne [71]. If most mutations are deleterious, the larger a population is, the more efficiently it can purge weakly detrimental alleles. Larger populations

are therefore expected to have greater absolute fitness than smaller populations88 [ ], and migration from large populations has been used as a strategy to rescue small populations from inbreeding depression [72, 78]; but see [94]. In Trinidadian gup- pies, a recent study showed that an influx of migration from a larger population to a smaller population increased fitness and genetic variation [50]. While conspe- cific, these populations had adapted to different ecological conditions, and geneflow did not swamp locally adaptive alleles in the small, recipient population. Even be- tween species, populations can experience these types of benefits of hybridization.

Genomic regions that do not experience introgression between the butterflies Heli- conius pardalinus and H. elevatus in the Amazon have much lower polymorphism

than regions in which gene flow is pervasive, suggesting this enhancement of fitness

via introgression may have been important among these two introgressing species

[92].

10 1.4.2 Adaptive introgression despite limited incompatibility

In most cases of interpecific hybridization, many introgressed loci will not be se-

lectively neutral, but are instead expected to have deleterious interactions with

regions in the recipient genome. Often, this pattern is attributed to negative epis-

tasis between alleles that have differentially fixed in each population, known as

Dobzhansky-Muller incompatibility (DMI) [10, 39, 124, 130, 133]. In the simplest model, mutations that contribute to DMI loci accumulate at a constant and sym- metric rate in both species. As the time since the common ancestor increases, the number of potential negative epistatic interactions between alleles that have arisen in each diverging population increases quadratically, leading to a “snowballing” ef- fect [115, 130, 133, 148]. It is somewhat unclear whether actual declines in hybrid fitness are linear or accelerate due to a snowball effect, but the relationship between percent divergence and hybrid compatibility is highly variable [31, 62, 149, 186].

Even in the absence of epistatic interactions, alleles from one population may be deleterious in the other due to environmental interactions.

Such effects predict that deleterious introgressed alleles are likely to be purged from recipient genomes, persist at low frequencies (perhaps at a migration-selection balance), or at best follow neutral trajectories. However, in many studies, analysis of genomic variation has revealed instances in which several loci of introgressed ancestry segregate at frequencies higher than expected under neutral gene flow

[49, 57, 153, 163, 194]. This is consistent with adaptive introgression, even if a particular trait cannot be associated with the introgressed loci. This scenario may represent the advantage of “recombinant genotypes”, or genotypes that result in a mixture of traits from each parent [4]. Phenotypic variation is a common outcome of hybridization, and when the increased variability extends beyond single large-effect

11 traits, it is likely to involve many genomic loci [3, 8, 63]. Establishment of many, unlinked introgressed alleles may indicate successful “genetic rescue” from recessive deleterious alleles, a single adaptive trait with polygenic genetic architecture, or an adaptive response to many different pressures. Typifying this case, hybridiza- tion between two strongly differentiated subspecies of mice (Mus musculus musculus and M. m. domesticus) in Europe has resulted in approximately 10% of the genome with a history of introgression [176]. Higher than expected frequencies of intro- gressed alleles provide evidence that they contribute to adaptation, though these authors did not estimate selective coefficients. Similarly, in rainbow trout-westslope cutthroat trout hybrids, there is an overrepresentation of introgressed alleles rela- tive to neutral expectations [11]. Selective coefficients as large as 0.05 are needed to explain the allele frequencies of rainbow trout alleles in hybrid populations, and such high-frequency introgressed alleles are consistently identified among multiple independent hybrid populations. Specific functions of these high-frequency alleles are still unknown, but functional categories such as transport of toxic compounds were over-represented among introgressed loci. This could be indicative of polygenic selection on discrete traits, or of multiple different traits that are all being selected for in hybrid trout populations.

One particular pattern observed as a result of hybridization is mitochondrial in- trogression, causing mito-nuclear phylogenetic discordance [184]. Often, the nuclear genome is less impacted than the mitochondria, and in some cases nuclear introgres- sion is undetected [59]. However, mitochondrial replacement has the potential to be the source of broad incompatibilities, leading to co-introgression of nuclear loci that interact with the mitochondria [13, 168].

12 1.4.3 Introgression of large-effect genes

Finally, when populations are highly divergent, it is likely that mainly alleles with

strong beneficial effects will introgress, as genetic incompatibilities are likely tobe

widespread and strong [1, 144]. For example, adaptive responses to extreme an- thropogenic selective pressures, such as rodenticide resistance transferred from Mus spretus to M. mus domesticus [175] or pesticide resistance from Anopheles gambiae

to A. coluzzii mosquitos [128] involve single loci with very strong selective advan- tages (lower bound s=0.28 and 0.13, respectively). Another pesticide resistance allele was recently identified as adaptively introgressed from the moth Helicoverpa armigera into H. zea [187], and though no selective coefficient was calculated in that

study, the frequency of the introgressed allele rose from 0% to 70% in just 4 years,

implying very strong selection.

Heliconius butterflies experience strong selection for local mimetic color patterns

[15, 76, 96, 106]. Across an H. erato color pattern cline in Peru, average selection

coefficients for three pattern-determining loci was 0.17[106]. These large-effect color pattern switch loci have been passed between many species across the genus through introgression (e.g. [35, 123, 136, 193]). Similarly, populations of snowshoe hare (Lepus americanus) acquired an allele through introgression from the black- tailed jackrabbit (L. californicus) that regulates the expression of the pigmentation gene agouti, resulting in a winter brown as opposed to winter white coat [79]. This phenotype, which is advantageous in the relatively temperate Pacific Northwest, has a selective coefficient estimated between 0.027 and 0.04980 [ ]. Similarly,

13 1.5 Effects of demography and recombination

The density and strength of incompatible alleles are fundamental parameters for determining which regime of adaptive introgression a particular pair of popula- tions may inhabit. Demography of hybridizing species, in combination with the recombination landscape, plays an important role in determining both factors and therefor influences which and how many genomic regions are likely to be introgressed

[1, 18, 70, 86, 98, 166]. In general, when most hybrid ancestry is expected to be detri-

mental to fitness, high recombination regions will harbor most introgressed variation

[1, 86, 125]. For example, a relatively small population with significant genetic load could act as a source population and contribute genetic material to a larger popula- tion, as is often assumed for Neanderthal-modern human gene flow70 [ ]. Similarly, two populations could have diverged sufficiently long ago for many DMIs to arise.

The observed association between introgressed loci and high recombination rate is thought to be due to a combination of two distinct factors. If all loci were unlinked, each neutral and beneficial variant would enter the recipient population at arate proportional to its fitness effect, even if hybridization generally represented abarrier to gene flow [144]. On the other hand, if the neutral or beneficial variant is linked to a large, introgressed, deleterious haplotype, its effects will be swamped by its genetic background, and it will likely be purged from the population [6]. Therefore, one part of the positive correlation between recombination rate and introgression is due to the de-coupling of neutral and beneficial loci from deleterious haplotypes

[7]. When considering adaptive introgression in particular, the probability that a particular beneficial allele introgresses into a recipient population in these circum- stances will be a function of the density of deleterious alleles in the genome, the local recombination rate, and the selective coefficient of the allele in question1 [ , 6, 7, 86].

14 Weakly beneficial alleles are most likely to introgress in regions of high recombina- tion, and we therefore expect a negative correlation between the minimum selective coefficient for an introgressed variant and local recombination rate. This prediction has implications specific to different organisms. In Heliconius butterflies, recombi-

nation rate is largely homogenous across the genome, but is reduced at chromosome

ends [43, 113]. However, most animals and plants have the opposite pattern, in which recombination is suppressed in chromosome centers relative to ends [68]. In

Drosophila, humans, and mice, recombination is reduced near transcription start sites, though there is not a general correlation between gene density and crossover frequency in animals [24, 68, 117, 172]. There are observed patterns of reduced recombination around centromeres or in heterochromatic regions as well [121, 179], and each recombination landscape will constrain which genomic regions are likely to introgress.

Conversely, introgression may occur in the direction of the large population into the small population. If, in general, deleterious effects of hybridization are due to globally deleterious alleles as opposed to epistatic genetic incompatibilities, one ex- pects the opposite relationship between recombination rate and introgression proba- bility [99]. Here, not only will hybridization mask the effects of recessive deleterious alleles in the recipient population, but globally beneficial alleles will be introduced as well. Illustrating this point, Leitwein et al ([98]) studied a population of wild arctic char, into which domestic charr had been introduced. The wild populations had very low effective population sizes, and at a local scale, introduced domestic ancestry was associated with low recombination rates.

In addition to local recombination rate, which measures the probability of recom- bination between two particular loci, global recombination rate plays an important

15 role in probability of introgression [190]. By global recombination rate, we mean the mean probability that any two loci in the genome recombine in each meiosis, and this rate is primarily driven by the haploid number of chromosomes [191]. The efficiency of natural selection is proportional to the amount of genetic variancein fitness in a population [48]. To illustrate this concept, imagine that an individual’s

fitness is directly proportional to the amount of introgressed ancestry it carries, most

of which is presumed to be deleterious [6, 190]. After a hybridization event, the vari- ance in fitness in the population is the variance in introgression fraction per genome.

If global recombination rate is high, within very few generations, each individual will be largely homogeneous in the proportion of their genomes comprised of intro- gressed ancestry. Selection will therefore be inefficient in purging weakly deleterious introgressed alleles, leaving more time for neutral or weakly beneficial alleles to re- combine away from the deleterious background and spread through the recipient population [190]. Global recombination rate varies among species, and this mecha- nism may explain why some species are more likely to experience introgression that others. Simulations using human and Drosophila recombination maps show that

“Drosophila purges as much introgressed DNA in 13 generations as humans do in

2,000 generations ”[190]. In Heliconius butterflies, a much stronger correlation was

observed between chromosome size and introgression than between local recombi-

nation rate and introgression [43]. If chromosomes experience approximately one crossover per meiosis, the probability that any two loci on a short chromosome will recombine is much higher than that in long chromosomes. Through the mechanism described above, deleterious variation on short chromosomes will reach this low ge- netic variance state more quickly than on long chromosomes, which will remain more heterogeneous with larger blocks of introgressed and non-introgressed regions.

16 In combination with fine-scale recombinational impact on purging, this aggregate recombination effect may lead to chromosome size being more strongly correlated with introgressed fraction than local recombination rate [190].

1.6 Discussion

We have examined the prevalence of introgressed segments in genomes, the ways in which introgressed alleles can be adaptive, and the demographic forces that impact introgression probability, but have not answered the overarching question: how im- portant is hybridization in evolution? For hybridization to happen, two species must have overlapping geographic ranges, and must be sufficiently closely related that at least some hybrid offspring will be fertile. In addition, the adaptive landscape must be such that some hybrid individuals with phenotypes either intermediate between parents or transgressive (outside the range of parental phenotypes) will not suffer catastrophic fitness consequences. Both of these conditions are met in rapid adap- tive radiations, where, due to key innovations or ecological opportunity, populations are able to expand into new niches [109, 164, 167]. Genomic introgression has been observed in many classic adaptive radiations in animals, including cichlid fish, Heli-

conius butterflies, Darwin’s finches, and Ctenotus skinks [43, 91, 95, 118, 152, 178].

This introgression is not simply neutral variants passing between sister species, but includes adaptive alleles that have undergone inter-specific sweeps [123]. In cichlids, a likely spark for adaptive radiation in two lineages was an ancient hybridization event, leading to recombinant lineages and providing evidence for a combinatorial view of speciation [109, 118, 178]. Other instances where variable phenotypes might prove advantageous include range expansions and invasions, and in both cases hy- bridization has been widely documented [73, 143].

17 The fraction of a genome with a history of introgression can be estimated from

a single individual per species, while positive selection analysis typically requires

population-level sampling. However, powerful techniques to identify selection on a

genome-wide scale are available for groups that have suitable data, and could be

adapted to apply to the introgression case [16, 38, 87, 169, 177]. One additional promising avenue of research is analysis of genomic clines [56]. This framework identifies introgressed alleles that exist at unexpected frequencies, given the hybrid makeup of a population. Adaptive introgression is therefore inferred if, for example, introgressed alleles are at high frequencies, although the population as a whole is not broadly admixed [56].

Introgression has played a role in the evolution of many clades that have been

assessed on a genomic level. The probability of introgression depends not only on

the adaptive benefit of a particular gene or the deleterious background onwhich

that gene occurs, but also on the demographic history of interacting populations

and the genomic landscape of recombination. Introgression has been the source of

specific variants that allow populations to respond to selective pressures, adaptto

new conditions, and rescue populations with small sizes from extinction. We shall

soon learn if it can play an outsized role in adaptation to changing climates, but

given the conditions identified above, it is likely to be a significant contributor.

Understanding the prevalence of hybridization in nature has allowed researchers to

think more clearly about the process of evolution as a reticulate network instead of

a simple bifurcating tree, and more fully appreciate sources of adaptive variation in

nature.

18 2 Genomic architecture and introgression shape a butterfly radiation

2.1 Abstract

We use twenty de novo genome assemblies to probe the speciation history and ar- chitecture of gene flow in rapidly radiating Heliconius butterflies. Our tests to

distinguish incomplete lineage sorting from introgression indicate that gene flow has

obscured several ancient phylogenetic relationships in this group over large swathes

19 of the genome. Introgressed loci are underrepresented in low recombination and gene-rich regions, consistent with the purging of foreign alleles more tightly linked to incompatibility loci. We identify a hitherto unknown inversion that traps a color pattern switch locus. We infer that this inversion was transferred between lineages via introgression and is convergent with a similar rearrangement in another part of the genus. These multiple de novo genome sequences enable improved understanding

of the importance of introgression and selective processes in adaptive radiation.

2.2 Main Text

Adaptive radiations play a fundamental role in generating biodiversity. Initiated by key innovations and ecological opportunity, radiation is fueled by niche com- petition that promotes rapid diversification of species [164]. Reticulate evolution may enhance radiation by introducing genetic variation, enabling rapidly emerging populations to take advantage of novel ecological opportunities [95, 140]. Diverg- ing from its sister genus Eueides ∼12 My ago, Heliconius radiated in a burst of

speciation in the last ∼5 My [90]. Introgression is well known in Heliconius, with widespread reticulate evolution across the genus [91], though this has been disputed

[22]. Nonetheless, how introgression varies across the genome is known only in one

pair of sister lineages [111, 113]. Here, we use multiple de novo whole genome as-

semblies to improve the resolution of introgression, incomplete lineage sorting (ILS),

and genome architecture in deeper branches of the Heliconius phylogeny.

2.2.1 Phylogenetic analysis

We generated 20 de novo genome assemblies for species in both major Heliconius sub-

clades and three additional genera of Heliconiini. Here we align the sixteen highest

20 quality Heliconiini assemblies to two Heliconius reference genomes and seven other

Lepidoptera genomes, resulting in an alignment of 25 taxa [188]. De novo assem- bly provides superior sequence information for low complexity regions, allows for discovery of structural rearrangements, and improves alignment of evolutionarily distant clades [41]. Other studies in Heliconius have shown a high level of phyloge- netic discordance, arguably a result of rampant introgression [90, 91]. We attempted to reconstruct a bifurcating species tree by estimating relationships using protein- coding genes, conserved coding regions, and conserved non-coding regions. We generated phylogenies with coalescent-based and concatenation approaches, using both the full Lepidoptera alignment and a restricted, Heliconiini-only sub-alignment.

These topologies were largely congruent among analytical approaches, but weakly supported nodes were resolved inconsistently. These approaches therefore failed to resolve the phylogeny of Heliconius as a simple bifurcating tree (Fig. 2.2.1A, [43,

Fig. S20]).

To determine whether hybridization was a cause of the species tree uncertainty, we calculated Patterson’s D-statistics [137] for every triplet of the 13 Heliconius species, using a member of the sister genus, Eueides tales, as outgroup. In 201 of 286 triplets, we observed values significantly different from zero based on block- jackknifing, demonstrating strong evidence for introgression [43, Fig. S53]. However, these tests alone yield little quantitative information about admixture. We therefore used phyloNet [195] to infer reticulate phylogenetic networks of these species on the basis of random samples of one hundred 10 kb windows across the alignment. For each sample, we co-estimated all 100 regional gene trees and the overall species network in parallel [195]. To improve alignments, we analyzed the melpomene- silvaniform group with respect to the H. melpomene Hmel2.5 assembly [37] and

21 A B H. cydno melpomene clade H. timareta Silvaniform clade H. timareta H. melpomene erato clade H. melpomene H. melpomene (ref) sara clade H. pardalinus H. pardalinus Basal Heliconius H. besckei H. besckei H. numata H. numata H. doris H. doris H. erato .01 H. e. demophoon (ref) C H. erato H. erato x H. himera H. himera H. himera H. hecalesia H. hecalesia H. telesiphe

H. demeter H. telesiphe H. sara H. demeter Proportion of 10 kb Eueides tales windows recovering node Agraulis vanillae H. sara 0 .25 .5 .75 1 .01

Figure 2.2.1: Phylogeny and phylogenetic networks of Heliconius show lack of support for bifurcating tree. A. All nodes resolved in a majority of species trees are shown in this cladogram (heavy black lines), while the poorly resolved silvaniform clade is collapsed as a polytomy [43, Fig. S20]. The 500 colored trees were sampled from 10 kb non-overlapping windows and constructed with maximum likelihood. B, C. High-confidence tree structure (black) and introgression events (red) are shown as solid lines. Dashed red lines indicate weakly supported introgression events. Grey branch ends are cosmetic. The melpomene-silvaniform clade is shown in B, the erato-sara clade in C. Euclidean lengths of solid black lines are proportional to genetic distance along the branches. Scale bars in units of substitutions per site. Breaks at the base in B indicate that the branch leading to H. doris has been shortened for display.

22 the erato-sara group with respect to the H. erato demophoon v1 assembly [188].

Most species exhibited an admixture event at some point in their history using this method; we confirmed extensive reticulation among silvaniform species and discovered major gene flow events in the erato-sara clade. Based on these results, we propose the reticulate phylogenies in Fig. 2.2.1B-C.

2.2.2 Correlation of local ancestry with genome architecture

We next analyzed the distribution of tree topologies across the genome, again treat-

ing each major clade separately and using its respective reference genome. The

melpomene-silvaniform group lacked topological consensus, unsurprisingly since in-

trogression, especially of key mimicry loci, is well known from this clade [76]. The most common tree topology was found in only 4.3% of windows, with an additional

14 topologies appearing in 1.0-3.4% of windows [43, Fig. S19-S21]. By contrast, we here focus on the erato-sara group, where two topologies dominate (Fig. 2.2.2).

One (Tree 2, Fig. 2.2.2B) matched our bifurcating consensus topology (Fig. 2.2.1A) and a recently published tree [90], while the other (Tree 1) differs in that it places

H. hecalesia and H. telesiphe as sisters.

Regions with local topologies discordant from the species tree may have arisen

through introgression or ILS. In order to make within-topology locus-by-locus infer-

ences, we developed a statistical test to distinguish between ILS and introgression

based on the distribution of internal branch lengths among windows for a given

three-taxon subtree, conditional on its topology. We call this method Quantifying

Introgression via Branch Lengths (QuIBL). In the absence of introgression, we ex-

pect internal branch lengths of triplet topologies discordant with the species tree

(due to ILS) to be exponentially distributed. However, if introgression has occurred,

23 h eoe A. genome. the au ntetplf onri h ecnaeo l 0k idw htrcvrta topology. that in recover color that same windows the kb of 50 all topology of the percentage to the corresponds is histogram corner Each left top the in value iue222 oa vltoayhsoyi the in history evolutionary Local 2.2.2: Figure ooe ad ersn retplge fec 0k idw ooscrepn otetopologies the to correspond colors window; kb 50 each in of topologies tree represent bands Colored ftenme fcneuie5 bwnoswt httplg.Arw niaeln lcsin blocks long indicate Arrows topology. that with windows kb 50 consecutive inversions. of number the of A B ihbakrgossoigmsigdata. missing showing regions black with , Chr 1 Chr 2 Chr 3 Chr 4 Chr 5 Chr 6

ahbrrpeet hoooe ntrso the of terms in chromosome, a represents bar Each Chr 7 Chr 8 Chr 9 Chr 10 Chr 11 Chr 12 Chr 13 Chr 14 Chr 15 Chr 16 Chr 17 Chr 18 Chr 19 24 B. Chr 20

Chr 21 B h ih otcmo re r hw.The shown. are trees common most eight The erato-sara 1.2% 35.5% 2.0% 2.2% 4.8% 15.4% 34.8% 2.6% tel sar sar sar sar sar sar sar dem dem dem dem dem dem dem dem hsa hsa hsa tel tel sar tel tel hsa hsa hsa hsa hsa tel tel tel ld shtrgnosacross heterogeneous is clade Tree 7 Tree 8 Tree 6 Tree 5 Tree 4 Tree 3 Tree 2 Tree 1 era era era era era era era era B n hw h distribution the shows and , him him him him him him him him .erato H. C 400 100 200 100 200 300 100 150 100 100 150 200 400 100 200 300 600 150 25 50 75 Consecutive Windows 50 50 50 0 0 0 0 0 0 0 0 0.0 0.0 0 0 0 0 0 0 10 10 2.5 eeec [ reference 2.5 10 3 2 10 20 20 5.0 4 5.0 6 20 30 20 30 7.5 6 40 7.5 40 9 30 30 10 188 50 8 C. ]. their distribution should have that same exponential component, but also include an additional component with a non-zero mode corresponding to the time between the introgression event and the most recent common ancestor of all three species

[188]. Like other tree-based methods, QuIBL is potentially sensitive to the assump- tion that each tree is inferred from loci with limited internal recombination [43,

Fig. S75]. We therefore chose small (5 kb) windows to reduce the probability of intra-locus recombination breakpoints.

For every triplet in the erato-sara clade, we calculate the likelihood that the distribution of internal branch lengths is consistent with introgression or with ILS only. We formally distinguish between these two models using a BIC test with a strict cutoff of ∆BIC > 10. Consistent with our results from D-statistics, we find that 13 of 20 triplets have evidence for introgression [43, Table S13]. For example, using QuIBL on the triplet H. erato-H. hecalesia-H. telesiphe, we infer that 76% of discordant loci, or 38% of all loci genome-wide, are introgressed. Averaging over all triplets, we infer that 71% (67% with BIC filtering) of loci with discordant gene trees have a history of introgression, or 20% (19% with BIC filtering) of all triplet loci, indicating a broad signal of introgression throughout the clade ([43, Equation

7.7, Table S13]; see [188] for additional discussion).

In hybrid populations, individuals have genomic regions that originate from dif-

ferent species and may be incompatible with the recipient genome or with their

environment [32]. Linked selection causes harmless or even beneficial introgressed loci to be removed along with these deleterious loci if they are tightly linked; this ef- fect depends on the strength of selection and the local recombination rate [14, 166].

We therefore expect introgressed loci to be enriched in regions where selection is likely to be weak, such as gene deserts, or in regions of high recombination, where

25 harmless introgressed loci more readily recombine away from linked incompatibility loci.

In Heliconius, even distant species like H. erato and H. melpomene have the

same number of broadly collinear chromosomes [37], facilitating direct comparisons among species. Furthermore, each chromosome in Heliconius has approximately one crossover per chromosome per meiosis in males (there is no crossing over in female Heliconius)[36, 188]. Chromosomes vary in length, and chromosome size is inversely proportional to recombination rate per base pair [37, 113]. We found a strong correlation between the fraction of windows in each chromosome that show a given topology and physical chromosome length (Fig. 2.2.3A). Such relationships exist for all 8 trees in Fig. 2.2.2B[188], but we focus here on the two most common trees: Tree 1 has a strongly negative correlation with chromosome size (r2 = 0.883, t = 11.7, 18 d.f., p < 0.0001) while Tree 2 (concordant with our inferred species tree) has a positive correlation (r2 = 0.726, t = 6.9, 18 d.f., p < 0.0001). Results from QuIBL indicate that 94% of windows that recover a Tree 1 triplet topology are consistent with introgression [43, Fig. S70, Table S13]. The Z (sex) chromosome 21, is strongly enriched for Tree 2, suggesting it may harbor more incompatibility loci than autosomes. Interspecific hybrid females in Heliconius are often sterile, conform- ing to Haldane’s Rule, and sex chromosomes have been implicated as particularly important in generating incompatibilities [40, 77, 113, 127, 132, 189].

To test whether the pattern we observe among chromosomes is related to differ-

ences in recombination, we investigated the relationship between recombination rate

and tree topology within chromosomes. Recombination rate declines at the ends of

chromosomes [43, Fig. S85], and the species tree (Tree 2) is more abundant in those regions (Fig. 2.2.3B). In addition, when windows are grouped by local recombination

26 A B 9 2 5 8 14 16 4 3 15 11 21 7 6 20 17 12 18 1 19 1310 0.6

rees 0.2 0.4

0.2 action of T 0.1 F r action of Chromosome F r 0.0 0.0 15.0 17.5 20.0 22.5 5 10 15 20 25 30 35 40 45 50 Chromosome Length (Mb) Distance from End (percent of chromosome length) C D 0.8 1493 8221 11255 7505 2766 626 97 0.8 14232 649 1576 3200 5447 5999 860 * 0.6 0.6 rees rees

0.4 0.4 action of T action of T F r F r 0.2 0.2

0.0 0.0 0−1 1−2 2−3 3−4 4−5 5−6 >6 0−1 1−4 4−5 5−6 6−7 7−8 >8 Recombination Rate (cM/Mb) ln(Number of Coding Bases per 10kb Window)

Figure 2.2.3: Chromosomal architecture is strongly correlated with local topology. Tree 1 is shown in red, and Tree 2 is shown in blue, as in Fig. 2.2.2. A. Tree 1 shows a negative relationship with chromosome size, while Tree 2 shows a positive relationship. Lines are linear regressions with chromosome 21 excluded. Numbers along top indicate chromosome number. B. Each chromosome was divided into 10 equally sized bins, and the occupancy of each topology in each bin was calculated as the number of windows that recovered the topology in the bin divided by the number of windows that recovered the topology in the chromosome. C. Windows are binned by recombination rate, and boxes show the fraction of each tree in each bin for each chromosome separately. Numbers above boxes are the number of windows in each bin. D. Boxes show the relationship of tree topology with coding density. Asterisk denotes significance at 5% level (paired t-test, p < 0.025). In all boxplots, central line is median, box edges are first and third quartile, and whiskers extend to the largest value no further than 1.5×(inter-quartile range).

27 rate calculated from population genetic data [188], we observe a strong relationship with the recovered topology (Fig. 2.2.3C). Finally, we observe a minor enrichment of Tree 1 in regions of very low gene density, but this effect is weak (Fig. 2.2.3D) compared to that of recombination. Taken together, these results show that tighter linkage on longer chromosomes, and in lower recombination regions within chromo- somes leads to removal of more introgressed variation in those regions. This very strong correlation is consistent with a highly polygenic architecture of incompati- bilities between species.

2.2.3 Introgression of a convergent inversion

The topology block size distribution in the erato clade generally decayed exponen-

tially (Fig. 2.2.2C), but two unusually long blocks contained minor topologies: one on chromosome 2 (Tree 3, composed of three sub-blocks) and the other on chro- mosome 15 (Tree 4). Our study of the ∼3 Mb topology block on chromosome 2

confirms an earlier finding of an inversion in H. erato [37], and we show here that its rare topology is most likely explained by ILS including a long period of ancestral polymorphism [43, Fig. S95].

The topology block on chromosome 15 is of particular interest, as it spans cor- tex, a genetic hotspot of wing color pattern diversity in Lepidoptera [81, 126]. We hypothesized that this block could be an inversion, as in H. numata, where the P1

’supergene’ inversion polymorphism around cortex controls color pattern switching among mimicry morphs [75]. This block recovers H. telesiphe and H. hecalesia as a monophyletic subclade, which together are sister to the sara clade (Fig. 2.2.2B, Tree

4). We searched our de novo assemblies for contigs that mapped across topology transitions. Taking H. melpomene as the standard arrangement, we find clear inver-

28 sion breakpoints in H. telesiphe, H. hecalesia, H. sara, and H. demeter. Conversely,

H. erato, H. himera, and E. tales all contain contigs that map in their entirety across

the breakpoints (Fig. 2.2.4A), implying that they have the ancestral H. melpomene

arrangement.

This chromosome 15 inversion covers almost exactly the same region as the 400

kb P1 inversion in H. numata [75, 81, 82]. However, de novo contigs from our

H. numata assembly show that the breakpoints of P1 are close to but not identical to those of the inversion in the erato clade (Fig. 2.2.4A). Furthermore, in topologies for H. numata, H. telesiphe, H. erato, and E. tales across chromosome 15, not a single window recovered H. numata and H. telesiphe as a monophyletic subclade, as would be expected if the erato group inversion was homologous to P1 in H. numata.

We used QuIBL with the triplet (H. erato + H. telesiphe + H. sara) to elu- cidate the evolutionary history of this inversion. A small internal branch would suggest ILS while a large internal branch would be more consistent with introgres- sion (Fig. 2.2.4B). The average internal branch length in the inversion was much longer than the genome-wide average, corresponding to a 79% probability of intro- gression (Fig. 2.2.4C). If the inversion was polymorphic in the ancestral population for some time, we could also recover a similarly long internal branch (Fig. 2.2.4B, center). We distinguish between this longer-term polymorphic scenario and intro- gression by comparing the genetic distance (DXY ) between H. telesiphe and H. sara, represented by T3 in Fig. 2.2.4B. Normalized DXY (as in [43, Fig. S95]) within the inversion is ∼25% less than in the rest of the genome. Given that this is a large genomic block, introgression is therefore the most parsimonious explanation for the evolutionary history of the inversion (Fig. 2.2.4D) [157].

29 A Tree topologies Genes H. erato

E. tales

H. telesiphe

H. hecalesia

H. demeter H. sara H. numata

B simple ILS stably polymorphic ILS introgression

T2 T2 T2

T3 T3 T3

H. sara H. demeter H. telesiphe H. hecalesia H. himera H. erato H. sara H. demeter H. telesiphe H. hecalesia H. himera H. erato H. sara H. demeter H. telesiphe H. hecalesia H. himera H. erato C D 250

200 900

150 600 count count 100

300 50

0 0 0.000 0.005 0.010 0.015 0.020 0.75 1.00 1.25 1.50

Internal Branch Length (T2) Normalized H. telesiphe − H. sara Dxy (T3)

Figure 2.2.4: Parallel evolution of a major inversion at the cortex supergene locus. A. Map of 1.7 Mb region on chromosome 15. Coordinates are in terms of Hmel 2.5, and ticks are in Mb. Tree topology colors correspond to those in Fig. 2.2.2. Genes are shown as black rectangles; cortex is highlighted in yellow. Each line shows the mapping of a single contig. Aligned sections of each contig are shown as thick bars, while unaligned sections are shown as dotted lines. Arrows indicate the strand of the alignment. The H. erato group breakpoints are shown with red vertical lines, while the H. numata breakpoints are shown with green vertical lines. B. Evolutionary hypotheses consistent with the topology observed in this inversion in the context of the previously estimated phylogenetic network. The three species used in the triplet gene tree method – H. erato, H. telesiphe, and H. sara – are shown as black lines, while lineages not included are shown as grey lines. C. Histogram of internal branch lengths (T2) in windows with the topology H. erato,(H. telesiphe, H. sara). The inferred ILS distribution is shown as a dashed line, and the inferred introgression distribution is shown as a dotted line. The average internal branch length in the inversion is shown as a green vertical line. D. Histogram of normalized DXY (T3) between H. telesiphe and H. sara. Mean normalized DXY in the inversion is shown as a green vertical line. 30 2.3 Discussion

Species involved in rapid radiations are prone to hybridization due to frequent geo- graphical overlap with closely related taxa. In both melpomene and erato clades of

Heliconius, introgression has overwritten the original bifurcation history of several species across large swathes of the genome, a pattern also observed in Anopheles mosquitos [52]. This observation is also consistent with genomic analysis of other rapid radiations characterized by widespread hybridization and introgression, in- cluding Darwin’s finches [95] and African cichlids [118]. In other radiations, the role of introgression is less clear: in Tamias chipmunks, widespread introgression of mitochondrial DNA was identified, in contrast to an absence of evidence fornu- clear gene flow [59]. With few genomic comparisons available to date, it is perhaps too early to say whether introgression is a major feature of adaptive radiations in general, but evidence thus far suggests this to be the case.

Our results raise the question of why some genomic regions cross species bound- aries while others do not. In the erato clade, we find a strong correlation between recombination rate and introgression probability. Similar associations with topology also exist between sister species in the melpomene clade [111]. Associations between recombination and introgression in actively hybridizing populations of sword-tail fish

(Xiphophorus) and monkey flowersMimulus ( ) support the role of linked selection on a highly polygenic landscape of interspecific incompatibilities [18, 55, 166]. Our results establish that this relationship persists and may indeed be strengthened with time since introgression. While hybridization is ongoing, many introgressed blocks are constantly reintroduced into the population. If linked to weakly deleterious alleles, introgressed loci will be finally purged by linked selection only long after introgression ceases.

31 Recombination rate alone cannot account for differential introgression, so we must

delve into specific regions to elucidate their function and relevance to speciation. It

is critical, therefore, to have tools that can confidently identify introgressed loci,

and much effort has gone into developing such methods112 [ , 137]. Our test using internal branch lengths in triplet gene trees is based in coalescent theory and takes advantage of the discriminatory power of a property of gene trees not explicitly accounted for by other methods. QuIBL allows us to assess probability of intro- gression for each locus in each species triplet [188]. Here, we employ this method

to identify the evolutionary origin of a convergent inversion that has undergone

multiple independent introgression events, and to show that genomic regions with

discordant topologies arose mostly through hybridization. Just as sex aids adapta-

tion within species, occasional introgression and recombination among species can

have major long-term effects on the genome, contributing variation that could fuel

rapid adaptive divergence and radiation.

32 3 Molecular basis of hybrid sterility in Heliconius pardalinus geographic races

3.1 Abstract

Species divergence is marked by the emergence of genetic incompatibilities in hy- brids. Characterizing the underlying molecular mechanisms of such incompatibility phenotypes is therefore critical to understanding the processes of speciation and generation of biodiversity. Most research on the genetic basis of hybrid sterility

33 has been done in species of the model genus Drosophila which, in accordance with

Haldane’s rule, has male-biased hybrid inviabilty or sterility. In Lepidoptera, which have ZW/ZZ sex determination, it is the females that should be more affected by either sterility or inviability. Neotropical Heliconius butterflies are known to hy- bridize in nature, and typically show hybrid female sterility among some species and geographic races, conforming to Haldane’s rule. Heliconius pardalinus butleri and H. p. sergestus are parapatric in distribution, with H. p. sergestus restricted to the Huallaga valley in northern Peru and H. p. butleri distributed in the lowland rainforest east of the Andes. While they display no assortative mating in captivity, female F1 hybrids are sterile, producing no eggs. Through histological characteri- zation of hybrid ovaries, we find that the F1 female oocytes arrest after the oocyte is differentiated from nurse cells but before yolk deposition. By mating fertile male

F1 hybrids to parental population females, we generated a population of backcross females, for which we then described phenotypes, inferred genomic Quantitative

Trait Loci (QTL) and quantified gene expression as a function of oocyte develop- ment. This study represents the first in-depth characterization of hybrid sterility using modern genetic tools in Lepidoptera, and elucidates the genetic architecture of hybrid incompatibility in recently diverged populations.

3.2 Introduction

As populations diverge over time, they acquire ecological and genetic differences that lead to speciation, generating biodiversity in nature. Species identity is maintained in part due to the inability of interspecific parents to produce offspring that areas viable and fertile as intraspecific parents. In some cases, hybrid sterility or inviability is asymmetrical between sexes, and in such instances it is usually the heterogametic

34 sex that is affected - males in XX/XY systems and females in ZZ/ZW systems

[69]. This observation, known as Haldane’s rule, holds across a wide variety of taxa

[30, 69, 97, 146]. Its evolutionary basis has been theorized to be primarily due to dominance effects associated with the lack of a complementary sex chromosome in the heterogametic sex [131, 185], though faster evolution of males (in cases where males are sterile)[198] and faster evolution of the sex chromosome [160] have been proposed as explanations as well.

Haldane’s rule, and in particular the dominance theory, may be explained by the accumulation of Dobzhansky-Muller incompatibilities (DMIs) [10, 39, 124, 130].

Under the Dobzhansky-Muller model, diverging populations independently acquire novel alleles at loci across their genomes. Alleles derived in separate populations are not tested together, and when they are brought together in hybrids, they interact epistatically to reduce fitness [130]. In general, the derived alleles could be dominant or recessive in the hybrid background, but if at least one of the recessive interact- ing alleles is on a sex chromosome, it will show an effect only in the heterogametic sex. The widespread presence of DMIs across the genome have been inferred from genetic association data in a number of species (eg. [58, 165]), and recessive, nega- tive epistatic interactions in hybrids have been mapped in Drosophila species [147].

Genetic and molecular mechanisms of hybrid sterility have been identified in some cases [12, 20, 122, 162, 180], but this work has been primarily carried out in organ-

isms with XX/XY sex determination including Drosophila melanogaster and Mus

musculus, and therefore instances of Haldane’s rule have been those involving male

sterility and inviability (but see P-element incompatibilities in Drosophila females

[83, 161]).

Lepidoptera were among the groups of taxa that Haldane considered when initially

35 formulating his rule [69]. They have ZW/ZZ sex determination, so females are the heterogametic sex and are more susceptible to hybrid inviability or sterility. Studies in Lepidoptera have been critical in evaluating the relative effects of dominance and faster male evolution in producing Haldane’s rule [146], and also provided evidence that faster-Z evolution may contribute to the phenomenon in female heterogametic systems [160].

Heliconius is a genus of Neotropical butterflies with complex but well-studied

population structure and gene flow [76]. Many species have a number of geographic subspecies, each with a different colour pattern which mate where they overlap in narrow hybrid zones [23, 107]. Most genetic material passes freely across these hybrid zones, and only particular loci specifying color pattern switch genes remain differentiated [107, 188]. Meanwhile, sister or closely related species hybridize at

very low rates in sympatry. In these cases, color pattern alleles are often shared

via interspecific gene flow, allowing recipient species to acquire the adaptive local

color pattern [35, 123]. Such hybridization can affect the whole genome, giving rise to lineages that have a substantial portions of their genetic variability due to introgression [43, 91, 111].

Haldane’s rule has been observed in hybrids within and between species in Helico-

nius. Hybrid female sterility was shown between geographic races of H. melpomene,

in which hybrid daughters of Panamanian mothers and Guianan fathers lay small,

sterile eggs [77]. Interspecific hybrids of H. melpomene and its sister species H. cydno also display female sterility, with phenotypes ranging from full-size but sterile eggs to undeveloped ovaries [127]. Sterility in some, but not all, cross directions in these two cases was associated with the Z-chromosome through linkage with the Tpi locus.

In the case of H. melpomene - H. cydno hybridization, backcross sterility was also

36 associated with inheritance of the H. cydno Ac locus [119], now known to be the signalling protein WntA [110].

Here, we investigate the genetic and molecular basis of Haldane’s rule in two subspecies of Heliconius pardalinus: H. p. butleri and H. p. sergestus. These parap-

ateric subspecies are closely related but are strongly genetically differentiated, with

average genome-wide FST of approximately 0.28. They inhabit different environ- ments, with H. p. sergestus restricted to the dry forest of the Huallaga valley near

Tarapoto, San Martín Peru and H. p. butleri inhabiting lowland rainforest in the

neighboring Loreto department (Fig. 3.2.1A). Although individuals from these two subspecies mate freely in captivity, the F1 hybrid females are completely sterile. We first characterize the sterility phenotype in F1 females and in backcrosses produced from crosses between the fertile male F1s and parental population females. Next, we sequence RNA from the region of backcross ovaries identified to be abnormal in F1 individuals and quantify transcript expression differences between poorly-developed and well-developed individuals. We use a backcross population of 87 females to generate a QTL map, and combine the results from the QTL, RNA-seq, and genes known to be involved in Lepidopteran oogenesis from previous studies to identify a candidate gene potentially responsible for hybrid incompatibility in this system.

3.3 Materials and Methods

3.3.1 Butterfly Rearing

Butterflies were collected in the Peruvian departments of San Martin and Loreto,

and were housed and reared in insectaries in Tarapoto, Peru. Further details are

described in [159].

37 Figure 3.2.1: Distribution of H. pardalinus and crossing scheme A. Distribution of locality records for H. pardalinus. Yellow: H. p. sergestus, Red: all other H. pardalinus subspecies. Expanded region: Huallaga valley, San Martin, Peru B. Crossing scheme. Hybrid females between H. p. butleri and H. p. sergestus are sterile. Fertile F1 males were mated with H. p. butleri females to generate backcross broods of variable developmental phenotype. Data from [159]

38 3.3.2 Ovary Dissection

Female butterflies were collected from insectaries when they were 15 days old, al-

lowing time for eggs to fully develop [42, 127]. Wings were removed and stored in glassine envelopes as vouchers. Thorax and head were removed and stored in 96% alcohol for DNA extraction and processing. Approximately half of the ovaries were immediately dissected, and for the remainder, abdomens were stored in 96% ethanol and transferred to Cambridge, MA for fine dissection. In all cases, ovaries were dis- sected from the abdomen in ice-cold PBS using standard dissection tools. Tracheae and fat body were manually removed, and images were taken at 8X, 12.5X, and 20X magnification for phenotyping. Of the ovaries dissected in the field, six backcrosses and two pure H. pardalinus butleri were stored in RNALater solution (ThermoFisher

AM7020).

H. melpomene and H. cydno individuals were generously contributed by L. Gilbert.

These individuals were housed in greenhouses at Harvard University for less than

one week before dissection following the same protocol as above. One ovary from

each individual was stored in RNALater, and the other was fixed in 4% PFA in PBS

for 20 minutes, washed 3 times for 5 minutes each in PBS, and stored in PBS at

−20◦C.

3.3.3 Ovary Staining and phenotyping

Individual ovarioles from alcohol-stored ovaries were removed and rehydrated by 15

minute incubations in serial dilutions of ethanol in PBT. Once fully re-hydrated,

ovaries were incubated in acridine orange solution (ThermoFisher A1301; 5 µg/mL

in PBT) to visualize cytoplasm. They were then washed in PBT before being stained

with DAPI (1 µL/mL in PBT), washed once more in PBT, and mounted on slides

39 Score 0: absolutely nothing in the ovaries

Score 1: Some developing follicles in tips of ovarioles

Score 2: Clearly has developing oocytes in vitellogenesis stage

Score 3: Obviously enlarged yolk-filled (yel- low) oocytes. May or may not have fully developed eggs in oviduct.

Figure 3.3.1: Ovary score guide This scheme was used to characterize hybrid ovaries in terms of gross developmental phenotype [67].

with VectaShield (Vector Labs). Slides with stained ovarioles were scanned with a

Zeiss Axio Scan Z1, and high-magnification images were taken with a Zeiss LSM

880 upright confocal microscope. The most highly developed follicle was compared visually to stages outlined in [199].

Due to field storage conditions, we were unable to perform staining and determine

the latest developmental stage for each of our backcross females. We therefore

developed a scoring scheme based on gross ovary morphology, as demonstrated in

Fig. 3.3.1. Three images from each ovary were randomized and scored blindly by two independent scorers. The scoring scheme is illustrated in Fig. 3.3.1. The resulting six scores were averaged, producing one phenotype, dubbed “Morphology Score”.

Using the 42 individuals for which we confidently assigned a latest developmental stage and morphology score, we verified that the two metrics are highly correlated

40 3

2

phology Score

r Mo 1

0

2 4 6 8 Latest Stage

Figure 3.3.2: Comparison of scoring schemes Among individuals for whom we were able to score both latest developmental stage and morphology score, the values were highly correlated. Note that the ”Latest Stage” measurement is the latest stage observed in the sampled ovariole. Fully-developed eggs (stage 12[199]) may have been present in the oviduct but were not observed in ovarioles themselves.

41 (logistic regression, p = 2.45 × 10−11, Fig. 3.3.2).

3.3.4 DNA Extraction and Sequencing

Specimens were stored in DMSO salt solution at −20◦C. RNA-free genomic DNA

was extracted to a concentration of approximately 15ng/µL from thoracic tissue

using a Qiagen DNeasy Blood and Tissue Kit following the standard protocol pro-

vided by the manufacturer. Restriction site Associated DNA (RAD) libraries were

prepared (by K. Dasmahapatra) using a modified protocol from [46], using a PstI restriction enzyme, sixteen 6bp P1 barcodes and eight indexes. DNA was covaris sheared to 300-700bp and gel size selected. 128 individuals were sequenced per lane, with 125bp paired end reads, on an Illumina HiSeq 2500.

3.3.5 RNA Extraction and Sequencing

Ovaries stored in RNALater were further dissected into pre-vitellogenic follicles,

vitellogenic follicles, and choriogenic follicles + chorionated eggs. Each of these

subsets was processed separately. Tissue was blotted dry with kimWipes to remove

excess RNALater solution. It was then transferred to TRIZOL and homogenized

with the PRO200 tissue homogenizer (PRO Scientific). RNA was extracted with

the Direct-zol RNA miniprep kit (Zymo R2051). mRNA libraries were prepared by

the Harvard University Bauer Core with the KAPA mRNA HyperPrep kit, with

mean fragment insert sizes of 200-300bp. mRNA was sequenced with the NovaSeq

S2, producing an average of 49 million paired-end, 50 base pair reads (Table 3.3.1).

All reads were mapped to the H. melpomene v2.5 transcriptome [37] using kallisto

[19]. Approximately 70% of reads were mapped to the transcriptome per sample,

and that value did not differ between the H. pardalinus butleri samples and the

42 Table 3.3.1: RNA sequencing and mapping statistics.

Specimen population tissue reads (×106) pct Q≥30 mapped (×106) pct mapped NR15-459 backcross EGG 53.74 94.35 33.70 63 NR15-459 backcross TIP 55.64 94.64 38.68 70 NR15-459 backcross VIT 50.93 95.04 35.50 70 NR15-461 backcross EGG 42.73 94.86 29.07 68 NR15-461 backcross TIP 47.19 94.90 33.06 70 NR15-461 backcross VIT 52.75 94.46 34.29 65 NR15-465 backcross TIP 44.82 95.16 31.15 69 NR15-465 backcross VIT 39.70 93.90 25.80 65 NR15-474 backcross EGG 48.49 94.86 34.02 70 NR15-474 backcross TIP 39.77 95.45 28.54 72 NR15-474 backcross VIT 49.50 94.64 34.89 70 NR15-475 backcross TIP 53.07 94.84 37.04 70 NR15-475 backcross VIT 47.44 94.82 32.28 68 NR15-483 backcross TIP 46.67 94.80 32.21 69 NR15-483 backcross VIT 42.97 94.50 29.09 68 NR15-473 butleri EGG 46.92 94.64 32.00 68 NR15-473 butleri TIP 48.52 94.58 32.32 67 NR15-473 butleri VIT 50.60 95.11 34.67 69 NR15-488 butleri EGG 57.83 94.57 39.05 68 NR15-488 butleri TIP 55.19 94.50 37.71 68 NR15-488 butleri VIT 50.63 94.51 35.38 70 NE19-08 cydno EGG 50.02 93.24 37.20 74 NE19-08 cydno TIP 37.13 94.73 28.22 76 NE19-08 cydno VIT 56.74 95.15 42.88 76 NE19-10 cydno EGG 47.44 94.55 33.28 70 NE19-10 cydno TIP 51.48 94.89 39.10 76 NE19-10 cydno VIT 44.53 94.76 32.45 73 NE19-04 melpomene TIP 47.09 95.14 31.57 67 NE19-04 melpomene VIT 65.44 95.42 42.63 65 NE19-06 melpomene EGG 48.09 95.03 32.28 67 NE19-06 melpomene TIP 49.99 94.45 35.03 70 NE19-06 melpomene VIT 45.57 94.91 26.82 59

backcrosses (Table 3.3.1). Analysis was carried out in R using the Sleuth package

[145].

3.3.6 SNP calling

FastQ files from each RAD library were demultiplexed using process_radtags from

Stacks [27], amd BWA-MEM [101] was used with default parameters to map the reads to the H. melpomene genome v2.5 [37]. BAM files were then sorted and in- dexed with SAMtools [102], and Picard-tools v 1.119 (https://github.com/broadinstitute/picard) was used to add read group data and mark PCR duplicates. To check for incorrectly labelled samples, we estimated the sex of a sample by dividing the mean number of

43 reads per kilobase on the Z chromosome by the mean value for autosomes (custom script by John Davey). This returns a value close 1 in males and 0.7 in females, which can then be compared with the recorded sex of the sample. To further check for labelling errors, confirm pedigrees, and assign samples with unrecorded pedi- gree to families, we used Plink 1.9 [28] to estimate the fraction of the genome that is identical by descent between all pairwise combinations of samples (siblings and parent-offspring comparisons should yield values close to 0.5). In addition, forspec- imens that were sequenced multiple times, we checked that samples were in fact derived from the same individual. We then merged these samples, using the Merge-

SamFiles command from Picard-tools, and used Samtools’ mpileup command to call single nucleotide polymorphisms (SNPs).

3.3.7 Linkage map construction

A linkage map was built using Lep-MAP3 [155]. SNPs were first converted to pos- terior genotype likelihoods for each 10 possible SNP genotypes using custom scripts available at (https://sourceforge.net/p/lep-map3). We used the ParentCall2 mod- ule to correct erroneous or missing parental genotypes, and call sex-linked markers using a log-odds difference of > 2. We used Filtering2 to remove SNPs showing segregation distortion, specifying a P-value limit of 0.01, i.e. there is a 1:100 chance that a randomly segregating markers is discarded. We did not apply sex filter- ing, as all but one offspring were female. We then used SeparateChromosomes2 in Lep-MAP3 to cluster markers to linkage groups, specifying zero recombination in females and joining pairs of markers with LOD-score greater than 14. To ob- tain genetic distances between markers, we fixed the markers to their order inthe

H. melpomene genome v2.5, and then evaluated this order using OrderMarkers2,

44 specifying no recombination in females, and using parentally informative and dual informative markers. We then used map2gentypes.awk to convert the Lep-MAP3 output to 4-way fully informative genotypes with no missing data.

3.3.8 QTL mapping

A total of 86 backcross female ovaries were phenotyped and sequenced with a

reduced-representation, RAD-seq approach. These individuals belonged to 8 broods,

which were all analyzed jointly. Linkage maps were generated with 880 RAD-seq

markers mapped to the H. melpomene reference genome Hmel2.5 [37], yielding a map with a total length of 1106 cM.

We converted the 4-way fully informative genotypes to backcross markers, and used R/QTL [21] to test for associations between phenotype and genotype. We esti- mated genotype probabilities at 1cM intervals, using the Haldane mapping function and an assumed genotyping error rate of 1.0×10−4. We then ran a genome scan with a single QTL model, modelling phenotype as a binomial response (0=sterile, 1=fer- tile), using Haley-Knott regression, and treating family as an interactive covariate.

We estimated a single genome-wide significance LOD threshold by randomly per- muting the phenotypes and running the scan 1,000 times. We did not estimate a separate significance threshold for the sex chromosome because all phenotyped off- spring were female, hence there is no difference in the number of degrees of freedom of QTL models at autosomal loci and sex loci.

3.4 Results

In order to uncover the molecular and genetic basis of hybrid sterility between

Heliconius pardalinus butleri and H. p. sergestus, we generated backcross broods

45 by mating the fertile F1 hybrid males to both parental species (Fig. 3.2.1B). Both directions of this cross produced sterile F1 females with similar phenotypes, so we here focus on broods with H. p. butleri mothers, as we obtained much higher sample

sizes in this cross direction.

3.4.1 Phenotype characterization

We first aimed to characterize the sterility phenotype beyond the initial recognition

that hybrids did not lay eggs. We therefore dissected ovaries from parental, F1,

and backcross virgin females 15 days after eclosion, well after they were expected to

have reached full development [42, 119]. All individuals from parental populations contained developing follicles which had reached the final stages of vitellogenesis, and many had generated fully developed eggs. However, the ovaries of F1 hybrids appeared to be devoid of developing oocytes (Fig. 3.4.1A). Backcross individuals exhibited a variety of developmental phenotypes, spanning the full range between parental, apparently wild-type oocyte development and absence of visible developing follicles (Fig. 3.4.1A). In a subset of samples, we characterized the developmental stages of oocytes through nuclear staining with DAPI, with reference to stages de- scribed in the silkmoth Bombyx mori (Fig. 3.4.1B-G). This analysis revealed that

all F1 and backcross individuals did contain early-stage follicles, indicating that the

hybrid incompatibility set in when oocytes had reached approximately stage 3. This

stage marks a developmental timepoint after oocyte-nurse cell specification and fol-

licle formation, but before vitellogenesis and yolk deposition [25, 199]. Results from the morphology score recapitulated these trends with the full dataset 3.4.2.

46 Figure 3.4.1: Staging of developing oocytes A. Idealized developing follicle stages, from [199] B.. Widefield, light microscope images of ovaries from parental, backcross, and F1 individuals (rows labelled). C-G DAPI-stained single ovarioles dissected from the ovaries in B. Each row displays follicles from the same ovary. C. Overview scan of ovariole. D. Stage 2 follicles. E. Stage 3 follicles. F. Stage 4 follicles. G. Stage 5 follicles. Scale bar for images in C is illustrated below the C column. Scale bar for D-G is illustrated below the D column. Cells with an X represent stages not present in the illustrated ovariole.

47 Figure 3.4.2: Morphology score distribution] A. Representative images of score 3 ovaries. B. Representative images of score 2 ovaries. C. Representative images of score 1 ovaries. D. Representative images of score 0 ovaries. E. Distribution of morphology scores for parental, backcross, and F1 female ovaries. The backcross broods show phenotypes that span the full range from parental to F1. Binary phenotypes discussed below were generated by creating a cutoff at morphology score of 1.5.

3.4.2 Differential expression analysis

Because we localized the dysgenic phenotype to early stage, pre-vitellogenic oocytes

(Fig 3.4.1), we next quantified RNA expression differences among backcross indi- viduals at this stage and earlier. We dissected the pre-vitellogenic follicles from six backcross ovaries, two of which were assigned a morphology score of 0-1, two of 1-2, and two of 2-3. Because we micro-dissected pre-vitellogenic follicles to investigate the specific phenotype of developmental failure in early-stage oocytes, we further classified the phenotypes with a binary scheme: 0 for no vitellogenic follicles, 1for any vitellogenic follicles. In each case, tissue was dissected from all four ovarioles of a single ovary, and we acquired approximately 49 million reads per individual. After filtering our data for sequencing and mapping quality, we quantified expression of

16,774 transcripts.

48 We first performed a principal components analysis of our expression data. PC1,

which explains over 50% of variance, segregates the three morphology score cate-

gories in order (Fig. 3.4.3A). We performed a Wald test to evaluate the effect of change in expression of each transcript to the sterility phenotype in the backcrosses

[29]. After correcting for multiple comparisons, a total of 14%, or 2315 transcripts

showed significant effects of expression on binary phenotype (q < 0.05). Of these,

941 displayed a positive association with development, meaning that the transcript was expressed at a higher level in more highly developed ovaries. The remaining

1386 differentially expressed transcripts displayed a negative association with devel- opment. To narrow our list of candidate genes, we next intersected our differentially expressed transcripts with a list of genes known to be involved in Lepidoptera oo- genesis [26]. Through a reciprocal BLAST approach, we identified 1,771 transcripts

in the H. melpomene transcriptome that were orthologous with Pararge aegeria oo- genesis genes. As expected, these genes showed generally high expression levels in the sampled transcriptomes relative to other genes not implicated in oogenesis

(Fig. 3.4.3B. A total of 306 (17%) genes that were identified as involved in oogen- esis were also differentially expressed in backcrosses as a function of developmental phenotype.

3.4.3 QTL mapping

Finally, we searched for loci associated with hybrid sterility in ovary development

through Quantitative Trait Locus (QTL) mapping. We again focused on the binary

phenotype to allow for comparison with the RNA-sequencing data. Using an addi-

tive model, no region reached genome-wide significance at a 5% level (Fig. 3.4.4A).

With only 86 individuals, a locus must have an effect on the binary phenotype

49 Figure 3.4.3: Backcross differential expression A. Principal component analysis of the six backcross pre-vitellogenic follicles reveals that PC1 segregates individuals based on morphology score. Black: Score=1, Binary=0; Dark Blue: Score=2, Binary=1; Light Blue: Score=3, Binary=1. B. Scatter plot shows the mean expression of transcripts across all samples on the x-axis, and the fold change in expression between ovaries of binary score 0 and 1 on the y-axis. Color corresponds to p-value of differential expression. Diamonds are transcripts implicated in oogenesis in Pararge aegeria [26], and circles are all other transcripts. Density plot above shows the distribution of expression levels for transcripts implicated in oogenesis in P. aegeria [26]. Density plot on right shows the distribution of transcripts that were significantly differently expressed in the two developmental conditions.

50 of at least 0.67 and explain 30.6% of phenotypic variance to be detected at this genome–wide significance level. However, alleles that do not reach genome-wide significance may still contribute to observed phenotypes. We therefore searched for overlap between the transcripts that were differentially expressed between the backcross ovaries, have been implicated in Lepidopteran oogenesis, and were in the top 1% of QTL LOD values. This set included only 5 transcripts, which were associated with genes HMEL007053, HMEL037521,HMEL007056, HMEL011314, and HMEL037500. Only one of these, HMEL011314, showed expression differ- ences over 2-fold (Fig. 3.4.3B, Fig. 3.4.4B). This transcript is orthologous to the D. melanogaster gene trailer hitch (tral), a conserved protein involved in mRNA local-

ization and protein trafficking [197]. This transcript is labelled in Fig. 3.4.3B, and is located on chromosome 7, within the approximate Bayesian credible interval for the major QTL peak on that chromosome (Fig. 3.4.4C). That peak has an effect on the binary phenotype of 0.467, and explains approximately 23.4% of the phenotypic variance (Fig. 3.4.4D).

3.5 Discussion

Here, we show that hybrid females between H. pardalinus butleri and H. p. sergestus

are sterile due to dysgenesis in oocyte development. We find that the transition

from stage 3 to stage 4 is a critical moment, at which which most F1 hybrid oocytes

appear to arrest. Through targeted RNA sequencing and QTL mapping, we identify

a candidate gene, tral, whose expression level correlates strongly with developmental phenotype in backcross ovaries.

51 Figure 3.4.4: QTL map A QTL LOD scores for the binary scoring scheme. B Expression levels of tral based on binary score. Individuals with score 0 have significantly lower expression. C High resolution of QTL peak on chromosome 7. Points represent LOD scores for each marker. Yellow: approximate Bayesian credible interval for QTL peak. Dotted line: location of tral. D Mean phenotypes for individuals from the largest brood based on their chr7 QTL genotype. Heterozygotes have significantly lower scores.

52 11.0

10.8

10.6 adjusted ln counts

10.4

10.2 H. melpomene H. cydno H. pardalinus butleri

Figure 3.5.1: Heliconius tral expression The tral transcript is equally abundant in pre- vitellogenic follicles of H. melpomene, H. cydno, and H. p. butleri. The expression level is comparable to that in the backcross ovaries with binary score 1 (Fig. 3.4.4B)

3.5.1 role and conservation of tral in Lepidopteran oogenesis

When comparing expression levels in non-hybrid Heliconius species sequenced for this study (H. pardalinus, H. melpomene, and H. cydno), there was no significant difference in tral expression in the pre-vitellogenic follicles (Fig. 3.5.1). Ovarian gene expression has also been characterized in three additional Lepidopteran species:

Bombyx mori [201], Pararge aegeria [26], and Plutella xytostella [141]. In each of these, the tral ortholog was found to be highly expressed, consistent with a conserved role for tral in oogenesis.

53 More in-depth molecular characterization of tral has been performed in Drosophila

melanogaster, revealing two potential mechanisms for the sterility phenotype ob-

served here. First, in D. melanogaster, under-expression of tral leads to defects in

the localization of mRNAs critical for dorsal-ventral patterning in the egg, including

gurken (grk)[173], as well as the secretion of their associated proteins [197]. This process occurs during stages 8-9 of Drosophila oogenesis, homologous to Bombyx stages 3-4, consistent with the hypothesis that H. pardalinus hybrid sterility is re-

lated to failures in dorsal-ventral patterning due to tral underexpression [114, 199].

An alternative mechanism for tral-mediated female sterility is through disruption of the transposon silencing pathway. Derepression of transposable elements due to incompatible piRNA pathway proteins can cause hybrid sterility in Drosophila, and tral is known to interact with those proteins [83, 103, see below].

3.5.2 comparison to Drosophila hybrid incompatibility

The fruit fly Drosophila melanogaster and it relatives have long been used as a model to study developmental phenomena, including the genetic and molecular ba- sis of hybrid sterility. In this system, classical Dobzhansky-Muller incompatibilities have been identified and characterized [12, 20, 180]. Because D. melanogaster has

X-Y sex determination, in accordance with Haldane’s rule, when one sex is sterile it is most often males [69]. However, hybrid female dysgenesis has been observed as well, notably in so-called P-M hybrids, in which oogenesis arrests at a very early stage[17, 85, 161]. This phenotype has been described as a loss of control of trans- posable elements which are typically suppressed through the Drosophila piRNA

pathway [47, 83]. Superficially, the Heliconius sterility phenotype described in this study closely parallels the Drosophila case. However, it is not clear whether the

54 Figure 3.5.2: mRNA expression of piRNA-associated proteins Expression levels for vasa, AGO2/3, and piwi/aubergine are shown. In each case, abundance is lower in undeveloped ovaries, but only significantly so for vasa.

mechanism is the same between systems. Heliconius researchers have previously

explicitly tested the hypothesis that transposon silencing through the piRNA path-

way is mis-regulated in sterile female hybrids between H. melpomene and H. cydno.

They found that a subset of transposable elements were derepressed in F1 hybrids,

but found no evidence that piRNAs themselves or three proteins involved in the

piRNA pathway were misexpressed [142]. In H. pardalinus low fertility score female hybrids, these three proteins (piwi/aubergine,AGO2/3, and vasa) are expressed at lower levels than in well-developed individuals, though only vasa expression dif- ferences were significant (Fig. 3.5.2). However, as discussed above, our candidate gene tral has also been implicated in transposon silencing by forming a complex with piRNA proteins which inhibits transposition [103]. The transposon derepres- sion mechanism is therefore plausible in this system, and more work must be done to test the hypothesis that Heliconius female hybrid sterility arises from a process distinct from that in Drosophila.

55 3.5.3 evolution of incompatibilities in Heliconius pardalinus

Upon sequencing the H. melpomene genome in 2012, the Heliconius Genome Con- sortium discovered that H. pardalinus was rendered paraphyletic, as H. p. butleri was more closely related genetically to H. elevatus than it was to H. p. sergestus [35].

More fine-scale analysis revealed that the ancestry of several genomic regions reca- pitulated traditional species relationships, in which the two H. pardalinus subspecies were monophyletic [92]. This data is consistent with a model in which H. p. butleri and H. elevatus exchanged genes through hybridization in secondary contact after speciation, to the exclusion of H. p. sergestus [92]. Crosses between H. p. butleri and H. elevatus are fully fertile, but crosses between H. p. sergestus and H. elevatus are sterile, raising questions about the orgin and timing of sterility between the two H. pardalinus subspecies studied here. Sterility could have evolved in the short period in which H. p. sergestus was isolated from the Amazonian populations, or it could have evolved after the initial speciation event between H. pardalinus and

H. elevatus, and then lost in H. p. butleri after gene flow in the Amazon. The answer to this question will be informative as to the stability of nascent species boundaries and the speed of formation of hybrid incompatibilities.

56 4 Conclusion

Species are locally adapted to the particular environment that they inhabit, and mating with individuals of another species is likely to produce offspring that are intermediate in character, and whose traits are not as finely tuned as their parents

[3, 8]. In keeping with this intuition, early-generation hybrids generally suffer se- vere fitness consequences: they are often sterile, inviable, or maladapted totheir environment [32]. This is the case for several Heliconius butterfly species, including

H. pardalinus geographic races, detailed in Chapter 3, in which hybrid females are

57 sterile. H. p. butleri and H. p. sergestus are parapatric in distribution, separated by a 1000-meter high cordillera, such that they very rarely come into contact [159].

They inhabit quite different environments, with H. p. butleri in lowland rainforest

and H. p. sergestus in dry forests. However, the observed incompatibility is not

one of maladaptation to the environment, but instead the intrinsic phenotype that

hybrid females lay no eggs. Because the two populations are quite closely related

to each other, it is likely that the incompatibility is driven by relatively few loci,

which make it an ideal system to investigate which pathway is the first to fail in in-

terpopulation compatibility. This system is also powerful because Lepidoptera have

ZZ/ZW sex determination. As of yet, no researchers have identified a molecular

basis of hybrid sterility in such a system, though some have been studied in detail

in the XX/XY genera Drosophila and Mus [12, 20, 122, 180].

Our identification of trailer hitch (tral) as a candidate gene may shed light on

the parallelisms between these categories of hybrids. More work must be done to

confirm that tral underexpression is the causative factor in hybrid sterility, and to

understand the mechanism by which it works. However, if tral is in fact responsible, there are two theories, laid out in Chapter 3, of how it might lead to the observed phenotype. One of these, misregulation of the transposon silencing pathway, would be analogous to a mechanism of female hybrid sterility in Drosophila [17, 84, 85].

Interestingly, though the X-chromosome is often implicated in hybrid sterility and both the X and Z chromosomes consistently show signals of reduced introgression, tral is located on an autosome [52, 58, 113, 147]. However, preliminary analysis of

epistatic interactions in our QTL study may implicate loci on the Z chromosome,

and further research is needed to clarify this point.

Given the lowered fitness experienced by early-generation hybrids, hybridization

58 seems unlikely to play an important, positive role in adaptation. It is puzzling, from that perspective, to observe widespread evidence of introgression in representative genomes from a major Heliconius clade, as was identified in Chapter 2. One par- ticular region captured a color pattern switch locus. We were able to leverage our de novo genome assembly strategy to show that the region was in fact an inversion, convergent in location but independent from an inversion identified in H. numata and H. pardalinus, members of the other major Heliconius clade [75, 81, 82]. I helped to develop and use a new statistical method to show that this region had a high likelihood of having a history of introgression, and further applied the method to show that across all species comparisons and all genomic windows, approximately one in five loci arose via interspecies gene flow.

This high value can, however, be reconciled with the idea that hybrids have gener- ally low fitness, and incompatible loci are relatively common. When examining the distribution of introgressed loci across the genome, two striking patterns emerged: larger chromosomes have much less introgression than smaller chromosomes, and regions of the genome with high recombination rate have a higher introgression fraction that those with low recombination rates. These two patterns can be par- tially explained by the same process. Because Heliconius have approximately one

crossover per chromosome per meiosis, larger chromosomes have a lower per-base

recombination rate. Consistent with previous explanations, the correlations with

recombination rate could therefore be due to the higher probability of neutral or

beneficial introgressed alleles becoming unlinked from deleterious alleles in highre-

combination regions [18, 86, 125, 166]. In a project outside the scope of this thesis, but discussed somewhat in Chapter 1, I collaborated with theoreticians to show that an alternative mechanism is also at play and may be more important [190].

59 The efficiency of selection against deleterious introgressed loci is proportional tothe variance of hybrid ancestry in the population. Introgressed haplotypes in regions of low recombination are inherited as large blocks, while those in areas of high recom- bination are broken into smaller loci and distributed more evenly in the population.

Selection will act to quickly remove the large blocks in variable regions but will be less efficient at removing the small, dispersed blocks, retaining more introgressed ancestry in genomic regions with high recombination [190].

This pattern of genome-wide introgression in Heliconius is far from unique. As

I show in Chapter 1, gene flow involving at least a subset of taxa is the normfor

clade-wide genomic studies (e.g. [53, 61, 95, 134]. Analogous to our Heliconius case, studies involving Anopholes mosquitos found that the majority of the genome shows an evolutionary relationship reflecting a history of hybridization, not the historical species tree [52]. The current difficulty of assessing the overall importance ofin- trogression does not lie in determining the prevalence of introgressed loci, but in translating that level into adaptive impact. In many cases, the introgressed vari- ation has either been directly shown to be beneficial to the recipient species, or the frequency of introgressed alleles in the recipient population is more consistent with positive selection than neutral evolution (e.g. [49, 79, 128, 175, 176]. However, each case is unique. High values of introgression may be indicative of hybridization between two very closely related species, from a large population into a small popu- lation, or between species with high aggregate recombination rates [1, 70, 86, 190].

The demographic histories of hybridizing populations help determine the number of incompatible alleles and the evolutionary trajectory of introgressed loci, and this must be determined in each case separately. There are, however, some general pat- terns of when hybridization may be especially important. These include times of

60 extreme environmental change, such as during adaptive radiations, range expan- sions, or invasions [73, 109, 143].

In the long time scale of evolution, extremely unlikely events are almost guaran-

teed to happen. If populations are in close proximity and are similar enough on a

genetic level to produce at least some fertile offspring, mounting evidence suggests

that they will do so. Early on, the hybrids may suffer severe fitness consequences,

and classical outcomes such as reinforcement of mating barriers may result, solid-

ifying the separation of the species. However, even a low level of gene flow may

contribute significant genetic variation from a source population into the recipient

and, depending on the factors detailed above, may drastically impact the ability of

those individuals to adapt to their environments. Each of these relationships with

hybridization plays a role in how populations evolve, and each is therefore important

to consider in forming intuition and models for the general process of evolution.

61 References

[1] S. Aeschbacher, J. P. Selby, J. H. Willis, and G. Coop. Population-genomic inference of the strength and timing of selection against gene flow. Proceedings of the National Academy of Sciences, 114(27):7061–7066, 2017.

[2] F. W. Allendorf, G. Luikart, and S. N. Aitken. Conservation and the genetics of populations. mammalia, 2007(2007):189–197, 2007.

[3] Anderson. Hybridization of the habitat. Evolution, 2(1):1–9, 1948.

[4] E. Anderson. Recombination in species crosses. Genetics, 24(5):668, 1939.

[5] M. L. Arnold. Evolution through genetic exchange. Oxford University Press, 2006.

[6] N. Barton. Multilocus clines. Evolution, 37(3):454–471, 1983.

[7] N. Barton and B. O. Bengtsson. The barrier to genetic exchange between hybridising populations. Heredity, 57(3):357–376, 1986.

[8] N. H. Barton. The role of hybridization in evolution. Molecular ecology, 10 (3):551–568, 2001.

[9] T. Bataillon and M. Kirkpatrick. Inbreeding depression due to mildly delete- rious mutations in finite populations: size does matter. Genetics Research, 75 (1):75–81, 2000.

[10] W. Bateson. Heredity and variation in modern lights. Darwin and modern science, 1909.

[11] R. A. Bay, E. B. Taylor, and D. Schluter. Parallel introgression and selection on introduced alleles in a native species. Molecular ecology, 28(11):2802–2813, 2019.

[12] J. J. Bayes and H. S. Malik. Altered heterochromatin binding by a hybrid sterility protein in Drosophila sibling species. Science, 326(5959):1538–1541, 2009.

62 [13] E. A. Beck, A. C. Thompson, J. Sharbrough, E. Brud, and A. Llopart. Gene flow between Drosophila yakuba and Drosophila santomea in subunit V of cy- tochrome c oxidase: A potential case of cytonuclear cointrogression. Evolution, 69(8):1973–1986, 2015. [14] D. J. Begun and C. F. Aquadro. Levels of naturally occurring DNA poly- morphism correlate with recombination rates in D. melanogaster. Nature, 356 (6369):519–520, 1992. [15] W. W. Benson. Natural selection for Müllerian mimicry in in . Science, 176(4037):936–939, 1972. [16] J. J. Berg and G. Coop. A population genetic signal of polygenic adaptation. PLoS genetics, 10(8), 2014. [17] P. M. Bingham, M. G. Kidwell, and G. M. Rubin. The molecular basis of PM hybrid dysgenesis: the role of the P element, a P-strain-specific transposon family. Cell, 29(3):995–1004, 1982. [18] Y. Brandvain, A. M. Kenney, L. Flagel, G. Coop, and A. L. Sweigart. Specia- tion and introgression between Mimulus nasutus and Mimulus guttatus. PLoS genetics, 10(6), 2014. [19] N. L. Bray, H. Pimentel, P. Melsted, and L. Pachter. Near-optimal proba- bilistic RNA-seq quantification. Nature biotechnology, 34(5):525–527, 2016. [20] N. J. Brideau, H. A. Flores, J. Wang, S. Maheshwari, X. Wang, and D. A. Barbash. Two Dobzhansky-Muller genes interact to cause hybrid lethality in Drosophila. science, 314(5803):1292–1295, 2006. [21] K. W. Broman, H. Wu, Ś. Sen, and G. A. Churchill. R/qtl: QTL mapping in experimental crosses. Bioinformatics, 19(7):889–890, 2003. [22] A. V. Brower and I. J. Garzón-Orduña. Missing data, clade support and “reticulation”: the molecular systematics of Heliconius and related genera (Lepidoptera: ) re-examined. Cladistics, 34(2):151–166, 2018. [23] K. S. Brown Jr. The biology of Heliconius and related genera. Annual Review of Entomology, 26(1):427–457, 1981. [24] H. Brunschwig, L. Levi, E. Ben-David, R. W. Williams, B. Yakir, and S. Shif- man. Fine-scale maps of recombination rates and hotspots in the mouse genome. Genetics, 191(3):757–764, 2012. [25] J. Büning. The insect ovary: ultrastructure, previtellogenic growth and evolu- tion. Springer Science & Business Media, 1994.

63 [26] J.-M. Carter, S. C. Baker, R. Pink, D. R. Carter, A. Collins, J. Tomlin, M. Gibbs, and C. J. Breuker. Unscrambling butterfly oogenesis. BMC ge- nomics, 14(1):283, 2013.

[27] J. Catchen, P. A. Hohenlohe, S. Bassham, A. Amores, and W. A. Cresko. Stacks: an analysis tool set for population genomics. Molecular ecology, 22 (11):3124–3140, 2013.

[28] C. C. Chang, C. C. Chow, L. C. Tellier, S. Vattikuti, S. M. Purcell, and J. J. Lee. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience, 4(1):s13742–015, 2015.

[29] Z. Chen, J. Liu, H. K. T. Ng, S. Nadarajah, H. L. Kaufman, J. Y. Yang, and Y. Deng. Statistical methods on detecting differentially expressed genes for RNA-seq data. BMC systems biology, 5(S3):S1, 2011.

[30] J. A. Coyne. Genetics and speciation. Nature, 355(6360):511–515, 1992.

[31] J. A. Coyne and H. A. Orr. Patterns of speciation in Drosophila. Evolution, 43(2):362–381, 1989.

[32] J. A. Coyne and H. A. Orr. Speciation. Sinauer Associates Incorporated, Jan. 2004.

[33] J. F. Crow. Alternative hypotheses of hybrid vigor. Genetics, 33(5):477, 1948.

[34] T. E. Cruickshank and M. W. Hahn. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Molecular ecology, 23(13):3133–3157, 2014.

[35] K. K. Dasmahapatra, J. R. Walters, A. D. Briscoe, J. W. Davey, A. Whibley, N. J. Nadeau, A. V. Zimin, D. S. Hughes, L. C. Ferguson, S. H. Martin, et al. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature, 487(7405):94, 2012.

[36] J. W. Davey, M. Chouteau, S. L. Barker, L. Maroja, S. W. Baxter, F. Simp- son, R. M. Merrill, M. Joron, J. Mallet, K. K. Dasmahapatra, et al. Major improvements to the Heliconius melpomene genome assembly used to confirm 10 chromosome fusion events in 6 million years of butterfly evolution. G3: Genes, Genomes, Genetics, 6(3):695–708, 2016.

[37] J. W. Davey, S. L. Barker, P. M. Rastas, A. Pinharanda, S. H. Martin, R. Durbin, W. O. McMillan, R. M. Merrill, and C. D. Jiggins. No evidence for maintenance of a sympatric Heliconius species barrier by chromosomal inversions. Evolution Letters, 1(3):138–154, 2017.

64 [38] M. DeGiorgio, C. D. Huber, M. J. Hubisz, I. Hellmann, and R. Nielsen. SweepFinder2: increased sensitivity, robustness and flexibility. Bioinformat- ics, 32(12):1895–1897, 2016.

[39] T. Dobzhansky. Studies on hybrid sterility. II. Localization of sterility factors in Drosophila pseudoobscura hybrids. Genetics, 21(2):113, 1936.

[40] T. Dobzhansky. Genetics and the Origin of Species. Number 11. Columbia university press, 1937.

[41] Drosophila 12 Genomes Consortium. Evolution of genes and genomes on the Drosophila phylogeny. Nature, 450(7167):203, 2007.

[42] H. Dunlap-Pianka, C. L. Boggs, and L. E. Gilbert. Ovarian dynamics in heliconiine butterflies: programmed senescence versus eternal youth. Science, 197(4302):487–490, 1977.

[43] N. B. Edelman, P. B. Frandsen, M. Miyagi, B. Clavijo, J. Davey, R. B. Dikow, G. García-Accinelli, S. M. Van Belleghem, N. Patterson, D. E. Neafsey, et al. Genomic architecture and introgression shape a butterfly radiation. Science, 366(6465):594–599, 2019.

[44] S. Edwards and P. Beerli. Perspective: gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies. Evolution, 54(6):1839–1854, 2000.

[45] H. Ellegren. Genome sequencing and population genomics in non-model or- ganisms. Trends in ecology & evolution, 29(1):51–63, 2014.

[46] P. D. Etter, J. L. Preston, S. Bassham, W. A. Cresko, and E. A. Johnson. Local de novo assembly of RAD paired-end contigs using short sequencing reads. PloS one, 6(4), 2011.

[47] M. B. Evgen’ev, H. Zelentsova, N. Shostak, M. Kozitsina, V. Barskyi, D.-H. Lankenau, and V. G. Corces. Penelope, a new family of transposable elements and its possible role in hybrid dysgenesis in Drosophila virilis. Proceedings of the National Academy of Sciences, 94(1):196–201, 1997.

[48] R. A. Fisher. The genetical theory of natural selection. The Clarendon Press, 1930.

[49] B. M. Fitzpatrick, J. R. Johnson, D. K. Kump, J. J. Smith, S. R. Voss, and H. B. Shaffer. Rapid spread of invasive genes into a threatened native species. Proceedings of the National Academy of Sciences, 107(8):3606–3610, 2010.

65 [50] S. W. Fitzpatrick, G. S. Bradburd, C. T. Kremer, P. E. Salerno, L. M. An- geloni, and W. C. Funk. Genomic and fitness consequences of genetic rescue in wild populations. Current Biology, 2020.

[51] T. Flouri, X. Jiao, B. Rannala, and Z. Yang. A Bayesian implementation of the multispecies coalescent model with introgression for phylogenomic analysis. Mol. Biol. Evol, 2019.

[52] M. C. Fontaine, J. B. Pease, A. Steele, R. M. Waterhouse, D. E. Neafsey, I. V. Sharakhov, X. Jiang, A. B. Hall, F. Catteruccia, E. Kakani, et al. Extensive introgression in a malaria vector species complex revealed by phylogenomics. Science, 347(6217):1258524, 2015.

[53] A. D. Foote, M. D. Martin, M. Louis, G. Pacheco, K. M. Robertson, M.-H. S. Sinding, A. R. Amaral, R. W. Baird, C. S. Baker, L. Ballance, et al. Killer whale genomes reveal a complex history of recurrent admixture and vicariance. Molecular ecology, 28(14):3427–3444, 2019.

[54] R. Frankham, J. D. Ballou, M. D. Eldridge, R. C. Lacy, K. Ralls, M. R. Du- dash, and C. B. Fenster. Predicting the probability of outbreeding depression. Conservation Biology, 25(3):465–475, 2011.

[55] H. F. Gante, M. Matschiner, M. Malmstrøm, K. S. Jakobsen, S. Jentoft, and W. Salzburger. Genomics of speciation and introgression in Princess cichlid fishes from Lake Tanganyika. Molecular ecology, 25(24):6143–6161, 2016.

[56] Z. Gompert and C. A. Buerkle. Bayesian estimation of genomic clines. Molec- ular Ecology, 20(10):2111–2127, 2011.

[57] Z. Gompert, L. K. Lucas, C. C. Nice, and C. A. Buerkle. Genome divergence and the genetic architecture of barriers to gene flow between Lycaeides idas and L. melissa. Evolution, 67(9):2498–2514, 2013.

[58] J. M. Good, M. D. Dean, and M. W. Nachman. A complex genetic basis to X-linked hybrid male sterility between two species of house mice. Genetics, 179(4):2213–2228, 2008.

[59] J. M. Good, D. Vanderpool, S. Keeble, and K. Bi. Negligible nuclear in- trogression despite complete mitochondrial capture between two species of chipmunks. Evolution, 69(8):1961–1972, 2015.

[60] M. Goodman, J. Czelusniak, G. W. Moore, A. E. Romero-Herrera, and G. Matsuda. Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences. Sys- tematic Biology, 28(2):132–163, 1979.

66 [61] S. Gopalakrishnan, M.-H. S. Sinding, J. Ramos-Madrigal, J. Niemann, J. A. S. Castruita, F. G. Vieira, C. Carøe, M. de Manuel Montero, L. Kuderna, A. Ser- res, et al. Interspecific gene flow shaped the evolution of the genus Canis. Current Biology, 28(21):3441–3449, 2018.

[62] S. Gourbiere and J. Mallet. Are species real? the shape of the species boundary with exponential failure, reinforcement, and the “missing snowball”. Evolution: International Journal of Organic Evolution, 64(1):1–24, 2010.

[63] P. R. Grant and B. R. Grant. Hybridization increases population variation during adaptive radiation. Proceedings of the National Academy of Sciences, 116(46):23216–23224, 2019.

[64] R. E. Green, J. Krause, A. W. Briggs, T. Maricic, U. Stenzel, M. Kircher, N. Patterson, H. Li, W. Zhai, M. H.-Y. Fritz, et al. A draft sequence of the Neandertal genome. science, 328(5979):710–722, 2010.

[65] R. C. Griffiths. The two–locus ancestral graph. Lecture Notes-Monograph Series, pages 100–117, 1991.

[66] R. C. Griffiths and P. Marjoram. Ancestral inference from samples ofDNA sequences with recombination. Journal of Computational Biology, 3(4):479– 502, 1996.

[67] P. J. Gullan and P. S. Cranston. The : an outline of entomology. John Wiley & Sons, 2014.

[68] Q. Haenel, T. G. Laurentino, M. Roesti, and D. Berner. Meta-analysis of chromosome-scale crossover rate variation in eukaryotes and its significance to evolutionary genomics. Molecular ecology, 27(11):2477–2497, 2018.

[69] J. Haldane. Sex ratio and unisexual sterility in hybrid animals. Journal of genetics, 12(2):101–109, 1922.

[70] K. Harris and R. Nielsen. The genetic cost of Neanderthal introgression. Genetics, 203(2):881–891, 2016.

[71] D. L. Hartl, A. G. Clark, and A. G. Clark. Principles of population genetics, volume 116. Sinauer associates Sunderland, MA, 1997.

[72] J. T. Hogg, S. H. Forbes, B. M. Steele, and G. Luikart. Genetic rescue of an insular population of large mammals. Proceedings of the Royal Society B: Biological Sciences, 273(1593):1491–1499, 2006.

67 [73] S. M. Hovick and K. D. Whitney. Hybridisation is associated with increased fe- cundity and size in invasive taxa: meta-analytic support for the hybridisation- invasion hypothesis. Ecology letters, 17(11):1464–1477, 2014.

[74] R. R. Hudson. Properties of a neutral allele model with intragenic recombi- nation. Theoretical population biology, 23(2):183–201, 1983.

[75] P. Jay, A. Whibley, L. Frézal, M. Á. R. de Cara, R. W. Nowell, J. Mallet, K. K. Dasmahapatra, and M. Joron. Supergene evolution triggered by the introgression of a chromosomal inversion. Current Biology, 28(11):1839–1845, 2018.

[76] C. D. Jiggins. The ecology and evolution of Heliconius butterflies. Oxford University Press, 2017.

[77] C. D. Jiggins, M. Linares, R. E. Naisbit, C. Salazar, Z. H. Yang, and J. Mallet. Sex-linked hybrid sterility in a butterfly. Evolution, 55(8):1631–1638, 2001.

[78] W. E. Johnson, D. P. Onorato, M. E. Roelke, E. D. Land, M. Cunningham, R. C. Belden, R. McBride, D. Jansen, M. Lotz, D. Shindle, et al. Genetic restoration of the Florida panther. Science, 329(5999):1641–1645, 2010.

[79] M. Jones, L. Mills, P. Alves, C. Callahan, J. Alves, D. Lafferty, F. Jiggins, J. Jensen, J. Melo-Ferreira, and J. Good. Adaptive introgression underlies polymorphic seasonal camouflage in snowshoe hares. Science, 360(6395):1355– 1358, 2018.

[80] M. R. Jones, L. S. Mills, J. D. Jensen, and J. M. Good. The origin and spread of locally adaptive seasonal camouflage in snowshoe hares. bioRxiv, page 847616, 2019.

[81] M. Joron, R. Papa, M. Beltrán, N. Chamberlain, J. Mavárez, S. Baxter, M. Abanto, E. Bermingham, S. J. Humphray, J. Rogers, et al. A conserved su- pergene locus controls colour pattern diversity in Heliconius butterflies. PLoS Biol, 4(10):e303, 2006.

[82] M. Joron, L. Frezal, R. T. Jones, N. L. Chamberlain, S. F. Lee, C. R. Haag, A. Whibley, M. Becuwe, S. W. Baxter, L. Ferguson, et al. Chromosomal rear- rangements maintain a polymorphic supergene controlling butterfly mimicry. Nature, 477(7363):203–206, 2011.

[83] E. S. Kelleher, N. B. Edelman, and D. A. Barbash. Drosophila interspecific hybrids phenocopy piRNA-pathway mutants. PLoS biology, 10(11), 2012.

68 [84] J. Kelleher, Y. Wong, A. W. Wohns, C. Fadil, P. K. Albers, and G. McVean. Inferring whole-genome histories in large population datasets. Nature genetics, 51(9):1330–1338, 2019.

[85] M. G. Kidwell, J. F. Kidwell, and J. A. Sved. Hybrid dysgenesis in Drosophila melanogaster: a syndrome of aberrant traits including mutation, sterility and male recombination. Genetics, 86(4):813–833, 1977.

[86] B. Y. Kim, C. D. Huber, and K. E. Lohmueller. Deleterious variation shapes the genomic landscape of introgression. PLoS genetics, 14(10):e1007741, 2018.

[87] Y. Kim and W. Stephan. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics, 160(2):765–777, 2002.

[88] M. Kimura, T. Maruyama, and J. F. Crow. The mutation load in small populations. Genetics, 48(10):1303, 1963.

[89] J. F. Kingman. On the genealogy of large populations. Journal of applied probability, 19(A):27–43, 1982.

[90] K. M. Kozak, N. Wahlberg, A. F. Neild, K. K. Dasmahapatra, J. Mallet, and C. D. Jiggins. Multilocus species trees show the recent adaptive radiation of the mimetic Heliconius butterflies. Systematic biology, 64(3):505–524, 2015.

[91] K. M. Kozak, O. McMillan, M. Joron, and C. D. Jiggins. Genome-wide ad- mixture is common across the Heliconius radiation. BioRxiv, page 414201, 2018.

[92] D. Kryvokhyzha. Whole genome resequencing of heliconius butterflies revolu- tionizes our view of the level of admixture between species, 2014.

[93] R. J. Kulathinal, L. S. Stevison, and M. A. Noor. The genomics of speciation in Drosophila: diversity, divergence, and introgression estimated using low- coverage genome sequencing. PLoS genetics, 5(7), 2009.

[94] C. C. Kyriazis, R. K. Wayne, and K. E. Lohmueller. High genetic diversity can contribute to extinction in small populations. bioRxiv, page 678524, 2019.

[95] S. Lamichhaney, J. Berglund, M. S. Almén, K. Maqbool, M. Grabherr, A. Martinez-Barrio, M. Promerová, C.-J. Rubin, C. Wang, N. Zamani, et al. Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature, 518(7539):371–375, 2015.

[96] G. M. Langham. Specialized avian predators repeatedly attack novel color morphs of Heliconius butterflies. Evolution, 58(12):2783–2787, 2004.

69 [97] C. C. Laurie. The weaker sex is heterogametic: 75 years of Haldane’s rule. Genetics, 147(3):937, 1997.

[98] M. Leitwein, H. Cayuela, A.-L. Ferchaud, É. Normandeau, P.-A. Gagnaire, and L. Bernatchez. The role of recombination on genome-wide patterns of local ancestry exemplified by supplemented brook charr populations. Molecular Ecology, 28(21):4755–4769, 2019.

[99] M. Leitwein, M. Duranton, Q. Rougemont, P.-A. Gagnaire, and L. Bernatchez. Using haplotype information for conservation genomics. Trends in Ecology & Evolution, 2019.

[100] G. Li, H. V. Figueiró, E. Eizirik, and W. J. Murphy. Recombination-aware phylogenomics reveals the structured genomic landscape of hybridizing cat species. Molecular biology and evolution, 36(10):2111–2126, 2019.

[101] H. Li. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997, 2013.

[102] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin. The sequence alignment/map format and SAM- tools. Bioinformatics, 25(16):2078–2079, 2009.

[103] L. Liu, H. Qi, J. Wang, and H. Lin. PAPI, a novel TUDOR-domain protein, complexes with AGO3, ME31B and TRAL in the nuage to silence transposi- tion. Development, 138(9):1863–1873, 2011.

[104] W. P. Maddison. Gene trees in species trees. Systematic biology, 46(3):523– 536, 1997.

[105] J. Mallet. Hybridization as an invasion of the genome. Trends in ecology & evolution, 20(5):229–237, 2005.

[106] J. Mallet and N. H. Barton. Strong natural selection in a warning-color hybrid zone. Evolution, 43(2):421–431, 1989.

[107] J. Mallet, N. Barton, G. Lamas, J. Santisteban, M. Muedas, and H. Eeley. Estimates of selection and gene flow from measures of cline width and linkage disequilibrium in Heliconius hybrid zones. Genetics, 124(4):921–936, 1990.

[108] J. Mallet, N. Besansky, and M. W. Hahn. How reticulated are species? BioEs- says, 38(2):140–149, 2016.

[109] D. A. Marques, J. I. Meier, and O. Seehausen. A combinatorial view on speciation and adaptive radiation. Trends in ecology & evolution, 2019.

70 [110] A. Martin, R. Papa, N. J. Nadeau, R. I. Hill, B. A. Counterman, G. Halder, C. D. Jiggins, M. R. Kronforst, A. D. Long, W. O. McMillan, et al. Diversi- fication of complex butterfly wing patterns by repeated regulatory evolution of a Wnt ligand. Proceedings of the National Academy of Sciences, 109(31): 12632–12637, 2012.

[111] S. H. Martin, K. K. Dasmahapatra, N. J. Nadeau, C. Salazar, J. R. Walters, F. Simpson, M. Blaxter, A. Manica, J. Mallet, and C. D. Jiggins. Genome- wide evidence for speciation with gene flow in Heliconius butterflies. Genome research, 23(11):1817–1828, 2013.

[112] S. H. Martin, J. W. Davey, and C. D. Jiggins. Evaluating the use of ABBA– BABA statistics to locate introgressed loci. Molecular biology and evolution, 32(1):244–257, 2015.

[113] S. H. Martin, J. W. Davey, C. Salazar, and C. D. Jiggins. Recombination rate variation shapes barriers to introgression across butterfly genomes. PLoS Biology, 17(2):e2006288, 2019.

[114] A. Martinez Arias and M. Bate. The development of Drosophila melanogaster. Cold Spring Harbor Laboratory Press, 1993.

[115] D. R. Matute, I. A. Butler, D. A. Turissini, and J. A. Coyne. A test of the snowball theory for the rate of evolution of hybrid incompatibilities. Science, 329(5998):1518–1521, 2010.

[116] E. Mayr et al. species and evolution. Animal species and evolution., 1963.

[117] G. A. McVean, S. R. Myers, S. Hunt, P. Deloukas, D. R. Bentley, and P. Don- nelly. The fine-scale structure of recombination rate variation in the human genome. Science, 304(5670):581–584, 2004.

[118] J. I. Meier, D. A. Marques, S. Mwaiko, C. E. Wagner, L. Excoffier, and O. See- hausen. Ancient hybridization fuels rapid cichlid fish adaptive radiations. Na- ture communications, 8(1):1–11, 2017.

[119] R. M. Merrill, B. Van Schooten, J. A. Scott, and C. D. Jiggins. Pervasive genetic associations between traits causing in Heliconius butterflies. Proceedings of the Royal Society B: Biological Sciences, 278(1705): 511–518, 2011.

[120] B. S. Meyer, M. Matschiner, and W. Salzburger. Disentangling incomplete lineage sorting and introgression to refine species-tree estimates for Lake Tan- ganyika cichlid fishes. Systematic Biology, 66(4):531–550, 2017.

71 [121] C. Mézard, M. T. Jahns, and M. Grelon. Where to cross? New insights into the location of meiotic crossovers. Trends in Genetics, 31(7):393–401, 2015.

[122] O. Mihola, Z. Trachtulec, C. Vlcek, J. C. Schimenti, and J. Forejt. A mouse speciation gene encodes a meiotic histone h3 methyltransferase. Science, 323 (5912):373–375, 2009.

[123] M. Moest, S. M. Van Belleghem, J. E. James, C. Salazar, S. H. Martin, S. L. Barker, G. R. Moreira, C. Mérot, M. Joron, N. J. Nadeau, et al. Selective sweeps on novel and introgressed variation shape mimicry loci in a butterfly adaptive radiation. PLoS Biology, 18(2):e3000597, 2020.

[124] H. Muller. Isolating mechanisms, evolution, and temperature. In Biol. Symp., volume 6, pages 71–125, 1942.

[125] M. W. Nachman and B. A. Payseur. Recombination rate variation and spe- ciation: theoretical predictions and empirical results from rabbits and mice. Philosophical Transactions of the Royal Society B: Biological Sciences, 367 (1587):409–421, 2012.

[126] N. J. Nadeau, C. Pardo-Diaz, A. Whibley, M. A. Supple, S. V. Saenko, R. W. Wallbank, G. C. Wu, L. Maroja, L. Ferguson, J. J. Hanly, et al. The gene cortex controls mimicry and crypsis in butterflies and moths. Nature, 534 (7605):106–110, 2016.

[127] R. E. Naisbit, C. D. Jiggins, M. Linares, C. Salazar, and J. Mallet. Hybrid sterility, Haldane’s rule and speciation in Heliconius cydno and H. melpomene. Genetics, 161(4):1517–1526, 2002.

[128] L. C. Norris, B. J. Main, Y. Lee, T. C. Collier, A. Fofana, A. J. Cornel, and G. C. Lanzaro. Adaptive introgression in an African malaria mosquito coincident with the increased usage of insecticide-treated bed nets. Proceedings of the National Academy of Sciences, 112(3):815–820, 2015.

[129] H. A. Orr. The population genetics of speciation: the evolution of hybrid incompatibilities. Genetics, 139(4):1805–1813, 1995.

[130] H. A. Orr. Dobzhansky, Bateson, and the genetics of speciation. Genetics, 144(4):1331, 1996.

[131] H. A. Orr. Haldane’s rule. Annual Review of Ecology and Systematics, 28(1): 195–218, 1997.

[132] H. A. Orr and M. Turelli. Dominance and Haldane’s rule. Genetics, 143(1): 613, 1996.

72 [133] H. A. Orr and M. Turelli. The evolution of postzygotic isolation: accumulating Dobzhansky-Muller incompatibilities. Evolution, 55(6):1085–1094, 2001.

[134] E. Palkopoulou, M. Lipson, S. Mallick, S. Nielsen, N. Rohland, S. Baleka, E. Karpinski, A. M. Ivancevic, T.-H. To, R. D. Kortschak, et al. A compre- hensive genomic history of extinct and living elephants. Proceedings of the National Academy of Sciences, 115(11):E2566–E2574, 2018.

[135] P. Pamilo and M. Nei. Relationships between gene trees and species trees. Molecular biology and evolution, 5(5):568–583, 1988.

[136] C. Pardo-Diaz, C. Salazar, S. W. Baxter, C. Merot, W. Figueiredo-Ready, M. Joron, W. O. McMillan, and C. D. Jiggins. Adaptive introgression across species boundaries in Heliconius butterflies. PLoS genetics, 8(6), 2012.

[137] N. Patterson, P. Moorjani, Y. Luo, S. Mallick, N. Rohland, Y. Zhan, T. Gen- schoreck, T. Webster, and D. Reich. Ancient admixture in human history. Genetics, 192(3):1065–1093, 2012.

[138] B. A. Payseur and L. H. Rieseberg. A genomic perspective on hybridization and speciation. Molecular ecology, 25(11):2337–2360, 2016.

[139] J. B. Pease and M. W. Hahn. Detection and polarization of introgression in a five-taxon phylogeny. Systematic biology, 64(4):651–662, 2015.

[140] J. B. Pease, D. C. Haak, M. W. Hahn, and L. C. Moyle. Phylogenomics reveals three sources of adaptive variation during a rapid radiation. PLoS Biology, 14 (2), 2016.

[141] L. Peng, L. Wang, Y.-F. Yang, M.-M. Zou, W.-Y. He, Y. Wang, Q. Wang, L. Vasseur, and M.-S. You. Transcriptome profiling of the Plutella xylostella (Lepidoptera: Plutellidae) ovary reveals genes involved in oogenesis. Gene, 637:90–99, 2017.

[142] A. L. Pessoa Pinharanda. The genomic basis of species barriers in Heliconius butterflies. PhD thesis, University of Cambridge, 2017.

[143] K. S. Pfennig, A. L. Kelly, and A. A. Pierce. Hybridization as a facilitator of species range expansion. Proceedings of the Royal Society B: Biological Sciences, 283(1839):20161329, 2016.

[144] J. Piálek and N. H. Barton. The spread of an advantageous allele across a barrier: the effects of random drift and selection against heterozygotes. Genetics, 145(2):493–504, 1997.

73 [145] H. Pimentel, N. L. Bray, S. Puente, P. Melsted, and L. Pachter. Differential analysis of RNA-seq incorporating quantification uncertainty. Nature methods, 14(7):687, 2017.

[146] D. C. Presgraves. Patterns of postzygotic isolation in Lepidoptera. Evolution, 56(6):1168–1183, 2002.

[147] D. C. Presgraves. A fine-scale genetic analysis of hybrid incompatibilities in Drosophila. Genetics, 163(3):955–972, 2003.

[148] D. C. Presgraves. Speciation genetics: search for the missing snowball. Current biology, 20(24):R1073–R1074, 2010.

[149] T. D. Price and M. M. Bouvier. The evolution of F1 postzygotic incompati- bilities in birds. Evolution, 56(10):2083–2089, 2002.

[150] P. Pulido-Santacruz, A. Aleixo, and J. T. Weir. Genomic data reveal a pro- tracted window of introgression during the diversification of a Neotropical woodcreeper radiation. Evolution.

[151] S. Pääbo. The mosaic that is our genome. Nature, 421(6921), 2003.

[152] D. L. Rabosky, M. N. Hutchinson, S. C. Donnellan, A. L. Talaba, and I. J. Lovette. Phylogenetic disassembly of species boundaries in a widespread group of australian skinks (scincidae: Ctenotus). Molecular Phylogenetics and Evo- lution, 77:71–82, 2014.

[153] F. Racimo, S. Sankararaman, R. Nielsen, and E. Huerta-Sánchez. Evidence for archaic adaptive introgression in humans. Nature Reviews Genetics, 16(6): 359–371, 2015.

[154] M. D. Rasmussen, M. J. Hubisz, I. Gronau, and A. Siepel. Genome-wide inference of ancestral recombination graphs. PLoS genetics, 10(5), 2014.

[155] P. Rastas. Lep-MAP3: robust linkage mapping even for low-coverage whole genome sequencing data. Bioinformatics, 33(23):3726–3732, 2017.

[156] D. Reich, K. Thangaraj, N. Patterson, A. L. Price, and L. Singh. Reconstruct- ing indian population history. Nature, 461(7263):489–494, 2009.

[157] F. Roda, F. K. Mendes, M. W. Hahn, and R. Hopkins. Genomic evidence of gene flow during reinforcement in Texas Phlox. Molecular ecology, 26(8): 2317–2330, 2017.

74 [158] J. Rogers, M. Raveendran, R. A. Harris, T. Mailund, K. Leppälä, G. Athanasiadis, M. H. Schierup, J. Cheng, K. Munch, J. A. Walker, et al. The comparative genomics and complex population history of Papio baboons. Science advances, 5(1):eaau6947, 2019.

[159] N. Rosser, L. M. Queste, B. Cama, N. B. Edelman, F. Mann, R. Mori Pezo, J. Morris, C. Segami, P. Velado, S. Schulz, et al. Geographic contrasts between pre-and postzygotic barriers are consistent with reinforcement in Heliconius butterflies. Evolution, 73(9):1821–1838, 2019.

[160] T. B. Sackton, R. B. Corbett-Detig, J. Nagaraju, L. Vaishna, K. P. Arunku- mar, and D. L. Hartl. Positive selection drives faster-Z evolution in silkmoths. Evolution, 68(8):2331–2342, 2014.

[161] R. E. Schaefer, M. G. Kidwell, and A. Fausto-Sterling. Hybrid dysgenesis in Drosophila melanogaster: morphological and cytological studies of ovarian dysgenesis. Genetics, 92(4):1141–1152, 1979.

[162] M. Schartl. Evolution of xmrk: an oncogene, but also a speciation gene? Bioessays, 30(9):822–832, 2008.

[163] D. R. Schield, R. H. Adams, D. C. Card, B. W. Perry, G. M. Pasquesi, T. Jezkova, D. M. Portik, A. L. Andrew, C. L. Spencer, E. E. Sanchez, et al. Insight into the roles of selection in speciation from genomic patterns of di- vergence and introgression in secondary contact in venomous rattlesnakes. Ecology and evolution, 7(11):3951–3966, 2017.

[164] D. Schluter. The ecology of adaptive radiation. OUP Oxford, 2000.

[165] M. Schumer, R. Cui, D. L. Powell, R. Dresner, G. G. Rosenthal, and P. Andol- fatto. High-resolution mapping reveals hundreds of genetic incompatibilities in hybridizing fish species. Elife, 3:e02535, 2014.

[166] M. Schumer, C. Xu, D. L. Powell, A. Durvasula, L. Skov, C. Holland, J. C. Blazier, S. Sankararaman, P. Andolfatto, G. G. Rosenthal, et al. Natural se- lection interacts with recombination to shape the evolution of hybrid genomes. Science, 360(6389):656–660, 2018.

[167] O. Seehausen. Hybridization and adaptive radiation. Trends in ecology & evolution, 19(4):198–207, 2004.

[168] F. A. Seixas, P. Boursot, and J. Melo-Ferreira. The genomic impact of his- torical hybridization with massive mitochondrial DNA introgression. Genome biology, 19(1):91, 2018.

75 [169] D. Setter, S. Mousset, X. Cheng, R. Nielsen, M. DeGiorgio, and J. Hermis- son. VolcanoFinder: genomic scans for adaptive introgression. bioRxiv, page 697987, 2019.

[170] G. H. Shull and J. W. Gowen. Beginnings of the heterosis concept. 1952.

[171] J. B. Slowinski and R. D. M. Page. How should species phylogenies be inferred from sequence data? Systematic Biology, 48(4):814–825, 1999.

[172] C. S. Smukowski Heil, C. Ellison, M. Dubin, and M. A. Noor. Recombining without hotspots: a comprehensive evolutionary portrait of recombination in two closely related species of Drosophila. Genome biology and evolution, 7 (10):2829–2842, 2015.

[173] M. J. Snee and P. M. Macdonald. Bicaudal C and trailer hitch have similar roles in gurken mRNA localization and cytoskeletal organization. Develop- mental biology, 328(2):434–444, 2009.

[174] C. Solís-Lemus and C. Ané. Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS genetics, 12(3), 2016.

[175] Y. Song, S. Endepols, N. Klemann, D. Richter, F.-R. Matuschka, C.-H. Shih, M. W. Nachman, and M. H. Kohn. Adaptive introgression of anticoagulant rodent poison resistance by hybridization between old world mice. Current Biology, 21(15):1296–1301, 2011.

[176] F. Staubach, A. Lorenc, P. W. Messer, K. Tang, D. A. Petrov, and D. Tautz. Genome patterns of selection and introgression of haplotypes in natural pop- ulations of the house mouse (Mus musculus). PLoS Genetics, 8(8), 2012.

[177] A. J. Stern, P. R. Wilton, and R. Nielsen. An approximate full-likelihood method for inferring selection and allele frequency trajectories from dna se- quence data. PLoS genetics, 15(9):e1008384, 2019.

[178] H. Svardal, F. X. Quah, M. Malinsky, B. P. Ngatunga, E. A. Miska, W. Salzburger, M. J. Genner, G. F. Turner, and R. Durbin. Ancestral Hybridization Facilitated Species Diversification in the Lake Malawi Cichlid Fish Adaptive Radiation. Molecular Biology and Evolution, 12 2019. ISSN 0737-4038. doi: 10.1093/molbev/msz294. URL https://doi.org/10.1093/ molbev/msz294. msz294.

[179] P. B. Talbert and S. Henikoff. Centromeres convert but don’t cross. PLoS biology, 8(3), 2010.

76 [180] S. Tang and D. C. Presgraves. Evolution of the Drosophila nuclear pore com- plex results in multiple hybrid incompatibilities. Science, 323(5915):779–782, 2009.

[181] S. A. Taylor and E. L. Larson. Insights from genomes into the evolution- ary importance and prevalence of hybridization in nature. Nature ecology & evolution, 3(2):170–177, 2019.

[182] C. Than, D. Ruths, and L. Nakhleh. PhyloNet: a software package for an- alyzing and reconstructing reticulate evolutionary relationships. BMC bioin- formatics, 9(1):322, 2008.

[183] L. H. Throckmorton. Similarity versus relationship in drosophila. Systematic Zoology, 14(3):221–236, 1965.

[184] D. P. Toews and A. Brelsford. The biogeography of mitochondrial and nuclear discordance in animals. Molecular Ecology, 21(16):3907–3930, 2012.

[185] M. Turelli and H. A. Orr. Dominance, epistasis and the genetics of postzygotic isolation. Genetics, 154(4):1663–1679, 2000.

[186] M. Turelli, J. R. Lipkowitz, and Y. Brandvain. On the Coyne and Orr-igin of species: effects of intrinsic postzygotic isolation, ecological differentiation, X chromosome size, and sympatry on Drosophila speciation. Evolution, 68(4): 1176–1187, 2014.

[187] W. A. Valencia-Montoya, S. Elfekih, H. L. North, J. I. Meier, I. A. Warren, W. T. Tay, K. H. Gordon, A. Specht, S. V. Paula-Moraes, R. Rane, et al. Adaptive introgression across semipermeable species boundaries between local Helicoverpa zea and invasive Helicoverpa armigera moths. bioRxiv, 2019.

[188] S. M. Van Belleghem, P. Rastas, A. Papanicolaou, S. H. Martin, C. F. Arias, M. A. Supple, J. J. Hanly, J. Mallet, J. J. Lewis, H. M. Hines, et al. Complex modular architecture around a simple toolkit of wing pattern genes. Nature Ecology & Evolution, 1(3):1–12, 2017.

[189] S. M. Van Belleghem, M. Baquero, R. Papa, C. Salazar, W. O. McMillan, B. A. Counterman, C. D. Jiggins, and S. H. Martin. Patterns of Z chromosome divergence among Heliconius species highlight the importance of historical demography. Molecular ecology, 27(19):3852–3872, 2018.

[190] C. Veller, N. B. Edelman, P. Muralidhar, and M. A. Nowak. Recombina- tion, variance in genetic relatedness, and selection against introgressed DNA. BioRxiv, page 846147, 2019.

77 [191] C. Veller, N. Kleckner, and M. A. Nowak. A rigorous measure of genome- wide genetic shuffling that takes into account crossover positions and Mendel’s second law. Proceedings of the National Academy of Sciences, 116(5):1659– 1668, 2019.

[192] J. Wakeley. Coalescent theory: an introduction. Robers and Co, 2009.

[193] R. W. Wallbank, S. W. Baxter, C. Pardo-Diaz, J. J. Hanly, S. H. Martin, J. Mallet, K. K. Dasmahapatra, C. Salazar, M. Joron, N. Nadeau, et al. Evo- lutionary novelty in a butterfly wing pattern through enhancer shuffling. PLoS biology, 14(1), 2016.

[194] J. Walsh, A. I. Kovach, B. J. Olsen, W. G. Shriver, and I. J. Lovette. Bidi- rectional adaptive introgression between two ecologically divergent sparrow species. Evolution, 72(10):2076–2089, 2018.

[195] D. Wen and L. Nakhleh. Coestimating reticulate phylogenies and gene trees from multilocus sequence data. Systematic Biology, 67(3):439–457, 2018.

[196] D. Wen, Y. Yu, and L. Nakhleh. Bayesian inference of reticulate phylogenies under the multispecies network coalescent. PLoS genetics, 12(5), 2016.

[197] J. E. Wilhelm, M. Buszczak, and S. Sayles. Efficient protein trafficking requires trailer hitch, a component of a ribonucleoprotein complex localized to the ER in Drosophila. Developmental cell, 9(5):675–685, 2005.

[198] C.-I. Wu and A. W. Davis. Evolution of postmating reproductive isolation: the composite nature of Haldane’s rule and its genetic bases. The American Naturalist, 142(2):187–212, 1993.

[199] H. Yamauchi and N. Yoshitake. Developmental stages of ovarian follicles of the silkworm, Bombyx mori L. Journal of Morphology, 179(1):21–31, 1984.

[200] Y. Yu, J. Dong, K. J. Liu, and L. Nakhleh. Maximum likelihood inference of reticulate evolutionary histories. Proceedings of the National Academy of Sciences, 111(46):16448–16453, 2014.

[201] Q. Zhang, W. Sun, B.-Y. Sun, Y. Xiao, and Z. Zhang. The dynamic landscape of gene regulation during Bombyx mori oogenesis. BMC genomics, 18(1):714, 2017.

78