Using Genomic Tools to Understand Speciation Dynamics in Madagascar's Mouse Lemurs (Order: Primates)

by

Christopher Ryan Campbell

Department of Biology Duke University

Date:______Approved:

______Anne Yoder, Supervisor

______S. Thomas Mitchell-Olds

______Charles Nunn

______Jeffrey L. Thorne

______Marcy Uyenoyama

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biology in the Graduate School of Duke University

2019

Copyright by Christopher Ryan Campbell 2019

Abstract

Mouse lemurs (Genus Microcebus; Order Primates) are endemic to the island nation of Madagascar. These small, nocturnal primates offer a unique mammalian system ideally suited to the study of the patterns and processes of speciation, and the genomic revolution of the past decade has presented us with novel tools to test and understand these -specific dynamics.

This thesis presents three independent projects on the speciation dynamics within mouse lemurs, interconnected by their use of the entire genome as a tool for reconstructing hidden evolutionary processes. Chapter 2 is an overview of the exploding field of speciation genomics and serves to ground these individual projects in evolutionary theory. In Chapter 3, we adopt a genome-wide approach to understand how the changing landscape of Madagascar has shaped the of mouse lemurs while simultaneously leveraging the patterns derived from these to discern clues about the on the island in the distant past, and thus the impact of more recently arriving . The phylogenetic and population genetic patterns in a five-species clade of mouse lemurs suggest that longitudinal dispersal across the island occurred until approximately 500,000 years before present, when a large ancestral population experienced rapid diversification, resulting in the present-day distributions of these species. However, the accuracy of the estimates of species ages from genetic data is limited primarily by our understanding of the generation process in this non-model species. Thus, in Chapter 4, we improve these estimates by using high-coverage linked-read sequencing to estimate the generational de novo mutation rate within a pedigree (n=8) of grey mouse lemurs (Microcebus murinus). We estimated a mutation rate of 1.64 × 10⁻⁸ per base pair per generation, higher than that of nearly all mammals that have been previously characterized. Because estimates of these biological metrics critically affect estimates of divergence time, we reanalyzed the results presented in the previous chapter and found considerably more recent divergence times across the five-species clade. Finally, in Chapter 5, we utilize the knowledge of functional processes housed within genomic data to investigate the underlying causes of speciation in Microcebus. While many of the initial applications of genomic data in the genus have centered on understanding the phylogenetic relationships among the species rather than the mechanisms that underlie their diversification, we attempt to utilize genomic annotations to assess a putative mechanism driving speciation. We compared rates of positive selection in sperm genes with those in a set of randomly drawn genes. These comparisons reveal an elevated dN/dS, and thus evidence for positive selection, in sperm fertilization-related genes relative to sperm construction-related genes and a similarly-sized set of randomly-drawn genes. The results provide data that support our current knowledge of the behavior and natural history of these primates, highlighting what could be a genomic mechanism of speciation among these highly speciose primates of Madagascar. In doing so we hope to have shown that detailed questions relating to both the extrinsic (e.g., inter- and intra-population and ecological interactions) and intrinsic (e.g., genome content and architecture) forces that drive speciation can be asked and answered, especially in non-model species.

Dedication

To public television and public education: without their inspiration and preparation, I would not have embarked on this dissertation.

Contents

Abstract

List of Tables

List of Figures

Acknowledgements

1. Introduction

2. What is Speciation Genomics?

2.1 Introduction

2.2 The Demography of Speciation

2.2.1 Speciation and Gene Flow

2.2.2 Characterizing the Demography of Speciation

2.3 The Role of Intrinsic Genomic Features in Speciation

2.3.1 Genomic Architecture and Speciation Predisposition

2.3.2 Genome and Gene Duplications

2.3.3 Chromosomal Inversions

2.4 Genomic Differentiation and Barrier Loci

2.4.1 Genomic Differentiation Islands

2.4.2 Identifying Barrier Loci

2.4.3 PRDM9: A Crucial Barrier Locus?

2.5 Looking to the Future: Is There Hope for a Unified Theory of Speciation?

3. Geogenetic Patterns in Mouse Lemurs

3.1 Introduction

3.2 Methods

3.2.1 Sample Collection

3.2.2 mtDNA Sequencing

3.2.3 mtDNA Analyses

3.2.4 RAD Genotyping

3.2.5 RAD Data Analyses

3.2.6 Divergence Time Analyses

3.2.7 Estimation of average generation time for genus Microcebus

3.2.8 Comparison of Genetic and Geographic Distance

3.3 Results

3.3.1 mtDNA tree

3.3.2 Species tree estimation and divergence times

3.3.3 Geogenetic Analysis

3.4 Discussion

3.4.1 Whole Genome Phylogeography

3.4.2 mtDNA Diversity

3.4.3 Genetic Distances

3.4.4 Geogenetic analyses

3.5 Conclusions

4. Direct measurement of the Mouse de novo mutation rate

4.1 Background

4.2 Methods and Materials

4.2.1 Samples

4.2.2 Sequencing

4.2.3 10x Genomics Pipeline

4.2.4 DeNovoGear

4.2.5 Detectable Mutation Test

4.2.6 Mutations at CpG sites

4.2.7 Parent of Origin

4.2.8 BPP

4.3 Results

4.3.1 Mutations

4.3.2 Mutation Rate and Spectrum

4.3.3 Effective Population Size and Mutation Rate

4.3.4 CpG Sites

4.3.5 Parental Origin of Mutations

4.3.6 Improved Timetree

4.4 Discussion

4.4.1 Mutation rate differences across species

4.4.2 Novel Mutation Rate Characteristics

4.4.3 Diversification rates in mouse lemurs

4.4.4 Methodology and caveats

5. Rates of Sperm Gene Across Microcebus

5.1 Background

5.2 Methods

5.2.1 Gene Identification

5.2.2 Genome Assembly and Annotation

5.2.3 BLAST and Alignment

5.3 Results and Discussion

5.3.1 Gene Selection

5.3.2 Exon Alignment

5.3.3 Pairwise dN/dS

5.3.4 Multispecies dN/dS

5.4 Conclusions

6. Conclusion

Appendix A: State-of-the-Art Genomic Approaches

Appendix B: “Messy Speciation” and Genealogical Variation Across the Genome

Appendix C: Genetic Incompatibilities and Rules of Speciation

Appendix D: The uncertainty of the mutation rate estimate from pedigree-sequencing studies

Appendix E: Variant Calling’s Effect on de novo Mutation Rate Estimation

References

Biography

List of Tables

Table 1: Sample Localities

Table 2: Ancestral effective population size (Ne) and estimated divergence times

Table 3: Mammalian Mutation Rates

Table 4: de novo Mutations

Table 5: Mutation Rates in Gray Mouse Lemurs

Table 6: Animal Mutation Rates

Table 7: Improved Divergence Time Estimation

Table 8: T-Test Results

Table 9: Kruskal-Wallis Test Results

Table 10: Summary of fertilization-related gene data across all six species

List of Figures

Figure 1: Microcebus Mitochondrial Maximum Likelihood Tree

Figure 2: Genome Completeness

Figure 3: Structural Genomic Impacts on Speciation

Figure 4: Methods of Studying Genomics of Speciation

Figure 5: Speciation Study Spectrum

Figure 6: The Biomes of Madagascar

Figure 7: Microcebus Tree with Focal Species Highlighted

Figure 8: Microcebus tree of genomic SNPs

Figure 9: M. lehilahytsara Sampling Locality Satellite Images

Figure 10: Genetic and Geographic Distance Species Comparison

Figure 11: Short Distance Comparison p-values

Figure 12: Long Distance Comparison p-values

Figure 13: Combined Distance Comparison p-values

Figure 14: Microcebus Samples Inferred Geogenetic Locations

Figure 15: Focal Family Quartet with Mutations

Figure 16: Microcebus murinus mutation spectrum

Figure 17: Mutation Rate and Effective Population Size

Figure 18: Mutations at CpG Sites

Figure 19: Mutation Rate’s Impact on Phylogeny

Figure 20: Violin plot of CpG Island and Gene Proximity

Figure 21: Species Phylogeny from Assembly Data

Figure 22: Gene Exon Number versus Gene Length

Figure 23: Assembly Quality and Gene Mapping

Figure 24: Exon GC content broken down by species

Figure 25: dN/dS by Gene Category

Figure 26: dN/dS by Gene Category

Figure 27: Exon dN/dS versus Exon Length

Figure 28: The 80th Percentile of dN/dS Values by Gene Category

Figure 29: dN/dS Comparisons and Significance by Gene Category

Figure 30: dN/dS by Gene Category

Figure 31: dN/dS by Gene Category

Figure 32: dN/dS by Gene Category and Evolutionary Distance

Acknowledgements

In science, and in many ways, perhaps it has never been more true that any successes we may achieve are in no small measure the product of those around us. In 1675 Isaac Newton wrote to Robert Hooke that “If I have seen further it is by standing on the shoulders of Giants.” Given the ever-growing connectivity of the world and the ever-growing volumes of published scientific works, these words ring truer by the day. In the same letter, Newton also wrote “I am not so much in love with philosophical productions but oft I can make them yield to equity and friendship,” a sentiment that I often identify with and that is pertinent here, as the equity and friendships I hope to have forged weigh large in my final product. In the spirit of these quotes I would like to acknowledge some of the people whose shoulders I stood on and whom I count among my closest friends.

To my advisor, Anne Yoder, first and foremost I appreciate the chance to attend graduate school. Thank you for attracting an amazing cadre of scientists in your lab that fostered amazing work, discussions, and, of course, social events. Thank you for an education, by both word and deed, in scientific research, collaboration, and writing. And lastly, thank you for helping me to find a dissertation somewhere among all the rabbit holes I’ve gone down over the years.

I’d also like to thank my committee for excellent feedback, support, and scientific discussions. Marcy Uyenoyama has taught, and re-taught, me more math and population theory than I learned in an entire undergraduate education; I sincerely appreciate her patience and commitment. Jeff Thorne has provided great guidance navigating the world of and is always great to catch up with; I owe him many thanks for the extra work it is to serve on an external committee. Thomas Mitchell-Olds introduced me to one of my favorite statistical methods, the permutation test, and always has sage advice for extrapolating genomic results to the organismal level. And Charlie Nunn has worked to make sure I don’t lose sight of the real world, in this case Madagascar, and the star of my dissertation: the mouse lemur. Though not an official committee member, Mario dos Reis has been instrumental to this work. He is an ever-willing collaborator who has provided the necessary expertise to accomplish several of the projects here, as well as many others, and I look forward to every chance I get to work with him.

To the many members of the Yoder Lab that I have been lucky enough to share the corner of the 3rd Floor with, I owe you an endless debt of gratitude. Peter Larsen is a tireless mentor with a contagious passion for science, who helped me successfully navigate the early years of graduate school life. My academic older sisters Erin McKenney and Sheena Faherty laid out beautiful paths for me to follow and were always helpful, both in expanding my view of lemurs and with the nitty gritty of graduate life. Kelsie Hunnicutt has a mind like a diamond, both awesome and tough, which was an excellent proving ground for many projects and ideas. Rachel Williams is a true friend, equal parts scientific collaborator and personal sounding board. George Tiley has been an invaluable resource for trees and species relationships among Microcebus; additionally, he doesn’t let society tell him how it’s supposed to be. And last, but not least, Jelmer Poelstra knows genomes and knows speciation, is very Dutch, and I thank him for the many hours he has spent earnestly editing and shaping our co-authored papers.

It would be difficult to complete a dissertation that lacked the positive effects of the broader Biology Department community, which has been immeasurably helpful. Though I am only going to name a few of you, everyone who participates and interacts with this larger body should feel some sense of responsibility for, and hopefully pride in, the work it produces. I’m always smarter and more thoughtful after spending time with Aspen Reece. I have been lucky to overlap with Jenn Coughlan, who shares my scientific interests. She has given me feedback and insight, and her enthusiasm is unmatched. I am thankful for Andy Ballard’s steadying, confidence-inspiring impact on me, and I hope everyone finds a friend like that. Eleanor Caves has been both an inspiration and a motivational force, and Patrick Green has been a port in the storm and a friend with whom I have always been able to discuss any and everything. Their combined impact on my personal and scientific life will likely know no equal, and for that I am eternally grateful.

Lastly I must thank those that set me on this path to begin with, my parents, sister, and wife. My father, Bill Campbell, taught me the value of hard work and how important it is to consider others. I can thank my mother, Nancy Miller Campbell, for her intelligence, kindness, and endless support. My sister, Erin Campbell, was always around to push me to achieve more, though surely she’d swear to the converse being true. To the single person to whom I ascribe the vast plurality of credit for my accomplishments, my wife Katelyn Ander. Thank you for believing in me, standing by me, and making me want to be a better person.

1. Introduction

Madagascar is home to a unique radiation of critically endangered primates, the Lemuriformes. Within the lemuriform primates, the genus Microcebus has been of special interest given their rapid life history profile and the still evolving understanding of their . Originally thought to contain only one species, they are now recognized as one of the most speciose primate on earth. With the advancement and proliferation of sequencing technologies, a more complete assessment of the species’ diversity has occurred, and presently, 24 species have been formally described and named. These species can be found across the entirety of Madagascar, in a broad array of demographic circumstances, from allopatry to sympatry, and with species found in virtually every in Madagascar, often showing patterns of extreme microendemism. The deployment of multiple next-generation sequencing technologies has produced a high-quality assembly for Microcebus murinus, the only lemur with an annotated chromosome-level genome.

Mouse lemurs were considered to be monotypic for nearly 200 years following their original description in 1795 (Geoffroy Saint-Hilaire, 1795). It was not until the 1970s that increased research activity and broader geographic sampling led investigators to conclude that there were actually at least two distinct forms (Martin, 1972; Petter et al., 1977; Tattersall, 1982). At that time two species were formally recognized: M. murinus, a long-eared gray animal from the western regions of Madagascar, and M. rufus, a short-eared reddish animal from the east. Martin (1972), in particular, made note of the differing and ecological conditions defining the two species, with M. murinus being insectivorous and inhabiting dry deciduous and spiny desert forest, and M. rufus inhabiting humid rain forest and showing dietary tendencies towards omnivory. Thus, after 200 years of scientific stasis, the idea that both ecological and biogeographic mechanisms maintain species separation in mouse lemurs entered the conversation.

Even so, nearly 40 years of continued stasis persisted, though this changed abruptly with the addition of genetic characterization of mouse lemur lineages (Yoder et al. 2000, Weisrock et al. 2010). Continuing this trend, recent studies (Rasoloarison et al. 2013, Hotaling et al. 2016) have built on the pioneering work of Martin (1972) by showing that mouse lemurs are remarkably diverse, express a host of complex ecological, behavioral, and physiological specializations, and are found in a complex array of species demographies, ages, and interactions. At present the genus consists of 24 species (Figure 1), with an estimated most-recent-common-ancestor age of 8-10 million years and a youngest species age of 50 kya (thousand years ago).

Figure 1: Microcebus Mitochondrial Maximum Likelihood Tree. Maximum likelihood tree for concatenated mtDNA data (cytb and cox2) from 117 Microcebus sequences plus four outgroup sequences (Cheirogaleus and Mirza; not shown in figure). Bootstrap support is shown for internal nodes (100 replicates). The age of the basal node was previously estimated using phylogenetic methods by Yoder and Yang (48).

In addition to the genus’ flexibility and Madagascar’s extreme ecological heterogeneity, mouse lemurs also possess life-history characteristics that ideally suit them for the study of speciation. They are short-lived, able to breed at 10 months of age, and have a generation time of roughly 4 years (Yoder et al., 2016). They also have relatively small home ranges for a mammal, presumably due to their extremely small body size (~30 - 90 grams, depending on species). Small home ranges, in combination with the extreme environmental heterogeneity of Madagascar, are likely causal to the microendemism observed across the mouse lemur clade, wherein species appear to be adapted to discrete, local environments (Rasoloarison et al. 2013, Hotaling et al. 2016).

Recent research from our lab shows that lineage diversification is occurring faster and more recently than has been previously estimated. Ample evidence indicates that mouse lemurs have been progressively adapting to the small and varied ecological niches within Madagascar (Ganzhorn & Schmid 1998, Rasoloarison et al. 2013, Hotaling et al. 2016). Given their quick life-history turnover, it is likely that they can adapt rapidly to diverse microhabitats. Also notable is the observation that some species of mouse lemur can occur in high densities in degraded forests and on forest edges. In particular, they are known to thrive in areas with invasive fruiting trees, such as Chinese guava, and have been observed to disperse on the ground between trees. Thus, of all of the endemic Malagasy primates, they have been considered the most likely to be robust to human-mediated landscape modification (Gerard et al. 2015), though this appears to be true for some but not all species (Schaffler et al. 2015). This combination of Madagascar’s unique habitat variation and the diversity of Microcebus species provides a natural experiment in which to examine speciation and species boundaries in a mammal and to characterize how this process may be influenced by the mammalian genome.

M. murinus is frequently bred in captivity and as a result it is also one of the most genetically well-studied lemurs. Along with Coquerel’s sifaka (Propithecus coquereli), the blue-eyed black lemur (Eulemur flavifrons; Meyer et al. 2015), the greater bamboo lemur (Prolemur simus; Hawkins et al. 2018), and the aye-aye (Daubentonia madagascariensis; Perry et al. 2012), it is one of the few lemurs to have even a roughly assembled genome. Recent work has increased sequence coverage to greater than 150-fold depth across multiple sequencing technologies. These efforts include the use of optical mapping (Lam et al. 2012) and Hi-C chromosome painting (Imakaev et al. 2012) to improve scaffold size and assembly quality (Larsen et al. 2017). The availability of a well-assembled genome has proven to be crucial to any thorough investigation of the genomic impact of speciation and has thus been utilized throughout the independent projects presented in this thesis. Having access to a high-quality genome increases the effectiveness of new, affordable genome-wide sequencing technologies such as RADseq by improving mapping quality and limiting data loss. This provides more loci and more information about species, allowing more thorough dating as well as a more accurate picture of species histories than can be obtained from a handful of mitochondrial genes. Lastly, an annotated genome is a prerequisite to the crucial step of moving beyond simply describing and dating species and towards understanding the hows and whys of biological processes such as speciation.

This growing body of theory, methods, and empirical data is enabling us to ask ever more refined questions about the speciation process and to ask them in some of nature’s most obscure species. These questions, together with the influx of genomic tools, offer a powerful means of understanding the evolutionary processes that have driven the diversification of the grey mouse lemur (M. murinus) and its sister lineages that make up the genus Microcebus. This dissertation leverages the information in the mouse lemur genome to describe species histories across the genus, measure the mutation rate, characterize mutations with respect to other mammals, and compare genes across several species of mouse lemur to query the functional causes of their high levels of lineage diversification.

2. What is Speciation Genomics?1

As is true of virtually every of the biological sciences, our understanding of speciation is increasingly informed by the genomic revolution of the past decade. Investigators can ask detailed questions relating to both the extrinsic (e.g., inter- and intra-population and ecological interactions) and intrinsic (e.g., genome content and architecture) forces that drive speciation. Technologies ranging from restriction-site associated DNA sequencing (RADseq), to whole genome sequencing and assembly, to transcriptomics, to CRISPR are revolutionizing the means by which investigators can both frame and test hypotheses of lineage diversification. Our review aims to examine both extrinsic and intrinsic aspects of speciation. Genome-scale data have already served to fundamentally clarify the role of gene flow during (and after) speciation, though we predict that the differential propensity for speciation among phylogenetic lineages will be one of the most exciting frontiers for future genomic investigation. We propose that a unified theory of speciation will take into account the idiosyncratic features of genomic architecture examined in the light of each 's biology and ecology drawn from across the full breadth of the Tree of Life.

1 The contents of this chapter have been published in The Biological Journal of the Linnean Society: Campbell, C. R., Poelstra, J. W., & Yoder, A. D. (2018). What is Speciation Genomics? The roles of ecology, gene flow, and genomic architecture in the formation of species. Biological Journal of the Linnean Society, 124(4), 561-583.

2.1 Introduction

It is an exciting time to be an empiricist engaged in the genetics and genomics of speciation. Combined with the enduring power of field and laboratory studies, genomic analysis is allowing investigators to rigorously test long-standing questions regarding the sources of, and selective pressures underlying, reproductive barriers, the genomic architecture associated with speciation, and the roles of ecology, geography and demography in speciation across the Tree of Life. The process of lineage diversification and the mechanisms that promote it have been of fundamental interest from the very outset of the formalized theory of evolution by natural selection (Darwin, 1858; Darwin, 1859). In Darwin's view, natural selection was the driving force of speciation, intrinsically augmented by ecological conditions. The introduction of Mayr’s (1942) Biological Species Concept, however, laid bare the apparent difficulties of establishing reproductive isolation (RI) without prolonged geographic separation. Though these two views of speciation, sympatric versus allopatric, were initially considered to be fundamentally opposed, it is now appreciated that they are actually endpoints on a continuum. Foundational work by Guy Bush and colleagues (Bush, 1994; Bush, 1998), together with genetic and genomic approaches, has clarified that gene flow among diverging species is often a facet of speciation. Thus, we are now in the position to consider the relative influence of geography, ecology, and selection in driving the speciation process. Moreover, genome-scale data have pointed to the role of genomic architecture in predisposing certain lineages towards divergence, and others towards stasis.

Thus, we have reached a point wherein forces that are both extrinsic and intrinsic to the organism are equally tractable for investigation. Even so, the frontier is vast and the unknown significantly outweighs the known. The genetic and genomic data that have thus far been generated are phylogenetically restricted, and have a strong bias towards a limited number of model systems, which accordingly imposes a biased organismal perspective (for an insightful review, see Scordato et al., 2014). Though constraining at present, this bias should not be surprising given that model tend to be those best characterized genomically, thus conferring benefits to the study of closely-related lineages with decreasing benefits as phylogenetic distance increases. As we discuss below, however, taxonomic bias in available genomic resources is rapidly giving way to a broader phylogenetic perspective as more genomes are being sequenced (Figure 2A) at a higher standard of quality (Figure 2B) both qualitatively and quantitatively (Figure 2C) with important features such as detailed annotation (Figure 2D). Thanks to remarkable advances in sequencing technologies and de novo genome assembly (Alkhateeb & Rueda, 2017; Jackman et al., 2017; Kamath et al., 2017; Paten et al., 2017; Vaser et al., 2017; Worley, 2017), it is no longer the case that the availability of a closely-related model species and reference genome is essential to the generation of genome-scale data and analysis of non-model organisms (see Appendix A).

Figure 2: Genome Completeness. Number and quality of plant and vertebrate genomes uploaded to the National Center for Biotechnology Information (NCBI) over time. A) Overall number of genomes uploaded per year since 2000. B) Genomes modified since 2012, displayed by NCBI's assessment of completeness. C) Violin plots of average scaffold size (genome size / number of scaffolds) by year of genomes modified since 2015, horizontal bar marks the median. D) Number of genomes that are currently annotated, by original release date.

It is our aim in this review to examine the history, recent developments, and future directions of the field now generally referred to as "speciation genomics." Given the enormity of the field, it is not our intent (nor a realistic goal) to provide an exhaustive overview of the relevant literature. Rather, our primary goal is to illustrate the many ways that technological advances for characterizing the genome are serving to enhance understanding of the interacting extrinsic and intrinsic forces that drive speciation. Scordato et al. (2014) reported that "internal interactions," wherein natural and sexual selection jointly influence divergence in sexual traits and preferences, are considerably more common than cases wherein "external interactions" are driven by ecological context and transmission efficiency of sexual trait signals. Here, we define extrinsic features as those wherein the environment (described as any feature external to the individual organism, including conspecifics) impacts the action of the genome during speciation, and intrinsic features as those that are specific to an organism's internal features, most notably, the structure of its genome. This makes for a convenient, if not entirely inclusive, framework for examining speciation via genomic and genetic analysis. By focusing on the interaction of individuals within populations, and the impacts of environment on these interactions, we can explicitly examine the extrinsic "demography" of speciation, whereas by focusing on features of genome structure and content, we can examine the internal "architecture" of speciation.

The present deluge of data is progressively placing within reach new, testable hypotheses and improving our understanding of the underlying speciation process. It is the intersection of theory, empiricism, and technology that promises to yield remarkable insights into the most fundamental of all evolutionary processes: the genetic and genomic underpinnings of the diversification of organismal lineages through time and space.

2.2 The Demography of Speciation

2.2.1 Speciation and Gene Flow

Over the past two decades we have moved from a largely geographic allopatric view of speciation to a far more nuanced and complex understanding that harkens back to Darwin and draws meaningfully from the theories of both natural and sexual selection. Whereas allopatric speciation requires only geographic isolation plus time to produce species-level lineage divergence, sympatric speciation is thought to mostly rely on ecologically-mediated natural selection and prezygotic isolation "acting differently in different places" (Turelli, Barton & Coyne, 2001:332). Specifically, a large shift has occurred in the appreciation of the occurrence and role of gene flow during as well as after speciation. First, it has become clear that gene flow can be overcome, even in the initial stages of divergence, in particular when ecologically-based divergent selection is strong. As stated by Nosil (2008): "[s]peciation with gene flow could be common". Second, it has become clear that introgressive hybridization between substantially diverged populations is commonplace (Árnason et al., 2018; Coyner, Murphy & Matocq, 2015; Morii et al., 2015; Sankararaman et al., 2014; Schumer et al., 2018), and that despite cases of lineage merging or speciation reversal (Campagna et al., 2014; Kearns et al., 2018), such hybridization events often contribute to adaptation (Pardo-Diaz et al., 2012; Racimo et al., 2015; Richards & Martin, 2017) and can contribute to the formation of new lineages (Abbott et al., 2013; Lamichhaney et al., 2017; Seehausen, 2004).

With the advantage of genomic characterization via high-throughput sequencing combined with recent developments in statistical methods, investigators can now, for nearly any species of interest, estimate parameters to describe the demographic aspects of speciation history with unprecedented resolution (Berner & Roesti, 2017; Ellegren, 2014; Ellegren et al., 2012; Fan & Meyer, 2014; Gaither et al., 2015; Gante et al., 2016; Kang et al., 2016; Malinsky et al., 2015; Schmitz et al., 2016; Toews et al., 2016). These aspects are often collectively described under the Isolation-with-Migration (IM) model and provide critical information on the rates, direction, and possibly timing of gene flow, divergence times (which should be co-estimated with gene flow), and population size trajectories. While challenges remain, as we discuss below, together these parameters can be used to answer key questions pertaining to speciation events, such as: did populations diverge in isolation, in the face of continuous gene flow, or have secondary contact and gene flow been recent? Was speciation associated with a severe population bottleneck (e.g., a founder event)? How rapidly did observed phenotypic divergence or genetic incompatibilities evolve? The answers to these questions are fundamental to understanding the complex interplay among ecology, geography, and natural selection in driving lineage diversification.

2.2.2 Characterizing the Demography of Speciation

A major advance in population or species-level inferences has been facilitated with the use of single nucleotide polymorphisms (SNPs) drawn from across the genome. Reduced representation libraries such as RADseq can be produced at low cost, do not require a reference genome, and are often sufficient when questions are focused on estimating genome-averaged historical demography. In fact, because RADseq can generate sequencing data for tens of individuals at a fraction of the cost of a single whole genome, the reduced constraint on the number of individuals that can be sequenced can be highly beneficial for certain approaches, such as those that rely on the site frequency spectrum. For other applications that require large numbers of individuals, pooled sequencing (e.g., Pool-seq; Schlötterer et al., 2014) and low-coverage sequencing (Nielsen et al., 2011) also offer low-cost whole-genome perspectives, albeit at the cost of individual-level genotypes (see Fuentes-Pardo & Ruzzante, 2017 for an overview).

Though the advantages of anonymous genome-wide SNPs are many, the need for assembled and annotated genome-scale data persists when the questions being asked require either functional or structural information. In turn, this necessitates increasing levels of theoretical sophistication. Put succinctly by Sousa and Hey (2013:404), as we accumulate more and more comparative genomic data "we find our best models and tools for explaining patterns of variation were designed for a simpler time and smaller data sets." Tremendous progress has nevertheless been made in the development of approaches to estimate demography and gene flow. Three areas in particular are worth pointing out:

First, several methods can now be used to estimate parameters of IM models from the site frequency spectrum (SFS), that is, the distribution of allele frequencies aggregated across the available sequencing data specific to populations (Excoffier et al., 2013; Gutenkunst et al., 2009; Kern & Hey, 2017; Lohse et al., 2015). These methods are fast and can be easily applied to any genomic dataset. Yet, SFS approaches work best for relatively large sample sizes (that is, many individuals) due to the highly condensed summarization of the data, and thus concerns exist that they may not always be able to distinguish between competing models (Lapierre, Lambert & Achaz, 2017; Terhorst & Song, 2015).
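To make the SFS summary concrete, the following minimal sketch tabulates a folded SFS from a small matrix of allele calls. The function name, the 0/1 input encoding, and the toy data are assumptions made for illustration only; they do not reproduce the interface of any of the programs cited above, and the folded form is used here simply because it does not require knowing which allele is ancestral.

```python
def folded_sfs(sites):
    """Folded site frequency spectrum for biallelic sites.

    sites: list of sites, each a list of 0/1 allele calls, one per sampled
    chromosome (hypothetical input format; no missing data is assumed).
    Returns a list where entry k is the number of sites whose minor allele
    is carried by exactly k chromosomes.
    """
    n = len(sites[0])
    sfs = [0] * (n // 2 + 1)
    for calls in sites:
        if len(calls) != n:
            raise ValueError("every site needs the same number of chromosomes")
        ones = sum(calls)
        # Folding: count the rarer of the two alleles at each site.
        sfs[min(ones, n - ones)] += 1
    return sfs


# Toy data (made up): 5 sites scored on 6 chromosomes.
toy_sites = [
    [0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
toy_sfs = folded_sfs(toy_sites)
print(toy_sfs)  # [1, 2, 1, 1]
```

The whole dataset collapses to a vector of only n/2 + 1 counts, which illustrates the "highly condensed summarization" noted above: many different histories can produce similar vectors, particularly when few individuals are sampled.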

Second, coalescent theory, first proposed by Kingman (1982), is now one of the most widely used population genetic models and forms the backbone for many current demographic inference methods. One of the central realizations that has come from coalescent theory is that a species tree will typically contain a distribution of varying gene genealogies. Given that the addition of further unlinked loci never fails to add information with respect to the underlying population history, this has further clarified the applications of coalescent theory for interpreting the demographic signals contained within large-scale genomic data. Even though coalescent approaches are still unable to use all of the genealogical information contained within a genome given the difficulties of fully incorporating recombination, an approximation of the coalescent with recombination, the Sequentially Markov Coalescent (McVean & Cardin, 2005; Wiuf & Hein, 1999), has formed the basis for recent methods that estimate population size changes through time from high-coverage whole-genome sequences (Li & Durbin, 2009; Schiffels & Durbin, 2014). Moreover, this approach has been used to estimate the age of genomic admixture blocks (Rasmussen et al., 2014), thereby adding tremendous sophistication and precision to temporal estimates of lineage diversification.
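The variation among gene genealogies described above can be illustrated with a short simulation of the standard neutral coalescent for a single, constant-size, panmictic population. This is a sketch under those simplifying assumptions, not any of the cited inference methods; the function names and the choice of ten sampled lineages are hypothetical. Time is measured in coalescent units (2N generations for a diploid population of size N).

```python
import random


def simulate_tmrca(n, rng):
    """Time to the most recent common ancestor of n sampled lineages,
    in coalescent units, under the standard neutral coalescent."""
    t = 0.0
    k = n
    while k > 1:
        # With k lineages there are k*(k-1)/2 pairs that may coalesce,
        # so the waiting time is exponential with that rate.
        rate = k * (k - 1) / 2.0
        t += rng.expovariate(rate)
        k -= 1
    return t


def expected_tmrca(n):
    """Theoretical mean TMRCA for n lineages: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


rng = random.Random(1)  # fixed seed so the run is repeatable
times = [simulate_tmrca(10, rng) for _ in range(20000)]
mean_tmrca = sum(times) / len(times)
var_tmrca = sum((t - mean_tmrca) ** 2 for t in times) / (len(times) - 1)

print("simulated mean:", round(mean_tmrca, 3))
print("theoretical mean:", expected_tmrca(10))
print("simulated variance:", round(var_tmrca, 3))
```

Each simulated value corresponds to one independent (unlinked) locus. The spread among loci is large relative to the mean, which is why a single genealogy says little about population history and why adding unlinked loci keeps adding information.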

Third, a class of formal tests for admixture that was first developed to test for Neanderthal ancestry in modern humans (Patterson et al., 2012) can be easily computed from any genome sequencing data. These tests have been widely adopted and extended, and provide a simple and standardized way to test for admixture between sets of two (f2 statistics), three (f3), four (f4, fd, D) and five (f4-ratio, DFOIL) populations or species (Martin, Davey & Jiggins, 2015; Patterson et al., 2012; Pease & Hahn, 2015).
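As an example of how simple these statistics are to compute, the sketch below evaluates the four-taxon D statistic (the ABBA-BABA test) from per-site derived-allele frequencies in populations P1, P2, P3 and an outgroup. The function name, the input format, and the toy frequencies are assumptions for illustration. In practice, significance is usually assessed by resampling blocks of the genome, a step that is omitted here.

```python
def patterson_d(p1, p2, p3, outgroup):
    """D statistic from derived-allele frequencies.

    Each argument is a list with one frequency per site (hypothetical input).
    ABBA sites support P2 and P3 sharing the derived allele; BABA sites
    support P1 and P3 sharing it. D > 0 suggests excess sharing between
    P2 and P3, and D < 0 between P1 and P3.
    """
    abba = 0.0
    baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        abba += (1 - a) * b * c * (1 - o)
        baba += a * (1 - b) * c * (1 - o)
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


# Toy data (made up): four sites, outgroup fixed for the ancestral allele.
p1 = [0.0, 1.0, 0.0, 0.0]
p2 = [1.0, 0.0, 1.0, 0.0]
p3 = [1.0, 1.0, 1.0, 1.0]
outgroup = [0.0, 0.0, 0.0, 0.0]

d_stat = patterson_d(p1, p2, p3, outgroup)
print(round(d_stat, 3))  # 0.333 (two ABBA sites, one BABA site)
```

Swapping P1 and P2 flips the sign of D, which reflects the symmetry expected under incomplete lineage sorting alone: without gene flow, ABBA and BABA patterns should be equally frequent.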

Together, these methods, as applied to high-throughput sequencing data sets, have already provided abundant evidence for the pervasiveness of gene flow during all stages of speciation, while also elucidating the demographic speciation histories for numerous study systems (Feder, Egan & Nosil, 2012; Pinho & Hey, 2010; Sousa et al., 2013). Despite this rapid progress towards explicitly integrating demographic parameters into our understanding of speciation dynamics, major challenges remain. For example, we do not generally have the resolution to create a hypothesis-free description of the rate, direction, and magnitude of gene flow through time, instead often relying on summary measures or being forced to choose from a limited set of hypothesized models. One of the great remaining challenges is to distinguish ancestral population structure from ongoing gene flow. This requires that we disentangle gene flow and divergence time for very recently diverged populations, and also that we co-estimate demography and selection. Fundamentally, given the high variance inherent in the coalescence process, even making optimal use of the information contained in high-quality genome assemblies may not provide the resolution necessary to satisfactorily address all the challenges described above. The development of ever more sophisticated models of the coalescent process thus remains one of the grand challenges in the field of speciation genomics.

2.3 The Role of Intrinsic Genomic Features in Speciation

2.3.1 Genomic Architecture and Speciation Predisposition

While extrinsic factors such as available ecological opportunity and within- and between-population dynamics are likely key for explaining variation in diversification rates, the intrinsic factors, underlying features of lineage-specific genome structure, require exploration if we are to understand the phylogenetic propensities for rapid speciation. That is, how often do certain features of genomic “architecture”, such as genome ploidy, rates and patterns of recombination, and inversion frequency, facilitate speciation above and beyond the extrinsic organismal effects of divergence?

The large and well-established effects of sex chromosomes in systems with a heterogametic and homogametic sex are one testament to the role that genome architecture can have on speciation. More generally, since speciation often relies on gene-gene interactions, most explicitly in the case of Dobzhansky-Muller genetic incompatibilities (Appendix C), mechanisms of epistasis and rates of recombination likely impact the probability of speciation. A case for lineage-specific genomic features promoting speciation has, for instance, been made for ray-finned fishes (e.g. Cortesi et al., 2015; Rennison, Owens & Taylor, 2012; Taylor et al., 2003; Venkatesh, 2003), for which Volff (2005) proposed the importance of an early stage of whole genome tetraploidization and subsequent rediploidization, along with the "amazing diversity" of sex determination systems and plasticity of sex chromosomes. After sequencing several lineages of ray-finned fishes, Brawand et al. (2014) found evidence of accelerated evolution among regulatory regions, microRNAs (miRNA) and transposable element (TE) insertions. Even so, the exact mechanistic impacts of these genomic features on speciation are currently unknown. In finding a path forward, we can look to these processes in model organisms, such as work in yeast linking chromosomal architecture and species formation (Leducq et al., 2016), and in rapidly radiating lineages, where we expect to find the recent effects of speciation. In this light, genomic characteristics such as an excess of gene duplications, rapid mutation rates, novel miRNAs, high numbers of transposable elements, and genome-wide diversifying selection on coding and regulatory elements have each been proposed to play differential roles in setting the genomic stage for rapid evolutionary transitions (Brawand et al., 2014).

2.3.2 Genome and Gene Duplications

As a specific example, duplications at the genome (Figure 3A) and gene (Figure 3B) level have frequently been implicated as both drivers and maintainers of speciation (e.g., Evans, 2008; Lynch, Force & Travis, 2000; Otto & Whitton, 2000; Roth et al., 2007; Soltis, Soltis & Tate, 2004; Taylor, Van de Peer & Meyer, 2001). A clear case of clade-specific genomic architecture associated with speciation can be seen in the differential rates of polyploid speciation in plants versus animals. While polyploid speciation is rare in animals, the most recent study on this subject estimated that as many as 15% of speciation events in angiosperms (and twice as many in ferns) are accompanied by ploidy increase (Wood et al., 2009). This difference in propensity for polyploidization (and associated speciation events) does not appear to be due to differences in the initial polyploidization step, and instead is more likely to be related to limitations on regaining a balanced genome (Wertheim, Beukeboom & Zande, 2013). For instance, difficulties may arise in many animals after polyploidization due to the nature of their genetic sex determination systems as well as the disruption of dosage compensation for differentiated sex chromosomes (Orr, 1990; Otto et al., 2000; Wertheim et al., 2013).

Figure 3: Structural Genomic Impacts on Speciation. A) Genome duplications: shown is the process of allopolyploidization, where hybridization between two species with 2N chromosomes (blue versus red chromosomes) produces individuals with 4N chromosomes that are not interfertile with either parent species. B) Gene duplications: when an ancestral pair of duplicated genes (copies marked A and B) is differentially resolved in two isolated populations, a different copy retains functionality (dark boxes: functional copies, light boxes: nonfunctional copies) in each population. Upon hybridization between the two lineages, one-fourth of the F1 gametes and one-sixteenth of the F2 zygotes will not carry any functional copy. C) Inversions can contribute to speciation in several ways due to local reduction of recombination. Inversions are depicted as boxes with thick dark lines within the larger boxes, which represent a stretch of a chromosome. Pairs of dotted lines are interacting genes: coadapted gene complexes or genetic incompatibilities. Blue represents low genetic divergence between populations, orange represents high divergence, and purple (as in the Fuller et al. 2017 model) represents high divergence segregating within populations. In contrast to the other models, higher divergence within inverted regions compared to collinear regions does not need to be a consequence of (differential) gene flow.
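The one-fourth and one-sixteenth fractions quoted for Figure 3B follow from Mendelian segregation, and can be checked by direct enumeration. The sketch below assumes that the two duplicate copies sit at unlinked loci, that one parental population is fixed for a functional copy at the first locus only and the other at the second locus only, and that F2 zygotes come from random union of F1 gametes. The allele labels (uppercase for functional, lowercase for nonfunctional) are hypothetical notation, not the figure's.

```python
from fractions import Fraction
from itertools import product


def f1_gametes():
    """Gametes of an F1 double heterozygote (A/a at locus 1, B/b at locus 2).

    With unlinked loci, the four allele combinations are equally likely.
    """
    return {g: Fraction(1, 4) for g in product("Aa", "Bb")}


def is_null(alleles):
    """True when no functional (uppercase) copy is present."""
    return all(x.islower() for x in alleles)


gametes = f1_gametes()

# Fraction of F1 gametes carrying no functional copy at either locus.
null_gametes = sum(p for g, p in gametes.items() if is_null(g))

# Fraction of F2 zygotes (union of two F1 gametes) with no functional copy.
null_zygotes = sum(
    p1 * p2
    for (g1, p1), (g2, p2) in product(gametes.items(), repeat=2)
    if is_null(g1 + g2)
)

print(null_gametes, null_zygotes)  # 1/4 1/16
```

Linkage between the two loci would change these fractions, so the quoted values should be read as applying to the unlinked case.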

At a finer scale, gene duplications may also promote speciation via the divergent resolution of duplicates. The “differential resolution” of gene copies after a gene duplication event (that is, a different gene copy degenerates in each of two diverging populations) may represent a powerful, general mechanism underlying hybrid dysfunction (Lynch et al., 2000). Indeed, this process has been posited (Li et al., 2013) and even demonstrated (Bikard et al., 2009; Mizuta, Harushima & Kurata, 2010) to cause hybrid incompatibilities (Figure 3B). More generally, rapid evolution of gene duplicates, for instance due to a reduction of purifying selection for one of the gene copies, may render them likely candidates for barrier loci. For example, it was recently discovered that a duplication in a crucial gene is at the root of hybrid lethality between two sympatric species of Mimulus (Brandvain & Matute, 2018; Zuellig & Sweigart, 2018). These findings utilized gene mapping and gene expression experiments which required the genomic characterization of both species. In practical terms, this work offers a prime example of the premise that having a sequenced genome for a given species, or for a phylogenetically related lineage (Gnerre et al., 2011), is fundamental to understanding the biological mechanisms that underlie the genomic loci that differentiate species, which in turn may provide mechanistic insight into the speciation process as a whole.

Several studies have discovered evidence pointing to the role of copy number variation (CNV) in the process of speciation. An interrogation of the pig (Sus scrofa) genome, along with several related species, showed that CNVs are evolving faster than SNPs and that the CNVs often contain olfactory receptor (OR) genes which could be vital to mate recognition (Paudel et al., 2015). In a similar example, genes that determine butterfly chemosenses (ionotropic receptors, IRs) were identified as divergent between species pairs (van Schooten et al., 2016). The effects of CNVs on speciation can also be indirect. Recently a study of several genera found that CNVs on sex chromosomes were responsible for rapid changes to sex ratio (O'Neill & O'Neill, 2018). These changes happen very quickly, making them responsible for the development of hybrid incompatibilities if two populations are in allopatry, ultimately leading to speciation. All of this work relies on the accurate assembly of genomes and high-quality sequencing to measure genome-scale changes between a small sample of individuals, work that has not been possible, or more crucially scalable, until very recently.

We are, however, still far from being able to explain differences in polyploidization propensities in any detail (Soltis et al., 2010). In particular, with short-read next generation sequencing technologies, it has been difficult to accurately assemble duplicated regions of the genome (Ellegren, 2014). These technical barriers are steadily falling away, however, as sequencing technologies continue to become more efficient, accurate, and affordable. By improving genome characterization, improved technology has conferred new power to investigate gene duplications as indicators of speciation, even in species for which assembled genomes are as yet unavailable. Along with the increasing number of organisms with sequenced whole genomes, the recent improvement in long-read sequencing technologies and single cell sequencing is allowing for the identification of duplicated genes in non-model genomes (Larsen, Heilman & Yoder, 2014). Longer reads, such as those produced by single-molecule sequencing, are able to differentiate gene copies by their surrounding genomic sequence, and consequently make genome characterization possible for even difficult regions of the genome without a reference sequence (Jiao & Schneeberger, 2017). Furthermore, the recent adoption of techniques such as optical mapping, Hi-C, and linked reads (see Appendix A) now makes it possible to accurately assemble repetitive regions across hundreds of kilobases. These technical advances are thus rapidly shifting the field of speciation genomics towards greater methodological and theoretical sophistication.

2.3.3 Chromosomal Inversions

The potential of structural genomic features to promote speciation (Noor et al., 2001a) was perhaps first appreciated with the observation that chromosomal inversions may play a special role in facilitating hybrid sterility, and thus incipient RI (Figure 3C).

Chromosomal inversions can promote reproductive isolation by two fundamental means. The first is structural: heterozygous inversions may (partially) prevent proper chromosome pairing during meiosis, such that hybrids between populations fixed for alternative orientations suffer from reduced fertility (White, 1978). This hypothesis was tested in Drosophila as early as Dobzhansky’s (1933) classic work, but runs into difficulties explaining why such an inversion would spread in the first place (reviewed by Hoffmann & Rieseberg, 2008). The second means by which inversions may promote speciation is by suppressing recombination. In the face of gene flow, this prevents the uncoupling of allelic combinations present in the inversion, and these combinations may include, for instance, co-adapted gene complexes, male trait and female preference combinations, and Dobzhansky-Muller incompatibilities. Given that recombination is the central challenge for modeling speciation-with-gene-flow (Felsenstein, 1981), extreme recombination suppression, such as that caused by inversions, may be expected to facilitate this potentially common mode of speciation.

Noor et al. (2001a) demonstrated that inversions create linkage groups among genes that cause sterility among a pair of Drosophila species (D. persimilis and D. pseudoobscura), prompting the development of a model wherein hybrid incompatibility genes (Noor et al., 2001b; Rieseberg, 2001) accumulate indiscriminately throughout the genome during allopatric divergence, but are retained only in inversions when gene flow is resumed during secondary contact (Figure 3C). Two related models posit that inversions may also promote speciation with primary gene flow, by allowing and incompatibilities to disproportionately build up in inverted regions (Navarro & Barton, 2003), or by allowing inversions with co-adapted loci to spread (Charlesworth & Barton, 2017; Kirkpatrick & Barton, 2006). The preceding models invoke gene flow to explain the widely-observed pattern of substantially higher divergence within as opposed to outside of inversions. It was recently shown, however, that fixed inversions between the same pair of Drosophila species, as in Noor’s landmark studies, all segregated long before speciation, indicating that ancestrally segregating inversions may be prone to accumulating incompatibilities regardless of the presence of gene flow (Fuller et al., 2017). This is reminiscent of two recent studies on Atlantic Cod, where an ancient inversion is associated with parallel divergence in migratory phenotypes on both sides of the Atlantic Ocean (Kirubakaran et al., 2016; Sinclair-Waters et al., 2017).

In the European corn borer moth, Ostrinia nubilalis, a chromosomal inversion on the Z-chromosome is strongly associated with the accumulation of ecologically adaptive alleles and genetic differentiation across nearly 20% of the length of the chromosome (Wadsworth, Li & Dopman, 2015; Yasukochi et al., 2016). The authors posit that in lepidopterans, chromosomal divergence may involve two phases: first, a transient origin through local adaptation, and second, a stable persistence through differential introgression; a similar scenario may well play out in other groups (Conflitti et al., 2015). In addition to paracentric inversions, other chromosomal rearrangements such as pericentric inversions, reciprocal translocations, fusions, and polyploidization appear to be evolving at high rates in several groups of "notorious speciators" such as Mimulus (Fishman et al., 2013), fish (Cioffi et al., 2015), and butterflies (Arias, Van Belleghem & McMillan, 2016; Sichova et al., 2015). Remarkably, in a recent comparative study, the number of fixed inversions between closely-related species of songbirds was most strongly predicted by whether or not the species overlap in their geographic range (Hooper & Price, 2017).

Though it is clear that inversions can promote speciation through recombination suppression, this still leaves open the question as to whether their contributions differ qualitatively from strong selection or non-inversion related variation in recombination rates across the genome. In a simulation study, Feder and Nosil (2009) found that strong selection acting on these genes was just as effective in driving divergence as were the large differences between inverted and colinear regions of diverging genomes. Further simulations refined this result by showing that the effects of inversions were most pronounced when fixed in populations prior to secondary contact, with subsequent RI maintained by adaptive change involving many genes with small fitness effects (Feder, Nosil & Flaxman, 2014). Charlesworth et al. (2017) recently showed that, as may be intuitively expected, the propensity of an inversion to promote speciation depends strongly on the magnitude of the reduction in recombination rate, which may be small when co-adapted loci are already tightly linked (see also Ortiz-Barrientos & James, 2017). It should also be noted, however, that very few genetic elements have been identified that are localized within inversions that conclusively contribute to reproductive isolation. Thus, though spontaneous inversions remain one of the most compelling genomic features associated with rapid speciation, the precise mechanisms by which this is accomplished remain elusive.

2.4 Genomic Differentiation and Barrier Loci

2.4.1 Genomic Differentiation Islands

For decades, the prevailing view was that RI developed as a byproduct of independent evolution through the progressive substitution of incompatible alleles in geographically isolated populations, leading to speciation via postzygotic genetic incompatibilities (i.e., Dobzhansky-Muller (D-M) incompatibilities; Via, 2001). Conversely, under a model of speciation with gene flow, lineages are expected to show "profound genetic similarity" (Via, 2001:381), differing only at a few loci, presumably those conferring reproductive isolation. As Nosil, Harmon and Seehausen (2009:145) aptly state, "although selection often initiates the process of speciation, it often fails to complete it." Wu (2001:887) was among the first to state that speciation reflects "a process of emerging genealogical distinctness, rather than a discontinuity affecting all genes simultaneously". Under this view of “genic speciation” (Wu, 2001; Wu & Ting, 2004), the process is driven by selection on specific regions of the genome, and RI is frequently incomplete until long after categorical speciation (Gourbiere & Mallet, 2010).

The genic view of speciation has gained momentum with the model of "genomic islands of speciation" (Feder et al., 2012; Malinsky et al., 2015). Originally formulated in an empirical setting (Turner, Hahn & Nuzhdin, 2005), the concept of genomic islands has been more broadly conceived as a case wherein certain regions of the genome (typically, loci under strong selection) will show patterns of divergence even in the face of considerable gene flow. Moreover, surrounding areas of the genome, even if evolving neutrally, can show similar patterns of population divergence (as measured by FST) via the process of divergence hitchhiking (DH, Via, 2012). Theoretically, speciation can thus proceed from a stage wherein genomic islands are small and dispersed throughout the genome, to a later stage wherein genome-wide divergence will occur and the genomic islands are erased (Feder & Nosil, 2010). Under this model, early-stage population divergence is predicted to be characterized by highly heterogeneous genomic divergence, with barrier loci residing in highly differentiated regions of the genome. For the empiricist, this is an attractive model: these predictions provide ideal circumstances for the identification of barrier loci using genome scans. Furthermore, due to increased frequencies of hemiplasy, a genic speciation process has important consequences for the interpretation of phylogenetic and comparative analyses (Appendix B).

In line with the genomic islands model, highly heterogeneous patterns of divergence across the genome have indeed been a ubiquitous feature of what has been identified as the differentiation landscape. However, it has also become clear that the interpretation of these landscapes is highly complicated (Cruickshank & Hahn, 2014; Nachman & Payseur, 2012; Noor & Bennett, 2009; Wolf & Ellegren, 2017). When comparing measures of relative divergence (FST) and absolute divergence (dXY), Cruickshank et al. (2014) found little evidence that "islands of divergence" are actually produced by lack of introgression via gene flow. Rather, they conclude that differentiation islands represent genomic areas of reduced diversity, which are produced by the effects of linked selection regardless of gene flow. Specifically, variation across the genome in recombination rates and the density of functional elements interacts with selection to produce variation in genetic diversity, and regions of low diversity will automatically show higher levels of relative divergence when populations are isolated (Burri, 2017b). Moreover, given that most fitness effects of new mutations are negative, the effects of background (i.e. negative) selection on patterns of diversity across the genome are expected to be substantial, perhaps larger than those of positive selection (Stephan, 2010), further reducing the likelihood that such regions are commonly important for speciation. Finally, differences in Ne and occurrence in each sex among regions of the genome generate different rates of the accumulation of differentiation that may moreover interact with population size changes (Belleghem et al., 2018), underscoring the need for further development of methods that co-estimate selection and all aspects of historical demography.
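The contrast between relative and absolute divergence can be shown with a small worked example. The sketch below uses a simplified, frequency-based form of FST, (dXY - pi_within) / dXY, computed as a ratio of regional averages and without sample-size corrections; the function names and allele frequencies are made up, and the two regions are deliberately extreme so that the arithmetic is easy to follow.

```python
def pi_site(p):
    """Expected within-population diversity at a biallelic site."""
    return 2.0 * p * (1.0 - p)


def dxy_site(p1, p2):
    """Probability that alleles drawn from two populations differ."""
    return p1 * (1.0 - p2) + p2 * (1.0 - p1)


def region_stats(freqs1, freqs2):
    """Return (FST, dXY) for a region from per-site allele frequencies.

    FST is computed as (dXY - mean within-population pi) / dXY using
    regional averages (a simplified estimator, assumed for illustration).
    """
    n = len(freqs1)
    dxy = sum(dxy_site(a, b) for a, b in zip(freqs1, freqs2)) / n
    pi_w = sum((pi_site(a) + pi_site(b)) / 2.0
               for a, b in zip(freqs1, freqs2)) / n
    return (dxy - pi_w) / dxy, dxy


# Region A: high diversity within both populations, no fixed differences.
fst_a, dxy_a = region_stats([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5])

# Region B: no diversity within populations; half the sites are fixed
# differences and half are monomorphic.
fst_b, dxy_b = region_stats([1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])

print(fst_a, dxy_a)  # 0.0 0.5
print(fst_b, dxy_b)  # 1.0 0.5
```

Both regions have the same absolute divergence, yet FST moves from 0 to 1 purely because within-population diversity differs. This is the sense in which low-diversity regions "automatically" appear as differentiation islands on an FST scan.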

The emerging consensus appears to be that "islands of differentiation" are more commonly caused by processes unrelated to speciation, and thus do not by themselves provide evidence for a genic process of speciation. A further problem with the genomic-islands-of-speciation metaphor, and the underlying model, is that in many cases, such islands need not arise at all during speciation. For instance, when speciation proceeds without gene flow or is underpinned by polygenic adaptation (Feder et al., 2012; Feder et al., 2010), this model is not particularly relevant. Nevertheless, in systems where speciation may genuinely be characterized as genic, that is, with high levels of gene flow and a limited number of barrier loci, such islands may be likely to contain barrier loci. As a recent example, in a comparison of more than 100 populations from eleven species of stick insects (genus Timema), investigators identified a strong correlation between genomic islands and localized differentiation of loci underlying color differences under ecological selection (Riesch et al., 2017). Furthermore, regions with low recombination rates may not only be likely to generate spurious signals of differentiation, but as discussed previously in the context of inversions, regionally low recombination rates may also oppose introgression and promote speciation (Berner et al., 2016; Carneiro, Ferrand & Nachman, 2009; Janoušek et al., 2015; Nachman et al., 2012; Ortiz-Barrientos et al., 2017; Samuk et al., 2017; Schumer et al., 2014).

Noting that current theory is presently dominated by a limited number of model species, perhaps biased by an "adaptationist perspective", Wolf et al. (2017:97) call for a cautious approach for interpreting genomic islands as signals of divergent selection. In their comparison of 67 published empirical studies, these authors found that general conclusions are necessarily hampered by a number of confounding factors, including (but not limited to) differential genome quality, differing life history strategies amongst the lineages examined, as well as differing methodologies such as the chosen genome-scan window size and the methods for identifying outliers. This comparison across a wide phylogenetic range therefore suggests that identifying the genomic causes and consequences of divergent genomic islands will require a more fine-scaled approach. We therefore echo Ellegren’s (2014) prediction that as the field of speciation genomics continues to develop, enhanced genome characterization will provide a richer understanding of the interaction of genotype and phenotype as targets for divergent selection.

2.4.2 Identifying Barrier Loci

Among the motivations for identifying "genomic islands" is to associate patterns of divergence with functional genomic mechanisms that may potentially be driving divergence. One such functional class has been described as "barrier loci." A barrier locus is conceptualized as any gene that contributes to RI and meets the criteria of pre-speciation divergence and measurable effect size (Nosil & Schluter, 2011), and the hunt for “barrier loci” has been active (Ravinet et al., 2017). The term “barrier” makes clear that the locus in question, though contributing to the process, may not by itself be sufficient for irreversible lineage divergence, and “locus” implies that the genetic element in question does not need to be a gene. By determining the specific identity of barrier loci, we can hope to move closer to answering a range of longstanding questions, such as what are the number and effect sizes of barrier loci, what types of genomic regions are involved, what types of mutations are required, and under which evolutionary forces have they evolved (Nosil et al., 2011)? However, arguing that reproductive isolation is an effect rather than a cause of speciation, some have suggested that instead of primarily focusing on reproductive isolation and the genes contributing to it, more attention should be given to the causes and consequences of diverging phenotypes, i.e. “speciation phenotypes” (Shaw & Mullen, 2011). In the context of speciation genomics, both approaches nevertheless come down to establishing links between genotypes and phenotypes, although detecting loci that underlie reproductive isolation may be more straightforward in natural populations and when using top-down approaches such as genome scans (see Figure 4 for an overview of approaches that can be used across different types of systems).

Figure 4: Methods of Studying Genomics of Speciation. Common methods of studying the genomics of speciation, with example publications. Each method is assigned to the category of study system in which it is most applicable. The methods are arranged, from left to right, in increasing order of cost and sophistication. The red methods are suitable for natural populations, blue methods for laboratory studies, and the purple methods are useful in both systems.

Until recently, the identification of barrier loci was predominantly focused on genomic regions contributing to postzygotic isolation. In striking contrast to the current enthusiasm for the role of adaptation and ecology in speciation, most loci that have so
far been linked to such hybrid incompatibilities appear to have evolved in response to internal genetic conflicts as well as neutral mutational mechanisms and recombination hotspots, of which PRDM9, discussed below, is a likely example (Maheshwari &
Barbash, 2011; Presgraves, 2010). Nevertheless, prezygotic isolation may be more likely than postzygotic isolation to be a consequence of ecological and sexual sources of selection, and prezygotic barrier loci are now also beginning to be identified (Ding et al.,
2016). For non-model organisms, especially those that cannot be crossed in laboratory settings, genome scans are the most widely applicable and currently most commonly used method to identify candidate barrier loci (Ravinet et al., 2017). Genome scans examine genetic variation across the genome to find regions with unusual patterns such as strongly elevated genetic differentiation between populations (Beaumont & Balding,
2004; Lewontin & Krakauer, 1973). Ever-decreasing sequencing costs mean that population-level whole-genome resequencing projects are feasible for many non-model organisms, and we stress that given the difficulties described below, this should often be the approach of choice. As discussed in the previous section, differentiation landscapes are commonly highly heterogeneous, with many regions of high differentiation, though most of these do not appear to be directly relevant to speciation. The difficulty in separating highly differentiated genomic regions that harbor barrier loci from those that do not is illustrated by the striking overlap in expected patterns at the genomic level:
both are likely to disproportionately represent regions with low recombination, a high density of functional elements, and signatures of selection.
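As a minimal illustration of the outlier logic behind a genome scan, the sketch below flags windows whose differentiation falls in the upper tail of the empirical distribution. The function name, the window labels, the FST values, and the quantile cutoffs are all assumptions made for illustration and are not taken from any of the studies cited here. A flagged window is only a candidate: as noted above, low recombination and linked selection can produce the same signal as a barrier locus.

```python
from typing import Dict, List


def flag_outlier_windows(fst_by_window: Dict[str, float],
                         quantile: float = 0.99) -> List[str]:
    """Return labels of windows whose FST is at or above an empirical quantile."""
    if not fst_by_window:
        return []
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must lie strictly between 0 and 1")
    ranked = sorted(fst_by_window.values())
    # Position of the cutoff value in the sorted list (simple empirical quantile).
    cutoff_index = min(len(ranked) - 1, int(quantile * len(ranked)))
    threshold = ranked[cutoff_index]
    return [label for label, fst in fst_by_window.items() if fst >= threshold]


# Hypothetical per-window FST values for a single population pair.
example = {f"chr1:{i}": 0.05 for i in range(9)}
example["chr1:9"] = 0.80
print(flag_outlier_windows(example, quantile=0.95))
```

An empirical cutoff of this kind always returns some windows, however homogeneous the landscape, which is one reason outlier status alone is weak evidence for a barrier locus.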

Several approaches may help to identify the processes underlying the formation of a given genomic region that stands out for its high levels of differentiation. First,
Cruickshank et al. (2014) suggested using absolute (i.e., dxy) rather than relative (e.g., FST) measures of divergence, and this has been widely adopted. Nevertheless, dxy has very little power for recently diverged lineages (Burri, 2017a), may be masked by linked selection (Burri, 2017a), and may also be susceptible to demographic changes (Belleghem et al., 2018). Second, comparative approaches that examine differentiation landscapes across several populations or species pairs, including pairs that are known to not exchange genes, enable the identification of unique differentiation islands in the focal pair (Renaut et al., 2013; Roesti et al., 2015; Samuk et al., 2017; Vijay et al., 2016), which are less likely to simply be the consequence of local genomic features (Burri, 2017b). It should be noted that this approach assumes that landscapes of these genomic features, such as recombination rate, are the same across all population pairs. Third, the genomic features that may shape the differentiation landscape can also be characterized separately. This is increasingly possible in natural populations due to improved genome annotations and improved estimation of local recombination rates using information on (Smukowski & Noor, 2011), the latter owing to techniques such as single-molecule long-read sequencing and linked-read sequencing, which enable
better detection of structural variants and also retain haplotype information. Fourth, detailed examination of global and genomically-localized gene flow, as well as signatures of selection, may also allow for a better understanding of a given differentiation island.
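The contrast between relative and absolute divergence in the first approach can be made concrete with a small sketch. It assumes biallelic sites with known allele frequencies in two populations, computes dxy as the mean per-site probability that two alleles drawn from different populations differ, and computes FST in a simple 1 - Hw/Hb form with no sample-size correction. The function names and the example frequencies are hypothetical, and dedicated population-genetic software would be the appropriate tool for real data.

```python
from typing import Sequence, Tuple


def site_between(p1: float, p2: float) -> float:
    """Probability that one allele from each population differ at a site."""
    return p1 * (1.0 - p2) + p2 * (1.0 - p1)


def site_within(p1: float, p2: float) -> float:
    """Mean within-population heterozygosity at a site."""
    return 0.5 * (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2))


def window_stats(freqs1: Sequence[float],
                 freqs2: Sequence[float]) -> Tuple[float, float]:
    """Return (dxy, FST) for a window; invariant sites are entered as 0.0."""
    if len(freqs1) != len(freqs2) or len(freqs1) == 0:
        raise ValueError("need two non-empty frequency lists of equal length")
    between = sum(site_between(a, b) for a, b in zip(freqs1, freqs2))
    within = sum(site_within(a, b) for a, b in zip(freqs1, freqs2))
    dxy = between / len(freqs1)
    # Ratio of sums across sites; defined as 0 when no site differs.
    fst = 0.0 if between == 0 else 1.0 - within / between
    return dxy, fst


# Two hypothetical 10-site windows with the same absolute divergence.
low_diversity = window_stats([1.0] + [0.0] * 9, [0.0] * 10)
high_diversity = window_stats([0.5, 0.5] + [0.0] * 8, [0.5, 0.5] + [0.0] * 8)
print(low_diversity, high_diversity)
```

In this toy case both windows have the same dxy, yet FST is maximal in the window with no within-population diversity and zero in the other, which illustrates why a relative measure alone can mistake reduced diversity for elevated divergence.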

Despite the promise from more widespread adoption of these approaches, a decade or so of widespread genome scans have made it clear that using such scans to identify barrier loci is in many systems very challenging and may in others not even be feasible (Buerkle, 2017; Jiggins & Martin, 2017; Lindtke & Yeaman, 2017). Before embarking on a genome scan approach for identifying barrier loci, it is thus helpful to reflect whether a focal system lends itself to this approach. Systems with relatively low overall divergence, much ongoing gene flow, and within which barrier loci are expected to be few and of large effect, are generally most conducive to the identification of candidate barrier loci (Belleghem et al., 2018; Jones et al., 2012; Malinsky et al., 2015;
Poelstra et al., 2014). However, if barrier loci are likely to be detected through genome scans only in systems with specific biological features, this may mean that these barrier loci are not a representative subset of all barrier loci. It should also be noted that genome scans in the context of speciation research may be worth pursuing even when there is little scope for direct and precise identification of barrier loci. For instance, such studies provide insight into genome structure and its relation to patterns of differentiation, allow the quantification of gene flow both at the global and local genomic level, and provide
insight into the general architecture (rather than the specific identity) of barriers to gene flow (Jiggins et al., 2017).

Hybrid zones—and admixed populations more generally—provide an opportunity to use an alternative set of methods for identifying candidate barrier loci
(Gompert, Mandeville & Buerkle, 2017). First, if candidate barrier phenotypes are known, and these segregate within the admixed population, genotype-to-phenotype links can be assessed by genome-wide association methods such as Bayesian Variable
Selection Regression (BVSR; Gompert et al., 2013; Guan & Stephens, 2011), genome-wide efficient mixed-model association (GEMMA; Delmore et al., 2016; Turner & Harr, 2014;
Zhou & Stephens, 2012), and GenABEL (Aulchenko et al., 2007; Nadeau et al., 2014).

Second, loci exhibiting unusually steep clines can be detected by exploiting spatial clines in hybrid zones (Barton & Gale, 1993; Payseur, 2010; Rafati et al., 2018; Trier et al., 2014), and similarly, yet without relying on spatial patterns, genomic clines across many loci in admixed individuals (Gompert & Buerkle, 2009; Lexer et al., 2007). Finally, when local genomic ancestry can be inferred among admixed individuals, the length of continuous ancestry tracts may offer clues to barrier loci (Sedghifar, Brandvain & Ralph, 2016), and ancestry disequilibrium between locus pairs can be used to test for two-locus genetic incompatibilities specifically (Schumer & Brandvain, 2016; Schumer et al., 2014).
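The idea behind ancestry disequilibrium can be sketched as a correlation of local ancestry between two loci across admixed individuals. The example below is an illustration of the general principle only, not the procedure of the cited studies: it assumes ancestry is coded as the number of allele copies (0, 1, or 2) derived from one parental lineage, and the function name and input data are hypothetical. For physically unlinked loci, a strong correlation is unexpected under random mating and may hint at an interaction, although population structure can generate the same pattern and is ignored here.

```python
from math import sqrt
from typing import Sequence


def ancestry_correlation(locus_a: Sequence[int],
                         locus_b: Sequence[int]) -> float:
    """Pearson correlation of ancestry dosage (0, 1, 2) between two loci."""
    n = len(locus_a)
    if n != len(locus_b) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 individuals")
    mean_a = sum(locus_a) / n
    mean_b = sum(locus_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(locus_a, locus_b))
    var_a = sum((a - mean_a) ** 2 for a in locus_a)
    var_b = sum((b - mean_b) ** 2 for b in locus_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("ancestry is invariant at one of the loci")
    return cov / sqrt(var_a * var_b)


# Hypothetical ancestry dosages for six admixed individuals at two loci.
locus_1 = [0, 0, 1, 2, 2, 1]
locus_2 = [0, 1, 1, 2, 2, 0]
print(round(ancestry_correlation(locus_1, locus_2), 3))
```

In practice such a statistic would be computed for many locus pairs and compared against a null distribution, for example one obtained by permuting individuals, before any pair is treated as a candidate incompatibility.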

If candidate barrier loci are identified, functional approaches are often necessary to validate their effects. Although these approaches will for the foreseeable future
remain limited to organisms that can be kept in laboratory settings, manipulated, and in most cases, bred (such as Drosophila, Cooper & Phadnis, 2016), the recent breakthrough of CRISPR holds great promise for testing candidate genes much more effectively and in a wider range of species than other transgenic approaches (Bono, Olesnicky & Matzkin,
2015). The overriding practical requirement for the application of CRISPR for genome editing is that CRISPR/Cas9 elements can be delivered to early-stage embryos, so that the two repair pathways triggered by the double-stranded breaks induced by CRISPR/Cas9 can be exploited for multiple applications. For example, the non-homologous end joining (NHEJ) repair pathway can induce large deletions that create knockouts, which has been employed in a number of studies that have identified some of the genes underlying several butterfly wing color pattern traits (Mazo-Vargas et al., 2017; Zhang et al., 2017a; Zhang, Mazo-Vargas &
Reed, 2017b; Zhang & Reed, 2016). CRISPR/Cas9 mediated NHEJ can also create other structural variants such as duplications and inversions, which may be particularly useful for testing their potential role in speciation (Bono et al., 2015; Kraft et al., 2015).

Further, homology-directed repair can be used to introduce precise genetic modifications. For instance, Ding et al. (2016) employed CRISPR/Cas9-mediated homology-directed repair both to fine-map and create mutations within the locus responsible for a courtship song difference between two species of Drosophila. This locus has been identified as the insertion of a retro-element in an intron of slo, an ion
channel gene, illustrating both the potential for small mutations to have large effects as well as the unpredictable relationship between gene function and impacts on speciation.

CRISPR techniques are advancing at an astonishing rate and now include base-editing methods that directly convert single bases without requiring the formation of double-stranded breaks, using a catalytically impaired CRISPR/Cas9 mutant (Gaudelli et al., 2017; Nishida et al., 2016). Thus, we are at the early stages of a technological revolution with unforeseeable impacts on the field of speciation genomics.

2.4.3 PRDM9: A Crucial Barrier Locus?

Perhaps the most intriguing candidate barrier locus to have emerged from a suite of recombination hotspot modifiers (Johnson, 2010) is PRDM9 (Brand & Presgraves,
2016; Oliver et al., 2009). Known to be strongly associated with recombination hotspots in placental mammals, PRDM9 is a rapidly evolving zinc finger protein with sequence-specific DNA binding and histone methyltransferase activity. As such, it "neatly wrap[s] genetic, epigenetic, and trans-acting factors known to influence recombination into one intriguing package" (Sandovici & Sapienza, 2010:1). Though as yet only characterized at the population level in a few natural populations, mice (Kono et al., 2014) and humans
(Consortium, 2010), PRDM9 is hypothesized to play two fundamental roles in the genome: to yield a diverse spectrum of recombination hotspots and to cause male hybrid sterility. In this regard, it is noteworthy for mediating both recombination rate and hybrid sterility, which in turn raises the tantalizing possibility that these
activities are potentially causative to RI and speciation (Kono et al., 2014; Payseur, 2016).

Moreover, it appears to be something of a smoking gun connecting rates of recombination directly to rapid rates of sequence evolution associated with strong positive selection (Axelsson et al., 2012; Gravogl, Schwarz & Tiemann-Boege, 2014;
Groeneveld et al., 2012; Kono et al., 2014; Oliver et al., 2009; Padhi et al., 2017; Ponting,
2011; Sandovici et al., 2010; Schwartz et al., 2014).

As support for this hypothesis, in vertebrate groups such as birds that lack
PRDM9, interspecific hybridization appears to be more feasible across larger evolutionary distances (Singhal et al., 2015) than in mammals. Oliver et al. (2009) found that concerted evolution and positive selection have united to drive rapid evolution of the gene in rodents, producing high levels of sequence variation across 13 rodent genomes. These authors also found that PRDM9 plays a measurable role in determining male sterility both within and among species as diverged as rodents and primates.

Broad phylogenetic surveys of PRDM9 suggest that it may be the most rapidly evolving gene in humans and other animals (Ponting, 2011), and in a survey of 64 individuals across 18 species of primate, 68 unique alleles were identified (Schwartz et al., 2014). Of particular interest to human evolution, alignments of Neanderthal and
Denisovan genomes reveal that PRDM9 sequences in these extinct species are closely related to present-day alleles in modern humans that are both rare and specific to
African populations (Schwartz et al., 2014). By far, however, the most intensive and
compelling work on PRDM9's role in hybrid sterility has been conducted in mouse
(Davies et al., 2016; Kono et al., 2014; Mihola et al., 2009; Smagulova et al., 2016;
Zelazowski & Cole, 2016), with increasing evidence that recombination rate and hybrid sterility are linked and phylogenetically widespread thus pointing to a more general connection to speciation (Payseur, 2016).

Successful attempts have been made to advance PRDM9 research in non-model organisms. The recombination patterns of organisms lacking PRDM9 (e.g., dogs and bees) have been mapped, and in cattle, some preliminary attempts have been made to establish PRDM9 as active in holding up species boundaries (Lou et al., 2014). Tarsiers, the most diverged lineage within the primate clade, show high allelic diversity of
PRDM9 that is highly congruent with phylogeography, thus suggesting an important role in speciation within genus Tarsius, and by inference, haplorrhine primates
(Heerschop et al., 2016). Similarly, the remarkable variation in the zinc finger of
PRDM9 in goats and sheep, wherein numerous amino acid sites are apparently under strong positive selection, has also been interpreted as evidence of the gene's intriguing role in speciation (Padhi et al., 2017).

As new understanding of the molecular evolutionary dynamics of PRDM9, and of recombination hotspots in general, emerges, an ever more nuanced view of its role in speciation is developing. Although it has long been known that the locations of recombination hotspots are highly mobile and are rarely conserved even between
closely-related species (e.g., Ptak et al., 2005; Winckler et al., 2005), the impact on hybrid sterility is only now coming to light. For example, Davies et al. (2016) found that hybrid sterility between two mouse lineages could be instantaneously reversed by
"humanizing" the PRDM9 allele. When the PRDM9 array was genetically engineered in one lineage to represent the human sequence, the genomic position of recombination hotspots was accordingly rearranged, and surprisingly, yielded fertile male hybrids.

Thus, one of the key findings of this study is that though PRDM9 shows a direct involvement in hybrid infertility, the effects are likely to be evolutionarily transient. In other words, increased divergence of PRDM9 is likely to mean a decreased role in the maintenance of species boundaries thereby suggesting that there may be a phylogenetic distance "sweet spot" wherein PRDM9 can strongly impact propensity for speciation, but with diminishing impact as phylogenetic distance increases. It will be fascinating to explore this phenomenon in an array of non-model species across a greater phylogenetic breadth.

2.5 Looking to the Future: Is There Hope for a Unified Theory of Speciation?

We have long known that organisms are hierarchically distributed across the tree of life, existing in “bins” that biologists attempt to define as species. These bins have boundaries of varying completeness and clarity, made porous by hybridization and introgression. How these biological "edges" are formed, and how they are maintained, are among the most basic questions in evolutionary biology. The
relationship between genomic differentiation and lineage diversification is profoundly complex, and can range from circumstances wherein speciation is virtually instantaneous owing to possibly random genomic events such as chromosomal inversions, to scenarios of rapid speciation in situ owing to strong environmental selection, to speciation on evolutionary timescales wherein differentiation and reproductive isolation slowly build in geographic isolation. Thus, it is not surprising that few, if any, rules have been identified to formalize the role of the genome in speciation.

Intrinsic genomic features such as inversions, gene duplications, recombination patterns, and higher order architecture have all been implicated in speciation, and in many cases, these discoveries occurred initially in lab-based model organisms with well-characterized genomes and tractable life histories. Identification of the genetics underlying reproductive isolation has up to the present been most feasible for postzygotic incompatibilities between pairs of genes in long-studied species such as
Drosophila. We are reaching a point, however, wherein the field is rapidly expanding outward and is discovering the more complex genomic underpinnings of speciation in a wider array of species (Figure 5). As genomic resources have spread to evolutionarily proximate species, mechanisms of speciation are being described in non-model species and natural populations. Accordingly, our view of speciation has become richer and more complex. The interplay among underlying features of the genome, patterns and
processes of speciation, and the ecological surroundings of species will continue to emerge as knowledge of non-model genomics increases, and the field will push ever further toward insights into natural, non-model populations with complex speciation stories — the "unexplored corner" suggested in Figure 5.

Figure 5: Speciation Study Spectrum. A graphic representation of where recent published studies of speciation fall along two continua: 1 - whether the focal species was studied as a lab organism or a natural population (bottom axis of diagram) and 2 -
the complexity that underlies the nature of the genetic mechanism responsible for reproductive isolation, ranging from genic to genomic (left axis of diagram). Studies are represented by an icon depicting species, year of publication, and a short summary with superscript notation of the full citation. The papers' first authors (in chronological order) are as follows: 1- Ting et al. (1998), 2- Noor et al. (2001a), 3- Brideau et al. (2006), 4- Nosil, Egan and Funk (2008), 5- Carneiro et al. (2010), 6- Janousek et al. (2012), 7- Renaut et al. (2012), 8- Martin et al. (2013), 9- Fan et al. (2014), 10- Franchini et al. (2014), 11- Gaither et al. (2015), 12- Janoušek et al. (2015), 13- Wadsworth et al. (2015), 14- Davies et al. (2016), 15- Larson et al. (2016), 16- Leducq et al. (2016), 17- Rastorguev et al. (2016), 18- Toews et al. (2016), 19- Ichikawa et al. (2017), 20- Malukiewicz et al. (2017), 21- Mazo-Vargas et al. (2017), 22- Zuellig et al. (2018).

A unified field of speciation genomics will thus require a multi-pronged approach to speciation dynamics that takes into account intrinsic features of genomic architecture examined in the light of each organism's extrinsic biology and ecology.

There is an essential place for targeted studies that illuminate the role of specific genes or structural variants, while many valuable insights can also be gained through genome scan comparisons, though caution must be applied. Consequently, the continued development of theory and competing models will always be relevant in order to make sense of what will be an ever-increasing torrent of empirical data. By examining the role of the genome in contrasting models of speciation, we will attain powerful insight into the differential effects of historical constraint in the face of ecological opportunity. It is the interplay between these two forces that has produced, and continues to produce, species diversity across the Tree of Life.

3. Geogenetic Patterns in Mouse Lemurs2

Madagascar is one of the most biodiverse and imperiled regions on Earth with a biota that is largely unique to this island nation. The degree to which humans have altered the landscape is extreme; it remains a matter of debate just how much of the island was covered with closed-canopy forest prior to 2,000 years ago, when humans began to colonize the island. Here, we adopt a phylogeographic approach to understand how the dynamic process of landscape change on Madagascar has shaped the distribution of a targeted clade of mouse lemurs (genus Microcebus), and conversely, how phylogenetic and population genetic patterns in these small primates can reciprocally inform our understanding of Madagascar’s pre-human environment. The degree to which human activity has impacted the natural plant communities of Madagascar is of critical and enduring interest. Today, the eastern rainforests are separated from the dry deciduous forests of the west by a large expanse of presumed anthropogenic grassland savanna, dominated by the family Poaceae, that blankets most of the Central Highlands.

Traditional views of human impacts hold that the island was covered by closed-canopy forest prior to human colonization approximately 2,000 years ago, but with aggressive agricultural practices, the landscape was abruptly and dramatically transformed.

2 The contents of this chapter have been published in Proceedings of the National Academy of Sciences: Yoder, A.D., Campbell, C.R., Blanco, M.B., Dos Reis, M., Ganzhorn, J.U., Goodman, S.M., Hunnicutt, K.E., Larsen, P.A., Kappeler, P.M., Rasoloarison, R.M. and Ralison, J.M., 2016. Geogenetic patterns in mouse lemurs (genus Microcebus) reveal the ghosts of Madagascar's forests past. Proceedings of the National Academy of Sciences, 113(29), pp.8049-8056.

Phylogenetic and population genetic patterns in a five-species clade of mouse lemurs suggest that longitudinal dispersal across the island occurred until approximately
500,000 ybp when a large ancestral population experienced rapid diversification, resulting in the present-day distributions of several taxa. By examining patterns of both inter- and intraspecific genetic diversity in these species, found in the eastern, western, and Central Highland zones, we find support for the view that the natural environment of the Central Highlands would have been a mosaic of humid forest areas surrounded by a matrix of wooded savanna, forming a transitional zone between the extremes of humid eastern and dry western forest types.

3.1 Introduction

Madagascar is one of the most enigmatic landmasses on earth and has long been identified as a unique biodiversity hotspot (Myers et al. 2000) with exceptional levels of species endemism across plants and animals, including an array of microendemics. It is currently estimated that nearly 100% of Madagascar’s land mammals and native amphibians, 92% of its reptiles, and >90% of its plants are found nowhere else on earth
(Callmander et al. 2011; Vences et al. 2009). Best estimates indicate that the island harbors nearly 5% of Earth’s species level biodiversity even though it comprises only a little more than 0.01% of the planet’s land surface area. This island represents a biological and ecological sample of maximum importance, a veritable treasure trove of species interactions and testable hypotheses. The mechanisms by which so much
biodiversity arose on such a relatively small and remote corner of the Earth beg for phylogeographic investigation, though many of Madagascar’s present patterns of diversity and endemism relate to its historical (Wilmé et al. 2006). A recent and detailed analysis of the interplay of geology, climatic change, and shifting ocean currents concluded that the island is a natural experiment for understanding the effects of “differential and filtration” over the vast expanse of the Cenozoic
(Samonds et al. 2013). Though the evolutionary questions that riddle Madagascar have long been the focus of biologists, there still remain many untested hypotheses and unanswered questions, many of which are being brought within reach by the increasing ease of genetic and genomic technologies.

The eastern trade winds coming off the Indian Ocean and the orographic effect of the island’s eastern mountain range yield a remarkable east-west trend in precipitation across Madagascar (Gautier and Goodman 2003). An evergreen humid forest biome covers portions of the eastern lowlands of the island and extends about 100 km inland up the north-south aligned eastern chain of mountains. At elevations above 800 m and extending well into the island’s interior, the humid forest gives way to the Central
Highlands, which are dominated by moist montane forest. At higher elevations (i.e. above 1900 m), the montane forest habitat gives way to a high elevation Ericaceae thicket. In marked contrast, along the western half of the island, below 800 m elevation and to the west of the limit of the Central Highlands, the montane forests shift to dry
deciduous forest dominated by drought-adapted trees and shrubs (Figure 6). All of these habitats have been extensively degraded and fragmented by human activities over the past few hundred years (Green and Sussman 1990; Harper et al. 2007), and in most areas, particularly the Central Highlands, little of the former natural vegetation remains.

While there is uniform consensus that human activities dramatically transformed the landscape at about 1,000 ybp (Dewar 2014; Goodman and Jungers 2014; McConnell and
Kull 2014; Vorontsova et al. 2016), the pre-human situation remains contested.

Interpretations of the pre-human vegetation based on the current phytogeographic classification of the island, however, rest on relatively recent floristic affinities of plants (Humbert 1955; Perrier De La Bâthie 1921), with little data on natural formations from the Central Highlands. Hence, this classification separates the biomes of the east and west without considering the possibility that the Central Highlands were naturally a vast zone of transition, in turn placing too much emphasis on the extremes of a broad cline.
Figure 6: The Biomes of Madagascar. A map of Madagascar with the major biome types recognized in modern times and sampling locations used in this study for all 30 samples across six species within the genus Microcebus. Sampling counts at a given location varied between one and three individuals.

The eastern humid forest, western dry deciduous forest, and the Central
Highlands zones are now covered in part in what have been referred to as secondary grasslands or savanna composed primarily of Poaceae and specifically the
Gramineae, and subject to frequent burning (Figure 6). Today, the Central Highlands exists as a stark region separating the mesic habitats of the east and the arid habitats of the west and south. It has been the subject of debate whether this grassland formation is entirely the result of human-mediated transformation, or whether it is better viewed as a modified landscape at least in part comprised of native wooded savanna interspersed with areas of forest (Bond et al. 2008; Goodman and Jungers 2014). Our definition of wooded savanna is a mosaic formation with irregular continuous canopy and in certain more open areas lacking relatively dense trees with herbaceous plants perhaps dominated by grasses, very similar to Miombo woodlands of southern Africa. The classic view (hereafter referred to as the “traditional hypothesis”) holds that Madagascar was blanketed by closed-canopy forest across its entirety prior to the arrival of humans, ultimately leading to a 90% reduction in natural forest cover entirely through human agency. Supporting this hypothesis, a recent stable isotope analysis from calcium carbonate cave deposits from Anjohibe Cave in the northwest found evidence for a rapid and complete transformation from a flora dominated by C3 plants to a C4 grassland system over the course of a single century and coincident with early human habitation of this portion of the island (Burns et al. 2016). The authors of this study therefore
conclude that early anthropogenic activities were the primary forces driving landscape transformation, with rapid and extreme natural habitat loss, presumably a mixture of closed canopy forest or wooded savanna, being a direct consequence of early human habitation. This view leaves unanswered the degree to which intermediate habitats would have been subject to Quaternary climatic vicissitudes and the impacts of natural fires prior to the arrival of humans, which is estimated to have occurred about 4,000 ybp
(Dewar et al. 2013; Dewar and Wright 1993).

Separate investigations present an alternative narrative of the central plateau.

Analyses of species richness, endemism, and of the grasses of Madagascar provide evidence that these species occurred naturally, were widespread, and their presence on the island predates humans, going as far back as the Neogene (Vorontsova et al. 2016). Further extending these results, the savanna ecology that is presently found on the Central Highlands is a remnant of former graminea vegetation, the aforementioned wooded savanna. This evidence and accompanying hypothesis will be hereafter referred to as the “grassland hypothesis”, for the sake of brevity.

Finally, an intermediate and more nuanced view of pre-human vegetation in
Madagascar, hereafter referred to as the “mosaic hypothesis,” holds that the Central
Highlands were characterized by a mixture of closed canopy forest and wooded savanna (as defined above, but occurring more recently in the Quaternary) with some
regions being open and others closed with dense tree growth (Goodman and Jungers
2014). This view of Madagascar’s pre-human vegetation has been developed over years of study focusing on the analysis of pollen spectra and charcoal influx from lake sediments across the Central Highlands (Burney 1986, 1987, 1993; Burney et al. 1997).

The overarching question addressed by the present study is what was the role of the natural Central Highland habitats as a dispersal corridor and an ecological cline between the extremes of the humid east and dry west?

We examine phylogenetic and “geogenetic” (Bradburd et al. 2016) patterns in mouse lemurs (genus Microcebus) to determine both the timing and directionality of dispersal events between eastern and western Madagascar. We also investigate patterns of genetic structure within two broadly distributed species, one from western
Madagascar, and one from the Central Highlands, to determine if there are differential signatures of continuous versus discontinuous habitat in the two species. Such differential patterns will have potential consequences for the interpretation of the pre-human landscape and its impact on mouse lemur evolution. If the traditional hypothesis applies, and the grassland savannas of the Central Highlands are an entirely recent phenomenon, we would expect to see extensive and recent patterns of connectivity between what are considered distinct eastern and western biomes. Alternatively, if the grassland hypothesis holds, we should observe a strong and ancient separation between
the eastern and western biomes. If, however, recent models of a longstanding and persistent environment comprised of mosaic wooded grassland and isolated forested areas are accurate, it is probable that the pulsing cycles of warm and wet versus cold and dry conditions characteristic of the Quaternary left genetic signatures of population isolation and reconnection on forest-dwelling mammals (Wilmé et al. 2006).

Ongoing work from the past two decades has shown conclusively that although the morphological differences among species of mouse lemur are subtle, their genetic and ecological differences are consistent with a species radiation (Craul et al. 2007;
Ganzhorn and Schmid 1998; Heckman et al. 2007; Hotaling et al. 2016; Lahann et al.
2006; Louis et al. 2006; Rakotondranary and Ganzhorn 2011; Rasoloarison et al. 2000;
Rasoloarison et al. 2013; Schmid and Kappeler 1994; Thalmann and Rakotoarison 1994;
Weisrock et al. 2010; Yoder et al. 2000; Zimmermann et al. 1998), with the basal radiation perhaps occurring as long as 9 - 10 Ma (Yang and Yoder 2003; Yoder and Yang 2004).

Several studies have shown strong support for three deep lineages, one that contains M. murinus plus M. griseorufus, another deeply-diverged lineage represented by M. ravelobensis, M. danfossi, and M. bongolavensis, and a third lineage that is comprised of all other mouse lemur species including strong support for a distal subclade with M. berthae, M. rufus, and M. myoxinus (Heckman et al. 2007; Weisrock et al. 2010;
Weisrock et al. 2012; Yoder et al. 2000). This latter subclade is especially intriguing given that M. rufus, an eastern humid-forest animal, is markedly divergent both ecologically

53

and geographically from M. berthae and M. myoxinus, both of which occur in western dry deciduous forests. The observation that this relatively recently-diverged clade contains both eastern and western representatives has prompted speculation about the existence of former forested corridors between eastern and the western Madagascar

(Yoder et al. 2000).

Our study aims to examine the relationship of the western and eastern zones and the extent to which the hypothesized Central Highland habitat matrix between these zones, including isolated forest blocks separated by a matrix of wooded savanna, provided conduits for dispersal of forest-dwelling mammals. Here we examine the fit of mouse lemur phylogeography to hypotheses regarding the natural plant community composition of Madagascar’s Central Highlands. To test the fit to the three paleoenvironmental hypotheses described above, we focus on phylogenetic and geogenetic patterns in five species of mouse lemur: two with eastern distributions (M. mittermeieri and M. rufus), two with western distributions (M. myoxinus and M. berthae), and one that has been proposed as limited to the Central Highlands (M. lehilahytsara) (Radespiel et al. 2012). A genome-wide RADseq approach was employed to assess genetic diversity among and within the five targeted species and to test the fit of these data to both spatial and historical predictions associated with the competing hypotheses. The genome-wide SNP data were analyzed with coalescent methods to estimate the species tree structure and its fit to the mtDNA gene tree, as well as to estimate divergence times for the five targeted species.

3.2 Methods

3.2.1 Sample Collection

We analyzed samples from 30 wild-caught mouse lemurs collected at 15 study sites (Table 1). The samples represent five closely related species (M. berthae, M. myoxinus, M. rufus, M. lehilahytsara, and M. mittermeieri) and one outgroup (M. marohita). The 21 existing samples were analyzed, in part, in previous work (Weisrock et al. 2010; Yoder et al. 2000). For this study, we sequenced mtDNA to compare newly collected samples with existing data. Additionally, all 30 individuals were ddRAD sequenced and analyzed at a greater depth than previous technology allowed.


Table 1: Sample Localities. ddRADseq sample ID, species name, and collection coordinates (decimal degrees).

Sample ID   Species                     Latitude   Longitude
RMR93       Microcebus berthae          -20.05     44.63
JMR002      Microcebus lehilahytsara    -16.29     48.82
JMR091      Microcebus lehilahytsara    -18.8      48.35
JMR092      Microcebus lehilahytsara    -18.8      48.35
MBB002      Microcebus lehilahytsara    -18.82     48.31
MBB003      Microcebus lehilahytsara    -18.82     48.31
MBB036      Microcebus lehilahytsara    -18.1      47.19
MBB037      Microcebus lehilahytsara    -18.1      47.19
MBB038      Microcebus lehilahytsara    -18.1      47.19
MBB039      Microcebus lehilahytsara    -18.1      47.19
MBB3235     Microcebus lehilahytsara    -19.72     47.86
MBB3236     Microcebus lehilahytsara    -19.72     47.86
MBB3243     Microcebus lehilahytsara    -19.69     47.77
MBB3244     Microcebus lehilahytsara    -19.69     47.77
RMR34       Microcebus lehilahytsara    -19.1      44.8
RMR96       Microcebus lehilahytsara    -18.48     47.27
RMR97       Microcebus lehilahytsara    -18.48     47.27
RMR99       Microcebus lehilahytsara    -18.48     47.27
MBB001      Microcebus lehilahytsara    -18.82     48.31
RMR138      Microcebus marohita         -20.06     48.18
MBB011      Microcebus mittermeieri     -14.74     49.49
MBB016      Microcebus mittermeieri     -14.74     49.49
RMR189L     Microcebus mittermeieri     -14.47     49.84
RMR192L     Microcebus mittermeieri     -14.47     49.84
JMR027      Microcebus myoxinus         -16.52     44.48
JMR077      Microcebus myoxinus         -19.5      44.73
RMR83       Microcebus myoxinus         -19.25     44.45
RMR86       Microcebus myoxinus         -19.25     44.45
RMR142      Microcebus rufus            -21.5      47.4
RMR144      Microcebus rufus            -21.5      47.4

3.2.2 mtDNA Sequencing

Sequence data were collected from nine individuals for both the cytochrome b (cytb) and cytochrome oxidase II (COII) genes. A 1,140 bp fragment of cytb was amplified using the following primers: TGA-YTA-ATG-AYA-TGA-AAA-AYC-ATC-GTT-G and TCT-CCA-TTT-CTG-GTT-TAC-AAG-ACC-A. Within COII, 684 bp were amplified using primers L7553 and H8320 (Adkins and Honeycutt 1994). Cycle sequencing reactions were performed using BigDye® Terminator V1.1 following standard conditions, and all samples were sequenced using an Applied Biosystems 3730xl at the Duke Sequencing Core resource. Sequences have been submitted to NCBI under accession numbers KX070700-KX070743.

3.2.3 mtDNA Analyses

Our full data set consists of cytochrome b (cytb) and cytochrome oxidase II (cox2) sequences for 332 individuals from 23 species. These sequences include 309 individuals from previous studies as well as new data for 23 individuals sequenced for this study. Analyses were conducted using PAUP version 4.0a149 except where otherwise indicated. The “missdata” command was used to delete 97 taxa containing more than 50% missing data for either gene. Individuals with identical sequences were then merged using the “reducetaxa” command, resulting in a data matrix containing 121 distinct sequences. The remaining sequences were re-aligned using MAFFT v7.272 (Katoh and Standley 2013). Six subsets of sites were designated, corresponding to first, second, and third codon positions of each gene. An optimal partition of these sites was estimated using the “autopart” command (which implements a version of the partitioning strategy described by Lanfear et al. 2012) on an initial tree obtained using the neighbor-joining method (Saitou and Nei 1987) with distances calculated according to the HKY model (Hasegawa et al. 1985). Partitioned models were compared using the Akaike Information Criterion (Akaike 1974) with correction for finite sample size (AICc; Burnham and Anderson 2002). For each of the 203 possible partitioning schemes for six subsets, a set of 56 models was evaluated (corresponding to seven basic nucleotide substitution schemes with variations for equal vs. unequal base frequencies and four combinations of among-site rate variation models). Substitution-model parameters, including parameters used in modeling among-site rate variation, were estimated separately for each partition subset, but branch lengths were shared across subsets. An additional set of parameters scales all branch lengths according to the relative rate for each included partition subset. All parameters of the chosen partitioned likelihood model were estimated by maximum likelihood and then fixed to these estimates prior to tree search. The best of the 203 possible partitioning schemes involved separating sites by codon position but merging both genes for each codon position.

Models chosen by AICc for the three codon positions were SYM+I+G, HKY+I+G, and TrN+I+G, respectively. “SYM” corresponds to the general time reversible model (Tavaré 1986) with equal base frequencies. “HKY” and “TrN” are the models of Hasegawa et al. (1985) and Tamura and Nei (1993). All selected models include an invariable-sites category, with rates at the variable sites drawn from a gamma distribution (+I+G). Relative rates for second codon positions were slowest (0.080), with faster rates for first (0.413) and third (2.510) codon positions. Optimal trees were estimated using PAUP with stepwise addition (10 random addition sequences) and TBR branch-swapping (Swofford et al. 1996). Bootstrapping was performed using RAxML version 8.2.4 (Stamatakis 2014) using the closest available model (GTR+I+G, with partitioning by codon position). Bootstrap results were transferred to the optimal PAUP trees using the SumTrees command available in DendroPy version 4.1.0 (Sukumaran and Holder 2010).
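
The counts quoted above can be checked with a short calculation. The sketch below is illustrative only and is not the PAUP implementation: it counts the ways to group six site subsets (a Bell number), multiplies out the model variants, and shows the standard AICc formula. The function names, the example log-likelihood, and the choice of sample size n are assumptions made for the example.

```python
def count_set_partitions(n_subsets: int) -> int:
    """Number of ways to group n labelled subsets (the Bell number B_n)."""
    # Build the Bell triangle row by row; the last entry of row n is B_n.
    row = [1]
    for _ in range(n_subsets - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


# Model variants described in the text: 7 substitution schemes,
# equal vs. unequal base frequencies, 4 among-site rate variation options.
N_MODELS = 7 * 2 * 4


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """AIC with the finite-sample correction; k parameters, n observations.

    Treating n as the number of alignment sites is an assumption here.
    """
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


print(count_set_partitions(6), N_MODELS)
# Hypothetical values, for illustration only.
print(aicc(log_likelihood=-100.0, k=2, n=10))
```

Lower AICc indicates a better trade-off between fit and parameter count, which is how the partitioning schemes and models would be ranked.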

3.2.4 RAD Genotyping

ddRAD libraries were generated from whole genomic DNA and whole-genome-amplified DNA following the protocols of Peterson et al. (2012) and Blair et al. (2015). The double digest was completed with the enzymes SphI and MluCI, and IDT primers were used to uniquely barcode the first paired end of all 30 samples. Pooled ddRAD libraries were size selected to a length of 525 bp using a Pippin Prep (Sage Science) at the North Carolina State University Genomic Sciences Laboratory (NCSU GSL). Paired-end 150 bp sequencing on an Illumina NextSeq was also conducted at the NCSU GSL. Two NextSeq runs generated a total of 557,299,621 read pairs of 150 × 150 bp data.

The software package Trimmomatic (Bolger et al. 2014) was used to curate the data for overall quality; we required that 80% of the bases in a sequence read have a q-score of 20 or higher. To assess overlap between the paired-end reads we used Paired-End reAd mergeR (PEAR) (Zhang et al. 2014), which assembled roughly 4% of the read pairs; these merged pairs were discarded from further analysis. We used Stacks (Catchen et al. 2013) to demultiplex reads by unique barcode. Given the overall poor quality of the second paired-end read, we dropped those reads from the analysis and used only the first read. All sequence reads used for analysis are available at the National Center for Biotechnology Information Short Read Archive.
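
The read-level quality criterion described above can be sketched as follows. This is a minimal illustration of the rule "at least 80% of bases with q-score 20 or higher", not the Trimmomatic implementation; the function name and the Phred+33 quality encoding are assumptions.

```python
def passes_quality_filter(quality_string: str, min_q: int = 20,
                          min_fraction: float = 0.80, offset: int = 33) -> bool:
    """Return True if enough bases in a read meet the quality threshold.

    quality_string is the FASTQ quality line; offset=33 assumes Phred+33.
    """
    if not quality_string:
        return False
    # Decode each quality character into its Phred score.
    scores = [ord(ch) - offset for ch in quality_string]
    n_good = sum(1 for q in scores if q >= min_q)
    return n_good / len(scores) >= min_fraction


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(passes_quality_filter("IIII#"))  # 4 of 5 bases at or above Q20
print(passes_quality_filter("III##"))  # 3 of 5 bases at or above Q20
```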

We used the program pyRAD (Eaton 2014) to cluster, align, and call SNPs within the curated ddRAD dataset. Once the final dataset was curated, one M. lehilahytsara sample (MlehiDWW3244) was dropped from further ddRAD analysis because of an excessively high level of missing data; its exclusion did not eliminate a locality. Although this sample constituted only a small fraction of the data within our focal species (M. lehilahytsara), its exclusion more than doubled the number of shared loci when operating under the requirement of no missing data.

3.2.5 RAD Data Analyses

Genetic distances were assessed with the R package APE (Paradis et al. 2004), taking as input the concatenated, aligned ddRAD loci. The R package SpaceMix (Bradburd et al. 2016) was run to determine the samples’ “geogenetic” positions with 95% confidence intervals. SpaceMix was run in two ways: for all 29 individuals, yielding 1,583 SNPs, and for the 15 sample sites, in which we collapsed the individuals from each locality to a single entry in the SpaceMix input matrix, yielding 7,303 SNPs. All SpaceMix analyses were run without any missing data.
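
As an illustration of what a pairwise genetic distance on concatenated, aligned loci looks like, the sketch below computes an uncorrected p-distance (the proportion of differing sites) while skipping sites with missing data. It is written in Python for illustration and is not the APE code; the text does not state which distance model was used, so the choice of p-distance, the function names, and the sample labels are all assumptions.

```python
from itertools import combinations

MISSING = set("N-?")  # characters treated as missing data (an assumption)


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites among sites present in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # skip sites missing in either sample
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else float("nan")


def pairwise_distances(alignment: dict) -> dict:
    """Distance for every pair (dyad) of samples in the alignment."""
    return {(x, y): p_distance(alignment[x], alignment[y])
            for x, y in combinations(sorted(alignment), 2)}


# Hypothetical toy alignment; real input would be the concatenated ddRAD loci.
toy = {"sample_1": "ACGTAC", "sample_2": "ACGTTC", "sample_3": "ACCTN-"}
print(pairwise_distances(toy))
```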


3.2.6 Divergence Time Analyses

The program BPP (Yang 2015), which implements the multispecies coalescent, was used to estimate the species tree topology, branch lengths (τ), and nucleotide diversity (θ = 4Nµ) for the mouse lemur species of interest. The posterior τ and θ values can be converted to geological times of divergence and effective population sizes by using priors on the per-generation mutation rate and the generation time (Angelis and dos Reis 2015). Bayesian analysis using the multispecies coalescent is computationally expensive, so here we used a small dataset to estimate the tree topology, and a large dataset to estimate τ and θ values more precisely by fixing the topology to that obtained with the small dataset.

Small dataset analysis: 82 RAD-seq fragments (11,624 bp) with data for all 30 individuals (from 7 species) were analyzed with BPP to obtain the tree topology (BPP’s A01 analysis). For the species tree prior we used uniform rooted trees. The prior for τ is Gamma(2, 250) and for θ is Gamma(2, 1000).

Large dataset analysis: Six individuals for each species (out of 30) were selected randomly, and then 80,662 RAD-seq fragments (11,247,917 bp) with data for these six individuals were analyzed. The use of a large number of sites leads to narrower (more precise) posterior estimates of τ and θ values, while the use of few individuals guarantees that the Bayesian MCMC analysis can be conducted in a reasonable amount of time (about 2 weeks). The priors on τ and θ were as above. To convert τ and θ values to divergence times (t) and population sizes (N), we sampled values for the mutation rate (µ) and generation time (g) from priors and used these to calculate posterior estimates of t and N. For example, consider the ith MCMC posterior sample τ_i, and samples µ_i and g_i from the priors. Because τ measures divergence in expected substitutions per site, τ_i / µ_i is the divergence in generations, and t_i = τ_i × g_i / µ_i. Thus, by sampling as many values of µ and g as values of τ in the MCMC we can obtain a posterior sample of t. A similar approach is used to obtain the posterior sample of N (see Angelis and dos Reis 2015 for details). The prior on µ was Gamma(27.80, 31.96), which has a 95% prior credibility interval (CI) of roughly 0.5 × 10^-8 to 1.2 × 10^-8 substitutions per site per generation. This matches NextGen estimates of the per-generation mutation rate in the lab mouse (Uchimura et al. 2015) and human (Scally and Durbin 2012). The prior on g was Gamma(100, 26.6), which has a 95% prior CI of roughly 3 to 4.5 years.
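
The conversion described above can be sketched in a few lines. The example assumes the standard relationships t = τ × g / µ and N = θ / (4µ), treats the Gamma priors as shape and rate with µ expressed in units of 10^-8, and uses placeholder τ and θ values rather than any posterior sample from this study. The function names are hypothetical; this is not the code used for the analysis.

```python
import random

MU_SHAPE, MU_RATE = 27.80, 31.96  # prior on mutation rate, in units of 1e-8
G_SHAPE, G_RATE = 100.0, 26.6     # prior on generation time, in years


def divergence_time_years(tau: float, mu: float, g: float) -> float:
    """tau / mu gives generations; multiplying by g gives years."""
    return tau * g / mu


def effective_population_size(theta: float, mu: float) -> float:
    """Solve theta = 4 * N * mu for N."""
    return theta / (4.0 * mu)


def convert_posterior(tau_samples, theta_samples, seed=1):
    """Pair each MCMC sample with one draw of mu and g from the priors."""
    rng = random.Random(seed)
    times, sizes = [], []
    for tau, theta in zip(tau_samples, theta_samples):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        mu = rng.gammavariate(MU_SHAPE, 1.0 / MU_RATE) * 1e-8
        g = rng.gammavariate(G_SHAPE, 1.0 / G_RATE)
        times.append(divergence_time_years(tau, mu, g))
        sizes.append(effective_population_size(theta, mu))
    return times, sizes


# Placeholder values standing in for an MCMC sample (not real results).
times, sizes = convert_posterior([1e-3] * 2000, [4e-3] * 2000)
print(sum(times) / len(times), sum(sizes) / len(sizes))
```

Summaries such as posterior means and 95% intervals would then be taken over the converted samples, so that uncertainty in µ and g carries through to t and N.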

3.2.7 Estimation of average generation time for genus Microcebus

The proper estimation of generation times is fundamental to the divergence time methods described immediately above. Accordingly, we estimated an average generation time for all species of mouse lemurs using a combination of data from both wild and captive populations, yielding an informed estimate of 3.0-4.5 years. Field observations indicate that for eastern rainforest species, age at first reproduction in wild populations is 10-12 months, with females undergoing estrus and reproduction annually thereafter. This is consistent with husbandry records for captive M. murinus at the Duke Lemur Center (data accessible at http://dx.doi.org/10.5061/dryad.fj974#sthash.OdSp6YnB.dpuf). For some species and wild populations, two litters per year can be common, though this is rare in other species and localities (Blanco et al. 2015; Lahann et al. 2006). Captive data for M. murinus at the Duke Lemur Center show that two litters in a single year has occurred only once in 101 litters. Extrapolating from Fig. 7A in Zody et al. (2014), we calculated a survival probability of 0.879 - 0.0714 × (age in years), which when extrapolated to 12 years of age yields 17% for 10 y, 9% for 11 y, 2% for 12 y, and 0% above 12 y. These figures are consistent with observations from captivity. Survival probabilities are considerably lower for wild populations, however. In the dry forests of western Madagascar, individuals appear to live only 2-3 years on average (Hamalainen et al. 2014). Based on capture/recaptures at more resource-rich habitats, such as the eastern rainforests, indications are that lifespans are likely longer, with 4 years being a conservative average lifespan. Information from both captive and wild populations indicates that reproductive senescence does not show major effects until 5 years of age in females; thus, it is likely that the period of reproductive fitness exceeds lifespan in natural populations.
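
The linear survival approximation quoted above is easy to evaluate directly. The sketch below simply applies 0.879 - 0.0714 × age and floors the result at zero; the function name and the flooring at zero (to represent "0% above 12 y") are assumptions made for the illustration.

```python
def survival_probability(age_years: float) -> float:
    """Linear survival approximation from the text, floored at zero."""
    return max(0.0, 0.879 - 0.0714 * age_years)


# Ages at which the text reports extrapolated survival.
for age in (10, 11, 12, 13):
    print(f"{age} y: {survival_probability(age):.3f}")
```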

3.2.8 Comparison of Genetic and Geographic Distance

To initially assess the impact of forest type on population structure and gene flow within species of Microcebus, the genetic distance (see RAD Data Analyses) was compared between pairs of populations separated by similar geographic distances. Individuals were assessed at every possible within-species dyad, and the resulting genetic and geographic distances were plotted to visually inspect for a pattern of isolation by distance (IBD). Given the relatively low sample count of M. myoxinus compared to M. lehilahytsara, the latter was randomly subsampled to conduct a paired, one-sided t-test. Paired short- and long-distance comparisons were tested both individually and together for a total of three separate t-tests, each sampling 1,000 times from the measured M. lehilahytsara dyads.
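
The resampling scheme described above can be sketched as follows. In each of 1,000 draws, the larger set of dyad distances is randomly subsampled to the size of the smaller set and a paired t statistic is computed. This is a minimal illustration under stated assumptions: the pairing of a random draw with the smaller sample is arbitrary, only the t statistic is computed (a one-sided p-value would additionally need the t distribution, for example from a statistics library), and the function names and toy distances are hypothetical.

```python
import math
import random
import statistics


def paired_t_statistic(x, y):
    """t statistic for paired samples, testing whether mean(x - y) > 0."""
    diffs = [a - b for a, b in zip(x, y)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def resampled_t_statistics(larger, smaller, n_draws=1000, seed=0):
    """Repeatedly subsample `larger` to len(smaller) and compute paired t."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_draws):
        draw = rng.sample(larger, len(smaller))  # sampling without replacement
        results.append(paired_t_statistic(draw, smaller))
    return results


# Toy genetic distances for illustration only (not data from this study).
larger_set = [0.1 * i for i in range(1, 21)]
smaller_set = [0.01, 0.05, 0.02, 0.08]
t_values = resampled_t_statistics(larger_set, smaller_set)
print(len(t_values), statistics.mean(t_values))
```

Averaging the per-draw results, as the text does for the p-values, summarizes the comparison without depending on any single random subsample.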

3.3 Results

3.3.1 mtDNA tree

Maximum likelihood trees for concatenated cytb and COII loci are largely congruent with previous analyses (Hotaling et al. 2016; Weisrock et al. 2010). Though the mtDNA analysis sampled only 15 of the 24 currently described mouse lemur species, the fundamental pattern presented in previous studies is repeated here: the phylogeny shows a basal split between a clade formed by M. murinus plus M. griseorufus and all other species. The mtDNA gene tree continues to illustrate the depth and diversity of the mouse lemur radiation through time, and throughout the geographic expanse of Madagascar (Figure 7). The topology is highly similar to phylogenetic results from previous studies and similarly finds all previously described taxa to be resolved as reciprocally monophyletic. The clade containing the eastern M. rufus and western M. myoxinus and M. berthae continues to receive robust support. Moreover, the placement of the Central Highlands species, M. lehilahytsara, shows a well-supported relationship to this distal-most clade, also in agreement with previous studies (Hotaling et al. 2016; Weisrock et al. 2010).


Figure 7: Microcebus Tree with Focal Species Highlighted. Maximum likelihood tree for concatenated mtDNA data (cytb and cox2) from 117 Microcebus sequences plus four outgroup sequences (Cheirogaleus and Mirza; not shown in figure). Bootstrap support is shown for internal nodes (100 replicates). The age of the basal node was previously estimated using phylogenetic methods by Yoder and Yang (2004). The dashed box highlights the five species targeted for genome-wide SNP analysis.

3.3.2 Species tree estimation and divergence times

The species tree generated from the double digest RADseq (ddRAD) approach is largely congruent with the mtDNA gene tree for the five targeted species (illustrated with a dashed box in Figure 7), though notably, M. rufus from southeastern rainforest is resolved as the sister group to M. berthae from the western regions of Madagascar, south of the Tsiribihina River (node B, Figure 8). Data supporting this species tree were generated from ddRAD libraries created from 30 individuals across the six species, uniquely barcoded with inline barcodes in the first of two paired-end reads. Paired-end 150 bp sequencing on two Illumina NextSeq runs yielded 557.3 million reads. After several quality filtering steps (see Methods) the dataset was trimmed to a total of 340.5 million reads across 29 samples for analysis with the software pyRAD (Eaton 2014). pyRAD yielded 124,916 total loci after removal of putative paralogs and implementation of filtering criteria. The 29 samples averaged 57,845 ± 13,901 loci. Among the four taxa with two or more sequenced samples the average number of loci was 60,076 ± 10,387.

Only unlinked SNPs with no missing data were used for analyses of the geogenetic location of samples using the software package SpaceMix (Bradburd et al. 2016). There were 1,583 SNPs without any missing data across the four taxa used for SpaceMix analysis (M. lehilahytsara, M. rufus, M. myoxinus, M. mittermeieri). However, when individuals from the same sampling location were consolidated and analyzed together, this increased to 7,303. We used this latter dataset for all final SpaceMix analyses.

Figure 8: Microcebus tree of genomic SNPs. Maximum clade credibility species tree from a BPP analysis of nuclear RAD-seq data. Node labels correspond to those in Table 2. All nodes in the tree have 100% statistical support. Blue bars are the 95% highest posterior densities of node heights. Species are identified by their contemporary geographic range (eastern versus western); plateau, Central Highlands plateau.

The species tree estimated with BPP (Yang 2015) resulted in a posterior distribution with high (100%) measures of support for all nodes, including a sister relationship for M. lehilahytsara and M. mittermeieri, and a sister relationship for M. berthae and M. rufus. Microcebus myoxinus from western Madagascar was placed as the sister lineage to the M. berthae plus M. rufus clade (Figure 8). Posterior branch lengths (τ) and nucleotide diversity (θ = 4Nµ) were converted to geological times of divergence and effective population sizes by using priors on the per-generation mutation rate and the generation time, revealing that the ancestral node for the targeted species has a posterior mean age of roughly 500K ybp (Table 2). The divergence times between M. lehilahytsara and M. mittermeieri (Node A in Figure 8) and between M. berthae and M. rufus (Node B in Figure 8) are much more recent, with both divergences occurring approximately 55K ybp (though note the considerable uncertainty represented by the 95% credible intervals; Table 2).

Table 2: Ancestral effective population size (Ne) and estimated divergence times.

Node   Ne (×10^3)   95% CI        Divergence Time (ka)   95% CI
E      307          200, 427      545                    321, 796
D      21.5         8.56, 34.5    313                    188, 452
A      79           42.3, 118     52.4                   7.42, 112
C      3.98         1.92, 6.14    290                    174, 421
B      88.4         51.5, 130     17.5                   0.51, 52.1

3.3.3 Geogenetic Analysis

One of the intriguing results of this study was the discovery of a population of M. lehilahytsara from a tiny (< 2 km²) forest patch known as Ankafobe, in close proximity to a series of fragmented forest parcels within the Ambohitantely protected area (Figure 9). Though only three individuals were sampled from this locality, the level of genetic diversity among these animals at the mtDNA locus is greater than 1% and exceeds that of any other single locality from which this species has been sampled.

Figure 9: M. lehilahytsara Sampling Locality Satellite Images. Google Earth view of the Ankafobe sampling locality for M. lehilahytsara. Inset A illustrates the depauperate environment that currently is comprised of anthropogenic grasslands. Areas of green are largely comprised of rice cultivation but suggest the potential for naturally occurring forests along these stream basins. Inset B illustrates the extremely isolated position of the Ankafobe Reserve.

As a potential means for testing this hypothesis, we compare patterns of genetic diversity within M. lehilahytsara and M. myoxinus. The two species have inferred ranges that are equivalent in geographic area (as measured in km²), though in the case of M. myoxinus, there is the null expectation that ancestral habitat would have been largely continuous during the inferred geological timeframe, though it is now highly fragmented due to anthropogenic effects. The ancestral habitat for M. lehilahytsara, on the other hand, is unknown and dependent on the inferred conditions consequent to the “traditional” versus “mosaic” hypotheses. By comparing genetic structure across a geographic range of contiguous forests (M. myoxinus) with that across a range of equivalent area (as measured in km²) spanning the mosaic forests of the Central Highlands (M. lehilahytsara), we see that the two species show strikingly different patterns (Figure 10), and we can infer that the latter has long existed in a mosaic environment. The t-tests yielded statistically significant differences in genetic distance in paired long-distance dyads (average p = 9.80 × 10^-4 across the 1,000 randomly sampled t-tests), in paired short-distance dyads (average p = 8.59 × 10^-3), and when considering long and short dyads together (average p = 1.34 × 10^-4) (Figures 11-13).


Figure 10: Genetic and Geographic Distance Species Comparison. (A) A map of Madagascar with sampling locations for two focal species, M. myoxinus and M. lehilahytsara, hypothesized historical forest type labeled by color, and boxes marking the sampling locations used for short and long geographic distance t-tests of genetic distance. (B) Plot of genetic distance versus geographic distance in within-species dyads, colored as on the sampling location map. The dashed and solid boxes on both the distance plot and sampling map illustrate the short and long geographic distance comparisons of genetic distance being made. The t-tests within each box are statistically significant at the P < 0.01 level (Figures 11-13). The line of best fit and shaded 95% confidence interval, plotted with R, are matched to the species by color.

Figure 11: Short Distance Comparison p-values. The results of t-test comparisons of genetic distance across randomly drawn samples at short geographic distances.

Figure 12: Long Distance Comparison p-values. The results of t-test comparisons of genetic distance across randomly drawn samples at long geographic distances.

Figure 13: Combined Distance Comparison p-values. The results of t-test comparisons of genetic distance across randomly drawn samples at short and long geographic distances combined.

The program SpaceMix was used to visually inspect the historical connections among mouse lemurs currently occupying the eastern and western biomes (Figure 14). SpaceMix represents the “geogenetic” locations of the samples (drawn as colored haloes or ovals) in contrast to their true sample location (the dot of a matching color). The arrow connecting the two points indicates the direction of purported genetic history and admixture, inferred from the difference between the absolute location of the sample and the expected location of the genetic data. Each pair of sampling and geogenetic locations represents a population, summarized by allele count, and the halo corresponds to a 95% confidence interval of the spatial location of the SNP profile. The size of the halo decreases as the number of individuals in the population increases.


Figure 14: Microcebus Samples Inferred Geogenetic Locations. SpaceMix-inferred geogenetic locations of samples based on prior of true sampling site and population-consolidated SNP data. Abbreviated names indicate the geogenetic centroid, and ellipses represent 95% CI of geogenetic location. The true sampling site is labeled with a colored dot, and arrows indicate the geographic direction of genetic pull via admixture.

Western mouse lemurs show two patterns of particular interest. The first is the strong separation of M. myoxinus and M. berthae, consistent with the biogeographic separation of these species by a river and suggesting that the river is a significant barrier to gene flow. Microcebus myoxinus dots are pulled northward and M. berthae dots pushed southward, the opposite of what would be expected in the presence of gene flow between the species. The second pattern of interest concerns the comparison to M. lehilahytsara. The full set of M. myoxinus samples (4 populations) are all pulled to a central location, and all 4 geogenetic confidence intervals overlap. By contrast, in M. lehilahytsara in the east, the populations sampled over a similar geographic range do not share any common geogenetic space. Even excluding the northern M. lehilahytsara population, the six southernmost populations occupy two distinct geogenetic spaces.

There are two stark patterns associated with eastern mouse lemurs. The first, and probably most obvious, pattern is the common geogenetic space shared by M. rufus and M. berthae. Though surprising given that their present geographic distributions are both distant and ecologically distinct, these results are consistent with the species tree, wherein they are shown to be sister taxa. The second pattern is the shift of M. mittermeieri from the northeastern corner of the island back south into the geographic range of M. lehilahytsara. This is further indication that M. lehilahytsara displays remarkable levels of genetic diversity, despite its fragmented distribution, and that whatever we think of M. mittermeieri’s status as a distinct species, it represents only a portion of the variation seen across M. lehilahytsara.

3.4 Discussion

Using samples across five species of Microcebus, we analyzed both mitochondrial DNA and hundreds of thousands of genomic loci via ddRADseq to assess in depth the phylogeographic history of these species. In doing so we aim both to clarify their relationships and to provide insight into the change that has occurred on Madagascar in the last 100,000 years.

3.4.1 Whole Genome Phylogeography

Using whole genome SNP information from the ddRADseq analysis and BPP-based divergence time estimates, we determined a potential range of geological ages for the five targeted species ranging from 500K ybp to 53K ybp (Table 2). The age of the basal radiation of these five species is placed well within the Quaternary, when climatic changes associated with glacial and interglacial periods would have radically and episodically desiccated much of the Central Highlands (Burney 1986, 1987, 1988, 1997; Gasse et al. 1994). Thus, the results of our study would appear to agree with other studies in which Quaternary climatic and vegetation changes have been invoked to explain patterns of diversity and speciation in different groups of lemurs (Wilmé et al. 2006) and mouse lemurs in particular (Olivieri et al. 2008; Schneider et al. 2010). Given these results, it would be reasonable to hypothesize that these mouse lemur species were diverging and dispersing across the island, apparently with little impediment.

3.4.2 mtDNA Diversity

While the sampling undertaken for this work, both locally and more generally, is too limited for definitive conclusions, we discovered an unexpected level of diversity within relatively small forest patches. Were further support to be found for this pattern, it would be consistent with the hypothesis that remnant forest patches such as Ankafobe and Ambohitantely, though clearly impacted by recent degradation due to anthropogenic agency, may have experienced alternating episodes of connection and isolation with similar relict forests across the Central Highlands.

Taking this hypothesis to its logical extreme, patterns of genetic diversity within and among these relictual habitats would thus share a signature of genetic diversification, driven by long-term isolation but punctuated by periods of gene flow.

These fragments could accurately be described as “museums” of genetic diversity within an otherwise desolated grassland savanna with the consequent expectation that patterns of gene flow within M. lehilahytsara would be reflective of a fragmented landscape wherein dispersal would be highly challenging.


3.4.3 Genetic Distances

To assess historical amounts of gene flow among mouse lemurs, and in doing so gain insight into the historical ecology of Madagascar, we compared the relationship between genetic and geographic distances for a species of mouse lemur that currently occupies contiguous forests (M. myoxinus) with one that currently occupies a discontinuously forested range (M. lehilahytsara). At both short and long distances, the genetic distance of M. myoxinus, inhabiting the contiguous forest, was smaller than that of the closely related M. lehilahytsara, occurring in the discontinuous mosaic forests. As M. myoxinus and M. lehilahytsara had been inhabiting these areas for millennia before the arrival of humans, the distinction between these signals provides evidence that the mosaic pattern of the Central Highlands is a long-standing ecological feature.

3.4.4 Geogenetic analyses

The most intriguing results regarding the historical connections of eastern and western biomes are visible in the geogenetic patterns among the five targeted species. A subtle, but important, result is the sharp separation between M. myoxinus and M. berthae. Though their geographic ranges are close, they are actually separated by a river, long thought to be a barrier to gene flow. The sharp separation between these haloes, together with the directional arrows showing the genetic signal “pulling” the samples away from each other, indicates that the river is indeed a barrier to gene flow.


Another interesting pattern occurs when a direct comparison is made between M. lehilahytsara and M. myoxinus. While all four M. myoxinus samples are geographically spread out, their genetic “pull” is centrally into an overlapping range. In contrast, the M. lehilahytsara samples, taken from a similar geographic range, have smaller, distinct genetic ovals that do not share a common overlap or direction. Even excluding the most geographically distant M. lehilahytsara sample, the others do not share an overlap. Directly comparing the two species, these results point to M. myoxinus having more gene flow and shared alleles than M. lehilahytsara and, as one would expect, M. myoxinus having more genetic similarity across the species. This genome-wide PCA method supports the results of the genetic versus geographic distance plots, indicating that there is substantial structure built up in M. lehilahytsara relative to M. myoxinus and that M. lehilahytsara is unlikely to have occupied a contiguous forest in the recent past or, if it did, was not migrating through it thoroughly or evenly.

A final observation can be made regarding M. berthae and M. rufus. Whereas the actual geographic coordinates of M. rufus fall well into the southeastern forests of Madagascar, the genomic DNA signature recovered from the ddRAD data places this species squarely in the center of the island, forming a genetic “bridge” between M. lehilahytsara and M. myoxinus. Notably, this geogenetic signature reflects the results from the species tree, wherein western M. berthae is sister to eastern M. rufus. This is in complete contradiction to the expectation that western neighbors M. myoxinus and M. berthae would share a more recent ancestry than the geographically distant M. rufus. The conclusion that dispersal from east to west (and vice versa) took place during this stage of mouse lemur evolution thus seems inescapable.

3.5 Conclusions

Taken as a whole, the results of this study have both specific implications regarding the timing and geographic patterns of speciation events among the five targeted species of Microcebus and more general implications for the inferred habitat of the Central Highlands during the period of their diversification. Specifically, the genome-wide SNP data indicate that the five target species were undergoing rapid diversification during a period of up to two million years leading up to the onset of the Quaternary period. Moreover, both the phylogenetic and geogenetic patterns recovered by the BPP and SpaceMix analyses yield the unexpected result that M. rufus (or its immediate ancestors) at one point occupied a more central and northerly distribution, though it is now a southeastern rainforest specialist. Thus, this species may at one time have acted as a genetic “bridge” uniting the currently eastern-distributed and western-distributed species. This leads us to the conclusion that natural habitat cover across the latitudinal expanse of the Central Highlands, which we suggest was a mixture of wooded savanna and montane forest, must have been interconnected during this period, and thus quite unlike the current biome patterns encapsulated within the “grassland” hypothesis, where Microcebus does not occur in the Poaceae savanna (Figure 6). More subtly and hypothetically, the current distribution of M. lehilahytsara across the northern quadrant of the Central Highlands, in conjunction with its distinctly more diverged intraspecific patterns of genetic diversity relative to its spatial distribution, suggests that it has long existed in an environment unlike that suggested by the “traditional” hypothesis of continuous forest, and much more like that envisioned by the “mosaic” hypothesis.

By utilizing next-generation sequencing methods and data, we have examined patterns of genetic diversity within and between these five species of mouse lemurs. In doing so, we have resolved the relationships between the species in far greater detail, extending previous findings from a handful of genes to the entire genome and providing new insight into the diversity present within single species. We also found multiple lines of evidence supporting the hypothesis that the Malagasy plateau was historically a mosaic of forests and wooded savanna, a region that existed as a transitory zone for species from the otherwise distinct on either side of the island.


4. Direct measurement of the Mouse Lemur de novo mutation rate

Spontaneous germline mutations are the raw material on which evolution acts, and knowledge of their frequency and genomic distribution is crucial to understanding how evolution operates at both long and short timescales. The rate and spectrum of de novo mutations have so far been directly characterized for only a limited set of organisms, yet it is critical to investigate a wide range of species to examine the generality of the patterns identified so far. Using high-coverage linked-read sequencing of a family pedigree (n=8) of gray mouse lemurs (Microcebus murinus), we estimate the mutation rate at 1.64×10⁻⁸ mutations per basepair per generation, an estimate that is higher than for most previously characterized mammals. Our result underscores the lack of a negative relationship between effective population size and mutation rate in primates (as would be predicted by the drift barrier hypothesis), a pattern that we show does continue to be visible at broader phylogenetic scales.

Unexpectedly, we found little sex bias in the parent-of-origin of mutations, a transition-transversion ratio near 1, and only a modest overrepresentation of mutations at CpG sites. Estimates of the mutation rate critically affect estimates of divergence time, recombination rates, ancestral population sizes, and other fundamental evolutionary processes and parameters. These results both contribute to the understanding of lemur evolution and encourage the continued surveillance of mutation rates across the tree of life.

4.1 Background

Spontaneous germline mutations (SGMs) are single basepair errors that occur in genetic material passed from parent to offspring in any sexually reproducing organism. The rate at which mutations are introduced into genomes is perhaps the most crucial metric of evolution at the genomic level. The accrual of these errors not only provides the raw material for evolution, but also provides us with a metronome-like counter of evolutionary time. Accurate measurement of the spontaneous germline mutation rate is important both for properly contextualizing observed evolutionary patterns and for understanding the evolutionary underpinnings of the rate itself (Ohta 1972, Lynch 2010, Gao 2018). Knowing how fast the genome is changing and, perhaps as importantly, how fast discrete genomic regions are changing, is important for understanding how organisms evolve. Quantifying variation in mutation rate across the tree of life shows how the rate has changed between species and potentially points to the evolutionary setting and pressures species have experienced (Martin 2017). Lastly, by establishing the rate of genomic change in the absence of fossil calibrations or other temporal anchors, investigators can calculate species divergence times directly from genomic data.


Approaches for estimating the rate of mutation in vertebrates generally fall into one of two groups: indirect, or phylogenetically based, estimates versus direct, or "trio-based", estimates (Scally 2012). While the indirect method was the gold standard prior to the advent of next-generation sequencing technologies, these technologies have now enabled a new era of direct mutation rate measurement among previously uncharacterized species (Smeds 2016, Pfeifer 2017a, Feng 2017, Thomas 2018).

The indirect, or phylogenetic, approach was vital to early estimates of mutation rates in vertebrates (e.g., Kumar 2002). But, given that phylogenetic branch lengths are the product of rate and time, their use in indirect methods relies on an independent and external measure of time. In a typical divergence-time study, putatively homologous regions of each genome are aligned, and basepair changes along branches are calculated (Nachman 2000, Zhang 2007, Yang 2006, Drummond 2006, Thorne 2002, Thorne 1998). With temporal calibration from fossils or other external criteria, investigators can then calculate the rate as the number of substitutions per year along specific phylogenetic lineages. Though these methods have been fundamental to our understanding of evolutionary rates (Lynch 2016), they are known to suffer from violations of the molecular clock, inaccuracies in external calibration points, and the difficulty of recovering multiple overlapping changes (i.e., "multiple hits") at any given site (dos Reis et al., 2016). Thus, we must consider these phylogenetic estimates to be imprecise approximations that rely on measuring changes that are putatively fixed between species. Moreover, these estimates are generally insensitive to subtle differences in rate between genomic regions (Scally 2012).

Direct estimation of the mutation rate avoids most of the limitations of the indirect method, as it compares mutations that occur over a single generation by sequencing either pedigreed, related cohorts or known parent-child trios. The direct method does not require an independent measure of the time to most recent common ancestor for the sequences being compared, nor does it require that changes be fixed within a population. Rather, the direct count of mutations occurring from parent to offspring captures the evolutionary process in real time. Given the simple nature of allelic inheritance, the logical assumption might follow that the detection of spontaneous mutations is a counting task, and therefore, the subsequent estimation of mutation rate simply a problem. In fact, the process is quite complicated, with unexpected caveats that we have attempted to anticipate and account for in this study. To date, direct mutation rate estimation studies have been somewhat limited in taxonomic scope (Table 3), but the sequencing and assembly of genomes has paved the way for the direct estimation of mutation rate and the characterization of its genomic spectrum and localization of hotspots.

A reliance on existing genomic resources and infrastructure has contributed to the model organism bias that currently shapes our understanding of mutation rates (Uchimura 2015, Scally and Durbin 2012, Venn 2014, Jonsson 2017). Even without parent-offspring studies, large datasets of short-read sequencing, more common among model species, can be leveraged to infer mutation rates from rare variants (Harris & Pritchard 2017). And because genomes for these organisms have been assembled and thoroughly sequenced many times over, short reads map sufficiently well to yield confident mutation counts. It follows that the theories of how the mutational process changes, both in rate (Lynch 2010) and genomic location (Kim 2006), have been formulated and tested largely within the realm of model species. Even when indirect mutation rate estimates exist, they lack information regarding the genomic spectrum and location of mutations and thus limit insight into evolution.

Many recent studies producing directly estimated mutation rates have confirmed the relationship between the mutation spectrum, hypermutation among exposed CpG sites, and the protective nature of CpG islands. The discovery that methylated CpG sites have higher mutation rates goes back to sequence comparisons made several decades ago (Bird 1980). While CpG sites have higher mutation rates (Erlich 1981, Hwang 2004, Kim 2006, Kong 2012), the C’s and G’s within CpG islands, localized overrepresentations of GC content, are less likely to be methylated and are generally protected from mutations. These islands are often present at the start of genes or near gene promoters. In most mammals studied thus far, CpG sites have been found to contain an overrepresentation of mutations, ranging from 17% up to 26.8% of total genome-wide mutations (Venn 2014, Thomas 2018, Gao 2018), despite comprising less than 10% of the genome. Additionally, because these mutations are due to a chemical deamination process rather than biological replication errors, they are not correlated with parental age or generation time like the majority of mutations in the genome (Kim 2006, Pfeifer 2012, Thomas 2018). Gathering more direct, non-model mutation rate estimates will shed light on how these processes evolve.

Sequencing technologies have recently become more prevalent, powerful, and affordable, and the number and quality of assembled genomes have increased accordingly (Goodwin 2016). With the increased ease and reduced cost of genome sequencing (Mardis 2017), it is now possible to obtain direct estimates of the mutation rate in non-model species (Smeds 2016, Feng 2017, Pfeifer 2017a, Martin 2018), a development that is expanding our understanding of the phylogenetic and genomic parameters by which mutation rates have evolved. Currently, the approach of 10x Genomics (Weisenfeld 2017) stands out as particularly useful for characterizing mutation rates, as it simultaneously offers the high quality of short-read data and the mapping and haplotype information normally associated with longer read lengths. The high-quality short reads provide confidence when calling individual variants, a crucial step in identifying de novo mutations and estimating an overall mutation rate. Linking these short reads together improves mapping to the genome, which decreases the number of incorrectly mapped reads and improves sequence quality; this is especially important in large and repeat-rich genomes (Chaisson 2015). Additionally, the phasing information provided by linked reads allows the parent of origin to be inferred from just two generations of sequencing. By knowing the sequence that surrounds the haplotype containing the mutation, we can assign that haplotype as either maternal or paternal.

In this study, we combined 10x Genomics linked reads with current methods at the forefront of mutation rate estimation (Long 2016, Winter 2018) to accurately estimate the number of detectable sites. Further, we accounted for two sources of error in the inferred mutation rate - the uncertainty of the estimate itself and the potential error introduced by variant calling mistakes.

Our study focuses on the gray mouse lemur, a member of a radiation of morphologically cryptic species distributed throughout Madagascar (Hotaling et al. 2016). It is currently unclear which factors have contributed to diversification in this clade; to study this, we need an accurate temporal context to correctly estimate speciation rates and to correlate speciation events with historical climatic events. Yet, given that there is no fossil record within the clade, or even within the larger outgroup lemuriform clade, it has been a challenge to estimate the age of speciation events. Several dating methods have been attempted: indirect phylogenetic methods have produced estimates of many millions of years (Yang & Yoder, 2003), and coalescent methods have yielded much more recent ages (Yoder et al., 2016). As a conservative first approach in prior work, species ages were estimated with mutation rate priors bounded by mouse and human rate measures (dos Reis et al. 2015, Yoder et al. 2016). This statistically cautious approach was the best available at the time but yielded large uncertainties around speciation age estimates. By measuring the mutation rate directly, we simultaneously expand our knowledge of the rate of evolution generally and allow the estimation of divergence times within this radiation.

Table 3: Mammalian Mutation Rates. Recent direct estimates of mammalian mutation rates. Sources: a, Besenbacher 2019; b, Tatsumoto 2017; c, Jonsson 2017; d, Pfeifer 2017; e, Thomas 2018; f, Uchimura 2015.

Species          Method                      Year    Rate
P. abelii        Pedigree                    2019a   1.66×10⁻⁸
P. troglodytes   Pedigree                    2017b   1.48×10⁻⁸
H. sapiens       Pedigree/Large Scale        2017c   1.29×10⁻⁸
C. sabaeus       Pedigree                    2017d   9.4×10⁻⁹
A. nancymaae     Population                  2018e   8.1×10⁻⁹
M. musculus      Pedigree/Experimentation    2015f   5.4×10⁻⁹

We deeply sequenced a family pedigree of gray mouse lemurs (Microcebus murinus, n=8), including a mother, a father, and two offspring, to accurately identify and count de novo germline mutations and to assign a parent of origin to them. We use long-molecule technology for accurate haplotype reconstruction, which increases call confidence and yields parent-of-origin information for most mutations. With this estimate we aim to test several current theories of mutation rate evolution generally and to improve our ability to estimate species ages within Microcebus specifically. This work thus has the capacity both to improve our understanding of mutation rate across vertebrates and mammals and to provide insight into the evolutionary history of Microcebus and other endemic Malagasy primates.

4.2 Methods and Materials

4.2.1 Samples

Eight individuals were selected from the DLC’s mouse lemur colony: a focal family of four (two parents and two offspring from separate litters), an additional half-sibling of the offspring, and three other individuals in the maternal lineage. Four of the eight were colony founders that had been transferred from the mouse lemur colony in Brunoy, near Paris, France, in 2003.

Blood and tissue samples were collected from these individuals during annual veterinary check-ups. High molecular weight DNA was extracted with the Qiagen MagAttract kit (Qiagen, Germantown, MD, USA), and 10x Genomics library preparation was performed at the Duke Molecular Genomics Core.

4.2.2 Sequencing

The eight individuals comprised nine independent sequencing samples: each individual was sequenced, and the focal paternal sample was prepared twice and sequenced as two separate samples. Libraries were sequenced at the Duke Center for Genomic and Computational Biology (GCB) Sequencing and Genomic Technology Shared Resource across two flowcells (nine total lanes) of a HiSeq 4000. The run was 150 basepairs, paired end, with an average insert size of 554 bp (range: 527-574 bp). A single lane was run first as a test of the 10x Genomics LongRanger analysis software and was analyzed to confirm successful indexing and preparation of the samples. After confirmation, the additional eight lanes (an entire flowcell) were run. All data were combined for downstream analyses (SRA accession number). In total, 933,337,210,328 bases were generated across nine samples and nine lanes.

4.2.3 10x Genomics Pipeline

After sequencing, the Illumina run folder, with basecall (.bcl) files intact, was obtained for demultiplexing and analysis in the 10x Genomics LongRanger v2.2.1 pipeline. Of the 933,337,210,328 bases generated, 828,315,644,211 passed filtering, for an average genomic coverage after filtering of 34.5x across the nine samples. The LongRanger pipeline was run on the Duke Scalable Computing Resource for approximately 2,000 hours (10 days of compute time per sample). Sequences were aligned to the latest gray mouse lemur genome assembly (version 3.0, GCF_000165445.3), and variant calling was handled by the Genome Analysis Toolkit (v3.8) as implemented within LongRanger v2.2.1 (Van der Auwera et al. 2013; Weisenfeld et al. 2017). The mean sample N50 scaffold length generated within the 10x Genomics LongRanger alignment pipeline was 1.18 Mb.

4.2.4 DeNovoGear

LongRanger alignments were used to find de novo mutations in the two focal family offspring. Several methods were used to find mutations. First, the software package DeNovoGear (Ramu 2013) was used to analyze the LongRanger variant call files (.vcf) with default settings. As a follow-up, the software package VarScan2 was run on the LongRanger binary alignment files (.bam) to independently confirm the de novo mutations found with DeNovoGear.

Finally, de novo mutation hits were confirmed by checking them independently against each replicate of the sire. Hits were also checked for absence in all other pedigree individuals, as well as in additionally sequenced samples from a diversity panel of gray mouse lemurs. The final list of mutations was filtered by offspring de novo quality (min DNQ of 100), offspring map quality (min MQ of 50), and depth of coverage (min coverage of 20) in the parents.
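The three thresholds above can be expressed as a single predicate. The following is a minimal sketch only: the `Candidate` record and its field names are hypothetical, the example values are invented, and treating each threshold as an inclusive minimum applied to both parents is an assumption rather than a detail stated in the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; treating them as inclusive minimums is an assumption.
MIN_DNQ = 100           # offspring de novo quality
MIN_MQ = 50             # offspring mapping quality
MIN_PARENT_DEPTH = 20   # read depth required in each parent


@dataclass
class Candidate:
    """Hypothetical record for one candidate de novo call."""
    chrom: str
    pos: int
    dnq: float
    mq: float
    dam_depth: int
    sire_depth: int


def passes_filters(c: Candidate) -> bool:
    """Return True when a candidate meets all three quality thresholds."""
    parents_covered = min(c.dam_depth, c.sire_depth) >= MIN_PARENT_DEPTH
    return c.dnq >= MIN_DNQ and c.mq >= MIN_MQ and parents_covered


# Toy candidates (invented values, not data from the study).
candidates = [
    Candidate("chr1", 1_000, dnq=180, mq=60, dam_depth=35, sire_depth=41),
    Candidate("chr2", 2_000, dnq=90, mq=60, dam_depth=35, sire_depth=41),   # low DNQ
    Candidate("chr3", 3_000, dnq=150, mq=60, dam_depth=12, sire_depth=41),  # dam under-covered
]
kept = [c for c in candidates if passes_filters(c)]
print([(c.chrom, c.pos) for c in kept])
```

Requiring depth in both parents, rather than in either one, is the stricter reading of the criterion; it guards against calling a de novo mutation at a site where one parent's genotype is poorly supported.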


4.2.5 Detectable Mutation Test

We conducted a test to determine the true number of sites at which a mutation would be detectable, hereafter referred to as an “allele drop test” (Long 2016, Winter et al 2018). This test increases the accuracy of the number of base pairs used in the denominator of the mutation rate calculation. It estimates the proportion of sites at which a de novo mutation would have been discoverable by our methods. This test consisted of adding 1,000 known false de novo mutations into the pedigree with the software BAMsurgeon (Lee 2018). These mutations were added as heterozygotes, by changing half the aligned bases in the bam file at a site to the non-reference allele. After this manipulation the same methods were used to find de novo mutations, and the results were mined for the 1,000 sites. By conducting this “allele drop” test we were able to estimate in which fraction of the genome de novo mutations would have been found, were they to have occurred. Going forward with this number as the denominator in the mutation rate calculation yields a more accurate measure of the rate than using total bases, total autosomal bases, or even calculations based on a minimum depth of coverage because it produces an accurate picture of where mutations can be found via our methods.
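The bookkeeping behind the allele drop test can be sketched in a few lines. This is an illustration under stated assumptions, not the scripts used in the study: sites are represented as hypothetical `(chromosome, position)` tuples, the spike-in and call sets are toy values, and the 1 Mb genome size is invented for the example.

```python
def recovered_fraction(spiked_sites, called_sites):
    """Fraction of spiked-in sites that the calling pipeline reported."""
    spiked = set(spiked_sites)
    if not spiked:
        raise ValueError("no spiked sites supplied")
    return len(spiked & set(called_sites)) / len(spiked)


# Toy example: four spiked-in mutations, three of which were recovered.
spiked = [("chr1", 100), ("chr1", 250), ("chr2", 75), ("chrX", 900)]
called = [("chr1", 100), ("chr2", 75), ("chrX", 900), ("chr3", 5)]

fraction = recovered_fraction(spiked, called)

# Scale a (hypothetical) 1 Mb assembly by the recovered fraction to get
# the number of sites at which a real mutation could have been detected.
callable_sites = fraction * 1_000_000
print(fraction, callable_sites)
```

Calls at sites that were never spiked (here `("chr3", 5)`) do not affect the fraction; only recovery of the known insertions matters for the denominator.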

4.2.6 Mutations at CpG sites

CpG islands were identified by two independent methods, and the results were compared, in order to measure the number of mutations within them. First, the EMBOSS software cpgplot (Chojnacki 2017) was run on the latest gray mouse lemur genome (3.0, GCF_000165445.3) to identify regions that met the threshold for a CpG island (200 bp, over 50% CG content). Then, to confirm these annotations, a fasta file of CpG-annotated regions of gray mouse lemur genome 2.0 (GCF_000165445.2) was downloaded from the UCSC genome browser. A BLAST (Altschul 1990) database of the latest gray mouse lemur genome (3.0, GCF_000165445.3) was created, and the annotations were queried against it to determine the coordinates of the 2.0 CpG annotations in the 3.0 genome release used for mapping and assembly. While the concordance between these methods was over 95%, out of an abundance of caution only the CpG islands confirmed via both methods (a total of 67,673 annotations) were used in the analyses.

With these coordinates, in-house bash scripts (see my github) were used to determine the location of each mutation with respect to CpG islands and coding regions.

To measure the distribution of mutations relative to the null expectation, a permutation test was conducted using the 1,000 false mutations generated for the allele drop test. Sets of these mutations were randomly sampled, without replacement, and their distances to CpG islands were compared with those of the true set.
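One way to picture such a permutation test is sketched below, following the description in the Figure 18 caption (random sets of 134 sites drawn from the 840 recovered allele drops, counting how many fall at CG dinucleotides, compared with the 6 observed). The null pool here is a toy: the figure of 16 CG-flagged sites among the 840 is invented (about 1.9% of 840), and the function name, number of permutations, and add-one correction are assumptions rather than details from the study.

```python
import random


def permutation_p_value(pool_flags, n_draw, observed, n_perm=2000, seed=1):
    """One-sided p-value: share of random draws with >= `observed` CG hits."""
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_perm):
        draw = rng.sample(pool_flags, n_draw)  # sampling without replacement
        if sum(draw) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (at_least + 1) / (n_perm + 1)


# Toy null pool: 840 recovered spike-ins, 16 of them flagged as CG sites.
# The 16 is an invented figure, not a count from the study.
pool = [True] * 16 + [False] * 824
p_value = permutation_p_value(pool, n_draw=134, observed=6)
print(f"p = {p_value:.3f}")
```

Drawing from the recovered spike-ins, rather than from the whole genome, means the null distribution already reflects where the pipeline is able to detect mutations.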

4.2.7 Parent of Origin

Custom in-house bash scripts were used in combination with the phased variant call files produced by LongRanger to assign the mutations to a maternal or paternal chromosome. In brief, these methods took as input the three family individuals and a mutation location. The surrounding haplotype that contained the mutation was directly compared to the parental haplotypes at the same location to determine a match. As these individuals are all related members of a colony, dam and sire often shared similar haplotypes. Caution was used when the mutation-bearing haplotype was found in both parents; such mutations were left unassigned, resulting in less than 100% parent-of-origin assignment of mutations.
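The matching logic described above can be sketched as follows. This is an illustrative reconstruction, not the in-house scripts: a haplotype is represented as a hypothetical tuple of alleles at phased sites flanking the mutation, exact matching is assumed, and any haplotype found in both parents or in neither is left unassigned.

```python
from typing import Optional, Sequence, Tuple

Haplotype = Tuple[str, ...]  # alleles at phased sites flanking the mutation


def assign_parent(mutant_hap: Haplotype,
                  dam_haps: Sequence[Haplotype],
                  sire_haps: Sequence[Haplotype]) -> Optional[str]:
    """Return 'maternal', 'paternal', or None when the match is ambiguous."""
    in_dam = mutant_hap in dam_haps
    in_sire = mutant_hap in sire_haps
    if in_dam and not in_sire:
        return "maternal"
    if in_sire and not in_dam:
        return "paternal"
    return None  # shared by both parents, or matched in neither


# Toy haplotypes (invented alleles at three flanking sites).
dam = [("A", "C", "T"), ("G", "C", "T")]
sire = [("A", "T", "T"), ("G", "C", "T")]

print(assign_parent(("A", "C", "T"), dam, sire))  # unique to the dam
print(assign_parent(("A", "T", "T"), dam, sire))  # unique to the sire
print(assign_parent(("G", "C", "T"), dam, sire))  # carried by both parents
```

Returning `None` for shared haplotypes mirrors the cautious handling described in the text and is the reason fewer than 100% of mutations receive an assignment.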

4.2.8 BPP

Using BPP (BPP 4.0, Yang 2015), we reanalyzed data from a past study (Yoder et al 2016) to measure the effect of the directly estimated mutation rate on the of mouse lemurs. BPP was run with 3 chains for 20,000 cycles, and convergence was checked by plotting the resulting theta values from each chain. Using BPP’s output and the software bppr (dos Reis, https://github.com/dosreislab/bppr), we plotted the tree with the original mutation rate (a mouse- and human-based prior) and with the directly estimated mutation rate (calculated herein, with confidence intervals). Although the theta prior in the new version of BPP (v4.0) has changed from a gamma distribution to an inverse-gamma distribution, the resulting tree has the same topology and branch lengths as the one originally produced for the previous publication.


4.3 Results

4.3.1 Mutations

To identify de novo mutations, we initially used DeNovoGear (Ramu 2013), then checked the results with the 10x haplotype visualization software Loupe (Weisenfeld 2017). Finally, we confirmed the mutations using an independent mutation identification analysis, VarScan2 (Koboldt et al 2009). In total, we assessed 4,542,770 variants across eight related individuals to discover 134 total de novo mutations in the two focal offspring (Omale and Ofemale). Of these, 125 were located on autosomes and 9 were located on the X chromosome. Across the two sequenced offspring, 71 mutations were found in Ofemale, with seven on the pair of X chromosomes, whereas 63 were found in Omale, with two on the single X chromosome. The average depth of coverage for the 134 mutations was 170 reads (170.2, SD=79.95).

Table 4: de novo Mutations. Number of de novo mutations and implied mutation rate of each focal individual, broken down by chromosomal location and by haplotype.

                                        Chromosome          Haplotype
Focal Individual   de novo Mutations    Autosomes   X       Maternal   Paternal   Rate
Ofemale            71                   64          7       23         27         1.71×10⁻⁸
Omale              63                   61          2       20         24         1.56×10⁻⁸


Figure 15: Focal Family Quartet with Mutations. Focal family quartet with parents (labeled P), offspring (labeled O), and sex (subscript m for male, f for female). Lines represent familial relationships as in a traditional pedigree, with thickness and color reflecting the number and source of de novo mutations passed down (red is from the male parent, blue from the female parent, gray is of undetermined origin). Line color represents source and shading represents destination (lighter shading to Om, darker to Of). Numbers within bars show mutation counts, and the rate of each individual offspring is listed below (see Table 4).

4.3.2 Mutation Rate and Spectrum

We estimate that the mutation rate in this family quartet is 1.64×10⁻⁸ mutations per basepair per individual per generation, or roughly 40 mutations contributed per parent. This estimate is a weighted average of the rates on the autosomes and the X chromosome, with the total number of discovered mutations on each (autosomes, n=125; X chromosome, n=9) divided by the number of sites where a mutation was detectable given our methods (Long 2016, Winter 2018). To estimate the number of detectable sites, we artificially generated 1,000 mutations placed randomly across the entire genome within the offspring sequence data (on the autosomes, 801 of 952 generated mutations were detected, whereas on the X chromosome, 39 of 48 were detected). Thus, we expect that a de novo mutation could have been discovered at 840 of the 1,000 randomly chosen sites, or roughly 84% of the gray mouse lemur genome. Extending this percentage to the entire genome yields roughly 2.1 billion total genomic sites where, if present, a de novo mutation could have been detected: 84.1% of the bases on autosomes and 81.3% of the bases on the X chromosome.
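The arithmetic behind this weighted average can be illustrated with a short calculation. The mutation counts and allele drop recoveries come from the text, and the assembled base counts come from Table 5. The number of haploid copies surveyed (two copies of each autosome per offspring, two X chromosomes in the daughter and one in the son) is an assumption made here for illustration; it is not spelled out in the text, although under it the results round to the rates reported in Tables 4 and 5.

```python
# Assembled bases per chromosome class (Table 5).
AUTOSOME_BASES = 2_323_491_814
X_BASES = 146_735_467

# Detectable sites: assembled bases scaled by allele drop recovery
# (spike-ins detected / spike-ins placed).
AUTOSOME_DETECTABLE = AUTOSOME_BASES * 801 / 952
X_DETECTABLE = X_BASES * 39 / 48


def rate(mutations, autosome_copies, x_copies):
    """Mutations divided by the detectable haploid sites surveyed."""
    sites = autosome_copies * AUTOSOME_DETECTABLE + x_copies * X_DETECTABLE
    return mutations / sites


# ASSUMPTION: haploid copies surveyed. Each offspring carries two copies of
# every autosome; the daughter carries two X chromosomes and the son one.
autosomal_rate = rate(125, autosome_copies=4, x_copies=0)
x_rate = rate(9, autosome_copies=0, x_copies=3)
overall_rate = rate(125 + 9, autosome_copies=4, x_copies=3)
female_rate = rate(71, autosome_copies=2, x_copies=2)
male_rate = rate(63, autosome_copies=2, x_copies=1)

for label, value in [("autosomes", autosomal_rate), ("X", x_rate),
                     ("overall", overall_rate), ("Ofemale", female_rate),
                     ("Omale", male_rate)]:
    print(f"{label:>10}: {value:.2e}")
```

Pooling the counts and the surveyed sites before dividing is what makes the overall figure a weighted average: the autosomes contribute far more detectable sites than the X chromosome and therefore dominate the estimate.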

To calculate the confidence interval around this estimate, we assume that the total number of mutations inherited by an offspring follows a Poisson distribution (Appendix D). Given that an average of 67 mutations were found in each of the focal offspring, the 95% credibility interval for the estimate is between 56.5 and 79.1 mutations per genome, which translates to an interval of 1.41×10⁻⁸ to 1.98×10⁻⁸ mutations per nucleotide site around the estimated rate. To calculate the expected number of false positive and false negative results, we used the duplicated sample to estimate the error rate based on variant presence and absence (Appendix E). These calculations yielded an expectation of 6.46 false positive de novo mutations and 37.51 false negative de novo mutations from the total of 134 de novo mutations and 1.968 billion potential mutation sites.
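As a rough illustration of how a Poisson assumption turns a mutation count into an interval, the sketch below computes an exact (Garwood-style) 95% interval for the pooled count of 134 mutations and halves it to express it per offspring. This is a generic textbook construction chosen for illustration; it is not the procedure of Appendix D and should not be expected to reproduce the reported bounds exactly. The bisection bracket and iteration count are arbitrary choices.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), with terms computed in log space."""
    if k < 0:
        return 0.0
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
               for i in range(k + 1))


def _solve(func, target, lo, hi, increasing):
    """Bisection for func(lam) == target on [lo, hi], func monotone in lam."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_interval(n, alpha=0.05):
    """Exact two-sided interval for a Poisson mean given one observed count."""
    top = n + 10 * math.sqrt(n) + 10  # generous upper bracket
    if n == 0:
        lower = 0.0
    else:
        # Lower bound: P(X >= n | lam) = alpha / 2 (increasing in lam).
        lower = _solve(lambda lam: 1 - poisson_cdf(n - 1, lam),
                       alpha / 2, 1e-9, top, increasing=True)
    # Upper bound: P(X <= n | lam) = alpha / 2 (decreasing in lam).
    upper = _solve(lambda lam: poisson_cdf(n, lam),
                   alpha / 2, 1e-9, top, increasing=False)
    return lower, upper


low_total, high_total = exact_poisson_interval(134)
low_each, high_each = low_total / 2, high_total / 2  # per offspring
print(f"{low_each:.1f} to {high_each:.1f} mutations per offspring")
```

Pooling both offspring before computing the interval uses all 134 events; computing it from a single count of 67 would give a noticeably wider interval.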

The transition:transversion (Ti:Tv) ratio was 1.03, with 68 transitions and 66 transversions. The numbers of strong-to-weak (SW; C/G>A/T) and weak-to-strong (WS; A/T>C/G) mutations were nearly identical (31 and 30, respectively). The two most common categories of de novo mutation were G>A and C>T changes (Figure 16).
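The classification underlying these counts can be sketched briefly. The example below collapses each base change with its reverse complement so that the reference base is always A or C, and classifies a change as a transition when both bases are purines or both are pyrimidines. The function names are illustrative, and the choice of A and C as the reporting strand is an assumption, though it agrees with the example given in the Figure 16 caption (A>C includes T>G).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def collapse(ref, alt):
    """Collapse a base change with its reverse complement (ref becomes A or C)."""
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    if ref in ("G", "T"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def is_transition(ref, alt):
    """Transitions stay within purines (A, G) or within pyrimidines (C, T)."""
    return (ref in PURINES) == (alt in PURINES)


print(collapse("G", "A"))        # reported in the same class as C>T
print(collapse("T", "G"))        # reported in the same class as A>C
print(is_transition("C", "T"))   # pyrimidine to pyrimidine

# Ratio from the counts reported in the text.
transitions, transversions = 68, 66
ti_tv = transitions / transversions
print(f"Ti:Tv = {ti_tv:.2f}")
```

Collapsing by reverse complement reduces the twelve possible single-base changes to six classes, which is why G>A and C>T changes fall into a single column of the spectrum.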

Figure 16: Microcebus murinus mutation spectrum. De novo mutations displayed by type of base change (spectrum) and displayed by location with respect to CpG sites. The base changes have been collapsed by reverse complement for simplicity and in accordance with convention (Venn 2014, Thomas 2018). For example the A>C mutation column includes A>C and the reverse complement T>G.


Table 5: Mutation Rates in Gray Mouse Lemurs. Mutation rates in gray mouse lemurs, broken down by method and chromosome type. Bases reflect total assembled basepairs, not the denominator used for the rate calculation.

Method     Chromosomes         Bases            Rate
Measured   Autosomes (n=33)    2,323,491,814    1.60×10⁻⁸
Measured   X                   146,735,467      2.52×10⁻⁸
Measured   All                 2,470,227,281    1.64×10⁻⁸

4.3.3 Effective Population Size and Mutation Rate

Our estimate of the mouse lemur mutation rate falls above the mean of known mutation rates among the growing number of mammals that have been directly measured (5.4×10⁻⁹ to 1.66×10⁻⁸, mean value = 1.08×10⁻⁸), but within the confidence intervals placed on prior estimates (Scally & Durbin 2012, Besenbacher 2019). Using an MSMC analysis, we estimated the harmonic mean of the effective population size of the gray mouse lemur across the last 50,000 years to be approximately 41,000. At a broad scale, the mutation rate of mouse lemurs is in line with theoretical expectations based on a relationship between effective population size and mutation rate (Figure 17) (Lynch 2010, Sung 2012). However, within primates, this relationship clearly does not hold, with the two species with largest Ne, orangutan and mouse lemur, also having the highest mutation rates.


Figure 17: Mutation Rate and Effective Population Size. The linear relationship between mutation rate and effective population size for animals with directly estimated mutation rates (for data see Table 6). Categories are identified by color.


Table 6: Animal Mutation Rates. Directly estimated animal mutation rates and effective population sizes (Ne) across animals in Figure 17.

Directly Estimated Animal Mutation Rates

Species         Mutation Rate   Ne          Author        Year   Category
Human           1.20×10⁻⁸       1.00×10⁴    Many          2017   Primate
Chimp           1.27×10⁻⁸       1.10×10⁴    Besenbacher   2019   Primate
Gorilla         1.13×10⁻⁸       3.00×10⁴    Besenbacher   2019   Primate
Orangutan       1.66×10⁻⁸       2.68×10⁴    Besenbacher   2019   Primate
Green Monkey    9.40×10⁻⁹       1.20×10⁴    Pfeifer       2017   Primate
Mouse Lemur     1.64×10⁻⁸       4.10×10⁴    This Study    2019   Primate
Mouse           5.40×10⁻⁹       7.50×10⁴    Uchimura      2015   Rodent
Platypus        7.00×10⁻⁹       2.00×10⁴    Martin        2018   Monotreme
Cow             9.70×10⁻⁹       3.70×10⁴    Harland       2017   Bovid
Flycatcher      4.60×10⁻⁹       3.00×10⁵    Smeds         2016   Bird
Herring         2.00×10⁻⁹       4.00×10⁵    Feng          2017   Fish
Bee             6.80×10⁻⁹       4.73×10⁵    Yang          2015   Insect
Butterfly       2.90×10⁻⁹       2.00×10⁶    Keightley     2015   Insect
Fly             2.80×10⁻⁹       1.40×10⁶    Keightley     2014   Insect

4.3.4 CpG Sites

As expected, none of the 134 de novo mutations were found within CpG islands, which constitute roughly 4% of the Microcebus murinus genome. This lower rate matches findings in methylated regions of other mammalian genomes. Six mutations in total were found at CG dinucleotide sites, constituting 4.5% of all de novo mutations. This is roughly double the hit rate of 1.9% expected at random, given the fraction of the genome consisting of CG sites, and is a significant departure from random allotment, as determined via a permutation test (Figure 18). However, this is well below the rate previously cataloged among other studied mammals. In those studies, anywhere from 15-25% of de novo mutations occur at CpG sites, a 10-50x increase over the similar 2% of the genome consisting of CG sites (Hwang 2004).

Figure 18: Mutations at CpG Sites. Permutation test results for the number of randomly placed mutations that occur at CG dinucleotides (from 840 discovered allele drops only). Black line is the true number of mutations found at CG sites (6). Most randomly drawn sets of 134 mutations had six or fewer mutations within CG sites, with two mutations being the most common count (p < 0.05).
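The logic of this test can be sketched in a few lines. The sketch below is not the pipeline used in this study: it assumes that each of the 134 mutations lands on a CG site independently with probability 0.019 (the genome-wide CG fraction quoted above), and the function names are illustrative. It reports both an exact binomial tail and a simulated null distribution.

```python
import math
import random

N_MUTATIONS = 134     # de novo mutations reported in the text
CG_FRACTION = 0.019   # fraction of the genome at CG dinucleotides (from the text)
OBSERVED_AT_CG = 6    # mutations observed at CG sites

def binomial_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def simulated_counts(n, p, replicates, seed=1):
    """Null distribution: place n mutations at random, count those hitting CG sites."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(replicates)]

p_exact = binomial_tail(N_MUTATIONS, CG_FRACTION, OBSERVED_AT_CG)
counts = simulated_counts(N_MUTATIONS, CG_FRACTION, replicates=10_000)
p_sim = sum(c >= OBSERVED_AT_CG for c in counts) / len(counts)
most_common = max(set(counts), key=counts.count)
print(f"exact p = {p_exact:.3f}, simulated p = {p_sim:.3f}, mode = {most_common}")
```

Under these simplifying assumptions the null distribution peaks at two CG mutations and six or more is a rare outcome, which agrees with the pattern described in the caption.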

4.3.5 Parental Origin of Mutations

Using the haplotypes produced by 10x Genomics linked-read sequencing, we were able to assign parental origin for 94 of the 134 (70%) de novo mutations. Among those assigned mutations, 54% (n=51) were found on the offspring’s paternal haplotype while the remaining 46% (n=43) were found on the offspring’s maternal haplotype. Based on these data, we find no evidence for a significant sex bias in mutations among the focal family.
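The text does not state which test underlies this conclusion. One simple way to check it, offered here only as an illustration, is an exact two-sided binomial test of the 51 paternal versus 43 maternal mutations against an even split.

```python
import math

def two_sided_binomial_p(successes, n):
    """Exact two-sided binomial test against p = 0.5 (the null is symmetric)."""
    k = max(successes, n - successes)
    upper_tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * upper_tail)

paternal, maternal = 51, 43          # phased mutations reported in the text
n_phased = paternal + maternal
p_value = two_sided_binomial_p(paternal, n_phased)
print(f"{paternal}/{n_phased} paternal ({paternal / n_phased:.0%}), p = {p_value:.2f}")
```

A p-value far above 0.05 is consistent with the statement that these counts give no evidence of a sex bias, although with only 94 phased mutations the power to detect a modest bias is limited.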


4.3.6 Improved Timetree

Using the directly estimated mutation rate, we have recalculated the most recent genus-level timetree (i.e., age-calibrated phylogeny). Applying the newly estimated rate of 1.64 × 10^-8 mutations per generation significantly reduces the uncertainty in age estimates around the speciation nodes (Table 7). The faster rate also has a pronounced effect on the divergence times, placing them in the more recent past (Figure 19).

Table 7: Improved Divergence Time Estimation. Ancestral effective population size (Ne) and estimated divergence times as determined by BPP and bppr. Table includes results for the previous estimate (Yoder 2016) and the directly measured rate, with node labels corresponding to Figure 19.

Directly Estimated Rate

| Node | Ne (×10^3) | 95% CI | Divergence Time (ka) | 95% CI |
| --- | --- | --- | --- | --- |
| E | 155 | 129, 182 | 275 | 198, 357 |
| D | 10.8 | 5.29, 14.5 | 157 | 117, 201 |
| A | 39.8 | 25.9, 53.1 | 26.4 | 4.79, 54.8 |
| C | 2.01 | 1.19, 2.67 | 146 | 107, 187 |
| B | 44.5 | 30.8, 57.7 | 8.84 | 0.30, 26.0 |

Previous Rate Estimate

| Node | Ne (×10^3) | 95% CI | Divergence Time (ka) | 95% CI |
| --- | --- | --- | --- | --- |
| E | 307 | 200, 427 | 545 | 321, 796 |
| D | 21.5 | 8.56, 34.5 | 313 | 188, 452 |
| A | 79 | 42.3, 118 | 52.4 | 7.42, 112 |
| C | 3.98 | 1.92, 6.14 | 290 | 174, 421 |
| B | 88.4 | 51.5, 130 | 17.5 | 0.51, 52.1 |
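A quick arithmetic check on Table 7 shows how the two calibrations relate. Because multispecies coalescent parameters are estimated in mutation-scaled units, converting them with a faster rate should shrink both Ne and divergence time by about the same factor, assuming the generation time is unchanged. The sketch below takes only the point estimates from the table and computes the previous-to-direct ratio at each node.

```python
# Point estimates from Table 7: node -> (Ne x 10^3, divergence time in ka)
direct = {"E": (155, 275), "D": (10.8, 157), "A": (39.8, 26.4),
          "C": (2.01, 146), "B": (44.5, 8.84)}
previous = {"E": (307, 545), "D": (21.5, 313), "A": (79, 52.4),
            "C": (3.98, 290), "B": (88.4, 17.5)}

ratios = {}
for node in direct:
    ne_ratio = previous[node][0] / direct[node][0]
    time_ratio = previous[node][1] / direct[node][1]
    ratios[node] = (ne_ratio, time_ratio)
    print(f"node {node}: Ne ratio = {ne_ratio:.2f}, time ratio = {time_ratio:.2f}")
```

Every ratio is close to two, which is what would be expected if the previously used rate was roughly half of the directly estimated one.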


Figure 19: Mutation Rate’s Impact on Phylogeny. Densitree plot of trees from BPP analysis and bppr plot of previously published mouse lemur RADseq data. Divergence times estimated with either the previous rate estimate (red) or the directly measured rate (blue). Data are also displayed in Table 7, matched by node labels (A-E).

4.4 Discussion

We have estimated the mutation rate of the grey mouse lemur at 1.64 × 10^-8 mutations per generation, which is in line with previous findings among primates (Besenbacher 2019) and, at a broad scale, with expectations from the drift-barrier hypothesis linking mutation rate and effective population size (Figure 17). We categorized the mutation spectrum, which matched that found in other mammals, and determined the ratio of transitions to transversions (Ti:Tv) and the balance of strong-to-weak (SW; C/G>A/T) and weak-to-strong (WS; A/T>C/G) mutations to both be near one.

In inspecting the genomic context of mutations among this quartet, we found an elevated rate of mutations at CpG sites (4.5%) relative to the genome-wide null expectation. Using the long phasing blocks generated by the 10x Genomics linked-read method, we were able to determine the parent of origin for a majority of mutations and found a roughly equivalent contribution of mutations by male and female parents to offspring.

Finally, the estimated mutation rate and 95% confidence intervals allowed the reexamination of the timing of species relationships across the genus, and in concert with previously analyzed data we found that the species were younger than previously thought, including a reduction of some species ages to within 20-50kya.

4.4.1 Mutation rate differences across species

Our estimate is one of the highest direct estimates for mammals, being very close to the recent estimate in orangutans (Besenbacher 2019). Our estimated mutation rate corroborates the recent finding that non-human primates show higher mutation rates than humans (Besenbacher 2019). More generally, within primates, there does not appear to be a strong negative correlation between the effective population size and the mutation rate, as would be predicted by the drift-barrier hypothesis (Lynch 2010, Sung et al 2012). Nevertheless, across a broader phylogenetic scale, this negative relationship is clear (Figure 17). Without a doubt primates have been oversampled relative to other groups, so our own selection and focal biases could be responsible for the distinction between primates and other groups. There is also the distinct possibility that the constraint that effective population size places on mutation rate under the drift-barrier hypothesis acts so slowly that it only appears across large timescales and population size changes. Moreover, given that the mouse lemur is the smallest living primate, with the shortest generation time within the primate clade, this result is a clear violation of recent interpretations of the correlation between these life-history characteristics and spontaneous mutation rates (Thomas et al., 2018).
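The contrast between the broad-scale and within-primate patterns can be illustrated directly from the values in Table 6. The sketch below computes a Pearson correlation between log10(Ne) and log10(mutation rate), once for all fourteen animals and once for the six primates. The log transformation is an assumption made here for illustration; the axes used in the figure are not restated in the text.

```python
import math

# (species, mutation rate, Ne, category), transcribed from Table 6
rows = [
    ("Human", 1.20e-8, 1.00e4, "Primate"), ("Chimp", 1.27e-8, 1.10e4, "Primate"),
    ("Gorilla", 1.13e-8, 3.00e4, "Primate"), ("Orangutan", 1.66e-8, 2.68e4, "Primate"),
    ("Green Monkey", 9.40e-9, 1.20e4, "Primate"), ("Mouse Lemur", 1.64e-8, 4.10e4, "Primate"),
    ("Mouse", 5.40e-9, 7.50e4, "Rodent"), ("Platypus", 7.00e-9, 2.00e4, "Monotreme"),
    ("Cow", 9.70e-9, 3.70e4, "Bovid"), ("Flycatcher", 4.60e-9, 3.00e5, "Bird"),
    ("Herring", 2.00e-9, 4.00e5, "Fish"), ("Bee", 6.80e-9, 4.73e5, "Insect"),
    ("Butterfly", 2.90e-9, 2.00e6, "Insect"), ("Fly", 2.80e-9, 1.40e6, "Insect"),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def log_log_r(subset):
    """Correlation between log10(Ne) and log10(mutation rate) for a set of rows."""
    return pearson([math.log10(ne) for _, _, ne, _ in subset],
                   [math.log10(mu) for _, mu, _, _ in subset])

r_all = log_log_r(rows)
r_primates = log_log_r([row for row in rows if row[3] == "Primate"])
print(f"all animals: r = {r_all:.2f}; primates only: r = {r_primates:.2f}")
```

On these tabulated values the correlation is strongly negative across all animals but positive among primates, which matches the qualitative description above. With only six primates the within-group coefficient should be read as descriptive rather than as a formal test.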

4.4.2 Novel Mutation Rate Characteristics

Though our study has provided evidence in support of several widely held theories of mutation rate evolution, there is also evidence that runs counter to other longstanding findings. The elevated abundance of mutations at CpG sites matches earlier mammalian findings in direction. However, although we found an elevated rate of mutations at CpG sites (4.5%) relative to the genome-wide abundance of said sites (1.9%), this increase was far lower than expected (Gao 2018, Thomas 2018, Venn 2014, Besenbacher 2019) and represents the most striking finding of this study. The ratio of transitions to transversions (Ti:Tv) was 1.03, lower than in other animals (Venn 2014, Smed 2017, Besenbacher 2019). A similar difference was found in the balance of strong-to-weak (SW; C/G>A/T) and weak-to-strong (WS; A/T>C/G) mutations: prior studies have found that more mutations occur away from C’s and G’s, whereas here the two classes were nearly balanced. These trends are linked by a shared impact on the overall mutation spectrum; it is difficult to change the Ti:Tv ratio without also changing the strong-weak ratio. We also found a roughly equivalent contribution of mutations by male and female parents, which differs from the typical male bias found among primates (Gao 2018, Thomas 2018, Venn 2014, Besenbacher 2019). It is conceivable that the changes in mutation spectrum are all mediated, in part, by the differences in mutations occurring at CpG sites. Additionally, the lack of CpG mutations in tandem with the relatively rapid generation time of mouse lemurs is a conceivable explanation for the parentally balanced sourcing of these mutations. All of these discrepancies could be the cause, or the effect, of a greater mechanism that deserves more thorough examination, both to understand the evolution of the mutation rate and the downstream effects that it may have on other genomic processes such as A-to-I RNA editing (An 2019).

Mouse lemurs differ from other tested species in a number of ways, all of which provide possible causative factors for the discrepancy in rate and genomic pattern of the mutations. In the mouse lemur genome, CpG islands are in greater proximity to genes than in other primates (humans and chimpanzees) or mice (Figure 20). Because CpG islands are significantly closer to genes in mouse lemurs, there may be more direct interaction with gene function, leading to more constraint on CpG islands and even at CpG sites that are no longer within the protective cover afforded by CpG islands.

Further, the individuals characterized here come from a breeding colony consisting of many closely related animals, which is not uncommon for laboratory populations such as mice, but provides a contrast with the same measurements in humans. Mouse lemurs also have the shortest time to breeding age, and therefore the shortest generation time, among primates, life history traits that are unique among the censused species. Finally, they are the only strepsirrhines currently measured and are, conservatively, at least 50 million years diverged from other primates. Any or all of these factors could be causes for the different rate, location, and spectrum of de novo mutations, though further measurement of the mutation rate across primates and related mammals is needed to identify potentially confounding variables.

Figure 20: Violin plot of CpG Island and Gene Proximity. Violin plot of the log(distance) from each CpG island to nearest gene for gray mouse lemur (Mmur), human (Hsap), chimp (Ptro), and mouse (Mmus). Gray mouse lemurs have a lower max, 75th percentile, median and 25th percentile than any of the other species.


4.4.3 Diversification rates in mouse lemurs

Beyond the broad goal of shedding light on the evolution of mutation rates, the direct estimation from mouse lemur pedigree data allows for more accurate estimates of species divergence dates than was previously possible in this or other lemuriform species. Notably, the measured rate is higher than the rate previously estimated for use in phylogenetic reconstruction and divergence time estimation, which considerably reduces the estimated time to most recent common ancestor (TMRCA) in mouse lemurs.

Direct and indirect methods of mutation rate estimation often yield different results, with the direct estimate in most cases being higher than the indirect, phylogenetic rate. Knowing both serves two main, related purposes. It can help elucidate the pattern of differences across the tree of life between the direct and indirect rates, which could eventually provide useful information to solve this problem.

Work towards those goals is already underway among the most heavily sampled taxa such as apes (Besenbacher 2019). Knowing both rates helps provide the best estimate of past biological events and generate more accurate predictions of past species.

More accurate timetrees will help us better understand mouse lemur diversification, a desirable goal given that the genus Microcebus is one of the most species-rich clades among all primates (Weisrock 2010). Specifically, this enables investigators to more accurately correlate past climate and geological data with species ages for better understanding the impact of ecological change on speciation rates, as well as for interpreting the impacts of human arrival on the native fauna and flora (Vorontsova 2016). Given the critical and endangered status of these and other lemuriform species, these developments may help motivate conservation efforts and improve the decisions made regarding these policies (Brown 2015).

4.4.4 Methodology and caveats

These findings underscore the importance not only of measuring mutation rate in a variety of species, but also of doing so with linked-read sequencing technology that increases mapping confidence, provides haplotypes for determining sex biases, and overall aids in placing the mutation rate in genomic context. By eliminating potentially spurious sites of the genome by depth of coverage and allele drop simulations, we are confident that this calculation provides a more conservative denominator for mutation rate calculations. The use of a replicated sample also contributes to the conservative nature of this estimate, by requiring that the mutations identified were present regardless of which replicate was analyzed. Despite our best efforts to develop an accurate and conservative method for determining the de novo mutation rate, our results are not immune to problems presented by small sample sizes. The overall rate and genomic distribution of mutations are stable across the sequenced offspring and duplicated paternal individual, conferring confidence in these measurements and conclusions. In contrast, the mutation spectrum was variable across individuals, indicating that these results should be interpreted with more caution.


The methods described here can be utilized to study the evolution of mutation rate throughout the tree of life. Via the construction of de facto long reads, the 10x technology produces sufficient haplotype data to remove the need for multiple generations of sequencing. The same technology makes genome assembly feasible with relatively low coverage (60-fold), which in turn enables assembly of a high-quality genome and direct estimation of mutation rate in a single study. Though mutation rates have already been estimated for a number of model organisms, progress in understanding mutation rate evolution depends on estimates from a wide range of species across the tree of life, which will necessarily include many non-model organisms. It is our hope that the details of our study will provide an encouraging pathway for that work.


5. Rates of Sperm Gene Evolution Across Microcebus

Although mouse lemurs have been the recent focus of next generation sequencing methods, the initial applications of these data have been centered around understanding the relationships among and between the species rather than the mechanisms that underlie the diversification of the species radiation and/or maintain species boundaries. Herein I compare the rates of positive selection within sperm fertilization-related genes, sperm construction-related genes, and a set of randomly drawn genes to determine if the former are under more selective pressure than the latter.

My working hypothesis is that the sperm-mediating genes are under more intense positive selection, on the assumption that they are directly related to reproductive success, and thus directly relevant to fitness. These comparisons show that there is higher dN/dS (a measure of positive selection) in sperm fertilization-related genes relative to sperm construction-related genes and random genes. These results provide molecular data that support our current understanding of the behavior and lifestyles of these primates. They also highlight what could be an underlying mechanism of speciation among these highly speciose primates of Madagascar. Ultimately this represents a step towards understanding the evolution of lemurs by applying genomic data to answer the how and why of speciation in this clade of primates.


5.1 Background

Among the mouse lemur’s life history traits are several signs of rapid speciation. Mouse lemurs are spread across a large, ecologically diverse landscape which, in combination with their limited home ranges, may allow for the buildup of genetic differences that often lead to speciation as seen in other organisms (Coyne & Orr 2004, Thorpe et al. 2008, Crispo et al. 2006, Wang et al. 2013, Safran et al. 2016). Beyond being speciose, mouse lemurs exhibit life history traits that are potentially consistent with selection working to simultaneously increase the likelihood of genomic incompatibilities and thus accelerate the speciation process.

In natural populations, mouse lemurs have been documented to display highly promiscuous male behavior, low paternal care, and female philopatry (Radespiel et al. 2000). In primates generally, these behaviors are consistent with multiple male copulations with a single female. Because the species are female philopatric (or matrilocal) and the males do not contribute to offspring care, the males of the species typically move between multiple female groups and likely copulate with a number of females in a single breeding season. The males with the most fit sperm have the highest likelihood of fathering the highest number of offspring, behavior that leads to high levels of sperm competition (Balshine et al. 2001, Nascimento et al. 2007, Firman et al. 2011). Among mouse lemurs in captivity it is common husbandry practice to expose reproductively-receptive females to two male suitors to promote a successful courtship (Yoder, pers comm). This behavior would suggest that in the wild females are accustomed to multiple male copulations, and that it may even be required for successful mating. Additionally, in the wild and in captivity the males of the species show exceptionally enlarged testes during breeding season, another sign of sperm competition (Schumacher et al. 2014). Indeed, relative to their body weight, mouse lemurs have among the largest testes within primates (Harcourt et al. 1995, Kappeler et al. 1997).

These life history traits are consistent with high levels of sperm competition among males and it would follow that certain sperm genes within mouse lemurs are rapidly evolving. These sperm genes are likely under pervasive sexual selection (Dorus et al. 2004, Ramm et al. 2007, Martin-Coello et al. 2009). Genes involved in mating, such as sperm genes, are commonly a source of genetic incompatibilities that lead to the formation of novel species because of the intimate relationship between these genes and the ability to produce viable offspring (Howard et al. 1999, Coyne & Orr 2004, Palumbi et al. 2009). If mouse lemur sperm genes are indeed rapidly evolving as a result of male competition, this could offer a mechanistic explanation for the speciose nature of the genus. Sperm protein changes can rapidly create genetic incompatibilities and are often cited as a reinforcement of allopatric speciation (Wade et al. 1994, Rieseberg et al. 1995, Howard et al. 1999, Gavrilets et al. 2000, reviewed in Ritchie 2007, Manier et al. 2013, Baack et al. 2015). Among primates, post-mating, or fertilization-related, sperm genes show higher levels of positive selection in polygamous species than in monogamous ones (Schumacher et al. 2014).

Due to the previous lack of a thoroughly assembled genome and high-quality gene annotation for the mouse lemur clade, these hypotheses have yet to be tested with genetic data. While genetic data have been generated for several species within the genus Microcebus, these data have primarily been used to date current species divergences (Yang and Yoder, 2003), describe new ones (Hotaling et al., 2016), and reconstruct the phylogenetic relationships among species (e.g., Yoder et al., 2000; Weisrock et al., 2010; Yoder et al., 2016).

With the dropping cost and increasing quality of next generation sequencing and improving quality of genome assembly and annotation methods, it is now feasible to ask genic questions of non-model species. We can use the annotated mouse lemur genome (Larsen et al., 2017), in combination with several genomes assembled with moderate coverage, to assess selection in a family of genes across the entire clade of mouse lemurs. The progress into gene-function genomics is novel within this clade and is crucial to gaining a deeper understanding of how evolution acts on the process of lineage diversification within the mouse lemur clade.

Here we used the sequenced, assembled, and annotated genomes of six species of mouse lemurs to assess the patterns of change in sperm genes of two general categories: fertilization and sperm-construction. Inferring the degree of selection occurring across these genes will give us a better understanding of how these primates diverged within Madagascar and why they have formed and maintained so many novel species. It will also provide a genetic confirmation of known life history traits in Microcebus. Ultimately this study illustrates how genomics can be used to test biological questions across the tree of life.

5.2 Methods

5.2.1 Gene Identification

Using the annotated, chromosome-level assembly of M. murinus (GCF_000165445.3), we extracted the exons of all genes that had previously been established as sperm-related, following a list of genes originally published in similar primate work (Schumacher et al. 2014). The majority of these genes also appear in the M. murinus genome (129 of 169) and are used in this analysis. These sperm genes (n=129) consist of 2,824 exons and range in length from 975 to 230,082 base pairs.

These genes are further categorized as being related to the construction of sperm (pre-mating or hereafter “construction-related” genes, n = 78) or the fertilization of the egg by the sperm (post-mating, or hereafter “fertilization-related” genes, n = 51).

An equal number of genes was randomly chosen from the mouse lemur genome (n=129, hereafter “random genes”). To make the comparison between gene sets as accurate as possible, each paired random gene was drawn from all genes within 50% of the length and exon count of the corresponding sperm gene. By following this method, each sperm gene has a matched pair within the analysis, and the underlying distribution of length and exon density matches between the gene sets.
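The matched draw described above can be sketched as follows. This is an illustrative reconstruction, not the script used in this study: the gene records, the dictionary keys ('id', 'length', 'exons'), and the choice to sample without replacement are all assumptions made for the example.

```python
import random

def within_half(value, target):
    """True if value lies within plus or minus 50% of target."""
    return 0.5 * target <= value <= 1.5 * target

def draw_matched_genes(sperm_genes, genome_genes, seed=0):
    """For each sperm gene, draw one unused gene of similar length and exon count.

    Genes are dicts with hypothetical keys 'id', 'length', and 'exons'.
    Sampling without replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    used = {gene["id"] for gene in sperm_genes}
    matches = {}
    for gene in sperm_genes:
        candidates = [
            c for c in genome_genes
            if c["id"] not in used
            and within_half(c["length"], gene["length"])
            and within_half(c["exons"], gene["exons"])
        ]
        if not candidates:
            continue  # no eligible gene; the caller decides how to handle this
        pick = rng.choice(candidates)
        used.add(pick["id"])
        matches[gene["id"]] = pick
    return matches

# Toy records, invented for illustration only
sperm = [{"id": "s1", "length": 1000, "exons": 4},
         {"id": "s2", "length": 20000, "exons": 20}]
genome = [{"id": "g1", "length": 1200, "exons": 5},
          {"id": "g2", "length": 5000, "exons": 4},
          {"id": "g3", "length": 25000, "exons": 18},
          {"id": "g4", "length": 900, "exons": 3}]
matched = draw_matched_genes(sperm, genome)
print({sperm_id: gene["id"] for sperm_id, gene in matched.items()})
```

Filtering on both length and exon count before drawing is what keeps the two gene sets comparable in the properties most likely to affect alignment success and dN/dS estimation.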

5.2.2 Genome Assembly and Annotation

As part of ongoing work in the lab to improve the genomic resources for mouse lemurs and answer several specific questions, including this one, genomes of Microcebus griseorufus, M. mittermeieri, M. johani, M. ravelobensis, M. tavaratra, and Mirza zaza (outgroup) were sequenced. Tissue biopsies were taken from wild individuals in Madagascar from 1997-2015. For these five species DNA was extracted following manufacturer instructions using the Qiagen DNeasy Blood and Tissue kit (Qiagen, Germantown, MD, USA).

The genomes of Microcebus griseorufus, M. mittermeieri, M. johani, M. tavaratra, and Mirza zaza were sequenced to 30-fold coverage as approximately 400bp insert libraries on a single lane of an Illumina HiSeq 3000 with paired-end 150bp reads. We sequenced the Microcebus ravelobensis genome from two libraries, one with an average insert size of 570bp on the Illumina HiSeq 2500 and the other with a 500bp insert library on 5.5% of both lanes of an Illumina NovaSeq. Genomes were assembled using MaSuRCA v3.2.2 (Zimin et al. 2013). We assumed an insert size standard deviation of 15%. Scaffolds were obtained from SSPACE (Boetzer et al. 2010), which also attempted to correct assembly errors and extend contigs from MaSuRCA. The average contig N50 for these assemblies was 30,176bp (range: 15,313-69,531bp).


Gene annotations were trained by RNAseq data (Peng et al. 2014) used in concert with the gray mouse lemur genome. RNAseq libraries from multiple tissues and replicates were used to assemble a single transcriptome with TRINITY v2.2.0 (Haas et al. 2013). Assembled transcripts and homologous protein evidence from closely-related species were used as direct evidence for gene annotations as well as for training the ab initio gene predictor SNAP (Korf et al. 2004) with MAKER v2.31.9 (Cantarel et al. 2008) for the Microcebus murinus 3.0 assembly (GCF_000165445.3). For an independently trained gene predictor, we optimized the AUGUSTUS v3.3 (Stanke et al. 2006) human parameters with the M. murinus 3.0 assembly using BUSCO v3.0.2 (Simão et al. 2015).

Final annotations for all five genomes were estimated with a single MAKER run using a combination of transcriptome and homologous protein alignment as well as gene predictions. From BUSCO v3.0.2 these assemblies average 54% (range: 46.2-68.3%) of complete BUSCO (Benchmarking Universal Single-Copy Orthologs) transcripts (4,104 common mammalian transcripts). On average 35% of BUSCO transcripts were fragmented (range: 26.4-40.6%) and only 11% were missing (range: 5.3-14.5%).

As M. johani is a novel species, currently being described (Schüßler, in prep), the coding regions from the assembly and annotation process were used to create a tree among the seven species involved in this analysis, M. murinus and six newly assembled genomes (Figure 21).


Figure 21: Species Phylogeny from Assembly Data. A maximum likelihood tree generated from the coding regions of all seven species included in this analysis. The tree confirms the Microcebus relationships seen in other data (RADseq, mtDNA), and the novel species M. johani is on the opposite side of the tree from M. murinus and M. griseorufus and is the sister taxon to M. tavaratra. It also shows that within nuclear data there is some uncertainty regarding the placement of M. ravelobensis.

5.2.3 BLAST and Alignment

BLAST (Altschul et al. 1990) databases were created from the transcriptomes of the six species. All six databases were searched for all exonic sequences. Where the BLAST search returned multiple hits to the same exon, only the best match, by E-value, was kept and the others were discarded. Samtools (Li et al. 2009) was used both to index the transcript FASTA files and to extract from the source transcript FASTA the sequence that matched the hit returned by the BLAST search. The sequences from the best BLAST hit of each species were aligned to the M. murinus exon using PRANK (Loytynoja et al. 2014).
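The hit-filtering step can be sketched as below. The sketch assumes the BLAST output was written in the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); the output format actually used is not stated in the text, and the function name and example records are invented. Each best hit is returned as a `name:start-end` region string of the kind accepted by `samtools faidx`.

```python
def best_hits(tabular_lines):
    """Keep the lowest-E-value hit per query from 12-column tabular BLAST output.

    Ties on E-value are broken by the higher bit score. Subject coordinates are
    reordered so start <= end; a reverse-strand hit would still need to be
    reverse-complemented after extraction.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        s_start, s_end = sorted((int(fields[8]), int(fields[9])))
        rank = (float(fields[10]), -float(fields[11]))
        if query not in best or rank < best[query][0]:
            best[query] = (rank, f"{subject}:{s_start}-{s_end}")
    return {query: region for query, (_, region) in best.items()}

# Invented example records, for illustration only
example = [
    "exon1\ttx_A\t98.0\t100\t2\t0\t1\t100\t201\t300\t1e-40\t180",
    "exon1\ttx_B\t90.0\t100\t10\t0\t1\t100\t50\t149\t1e-20\t120",
    "exon2\ttx_C\t99.0\t150\t1\t0\t1\t150\t400\t251\t1e-70\t270",
]
regions = best_hits(example)
print(regions)
```

Keeping a single hit per exon avoids aligning paralogous or partial matches against the reference exon, at the cost of discarding exons whose best hit is split across transcripts.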

The alignment was considered successful if one or more species returned an exon BLAST hit that subsequently aligned back to the M. murinus exon. Successful alignments were analyzed for selection with respect to the reference sequence, M. murinus, as well as three other species, M. griseorufus, M. tavaratra, and M. johani, using the software package KaKs_Calculator (Wang et al. 2010). KaKs_Calculator identified synonymous and non-synonymous sites and counted synonymous and non-synonymous substitutions under the “NG” model (Nei and Gojobori 1986).

KaKs_Calculator was used to calculate the dN/dS ratio (the ratio of the non-synonymous to the synonymous substitution rate) where possible, as well as GC content for both the entire sequence and all three codon positions. Analysis of the output from KaKs_Calculator was conducted in R (R Core Team 2013) and RStudio (RStudio Team 2015).

Two separate significance tests were performed to measure the differences in dN/dS between gene categories across several sets of species. A t-test was used to compare the means of the log-transformed, non-zero dN/dS values between gene sets. Taking the logarithm of the values transformed the skewed distribution of dN/dS into a more normal distribution. A Kruskal-Wallis test was used to compare the distributions without any transformation, though random sampling was used to match distribution sizes and meet the requirement that each set within a comparison have an equal number of data points.
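The first of these comparisons can be illustrated with a short sketch. The analysis itself was run in R; the Python below only mirrors the described steps (drop zero values, log-transform, compare means). The text does not say which variant of the t-test was used, so Welch's unequal-variance form is assumed here, and the dN/dS values are invented. Only the test statistic and degrees of freedom are computed; a p-value would then be read from a t distribution.

```python
import math
import statistics

def log_nonzero(values):
    """Drop zeros, then take natural logs (the 'non-zero dN/dS' restriction)."""
    return [math.log(v) for v in values if v > 0]

def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se2 = va / na + vb / nb
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2)
    df = se2**2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Invented dN/dS values, for illustration only
fert = [0.0, 0.10, 0.40, 0.80, 1.60]
const = [0.05, 0.10, 0.20, 0.40, 0.0]
t_stat, df = welch_t(log_nonzero(fert), log_nonzero(const))
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

Zeros must be removed before the transformation because the logarithm of zero is undefined, which is why the comparison is restricted to exons with non-zero dN/dS.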


5.3 Results and Discussion

5.3.1 Gene Selection

One hundred and twenty-nine random genes were chosen from the mouse lemur genome (n=129, hereafter “random genes”). Each gene was randomly drawn from all genes matching an individual sperm gene’s length (plus or minus 50%) and exon count (plus or minus 50%), such that a paired set of randomly drawn genes of equal number and matching in length and exon content was established (Figure 22).

Figure 22: Gene Exon Number versus Gene Length. The randomly selected genes and sperm genes plotted by exon count and gene length. The matched random draw produced sets of genes with similar characteristics.

5.3.2 Exon Alignment

On average 1,949 exons (range: 1,827-2,170) and at least some part of 210 genes (range: 202-216) were successfully mapped across the species, from a total of 6,211 individual exons across 242 genes. The overall average dN/dS, for all exons in all genes and all species, was 0.266, with 95.1% of values falling below a dN/dS of 1 and 85.6% falling below a dN/dS of 0.50.

Within the genus Microcebus there is, unsurprisingly, a strong correlation (r = 0.836, p = 0.078) between the assembly quality of a species’ genome, measured by contig N50, and the percent of exons with successfully aligned sequence. As expected, the outgroup, M. zaza, has average to low gene alignment percentages, which reflect its evolutionary distance from the reference sequence (Figure 23). This is despite it being the best assembly, with the highest contig N50. Though there are some differences between the number of exons that were successfully mapped, there were relatively few differences among species when measured at the gene level (Figure 23), with the exception of M. mittermeieri. This species stood out as having below 80% of fertilization-related genes align, which was lower than the outgroup M. zaza. Additionally, the exons that aligned successfully for M. mittermeieri had a different GC content than the other species (Figure 24). Given these anomalous results, which are almost certainly related to genome quality rather than biological reality, and given that this species can be considered redundant within the Microcebus species tree, we have dropped this species from further analyses.
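The reported p-value can be related to the correlation coefficient through the usual t transformation, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below assumes n = 5, that is, the five newly assembled Microcebus genomes; the sample size is an inference made here and is not stated explicitly in the text. For three degrees of freedom the Student's t distribution has a closed-form CDF, so no statistics library is needed.

```python
import math

def t_from_r(r, n):
    """t statistic for a Pearson correlation r estimated from n paired points."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def two_sided_p_df3(t):
    """Two-sided p-value for Student's t with exactly 3 degrees of freedom."""
    x = abs(t) / math.sqrt(3)
    cdf = 0.5 + (x / (1 + x * x) + math.atan(x)) / math.pi
    return 2 * (1 - cdf)

r, n = 0.836, 5  # n = 5 is an assumption (five Microcebus assemblies)
t = t_from_r(r, n)
p = two_sided_p_df3(t)
print(f"t = {t:.2f}, p = {p:.3f}")
```

Under this assumption the calculation gives p ≈ 0.078, in agreement with the reported value. It also makes clear that, with so few assemblies, even a correlation of 0.836 does not reach the conventional 0.05 threshold.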


Figure 23: Assembly Quality and Gene Mapping. Percent of genes with aligned sequences, denoted by gene category and species, plotted against the contig N50 of the species’ assembly. The species with the lowest contig N50 and below average percent of genes aligning, M. mittermeieri, was dropped from further analyses. Though there is a slight linear trend, where longer contigs result in a higher percent of genes found, the difference between the remaining species is less than 5%.


Figure 24: Exon GC content broken down by species. Excluding the outgroup, Mirza zaza, all of the exons across Microcebus have similar GC content except for M. mittermeieri.

5.3.3 Pairwise dN/dS

Given the high quality of the M. murinus reference, this species was fundamental for identifying relevant genes across the genus Microcebus. The M. murinus assembly was utilized in every step of the methods except for the actual sequencing of the additional species. It was used as an assembly guide, RNAseq data from this species were used in gene annotation, and the focal genes were then found by comparing these same annotations back to M. murinus genes. Due to the frequent use of this single genome assembly, it is possible that gene sequences within other species were, in fact, simply M. murinus genes mistakenly attributed to the novel species. If this were the case, however, the affected pairwise comparison of exonic sequence would appear to be identical, and the resulting measure of dN/dS would accordingly be zero. In an attempt to account for this possibility, and to remove any potential bias from the methods, we measured the dN/dS values only of exons with at least one sequence change from the M. murinus reference.

By this method the mean value of dN/dS was calculated for each group of species across each gene category (Figure 25).

Figure 25: dN/dS by Gene Category. Breaking down the average dN/dS across gene category. The category “Sperm” contains all genes in both “Const”, construction- related, and “Fert”, fertilization-related. The genes in the fertilization category have the highest dN/dS when comparing M. murinus to all other Microcebus species. Comparison to the outgroup, Mirza zaza, provided for reference.

Additionally, a weighted average of the dN/dS ratio was calculated by weighting each exon’s dN/dS ratio by the base pair length of the exon itself (Figure 26). This gives each base pair equal weight and so reduces the outsize effect of small exons on the average. As has been shown in other genome-wide studies of dN/dS, the denominator can heavily impact dN/dS calculations (Wolf 2009), and indeed there is a strong correlation between sequence length and dN/dS in these data (Figure 27). This is the most likely cause of the dN/dS fluctuations by category in M. zaza. There are fewer exons aligned within the fertilization gene category (Table 1) and the longest of those exons (1,317bp) is roughly half the maximum length of the exons aligned to each of the Microcebus samples (2,307bp). The longer exons both have an outsized effect on the weighted dN/dS calculation and are more likely to have low dN/dS values. Additionally, the sample size of M. zaza (n=105) fertilization-related sperm genes is much smaller than that of the combined set of Microcebus samples (n=634). To confirm this pattern with additional statistical measures, particularly those less susceptible to outliers, the dN/dS values were also visualized by percentile. The same patterns present in the mean of dN/dS were consistently present at the 80th percentile of the dN/dS values (Figure 28).
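The three summaries discussed here (unweighted mean, length-weighted mean, and 80th percentile) can be compared on a toy example. The values below are invented, and the percentile uses linear interpolation between order statistics, which is an assumption about the method rather than a detail taken from the analysis.

```python
def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)

def length_weighted_mean(dnds, lengths):
    """Average dN/dS with each exon weighted by its length in base pairs."""
    return sum(d * n for d, n in zip(dnds, lengths)) / sum(lengths)

def percentile(values, q):
    """Percentile (q in 0..100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

# Invented exons: one short exon with high dN/dS, several long exons with low dN/dS
dnds = [1.20, 0.10, 0.05, 0.20, 0.15]
lengths = [60, 900, 1500, 700, 840]

unweighted = mean(dnds)
weighted = length_weighted_mean(dnds, lengths)
p80 = percentile(dnds, 80)
print(f"mean = {unweighted:.3f}, weighted = {weighted:.3f}, 80th pct = {p80:.3f}")
```

In this toy case the single 60 bp exon pulls the unweighted mean well above the weighted mean, which is the small-exon effect that the length weighting is meant to dampen.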


Figure 26: dN/dS by Gene Category. Breaking down the weighted average dN/dS across gene category. The category “Sperm” contains all genes in both “Const”, construction-related, and “Fert”, fertilization-related. The genes in the fertilization category have the highest dN/dS when comparing M. murinus to all other Microcebus species. Comparison to the outgroup, Mirza zaza, provided for reference.


Figure 27: Exon dN/dS versus Exon Length. Longer exons have lower values of dN/dS, which is as expected. Each category of gene is represented across the distribution of both of these variables.

Figure 28: The 80th Percentile of dN/dS Values by Gene Category. The 80th percentile of dN/dS values, as a comparison that is more robust to non-normal distributions than the mean. The patterns seen in the mean and weighted mean, high dN/dS among fertilization-related sperm genes, hold up in this metric.


In averages (0.322), weighted averages (0.246), and percentile comparisons (80th percentile = 0.306), fertilization-related sperm genes had higher dN/dS than construction-related sperm genes (non-weighted = 0.235, weighted = 0.132, 80th percentile = 0.302) and roughly equal or higher dN/dS than the random set of genes (non-weighted = 0.270, weighted = 0.251, 80th percentile = 0.334, Table 8 & Table 9).

There are significant differences in the log-adjusted dN/dS values between fertilization-related sperm genes and construction-related sperm genes in M. tavaratra, across all three “distant” Microcebus species, and across the combined means of all the species tested. There are significant differences in the log-adjusted dN/dS values between fertilization-related sperm genes and random genes in M. johani, M. tavaratra, across all three “distant” Microcebus species, and across the combined means of all the species tested. There is higher dN/dS in random genes when M. murinus is compared to Mirza zaza, but as discussed earlier this is likely due to sample size. Finally, there are significant differences in both the log-adjusted dN/dS values and the distributions of fertilization-related sperm genes and construction-related sperm genes between M. griseorufus and M. murinus, though these differences show a contrasting, lower rate of dN/dS among fertilization-related genes (Figure 29). Overall there is more evidence of positive selection, when compared back to M. murinus, among fertilization-related sperm genes than among construction-related sperm genes.
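The two kinds of comparison reported here, a t-test on log-transformed dN/dS values and a Kruskal-Wallis test on the untransformed values, can be sketched with test statistics computed by hand. This is an illustration under stated assumptions, not the procedure used for Tables 8 and 9: the data are invented, the natural log stands in for the unspecified "log adjustment", Welch's unequal-variance form of the t statistic is assumed, and the Kruskal-Wallis helper assumes no tied values. In practice p-values would come from a statistics library, for example `scipy.stats.ttest_ind` and `scipy.stats.kruskal`.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances allowed)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic; assumes no tied values, so no tie correction."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n = len(pooled)
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)

# Hypothetical per-exon dN/dS values for two gene categories.
fert = [0.45, 0.30, 0.62, 0.25, 0.80]
const = [0.12, 0.20, 0.08, 0.33, 0.15]

# The t statistic is computed on log-transformed values,
# the rank-based H statistic on the raw values.
log_fert = [math.log(x) for x in fert]
log_const = [math.log(x) for x in const]

t_stat = welch_t(log_fert, log_const)
h_stat = kruskal_wallis_h(fert, const)
```

Because the Kruskal-Wallis statistic depends only on ranks, it is unaffected by the log transform, which is one reason the two tests can disagree when a few exons have extreme dN/dS values.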


Figure 29: dN/dS Comparisons and Significance by Gene Category. A summary of the dN/dS comparisons made with a simple phylogenetic tree for reference. The color of the row indicates which gene category is being summarized (construction-related = yellow, fertilization-related = green, random = blue). Boxes indicate significance: red indicates significance in both T and Kruskal-Wallis tests, while pink indicates significance in the T-test only. The phylogenetic tree indicates the relationship of the species being compared with respect to M. murinus, the focal species, and is not to scale.


Table 8: T-Test Results. Results of T-tests of the distributions of log-adjusted dN/dS values across species by gene category. Set of genes within comparison is abbreviated as: F – fertilization-related sperm, C – construction-related sperm, R – random.

| Species | Comparison | Difference in Means | Difference in log(Means) | T-Test Stat | T-Test p value | T-Test Sig. |
|---|---|---|---|---|---|---|
| All | F-C | 0.221 | 0.260 | 3.22 | 1.36E-03 | Sig |
| Mgri | F-C | -0.404 | -0.368 | -1.55 | 1.27E-01 | NS |
| Mrav | F-C | -0.100 | -0.116 | -0.55 | 5.81E-01 | NS |
| Mjoh | F-C | 0.835 | 0.322 | 1.47 | 1.46E-01 | NS |
| Mtav | F-C | 0.319 | 0.707 | 5.96 | 2.01E-08 | Sig |
| allMcebus | F-C | 0.284 | 0.342 | 3.70 | 2.48E-04 | Sig |
| distMcebus | F-C | 0.396 | 0.459 | 4.60 | 5.89E-06 | Sig |
| Mzaz | F-C | 0.210 | 0.335 | 1.43 | 1.58E-01 | NS |
| All | F-R | 0.279 | 0.249 | 3.36 | 8.52E-04 | Sig |
| Mgri | F-R | -0.003 | -0.003 | -0.01 | 9.88E-01 | NS |
| Mrav | F-R | 0.096 | -0.056 | -0.32 | 7.52E-01 | NS |
| Mjoh | F-R | 0.940 | 0.603 | 2.91 | 4.80E-03 | Sig |
| Mtav | F-R | 0.204 | 0.574 | 4.44 | 1.53E-05 | Sig |
| allMcebus | F-R | 0.317 | 0.327 | 3.78 | 1.85E-04 | Sig |
| distMcebus | F-R | 0.362 | 0.384 | 4.02 | 7.33E-05 | Sig |
| Mzaz | F-R | 0.339 | 0.440 | 2.11 | 4.12E-02 | Sig |
| All | R-C | -0.058 | 0.011 | 0.18 | 8.53E-01 | NS |
| Mgri | R-C | -0.401 | -0.365 | -2.19 | 3.10E-02 | Sig |
| Mrav | R-C | -0.196 | -0.061 | -0.40 | 6.92E-01 | NS |
| Mjoh | R-C | -0.105 | -0.281 | -1.95 | 5.23E-02 | NS |
| Mtav | R-C | 0.115 | 0.133 | 1.36 | 1.75E-01 | NS |
| allMcebus | R-C | -0.033 | 0.015 | 0.23 | 8.17E-01 | NS |
| distMcebus | R-C | 0.034 | 0.074 | 1.02 | 3.07E-01 | NS |
| Mzaz | R-C | -0.129 | -0.105 | -0.63 | 5.32E-01 | NS |


Table 9: Kruskal-Wallis Test Results. Results of Kruskal-Wallis tests of the distributions of dN/dS values across species by gene category. Set of genes within comparison is abbreviated as: F – fertilization-related sperm, C – construction-related sperm, R – random.

| Species | Comparison | Difference in Means | Kruskal-Wallis Test Stat | Kruskal-Wallis p value | Kruskal-Wallis Sig. |
|---|---|---|---|---|---|
| All | F-C | 0.221 | 164.13 | 3.09E-01 | NS |
| Mgri | F-C | -0.404 | 26.29 | 2.51E-02 | Sig |
| Mrav | F-C | -0.100 | 28.93 | 2.72E-01 | NS |
| Mjoh | F-C | 0.835 | 41.90 | 5.32E-01 | NS |
| Mtav | F-C | 0.319 | 66.41 | 7.09E-01 | NS |
| allMcebus | F-C | 0.284 | 115.32 | 6.00E-01 | NS |
| distMcebus | F-C | 0.396 | 120.47 | 8.89E-01 | NS |
| Mzaz | F-C | 0.210 | 10.69 | 5.96E-01 | NS |
| All | F-R | 0.279 | 225.22 | 5.98E-01 | NS |
| Mgri | F-R | -0.003 | 27.59 | 8.73E-01 | NS |
| Mrav | F-R | 0.096 | 50.33 | 3.97E-01 | NS |
| Mjoh | F-R | 0.940 | 37.59 | 6.46E-01 | NS |
| Mtav | F-R | 0.204 | 87.35 | 4.95E-01 | NS |
| allMcebus | F-R | 0.317 | 194.69 | 2.89E-01 | NS |
| distMcebus | F-R | 0.362 | 138.45 | 5.24E-01 | NS |
| Mzaz | F-R | 0.339 | 33.91 | 1.09E-01 | NS |
| All | R-C | -0.058 | 136.09 | 2.04E-01 | NS |
| Mgri | R-C | -0.401 | 20.64 | 8.47E-02 | NS |
| Mrav | R-C | -0.196 | 24.91 | 1.58E-01 | NS |
| Mjoh | R-C | -0.105 | 24.46 | 6.04E-01 | NS |
| Mtav | R-C | 0.115 | 58.26 | 5.21E-02 | NS |
| allMcebus | R-C | -0.033 | 129.00 | 4.43E-01 | NS |
| distMcebus | R-C | 0.034 | 110.44 | 2.33E-01 | NS |
| Mzaz | R-C | -0.129 | 8.12 | 3.36E-01 | NS |

5.3.4 Multispecies dN/dS

As dN/dS is a paired metric, it makes sense to also break out these comparisons by evolutionary distance to the focal M. murinus. Doing so highlights a stark pattern in the data. M. griseorufus, which is the closest relative to the reference genome, M. murinus, shows a lower dN/dS than the other Microcebus species included, which are all an equal evolutionary distance away (Figure 30). M. ravelobensis was not considered in this analysis as its phylogenetic position is problematic, likely due to issues of long-branch attraction (Tiley, pers. comm.; Figure 19). Though the direction of the M. murinus-M. griseorufus relationship reverses when comparing the weighted means, the fact that the amount of non-synonymous change between fertilization-related genes is lower when comparing M. murinus and M. griseorufus remains unchanged (Figure 31). To check for the possibility of data bias, the number and length of fertilization-related exons mapping to M. griseorufus were checked and no significant differences were found (Table 10).

Figure 30: dN/dS by Gene Category. Breaking down the average dN/dS across gene category. The category “Sperm” contains all genes in both “Const”, construction-related, and “Fert”, fertilization-related. “distMcebus” contains M. johani and M. tavaratra. The genes in the fertilization category have the highest dN/dS when comparing M. murinus to distant Microcebus species, but the lowest when compared to M. griseorufus.

Figure 31: dN/dS by Gene Category. Breaking down the weighted average dN/dS across gene category. The category “Sperm” contains all genes in both “Const”, construction-related, and “Fert”, fertilization-related. “distMcebus” contains M. johani and M. tavaratra. The genes in the fertilization category have the highest dN/dS when comparing M. murinus to distant Microcebus species. Though the fertilization-related genes have a higher weighted average than construction-related genes in the M. griseorufus comparison, they are still lower than the fertilization genes in the comparison with other Microcebus species.

Table 10: Summary of fertilization-related gene data across all six species. *M. mittermeieri was excluded from the analyses.

| Species | Exons Aligned | Exons with Substitutions | Average Length | Average dN/dS | Longest Exon |
|---|---|---|---|---|---|
| M. griseorufus | 283 | 133 | 161.9 | 0.119 | 2307 |
| M. ravelobensis | 288 | 157 | 171.6 | 0.276 | 2307 |
| M. johani | 332 | 140 | 148.1 | 0.574 | 2307 |
| M. mittermeieri* | 243 | 120 | 150.6 | 0.419 | 2307 |
| M. tavaratra | 330 | 204 | 164.4 | 0.267 | 2307 |
| Mirza zaza | 300 | 105 | 117.8 | 0.311 | 1317 |

Two steps were taken to confirm the patterns from the dN/dS calculations of M. murinus alignments. First, we changed the focal species of the alignment to M. griseorufus which confirmed the findings with a similar pattern seen in the M. murinus-focal data: there was less change in fertilization-related genes when compared within the M. murinus-M. griseorufus pair than when compared across to more distant species.

To test the dN/dS values found in the M. griseorufus-M. murinus relationship against values from species pairs of a matching evolutionary distance, we turned to M. tavaratra and M. johani. Comparisons where these two species are focal show a high rate of change among fertilization-related genes in both directions within the pair and a more mixed result when compared back to M. murinus-M. griseorufus (Figure 32).
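Breaking the comparisons out by evolutionary distance amounts to grouping exon-level dN/dS values by distance class and gene category before averaging. The sketch below shows one way to do that grouping; it is illustrative only. The distance-class mapping, the record layout, the function name, and all dN/dS values are assumptions made for the example, and only the M. murinus-focal case is shown.

```python
from collections import defaultdict
from statistics import mean

# Assumed distance classes relative to a focal species
# (class labels follow the "Close"/"Distant"/"Outgroup" scheme).
DISTANCE_CLASS = {
    ("M. murinus", "M. griseorufus"): "Close",
    ("M. murinus", "M. johani"): "Distant",
    ("M. murinus", "M. tavaratra"): "Distant",
    ("M. murinus", "Mirza zaza"): "Outgroup",
}

def mean_dnds_by_class(records):
    """Average dN/dS for each (distance class, gene category) pair."""
    grouped = defaultdict(list)
    for focal, other, category, dnds in records:
        grouped[(DISTANCE_CLASS[(focal, other)], category)].append(dnds)
    return {key: mean(values) for key, values in grouped.items()}

# Hypothetical exon-level records: (focal, compared species, category, dN/dS).
records = [
    ("M. murinus", "M. griseorufus", "Fert", 0.10),
    ("M. murinus", "M. griseorufus", "Fert", 0.14),
    ("M. murinus", "M. johani", "Fert", 0.50),
    ("M. murinus", "M. tavaratra", "Fert", 0.30),
    ("M. murinus", "Mirza zaza", "Const", 0.20),
]

summary = mean_dnds_by_class(records)
```

Changing the focal species, as done in the confirmation steps, would only require a different mapping from species pairs to distance classes.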


Figure 32: dN/dS by Gene Category and Evolutionary Distance. Breaking down the average dN/dS across gene category and multiple evolutionary distances. The category “Sperm” contains all genes in both “Const”, construction-related, and “Fert”, fertilization-related. The top of each plot shows the focal species for the dN/dS test, with the categories being “Close”, neighboring single species, “Distant”, species pair across the Microcebus genus, or “Outgroup”, M. zaza. The simple phylogenetic tree beneath each plot shows the focal species for the dN/dS test (blue) and the species included in the close, distant, and outgroup comparisons (red). The genes in the fertilization category have high dN/dS when comparing M. johani and M. tavaratra to each other (the “Close” category).

5.4 Conclusions

The clear signal shown in the results of these direct genome comparisons across species indicates that this method for detecting signatures of selection was successful. This is in large part due to the phylogenetic proximity of the species being compared. Whereas sufficient evolutionary time has elapsed for meaningful substitutions to accumulate within exon sequences, the phylogenetic distance is not so great that multiple substitutions are likely to obscure the biological patterns of interest. Rather, the main predictors of gene search and alignment success were both assembly quality and evolutionary distance as would be expected. Additionally, the dN/dS values that were recovered from the search match those in similar searches of other primate genomes (e.g., Schumacher 2014, construction median dN/dS = 0.08 & fertilization median dN/dS = 0.23). Genomes that have been built with single runs of next generation sequencing data, assembled with de novo software and annotated with the help of gene transcripts can be used for preliminary tests of selection, with the appropriate controls and techniques.

The overall increase in the dN/dS rate of mouse lemur sperm fertilization-related genes, compared to random and sperm construction-related genes, is indicative of positive selection acting on sperm fertilization within the genus. Given that the life history traits of mouse lemurs are strongly indicative of the presence of sperm competition, it is possible that sexual selection is also playing a role, in addition to positive Darwinian selection. The signatures of selection in these genes match those of other primates that have promiscuous behavior, such as multiple male copulations (Schumacher 2014). Further, these genomic results are compatible with both field and captive observations of mouse lemur behavior. Sperm fertilization proteins, like those found to be changing in this study, have been theorized, modeled, and recognized as reinforcing allopatric speciation in many other taxa (Wade 1994, Rieseberg 1995, Howard 1999, Gavrilets 2000, reviewed in Ritchie 2007, Manier 2013, Baack 2015).


While on the whole the data suggest that across the entire genus fertilization-related sperm genes are changing more rapidly than other sperm genes, comparisons between M. griseorufus and M. murinus provide a clear outlier. We have field evidence of their range and captive evidence suggesting that they exhibit behavior consistent with multiple male matings. Additionally, many precautions have been taken to account for biases in the data and analysis process. These results have been shown to be robust to several different tests and the underlying data do not appear to harbor any significant biases. Though the data generated by this study are insufficient to determine why this particular species pair shows a discrepancy in dN/dS among fertilization-related sperm genes, the overall pattern confirms our hypotheses regarding mouse lemur life history through genic changes similar to those that occur in other male polygamous primates.

This implementation of genomic data shows the relative ease with which it can be applied to non-model systems. Though the M. murinus genome was generated only recently (Larsen et al., 2017), the vast majority of proteins cited in prior work (Schumacher 2014) were annotated, easily found, and confirmed. When a genome of this quality is paired with additional genomes, even of draft quality, as was the case here, many diverse questions about the underlying biology of the system can be investigated. Studies such as this one are but a small step in the process of learning about the gene function of non-model species.


Genomic research outside model systems has become increasingly common, allowing for the investigation of evolutionary patterns across the tree of life. The methods presented are among many possible techniques for the application of genomic data to fascinating questions in non-model systems. These results are the first steps in a full exploration of the of mouse lemurs, which is a uniquely powerful system for the study of primate speciation, even among the diverse fauna of Madagascar.

6. Conclusion

We have demonstrated with an integrative approach to speciation genomics that Madagascar's mouse lemurs offer an ideal system in which to study speciation in natural populations. When paired with the recent genomic revolution and the new tools it has introduced we can move towards an understanding of the speciation dynamics of mouse lemurs, the underlying rules of speciation, and ultimately provide a roadmap for future investigations of speciation in natural populations of other non-model systems.

We first used a thorough, genomic approach to construct the phylogeography of both past and present species whose distributions are key to understanding natural habitat fragmentation in the Central Highland plateau of Madagascar. These results were utilized to test hypotheses concerning the ecology and distribution of Madagascar's grasslands in the distant past, before the arrival of humans. The species relationships and past distributions inferred from the information in current mouse lemur genomes indicate that island-wide dispersal was possible up until approximately 500,000 years ago. Subsequently, the populations diverged, giving way to an array of living species now found in the central plateau and the eastern rainforests. However, the accuracy of the species age estimates from genetic data was initially limited by our understanding of the mutation generation process in this non-model species.

To improve the accuracy of the species age estimates, we generated linked-read sequence data that were used to estimate the per-generation mutation rate of grey mouse lemurs (Microcebus murinus). We sequenced a family pedigree (n=8), finding 134 de novo mutations across two focal offspring, which yielded a mutation rate estimate of 1.64 × 10⁻⁸ mutations per basepair per generation. This estimate is higher than that of nearly all mammals that have been previously characterized. As this estimate is crucial to the estimate of species ages, we re-analyzed the previous findings and found considerably more recent divergence times across the species.

Lastly, we employed mouse lemur genome annotations to investigate the putative functional underpinnings of speciation across the genus Microcebus. Due to the novelty of the genome assembly, this area of research in mouse lemurs is relatively unexplored. Most prior analyses of species differences have focused on understanding the phylogenetic relationships among the species rather than the mechanisms that underlie their diversification. To assess functionality, we compared the rates of substitution within sperm genes and a set of randomly drawn genes to look for the presence of positive selection. These comparisons reveal an elevated dN/dS, and thus evidence for positive selection, in sperm fertilization-related genes relative to sperm construction-related genes and a similarly sized set of randomly drawn genes. This provides genomic data that lend credence to existing observations of the behavior and natural history of these primates, highlighting what could be a genomic mechanism of speciation among these Malagasy primates. In doing so we hope to have shown that detailed questions relating to both the extrinsic (e.g., inter- and intra-population and ecological interactions) and intrinsic (e.g., genome content and architecture) forces that drive speciation can be asked and answered, especially in non-model species.

We have now reached a point wherein the field of speciation research is rapidly expanding outward from model species and is revealing the complex genomic underpinnings of speciation in a growing array of species. As genomic tools have been extended to an increasing number of species, more mechanisms of speciation are being described in non-model species and natural populations. With the field undergoing these advances, our view of speciation is accordingly becoming richer and more nuanced. The interactions between underlying genomic features, species distributions, processes of speciation, and the ecological surroundings of species will continue to emerge as knowledge and resources of non-model genomics are established. Accordingly, the field of speciation genomics will push ever further toward insights into natural, non-model populations with complex speciation stories. It is thus our ultimate hope that work like that presented in this dissertation is only the beginning of that progress.


Appendix A: State-of-the-Art Genomic Approaches

Since high throughput (also called next-generation or second-generation) sequencing technologies opened up the possibility of genomic characterization of any organism in 2005, reference genomes have been assembled for hundreds of non-model organisms (Figure 2), and with ever-decreasing sequencing costs, whole-genome resequencing projects using population samples have now become commonplace.

Besides simply determining the DNA sequence, second-generation sequencing technology has also been widely adopted to identify DNA-protein interactions (ChIP-seq), methylation patterns (BS-seq), and to quantify gene expression (RNA-seq). The latter, in particular, is a powerful tool for speciation researchers, since transcriptomic data improves genome annotation (Li et al., 2011; Trapnell et al., 2010) and can be an alternative or complement to genome scans for identifying barrier loci (Jeukens et al., 2010; Poelstra et al., 2014; Rafati et al., 2018; Ritchie et al., 2015; Wang, Gerstein & Snyder, 2009), which may also themselves represent regulatory divergence (Mack & Nachman, 2017).

Short read lengths are a key shortcoming of second-generation sequencing, which has, for example, made it difficult to assemble repetitive regions, characterize structural variants, and directly observe haplotypes. Third-generation technologies are serving to overcome this limitation by directly sequencing long reads (5-15 kbp both for Pacific Biosciences Single-molecule Real Time (SMRT) sequencing and Oxford Nanopore Technologies Nanopore Sequencing), or by using novel mapping technologies such as the optical mapping system of BioNano Genomics (Servin et al., 2013), the Hi-C approach by Dovetail Genomics (Lieberman-Aiden et al., 2009), and the linked read approach by 10X Genomics (Greer et al., 2017; Yeo et al., 2017). For an overview of third-generation approaches, see Lee et al. (2016); for sequencing and assembly advances, see Phillippy (2017) in Genome Research.

Access to larger and better characterized regions of the genome and the segregating variation therein will be beneficial for all genomics projects, yet the relevance of the advances that third-generation technologies offer to speciation genomics is still to be fully realized. With longer contigs and high-quality assemblies, genomic subtleties with potentially profound impacts on speciation are likely to be revealed. Structural variation such as duplications and inversions may disproportionately affect speciation, whereas haplotypic information will aid in the inference of gene flow and selection to reconstruct speciation histories and identify barrier loci. In addition, improved assembly of highly repetitive, heterochromatic regions such as centromeres (e.g. Ichikawa et al., 2017; Larsen et al., 2017) may be important, since a significant number of hitherto identified hybrid incompatibility genes encode proteins that interact with heterochromatin (Bayes & Malik, 2009; Brideau et al., 2006; Thomae et al., 2013; Ting et al., 1998), likely due to the high concentration of selfish elements there (Castillo & Barbash, 2017). Repeats themselves have also been identified as the focal incompatibility locus (Ferree & Barbash, 2009). Furthermore, centromeres have also been linked to speciation outside of the context of postzygotic incompatibilities, due to their tendency to have particularly low recombination rates (Carneiro et al., 2009; Noor et al., 2009; Stump et al., 2005).

Appendix B: “Messy Speciation” and Genealogical Variation Across the Genome

Genomic approaches have clarified many phylogenetic relationships that were previously unclear or controversial, and have provided an enormous increase in resolution and precision to delimit species and population structure within species. This is because the limited information present in single gene fragments and the high variance of the coalescence process necessitates a multitude of independent loci (“gene trees”) to accurately infer the underlying genealogy (“species tree”). However, the high variance and stochasticity of the coalescence process can create extensive genealogical variation across the genome, such that genealogies underlying traits of interest are not always likely to follow the inferred species tree – a phenomenon originally called hemiplasy (Avise, Robinson & Kubatko, 2008). Hemiplasy is more likely under precisely some of the patterns in speciation that genomic approaches have helped to uncover, and will thus be necessary to take into account, especially in analyses of trait evolution (Hahn & Nakhleh, 2016; Wu et al., 2017). More generally, relying on a single bifurcating tree for species delimitation, phylogenetic hypotheses, and subsequent comparative analyses may not always be appropriate given the often multifaceted nature of speciation and the complexities of gene-tree/species-tree reconstruction.

These “messy” aspects of speciation include, first, gene flow during and after speciation, which produces additional variance in genealogies, and is now believed to be a widespread phenomenon. Second, multiple speciation events that happen in rapid succession, or even simultaneously (Bolnick, 2006; Kautt, Machado-Schiaffino & Meyer, 2016), such as in rapid radiations, result in high proportions of incomplete lineage sorting, the second source of genealogical discordance. Third, in a number of intriguing instances of incipient speciation, strong discordance was found between overall genomic ancestry clines and clines for phenotypes that are thought to represent major isolating barriers (Harris et al., 2017; Poelstra et al., 2014; Semenov et al., 2017; Vijay et al., 2016). In such cases, counter-intuitive patterns that may be uncovered by genomic approaches can certainly complicate attempts at species delimitation, and have once again demonstrated the inherent complexity of speciation.
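The link between rapid successive speciation and incomplete lineage sorting can be made concrete with a standard result for three species under the neutral multispecies coalescent (no gene flow, constant population size, one sampled lineage per species): a gene tree disagrees with the species tree with probability (2/3)e^(-T), where T is the length of the internal branch in coalescent units (generations divided by 2Ne for diploids). The sketch below is illustrative only; the function name, the effective population size, and the branch lengths are assumptions, not values estimated for mouse lemurs.

```python
import math

def discordance_probability(generations, effective_size):
    """P(gene tree differs from species tree) for a rooted three-species tree."""
    # Internal branch length expressed in coalescent units (2 * Ne generations).
    t_coalescent = generations / (2.0 * effective_size)
    # With probability exp(-T) the two sister lineages fail to coalesce on the
    # internal branch; two of the three equally likely resolutions that follow
    # in the ancestral population are discordant.
    return (2.0 / 3.0) * math.exp(-t_coalescent)

# Hypothetical internal branch lengths (generations) with Ne = 10,000 diploids.
for gens in (1_000, 20_000, 200_000):
    print(gens, round(discordance_probability(gens, 10_000), 4))
```

Short internal branches, as expected in rapid radiations, push the discordance probability toward its maximum of two thirds, while long internal branches drive it toward zero.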

Appendix C: Genetic Incompatibilities and Rules of Speciation

Genetic incompatibilities reduce or nullify hybrid fertility and viability. Given that they tend to be slow to develop, and during hybridization events act after other barriers to gene flow, they may in many taxa not be as important as prezygotic barriers. Even so, they are the only barriers held to be irreversible. The Dobzhansky-Muller (DM) model posits that interactions between two or more loci that each diverged between two populations are responsible for genetic incompatibilities, which circumvents the need to invoke negative effects of these allelic changes when they occurred within each population. "Haldane's Rule" and the "Large X-effect" have been described as "the most consistent empirical patterns in speciation genetics" (Demuth, 2014; see also Delph & Demuth, 2016; Irwin, 2018) and are both related to DM incompatibilities. They also both involve sex chromosomes (see e.g. Irwin, 2018; Johnson & Lachance, 2012), underlining the role of chromosome-level peculiarities in speciation, which has long been recognized (Coluzzi et al., 1977).

In the earliest descriptions of hybrid sterility, Haldane (1922) observed that it was typical for the heterogametic sex to be the one to manifest hybrid sterility, as observed in mammalian males (XY) and avian females (ZW). In a related phenomenon, it has been observed that X-chromosome genes in Drosophila and most other animals cause infertility in hybrid males at a far greater rate than autosomal genes (Presgraves, 2010), with 60 percent of X-chromosome genes causing infertility in hybrid males versus 18 percent for all the non-sex chromosomes (Masly & Presgraves, 2007). Not surprisingly, sex chromosomes have been implicated as harboring an excess of genes with sex-biased expression and thus predisposing features for facilitating speciation (Yoshida et al., 2014). This theory has recently been investigated, and supported, to an extremely granular level in mice (Larson et al., 2016). The view across the eukaryotic tree of life suggests that speciation rates are lower in lineages without differentiated sex chromosomes (Phillips & Edmands, 2012), presumably correlated with the lower levels of postzygotic isolation in organisms without sex chromosomes, even when levels of overall genetic divergence are similar (Lima, 2014).

Appendix D: The uncertainty of the mutation rate estimate from pedigree-sequencing studies

Let λm and λp be the maternal and paternal mutation rates per genome. The total mutation rate is λ = λm + λp. For example, if we expect an offspring to inherit 30 and 40 mutations from their mother and father respectively, then λm = 30, λp = 40, and λ = 70. If we assume that the mutations inherited from the mother and father each have a Poisson distribution, then the total number of mutations inherited also follows a Poisson distribution.

Let xi be the total number of mutations observed in individual i, and suppose we observe mutations in n individuals. We can use the Bayesian method to estimate the mutation rate, λ. Note that the gamma distribution is the conjugate prior of the Poisson, and thus we use a gamma prior on λ with parameters α (shape) and β (rate). The posterior distribution is

$$f(\lambda \mid x_1, \ldots, x_n) \propto \lambda^{\alpha - 1} e^{-\beta\lambda} \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \propto \lambda^{\alpha + \sum_i x_i - 1} e^{-(\beta + n)\lambda}.$$

Thus, the posterior of λ is a gamma distribution with parameters α + Σxi (shape) and β + n (rate).

In the mouse lemur pedigree-sequencing results, we see 71 and 63 mutations in two individuals respectively (Table 4). Let the prior on λ be gamma with α = 2 and β = 0.02. This is a diffuse prior with mean α/β = 100 mutations. Thus the posterior of λ is gamma distributed with shape 2 + 71 + 63 = 136 and rate 0.02 + 2 = 2.02. The posterior mean is λ̃ = 136 / 2.02 = 67.3 and the posterior standard deviation is σ̃ = √136 / 2.02 = 5.77. Note this estimate of λ is given as mutations per genome. To get the rate per nucleotide site, we simply divide by the diploid genome size, g. For example, assuming g = 4 × 10⁹ in mouse lemurs, we get λ̃ = 1.68 × 10⁻⁸ and σ̃ = 1.44 × 10⁻⁹. The 95% credibility interval for the estimate is between 56.5 and 79.1 mutations per genome, or between 1.41 × 10⁻⁸ and 1.98 × 10⁻⁸ mutations per nucleotide site.
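The arithmetic in this appendix can be checked with a short script. The conjugate update, posterior mean, and posterior standard deviation follow directly from the text. The interval endpoints are a different matter: the text does not say how the 95% interval was computed, so the sketch below substitutes the Wilson-Hilferty cube-root approximation to the gamma quantile, chosen only because it needs nothing beyond the standard library. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def gamma_poisson_posterior(counts, prior_shape, prior_rate):
    """Conjugate update: a gamma prior on a Poisson rate gives a gamma posterior."""
    return prior_shape + sum(counts), prior_rate + len(counts)

def gamma_quantile_wh(p, shape, rate):
    """Approximate gamma quantile via the Wilson-Hilferty cube-root transform."""
    z = NormalDist().inv_cdf(p)
    c = 1.0 / (9.0 * shape)
    return shape * (1.0 - c + z * math.sqrt(c)) ** 3 / rate

counts = [71, 63]  # de novo mutations observed in the two focal offspring
shape, rate = gamma_poisson_posterior(counts, prior_shape=2, prior_rate=0.02)

post_mean = shape / rate
post_sd = math.sqrt(shape) / rate
lower = gamma_quantile_wh(0.025, shape, rate)  # approximate 2.5% quantile
upper = gamma_quantile_wh(0.975, shape, rate)  # approximate 97.5% quantile

genome_size = 4e9  # diploid genome size g assumed in the text
per_site_mean = post_mean / genome_size
per_site_sd = post_sd / genome_size
```

An exact interval would use the gamma quantile function from a statistics library; with a shape parameter as large as 136 the approximation used here is expected to be close.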


Appendix E: Variant Calling’s Effect on de novo Mutation Rate Estimation

Here we attempt to quantify the effects of incorrect variant calls on the estimation of de novo mutations, given a duplicate sample that provides a known variant call error rate.

Our knowns are as follows: the number of sites in the genome with sufficient coverage to make de novo calls (sites, s); the number of variants (variants, v) in the entire pedigree (vp), the mother’s sample (vm), the father’s sample (vf), and the offspring sample (vo); and the number of unique variants (unique variants, w) in the entire pedigree (wp), the mother’s sample (wm), the father’s sample (wf), and the offspring sample (wo). We also know the error rate of the replicated sample, presented here as a fraction of variants called (called error rate, c). This is the average rate of variants that are present in one replicate and absent in the other. We also assume the transmission probability of any given site is 0.5 (transmission probability, t).

Finally, we add a single unknown variable (e), which is the fraction of erroneous variant calls that are false positives. This variable ranges from 0 to 1, with 1 representing the scenario where all erroneous calls are false positives. We have no reason to expect that this fraction differs across the replicates. This variable will allow us to explore a range of possible error effects. It will also allow us to be conservative (taking the value of e that yields the most error) when reaching a final conclusion regarding error rates.

The probability that a de novo mutation call is a false positive is simply the probability of a false positive variant within the offspring that is not already a variant in the pedigree or a parental false negative at a site that is unique to the individual and transmitted.

So, substituting these into the original equation:

The probability that a genomic site with sufficient coverage is an uncalled false negative de novo mutation call is essentially the converse of the false positive. It is the probability of a false negative within the offspring that is not already a variant in the pedigree or a parental false positive at a site that is unique to the individual and transmitted.


So, substituting these into the original equation:

Using as known values the following rough estimates:

The probabilities of false negatives and false positives are as follows:

Given 134 de novo mutations from 1,968,000,000 sites of sufficient coverage, we expect:


References

Abbott, R., Albach, D., Ansell, S., Arntzen, J. W., Baird, S. J. E., Bierne, N., . . . Zinner, D. (2013). Hybridization and speciation. Journal of Evolutionary Biology, 26(2), 229-246. doi:10.1111/j.1420-9101.2012.02599.x

Adkins, R. M., & Honeycutt, R. L. (1994). Evolution of the primate cytochrome c oxidase subunit II gene. Journal of molecular evolution, 38, 215-231.

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.

Alkhateeb, A., & Rueda, L. (2017). Zseq: An Approach for Preprocessing Next-Generation Sequencing Data. J Comput Biol. doi:10.1089/cmb.2017.0021

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403-410. doi:10.1016/S0022-2836(05)80360-2

An, N. A., Ding, W., Yang, X.-Z., Peng, J., He, B. Z., Shen, Q. S., . . . Li, C.-Y. (2019). Evolutionarily significant A-to-I RNA editing events originated through G-to-A mutations in primates. Genome Biology, 20(1), 24. doi:10.1186/s13059-019-1638-y

Angelis, K., & dos Reis, M. (2015). The impact of ancestral population size and incomplete lineage sorting on Bayesian estimation of species divergence times. Current Zoology, 61(5), 874-885.

Arias, C. F., Van Belleghem, S., & McMillan, W. O. (2016). Genomics at the evolving species boundary. Current Opinion in Insect Science, 13, 7-15. doi:10.1016/j.cois.2015.10.004

Árnason, Ú., Lammers, F., Kumar, V., Nilsson, M. A., & Janke, A. (2018). Whole-genome sequencing of the blue whale and other rorquals finds signatures for introgressive gene flow. Science Advances, 4(4), eaap9873.

Aulchenko, Y. S., Ripke, S., Isaacs, A., & Van Duijn, C. M. (2007). GenABEL: an R library for genome-wide association analysis. Bioinformatics (Oxford, England), 23(10), 1294-1296.

Avise, J. C., Robinson, T. J., & Kubatko, L. (2008). Hemiplasy: A New Term in the Lexicon of Phylogenetics. Systematic Biology, 57(3), 503-507. doi:10.1080/10635150802164587

Axelsson, E., Webster, M. T., Ratnakumar, A., Consortium, L., Ponting, C. P., & Lindblad-Toh, K. (2012). Death of PRDM9 coincides with stabilization of the recombination landscape in the dog genome. Genome Res, 22(1), 51-63. doi:10.1101/gr.124123.111

Baack, E., Melo, M. C., Rieseberg, L. H., & Ortiz-Barrientos, D. (2015). The origins of reproductive isolation in plants. New Phytologist, 207(4), 968-984. doi:10.1111/nph.13424

Barluenga, M., Stölting, K. N., Salzburger, W., Muschick, M., & Meyer, A. (2006). Sympatric speciation in Nicaraguan crater lake fish. Nature, 439(7077), 719-723. doi:10.1038/nature04325

Barton, N. H., & Gale, K. S. (1993). Genetic analysis of hybrid zones. In R. G. Harrison (Ed.), Hybrid zones and the evolutionary process (pp. 13-45). Oxford: Oxford University Press.

Bass, C., Zimmer, C. T., Riveron, J. M., Wilding, C. S., Wondji, C. S., Kaussmann, M., . . . Nauen, R. (2013). Gene amplification and microsatellite polymorphism underlie a recent insect host shift. Proceedings of the National Academy of Sciences of the United States of America, 110(48), 19460-19465. doi:10.1073/pnas.1314122110

Batty, E. M., Hussin, J., Bowden, R., Donnelly, P., Martin, H. C., Piazza, P., . . . Moritz, C. (2018). Insights into Platypus Population Structure and History from Whole-Genome Sequencing. Molecular Biology and Evolution, 35(5), 1238-1252. doi:10.1093/molbev/msy041

Bayes, J. J., & Malik, H. S. (2009). Altered Heterochromatin Binding by a Hybrid Sterility Protein in Drosophila Sibling Species. Science, 326(5959), 1538-1541. doi:10.1126/science.1181756

Beaumont, M. A., & Balding, D. J. (2004). Identifying adaptive genetic divergence among populations from genome scans. Molecular Ecology, 13(4), 969-980.

Belleghem, S. M. V., Baquero, M., Papa, R., Salazar, C., McMillan, W. O., Counterman, B. A., . . . Martin, S. H. (2017). Patterns of Z chromosome divergence among Heliconius species highlight the importance of historical demography. bioRxiv, 222430. doi:10.1101/222430

Belleghem, S. M. V., Baquero, M., Papa, R., Salazar, C., McMillan, W. O., Counterman, B. A., . . . Martin, S. H. (2018). Patterns of Z chromosome divergence among Heliconius species highlight the importance of historical demography. Molecular Ecology. doi:10.1111/mec.14560

Belleghem, S. M. V., Rastas, P., Papanicolaou, A., Martin, S. H., Arias, C. F., Supple, M. A., . . . Papa, R. (2017). Complex modular architecture around a simple toolkit of wing pattern genes. Nature Ecology & Evolution, 1(3), 0052. doi:10.1038/s41559-016-0052

Bendall, E. E., Vertacnik, K. L., & Linnen, C. R. (2017). Oviposition traits generate extrinsic postzygotic isolation between two pine sawfly species. BMC Evolutionary Biology, 17, 26. doi:10.1186/s12862-017-0872-8

Berlocher, S. H., & Feder, J. L. (2002). Sympatric speciation in phytophagous insects: Moving beyond controversy? Annual Review of Entomology, 47, 773-815. doi:10.1146/annurev.ento.47.091201.145312

Berner, D., Ammann, M., Spencer, E., Rüegg, A., Lüscher, D., & Moser, D. (2016). Sexual isolation promotes divergence between parapatric lake and stream stickleback. Journal of Evolutionary Biology, 30(2), 401-411. doi:10.1111/jeb.13016

Berner, D., & Roesti, M. (2017). Genomics of adaptive divergence with chromosome-scale heterogeneity in crossover rate. Molecular Ecology. doi:10.1111/mec.14373

Bickhart, D. M., Rosen, B. D., Koren, S., Sayre, B. L., Hastie, A. R., Chan, S., . . . Smith, T. P. (2017). Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat Genet, 49(4), 643-650. doi:10.1038/ng.3802

Bikard, D., Patel, D., Metté, C. L., Giorgi, V., Camilleri, C., Bennett, M. J., & Loudet, O. (2009). Divergent Evolution of Duplicate Genes Leads to Genetic Incompatibilities Within A. thaliana. Science, 323(5914), 623-626. doi:10.1126/science.1165917

Blair, C., Campbell, C. R., & Yoder, A. D. (2015). Assessing the utility of whole genome amplified DNA for next-generation molecular ecology. Molecular Ecology Resources, 15(5), 1079-1090. doi:10.1111/1755-0998.12376

Blanco, M. B. (2011). Timely Estrus in Wild Brown Mouse Lemur Females at Ranomafana National Park, Southeastern Madagascar. American Journal of Physical Anthropology, 145(2), 311-317. doi:10.1002/ajpa.21503

Blanco, M. B., Rasoazanabary, E., & Godfrey, L. R. (2015). Unpredictable Environments, Opportunistic Responses: Reproduction and Population Turnover in Two Wild Mouse Lemur Species (Microcebus rufus and M. griseorufus) From Eastern and Western Madagascar. American Journal of Primatology, 77(9), 936-947. doi:10.1002/ajp.22423

Blankers, T., Lubke, A. K., & Hennig, R. M. (2015). Phenotypic variation and covariation indicate high evolvability of acoustic communication in crickets. Journal of Evolutionary Biology, 28(9), 1656-1669. doi:10.1111/jeb.12686

Blome, M. W., Cohen, A. S., Tryon, C. A., Brooks, A. S., & Russell, J. (2012). The environmental context for the origins of modern human diversity: A synthesis of regional variability in African climate 150,000-30,000 years ago. Journal of Human Evolution, 62(5), 563-592. doi:10.1016/j.jhevol.2012.01.011

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics (Oxford, England), 30(15), 2114-2120. doi:10.1093/bioinformatics/btu170

Bolnick, D. I. (2006). Multi-species outcomes in a common model of sympatric speciation. Journal of Theoretical Biology, 241(4), 734-744. doi:10.1016/j.jtbi.2006.01.009

Bolnick, D. I., Svanback, R., Fordyce, J. A., Yang, L. H., Davis, J. M., Hulsey, C. D., & Forister, M. L. (2003). The ecology of individuals: incidence and implications of individual specialization. Am Nat, 161(1), 1-28. doi:10.1086/343878

Bono, J. M., Olesnicky, E. C., & Matzkin, L. M. (2015). Connecting genotypes, phenotypes and fitness: harnessing the power of CRISPR/Cas9 genome editing. Molecular Ecology, 24(15), 3810-3822. doi:10.1111/mec.13252

Boughman, J. W. (2002). How sensory drive can promote speciation. Trends in Ecology and Evolution, 17(12), 571-577.

Bowen, B. W., Rocha, L. A., Toonen, R. J., Karl, S. A., & Lab, T. (2013). The origins of tropical marine biodiversity. Trends in Ecology & Evolution, 28(6), 359-366. doi:10.1016/j.tree.2013.01.018

Bradburd, G. S., Ralph, P. L., & Coop, G. M. (2016). A Spatial Framework for Understanding Population Structure and Admixture. PLoS Genet, 12(1), e1005703. doi:10.1371/journal.pgen.1005703

Brand, C. L., & Presgraves, D. C. (2016). Evolution: On the Origin of Symmetry, Synapsis, and Species. Current Biology, 26(8), R325-R328. doi:10.1016/j.cub.2016.03.014

Brandvain, Y., & Matute, D. R. (2018). When genes move, genomes collide. PLOS Genetics, 14(4), e1007286. doi:10.1371/journal.pgen.1007286

Brawand, D., Wagner, C. E., Li, Y. I., Malinsky, M., Keller, I., Fan, S., . . . Di Palma, F. (2014). The genomic substrate for adaptive radiation in African cichlid fish. Nature, 513(7518), 375-381. doi:10.1038/nature13726

Brideau, N. J., Flores, H. A., Wang, J., Maheshwari, S., Wang, X., & Barbash, D. A. (2006). Two Dobzhansky-Muller Genes Interact to Cause Hybrid Lethality in Drosophila. Science, 314(5803), 1292-1295. doi:10.1126/science.1133953

Bronikowski, A. M., Cords, M., Alberts, S. C., Altmann, J., Brockman, D. K., Fedigan, L. M., . . . Morris, W. F. (2016). Female and male life tables for seven wild primate species. Scientific Data, 3, 160006. doi:10.1038/sdata.2016.6

Brown, J. L., & Yoder, A. D. (2015). Shifting ranges and conservation challenges for lemurs in the face of climate change. Ecology and Evolution, 5(6), 1131-1142. doi:10.1002/ece3.1418

Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. A., & RoyChoudhury, A. (2012). Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis. Molecular Biology and Evolution, 29(8), 1917-1932. doi:10.1093/molbev/mss086

Buerkle, C. A. (2017). Inconvenient truths in population and speciation genetics point towards a future beyond allele frequencies. Journal of Evolutionary Biology, 30(8), 1498-1500. doi:10.1111/jeb.13106

Burney, D. A. (1986). Late Quaternary environmental dynamics of Madagascar. Duke University, Durham, NC.

Burney, D. A. (1987). Late Quaternary stratigraphic charcoal records from Madagascar. Quaternary Research, 28, 274-280.

Burney, D. A. (1988). Modern pollen spectra from Madagascar. Palaeogeography Palaeoclimatology Palaeoecology, 66, 63-75.

Burney, D. A., James, H. F., Grady, F. V., Rafamantanantsoa, J.-G., Ramilisonina, Wright, H. T., & Cowart, J. B. (1997). Environmental change, extinction and human activity: evidence from caves in NW Madagascar. Journal of Biogeography, 24, 755-767.

Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd ed.): Springer-Verlag.

Burns, S. J., Godfrey, L. R., Faina, P., McGee, D., Hardt, B., Ranivoharimanana, L., & Randrianasy, J. (2016). Rapid human-induced landscape transformation in Madagascar at the end of the first millennium of the Common Era. Quaternary Science Reviews, 134, 92-99. doi:10.1016/j.quascirev.2016.01.007

Burri, R. (2017). Interpreting differentiation landscapes in the light of long-term linked selection. Evolution Letters, 1(3), 118-131. doi:10.1002/evl3.14

Burri, R. (2017). Dissecting differentiation landscapes: a linked selection's perspective. Journal of Evolutionary Biology, 30(8), 1501-1505. doi:10.1111/jeb.13108

Bush, G. L. (1994). Sympatric Speciation in Animals - New Wine in Old Bottles. Trends in Ecology & Evolution, 9(8), 285-288. doi:10.1016/0169-5347(94)90031-0

Bush, G. L. (1998). The conceptual radicalization of an evolutionary biologist. Endless Forms: Species and Speciation, 425-438.

Callmander, M. W., Phillipson, P. B., Schatz, G. E., Andriambololonera, S., Rabarimanarivo, M., Rakotonirina, N., . . . Lowry, P. P. (2011). The endemic and non-endemic vascular flora of Madagascar updated. Plant Ecology and Evolution, 144(2), 121-125. doi:10.5091/plecevo.2011.513

Campagna, L., Kopuchian, C., Tubaro, P. L., & Lougheed, S. C. (2014). Secondary contact followed by gene flow between divergent mitochondrial lineages of a widespread Neotropical songbird (Zonotrichia capensis). Biological Journal of the Linnean Society, 111(4), 863-868.

Campbell, C. R., Poelstra, J. W., & Yoder, A. D. (2018). What is Speciation Genomics? The roles of ecology, gene flow, and genomic architecture in the formation of species. Biological Journal of the Linnean Society, 124(4), 561-583. doi:10.1093/biolinnean/bly063

Cannon, C. H., Morley, R. J., & Bush, A. B. G. (2009). The current refugial rainforests of Sundaland are unrepresentative of their biogeographic past and highly vulnerable to disturbance. Proceedings of the National Academy of Sciences of the United States of America, 106(27), 11188-11193. doi:10.1073/pnas.0809865106

Cantarel, B. L., Korf, I., Robb, S. M. C., Parra, G., Ross, E., Moore, B., . . . Yandell, M. (2008). MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Research, 18(1), 188-196.

Carneiro, M., Blanco-Aguiar, J. A., Villafuerte, R., Ferrand, N., & Nachman, M. W. (2010). Speciation in the European rabbit (Oryctolagus cuniculus): islands of differentiation on the X chromosome and autosomes. Evolution, 64(12), 3443-3460. doi:10.1111/j.1558-5646.2010.01092.x

Carneiro, M., Ferrand, N., & Nachman, M. W. (2009). Recombination and speciation: loci near centromeres are more differentiated than loci near telomeres between subspecies of the European rabbit (Oryctolagus cuniculus). Genetics, 181(2), 593-606. doi:10.1534/genetics.108.096826

Castillo, D. M., & Barbash, D. A. (2017). Moving Speciation Genetics Forward: Modern Techniques Build on Foundational Studies in Drosophila. Genetics, 207(3), 825-842. doi:10.1534/genetics.116.187120

Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A., & Cresko, W. A. (2013). Stacks: an analysis tool set for population genomics. Molecular Ecology, 22(11), 3124-3140. doi:10.1111/mec.12354

Chang, A. Y. C., Balboa, M., Zufall, R. A., Azevedo, R. B. R., Long, H., Lynch, M., . . . Sung, W. (2016). Low Base-Substitution Mutation Rate in the Germline Genome of the Ciliate Tetrahymena thermophila. Genome Biology and Evolution, 8(12), 3629-3639. doi:10.1093/gbe/evw223

Charlesworth, B. (2009). Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nature Reviews Genetics, 10(3), 195-205. doi:10.1038/nrg2526

Charlesworth, B., & Barton, N. H. (2017). The Spread of an Inversion with Migration and Selection. Genetics, genetics.300426.302017. doi:10.1534/genetics.117.300426

Chifman, J., & Kubatko, L. (2014). Quartet Inference from SNP Data Under the Coalescent Model. Bioinformatics (Oxford, England), 30(23), 3317-3324. doi:10.1093/bioinformatics/btu530

Chifman, J., & Kubatko, L. (2015). Identifiability of the unrooted species tree topology under the coalescent model with time-reversible substitution processes, site-specific rate variation, and invariable sites. Journal of Theoretical Biology, 374, 35-47. doi:10.1016/j.jtbi.2015.03.006

Chu, J. H., Wegmann, D., Yeh, C. F., Lin, R. C., Yang, X. J., Lei, F. M., . . . Li, S. H. (2013). Inferring the geographic mode of speciation by contrasting autosomal and sex-linked genetic diversity. Molecular Biology and Evolution, 30(11), 2519-2530. doi:10.1093/molbev/mst140

Cioffi, M. D., Bertollo, L. A. C., Villa, M. A., de Oliveira, E. A., Tanomtong, A., Yano, C. F., . . . Chaveerach, A. (2015). Genomic Organization of Repetitive DNA Elements and Its Implications for the Chromosomal Evolution of Channid Fishes (Actinopterygii, Perciformes). PLoS One, 10(6), e0130199. doi:10.1371/journal.pone.0130199

Coluzzi, M., Sabatini, A., Petrarca, V., & Di Deco, M. A. (1977). Behavioural divergences between mosquitoes with different inversion karyotypes in polymorphic populations of the Anopheles gambiae complex. Nature, 266(5605), 832-833.

Conflitti, I. M., Shields, G. F., Murphy, R. W., & Currie, D. C. (2015). The speciation continuum: ecological and chromosomal divergence in the Simulium arcticum complex (Diptera: Simuliidae). Biological Journal of the Linnean Society, 115(1), 13-27.

Consortium, G. P. (2010). A map of human genome variation from population-scale sequencing. Nature, 467(7319), 1061.

Cooper, J. C., & Phadnis, N. (2016). A genomic approach to identify hybrid incompatibility genes. Fly, 10(3), 142-148.

Cortesi, F., Musilova, Z., Stieb, S. M., Hart, N. S., Siebeck, U. E., Malmstrom, M., . . . Salzburger, W. (2015). Ancestral duplications and highly dynamic opsin gene evolution in percomorph fishes. Proceedings of the National Academy of Sciences of the United States of America, 112(5), 1493-1498. doi:10.1073/pnas.1417803112

Cowley, A., Foix, A., Lee, J., Chojnacki, S., & Lopez, R. (2017). Programmatic access to bioinformatics tools from EMBL-EBI update: 2017. Nucleic Acids Research, 45(W1), W550-W553. doi:10.1093/nar/gkx273

Coyne, J. A., & Orr, H. A. (2004). Speciation. Sunderland, MA: Sinauer Associates.

Coyne, J. A., & Price, T. D. (2000). Little evidence for sympatric speciation in island birds. Evolution, 54(6), 2166-2171.

Coyner, B. S., Murphy, P. J., & Matocq, M. D. (2015). Hybridization and asymmetric introgression across a narrow zone of contact between Neotoma fuscipes and N. macrotis (Rodentia: Cricetidae). Biological Journal of the Linnean Society, 115(1), 162-172.

Craul, M., Zimmermann, E., Rasoloharijaona, S., Randrianambinina, B., & Radespiel, U. (2007). Unexpected species diversity of Malagasy primates (Lepilemur spp.) in the same biogeographical zone: a morphological and molecular approach with the description of two new species. BMC Evolutionary Biology, 7, 83. doi:10.1186/1471-2148-7-83

Crispo, E., Bentzen, P., Reznick, D. N., Kinnison, M. T., & Hendry, A. P. (2006). The relative influence of natural selection and geography on gene flow in guppies. Molecular Ecology, 15(1), 49-62. doi:10.1111/j.1365-294X.2005.02764.x

Cruickshank, T. E., & Hahn, M. W. (2014). Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Molecular Ecology, 23(13), 3133-3157. doi:10.1111/mec.12796

Darwin, C. (1858). On the tendency of species to form varieties; and on the perpetuation of varieties and species by natural means of selection. Journal of the Proceedings of the Linnean Society, 1858, 45-52.

Darwin, C. (1859). On the Origin of Species by Means of Natural Selection. London: John Murray.

Davies, B., Hatton, E., Altemose, N., Hussin, J. G., Pratto, F., Zhang, G., . . . Donnelly, P. (2016). Re-engineering the zinc fingers of PRDM9 reverses hybrid sterility in mice. Nature, 530(7589), 171-176. doi:10.1038/nature16931

Delmore, K. E., Toews, D. P., Germain, R. R., Owens, G. L., & Irwin, D. E. (2016). The genetics of seasonal migration and plumage color. Current Biology, 26(16), 2167-2173.

Delph, L. F., & Demuth, J. P. (2016). Haldane’s Rule: Genetic Bases and Their Empirical Support. Journal of Heredity, 107(5), 383-391. doi:10.1093/jhered/esw026

Demuth, J. (2014). Do sex chromosomes affect speciation rate? BioEssays, 36(7), 632.

Dewar, R. E. (2014). Early human settlers and their impacts on Madagascar's landscapes. In I. R. Scales (Ed.), Conservation and Environmental Management in Madagascar (pp. 44-64). Routledge.

Dewar, R. E., Radimilahy, C., Wright, H. T., Jacobs, Z., Kelly, G. O., & Berna, F. (2013). Stone tools and foraging in northern Madagascar challenge Holocene extinction models. Proceedings of the National Academy of Sciences of the United States of America, 110(31), 12583-12588. doi:10.1073/pnas.1306100110

Dewar, R. E., & Richard, A. F. (2007). Evolution in the hypervariable environment of Madagascar. Proc Natl Acad Sci U S A, 104(34), 13723-13727. doi:10.1073/pnas.0704346104

Dewar, R. E., & Wright, H. T. (1993). The culture history of Madagascar. Journal of World Prehistory, 7(4), 417-466.

Dieckmann, U., & Doebeli, M. (1999). On the origin of species by sympatric speciation. Nature, 400(6742), 354-357. doi:10.1038/22521

Ding, Y., Berrocal, A., Morita, T., Longden, K. D., & Stern, D. L. (2016). Natural courtship song variation caused by an intronic retroelement in an ion channel gene. Nature, 536(7616), 329. doi:10.1038/nature19093

Dobzhansky, T. (1933). On the Sterility of the Interracial Hybrids in Drosophila Pseudoobscura. Proceedings of the National Academy of Sciences of the United States of America, 19(4), 397-403.

Doebeli, M., & Dieckmann, U. (2003). Speciation along environmental gradients. Nature, 421(6920), 259-264. doi:10.1038/nature01274

Dorus, S., Evans, P. D., Wyckoff, G. J., Choi, S. S., & Lahn, B. T. (2004). Rate of molecular evolution of the seminal protein gene SEMG2 correlates with levels of female promiscuity. Nature Genetics, 36, 1326. doi:10.1038/ng1471

dos Reis, M., Donoghue, P. C. J., & Yang, Z. (2015). Bayesian molecular clock dating of species divergences in the genomics era. Nature Reviews Genetics, 17, 71. doi:10.1038/nrg.2015.8

Dres, M., & Mallet, J. (2002). Host races in plant-feeding insects and their importance in sympatric speciation. Philosophical Transactions of the Royal Society B-Biological Sciences, 357(1420), 471-492. doi:10.1098/rstb.2002.1059

Dudas, G., Bedford, T., Lycett, S., & Rambaut, A. (2015). Reassortment between Influenza B Lineages and the Emergence of a Coadapted PB1-PB2-HA Gene Complex. Molecular Biology and Evolution, 32(1), 162-172. doi:10.1093/molbev/msu287

Dudchenko, O., Batra, S. S., Omer, A. D., Nyquist, S. K., Hoeger, M., Durand, N. C., . . . Aiden, E. L. (2017). De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science, 356(6333), 92-95. doi:10.1126/science.aal3327

Eaton, D. A. R. (2014). PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics (Oxford, England), 30(13), 1844-1849. doi:10.1093/bioinformatics/btu121

Eberle, M., & Kappeler, P. M. (2004). Sex in the dark: determinants and consequences of mixed male mating tactics in Microcebus murinus, a small solitary nocturnal primate. Behavioral Ecology and Sociobiology, 57(1), 77-90. doi:10.1007/s00265-004-0826-1

Egan, S. P., Ragland, G. J., Assour, L., Powell, T. H., Hood, G. R., Emrich, S., . . . Feder, J. L. (2015). Experimental evidence of genome-wide impact of ecological selection during early stages of speciation-with-gene-flow. Ecol Lett, 18(8), 817-825. doi:10.1111/ele.12460

Ellegren, H. (2014). Genome sequencing and population genomics in non-model organisms. Trends Ecol Evol, 29(1), 51-63. doi:10.1016/j.tree.2013.09.008

Ellegren, H., Smeds, L., Burri, R., Olason, P. I., Backstrom, N., Kawakami, T., . . . Wolf, J. B. (2012). The genomic landscape of species divergence in Ficedula flycatchers. Nature, 491(7426), 756-760. doi:10.1038/nature11584

Evans, B. J. (2008). Genome evolution and speciation genetics of clawed frogs (Xenopus and Silurana). Front Biosci, 13, 4687-4706.

Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V. C., & Foll, M. (2013). Robust demographic inference from genomic and SNP data. PLOS Genetics, 9(10), e1003905. doi:10.1371/journal.pgen.1003905

Fan, S., & Meyer, A. (2014). Evolution of genomic structural variation and genomic architecture in the adaptive radiations of African cichlid fishes. Front Genet, 5, 163. doi:10.3389/fgene.2014.00163

Feder, J. L., Egan, S. P., & Nosil, P. (2012). The genomics of speciation-with-gene-flow. Trends in Genetics, 28(7), 342-350. doi:10.1016/j.tig.2012.03.009

Feder, J. L., Flaxman, S. M., Egan, S. P., & Nosil, P. (2013). Hybridization and the build-up of genomic divergence during speciation. Journal of Evolutionary Biology, 26, 261-266.

Feder, J. L., & Nosil, P. (2009). Chromosomal Inversions and Species Differences: When Are Genes Affecting Adaptive Divergence and Reproductive Isolation Expected to Reside within Inversions? Evolution, 63(12), 3061-3075. doi:10.1111/j.1558-5646.2009.00786.x

Feder, J. L., & Nosil, P. (2010). The efficacy of divergence hitchhiking in generating genomic islands during ecological speciation. Evolution, 64(6), 1729-1747.

Feder, J. L., Nosil, P., & Flaxman, S. M. (2014). Assessing when chromosomal rearrangements affect the dynamics of speciation: implications from computer simulations. Front Genet, 5, 295. doi:10.3389/fgene.2014.00295

Feder, J. L., Opp, S. B., Wlazlo, B., Reynolds, K., Go, W., & Spisak, S. (1994). Host Fidelity Is an Effective Premating Barrier between Sympatric Races of the Apple Maggot Fly. Proceedings of the National Academy of Sciences of the United States of America, 91(17), 7990-7994. doi:10.1073/pnas.91.17.7990

Felsenstein, J. (1981). Skepticism towards Santa Rosalia, or why are there so few kinds of animals? Evolution, 35(1), 124-138.

Felsenstein, J. (1985). Confidence limits on phylogenies with a molecular clock. Systematic Zoology, 34, 152-161.

Feng, C., Pettersson, M., Lamichhaney, S., Rubin, C. J., Rafati, N., Casini, M., . . . Andersson, L. (2017). Moderate nucleotide diversity in the Atlantic herring is associated with a low mutation rate. eLife, 6. doi:10.7554/eLife.23907

Ferree, P. M., & Barbash, D. A. (2009). Species-Specific Heterochromatin Prevents Mitotic Chromosome Segregation to Cause Hybrid Lethality in Drosophila. PLOS Biology, 7(10), e1000234. doi:10.1371/journal.pbio.1000234

Firman, R. C., & Simmons, L. W. (2011). Experimental evolution of sperm competitiveness in a mammal. BMC Evolutionary Biology, 11(1), 19. doi:10.1186/1471-2148-11-19

Fishman, L., Stathos, A., Beardsley, P. M., Williams, C. F., & Hill, J. P. (2013). Chromosomal rearrangements and the genetics of reproductive barriers in Mimulus (monkey flowers). Evolution, 67(9), 2547-2560. doi:10.1111/evo.12154

Flaxman, S. M., Wacholder, A. C., Feder, J. L., & Nosil, P. (2014). Theoretical models of the influence of genomic architecture on the dynamics of speciation. Molecular Ecology, 23(16), 4074-4088. doi:10.1111/mec.12750

Franchini, P., Fruciano, C., Spreitzer, M. L., Jones, J. C., Elmer, K. R., Henning, F., & Meyer, A. (2014). Genomic architecture of ecologically divergent body shape in a pair of sympatric crater lake cichlid fishes. Molecular Ecology, 23(7), 1828-1845. doi:10.1111/mec.12590

Frantz, L. A., Schraiber, J. G., Madsen, O., Megens, H. J., Cagan, A., Bosse, M., . . . Groenen, M. A. (2015). Evidence of long-term gene flow and selection during domestication from analyses of Eurasian wild and domestic pig genomes. Nat Genet, 47(10), 1141-1148. doi:10.1038/ng.3394

Fuentes-Pardo, A. P., & Ruzzante, D. E. (2017). Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations. Molecular Ecology. doi:10.1111/mec.14264

Fuller, Z., Leonard, C., Young, R., Schaeffer, S., & Phadnis, N. (2017). The role of chromosomal inversions in speciation. bioRxiv, 211771. doi:10.1101/211771

Funk, D. J., Nosil, P., & Etges, W. J. (2006). Ecological divergence exhibits consistently positive associations with reproductive isolation across disparate taxa. Proc Natl Acad Sci U S A, 103(9), 3209-3213. doi:10.1073/pnas.0508653103

Gaither, M. R., Bernal, M. A., Coleman, R. R., Bowen, B. W., Jones, S. A., Simison, W. B., & Rocha, L. A. (2015). Genomic signatures of geographic isolation and natural selection in coral reef fishes. Molecular Ecology, 24(7), 1543-1557. doi:10.1111/mec.13129

Gante, H. F., Matschiner, M., Malmstrom, M., Jakobsen, K. S., Jentoft, S., & Salzburger, W. (2016). Genomics of speciation and introgression in Princess cichlid fishes from Lake Tanganyika. Molecular Ecology, 25(24), 6143-6161. doi:10.1111/mec.13767

Ganzhorn, J. U., & Schmid, J. (1998). Different population dynamics of Microcebus murinus in primary and secondary deciduous dry forests of Madagascar. International Journal of Primatology, 19(5), 785-796.

Gao, Z., Moorjani, P., Sasani, T., Pedersen, B., Quinlan, A., Jorde, L., . . . Przeworski, M. (2018). Overlooked roles of DNA damage and maternal age in generating human germline mutations. bioRxiv, 327098. doi:10.1101/327098

Gasse, F., Cortijo, E., Disnar, J. R., Ferry, L., Gibert, E., Kissel, C., . . . Williamson, D. (1994). A 36-Ka Environmental Record in the Southern Tropics - Lake Tritrivakely (Madagascar). Comptes Rendus De L Academie Des Sciences Serie Ii, 318(11), 1513-1519.

Gasse, F., & Van Campo, E. (2001). Late Quaternary environmental changes from a pollen and diatom record in the southern tropics (Lake Tritrivakely, Madagascar). Palaeogeography Palaeoclimatology Palaeoecology, 167(3-4), 287-308. doi:10.1016/S0031-0182(00)00242-X

Gaudelli, N. M., Komor, A. C., Rees, H. A., Packer, M. S., Badran, A. H., Bryson, D. I., & Liu, D. R. (2017). Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature, advance online publication. doi:10.1038/nature24644

Gautier, L., & Goodman, S. M. (2003). Introduction to the flora of Madagascar. In S. M. Goodman & J. Benstead (Eds.), The Natural History of Madagascar (pp. 229-250). Chicago: University of Chicago Press.

Gavrilets, S. (2000). Rapid evolution of reproductive barriers driven by sexual conflict. Nature, 403, 886. doi:10.1038/35002564

Gavrilets, S. (2003). Perspective: Models of speciation: What have we learned in 40 years? Evolution, 57(10), 2197-2215.

Gavrilets, S. (2004). Fitness Landscapes and the Origin of Species. Princeton University Press.

Geiselhardt, S., Otte, T., & Hilker, M. (2012). Looking for a similar partner: host plants shape mating preferences of herbivorous insects by altering their contact pheromones. Ecology Letters, 15(9), 971-977. doi:10.1111/j.1461-0248.2012.01816.x

Genome Project Data Processing, S., Wysoker, A., Handsaker, B., Marth, G., Abecasis, G., Li, H., . . . Fennell, T. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England), 25(16), 2078-2079. doi:10.1093/bioinformatics/btp352

Geraldes, A., Basset, P., Gibson, B., Smith, K. L., Harr, B., Yu, H. T., . . . Nachman, M. W. (2008). Inferring the history of speciation in house mice from autosomal, X-linked, Y-linked and mitochondrial genes. Molecular Ecology, 17(24), 5349-5363. doi:10.1111/j.1365-294X.2008.04005.x

Geraldes, A., Carneiro, M., Delibes-Mateos, M., Villafuerte, R., Nachman, M. W., & Ferrand, N. (2008). Reduced introgression of the Y chromosome between subspecies of the European rabbit (Oryctolagus cuniculus) in the Iberian Peninsula. Molecular Ecology, 17(20), 4489-4499. doi:10.1111/j.1365-294X.2008.03943.x

Gerard, A., Ganzhorn, J. U., Kull, C. A., & Carriere, S. M. (2015). Possible roles of introduced plants for native vertebrate conservation: the case of Madagascar. Restoration Ecology, 23(6), 768-775. doi:10.1111/rec.12246

Getz, W. M., Salter, R., Seidel, D. P., & van Hooft, P. (2016). Sympatric speciation in structureless environments. BMC Evolutionary Biology, 16, 50. doi:10.1186/s12862-016-0617-0

Gnerre, S., Maccallum, I., Przybylski, D., Ribeiro, F. J., Burton, J. N., Walker, B. J., . . . Jaffe, D. B. (2011). High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A, 108(4), 1513-1518. doi:10.1073/pnas.1017351108

Gompert, Z., & Buerkle, C. A. (2009). A powerful regression-based method for admixture mapping of isolation across the genome of hybrids. Molecular Ecology, 18(6), 1207-1224.

Gompert, Z., Lucas, L. K., Nice, C. C., & Buerkle, C. A. (2013). Genome divergence and the genetic architecture of barriers to gene flow between Lycaeides idas and L. melissa. Evolution, 67(9), 2498-2514.

Gompert, Z., Mandeville, E. G., & Buerkle, C. A. (2017). Analysis of population genomic data from hybrid zones. Annual Review of Ecology, Evolution, and Systematics, 48, 207-229.

Goodman, S. M., & Jungers, W. L. (2014). Extinct Madagascar: picturing the island's past. Chicago, IL: University of Chicago Press.

Goodwin, S., McPherson, J. D., & McCombie, W. R. (2016). Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics, 17, 333. doi:10.1038/nrg.2016.49

Gourbiere, S., & Mallet, J. (2010). Are Species Real? The Shape of the Species Boundary with Exponential Failure, Reinforcement, and the "Missing Snowball". Evolution, 64(1), 1-24. doi:10.1111/j.1558-5646.2009.00844.x

Gravogl, Y., Schwarz, T., & Tiemann-Boege, I. (2014). Characterizing the zinc finger binding interactions of PRDM9 with the DNA of recombination hotspots. FEBS Journal, 281, 297-297.

Green, G. M., & Sussman, R. W. (1990). Deforestation history of the eastern rain forests of Madagascar from satellite images. Science, 248, 212-214.

Greer, S. U., Nadauld, L. D., Lau, B. T., Chen, J., Wood-Bouwens, C., Ford, J. M., . . . Ji, H. P. (2017). Linked read sequencing resolves complex genomic rearrangements in gastric cancer metastases. Genome Medicine, 9. doi:10.1186/s13073-017-0447-8

Groeneveld, L. F., Atencia, R., Garriga, R. M., & Vigilant, L. (2012). High Diversity at PRDM9 in Chimpanzees and Bonobos. PLoS One, 7(7), e39064. doi:10.1371/journal.pone.0039064

Gross, L. (2006). Demonstrating the theory of ecological speciation in cichlids. PLoS Biol, 4(12), e449. doi:10.1371/journal.pbio.0040449

Guan, Y., & Stephens, M. (2011). Bayesian variable selection regression for genome-wide association studies and other large-scale problems. The Annals of Applied Statistics, 1780-1815.

Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H., & Bustamante, C. D. (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLOS Genetics, 5(10), e1000695. doi:10.1371/journal.pgen.1000695

Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J., . . . Regev, A. (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols, 8, 1494. doi:10.1038/nprot.2013.084

Hahn, M. W., & Nakhleh, L. (2016). Irrational exuberance for resolved species trees. Evolution, 70(1), 7-17. doi:10.1111/evo.12832

Haldane, J. B. S. (1922). Sex ratio and unisexual sterility in hybrid animals. Journal of Genetics, 12, 101–109.

Hamalainen, A., Dammhahn, M., Aujard, F., Eberle, M., Hardy, I., Kappeler, P. M., . . . Kraus, C. (2014). Senescence or selective disappearance? Age trajectories of body mass in wild and captive populations of a small-bodied primate. Proc Biol Sci, 281(1791), 20140830. doi:10.1098/rspb.2014.0830

Hapke, A., Gligor, M., Rakotondranary, S. J., Rosenkranz, D., & Zupke, O. (2011). Hybridization of mouse lemurs: different patterns under different ecological conditions. BMC Evolutionary Biology, 11(1), 297. doi:10.1186/1471-2148-11-297

Harcourt, A. H., Purvis, A., & Liles, L. (1995). Sperm Competition: Mating System, Not Breeding Season, Affects Testes Size of Primates. Functional Ecology, 9(3), 468-476. doi:10.2307/2390011

Harper, G. J., Steininger, M. K., Tucker, C. J., Juhn, D., & Hawkins, F. (2007). Fifty years of deforestation and forest fragmentation in Madagascar. Environmental Conservation, 34(4), 325-333. doi:10.1017/S0376892907004262

Harris, K., & Pritchard, J. K. (2017). Rapid evolution of the human mutation spectrum. eLife, 6, e24284. doi:10.7554/eLife.24284

Harris, R. B., Alström, P., Ödeen, A., & Leaché, A. D. (2017). Discordance between genomic divergence and phenotypic variation in a rapidly evolving avian genus (Motacilla). arXiv:1707.03864 [q-bio].

Harrison, R. G., & Larson, E. L. (2014). Hybridization, introgression, and the nature of species boundaries. J Hered, 105 Suppl 1, 795-809. doi:10.1093/jhered/esu033

Hasegawa, M., Kishino, H., & Yano, T. (1985). Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution, 22, 160-174.

Hawkins, M. T. R., Culligan, R. R., Frasier, C. L., Dikow, R. B., Hagenson, R., Lei, R., & Louis, E. E. (2018). Genome sequence and population declines in the critically endangered greater bamboo lemur (Prolemur simus) and implications for conservation. BMC Genomics, 19(1), 445. doi:10.1186/s12864-018-4841-4

Hawthorne, D. J., & Via, S. (2001). Genetic linkage of ecological specialization and reproductive isolation in pea aphids. Nature, 412(6850), 904-907. doi:10.1038/35091062

Hayes, A., Morgenstern, B., Gunduz, I., Stanke, M., Keller, O., & Waack, S. (2006). AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research, 34(suppl_2), W435-W439. doi:10.1093/nar/gkl200

Heckman, K. L., Mariani, C. L., Rasoloarison, R., & Yoder, A. D. (2007). Multiple nuclear loci reveal patterns of incomplete lineage sorting and complex species history within western mouse lemurs (Microcebus). Molecular Phylogenetics and Evolution, 43(2), 353-367. doi:10.1016/j.ympev.2007.03.005

Heerschop, S., Zischler, H., Merker, S., Perwitasari-Farajallah, D., & Driller, C. (2016). The pioneering role of PRDM9 indel mutations in tarsier evolution. Sci Rep, 6, 34618. doi:10.1038/srep34618

Hermansen, J. S., Haas, F., Trier, C. N., Bailey, R. I., Nederbragt, A. J., Marzal, A., & Saetre, G. P. (2014). Hybrid speciation through sorting of parental incompatibilities in Italian sparrows. Molecular Ecology, 23(23), 5831-5842. doi:10.1111/mec.12910

Herron, M. D., & Doebeli, M. (2013). Parallel Evolutionary Dynamics of Adaptive Diversification in Escherichia coli. PLOS Biology, 11(2), e1001490. doi:10.1371/journal.pbio.1001490

Hoffmann, A. A., & Rieseberg, L. H. (2008). Revisiting the Impact of Inversions in Evolution: From Population Genetic Markers to Drivers of Adaptive Shifts and Speciation? Annual Review of Ecology, Evolution, and Systematics, 39(1), 21-42. doi:10.1146/annurev.ecolsys.39.110707.173532

Hooper, D. M., & Price, T. D. (2017). Chromosomal inversion differences correlate with range overlap in passerine birds. Nature Ecology & Evolution, 1(10), 1526. doi:10.1038/s41559-017-0284-6

Hotaling, S., Foley, M. E., Lawrence, N. M., Bocanegra, J., Blanco, M. B., Rasoloarison, R., . . . Weisrock, D. W. (2016). Species discovery and validation in a cryptic radiation of endangered primates: coalescent-based species delimitation in Madagascar's mouse lemurs. Molecular Ecology. doi:10.1111/mec.13604

Howard, D. J. (1999). Conspecific Sperm and Pollen Precedence and Speciation. Annual Review of Ecology and Systematics, 30(1), 109-132. doi:10.1146/annurev.ecolsys.30.1.109

Humbert, H. (1955). Les territoires phytogéographiques de Madagascar. Année Biologique, 3e série, 31, 439-448.

Ichikawa, K., Tomioka, S., Suzuki, Y., Nakamura, R., Doi, K., Yoshimura, J., . . . Morishita, S. (2017). Centromere evolution and CpG methylation during vertebrate speciation. Nature Communications, 8(1), 1833. doi:10.1038/s41467-017-01982-7

Irwin, D. E. (2018). Sex chromosomes and speciation in birds and other ZW systems. Molecular Ecology. doi:10.1111/mec.14537

Jackman, S. D., Vandervalk, B. P., Mohamadi, H., Chu, J., Yeo, S., Hammond, S. A., . . . Birol, I. (2017). ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Research, 27(5), 768-777. doi:10.1101/gr.214346.116

Janoušek, V., Munclinger, P., Wang, L., Teeter, K. C., & Tucker, P. K. (2015). Functional organization of the genome may shape the species boundary in the house mouse. Molecular Biology and Evolution, 32(5), 1208-1220. doi:10.1093/molbev/msv011

Janousek, V., Wang, L., Luzynski, K., Dufkova, P., Vyskocilova, M. M., Nachman, M. W., . . . Tucker, P. K. (2012). Genome-wide architecture of reproductive isolation in a naturally occurring hybrid zone between Mus musculus musculus and M. m. domesticus. Molecular Ecology, 21(12), 3032-3047. doi:10.1111/j.1365-294X.2012.05583.x

Jennions, M. D., & Petrie, M. (2000). Why do females mate multiply? A review of the genetic benefits. Biological Reviews, 75(1), 21-64. doi:10.1017/S0006323199005423

Jeukens, J., Renaut, S., St-Cyr, J., Nolte, A. W., & Bernatchez, L. (2010). The transcriptomics of sympatric dwarf and normal lake whitefish (Coregonus clupeaformis spp., Salmonidae) divergence as revealed by next-generation sequencing. Molecular Ecology, 19(24), 5389-5403. doi:10.1111/j.1365-294X.2010.04934.x

Jiao, W.-B., & Schneeberger, K. (2017). The impact of third generation genomic technologies on plant genome assembly. Current Opinion in Plant Biology, 36, 64-70. doi:10.1016/j.pbi.2017.02.002

Jiggins, C. D., & Martin, S. H. (2017). Glittering gold and the quest for Isla de Muerta. Journal of Evolutionary Biology, 30(8), 1509-1511. doi:10.1111/jeb.13110

Johnson, N. A. (2010). Hybrid incompatibility genes: remnants of a genomic battlefield? Trends in Genetics, 26(7), 317-325.

Johnson, N. A., & Lachance, J. (2012). The genetics of sex chromosomes: evolution and implications for hybrid incompatibility. Annals of the New York Academy of Sciences, 1256(1), E1-E22. doi:10.1111/j.1749-6632.2012.06748.x

Jones, F. C., Grabherr, M. G., Chan, Y. F., Russell, P., Mauceli, E., Johnson, J., . . . Kingsley, D. M. (2012). The genomic basis of adaptive evolution in threespine sticklebacks. Nature, 484(7392), 55. doi:10.1038/nature10944

Jónsson, H., Sulem, P., Kehr, B., Kristmundsdottir, S., Zink, F., Hjartarson, E., . . . Stefansson, K. (2017). Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature, 549, 519. doi:10.1038/nature24018

Kamath, G. M., Shomorony, I., Xia, F., Courtade, T. A., & Tse, D. N. (2017). HINGE: long-read assembly achieves optimal repeat resolution. Genome Research, 27(5), 747-756. doi:10.1101/gr.216465.116

Kang, L., Settlage, R., McMahon, W., Michalak, K., Tae, H., Garner, H. R., . . . Michalak, P. (2016). Genomic Signatures of Speciation in Sympatric and Allopatric Hawaiian Picture-Winged Drosophila. Genome Biol Evol, 8(5), 1482-1488. doi:10.1093/gbe/evw095

Kappeler, P. M. (1997). Intrasexual selection and testis size in strepsirhine primates. Behavioral Ecology, 8(1), 10-19. doi:10.1093/beheco/8.1.10

Kappeler, P. M., Rasoloarison, R., Razafimanantsoa, L., Walter, L., & Roos, C. (2005). Morphology, behaviour and molecular evolution of giant mouse lemurs (Mirza spp.) Gray, 1870, with description of a new species. Primate Report, 71, 3-26.

Kapusta, A., Suh, A., & Feschotte, C. (2017). Dynamics of genome size evolution in birds and mammals. Proceedings of the National Academy of Sciences, 114(8), E1460. doi:10.1073/pnas.1616702114

Katoh, K., & Standley, D. M. (2013). MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution, 30(4), 772-780. doi:10.1093/molbev/mst010

Kautt, A. F., Machado-Schiaffino, G., & Meyer, A. (2016). Multispecies outcomes of sympatric speciation after admixture with the source population in two radiations of Nicaraguan crater lake cichlids. PLOS Genetics, 12(6), e1006157. doi:10.1371/journal.pgen.1006157

Kearns, A. M., Restani, M., Szabo, I., Schrøder-Nielsen, A., Kim, J. A., Richardson, H. M., . . . Omland, K. E. (2018). Genomic evidence of speciation reversal in ravens. Nature Communications, 9(1), 906.

Kern, A. D., & Hey, J. (2017). Exact Calculation of the Joint Allele Frequency Spectrum for Isolation with Migration Models. Genetics, 207(1), 241-253. doi:10.1534/genetics.116.194019

Kimura, M. (1967). On the evolutionary adjustment of spontaneous mutation rates. Genetical Research, 9(1), 23-34. doi:10.1017/S0016672300010284

Kingman, J. F. C. (1982). The coalescent. Stochastic Processes and their Applications, 13(3), 235-248. doi:10.1016/0304-4149(82)90011-4

Kirkpatrick, M., & Barton, N. (2006). Chromosome Inversions, Local Adaptation and Speciation. Genetics, 173(1), 419-434. doi:10.1534/genetics.105.047985

Kirubakaran, T. G., Grove, H., Kent, M. P., Sandve, S. R., Baranski, M., Nome, T., . . . Andersen, Ø. (2016). Two adjacent inversions maintain genomic differentiation between migratory and stationary ecotypes of Atlantic cod. Molecular Ecology, 25(10), 2130-2143. doi:10.1111/mec.13592

Koboldt, D. C., Larson, D. E., Mardis, E. R., Weinstock, G. M., Chen, K., Ding, L., . . . Wylie, T. (2009). VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics (Oxford, England), 25(17), 2283-2285. doi:10.1093/bioinformatics/btp373

Kondrashov, A. S., & Kondrashov, F. A. (1999). Interactions among quantitative traits in the course of sympatric speciation. Nature, 400(6742), 351-354.

Kondrashov, A. S., & Shpak, M. (1998). On the origin of species by means of assortative mating. Proceedings of the Royal Society B-Biological Sciences, 265(1412), 2273-2278.

Kong, A., Frigge, M. L., Masson, G., Besenbacher, S., Sulem, P., Magnusson, G., . . . Stefansson, K. (2012). Rate of de novo mutations and the importance of father’s age to disease risk. Nature, 488, 471. doi:10.1038/nature11396

Kono, H., Tamura, M., Osada, N., Suzuki, H., Abe, K., Moriwaki, K., . . . Shiroishi, T. (2014). Prdm9 polymorphism unveils mouse evolutionary tracks. DNA Res, 21(3), 315-326. doi:10.1093/dnares/dst059

Konrad, A., Thompson, O., Waterston, R. H., Moerman, D. G., Keightley, P. D., Bergthorsson, U., & Katju, V. (2017). Mitochondrial Mutation Rate, Spectrum and Heteroplasmy in Caenorhabditis elegans Spontaneous Mutation Accumulation Lines of Differing Population Size. Molecular Biology and Evolution, 34(6), 1319-1334. doi:10.1093/molbev/msx051

Korf, I. (2004). Gene finding in novel genomes. BMC Bioinformatics, 5(1), 59. doi:10.1186/1471-2105-5-59

Kraft, K., Geuer, S., Will, A. J., Chan, W. L., Paliou, C., Borschiwer, M., . . . Andrey, G. (2015). Deletions, Inversions, Duplications: Engineering of Structural Variants using CRISPR/Cas in Mice. Cell Reports, 10(5), 833-839. doi:10.1016/j.celrep.2015.01.016

Kratochwil, C. F., & Meyer, A. (2015). Evolution: Tinkering within Gene Regulatory Landscapes. Current Biology, 25(7), R285-R288. doi:10.1016/j.cub.2015.02.051

Kriventseva, E. V., Zdobnov, E. M., Simão, F. A., Ioannidis, P., & Waterhouse, R. M. (2015). BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics (Oxford, England), 31(19), 3210-3212. doi:10.1093/bioinformatics/btv351

Krone, S. M., & Neuhauser, C. (1997). Ancestral Processes with Selection. Theoretical Population Biology, 51(3), 210-237.

Kruck, N. C., Innes, D. I., & Ovenden, J. R. (2013). New SNPs for population genetic analysis reveal possible cryptic speciation of eastern Australian sea mullet (Mugil cephalus). Mol Ecol Resour, 13(4), 715-725. doi:10.1111/1755-0998.12112

Kuhner, M. K., Yamato, J., & Felsenstein, J. (1995). Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics, 140(4), 1421-1430.

Kumar, S., & Subramanian, S. (2002). Mutation rates in mammalian genomes. Proceedings of the National Academy of Sciences, 99(2), 803. doi:10.1073/pnas.022629899

Künstner, A., Ellegren, H., Wolf, J. B. W., Nam, K., & Jakobsson, M. (2009). Nonlinear Dynamics of Nonsynonymous (dN) and Synonymous (dS) Substitution Rates Affects Inference of Selection. Genome Biology and Evolution, 1, 308-319. doi:10.1093/gbe/evp030

Lahann, P., Schmid, J., & Ganzhorn, J. U. (2006). Geographic variation in populations of Microcebus murinus in Madagascar: Resource seasonality or Bergmann's rule? International Journal of Primatology, 27(4), 983-999. doi:10.1007/s10764-006-9055-y

Lamichhaney, S., Han, F., Webster, M. T., Andersson, L., Grant, B. R., & Grant, P. R. (2017). Rapid hybrid speciation in Darwin’s finches. Science, eaao4593. doi:10.1126/science.aao4593

Lanfear, R., Calcott, B., Ho, S. Y. W., & Guindon, S. (2012). PartitionFinder: Combined Selection of Partitioning Schemes and Substitution Models for Phylogenetic Analyses. Molecular Biology and Evolution, 29(6), 1695-1701. doi:10.1093/molbev/mss020

Lanfear, R., Kokko, H., & Eyre-Walker, A. (2014). Population size and the rate of evolution. Trends Ecol Evol, 29(1), 33-41. doi:10.1016/j.tree.2013.09.009

Lapierre, M., Lambert, A., & Achaz, G. (2017). Accuracy of Demographic Inferences from the Site Frequency Spectrum: The Case of the Yoruba Population. Genetics, genetics.116.192708. doi:10.1534/genetics.116.192708

Larsen, P. A., Harris, R. A., Liu, Y., Murali, S. C., Campbell, C. R., Brown, A. D., . . . Worley, K. C. (2017). Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus). BMC Biology, 15, 110. doi:10.1186/s12915-017-0439-6

Larsen, P. A., Heilman, A. M., & Yoder, A. D. (2014). The utility of PacBio circular consensus sequencing for characterizing complex gene families in non-model organisms. BMC Genomics, 15(720).

Larson, E. L., Keeble, S., Vanderpool, D., Dean, M. D., & Good, J. M. (2016). The composite regulatory basis of the large X-effect in mouse speciation. Molecular Biology and Evolution. doi:10.1093/molbev/msw243

Leducq, J. B., Nielly-Thibault, L., Charron, G., Eberlein, C., Verta, J. P., Samani, P., . . . Landry, C. R. (2016). Speciation driven by hybridization and chromosomal plasticity in a wild yeast. Nature Microbiology, 1(1), 15003. doi:10.1038/nmicrobiol.2015.3

Lee, H., Gurtowski, J., Yoo, S., Nattestad, M., Marcus, S., Goodwin, S., . . . Schatz, M. (2016). Third-generation sequencing and the future of genomics. bioRxiv, 048603. doi:10.1101/048603

Leffler, E. M., Bullaughey, K., Matute, D. R., Meyer, W. K., Segurel, L., Venkat, A., . . . Przeworski, M. (2012). Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol, 10(9), e1001388. doi:10.1371/journal.pbio.1001388

Lewontin, R., & Krakauer, J. (1973). Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics, 74(1), 175-195.

Lexer, C., Buerkle, C., Joseph, J., Heinze, B., & Fay, M. (2007). Admixture in European Populus hybrid zones makes feasible the mapping of loci that contribute to reproductive isolation and trait differences. Heredity, 98(2), 74.

Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England), 25(14), 1754-1760. doi:10.1093/bioinformatics/btp324

Li, K. X., Hong, W., Jiao, H. W., Wang, G. D., Rodriguez, K. A., Buffenstein, R., . . . Zhao, H. B. (2015). Sympatric speciation revealed by genome-wide divergence in the blind mole rat Spalax. Proceedings of the National Academy of Sciences of the United States of America, 112(38), 11905-11910. doi:10.1073/pnas.1514896112

Li, M., Tian, S., Jin, L., Zhou, G., Li, Y., Zhang, Y., . . . Ma, J. (2013). Genomic analyses identify distinct patterns of selection in domesticated pigs and Tibetan wild boars. Nature Genetics, 45(12), 1431-1438.

Li, Z., Zhang, Z., Yan, P., Huang, S., Fei, Z., & Lin, K. (2011). RNA-Seq improves annotation of protein-coding genes in the cucumber genome. BMC Genomics, 12, 540. doi:10.1186/1471-2164-12-540

Lieberman-Aiden, E., van Berkum, N. L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., . . . Dekker, J. (2009). Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science, 326(5950), 289-293. doi:10.1126/science.1181369

Lima, T. G. (2014). Higher levels of sex chromosome heteromorphism are associated with markedly stronger reproductive isolation. Nat Commun, 5, 4743. doi:10.1038/ncomms5743

Lindtke, D., & Yeaman, S. (2017). Identifying the loci of speciation: the challenge beyond genome scans. Journal of Evolutionary Biology, 30(8), 1478-1481. doi:10.1111/jeb.13098

Linn, C., Jr., Feder, J. L., Nojima, S., Dambroski, H. R., Berlocher, S. H., & Roelofs, W. (2003). Fruit odor discrimination and sympatric host formation in Rhagoletis. Proc Natl Acad Sci U S A, 100(20), 11490-11493. doi:10.1073/pnas.1635049100

Loh, Y. H., Yi, S. V., & Streelman, J. T. (2011). Evolution of microRNAs and the diversification of species. Genome Biol Evol, 3, 55-65. doi:10.1093/gbe/evq085

Lohse, K., Chmelik, M., Martin, S. H., & Barton, N. H. (2015). Efficient Strategies for Calculating Blockwise Likelihoods under the Coalescent. Genetics, genetics.115.183814. doi:10.1534/genetics.115.183814

Losos, J. B., & Glor, R. E. (2003). Phylogenetic comparative methods and the geography of speciation. Trends in Ecology & Evolution, 18(5), 220-227. doi:10.1016/S0169-5347(03)00037-5

Lou, Y. N., Liu, W. J., Wang, C. L., Huang, L., Jin, S. Y., Lin, Y. Q., & Zheng, Y. C. (2014). Histological evaluation and Prdm9 expression level in the testis of sterile male cattle-yaks. Livestock Science, 160, 208-213. doi:10.1016/j.livsci.2013.12.017

Louis, E. E., Coles, M. S., Andriantompohavana, R., Sommer, J. A., Engberg, S. E., Zaonarivelo, J. R., . . . Brenneman, R. A. (2006). Revision of the mouse lemurs (Microcebus) of eastern Madagascar. International Journal of Primatology, 27(2), 347-389. doi:10.1007/s10764-006-9036-1

Löytynoja, A. (2014). Phylogeny-aware alignment with PRANK. In D. J. Russell (Ed.), Multiple Sequence Alignment Methods (pp. 155-170). Totowa, NJ: Humana Press.

Lynch, M. (2008). The cellular, developmental and population-genetic determinants of mutation-rate evolution. Genetics, 180(2), 933-943. doi:10.1534/genetics.108.090456

Lynch, M. (2010). Evolution of the mutation rate. Trends Genet, 26(8), 345-352. doi:10.1016/j.tig.2010.05.003

Lynch, M. (2011). The lower bound to the evolution of mutation rates. Genome Biol Evol, 3, 1107-1118. doi:10.1093/gbe/evr066

Lynch, M., Ackerman, M. S., Gout, J. F., Long, H., Sung, W., Thomas, W. K., & Foster, P. L. (2016). Genetic drift, selection and the evolution of the mutation rate. Nature Reviews Genetics, 17(11), 704-714. doi:10.1038/nrg.2016.104

Lynch, M., & Conery, J. S. (2000). The evolutionary fate and consequences of duplicate genes. Science, 290(5494), 1151-1155. doi:10.1126/science.290.5494.1151

Lynch, M., & Force, A. G. (2000). The Origin of Interspecific Genomic Incompatibility via Gene Duplication. The American Naturalist, 156(6), 590-605. doi:10.1086/316992

Mack, K. L., & Nachman, M. W. (2017). Gene Regulation and Speciation. Trends in Genetics, 33(1), 68-80. doi:10.1016/j.tig.2016.11.003

MacLeod, A., Rodriguez, A., Vences, M., Orozco-terWengel, P., Garcia, C., Trillmich, F., . . . Steinfartz, S. (2015). Hybridization masks speciation in the evolutionary history of the Galapagos marine iguana. Proc Biol Sci, 282(1809), 20150425. doi:10.1098/rspb.2015.0425

Magalhaes, I. S., Mwaiko, S., Schneider, M. V., & Seehausen, O. (2009). Divergent selection and phenotypic plasticity during incipient speciation in Lake Victoria cichlid fish. J Evol Biol, 22(2), 260-274. doi:10.1111/j.1420-9101.2008.01637.x

Maheshwari, S., & Barbash, D. A. (2011). The Genetics of Hybrid Incompatibilities. Annual Review of Genetics, 45(1), 331-355. doi:10.1146/annurev-genet-110410-132514

Malinsky, M., Challis, R. J., Tyers, A. M., Schiffels, S., Terai, Y., Ngatunga, B. P., . . . Turner, G. F. (2015). Genomic islands of speciation separate cichlid ecomorphs in an East African crater lake. Science, 350(6267), 1493-1498. doi:10.1126/science.aac9927

Mallet, J. (2001). The speciation revolution. Journal of Evolutionary Biology, 14(6), 887-888.

Mallet, J. (2008). Hybridization, ecological races and the nature of species: empirical evidence for the ease of speciation. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 363(1506), 2971-2986.

Mallet, J., Meyer, A., Nosil, P., & Feder, J. L. (2009). Space, sympatry and speciation. Journal of Evolutionary Biology, 22(11), 2332-2341. doi:10.1111/j.1420-9101.2009.01816.x

Malukiewicz, J., Guschanski, K., Grativol, A. D., Oliveira, M. A., Ruiz-Miranda, C. R., & Stone, A. C. (2017). Application of PE-RADSeq to the study of genomic diversity and divergence of two Brazilian marmoset species (Callithrix jacchus and C. penicillata). Am J Primatol, 79(2), 1-12. doi:10.1002/ajp.22587

Manier, M. K., Lüpold, S., Belote, J. M., Starmer, W. T., Berben, K. S., Ala-Honkola, O., . . . Pitnick, S. (2013). Postcopulatory Sexual Selection Generates Speciation Phenotypes in Drosophila. Current Biology, 23(19), 1853-1862. doi:10.1016/j.cub.2013.07.086

Mardis, E. R. (2017). DNA sequencing technologies: 2006–2016. Nature Protocols, 12, 213. doi:10.1038/nprot.2016.182

Markolf, M., Brameier, M., & Kappeler, P. M. (2011). On species delimitation: Yet another lemur species or just genetic variation? BMC Evolutionary Biology, 11, 216. doi:10.1186/1471-2148-11-216

Martin, C. H., Crawford, J. E., Turner, B. J., & Simons, L. H. (2016). Diabolical survival in Death Valley: recent pupfish colonization, gene flow and genetic assimilation in the smallest species range on earth. Proceedings of the Royal Society B-Biological Sciences, 283(1823), 20152334. doi:10.1098/rspb.2015.2334

Martin, R. D. (1972). A preliminary field-study of the lesser mouse lemur (Microcebus murinus J.F. Miller, 1777). Zeitschrift für Tierpsychologie, Suppl. 9, 43-89.

Martin, S. H., Dasmahapatra, K. K., Nadeau, N. J., Salazar, C., Walters, J. R., Simpson, F., . . . Jiggins, C. D. (2013). Genome-wide evidence for speciation with gene flow in Heliconius butterflies. Genome Research, 23(11), 1817-1828. doi:10.1101/gr.159426.113

Martin, S. H., Davey, J. W., & Jiggins, C. D. (2015). Evaluating the use of ABBA–BABA statistics to locate introgressed loci. Molecular Biology and Evolution, 32(1), 244-257. doi:10.1093/molbev/msu269

Martin-Coello, J., Dopazo, H., Arbiza, L., Ausió, J., Roldan, E. R. S., & Gomendio, M. (2009). Sexual selection drives weak positive selection in protamine genes and high promoter divergence, enhancing sperm competitiveness. Proceedings of the Royal Society B: Biological Sciences, 276(1666), 2427-2436. doi:10.1098/rspb.2009.0257

Masly, J. P., & Presgraves, D. C. (2007). High-resolution genome-wide dissection of the two rules of speciation in Drosophila. PLOS Biology, 5(9), 1890-1898. doi:10.1371/journal.pbio.0050243

Matsumoto, T., Terai, Y., Okada, N., & Tachida, H. (2014). Sensory drive speciation and patterns of variation at selectively neutral genes. Evolutionary Ecology, 28(4), 591-609. doi:10.1007/s10682-014-9697-8

Mayr, E. (1942). Systematics and the Origin of Species from the Viewpoint of a Zoologist. Cambridge, MA: Harvard University Press.

Mayr, E. (1963). Animal species and evolution. Cambridge, MA: Harvard University Press.

Mazo-Vargas, A., Concha, C., Livraghi, L., Massardo, D., Wallbank, R. W. R., Zhang, L., . . . Martin, A. (2017). Macroevolutionary shifts of WntA function potentiate butterfly wing-pattern diversity. Proceedings of the National Academy of Sciences, 114(40), 10701-10706. doi:10.1073/pnas.1708149114

McConnell, W. J., & Kull, C. A. (2014). Deforestation in Madagascar: Debates over the island's forest cover and challenges of measuring forest change. Conservation and Environmental Management in Madagascar, 67-104.

McVean, G. A., & Cardin, N. J. (2005). Approximating the coalescent with recombination. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1459), 1387-1393.

Merot, C., Frerot, B., Leppik, E., & Joron, M. (2015). Beyond magic traits: Multimodal mating cues in Heliconius butterflies. Evolution, 69(11), 2891-2904. doi:10.1111/evo.12789

Meyer, A. (2015). The extraordinary evolution of cichlid fishes. American Scientist, 2015(April), 70-75.

Meyer, J. R., Dobias, D. T., Medina, S. J., Servilio, L., Gupta, A., & Lenski, R. E. (2016). Ecological speciation of bacteriophage lambda in allopatry and sympatry. Science, 354(6317), 1301-1304. doi:10.1126/science.aai8446

Meyer, W. K., Venkat, A., Kermany, A. R., van de Geijn, B., Zhang, S., & Przeworski, M. (2015). Evolutionary history inferred from the de novo assembly of a nonmodel organism, the blue-eyed black lemur. Molecular Ecology, 24(17), 4392-4405. doi:10.1111/mec.13327

Michel, A. P., Sim, S., Powell, T. H. Q., Taylor, M. S., Nosil, P., & Feder, J. L. (2010). Widespread genomic divergence during sympatric speciation. Proceedings of the National Academy of Sciences of the United States of America, 107(21), 9724-9729. doi:10.1073/pnas.1000939107

Mihola, O., Trachtulec, Z., Vlcek, C., Schimenti, J. C., & Forejt, J. (2009). A Mouse Speciation Gene Encodes a Meiotic Histone H3 Methyltransferase. Science, 323(5912), 373-375. doi:10.1126/science.1163601

Mizuta, Y., Harushima, Y., & Kurata, N. (2010). Rice pollen hybrid incompatibility caused by reciprocal gene loss of duplicated genes. Proceedings of the National Academy of Sciences, 107(47), 20417-20422. doi:10.1073/pnas.1003124107

Moorjani, P., Amorim, C. E. G., Arndt, P. F., & Przeworski, M. (2016). Variation in the molecular clock of primates. Proceedings of the National Academy of Sciences, 113(38), 10607. doi:10.1073/pnas.1600374113

Morii, Y., Yokoyama, J., Kawata, M., Davison, A., & Chiba, S. (2015). Evidence of introgressive hybridization between the morphologically divergent land snails Ainohelix and Ezohelix. Biological Journal of the Linnean Society, 115(1), 77-95.

Morris, D. W. (2003). Toward an ecological synthesis: a case for habitat selection. Oecologia, 136(1), 1-13. doi:10.1007/s00442-003-1241-4

Myers, N., Mittermeier, R. A., Mittermeier, C. G., da Fonseca, G. A. B., & Kent, J. (2000). Biodiversity hotspots for conservation priorities. Nature, 403, 853-858.

Nabholz, B., Glemin, S., & Galtier, N. (2009). The erratic mitochondrial clock: variations of mutation rate, not population size, affect mtDNA diversity across birds and mammals. BMC Evolutionary Biology, 9, 54. doi:10.1186/1471-2148-9-54

Nachman, M. W., & Crowell, S. L. (2000). Estimate of the Mutation Rate per Nucleotide in Humans. Genetics, 156(1), 297.

Nachman, M. W., & Payseur, B. A. (2012). Recombination rate variation and speciation: theoretical predictions and empirical results from rabbits and mice. Phil. Trans. R. Soc. B, 367(1587), 409-421. doi:10.1098/rstb.2011.0249

Nadeau, N. J., Ruiz, M., Salazar, P., Counterman, B., Medina, J. A., Ortiz-Zuazaga, H., . . . Papa, R. (2014). Population genomics of parallel hybrid zones in the mimetic butterflies, H. melpomene and H. erato. Genome research, 24(8), 1316-1333.

Nadeau, N. J., Whibley, A., Jones, R. T., Davey, J. W., Dasmahapatra, K. K., Baxter, S. W., . . . Jiggins, C. D. (2012). Genomic islands of divergence in hybridizing Heliconius butterflies identified by large-scale targeted sequencing. Philosophical Transactions of the Royal Society B-Biological Sciences, 367(1587), 343-353. doi:10.1098/rstb.2011.0198

Nascimento, J. M., Shi, L. Z., Meyers, S., Gagneux, P., Loskutoff, N. M., Botvinick, E. L., & Berns, M. W. (2008). The use of optical tweezers to study sperm competition and motility in primates. Journal of The Royal Society Interface, 5(20), 297-302. doi:10.1098/rsif.2007.1118

Navarro, A., & Barton, N. H. (2003). Chromosomal speciation and molecular divergence--accelerated evolution in rearranged chromosomes. Science (New York, N.Y.), 300(5617), 321-324. doi:10.1126/science.1080600

Nei, M., & Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution, 3(5), 418-426. doi:10.1093/oxfordjournals.molbev.a040410

Nielsen, R., Paul, J. S., Albrechtsen, A., & Song, Y. S. (2011). Genotype and SNP calling from next-generation sequencing data. Nature Reviews Genetics, 12(6), 443. doi:10.1038/nrg2986

Nishida, A., Weiss, J. M., Thomas, M. J., Kelly, S., Peng, X., Katze, M. G., . . . Schroth, G. P. (2014). Tissue-specific transcriptome sequencing analysis expands the non-human primate reference transcriptome resource (NHPRTR). Nucleic Acids Research, 43(D1), D737-D742. doi:10.1093/nar/gku1110

Nishida, K., Arazoe, T., Yachie, N., Banno, S., Kakimoto, M., Tabata, M., . . . Kondo, A. (2016). Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science (New York, N.Y.), 353(6305). doi:10.1126/science.aaf8729

Noor, M. A., Grams, K. L., Bertucci, L. A., Almendarez, Y., Reiland, J., & Smith, K. R. (2001). The genetics of reproductive isolation and the potential for gene exchange between Drosophila pseudoobscura and D. persimilis via backcross hybrid males. Evolution, 55(3), 512-521.

Noor, M. A. F., & Bennett, S. M. (2009). Islands of speciation or mirages in the desert? Examining the role of restricted recombination in maintaining species. Heredity, 103(6), 439. doi:10.1038/hdy.2009.151

Noor, M. A. F., Grams, K. L., Bertucci, L. A., & Reiland, J. (2001). Chromosomal inversions and the reproductive isolation of species. Proceedings of the National Academy of Sciences, 98(21), 12084-12088. doi:10.1073/pnas.221274498

Nosil, P. (2008). Speciation with gene flow could be common. Molecular Ecology, 17(9), 2103-2106. doi:10.1111/j.1365-294X.2008.03715.x

Nosil, P., Egan, S. P., & Funk, D. J. (2008). Heterogeneous genomic differentiation between walking-stick ecotypes: "isolation by adaptation" and multiple roles for divergent selection. Evolution, 62(2), 316-336. doi:10.1111/j.1558-5646.2007.00299.x

Nosil, P., Funk, D. J., & Ortiz-Barrientos, D. (2009). Divergent selection and heterogeneous genomic divergence. Molecular Ecology, 18(3), 375-402. doi:10.1111/j.1365-294X.2008.03946.x

Nosil, P., Harmon, L. J., & Seehausen, O. (2009). Ecological explanations for (incomplete) speciation. Trends in Ecology & Evolution, 24(3), 145-156. doi:10.1016/j.tree.2008.10.011

Nosil, P., & Schluter, D. (2011). The genes underlying the process of speciation. Trends Ecol Evol, 26(4), 160-167. doi:10.1016/j.tree.2011.01.001

O'Neill, M. J., & O'Neill, R. J. (2018). Sex chromosome repeats tip the balance towards speciation. Molecular Ecology.

Ohta, T. (1972). Population size and rate of evolution. Journal of molecular evolution, 1(4), 305-314.

Oliver, P. L., Goodstadt, L., Bayes, J. J., Birtle, Z., Roach, K. C., Phadnis, N., . . . Ponting, C. P. (2009). Accelerated evolution of the Prdm9 speciation gene across diverse metazoan taxa. PLoS Genet, 5(12), e1000753. doi:10.1371/journal.pgen.1000753

Olivieri, G., Zimmermann, E., Randrianambinina, B., Rasoloharijaona, S., Rakotondravony, D., Guschanski, K., & Radespiel, U. (2007). The ever-increasing diversity in mouse lemurs: Three new species in north and northwestern Madagascar. Molecular Phylogenetics and Evolution, 43(1), 309-327. doi:10.1016/j.ympev.2006.10.026

Olivieri, G. L., Sousa, V., Chikhi, L., & Radespiel, U. (2008). From genetic diversity and structure to conservation: Genetic signature of recent population declines in three mouse lemur species (Microcebus spp.). Biological Conservation, 141(5), 1257-1271. doi:10.1016/j.biocon.2008.02.025

Orr, H. A. (1990). "Why Polyploidy is Rarer in Animals Than in Plants" Revisited. The American Naturalist, 136(6), 759-770. doi:10.1086/285130

Orr, M. R., & Smith, T. B. (1998). Ecology and speciation. Trends Ecol Evol, 13(12), 502-506.

Ortiz-Barrientos, D., & James, M. E. (2017). Evolution of recombination rates and the genomic landscape of speciation. Journal of Evolutionary Biology, 30(8), 1519-1521. doi:10.1111/jeb.13116

Otto, S. P., & Whitton, J. (2000). Polyploid Incidence and Evolution. Annual Review of Genetics, 34(1), 401-437. doi:10.1146/annurev.genet.34.1.401

Padhi, A., Shen, B. T., Jiang, J. C., Yang, Z., Liu, G. E., & Li, M. (2017). Ruminant-specific multiple duplication events of PRDM9 before speciation. BMC Evolutionary Biology, 17, 79. doi:10.1186/s12862-017-0892-4

Palumbi, S. R. (2008). Speciation and the evolution of gamete recognition genes: pattern and process. Heredity, 102, 66. doi:10.1038/hdy.2008.104

Paradis, E., Claude, J., & Strimmer, K. (2004). APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics (Oxford, England), 20(2), 289-290. doi:10.1093/bioinformatics/btg412

Pardo-Diaz, C., Salazar, C., Baxter, S. W., Merot, C., Figueiredo-Ready, W., Joron, M., . . . Jiggins, C. D. (2012). Adaptive Introgression across Species Boundaries in Heliconius Butterflies. PLOS Genetics, 8(6), e1002752. doi:10.1371/journal.pgen.1002752

Parvanov, E. D., Petkov, P. M., & Paigen, K. (2010). Prdm9 controls activation of mammalian recombination hotspots. Science, 327(5967), 835. doi:10.1126/science.1181495

Paten, B., Novak, A. M., Eizenga, J. M., & Garrison, E. (2017). Genome graphs and the evolution of genome inference. Genome research, 27(5), 665-676.

Patterson, N., Moorjani, P., Luo, Y., Mallick, S., Rohland, N., Zhan, Y., . . . Reich, D. (2012). Ancient Admixture in Human History. Genetics, 192(3), 1065-1093. doi:10.1534/genetics.112.145037

Paudel, Y., Madsen, O., Megens, H. J., Frantz, L. A., Bosse, M., Crooijmans, R. P., & Groenen, M. A. (2015). Copy number variation in the speciation of pigs: a possible prominent role for olfactory receptors. BMC Genomics, 16, 330. doi:10.1186/s12864-015-1449-9

Payseur, B. A. (2010). Using differential introgression in hybrid zones to identify genomic regions involved in speciation. Molecular Ecology Resources, 10(5), 806-820.

Payseur, B. A. (2016). Genetic Links between Recombination and Speciation. PLoS Genet, 12(6), e1006066. doi:10.1371/journal.pgen.1006066

Pease, J. B., & Hahn, M. W. (2015). Detection and polarization of introgression in a five-taxon phylogeny. Systematic Biology, 64(4), 651-662. doi:10.1093/sysbio/syv023

Perrier de la Bâthie, H. (1921). La végétation malgache. Annals du Muséum Colonial, Marseille, 9, 1-266.

Peter, B. M. (2016). Admixture, Population Structure and F-Statistics. Genetics, genetics.115.183913. doi:10.1534/genetics.115.183913

Peterson, B. K., Weber, J. N., Kay, E. H., Fisher, H. S., & Hoekstra, H. E. (2012). Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species. PLoS One, 7(5), e37135. doi:10.1371/journal.pone.0037135

Pfeifer, S. P. (2017). Direct estimate of the spontaneous germ line mutation rate in African green monkeys. Evolution, 71(12), 2858-2870. doi:10.1111/evo.13383

Pfeifer, S. P. (2017). The Demographic and Adaptive History of the African Green Monkey. Molecular Biology and Evolution, 34(5), 1055-1065. doi:10.1093/molbev/msx056

Phillippy, A. M. (2017). New advances in sequence assembly. Genome research, 27(5), xi-xiii. doi:10.1101/gr.223057.117

Phillips, B. C., & Edmands, S. (2012). Does the speciation clock tick more slowly in the absence of sex chromosomes? BioEssays, 34(3), 166-169.

Pinho, C., & Hey, J. (2010). Divergence with Gene Flow: Models and Data. Annual Review of Ecology, Evolution, and Systematics, 41(1), 215-230. doi:10.1146/annurev-ecolsys-102209-144644

Poelstra, J. W., Vijay, N., Bossu, C. M., Lantz, H., Ryll, B., Müller, I., . . . Wolf, J. B. W. (2014). The genomic landscape underlying phenotypic integrity in the face of gene flow in crows. Science, 344(6190), 1410-1414. doi:10.1126/science.1253226

Pontarp, M., Ripa, J., & Lundberg, P. (2015). The Biogeography of Adaptive Radiations and the Geographic Overlap of Sister Species. American Naturalist, 186(5), 565-581. doi:10.1086/683260

Ponting, C. P. (2011). What are the genomic drivers of the rapid evolution of PRDM9? Trends Genet, 27(5), 165-171. doi:10.1016/j.tig.2011.02.001

Presgraves, D. C. (2008). Sex chromosomes and speciation in Drosophila. Trends Genet, 24(7), 336-343. doi:10.1016/j.tig.2008.04.007

Presgraves, D. C. (2010). The molecular evolutionary basis of species formation. Nature Reviews Genetics, 11(3), 175. doi:10.1038/nrg2718

Price, C. S., Kim, C. H., Posluszny, J., & Coyne, J. A. (2000). Mechanisms of conspecific sperm precedence in Drosophila. Evolution, 54(6), 2028-2037.

Ptak, S. E., Hinds, D. A., Koehler, K., Nickel, B., Patil, N., Ballinger, D. G., . . . Paabo, S. (2005). Fine-scale recombination patterns differ between chimpanzees and humans (vol 37, pg 429, 2005). Nature genetics, 37(4), 445-445. doi:10.1038/ng0405-445

Quemere, E., Amelot, X., Pierson, J., Crouau-Roy, B., & Chikhi, L. (2012). Genetic data suggest a natural prehuman origin of open habitats in northern Madagascar and question the deforestation narrative in this region. Proceedings of the National Academy of Sciences of the United States of America, 109(32), 13028-13033. doi:10.1073/pnas.1200153109

Racimo, F., Sankararaman, S., Nielsen, R., & Huerta-Sánchez, E. (2015). Evidence for archaic adaptive introgression in humans. Nature Reviews Genetics, 16, 359. doi:10.1038/nrg3936

Radespiel, U. (2000). Sociality in the gray mouse lemur (Microcebus murinus) in northwestern Madagascar. American Journal of Primatology, 51(1), 21-40. doi:10.1002/(SICI)1098-2345(200005)51:1<21::AID-AJP3>3.0.CO;2-C

Radespiel, U., Ehresmann, P., Reimann, W., Rendigs, A., & Zimmermann, E. (2004). Ecological differentiation of two sympatric mouse lemur species in northwestern Madagascar. Folia Primatologica, 75, 321-321.

Radespiel, U., Olivieri, G., Rasolofoson, D. W., Rakotondratsimba, G., Rakotonirainy, O., Rasoloharijaona, S., . . . Randrianarison, R. M. (2008). Exceptional diversity of mouse lemurs (Microcebus spp.) in the Makira region with the description of one new species. Am J Primatol, 70(11), 1033-1046. doi:10.1002/ajp.20592

Radespiel, U., Ratsimbazafy, J. H., Rasoloharijaona, S., Raveloson, H., Andriaholinirina, N., Rakotondravony, R., . . . Randrianambinina, B. (2012). First indications of a highland specialist among mouse lemurs (Microcebus spp.) and evidence for a new mouse lemur species from eastern Madagascar. Primates, 53(2), 157-170. doi:10.1007/s10329-011-0290-2

Rafati, N., Blanco-Aguiar, J. A., Rubin, C. J., Sayyab, S., Sabatino, S. J., Afonso, S., . . . Ferrand, N. (2018). A genomic map of clinal variation across the European rabbit hybrid zone. Molecular Ecology.

Rakotondranary, S. J., & Ganzhorn, J. U. (2011). Habitat Separation of Sympatric Microcebus spp. in the Dry Spiny Forest of South-Eastern Madagascar. Folia Primatologica, 82(4-5), 212-223. doi:10.1159/000334816

Ramm, S. A., Oliver, P. L., Ponting, C. P., Stockley, P., & Emes, R. D. (2007). Sexual Selection and the Adaptive Evolution of Mammalian Ejaculate Proteins. Molecular Biology and Evolution, 25(1), 207-219. doi:10.1093/molbev/msm242

Ramu, A., Noordam, M. J., Schwartz, R. S., Wuster, A., Hurles, M. E., Cartwright, R. A., & Conrad, D. F. (2013). DeNovoGear: de novo indel and point mutation discovery and phasing. Nature Methods, 10, 985. doi:10.1038/nmeth.2611

Rasmussen, M. D., Hubisz, M. J., Gronau, I., & Siepel, A. (2014). Genome-Wide Inference of Ancestral Recombination Graphs. PLOS Genetics, 10(5), e1004342. doi:10.1371/journal.pgen.1004342

Rasoloarison, R. M., Goodman, S. M., & Ganzhorn, J. U. (2000). Taxonomic revision of mouse lemurs (Microcebus) in the western portions of Madagascar. International Journal of Primatology, 21, 963-1019.

Rasoloarison, R. M., Weisrock, D. W., Yoder, A. D., Rakotondravony, D., & Kappeler, P. M. (2013). Two New Species of Mouse Lemurs (Cheirogaleidae: Microcebus) from Eastern Madagascar. International Journal of Primatology, 34(3), 455-469. doi:10.1007/s10764-013-9672-1

Rastorguev, S. M., Nedoluzhko, A. V., Sharko, F. S., Boulygina, E. S., Sokolov, A. S., Gruzdeva, N. M., . . . Prokhortchouk, E. B. (2016). Identification of novel microRNA genes in freshwater and marine ecotypes of the three-spined stickleback (Gasterosteus aculeatus). Molecular Ecology Resources, 16(6), 1491-1498. doi:10.1111/1755-0998.12545

Ravinet, M., Faria, R., Butlin, R. K., Galindo, J., Bierne, N., Rafajlović, M., . . . Westram, A. M. (2017). Interpreting the genomic landscape of speciation: a road map for finding barriers to gene flow. Journal of Evolutionary Biology, 30(8), 1450-1477. doi:10.1111/jeb.13047

Reaz, R., Bayzid, M. S., & Rahman, M. S. (2014). Accurate Phylogenetic Tree Reconstruction from Quartets: A Heuristic Approach. PLoS One, 9(8), e104008. doi:10.1371/journal.pone.0104008

Renaut, S., Grassa, C. J., Yeaman, S., Moyers, B. T., Lai, Z., Kane, N. C., . . . Rieseberg, L. H. (2013). Genomic islands of divergence are not affected by geography of speciation in sunflowers. Nature Communications, 4, 1827. doi:10.1038/ncomms2833

Renaut, S., Maillet, N., Normandeau, E., Sauvage, C., Derome, N., Rogers, S. M., & Bernatchez, L. (2012). Genome-wide patterns of divergence during speciation: the lake whitefish study. Philosophical Transactions of the Royal Society B-Biological Sciences, 367(1587), 354-363.

Rennison, D. J., Owens, G. L., & Taylor, J. S. (2012). Opsin gene duplication and divergence in ray-finned fish. Molecular Phylogenetics and Evolution, 62(3), 986-1008. doi:10.1016/j.ympev.2011.11.030

Rice, W. R. (1987). Speciation via habitat specialization: the evolution of reproductive isolation as a correlated character. Evolutionary Ecology, 1(4), 301-314. doi:10.1007/BF02071555

Rice, W. R., & Hostert, E. E. (1993). Laboratory Experiments on Speciation - What Have We Learned in 40 Years. Evolution, 47(6), 1637-1653. doi:10.2307/2410209

Richards, E. J., & Martin, C. H. (2017). Adaptive introgression from distant Caribbean islands contributed to the diversification of a microendemic adaptive radiation of trophic specialist pupfishes. PLOS Genetics, 13(8), e1006919. doi:10.1371/journal.pgen.1006919

Ricklefs, R. E., Outlaw, D. C., Svensson-Coelho, M., Medeiros, M. C. I., Ellis, V. A., & Latta, S. (2014). Species formation by host shifting in avian parasites. Proceedings of the National Academy of Sciences of the United States of America, 111(41), 14816-14821. doi:10.1073/pnas.1416356111

Riesch, R., Muschick, M., Villoutreix, R., Comeault, A. A., Farkas, T. E., Lucek, K., . . . Nosil, P. (2017). Transitions between phases of genomic differentiation during stick-insect speciation. Nature Ecology and Evolution, 1, 0082. doi:10.1038/s41559-017-0082

Rieseberg, L. H. (2001). Chromosomal rearrangements and speciation. Trends in Ecology & Evolution, 16(7), 351-358.

Rieseberg, L. H., Desrochers, A. M., & Youn, S. J. (1995). Interspecific pollen competition as a reproductive barrier between sympatric species of Helianthus (Asteraceae). American Journal of Botany, 82(4), 515-519. doi:10.1002/j.1537-2197.1995.tb15672.x

Ritchie, M. D., Holzinger, E. R., Li, R., Pendergrass, S. A., & Kim, D. (2015). Methods of integrating data to uncover genotype–phenotype interactions. Nature Reviews Genetics, 16(2), 85. doi:10.1038/nrg3868

Ritchie, M. G. (2007). Sexual Selection and Speciation. Annual Review of Ecology, Evolution, and Systematics, 38(1), 79-102. doi:10.1146/annurev.ecolsys.38.091206.095733

Roesti, M., Kueng, B., Moser, D., & Berner, D. (2015). The genomics of ecological vicariance in threespine stickleback fish. Nature Communications, 6, ncomms9767. doi:10.1038/ncomms9767

Roth, C., Rastogi, S., Arvestad, L., Dittmar, K., Light, S., Ekman, D., & Liberles, D. A. (2007). Evolution after gene duplication: models, mechanisms, sequences, systems, and organisms. J Exp Zool B Mol Dev Evol, 308(1), 58-73. doi:10.1002/jez.b.21124

Roux, C., Fraïsse, C., Romiguier, J., Anciaux, Y., Galtier, N., & Bierne, N. (2016). Shedding Light on the Grey Zone of Speciation along a Continuum of Genomic Divergence. PLOS Biology, 14(12), e2000234. doi:10.1371/journal.pbio.2000234

Ruegg, K., Anderson, E. C., Boone, J., Pouls, J., & Smith, T. B. (2014). A role for migration-linked genes and genomic islands in divergence of a songbird. Molecular Ecology, 23(19), 4757-4769. doi:10.1111/mec.12842

Rundle, H. D., & Nosil, P. (2005). Ecological speciation. Ecology Letters, 8(3), 336-352. doi:10.1111/j.1461-0248.2004.00715.x

Safran, R. J., Scordato, E. S. C., Wilkins, M. R., Hubbard, J. K., Jenkins, B. R., Albrecht, T., . . . Kane, N. C. (2016). Genome-wide differentiation in closely related populations: the roles of selection and geographic isolation. Molecular Ecology, 25(16), 3865-3883. doi:10.1111/mec.13740

Saint-Hilaire, E. G. (1795). Observations on a small species of Maki (Lemur Linn.). Bull. Soc. Philomath. Correspondants, 1, 89-90.

Saitou, N., & Nei, M. (1987). The Neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4, 406-425.

Samonds, K. E., Godfrey, L. R., Ali, J. R., Goodman, S. M., Vences, M., Sutherland, M. R., . . . Krause, D. W. (2013). Imperfect Isolation: Factors and Filters Shaping Madagascar's Extant Vertebrate Fauna. PLoS One, 8(4), e62086. doi:10.1371/journal.pone.0062086

Samuk, K., Owens, G. L., Delmore, K. E., Miller, S. E., Rennison, D. J., & Schluter, D. (2017). Gene flow and selection interact to promote adaptive divergence in regions of low recombination. Molecular Ecology, 26(17), 4378-4390. doi:10.1111/mec.14226

Sandovici, I., & Sapienza, C. (2010). PRDM9 sticks its zinc fingers into recombination hotspots and between species. F1000 Biol Rep, 2. doi:10.3410/B2-37

Sankararaman, S., Mallick, S., Dannemann, M., Prüfer, K., Kelso, J., Pääbo, S., . . . Reich, D. (2014). The genomic landscape of Neanderthal ancestry in present-day humans. Nature, 507, 354. doi:10.1038/nature12961

Savolainen, V., Anstett, M.-C., Lexer, C., Hutton, I., Clarkson, J. J., Norup, M. V., . . . Baker, W. J. (2006). Sympatric speciation in palms on an oceanic island. Nature, 441(7090), 210-213. doi:10.1038/nature04566

Scally, A., & Durbin, R. (2012). Revising the human mutation rate: implications for understanding human evolution. Nature Reviews Genetics, 13(10), 745-753. doi:10.1038/nrg3295

Schaffler, L., Saborowski, J., & Kappeler, P. M. (2015). Agent-mediated spatial storage effect in heterogeneous habitat stabilizes competitive mouse lemur coexistence in Menabe Central, Western Madagascar. BMC Ecol, 15, 7. doi:10.1186/s12898-015-0040-1

Schiffels, S., & Durbin, R. (2014). Inferring human population size and separation history from multiple genome sequences. Nature genetics, 46(8), 919-925. doi:10.1038/ng.3015

Schliewen, U. K., Tautz, D., & Paabo, S. (1994). Sympatric Speciation Suggested by Monophyly of Crater Lake Cichlids. Nature, 368(6472), 629-632. doi:10.1038/368629a0

Schlötterer, C., Tobler, R., Kofler, R., & Nolte, V. (2014). Sequencing pools of individuals — mining genome-wide polymorphism data without big funding. Nature Reviews Genetics, 15(11), 749. doi:10.1038/nrg3803

Schluter, D. (1996). Ecological causes of adaptive radiation. American Naturalist, 148, S40-S64. doi:10.1086/285901

Schluter, D. (1996). Ecological speciation in postglacial fishes. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences, 351(1341), 807-814. doi:10.1098/rstb.1996.0075

Schluter, D. (2001). Ecology and the origin of species. Trends in Ecology & Evolution, 16(7), 372-380. doi:10.1016/S0169-5347(01)02198-X

Schluter, D., Boughman, J. W., & Rundle, H. D. (2001). Parallel speciation with allopatry. Trends in Ecology & Evolution, 16(6), 283-284. doi:10.1016/S0169-5347(01)02186-3

Schmid, J., & Kappeler, P. M. (1994). Sympatric mouse lemurs (Microcebus spp.) in western Madagascar. Folia Primatologica, 63, 162-170.

Schmitz, J., Noll, A., Raabe, C. A., Churakov, G., Voss, R., Kiefmann, M., . . . Warren, W. C. (2016). Genome sequence of the basal haplorrhine primate Tarsius syrichta reveals unusual insertions. Nature Communications, 7, 12997. doi:10.1038/ncomms12997

Schneider, N., Chikhi, L., Currat, M., & Radespiel, U. (2010). Signals of recent spatial expansions in the grey mouse lemur (Microcebus murinus). BMC Evolutionary Biology, 10, 105. doi:10.1186/1471-2148-10-105

Scholz, C. A., Cohen, A. S., Johnson, T. C., King, J., Talbot, M. R., & Brown, E. T. (2011). Scientific drilling in the Great Rift Valley: The 2005 Lake Malawi Scientific Drilling Project - An overview of the past 145,000 years of climate variability in Southern Hemisphere East Africa. Palaeogeography Palaeoclimatology Palaeoecology, 303(1-4), 3-19. doi:10.1016/j.palaeo.2010.10.030

Schumacher, J., Rosenkranz, D., & Herlyn, H. (2014). Mating systems and protein–protein interactions determine evolutionary rates of primate sperm proteins. Proceedings of the Royal Society B: Biological Sciences, 281(1775), 20132607. doi:10.1098/rspb.2013.2607

Schumer, M., & Brandvain, Y. (2016). Determining epistatic selection in admixed populations. Molecular Ecology, 25(11), 2577-2591.

Schumer, M., Cui, R., Powell, D. L., Dresner, R., Rosenthal, G. G., & Andolfatto, P. (2014). High-resolution mapping reveals hundreds of genetic incompatibilities in hybridizing fish species. Elife, 3.

Schumer, M., Rosenthal, G. G., & Andolfatto, P. (2014). How Common Is Homoploid Hybrid Speciation? Evolution, 68(6), 1553-1560. doi:10.1111/evo.12399

Schumer, M., Xu, C., Powell, D. L., Durvasula, A., Skov, L., Holland, C., . . . Przeworski, M. (2018). Natural selection interacts with recombination to shape the evolution of hybrid genomes. Science.

Schwartz, J. J., Roach, D. J., Thomas, J. H., & Shendure, J. (2014). Primate evolution of the recombination regulator PRDM9. Nat Commun, 5, 4370. doi:10.1038/ncomms5370

Scordato, E. S. C., Symes, L. B., Mendelson, T. C., & Safran, R. J. (2014). The Role of Ecology in Speciation by Sexual Selection: A Systematic Empirical Review. Journal of Heredity, 105, 782-794. doi:10.1093/jhered/esu037

Sedghifar, A., Brandvain, Y., & Ralph, P. (2016). Beyond clines: lineages and haplotype blocks in hybrid zones. Molecular Ecology, 25(11), 2559-2576.

Seehausen, O. (2004). Hybridization and adaptive radiation. Trends in Ecology & Evolution, 19(4), 198-207. doi:10.1016/j.tree.2004.01.003

Seehausen, O. (2015). Process and pattern in cichlid radiations - inferences for understanding unusually high rates of evolutionary diversification. New Phytologist, 207(2), 304-312. doi:10.1111/nph.13450

Seehausen, O., Koetsier, E., Schneider, M. V., Chapman, L. J., Chapman, C. A., Knight, M. E., . . . Bills, R. (2003). Nuclear markers reveal unexpected genetic variation and a Congolese-Nilotic origin of the Lake Victoria cichlid species flock. Proceedings of the Royal Society B-Biological Sciences, 270(1511), 129-137. doi:10.1098/rspb.2002.2153

Semenov, G. A., Scordato, E. S. C., Khaydarov, D. R., Smith, C. C. R., Kane, N. C., & Safran, R. J. (2017). Effects of assortative mate choice on the genomic and morphological structure of a hybrid zone between two bird subspecies. Molecular Ecology. doi:10.1111/mec.14376

Senra, M. V. X., Sung, W., Ackerman, M., Miller, S. F., Lynch, M., & Soares, C. A. G. (2018). An unbiased genome-wide view of the mutation rate and spectrum of the endosymbiotic bacterium Teredinibacter turnerae. Genome Biol Evol. doi:10.1093/gbe/evy027

Servedio, M. R., & Noor, M. A. F. (2003). The role of reinforcement in speciation: Theory and data. Annual Review of Ecology Evolution and Systematics, 34, 339-364. doi:10.1146/annurev.ecolsys.34.011802.132412

Servedio, M. R., Van Doorn, G. S., Kopp, M., Frame, A. M., & Nosil, P. (2011). Magic traits in speciation: 'magic' but not rare? Trends in Ecology & Evolution, 26(8), 389-397. doi:10.1016/j.tree.2011.04.005

Servin, B., Zhu, B., Sayre, B., Bian, C., Sweeney, D., Zhang, G., . . . Li, Y. (2013). Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus). Nature Biotechnology, 31(2), 135. doi:10.1038/nbt.2478

Shaw, K. L. (2002). Conflict between nuclear and mitochondrial DNA phylogenies of a recent species radiation: what mtDNA reveals and conceals about modes of speciation in Hawaiian crickets. Proc Natl Acad Sci U S A, 99(25), 16122-16127.

Shaw, K. L., & Mullen, S. P. (2011). Genes versus phenotypes in the study of speciation. Genetica, 139(5), 649-661.

Sichova, J., Volenikova, A., Dinca, V., Nguyen, P., Vila, R., Sahara, K., & Marec, F. (2015). Dynamic evolution and unique sex determination systems in Leptidea wood white butterflies. BMC Evolutionary Biology, 15, 89. doi:10.1186/s12862-015-0375-4

Sinclair-Waters, M., Bradbury, I. R., Morris, C. J., Lien, S., Kent, M. P., & Bentzen, P. (2017). Ancient chromosomal rearrangement associated with local adaptation of a post-glacially colonized population of Atlantic Cod in the northwest Atlantic. Molecular Ecology. doi:10.1111/mec.14442

Singhal, S., Leffler, E. M., Sannareddy, K., Turner, I., Venn, O., Hooper, D. M., . . . Przeworski, M. (2015). Stable recombination hotspots in birds. Science, 350(6263), 928-932. doi:10.1126/science.aad0843

Smagulova, F., Brick, K., Pu, Y., Camerini-Otero, R. D., & Petukhova, G. V. (2016). The evolutionary turnover of recombination hot spots contributes to speciation in mice. Genes Dev, 30(3), 266-280. doi:10.1101/gad.270009.115

Smeds, L., Qvarnström, A., & Ellegren, H. (2016). Direct estimate of the rate of germline mutation in a bird. Genome research, 26(9), 1211-1218.

Smukowski, C. S., & Noor, M. A. F. (2011). Recombination rate variation in closely related species. Heredity, 107(6), 496. doi:10.1038/hdy.2011.44

Sobel, J. M., & Streisfeld, M. A. (2015). Strong premating reproductive isolation drives incipient speciation in Mimulus aurantiacus. Evolution, 69(2), 447-461. doi:10.1111/evo.12589

Soltis, D. E., Buggs, R. J. A., Doyle, J. J., & Soltis, P. S. (2010). What we still don't know about polyploidy. Taxon, 59(5), 1387-1403.

Soltis, D. E., Soltis, P. S., & Tate, J. A. (2004). Advances in the study of polyploidy since Plant speciation. New Phytologist, 161(1), 173-191. doi:10.1046/j.1469-8137.2003.00948.x

Sorenson, M. D., Sefc, K. M., & Payne, R. B. (2003). Speciation by host switch in brood parasitic indigobirds. Nature, 424(6951), 928-931. doi:10.1038/nature01863

Sousa, V., & Hey, J. (2013). Understanding the origin of species with genome-scale data: modelling gene flow. Nature Reviews Genetics, 14(6), 404-414. doi:10.1038/nrg3446

Sousa, V. C., Carneiro, M., Ferrand, N., & Hey, J. (2013). Identifying loci under selection against gene flow in isolation-with-migration models. Genetics, 194(1), 211-233. doi:10.1534/genetics.113.149211

Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics (Oxford, England), 30(9), 1312-1313. doi:10.1093/bioinformatics/btu033

Stephan, W. (2010). Genetic hitchhiking versus background selection: the controversy and its implications. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 365(1544), 1245-1253.

Stroud, J. T., & Losos, J. B. (2016). Ecological Opportunity and Adaptive Radiation. Annual Review of Ecology, Evolution, and Systematics, 47, 507-532. doi:10.1146/annurev-ecolsys-121415-032254

Stump, A. D., Fitzpatrick, M. C., Lobo, N. F., Traoré, S., Sagnon, N. F., Costantini, C., . . . Besansky, N. J. (2005). Centromere-proximal differentiation and speciation in Anopheles gambiae. Proceedings of the National Academy of Sciences of the United States of America, 102(44), 15930-15935. doi:10.1073/pnas.0508161102

Sukumaran, J., & Holder, M. T. (2010). DendroPy: a Python library for phylogenetic computing. Bioinformatics (Oxford, England), 26(12), 1569-1571. doi:10.1093/bioinformatics/btq228

Sukumaran, J., & Knowles, L. L. (2017). Multispecies coalescent delimits structure, not species. Proc Natl Acad Sci U S A, 114(7), 1607-1612. doi:10.1073/pnas.1607921114

Sung, W., Ackerman, M. S., Miller, S. F., Doak, T. G., & Lynch, M. (2012). Drift-barrier hypothesis and mutation-rate evolution. Proc Natl Acad Sci U S A, 109(45), 18488-18492. doi:10.1073/pnas.1216223109

Swanson, W. J., & Vacquier, V. D. (2002). The rapid evolution of reproductive proteins. Nature Reviews Genetics, 3(2), 137-144. doi:10.1038/nrg733

Swofford, D. L. (2002). PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods) (Version 4). Sunderland, Massachusetts: Sinauer Associates.

Swofford, D. L., Olsen, G. J., Waddell, P. J., & Hillis, D. M. (1996). Phylogenetic inference. In D. M. Hillis, C. Moritz, & B. K. Mable (Eds.), Molecular Systematics (pp. 407-514). Sunderland, Mass: Sinauer Assoc.

Tamura, K., & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10(3), 512-526.

Tattersall, I. (2007). Madagascar's lemurs: Cryptic diversity or taxonomic inflation? Evolutionary Anthropology, 16(1), 12-23. doi:10.1002/evan.20126

Taylor, J. S., Braasch, I., Frickey, T., Meyer, A., & Van de Peer, Y. (2003). Genome duplication, a trait shared by 22,000 species of ray-finned fish. Genome research, 13(3), 382-390. doi:10.1101/gr.640303

Taylor, J. S., Van de Peer, Y., Braasch, I., & Meyer, A. (2001). Comparative genomics provides evidence for an ancient genome duplication event in fish. Philos Trans R Soc Lond B Biol Sci, 356(1414), 1661-1679. doi:10.1098/rstb.2001.0975

Taylor, J. S., Van de Peer, Y., & Meyer, A. (2001). Revisiting recent challenges to the ancient fish-specific genome duplication hypothesis. Curr Biol, 11(24), R1005-1008.

Taylor, J. S., Van de Peer, Y., & Meyer, A. (2001). Genome duplication, divergent resolution and speciation. Trends in Genetics, 17(6), 299-301. doi:10.1016/S0168-9525(01)02318-6

Taylor, S. A., Anderson, D. J., & Friesen, V. L. (2013). Evidence for asymmetrical divergence-gene flow of nuclear loci, but not mitochondrial loci, between seabird sister species: blue-footed (Sula nebouxii) and Peruvian (S. variegata) boobies. PLoS One, 8(4), e62256. doi:10.1371/journal.pone.0062256

Terai, Y., Seehausen, O., Sasaki, T., Takahashi, K., Mizoiri, S., Sugawara, T., . . . Okada, N. (2006). Divergent selection on opsins drives incipient speciation in Lake Victoria cichlids. PLoS Biol, 4(12), e433. doi:10.1371/journal.pbio.0040433

Terekhanova, N. V., Seplyarskiy, V. B., Soldatov, R. A., & Bazykin, G. A. (2017). Evolution of Local Mutation Rate and Its Determinants. Molecular Biology and Evolution, 34(5), 1100-1109. doi:10.1093/molbev/msx060

Terhorst, J., & Song, Y. S. (2015). Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum. Proceedings of the National Academy of Sciences, 112(25), 7677-7682. doi:10.1073/pnas.1503717112

Thalmann, U., & Rakotoarison, N. (1994). Distribution of lemurs in central western Madagascar, with a regional distribution hypothesis. Folia Primatologica, 63, 156-161.

Thomae, A. W., Schade, G. O. M., Padeken, J., Borath, M., Vetter, I., Kremmer, E., . . . Imhof, A. (2013). A Pair of Centromeric Proteins Mediates Reproductive Isolation in Drosophila Species. Developmental Cell, 27(4), 412-424. doi:10.1016/j.devcel.2013.10.001

Thomas, G. W. C., Wang, R. J., Puri, A., Harris, R. A., Raveendran, M., Hughes, D. S. T., . . . Hahn, M. W. (2018). Reproductive Longevity Predicts Mutation Rates in Primates. Current Biology, 28(19), 3193-3197.e3195. doi:10.1016/j.cub.2018.08.050

Thorpe, R. S., Surget-Groba, Y., & Johansson, H. (2008). The relative importance of ecology and geographic isolation for speciation in anoles. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1506), 3071-3081. doi:10.1098/rstb.2008.0077

Ting, C.-T., Tsaur, S.-C., Wu, M.-L., & Wu, C.-I. (1998). A Rapidly Evolving Homeobox at the Site of a Hybrid Sterility Gene. Science, 282(5393), 1501-1504. doi:10.1126/science.282.5393.1501

Toews, D. P. L., Taylor, S. A., Vallender, R., Brelsford, A., Butcher, B. G., Messer, P. W., & Lovette, I. J. (2016). Plumage Genes and Little Else Distinguish the Genomes of Hybridizing Warblers. Current Biology, 26(17), 2313-2318. doi:10.1016/j.cub.2016.06.034

Trapnell, C., Williams, B. A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M. J., . . . Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology, 28(5), 511. doi:10.1038/nbt.1621

Tregenza, T., Butlin, R. K., & Wedell, N. (2000). Evolutionary biology - Sexual conflict and speciation. Nature, 407(6801), 149-150. doi:10.1038/35025138

Tregenza, T., & Wedell, N. (2000). Genetic compatibility, mate choice and patterns of parentage: Invited review. Molecular Ecology, 9(8), 1013-1027. doi:10.1046/j.1365-294x.2000.00964.x

Trier, C. N., Hermansen, J. S., Sætre, G.-P., & Bailey, R. I. (2014). Evidence for mito-nuclear and sex-linked reproductive barriers between the hybrid Italian sparrow and its parent species. PLOS Genetics, 10(1), e1004075.

Turelli, M., Barton, N. H., & Coyne, J. A. (2001). Theory and speciation. Trends in Ecology & Evolution, 16(7), 330-343. doi:10.1016/S0169-5347(01)02177-2

Turner, L. M., & Harr, B. (2014). Genome-wide mapping in a house mouse hybrid zone reveals hybrid sterility loci and Dobzhansky-Muller interactions. Elife, 3.

Turner, T. L., Hahn, M. W., & Nuzhdin, S. V. (2005). Genomic islands of speciation in Anopheles gambiae. PLOS Biology, 3(9), 1572-1578.

Uchimura, A., Higuchi, M., Minakuchi, Y., Ohno, M., Toyoda, A., Fujiyama, A., . . . Yagi, T. (2015). Germline mutation rates and the long-term phenotypic effects of mutation accumulation in wild-type laboratory mice and mutator mice. Genome research, 25(8), 1125-1134. doi:10.1101/gr.186148.114

van Doorn, G. S., Edelaar, P., & Weissing, F. J. (2009). On the Origin of Species by Natural and Sexual Selection. Science, 326(5960), 1704-1707. doi:10.1126/science.1181661

van Schooten, B., Jiggins, C. D., Briscoe, A. D., & Papa, R. (2016). Genome-wide analysis of ionotropic receptors provides insight into their evolution in Heliconius butterflies. BMC Genomics, 17(1), 254.

Vaser, R., Sović, I., Nagarajan, N., & Šikić, M. (2017). Fast and accurate de novo genome assembly from long uncorrected reads. Genome research, 27(5), 737-746. doi:10.1101/gr.214270.116

Vences, M., Wollenberg, K. C., Vieites, D. R., & Lees, D. C. (2009). Madagascar as a model region of species diversification. Trends in Ecology & Evolution, 24(8), 456-465. doi:10.1016/j.tree.2009.03.011

Venkatesh, B. (2003). Evolution and diversity of fish genomes. Current Opinion in Genetics & Development, 13(6), 588-592. doi:10.1016/j.gde.2003.09.001

Venn, O., Turner, I., Mathieson, I., de Groot, N., Bontrop, R., & McVean, G. (2014). Strong male bias drives germline mutation in chimpanzees. Science, 344(6189), 1272-1275.

Via, S. (2001). Sympatric speciation in animals: the ugly duckling grows up. Trends Ecol Evol, 16(7), 381-390.

Via, S. (2012). Divergence hitchhiking and the spread of genomic isolation during ecological speciation-with-gene-flow. Philosophical Transactions of the Royal Society B-Biological Sciences, 367(1587), 451-460. doi:10.1098/rstb.2011.0260

Via, S., Bouck, A. C., & Skillman, S. (2000). Reproductive isolation between divergent races of pea aphids on two hosts. II. Selection against migrants and hybrids in the parental environments. Evolution, 54(5), 1626-1637.

Vijay, N., Bossu, C. M., Poelstra, J. W., Weissensteiner, M. H., Suh, A., Kryukov, A. P., & Wolf, J. B. W. (2016). Evolution of heterogeneous genome differentiation across multiple contact zones in a crow species complex. Nature Communications, 7, 13195. doi:10.1038/ncomms13195

Volff, J. N. (2005). Genome evolution and biodiversity in teleost fish. Heredity (Edinb), 94(3), 280-294. doi:10.1038/sj.hdy.6800635

Vorontsova, M. S., Besnard, G., Forest, F., Malakasi, P., Moat, J., Clayton, W. D., . . . Linder, H. P. (2016). Madagascar's grasses and grasslands: anthropogenic or natural? Proc Biol Sci, 283(1823). doi:10.1098/rspb.2015.2262

Wade, M. J., Patterson, H., Chang, N. W., & Johnson, N. A. (1994). Postcopulatory, prezygotic isolation in flour beetles. Heredity, 72, 163. doi:10.1038/hdy.1994.23

Wadsworth, C. B., Li, X., & Dopman, E. B. (2015). A recombination suppressor contributes to ecological speciation in Ostrinia moths. Heredity (Edinb), 114(6), 593-600. doi:10.1038/hdy.2014.128

Wang, D., Zhang, Y., Zhang, Z., Zhu, J., & Yu, J. (2010). KaKs_Calculator 2.0: A Toolkit Incorporating Gamma-Series Methods and Sliding Window Strategies. Genomics, Proteomics & Bioinformatics, 8(1), 77-80. doi:10.1016/S1672-0229(10)60008-3

Wang, I. J., Glor, R. E., & Losos, J. B. (2013). Quantifying the roles of ecology and geography in spatial genetic divergence. Ecology Letters, 16(2), 175-182. doi:10.1111/ele.12025

Wang, L., Wan, Z. Y., Lim, H. S., & Yue, G. H. (2016). Genetic variability, local selection and demographic history: genomic evidence of evolving towards allopatric speciation in Asian seabass. Molecular Ecology, 25(15), 3605-3621. doi:10.1111/mec.13714

Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics, 10(1), 57-63. doi:10.1038/nrg2484

Weber, M. G., & Strauss, S. Y. (2016). Coexistence in Close Relatives: Beyond Competition and Reproductive Isolation in Sister Taxa. Annual Review of Ecology, Evolution, and Systematics, 47, 359-381. doi:10.1146/annurev-ecolsys-112414-054048

Weiblen, G. D., & Bush, G. L. (2002). Speciation in fig pollinators and parasites. Molecular Ecology, 11(8), 1573-1578. doi:10.1046/j.1365-294X.2002.01529.x

Weisenfeld, N. I., Kumar, V., Shah, P., Church, D. M., & Jaffe, D. B. (2017). Direct determination of diploid genome sequences. Genome research, 27(5), 757-767.

Weisrock, D. W., Rasoloarison, R. M., Fiorentino, I., Ralison, J. M., Goodman, S. M., Kappeler, P. M., & Yoder, A. D. (2010). Delimiting Species without Nuclear Monophyly in Madagascar's Mouse Lemurs. PLoS One, 5(3), e9883. doi:10.1371/journal.pone.0009883

Weisrock, D. W., Smith, S. D., Chan, L. M., Biebouw, K., Kappeler, P. M., & Yoder, A. D. (2012). Concatenation and Concordance in the Reconstruction of Mouse Lemur Phylogeny: An Empirical Demonstration of the Effect of Allele Sampling in Phylogenetics. Molecular Biology and Evolution, 29(6), 1615-1630. doi:10.1093/molbev/mss008

Wertheim, B., Beukeboom, L. W., & van de Zande, L. (2013). Polyploidy in Animals: Effects of Gene Expression on Sex Determination, Evolution and Ecology. Cytogenetic and Genome Research, 140(2-4), 256-269. doi:10.1159/000351998

White, M. J. D. (1978). Modes of speciation. San Francisco: W. H. Freeman.

Wilmé, L., Goodman, S. M., & Ganzhorn, J. U. (2006). Biogeographic evolution of Madagascar's microendemic biota. Science, 312(5776), 1063-1065. doi:10.1126/science.1122806

Wilson, E. B. (1925). The Cell in Development and Heredity (3rd ed.). New York: Macmillan.

Winckler, W., Myers, S. R., Richter, D. J., Onofrio, R. C., McDonald, G. J., Bontrop, R. E., . . . Altshuler, D. (2005). Comparison of fine-scale recombination rates in humans and chimpanzees. Science, 308(5718), 107-111. doi:10.1126/science.1105322

Winter, D. J., Wu, S. H., Howell, A. A., Cartwright, R. A., Zufall, R. A., & Azevedo, R. B. R. (2018). accuMUlate: a mutation caller designed for mutation accumulation experiments. Bioinformatics (Oxford, England), 34(15), 2659-2660. doi:10.1093/bioinformatics/bty165

Wiuf, C., & Hein, J. (1999). Recombination as a point process along sequences. Theoretical Population Biology, 55(3), 248-259.

Wolf, J. B. W., & Ellegren, H. (2017). Making sense of genomic islands of differentiation in light of speciation. Nature Reviews Genetics, 18(2), 87. doi:10.1038/nrg.2016.133

Wood, T. E., Takebayashi, N., Barker, M. S., Mayrose, I., Greenspoon, P. B., & Rieseberg, L. H. (2009). The frequency of polyploid speciation in vascular plants. Proceedings of the National Academy of Sciences, 106(33), 13875-13879. doi:10.1073/pnas.0811575106

Worley, K. C. (2017). A golden goat genome. Nature genetics, 49(4), 485.

Wu, C.-I. (2001). The genic view of the process of speciation. Journal of Evolutionary Biology, 14(6), 851-865. doi:10.1046/j.1420-9101.2001.00335.x

Wu, C.-I., & Ting, C.-T. (2004). Genes and speciation. Nature Reviews Genetics, 5(2), 114. doi:10.1038/nrg1269

Wu, M., Kostyun, J. L., Hahn, M. W., & Moyle, L. (2017). Dissecting the basis of novel trait evolution in a radiation with widespread phylogenetic discordance. bioRxiv, 201376. doi:10.1101/201376

Yamamoto, S., Beljaev, E. A., & Sota, T. (2016). Phylogenetic analysis of the winter geometrid genus Inurois reveals repeated reproductive season shifts. Molecular Phylogenetics and Evolution, 94, 47-54. doi:10.1016/j.ympev.2015.08.016

Yang, Z. H. (2015). The BPP program for species tree estimation and species delimitation. Current Zoology, 61(5), 854-865.

Yang, Z. H., & Yoder, A. D. (2003). Comparison of likelihood and Bayesian methods for estimating divergence times using multiple gene loci and calibration points, with application to a radiation of cute-looking mouse lemur species. Systematic Biology, 52(5), 705-716. doi:10.1080/10635150390235557

Yasukochi, Y., Ohno, M., Shibata, F., Jouraku, A., Nakano, R., Ishikawa, Y., & Sahara, K. (2016). A FISH-based chromosome map for the European corn borer yields insights into ancient chromosomal fusions in the silkworm. Heredity (Edinb), 116(1), 75-83. doi:10.1038/hdy.2015.72

Yeo, S., Coombe, L., Chu, J., Warren, R. L., & Birol, I. (2017). ARCS: Scaffolding Genome Drafts with Linked Reads. Bioinformatics (Oxford, England). doi:10.1093/bioinformatics/btx675

Yoder, A. D., Rasoloarison, R. M., Goodman, S. M., Irwin, J. A., Atsalis, S., Ravosa, M. J., & Ganzhorn, J. U. (2000). Remarkable species diversity in Malagasy mouse lemurs (primates, Microcebus). Proceedings of the National Academy of Sciences of the United States of America, 97(21), 11325-11330. doi:10.1073/pnas.200121897

Yoder, A. D., & Yang, Z. (2004). Divergence dates for Malagasy lemurs estimated from multiple gene loci: geological and evolutionary context. Molecular Ecology, 13(4), 757-773.

Yoder, J. B., Clancey, E., Des Roches, S., Eastman, J. M., Gentry, L., Godsoe, W., . . . Harmon, L. J. (2010). Ecological opportunity and the origin of adaptive radiations. Journal of Evolutionary Biology, 23(8), 1581-1596. doi:10.1111/j.1420-9101.2010.02029.x

Yoshida, K., Makino, T., Yamaguchi, K., Shigenobu, S., Hasebe, M., Kawata, M., . . . Kitano, J. (2014). Sex chromosome turnover contributes to genomic divergence between incipient stickleback species. PLoS Genet, 10(3), e1004223. doi:10.1371/journal.pgen.1004223

Zehr, S. M., Taylor, J. P., Roach, R. G., Haring, D., Cameron, F. H., Dean, M., & Yoder, A. D. (2014). Prosimian primate life history profiles generated from the new Duke Lemur Center Database (coming soon to a URL near you!). American Journal of Physical Anthropology, 153, 281-281.

Zelazowski, M. J., & Cole, F. (2016). X marks the spot: PRDM9 rescues hybrid sterility by finding hidden treasure in the genome. Nature Structural & Molecular Biology, 23(4), 267-269.

Zhang, J. J., Kobert, K., Flouri, T., & Stamatakis, A. (2014). PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics (Oxford, England), 30(5), 614-620. doi:10.1093/bioinformatics/btt593

Zhang, L., Martin, A., Perry, M. W., van der Burg, K. R. L., Matsuoka, Y., Monteiro, A., & Reed, R. D. (2017). Genetic Basis of Melanin Pigmentation in Butterfly Wings. Genetics, 205(4), 1537-1550. doi:10.1534/genetics.116.196451

Zhang, L., Mazo-Vargas, A., & Reed, R. D. (2017). Single master regulatory gene coordinates the evolution and development of butterfly color and iridescence. Proceedings of the National Academy of Sciences, 201709058. doi:10.1073/pnas.1709058114

Zhang, L., & Reed, R. D. (2016). Genome editing in butterflies reveals that spalt promotes and Distal-less represses eyespot colour patterns. Nature Communications, 7, 11769. doi:10.1038/ncomms11769

Zhang, W., Bouffard, G. G., Wallace, S. S., Bond, J. P., & NISC Comparative Sequencing Program. (2007). Estimation of DNA Sequence Context-dependent Mutation Rates Using Primate Genomic Sequences. Journal of molecular evolution, 65(3), 207-214. doi:10.1007/s00239-007-9000-5

Zhou, X., & Stephens, M. (2012). Genome-wide efficient mixed-model analysis for association studies. Nature genetics, 44(7), 821.

Zimmermann, E., Cepok, S., Rakotoarison, N., Zietemann, V., & Radespiel, U. (1998). Sympatric mouse lemurs in north-west Madagascar: a new rufous mouse lemur species (Microcebus ravelobensis). Folia Primatologica, 69, 106-114.

Zohdy, S., Gerber, B. D., Tecot, S., Blanco, M. B., Winchester, J. M., Wright, P. C., & Jernvall, J. (2014). Teeth, sex, and testosterone: aging in the world's smallest primate. PLoS One, 9(10), e109528. doi:10.1371/journal.pone.0109528

Zuellig, M. P., & Sweigart, A. L. (2018). Gene duplicates cause hybrid lethality between sympatric species of Mimulus. PLOS Genetics, 14(4), e1007130. doi:10.1371/journal.pgen.1007130

Biography

Christopher Ryan Campbell earned Bachelor of Science degrees in Biology and Psychology (with Honors) at the University of North Carolina-Chapel Hill in May of 2007.

While pursuing his doctorate, Ryan received the following fellowships: the Bass Teaching Fellowship, the Katherine Goodman Stern Dissertation Year Fellowship, and the Biology Department Fellowship. He received a Sigma Xi Grant-in-Aid and a Graduate Summer Research Fellowship, and led multiple Undergraduate Data Expeditions awarded by the Information Initiative at Duke. He has also been the recipient of the Biology Teaching Assistant Award.

Publications (reverse chronological order):

Campbell CR, Poelstra JW, Yoder AD. 2018. What is Speciation Genomics?: the roles of ecology, gene flow, and genomic architecture in the formation of species. BJLS, 124(4), 561-583.

Larsen PA, Harris RA, Liu Y, Murali SC, Campbell CR, Brown AD, Sullivan BA, Shelton J, Brown SJ, Raveendran M, Dudchenko O, Machol I, Durand NC, Shamim MS, Aiden EL, Muzny DM, Gibbs RA, Yoder AD, Rogers J, Worley KC. 2017. Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus). BMC Biology, 15(1), 110.

Casillas-Espinosa PM, Powell KL, Zhu M, Campbell CR, Maia JM, Rhen Z, Jones NC, O’Brien T, Petrovski S. 2017. Whole Genome Sequencing The Genetic Absence Epilepsy Rat From Strasbourg And Its Related Non-Epileptic . PLoS one 12 (7), e0179924.

Faherty SL, Campbell CR, Hilbig SA, Yoder AD. 2017. The effect of body mass and diet composition on torpor patterns in a Malagasy primate (Microcebus murinus). J of Comp Phys B 187 (4), 677-688

Yoder AD, Campbell CR, dos Reis M, Blanco M, Ganzhorn JU, Goodman SM, Hunnicutt KE, Larsen PA, Kappeler PM, Rasoloarison RM, Ralison JM, Swofford D, Weisrock DW. 2016. Geogenetic patterns in mouse lemurs (genus Microcebus) reveal the ghosts of Madagascar’s forests past. PNAS 113.29:8049-8056.

Faherty SL, Campbell CR, Larsen PA, Yoder AD. 2015. Evaluating whole transcriptome amplification for gene profiling experiments using RNASeq. BMC Biotech, 15(1), 65.

Blair C, Campbell CR, Yoder AD. 2015. Assessing the utility of whole genome amplified DNA for next-generation molecular ecology. Mol Ecol Resour, 15(5):1079-1090.

Larsen PA+, Campbell CR+, Yoder AD. 2014. Next-generation approaches to advancing eco-immunogenomic research in critically endangered primates. Mol Ecol Resour, 14(6):1198-1209. + Equal Authorship

Yoder AD, Chan LM, dos Reis M, Larsen PA, Campbell CR, Rasoloarison RM, Barrett M, Roos C, Kappeler PM, Bielawski J, Yang Z. 2014. Molecular evolutionary characterization of a V1R subfamily unique to strepsirrhine primates. Genome Biol Evol. 6(1):213-27.

Heinzen EL, Depondt C, Cavalleri GL, Ruzzo EK, Walley NM, Need AC, Ge D, He M, Cirulli ET, Zhao Q, Cronin KD, Gumbs CE, Campbell CR, Hong LK, Maia JM, Shianna KV, McCormack M, Radtke RA, O’Conner GD, Mikati MA, Gallentine WB, Husain AM, Sinha SR, Chinthapalli K, Puranam RS, McNamara JO, Ottman R, Sisodiya SM, Delanty N, Goldstein DB. 2012. Exome sequencing followed by large-scale genotyping fails to identify single rare variants of large effect in idiopathic generalized epilepsy. American Journal of Human Genetics. 91(2):293-302.

Need AC, McEvoy JP, Gennarelli M, Heinzen EL, Ge D, Maia JM, Shianna KV, He M, Cirulli ET, Gumbs CE, Zhao Q, Campbell CR, Hong L, Rosenquist P, Putkonen A, Hallikainen T, Repo-Tiihonen E, Tiihonen J, Levy DL, Meltzer HY, Goldstein DB. 2012. Exome sequencing followed by large-scale genotyping suggests a limited role for moderately rare risk factors of strong effect in schizophrenia. American Journal of Human Genetics. 91(2):303-12.

Pelak K, Shianna KV, Ge D, Maia JM, Zhu M, Smith JP, Cirulli ET, Fellay J, Dickson SP, Gumbs CE, Heinzen EL, Need AC, Ruzzo EK, Singh A, Campbell CR, Hong LK, Lornsen KA, McKenzie AM, Sobreira NL, Hoover-Fong JE, Milner JD, Ottman R, Haynes BF, Goedert JJ, Goldstein DB. 2010. The Characterization of Twenty Sequenced Human Genomes. PLoS Genet 6:e1001111.

Sockman KW, Salvante KG, Racke DM, Campbell CR, Whitman BA. 2009. Song competition changes the brain and behavior of a male songbird. J. Exp. Biol. 212 (2009), pp. 2411–2418.

Salvante KG, Racke DM, Campbell CR, Sockman KW. 2009. Plasticity in Singing Effort and its Relationship With Monoamine in the Songbird Telencephalon. Dev. Neurobiol. 70 (2009), pp. 41–57.
