
Molecular evolution and phylogenetic importance of a gamete recognition gene Zan reveals a unique contribution to mammalian speciation.

by

Emma K. Roberts

A Dissertation

In

Biological Sciences

Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

Approved

Robert D. Bradley, Chair of Committee

Daniel M. Hardy

Llewellyn D. Densmore

Caleb D. Phillips

David A. Ray

Mark Sheridan, Dean of the Graduate School

May, 2020

Copyright 2020, Emma K. Roberts

Texas Tech University, Emma K. Roberts, May 2020

ACKNOWLEDGMENTS

I would like to thank numerous people for support, both personally and professionally, throughout the course of my degree. First, I thank Dr. Robert D. Bradley for his mentorship, knowledge, and guidance throughout my tenure in the PhD program. His ‘open door policy’ helped me flourish and grow as a scientist. In addition, I thank Dr. Daniel M. Hardy for providing continued support, knowledge, and exciting collaborative efforts. I would also like to thank the remaining members of my advisory committee, Drs. Llewellyn D. Densmore III, Caleb D. Phillips, and David A. Ray, for their patience, guidance, and support. The above advisors each helped mold me into a biologist, and I am incredibly grateful for this gift.

Additionally, I would like to thank numerous mentors, friends, and colleagues for their advice, discussions, experience, and friendship. For these reasons, among others, I thank Dr. Faisal Ali Anwarali Khan, Dr. Sergio Balaguera-Reina, Dr. Ashish Bashyal, Joanna Bateman, Karishma Bisht, Kayla Bounds, Sarah Candler, Dr. Juan P. Carrera-Estupiñán, Dr. Megan Keith, Christopher Dunn, Moamen Elmassry, Dr. Adam Ferguson, Dr. Nicole Foley, James Francis, Dr. Sarah Fumagalli, Dr. Ken Griffith, Brandon Gross, Michaela Halsey, Dr. John Hanson, Dr. Lucas Heintzman, Dr. Tyla S. Holsomback, Dr. Howard M. Huynh, Dr. Narayan Kandel, Jennifer Korstian, Macy Krishnamoorthy, Dr. Neha Kumari, Mark Lee, Taylor Lenzmeier, Dr. Laramie Lindsey, Dr. Rita Magalhães, Sarah Mangum, Preston McDonald, Dr. Molly M. McDonough, Archana Muthu, Anisha Navlekar, Benneth Obitte, Dr. Nicté Ordóñez-Garza, Austin Osmanski, Dr. Julie Parlos, Nicole Paulat, Doug Perez, Dr. Kendra Phelps, Dr. R. Neal Platt, Andrea Reinhardt, Andres Rodriguez-Codero, Dr. Stephen A. Roussos, Megan Rowe, Oscar Sandate, Taylor Soniat, Dr. Cibele Sotero-Caio, Dr. Amanda Starr, Dr. Scott Starr, Dr. Jenny Strovas, Kevin Sullivan, Tim Sweeney, Iroro Tanshi, Dr. Courtney A. Thomason, Dr. Cody W. Thompson, Craig Tipton, Dr. Miryam Venegas-Anaya, Dr. Polrit Viravathana, Sarah Vrla, Dr. Lizz Waring, Rachael Weidmeier, and Emily Wright. I look forward to a lifetime of friendship with you all.


Thanks to Heath Garner, Kathy MacDonald, and the Natural Science Research Laboratory of the Museum of Texas Tech University for assistance with tissue loans. Thank you to the numerous co-authors who helped with data procurement and analysis as well as manuscript preparation for the research reported herein. I thank the TTU High Performance Computing Center for assistance with bioinformatics analyses. Thank you to the following organizations and programs from which research funding was procured: National Institutes of Health, National Science Foundation, American Society of Mammalogists, Southwestern Association of Naturalists, Association of Biologists at Texas Tech University, Texas Tech University Graduate School Dissertation Completion Fellowship, Museum of Texas Tech University, and the Department of Biological Sciences at Texas Tech University. Additionally, thanks to the Department of Biological Sciences and Graduate School of Texas Tech University for travel funding to present the research reported herein at scientific meetings.

Additionally, thanks to the countless educators who patiently taught me not only subject matter but, more importantly, prepared me for both academia and life. Your words of encouragement, guidance, and knowledge did not go unheeded. With these sentiments I thank, among others, Carol Baker, Bob DeFazio, Dr. Ximena Bernal, Dr. Michael Dini, Dr. Lewis Held, Dr. Matt Johnson, David Schoen, Dr. Dylan Schwilk, and my parents (Terrence and Brenda Abrams).

Finally, I would like to thank my family and close friends for their continued encouragement, reassurance, support, and love in all of my personal and professional endeavors, as well as the innumerable opportunities they provided for education and growth. Without all of them by my side, I would not have been successful in attaining a PhD, the greatest achievement of my academic career thus far.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
I. INTRODUCTION
  Background Information
  Organization of Chapters
II. RAPID DIVERGENCE OF A GAMETE RECOGNITION GENE PROMOTED MACROEVOLUTION OF EUTHERIA
  Abstract
  Introduction
  Materials and Methods
    Experimental models/subjects
    Database mining for Zan sequences
    Sequence alignments
    Phylogenetic analysis of Zan
    Phylogenetic comparisons
    Selection tests
    Characterization of zonadhesin D3 and D3p polypeptides
    Divergence rate comparisons, quantification, and statistical analyses
  Results
    Zan ontogeny
    Zan phylogeny
    Zan divergence
  Discussion
III. GAMETE RECOGNITION GENE TREE YIELDS A ROBUST EUTHERIAN PHYLOGENY ACROSS TAXONOMIC LEVELS
  Abstract
  Introduction
  Methods
  Results
    Magnordinal and Superordinal-level comparisons
    Ordinal-level comparisons
    Intra-ordinal comparisons
  Discussion
IV. ZAN VWD DOMAIN DUPLICATION AND DIVERGENCE REFLECT UNIQUE EVOLUTION OF A SPECIATION GENE IN MYOMORPH RODENTS
  Abstract
  Introduction
  Methods
    Retrieval and comparison of Zan sequences
    Comparison of Zan in monotremes and placentals
    Sequence alignments and phylogenetic analysis of Zan
    Selection tests
  Results
    Retrieval and comparison of Zan sequences
    Comparisons of ZanL and Zan domain architecture
    Phylogenetic and comparative analyses of Zan
    Selection tests
  Discussion
V. CONCLUSION
  General Conclusions
  Future Directions
BIBLIOGRAPHY
APPENDICES
  A. Sequence source IDs for each gene
  B. PAML summary results for all D3p Groups


LIST OF TABLES

2.1 Blastp summary results for Zan sequence mining in non-
2.2 Global and Ordinal nucleotide substitution models for each gene
2.3 Global phylogenetic comparisons by Shimodaira-Hasegawa and Adjusted Unbiased tests for each gene
2.4 Ordinal phylogenetic comparisons by Shimodaira-Hasegawa and Adjusted Unbiased tests for each gene
2.5 Global PAML summary results for each gene
2.6 Positively selected sites for each gene
2.7 Likelihood ratio test statistics for each gene
2.8 Test statistics for ANOVA and post-hoc analysis of gene divergence rates
2.9 Games-Howell post-hoc test summary results for Zan divergence rate
2.10 Games-Howell post-hoc test summary results for Tecta divergence rate
2.11 Games-Howell post-hoc test summary results for Cytb divergence rate
3.1 Orders with limited taxon representation for tanglegram comparison
4.1 Gene sequence source accession numbers
4.2 Global and -specific nucleotide substitution models
4.3 PAML summary results for models M2 and M8
4.4 Positively selected sites for each D3p Group
4.5 Likelihood ratio test statistics for each D3p Group
A.1 Gene sequence source ID and accession numbers for each gene
B.1 PAML results for each D3p Group


LIST OF FIGURES

2.1 Two-dimensional comparison of AchE-Tfr2 genomic loci: a) mouse and opossum, b) mouse and platypus
2.2 Phylogenetic analysis of Zan DNA sequence divergence
2.3 Phylogenetic analysis of Tecta DNA sequence divergence
2.4 Phylogenetic analysis of Cytb DNA sequence divergence
2.5 Bayesian tree of aligned Zan protein sequences
2.6 Species diversity of Zan: a) variation in proteolytic processing sites of the zonadhesin protein precursor, b) heterogeneity of zonadhesin D3 and D3p domain polypeptides
2.7 Rank rate plot of Zan divergence rates among placental mammals
2.8 Variation of Zan DNA sequence divergence rate among species
2.9 Zan ontogeny
3.1 Zan gene tree with corresponding taxonomic descriptions
3.2 Superorder and Ordinal-level trees
3.3 Order
3.4 Order Chiroptera
3.5 Order Cetartiodactyla
3.6 Order
3.7 Order Rodentia
4.1 Schematic diagram of zonadhesin domain structure in myomorph rodents
4.2 Comparisons of exon structure: a) platypus ZanL and mouse Zan, b) human ZAN and mouse Zan
4.3 Bayesian tree of Zan D3p nucleotide sequences
4.4 Summary tree of Zan D3p Groups
4.5 Identification of orthologous D3p domain Groups
4.6 Positively selected sites for each D3p domain Group


CHAPTER I

INTRODUCTION

Background Information

Macroevolution, or evolution above the species level, represents one of the central ideas in evolutionary biology (Stanley 1975). Macroevolutionary phenomena include processes such as adaptive radiations, genome evolution, diversification rates, and biodiversity changes over time (Glor 2010, Benton 2015). There are many ways to examine processes of macroevolution, for example by observing microevolutionary changes (e.g., in morphology, behavior, or genetic mutations).

Macroevolution and microevolution describe fundamentally identical processes on different biological scales: above the species level (macroevolution) and below the species level (microevolution). Microevolution is relatively well understood and uncontroversial; however, no consensus exists regarding the fundamental basis of macroevolution. Speciation, the dynamic formation of species (Palumbi 1994, Coyne and Orr 2004), is well documented in mammals over the course of evolution and is categorized into modes according to how populations become reproductively isolated: allopatric (geographic isolation), peripatric and parapatric (isolation by distance), and sympatric (isolation without distance; Nei 1983, Coyne 1992, Coyne and Orr 2004).

Despite extensive knowledge of various aspects of how species originate, two key questions remain unresolved, particularly in mammalian taxa: 1) what are the evolutionary mechanisms of speciation, and 2) following divergence, what defines a species or lineage? Speciation is a complex process, and many questions remain regarding how species diverge. Nevertheless, a consistent theme is that the most essential feature of speciation is the dynamic maintenance of reproductive isolation between populations or among species.

Speciation is largely a result of adaptive evolution, or the propagation of advantageous mutations through positive selection, an idea rooted in the theory of natural selection proposed by Charles Darwin and Alfred Russel Wallace and later integrated into the Modern Synthesis (Huxley 1942). Adaptive evolution is fueled by sexual reproduction, which generates genotypic variation through independent assortment and crossing over (recombination) during meiosis, producing novel genotypes through which advantageous traits may spread (Bell 1982). However, sexual reproduction is energetically expensive for vertebrate species, especially for mammals, which fertilize internally (Wade and Schneider 1992, Gittleman et al. 1988). Mammalian reproduction entails numerous costs, such as energy expended in mating and dangers associated with courtship behaviors (Daly 1978), disruption of co-adapted gene combinations by meiotic recombination (Horandl 2009, Horandl 2013), defective and therefore wasted gametes (Cohen 1973), genome dilution (the two-fold cost of sex), and sexual selection processes such as competition and specialization (Eberhard 1979).

Reproductive trade-offs such as those listed above, often between survival and reproduction, are a major theme in mammalian life histories. ‘Fast’ and ‘slow’ life histories influence the success of reproductive events (Promislow and Harvey 1990, Bronson 1991). Reproductive strategies vary widely among mammals and include multiple litters and rapid offspring production (e.g., rodents), quadruplet offspring resulting from polyembryony (armadillos), few offspring per year (e.g., humans and elephants), and different positions on the altricial-precocial spectrum at birth. For example, rabbits, cricetid rodents, and canids produce altricial young, whereas hares, hystricomorph rodents, and bovids bear precocial young (Derrickson 1992). These strategies depend on numerous life history traits, including gestation length, litter size, metabolic rate, neonatal development, and lactation length, which makes mammalian mating systems inherently complex (Promislow and Harvey 1990).

In addition to this diversity of reproductive strategies, mammalian systems include natural (or artificial) mechanisms that promote conspecific mating, known as reproductive isolation mechanisms (Noor et al. 2001, Orr 2005, Eberhard 1986). These include behaviors and physiological pathways that prevent dilution of the gene pool through hybridization and the subsequent production of less fit hybrid individuals (Weir and Rowlands 1973). Because speciation often occurs along a continuum, only a “snapshot” of the process can be observed, leaving many questions about reproductive isolation unresolved. Reproductive isolation between distinct lineages is maintained by isolation “barriers” that are critical for the divergence of species (Coyne 1992).

In the 1940s, Ernst Mayr classified reproductive isolation mechanisms into two categories: pre-zygotic (before mating and zygote formation) and post-zygotic (after mating and zygote formation), both of which contribute to reproductive isolation (Mayr 1940). Pre-zygotic barriers include physical or temporal isolation and ethological, mechanical, and gametic isolation, whereas post-zygotic barriers include gametic mortality, hybrid mortality, inviability or sterility, and hybrid breakdown. Other historical categories for isolation barriers include ‘pre-mating’ versus ‘post-mating’ and ‘pre-fertilization’ versus ‘post-fertilization’. However, the delineation at zygote formation is the standard used most often today and is the one used in this dissertation. Several examples of reproductive isolation barriers are described below.

Species of both Drosophila and mammals provide good examples of each type of mechanism. Drosophila species use ethological barriers in mating, such as differences in wing-beat speed during courtship dances (Nei et al. 1983). The Kaneshiro hypothesis provides a model for ethological isolation in Drosophila (Kaneshiro 1980). Derived (less primitive) males in a population lose attractive traits, such as specific courtship dances, and are discriminated against by ancestral (more primitive) females. Derived females are then selected to be less choosy, can no longer distinguish between derived and ancestral males, and thus mate indiscriminately with both. This behavior is a strong candidate mechanism for reproductive isolation between derived populations, especially upon secondary contact (Odeen and Florin 2002), as documented in two genera of pocket gophers, Geomys and Thomomys (Bradley et al. 1991a, Bradley et al. 1991b, Patton and Smith 1993). A barrier that is both behavioral and temporal is found in mephitid species (genus Spilogale), which remain reproductively isolated because of differences in the timing of each species’ estrus period. Often, only weeks separate the estrus periods of the species pair. This lack of overlap in reproductive availability is crucial because Spilogale species are morphologically indistinguishable and may represent cryptic species (Mead 1968a, Mead 1968b).

Mechanical incompatibility of Drosophila genitalia is an effective barrier to fertilization because of substantial interspecific variation in male and female anatomy. This mechanism is described by the ‘lock and key hypothesis’ (Eberhard 1985), which also applies to many rodent species (Shapiro and Porter 1998, Edwards 1993, Matocq et al. 2007). During copulation, the ‘key’ (penis) must fit anatomically with the ‘lock’ (vagina) for fertilization to occur. In some species, microscopic spines on the exterior of the penis trigger ovulation only in the reproductive tract of conspecific females (Shapiro and Porter 1998).

Reproductive barriers between nascent species involving gametic incompatibility have been described extensively in externally fertilizing organisms such as sea urchins and abalone (Lee and Vacquier 1992, Zigler et al. 2005, Mets and Palumbi 2006). However, such barriers differ markedly in internally fertilizing systems such as mammals. In crosses of Drosophila species, fertilization defects cause egg inviability because the sperm of one species is not compatible with the eggs of another (Alipaz et al. 2001). Because gamete incompatibility acts after mating but before zygote formation, it is regarded as a ‘last chance’ effort and one of the final pre-zygotic barriers preventing formation of a hybrid zygote.

After mating and zygote formation occur, a broad range of mechanisms may contribute to reproductive isolation. Chromosomal incompatibilities, developmental breakdown, hybrid inviability, hybrid subfertility, and hybrid sterility are among the possible outcomes of mating between two genetically incompatible species (Wu and Palopoli 1994). Historically, post-mating barriers were attributed to the fixation of a special set of interacting genes, or to incompatibility between genes from different populations (Dobzhansky 1937). Later studies of hybrid fitness suggested that these “special genes” controlling hybrid viability and/or fertility are fully functional within a species (Nei et al. 1983), and others proposed that the incompatibility lies at the level of enzyme evolution (Nei 1975). The study of post-mating, post-zygotic barriers to reproduction is still in its early stages.

Chromosomal and developmental incompatibilities occur when an egg is successfully fertilized but the zygote fails to develop, or the zygote develops but the resulting hybrid fetus is inviable. In some crosses, segmentation of the zygote is absent or halts during development; in others, the development of genes or gene complexes differs during later stages of development. Mammals have been reported to evolve complete hybrid inviability up to ten times faster than other vertebrates such as birds and frogs (Wilson et al. 1974, Prager and Wilson 1975). Why mammals evolve intrinsic post-zygotic isolation more rapidly than other vertebrates remains a fascinating, unanswered question (Fitzpatrick 2004).

An example of hybrid sterility in mammals is the production of the mule and the hinny from reciprocal crosses of horses and donkeys. Although the hybrid offspring are viable and are regarded as robust (mule) or weak (hinny) for human purposes, these hybrids are sterile and are commonly referred to as ‘evolutionary dead-ends’. Post-zygotic mechanisms serve as “last chance efforts” to limit maladaptive crosses between related species, so that costly reproductive effort is not further invested in inviable or unfit hybrid offspring. Even when hybrids are viable, gene flow between species is impeded because the hybrids are sterile (Benirschke 1967). Backcross individuals (offspring of hybrids mated back to a parental species), particularly descendants of hybrid females, are not phenotypically robust, are potentially inviable, and further break down, or sink, the mating system (Benirschke 1967).

Reproductive isolation between species often involves not a single mechanism but a combination of barriers. For example, certain Drosophila species are physically isolated from each other by differing temperature and altitude preferences, as well as by differences in mating behavior. Even where the distributions of a species pair overlap, the combination of isolation barriers remains sufficient to prevent interbreeding.

Nevertheless, several mammalian species hybridize, both in captivity and in the wild, despite the aforementioned reproductive isolation barriers. Known crosses in captivity include polar bears and grizzly bears, tigers and lions (ligers and tigons), goats and sheep (geep), cattle and yak (cattleyak), and cattle and buffalo (beefalo or cattalo). In addition, some mammals are known to hybridize in the wild: rodents (ground squirrels, Thompson et al. 2015; pocket gophers, Bradley et al. 1991b; and woodrats, Mauldin et al. 2014), even-toed ungulates (camelids, Potts 2001; cervids, Carr et al. 1986, Wishart et al. 1988; bovids, Wishart et al. 1988), and carnivores (canids, Mengel 1971; felids, Murphy et al. 2007; ursids, Galbreath et al. 2008). These hybridizations may occur because the species involved are genetically similar enough to bypass naturally occurring post-mating isolation barriers.

Underlying these reproductive isolation barriers, there must be a genetic component to reproductive isolation and the subsequent diversification of species; the challenge lies in finding the genes and genetic mechanisms involved. For example, species-specific sperm-egg recognition prevents formation of hybrid offspring during spawning of marine invertebrate species with overlapping ranges (Palumbi 2009, Lessios 2011). The active gamete recognition molecule pairs (e.g., lysin and VERL in molluscs, bindin and EBR in sea urchins) acquired their species-specific binding activities through the combined effects of positive selection and concerted evolution (Swanson and Vacquier 2002a, Swanson and Vacquier 2002b). Thus, in these externally fertilizing organisms, rapid molecular evolution of gamete recognition proteins confers species-specificity of fertilization that serves as a mode of reproductive isolation. In mammals, the sperm protein zonadhesin (gene: Zan) mediates species-specific adhesion to the egg’s zona pellucida (ZP; Hardy and Garbers 1994, Hardy and Garbers 1995, Tardif et al. 2010a) and may serve as a pre-zygotic reproductive isolation mechanism.

Zonadhesin is a large, mosaic protein present exclusively in the sperm acrosome (Bi et al. 2003, Olson et al. 2004). The 7-16 kb Zan mRNAs from , mouse, rabbit, and human each encode nascent precursors comprising N-terminal MAM (meprin/A5/receptor protein tyrosine phosphatase mu) and mucin domains followed by a variable number of tandem full and partial von Willebrand D (VWD) domains (Hardy and Garbers 1995, Gao and Garbers 1998). The precursor undergoes extensive post-translational processing (glycosylation, proteolysis, and covalent oligomerization) during spermiogenesis (Hickox et al. 2001, Bi et al. 2003), with ZP binding activity residing in the VWD domains of functionally mature zonadhesin (Hardy and Garbers 1995, Bi et al. 2003).

Over the last few decades, Zan has been targeted as a candidate speciation gene, along with ADAM2, ZP2, PRM1, proacrosin, sp56, and SED-1. These genes are associated with the sperm acrosome, spermatogenesis, or the zona pellucida extracellular matrix, and all of them function before zygote formation (Tardif et al. 2010a, Nosil et al. 2011). Any gamete recognition gene could conceivably contribute to reproductive isolation. However, only Zan has been shown to contribute disproportionately to pre-zygotic reproductive isolation and therefore to the divergence of species (Roberts et al. Ontogeny). Some key aspects of Zan function are known, including its role in the species-specificity of fertilization. However, the dynamic mechanisms of gamete recognition, the presence of dramatic sequence variability, and the contribution of Zan to reproductive isolation, and thus to the diversification of mammalian species, remain poorly understood.

Using a bioinformatic, molecular evolutionary approach, I propose three main projects to evaluate Zan’s involvement in reproductive isolation and therefore speciation in mammals. Below is the general organization of this dissertation and an outline detailing these projects (including titles, authorlines, and submission information).

Organization of Chapters

Chapter I is an introductory chapter that provides necessary background information on the research focus. Chapters II through IV are separate manuscripts developed to address the aforementioned dissertation objectives. These chapters will be submitted to peer-reviewed journals. Detailed descriptions of each are listed below.

Chapter II is entitled “Rapid divergence of a gamete recognition gene promoted macroevolution of Eutheria.” The authorline is EK Roberts, S Tardif, EA Wright, RN Platt II, CD Phillips, RD Bradley, and DM Hardy, and the manuscript will be submitted for publication to Molecular Biology and Evolution. Chapter II will be referred to as “Roberts et al. Ontogeny” when cited within the body of the dissertation. Coauthors and I examine Zan’s contribution as a speciation gene to the macroevolution of Eutherian mammals by evaluating Zan’s ontogeny in vertebrate species, characterizing its divergence among mammalian species, and assessing the biochemical properties of its encoded polypeptides in spermatozoa of several mammals.

Chapter III is entitled “Gamete recognition gene tree yields a robust Eutherian phylogeny across taxonomic levels.” The authorline is EK Roberts, EA Wright, DM Hardy, and RD Bradley, and the manuscript will be submitted for publication to Systematic Biology. Chapter III will be referred to as “Roberts et al. Phylogeny” when cited within the body of the dissertation. Coauthors and I assess the utility of Zan for resolving evolutionary relationships across mammalian taxonomic levels through detailed topological comparisons between Zan (one gene) and Supertree (66 genes) phylogenies.

Chapter IV is entitled “Zan VWD domain duplication and divergence reflect unique evolution of a speciation gene in myomorph rodents.” The authorline is EK Roberts, EA Wright, RD Bradley, and DM Hardy, and the manuscript will be submitted for publication to Molecular Phylogenetics and Evolution. Chapter IV will be referred to as “Roberts et al. Rodents” when cited within the body of the dissertation. Coauthors and I examine the molecular evolution of dramatic changes in protein domain architecture in the rodent Suborder Myomorpha to assess their contribution to the rapid proliferation and differentiation of rodent species.

The final chapter (Chapter V) summarizes the evolutionary implications of the main findings in Chapters II-IV and discusses possible future directions for research in this area.


CHAPTER II

RAPID DIVERGENCE OF A GAMETE RECOGNITION GENE PROMOTED MACROEVOLUTION OF EUTHERIA

Abstract

Speciation genes contribute disproportionately to macroevolution, but few examples exist, especially in vertebrates. In mammals, the Zan gene encodes the sperm acrosomal protein zonadhesin, which mediates species-specific adhesion to the egg’s zona pellucida. Here we identify Zan as a speciation gene in placental mammals. Zan genomic ontogeny suggested that it arose by repurposing of a stem vertebrate gene that was lost in multiple lineages but retained in Eutheria upon acquiring a function in egg recognition. A 112-species Zan sequence phylogeny, representing 17 of 19 placental Orders, resolved all species into monophyletic groups corresponding to known Eutherian Orders and Suborders, with <5% unsupported nodes. Zan divergence by intense positive selection, together with domain duplications and an accelerated divergence rate in the rodent Suborder Myomorpha, produced dramatic species differences in the protein’s properties. In addition, Zan ordinal divergence rates generally reflected the species richness of Eutherian Orders. We propose that species-specific egg recognition conferred by Zan divergence served as a mode of prezygotic reproductive isolation that promoted the radiation and adaptive success of Eutheria.


Introduction

Reproductive isolation limits homogenizing gene flow between incipient species. Genetic changes that promote reproductive isolation serve not only to reinforce speciation secondary to geographic isolation, but also to initiate and drive speciation in populations with overlapping or identical ranges or niches (Coyne and Orr 2004). Modes of reproductive isolation vary among organisms and include prezygotic barriers such as mate discrimination, anatomical incompatibility, and fertilization specificity, as well as postzygotic barriers such as embryo inviability and hybrid sterility (Turelli and Orr 2000, Coyne and Orr 2004, Palumbi 2009).

Species-specific sperm-egg recognition prevents formation of hybrid offspring during spawning of marine invertebrate species with overlapping ranges (Palumbi 2009, Lessios 2011). The active gamete recognition molecule pairs (e.g., lysin and VERL in molluscs, bindin and EBR in sea urchins) acquired their species-specific binding activities through the combined effects of positive selection and concerted evolution (Swanson and Vacquier 1998, Swanson and Vacquier 2002a, Palumbi 2009). Thus, in these externally fertilizing organisms, rapid molecular evolution of gamete recognition proteins confers fertilization specificity that serves as a primary mode of reproductive isolation. In mammals, the sperm protein zonadhesin (gene: Zan) mediates species-specific adhesion to the egg's zona pellucida (ZP; Hardy and Garbers 1994, Hardy and Garbers 1995, Tardif et al. 2010a). However, no studies have yet determined whether fertilization barriers contribute to reproductive isolation in any vertebrate, or even whether such barriers are relevant in internally fertilizing animals such as mammals.


Orthologous genes diverge regardless of whether their products evolve neutrally or are subject to selection, so the species divergence of any gene can serve as a clock measuring the time elapsed since a speciation event. However, gene divergence reflects not only elapsed time but also evolution of the encoded products’ functions. Negative selection acts on products that must retain the same function, and positive selection acts on products that bestow beneficial new traits on the evolving organisms (Sabeti et al. 2006). Accordingly, evolution of a speciation gene should accurately reflect both species phylogeny and divergence rate, and should occur by positive selection. To examine the relationship between Zan function and mammalian phylogeny, here we evaluated its ontogeny in >120 vertebrate species, characterized its divergence among 112 species representing 17 of 19 placental Orders, and assessed the properties of its encoded polypeptides in spermatozoa of several mammals. The results suggest that Zan molecular evolution contributed to speciation among placental mammals and thereby promoted the extraordinary adaptive radiation of Eutheria.

Materials and Methods

Experimental models/subjects

For immunochemical studies we used spermatozoa from five laboratory species (mouse, rat, hamster, guinea pig, and rabbit), two farm animal species (horse and pig), as well as dog and human. Laboratory and farm animals were housed and handled per IACUC-approved animal protocols. We collected semen from farm animals in the course of routine animal husbandry, acquired dog spermatozoa from discarded epididymides of a mature male neutered in the course of routine veterinary care, and obtained human semen from volunteer donors with appropriate consent per an IRB-approved protocol.

Database mining for Zan sequences

We first retrieved nucleotide sequences annotated as "zonadhesin" in GenBank or Ensembl (last accessed 03 October 2018). Among the retrieved sequences we identified authentic Zan in 112 Eutherian mammals representing 17 of 19 Eutherian Orders, as described in Results. Searches for Zan in non-Eutherian mammals (Ornithorhynchus anatinus, Monodelphis domestica, Sarcophilus harrisii, and Macropus eugenii) in genomic databases (GenBank and Ensembl) revealed an absence of Zan in the conserved mammalian ACHE/Tfr2 locus (Wilson et al. 2001).

We compared the corresponding ACHE/Tfr2 syntenic regions of representative Eutherian mammals and of the non-Eutherian genomes listed above to detect Zan, specifically between the genes Epo (erythropoietin) and Ephb4 (EPH receptor B4). To search for Zan remnants in the Epo/Ephb4 syntenic region and the ACHE/Tfr2 locus, we used TBLASTX 2.8.0+, which translates both query and target sequences in all six reading frames, to search these genomic sequences for distantly related targets, using default parameters and a reward/penalty ratio of 1/-1 for detection of more divergent sequences (Camacho et al. 2009).
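Because TBLASTX compares translations rather than raw DNA, the step it rests on is six-frame translation. The sketch below shows only that step, using the standard genetic code; the function names (`translate`, `six_frames`, `reverse_complement`) are illustrative, and this is not the BLAST+ implementation, which additionally scores translated matches with a protein substitution matrix.

```python
# Sketch of six-frame translation (standard genetic code), the step that lets
# TBLASTX compare DNA sequences at the protein level. Illustrative only.

BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str, offset: int) -> str:
    """Translate complete codons starting at `offset` (0, 1 or 2).
    Codons containing ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(offset, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict:
    """Translations of the three forward and three reverse-complement frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


print(six_frames("ATGGCC"))
# {'+1': 'MA', '-1': 'GH', '+2': 'W', '-2': 'A', '+3': 'G', '-3': 'P'}
```

Searching all six frames is what allows a divergent coding remnant to be detected on either strand and in any frame, which nucleotide-level matching of divergent sequences would likely miss.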

To determine whether the absence of Zan from several non-Eutherian genomes might be an artifact of genome assembly, we queried raw genomic reads of various vertebrates (Ornithorhynchus anatinus assembly ornAna2, Monodelphis domestica assembly MonDom5, Sarcophilus harrisii assembly sarHar1, Macropus eugenii assembly macEug2, Crocodylus porosus assembly CroPor_comp1, Gallus gallus assembly galGal6, Anolis carolinensis assembly anoCar5, and Xenopus laevis assembly xenLae2) by TBLASTX search with the 112-species Eutherian Zan nucleotide alignment (see below) and with a six-species (Rattus norvegicus, Bos taurus, Sus scrofa, Myotis lucifugus, Canis lupus familiaris, Dasypus novemcinctus) ADAM3 nucleotide alignment. To identify divergent, zonadhesin-like candidate progenitors in non-mammals, we queried NCBI non-redundant protein sequences by BLASTp 2.8.0+ (Camacho et al. 2009; Table 2.1) with the least derived Zan protein (Dasypus novemcinctus), then evaluated synteny to authentic Zan by inspecting the corresponding genetic loci in the NCBI Genome Data Viewer.

Sequence alignments

We aligned authentic Zan nucleotide sequences (Table A.1) encoding the zonadhesin protein's von Willebrand factor type D (VWD) domains D0, D1, D2, D3, and approximately the first 25% of D4 (range: 330-1560 nt each) using T-Coffee in Meta-Coffee (M-Coffee) mode, which aligns with multiple algorithms, consolidates the output into a single model, and produces a local estimate of consistency with the individual alignments from which it was derived (Notredame et al. 2000). To confirm correct reading frames and detect premature stop codons, we translated the aligned sequences in MEGA v.7 (Kumar et al. 2016). All alignments and resultant trees will be deposited in the Dryad Digital Repository.
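The reading-frame check described above amounts to translating each aligned coding sequence and flagging stop codons that occur before its final codon. The sketch below illustrates that check under the assumption of a codon-aware alignment in which gaps ("-") occupy whole codons and the frame begins at the first column; the function name `premature_stops` is hypothetical, and MEGA performs the equivalent check through its own interface.

```python
# Sketch: flag in-frame premature stop codons in an aligned coding sequence.
# Assumes a codon-aware alignment: gaps ('-') occur in whole codons and the
# reading frame starts at the first aligned column.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(aligned_cds: str) -> list:
    """Return 0-based codon indices of stop codons that are not the last
    ungapped codon. Gapped codons and a trailing partial codon are ignored."""
    seq = aligned_cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    ungapped = [i for i, c in enumerate(codons) if "-" not in c]
    last = ungapped[-1] if ungapped else -1  # position of the terminal codon
    return [i for i in ungapped if codons[i] in STOP_CODONS and i != last]


print(premature_stops("ATGTAAGGCTAG"))  # [1]: TAA interrupts the reading frame
print(premature_stops("ATG---TGAGGC"))  # [2]
```

A terminal stop is tolerated, but any earlier in-frame stop suggests either a frame error introduced during alignment or a pseudogenized sequence.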


Phylogenetic analysis of Zan

We evaluated 56 nucleotide substitution models with hierarchical likelihood ratio tests (hLRTs) and the Akaike Information Criterion (AIC) in jModelTest2 (Darriba et al. 2012) to identify the best-fit model of nucleotide substitution (Table 2.2), and identified GTR+I+G as the most appropriate model. Because authentic Zan proved to be absent from non-mammals, we selected the ZanL gene of the Chinese soft-shelled turtle (Pelodiscus sinensis) as the outgroup in the Zan alignment. For Bayesian inference we used MrBayes 3.2.6 (Ronquist et al. 2012) with the following options: two independent runs, each with four chains (one cold and three heated; Metropolis-coupled Markov chain Monte Carlo), 10 million generations, and sampling every 100th generation from the last 750,000 generations. We then constructed a 50% majority-rule consensus tree from the retained trees and plotted posterior probability values on the topology in FigTree 1.4.4 (Rambaut 2018).

Phylogenetic comparisons

To determine whether the Zan gene tree recapitulates mammalian evolutionary relationships better than other genes, we compared it to similarly constructed trees for tectorin A (Tecta gene), a zonadhesin paralog with a tandem VWD domain region similar in length and protein domain composition to zonadhesin, and cytochrome b (Cytb gene), a rapidly evolving mitochondrial gene commonly used for molecular phylogeny. To assess the phylogenetic validity of the gene tree topologies, we compared them to an established mammalian species supertree (Bininda-Emonds et al. 2007) by both global and intra-ordinal Shimodaira-Hasegawa (SH) and Approximately Unbiased (AU) tests (Shimodaira 2002), as well as by calculating the tree congruity index (Icong; de Vienne et al. 2007). Because the supertree (Bininda-Emonds et al. 2007) was constructed from extensive gene sequence and morphometric data and is very well represented taxonomically, we first pruned it in Phylomatic v.3 (Webb and Donoghue 2005) to only those species represented in each of the Zan, Tecta, and Cytb phylogenies. We then input the unconstrained gene trees and the corresponding pruned, constrained supertree files to PAUP (Swofford 2003) and ran one-tailed SH and AU tests using the best-fit nucleotide substitution model for each gene (Table 2.2), 10,000 RELL (Resampling Estimated Log-Likelihoods) bootstrap replicates, and Bonferroni correction. We considered trees significantly different at p < 0.05. For congruence testing, we calculated Icong and associated p-values for the same gene trees and pruned supertrees using the online resource described by de Vienne et al. (2007).
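RELL bootstrapping avoids re-optimizing every tree for every replicate: the per-site log-likelihoods from the original fits are simply resampled. The sketch below applies this idea to the SH test in its usual centered form, assuming a matrix of site log-likelihoods (one row per candidate tree) has already been exported from a likelihood program. It illustrates the procedure only; it is not PAUP's implementation, and it omits the multiscale bootstrap that the AU test additionally requires. The name `sh_test_rell` is hypothetical.

```python
import random


def sh_test_rell(site_lnl, n_boot=10_000, seed=1):
    """Shimodaira-Hasegawa test using RELL bootstrap replicates.

    site_lnl: one list per candidate tree of per-site log-likelihoods
    (same sites, same order). Returns one p-value per tree; a small value
    means the tree fits significantly worse than the best tree in the set.
    """
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    totals = [sum(row) for row in site_lnl]
    best = max(totals)
    observed_delta = [best - t for t in totals]

    rng = random.Random(seed)
    boot = [[0.0] * n_boot for _ in range(n_trees)]
    for b in range(n_boot):
        # RELL: resample sites with replacement and re-sum the stored
        # site log-likelihoods instead of re-fitting each tree.
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        for k in range(n_trees):
            row = site_lnl[k]
            boot[k][b] = sum(row[i] for i in idx)

    # Centre each tree's replicate totals on its own bootstrap mean.
    for k in range(n_trees):
        mean_k = sum(boot[k]) / n_boot
        boot[k] = [x - mean_k for x in boot[k]]

    # p_k = fraction of replicates whose centred deficit >= the observed one.
    exceed = [0] * n_trees
    for b in range(n_boot):
        best_b = max(boot[k][b] for k in range(n_trees))
        for k in range(n_trees):
            if best_b - boot[k][b] >= observed_delta[k]:
                exceed[k] += 1
    return [e / n_boot for e in exceed]
```

In use, each row would hold the site log-likelihoods of one topology (for example, a gene tree and the pruned, constrained supertree). By construction the best-scoring tree always receives p = 1.0, so the test only ever asks whether the other trees are significantly worse.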

Selection tests

To determine whether Zan DNA sequence divergence subserves biologically relevant species differences in the zonadhesin precursor's amino acid sequence, we globally evaluated the 112-species alignment for the relative contributions of neutral evolution, negative selection, and positive selection with the CODEML program in the PAML 4.9d package (Yang 2007). We first calculated dN/dS ratios (ω) from the codon alignments in four comparisons, in which the null model, M0, assumed a single ratio, constraining ω to be equal across all sites and branches in the phylogeny. The initial comparisons (M0 vs. M1 and M0 vs. M7) tested for neutrality; M1 and M7 allowed ω to vary among codon sites while constraining it to ω ≤ 1 (M1, two site classes with ω0 < 1 and ω1 = 1; M7, a beta distribution of ω between 0 and 1). The subsequent comparisons (M1 vs. M2 and M7 vs. M8) tested for selection; M2 and M8 additionally allowed a class of sites with ω > 1, and a Bayes empirical Bayes approach was used to calculate posterior probabilities for sites under selective pressure (Yang 2007). We then determined which model, M1, M7, M2, or M8, was most appropriate for each gene by likelihood ratio tests (LRTs), using a chi-squared distribution with two degrees of freedom and statistical significance at p < 0.05.
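The likelihood ratio test used to choose among nested site models compares twice the log-likelihood difference against a chi-squared distribution. With two degrees of freedom the chi-squared survival function has the closed form exp(-x/2), so the p-value needs no statistics library; the critical value at α = 0.05 is about 5.99. The sketch below is a minimal illustration with a hypothetical function name, and the log-likelihoods in the usage line are invented, not values from Tables 2.5-2.7.

```python
import math


def lrt_df2(lnl_null: float, lnl_alt: float):
    """LRT for nested models differing by two free parameters
    (e.g., M1 vs. M2 or M7 vs. M8 in CODEML).

    Returns (2 * delta lnL, p). For df = 2 the chi-squared survival
    function is exp(-x / 2).
    """
    # Guard against tiny negative differences from numerical optimization.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods for illustration only.
stat, p = lrt_df2(lnl_null=-25000.0, lnl_alt=-24990.0)
print(f"2dlnL = {stat:.1f}, p = {p:.2e}")  # 2dlnL = 20.0, p = 4.54e-05
```

For other degrees of freedom (for example, comparisons whose models differ by a single parameter), the closed form no longer applies and a general chi-squared survival function would be needed.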

Characterization of zonadhesin D3 and D3p polypeptides

We detected zonadhesin D3 polypeptides as previously described (Bi et al. 2003, Tardif et al. 2010a, Tardif et al. 2010b) on western blots of proteins extracted from spermatozoa and resolved by SDS-PAGE (disulfides reduced), using affinity-purified rabbit antibodies to the mouse D3 domain (Tardif et al. 2010a). We similarly detected D3p polypeptides using affinity-purified antibodies to the mouse D3p18 domain (Tardif et al. 2010a, Wheeler et al. 2011, Tung et al. 2017).

Divergence rate comparisons

To assess differences in divergence rates between species, we visualized gene trees in FigTree 1.4.4 (Rambaut 2018), summed each species' branch lengths back to the origin of its Superorder, and then, for each gene, normalized the summed branch lengths to that of the species with the shortest summed branch length. We used Levene's and Shapiro-Wilk tests to assess homogeneity of variances and normality, respectively, of divergence rates between and among mammalian Orders for each gene. Because Levene's test rejected homogeneity of variances for each individual gene (all p ≤ 0.02), though not for all three genes collectively (p = 0.442), and Shapiro-Wilk tests rejected normality for Zan, Tecta, and all genes collectively (p ≤ 0.011), though not for Cytb (p = 0.064), we subsequently conducted Kruskal-Wallis H non-parametric ANOVA on ranks and Games-Howell post-hoc tests to determine stochastic dominance of divergence rate between genes and between Orders (Day and Quinn 1989).
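The divergence measure described above (summed branch lengths from each species back to the origin of its Superorder, divided by the smallest such sum for that gene) can be sketched as follows. The tree is represented here as a hypothetical child-to-parent map with branch lengths, standing in for a tree exported from FigTree or read with a Newick parser; node names and function names are placeholders.

```python
def path_to_ancestor(tip, ancestor, parent, branch_length):
    """Sum branch lengths from `tip` up to (not beyond) `ancestor`.
    Raises KeyError if `ancestor` is not on the path to the root."""
    total = 0.0
    node = tip
    while node != ancestor:
        total += branch_length[node]  # length of the branch above `node`
        node = parent[node]
    return total


def normalized_divergence(tips_by_superorder, parent, branch_length):
    """Divergence of each tip since its Superorder's origin, scaled so the
    slowest-diverging species in the gene tree equals 1.0."""
    raw = {
        tip: path_to_ancestor(tip, origin, parent, branch_length)
        for origin, tips in tips_by_superorder.items()
        for tip in tips
    }
    shortest = min(raw.values())
    return {tip: length / shortest for tip, length in raw.items()}


# Toy tree: Superorder origin "S" -> internal node "A" -> tips t1, t2; S -> t3.
parent = {"A": "S", "t1": "A", "t2": "A", "t3": "S"}
branch_length = {"A": 0.25, "t1": 0.25, "t2": 0.75, "t3": 0.25}
print(normalized_divergence({"S": ["t1", "t2", "t3"]}, parent, branch_length))
# {'t1': 2.0, 't2': 4.0, 't3': 1.0}
```

Normalizing within each gene puts Zan, Tecta, and Cytb on a common, unitless scale, so their ranges can be compared even though their absolute substitution rates differ.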

Details of the statistical analyses performed in SPSS are presented in Tables 2.3-2.11, which state how normalization was achieved for the datasets, identify the statistical tests used, list the exact value of n and what it represents, and define relevant terms. For all tests, differences were considered significant at p < 0.05, and posterior probabilities were considered significant at p ≥ 0.95. All statistical analyses of selection tests and divergence rate data are described above.
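The Kruskal-Wallis H test named above pools the observations from all groups (for example, normalized divergence values grouped by Order or by gene), ranks them with tied values sharing their average rank, and measures how far each group's mean rank departs from the overall mean rank, with the usual correction for ties. The sketch below is a generic textbook implementation, not the SPSS routine used in the analyses, and it returns only the H statistic; the p-value (chi-squared with k - 1 degrees of freedom for k groups) and the Games-Howell post-hoc comparisons are omitted. Function names are illustrative.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    sum_term = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        sum_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * sum_term - 3.0 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0  # all-tied data: H = 0


print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6]]))  # ~3.857 (= 27/7)
```

Because only ranks enter the statistic, the test is insensitive to the non-normality and unequal variances reported for the individual genes.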

Results

Zan ontogeny

To identify Zan loci we queried the NCBI and Ensembl databases, as well as raw genomic reads from several non-Eutherian species, by a combination of word, TBLASTX, and BLASTp searches. NCBI Protein database queries retrieved >1200 entries annotated as "zonadhesin", including sequences from viruses, bacteria, protists, fungi, and plants. Animals accounted for the vast majority of entries (>1000), with species ranging broadly from cnidarians and nematodes to fishes, reptiles, birds, and mammals. However, entries from most species other than Eutherian mammals differed markedly in size, sequence, and domain composition from prototypical Zan gene products in species such as pig, mouse, and human that have been directly characterized, suggesting that those entries were not truly Zan. We therefore set two criteria to identify authentic Zan in genome assemblies: 1) predicted protein domain composition including, in order, MAM, mucin, and tandem VWD domains (Hardy and Garbers 1995), and 2) fully conserved synteny and orientation between Ephb4 and Epo in the genomic locus spanning AchE to Tfr2 (Wilson et al. 2001). No non-Eutherian genes annotated as Zan met these criteria. Indeed, two-dimensional comparison of the mouse (Mus musculus) Zan genomic locus spanning AchE to Tfr2 with the syntenic opossum (Monodelphis domestica) locus revealed a marked discontinuity between the Ephb4 and Epo genes flanking Zan (Fig. 2.1A), owing to the absence of Zan in opossum despite conservation and synteny of the other nine genes. The opossum locus was approximately 30 kb longer than the mouse locus (340 kb vs. 310 kb), reflecting a generally greater content of non-coding intergenic and intronic DNA in the opossum locus. Nevertheless, the Ephb4-Epo intergenic segment spanned only 30 kb, which is too short to accommodate the >100 kb Zan gene, and local TBLASTX search of this 30 kb segment with mouse Zan detected no Zan-like sequences. Surprisingly, even though monotremes (Prototheria) diverged before the split between Metatheria and Eutheria (estimated at 166 vs. 148 Myr ago, respectively; Bininda-Emonds et al. 2007), two-dimensional comparison of the mouse and platypus (Ornithorhynchus anatinus) syntenic loci identified a platypus Zan-like gene (Fig. 2.1B) comprising mucin and tandem VWD domains but with predicted domain content incongruous with that of authentic Zan (no MAM domains and double the number of full VWD domains).
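The two-dimensional (dot-plot) comparisons of syntenic loci described above can be illustrated with a minimal exact-match version: every pair of positions at which the two sequences share a k-mer becomes a point, so conserved segments form diagonal runs, and a gene present in one locus but absent from the other (as Zan is from opossum) shows up as a break and offset in the diagonal. The word length and the function name are illustrative assumptions; the chapter does not specify the software used, and dedicated tools typically tolerate mismatches and also compare the reverse strand.

```python
from collections import defaultdict


def dot_plot_hits(seq_x: str, seq_y: str, k: int = 11):
    """Return (i, j) pairs where seq_x[i:i+k] == seq_y[j:j+k].
    Exact matches on the forward strand only."""
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    index = defaultdict(list)  # k-mer -> start positions in seq_y
    for j in range(len(seq_y) - k + 1):
        index[seq_y[j:j + k]].append(j)
    hits = []
    for i in range(len(seq_x) - k + 1):
        for j in index.get(seq_x[i:i + k], ()):
            hits.append((i, j))
    return hits


# Toy example: the second sequence lacks the "GGGGGG" block of the first,
# so the diagonal of hits resumes with an offset after the missing segment.
print(dot_plot_hits("ACGTACCGGGGGGTTAGCA", "ACGTACCTTAGCA", k=4))
```

Plotting i against j for real loci would reproduce the kind of discontinuity between flanking genes described for the mouse and opossum comparison.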

In searches for Zan loci that may have evaded annotation, TBLASTX queries with 112 Eutherian Zan DNA sequences retrieved no matching Zan sequences from raw genomic reads of six non-Eutherian species, including one amphibian (Xenopus laevis), one bird (Gallus gallus), and three marsupials (Monodelphis domestica, Sarcophilus harrisii, and Notamacropus eugenii). Similar queries of the same six sets of genomic reads with aligned DNA sequences encoding ADAM3, another rapidly evolving sperm-specific protein that functions in fertilization (Nishimura et al. 2007, Long et al. 2012), retrieved multiple ADAM sequences from each organism, suggesting that this search strategy would have retrieved Zan had it been present in the genomes of the queried species.

Despite the apparent absence of Zan from non-Eutherian mammals, birds, and amphibians, BLASTp search with the armadillo zonadhesin protein sequence retrieved not only 100+ mammalian Zan sequences, as expected, but also zonadhesin-like predicted sequences in two reptiles (Chinese softshell turtle, Pelodiscus sinensis; painted turtle, Chrysemys picta), two ray-finned fish (large yellow croaker, Larimichthys crocea; zebrafish, Danio rerio), and one lobe-finned fish (coelacanth, Latimeria chalumnae) (Table 2.1). We therefore compared synteny of the Pelodiscus and Latimeria "zonadhesin" genomic loci with Eutherian Zan loci. The Pelodiscus locus included two genes (EphB4 and GNB2) present in the Eutherian Zan locus spanning ACHE/Tfr2, but without conserved gene order and orientation, and the Latimeria locus included three other genes (Epo, SLC12A9, and TFR2) present in the Eutherian Zan locus, also without conserved order and orientation. The limited, partial synteny among the loci suggested that the Pelodiscus and Latimeria genes annotated as zonadhesin were not authentic Zan but instead are "Zan-like", hereafter designated ZanL, descended from an ancient Zan/ZanL progenitor. Altogether, we identified Zan in 112 species representing 17 of 19 Eutherian Orders, and ZanL genes in one monotreme and five non-mammalian vertebrates, but found no Zan or ZanL genes in marsupials, birds, or amphibians. Thus, authentic Zan appeared only in genomes of Eutherian mammals.

Zan phylogeny

To compare Zan among the 112 placental species, we first characterized the genes' exon and encoded domain structures. Zonadhesin's ZP-binding activity resides in its D0-D4 VWD domains (Hardy and Garbers 1994, Hardy and Garbers 1995). However, human ZAN comprises 48 exons, whereas mouse Zan comprises 88 exons owing to the presence of an additional 20 partial VWD domains between D3 and D4, designated D3p domains, each encoded by a two-exon cassette (Wilson et al. 2001). We found D3p domain expansions only in rodents of Suborder Myomorpha (10 of the 11 myomorph species examined), with the number of domains ranging from zero in the lesser Egyptian jerboa (Jaculus jaculus) to 24 in the North American deermouse (Peromyscus maniculatus). Therefore, to compare orthologous Zan sequences across all mammalian Orders, we removed D3p domain coding regions from the predicted Zan mRNA sequences of the 10 myomorph rodent species, then aligned sequences encoding D0-D4. Bayesian analysis of the alignment (GTR+I+G nucleotide substitution model) produced a phylogenetic tree that closely recapitulated the Eutherian phylogeny constructed from other morphometric and molecular data (Murphy et al. 2001, Springer et al. 2001, Bininda-Emonds et al. 2007, Meredith et al. 2011, Foley et al. 2016), with posterior support (p ≥ 0.95) at 107 of 112 nodes (Fig. 2.2). In contrast, similar analysis of two other genes yielded trees with many more unsupported nodes and with discrepant groupings of species and Orders (Figs. 2.3 and 2.4). For these comparisons we chose tectorin A (Tecta gene), because it is a zonadhesin paralog with a tandem VWD domain region similar in length and domain composition to zonadhesin, and cytochrome b (Cytb gene), because it is a rapidly evolving mitochondrial gene commonly used for molecular phylogeny. The Tecta tree yielded 27/115 = 23.5% unsupported nodes, and the Cytb tree 47/119 = 39.5% unsupported nodes, as opposed to only 5/112 = 4.5% unsupported nodes in the Zan tree.
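Tree resolution is summarized here as the share of internal nodes whose posterior probability falls below the 0.95 threshold used throughout the chapter. The short sketch below shows that tally; the function names are illustrative, the posterior list in the assertion-style example would come from a consensus tree, and the node counts in the usage loop are those reported above (5 of 112 unsupported for Zan follows from support at 107 of 112 nodes).

```python
def unsupported_nodes(posteriors, threshold=0.95):
    """Count internal nodes whose posterior probability is below `threshold`."""
    return sum(1 for p in posteriors if p < threshold)


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Unsupported / total node counts as reported in the text.
reported = {"Zan": (5, 112), "Tecta": (27, 115), "Cytb": (47, 119)}
for gene, (bad, total) in reported.items():
    print(f"{gene}: {bad}/{total} = {percent(bad, total)}% unsupported")
```

Expressing support as a proportion rather than a raw count keeps the comparison fair across trees with slightly different numbers of taxa and nodes.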

To determine whether the Zan gene tree accurately reflected Eutherian species phylogeny, we compared its topology to that of a widely accepted species supertree constructed from a combination of extensive gene sequence and morphometric data (Bininda-Emonds et al. 2007), which nevertheless contains many unresolved polytomies (Stadler and Bokma 2013). The Zan gene tree largely recapitulated the established supertree phylogeny. Shimodaira-Hasegawa (SH) and Approximately Unbiased (AU) tests, and tree congruity index (Icong) calculations, revealed that the single best Zan tree differed globally from the established supertree (p < 0.0001, AU test), apparently because the Zan tree is more fully resolved (Tables 2.3-2.4). Specifically, Zan ordinal topologies did not differ from the supertree for Orders in which the supertree is well resolved (Primates, Cetartiodactyla), but did differ (AU test) at the Superorder level (p < 0.00001) and for Orders with polytomies in the supertree (Carnivora, p = 0.005; Chiroptera, p < 0.0001; Rodentia, p < 0.0001).

Corresponding comparison of the Tecta gene tree also yielded a single best global topology that differed from the supertree phylogeny (p < 0.0001, AU test), but two alternative ordinal topologies each for Afrotheria and Primates. Only the single Carnivora Tecta ordinal topology (p < 0.04) and one of the two Afrotheria Tecta ordinal topologies (p < 0.006) differed from the supertree phylogeny; the topologies for Primates, Rodentia, Cetartiodactyla, and Chiroptera did not differ. In contrast, four global Cytb trees, each about equally likely, all differed from the supertree phylogeny (all p < 0.02, AU test), presumably because of the numerous unresolved nodes in the Cytb gene tree. Neither of two Carnivora Cytb ordinal topologies differed from the supertree, nor did the single ordinal topologies for Afrotheria, Cetartiodactyla, Chiroptera, and Rodentia. Only the single Primates Cytb ordinal topology differed from the supertree topology (p < 0.0001, AU test). Finally, congruence comparison of the Zan and Tecta gene trees to the supertree yielded similarly higher proportions of leaves in maximum agreement subtrees (75/108 and 75/106, respectively) than the Cytb gene tree (48/113), as well as higher Icong values (4.91 and 6.34, respectively, vs. 3.07 for Cytb; all p < 8 × 10⁻²²), reflecting the generally poorer resolution and topology of the Cytb gene tree. Thus, in contrast to the Zan tree, the Tecta and Cytb trees, with their discrepant groupings and larger proportions of unsupported nodes, yielded generally poorer resolution than the supertree.

Zan divergence

To determine whether Zan DNA sequence divergence subserves biologically relevant species differences in the zonadhesin precursor's amino acid sequence, we globally evaluated the 112-species alignment for the relative contributions of neutral evolution and of negative and positive selection using PAML. The analysis detected intense positive selection (ω2 = dN/dS = 8.749; PAML model M2; Tables 2.5-2.7, B.1) at 166 of 1932 sites (8.6%) in the compared Zan coding region, with another 46 sites also under weaker but significant positive selection (Bayes empirical Bayes posterior probability ≥ 0.95). In contrast, the corresponding analysis of the Tecta alignment yielded equal likelihoods for the neutral and positive selection models (PAML M1 and M2, respectively; Tables 2.5-2.7, B.1), with, under model M2, much weaker but more pervasive positive selection overall (ω2 = 1.707 at 919 of 2597 sites). Finally, despite the more rapid divergence of the Cytb mitochondrial DNA sequences, analysis of the Cytb alignment detected only negative selection (ω0 < 1) and neutral evolution (ω1 = 1.0) at 352 and 24 of 376 sites, respectively (Tables 2.5-2.7, B.1), consistent with the idea that Cytb divergence serves primarily as a marker of time passed since species divergence (Arbogast and Slowinski 1999).

Multiple alignment of protein sequences encoded by 106 Zan DNAs, with the corresponding sequence of soft-shelled turtle ZanL as outgroup, produced a phylogenetic tree nearly identical to that obtained from DNA alignments (Fig. 2.5).


Regions of high protein sequence variation were readily evident in the alignment, with characteristic sequences corresponding to taxonomic groups, including insertions unique to certain taxa (e.g. a 4-11 residue insertion in the D1 domain only in myomorph rodents and a 4 residue insertion in the D3 domain only in ), as well as loss of an otherwise conserved proteolytic processing site in the D1 domain only in myomorph rodents (Fig. 2.6, upper panel). The loss of the D1 processing site, together with differences in other post-translational events such as glycosylation, manifested as striking species heterogeneity in the sizes of zonadhesin D3 polypeptides produced in the precursor's maturation (Fig. 2.6, lower panel).

Importantly, the P→L/V substitution in the otherwise conserved Y114GDPH D1 processing site was under positive selection (Tables S4-S5), suggesting that the change is functionally important.

Although Zan DNA and protein sequence divergence correlated closely with presumptive species divergence, branch length differences suggested that the Zan divergence rate differed among the compared species. A rank-rate plot of Zan divergence since the origin of each species' Superorder revealed an inflection between the 91 most slowly diverging species, which included the 90 species from all Orders except Rodentia and , and the 21 most rapidly diverging species, which included 18 of 19 rodent species, among them all 11 species from Suborder Myomorpha (Fig. 2.7). Among species in Superorder Afrotheria and eight other Eutherian Orders with at least three Zan sequences, the three genes diverged with normalized dynamic ranges of 3.6 for Zan, 3.4 for Tecta, and 2.9 for Cytb (Fig. 2.8). However, the range of Tecta divergence decreased to 2.0 when the three species of echolocating bats were excluded. Divergence rates for all three genes differed overall as well as between Orders and between species (Tables 2.9-2.11). Zan divergence rate ranged highest in Rodentia and Eulipotyphla, whereas Cytb divergence ranged highest in Primates, and Tecta divergence in Chiroptera. Within Orders, Zan exhibited a generally greater divergence range than Cytb or Tecta, with the exceptions of Cytb in Primates and Tecta in Chiroptera. Furthermore, Rodentia, Eulipotyphla, and exhibited higher average normalized Zan divergence rates than normalized Tecta or Cytb divergence rates (p < 0.0001; Tables 2.9-2.11), as well as higher Zan divergence than the other seven Orders (Afrotheria p = 0.041; Perissodactyla, Chiroptera, Carnivora, Cetartiodactyla, and Primates p < 0.0001; Tables 2.9-2.11).

Discussion

We conclude that Zan is a speciation gene in Eutheria. Three key findings support this conclusion:

1. Zan sequence divergence directly reflects species diversification in Eutheria. The Zan gene tree yields a more resolved phylogeny than trees from genes whose variation reflects time passed since species divergence rather than a direct contribution to speciation itself. The Zan tree is more fully resolved even than trees that combine extensive gene sequence and morphometric data (Murphy et al. 2001, Springer et al. 2001, Bininda-Emonds et al. 2007, Meredith et al. 2011, Foley et al. 2016).

2. Positive selection drives Zan functional divergence. Positive selection at about 11% of sites within the compared Zan sequences (212 of 1932), with intense selection (ω2 > 8.7) at most of those sites, rapidly diversifies Zan amino acid sequences. Changes include altered processing sites in the precursor's functional maturation, as well as substitutions that presumably determine the protein's species-specific egg recognition activity. The magnitude and pervasiveness of Zan diversifying selection exceed those observed for various noteworthy examples of rapid molecular evolution, including sperm protamine and other male reproductive proteins in Old World primates [ω ≤ 3 at 3.6% of sites (Wyckoff et al. 2000)], various fertilization proteins of mammalian spermatozoa [e.g., ω = 3.9 at 3% of sites in fertilin α/ADAM1 (Swanson et al. 2003) and ω = 7.6 at 1% of sites in preproacrosin (Turner et al. 2008)], bindin F-lectin repeat 1 in oyster spermatozoa [ω = 6.0 at 7.1% of sites (Moy et al. 2008)], and even HIV envelope protein in rapid viral escape from neutralizing antibody [ω = 8.1 at <5% of sites (Frost et al. 2005)].

3. Zan divergence rate variation generally reflects species richness of Eutherian Orders. Zan divergence rate is highest in Rodentia and Eulipotyphla, which together comprise nearly half of the >6000 currently recognized placental species, and lowest among the six Orders of Afrotheria, which comprise fewer than 100 extant species (Burgin et al. 2018).

Zan's presence in Eutheria but absence from other vertebrates raises the question of when and how the gene originated. A plausible ontogeny (Fig. 2.9) emerges from the presence of Zan-like genes (ZanL) in the syntenic locus of a monotreme (Ornithorhynchus) and in partially conserved loci of two reptiles (Pelodiscus and Chrysemys), two ray-finned fish (Larimichthys and Danio), and a lobe-finned fish (Latimeria) (Table 2.1). In this proposed ontogeny, Zan and ZanL evolved from an ancestral Zan-like gene in stem vertebrates that was lost in amphibians, birds, and marsupials but persisted as ZanL in monotremes, fishes, and reptiles because of some as yet unidentified function, and persisted as Zan in placental mammals because of its acquired function in sperm-egg recognition. No tissue expression data are available for the ZanL genes identified, so we cannot definitively rule out the possibility that their products are sperm proteins. However, fish ZanL cannot act as Zan does in gamete interactions, because fertilization in teleosts occurs by penetration of acrosomeless spermatozoa through the egg micropyle (Killingbeck and Swanson 2018), and an unrelated protein encoded by the Bouncer gene mediates species-specific gamete recognition in zebrafish (Herberg et al. 2018). Zan-related genes in Atlantic salmon, pufferfish, and zebrafish (Hunt et al. 2005) appear not to be direct descendants of the Zan/ZanL ancestor, because they encode proteins that differ from zonadhesin in predicted domain composition and order, map to loci that lack recognizable synteny to mammalian Zan loci, or both. Also, their expression primarily in the gut suggests that they and similar genes such as ZanL function in teleost fish primarily as gut mucins, not as gamete recognition molecules. Thus, we propose that Zan arose in Eutheria by repurposing of a gene from stem vertebrates early in or coincident with the divergence of Eutheria from Metatheria, suggesting a contribution not only to species divergence within Eutheria but possibly also to the origination of the clade.


Reproduction varies dramatically among species, with unique reproductive specializations constituting the defining traits of some taxa (e.g., the placenta in Eutheria). Accordingly, reproductive proteins exhibit high rates of evolution by positive selection (Palumbi 1999, Swanson and Vacquier 2002a, Swanson and Vacquier 2002b, Swanson et al. 2003, Turner and Hoekstra 2006, Turner et al. 2008). In contrast to gamete recognition molecules, however, most such proteins do not mediate processes that directly influence reproductive isolation. Our findings on Zan molecular evolution show that rapid evolution of a mammalian gamete recognition protein confers species-specificity to fertilization that in turn serves as a mode of prezygotic reproductive isolation, thereby promoting speciation. Thus, mammals seem to diverge by this mode of reproductive isolation even though other characteristics of internally fertilizing organisms also readily promote prezygotic isolation, for example mating barriers arising from anatomical incompatibilities or from differences in courtship behavior or breeding seasons (Kaneshiro 1980). Remarkably, despite dramatic species differences in other aspects of fertilization (energy metabolism, gamete transport, sperm capacitation, etc.), the cell biology of species-specific gamete recognition is largely conserved, occurring in many invertebrates and vertebrates by binding of a rapidly evolving protein in the sperm acrosome to the acellular coat overlying the egg (Killingbeck and Swanson 2018, Tardif et al. 2010a). Notwithstanding this conserved cell biology, as well as the conserved functionality of ZP-domain proteins in egg coats from marine invertebrates to mammals (Turner and Hoekstra 2006, Palumbi 2009, Wilburn and Swanson 2016, Killingbeck and Swanson 2018), the active recognition proteins in different taxa are not evolutionarily related; abalone lysin, sea urchin bindin, mammalian zonadhesin, and fish Bouncer share no common ancestors (Hardy and Garbers 1995, Wilson et al. 2001, Palumbi 2009, Herberg et al. 2018). Collectively, these observations reveal that species-specific gamete interaction in diverse animals occurs by conserved cellular processes paradoxically mediated not only by related molecular components that evolved rapidly from ancient ancestral genes [for example, ZP-domain proteins (Turner and Hoekstra 2006, Raj et al. 2017, Killingbeck and Swanson 2018)], but also by unrelated components that evolved rapidly from repurposed genes (for example, Zan), all driven by the reproductive advantage bestowed by prezygotic isolation.

Several enzymes of mammalian spermatozoa possess ZP binding/adhesion activity. These "hijacked" enzymes include the membrane-associated ectoenzymes galactosyltransferase, arylsulfatase A, hexokinase, hyaluronidase, and α-mannosidase, as well as the acrosomal zymogen proacrosin (Bi et al. 2002). The ZP binding activity of galactosyltransferase derives from the enzyme's substrate binding activity, whereas the ZP binding activity of proacrosin resides in a C-terminal region distinct from the serine protease catalytic domain. Irrespective of how they bind the ZP, hijacked enzymes represent additional examples of opportunistic repurposing or dual-purposing in the evolution of mammalian gamete recognition mechanisms, and together with Zan establish gene repurposing as a recurrent theme in mammalian fertilization. However, perhaps because of constraints on their ability to evolve without losing catalytic activity, hijacked enzymes do not bind ZP in a species-specific manner; indeed, zonadhesin and proacrosin in sperm extracts both bind avidly to intact, isolated ZP, but proacrosin binds equally well to ZP from three different mammals and also to Xenopus egg envelopes, whereas zonadhesin binds only to cognate ZP (Hardy and Garbers 1994, Hardy and Garbers 1995). Coincident species-specific and non-species-specific binding also occurs in mollusc gamete recognition; among three tandemly repeated VERL domains distantly related to the N-terminal moiety of ZP domains [ZP-N domains (Wilburn and Swanson 2016)], the most conserved of the three (repeat 3) binds lysin of multiple species, a less conserved domain (repeat 2) binds only lysin of the cognate species, and the most variable domain (repeat 1) does not bind lysin (Raj et al. 2017). This combination of binding activities may be necessary to sustain fertility as species-specific adhesion molecule pairs co-evolve (Palumbi 1999, Palumbi 2009, Wilburn et al. 2018). Accordingly, in mammalian spermatozoa, the combined binding activities of zonadhesin and other ZP recognition proteins may confer functional degeneracy that protects against sterility upon loss of any one component's function (Tardif et al. 2010a), thereby leaving zonadhesin binding specificity free to evolve in support of reproductive isolation.

Hybridization is not always deleterious: it commonly mediates introgression of beneficial alleles (Harrison and Larson 2014), and it can even produce unique new species (Mallet 2007). The balance between the benefits and detriments of hybridization therefore dictates whether incipient species ultimately merge or diverge. Similar to results on bindin divergence and function in sea urchin fertilization (Lessios 2011), Zan sequence identity between species correlates with the ability to form hybrids. Examples are the high identity between donkey and horse and between modern human and Neandertal (Figs. 2.2 and 2.5). But even when hybridization can occur, some individuals within a population may produce hybrids less readily than others. Indeed, speciation gene polymorphism can produce fertility differences among individuals in a population (Palumbi 1999). Such differences could in turn represent the earliest steps in the march toward reproductive isolation. No studies have yet characterized Zan variation within species. However, genes that evolve rapidly between species also commonly exhibit high allelic variation within species (Palumbi 1999, Sabeti et al. 2006, Moy et al. 2008, Palumbi 2009, Lessios 2011). Thus, Zan allelic variation within and outside of hybrid zones (Bradley et al. 1993, Harrison and Larson 2014) may provide insight into the relative pressures promoting and opposing complete reproductive isolation between placental species.
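Between-species sequence identity of the kind cited above can be quantified in several ways. The sketch below is a minimal example that computes percent identity over the ungapped columns of a pairwise alignment. Skipping gapped columns is one common convention among several and is an assumption here. The toy sequences are hypothetical, and this is not presented as the procedure used in this study.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: columns where either sequence carries a gap are skipped,
    so identity is computed over ungapped columns only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment, not real Zan sequence data
print(percent_identity("ATGC-TTA", "ATGCATCA"))
```

Other conventions divide by the full alignment length or by the shorter sequence length. Reported identities should therefore state which denominator was used.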

Zan DNA sequence comparisons may prove useful for resolving uncertainties in mammalian phylogenetics, including relationships among species within Orders as well as more basal relationships that have been difficult to resolve (Foley et al. 2016).

Average Zan divergence rate among Eutherian Orders correlated with species richness, consistent with a direct contribution of Zan evolution to speciation. Detailed analysis of that association proved impossible owing to the lack of accepted ordinal speciation rates. Such rates are difficult to estimate for two reasons. First, they depend on extinction rates skewed by the "pull of the recent". Second, lineages-through-time plots underestimate recent divergences that take time to complete (Etienne and Rosindell 2012). Also, an average speciation rate for a lineage does not reflect the dynamic rate variation likely to result from the action of speciation genes subjected to episodic positive selection (Kumar et al. 2012).
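The text does not state which statistic was used for the correlation between average Zan divergence rate and ordinal species richness. A rank correlation is one natural choice for such data. The sketch below assumes Spearman's ρ, computed as the Pearson correlation of average ranks so that ties are handled. Per-Order inputs would be supplied by the user; no values from this study are implied.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when all values in one input are tied")
    return cov / (sx * sy)
```

As the paragraph notes, a correlation of this kind does not substitute for speciation rates. It only summarizes whether Orders with faster Zan divergence also tend to be more species-rich.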

Nevertheless, for Eutheria in general it should be possible to generate highly resolved phylogenies by comparing, as done here, Zan sequences spanning the full VWD domains among a more extensive assortment of species. Furthermore, the existence of partial VWD domains [D3p (Wilson et al. 2001)] in some rodents also provides an opportunity for even greater resolution of those species. The absence of D3p domains in most placental mammals suggests that partial VWD domain duplications represent a dramatic and comparatively recent development in the molecular evolution of Zan that is unique to Superfamily Muroidea in Suborder Myomorpha, which is estimated to have diverged from Superfamily Dipodoidea ~66 Myr ago (Honeycutt 2009).

Importantly, the partial VWD domains in these species likely contribute to egg recognition. The D3p18 domain of mouse zonadhesin is highly autoimmunogenic (Wheeler et al. 2011), and affinity-purified antibody to it inhibits mouse sperm-egg adhesion (Tardif et al. 2010a). The number of D3p domains detected thus far among Muroidea species ranges from 9 to 24. Comparing the ontogeny of D3p domain expansions in these animals alone may therefore provide insight into the evolution of Muroidea.


[Figure 2.1: two-dimensional comparisons of genomic assemblies spanning genes 1-10 (1, AchE; 2, Trip6; 3, CIP1/Slc12a9; 4, Ephb4; 5, Zan, shown as Zan/ZanL in Panel B; 6, Epo; 7, POP7/Rpp20; 8, PERQ1/Gigyf1; 9, Gnb2; 10, Tfr2). Panel A: mouse genomic assembly (kb) vs. opossum genomic assembly (kb). Panel B: mouse genomic assembly (kb) vs. platypus genomic assembly (kb).]

Fig. 2.1. Two-dimensional comparison of mouse, opossum, and platypus genomic loci spanning AchE-Tfr2. Panel A: Comparison of syntenic loci from mouse Chr 5 (~310 kb), encompassing 10 identified genes, and opossum Chr 2 (~340 kb), encompassing nine of the same 10 genes. The arrows represent the locations and orientations of the respective genes (first through last exons plus introns; blue arrow = mouse Zan). Note the co-linearity and conserved order and orientation of genes 1-4 (AchE-Ephb4) and 6-10 (Epo-Tfr2), and the generally greater content of intergenic and intronic DNA in the opossum. Note also the comparatively short segment of intergenic DNA (30 kb) between opossum Ephb4 and Epo. The corresponding discontinuity in the co-linear relationship (dotted lines) between the mouse and opossum loci reflects the absence of the ~100-kb Zan gene in the opossum. Panel B: Corresponding comparison of the mouse Zan locus with the syntenic locus from platypus Chr X5 (~270 kb). Note the presence of a Zan-like gene (ZanL) in platypus between Ephb4 and Epo.
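The caption does not name the software used for the two-dimensional comparison in Fig. 2.1. The sketch below therefore illustrates only the generic idea behind such plots, using exact shared k-mers between two sequences, and is an assumption. Co-linear conserved segments fall on a diagonal. A segment present in only one locus, such as Zan in mouse but not opossum, shows up as a shift in that diagonal. The function name, the default k, and the toy sequences are hypothetical. The sketch also ignores reverse-complement matches, so genes in opposite orientation would not appear.

```python
def kmer_dotplot(seq_a: str, seq_b: str, k: int = 11) -> list[tuple[int, int]]:
    """Return (i, j) start positions where seq_a[i:i+k] == seq_b[j:j+k].

    Plotting the pairs gives a dot plot: conserved, co-linear stretches form
    diagonal runs, and an insertion in one sequence offsets the diagonal.
    Only same-strand exact matches are reported.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # index every k-mer of seq_b by its start positions
    index: dict[str, list[int]] = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    hits = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            hits.append((i, j))
    return hits


# Hypothetical toy loci: seq_b carries a 4-bp insertion absent from seq_a
print(kmer_dotplot("AAAACCCC", "AAAAGGGGCCCC", k=4))
```

In the toy example, the diagonal offset jumps from 0 to 4 after the shared "AAAA" block. This is the small-scale analogue of the discontinuity described for the mouse and opossum loci.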


[Figure 2.2: Bayesian Zan gene tree. Tips are labeled with species common names and bracketed by Superorder or Order (Afrotheria, Eulipotyphla, Carnivora, Pholidota, Chiroptera, Perissodactyla, Cetartiodactyla, Scandentia, Dermoptera, Primates, Lagomorpha, Rodentia); scale bar, 0.09; outgroup, Chinese soft-shelled turtle ZanL.]

Fig. 2.2. Phylogenetic analysis of Zan DNA sequence divergence. Shown is the gene tree constructed by Bayesian analysis of 112 aligned Zan sequences, with ZanL from Chinese soft-shelled turtle (Pelodiscus sinensis) as outgroup. Nodes lacking statistical support are marked with a red dot (only 5/112 nodes unsupported). Note the statistically supported grouping of cetaceans with artiodactyls, as well as the monophyletic grouping of all species into their respective Orders and of all Orders into their respective Superorders. Taxonomy per McKenna and Bell (1997).


[Figure 2.3: Bayesian Tecta gene tree. Tips are labeled with species common names and bracketed as Xenarthra, Marsupialia, Afrotheria, Scandentia, Lagomorpha, Rodentia, Dermoptera, and Primates (Euarchontoglires); Perissodactyla, Chiroptera, Pholidota, Carnivora, Eulipotyphla, and Cetartiodactyla (Laurasiatheria); and Monotremata. Scale bar, 0.06; outgroup, Chinese soft-shelled turtle.]

Fig. 2.3. Phylogenetic analysis of Tecta DNA sequence divergence. Shown is the gene tree constructed by Bayesian analysis of 115 aligned Tecta sequences, with Chinese soft-shelled turtle (Pelodiscus sinensis) as outgroup. Nodes lacking statistical support are marked with a red dot (27/115 nodes unsupported). Note the discrepant grouping of the Marsupialia clade with the Xenarthran and Afrotherian clades of placental mammals, rather than deeper in the phylogeny, basal to all Eutheria. Taxonomy per McKenna and Bell (1997).


[Figure 2.4: Bayesian Cytb gene tree. Tips are labeled with species common names and bracketed by Order or Superorder (Monotremata, Rodentia, Euarchontoglires, Afrotheria, Scandentia, Lagomorpha, Dermoptera, Xenarthra, Eulipotyphla, Chiroptera, Carnivora, Laurasiatheria, Perissodactyla, Cetartiodactyla, Primates, Marsupialia); scale bar, 0.3; outgroup, Chinese soft-shelled turtle.]

Fig. 2.4. Phylogenetic analysis of Cytb DNA sequence divergence. Shown is the gene tree constructed by Bayesian analysis of 119 aligned Cytb sequences, with Chinese soft-shelled turtle (Pelodiscus sinensis) as outgroup. Nodes lacking statistical support are marked with a red dot (47/119 nodes unsupported). Note the many discrepant groupings, including lack of monophyly in Euarchontoglires and Afrotheria, grouping of Xenarthra with Laurasiatheria rather than branching basal to all other Eutherian Superorders, and grouping of Monotremata with Eutheria rather than basal to Marsupialia. Taxonomy per McKenna and Bell (1997).


[Figure 2.5: Bayesian tree of aligned zonadhesin sequences. Tips are labeled with abbreviated species common names, from Chinese soft-shelled turtle ZanL (outgroup) to Ryukyu mouse; scale bar, 0.1.]

Fig. 2.5. Bayesian tree of aligned Zan protein sequences. Shown is the protein tree constructed by Bayesian analysis of 106 aligned zonadhesin sequences (GTR+I+G nucleotide substitution model and 10,000,000 generations), with Chinese soft-shelled turtle (Pelodiscus sinensis) as outgroup. Red dots mark the 12 of 106 nodes that lacked statistical support (p<0.95).


[Figure 2.6: Upper panels D1, D2, and D4 show aligned sequences around the Asp-Pro processing sites (consensus, human, and rodent species from Alpine marmot to Damara mole rat; asterisks mark positively selected sites). Lower panels show western blots of sperm proteins from Mu, Rt, Ha, GP, Rb, Dg, Eq, Po, and Hu probed with anti-D3 and anti-D3p18, with size markers at 250, 150, 100, 75, 50, 37, and 25 (Mr × 10⁻³).]

Fig. 2.6. Species diversity of Zan amino acid sequence and processing. Upper panels: Taxon-specific variation in proteolytic processing sites of the zonadhesin precursor. Asp-Pro bonds ("DP", downward arrows) cleaved during the proteolytic processing of the pig zonadhesin precursor (42) are widely conserved among mammalian taxa in the D2 and D4 VWD domains (Panels D2, D4). The D1 domain site, however, is not conserved in myomorph rodents (Panel D1). Numbers denote amino acid positions of the consensus sequence downstream of the alignment's start at the D0 domain. Note the P→L/V substitution, driven by positive selection (sites denoted by asterisks), in all 11 myomorph species, but the conservation of P in the others. Note also the downstream amino acid substitutions in D1 of the rodent species, also driven by positive selection, reflecting the accelerated divergence of Zan in these animals. Lower panels: Species heterogeneity of zonadhesin D3 and D3p domain polypeptides. Shown are western blots of resolved sperm proteins from nine species, representing Orders Rodentia, Lagomorpha, Carnivora, Perissodactyla, Cetartiodactyla, and Primates. Blots were probed as indicated with affinity-purified antibody to the mouse zonadhesin D3 VWD domain or to the D3p18 partial VWD domain. Migration of size markers (Mr × 10⁻³) is indicated on the left. Species abbreviations are: Mu, mouse; Rt, Norway rat; Ha, Syrian hamster; GP, guinea pig; Rb, rabbit; Dg, dog; Eq, horse; Po, pig; Hu, human. With anti-muD3, note the detection of the well-characterized Mr 105,000 zonadhesin D3 polypeptides of porcine and equine spermatozoa. Note also the detection of similarly sized D3 polypeptides in human, dog, and rabbit spermatozoa, and of multiple larger polypeptides in rodent spermatozoa. With anti-muD3p18, note that high-Mr D3p18 immunoreactivity (>160,000 Mr) is detected only in the three myomorph rodent species, including the well-characterized Mr 300,000 polypeptide of mouse and hamster spermatozoa.
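The caption above treats Asp-Pro bonds as processing sites and notes that a P→L/V substitution removes such a site in myomorph rodents. The sketch below shows one way to tabulate which taxa retain a D-P dipeptide at each alignment column. It assumes that a site is an Asp whose next non-gap residue in the same sequence is Pro. The function names and the toy alignment are hypothetical. A D-P dipeptide in sequence does not by itself show that the bond is cleaved in vivo.

```python
def asp_pro_columns(aligned: str, gap: str = "-") -> list[int]:
    """Return 1-based alignment columns of Asp (D) residues whose next
    non-gap residue in the same sequence is Pro (P)."""
    residues = [(col, aa) for col, aa in enumerate(aligned.upper(), start=1) if aa != gap]
    return [
        col
        for (col, aa), (_, nxt) in zip(residues, residues[1:])
        if aa == "D" and nxt == "P"
    ]


def dp_site_carriers(alignment: dict[str, str]) -> dict[int, list[str]]:
    """Map each D-P column found in any sequence to the taxa that carry it."""
    carriers: dict[int, list[str]] = {}
    for taxon, seq in alignment.items():
        for col in asp_pro_columns(seq):
            carriers.setdefault(col, []).append(taxon)
    return carriers


# Hypothetical toy alignment: taxon_b carries a P->L substitution at the site
toy = {"taxon_a": "MKDPLL", "taxon_b": "MKDLLL", "taxon_c": "MKD-PL"}
print(dp_site_carriers(toy))
```

Columns carried by some taxa but not others flag candidate lineage-specific losses of a processing site, like the D1 site described for myomorph rodents.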


[Figure 2.7: Zan divergence (y-axis, 0.1-0.5) plotted against species rank (x-axis, 0-120), with species keyed by Order: Dermoptera, Perissodactyla, Scandentia, Chiroptera, Pholidota, Primates, Lagomorpha, Cetartiodactyla, Carnivora, Rodentia (non-Myomorpha), Hyracoidea, Rodentia (Myomorpha), Proboscidea, Eulipotyphla, Sirenia, Tubulidentata.]

Fig. 2.7. Rank-rate plot of Zan divergence rates among 112 species of placental mammals. Note the inflection between the most rapidly diverging Cetartiodactyla species and the species comprising Eulipotyphla and all but one of Rodentia.
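A rank-rate plot orders species by a per-species rate and plots each rate against its rank. The sketch below builds that ordering. It assumes ranks run from fastest to slowest divergence, because the plot's direction cannot be recovered from the extracted figure. It also adds a crude helper that reports where the largest drop between consecutive ranked values occurs, as one simple way to locate a break such as the inflection noted in the caption. That helper is a heuristic introduced here, not the method used for Fig. 2.7, and the species names in the example are hypothetical.

```python
def rank_rate(divergence: dict[str, float]) -> list[tuple[int, str, float]]:
    """Order species from fastest to slowest divergence and assign ranks 1..n."""
    ordered = sorted(divergence.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, value) for rank, (name, value) in enumerate(ordered, start=1)]


def largest_drop(ranked: list[tuple[int, str, float]]) -> int:
    """Return the rank after which the biggest fall between consecutive values occurs.

    Crude breakpoint heuristic; it assumes `ranked` comes from rank_rate()
    and holds at least two entries.
    """
    if len(ranked) < 2:
        raise ValueError("need at least two ranked species")
    drops = [(ranked[i][2] - ranked[i + 1][2], ranked[i][0]) for i in range(len(ranked) - 1)]
    return max(drops)[1]


# Hypothetical divergence values, not data from this study
ranked = rank_rate({"sp_a": 0.10, "sp_b": 0.50, "sp_c": 0.45})
print(ranked, largest_drop(ranked))
```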


[Figure 2.8: normalized divergence rate (y-axis, 1-4) of Cytb, Tecta, and Zan for each species, grouped by taxon.]

Fig. 2.8. Variation of Zan DNA sequence divergence rate among species. Shown are the divergence rates of Cytb, Tecta, and Zan from each species (filled black circles), calculated from branch lengths back to the originating node of the species’ respective Superorders, and normalized to the rate of the species with the slowest divergence for each gene, i.e. normalized rate = 1.0 for Cytb in Norway rat (Rattus norvegicus), for Tecta in donkey (Equus asinus), and for Zan in Florida manatee (Trichechus manatus). Data are grouped by taxa (Superorder Afrotheria and eight individual Orders) represented by at least three species. Note the low average Zan divergence rate in Afrotheria, which comprises six Orders with only 89 extant species, and the significantly higher rate in Eulipotyphla and Rodentia, which together comprise 3079 (>48%) of 6399 currently recognized, extant Eutherian species.
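The caption describes two steps. First, a per-species divergence is calculated from branch lengths back to the originating node of the species' Superorder. Second, each gene's values are divided by those of its slowest species. The sketch below illustrates both steps. It assumes the tree is stored as a hypothetical child-to-(parent, branch length) map and treats the summed tip-to-node path length as the quantity being normalized. Any further scaling used in the study is not reproduced here.

```python
def path_length(tree: dict[str, tuple[str, float]], tip: str, ancestor: str) -> float:
    """Sum branch lengths from `tip` up to (not past) `ancestor`.

    `tree` maps each node to (parent, length of the branch to that parent).
    """
    total = 0.0
    node = tip
    while node != ancestor:
        if node not in tree:
            raise ValueError(f"{ancestor!r} is not an ancestor of {tip!r}")
        parent, length = tree[node]
        total += length
        node = parent
    return total


def normalize_to_slowest(rates: dict[str, float]) -> dict[str, float]:
    """Divide every rate by the smallest, so the slowest species scores 1.0."""
    slowest = min(rates.values())
    if slowest <= 0:
        raise ValueError("rates must be positive to normalize")
    return {species: rate / slowest for species, rate in rates.items()}


# Hypothetical tree: two species joined at n1, which descends from a Superorder node "root"
tree = {"sp1": ("n1", 0.1), "sp2": ("n1", 0.3), "n1": ("root", 0.2)}
rates = {sp: path_length(tree, sp, "root") for sp in ("sp1", "sp2")}
print(normalize_to_slowest(rates))
```

Because normalization divides each gene by its own slowest species, the resulting values compare rate variation within a gene, not absolute rates across Cytb, Tecta, and Zan.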

[Figure 2.9: Zan ontogeny. Cladogram of croaker, frog, platypus, opossum, human, crocodile, chicken, turtle, and coelacanth, with clades labeled stem vertebrate, ray-finned fish, lobe-finned fish, tetrapods, amphibians, amniotes, synapsids, and sauropsids; branches mark Zan, ZanL, and gene loss.]

Fig. 2.9. Zan ontogeny. In this proposed ontogeny, Zan and ZanL evolve from an ancient Zan-like gene in stem vertebrates that shares partial synteny with current Zan and ZanL loci. The ZanL descendant persists in reptiles and in ray- and lobe-finned fish (black branches) but is lost in amphibians, birds, and marsupials (dotted branches). ZanL persists in Prototheria (monotremes) after divergence of the therian crown group, but it is lost in Metatheria (marsupials) after their divergence from Eutheria. In Eutheria, authentic Zan evolves instead (red branch) as a consequence of its neofunctionalization to a sperm-egg recognition molecule.


Table 2.1. Blastp summary results for Zan sequence mining in non-mammals

Species | Common name | NCBI description | E value | Query cover | Bitscore | Seq ID | Accession No.
Chrysemys picta bellii | Painted turtle | zonadhesin-like | 0.00 | 99% | 3512 | 46% | XP_023969614.1
Pelodiscus sinensis | Chinese soft-shelled turtle | zonadhesin | 0.00 | 99% | 3186 | 45% | XP_025045216.1
Latimeria chalumnae | Coelacanth | PREDICTED: zonadhesin | 0.00 | 99% | 3564 | 44% | XP_014346149.1
Larimichthys crocea | Yellow croaker | zonadhesin | 0.00 | 99% | 2961 | 42% | KKF20408.1
Danio rerio | Zebrafish | zonadhesin isoforms x2, x1 | 0.00 | 99% | 1249 | 42% | XP_021333481.1, XP_001921732.1
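Table 2.1 reports, for each non-mammalian candidate, the E value, query cover, bitscore, and sequence identity of the best blastp hit. The sketch below shows one way such hits might be screened and ranked. The E-value and coverage cutoffs are hypothetical defaults, not thresholds stated in this study. The hits are assumed to have been parsed already into a small record type defined here for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    species: str
    accession: str
    evalue: float
    query_cover: float  # percent of the query covered by the alignment
    bitscore: float
    identity: float  # percent identity


def screen_hits(hits: list[BlastHit], max_evalue: float = 1e-10,
                min_cover: float = 90.0) -> list[BlastHit]:
    """Keep hits passing the (hypothetical) E-value and coverage cutoffs,
    ordered from highest to lowest bitscore."""
    kept = [h for h in hits if h.evalue <= max_evalue and h.query_cover >= min_cover]
    return sorted(kept, key=lambda h: h.bitscore, reverse=True)


hits = [
    BlastHit("Chrysemys picta bellii", "XP_023969614.1", 0.0, 99.0, 3512, 46.0),
    BlastHit("Latimeria chalumnae", "XP_014346149.1", 0.0, 99.0, 3564, 44.0),
    BlastHit("hypothetical low-coverage hit", "none", 1e-5, 40.0, 150, 30.0),
]
for h in screen_hits(hits):
    print(h.species, h.bitscore)
```

Ranking by bitscore, not identity, follows from the values in Table 2.1, where the coelacanth hit has the highest bitscore but not the highest identity.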


Table 2.2. Global and ordinal nucleotide substitution models for each gene. GTR = general time reversible; +G = gamma-distributed rate heterogeneity among sites; +I+G = proportion of invariable sites plus gamma-distributed rate heterogeneity.

Dataset | Zan | Tecta | Cytb
Global | GTR+I+G | GTR+I+G | GTR+I+G
Afrotheria | GTR+G | GTR+I+G | GTR+I+G
Carnivora | GTR+G | GTR+I+G | GTR+I+G
Cetartiodactyla | GTR+G | GTR+I+G | GTR+I+G
Chiroptera | GTR+G | GTR+G | GTR+I+G
Primates | GTR+G | GTR+I+G | GTR+I+G
Rodentia | GTR+I+G | GTR+I+G | GTR+I+G


Table 2.3. Global phylogenetic comparisons by Shimodaira-Hasegawa (SH) and Approximately Unbiased (AU) tests for each gene. These tests compared the best unconstrained phylogeny (gene tree), inclusive of all Orders, to gene trees constrained to an established phylogeny ["ST" (Bininda-Emonds et al. 2007)]. Decreased LH for the constrained trees thus reflects differences between the gene trees and the ST phylogeny. Columns are: LH, log-likelihood score; Δ(LH), difference in the likelihood scores between constrained and unconstrained phylogenies; SH, p-value for SH test; AU, p-value for AU test. When the PAUP software yielded multiple constrained trees, p-values were adjusted accordingly using the Bonferroni correction. Asterisks denote significant p-values.

Gene | Tree | LH | Δ(LH) | SH | AU
Zan | Unconstrained (best) | -167535.10 | | |
Zan | Constrained #1 | -167545.20 | 10.09 | 0.15 | 0.06
Zan | Constrained #2 | -167547.48 | 12.38 | 0.14 | 0.02*
Zan | Constrained #3 | -167793.42 | 258.32 | <0.0001* | <0.0001*
Tecta | Unconstrained (best) | -107522.79 | | |
Tecta | Constrained #1 | -107526.64 | 3.84 | 0.18 | 0.07
Tecta | Constrained #2 | -107527.75 | 4.96 | 0.16 | 0.07
Tecta | Constrained #3 | -107531.60 | 8.81 | 0.11 | 0.02*
Tecta | Constrained #4 | -111809.97 | 4287.17 | <0.0001* | <0.0001*
Cytb | Unconstrained (best) | -54914.43 | | |
Cytb | Constrained #1 | -55018.56 | 104.13 | 0.005* | 0.016*
Cytb | Constrained #2 | -55021.65 | 107.21 | 0.007* | 0.004*
Cytb | Constrained #3 | -55022.27 | 107.83 | 0.005* | 0.001*
Cytb | Constrained #4 | -55023.13 | 108.69 | 0.005* | 0.007*
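The caption above states that p-values were Bonferroni-adjusted when PAUP returned multiple constrained trees. For reference, the sketch below implements the standard Bonferroni adjustment: each p-value is multiplied by the number of comparisons and capped at 1.0. It is not PAUP's own code. The significance level of 0.05 is an assumption, and the example p-values are hypothetical, not values recomputed from Table 2.3.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Standard Bonferroni adjustment: multiply by the number of tests, cap at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag which tests remain significant after Bonferroni adjustment."""
    return [p < alpha for p in bonferroni(p_values)]


# Hypothetical raw p-values for three constrained trees of one gene
print(bonferroni([0.25, 0.5, 0.125]))
print(significant([0.01, 0.04]))
```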


Table 2.4. Ordinal phylogenetic comparisons by Shimodaira-Hasegawa (SH) and Approximately Unbiased (AU) tests for each gene. These tests compared the best unconstrained phylogenies (gene trees) for individual Orders to the corresponding ordinal phylogenies of the ST tree (Bininda-Emonds et al. 2007). The software-determined best substitution model was GTR+I+G for all Cytb phylogenies, for all but the Chiroptera Tecta phylogeny, and for the Rodentia Zan phylogeny. GTR+G was the best model for the Chiroptera Tecta phylogeny and for the Afrotheria, Carnivora, Cetartiodactyla, Chiroptera, and Primates Zan phylogenies. Columns are defined as in Table 2.3.

Gene | Order | Tree | LH | Δ(LH) | SH | AU
Zan | Afrotheria | Unconstrained (best) | -17882.75 | | |
Zan | Afrotheria | Constrained #1 | -17913.44 | 30.69 | 0.0021* | <0.0001*
Zan | Carnivora | Unconstrained (best) | -21418.57 | | |
Zan | Carnivora | Constrained #1 | -21437.15 | 18.58 | 0.0178* | 0.0053*
Zan | Cetartiodactyla | Unconstrained (best) | -21736.82 | | |
Zan | Cetartiodactyla | Constrained #1 | -21736.82 | 0.00 | 1.00 | 0.5
Zan | Chiroptera | Unconstrained (best) | -19384.05 | | |
Zan | Chiroptera | Constrained #1 | -19470.35 | 86.29 | <0.0001* | <0.0001*
Zan | Primates | Unconstrained (best) | -31529.82 | | |
Zan | Primates | Constrained #1 | -31536.62 | 6.8 | 0.23 | 0.2
Zan | Rodentia | Unconstrained (best) | -46874.37 | | |
Zan | Rodentia | Constrained #1 | -46930.17 | 55.79 | <0.0001* | <0.0001*
Tecta | Afrotheria | Unconstrained (best) | -162009.51 | | |
Tecta | Afrotheria | Constrained #1 | -16201.04 | 0.52 | 0.19 | 0.20
Tecta | Afrotheria | Constrained #2 | -16222.22 | 21.70 | 0.0092* | 0.0058*
Tecta | Carnivora | Unconstrained (best) | -15797.76 | | |
Tecta | Carnivora | Constrained #1 | -15805.01 | 7.24 | 0.06 | 0.033*
Tecta | Cetartiodactyla | Unconstrained (best) | -17170.29 | | |
Tecta | Cetartiodactyla | Constrained #1 | -17170.29 | 0.00 | 1.00 | 0.50
Tecta | Chiroptera | Unconstrained (best) | -18704.75 | | |
Tecta | Chiroptera | Constrained #1 | -18704.75 | 0.00 | 1.00 | 0.50
Tecta | Primates | Unconstrained (best) | -21291.28 | | |
Tecta | Primates | Constrained #1 | -21293.08 | 1.80 | 0.20 | 0.26
Tecta | Primates | Constrained #2 | -21295.62 | 4.34 | 0.08 | 0.05
Tecta | Rodentia | Unconstrained (best) | -27073.70 | | |
Tecta | Rodentia | Constrained #1 | -27079.35 | 5.65 | 0.25 | 0.24
Cytb | Afrotheria | Unconstrained (best) | -5699.84 | | |
Cytb | Afrotheria | Constrained #1 | -5706.19 | 6.34 | 0.13 | 0.13
Cytb | Carnivora | Unconstrained (best) | -7308.91 | | |
Cytb | Carnivora | Constrained #1 | -7309.21 | 0.30 | 0.24 | 0.24
Cytb | Carnivora | Constrained #2 | -7310.53 | 1.61 | 0.10 | 0.09


Table 2.4 Continued

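Cytb | Cetartiodactyla | Unconstrained (best) | -8580.81 | | |
Cytb | Cetartiodactyla | Constrained #1 | -8585.75 | 4.93 | 0.18 | 0.17
Cytb | Chiroptera | Unconstrained (best) | -6906.36 | | |
Cytb | Chiroptera | Constrained #1 | -6906.77 | 0.40 | 0.44 | 0.45
Cytb | Primates | Unconstrained (best) | -13314.47 | | |
Cytb | Primates | Constrained #1 | -13575.30 | 260.82 | <0.0001* | <0.0001*
Cytb | Rodentia | Unconstrained (best) | -11741.46 | | |
Cytb | Rodentia | Constrained #1 | -11742.85 | 1.39 | 0.46 | 0.47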


Table 2.5. PAML summary results. Shown are log-likelihood scores (LH), corrected Akaike Information Criterion values (AICc), and parameter estimates for variable dN/dS (ω) models under the F3×4 codon frequency model.

Gene | Model | Parameter estimates | LH | AICc
Zan | M0: null | ω0 = 1.179 | -165052.47 | 329236.03
Zan | M1: neutral | ω0 = 0.811, f0 = 0.029; ω1 = 1.000, f1 = 0.971 | -164386.94 | 428487.49
Zan | M2: selection | ω0 = 0.515, f0 = 0.247; ω1 = 1.000, f1 = 0.667; ω2 = 8.749, f2 = 0.086 | -162998.98 | 326486.44
Tecta | M0: null | ω0 = 1.313 | -79687.25 | 158687.78
Tecta | M1: neutral | ω0 = 0.700, f0 = 0.021; ω1 = 1.000, f1 = 0.979 | -78708.47 | 157947.22
Tecta | M2: selection | ω0 = 0.462, f0 = 0.367; ω1 = 1.000, f1 = 0.279; ω2 = 1.707, f2 = 0.354 | -78701.72 | 157902.20
Cytb | M0: null | ω0 = 0.039 | -55069.08 | 129343.18
Cytb | M1: neutral | ω0 = 0.035, f0 = 0.936; ω1 = 1.000, f1 = 0.064 | -54242.80 | 110123.90
Cytb | M2: selection | ω0 = 0.035, f0 = 0.936; ω1 = 1.000, f1 = 0.022; ω2 = 1.000, f2 = 0.041 | -54242.78 | 109000.73
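Nested codon models such as the neutral (M1) and selection (M2) models above are commonly compared with a likelihood-ratio test. The test statistic is 2ΔlnL = 2(lnL_M2 − lnL_M1), compared against a χ² distribution with 2 degrees of freedom. The table does not state which test was performed, so the sketch below assumes that standard M1 vs. M2 comparison. It reads the LH column as log-likelihoods, as the caption describes. For 2 degrees of freedom the χ² upper-tail probability is exactly exp(−x/2), so no statistics library is needed.

```python
import math


def lrt_m1_vs_m2(lnl_m1: float, lnl_m2: float) -> tuple[float, float]:
    """Likelihood-ratio test of M1 (neutral) against M2 (selection), assuming 2 df.

    Returns (2*delta lnL, p-value). For a chi-square with 2 df the upper-tail
    probability is exactly exp(-x / 2).
    """
    # clamp at zero to absorb tiny negative values from optimizer noise
    stat = max(0.0, 2.0 * (lnl_m2 - lnl_m1))
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Log-likelihoods as listed in Table 2.5
for gene, m1, m2 in [
    ("Zan", -164386.94, -162998.98),
    ("Tecta", -78708.47, -78701.72),
    ("Cytb", -54242.80, -54242.78),
]:
    stat, p = lrt_m1_vs_m2(m1, m2)
    print(f"{gene}: 2dlnL = {stat:.2f}, p = {p:.3g}")
```

For Zan the statistic is so large that the p-value underflows to 0.0 in double precision. Such a result is better reported as an inequality than as an exact zero.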


Table 2.6. Sites with Bayes Empirical Bayes posterior probability P(ω > 1) ≥ 0.95 for each gene. Sites under positive selection (ω > 1) with ≥0.95 Bayes Empirical Bayes support are listed as site number, amino acid, and posterior probability. Boldface denotes the Zan positively selected sites shown in Fig. 2.6.

Zan 17 E 1.00 92 T 1.00 117 P 1.00 124 G 1.00 126 H 1.00 128 G 1.00 130 M 1.00 132 K 1.00 273 N 1.00 336 M 1.00 724 Y 1.00 734 L 1.00 735 T 1.00 825 D 1.00 879 A 1.00 970 T 1.00 1065 D 1.00 1164 A 1.00 1185 Q 1.00 1288 H 1.00 1397 D 1.00 1468 G 1.00 1518 Q 1.00 112 L 0.999 298 A 0.999 647 R 0.999 152 T 0.999 153 A 0.999 231 Y 0.999 269 E 0.999 289 Q 0.995 337 F 0.999 493 A 0.999 686 N 0.999 899 S 0.999 1004 N 0.999 1256 R 0.999 1282 S 0.999 1296 S 0.999 91 H 0.998 125 R 0.998 192 R 0.998 407 Q 0.998 494 A 0.998 568 I 0.998 700 M 0.998 717 S 0.998 754 P 0.998 875 L 0.998 932 K 0.998 968 Q 0.998 1177 P 0.998 1226 G 0.998 1270 N 0.998 1283 V 0.998 1295 L 0.998 1357 E 0.998 1478 S 0.998 1519 E 0.998 1522 L 0.998 108 T 0.997 321 A 0.997 1442 T 0.996 24 G 0.995 139 Q 0.995 235 P 0.995 703 Q 0.995 753 A 0.995 824 T 0.995 1192 P 0.995 1423 K 0.995 76 W 0.992 687 S 0.992 316 R 0.990 400 G 0.990 438 P 0.990 498 V 0.990 791 D 0.990 871 L 0.990 1060 N 0.990 1223 A 0.990 1267 D 0.990 1269 G 0.990 1333 W 0.990 272 G 0.989 441 G 0.989 666 P 0.989 845 R 0.989 1165 H 0.989 1196 A 0.989 1280 Q 0.989 1408 S 0.989 105 P 0.988 463 N 0.988 300 Q 0.987 314 D 0.987 392 P 0.987 1069 Q 0.987 1284 Y 0.987 1510 T 0.987 878 R 0.986 887 I 0.986 26 L 0.985 278 A 0.985 283 Q 0.985 85 I 0.984 167 S 0.984 409 R 0.984 892 H 0.984 1451 T 0.984 1409 M 0.983 642 L 0.982 986 G 0.982 1271 S 0.982 1476 F 0.981 301 N 0.980 1387 I 0.980 1474 L 0.980 232 V 0.979 1266 S 0.979 768 P 0.975 284 E 0.974 684 Q 0.974 1490 S 0.974 104 H 0.973 826 P 0.973 668 S 0.972 749 Q 0.972 832 P 0.972 1018 T 0.972 1464 E 0.972 646 L 0.970 1091 Q 0.970 355 A 0.969 880 S 0.969 1322 V 0.969 1532 K 0.969 59 K 0.969 436 L 0.968 794 A 0.968 827 A 0.967 1477 H 0.967 86 S 0.966 385 R 0.966 784 L 0.966 795 A 0.966 953 L 0.966 961 R 0.966 1268 N 0.966 1472 E 0.966 107 G 0.965 136 I 0.965 198 I 0.965 258 L 0.965 281 E 0.965 286 Q 0.965 311 R 0.965 354 S 0.965 467 R 0.965 525 P 0.965 653 A 0.965 665 L 0.965 679 K 0.965 698 I 0.965 774 P 
0.965 886 R 0.965 1265 S 0.965 1272 N 0.965 1421 Y 0.965 1453 D 0.965 52 F 0.965 62 A 0.964 233 T 0.964 297 S 0.964 303 M 0.964 437 D 0.964 733 T 0.964 790 R 0.964 1172 L 0.964 1475 G 0.964 527 W 0.963 1367 I 0.963 1411 Q 0.963 87 Q 0.962 1400 K 0.962 720 A 0.961 1182 L 0.959 1290 R 0.959 330 S 0.958 1399 R 0.958 904 R 0.957 299 L 0.956 1109 D 0.956 478 V 0.955 654 G 0.955 282 D 0.954 901 I 0.954 1471 E 0.954 315 T 0.953 894 V 0.953 1395 S 0.953 1456 L 0.952 119 Y 0.951 308 F 0.951 497 T 0.951 531 Q 0.951 97 K 0.950 404 M 0.950 574 M 0.950 711 Q 0.950 1188 G 0.950 1249 A 0.950 Tecta 140 R 1.00 143 Q 1.00 148 R 1.00 150 R 1.00 161 L 1.00 163 Q 1.00 164 Q 1.00 165 A 1.00 166 G 1.00 167 A 1.00 170 L 1.00 175 P 1.00 189 Q 1.00 255 A 1.00 314 G 1.00 316 L 1.00 317 C 1.00 324 R 1.00 361 R 1.00 364 L 1.00 369 P 1.00 387 P 1.00 390 S 1.00 403 Q 1.00 407 W 1.00 420 P 1.00 426 Q 1.00 434 V 1.00 438 W 1.00 445 P 1.00 453 P 1.00 455 A 1.00 472 V 1.00 498 L 1.00 507 Q 1.00 518 T 1.00 519 R 1.00 528 R 1.00 530 R 1.00 532 Q 1.00 534 R 1.00 536 P 1.00 539 P 1.00 540 L 1.00 542 P 1.00 590 A 1.00 593 G 1.00 594 Q 1.00 601 P 1.00 604 K 1.00 605 E 1.00 609 R 1.00 635 S 1.00 637 G 1.00


Table 2.6 Continued

654 L 1.00 656 G 1.00 665 P 1.00 669 P 1.00 676 V 1.00 699 P 1.00 707 R 1.00 710 R 1.00 715 V 1.00 716 S 1.00 729 P 1.00 734 P 1.00 735 A 1.00 738 A 1.00 743 G 1.00 746 A 1.00 757 R 1.00 760 R 1.00 766 P 1.00 767 P 1.00 768 S 1.00 774 G 1.00 786 Q 1.00 797 V 1.00 812 A 1.00 814 R 1.00 828 G 1.00 832 R 1.00 837 R 1.00 838 L 1.00 839 V 1.00 853 V 1.00 857 L 1.00 859 G 1.00 877 T 1.00 897 T 1.00 911 R 1.00 918 P 1.00 951 C 1.00 962 R 1.00 983 P 1.00 989 C 1.00 1004 V 1.00 1064 L 1.00 1074 A 1.00 1081 A 1.00 1083 A 1.00 1101 R 1.00 1103 R 1.00 1113 G 1.00 1117 G 1.00 1122 R 1.00 1129 G 1.00 1141 A 1.00 1154 V 1.00 1158 R 1.00 1161 P 1.00 1162 G 1.00 1164 F 1.00 1175 H 1.00 1180 P 1.00 1184 L 1.00 1188 A 1.00 1290 I 1.00 1296 S 1.00 1297 T 1.00 1301 A 1.00 1308 G 1.00 1321 W 1.00 1327 R 1.00 1346 C 1.00 1356 C 1.00 1369 Y 1.00 1396 L 1.00 1407 P 1.00 1408 T 1.00 1420 S 1.00 1425 Q 1.00 1431 S 1.00 171 D 0.999 172 V 0.999 193 P 0.999 194 Q 0.999 338 G 0.999 392 G 0.999 410 P 0.999 412 T 0.999 423 N 0.999 448 A 0.999 505 P 0.999 531 L 0.999 535 C 0.999 549 K 0.999 563 V 0.999 565 A 0.999 568 T 0.999 581 P 0.999 620 S 0.999 626 W 0.999 634 P 0.999 636 T 0.999 655 T 0.999 694 R 0.999 790 P 0.999 796 S 0.999 801 T 0.999 821 S 0.999 833 S 0.999 835 A 0.999 843 P 0.999 852 S 0.999 860 T 0.999 906 P 0.999 987 S 0.999 994 A 0.999 1092 S 0.999 1132 R 0.999 1136 L 0.999 1155 A 0.999 1167 G 0.999 1183 P 0.999 1205 S 0.999 1287 W 0.999 1303 H 0.999 1304 R 0.999 1332 R 0.999 1334 H 0.999 1344 K 0.999 1370 C 0.999 1372 I 0.999 1384 N 0.999 1398 S 0.999 1400 H 0.999 1406 V 0.999 1410 V 0.999 1416 V 0.999 1423 L 0.999 1428 A 0.999 154 I 0.998 168 I 0.998 253 S 0.998 359 P 0.998 370 L 0.998 389 A 0.998 416 E 0.998 433 S 0.998 470 T 0.998 492 T 0.998 551 T 0.998 575 E 0.998 589 G 0.998 596 S 0.998 612 T 0.998 621 G 0.998 629 T 0.998 667 A 0.998 695 F 0.998 726 S 0.998 773 T 0.998 813 T 0.998 842 A 0.998 861 S 0.998 896 N 0.998 954 R 0.998 1001 P 0.998 1007 P 0.998 1059 P 0.998 1090 P 
0.998 1095 R 0.998 1098 P 0.998 1185 C 0.998 1419 A 0.998 1422 H 0.998 352 R 0.997 360 R 0.997 388 V 0.997 559 P 0.997 597 A 0.997 732 T 0.997 745 R 0.997 751 S 0.997 776 A 0.997 827 A 0.997 856 V 0.997 912 T 0.997 925 S 0.997 929 R 0.997 949 S 0.997 1070 C 0.997 1181 S 0.997 1317 A 0.997 147 P 0.996 236 Y 0.996 260 V 0.996 371 T 0.996 440 L 0.996 457 C 0.996 476 Y 0.996 478 V 0.996 479 G 0.996 483 P 0.996 499 F 0.996 638 W 0.996 644 Q 0.996 661 A 0.996 753 T 0.996 822 S 0.996 851 A 0.996 908 S 0.996 939 F 0.996 976 S 0.996 1190 H 0.996 1326 R 0.996 1411 L 0.996 514 Q 0.995 615 G 0.995 709 A 0.995 755 S 0.995 810 A 0.995 920 T 0.995 967 N 0.995 1107 E 0.995 1142 P 0.995 1291 V 0.995 1324 R 0.995 1333 L 0.995 1413 G 0.995 1429 T 0.995 142 F 0.994 185 I 0.994 413 G 0.994 515 S 0.994 558 S 0.994 607 C 0.994 651 K 0.994 747 T 0.994 846 T 0.994 902 T 0.994 946 R 0.994 973 I 0.994 1116 R 0.994 1134 R 0.994 1378 Y 0.994 183 P 0.993 191 P 0.993 386 P 0.993 486 A 0.993 641 C 0.993 847 S 0.993 893 R 0.993 900 R 0.993 909 P 0.993 913 P 0.993 998 T 0.993 1331 R 0.993 1368 N 0.993 1382 S 0.993 1388 V 0.993 322 I 0.992 335 G 0.992 383 P 0.992 391 V 0.992 454 T 0.992 557 A 0.992 562 S 0.992 649 P 0.992 664 N 0.992 713 R 0.992 763 R 0.992 895 T 0.992 921 A 0.992 1091 G 0.992 1126 P 0.992 1182 H 0.992 1433 L 0.992 162 E 0.991 343 C 0.991 372 D 0.991 409 C 0.991 525 A 0.991 569 A 0.991 650 C 0.991 663 R 0.991 725 A 0.991 761 F 0.991 784 T 0.991 872 A 0.991 1088 A 0.991 1123 C 0.991 1315 L 0.991 249 G 0.990 401 P 0.990 671 D 0.990 698 P 0.990 711 R 0.990 724 Y 0.990 818 A 0.990 881 P 0.990 919 A 0.990 1347 A 0.990 1373 L 0.990 1377 A 0.990 1418 G 0.990 156 Q 0.989 180 V 0.989 333 P 0.989


Table 2.6 Continued

366 L 0.989 500 Q 0.989 681 E 0.989 720 A 0.989 741 T 0.989 887 A 0.989 1008 A 0.989 1114 G 0.989 1362 H 0.989 1401 P 0.989 248 D 0.988 404 T 0.988 406 G 0.988 431 A 0.988 546 S 0.988 632 C 0.988 717 S 0.988 934 A 0.988 991 C 0.988 1186 H 0.988 1337 P 0.988 1340 G 0.988 1360 C 0.988 188 F 0.987 552 S 0.987 614 I 0.987 1313 F 0.987 1350 P 0.987 1435 Y 0.987 158 P 0.986 520 S 0.986 570 G 0.986 737 C 0.986 789 S 0.986 871 T 0.986 1192 L 0.986 381 R 0.985 466 R 0.985 524 A 0.985 600 P 0.985 957 T 0.985 1075 T 0.985 1295 T 0.985 1310 D 0.985 1367 Q 0.985 1421 T 0.985 160 P 0.984 318 L 0.984 337 L 0.984 484 V 0.984 702 K 0.984 910 A 0.984 927 G 0.984 960 G 0.984 1137 H 0.984 1316 V 0.984 1414 V 0.984 449 A 0.983 481 T 0.983 521 A 0.983 693 C 0.983 1078 A 0.983 1412 Y 0.983 151 C 0.982 587 G 0.982 646 I 0.982 874 S 0.982 1204 S 0.982 1389 P 0.982 173 E 0.981 485 S 0.981 512 S 0.981 647 P 0.981 731 S 0.981 809 T 0.981 980 T 0.981 1309 T 0.981 1319 S 0.981 190 V 0.980 378 S 0.980 673 G 0.980 775 S 0.980 924 P 0.980 1002 G 0.980 1111 P 0.980 1112 G 0.980 1323 E 0.980 357 A 0.979 508 G 0.979 691 T 0.979 704 T 0.979 1187 W 0.979 1307 G 0.979 1387 S 0.979 1403 R 0.979 1417 V 0.979 464 T 0.978 576 S 0.978 627 A 0.978 1379 I 0.978 363 R 0.977 398 L 0.977 473 C 0.977 482 A 0.977 494 C 0.977 802 T 0.977 824 C 0.977 1193 S 0.977 1375 R 0.977 1394 D 0.977 783 T 0.976 999 S 0.976 1195 T 0.976 1325 R 0.976 36 I 0.975 452 R 0.975 752 P 0.975 1105 A 0.975 1120 H 0.975 1329 L 0.975 429 T 0.974 430 T 0.974 583 T 0.974 785 T 0.974 862 F 0.974 1354 K 0.974 511 L 0.973 522 P 0.973 618 T 0.973 986 T 0.973 1121 L 0.973 375 T 0.972 585 R 0.972 623 A 0.972 787 S 0.972 890 S 0.972 1061 R 0.972 1099 S 0.972 184 I 0.971 394 I 0.971 642 S 0.971 696 S 0.971 739 G 0.971 876 A 0.971 996 G 0.971 1108 L 0.971 1138 H 0.971 1314 H 0.971 1397 N 0.971 237 S 0.970 586 C 0.970 804 P 0.970 935 A 0.970 1312 S 0.970 332 S 0.969 451 T 0.968 781 P 0.968 834 S 0.968 961 T 0.968 376 P 0.967 432 A 0.967 819 P 0.967 
1156 P 0.967 195 R 0.9660 526 R 0.9660 652 P 0.9660 867 P 0.9660 891 P 0.9660 914 P 0.9660 1094 S 0.9660 1125 P 0.9660 198 K 0.965 633 S 0.965 979 P 0.965 399 T 0.964 467 P 0.964 487 R 0.964 657 T 0.964 1292 W 0.964 395 T 0.963 848 A 0.963 904 S 0.963 177 Q 0.962 930 S 0.962 1143 L 0.962 1349 H 0.962 232 T 0.961 592 T 0.961 995 K 0.961 1157 H 0.961 1351 H 0.961 690 R 0.96 770 R 0.96 933 R 0.96 1071 S 0.96 1430 P 0.96 326 R 0.959 677 R 0.959 873 S 0.959 968 K 0.959 1010 V 0.959 1338 D 0.959 192 G 0.958 1168 A 0.958 1170 P 0.958 139 K 0.957 829 G 0.957 849 P 0.957 1089 T 0.957 1093 A 0.957 1124 H 0.957 417 C 0.956 555 T 0.956 772 A 0.956 397 T 0.955 418 T 0.955 474 L 0.955 844 V 0.955 878 A 0.955 1289 R 0.955 25 S 0.954 1068 A 0.954 1133 L 0.954 1345 P 0.954 886 T 0.953 1336 F 0.953 730 T 0.952 820 A 0.952 825 A 0.952 1115 R 0.952 1171 H 0.952 1302 H 0.952 447 I 0.951 791 T 0.951 Cytb None
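Table 2.6 appears to list, for each gene, flattened triplets of residue position, amino acid, and Bayes Empirical Bayes posterior probability, with "None" marking a gene that has no listed sites. Assuming that layout, the short Python sketch below recovers per-gene records from such text (after page headers such as "Table 2.6 Continued" have been stripped) and tallies sites at or above a probability threshold. The function names are hypothetical helpers written for illustration, not part of any published pipeline.

```python
def parse_beb_sites(text, genes=("Zan", "Tecta", "Cytb")):
    """Split flattened 'gene pos aa prob pos aa prob ...' text into per-gene site records.

    Assumes the text holds only gene names, the word 'None', and
    (position, amino acid, posterior probability) triplets.
    """
    tokens = text.split()
    sites = {}
    gene = None
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in genes:
            gene = token
            sites.setdefault(gene, [])
            i += 1
        elif token == "None":
            i += 1  # gene listed without any sites
        else:
            if gene is None or i + 3 > len(tokens):
                raise ValueError(f"unexpected token {token!r} at position {i}")
            position, residue, probability = tokens[i:i + 3]
            sites[gene].append((int(position), residue, float(probability)))
            i += 3
    return sites


def count_at_or_above(sites, threshold):
    """Number of listed sites per gene whose posterior probability meets the threshold."""
    return {gene: sum(1 for _, _, p in records if p >= threshold)
            for gene, records in sites.items()}


if __name__ == "__main__":
    sample = "Zan 17 E 1.00 92 T 1.00 112 L 0.999 76 W 0.992 Tecta 140 R 1.00 36 I 0.975 Cytb None"
    parsed = parse_beb_sites(sample)
    print(count_at_or_above(parsed, 0.99))  # {'Zan': 4, 'Tecta': 1, 'Cytb': 0}
```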


Table 2.7. Likelihood ratio test statistics for models of variable selective pressure for each gene. Model M0 assumes the same dN/dS for all branches; Model M1 assumes a different dN/dS for each branch; Model M2 assumes a different dN/dS for each branch and site, with selection at individual sites confirmed by Bayes Empirical Bayes (BEB) support. The test statistic is twice the difference in log-likelihood between the two models compared. Asterisks denote significant p-values.

Gene | Model | 2(ΔlnL) | df | P-value
Zan | M0 vs. M1 | 665.53 | 1 | <<0.001*
Zan | M1 vs. M2 | 1385.96 | 2 | <<0.001*
Tecta | M0 vs. M1 | 978.78 | 1 | <<0.001*
Tecta | M1 vs. M2 | 13.50 | 2 | <0.001*
Cytb | M0 vs. M1 | 826.28 | 1 | <<0.001*
Cytb | M1 vs. M2 | 0.04 | 2 | >0.1
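The caption's test statistic is compared against a chi-square distribution whose degrees of freedom equal the df column. The sketch below, using only the Python standard library, covers the two cases that occur in Table 2.7 (df = 1 and df = 2), for which the chi-square upper tail has a closed form; the function names are illustrative assumptions, not taken from any phylogenetics package.

```python
import math


def lrt_statistic(lnl_simpler, lnl_richer):
    """Likelihood-ratio statistic: twice the log-likelihood gain of the richer model."""
    return 2.0 * (lnl_richer - lnl_simpler)


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable; closed forms exist for df = 1 and df = 2."""
    if x <= 0:
        return 1.0
    if df == 1:
        # chi-square(1) is Z**2, so P(Z**2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # chi-square(2) is an exponential distribution with mean 2
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")


if __name__ == "__main__":
    # 2*delta lnL and df values as reported in Table 2.7
    for label, stat, df in [("Zan M0 vs. M1", 665.53, 1),
                            ("Zan M1 vs. M2", 1385.96, 2),
                            ("Cytb M1 vs. M2", 0.04, 2)]:
        print(f"{label}: p = {chi2_upper_tail(stat, df):.3g}")
```

For the Cytb M1 vs. M2 comparison, exp(-0.04/2) is about 0.98, consistent with the ">0.1" entry. Other df values would need a general chi-square routine from a statistics library.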


Table 2.8. Test statistics for ANOVA and post-hoc analysis of global gene divergence rates: Kruskal-Wallis H rank test (top), ANOVA of divergence rate between and within Orders for all three genes collectively and for each gene individually (middle), and Fisher’s LSD post-hoc test comparing normalized divergence between genes (bottom). df = degrees of freedom between Orders (between genes for the combined "All" rows). Asterisks denote p < 0.05.

Kruskal-Wallis H test
Gene | Test statistic | df | P-value
Zan | 68.303 | 8 | <0.0001*
Tecta | 53.069 | 8 | <0.0001*
Cytb | 72.087 | 8 | <0.0001*
All | 3.680 | 2 | 0.159

ANOVA
Gene | Sum of squares (between) | Sum of squares (within) | df (between) | df (within) | Mean square (between) | Mean square (within) | F | P-value
All | 67.391 | 9.404 | 2 | 313 | 33.696 | 0.03 | 1121.514 | <0.0001*
Zan | 2.599 | 1.007 | 8 | 98 | 0.325 | 0.01 | 31.597 | <0.0001*
Tecta | 1.897 | 1.324 | 8 | 90 | 0.237 | 0.015 | 16.126 | <0.0001*
Cytb | 1.78 | 0.797 | 8 | 101 | 0.223 | 0.008 | 28.214 | <0.0001*

Fisher’s LSD post-hoc test
Comparison | Mean difference | Std. error | P-value | 95% CI lower bound | 95% CI upper bound
Cytb vs Zan | 0.864 | 0.023 | <0.0001* | 0.818 | 0.910
Zan vs Tecta | 0.190 | 0.024 | <0.0001* | 0.143 | 0.238
Cytb vs Tecta | 1.054 | 0.024 | <0.0001* | 1.007 | 1.102
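Each mean square in the ANOVA panel is a sum of squares divided by its degrees of freedom, and F is the ratio of the between- to the within-group mean square. The sketch below reproduces that arithmetic from the tabulated sums of squares. Its interval helper is an approximation: it substitutes the standard normal quantile for the Student t quantile used by Fisher’s LSD, which is only reasonable because the error df of the combined ANOVA is large (313, assuming the LSD intervals use that error term). Function names are hypothetical.

```python
from statistics import NormalDist


def one_way_anova(ss_between, ss_within, df_between, df_within):
    """Mean squares and F ratio from sums of squares and their degrees of freedom."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


def approx_ci(mean_difference, std_error, level=0.95):
    """Two-sided interval using the standard normal quantile in place of Student's t.

    This approximation is only reasonable when the error df is large.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean_difference - z * std_error, mean_difference + z * std_error


if __name__ == "__main__":
    # Zan row of the ANOVA: SS 2.599 / 1.007 on 8 / 98 df
    ms_b, ms_w, f_ratio = one_way_anova(2.599, 1.007, 8, 98)
    print(f"MS between {ms_b:.3f}, MS within {ms_w:.3f}, F {f_ratio:.2f}")
    # Cytb vs Zan LSD comparison: difference 0.864, SE 0.023
    print(approx_ci(0.864, 0.023))
```

For the Zan row this gives F of about 31.6, matching the tabulated 31.597 up to rounding of the sums of squares, and the Cytb vs Zan interval comes out at roughly 0.819 to 0.909 against the reported 0.818 to 0.910.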


Table 2.9. Games-Howell post-hoc test summary results for Zan Ordinal comparisons of normalized divergence rate. Asterisks denote significant p-values.

Comparison | Mean difference | Std. error | P-value | 95% CI lower | 95% CI upper
Afrotheria vs Perissodactyla | 0.226 | 0.076 | 0.256 | -0.137 | 0.589
Afrotheria vs Chiroptera | 0.034 | 0.079 | 1.000 | -0.322 | 0.391
Afrotheria vs Carnivora | -0.001 | 0.078 | 1.000 | -0.359 | 0.356
Afrotheria vs Cetartiodactyla | -0.026 | 0.080 | 1.000 | -0.380 | 0.327
Afrotheria vs Lagomorpha | -0.062 | 0.128 | 0.999 | -1.157 | 1.031
Afrotheria vs Primates | -0.104 | 0.077 | 0.883 | -0.465 | 0.257
Afrotheria vs Eulipotyphla | -0.438 | 0.127 | 0.185 | -1.095 | 0.219
Afrotheria vs Rodentia | -0.373 | 0.079 | 0.041* | -0.730 | -0.015

Perissodactyla vs Afrotheria | -0.226 | 0.076 | 0.256 | -0.589 | 0.137
Perissodactyla vs Chiroptera | -0.191 | 0.025 | <0.0001* | -0.285 | -0.098
Perissodactyla vs Carnivora | -0.227 | 0.023 | <0.0001* | -0.311 | -0.143
Perissodactyla vs Cetartiodactyla | -0.252 | 0.029 | <0.0001* | -0.353 | -0.151
Perissodactyla vs Lagomorpha | -0.288 | 0.104 | 0.540 | -3.609 | 3.031
Perissodactyla vs Primates | -0.330 | 0.018 | <0.0001* | -0.394 | -0.266
Perissodactyla vs Eulipotyphla | -0.664 | 0.102 | 0.102 | -1.621 | 0.293
Perissodactyla vs Rodentia | -0.599 | 0.024 | <0.0001* | -0.683 | -0.515

Chiroptera vs Afrotheria | -0.034 | 0.079 | 1.000 | -0.391 | 0.322
Chiroptera vs Perissodactyla | 0.191 | 0.025 | <0.0001* | 0.098 | 0.285
Chiroptera vs Carnivora | -0.035 | 0.030 | 0.958 | -0.142 | 0.071
Chiroptera vs Cetartiodactyla | -0.060 | 0.035 | 0.741 | -0.180 | 0.059
Chiroptera vs Lagomorpha | -0.096 | 0.106 | 0.955 | -2.937 | 2.743
Chiroptera vs Primates | -0.138 | 0.027 | 0.002* | -0.233 | -0.043
Chiroptera vs Eulipotyphla | -0.472 | 0.104 | 0.182 | -1.370 | 0.425
Chiroptera vs Rodentia | -0.407 | 0.031 | <0.0001* | -0.514 | -0.300

Carnivora vs Afrotheria | 0.001 | 0.078 | 1.000 | -0.356 | 0.359
Carnivora vs Perissodactyla | 0.227 | 0.023 | <0.0001* | 0.143 | 0.311
Carnivora vs Chiroptera | 0.035 | 0.030 | 0.958 | -0.071 | 0.142
Carnivora vs Cetartiodactyla | -0.024 | 0.034 | 0.998 | -0.139 | 0.090
Carnivora vs Lagomorpha | -0.061 | 0.106 | 0.994 | -2.979 | 2.856
Carnivora vs Primates | -0.102 | 0.025 | 0.012* | -0.189 | -0.016
Carnivora vs Eulipotyphla | -0.436 | 0.104 | 0.212 | -1.344 | 0.471
Carnivora vs Rodentia | -0.371 | 0.030 | <0.0001* | -0.472 | -0.270

Cetartiodactyla vs Afrotheria | 0.026 | 0.080 | 1.000 | -0.327 | 0.380
Cetartiodactyla vs Perissodactyla | 0.252 | 0.029 | <0.0001* | 0.151 | 0.353


Table 2.9 Continued

Cetartiodactyla vs Chiroptera | 0.060 | 0.035 | 0.741 | -0.059 | 0.180
Cetartiodactyla vs Carnivora | 0.024 | 0.034 | 0.998 | -0.090 | 0.139
Cetartiodactyla vs Lagomorpha | -0.036 | 0.107 | 1.000 | -2.642 | 2.569
Cetartiodactyla vs Primates | -0.077 | 0.031 | 0.276 | -0.182 | 0.026
Cetartiodactyla vs Eulipotyphla | -0.411 | 0.105 | 0.231 | -1.279 | 0.455
Cetartiodactyla vs Rodentia | -0.346 | 0.035 | <0.0001* | -0.462 | -0.231

Lagomorpha vs Afrotheria | 0.062 | 0.128 | 0.999 | -1.031 | 1.157
Lagomorpha vs Perissodactyla | 0.288 | 0.104 | 0.540 | -3.031 | 3.609
Lagomorpha vs Chiroptera | 0.096 | 0.106 | 0.955 | -2.743 | 2.937
Lagomorpha vs Carnivora | 0.061 | 0.106 | 0.994 | -2.856 | 2.979
Lagomorpha vs Cetartiodactyla | 0.036 | 0.107 | 1.000 | -2.569 | 2.642
Lagomorpha vs Primates | -0.041 | 0.105 | 0.999 | -3.222 | 3.139
Lagomorpha vs Eulipotyphla | -0.375 | 0.145 | 0.442 | -1.429 | 0.678
Lagomorpha vs Rodentia | -0.310 | 0.106 | 0.505 | -3.196 | 2.575

Primates vs Afrotheria | 0.104 | 0.077 | 0.883 | -0.257 | 0.465
Primates vs Perissodactyla | 0.330 | 0.018 | <0.0001* | 0.266 | 0.394
Primates vs Chiroptera | 0.138 | 0.027 | 0.002* | 0.043 | 0.233
Primates vs Carnivora | 0.102 | 0.025 | 0.012* | 0.016 | 0.189
Primates vs Cetartiodactyla | 0.077 | 0.031 | 0.276 | -0.026 | 0.182
Primates vs Lagomorpha | 0.041 | 0.105 | 0.999 | -3.139 | 3.222
Primates vs Eulipotyphla | -0.334 | 0.103 | 0.338 | -1.274 | 0.606
Primates vs Rodentia | -0.268 | 0.026 | <0.0001* | -0.356 | -0.181

Eulipotyphla vs Afrotheria | 0.438 | 0.127 | 0.185 | -0.2191 | 1.095
Eulipotyphla vs Perissodactyla | 0.664 | 0.102 | 0.102 | -0.293 | 1.621
Eulipotyphla vs Chiroptera | 0.472 | 0.104 | 0.182 | -0.425 | 1.370
Eulipotyphla vs Carnivora | 0.436 | 0.104 | 0.212 | -0.471 | 1.344
Eulipotyphla vs Cetartiodactyla | 0.411 | 0.105 | 0.231 | -0.455 | 1.279
Eulipotyphla vs Lagomorpha | 0.375 | 0.145 | 0.442 | -0.678 | 1.429
Eulipotyphla vs Primates | 0.334 | 0.103 | 0.338 | -0.606 | 1.274
Eulipotyphla vs Rodentia | 0.065 | 0.104 | 0.996 | -0.838 | 0.968

Rodentia vs Afrotheria | 0.373 | 0.079 | 0.041* | 0.015 | 0.730
Rodentia vs Perissodactyla | 0.599 | 0.024 | <0.0001* | 0.515 | 0.683
Rodentia vs Chiroptera | 0.407 | 0.031 | <0.0001* | 0.300 | 0.514
Rodentia vs Carnivora | 0.371 | 0.030 | <0.0001* | 0.270 | 0.472
Rodentia vs Cetartiodactyla | 0.346 | 0.035 | <0.0001* | 0.231 | 0.462


Table 2.9 Continued

Rodentia vs Lagomorpha | 0.310 | 0.106 | 0.505 | -2.575 | 3.196
Rodentia vs Primates | 0.268 | 0.026 | <0.0001* | 0.181 | 0.356
Rodentia vs Eulipotyphla | -0.065 | 0.104 | 0.996 | -0.968 | 0.838
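The Games-Howell procedure behind Tables 2.9-2.11 does not assume equal variances: each pair of orders gets its own standard error and Welch-Satterthwaite degrees of freedom, and the interval half-width is the studentized range critical value q divided by √2, times that standard error. The sketch below computes those per-pair quantities, assuming the tabulated SE is sqrt(s_i²/n_i + s_j²/n_j). The per-order variances and sample sizes are not reported here, so the inputs in the example are invented, and q must be supplied from a studentized range table or a statistics library. Function names are hypothetical.

```python
import math


def games_howell_se_df(var_i, n_i, var_j, n_j):
    """Standard error of a difference in means and its Welch-Satterthwaite df."""
    a = var_i / n_i
    b = var_j / n_j
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n_i - 1) + b ** 2 / (n_j - 1))
    return se, df


def games_howell_interval(mean_difference, se, q_critical):
    """Half-width is q / sqrt(2) times the SE; q comes from a studentized range table."""
    half_width = q_critical / math.sqrt(2.0) * se
    return mean_difference - half_width, mean_difference + half_width


if __name__ == "__main__":
    # Hypothetical groups: one order with only 2 taxa, another with 40
    se, df = games_howell_se_df(var_i=0.02, n_i=2, var_j=0.01, n_j=40)
    print(f"SE = {se:.3f}, df = {df:.2f}")  # df falls toward n_i - 1 = 1
```

When one order is represented by very few taxa, its variance term can dominate and the df falls toward that order's n - 1, which inflates q and widens the interval. This is one plausible reason the intervals involving small orders such as Lagomorpha span several units even though their standard errors are near 0.1.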


Table 2.10. Games-Howell post-hoc test summary results for Tecta Ordinal comparisons of normalized divergence rate. Asterisks denote significant p-values.

Comparison | Mean difference | Std. error | P-value | 95% CI lower | 95% CI upper
Afrotheria vs Perissodactyla | 0.389 | 0.030 | <0.0001* | 0.244 | 0.533
Afrotheria vs Chiroptera | -0.051 | 0.106 | 1.000 | -0.482 | 0.379
Afrotheria vs Carnivora | 0.161 | 0.035 | 0.023* | 0.020 | 0.302
Afrotheria vs Cetartiodactyla | 0.125 | 0.037 | 0.098 | -0.016 | 0.267
Afrotheria vs Lagomorpha | 0.144 | 0.078 | 0.706 | -1.169 | 1.459
Afrotheria vs Primates | 0.156 | 0.037 | 0.027* | 0.014 | 0.297
Afrotheria vs Eulipotyphla | -0.101 | 0.051 | 0.601 | -0.366 | 0.162
Afrotheria vs Rodentia | -0.170 | 0.036 | 0.016* | -0.312 | -0.028

Perissodactyla vs Afrotheria | -0.389 | 0.030 | <0.0001* | -0.533 | -0.244
Perissodactyla vs Chiroptera | -0.440 | 0.102 | 0.046* | -0.873 | -0.008
Perissodactyla vs Carnivora | -0.227 | 0.019 | <0.0001* | -0.297 | -0.157
Perissodactyla vs Cetartiodactyla | -0.263 | 0.022 | <0.0001* | -0.341 | -0.185
Perissodactyla vs Lagomorpha | -0.244 | 0.072 | 0.460 | -2.621 | 2.133
Perissodactyla vs Primates | -0.233 | 0.022 | <0.0001* | -0.307 | -0.158
Perissodactyla vs Eulipotyphla | -0.491 | 0.041 | 0.031* | -0.874 | -0.107
Perissodactyla vs Rodentia | -0.559 | 0.021 | <0.0001* | -0.636 | -0.483

Chiroptera vs Afrotheria | 0.051 | 0.106 | 1.000 | -0.379 | 0.482
Chiroptera vs Perissodactyla | 0.440 | 0.102 | 0.046* | 0.008 | 0.873
Chiroptera vs Carnivora | 0.213 | 0.103 | 0.554 | -0.218 | 0.644
Chiroptera vs Cetartiodactyla | 0.177 | 0.104 | 0.736 | -0.253 | 0.608
Chiroptera vs Lagomorpha | 0.196 | 0.125 | 0.793 | -0.373 | 0.766
Chiroptera vs Primates | 0.207 | 0.104 | 0.586 | -0.223 | 0.638
Chiroptera vs Eulipotyphla | -0.050 | 0.109 | 1.000 | -0.489 | 0.388
Chiroptera vs Rodentia | -0.119 | 0.104 | 0.949 | -0.550 | 0.312

Carnivora vs Afrotheria | -0.161 | 0.035 | 0.023* | -0.302 | -0.020
Carnivora vs Perissodactyla | 0.227 | 0.019 | <0.0001* | 0.157 | 0.297
Carnivora vs Chiroptera | -0.213 | 0.103 | 0.554 | -0.644 | 0.218
Carnivora vs Cetartiodactyla | -0.035 | 0.028 | 0.941 | -0.132 | 0.060
Carnivora vs Lagomorpha | -0.016 | 0.074 | 1.000 | -1.856 | 1.823
Carnivora vs Primates | -0.005 | 0.028 | 1.000 | -0.099 | 0.089
Carnivora vs Eulipotyphla | -0.263 | 0.045 | 0.071 | -0.565 | 0.0381
Carnivora vs Rodentia | -0.332 | 0.027 | <0.0001* | -0.426 | -0.237

Cetartiodactyla vs Afrotheria | -0.125 | 0.037 | 0.098 | -0.267 | 0.016
Cetartiodactyla vs Perissodactyla | 0.263 | 0.022 | <0.0001* | 0.185 | 0.341
Cetartiodactyla vs Chiroptera | -0.177 | 0.104 | 0.736 | -0.608 | 0.253


Table 2.10 Continued

Cetartiodactyla vs Carnivora | 0.035 | 0.028 | 0.941 | -0.060 | 0.132
Cetartiodactyla vs Lagomorpha | 0.019 | 0.075 | 1.000 | -1.643 | 1.682
Cetartiodactyla vs Primates | 0.030 | 0.031 | 0.986 | -0.070 | 0.131
Cetartiodactyla vs Eulipotyphla | -0.227 | 0.046 | 0.092 | -0.509 | 0.054
Cetartiodactyla vs Rodentia | -0.296 | 0.030 | <0.0001* | -0.397 | -0.195

Lagomorpha vs Afrotheria | -0.144 | 0.078 | 0.706 | -1.459 | 1.169
Lagomorpha vs Perissodactyla | 0.244 | 0.072 | 0.460 | -2.133 | 2.621
Lagomorpha vs Chiroptera | -0.196 | 0.125 | 0.793 | -0.766 | 0.373
Lagomorpha vs Carnivora | 0.016 | 0.074 | 1.000 | -1.823 | 1.856
Lagomorpha vs Cetartiodactyla | -0.019 | 0.075 | 1.000 | -1.682 | 1.643
Lagomorpha vs Primates | 0.011 | 0.075 | 1.000 | -1.668 | 1.691
Lagomorpha vs Eulipotyphla | -0.246 | 0.083 | 0.425 | -1.266 | 0.773
Lagomorpha vs Rodentia | -0.315 | 0.075 | 0.344 | -2.040 | 1.409

Primates vs Afrotheria | -0.156 | 0.037 | 0.027* | -0.297 | -0.014
Primates vs Perissodactyla | 0.233 | 0.022 | <0.0001* | 0.158 | 0.307
Primates vs Chiroptera | -0.207 | 0.104 | 0.586 | -0.638 | 0.223
Primates vs Carnivora | 0.005 | 0.028 | 1.000 | -0.089 | 0.099
Primates vs Cetartiodactyla | -0.030 | 0.031 | 0.986 | -0.131 | 0.070
Primates vs Lagomorpha | -0.011 | 0.075 | 1.000 | -1.691 | 1.668
Primates vs Eulipotyphla | -0.258 | 0.046 | 0.065 | -0.541 | 0.025
Primates vs Rodentia | -0.326 | 0.030 | <0.0001* | -0.426 | -0.227

Eulipotyphla vs Afrotheria | 0.101 | 0.051 | 0.601 | -0.162 | 0.366
Eulipotyphla vs Perissodactyla | 0.491 | 0.041 | 0.031* | 0.1075 | 0.874
Eulipotyphla vs Chiroptera | 0.050 | 0.109 | 1.000 | -0.388 | 0.489
Eulipotyphla vs Carnivora | 0.263 | 0.045 | 0.071 | -0.038 | 0.565
Eulipotyphla vs Cetartiodactyla | 0.227 | 0.046 | 0.092 | -0.054 | 0.509
Eulipotyphla vs Lagomorpha | 0.246 | 0.083 | 0.425 | -0.773 | 1.266
Eulipotyphla vs Primates | 0.258 | 0.046 | 0.065 | -0.025 | 0.541
Eulipotyphla vs Rodentia | -0.068 | 0.046 | 0.817 | -0.357 | 0.220

Rodentia vs Afrotheria | 0.170 | 0.036 | 0.016* | 0.028 | 0.312
Rodentia vs Perissodactyla | 0.559 | 0.021 | <0.0001* | 0.483 | 0.636
Rodentia vs Chiroptera | 0.119 | 0.104 | 0.949 | -0.312 | 0.550
Rodentia vs Carnivora | 0.332 | 0.027 | <0.0001* | 0.237 | 0.426
Rodentia vs Cetartiodactyla | 0.296 | 0.030 | <0.0001* | 0.195 | 0.397
Rodentia vs Lagomorpha | 0.315 | 0.075 | 0.344 | -1.409 | 2.040
Rodentia vs Primates | 0.326 | 0.030 | <0.0001* | 0.227 | 0.426
Rodentia vs Eulipotyphla | 0.068 | 0.046 | 0.817 | -0.220 | 0.357


Table 2.11. Games-Howell post-hoc test summary results for Cytb Ordinal comparisons of normalized divergence rate. Significant p-values are indicated by an asterisk.

Comparison | Mean difference | Std. error | P-value | 95% CI lower | 95% CI upper
Afrotheria vs Perissodactyla | 0.302 | 0.031 | <0.0001* | 0.180 | 0.424
Afrotheria vs Chiroptera | 0.062 | 0.043 | 0.867 | -0.092 | 0.218
Afrotheria vs Carnivora | 0.107 | 0.032 | 0.102 | -0.014 | 0.228
Afrotheria vs Cetartiodactyla | 0.080 | 0.035 | 0.403 | -0.044 | 0.205
Afrotheria vs Lagomorpha | 0.198 | 0.027 | 0.003* | 0.081 | 0.316
Afrotheria vs Primates | -0.177 | 0.032 | 0.003* | -0.297 | -0.058
Afrotheria vs Eulipotyphla | 0.099 | 0.124 | 0.984 | -0.968 | 1.166
Afrotheria vs Rodentia | -0.040 | 0.031 | 0.922 | -0.160 | 0.078

Perissodactyla vs Afrotheria | -0.302 | 0.031 | <0.0001* | -0.424 | -0.180
Perissodactyla vs Chiroptera | -0.239 | 0.036 | 0.001* | -0.377 | -0.101
Perissodactyla vs Carnivora | -0.195 | 0.022 | <0.0001* | -0.280 | -0.110
Perissodactyla vs Cetartiodactyla | -0.221 | 0.026 | <0.0001* | -0.313 | -0.13
Perissodactyla vs Lagomorpha | -0.103 | 0.014 | 0.036* | -0.196 | -0.011
Perissodactyla vs Primates | -0.480 | 0.022 | <0.0001* | -0.560 | -0.399
Perissodactyla vs Eulipotyphla | -0.203 | 0.122 | 0.754 | -1.338 | 0.932
Perissodactyla vs Rodentia | -0.342 | 0.021 | <0.0001* | -0.424 | -0.261

Chiroptera vs Afrotheria | -0.062 | 0.043 | 0.867 | -0.218 | 0.092
Chiroptera vs Perissodactyla | 0.239 | 0.036 | 0.001* | 0.101 | 0.377
Chiroptera vs Carnivora | 0.044 | 0.038 | 0.954 | -0.094 | 0.182
Chiroptera vs Cetartiodactyla | 0.017 | 0.040 | 1.000 | -0.124 | 0.159
Chiroptera vs Lagomorpha | 0.135 | 0.033 | 0.046* | 0.002 | 0.269
Chiroptera vs Primates | -0.240 | 0.037 | <0.0001* | -0.377 | -0.10
Chiroptera vs Eulipotyphla | 0.036 | 0.126 | 1.000 | -0.992 | 1.0652
Chiroptera vs Rodentia | -0.103 | 0.037 | 0.212 | -0.240 | 0.033

Carnivora vs Afrotheria | -0.107 | 0.032 | 0.102 | -0.228 | 0.014
Carnivora vs Perissodactyla | 0.195 | 0.022 | <0.0001* | 0.110 | 0.280
Carnivora vs Chiroptera | -0.044 | 0.038 | 0.954 | -0.182 | 0.094
Carnivora vs Cetartiodactyla | -0.026 | 0.027 | 0.988 | -0.119 | 0.066
Carnivora vs Lagomorpha | 0.0918 | 0.017 | 0.004* | 0.026 | 0.157
Carnivora vs Primates | -0.284 | 0.024 | <0.0001* | -0.364 | -0.204
Carnivora vs Eulipotyphla | -0.007 | 0.123 | 1.000 | -1.130 | 1.114
Carnivora vs Rodentia | -0.147 | 0.023 | <0.0001* | -0.227 | -0.068

Cetartiodactyla vs Afrotheria | -0.080 | 0.035 | 0.403 | -0.205 | 0.044
Cetartiodactyla vs Perissodactyla | 0.221 | 0.026 | <0.0001* | 0.13 | 0.313


Table 2.11 Continued

Cetartiodactyla vs Chiroptera | -0.017 | 0.040 | 1.000 | -0.159 | 0.124
Cetartiodactyla vs Carnivora | 0.026 | 0.027 | 0.988 | -0.066 | 0.119
Cetartiodactyla vs Lagomorpha | 0.118 | 0.021 | 0.001* | 0.042 | 0.193
Cetartiodactyla vs Primates | -0.258 | 0.027 | <0.0001* | -0.347 | -0.168
Cetartiodactyla vs Eulipotyphla | 0.018 | 0.123 | 1.000 | -1.083 | 1.121
Cetartiodactyla vs Rodentia | -0.121 | 0.026 | 0.002* | -0.209 | -0.032

Lagomorpha vs Afrotheria | -0.198 | 0.027 | 0.003* | -0.316 | -0.081
Lagomorpha vs Perissodactyla | 0.103 | 0.014 | 0.036* | 0.011 | 0.196
Lagomorpha vs Chiroptera | -0.135 | 0.033 | 0.046* | -0.269 | -0.002
Lagomorpha vs Carnivora | -0.091 | 0.017 | 0.004* | -0.157 | -0.026
Lagomorpha vs Cetartiodactyla | -0.118 | 0.021 | 0.001* | -0.193 | -0.042
Lagomorpha vs Primates | -0.376 | 0.016 | <0.0001* | -0.432 | -0.320
Lagomorpha vs Eulipotyphla | -0.099 | 0.121 | 0.980 | -1.264 | 1.065
Lagomorpha vs Rodentia | -0.239 | 0.016 | <0.0001* | -0.295 | -0.183

Primates vs Afrotheria | 0.177 | 0.032 | 0.003* | 0.058 | 0.297
Primates vs Perissodactyla | 0.480 | 0.022 | <0.0001* | 0.399 | 0.560
Primates vs Chiroptera | 0.240 | 0.037 | <0.0001* | 0.103 | 0.377
Primates vs Carnivora | 0.284 | 0.024 | <0.0001* | 0.204 | 0.364
Primates vs Cetartiodactyla | 0.258 | 0.027 | <0.0001* | 0.168 | 0.347
Primates vs Lagomorpha | 0.376 | 0.016 | <0.0001* | 0.320 | 0.432
Primates vs Eulipotyphla | 0.276 | 0.122 | 0.559 | -0.850 | 1.404
Primates vs Rodentia | 0.137 | 0.022 | <0.0001* | 0.062 | 0.211

Eulipotyphla vs Afrotheria | -0.099 | 0.124 | 0.984 | -1.166 | 0.968
Eulipotyphla vs Perissodactyla | 0.203 | 0.122 | 0.754 | -0.932 | 1.338
Eulipotyphla vs Chiroptera | -0.036 | 0.126 | 1.000 | -1.065 | 0.992
Eulipotyphla vs Carnivora | 0.007 | 0.123 | 1.000 | -1.114 | 1.130
Eulipotyphla vs Cetartiodactyla | -0.018 | 0.123 | 1.000 | -1.121 | 1.083
Eulipotyphla vs Lagomorpha | 0.099 | 0.121 | 0.980 | -1.065 | 1.264
Eulipotyphla vs Primates | -0.276 | 0.122 | 0.559 | -1.404 | 0.850
Eulipotyphla vs Rodentia | -0.139 | 0.122 | 0.920 | -1.269 | 0.990

Rodentia vs Afrotheria | 0.040 | 0.031 | 0.922 | -0.078 | 0.160
Rodentia vs Perissodactyla | 0.342 | 0.021 | <0.0001* | 0.261 | 0.424
Rodentia vs Chiroptera | 0.103 | 0.037 | 0.212 | -0.033 | 0.240
Rodentia vs Carnivora | 0.147 | 0.023 | <0.0001* | 0.068 | 0.227
Rodentia vs Cetartiodactyla | 0.121 | 0.026 | 0.002* | 0.032 | 0.209
Rodentia vs Lagomorpha | 0.239 | 0.016 | <0.0001* | 0.183 | 0.295
Rodentia vs Primates | -0.137 | 0.022 | <0.0001* | -0.211 | -0.062
Rodentia vs Eulipotyphla | 0.139 | 0.122 | 0.920 | -0.990 | 1.269
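Because each Games-Howell table reports every pair of orders in both directions, the "B vs A" row should mirror the "A vs B" row: the mean difference and interval bounds change sign and swap, while the standard error and p-value are identical. A small check along these lines, sketched below with a hypothetical helper, can flag self-comparisons, duplicated rows, or labels shifted by one position when such tables are transcribed or reformatted. The tolerance of 0.001 is an assumption matched to the three-decimal rounding used in the tables.

```python
def mirror_mismatches(rows, tol=1e-3):
    """Flag rows of a symmetric pairwise table that lack a consistent 'B vs A' counterpart.

    Each row is (group_a, group_b, mean_diff, std_err, p_text, lower, upper),
    with the p-value kept as text so entries such as '<0.0001*' compare exactly.
    """
    table = {}
    problems = []
    for group_a, group_b, *values in rows:
        if (group_a, group_b) in table:
            problems.append((group_a, group_b, "duplicate row"))
        table[(group_a, group_b)] = values
    for (group_a, group_b), (diff, se, p_text, lower, upper) in table.items():
        if group_a == group_b:
            problems.append((group_a, group_b, "self-comparison"))
            continue
        mirror = table.get((group_b, group_a))
        if mirror is None:
            problems.append((group_a, group_b, "missing mirror row"))
            continue
        m_diff, m_se, m_p_text, m_lower, m_upper = mirror
        consistent = (abs(diff + m_diff) <= tol
                      and abs(se - m_se) <= tol
                      and p_text == m_p_text
                      and abs(lower + m_upper) <= tol
                      and abs(upper + m_lower) <= tol)
        if not consistent:
            problems.append((group_a, group_b, "mirror values disagree"))
    return problems


if __name__ == "__main__":
    rows = [
        ("Afrotheria", "Rodentia", -0.373, 0.079, "0.041*", -0.730, -0.015),
        ("Rodentia", "Afrotheria", 0.373, 0.079, "0.041*", 0.015, 0.730),
    ]
    print(mirror_mismatches(rows))  # []
```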


CHAPTER III

GAMETE RECOGNITION GENE TREE YIELDS A ROBUST EUTHERIAN PHYLOGENY ACROSS TAXONOMIC LEVELS

Abstract

The extraordinary morphological diversity of extant mammals poses a challenge for studies of speciation, adaptation, molecular evolution, and reproductive isolation. Despite the recent wealth of molecular studies on mammalian phylogenetics, uncertainties remain surrounding both deep and more recent divergence events that have proven difficult to resolve. Multi-gene datasets often provide increased support for higher-level affinities within Mammalia, but such analyses require vast amounts of information (genomic-level sequence data) and computational effort (high-performance computing). This study presents a phylogenetic solution based on a single reproductive molecular marker, Zan, a putative mammalian "speciation gene" encoding the sperm protein zonadhesin, which mediates species-specific adhesion to the egg and thereby promotes reproductive isolation. Topological comparison of Zan Bayesian phylogenies with a widely cited supertree showed that Zan sequences were phylogenetically informative and provided stronger support for resolution at both deeper and more terminal nodes of the placental mammal phylogeny. This single-gene marker yields a topology similar to, but more robustly supported than, that of a supertree generated from DNA sequences of 66 genes, and thus provides new insight into the divergence of both early and recent mammalian radiations.


Introduction

One of the greatest uncertainties in mammalian evolution and phylogenetics concerns the relationships among the 19 extant orders of Eutherian mammals (Meredith et al. 2011, dos Reis et al. 2012, Song et al. 2012, O’Leary et al. 2013, Springer et al. 2013, Springer and Gatesy 2016). Since Simpson’s preeminent classification (Simpson 1945), systematists have historically placed placental mammals in four or five major clusters arranged as a rough polytomy, often depicted as radiating simultaneously from a single node (Shoshani 1986, Novacek et al. 1988, Novacek 1989, Novacek 1992a, MacPhee et al. 1993, Gaudin et al. 1996, O’Leary et al. 2004).

Beginning in the late 20th century, the view of Eutherian phylogenetic relationships changed radically as molecular studies supported large-scale resolution of the Eutherian tree with some degree of certainty (Meredith et al. 2011, dos Reis et al. 2012, Madsen et al. 2001, Murphy et al. 2001). Indeed, molecular support for a more stable Eutherian tree has emerged both from analyses of increasingly robust molecular data (Springer et al. 1997, de Jong 1998, Stanhope et al. 1998, Waddell et al. 1999a, Madsen et al. 2001, Murphy et al. 2001, Amrine-Madsen et al. 2003, Nishihara et al. 2006, Hallstrom et al. 2007, Wildman et al. 2007, Asher et al. 2009) and from supertree studies (Liu et al. 2001, Beck et al. 2006, Bininda-Emonds et al. 2007), with a general consensus recognizing four major clades of placental mammals and resolving the divergence of the ancestral placental group. Nevertheless, at all taxonomic levels, variation among phylogenies constructed from diverse characters shows that phylogenetic relationships are complex and in need of further revision, especially among mammalian groups that have received relatively little systematic attention or that are morphologically or behaviorally cryptic (e.g., insectivores or burrowing mammals, respectively).

Consequently, several polytomies in the Eutherian tree remain contentious at both basal and terminal nodes. For example, many groups remain unresolved, especially among the earliest-diverging lineages (near the placental root) and, at the species level, within the most rapidly evolving mammalian lineages (speciose orders, e.g., Chiroptera and Rodentia; Fabre et al. 2012, Halliday et al. 2015).

The difficulty of resolving the Eutherian tree has prompted debates over methodological issues such as morphological versus molecular data, fossil calibrations, taxon sampling, long-branch attraction, and nuclear versus mitochondrial genes (Murphy et al. 2001, Springer et al. 2004, Springer et al. 2007, Meredith et al. 2011, O’Leary et al. 2013). Current approaches, especially combinatorial (total evidence) methods that combine morphological and molecular datasets, are thought to strengthen and clarify phylogenetic relationships, particularly among closely related taxa (Huelsenbeck et al. 1996, Paradis 2003). Supertrees, which synthesize phylogenetic information from many characters (e.g., multiple genes, morphological data points, and fossil calibrations), have the potential to approximate an overall consensus of the true species tree. In 2007, Bininda-Emonds et al. published the first species-level supertree of mammals. The 99% "complete" Bininda-Emonds Supertree included 4,510 of the 4,554 extant mammalian species recognized as of 2007 and was generated using a 51 kb alignment of 66 genes (32 transcribed nuclear, 19 tRNA, and 15 mitochondrial) and 30 fossil calibration points. Despite the abundance of characters included, the Supertree still contained many unresolved polytomies (Stadler and Bokma 2013).

Studies of different morphological (Novacek 1992a, O’Leary et al. 2013) and molecular characters (Madsen et al. 2001, Murphy et al. 2001, Scally et al. 2001, Delsuc et al. 2002, Springer et al. 2003, Springer et al. 2004, Springer et al. 2007, Meredith et al. 2011, dos Reis et al. 2012, Tarver et al. 2016, Foley et al. 2016) tend to yield disparate mammalian phylogenies, fueling decades-long disagreement about the reliability of single-character phylogenetic studies (Novacek 1992b, Bininda-Emonds et al. 2007). Typically, characters that are lineage-dependent or subject to positive selection (body size, pelage color, mating behaviors, adaptive genes involved in speciation) are removed from phylogenetic analyses because they reflect, at least in part, the functional evolution of the character itself rather than of the organism as a whole, and thus have been thought to lack reliable phylogenetic signal. Accordingly, molecular phylogenetic studies routinely omit genes evolving under selective pressures that may differ between lineages, because their divergence reflects not only elapsed time but also the evolution of their products’ functions: negative selection acts on products whose function must be conserved, and positive selection acts on products that confer beneficial new traits on the evolving organism (Sabeti et al. 2006). Instead, neutrally evolving genes are thought to provide the most accurate insight into phylogenetic history because they serve simply as a clock measuring the time elapsed since a speciation event.
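The selection regimes contrasted in this paragraph are conventionally summarized by the ratio ω of nonsynonymous to synonymous substitution rates, the same dN/dS quantity modeled in Table 2.7:

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{negative (purifying) selection}\\
\omega = 1 & \text{neutral evolution}\\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```

Here d_N is the number of nonsynonymous substitutions per nonsynonymous site and d_S the number of synonymous substitutions per synonymous site. Because ω is usually estimated as an average over a gene or over branches, a single gene can still contain individual sites evolving under different regimes.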


Notwithstanding the conventional wisdom that phylogenetic studies are best done by comparing diverse characters, including large numbers of neutrally evolving genes, reliable phylogenetic information might also be obtained by comparing a single character that directly tracks the speciation process itself throughout a group, regardless of the nature of selection on that character.

Accordingly, divergence of a "speciation gene" may accurately represent species phylogeny because its evolution does not simply serve as a clock but instead reflects the gene’s direct contribution to speciation. We previously showed that Zan is a speciation gene in Eutheria (Roberts et al. Ontogeny). Zan encodes the sperm protein zonadhesin, which mediates species-specific recognition of the egg, and rapid Zan evolution driven by intense positive selection promotes prezygotic reproductive isolation (Hardy and Garbers 1994, Hardy and Garbers 1995, Gao and Garbers 1998, Herlyn and Zischler 2006, Tardif et al. 2010a, Tardif et al. 2010b). Despite being a single character evolving under selection, the Zan tree was generally congruent with a widely accepted mammalian Supertree (Roberts et al. Ontogeny; Bininda-Emonds et al. 2007) but did exhibit significant incongruences, in part because it was more highly resolved (Roberts et al. Ontogeny). To investigate those incongruences further, here we report detailed comparisons of the Zan gene tree and the mammalian Supertree at all taxonomic levels. The findings support the view that Zan DNA sequences reflect species divergence events, as judged against the widely established Supertree. Thus, a single character evolving under selection need not be omitted from phylogenetic studies if the evolution of that character directly reflects speciation.


Methods

Phylogenetic comparisons were based on a Bayesian inference analysis of a Zan DNA sequence alignment (6 kb) obtained from 112 Eutherian mammals. The methodology and parameters used in the analyses, along with the appropriate nucleotide substitution models, are described in Roberts et al. Ontogeny. The Zan phylogeny was compared to the Supertree (Bininda-Emonds et al. 2007) described above. To establish maximum compatibility between studies, the Supertree was pruned in Phylomatic v.3 (Webb and Donoghue 2005) to only those species represented in each of the Zan Superorder-, Ordinal-, Family-, and Species-level trees generated in Roberts et al. Ontogeny. Fig. 3.1 summarizes taxonomic relationships above Parvorder and the corresponding molecular composition, nodal support, and branch lengths of the tree previously reported in Roberts et al. Ontogeny. Superorder-level (n=1), Ordinal-level (n=2), Family-level (n=5), and Species-level (n=5) trees from both datasets (Zan and Supertree) were drawn manually as tanglegrams using the NN-tanglegram method (Scornavacca et al. 2011) and compared visually. Tanglegrams are a useful way to view similarities and differences between rooted phylogenetic trees and have proven valuable in studies of co-evolution in parasitic and symbiotic systems, horizontal gene transfer in prokaryotic evolution, and phylogenetic associations between different datasets in varying taxon groups (Venkatachalam and Gusfield 2018). Because of limited taxon representation in Orders Cingulata (n=1), Eulipotyphla (n=3), Perissodactyla (n=3), Pholidota (n=1), Scandentia (n=1), Dermoptera (n=1), and Lagomorpha (n=2), no tanglegrams were generated for these Orders. To evaluate congruence between the Zan tree and the Supertree, connector lines were drawn between corresponding taxa, and the level of entanglement was used as a measure of incongruence between phylogenies.
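The entanglement criterion can be made concrete with a small sketch. This is an illustration only, not the procedure used in this study (the tanglegrams here were drawn and inspected visually). It assumes trees written as nested Python tuples with taxon names as leaves, and the helper names (`prune`, `leaves`, `crossings`, `rotations`, `min_crossings`) are hypothetical. Pruning mimics restricting both trees to shared taxa, and a crossing is counted whenever two connector lines join taxa whose left-to-right order differs between the two drawn trees.

```python
from itertools import combinations, permutations, product


def leaves(tree):
    """Leaf labels of a nested-tuple tree, in left-to-right drawing order."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def prune(tree, keep):
    """Drop leaves not in `keep`, collapsing internal nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (prune(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def crossings(left_tree, right_tree):
    """Number of crossing connector lines between two drawings of the same taxa."""
    right_pos = {taxon: i for i, taxon in enumerate(leaves(right_tree))}
    order = [right_pos[taxon] for taxon in leaves(left_tree)]
    # Two connectors cross when their relative order differs on the two sides.
    return sum(1 for a, b in combinations(order, 2) if a > b)


def rotations(tree):
    """Every drawing of `tree` obtained by reordering children at internal nodes."""
    if isinstance(tree, str):
        yield tree
        return
    for order in permutations(tree):
        for kids in product(*(list(rotations(child)) for child in order)):
            yield tuple(kids)


def min_crossings(left_tree, right_tree):
    """Fewest crossings over all drawings of right_tree, with left_tree held fixed."""
    return min(crossings(left_tree, r) for r in rotations(right_tree))


# Felidae as described in the Results: Zan tree (left) vs. Supertree (right).
zan = ("Felis", ("Acinonyx", "Panthera"))
supertree = (("Felis", "Panthera"), "Acinonyx")
print(crossings(zan, supertree), min_crossings(zan, supertree))  # 1 1
```

Because one crossing remains under every rotation of the Supertree, it reflects a genuine topological difference rather than a drawing artifact. This brute-force search rotates only one tree and grows exponentially with tree size; the NN-tanglegram method cited above is a heuristic intended for larger trees.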

Results

Bayesian analyses of Zan divergence, based on a 112-species comparison of a 6 kb portion encoding the region of zonadhesin that mediates species-specific egg recognition (Roberts et al. Ontogeny), produced a phylogenetic tree (Fig. 3.1) that closely recapitulated a Eutherian Supertree constructed from a 51 kb alignment of 66 genes (Bininda-Emonds et al. 2007). Although the Supertree has been widely cited, it contains many unresolved polytomies, whereas the Zan phylogeny exhibited stronger support, with Bayesian posterior probabilities ≥ 0.95 at 107 of 112 nodes (Fig. 3.1). Further, topological comparisons between the Zan tree and the Supertree identified incongruities at 67 of the 258 total nodes (26%), at all taxonomic levels (Figs. 3.2-3.7). Below, the Zan topology is described in detail, along with any incongruities with the Supertree of Bininda-Emonds et al. (2007).
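One way to make "incongruent node" operational is to treat each internal node as the set of taxa below it and ask whether the same set occurs in the other tree, the idea behind the Robinson-Foulds distance. The sketch below assumes that definition (the counts reported here came from visual tanglegram comparison), trees written as nested Python tuples, and hypothetical function names. The example encodes the arctoid carnivores as described in the Carnivora results; the exact arrangement of the pinnipeds in the Supertree is an assumption made for illustration.

```python
def clades(tree):
    """Set of clades (frozensets of leaf names), one per internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def incongruent_nodes(tree_a, tree_b):
    """Internal nodes of tree_a whose clade is absent from tree_b."""
    return clades(tree_a) - clades(tree_b)


# Arctoidea: Mustelidae with Ursidae (Zan) vs. with the pinnipeds (Supertree, assumed arrangement).
zan = (("Ursidae", "Mustelidae"), ("Phocidae", ("Otariidae", "Odobenidae")))
supertree = ("Ursidae", ("Mustelidae", ("Phocidae", ("Otariidae", "Odobenidae"))))

for clade in incongruent_nodes(zan, supertree):
    print(sorted(clade))  # ['Mustelidae', 'Ursidae']
print(f"{len(incongruent_nodes(zan, supertree))} of {len(clades(zan))} Zan nodes incongruent")

# The same proportion for the totals reported above: 67 of 258 nodes.
print(round(100 * 67 / 258))  # 26
```

Counting this way treats every node equally, regardless of its depth in the tree or its posterior support.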

Taxonomic and classification arrangements follow those presented in McKenna and Bell (1997), Bininda-Emonds et al. (2007), and Foley et al. (2016).

Magnordinal and Superordinal-level comparisons

The Zan gene tree topology (Figs. 3.1 and 3.2A) placed Magnorders Atlantogenata (Superorders Afrotheria and Xenarthra) and Boreoeutheria (Superorders Laurasiatheria and Euarchontoglires) into monophyletic groups. In contrast, the Supertree (Figs. 3.1 and 3.2A) rendered Atlantogenata paraphyletic, with Afrotheria as the sister taxon to a clade comprising Xenarthra and Boreoeutheria, the group referred to as 'Exafroplacentalia'.

Ordinal-level comparisons

Within the Afrotheria, the Zan tree's topology depicted two clades (Figs. 3.1 and 3.2B): the first composed of Orders Tubulidentata (aardvark) and Afrosoricida (golden moles and tenrecs), and the second containing the Traditional Clade Paenungulata (Orders Hyracoidea-hyraxes, Proboscidea-elephants, and Sirenia-manatees). Within the Paenungulata, the Sirenia and Proboscidea (Traditional Clade Tethytheria) grouped as sister taxa, with the Hyracoidea as the basal-most taxon. Xenarthra was represented in the Zan study by a single species, Dasypus novemcinctus (nine-banded armadillo), with no representatives from the Order Pilosa (anteaters and sloths). The Zan tree agreed with the Supertree in recognizing the Paenungulata (Figs. 3.1 and 3.2B); however, the two trees differed in the relationships among the three Orders comprising this group.

Following the divergence of Afrotheria and Xenarthra in the Zan tree (Figs. 3.1 and 3.2C), a divergence event occurred between Laurasiatheria (Orders Eulipotyphla-shrews, moles, and hedgehogs; Pholidota-pangolins; Carnivora-carnivores; Chiroptera-bats; Perissodactyla-odd-toed ungulates; and Cetartiodactyla-even-toed ungulates) and Euarchontoglires, containing the remaining placental mammals (Orders Scandentia-treeshrews, Dermoptera-flying lemurs, Primates-primates, Lagomorpha-rabbits and pikas, and Rodentia-rodents). Within Laurasiatheria, Eulipotyphla formed the basal-most group, with the remaining taxa forming a monophyletic group, the Traditional Clade Scrotifera (Orders Pholidota, Carnivora, Chiroptera, and finally the Traditional Clade Euungulata, containing Perissodactyla and Cetartiodactyla). Perissodactyla was not recovered as a monophyletic group sister to Cetartiodactyla, and this relationship was unsupported in our Zan Bayesian analysis (Roberts et al. Ontogeny). Two tangles formed between the Zan tree and the Supertree (Figs. 3.1 and 3.2C). The first incongruity centered on the placement of Chiroptera, as sister to the ungulate clade (Zan tree) or, alternatively, basal to the clade containing Pholidota, Carnivora, and the ungulates (Supertree). The second involved the position of Pholidota, as sister to the clade containing Carnivora, Chiroptera, and the ungulates (Zan tree) or, alternatively, as sister to the Carnivora (Supertree). In addition, the Traditional Clade Ferae (Carnivora grouping with Pholidota) was not recovered in either the Zan or the Supertree topology.

Within the Euarchontoglires in the Zan tree (Figs. 3.1 and 3.2C), Scandentia was placed as the basal-most taxon, followed by Dermoptera and Primates (all three constitute the Traditional Clade Euarchonta), and the sister clade containing Lagomorpha and Rodentia (Traditional Clade Glires). Although there were no topological tangles between the Zan tree and the Supertree, the initial divergence within Euarchontoglires produced incongruent groups, with Primates placed either with Glires (Zan tree) or as sister to Dermoptera (Supertree).


Intra-ordinal comparisons

For Orders with limited taxon representation (n=1: Cingulata, Tubulidentata, Hyracoidea, Sirenia, Proboscidea, Pholidota, Dermoptera, Scandentia; n=2: Afrosoricida, Lagomorpha; n=3: Eulipotyphla; n=4: Perissodactyla), no multiple-species comparisons, and therefore no tanglegrams, were possible. See Table 3.1 for a list of those species and their taxonomic designations. Phylogenetic relationships within all other Eutherian Orders are compared below.

Order Carnivora--The Carnivora comprised seven families (Figs. 3.1 and 3.3A), with the Suborder Feliformia (Infraorder Feloidea, which includes the Family Felidae-cats and relatives) as the basal-most group, followed by members of the Suborder Caniformia (Infraorder Canoidea, which includes the Family Canidae-dogs and relatives, and the Families Ursidae-bears and pandas, Mustelidae-otters and weasels, Phocidae-earless seals, Otariidae-eared seals, and Odobenidae-walruses). Subsequently, the Canidae diverged from the remaining Caniformia (Infraorder Arctoidea), which then split into two clades: one containing Ursidae and Mustelidae and the other comprising the pinnipeds (Phocidae, Otariidae, and Odobenidae), with the latter two families forming a sister-taxon relationship. Despite an absence of tangles in the topology (Fig. 3.3A), the divergence leading to Arctoidea (ursids, mustelids, and pinnipeds) revealed an incongruity: the Mustelidae grouped with Ursidae in the Zan tree but with the pinnipeds in the Supertree.

The Carnivora comprised 13 species, all of which grouped within their respective families (Figs. 3.1 and 3.3B). Within Felidae in the Zan tree, Felis (domestic cat) was placed basal to the lineage containing Acinonyx (cheetah) and Panthera (lion). In the Canidae, Vulpes (red fox) and Canis (domestic dog) formed a sister relationship; Ailuropoda (giant panda) and Ursus (polar bear) were placed in Ursidae; Mustela (domestic ferret) and Enhydra (sea otter) were placed in Mustelidae; and Callorhinus (northern fur seal) and Neomonachus (Hawaiian monk seal; Zan tree) or Mirounga (elephant seal; Supertree) were placed in Phocidae. The families Odobenidae and Otariidae were each represented by a single species in this study; consequently, Odobenus (walrus) and Leptonychotes (Weddell seal) grouped as sister taxa. The Supertree was similar to the Zan tree with two exceptions. The first was in Felidae, where Panthera (lion) grouped with Acinonyx (cheetah) in the Zan tree but with Felis (domestic cat) in the Supertree. The second involved Mustela and Enhydra, which grouped with Ursus and Ailuropoda in the Zan tree but with Callorhinus, Odobenus, and Leptonychotes/Mirounga in the Supertree.

Order Chiroptera--In the Zan tree (Figs. 3.1 and 3.4A), the five families comprising the Order Chiroptera formed two clades. The first included the Suborder Yangochiroptera (Families Miniopteridae-long-winged bats and Vespertilionidae-vesper bats), whereas the second included the Suborder Yinpterochiroptera (Families Pteropodidae-megabats, Rhinolophidae-horseshoe bats, and Hipposideridae-Old World leaf-nosed bats). Within the Yinpterochiroptera, Rhinolophidae and Hipposideridae formed a sister relationship. Tangles in the chiropteran family tree revealed an incongruity stemming from the basal split: Pteropodidae was placed as sister to Rhinolophidae and Hipposideridae in the Zan tree but as basal to the yangochiropteran families in the Supertree.


The ten bat species included in this study all grouped within their respective families (Figs. 3.1 and 3.4B). In the Yangochiroptera, Miniopterus (Natal long-fingered bat), the only genus representing the Family Miniopteridae, was basal to Vespertilionidae, with Eptesicus (big brown bat) and Myotis spp. (M. davidii: David's myotis; M. brandtii: Brandt's bat; and M. lucifugus: little brown bat) grouping together. Within the genus Myotis, M. brandtii and M. lucifugus formed a sister relationship, with M. davidii as the basal-most species. Within the Pteropodidae, Pteropus alecto (black flying fox) and P. vampyrus (large flying fox) grouped together, with Rousettus (Egyptian fruit bat) as the basal taxon. At the species level, tanglegram comparisons revealed no incongruities within the Miniopteridae, Vespertilionidae, and Pteropodidae.

Order Cetartiodactyla--The Cetartiodactyla (Figs. 3.1 and 3.5A) comprised nine families, with Camelidae (camels) as the basal-most Family within Suborder Tylopoda. The next divergence separated Suidae (pigs and relatives), within Suborder Suina, from the remaining members of the Cetartiodactyla: one large cohort, Suborder Whippomorpha, including the aquatic families Balaenopteridae (rorquals), Physeteridae (sperm whale), Lipotidae (river dolphins), Monodontidae (beluga whale), and Delphinidae (dolphins and orcas), and another, Suborder Ruminantia, including the terrestrial families Cervidae (deer) and Bovidae (cattle, goats, and relatives). Within the aquatic group, the Infraorder Cetacea, Parvorders Mysticeti and Odontoceti were sister groups, with the Balaenopteridae placed basally, followed by Physeteridae, then Lipotidae, with Lipotidae basal to a clade containing Monodontidae and Delphinidae. For the Cetartiodactyla, the Zan tree and Supertree were in topological agreement.

Eighteen species of Cetartiodactyla all clustered within their respective families (Figs. 3.1 and 3.5B). Within the Camelidae, Vicugna was placed basal to a clade containing the two Camelus species (C. dromedarius-Arabian camel and C. bactrianus-Bactrian camel). Sus was the only representative genus of the family Suidae, as were Balaenoptera (Balaenopteridae), Physeter (Physeteridae), Lipotes (Lipotidae), and Delphinapterus (Monodontidae) for their respective families. The genera Tursiops and Orcinus formed a sister relationship as members of the Delphinidae. Within the ruminants, Odocoileus was basal to a clade comprising the Bovinae (cows and relatives) and the Caprinae (goats and relatives). In the bovine clade, Bubalus was basal to the clade containing the three Bos species (B. bison-bison, B. grunniens-yak, and B. taurus-cow), with the latter two Bos species forming a sister relationship. The Caprinae contained Pantholops as the basal taxon, with Capra and Ovis as each other's closest relatives. The Supertree was identical to the Zan tree except for a polytomy containing cervids, bovines, and caprines in the Supertree, which was resolved in the Zan tree.

Order Primates--Composed of ten families, the Primates (Figs. 3.1 and 3.6A) can be divided into two large clades: one containing the Suborder Strepsirrhini (Families Galagidae-bushbabies, Cheirogaleidae-mouse lemurs, and Indriidae-sifakas) and the second including the Suborder Haplorhini (Families Tarsiidae-tarsiers, Cercopithecidae-baboons and macaques, Hylobatidae-gibbons, Hominidae-apes, Cebidae-capuchins and squirrel monkeys, Callitrichidae-marmosets and tamarins, and Aotidae-night monkeys). Within the Strepsirrhini, Cheirogaleidae and Indriidae were sister taxa, with Galagidae as the basal-most Family, all within the Infraorder Lemuriformes. Within the Haplorhini, Tarsiidae formed the basal-most taxon (Infraorder Tarsiiformes), followed by a cladogenic event that divided the Infraorder Simiiformes into two clades: one containing the Parvorder Catarrhini (Old World anthropoids: Families Cercopithecidae, Hylobatidae, and Hominidae) and the second comprising the Parvorder Platyrrhini (New World monkeys: Families Cebidae, Callitrichidae, and Aotidae). Within the Catarrhini, Hylobatidae and Hominidae were sister taxa with Cercopithecidae as basal, and within the Platyrrhini, Callitrichidae and Aotidae grouped as sister taxa, followed by the addition of the Cebidae. The Supertree was identical to the Zan tree except for an unresolved trichotomy involving the Catarrhini.

Primates comprised the most speciose Order (n=26) in the Zan tree, with all genera placed within their respective families (Figs. 3.1 and 3.6B). Because the Families Galagidae (Otolemur, northern greater galago), Cheirogaleidae (Microcebus, gray mouse lemur), and Indriidae (Propithecus, Coquerel's sifaka) each contained a single genus in this study, the species-level relationships were identical to those depicted in the family-level comparison, with Microcebus and Propithecus as sister taxa, followed by the addition of Otolemur. The single representative of the Tarsiiformes (Tarsius, Philippine tarsier) was basal to the clade containing 17 genera in the Simiiformes. Within the Hylobatidae and Hominidae, two monophyletic groups were formed, the first containing Nomascus (northern white-cheeked gibbon) and Pongo (Sumatran orangutan) and the second containing Gorilla (gorilla), Homo (human), and the Pan species, P. troglodytes (chimpanzee) and P. paniscus (pygmy chimpanzee), with the two Pan species forming a sister group. In the Family Cercopithecidae, the two species of Colobus (C. angolensis-Angola colobus and C. polykomos-Ugandan colobus) formed a clade, separate from the remaining eight species. This second clade depicted Chlorocebus (green monkey) as basal to a large clade containing three species of Macaca (M. nemestrina-pig-tailed macaque, M. fascicularis-crab-eating macaque, and M. mulatta-rhesus macaque) and a second clade of Cercocebus (sooty mangabey), Mandrillus (drill), Theropithecus (gelada), and Papio (hamadryas baboon). Within the second clade, Cercocebus was the basal-most member, followed by the addition of Mandrillus and then the sister group containing Theropithecus and Papio. Two groups were depicted in the Platyrrhini, the first containing Aotus (Ma's night monkey) as basal to the sister taxa Saguinus (cotton-top tamarin) and Callithrix (white-tufted-ear marmoset), and the second with Cebus (white-faced capuchin) basal to the sister group containing the two species of Saimiri (S. boliviensis-Bolivian squirrel monkey and S. sciureus-common squirrel monkey).

The Zan tree and Supertree differed in three instances, all of which were resolved in the Zan tree. The first incongruity involved the placement of Nomascus as sister to Pongo in the Catarrhini; the second was a tangle in the Cercopithecidae, with Cercocebus and Mandrillus grouping in a sister relationship to a clade containing Theropithecus and Papio; and the third was a trichotomy in the Supertree among Aotus, the Saguinus and Callithrix pair, and the Cebus and Saimiri group.

Order Rodentia--The Zan tree for the Rodentia (Figs. 3.1 and 3.7A) contained 11 families. This tree depicted a clade containing the Suborder Hystricomorpha (Families Bathyergidae-blesmols or mole-rats, Caviidae-guinea pigs and relatives, Octodontidae-degus and relatives, and Chinchillidae-chinchillas and relatives) and another group containing the remaining seven rodent families, representing the Suborders Sciuromorpha (Family Sciuridae-squirrels and relatives), Castorimorpha (Families Castoridae-beavers and Heteromyidae-kangaroo rats and relatives), and Myomorpha (Families Dipodidae-jerboas and relatives, Spalacidae-Old World mole rats, Muridae-Old World rats and mice, and Cricetidae-New World rats and mice). Within the Hystricomorpha, Octodontidae and Chinchillidae formed a sister-taxon relationship, followed by sequential inclusion of Caviidae and Bathyergidae, with the Bathyergidae as the basal taxon. Within the Myomorpha, Muridae and Cricetidae were sister taxa, followed by sequential inclusion of Spalacidae and Dipodidae. Members of the Myomorpha and Castorimorpha formed a sister group, which was then joined by the sole member of the Sciuromorpha (Family Sciuridae).

Tangles between the Zan tree and the Supertree were extensive among the hystricomorph and sciuromorph rodents, with more than five incongruities based on the placement of Bathyergidae, Caviidae, Octodontidae, Chinchillidae, and Sciuridae. The Supertree depicted Sciuridae as basal to all other rodent families and included an unresolved trichotomy among the hystricomorph families Chinchillidae, Bathyergidae, and Caviidae. For the clade containing myomorph rodents, the Supertree trichotomy among Spalacidae, Muridae, and Cricetidae was resolved in the Zan tree.

Rodentia comprised the second largest Order in the Zan tree, with 19 species, all of which grouped within their respective families (Figs. 3.1 and 3.7B). Because Bathyergidae (Fukomys, Damara mole-rat), Octodontidae (Octodon, common degu), Caviidae (Cavia, guinea pig), and Chinchillidae (Chinchilla, chinchilla) were each represented by a single genus, species-level relationships among them were identical to those of the Zan family-level tree. The genera Sciurus and Marmota within Sciuridae formed a sister relationship that was basal to the remaining genera of the Castorimorpha (Castoridae: Castor, American beaver; Heteromyidae: Dipodomys, Ord's kangaroo rat) and Myomorpha (Dipodidae: Jaculus, lesser Egyptian jerboa; Spalacidae: Spalax, Upper Galilee Mountains blind mole rat). The single genera representing Castoridae and Heteromyidae were sister taxa and basal to the members of Myomorpha. Within Muridae, the three species of Mus (M. musculus: house mouse; M. caroli: Ryukyu mouse; and M. pahari: shrew mouse) formed a monophyletic group, with M. musculus and M. caroli grouping as sister taxa. The Mus clade was then joined by Rattus (Norway rat) and Meriones (Mongolian gerbil). Within the Cricetidae, Mesocricetus (golden hamster) and Cricetulus (Chinese hamster) were sister taxa, followed by sequential inclusion of Microtus (prairie vole) and Peromyscus (prairie deermouse). The murid and cricetid clades were sister taxa and were then joined by Spalax and Jaculus. In general, the topologies of the Zan gene tree and the Supertree were identical except where taxa in the Supertree could not be resolved, including a large polytomy containing the genera Spalax, Meriones, Rattus, Mus spp., Peromyscus, Microtus, Mesocricetus, and Cricetulus.

Discussion

We conclude that Zan sequences are remarkably informative and robust for a single-character molecular marker. A comparison of the topologies generated from Zan sequences and from a Supertree analysis revealed that the two trees were highly congruent for non-controversial, well-supported mammalian relationships. However, where the Zan tree and Supertree were incongruent at both basal (ancient placental divergence) and terminal nodes (speciose taxon groups) in the mammalian tree, the Zan tree provided plausible and robust relationships, supported by other molecular studies.

In traditional phylogenetic studies, characters involved in adaptive evolution tend to be misleading because they are lineage- or site-specific and informative only in certain taxa, e.g., characters involved in herbivore metabolism or bat echolocation. Phylogenies generated from these types of datasets often answer only taxon- and system-dependent questions and can be affected by differences in generation time (the rate at which new mutations accumulate depends on the number of generations elapsed rather than on absolute time) and in population size (genetic drift is stronger in smaller populations). Further, species-specific differences are inherently taxon-dependent and can obscure taxonomic and phylogenetic relationships among groups of organisms. These processes can involve, for example, sexual selection, modifications in metabolism and reproduction, changes in the pervasiveness and intensity of selection along a lineage, and modifications in protein structure, function, and evolution.
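The generation-time effect noted above can be illustrated with simple arithmetic. Assuming a constant neutral mutation rate per generation, the expected number of substitutions per site accumulated along a lineage is the per-generation rate multiplied by the number of generations, that is, elapsed time divided by generation time. The rate and generation times below are arbitrary illustrative values, not estimates for any taxon in this study, and the function name is hypothetical.

```python
def expected_substitutions(mu_per_generation, years, generation_time_years):
    """Expected neutral substitutions per site along one lineage.

    Assumes a constant per-generation rate, so the yearly rate is mu / generation time.
    """
    generations = years / generation_time_years
    return mu_per_generation * generations


mu = 1e-8       # illustrative per-site, per-generation rate
elapsed = 1e6   # one million years along each lineage
short_gen = expected_substitutions(mu, elapsed, 0.5)   # e.g., a small, fast-breeding mammal
long_gen = expected_substitutions(mu, elapsed, 25.0)   # e.g., a large, slow-breeding mammal
print(f"{short_gen:.4f} vs {long_gen:.4f}; ratio {short_gen / long_gen:.0f}x")
```

Over the same elapsed time, the short-generation lineage accumulates about 50 times more change, which is why branch lengths from such characters need not track time since divergence.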

Like single-character comparisons, single-gene comparisons are phylogenetically unreliable because an individual gene's divergence necessarily reflects the potentially lineage-specific function of its product. Hypothetically, however, divergence of a gene that tracks speciation events, such as Zan, could serve as a valid phylogenetic marker. As a speciation gene, Zan has the potential to be a more informative and valid single-gene marker of mammalian phylogeny than neutrally evolving genes that serve only as molecular clocks, because divergence of the gene directly reflects speciation.

With incongruities at only 26% of nodes between the Zan tree and the Supertree across all taxonomic levels, the Zan tree is remarkably informative and robust for a single-character phylogeny. The phylogeny obtained from 6 kb of Zan DNA sequence had stronger support across all levels of the topology, with more than 95% of nodes (107 of 112) supported, whereas the widely accepted Supertree (Bininda-Emonds et al. 2007), constructed from roughly eight-fold more sequence data (51 kb) from 65 additional genes, contained fewer supported nodes and many unresolved polytomies.
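The contrast between a resolved tree and one with polytomies can be quantified as the fraction of possible internal nodes actually present: a fully bifurcating rooted tree with n taxa has n - 1 internal nodes, and each polytomy removes at least one. The sketch below assumes trees written as nested Python tuples and hypothetical function names; the example uses the myomorph families, with the Supertree's Spalacidae-Muridae-Cricetidae trichotomy as described in the Results and Dipodidae assumed to be basal in both trees.

```python
def count_leaves(tree):
    """Number of taxa in a nested-tuple tree."""
    if isinstance(tree, str):
        return 1
    return sum(count_leaves(child) for child in tree)


def count_internal(tree):
    """Number of internal nodes, including the root; a polytomy counts once."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_internal(child) for child in tree)


def resolution(tree):
    """Fraction of the n - 1 nodes of a fully bifurcating rooted tree that are present."""
    return count_internal(tree) / (count_leaves(tree) - 1)


zan = ("Dipodidae", ("Spalacidae", ("Muridae", "Cricetidae")))
supertree = ("Dipodidae", ("Spalacidae", "Muridae", "Cricetidae"))
print(resolution(zan), round(resolution(supertree), 3))  # 1.0 0.667
```

Resolution alone says nothing about support; a fully bifurcating tree can still have weakly supported nodes, which is why posterior probabilities are reported separately.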

Our results revealed that the Zan tree proved superior to prior studies, not only in affirming previously proposed and supported mammalian relationships but also in resolving many outstanding phylogenetic controversies across all taxonomic levels of the Eutherian tree that have traditionally been difficult to resolve. Cases in which Zan yielded a more resolved phylogeny for placental groups (the placental ancestral group, afrotherians, insectivores, pangolins, and bats) than did other contemporary studies (e.g., Bininda-Emonds et al. 2007) are discussed briefly below. Our results reveal a near-complete resolution of higher-level phylogenetic relationships, including the placental ancestral divergence, resolving relationships between and within mammalian Superorders. For example, the basal placental divergence between Atlantogenata and Boreoeutheria (Teeling and Hedges 2013), with Afrotheria and Xenarthra placed as sister taxa (Atlantogenata), was recovered using Zan as a molecular marker, consistent with other molecular studies (Waddell et al. 1999b, Hallstrom et al. 2007, Wildman et al. 2007). The inter-ordinal relationships within Afrotheria were resolved in the Zan tree, supporting the placement of Tethytheria within the Paenungulata, as in other studies (Rose 2006, Cooper et al. 2014). The monophyly of Paenungulata and its placement within Afrotheria were recovered using Zan sequences, consistent with molecular evidence (Seiffert and Guillon 2007).

Relationships within Laurasiatheria have historically been difficult to resolve, primarily because no morphological features unite the group (Waddell et al. 1999a). However, studies using whole mitochondrial genome sequences have resolved various relationships within the Superorder (Cao et al. 2000), except for the placement of Chiroptera relative to Carnivora and Perissodactyla. Rapid diversification within Chiroptera, or partial taxon representation in most molecular studies, may have caused the lack of deeper resolution for the placement of Chiroptera relative to other Orders (Teeling et al. 2005, Tsagkogeorga et al. 2013).

Our results affirm the monophyly of Chiroptera within Laurasiatheria (Simmons et al. 2008); however, we cannot comment on the position of Chiroptera as basal to the Euungulata (Cetartiodactyla and Perissodactyla). In this study, Zan sequences resolved a sister relationship between Ferae (Pholidota and Carnivora) and Euungulata that was also suggested by other molecular data (Beck et al. 2006, Zhou et al. 2012). Moreover, the Zan tree showed insectivores to be polyphyletic, grouping shrews, moles, and hedgehogs in Eulipotyphla, whereas tenrecs grouped within Afrotheria, as shown by other molecular studies (Douady et al. 2002, Beck et al. 2006, Bininda-Emonds et al. 2007). The placement of hedgehogs with shrews and moles contrasts directly with the basal position within placental mammals observed with some molecular data (Foley et al. 2016). However, relationships within the insectivore group remain inconsistent, and further resolution may be achieved with greater taxon sampling, including fossil data.

Within Euarchontoglires, a sister relationship between Primates and both Dermoptera and Scandentia, with Scandentia placed as the basal-most taxon, was supported and is consistent with other studies (Kumar et al. 2013). Many studies have not discussed the positions of Dermoptera and Scandentia, and Scandentia has previously been placed as sister to Glires (Rodentia and Lagomorpha; Foley et al. 2016), sister to Primatomorpha (Kumar et al. 2013), sister to Dermoptera (Zou and Zhang 2016), or basal to the entire Superorder (Meredith et al. 2011). In addition, the monophyly of Glires and the sister relationship of Rodentia and Lagomorpha were recovered, as in various previous studies (Meng and Wyss 2001, Meng 2003).

The increased taxon sampling in our Zan analysis provided more robust support for monophyletic groups within various speciose Orders, in contrast to the topologies of the Supertree analysis. Our results reveal a complete resolution of genus-level relationships within Cetartiodactyla, Primates, and Rodentia, three of the most speciose Orders in Eutheria. In addition, these data support a variety of terminal nodes that have been contentious in past studies. For example, in Primates, tarsiers were placed as sister to anthropoids, with lemurs as the basal-most taxon, consistent with the widely held view that tarsiers are more closely related to anthropoids than to lemurs (Schmidt et al. 2005, Horvath et al. 2008).

Further, within Chiroptera, the relationships recovered are consistent with current molecular evidence, with Yangochiroptera and Yinpterochiroptera as distinct sister groups (Van Den Bussche and Hoofer 2004). In the most speciose Order of mammals, Rodentia, the Zan phylogeny resolved contentious relationships, placing the Hystricomorpha as basal to the remaining Suborders, a hypothesis also recovered in many studies. Further, a distinct delineation between and within the remaining Suborders Sciuromorpha, Castorimorpha, and Myomorpha (no representative of Suborder Anomaluromorpha was available) was recovered in the Zan phylogeny, a notable result for any single-character genetic analysis. Because of limited taxon representation in Orders Cingulata (n=1), Eulipotyphla (n=3), Perissodactyla (n=3), Pholidota (n=1), Scandentia (n=1), Dermoptera (n=1), and Lagomorpha (n=2), we make no comments regarding resolution below the level of Order.

The single-character Zan analysis produced a remarkably well-resolved and strongly supported phylogeny, better than that produced by the Supertree obtained from many genes (Bininda-Emonds et al. 2007). However, what criteria should be met to accept a single-gene phylogeny over a widely established Supertree? We used three lines of evidence to answer this question. First, previous analyses (Roberts et al. Ontogeny) constraining the Zan topology to the Supertree topology showed that the Zan phylogeny is significantly different, presumably because the Zan tree is more fully resolved, whereas the Supertree contains many polytomies, particularly at the more terminal nodes (Stadler and Bokma 2013). Second, where the Supertree was unable to resolve polytomies at both basal and terminal nodes, the Zan phylogeny recovered a plausible and supported topological hypothesis, consistent with other molecular studies. Third, there is substantial and mounting evidence that Zan is a speciation gene and therefore tracks species divergence events across taxonomic levels (Roberts et al. Ontogeny). Therefore, DNA sequences from the Zan gene may serve as the ultimate means to resolve and refine relationships at varying divergence levels among placental mammals.

What will be required to further test the utility and possible limitations of Zan as a molecular marker, and to describe species divergence events across all taxonomic levels? At a minimum, such studies must: 1) include all mammalian Orders and a more extensive assortment of species, and 2) test the recovery of lower taxonomic-level delineations by incorporating several pairs of sister taxa in the most speciose genera, e.g., Mus, Peromyscus, and Myotis. Toward this end, it may be that Zan is the preeminent molecular marker that defines genetic species.


[Figure 3.1: family-level Zan tree annotated with Parvorder, Infraorder, Suborder, Order, Traditional Clade, Superorder, and Magnorder designations; outgroup Trionychidae (Chinese soft-shelled turtle; Cryptodira, Testudines).]


Fig. 3.1. Zan gene tree with corresponding taxonomic descriptions. Shown is a consolidated Family-level Zan phylogenetic tree, with taxonomic descriptions, corresponding to the species-level tree previously reported in Roberts et al. Ontogeny. The single node lacking statistical support is marked with a red dot (1 of 61 nodes unsupported). Note the statistically supported monophyletic grouping of all Families into their respective Infraorders, Suborders, and Orders, and of all Orders into their respective Superorders. An 'n' denotes the number of species per Family, if greater than 1. Taxonomy per references indicated with superscripts (1: Rose and Archibald 2005; 2: Seiffert and Guillon 2007; 3: Asher et al. 2009; 4: Rose 2006; 5: Cooper et al. 2014; 6: Halliday et al. 2015; 7: Nishihara, Hasegawa, and Okada 2006; 8: Waddell 1999a; 9: Zhou et al. 2011; 10: Ursing and Arnason 1998; 11: Asher and Helgen 2010; 12: Waddell 1999b; 13: Kriegs et al. 2007; 14: Meng 2003; 15: Meng and Wyss 2001).

Figs. 3.2-3.7. Tanglegram comparisons of the Zan tree and Supertree. Shown are topology comparisons between the Zan gene tree (left) and the Supertree, or "ST" (right), at hierarchical taxonomic levels. Tangles denote incongruities between trees. For incongruities that did not produce a tangle, circles denote the incongruous node, with closed and open circles indicating nodes with and without statistical support, respectively. Information regarding topologies and statistical support for the Zan tree and the reference Supertree can be found in Roberts et al. Ontogeny and Bininda-Emonds et al. (2007), respectively. Fig. 3.2: Superorder and Ordinal trees. Figs. 3.3-3.5: Laurasiatheria Family and Species trees. Figs. 3.6-3.7: Euarchontoglires Family and Species trees. Roman numerals indicate taxonomic groupings.
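Because tangles appear as crossing connector lines, one simple way to quantify them for a given drawing is to count every pair of taxa whose relative order differs between the left and right leaf orders (an inversion count). The sketch below is illustrative only: the function name is hypothetical, and the example leaf orders are read from the extracted labels of Fig. 3.2, panel B, so the exact drawn layout is an assumption. Crossing counts depend on how the nodes of each tree are rotated, so this measures a particular drawing rather than topological distance.

```python
from itertools import combinations


def count_crossings(left_order, right_order):
    """Count crossing connectors between two fixed leaf orders.

    A pair of connectors crosses when two taxa appear in one relative
    order on the left and the opposite order on the right.
    """
    if set(left_order) != set(right_order):
        raise ValueError("both trees must contain the same taxa")
    right_pos = {taxon: i for i, taxon in enumerate(right_order)}
    crossings = 0
    # combinations() keeps left-side order, so `a` is drawn above `b` on the left
    for a, b in combinations(left_order, 2):
        if right_pos[a] > right_pos[b]:
            crossings += 1
    return crossings


# Afrotheria Orders, assumed drawn in this order in Fig. 3.2, panel B
zan = ["Tubulidentata", "Afrosoricida", "Hyracoidea", "Proboscidea", "Sirenia"]
st = ["Tubulidentata", "Afrosoricida", "Proboscidea", "Hyracoidea", "Sirenia"]
print(count_crossings(zan, st))  # 1
```

A count of 1 matches the single tangle described for panel B; tanglegram software typically searches over node rotations to minimize this number before drawing.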

[Fig. 3.2 image: tanglegrams of the Zan tree and Supertree (ST). Panel A, Superorders; panel B, Afrotheria Orders; panel C, Laurasiatheria and Euarchontoglires Orders.]

Fig. 3.2. Superorder- and Ordinal-level trees. Panel A: Superorder tree. Note the absence of tangles, but the grouping of Afrotheria with Xenarthra in the Zan tree versus its divergence basal to the other three Superorders in the Supertree. Roman numerals denote Magnorders Atlantogenata (I) and Boreoeutheria (II). Panel B: Afrotheria Orders. Note the tangle from the grouping of Sirenia with Proboscidea in the Zan tree, but with Hyracoidea in the Supertree. Roman numeral III denotes the Traditional Clade Paenungulata. Panel C: Laurasiatheria (IV) and Euarchontoglires (V) Orders. Within Laurasiatheria, note in the Zan tree the divergence of Pholidota basal to the consecutive divergence of Carnivora and Chiroptera from a clade of Perissodactyla and Cetartiodactyla. In contrast, note in the Supertree the divergence of Chiroptera as the basal Order, followed by a split into two clades, one comprising Pholidota and Carnivora and the other comprising Perissodactyla and Cetartiodactyla. Another discrepancy emerges within Euarchontoglires, where Primates group with Lagomorpha and Rodentia in the Zan tree but with Scandentia and Dermoptera in the Supertree.

[Fig. 3.3 image: tanglegrams of the Zan tree and Supertree (ST) for Order Carnivora. Panel A, Families; panel B, Species.]

Fig. 3.3. Order Carnivora. Panel A: Families. Note the grouping of Ursidae and Mustelidae with a clade comprising Phocidae, Otariidae, and Odobenidae in the Zan tree, but their placement as single-family divergences following the split of Felidae and Canidae in the Supertree. Roman numerals denote Suborders Feliformia (I) and Caniformia (II). Panel B: Species. Note the tangle within Felidae, with Felis basal to Acinonyx and Panthera in the Zan tree but Acinonyx basal to Panthera and Felis in the Supertree. Note also the discrepancy from the grouping of Ailuropoda and Ursus with Mustela and Enhydra in the Zan tree, but the placement of Ailuropoda and Ursus basal to a large clade of Mustela, Enhydra, Callorhinus, Odobenus, Leptonychotes, and Mirounga in the Supertree.

[Fig. 3.4 image: tanglegrams of the Zan tree and Supertree (ST) for Order Chiroptera. Panel A, Families; panel B, Species.]

Fig. 3.4. Order Chiroptera. Panel A: Families. Note the incongruity from the grouping of Pteropodidae with Rhinolophidae and Hipposideridae in the Zan tree, but the placement of Pteropodidae basal to all other bat families in the Supertree. Roman numerals denote Suborders Yangochiroptera (I) and Yinpterochiroptera (II). Panel B: Species. Top: Miniopteridae and Vespertilionidae. Bottom: Pteropodidae. Note the absence of tangles within families Miniopteridae, Vespertilionidae, and Pteropodidae and within the genera Myotis (Myotis d: davidii, Myotis b: brandtii, Myotis l: lucifugus) and Pteropus (Pteropus v: vampyrus, Pteropus a: alecto).

[Fig. 3.5 image: tanglegrams of the Zan tree and Supertree (ST) for Order Cetartiodactyla. Panel A, Families; panel B, Species.]

Fig. 3.5. Order Cetartiodactyla. Panel A: Families. Note the absence of tangles among ungulate Families. Roman numerals denote Suborders Tylopoda (I), Suina (II), Whippomorpha (III), and Ruminantia (IV). Panel B: Species. Note the polytomy within Cervidae and Bovidae in the Supertree that is resolved in the Zan tree. Genera within the polytomy are resolved with Bubalus basal to Bison and Bos spp. (Bos t: taurus, Bos g: grunniens).

[Fig. 3.6 image: tanglegrams of the Zan tree and Supertree (ST) for Order Primates. Panel A, Families; panel B, Species.]

Fig. 3.6. Order Primates. Panel A: Families. In the Supertree, note the polytomy among families Cebidae, Callitrichidae, and Aotidae that is resolved in the Zan tree, with Cebidae placed basal to a clade containing Callitrichidae and Aotidae. Roman numerals denote Suborders Strepsirrhini (I) and Haplorhini (II). Panel B: Species. Despite the absence of tangles, note the incongruities between the Zan tree and the Supertree: Nomascus and Pongo group together in the Zan tree, but Nomascus is placed basal to Pongo in the Supertree. Also, in the Zan tree Cercocebus diverges basal to a clade comprising Mandrillus, Theropithecus, and Papio, but groups as sister to Mandrillus in the Supertree. Finally, the Supertree exhibits a large polytomy comprising Aotus, a clade of Saguinus and Callithrix, and a clade including Cebus and Saimiri spp. (Saimiri b: boliviensis, Saimiri s: sciureus) that is resolved in the Zan tree. Other abbreviations are: Pan t: troglodytes, Pan p: paniscus, Colobus a: angolensis, Colobus p: polykomos, Macaca n: nemestrina, Macaca f: fascicularis, Macaca m: mulatta.

[Fig. 3.7 image: tanglegrams of the Zan tree and Supertree (ST) for Order Rodentia. Panel A, Families; panel B, Species.]

Fig. 3.7. Order Rodentia. Panel A: Families. Note the extensive tangles among the Families Bathyergidae, Caviidae, Octodontidae, Chinchillidae, and Sciuridae. Note also the other incongruities: in the Zan tree a clade comprising Bathyergidae, Caviidae, Octodontidae, and Chinchillidae is basal, whereas in the Supertree Sciuridae alone diverges basal to the other families; a polytomy in the Supertree comprising Bathyergidae, Caviidae, and Chinchillidae is resolved in the Zan tree; in the Zan tree Castoridae and Heteromyidae group with a clade comprising Dipodidae, Spalacidae, Muridae, and Cricetidae, but in the Supertree they group instead with a clade comprising Octodontidae, Bathyergidae, Caviidae, and Chinchillidae; finally, a polytomy of Spalacidae, Muridae, and Cricetidae in the Supertree is resolved in the Zan tree. Roman numerals denote Suborders Hystricomorpha (I), Sciuromorpha (II), Castorimorpha (III), and Myomorpha (IV). Panel B: Species. Note the extensive tangles among Fukomys, Cavia, Octodon, Chinchilla, Ictidomys, and Marmota between the Zan tree and the Supertree. Note also in the Supertree a large polytomy comprising Spalax, Meriones, Rattus, Mus spp. (Mus p: pahari, Mus c: caroli, Mus m: musculus), Peromyscus, Microtus, Mesocricetus, and Cricetulus that is resolved in the Zan tree.


Table 3.1. Orders with limited taxon representation (and therefore no tanglegrams) and the taxonomic designations of included species, listed in the order Xenarthra, Afrotheria, Laurasiatheria, Euarchontoglires.

Order           Family            Species                   Common Name
Cingulata       Dasypodidae       Dasypus novemcinctus      Nine-banded armadillo
Tubulidentata   Orycteropodidae   Orycteropus afer          Aardvark
Afrosoricida    Tenrecidae        Echinops telfairi         Lesser hedgehog tenrec
                Chrysochloridae   Chrysochloris asiatica    Cape golden mole
Hyracoidea      Procaviidae       Procavia capensis         Rock hyrax
Sirenia         Trichechidae      Trichechus manatus        West Indian manatee
Proboscidea     Elephantidae      Loxodonta africana        African savannah elephant
Eulipotyphla    Talpidae          Condylura cristata        Star-nosed mole
                Soricidae         Sorex araneus             Common shrew
                Erinaceidae       Erinaceus europaeus       European hedgehog
Pholidota       Manidae           Manis javanica            Malayan pangolin
Perissodactyla  Rhinocerotidae    Ceratotherium simum       Southern white rhinoceros
                Equidae           Equus asinus              Donkey
                                  Equus caballus            Horse
                                  Equus przewalskii         Przewalski's horse
Dermoptera      Cynocephalidae    Galeopterus variegatus    Sunda flying lemur
Scandentia      Tupaiidae         Tupaia belangeri          Chinese treeshrew
Lagomorpha      Octodontidae      Octodon degus             Pika
                Leporidae         Oryctolagus cuniculus     Domestic rabbit


CHAPTER IV

ZAN VWD DOMAIN DUPLICATION AND DIVERGENCE REFLECT UNIQUE EVOLUTION OF A SPECIATION GENE IN MYOMORPH RODENTS

Abstract

The rapid evolution of mosaic proteins during the Metazoan radiation occurred in part by duplication and functional divergence of protein domains. Internal tandem duplications can lead to the rapid expansion of protein domain repeats that may then evolve new functions and thereby promote the adaptation of species. The molecular evolution of Zan, which encodes a mosaic protein that mediates species-specific recognition between sperm and egg, underpins its function as a speciation gene in Eutheria.

Notably, tandem duplication of a two-exon cassette encoding the TIL domain portion of the Zan von Willebrand type-D (VWD) domain 3 has produced dramatic expansions of partial VWD domains (designated D3p) in the rodent Suborder Myomorpha, with 20 D3p repeats first identified in Mus musculus. Because the molecular evolution of orthologous protein domains may reflect phylogenetic relationships between related species, here we characterized the relationship between phylogeny and the pattern of D3p expansion in 12 myomorph species. The number of D3p domains varied between species, ranging from zero repeats in the most primitive species examined, Jaculus jaculus, to 24 repeats in Peromyscus maniculatus. The number of repeats generally reflected the species richness of taxa in the more terminal branches of rodent phylogeny, with 23-24 D3p domains in Peromyscus spp., 20 in Mus and in cricetid hamsters, yet only nine in Spalax. Bayesian and Maximum Likelihood analyses of DNA sequences encoding all 235 individual D3p domains identified 21 conserved domain Groups, with three additional novel domain Groups specific to Spalax, Meriones, and the murid species examined. Further phylogenetic and comparative analyses identified putative primitive tandem repeats based on their statistically supported association with each other and on their presence in all myomorph species examined except Jaculus jaculus. The composition of the Groups indicates that duplication events occur at the 5' end of the expansion, with progressively more primitive domains pushed downstream, remaining near the D4 VWD domain and the C-terminus of the protein. Surprisingly, the Groups identified as most primitive did not diverge at the base of the phylogenetic tree, suggesting an effect of selection on the evolution of the domain Groups. Groups II, V, X, and XI exhibited the greatest intensity of positive selection but also variable support among nodes in the phylogeny, suggesting a powerful effect of differential positive selection on the functions of those domains. We conclude that a series of tandem D3p domain duplication events occurred after the divergence of myomorph rodents, and that the duplication pattern reveals evidence of concerted gene evolution and of pervasive, often strongly positive selection within Groups of tandem repeats. The findings suggest that the extent of D3p domain expansion may be a useful criterion both for inclusion of species in Myomorpha and for their placement within the various myomorph genera.

Introduction

Gene duplications underlie not only the evolution of new biological functions, but also the evolution of increasingly complex organisms. Theoretical studies often assume that a duplication per se is selectively neutral, and that following a duplication one of the gene copies is ‘freed’ from purifying (stabilizing/negative) selection. In this view, duplicated genes serve as an essential source of new genetic raw material for the evolution of new functions under positive selection, and thus give rise to new gene family members, as exemplified by the vast assortment of related kinases, dehydrogenases, globins, and other paralogous gene products throughout biology.

Mosaic proteins constitute the majority (80%) of proteins in Metazoa and have contributed in large part to the diverse evolutionary history of many animal groups (Doolittle 1995). In multi-domain proteins, domain subunits exhibit distinct structures that reflect their function and evolutionary history. Thus, as with duplication of entire genes, duplication of DNA encoding protein domains provides a source of new genetic material for adaptation and the evolution of species. Protein domains are evolutionarily mobile, being continuously duplicated, rearranged, and sometimes deleted to generate a variety of mosaic proteins with differing functions in various lineages, as shown by the many known protein domain families, each comprising numerous functional proteins (Doolittle 1995). With the growing wealth of sequence and annotation data of the phylogenomic era, unknown structures can be modeled, new evolutionary relationships revealed, phylogenies inferred, and species adaptations discovered. Extensive studies spanning many years have documented the creation of new multi-domain architectures through shuffling of protein domains, but they have not fully delineated the origin, mechanism, and evolutionary significance of internal tandem duplications in most proteins and animal systems.

An effective way to study the evolution of protein structure and function is to examine the evolution of domain composition in functional proteins, because changes in domain architecture reflect large alterations at the gene level.

Characterizing the molecular evolution of orthologous protein domains may reveal relationships between related species, especially in proteins that contribute to processes such as adaptation, reproductive isolation, and speciation. Moreover, functional proteins that vary markedly through internal tandem duplication offer an opportunity to resolve relationships among such species more finely and to examine their divergence events.

The gamete recognition gene Zan encodes zonadhesin, a large mosaic protein of the sperm acrosome that mediates adhesion to the egg’s zona pellucida (ZP) during mammalian fertilization (Hardy and Garbers 1994, Hardy and Garbers 1995, Tardif et al. 2010a). The ZP-binding activity of pig zonadhesin maps to its D0-D4 VWD domains (Hardy and Garbers 1994, Hardy and Garbers 1995), and previous studies of the molecular evolution of Zan D0-D4 domains revealed that their rapid divergence by positive selection confers species specificity on mammalian fertilization, which in turn serves as a mode of prezygotic reproductive isolation, thereby promoting speciation (Roberts et al. Ontogeny). Consequently, as a speciation gene, Zan appears to be a more informative molecular marker of mammalian phylogeny than other genes and Supertree datasets (Roberts et al. Phylogeny), because divergence of the gene directly reflects species divergence events.

Whereas much phylogenetic information can be obtained from studies of the Zan D0-D4 VWD domains conserved in all Eutherian species, such studies have not considered the contribution of variable domain duplication to Zan’s function in reproduction and speciation. Indeed, Zan’s molecular evolution also includes dramatic expansions of partial VWD domains (D3p domains; Gao and Garbers 1998, Wilson et al. 2001) in some rodent species, specifically in Suborder Myomorpha, raising the possibility that these domains also contribute to the protein’s function in gamete recognition. The absence of D3p domains in most placental mammals suggests that their duplication represents a dramatic and comparatively recent development in the molecular evolution of Zan that is unique to Suborder Myomorpha. Importantly, the partial VWD domains in these species likely contribute to egg recognition, as the D3p18 domain of mouse zonadhesin is highly autoimmunogenic (Wheeler et al. 2011), and affinity-purified antibody to it inhibits mouse sperm-egg adhesion (Tardif et al. 2010a). The D3p domain repeats likely arose by internal tandem duplications (Tardif et al. 2010a), either of the original domain copy (D3 TIL; Gao and Garbers 1998) or of another D3p. Nevertheless, why this dramatic D3p expansion occurred only in a subset of rodent species remains unclear.

Here we tested the hypothesis that duplication and divergence of Zan D3p domains contributed to the divergence of myomorph rodents. Specifically, we examined the molecular evolution of the domain architecture of the Zan gene and its encoded protein across myomorph species by: 1) determining the presence and pattern of D3p domains in a variety of species, 2) comparing alignments of individual D3p domain sequences to detect primitive copies, and 3) inferring the likely processes of D3p domain internal tandem duplication. The results suggest that D3p domains contributed to speciation in the unique evolutionary history of Myomorpha.

Methods

Retrieval and comparison of Zan sequences

We first retrieved nucleotide sequences annotated as “zonadhesin” in GenBank for all available rodent species (last accessed 05 September 2019). Among the retrieved sequences, we identified authentic Zan (based on conserved domain composition and synteny; see Roberts et al. Ontogeny) in 20 rodent species representing 17 genera and 11 families.

To test whether zonadhesin D3p variation correlates with divergence of rodent species, we identified and compared D3 TIL and D3p nucleotide sequences among all available myomorph species. Tandem D3 TIL and D3p domains for each species were individually detected by Dot Plot comparison (range: 50-60% match threshold, 30-residue window; MegAlign program; Lasergene 14 software, DNASTAR Inc., Madison, WI). Mus musculus D3p18 served as the query sequence because of its high autoimmunogenicity among D3p domains (Wheeler et al. 2011). See Table 4.1 for accession numbers of all rodent Zan sequences. Manual quality assessment and subsequent splicing were performed in Lasergene (see above), including assembly of a conceptually predicted contig of Meriones Zan D3p3 based on high domain divergence between tandem repeats.
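The Dot Plot search can be illustrated with a minimal, ungapped sketch: slide a fixed window along two sequences and record every pair of windows whose identity meets a threshold, so that tandem repeats appear as lines parallel to the main diagonal. This is not MegAlign's implementation; the function names, the default threshold of 0.55 (picked from within the 50-60% range above), and the toy sequence are hypothetical.

```python
def dot_plot(query, target, window=30, threshold=0.55):
    """Ungapped dot plot: return (i, j) start positions where the
    `window`-long segment of `query` at i matches the segment of
    `target` at j with identity >= `threshold`."""
    hits = []
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(target) - window + 1):
            t = target[j:j + window]
            identity = sum(a == b for a, b in zip(q, t)) / window
            if identity >= threshold:
                hits.append((i, j))
    return hits


def diagonal_offsets(hits):
    """Distinct positive offsets (j - i) of off-diagonal hits; in a
    self-comparison these give the spacing between tandem copies."""
    return sorted({j - i for i, j in hits if j > i})


# Toy self-comparison: three perfect tandem copies of an 8-nt unit,
# scanned with a short window and an exact-match threshold for clarity.
seq = "ACGTTGCA" * 3
hits = dot_plot(seq, seq, window=8, threshold=1.0)
print(diagonal_offsets(hits))  # [8, 16]
```

With a real query such as a single D3p domain against a full Zan sequence, each D3p copy would instead show up as a separate diagonal run of hits along the target axis.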

Comparison of Zan in monotremes and placentals

To evaluate zonadhesin among mammalian species, we first characterized the exon and encoded domain structures of mouse Zan and then compared them with those of a Zan-like gene (designated ZanL) from a monotreme, the platypus, and with those of human ZAN.

Sequence alignments and phylogenetic analysis of Zan

Myomorph Zan DNA sequences encoding the VWD3 TIL and D3p domains (272-437 nt each) were aligned using the ClustalW method in MegAlign (Lasergene 14 software, DNASTAR Inc., Madison, WI). To confirm correct reading frames and detect premature stop codons, we translated the aligned sequences in EditSeq and SeqBuilder (Lasergene 14 software, DNASTAR Inc., Madison, WI). All alignments and resultant trees will be made available in the DRYAD Digital Repository.
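The reading-frame check can be approximated by stripping alignment gaps and scanning each frame for stop codons under the standard genetic code. The sketch below is a hypothetical stand-in for the translation step, not a description of how EditSeq or SeqBuilder work; the function names and the example fragment are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stops(cds, frame=0):
    """Return 0-based codon indices of stop codons in the given frame.
    Alignment gaps ('-') are removed first, so positions refer to the
    ungapped sequence."""
    seq = cds.upper().replace("-", "")
    return [k for k, pos in enumerate(range(frame, len(seq) - 2, 3))
            if seq[pos:pos + 3] in STOP_CODONS]


def open_frames(cds):
    """Frames (0, 1, 2) that contain no stop codon."""
    return [f for f in range(3) if not premature_stops(cds, f)]


aligned = "ATG---GCCTGAAAA"        # hypothetical aligned fragment
print(premature_stops(aligned))    # [2]: TGA is the third codon in frame 0
print(open_frames(aligned))        # [1, 2]
```

A domain fragment flagged with an in-frame stop in its expected frame would then be inspected manually for a sequencing error, a frameshift, or an incorrect exon boundary.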

We evaluated 56 nucleotide substitution models by maximum likelihood using hierarchical likelihood ratio tests (hLRTs) and the Akaike Information Criterion (AIC) in jModelTest2 (Darriba et al. 2012) and identified GTR+I+G as the best-fit model (-lnL = 28772.27). The partial VWD0 TIL domain from Mus musculus was selected as the outgroup. For Bayesian inference, we used MrBayes v3.2.6 (Ronquist et al. 2012) with two independent runs of four chains each (one cold and three heated; Metropolis-coupled Markov chain Monte Carlo, MCMCMC) for 10 million generations, sampling every 1,000 generations and retaining samples from the last 9,000,000 generations. We then constructed a 50% majority-rule consensus tree from the retained trees and plotted statistically significant posterior probabilities (≥0.95) on the topology in FigTree v1.4.4 (Rambaut 2018). Maximum likelihood analysis was performed in RAxML v8.2.12 (Stamatakis 2018) with the following parameters: base frequencies (A = 0.217, C = 0.315, G = 0.262, T = 0.206), proportion of invariable sites (I = 0.019), and gamma shape parameter (α = 1.184). Nodal support was evaluated with 10,000 bootstrap iterations, and bootstrap values ≥70 were taken to indicate moderate-to-strong statistical support (Felsenstein 1985).
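Both support measures described above rest on the same summary: the frequency of a clade across a collection of trees, whether post-burn-in MCMC samples (posterior probability) or trees inferred from bootstrap pseudo-replicates (bootstrap proportion). The sketch below shows the column resampling of a nonparametric bootstrap and a 50% majority-rule filter. Tree inference itself (MrBayes, RAxML) is not reproduced, and representing each tree as a set of clades (frozensets of taxon names) is a simplifying assumption; all names and example data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """One nonparametric bootstrap pseudo-replicate: draw alignment
    columns with replacement until the original length is reached."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


def clade_frequencies(trees):
    """Fraction of trees containing each clade; a tree is represented
    here as a set of clades, each clade a frozenset of taxon names."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


def majority_rule(trees, cutoff=0.5):
    """Clades found in more than `cutoff` of the trees."""
    return {c: f for c, f in clade_frequencies(trees).items() if f > cutoff}


rng = random.Random(42)
replicate = bootstrap_alignment({"a": "ACGTAC", "b": "ACGTTC"}, rng)
print(len(replicate["a"]))  # 6: same length as the original alignment

# Three hypothetical sampled (or bootstrap) trees over taxa A-D
trees = [{frozenset("AB"), frozenset("ABC")},
         {frozenset("AB"), frozenset("ABD")},
         {frozenset("AB"), frozenset("ABC")}]
for clade, freq in sorted(majority_rule(trees).items(), key=lambda kv: -kv[1]):
    print("".join(sorted(clade)), round(freq, 2))
# AB 1.0
# ABC 0.67
```

The clade ABD, present in only one of three trees, falls below the 50% cutoff and is dropped from the consensus.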

Selection tests

We globally evaluated individual domain Group alignments for the relative contributions of neutral evolution, negative selection, and positive selection using the CODEML program in the PAML 4.9d package (Yang 2007). Omega values (ω, dN/dS ratios) were calculated from the individual alignment for each domain Group with three comparisons, wherein the null model, M0, assumed one global ratio and constrained ω to be equal on all branches in the phylogeny. The initial comparison (M1 vs. M2 and M7 vs. M8) tested for neutrality, wherein M1 and M7 assumed independent ω ratios for all branches in the phylogeny. The subsequent comparisons (M1 vs. M2 and M7 vs. M8) tested for selection, wherein both M2 and M8 allowed ω > 1 and detected variation in ω among sites, using a Bayes Empirical Bayes approach to calculate posterior probabilities for sites under selective pressure (Yang 2007). We then determined which model (M1, M7, M2, or M8) was most appropriate for each domain Group by likelihood ratio tests (LRTs), using a chi-squared distribution with two degrees of freedom for both comparisons (M1 vs. M2 and M7 vs. M8) and a significance threshold of p < 0.05. Selection tests were not performed on domain Groups containing fewer than four D3p domains (e.g., Spalax D3p6-7).
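The model comparisons above reduce to one statistic: twice the log-likelihood difference between the nested models, 2ΔlnL = 2(lnL of the alternative model minus lnL of the null model), compared with a chi-squared distribution. With two degrees of freedom the chi-squared upper-tail probability has the closed form p = exp(-2ΔlnL / 2), so no statistics library is needed. The sketch below assumes that form; the function name and the log-likelihood values are hypothetical and do not come from this study. (As a reminder, ω < 1, ω = 1, and ω > 1 indicate purifying selection, neutrality, and positive selection, respectively.)

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested codon models differing by two
    free parameters (e.g., M7 vs. M8). Returns (2*delta lnL, p-value);
    for df = 2 the chi-squared tail probability is exactly exp(-x / 2)."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against tiny negatives
    return stat, math.exp(-stat / 2.0)


# Critical value for p < 0.05 with df = 2
critical = -2.0 * math.log(0.05)   # about 5.99

# Hypothetical log-likelihoods, for illustration only
stat, p = lrt_df2(lnl_null=-1000.0, lnl_alt=-995.0)
print(round(stat, 2), round(p, 4), stat > critical)  # 10.0 0.0067 True
```

Under this criterion, the alternative (selection) model is preferred whenever 2ΔlnL exceeds about 5.99.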

Results

Retrieval and comparison of Zan sequences

D3p domain expansions occurred only in species of the rodent Suborder Myomorpha (Fig. 4.1). The number of D3p domains varied among species, ranging from zero repeats in the most primitive myomorph species examined, Jaculus jaculus, to 24 repeats in Peromyscus maniculatus. See Table 4.1 for gene sequence accession numbers and corresponding taxonomic designations for all rodent and reference species examined. The number of repeats generally reflected the species richness of taxa in the more terminal branches of rodent phylogeny (Table 4.1), with more than 16 D3p domains in the most speciose taxa, yet fewer than 10 in the least speciose taxa. For example, there were 23-24 D3p domains in Peromyscus spp., 20 in Mus and in cricetid hamsters, yet only 16 in Microtus and nine in Spalax. The Meriones Zan sequence posed several problems for comparative analyses, mainly frameshifts and low sequence divergence from other closely related murid species.

To examine zonadhesin protein domain architecture across myomorph species, we mapped the protein sequences of all myomorphs examined, using Cavia porcellus (guinea pig, non-myomorph) and Oryctolagus cuniculus (rabbit, non-rodent) as references (Fig. 4.1). There was extensive variation both in the number of D3p domains, ranging from 0 in Jaculus jaculus to 24 in Peromyscus maniculatus (Table 4.1), and in the lengths of the Meprin/A5/mu (MAM) and Mucin domains.

Comparisons of ZanL and Zan domain architecture

Comparison of the protein domains of platypus ZanL and human ZAN (Fig. 4.2, panel A) revealed that platypus ZanL comprises 67 exons, in contrast to the 48 exons of human ZAN, owing to a nearly perfect duplication of platypus exons 3-26 as exons 27-50. Domain architecture comparisons show that the two-exon cassettes encoding TIL domains are conserved back to platypus ZanL, that is, to the divergence of Subclasses Prototheria and Theria (Metatheria and Eutheria). Additionally, no MAM domains are present in the platypus ZanL gene.

Comparison of 48 exon human ZAN and 88 exon mouse Zan architectures revealed a close correlation between intron-exon and protein domain boundaries, as well as conservation of exon order and length across the VWD domains, with 30 exons identical in length between the two species (Fig. 4.2, panel B). Notably, with the exception of the variably spliced TIL4 of VWD4, the TIL (trypsin inhibitor-like) domains at the 3’ ends of the proteins’ VWD1-4 domains were each encoded by a cassette of two exons, the first approximately 210 bp and the second approximately

140 bp. The relative conservation of these two-exon cassettes persisted in the 40 additional mouse Zan exons encoding the 20 domain expansion of TIL repeats (D3p domains) present between the D3 and D4 VWD domains (Wilson et al. 2001).
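The exon bookkeeping in Fig. 4.2 (panel B) can be checked with simple arithmetic. The sketch below assumes, as the human/mouse comparison indicates, that the two species differ only by the D3p two-exon cassettes inserted after the last D3 (TIL3) exon, which is exon 37. The constant and function names are hypothetical. This simple model is not meant to predict exon counts in other species, whose remaining domains may differ.

```python
HUMAN_ZAN_EXONS = 48      # total exons in human ZAN
LAST_D3_EXON = 37         # last exon of the D3 TIL (TIL3) cassette in both species
EXONS_PER_CASSETTE = 2    # each TIL repeat: a ~210 bp exon plus a ~140 bp exon


def total_exons(n_d3p: int) -> int:
    """Exon count if the only change from human ZAN is n_d3p inserted TIL cassettes."""
    return HUMAN_ZAN_EXONS + EXONS_PER_CASSETTE * n_d3p


def d3p_exon_range(n_d3p: int) -> tuple[int, int]:
    """First and last exon numbers occupied by the D3p cassettes."""
    return LAST_D3_EXON + 1, LAST_D3_EXON + EXONS_PER_CASSETTE * n_d3p


def human_to_mouse_exon(human_exon: int, n_d3p: int = 20) -> int:
    """Map a human ZAN exon number to its position in mouse Zan."""
    if human_exon <= LAST_D3_EXON:
        return human_exon  # MAM, Mucin, and D0-D3 exons keep their numbering
    return human_exon + EXONS_PER_CASSETTE * n_d3p  # D4, EGF, and TM shift downstream


print(total_exons(20))          # 88, mouse Zan
print(d3p_exon_range(20))       # (38, 77), exons encoding D3p1-20
print(human_to_mouse_exon(38))  # 78, first D4 exon in mouse
```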


Phylogenetic and comparative analyses of Zan

Bayesian and RAxML analyses of the D3 TIL (TIL3) and D3p alignment (GTR+I+G nucleotide substitution model) generated a phylogeny (Figs. 4.3 and 4.4) in which 138 of 236 nodes were statistically supported by Bayesian posterior probabilities (p ≥ 0.95) and 20 additional nodes were supported by ML support values (≥ 70). This analysis sorted the D3p domains into 21 putatively orthologous domain Groups with variable statistical support (Figs. 4.3-4.5), as well as three additional novel domain Groups specific to Spalax (D3p2-4), Meriones (D3p2-7), and the murid species examined (D3p2-6).
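The two support criteria used above can be expressed as a small tally: a Bayesian posterior probability of at least 0.95, and, for nodes that lack Bayesian support, ML support of at least 70. This is an illustrative sketch only. It assumes that node support values have already been extracted from the Bayesian consensus tree and matched to the RAxML tree. The `Node` class, the `tally_support` function, and the example values are hypothetical. The ML threshold is treated as inclusive, following the Results text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    posterior: float             # Bayesian posterior probability, 0-1
    ml_support: Optional[float]  # ML support, 0-100; None if node absent from the ML tree


def tally_support(nodes: Iterable[Node], pp_min: float = 0.95, ml_min: float = 70.0):
    """Return (Bayesian-supported nodes, additional nodes supported only by ML)."""
    bayes, ml_only = 0, 0
    for node in nodes:
        if node.posterior >= pp_min:
            bayes += 1
        elif node.ml_support is not None and node.ml_support >= ml_min:
            ml_only += 1  # counted only when Bayesian support is below threshold
    return bayes, ml_only


# Hypothetical support values for four nodes.
example = [Node(0.99, 88), Node(0.97, None), Node(0.81, 74), Node(0.60, 45)]
print(tally_support(example))  # (2, 1)
```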

Fourteen of the 21 identified Groups (I-IX, XI-XII, XIV-XV, and XVII) formed statistically supported monophyletic clades (Figs. 4.3-4.5). The remaining seven Groups also each clustered together, but with varying support. Although the D3p domains are presumably internal tandem duplicates of either the master copy (D3 TIL) or another duplicated D3p, relationships among domain Groups received little support (2 of 26 nodes; Fig. 4.4).

Further comparative analyses (Fig. 4.5) identified domain Groups I and II, located at the downstream ends of their respective expansions, as likely primitive tandem repeats. These Groups were identified as primitive because they were statistically supported as associated with each other and were present in all myomorph species examined except Jaculus jaculus. Surprisingly, the Groups identified as most primitive, along with D3 TIL (the original copy), did not diverge at the base of the phylogenetic tree, which implies that selection acted on these Groups.


Selection tests

Tests for positive selection (Table 4.2, nucleotide substitution models for each domain Group; Tables 4.3-4.6, selection analyses) revealed pervasive and sometimes quite intense positive selection in all domain Groups. Omega (ω) values from the comprehensive M8 model ranged from 1.12 in 65.8% of sites in Group XIV to 5.55 in 12.4% of sites in Group X, all with significant p-values (Table 4.6). Higher ω intensity generally coincided with lower pervasiveness (a smaller proportion of sites), and lower ω intensity coincided with higher pervasiveness. Interestingly, Groups II, V, X, and XI exhibited the most intense positive selection (all ω > 3) but variable Bayesian support among nodes in the phylogeny. Intra-Group node support ranged from 9 of 11 supported nodes (82%) in Group XI (ω = 5.25) to only 5 of 11 (45%) in Group X (ω = 5.53). Among D3p domain Groups, a quantitative comparison (Fig. 4.6) of statistically supported sites with ω > 1 showed clustering of positively selected sites in Groups VI-VIII, XII-XV, XIX, and XXI.
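The site counts compared in Fig. 4.6 come from codeml's Bayes Empirical Bayes (BEB) output. For each codon site, this output lists the posterior probability that the site belongs to the ω > 1 class under M8. The sketch below assumes that these per-site probabilities have already been parsed into lists. The Group labels are real, but their probability values are invented. The 0.95 cutoff and the `min_sites` threshold are also hypothetical; `min_sites` stands in for the dashed line in Fig. 4.6, whose value is not stated here.

```python
def count_beb_sites(site_probs: list[float], cutoff: float = 0.95) -> int:
    """Count sites whose posterior probability of omega > 1 exceeds the cutoff."""
    return sum(1 for p in site_probs if p > cutoff)


def groups_above(beb_by_group: dict[str, list[float]], min_sites: int,
                 cutoff: float = 0.95) -> list[str]:
    """Groups whose number of supported positively selected sites reaches min_sites."""
    counts = {g: count_beb_sites(probs, cutoff) for g, probs in beb_by_group.items()}
    return [g for g, n in counts.items() if n >= min_sites]


# Invented per-site BEB probabilities for three D3p Groups.
beb = {
    "X":   [0.99, 0.97, 0.96, 0.40],
    "XIV": [0.96, 0.52, 0.31, 0.88],
    "XXI": [0.98, 0.99, 0.12, 0.05],
}
print({g: count_beb_sites(p) for g, p in beb.items()})  # {'X': 3, 'XIV': 1, 'XXI': 2}
print(groups_above(beb, min_sites=2))                   # ['X', 'XXI']
```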

Discussion

The expansion and diversification of D3p domain Groups during the molecular evolution of the gamete recognition protein zonadhesin likely accelerated the rapid proliferation and divergence of myomorph species. Four findings support this conclusion. First, in the Zan ontogeny, conserved two-exon cassettes encoding TIL domains are detectable at the base of the mammalian radiation in Prototheria (platypus) and throughout eutherian mammals (more than 100 eutherian species examined). This suggests that the evolutionary mobility of these conserved domains is functionally important. Second, D3p domain expansions occurred only in Myomorpha, suggesting a unique contribution of the duplicated TIL domains to fertilization, reproductive isolation, and speciation in these species. Third, the D3p domain expansions in Myomorpha arose by conserved duplication of D3 TIL domains at the 5’ end of the developing expansions, with the earliest duplications pushed progressively downstream. Fourth, the strength and pervasiveness of selection on D3p domain Groups vary, presumably reflecting variation in function during ZP adhesion.
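The third finding describes a specific ordering rule: each new copy of D3 TIL is inserted at the 5’ end of the expansion, so that older copies are pushed downstream. A toy simulation makes the consequence of this rule explicit. It is an idealized sketch that assumes exactly one master-copy duplication per round and no lineage-specific duplications, deletions, or conversions, although Fig. 4.5 shows that such events do occur. The function name is hypothetical.

```python
def simulate_5prime_expansion(n_rounds: int) -> list[int]:
    """Return the duplication round of each D3p copy, ordered 5' -> 3' (D3p1 first).

    Each round copies the master D3 TIL and inserts the copy immediately
    downstream of it, i.e. at the 5' end of the growing D3p expansion.
    """
    expansion: list[int] = []
    for round_number in range(1, n_rounds + 1):
        expansion.insert(0, round_number)  # newest copy becomes D3p1
    return expansion


# With 24 rounds (the P. maniculatus count), the oldest copy sits at the 3' end.
order = simulate_5prime_expansion(24)
print(order[0], order[-1])  # 24 1 -> D3p1 arose last, D3p24 arose first
```

Under this rule the oldest copies are expected at the downstream end of the expansion. This fits the identification of Groups I and II, the most downstream columns in Fig. 4.5, as the most primitive Groups.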

Although zonadhesin domain architecture is similar among most eutherian species, a dramatic change, especially one unique to a specific lineage, suggests a change in function. An important aspect of protein domain evolution lies in the domains' three-dimensional structure and biochemical function, both of which are specified by the amino acid sequence (Doolittle 1995, Socolich et al. 2005, Halabi et al. 2009). Sequence changes, especially non-synonymous substitutions that alter amino acid side-chain chemistry (Yue et al. 2005, Jordan et al. 2010), often bring about new folding capabilities and therefore new protein architectures and functions (Halabi et al. 2009, McLaughlin, Jr. et al. 2012). The varying number of duplications in the myomorph rodent lineage suggests that duplications most likely occur frequently and generate dramatic structural differences relevant to zonadhesin function in fertilization. Moreover, the number of D3p duplications correlates with the species richness of taxa in the more terminal branches of the rodent phylogeny. This correlation suggests variable effects on species-specificity and therefore a potential for differential diversification within these speciose rodent groups.

The conservation of two-exon cassettes back to the divergence of Prototheria and Theria (~166 million years ago; Bininda-Emonds et al. 2007) suggests that the encoded TIL protein domain is functionally important. Zan is a rapidly evolving gene, yet the intron-exon boundaries of its TIL domains have been conserved over a long evolutionary history. This implies that these domains are functional units whose mobility is important for the evolution of Zan function in species-specific egg recognition. Further, intra-chain domain interactions are essential to the native function of many proteins. Residues at such positions therefore experience stronger stabilizing selection (Bhaskara and Srinivasan 2011, Verma and Pandit 2019), which promotes conservation of protein function.

It is well documented that genes and their protein domains are duplicated by various mechanisms. On the largest scale, whole-genome duplications, such as those seen in many vertebrate genomes, duplicated entire gene families (Dehal and Boore 2005, Maere et al. 2005, Hughes and Liberles 2008). Domains have also been duplicated through mechanisms such as exon shuffling, retrotransposition, recombination, and horizontal gene transfer (Bjorklund et al. 2006, Buljan and Bateman 2009). Because forces such as exon shuffling and genome duplication vary widely among species, the total number of domains also fluctuates among species (Bagowski et al. 2010, Keren et al. 2010). Most internal domain duplications can be explained by serial duplication of single domains, termed ‘rectification’ under the Master Slave Hypothesis (Callan 1967).

However, when long tandem repeats are generated, especially in proteins directly involved in cell-cell interaction, protein domains tend to accumulate nonlinearly. As a result, multiple domains can duplicate independently and simultaneously (democratic gene conversion model; Edelman and Gally 1968, Edelman and Gally 1970, Gally 1989). In other words, duplication is dynamic: domains may duplicate singly or in totum as a syntenic block.
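The two modes of duplication contrasted above can be represented as list operations on a sequence of domains or exons. One mode is serial duplication of a single domain; the other is duplication of a contiguous block in totum. The sketch below is illustrative only, and the function names are hypothetical. The usage example assumes a hypothetical 43-exon platypus precursor, chosen only because duplicating its exons 3-26 as a single block reproduces the 67-exon ZanL structure described in the Results.

```python
def duplicate_single(units: list, index: int) -> list:
    """Tandem-duplicate one unit (0-based index): A B C -> A B B C for index 1."""
    return units[: index + 1] + [units[index]] + units[index + 1 :]


def duplicate_block(units: list, start: int, end: int) -> list:
    """Tandem-duplicate the block units[start:end] in totum (0-based, end exclusive).

    A B C D with start=1, end=3 -> A B C B C D.
    """
    return units[:end] + units[start:end] + units[end:]


print(duplicate_single(["D3", "TIL"], 1))            # ['D3', 'TIL', 'TIL']
print(duplicate_block(["A", "B", "C", "D"], 1, 3))   # ['A', 'B', 'C', 'B', 'C', 'D']

# Hypothetical 43-exon precursor; duplicate exons 3-26 (slice 2:26) as one block.
precursor = list(range(1, 44))
zanl = duplicate_block(precursor, 2, 26)
print(len(zanl), zanl[26:50] == list(range(3, 27)))  # 67 True
```

In the block case, positions 27-50 of the result carry the copies of exons 3-26, which matches the arrangement described for platypus ZanL.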

Typically, a duplicated protein domain is placed close to its origin and is likely to alter its function (Kassahn et al. 2009, Tautz and Domazet-Lošo 2011). However, the evolution of domain combinations is not purely stochastic. Instead, it depends on selection for certain functions, possibly because overall function depends on the interface between domains or because both domains are necessary for proper function. Proteins with new functions or specificities are generated through domain fusion and recombination as well as through differentiation of existing domains (Ponting and Russell 2002, Jin et al. 2009). Single-domain proteins from the same domain family are more likely to have similar functions (Orengo et al. 1994). A natural precedent is provided by ancient enzymes with broad substrate specificities that evolved into more specific enzymes through gene duplication. Such enzymes often retain their biochemical function while gaining new substrate specificities or regulatory mechanisms through the addition of a domain (Tawfik 2010, Voordeckers et al. 2012). Understanding how proteins evolve through domain duplication and sequence differentiation is crucial for understanding how new protein functions develop.

The Zan gene in rodents varies greatly among species in both domain structure and composition. The additional repeats are an evolutionary novelty of myomorph rodents. They are comparatively younger than D0-D4 and diverge more rapidly than the remaining VWD domains (Herlyn and Zischler 2004). The nature of the D3p expansion and diversification suggests that selection on D3p Groups varies substantially, presumably reflecting variation in function during ZP adhesion. The distribution of positive selection among orthologous D3p domain Groups suggests that the D3p domains duplicated nearest to the putative site of propagation (D3 TIL) are most likely to confer species-specificity on gamete binding activity. The selective forces identified were often strong in this mammalian system. However, we could not determine whether selection resulted from preferential propagation of a copy from one generation to the next through processes such as enhanced viability (ecological selection) or increased fertility (sexual selection) of the copy ‘carriers’ (Herlyn and Zischler 2004, Simmons 2005).

Regardless, the directional propagation and evolution of D3p internal tandem repeats by positive selection may reflect an important contribution of these processes to the rapid proliferation and divergence of myomorph species.

Although positive selection drives Zan evolution in all mammalian species (Swanson et al. 2003, Herlyn and Zischler 2004, Roberts et al. Ontogeny), it appears not to be the only force driving divergence of zonadhesin in some myomorph rodent species. Sequence comparisons suggest that concerted evolution through intragenic gene conversion (Dover 1994, Liao 1999) contributed to Zan evolution, specifically in Spalax galili (Upper Galilee Mountains blind mole rat) and Meriones unguiculatus (Mongolian gerbil). In these species, the tandem repeats within a gene are more similar to each other than to any repeats in the orthologous Groups of other closely related species. This pattern is reminiscent of the reciprocal monophyly observed in many organismal groups (Swanson and Vacquier 2002b). Unequal crossing over and gene conversion randomly homogenize tandem repeats within a gene, and potentially within a population, a mechanism exemplified by ribosomal genes (Elder and Turner 1995, McAllister and Werren 1999, Liao 2000). Alternatively, the dramatic sequence divergence seen in Spalax and Meriones may be an artifact of genome assembly error. In that case, Zan may need to be characterized from raw genomic data in these species.
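The pattern attributed to concerted evolution above can be screened for with pairwise distances: under concerted evolution, paralogous repeats within one species are more similar to each other than to their orthologs in a related species. The sketch below is a simplified heuristic, not the analysis used here, which relied on tree topology. It assumes that the repeats are already aligned to equal length and that repeat i in one species is orthologous to repeat i in the other. It uses uncorrected p-distances. All function names and sequences are hypothetical.

```python
from itertools import combinations
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ (no multiple-hit correction)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def homogenization_signal(repeats_a: list[str], repeats_b: list[str]):
    """Compare mean within-species (paralog) vs. between-species (ortholog) distance.

    Repeat i in species A is assumed orthologous to repeat i in species B.
    At least one species needs two or more repeats, or mean() has no data.
    Returns (mean paralog distance, mean ortholog distance, paralogs_closer).
    """
    paralog = [p_distance(x, y)
               for reps in (repeats_a, repeats_b)
               for x, y in combinations(reps, 2)]
    ortholog = [p_distance(x, y) for x, y in zip(repeats_a, repeats_b)]
    within, between = mean(paralog), mean(ortholog)
    return within, between, within < between


# Toy repeats: each species' copies resemble each other more than their orthologs.
print(homogenization_signal(["AAAA", "AAAT"], ["CCCC", "CCCG"]))  # (0.25, 1.0, True)
```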

Moreover, variation in the number of D3p domain duplications, together with differential selection within orthologous domains, suggests a unique contribution of differential specificity to gamete recognition. This may in turn alter speciation rates in certain rodent groups. For example, in the absence of zonadhesin, other components of mouse spermatozoa mediate adhesion that is not species-specific; thus, fertilization still occurs without zonadhesin function. Under this scenario, ZP interaction is molecularly degenerate, because it is mediated by multiple sperm proteins with overlapping functions. It is also robust, because the compensatory activity of other proteins makes it insensitive to the loss of individual adhesion molecules (Tardif et al. 2010a). The opposite phenomenon, facultative activity of zonadhesin with the egg ZP, might effectively increase species-specificity by requiring a conspecific genotype (the same number or combination of D3p expansions) for successful gamete recognition, especially in speciose taxa. Thus, a change in the number or combination of D3p expansions may contribute to dramatic fluctuations in speciation rate and therefore in the divergence rate of species.

Current evidence collectively suggests that mammalian gamete adhesion occurs by binding of multiple sperm proteins to the glycoprotein components of the ZP (Bi et al. 2002, Wassarman 2008). However, no studies have completely defined the sequence of those interactions or their relative contributions to the overall specificity of sperm-egg recognition. Also, although zonadhesin D0-D4 domains contribute to the species-specificity of ZP adhesion, no studies have yet mapped the protein’s gamete adhesion activity more precisely to specific subdomains, domains, or domain combinations within the larger mosaic protein. The TIL domain expansions in myomorph zonadhesin derive from internal tandem duplication events that were subsequently subjected to differential selective forces. For this reason, the relatedness of D3p domains and their mechanism of duplication remain unknown. Characterization of additional relevant taxa is needed to establish character transformation from the ancestral myomorph condition. For example, a D3 TIL domain ancestral to all Myomorpha is required to define polarity in the comparative evaluations and downstream phylogenetic analyses. Likewise, data from additional species are needed to discern character evolution (i.e., the number of and variation among D3p domains), especially in certain genera (e.g., Jaculus, Spalax, and Meriones), as they relate to other myomorph species. Characterizing the processes underlying the molecular evolution of D3p expansion will provide insight into these domains’ functions in species-specific egg recognition during fertilization. It may also ultimately determine whether the presence of D3p domains can serve as an overarching diagnostic character for inclusion in Myomorpha.


[Fig. 4.1 schematic not reproduced. Domain diagrams (SP, MAM, Mucin, vWD domains D0-D4, D3p repeats, TM) are shown for Peromyscus maniculatus, P. leucopus, Cricetulus griseus, Mesocricetus auratus, and Microtus ochrogaster (cricetid); Rattus norvegicus, Mus caroli, M. pahari, M. musculus, and Meriones unguiculatus (murid); Jaculus jaculus; Spalax galili (spalacid); Cavia porcellus (hystricomorph); and Oryctolagus cuniculus (lagomorph).]

Fig. 4.1. Schematic diagram of zonadhesin domain structure in myomorph species. Domain structures of Oryctolagus cuniculus (rabbit, non-rodent) and Cavia porcellus (guinea pig, non-myomorph rodent) zonadhesin are included for reference. Note the extensive variation in vWD3p domains and length of both Mucin and Meprin/A5/mu (MAM) domains.

[Fig. 4.2 exon diagrams not reproduced. Panel A shows platypus ZanL (exons 1-67) aligned with human ZAN (exons 1-48). Panel B shows human ZAN (exons 1-48) aligned with mouse Zan (exons 1-88), in which the D3p1-20 TIL repeats are encoded by exons 38-77.]

Fig. 4.2. Comparison of platypus ZanL, human ZAN, and mouse Zan exon structures. Panel A. Platypus (Ornithorhynchus anatinus) and human (Homo sapiens) comparison. Exon number is shown in italics, and the number of bases in each exon is shown within each box. Numbers and lines above exons show the number of nucleotide bases in from the 5’ or 3’ ends of each exon where a protein domain begins or ends. Yellow-shaded exons comprise TIL domain two-exon cassettes. Blue-shaded exons denote VWD 1-4 domains. Green-shaded boxes denote Mam1-3 domains. Gray-shaded boxes denote duplicated platypus ZanL exon clusters not present in human ZAN. Note the absence of MAM domain exons in platypus. Note also the nearly perfect duplication of platypus exons 3-26 as exons 27-50. Panel B. Human (Homo sapiens) and mouse (Mus musculus). Exon numbers, lengths, and protein domain boundaries are labeled as in Panel A. Two-exon cassettes encoding the 3’ TIL domains of VWDs 1-4 are overlined and labeled TIL1-4, respectively. Exons identical in size between human and mouse are indicated by gray shading. Note the close correlation between intron-exon and protein domain boundaries.

[Fig. 4.3 tree not reproduced. The figure is a Bayesian phylogeny of D3 TIL and D3p sequences from 13 myomorph species, with tips labeled by species and domain (e.g., P. maniculatus D3p24), D3p Groups I-XXI, and the Spalax-, Meriones-, and murid-specific Groups; scale bar 0.2.]

Fig. 4.3. Bayesian tree of Zan D3p nucleotide sequences. Shown is the von Willebrand D3p domain (vWD3p) tree, constructed by Bayesian analysis of 235 aligned, individual Zan D3p sequences (568 bp alignment, GTR+I+G nucleotide substitution model, and 10,000,000 generations) from 13 myomorph species, with the Mus musculus Zan vWD0 TIL domain as outgroup. Black dots denote nodes with Bayesian statistical support, p ≥ 0.95, and open squares denote additional nodes with ML support values greater than 70. The top half of the tree is on the left, and the bottom half is on the right. Abbreviations are: J. = Jaculus, Micr. = Microtus, Splx. = Spalax, Mer. = Meriones. Note that Groups I and II include statistically supported D3p domains from all represented species. Note also the existence of D3p domain groups unique to certain clades (i.e., murid-, Spalax-, and Meriones-specific D3p domains).

[Fig. 4.4 summary tree not reproduced. Terminals are D3p Groups I-XXI, D3 TIL, Jaculus D3 TIL, Spalax D3 TIL, Spalax D3p2-4, Spalax D3p5 with Cavia, Spalax D3p6-7, Meriones D3p2-7, and murid-specific D3p2-6, rooted with M. musculus D0.]

Fig. 4.4. Summary tree of Zan D3p Groups. Shown is a consolidated and transformed phylogenetic tree derived from Fig. 4.3. Asterisks denote nodes with Bayesian statistical support, p≥0.95. Note the large polytomy that includes all D3p Groups.


[Fig. 4.5 grid not reproduced. Columns are D3p Groups (Primitive, Mur-sp, and XXI through I, with Groups I and II starred). Rows are the numbered D3p domains of Peromyscus maniculatus, P. leucopus, Cricetulus griseus, Mesocricetus auratus, Microtus ochrogaster, Rattus rattus, Grammomys surdaster, Mus caroli, M. pahari, M. musculus, Meriones unguiculatus, and Spalax galili, with additional rows for species-specific duplications.]

Fig. 4.5. Identification of VWD3p domain groups. Orthologous von Willebrand D3p domains identified in the Bayesian tree (Fig. 4.3) are grouped vertically and denoted by Roman numerals I-XXI. Rows represent the vWD3p domains of each species, with vWD3p duplications in some species shown in additional rows. The column labeled Mur-sp depicts D3p groups unique to murine species. Grayscale shading represents Bayesian posterior probabilities ≥ 0.95 within D3p groups, with light gray and dark gray referring to distinct monophyletic groups. White boxes indicate no statistical support. Asterisks denote presumably older D3p domains (Groups I and II) present in all rodent species examined that possess D3p domains. No D3p domains were present in Jaculus jaculus zonadhesin.

[Fig. 4.6 bar chart not reproduced. The y-axis shows the number of BEB sites (ω > 1), from 0 to 80. The x-axis shows D3p Group (D3 TIL, then XXI through I), aligned with the corresponding Peromyscus maniculatus D3p domains.]

Fig. 4.6. Bayes Empirical Bayes (BEB) sites with ω > 1 for each D3p domain Group. Distribution of BEB sites (ω > 1) estimated in PAML under model M8, using the consensus tree from the Bayesian analysis reported in Roberts et al. Ontogeny. The mean value of ω is 2.36. Asterisks indicate sites with statistically significant BEB ω values. The dashed line represents a threshold for the number of positively selected BEB sites. The row of D3p domains below the graph represents Peromyscus maniculatus duplications, derived from Fig. 4.5. Note the markedly higher occurrence of positively selected BEB sites in Groups I-II, IV, VIII-XI, and XIV-XVI.


Table 4.1. Accession numbers of the source gene sequences and corresponding taxonomic designations. For myomorph rodents, both family and subfamily names are shown. Note the number of D3p expansions (No. D3p) present in the various rodent taxa.

Family | Species | Common name | Accession no. | No. D3p
Ornithorhynchidae | Ornithorhynchus anatinus | Duck-billed platypus | XM_029055547.1 | 0
Hominidae | Homo sapiens | Human | NM_003386.3 | 0
Leporidae | Oryctolagus cuniculus | Domestic rabbit | XM_008257988.2 | 0
Bathyergidae | Fukomys damarensis | Damara mole-rat | XM_019205267.1 | 0
Caviidae | Cavia porcellus | Guinea pig | XM_023565129.1 | 0
Octodontidae | Octodon degu | Degu | XM_023714561.1 | 0
Chinchillidae | Chinchilla lanigera | Long-tailed chinchilla | XM_013509448.1 | 0
Sciuridae | Marmota marmota | Alpine marmot | XM_015490707.1 | 0
Sciuridae | Ictidomys tridecemlineatus | Thirteen-lined ground squirrel | XM_021728560.1 | 0
Castoridae | Castor canadensis | American beaver | XM_020182103.1 | 0
Heteromyidae | Dipodomys ordii | Ord’s kangaroo rat | XM_013031714.1 | 0
Dipodidae, Dipodinae | Jaculus jaculus | Lesser Egyptian jerboa | XM_012950591.1 | 0
Spalacidae, Spalacinae | Spalax galili | Upper Galilee Mountains blind mole-rat | XM_029563222.1 | 9
Cricetidae, Neotominae | Peromyscus maniculatus | Prairie deermouse | XM_016002906.1 | 24
Cricetidae, Neotominae | Peromyscus leucopus | White-footed deermouse | XM_028861441.1 | 23
Cricetidae, Arvicolinae | Microtus ochrogaster | Prairie vole | XM_005371467.1 | 16
Cricetidae, Cricetinae | Cricetulus griseus | Chinese hamster | XM_027442291.1 | 20
Cricetidae, Cricetinae | Mesocricetus auratus | Golden hamster | XM_013120772.2 | 20
Muridae, Gerbillinae | Meriones unguiculatus | Mongolian gerbil | XM_021650706.1 | 9
Muridae, Murinae | Rattus norvegicus | Norway rat | XM_017598593.1 | 20
Muridae, Murinae | Grammomys surdaster | African woodland thicket rat | XM_028777870.1 | 20
Muridae, Murinae | Mus caroli | Ryukyu mouse | XM_021163424.1 | 20
Muridae, Murinae | Mus pahari | Shrew mouse | XM_029533878.1 | 20
Muridae, Murinae | Mus musculus | House mouse | NM_011741.2 | 20


Table 4.2. Global and species-specific nucleotide substitution models for each rodent species, selected by the corrected Akaike Information Criterion (AICc). Listed are the log-likelihood (LH) value and a description of each nucleotide substitution model.

Dataset | AICc model | LH | Description
Global | TIM2+I+G | -28772.27 | AC=AT, CG=GT, unequal base freq., invariable sites, gamma
Mus musculus | GTR+I+G | -4181.80 | General time reversible, invariable sites, gamma
Mus pahari | TIM2+G | -4443.29 | AC=AT, CG=GT, unequal base freq., gamma
Mus caroli | TPM2uf+G | -4416.60 | AC=AT, AG=CT, CG=GT, unequal base freq., gamma
Rattus norvegicus | TPM2uf+G | -4514.90 | AC=AT, AG=CT, CG=GT, unequal base freq., gamma
Peromyscus leucopus | HKY+I+G | -4489.63 | Unequal transition/transversion rates, unequal base freq., invariable sites, gamma
Peromyscus maniculatus | TPM2uf+G | -4619.21 | AC=AT, AG=CT, CG=GT, unequal base freq., gamma
Microtus ochrogaster | TVM+I+G | -3828.79 | Transversion model, AG=CT, unequal base freq., invariable sites, gamma
Cricetulus griseus | TPM3uf+I+G | -3951.99 | AC=CG, AG=CT, AT=GT, unequal base freq., invariable sites, gamma
Mesocricetus auratus | TVM+I+G | -3993.81 | Transversion model, AG=CT, unequal base freq., invariable sites, gamma
Meriones unguiculatus | TPM2uf+I+G | -2743.01 | AC=AT, AG=CT, CG=GT, unequal base freq., invariable sites, gamma
Spalax galili | TIMef+G | -2204.80 | Transition model, AC=GT, AT=CG, equal base freq., gamma
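The "AICc" heading in Table 4.2 refers to the small-sample corrected Akaike Information Criterion, AICc = -2 lnL + 2k + 2k(k + 1)/(n - k - 1). Here lnL is the maximized log-likelihood, k is the number of free parameters, and n is the sample size. The Python sketch below illustrates how a best-fit model would be chosen under this criterion. It rests on several assumptions. The function names, parameter counts, sample size, and log-likelihood values in the example are hypothetical and are not taken from the analyses reported here, since Table 4.2 does not list k or n. The program that actually performed model selection may also count parameters (for example, branch lengths) differently.

```python
def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Small-sample corrected Akaike Information Criterion (AICc).

    log_likelihood: maximized log-likelihood (lnL) of a fitted model
    k: number of free parameters estimated for that model
    n: sample size (for an alignment, often the number of sites)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def best_model(candidates: dict, n: int) -> str:
    """Return the candidate model name with the lowest AICc.

    candidates maps a model name to a (lnL, k) tuple.
    """
    return min(candidates, key=lambda name: aicc(*candidates[name], n))


# Hypothetical lnL values and parameter counts, for illustration only.
models = {
    "HKY+G": (-4500.0, 5),
    "GTR+I+G": (-4490.0, 10),
}
for name, (lnl, k) in models.items():
    print(f"{name}: AICc = {aicc(lnl, k, n=600):.2f}")
print("best:", best_model(models, n=600))
```

A lower AICc is preferred. In this made-up case, the richer model's gain in likelihood outweighs its parameter penalty.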


Table 4.3. PAML summary results for selection models M2 and M8. Shown are the ω values (intensity) and corresponding frequencies (f) of the positively selected site class under the dN/dS selection models M2 (selection) and M8 (beta and ω > 1), using the F3x4 codon frequency model. These data are derived from Tables 4.4 and B1.

Group | Intensity ω2 (M2) | Intensity ω (M8) | Frequency f2 (M2) | Frequency f (M8) | BEB sites (ω > 1)
I | 3.071 | 2.854 | 0.164 | 0.193 | 5
II | 3.242 | 3.058 | 0.163 | 0.186 | 5
III | 3.092 | 2.781 | 0.097 | 0.104 | 0
IV | 2.200 | 2.114 | 0.485 | 0.538 | 1
V | 3.550 | 3.392 | 0.105 | 0.089 | 3
VI | 3.529 | 1.387 | 0.062 | 0.624 | 38
VII | 1.924 | 1.927 | 0.545 | 0.544 | 38
VIII | 1.280 | 1.280 | 0.668 | 0.668 | 40
IX | 1.830 | 1.833 | 0.394 | 0.393 | 12
X | 5.574 | 5.553 | 0.121 | 0.124 | 8
XI | 1.245 | 5.249 | 0.593 | 0.001 | 0
XII | 1.822 | 1.750 | 0.477 | 0.542 | 9
XIII | 5.016 | 1.206 | 0.058 | 0.729 | 75
XIV | 4.466 | 1.118 | 0.021 | 0.658 | 27
XV | 1.578 | 1.580 | 0.554 | 0.553 | 24
XVI | 1.860 | 1.750 | 0.258 | 0.332 | 4
XVII | 2.092 | 2.114 | 0.323 | 0.316 | 15
XVIII | 3.066 | 2.195 | 0.115 | 0.137 | 1
XIX | 1.470 | 1.471 | 0.605 | 0.604 | 42
XX | 3.114 | 2.840 | 0.113 | 0.130 | 5
XXI | 2.029 | 1.884 | 0.652 | 0.701 | 72
D3TIL | 2.220 | 1.993 | 0.072 | 0.110 | 1
Spalax D3p2-4 | 3.950 | 3.955 | 0.214 | 0.214 | 5
Meriones D3p2-7 | 1.612 | 1.613 | 0.751 | 0.750 | 52
Murid D3p2-6 | 2.178 | 2.115 | 0.295 | 0.328 | 4
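Tables 4.3 and 4.4 report two quantities for each Group. The first is the frequency f of the site class with ω > 1. The second is the set of sites whose posterior probability of belonging to that class is at least 0.95. The sketch below illustrates the underlying Bayes' rule calculation for a single codon site in its simplest, naive empirical Bayes (NEB) form. The prior weight of each site class is multiplied by the likelihood of the site's data under that class's ω, and the products are then normalized. This is an assumption-laden illustration, not the PAML implementation. The Bayes Empirical Bayes procedure used for Table 4.4 additionally integrates over uncertainty in the model parameters. The function names and all numbers in the example are hypothetical.

```python
def site_class_posteriors(proportions, site_likelihoods):
    """Posterior probability of each site class at one codon site (naive empirical Bayes).

    proportions: prior class frequencies from the fitted mixture model (sum to 1)
    site_likelihoods: likelihood of this site's data under each class's omega
    """
    if len(proportions) != len(site_likelihoods):
        raise ValueError("need one likelihood per site class")
    weighted = [p * lik for p, lik in zip(proportions, site_likelihoods)]
    total = sum(weighted)
    if total <= 0:
        raise ValueError("site likelihoods must not all be zero")
    return [w / total for w in weighted]


def prob_positive_selection(proportions, omegas, site_likelihoods):
    """Sum of posterior probabilities over the classes whose omega exceeds 1."""
    posteriors = site_class_posteriors(proportions, site_likelihoods)
    return sum(post for post, w in zip(posteriors, omegas) if w > 1.0)


# Hypothetical three-class model (purifying, neutral, positive) and site likelihoods.
props = [0.5, 0.3, 0.2]
omegas = [0.1, 1.0, 3.0]
liks = [1e-5, 2e-5, 8e-5]
p_pos = prob_positive_selection(props, omegas, liks)
print(f"P(omega > 1 | data) = {p_pos:.3f}", "(flagged)" if p_pos >= 0.95 else "(not flagged)")
```

A site would be listed in a table like Table 4.4 only if this posterior reached the 0.95 cutoff.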


Table 4.4. Sites with P(ω > 1) ≥ 0.95 under Bayes Empirical Bayes (BEB) analysis for each Group. Each site is listed as its position in the consensus D3p sequence, the consensus amino acid, and its posterior probability. An “X” marks positions at which no consensus amino acid could be determined, because of gaps or a high degree of variability in the alignment. Numbers in parentheses give the number of BEB sites in each D3p Group, with the greatest number in Group XIII (75 BEB sites) and the lowest (0 BEB sites) in Groups III and XI.

I (5) 34 Q 0.950 158 G 0.976 159 H 0.952 164 F 0.965 165 P 0.995

II (5) 10 N 0.952 68 L 0.966 158 G 0.990 178 T 0.997 179 V 0.951

III (0)

IV (1) 163 E 0.977

V (3) 51 I 0.985 58 R 0.962 59 V 0.988

VI (38) 7 Y 0.989 10 N 0.999 29 F 0.989 31 L 0.989 32 W 0.999 35 T 0.992 51 I 0.966 58 R 1.000 59 V 0.999 64 K 0.951 70 Q 0.977 71 P 0.959 77 N 0.995 78 N 0.992 79 N 0.988 83 A 0.968 85 X 0.998 86 C 0.990 87 S 0.967 89 V 0.999 92 K 0.991 93 T 0.996 101 L 0.998 138 R 0.997 141 X 0.973 143 D 0.996 144 L 0.988 148 C 0.990 154 H 1.000 158 G 0.964 160 P 0.998 162 L 0.997 163 E 0.963 170 R 0.990 174 X 0.997 180 X 0.998 184 Q 0.973 188 X 0.992

VII (38) 7 Y 0.989 28 T 0.955 31 L 0.995 32 W 0.994 33 T 0.955 34 Q 0.980 35 T 0.975 37 Y 0.999 51 I 0.974 54 A 0.968 58 R 0.983 64 K 0.978 77 N 0.987 79 N 0.978 85 X 0.980 86 C 0.994 89 V 0.971 92 K 0.956 93 T 0.979 95 X 0.987 101 L 0.954 103 H 0.975 104 P 0.977 143 D 0.99 152 Y 0.991 154 H 0.999 158 G 0.971 159 H 0.989 163 E 0.985 169 L 0.988 178 T 0.991 179 V 0.996 180 X 0.959 181 X 0.951 182 X 0.966 183 W 0.99 185 H 0.998 188 X 0.980

VIII (40) 7 Y 0.994 28 T 0.999 31 L 0.994 32 W 1.000 33 T 0.999 34 Q 0.996 35 T 0.951 37 Y 0.996 39 A 0.995 51 I 1.000 53 P 0.981 54 A 0.973 55 X 0.979 58 R 0.980 61 S 0.998 62 T 0.995 68 L 0.999 75 C 0.996 86 C 0.999 89 V 0.953 92 K 0.995 95 X 0.975 105 G 0.995 141 X 0.97 144 L 0.956 145 Q 0.997 148 C 0.999 158 G 1.000 159 H 0.995 162 L 0.996 163 E 0.997 169 L 0.992 170 R 0.954 174 X 0.983 177 R 0.956 179 V 0.98 181 X 0.993 185 H 0.995 188 X 0.962 189 X 0.998

IX (12) 7 Y 0.962 10 N 0.990 31 L 0.991 32 W 0.978 37 Y 0.995 51 I 0.981 58 R 0.979 79 N 0.999 93 T 0.967 101 L 0.979 159 H 0.968 177 R 0.964


X (8) 51 I 1.000 77 N 0.977 101 L 0.982 145 Q 0.987 154 H 0.989 173 S 0.996 179 V 0.982 188 X 0.999

XI (0)

XII (9) 51 I 0.966 82 V 0.987 83 A 0.972 87 S 0.988 101 L 0.952 158 G 0.98 159 H 0.961 173 S 0.950 179 V 0.977

XIII (75) 4 A 1.000 6 S 1.000 7 Y 1.000 8 Y 1.000 9 S 1.000 10 N 1.000 17 F 1.000 28 T 1.000 29 F 1.000 30 L 1.000 32 W 1.000 33 T 1.000 34 Q 1.000 35 T 1.000 36 A 1.000 37 Y 1.000 39 A 1.000 40 X 1.000 51 I 1.000 53 P 1.000 54 A 1.000 55 X 1.000 58 R 1.000 59 V 1.000 61 S 1.000 62 T 1.000 68 L 1.000 71 P 1.000 73 L 1.000 77 N 1.000 78 N 1.000 79 N 1.000 80 K 1.000 82 V 1.000 83 A 1.000 85 X 1.000 86 C 1.000 89 V 1.000 91 A 1.000 92 K 1.000 93 T 1.000 94 P 1.000 95 X 1.000 101 L 0.994 103 H 1.000 140 X 1.000 142 L 1.000 144 L 1.000 145 Q 1.000 146 R 1.000 148 C 1.000 149 R 1.000 152 Y 1.000 154 H 1.000 155 G 1.000 158 G 1.000 159 H 1.000 160 P 1.000 162 L 1.000 163 E 1.000 164 F 1.000 165 P 1.000 169 L 1.000 170 R 1.000 173 S 1.000 174 X 1.00 176 A 1.000 178 T 1.000 179 V 0.962 180 X 1.000 183 W 1.000 184 Q 1.00 185 H 1.000 188 X 1.000 189 X 1.000

XIV (27) 4 A 0.987 7 Y 0.989 10 N 0.985 27 X 0.963 28 T 0.993 34 Q 0.959 37 Y 0.977 51 I 0.989 58 R 0.986 68 L 0.972 70 Q 0.988 71 P 0.986 80 K 0.981 82 V 0.951 83 A 0.971 86 C 0.984 95 X 0.983 139 Q 0.97 148 C 0.968 152 Y 0.967 154 H 0.987 155 G 0.968 159 H 0.998 162 L 0.987 174 X 0.978 177 R 0.980 179 V 0.958

XV (24) 7 Y 0.956 29 F 0.961 30 L 0.977 32 W 0.981 34 Q 0.996 40 X 0.951 68 L 0.996 71 P 0.989 78 N 0.955 79 N 0.958 86 C 0.996 87 S 0.990 89 V 0.978 93 T 0.983 101 L 0.989 105 G 0.996 148 C 0.999 149 R 0.991 154 H 0.997 155 G 0.979 158 G 0.997 173 S 0.992 179 V 0.983 185 H 0.97

XVI (4) 51 I 0.973 101 L 0.950 148 C 0.978 149 R 0.978

XVII (15) 7 Y 0.986 28 T 0.984 35 T 0.996 37 Y 0.989 40 X 0.991 51 I 0.984 55 X 0.986 58 R 0.983 70 Q 0.975 77 N 0.979 78 N 0.982 101 L 0.963 105 G 0.990 171 X 0.952 179 V 0.995

XVIII (1) 37 Y 0.973

XIX (42) 7 Y 0.976 8 Y 0.997 17 F 0.967 28 T 0.997 31 L 0.998 32 W 0.973 34 Q 0.997 35 T 0.998 36 A 0.997 37 Y 0.950 40 X 0.989 51 I 0.960 53 P 0.993 54 A 0.999 58 R 0.984 59 V 0.997 62 T 0.953 64 K 0.997 68 L 0.957 70 Q 0.999 71 P 0.983 77 N 0.999 78 N 0.999 79 N 0.995 89 V 0.998 93 T 0.967 101 L 0.967 105 G 0.950 138 R 0.988 148 C 1.000 149 R 0.999 152 Y 0.996 154 H 0.983 158 G 0.952 162 L 0.967 163 E 1.000 164 F 0.984 170 R 0.998 173 S 0.989 177 R 0.975 179 V 0.954 184 Q 0.98

XX (5) 35 T 0.991 54 A 0.974 58 R 0.968 77 N 0.963 188 X 0.960

XXI (72) 4 A 1.000 7 Y 0.968 8 Y 1.000 9 S 1.000 10 N 1.000 17 F 1.000 28 T 1.000 29 F 1.000 32 W 0.954 34 Q 0.975 35 T 1.000 37 Y 1.000 39 A 1.000 40 X 1.000 51 I 0.966 53 P 1.000 54 A 1.000 55 X 0.972 58 R 0.987 59 V 1.000 61 S 1.000 62 T 0.961 64 K 1.000 70 Q 1.000 71 P 1.000 75 C 1.000 76 X 1.000 77 N 0.957 78 N 0.976 79 N 1.000 83 A 1.000 84 Q 1.000 86 C 1.000 87 S 1.000 89 V 1.000 92 K 1.000 93 T 1.000 95 X 1.000 99 X 1.000 104 P 1.000 105 G 1.000 107 X 1.00 143 D 1.000 144 L 1.000 145 Q 1.000 148 C 1.000 149 R 1.000 150 E 1.000 152 Y 1.000 154 H 0.964 155 G 1.000 158 G 1.000 159 H 1.000 160 P 1.000 162 L 1.000 163 E 1.000 164 F 1.000 165 P 1.000 166 V 1.000 169 L 1.000 170 R 1.000 173 S 0.954 174 X 1.000 177 R 1.000 179 V 1.000 181 X 0.98 182 X 1.000 183 W 0.952 185 H 1.000 187 L 1.000 188 X 1.000 189 X 1.000

D3TIL (1) 158 G 0.957

Spalax D3p2-4 (5) 32 W 0.961 37 Y 0.967 51 I 0.960 70 Q 0.989 71 P 0.971

Meriones D3p2-7 (52) 2 P 0.951 4 A 0.994 6 S 0.970 7 Y 0.990 9 S 0.997 10 N 0.990 16 X 0.981 17 F 0.995 23 X 0.965 26 C 0.988 31 L 0.995 32 W 0.962 35 T 0.997 36 A 0.979 37 Y 0.983 39 A 0.951 40 X 0.958 50 X 0.993 51 I 0.993 53 P 0.994 55 X 0.961 58 R 0.981 59 V 0.960 62 T 0.984 70 Q 0.960 74 X 0.970 77 N 0.977 78 N 0.996 79 N 0.992 80 K 0.974 82 V 0.985 83 A 0.993 85 X 0.983 86 C 0.956 87 S 0.987 88 V 0.990 89 V 0.997 92 K 0.962 93 T 0.981 95 X 0.989 99 X 0.953 101 L 0.990 105 G 0.997 141 X 0.980 148 C 0.975 158 G 0.974 162 L 0.991 163 E 0.959 169 L 0.973 174 X 0.971 177 R 0.997 188 X 0.995

Murid-sp D3p4-6 (4) 148 C 0.953 149 R 0.987 179 V 0.979 188 X 0.953
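Each row of Table 4.4 starts with a Group label and the number of BEB sites in parentheses. It then lists repeating triplets of consensus position, consensus amino acid (or X), and posterior probability. As a hedged sketch, the Python helpers below split one flattened row into those triplets. This makes it possible to check the reported count against the listed sites, or to re-filter the sites at a stricter posterior cutoff. The helper names are hypothetical and are not part of the original analysis.

```python
import re

# A row starts with the Group label and the BEB site count in parentheses.
ROW_HEADER = re.compile(r"^(.+?)\s+\((\d+)\)\s*(.*)$")
# Each site entry: consensus position, one-letter amino acid (X = no consensus), posterior.
SITE_ENTRY = re.compile(r"(\d+)\s+([A-Z])\s+([01]\.\d+)")


def parse_beb_row(row: str):
    """Split one flattened Table 4.4 row into (group, reported_count, sites)."""
    match = ROW_HEADER.match(row.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {row!r}")
    group, count, rest = match.group(1), int(match.group(2)), match.group(3)
    sites = [(int(pos), aa, float(prob)) for pos, aa, prob in SITE_ENTRY.findall(rest)]
    return group, count, sites


def sites_at_or_above(sites, cutoff=0.95):
    """Keep the (position, amino acid, posterior) triplets meeting the cutoff."""
    return [site for site in sites if site[2] >= cutoff]


row = "XVI (4) 51 I 0.973 101 L 0.950 148 C 0.978 149 R 0.978"
group, reported, sites = parse_beb_row(row)
print(group, reported, len(sites), len(sites_at_or_above(sites, 0.97)))
```

Raising the cutoff from 0.95 to 0.97 drops the Group XVI site at position 101, whose posterior sits exactly at the original threshold.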


Table 4.5. Likelihood ratio test statistics for models of variable selective pressure for each Group. Model M1 assumes different dN/dS for each branch; Model M2 assumes different dN/dS for each branch and site, with selection at sites confirmed by Bayes Empirical Bayes statistical support. The test statistic, 2(ΔLH), equals twice the difference in log-likelihood values between the models being compared. Degrees of freedom equal 2 for both comparisons, M1 vs. M2 and M7 vs. M8. Asterisks denote significant P-values.

Group | M0 vs. M1: 2(ΔLH), P | M0 vs. M2: 2(ΔLH), P | M0 vs. M7: 2(ΔLH), P | M0 vs. M8: 2(ΔLH), P
I | 42.16, <<0.01* | 60.58, <<0.01* | 40.32, <<0.01* | 60.46, <<0.01*
II | 26.10, <<0.01* | 43.24, <<0.01* | 24.68, <<0.01* | 43.06, <<0.01*
III | 8.82, <0.01* | 16.18, <0.01* | 8.28, <0.01* | 15.00, <0.01*
IV | 4.92, <0.05* | 11.76, <0.01* | 0.56, 0.90 | 11.76, <0.01*
V | 27.26, <<0.01* | 37.90, <<0.01* | 26.80, <<0.01* | 37.84, <<0.01*
VI | 20.22, <<0.01* | 23.76, <<0.01* | 19.74, <<0.01* | 22.98, <<0.01*
VII | 29.88, <<0.01* | 44.12, <<0.01* | 27.94, <<0.01* | 44.08, <<0.01*
VIII | 13.88, <0.01* | 15.22, <0.01* | 13.82, <0.01* | 15.22, <0.01*
IX | 15.66, <0.01* | 20.42, <0.01* | 15.24, <0.01* | 20.42, <0.01*
X | 24.54, <<0.01* | 57.26, <<0.01* | 20.34, <0.01* | 57.16, <0.01*
XI | 21.16, <0.01* | 22.20, <0.01* | 20.68, <0.01* | 20.68, <0.01*
XII | 22.90, <0.01* | 29.86, <<0.01* | 22.62, <0.01* | 29.84, <<0.01*
XIII | 24.84, <<0.01* | 36.40, <<0.01* | 23.82, <<0.01* | 27.02, <<0.01*
XIV | 38.38, <<0.01* | 41.78, <<0.01* | 35.86, <<0.01* | 38.88, <<0.01*
XV | 13.10, <0.01* | 17.12, <0.01* | 12.90, <0.01* | 17.12, <0.01*
XVI | 27.66, <<0.01* | 31.40, <<0.01* | 26.18, <<0.01* | 31.46, <<0.01*
XVII | 34.82, <<0.01* | 46.82, <<0.01* | 33.30, <<0.01* | 46.84, <<0.01*
XVIII | 12.40, <0.01* | 16.20, <0.01* | 12.28, <0.01* | 16.26, <0.01*
XIX | 25.26, <<0.01* | 29.72, <<0.01* | 23.04, <<0.01* | 29.70, <<0.01*
XX | 45.90, <<0.01* | 56.90, <<0.01* | 44.92, <<0.01* | 56.34, <<0.01*
XXI | 16.36, <0.01* | 27.40, <<0.01* | 15.76, <0.01* | 27.02, <<0.01*
D3TIL | 29.52, <<0.01* | 31.10, <<0.01* | 29.46, <<0.01* | 31.20, <<0.01*
Spalax D3p2-4 | 9.70, <0.01* | 20.54, <0.01* | 9.22, <0.01* | 20.52, <0.01*
Meriones D3p2-7 | 6.38, 0.025* | 13.86, <0.01* | 5.10, <0.025* | 13.86, <0.01*
Murid-sp D3p4-6 | 24.44, <0.01* | 31.40, <0.01* | 22.94, <0.01* | 31.48, <0.01*
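The caption of Table 4.5 defines the test statistic as 2(ΔLH), twice the difference in log-likelihood between the models being compared, which is then referred to a chi-square distribution. The standard-library Python sketch below computes that statistic and its upper-tail P-value for an integer number of degrees of freedom. It uses the closed-form chi-square series, so no statistics package is needed. The function names and the log-likelihood values in the example are hypothetical. The degrees of freedom are left as a parameter rather than fixed at the value stated in the caption.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """2(ΔLH): twice the log-likelihood gain of the richer model over the nested null."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0  # a zero or negative statistic gives no evidence against the null
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1/2) / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


# Hypothetical log-likelihoods for a null model and a nested model that adds 2 parameters.
stat = lrt_statistic(lnl_null=-1000.0, lnl_alt=-990.0)
print(f"2(dLH) = {stat:.2f}, P = {chi2_sf(stat, df=2):.3g}")
```

For two degrees of freedom, the series reduces to P = exp(-2(ΔLH)/2). Statistics as large as most of those in Table 4.5 therefore give very small P-values.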


CHAPTER V

CONCLUSIONS

General Conclusions

The formation of new species, or speciation, is a central tenet of evolutionary biology. Indeed, the vast diversity of life on Earth can only be explained by speciation, a process that continuously generates independently evolving lineages maintained by reproductive isolation mechanisms and that ultimately drives macroevolution. Despite its importance to all of biology, speciation remains a ‘black box’: it is intertwined with reproductive isolation but poorly understood at the level of the genetic processes that promote it. Consequently, evolutionary biologists often cannot answer key questions about the definition and major characteristics of "speciation genes". The successful identification of several putative speciation genes and ‘candidates’ that may contribute disproportionately to speciation, particularly in externally spawning marine invertebrates, provides a glimpse of the factors underlying the origin of species. The only characteristic of a speciation gene consistently required in the literature is positive selection among and within taxa that contributes to the divergence of species (Noor and Feder 2006, Nosil and Schluter 2011). Nevertheless, no speciation genes are known in internally fertilizing vertebrates, particularly mammals. The studies presented here provide compelling evidence for a Eutherian ‘speciation gene’, Zan, which confers species specificity to fertilization, thereby promoting reproductive isolation and speciation.


The extraordinary morphological diversity among extant mammals poses a challenge for studies of adaptation, reproductive isolation, molecular evolution, and speciation. Uncertainties remain regarding speciation events at all mammalian taxonomic levels, and these events have proven persistently difficult to resolve. This dissertation presents a phylogenetic solution based on a single reproductive molecular marker, Zan, a putative mammalian “speciation gene” encoding the sperm protein zonadhesin, which mediates species-specific adhesion to the egg and thereby promotes reproductive isolation and the divergence of species. Zan appears to be a viable single-gene marker for mammalian phylogeny and for studies of mammalian speciation.

Delineating species divergence events first requires resolving how to define a distinct species. There are multiple definitions of ‘species’, many of which relate directly to speciation (Mayden 1997). Specifically, the Biological Species Concept and the Genetic Species Concept both examine patterns of variation (reproductive and genetic, respectively) to identify species and species boundaries accurately and to better understand mammalian systematics, evolution, and biodiversity (Baker and Bradley 2006). The Biological Species Concept focuses on ‘reproductive’ isolation between populations, whereas the Genetic Species Concept focuses on ‘genetic’ isolation between populations. If populations achieve reproductive isolation as a result of a prior genetic isolation event caused by changes in a putative speciation gene, it may be feasible to obtain a ‘unified’ species concept for mammalian reproductive systems. Sequence variation in Zan, a gene that contributes to the reproductive isolation of species, accurately tracks the evolutionary history of mammals. Zan may therefore be a molecular marker that uniquely unifies both species concepts and could serve as a basis for delimiting species in many mammalian systems.


All mammalian eggs are surrounded by the zona pellucida (ZP), a relatively thick, carbohydrate-rich extracellular coat that plays vital roles during oogenesis, fertilization, and preimplantation development. The ZP is an elastic, porous coat penetrable by sperm as well as by enzymes, antibodies, and small viruses (Wassarman 2008). The sperm protein zonadhesin binds the egg ZP in a species-specific manner, forming a dynamic and rapidly coevolving gamete recognition mechanism. In Eutherian mammals, the ZP-binding activity resides in the von Willebrand type D (vWD) domains of the protein, specifically in the region spanning D3 to D4. In rodents, the number of partial vWD3 (D3p) expansions varies widely, from 0 in the Lesser Egyptian jerboa to 24 in the prairie deermouse. This range suggests that expansion is an ongoing process and potentially more frequent in some lineages. Moreover, zonadhesin is already a relatively large protein in Eutherian species (~6 kb), and its nearly threefold increase in size in some rodents (~16 kb) may dramatically change the structure, and therefore the function, of the protein. Zonadhesin is hypothesized to act as a neutrally evolving molecular spacer, a mechanism to ‘reach out’ and facilitate recognition between the sperm and the egg ZP extracellular matrix (Tardif et al. 2010a). The dramatic expansions in myomorph zonadhesin, together with differential selection (in pervasiveness and intensity) among orthologous D3p domains, suggest more than a structural change. They point to a potential alteration, possibly an enhancement, of species specificity in gamete recognition and, subsequently, in the divergence of species.


Future Directions

Zan’s apparent restriction to Eutheria and absence from other vertebrate groups raise the question of when and how it originated. This dissertation proposes that Zan and the Zan-like progenitor (ZanL) evolved from an ancestral Zan-like gene in stem vertebrates. That gene was lost in some vertebrates (amphibians, birds, and marsupials). It persists as ZanL, with an as-yet unidentified function, in others (monotremes, fishes, and reptiles), and it persists as Zan in placental mammals because of its acquired function in gamete recognition (Roberts et al. Ontogeny). Compelling evidence indicates that Zan has been retained in Eutheria, with an apparent loss in Metatheria and Prototheria. Assays quantifying zonadhesin protein expression in various vertebrate species, including but not limited to marsupials, monotremes, reptiles, birds, and fishes, may reveal the ontogeny of Zan function across the subphylum Vertebrata. Ongoing genomic analyses of the vertebrate Zan locus have thus far revealed partial syntenic blocks conserved back to the divergence of Vertebrata from other chordates.

A new approach to studying speciation focuses on the causes of initial divergence in populations that are only partly isolated (Via 2009). Evolutionarily ‘young’ lineages at incipient stages of divergence provide information on barriers to gene flow that evolve as a result of adaptive selection. This view is bolstered by a wealth of research evaluating the roles of neutral processes and of negative and positive selection in a number of animal systems (Gavrilets 2004; Dieckmann et al. 2004a; Gavrilets and Losos 2009; Barton 2010). The central limitation of this forward-looking approach is the inability to foresee whether speciation will be driven to completion. Only the balance between the benefits and costs of hybridization determines whether incipient species ultimately merge upon secondary contact or diverge (speciate).

Nevertheless, lineages that have not yet fully diverged provide a ‘snapshot in time’ that shows nature taking certain adaptations for a ‘test drive’.

Hybrid zones offer a unique opportunity to describe the dynamic nature of speciation. Studies assessing the extent of hybridization would be informative if they targeted mammal species that are known to hybridize and that diverged at different times. Such studies would reveal the presence of reproductive isolation barriers and the specific mechanisms of divergence.

For example, the pocket gophers Geomys bursarius and G. knoxjonesi, the woodrats Neotoma micropus and N. floridana, and the ground squirrels Ictidomys tridecemlineatus and I. parvidens each hybridize frequently and produce viable hybrid progeny (Bradley et al. 1993, Mauldin et al. 2014, Thompson et al. 2013).

However, hybrids of these taxon pairs may differ in fertility. For example, I. tridecemlineatus and I. parvidens are sibling species that diverged relatively recently (<2 million years; Cothran 1983, Thompson et al. 2013). In contrast, the other two pairs are not sibling taxa and diverged comparatively earlier (5-7 million years; Russell 1968, Bradley et al. 1993, Mauldin et al. 2014). Examining taxon groups that appear to be at different stages of radiation provides a unique opportunity to evaluate the mechanisms of speciation, especially in rapidly evolving species groups (Cothran 1983; Birney 1976). The directionality of crosses also provides helpful insight into isolating mechanisms in these species, which makes these taxa ideal for a reproductive isolation project. G. bursarius and G. knoxjonesi hybridize unidirectionally: G. bursarius males mate preferentially with G. knoxjonesi females, whereas the reciprocal cross occurs at a much lower frequency, if at all (Bradley et al. 1993). Mating between N. micropus and N. floridana, by contrast, is bidirectional, with no apparent limitation on mating direction. Ictidomys crosses are thought to be unidirectional, as in Geomys, but other studies suggest that they may be bidirectional, like those observed in Neotoma (Cothran 1983).

What is required to further test the utility and possible limitations of Zan as a molecular marker and to describe species divergence events across taxonomic levels? At a minimum, such studies must: 1) include all mammalian orders and a more extensive assortment of species, and 2) test the recovery of lower taxonomic-level delineations by incorporating several pairs of sister taxa in the most speciose genera, e.g., Mus, Peromyscus, and Myotis. Such studies would also test whether Zan sequence divergence delineates species under a unified species concept.

In addition, recent technological advances in genome analysis and growing interest in the role of reproductive isolation in species differentiation are making speciation research increasingly open to non-model organisms, especially wild animal systems. Of the 6,399 extant mammalian species, 99.7% exist in wild (non-domesticated) populations globally (Burgin et al. 2018). The largest families are in Rodentia (Muridae, 834 spp.; Cricetidae, 792 spp.), followed by Vespertilionidae in Chiroptera (493 spp.) and Soricidae in Eulipotyphla (440 spp.). Despite this large number of wild species, reproductive isolation mechanisms remain unclear, and largely unknown, in most mammals.

Previous studies showed that targeted disruption of Zan in mice inhibited adhesion of wild-type spermatozoa to the mouse ZP but did not inhibit adhesion of spermatozoa lacking zonadhesin (Tardif et al. 2010a). Thus, the Zan ‘knock out’ genotype appears to be promiscuous, and loss of zonadhesin therefore disrupts the species specificity of fertilization. Studies of ‘promiscuity’ in protein domains (the capacity of a domain to combine with many different partner domains) report that promiscuity is volatile and changes rapidly during evolution. Only a few domains retain promiscuous status throughout the evolution of a given lineage, especially in rapidly evolving groups (Basu et al. 2008). There are several opportunities to continue this research. First, a pig (or other species) transgene could be introduced directly into the knockout mouse genotype. Pig sperm-mouse ZP adhesion could then be assessed by gamete co-incubation, similar to fertilization in vitro, to determine whether the gametes recognize each other and bind successfully. Second, experiments could perturb the Zan-ZP recognition barrier to reproduction, either by mutating gametes to simulate an ‘adjustment’ in recognition or by replacing a gamete with that of another species (closely or distantly related). Such experiments may detect changes in gamete recognition.

In mammalian spermatozoa, the combined binding activities of zonadhesin domains may confer functional degeneracy that protects against sterility upon loss of any one component’s function (Tardif et al. 2010a). This degeneracy would leave zonadhesin binding specificity free to evolve in support of reproductive isolation. Differential evolutionary pressures on portions of zonadhesin, such as intense positive selection, may alter binding specificity and thereby maintain a reproductive barrier to fertilization. The distribution of positive selection among orthologous D3p domain Groups suggests that the D3p domains duplicated nearest to the putative site of propagation (D3TIL) are the most likely to confer species specificity to gamete binding activity.

Therefore, experiments are needed to clarify whether any D3p domains bind the ZP in a species-specific manner, and if so, which ones. Individual D3p domains, alone and in combination, could be isolated and expressed, and their binding, if any, to an affinity matrix containing large amounts of conspecific ZP could be determined.

The majority of protein domain duplications can be explained by the addition of single domains. However, in the generation of long tandem repeats, especially in proteins directly involved in cell-cell interaction, protein domains tend to accumulate non-linearly because multiple domains can duplicate simultaneously. Protein domains may be duplicated through various mechanisms. On the largest scale, whole-genome duplications, such as those seen in vertebrate genomes, duplicated entire gene families. At the other end of the scale, domains have been shown to duplicate through genetic mechanisms such as exon shuffling, retrotransposition, recombination, and horizontal gene transfer. Because the contributions of forces such as exon shuffling and genome duplication vary widely among species, the total number of domains present also fluctuates among species. The comparison of D3p domain expansions in this dissertation included only a subset of the >2,500 rodent species: 20 species representing 11 families. Future studies characterizing the dramatic variation in D3p domain expansions must incorporate a broader assortment of species, including several pairs of sister taxa in the most speciose genera, e.g., Mus, Peromyscus, and Microtus.

These future studies may fully resolve Zan’s function in three respects. The first is the full breadth of Zan’s role in the species specificity of fertilization, and therefore in speciation, in Eutherian species. The second is the utility of Zan as a phylogenetic marker and accurate proxy for the mammalian species tree. The third is Zan’s function as a reproductive isolation barrier, particularly in hybrid zones and relatively “younger” taxa, especially among closely related and even sibling species.


BIBLIOGRAPHY

Alipaz JA, Wu CI, Karr TL (2001) Gametic incompatibilities between races of Drosophila melanogaster. Proc. R. Soc. Lond. B 268:789-795.

Amrine-Madsen H, Koepfli KP, Wayne RK, Springer MS (2003) A new phylogenetic marker, apolipoprotein B, provides compelling evidence for eutherian relationships. Mol. Phylogenet. Evol. 28:225-240.

Arbogast BS, Slowinski JB (1999) Pleistocene speciation and the mitochondrial DNA clock. Science 282:1955a.

Asher RJ, Bennett N, Lehmann T (2009) The new framework for understanding placental mammal evolution. BioEssays 31:853-864.

Asher RJ, Helgen KM (2010) Nomenclature and placental mammal phylogeny. BMC Evol. Biol. 10:102.

Bagowski PC, Bruins W, te Velthuis AJW (2010) The nature of protein domain evolution: shaping the interaction network. Curr. Genomics 11:368-376.

Baker RJ, Bradley RD (2006) Speciation in mammals and the genetic species concept. J. Mamm. 87:643-662.

Barton NH (2010) What role does natural selection play in speciation? Phil. Trans. R. Soc. B. 365:1825-1840.

Basu MK, Carmel L, Rogozin IB, Koonin EV (2008) Evolution of protein domain promiscuity in eukaryotes. Genome Res. 18:449-461.

Beck RMD, Bininda-Emonds ORP, Cardillo M, Liu F, Purvis A (2006) A higher-level MRP supertree of placental mammals. BMC Evol. Biol. 6:93.

Bell G (1982) The masterpiece of nature: the evolution and genetics of sexuality. CUP Archive.

Benirschke K (1967) Sterility and fertility of interspecific mammalian hybrids. In Comparative aspects of reproductive failure (pp. 218-234). Springer, Berlin, Heidelberg.

Benton MJ (2015) Exploring macroevolution using modern and fossil data. Proc. R. Soc. B 282:20150569.

Bhaskara RM, Srinivasan N (2011) Stability of domain structures in multi-domain proteins. Sci. Rep. 1:40.

Bi M, Wassler MJ, Hardy DM (2002) Sperm adhesion to the extracellular matrix of the egg. In Hardy DM (ed) Fertilization. Academic Press, San Diego, CA, pp. 153-180.

Bi M, Hickox JR, Winfrey VP, Olson GE, Hardy DM (2003) Processing, localization and binding activity of zonadhesin suggest a function in sperm adhesion to the zona pellucida during exocytosis of the acrosome. Biochem. J. 375:477-488.

Bininda-Emonds ORP, Cardillo M, Jones KE, MacPhee RDE, Beck RMD, Grenyer R, Price SA, Vos RA, Gittleman JL, Purvis A (2007) The delayed rise of present-day mammals. Nature 446:507-512.

Birney EC (1976) An assessment of relationships and effects of interbreeding among woodrats of the Neotoma floridana species-group. J. Mamm. 57:103-132.

Bjorklund AK, Ekman D, Elofsson A (2006) Expansion of protein domain repeats. PLoS Comput. Biol. 2:e114.

Bradley RD, Davis S, Lockwood S, Bickham J, Baker R (1991a) Hybrid breakdown and cellular-DNA content in a contact zone between two species of pocket gopher (Geomys). J. Mamm. 72:697-705.

Bradley RD, Davis SK, Baker RJ (1991b) Genetic control of premating-isolating behavior: Kaneshiro’s hypothesis and asymmetrical sexual selection in pocket gophers. J. Hered. 82:192-196.

Bradley RD, Bull JJ, Johnson AD, Hillis DM (1993) Origin of a novel allele in a mammalian hybrid zone. Proc. Natl. Acad. Sci. USA 90:8939-8941.

Bronson F (1991) Mammalian Reproductive Biology. University of Chicago Press, Chicago, IL.

Buljan M, Bateman A (2009) The evolution of protein domain families. Biochem. Soc. Trans. 37:751-755.

Burgin CJ, Colella JP, Kahn PL, Upham NS (2018) How many species of mammals are there? J. Mamm. 99:1-14.

Callan HG (1967) The organization of genetic units in chromosomes. J. Cell Sci. 2.1:1-7.


Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinf. 10:421.

Cao Y, Fujiwara M, Nikaido M, Okada N, Hasegawa M (2000) Interordinal relationships and timescale of Eutherian evolution as inferred from mitochondrial genome data. Gene 259:149-158.

Carr SM, Ballinger SW, Derr JN, Blankenship LH, Bickham JW (1986) Mitochondrial DNA analysis of hybridization between sympatric white-tailed deer and mule deer in west Texas. Proc. Natl. Acad. Sci. U.S.A. 83:9576-9580.

Cohen J (1973) Crossovers, sperm redundancy and their close association. Heredity 31:408.

Cooper LN, Seiffert ER, Clementz M, Madar SI, Bajpai S, Hussain ST, Thewissen JGM (2014) Anthracobunids from the Middle Eocene of India and Pakistan are stem perissodactyls. PLoS ONE 9:e109232.

Cothran EG (1983) Morphologic relationships of the hybridizing ground squirrels Spermophilus mexicanus and S. tridecemlineatus. J. Mamm. 64:591-602.

Coyne JA (1992) Genetics and speciation. Nature 355:511-515.

Coyne JA, Orr HA (2004) Speciation. Sinauer Associates, Sunderland, MA, pp. 276-281.

Daly M (1978) The cost of mating. Am. Nat. 112:771-774.

Darriba D, Taboada GL, Doallo R, Posada D (2012) jModelTest 2: more models, new heuristics and parallel computing. Nature Meth. 9:772.

Day RW, Quinn GP (1989) Comparisons of treatments after an analysis of variance in ecology. Ecol. Mono. 59:433-463.

Dehal P, Boore JL (2005) Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 3.10:e314.

de Jong WW (1998) Molecules remodel the mammalian tree. Trends Ecol. Evol. 13:270-275.

Delsuc F, Scally M, Madsen O, Stanhope MJ, de Jong WW, Catzeflis FM, Springer MS, Douzery EJP (2002) Molecular phylogeny of living Xenarthrans and the impact of character and taxon sampling on the placental tree rooting. Mol. Biol. Evol. 19:1656-1671.


Derrickson EM (1992) Comparative reproductive strategies of altricial and precocial Eutherian mammals. Funct. Ecol. 6:57-65.

de Vienne DM, Giraud T, Martin OC (2007) A congruence index for testing topological similarity between trees. Bioinf. 23:3119-3124.

Dieckmann U, Doebeli M, Metz JA, Tautz D (eds) (2004) Adaptive speciation. Cambridge University Press, Cambridge.

Dobzhansky T (1937) Genetics and the Origin of Species. Columbia University Press, New York.

Doolittle RF (1995) The multiplicity of domains in proteins. Annu. Rev. Biochem. 64:287-314.

dos Reis M, Inoue J, Hasegawa M, Asher RJ, Donoghue PC, Yang Z (2012) Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny. Proc. R. Soc. B 279:3491-3500.

Douady CJ, Chatelier PI, Madsen O, de Jong WW, Catzeflis F, Springer MS, Stanhope MJ (2002) Molecular phylogenetic evidence confirming the Eulipotyphla concept and in support of hedgehogs as the sister group to shrews. Mol. Phylogenet. Evol. 25:200-209.

Dover G (1994) Concerted evolution, molecular drive and natural selection. Curr. Biol. 4.12:1165-1166.

Eberhard MJ (1979) Sexual selection, social competition, and evolution. Proc. Am. Phil. Soc. 123:222-234.

Eberhard WG (1985) Sexual selection and animal genitalia. Cambridge, MA, USA: Harvard University Press.

Eberhard MJ (1986) Alternative adaptations, speciation, and phylogeny (a review). Proc. Natl. Acad. Sci. U.S.A. 83.5:1388-1392.

Edelman GM, Gally JA (1968) Antibody structure, diversity, and specificity. Brookhaven Symposia in Biology 21:328-344.

Edelman GM, Gally JA (1970) Arrangement and evolution of eukaryotic genes. In Schmitt F (ed) The Neurosciences: Second Study Program. Rockefeller University Press, New York, pp. 962-972.

Edwards R (1993) Entomological and mammalogical perspectives on genital differentiation. Trends Ecol. Evol. 8:406-409.


Elder Jr JF, Turner BJ (1995) Concerted evolution of repetitive DNA sequences in eukaryotes. Q. Rev. Biol. 70:297-320.

Etienne RS, Rosindell J (2012) Prolonging the past counteracts the pull of the present: protracted speciation can explain observed slowdowns in diversification. Syst. Biol. 61:204-213.

Fabre P, Hautier L, Dimitrov D, Douzery EJP (2012) A glimpse on the pattern of rodent diversification: a phylogenetic approach. BMC Evol. Biol. 12:88.

Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39.4:783-791.

Fitzpatrick BM (2004) Rates of evolution of hybrid inviability in birds and mammals. Evolution 58:1865-1870.

Foley NM, Springer MS, Teeling EC (2016) Mammal madness: is the mammal tree of life not yet resolved? Phil. Trans. R. Soc. B. 371:20150140.

Frost SDW, Wrin T, Smith DM, Kosakovsky Pond SL, Liu Y, Paxinos E, Chappey C, Galovich J, Beauchaine J, Petropoulos CJ, Little SJ, Richman DD (2005) Neutralizing antibody responses drive the evolution of human immunodeficiency virus type 1 envelope during recent HIV infection. Proc. Natl. Acad. Sci. U.S.A. 102:18514-18519.

Galbreath GJ, Hunt M, Clements T, Waits LP (2008) An apparent hybrid wild bear from Cambodia. Ursus 19:85-86.

Gally JA (1989) Past imperfect [letter]. Trends Genet. 5:172.

Gao Z, Garbers D (1998) Species diversity in the structure of zonadhesin, a sperm-specific membrane protein containing multiple cell adhesion molecule-like domains. J. Biol. Chem. 273:3415-3421.

Gaudin TJ, Wible JR, Hopson JA, Turnbull WD (1996) Reexamination of the morphological evidence for the cohort Epitheria (Mammalia, Eutheria). J. Mamm. Evol. 3:31-79.

Gavrilets S (2004) Fitness landscapes and the origin of species (MPB-41). Princeton University Press.

Gavrilets S, Losos JB (2009) Adaptive radiation: contrasting theory with data. Science 323:732-737.


Gittleman JL, Thompson SD (1988) Energy allocation in mammalian reproduction. Am. Zool. 28:863-875.

Glor RE (2010) Phylogenetic insights on adaptive radiation. Annu. Rev. Ecol. Evol. Syst. 41:251-270.

Halabi N, Rivoire O, Leibler S, Ranganathan R (2009) Protein sectors: evolutionary units of three-dimensional structure. Cell 138.4:774-786.

Halliday TJD, Upchurch P, Goswami A (2015) Resolving the relationships of placental mammals. Biol Rev. 92:521-550.

Hallstrom B, Kullberg M, Nilsson MA, Janke A (2007) Phylogenomic data analyses provide evidence that Xenarthra and Afrotheria are sister groups. Mol. Biol. Evol. 24:2059-2068.

Hardy D, Garbers D (1994) Species-specific binding of sperm proteins to the extracellular matrix (zona pellucida) of the egg. J. Biol. Chem. 269:19000-19004.

Hardy D, Garbers D (1995) A sperm membrane protein that binds in a species-specific manner to the egg extracellular matrix is homologous to von Willebrand factor. J. Biol. Chem. 270:26025-26028.

Harrison RG, Larson EL (2014) Hybridization, introgression, and the nature of species boundaries. J. Hered. 105:795-809.

Herberg S, Gert KR, Schleiffer A, Pauli A (2018) The Ly6/uPAR protein Bouncer is necessary and sufficient for species-specific fertilization. Science 361:1029-1033.

Herlyn H, Zischler H (2004) The molecular evolution of sperm zonadhesin. Int. J. Dev. Biol. 52.5-6:781-790.

Herlyn H, Zischler H (2006) Tandem repetitive D domains of the sperm ligand zonadhesin evolve faster in the paralogue than in the orthologue comparison. J Mol. Evol. 63:602-611.

Hickox JR, Bi M, Hardy DM (2001) Heterogeneous processing and zona pellucida binding activity of pig zonadhesin. J. Biol. Chem. 276:41502-41509.

Honeycutt R (2009) Rodents (Rodentia). In The Timetree of Life, Hedges SB, Kumar S, eds. Oxford University Press (Oxford, UK) p. 490-494.

Horandl E (2009) A combinatorial theory for maintenance of sex. Heredity 103:445.

Horandl E (2013) Meiosis and the paradox of sex in nature. InTech.

Horvath J, Weisrock DW, Embry SL, Fiorentino I, Balhoff JP, Kappeler P, Wray GA, Willard HF, Yoder AD (2008) Development and application of a phylogenomic toolkit: resolving the evolutionary history of Madagascar's lemurs. Genome Res. 18:489-499.

Huelsenbeck JP, Bull JJ, Cunningham CW (1996) Combining data in phylogenetic analysis. Trends Ecol. Evol. 11:152-158.

Hughes T, Liberles DA (2008) Whole-genome duplications in the ancestral vertebrate are detectable in the distribution of gene family sizes of tetrapod species. J. Mol. Evol. 67.4:343-357.

Hunt PN, Wilson MD, Von Schalburg KR, Davidson WS, Koop BF (2005) Expression and genomic organization of zonadhesin-like genes in three species of fish give insight into the evolutionary history of a mosaic protein. BMC Genomics 6:165.

Huxley J (1942) Evolution: The modern synthesis. George Allen and Unwin Ltd., London.

Jin J, Xie X, Chen C, Park JG, Stark C, James DA, Olhovsky M, Linding R, Mao Y, Pawson T (2009) Eukaryotic protein domains as functional units of cellular evolution. Sci. Signal. 2.98:ra76.

Jordan DM, Ramensky VE, Sunyaev SR (2010) Human allelic variation: perspective from protein function, structure, and evolution. Curr. Op. Struct. Biol. 20.3:342-350.

Kaneshiro KY (1980) Sexual isolation, speciation and the direction of evolution. Evolution 34:437-444.

Kassahn KS, Dang VT, Wilkins SJ, Perkins AC, Ragan MA (2009) Evolution of gene function and regulatory control after whole-genome duplication: comparative analyses in vertebrates. Genome Res. 19.8:1404-1418.

Keren H, Lev-Maor G, Ast G (2010) Alternative splicing and evolution: diversification, exon definition and function. Nature Rev. Genet. 11.5:345-355.

Killingbeck EE, Swanson WJ (2018) Egg coat proteins across metazoan evolution. Curr. Top. Dev. Biol. 130:443-488.

Kriegs JO, Churakov G, Jurka J, Brosius J, Schmitz J (2007) Evolutionary history of 7SL RNA-derived SINEs in Supraprimates. Trends Genet. 23:158-161.


Kumar S, Filipski AJ, Battistuzzi FU, Kosakovsky Pond SL, Tamura K (2012) Statistics and truth in phylogenomics. Mol. Biol. Evol. 29:457-472.

Kumar V, Hallstrom BM, Janke A (2013) Coalescent-based genome analyses resolve the early branches of the Euarchontoglires. PLoS ONE 8:e60019.

Lee Y, Vacquier V (1992) The divergence of species-specific abalone sperm lysins is promoted by positive Darwinian selection. Biol. Bull. 182:97-104.

Lessios HA (2011) Speciation genes in free-spawning marine invertebrates. Integ. Comp. Biol. 51:456-465.

Liao D (1999) Concerted evolution: molecular mechanism and biological implications. Am. J. Hum. Genet. 64.1:24.

Liao D (2000) Gene conversion drives within genic sequences: concerted evolution of ribosomal RNA genes in bacteria and archaea. J. Mol. Evol. 51.4:305-317.

Liu FGR, Miyamoto MM, Freire NP, Ong PQ, Tennant MR, Young TS, Gugel KF (2001) Molecular and morphological supertrees for eutherian (placental) mammals. Science 291:1786-1789.

Long J, Li M, Ren Q, Zhang C, Fan J, Duan Y, Chen J, Li B, Deng L (2012) Phylogenetic and molecular evolution of the ADAM (A Disintegrin And Metalloprotease) gene family from Xenopus tropicalis to Mus musculus, Rattus norvegicus, and Homo sapiens. Gene 507:36-43.

MacPhee RDE, Novacek MJ (1993) Definition and relationships of Lipotyphla. In Szalay FS, Novacek MJ, McKenna MC (eds) Mammal Phylogeny: Placentals. Springer-Verlag, New York, pp. 13-31.

Madsen O, Scally M, Douady CJ, Kao DJ, DeBry RW, Adkins R, Amrine HM, Stanhope MJ, de Jong WW, Springer MS (2001) Parallel adaptive radiations in two major clades of placental mammals. Nature 409:610-614.

Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, Van de Peer Y (2005) Modeling gene and genome duplications in eukaryotes. Proc. Natl. Acad. Sci. U.S.A. 102.15:5454-5459.

Mallet J (2007) Hybrid speciation. Nature 446:279-283.

Matocq MD, Shurtliff QR, Feldman CR (2007) Phylogenetics of the woodrat genus Neotoma (Rodentia: Muridae): Implications for the evolution of phenotypic variation in male external genitalia. Mol. Phylogenet. Evol. 42:637-652.

Mauldin M, Haynie M, Hanson J, Baker R, Bradley R (2014) Multilocus characterization of a woodrat (Genus Neotoma) hybrid zone. J. Hered. 105:466-476.

Mayden RL (1997) A hierarchy of species concepts: the denouement in the saga of the species problem. In Claridge MF, Dawah HA, Wilson MR (eds) Species: The units of diversity. Chapman and Hall, London, pp. 381-423.

Mayr E (1940) Speciation phenomena in birds. Am. Nat. 74:249-278.

McAllister BF, Werren JH (1999) Evolution of tandemly repeated sequences: What happens at the end of an array? J. Mol. Evol. 48:469-481.

McKenna MC, Bell SK (1997) Classification of Mammals Above the Species Level. Columbia Univ. Press, New York.

McLaughlin Jr. RN, Poelwijk FJ, Raman A, Gosal WS, Ranganathan R (2012) The spatial architecture of protein function and adaptation. Nature 491.7422:138-142.

Mead R (1968a) Reproduction in western forms of the spotted skunk. J. Mamm. 49:373-389.

Mead R (1968b) Reproduction in eastern forms of the spotted skunk (genus Spilogale). J. Zool. 156:119-136.

Meng J (2003) The osteology of Rhombomylus (Mammalia, Glires): Implications for phylogeny and evolution of Glires. Bull. Am. Mus. Nat. Hist. 275:1-247.

Meng J, Wyss AR (2001) The morphology of Tribosphenomys (Rodentiaformes, Mammalia): Phylogenetic implications for basal Glires. J. Mamm. Evol. 8:1-71.

Mengel RM (1971) A study of dog-coyote hybrids and implications concerning hybridization in Canis. J. Mamm. 52:316-336.

Meredith RW, Janecka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC, Goodbla A, Eizirik E, Simao TLL, Stadler T, Rabosky DL, Honeycutt RL, Flynn JJ, Ingram CM, Steiner C, Williams TL, Robinson RJ, Burk-Herrick A, Westerman M, Ayoub NA, Springer MS, Murphy WJ (2011) Impacts of the terrestrial revolution and KPg extinction on mammal diversification. Science 334:521-524.

Metz EC, Palumbi SR (1996) Positive selection and sequence arrangements generate extensive polymorphism in the gamete recognition protein bindin. Mol. Biol. Evol. 13:397-406.

Moy GW, Springer SA, Adams SL, Swanson WJ, Vacquier VD (2008) Extraordinary intraspecific diversity in oyster sperm bindin. Proc. Natl. Acad. Sci. U.S.A. 105:1993-1998.

Murphy WJ, Eizirik E, O’Brien SJ, Madsen O, Scally M, Douady CJ, Teeling EC, Ryder OA, Stanhope MJ, de Jong WW, Springer MS (2001) Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294:2348-2351.

Murphy WJ, Davis B, David VA, Agarwala R, Schäffer AA, Wilkerson Pearks AJ, Wilkerson BN, O’Brien SJ, Menotti-Raymond M (2007) A 1.5-Mb-resolution radiation hybrid map of the cat genome and comparative analysis with the canine and human genomes. Genomics 89:189-196.

Nei M (1975) Molecular population genetics and evolution. North-Holland, Amsterdam, x + 288 pp.

Nei M, Maruyama T, Wu C (1983) Models of evolution of reproductive isolation. Genetics 103:557-579.

Nishihara H, Hasegawa M, Okada N (2006) Pegasoferae, an unexpected mammalian clade revealed by tracking ancient retroposon insertions. Proc. Natl. Acad. Sci. U.S.A. 103:9929-9934.

Nishimura H, Myles DG, Primakoff P (2007) Identification of an ADAM2-ADAM3 complex on the surface of mouse testicular germ cells and cauda epididymal sperm. J. Biol. Chem. 282:17900-17907.

Noor M, Grams K, Bertucci L, Reiland J (2001) Chromosomal inversions and the reproductive isolation of species. Proc. Natl. Acad. Sci. U.S.A. 98:12084-12088.

Noor MA, Feder JL (2006) Speciation genetics: evolving approaches. Nature Rev. Genet. 7:851-861.

Nosil P, Schluter D (2011) The genes underlying the process of speciation. Trends Ecol. Evol. 26:160-167.

Notredame C, Higgins DG, Heringa J (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302:205-217.


Novacek MJ (1989) Higher mammal phylogeny: the morphological-molecular synthesis. In Fernholm B, Bremer K, Jornvall H (eds) The Hierarchy of Life. Elsevier, Amsterdam, pp. 421-435.

Novacek MJ (1992a) Mammalian phylogeny: shaking the tree. Nature 356:121-125.

Novacek MJ (1992b) Fossils, topologies, missing data, and the higher-level phylogeny of Eutherian mammals. Syst. Biol. 41:58-73.

Novacek MJ, Wyss AR, McKenna MC (1988) The major groups of Eutherian mammals. In Benton, MJ (ed) The Phylogeny and Classification of the Tetrapods, vol 2. Mammals. Clarendon Press, Oxford, pp. 31-71.

Odeen A, Florin A (2002) Sexual selection and peripatric speciation: the Kaneshiro model revisited. J. Evol. Biol. 15:301-306.

O'Leary MA, Allard M, Novacek MJ, Meng J, Gatesy J (2004) Building the mammalian sector of the tree of life: combining different data and a discussion of divergence times for placental mammals. In Cracraft J, Donoghue M (eds) Assembling the Tree of Life. Oxford University Press, Oxford, pp. 490-516.

O’Leary MA, Bloch JI, Flynn JJ, Gaudin TJ, Giallombardo A, Giannini NP, Goldberg SL, Kraatz BP, Luo Z, Meng J, Ni X, Novacek MJ, Perini FA, Randall ZS, Rougier GW, Sargis EJ, Silcox MT, Simmons NB, Spaulding M, Velazco PM, Weksler M, Wible JR, Cirranello AL (2013) The placental mammal ancestor and the post-KPg radiation of placentals. Science 339:662-667.

Olson GE, Winfrey VP, Bi M, Hardy DM, NagDas SK (2004) Zonadhesin assembly into the hamster sperm acrosomal matrix occurs by distinct targeting strategies during spermiogenesis and maturation in the epididymis. Biol. Reprod. 71:1128-1134.

Orengo CA, Jones DT, Thornton JM (1994) Protein superfamilies and domain superfolds. Nature 372.6507:631-634.

Orr HA (2005) The genetic basis of reproductive isolation: Insights from Drosophila. Proc. Natl. Acad. Sci. U.S.A. 102:6522-6526.

Palumbi SR (1994) Genetic divergence, reproductive isolation, and marine speciation. Annu. Rev. Ecol. Syst. 25:547-572.

Palumbi SR (1999) All males are not created equal: fertility differences depend on gamete recognition polymorphisms in sea urchins. Proc. Natl. Acad. Sci. U.S.A. 96:12632-12637.

Palumbi SR (2009) Speciation and the evolution of gamete recognition genes: pattern and process. Heredity 102:66-76.

Paradis E (2003) Analysis of diversification: combining phylogenetic and taxonomic data. Proc. R. Soc. Lond. B 270:2499-2505.

Patton JL, Smith MF (1993) Molecular evidence for mating asymmetry and female choice in a pocket gopher (Thomomys) hybrid zone. Mol. Ecol. 2:3-8.

Ponting CP, Russell RR (2002) The natural history of protein domains. Annu. Rev. Biophys. Biomol. Struct. 31.1:45-71.

Potts D (2001) Bactrian camels and bactrian-dromedary hybrids. Central Asian Survey 289:303.

Prager EM, Wilson AC (1975) Slow evolutionary loss of the potential for interspecific hybridization in birds: a manifestation of slow regulatory evolution. Proc. Natl. Acad. Sci. U.S.A. 72:200-204.

Promislow DEL, Harvey PH (1990) Living fast and dying young: a comparative analysis of life-history variation among mammals. J. Zool. 220:417-437.

Raj I, Al Hosseini HS, Dioguardi E, Nishimura K, Han L, Villa A, de Sanctis D, Jovine L (2017) Structural basis of egg coat-sperm recognition at fertilization. Cell 169:1315-1326.

Rambaut A (2018) FigTree v. 1.4.4: a graphical viewer of phylogenetic trees.

Ronquist F, Teslenko M, van der Mark P, Ayers DL, Darling A, Hohna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61:539-542.

Rose KD, Archibald JD (2005) The Rise of Placental Mammals: Origins and Relationships of the Major Extant Clades. JHU Press. p. 65.

Rose KD (2006) The beginning of the age of mammals. Baltimore: JHU Press. pp. 242-243.

Russell RJ (1968) Evolution and classification of the pocket gophers of the subfamily Geomyinae. Occasional Papers, Museum of Natural History, University of Kansas 16:473-579.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES (2006) Positive natural selection in the human lineage. Science 312:1614-1620.

Scally M, Madsen O, Douady CJ, de Jong WW, Stanhope MJ, Springer MS (2001) Molecular evidence for the major clades of placental mammals. J. Mamm. Evol. 8:239-277.

Schmidt TR, Wildman DE, Uddin M, Opazo JC, Goodman M, Grossman LI (2005) Rapid electrostatic evolution at the binding site for cytochrome c on cytochrome-c oxidase in anthropoid primates. Proc. Natl. Acad. Sci. U.S.A 102:6379-6384.

Scornavacca C, Zickmann F, Huson DH (2011) Tanglegrams for rooted phylogenetic trees and networks. Bioinf. 27:i248-i256.

Seiffert ER, Guillon JM (2007) A new estimate of afrotherian phylogeny based on simultaneous analysis of genomic, morphological, and fossil evidence. BMC Evol. Biol. 7:13.

Shapiro A, Porter A (1989) The lock and key hypothesis: evolutionary and biosystematics interpretation of insect genitalia. Annu. Rev. Ent. 34:231-245.

Shimodaira H (2002) An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51:492-508.

Shoshani J (1986) Mammalian phylogeny: comparison of morphological and molecular results. Mol. Biol. Evol. 3:222-242.

Simmons LW (2005) The evolution of polyandry: sperm competition, sperm selection, and offspring viability. Annu. Rev. Ecol. Evol. Syst. 36:125-146.

Simmons NB, Seymour KL, Habersetzer J, Gunnell GF (2008) Primitive Early Eocene bat from Wyoming and the evolution of flight and echolocation. Nature 451:818-821.

Simpson GG (1945) The principles of classification and a classification of mammals. Bull. Amer. Mus. Nat. Hist. 85:xvi+350.

Socolich M, Lockless SW, Russ WP, Lee H, Gardner KH, Ranganathan R (2005) Evolutionary information for specifying a protein fold. Nature 437.7058:512-518.

Song S, Liu L, Edwards SV, Wu S (2012) Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc. Natl Acad. Sci. U.S.A 109:14942-14947.

Springer MS, Cleven GC, Madsen O, de Jong WW, Waddell VG, Amrine HM, Stanhope MJ (1997) Endemic African mammals shake the phylogenetic tree. Nature 388:61-64.

Springer MS, DeBry RW, Douady C, Amrine HM, Madsen O, de Jong WW, Stanhope MJ (2001) Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Mol. Biol. Evol. 18:132-143.

Springer MS, Murphy WJ, Eizirik E, O’Brien SJ (2003) Placental mammal diversification and the Cretaceous-Tertiary boundary. Proc. Natl. Acad. Sci. U.S.A. 100:1056-1061.

Springer MS, Stanhope MJ, Madsen O, de Jong WW (2004) Molecules consolidate the placental mammal tree. Trends Ecol Evol 19:430-438.

Springer MS, Murphy WJ, Eizirik E, Madsen O, Scally M, Douady CJ, Teeling EC, Stanhope MJ, de Jong WW, O’Brien SJ (2007) A molecular classification for the living orders of placental mammals and the phylogenetic placement of primates. Origins: Adaptations and Evolution. Springer, Boston, MA. pp. 1-28.

Springer MS, Meredith RW, Teeling EC, Murphy WJ (2013) Technical comment on ‘The placental mammal ancestor and the post-KPg radiation of placentals’. Science 341:613.

Springer MS, Gatesy J (2016) The gene tree delusion. Mol. Phylogenet. Evol. 94:1-33.

Stadler T, Bokma F (2013) Estimating speciation and extinction rates for phylogenies of higher taxa. Syst. Biol. 62:220-230.

Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinf. 30.9:1312.

Stanhope MJ, Waddell VG, Madsen O, de Jong WW, Hedges SB, Cleven GC, Kao D, Springer MS (1998) Molecular evidence for multiple origins of Insectivora and for a new order of endemic African insectivore mammals. Proc. Natl. Acad. Sci. U.S.A. 95:9967-9972.

Stanley SM (1975) A theory of evolution above the species level. Proc. Natl. Acad. Sci. U.S.A. 72:646-650.


Swanson WJ, Vacquier VD (1998) Concerted evolution in an egg receptor for a rapidly evolving abalone sperm protein. Science 281:710-712.

Swanson WJ, Vacquier VD (2002a) The rapid evolution of reproductive proteins. Nat. Rev. Genet. 3:137-144.

Swanson WJ, Vacquier VD (2002b) Reproductive protein evolution. Annu. Rev. Ecol. Syst. 33.1:161-179.

Swanson WJ, Nielsen R, Yang Q (2003) Pervasive adaptive evolution in mammalian fertilization proteins. Mol. Biol. Evol. 20:18-20.

Swofford DL (2003) PAUP*: phylogenetic analysis using parsimony, version 4.0 b10.

Tardif S, Wilson MD, Wagner R, Hunt P, Gertsenstein M, Nagy A, Lobe C, Koop BF, Hardy DM (2010a) Zonadhesin is essential for species specificity of sperm adhesion to the egg zona pellucida. J. Biol. Chem. 285:24863-24870.

Tardif S, Brady JA, Breazeale KR, Bi M, Thompson LD, Bruemmer JE, Bailey LB, Hardy DM (2010b) Zonadhesin D3-polypeptides vary among species but are similar in Equus species capable of interbreeding. Biol. Reprod. 82:413-421.

Tarver JE, dos Reis M, Mirarab S, Moran RJ, Parker S, O'Reilly JE, King BJ, Asher RJ, Warnow T, Peterson KJ, Donoghue PCJ, Pisani D (2016) The interrelationships of placental mammals and the limits of phylogenetic inference. Genome Biol. Evol. 8:330-344.

Tautz D, Domazet-Lošo T (2011) The evolutionary origin of orphan genes. Nature Rev. Gen. 12.10:692-702.

Tawfik, OKDS (2010) Enzyme promiscuity: a mechanistic and evolutionary perspective. Annu. Rev. Biochem. 79:471-505.

Teeling EC, Springer MS, Madsen O, Bates P, O’Brien SJ, Murphy WJ (2005) A molecular phylogeny for bats illuminates biogeography and the fossil record. Science 307:580-584.

Teeling EC, Hedges SB (2013) Making the impossible possible: rooting the tree of placental mammals. Mol. Biol. Evol. 30:1999-2000.

Thompson C (2013) Implications of hybridization between the Rio Grande ground squirrel (Ictidomys parvidens) and the thirteen-lined ground squirrel (I. tridecemlineatus), Doctoral dissertation, Texas Tech University.


Thompson C, Stangl Jr. F, Bradley RD (2015) Ancient hybridization and subsequent mitochondrial capture in ground squirrels (Genus Ictidomys). Occasional Papers, Museum of Texas Tech University. 331:1-24.

Tsagkogeorga G, Parker J, Stupka E, Cotton JA, Rossiter SJ (2013) Phylogenomic analyses elucidate the evolutionary relationships of bats. Curr. Biol. 23:2262-2267.

Tung KSK, Harakal J, Qiao J, Rival C, Li JCH, Paul AGA, Wheeler K, Pramoonjago P, Grafer CM, Sun W, Sampson RD, Wong EWP, Reddi PP, Deshmukh US, Hardy DM, Tang H, Cheng CY, Goldberg E (2017) Egress of sperm autoantigen from seminiferous tubules maintains systemic tolerance. J. Clin. Invest. 127:1046-1060.

Turelli M, Orr HA (2000) Dominance, epistasis, and the genetics of postzygotic isolation. Genetics 154:1663-1679.

Turner LM, Hoekstra HE (2006) Adaptive evolution of fertilization proteins within a genus: variation in ZP2 and ZP3 in deer mice (Peromyscus). Mol. Biol. Evol. 23:1656-1669.

Turner LM, Chuong EB, Hoekstra HE (2008) Comparative analysis of testis protein evolution in rodents. Genetics 179:2075-2089.

Ursing BM, Arnason U (1998) Analyses of mitochondrial genomes strongly support a hippopotamus-whale clade. Proc. R. Soc. B. 265:2251-2255.

Van den Bussche RA, Hoofer SR (2004) Phylogenetic relationships among recent chiropteran families and the importance of choosing appropriate outgroup taxa. J Mamm. 85:321-330.

Venkatachalam B, Gusfield D (2018) Generalizing tanglegrams. doi:https://scholarship.org/uc/item/0bg5p8ch

Verma R, Pandit SB (2019) Unraveling the structural landscape of intra-chain domain interfaces: Implication in the evolution of domain-domain interactions. PLoS ONE 14.8:e0220336.

Via S (2009) Natural selection in action during speciation. Proc. Natl. Acad. Sci. U.S.A. 106:9939-9946.

Voordeckers K, Brown CA, Vanneste K, van der Zande E, Voet A, Maere S, Verstrepen KJ (2012) Reconstruction of ancestral metabolic enzymes reveals molecular mechanisms underlying evolutionary innovation through gene duplication. PLoS Biol. 10.12: e1001446.


Waddell PJ, Okada N, Hasegawa M (1999a) Towards resolving the interordinal relationships of placental mammals. Syst. Biol. 48:1-5.

Waddell PJ, Cao Y, Hauf J, Hasegawa M (1999b) Using novel phylogenetic methods to evaluate mammalian mtDNA, including amino acid-invariant sites-LogDet plus site stripping, to detect internal conflicts in the data, with special reference to the positions of hedgehog, armadillo, and elephant. Syst. Biol. 48:31-53.

Wade GN, Schneider JE (1992) Metabolic fuels and reproduction in female mammals. Neurosci. Biobehav. Rev. 16:235-272.

Wassarman PM (2008) Zona pellucida glycoproteins. J. Biol. Chem. 283:24285-24289.

Webb CO, Donoghue MJ (2005) Phylomatic: tree assembly for applied phylogenetics. Mol. Ecol. Notes 5:181-183.

Weir BJ, Rowlands IW (1973) Reproductive strategies of mammals. Annu. Rev. Ecol. Syst. 4:139-163.

Wheeler K, Tardif S, Rival C, Luu B, Bui E, del Rio R, Teuscher C, Sparwasser T, Hardy D, Tung KSK (2011) Regulatory T cells control tolerogenic versus autoimmune response to sperm in vasectomy. Proc. Natl. Acad. Sci. U.S.A. 108:7511-7516.

Wilburn DB, Swanson WJ (2016) From molecules to mating: rapid evolution and biochemical studies of reproductive proteins. J. Prot. 135:12-25.

Wilburn DB, Tuttle LM, Klevit RE, Swanson WJ (2018) Solution structure of sperm lysin yields novel insights into molecular dynamics of rapid protein evolution. Proc. Natl. Acad. Sci. U.S.A. 115:1310-1315.

Wildman DE, Uddin M, Opazo JC, Liu G, Lefort V, Guindon S, Gascuel O, Grossman LI, Romero R, Goodman M (2007) Genomics, biogeography, and the diversification of placental mammals. Proc. Natl. Acad. Sci. U.S.A. 104:14395-14400.

Wilson AC, Maxson LR, Sarich VM (1974) Two types of molecular evolution: evidence from studies of interspecific hybridization. Proc. Natl. Acad. Sci. U.S.A. 71:2843-2847.

Wilson MD, Riemer C, Martindale DW, Schnupf P, Boright AP, Cheung TL, Hardy DM, Schwartz S, Scherer SW, Tsui L-C, Miller W, Koop BF (2001) Comparative analysis of the gene-dense ACHE/TFR2 region on human chromosome 7q22 with the orthologous region on mouse chromosome 5. Nucleic Acids Res. 29:1352-1365.


Wishart WD, Hrudka F, Schmutz SM, Flood PF (1988) Observations on spermatogenesis, sperm phenotype, and fertility in white-tailed × mule deer hybrids and a yak × cow hybrid. Can. J. Zool. 66:1664-1671.

Wu C-I, Palopoli MF (1994) Genetics of postmating reproductive isolation in animals. Annu. Rev. Genet. 28:283-308.

Wyckoff GJ, Wang W, Wu C-I (2000) Rapid evolution of male reproductive genes in the descent of man. Nature 403:304-309.

Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24:1586-1591.

Yue P, Li Z, Moult J (2005) Loss of protein structure stability as a major causative factor in monogenic disease. J. Mol. Biol. 353:459-473.

Zhou X, Xu S, Xu J, Chen B, Zhou K, Yang G (2012) Phylogenomic analysis resolves the interordinal relationships and rapid diversification of the Laurasiatherian mammals. Syst. Biol. 61:150-164.

Zigler K, McCartney M, Levitan D, Lessios HA (2005) Sea urchin bindin divergence predicts gamete compatibility. Evolution 59:2399-2404.

Zou Z, Zhang J (2016) Morphological and molecular convergences in mammalian phylogenetics. Nat. Commun. 7:12758.


APPENDIX A

Table A1. Gene sequence source IDs or accession numbers for Zan, Tecta, and Cytb, listed in the order placental mammals, marsupials, monotremes, non-mammals. NS = no sequence (gene sequence not available).

Common name | Species | Zan | Tecta | Cytb
Aardvark | Orycteropus afer afer | XM_007955893.1 | XM_007955906.1 | NC_002078.1
African savannah elephant | Loxodonta africana | XM_023541738.1 | XM_003418199.2 | NC_000934.1
Alpaca | Vicugna pacos | XM_015251946.1 | XM_015242128.1 | NC_002504.1
Alpine marmot | Marmota marmota marmota | XM_015490707.1 | XM_015490036.1 | NS
American beaver | Castor canadensis | XM_020182103.1 | NS | NS
Amur tiger | Panthera tigris altaica | XM_015538605.1 | XM_007091209.1 | NC_010642.1
Angola colobus | Colobus angolensis palliatus | XM_011943959.1 | XM_011926983.1 | AF295583.1
Arabian camel | Camelus dromedarius | XM_010979462.1 | XM_010990120.1 | NC_009849.1
Armadillo | Dasypus novemcinctus | XM_023588394.1 | XM_012525555.2 | NC_001821.1
Atlantic walrus | Odobenus rosmarus rosmarus | NS | NS | NC_004029.2
Bactrian camel | Camelus bactrianus | XM_010953390.1 | XM_010967243.1 | NC_009628.2
Beluga whale | Delphinapterus leucas | XM_022587956.1 | XM_022560259.1 | NC_034236.1
Big brown bat | Eptesicus fuscus | XM_028145734.1 | NS | AF376835.1
Bison | Bos bison bison | XM_010837929.1 | XM_010830495.1 | NC_012346.1
Black flying fox | Pteropus alecto | XM_006918670.1 | XM_006925244.1 | NC_023122.1
Black snub-nosed monkey | Rhinopithecus bieti | XM_017885956.1 | XM_017865564.1 | NC_015486.1
Bolivian squirrel monkey | Saimiri boliviensis boliviensis | XM_010344850.1 | XM_003923586.2 | NC_018096.1
Bottlenose dolphin | Tursiops truncatus | ENSTTRT00000011082.1 | XM_019941744.1 | NC_012059.1
Brandt's bat | Myotis brandtii | XM_014550115.1 | XM_005857110.2 | NC_025308.1
Cape elephant shrew | Elephantulus edwardii | NS | XM_006895633.1 | NC_041486.1
Cape golden mole | Chrysochloris asiatica | XM_006859819.1 | XM_006833923.1 | NC_004920.1
Cape rock hyrax | Procavia capensis | ENSPCAT00000005892.1 | ENSPCAT00000007546.1 | NC_004919.1
Cheetah | Acinonyx jubatus | XM_027042102.1 | XM_027036760.1 | NC_005212.1
Chimpanzee | Pan troglodytes | XM_024357974.1 | XM_001167042.4 | NC_001643.1
Chinese hamster | Cricetulus griseus | XM_027442291.1 | XM_016980130.2 | NC_007936.1
Chinese rufous horseshoe bat | Rhinolophus sinicus | XM_019752697.1 | XM_019718967.1 | NS
Chinese treeshrew | Tupaia chinensis | XM_027774660.1 | XM_006148855.1 | NS
Common squirrel monkey | Saimiri sciureus | AY428855.1 | NS | NC_012775.1
Common vampire bat | Desmodus rotundus | NS | XM_024557468.1 | NS
Coquerel sifaka | Propithecus coquereli | XM_012638882.1 | ENSPCOT00000019406.1 | NC_011053.1
Cotton top tamarin | Saguinus oedipus | AY428857.1 | NS | HM368007.1
Cattle | Bos taurus | XM_024985146.1 | XM_015474707.2 | NC_006853.1
Crab-eating macaque | Macaca fascicularis | XM_015447172.1 | XM_015436019.1 | NC_012670.1
Damara mole rat | Fukomys damarensis | XM_019205267.1 | XM_010629239.1 | NC_027742.1
David's myotis | Myotis davidii | XM_015564568.1 | XM_006763686.2 | NC_025568.1
Degu | Octodon degus | XM_023714561.1 | XM_006763686.2 | NC_020661.1
Dingo | Canis lupus dingo | XM_025425976.1 | XM_025465198.1 | MH035676.1
Domestic cat | Felis catus | XM_023246321.1 | XM_023239201.1 | NC_001700.1
Domestic dog | Canis lupus familiaris | XM_022420661.1 | XM_022418187.1 | NC_002008.4
Domestic ferret | Mustela putorius furo | XM_013061049.1 | XM_013054972.1 | NC_020638.1
Donkey | Equus asinus | XM_014854014.1 | XM_014858838.1 | NC_001788.1
Drill | Mandrillus leucophaeus | XM_011985821.1 | XM_011964981.1 | NC_028442.1
Egyptian rousette | Rousettus aegyptiacus | XM_016124823.1 | XM_016161282.1 | NC_007393.1
European shrew | Sorex araneus | XM_012934679.1 | XM_004605093.1 | NC_027963.1
Florida manatee | Trichechus manatus latirostris | XM_023740979.1 | XM_004385610.2 | NS
Gelada | Theropithecus gelada | XM_025379755.1 | XM_025357413.1 | NC_019802.1
Giant panda | Ailuropoda melanoleuca | XM_019807987.1 | XM_002914077.3 | NC_009492.1
Goat | Capra hircus | XM_018039999.1 | XM_018059641.1 | NC_005044.2
Golden hamster | Mesocricetus auratus | XM_013120772.2 | XM_013112828.1 | NC_013276.1
Golden snub-nosed monkey | Rhinopithecus roxellana | XM_010377736.1 | XM_010357007.1 | NC_008218.1
Gorilla | Gorilla gorilla gorilla | XM_019030456.1 | XM_019035733.1 | NC_011120.1
Gray mouse lemur | Microcebus murinus | XM_020281679.1 | XM_020286855.1 | ENSMICT00000053028.1
Great round leaf bat | Hipposideros armiger | XM_019629886.1 | XM_019666165.1 | NC_018540.1
Greater horseshoe bat | Rhinolophus ferrumequinum | NS | NS | KP063146.1
Green monkey | Chlorocebus sabaeus | XM_008018516.1 | XM_008021233.1 | NC_008066.1
Guinea pig | Cavia porcellus | XM_023565129.1 | XM_013155428.2 | NC_000884.1
Hamadryas baboon | Papio hamadryas | AY428853.1 | XM_017949012.2 | NC_001992.1
Hawaiian monk seal | Neomonachus schauinslandi | XM_021680028.1 | XM_021696471.1 | NC_008421.1
Himalayan marmot | Marmota himalayana | NS | NS | NC_018367.1
Horse | Equus caballus | XM_023655363.1 | XM_023645009.1 | NC_001640.1
House mouse | Mus musculus | NM_011741.2 | NM_001324548.1 | NC_010339.1
Human | Homo sapiens | NM_003386.3 | NM_005422.2 | NC_012920.1
Killer whale | Orcinus orca | XM_012532585.1 | XM_004280685.1 | NC_023889.1
Korean greater horseshoe bat | Rhinolophus ferrumequinum | NS | NS | NC_016191.1
Large flying fox | Pteropus vampyrus | XM_011371687.2 | XM_011383588.1 | NC_026542.1
Lesser Egyptian jerboa | Jaculus jaculus | XM_012950591.1 | XM_004667352.1 | NC_005314.1
Lesser hedgehog tenrec | Echinops telfairi | XM_013005430.1 | XM_004716269.1 | NC_002631.2
Little brown bat | Myotis lucifugus | XM_014453436.1 | XM_023744715.1 | NC_029849.1
Long-tailed chinchilla | Chinchilla lanigera | XM_013509448.1 | XM_005378359.1 | NC_021386.1
Malayan pangolin | Manis javanica | XM_017679895.1 | XM_017679642.1 | NC_026781.1
Ma's night monkey | Aotus nancymaae | XM_021671603.1 | XM_012470273.1 | NC_018116.1
Minke whale | Balaenoptera acutorostrata | XM_007187024.1 | XM_007196694.1 | NC_005271.1
Mongolian gerbil | Meriones unguiculatus | XM_021650706.1 | XM_021655443.1 | NC_023263.1
Natal long-fingered bat | Miniopterus natalensis | XM_016217684.1 | XM_016209090.1 | AJ841977.1
Neanderthal | Homo neanderthalensis | ENST00000356510 | ENST00000264037 | NC_011137.1
Northern fur seal | Callorhinus ursinus | XM_025864597.1 | XM_025893827.1 | NC_008415.3
Northern treeshrew | Tupaia belangeri | NS | NS | NC_002521.1
Northern white-cheeked gibbon | Nomascus leucogenys | XM_012496204.1 | XM_003253294.2 | NC_021957.1
Norway rat | Rattus norvegicus | XM_017598593.1 | XM_008766130.2 | NC_001665.2
Olive baboon | Papio anubis | XM_021936505.1 | XM_017949012.2 | NC_020006.2
Ord's kangaroo rat | Dipodomys ordii | XM_013031714.1 | XM_013009594.1 | AF173501.1
Pacific walrus | Odobenus rosmarus divergens | XM_012561877.1 | XM_004417225.2 | NS
Philippine tarsier | Tarsius syrichta | XM_008053323.1 | XM_008054274.2 | NC_012774.1
Pig | Sus scrofa | NM_214383.1 | ENSSSCT00000032235.2 | NC_000845.1
Pig-tailed macaque | Macaca nemestrina | XM_024797634.1 | XM_011732186.1 | NC_026976.1
Pika | Ochotona princeps | XM_012927479.1 | XM_004584986.1 | NC_005358.1
Polar bear | Ursus maritimus | XM_008698435.1 | XM_008690174.1 | NC_003428.1
Prairie deermouse | Peromyscus maniculatus bairdii | XM_016002906.1 | XM_006981237.2 | NS
Prairie vole | Microtus ochrogaster | XM_005371467.1 | XM_005347143.1 | NC_027945.1
Przewalski's horse | Equus przewalskii | XM_008527352.1 | XM_008513722.1 | NC_024030.1
Pygmy chimpanzee | Pan paniscus | XM_008965459.1 | XM_008973393.1 | NC_001644.1
Rabbit | Oryctolagus cuniculus | XM_008257988.2 | XM_017347720.1 | NC_001913.1
Red fox | Vulpes vulpes | XM_026019606.1 | XM_025999036.1 | DQ498126.1
Rhesus macaque | Macaca mulatta | XM_015134488.2 | XM_015115973.1 | NC_005943.1
Ryukyu mouse | Mus caroli | XM_021163424.1 | XM_021172355.1 | NC_025268.1
Sea otter | Enhydra lutris kenyoni | XM_022500393.1 | NS | AF057120.1
Sheep | Ovis aries | XM_027961542.1 | XM_015100775.1 | NC_001941.1
Shrew mouse | Mus pahari | XM_029533878.1 | XM_021206960.1 | NC_036680.1
Small-eared galago | Otolemur garnettii | XM_023518621.1 | XM_023517294.1 | AY441466.1
Sooty mangabey | Cercocebus atys | XM_012057317.1 | XM_012053453.1 | NC_028592.1
Southern white rhinoceros | Ceratotherium simum simum | XM_014797076.1 | XM_004427198.2 | NC_001808.1
Sperm whale | Physeter catodon | XM_028498046.1 | XM_007116373.2 | NC_002503.2
Star-nosed mole | Condylura cristata | XM_012731625.1 | XM_012729955.1 | NC_029762.1
Sumatran orangutan | Pongo abelii | XM_024250049.1 | XM_002822597.2 | NC_002083.1
Sunda flying lemur | Galeopterus variegatus | XM_008580725.1 | XM_008578339.1 | NC_004031.1
Texas white-tailed deer | Odocoileus virginianus texanus | XM_020875960.1 | XM_020891407.1 | NS
Thick-tailed bushbaby | Otolemur crassicaudatus | NS | NS | NC_012762.1
Thirteen-lined ground squirrel | Ictidomys tridecemlineatus | XM_021728560.1 | XM_005328387.1 | NC_027278.1
Tibetan antelope | Pantholops hodgsonii | XM_005956236.1 | XM_005968188.1 | NC_007441.1
Two-toed sloth | Choloepus hoffmanni | NS | ENSCHOT00000012139.1 | KR336793.1
Ugandan red colobus | Piliocolobus tephrosceles | XM_026456664.1 | XM_023208479.1 | NS
Upper Galilee Mountains blind mole rat | Nannospalax galili | ENSNGAG00000010261 | XM_008826017.2 | NC_020754.1
Water buffalo | Bubalus bubalis | XM_025275454.1 | XM_006046458.1 | NC_006295.1
Weddell seal | Leptonychotes weddellii | XM_006733759.1 | XM_006734324.1 | NC_008424.1
West Indian manatee | Trichechus manatus | NS | NS | JF489120.1
Western European hedgehog | Erinaceus europaeus | XM_016193282.1 | XM_007523569.2 | NC_002080.2
Western red colobus | Piliocolobus badius | NS | NS | NC_008219.1
White-footed deermouse | Peromyscus leucopus | NS | NS | NC_037180.1
White-fronted capuchin | Cebus capucinus imitator | XM_017533382.1 | XM_017523255.1 | NC_002763.1
White-tailed deer | Odocoileus virginianus | NS | NS | NC_015247.1
White-tufted-ear marmoset | Callithrix jacchus | XM_008982413.2 | XM_017978437.1 | NC_025586.1
Wild Bactrian camel | Camelus ferus | XM_014561991.1 | XM_006180274.2 | NC_009629.2
Yak | Bos mutus | XM_014483333.1 | XM_005897103.1 | NC_025563.1
Yangtze River dolphin | Lipotes vexillifer | XM_007468897.1 | XM_007450613.1 | NC_007629.1
Zebu | Bos indicus | XM_019987175.1 | XM_019974194.1 | NC_005971.1
Banded hare-wallaby | Lagostrophus fasciatus | NS | NS | NC_008447.1
Gray short-tailed opossum | Monodelphis domestica | NS | XM_001370253.3 | NC_006299.1
Koala | Phascolarctos cinereus | NS | XM_020984871.1 | NC_008133.1
Tammar wallaby | Macropus eugenii | NS | ENSMEUG00000016001 | NS
Tasmanian devil | Sarcophilus harrisii | NS | XM_023499402.1 | NC_018788.1
Duck-billed platypus | Ornithorhynchus anatinus | NS | XM_007668019.1 | ENSOANT00000028483.1
Chinese soft-shelled turtle | Pelodiscus sinensis | XM_025189431.1 | XM_025189890.1 | NC_006132.1
Coelacanth | Latimeria chalumnae | XM_014490663.1 | XM_014494885.1 | U82228.1
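Table A1 mixes identifiers from several databases: RefSeq accessions (NM_, XM_, and NC_ prefixes), Ensembl stable IDs (ENS…), and INSDC/GenBank accessions such as AY428855.1. Before sequences are re-retrieved, it can help to sort the IDs by source. The Python sketch below does this with regular expressions. It is not a procedure described in this study; the prefix-to-source mapping reflects common RefSeq, Ensembl, and INSDC conventions and is assumed here, and the function names `classify_id` and `tally` are hypothetical.

```python
import re
from collections import Counter

# Assumed identifier conventions (RefSeq, Ensembl, INSDC); check each
# database's own documentation before relying on this for retrieval.
_RULES = [
    (re.compile(r"^NM_\d+(\.\d+)?$"), "RefSeq curated mRNA"),
    (re.compile(r"^XM_\d+(\.\d+)?$"), "RefSeq predicted mRNA"),
    (re.compile(r"^NC_\d+(\.\d+)?$"), "RefSeq complete genomic molecule"),
    # Ensembl stable IDs: "ENS" + optional species code + type letter + 11 digits
    (re.compile(r"^ENS[A-Z]*T\d{11}(\.\d+)?$"), "Ensembl transcript"),
    (re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$"), "Ensembl gene"),
    # INSDC nucleotide accessions: 1 letter + 5 digits or 2 letters + 6 digits
    (re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6})(\.\d+)?$"), "INSDC nucleotide"),
]


def classify_id(identifier: str) -> str:
    """Return the assumed source database for one Table A1 entry."""
    identifier = identifier.strip()
    if identifier == "NS":  # table code for "no sequence"
        return "no sequence"
    for pattern, label in _RULES:
        if pattern.match(identifier):
            return label
    return "unrecognized"


def tally(ids):
    """Count identifiers per assumed source category."""
    return Counter(classify_id(i) for i in ids)


if __name__ == "__main__":
    # A few entries copied from Table A1.
    sample = ["XM_007955893.1", "NM_003386.3", "NC_002078.1",
              "ENSTTRT00000011082.1", "ENSNGAG00000010261",
              "AY428855.1", "U82228.1", "NS"]
    for s in sample:
        print(s, "->", classify_id(s))
    print(tally(sample))
```

Applied to the full Zan, Tecta, and Cytb columns, the tally would give per-source counts; any identifier labeled "unrecognized" would need checking by hand.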


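APPENDIX B

Table B1. PAML summary results. Shown are log-likelihood scores (LH), parameter estimates, and number of parameters (np) used for Akaike Information Criterion (AICc) comparisons of variable dN/dS models under the F3×4 codon frequency model (M0, null; M1, neutral; M2, selection; M7, beta; M8, beta and ω > 1). ω = dN/dS; f = proportion of sites in an ω class; p and q = parameters of the beta distribution; p0 and p1 = proportions of sites in the beta and ω > 1 components of M8. For M7 and M8, each discrete site class is listed as ω (f).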

Group | Model | Parameter estimates | LH | np

I | M0 | ω0 = 0.902 | -1962.50 | 23
I | M1 | ω0 = 0.128, f0 = 0.363; ω1 = 1.000, f1 = 0.637 | -1941.42 | 24
I | M2 | ω0 = 0.139, f0 = 0.319; ω1 = 1.000, f1 = 0.517; ω2 = 3.071, f2 = 0.164 | -1932.21 | 26
I | M7 | p = 0.059, q = 0.027; ω (f) = 0.000 (0.200), 0.285 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200) | -1942.34 | 24
I | M8 | p0 = 0.807, p = 0.399, q = 0.242; p1 = 0.193, ω = 2.854; ω (f) = 0.027 (0.161), 0.343 (0.161), 0.772 (0.161), 0.969 (0.161), 0.999 (0.161), 2.854 (0.193) | -1932.27 | 26

II | M0 | ω0 = 1.026 | -1697.95 | 23
II | M1 | ω0 = 0.076, f0 = 0.301; ω1 = 1.000, f1 = 0.698 | -1684.90 | 24
II | M2 | ω0 = 0.073, f0 = 0.250; ω1 = 1.000, f1 = 0.587; ω2 = 3.242, f2 = 0.163 | -1676.33 | 26
II | M7 | p = 0.077, q = 0.013; ω (f) = 0.008 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1685.61 | 24
II | M8 | p0 = 0.814, p = 0.063, q = 0.028; p1 = 0.186, ω = 3.058; ω (f) = 0.000 (0.163), 0.436 (0.163), 0.999 (0.163), 1.000 (0.163), 1.000 (0.163), 3.058 (0.186) | -1676.42 | 26

III | M0 | ω0 = 0.988 | -2287.47 | 21
III | M1 | ω0 = 0.164, f0 = 0.113; ω1 = 1.000, f1 = 0.887 | -2283.06 | 22
III | M2 | ω0 = 0.054, f0 = 0.060; ω1 = 1.000, f1 = 0.843; ω2 = 3.092, f2 = 0.097 | -2279.38 | 24
III | M7 | p = 0.165, q = 0.022; ω (f) = 0.287 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -2283.33 | 22
III | M8 | p0 = 0.896, p = 0.188, q = 0.025; p1 = 0.104, ω = 2.781; ω (f) = 0.322 (0.179), 0.999 (0.179), 1.000 (0.179), 1.000 (0.179), 1.000 (0.179), 2.781 (0.104) | -2279.97 | 24

IV | M0 | ω0 = 1.163 | -1077.54 | 9
IV | M1 | ω0 = 0.000, f0 = 0.193; ω1 = 1.000, f1 = 0.807 | -1075.08 | 10
IV | M2 | ω0 = 0.000, f0 = 0.247; ω1 = 1.000, f1 = 0.268; ω2 = 2.200, f2 = 0.485 | -1071.66 | 12
IV | M7 | p = 0.503, q = 0.005; ω (f) = 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1077.82 | 10
IV | M8 | p0 = 0.462, p = 0.036, q = 0.041; p1 = 0.538, ω = 2.114; ω (f) = 0.000 (0.092), 0.000 (0.092), 0.130 (0.092), 0.999 (0.092), 1.000 (0.092), 2.114 (0.538) | -1071.66 | 12

V | M0 | ω0 = 0.858 | -1481.37 | 21
V | M1 | ω0 = 0.070, f0 = 0.352; ω1 = 1.000, f1 = 0.648 | -1467.74 | 22
V | M2 | ω0 = 0.070, f0 = 0.316; ω1 = 1.000, f1 = 0.579; ω2 = 3.550, f2 = 0.105 | -1462.42 | 24
V | M7 | p = 0.057, q = 0.027; ω (f) = 0.000 (0.200), 0.220 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200) | -1467.96 | 22
V | M8 | p0 = 0.886, p = 0.068, q = 0.032; p1 = 0.114, ω = 3.392; ω (f) = 0.000 (0.177), 0.254 (0.177), 0.999 (0.177), 1.000 (0.177), 1.000 (0.177), 3.392 (0.089) | -1462.45 | 24

VI | M0 | ω0 = 0.854 | -1633.38 | 21
VI | M1 | ω0 = 0.025, f0 = 0.253; ω1 = 1.000, f1 = 0.747 | -1623.27 | 22
VI | M2 | ω0 = 0.020, f0 = 0.238; ω1 = 1.000, f1 = 0.699; ω2 = 3.529, f2 = 0.062 | -1621.50 | 24
VI | M7 | p = 0.023, q = 0.005; ω (f) = 0.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1623.51 | 22
VI | M8 | p0 = 0.376, p = 16.882, q = 99.00; p1 = 0.624, ω = 1.387; ω (f) = 0.105 (0.075), 0.127 (0.075), 0.144 (0.075), 0.161 (0.075), 0.189 (0.075), 1.387 (0.624) | -1621.89 | 24

VII | M0 | ω0 = 1.013 | -1715.76 | 23
VII | M1 | ω0 = 0.076, f0 = 0.306; ω1 = 1.000, f1 = 0.694 | -1700.82 | 24
VII | M2 | ω0 = 0.213, f0 = 0.455; ω1 = 1.000, f1 = 0.000; ω2 = 1.924, f2 = 0.545 | -1693.70 | 26
VII | M7 | p = 0.094, q = 0.016; ω (f) = 0.020 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1701.79 | 24
VII | M8 | p0 = 0.456, p = 27.329, q = 99.00; p1 = 0.544, ω = 1.927; ω (f) = 0.171 (0.215), 0.196 (0.215), 0.215 (0.215), 0.234 (0.215), 0.264 (0.215), 1.927 (0.544) | -1693.72 | 26

VIII | M0 | ω0 = 0.847 | -1506.31 | 21
VIII | M1 | ω0 = 0.016, f0 = 0.222; ω1 = 1.000, f1 = 0.778 | -1499.37 | 22
VIII | M2 | ω0 = 0.135, f0 = 0.332; ω1 = 1.000, f1 = 0.000; ω2 = 1.280, f2 = 0.668 | -1498.70 | 24
VIII | M7 | p = 0.026, q = 0.006; ω (f) = 0.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1499.40 | 22
VIII | M8 | p0 = 0.332, p = 15.706, q = 99.00; p1 = 0.668, ω = 1.280; ω (f) = 0.097 (0.067), 0.119 (0.067), 0.135 (0.067), 0.152 (0.067), 0.179 (0.067), 1.280 (0.668) | -1498.70 | 24

IX | M0 | ω0 = 0.786 | -1287.92 | 17
IX | M1 | ω0 = 0.087, f0 = 0.366; ω1 = 1.000, f1 = 0.634 | -1280.09 | 18
IX | M2 | ω0 = 0.269, f0 = 0.606; ω1 = 1.000, f1 = 0.000; ω2 = 1.830, f2 = 0.394 | -1277.71 | 20
IX | M7 | p = 0.058, q = 0.028; ω (f) = 0.000 (0.200), 0.222 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200) | -1280.30 | 18
IX | M8 | p0 = 0.607, p = 36.930, q = 99.00; p1 = 0.393, ω = 1.833; ω (f) = 0.224 (0.121), 0.251 (0.121), 0.271 (0.121), 0.291 (0.121), 0.321 (0.121), 1.833 (0.393) | -1277.71 | 20

X | M0 | ω0 = 1.100 | -1552.53 | 19
X | M1 | ω0 = 0.111, f0 = 0.353; ω1 = 1.000, f1 = 0.647 | -1541.26 | 20
X | M2 | ω0 = 0.075, f0 = 0.230; ω1 = 1.000, f1 = 0.649; ω2 = 5.574, f2 = 0.121 | -1523.90 | 22
X | M7 | p = 0.023, q = 0.005; ω (f) = 0.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1542.36 | 20
X | M8 | p0 = 0.876, p = 0.163, q = 0.055; p1 = 0.124, ω = 5.553; ω (f) = 0.003 (0.175), 0.793 (0.175), 0.999 (0.175), 1.000 (0.175), 1.000 (0.175), 5.553 (0.124) | -1523.95 | 22

XI | M0 | ω0 = 0.721 | -1438.58 | 21
XI | M1 | ω0 = 0.047, f0 = 0.332; ω1 = 1.000, f1 = 0.668 | -1428.00 | 22
XI | M2 | ω0 = 0.105, f0 = 0.407; ω1 = 1.000, f1 = 0.000; ω2 = 1.245, f2 = 0.593 | -1427.48 | 24
XI | M7 | p = 0.059, q = 0.028; ω (f) = 0.000 (0.200), 0.213 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200) | -1428.24 | 22
XI | M8 | p0 = 0.999, p = 0.059, q = 0.028; p1 = 0.000, ω = 5.249; ω (f) = 0.000 (0.200), 0.213 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200), 5.249 (0.001) | -1428.24 | 24

XII | M0 | ω0 = 1.016 | -1514.42 | 19
XII | M1 | ω0 = 0.000, f0 = 0.228; ω1 = 1.000, f1 = 0.772 | -1502.97 | 20
XII | M2 | ω0 = 0.020, f0 = 0.254; ω1 = 1.000, f1 = 0.269; ω2 = 1.822, f2 = 0.477 | -1499.49 | 22
XII | M7 | p = 0.022, q = 0.005; ω (f) = 0.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1503.11 | 20
XII | M8 | p0 = 0.458, p = 0.045, q = 0.052; p1 = 0.542, ω = 1.750; ω (f) = 0.000 (0.092), 0.000 (0.092), 0.146 (0.092), 0.999 (0.092), 1.000 (0.092), 1.750 (0.542) | -1499.50 | 22

XIII | M0 | ω0 = 0.852 | -1590.44 | 21
XIII | M1 | ω0 = 0.015, f0 = 0.264; ω1 = 1.000, f1 = 0.736 | -1578.02 | 22
XIII | M2 | ω0 = 0.000, f0 = 0.235; ω1 = 1.000, f1 = 0.707; ω2 = 5.016, f2 = 0.058 | -1572.24 | 24
XIII | M7 | p = 0.023, q = 0.005; ω (f) = 0.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1578.53 | 22
XIII | M8 | p0 = 0.271, p = 0.005, q = 1.204; p1 = 0.729, ω = 1.260; ω (f) = 0.000 (0.054), 0.000 (0.054), 0.000 (0.054), 0.000 (0.054), 0.000 (0.054), 1.206 (0.729) | -1576.93 | 24

XIV | M0 | ω0 = 0.722 | -1984.72 | 27
XIV | M1 | ω0 = 0.000, f0 = 0.277; ω1 = 1.000, f1 = 0.723 | -1965.53 | 28
XIV | M2 | ω0 = 0.000, f0 = 0.272; ω1 = 1.000, f1 = 0.706; ω2 = 4.466, f2 = 0.021 | -1963.83 | 30
XIV | M7 | p = 0.066, q = 0.017; ω (f) = 0.001 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1966.79 | 28
XIV | M8 | p0 = 0.342, p = 0.022, q = 0.192; p1 = 0.658, ω = 1.118; ω (f) = 0.000 (0.068), 0.000 (0.068), 0.000 (0.068), 0.001 (0.068), 0.517 (0.068), 1.118 (0.658) | -1965.28 | 30

XV | M0 | ω0 = 0.917 | -1491.50 | 19
XV | M1 | ω0 = 0.050, f0 = 0.247; ω1 = 1.000, f1 = 0.753 | -1484.95 | 20
XV | M2 | ω0 = 0.246, f0 = 0.446; ω1 = 1.000, f1 = 0.000; ω2 = 1.578, f2 = 0.554 | -1482.94 | 22
XV | M7 | p = 0.051, q = 0.008; ω (f) = 0.002 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1485.05 | 20
XV | M8 | p0 = 0.447, p = 32.844, q = 99.00; p1 = 0.553, ω = 1.580; ω (f) = 0.089 (0.089), 0.228 (0.089), 0.248 (0.089), 0.268 (0.089), 0.298 (0.089), 1.580 (0.553) | -1482.94 | 22

XVI | M0 | ω0 = 0.857 | -1515.96 | 19
XVI | M1 | ω0 = 0.000, f0 = 0.294; ω1 = 1.000, f1 = 0.706 | -1502.13 | 20
XVI | M2 | ω0 = 0.027, f0 = 0.317; ω1 = 1.000, f1 = 0.425; ω2 = 1.860, f2 = 0.258 | -1500.26 | 22
XVI | M7 | p = 0.049, q = 0.024; ω (f) = 0.000 (0.200), 0.159 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1502.87 | 20
XVI | M8 | p0 = 0.668, p = 0.051, q = 0.051; p1 = 0.332, ω = 1.750; ω (f) = 0.000 (0.133), 0.000 (0.133), 0.483 (0.133), 0.999 (0.133), 1.000 (0.133), 1.750 (0.332) | -1500.23 | 22

XVII | M0 | ω0 = 0.770 | -1716.22 | 25
XVII | M1 | ω0 = 0.136, f0 = 0.457; ω1 = 1.000, f1 = 0.543 | -1698.81 | 26
XVII | M2 | ω0 = 0.317, f0 = 0.677; ω1 = 1.000, f1 = 0.000; ω2 = 2.092, f2 = 0.323 | -1692.81 | 28
XVII | M7 | p = 0.052, q = 0.024; ω (f) = 0.000 (0.200), 0.221 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1699.57 | 26
XVII | M8 | p0 = 0.684, p = 6.282, q = 12.818; p1 = 0.316, ω = 2.114; ω (f) = 0.197 (0.137), 0.268 (0.137), 0.323 (0.137), 0.381 (0.137), 0.469 (0.137), 2.114 (0.316) | -1692.80 | 28

XVIII | M0 | ω0 = 0.721 | -1037.12 | 11
XVIII | M1 | ω0 = 0.000, f0 = 0.353; ω1 = 1.000, f1 = 0.647 | -1030.92 | 12
XVIII | M2 | ω0 = 0.153, f0 = 0.478; ω1 = 1.000, f1 = 0.407; ω2 = 3.066, f2 = 0.115 | -1029.02 | 14
XVIII | M7 | p = 0.051, q = 0.026; ω (f) = 0.000 (0.200), 0.085 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200) | -1030.98 | 12
XVIII | M8 | p0 = 0.863, p = 0.436, q = 0.434; p1 = 0.137, ω = 2.915; ω (f) = 0.016 (0.173), 0.185 (0.173), 0.502 (0.173), 0.818 (0.173), 0.985 (0.173), 2.195 (0.137) | -1028.99 | 14

XIX | M0 | ω0 = 0.894 | -1655.22 | 19
XIX | M1 | ω0 = 0.009, f0 = 0.293; ω1 = 1.000, f1 = 0.707 | -1642.59 | 20
XIX | M2 | ω0 = 0.096, f0 = 0.395; ω1 = 1.000, f1 = 0.000; ω2 = 1.470, f2 = 0.605 | -1640.36 | 22
XIX | M7 | p = 0.024, q = 0.005; ω (f) = 0.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1643.70 | 20
XIX | M8 | p0 = 0.396, p = 10.721, q = 99.00; p1 = 0.604, ω = 1.471; ω (f) = 0.063 (0.079), 0.081 (0.079), 0.095 (0.079), 0.111 (0.079), 0.135 (0.079), 1.471 (0.604) | -1640.37 | 22

XX | M0 | ω0 = 0.831 | -1978.00 | 27
XX | M1 | ω0 = 0.050, f0 = 0.318; ω1 = 1.000, f1 = 0.682 | -1955.05 | 28
XX | M2 | ω0 = 0.037, f0 = 0.285; ω1 = 1.000, f1 = 0.602; ω2 = 3.114, f2 = 0.113 | -1949.55 | 30
XX | M7 | p = 0.056, q = 5.006; ω (f) = 0.000 (0.200), 0.238 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200) | -1955.54 | 28
XX | M8 | p0 = 0.870, p = 0.062, q = 0.030; p1 = 0.130, ω = 2.840; ω (f) = 0.000 (0.174), 0.207 (0.174), 0.999 (0.174), 1.000 (0.174), 1.000 (0.174), 2.840 (0.130) | -1949.83 | 30

XXI | M0 | ω0 = 1.152 | -1276.18 | 11
XXI | M1 | ω0 = 0.000, f0 = 0.244; ω1 = 1.000, f1 = 0.756 | -1268.02 | 12
XXI | M2 | ω0 = 0.076, f0 = 0.348; ω1 = 1.000, f1 = 0.000; ω2 = 2.029, f2 = 0.652 | -1262.48 | 14
XXI | M7 | p = 0.021, q = 0.005; ω (f) = 0.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1268.30 | 12
XXI | M8 | p0 = 0.300, p = 0.005, q = 1.273; p1 = 0.701, ω = 1.884; ω (f) = 0.000 (0.060), 0.000 (0.060), 0.000 (0.060), 0.000 (0.060), 0.000 (0.060), 1.884 (0.701) | -1262.67 | 14

D3TIL | M0 | ω0 = 0.656 | -1505.27 | 23
D3TIL | M1 | ω0 = 0.025, f0 = 0.376; ω1 = 1.000, f1 = 0.624 | -1490.51 | 24
D3TIL | M2 | ω0 = 0.033, f0 = 0.379; ω1 = 1.000, f1 = 0.549; ω2 = 2.220, f2 = 0.072 | -1489.72 | 26
D3TIL | M7 | p = 0.054, q = 0.028; ω (f) = 0.000 (0.200), 0.085 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200) | -1490.54 | 24
D3TIL | M8 | p0 = 0.891, p = 0.135, q = 0.109; p1 = 0.109, ω = 1.993; ω (f) = 0.000 (0.178), 0.044 (0.178), 0.741 (0.178), 0.997 (0.178), 1.000 (0.178), 1.993 (0.110) | -1489.67 | 26

Spalax D3p2-4 | M0 | ω0 = 0.892 | -991.61 | 7
Spalax D3p2-4 | M1 | ω0 = 0.000, f0 = 0.304; ω1 = 1.000, f1 = 0.696 | -986.76 | 8
Spalax D3p2-4 | M2 | ω0 = 0.431, f0 = 0.786; ω1 = 1.000, f1 = 0.000; ω2 = 3.950, f2 = 0.214 | -981.34 | 10
Spalax D3p2-4 | M7 | p = 0.056, q = 0.027; ω (f) = 0.000 (0.200), 0.176 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200) | -987.00 | 8
Spalax D3p2-4 | M8 | p0 = 0.786, p = 75.400, q = 99.00; p1 = 0.214, ω = 3.955; ω (f) = 0.384 (0.157), 0.412 (0.157), 0.432 (0.157), 0.452 (0.157), 0.481 (0.157), 3.955 (0.214) | -981.35 | 10

Meriones D3p2-7 | M0 | ω0 = 1.182 | -1849.87 | 13
Meriones D3p2-7 | M1 | ω0 = 0.037, f0 = 0.111; ω1 = 1.000, f1 = 0.889 | -1846.68 | 14
Meriones D3p2-7 | M2 | ω0 = 0.268, f0 = 0.249; ω1 = 1.000, f1 = 0.000; ω2 = 1.612, f2 = 0.751 | -1842.94 | 16
Meriones D3p2-7 | M7 | p = 0.133, q = 0.019; ω (f) = 0.147 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200), 1.000 (0.200) | -1847.32 | 14
Meriones D3p2-7 | M8 | p0 = 0.250, p = 36.791, q = 99.00; p1 = 0.750, ω = 1.613; ω (f) = 0.223 (0.050), 0.250 (0.050), 0.270 (0.050), 0.290 (0.050), 0.320 (0.050), 1.613 (0.750) | -1842.94 | 16

Murid-sp D3p2-6 | M0 | ω0 = 0.940 | -1418.73 | 21
Murid-sp D3p2-6 | M1 | ω0 = 0.000, f0 = 0.307; ω1 = 1.000, f1 = 0.693 | -1406.51 | 22
Murid-sp D3p2-6 | M2 | ω0 = 0.018, f0 = 0.322; ω1 = 1.000, f1 = 0.383; ω2 = 2.178, f2 = 0.295 | -1403.03 | 24
Murid-sp D3p2-6 | M7 | p = 0.052, q = 0.026; ω (f) = 0.000 (0.200), 0.107 (0.200), 0.999 (0.200), 1.000 (0.200), 1.000 (0.200) | -1407.26 | 22
Murid-sp D3p2-6 | M8 | p0 = 0.672, p = 0.046, q = 0.047; p1 = 0.328, ω = 2.115; ω (f) = 0.000 (0.134), 0.000 (0.134), 0.459 (0.134), 0.999 (0.134), 1.000 (0.134), 2.115 (0.328) | -1402.99 | 24
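The nested model pairs in Table B1 (M1 vs. M2 and M7 vs. M8) are conventionally compared with a likelihood ratio test: twice the difference in LH is referred to a χ² distribution whose degrees of freedom equal the difference in np (2 for both pairs here). The Python sketch below shows that arithmetic with the standard library only. It illustrates the general test and is not necessarily the exact procedure or multiple-testing correction applied in this study; the helper names `chi2_sf_even_df` and `lrt` are hypothetical. Because both comparisons have even degrees of freedom, the χ² upper tail has a closed form, so no statistics package is required.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!, so no SciPy is needed.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)**0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def lrt(lnl_null: float, np_null: int, lnl_alt: float, np_alt: int):
    """Likelihood ratio test for nested codon models (hypothetical helper).

    Returns (statistic, degrees of freedom, P value). A negative statistic,
    which can arise from optimizer noise, is truncated to zero.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    df = np_alt - np_null
    return stat, df, chi2_sf_even_df(stat, df)


if __name__ == "__main__":
    # LH and np values copied from Table B1, group I.
    comparisons = {
        "M1 vs M2": (-1941.42, 24, -1932.21, 26),
        "M7 vs M8": (-1942.34, 24, -1932.27, 26),
    }
    for label, (l0, n0, l1, n1) in comparisons.items():
        stat, df, p = lrt(l0, n0, l1, n1)
        print(f"{label}: 2*dLH = {stat:.2f}, df = {df}, P = {p:.2e}")
```

For group I, this gives 2ΔLH = 18.42 for M1 vs. M2 (df = 2, P ≈ 1.0 × 10⁻⁴) and 2ΔLH = 20.14 for M7 vs. M8 (df = 2, P ≈ 4.2 × 10⁻⁵). Where LH is identical for both models, as for M7 and M8 in group XI, the statistic is zero and P = 1.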
