University of Iowa Iowa Research Online

Theses and Dissertations

Summer 2018

Genome evolution in parasitic wasps: comparisons of sexual and asexual species

Eric S. Tvedte University of Iowa

Follow this and additional works at: https://ir.uiowa.edu/etd

Part of the Biology Commons

Copyright © 2018 Eric S. Tvedte

This dissertation is available at Iowa Research Online: https://ir.uiowa.edu/etd/6516

Recommended Citation

Tvedte, Eric S. "Genome evolution in parasitic wasps: comparisons of sexual and asexual species." PhD (Doctor of Philosophy) thesis, University of Iowa, 2018. https://doi.org/10.17077/etd.kgdbnt2x

GENOME EVOLUTION IN PARASITIC WASPS: COMPARISONS OF SEXUAL AND ASEXUAL SPECIES

by

Eric S. Tvedte

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Biology in the Graduate College of The University of Iowa

August 2018

Thesis Supervisors: Associate Professor John M. Logsdon, Jr. Associate Professor Andrew Forbes

Copyright by

Eric S. Tvedte

2018

All Rights Reserved

Graduate College The University of Iowa Iowa City, Iowa

CERTIFICATE OF APPROVAL

______

PH.D. THESIS

______

This is to certify that the Ph.D. thesis of

Eric Tvedte has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Biology at the August 2018 graduation.

Thesis Committee: ______John M. Logsdon, Jr., Thesis Supervisor

______Andrew Forbes, Thesis Supervisor

______Maurine Neiman

______Bryant McAllister

______Todd Scheetz

______Andrew Kitchen

ACKNOWLEDGEMENTS

I would like to thank my advisors, John Logsdon and Andrew Forbes, for their immense contribution to my professional and personal growth. I’d also like to thank my committee members, Maurine Neiman, Bryant McAllister, Todd Scheetz, and Andrew Kitchen for their advising and engagement with the project. I thank past and present colleagues in the Logsdon & Forbes labs, particularly Joe Jalinsky, Cindy Toll, Chris Rice, Alaine Hippee, Anna Ward, Heather Widmayer, Robin Bagley, Sarah Hanson, Elizabeth Savelkoul, Gaby Hamerlinck, and Amanda Nelson for your collaboration, guidance, and pizza-eating on Fridays. I also thank Neiman lab folks, especially Kyle McElroy, Laura Bankers, and Joel Sharbrough, for talks about sex.

My thesis work was greatly aided by the contributions from high school students and undergraduates, including Austin Ward, Austin Paden, Samuel Cummings, Breana Rinker, and Ethan Nelson-Moore. I am also grateful for the assistance of the administrators and staff in the Biology Department, past and present. I am particularly thankful to Phil Ecklund, Tom Koeppel, Eileen Sullivan, and Gery Hehman for their various roles during my tenure.

Last but not least, I’d like to thank my friends and family for their steadfast support. I would especially like to thank my girlfriend, Zoey, whose serenity, thoughtfulness, and laughter have given me so much inspiration over the past year.

ABSTRACT

The fate of any lineage is contingent on the rate at which its genome changes over time. Genome dynamics are influenced by patterns of mutation and recombination. Mutations, as the raw source of variation, can be acted on independently when homologous genetic regions are exchanged via meiotic recombination. While molecular evolution in sexual lineages is impacted by both mutation and recombination, asexual lineage fate is primarily influenced by the mutation rate; recombination is often altered or absent in asexuals. Although multiple studies show accelerated mutation accumulation in asexual lineages that have lost recombination, virtually nothing is known about rate patterns when meiosis is retained. Here, I use parasitic wasps in the genus Diachasma to investigate genome evolution in a recently-derived asexual lineage. I provide evidence that asexual Diachasma possess a canonical set of meiosis genes as well as high levels of genomic homozygosity. Taken together, these observations support an active, albeit modified, form of meiosis in this asexual lineage. In addition, I present the first documentation of accelerated mutation accumulation in the nuclear genome of a naturally-occurring, meiotically-reproducing organism. If harmful, these mutations could impede asexual lineage persistence and contribute strong support for the long-term benefits of sex.

PUBLIC ABSTRACT

Reproduction is a core organismal trait, and transitions between reproductive strategies may influence the evolutionary trajectory of a species. Sex is a costly reproductive strategy, yet it is ubiquitous in nature. The predominance of sex may be explained in part by passage of genetic material across generations during meiosis, particularly segregation and recombination. Sex enables parents to transmit genetic material without harmful mutations to offspring. Loss of sex may increase the accumulation of mutations in the genome. If harmful, these mutations could lead to the eventual extinction of an asexual lineage. A recent loss-of-sex event in Diachasma parasitic wasps offers a promising opportunity to compare patterns of genome-wide mutations between sexual and asexual species.

I used newly sequenced genomes from sexual and asexual Diachasma to investigate whether sex loss in an asexual wasp is associated with changes in meiosis genes. I identified a complete meiosis gene set in the asexual species D. muliebre, supporting meiotic egg production despite sex loss. To assess whether reproductive mode influences Diachasma evolution more broadly, I retrieved a genome-wide dataset to calculate evolutionary rates between wasp species. I found evidence for greater mutation accumulation in asexual Diachasma, suggesting that the modification of meiosis has an effect on the generation and maintenance of genetic variation. Future studies are needed to determine the extent to which this could affect the organism’s ability to survive in nature.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

PREFACE

CHAPTER 1: INTRODUCTION
  The problem of sex
  Evolution of meiosis genes in asexual lineages
  Genome evolution in asexual lineages
  Diachasma wasps: a promising model for the study of sex loss
  Thesis aims
  REFERENCES

CHAPTER 2: DESCRIPTIVE ANALYSES OF THE GENOME OF THE PARASITIC WASP DIACHASMA ALLOEUM, AN EMERGING MODEL FOR ECOLOGICAL SPECIATION AND TRANSITIONS TO ASEXUAL REPRODUCTION
  ABSTRACT
  INTRODUCTION
  MATERIALS & METHODS
    Biological material
    DNA isolation, library preparation, sequencing, and genome assembly
    RNA isolation, RNAseq library preparation, sequencing, assembly, and annotation
    Characterization of ultraconserved elements using BUSCO
    Identification of orthologous gene clusters using OrthoVenn
    Manual gene annotation
  RESULTS & DISCUSSION
    Quality assessment of genome assembly
    Annotation of repertoire of intact oxidative phosphorylation genes in D. alloeum
    Ortholog groups are shared between D. alloeum and other hymenopterans
    Expansion of species-specific chemosensory genes in D. alloeum
    D. alloeum contains canonical genes involved in reproduction and sex determination
  Acknowledgements
  REFERENCES
  TABLES
  FIGURES
  SUPPLEMENTARY DATA

CHAPTER 3: RETENTION OF CORE MEIOTIC GENES ACROSS DIVERSE HYMENOPTERA, INCLUDING THE ASEXUAL WASP DIACHASMA MULIEBRE
  ABSTRACT
  INTRODUCTION
  MATERIALS & METHODS
    Hymenopteran meiotic gene inventory development
    Diachasma genome sequencing and annotation
    Phylogenetic analysis
    Evolutionary rate analysis of meiosis-specific genes in Diachasma
  RESULTS & DISCUSSION
    Meiotic genes: Cell cycle control
    Meiotic genes: Initiation and maintenance of chromosome structure
    Meiotic genes: Recombination
    Concatenated datasets
    Maintenance of intact meiosis genes in D. muliebre in the absence of sex
    Conclusions and future outlook
  Acknowledgements
  Data Availability
  REFERENCES
  TABLES
  FIGURES
  SUPPLEMENTARY DATA

CHAPTER 4: GENOME EVOLUTION IN SEXUAL VERSUS ASEXUAL SPECIES OF DIACHASMA
  ABSTRACT
  INTRODUCTION
  MATERIALS & METHODS
    Wasp genomic library preparation
    Reference guided genome assembly and variant calling
    Genomic dataset generation
    Intraspecific comparisons
    Relative rate analyses
    Codon usage bias and gBGC
    Mitochondrial genome assembly and analyses
    Statistical analyses
  RESULTS
    Loss of sex is associated with a global shift in the mutation landscape
    Loss of sex is associated with accelerated nuclear genome evolution
    Low levels of intraspecific variation in Diachasma
    Distinct pattern of mitochondrial mutation accumulation
    Loss of sex is not associated with changes in codon usage or nucleotide composition
  DISCUSSION
    Does meiosis hinder mutation accumulation in asexual lineages?
    How are mutational changes realized in asexual Diachasma?
    The mitochondrial genome: exception to the rule?
  Acknowledgements
  REFERENCES
  TABLES
  FIGURES
  SUPPLEMENTARY DATA

CHAPTER 5: SUMMARY, CONCLUSIONS, AND FUTURE DIRECTIONS
  SUMMARY OF FINDINGS
    The genome of D. alloeum as a platform for studying the evolution of sex
    Maintenance of meiosis genes in Hymenoptera, including sexual and asexual Diachasma species
    Genomic evidence for accelerated mutation accumulation in D. muliebre
  CONTRIBUTIONS OF FINDINGS TO THE FIELD
    Genomic consequences of asexuality
    The Diachasma alloeum genome: a new tool for interspecific and intraspecific comparisons
  NEW AND OPEN QUESTIONS
    Asexual reproduction: how does D. muliebre do it?
    Are patterns of mutation accumulation in Diachasma affected by population structure?
    Are there distinct patterns of evolution in Diachasma OXPHOS genes?
    What are the fitness consequences of accelerated mutation accumulation in D. muliebre?
  CONCLUSIONS AND FUTURE PROSPECTS
  REFERENCES

LIST OF TABLES

Table 2-1. Summary statistics and feature counts of D. alloeum genome assembly.

Table 2-2. Arthropoda BUSCO analysis for four selected hymenopteran genomes.

Table 2-3. Hymenoptera BUSCO analysis for four selected hymenopteran genomes.

Table 2-4. Summary statistics and BUSCO gene content for genome assemblies in select Hymenoptera.

Table 2-5. GO enrichment analysis of OrthoVenn ortholog clusters shared between D. alloeum and M. demolitor.

Table 2-6. Chemosensory gene content of select hymenopteran

Table 2-S1. Sex determination genes used as queries for BLAST searches.

Table 3-1. Organisms used in meiosis gene inventory.

Table 3-2. Relative rate tests in meiosis-specific genes across sexual and asexual Diachasma.

Table 4-1. Relative rate analysis of concatenated nuclear genes in Diachasma.

Table 4-2. Relative rate analysis of mitochondrial genes in Diachasma.

Table 4-3. Comparisons of pairwise distances in mitochondrial genes for sexual and asexual Diachasma.

Table 4-4. Measures of codon bias and nucleotide composition in nuclear and mitochondrial genes in Diachasma.

Table 4-S1. Manual annotation of SNPs in accelerated asexual genes.

Table 4-S2. Manual annotation of SNPs in accelerated sexual genes.

Table 4-S3. Pairwise distances in nuclear CDS regions in Diachasma.

Table 4-S4. SNP counts in Diachasma NGS datasets.

Table 4-S5. P-values of relative rate analysis tests in Diachasma mitochondrial genes.

Table 4-S6. Statistical assessment of normality for mitochondrial and nuclear genome datasets.

LIST OF FIGURES

Figure 2-1. Diagram of mitochondrial genome of D. alloeum.

Figure 2-2. OrthoVenn clusters of select hymenopteran protein datasets.

Figure 2-3. Phylogenetic tree of ORs from sampled hymenopteran insects.

Figure 2-4. Phylogenetic subtrees of ORs from sampled hymenopteran insects.

Figure 2-5. Phylogenetic tree of GRs from sampled hymenopteran insects.

Figure 2-6. Phylogenetic tree of IRs from sampled hymenopteran insects.

Figure 2-7. Phylogenetic tree of OBPs from sampled hymenopteran insects.

Figure 2-8. Phylogenetic tree of CSPs from sampled hymenopteran insects.

Figure 2-S1. Blobplot of scaffolds in D. alloeum genome assembly.

Figure 3-1. Overview of key processes involved in meiosis.

Figure 3-2. Meiosis gene inventory in Hymenoptera and three additional genomes.

Figure 3-3. Maximum likelihood analysis of Cyclin/CDK proteins.

Figure 3-4. Maximum likelihood analysis of proteins involved in meiotic cell cycle control.

Figure 3-5. Maximum likelihood analysis of SMC proteins.

Figure 3-6. Maximum likelihood analysis of RAD21/REC8 proteins.

Figure 3-7. Maximum likelihood analysis of RAD51 proteins.

Figure 3-8. Maximum likelihood analysis of proteins involved in mismatch repair.

Figure 3-9. Maximum likelihood phylogenetic analysis of concatenated meiotic gene dataset.

Figure 4-1. Diagram of variant calling pipeline used for Diachasma NGS datasets.

Figure 4-2. Relative pairwise distance values in sexual and asexual Diachasma.

Figure 4-3. Mutation landscapes of sexual and asexual Diachasma.

Figure 4-4. Maximum likelihood analysis of concatenated nuclear gene dataset in Diachasma.

Figure 4-5. Maximum likelihood analysis of concatenated mitochondrial gene dataset in Diachasma.

Figure 4-6. Maximum likelihood analysis of concatenated 1st/2nd codon positions in Diachasma.

Figure 4-7. Maximum likelihood analysis of concatenated 3rd codon positions in Diachasma.

Figure 4-S1. Mutation landscapes of sexual and various asexual Diachasma wasps.

Figure 4-S2. Mutation landscapes of various asexual Diachasma wasps.

PREFACE

Chapter 1 contains some content included in an invited review article in Current Opinion in Insect Science (Tvedte, Logsdon, & Forbes, submitted).

Chapter 2 will be reformatted into a manuscript with authors, in order, of Eric Tvedte, Kimberly Walden (University of Illinois at Urbana-Champaign), Kyle McElroy (University of Iowa), Andrew Forbes (University of Iowa), Glen Hood (Rice University), John Logsdon (University of Iowa), Jeffery Feder (University of Notre Dame), and Hugh Robertson (University of Illinois at Urbana-Champaign). Forbes, Hood, and Feder were responsible for obtaining biological samples. Walden and Robertson performed wasp sequencing and genome/transcriptome assembly. McElroy conducted transposable element analysis of hymenopteran genomes (not included here). Tvedte performed all other analyses, including BUSCO, OrthoVenn, and manual annotations. This work was assisted by undergraduates at the University of Iowa, including Austin Ward, who wrote a script to retrieve nonredundant sets of proteins used in the OrthoVenn analysis, and Samuel Cummings, who assisted in manual annotations of chemosensory genes. Tvedte was primarily responsible for written content in this chapter, with components under the auspices of other authors contributing to the final text. All authors were responsible for elements of experimental design and manuscript editing.

Chapter 3 was published as an original article in Journal of Heredity (Tvedte, Forbes, & Logsdon, 2017). The chapter text largely follows the publication text and has been reformatted here. In addition, content has been added in the materials and methods and results and discussion sections regarding the analysis of coding DNA sequences of meiosis-specific genes among sexual and asexual Diachasma species.

Chapter 4 will be reformatted into a manuscript with authors, in order, of Eric Tvedte, Austin Ward (University of Iowa), Andrew Forbes (University of Iowa), and John Logsdon (University of Iowa). Tvedte was primarily responsible for all major analyses and written content contained in this chapter. Austin Ward assisted in script writing for data retrieval. Forbes and Logsdon were responsible for elements of experimental design and manuscript editing.

CHAPTER 1: INTRODUCTION

The problem of sex

How an organism reproduces is a key feature of its biology. Sexual reproduction is associated with several costs (e.g. metabolic, genetic, male production) that are expected to be detrimental to fitness (Maynard Smith, 1978; Otto, 2009; Meirmans et al., 2012). All else being equal, asexuality should theoretically be beneficial – at least in the short term – and asexual lineages should outcompete coexisting sexual lineages. Despite this, sex is the predominant reproductive strategy among eukaryotes (Bell, 1982). Most asexual lineages are short-lived, giving their phylogenetic distribution a “twiggy” appearance (Maynard Smith, 1978; Bell, 1982; Schwander & Crespi, 2009). The ubiquity of sex in nature suggests that it confers important benefits for long-term lineage survival that outweigh its short-term costs. Although sex apparently enables prolonged lineage persistence, there is currently no comprehensive understanding of why it is favored over alternate modes of reproduction. One promising avenue is the identification of potential barriers to transitions in reproductive mode and/or maladaptive consequences of asexuality; constraints on the origin and persistence of asexual lineages provide support for the empirical observation of widespread sex (Engelstädter, 2008; Meirmans et al., 2012).

Evolution of meiosis genes in asexual lineages

Meiosis – the cellular processes involved in the generation of gametes – is a widely conserved feature of sexual reproduction. A single round of DNA replication is followed by two successive cell divisions, the first of which involves physical associations between homologous chromosomes and potential exchange of genetic material between homologs (i.e. recombination). The fidelity of meiosis is important in the production of offspring with a normal complement of chromosomes; aneuploidy resulting from meiotic errors is often detrimental to organismal fitness (Santaguida & Amon, 2015).

Transitions to asexuality are accompanied by deviations from both canonical meiosis and outcrossing (Suomalainen et al., 1987; Stenberg & Saura, 2009). Modes of asexuality involving clonal production of offspring (i.e. apomixis) are characterized by the loss of meiotic mechanisms and the preservation of parental heterozygosity. In these asexual lineages, neutral evolutionary processes are expected to degrade meiotic mechanisms, and the underlying genes should reflect these changes in selective pressures (Normark et al., 2003). Other parthenogenetic modes include the meiotic production of eggs and subsequent fusion of the products of female meiosis to restore ploidy (i.e. automixis), as well as the switching between sexual and asexual reproductive modes at periodic intervals (i.e. cyclic parthenogenesis) or during stressful conditions (i.e. facultative parthenogenesis). As in sexual lineages, meiosis genes in these parthenogens should generally be maintained via purifying selection.

The ‘meiosis detection toolkit’ is a comparative genomics approach that has been deployed to identify intact meiosis genes across the tree of life to infer the presence of sexual reproduction (Schurko & Logsdon, 2008). Meiosis genes are conserved in all major eukaryotic lineages, supporting an origin of meiosis near the first appearance of eukaryotes (Ramesh et al., 2005; Malik et al., 2008). The ‘toolkit’ is particularly useful in cases where sex has not been directly observed. The approach has been used to infer meiosis in protists (Ramesh et al., 2005; Carlton et al., 2007), fungi (Tzung et al., 2001), and animals (Srivastava et al., 2008) with unknown sex lives.

Asexually-reproducing lineages possess a variable composition of ‘toolkit’ genes. Multiple cyclical parthenogenetic lineages have been studied, including the water flea Daphnia pulex, pea aphid Acyrthosiphon pisum, and monogonont rotifer Brachionus spp. In all cases, meiosis genes are generally well-conserved, and genes involved in cell cycle regulation have experienced duplication events, potentially facilitating reproductive plasticity (Schurko et al., 2009; Srinivasan et al., 2010; Hanson et al., 2013). Transposon-mediated disruption of REC8, an ultraconserved gene involved in cohesion of homologous chromosomes during meiosis I, was found in several asexual lineages of D. pulex, implicating a potential mechanism for generating new asexual lineages (Eads et al., 2012). However, the role of REC8 as the sole determinant of asexuality was later refuted (Xu et al., 2013; Jiang et al., 2017). In A. pisum, genes specific to meiosis are expressed in asexual oocytes, albeit at lower levels relative to the sexual germline, which is hypothesized to constrain the molecular events involved in meiotic recombination (Srinivasan et al., 2014).

There is mixed support for the degradation of meiosis genes in obligate asexual lineages. In monogonont rotifers, obligate and cyclic parthenogens had identical meiosis gene repertoires and similar meiosis gene expression profiles (Hanson et al., 2013). Meiosis genes are largely conserved in bdelloid rotifers, despite their apparent ancient asexuality (Nowell et al., 2018). Analysis of a small set of meiosis genes in the New Zealand snail Potamopyrgus antipodarum showed no evidence of pseudogenization (Rice, 2015). In contrast, several key meiosis genes found in closely-related sexual nematodes, including REC8, were absent in the asexual nematode Diploscapter pachys (Fradin et al., 2017). Canonical meiotic synapsis and recombination are absent in this unichromosomal worm, and a single cellular division equivalent to meiosis II is responsible for the generation of diploid offspring. Mutating REC8 in the sexual nematode Caenorhabditis elegans produces a phenotype similar to that of D. pachys (Severson et al., 2009), suggesting this mutational event may have preceded the transition to asexual reproduction.

Genome evolution in asexual lineages

No universal explanation accounts for the evolution and maintenance of sex, but one favored hypothesis invokes a role for recombination, which in sexual individuals can break down linkage disequilibria (LD) across the genome so that selection can act on loci independently of their genetic background (Hill & Robertson, 1966). If asexuality is accompanied by the loss of recombination, higher genome-wide linkage is expected to decrease the effectiveness of selection (Hill & Robertson, 1966; Felsenstein, 1974). Specifically, selection for the transmission of advantageous mutations could be accompanied by the co-transmission of linked deleterious mutations, whereas recombination can decouple the former from the latter. When considering the fixation of separate adaptive mutations, successive events need to originate in a single asexual lineage, whereas recombination and outcrossing can more easily bring together mutations that arise in distinct lineages (Felsenstein, 1974). Recombination can also mitigate the passage of deleterious alleles from parents to offspring, whereas in ameiotic asexual lineages the least-loaded genomes, once lost via genetic drift, cannot be recovered (Muller, 1964; Felsenstein, 1974). Overall, asexuals will tend to experience increased persistence and fixation of harmful mutations, a scenario potentially underlying the degradation and eventual extinction of asexual lineages (Lynch et al., 1993).
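The irreversible loss of the least-loaded genomes in a non-recombining lineage (Muller, 1964; Felsenstein, 1974) can be illustrated with a minimal simulation. The sketch below is not part of the analyses in this thesis; the function name simulate_ratchet and all parameter values (population size, mutation rate, selection coefficient, random seed) are hypothetical choices made purely for illustration. It assumes a clonal population with no recombination and no back mutation, in which each individual is represented only by its count of deleterious mutations.

```python
import random


def simulate_ratchet(pop_size=50, generations=200, mut_rate=0.3,
                     sel_coeff=0.02, seed=1):
    """Return the smallest mutation load in the population at each generation.

    Illustrative assumptions: clonal reproduction, multiplicative fitness,
    at most one new deleterious mutation per offspring, no back mutation.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size  # every individual starts mutation-free
    min_load_history = []
    for _ in range(generations):
        # Fitness declines multiplicatively with each deleterious mutation.
        weights = [(1.0 - sel_coeff) ** k for k in loads]
        # Clonal reproduction: each offspring copies one parent's load exactly.
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # New mutations can only add to the inherited load.
        loads = [k + (1 if rng.random() < mut_rate else 0) for k in parents]
        min_load_history.append(min(loads))
    return min_load_history


history = simulate_ratchet()
print("least-loaded class, first and last generation:", history[0], history[-1])
```

Because every offspring inherits at least its parent's load, the smallest load in the population can never decrease under these assumptions; once the least-loaded class is lost by drift it is gone for good. With recombination, less-loaded genotypes could in principle be regenerated, which is the contrast drawn in the paragraph above.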

Historically, studies of associations between reproductive mode and mutation accumulation were largely based on small datasets of nuclear and/or mitochondrial loci (reviewed in Hartfield, 2016). More recently, next-generation sequencing (NGS) data have enabled the investigation of genome-wide mutational patterns in asexual lineages. Data from animals (Sharbrough et al., 2018; Bast et al., 2018) and plants (Hollister et al., 2015; Lovell et al., 2017) show evidence for increased mutation accumulation in asexual lineages. However, other studies indicate no association between reproductive mode and mutation accumulation (Tucker et al., 2013; Ament-Velásquez et al., 2016; Lindsey et al., 2018), and one recent example indicates more effective purifying selection against harmful mutations in asexual lineages (Brandt et al., 2017). Despite the increased use of NGS to test mutation accumulation hypotheses, multiple challenges remain. First, many asexual systems originated via hybridization and/or are polyploid, complicating associations between reproductive mode and mutation accumulation (Tucker et al., 2013; Hollister et al., 2015; Ament-Velásquez et al., 2016; Lovell et al., 2017; Sharbrough et al., 2018). Second, comparisons between asexual study systems and distant sexual relatives rely on model estimates for evolutionary patterns at terminal branches. Sequences from multiple closely-related sexual lineages allow for a direct test of mutational directionality. Third, nearly all previous mutation accumulation studies involve comparisons between sexual lineages and effectively clonal asexual lineages. A single study in the parasitic wasp Trichogramma pretiosum compared genome-wide rates of protein evolution in sexual and automictic asexual lineages and found no relationship with reproductive mode (Lindsey et al., 2018). More studies need to be performed to determine if recombination persistence in automictic asexual lineages is sufficient to negate deleterious mutation accumulation.

Diachasma wasps: a promising model for the study of sex loss

To assess changes in the patterns of molecular evolution across the breadth of a newly asexual genome, we employ parasitic wasps in the genus Diachasma. The three species of North American Diachasma (Hymenoptera: Braconidae) are parasitoids of the larvae of Rhagoletis flies (Diptera: Tephritidae). Two sexual species, D. ferrugineum and D. alloeum, are widely distributed across the eastern United States (Muesebeck, 1956; Wharton & Marsh, 1978; Forbes et al., 2010). Collections of D. alloeum span in large part the native range of its hosts, flies in the Rhagoletis pomonella species complex. The exception is central Mexico, where hawthorn-infesting R. pomonella flies have been described without concurrent recovery of Diachasma wasp parasitoids. The eastern cherry fly R. cingulata has a similar geographical distribution to R. pomonella and is parasitized by the sexual wasp D. ferrugineum. Populations of R. cingulata have been recently discovered in central Mexico; however, Diachasma wasps have not been recovered from these regions (Rull et al., 2011).

The western cherry fly, R. indifferens, sister to R. cingulata, is attacked by asexual D. muliebre and is concentrated in the Pacific Northwest (Muesebeck, 1956; Wharton & Marsh, 1978). Collections of R. indifferens span the west coast of Washington, Oregon, and northern California, and extend eastward to central Idaho (Foote et al., 1993). Sexual and asexual Diachasma species are morphologically and behaviorally similar, with multiple lines of evidence supporting D. ferrugineum as the closest sexual relative to D. muliebre (Wharton & Marsh, 1978; Forbes et al., 2009; Hamerlinck et al., 2016). Mitochondrial COX1 sequence data suggest extant D. muliebre populations formed as a single transition to asexuality ~10kya-2mya (Forbes et al., 2013). This timing is consistent with the biogeography of the system: Diachasma ferrugineum and D. muliebre are currently allopatric but may have overlapped during the last glacial maximum (LGM, ~12-40kya), when many North American trees and their associated insects would have been forced south (Booth, 1987; Clark et al., 2009). A host of R. cingulata, Prunus serotina, still exists in the Sierra Madre Oriental (Greller, 2000). Given the lack of mtCOX2 differences among the fly hosts R. cingulata and R. indifferens (Smith & Bush, 1997), the divergence between sexual and asexual wasps, as well as that of their hosts, is unlikely to be older than the LGM.

Little is known about the evolution of D. muliebre following sex loss. Ancestral wild cherry R. indifferens populations have experienced a recent host shift into cultivated cherries with earlier fruiting times (Wilson & Lovett, 1913), and there is evidence that cherry flies and their D. muliebre parasites have synchronized their annual eclosion with the fruiting periods of these hosts (Forbes et al., 2013; Yee et al., 2015). Patterns of microsatellite allele segregation among asexual wasps suggest D. muliebre reproduction involves meiotic egg production with restoration of adult diploidy following the fusion of female gametes (Forbes et al., 2013). This presents a unique opportunity to investigate the generation and maintenance of genetic diversity in an asexual lineage that can still redistribute mutations within its genome (i.e. persistence of recombination) but whose diversity is ultimately lineage-specific (i.e. loss of outcrossing).

Thesis aims

Broadly, my dissertation investigates whether loss of sex in Diachasma is associated with distinct patterns of genomic evolution. To enable comparative analyses between Diachasma species, I conducted descriptive analyses on the de novo assembled genome of the sexual wasp, D. alloeum (Chapter 2). I compared the genome assembly against other publicly-available hymenopteran genomes, identified the presence of well-conserved genes, and manually annotated genes relevant to wasp biology. To evaluate the evolutionary pressures on meiotic processes in D. muliebre, I compiled a meiosis gene ‘toolkit’ of several hymenopteran insects, including sexual Diachasma species (Chapter 3). I estimated evolutionary rates in meiosis genes between D. ferrugineum and D. muliebre to infer selective pressures in these genes after sex loss. To evaluate genomic patterns of mutation accumulation, I generated a dataset of thousands of genes for sexual and asexual wasps (Chapter 4). Here, I developed a novel bioinformatic workflow to determine if the asexual Diachasma lineage has experienced an increased genetic load since sharing a common ancestor with a closely-related sexual wasp. Overall, the characterization of the D. muliebre genome provides an empirical test of fundamental hypotheses surrounding the evolution and persistence of asexual lineages.

REFERENCES

Ament-Velásquez SL, et al. 2016. Population genomics of sexual and asexual lineages in

fissiparous ribbon worms (Lineus, Nemertea): hybridization, polyploidy and the

Meselson effect. Mol Ecol 25:3356–3369.

Bast J et al. 2018. Consequences of asexuality in natural populations: insights from stick

insects. Mol Biol Evol doi:10.1093/molbev/msy058.

Bell G. The Masterpiece of Nature: The Evolution and Genetics of Sexuality. 1982.

Cambridge University Press, Cambridge, UK.

Booth DB. 1987. Timing and processes of deglaciation along the southern margin of the

Cordilleran ice sheet. In The Geology of North America v.K-3. ed. Ruddiman WF,

Wright HE Jr. pp.71-90. Geol Soc Am.

Brandt A et al. 2017. Effective purifying selection in ancient asexual oribatid mites.

Nat Commun 8:873.

Carlton JM, et al. 2007. Draft genome sequence of the sexually transmitted pathogen

Trichomonas vaginalis. Science. 315:207–212.

Clark PU, et al. 2009. The last glacial maximum. Science, 325:710-714.

Eads BD, Tsuchiya D, Andrews J, Lynch M, Zolan ME. 2012. The spread of a

transposon insertion in Rec8 is associated with obligate asexuality in Daphnia. Proc

Natl Acad Sci USA 109:858-863.

Engelstädter J. 2008. Constraints on the evolution of asexual reproduction. BioEssays

30:1138–1150.

Felsenstein J. 1974. The evolutionary advantage of recombination. Genetics 78:737–756.


Foote RH, Blanc FL, Norrbom AL. 1993. Handbook of the Fruit Flies (Diptera:

Tephritidae) of America North of Mexico. Cornell University Press, Ithaca, NY,

USA.

Forbes AA, Rice LA, Stewart NB, Yee WL, Neiman M. 2013. Niche differentiation and

colonization of a novel environment by an asexual parasitic wasp. J Evolution

Biol 26:1330–1340.

Forbes AA, Powell TH, Stelinski LL, Smith JJ, Feder JL. 2009. Sequential sympatric

speciation across trophic levels. Science 323:776–779.

Fradin H, et al. 2017. Genome architecture and evolution of a unichromosomal asexual

nematode. Curr Biol 27:2928–2939.

Greller AM. 2000. Vegetation in the floristic regions of North and Central America. In

Imperfect Balance: Landscape Transformations in the Precolumbian Americas.

ed. Lentz DL. pp 39-86. Columbia University Press, New York, NY, USA.

Hamerlinck G, Hulbert D, Hood GR, Smith JJ, Forbes AA. 2016. Histories of host shifts

and cospeciation among free-living parasitoids of Rhagoletis flies. J Evolution

Biol 29:1766–1779.

Hanson SJ, et al. 2013. Inventory and phylogenetic analysis of meiotic genes in

monogonont rotifers. J Hered 104:357-370.

Hartfield M. 2016. Evolutionary genetic consequences of facultative sex and outcrossing.

J Evolution Biol 29:5–22.

Hill WG, Robertson A. 1966. The effect of linkage on limits to artificial selection. Genet Res

8:269–294.


Hollister JD, et al. 2015. Recurrent loss of sex is associated with accumulation of

deleterious mutations in Oenothera. Mol Biol Evol 32:896–905.

Jiang X, Tang H, Ye Z, Lynch M. 2017. Insertion polymorphisms of mobile genetic

elements in sexual and asexual populations of Daphnia pulex. Genome Biol Evol

9:362–374.

Lindsey AR, et al. 2018. Comparative genomics of the miniature wasp and pest control

agent Trichogramma pretiosum. BMC Biol 16:54.

Lovell JT, Williamson RJ, Wright SI, McKay JK, Sharbel TF. 2017. Mutation

accumulation in an asexual relative of Arabidopsis. PLOS Genet 13:e1006550.

Lynch M, Bürger R, Butcher D, Gabriel W. 1993. The mutational meltdown in asexual

populations. J Hered 84:339–344.

Malik SB, Pightling AW, Stefaniak LM, Schurko AM, Logsdon JM. 2008. An expanded

inventory of conserved meiotic genes provides evidence for sex in Trichomonas

vaginalis. PLOS One 3:e2879.

Meirmans S, Meirmans PG, Kirkendall LR. 2012. The costs of sex: facing real-world

complexities. Q Rev Biol 87:19–40.

Muesebeck CFW. 1956. On Opius ferrugineus Gahan and two closely similar new

species (Hymenoptera: Braconidae). Entomol News 67:99–102.

Muller HJ. 1964. The relation of recombination to mutational advance. Mutat Res-Fund

Mol M 1:2–9.

Normark BB, Judson OP, Moran NA. 2003. Genomic signatures of ancient asexual

lineages. Biol J Linn Soc 79:69–84.


Nowell RW, et al. 2018. Comparative genomics of bdelloid rotifers: Insights from

desiccating and nondesiccating species. PLOS Biol. 16:e2004830.

Otto SP. 2009. The evolutionary enigma of sex. Am Nat 174:S1–S14.

Ramesh MA, Malik SB, Logsdon JM. 2005. A phylogenomic inventory of meiotic genes:

evidence for sex in Giardia and an early eukaryotic origin of meiosis. Curr Biol

15:185-191.

Rice CS. 2015. Evolution of meiosis genes in sexual vs. asexual Potamopyrgus

antipodarum. MS (Master of Science) thesis, University of Iowa, Iowa City, IA,

USA.

Rull J, Aluja M, Feder JL. 2011. Distribution and basic biology of black cherry-infesting

Rhagoletis (Diptera: Tephritidae) in Mexico. Ann Entomol Soc Am 104:202–211.

Santaguida S, Amon A. 2015. Short-and long-term effects of chromosome mis-

segregation and aneuploidy. Nat Rev Molec Cell Biol 16:473.

Schurko AM, Logsdon Jr JM, Eads BD. 2009. Meiosis genes in Daphnia pulex and the

role of parthenogenesis in genome evolution. BMC Evol Biol 9:78.

Schurko AM, Logsdon JM. 2008. Using a meiosis detection toolkit to investigate ancient

asexual “scandals” and the evolution of sex. BioEssays 30:579-589.

Schwander T, Crespi BJ. 2009. Twigs on the tree of life? Neutral and selective models

for integrating macroevolutionary patterns with microevolutionary processes in

the analysis of asexuality. Mol Ecol 18:28–42.

Severson AF, Ling L, van Zuylen V, Meyer BJ. 2009. The axial element protein HTP-3

promotes cohesin loading and meiotic axis assembly in C. elegans to implement

the meiotic program of chromosome segregation. Gene Dev 23:1763–1778.


Sharbrough J, Luse M, Boore JL, Logsdon Jr JM, Neiman M. 2018. Radical amino acid

mutations persist longer in the absence of sex. Evolution 72:808–824.

Smith JJ, Bush GL. 1997. Phylogeny of the genus Rhagoletis (Diptera: Tephritidae)

inferred from DNA sequences of mitochondrial cytochrome oxidase II. Mol

Phylogenet Evol 7:33-43.

Smith JM. 1978. The Evolution of Sex. Cambridge University Press, Cambridge, UK.

Srinivasan DG, Abdelhady A, Stern DL. 2014. Gene expression analysis of

parthenogenetic embryonic development of the pea aphid, Acyrthosiphon pisum,

suggests that aphid parthenogenesis evolved from meiotic oogenesis. PLOS One

9:e115099.

Srinivasan DG, Fenton B, Jaubert-Possamai S, Jaouannet M. 2010. Analysis of meiosis

and cell cycle genes of the facultatively asexual pea aphid, Acyrthosiphon pisum

(Hemiptera: Aphididae). Insect Mol Biol 19:229–239.

Srivastava M et al. 2008. The Trichoplax genome and the nature of placozoans. Nature

454:955.

Stenberg P, Saura A. 2009. Cytology of asexual animals. In Lost sex. ed. Schön I,

Martens K, van Dijk P. pp. 63–74. Springer, New York, NY, USA.

Suomalainen E, Saura A, Lokki J. 1987. Cytology and evolution in parthenogenesis.

CRC Press, Boca Raton, FL, USA.

Tucker AE, Ackerman MS, Eads BD, Xu S, Lynch M. 2013. Population-genomic

insights into the evolutionary origin and fate of obligately asexual Daphnia pulex.

Proc Natl Acad Sci USA 110:15740–15745.


Tzung KW et al. 2001. Genomic evidence for a complete sexual cycle in Candida

albicans. Proc Natl Acad Sci USA 98:3249–3253.

Wharton R, Marsh P. 1978. New World Opiinae (Hymenoptera: Braconidae) parasitic on

Tephritidae (Diptera). J Wash Acad Sci 68:147-167.

Wilson HF, Lovett AL. 1913. Miscellaneous insect pests of orchard and garden. Oreg

Agric Exp Stat Bienn Crop Pest Hort Rep 1911–1912 16: 147–165.

Xu S, et al. 2015. Hybridization and the origin of contagious asexuality in Daphnia

pulex. Mol Biol Evol 32:3215-3225.

Yee WL, Goughnour RB, Hood GR, Forbes AA, Feder JL. 2015. Chilling and host

plant/site-associated eclosion times of Western cherry fruit fly (Diptera:

Tephritidae) and a host-specific parasitoid. Environ Entomol 44:1029–1042.


CHAPTER 2: DESCRIPTIVE ANALYSES OF THE GENOME OF THE PARASITIC WASP DIACHASMA ALLOEUM, AN EMERGING MODEL FOR ECOLOGICAL SPECIATION AND TRANSITIONS TO ASEXUAL REPRODUCTION

ABSTRACT

Parasitoid wasps are among the most speciose animals, yet have relatively few available genomic resources. We report a draft genome assembly of the wasp Diachasma alloeum

(Hymenoptera: Braconidae), a host-specific parasitoid of the apple maggot fly Rhagoletis pomonella (Diptera: Tephritidae) and developing model for understanding how ecological speciation can “cascade” across trophic levels. Identification of gene content confirmed the overall quality of the draft genome, and we manually annotated ~400 genes as part of this study, including those involved in oxidative phosphorylation, chemosensation, and reproduction. Through comparisons to model hymenopterans such as the European honeybee Apis mellifera and parasitoid wasp Nasonia vitripennis, as well as a more closely related braconid parasitoid Microplitis demolitor, we identified a proliferation of transposable elements in the genome, an expansion of D. alloeum chemosensory gene families, and the maintenance of several key genes with known roles in sexual reproduction and sex determination. The D. alloeum genome will provide a valuable resource for comparative genomics studies in Hymenoptera as well as specific investigations into the genomic changes associated with ecological speciation and transitions to asexuality.


INTRODUCTION

The Hymenoptera may be the largest order of insects due to the immense diversity of parasitic wasps (i.e. “parasitoids”) that lay their eggs into or on other insect species

(LaSalle & Gould, 1993; Austin & Dowton, 2000; Forbes et al., 2018). The great diversity of parasitoid wasps may be a consequence of their close relationship with their insect hosts. When a specialist parasitoid shifts to a new host, this change can propel the evolution of reproductive isolating barriers between wasp populations using the new and ancestral hosts (Feder & Forbes, 2010). The evolution of reproductive isolating barriers following a host shift is a well-documented phenomenon in host specialist insects (Forbes et al., 2017), but the study of genomic changes that accompany such phenomena is still in its early stages.

Diachasma alloeum (Hymenoptera: Braconidae) is a parasitoid of the fruit fly

Rhagoletis pomonella (Diptera: Tephritidae). After the introduction of domesticated apples to the United States from Europe, R. pomonella infesting native hawthorn fruits experienced a host shift and subsequently evolved reproductive isolating barriers in what has become a well-known example of incipient ecological speciation (Walsh, 1867;

Bush, 1966; Bush, 1994; Nosil, 2012). This new “apple maggot fly” was sequentially colonized by D. alloeum, which appears to have shifted from its ancestral host, the blueberry maggot Rhagoletis mendax (Forbes et al., 2009). Two reproductive isolating barriers (i.e. diapause emergence and host fruit volatile discrimination) have evolved in parallel in R. pomonella and D. alloeum, and these appear to have a genetic basis

(Dambroski et al., 2005, Forbes & Feder, 2006, Forbes et al., 2009). This phenomenon of


“sequential” or “cascading” speciation may be an important driver of new biodiversity

(Stireman et al., 2006; Abrahamson & Blair, 2007; Hood et al., 2015).

Reproductive isolation in Diachasma has also arisen as a consequence of the loss of sexual reproduction, a general pattern observed in hymenopteran insects (van der Kooi et al., 2017). Diachasma muliebre appears to have split from its sexual sister Diachasma ferrugineum between 0.5 and 1 mya (Wharton & Marsh, 1978; Forbes et al., 2013).

Although the decay of genes involved in sexual traits has been observed in multiple asexual parasitoid wasps (e.g. Ma et al., 2014; Kraaijeveld et al., 2016), there is a dearth of comparative assessments of genomic molecular evolution between sexual and asexual

Hymenoptera.

Here, we report the de novo genome assembly of the parasitoid wasp D. alloeum, adding to the genomic resources for parasitoid wasps, which are underrepresented among available hymenopteran genomes (Branstetter et al., 2017). We performed a series of descriptive analyses to assess the overall quality and content of the D. alloeum genome, and then focused on annotation and evolutionary analyses of multiple gene families relevant to Diachasma wasp biodiversity.


MATERIALS & METHODS

Biological material

We collected fallen fruit from five downy hawthorn trees (Crataegus mollis) historically highly infested with both R. pomonella (Diptera: Tephritidae) and D. alloeum

(Hymenoptera: Braconidae) in Fennville, MI (42.597307, -86.151498) in August 2009 and 2010. Infested fruit were transported to the greenhouse at the University of Notre

Dame, and placed in wire cages above black planter trays that caught late-instar fly larvae as they exited fruits. As fly larvae pupated, we moved them into Petri dishes containing moist vermiculite, held them at room temperature (21 ± 2°C) for 10 days, and then moved them into a 4°C refrigerator for 4 months to simulate overwintering conditions (Forbes et al., 2009; Hood et al., 2015). After overwintering, we removed Petri dishes from the cold and placed them in several 15 cm x 15 cm x 25 cm Plexiglas cages supplied with water and a diet of honey mixed with brewer’s yeast inside a rearing chamber held at 24°C with a 14:10 light:dark cycle. Cages were monitored daily for eclosing R. pomonella flies and wasps. Upon eclosion, adult wasps were kept alive for 3-7 days, and then frozen at -80°C.

At a later date, preserved wasps were sexed and identified to the species level using the key of Wharton & Yoder (2015).

DNA isolation, library preparation, sequencing, and genome assembly

We selected a single male wasp and pooled female animals for DNA sequencing. Briefly, we ground animals in liquid nitrogen and lysed tissue in an SDS solution overnight with Proteinase K. We treated the homogenate with RNaseA, and we collected proteins/debris after high-salt precipitation and centrifugation. After ethanol precipitation, we resuspended the DNA in 10 mM Tris and subsequently evaluated extracted DNA on an agarose gel and by Qubit quantification.

We generated the following libraries for sequencing: a 500 bp shotgun library from a single male wasp, a 1.5 kb shotgun library from four pooled female wasps, a 5 kb mate-pair library using a 5 female + 1 male pooled sample, and 10 and 20 kb insert mate-pair libraries using DNA from 50 pooled mixed-sex wasps. Pooled samples were used to achieve the minimum DNA mass needed for library preparation. We prepared the 500 bp and 1.5 kb shotgun libraries with Illumina TruSeq DNAseq Sample Prep kits. We prepared the 5 and 10 kb mate-pair libraries using a similar protocol; however, we used a custom linker to ligate between the fragment ends to facilitate mate-pair recovery. We constructed the 20 kb mate-pair library with an Illumina Nextera Mate-Pairs Sample Prep kit. We sequenced all libraries for 100 cycles on an Illumina HiSeq2000 using Illumina

TruSeq SBS Sequencing kit v3. Bases were called with Casava v1.8 and the reads are available from the Short Read Archive at NCBI (SRR2042503, SRR2046752,

SRR2042775, SRR2043489, SRR2043491, SRR2043616, SRR2043618, SRR2043726,

SRR2041626).

We filtered mate-pair libraries for properly-oriented reads of the appropriate insert size and uniqueness using in-house custom pipeline scripts. We trimmed nucleotide-biased and low-quality bases from the 5' and 3' ends of raw Illumina reads using the FASTX Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Next, we error-corrected trimmed reads by library with Quake (Kelley et al. 2010), counting 19-mers. To minimize haplotype reconstruction issues during the initial de novo assembly, we used SOAPdenovo v2.04 (Luo et al. 2012) with K=49 to assemble contigs from the 500 bp-insert shotgun library reads sequenced from an individual haploid male. Following this step, we used SOAPdenovo to perform scaffolding with iteratively longer-insert shotgun and mate-pair libraries and subsequently used GapCloser v1.12 to close gaps generated in the scaffolding (Luo et al. 2012).
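The k-mer counting that underlies this kind of read error correction can be illustrated with a minimal Python sketch. It is not Quake's implementation; the function names `count_kmers` and `low_coverage_kmers`, the toy reads, and the `min_count` cutoff are hypothetical and serve only to show the idea that k-mers observed very rarely are candidate sequencing errors.

```python
from collections import Counter


def count_kmers(reads, k=19):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def low_coverage_kmers(counts, min_count=2):
    """Return k-mers seen fewer than min_count times (hypothetical cutoff)."""
    return {kmer for kmer, n in counts.items() if n < min_count}


if __name__ == "__main__":
    # Toy reads with k=3 so the result is easy to inspect by eye.
    toy_counts = count_kmers(["ACGTA", "ACGTT"], k=3)
    print(sorted(low_coverage_kmers(toy_counts)))
```

In practice the cutoff separating trusted from untrusted k-mers would be chosen from the observed k-mer coverage distribution rather than fixed in advance.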

In addition, to improve scaffolding of the genome assembly, we generated an

Illumina TruSeq Synthetic Long Read (TSLR) library with a TSLR Sample Prep kit from the same pooled DNA sample. We sequenced the library for 100 cycles on a HiSeq2000 and the resultant read data was analyzed by Illumina using their TruSeq Long Read

Assembly Application in “BaseSpace”. We added the resultant TSLR “reads” to the assembly using PBJelly v2 (English et al., 2012).

To annotate assembly scaffolds as potential contaminants, we generated a blobplot using BlobTools v1.0 (DOI 10.5281/zenodo.845347, Laetsch & Blaxter, 2017).

Using the ‘taxify’ module, we conducted BLASTn comparisons between the D. alloeum scaffolds in the assembly and the NCBI nt database, generating a ‘hits’ file. We used bwa-mem (Li et al., 2013) to align paired-end reads from the 500 bp insert library to the

D. alloeum genome to generate a BAM coverage file. We submitted the assembly file,

‘hits’ file, and BAM coverage file using the ‘create’ module to produce a database output file, which we then visualized using the ‘view’ module. We further inspected scaffolds with hits to non-Arthropod identifiers, and we identified sequences with aberrant GC content and coverage as putative contaminants.
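The screening logic described above (flagging scaffolds whose GC content and read coverage both fall outside the expected range) can be sketched as follows. This is a simplified illustration and not part of BlobTools; the function names, the input dictionaries, and the numeric ranges are hypothetical, and requiring both criteria to be aberrant is one reading of the procedure.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def flag_putative_contaminants(scaffolds, coverage, gc_range, cov_range):
    """Return names of scaffolds with both aberrant GC content and coverage.

    scaffolds: dict mapping scaffold name -> nucleotide sequence
    coverage:  dict mapping scaffold name -> mean read depth
    gc_range, cov_range: (low, high) tuples of values considered typical
                         (hypothetical thresholds chosen by the user)
    """
    flagged = []
    for name, seq in scaffolds.items():
        gc_aberrant = not (gc_range[0] <= gc_content(seq) <= gc_range[1])
        cov_aberrant = not (cov_range[0] <= coverage[name] <= cov_range[1])
        if gc_aberrant and cov_aberrant:
            flagged.append(name)
    return flagged
```

A call such as `flag_putative_contaminants(scaffolds, coverage, (0.2, 0.5), (20.0, 100.0))` would return only those scaffolds lying outside both illustrative ranges.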

RNA isolation, RNAseq library preparation, sequencing, assembly, and annotation

We prepared separate ground samples for ten wasps of each sex in 1 mL Trizol in glass tissue grinders and filtered tissue over a Qiagen Qiashredder column. We extracted the homogenates with chloroform and then precipitated RNA with linear polyacrylamide (10 mg/mL) and isopropanol. Next, we washed RNA pellets with 75% ethanol and resuspended them in RNase-free water. We quantified RNA with a Qubit RNA Broad Range

Assay Kit on a Qubit fluorometer (Life Technologies). We visualized RNA using ethidium bromide on a 1.0% agarose gel. We then prepared RNAseq libraries from an average cDNA fragment size of 250 bases using the Illumina TruSeq Stranded RNAseq

Sample Prep kits. We individually barcoded the two libraries and quantitated libraries using qPCR before pooling and sequencing from both ends with TruSeq SBS Sequencing kit v4 for 100 cycles on a HiSeq2500 instrument. Bases were called with Casava v1.82.

We trimmed reads as for DNA sequencing. We assembled the combined trimmed reads from males and females with Trinity (Release 2014-04-13)

(http://trinityrnaseq.github.io/) (Grabherr et al., 2011; Haas et al. 2013). The raw

RNAseq reads were submitted to the Sequence Read Archive (SRA) at NCBI (Accession

Numbers SRR2041626 and SRR2040481), while the Trinity transcriptome assemblies were submitted to the Transcriptome Shotgun Assembly database (Accession Numbers

GECN00000000.1). Annotation of the D. alloeum genome assembly was performed by the NCBI using their Eukaryotic Genome Annotation Pipeline

(https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/), with experimental support from the RNAseq and transcriptome.

Characterization of ultraconserved elements using BUSCO

The BUSCO tool (Simão et al., 2015) estimates the completeness of a genome assembly by providing a quantitative measure of the presence of single-copy orthologs. We retrieved Arthropod and Hymenoptera ortholog datasets from OrthoDBv9 (Zdobnov et al., 2017) which contain genes present in > 90% of the species within those respective

groups in OrthoDB. Using default parameters, we ran BUSCO v3 in genome mode on the D. alloeum assembly. We downloaded genome assemblies for other hymenopteran species from NCBI (A. mellifera, N. vitripennis, M. demolitor, Table 2-2). Using the same run parameters, we identified the presence of arthropod and hymenopteran

BUSCOs for published hymenopteran genome assemblies for comparative purposes.

Identification of orthologous gene clusters using OrthoVenn

The OrthoVenn web server (http://www.bioinfogenome.net/OrthoVenn/, Wang et al.,

2015) allows for comparison of whole-genome orthologous gene clusters, identification of group- and species-level cluster annotations, and characterization of enriched GO terms in gene clusters. We downloaded protein datasets for D. alloeum, A. mellifera, N. vitripennis, and M. demolitor from NCBI (Table 1). Using a custom script, we retained the longest isoform for sequences with multiple isoforms and removed shorter isoforms.
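The isoform-filtering step can be sketched in a few lines of Python. This is not the custom script used in this study; the function name `keep_longest_isoforms` and the `(gene_id, protein_id, sequence)` record layout are hypothetical, and the sketch assumes the gene-to-protein mapping has already been extracted from the annotation.

```python
def keep_longest_isoforms(records):
    """Retain one protein per gene: the longest isoform.

    records: iterable of (gene_id, protein_id, sequence) tuples
    Returns a dict mapping gene_id -> (protein_id, sequence).
    When two isoforms have equal length, the first one seen is kept.
    """
    longest = {}
    for gene_id, protein_id, seq in records:
        best = longest.get(gene_id)
        if best is None or len(seq) > len(best[1]):
            longest[gene_id] = (protein_id, seq)
    return longest


if __name__ == "__main__":
    toy = [
        ("gene1", "prot1a", "MKT"),
        ("gene1", "prot1b", "MKTLLV"),
        ("gene2", "prot2a", "MA"),
    ]
    for gene, (prot, seq) in keep_longest_isoforms(toy).items():
        print(gene, prot, len(seq))
```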

We uploaded the resulting files containing nonredundant protein sequences for all species to the OrthoVenn server. We used default parameters in the OrthoVenn run (i.e. an e-value cutoff of 1e-5 for protein-protein similarity comparisons and a cluster inflation value of 1.5). OrthoVenn uses a hypergeometric distribution to calculate the p-value for the enrichment of GO terms in protein clusters specific to individual species or species groups relative to all GO terms in the dataset. We examined GO term enrichment in clusters specific to D. alloeum and in clusters shared with M. demolitor, another braconid wasp.
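A hypergeometric enrichment test of the kind described above can be written out directly. The sketch below is a generic upper-tail calculation and is not taken from OrthoVenn; the function name and the parameter names (`k`, `n`, `K`, `N`) are hypothetical, and details such as the exact background set or any multiple-testing correction applied by the server are not represented.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: number of annotated items in the background set
    K: number of background items carrying the GO term
    n: number of items in the cluster set being tested
    k: number of items in the tested set carrying the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Toy numbers only: 5 of 10 background items carry the term,
    # and all 5 items in the tested set carry it.
    print(hypergeom_enrichment_p(5, 5, 5, 10))
```

Small p-values indicate that the term occurs in the tested set more often than expected from random draws of the same size from the background.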


Manual gene annotation

Oxidative phosphorylation genes

We used NOVOPlasty v2.6.3 (Dierckxsens et al., 2017) to assemble the mitochondrial genome of D. alloeum. To de novo assemble the mitochondrial sequence, we supplied raw reads from the 500 bp insert library and used the cytochrome c oxidase 1 gene (cox1) from a different individual D. alloeum wasp (Forbes et al., 2009) (GenBank EU881670.1) as a seed input. We used MUSCLE (Edgar, 2004) to align mitochondrial sequence regions of interest with Diachasmimorpha longicaudata sequences (Wei et al., 2010).

Using MUSCLE alignments and de novo predictions of protein-coding genes, rRNAs, and tRNAs from the MITOS web server (Bernt et al., 2013), we annotated 13 protein-coding genes, two rRNA sequences, and 22 tRNAs.

We searched for 68 nuclear OXPHOS genes in the D. alloeum assembly using queries from N. vitripennis, A. mellifera, and D. melanogaster (Porcelli et al., 2007; Gibson et al., 2010). To confirm putative nuclear OXPHOS genes, we performed a reciprocal BLAST search: we used annotated proteins from insects as queries to search the D. alloeum genome using tBLASTn, and subsequently performed BLASTx against the NCBI nr database to confirm the identity of candidate D. alloeum OXPHOS sequences.
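The confirmation logic of the reciprocal search can be summarized in a short Python sketch. It assumes the BLAST searches have already been run and reduced to two lookup tables; the function name, the dictionary layout, and the gene and locus labels in the example are hypothetical.

```python
def confirm_reciprocal_hits(forward_hits, reverse_top_hits):
    """Keep candidates whose reverse search recovers the original gene.

    forward_hits:     dict mapping query gene name -> candidate locus
                      (best tBLASTn hit in the target genome)
    reverse_top_hits: dict mapping candidate locus -> gene name of the
                      best BLASTx hit in the reference database
    Returns a dict of confirmed query gene -> candidate locus.
    """
    confirmed = {}
    for gene, candidate in forward_hits.items():
        if reverse_top_hits.get(candidate) == gene:
            confirmed[gene] = candidate
    return confirmed


if __name__ == "__main__":
    forward = {"geneA": "scaffold12:1000-3000", "geneB": "scaffold40:50-400"}
    reverse = {"scaffold12:1000-3000": "geneA", "scaffold40:50-400": "unrelated"}
    print(confirm_reciprocal_hits(forward, reverse))
```

Matching on an exact gene name is a simplification; in a manual workflow the reverse hit is judged by whether it belongs to the expected gene or gene family.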

We generated gene models on the D. alloeum WebApollo portal (https://apollo.nal.usda.gov/diaall/jbrowse/), annotating start site, stop site, and exon-intron boundaries for candidate D. alloeum OXPHOS nucleotide sequences. To assist with annotations, we incorporated predictions from GeneWise (Birney et al., 2004) using orthologous OXPHOS proteins from other insects. We assigned genes to the OXPHOS enzyme complexes I-V based on information from MitoComp2 (http://www.mitocomp.uniba.it/, Porcelli et al., 2007).

Chemosensory genes

We used a modified reciprocal BLAST strategy to identify chemosensory-related genes in D. alloeum. Chemoreception in insects is mediated by three major families of receptors: odorant receptors (ORs), gustatory receptors (GRs), and ionotropic receptors (IRs) (Clyne et al., 1999; Clyne et al., 2000; Benton et al., 2009). In addition, two major families of water-soluble proteins are responsible for transport and/or quenching of ligands to chemosensory receptors: odorant binding proteins (OBPs) and chemosensory proteins (CSPs) (Vieira & Rozas, 2011; Pelosi et al., 2014b; Larter et al., 2016). We obtained chemosensory family protein sequences from published data from N. vitripennis, A. mellifera, M. demolitor, and M. mediator. We conducted tBLASTn searches of the D. alloeum genome using hymenopteran proteins as queries. For gene families with highly divergent sequence (i.e. ORs, GRs, and IRs), we used the BLOSUM45 substitution matrix to execute searches. All hits with an e-value less than 1e-5 were manually inspected. We performed BLASTx searches on putative D. alloeum chemosensory gene nucleotide sequences against the NCBI nr database and retained genes with top hits matched to the appropriate gene family.
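The e-value filter applied before manual inspection can be illustrated with a small parser. The sketch assumes the search results were saved in BLAST's 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score); the output format actually used is not stated here, and the function name and example rows are hypothetical.

```python
def filter_hits_by_evalue(lines, max_evalue=1e-5):
    """Return (query, subject, evalue) for hits with e-value below the cutoff.

    lines: iterable of tab-separated rows in the assumed 12-column layout,
           where the e-value is the 11th column.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            hits.append((fields[0], fields[1], evalue))
    return hits


if __name__ == "__main__":
    rows = [
        "q1\tscaf1\t45.0\t300\t150\t5\t1\t300\t1000\t1900\t1e-20\t250",
        "q1\tscaf2\t30.0\t80\t50\t2\t1\t80\t500\t740\t0.5\t30",
    ]
    print(filter_hits_by_evalue(rows))
```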

We constructed gene models on the D. alloeum WebApollo portal

(https://apollo.nal.usda.gov/diaall/jbrowse/). We compared hymenopteran proteins with the best tBLASTn score to D. alloeum nucleotide sequences using GeneWise (Birney et al., 2004). Protein sequences were aligned using the CLUSTALW plugin on the

Geneious bioinformatics software platform (Kearse et al., 2012). We retained sequences

with at least ~50% of the characteristic length of the protein family (ORs/GRs/IRs > 200 aa, OBPs/CSPs > 75 aa), and we performed iterative rounds of alignment and reannotation to fix poorly aligning regions. Final gene models included start sites, stop sites, and exon-intron boundaries. We assigned functional classification to gene models using InterProScan (Zdobnov & Apweiler, 2001).
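The length criterion for retaining gene models can be expressed as a simple lookup. The thresholds below are those given in the text; the dictionary name `MIN_LENGTH_AA` and the function name are hypothetical.

```python
# Minimum protein length (amino acids) per gene family, as stated in the text.
MIN_LENGTH_AA = {"OR": 200, "GR": 200, "IR": 200, "OBP": 75, "CSP": 75}


def passes_length_filter(family, protein_seq):
    """True if the protein is longer than the minimum for its gene family."""
    return len(protein_seq) > MIN_LENGTH_AA[family]


if __name__ == "__main__":
    print(passes_length_filter("OR", "M" * 250))   # long enough for an OR
    print(passes_length_filter("CSP", "M" * 60))   # too short for a CSP
```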

We followed the naming convention for ORs and GRs in D. alloeum largely based on previously published works on chemoreceptors in model hymenopteran species (Robertson & Wanner, 2006; Robertson et al., 2010). Genes were named with a four-letter prefix corresponding to the species name, followed by the gene family name, followed by a number (e.g. DallOr1, DallOr2, DallOr3, etc. and DallGr1, DallGr2, DallGr3, etc.). We named DallOrs and DallGrs in ascending order corresponding to sequence orthologs in N. vitripennis and A. mellifera; however, extensive gene duplication and loss in these gene families prevented a consistent naming scheme.

We named IRs following the conventions outlined in Croset et al. (2010). We categorized and named IRs into “antennal IRs” and “divergent IRs.” The “antennal IRs” represent a collection of well-conserved IRs that were first characterized in the antenna of D. melanogaster (Benton et al., 2009). We generated alignments for IRs using sequences from D. melanogaster and various hymenopteran species (tree not shown). We assigned the same name to unambiguous orthologs of D. melanogaster “antennal IRs” (e.g. DmelIr8a = DallIr8a). In the event that multiple copies of an IR sequence ortholog exist for D. alloeum, we assigned the genes the same base name followed by a point and a number (e.g. DallIr64a, DallIr64a.2). We numbered Diachasma-specific IRs and those with unclear orthology in ascending order, starting with DallIr101. The “divergent IRs” have low intraspecific and interspecific sequence identity and are expressed in gustatory organs of D. melanogaster (Croset et al., 2010).
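The IR naming rules above can be written as a small function. This is an illustrative sketch only; the function name and input format are hypothetical, and numbering a third copy as `.3` is an assumption extrapolated from the `DallIr64a.2` example.

```python
def name_ir_genes(ortholog_assignments):
    """Assign D. alloeum IR names following the rules described in the text.

    ortholog_assignments: list with one entry per gene, in annotation order.
        Each entry is the name of the unambiguous D. melanogaster ortholog
        (e.g. "DmelIr8a"), or None when orthology is unclear or the gene is
        Diachasma-specific.
    """
    names = []
    copies = {}          # base name -> number of copies seen so far
    next_specific = 101  # lineage-specific IRs are numbered from 101

    for dmel_name in ortholog_assignments:
        if dmel_name is None:
            names.append(f"DallIr{next_specific}")
            next_specific += 1
            continue
        base = "Dall" + dmel_name[len("Dmel"):]
        copies[base] = copies.get(base, 0) + 1
        n = copies[base]
        # First copy keeps the base name; later copies get ".2", ".3", ...
        names.append(base if n == 1 else f"{base}.{n}")
    return names


if __name__ == "__main__":
    print(name_ir_genes(["DmelIr8a", "DmelIr64a", "DmelIr64a", None, None]))
```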

We adapted naming conventions for OBPs and CSPs proposed by Forêt & Maleszka (2006) for A. mellifera genes. We used a four-letter Dall prefix followed by a three-letter abbreviation that indicates what family the gene belongs to (OBP/CSP). We appended numbers at the end of each gene, with tandemly-arranged genes receiving sequential numbers.

Sex determination genes

To search for orthologs of doublesex, transformer/feminizer, and csd in the D. alloeum genome, we used the reciprocal BLAST strategy described above for OXPHOS and chemosensory genes. Genes used as initial BLAST queries are provided in

Supplementary Table 2-S1.


RESULTS & DISCUSSION

Quality assessment of genome assembly

Libraries from a combination of single and pooled wasp samples contained 182.88 GB total sequence data. The de novo genome assembly Dall1.0 (GenBank

GCA_001412515.1) had 3,968 scaffolds with a total scaffold length of 388.8 Mb and a scaffold N50 of 645,583 bp (Table 2-1). 1,732 scaffolds contained non-Arthropod taxonomic identifiers using BlobTools. Of these, 491 had a Proteobacteria taxonomic identifier. Many of these hits contained higher GC content than a typical Arthropod scaffold, providing additional evidence that these are contaminating sequences

(Supplementary Figure 2-S1). The NCBI Eukaryotic Gene Prediction Pipeline predicted

12,837 protein coding genes and 19,692 mRNA transcripts (Table 2-1), with 16,744

(~85%) of the predicted transcripts having complete EST support and an additional 2,305

(~11%) genes supported with ab initio predictions.
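For readers unfamiliar with the scaffold N50 statistic reported above, the following sketch shows one common way to compute it from a list of scaffold lengths: the N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name is hypothetical, and this is not the tool used to produce the reported value.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached at least half of the total length
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Toy example: total length 20, so the N50 is reached at the second-longest scaffold.
    print(n50([2, 3, 4, 5, 6]))
```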

A common metric used to assess the relative completeness of a genome assembly is the identification of conserved single-copy genes, performed using BUSCO v3 (Simão et al., 2015). We identified 1053 of 1066 (~99%) arthropod BUSCOs and 4162 of 4415

(~94%) of hymenopteran BUSCOs in the D. alloeum genome assembly (Table 2-2, Table

2-3). Details of the BUSCO results are provided in Supplementary Data 2-S2.

The recovery of complete BUSCOs is comparable to amounts recovered for A. mellifera, N. vitripennis, and M. demolitor (Table 2-4). As the number of genes contained by > 90% of OrthoDB taxa is expected to be higher in a narrower taxonomic group (e.g.

Hymenoptera relative to Arthropoda), we would also expect spurious gene loss and duplication in species for this expanded gene set. As such, we recovered a higher number

of duplicate, fragmented, and missing Hymenoptera BUSCOs relative to Arthropoda BUSCOs. When we combined missing and fragmented BUSCOs into an “incomplete” category, we found a considerable percentage of Arthropoda (3/13, 23%) and Hymenoptera (93/253, 37%) BUSCOs that were incomplete in the D. alloeum assembly that were also incomplete in at least one other hymenopteran surveyed. The most frequent pattern we observed was genes that were incomplete in M. demolitor, which we identified in 3/13 (23.08%) Arthropoda BUSCOs and 78/253 (30.83%) Hymenoptera BUSCOs that also were incomplete in D. alloeum. Although assembly fragmentation may preclude the ability of BUSCO to retrieve the appropriate sequences, an alternate explanation is that these genes were lost prior to the divergence between Diachasma and Microplitis.

Overall, the identification of BUSCOs in the D. alloeum genome is comparable and often superior to gene content present in the scaffold assembly of M. demolitor and the chromosomal-level assemblies of A. mellifera and N. vitripennis.
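The tally of incomplete BUSCOs shared between assemblies amounts to a set intersection, which can be sketched as follows. The function names and the example identifiers are hypothetical; the sketch assumes that the identifiers of missing and fragmented BUSCOs have already been collected for each assembly.

```python
def shared_incomplete(focal, others):
    """Summarize overlap of incomplete BUSCOs between a focal assembly and others.

    focal:  set of BUSCO identifiers incomplete in the focal assembly
    others: dict mapping species name -> set of incomplete BUSCO identifiers
    Returns (count shared with at least one other species,
             dict of species -> count shared with that species).
    """
    in_any_other = set().union(*others.values())
    shared_any = focal & in_any_other
    per_species = {species: len(focal & ids) for species, ids in others.items()}
    return len(shared_any), per_species


def percent(part, whole):
    """Percentage rounded to two decimal places."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    focal = {"b1", "b2", "b3", "b4"}
    others = {"speciesA": {"b1", "b9"}, "speciesB": {"b1", "b2"}}
    n_shared, by_species = shared_incomplete(focal, others)
    print(n_shared, by_species, percent(n_shared, len(focal)))
```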

Annotation of repertoire of intact oxidative phosphorylation genes in D. alloeum

We retrieved one large contig of 16,419 bp containing a complete set of 13 protein-coding genes, 22 tRNAs, and two rRNAs from the NOVOPlasty output, representing the mitochondrial genome of D. alloeum (Supplementary Data 2-S3). The total sequence length falls within the range of other complete braconid wasp mitochondrial genomes

(15,000-19,000 bp; Wei et al., 2010). We discarded four small contigs < 1,000 bp corresponding to mitochondrial locations surrounding the A+T-rich region. By default,

NOVOPlasty lists possible combinations of small contigs to produce a complete circular genome. Without evidence a priori to favor certain arrangements over others, we chose to discard these small contigs and retain only the single high-confidence contig.


All D. alloeum mitochondrial protein-coding genes and RNAs have normal sizes and are complete. D. alloeum sequences have an identical spatial arrangement to those of the braconid wasp Diachasmimorpha longicaudata (Figure 2-1; Wei et al., 2010). All tRNAs had typical cloverleaf structures predicted using the MITOS web server (http://mitos.bioinf.uni-leipzig.de/index.py; Bernt et al., 2013). No secondary structure analyses were performed on rRNAs, but BLASTn searches of these sequences confirmed their identity.

We annotated 65 of 68 nuclear-encoded mitochondrial genes in the D. alloeum genome (Supplementary Data 2-S4). The three missing genes (B15 and MNLL in complex I, polypeptide VIC in complex IV) are all present in A. mellifera, but not found in the N. vitripennis genome (Porcelli et al., 2007; Gibson et al., 2010) indicating these genes may have been lost after the emergence of parasitoidism in Hymenoptera. Of these

65 genes, we characterized 62 genes as full-length: amino acid sequences are intact and have similar lengths to homologs in N. vitripennis, A. mellifera, and D. melanogaster. We generated truncated models for the remaining three genes. We also found gene duplicates for four genes (flavoprotein and iron-sulfur subunits in complex II, subunit IV and polypeptide VIA in complex IV). These duplicates have RNASeq support and considerable homology (>30% sequence identity) with duplicated sequences in A. mellifera and D. melanogaster.

Diachasma alloeum could serve as an organismal system to evaluate patterns of molecular evolutionary rate in mitochondrial and nuclear genes in the OXPHOS pathway.

Recently it has been demonstrated that both mitochondrial and nuclear OXPHOS genes display higher rates of amino acid substitutions in Hymenoptera relative to other insect orders (Li et al., 2017). Accelerated molecular evolution in nuclear OXPHOS genes may be a consequence of positive selection acting on compensatory changes in response to mutation accumulation in mitochondrial OXPHOS genes and/or reflect relaxation of functional constraint on certain regions of these proteins (Rand et al., 2004; Zhang & Broughton, 2013). Elevated rates in some nuclear-encoded OXPHOS genes have been implicated in mitochondrial-nuclear incompatibility between closely-related species of

Nasonia (Gibson et al., 2010). The OXPHOS gene set could be used to assess the presence/absence of postzygotic reproductive barriers between populations of D. alloeum utilizing different host plants (Forbes et al., 2009). In addition, co-transmission of nuclear and mitochondrial genomes as a single unit in asexuals may influence patterns of mutation accumulation in both genomes, and comparative studies have demonstrated higher mutational loads in asexual mitochondrial genomes relative to their sexual counterparts (Normark & Moran, 2000; Neiman et al., 2009; Sharbrough et al., 2018). In another species of Diachasma that has experienced recent loss of sex, various phenotypically different lineages of asexual wasps based on distinct mtDNA haplotypes have been identified (Forbes et al., 2013). The extent to which genetic variation is generated and transmitted in asexual mitochondrial genomes may provide insight into the adaptability and long-term persistence of this asexual species.

Ortholog groups are shared between D. alloeum and other hymenopterans

Using a custom script, we reduced the original set of 19,692 proteins included in the NCBI annotation dataset to 12,789 nonredundant proteins. Submission of this dataset along with similar datasets for A. mellifera, N. vitripennis, and M. demolitor to the OrthoVenn web server produced a total of 10,592 clusters (Figure 2-2). Of these, 9,619 clusters were identified as orthologous clusters (containing at least two species), and 7,276 clusters were shared by all four species in this dataset. Interestingly, many enriched GO terms are relevant to wasp-host fruit interactions (Figure 2-2). Proteins in the 279 clusters specific to D. alloeum are enriched for GO terms associated with olfactory processes, and clusters with aromatase activity GO terms contain cytochrome P450s involved in the metabolism of toxic plant compounds. In addition, the top enriched GO terms included several relevant to transposable element activity (DNA integration, RNA-directed DNA polymerase, nucleic acid binding, endonuclease activity), suggesting there might be high levels of transposable element content in the D. alloeum genome. GO terms enriched in the 260 clusters shared between D. alloeum and M. demolitor included terms associated with perception of taste, suggesting that lineages of genes that mediate these traits (e.g. gustatory receptors) have recently expanded in braconid wasps or have been lost in N. vitripennis and A. mellifera (Table 2-5).
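The cluster categories reported above (orthologous, shared by all four species, D. alloeum-specific, shared only by the two braconids) can be expressed as a simple tally over species membership. The sketch below is illustrative only: it does not reproduce OrthoVenn itself, the "prefix|accession" identifier format is a hypothetical convention, and reading "shared between D. alloeum and M. demolitor" as "containing only those two species" is an assumption.

```python
from collections import Counter

# Species prefixes are hypothetical labels for the four proteomes.
SPECIES = frozenset({"Dall", "Mdem", "Nvit", "Amel"})


def species_of(protein_id: str) -> str:
    """Return the species prefix of an ID such as 'Dall|p1' (assumed format)."""
    return protein_id.split("|", 1)[0]


def classify_clusters(clusters):
    """Tally clusters by which species contribute at least one member."""
    counts = Counter()
    for members in clusters:
        present = frozenset(species_of(m) for m in members)
        counts["total"] += 1
        if len(present) >= 2:
            counts["orthologous"] += 1  # at least two species represented
        if present == SPECIES:
            counts["shared_by_all"] += 1
        if present == {"Dall"}:
            counts["Dall_specific"] += 1
        if present == {"Dall", "Mdem"}:
            counts["Dall_Mdem_only"] += 1
    return counts


# Toy input: each inner list is one cluster of protein IDs.
toy_clusters = [
    ["Dall|p1", "Mdem|p1", "Nvit|p1", "Amel|p1"],
    ["Dall|p2", "Dall|p3"],
    ["Dall|p4", "Mdem|p2", "Mdem|p3"],
    ["Nvit|p2", "Amel|p2"],
]
summary = classify_clusters(toy_clusters)
```

With real OrthoVenn output, the same tally would be expected to return the totals quoted in the text, provided the category definitions match those used by the server.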

Expansion of species-specific chemosensory genes in D. alloeum

We annotated a total of 326 putative chemosensory genes in D. alloeum. All models had confirmed BLASTx hits with members of their respective protein families, and most intact (non-pseudogene) models had at least one characteristic InterPro domain associated with the protein family to which they belong. Sequences for chemosensory genes are available as Supplementary Data 2-S5.

We found 201 odorant receptors in the D. alloeum genome (Table 2-6). The number of D. alloeum ORs is comparable to that of A. mellifera (174 AmelOrs), N. vitripennis (301 NvitOrs), and M. demolitor (222 MdemOrs), consistent with the expansion of this gene family in Hymenoptera (Robertson & Wanner, 2006; Robertson et al., 2010; Zhou et al., 2015). Of the genes we identified, 192 were intact, while 14 are putative pseudogenes. The 192 intact ORs may be an overestimate, as this includes some truncated models with missing N-terminal or C-terminal ends that may actually represent pseudogenes. Since we did not include any fragments less than half of the length of a typical OR gene (< 200 amino acids), it is unlikely that truncated annotations represent the N and C termini of the same gene. Of the intact genes annotated in this study, 69 models were correctly predicted by the NCBI pipeline, 71 were changed, and 47 were unannotated. All intact ORs had InterPro domains characteristic for the OR family (IPR004117). The majority of DallOrs (159/201) are present in tandem repeats, an observation consistent with spatial patterns of these genes in other hymenopterans (Robertson & Wanner, 2006; Robertson et al., 2010; Zhou et al., 2015).
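One way to assign genes to tandem repeats from scaffold coordinates is sketched below. The 50,000 bp spacing follows the distance quoted for CSPs later in this chapter; applying the same cutoff to ORs is an assumption, as are the function name, the input tuple layout, and the toy coordinates.

```python
from collections import defaultdict


def tandem_arrays(genes, max_gap=50_000):
    """Group genes on the same scaffold into tandem arrays.

    genes: iterable of (name, scaffold, start, end) tuples.
    A gene joins the current array when its start lies within max_gap bp
    of the furthest end coordinate seen so far in that array.
    Returns a list of arrays (lists of gene names) with two or more members.
    """
    by_scaffold = defaultdict(list)
    for name, scaffold, start, end in genes:
        by_scaffold[scaffold].append((start, end, name))

    arrays = []
    for scaffold in sorted(by_scaffold):
        ordered = sorted(by_scaffold[scaffold])  # sort by start coordinate
        current = [ordered[0]]
        for gene in ordered[1:]:
            furthest_end = max(g[1] for g in current)
            if gene[0] - furthest_end <= max_gap:
                current.append(gene)
            else:
                if len(current) > 1:
                    arrays.append([g[2] for g in current])
                current = [gene]
        if len(current) > 1:
            arrays.append([g[2] for g in current])
    return arrays


# Toy coordinates (hypothetical gene names and positions).
toy_genes = [
    ("OrA", "scaf1", 1_000, 2_500),
    ("OrB", "scaf1", 10_000, 11_800),
    ("OrC", "scaf1", 400_000, 401_500),
    ("OrD", "scaf2", 5_000, 6_200),
]
arrays = tandem_arrays(toy_genes)
n_in_tandem = sum(len(a) for a in arrays)
```

A sketch like this cannot tell whether a lone gene near a scaffold edge belongs to an array that continues on another scaffold, which is the limitation noted for fragmented assemblies.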

The phylogeny of OR proteins is shown in Figure 2-3. The sole 1:1 ortholog group contained DallOr1, MdemOr1, NvOr1, and AmOr2; these orthologs of DmOr83 serve as OR coreceptors (ORCOs) and are widely conserved across insects (Krieger et al., 2003). All other DallOrs exhibit complicated patterns of gene duplication and loss relative to ORs in other surveyed hymenopterans. We found OR subfamilies that were present in the three parasitoid wasp species (D. alloeum, M. demolitor, N. vitripennis) that were either absent or greatly reduced in A. mellifera (Figure 2-4A). Conversely, the tandem duplication in AmelOr1-61 represents a substantial expansion relative to the orthologous 15 gene tandem repeat in D. alloeum (DallOr2-16, Figure 2-4F). The relationships among members in this tandemly repeated subfamily are complicated by the lack of monophyly among species-specific ORs, and orthologous sequences in N. vitripennis are located on separate chromosomal locations (Robertson et al., 2010). In

comparisons of parasitoid wasps, some ortholog groups show one-to-many relationships among ORs, such as the placement of DallOr196 with MdemOr155-167 (Figure 2-4D). There are also major subfamily expansions completely absent in D. alloeum, such as the 75-gene group of N. vitripennis (Figure 2-4C).

The OrthoVenn analysis provided details on gene clusters putatively specific to D. alloeum. We re-analyzed these clusters after manual annotation of D. alloeum ORs. We found one cluster (DallOr114-122) that had no obvious orthologous subfamily in other species in the phylogenetic tree and may represent a bona fide expansion in a recent wasp lineage (Figure 2-4B). However, we also identified multiple examples of clusters that grouped with subfamilies from other species, e.g. DallOr192-197 and DallOr17-26

(Figure 2-4D, Figure 2-4E). Identification of synteny on one side of these tandem duplicates between D. alloeum and other hymenopterans provides strong support that these are orthologous gene clusters. Overall, it is possible that D. alloeum-specific cluster identification and associated GO-term enrichment is influenced by large sequence divergence between hymenopteran OR sequences. Assessing legitimate expansions in D. alloeum is complicated by the indeterminate orthology of clusters and fragmentation of genome assemblies. Nevertheless, the expansion of OR subfamilies consistent with other parasitic wasps reflects the complex chemical ecology underlying host finding and mating behaviors in D. alloeum.

The gustatory receptor (GR) family composition varies considerably across Hymenoptera, and the set of 40 GRs we identified in D. alloeum is larger than the 10 GRs identified in A. mellifera (Robertson & Wanner, 2006) but smaller than the 86 identified in M. demolitor and 58 identified in N. vitripennis (Table 2-6; Robertson et al., 2010; Zhou et al., 2015). DallGr28 was the only gene identified as a putative pseudogene, and among the 39 intact genes, only four were accurately predicted by NCBI. We changed seven models and annotated 28 new models in this study. All annotated GRs possess the IPR013604 InterPro domain specific to seven-transmembrane chemoreceptors. A majority (23 of 40) of these genes are arranged in tandem repeats in D. alloeum, consistent with observations of spatial patterns of GRs in N. vitripennis (Robertson et al., 2010). In addition, while not quantified, there was a notable representation of DallGrs not assigned to tandem groups that were near edges (> 50,000 bp) or on scaffolds < 10 kbp.

A phylogeny of DallGrs, MdemGrs, NvGrs, and AmGrs is shown in Figure 2-5.

Simple 1-to-1 orthologous groupings were recovered for DallGr1 and DallGr2, which are putative sugar receptors, and DallGr3, which is related to the fructose receptor of flies

(Robertson et al., 2010). We also recovered DallGr5 as a simple ortholog with the three other hymenopterans (AmelGr6, NvitGr6, MdemGr76). DallGr4 has high sequence identity (> 70%) with a recently duplicated GR in N. vitripennis (NvitGr4-5), although surprisingly this group did not include an ortholog from M. demolitor. DallGr6 has strong support for placement with tandem duplicates in A. mellifera (AmelGr8-9) and N. vitripennis (NvitGr8-9), although the orthology of these genes is unclear. The grouping of

DallGr40 with MdemGr66 has high bootstrap support and represents a likely conserved receptor in these braconid wasps.

Similar to observations in ORs, extensive gene duplication and loss is present in the GR phylogeny. There is strong support for a subfamily expansion containing three truncated D. alloeum gene models DallGr7-9 and four N. vitripennis genes NvitGr11-14.

Low sequence identity among these genes (15-21%) might suggest an expansion event in a distant wasp ancestor and subsequent loss events in D. alloeum and M. demolitor. We found 30 GRs in D. alloeum grouped with 62 M. demolitor genes with indeterminate orthology. Fragmentation of the genome assembly in these regions makes it difficult to use microsynteny analyses to assess orthology and subsequent subfamily expansion in D. alloeum and M. demolitor. There is strong phylogenetic support for the placement of this group sister to a 32 gene subfamily in N. vitripennis (NvitGr15-47), a group that contains a single pseudogene in A. mellifera (AmelGr11PSE); this is consistent with the observations of Robertson et al. (2010) suggesting loss of these GR lineages in honeybee. The D. alloeum and M. demolitor genes in this orthologous group are members of clusters contributing to the enrichment of gustatory-related GO terms in the OrthoVenn analysis. However, as stated above, our annotation of many new GR models in D. alloeum permits a more robust view of GR family expansion in braconid wasp ancestors and within Diachasma and Microplitis lineages. There is also evidence for GR lineage loss in D. alloeum. An ortholog group with N. vitripennis (NvitGr48-58), A. mellifera (AmelGr7, AmelGr12, AmelGrX-ZPSE), and M. demolitor (MdemGr51-58, MdemGr72) genes had no apparent ortholog in D. alloeum.

Overall, our phylogenetic reconstruction suggests that many GR lineages are the result of extensive gene duplications and differentiation following divergence from an ancestral parasitoid wasp (Diachasma-Microplitis-Nasonia ancestor) and from an ancestral braconid wasp (Diachasma-Microplitis ancestor). The low sequence similarity among GR sequences within and between species makes it difficult to resolve orthologous relationships among these genes and the timing of duplication/loss events.

We identified a total of 56 ionotropic receptors in D. alloeum, representing a considerable expansion relative to A. mellifera and M. mediator (Table 2-6). We found 15 “antennal IRs” contained in ortholog groups and 41 “divergent IRs” specific to D. alloeum. We confirmed gene predictions for 12 IRs, changed 12 models, and generated 32 new gene annotations. Of the 56 IR genes, 37 (~66%) were arranged in tandem arrays, including repeated antennal IRs (e.g. DallIr64a-a.6, DallIr75u-u.3) and divergent IRs (e.g. DallIr101-108, DallIr109-114). All annotated IRs, with the exception of DallIr124-141, contain at least one of the ligand-gated ion channel domains (PF10613 and/or PF00060). Most IRs, with the exception of the basal IRs DallIr8a and DallIr25a, lack the characteristic amino-terminal domain (PF01094) of iGluRs (Croset et al., 2010).
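The IR counts reported above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this paragraph; the variable names are illustrative.

```python
# Counts taken directly from the IR paragraph above.
ir_total = 56
by_class = {"antennal": 15, "divergent": 41}
by_annotation = {"confirmed": 12, "changed": 12, "new": 32}
in_tandem = 37

# Both partitions should account for every annotated IR.
class_sum = sum(by_class.values())
annotation_sum = sum(by_annotation.values())

# Share of IR genes located in tandem arrays, as a percentage.
tandem_pct = 100 * in_tandem / ir_total
```

Both partitions sum to 56, and 37 of 56 corresponds to roughly 66%, matching the value given in the text.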

Phylogenetic analysis of IR sequences shows strong bootstrap support for all antennal IRs within their corresponding ortholog groups (Figure 2-6). Our phylogeny supports the placement of single-copy DallIr8a, DallIr68a, DallIr76b, and DallIr93a in 1:1 ortholog groups. DallIr25a groups with Ir25a orthologs from other species; this gene is duplicated in M. mediator. DallIr21a groups with orthologs in N. vitripennis and M. mediator, while this gene has apparently been lost in A. mellifera. We recovered six genes in the DallIr64a clade with three orthologs from N. vitripennis and two orthologs from M. mediator. The three DallIr75u genes group with two orthologs from N. vitripennis and one each from A. mellifera and M. mediator. The divergent IR genes show more complicated patterns of gene duplication and loss, similar to those of the ORs and GRs. Of the 41 divergent IRs, 31 are contained in two large groups: an 18 gene grouping with a single M. mediator ortholog (DallIr101-118) and a 13 gene grouping with a large N. vitripennis clade (DallIr124-136P+F). These subfamilies may have independently expanded in Diachasma/Nasonia or alternatively were lost in Microplitis. DallIr119, DallIr120, and DallIr121 each group with orthologs from other parasitic wasps, and DallIr122/123 represent a recent duplication event specific to the Diachasma lineage. DallIr137PSE-140 are nested within a large N. vitripennis IR expansion with indeterminate orthology.

We found 15 odorant binding proteins (OBPs) in the D. alloeum genome (Table 2-6). Gene content for this family is relatively consistent among hymenopteran insects studied here, with the exception of the substantial expansion to 98 OBPs in N. vitripennis (Viera et al., 2012). OBPs are highly divergent, with pairwise comparisons between D. alloeum amino acid sequences having as low as 8% identity. Additionally, there is high interspecies variability across OBPs, with pairwise comparisons in the 3-99% identity range. Despite extensive sequence variability, all D. alloeum OBPs contain InterPro hits representative of this family (IPR006170/IPR036728). In addition, OBPs are readily identifiable by the presence of six well-conserved cysteines that form three disulfide bridges as a major structural component of these proteins (Tegoni et al., 2004, Pelosi et al., 2014a). All annotated OBPs in D. alloeum contain the canonical six-cysteine structure. Nine DallObps are in tandem repeats; however, the notable number of single genes and tandem repeats near edges of scaffolds prevents a comprehensive evaluation of this pattern in the D. alloeum genome.

The phylogenetic relationships of OBPs in hymenopterans investigated in this study are shown in Figure 2-7. We recovered DallObp1 and DallObp2 in two separate ortholog groups containing a single gene from each species. Aside from some species-specific OBP expansions, phylogenetic support for ortholog groupings is generally poor. Although these genes are broadly conserved across insect species and have demonstrable function in insect olfaction, the complicated patterns of gene duplication and loss and fragmentation of these genomes make it difficult for us to infer orthology among surveyed OBPs.

We identified nine chemosensory proteins (CSPs) in D. alloeum, similar to the number of CSPs found in other hymenopteran insects (Table 2-6). We confirmed all CSP gene predictions made by NCBI, and no models were newly generated. Consistent with observations in Forêt et al. (2007) regarding tandem arrangement of CSPs in model insect species, five of nine genes were within 50,000 bp of another CSP, with one additional gene near the edge of a small scaffold. All CSP genes contain an InterPro domain specific to this gene family (IPR005055). CSPs are more conserved relative to other chemosensory gene families, with pairwise sequence identity ranging from 15% to 58% among D. alloeum homologs. In addition, all D. alloeum CSPs possess four conserved cysteine residues that form two disulfide bridges in these proteins (Tegoni et al., 2004).

The phylogeny of CSPs in select hymenopteran insects is shown in Figure 2-8. We recovered only a single monophyletic group containing a single gene from each insect (DallCsp2, Nvit_NV16077, MmedCsp3, AmelCsp1), suggesting that the rest of this family has experienced lineage-specific gene duplication and/or loss. The interspecific genetic distances are lower for CSPs than for other chemosensory families, with pairwise identity values ranging from 11-68%. Overall, this family shows more consistent gene numbers and broader conservation across hymenopteran species.

In summary, this gene set is an important resource for future studies of the evolutionary history of Diachasma chemosensory genes. First, it is critical to ascertain the members of the D. alloeum chemosensory repertoire that operate specifically in chemosensory behavior. While the families are generally well conserved across insects, the challenge of orthology assessment and the limited functional study of these genes make it difficult to estimate the precise chemosensory inventory of D. alloeum. ORs operate specifically in odorant recognition, and the expansion of OR genes in insects may have been adaptive during the transition to terrestrial life (Robertson et al., 2003, but see Missbach et al., 2014). Although relatively understudied, the IR family has a likely protostome origin, and conservation of multiple orthologs initially identified in D. melanogaster suggests an important function of IR genes in olfaction across insects (Rytz et al., 2013). Conversely, the origin of GRs dates back to the Placozoa, and GR-like genes in basal animals function in development, not chemosensation (Robertson, 2015, Saina et al., 2015). The OBP and CSP transporter families have roles in chemical ligand delivery to chemosensory receptors but also function in release of pheromones, reproductive processes, and embryonic development (Pelosi et al., 2018). Transcriptome datasets used for D. alloeum gene predictions were taken from pooled whole male and female wasps, so we cannot exclude the possibility that some genes have non-chemosensory roles. Future studies should incorporate tissue-specific RNA datasets to provide stronger support for genetic components of chemosensation in D. alloeum.

Second, chemosensory genes are candidates for differential selective regimes in apple and hawthorn populations of D. alloeum. The ability of wasps to discriminate host fruit odors is critical, as fruits are the sites for mate-finding and egg-laying (Forbes et al., 2009). Rhagoletis pomonella host flies use olfactory cues from ripening fruit to identify suitable sites for mating and oviposition (Linn et al., 2003). Like R. pomonella, D. alloeum parasitoids have demonstrated odor preferences for their host fruits, representing a potential prezygotic reproductive barrier preventing mating between wasp populations utilizing different hosts (Forbes et al., 2009). Evolutionary rate and differential expression analyses of chemosensory genes in D. alloeum populations are potential areas of inquiry.

Third, chemosensory gene evolution could be influenced by transitions in reproductive strategies in Diachasma. Wasp courtship is mediated by the male perception of sex pheromones produced by females (Boush & Baerwald, 1967). Across arthropods, chemosensory genes demonstrate differential expression in males and females (e.g. Zhou et al., 2012, Shiao et al., 2013, Eyun et al., 2017). Chemosensory genes showing strong sex bias may be candidates for degradation in an asexual genome, such as those involved in female signaling or male recognition of mate signals (Normark et al., 2003). Future studies could assess sex-specific expression of chemosensory genes in D. alloeum and corresponding evolutionary patterns in its asexual relative D. muliebre.

D. alloeum contains canonical genes involved in reproduction and sex determination

Hymenoptera is an insect order characterized by haplodiploid sex determination, providing an opportunity for the study of the evolution of reproductive modes, including transitions from sexual to asexual systems. In many hymenopterans, development of male or female forms is based on allelic states at a single locus, termed complementary sex determination (CSD) (van Wilgenburg et al., 2006). In A. mellifera, initiation of sex development regulation depends on allelic composition of the csd gene (Hasselmann et al., 2008). We found no evidence of the csd locus in D. alloeum; however, our current inability to consistently rear wasps prevents us from definitively ruling out CSD as a sex determination mechanism. In CSD and non-CSD hymenopterans, a well-conserved sex determination regulatory cascade includes transformer and doublesex, both displaying sex-specific splicing (Geuverink & Beukeboom, 2014). We identified single copies of transformer and doublesex genes in D. alloeum (Supplementary Data 2-S6).

Meiosis is essential to obligate sexual reproduction, and loss of sex may be accompanied by the loss of meiotic genetic machinery (Schurko & Logsdon, 2008).

Identical sets of meiosis genes are found in both sexual and asexual Diachasma species, suggesting that asexual wasps may retain meiotic production of gametes despite the loss of sexual reproduction (see Chapter 3, Tvedte et al., 2017). As observed for meiosis genes, a transition to asexual reproduction might not necessarily be accompanied by degradation of sex determination genes discussed above. In N. vitripennis, male production occurs due to alternative splicing of transformer rendering the protein nonfunctional, leading to male-splicing of doublesex. Conversely, full-length transformer transcription into functional protein mediates the splicing of female-specific doublesex isoforms (Oliveira et al., 2009, Verhulst et al., 2010). Our RNA-Seq data support similar splicing differences in D. alloeum males vs. females, thus we would expect selection to preserve full-length genes in all-female Diachasma species. Additional genes contributing to sex-specific traits may be candidates for degradation in asexual wasps

(van der Kooi & Schwander, 2014, Kraaijeveld et al., 2016). Moreover, the quality of the D. alloeum assembly provides a suitable framework to study the effects of reproductive mode on patterns of molecular evolution across the wasp genome.

Acknowledgements

We thank the W. M. Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign for genomic and RNA library construction and sequencing. We thank Chris Fields at the High Performance Computing for Biology Center at the Roy J. Carver Biotechnology Center at the University of Illinois at Urbana-Champaign for performing the PBJelly scaffolding with TSLR reads, Austin Ward at the Biology Department at the University of Iowa for generating custom scripts used to produce nonredundant protein datasets, and Samuel Cummings at the Biology Department at the University of Iowa for contributing to annotation of chemosensory genes. This work was supported by the United States Department of Agriculture/National Institute of Food and Agriculture (A2008-35302-18819 to H.M.R., 2015-67013-23289 to J.L.F.) and the National Science Foundation (DEB-1638997 to J.L.F.).

REFERENCES

Austin A, Dowton M. 2000. The Hymenoptera: an introduction. In Hymenoptera: Evolution, Biodiversity and Biological Control. eds. Austin A, Dowton M. pp 3-16. CSIRO Publishing, Clayton, AU.

Abrahamson WG, Blair CP. 2007. Sequential radiation through host-race formation:

herbivore diversity leads to diversity in natural enemies. In Specialization,

speciation, and radiation: The evolutionary biology of herbivorous insects. ed.

Tilmon KJ. pp 188-200. University of California Press, Berkeley, CA, USA.

Benton R, Vannice KS, Gomez-Diaz C, Vosshall LB. 2009. Variant ionotropic glutamate

receptors as chemosensory receptors in Drosophila. Cell 136:149-162.

Bernt M, et al. 2013. MITOS: improved de novo metazoan mitochondrial genome

annotation. Mol Phylogenet Evol 69:313-319.

Birney E, Clamp M, Durbin R. 2004. GeneWise and genomewise. Genome Res 14:988-

995.

Boush GM, Baerwald RJ. 1967. Courtship behavior and evidence for a sex pheromone in

the apple maggot parasite, Opius alloeus (Hymenoptera: Braconidae). Ann

Entomol Soc Am 60:865-866.

Branstetter M, et al. 2017. Genomes of the Hymenoptera. Curr Opin Insect Sci 25:65-75.

Burke GR, Walden KKO, Whitfield JB, Robertson HM, Strand MR. 2014. Widespread

genome reorganization of an obligate virus mutualist. PLOS Genet 10:e1004660.

Bush GL. 1966. The taxonomy, cytology, and evolution of the genus Rhagoletis in North

America (Diptera, Tephritidae). B Mus Compar Zool 134:431-562.

Bush GL. 1994. Sympatric speciation in animals: new wine in old bottles. Trends Ecol

Evol 8:285-288.

Clyne PJ, Warr CG, Carlson JR. 2000. Candidate taste receptors in Drosophila. Science

287:1830-1834.

Clyne PJ, et al. 1999. A novel family of divergent seven-transmembrane proteins:

candidate odorant receptors in Drosophila. Neuron 22:327-338.

Croset V, et al. 2010. Ancient protostome origin of chemosensory ionotropic glutamate

receptors and the evolution of insect taste and olfaction. PLOS Genet 6:e1001064.

Dambroski HR, et al. 2005. The genetic basis for fruit odor discrimination in Rhagoletis

flies and its significance for sympatric host shifts. Evolution 59:1953-1964.

Dierckxsens N, Mardulyn P, Smits G. 2016. NOVOPlasty: de novo assembly of organelle

genomes from whole genome data. Nucleic Acids Res 45:e18.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high

throughput. Nucleic Acids Res 32:1792-1797.

English AC, et al. 2012. Mind the gap: upgrading genomes with Pacific Biosciences RS

long-read sequencing technology. PLOS One 7:e47768.

Eyun S, et al. 2017. Evolutionary history of chemosensory-related gene families across

the Arthropoda. Mol Biol Evol 34:1838-1862.

Feder JL, Forbes AA. 2010. Sequential speciation and the diversity of parasitic insects.

Ecol Entomol 35:67-76.

Forbes AA, Bagley RK, Beer MA, Hippee AC, Widmayer HA. 2018. Quantifying the

unquantifiable: why Hymenoptera—not Coleoptera—is the most speciose

order. bioRxiv 274431; doi: https://doi.org/10.1101/274431

Forbes AA, et al. 2017. Revisiting the particular role of host shifts in initiating insect

speciation. Evolution 71:1126-1137.

Forbes AA, Feder JL. 2006. Divergent preferences of Rhagoletis pomonella host races

for olfactory and visual fruit cues. Entomol Exp Appl 119:121-127.

Forbes AA, Powell THQ, Stelinski LL, Smith JJ, Feder JL. 2009. Sequential sympatric

speciation across trophic levels. Science 323:776-779.

Forbes AA, Rice LA, Stewart NB, Yee WL, Neiman M. 2013. Niche differentiation and

colonization of a novel environment by an asexual parasitic wasp. J Evolution

Biol 26:1330-1340.

Forêt S, Maleszka R. 2006. Function and evolution of a gene family encoding odorant

binding-like proteins in a social insect, the honey bee (Apis mellifera). Genome

Res 16:1404-1413.

Forêt S, Wanner KW, Maleszka R. 2007. Chemosensory proteins in the honey bee:

Insights from the annotated genome, comparative analyses and expressional

profiling. Insect Biochem Molec 37:19-28.

Geuverink E, Beukeboom LW. 2014. Phylogenetic distribution and evolutionary

dynamics of the sex determination genes doublesex and transformer in insects.

Sex Dev 8:38-49.

Gibson JD, Niehuis O, Verrelli BC, Gadau J. 2010. Contrasting patterns of selective

constraints in nuclear-encoded genes of the oxidative phosphorylation pathway in

holometabolous insects and their possible role in hybrid breakdown in Nasonia.

Heredity 104:310-317.

Grabherr MG, et al. 2011. Full-length transcriptome assembly from RNA-Seq data

without a reference genome. Nat Biotechnol 29:644-652.

Haas BJ, et al. 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protoc 8:1494-1512.

Hasselmann M, et al. 2008. Evidence for the evolutionary nascence of a novel sex

determination pathway in honeybees. Nature 454:519-522.

Hood GR, et al. 2015. Sequential divergence and the multiplicative origin of community

diversity. Proc Natl Acad Sci USA 112:E5980-E5989.

Kearse M, et al. 2012. Geneious Basic: an integrated and extendable desktop software

platform for the organization and analysis of sequence data. Bioinformatics

28:1647-1649.

Kelley DR, Schatz MC, Salzberg SL. 2010. Quake: quality-aware detection and

correction of sequencing errors. Genome Biol 11:R116.

Kraaijeveld K, et al. 2016. Decay of sexual trait genes in an asexual parasitoid wasp.

Genome Biol Evol 8:3685-3695.

Krieger J, Klink O, Mohl C, Raming K, Breer H. 2003. A candidate olfactory receptor

subtype highly conserved across different insect orders. J Comp Physiol A

189:519-526.

Laetsch DR, Blaxter ML. 2017. BlobTools: Interrogation of genome assemblies.

F1000Research 6:1287.

Larter NK, Sun JS, Carlson JR. 2016. Organization and function of Drosophila odorant

binding proteins. Elife 5:e20242.

LaSalle J, Gauld ID. 1993. Hymenoptera: their diversity, and their impact on the

diversity of other organisms. In Hymenoptera and Biodiversity. eds. LaSalle J,

Gauld ID. pp 1–26. CAB International, Wallingford, UK.

Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997.

Li Y, et al. 2017. The molecular evolutionary dynamics of oxidative phosphorylation

(OXPHOS) genes in Hymenoptera. BMC Evol Biol 17:269.

Linn C, et al. 2003. Fruit odor discrimination and sympatric host race formation in

Rhagoletis. Proc Natl Acad Sci USA 100:11490-11493.

Luo R, et al. 2012. SOAPdenovo2: an empirically improved memory-efficient short-read

de novo assembler. Gigascience 1:18.

Ma WJ, Pannebakker BA, Beukeboom LW, Schwander T, Van de Zande L. 2014.

Genetics of decayed sexual traits in a parasitoid wasp with endosymbiont-induced

asexuality. Heredity 113:424-431.

Missbach C, et al. 2014. Evolution of insect olfactory receptors. Elife 3:e02115.

Neiman M, Hehman G, Miller JT, Logsdon Jr JM, Taylor DR. 2009. Accelerated

mutation accumulation in asexual lineages of a freshwater snail. Mol Biol Evol

27:954-963.

Normark BB, Judson OP, Moran NA. 2003. Genomic signatures of ancient asexual

lineages. Biol J Linn Soc 79:69-84.

Normark BB, Moran NA. 2000. Testing for the accumulation of deleterious mutations in

asexual eukaryote genomes using molecular sequences. J Nat Hist 34:1719-1729.

Nosil P. 2012. Ecological Speciation. Oxford University Press, New York, NY, USA.

Oliveira DCSG, et al. 2009. Identification and characterization of the doublesex gene of

Nasonia. Insect Mol Biol 18:315-324.

Pelosi P, Mastrogiacomo R, Iovinella I, Tuccori E, Persaud KC. 2014a. Structure and

biotechnological applications of odorant-binding proteins. Appl Microbiol Biot

98:61-70.

Pelosi P, Iovinella I, Felicioli A, Dani FR. 2014b. Soluble proteins of chemical

communication: an overview across arthropods. Front Physiol 5:320.

Pelosi P, Iovinella I, Zhu J, Wang G, Dani FR. 2018. Beyond chemoreception: diverse

tasks of soluble olfactory proteins in insects. Biol Rev 93:184-200.

Peng Y, et al. 2017. Identification of odorant binding proteins and chemosensory proteins

in Microplitis mediator as well as functional characterization of chemosensory

protein 3. PLOS One 12:e0180775.

Porcelli D, Barsanti P, Pesole G, Caggese C. 2007. The nuclear OXPHOS genes in

insecta: a common evolutionary origin, a common cis-regulatory motif, a

common destiny for gene duplicates. BMC Evol Biol 7:215.

Rand DM, Haney RA, Fry AJ. 2004. Cytonuclear coevolution: the genomics of

cooperation. Trends Ecol Evol 19:645-653.

Robertson HM. 2015. The insect chemoreceptor superfamily is ancient in animals. Chem

Senses 40:609-614.

Robertson HM, Gadau J, Wanner KW. 2010. The insect chemoreceptor superfamily of

the parasitoid jewel wasp Nasonia vitripennis. Insect Mol Biol 19:121-136.

Robertson HM, Wanner KW. 2006. The chemoreceptor superfamily in the honey bee,

Apis mellifera: expansion of the odorant, but not gustatory, receptor family.

Genome Res 16:1395-1403.

Robertson HM, Warr CG, Carlson JR. 2003. Molecular evolution of the insect

chemoreceptor gene superfamily in Drosophila melanogaster. Proc Natl Acad Sci

USA 100:14537-14542.

Rytz R, Croset V, Benton R. 2013. Ionotropic receptors (IRs): chemosensory ionotropic

glutamate receptors in Drosophila and beyond. Insect Biochem Molec 43:888-897.

Saina M, et al. 2015. A cnidarian homologue of an insect gustatory receptor functions in

developmental body patterning. Nature Commun 6:6243.

Schurko AM, Logsdon JM Jr. 2008. Using a meiosis detection toolkit to investigate

ancient asexual “scandals” and the evolution of sex. Bioessays 30:579-589.

Sharbrough J, Luse M, Boore JL, Logsdon Jr JM, Neiman M. 2018. Radical amino acid

mutations persist longer in the absence of sex. Evolution. DOI:10.1111/evo.13465

Shiao M, et al. 2013. Transcriptional profiling of adult Drosophila antennae by high-

throughput sequencing. Zool Stud 52:42.

Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO:

assessing genome assembly and annotation completeness with single-copy

orthologs. Bioinformatics 31:3210-3212.

Stireman JO, Nason JD, Heard SB, Seehawer JM. 2006. Cascading host-associated

genetic differentiation in parasitoids of phytophagous insects. P Roy Soc Lon B

Bio 273:523-530.

Tegoni M, Campanacci V, Cambillau C. 2004. Structural aspects of sexual attraction and

chemical communication in insects. Trends Biochem Sci 29:257-264.

Tvedte ES, Forbes AA, Logsdon Jr JM. 2017. Retention of core meiotic genes across

diverse Hymenoptera. J Hered 108:791-806.

van der Kooi CJ, Matthey-Doret C, Schwander T. 2017. Evolution and comparative

ecology of parthenogenesis in haplodiploid arthropods. Evolution Let 1:304-316.

van der Kooi CJ, Schwander T. 2014. On the fate of sexual traits under asexuality. Biol

Rev 89:805-819.

van Wilgenburg E, Driessen G, Beukeboom LW. 2006. Single locus complementary sex

determination in Hymenoptera: an "unintelligent" design? Front Zool 3:1.

Verhulst EC, van de Zande L, Beukeboom LW. 2010. Insect sex determination: it all

evolves around transformer. Curr Opin Genet Dev 20:376-383.

Vieira FG, et al. 2012. Unique features of odorant-binding proteins of the parasitoid wasp

Nasonia vitripennis revealed by genome annotation and comparative analyses.

PLOS One 7:e43034.

Vieira FG, Rozas J. 2011. Comparative genomics of the odorant-binding and

chemosensory protein gene families across the Arthropoda: origin and

evolutionary history of the chemosensory system. Genome Biol Evol 3:476-490.

Walsh BD. 1867. The apple‐worm and the apple‐maggot. J Hort 2:338–343.

Wang S, et al. 2016. Cloning and expression profile of ionotropic receptors in the

parasitoid wasp Microplitis mediator (Hymenoptera: Braconidae). J Insect

Physiol 90:27-35.

Wang Y, Coleman-Derr D, Chen G, Gu YQ. 2015. OrthoVenn: a web server for genome

wide comparison and annotation of orthologous clusters across multiple species.

Nucleic Acids Res 43:W78-W84.

Wei S, Shi M, Sharkey MJ, van Achterberg C, Chen X. 2010. Comparative

mitogenomics of Braconidae (Insecta: Hymenoptera) and the phylogenetic utility

of mitochondrial genomes with special reference to Holometabolous insects.

BMC Genomics 11:371.

Weinstock GM, et al. 2006. Insights into social insects from the genome of the honeybee

Apis mellifera. Nature 443:931-949.

Werren JH, et al. 2010. Functional and evolutionary insights from the genomes of three

parasitoid Nasonia species. Science 327:343-348.

Wharton RA, Yoder MJ. 2015. Parasitoids of fruit-infesting Tephritidae. Available at

paroffit.org/public/site/paroffit/home. Accessed July 2, 2015.

Wharton RA, Marsh PM. 1978. New world Opiinae (Hymenoptera: Braconidae) parasitic

on Tephritidae (Diptera). J Wash Acad Sci:147-167.

Zdobnov EM, et al. 2017. OrthoDB v9.1: cataloging evolutionary and functional

annotations for animal, fungal, plant, archaeal, bacterial and viral orthologs.

Nucleic Acids Res 45:D744-D749.

Zdobnov EM, Apweiler R. 2001. InterProScan–an integration platform for the signature-

recognition methods in InterPro. Bioinformatics 17:847-848.

Zhang F, Broughton RE. 2013. Mitochondrial–nuclear interactions: compensatory

evolution or variable functional constraint among vertebrate oxidative

phosphorylation genes? Genome Biol Evol 5:1781-1791.

Zhang S, Zhang Y, Su H, Gao X, Guo Y. 2009. Identification and expression pattern of

putative odorant-binding proteins and chemosensory proteins in antennae of the

Microplitis mediator (Hymenoptera: Braconidae). Chem Senses 34:503-512.

Zhou X, et al. 2015. Chemoreceptor evolution in Hymenoptera and its implications for the

evolution of eusociality. Genome Biol Evol 7:2407-2416.

Zhou X, et al. 2012. Phylogenetic and transcriptomic analysis of chemosensory receptors

in a pair of divergent ant species reveals sex-specific signatures of odor coding.

PLOS Genet 8:e1002930.

TABLES

Table 2-1. Summary statistics and feature counts of D. alloeum genome assembly.

                          Dall1.0 assembly
Total Sequence Length     388,752,668
Assembly Gap Length       23,910,460
Scaffold Count            3,968
Scaffold N50              645,483
Contig Count              25,534
Contig N50                44,932
Genes                     13,439
mRNA transcripts          19,692
  fully-supported         16,774
  with >5% ab initio      2,305
  partial                 549
Protein-coding genes      12,837
Pseudogenes               178
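
For readers unfamiliar with the statistic, the scaffold and contig N50 values reported in Table 2-1 follow the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together span at least half of the total assembly length. The Python sketch below illustrates that definition on a hypothetical list of lengths, because the individual Dall1.0 scaffold lengths are not listed here. It also derives the ungapped assembly length from the two totals in Table 2-1. The function name `n50` and the toy lengths are illustrative assumptions and are not part of the original analysis.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths, used only to illustrate the definition.
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_scaffolds)

# Totals copied from Table 2-1.
total_sequence_length = 388_752_668
assembly_gap_length = 23_910_460
ungapped_length = total_sequence_length - assembly_gap_length

print(f"toy N50: {toy_n50}")
print(f"ungapped length: {ungapped_length:,}")
```

In the toy example the two longest scaffolds (100 and 80) already cover half of the 300 total, so the N50 is 80.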

Table 2-2. Arthropoda BUSCO analysis for four selected hymenopteran genomes.

Arthropoda BUSCOs (1066 total)

Organism               Assembly Name                             Complete single-copy  Complete duplicated  Fragmented  Missing
Diachasma alloeum      GCA_001412515.1 (this study)              1043                  10                   7           6
Apis mellifera         GCA_000002195.1 (Weinstock et al., 2006)  1038                  6                    15          7
Nasonia vitripennis    GCA_000002325.2 (Werren et al., 2010)     1030                  8                    15          13
Microplitis demolitor  GCA_000572035.2 (Burke et al., 2014)      1038                  19                   4           5

Table 2-3. Hymenoptera BUSCO analysis for four selected hymenopteran genomes.

Hymenoptera BUSCOs (4415 total)

Organism               Assembly Name                             Complete single-copy  Complete duplicated  Fragmented  Missing
Diachasma alloeum      GCA_001412515.1 (this study)              4132                  30                   138         115
Apis mellifera         GCA_000002195.1 (Weinstock et al., 2006)  4282                  13                   58          62
Nasonia vitripennis    GCA_000002325.2 (Werren et al., 2010)     4094                  24                   192         105
Microplitis demolitor  GCA_000572035.2 (Burke et al., 2014)      4023                  48                   172         172

Table 2-4. Summary statistics and BUSCO gene content for genome assemblies in select Hymenoptera.

Organism               Assembly Name                             Assembly Size  Scaffold Count  Scaffold N50  Complete arthropod BUSCOs
Diachasma alloeum      GCA_001412515.1 (this study)              388,752,668    3,698           645,483       1053 (99%)
Apis mellifera         GCA_000002195.1 (Weinstock et al., 2006)  250,287,000    5,645           997,192       1044 (98%)
Nasonia vitripennis    GCA_000002325.2 (Werren et al., 2010)     295,780,872    6,169           708,988       1038 (97%)
Microplitis demolitor  GCA_000572035.2 (Burke et al., 2014)      241,190,213    1,794           1,139,389     1057 (99%)
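
The complete arthropod BUSCO counts and percentages in Table 2-4 can be recovered from the category counts in Table 2-2 by adding the complete single-copy and complete duplicated classes and dividing by the 1066 BUSCOs searched. The Python sketch below shows this arithmetic. The dictionary layout and the function name `summarize` are illustrative assumptions, and the percentages are assumed to be rounded to the nearest whole number.

```python
# Counts copied from Table 2-2 (Arthropoda BUSCOs, 1066 total).
TOTAL_BUSCOS = 1066

busco_counts = {
    "Diachasma alloeum": {"single": 1043, "duplicated": 10, "fragmented": 7, "missing": 6},
    "Apis mellifera": {"single": 1038, "duplicated": 6, "fragmented": 15, "missing": 7},
    "Nasonia vitripennis": {"single": 1030, "duplicated": 8, "fragmented": 15, "missing": 13},
    "Microplitis demolitor": {"single": 1038, "duplicated": 19, "fragmented": 4, "missing": 5},
}


def summarize(counts, total=TOTAL_BUSCOS):
    """Return (complete count, percent complete) for one assembly."""
    # The four categories should partition the full BUSCO set.
    if sum(counts.values()) != total:
        raise ValueError("category counts do not sum to the BUSCO total")
    complete = counts["single"] + counts["duplicated"]
    # Assumption: percentages are rounded to the nearest integer.
    return complete, round(100 * complete / total)


for organism, counts in busco_counts.items():
    complete, percent = summarize(counts)
    print(f"{organism}: {complete} ({percent}%)")
```

The same check applies to the Hymenoptera set in Table 2-3 by passing `total=4415` with the corresponding counts.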

Table 2-5. GO enrichment analysis of OrthoVenn ortholog clusters shared between D. alloeum and M. demolitor.

GO Term     Number of clusters  GO Annotation                                                           GO category domain  E-value
GO:0050912  4                   detection of chemical stimulus involved in sensory perception of taste  biological process  0.01370055307229133
GO:0008527  4                   taste receptor activity                                                 molecular function  0.01851319349435597
GO:0030136  2                   clathrin-coated vesicle                                                 cellular component  0.0196295708386015
GO:0007291  3                   sperm individualization                                                 biological process  0.023419011281021872
GO:0050909  4                   sensory perception of taste                                             biological process  0.024252062279888503
GO:0005778  3                   peroxisomal membrane                                                    cellular component  0.03299589904931437
GO:0005905  3                   coated pit                                                              cellular component  0.04428876598447125

E-value cutoff for reporting enriched GO terms is 0.05.

Table 2-6. Chemosensory gene content of select hymenopteran insects.

Organism        ORs      GRs     IRs      OBPs     CSPs    Citations
D. alloeum      192(14)  39(1)   51(5)    15(0)    9(0)    This study
A. mellifera    163(11)  10(0)   10(0^a)  21(0^a)  6(0^a)  Robertson & Wanner, 2006; Foret & Maleszka, 2006; Foret et al., 2007; Croset et al., 2010
N. vitripennis  225(76)  47(11)  99(54)   90(8)    9(0)    Robertson et al., 2010; Croset et al., 2010; Werren et al., 2010; Vieira et al., 2012
M. demolitor^b  218(4)   85(1)   -        -        -       Zhou et al., 2015
M. mediator     -        -       17(0^a)  20(0^a)  3(0^a)  Zhang et al., 2009; Wang et al., 2016; Peng et al., 2017

Intact gene counts are outside parentheses and pseudogene counts are inside parentheses.
^a Pseudogene counts were not addressed explicitly in the study.
^b Zhou et al., 2015 provided counts of truncated models and pseudogenes for ORs and GRs, however these sequences were not published and therefore were not used in building phylogenies.

FIGURES

Figure 2-1. Diagram of mitochondrial genome of D. alloeum. Protein-coding genes are shown in green, rRNAs are shown in red, tRNAs are shown in pink. Leftward and rightward facing arrows indicate direction of transcription. AT-rich control region (top) was not fully assembled by NOVOPlasty. Image generated using Geneious v9.1 (Biomatters, Ltd.).

Figure 2-2. OrthoVenn clusters of select hymenopteran protein datasets. Image adapted from OrthoVenn web server output (http://www.bioinfogenome.net/OrthoVenn/). Inset: GO enrichment analysis of OrthoVenn clusters specific to D. alloeum. E-value cutoff for reporting enriched GO terms is 0.05.

Figure 2-3. Phylogenetic tree of ORs from sampled hymenopteran insects. Taxa surveyed include D. alloeum (blue), M. demolitor (red), N. vitripennis (green), and A. mellifera (orange). Maximum likelihood tree generated using an alignment with 305 amino acids. Boxes and letters A-F represent points of interest in the phylogenetic tree; areas are expanded and shown in Figure 2-4A-F. The scale bar indicates the number of amino acid substitutions per site.

Figure 2-4. Phylogenetic subtrees of ORs from sampled hymenopteran insects. Dall = D. alloeum, Mmed = M. mediator, Nvit = N. vitripennis, Amel = A. mellifera. Maximum likelihood tree generated using an alignment with 319 amino acids. Dots on nodes indicate > 90% bootstrap support, numbers on nodes indicate < 50% bootstrap support. Letters A-F represent points of interest in the phylogenetic tree; subtrees correspond to boxed areas shown in Figure 2-3. Scale bars indicate the number of amino acid substitutions per site.

Figure 2-5. Phylogenetic tree of GRs from sampled hymenopteran insects. Dall = D. alloeum, Mdem = M. demolitor, Nvit = N. vitripennis, Amel = A. mellifera. Maximum likelihood tree generated using an alignment with 297 amino acids. Dots on nodes indicate > 90% bootstrap support, numbers on nodes indicate < 50% bootstrap support. The scale bar indicates the number of amino acid substitutions per site.

Figure 2-6. Phylogenetic tree of IRs from sampled hymenopteran insects. Dall = D. alloeum, Mmed = M. mediator, Nvit = N. vitripennis, Amel = A. mellifera. Maximum likelihood tree generated using an alignment with 562 amino acids. Dots on nodes indicate > 90% bootstrap support, numbers on nodes indicate < 50% bootstrap support. The scale bar indicates the number of amino acid substitutions per site.

Figure 2-7. Phylogenetic tree of OBPs from sampled hymenopteran insects. Dall = D. alloeum, Mmed = M. mediator, Nvit = N. vitripennis, Amel = A. mellifera. Maximum likelihood tree generated using an alignment with 169 amino acids. Dots on nodes indicate > 90% bootstrap support, numbers on nodes indicate < 50% bootstrap support. The scale bar indicates the number of amino acid substitutions per site.

Figure 2-8. Phylogenetic tree of CSPs from sampled hymenopteran insects. Dall = D. alloeum, Mmed = M. mediator, Nvit = N. vitripennis, Amel = A. mellifera. Maximum likelihood tree generated using an alignment with 130 amino acids. Dots on nodes indicate > 90% bootstrap support, numbers on nodes indicate < 50% bootstrap support. The scale bar indicates the number of amino acid substitutions per site.

SUPPLEMENTARY DATA

Table 2-S1. Sex determination genes used as queries for BLAST searches.

Gene Name           Organism        Accession
doublesex (female)  N. vitripennis  NP_001155990.1
doublesex (male)    N. vitripennis  NP_001155989.1
doublesex (female)  A. mellifera    ABW99105.1
doublesex (male)    A. mellifera    ABW99102.1
transformer         N. vitripennis  NP_001128299.1
feminizer           A. mellifera    NP_001128300.1
csd                 A. mellifera    ABU68670.1

Figure 2-S1. Blobplot of scaffolds in D. alloeum genome assembly. Mapping coverage shown for 500 bp paired end library. Blobplot constructed using BlobTools (DOI 10.5281/zenodo.845347, Laetsch & Blaxter, 2017).

Supplementary Data 2-S1. Arthropoda BUSCOs not annotated as Complete (C) in select hymenopterans.

Pairwise comparisons were made with respect to D. alloeum. Bolded accessions are shared between D. alloeum and M. demolitor. Boxed accessions are shared between D. alloeum and N. vitripennis. Shaded accessions are shared between D. alloeum and A. mellifera.

Missing
Diachasma alloeum  Microplitis demolitor  Nasonia vitripennis  Apis mellifera
EOG090X02ZJ        EOG090X05Z0            EOG090X017M          EOG090X0CAY
EOG090X07WF        EOG090X08JG            EOG090X02ZJ          EOG090X0EWC
EOG090X0AQE        EOG090X0AQE            EOG090X05CZ          EOG090X0G8U
EOG090X0D9B        EOG090X0D9B            EOG090X08ST          EOG090X0GVI
EOG090X0M0J        EOG090X0N0O            EOG090X0DWF          EOG090X0JX7
EOG090X0N0O        -                      EOG090X0E8U          EOG090X0ON0
-                  -                      EOG090X0FQT          EOG090X0P2R
-                  -                      EOG090X0IC1          -
-                  -                      EOG090X0KAD          -
-                  -                      EOG090X0KMN          -
-                  -                      EOG090X0KPP          -
-                  -                      EOG090X0KWJ          -
-                  -                      EOG090X0N0O          -

Fragmented
Diachasma alloeum  Microplitis demolitor  Nasonia vitripennis  Apis mellifera
EOG090X02LT        EOG090X06IP            EOG090X00U2          EOG090X03YC
EOG090X03DI        EOG090X06Q3            EOG090X021B          EOG090X04NQ
EOG090X04DO        EOG090X0J63            EOG090X03H5          EOG090X06V5
EOG090X07I2        EOG090X0MQF            EOG090X04AR          EOG090X0844
EOG090X09NR        -                      EOG090X04U5          EOG090X09ZA
EOG090X0A16        -                      EOG090X06E3          EOG090X0ARU
EOG090X0IKX        -                      EOG090X07SD          EOG090X0CKN
-                  -                      EOG090X08OG          EOG090X0CWA
-                  -                      EOG090X09U3          EOG090X0DL4
-                  -                      EOG090X0CCZ          EOG090X0FJX
-                  -                      EOG090X0DEX          EOG090X0GAD
-                  -                      EOG090X0DNU          EOG090X0H1B
-                  -                      EOG090X0EUO          EOG090X0H7R
-                  -                      EOG090X0EVF          EOG090X0J87
-                  -                      EOG090X0FS4          EOG090X0JXR

Duplicated
Diachasma alloeum  Microplitis demolitor  Nasonia vitripennis  Apis mellifera
EOG090X00DN        EOG090X00RF            EOG090X00GC          EOG090X051P
EOG090X00RF        EOG090X01UY            EOG090X00ST          EOG090X0BVA
EOG090X02EP        EOG090X02C6            EOG090X07J8          EOG090X0C9F
EOG090X02RQ        EOG090X03AK            EOG090X09KD          EOG090X0G73
EOG090X04D9        EOG090X03B0            EOG090X0A1W          EOG090X0GKM
EOG090X078Z        EOG090X03RJ            EOG090X0DX5          EOG090X0GZE
EOG090X0CZV        EOG090X03UT            EOG090X0ESN          -
EOG090X0DWG        EOG090X04JL            EOG090X0FVK          -
EOG090X0EYV        EOG090X06SF            -                    -
EOG090X0HC4        EOG090X07AI            -                    -
-                  EOG090X09Q6            -                    -
-                  EOG090X09SP            -                    -
-                  EOG090X0CBU            -                    -
-                  EOG090X0CJG            -                    -
-                  EOG090X0DK6            -                    -
-                  EOG090X0GAD            -                    -
-                  EOG090X0GT4            -                    -
-                  EOG090X0KPP            -                    -
-                  EOG090X0M8U            -                    -
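
The pairwise comparisons described for Supplementary Data 2-S1 amount to set intersections against the D. alloeum list. The Python sketch below illustrates this for the Missing category only. It assumes that the Missing accessions are read column by column, with column lengths taken from the Missing counts in Table 2-2 (6, 5, 13, and 7), and that accessions are compared within a single category rather than across categories. The dictionary layout and the function name `shared_with_reference` are illustrative and are not part of the original analysis.

```python
# Missing Arthropoda BUSCO accessions per species.
# Assumption: column assignment inferred from the Missing counts in Table 2-2.
missing = {
    "Diachasma alloeum": {
        "EOG090X02ZJ", "EOG090X07WF", "EOG090X0AQE",
        "EOG090X0D9B", "EOG090X0M0J", "EOG090X0N0O",
    },
    "Microplitis demolitor": {
        "EOG090X05Z0", "EOG090X08JG", "EOG090X0AQE",
        "EOG090X0D9B", "EOG090X0N0O",
    },
    "Nasonia vitripennis": {
        "EOG090X017M", "EOG090X02ZJ", "EOG090X05CZ", "EOG090X08ST",
        "EOG090X0DWF", "EOG090X0E8U", "EOG090X0FQT", "EOG090X0IC1",
        "EOG090X0KAD", "EOG090X0KMN", "EOG090X0KPP", "EOG090X0KWJ",
        "EOG090X0N0O",
    },
    "Apis mellifera": {
        "EOG090X0CAY", "EOG090X0EWC", "EOG090X0G8U", "EOG090X0GVI",
        "EOG090X0JX7", "EOG090X0ON0", "EOG090X0P2R",
    },
}


def shared_with_reference(id_sets, reference="Diachasma alloeum"):
    """Return the accessions each species shares with the reference species."""
    reference_ids = id_sets[reference]
    return {
        species: sorted(reference_ids & ids)
        for species, ids in id_sets.items()
        if species != reference
    }


for species, shared in shared_with_reference(missing).items():
    print(f"{species}: {', '.join(shared) if shared else 'none'}")
```

Under these assumptions, D. alloeum shares three Missing accessions with M. demolitor, two with N. vitripennis, and none with A. mellifera.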

Supplementary Data 2-S2. Hymenoptera BUSCOs not annotated as Complete (C) in select hymenopterans.

Pairwise comparisons were made with respect to D. alloeum. Bolded accessions are shared between D. alloeum and M. demolitor. Boxed accessions are shared between D. alloeum and N. vitripennis. Shaded accessions are shared between D. alloeum and A. mellifera.

Missing Diachasma alloeum Microplitis demolitor Nasonia vitripennis Apis mellifera EOG091B00P4 EOG091B0020 EOG091B00BW EOG091B01YL EOG091B01TN EOG091B003H EOG091B00GP EOG091B04RE EOG091B01W3 EOG091B00HB EOG091B010Y EOG091B05RW EOG091B020F EOG091B00IR EOG091B0173 EOG091B06YM EOG091B023C EOG091B00OT EOG091B01AR EOG091B078A EOG091B02AQ EOG091B011U EOG091B01BE EOG091B07B2 EOG091B02CC EOG091B017N EOG091B01J9 EOG091B08CB EOG091B02DN EOG091B019E EOG091B01PI EOG091B0905 EOG091B02GN EOG091B01D8 EOG091B02RY EOG091B095A EOG091B02KE EOG091B01F6 EOG091B02YV EOG091B09GA EOG091B02LH EOG091B01GM EOG091B03CB EOG091B09VB EOG091B02MQ EOG091B01JQ EOG091B0417 EOG091B0A7R EOG091B02RK EOG091B01NP EOG091B041Q EOG091B0B3H EOG091B02XF EOG091B01TN EOG091B044C EOG091B0B6E EOG091B033O EOG091B01V0 EOG091B045F EOG091B0B6M EOG091B0341 EOG091B01X6 EOG091B0485 EOG091B0BDI EOG091B0345 EOG091B023C EOG091B04T8 EOG091B0BIA EOG091B036B EOG091B0256 EOG091B05GP EOG091B0BKY EOG091B03BO EOG091B02B1 EOG091B064Z EOG091B0C59 EOG091B03E5 EOG091B02CC EOG091B06GQ EOG091B0C6Y EOG091B03FF EOG091B02G1 EOG091B06NS EOG091B0D1R EOG091B03FZ EOG091B02KE EOG091B07AY EOG091B0D4C EOG091B0444 EOG091B02KP EOG091B07D6 EOG091B0DAK EOG091B045Z EOG091B02O1 EOG091B07IK EOG091B0DHN EOG091B0479 EOG091B02O4 EOG091B07LB EOG091B0DLC EOG091B04DX EOG091B02OT EOG091B07S9 EOG091B0DSV EOG091B04G8 EOG091B02V4 EOG091B084W EOG091B0E2E EOG091B04JB EOG091B02XK EOG091B08BZ EOG091B0EF6 EOG091B04KU EOG091B033O EOG091B08CR EOG091B0ETA EOG091B0517 EOG091B0345 EOG091B08HJ EOG091B0EUT EOG091B051O EOG091B03A2 EOG091B08P2 EOG091B0FHQ EOG091B0527 EOG091B03D1 EOG091B09DX EOG091B0FJE

EOG091B058W EOG091B03FZ EOG091B09JF EOG091B0G7L EOG091B05Q2 EOG091B03SK EOG091B09KH EOG091B0GJ4 EOG091B06Q6 EOG091B03TA EOG091B09QW EOG091B0GOV EOG091B06W5 EOG091B045F EOG091B09SV EOG091B0GQQ EOG091B06XN EOG091B0485 EOG091B09WL EOG091B0GUI EOG091B071G EOG091B049L EOG091B09WR EOG091B0I2B EOG091B074P EOG091B04BT EOG091B0ABK EOG091B0I6M EOG091B076R EOG091B04GC EOG091B0AU6 EOG091B0I6O EOG091B07D6 EOG091B04GK EOG091B0AX3 EOG091B0IGJ EOG091B07JO EOG091B04GQ EOG091B0AZ1 EOG091B0IN0 EOG091B07LH EOG091B04QE EOG091B0B6E EOG091B0IS5 EOG091B07MN EOG091B04R1 EOG091B0B9T EOG091B0IW2 EOG091B0852 EOG091B04SV EOG091B0BQR EOG091B0JI8 EOG091B086F EOG091B051O EOG091B0BXN EOG091B0JKN EOG091B08DN EOG091B057I EOG091B0CGZ EOG091B0JQA EOG091B08E3 EOG091B058G EOG091B0CIR EOG091B0JSL EOG091B08HJ EOG091B05CU EOG091B0CVU EOG091B0KR3 EOG091B08IA EOG091B05WP EOG091B0D0U EOG091B0L1V EOG091B08JW EOG091B05YM EOG091B0D52 EOG091B0L4T EOG091B08L2 EOG091B064D EOG091B0D56 EOG091B0LLB EOG091B08XI EOG091B06DP EOG091B0DRO EOG091B0LS6 EOG091B08YE EOG091B06GB EOG091B0DUR EOG091B0MB7 EOG091B08Z0 EOG091B06SB EOG091B0E1F EOG091B0MBZ EOG091B094G EOG091B06XZ EOG091B0E2T EOG091B0N5I EOG091B09CZ EOG091B074F EOG091B0EGK EOG091B0NT5 EOG091B09II EOG091B0755 EOG091B0EJH EOG091B0NVC EOG091B09P6 EOG091B07IK EOG091B0EQB EOG091B0NX5 EOG091B09X4 EOG091B07MT EOG091B0FH4 EOG091B0OW3 EOG091B0ABO EOG091B07WE EOG091B0FIA EOG091B0PHZ EOG091B0ATI EOG091B07YH EOG091B0FYG EOG091B0PO9 EOG091B0AY4 EOG091B080Y EOG091B0G7L EOG091B0B6E EOG091B086P EOG091B0GG9 EOG091B0BKA EOG091B0897 EOG091B0GP4 EOG091B0BOV EOG091B08E3 EOG091B0GQD EOG091B0C1R EOG091B08EY EOG091B0GXG EOG091B0C7J EOG091B08HJ EOG091B0HB8 EOG091B0CE9 EOG091B08L2 EOG091B0HJJ EOG091B0CH0 EOG091B08RP EOG091B0HNI EOG091B0CV7 EOG091B08S1 EOG091B0HP7 EOG091B0D37 EOG091B08Z0 EOG091B0HSR

EOG091B0DFO EOG091B08Z4 EOG091B0HXP EOG091B0DG7 EOG091B094G EOG091B0I59 EOG091B0DGH EOG091B094I EOG091B0I6N EOG091B0DRA EOG091B09FT EOG091B0J1X EOG091B0E6C EOG091B09IB EOG091B0J8T EOG091B0EA6 EOG091B09QD EOG091B0JL0 EOG091B0EJ4 EOG091B0A0D EOG091B0JNA EOG091B0ERY EOG091B0A0E EOG091B0JO5 EOG091B0EYN EOG091B0ABO EOG091B0JSL EOG091B0F7I EOG091B0AC3 EOG091B0JUM EOG091B0FA7 EOG091B0AJ7 EOG091B0KG5 EOG091B0FEY EOG091B0AMR EOG091B0KPN EOG091B0FJE EOG091B0AQQ EOG091B0KTB EOG091B0FPE EOG091B0AV0 EOG091B0L0J EOG091B0FTR EOG091B0AXN EOG091B0L1W EOG091B0FXK EOG091B0AXX EOG091B0LGZ EOG091B0G97 EOG091B0AZ8 EOG091B0LKM EOG091B0GAT EOG091B0B6E EOG091B0M0X EOG091B0GRL EOG091B0B9F EOG091B0MCD EOG091B0HU9 EOG091B0B9R EOG091B0MFX EOG091B0I4S EOG091B0BLI EOG091B0MMS EOG091B0IQV EOG091B0BOV EOG091B0MMW EOG091B0IZX EOG091B0BVO EOG091B0N3T EOG091B0JEG EOG091B0C1O EOG091B0N57 EOG091B0JMD EOG091B0C1R EOG091B0N5V EOG091B0JSL EOG091B0C7D EOG091B0NFY EOG091B0JWS EOG091B0C92 EOG091B0NSN EOG091B0K27 EOG091B0CBB EOG091B0NWQ EOG091B0K4I EOG091B0CJO EOG091B0O32 EOG091B0KG0 EOG091B0CV7 EOG091B0OCH EOG091B0KXL EOG091B0CW9 EOG091B0P5Q EOG091B0KXX EOG091B0D37 EOG091B0PGQ EOG091B0L40 EOG091B0DFL EOG091B0PH7 EOG091B0LGH EOG091B0DFO EOG091B0LQO EOG091B0DIV

EOG091B0MCB EOG091B0DNB EOG091B0MZW EOG091B0DPL EOG091B0N6C EOG091B0DRO EOG091B0NJF EOG091B0E2T

EOG091B0OGK EOG091B0E96

EOG091B0P1G EOG091B0E98

EOG091B0P5Q EOG091B0EA6 EOG091B0PTV EOG091B0EF6 EOG091B0EJ8 EOG091B0ELS EOG091B0ERY EOG091B0EVJ EOG091B0EWN EOG091B0EZT EOG091B0F6Y EOG091B0F9Z EOG091B0FA7 EOG091B0FES EOG091B0FJ4 EOG091B0FK8 EOG091B0FRW EOG091B0FSO EOG091B0FXK EOG091B0FYR EOG091B0G0X EOG091B0GAT EOG091B0GHK EOG091B0GO2 EOG091B0GQD EOG091B0H8F EOG091B0HML EOG091B0HQ4 EOG091B0HSR EOG091B0HU9 EOG091B0HXV EOG091B0I4B EOG091B0I6N EOG091B0I75 EOG091B0I8E EOG091B0IGJ EOG091B0IJ6 EOG091B0IK9 EOG091B0ILO EOG091B0IMK EOG091B0IW2 EOG091B0IYD

EOG091B0J5U EOG091B0JHK EOG091B0JMD EOG091B0JSL EOG091B0K2R EOG091B0K4E EOG091B0K9R EOG091B0KGI EOG091B0KXL EOG091B0L1D EOG091B0L1M EOG091B0LGH EOG091B0M3X EOG091B0MD7 EOG091B0N6C EOG091B0N9E EOG091B0NCI EOG091B0P1G EOG091B0P5Q

Fragmented Diachasma alloeum Microplitis demolitor Nasonia vitripennis Apis mellifera EOG091B00C8 EOG091B00EL EOG091B0074 EOG091B01UF EOG091B00E9 EOG091B014Q EOG091B007F EOG091B02B2 EOG091B00MN EOG091B01I0 EOG091B009N EOG091B02CT EOG091B00YO EOG091B01UL EOG091B009Z EOG091B02IL EOG091B017M EOG091B01Z5 EOG091B00EN EOG091B02P9 EOG091B019E EOG091B0240 EOG091B00FI EOG091B02SJ EOG091B01BA EOG091B024X EOG091B00IV EOG091B02XF EOG091B01FD EOG091B02QK EOG091B00KQ EOG091B0349 EOG091B01GN EOG091B02QO EOG091B00OM EOG091B03R6 EOG091B01JQ EOG091B02S8 EOG091B00OW EOG091B03RP EOG091B01NU EOG091B02S9 EOG091B00YE EOG091B03UX EOG091B01OK EOG091B02SY EOG091B011X EOG091B0493 EOG091B021Z EOG091B032N EOG091B012C EOG091B04EP EOG091B022S EOG091B0341 EOG091B0134 EOG091B04L8 EOG091B02IL EOG091B03BI EOG091B013E EOG091B04RS EOG091B02O4 EOG091B03DR EOG091B013H EOG091B05FK EOG091B02QK EOG091B03L5 EOG091B017H EOG091B05VW EOG091B02QO EOG091B03LM EOG091B018M EOG091B064H

EOG091B02SY EOG091B03NF EOG091B01BW EOG091B06CN EOG091B02V3 EOG091B03QB EOG091B01I0 EOG091B06JF EOG091B02WM EOG091B0417 EOG091B01ID EOG091B06VX EOG091B02ZQ EOG091B04BZ EOG091B01JD EOG091B07VL EOG091B0371 EOG091B04DS EOG091B01T9 EOG091B085X EOG091B037S EOG091B04DW EOG091B023H EOG091B08AN EOG091B03BR EOG091B04EZ EOG091B024L EOG091B08IR EOG091B03KL EOG091B04IZ EOG091B024O EOG091B08VJ EOG091B03M1 EOG091B04M0 EOG091B0267 EOG091B08YH EOG091B03QH EOG091B051U EOG091B02A2 EOG091B08ZZ EOG091B03ZJ EOG091B053B EOG091B02CC EOG091B094G EOG091B0417 EOG091B05B9 EOG091B02GA EOG091B0971 EOG091B0462 EOG091B05CP EOG091B02J0 EOG091B09P6 EOG091B047M EOG091B05IE EOG091B02Q2 EOG091B0A64 EOG091B048F EOG091B05T3 EOG091B02WL EOG091B0BFC EOG091B048K EOG091B05V2 EOG091B02WU EOG091B0BJM EOG091B048N EOG091B05W0 EOG091B030S EOG091B0BSF EOG091B04BX EOG091B05WS EOG091B034W EOG091B0C37 EOG091B04G9 EOG091B062G EOG091B035L EOG091B0CB5 EOG091B04IX EOG091B065H EOG091B0367 EOG091B0CZS EOG091B04Q3 EOG091B06AK EOG091B039V EOG091B0DOJ EOG091B056S EOG091B06CG EOG091B03LN EOG091B0DQ0 EOG091B05FB EOG091B06GJ EOG091B03M6 EOG091B0DWL EOG091B05V2 EOG091B06KO EOG091B03PE EOG091B0ER9 EOG091B05WP EOG091B06VL EOG091B03WI EOG091B0F7I EOG091B0638 EOG091B06Y6 EOG091B03X1 EOG091B0GKN EOG091B069A EOG091B06YH EOG091B0415 EOG091B0GL2 EOG091B06BG EOG091B0700 EOG091B0455 EOG091B0HDZ EOG091B06FA EOG091B070E EOG091B0457 EOG091B0HKR EOG091B06GB EOG091B07DH EOG091B045Z EOG091B0JB9 EOG091B06GH EOG091B07DM EOG091B046C EOG091B0JGB EOG091B06MH EOG091B07H8 EOG091B046D EOG091B0JXQ EOG091B06VL EOG091B07JI EOG091B04EU EOG091B0K3V EOG091B06X9 EOG091B07OC EOG091B04GF EOG091B0L0T EOG091B06Y6 EOG091B07PC EOG091B04MB EOG091B0L1S EOG091B0755 EOG091B07QU EOG091B04Q3 EOG091B0LD3 EOG091B0783 EOG091B07XF EOG091B04R0 EOG091B0LZT EOG091B07JI EOG091B0861 EOG091B04YC EOG091B0M3X EOG091B07PI EOG091B086V EOG091B0501 EOG091B0OGK EOG091B07RV EOG091B086W EOG091B055A EOG091B0P82

EOG091B082U EOG091B08BZ EOG091B0581 EOG091B08HY EOG091B08DB EOG091B05C9 EOG091B08S5 EOG091B08EM EOG091B05HF EOG091B08TT EOG091B08FN EOG091B05JR EOG091B08W2 EOG091B08IM EOG091B05JS EOG091B090H EOG091B08QB EOG091B05TA EOG091B095V EOG091B08VJ EOG091B05XV EOG091B09DZ EOG091B090U EOG091B0682 EOG091B09N1 EOG091B091B EOG091B0683 EOG091B0A48 EOG091B095V EOG091B06CG EOG091B0A49 EOG091B09AU EOG091B06FL EOG091B0A9I EOG091B09DY EOG091B06N5 EOG091B0AEB EOG091B09HG EOG091B06O5 EOG091B0AS1 EOG091B09VA EOG091B06PE EOG091B0AU7 EOG091B0A11 EOG091B06VO EOG091B0AVN EOG091B0A7I EOG091B06YM EOG091B0AWR EOG091B0A99 EOG091B0728 EOG091B0B11 EOG091B0AD0 EOG091B0729 EOG091B0BFS EOG091B0ADU EOG091B074F EOG091B0BH7 EOG091B0AEB EOG091B07BF EOG091B0BHV EOG091B0AIK EOG091B07BI EOG091B0BRD EOG091B0AR8 EOG091B07FP EOG091B0BSR EOG091B0AVT EOG091B07HX EOG091B0C0B EOG091B0AYI EOG091B07K4 EOG091B0CBB EOG091B0B36 EOG091B07LD EOG091B0CBD EOG091B0B3X EOG091B07LM EOG091B0CHN EOG091B0B5O EOG091B07QU EOG091B0CRW EOG091B0B64 EOG091B07RV EOG091B0CW6 EOG091B0BHJ EOG091B07WT EOG091B0CWN EOG091B0BKN EOG091B07X0 EOG091B0D1R EOG091B0BRD EOG091B084Z EOG091B0D73 EOG091B0C4Q EOG091B0864 EOG091B0D83 EOG091B0C5O EOG091B086D EOG091B0DAV EOG091B0CKN EOG091B086O EOG091B0DF5 EOG091B0CMM EOG091B08IJ EOG091B0DJ8 EOG091B0CW6 EOG091B08IU EOG091B0DNB EOG091B0CX2 EOG091B08JW EOG091B0DUA EOG091B0D1R EOG091B08LP EOG091B0DVI EOG091B0D39 EOG091B08S6 EOG091B0E7C EOG091B0D9Z EOG091B08XT

EOG091B0EC5 EOG091B0DB9 EOG091B0938 EOG091B0EI4 EOG091B0DG7 EOG091B093I EOG091B0ES0 EOG091B0DJ8 EOG091B09CR EOG091B0F0P EOG091B0DOJ EOG091B09T9 EOG091B0F1E EOG091B0DPJ EOG091B0A0P EOG091B0FMJ EOG091B0DS2 EOG091B0A0V EOG091B0FYR EOG091B0DST EOG091B0A4P EOG091B0G0W EOG091B0E0G EOG091B0AEE EOG091B0G1V EOG091B0E29 EOG091B0AOK EOG091B0GPQ EOG091B0E9O EOG091B0AWH EOG091B0HKR EOG091B0EC5 EOG091B0AYI EOG091B0HXV EOG091B0EFM EOG091B0B9G EOG091B0HZX EOG091B0ES0 EOG091B0BBD EOG091B0I7L EOG091B0F0P EOG091B0BGQ EOG091B0IBM EOG091B0F6R EOG091B0BIF EOG091B0IJ8 EOG091B0FEY EOG091B0BLG EOG091B0IK9 EOG091B0FHL EOG091B0BNR EOG091B0ITJ EOG091B0FMJ EOG091B0BOX EOG091B0JG7 EOG091B0FOR EOG091B0C2S EOG091B0JJE EOG091B0G0W EOG091B0C5O EOG091B0JLC EOG091B0GJ4 EOG091B0C8R EOG091B0JOG EOG091B0GOE EOG091B0C8V EOG091B0JQA EOG091B0GRH EOG091B0CE9 EOG091B0JUE EOG091B0GV6 EOG091B0CI2 EOG091B0KDQ EOG091B0H29 EOG091B0CK4 EOG091B0KVH EOG091B0H7K EOG091B0CKN EOG091B0L35 EOG091B0H7V EOG091B0CQR EOG091B0LJ0 EOG091B0H8A EOG091B0CVQ EOG091B0LJ9 EOG091B0H95 EOG091B0D0T EOG091B0LWH EOG091B0HAR EOG091B0D83 EOG091B0LZT EOG091B0HEL EOG091B0D90 EOG091B0M3X EOG091B0HHK EOG091B0DNB EOG091B0M72 EOG091B0HKR EOG091B0DXD EOG091B0MD7 EOG091B0HQ0 EOG091B0E94 EOG091B0MHL EOG091B0HXR EOG091B0EBE EOG091B0MON EOG091B0I13 EOG091B0EJL EOG091B0NCC EOG091B0I7L EOG091B0ENL EOG091B0NDM EOG091B0I8Q EOG091B0ETO EOG091B0NON EOG091B0IN0 EOG091B0EW7

EOG091B0NU6 EOG091B0ITB EOG091B0EYJ EOG091B0IYX EOG091B0F3D EOG091B0J3B EOG091B0F3P EOG091B0J5B EOG091B0F49 EOG091B0JG7 EOG091B0FJV EOG091B0JHC EOG091B0FMJ EOG091B0JQA EOG091B0G0W EOG091B0JUE EOG091B0G3Y EOG091B0K27 EOG091B0G8S EOG091B0K5F EOG091B0GAK EOG091B0K6I EOG091B0GB0 EOG091B0KAM EOG091B0GQ2 EOG091B0KCZ EOG091B0GSO EOG091B0KFP EOG091B0GZX EOG091B0KHB EOG091B0HDQ EOG091B0KWH EOG091B0HDS EOG091B0KWW EOG091B0HEB EOG091B0L09 EOG091B0HJ0 EOG091B0L0T EOG091B0HKR EOG091B0L43 EOG091B0HTB EOG091B0LDD EOG091B0HUO EOG091B0LJ9 EOG091B0HXV EOG091B0LKL EOG091B0HYN EOG091B0M51 EOG091B0I5G EOG091B0MNT EOG091B0I7Z EOG091B0MON EOG091B0IF1 EOG091B0NDM EOG091B0IFA EOG091B0NEK EOG091B0ILW EOG091B0NF5 EOG091B0IYG EOG091B0NYH EOG091B0J35 EOG091B0O6J EOG091B0J89 EOG091B0O6M EOG091B0JCH EOG091B0OGK EOG091B0JLC EOG091B0OV1 EOG091B0JNB EOG091B0OW3 EOG091B0JOG EOG091B0JQA EOG091B0JR4 EOG091B0JS4 EOG091B0K0U EOG091B0K8J

EOG091B0KJ3 EOG091B0KWP EOG091B0LDD EOG091B0LDF EOG091B0LH1 EOG091B0LLS EOG091B0LYB

EOG091B0LZT

EOG091B0M51

EOG091B0MCB

EOG091B0MMM

EOG091B0NDM

EOG091B0NJR

EOG091B0OGK

EOG091B0P82

Duplicated Diachasma alloeum Microplitis demolitor Nasonia vitripennis Apis mellifera EOG091B00HF EOG091B00T6 EOG091B00YI EOG091B062G EOG091B00QB EOG091B01EG EOG091B01HK EOG091B06T6 EOG091B00XK EOG091B01MB EOG091B02L8 EOG091B08NC EOG091B00YD EOG091B01PK EOG091B04QI EOG091B090J EOG091B010X EOG091B01WA EOG091B05DA EOG091B091P EOG091B055D EOG091B01WP EOG091B05FP EOG091B0CK5 EOG091B05TV EOG091B0219 EOG091B05H3 EOG091B0DDD EOG091B05W4 EOG091B03C7 EOG091B05W4 EOG091B0EFZ EOG091B06T6 EOG091B03C9 EOG091B070E EOG091B0I8X EOG091B07N0 EOG091B03UA EOG091B085X EOG091B0K0J EOG091B08NC EOG091B04EL EOG091B08A4 EOG091B0L1H EOG091B08O3 EOG091B05L3 EOG091B08NC EOG091B0LHA EOG091B09CT EOG091B05RY EOG091B08OP EOG091B0O8Z EOG091B09WR EOG091B067T EOG091B095V EOG091B0ANS EOG091B06T6 EOG091B09MV EOG091B0AV8 EOG091B06VO EOG091B0CB5 EOG091B0CWT EOG091B07D7 EOG091B0CD5 EOG091B0DF4 EOG091B07PI EOG091B0CQ6 EOG091B0FA4 EOG091B07ZQ EOG091B0GJK EOG091B0FTF EOG091B08EL EOG091B0KB8 EOG091B0G8K EOG091B08J5 EOG091B0KCT

EOG091B0GSZ EOG091B08KF EOG091B0L87 EOG091B0J53 EOG091B08O3 EOG091B0LWG EOG091B0JUM EOG091B09EU EOG091B0MKL EOG091B0K1V EOG091B09VB EOG091B0KVN EOG091B09WR EOG091B0M05 EOG091B0ADZ EOG091B0MNR EOG091B0AG7 EOG091B0MVH EOG091B0ARA EOG091B0NCI EOG091B0BAO EOG091B0CAR EOG091B0CL4 EOG091B0COW EOG091B0D56 EOG091B0EIZ EOG091B0FY4 EOG091B0GIP EOG091B0HL1 EOG091B0ING EOG091B0IQS EOG091B0KCT EOG091B0KXK EOG091B0L8B EOG091B0M4Q EOG091B0MRN EOG091B0NAF EOG091B0NDG EOG091B0NND

Supplementary Data 2-S3. Sequences of mRNAs, rRNAs, and tRNAs in D. alloeum mitochondrial genome. >atp6 ATATTTAAAATGATATCAAATTTATTTACAATTTTTGATCCACAAATTTTAAAATTTTCAATTAAT TGAATTTCATCTTTAATAATTTTATTTATTATTCCTCAAGTTTATTGAGTTTTAAATTCTCGATTA AATTATATATTTATAAAGTTTATAAAAATTTTATGGATTGAGTATAAAGTTATTATAAAATTTAAT TTTAATTTAAATAATTTAATTTATTTTAATGTATTATTTATGTTTGTTATAACAAATAATTTTATG GGGTTGTTTCCTTACATTTTTACTAGGTCCAGACATTTAATTTTTTCAATAATTTTTTCTTTATC TATTTGGTTGGGTTTAATATTATTTGGTTGAGTTAATAATACTATTTTTATATTAGCACATTTAG TACCTCAAGGAACACCATTTGTATTAATATTTTTTATAGTTATAATTGAGTTGTTGAGAAGAAT TATTCGACCTTTAACATTATCTGTTCGATTAACAGCTAATATAATTGCGGGTCATTTGTTATTA ACATTGTTAAGAAGATTTATTCCAAATTTTTTTTTTTTATATTTAATTGTTTTGTTATTTCAATTA TTATTATTATTATTAGAATTAGCAGTTTCAATCATTCAATCTTATGTTTTTGTAATTTTGGTAATT TTATATTTAAAAGAAACAAATTAA

>atp8 ATATCTCCAATAGATTGATTTATATTATCATTATTTTTTTTATTAGTTTATTTTATAGGTTTAATA TATATATATTTTTTAATAAATATAAATTTTAAAATAGTTATTGTAAGAAAAAATGATTTTAATTTT ATTATAAATATTTAA

>cytb ATGAATAAGTCAATTATAAAAAAAAATAAAATTTTAGAGATTTTTAATAATTCTTTAATTAATTTG CCTTCGCCAGTAAATATTAGGGTTTGGTGAAATTTTGGGTCTTTATTAGGTTTATTTTTGATAA TTCAATTAATTTCTGGGGTATTTTTATCAATACATTATATTGCTCATTTAGATTTTTCTTTTATTA GTGTTATTCATATTATTCAAGATGTAAATTATGGGTGATTGATACGGTTAATTCATATTAATGG AGCTTCATTTTTTTTTATTTGTGTTTATATTCATATTGGTCGTGGTTTATATTATGGATCTTATA AACTTTTTAAGACTTGATTAATTGGGGTATTTATTTTTATATTAATAATGGCTATTGCTTTTTTG GGGTATGTTTTACCTTGGGGACAAATATCATTTTGAGGGGCTACAGTTATTACAAATTTATTA ACAGCTTTTCCTTATATTGGGTTAATGTTTGTTGAATGATTATGGGGAGGATTTTCTGTTGGT AATGCAACTTTGAATCGTTTTTATTCATTACATTTTTTAATACCTTTTATTTTACTTTTAATAGTT ATAATTCATTTAGTATTTTTACATGAGACTGGGTCAAATAATCCTTTGGGAGTAAGAAGGAAT AATTATAAAATTATTTTTCATAATTATTATTCTTTGAAAGATTTATTGGGGTTTATTTATTTTTTA ATAATTTTTATATTAGTTGTTTGTGAGTTACCTTATTTATTAGGAGATCCTGAAAATTTTATTAT AGCAAATTCTATAGTTACTCCAGTTCATATTCAGCCTGAATGATATTTTTTATTTGCTTATACA ATTTTACGTTCGATTTCCAATAAGTTAAGGGGGGTTATTGCATTATTAATATCAATTTTAATTTT AATAATACTACYTTATATTAATTTAAATAAATTTCAAGGTTTGTCTTTTTATCCTTTAAGTCAAAT TTATTATTGATTTTTTCTTAGGTGTTTAATATTATTAACTTGGTTGGGGGGTCAATCTGTTGAA TATCCTTTTGTTGAGTTAAGAAAATTTTTTACTTATTTTTATTTTTTATATTATTTTTTAAATAATT TATTAATAATTTATTGAGATAAATTAATTAAAGTGTAG

>cox1 ATGATAAAATGGATATATTCAACTAATCATAAAGATATTGGTGTTTTGTATTTTTTGTATGGGA TTTGAGCTGGGGTGTTAGGTTTATCTATAAGAATAATTATTCGATTAGAATTAGGGATACCTG GGAGATTGTTAAATGATCAGATTTATAATAGAATAGTAACTGCTCATGCTTTTGTTATAATTTT TTTTACAGTTATACCTATTATAATTGGAGGGTTTGGGAATTGATTAATTCCTTTAATATTAGGG GTTCCTGATATAGCCTTTCCTCGAATAAATAATATAAGATTTTGATTATTGAGACCTTCTATAA TTTTGTTAATATTGAGAATATTATTAAATTTAGGTGCTGGAACTGGTTGGACAATTTATCCTCC TTTATCATCTAGATTAGGTCATAGGGGGTTAGCAGTAGATTTATTAATTTTTAGTTTACATTTA GCTGGGGTATCATCAATTATGGGGGCAATTAATTTTATTTGTACAATTTTAAATATAAAGCTTT TCATAAAGTTTGAGCAATTAAGTTTATTTATTTGGTCAATTTTAATTACAGCTATTTTATTATTA TTATCTTTACCGGTTTTAGCTGGAGCTATTACAATATTATTAACTGATCGAAATTTGAATACAA CATTTTTTGATTTTTCAGGGGGTGGTGATCCTATTTTATTTCAACATTTATTTTGGTTTTTTGGT CATCCTGAGGTTTATATTTTGATTTTACCTGGATTTGGAATAATTTCTCATATTATTTATAATGA AAGGGGGAAGAAAGAAACTTTTGGGGTTTTAGGAATGATTTATGCTATATTAACAATCGGGTT TTTAGGTTTTATTGTTTGGGCTCATCATATGTTTACTATTGGGATAGATGTTGATACTCGAGCT TATTTTACTTCTGCAACAATAATTATTGCTATTCCTACAGGGATTAAAATTTTTAGCTGATTAG CAACATTAGGGGGGGTAAAAATAAAAATAAATTTAAGAGTGTATTGATCTGTGGGATTTGTAT TTTTGTTTACTATAGGAGGGTTAACAGGTATTATTTTATCAAATTCTTCTATTGATATTGTTTTA CATGATACTTATTATGTTGTTGCTCATTTTCATTATGTTTTATCAATAGGAGCTGTATTTGCTAT TATAGCAGGTTTTATTTATTGATATYCCTTATTTTTAGGATTATATTTGAATGAGATTTGATTAA AAATTCAATTTTTTTTAATATTTTTGGGGGTTAATTTAACATTTTTTCCACAGCATTTTTTAGGT TTGAGGGGTATGCCACGTCGATATTCAGATTATTCTGATATTTATTTAATATGAAATTTAGTAT CTTCTATTGGGTCTATAATTACTTTGGTTAGTATTATTTTTTTTATTTTTTTATTGTGGGAAAGA TTAATTATAGAACGAAATATTATTTTTGTTAAGTATATAAATTCATCAATTGAATGGTTTCATTA TTATCCACCAATAAATCATTCTTATAAACAATTACCTTTATTATTTAAAAAATAA

>cox2 ATAATAAATTTTCAAGATTTTAATTCTTATTTGAGAATATTAATAATTGAGTTTCATGATTTTTCT TTAATAATTTTATTAATAATTTTATTTTTTATTTTATATTTAATTATGTGGTTTTTTAAAAATAATT TTATTGATAAAAATATTTTACATAATCAAATATTGGAGATTATCTGAACAATTGTTCCTTTAATT GTTTTAGTATTTATAATTATTCCTTCATTGAAAATTTTGTATATGATTGAGGAATCATTAAATCC TTATTTAACTTTAAAAATTATTGGTCATCAATGATATTGAAGGTATGAATATAGTGATTTTTTTA ATTTAAGATTTGATTCTTTTATAATTAATGATTGAAGAAGGTTGGGGGTATTTCGTTTATTTGA TGTTGATAATCGATTAATTTTACCTTATAATTTAATAATTCGTGGAATAATTTCTTCAGTTGATG TAATTCATTCTTGGGCTATTCCTAGTTTAGGATTAAAAGTGGATGCTATTCCAGGTCGGATTA ATCAGTGTTTGATTTATTTGAATCGTTTAGGGGTTTATTTTGGGCAATGTTCTGAAATTTGTGG GCTTAATCATAGTTTTATACCAATTGTGATAGAAGGGGTTAAATTGAAGATTTTTTTTAATTGA

>cox3 ATGAATAAATTTTTTCATCCTTTTCATTTAGTTACTGAAAGGCCTTGACCAATATTATGTTCTTT TTTATTTTTAATTGTTATAGTAGGTTTTATTAAATTTTTTAATAATTTTAATATAAATATTTATATA ATTGGTAATATTTTATTAATATTTGTTGTTTTTCAATGATGACGTGACGTTATTCGTGAGAGAA TGACTCAAGGTAATCATACAATTAAAGTTGTGGATGGAATTAAATTAGGAATGATTTTATTTAT TTTATCTGAGGTAATATTTTTTATTTCTTTTTTTTGGAGATATTTTCATATATTTTTATCTCCTAG AATTGAAATTGGGAGAATGTGACCTCCTCGGGATATTTTAGTTTTTAATCCTTACAATATTCCT TTATTGAATACTTTAATTTTATTGAGATCCGGGGTTACCATTACGTGATGTCATTATTTAATTTA TAAGGGGGTTGATATTAAATATAGAGAGATTACTATTAATTTAACAATTGTTTTAGGGTTGTTA TTTATTGGTTTTCAGTATATAGAGTATAAGGAATCTTATTTTTCTATGGCTGATTCGATTTATG GATCAGTATTTTTTATAATAACTGGGTTTCATGGGGTGCATGTGATTATTGGGGTTATATTTAT TATAATTTCTATAAGACGTTTAGTGAAAAATCATTTTTCTAGATTTCATCATTTTGGATTTGAAG CAGCTTCATGATATTGACATTTTGTAGATGTAGTTTGGTTATTTTTATATATTTTTATTTATTGA TTATCTTTTTAA

>nad1 ATAAGTAATATATTAATACTACTAATAGTTATTATTTTAATTTTAATTAGGGTAGCTTTTTTAACT TTATTTGAACGTAAAACTTTAAGATATATACATTATCGGAAAGGGCCAAACAAAATTAGGTTAT GAGGAATTCTTCAACCTATAAGAGATGCAATAAAACTAATACTAAAAGAATTTTTCCACCCTAA CAAAACTAATTATAATTTTTATTTTATCTCACCAATATTAATATTAACCCTAATTTTATCATTATG ATTAGTATATCCCTTCAAAACAAACTTATTTAACTGAAATTTAAATAGCTTATTTATTCTAGCTC TTATAAGAATAGGAGTATATGGAATAATAATAGCTGGCTGATCTTCAAACTCTTGTTTTTCAAT ATTAGGAGCCATTCGCTCAATTGCACAATCCATTTCCTATGAAGTAACTTTCACTATTTCTTTT TTATTAAGATTATTTATAATTAACTCTTTAAACTTAAATAATATCCTTTATTTTCATAAATATAAC TCAATAATTTATTTTATATGACCAGTCAGAATAATATTTTTATTAAGAATATTAGCTGAACTAAA TCGTACTCCTTTTGACCTTTCAGAAGGAGAATCTGAACTAGTATCAGGATTTAATGTTGAATA CAGAAGCCACAGATTTGTTTTAATTTTTATTTCAGAATATTCAAGAATTATTTTTATAAGATACC TATTTAACTTAATTTACTTATGTAAAAACTCTATATATATCATATTTTATTTAACTTTAATTTTATT AATTTATTCAATCATCTGAACACGAGTAACATTACCTCGTATTCGTTACGATTTACTAATATTC TATTGTTGAATTTTTATCCTTCCTTTAATTTTAATTATGTTTTTATTATATATATTATTTACAAAAT TTTCTTTAGAAATAATAATTCTCTTAAGTATAAAAAGATAA

>nad2 ATAGTTTATATGAAAAAATATAATTGATTATATTTATTTTTTTTAGTTTTAAGAGCGTTAATTTTA TTATTTTTAAATAATTATTATTCTATGTGAATTTTTATGGAGTTAAATTTATTAGTTTTTATTACT TTAATAGTAATTAATGGGGCTTATATTAGTGATAGGGCTATAAAATATTATTTATTAAATAGATT TAGCTCAATAATTTTTTTATTTTTTATGAATTTAAATATAATATTTGATAATAGAATATTTTTATT ATTAATAAATATAATGATTTTAATTAAATTGGGTATATTTCCTTTTCAGTTTTGGTTTATTGATAT GTTATTACAGTTAGATTGATTAATATGTTTTGTTTTAATAGGGTGACAAAAATTAATTCCAATAT TAATTTTAATAAATATTTATAACATTAATTTATTATTTTTAGTTTCAATTTTAGGGGGAGGGTTT AGTTTATTAATGGTTTTTAATCAAATTTTGTTGAAAAAAATTTTGGGGTATTCTTCTTTAAATCA TATATCCTGGATGTTAATTTCCTTGATTGTTAATGTAGATTTGTTTTTAGTTTATTATATTAGTT ATATTTTTATTAATTTTGTAATTATTTTATTAATGTGAAAATTAAAATTGGAAGAAATTAATATAA ATTTATTAAAAGTAAATATAAATTATAATTATTTTATTATTTTTATTATTTTTTCTTTAGGAGGGG TTCCTCCTTTATATGGTTTTTTTATAAAATGATATTTTATTTTAAATATAAGTGTATGAATAAATT TTATAATAATGATTTTATTAGTTTTTTATTCTTTAATTTTTTTATTTTATTATTTACGATTAGTTTT AAATTTTATAATGTTGAATTATTTAATAATAAATCTAAGTTTATTTAATAATTTGATTTTAAAAGA AAAAAATTATAATTTTTTTCTTTTAATGAATTTATTTATTATTGTTAATTTAATAATATTTTTTATA TAA

>nad3 ATAATTTTTATTATTATTTTATTTTGTGTTATTTTTATGTTAGTAAATATAATAATTTCTAAAAAG GATTGGTTAGATCGGGATAAAAATTCTTCTTTTGAATGTGGGTTTGATCCTTTAAATTCTTCTC GATTACCTTTTTCTATTCATTTTTATTTAATTGGAATTTTGTTTTTGATTTTTGATGTTGAAGTAA TTTTTTTATTTCCTATAATTAATTTATATAAATATTTAAATTTATATGAATGGTTATTTTTAGGTTT TATGATTTTGATAATTTTATATTTAGGGTTAGAATTTGAAAAATTAGAAGGATCATTGAAATGA ATTTTTTAG

>nad4 ATGATAAAACTAATTTTATTTATATTATCATTAACCTTATTCCCTATAATTAATAAATATTTATGA AATTTTTTAATTATTAACTTTATATTTATTTATTCATCTATATTTATAAATAAACTTTATTTTTTAA ATTATTATTGATCTAGAATTAATTATATATTTGGATTAGATAATCTAAGATTTTCTTTAATCTTGT TAACCTTATGAATTACACCCCTGTCAATTTATTCTTTTAACATAAATAAAAAATTATTATTCTAT AACTTAATAATTTTTATAATAATTACTCTAATTTTATCTTTTATATCTATAAACATAATAATTTTTT ATATTTTCTTTGAAACAAGATTAATTCCTATTATTTTCATTATTATAGGATGAGGATTCCAAATT GACCGTATTCAAGCTAGAATATATATATTATTTTACACTTTATTTGGTTCCCTACCATTATTAAT AATCATTATATATTTATATTCACACATAAACTCTATTATAATAAATATTTTATGAATAAAAAATCT AAGATCCTTAAATAATTTAATTAGATTTATTATATTAAATTTAGCTTTTATAGTCAAAATACCWA TATATATTCTTCATTTATGGCTCCCTAAAGCTCATGTTGAAGCTCCAGTAAGAGGATCAATAA TTTTAGCAGGAATTATACTAAAACTAGGAAGTTACGGTATTTATCGCTCTATATTAATTTTACC AAAATTAATAATAAACTTTAACCAATACATTATTATCATAGGATTAATTGGAAGATTAATCTCAA GATTAATCTGCTTAAACCAAAATGATTTAAAAATTATTGTAGCCTATTCCTCCGTCGTTCATAT AAGAACACTATTAGCTAGAATATTTACTTTATCAAAATTTAGATTTAAAGGAAGATTAATAATAA TAATTGCCCATGGTTTATGTTCATCAGGAATATTCTTTATTGTAAACTTAAATTATGAACGTTT AAAATCCCGTAATACTTTAATTAATAAAAACCTCCTAAATATTTCCCCATCATTAACTTTATGAT GATTTTTAATATGTTCTTCAAATTTCTCAGCTCCCCCATCATTAAATTTAATTAGAGAAATTTTT ATTCTAAATAGATTAATCCTATGAAATAATTGATTAATTTTTATTATTATTTTAATTTCATTTTTT AGAACTTGCTATTCAATTTTACTTTACGCTTTTTCACAATATGGAAAATTAAATTTTTCATTTTT TAATTATTCAAATATTAATAAAAAAGAATATTTAATTATTATTCTTCATTGATTACCCCTAAATTT AATATTTTTAAATTTAAACTTATTAATTTAA

>nad4L ATAAATATATATATATATATATTATTAATTTCATCTTTTTTATTTACTTCTTTCTATCAACATTTAT TACTGGCTTTAATTAGACTAGAATTTATATTAATTAACCTTTTAATAAGAATATATATAACTTTA TTAAACTTAAATATAAATTTTTATATCATTACATTTTTTATCGGAGTAATAATCTGTGAAAGAGT TATAGGATTATCAATTATAGTTTACATAATTCGGAATTCAGGCAATGATTATATTACCCAATTA AGTTTACTAAAATGATAA

>nad5 ATATTATTTATAAATTTAATTTCATGAACTTTTTCTATTTTATTCATTTTATTAAAAATAAAATTTA TTGTAGAATGAAATATAATAATAATTAATACTATTAAAATTAATTATATTTTATATTTTGACTGAA TAACCTTAATATTTTCTAGAACAGTCTTAATAATTACTTCCATAGTTATATTATATAGATTAAAT TATATAGAAAAAGATTACTCTATCAAACGATTCATATTTTTAATTATAATTTTTATAGCCTCAAT AATTTTAATAATCTTTAGGCCTAACATTTTTAGGATCCTTCTAGGATGAGATGGATTAGGTTTA TCTTCATATTGTTTAGTAACATACTATATAAATAAAAAAAGATTTAATTCTAGAATAATTACAAT TTTAATAAATCGAATTGGTGATATTATAATTTTAATTTTAATTGGATTAATAATAACATTTGGAT CTTGAAATTTTATATTCTTTAAAAAAATAAGAATATTTATATTTATATTTGTATTAATTGCAGCCA TCACAAAAAGAGCTCAAATTCCTTTCTCATCATGATTACCCATAGCTATAGCAGCCCCAACCC CTATTTCATCATTAGTACATTCATCAACATTAGTAACAGCAGGAGTTTATTTAATATTACGATT TAGATATTTATTTAATAACCACTTTTTATACTTTGTTTCTATTATATCATCCTTAACTATATTTAT AGCTAGATCAAGAGCAACTATAAAATTTGATATAAAAAAAATCATTGCACTTTCAACCTTAAGT CAAATTAGATTAATAATTTTAACAATTACAATAAATTTACCTAAACTAGCATTTTTCCATTTAAT TACACACGCAATATTTAAATCATTAATTTTTCTTTGTTCAGGAATTATAATTCATAATTTTTATA ATCATCAAGATATTCGATTTATATCTTTTATAAATTTAAACTTACCTTATACTAATTTAATTTTTA ATATCGCATCACTAACTTTATGTGGATTACCCTTTCTTTCAGGATTTTATTCAAAAGATCTAAT TATTGAATTTTTTTTAATAAAAAATTTTAACTGATTAATATTTATTATAATATTTTTATCTATGGR GTTAACAATTACATACTCATTACGAATAATTTTTTACATCTCAATAAAAAATACAAAAATTTCCT TAATAAATTTACCTAAATCTTTTAATTTAATAAATTACTCAATTATTTTATTATTATTTATATCCAT AGTTTATGGTTCAATTTTAAATTGATTAATATTCTCCTCCCTAAATAAAATTTTTCTACCTATAA ATTTAAAATTAATTATTTATAAATTTATAATTTTAGGTATTATATCAGGACTCACCTTATCTTATA TTAAAATTAATTTTTTTAAAATAATTTTTTTAAAAAAAATTTATTACTTTAACAATATAATATGAAT ACTACCTAATTTATTAAAAAAAAATAAAAATAATCTAATAAATTTTAATAATAAATTAATTTTTTT CAGAGAAACCTCTTGATTAGAAATAATTTCATCTAAAATTTATATCTATTATTTAAAATTATTTT TTAATAAAAAAATTATTCACAAACTTAATTTTTTCTTAACTATTTTAATAACATTTTATATAATTA TAATTATC

>nad6 ATGAAAAATTTTGATATTAATATTTTTATTTTATTAGTGACATTAGTGGATTTAATTTTAATAATA ATTTTAATTATTCCAACGAATATATATAGTTTTCATCCCTTATCTTTGGTAATTTTATTAATTTTA TATACAATTATATTAAGATTTAAAATAAGGATTTTAATTAATAGATATTGGTATTCATATATTATA TTTTTGGTTATAATTGGGGGTTTAATAATTTTATTTTTATATTTTACTAGATTAATTAATAATCAA ATATTTTATTTTAATCAGAAATATTTTTTTTATTTTTTATTAAAGCTTATTTTAATAATTTTATTAT TATTAATAATTTTATATTTTAATAATTTTTATAATTTTATATATWTAGATTTTTTTGAAATTAATAA TTTATATAATTTTTTAAATCAGATTTTAATTTTATATAAAAATACAATAATAGATTATACTTTAGA TATGAATATATATATTGTTATATATTTATTTTTTACAATAGTTTGCTCAGTTGTAGTTTGTATAAA AGTTAATATTCCTTTACGACAAATTTTAAAATTTAATTAA

>rRNAL TTTATAAAAATAAAAAAAAAAATTAAATTAAACTAAAATTTTATAAATNAAAAAAAAAAAATAATT AAGAAAAATTTATTATTTATTCCTTTTGTTTCAGAATAATTTTAAATTTATAAATTATTTTAAATT CTCGAAATAAAAAAATTTAAATAATTATATAACTTAAWATAAAAATATTATTAAAAATAATTATT TAGAAAAAATATTTTAAACAATTTTTATAATATCTAGTTATTTAATAAAAAAATTTAATTTTTCAA TAAATTCTCATTTATAAATTATTTAATTTTAATTTATAAAAAATTTATTAAAAAAATTTAAGGGAT AAACTTTAAATTTAAATTTATAATTTTTATTTAAATTACATTAATTATAAAAATTAAAATAATTATT AATAAAAAAAATATAATAATTTATAAATTATTTATAAAAATTTATATTAAATTTAATTATATATTAA ATTATTATGATAAATAAATATCCATAAAAAATAATAAATTAATTAGTAAATAAATTTTAAATTAAA TAATTAAAATTTTTTAAAAATTTAATTAATGAATTAGGCAAAATTTATATTCACCTGTTTAACAA AAACATGTCTTAAAGATCATTAATTTTAAGTCAAATCTGCCCACTGATTATATATTAAATGGCT GCAGAATTTATAACTGTACAAAGGTAGCATAATAATTTGTTTTTTAATTAAAAACTAGAATAAA AGATTTAATGAAATATAAACTTTATAAATTAAAAAAAATTAACTTTTTATTTAAGTAAAAAAACTT AAATTAATAAAAAAGACGACAAGACCCTATAGAATTTTATTTTTATTATAAATTAATCAAATATT TTATTAAATATAAATTATAATGAAAATTTAATTGGGGTAATTAAAAAATTAAATAAATTTTTTCAA TAAAAACCATTAATATATGAATAAAATATCTTAAAATTTAAATTATTAAAATTAATTACCTTAGG GATAACAGCATAATTTTTTTTAAAAGATCTTATTAATAAAAAAGATTATGACCTCGATGTTGAA TTAAGATAAAAAATAAATGAAAAAATTTATAAATTTAAGTCTGTTCGACTTTTAAAATCTTACAT GATTTGAGTTAAGATCGGTGTAAGCCAGATCAGATTCTATCTCTTTAAAAACTTTATATAATTG TACGAAAGGAATCTATATAAAAAATAATTTTTATTAAAATAATTAATATACTTAAA

>rRNAS ACAAATAACTTAATTTTAATTTTATTAATTAATTATAATTTTTCCTTAWATATAAATTTTTGAAAT AAATTAAATAAGAATATATTTAATATATAAAATTAATTAATTAAAGAAATTTTTAATTAAAAAAAT TATTAACTTAAATAAAAAATTGTGCCAGCAATTGCGGTTAAACATTTTAAGTAAATTAATTACC CCAATTAAAATAAAATATTAAAAAATTAAATTTAAATATAATTTTATAATAAAATATTAATTATAT AAAAATTTAAAAATTAAAAAAATTAAATTTAAAAATTTTTTAAAAAACTAGGATTAGATACCCTA TTATTAAAATAAAACCATAAATATTAAAAAATTAAATGAAAATCACTAAATTTAAAAAATTTGAC AGTATATTATTTTATTAAAGGAATATATTTTATAATAGATAATAAACTAAAATAAAACCTTAAATT ATATGTTTATATATTGTTGTTTAAAAATTTAATTCAAAAATTAAAAATTTAAATATTTAATAAAAT AAAAATTCAAATCATAATATAGCTTATATTTAAGAAAAAAAGTATTACATTAAATAAATAATAAT TTTAAAAATATAATTTTTTAAAAAAAAAGAATTTAACATTAAAATTAATTAAATTTTTTAATTTAA ATATATAATAATATATGTACATATTGCCCGTCATTCTCAAAAATGAGACAAGTCGTAACAAAGT AAAATTACTGGAAAGTAATTTTAGAAATTA

>tRNA_A AGAATAATAGTTTAATATAAAATTTTTAAATTGCAATTAAAGGATATCTAAGGATTTATTTTA

>tRNA_C TAATTTTTAGTTTAAAAAAAACATTAAATTGCAAATTTGAAAATATAAATATTATATAAAGTTT

>tRNA_D AAGAATTTAGTTAAATTATAACATTAAATTGTCAATTTAAAGTTATTAAAATTATTAATATTTCTT A

>tRNA_E ATTTATATAGTTTAAAAAAAACATTATATTTTCACTATAAAAATAATATTTTATATTTATAAATT

>tRNA_F ATTTAAATAGCTTAAATTTAAAGTTTAATAATGAAAATATTAAGATGATTTATAAATCTTTAAATA

>tRNA_G ATTTATATAATATAAAAAGTATATTTAATTTCCAATTAAATTGTTTAAAAATTTTAATGTAAATA

>tRNA_H ATTTAAGTAGTTTAATTAAAATATTAATTTGTGGTATTAAAGATTTAATAAAATTAATTTAAATT

>tRNA_I AATGATATGCCTGAATAAAGGATTATTTTGATGTAATAAAACATGTTTAACTAAAACTTTCATT A

>tRNA_K CATTAAATATCTTAAATATAAGAATTGAATTTTTAATTCAAATTATAATAAATTAATAATTATTTT TAGTGA

>tRNA_L1 ATTACTTTGACAGACACCCATGTAATAAATTTAGAATTTATAAAAGTAAACAATTTACAAGTAA TA

>tRNA_L2 TTTAATATGGCAGATTAGTGCAGTGAATTTAAGATTCATATATAAAAGTAATTTTTTTATTAAAA

>tRNA_M GTAAAGATAAGCTAAACTTAAGCTATAAGGTTCATACCCTTAGTATAGATTATAAATCTTCTTT ATA

>tRNA_N TTAATTGAAACCAAATAGAGGTAAGTTATTGTTAATAATTTAAATGAATATATTTATTCCAATTA AA

>tRNA_P CAAAAAATAGTTTAAATTAAAACATTAATTCTGGAAATTAAAATTATAATATATTGTTGTTTTTTT GA

>tRNA_Q TATATTTTAATGTAATAAGCATAAAAATTTTTGATATTTTAAGAATTAGATTTTCCTAATAAATAT AA

>tRNA_R AATTAAGAAATAAATATTTATATTCAATTACGGTTTGAAATTTTGGTTTTAATTAATCCTTAATT G

>tRNA_S1 AAAATAAGTATGAACATCAAAAATTTCTAATTTTTAAATTTATGGTTAAAATCCATTTTTATTTTA

>tRNA_S2 AGATAATGAACTTGAATAAGTATATATTTTGAAAATATAAAATAGAATTAATTTTCTATTATCTT

>tRNA_T AGCTAAATAGTTTAAACCAAAATATTAATTTTGTAAATTAAAGATATTTTATTAATTTTAGCTT

>tRNA_V AAAATTTTAGTTTAATAAAAATTTAATATTTACAATATTAAGATATAAAAATTTATAAATTTTA

>tRNA_W AAAATTTTAAGTTAAAAAAACTTTAAATTTTCAAAATTTAAAATATAGATAAATTTTATAAATTTT A

>tRNA_Y AGTAAAGTGTCTGATTTAAGTTATAAATTGTAAATTTATATAAAAGATACAAACATCTTCTTTAC TA
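
The listing above can be read programmatically. The following is a minimal sketch, assuming each record appears as `>name` followed by whitespace-separated sequence chunks and that page furniture (page numbers, running titles) has already been stripped. The function names `parse_records` and `ambiguous_positions` are hypothetical helpers introduced here for illustration; they are not part of the annotation pipeline used in this thesis. The second helper reports IUPAC ambiguity codes, a few of which (for example Y, W, R and N) occur in the sequences above.

```python
# IUPAC nucleotide codes other than the four unambiguous bases A, C, G, T.
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def parse_records(text):
    """Return {name: sequence} for '>name CHUNK CHUNK ...' records.

    Assumption: the first whitespace-delimited token after '>' is the
    record name and every following token is part of the sequence.
    """
    records = {}
    for chunk in text.split(">")[1:]:
        tokens = chunk.split()
        if not tokens:
            continue  # a bare '>' with nothing after it
        name, pieces = tokens[0], tokens[1:]
        records[name] = "".join(pieces).upper()
    return records


def ambiguous_positions(seq):
    """Return (0-based index, code) pairs for IUPAC ambiguity codes."""
    return [(i, base) for i, base in enumerate(seq) if base in IUPAC_AMBIGUOUS]


if __name__ == "__main__":
    # Toy input with made-up names and sequences, not data from this thesis.
    demo = ">geneA ACGT ACYT\n\n>geneB TTWA\nGG\n"
    for name, seq in parse_records(demo).items():
        print(name, len(seq), ambiguous_positions(seq))
```

Joining the chunks removes the spaces left over from the original line wrapping, so sequence lengths can be compared directly against the expected gene lengths.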

Supplementary Data 2-S4. Annotation of nuclear-encoded oxidative phosphorylation genes in D. alloeum. Red highlighting indicates that the gene was not found. Yellow highlighting indicates that a gene duplicate was found. An asterisk indicates that only a partial gene was found.

OXPHOS Complex | Subunit | Symbol | Coordinates | Accession
I | 13 KDA SUBUNIT A | ND-13A | NW_015145852.1:11765-15678 | XP_015123625.1
I | 13 KDA SUBUNIT B | ND-13B | NW_015145070.1:1608854-1611160 | XP_015112112.1
I | 15 KDA SUBUNIT | ND-15 | NW_015145544.1:60115-60426 | N/A
I | 18 KDA SUBUNIT | ND-18 | NW_015145474.1:187532-198523 | XP_015121464.1
I | 19 KDA SUBUNIT | ND-19 | NW_015145414.1:66023-68474 | XP_015120802.1
I | 20 KDA SUBUNIT | ND-20 | NW_015145034.1:551626-554281 | XP_015109823.1
I | 23 KDA SUBUNIT | ND-23 | NW_015145039.1:796245-799528 | XP_015111197.1
I | 24 KDA SUBUNIT | ND-24 | NW_015145114.1:790039-796653 | XP_015115395.1
I | 30 KDA SUBUNIT | ND-30 | NW_015145364.1:247847-252944 | XP_015120439.1
I | 39 KDA SUBUNIT | ND-39 | NW_015145079.1:970653-974791 | XP_015112880.1
I | 42 KDA SUBUNIT | ND-42 | NW_015145100.1:160256-163226 | XP_015114735.1
I | 49 KDA SUBUNIT | ND-49 | NW_015145170.1:1131632-1137436 | XP_015117867.1
I | 51 KDA SUBUNIT | ND-51 | NW_015145020.1:1206416-1213320 | XP_015127479.1
I | 75 KDA SUBUNIT | ND-75 | NW_015145468.1:51378-55424 | XP_015121331.1
I | B8 SUBUNIT | ND-B8 | NW_015145128.1:35-1772 | XP_015115950.1
I | B12 SUBUNIT | ND-B12 | NW_015145033.1:142588-143017 | XP_015109658.1
I | B14 SUBUNIT | ND-B14 | NW_015145060.1:670567-671740 | XP_015111659.1
I | B14.5A SUBUNIT | ND-B14.5A | NW_015145169.1:251014-259437 | XP_015117745.1
I | B14.5B SUBUNIT | ND-B14.5B | NW_015145178.1:277175-280183 | XP_015117999.1
I | B14.7 SUBUNIT | ND-B14.7 | NW_015145068.1:991747-995522 | N/A

I | B15 SUBUNIT | ND-B15 | |
I | B16.6 SUBUNIT | ND-B16.6 | NW_015145060.1:680974-683067 | XP_015111553.1
I | B17 SUBUNIT | ND-B17 | NW_015145023.1:1132829-1133488 | XP_015127855.1
I | B17.2 SUBUNIT | ND-B17.2 | NW_015145025.1:1109847-1111745 | XP_015108874.1
I | B18 SUBUNIT | ND-B18 | NW_015145153.1:945274-947100 | XP_015116892.1
I | B22 SUBUNIT | ND-B22 | NW_015145017.1:1138739-1141763 | XP_015127094.1
I | ASHI SUBUNIT | ND-ASHI | NW_015145012.1:713232-715471 | XP_015126795.1
I | AGGG SUBUNIT | ND-AGGG | NW_015145005.1:354471-356806 | XP_015117269.1
I | MLRQ SUBUNIT | ND-MLRQ | NW_015146075.1:9368-9791 | XP_015124414.1
I | MNLL SUBUNIT | ND-MNLL | |
I | MWFE SUBUNIT* | ND-MWFE | NW_015145173.1:414353-414451 | N/A
I | PDSW SUBUNIT | ND-PDSW | NW_015145481.1:79369-85511 | XP_015121503.1
I | SGDH SUBUNIT | ND-SGDH | NW_015145126.1:158325-160454 | XP_015115899.1
I | ACYL CARRIER PROTEIN | ND-ACP | NW_015145035.1:261759-265055 | XP_015110175.1
II | FLAVOPROTEIN SUBUNIT | SDHA | NW_015145024.1:1948100-1955533 | XP_015108523.1
II | FLAVOPROTEIN SUBUNIT | SDHA.2 | NW_015145024.1:1941049-1946279 | XP_015108501.1
II | IRON-SULFUR PROTEIN | SDHB | NW_015145044.1:1043948-1045737 | XP_015111410.1
II | IRON-SULFUR PROTEIN | SDHB.2 | NW_015145044.1:345243-347079 | XP_015111397.1
II | CYTOCHROME B560 SUBUNIT | SDHC | NW_015145126.1:1069985-1072442 | XP_015115823.1
II | CYTOCHROME B SMALL SUBUNIT | SDHD | NW_015145011.1:1917531-1919735 | XP_015126467.1
III | 6.4 KDA PROTEIN* | UCRY | NW_015145227.1:2549-2660 | N/A
III | 7.2 KDA PROTEIN | UCRX | NW_015145170.1:659045-662834 | XP_015117834.1

III | 11 KDA PROTEIN | UCRH | NW_015145158.1:300991-304268 | XP_015117151.1
III | 14 KDA PROTEIN | UCR6 | NW_015148955.1:2819925-2823183 | XP_015126174.1
III | IRON-SULFUR SUBUNIT | UCRI | NW_015145090.1:240207-24208 | XP_015113994.1
III | CYTOCHROME C1, HEME PROTEIN | CYC1 | NW_015145003.1:1694866-1699987 | XP_015111448.1
III | CORE PROTEIN 1 | UCR1 | NW_015148955.1:1168217-1172707 | XP_015126203.1
III | CORE PROTEIN 2 | UCR2 | NW_015145078.1:586076-589906 | XP_015112741.1
III | UBIQUINONE-BINDING PROTEIN QP-C | UCRQ | NW_015145002.1:3221224-3223021 | XP_015108778.1
IV | SUBUNIT IV | COX4 | NW_015145616.1:162193-165209 | XP_015122400.1
IV | SUBUNIT IV | COX4.2 | NW_015145005.1:2875760-2879532 | N/A
IV | POLYPEPTIDE VA | COX5A | NW_015146079.1:30641-33277 | XP_015124426.1
IV | POLYPEPTIDE VB | COX5B | NW_015145616.1:193430-201377 | XP_015122412.1
IV | POLYPEPTIDE VIA | COX6A | NW_015145533.1:120256-121197 | XP_015121813.1
IV | POLYPEPTIDE VIA | COX6A.2 | NW_015145106.1:593739-595086 | XP_015114963.1
IV | POLYPEPTIDE VIB | COX6B | NW_015145481.1:66735-79093 | XP_015121504.1
IV | POLYPEPTIDE VIC | COX6C | |
IV | POLYPEPTIDE VIIA* | COX7A | NW_015146008.1:85795-85929 | N/A
IV | POLYPEPTIDE VIIC | COX7C | NW_015145038.1:1356099-1357318 | XP_015110841.1
V | ALPHA CHAIN | ATPA | NW_015146007.1:5205-8320 | XP_015124302.1
V | BETA CHAIN | ATPB | NW_015145060.1:332331-336543 | XP_015111622.1
V | GAMMA CHAIN | ATPG | NW_015145111.1:3513-118664 | XP_015115256.1
V | DELTA CHAIN | ATPD | NW_015146001.1:104853-105912 | XP_015124263.1
V | EPSILON CHAIN | ATP5E | NW_015146484.1:70937-72643 | XP_015124855.1
V | B CHAIN | ATPF | NW_015145002.1:1497397-1500275 | XP_015113524.1

V | D CHAIN | ATPH | NW_015145087.1:254719-259001 | N/A
V | E CHAIN | ATPJ | NW_015145004.1:3601247-3602712 | XP_015113602.1
V | F CHAIN | ATPK | NW_015145023.1:381669-384663 | XP_015127799.1
V | G CHAIN | ATPN | NW_015145011.1:208021-208841 | XP_015126663.1
V | COUPLING FACTOR 6 | ATPR | NW_015145625.1:64496-66377 | XP_015122464.1
V | LIPID-BINDING PROTEIN P1,P2,P3 A | AT91 | NW_015145098.1:1434056-1439969 | XP_015114479.1
V | OLIGOMYCIN SENSITIVITY CONFERRAL PROTEIN A | ATPO | NW_015145419.1:196592-200557 | XP_015120896.1
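
The Coordinates column above follows the pattern scaffold accession, colon, start, hyphen, end. The following is a minimal sketch of how such a field could be split and sanity-checked, assuming coordinates are 1-based and inclusive (an assumption; the table does not state the convention). `Interval`, `parse_coordinates` and `span_length` are hypothetical names introduced here for illustration. The check for an end coordinate smaller than the start is included because the UCRI row lists 240207-24208, which may reflect a truncated end coordinate.

```python
import re
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    scaffold: str
    start: int
    end: int


# Accession such as NW_015145852.1, then ':start-end'.
COORD_RE = re.compile(
    r"^(?P<scaffold>[A-Z]{2}_\d+\.\d+):(?P<start>\d+)-(?P<end>\d+)$"
)


def parse_coordinates(field: str) -> Interval:
    """Split 'scaffold:start-end' into its three parts."""
    match = COORD_RE.match(field.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate field: {field!r}")
    return Interval(match["scaffold"], int(match["start"]), int(match["end"]))


def span_length(interval: Interval) -> Optional[int]:
    """Span in bp under the 1-based inclusive assumption.

    Returns None when end < start, so suspicious rows can be reviewed
    by hand rather than silently producing a negative length.
    """
    if interval.end < interval.start:
        return None
    return interval.end - interval.start + 1


if __name__ == "__main__":
    for field in ("NW_015145852.1:11765-15678", "NW_015145090.1:240207-24208"):
        interval = parse_coordinates(field)
        print(interval, span_length(interval))
```

Returning `None` instead of raising keeps the remaining rows usable while flagging entries that need to be checked against the genome assembly.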

Supplementary Data 2-S5. Sequences of chemosensory genes in D. alloeum.

Odorant receptors

DallOr1 MMKTKHQGLVADLMPNIRLMQISGHFMFNYYGEGKKLMHKVYCSVHLFLIILQFGLCGINLA MESGDVDDLTANTITVLFFLHSVVKVVYFAVRSKLFYRTLAIWNNPNSHPLFAESNARYHSIALT KMRRLLFCVGAATVLSVICWTGITFFEDPHKTITDPITNETSTIEIPRLMVRSFYPFDARHGVAHI AMLVFQFYWLMITMVDANSLDVLFCSWLLFACEQLQHLKAIMKPLMELSATLDTVVPNSSELF KAGSAHLRETNGTQPSATPQQGDNMLDLDLRGIYSNRQDFTATFRQAAGMQFNGGVGPNG LTKKQEMLVRSAIKYWVERHKHVVRLVTAIGDAYGVALLFHMLITTITLTLLAYQATKVNGVNV YAATTIGYLLYSLGQVFLFCIFGNRLIEESSSVMEAAYSCHWYDGSEEAKTFVQIVCQQCQKAM SISGAKFFTVSLDLFASVLGAVVTYFMVLVQLK

DallOr2 MESPAVPRGYKNTASEADINYVVKISRTLLTPIGIYPLHGSETAISDFLVAIQIIFVFSLMLFLLVPH LIFTYWDAEDLTKLMKIIAAQVFNSLALIKFWTMIINKKPLRWCLEELQDSWRSVSCEEDKEVMI KNAKTGRFLTVAYLSLSYGGALPYHIVMPLVAERIVKPDNTTQIPLPYPCDYVFFVPEDSPGYEM LFVTQIIISSLILSTNCGIYSLIATYIVHACCLFEIVARHLDQVGTGEGEMAERLTAVIQRHIFAMKY ALTLEKSLNIVFLSEMLGCTVIICFLEYGILLDFAEGNYLGMVTYVVLMTSILVNVFILSYIGDRLKE QSSKICEHTYDLEWHSLPKNIVNDLMFIMVRSNQPVTLTAGKLFDVSLAGFADVVKTSAAYLNF LREVV

DallOr3 MAGRVRHARTPRRDQINNGNRVGSAFEADVDYAIGIIRWLLKPLGLWPKTANSTRSERLTAIFL MSVCTFLLGFLIVPGSLFAFVKIKNPAVRLKLTGPLSFCVMGIMKYYSLVVERRNIASCIQHMTA DWQKAVSTHDRDIMLTYAQFGRYGATICTGFMYSGGLFYAVILPYVSAGARTEGNGTVRSLAY PSHYVLFDPQVSPAYEVVFSTHCCCAFVMHSITSATCSLAVVFAMHACGQLEILIVWLNDLVDG SGQAEDRLSEVVEQHVRTLSFIIRAEGVLREICLVEICGCTLNICFIGYYLMMEWAQADAVGITT YTILLISFVFNIFLFCYVGELLTEECKKVGETTYMMEWYRLPDKKALGLTLVISTAQHPVTITAGG MIALSLSSFCTVIRTAVTYLNLLRTLMD

DallOr4 MRDEFAVLGAVPAHSERSKDDMKYSTELNRWFLTPIGIWPTRSDTHIIEKILSEFLVFLSCFLICF QLVPCLLHTFIKEQNPKLKMKMIGPLSFGFMSISKYLFMVARKQEIGYCLEHIDVDWRRVAYLD DREIMLMNAKLGRFIAWLCAAFMYGGGLFYHTVMPFAAGSFVTPDNITIRPLVYAIYDPLFSA QTTPSYEIVFTIQCFSGFVVDTVTIGACSLAAVFVLHICGQLKIVRSRLESFVKGRIYERKNVEGR MAEIIELHLRALGFVVRVEGILTEVCFIELVGCTMNICFLGYYFMTEFDQNAAIGTITYCILLVSFIF NIFIFCYIGETLTQQEDKVGRTAYMIKWYELPPKSARDLIFLLAMTKTPICITAGKMAELSFQSFS GVLKTAAAYLNLLRTLLM

DallOr5 MRDEFAVLGTVPAPSERSKDDMKYSTELNRWFLTPIGIWSTRSDAHIIEKILSELLVFLSCFLICFL LVPCGLHTFIKEQDPKLKMKMIGPLSFCLMSITKYLLLVARRREIRHCLEHIDIDWRRVRYLDDRE IMMMNAKLGRFIACLCAVFMYGGGFFYHTIMPFAAGSFVTPDNITIRPLVYAIYDPLFSAQTTP AYEIVFTIQWFSGFVNYSVTIGACSLAAVFVLHICGQLKIVSSRLESFVKGRTDERKNVESRMAEII ELHLRALGFVVRVEGILNEICLIEFVGCTMNICFLGYYFMTEFEQSAAIATVTYCVLLVSFTFNIFIF CYIGEMLTQQGDKVGRTAYMIKWYELPAKSARGLILLLAMTKNPASITAGKMAELSFRSFCGVL KTAAAYLNLLRTVVM

DallOr6 MKNQSPILGAIPAPSERSRRDFQRATNINRWLLTPLGIWLREIDIDVTEKVLSGLAVIVCYFLIFSL LIPCALHTFLVEKSARKQMKMIGPMSFCVMAIIKYFFMIMRREKIRVCFNHLDVDWRRVKSPQ DREIMMSDAKIGRFIASLCATFTYSGGFFFRTILPFAVPRKVLPDNTTMRPLIYPVYRALFNSQKT PAYEIVFATQWVSGLVMYTITVATCSLAAVLTLHACGQLKIVMSRIDDFVGSSVDTEKMLTDRL GEIVELHHRALQLVVKIEGILNEICFVEFIGCTMNICFLGYYTITELEQGKTSAIALVTYTFLMTSFT FNIFIFCYIGELLNQQGRKVGTTAYMIEWYELPGKSACGLILLLAMSNSPITITAGKMVELSYATF CNVLKTAMAYFNLLKSVIL

DallOr7 MRNESAVLGSVPPETSRSERDTKYGTNMGRWFLMTIGVWPLPADSHIMKKILCETWVVMCH VLIIFILVPAFLNTVLVQKDPRKQIRMIGPMIFCLTALTKYNFLVGHRPGISACVEHMNADFRRAK CEDDRTIVLNYAKTGRHIATVSTTLMYAAGFFYRTILPLTRPTIVTPNNLTVRPLTAPIYDPLFSAYT YRSWTIVFIGQWFSGYVMYTTAVGACNLAIVFVLHACGQLKLVMARLESCVEGNKHVPAGVED RVAEIVELHVRSLCFAFRVESILNKVCFAEVGGCTLVICFLGYYLITDVNSHETAISLLTYYVLIISFTF NIFIYCYIGEMLTQQGMKVGTVAYLIDWHKLPVSSAKGIMLIISMSGYPTSITAGRMMQLSYNSF LSIMKTSVAYLNVLREMTD

DallOr8 MTPEPWKQALQFPPIHQHEANINYELQYTRWLLTALGVWPMITNDVTRRAKISSILLVGLSLLAI AFILVPITVFMFVKMKTLRTRLGFIGPIGFRISNMCKIIMMMYRSGVIRECISQVKSDWSAVTIRD DEDAMVQSVILGRSLTIVCGIFMLSGGVFFHIIMPLLKPRKVNAFNVTIRPHAYPGYDFFVDSQA TPAYEIIFSAHFLCATSGYVVVTASCNLAAVFVSHISGQVQVIKLKLARVQQDDGGKAGDLSRQIA SIVKSHVRILKFSDKIKMVFREICLVEVVLSTIVICWLEFYCITEWKNSATISIVGYCVILGSFSFNVFIY CYIGQILTDQCESVGQMAYMIDWHRIPPRNVLSLSMIISMARYPRHITAGGMINLTIRSFGDVM KTSVAYLNVLTAVAA

DallOr9 MARKPSKSPQRSPVHDRKANINYELQYTRWLLTVLGIWQLISNNVTRLTKVSSLFLIVLSIFAISFV LIPLVMFALTRVKTLRGRFTFIGPVGFRISNLLKLLTMAYRADVIKECIDQIKSDWLEVIIENEREAM LKSVELGRSLTVICAVFMFSSGTFFHVGMPLLKPRKVNALNVTIRPHLYPGYDVFVDSQASPAYEI IFAAHCFCAAGGYTIVTAACNLAAVFVSHVSGQVEVIRLKLETLHGSGDREEADVSEQIASIVRCH VKILRFSGNIKTVLREICLVEILLSTVVICWLEFYCLTEWHNSETISIVTYFLLLISLTFNIFIYCYIGQILK DQCESVGLMTYLIDWHRIPRKNILSLAFTISMTHYPRTITAGGLMQLTIQSFGDVMKTSLAYLNM LRAVTA

DallOr10 MRLYGNNNMHHHVKTITNEHYHFDIRRSFRMTEWLLTPLGLWLLISKNPTRNNIYTSIALIGVC MTTLLWVIMPCTRHLAFLEKDRNIRVTKIGPVSYCYKSIMLYGAIILSTKRIKSCIEHIKSDWRELES EGDRQIMINNLDFSRKITIICGIFMYSGGLSYHSVMQLWSGKRINGLNETLRPHVYPGYDDFVDS QATPAYELIFTGHFITALVACSITTASCNLTATFVGHALGQIQIVMTRIKKIVDENDERHSDPQQRI ASVIRGHVSALRLTAEIEKVLRNVCLVEVFGSTFVMCSVEYYFIKELSNSDSIVLLTYVALFFSLSFNI FLFCQIGEILMEKCYGIGKLTYGIPWYKLPGKASLPLTLIIAISHCPRKLSAGGLLELSMNSFATVMR TSLAYLNLLYTIDS

DallOr11 MSRKFSELSFPGKYQTYSHCELQYTQWLLTALGVWPMVSKNVTPKAKVSSFLSVCLILFAIAFTV VPCIMFVTLRMTSMRSRLVFFGPLGFRITNLLKYLFMIYRANILKVCVSRIEADWSAATEKDQVV MARSAELGRKLTRLCAFFMYGSGIFFHAMTFLRPRRVDAFNVTIRPHILPGYDLFVNPQVTPTYEI IFGINCLCGAAIYTIVIATCNLAAVFVGHVSGQVQVIRLKLLNLEEYEQNFEKNEDIRKDIAYIVRCH VKMLRYNVRDVLREICLVEVVHSTTVICWLEFFVLTGWKNSEVIFIVTYFLLLVSLTFNIFMFCYIGE ILKNQCASMGHMTYMINWHNMPKENISSLSLIIAMARYPCTITAGGMMELTIRSFGHVMKTSL GYFQMLRTVAA

DallOr12 MPPASSKKIPKTPSSRKYSAHIEYELQYTRWILKSLGIWSMVSDNTTNSDRIASFFLIIFNLFAIAFILI PCSMHMLLREKNPRRRLLLIGPIGFRISNILKYFSIMFRVDVIKQCLNHVKNDWSDVASEDERKLV LKSVEVGRNLTRLCAIFMFSAGLFYHTVMPLLKKKRVNAFNVTIRPHVYPGYDFFVDSQASPTYEI IFGTHCVFAVAGYFMTVAACNLAATFVSHICGQVQIMNLKLQGLTGNEKSSVAEEIASVIKCHVK VLRLSRMIEKILREICLVEVVASSLIICLLEYYCMTEWKNSETASLITYFMLLTSLTFNVFIFCYIGEILK DQCEVVAELTYMAEWYQISSKNALQLALIIAMARSPRKITAGGMMELTIQSFGAVIKTSVAYLN MLRAVAD

DallOr13S MLPKGPDHELGSLPRWKYKTNGEHEFQYTRLLLTVLGIWPRVSTNATRTEKTCALLILGVFILALIV SVIPCVLFTSIKIKTMNERLVFLGPLGFHFTNPLKLIFIIHHTDVIRKCLIHMRADWRKAVTKEEQTV MLKSANSRRSLTHLCALFMYNGGICFQAMTMFKPKETDEFNVTIRQHVLPRYDYFVDSQISPVF EIVYATHVLFLVFLYTIEIATCNLAAVFVGHICGQVQVMKVKLKILGQSENQGNSKGIDDLIASVIH CHVRVLILAGNTRKVLREICLVEVLYSTAVMCWLEFFCLKEWSDSEGLSLVTYFSLFVSLTFNIFILC YIGEILKNQCESVAEVTYKSDWCQIPPKKLSSFILIIAMARYPQSITAGGMIELTLRSFGDVLKTSLA YLQMLRTVTTSF

DallOr14 MSVDKATSSSEEVKTIQLTRWLLISLGIWPFVTDNSTIIETFLGIGVQILSYGLLVFTIVPTTYHIFMR EQNLDAKITLIGPICFFTTNLLKYCAIVYHREDIKQCIKCVKTDWQRITNQMDRDIMMRNVSTGR NLTIFFAIFMYSGGMFYHTFMPFLVTSQVTNVRNTSTRPLVYAGYDLAFDPYVSPVYEIVFLSHCL TAMVVHTIVTVSCNLAASFVAHACGQIQIIISRLHSLIEDDLTQDSVCVHRKMASIIQCHAKVLRF TVGVEVILREICLIEVVASTFVICLLEYYILATWNGSEVIGILTYLVLLVALSVNVFIFCFIGEKLTQQC HEASKAIYMIKWYRLPPDVAIALVLFLASAQVPRRITAGGYMDLSLNSFVKVLKTSVVYLNLLRTV EN

DallOr15 MSVRAEKTVEEIKSIQFTKWLLTPLGVWPFVKRHSTLTEIRLGILLQLFCHSLLAFVVVPSVFHIFAR ENDLNVRIALFGPVGFVVMNLLKYCAIIYHRGGIGRCIRYIESDWRGTQDQDHRDIMMKNASTG RNLTILCATFMYSGGMLYHTLMPFLLTTPIAADKNSSNRPIVYPGYDAVFDPYVSPTYEIIFLSHCL AAMIIYTITTVSCNLAASFVSHACGQIQILVARLDHLVDDGRGRSEKYLKRKIGDIIRYHAGVLRFT VDIEEILREICLVEVVASTLIICLLEYYCMTTWSESETIGICTYFILLISLSFNIFIFCYIGELLKEQFGDVG TSTYLTEWYRLPPNCGLMLILIIATAQAPRKITAGGLIDLSLNSFVGVIKTSLMYLNLLRTVET

DallOr16 MAGLAIKPKESKAIELTRWLLTPLGVWSLIKKTPSTRDTLSAITLQFFCYSVILFALIPSIFHIAFREKN LNAKIALCGPVGFFMMNMLKYSAIIRHRKSIKKCIESVETDWERIDNDQEWNIMTKNLLIGRKLT IVCASFIYTGGMSYHTVMPFLMGSPASSEHFADERPLVFPGYDVLFDPFISPVYEIVYFSHCLAALI IYTITTVSCNIAASFVTHASGQIEIIVGKLNGLVGDGYEQDDKCLREKMGDIVRDHGRVIRFTVDIE RVLRDVCLVEVFASTIIICLLEYYCLTTWNESETIGIITYFLLLVSLSFNIFIFCYIGEVLKEQCYEAGYSA YLIDWHRLPPPAASGLILIIANAQVPRKLTAGGMLELSLSSFLTVMKTSLVYLNLLRTAATS

DallOr17 MTKAGGKSQVNGSVYENEDFVYSYGWNRVMMHAIGGWPEDNDNFFMRNRIFLNGLALMLF VILPQSASAAVFWGDLNAFIECFSVNLAVNLSLLKIVWLGVQRSTVQNLLKLMAGDWLMEKTLE EHEEMLKMTKIIRLISSLSFYWTLCLFAAYVSVQILVGFELRNDPNVDPRLSIGFLYTCVFPFDTSPL AVFIPIWIAQFLCTYLSMAGYSSPDSFLGMFVFHQCGQLKVLRRRLERIVDEKTMENPRVFWKKL GEIVERHEHLNERASEIEENFNKIFLAATLVCIFATCTQGFAVITLLQETEGDFPVLQMIFLLVFTTY DLGHFFVYCMAGDILMTESSNFGVSMYNSQWYNLAPTDAKSVLIFTSRCIMPQQITGGKFTVLS LPLFANVVKTAASYLSVLLAMKV

DallOr18 MTKAGGKSQVNGSVYENEDFVYSYGWNRVMMHAIGGWPEDNDNFFMRNRIFLNGLALFLF NILPQSASVPVFRGDLNALIECLSVNLAISLSLLELVWLGIQRPTVQKLLKMMAEDWLMERTVEE HEEMMKMTKIIRLISSLSFYGTICLFIVFVSVQILAGLALRNDPNVDPRLSIGFLYSCVFPFDTSSRS VFIPIWIGQFFCTYISMAGYSSPVSFLGMFVFHQCGQLKLLRLRLEGIVDNETMGNPREFWRRLG KIVERHEQLNERTSEIEENFNKIFLAAAIDCIFVTCTQGFAVITVLQTEGDFPVFQMIFLFVFTTYDL GHFFVYCLAGDILMTESSNFGVSLYNSQWYKLAPNEVKAVLIFTSRCILPQKITGGKFTALSLPLFA NIVKTAASYLSVLLAMKV

DallOr19 MTKTEEKSQVDRTVYENEDFNYAIGWNRFSLDVLGAWPRRNSGIIGRQRSLVCALGIIILIYLPQ SASVVVHWGNMDAVIECLSVNGPVFLAFAKLLLFRYRRKEIRMLIDFMSDDWNTPRSLEEREA MLRTAKMSRVISLGSGVITHTLFVAYIFYKIYFGIEDMKRTDLDPRLAVGLLHPARLPFDTRKIEYF IPMWIGQCFCTYFSMTIYAVFDCMISAVVLHICGQFSVIGLALRNLADDQVGCRSDLFREKLAAI VKRHEKLNDSIGVIEDSFSSILLPQMLICTFTFCFQGFALITSLLGSSTGKITFLETAFSVTYVFYTVL HLFVYCYIGDQLLVESSSISYSVYDSHWYNLPARHARSLLFVGYRSLRPLKITAGKYCGFSRNLFII VLKTSMGYLSMLLTVKQRMTD

DallOr20 MRIDFEYTCGWNRVLMDFAGFWPKPRSSFIGKNWPLIHAVIVLIVIIIPRFAAMFLFWNEVDA VVQSFATQLIFIVVFFKLIVLQFGNKVLVDLLVTMEDDLSTELTEEQYTPVFKAAKIARTISMFLGI ATISILSTGIVSLKLYHLDSFYIKNPDPRLSTNFFFVAYLPFNTTSIITYMMILSIQFYISVISTGIHLILD SFIMMLVLHICGQLQVVQESLTQLVKGILGFEKLSFQVAFRAVLMKHQAVCRFTKNLDALFNFI WFLELFSCTLTLCFQGYTVQKLLHSDQSTIFQMGFPLCSTLGLVSQFFLNCWAGEYLMSESAEI GFAYYRSEWYTLVSSDARSLMMIGHQGLRPLTLTTCKFSILSLRLFLQVMKASFSYLSMLLVLTH S

DallOr21 MKVDFAYSCGWNRFLLNLSGFWPKRRSSFVGTHWPLINAILVLVVIIIPRFAAMCLFWNEVDA FIQFFTTQLVFINVFFKLIVLQLGNEVLVDLLSMMEANLSADLTKEQYAPVLKLAKFGRTISIIVGL SAVGVVITGVVAMKLYHLDSLYIKNPDPRLSVDFFFVAYLPFDATHIISYTTVFALQLYVSVISTIIN LNLDSFVMMLILHICGEMEVVEVLLSHLGDGIEDRRGVQLELKRIVRKHQMVCQYVKNLEKLFS IPWFLELTSCTLIFCFQGYNVLKLVRTDQFGVFQIGFIIFSTSGILTQFFLNCWAGECLLSRSSRIGY SFYRSKWYELRPSETRPLLMIGFQKTRPLTLTAWKFSILSIRLFLQLIKTSFSYLSVLLVMTDFQDE

DallOr22 MRIDFDYSCGWNKFLLNLGGFWPKRQSSFIGKHWPLINAALVFVVIIIPRFAAMSLFWNEVDA FVQSFTTQLAFMSVFFKLIVLQLGNQVLVDLTSMMEANLSVELTKQQYAPVFKAAKFGRTISTI VALSFVCVVITGVISLKLYHLDSLYIKNPDPRLSVDFFFVAYMPFDTTNIVSYATIFLLQLYVSVMS TGIHLILDSFVVMLILHVCGQLELIQLSLTQLGGNVAQDRKRFQAVLKSIVRKHQSVCQFVNDLE KLFNFPWFVELVSCTLMFCFQGYNVLKLLHNDQSGVFQIGFIIFSTFGILTQFLLNCWAGECLIS QSSKVGYAFYCSKWYELRPSEARSLMIMGFQKTRPLKLSAWKFSTLSVKLFLQVMKTSFSYLSVL LVVTDF

DallOr23 MKIDFAYLCGWNQLVMDMCGFWPKHHSNFIRKNWYLITASVITIVIIIPRFAAISLFWNEIDAV VQCFSTQLVFITMVFKLLVLHFQNEVLIELLTTMEEDFSKELTKIQHTAVLRTAKIGRIISIFLITSSL CNVIAGVIAMKLYHLDSLYLKNPDPRLSLDFFWVSYFPFSTKNIIIYILVMSLQFFASAITGFYFIFD GFVVMLILHICGQLELIQISLKYLDEAVTVTEGRNLRIAFASIMEQHQRAHRFVNNLEQSFNLTW FLELTTCTLTLCFLGYIVQKLLHTDQSSIFQMGFPICAVLSVLMKFFLNCWAGEYLISQSSEIGYAF YQAEWYKLSPADARLLMMFGHKKTRPFALSTAKFSILSLRLFVQVLKTSASYLSMLLVVADQPD K

DallOr24 MWPKKNLTGFRRYHVLLNAITVFFLLTLPRLAALYLFWGEIDAMIQCFSTHMPFAIGIVKLVILYL QQQALGEFMEKMECDWKNPISNEQYRTMMRMAKIGRQISIFSGLTVYSGNSFGVVVQKWY NLESFYIQQPDPRLTLNLFWVAYLPIDTSKNINFAVSLLLQFYTSFIGSLAYFVFDSLIAMTTLHVC GQLDCLKLSFAQLCENENTEKPWKFQKDLRRIVQRHSELIRSVAENHFSLDNSFNFVLFLSFIGC TLTFCFQGYAIIRVILLDTGKLNFFQVAFSATVTMGAISHFFICCWMGDLLITQSTAVGQAYYKN VWYKLPHQQARSLMIIGFKNFRSLQLTAGKFVPLSLNLFLKVIKSSMSYLSMLLAVQSRQV

DallOr25 MILIIAAIFLVDEDHLRSGFDYACGWNRFNFDLLGIWPRKNATGIAKYRSLINASVIFLILVIPRSIAL YLFWGQIDAMIQCFSTHMPLIIGMVKLIILYFQQQVLGEFLEEMEVDWKVVKPTLHHDTMMK MANISRGISIVSSSMAYAANSFGVVKFYNLESFHLKDPDPRLTMNLFWVVCLPFDTSTSINFWV ALLLQFYASFSGSLSYFAFDSLVAMMIFHLCGQLDLVRLSISQLGEDSETQRVRDIQAMVRSIVKK HEKLIQSARNLETAFNFALLLLTIGCVLTFCFQGYAVIRVTVNAVRQLNFFQVGFAVTITLSAVSHL FFCCWAADCLITKSSAVGDAIYHSGWYKLPHSKAHFFLLIGFKKFRPLEITSGKFAALSLDLFMKVI KMSMSYISMLLAVQSRRT

DallOr26 MIPVIWSDYDHPYLGFRYACGWNKFTLEIMGLWPRRKTTYFGRNRALIHAFVIFVVITIPRLVGA FFFRHEIDALIQCLGTYLPFIIGIVKLITLHYRREVLAEFLETMESDWAVPLSGERLKVMLEMAIIGR RISIFSGSMAYTANSIGIVVQKWYNLEAQQIQDPDPRLVTNLFWIVWLPYDTTNGINYAVSFVLQ LYSSVYAALFYFIFDSFIVMLVLHLCGQLGGLQVSLRHLHETDVHKVTFQTKLRNIVRKHEKLVRLA GDLEESFNFVLLLQLLGCTLMFCFQGYGIIRLVTQGTLNLFQVGFAVTVVFATAVHFFFCCWAGE FLVSQSVSVGYAVYNCEWYKLPHLEARSLILIGLSKLRPLELTAGKFSVLSFNLFLNVLKTSMSYLS MLLAVQGRE

DallOr27 MELQDWTFGTSYELLVYKLIMWPLGIWPLNKGEIFSEIRLLLAASTQAATFICLSLEIWNNCRGVE IVLDLFVLSVFSLLACIKGLIVHHHISDMNESVTSAIMDWCSLSRSENPQNRELMMKYARIGRIVC FSLMIPASSGTLSWIILALPLPMFTPPNSSDVIKNFPLQTACTFEPITTSRFYYIIFVLQMYQLVTTCL GNCGNDVFFFGLSMHTCGQLEILQNDFTSIKIREQEPDNSKKLRCMVDRHVHLIELVEKLESTFS MIILAQLIMSAVLICIMGLQVIIALKTDDMFAAIKANIVLSSLMSQLFLYSYGGDCLTTQNTSVVHA VYKSSWYETSSMMRKNIAFIMMRANKPIYVTAGNFFYMTLSTFMDILKASVSYMSVLRIAIDV

DallOr28 MRDDSSRQSTRNLFFLYSYVLTIYGAWPLSDDTLFLKCRVTFAACQMIYLTAICIIKLIIQCGSSEDII DAFLLLVVCILVCSKSILLYVHRDKLASVVLSSVEDWNTTESEHYRKMMSERANFCNIVTKLFYSM GMFVLFIYILKVMVLDVIYVMDEDSVNGTLILQKNYLLPGGCVFDGFGNVVFYFVVINQAVQLSI SCSINLGGDALYVALTLHLCEQCEIMKLRFEKFGRSDSLEKNRKHLNALIARHQDLMTRAETLEDV YNQIILMQMLMSVLLISVGGFSFLISLNGGDVIGAAKNICVMQFMLAQSYIYTFPADDLKEQAEG LLRALYSSQWCDMPANIMKDIVFMMMRINVPPYYTAGKFFYMTRQSYMTVVKTAASYLSVLRI MIK

DallOr29 METTTQFTTIWNDNIAYAFGLYRLITQSLGIWPFTCHKFISKCQVLIVCVLQLTMMISIIGEMHLQ CGEVSDKIQNISLCSCAFMTIVKVTIIRAHNPRMHEIMMSAMQDWLCVKTADDNKKMRSCAKL GRSICLFQMIGAYVTAIPIILSGFAGHSAVNTTNSTDSDIKILPLGTVCLFQQMSSSFYTSVYIIQCIQ LLITCTGNVGCDCYFFGVTMHLSGQIEKLTDDMERFGKEKEDTDCHEKLVALVKRQNHLLELAE NLETTFNIVILVELSALTYEICLIALQMVVNLRMGNSVVIINNIIMLQILYLQLFLYSYAGERLSSGLE NLGSAIYSSEWCDSPSKVTKDLIFVMMRCRKSFALTAGKVCVMNLESFTNIIKAVGSYFSVLLAM FD

DallOr30 P+N LWMSISLTKKILTKGNCGSVTDNVDALSLIFCGFLTVMKVAIPRTHQKNMLTVVDSAIKDWSSVV SDKPSSVMKKYAYIGRLVFLVQMCGAYAAGFPLIFLKLPFINALWNDPEDNVTLREVPMGPRC WISDDVSTYHYVGDIYSSIGAIVRCLHGVHRV*HFFGIAMHVCGQFEVLHMNVRRLDGTKDSLR WRQKLHDFVRRHNHLLQLAHNFEETYNAIILAQVGVDTLLICISGIALLMSLGSGDFVIISGLIIRIYL VYVQLFMYSYVGENLSANAENLSVAVCNCPWHAMPRDIVKDIRFIIMRTDYRFNLTAGKLYPM NIENLKTMLTKIGSCFSVLRLVFHESALFRITFTLFWV

DallOr31 MANAGWNKDTRYALGWYRGLIRILGIWPLDSRGLFSTLRILIVISIQIILVAIFVQNWIVKGNCGTI TELVDAVSLITTSLMAVIKILLPSIHRNRVSSIVNSALQDWSDVEDKKSWEIMRRYAYIGRLVFIVQ MFGAYMTIIPLILMSLPKFVEVEHLDNKSVFLRNIPIGPKCWVSLEISALTYFLYYIFVCLHLFILATA YLGGDVFAFGLAMHLCGQFQLLYRSLDELDGDETESIQRARVARFSKRHNQLLKLADDFEAAFH MLIFFELGANTFIISISEIILLWACKVGDTQIISAMAIRIYLMYIQIFMYSYIGEHLSTQAEKLQVAIYN SPWYGMSPAIVRDMKFIMMRNNYLFHLTAGKIWNMDYENFKSIVRSMFSYFSILRLILNE

DallOr32 MLEKWNKDAEYDLGFYRLINRGLGIYPFECQEVFSIIRMIIVLTINWIMLYEQVIDLISGCGQTEEII ESIVGALSYPVAIAKIILPRVYWRNMKIILESAVEDWSRQNDAEAKRTMMEWVAIGRFTFLLEITS LCMLLTYTTISHFPFFLLTQNHRENSSAMIDGSLPWGSGCWLPSNTSNQMFLIAYFATCIQQIVG SLSNMGFDVSLFIILSHMCGQIEILKLDSGRNWNGTSFDSSKNPQEVYRQFIDRHNHLLNLCDLF ERSFNVIILIHMVLYLGLVTILLLNTFVAWYYQNIAMVIKAVFELMIIYVQLFVYCYAGEKLTNQLE NLRDEVYSSPWYTLPKKSMQDVCFILRRLNDPFYLTCGKFYRMNMDYFKNIVKLTASYCSVIRLM FYD

DallOr33 MIMEVEASREADLWPSDTTYALGYYKLLGRGLGIWPLESRGFMPNAKMLFVIIIQLWMSISLTK KLLIKGNCGSVTDNVDALSLIFCGFLTVMKVAIPRIHQKNMLIVVDSAIKDWSSVVSDKPRSVMK KYAYIGRLVFLVQMCGVYAAGFPLIFLKLPFINALWNDPEDNVTVREVPIGPRCWIPDDVSTYRY VGDYILQSVQLLVVCTAYIGCDTYFFGIAMHVCGQFEVLHMNVRRLDGTKDSLWRRQKLNDFV RRHNHLLQLAHNFEETFNVIILAQVGVDTLLVCISGIALLMSLSSGDLVIISGLIIRIYLVNVQLFMYS YVGEKLSTNAENLSVAVYNCPWYDMPRNIVKDMRFIIMRTNYRFNLTAGKLYSMNIENFKTML TKIGSCFSVLRLVFQESA

DallOr34NTE MPKAKMLFVIIIQLWMSISFSKKLLTKGNCGSVTDNVDALSVIFGGFLTVMKVAIPRIHQKNMLI VVDSAIKDWSSVVSDKPRSVMKKYAYIGRLVFLVQMCGVYAAGFPLIFLKLPFINALWNDPEDN VTVREVPIGPRCWISDDISTYRYVGDYILQSVQLFVVCTAYIGCDTYFFGIAMHVCGQFEVLHMN VRCLDGTKDSLWRRQKLNDFVRRHNHLLQLAHNFEETFNVIILAQVGVDTLLVCISGIALLMSLSS GDLVSISGLIIRIYLLNVQLFMYSYVGEKLSTNAENLSVAVYNCPWYDMPRNIVKDMRFIIMRTN YRFNLTAGKLYSMNIENFKTMLTKIGSCFSVLRLVFQESA

DallOr35PSE MHITVMEHDGDIELQWNEDIQYTLGFFRLWCKILGIWPLQRNETYSIIRSRFLTFTQVSAGILLSQ QMLTYGNCGLITQLVDALTGVSGFFVTALKILLSWLQRERIRHIMQSVKEDWLTVXMMKEHAF WGRVAYIIQIGAACSIFLEMALTRLPNFTVDASENASFLERNLMLGP*CWAPVTMSLSSYLIHYY AAFIGLEGVVFIYTGCDAFMYNLALHICGQFEILSSNVEKFEDEEGCMRQRHGFREFCKRHDRRL QMGHQINIVVNLIILSNVLTNGFLICISGIAVLVNIKNGNANKDVINLAVRIGIVYMSLFMYSYVGE KLS*QSQKLQISLYNCPWYHMPVHVAKDFQFIIMRSNSGCHLTAGDVYVMNYEGFKDITRIIFSY FSVIKLILE

DallOr36 MEYNKKLHSQWNVDVHYALDFSKWWCKALGIWPWQHNEVSCIVQASCIVLIVMTAAALSSA QLQTKGYCGVITDLLDTLSGILVFCGTAIKILSLWFHQQQMRCIISSIIDDWLNISGTKSREIMHRY ALWGRLAFILQITGTFVAMALTTWIDPPSIKTEISKNDSSATRHMLLAPPCWVPLTMPFNVYLLY YNFIFIATFFSVLTYAGCDAFMFSAALHICGQFEILEANIEKLNDGDDYPTQKYKIKKYSKRHNHLI QMGKRMDALVNVIILSELLSNSVLICASGIVVLANVKTGNVNRVVISWAARIYVWYMEFFMYCY VGEKISSHADRLRSVVYNCPWYNMSVHIAKDMEFMMMRNNYFCHLTGGEFLIMNYESFAKIT KVMFSFFSVLRFMLE

Supplementary Data 2-S5 – continued

DallOr MKYKEEIHELQWNRDMHYALGFSKLWCRAVGVWPWQRNETSSIVRSGLFFTIQIAAAVSLME 37 RLFVHGNCGSMPNFIEALTGITVFMVTAMKILLPCLQQEQMRHIIQSAMEDWSTVSDARSRQI MSQYAFWGRLAYSIQITAAFIIVLEMTVSRLPKFIIETTENSSLSARNLILGPPCWIPVTMPTSSYLL HYYLIFFSVWGAVLIYPGCDAFMFNAALHICGQFEILNSSIEKITDEDGHLNQRLRIRESLKRHNKL LTLGNQLNDMANLIIFSELLSNGFLICVSGIAVLANIKNGSVNNDDINFGIRIYVWYMELFMYTYV GEKLCDQAEKFQNAVYNCSWYNMSASSAKDIKFIMMRNHSFCYLTAGGIFVMNYEAFKDITRI MFSCFSVLKLVLE DallOr MDLRKETPIPWNDDTKYALGLYRQYLTLLGIWPLERDRVLPTLRLMLFIVIQMSVNITYITNLMTT 38 GYCGRILDLVEIITILSLSNISVFKLIVPWGYSDKMYFVLNSMVEDWVKVYDRKSHDTMMKYAYK GRAVCVFQLFITAFLIIRNILHKLPATVETITPENVTTLSRIVPYGPACWVSTTMSAHYYNAYYSLICI HWLSAALAYIGTDIYLFGLGMHMCGQFELLAAELDDISGNENCEEQHEKISKFIRRHNHLLNVAE SIEDVYNIPILVEVINNTVIISISGIILLWAFKLRSRAMAIPKLIRIYLFYVEIFMVSYVGQALSEKTEKL QEAVYNIPWYKIPKRHMKALVIVMMRVKYPFQLTAGKIAHMNYETYKNIIKSTFSYFSVFRLVLE Q DallOr MFGLCQLWSSKFNRSSDDDYHREIQIPRFFLGIIGIWPEHGETRSIRFYVSVTMVTLFAALPRFM 39 GLFTIVNENEFFFQVGVVMYQIYVIFAMLLLHFNSNVGRSILEEMRDNWKTTHSMEDKKVMRH YAEYSRKISLVTVAILSFVSWLNPLIAKAIKGGHSLATVGTVEIPFLPSNPFGDALGNIAFYIARIIAAI PPIGIEPCLTVLIVHTCGQLVILRNRITTYGGKIRHHFGEDISRMCSCLKCLVDSHVKICDFVRRIDG YFHIVLFMRLFVGVIDFVCSGFEASKAITKGEYAHLMTFALFTLLIYSTILISCSMAEMCQQASYAL GDAAYSTEWMTKHPRVSQRLTFIIRQSRVPLKFTAGKYFVLSRSLFKECFLTGLSYISTLMTIKLGK

DallOr MDWLRRMWNRPFHEYSSDENYEWEIQFARLALWPSGIWPETGKMTRIRFFSALLIMVGFTID 40 MVLGMNEIVARMSAISFFTPYIASLISLYSNANTFLSILEEMRGNWKTVSFEDTEILHHYARRSRFF SWIVVMGKVGLQLFENFIARKRVTTPDDSQDLPFSNKAFILACISIIREITLLAIRITEMGINLCFILLT THICGQLVKLGSRIQAHEGKIRHSLGEDILQRCSCLKCIVDSHVGIYEFVARIDGYFHKTMFLQLSQ SVIQFAINGFLVSEAFIHRDYAQLIQYSFYLLLNFPTFLLCCWMAEMVQQQSAALGDIAYSTEW MTKDPKIARHLMFIIRHSRVPLSLTAGKYFVLSLSLFREYVFTSISYLSVLANIKLRQ DallOr MDWLRRMWNRPFHEYSSDENYGWEIQFARLALWPSGIWPKTGKMTRIRFFSALLIVVGFTIFA 41 ILNINDIVTLVSATVFCTVHLASLILLYSNANTFLSILEEMRGNWKTVSFEDEEIRHHYARRSRFFS WIVVMGKVGAHLLEKIRARKRVTIPDDSQDLPFSSEAFILACIFIITEITLLVIRITEMGINLCFILLTTY TCGQLVKLGSRIQAHEGKIRHSPGEDILQRCSCLKCIVDSHVGIYEFVARIDGYFHKTMFLQLSQS VSQFAINGFFVSEAFIHRDYALLIQYSFYLLLNFPTFFLYCWMAEMVQQKSEALGDIAYSTEWMT KDPKIARHLMFIIRYSRVPLSLTAGKYFVLSMSLFREYAFTSMSYLSVLVNIKARQ DallOr MDWLRQMWNRPFYKYSSDENYEWEIQIARLALWPSGLWPETGKMTRIRFFSALLIMVGFTIDL 42 VLGMNEIVARMSAISFFTPYIASLISLYSNANTFLSILEEMRGNWKTVFFEDEEILHHYARRSRFFS WIVVIGKGFVGLQLFEKFIARKPVTTPDDTQDLPFSSKAFISACISIIREITLLTMRITEMGINLCFILL TTHICGQLVKLGSRIQAHEGKIRHSLGEDILQRCSCLKCIVDSHVGIYEFVARIDGYFHKTMFLQLS QSVIQFAINGFLVSETFIDRDYAQLIKYSFYLLLNFPTVFLYCWMAEMVQQKSEALGDIAYSTEW MTKDPKIARHLMFIIRHSRVPLSLTAGKYFVLSLSLFREYVFTSISYLSVLVNIKLR

DallOr MSWLRRMWSRKFDQSSDDDYYREIQIPRFFLATSGIWTEHVKTNRIRFSIFVTILILFAALPRLLGL 43 FTIVNENQLFYQIGSLLFYVNVICAMFLLRFNSDIRRSVLAEMRDDWETTHDVEDKQVMQYYAE YSRKVSLLVITVRGVSLIGKFRSLNPLITSIIRGDNMSTGKKKYPFLPSGTFFDTVSSICYLIARIVSTV TMIGIDTFFAILTIHTCGQLVILAKRITRYRGQTHHQFGENIPRTCSCLKCIVDSHVRICDFVKRIDN YFYIVFFMRLLVGVIDLACSGFELSKAIAKGEYAYLMTFVISTLSTSSTFLINCSVAETCQQAGYALG DAAYSVEWITKQPRILRHLIFIIRQTRLPLKFTAGKYFILSRSLFKEYFLTAMSYISTLITIKLGK DallOr MSWLRRMWSREFNQSSDDDYYREIQIPRFCLGISGVWSERGNTNRIRFFISVTIVILFAALPELVH 44 MLTTVNENQLFFQIGLVMFYYYLISCMFLLHFNSQLRGSILEEMRDNWETIHSVEDKQVMRHYA ECSRKISFIFIALKVYTVFGKSAGCGWFNLIVSAFIRGENVLAGTNKMKFLFHRSGTFVDALGSTSF VIGDMLSNIASLGIDTCFTVLTVHTCGRLVILKNRITTYERQTRHHFEEDIPRMCSCLKCIVDAHVK IYDSVRRIDDYFHVVTFLRLSAGMINFVCIGFEVSKAIKNKDYGFLMTFTNTAFLNFSTFFLNCETT EMCQQASYALGHAAYSTEWLTKHPRISRNLIFIINQTRAPLKFTALKYFVLSRNLFKEYFITAISYIST LISIKLAE DallOr MSWLRRMWSRKFNQSSDDDYYREIQIPRFCLGITGVWSERGNTNRIRFFISVTTVILFAALPNLV 45 DMLTTVNENQLFFQIGLVMFYYYLISCMFLLHFNSQLRGSILEEMRDNWETIHSVEDKQVMRH YAECSRKISFIFIALKVYTVFGKSAGCGWFNLIVSAFIRGENVLAGTNKMKFLFHRSGTFVDALGST SFVIGDMLSNIASLGIDTCFTVLTVHTCGRLVILKNRITTYERQTRHHFEEDIPRMCSCLKCIVDAH VKIYDSVRRIDDYFHVVTFLRLSAGMINFVCIGFEVSKAVKNKDYGFLMTFTNTAFLNFSTFFLNC ETTEMCQQASYALGHAAYSTEWLTKHPRISRNLIFIINQTRAPLKFTALKYFVLSRNLFKEYFITAIS YISTLISIKLAE DallOr MENNLSGRNRAADVERKHFDWEVGITKPVLKLIGLWPGGQHQTFLFIVNEFMQIYLFSVFLASI 46 VLEPSVTSKIDKIFVWLGIVLIVVDHNIFRAQWHNIKPLIECIQQNWREFDGLSADIKGIMVSYGG LERRITRCIYFVSGVSLIVFTLQPILIHLAFPSKFPLAHPNNARHPFNASVSPMFEITYVIQYVFHCSA ALIFSSIDCCIIWLIFHCCGQLEILANVISHYKGIRHHTSDSVDDEKRVIMGCSCLHCIVNYHLRIFK MISLFDDAFHVVMFARMIICTGHFICIGYTIRKLKEDGNFIFLVVNVFFLFTMDSSFFIYCWLGDKL KEKSINIGYTAFHQHISSKKKVSKDLIMIIRQAQIRPCHITAAKFFVMSLELFKEYVLKSMSYLSVLV ALRVNDKS DallOr MGKQLTPEFIIQWTKFTTGFIFTWPPSLKTTRLKEIISKMGWWISLTINVVNMEFTECLSAASTFI 47F QVLIVMIIEKYHYYRFQYLIEEMESFVKNANPHEREVLMHYTKRITPFYFLYNILGLVATAGYVFSP FVTNQLFPNIAEFPFPVHEHPTYDMVFILEGISTFQCYCVIAFTYQMCLLVWYGIIRLRLLAEKIEN VTDSQQFAECIFIHQHVLWFIDETIKIMRPMAACIIRLTTISIACGGIHIVGNEPVVRKVQFALIQIA 
YALLLLSIAWPAEDLTAASERVGWALYNSSSIRNSKTLEKDAIFMMQKCQNAPKIVIGGLMGNLS MEYYAKYMSAVYSFFTMLRVLLQKLKNS DallOr MREDFTPDFVIQ*TKLTMGILFTWPPSIK*TRRTKIIQKIG*WLSFIVLVANVLPLVSTTYKY*SSFIK 48PSE FTESLYQALHMILVGLGTIIAKHHYHRFQYLIEEMETFVKNTNYHERGVLMRYVKRFAPLYFRCY MSGFGATVAYVFGPFIIDQSFPHIAEYPFAVHKHAVCNAVFTLEVLAGNQCYCYSTFICQLCLLL WYGTIQLEFLAEKVENATCSQEFTECITAHQHILWDIEEAIKIVRSIVAWTIAINTISIACGGIHIVG NWFCFW*NEPVAKKVQFA*IQLS*AWVLLCIAWAAQNLTATLVGHYTTLPRFETQTFNKDAIF MMQRCQNPPKIAIGELMSYLSMEYYAKYMSAIYSFFTTLCVLL DallOr YLIEEMEMFVENVNSR*RQTLMRYVKRFASFYFRYNMFGVGATIAYVFCPFIMDQPLPNTAEYT 49 FHVHEYPVYDTIFIWEAVGGVQCFVTGFIGQLCLLLWYGTIQQEFLVEKHVTCFQEFAGYITTYQ P+N HILWYIEETIKIVRPAVASIIAINTISIACGGIHIRNDPVVRKVQIA*INIVYAWILLCIAWAAENLIAA NERVAWALYNSSSLRNSKTFTKDAIFIMPRCQSPPKIAIGLMWNLTMEYYAKSFFVTLRMLLQKF ENN

DallOr MDKRLTPAFVIQWTKFTTGIAFSWPASSKTTRITKIILNMGWWISLIVAVAVGLPLITTAYKHRN 50CTE NFIKFTESLSIAFCFVQIVVMMIIAKHHRHRLQYLIEQMETFVKNANPHEREVLENYVEREIPFYVR YYVFSLGAVLSYIFGPVVIDQPFPSIAEYPFPVDKHPVYDLVFVLEVIGSIQCFCNISFICQLCLLLWY GIIQLEFLGKKIKQATSSKEFASCIRVHQHTLWYIEETIKIVRPTVAVLIAIATITIACGSIHVVGYGPV LKKVQFAWIDIADASMLFCFAWTAENLTAASEKVSWALYNSPWILHSKRLKKDAIIVMQQCQN PPKIAIDGFMPNLSNEYYAK DallOr MGKEFNLDFVLQWTKLTTGILFTWPPNVKTTRRSEIIQKMGWWISLMFAVANTLALVSTTYKYR 51PSE NSFMKFTECLYKALCTILVVLGAIIAKHH*HRLQYLMEAMETFVKNANSQQRERLMKYVKRFAP FYFRYNLFGVGATMAYVFCLFIMDQPFPNIAEYPFHVHEHPVYDMVFVLEALRGAQCFCYGVFI GPLCVFLWY*TIQLEFLIEKIEHVTCSQEFAECITIHHHILWYIEETIKIGRPTIAWTIAVNTISIACGGI HIVGNESVIKKVQFASIQISYAGVLLCVAWAAENLNERVGWALYHSSSIQNSKTFNKDAIFMMQ RGRNTPKIAICYMSSIYSFFTTLRVLLQKLEHHC DallOr MEKERTPEFVIQWTKFTTGILFTWPPSLNTTRQIKITLKICWWISLIVAVTNAIPMLTTAYKYRTNF 52 MKFTESLYAALCFIQVVSAMIIAKFHNHRFQCLIEEVKTFVNNANPHEREVLMNYVKRVTPSYFR YNMLGVGTILAYVIGLFILDHPLPNIVEFPFSVYDHPVYYMVLIMEILGGGQCACTGAFICQLCLLI WYGTIQLTFLAEKVEDVTCLQELAECISIHQHILWYIKETMIIVRHMVACIIVLTTLSIACGGIHIVR NETLFSKVQTISILIAYALELLCIAWAAENLTTISARIGFALYNSLLMPNSKAWSKDAIFMMQRCLN PPKIKIAGLMSNLSMEYYAKYMSAVYSFFTTLRILLKKFEDN DallOr MGKQSTPEFVIQWTKFATGILFTWPPSSKSTRLTEIIFKICWCISFILAVANVLPLLSTMYYQRNNF 53INT MELTDCLSTVLCFMQVTLRLIIAKNHYHRFQYLIEEMETFVKNASPHEREVLTTYVKRVSPCYFLY NVVTLGATVIYVIGPFITDQQLPNAAEFPFSVDEHPVYDIVYLLEAIAGIQCSCSSAFICQTCLLLWY GTIQLDFLAEKIEHVTSSQELAECISVHQHILNGAVFKKIQPVWIQSAYTITLLALAWAAENLTAAS EGVIWALYNSSWIFSSKKLMNRAAIFVMQKCQHPLKIGIGGFMPKLSMEYYTKYMSVVYSFFT MLRVMLQKFEENQ DallOr MGEELTPEFVIQWTKFTTGIVYTWPPSSKTTRFTEIIFEICWWISLIVGVAGALPLLSTAYTHRTSFI 54 KFTESVSLAFCLIQCLVVMILTKHHHHRLQYLLEEMETFVKNANPDERDVLTKFVKLVTPFYFRYN F+CTE MVTLTATLAYIFGPLVPFPNIAEYPFSVNKHPLYDMVFILEIILGIQCSCTTGFICQLCLLLWYGTIQL RILAEKIGRVSGSQGLAKCVSIHQHILWYIEEIIQIVRPVVAVLIGLATISIACGAIHIVGNAILLKKIQF ATVQIAYAFELLSRAWAAENLTAA DallOr MGKQSTPEFVIEWTKYTTGILFTWPPSSKSTRLTEIIFKICWYISLIFAVANVLPLLSTMYYHRNNF 55 MKFTECLSMTLCFAQVVFKLIIGKNHYNRLQYLIEEMKTFVENASPHEREVLTKYVKRVSPFYLLY 
NIITLGATLVYVFGPFIMDQPFPNVAEFPFPVDEHPVYDIVFVLEAIAGVQCGCSSGFICQTCLLL WYGTIQLNFLAEKIEHVTSSQEVAECIAVHQHVLWYIEETIKIVRHVVTGLLGITTISIVCGAIHIVG NEAALKKIQSAWIQIAYAMALLALAWAAENLTAASEGVSWALYNSSWIFNSKKLMHRAAIFMT QKCQHPPRIAIGGFMPELSMEYYTKYMSAVYSFFTMVHVMLQKFEENR

DallOr MGKELTPEFVIQWTRFMTGIIFTWPPRSKTTRLTEIIFETCWWISLIVIVAGTVPLFSTAYKHRTSF 56 NKFTESVSLAFCLIQTLVLVIIAKHHSRRFQYIIDEMETFVKNANSYEREVLTNNVEVVTPFYFRYNI LCLTATLAYTCGPFVMDQPFPNIAEYPFPVDKHPLYDMILILEVINGVQCCCTGGFICQLSLLLWY GTIQLRILSDKIEKVSDSEGLENCISIHQYILWYIEQIIEIVQPVIAVIIAIATISIVCGAIQIVGNAILLKKI QLASILIASGFELLSVAWAAENLTAASEGVSWALYNLSWTQNSKKLNGAIIFMIQRCQNPPKIAI GGLMPTLFMEYYAKYISAIYSFFTTPRVMLQKFEQNQ

DallOr ISKQSSPKFVIQWIKFTTCFMFTWPPNPKSIRLTEIIFKIC*WIASIVAVTNALPLFSTAYQYRNNFS 57PSE RVTECLTIGLCSTQVVLRLIIANHNYRRCQYRIEDMETFVKNAKPHEREVLTTYVKRVAPFYFRYNI MAVGAALTYAFSPFVMDQPLQNSAEFPISVDEHPVYDIVFVLQAIGGAKVGCGAGFLCQFYLLL WYGTIQLGFLAEKIEHITSSQELAKCISIHQRILRYIEDAIEIARPVVAVMLAMTTISIACGEIHIVGN GPVLKKLQYTLIQIAYASPLLGIAWAAENLIAVSEGVNWALYSSSWIRNSKKLNTAAIVVMHKCQ HSPKIVIGGLMPKLSVEYYAKYMSAGYSFFAMLHFMLEKFGENR DallOr MTISKQSTPEFVIQWIKLSTYFMFMWPPNSKSTRLTIIFKICWWISLLVAAANALSLVSTAYQHRS 58 NFIRVTECLSTGLYFIQIVLRLIITKQNHHRFQYLIEEMETFVKNENPHERQVLTRYVKRVAPFYFPY NIMIYGAVLAYIFGPFIIDQPLQNSAEFPFSVDEHPVYDIVAIFEAMGGIQAGCSAALVCQLCLLL WYGTIQLGFLAEKIEHVTSSQELTKCISVHQHILWYIEETIEIVRPVVAVMLGITTTCIACUGIHIVG NGPVLKRPQYALIQIVYALPLLGVTWAVENLTAAVTITGVSWALYNCSWIHYSKEFNRAAMFV MHKCQHPPKITIGGLMPELSMEYYTKYISAVYSFCAMLRIMLHKFEDNQ DallOr MVDEKKLPELALTYLKYGTALLCTWPPSSRATSAEKISIEVRWWLLYLVAWCLQLPVFYTAYLTRR 59 NFMDFTKYICFGSAIAHSIIKMIICKYHRKKFQKLIEEIEDYLLRATPREREMLNTCVKKAAPVYLSF NIIGFVAAASYVCGPLVIDQDLPMDTIYPFEIRYYPVFQIIYTTQAITTTQCAAVGPLDAQVCMLF WFTIARLKLLAHDMKNIASVGDLNACIRVHQSLLRFNEKACEVARPIIMTTVLMATCSIAFGAM HVIGHEPFEVKAQFVGFDIGYGAQLYLSAWAAENLMTAMDDLKWALYNSSWSDNTRKTNRSF LFVLQRLNKIPKVSVGGFIPELSLNYYTGYLSKTLSFFTTLRIMLQKTEGPSQ DallOr MAQRKLPELAITYVKYGTTLLCSWPLGPQASSLERKFNGVKWLFLCSILVSFQLPILYTAYLTRNDF 60 MEVTKYICFAGSISHGIIKMIVCRCYQREYQELIAVMENYFARAIPREREVLNACVRKAAPVHVVV NVIGFIAAVSYVCGPLVLDQDLPTEAVYPFAIDYYPIFQIVYTIQSIVAFQCCAVGPLDGQVCMLF WFTIARLKLLADDMRNVASVDDLNACIRVHQSILRFNEKTINVARPIVTTTVAMATVSLAFGAIH LIGNEPLEVKAQFVGLDVGYGLELYLSAWAAENFMKAMDDVKWALYDSPWLQRSQRTNRSVV IVLQRLNKIPKVTVGGLIPELSLNYYAVYLSKTMSFFTTLRVMLQKMEDPL DallOr MGKEYTPEFAIQYTKLTTGILFTWPPSSTASRLVEISFQIGWWISWIISVFLAGPLLATAYQQRSSF 61 MQLTKHLCIAVCLIQVVVKMVIGRYHYRRFQYLIEEMEIFVKNANPHERKVLMSYVNRVAPFHY RFNMVSFCGTLAVILGPLALDLSLPTEAEYPFPMYQHPTYDILFLLESIGAIQCGCTGPFDCQGCLL LWYATIRLDFLAEKIRNVNSAEELKECIRIHQHILWYIDETIRTVRPVVAATVVLATISIACGGIHLV GNEPVDQKVQFVGIDIGYSLELLCVAWAAESLTTACEEVGWALYSSPWIQNSQEFSRIAIFGIQR CQKPPKIAVGGLLSELSMNYYATYMSKTFSFFTTLHVMLTKLEEDL DallOr 
MGEKLTPELAIEYTKFATGILSAWPTSSNTSRVCEVSLKSEWKISWIIFVVIVLSLLRTAYRNWGSF 62P+I MQLTESLSIVACMDQLIAKVAVGKQNYHPFQKCLVIRSFREPICQVSLLLWYGTIQLEFLAGNIRN VNNTQQLKECIIIHQHILWYIEEMAKIVRPMLMTAIIVAMISVPCGGIHIVGKENTITRIQFMLVVI GDGLELLCLSWAAENLTIACQDVT*ALYSSAWIKNSKELNRSVIFVMQRCQIPPRIAVGELVPQLS MSFYASYVSTIYSFFTTLRIFLKDV DallOr MSGKLTPEFVIQYTKFATGITFVWPPSSKASRLSEITFKIGWWVSWILCVVIVLSLFRTAYNHRDN 63 FMQLTETLSTAACLLQVMTKMVVGKCHYNRLQYLIHEMETFVENANFHEREILTNYIKRIVSFHF VYHTLCAILATFYIFGPFLMDQPLPFGAEYPFCIDEHPAYDIVYLLESIGGIQCSCIIAFICQVSLLLW YGTIQLAFLAEKMRNVNNTQQLKECISKHQHILRYIDEITKIVRYMVMTAIVVAMISVTCSGIHIV GKENIIRRIQFLLVVIVDGLELLCLAWAAENLSAASKEVGWALYSSTWIKHSKELNKTVIFVIQRCQ TPPKIAIGGLLPQLSMNYYGRYMSTMYSFFTTLRVMLKKFEDDV

DallOr MSREVTPEFVIQUTKFTTGIIFSWPPSLKSTRLTQIIFKICWWISLTVCEANILSAMKTAYRNRSHFL 64P+C KLTECLVLLFSLLLIEAVMIIAKYHYYRFQYLIEKIETFVETVNPRQRKVLMNYAARAALFYCRYNILA LVEALVDVCVPFVRDQPLPHMAEFPFPINAHPTYDIIFILEAMGTIQAYCQWAFICQLCLLLSYGII QLEFLAEKVERITSSQELVEGIHIHQHRLW DallOr MLGKITPEHGIQWTKLGTWIIFTWPPNSKATRLTKMSFQIGWWISLFFTIISLPFGFDTAWALRE 65 NTLMLVEILCLYICLIQVVAKMVIAKLNYHKFQYLMEKMETFVKNANPHERQVLITCVKRIAPSHL SYNMLCIIAALLYALAPFATGQPFPTVTQFPFPTDKHPTYDMIFFLQAINGFQCCASNFFDSQLSIL LCYGTIQLDILAEKMRNVNSVEKLKECISTHQDILRYVDDTIKAVRPVVITTVITATISLAFAGILIIGN GSIVHKIEFAGIILAYSCELLCFAWAAESLTTAYEQVGWALYSSTWIQNSKEFKRTAIFVMHRCQK PPAIRFGGLMPKLSMSYYASYMSATYSFFNAVRATLK

DallOr MTPERIYRILKFFGRLSCTWPPEENDNKFRRIINDLQFIVMMTNVIALLVPLMCGVYHNRHHVST 66 AMKALSELTALGDVLFNLILCRVQRDRLRGLLMEVNTYANTVEGDERKIFHRSVNQYLPFCAFVG LSYLQTAIAFSFGPLVMSSILPGDTWYPFPIEIFSTVYFLVYIQQVVAIIQTGMCITVDFMVAYLLSY LSARLQILNIEFRRVGNRRHLHACIKQHLEFIRFTGELRTAVRFIIVKGIVTMAMAAIFGAFPIIENE PLPVISQFILMVIGGCLRLYVSAWPADDVREMSERIGWSSYASPWIGSSREMQRAISIVVHRAH RPLVIAVHGILPALTLKFYASFLSSTLSYFMTLRAVLRN

DallOr TKSTTGIVFTWPESSKTTRSVVTILKIGWWVFWIFAVTQMMQTFYLAYTVRNSFMQLIHNLFHA 67NTE FCMNQVVLTMIIGKHHHHRFQYLIEEIETFMKDANSHEREVLSKCVKRIASRHFSYNIIAISATLGY ILGPITLDYSLPNRLVYPFAVDEHPSYDIAYLWETIGAIECCCSTAFICQVCLLLWYGTIQLEILAEKM RKVSIAQDLKEYINVHHHILWYIDETIKTVRPVVLTTVAMAMITILCAGITIVGNGPIIEKLQFMVIV MTFSIELLCVAWAAENLTAACEEVSWALYNRAVIFVMQRCQKPSVISIGGLLPKLSMNYYARYM SATYSFFTTLRVVLRKIEDDL DallOr MGRKITADVGIQWIKFWTWPIFTWPPNSKAPRLTKIIFQIGWWIFFIVSILQLASLLGTVWVKRG 68 NIQQFTEALCLAVGVMQVMVKMISAKCYYRQFQYLIEKIEAFVKNANSHEQEVLTKFMQRITPF YLGFNISAIVAALIYALGPFITRQPFPFVVEYPFRVDKQPIYEIIFLIQTIVCFQCGGTSFYDCQLSILL WYGTIQLDMLAEQMRNVNNARELKHCLSIHQDILRYIDDTIKTVRPVVITTVVGATICIASGGNLI VGTGSILNKAQFVIITLAYSFELLCVAWAAESLATAHQDVGWALYSSLWIRNSKEFNRMVIFVVQ RCQTPPVIRIGGILPKLSMSYYASYISATYSFFTTLRAILK

DallOr MGKRSTPEFAIQWTKSTTGILFTWPESPKATRTIVIILKIGWWVFLTVAALQIMLSFYTACRVRNS 69 FMQLIHNLFDTFSMVQVVFRMIIGKHHHHRFQYLIEEIESFMLNANAHEREVLSRCVKRVAPFHL SYNILAISASLGYILGPVTLERSLPNKLEYPFAVDEHPTYDIVYLWESIGVIQCYCSTAFICQVSLLLW YGTIQLEILAGKMKKVSSAQELKEYINIHHHILWYIDESIKTVRPVVLTTVAMATLTILCGEITIVGN GPIVEKMQFVVIVVAYSIELLCVAWAAESLTTACEDVGWELYNSPWLQNSKELNRAAIFVMQRC QNPPVIAISGLLPKLSMSWYATYISSTYSFFTTLRVILRKIEDDL

DallOr MIKRNIEDLWHIDARRNFRIYKVAAQIMGVWPFTCQEKFSKIRFFSLIIILISMAAMLAHDMLSN 70 CGNIDTTLESIVFVLSALLGIVKITLPRIYWRNIESIIVSAAYDWSTTINPKSRKIMEKTSLIGTAAFILL LGGSLFISVLFVLHKVMLNFRMSSRNLTTQYVVFGAGCWRSNLPISINSIYAVQTIQLGTMQLCV SGNDACYYQIISYLSGQLDILNLNVEELSHSYDRKTSPIDEFIRRHNCLLRLCWHVEDTYNFVVMC HLMNNLCFTMMIVLTTWNENKGLGYLVIFGSITIFLYGQIFLYSLGGDVIKTKTESLFHYIYSCPWY QLPTSERRKVLLILTKTNYPIHFTAGNFYRLNLENFKNIVKFTVTLFSFLRLSLQE

DallOr MKRNIEDFWDIDALRDFRTYKVTAQIMGVWPFTCQETFSKTRFFSIIIILTSMAAMLVQDILRNC 71 GTINEALESAVFVPSALLGILKITLPRIYWKNMKSIILSTAQDWSTTTCTQSRKIMGRSSVVSTAGF LFLLGGSLSISVLAVLRKAALNFRLNSENSTIQYVALGAGCWRSDLPMNIYLIYATQSIQLCIMQLC VNGSDACYYQILSHLSGQLNILKLNLEGSINSYDGKSSPIDAFVKQHNRLLRLCYHVEETFTFVVLC HLMTNLCFIMMIVLTSWSENEERGNLVIFGCTTIFFYGQMFLFCFGGDVITTRTEALFHSVYSYP WYKLKVFERQKVLFILTKTNYPMHFTACKLYRLNLENFKNIVKFTASLFSFMRLFLQK

DallOr MRPTNLIVIKFISFYMKCPGFWPLESVRAKRFMNFLVAYAIFMTFFSYIQSVVIVYYAFDDFQKFA 72 SATMNTAICTMGLYKLLLFTYKRPSFLHFIGFVNKNFWCKSYHHNNDEIMNKCLKKCTFYVIFIVII CHTTLLFFFIQPLIDNRGKNESDRELPFAVYTSLPVELTPWYEIIFTVEFISVFYICVCYFCFDVFLYGV NLTLVGEFIILQEDLKGICHFDELPDPSSKYRGCIYSQFIKCINQHRLLTCCADNLRELYKDCVVSFV VILSLLICLAMYQLMTIRGKLLSQIHAYIYVTNILSELFFFTATCDDLSQASSGVAQAAYAVPWISIR SDVLRGRLVNGLRLVIMRSQKPSQMTVGDFSPVTLQTFTSICNLGFSFLALMRQS DallOr MRSADLAVIKVIAFYMKFPGFWPLESVHAKWWMNFMVIYAIGITLFSFYVSATIVYHSVDDFQS 73 LVSATMNSAIVVMGLYKLLLFTYKRPMFLDFIVFVNKHLWYRSHDQNNDEIMRKCLKDCAIYVIF IVIICHMTLSFYFIQPVLDNRGKNESDRELPFKWYTTLPVEMSPWYEMSFTVQLISVFYVCVCYFC FDVFLYGVNLTLVGEFIILQENLKGICHFDESPDSTSQYRGCIYSQFIKCINQHRLLIYCADNLKELYK NCVASFVVVLSILICLAMYQLMTIRGKLLSQIHACIYVTNILSELYFFTATCDDLSQASSGVAQAAY TVPWNSIKSHVLRRRLVNGLRLVIMRSQKPSQMTVGDFSPVTLQTFTSICNLSFSFLALLRQS

DallOr MTPTPGDSMAVRLTAFFMRCSGIWAEDTVLGQRIMSHVTSYSICSLLFAFTVTTNDLYHCFGDF 74 GKAAYCALNVSIVGMALYKQIIFATQRKLFLDIISYAKKNYWYRSYDDYGVRVMKQCATRSGVIII ASVVLCHATLIFYYIKPLIVNQGKNESDRVLPFKIWLDLPVTMTPWYEILYVVEVISAYHCCICYFCF DNFLCQINITLVGQFMILQEELRNICDRSDDSSKATPMDEICIYLRFKKCIIKHQDLINFIELVKELYK NTILGMVLVLSILICLELFLLITTSGELFTEMHSFVNVCSGIVQLFLFMLTCNDLAEASVELSHAAYD VKWFFLRSGVWKKQLVHDLATVIRRSQKPCNLAVGGFSSVSLQTFTSICNTSFSYLALMRQTVN D DallOr MASSPGDSWAVRLTALFIRCCGFWAEDTKFGRKVMDAIVTYSLVALLFAFCVTSNDLYNCYGDF 75 DRFTYCGLNVVTVGFGFYKLVAFSVKRRMFLDLIHYAKKHFWFCHYDDYGAEIMGNCMKRCLRI IIFAVFMCHLTIIIHYIKPLIANQGKNESDRILPFPLWIDLPVTLTPWYEFLYIIEMFSGYHCCICYFCF DNFLCQINITLVGQFVILQKAFCDIYDQRLALLSNDESYIRARFVTCVEKHQALIAFTEKVKELYRD VILGVVVLLSLLICLELFQLMTTVGQTLARIHYCVYAGGSIAQLYFFALTCNDLTEASLAIANAAYDV RWFLIRSEVKKNALVKDLQTVIIRSQKACSLTVGGFAPVTLMTFTSICNRAFSFLTLMRQSLDN

DallOr MLKIDREESGTIQWTYFFLRCTGFSVPSDPRKDLLKWVRLWAFCVPTLACPITFGDFYFNCYYGS 76 FNDLCYSMINVLTNFFVMAKFYVIILKPQFPAFLQLTRDELWGNAVTDYDREILRQCEKDTLFYLT IFSILAQSSSLAYIVEPILFNCLNHNMTDVRERRFPLKVWYDLPIFETPNFQIFFFIQFCFIYFASIQW LAYDNYLALVNIYSAGQFKILRKRLKDLYDKVGKDGESKKLDDHGVEDEGYLSSAIVNEFKDCITR HRFLIDVIEQLDSIYTIINLVQVVTFSLIICLVGYQLIMPGNPLFRRIKFVIYLGGCIIQLFSFAITCGNV TDASVEVADGAYESNWNSKNSSERGRGLTKDLMMILVRSKTPCYLTAAGFFPVTLNVFNSTLST AFSYLTLIRQSADKGSN

DallOr MASTIQALPMRITFFFLYIAGFGIAKTDKDRKLLNMCLVWSCSTVLVAFSITIVDLYYVWGSFND 77 MAYSAVNLLTDVIINAKFFTIMFNRQQYEDLLKAACESLWSDTQTEHGRRVLKKCENQAMFFIIL FATFAQTSGIAYFVEPVLLNIQNNSTDVKKRLFPFKVWFNLPIYETPNFQIFFLIQMFVTYHSCILYF CFDNYLVLVNIFITGQFSILKYRLEVLYNEKIIESITRSCVGCEQLSGQDDSLMSISREFKNCIKRHQY LIGFVEQVEEIYTVVNLASVLVYSLIICLAGYQLIMPGNPLMRRVKFVVFISGCLTQLLSFSFTCNNV SVGSVEVSEGPYNSDWYSRNWSKRGRSLTRDFVIIIMRSQRPCCLTAAGFFPVTLDTLKSVITTAF SYLTLIRQSIQ DallOr MFSFKKTIIMLYLRDAIKIQSHVYTDELSNPEWYWSPDERYYSHQEMAYTGINVLTALIIVIKFFSL 78 LSQRSHYEGVLQISRKSLWGNAQTKYEENVLRQCEKQAMLFVIFFAIFANSTAFLYITEAILYNIGH NITDVTERRFPFKIWLDLPIYETPNFHIFFFLQTVMACYAGIMYCCYDNYLVLVNIFIAGQFAILKY RLELLYNRKIVDSSMNKDSDRGKPNADLKVAERDFKDCVKQHQFLIWIVSELECLYSLINLSSVLV YSLIICLTGYQLIMPGNILIRRVKFTVYIGGCLTQLLSFSYTCHNLSLASVDVCQGPYNSKWYERSHS ERSRSLTRDFVVMIMRAQRPCHLTAAGFFPVTLDTLKSVLTTAFSYLTLIRQSSMVTVN

DallOr MPFDREALPVRLTYFFLYLVGFGTGKTAKERRILNLWLSNGFLASIFGLTIIFIDFYFVWGSFQMAY 79 TGINVLTALIINIKFFSLLFQRSHYEKVLQITRDSLWGNAQTKDEENVLRQCEKQAMIFVIFFAIFA NSTSFLYIIEAILLNVGNNITDVTERRFPFKIWLDLPVYETPNFHIFFLLQSGMACYAGILYCCYDNY LVLVNIFIAGQFTILKYRLELLYNRKIVESIMNKDSDRGRLNEDFKVAAREFKDCVKQHQFLIWIVG EIESLYSLINLTSVLIYSLIICLTGYQLIMPGNVLIRRLKFTVYIGGCLTQLLSFSYTCHNLSLASVDVCQ GPYNSKWYKRSNSERSRSLTRDFVVMIMRAQRPCHLTGAGFFPVTLDTLKSVLTTAFSYLTLIRQ RSMVGVD DallOr MTSDSKEILPAKISFFLLYLGGFGITETHSERRILNLWMVYTVLMVTVGAATIGCDLYHEWGSLEG 80 MTYAGITTLVTTIISIKYFTTLMKRPWYVRVMRLARDTLWGHAETDYDKAVMKKCERHALFYVT AFAYLAGSSGVLYIVEPIIYNSLNNITAPSDRMLIFKIWQDWPVYETPNFEMVYILQALITVQLCVL YSCFESFLVLANTFITGQFSILRYRLEVLYSHQLIDRITSSYMESGEMDGEKNSLAFVSQEFKNCIRQ HQFLISVTEELEGLYTIINLASVLIHSTLICLCGYQTIMPGNALVRRIKCTVFAAGCISQLFFFSFTCTN LTLGSLTVGDGPYNSAWYNENSSEKGRSLTKDYSIMIMRAQRPCRLTGGGFFYVTLDTVKSVLTT AFSYLTLIRQSSMN DallOr MTSGQREILPARISFFLLYLVGFGITETDQQRRTLNLWALYTIIIVALGTISIYSDFYHEWGSFDGVA 81 YTGITMVTATIINIKYYTTLMKRTSYERLMRLARDKLWGQTQNDYDKEVMKKCEKHALFYVGAF AYLASSTGVLYIVEPVIYHWWNNITTPSDRMLIIKIWQNWPVYETPNFEMVYLLQVFITVQLCVL YSCFESFLVLANTFITGQFSILRYRLEVLYSHQLIDRITSSYMESGEVDGEKNFLAFVSQEFKNCIRQ HQFLISVTEELEGLYTIINLASVLVHSTLICLCGYQTIMPENAVLRRIKFGVYAIGCISQLFFFSFTCTN LTLGSLTVGDGPYNSAWYNKNSSERGRALTKDYSIMIIRAQRPCRLTGGGFFYVTLDTVKSVLTT AFSYLTLIRQSAMN DallOr MMSHPVETLPIRLTKFFLSAAGFTIATTRREKIVVDVIVTYSTIALSFACCVTGMDIYNCWGSFYEF 82 VYSFIGFGTCFIVQSKFAVFMVKRKKYLKLMKYTSEILWTRHHTEYGKGIIKECEKQAMHFIILFTFL AQGVSLCYTIQPILLNIGKNETDRLFPFTFWVDLPIYISPWFEIAFAIQVLSCYHTSICYFCFDNYLAL TNIFITGQFKIVKNRLETLFDVHTLDPKDGNGIKRWCVARSRGLGRIARELKNCVKQHQMLIDLV DQVEDIYSFINLLQVLVFSFLICLVGYQLILPGNSTMRRILFVVYFGGCFTQLFTFAFTCNNLTLASL DVGEGPWHSKWNTKIRCKEGRAIVRDLQMVIMRAQSPCRLTAFGFFPVTLNTFKSMLSTAFSYL TLMRNSEEMIE

DallOr MNVLLLYTIGGLVTSFVTTAIDIYHCFGDFDKFTYCSMNVFTVGFGFYKLIAFLLKRKEFLNLISHIM 83 EHFWNVDYDDYGSTVMKDCMLRCTRIVTFAIFMCHLTISMHYVKALIENRGKNESDRDFPFRFY SDLPLSLSPWFEILFVAQIFASYPCCYTYFCFENFLCQINITVVGQFLILNKEMREICDTKDTSSLSIES GIRPRLVRCVQKHQHLIACVETTKELYRSVMLGVVILLSFLICLEMFELMVSANSNTFYTFHYCLYA GGSIVQLYFYTLSCDKLTEASESLSDAAYDVRWYLTESGASKNQLVKDLMMIIMRSQRACSLTVG GFTPVTLQTFTSICNTAFSFLTLIRQSV DallOr MDGLVVYTISAVIGALFFTLIDLYKCYGDVDKFTYSTMIFFTGGFGLYKVVSFALKRKSFLDLILYAK 84 RHFWDIEYEGYGADVMRKCMLRLTGIVTFAIFMCHLTVVLHYIIPLLENRGKNESDRIFPFRLYSD LPITLSPWYEILYLGQVIATWPCCYCYFCFDNFLCQMNITAVGQFVILQKEMREICDNLDGSRLLN EKNIQSRFIRCVRRHQQLIDFVEDIRELYRSVMLGVVILLSFLICLEMFQLMTSTSSTKLSTFHYCVY VGGSITQLYFYTMTCDNLTEASLAISNAAYDVRWFSIKSEALKNRLVKDLSMVIMRSQRACTLTV GGFAPVTLQTFTSICHTAFSFLALIRQTIQKNDS DallOr MDGLVVYTITSVIGALFFTLIDLYKCYGDMDKFTYSTMITFTGAFGTFKVISFAFKRKSFLDLVCYA 85 KKHFWNVEYENYGANIMKECMLRLSGIVTFAVFMCHLTVVLHYIIPLLENRGKNESDRIFPFRLYS DLPITLSPWYEILYVGQIIATWPCCYCYFCFDNFLCQMNITAVGQFVILQKEMREICDNHDGSRA SNEHNIRSRFVRCVRRHQQLIDFVENVKELYRSVMLGVVILLSFLICLEMFQLMTSTNSNKITTFH YCVYVGGSITQLYFYTMTCDNLTEASLAISNAAYDVQWFSIESEAFKNRLVKDLSMVIMRSQRAC SLTIGGFAPVTLQTFTSICHTAFSFLALIRQT DallOr MDALVVYTISAVLGAIGFTTVDVFKCVNDIEKFTYSSMNLITVLFCLYKVVSFALKRRQFLDLIRYA 86 KKHFWYIEYEEYGAAVMNECMRRLTGIVAFAIFMCHLTISIHYIRPLLENRGRNGSDREFPFRLYS NLPIKVSPWYEILYLAQIFSTWPCCYTYFCFDNFLCQMNITATGQFVILQKELREICDKRNESTVV NEIHVRSRFVRCIRRHQQLIAFVEDIKELYRSVMLGVVILLSFLICMEMVQLMTSSSSSALSTFHYC VYVGGSITQLYFYTMSCDNLTGASLAISDAAYDVRWFIIKSGASKNQFVKDLSMVIMRSQRACTL TVGGFAPVTLQTFTSICHTAFSFLALVRQTLRDNEQ DallOr MTSTKRGKLPMKVTYFFLSCVGFGVAETEYDRKLLNISLCAFLLLSTASIYFTATNFYLKLNRGIISET 87 IEASFPLLTSSLTHFKLVLFTLQRTPYEHLLQRTNETLWIQSSTEYGEIVLEKCERQAQLFLFIFVILGH ASGVGWIFEPLFLNMRSNSSTRHLPTEVIIGVPLFESPNYEIFFAVNALGIYIITILYFCFDFYLVLVNI FIVGQFEILRQRFEIIYSVKPNQSVIKNGHDNGKNNHRQLATMDYISQEFKNCAKQHQFLISLME EIESIHNMMNLAQVLMISLIICLVGYPLLMPGSALTIVKNGVFICSCLIQLFTYTLSCHNIMIASSDI ADGIYFSRWYHENHTEAGRSLSKAFMIVLLKTQCPCYLTAWGFFPITLDTLKSILTTAFSYLTLMR QSMEQ DallOr 
MALLNENFLLLHYFGVWPSAHWRQRTWKTVVYSLYTSYIIFCIYWFTISGSIHLLLITDDVEDFSES 88 SFMLLSLLALCVKVVIALLKKSEIIGLLTDLENYPHKPANLETQLLQDKINEQIRFCTLCYGGLGEAT VCYGTIAPFFQSIPFGVLPYKAWIPYDYSKPALYWTTYCQQLSSVFIAANINFGFDTIIFEFIMQMC AQFSMLKYRVEMMINEFDEKQILSVSKINPTLLQSYEEYMTDCVKYHIDILRLCKRINTIFSSIIFVQ YTASSIIICVSVVLVSQMPVSSPKFVTVAGYIACIMLEIFLYCAAGNEATFQCQNLMTAIYSTKWYS LSDRMKKCLGFMMARSMKPITFVSHHMIVLSLPSFCVLLKTSYTAYTMLQQFAD DallOr MALLKESFLLLHYFGIWPSVDWQPRTWKTIVYSLYTCYVIFSIYWFTISGSVYLLVISDDVEDFSDT 89 SFMLLSMLAICAKVIIAISKRSEIISVLTALENHPHKPVNLETQLSQDKTDERIRFFTICYGGLTEATV CYGTIAPFFQNIPFGFLPYKAWSPYDYSSSLIYWCTFCQQLISVFMAANLNIGFDTIIFGILMQICT QFNMLKCRLKMIVDEFDAKQIMMRSASDPMILRSYEKYIVDCVKYHTAIFRISKKINYIFNSIIFVQ YSASAIVICVSVLLISQMPMSSPKFMTVAMYVACMLLQILMFCAAGNEVTLECESLTTTIYNTKW YLLSPSMKKCLGLMMVRTLRPVIFVSHHIIILSLQSFCILLKSSYSAYTMLQQFSG

DallOr MVLLRESFLSLQYCGMWPPLHLESKSWQYRTYSLYTIYVMAVMYWFTLSELINLLITTDDIEDFS 90 DTCFMLLSTAAVCAKILITLMKRSEIRDVLVTLESCPHKPMNSEEQAIQDEYDGQVRFLTLFYGVA TEITVWCMTIFTFFQGIPFGVLPYKAWVPYDYSEPRVYWFTYYQQLLSVLLAANLDIGFDTIIPGF MLQICAQFNILRCRLHRVVNYFDDTGCSKAKSCEVLELYENHLVDCIKYHRDIFEMAKKINSIFTGI IFVQYAASFIIICVSVLLISQMPLFSPKFLALFMYLSCMLLQIFMFCAAGNEATIQCQSMINGICGT KWYLLENRMKKYLVLMMLRTLRPVRFVSGFIIVLSLDSFCILIKTSYSAYTVLQRSSS

DallOr MVLLRENFVYLQTIGLWSPVTWPSDSWKSRAYRLYTTFLITATYWFGITESISLFTIVNSIEDFSDS 91 CFMLLTTIVVCFKVDTILSKKSEIVALIEAFEVYPHKPMNADEERIQTEFQARIRLISCICGGFVEVSA WIMTISVFFQEIPYGDLPYKAWIPYSYSKPGVYWFTFCLQLLVVVFLANVAIGFDTIIYALFLLICSQ FNILLHRVTKAIDEFPTNNTSKIDICERKIIDCVVYHRAIFKISAKINLLFRRIIFVQYTASTIVICFSVF MISQVPLLSTKALFFFLYFICMLTQIFVICALADEVTVECENVIIGIYNTKWYHLTNRTKGYLVLMM VRTLRPVVFTSGHIIVLSLNSFSNLLKRSYSIYTVLQQNSK

DallOr MVLLREHFVYLQTIGLWTPVTWPSDSWKLRAYRLYTASLITATYWFGITESISLFTIVNSIEDFSDS 92 CFMLLTTIVVCVKVDMVLSKKTEIIALIDALEAYPHKPMNADEERIQAKFNARIRLISCIYGGFGEIS VWVMTISVFFQEIPYGDLPYKAWLPYDYSNFAMYWSTFYLQLLAVIFVASVDCAFDTTLIGLIILIC AQFNILRARLERAVDEFETGELKFQEDISNVVKICENRIVDCVKYHRAIFKYTINYLYRWIIFVQYSA STIIICISIFMLSQVPLLSTKALFFLLYFMCMLIQIFTLCAVAGQATIECENVLTGIYNTKWYILTNRA KSYLVLMMVRTLRPVVFMSGYVIVLSLDAFSNLLKRSYSAYTMLQQNSRK

DallOr MVLLRENFIQLQYIGLWQPLCWPSNDIKSRVYWLYTIHILVMIHCFTLSEALSLFTLIENIEDFSDN 93 CFMLLTMFAVCVKAVVTLLRRSEIIDILTALDVHPFKPMNTDEERIQEEFNKRIRFFTFFYAGVVEC SVWIMSISVFFQGIPFGVLPYKAWLPYDYSKSTVYWLTFCAQLFTIIIGANICIGCDTVMPGFYTIV CAQFNILRYRLEKVIDEFDDAIVLKPEEMIFTYREHCEKGIITCVQYHKAIFEIAKKINSISSSIIFVQYL ASSIIICLSIFMLSHMPLFSSQFVYFTLYLTCMMSQIFLLCASANEATIECQNLTTGIYNTKWYNLN HRAKSYLGLVMVRTLRPVIFKSGHIIVLSLSSFSNLLKHSYSAYNVLQQSSK

DallOr MESLRHSFTLLQYCGMWPSHRWSSFSFKARIYGLYTVCVLAMINWFVVSELFSLSAIRSIEDFAD 94 NCFMLMSMIAASGKTAVILIRREEVLVMLNTLDNYPHKPMHVDEQNIQNQFNRRIRLIFLLYAG QFEIGVWFMSIFMYMQKLPFGVLPYKAWIPWDYSAPGMYWFLYWLQLITVVLGANVASALDT LISGFMMLICAQFNILKCRLERTIEEFMIETSSKSDRIRCDSGQISERRIAACVKYHRAIFQLAQTINS MFTIIIFIQYTASSIILCVSVLIMSQMPVTSAKFMSLFTYVWCMVLEIFMLCASGNAATVECERLIF DVYCMKWYRLTERAKKALLLMMVRTLRPIVFTSGYIIVLSLESFTKLLKLSYSVYTVLQQSIR DallOr MLPYSIALLRVWGVWVPQDWWPAWRKKLYPAYTVFVICFVYANTLSQIIELFTTYETVKGFINK 95 SFILLSTIAGGIKGAHCILHRQEFINLAKTLGSYPCRAESPEEEMIQREFDEKIKHMNRRYITMHLFT ITTLTIASILRDVPKGELYYKAWIPFNYTSPPRFWSAYIHQVIAHYFDLCMHAGYDTLAPGMMIH ACAQFAILGHRFRLLPERIKERLKESDDINESQELKALETKKLTECVRHHLQIFEFAKTNNNIFSGMI FMQYTVSSLVLCMSTLRLSQIHAFSPTLLSIFLYFLSMIVQIFMPCFSGNQITIASEKLCDDIYSMD WTILSTSTKRSLTMIITRAQKPLRFTSGYVLTLSIESFNSVVKTSYSAFNVLHGSSDF

DallOr MLTQEKDGVKGFLPGSWWVFVATGLWRPRSWKSPFLVYLYQGFSIFTIFLVYTFTTTSILGIIANH 96 GGIAAVMSDFLLLSFIACCGKSINMIVCRDTIIDVIDTLQRDPCMPRDAVEEDIQFKREHFVWINT LIYGILTEVTAMMVSVGSLLQHPEEGELPFNTWLPYYHDSGFGYRFAYGQQIISIWFSASMAVAY DTAVPGMMMEICAKLDVLKYRLVNFRTLLETSDTVGAYVSERKLVAECVKCHLIILRLAETINDVF NAVVFLQYSLSTLLICVSIYNLANTNIMSSEGSGIILYLGCMLMEIFILCAAGNELTLVSESISDAIYE MDWTELNGSTIQSLILIMTRTTHPIVFKCGSIVDMSLESFKSLVKLSYSTFNLLQQTSA DallOr MVPYSFVILQLLGFCRSATWQEGWKTRLYDCYTYVMLFLLYTNALFQSIYVFTSFESVDVLIDNTF 97 ILLTTINSGLKATSFVRRRQEIVDLLRRYRSQMCSPRDEAEEFIRFKYERVIHNLSLCYISFTMCTVIF QTYVQWRELVPRGELIYKAWLPYDNSYPLTFWISHMHQVICQLADASVSCTYGVIVAALMIHIT AQFVILNHRFENLAASLGFADGSGKGLSKMRENDDHEGIVMMEREKLAECIKHHLCIFECMEM SNDVFGATMLAQYAASSFILCGTVYKVSKMSGLDPELIGVLMYLGCMLFDIFLPCYYGHNVTTES FRVSQAVYNMYWSALKTPTKKSLVLIMSRSLKPFRFTSGYIVQLSLDSFGGLIKTSYSIFNLLKQQS N DallOr MQILPVSFILLQYIGFWRSENFSSGWKYMAYSIYSTLLVIIIYSSIISELIGIATSNDNIKEIVNNMMIL 98PSE LSMIGACAKATTVLLQREGISNLITILRAHPCMAKEPEEVAIQDIYDKMIR*AYVTLSGVTSFLVTT AALWTGAPHRILPYKSWYPYNTSTLAGFWTANVHQIISHNYGACINAACDTLIYGLITQICAQFAI LQHRFHRLPKSLTRIGRNTEQWEKNEIRNCVRHHLQILHYAHGCNRIFDTLICLQFCISSTVLCVSV FRLAQINLSSPDFLLIVMYLMCMLLQIFILCISGSHVMFESHNMVHGIYAMDWTPLNLNTKRSLL FIIGNCLRPVKFTCGTVIPLSLDSFNQLIKLSYSAFNVLQQSSG

DallOr MLPVSFTILQYIGFWRPVKLSSEWKCMAYGIYSTLVVFTVYSSIVSELIYMATSNDSAKDTINNLVI 99PSE LLSMIGASAKLTTVLLQREKISNLIAILRTHPCEAEEHEEAAIQNIYDQTI*YVILSGVTTAMVTGAS LCTEAPSGILPYKAWYPYNTSTPLGFWSAYLHQIIAHAYGAFTNAACDTLIYGMIMQICAQFAIL QHRFHLLPKSLAAIGKNIEQWERKELGNCVRHHLRILHFADECNRVFDSLICLQFLISSTVLCVSVY RLAQIELSSPDFPIIVMYLMCMLSQIFILCFSGSHLIFESHNMVHGIYDMDWTPLTLNTKKSLIFIIG KCLRPVNFTCCTILPLSIHSFNQLIKLSYSTFNVLQQSSGVSH DallOr MLPYSFSVLGFWGVWLPPDWSSGWKSRLYYFYTAGMIILVYTNTLSQVIDLLVTYKSLKHFINNA 100 FILLSTIGAGVKAAHCLYIRRRILGIEEKLNTYPCKPQDEEEKAILQHFSRIIKILNIYYIGLYVCTITALT TVSFFRDVPRHQLYYHAWLPFNYSSPGRFWAAYMHQVIAHGFDACMHAAYDTIAPGIMIQAC AQFAILDHRFQLLPKLVDRVRDKSALDFPQDEASQRRRVMRFEAKKLAECVQHHLKIFQLTKEN NKIFGIMIFLQYSLSSVVICLSVLRLSQVNAFQPALISVILYLICMVSQVFMPCYSGHQITLQSSKVS DAIFSMDWPDLGVATKRSLILIMQRSQKPLQFTTGYIITLSIESFNSLMKLSYSVFNVLHRSPSF

DallOr MDFHKETFLILTYMGLWKPKNLSKWKSIFYNLYSAVVVTMMTTYGLSRLVNMIVSTKNMAQFI 101 FLGVLESGLKGCTLFFYASKIINIFEMVSRPPFQCQTSEESLVHQQFSREVRRVSRYCFMYIMIAAP YYGIDAFFTALPKRKLPLECWLPYDYSSLYAWGFSSVCLLLGMLWGITFNVVCNCLFFEMMMQ VVLHVKILKIRLRAMINTQPNANRTNNHCRGNQLLLERQSLHHCVQYHIAIIRLGKDIKAILSTIILI KYSLTSIMLCTTVYLMAKTPVFSTNFVSFSVYFGFIFYQIYILCYAGHRIRMEFDSIGEVLYTSNWIA LSETSKQSMKFMMINAEKPFVFTCGGILKLDIEALKNTLQLAYSIYNIL

DallOr MSFFSLNILAFRILGLWFPEETSSVCRIILYRIYNAFAVTAMCTFVLSQFFALLECLNDTKELTNASF 102 MLLTMIAVAGKMFNMTVYRYEISRMVKLFDLEPFKSMDDREVVIKSKFHRSVKRFISVYGTLGT ATCTLITLFSLIRDMPRRQLLFKARYPFDDSASVGYWVSYIHQLYSHYMGAVVNMTFDTFVSAL MLATSAQLEVLKYRFVMMPSAMENERAETGCKGNPEEVERMEAKYLRNHTNHHLAIFEFSKM TNDTFTVSIFLQYCASSLVLCVSVFTLTQLKPLSKEFNSLLMYIGCMLVQIFIFCDAANDVTIRSET MAEGIYKMDWTSLSINSQKSLVLIMARTLRPIRYTSGHVVSLSLVSFSSLLKLSYSVYNILTQSSE DallOr MKIKIFPLNFFVLQMMGFWRPPHWSSPSKIIIYNFYSCFMCFTLFSVTLYQLIEIIKSPGSVDNFIK 103 NSRVLLTMMNACAKSINFFKRRGDILKAIDLIASSPCCPRDVAEDVIQGKYDKSIRRNSIIYGGFIH TAVCSRVLEAARECAPNRMLPFKSWIPYNIDSDLSFWLTYIQQNLAAYISAYVSICYDTIVPGCMV LTCAQLQIFKSRLQDFRPDDDKNHEEHLNNHDGDKKLISNCVQHHLKILQFVESTNNIFTFTLFT QFSISSLVICVSVYDLSKSTPFTGDFVEVVLYLMSMLLELYIFCLYGNDVTVESGRIAGHIYDLDWP SLKTSVQKELLIIMTRTMKPLRFISGHVYVLSLASFTGLLKTSYTTYNVLEQMS

DallOr MKTSILSLNFFVLQIMGFWRPPHWSSPLKIIIYNFYSCLMCFVLYSVTVSQLIEMIRSPGTVDDFIN 104 NSRVLLTMMNACAKSLSFITRRVDILRAIDIIATPPCSPRSAAEAAIQEKYDRSIRTNSIIYGTFIQTS VNTVILQTAMECIPNRILPFKSWIPYEINSDLTFWLTYMQQTVATYVTSYVNICYDTIVPGCMVQ TCAQLQIFKSRLKDLCEHDCENNKHSGLQDLEYDDSVDKKLVSNCVEHHLKILKFAELSNSIFTSTI FTQFGISSLVICVSVYDLSKAAPFSPYFVEVILYLMSMMLEIYLFCFYGHNVTVESSRIGSDIYDLD WVSLKIPVQKSLLIIMTRSMKPLIFTSGHVVVLSLASFTSLLKMSYSAYNVLQQAS

DallOr MPSQENSSMLFHRETFLFLRYIGVWKPQNLPKWKSILYNIYSTVTVTLIMTFCLSQFSGIILSKSAN 105 VKALMQNMFLALTTMCICLKIINLFCCRSKLINILEMLTTHPFECQSSEEYSICYKFSRRVRCLSILSI AYIVIAPLYFAIEEVLGNLAKRTLPFSGWFPYDYSSPSVWWCISLYLAISLYLEAVFNIVYNALFFEL MMRIVAQVKILKHRLQVMMTDLGNAMSENSAEKHLLLERHLFRDCVEYHVVILRMAKDINSIF ATIMMIQYLITSLVLSSTVYLMSETMLASRDFLKLTVYFLSMFQQIFMLCYAGHCTYLEFDSIGDV LYTCNWMTLSENSKQSMKLLLVNAKKPFVFDCGGLLKLDMEAFKNTLMFAYSIYNIF

DallOr MSKFFHKETFPLLKYMGFWKPQELSKWKSILYDIYSTIIVTLIMSSGLAQFLGICFLSHTNLRVFMQ 106 NIFIGLTTMSICLKVVNWFYSLSKIINILEMLKQHPFECQTREEYLIYHRLSSKNRRISITSLTYCMMS VAYYFIDQIVTIRPQRTLPLSAWLPYDYSNSSKWWFSSSYLALVQYLMAVFNEVYNVFFFELMM QSVAQVKILKNRLHVMMTTLVKAHSTKNHLTKNDLLMERELYHKCVQYHIAILRLTRDINSIFAPI MMVQYLITSITLPSTVYLISKTMICSRDFLKLGAYAAFLLQQMLILCYAGHRTYLEFDSIGDVLYTS DWIALRESSKQSMKLMLVSAQKPFVFDCFGMLKLDMEAFKNALMLAYSIYNIF

DallOr MDMDFLPHSFAIFTASGLWRSLRWSSGLKKHVYDCYTFLILVIVYLFGFMEFVDAMCDFGNVEE 107CT MVSASFMLLTTGNTFCKALNMMNRRKDINIFAILNDDVCRSKNQEEDEIQTKINIRIRWVCTSR E CARVKSAVVTITISSVKQNVPDRTLPFKAWIPLNYSSDNVYWYFYYYQVIAYTTVGIISIGYDTMV AGIMLLTGAQLKIFKFRCENMLANVEEEQKKSNLSKVDLERKILKQSVWHHKTIL DallOr MDFLPHTFAISTAGGLWRPLRWSSGLKKHVYDCYTFLILVIVYLFGLMEFVDAMCNFGNIEEMV 108 SASFMLLTTGNAFCKALNMVNRRKDIINIFAILNDDVCRSKNQEEDEIQTKINIRIRNDATSYFCIV QCAVVMITISSVIQNVPDRTLPFKAWIPLNYSSDNVYWCFYYYQVIAHAEVAIISIAYDTMVHGI MLLTGAQLKIFKFRRENMPANLEEEQKKSNLSKVDLERKILKQSVWHHETILQFAMSSNEIFSTVI FIQYAVSSFVICISVYRLAGMALNNPELPFALIYLLCMVNQIFCFCWYGNEVILESLGIGEAIYKMD WPSLHLRTQKDLLMIFNCTIHPIVFTSGKILVLSLESFTAIMKLSYTAFNVLQQR

DallOr MRHHVSRNPSTFFHKETFLISKYMGL*KPQELPRWKSIFYDIYSTVIVTLIMTFGLSQFLGILFILEP 109 SKPESIHGKYIHNSLNDIYLFESCQLXLYCPPKIINILEMVKRYPFECQTSEEYLIYHRFSSKVRISISFFI P+F YTMIAALHYAVDKVLTTLPLRILPHSAWLPYDSSSLLTWWFSSMFLAVSVYLDSLFNMVYTIFFFQ LMMQIAVQVQISKYRLEVMMATLLKVNRIKNHSTGDDLLLECELFHNCVHYHIAIFLARDINSIF GLIMMTQYLITSLALSATVYLLSEIMVFTSDFLKLSGYLGFLFLQILIVCYAGHRTYLEFDSIANVSYT SDWITLSETSKQSMKLMLVNAKKPLIFDCGGXLKLDIEAFKNILILAYSIYNLHFCN

DallOr TSIIKDVPVRTFPYRLWIPFNYTVSSYWVIYPAALTGITLASTFHVFHDAFITGLFLTL 110 CVQLKILRTRITNKQIFKVKSIHKFLEHYNLLKQCAQLLNETIVYVIFMQYATSCFVVGLSVCRLVN N+C MNYGDPAYLFTIGYYICLLLQTFYYCWFGNEVTVESSNIRDSIYGSEWYSLDKTRLRDIVMIMQR VSTPIQFSCTQLFVLSLDSFKQ DallOr MLKSIYLSVMEADPLFLHISVFKACGIWPPAEWTSLWKRMLYKSYSFLIVGIICSNCVLQSVDAFT 111 SMGRINDLVETLYILAAGCNLSYKAFNVVIKRQDIIKLLALPRSNQCHVLSCEEEAFRRKYHLRVRN STILLCTVLEFTIVNMIISSIFGNIPERTLPIKIWLPSNFTEGYRYWMIYCSQLIAFTYAGTFHVFYDTF IIGVMITLCGHLQILQYRNSKMAAPDIQFQNYNFGVSYGPTKSAIERKLIIKSMEHYELLLQFAKSF NEIIFYVIFVQYSISCFIVCLSVYRLVNMSFSDPAYVFTIFYCGTVIIQTFSYCWFGNEVILESANVSES LYNSEWYSFDLATLRDFSIMLHRVQNHIEFSCTRLFVLSIDSFKNIMKLTYSGLNLLMQSS

DallOr MNVTALSPNFHIFTRFGVWLPPEWPSGWRRILYKTYACFVTVMLYSMFSFQFIGVFGSIGNINE 112 LVESIYTSATVFNLGCKIFNLLEKRRDIVKLLGLLEADICQIKSQAEKEIQEKCNRRIWNRTKFLCLFF ETTCVTMVLTSTVKHIPERSLPLKAWLPFNYADGYTYWPVYMHQVIGLVYVGMIHASYDTFIVG VMLTLCSQLKILQYRNSEMSLLNVRVTTVAVVDQKMLEKEFVTNSIRHHQLLQQFAESFNNIIIYV IFVQYSVSSFVICVTAYKLVNLEFDNPEIIFTILYCTSMITQTFFYCWYGNEVILESANIGNSLYTSR WYSLNAGTVRDVTMVIHRMQHSMKFSCSPLFILSVDSLKNIMKVTYSTFSVLIKS

DallOr MNKSEYKPVNAFKLNLTIWKYAGIWPAGVKNKSLRYIYYIYNSISPTIWFGSFFILQFIHILMVITDL 113 EKLMHSIYLMISYCAMACKYHSILWYRRRLEELFNTLNEPIFKPRLPSHFDVMNESMRIARRDSIIF LSSGICATIFWTFWPLFDKKSEENELAYNMWCPLDISRSPTFELVYAYQTIAITYNTMFNTMTDT VICGLLKMMSGHLDVLIEDYGTIFEDIFICPGEETPRTVAGKKLMELENVAQETEKQSEMGTRMK KRLVQKSSNNNSRLEIPPVIIPQISQEDMKARVSTCVEFALAIERFANEIKDIFQAGILVQFIASCLIV CASLFALTIFSDLADAIYTCDWTSCDERFRRSMDIIMSRSQKPVILLIGGLFSLSVETFISILKSAYSFF MFFKEVQNMSN DallOr MRKTNSFPLGKSYLHFNFQCLSWLGIWNPYEGGLKYWLYQVYWVYVISVIMAMRLMNILVRGI 114 NAENSAQFLQEFSMLAVETADTIKYIAFLFHRAEVLELSEAFNWERHLLGTNEITQYRNTVLTSSL SSSKTFTITIMITLIQYFTFSFYSAFCEADYKAERLPISVAPMKYCFVLTNFWVAFISDYLTFTLLGLIA VAHDAFIMAVMINVEAQLKILNFRLERCHLTNHHNHDATRNDVDPVVIHFECVERCTQNALNL NAELINCIKFHQQIVRILRIFKKIYDNALLPQLFISLLLITALLLQMILGRESNNTDAVVALGFLIPVIL QLLTFCWGGNIILVESDRSSNSLYVSHWYTRDREFRDNVKIFLGAIRNPLIVRAGGLCDLSAVTFK NIISKAYSGVAVLQNMQNE DallOr MEKNNIFPIDKSYLHFSLKCLSFLGVWNPYKKGLKYWIYQAYWAYVVNVIMAMRLMNTIVLLIN 115S AENSLQFLQQFSLFCVITANIIIYIAFLSHRVEVWKLAGSFNWERYLPGTEEITHYRNRVLANGLRT SKTFTMSIGFAVIPYFIFSYYAIFCEADYEVGSLPIFRVAPMKYSLILTNFWVAFVIDYIYFVVLGLIAV IHNAFIMSVMINVKGQFKILNFRLERCHMTTCKIRDDAENNEDSDVVHFEYVERKISDESLSLNP NEQLINCIKLHQQILRISQLFKNIYDNVLLPQLLISLLLITVPLLQMVLGKDGDHTDSVVAVEFLIPVI LQLLTLCWGGNLSLIESDKSSNSLYEAHCYERDKEFRDNAKIFLGAVKNSLIISAGGLYDLSAVTFK NLLSKAYSGVAVLKNLNE

DallOr MSNTKSNMRENVFPIDKSYLYFNLNVLSVMGIWNPWEEGYKHWIYKIYEWYVIIVIMVMRLTAI 116S LVRGISAEDTTQFLQEFSLLAVEAADTIKYIAYIFHRQQVLELCSAFNWDRHLAGTEEITQFRNKVL TNSLSASKTFTIAIVIAVILYYCFYYYAALGYSEYKLEKLPIRTVPLKYCFILTNFWVAFICDYITFTWLG LIAVTHDAFIMALMINITAQLEILNFRLEKCHIEIFELSDDYKNHEQSKLIHFESVEREMHDKSVAL DPNAELINCIRFHQHIIKMLNTFRTIYDKALLPQLLISLLLVSMSALQMILGQQGNLSDSVGALVFL IACILQLLAFCWGGNIILVESEKTSYSLYASHWYYRDTTFRNNVKIFLGAIKNPLIVSAGGLVDLSAE TFKNILTKGYSASTVLKNTA

DallOr MSPTTTELTVKQNHVFPLTKSYLAFNFKWLTILGIWNPYKKGLKYWIYQGYWTYVITVVMAARL 117 MTLLVRTLIAEDAATLLNEFSLLAVELVDTLKYIALVVHEKEVIELSEAFNWERHLPGAERFKHYRN KVVTKALSASRTFTIAIMIALVEYYPFYYYSILTHSDYKVEKLPVGSVPLKYCFITSNFWMAFIFDWF TCTWLGLLAVCEDALIMAVLINMTAQLEILNYRLERCHLTTYVLNDDEEYYESSHTSFEYFEAQSG ISSVDNPVVDPNEELINCIKFHQHIKRMLKLFGTIFDKALLPQLFTSLLLITLLLLQMILGQQGNLLD SIIAMAFLFCVLLQLLAFCFGGNFVLVESDRTGSSLHASHWYDMDITFRKNMKIFLGAVRNPMIV SAGGLVDLSAETFKNLLTKGYSGAAVLQNVSG

DallOr MSTRGTDTRKEQVFPIILCRIRIWNPYEEGLKYWLYGIYSTYFVSAVMITRVLGLAVRAITGKSASQ 118 LLQDFSFRAVEAADSLKGIAFLIHRIEVLELSKTFNWERQLLTMRQITQYRNEVLTNSLSASKKFTIII IIALIQYFTLDFYMTLCQSDYKVDNLAIRSTPLRYFFILSNF*VAFICDYVTFTSLGLVAVCHDSIIMS VVINITAQLRLLNFRLENCHMTTFTITNDKHLALHFTCYILVTSKIYCLSMNFLFRMVNIFKNMYN KVLLPQLLNSLLLITISGLQMILGQQGNLTNSLIACGFLILVTFQLLAFCCVGNYILVESVKTSYSLYA VHWYNKDRIFRNNLKIFLGAVRNLLIVSAGGVVNLTAETFKKILTKAYSGVAVLKNVSD

DallOr MGRTESKVEKKNVFPIDKSYLYFTLNVLSCLGIWNPSMEKGRKYWLYQAYTIYYVVVIMILRIITVL 119 VRALHTTNSTEFSQEFSLLVVLICDAAKGLTFVFHRSDVLEILGIFNWERHILGTEQVRYYRNRVLTI AMMASKIFTIAVAIILTQYFIFYFYVIWKYSDYEIANLPIGSPPLKYCFVLTNYWVAISIDFVIFTSLGL NAVCHDAFIMGVLINLMGQLKILNFRVEKSSMTNLIKSSDEEIEENSKVTQFECAERESSDEFVTL DPNEELINCIRHHQQILRILEIFKHIYNKVLLPQLSISLLLIIVCGLQMILIKEGGSLDGTLVALGFMLS AVLQLFAFCWGGNIILVESDKTSFAIYSSQWYDRDIVYRSNVKIFLSLVRNPLVVSAGGLFDLSAV TFKNILAKAYSGVAVLQNMED

DallOr MGMKTIFSIDQSYLYFNFKCMTLMGLWNPYEIGVKYWLYQIYTMWMIGVIHIMRTMSTFSRSL 120 TADTPLFFIQESSIFAAEIADIIKCIAFIYHRVQISELSNLFNWEKHMLSGEELKNRRNQVLRDSLFA SKTFTIVIMISGIQYFAFYFYSALMQSDHQVERLPMGSIPMKYLIVSTKFWPVFIVDYLSATALGVI ALSQDAFLMGVMINIAAQLKILNLRLEYRQASHKSNDNYSSNHPNLGTHFDCVVRIHDKPATLE PNDELIICIKFHQQIMRMLKLFKSIYSIVLLPQIFISLSMTVLQLQLILGEHQKSTDTLIASVFLFGVVV QLLAFCWGGNFILMESEKTSYSIYASHWYSEDRVFRNNLKIFLGAVRNPLTVTAGGLIGLTAETFK NILSKGYSVAAVLKNFN DallOr MSRKESQVEKKNVFPIDKSYLYFTLNVLSCLGIWNPYKEKGRKYWLYQLYIMYYCVVIMMLRILT 121 VLVRALNSTNSTQFFQEFSLLAVLICDAIKGFTFIFHRPEVLTILGIFNWERHILGTEQIKYYRNKVIT TAMLASKIFTIAVAIFLTNYFLFYFYVIWKYSDYDVKKLPIGSPPLKYCFILTNYWVALGIDFVIFTAL GLSAVCHDAFIMGVLINLMGQLKILNFRVEKSSMTNLIKSSDEEVEENSKFTQFECVERECNDES VKLDPNEELINCIRHHQQLLKILDIFKHIYNIVLLPQLSVSLILIIISGLQMILVKDGGNLDGTLVAVVF CITAIIQLFAFCWGGNIILVESDKTSFAIYSSQWYDRDLVYRNNVKIFLSLVRNPLIVSAGGLFDLSA VTFKNILAKAYSGVAVLQNVQD

DallOr MAKGSVFPIDESYISLNLKYIAALVKCIADLLQTLDARTLPEFTQAFASVVVHVAGAFKTFSIVYSRA 122 EILLLARLFNWERHLSCTPQMAHFRDVLMSKASASSRKLTLIISGIVLMNSFADLYSTLTVANCTLT DYPLSYLPLKEFSSAENCRTVALADYIVHNFASANIIINDSFLMAVMSYITVQLKILNFRLSNCHKTF IEELIIPEDEKNLEITSYECHQMSKDVSVLDPNRELIKCIENHQKMTEIFTTFKSIYNNILLPQLLSSLLI LSLMGCQLVLTNYDTDEDVHAMVHPLIFVTWATLELYIYCAGGNSILIESEQISSSAYNTQWYNN DKQFRGNLALFLSLVERPLVISIGGVIALSKDTFKNILTKAYSFMAILKNSSGQ

DallOr MEVLRTEHREILNMLLTVGRVGGIWSAVDPPLFIQRFLYSIYGFLAKGSLWTLAATMSADIISNSD 123 DMFAMSDAGAMLAGLSAVIAKVVVFQRHPKDIRRLIDMVYGPIDTMVMAKDGPVYGLMNFH VNVEKVIEYGWAGMAVQLVGAMLVSPVIFAGGNSSLPLRSKYPFDTTDDLKHNMALGLQIITAI FNFTAMFALDGLMRGFCRWTSFQMQILNSNFRHCDPVYSRSNQESSAGKLTYMSNFESKINCF VLFHPTEITRESDSFLRRFETCIKHHQRIIVIMKDMNRVFGYYIFAEFSSSTFIMCLTGFQILLGKRST TNVIKFMLYFNAAFVHLATCCFFGQMLSNEGNKIADSVWMSGWEKESNMSHAGYLMIIALMR AKQIIELKALGFYAVSMETFLMIVKSSYSVFALLTATTDETVE

DallOr MGNSLGNHRDPLGASLFTLRFFGLWSSFDDSQTIKKIVYPVYSTVMTSLIWIFIGTMLGDLFSNL 124 QDLLIITDDGCFLAGISVIVFKHIIFRTRRREIIKLIHEIYCPIDNLAKSSDDGVQMLVKVSTFYDQLH CYSFIGIGCSLVLALVTIVPTDNGSLPIRAKYPFDSTIYPLHSIAFAIQAGAVATGVAGILGMDGVVT SFSRYITLQLEILDSNYRHCQTKWPRGGFHSSVEKRKNLVTEIRCFIPFSPREAHEANDSFEKRFKIC IRHHQRIINIVNNVNVIFGSSMFCQLFASSSMICFTGFQAALGARGSANLIKFAMYCGTAFSQLLS WCLIGNMLLHQSLTLTDSQWQSGWEDEQYFNFAYLLIFAMVRGNRCLELKAINFYSVSMDTFI VILRASYSFFTLLVTVA DallOr MRQYKMQERGDPFLVENNSLRRVPPNFHIQISLTMIKYFGTWPPKDKFRFLYLVVQFINLIIIWG 125F DVPSMVASAFLLMTNSVHAYKIFLILGNQNRIQRILDMLESDKFSKDRDKFERIFTWYAWQGIYH YLRYQSFGTMAVFCWGFTPIADAISGHARRLPMEAWYPYDTKATPAFQITCAHQSFAIMLGCF HNISMDTLITGLLNVACCQFEVVKKNILELDVDGGSKEIDDARLRKELHSYIQHTVDVMGFIEEVR NIFGNVVLVQLLVNCIIICLTAFHISQMTVFVPVETFGMITYMCCMTYQIFIYCWHGNELTLQSQS LSQTVFSSNWWKFNKKLNNDLGIVITRFCKPIIFMAGPLMELSLQRFILRLSYSFFTLLKSTTVPQ DallOr MRFLYLIYKAIIYTIIMIFFSTLFAHLVLNYRDLLVATDDACYLAGISVIIFKLYNFNKQHRKIKDLIEEV 126 YRPLDVLSQSSDMGLQTLLKTTLFYERMLLYFFIILAAFLVIALMVFVPKTDGELPIKTSFPFDTTISP GHEIAMCLQTAAITFGLYSIVAMDSLAINICRCLSIQLLTLASNYEKCIVHVHDRCRLKTCSSLRSPS KRLSVLKVSEEDLIICKFVPFRKEEEQEDGDSFVKRFNLCEINHQRIIVAIDEFNSIFSGCMLMQVFS SFSMICFAGFQAVLGASSKESVLKFIVYLGAAMSQLVNWCWSGNQLIYQSTSLFESQWQSGWE HQLDNHKVKTLMLMSMMRSKNPLQLRAGNYYSMSMHTLISILRNSYSFFALLRTVTEAV DallOr MGKPLDIEDFVMINRKILEVVGLYPRNCGRYVCCVVFMSLIVVPEALEIYYRRSDFDVVLETSSVLL 127 TIILAILKSVIWISRRDTSDSVGFLINDYWKIANHLGEPEDFYKSAKAAKVITMSYSFLICNALLFFYS LPLLHLLPHPGNSEQSERLRVPFLATYPEFCYHSPAYEFVYISQLLATSTCALIILATDTLIATALLHTC GHFAIIERNISELNFAKSVENLEQRVRVIIKHHQLVIRFSDRLEFLFNPLMFLQVFASSLIICLVGFQA QSGKLDKFPQYCSYLMMALFQLLLFCWPGDELITQSSRVSTATFSANWYIGPNHLRKDLQFIILR SQKSNFLSAGKLSAMNLENFSAILSSSLSYFMLLRSFSEN

DallOr MKSILDSTYYRPTRWGMRMMGQWPFQSAGRNRFFKCLTIFLVTSICVPTMIKFVESLDDIDVV 128 MESIPMFCVYAAAYIKFFTWAIDDNSMRELLLSIERDWKNLKVEEDIQLLAEFSERGRKISNLYTM SIFGFLALFLTIPAIPMLMDIWIPLNESRSRIFLYQTEYFVDQNEYYFLILIHAYLTIPVVFTVVLFFDN LFSIFVNHTCGMCNILKLHLECVHIDDPLDKETTSEKRIQICADMQTNILEFVQRLESICTIPFLLLV GMNMLVITSTGMMIVIKGNEVSEIIRFASFNIGTIFHLFWSSWQGHALIVESETIYSSVYQSEWYT FPPKLQRMLLPLMMRSANPCQITAGKFYVMSMESFGAAMRMTMSFFTLLLSMR

DallOr MDRVITPYHRINRTFLTIIGLWPLQSSISRYSFYTFTLLVTITHGYLQTAGMVAAIVDIDVFLEAIPT 129 VLADVVCYFKYFNFSINAAKMKKLLLIIEEDWKHYTAGEELDILNEYAQFGRKVTVYYAGALYGSLI PLMLTPLVPIVLDVVMPMNVSYPKHLMFQQIEFLLDFESYLFPLIIHGYIGTAGYLTIIIAIDTMLM VYIQHACAQFSIVRLMLERLAKVDADDADCHTPEFDEIDYKNMAACIERHNHAIAFCDLIEEANY MSFTFIVGINMLMMTSSALVAVFKMDNPDVAGKFVAFTIGEMFHLFYSNWQGQLLQEHSESV FSNVYNARWNYTSVRTQRLIIPLLMRSTKPCRMTAGKMYTMSLRSFSEVVKTSFSYLTVFTSMR A DallOr METLATPHLRINKILLNIVGQWPHQPKVAKTICYCLMLFFTSTQAYLQIAGMICARNNIDLFLESIP 130 TVLVDFTCAAKIFNFFYNGDKMKQLLLILEDDWRNVKTYSEAAILNKYSLFARKLTLMYCGALYGT CTE MTPFLLIPLVPIIHNFIATQNDSISKQLMFEQVDYLLDTEKYYYPLFIHGYCGTLAFLTVVVAIDTMF MVYVEHACAIFAVVG DallOr MAGTPDCYVKLNGILLSSIGLWPYQRKFQQSLFFILAIFFLVTQGFLQTGGLVAAWCDTPI 131 FLESLAPVLISIMCVIKFINFIYNGHKMKELIDVLRRDWLELKDEFEIALLRKWSEDSRKNVLMYAG AVYGSMAPFMLGPLVPPLLRLIPKSLIEINPNLTMAKPLMFHVEYFIDINKYYYPLVIHSYFGTMTY ITVVVAIDTMFMTYVQRACAIFNIVGYRLEKLVDDHDLDANLNPTLRDDASYKRMTECIIRHSKAI QYAQLIESANSTSFLLQLGLNMITISFTGFQAATKLNRPDEAFRYASFTCAQTFHLFFESWPAQRL VDESTRLTEYTIKTSWYKTSFRSRKLFQLLIMKSMEPCQLTAGGVYILNLENFSAVVKTSMSYFTVL CSTT DallOr MENIFDCSYYWITKRGLQCIGQWPFQSSKEKRILRCLTFFTITSFLTPTIIKCITSIDDIDVVIEDLFIIG 132 LIALAFTKFFNWTIMEDTVITMIQLLLTIKRDWKSLIDQRDIELLQQHSERAQKLGVAFATFVYGSL LLYFCTPAIPKILDFFHPLNETRPRLFLSPTEYFIDQEKYYIYILFHAYISSSIYTACIVFFENFFAICVSH ACGKFEILKAHLESIHLDGTILIEETSLADFYFIKNRIRICSDLQTQTLQFVDCLESSYRGALLLTVGINI GLIVLAGTVAMIKNSEPSESIRMVCIIVDLIFHLFYSSWLGHVLVVQNERVLDAVSSEWYRFSNRS QLMLLPIMMRSLKPCQLTAGKLYIMSMQGFGAAMRTVISFFTLLNSTR DallOr MENIFDCSYYWITKRGLQCIGQWPFRSSKEKRILRCLTFFMITSFLTPTITKCMTSLDDTDVVIESL 133 FIIGIIGLGFTNFFNWTIMEDNMIRLLLTIERDWKNLIDQRDIELLQQHSERGRKLSVAFTILLYFCT PAIPKILDFFHPLNETRPRIFLSPTEYFIDQEKYYIYILFHVYISVPIYTACIVFFENFFAICVNHACGKF EILKAHLESLHLDGTILIEETSLADFYFIKNRIRICSDLQTQTLQFVDCLESSYRGALLLAVGINIGLIVL AGTAAMIKNSEPSELIRMVCIGVDLIFHLFYSSWLGHTLVVQNERVLDAVYQSEWYRFSNRSQL MLLSIMMRTLKPCQLTAGKLYIMSMQGFGAAMRTVMSFFTLLNSMR

DallOr MENIFDSSYYWITKRGLQCIGQWPFRSSKEKRILRCLTFFTITSFLTPTITKCMTSLDDTDVIIESIFII 134 GLIGLCFTNFFNWIMMEDNMIRLLLTIERDWKNLRDQRDIELLQQHSERGRKLSVVFATFVLVG LLLCFCTPAIPKILDFFHPLNETRPRIFLSPTEYFIDQEKYYIYILFHAYISASIYTSSIVFFENFFGICVNH ACGKFEILKAHLESLHLDGTILIEETSLADFYFIKNRIKICSDLQTQTLQFVDCLESSCCGALLLTVGIN IGLIVVAGTVAMIKNSEPSESLRTGCIGVDLIFHLFYSSWLGHVLVVQNERVLDAVYQSEWYRFS NRSQLMLLPIMMRSLKPCQLTAGKLYIMSMQGFGAAMRTVISFFTLLNSTR

DallOr MENIFDSSYYSFTKRGLQCIGQWPFQSSKEKRILRCLTIFIITSILIPTITKCITSIDDIDVVTEVLFII 135 GLIALAFTKFFNWTILEDIMIQLLLTIKRDWQSLIDQRDIELLQQHSERGRKLSVAFATFIYGNLLLY FCTPAIPKILDIFYPLNETRPRLFLSQTEYFIDHEKYYTYILFHAYISSIINTACIVFFENFFAICVCHACG KFEILKAHLESLHLDGTILIEETSLADFYFIKNRIRICSDLQTQTLEFVDCLESSYCVALLFTVGINIGLI VVAGTVAMIKNSEPSESIRMVCITADLIIHLFYSSWLGHALIVQNERVLDAVYQSEWYRFSSRSQL MLLSIMMRSLKPCQLTAGKLYIMSMQGFGAAMRTVMSFFTLFNSMR

DallOr MDNIFDYSCYRFTKHSLQCIGHWPFQSRRDKIFLRCLTFFLISTILIPKIIKLIESLNDLDMVVECLPII 136 GCYIVGMIKFFNWIIMEDHMKQLLLTIKRNWEDLNVESELEVLHQYSDQSRRLNIAYTTCCYSVL LFYFCSPAVPKILDIIKPLNGSRPRIPLYHTEFFIDQDKYYVHLLIHAYLTVPVGLSYAVFFDNLFATLI HHACGMIEILKLRLEALHIKNTTQGPIKFEIVMNRIRTSSDLQTEIFKFIEKLEAAYSFALLIIVAVNM ILIITAGVAAVVKMSHPNEMIRVTIVNMCALTHLFWSSWLGHNLIVHSEHIFLSTLSKWYLSPSPL QSILIPIMVRSMKPCRLTAGKLYTMSMESFGAMVKTIMSFMTVLNSMR

DallOr MDFWDQDYFRVSKLTSCLVGQWPNQSVNALIRSRGMLLTLTVIHLIPRIRAVVLHRDDPEVILD 137 ATGPFIIDTVFVIKTINNWYNFKKMKALLTKIKENWKIFSEDELQILHQYTASGKRLSIIYLGSVFSG GIIFATGPLQLRLVHIFIETNETLPLFPTPVDYGSIDVDKYYWGLLSVSEVATFLIILSIISCDLLFFIYSY HVFGLFATLGYAIEHIPIDNNRESTEGILHVQRCIQIHYQATEFAEELEGLYIWNFLAVIGLNMIIISI TGVQIAINLDATEKLIQYGSLTISQLGHLFVECLLGQRLIDHSLGIQENISNAQWYHSSVKSQKML SLLLMRSQLPCQLTAGKFLVMNFPTFNMIIRTSASYFTILLATQ DallOr MDFWDQDYFRVSKITSCFVGQWPNQSVNALILSRGMLLTLTVIQLIPRIRAVVLHRDDPEVILDA 138 IGPFIIDTVFVIKTINNWYNFKKMKALLTKIKENWKIFSEDELQILHHYTASGKRLSIIYLVSVFGGG VVFATEPLQLRLVHTFIETNETLPLFPMPVDYGSIDVHKYYWGLFSLAEVTTFLIVFGIISCDLLFFIY SYHVFGLFATLGYVIEHIPIDSNGESTEGILHVKRCIQIHYRATEFAEELEGLYMWNFLAVIGLNMII ISITGVQIAVNLDATEKIVQYGSLTISQLGHLFVECLLGQRLIDHSLGIQENISNAQWYHSSVKSQK MLSLLLMRSQLPCQLTAGKFLVMNFPTFNMIIRTSASYFTVLLATQ DallOr MDKFWDQDYFRVPRITACLVGQWPYQSSKEFILPRIVLLILVIGLLIPRFRALFLHSDDPAIVLSVTT 139 PIVMDAMLVVKTFSNAWNLKRMKNLMNKIYTNFGIFSKDELQVLEEWTIAGKKLCKLYLGALV GGVGIFITQPLQQELVHKLMRANETNRHFPVPVDYGPIDVDKYYWSLSLLTAITTGYMAILTVAC ATLFIMLTYHTLGLFATLGYILNHLRIDTDDGPNDELYVRRCIQLHYRIIEFAQEVGGIYLWYFLGVI GLNMILISIIGVEIVAHVDAVGEMIQYGSATILSFSHLFVECLIGQRLIDYSLGIQEHVSNSQWYDS SVKSQKMLRLLLMRSQYPCQLTAGKLLVINMESYSRIIRTSASYFTVLLAMQ

DallOr MDFWDQDYFRVVKITTCLVGQWPYQSSKEVIFSRSIIIILVIIHLIPRIRALILYGDDPEIILSVSSPLII 140 DIVFVFKTVNNSYNFKRMKNLLRKIYENWRIFSTDELQILHQYSASGKKSATIYLGVVIGSAGLFIT QPLQLQLVHILMKTNETIRLFPVPVDYGSINVDKYYWSITLLTEFTTVLIALGIVACDLLFFTYSYHV FGLFATLGYTIEHLPIDNDNGFSDEMHVKRCIKLHYRIIEFANELEDLYRWTFLAVIGFNMILISIIAV ELVVNLGSAGKMIQYGSLTVSPFGHLFVECLMGQRLIDHSLGIQEYISNAQWYASSVKSQKMISL FLMRSQLPCQLTAGKFLVMNFETFNTIVGNSALYFTVLLATQLR DallOr LVGVYMGTVSFTTEPLQERMIYPLLYPNITIPKRFPTPMYFGSLDLDTYYYPLFVMSTCCTFLVMT 141 VVGSCDVLLFMYAEHACGLFKGLGYAIENLPPQEENEGSDLGFYYLRSCAVVHKRAIEFAEGIRDI NTE YLWNFFGVIGLNMILMSVTGVQVVTNLDAMEKVVKYSVFVFMQMVHLFIECLAAQRLMDASF ELKETLTNAKWYMASKKTHKLILLMLMRSQIPIVLSAGKMIVMNMDTYAVVLKTAASYFTVFLA MQ

DallOr MGGVDFWDHPYYKIVKTLTTVVGIWPCQSMKEIIMCRFGLFVIMVVQVGPQIMAFWKYRND 142 MEIILDTITPFIIDVLLTVKLVNVCCHFKKLSYLLEQIREHWTIFSRNNGLYMLHYHSNFGRILSMV YLAGVYFGGVSYSTEPVQRMMLYRLLNSNITVPKRFSMSMDFGPINLDIWYYPLLLGSGFCIFVI LTVIGSCDLLLFMYAEHACGLLNGLGYAIEHLPPYDAAEKFDYSFVYMRRCAVIHRRAIDFAEHV RDIFLWSFFGVIGLNMMVISITAVQVVINLKAVEKVLKYTVFVFMQMVHLFVECLAAQRVMD ASLNLKESLTNAKWYTASKKTQRLIPLMLIRSQTPVVLTAGKIIVMNMDTYALVLKTAASYFTVF LPMQ DallOr MSDADDFWNHPYYKMMKKVTSFMGQWPYQSMRVHLATAFITFSILSMMLTVQLSAFWVN 143S RRNIEIVLDCSVPFVTDLILLTKYVNNCCHYNMMVHLLEQIKCNWTILPKNNGLHMLHDYSVFA QKFVKIYMTAMWLSCASYISEPLQERLTVHLLYPNVTLAKRFSMPFEFGPVDVDTWYYYLWCA TSFGILTRLTVIGSCDLLMFTFAEHANGLLKGLGYAIEHLPPHDECESDDECYEYMKRCAMIHGR AIEFAEKVRDMYLWSFFGTVGLNMTLMSITGVQIVNSMDDWKKMAKFVVNIITQIMHLFME CFVAQRVMDASSDLKETITSAKWYNASIKTQKLIPLMLLRAQNPVVLTAGKVIVMGMNTFAV VLKTAASYFTVFLAMQ

DallOr MGGGVDFWDHPYYKTVKRFTAAVGLWPGQSMKEIIMCRFGIFIVSVVQIGPQIMAFWKHRN 144 DRKMVMDTISPFIVDLVFTVKLVNVCCHFKKLRYLLEQIREHWTIFSRNNGIYMLHYHSDFGRK LSVLYLVGVYVGWLCYTTEPLQRIIFYRLLNPNITAPKRFSMPMDFGPVNLAIWYYPLLFGSGFC VFVILTVVGSCDLLLFLCAEHACGMFKALGYAIEHLPPYDAAEKGDYSFEYMRRCAAIHRRAIDF AEHVRDIFLWSFFGVIGLNMLIMSITAVQVVINLNAMEKVIQYTSFVSMQMVHLFVECFSAQR VMDASFELKETLTNAKWYDASRRTQKLIPLMLIRSQTPIVLTAGKIIVMNMNTYAVVLKTAASY FTVFLVMQ DallOr MPDSEDFWNHPYYKMVRRVTSFMGQWPYQSTGVHIATAIVTFSIIGMMLTIQISALWINRDD 145S FEVVLDCTAPFIIDVALLTKYVNNCCHYKTMVYLLEQIKCNWNILPKNNGLHMLHDYSAFGRKF ARIYMTAIWLSVVSYVTEPLQERLTVYLLYPNVTLAKRFSMPMEFGPVDVDTWYYYLWCATSF GILTRLTVIGSCDLLMFTFAEHANGLLKGLGYVIEHLPPHDDFAGNDESFEYMRRCAMIHSRAIE FAEHVRDIYLWSFFGIIGLNMILMSITGVQIVNNMDDSKKLVKFAVNMITQMMHLFVECFIAQ RVMDASFDLKETITNAKWYDASIKTQKLIPLMLLRAQNPVVLTAGKVIVMGMDTYAVVLKTAA SYFTVFLAMQ DallOr MSAVTDLWNHTYYKAIRIIATCVGQWPYQTGKKLVICRIVTFSILSLQFTTQVLAAWAHRDRFV 146 VFMECLSPFLLDAVAVIKYVNNCCHFKTMVYLLERMKYNWKIFPKDNGLHMLHEHSAYAKKY AAIYLSQLWLSVFTFIAEPFEKKLIYHFLYPDVHLPKKFPIPINVPLDVDTWYYYLWATISLSKFVRI TLIGACDLLYLTFAEHARGFFKALGYVIENIPPHDESEGVDRSFEYLRSCAILHNRAIEFAEHVRDV YLWSYLGVIITNTLMITGTGVQVVHNYHNTEKLIKFTINLTTQLFHLFIQCYTAQRVMDASFDLK DALTNAKWFNASKKAQKLIPLMVMRAQNPVVITAGKMFVMNMNLFTVVVKTAGSYLTVLLA LQ DallOr MGKTVNFFDHPYYKLTKTMTSLVGQWPHQSTKELVIYRSCLFVIISLQLTPQFIAVYTYRNDLKIL 147 LDTLSPFLVDAILIAKFVNNCFHSRTLIHLLDLIEENWIIFPRTNGQQLLHYHSELCRKASILYLGAV YFGTFSFATEPIQRKLLYPLIYGNVTVEKTFAMPMEFSSLLDVDYWYYPILLISTGFIFVFMTVVGS CDLLLFLYSEHAVGLFQGLRYAIEHLPPHNKSERSDFGFEYLKCCAAIHSRAIYFAEQVRDIYLWT FFAVIGLNMIMMSITGVQVVINQDAIEKVIKYAIFVVMQSVHLFVECYAAQRLMDASYEVKESL TNSAWYDATKKTQKLLPLMLMRSQSPIVLTAGKIIAMNMDTYAVVLKTAASYFTLFLAIQ

DallOr MSSVSDFWEHPYYKLGKRVSYFVAQWPYESTRILLVHRTVIFSFLSLQYASQISAAWINRYNLEIL 148 MESLAFFIIDIGVVVKYVNNCYHCKTIIHLLEQIKYNWQILPKNNGLHQLHNHSAFAEKFALVYM LQMWLSVVPYAIEPLEQRLVYHFLYPNITLPKRFSMPLDFGPIDVDTWYYYLWFGTACTMGLS MTMLGACDLLLFIFAEHACGLLSGLGYAIEHLPHYDESESIDYSFQYMRSCAVIHSRAINFAEHIR QIYLWSFFGIIGLNMFILSATGVQKLLKFAVSLAGQLLHLFVECFLAQRVMDASFNLMKTLTNSK WYDASLKTRKTIPLMLMRAQSPVILTAGTLIPMNMNTFAVVVKTAVSYFTVLLAMQ

DallOr MDGERGAKKMDIFTTGYYKRNRIFMITVGIWPESSKFSKNFARTLVLIISFSMLLPQVLLGLQSIS 149 YITARTEKFHSKTLRNVSPIMVSLGMFQIEMEINRMREDWLILRETPAFGIIHRYALRGALGTTLY MVFVFSASTLIFFLLPLEPIILDIVIPLNESRSKIYAFDSDYSIYGISTDDHYYFTLIHSLAIGLILADAIV SIDTFVLIAVEHCCGLFEAVGYLLQEMHEEHSPWCQNQIIRKAIFVHKRAISLAEHIESSFTMMF GFVVLINMILISITALQAMLEINELAKMARFLVFTVGQITHLLFLSIPGQELYDHSSRVFRTIYDSH WFEMSTQHQKIVHIMLMRSMKPSLMTAGKFYHMNVENFKNILQASMSYFTVFLSAR DallOr MDVFTNGCYKQNRILMITVGIWPDSPLIVKLFARSVILTILCSLLPPQYAFLNHPDRHLNDLISVLI 150 IQLVILTVVLKICYIGFNMKMVQNGMVRMQEDWVILKETPAIDQIRRYAKRGYLISRIYIIFFFCA TTQVFFVMPLKPMILDAIWPSNESRPRTYAFDADYSIFSIYENDYYWALLHTLIIGLTLVEGIVAID TFILIIVEHCCGLFDAVGYILQGLRKESCSLRRNKIIRDAIFVHNRALSLADLIESSFTMMFAFVVLI NMILISMTAFETVLKVDDPGELVRFVLFIIGQIAHLFCTSLPGQELLDHSSRVFTDIYNSNWSEIA DKEQKLLLFMLMRSMKPSYMTAGKFYTLHIENFKNVLRTSMSYFTVFLSIR

DallOr MKKVDVFTNGYYRTNRILMTLIGIWPELSISEKICIRITMVVLLFSTIIPQYVFLFRKCDNLDDMIF 151F GIISQISVLVGFAKYYFVAANLERVGLVLKQIREDWHILEDSPAITTIHTYAAKGTFGTKFYMLIIFS AASFFSIAPLKQVILDKLSPLNESRPKIFMLDVDYSLYGIDANELYYPTLMHSYFTSMIIMNMIVX SVDTFILIIVEHCCGLFECVGILLKETELHDSSDGQCQIIGHAVVIHHRAISLAEFIESSFTMMYAFV VLGCMILISITGLELIIKIDEVAEVTRFGAFAVGQMVHLLFMSLPGQKLLDHSSRVSEGVYSCNW SSIAVDGRKLISIMLVRSMKPLTITAGGFYILNLENYANVLRTSFSYFTVMLSNR

DallOr MKKVDIFSNGYYRNNRILMTLIGLWPESSMSEKICVRISVLVLMISIMIPQYAFLFRKCNNLDDM 152 VFGIIAQIAIAVGFAKYYFVVTNMERVGRALDQIREDWHILENSPAITTAHTYAARGSFGTKFY MLTVFCSASFFSITPLKQVILDRLSPLNESRPKIFMLDADYSIYGIDANKLYYSTLLHSYFTSMIVM NMIVSVDTFVLIIVEHCCGLFKTVGILLKGMELHDSSDGQCQIIENAVVIHHRAILLAEFIESSFTM MYGFVVLVSMILISITGLESIIKIDDVAEVIRFGAFSAGQLVHLLFMSLPGQELLDHSSQVSEEVY GCNWSSISATGRRLISIMLMRSMKPLGMTAGGFYALHLENYGNVLRTSFSYFTVMLSNR

DallOr MDFWDQIYFKPMKIVSCLAGQWPYQSPMERTIIRSLLVAAVGSQIISQIAAVAEHSNDLDFFM 153 ELVPPFVVELTCFTKMINCFFFLGKIKILLEQLQENWKFFPSGRENELLREHSLFGFQMSVLFIGTL YISGFVFGTQPFQAIILHFVLNSNGSVPHQFLFPANWGPIDADKYYFPLISLSALSIYCVVTMLVAI DCVFYTCCGHVCGLFAALGHSIEHFNFNNDPQLGDDGFAFLRRCIQIHNRVLEFISVMEDVLTV NFLIMLGSVTMAMSLTAVQMVINFDAVVKLLKNLAFAITQVTQLAIKCWMAQRVMDMSLNI KFSLINSKWYLASPKTQKLIGLMIMRSEIPCEITAGKVIVMCLETFSSTVRLSASYFTMFLALR

DallOr MVGTSNSEDFWDQPYYWFTKTVFRAIGHWPYQSRLELLMCRTVLQFTYWIQIVPQIIAVIRHY 154 DDLDLIMETMSSFITDIAVICNLSTFFFNSDKIQLLLEQVKEDWNIFPMDNGLQLLHEYTQLGKQ RAILYSGGLYASWFVFVSEPVHVKLFLLFKPSNKTLPLRFAIPVDYGPLDMEKYYYTILVVAAVSIF GIVTVIVSADILLFLYAQHACGIFAALGFAIENLPVDDSFRDKTKDYEYQYMKKCVIIHHRTLEFSN TIEDIFCWNFFVMIGLNMIVISMTAVQVVTNLDAVARLFKTLVYAIALTVHLFIQCFMSQQIIDS SLGAQQSLMNARWYLSTDQTKQLLQFMIMRSQIPCQLTAGKVVLMSLNTFSSIIRTSVTYFTVF LDMQ DallOr MDFWDQPYYWLTKTVLRAIGHWPFQARRERIICRIILHFIYWIQVIPEVIVVVRHFDDADLVME 155 TVSSFLIDIGAIANICTFIVYSDKIRNLLEEIKQNWKIFPKHNGLQLLHQYSQSGRTRSIVYSAYLW AAWSVFVSEPVHFRLVRLFIPSNKTLPMRFAIPVDYGPLDVEKHYYTILVVAAISIFAIVTLIIAADL LLFLYAHHACGLLAALGLAIENLPVDATSPRKATDDEYQYMKNCIVLHSRTIRFSDTVEEIFCWN FFMTIGLNMIVISMTAFQVVTNLNTLPRLFKTLVFAVASVMHLFIQCFMSQQIIDLSLAIEQSLM NAKWYHATRRTKQLMQFMMVRSHFPCQLTAGRMMLMSLDTFSSVIRTSVSYFMVFMDMQ

DallOr MQNLFDHSYYRITKWGLQFIGHWPFQSARRKRFLRCSSFFIVSTIFIPKVIKFIESLTDLDIGVECL 156 PLIGCHLMGFIKFINWIIMEDRMRNLLLSIERDWKDLKLECDLKLLHTYSEGARRLNIFYATTLYG IAIIYFCSPAVPRILDYLKPLNESRPRIFLYQTEFFIDQEKYYAYILIHSYVTVSISLGIIVVFDNLFATLI KHACGMFEILKLHLKTLYAEDYTQGLRTHNMISNNFQMTVNHIKRCSRLQTQTLKFVEDLESSY NIALLFIVGVNMASILVTGVVAVIKASHPNEMIRITFMCMGTVCHLFWISWLGHILIVQSESVFI SAYQNEWYYMPHKLQLMIIPIMMRSLKPCQLTAGKFYVMSMGSFGAAMRTVMTFFTVLNS MR

DallOr MENILDSPYYRLTKRGLQCIGHWPFQSTREKRSLRCLTFIITISLLIPKIIKCIESLDDVDITIECIPII 157 SCYILALIKFFNWIIMEDHLRRLLMTIERDWKSLTRQRDIEILHQYSDRGRKYNLAYTTFIYGTLLL YFCSPAVPKILDFFNPLNETRRRVFLFQTEYFIDQDKYYVQILMHAWITVTVATAYIVFFDNVFAL FVNHACGRLEILRDHLETIHSNELIENGEKSVDDFVTIKRRIGVCSDIQTQTFQFIELLTSSYDIALL FVVGISMALIVFTGVVIVMKMSQPTEMIRIVGICLDSIFHLFWISWLGHMLMVQNDRVFNAAL SEWYSLSERSQLMLIPLMMRSSKPVRCQLTAGRIYVMSMQSFGAALKTVASLFTILNSMR

DallOr MVATPLKNGNIWGGEIDIFLDGYYQRNKILLMTLGVWPEFPKRSRIIVQCLVVLSLLTIVVPQW 158S AFFIQVCDNLEDLATGISHQMLVVIGFVKFCFFIKKKEQIQIILNSVREDWCLVDESVVETIRTFAE KGYLRTTVYMVIIYSAVVIFCLFPLKPLFLDMIFPLNESRPKMFVLQTDYSVFGINVNDHHFMIT MHGLFTVTIVVHFLVTIDTFISIIVLHCCGLFEAVGEMLRQIQANFPSERKYKILCDGIIIHHRAITL AESIETSFTIMNGIVVLLAMILISITGLEIIIKMDEPVEVIRYATFAGGVILHLLFISLPGQELSDHSSQ VSEGMYAQRLISIMLMRSMKPMQMTAAKFYPQNLESFGKVLKTSFSFFTVMLSNR

DallOr MDDIFKRSGFYFDIRLIKWLGQWPFQSKRTDNIRRLLAISLMFSIFIPSIIKFCELRDNIYKMVDCI 159 PMLGLHFAGISKFLNWSYHREKVVQLLNHMRYDWNNLKNEFDSNVLEKFSSSGQSLFIAYAIGI YGTTAVYFIIPFVPIILNIILPLNESRPHLYLYHTEYFVDQNEYYYPIQLHAYMAVSVSVTCLVSFDQ MCAMFIHHACGMFEILKLHLENIHTTVVTNDDKNSSVKEERAIQEIIYCLQIHNRIFEFLEVVEIW DQMIMLMVGAGNTLIITACGVGGILNHPDFLGVIRLFLFNYGAIIHFFYNCWQGHLILKQGESIF IAAYQNEWYKLPCRLQRMLLPVMAKSLKPCEITAAHLFPMSLATFGMAMKAAISYFTLFQSMK

DallOr SVTLYSASVIFCLFPLKPLILDIVCPLNDSRPKTFALQTDYSIYGINANDNHFMITMHGLFTVTILIH 160 FIVTIDTFMLIIVLHCCGLFESIRADFFPERQYKIVCDGIIIHHRAITLAESIETSFTIMYGFLVLITMIL NTE ISVVGLQTIMNFNDPTEVIRCAAFGATLSLHLFFISMPGQELYDHSNRASEIIYSCNWSGMSLKA KRLISIMLMRSMKPMRMTAAKFYPQNMESFGKVLKTSFSFFTVMLSSQ DallOr YSASVIFCLFPLKPLILDIVCPLNDSRPKTFALQTDYSIYGINANDNHFMITMHGLFTVTILIHFIVTI 161 DTFILIIVLHCCGLFESIRADFFPERQYKIVCDGIIIHHRAITLAESIETSFTIMYGFLVLITMILISVVG NTE LQTIMNFNDPTEVIRCAAFGATLSLHLFFISMPGQELYDHSNRASEIIYSCNWSGMSLKAKRLISI MLMRSMKPMQMTAAKFYPQNMESFGKVLKTSFSFFTVMLSSQ Dallor MVATPLKTGNIWGGDIDIFFDGYYQRNKILLMTLGVWPEYPKRSRIIIQFLVVLSLLTVVIPQFAF 162 FIKVCDNLEDLATAILHQLLVAIGFVKFGFFIKRKKRIEIILNSIREDWSIVDESVVGTIRTFAEKGYL PSE RTTDLYGYIIYIIDFIRIILYSAVVIFCLFPLKPLILDMIFPLNESRPKIFVLQTDYSVFGINANDYHFMI TMHGLFTVTIVVYYSVTTDTFISIIVRHCSIFSNEQYFIIYLSHVLFPIRREMLQQIQADFPSERKYKI LCDGIIIHHRAITLAESIETSFTIMYGIVVLLAMILISITGLEIIMKMDEPVEVIRYATFWSGVILHLLF ISIPGQELFDHSS*VSERMYAQRLISIMLMRSMKPXNQTAAKFYPQNMESFGKVLKSSFSFFTV MLSNR DallOr VILYSAVVIFCLFPLKPLILDMIFPLNESRPKIFVLQTDYSVFGINANDYHFMITMHGLFTVTIVVYY 163 SVTTDTFISIIVRHCCULFEAVGEMLQQIQADFPSERKYKILCDGIIIHHRAITLAESIETSFTIMYGI P+N VVLLAMILISITGLEIIMKMDEPVEVIRYATFWSGVILHLLFISIPGQELFDHSS*VSERMYYSCYSC NWSAMSLKAQRLISIMLMRSMKPKPDRCEILPSEYGEFWESVLKSSFSFFTVMLSNR DallOr MKTGDIFSDGYYKRNRIFMVLIGLWPEYPKRPRIFVQCLALLSLVSAMIPQYAFLIKVCDNMDD 164 VIFAFVDQMTVSVALVYYYVFIKNMEWIHLTINQIREDWRILEKSSALETLQTYAKRGYLATTFY TWVIFTTGICFSLLPLKPVALDIILPLNESRPRTNVLKVDHSIYGIDKDKYYFTISFHGIMNVVLLLN FIVAVDTFIIILIEHCCGLFEAVGDILRQMQADISPEHQHEILCDAIGVHHRALTLADSIESTFTMM YGFIVLVSMILVSLTGIEIILKLGDPAELIRFALYGGAQIMHLLFVSIPGQEIYDHSSRVLEETYISKLI YIMLMRSIKPMHLTAGGFYILCLQSFGNVLKTSFSFFTVLLSSR DallOr MFVKSDDSVDAAETDSPIDFAKLFNVCRVTLRTLGVFTVDTIFSKSKRKIWDMKALYYVFWLLP 165 TFFANFCEFRSVLSFWKTDVYYALEVTTAVLSGVIVMAQGFFVYHSRNDLLEVIAELRNLWQKQ LALNIPDVIVKKVKRARFFTQVYATLIILLAITYSIRPYLLLLTHVISRKNETYDMSQTVFQAIYPVQI DTFVKYIIWITLEALVFVNVSASWIGADMIFLQLATHLSSQYQILHDDLIALGTGESNQYTPVQQ LNTMGKRHTHLLLLSDKFQRIFSPILLVLMLVTSVNICICIINLQEELMQQNYAGVNKCAVHTVIA 
MIQPTIYCVYANDLTEWAELTAVAAYTCEWVDKTRYFRHSVRLITMRAQEGLQLRIYGFFSVDL NLLTQIGVGAARFFAIINNLSGMAE DallOr MDFFDNPNWFFTKWLLSSFGAWPFQSSRFRYLSRYTVGFLICSLLVPEIIKLVTVYDDLGKTIAC 166 VPILALHSLTVTKMLNCLLNLNQNKLLLLEIQKDWQRTLSPADVEILKRNAKQNRSITHTYIYYIY CTE ATTLMYLLGPMVPKVLDVVMPLNESRPALEIYQTEYFVDPVKNKIPILVHAYVISPFPSTIIVAFD ALYCNCVNHACSMFEIVGKRLENIIDDIDNANQRFSPIMENNIHSSLRACIRQHRKSLQFAHLLR LTYSICFLSIVVINTVALSITLYQVVQNLGETSEIIRYGTFSIGQIVHLFFLSRPAQKLMDHSSRIHSF AYQGYWYNIPMRSKKMLILIMMRSRNPSILTAGKLYVMSLQSFARVIKTSMSYFTVLLSVR

DallOr MGKDLLNRYTSYSESVKRLSIICGMWPSEKPSIFYRLLPYFNGFILFIICWAATNFVYVNIDNLTLV 167 IKGLSISLGYLNGITKVLPSIFKVICYIIYREEVIELLDTMNELFRKQQGDEQLLSRILSPFTFFNALSV PSE LLMCAAVIVLGIYFITPLVIMANQYIQGVRPIRYLLPYPAVFPYHIRGGSALYLLHYVMQSYELFVF FSNSSSIDNLFALYSSQISGHLRALTYEMRHFTFKVGYEKQFKKLVTIH*KLIRCCEMVQVIHGPIV LSMMLTTSIILCCIIFQISQMKTISVKQMVFFIGFMCVKLLQTTIYAFAGALITTECENFRDEVYGT GWEKLGTKTAAYHVQIILMQRPIQMKACSYSFISVNVLTGILNTTLSYFFLLQTLDDEQ DallOr MGKDLLKSYTSYAESVKRLAIICGIWPSEKPSMFYRLLPYFIGFAPFMIFCTTMNFACVNIHNLN 168 RVMKGTSISLSYLNGIVKVLSYFSMICYLVHREELTELNGTINELLRRQQEDEQLLSRTLSSFKFFK VLSVLLMCSAFTVVGMYFITPLIIMANQYSHGVTPIHYLLPYPAVYPYHIPGGSALYVLHYVMES YGCFCLFSITASIDNLFALYSSQIIGHLRALTYEMRHFTFKTGYEKHLQKLINIHQRLIRCCKMLQAI HGPIVLSMMATTAIILCCLIFQISQMKTISVKQIFFFVYISVKLLQTLIYGWAGTLITTECDNFRNE VYGTGWENLGKKSAGYHVRIILMQRPIRLKACSYVFISVNLVTAILNTTLSYFFLLQAFDNEQ

DallOr MDISASERAEKTKIELQEYYRFRRNIKYWAYFSGSWPIKDANFFYRALPFLVFTSTSLICIQQFRFV 169 FANITNIGVMVGGFSLGTSFLSVAMKVALFKFHRLRLLEIHTILDGFHKESLADENSRYFVLEKLT GFRRLTKILSICVLFGCVFCIIGPMLLFVAQIRRNIRPLKYTLPTPAVYPWNTHDIGLLYILTFIYESY NVIAIGVVTLGIDGLFEFYIFFVIGQLRVLSEQMINFKATDDHNAIVRQWVTKFLVLKKCCQMLQ TIYGPIILWQVVTNSTVICTVLFQVTHVSGISIGRYLLIFGYSGTKIMQTYLYSWAGSSLTAESEAL GKSVYFCDWVSNGCQRLRTSVLIILTQKPLVIVAAGCVYISLDMFLMTLNTAVSYFFLLQTFEEK AS DallOr MDIGSSERTIKEKIKFREYQKYTSDVKYWMLFTGTWPVPNPSIFYRAIPIVAITSTTVLSIMLFRFA 170 IANITSISLMVKGFSLGTSFLSIALKVFLFTFYKKKSTEVHTVLLDHHTKFLADDNLRYLVLERVTGF GRLTWILTILVYSGCLMYFLIPIISIIVQIRHNVQSIKYILPVPALYPWEIYPGGVVYIATYIFETYNILC LGIVTCGVDSLFGYYIFHITGQLRVLGYQMMNLKSTDNQAEFIREWVTKSLVLRECCDCLQTIY GPIIVWQIITNSAVICTVLFQISQASGISLGRYILIIGYSGTKIMQTYIYSWAGSVLTVESEALSEALY FSDWVGARYQHFKTSILLILTQRPLKITAANCMVVSTDMFLMTLNTAVSYFFLLKTFDERES DallOr MDMGTPKRSIDTDINATQSEAILRKIKLCMLINGTWPVQNPSIFYRAISIFTITSTAILGILLFRFAF 171 ANLANLSLMVKGFSLGTSFISLALKVALFRSFKGTTMELYTILHEYYTKSLADEKSRDLVLKNISGF QRLISLLTVYVTTGCVMYTITPVIYIAVQLRHHVDSIKYILPVTALYPWEVKPGGFLYVITYIFETFNI WVIYTVTLGMDPLFAYSIFQIIGQLRVLQHQMLNFSQSDNLNELVRRWVVKYQILKECCEKLQK MYGPVILWQITTNSAVICTILFQVSQGKGSLAKYILIFAYSGGKIIQTYLYAWSGTQLTSESEALSD SVYFSDWMSRDRQAFRTSILIILTQQPLKIIAAQWITVSLDMFITVLNTAVSYFFLLQTFDEKQL

DallOr MSSLEKAIKADAQYKDAQQIIKRIRYCMLITGTWPISNPGILYRAIPYVTALSLATLGIALMWFTI 172 VNITNITIMVKGFSLGVSCISMLFKVFLFTFYKEMVNELYTTLHDYYTESLADKKFRYMVLDGIGD FRRLFWLLSAIAHIGCIMYTVMPIIFMIIQIRRHVHPLKYLLPVSALFPWEITPGGLVYKITYVYESY NIWCLYFITVGMDPLFVYFVYQIIGQLRVLGYEISNLPLADNLDGFLRQWVSKFLVLRGCCEKLQ TVWGPLILWQIITNSAIICTVFFQISQGEITVVKALTILTYSGGKILQSYLYSWAGSYLTAESEVLTE TVYFSDWLGKGRHRYRTSVLVMLTQKPLQVIAANWVPVSLNLFVMTLNTAVSYFFLLQTFEEK QS

DallOr MNMSSAEEAMKEEIKFEEYLAFTRSIKYCMLLCGIWPTGQPGILDRVLSILAITSTSTLSIVLLRFA 173 FANITNISLMVRGFSLGTSYLSLALKVILLTFHKKTVSALYTILHKYHEEALADKKLRSRVLEKITGFR RLSWIQTCLVISGSLMYSFMPIIFMIIQLRQHMQPIKYILPLPALYPWKIQPGGLLYKITYLFEMY NMLCMTTITCGVDPLFGYFIFLITGQLRVLSYEMINIKPSDNHEEFIRQWMTKFQVLKDCCRKV QKIYGPIILWQVTTNAAVICTILFQISQGKGISVAKYVLILCYSGGKIVQTFIYSWAGTVLTNESEA LTESVYFSDWPEAERQRFRAIVLIILTQKPLKISAANWIVVSNDLFVMTLNTAVSYFFLLKTVEEK QS DallOr MSDSRKLQTYMAYREHLRWLLDFAGLWPSEDSSPAYRMLPYLQIVVGCGAAMKIGNFIAHHI 174 TSIRIVTRAMSIMTSIILNMFRVVCLARNREPLIKARKILDTYFDELLVHEKMREVVLHDVKLFRRL SLFYTVLTFFALFGYVLTPLIIIIKQHMRHVQSVKYPLILLGMYPWIVPDNIFIYSVHYVFEAFALFT VFYVSSGTDAFLPLLVFQVKGKLHAMAYRLTQLGEKDNIDDEMSRCIRDYIALMECRDILEKTFG PIILLLMSNNAIILCALIFQFTQMKAITIVQIIQFAAYICGKTTQTFLYSWSGTLMTSKSEEYLDAVY ASNWYGNPKGMNSVLITLNQRPLTMTAWHMSVVSVDMFVMVLNTTMSYFLLLQTIEQG DallOr MNSAKKIVAEEIKLQEYHGLTRLIKCYLFLCGAWPISHPGILYRALSIYTIISNLLGSITLFRFTFANV 175 SSISQAVRGFSLGSSCISLALKDIQLTYYMKESYEILTTLHNYFKESLLDKNLRYCVLDRVTSVRRLT LIHFFIVTSGACMFAFVPIIRIILQTWHGVQPVKYTLPAPALYPWKIYPGGMVYKITYIYEMYNM LCLATITVGVDCLFAYYIFLITGQFRVLSHQMVNLQSSEDHDEFIRKWVDKFLVLRQCCRKLQK MYGPIILWQITTNAAVICAILFQISQGKGIPVAAYIQILCYSGGKIVQTFIYSSSGTILTEESELGTQS AYFCDWPDAVHHRFRTAILIILTQEPLKVSVANWIIVSNELFVMTLNAAVSYYFLLQTFEEKQS

DallOr MKDIELREYQILTNSIKFWLLFVGIWPISKPRIVYRVIPIFAISCNIILSIALFRFAIAHISSMSLMVKG 176 VSLGTSFASIAFKILVFTLSREGITKIFTIIHNYHEESLADQNLRHLALEKVSGFRRLSWILCFLVACG AVSYCIAPIIFIIIQRRHHLQSIKYILPLPAFYPWVISPGGVLYKVTYVFEVYNLMCLLFTTCGVDSLF GYYILHITGQLRVLGYQITHLNRSDDSHQCLRRFVDKFEVLKECCEDLQTIYGPVILWQIITNSVII CTVLFQISQETSISIPKYILIMGYSGSKIMQTYIYASAGTALSTESEALTESVYFSDWPGGGTQRFR TSILIMLAQKPLQITAAHFVVVSNDIFIMTLNTAVSYFFLLKTFEEKQS DallOr MANDPRAEEYFLIRKTIQKLVFFMGVWPSEDSSFAYRFLHYVPVTLYFNAIMIILNSIAHHLDNA 177 KIITLSFGVVAVYLLCALKVLCLAMNVKKIMWFYKTIDELCNQFLSDGRLQRFVLSDVTIYRYFFW FHTSLCGFSGTIPLVMSMVSVMNQKIHHIHPIKYNLIIPGMYPWNASINELIYGLHFGLESYTLM WTFYVGALVDVLFSFSLLQMTIPLRGMSHAITHICDQNDYRGTLHRCLVQYRTLIQCRNIIENTY GPIILGVAITGPIALCSLAWQVTQMETINNFQKFRFAVHGVAKILQVFSYSWSATILKGKSEDFL VKVYSSEWYGSRTFMNTVLTMLIQRPLTIKAGHMTDVSLGMFVFVMKTTVSYYLLLQTMEQK SAQ DallOr MSDDQRVQEFFETRETMRKIALYMGLWPFENPGFFYRLIPCCLVSLYIYAILIISNSIAHHVNDAR 178 LVTVCSGVITINVLCILKILRIARYRKEMLWFNKTVTDLCNQFLSDGELKKFVLDDVRIFRCFFRVH ASLCAFSGTIPIITSLVSVVYQMKHDIHPIKYSLILPGTYPWNVSTNQFVHGLHFGFEAYSLMWN FYVGALVDVLFSFTILQMTIPLRGMSHAITNLRYERDYGNILHKCLIQYRTLIKCRCIVQKTYGPIIV FIAVTGPVALCSLAWQVTQMDSMTNMQKLRFAAHGIAKTLQVWSYSWSATNLRGQSENFLE KVYCSQWLGNRTFMNTVLTILMQRPLIIRAGHMPDVSLNMFVFVMKTTSSYYLLLQTLD

DallOr MTTNVTLAACLSFRRKIKALLYVSGLWLPEKLGSFYGLLPFVFAIVSAITSFGILALVSHHITRVPV 179 VVRGMSVGTSLLCVILKIFCVISQRKRAIELHEILDHYFSRVLSDEHMTNLMLTGIFTLRRICFTLFF ASSFTVFLNFVTPTIDIINQKRNGVQPIKYSLIYPGVYPWDTVQYGVIHQIHFIIEILASVSLFCVTC GVDGLFAFYVFQVTGQLRVMSHRLTHVEEKNNIQIVIRECSQQYRMLLKCLDSMDDIFGPFVL WMMATNAIVLCALIFQLSQITNISIFRVIFILTYLTMKMTQTFIYTWSGTSLIHQSEKYLEAIYEID WFGKKTIMSSIIIMLCQRPLRLKPFGFSVISMNVFVMILNTAISYFCLLRTVEQKS DallOr MSANSKLPEYVAFYQTTKTLLSVAGLWLSGNPGSFYSRLPFLFIIVLSITGFGILMFVLHHITRIAIV 180 VRGMSIGTTLSDERMRKLMLAGATTVRRICFIIVFTIGLGVVVFILTPSMSIINQKRHGVQPVKYS LIYPALYPWDPSEYGVLYQIHFIIDILASVSMFCVTCGVDGLFALYIFQMTGQLRIMSYRLTHLNE EENIQIVIKECSQQYQILLKCRDSIEDIFGPLVLWMMGTNAIVLCALMFQLSQMTSISIVRGIFIIN YLIMKTVQTYIYTWSGTSLITQSEKYHQAVYDINWLGKRTIMSSIIIMLNQRPLYLKPFGFSVISM NMFVMILKTTVSYFGLLKAMEKKSG DallOr MSKKEQIFEYTAFRLTTKFLLYSIGLWPVKEPGLFYRFLPFLCFFSSCFASLAVLRFIYHYITRINVTL 181 RGMTIGTSLMLSMLKISSIMINRKRGLELHHTLEEYFSAALSDERFAQRVLVGITTVRRLCWVLIP TIFITVGGYVMRPITSIMVQKRNHVEHVEYTLIYPGLYPWNVPDGFFYQVHFFFEIIASITVWCV TCGMDALFAYYVFQIIGQLRVMSYRLTHVEDQKDMDVIIKECTQQYAELLKCRDSLQDIFGPVI LWMMGTTAIVLCALIYQLSSQLKDLSIGRWIWILAYLIPKVTQAYIYGWCGSYLHSESEKYRSAIY HTNWLVAKKNTMSSIVIMLSQRPINLVVYKFFYLTVNMFLMILKTTLSYYFLLRHIEQKS DallOr MAVYSAFHWTTTLFLHIGGIFPRKNPKFLYCLLSYSYLLANSMTIVGLLGFVMAHTADMLLVVR 182 GMSIIITLVSVGLKLSYMIAYRKRTAELYEILDRSWSEALNDERLSDVVLSGVATIRRLSWTLILGV MILATMSIVKPGLKYAHQKHDNTTTMNYPLIYPGIYPWSITNVVVYRIHYLIESIAASSILIIPAGT DSLFPLYIFQMIGQLRVMAYRLTHLKDQQNVKMIIKECARQYKVLLMCQHSMQEIFGPWILW MVGTSAAVQCALIFQLSQTAKDLSVEQWVMTICHLIPKLTQTYIYSWSGTALITESEQYREAVY GVNWFSNKNIMSSIIIMLSQKTLKLTACKFLLVSVEIFAVIMKTTVSYFFLLRTLDPEPPS

DallOr MPQLYPDQNLDPGGVTEVRKLLKWAILALPDGAKAPTFDGVWERDGAVVVNCTDEITGNWL 183 KSLFPESKISGHTIHVVSLSERPKKHRVVVHVEDPEITSQEALCFLEKQNKGLMTSNLLLAHVIFN RNETYDLSQTAYPAIYPFAIDSMSKYIACISIELLIFISVGAWWIAADMVFLQSATHLSSQYQILN DDLMTIKTANDTPTVSNYSPIQQLNSIGKRHAHLLLLSEKLKRLFSPILMCLMLVTSANICICIINL QEELIHKNIAGVNKCVVHTIITMIQPAIYCIYTNDLVESAERTATAAYESDWVAETQNFKKSVRL MTMRSQKGLKFRIYGFFNVDLNQLTQIGVTAARFFALMNNLS DallOr MFKVSDTLLRTLGLSDIHSVFDKTPANKKKNEMVFSLTFAAVFFSILGLVNKAIHEWDHDPYNT 184 LEIIPMIFTTILGWFAGILMHSSTKKIKELLEDLKSSWTRELREGVNDMFIDTARRSIFFTVFYAILIF TLGALYIILPIARRIHHLVFGADNMPVDSNMGLLFIRYPFKVDSVPRYILCTIFEFSCTLFLMISYLG VDTLFHQSTTIVSLLLETVGNKFGEMTLYSNLKTFDRRLLRRLNHVGEQHCELLSYCRRIEEIFNPL IYLTTLLTSANLCVCVISLRSELSKLQLGSAFVTLVQVVAIISQPLIYCNSAENISHWTQNIADTIYSC QWPEQSKSFKKMVQLIMMRAQCNYKFGAHGLFKVNRHLLTQLVHTAWNFFMLLRKKNL

DallOr MFAVSNSLLEKLCLRSIYGVFHDTSVTEEENVLLFSVTAAAALLVILGFISNFIHEWNLDMHRALE 185 TMPMIFSTILAWFGGVSLCRSKEGMKGLLENMKTAWTKELKEGVDDKIINTAKRSILFTVFYAIL ICAVGAFYLIVPFVRAVHKFVTDEDQTHFDFNMRLLPIRYPFSIDTVPMYILCTIFEVVCTFFFINS YLSVDTLFLQLTTILSLLLQIVGKKYAEITLYNDVRKLNCGLLKKLHNMGRQHSELLSYCERIEEIFN PIIFLIIIFTSAHLCVCVFALESKLSVLQFVDAATLLFHFVAIIMQPFIYCNSAEDISHWTNNIANTIY TCQWPDQSQKFRKIVLLTMIRSHRDYKFKAYGLYKVNRYLLTQLVHTAFNFFMLLRKTNNSLNI

DallOr MFAVSNTFLRKLGLNNIHAVFRGTQENKNGVIFPLTSAAAFLFLLGLINNLVYEWDRDMYRALE 186 TMPIILSAVLAWFGGVLLYTSSKKIKGLLRDTKVSWTKELAENVDDIIIDTAKRSIFFTIFYAVLIFA VAALYLIVPCVRMIKKLVTADGDTYFDLNVRFLPIRYPFSIDTMPTYIMCSIFEVTCAFYIATSYLT VDTLFLQFTTILSLLLQTAGKKFGQTTLDNNDGKFDGSLLKKLDHIGKQHIELLSYCRRIEEIFNPVI FLMMLFTSANLCVCVISLESALSALELVDAVTFLMHFAAVILQPFMYCNSAEDISQWTSNIATTI YTCQWPDQNKKFKNIVLLVIMRSQRNYEFGQYGLFKVNRHLLTQLVHTAGNFFMLLRKANT DallOr MSKLPGDSKAPHLRFEYYLGGDILILQCLGLYSMGGLFKNAEPSFYFWETIPFIFGVGGLFFGLM 187 SDFRSVFIETRKKSLYAIEVLAAALCCSLGIYKGFRLWVYRHEFYHFIKVFHLRWNDEVSQNTITD RMVEDAKSMRLFRIWYGAVVAAIFTAYVLRPCLVYLRFKLSGTNDTFDYSQTAYPAEYPFTLSTP HSFFCCIILEAIGIYFLILYWTAVDGLFAQFTTHLAIHFQVLANRLLDIPTNETVGSHHTGIAVQRL KKTVQDYLMLFKYVNKLEKIYNPILFGTVLVNGLNLCTCLYSLQYRMSNNEWGMAGKNALLTT GITAQTMMFCICAQRLNDEVAGVRQAAYNCSWTEFNSTIKNLILLIMIQTEPDYIYTAYGFIYLN MPQLTVIFTAAMRFFTLLRNMT DallOr MSKCTDNSEVLQSGFEYYLGGDLIILQLTGLYSIRHVFTLDQPKWPYWEVIPFIFGGGGLCFALIS 188S DIRNALGFVQTDSMLTIEIAAACFSGFLSIFKGFRIWIYRRELYDLIRLCYLRWKIQVSRNTITECM MKDAQAARHFRIIYSVIVGLLLTAYVLLPIRGYLQFYCSGSNYSFEFSETVYPANYPFTLTSVRPFF FCIALETVGIFFLGLYWLAADAVFAQITTHLAIQFQILGNDVRHICPITQMRSYGPTKIIQRLKRNI DEHLELFGYVHFLEKIYNPILLATVLINGIDLCTCLYSLQYRLAESNWGDVGKNAVHASAIVLQTL MFCACSQRLNDEIAGVRQAAYECSWTEFNTSIKTMILLIMIQTQHEYIYSAYGFIHLDMPQIFSV AMRYFTLLRTVT DallOr MTEFDSTGRPLRLYFEHLLGGDVRILQSIGLYSMGSILTHREPKWAYWEIIPIALVIPGLCFGLAC 189 NTNNIIKLLKTDHMMTMEFAATTLSSALSTFKGYRLLVYRREFYFLIKTCHIHWNVQVSRNTLTQ SIVEIARLARLFRIYYGTMVIIIFSAYIVQPFGAYILNPPSRSNESIVFTKTVYPARYPFTLNSSRAFFI CLTLETIGMYFLLLHWIAADGLFSQFTTHLSLHFQILSNQIRRICPSTMISPSKSTTVSKRLKALIEE YIELFKYVRSLEKLYNPIIFATVLVNAIILCTCLYSIQYNIAKNNWKDVEKNLLHALGVSLQTLMFCI CAERLNNEAIRIAGVHQAAYDCPWTAFSSSIKFSILMMMTLSQREYVYSAYGFIHLNMPQVTTI FTAAMRYFTLLRSIT DallOr MTEFDSTGRPLRLYFEHLLGGDVRILQSIGLYSMGSILTHREPKWAYWEVIPIALVIPGLCFGVVC 190 NINNIIRLLKTDHMMTIEFAATTLTAVLSTFKGCRLFVHRREFYFLIKTCHIHWNVQVSRNTLTQS IVEIARLARLFRIYYGTVVIIAFSAFVVQPLGAYILNPPSRSNESIVFTKTIYPARYPFTLNSSRAFFIC LALETIGMYFMMLHWIATDGLFTQFTTHLSLHFQILSNQIRRICPSTTISPSNSTTVSKRLKALIEE 
YIELFKYVRSLEKLYNPIIFATVLVNAIILCTCLYSIQYNSAKNNWEDVEKNLLHALGVSLQTLMFC TCAERLNNEAIRIAGVHQAAYDCPWTAFSSPIKCSILMMMTLTQREYVYSAYGFIHLNMPQVT TIFTAAMRYFTLLRSIT DallOr MDMSSKVEFEKVKYLFDRISWPLGVLGIWPKNITSFGQLKLTIFLTYFAIHLSMQLLDLASIMGS 191 LELVILNLTETAFQTMAIYRIIIIRFGQTTRRIIDSIEEDVAIENFQDPEELRILHQYSSVAEKFYRMG TRLAAITAVMYYVTPFQTYLVTRMMNGTAVLVNPYRIYHFIDLSPTERTAIVYACQFPMMYTG VSFVTSYGILLGFVMNVCGQLAILTHRVSIMKDDNSDPRAFFRRHTQRHIKIFVVAQWLNDAF HLALLYDLLATTVLIGLIVYQFLLNIDDADAFGVFGLATYILSMTILVYANCFMGECLNTECTALLN AYYECNWYEMSPFFKKALIICMETTQEPIRLTAGKFYVFSLESFAQIMKSSMVYVSMLRTMI

DallOr MTPDTELEVKFIKAQNVLHQVSRSIAFLGMWPEEITSSSRIRLAMYLTYHIFRVLMELIEFAMVL 192 GNLQLTIDNVTNTGFQVAVILRLTSWRFNPKMRYAIEKFNEFHQRHEFQNTRELDVFIRYSEYS LAVYKIIRTTSVACVVSWYLTPFQNYLFAKLQNETFVFVYPFRMPPFEIFSRLDVAFLVHFTDIGM AYVSICFALTYGLYFTIIHHICGQLVIMSDRVRNLRVDPTVSMAEVFRPIIEKHTAIIRVSKALDDC SNVFLLYELLNTIIVIGLLAYYLIVDGDASSAVIVNYSLATINILVLIFANCFMGQCLENESMNLLEA FYDYQWHDMPLSYQRGLLICMLCARSPLKMTAGKFYKFSLEGFTIILKSSMAVISMLRKTV DallOr MIPDSELEERFQKAQNVLHYVSRPVAFLGMWPVNVTTGSRIRMVLYFIYHIYRMGMEFIDLV 193 MVFGDLGQVIENLMITGTQVALMMRLSFPRFTVSMRRVIDGLSDLHQRTKFNDTREMEIFIKY SEICERCYKIVLWPATFTCVSWYLTPIQTYFIMKIQNATFVYASPWRMPPFEAMEKPEVAIFLHL IEIPVTYLTACYLLTYCIYFTLINHIRGQLVIMSYKVKNLKVDRATNVKEVFGPIVEKHITILHIAKSL DDCSHLFLLYELLNTTLTLALLSYNIMANISLSETALIINFSFYMFNMTVLVFANCYMSQCLENEA MNLFHAFYEHDWNNLPLAYQKAFMICMLRAQTPLNITAGKFYKFSLSGFTSMLKSSMAFVSM LRTTV DallOr MIADAELEERFLKAQNVLHYVSRPVAFLGMWPVNITTGSRIRMVLYFIYHIYRMGMEFIDLVM 194 VFGNLEQVIENLMITGTQVAIMMRLTFPRFTSSMRRVIDGLSDMHQRDNFTDTREMEIFIEYS EICEKCYRIVVWSATFACASWYLLPVQTYLIVKIRNGTFTYASPWRMPPFEAMEKPEVVILLHLIE IPVAYLTACYSLTYCIYFTLINHIRGQLVIMSHKIRNLNVDPVTNMRKIFNPIVEKHIKILHISKSLDD CSHIYLLFELLNTILTLALLSYNIMMNINLSETALIINFFVYMYFMTVLVFANCYMSQCLEKEAMN LFYAFYEHDWSNLPLPYQKAFIICMLRAQTPLHITAGKFYEFSLSGFTSIMKSSMAFVSMLRATI

DallOr MFFCCSIGSRFIKAQNVLNYVSRPVSLLGMWPVNITIGSRIRMVVYFIYHFYRIGMELIEFVMVF 195 GDLGLVTENLLMTGVQIAIVLRLTLPRFAGSMRRVIEKLSDLHRRDMFKDTREMEIFIEYSERSE RCYRIVLWPATIACVSWYLTPVQTYLLARIRNGTFIYEAPWRMPAFGAMARPEVAILVHIIELPV VYVSFCYMISYCIYVALVNHIRCQLVIVNHKVRKLKVDRIRNMEEVLSPIVERHVAVLHIAKSLND CSRVSLLSELLIVSATLALLLYNIIVDGVKVMNYGFYVFNMLTVMFTNCYMGQCLEDEAVNLSD AFYDHDWTDLPLSYQKAFITCMIHAQRPLQITAGKFYKFSLSGFTKIMKSIMGFYSMLRATT

DallOr MSSGRIQNQEEFDKAIKVLSWNKWLLSALGLWPRNPNSIIFIVNFGYFVYNMMCEYLDLFLFID 196 NLEHVIENLTENMAFTQILVRMAMLKRYNRQLGEVINEVFKDYDAKTYRNPEESQVFLDYMN KAKLFVKLLCAFVTMTATSYYAKPITRIDSAVNATMPFLLPYRFHIFHKVDDFRTYAITYASQFPF VFVSGLGQTAADCLMVTLVFHVSGRLSVLAMRISSVKTNVDNCRAELGEIIIEHNRLLKMGQNI EEAFSETLLAHLVGATALVCILGYQLLVNYAKGQSADLATFFVFIFLVFLVLYAHCVVGESLITESN KVCEAYYDCLWYEMTPETARVIILCMARSQKPLGLTAGKFGSFCLSTLTDVSGENSYGLLIRSTII PPRM DallOr MMPDTELDEKFRQAKKVWHSVNLSLVWMGMWPVNVTTSGRIKLTAYLTYWIIKLSLEITELF 197 MVIGSLDTVIDNFAVTGVELVGFSRVITWRFSPVIYKVINEIQQFREYANFKDSVEMEIFIRHSQS SQRFHRFLIWPMCVCTLSWYFTPLMDYLSASMHNETIPLQPPYRFPPIIDISQGYLAIIVYLFDFP LMYTGLCGTSTYSTHFLLINEICAQLAILAHRIRNFEPEKCTNISEAVGHIVDKHVMLIRVTKNLN DSTTIYLVIELLNTTVLIALCSYNVLINLEDTFLVGLITFASFLLALMIMIYTDCYMGECLQTEAMNL FQAFRECDIHKWPISWRKALIICTLRAHIPLEMTAGKFFKFTLSGFINILRSSMAYISMLRAML


DallOr MVDDELKEAKQLFAWNRMVFVIIGLWPLEPTICFFHTWLVYFAFHLFMGFADLVLVFGSLEEV 198 VANVSETALESMIIVKMVVLKYSGTLREAVVMARDGIMEENFSETKEKEIYMFYNAIAKKFFRW AVAFAFISAILYHLKPMETRLKAALANDTVPLLLPYRSHLTFELTDMTTYILIYAYQSPMIYVHTFH TAAVCFLITLVLNACGHLAILARRIRRIEPNSSGNSELQLGNSVRRYLEIVRFSKLIDKSFWIILLEEL VTTTVCLGLASYNVLVNADLADTTTFMTFVMYVFTMLLLIYGYCFAGEYLITESMNVHEAYYHC DWPDLPSSYRKSLTLCLIGTEKPLRLRAGGFYTFSMAGFTGIMKTAMAYVSMLRTLVL DallOr MPAEIRDINENKLIDEDFIAAGNLLTWLKWFCTIGGLWPLEKTYIRAGVWTVYLTLHLVMEYVE 199 LFAAFGNFNATVFSVVEVSMQTMVYAKLIVFRHSAMLRRLIDATKDELAENLYENYEEKKLYLK YNSLSKLYYKVSVPYVMTAACVYYLRPVVTSLLIGNFGTNDSMVLPFHITLPYTIVDTRVYWMTY AYLSPMIYLLACHNGWICVLITVQLHICGQLTIVEHRIRNIVHVTDHDASHAIFKSLVDRHTKSIW MAKSFDDSFHFILLVDLIVMTLLLGLTSYVIIIGQGVSESSTGPVFGIAGTATLLLIYGYCIVGESLIS ESSKVHAAYYECMWYESSADFKKAVMICMLSSQEPLRMTAGKFFVFSLTSFTDILKTAMGYVS LLRKVSQ DallOr MSLQNVDIKENEKIDEDFKAIGNLLTWQKYFLTIGGLWPMEKTYIRASVWTGYLGLHLIMEYTE 200 LFTLFGNFNSLVISVLESTMQSMVFTKLIVFRHSKTLCTLLEAMKEDFSGNMYGNHEEKRLYLKF HDLSKWYYKLSVPYIMCGASMYYSRPLLTSLLTGGFGTNNSMFFPFQIKLPYAVLDMQTYFMT YAYLSPMVYLLACQNAWICLLITTQLHICGQLSIVEHRIRTMPYANDDQERNMIFKSVVDLHTR SIWMAKSFDDSFHFVLLVDLVVMTLMLGLMSYIVIIGGEMEESAIGPLYTFCGLATLGLIYGYCIV GEALISESSKIYTAYAECMWYEASASFRKGVMICLLESQEPLRMTAGKFFVFSLPGFTNILKTAM GYLSMLRKVTE DallOr MTEFYNTARPFQSDFKYLLGGDVRILQSIGLYSMGSIFTHREPKWPY*EVLPFVLLIPALCFGSAC 201 NIRNMIGLLQTDFMLAVEVATATLTTTLSMFKGIRLCVYRRDFYHLIKICHICWNVQVSRNKITQ PSE SIVEIAKLARLFRIYYASVVIATLAAYVVQPFGAYFQNYLSRSNESFDFTKTIYPTGYPFHLNSTKAF FICITIETMGMYFISLYWIAADRLFAQFTTHLSLHFQILSDEIRKICPSTITSPSKSAKISKRLKTIVEE YLELYVWSLEKLCNLILFATVLVNGIYFCTCLYSLQYRIAENNWKDVAKNMLHACGVVLQTLMF CICAEHLNNKATRIVAVRQATYDCPWTAFSVPIQFSILLIMTFTQREYVYSAYGLIHLNMPQIFTA AMRYFTLSRSIV

Gustatory receptors

DallGr MTLWFFRVFPVEYHTGRWTPLKSDPPIAEMTTPTVNPTNFVQNSDSLHTALKPIINLAQCFAV 1 FPVNGANTPDSSHLKFTWRSLKILYSMVVLALSTIMTGASIHRILSSTFTSPKMTTLVFFGTSCVT NVLFLRLAMRWPEIAAKWEKIERELATRHRRTSKYNLVTRYKIIAVTIMAIALIEHTLSLVSGYLSA SECAQLQGDPDVLGVYFQTQFPQVFNKTVYSIWKGLMVQSINVLTTFSWNFLDLFLILMSTAL TYHFGLLNARLNSIKDKTMPEWWWAEARIDYNRLASLTRQVDSDISGLILLSFGINLYFICMQL MYSFNRVPNVIRTIYFGFSFGAVIARTAAVSLSAASVHDESLLPAPVLYSVSSSSYSTEVVRFLSQ VTTDTIGLTGMKFFSITRSLVLTVAGTIITYELVLVQFNAVQQVNPSNLTNACELK DallGr MLQPNISPRTIETNRNNKNSRSQDKKIWPLFSNFSVPERNEKRSEIICPMQIQLARPRAIKGTLL 2 GPREVNNSWRDGNPDINDQESFHRAISPILAVAQIFALLPVLNIRNPSPAKLCFKRISIRTFYTTA VIVMLFFMTYTAFIHMITTLRAETFVQDGGIATATGGIMFYGNSLMGTVLFFRLAPKWIALQRE WRMMERFFNSCKYPQVRLRWKFMLITAMIMIPAALEHTLSVVKAIPDSSEFPYNNGTWREYV EIYTDRSHAFTYTTVKFSTALGIFYFIISKIATFTWNFTDLFVILVSTGLAERYKHLNQVVLTGGKSD CSNLDWRQLREQYAALSSLVKHADNVVSPIILLSFTNNLYFICLQLLNGLSPKGTSIIMTLYFFASF AFLIGRTVSVTLLTARINDQSKLALPAIYSCPASCFTNETQRLQLQLTSDEVVMTGLKFFSITRNF MLAVAGAILTYEVVLLQFNVAMAK DallGr MTTAKEYDATKQEVTMSKQLKSDLYQGLFPIYHLSKGFGLLPVRVSIQASGRYNSKIHVVDIIYG 3, VCLLLLFICAEIWGLWRDLRDGWVNSTRLKRQTALNVTIGDVVAVALLAAVGVLGAPFRWKYL partial QEIIGRLIDADERIGFINARKTRRFAIIISAGALVYLITISSLDIYVWDMQTKLKRKMADKGPINYSP IYFLYMQALMIEIQYTITTYNLNERFLRLNKNLENLLRTGRNLMKKDVSFTSECKDQNEFIIYPRA KVEARPARVFRTPKVYDWMA DallGr AREMTDAISQLINVHSSICDTNLLLNKAFGLPILVVTITCLLHLIITPYFLMMEANSDKESLFIAVQ 3, FAWCAFHVFRMLIVVQPCYATTTESKKTAVLVSQLLTYQWEPYVRKQLELFSLQLLHRPLDFTA partial CGLFSLDRALITS DallGr MAGAVTTYLVILIQFQKADDTKDASNILKNATLFLQNVS 3, partial DallGr MSMNSMENMWLTDKPRKLKPKIDLFSMLDSVALIWQFAGNFPMRFPGHGNVKKCCFSRVI 4 MGYQIILFFTCVALISTVFYTWEHILRSQIWVMTMVTIVRTMSCLLCMLINCFMWMFYRKDFI DAMDFLASQSVILNQLGCRPDYSRCLRMIQALVYGMLTILLMFICYDFFIGFSRWPPGNKTFSFI QWLFWTIPLVVQVMTPTIFVALVMILGLHFRQLNVKLMAVRDRSNSSPPKIIEVSNEGVESIRLI AKVHYNLCKISKFINSFFSANLVVIVVMAFFVMSSSLYFIFNEVVKDEETDYWDIGTYVAWEITT ALPIIVTVVICNLTSNEAELTGRLIHELYVENSESKLYQIIRSLSLQLHHQKIEFSGAGLFPINSSLLQ TMIGNMTTFLVILIQFQPSLD DallGr MQMRRVGVLEGSTTTTKSGKSQTIYRGPESPLYSAVCPFVYVFRIFGVAPYEFADDILVPSCRN 5 
AIFSFIWLGLYCWIIYEVLASFYHVDRGSPLLGHTETVKTVFNFFVAVMDIFVALYYRAEFTKIWN SLQDHDDKIRELGYARSERKPTIFVWVTLVLSFVMCLSINRLGMWAFIQSWWDNMSYLMMY IGTAISIVKFSGILLLLGDRFHQLNEIAKENIPVKPRWITILPVVNAKAVERLHDSLMVLGEKLESM HSWSLMAWIANLSFHLIFNLYFIIDWILKPEVIWAPVFCLLSWTVAFTGQLILLLYACDYASSEAS MMAYIMLSWKRLLYTQNRDMAVDTLMHLRNRRLHFCAAGCFDVNLPLLSSIASLLTTYLVILL QFEAD


DallGr MGASIQKVEKKWELFRASNYLSLMGPCFELLRIHGLFPYKYNNKGVIVSSKYGWIYATLISVLCII 6 VNAVVLYMMDISKSLAFDSVPGTLQGNCYMLLAEWIAIVSYILSSKRMRLLRDIASVSASLSALS FRQLSKIIHAKDIMGFLFLMAQATNLYSSKLDLTLSKVINMYATLVVFLMDMLYADCVLVIGECF KNINEKLLTLKTNMEKDEPHLLRRVYHEKRNPMILMEIRGIKSQHNDVSELVHSLNQTFSLQLV ATVTLTFAEVTFSLYFYILQRLGKSGINLEKQIWYSYFSTSVTYYTLKLAAIVWACEISKDQAVKTG IIVHEVLIDTVDKQVKEELFSLQVLHRDNTFTARGLSLDATLLTSIVGGITTYLLILIQFMVSSKSCG SVMDPTSPPRT DallGr MSQNNSLPVVFLYYYWKIIGLCPFRVTNLGLEISILEITIAVVRCLGCLYVFPNFDYFHRAIHDHGL 7CTE IVVLSEATGFFTGLIVLLFIWIVSATRVKKIDSVIKTFMSLHKRLEEMGLEEARGRDFLRKMSLHSF FVNSFFWIHTFVITTIFWMVRTVGFSELLYQSARVVLWNTVVLFVDGVSLIRDKFQCINNAIEN FHNYLELGDDPRSMRFAQQVFYIARYCNAGEYLRSIGQIHEDLKDLLHTVENIFAVPILFAIVVN FMECASCLYLIAMTLRSGSTEWTFKEIVAMTTFSTWVIANLVHMLLIVSIPSLTDEEANRTGSAL HYMWITYRPRNTRFVVEAVSMRLLQKKTVISLHGLIRLNFPLIYR DallGr MCPQNKTTQKPKSPVSKSMFLFLKLIGMCPFTLDKIDIIRRSKTASLYSTILIVGYTLCYCRVVVSRI 8CTE SIMLPRETNMLVVMDFIGLGLDYWVIMVVWFYALVYQNELRCILKHFITTKREVVLLGMTECY DDFTMNLRIFVISLNLFFFVLYVVDHALLYHIKEFEFGVWLPFNLPRCVSINMVGIYLYALKTIQT RFQFINLRIENFPNNFSLEEAPGERNFGKNVSNPIATADRLRAFGKLHRNLRNLVTLMNNLFAL PLLIALMMQFTQLIVNIYMLLIYISNGNYCHWNANSSMSIVLGWLIIRLIQTLCIIDVCETMSQT ANRIGNYLHTMWVMRRPQCPGDVVKILSLECLQQTVKIQLYGSIVLNNSMLFK DallGr MHSIDKNSWHFVMLFRFGNILGLNPLRLTNDSAIEPSLFGLLYPLLQTILFGFICVEVFLDRYLITT 9CTE PGETEVFFILDQQIVVLTFLELSSCWISFGFSQKRLLRIADQFKKTSFIEQRLDIVCDYSKFVRRFAI SMSCFNFFFLLSVTYTMWIAFRMTPEQDLKWWCIYNVIRVIHYNMICLFVATITLVTLKFSRLN TQLENLLICDESNGHQFFGILWLFSELHRETCGLLDKIVSYFTIPLLITITSHIIHITANIYCYYLYVED VCPKNWDVFYVTSILHWITFPIAAIVIISDCCGCVIIE DallGr MKNLAQHSRKKLCLFHLIFFFFKIIGLAPFAVDIPRLLEAKKEHNSPKKTFTASTIGIIHNCALIACII 10 YVDSQTFDSVRRDPSAQVIKLPKIIGRCLTAIGFLLSIFIWALYILRVRRIVALANELLEIDGIMFRA NSLHLKFPKAIVLVSAVHFTMSGYLIAAEVSTQQTSIHVLIVFVLPSIIVSWTLVQYSLVLHMIEER IANVNRSLLEIGNIPMELAMPSLFVRKIPVTKDATAKIVSIRRVDVDLCDLCYKIRDFYAPPTLMT VTFFGASLVYSGYFLVMPFITRSDPHFHLTYINGIAWLSLQLSCLFNLTISVTKITRTIRKTVNFVHL LLDCCVLPSDASAELKGFSRQILMKKFQLSVYDMFPLDNSLLSSIAGFIITNLIILIQFST

DallGr MIEMWRYLRKPQLTVYCNLFVFRLLGLAPFSVRTSCLLGINLRSNDKKIFSVSKFGYIYNAILTVS 11 MTWLIINSKIYVFPATIRVEELLPKLVGKSLMVFGYAMAVIIWIAYILRQSSMVTIANGIYDSFRV MESCRDIHVKHGNCLMFKWLVFLMSSVYICVVEHIEYSYTSIIVIYIKLSGLVLGWMMLQYSFIL DVFRDRFADLNEGLLRIGRVSEEFEVPSFFHRKILINSTVLADIATLRRVHGELCHLCYQVGDNYA LLVLLVIIVFSAGSVYNIYFLLIPAISNYPPLLPLITGIGWIVIQLSAVIYLTTSATRVTQAVRRTAHVI HALLDMCSLHPDVTRELKIFSREILAKKFHLNIYDIFILDNSLLSSIAGVKITYLIILIQFLIN


DallGr MNTAYIENMFVVKFVLLTFRLLGLAPMTISKSQASDHWNPKRKITKLTFSWSRLGTAYNLILILG 12 FLMECYRQIPKIMTIDAVDKTTLSQMYRNSIITCANTLLIILLLTFCLRQRKIIKIGNQIVEIDSTLVG LRGIYQLKPGKTYILLVLGLNLILNVLLIAMRLTALQDMTGFMTMVVFNTVVFSFFILQYALVVL VAERRFAGLNEALLSLETKAVMINDSNRFQLADQRSTVVSIIVIKRAYGKLYELCSEISSLYAPFTIF IVIYFSTSLLFALYIIFFNMTRGHLGALPTVIDSIWISVRSFPFILLTTNVRKTHIQMQITSDITHKLH SKFAAHREIKTELKNFTVDLMHRNFQFTAYGIFPIDYSLLKSIASTITTYLVIFVQFQLASSKHEPKE KK DallGr MNIADTGDMFVIKFVLLTYRLLGMAPVTISKTQSSKLKFSQSWTGAVYNLSLITGVLTECSRQIP 13 TMLGESSNDETSFSQMYRNVLIVSGNTFLVILLLRFCLCQRKLTEIGNELVEMDRYLDQLPGIYH LRAGRTYVFFVLTLHLTIIITYCALYLAVHPIKPAFFTMIVFPCTVMSFYTLQYSLVVFVIERRFTSL NEALASVETKTVGVNDRSVVSIFVIKRAYARLYRICSEISNFYAPLTVLIVVNLSTSLLYELYKIFFGL TRRSSVLPSPTIKFILGFLWISVRKLPIVLLTSTVRKCHMQMQMTSDVVHKLQDKLGACRDVKS ELKNFITDLMHRSFQFTAYGMFRIDCSLLKSIVSTMTTYFVLFIQFQLASDKGASE DallGr MNVKISKISVSSLCVIRLNFLVSRLLGLAPFEFVRSETSLEDKKLKRETLTFRYSRTGAASNLIWML 14 VVSILHYNFVYTACVEERDDGSLTSKAIGICLAMNGLVLLTFLPLKYLFEQRKIVSFANRLMKIDK SLNELRDVYHLNSGNVRLLLMTVAHLICCLLVLFSESILLQNSAASITIYVGQTVVLTFYITQYALV VTLMEKRFASLNDALLRISTNNVRTYTLTASQRSLNRSVTVEVMIIKRAYVILSDMCREIADFYA AATIFIVIHFSTSIMYSSYLMIITPMIKTWGIGVFLDYVNTGFWLLFELAPLTVLVTSAARASKEM KRTSEVVHRLLATCAASREFRRELKNFSLDLLHRNFKFTAYDMFSLDASLMNSIFSAIATYLVIFL QFQIKNSVYMRDEMTTTAISHLDSDVTSLP DallGr MPETQPQSLGTEPPIVDTYLVKHSNSIRFLIIIYKFLGLAPISITNLTSPHRKRRLHFHNLMFKKCLL 15 GLIYMYTLIIIVSGAAVVTVPLLNKGPQSDADLLNSATTIKGVFGITVMFVIWLIVAAQFQMALKI LNRIVEMDKEMLVLQDVRADASKRQIVILFMGNCTVWISIFVLEISVVDDWYKVWTPLLLPSM IMNWYIMQYIFMLVMIESRVGSVNRGFLMIAKRRLATFAYSTARPAAMSERKMVNHFMILRR GHAILAGICRDLSAYYSFPILPTIAFLCCATIYDSYYLIVPLVVPSYYTSILEIANMICWLVMEMLPV VVLAVYVTRVLNEMEKTGSIVYNVLSQSALTYVAKNELAEFSVELLHRKVRFTAYGIFSLDGTLIR SIFGMLATYLIILMQFQINHRPDNDAKQATTSPNSSTC DallGr MQRQSAWAEQGVIDKYLFSDSCTVRLVVVIFKLLGLAPISVESPRSLRTSTQNSTQGLMFKRCV 16 SGIVYTYILVVIVFAASIITVPLINSETLHSDGDLLETFEAVKGVFGLIVVLVIWLIVAFRYKKVLKILN KIVEMDNEMLMLQDLYYLETSKRRILVMFGGNSIMWVVIFVLEILSVPDWWKIWTPLLLPSFV MNWYIMQYILMLVMIENRFVSVNRGFIMISNSRIETFFHADVRPADVSERVIVNNFMTLRRA 
HAVLSGICRDISDYYSFPILPTVTFFCGASIYHSYYIIVPLVAKTRQRSILESTNMVCWLMMQVLP VVVLSVCVTRVLNQMGMTGGTVYKVIARSILNYVAKDELKKFSFELLHKNVQFTAYDIFSLDCT LIQSIFGMLATYLIILMQFQLSHTSQRDYKYATSSPETSSQ

DallGr MSFRNIKSRLKTSIENTVFISCYIFFKLLGLLPESLRNSESRFAPPNVCVSSPEGLIYNFGLILTFPVLS 17 YYSIRAMRDSDYPNKSSTTEAFEFFKAVFGSIIVVILWALITQKQRKTVTLANQVMNIDHLLMK DKNIAAPETCLLPCASICAFNMFCWVTLIWTEIVGFDKIQLIILVTEVGPNFIFNWFFILHTFVIML LETRVRAVNQGMLRLSKDLRSSIPRTTAGEDKIKSLITLKYAHECLHEVINEVATLLSVPVFLIIAEL APSIIYDAYYMVIPFLVPSFEFSWVLIANSLCYLSSQMFPIVTMVVSMDRITNEMERTAPIVHNL YTRAALGRIVKSEFKLLSLKLLHSKIRFTACDVFSLDRTLISSIFSMAVTYLVIIIQFQVL


DallGr MSFSTRASPGTLALKASYLFFKAIGLIPARLRHLSPKKGRPTFVYSRGGTLRNSVLIVSFAVLSCFS 18 VHAMNKADYPNKSPTTEAIEVFKAALGSILVLIVWTWVCVHQREAIKVANDFLAIDSVMRSYS NLYSPKTVTQRFVLVWVFNTLIWINMFWSDYMVFDHVPVIAFISSVGPNFIFNWFLLMYTFSII YLRMKVQAVNDGILRLSADSVSRISASPGNFDDRFFMKAFFALKGTHMRLHEVISEMDKFYSF PILLVIAEMCASIVYTAYYLMVPIFLPSVKQAEIMIFNSVCYLSMEVFPITIMTISIGQIVQELGRTA DAVHKCLSRVRLSRKAKAELRLFSLELLHQNVQFTACGIFALDGTLLHSICGMTATYLIILIQFQPT ARPSEENPNLQ DallGr MTPALSKVIKILPYKKCFRKETVLLEVGLKFFQILGLFPCTVDTSASISFLPSRLCVVYHIALVVTISY 19 CNFYVGIPAIVQAKSESNGLTFEEVLELFLLGMATAVSLLIWIVYPLTRKRLMIFGRKVVIADAVA RKLGGNYRMESSRNKLVIFAMIYAILFGLLIFTEYIIYDDPMSRFVWFCFYTVPSFTIILLLIQYSLA VHLTSRRIKALNDTILGSLETTAVQGPPAALQSHIDDVVRRVFDTWKKAHNELYEASLVIAKFYS FPILFVMGYTCYTVIFNSFHAVKELVRGNGQTVAIAILNDCVWILILSMPTVILMLEIENLLAEAK EKAAVCRKLALRFQKHRLFRTQVKMYSFELLHENINFTAFGFFSLNGTVLHSIFATTVTYLVIIFQF NNVYEE DallGr MTNSRFSRVKKIMPRGKCPRRGAAAFKLITNFFKITGLFPCVVNTSPPMYFCPSKSRIFHNTALI 20 AILGYYNFYVGMPLIIQDGTRSGALKMGRVLRLIFIAVSTTVTLVIWIVYILSANKLATYGNKVLR ADAIARQFGSNYQLKSVRNKWVSLIVIYMILFSFVISTTFLSFNNIMSEFAWLCVFSVPSVLISLFL VQYSLIVQLVRRRVKALNSLILRKLDVYVHSEIVWTFEIWKKAQDELYFACLIIAEFYSFTILLAMG FVYYTVICISFYTVKDVWKSAERRRPMIHLLNDPTWMLILVMSTALLLFEIDDLLNEIKKKAAVC RKLSLRFERHGQLQTQIKMYLHELLHENINFTAYGFFPLNSNILHSMFGTTVTYLIILIQFLMFHE ERKRNAGT DallGr MPNLRLITRTKKIFSYKKWFRREVTELDVITRFFKINGLFPCTVDTSISVNVFPSRLCVFYHIALIGIL 21 GYCNFYVGIPAIIQDKMASGGLKLPSILELIYIGSSTAITFLIWIVYITFCTKLTIFGKRVAEADAIAR KFGTNYQLESVRKQWIMLFITYVILFSFLTFTESLFYSDNVTSRFVWLCMQTVPSIIIILFLTQYYLV VQLAKRRMKALNKIILRNHDKTAVHNRNSTNDAITEAFESWKKAHNELYEACVVIAEFYSFPM LFVMGLECYGVICGSFYSLKYLRSGVHQAQMLLHLLYNLIFIFNFTIPIVLLMSEIDDLLSEVKEKA VVCRQLTFRFRKHRPLLRQIKMYLYELVHENINFTAFGFFQLNRTVLHSMFATTVTYLVILYQYS NRRDEDGSNTSGN DallGr MDRETSVLFRSITAIFKTFGLLPFTITVPSTSKKGIDISPSRGTVAYNVLFFAAVLYANCFAGIPIIIR 22 HAKSKLQDTQVILDILHTIMASGISLMTLAMFNARRAKMMNMTRDFLEVDRCIKRHRHIQSLK FSPRPLTVFIILYIIMWIVLVSTEWHEDFVAAPKMQWLLWGFTYLVPVVIITLCVLQYGVIMKLV AFRFKLLNHTIRKSTDTIAHCLNQNETFILPESAGDRVIVHKFRSIKSAHESLYDFTIAVSHFYSFLIL 
PIIGFFCGTFIFLSFDFMMNAMSSKNEGNTISNYVAWILIVITPMVILLSNIEAATEEYERTPIELY KLASRFESCRELATEVELFSHDFLHKKMKLSALGCFPLDYTVFHSILSATLTYLIIIFQFQESRRMAS KKPANLTST DallGr MNKREKKFYFCLIILSNCFGLFEFGVKRIKLRGRGTELRLQKSRLARVWSLILIGIILYTNYLIPILDC 23 WMYTQDKWMMFDYAKSFQTILGSLIVIVLWIYCFLKQNSLIKMANKFIEIDNFFEKYRHIFQLT GFKDVIPFTVIFIIMSIAEVVMENAAYGYYGKYLWYIWWYKYLFPHFVLMTFILQYATVIRSMRI RMESLNRGLAKLTSSSHDFQNKFTVAGRMFTNEMMLESFDSIRNLYEELYKLSVEISEFYSFPILL VMAYVTFLIVTLAYCLVTMVMHFYSSELIVRILPGLSFSLLTSLSGRLIHEIEGLVKEFKKIAPTAFEL LHSFPGDHALTTSVICFTQELLHKDLVFTAYGLFPLNSTLFQSIFSSALTYSIIILQTQGSGRAVATG SNNQTH


DallGr MENDRKKISLEILILFFKLIGLFPFGYSRKEKIHSPNRDNPLCSAVTSVTYNLILIALVCSLYIRKAVV 24 NPNSYVQKLMDDPLTDIEVNLASVVLVILWSCFLLNRHDVVEVFDQFLEIVKLLENQQSYWGR RKERKSVLTIAVHVVLWIIIAWFAYDAWDPTDWCSSGWYFLYMFSKFVMTSCILQYTLLCYAIR NMAECLNTILLNFYQLTPHQPHPKILDFKSIRFMYCSVWDFTRAIDGIYGVPILIIITYLCYGIIYTLS MLTLSIKGGSILSALYMNRQFYTLHFTIGVAVCILIWSTTAAIREFQDTGLIVYRIFHRLEENIQLQ KEVQSFSLELLHQHLEFSANGFFNLDYPFIQTVISAIISYGILLVNISK DallGr MKSKEILLKIYLIIFRINYIISSLLGLAPCSIKIVNRSAQRIDLTLIFSYSRVGCAYNIVLIMLCVVIIVIG 25 VPLLEEMQYPNDSKTLKTITITLSTMGNIGAIVIISTHAILQRRIIKIGNQLHEFDRKYGDKLIGRRA NAFRDFRRMMPIIFLFFIWTGLLVTTGIAEEEVYILNSSVWASCFSWFLTQYSLVILVLRGRFEGI NNALLATAKYPIGFEENSLFRGSVTNDRLIIKNLSIMKQARKEIHKITREVSRFYSFPVLIIISYCCCC SVNSMYYCVLTLIDPEVDVFTNIIIIDSIFWTLITLYPIVMLSTSVDAFHEEVCKTADIVYDIMETYA PNKDIESELNNFAIELLHRRVVFNACGMFSLDCTLLHSIFGMIVTYLLILLQFKPAGSDSKN DallGr MLQTSVIISFFLGLAPCNIDIGVHSSKTVKFTLTWSYSLIGCVYSISLVMFFLGINIVAVPNLYEIPY 26 PNDSVTSMMIIVVLSVQTNIIVVTIIMFYLLFQRHIIAIGNRLSEFDQKYGDTWLGVCDKTLGDF VKFMAIVMQLFLWIAFFVTSVMAGDTLFFVASGFGIIFCSWFLLQYSLINIIIRDRFKGLNDAISR CSVSEGNSFYIGSSSNNQLTVEKFITFTRARAMIYNISLQVSQFYSFPVLLVIFHCSCSSVSSCYFFL MTLMESKSSSPVILSVNSSFWTFMELYPIVILSVSVAMFHEEAKRTADIVYHIIVMCPPDADVVY QLNNFAIELLHKNVQFTACGIFSLDCTLLHSIFGMLVTYLLILLQFKPPEG

DallGr MRFIMFHIVEAISCILGLAPCRINITPQSAKKIDYSFTLSFSQLRCAYNILLTIFFSTIIILAVPHMTEV 27 SYPNNSKTVMVIVIALAISGNLFAIMIIIFYSIFQRHIINIGNRLSEFDRKYGHKLFGMRSSGIGDLQ KFITIILMILVWIGLLTTSLVVPPQLFLITSALCTAFLSWILIQYSLIINILRDRFEGLNNAFLAISKCPI AFEENSIFSGYTMNVRLTIENFIMIRRARNMLYEISRQVSKFFSFPVLVVVSHCCCSCVDSIYFLI MTLVPFKFNGFSMISVNSVFWILMCAYPITVLSASVNTFHAETDRTADIIYGIMEIYACNKNIQS ELNNFAIELLHTRVQFTACGVFSLDCSLLHSIFGMIVTYLLILLQFKPTDG DallGr MFGTRINSKLYLIIFRIVQIICSMSGLSPCSINITLQLLKKIDYTFALSYXLRCAYNISLIIFFSAIIIF 28P+F AVPHMTELSYPNNSKTVMAIVITHAIAGNSCAIMIIIFYSIFQRRIINIGNRLSEFDKEYGYKLFGM KSSEIGDVQKFITIIFMLFIWSGQVTSSLIATPRLFLVTSGLCTAVLGXFLIQYSLIINILRDRFDGLN NAFLAISKCPIAFQEASIFSGYLMNDRLTVQNFVMIKRARNRLYEISRQVSKFYSFPVLVAVFYCC CSCVESTYFLIMTLVHFDASFWSLDTINSLFWITMCAYPVTVLSASVNTFHAETDRTADIIYEIVE NYVSTNNIQSEVNNFAIELLHKRVQFTACGVFLLDCSLLHSIFGMIATYLLILLQFKPSEG

DallGr MKNMQINVIIFKVLRVISLLLGLAPCKLFTEAIPSKKFRYAFKLSYSRLGCVYNISWIIIFSGVIIAIVP 29 KVAKDYDINYSKIIITIDLTMSAMGNAFVIILTGIYCIYQKQIVEIGNRLSEVDEQFQADLFVFNKE TFRNLHYSVVTMTFIIMWIISFVICLVGWRFFIFFSIILPAAVLSCLLLQYSLIINILKERFRCLNEGLS AAVKYTIESPGVSSLERNITNNRSVIDNFALIRNMRHSVYKIACDVADFYSFPVLLVIFNFCCNSIT SIYFIIINALHRKNSGCPDSMSDCFWIMMYSYPIVVLSESVKRFNKEVSKTTNVIYDIRQTNFVNK EILNDFALELLHTKVAFTAFGFFSLDCTLLHSIFGMVVTYLIILLQFQPADAATE


DallGr MYSNKNQIIFKITHIISFLIGLAPYSIRISSYSSKKPEFNFTITYSRLGCTYNILLMFVFVGMTLIVTPR 30 LIAWQYGDHSVLSIIVAIQTMIGSIASIIVILHYTLHQKQSVTIGNKFSEFDATYGNKFLRAFDESN SNRLKNLRHWLTIFLFTFIWSGVVSTSVGDEQSPACIASFLNVIVLSSVLIQYSSVVDNLRGRFKR LNAVLPKIFKCPIPSLPFAENVWNNRFVTNDFVAFRQARNELYRISCLTSEFYSFPILLTIFYSCCA TINTAYYFLLNVVQVDQFILHHHGLNILFWFFICVYPTIALSRSVHIFNIEMHKTADIVYDILDTSA SNREVEYQLNNFALEVKHKKVEFTACGFFSLDCTLLQSMFGAFVTYLLIMLQFKPKDVMQD DallGr MTYEPRDAKINLATFRVVQIISFVLGLAPCKIDIAISSSKDIKYITKFSYSKSGCAYNILLIIIFSVLTI 31 ACLPYISKEYDLHYSEFIVVIDLTMSVIGNVFVVITALIYCIHQKRIVKIGNQLSEIQSNLSPCVAKKS RIFYYVTIISSFVLIWVSISVGQLVRSKIFLFISFVLSAILLSGFIMQYSLIVNVLRDNFEDLNRSFYTIV KVPEEFRYISGFPRNIAKTRLIIQNFIRIRETRHALYKVSCQISHFYSFPVLLTIVNCCGNCISILYFSII NFKHRESTFLNSISPTVCILDGLSLLALYGYPVVVLTNTVKRFNAEINKTTDILYDVRQANASNEE LVAQLHDFALELVHKKVEFTAFGFFSLDCTLLHSIFGMIVTYLMILLQFKPPDTATE DallGr MSSKHPYPKISLIMYEITHVISTLLGLAPCSIQISTLSTPESGFTFLFSYSRVKCFYHILLIALCIGIITR 32 GVPRLNDVPYPNDSSIGKTIGLALAIMGNVIAAVIILYYLILQRNIIRIGNQVHEFDKIYGTKLNAR SVNMGGISKTFITITLMIFIWIGVLVTESMLPQGIIYIVTSGLSEVFVSWLLLQYSLVIHVLEDRFR GLNDSLLATAAYPVVFTEDALFRRSVSSNSLIVENLIVIKHARNCVLKLSREVSGFYSFPVLLVIFYC CCCCVDTTYFYVMTVMRPGKDGDDNDLAGFIFCVLLNLYPIMILSTSVNRFHAETEQTAEIVYD LMESHGTNEDIELLLNNFAVELLHKKVSFNACGLFTLDCTLLHSIFAMIATYLLILVQFKPSG DallGr MFSKKSLIIFKFIHIVSFLVGLAPYSIKILPRSSKQSGFNFRITYSGLGCAYNILLIFIFTGVTLIITPY 33 LIEWQHFDRSVFMMIVVAVTTLGNIFSTIVIFYYTLHQQQSVAIGSQLSEFDEKFRSKFCNLLKN GSTNRLANVESWLTIFLYLFLWCGVITTSIGIQQSPPSIASFFCALILTSVLMQYSSVIDNLRRRFK NLNEVLQTVFKCPIPSLEGVLLVGNVSNNRLVHNNFTTFKQTRNKLYKISCLIAEFYSFPILLTVFY SFCSIITTCYYFLMTIAQVERFTLDTLTLNTIFWFFLCTYPTVALSRSVRIFNKEMHKTADVIYDVM EMYAPNGEIEYQLTNFAVEQIHKRVEFTACGIFSLDCTLLHSMFGAFVTYLLIMLQFKPKEVMQ N DallGr MMHLPQWVTNSFPLLSYAVFKILGLAPFTVNPRPANKSPIFVHSHIGTLYNLILVVIVAILSPSSIA 34 AIDRVEFERKTSTAQVISLAKASIGLVVLVVVWLYIILNQKTAIRIANTWQHEDETMRTLAYVKH RNLSGLQLRLYFVLNVSIWINLFWTDLVEFEKWFKATFVGIIIPSFVFTWLILQYTFVLVLLRDRFR ALNEGLNRISSAAFAAPYYFNSLERTLDDVISKNLLVIKVSHEVLYELVCDISSLYSFPILLVISVLSG 
TVIYSSYYLFMMLLLNDDVKPLTIVNSVCYLSMEMFPIVVLSFGVTKIINELKLFSLELLHRNVSFT GGNIFSLDGTLLHSILNTTATYLIILYQFSQPDKSTSNTSRITNATSGKSFYILTDALGDINHVNDEL DallGr MSSASFKINLLTFRVTQKVLLLMGLAPYTVTIEAPPAKESASPIKFSFSKRGCVYNIALVILIIITVITF 35 VPEILQFEHPNYGPVVKCIDITLIIIGNTIAVTTILICCNYQQNIVDIGNQLGEFHQKFQVKSSQKC KSELTNCEYSVIVFIFSLFSIGLVIASFFLYQSLVILSSLPCAIISASFIVKYSLIIKILKGMMKSMNEAIC QMDKCLIVFPEDSFFPENIGNPRLIVRNLAAVRQARAVVCKISCQIADIHGLPSLSIIFYACCGCV SAVYFLITNFMIQQRLFLNVASMTGILWILLIMYPIVLLSETVKRFNAEMERTAGLIYDFRGTRGL NKDIIYELNDFAVELLHRKVVFSAYGFFPLDCTLLYSIFGMVVTYLIILLQYKPLDVSTH


DallGr MSVSLLNSEADVFILKLNLIILKLLGLAPVSFNITKSTLNESGRSVTCKISLPGIIYNICLIIFSCILVY 36 ISLPELYNTEYINKTPLTQTIDVTLALIGNIVAVMVIVIFSCQCKLVVSIFNKLFHIETNFSKLLNTSK NRRKKYPSILFFWTNIFYCIYVALVHIGALHNDFISLFALTFPTTVCGCLITQYSLVATLVEDKFKRL NQALLKLLTIVQTSACSERTILSRLFAIKKQRFSLCELTAKISQFFGLPLLFTITYFSGALIYATHLVVR QFIILLEPNRPLTLNTLAFILLITMPIISLCILVTRINIEMDKTADIVRKLMMQFRDYEIIKSELQDFFL ELLHPSVQFTAYNVFSLDCSLLYSIFSAVATYLVILAQVPDEPCTGSHEGHYGCSGGSNVTL DallGr MFPNIDIFTLKTSLLLFKLSGIAPFTLNISKTFRNQNRVLYIPSFRCSSASTIYNLTLIITCLMLNTYLL 37 YYMSGLDRLFESEIVHVIEIALGLIGIFVMLLIWSYFTYHRRAIVDVANQLVDIDMILAKHDLRYY QRRHIHFVLIIILNTAMSLILLVIELISHDFEIVRIFVLVLPIFVASFTITQYALVLHLLKPRFQHLNRAL GNILKEPSEMSFIPRQPVAGKIVDLRRVHGELWELCQRVADFYSIPVLFTMILMSGFIIYSAYDM IIPFVLPLRYKMSILTVSAFTWFMMQSIPIVVMTTTISGTIGEMKKVSHVIHKIIAQNEIHRVVRI QLRDFSFEMIEKKINFTACHLFPLDCTLLPATFSMTMAYLIILLQLKSRVNPKLDHVHHHNH

DallGr MSAFNRVLSKIKPTFDLIILRIFKEVLFLFGLAPFSVKISESPVERSKFVFKFSQSRNRCAYNIIVGVL 38 FTSYIVWKKNNSNWINTALFAAIHGATATINFIYCYYSQQLIAIGNRLAEYDRAVPMALSRFYHC EDATLQMIIVTTAATIVWLIDSVLVVIKSDYPMDTALLMTVGLSYNCFWIQYIMIIILLKNRLSAL NAALMKIGNADAKRKTISVLSFDASCESRTERNIVVIRKARNILYAIAYSVEEFYSLPVVVAVIECC CIVVYSGYYTIGPFILSEPPATPFLFYNYACWGIEYLWILACFSINVNKLIEEVRIFTRETIIVVYDIM ETYDDKKSIKSSLIDFSYESLNRQMNFTACKMFPLDCSLLLSILSMMGTYMVIALQYKP DallGr MSRSPLVSLKIFHAIFSALGMAPCELQISTARSSQPTLKFINSRTGCIYNFVLLIYIAFTVCYLIPNYV 39 TEGLGSLEEKIAKSALATMGNTSAFIIIIYNAIYQREFVKIGNQLCKFDAKIDLPWTNETFQQLINV SWITLLWLGVLITEGFNDFNVWHFLIHVPNSIVCCWLVLQYSLAGKALQMRFQILNRLFSEVDA AAGPLESLRNLRATRNFLWKTTRGLSKFYSLTILFAIGNLCFDTVIGVYYSIAQVLNSSGTKISMD AINGFFWTCVQICPIIYLSIAIKRLMKEVCILCQSVGIFPRIFLYDEINEFSLELFHKQIPFTAYGFFSL DCSLLFSIVSTLVTYEIIAVKFDPPDTEDITPTFSGH DallGr MTSIEDLVSVEFETGDKPESFEKVIRPSTFFSWILGVGIARPLKSWKIATFFLRFVNFAICSSIVAY 40 GAIDFFFFGSVFKSDTFKIMYYMNKVACYISSYYYVFHGLVQYKKWPILMERIAAIDKKMRHCG LECNNRSIRCFQIFTLIMTCLLGPISLVSHALYYLYTQPEEIFASDLLLYHTISQSLCINFEFDLIVMGI YARFREINRGIRRSGEQITAGRIIMEIRKAREVHHALCALVRYINSIHGMHLLLSSLNSFTMVVAT LFRIYMGVVEGIDKFMMINNVIWLTYTAQFALTCFICTWARRESSRTGIIIHDIVLRRLPKGPRP CDLYSIDITRAPEDPELSLRNEINDFSAQLDHSIIAFTACEFFIMDNNLLTNFVGVITTYLIILVQFY APEGVCGGMSTTPSVEL

Ionotropic receptors

DallIr MLNLWVVTAIALSFFSYCAESQSQIKLLVVVENLDDGVLKLLNDVIPAAEKAHESEKVSVDVKSVQ 8a VDRHNVEGSFQQVCAVLFDGITLVLDITYTGWDRLQALAHNNSILYLRTGGSIIPYVQAIDDLLLKK NATDVALIFENTRELNESLYYLIGNSIIRLVVIDDLSEVTVARIRLMRPSPSYYAIYSSTAKMESLFKTA MSGGLVKRHGIWNLVFTDMKYREFPYIAGPDTLNTMVGILSMNPSVCCRLIREYPCTCPHNFEIF PKFFERLIFLLVSTITQIQKSGIDVEPIKGQCITTDESPTIDPEKNATLWTFYTTLTTKIESDNDVFEFSK DRYLIRMRAEINLETLEGGNLEMLGNWTKKNGIVAAPGKDIQPAKRYFRVGTAEALPWTTKKKD PVTGEIMKDKDGKIIWEGYCIDFIQKLSEKMNFDYDLVIPEDNSFGHKLPSGKWNGLVGDLSRGE TDIAIGALTMTSEREEVIDFVAPYFEQSGILIVMRKPVRETSLFKFMTVLRVEVWLSIVGALTLTAIM IWILDKYSPYSARNNKRIYPYPCREFTLKESFWFALTSFTPQGGGEAPKALSSRTLVAAYWLFVVL MLATFTANLAAFLTVERMQAQVQSLEQLARQSRINYTVVANSTTHQYFQNMKNAEDKLYNVW KEITLNSTSDQVEYRVWDYPIKEQYGHILQSINTVGPVKDSKEGFRKVIESEKAEFAFIHDSSEIKYEV TRSCNLTEIGEVFSEQPYAVAVQQGSHLQEEISRKILDLQKDRYFETLSATYWNASLKGTCSVADE NEGITLESLGGVFIATLFGLALAMITLAGEVIYYRKRNAAEGIKSKEANGEHVRSGEDKLTKGRLGFK PAPTIAFIGKPHTGPRARISHISVYPKNFPFKE DallIr MRVANYIFIILLIFTRLEIIGSHTAHMRRSMKVDLRTDSLDRLLSYILKEYFGGCVIIIIYDDKTIEQQPG 21a LLQGLYTSFPFASFIQKSTNTSLGQVPIIFKDKCYNYMIFLDDVYYIENVIEEETVNKVLLITESTPWT VKEFLKSFISRSYTNLVIITHSMSRRTEEGSFLLYTHRLYTDGSGSSKPVLLTSWIKDHMTHKNIDLFP EKLSGGFRGHRLLISTAHKPPFAIRTDRISLGQIGWDGIDIRMIRLLGKVLNFTADFRDPTASTSPTY AALMDVEKRETTLAIGGIYRTNNVTTRFDSSFSHMEDCAAFISEASLALPKYRAIMGPFQGAVWA LVVIAYVIAVIPLATNTNYSILSLVTHPSRFMHMFWYVFSTFTNSFVVKNPLLDTGIAKNSTSLLIGIY WVFTIIITSCYTGSIMAFITVPVFPEPIDTAEQLLKKNYDIGTLDHDGWEVWFNWTKIDEPVAKKLL KNLQYVSTVKAGIGNITQAFFWSYAFIGSKILLEYIVQEQFTPSWATMRSPMHISKECLLNFGVTFV LPKNSIYTEEFNKVIIRARQSGLAQKIIRDVKWDVQRTAEGLLLPVSEEYKRRKIPVQDRSLALDDT QGMFLILGAGTLLAFLTLSIECCVHLWKKRYSNDVGHTMDGSTVVSETITPKMNHFRGLDMWA PDGSTSKRRRFSISSI DallIr MTTLQNQCEKHQFRTQITTKHIDEAVWNIVINDEMNDVANRSVNNALKNIRDSHTDWLREVIII 25a QINGSDPHDTLDKICTAWDRAVRDGGHGVPDLVVDVTRSGFGAETVNSFTAAMGVPTLSTQFG QEGDLRHWRDLKEDQKGYLIQVLPPADLIPEAIRQLAITMNISNAAIMFDENFVMDHKYKSLLLN VPTRHVIVRTKEVGAIDAQLSQLRDLDIVNFFVLAKEEVLTAILDAAEAKNFTGRKYGWFALSLDEF IPKCECKNLSILFFQPQSTSFSQEQLGGLTSKGLLQPPLITAAFYYDVTRLAVQAMREATKNNLWPI 
DPQHITCDEFSGNNTPKRNFNFLEKLRNVNREVQFEQTYAGFHWGSKNGEHRANFTMKMSLA VIDDGNAISTNVLGEWPAGIDSPLKMLNHTAVKSFRVVTVITPPFVMYDPETDTWSGYCIDLLENI RNILKFEYEIREVADREFGWMKPDGTWNGMIRELKDKRADIALGALSVMAERENVVDFTVPYYD LVGISILMQKQKAETSLFKFLTVLENDVWLCILASYFFTSFLMWLFDRWSPYSYQNNKEKYKDDEE KREFSLKECLWFCMTSLTPQGGGEAPKNLSGRLVAATWWLFGFIIIASYTANLAAFLTVSRLETPV ESLDGLSKQYKIQYAPIRPSQAYTYFDRMAKIETRFY


DallIr MHPRWWILVVLLPQCSTGSDDVTGGLTRDYFGGLLIRQIVAFGCWDSEEGVKFSRLIMGDDHSL 64a TYVSIQDDLDMERILKVNYYRLGIFLNLDCPGSEKIFDQFHRQQLRHNESYFWLMPTTRTGLPKYF EHLPLNIATEMTAALKKSDGEYTLYDVYNPSYRHGGELNVTRMGSWSVKNGLNIELTEYKYRRRG NLYGLGLNASIVVDHPAVPDYETYIHNPINPHYDTMHRYNFALTRQLRDYYNFTMNLSRGTTWG YLINGSFNGIIGDMIKGIVDFGATPFQFKPERIDVIEYTVQGWLARPCLIFRHPKKNNLSNPFLRPFE MKVWYWIAIFGIVLWSALYLTVKVETKFDPQKSVNTIDTHPASETVLITAAAICQQGLSDGPRCIS GRITFLTLFIWGLMLYQFYSASIVGSLLSGSSNWITTLQDLVDSDLEVGIEDMAYNHDFFATTTDPV AQELYNKKVAISKKRKTEPYFSAEEGIKLIQKGGFAFQVDVATAYKFIEETFNVDEICDLVEIQLFPPK HTATGTAKHSPFKKMITYGLRKVMERGTPRRLLNIWMHRRPQCPESHKANPLPVVLTEFSPALFL LIIGIMFATLVMAVERTFLSFPSLNLLDPESR DallIr MRHDAAVSFNSITLNMNLEIILKVNYYHLGVILNLDCPLSDSVLHEFSDQLVFNETYVWLLLTTAPS 64a.2 PPSNRLRHLPLSIDTETTVATRDGNKFTLYDIWNPSYRHNGLFHVVYKGRWSPEEGLINELTQYKY TRRNFNLTPLNFSITLRHPPLPDLETYMTTPINPQFDSMHRYHYALALILRDIFNFTINLHRASSWG YMKPDKTFDGILGDIAKKVIDISISPFRYRPERFDVAEFTVQTLLVRSFFIFRHPSSASLRNNFLKPFA NELWWMILMVSIVYWISLLITIRIQKHYDESRSSLMVPASEATLTTVAALSQQGVSDDPQIISGRIV FLSLFIWGLLLFQFYSASIVGSLLTTPPHTITTVKNLTDSDIDCGAEDVAWAYDMFKTTPIAEESELY EKKIKPFENTPKNKYFSIVKGMQKVQKGGFAFYTESAPAYKQIKDTYHEDEICELQEIQSHPAREVT MVTAKHSPFTKMVIYGLRKIVQHGLSAHVLETWYAPRPRCPETHNSKPTAVKFEQFVPAIFLLLM GMSVSIFVLGIEYLYFYQTEDASHFHQEYSARATQTNPTPEHCTEEY DallIr KSDRILEDFVRDYYEANNVHQIIVFACWNDFDAFQLTRNVMKFDTTVSFIPIPSAVDFEKILQVNY 64a.3 YHVGVMLDLDCAESGKVQEEFSKQLVFNETYLWLLFTEALTPPTIRLRQLPLSVDTEMTVATCEG F DKFDLYDVYNPSYRHNGAYNVTYKGQWSPETGLIDVLTQYKYKRRGNFQLLPLNFSIVLTNPPKP DFETYIRTPINRQLDTMSRYHYSLVLLLREMVLHXSATPFQYKNERFDVAEFTVETLLVKAMFLFRH PKDATLRNNFLKPFTNDIWWMILAVGTVYWVSLWITVKIQIHYNESYMNRKEARGAFEIPGSES GLIALAALSQQGLSEGLQIISGRIVFLFLFLWALLLFQFYSANVVGSLLTSPPRTINTIKNLSDSQLDA GTEDILWIYDYFNNYFQIMKTPSHIELYQKKIKPSSKRPEGSFWTAVEGMQKVKKGGFAFYIDTAT GYNLAQDMLEENEICELHEMPMITWAKVTLLTAKRSPFKKMIIYGQIVQYGLMIKQFSIWYTPRP KCPESYSSKPIPVGLKEFVPAIFLLLIGVSFAVFVLIIEFLHFWR DallIr MKVVALFLLAITLIKFTKSDKILAEFIRDYYGSCDIHQIVIFACWNYAADISQSVMGLDTVVTYQST 64a.4 MNDVDVTTILRIAHFQVGVVLDFDCPFSKNILDKFSNQLHFNDSYYWLVLSRLTPISVNFLQHLRL 
TIEAELTFAVREGDTFKLHDVYNPSYRHGCDVVIVDKGKWSPGDGLSNKLTQYKYERRHLHGVTL NFTVTVANPVDVDIVTYLSSQKNRGLDPMQKSHYNLMLFLQYLYKFSITVYLSPLWGILVNGSYN GIMGDMVSNEGVDMSISPFEFDWYRLHVVEYVVPTWFTDFTFSFLHPTKSTMRNNFLKPFTQD LWWAILLVGAVYWVLLLLSLMLEQHHETGRQNINAGAVETGLTTVAALSQQGLSDSPHFPSGRI TFLSLFLWALLLYQFYSASIVSSLITAPPRWIKSLKDLTESDLEVGAIDVSFFRDWFKITNNSDIRDLY NRKMNSSVSNWPNAFMSVPAGLKKVQEGGYAFLTETASTYRIMRETYSEDEICAVQEIRPQPRN KMSPILPKNSPFKKMITYGFTKIIQSGLLAHVQHYWRGSVPECPESYSSMPTAMGMKEFSPALFLL CIGAGISIVTLLIEYFHFYLEDQRDRATRHLEELPQ


DallIr MKVFVAILIAFALIHACQSDNILAVFMRDYYKACDIHQIVIFACWENAAHLARNIMGLDTVVAYQ 64a.5 SITNGVDLRNILLVNYYRVGVVLDFDCPFGESILDEFSTQLHFNESYHWLVLSKFTTIPVNYLGRLSL TIASELTFATRADDVFKLYEIYNPSYRHGGAVRIITKGEWIPGTGLIRVQYSLSEYKYKRRADLQGLSL NFSLTLANRPLPDLLTYLSSPTNRELDPMTRSQYPLALYLQDMYNFSMKLHQATTFGYLVNGSYN GIIGDIISGFIDMSITPFEFHVPRLKVIDYAVVTWYADTTFVFLHPKSSTLRNNFLKPFTNDLWWMI LLVAAIYWLLLLLSLLLEQHHQAGTRDASLSAIETGLTTLAALSQQGLGDSPNFYSGRITFLSLFFWA LLLYQFYSASIVSSLMTPPPRWIKSIKDLSESDIEVGAREHPYFHNYFEKMTDPDCIELYDRKMKSPT KNRNGFLPVDRGFKKVQEGGYAFITESAVTYQILHDTFSEDEICALQEVRVGRPRWLAPILPKNSP FKKMIIYGLRKMVQSGLLNRLQKIWRASRPQCPESYNTKPTPMSMKEFSPALILLCIGVIISVVTLM MEYLYFYLETRLNSIRMIVEGSTEQYDDQNAEVFT DallIr MGFWAVLLSSVVLSRCVQSERVVEKFIRDYYDAENIHHINAFGCWDDMDASEFSRKLMSLDNA 64a.6 VLYTPISPHVNLHRILKVNYHYIGVVLDYDCPMSDFILDKFSKELVFNESYFWLLLTNSSSPPNDVLQ KLPLTVESELSVATSSGNSFELWDAYNPSYSHGGVLNVTYKGRWNPEDGLKNELNQYKYERRSNF NLLPLNWSIVLRSHPASDLELYLTTPVNRHLDTMSRYHYALVLHLRDFFNFTINLQIHESWGYLVN GTFGGLIGALMKGQADASVSSYQYKLERMDVVDYLVETLNVKLRFFFRHPRSNDLQNNFLKPFAI HLWWVIIAVGFFYWGIMVILKKFEIYYERIEENENTVSSTALTTIAAISQQGLSTPPTITSGRLVFFSL FLWTLLLFQFYSASIVGSLLAPPQRWITTLDNLTDSNLECVVEDMPYMVDYFATTANPHTKKLFER KIKATKKKPKGSYMPAIEGIQRVREGGFAFHINVAAGYKIIEDTFKENEICELQQIDMVGQCLTSM VTAKHSPFNEMFTWSVRKAVESGLTKRLDRVWNQQCPQCPKSYSSKPTPVSMQQFSPAIFLLLI GFGSSFLILLIEYLHYWKCSDYLSNVEDTNSVAGTDGTEEEQSYMVEGFHVADDGQAVIF DallIr MTQKSKCIVFMIDPYYRKLIRYNWAQLRVLPYYSIYVKESEEFTPPRRRVEDILSESKNDGCDAYVL 68a LITNGLQVSQLLEYAERNRIINTRGNFLMIHDTLLFDVGMKYIWNRITKVMFIHRFTVLTRRSNKTT MKEWFNLETVSYPVRANNFVKTRYVNTWHKGRMLNKDVDNPFTNKTLRLERRSLRVAIFEHIPA VTRYSRQLQKHYRDYSEKASGVEFEIMRVLSDKMHFKPNFYTPVDIEIEKWGTKDDNGSYNGLLG EAERGNAEFFLGDLHYTLRHFELLELSYPYNTECLTFLTPESLTDNSWKLLISPFRLYAWIAVILILLLG ACAFHFFALFYQNQIMPYVRNTNAEGIQMRGLTLFTDMQNSMLYTYSMLLQVSLPRLPRPWAL RVFIGWWLLYAILVTVAYRASMTATLANPVAKITIDTLQQLAKSRISVGGWSEEQRDLFGASLDPD LIEIATRFELTLKEDDAVARVANGTFGYYDNIQTLQEARAKRQLLEEMRKKKSTREEKVIDDRNLHV MSECVIYMPISIGMDRNSPLKPQVDEIVRRVVEAGFVEKWLSDVTEWSKITELKDESPAAKATVN 
LHKLHGALVALGIGYFLGFLALIIEKIQWKYFVMKDPAFDKYQMDVFYSMSRSNFINKGGIASCRK S DallIr MMREIRAIAIAWILLNLLARGRSDRIDAIIGNFITDVSSSLLVSSSFTGFFCMESDDIVKFSRQISRNY 75u LLHKIASFNDSDLIDISQIATHNHYFVVDLDCPDGADFLIKANTRRLFIAPAKWLILRDLRNQDELRG LPYLQSMVEMNEDTLISLLSNFDIFPDSEVIVGQRLNETTVQLSSLYRPNSDHSLTIENLGSWDDEG GLCLCSHDQSSRRRVNLQGTVLKTSLVMTDLNTINHLTDYQDKLIDAVTKASYMWIVYLSERMN ATFNFTVERTWGYKNEDGNWNGMIGLLDRGEINIGGTATFMISQRIGVVDYVQLYTPTGSRFLF RRPPLSYVSNIFTLPFARSVWIAIVAFLTISFGFLYITMKWEWEKMQAIPLESRLGGDLEGKPTVTD NLLVLLGAIFQQGFSCEPRTISTRIVVLMVLLIALSLYAAYTANIVALLQSTADSIKTLDDLMNSPLKI GIFDIVYNRYYFGAFEDPVRKEFYERLVKDKPAVWMPLEEGIRKVREGLFAFHVDLGFGYQMMQ ETYAEDEKCGIEEIDYLKVYDPLLVIERQSPFREIIRVGALWIAETGLKPRVASKFFTQKPPCIGSTSFV SVGIIDCYVAMLAIVYGCAISVGILLLENLWRRCIGERRSHDNTPLKTSLPESSLTRDEKSATQSSTSL QIEELFG


DallIr MSLIDPFYRYQINENKMFSAPGKWLILQESRSSFPQADHSATPTNETQLRGVFENLNIFPDSEVTI 75u.2 AQRIEDTVVKLVSIYRPNTVANLIFEDRGVWSKGNHIQLHNNEETSRRRTNIMQTQLRAAAVITN PNTMNHWEDFQERRVDGVSKVDYAFTKVLVARMNATVHFTFTPTWGYKSSNGSWDGMIGSL LRNEIDLGGTGIFITEPRLEVVSYINLYTPTRVRFIFRRPPLSFVSNLFVLPFARNVWFAIVIFCCLGYM VLYFSLSQEWKMIEKIPYEERLWGDLEIKPAFGDNFLIVIGAITQQGSAYEPRTGPARAVVFMLLVT CLSLYAAYTANIVALLQSSSDSIKNIKDLMESPLQLALQDIVYNHYYMGKFDDPLRTEFYERRIKNLK NPYMSTEDGVEKVRTELFAFHTDLVMGYDVVKSTYEEDEKCGFEEIDYLYVSDPTFIIQRQSPYAEI FRVGGLWLGETGLAQRFIDKIYHKKPECNNQKKFISVGTVDCYAAYLVIVYGLVVTFTILLCEVLWF KNFDKRSSEIDDEPENHDVASITSAAASDQTFGEELNSTILEEIM DallIr MLIVSVIAVLIFSVGVSGDQEMDKILQSFIVDVITSLYASSSFTVFHCAKPDDITEFSRFMSRHYQLH 75u.3 EVATISEAYKFRFIESPLSHQNFYVVDLGCAGVHELLIQANNTGSFVGPTKWLILQDLQTDSNATA NENFHSGATDQAELRSVFEDFHVFPDSEVIVGQKISDNSIKILSVYRPSPVRSLIIEDRGTWNSIDGI QLRDHDVSSRRRTDLQQTPLKACSVVTHPDTMNHLEDLKDKQVDVITKVGYAFSKLLAARINATV TFTFAASWGYKEKNGSWSGMIGEIDRNEVDFGATATFIIASRIDAVDFIQLYTPNRIRFVFRRPPLS HVSNLFTLPFTESVWIGISVLSCIGFVVLYLSMAWEWRIVKDLSHDEKLTGDLEIKPAFGDNFLILIG AVTQQGSAYEPRSVPARIVIFMLLVFCLSLYAAYAANIVALLQSTSDSIKNVEDLMNSPLKLGIQDIV YNRHYFGSFEDPLRKEFYERRIKKQKDIWLSLTEGIGRMRNELFAFHTELTSGYDVVQSTYEEDEKC GFEEIDYLYVSDPAFAIKRRSPYREIFRVGGIWLQETGIKERYIRIMYNKKPPCNNQKKFVGVGTIEC YAAYLTIGYGMLLTFGILLFEIIWSKR DallIr TPPYSSTIRKNGSLRGEGYAFEVLDLIAKKLDISYEIVQPRTPGLGNESAGLISLLKSKEIDVAVAFIPM 76b LWKFTEFTRYSPIMDEANIVGMMVRPAESASGSGLLAPFDTTVWICILISLLVIGPIIFLFTAFRSYL WNHTKVDKYDFTSCIWFTYGALLKQGSSITPVNNSTRFVFATWWIFITILTSFYTANLTAFLTLSRF TLPYKSVEDIMRKRVPWFFEKDRTIDNILDTLQILIEVGIIKYLEKRNLPKVEYCPLNLKSTERQLKNS DLTLTYKVIAAGFISACIMFIYEMIRRRQHVSCLCCGKSCSFCWPGIETPDDPILPPPVVLQSENNYV EKESTIRHTINHNFNNDSNHNNSNRHPQVEQISLIDESIYGAAASARKTYINGRDYWVITAPKGDK RLIPIRTPSALLFQYTT DallIr MALILIVVLIYSCRFVVGFNDFPSLMTANATMVIVIEKTFYERKILIKETSIFTREAYEKSVASFTSAAT 93a KIARERMNISGLSIHVAQDMGANLARDYTILLSVATCSSTWELFGRAKKEKLVHLAITDLDCPRLPK DEGISIPLIEPGEELPQIFYDMRISRGLDWKRAIMLFDESFEQDSIGKVVQAFCNELPKSDLGLASSS LYFLKRGKSEISTKRIIKEILAALPPRKPIDDHIIVVAAYNVISFILEAARSLRMLSTGSQWLFVVPDMA 
KYTSGNVTYLIELLGEGENIAFLYNDTNLNIRSDQSCRTGATCHVRELVGALGIALEKSLSMEIELYN RVTEEEFEQAGTTKFDELYNDTKPGGRGGTCGRCIKWTVVSALTWGNRIGSEADIEPHTLLGTGV WTPDPGYESKDYLFPHVMHGFRGKTLPVVTYHNPPWQFQMTKTEEITESTKAQWDGFVFDVF HELSKSLNFTYKIVAVETPPEINLVKSNPLKAAMSAAEKVPEKVTELVRSKSVFIAACAYTVGVYRKD TTIKFTLPMTIQTYGLLAPRPKPLSRVLLFASPYTNESWALLTTAIIIVGPILYLVHTFSPRTIDEAVKN PQEPVYIGLSSPSRCTWYIYGALLQQGGMNLPKTDGARLIVGTWWLVVMVVVATYSGSLVAFLT FPKMEAPIKNVDDVIERRGEITWSLPQDSFIEDFLTVSNEEGLVDYKRLLRGNEPHAQTHDTVSYE DNIHDVKGGKHVVIDWKSSLMISGRNDYIETGRCAFSLGTDVLFLEEPISMVVPTDSPYLGLINVQ LQRMHESGLMDKWIANRFPTQDSCSDSLMGGFEAANHKVDLEDMQGIFFILILGYIMGTVILGY EFFRQHRQLAKERKVIQPFVQ


DallIr MKFSIPLIFFITFILAFFKLNGAHYHGPLLKAVHSKYKTNGGIIVSGTGHMSFGRTTIWHEAVRMLS 101 NDGIFTVIVNLRQFGDKLKSYSGGRMSLLIVIAIDTVEELHSFESMSKDFHLSYAVWLILFSRDASQ DVCDFCRNPHDSLSNLGFGSKVLVSCCDSDMIEEWWSIGENRIERQELGRLMDDNQGILWLSEE LVNRRRYSLNGQELRIVTVQDSFSFQEKNGTYYGFLGEILKELREAMNFTVSIIYEEGYGALNLKTG NYTGYIGRIHRREADLGVAQYFIRKELLHVLSYTSPVLSGYFEFHFRKPDAVNVPWNVYLKVFTGH VWMAILSLILTSTLLLTLITYRRRSHFVPLLFENHLIVWDIYCQQALPAFPDKTPLRIVYISLALSALVT LSAYSASIISQLAVFSYSPFRTPEEFVEDGSYKIIRLRNTPSHSTVMDYELSDEKLMKKFESLLQLRDL NPRNPQEAFEQICRERVAFLAFETAKTAVNNEIPCEITSFKFGPVYHVAMLMPLGSSYMDLINHHI QQFKDNGILRRLKRKYSTVPRGNKSALAPVEIHEIAPILFMLAFAFLIAFIIFISEMNYDPFTKELHKP RRRKTRKRRAKAGHFHLRHRI DallIr MKFANSSIFSIVSIVIFFELSEAHYYGPVIKDVHDQFGRTEVIIVPQTNYFSFENIMIWHETTRILSNE 102 GISTVILNTRHFEEKLKTYNKETTRSLIVIALDTIEELHAFESTTQDLHMSYAIWLIFFTRDADRDVCE FCRNPQESLSNMKLGSRILISCCAFNTIEEWRYAGKNRTERQELGRLNRDDRGIVWFSNKQVADI HGGRYSLAGRGLRIVAVKHCPIFWEKDGHYYGVLGEILRELSQAINFTVSKIIWEDDYGAWNPETS SWTGAIGRIHRQEADLGVSDFMFSTHRSTAVAFTSHFTSAGFYLYLNKRYMARLHWNAYFKPLS MDVWMVIFGLILTTSILLNLINYTRRSHFFPLLFQHCLYAWGIYCYQALPKFPKGTSSRIVYASILLSS VVTLIAYAAAMTSRLTVVSYIPFKTLQEFVDDGSFHVIKLNVSQDFDHYKFFDQTLTKKMMSLMM PRNLLPVTDEEAFEQVCSKRVGYYLNDMAKKAIEAQGIRIPCELSSIKYGKTQILAMIMPHGSRCLD LVNYYIMQFRSNGMLQRLEHKYYKELKRNDLKYSSVSLRGVAPLLLILAIGFLIALIIFIIEQNANAFR KKLRSYQKRRTTFLKTRKASLFLANDCNFKKRLRNLKQFIGRMQLK DallIr MYLHSVIMVKLNEAHFYGPAIKAVHDKFEANGVAITAGINHLFWHETSRFLSNEGISTLIVSFQQL 103F RNIWKTQPTRTTRSLIVIAFETFEELHTFELIIKKFRMNYGVWLIFFMRDADREVCHICCNPHGTLA NLKFGTKILISCCDSNMIEEWWSIEENLPKGQEIRRLMDENFTISWFSDELINGDKYSLKGASLRIT AVTQSVFFRKKDGQLYGFLAEFLNKLSQAMDFKVSEIIWEEDFGVCITGSSDCTGSIGRVQREEVD LGVAAFSATVERHNLVDFTFPIITGNHEIYFRTYNAINVRWNAYFKPFAADVWIFIICLILMTAMFF TLIRYKRESSFFPLFVDRYLHIWGILCHQSLPAFPRETPLRIVYLTMALSALVFSSTYAASVTSNFTLSF YSPFNTVEEFVKDGTYELSFSDNKLKKKMLSLLRSEDSLPNSSQEAFEEVCKKRVAFFTHEATKRAL FNLIPCEISSIRINTMNPMSMIAPRGSKYTKIINHHIHQFKEVGLLRRLENKYIKMENDKTEHPPISL QEVQSILMILVTGCLLASIIFIIELKLYTYCKNL DallIr MKFFLLLILLGSSSLVFVNLNEALHHRPLIKDFYDKYKSDXVWIILGSNNLFEKTTIWHGVIKMLSKE 104 
GISTRTADFNAFKFMLKTVNENNMRPLIVLVMNEIEELCSFESIIGKFYTDYATWLILFTADSSQDV PSE CGFCRKPYGSLANPIFGSKLFTLCCNSKVIMKWRYSETNRSRRLEIGRLMDGNQGIVWSSDELDY NSKYSMDGRTLRVFRVRVRTXMISTLTVSSGSTFNTMDEFVKDGSHKLIVLDSTLVTDMYKVITDS SLYYGNPSRLLNEGHKGQNKCQVGQNEGHEISPLLNRIPCEIVSVQTGVIGTAGMIMPLGSKYRT ALNHLQQLKXGLLRRLEHKYLRQFERGKSGHPPVTVERVVPILFILAVGFLIAVIIFTVERNVYLFATI LRDRKRRRALSGKCKTFSFPT DallIr MKFAISTIFLSVSIVIFCELSEAYYYAPVIKDVHNQFGRTEVIIVVPQTNHFSFENIMIWHETTRTLSN 105 EGISVVILNTRPDEKLKIYSEKTTRSLIVIALDTIEELHAFESITKDLHMSYAIWLIFFTKGDDRDVCEFC RNPHKSLSDMKLGSRVLISCCTSNTIEEWRYDGKNHTERQELGRFKKDDRGIVWFSNEQVADIH GGIYSLAGRGLRIVVVKRARMFWEKDGYYYGVLGEILRELSQAMNFTVSEIIWEDDYGVWNPETS SWTGAVGMIHRREADLGVSDFLFSIKRTTAIAFTTHFTSADLYLYLNKDYTARLHWNAYFKLLSMD VWMVIFGLILITTILLTLINYTRRSHFFPLLFQHYLYAWSIYCYQALPKFPEGTPSRIVYGSILLSSMAII SAYGAVMTSRLTVVSYIPFKTLQEFVDDGSFHVIKLNVSQNFDQYEFFDQTLAKKMMSLMMPTN TLPLNDQEVFEQVCSKRVGYYLNDMARKAIEAQGIRIPCELSSIKYGKTQILAMIMPQGSRYLDLV NYYILQFRTNGMMERLEHKYYKEFKRNDFKYSSVSLPGVAPLLLILAIGFLIALIIFIIEQNANAFRKKL LSHQKRRTTFLKTRKASLFLANNCNFKKRTRNLKQFIGRMQLK


DallIr MQFFIFLILLCTSSVIIVKLNEAHFYGPAIKAVHDKFEANGVAITAGINHLSFEQLTVWHETSRFLSN 106 EGISTLIVSFQQLRDIWKTQPTYTTSSLIVIAFETFEELHTFELIIKTFHMNYAVWLIFFMRDADREVC HFCRNPHGTIANLKFGTRILISCCDSNMIEEWWSIGENLPKRQEIGRLMDENFTISWFSDELMNG DKYSLKGASLRITAVTQSVFFRKKDGQIYGFLAEYLKELSRAMDFKVSEIIWEEDFGVCITGSSDCTG SIGRVQREEVDLGVAAFSATVERHNLVDFTLPIITGNYEIYFSKYDVINVRWNAYFKPFAADVWIVI ICSILMTAIFFTLIRYKRESPFFPLFADHYLHMWGILCHQSVPAFPREAPLRIVYLTMALSALVISSTY AASLTSTLTLSLYSPFNTVEEFVEDGTYELIVLDSALINDMYKFSDKKLKKKMLSLLRSEDSLPKSPQE AFEEVCRTRVAFFTHEATKKALFDLIPCEISFIRINITNPMSMITPRGSKYTEIINHHICQFKEVGLLRR LENKYFIKMENDQTVHAPISLQAVKSIFMILVTGCLLASIIFIIELKLYTYCKNLCDQRRRKTLLGKRG KFPFSMYNFVIRKFP DallIr MLLGTSSLIFVNLNGALQRGSLIKHVCDRYNSDKVLIILGSNNLLFEKSTIWYGVINTLSNDGISTSIV 107 DFNALNLSLKTVNAKNMHVLIVLVLDTIEELKNFESIAEGLYTSYAIWLILFTSDSSQDLCEFCRKPH GNLSNPKFGQKVLTLCCHSNVIMEWRYSETNRIRRLEVGRLMDGKPGIVWSSDELDHNRKYSM DGKTFRVIGVKTSIMLWEEDGVFSGILGELLTELSQAMNFTLSKIIWEHDYGIWNPKTSNWTGAI GRIHRREADIGVSDFFMTTQRYAAVSFTSPIFFTPLKLHFKKRHADNLTWNAYFKALTIDVWVVIL GLILITPLLLTLIRYRRRDNFFAFLFEHYSYVWGIYCQQGVPVCPQGISPRIIYLSILMSAMVTLGAYS GSMISTLTVSSDSTFNTMDEFVEGGSHKLIVLDRTLVTDLYKFTDERMRMKMMSLLKPEHSLPQS IHEAFYQVCREKVAFFTVEATKTALLNRIPCEISSVQTGVIGTAGMIMPLGSKYRTALNHHLQQLK RIGLLQRLEHKYLRQFERGKSGHPPVTVERVVPILFVLAIGFLIAVIIFTVERNVHLSATISRDRKRRR VLSGKCKTFSFPT DallIr MKFFILLILLGSSSLIFVNLKEALHHGPLIKDFYDKYKSDGVWIILGSNNLLFEKTTIWHGVIKMLSKE 108 GISTRIADFDAFKFMLKTVNENNMRPLIVLVMNAIEELWTFESIVKKFYADYTTWLILFTGDSSQD VCGFCRKPYGSLANPKFSSKVFTLCCDSEVIMEWRYSETNRSRRLEVGRLVDGNQGIVWSSDELD YNSKYSMDGRTLRVVGVRMSMLLREKNGKLSGILGELLIELSKAMNFTISKIMWEDEFGVWDAE KSNWTGAIARIHHREVDIGVSNFIMTLQRYDAVSFTTPILFGPLKFHFKKRDINYLTWNAYFKALAI DVWMATIGLILITPILLTLIRYRRRDHFFPLLLEHYSYIWGIYCQQSLPDCPKGTSLRIIYLSILISAMVT FGGYSGSMISSLTEYSGSPFNTMEDFIKDGSYKIIFLDPTLINDIYTFRDVSLRKKMMSLLKPSHSLPK NIEEAFHQVCNERVAFFLADVMKKDVVNDIPCEIYSVGTGSIGTIAMIVPLGSRYLDPVNYHLQRF KRNGLLERLKHKYFKLSQRRRSNHPLVTVEAIVPILLILAVGLLITIVIFIAERHAAHVLVTKLRDRREL KVSLRKRKSFSFPMYDYVP DallIr 
MKNSVMLLLCFTGIRPIISTHTCNDVNCYGSLIAQVYNEYNTAGILVASTTTHLLFQTLIYWHEISTT 109 LSDQGIPTVMVNFAEFTERMEFYRRHPNRPFVVIILHRPGDLYFFSQITKSLPMNYVLWLILFIGDA DKDACNFCRDPHENLLNLKFNSEVLIMCCNSNIIEDWWSVTRNGTNKGQLGRWIEERNEIEWFA HKSIHRRRTSLEGRAFRISFVQDSSYIWIKDGHLQGFLADVLRELAKSMNFTISTATVEDTYGILDP GTSIWRGVVGQLQRQATDIGVDGFSRTSARRSVIDFTVPIITVDSRLYIKKPDGTNVQWNAYFQA FTRTLWAVIIAVILIMPVFLTLIKYNRRFNFFPLIVEHYLHVWGIYCQQGLPEFPDGMPLRILYVSIFIS ALVVSSAYSASLTSFLAVSQLPFNTMEQFIKDGTYGLTAVYGSEDYNTFKFSNDTVLRQMMSFMK PKGSLPKSYFEGFSQACKERVAFYTHYEITKGREMYVMPCEMVSFKTGSNQLSGIILPLQSEYRTFI NHHLQRFKTNGVIRRIAQKYNRNDEPPKTVHTPVHLRGIVPILGILIFGFIIASIIFLVERSFYSFRNKL RRRQMRKI


DallIr MLSLTSLILWTTCMASPIVFSHEVTHYGSLIKNIYDKYGRTGVIIASATSHLSFEKTSTWHELTGMLS 110 NEGISALIIDFRQFENRLKVYIEKTYPPLIVIDLDTVEALHSFEIITKNADMSYSVWLIFFSGDVDHDV CKYCREPHGNLFNLNFGSKALISCCASKMIEEWWSTQKNHTKRQNVARLTNENPGIVWFSQKLI SDGRQSMDGQILRVTALADLQIKNKRNKDLYGDAGKFLAALSTVMNFTVPNIIWEKTYGAWNR ETSKWTGILGRIHRQEADLSINSIVMTSERSNIIRFTTPIMSGVYQLHFRKLDSARFTWDAYFKVFA ADVWIVIIGLILTAPIVLTLIEWNARKSHFLPLLAKHYSFVWGIYCQQGLSDFPDETPLRIIYISLMMS ALVVSATYGASFMSILAVSSSFSPFSSMEEFAGDGRYKFIVPRNSSSYYEFKNSNLTLMKKMMSLM KPVNSLPQTFVEGFQQVCKDRVALYTHEFRKRLLSNLIPCEITSINTGKMETVAMIVPRNSPYIEPIN HFIQELTFEGIFRKLVKNSNPQHYEYGFQPAHLQGITPLIVVWISGVLIASFIFIFERTSYLSTQESHIR NRRGTKNNAKHPH DallIr MNFPTLVLLLSTLNAISTIVSDKFSDFNYYGDLIKNVHDKYKGTGVIIASEMDRQPFERITIRHETTR 111 KLSNEGVSTLFLDFSQFENRLNVYMKETDPPLIVIVLDTIGALRSFETIAKGLDMGYFVWLIFFSRDA GQDVCDVCCHPRGNLFNLKFNSKVIVLCCDSKIIREWWCVRDNRTWSQEIGHTNNQEITWVSD ELLRDRRKSMHGLVLRVAAVKGTALFLERDGKYYGYIGEILAALSEAMNFTVSQIIWDNDYGYWN RKTLNWTGVIGRIHREEADIGVPDFLITDARYNAVNFAYPIMNGAYQVHVKKLEVAQVAWNAYF KVLTVDVWMVIIGFILITPILVTLIEYRKRECPLFPLLLEHYMFVWSIFCQEGFTFQIEMSGETSSKIIY VSLSLSTLVIYTAYGALMTSILAVSSSFVPFATMEEFADDGTYKFIVLNNTLFYNTFKFSNNTLMKK MMALMEPTHFLPQTHEEGFKQVCSGRVGFWASEKIKKIYDNLIPCEITSVGTEMKESSTLITPRHS EFMSLINYNIQQFKYNGMFQRLEKKYWKQSQQNEKSLSPVHLQEISPFVFALFIGGLIAFVILVIER NSYLSVKKVHNRKSKQVRKNRTSKFPAYSRL DallIr MKCWIVLILCFTATQPFPLTNEKGKINYYGSLIDIFYHRYKPSGVIILSPEKDYSFESLTFWHAISNEM 112 SKRGITTAIIDSQSFRDRFTFYTAQSVRPLVVILLGSMEDIYTFGKITMNLYMSDIAWLVLFSGDSDE NACSFCHNPFGNLLNLKFNSEVAIACCDSTIIKEWWSIGNNHTRVGQLGRWIDQNRGIQWFSSE ALLQRRRSLEGRAFRVCIVKGSNDAWEKDGHFHGPLGKLLEALSEFLNFTISTVIIEDNQGYWDSD ISRWTGVIGRLVRGEADIGLAPFMMTAERLKVIDFTVPIYDGFSQLYIKKPDTIVLHWNAYFQVPTI SRYFSPTVICLLSQLIIYLQAFHVNIWVVIIGSILIMPLFLTLIIYRKPLFFVPLIFENYLSVWGIYCQQGL PVFPAETPSKILCISIFLSALIVSATYSASLTSFLAVSSSYLPFHTMEEFVEVGTYKLTSIKDSSEYLMFKS SNDTVMQKMWSLMKPKESLPANEEQGFYQVCNEKVAFHLTTGYRKEYLNTATTCKITAIETTRIQ GSPLVMPFRSEFTAFFNYYIQRYKHSGLVSRWIEHYHVEQELPRAIYPSVRLEGVIPLLAALAGGFII ALIIFLTERILHRRRDKLRHKKIGKLQLITFQRKQFPRATFHKNLGFRY DallIr 
MKLFISLELYLTGIYVFFVAGNDTVVYYTSLFESVHDTYGTAGIIIASSTNYQSPARLTTWHETCTILS 113 DKGIPTAFVSFANFKVRLKFYTRRTVRPLAVVLMRKIEDVHIFEGIAKKLDMSYPVWLLIFTKDADE SVCEFCRHPHRNLFNLRFNSEILISCCDSNVIREWWSKARNFTHATKIGEWIGEDRGIRWATNQS LYSRRHFFVQPTVRVSIVRGSSYIWEKNGELDGYLGEILKELSLTMNFTISALIKEPYYGSYDPKTSK WTGVIGRIVQHEADMGASEFTLSHERINVVDFTIPIAIGDCRLYVKKLDGARLQWNAYFGAFKAD VWALIIGSIIVTPIVLTVIKYTKKRRHAFSMAIEHYLYVWGIYCQQGLSEFPDETALRILYVSIFISALV VSAAYSASLTSFLTVASIYLPFNSMEEFANDGRYKLIVLQDSADYDMFKMSNETIMKKMMSLMK PSHSLPQTILGGLQQVCTRKVAFYTNEALKRTLNKNLPCDIVSIKTGKIETLGMIMSRHSEYMGIVN YHIQRFKDNGMLMRLQHKYIQQEDTSEGALLPVGLGGIAPILFVLIIGFFVAFCIFLVEIIFPPISDKLF KRKKRRLNGQNFQY


DallIr MKVCMTLMLFFGAASLMDSSNGLDTIDYGSLIKNIYDTYGTAGIIIASSVNYQSFSRLTVWQRVTR 114 MLSEKKIATGLVNFEQFKDRLEFYTSRTVRPLVVILFGKMEAINRFSRLAVAIDMSYPVWVFLFTAE TNSDVCKFCHAPHQNLLNLRFNSEALILCCNSLIIDEWWSKTENHTNTRELGKWNDERNEIEWFT ENALYSRRSSVEGRKFRVAIVKGSGYIWQKNAEVFGFLGGLLKELSRSMNFTISSIITTIGYGSWNP ETSKWSGVIGSLRRNEADMAISEFSMTHKRLDMVDFTIPIAVGYARIYIQKPNGAHVKWNAYFKA FATDVWVVIIGLLVTMPLFLTLIKYKRRKFSLFPLVVEHYSYVWGIYCQQALAEFPAETPLRIVYLSIFI SSLVVSAAYAASLISFLAVSSSYLPFNTVEQFAADGTYKLVVLKDSSDYDTLRTSNETIANKLMSLM KPYYQLPQTLEEGFQQVCNGRVAFITNEAMKDAVLAPMPCEIASIKTGRIETLGMIMPLRSEYTAL VNFHIQRFKDNGILERLKQEYFRQEDSPELSHPPVDQWEIAPILCVLTGGFFIALVIFLMEHIFHYLR NKFRNYKIKKFSCRKDRHVHFAREAVIP DallIr MELPVAVILSFITAYWSTLTTGHVHITYYGSLIDEFYKIHDADGILIVSSTNYHSFGSLTFWHEMSRK 115 MSNRGRAADMVNFQELTPTLQLYRRQNVRPLFVIFIKNMKEVHFFGRITKTLNRSNSLWVILFSG DSSGDACEFCRHPEGNLFNLKFNSKVIVACCESTIIQEWWTSTGNRTHIGELGRWIDESRGIEWFS DKSLYERRTSLEGRPFHINIVQGSTDIWQKNGDLHGYLGRVVKALSQFMNFTISSVTIERSYGRWD PDASEWTGVLGKLHRNEVDMGVSSFIMTNERREVVDFTIPTVYENSRLYIRMPGANTVQWNAY FEAFGMDVWLVIIALIMTTPLLLTFIKYKGRSFIFPLVFEHYSSVWGIYCQQGLPEFPDETPLRIVFIS LSLSALVITMAYSASLTSFLAVNVSHMPFTSIEEFVKLGTYKLIAIQDTADYTLFKDSDEVLMKKMA ALMEPPGFLPATHHEGFQQVCRKSVAFHTTHEIREGYTSPFIPCKLAWIKTRKIQASGIIMPLNSEY TAFVNYHLQRFKYSGLLDRWKREYHYRIKDVPEIYHPSVQLKGITPILGVLTAGLLIALIILLIERIFHR HRDQLQRKKIRTSEVEMLQRMLSRPQSSPSWSRDSNDFRTIYRNRQKKLAQAWIINFE

DallIr MNLFVFLLYISIVMSTVNFLKAKKRVTYYGSLIKDIYEEYRTGGIIIAMPDNTRTRFARLTRAYEISRSL 116 SRDEIPSIAVKFETFKERLASYTDRIFRPLVVIGFYKMVEVYTFQQIAKDLIMGYPVWLIIFAENADA DVCEYCRNPHGNLFNLKFDSEVLVLCCDSGIINEWWSTTVNRTERKEIGRWIGEDPGRYWFNHG SIYGRRKSLEGRELRVTAAKGSAYIWKENGKYTGFFGEILNELSTTMNFTISEVTPTDGFGSLNPETS EWSGVIGRIHRNEADIGVSPMAMTHSRLNAVDFTIPMFSGKSRLYVRKLDGARVQWQAYFKAF AVDVWMVVIGLILIMPLFLTLIRYRRGNHLLSQVMEHYSNVWGIYCQQGLSRFPDEISLRIVYLSIFI SAVVSASVYSASLVSFLTVFSSYSPFNTMEEFASDGTYGLVVLKNSAQYDMYKNSKDAFRKRMMS LMAPEESLPSSLAEGFRRVCQERVAFETNEAIRKTMAHSIFCEITAIDTKTIETFGMITPRRSEYLEFI NYRIRQFELNGVFRRLKNKYFTKPRENKIDYPKVHVEGVKPILLTLASGFLITLIIFIIELLFSWIQNKLQ RREKQRVIQQQTKYFRFSHRAKSEIYFIL DallIr MILFKTFISAALVVFIKANEEVYDHTPYLITDIYETRETCGIILAAENYHSFKTLIIWHGLSRTLSDEGIP 117 TLMINFKQFEERFEFYTKRTVRPLVVIFLAAMDEVYSFSKIAKNLDMSCAVWIFLFNGNISSDICEFC HSPRRNVFNLRYGTEVVVSCCNSTMIEEWWSVMEGHTASLELGRWTDGNHGIQWFYNDSVYN RRHSMEGQEFRIAAQSVYFWETNGKYYGYLGEFLGELSQSLNFTPKVIWEESYGTWNPETSRWT GVVGTLERNEADLVLSELRMTNERLYVMDYTIPIGVGGTRLYSRKLDAARLHWNAYFQAFTVEV WMVIVGLILTIPILLTLITYKKKDCYLLPLVFERYLCVWGIYCQQGLPEFPKEASLRIVYLSIFISALVSS GAYSAALISFLAVSSTYSPFNTIEEFVEDGSYQLIVLKDSPDYDMFKNSNQTLMKKMMSLMKPIDL LPQSFQEAFNQACTQQVVFYTHEAIRRAMANRLPCELTSVYTGKTANLALALPRNSQHRHLVNF QIRRFGDNGILQRLGNKYFSEYHRNEICYPSVHFQGIIPLLTMLAAGFLIACIIFIIERICSPPKKELFDL NRRRRC


DallIr MINNNGKNNNSLSYKKGINFSISLILLETFFSAASVLLVKATEESDYHTPLIRDIYDQRGTCGIVLASA 118 ENYQSFGTLTIWHGIFKTLSDEGIPTLMISFEQFEDRFEFYAGLTVRPLAVIIFVTIGDVQSFSEISRDL DMSYAVWLLLFMGDASPDVCEFCHSPSGNLLNLKYDSEVVVSCCKSNIIQEWWSTLTEHTNSLEL GRWTAENRGIEWFFNDSLYSRRHSMEGQEFRIAATSVYFWERNSKYYGFLGEILRELSDSLNFTIP QVNWGTSYGAWNPETSSWTGIIGKLENNEADLAVSEFRITQERLNVVDYTVPIGVGGTRLYLRKL DAARLQWNAYFKAFSMDVWMVIIGLILTIPIFLTLMRYKRKHYYLLPLALEHYLSIWGIYCQQGLSE FPKPTSLRIVYLSIFISALVSSGAYSASLISFLAVSSSYSPFSTIEEFVEDGSYQLIVLKNSPDYYMYKTSN QTLMKKMMSLMKPTNLLPDSYQEGFEQVCTKRVVFYTHEAIRRAMVNLIPCEITAINTGKTETLG MALPRGSEWRGLINYQIRRFGDNGMLQRLGYKYFTEYNRNELRYPSVHLQGIIPIVAMLGAGFLI ASIIFIIETIFCSSKKKSLNPRRRKSF DallIr MRKLIILLCISAHTGQSYCQLIRPNVIYESVIKGVHDYYNNTCIILLHATEDPIESQEESENLQRLQAYL 119 SKAYIRTAVMQISTFIDRVGGSYYHIKRPLFVLLNDDDDVRNQFAFEIAPWIDMSYPNWLVFLRPE TSIEGFFDKIYVQFDCTMMVSQPDGNLESPGEIITEVYQIDRGEKLRTGLFATWSRETGIKLPRWSL YQRRSDLQGHLFRVMSIEDPPQSMIRRDDNGQVTGLGGFFGGLMDLLQESMNCTLVYLETNE WGYLRGNGTWTGAVGSLIDNTSDIVAAELIMTRDRVDAIKFTTPVYSTKIRTYIKRPSLSALKWGA YFIPFEPSVWVAIVVMIVITTATISLVNSAISLFARQWKDNDDCPTNVPDIFFAVFGVFCSQGMSA SILDPIRISHFVIHLTGVIILAAYSAALISALAVKTFVLPFTTMDGLLKDNTYRFGVVRDSADYSFFQN TTDEILGVLFDKLLVKEKELPNNYLEGLSLVCNEDKYAFMTVDNAVTQLQSEVGCVLVPLDTISQTS IAFGLRPGSPYRGILDSHLLLLRDSGVMQRLLNSHWAMTGDNVEGGWESVEIGDILPLAVMLLT GIFMGFMILSIEKLVKRNAKIIKKEKKIVKKFLKNAKFLSGQLNHVKKSNK DallIr MKLLPGFSVFSLLRVVGSIASGESILVWDSENADFIPIWHFSLFRDLIMKNRNNSLGVEKARLQGQ 120 TLRIGYHSEVNLMTFENNGTKISGLLGDLWTMLSDLLNFTIEAVEVPEAKFGAQSSESHIGLMGLL RRNEVDIIPRVAFYRNTNEVMDYSTPLWTNSFRVFVRPQFDSDDSWIFKTFPWPCWISIITSIALFS FFGTLFDRLTAARFQRNSLRHLLLEHFFYTFGTFCNQGDIPANMERSRLMAFSRRTCAWLIISIFST SLVASMTHKEMHLPFTGIASLLAKSDFKLVVNNASLGFSKFHDLILPNFTSPKYSRRFEFTRIAEDM YRKGCGSSGKKAIFESEDRYRAWATRTCTFIPTQETYFSTWITTGMTKGFEYKRPIDNGILKIIEVGL LTALKDRWLIPPLTWPPDKYVVVGMKKVYIIFVVLSIGVMVSLMIFTMEHIVVIHRRYYLRRKWDK QLRKRARRLKVAWSRRNPFEEKGILEDRNAFLL DallIr MAIKTREFTIIFLFFVILNGNQGQMHPPPSPLTLQLMMEYAKFRFWEQIVLFDDLSNGNNEILYYA 121 RPLISCLSDQGMSISIQSTLTNKLPEALNIRRHRVGAIVLLDRLNHTSAENVLKTASSKRLFDYYISWL 
LITTDSNDASIDLILRNLTIGINSDVVVATSSASAYNVRKEIFNYKNRQFREYIKTYNFKWENSNEVIP ENVTRNLEYSLMENRSISFYLVHTYKIRINDNSSLVVDPLGYWNPGAPMLKLPINVALRNNFFRLP MIVGILNGTSDNQNGEITSYEEEPSEDQPMNDFIDFLAHSLNASLEIVPHEKLGTLTNKVWSNLLG DVYTGAVDIGLGYITTNEDRRRDMSFTHPLIRYTRNIYIRPPESGTMRDIFLQPFNNHLLLCVALM QFFIIVTIGSINYAANNVLSKRKGRQTGIGEATLWCTSIMCMQGSPWNPSTLSGKTALLASLIFALV TYNAYAGFITSILSVQATGIKTLDDLLHNNFKLGYSDVDDEYMRNANDSALRQLYIKAFNGRESRLS TSKGLQRAVGGRYGFFASATLARRALRTSLIHERCLLKEIEIEHTFTTVALPMAQYSPYEKIINLSILK MDERGVIDKIRQRMLPDMPRCQDATTFHSARIADVYSAFIILAIGIITSLFLGVIERLWNQRKMFLA KIIGRVTSRKSVEAPKNHSVNHNFHGASVTWWRHSVGKNSRLHQGKRFHRAFKLGTFPFHH


DallIr MCKYFIILLHVISSFTSGERGDIILDIDRGLWLNDRDFEMMIKYSYHFSTCCNIFVNGTTRDIGTLFN 122 LFIKIYQYEYTTGSIEYGCRGFFLLGSTGETLALAVGRVPTAVSTTEILIVVDADLRDDSPLLNVSLFQ HSNVNIIARSGNWTLSENFLQPRMFKKVHRSSQMRHKTGIVDLEGRRLQVTTSNIAPFSYLSTTV NRTVNGIQGQFFVANDERELDGVEVKLFLIMAEKLNFTWMMRKPNGPYRHGRPNGTSWNGG MIGQLYRKEVDLAFGEVWLEYEKSQYVNLSVPWYEVWINFLVPRPKPTENVWALAKPFRLNVW VAVIAIVILESIAVWGKARINSKLPPRFRSYVNTLIEVIGRLVGTWAPRKTQGIRIQLQFWHFAGLLV VTAYSSSLAARLTTPDYEPRIDTIAQFVKANLTWGRERTPPNYRHYFDLNDPYAKQLSNGFLIETSQ EDRQSKILEGNYAIIGKISHSIFFPEVNVCNSDLANYRVMRESTAKFFISFGAQSWLVPSIDTMMRR LTETGLVEYHLRDVIRRRVNGSLRDVFIEHDGENTNGPRALKLKPLGAAFIILLAGYVVATIILYFELK NKNKIDH DallIr MWKCVVLLGLTGNFVGGDRNVMIDFQRASWLDDQDFKGMIDYSFGGSRCCNIFVEGVTDGV 123 NALFHQFIKLYGHDYTVNKINRKCNAYFLLAEESTSLISAVEKVPTTIALTEILIIFNAEIEKNSLLFNASI YENANVNLVSRTGRWSLSEMFLLPRVFKKVSSNSDMKHNKGIVDLEGRELQVATFYVPPLSYLST SENRTVNGIEGEFFSSNDTMEWDGVEVKLFMIIAKQLNFTWMIRKPNGGYRYGRAMNSTWHG GMIGQLFRKEVDLAFGGIWFMYDPFRYVNLSVPWYQVSIHFLVPRPHPIINFWALTRPITLEVWI AVAVTIAMQSLNVWFKAWINPKVPSRFKSFSNTLTELIGRLVGSWAPRKTVGLRVQLQLWHFAG LLIVTAYSSSLAARLTTPDYEQRIDTADQFLKANLTWGREGPIPKFDDYFEEQYREKMRERFQSENS PDERQSKIEQGNYAIVGKIIRSIFFPENDIHSSDLHNYRVMKEGFGKYYVCFATQPWLVLSIDR DallIr MKTPLLRGALKFLVIMMCTGPFEAFHIPEPAWSADLMSYIKENYEHYRQVMIITCKDSGVPFENY 124 WIRKILHTAMATFPTIRINVDFSSNINEEWSFHRTDATATLFIFVDDSDTWRHHQPKDVGKVVPQ QIISIMKDLSLNKKIAKYLALFLSSEQTLNFDDLLRHAWEMQMIDLTIIEVIGCQSVKTTILNDCVDD SSLPVIHHFNPFLDSIIRKTYEPGMKLFPDIMKNMHGYPLKIGIRHHPPFSSVSWDNNSNYESMSG LDIQLIHTMAESMNFTLQILPQLMTFEEMQDNSSNGLFNFLRSDKIDIFASPHPHYTEDMEEHSF RSEAFIRDQLCAMVPLRKTVRILLPKTVTETLILTIGIVLIFWASIWLFRFSQSWSIFTIVRVLFGIPVF AHHARLKPAQRVIIQILMIISLLYSAKIYASLTNINIDVVGAVEIETLDDLDQSGLIPKIHPHLMDKTF GHVNKNDQTLMNLKRKTVSMRSMMKCLSEAEKFKNTTCLMTTVEGYWFIRSTYRHREPALKM TQACFWSDSYAFLFREGSPYRNKIDNTFRLLAEGGIPIIWARNDTSGNFEEQRKDNNVFHELQVP PPGVLRDQLTVVLLFGLTIAIITFIGELLWYHRIKRWKSV DallIr MKTPWQCGSIEHVVIVMCIGKLGAFNPLEPVWLKDLLHYIEHNDDEYHQVIFIASEHTGTSFENY 125 WIQRIQQTIMATYPTIRLNVERPSDADEWSYHGIDATSTLLISVDDAEVPRLDISKQTISVMKKLSL 
NKKRAKYLIIFLSSERTQNFSILLRYGWKIQMMDLVILEIVSSRREMITISESSIDDHSSLVIHYFNPFL NLMVRKPYEPGIEWFPNLMSNMHGYPLKIGIRHQPPFSEVTWDEKANCVSMSGWDITVIQEVA AKMNFTLQILPQLTNFSEIIEDNSSYGLFNLLSSGKVDILASVNPHYTEDMEENLYRSEMIFRDQLC AVIPVKNTTRILFPTEIIEAFISTIGILSIFWVSTLLFKFRRSWSVFDIFRMLFGIPKHINHSSLKLAQRL MIKIILIVSLFYCVRIYACLTEIIIDVDHEVEIENFDDLDRSGLIPVVSAYLLNRTFGNVDENDRALINLK NKAVATTAIWDCPSHADRFKNVTCLMTRNAVQILMKSTYQPGERILKISKACFWSDSYSFLVRSG SPIRKRMDYIINLLSEVGLKIMWFRNDTRIKLWEIDREDELIWDEMKYHPTSPLREQLTIVALFGFT TATITFIGELLWYHWLKKWKKTIKLFR


DallIr MELYHFIVVLIITRALTCSGEFNWIEGIEDYASKHENLHQVIFVSESEETLRIPGMSELFRRIAEHRPII 126 QISANNSDMFNKMRQDTASTLFIYTHTPFGRGILPSTKIIDAMMEASFGRTAVRYLIIHYSNSKND YLSETLKHAWRRQILHCTIIELLYNHKKMMQEEIFQVLVHLYKPFLNQFITANFSSQTELFPNTPND MNGYPFKIRITHDPPFSTIRRQLDGTTKLKGANMRLIDTLAKAMNFTVKAEEARVSKNDRLDTFL QPLMNREMDIYAHLYAHPSEHAELRSLRTEPIDVEHLCAIVPYVAKKNSQLPSMGTICGYLLVLLIV VLFWILEHFTHLNSHYWSPWIVIKLLFGVVVVKRPNRCADRIMFGVLCLVSIVYLTKLYSSLTGGIVY TNEVNKWLTLEDLVNSDLIPFISPLYYNKTFSYATGVELKLKEKVKMISSNMMNCLNHLAEHKNIS CIMSKNEYKVFQQYETTRRTELLVPCLIVDSSAFSLSKNSPYHEHIDHLIRIFRDVGLKNKWYDIGFS RRQNITQEESYDARDVLLEKNDFFQQCIIILTIGYTLATIALLGELIYYHKCEKKHSSVSAFWRKLKLILI VFCFYDCRNGNGDHRLICFLRGEPLHYEGCRIKFC DallIr MFNRKISLVIFVITILSDDLQGRPEGDWFESVKTRLSNPHQVYSAMIIRDNNTDDFDVLRDELFGKI 127 IESMPSEIFSNLDGVHNSNNWYRNVSSRMASSLLFVYYHRYSGDNTRVHEVINTMRGLSKLSAKC YFLIVLRTSSVSKEGVEDMLRHSWNNTMLNILICEIRRDENSSVTNWKVPNDESNRVIIHLYNPFL DKFYHKKFFPKLELFPDFASNMYGKKLNVLIVQQVPLTFVKWDKNNEMTEMSGANIALMQTLA GAMNFTPVILPKWNGTGFSVKTRTFDAWKFFKTTNTDITANLEPHFTEHISEESLRFRPIMAQEV GVLMPVEYAIDKKYQDNAVESSIITLIIMMIFWFVGILLRLDKKIWNFSIIICLLFSFSVPRQPSKAYE RILFLSLSMLGFMYSNQIYASLTNVAVLSLEEREFVTFEEIDASGLIPIVPIAHFERVFRNAVGAELNL KKKSQIIVNIRDCPKMAMVYRNVCCIMYNTEGEYYRRLSRQDNGQYRLKFAKPILRSDNGAFGVR DTSPYKTKINYLIQRCFEAGLISKWYLEAISPSRSGNILNDDKPSRGERSATFRRQLMVVIVFGHSLA TLVFFGELLVHRLRISHSNIDKSRLNLRKLWKMKTK DallIr MFITKTFLVIFFITILSNEFQSRPEIDWFESAYPHLSDSDSPHTALIIHYKGAHDADAIRDELFEKIIQT 128 MPSASIEMNDEGIALKEFDFGNVSKLSTSLLFIFHTEYSGDNVQISRVMHSLNKLADFSVNCYLLIIL RTSSNLNRRLEDILHKAWKKTIMNIAIVEIRGINCNRLTKLSTDNQQFHDCEGKKYSAFSRIYDDSP KRVIIHQFNPFINTFYHKKFSTRRKLFPNFANNMHGHDLKVRIINQPPFAAVTWNENNEMKKMS GPNIVLMQTITAKLNATPVILPNPKSMEFWLSNFTVEDLINESLNSDLTAHLCVRFTGHILEESVRS RHIITHELGVLMPKERGINKITLYYAIESTALILVIMLTIWFAAILLKFDKKFFELSRIFRLILSIGVDYQP TRSSQRVLFLVVIMIGFMYSNNIYASLTNLGVDPLVEKEYKTFEDIDNSGLIPIIRAPVFRKTFNDAV GAKLNLKKKSIQVIDSSNCLRMAMVHRNVCCVLFRAEAELYERLSHRKDSQYQLKLSEPIFWASD AAFRFRQGLPGKKTINEVIDRCNEAGLINKWYWNMEGSADPNNNTNDDLHAQDNAVQTQLLI QLIVVIAFGHLLATLVFIGELLAHRFKNSSKGKRKVRFCVY DallIr 
MFTIKYLVIFFIIILPDELQSRPQVDWFESINTLFTAHKIHTALFIHSNNTDDSDIKDGLFAKIMESIPS 129 VSFEMDHEGIVPKDLDIGVLSKLSTSLLFIHYAKYSGNPVQIYRVMHSINKLADFSVNCYLLIVLETSS NSSEEIEDILQHAWNKTILNMAIVEIRLAANFNTSTTVIAEISKFSDEANRGLHKNIASRVIIHQFNPF IKKFYHLEFSSGKELFPNYAKNMHGHGLKVVIVDQPPFASVTWKEDGEVGKMSGPNILLMQTITE KLNATPVILPKPKNMRFWLSNFTVEDLINESLNWDLTAHLCVRFTEHILEESVRSQNIITHELGVLM PRGYAINNKIINNILESSALTLVMMLAIWLAAILLKFDRKFWKLSTIFRWIFSISVHHQPSRNFERIFF FVVVMLGFIYANNIYASLTNLGVDPLVENDYETLEAIDDTGLTPMIRNPLFRRTFKNATGVELNLKR RSIRVMEITDCTQIAILHRNVCCVLFKSEAEYHKQWSRTKNGENQLKFAAPIFWSHHAAFLFRQSL PGRKNINKVIKRCSEAGLLSKWYTHMTRGFGGIAEDVAHVTDDLHTLRGQLIVVAALGHSLAALV LIGELLTHRFRNSFKKKCRRTKKSH


DallIr MPIAKILLLIFCITILLDKTLSRPEVDWFESVSTRLLRSGDIHTAVIVHDLDTDDSDVLKKELFGKIMG 130F NMPCVTVNVDDDILHLKKLDFGVVSKSSTSLLFIFYADYSGDNVQIFKVMHAMNRVADFSVNYILI ILRTSSNLDRGVENILQHAWRQSMLNMVIAENLNSRIDEDSSIQVIMHQFNPFIKKFYHDEFHPGI ELYRNFAKNMHGHQIRVTFSDQPPFTYWKAPDLKMGIRKRNNNAMTGPNVHVIENIAQALNFT IAYAIFIEDKYEWESRPNNYVNSINTVLYLLKKSATDIFANAMVLTTDETLDESLRSPLIMTNELTLL MPSDFKFNKKLMNNAILSGVLTLAIIILIWFATIFLKFNEKMWKFSVIFGLIFSITITRQPLKTFERILFI VVVILAFMYSSDIYASFTTLGIDPLVENKYETFERIDDSDLTPIIQYLLFNKTFEGAAGAKLNLKLIKXX KKTVPYFSTRFCPEMAMVHRNVCCIMTRIEAEYYKSQSHTNDGQYRLQVAKPSFWSDTGVFHFR WNLPGKTNMKKIIENFREAGILNKFYATPYMHPQNDQNDNAYEMDEQSQIHLLRQLVMLAVIG NSLAVIVCIGELLAHRFKNSVKGKVKLNLCGVFRRRMKQIQTYLLLTYRTLNVQVVLLLRAMRKYKI QLLDRRRWPKWLKSFSRTADRQAQI DallIr MSITNFLVVIFFITILTDEIQSRPEIDWFDSVATRLLHSRDIHTAVIIHNRNNSEVSDAVRDELFGKIM 131 GTVPSVIAEVKDGLIELEEMDFGVVTNEFTSLLFIFYADYSGNNEEIYDVMYAINDLTDFAVNYMLII PSE LRTSSNLTRDIESILQYAWNESMSNILIVENRLVTNSKMFAKESADNSQFLGCGNLQCYGVNRKIV EDNSVQIIMHQFNFFTKIFYHEKFQPGMELFPNFAENMHGHKIRVKIWKREPHLIQAMARILNFT ISTVVGVKNHLSFKARATVYKVS*LDIFTFGMLTGRASQLSATKCIVMKNLVLVMPNEYVFNTNLL SNTFASVIFIFTVIIVIWFVTIVLKFQEKTWEVFKIFSLIFSIGISCKPSKSHERILFIFIFILGFIYSNDIYAS FTNFGIDPLVEKQYKTFEQIDDAGLPIIVDHKLFIEEFYAGTVEAKLNLKKRIIFKDIPCVKMAAMHG NVSCLILQNSAKYVEYKDRETSGQTRLKIAKPIFVSLPEVFLFRYGLPGKKKMNDIIERCRETGLLFR WYTSKYLKTSSQSDKTVIANQMEKETKTQFLRQLIVLSLLGQLLAIIVFLGELLAHRFRNSPKGEIKL DLRGIFKRKIIRREWYAINYCLNLTR DallIr MSITNFLVVIFFITILTDEIQSRPEFDWFDSVATRLLHSSDIHTAVIIHNRNNPEVSDAVRDELFGKI 132 MGTVPSVIAEVEDGFIELEEMNFGVVTKEFTSLLFIFYADYSGNNEEIYDVMYAMNGLADFAMNY PSE VLIILRTSSNLTRDIESILQYAWNESMSNILIVENRLVTNSKMFAKESADNSQFLGCGNLQCNRVNS KIVEDSSVQIIMHQFNFFTKIFYHEKFQPGMELFPNFAENMHGHKIRVKILKREPHLIQAMARTLN FTISTGVGIQNNFSFAARGTDYNLS*LEIFTFALLAGGGAPQLSASTYIVTKNLVLVMPNEYEFNVN LLSNTFASGIFIFTVIIVIWFVTIVLNFQEKTWEVFKIFSLIFSIGIPCKPSKSHERILFIVIFILGFMYSNDI YASFTNFGIDPLVEKQYKTFEQIDDAGLPIIVSEILFIKEFYAGAVEAKLNLKKRFIFKDMRCVKTAAV YGNVSCLMLQDRAEVTEYDDRKKSGQHRLKIAKPIFVSLPEVFLFRPGLPGRKKMNDIIERCRETG 
LLFRWYTSQYLKTSSQSDKTVIANQMEKETKTQFLRQLIVLSLLGQLLAIIVFLGELLAHRFRNSPKG KIKLDLRGIFKRKIIRREWYAINYCLNLTR DallIr MSITKTSLMIVLIAILADEIQSRPEVDWFDSVTTRLFRSGEVHTALIIHQRSTDYSDAVKNELFGKIM 133 GTMPSVIVEVDGGVINLKEQDFGVVSVKSTSLLFIFYADYSGDNVQIYETMHAMKNLTDFSVNYIL IVLRTSSNLTRSVETILQYAWDLSMSKILIVEIRRLNSKKRNEITADNAKSLGYGKLKCYRVNGKVNE ASSVKLIMHQFNFFSKQFFHEEFHDGVKLFPNFAKNMHGYVMKSIPNHNSRLRYGQHKNEYQM FPLLFLRTELYKDMSRVLNFTTKELLLPSKWNRNSVDALIDFDNKVGKIDIMGPYSAASLPSTKMKI FFLEEMEDRDLVMVMPTTYTINQNLINNAVASTVISFVIFALIWLVAIILKFDNKTWSALAIFSTLLSI TVPNQPLKTNQRILFFTVVMVAFMFSNDIYGFLTNLGIDLFVENEYETFEQIDNLCSTPMVNKYNF YPAFKTADGAKFNLKNKSVNESIVSCLKQAEQYRHVCCLMDRGIFEVLTFWYGRTIARYQLKIAKP VFAVQTECMWFRHNIPGTNEMIEVINRLRQAGFLRSRYSDPDSHSRLRALLQRTDAMDKRRTEE SRFQLLRELIFLCALGYFFAVGVFIGELLVHRFGNSAKRKMKLDLPSVPDRKIERREW


DallIr MFVTEIFRVICFITILLDELQSCPEVAWFDSVSRQLTDTYHIHTTLVVDFSNTSSFDFMKDELFGKIIT 134F SLPSITFSMDDQGIEKNGLHFGDVSKVSNSLLFIIYAEYPGNNVQMYKLMDSINESADFSVNCYSLI ILRTSSNMHRDVENILQYGWSKSMLNIVIVEIRRETDSRSTTKVSADNPKFLRYQNFNRKEDHSKIC GDNASQLIVHQFNPFINEFYHKKFSPRLKLFXAKNWHGYPLNVAVINRPPFAFIKWDKNNQMKE MSGPNIVLMQTLAKNLNFTPVIQSRLNTVEFWESDFTSPDFIRLFEIHGVDLVANLAIHYTNDTME ESVRSRTIVTQKLGVLMPKRYYISKNHLHNAFKTTALMIITIVLLIAMFILRPDKKIWGVSCIIGLLFPI GVYHRPSKALERVLFIFVIILGFMYMNDIYASLSKFGVDCLVEHEYSSFEDIDDSNLTLRIQAPLFKTA FRNAVGAELNLKKKTTKFADNIDCPQIAMVHSNVCCVMYETEVHFYKQWSHAKYGHNGFTSAK EFFWSGALAFTFRQGLPGKEYINKFIGRXPHEAGLLSKWYPDSRSSMQDEGVRNEKSQTHLIEHLI MITIFGHSFAVIIFMGELLTYCLAKKIQEKKRAQCMRCI DallIr MCIIKTLLVIFLITILVDEIQSRPEVDWFDSVATRLFRSGEVHTVLIIHQRIADYSDVVKNELFGKIMG 135 TMPSVIVEVDGGVINLKEQDFGVVSKNTSSLLFIFYADYSGDNVQIYEVMHAMKKLTDFSVNYILI VLRTSSNLTRPVENILQYAWDISMSNILIVEIRRLNSEKRTVASADNTKVLGCGKLKCYGANGKFNE ANSVKLIMHQFNFFSKQFFHEEFHNGVKLFPNFAKNMHGYVMKSLGNYIWPLRYGQHKDKKKS FFVRKKFELYEDITRVLNFTTTQQIFLRGEWKGWRGNYTYGALLSYYNKTEKVDIMGPLSSMLLSD LHIKPFILEAMKDQELIMVMPTTYAINQALINKALASTVISLIIFALIWLVAIILNFDGKIWSAFAIFSA LSLMVVHHQPLRTNQRMLFLAIVVVSFMYSNDIYASLINLGIDPFVENEYETFEQIDDSCSTPMM QWYDVNDAFVGADGAKLNLKKKHVKGTLVSCLEKAVEYRNVCCLTDARTMYASRFWLRRNNG LHRLKVAKPIFASRTECIWFREGIPGKREIIEVIKRLQETGLSSGWYSPADNPLSLQTDNHNRIGEM DEESRTQLLRELILLGVLGYSLAITIFIGELLVHRFGSSAKGKMKLHLPGVPDRKIERREW DallIr MSITKTSLVIFLITILADEIQSRPEVDWFHSVTTRLLRSDEVHTALIIHQRSTDYSDAVKNELFGKIME 136 TMPNVIVEIDGGVINLKEQNFGVVSKNTSSLLFIFYADYSGDNVQIYETMNAINKLTDFSVNYILIVL P+F RTSDLTRSVENILQHAWDI*MSNTLIVEIRFSKVSTDNSHFLGCGKLKCYGANGRINEDSSVKLIM HQFNFFTKQCFHEEFHHGVKLFPNFAKNMHGHVMKTFRGYNRPLRYGEYEDKKNSSLVRKAFAL IDDLTRVLNFTTREQLILRGEWKGCRGNYTYEALTDYYRTINKVDIVGPVSTTSVSGSIIQPFLLRPM KERELIMIIPITYKINQNLINKTVASTVILLILYXLIWLVAIISNFDEKICNTFAIFSALFSITVHHQPLITN QRLLFFAIVIVGFMYFSDIYASLTNLGIDPFVENKYETFEQIDDSCSTVITSWYDVDPAFEGADGVKL NLKKKSVNGPLMPCLDMALEYRNVCCLTDRQRIDESRFLLRKNNNLHLLKIAKPIFGSRIESIWFW EGTPGKKETIEVIKRHREAGLSSGLYSTDDDPSWLRTNKHNNADEMDEESRTQLRRELISLCALGY 
SLAIVVLIVEIFVYRYRNSAKGRMKLNVPIVFKKKIKRREWYAISRRFNSVELIFINHFYFPSFQLEKSK PSYYSHLRA DallIr MTHIFAIINSFEWMNHWKQVHRSYLSAPTRIALLITGSCTDSHVKSIKSILHGFWQREMLNVIAIT 137 PTQDLPLIHTYNPFLVGTENPPGELMNLTSEFFPDKLKNLHKYPVRIAFYENPPYVYPKTHPAPMD PSE GGDYALMTFLSERLNFTMKSSMKQVTFADTFFQLSNRKRTAAIGDLLMRDTDILGNSFYLESING QELDILYPRWRTALVVIVPRTRPIRGILNMFSSVDLTVVILFFIVMILLVVYLRVEKDTHIFIHVWRLLI YQHFDFIGTRASERLFCISFVIWSFFLVEFYEGRLVNDLSSSFYKDIKTLRGLIDSKLTVLVPSWQQTV LSHSSIQERMELVGQLEVEGNFTKCAQRAVEAGDVGCTMDANAADYFKDHIAKNAYEETNDGT VKGQLRVMEESLMSFWRLHATRKRFFLKDKLNEVIGIIQEGGLLKKWESDEKDRQFREHVEEQKE SVGMDKMGAAFQLLMYGHCLATGQLLIELAWHAFKKNHLRGWDDSEEAFVG


DallIr MEKRLRILTQSIINQELSYLMNTPFEKEQWLSFKKLFPLPFHNFQVDHAILNFITMGYTSLRPVFLV 138 QEDGFVHPYIDSICPKLRHVVGIFESIDTVHPINWKMLYGSTCLSRTKIVLWITRHADDDDIKSILKA FWSLDFLNVILVTMSDDSTNHGLAVHSYDPFWVNKGGVRGKVYTLAVADLLYPNKMKNLQGYP IAVSMYDTFLSHPVTNISQLRANSDVELLLSLSSWLNFTIDLRSRPISTAEYYRMKSPNGTVVGIIYD LITNYSEVLGNSHIWGKASDRVEFLLPHARFISEAVVPMPEVIPEIICIFKSFDVMTFILLILFSFISVIY MYSKGTNAPVLNTFRLIIHHQFYQVGTRVSEKVLAISWIVFAGLFLLIQQSHLIENMTNPLFEKRID TLKQLADSDLTVMTDAEHFPLLNRSRDEVMQKIVSRITTKHSSRYCYERLIAGGNVACFLNTVIISS VRSKPEITQSYLKIPKMHKMREIVGQYWRAIIVKKGFPYLSKFNAGIGRLMQTGFVTKWQARDRE RLKPRVIHADDAVPLNLSHFRGPFIIFLFGIVVATCAFIYEIVIYRVKY DallIr MLGFGDDSGKNASFNFKPKIIILEGDSPEQLESDLSSFKETQWWNYMAHYFILGINQNQCENIVAI 139 LTTAWKMNIIKSIYICLNTENMPELYTLNPITNYALTPWERIEIPLVKGAIYRQLFDANVSCQNLHY DKTKVLNGFKITGVIAKFSNKKAKDMNVVNQMALNFKKILNVSIALKLKPPGDLVIQTIADGTAD MVLNYGFMYWNPERQFLFPYLWTQVMAVRKFNGYLSTLEKIDNLWSVPIRVLALLIFSIFFLVMV YENRDISSAMLELIRLLTYVSIKTTREKLSFRIFFSMMMMFMVISNGVFQGAISSFLTKPQHAKGV DTVDELRDLNYTLYTIPGTVERAKEECPDNEVIEVPLEVCPELLLKSSRAVCVTFREFFYEFYGNSSLT LMKQPLKSTYFNHKCRDDWPLHQRVNDYFVQTFEAGLINYWLTEKIRLTMQKHRANEMGLLSP NYKPVQLKALDFVFKLLAIGLCLATVVFLLEVLAKECKIDLWLLR DallIr MVIVGDFDGRILMSLIESIPIPIVTIDSHSEFIFGDASDGLYVFQPDIIIMAMEDSPKQLGNDLQLLQ 140 EMNQWNHMAAHFILEISARFCKDALESLLITWRMHIPQSSYVCLNAEQAIVLYTLNPITNYAPKP WQAITSADRFNSSDRWTLYSQLLNPNDIACDSLTFDRMKHLNGYEIKALTAPKNNVSFIPGKNYK QAGYKMLEFFGKTVFDTIITRLNATLVLQIDDIHIIPQYLANQSMDIDIRFTLSFSKTNASYLYPYFQP EVIAITRIKDYLSTFEKIAQLWSRSVLVLSFLTILITFGVMAVYKRQGFSLALLEIIRLLSYASIQNNFQS TAMRIFFSKILIFIVITNGVFQGRLAAFLTKPEEGYSPENSEDLKNLNYTLYGIQQNVKYLRTMFPDN RVVLVTQKDCVEMALASLSAACVGTKAKYLNKYWMKPIHTTREPLFVDYWNHQCRKDWPLKH RVEAVLTQIMESGLLARWAFHSISTVLDKKRAEDVERAATKYRPVELKALDFSFALLAFGLVLATIV LIIEFLMKRESNTKIREEIKFRNFFIARRTTI DallIr MECWLALIMAMMLPCVRTIVIPEEENLRIEDPSNRELINPETDVAKSELNELSKTVVCLMKFVELY 141 MNQLPSRISVLLMETDDPDFSGVYLSKLQQVRSTYILNGYLDSDSDQNTSLDALLILKSYENLENST SRLIDLCGRDCRYAVVVTNLFPDEASFMEEAVNYVKLLWIKRVANVVILGPVGNTLLAAQSQGFL ANKLTEPSDPIPIGKCHQGEWITTTEVFPVLKMNNSHIHFAIIDHEPYMSVTWEGDHMKARGIEL 
KIINILRDTLQFRSTGSLLEWDEGNTVEDEIIEEFKSDRKIDLVVGGLLRTMVKDVDFALPYDVTQV VWLVPSHSNISLLGLISPFTLKIWILTIASIIFGGMIKSLLFDKMSFLEIMALVIGVAWHKQPKRLSYRI KFMSWVLFGYVLTQVYLASLAGQLLAHGDLQINTMQELVDSGLIFGGTANHKQFFMQTDDDN DGEISAVDTIYKQFIVFSHEDYMKKLTQLMQGENTSLALVAVLNISSASTSVHHEMYHIVKETLATT PLAFPAWRGLPYLTQIDSKLAALIQGGIISYIANNETTVQHIHDAIEEKTDANLELVDIAPSFLLLVM GHGAGMLCLLGEFAVFRWQNRKPKKVVKGTKVRRLKGVAKRTVRFDEKNLRLTKPNNGVRLVL RPNVTIGYRDGRLPWKIV


Supplementary Data 2-S5 – continued Odorant binding proteins

DallObp1 MARHVVCCFLLGVAMQALIVSAGRPDFITDDMMAMVADDKARCMGEHGTTETLIDEVNNGALPNDRALTCYMDCLFAAFGVIDEGELEVDMLVGFLPDHMQDQARELLEACAKQPGADPCDKVFNIAKCVQAKRPDLWFMI
DallObp2 MKKFVGILCLLLQVSIGLSGPVGRPDFVSDEMIALAASVVNACQTQTGVATADIEAVRSGQWPDSTPLKCYMYCLWEQFGLVDDKRELSLNGMLTFFQRIPAYRAEVQTAIRECKEIGKYLANGDNCQYAYTFNMCYAKLSPRTYYLF
DallObp3 MVKYMLSAVVLLCLMGYISAGPIPKEFQEVAGDIRKVCIEETGTTVDLIERAGKGDFAEDDNLKCYLKCLFAQFGLISKKGLNFEQLVKVAPPDMKDMAKQLGVTCKDIKINADGDQCDLAYHMSKCFFNTFPDQYFIM
DallObp4 MKFLVVLFFVCLVGALAQELNEAQRQRLREHRDACIRETGADRAEVDRAHRGDWADNPTIRCFALCMMKRLGLMSDDGKLNETAARQRLALVLRRERVEEVMRKCKDLKGPTPCDTGYLILKCYTDNRANMV
DallObp5 MVKYVISASLAVCLVAVINAGEIPPEFKEIAPEVRRVCLEETGAKVEWIEKANKGEFTDDPKFKCYLKCTVAQFGAVSRKGVNFDALAKLAPPAYKEKLEKIIAVCKDTKGTIPGDVCDQVYESSKCFQRTSPDDYFVM
DallObp6 MKFLVIVFFICLVGALAQELTDAQKQKLREHRDACVTETGADRADVDKAYKGEWADNPKLRCFTLCMMKKVGMMNDDGVLDETITRQRMALKLKPEKVDEIWTKCKDLKGTTACDTAYMMMKCYTENRAITV
DallObp7 MNTVFIILTVCIAVVLGVTLEKELKDACIAEEGASPDAVEKAYQRNEWEDDSKLRCFSVCMMRKWKLVDEPDGSWIPEAISEFISKTYPSANVDEVMTKCPLIFDATCTTSYLLQKCWGDLKLTEVATTA
DallObp8 MNTSATVLVFCALAVTLVLGHHPKPMFAAAAEKCIEKTGVDLDALKTLHETGGVNADEKLKCFGACIMKGLGVMAEDGTVDVEAAKELVPSDVPDRDKIVAVIEACHHEKGANECETAHAIGMCMHKNHMNDMQN
DallObp9 MHSRLLLFLLPTMRAVSIVLITTALCLVLVEADIRRECRKQTGVSWGALKKFRAANFQQDDKKLKCYVKCFMKKNGIFGERDIDIDKALRHLPKGVQGTSKRVLESCKNVPSTDPCDKAFQIVKCFSKQEPEILRGVPFV
DallObp10 MQRENFKRQGTGIDMRIIVPCLALLSLVLFVRGDDKDPHGPIREKCKDQFGLSSEDLKEAMEDPNDVGCYLLCFFKDLSIMDDSGEFDPLAAMDAIEESAKDGARPVIFSCYDQEKKSATKDTCARALEVVLCFKKEAPELYKNLGLFHPLGQ
DallObp11 MKMKLLVAVLILSVAFVSAAFTAADIVRFQRTLEKCRTKNDLSDEVLERVLNGEIVDEPDFNCFAACLLQNYELLREDGSFNVELAVSKIPQDADFAQPLTDAIKTCSARRGTDNCKTAHLLFVCMYEKDVPNLLFG
DallObp12 MNTSTIVVVFCALIAVSFTLGKDMNDTFLAEAEKCIEKTGVDPHVVGTLYEDGGSKADEKVKCFIACILKGLRMMTEDGTVDAEGSKANVPSDEPHRETIIAAIDACHNEKGANECETAHAMFTCMHKKYMNDGA
DallObp13 MKSGILLGLAILLVAQSFENAEAKMTIPQVTNMLMPMRKTCIQKTGASADLVDAPKTGPMPDDPSLKCYYSCLLKMVKVVTKEGLPNYENMVKQMDMMLPADDLTARLKDVINLCAPKVTSTEPCEANWQFIKCFYETDKNVCFFP


DallObp14 MVKYMLNAVVLFCLMGYLSAGEIPQEFKEVAGDIPKSCIAETGVTVDLIERARTGDLAEDGNLKCYLKCIMVNLGMISEKGLNFEKMIEMVQADMKDLAKQLSTTCKDIKPNAEGDQCQLAFDFMKCFYKGFPNDFFVL
DallObp15 MKFFHFFLFFFSFAKMNIVTIRIILIFVIVHYTQSLRCSSGNQRLNDNYQKIMQTCRRRYNASRGEENNFSTDDSNNSEGDSSSSDGSLFGHDFLSGTKKTHGESYLNYRNGRKASSKDVMQRQNNSRNFNDGDEQFCKSHCFFDELNVVDRRGYPERVAVTRVMTQGIQNPALRDFIEESILECFHLMPSIISCDKCAFSQNLVNCLTDKGKKECEDWND


Supplementary Data 2-S5 – continued Chemosensory proteins

DallCsp1 MKSSVFFVLAILGAAFIAAEGGNRYADKYDSVNVDQLLGNERIYKQHLNCLLDQGQCSRQAQSLKDVLPEVLSTSCAKCSPVQRQMARKVVGYIQKNKPDDWKLLTTKFDPQGRYTEEIRRFILSNV
DallCsp2 MLRGAVVIALVFLSAVIAEEKYSEKYDYVDVDGILANDKQRESYYKCFAGIGPCKTAAARFFRDTLPEAIVTRCKKCTARQSVNFDKISDWYTTNEPEKYQIIVAKAVRDIMAKSA
DallCsp3 MRRLIFFVLIAVALAEERPMYTTKYDKFDIDSIIKNDRLFKNYIDCLMDEKPCTPEGNEFKRNLPDALETGCASCSKAQKTMAEKFYHHVIDNRIDDWMRLENKYDPRGNYRKNYLGLDIETTTVAL
DallCsp4 MKIAVFVLLSCLVAVISARPDKYTTKWDNIDVDQILNNDRILNNYVNCLLEEGNCTAEGRELKSVLPDALETECEKCSRKQRDGSKKIIKFLVQNKQDLWEKLMDKYDEEKKYRGKYEDQARAEGIEIQS
DallCsp5 MKVAFVLLAVVAVSLAKPQGYTTKYDNVDLDQILRNDRLLNNYVKCLLDEGHCTSDGKELKASLPDALATGCTKCSEKQRAGSEKVIRYLVNERPKVWQKLAAKYDPHDEYRVKFQGEASARGIQV
DallCsp6 MRAVVILCLLIGSVIAQKAGKYDNVDVDAILKNNRVLTQYIKCMLGEGSCTAEGRELKKVLPDALKTNCAKCDEKQKSTAEKVINHLRSNRPNEWNRLVAKYDPQGEYEKRFEAAASAKN
DallCsp7 MFTKVLLISLLMCAAVMGQEAEQRSRVSDEQVNIALNDPRYLKRQIKCALGEAPCDPVGRRLKSLAPLVLRGSCPQCSREETHQIKRVLSHIQRQFPREWSKVIKQYAGV
DallCsp8 MEAKPSTQVLVFFLFTVVILTRDIECYTWPRRNTYMTRWDKVNLDEILQNKRLLHHYFRCLMGVGPCPPDGQELKRVLPEALETACAKCSKSQKEGAIYVIKYLREYMPKKLEMLANRYDPDGKYRRRYYHSTSVDNNTT
DallCsp9 MKARMALFLVGMLSLVVGIEAQDVEALLKNPEFVNFEINCMLDEGPCDLIGNSIKNVLPEALNNNCRRCTRSQARIIRRLIDFMETAYPEQNQRIRNRYIRSPTSELADELP


Supplementary Data 2-S6. Sequences of sex-determination genes in D. alloeum

>doublesex_male_isoform
MNHDNEASPASSDSKSVVLKNAGLIPRTPPNCARCRNHGLKITLKGHKRYCKYRYCICGKCTLTK
DRQRVMALQTALRRAQDQDTTRVRRPDEQVEPRPMSLDGERLISVPQPARSLENSCDSNSADS
PFSNHGSTAPHNGIVTIPPSRKLPPSHNIHSPSATQLSEPRSCESSENVEVLLEYSAKLLEQFWY
SWEILPLMYVILKDAKADLDEARRRIEEANNEIRAIAVRKARKMITDTGDGYYNEWYTATAGSGAP
TYFGQPPHIGSFMHPPVHLGIPHLLNAHVLATRVPSSPPGGPTT

>doublesex_female_isoform
MNHDNEASPASSDSKSVVLKNAGLIPRTPPNCARCRNHGLKITLKGHKRYCKYRYCICGKCTLTK
DRQRVMALQTALRRAQDQDTTRVRRPDEQVEPRPMSLDGERLISVPQPARSLENSCDSNSADS
PFSNHGSTAPHNGIVTIPPSRKLPPSHNIHSPSATQLSEPRSCESSENVEVLLEYSAKLLEQFWY
SWEILPLMYVILKDAKADLDEARRRIEEAVDTFRVNLTFSTMVI

>transformer_male_isoform
MRRSDDRSREYRDSRDYLPEDRLRELERKRREWRIEQEKRREHEKRKQKMIQEWEAKRAREL
EGKNQRRRSKSRSRSASPGPSRRRGRSRSVEKNSTRASTKVPVMSERFDSASSSNTPLFKGM
EGSKISVSELKKIKVNIQRDISKAEETSELLRDITSPDEIVLKRREGKRSLIVFLSIRRLITVWGI

>transformer_female_isoform
MRRSDDRSREYRDSRDYLPEDRLRELERKRREWRIEQEKRREHEKRKQKMIQEWEAKRAREL
EGKNQRRRSKSRSRSASPGPSRRRGRSRSVEKNSTRASTKVPVMSERFDSASSSNTPLFKGM
EGSKISVSELKKIKVNIQRDISKAEETSELLRDITSPDEIVLKRREGEGSKPIFEREELKTKESNNVD
VPERRTVVALDEFETNKNRSTERRSRSLSSGRRTSHSSRRASRHDAASKSYTERHDRGHSYDR
GSTTNDRYHERKYPSSRKSSPERVRDSRSRRSQDRNARAGSHRDSTCREYQDYDSYSHPRRH
EVDRRPAALSYAEPITFPMYYDNLVRPMMMDPMMMMRTPMPMMRGRVPAMIPTFRPPFRPRF
MPSEMFRLNGPPTQRYGRMFP
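
As a minimal sketch of how the sex-specific isoforms listed above could be compared, the following Python snippet parses FASTA-formatted text and counts the leading residues that two sequences share before they diverge. The function names (parse_fasta, shared_prefix_length) and the toy records are assumptions introduced for illustration only; they are not part of the original analysis, and the toy sequences are not the D. alloeum sequences.

```python
from itertools import takewhile


def parse_fasta(text):
    """Return {record_name: sequence} from FASTA-formatted text."""
    chunks_by_name = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            name = line[1:].split()[0]  # first token after '>' is the name
            chunks_by_name[name] = []
        elif name is not None:
            chunks_by_name[name].append(line)  # wrapped sequence line
    return {n: "".join(chunks) for n, chunks in chunks_by_name.items()}


def shared_prefix_length(seq_a, seq_b):
    """Count identical leading residues before the two sequences diverge."""
    matching = takewhile(lambda pair: pair[0] == pair[1], zip(seq_a, seq_b))
    return sum(1 for _ in matching)


# Toy records (hypothetical; NOT the thesis sequences).
EXAMPLE_FASTA = """\
>toy_male_isoform
MNHDNEAS
PASSKLMN

>toy_female_isoform
MNHDNEAS
PASSVDTF
"""

records = parse_fasta(EXAMPLE_FASTA)
n_shared = shared_prefix_length(
    records["toy_male_isoform"], records["toy_female_isoform"]
)
print(f"shared N-terminal residues: {n_shared}")
```

Applied to real isoform pairs, the same prefix count would indicate roughly where the male and female products begin to differ, although a proper alignment would be needed if the isoforms differed by internal insertions rather than by alternative C-termini.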


CHAPTER 3: RETENTION OF CORE MEIOTIC GENES ACROSS DIVERSE HYMENOPTERA, INCLUDING THE ASEXUAL WASP DIACHASMA MULIEBRE

ABSTRACT

The cellular mechanisms of meiosis are critical for proper gamete formation in sexual organisms. Functional studies in model organisms have identified genes essential for meiosis, yet the extent to which this core meiotic machinery is conserved across non-model systems is not fully understood. Moreover, it is unclear whether deviation from canonical modes of sexual reproduction is accompanied by modifications in the genetic components involved in meiosis. We used a robust approach to identify and catalogue meiosis genes in Hymenoptera, an insect order typically characterized by haplodiploid reproduction. Using newly available genome data, we searched for 43 genes involved in meiosis in 18 diverse hymenopterans. Seven of eight genes with roles specific to meiosis were found across a majority of surveyed species, suggesting the preservation of core meiotic machinery in haplodiploid hymenopterans. Phylogenomic analyses of the inventory of meiosis genes and the identification of shared gene duplications and losses provided support for the grouping of species within Proctotrupomorpha, Ichneumonomorpha, and Aculeata clades along with a paraphyletic Symphyta. The conservation of meiosis genes across Hymenoptera provides a framework for studying transitions between reproductive modes in this insect group.


INTRODUCTION

Meiosis, the cellular process involving chromosomal recombination, segregation, and the production of haploid gametes, is widely conserved among eukaryotes (reviewed in Loidl, 2016). Meiosis likely arose once, early in eukaryotic evolution, and its prevalence across extant taxa implies that it confers considerable selective advantages (Barton & Charlesworth, 1998; Ramesh et al., 2005). A conserved suite of meiotic genes involved in proper gamete formation has been identified in model organisms, although some of these genes have been lost in taxa that nonetheless engage in meiosis and sexual reproduction (Villeneuve & Hillers, 2001; Ramesh et al., 2005). Direct observation of sexual reproduction can be difficult, especially in non-model organisms. However, bioinformatic analyses of genome data from any eukaryotic species can be used to infer the genetic underpinnings of sex and meiotic recombination, including whether an organism may be capable of sex, and how the meiotic machinery evolves.

Organisms with unconventional sexual systems are important taxa in the study of the evolution of meiosis because they offer insight into genetic changes that underlie differences in reproductive modes. In sexual haplodiploid reproduction typical of hymenopteran insects, females lay haploid eggs that develop without fertilization into new adult males. When unreduced sperm from haploid males fertilize eggs, the resulting diploid eggs develop into females. Transitions to asexual reproduction have been documented in several hymenopterans (van Wilgenburg et al., 2006). Here, females lay unfertilized diploid eggs that develop into daughters, and males are absent. Asexual reproduction in the Hymenoptera can be clonal (apomixis) or can involve recombination with subsequent fusion of meiotic products (automixis) (Lamb & Wiley, 1987). The

specific mechanism of asexual reproduction can affect patterns of genome heterozygosity and may influence the adaptive potential of natural populations (Suomalainen et al., 1987).

A “meiosis detection toolkit” (Schurko & Logsdon, 2008) uses the identification of intact meiosis genes to investigate the mechanistic capacity for meiosis in non-model organisms, with an emphasis on examining organisms that deviate from canonical modes of sexual reproduction or whose capacity for sexual reproduction is unknown (e.g. Tzung et al., 2001; Ramesh et al., 2005; Malik et al., 2008; Schurko et al.,

2009; Hanson et al., 2013; Chi et al., 2013; Patil et al., 2015). Candidate toolkit genes have been found broadly across eukaryotic lineages, and losses of these genes often produce defective meiosis phenotypes in various organisms (Villeneuve & Hillers, 2001;

Ramesh et al., 2005). This meiotic toolkit includes genes that are involved in both mitosis and meiosis, along with a subset of genes that are meiosis-specific, i.e. genes that encode products with functions exclusive to meiotic processes (e.g. double-strand break formation, chiasmata formation, meiotic recombination; see Figure 3-1). When cytological observation of gamete formation is not tractable, documenting the presence of several genes involved in meiosis, including meiosis-specific genes, is an indirect way to assess an organism’s ability to form gametes via meiosis.

Meiosis gene homologs have previously been identified in a limited set of insects, including just two hymenopterans: the jewel wasp (Nasonia vitripennis) and the

European honeybee (Apis mellifera) (Schurko et al., 2010). Between Nasonia and Apis, some gene duplications and losses were inferred, and additional patterns of duplication and loss were apparent in hymenopteran vs. non-hymenopteran comparisons. The

absence of the meiosis-specific gene DMC1 in Nasonia and Apis was particularly striking and warranted a broader survey of this and other genes across the insect order. The recent expansion of available genomic data for hymenopteran insects facilitates expanded efforts toward meiotic gene finding and annotation. Here, we searched for homologs of

43 meiosis genes in 18 newly available hymenopteran genomes, including representatives from four groups (Proctotrupomorpha, Ichneumonomorpha, Aculeata, Symphyta; see

Table 3-1) that represent a broad sampling of the biodiversity of Hymenoptera (Aguiar et al., 2013).

We also generated new genome sequence data to search for meiosis genes in the sexual wasp Diachasma ferrugineum and the asexual wasp Diachasma muliebre. The concurrent study of closely related sexual-asexual species facilitates analysis of genetic differences between meiosis genes that may relate to the loss of sexual reproduction. For instance, loss of sexual reproduction may result in reduced selective constraints on genes required for sex, and these regions might show molecular signatures of pseudogenization and eventual gene loss (Lynch & Conery, 2000; Normark et al., 2003). Conversely, intact meiotic inventory genes may indicate that they either have a secondary conserved function or that some aspects of meiosis have been retained across the transition to asexuality.


MATERIALS & METHODS

Hymenopteran meiotic gene inventory development

To compile an inventory of meiosis genes, we used genome assembly data from 21 insects (Table 3-1). We accessed data on NCBI (http://www.ncbi.nlm.nih.gov/) and

Ensembl (http://www.ensembl.org/) between January 2015 and August 2016. We developed a custom-made BLAST script to retrieve candidate homologs (Camacho et al.,

2009). We used tblastn with meiotic protein models reported in Nasonia vitripennis

(Schurko et al., 2010) to query insect genomes for potential homologs. Candidate homologs became queries for the blastx algorithm to confirm their identity. We chose proteins previously identified and characterized from N. vitripennis because the availability of genomes from two other Nasonia species allowed us to test the performance of BLAST scripts. In the event that a N. vitripennis model did not produce a strong BLAST hit (i.e., the best hit had an E-value > 1e-20), we searched the genome dataset using available protein sequences from a basal hymenopteran (Athalia rosae) and three non-hymenopteran insects (Drosophila melanogaster, Aedes aegypti, Tribolium castaneum).
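The query-fallback rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the custom script used in this study: the function names, the dictionary layout, and the order in which the three non-hymenopteran queries are tried are assumptions made for the example. The only details taken from the text are the 1e-20 E-value threshold and the use of N. vitripennis as the first query. The parser assumes the default column order of BLAST tabular output (-outfmt 6), in which the E-value is the eleventh field.

```python
# Hypothetical sketch of the query-fallback rule; not the thesis script.
E_VALUE_CUTOFF = 1e-20  # best hits with a larger E-value are treated as weak

# Assumed order in which query species are tried (N. vitripennis first).
QUERY_ORDER = [
    "Nasonia vitripennis",
    "Athalia rosae",
    "Drosophila melanogaster",
    "Aedes aegypti",
    "Tribolium castaneum",
]


def parse_tabular_hits(text):
    """Parse BLAST tabular output (-outfmt 6) into (subject_id, evalue) pairs."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Default columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        hits.append((fields[1], float(fields[10])))
    return hits


def best_strong_hit(hits, cutoff=E_VALUE_CUTOFF):
    """Return the lowest-E-value hit if it meets the cutoff, otherwise None."""
    if not hits:
        return None
    best = min(hits, key=lambda hit: hit[1])
    return best if best[1] <= cutoff else None


def pick_candidate(results_by_query):
    """Return (query_species, hit) for the first query species with a strong hit."""
    for species in QUERY_ORDER:
        hit = best_strong_hit(results_by_query.get(species, []))
        if hit is not None:
            return species, hit
    return None
```

A candidate returned by `pick_candidate` would still need the reciprocal blastx confirmation described above before being accepted as a homolog.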

Whenever possible, we retrieved protein sequence data generated using automatic prediction pipelines from NCBI/Ensembl. We discarded gene duplicates identified in assemblies which had identical sequences. We also discarded putative duplicates that were less than half of the total alignment length if characteristic protein domains were absent. We manually annotated sequences without predicted models using alignments with homologs in other hymenopterans. The completed meiotic gene inventory included our predictions for transcription start sites, exon-intron boundaries, and stop codons.
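The two discard rules for putative duplicates can be written as a simple filter. The sketch below is illustrative only: the record layout (`id`, `seq`, `has_domain`) is hypothetical, and the function treats every candidate copy alike, whereas in practice the rules were applied during manual curation.

```python
def filter_putative_duplicates(candidates, alignment_length):
    """Apply the two discard rules to candidate copies of one gene in one species.

    candidates: list of dicts with hypothetical keys
        'id' (str), 'seq' (ungapped protein sequence), 'has_domain' (bool).
    alignment_length: total number of columns in the gene alignment.
    """
    kept = []
    seen_sequences = set()
    for cand in candidates:
        # Rule 1: discard copies identical in sequence to one already kept.
        if cand["seq"] in seen_sequences:
            continue
        # Rule 2: discard copies shorter than half the alignment length
        # when characteristic protein domains are absent.
        if len(cand["seq"]) < alignment_length / 2 and not cand["has_domain"]:
            continue
        seen_sequences.add(cand["seq"])
        kept.append(cand)
    return kept
```

Short copies that retain a characteristic domain pass the filter, which matches the conditional wording of the second rule.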


Diachasma genome sequencing and annotation

We extracted genomic DNA from a single female D. muliebre collected in Roslyn, WA,

U.S.A. (47.22° N, 120.99° W, 685 m elevation), and a single male D. ferrugineum collected in Iowa City, IA, U.S.A. (41.39° N, 91.31° W, 204 m elevation). We prepared

DNA libraries using the KAPA Hyper Prep kit (KAPA Biosystems, Wilmington, MA) and sequenced 2 x 300 paired-end reads using the Illumina MiSeq platform (Illumina

Inc., San Diego, CA). We used TrimGalore v0.4.0 and FASTQC (Babraham

Bioinformatics) to remove adaptors, trim low-quality read ends with a Q > 20 cutoff threshold, and visually inspect read sets before and after trimming protocols. After inspection, we determined the best k-mer value for de novo assembly with KmerGenie

(Chikhi & Medvedev, 2013). We used SOAPdenovo2 (Luo et al., 2012) with default parameters to build a draft assembly for D. muliebre. We queried assembly scaffolds using meiotic gene models generated from the D. alloeum genome data to identify and annotate equivalent models for D. muliebre.
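For readers unfamiliar with quality trimming, the following sketch shows one common way a Phred cutoff of 20 can be applied to the 3' end of a read. It is a simplified illustration of a running-sum trimming rule of the kind used by widely available trimmers, not the code of TrimGalore itself. The function names are hypothetical, and Phred+33 quality encoding is assumed.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_3prime(sequence, quality_string, cutoff=20):
    """Trim the low-quality 3' end of a read using a running-sum rule.

    Walking from the 3' end, (cutoff - quality) is accumulated. The read is cut
    at the position where this sum is largest, and the walk stops as soon as
    the sum drops below zero.
    """
    scores = phred33_scores(quality_string)
    running, best, cut = 0, 0, len(scores)
    for i in range(len(scores) - 1, -1, -1):
        running += cutoff - scores[i]
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return sequence[:cut], quality_string[:cut]
```

Under this rule a read whose final bases have scores of 10 and 5 loses those two bases, while a read with uniformly high scores is returned unchanged.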

Phylogenetic analysis

We produced protein alignments using Geneious software v9.1.0

(http://www.geneious.com, Kearse et al., 2012). We aligned sequences using MUSCLE

(Edgar, 2004) with default parameters, followed by manual inspection. Once alignment datasets were compiled, we arbitrarily chose ten genes to estimate protein substitution models using MEGA v6 (Tamura et al., 2013). The LG+G model (Le & Gascuel, 2008) provided the best estimate of molecular evolution over the sampled genes. We constructed maximum-likelihood trees using PhyML v3.0 (Guindon et al., 2010). We applied the LG+G model with a discrete Gamma distribution (eight rate categories) to

estimate evolutionary rate differences among sites in the alignment. A BioNJ algorithm produced an initial tree for the heuristic search, which was optimized with SPR+NNI to produce a final tree with estimated maximum log likelihood. We estimated branch support for final trees in two ways: an approximate Likelihood-Ratio Test (aLRT) and 1000 bootstrap replicates. For gene families with multiple paralogs, we extracted conserved regions and constructed maximum-likelihood trees to assess gene family composition and modify erroneous annotations resulting from automated prediction pipelines. After confirming orthology, we produced separate trees for each gene family member to ascertain whether topologies and their associated branch supports could be further resolved. Finally, we used the phylogenetic methods described above for two concatenated multi-gene datasets to characterize the overall hymenopteran phylogeny. The first dataset contained genes having > 50% species representation, including meiosis genes and mitosis paralogs (46 total). To test for effects of dataset selection on topological concordance, we analyzed a second dataset containing the subset of these genes that also showed monophyletic groupings for species within Proctotrupomorpha, Ichneumonomorpha, and Aculeata (33 total).
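The construction of the first concatenated dataset can be illustrated with a short Python sketch. It is an assumption-laden example, not the procedure carried out in Geneious: the function name and data layout are hypothetical, and padding missing species with gap characters is one conventional choice that the text does not specify. Only the > 50% species-representation criterion comes from the description above.

```python
def select_and_concatenate(gene_alignments, all_species, min_fraction=0.5):
    """Concatenate per-gene alignments into a single supermatrix.

    gene_alignments: dict mapping gene -> dict mapping species -> aligned sequence
        (all sequences within a gene must have equal length).
    Genes present in more than `min_fraction` of species are kept; a species
    missing from a kept gene is padded with gap characters (an assumption).
    """
    kept_genes = []
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        present = sum(1 for sp in all_species if sp in aln)
        if present / len(all_species) > min_fraction:
            kept_genes.append(gene)

    parts = {sp: [] for sp in all_species}
    for gene in kept_genes:
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {gene} are not aligned to equal length")
        width = lengths.pop()
        for sp in all_species:
            parts[sp].append(aln.get(sp, "-" * width))
    return kept_genes, {sp: "".join(chunks) for sp, chunks in parts.items()}
```

The second dataset would be obtained by passing only the genes that met the additional monophyly criterion, a step that depends on inspecting individual gene trees and is not automated here.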

Evolutionary rate analysis of meiosis-specific genes in Diachasma

We used Geneious and MUSCLE as described above to generate alignments of coding

DNA sequence (CDS) for six meiosis-specific genes (CORT, SPO11, HOP2, MND1,

MSH4, MSH5) in four wasps: D. alloeum, D. ferrugineum, D. muliebre, and F. arisanus.

To determine whether evolutionary rates differed between D. muliebre and D. ferrugineum since they shared a common ancestor, we conducted Tajima’s relative rate tests using MEGA7 software (Kumar et al., 2016), providing the three Diachasma species as the three-taxon input.
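The logic of Tajima’s relative rate test can be made concrete with a small worked example. The sketch below is not the MEGA7 implementation. It assumes three aligned nucleotide sequences, two ingroup sequences and one outgroup, skips any column containing a gap or ambiguous base, and uses hypothetical function and variable names. The statistic is (m1 - m2)^2 / (m1 + m2), where m1 and m2 count sites at which only the first or only the second ingroup sequence differs from the other two, and it is compared against a chi-square distribution with one degree of freedom.

```python
import math


def tajima_relative_rate(seq_a, seq_b, outgroup):
    """Return (m_a, m_b, chi_square, p_value) for Tajima's relative rate test.

    m_a counts sites where seq_a differs while seq_b matches the outgroup;
    m_b counts sites where seq_b differs while seq_a matches the outgroup.
    """
    if not (len(seq_a) == len(seq_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGT")
    m_a = m_b = 0
    for a, b, o in zip(seq_a.upper(), seq_b.upper(), outgroup.upper()):
        if not {a, b, o} <= valid:
            continue  # skip gaps and ambiguous bases
        if a != b and b == o:
            m_a += 1
        elif a != b and a == o:
            m_b += 1
    if m_a + m_b == 0:
        return m_a, m_b, 0.0, 1.0
    chi_square = (m_a - m_b) ** 2 / (m_a + m_b)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return m_a, m_b, chi_square, p_value
```

For example, six lineage-specific changes in one ingroup sequence against two in the other give a statistic of 16 / 8 = 2.0, which does not reject equal rates at the 0.05 level.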


RESULTS & DISCUSSION

Our combined use of bioinformatic and phylogenetic methods allowed for a full characterization of meiosis genes in the 18 focal hymenopteran genomes, including potential duplication/loss events (Figure 3-2). The meiotic gene inventory we describe consists of 43 protein-coding genes that a) have functionally-described roles in meiosis in model eukaryotes, b) are conserved across eukaryotes, and c) have been identified previously in multiple arthropod lineages, including two hymenopterans (Schurko et al.,

2009; Schurko et al., 2010). Included in the meiotic gene inventory are eight meiosis-specific genes (boldface in Figure 3-2) for which functional expression occurs only during meiosis and for which meiotic defects are observed in null mutants in model eukaryotic organisms (Ramesh et al., 2005). Gene coordinates and models used in this study, including meiosis genes and mitotic paralogs used to confirm orthology, are provided in the Supplementary Material (Supplementary Tables 3-S1 to 3-S19,

Supplementary Figure 3-S1).

Overall, we found 33 of 43 genes in all hymenopterans surveyed (Figure 3-2).

Genes involved in cell cycle control and chromosomal structure maintenance were well conserved. Only CORT and REC8 were not universally recovered, and CORT was only missing in one species. The composition of genes with roles in meiotic recombination is varied in Hymenoptera; seven of 24 genes were not found, ranging from individual species to the entire insect order. We found evidence for several meiosis gene duplication events, including three that provide support for the placement of clades in our tree. Of particular interest, all eight meiosis-specific genes were identified in Hymenoptera.

Evidence suggests broad conservation of six of these eight genes, with DMC1 likely lost

early in hymenopteran evolution and REC8 potentially experiencing multiple separate losses.

Meiotic Genes: Cell cycle control

Cyclins, cyclin-dependent kinases, and cdc20 homologs

Members of the cyclin-dependent kinase (CDK) family are responsible for multiple stages of cell-cycle progression, which is mediated through interchangeable protein complexes formed with the binding of cyclins (reviewed in Hochegger et al., 2008;

Harashima et al., 2013; Malumbres, 2014). In animals, entry into the cell cycle is characterized by the association of CDK4/6 with Cyclin D, forming a complex which inactivates transcriptional repressors (Malumbres & Barbacid, 2001). CDK2 primarily binds Cyclins A/E during the G1-S phase progression. Knockdown of CDK2 in mouse spermatocytes causes improper chromosomal pairing during Prophase I (Ortega et al.,

2003). After DNA replication, the G2-M phase transition in mitosis and meiosis is mediated by CDK1 activation by Cyclins A/B/B3. While some CDKs have been shown to be dispensable for proper cell cycle progression, CDK1 defects are lethal (Santamaría et al., 2007). Phosphorylation of targets by CDK1 is required for proper oogenesis

(Adhikari et al., 2012) and spermatogenesis (Clement et al., 2015) in mammalian cells.

Although there is some functional redundancy in mitotic cyclin activity (Cyclin A is required for mitosis, while Cyclins B/B3 are dispensable), aberrant cyclin activity in meiosis results in fertility issues (Liu et al., 1998; Jacobs et al., 1998). CDK1 expression is indirectly regulated by CDK10 during the G2-M progression (Kasten & Giordano,

2001). Global Cyclin A/B/B3 destruction and targeted Cyclin B destruction on meiotic spindles is achieved by coordinated activity from CDC20 homologs Cortex (CORT) and


Fizzy (FZY) in Drosophila to complete the process of meiosis (Pesin & Orr-Weaver,

2007; Swan & Schupbach, 2007).

Cyclin homologs were present in single copies in all taxa with the exception of the beetle outgroup Tribolium which has two copies of Cyclins A and D (Figure 3-2,

Figure 3-3A, Supplementary Figures 3-S2-S7). Homologs in hymenopterans resolved into separate clades for Cyclins A, B, B3, D, and E (Figure 3-3A, Supplementary Figure

3-S2). Cyclin-dependent kinases involved in meiosis (CDK1, CDK2) were both monophyletic in major groups across the Hymenoptera, and formed protein families distinct from their mitotic counterparts (CDK4/6, CDK10; Figure 3-3B, Supplementary

Figures 3-S8-S12). CDK2 in Tribolium grouped with CDK1 homologs with weak branch support (Supplementary Figure 3-S8). The results support the observation of conserved cyclin-CDK complex elements originating from basal metazoans (Cao et al., 2014). We also identified CDC20 homologs in queried insects. With the exception of the apparent absence of CORT in the European paper wasp Polistes dominula, single copies of genes formed distinct clades for CORT, FZY, and mitotic fizzy-related (FZR) genes (Figure 3-

4A, Supplementary Figures 3-S13-S16).

Polo kinases

Polo kinases are characterized by the presence of polo-box domains that facilitate protein targeting, and perform multiple functions in mitosis and meiosis (reviewed in

Archambault & Glover, 2009; Zitouni et al., 2014). PLK1 (POLO in Drosophila) was the first member of this family to be identified; mutant screens in flies localized a kinase displaying spindle defects in the G2-M phase transition (Sunkel & Glover, 1988;

Llamazares et al., 1991) and abnormal chromosomal segregation during meiotic divisions


(Herrmann et al., 1998). Inactivation of PLK1 by Matrimony (MTRM) prevents phosphorylation of the meiosis-specific protein CDC25 (Twine in Drosophila) to delay meiosis until proper oocyte development is reached, ending when PLK1 levels exceed those of MTRM (Xiang et al., 2007). Activation of the CDC25 homolog by PLK1 leads to subsequent interaction with the cyclin B/CDK1 complex to promote meiotic entry

(Kishimoto, 2003). Duplication of PLK1 prior to animal-fungi divergence likely gave rise to polo-like kinase 4 (PLK4; Carvalho-Santos et al., 2010). The requirement of PLK4 for centriole duplication has been described in Drosophila (Bettencourt-Dias et al., 2005).

Additional PLK1 duplications (PLK2, PLK3, PLK5) have been described in vertebrates, although their function in cell cycle progression is not fully understood (Carvalho-Santos et al., 2010).

We found homologs for PLK1 and CDC25 in all hymenopterans (Figure 3-2,

Figure 3-4B, Supplementary Figures 3-S17-S21). PLK1 and PLK4 are present in single copies and form well-supported clades (Figure 3-4B, Supplementary Figure 3-S17). In addition, we identified several PLK2/3 genes, which are referred to as such due to their basal phylogenetic position relative to a vertebrate duplication event leading to PLK2 and

PLK3 clade formation (Figure 3-4B, Supplementary Figure 3-S19; Schurko et al., 2010).

These PLK-like sequences were present in every hymenopteran surveyed with the exception of the parasitoid wasp Copidosoma floridanum, and the non-hymenopterans

Drosophila, Aedes, and Tribolium; however, homologs are present in Daphnia and other arthropods (Schurko et al., 2009; Schurko et al., 2010). CDC25 was present in all hymenopterans in single copies, but has two homologs in Drosophila: the aforementioned

Twine and mitotic cycle-regulator String (Supplementary Figure 3-S21). Homologs of


CDC25 in Hymenoptera formed a distinct clade from dipteran sequences. We could not identify MTRM in any taxa outside Drosophila, consistent with a previous report that the gene was not found in two hymenopteran species (Schurko et al., 2010). Though the PLK and CDC25 genes are present in Hymenoptera, potential roles in meiosis for these genes cannot be determined without functional studies.

Meiotic genes: Initiation and maintenance of chromosome structure

Cohesin complex

Cohesin is a multisubunit protein complex that is involved in sister chromatid cohesion that initiates during S-phase and ends with chromatid segregation at the onset of anaphase

(reviewed in Nasmyth & Haering, 2009). The core cohesin machinery includes two structural maintenance of chromosome proteins (SMC1, SMC3), RAD21 (replaced by

REC8 in meiotic pathways), and stromal antigen (SA). A paralog of SA in Drosophila

(SNM) has been implicated in achiasmate meiosis in males but was not found in any other insects (Thomas et al., 2005; Schurko et al., 2010). Timeless (TIM1) is a circadian rhythm protein in insects (Myers et al., 1995) and its paralog, timeout (TIM2), has been implicated in chromosome stability (Benna et al., 2010). The role of TIM2 in cohesin loading was demonstrated in C. elegans (Chan et al., 2003). In Drosophila, TIM2 mutants show chromosomal structure aberrations during metaphase, but in a cohesin-independent manner (Benna et al., 2010). The RAD21 (or REC8) subunit is cleaved by

Separase during the removal of cohesin from chromosomes and segregation of sister chromatids to opposite poles of the dividing cell (Buonomo et al., 2000; Uhlmann et al.,

2000).


Structural maintenance of chromosome genes

Cohesin structural maintenance genes SMC1 and SMC3 formed well-supported clades with respect to structural maintenance genes that are part of the condensin complex

(SMC2, SMC4; Figure 3-2, Figure 3-5, Supplementary Figures 3-S22-S26). A previously reported duplication of SMC1 in Nasonia was also identified in other hymenopterans; identified copies in all queried wasps in Proctotrupomorpha suggest a duplication event in that stem lineage, and SMC1 also has duplication events in individual species of

Ichneumonomorpha and Symphyta (Supplementary Figure 3-S23). Sporadic duplication events of SMC3 occurred in the parasitoid wasp Trichogramma pretiosum

(Proctotrupomorpha) and the basal sawfly Neodiprion lecontei (Symphyta)

(Supplementary Figure 3-S25). Wasps in Ichneumonomorpha also contain duplicates of

SMC3, with an additional duplication event occurring prior to the origin of Diachasma

(Supplementary Figure 3-S25). Duplicate copies of SMC1/SMC3 genes have long branches, suggesting these genes may be undergoing an accelerated rate of evolutionary divergence (Figure 3-5).

RAD21/REC8, SA, and cohesin interactors

Core cohesin complex genes are mostly conserved across Hymenoptera (Figure 3-2,

Figure 3-6, Supplementary Figures 3-S27-S30). During construction of the

RAD21/REC8 phylogeny, we identified C(2)M in Drosophila, a gene previously implicated in the formation of meiotic products in a REC8-like manner (Manheim &

McKim, 2003). After attempting to align C(2)M with the RAD21/REC8 sequences, we omitted it from the final phylogeny due to poor alignment in conserved regions. This omission is reasonable, as a previous study indicated that C(2)M may indeed be a divergent REC8, but its homology is unclear

(Heidmann et al., 2004). We found single copies of RAD21 in all insects, which formed a distinct clade from its meiosis-specific paralog, REC8 (Figure 3-6, Supplementary Figure

3-S27). We were unable to recover REC8 from five insects across the hymenopteran phylogeny (Supplementary Figure 3-S29). REC8 has experienced numerous changes in arthropods; gene losses were reported in the pea aphid Acyrthosiphon pisum and the yellow fever mosquito Aedes aegypti (Schurko et al., 2010; Hanson et al., 2013; this study), while gene duplications were found in the water flea Daphnia pulex (Schurko et al., 2009). Regions of putative REC8 homologs used in alignments had few conserved sequence domains in a short N-terminal region (< 100 amino acids). While it is possible that REC8 has experienced three distinct loss events in Hymenoptera, it is also possible that high sequence divergence and/or incomplete genome assemblies preclude our ability to retrieve these sequences in all species. We found duplicates of SA in three wasps in Ichneumonomorpha; long branch lengths indicate sequence divergence similar to SNM in Drosophila (Supplementary Figure 3-S30). Cohesin interactors Separase and

TIM2 are present in single copies across Hymenoptera (Supplementary Figures 3-S31-

S32). Consistent with previous findings, TIM1 is absent in all surveyed insects, supporting the inference that this protein was lost ancestrally in Hymenoptera (Supplementary Figure 3-

S32). The role of TIM2 in Hymenoptera is not fully understood; it may have co-opted the circadian rhythm function in the absence of timeless or simply retains function relevant to maintenance of chromosomal integrity (Gu et al., 2014).


Meiotic Genes: Recombination

Double-strand breaks and strand invasion

The process of meiotic recombination is initiated by the production of double-strand breaks (DSBs) by the topoisomerase SPO11 (Keeney et al., 1997). SPO11 homologs are broadly distributed across eukaryotes, suggesting that SPO11 is indispensable for meiosis

(Ramesh et al., 2005; Malik et al., 2007).

The generation of DSBs is followed by loading of RECA homologs RAD51 and meiosis-specific DMC1 proteins onto ssDNA that execute a homology search and strand invasion, ultimately resulting in the use of homologous sequence as a template for DNA synthesis (reviewed by Krejci et al., 2012). The formation of nucleoprotein filaments is facilitated by the action of RAD54 ATPase homologs. RAD54 functions with RAD51A to activate strand exchange between sister chromatids, while RAD54B interacts with

DMC1 to facilitate interhomolog exchange (Nimonkar et al., 2012). Knockouts of DMC1 are associated with absence of recombination intermediates and fewer crossover events in yeast and mouse (Bishop et al., 1992; Pittman et al., 1998). Absence of DMC1 in

Drosophila indicates that this protein is not absolutely required for meiosis (Neale &

Keeney, 2006).

Additional proteins are associated with stabilization of nucleoprotein filaments, including several RAD51 paralogs (RAD51B, RAD51C, RAD51D, XRCC2, XRCC3) and the heterodimer MND1-HOP2. Knockout of RAD51 paralogs creates aberrant chromosomes and impairs homologous recombination events (Takata et al., 2001). Two complexes (RAD51B-RAD51C-RAD51D-XRCC2 and RAD51C-XRCC3) are implicated in processes of genomic integrity via recombinational repair and Holliday

junction resolution (Yokoyama et al., 2004; Liu et al., 2007). The MND1-HOP2 complex is meiosis-specific; efficacious homologous pairing is mediated by MND1-HOP2 stabilization of DMC1-ssDNA complexes and later stimulation of strand invasion and D-loop formation (Chi et al., 2007; Pezza et al., 2007).

We searched for homologs for SPO11, RAD51, RAD54, MND1, and HOP2

(Figure 3-2). Consistent with its critical function in meiosis, SPO11 is present in all insects surveyed (Supplementary Figure 3-S33). RAD51 is present across Hymenoptera; however, its meiosis-specific paralog DMC1 is absent in most taxa (Figure 3-7,

Supplementary Figure 3-S34). The apparent absence of DMC1 in basal hymenopterans

Athalia rosae and Neodiprion lecontei and presence in Orussus abietinus and Cephus cinctus suggest this gene was lost at least two times in the course of hymenopteran evolution. There have been reports of independent losses of DMC1 in many eukaryotic lineages, including the arthropods Acyrthosiphon pisum and Daphnia pulex (Schurko et al., 2009; Schurko et al., 2010; Hanson et al., 2013). Additional data from symphytan insects are needed to resolve the timing of various losses in Hymenoptera. We identified

RAD51 paralogs RAD51C, RAD51D, XRCC2, and XRCC3 in all hymenopterans, with the exception of RAD51C, which was absent in aculeate insects (Figure 3-7,

Supplementary Figures 3-S34-S39). RAD54 is present across Hymenoptera; however, its paralog RAD54B was not found in the Ichneumonomorpha and Symphyta groups, suggesting two independent loss events (Figure 3-2, Supplementary Figures 3-S40-S42).

Surprisingly, we found HOP2 and MND1 in nearly all insects; only Drosophila is missing these genes (Supplementary Figures 3-S43-S44). This result is unexpected given the interaction of DMC1 with the HOP2-MND1 heterodimer and previous associations of

the loss of the former with the absence of the latter in the nematode Caenorhabditis elegans and insects Drosophila melanogaster and Anopheles gambiae (Ramesh et al.,

2005; Schurko et al., 2010). The presence of HOP2 and MND1 in these insects implicates alternate mechanisms of strand invasion following formation of double-strand breaks. Although RAD51 and DMC1 co-localize to regions flanking DSBs, RAD51 has an accessory role and no evidence for functional redundancy in the absence of DMC1 exists (Cloud et al., 2012; Brown et al., 2015). An alternate possibility is the co-opting of a RAD51 paralog in meiosis, although hymenopterans differ in the precise inventory of these genes.

MutL and MutS homologs

Mismatch repair (MMR) during DNA replication in bacteria is a well-described mechanism that is facilitated by MutL, MutS, and MutH proteins. Multiple eukaryotic homologs of MutL (MLHs) and MutS (MSHs) have been identified and characterized

(reviewed in Manhart & Alani, 2016). MutL homologs MLH1, MLH3, and PMS1 form heterodimers that are involved in MMR, and the MLH1-MLH3 complex also preferentially binds to Holliday junctions to facilitate crossover formation in meiosis

(Kadyrov et al., 2006; Kadyrov et al., 2007; Ranjha et al., 2014). PMS2 has an unclear function, but meiotic defects have been reported in PMS2 mutants in mice (Baker et al.,

1995). MSH proteins form heterodimers that are involved in MMR; MSH2-MSH6 are involved in the repair of small mismatches, while MSH2-MSH3 recognize larger indels

(reviewed in Manhart & Alani, 2016). The MSH4-MSH5 complex has no role in MMR.

This meiosis-specific heterodimer forms a sliding clamp that facilitates the formation of

stable recombination intermediates and promotes crossover products via the resolution of double Holliday junctions (Snowden et al., 2004).

We surveyed insect genomes for MLH and MSH family proteins (Figure 3-2).

MLH1, PMS1, and PMS2 are conserved across Hymenoptera and form strongly supported clades, whereas MLH3 is absent in all queried insects (Figure 3-8A,

Supplementary Figures 3-S45-S48). Copies of MLH1 found in Nasonia have long branches (Supplementary Figure 3-S46). We identified PMS1 in all Hymenoptera with the exception of Ceratosolen solmsi marchali. A small 46 amino acid N-terminal region was recovered in C. solmsi marchali, but the sequence had poor overall alignment relative to other hymenopteran PMS1 homologs and was not used in alignments to construct phylogenetic trees. BLAST searches failed to recover other protein regions in this wasp, indicating that the gene may be undergoing pseudogenization (Figure 3-2).

Overall, evidence suggests that this gene was absent in non-hymenopteran insects, present in the ancestral hymenopteran, and independently lost in C. solmsi marchali

(Supplementary Figure 3-S47). The duplicates of PMS2 in Aedes are nearly identical and may represent a spurious assembly issue rather than a bona fide duplication event

(Supplementary Figure 3-S48). MSH2 and MSH6 proteins involved in mismatch repair are found in all insects and form well-supported groups relative to meiosis-specific proteins MSH4 and MSH5 (Figure 3-8B, Supplementary Figures 3-S49-S53). MSH2 is found in single copies in all taxa, whereas MSH6 has undergone at least two independent duplication events across the phylogeny; duplications in Proctotrupomorpha

(Trichogramma pretiosum, Copidosoma floridanum) generated up to three copies of

MSH6, and five copies in a wasp in Ichneumonomorpha (Microplitis demolitor) (Figure


3-8B, Supplementary Figure 3-S50, Supplementary Figure 3-S53). We identified MSH4 and MSH5 in single copies in all insects, with the exception of Drosophila and C. floridanum, which were missing both (Supplementary Figures 3-S51-S52). Since MSH4 and MSH5 function as a heterodimer during meiotic recombination, functional loss of one protein may increase susceptibility of the other to the effects of pseudogenization.

Indeed, the co-occurring absence of MSH4 and MSH5 due to loss has been described in fungi and protists (Villeneuve & Hillers, 2001; Ramesh et al., 2005).

RECQ helicases

The RECQ protein family consists of several ATP-dependent helicases with roles in

DNA replication, repair, and recombination (reviewed in Rezazadeh, 2012). RECQ1 interacts with members of the MMR pathway (MSH2/6, MLH1/PMS2) during recombination (Doherty et al., 2005). RECQ2 is a negative regulator of RAD51 nucleoprotein filament assembly and can prevent D-loop formation (Wu et al., 2001;

Bugreev et al., 2007). RECQ2 mutants have suppressed noncrossover recombinants, indicative of this protein’s role in mediating recombination product formation (De Muyt et al., 2012). RECQ3 binds specifically to Holliday Junction intermediates (Compton et al., 2008) and mutants in Drosophila display stalled replication forks that likely contribute to chromosomal instability and decreased hatching frequency (Bolterstein et al., 2014). RECQ4 co-localizes with RAD51 after induction of double-strand breaks, suggesting a role in DNA repair via recombination (Petkovic et al., 2005). Indeed,

RECQ4 has a high affinity for Holliday junction structures, yet its specific role in recombination is not fully understood (Sedlackova et al., 2015). RECQ5 disrupts RAD51

filament formation to regulate recombination events (Hu et al., 2007; Paliwal et al., 2013).

We searched for homologs for RECQ protein family members in insects (Figure

3-2, Supplementary Figures 3-S54-S58). Proteins grouped into five distinct clades, and homolog membership was monophyletic with the exception of RECQ2 in Tribolium, which had low statistical support for grouping with RECQ1 homologs (Supplementary

Figure 3-S54). RECQ1 is present in single copies in all insects with the exception of

Drosophila, where it is absent (Supplementary Figure 3-S55). RECQ4 is also present in single copies across insects (Supplementary Figure 3-S57). We identified a RECQ2 duplication event in Proctotrupomorpha; Copidosoma floridanum has two copies of

RECQ2, while all other wasps in this group have three copies (Supplementary Figure 3-

S56). The most parsimonious explanation for the observed pattern is two successive duplications of RECQ2 in this insect group, followed by subsequent loss of a single copy in C. floridanum, although we cannot rule out the possibility that the draft assembly for

C. floridanum is incomplete and precludes the identification of a third copy. RECQ5 duplicates are present in all major hymenopteran groups, and these copies in

Proctotrupomorpha, Ichneumonomorpha, and Aculeata groups have long branches on trees (Supplementary Figure 3-S58).

Concatenated datasets

The primary concatenated alignment dataset consisted of 46 meiosis and mitosis genes for a total of 23,500 alignment columns (Figure 3-2, Figure 3-9A, Supplementary Figure

3-S1). Overall, the phylogenetic analysis supported the placement of diverse hymenopterans into four groupings, which correspond to the three monophyletic

infraorders Proctotrupomorpha, Ichneumonomorpha, and Aculeata (Rasnitsyn, 1988) as well as the basal paraphyletic suborder Symphyta (Figure 3-9A). In agreement with previous findings, we recovered Aculeata and Ichneumonomorpha as monophyletic groups with strong branch support (Figure 3-9B; Dowton & Austin, 2001; Vilhelmsen et al., 2010; Sharkey et al., 2012; Klopfstein et al., 2013; Mao et al., 2015; Peters et al.,

2017). The Proctotrupomorpha group is also monophyletic in our dataset (Figure 3-9A).

Although taxa in Proctotrupomorpha are not always monophyletic in published hymenopteran phylogenies, the subset of wasps with genome data in this study belong to insect groups that are consistently monophyletic (Sharkey et al., 2012; Klopfstein et al.,

2013). Among the Symphyta, the sawflies Neodiprion lecontei and Athalia rosae are members of the superfamily Tenthredinoidea, which is consistently reported as one of the most basal hymenopteran lineages (Malm & Nyman, 2015; Song et al., 2016). Two other sawflies, Orussus abietinus and Cephus cinctus, form a clade that serves as the symphytan outgroup to Proctotrupomorpha + Ichneumonomorpha + Aculeata (Figure 3-9A). The monophyly of O. abietinus and C. cinctus stands in contrast to other analyses that suggest these sawflies are paraphyletic (Malm & Nyman, 2015; Song et al., 2016).

However, this result may not be robust given the limited number of symphytan taxa with genomic resources. Additional data from other basal hymenopterans may be needed to fully resolve the relationships.
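
As an illustration of how a concatenated dataset of this kind can be assembled, the following minimal Python sketch joins per-gene alignments into a single supermatrix and pads missing species with gaps. It is not the pipeline used in this study. The function name `concatenate_alignments`, the `min_fraction` parameter, and the decision to apply the >50% species representation threshold gene by gene are assumptions made for illustration only.

```python
def concatenate_alignments(gene_alignments, min_fraction=0.5, gap_char="-"):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: list of dicts mapping species name -> aligned sequence.
    All sequences within one dict must have the same length.
    A gene is kept only if it is present in strictly more than
    min_fraction of all species (hypothetical reading of ">50%").
    Species missing from a kept gene are padded with gap characters.
    Returns (supermatrix dict, number of genes kept).
    """
    # Collect every species seen in any gene alignment.
    species = sorted({sp for aln in gene_alignments for sp in aln})
    parts = {sp: [] for sp in species}
    kept = 0
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment differ in length")
        # Skip genes with too few species represented.
        if len(aln) / len(species) <= min_fraction:
            continue
        width = lengths.pop()
        for sp in species:
            parts[sp].append(aln.get(sp, gap_char * width))
        kept += 1
    return {sp: "".join(chunks) for sp, chunks in parts.items()}, kept


if __name__ == "__main__":
    # Toy alignments with made-up species labels and sequences.
    toy = [
        {"A": "MK-", "B": "MKL", "C": "MRL"},
        {"A": "GG", "B": "GA"},
        {"C": "TTTT"},
    ]
    matrix, n_genes = concatenate_alignments(toy)
    for name, seq in matrix.items():
        print(name, seq)
    print("genes kept:", n_genes)
```

Padding with gaps keeps every row the same length, so the total number of alignment columns is simply the sum of the widths of the retained genes.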

While the placement of hymenopteran species within Proctotrupomorpha,

Ichneumonomorpha, and Aculeata is well supported and is consistent with published reports, there is weaker support for the relative placement of these three major clades.

Our concatenated analysis with >50% species representation supports the placement of Proctotrupomorpha and Ichneumonomorpha as sister lineages, a result that is consistent with many other recent phylogenetic surveys (Figure 3-9B, e.g. Dowton & Austin, 2001;

Klopfstein et al., 2013; Mao et al., 2015, Peters et al., 2017). However, there are examples of alternate relationships between Proctotrupomorpha + Ichneumonomorpha +

Aculeata groups, with reports of Proctotrupomorpha (e.g. Dowton et al., 1997; Rasnitsyn

& Zhang, 2010; Vilhelmsen et al., 2010) and Ichneumonomorpha (e.g. Peters et al.,

2011, Branstetter et al., 2016) as the outgroup in three-group comparisons (Figure 3-9B).

Indeed, there is somewhat weaker statistical support for the relative placement of

Proctotrupomorpha + Ichneumonomorpha in our concatenated dataset (Figure 3-9A).

Additionally, there is weaker statistical support for the grouping of Proctotrupomorpha +

Ichneumonomorpha + Aculeata clades that separate the insect suborder Apocrita from the

Symphyta, although the loss of DMC1 in all species in the former and presence of DMC1 in some of the latter provides separate support for this grouping (Figure 3-2, Figure 3-

9A).

We performed maximum likelihood analysis of a second concatenated dataset using 33 genes that formed monophyletic clades in Proctotrupomorpha,

Ichneumonomorpha, and Aculeata, for a total of 18,735 alignment columns. We found strong support for the placement of hymenopteran species within the major infraorders; however, there was weaker support for stem branches above the infraorder level

(Supplementary Figure 3-S59). In addition, Symphyta grouped as a monophyletic clade sister to Aculeata, a result in disagreement with the body of literature establishing

Symphyta as a basal paraphyletic group (Supplementary Figure 3-S59). This result indicates that dataset composition can influence the inferred topological relationships

between major groups of hymenopterans, and there are likely discordant evolutionary patterns among individual genes in this dataset. The intent of this study is not to make a definitive statement regarding the precise evolutionary relationships of these lineages and, as such, topological relationships should be interpreted with caution. Recent studies have superior sampling in terms of alignment size and number of species (e.g. Peters et al., 2017). Overall, the robustness of statistical support in our topologies mirrors patterns of consensus and disagreement in published work.

Maintenance of intact meiosis genes in D. muliebre in the absence of sex

We characterized a meiosis gene inventory for the asexual wasp D. muliebre and compared it to homologs in close sexual relatives, D. alloeum and D. ferrugineum, to see if sex loss in D. muliebre has contributed to major changes in this organism’s core meiotic machinery. We observed an identical set of meiosis genes in D. muliebre and D. alloeum (Figure 3-2). Furthermore, these regions showed no evidence of pseudogenization; branch lengths for D. muliebre were similar to those of D. alloeum, and visual inspection of alignments indicated no major mutational events that could contribute to loss of gene function (Figures 3-3, 3-4, 3-5, 3-6, 3-7, 3-8). The absence of REC8 in D. muliebre is interesting given that modifications to this gene (i.e., gene duplication) were implicated in the spread of contagious asexuality in Daphnia (Eads et al., 2012). However, REC8 is also absent in the sexual wasp D. alloeum, and gene loss in

Diachasma (versus duplication in Daphnia) means that any potential role in conferring asexuality would be distinct. More recently, REC8 as a single genetic contributor to asexuality has been refuted (Xu et al., 2015). As stated above, REC8 homologs have high sequence divergence across Hymenoptera and might be present in Diachasma despite our

retrieval efforts. Nevertheless, actual loss of REC8 in Diachasma would be interesting as it may predispose these wasps to the generation of asexual lineages. Moreover, a comparison of the numbers of nucleotide differences occurring on the D. ferrugineum vs. D. muliebre branches in six meiosis-specific genes could not reject a null model of equal evolutionary rates between the two wasps (Table 3-2).
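
To make the logic of such a rate comparison concrete, the sketch below implements a Tajima-style relative rate test, which asks whether two lineages have accumulated equal numbers of lineage-specific differences relative to an outgroup. The text does not state which test was used to produce Table 3-2, so treating it as this particular test is an assumption. The function name `relative_rate_test` and the example counts are hypothetical and are not data from this study.

```python
import math


def relative_rate_test(unique_to_a, unique_to_b):
    """Tajima-style relative rate test for two lineages.

    unique_to_a: number of sites where only lineage A differs from the
                 state shared by lineage B and the outgroup.
    unique_to_b: the same count for lineage B.
    Under equal rates the two counts are expected to be equal, and
    (a - b)^2 / (a + b) follows a chi-square distribution with 1 df.
    Returns (chi_square, p_value).
    """
    total = unique_to_a + unique_to_b
    if total == 0:
        raise ValueError("no lineage-specific differences to compare")
    chi_square = (unique_to_a - unique_to_b) ** 2 / total
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


if __name__ == "__main__":
    # Made-up counts for illustration only.
    chi2, p = relative_rate_test(12, 8)
    print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

With small counts of lineage-specific differences the test has little power, which is consistent with the caution expressed below about recently derived asexual lineages.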

There are multiple potential explanations for the presence of intact meiotic genes in D. muliebre. First, if asexuality in D. muliebre is relatively new, low sequence divergence in meiosis genes might reflect insufficient time for a signal of substitutional rate change to be observed. The origin of D. muliebre has been estimated at between 10,000 and 1 million years ago (Forbes et al., 2013). A study in Timema stick insects found a signal of differences in evolutionary rate of nuclear loci in sexual and asexual lineages diverging >400,000 years ago (Henry et al., 2012). Although absence of a signal of evolutionary divergence remains a possibility, a second explanation for intact meiotic genes in D. muliebre is that there may be selective pressures to preserve the fidelity of meiotic machinery if these genes possess some secondary function, such as maintenance of chromosomal stability. While many of the genes surveyed here do indeed have secondary roles in mitosis, functional roles of meiosis-specific genes have not been explicitly tested in D. muliebre. A third possibility is that asexual reproduction in D. muliebre may be characterized by non-canonical meiotic pathways, in which case wasps would likely retain meiotic gene inventory components. The identification of core meiotic machinery in asexual D. muliebre supports the hypothesis that these wasps could reproduce via automixis. Proper ploidy levels in asexuals are maintained with various mechanisms occurring before and after meiotic divisions (Suomalainen et al., 1987). The

meiotic gene inventory can provide preliminary speculation. A previous survey of microsatellite genetic diversity in Diachasma suggested dynamic patterns of genomic heterozygosity in D. muliebre lineages (Forbes et al., 2013). This information, combined with the results of the meiotic gene inventory, seems to rule out asexual reproduction via strict apomixis, where we would expect nearly ubiquitous genomic heterozygosity as well as degradation of meiotic gene machinery. Diverse patterns of genomic heterozygosity also conflict with automictic mechanisms that are typically accompanied by pervasive genomic homozygosity (i.e. endoduplication, gamete duplication; Suomalainen et al., 1987). The fusion of gamete products during Meiosis I or II (central or terminal fusion, respectively) would generate patterns of intermediate genomic heterozygosity. The two mechanisms differ in their predictions relative to the expected genomic landscape of heterozygosity, especially for regions proximal to centromeres, which is a potential area for future investigation.
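
The contrasting predictions of central and terminal fusion can be sketched with a standard single-locus model. It assumes a mother heterozygous at the locus and a probability y of second-division segregation, which is zero at the centromere and rises with distance from it. Under that model a daughter remains heterozygous with probability 1 - y/2 after central fusion and with probability y after terminal fusion. The function name, the mechanism labels, and the treatment of gamete duplication as fully homozygous are illustrative assumptions, not results of this study.

```python
def expected_heterozygosity(mechanism, second_division_freq):
    """Probability that a maternally heterozygous locus stays heterozygous.

    mechanism: "central_fusion", "terminal_fusion" or "gamete_duplication"
               (labels chosen for this sketch).
    second_division_freq: probability y that the locus shows
               second-division segregation; 0 at the centromere.
    """
    y = second_division_freq
    if not 0.0 <= y <= 1.0:
        raise ValueError("second_division_freq must lie between 0 and 1")
    if mechanism == "central_fusion":
        # No crossover: products of different meiosis I halves carry different alleles.
        # After a crossover, half of the possible fusions are heterozygous.
        return 1.0 - y / 2.0
    if mechanism == "terminal_fusion":
        # Sister products differ only if a crossover separated locus and centromere.
        return y
    if mechanism == "gamete_duplication":
        # Doubling a single haploid product gives complete homozygosity.
        return 0.0
    raise ValueError(f"unknown mechanism: {mechanism}")


if __name__ == "__main__":
    for y in (0.0, 1.0 / 3.0, 2.0 / 3.0):
        central = expected_heterozygosity("central_fusion", y)
        terminal = expected_heterozygosity("terminal_fusion", y)
        print(f"y = {y:.2f}: central = {central:.2f}, terminal = {terminal:.2f}")
```

The two mechanisms are most distinct near centromeres, where y approaches zero, and converge at y = 2/3, the value expected for loci far from the centromere in the absence of crossover interference.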

Conclusions and future outlook

We have characterized an extensive inventory of meiosis genes in Hymenoptera. Seven of eight meiosis-specific genes queried in this study were identified in nearly every hymenopteran. Only DMC1 was not consistently found, suggesting that this gene may be dispensable for haplodiploid reproduction in this insect order. We also report the apparent absence of the meiosis-specific gene REC8 in five hymenopteran insects, CORT in the

European paper wasp Polistes dominula, and MSH4 and MSH5 in the polyembryonic parasitoid wasp Copidosoma floridanum. It is possible that incomplete genome assemblies prevent identification of these genes, although this explanation is less likely for the concurrent absence of MSH4 and MSH5 in C. floridanum, as they function as a

heterodimer and have both been lost multiple times in eukaryotic evolution (Manhart & Alani, 2016; Villeneuve & Hillers, 2001; Ramesh et al., 2005).

We also detected several duplication events in meiosis genes. We identified gene duplications in individual species in addition to shared duplications, particularly in the

SMC and RECQ gene families. Given the fragmented nature of some genome assemblies used in this study, it is possible that some gene duplicates in individual species represent artifacts produced by de novo assembly methods. Moreover, the presence of a gene duplicate does not reveal the precise role – if any – it plays in the progression of meiosis since the actual meiotic roles of genes are not directly tested with the inventory approach.

Functional redundancy of gene duplicates may relax selective pressures to maintain sequence fidelity, and long branches representing substantial sequence divergence in gene copies may reflect functional divergence. However, only through direct functional tests can the roles of inventory genes be explicitly assessed.

In the haplodiploid mode of reproduction that typifies Hymenoptera, females undergo meiosis during egg production, while males generate sperm ameiotically. The genes inventoried here suggest that although the precise complement of meiosis genes in a given insect is variable as a consequence of gene loss and duplication events, the requirement of meiosis in female gamete formation selects for the maintenance of a core machinery of meiosis-specific genes. The transition in reproductive strategies in

Hymenoptera comprises one of many separate transitions to haplodiploidy in the tree of life (Bull, 1983). The “meiosis genetic toolkit” approach would be useful in studying patterns of gene retention and loss in additional systems to evaluate whether these patterns are Hymenoptera-specific or a general feature of transitions to haplodiploid

reproduction. Within Hymenoptera, there are several mechanisms underlying loss of sexual reproduction (reviewed in Normark, 2014). Datasets and analyses beyond the scope of the current work (e.g. additional asexual genomes, sex-specific expression of meiosis genes) will provide future opportunities to study genomic consequences of reproductive transitions.

The inventory of meiosis genes in Hymenoptera offers a promising opportunity to study alternative meiotic mechanisms. Whenever possible, future investigative efforts should combine cytological and genomic approaches. Many forms of asexual reproduction have been described in Hymenoptera, each with distinct cytological events and varied patterns of genome heterozygosity (van Wilgenburg et al., 2006). Utilization of the meiotic gene inventory to study additional transitions to asexuality in hymenopterans would allow for the evaluation of meiosis gene conservation across organisms with diverse reproductive modes.

Acknowledgements

The authors acknowledge Wee Yee for providing the Diachasma muliebre wasp sample used for sequencing, Gery Hehman and the Roy J. Carver Center for Genomics at the

University of Iowa for sequencing support, Ethan Nelson-Moore for developing a custom

BLAST script for gene finding, and Austin Paden for assistance with gene annotation.

The authors declare no conflicts of interest.

Data Availability

DNA sequences: Meiosis gene models for Diachasma muliebre are available under the GenBank accessions MF432979-MF433026.


REFERENCES

Adams MD, et al. 2000. The genome sequence of Drosophila melanogaster.

Science 287:2185-2195.

Adhikari D, et al. 2012. Cdk1, but not Cdk2, is the sole Cdk that is essential and

sufficient to drive resumption of meiosis in mouse oocytes. Hum Mol Genet

21:2476-2484.

Aguiar AP, et al. 2013. Order Hymenoptera. In Animal Biodiversity: An Outline of

Higher-level Classification and Survey of Taxonomic Richness. ed. Zhang ZQ. pp

51-62. Magnolia Press, Auckland, NZ.

Archambault V, Glover DM. 2009. Polo-like kinases: conservation and divergence in

their functions and regulation. Nat Rev Mol Cell Biol 10:265-275.

Baker SM, et al. 1995. Male mice defective in the DNA mismatch repair gene PMS2

exhibit abnormal chromosome synapsis in meiosis. Cell 82:309-319.

Barton NH, Charlesworth B. 1998. Why sex and recombination? Science 281:1986-1990.

Benna C, et al. 2010. Drosophila timeless2 is required for chromosome stability and

circadian photoreception. Curr Biol 20:346-352.

Bettencourt-Dias M, et al. 2005. SAK/PLK4 is required for centriole duplication and

flagella development. Curr Biol 15:2199-2207.

Bishop DK, Park D, Xu L, Kleckner N. 1992. DMC1: a meiosis-specific yeast homolog

of E. coli recA required for recombination, synaptonemal complex formation, and

cell cycle progression. Cell 69:439-456.


Bolterstein E, Rivero R, Marquez M, McVey M. 2014. The Drosophila Werner

exonuclease participates in an exonuclease-independent response to replication

stress. Genetics 197:643-652.

Branstetter MG, et al. 2016. Phylogenomic analysis of ants, bees and stinging wasps:

improved taxon sampling enhances understanding of hymenopteran

evolution. Biorxiv, 068957. doi: https://doi.org/10.1101/068957.

Brown MS, Grubb J, Zhang A, Rust MJ, Bishop DK. 2015. Small Rad51 and Dmc1

complexes often co-occupy both ends of a meiotic DNA double strand break.

PLOS Genet 11:e1005653.

Bugreev DV, Yu X, Egelman EH, Mazin AV. 2007. Novel pro-and anti-recombination

activities of the Bloom’s syndrome helicase. Genes Dev 21:3085-3094.

Bull JJ. 1983. Evolution of sex determining mechanisms. Benjamin Cummings Publishing

Company, Inc., Menlo Park, CA, USA.

Buonomo SB, et al. 2000. Disjunction of homologous chromosomes in meiosis I depends

on proteolytic cleavage of the meiotic cohesin Rec8 by separin. Cell 103:387-398.

Burke GR, Walden KK, Whitfield JB, Robertson HM, Strand MR. 2014. Widespread

genome reorganization of an obligate virus mutualist. PLOS Genet 10:e1004660.

Camacho C, et al. 2009. BLAST+: architecture and applications. BMC Bioinformatics

10:1.

Cao L, et al. 2014. Phylogenetic analysis of CDK and cyclin proteins in premetazoan

lineages. BMC Evol Biol 14:1.

Carvalho-Santos Z, et al. 2010. Stepwise evolution of the centriole-assembly pathway. J

Cell Sci 123:1414-1426.


Celniker SE, et al. 2002. Finishing a whole-genome shotgun: release 3 of the Drosophila

melanogaster euchromatic genome sequence. Genome Biol 3:1.

Chan RC, et al. 2003. Chromosome cohesion is regulated by a clock gene paralogue

TIM-1. Nature 423:1002-1009.

Chi J, Mahé F, Loidl J, Logsdon J, Dunthorn M. 2013. Meiosis gene inventory of four

ciliates reveals the prevalence of a synaptonemal complex-independent crossover

pathway. Mol Biol Evol:mst258.

Chi P, San Filippo J, Sehorn MG, Petukhova GV, Sung P. 2007. Bipartite stimulatory

action of the Hop2–Mnd1 complex on the Rad51 recombinase. Genes Dev

21:1747-1757.

Chikhi R, Medvedev P. 2013. Informed and automated k-mer size selection for genome

assembly. Bioinformatics:btt310.

Clement TM, Inselman AL, Goulding EH, Willis WD, Eddy EM. 2015. Disrupting cyclin

dependent kinase 1 in spermatocytes causes late meiotic arrest and infertility in

mice. Biol Reprod biolreprod.115.134940.

Cloud V, Chan YL, Grubb J, Budke B, Bishop DK. 2012. Rad51 is an accessory factor

for Dmc1-mediated joint molecule formation during meiosis. Science 337:1222-

1225.

Compton SA, Tolun G, Kamath-Loeb AS, Loeb LA, Griffith JD. 2008. The Werner

syndrome protein binds replication fork and holliday junction DNAs as an

oligomer. J Biol Chem 283:24478-24483.

De Muyt A, et al. 2012. BLM helicase ortholog Sgs1 is a central regulator of meiotic

recombination intermediate metabolism. Mol Cell 46:43-53.


Doherty KM, et al. 2005. RECQ1 helicase interacts with human mismatch repair factors

that regulate genetic recombination. J Biol Chem 280:28085-28094.

Dowton M, Austin A, Dillon N, Bartowsky E. 1997. Molecular phylogeny of the

apocritan wasps: the Proctotrupomorpha and Evaniomorpha. Syst Entomol

22:245-255.

Dowton M, Austin AD. 2001. Simultaneous analysis of 16S, 28S, COI and morphology

in the Hymenoptera: Apocrita–evolutionary transitions among parasitic wasps.

Biol J Linn Soc 74:87-111.

Eads BD, Tsuchiya D, Andrews J, Lynch M, Zolan ME. 2012. The spread of a

transposon insertion in Rec8 is associated with obligate asexuality in Daphnia. P

Natl A Sci USA 109:858-863.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high

throughput. Nucleic Acids Res 32:1792-1797.

Forbes AA, Hood GR, Feder JL. 2010. Geographic and ecological overlap of parasitoid

wasps associated with the Rhagoletis pomonella (Diptera: Tephritidae) species

complex. Ann Entomol Soc Am 103:908-915.

Forbes AA, Rice LA, Stewart NB, Yee WL, Neiman M. 2013. Niche differentiation and

colonization of a novel environment by an asexual parasitic wasp. J Evol

Biol 26:1330-1340.

Geib SM, Liang GH, Murphy TD, Sim SB. 2017. Whole genome sequencing of the

braconid parasitoid wasp Fopius arisanus, an important biocontrol agent of pest

tepritid [sic] fruit flies. G3-Genes Genom Genet 8:2407-2411.


Gu HF, et al. 2014. Adaptive evolution of the circadian gene timeout in insects. Sci Rep-

UK 4:4212.

Guindon S, et al. 2010. New algorithms and methods to estimate maximum-likelihood

phylogenies: assessing the performance of PhyML 3.0. Systematic Biol 59:307-

321.

Hanson SJ, et al. 2013. Inventory and phylogenetic analysis of meiotic genes in

monogonont rotifers. J Hered 104:357-370.

Harashima H, Dissmeyer N, Schnittger A. 2013. Cell cycle control across the eukaryotic

kingdom. Trends Cell Biol 23:345-356.

Heidmann D, et al. 2004. The Drosophila meiotic kleisin C(2)M functions before the

meiotic divisions. Chromosoma 113:177-187.

Henry L, Schwander T, Crespi BJ. 2012. Deleterious mutation accumulation in asexual

Timema stick insects. Mol Biol Evol 29:401-408.

Herrmann S, Amorim I, Sunkel CE. 1998. The POLO kinase is required at multiple

stages during spermatogenesis in Drosophila melanogaster. Chromosoma

107:440-451.

Hochegger H, Takeda S, Hunt T. 2008. Cyclin-dependent kinases and cell-cycle

transitions: does one fit all? Nat Rev Mol Cell Biol 9:910-916.

Hu Y, et al. 2007. RECQL5/RECQL5 helicase regulates homologous recombination and

suppresses tumor formation via disruption of Rad51 presynaptic filaments. Genes

Dev 21:3073-3084.

Jacobs HW, Knoblich JA, Lehner CF. 1998. Drosophila Cyclin B3 is required for female

fertility and is dispensable for mitosis like Cyclin B. Genes Dev 12:3741-3751.


Kadyrov FA, Dzantiev L, Constantin N, Modrich P. 2006. Endonucleolytic function of

MutLα in human mismatch repair. Cell 126:297-308.

Kadyrov FA, et al. 2007. Saccharomyces cerevisiae MutLα is a mismatch repair

endonuclease. J Biol Chem 282:37181-37190.

Kasten M, Giordano A. 2001. Cdk10, a Cdc2-related kinase, associates with the Ets2

transcription factor and modulates its transactivation activity. Oncogene 20:1832.

Kearse M, et al. 2012. Geneious Basic: an integrated and extendable desktop software

platform for the organization and analysis of sequence data. Bioinformatics

28:1647-1649.

Keeney S, Giroux CN, Kleckner N. 1997. Meiosis-specific DNA double-strand breaks

are catalyzed by Spo11, a member of a widely conserved protein family. Cell

88:375-384.

Kishimoto T. 2003. Cell-cycle control during meiotic maturation. Curr Opin Cell Biol

15:654-663.

Klopfstein S, Vilhelmsen L, Heraty JM, Sharkey M, Ronquist F. 2013. The

hymenopteran tree of life: evidence from protein-coding genes and objectively

aligned ribosomal data. PLOS One 8:e69344.

Krejci L, Altmannova V, Spirek M, Zhao X. 2012. Homologous recombination and its

regulation. Nucleic Acids Res 40:5795-5818.

Lamb RY, Willey RB. 1987. Cytological mechanisms of thelytokous parthenogenesis in

insects. Genome 29:367-369.

Le SQ, Gascuel O. 2008. An improved general amino acid replacement matrix. Mol Biol

Evol 25:1307-1320.


Liu D, et al. 1998. Cyclin A1 is required for meiosis in the male mouse. Nat Genet

20:377-380.

Liu Y, Tarsounas M, O'Regan P, West SC. 2007. Role of RAD51C and XRCC3 in

genetic recombination and DNA repair. J Biol Chem 282:1973-1979.

Llamazares S, et al. 1991. polo encodes a protein kinase homolog required for mitosis in

Drosophila. Genes Dev 5:2153-2165.

Loidl J. 2016. Conservation and Variability of Meiosis Across the Eukaryotes. Annu Rev

Genet 50:293-316.

Luo R, et al. 2012. SOAPdenovo2: an empirically improved memory-efficient short-read

de novo assembler. GigaScience 1:1-6.

Lynch M, Conery JS. 2000. The evolutionary fate and consequences of duplicate genes.

Science 290:1151-1155.

Malik SB, Pightling AW, Stefaniak LM, Schurko AM, Logsdon JM. 2008. An expanded

inventory of conserved meiotic genes provides evidence for sex in Trichomonas

vaginalis. PLOS One 3:e2879.

Malik SB, Ramesh MA, Hulstrand AM, Logsdon JM. 2007. Protist homologs of the

meiotic Spo11 gene and topoisomerase VI reveal an evolutionary history of gene

duplication and lineage-specific loss. Mol Biol Evol 24:2827-2841.

Malm T, Nyman T. 2015. Phylogeny of the symphytan grade of Hymenoptera: new

pieces into the old jigsaw (fly) puzzle. Cladistics 31:1-17.

Malumbres M. 2014. Cyclin-dependent kinases. Genome Biol 15:122.

Malumbres M, Barbacid M. 2001. Milestones in cell division: to cycle or not to cycle: a

critical decision in cancer. Nat Rev Cancer 1:222-231.


Manhart CM, Alani E. 2016. Roles for mismatch repair family proteins in promoting

meiotic crossing over. DNA Repair 38:84-93.

Manheim EA, McKim KS. 2003. The synaptonemal complex component C(2)M

regulates meiotic crossing over in Drosophila. Curr Biol 13:276-285.

Mao M, Gibson T, Dowton M. 2015. Higher-level phylogeny of the Hymenoptera

inferred from mitochondrial genomes. Mol Phylogenet Evol 84:34-43.

Misra S, et al. 2002. Annotation of the Drosophila melanogaster euchromatic genome: a

systematic review. Genome Biol 3:1.

Myers MP, Wager-Smith K, Wesley CS, Young MW, Sehgal A. 1995. Positional cloning

and sequence analysis of the Drosophila clock gene, timeless. Science 270:805.

Nasmyth K, Haering CH. 2009. Cohesin: its roles and mechanisms. Annu Rev Genet

43:525-558.

Neale MJ, Keeney S. 2006. Clarifying the mechanics of DNA strand exchange in meiotic

recombination. Nature 442:153-158.

Nene V, et al. 2007. Genome sequence of Aedes aegypti, a major arbovirus vector.

Science 316:1718-1723.

Nimonkar AV, et al. 2012. Saccharomyces cerevisiae Dmc1 and Rad51 proteins

preferentially function with Tid1 and Rad54 proteins, respectively, to promote

DNA strand invasion during genetic recombination. J Biol Chem 287:28727-

28737.

Normark BB. 2014. Modes of reproduction. In Shuker DM, Simmons LW, eds. The

evolution of insect mating systems. Oxford University Press, Oxford, UK.


Normark BB, Judson OP, Moran NA. 2003. Genomic signatures of ancient asexual

lineages. Biol J Linn Soc 79:69-84.

Ortega S, et al. 2003. Cyclin-dependent kinase 2 is essential for meiosis but not for

mitotic cell division in mice. Nat Genet 35:25-31.

Paliwal S, Kanagaraj R, Sturzenegger A, Burdova K, Janscak P. 2014. Human RECQ5

helicase promotes repair of DNA double-strand breaks by synthesis-dependent

strand annealing. Nucleic Acids Res 42:2380-2390.

Patil S, et al. 2015. Identification of the meiotic toolkit in diatoms and exploration of

meiosis-specific SPO11 and RAD51 homologs in the sexual species Pseudo-

nitzschia multistriata and Seminavis robusta. BMC Genomics 16:1.

Pesin JA, Orr-Weaver TL. 2007. Developmental role and regulation of cortex, a meiosis-

specific anaphase-promoting complex/cyclosome activator. PLOS Genet 3:e202.

Peters RS, et al. 2017. Evolutionary History of the Hymenoptera. Curr Biol 27:1013-1018.

Peters RS, et al. 2011. The taming of an impossible child: a standardized all-in approach

to the phylogeny of Hymenoptera using public database sequences. BMC Biol

9:1.

Petkovic M, Dietschy T, Freire R, Jiao R, Stagljar I. 2005. The human Rothmund-

Thomson syndrome gene product, RECQL4, localizes to distinct nuclear foci that

coincide with proteins involved in the maintenance of genome stability. J Cell

Sci 118:4261-4269.

Pezza RJ, Voloshin ON, Vanevski F, Camerini-Otero RD. 2007. Hop2/Mnd1 acts on two

critical steps in Dmc1-promoted homologous pairing. Genes Dev 21:1758-1766.


Pittman DL, et al. 1998. Meiotic prophase arrest with failure of chromosome synapsis in

mice deficient for Dmc1, a germline-specific RecA homolog. Mol Cell 1:697-

705.

Ramesh MA, Malik SB, Logsdon JM. 2005. A phylogenomic inventory of meiotic genes:

evidence for sex in Giardia and an early eukaryotic origin of meiosis. Curr Biol

15:185-191.

Ranjha L, Anand R, Cejka P. 2014. The Saccharomyces cerevisiae Mlh1-Mlh3

heterodimer is an endonuclease that preferentially binds to Holliday junctions. J

Biol Chem 289:5674-5686.

Rasnitsyn A. 1988. An outline of evolution of the hymenopterous insects (order

Vespida). Orient Insects 22:115-145.

Rasnitsyn A, Zhang H. 2010. Early evolution of Apocrita (Insecta, Hymenoptera) as

indicated by new findings in the Middle Jurassic of Daohugou, Northeast China.

Acta Geol Sin-Engl 84:834-873.

Rezazadeh S. 2012. RecQ helicases; at the crossroad of genome replication, repair, and

recombination. Mol Biol Rep 39:4527-4543.

Richards S, et al. 2008. The genome of the model beetle and pest Tribolium

castaneum. Nature 452:949-955.

Sadd BM, et al. 2015. The genomes of two key bumblebee species with primitive

eusocial organization. Genome Biol 16:1.

Santamaría D, et al. 2007. Cdk1 is sufficient to drive the mammalian cell cycle. Nature

448:811-815.


Schurko AM, Logsdon JM. 2008. Using a meiosis detection toolkit to investigate ancient

asexual “scandals” and the evolution of sex. Bioessays 30:579-589.

Schurko AM, Logsdon Jr JM, Eads BD. 2009. Meiosis genes in Daphnia pulex and the

role of parthenogenesis in genome evolution. BMC Evol Biol 9:78.

Schurko AM, Mazur DJ, Logsdon Jr JM. 2010. Inventory and phylogenomic distribution

of meiotic genes in Nasonia vitripennis and among diverse arthropods. Insect Mol

Biol 19:165-180.

Sedlackova H, Cechova B, Mlcouskova J, Krejci L. 2015. RECQ4 selectively recognizes

Holliday junctions. DNA Repair 30:80-89.

Sharkey MJ, et al. 2012. Phylogenetic relationships among superfamilies of

Hymenoptera. Cladistics 28:80-112.

Snowden T, Acharya S, Butz C, Berardini M, Fishel R. 2004. hMSH4-hMSH5

recognizes Holliday Junctions and forms a meiosis-specific sliding clamp that

embraces homologous chromosomes. Mol Cell 15:437-451.

Song SN, Tang P, Wei SJ, Chen XX. 2016. Comparative and phylogenetic analysis of the

mitochondrial genomes in basal hymenopterans. Sci Rep-UK 6:20972.

Standage DS, et al. 2016. Genome, transcriptome, and methylome sequencing of a

primitively eusocial wasp reveal a greatly reduced DNA methylation system in a

social insect. Mol Ecol 25: 1769-1784.

Suen G, et al. 2011. The genome sequence of the leaf-cutter ant Atta cephalotes reveals

insights into its obligate symbiotic lifestyle. PLOS Genet 7:e1002007.

Sunkel CE, Glover DM. 1988. polo, a mitotic mutant of Drosophila displaying abnormal

spindle poles. J Cell Sci 89:25-38.


Suomalainen E, Saura A, Lokki J. 1987. Cytology and evolution in parthenogenesis.

CRC Press, Boca Raton, FL, USA.

Swan A, Schüpbach T. 2007. The Cdc20 (Fzy)/Cdh1-related protein, Cort, cooperates

with Fzy in cyclin destruction and anaphase progression in meiosis I and II in

Drosophila. Development 134:891-899.

Takata M, et al. 2001. Chromosome instability and defective recombinational repair in

knockout mutants of the five Rad51 paralogs. Mol Cell Biol 21:2858-2866.

Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: molecular

evolutionary genetics analysis version 6.0. Mol Biol Evol 30:2725-2729.

Thomas SE, et al. 2005. Identification of two proteins required for conjunction and

regular segregation of achiasmate homologs in Drosophila male meiosis. Cell

123:555-568.

Tzung K, et al. 2001. Genomic evidence for a complete sexual cycle in Candida

albicans. Proc Natl Acad Sci USA 98:3249-3253.

Uhlmann F, Wernic D, Poupart MA, Koonin EV, Nasmyth K. 2000. Cleavage of cohesin

by the CD clan protease separin triggers anaphase in yeast. Cell 103:375-386.

van Wilgenburg E, Driessen G, Beukeboom LW. 2006. Single locus complementary sex

determination in Hymenoptera: an “unintelligent” design. Front Zool 3:1.

Vilhelmsen L, Miko I, Krogmann L. 2010. Beyond the wasp‐waist: structural diversity

and phylogenetic significance of the mesosoma in apocritan wasps (Insecta:

Hymenoptera). Zool J Linn Soc 159:22-194.

Villeneuve AM, Hillers KJ. 2001. Whence meiosis? Cell 106:647-650.


Werren JH, et al. 2010. Functional and evolutionary insights from the genomes of three

parasitoid Nasonia species. Science 327:343-348.

Wharton R, Marsh P. 1978. New world Opiinae (Hymenoptera: Braconidae) parasitic on

Tephritidae (Diptera). J Wash Acad Sci 68:147-167.

Wu L, Davies SL, Levitt NC, Hickson ID. 2001. Potential role for the BLM helicase in

recombinational repair via a conserved interaction with RAD51. J Biol Chem

276:19375-19381.

Xiao JH, et al. 2013. Obligate mutualism within a host drives the extreme specialization

of a fig wasp genome. Genome Biol 14:1.

Xiang Y, et al. 2007. The inhibition of polo kinase by matrimony maintains G2 arrest in

the meiotic cell cycle. PLOS Biol 5:e323.

Xu S, et al. 2015. Hybridization and the origin of contagious asexuality in Daphnia

pulex. Mol Biol Evol 32:3215-3225.

Yokoyama H, et al. 2004. Preferential binding to branched DNA strands and strand‐

annealing activity of the human Rad51B, Rad51C, Rad51D and Xrcc2 protein

complex. Nucleic Acids Res 32:2556-2565.

Zitouni S, Nabais C, Jana SC, Guerrero A, Bettencourt-Dias M. 2014. Polo-like kinases:

structural variations lead to multiple functions. Nat Rev Mol Cell Biol 15:433-

452.

TABLES

Table 3-1. Organisms used in meiosis gene inventory.

Species | Order | Group | Assembly Name | Assembly Date | Genbank Accession | Submitter/publication
Aedes aegypti | Diptera | N/A | AgamP3 | 2013/12 | GCA_000004015.2 | Nene et al., 2007
Athalia rosae | Hymenoptera | S | Aros_1.0 | 2013/03/05 | GCA_000344095.1 | Qu et al., Baylor College of Medicine
Atta cephalotes | Hymenoptera | A | Attacep1.0 | 2010/07/06 | GCA_000143395.2 | Suen et al., 2011
Bombus terrestris | Hymenoptera | A | Bter_1.0 | 2011/06/03 | GCA_000214255.1 | Sadd et al., 2015
Cephus cinctus | Hymenoptera | S | Ccin1 | 2014/07/09 | GCA_000341935.1 | Robertson et al., University of Illinois at Urbana-Champaign
Ceratosolen solmsi marchali | Hymenoptera | P | CerSol_1.0 | 2013/12/04 | GCA_000503995.1 | Xiao et al., 2013
Copidosoma floridanum | Hymenoptera | P | Cflo_1.0 | 2014/05/01 | GCA_000648655.1 | Qu et al., Baylor College of Medicine
Diachasma alloeum | Hymenoptera | I | Dall1.0 | 2015/10/27 | GCA_001412515.1 | Robertson et al., University of Illinois at Urbana-Champaign
Diachasma muliebre | Hymenoptera | I | N/A | N/A | MF432979-MF433026 | This study
Drosophila melanogaster | Diptera | N/A | Release 6+ISO1 MT | 2014/08/01 | GCA_000001215.4 | Adams et al., 2000; Misra et al., 2002; Celniker et al., 2002
Fopius arisanus | Hymenoptera | I | ASM80636v1 | 2014/12/23 | GCA_000806365.1 | Geib et al., 2017
Megachile rotundata | Hymenoptera | A | MROT_1.0 | 2011/07/19 | GCA_000220905.1 | Robinson et al., University of Maryland
Microplitis demolitor | Hymenoptera | I | Mdem2 | 2015/10/07 | GCA_000572035.2 | Burke et al., 2014
Nasonia giraulti | Hymenoptera | P | Ngir_1.0 | 2010/01/15 | GCA_000004775.1 | Werren et al., 2010
Nasonia longicornis | Hymenoptera | P | Nlon_1.0 | 2010/01/15 | GCA_000004795.1 | Werren et al., 2010
Nasonia vitripennis | Hymenoptera | P | Nvit_2.1 | 2012/11/28 | GCA_000002325.2 | Werren et al., 2010
Neodiprion lecontei | Hymenoptera | S | Nlec1.0 | 2015/08/10 | GCA_001263575.1 | Vertacnik et al., University of Kentucky
Orussus abietinus | Hymenoptera | S | Oabi_1.0 | 2014/04/05 | GCA_000612105.1 | Qu et al., Baylor College of Medicine
Polistes dominula | Hymenoptera | A | Pdom r1.2 | 2015/12/14 | GCA_001465965.1 | Standage et al., 2016
Tribolium castaneum | Coleoptera | N/A | Tcas_3.0 | 2009/07/08 | GCA_000002335.2 | Richards et al., 2008
Trichogramma pretiosum | Hymenoptera | P | Tpre_1.0 | 2014/03/24 | GCA_000599845.2 | Qu et al., Baylor College of Medicine

Assembly names, submission dates, and Genbank accessions are listed; assemblies were accessed on Ensembl (Aedes) and NCBI (all others) during the data collection period. Letter abbreviations in Group column refer to Symphyta (S), Aculeata (A), Ichneumonomorpha (I), and Proctotrupomorpha (P) hymenopteran groups.

Table 3-2. Relative rate tests of meiosis-specific genes in sexual and asexual Diachasma.

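Unique differences in three-taxon test, reported for all sites, 1st & 2nd codon positions, and 3rd codon positions:

Gene | Length (bp) | DA (all sites) | DF (all sites) | DM (all sites) | DA (1st & 2nd positions) | DF (1st & 2nd positions) | DM (1st & 2nd positions) | DA (3rd positions) | DF (3rd positions) | DM (3rd positions)
CORT | 1,635 | 10 | 1 | 1 | 5 | 0 | 1* | 5 | 1* | 0
SPO11 | 1,164 | 10 | 1 | 2* | 4 | 1 | 1 | 6 | 0 | 1*
HOP2 | 618 | 8 | 0 | 0 | 2 | 0 | 0 | 6 | 0 | 0
MND1 | 609 | 5 | 1 | 2* | 2 | 1 | 2* | 3 | 0 | 0
MSH4 | 2,055 | 13 | 9* | 7 | 2 | 4* | 1 | 11 | 5 | 6*
MSH5 | 2,340 | 10 | 1* | 0 | 2 | 0 | 0 | 8 | 1* | 0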

Bolded numbers denote counts of unique differences that were larger in D. ferrugineum (sexual, blue) vs. D. muliebre (asexual, red). *Null model of equal evolutionary rates for D. ferrugineum and D. muliebre could not be rejected using Tajima’s Relative Rate test (p > 0.083 when correcting for multiple comparisons).

FIGURES

Figure 3-1. Overview of key processes involved in meiosis. Gamete differentiation during asymmetric cell divisions (I) facilitates entry into meiosis (II). Prior to meiotic entry, chromosomal content is duplicated via DNA replication, followed by the appearance of the centromere and the synaptonemal complex (III). Synapsis and recombination of homologous chromosomes (IV) occur during Prophase I. Synaptonemal complex machinery disassociates to enable segregation of chromosomes during Anaphase I (V). In Meiosis II, sister chromatids separate (VI) and the final haploid gamete fully develops (VII). Boxes contain meiosis genes investigated in this study. Figure adapted from Hanson et al., 2013.

Figure 3-2. Meiosis gene inventory in Hymenoptera and three additional insect genomes. Genes specific to meiosis are highlighted in bold. Species relationships in the phylogeny are based on a concatenated gene dataset (46 genes, 23,500 columns). Bolded branches indicate bootstrap support > 800/1000 replicates and aLRT values > 0.9 (Figure 3-9). Grey boxes indicate gene is present. Numbers in boxes denote gene copies. Black boxes indicate genes not found in this study. *C(2)M in Drosophila melanogaster and PMS1 in Ceratosolen solmsi marchali display low sequence conservation and were omitted from final sequence alignments for these genes. **Genes from Apis mellifera are shown according to a previous report (Schurko et al., 2010) but were not included in alignments in this study.

Figure 3-3. Maximum likelihood analysis of Cyclin/CDK proteins. A. Phylogeny of Cyclin homologs. Tree generated using an alignment of 185 amino acids. Estimated parameter values: γ = 1.07, logL = −10885.98. B. Phylogeny of CDK homologs. Tree generated using an alignment of 145 amino acids. Estimated parameter values: γ = 0.62, logL = −5804.85. Bolded branches indicate bootstrap support > 800/1000 replicates and aLRT branch support > 0.90. Species include members of Proctotrupomorpha, Ichneumonomorpha, Aculeata, and Symphyta groups (Table 3-1).

Figure 3-4. Maximum likelihood analysis of proteins involved in meiotic cell cycle control. A. Phylogeny of CDC20 homologs. Tree generated using an alignment of 154 amino acids. Estimated parameter values: γ = 1.54, logL = −6151.53. B. Phylogeny of PLK homologs. Tree generated using an alignment of 249 amino acids. Estimated parameter values: γ = 0.68, logL = −7516.02. Bolded branches indicate bootstrap support > 800/1000 replicates and aLRT branch support > 0.90. Species include members of Proctotrupomorpha, Ichneumonomorpha, Aculeata, and Symphyta groups (Table 3-1).

Figure 3-5. Maximum likelihood analysis of SMC proteins. Tree generated using an alignment of 132 amino acids. Estimated parameter values: γ = 0.92, logL = −5192.85. Bolded branches indicate bootstrap support > 800/1000 replicates and aLRT branch support > 0.90. Species include members of Proctotrupomorpha, Ichneumonomorpha, Aculeata, and Symphyta groups (Table 3-1).

Figure 3-6. Maximum likelihood analysis of RAD21/REC8 proteins. Tree generated using an alignment of 131 amino acids. Estimated parameter values: γ = 1.65, logL = −2031.62. Bootstrap proportion values are indicated on branches. Bolded branches indicate aLRT branch support > 0.90. Species include members of Proctotrupomorpha, Ichneumonomorpha, Aculeata, and Symphyta groups (Table 3-1).

Figure 3-7. Maximum likelihood analysis of RAD51 proteins. Tree generated using an alignment of 106 amino acids. Estimated parameter values: γ = 1.77, logL = −11267.02. Bolded branches indicate bootstrap support > 800/1000 replicates and aLRT branch support > 0.90. Species include members of Proctotrupomorpha, Ichneumonomorpha, Aculeata, and Symphyta groups (Table 3-1).

Figure 3-8. Maximum likelihood analysis of proteins involved in mismatch repair. A. Phylogeny of mutL homologs. Tree generated using an alignment of 145 amino acids. Estimated parameter values: γ = 0.74, logL = −7131.10. B. Phylogeny of mutS homologs. Tree generated using an alignment of 164 amino acids. Estimated parameter values: γ = 0.94, logL = −9466.33. Bolded branches indicate bootstrap support > 800/1000 replicates and aLRT branch support > 0.90. Species include members of Proctotrupomorpha, Ichneumonomorpha, Aculeata, and Symphyta groups.

Figure 3-9. Maximum likelihood phylogenetic analysis of concatenated meiotic gene dataset. A. Tree generated using an alignment of 23,500 amino acids comprising 46 genes. Estimated parameter values: γ = 0.57, logL = −198883.98. Bootstrap values (1000 replicates) are indicated on branches. Bolded branches indicate aLRT branch support > 0.90. B. Examples of previously published topologies for Proctotrupomorpha + Ichneumonomorpha + Aculeata groups. Studies use different species representatives but generally agree on monophyly of Proctotrupomorpha, Ichneumonomorpha, Aculeata and paraphyly of Symphyta.

SUPPLEMENTARY DATA

Tables 3-S1 through 3-S19 and Figures 3-S1 through 3-S59 are available at Journal of

Heredity online as Supplementary Tables and Figures published in Tvedte et al. (2017).

URL: https://academic.oup.com/jhered/article/108/7/791/3978731#supplementary-data

CHAPTER 4: GENOME EVOLUTION IN SEXUAL VERSUS ASEXUAL SPECIES OF DIACHASMA

ABSTRACT

A presumed trait enabling sexual lineages to survive over long periods of time is meiotic recombination, which can reduce selective interference between physically proximate regions and thereby redistribute nuclear genomic mutation loads. Loss of sex that is accompanied by loss of recombination is expected to reduce the efficacy of selection across the genome, increasing the accumulated mutation load. In addition, reduced or absent recombination traps the mitochondrial genome in a specific nuclear background; relaxed selection is expected to cause increases in nonsynonymous changes in both genomes.

However, the extent to which mutation accumulation proceeds in asexual lineages that have retained meiotic reproduction is unclear. Here, we characterize global patterns of mutations in sexual and asexual species of Diachasma parasitic wasps. Despite evidence of recombination in asexual Diachasma, we show evidence supporting accelerated rates of molecular evolution in the nuclear genome of this species. Contrary to expectations, asexual Diachasma apparently have not experienced increased evolutionary rates in the mitochondrial genome. The results described here could provide a basis for the limited persistence of asexual lineages in nature; however, more work is needed to link changes in evolutionary rates with fitness effects.

INTRODUCTION

Transitions in reproductive systems are likely to exert a major influence on patterns of genomic evolution. Despite its association with considerable metabolic and genetic costs, sexual reproduction is ubiquitous among eukaryotes. The apparent paradox between theoretical costs of sex and the empirical observation of sex as the predominant reproductive mode in eukaryotes has generated a substantial discourse to explain the relative benefits of sexual reproduction (Otto, 2009; Hartfield & Keightley, 2012).

Alternatively, the persistence of sex could be observed if reproductive transitions are accompanied by costs that would cause fitness declines in asexual lineages. Asexual lineages appear “twiggy” on phylogenetic trees, supporting the view that transitions to asexuality may represent evolutionary dead-ends (Maynard Smith, 1976; Bell, 1982). A prominent prediction relevant to reproduction is that genomes with reduced meiotic recombination experience limitations on the efficacy of selection to act independently on loci (Hill & Robertson, 1966; Felsenstein, 1974). Additionally, if asexual individuals with low genetic loads are lost via drift, the inability to restore the least loaded genotypes via recombination should lead to the irreversible accumulation of deleterious mutations within an asexual lineage (Muller, 1964; Charlesworth & Charlesworth, 1997). High genetic loads in asexual lineages may limit their ability to persist over long evolutionary timespans (Lynch et al., 1993).

The increased availability of genomic data facilitates the investigation of the genomic consequences of reproductive mode transitions. Overall, the absence of canonical recombination is expected to hinder the clearance of harmful mutations. Reduced efficacy of selection in asexuals is expected to produce a genomic “footprint” of an increased ratio of nonsynonymous to synonymous substitutions (Glémin & Galtier, 2012). Although many studies provide support for increased accumulation of putatively deleterious mutations in asexuals, there are notable exceptions (reviewed in Hartfield, 2016). Furthermore, many studies have been based on patterns from a small number of genes. More recently, genome-wide analyses have shown mixed support for increased mutation accumulation in asexual lineages (Hollister et al., 2015; Ament-Velasquez et al., 2016; Lovell et al., 2017; Brandt et al., 2017; Bast et al., 2018; Lindsey et al., 2018). Selection may also act on synonymous sites to mediate usage of alternative codons (Hershberg & Petrov, 2008; Galtier et al., 2018). Following sex loss, less effective selection at these sites may alter the strength of codon usage bias across the genome.

Loss of recombination may also influence the nucleotide composition of the genome in a selection-independent manner. The conversion of A and T alleles to G and C is favored over the reverse in highly recombining regions (Mugal et al., 2015). This GC-biased gene conversion (gBGC) has been suggested to be an important contributor to GC content evolution (Duret & Galtier, 2009; Pessia et al., 2012; Glémin et al., 2014). If an asexual genome is non-recombining, it should experience lower gBGC, and thus possess lower genomic GC content relative to its sexual counterpart. Arrested gBGC was recently detected in asexual lineages of Timema stick insects (Bast et al., 2018).

To assess changes in the rate of molecular evolution across the breadth of a newly asexual genome, we employ wasps in the genus Diachasma (Hymenoptera: Braconidae), parasitoids of the larvae of Rhagoletis flies (Diptera: Tephritidae). The asexual species D. muliebre likely arose from a single loss-of-sex event 10,000 YA to 1 MYA (Forbes et al., 2013). Although D. muliebre and its closest sexual relative D. ferrugineum currently have non-overlapping geographical distributions, they are ecologically similar, parasitizing sister species of Rhagoletis (Muesebeck, 1956; Wharton & Marsh, 1978).

Although Maynard Smith (1978) pointed out that questions surrounding the evolutionary maintenance of sex could be confounded by other factors when sexual and asexual lineages do not co-occur, we are primarily concerned here with investigating the genomic consequences of sex loss and not the maintenance of sex in Diachasma.

Here, we test whether loss of sex in Diachasma is associated with reduced effectiveness of selection. To do this, we obtained next-generation sequencing (NGS) datasets for D. muliebre alongside a closely related sexual species (D. ferrugineum) and genomes from three additional sexual braconid wasps (D. alloeum, Fopius arisanus, Microplitis demolitor). We characterized evolutionary patterns of nuclear and mitochondrial genes in Diachasma to A) compare overall mutation landscapes between sexual and asexual genomes, B) assess the prevalence of putatively deleterious mutations after a loss-of-sex event, and C) analyze biases in codon usage and gene conversion.

MATERIALS & METHODS

Wasp genomic library preparation

We obtained individual female and male Diachasma specimens for NGS extractions. We collected two D. ferrugineum wasps (one female, one male) in Iowa City, IA, U.S.A. (41.39° N, 91.31° W, 204 m elevation), one D. ferrugineum male in South Bend, IN, U.S.A. (41.39° N, 91.31° W, 204 m elevation) and five D. muliebre females in Roslyn, WA, U.S.A. (47.22° N, 120.99° W, 685 m elevation). The D. muliebre individuals were obtained from a larger collection of wasps and contain five previously identified mtCOX1 haplotypes (Forbes et al., 2013). We generated RNA libraries for the Iowa City samples using the TruSeq Stranded mRNA kit (Illumina Inc., San Diego, CA) and DNA libraries for the remaining samples using the KAPA Hyper Prep kit (KAPA Biosystems, Wilmington, MA). To obtain NGS datasets, we sequenced paired-end DNASeq and RNASeq reads (2 x 300) at the University of Iowa using the Illumina MiSeq platform (Illumina Inc., San Diego, CA). We performed quality trimming on reads using Trimmomatic v0.32 (Bolger et al., 2014) and visually checked the trimmed datasets using FastQC (Babraham Bioinformatics).

Reference guided genome assembly and variant calling

Given the haplodiploid genetic system of Diachasma, we ran SNP calling pipelines for the nuclear genome using all female samples (one D. ferrugineum, five D. muliebre). An overview of the NGS dataset processing pipeline is provided in Figure 4-1. We adopted the GATK Best Practices (Van der Auwera et al., 2013, https://software.broadinstitute.org/gatk/best-practices/, accessed 2017) to build distinct pipelines for processing DNASeq vs. RNASeq datasets. The RNA and DNA pipelines differ in three major ways:

First, we used different programs to map trimmed reads to the D. alloeum reference genome. We mapped reads passing quality filters to the D. alloeum genome assembly Dall1.0 (Tvedte et al., in preparation; NCBI Accession GCA_001412515.1).

We mapped DNASeq reads using Bowtie2 (Langmead & Salzberg, 2012) and separately using BWA-MEM (Li et al., 2013), and we mapped RNASeq reads using TopHat2 (Kim et al., 2013). To map reads to the D. alloeum genome, we largely used the default parameters for these tools. For TopHat2, we set the mean inner distance between mate pairs to 450bp (SD of 20bp), and for Bowtie2, we set the maximum fragment length for valid paired-end alignments to 800, as these values were consistent with the fragment lengths of the NGS library preps. Using the genomecov and maskfasta functions from bedtools v2.26.0 (Quinlan & Hall, 2010), we generated new D. alloeum reference sequences with masked regions that had < 2X coverage. Reads were then re-mapped to the masked reference using Bowtie2/BWA-MEM and TopHat2.
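The masking step can be illustrated with a minimal sketch. It is not the bedtools implementation; it only shows the intended outcome, assuming a per-base depth vector is already available (the function name `mask_low_coverage` and the toy data are hypothetical).

```python
def mask_low_coverage(sequence, depth, min_depth=2):
    """Replace bases covered by fewer than min_depth reads with 'N'."""
    if len(sequence) != len(depth):
        raise ValueError("sequence and depth must have the same length")
    return "".join(
        base if d >= min_depth else "N"
        for base, d in zip(sequence, depth)
    )

# Toy reference and per-base read depth (hypothetical values).
reference = "ACGTACGTAC"
per_base_depth = [0, 1, 2, 5, 9, 3, 1, 0, 2, 4]
masked = mask_low_coverage(reference, per_base_depth)
print(masked)  # NNGTACNNAC
```

Positions with depth 0 or 1 become N, so downstream tools cannot report reference-matching bases where read support is essentially absent.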

Second, for the RNASeq dataset, we executed the GATK program SplitNCigarReads prior to variant calling using HaplotypeCaller. This tool splits mapped reads containing Ns corresponding to splice junctions and is appropriate only for RNA datasets.

Third, we used specific parameters for the GATK tool VariantFiltration on DNASeq and RNASeq datasets. The GATK website (https://gatkforums.broadinstitute.org/gatk/discussion/2806/howto-apply-hard-filters-to-a-call-set) provides information on how to apply filters to a variant dataset. Read mappers use different systems for defining the mapping quality (MAPQ) value of reads, and manual inspection of alignments indicated that a single MAPQ filter value would not work well across variant callsets. MAPQ values reflect the probability that a mapping position is incorrect; thus, we set specific MAPQ filters for each mapping software to eliminate variant calls contributed by likely secondary alignments. To do this, we excluded reads with MQ < 2 aligned with Bowtie2, and excluded reads with MQ < 40 aligned with TopHat2 and BWA-MEM. All other filtering parameters were identical for all datasets, namely: QD > 2.0, FS > 60.0, MQRankSum < -12.5, ReadPosRankSum < -8.0, DP > 4.
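As a rough guide to what these thresholds mean, the SAM format defines MAPQ on a Phred scale, MAPQ = -10 log10(P), where P is the probability that the mapping position is wrong. The sketch below applies that nominal conversion. Each mapper assigns MAPQ with its own heuristics, so the numbers are approximate and are not comparable across tools; the function names are hypothetical.

```python
import math

def mapq_to_error_probability(mapq):
    """Nominal probability that a mapping position is wrong (Phred scale)."""
    return 10 ** (-mapq / 10)

def error_probability_to_mapq(probability):
    """Inverse conversion: Phred-scaled MAPQ for a given error probability."""
    return -10 * math.log10(probability)

# The two thresholds named in the text.
for threshold in (2, 40):
    print(threshold, mapq_to_error_probability(threshold))
```

Under this nominal scale, MAPQ 40 corresponds to an error probability of about 1 in 10,000, while MAPQ 2 corresponds to roughly 0.63, which is why the same numeric cutoff cannot be interpreted identically for every mapper.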

Genomic dataset generation

To compare mutation accumulation in DNASeq and RNASeq datasets, we focused on evolutionary patterns in coding DNA sequences (CDS). We used OrthoVenn (Wang et al., 2015) to obtain a set of putative single-copy conserved genes in the D. alloeum assembly and the genomes of two additional parasitic wasps, Fopius arisanus (Geib et al., 2017) and Microplitis demolitor (Burke et al., 2014). Using the GFF file associated with each genome assembly, we ran a custom script to retrieve genome coordinates for all CDS regions. For D. alloeum and F. arisanus, we provided CDS coordinates as the -L parameter to retrieve CDS for each gene using the GATK tool FastaReferenceMaker. For D. ferrugineum and D. muliebre, we executed a similar strategy, using the appropriate masked genome as the -R parameter and the VCF file from the GATK Best Practices pipeline as the -V parameter using the GATK tool FastaAlternateReferenceMaker to obtain CDS for these wasp species.
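The custom script is not reproduced here. As a hypothetical sketch of the coordinate-retrieval step, the function below reads GFF-formatted text and returns one interval string per CDS feature in the contig:start-end form; the function name `cds_intervals` and the example records are assumptions made for illustration.

```python
def cds_intervals(gff_text):
    """Return 'seqid:start-end' strings for every CDS feature in GFF text."""
    intervals = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # keep only well-formed CDS records
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        intervals.append(f"{seqid}:{start}-{end}")
    return intervals

# Hypothetical GFF records for a single two-exon gene.
example_gff = "\n".join([
    "##gff-version 3",
    "scaffold_1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "scaffold_1\tsrc\tCDS\t120\t300\t.\t+\t0\tParent=mrna1",
    "scaffold_1\tsrc\tCDS\t450\t880\t.\t+\t2\tParent=mrna1",
])
print(cds_intervals(example_gff))
```

GFF coordinates are 1-based and inclusive, so the start and end values are passed through unchanged.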

For each of the genes in the dataset, we tested whether the asexual lineage D. muliebre possesses a greater mutational load relative to the sexual lineage, D. ferrugineum. To do this, we produced codon-aware CDS alignments using PAL2NAL (Suyama, 2006), providing combined information from MUSCLE protein alignments (Edgar, 2004) and corresponding nucleotide sequences. We retained only genes having full CDS coverage in all wasp species, and calculated p-distances for each gene using the ‘ape’ package in R (Paradis et al., 2004). Specifically, we measured p-distances for D. muliebre (pdistDM) and D. ferrugineum (pdistDF) relative to a common sexual outgroup, D. alloeum. Although p-distance values include evolutionary changes that have occurred in D. alloeum and the stem lineage preceding the divergence of D. muliebre and D. ferrugineum, using D. alloeum as a common reference point for comparisons allows us to infer whether a greater number of mutational differences have occurred in the asexual or sexual lineage. To achieve this, we first calculated the difference between pairwise distances (pdistDM – pdistDF) for each gene, and subsequently conducted a chi-square analysis to test against a null distribution of equal numbers of genes with greater mutational differences in sexuals and asexuals ((pdistDM > pdistDF) = (pdistDM < pdistDF)).
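The per-gene comparison can be sketched as follows. This is a minimal illustration in Python rather than the ‘ape’ workflow used in the study; it assumes that sites with gaps or ambiguous bases are skipped, and the function names and toy alignments are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    unambiguous = set("ACGT")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in unambiguous and b in unambiguous:  # skip gaps and ambiguity codes
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared

def count_direction(alignments):
    """Count genes with pdistDM > pdistDF, pdistDM < pdistDF, and ties.

    alignments maps gene name -> (outgroup, sexual, asexual) aligned CDS.
    """
    higher_asexual = higher_sexual = equal = 0
    for outgroup, sexual, asexual in alignments.values():
        diff = p_distance(asexual, outgroup) - p_distance(sexual, outgroup)
        if diff > 0:
            higher_asexual += 1
        elif diff < 0:
            higher_sexual += 1
        else:
            equal += 1
    return higher_asexual, higher_sexual, equal

# Toy alignments (hypothetical sequences).
genes = {
    "geneA": ("ATGGCCAAATTT", "ATGGCCAAATTT", "ATGGCTAAATTT"),
    "geneB": ("ATGCCCGGG", "ATGCCAGGG", "ATGCCCGGG"),
    "geneC": ("ATGAAA", "ATGAAA", "ATGAAA"),
}
print(count_direction(genes))  # (1, 1, 1)
```

The first two counts are the quantities compared against the null expectation of equal numbers in the chi-square analysis.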

Variation in library preparation, sequencing, and assembly methods might generate distinct artefacts in our pipeline. To reduce the likelihood of problematic variant calling in NGS datasets, we manually inspected a subset of genes to assess whether GATK produces accurate SNP calls across NGS dataset types (DNASeq and RNASeq) and read mappers (Bowtie2, BWA-MEM, and TopHat2). We focused on genes that showed the largest absolute differences in pairwise distance values, |pdistDM – pdistDF|, as these genes could represent regions showing the greatest number of mutational differences in our dataset or alternatively could be the consequence of spurious mapping.

The above analyses treat each gene as an independent unit; however, the actual dataset might be biased towards genes contained in linkage groups with nonindependent selective pressures. To assess biases in the genome locations of sampled genes, we determined the assembly scaffold of each gene in the dataset. Next, we counted i) the scaffolds sampled by the 10% of genes with the largest absolute differences in pairwise distance values, and ii) the subdivisions of the scaffolds in i) according to whether the larger pairwise distance value was in D. ferrugineum or D. muliebre. If there are strong biases in the dataset, the scaffolds contained in the subdivided datasets in ii) should be distinct pools of scaffolds from the assembly.
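The scaffold tally amounts to simple set operations. The following is a minimal sketch with hypothetical gene and scaffold identifiers; the function name `scaffold_overlap` is an assumption, not part of the original pipeline.

```python
def scaffold_overlap(gene_to_scaffold, asexual_top_genes, sexual_top_genes):
    """Count scaffolds sampled by each gene subset, their union, and their overlap."""
    asexual_scaffolds = {gene_to_scaffold[g] for g in asexual_top_genes}
    sexual_scaffolds = {gene_to_scaffold[g] for g in sexual_top_genes}
    return {
        "asexual": len(asexual_scaffolds),
        "sexual": len(sexual_scaffolds),
        "combined": len(asexual_scaffolds | sexual_scaffolds),
        "shared": len(asexual_scaffolds & sexual_scaffolds),
    }

# Toy mapping of genes to assembly scaffolds (hypothetical identifiers).
gene_to_scaffold = {"g1": "s1", "g2": "s1", "g3": "s2",
                    "g4": "s2", "g5": "s3", "g6": "s4"}
summary = scaffold_overlap(gene_to_scaffold, ["g1", "g3", "g5"], ["g2", "g4", "g6"])
print(summary)
```

If the two subsets sampled distinct pools of scaffolds, the shared count would be near zero and the combined count would approach the sum of the two subset counts.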

Intraspecific comparisons

To assess whether there is substantial variation in the nuclear genome mutation load among individual asexual females, we sequenced DNASeq datasets from five D. muliebre females. Applying the methods described above, we retrieved a subset of the conserved genes that had full-length coverage in the single D. ferrugineum female and all D. muliebre females. After calculating pairwise distances for each wasp relative to D. alloeum, we performed multiple t-tests to test for differences among the asexual distributions, applying a Bonferroni correction for multiple comparisons.

Analogous comparisons of intraspecific variation in D. ferrugineum necessarily require females, since haploid males will lack a subset of heterozygous SNPs. We sequenced an additional female D. ferrugineum using a restriction-fragment based reduced representation library (RRL) approach (Altshuler et al., 2000). After digesting whole-genome DNA with EcoRI, the RRL was prepared with the Nextera DNA Library Prep kit (Illumina Inc., San Diego, CA). Next, we used the Bowtie2+GATK workflow described above to call SNPs. After masking regions with < 2X coverage in this dataset, we used bedtools merge and intersect functions to generate a single bed file containing all CDS regions with > 2X coverage in two D. ferrugineum females and five D. muliebre females. For each wasp dataset, we calculated the number of SNPs as well as the frequency of homozygous vs. heterozygous SNPs using VCFtools (Danecek et al., 2011).
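The homozygous versus heterozygous tally can be illustrated with a small sketch that reads genotype (GT) fields from VCF-formatted text. It is not a substitute for VCFtools; it assumes diploid genotype calls in the first sample column, and the function name and example records are hypothetical.

```python
def count_genotypes(vcf_text, sample_index=0):
    """Count heterozygous and homozygous-alternate SNP calls for one sample."""
    counts = {"homozygous_alt": 0, "heterozygous": 0}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip header lines
        fields = line.split("\t")
        genotype = fields[9 + sample_index].split(":")[0]
        alleles = genotype.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        if alleles[0] != alleles[1]:
            counts["heterozygous"] += 1
        elif alleles[0] != "0":
            counts["homozygous_alt"] += 1
    return counts

# Hypothetical single-sample VCF records.
example_vcf = "\n".join([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "scaffold_1\t101\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12",
    "scaffold_1\t250\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1/1:9",
    "scaffold_2\t77\t.\tG\tA\t45\tPASS\t.\tGT:DP\t1/1:15",
    "scaffold_2\t90\t.\tT\tC\t30\tPASS\t.\tGT:DP\t./.:0",
])
print(count_genotypes(example_vcf))
```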

Relative rate analyses

In addition to evaluating mutational patterns for each nuclear gene individually, we performed a global analysis to characterize whether sex loss in Diachasma is associated with the accumulation of deleterious mutations. We combined all CDS regions with full coverage into a single concatenated dataset. Next, we used the GTR+CAT model implemented in RAxML (Stamatakis, 2006) to conduct a maximum likelihood analysis in order to estimate branch lengths between sexual and asexual species. We performed three RAxML runs, using 1) all CDS sites, 2) 1st+2nd codon positions (≈ nonsynonymous sites), and 3) 3rd codon positions (≈ synonymous sites).
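Partitioning an in-frame CDS by codon position can be sketched in a few lines. This is a minimal illustration, assuming the sequence starts at the first codon position and has a length divisible by three; the function name is hypothetical.

```python
def split_codon_positions(cds):
    """Split an in-frame CDS into (1st+2nd positions, 3rd positions)."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    first_second = "".join(cds[i:i + 2] for i in range(0, len(cds), 3))
    third = cds[2::3]
    return first_second, third

# Codons ATG, GCC, AAA.
print(split_codon_positions("ATGGCCAAA"))  # ('ATGCAA', 'GCA')
```

Applying the same split to every sequence in a concatenated alignment yields the two reduced alignments used as approximations of nonsynonymous and synonymous sites.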

We used the implementation of Tajima’s Relative Rate test in MEGA6 (Tajima, 1993; Tamura et al., 2013) to test for evolutionary rate differences between D. muliebre and D. ferrugineum. The test requires a three-taxon sequence dataset with a priori knowledge of which taxon is the outgroup relative to the other two. Life history, morphology, and genetic data all support D. ferrugineum as the closest sexual relative to D. muliebre, with D. alloeum as the outgroup (Wharton & Marsh, 1978; Forbes et al., 2009; Hamerlinck et al., 2016). We calculated the number of observed unique SNPs in the concatenated dataset for each Diachasma species. Next, a chi-square test compares the observed number of unique differences to the expected number, given equal evolutionary rates. We performed three separate analyses to test rate differences across all sites as well as approximate nonsynonymous (1st+2nd codon positions) and synonymous (3rd codon positions) sites.
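The logic of the test can be sketched as follows. With m1 and m2 denoting the numbers of sites at which only the first or only the second ingroup sequence differs from the other two, the statistic is χ2 = (m1 − m2)² / (m1 + m2) with one degree of freedom. The code is a minimal illustration, not the MEGA6 implementation, and the toy sequences are hypothetical.

```python
import math

def tajima_relative_rate(seq1, seq2, outgroup):
    """Return (m1, m2, chi2, p) for a three-taxon relative rate test."""
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if a != b:
            if b == o:
                m1 += 1  # difference unique to seq1
            elif a == o:
                m2 += 1  # difference unique to seq2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p

# Toy aligned sequences (hypothetical).
outgroup = "ACGTACGTACGT"
asexual = "TCGTTCGTTCGA"   # four unique differences
sexual = "ACGTACCTACGT"    # one unique difference
print(tajima_relative_rate(asexual, sexual, outgroup))
```

Sites at which all three sequences differ are ignored, because they carry no information about which ingroup lineage changed.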

Codon usage bias and gBGC

We conducted two separate tests to evaluate differences in genome-wide codon usage between D. ferrugineum and D. muliebre females. First, we determined the effective number of codons (ENC) for each gene (Wright, 1990). The range of ENC values reflects codon usage for each amino acid, from 20 (each amino acid encoded by one codon) to 61 (all codons used equally). We calculated ENC using the program ENCprime (https://github.com/jnovembre/ENCprime), which accounts for background nucleotide compositions to enable cross-species comparisons (Novembre, 2002). Second, we used the Composition Analysis Toolkit (CAT, http://www.cbrc.kaust.edu.sa/CAT/) to calculate per-gene codon deviation coefficient (CDC) values (Zhang et al., 2012). CAT uses background GC and purine content for each codon position to calculate an expected codon usage value and calculates CDC as a deviation from this expectation. CDC values range from 0 (no deviation from expectation, i.e. no selection on codon usage) to 1 (maximum deviation, i.e. effective purifying selection). To assess gBGC, we used CAT (http://www.cbrc.kaust.edu.sa/CAT/) to calculate per-gene GC content at third codon positions (GC3), which are most likely to be evolving neutrally. Higher GC3 values reflect preferential conversion of A and T alleles to G and C.
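GC3 itself is a simple proportion. The sketch below computes it directly from an in-frame CDS; it is not the CAT implementation, and it assumes that ambiguous bases at third positions are excluded from the denominator (the function name is hypothetical).

```python
def gc3(cds):
    """Fraction of G or C among unambiguous bases at third codon positions."""
    third_positions = cds.upper()[2::3]
    counted = [base for base in third_positions if base in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous third-position bases")
    return sum(base in "GC" for base in counted) / len(counted)

# Codons ATG, GCC, AAA have third-position bases G, C, A.
print(gc3("ATGGCCAAA"))
```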

Mitochondrial genome assembly and analyses

To evaluate patterns of mutation accumulation in the protein-coding regions of the mitochondrial genome, we used DNASeq samples (one D. ferrugineum male, five D. muliebre females). We assembled whole mitochondrial genomes de novo using NOVOPlasty v2.6.3 (Dierckxsens et al., 2017) with a cytochrome c oxidase I (COX1) sequence from D. alloeum to seed the assembly. To assist in annotation efforts, we used thirteen protein-coding regions from Diachasmimorpha longicaudata (Genbank Accession GU097655.1, Wei et al., 2010). Similar to the nuclear genome analyses, we calculated per-gene p-distances, inferred RAxML phylogenies from concatenated datasets, performed relative rate tests, and calculated codon usage bias statistics.

Statistical analyses

For comparisons of pairwise distances, ENC, CDC, and GC3 in the nuclear and mitochondrial genomes, we performed statistical tests to determine whether the distributions of values for D. muliebre and D. ferrugineum were different. We tested each dataset for normality using the Shapiro-Wilk method. Since all datasets either i) failed tests of normality and/or ii) had low gene sample sizes (i.e. mitochondrial genes) (see Supplementary Table 4-S6), we conducted a Wilcoxon rank sum test as a conservative measure to determine whether or not sexual and asexual wasps possess similar pairwise distance distributions.
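For readers unfamiliar with the rank sum test, the sketch below shows its core calculation using a normal approximation. It is a simplified illustration, assuming no tie correction of the variance and no continuity correction, so its p-values will differ slightly from those of standard statistical packages; the function name and toy values are hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided p-value from a normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1  # extend over a run of tied values
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Toy per-gene distances (hypothetical values).
sexual_distances = [0.001, 0.002, 0.003]
asexual_distances = [0.004, 0.005, 0.006]
print(rank_sum_test(sexual_distances, asexual_distances))
```

Because the test uses ranks rather than raw values, it does not require the normality that the Shapiro-Wilk results ruled out.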

RESULTS

The five D. muliebre DNASeq libraries averaged 8.83 Gb of sequence data (SD = 748.48 Mb). Among the D. ferrugineum samples, the male DNASeq, female RNASeq, and male RNASeq libraries contained 9.73 Gb, 4.99 Gb, and 4.92 Gb, respectively. The nonredundant dataset of D. alloeum contained 12,791 nuclear genes. When analyzed alongside 10,971 genes from F. arisanus and 12,088 genes from M. demolitor, OrthoVenn generated 7,910 single-copy gene clusters. Of these, we used 3,127 (39.53%) with full CDS coverage in all wasp species compared in this study, including individual D. ferrugineum and D. muliebre females.

Manual inspection of SNP calls on a subset of genes indicated that mapping DNASeq reads with Bowtie2 produced fewer likely false-positive SNPs than mapping with BWA-MEM in D. muliebre; therefore, we used the Bowtie2 datasets for the remainder of the analyses. By analyzing 50 genes with the highest pdistDM value relative to pdistDF (Supplementary Table 4-S1), we identified 477 SNPs in D. muliebre, 9 (1.89%) of which were determined to be false positives and were contained in a single gene. We confirmed 146 SNPs in D. ferrugineum in these same genes; zero were false positives. Additionally, we identified zero and three SNPs that were false negatives in D. muliebre and D. ferrugineum, respectively. In the 50 genes with the highest pdistDF values relative to pdistDM (Supplementary Table 4-S2), we identified 259 SNPs in D. ferrugineum and 111 SNPs in D. muliebre, none of which were false positives. Two genes contained false negative SNPs in D. muliebre; however, one was part of a tandem duplication and was subsequently removed from the dataset. Overall, our SNP calling pipeline was robust and displayed consistent performance across NGS dataset types (DNASeq and RNASeq) and read mappers (Bowtie2 and TopHat2).

Loss of sex is associated with a global shift in the mutation landscape

In asexuals, the reduced ability of recombination to restore mutation-free individuals is expected to increase the mutational load within the lineage over time. When surveying mutational load on a gene-by-gene basis, decreased functionality of recombination in asexuals should decrease the proportion of mutation-free genes. Consistent with this prediction, 149 genes were identical between D. ferrugineum and D. alloeum (pdistDF = 0), compared to only 133 identical genes between D. muliebre and D. alloeum (pdistDM = 0). Moreover, 1,219 genes had higher pairwise distance values in D. muliebre (pdistDM > pdistDF), while 899 genes had higher pairwise distances in D. ferrugineum (pdistDM < pdistDF) (Figure 4-2). These proportions represent a statistically significant deviation from the null expectation (χ2 = 48.35, p < 0.0001), indicating that there is likely an overrepresentation of genes possessing a higher number of mutations in D. muliebre.
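The reported statistic can be reproduced from the two gene counts, assuming a one-degree-of-freedom goodness-of-fit test against an equal split (the function name is hypothetical).

```python
import math

def equal_split_chi_square(count_a, count_b):
    """Chi-square goodness-of-fit test against an expected 50:50 split (1 df)."""
    expected = (count_a + count_b) / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

chi2, p = equal_split_chi_square(1219, 899)
print(round(chi2, 2), p < 0.0001)  # 48.35 True
```

With 2,118 genes showing a difference in either direction, the expected count under the null is 1,059 per category, and each observed count deviates from it by 160.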

Confirming this pattern, comparisons of pairwise distance values across the entire gene set indicated a prominent rightward shift of the D. muliebre mutation landscape relative to D. ferrugineum (Figure 4-3). There was a significant difference between mean pairwise distances for D. muliebre genes (MpdistDM = 0.005496, SD = 9.365E-6) relative to D. ferrugineum genes (MpdistDF = 0.005089, SD = 7.953E-6) when compared to D. alloeum homologs, suggesting the potential for a larger mutational load in the asexual lineage (Wilcoxon rank sum p = 4.17E-7).

The distinct distributions between sexual and asexual wasps could be confounded by the non-independence of genes contained in linkage groups. To address this, we quantified the number of scaffolds sampled in the entire gene dataset, as well as the scaffolds containing genes with divergent molecular evolution in D. ferrugineum and D. muliebre. Of 3,968 total scaffolds in the D. alloeum assembly, 274 (7.42%) contained the 3,127 genes used for pairwise comparisons. The low percentage is unsurprising given that we restricted our gene set to those conserved between D. alloeum and two other braconid wasps, which are most likely contained on a small number of long scaffolds in the assembly. The subset of genes with the highest pairwise distances in D. muliebre relative to D. ferrugineum (313 genes, ~10%) was contained in 102 scaffolds, whereas the subset of genes with the reverse pattern (highest in D. ferrugineum) was contained in 121 scaffolds. Finally, the combined set of 626 genes was on 141 scaffolds, providing evidence that the subsets of scaffolds at the ends of the distribution are largely overlapping. In other words, many scaffolds likely contain both genes showing accelerated evolution in D. ferrugineum and genes showing accelerated evolution in D. muliebre, and the acceleration in molecular evolution in the asexual wasp species is not explained by selection acting at linked sites.
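The degree of scaffold overlap follows from the three counts given above by inclusion-exclusion. The short Python sketch below makes that arithmetic explicit; the variable names are illustrative, and the only inputs are the scaffold counts reported in the text.

```python
# Scaffold counts reported in the text.
scaffolds_dm_high = 102  # scaffolds holding the genes most accelerated in D. muliebre
scaffolds_df_high = 121  # scaffolds holding the genes most accelerated in D. ferrugineum
scaffolds_union = 141    # scaffolds holding the combined set of 626 genes

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
shared = scaffolds_dm_high + scaffolds_df_high - scaffolds_union

print(f"scaffolds shared by both subsets: {shared}")
print(f"fraction of D. muliebre-accelerated scaffolds shared: {shared / scaffolds_dm_high:.3f}")
print(f"fraction of D. ferrugineum-accelerated scaffolds shared: {shared / scaffolds_df_high:.3f}")
```

On these counts, most scaffolds in either subset also carry genes from the other subset, which is the sense in which the two ends of the distribution overlap.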

Loss of sex is associated with accelerated nuclear genome evolution

To further evaluate patterns of mutation accumulation across Diachasma genomes, we combined CDS from all genes into a concatenated dataset. Given the recency of the loss-of-sex event in these wasps (Forbes et al., 2013), the signal generated by differences in mutation accumulation is likely to be low, and therefore comparisons involving any single gene would not be informative. The RAxML maximum likelihood analysis of the concatenated CDS displayed an increase in the overall rate of molecular evolution in D. muliebre relative to D. ferrugineum (Figure 4-4A). The total number of unique differences in D. muliebre was greater than in D. ferrugineum (χ2 = 256.91, p < 0.00001), supporting the observations from pairwise distance data (Table 4-1). As a conservative measure, we applied the D. muliebre false-positive rate to the concatenated dataset to compare wasp genomes (the application of the false-negative rate as a proportion of true negatives in this case is negligible given the total length of the alignments). If ~108 (1.89%) D. muliebre SNPs are false positives, this individual would have 5,634 SNPs in the concatenated dataset. Even after applying this conservative correction, the number of unique differences was significantly greater than the 4,148 observed in D. ferrugineum (χ2 = 1416, p << 0.005).
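The false-positive correction can be written out as a short calculation. The Python sketch below assumes that the 1.89% rate corresponds to the 9 false positives among the 477 manually inspected D. muliebre SNPs in the Total row of Supplementary Table 4-S1, which matches the reported figure; the variable names are illustrative.

```python
# Counts taken from Supplementary Table 4-S1 (Total row) and Table 4-1.
fp_calls = 9          # manually confirmed false-positive SNPs in D. muliebre
checked_calls = 477   # D. muliebre SNPs inspected manually
observed_snps = 5742  # unique D. muliebre differences in the concatenated dataset

# Assumption: the false-positive rate from the inspected genes applies genome-wide.
fp_rate = fp_calls / checked_calls
expected_fp = observed_snps * fp_rate
corrected_snps = round(observed_snps - expected_fp)

print(f"false-positive rate: {fp_rate:.2%}")
print(f"expected false positives: {expected_fp:.0f}")
print(f"corrected SNP count: {corrected_snps}")
```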

To distinguish between effects generated by accelerated mutation rate and reduced efficacy of selection, we parsed datasets by codon positions. An increase in the intrinsic mutation rate in an asexual lineage would be characterized by a greater number of mutational changes at selectively neutral sites (e.g. synonymous sites). Here, we use phylogenetic analyses of an alignment of 3rd codon positions to estimate this effect. D. muliebre had a longer branch length relative to D. ferrugineum when using 3rd codon position partitions, and the relative rate test indicated a significant excess of unique differences in D. muliebre, suggesting a higher mutation rate in asexual wasps (χ2 = 142.76, p < 0.00001) (Table 4-1, Figure 4-4B). If asexuality is associated with a reduction in the efficacy of selection to act independently on physically proximate loci to remove mildly deleterious mutations, we would expect to observe an increased incidence of mutations at nonsynonymous sites, estimated here using 1st and 2nd codon positions. Similar to 3rd codon position observations, D. muliebre had longer branch lengths compared to those of its closest sexual relative (Figure 4-4C). The number of unique differences was substantially higher in D. muliebre at 1st/2nd positions (χ2 = 123.24, p < 0.00001), and the asexual genome also contained a larger proportion of total unique differences occurring at 1st/2nd codon positions (Table 4-1). This pattern suggests that D. muliebre possesses a larger mutational load relative to D. ferrugineum, with putatively deleterious mutations more often retained in the asexual lineage.
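The relative rate statistics for the three partitions can be recomputed from the unique-difference counts in Table 4-1. The Python sketch below assumes the one-degree-of-freedom form of the relative rate test (Tajima, 1993), in which the two lineages are expected to carry equal numbers of unique differences; the function name and dictionary are illustrative and are not taken from the software used in the analysis.

```python
import math


def relative_rate_chi2(m1, m2):
    """Return (chi2, p) for unique differences m1 and m2 in two lineages.

    Under equal rates both lineages are expected to carry (m1 + m2) / 2
    unique differences, which gives a chi-square with 1 degree of freedom.
    """
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Unique differences from Table 4-1: (D. ferrugineum, D. muliebre)
counts = {
    "all sites": (4148, 5742),
    "1st/2nd codon positions": (1113, 1702),
    "3rd codon positions": (3035, 4040),
}

for partition, (df_count, dm_count) in counts.items():
    chi2, p = relative_rate_chi2(df_count, dm_count)
    share = dm_count / counts["all sites"][1]
    print(f"{partition}: chi2 = {chi2:.2f}, p = {p:.3g}, share of DM differences = {share:.2%}")
```

Computing the upper-tail probability directly, rather than relying on a fixed number of displayed decimals, avoids reporting a p-value of exactly zero.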

Low levels of intraspecific variation in Diachasma

NGS sequence data from five D. muliebre females mapped to 1,764 complete CDS regions. Differences in pairwise distances between pairs of D. muliebre females were not statistically significant (10 comparisons, Wilcoxon rank sum p > 0.005 with Bonferroni correction), providing no evidence for variation in mutation accumulation landscapes within asexuals (Supplementary Table 4-S3, Supplementary Figure 4-S1, Supplementary Figure 4-S2).
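The number of comparisons and the corrected threshold follow directly from the sample size. The Python sketch below enumerates the pairs; the labels for the five females are placeholders, and the family-wise error rate of 0.05 is an assumption inferred from the 0.005 threshold quoted above.

```python
from itertools import combinations

# Placeholder labels for the five D. muliebre females.
females = ["I", "II", "III", "IV", "V"]

# Every unordered pair of females is one Wilcoxon rank sum comparison.
pairs = list(combinations(females, 2))

alpha = 0.05                    # assumed family-wise error rate
threshold = alpha / len(pairs)  # Bonferroni-corrected per-test threshold

print(f"{len(pairs)} comparisons, per-test threshold = {threshold:.3f}")
```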

The total length of CDS regions with > 2X coverage in sexual and asexual Diachasma females was 734,668 bp. At these sites, D. muliebre females had a consistently higher number of total SNPs relative to D. ferrugineum (Supplementary Table 4-S4). The proportions of variant sites in these regions (0.00546 in D. muliebre I, 0.00496 in D. ferrugineum RNA) were similar to the mean pairwise distance values obtained in the full CDS dataset (0.005496 in D. muliebre I, 0.005089 in D. ferrugineum RNA). Of the total SNPs, ~1% were heterozygous in D. muliebre, a considerably smaller percentage than found in D. ferrugineum wasps (~18-20%), supporting the view that asexual reproduction in Diachasma is accompanied by a reduction in genomic heterozygosity (Supplementary Table 4-S4).

Distinct pattern of mitochondrial mutation accumulation

Proper mitochondrial function requires high-fidelity interactions between nuclear and mitochondrial gene products. In asexuals, co-transmission of the nuclear and mitochondrial genomes increases LD across both genomes. Accelerated evolution in mitochondrial genomes in asexuals could reflect a) the reduction in efficacy of selection to act independently on linked genomes or b) the accommodation of compensatory changes in mitochondrial genes in cases where nuclear genome evolution is accelerated.

In contrast to the pattern observed in the nuclear gene dataset, mitochondrial genes showed longer branch lengths in D. ferrugineum relative to D. muliebre (Figure 4-5). This pattern was consistent across subdivided codon position datasets, and was particularly apparent when considering 1st/2nd codon positions, supporting effective purifying selection in asexuals (Figure 4-6, Figure 4-7).

The relative rate analyses of mitochondrial genes are summarized in Table 4-2, here tested between the same wasp individuals as used in the nuclear dataset (i.e. D. muliebre I vs. D. ferrugineum RNA). Of 13 protein-coding genes, none showed a significant deviation from equal evolutionary rates after correcting for multiple comparisons (Supplementary Table 4-S5). When the genes were concatenated, D. ferrugineum had a significantly higher number of changes relative to D. muliebre when considering all sites and 1st/2nd codon positions, although the total number of differences is relatively small and could be largely influenced by gene-specific effects. Indeed, removing COX1 from the concatenated analysis results in the loss of significance of the all sites comparisons (Supplementary Table 4-S5). Moreover, mean per-gene distance values are similar between sexual and asexual wasps, a result that was consistent when considering a dataset containing all sites and datasets subdivided by codon positions (Table 4-3).

Loss of sex is not associated with changes in codon usage or nucleotide composition

Selectionist and mutational hypotheses provide non-mutually exclusive explanations for codon bias (Plotkin & Kudla, 2011). From a selectionist perspective, biased use of bases at synonymous sites can contribute to organismal fitness via increased translational efficiency and accuracy. We tested for selective pressures on synonymous sites by measuring the effective number of codons (ENC) and codon deviation coefficients (CDC) in 3,127 genes with full coverage in individual D. ferrugineum and D. muliebre wasps. In the nuclear and mitochondrial genomes, distributions of ENC and CDC values were comparable in the two wasps (Wilcoxon rank sum p > 0.9 in all cases), indicating similar codon usage biases across reproductive modes (Table 4-4). The results suggest that accelerated evolution in D. muliebre is not explained by relaxed selection on codon usage.

A potential mutational contributor to differences in codon bias and genome-wide nucleotide composition is GC-biased gene conversion (gBGC). We used GC3 content as a proxy for biased gene conversion, measuring it for the 3,127 genes described above (Table 4-4). In both nuclear and mitochondrial genomes, per-gene GC3 distributions were not significantly different in D. ferrugineum and D. muliebre (Wilcoxon rank sum p ≥ 0.47), consistent with the maintenance of gBGC activity in asexual wasps.
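GC3 is the fraction of G or C bases at third codon positions of a coding sequence. The Python sketch below shows one way to compute it for a single in-frame CDS; the function name, the handling of ambiguous bases, and the inclusion of the stop codon are assumptions made for illustration and do not describe the software used to produce Table 4-4.

```python
def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    third_positions = cds[2::3]  # every third base, starting at codon position 3
    # Assumption: ambiguous bases (e.g. N) are skipped rather than counted.
    called = [base for base in third_positions if base in "ACGT"]
    if not called:
        raise ValueError("no unambiguous third codon positions")
    return sum(base in "GC" for base in called) / len(called)


# Toy sequence (not from Diachasma): ATG GCC AAG TTC GGA TAA
toy_cds = "ATGGCCAAGTTCGGATAA"
print(f"GC3 = {gc3(toy_cds):.3f}")
```

Applying such a function to each gene in each wasp would yield the two per-gene GC3 distributions that are then compared with a Wilcoxon rank sum test.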


DISCUSSION

The transition from sexual to asexual reproduction is expected to be associated with accelerated genomic mutation accumulation. Here, we provide genome-wide evidence for higher genetic loads in asexual Diachasma wasps compared to sexual relatives, consistent with theoretical predictions. Differences in mutation accumulation between D. muliebre and D. ferrugineum may be in part due to a slowdown in the intrinsic rate of evolution in D. ferrugineum, as the sum of branch lengths was smaller than the branch leading to the D. alloeum lineage (Figure 3). However, this explanation is not sufficient to explain the observed pattern, as the maximum likelihood analyses supported longer branch lengths in D. muliebre relative to both sexual lineages. D. muliebre showed a higher incidence of mutations occurring at 1st and 2nd codon positions, changes which are more likely to be deleterious. Previous comparative genomics studies across reproductive modes have shown the reduced efficacy of purifying selection to remove harmful mutations in clonal asexual lineages (Hollister et al., 2015; Lovell et al., 2017; Bast et al., 2018), while others show no effect (Ament-Velásquez et al., 2016) or more effective selection in asexuals (Brandt et al., 2017). Also, asexual Diachasma individuals possessed increased synonymous changes compared to sexuals, although this pattern is unlikely to be a consequence of changes in selection for codon usage bias or gBGC.

To our knowledge, this study represents the first demonstration of accelerated mutation accumulation in a meiotically-reproducing asexual organism. In a recent analysis involving sexual and automictic asexual Trichogramma pretiosum wasps, Lindsey et al. (2018) did not find increased rates of protein evolution in asexuals, but the exclusion of a sexual outgroup to T. pretiosum prohibited genome-wide rate tests at the nucleotide level. Although the precise mechanism of automixis in D. muliebre is unclear, separate lines of evidence support the maintenance of meiotic reproduction in this asexual species (Forbes et al., 2013; Tvedte et al., 2016). The high homozygosity of asexual SNP datasets and similar distributions of GC3 content across reproductive modes strengthen this hypothesis.

Does meiosis hinder mutation accumulation in asexual lineages?

Characterizing the costs and benefits of asexuality is complicated by the maintenance of meiotic egg production in automictic lineages. The mutation landscape distributions of D. ferrugineum and D. muliebre are largely overlapping, indicating that the presence of a modified form of meiotic recombination may act as a buffer against more pronounced levels of mutation accumulation. Initial transitions to automictic recombination might be disadvantageous due to the uncovering of deleterious recessive alleles; however, this component of genetic load is likely to be lower in haplodiploids (e.g. Diachasma), where recessive harmful variants would be fully expressed in males (Archetti, 2004). A recent in silico study suggested that automictic lineages could possess greater genetic diversity and lower genetic loads relative to sexual lineages (Engelstädter, 2017), but there is a current dearth of empirical studies that directly test mutation hypotheses using automictic taxa. Our results could be especially relevant for the study of broader patterns in haplodiploid arthropods, which are particularly rich in reproductive transitions to obligate asexuality (van der Kooi et al., 2017) and possess a wide variety of automictic mechanisms (Suomalainen et al., 1987; Stenberg & Saura, 2009).

How are mutational changes realized in asexual Diachasma?

An alternate explanation for the mutation landscape overlap between sexual and asexual Diachasma is the relatively recent origin of asexuality. Studied asexual systems are often young, which may contribute to mixed evidence of accelerated mutation accumulation when a limited number of loci are sequenced. The patterns observed in Diachasma reinforce the utility of NGS datasets to obtain ample signal for investigating recent loss-of-sex events. Even so, we cannot eliminate the possibility that the recency of asexuality precludes our ability to observe effects of sex loss, e.g. relaxed selection on codon usage.

In our study, comparisons across reproductive modes are largely based on single individuals as proxies for entire lineages. Whole-genome NGS datasets from five previously-identified lineages of D. muliebre wasps exhibited low intraspecific variation, thus conclusions about one asexual individual can be reasonably extended to the asexual species as a whole. Additionally, a second D. ferrugineum female displayed lower SNP counts relative to D. muliebre individuals; however, the reduced representation library design prevented its inclusion in our full analysis. Nevertheless, increased sampling from D. muliebre and D. ferrugineum wasps would strengthen our ability to draw conclusions regarding reproductive mode and mutation accumulation. One promising avenue would be to determine ratios of nonsynonymous to synonymous changes that are fixed vs. polymorphic in wasp species; both ratios were higher in asexual lineages of Timema stick insects, demonstrating that putatively deleterious variants are fixed faster and remain polymorphic for longer in asexuals (Bast et al., 2018).

There is currently no direct connection between mutation accumulation and fitness outcomes in Diachasma. Accumulation of harmful mutations is hypothesized to limit the evolutionary lifespan of asexual lineages. The estimated origin of asexuality in the annually-reproducing D. muliebre could be used to approximate per-generation accumulation of harmful mutations. Conversely, maintenance of meiotic reproduction in D. muliebre could facilitate adaptive evolution. Classical hypotheses for the evolutionary maintenance of sex invoke recombination's activity to bring together advantageous combinations of alleles (Fisher, 1930; Muller, 1932) or to generate genetic diversity to increase occupancy of heterogeneous niches (Bell, 1982), both conferring fitness advantages to sexual lineages. Recently, D. muliebre has experienced a host shift from bitter to sweet cherry (Prunus spp.), and genetically distinct wasp lineages display earlier eclosion behavior, corresponding to the earlier fruiting period of the sweet cherry (Forbes et al., 2013). Since asexuals cannot access genetic variation outside of their own lineage, the persistence of recombination may facilitate the union of adaptive alleles originating from de novo mutations and/or preserve standing genetic variation. Investigating evolutionary patterns of the genes involved in developmental pathways and host discrimination would be particularly promising, as there are strong associations between genetic differentiation and variation in these life-history traits across Diachasma species (Forbes et al., 2009; Forbes et al., 2013).

The mitochondrial genome: exception to the rule?

There was no apparent acceleration of evolutionary rates in the mitochondrial genome of D. muliebre, which stands in contrast to comparative studies in other organisms (Paland & Lynch, 2006; Sharbrough et al., 2018). The concatenated mitochondrial dataset showed weak evidence of increased mutation rate in D. ferrugineum, but per-gene pairwise distance and relative rate measures did not differ significantly across reproductive modes.

In nearly every mitochondrial gene, D. muliebre had a lower incidence of unique changes at 1st and 2nd codon positions. One hypothesis for this observation is an increase in the efficacy of purifying selection against deleterious changes, which would be enabled in this case given the ability of recombination to disrupt mitonuclear linkage. Conversely, this pattern could also be explained by positive selection for nonsynonymous changes in nuclear-encoded mitochondrial genes in D. ferrugineum that are accompanied by compensatory changes in the mitochondrial genome. Evolutionary rates of mitochondrial genes tend to correlate with their nuclear counterparts, particularly in cases when subunits from the two genomes are in physical contact (Yan et al., 2018). Investigation of nuclear-encoded mitochondrial genes in Diachasma may be fruitful, as these genes may represent an exception to the predominant evolutionary rate difference in the nuclear genome.

Acknowledgements

We thank Amanda Nelson and Wee Yee for contributions to wasp collections, Gery Hehman for assistance with NGS library preparation, and Nick Stewart and Laura Bankers for preliminary sequencing efforts of Diachasma wasps that motivated the study. This work was funded by a University of Iowa Internal Funding Initiative grant to A.A.F. and J.M.L.

REFERENCES

Altshuler D, et al. 2000. An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407:513–516.

Ament-Velásquez SL, et al. 2016. Population genomics of sexual and asexual lineages in fissiparous ribbon worms (Lineus, Nemertea): hybridization, polyploidy and the Meselson effect. Mol Ecol 25:3356–3369.

Archetti M. 2004. Loss of complementation and the logic of two-step meiosis. J Evolution Biol 17:1098–1105.

Bast J, et al. 2018. Consequences of asexuality in natural populations: insights from stick insects. Mol Biol Evol doi:10.1093/molbev/msy058.

Bell G. 1982. The Masterpiece of Nature: The Evolution and Genetics of Sexuality. Cambridge University Press, Cambridge, UK.

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.

Brandt A, et al. 2017. Effective purifying selection in ancient asexual oribatid mites. Nat Commun 8:873.

Burke GR, Walden KK, Whitfield JB, Robertson HM, Strand MR. 2014. Widespread genome reorganization of an obligate virus mutualist. PLOS Genet 10:e1004660.

Charlesworth B, Charlesworth D. 1997. Rapid fixation of deleterious alleles can be caused by Muller’s ratchet. Genet Res 70:63–73.

Danecek P, et al. 2011. The variant call format and VCFtools. Bioinformatics 27:2156–2158.

Dierckxsens N, Mardulyn P, Smits G. 2016. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res 45:e18.

Duret L, Galtier N. 2009. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genom Hum G 10:285–311.

Engelstädter J. 2017. Asexual but not clonal: evolutionary processes in automictic populations. Genetics 206:993–1009.

Felsenstein J. 1974. The evolutionary advantage of recombination. Genetics 78:737–756.

Fisher RA. 1930. The Genetical Theory of Natural Selection. Oxford University Press, Oxford, UK.

Forbes AA, Powell TH, Stelinski LL, Smith JJ, Feder JL. 2009. Sequential sympatric speciation across trophic levels. Science 323:776–779.

Forbes AA, Rice LA, Stewart NB, Yee WL, Neiman M. 2013. Niche differentiation and colonization of a novel environment by an asexual parasitic wasp. J Evolution Biol 26:1330–1340.

Galtier N, et al. 2018. Codon usage bias in animals: disentangling the effects of natural selection, effective population size, and GC-biased gene conversion. Mol Biol Evol 35:1092–1103.

Geib SM, Liang GH, Murphy TD, Sim SB. 2017. Whole genome sequencing of the Braconid parasitoid wasp Fopius arisanus, an important biocontrol agent of pest Tephritid fruit flies. G3-Genes Genom Genet 7:2407–2411.

Glémin S, Clément Y, David J, Ressayre A. 2014. GC content evolution in coding regions of angiosperm genomes: a unifying hypothesis. Trends Genet 30:263–270.

Glémin S, Galtier N. 2012. Genome evolution in outcrossing versus selfing versus asexual species. In Evolutionary Genomics. ed. Anisimova M. pp. 311–335. Springer, New York, NY, USA.

Hamerlinck G, Hulbert D, Hood GR, Smith JJ, Forbes AA. 2016. Histories of host shifts and cospeciation among free-living parasitoids of Rhagoletis flies. J Evolution Biol 29:1766–1779.

Hartfield M. 2016. Evolutionary genetic consequences of facultative sex and outcrossing. J Evolution Biol 29:5–22.

Hartfield M, Keightley PD. 2012. Current hypotheses for the evolution of sex and recombination. Integr Zool 7:192–209.

Hershberg R, Petrov DA. 2008. Selection on codon bias. Ann Rev Genet 42:287–299.

Hill WG, Robertson A. 1966. The effect of linkage on limits to artificial selection. Genet Res 8:269–294.

Hollister JD, et al. 2015. Recurrent loss of sex is associated with accumulation of deleterious mutations in Oenothera. Mol Biol Evol 32:896–905.

Kim D, et al. 2013. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14:R36.

Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357.

Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997.

Lindsey AR, et al. 2018. Comparative genomics of the miniature wasp and pest control agent Trichogramma pretiosum. BMC Biol 16:54.

Lovell JT, Williamson RJ, Wright SI, McKay JK, Sharbel TF. 2017. Mutation accumulation in an asexual relative of Arabidopsis. PLOS Genet 13:e1006550.

Lynch M, Bürger R, Butcher D, Gabriel W. 1993. The mutational meltdown in asexual populations. J Hered 84:339–344.

Muesebeck CFW. 1956. On Opius ferrugineus Gahan and two closely similar new species (Hymenoptera: Braconidae). Entomol News 67:99–102.

Mugal CF, Weber CC, Ellegren H. 2015. GC-biased gene conversion links the recombination landscape and demography to genomic base composition: GC-biased gene conversion drives genomic base composition across a wide range of species. Bioessays 37:1317–1326.

Muller HJ. 1932. Some genetic aspects of sex. Am Nat 66:118–138.

Muller HJ. 1964. The relation of recombination to mutational advance. Mutat Res-Fund Mol M 1:2–9.

Novembre JA. 2002. Accounting for background nucleotide composition when measuring codon usage bias. Mol Biol Evol 19:1390–1394.

Otto SP. 2009. The evolutionary enigma of sex. Am Nat 174:S1–S14.

Paland S, Lynch M. 2006. Transitions to asexuality result in excess amino acid substitutions. Science 311:990–992.

Paradis E, Claude J, Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290.

Pessia E, et al. 2012. Evidence for widespread GC-biased gene conversion in eukaryotes. Genome Biol Evol 4:675–682.

Plotkin JB, Kudla G. 2011. Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 12:32–42.

Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842.

Sharbrough J, Luse M, Boore JL, Logsdon Jr JM, Neiman M. 2018. Radical amino acid mutations persist longer in the absence of sex. Evolution 72:808–824.

Smith JM. 1978. The Evolution of Sex. Cambridge University Press, Cambridge, UK.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

Stenberg P, Saura A. 2009. Cytology of asexual animals. In Lost sex. ed. Schön I, Martens K, van Dijk P. pp. 63–74. Springer, New York, NY, USA.

Suomalainen E, Saura A, Lokki J. 1987. Cytology and evolution in parthenogenesis. CRC Press, Boca Raton, FL, USA.

Tajima F. 1993. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135:599–607.

Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30:2725–2729.

Tvedte ES, Forbes AA, Logsdon Jr JM. 2017. Retention of core meiotic genes across diverse Hymenoptera. J Hered 108:791–806.

van der Auwera GA, et al. 2013. From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Curr Protoc Bioinformatics 43:11.10.1–11.10.33.

van der Kooi CJ, Matthey-Doret C, Schwander T. 2017. Evolution and comparative ecology of parthenogenesis in haplodiploid arthropods. Evolution Letters 1:304–316.

Wang Y, Coleman-Derr D, Chen G, Gu YQ. 2015. OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Res 43:W78–W84.

Wharton R, Marsh P. 1978. New world Opiinae (Hymenoptera: Braconidae) parasitic on Tephritidae (Diptera). J Wash Acad Sci 68:147–167.

Wei S, Shi M, Sharkey MJ, van Achterberg C, Chen X. 2010. Comparative mitogenomics of Braconidae (Insecta: Hymenoptera) and the phylogenetic utility of mitochondrial genomes with special reference to Holometabolous insects. BMC Genomics 11:371.

Wright F. 1990. The ‘effective number of codons’ used in a gene. Gene 87:23–29.

Yan Z, Ye GY, Werren J. 2018. Evolutionary rate coevolution between mitochondria and mitochondria-associated nuclear-encoded proteins in insects. bioRxiv, DOI: https://doi.org/10.1101/288456.

Zhang Z, et al. 2012. Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics 13:43.

TABLES

Table 4-1. Relative rate analysis of concatenated nuclear genes in Diachasma.

Number of unique differences (DF = D. ferrugineum, DM = D. muliebre)

# genes   Length (bp)   Partition                  DF               DM
3,127     3,715,368     All sites                  4,148            5,742*
                        1st/2nd codon positions    1,113 (26.83%)   1,702* (29.64%)
                        3rd codon positions        3,035 (73.17%)   4,040* (70.36%)

D. alloeum was used as the outgroup to measure the number of unique differences in the sexual D. ferrugineum (DF, blue) vs. asexual D. muliebre (DM, red) lineages. Percentages give the share of each lineage's unique differences falling in that codon partition. *indicates the χ2 test statistic was statistically significant (p << 0.005), i.e. the bolded value represents a significantly higher number of unique differences than would be expected under a null model of equal evolutionary rates between the two wasps.

Table 4-2. Relative rate analysis of mitochondrial genes in Diachasma.

Number of unique differences (DM = D. muliebre, DF = D. ferrugineum)

                                All sites     1st/2nd codon positions   3rd codon positions
Gene               Length (bp)  DM     DF     DM     DF                 DM     DF
NAD1               921          16     20     5      13                 11     7
NAD2               963          13     8      4      4                  9      4
NAD3               318          3      8      0      3                  3      5
NAD4               1305         13     20     2      8                  11     12
NAD4L              276          3      3      0      1                  3      2
NAD5               1608         18     18     9      9                  9      9
NAD6               561          7      12     4      7                  3      5
CYTB               1104         17     15     4      7                  13     8
COX1               1515         8      19     2      4                  6      15
COX2               648          7      9      3      0                  4      9
COX3               747          8      15     2      6                  6      9
ATP6               666          9      7      5      2                  4      5
ATP8               117          4      0      3      0                  1      0
Total              10,749       126    154    43     64*                83     90
Total (no NAD 1)   9,825        110    134    38     51                 72     83

*indicates the χ2 test statistic was statistically significant (p < 0.05), i.e. the bolded value represents a significantly higher number of unique differences than would be expected under a null model of equal evolutionary rates between the two wasps.

Table 4-3. Comparisons of pairwise distances in mitochondrial genes for sexual and asexual Diachasma.

Codon positions   D. ferrugineum Mpdist (Variance)   D. muliebre I Mpdist (Variance)   # genes   t-statistic   p-value   W-statistic   p-value
all               0.083208 (2.23E-4)                 0.082446 (1.85E-4)                13        0.13604       0.89      78            0.76
1st/2nd           0.044431 (1.63E-5)                 0.043692 (3.2E-4)                 13        0.1212        0.90      74.5          0.63
3rd               0.160615 (1.53E-3)                 0.159969 (1.34E-3)                13        0.0435        0.97      81.5          0.90

Unpaired t-tests and Wilcoxon rank sum tests were used to assess significant differences in distributions of pairwise distance measures involving D. ferrugineum (sexual, blue) and D. muliebre (asexual, red) relative to a common outgroup (D. alloeum, sexual). The first p-value column refers to the t-test and the second to the Wilcoxon rank sum test.

Table 4-4. Measures of codon bias and nucleotide composition in nuclear and mitochondrial genes in Diachasma.

Genome          Measure   D. ferrugineum mean (SD)   D. muliebre I mean (SD)   # genes   W-statistic   p-value
Nuclear         ENC       54.69482 (3.69)            54.69372 (3.70)           3127      4887500       0.98
Nuclear         CDC       0.130546 (0.05)            0.130613 (0.05)           3127      4883900       0.94
Nuclear         GC3       0.52141 (0.089)            0.521456 (0.089)          3127      4887500       0.98
Mitochondrial   ENC       54.33375 (4.99)            54.53597 (5.54)           13        82.5          0.94
Mitochondrial   CDC       0.077421 (0.02)            0.07619 (0.03)            13        86            0.96
Mitochondrial   GC3       0.12321 (0.041)            0.11764 (0.040)           13        99            0.47

ENC = effective number of codons, CDC = codon deviation coefficient, GC3 = GC content at third codon position. Wilcoxon rank sum tests were used to assess significant differences in means between D. ferrugineum (sexual, blue) and D. muliebre (asexual, red).

FIGURES

Figure 4-1. Diagram of variant calling pipeline used for Diachasma NGS datasets. Blue and red text refers to steps specific for RNA (D. ferrugineum female) and DNA (D. muliebre female) datasets, black text refers to steps used for all datasets using identical parameters.

Figure 4-2. Relative pairwise distance values in sexual and asexual Diachasma. Positive (red) values indicate genes with a higher mutation load in asexuals (pdistDM > pdistDF), negative (blue) values indicate genes with a higher mutational load in sexuals (pdistDM < pdistDF), zero (purple) values indicate genes with equivalent mutational loads (pdistDM = pdistDF). Bolded black lines correspond to the 90th percentile of positive and negative Δ values. All pairwise distance values were calculated against a common wasp reference (D. alloeum).


Figure 4-3. Mutation landscapes of sexual and asexual Diachasma. Columns represent binned frequencies (left y-axis) of corresponding p-distance values from D. alloeum. Curves represent the cumulative frequency of binned columns (right y-axis).

Figure 4-4. Maximum likelihood analysis of concatenated nuclear gene dataset in Diachasma. Asexual (red) and sexual (blue) Diachasma lineages are shown, and informative branch lengths are labeled. A. Phylogenetic tree of all sites (3,715,368 bp). B. Phylogenetic tree of 1st/2nd codon sites (2,476,912 bp). C. Phylogenetic tree of 3rd codon sites (1,238,456 bp).


Figure 4-5. Maximum likelihood analysis of concatenated mitochondrial gene dataset in Diachasma. Asexual (red) and sexual (blue) Diachasma lineages are shown, and branch lengths > 1E-3 are labeled. Wasp samples include five D. muliebre wasps representing distinct mtCOX1 haplotypes (Forbes et al., 2013) and three D. ferrugineum samples (SB = South Bend, IC = Iowa City). Phylogenetic tree includes all sites from 13 protein-coding genes (10,749 bp).

Figure 4-6. Maximum likelihood analysis of concatenated 1st/2nd codon positions in Diachasma. Asexual (red) and sexual (blue) Diachasma lineages are shown, and branch lengths > 1E-3 are labeled. Wasp samples include five D. muliebre wasps representing distinct mtCOX1 haplotypes (Forbes et al., 2013) and three D. ferrugineum samples (SB = South Bend, IC = Iowa City). Phylogenetic tree includes 1st/2nd codon sites from 13 protein-coding genes (7,166 bp).

Figure 4-7. Maximum likelihood analysis of concatenated 3rd codon positions in Diachasma. Asexual (red) and sexual (blue) Diachasma lineages are shown, and branch lengths > 1E- 3 are labeled. Wasp samples include five D. muliebre wasps representing distinct mtCOX1 haplotypes (Forbes et al., 2013) and three D. ferrugineum samples (SB = South Bend, IC = Iowa City). Phylogenetic tree includes 3rd codon sites from 13 protein-coding genes (3,583 bp).

SUPPLEMENTARY DATA

Table 4-S1. Manual annotation of SNPs in accelerated asexual genes.

cluster coordinates                     pdist   pdist   Δ pdist DF  DF  DF  DM  DM  DM
                                        DF      DM              SNP FP  FN  SNP FP  FN
4167    NW_015145034.1:1825605-1826552  0.0038  0.0246  0.0208  2   0   0   13  0   0
6236    NW_015145098.1:55721-58954      0.0025  0.0198  0.0173  1   0   0   8   0   0
7710    NW_015145106.1:881760-883391    0.0023  0.0159  0.0136  1   0   0   7   0   0
319     NW_015145106.1:878338-879847    0.0000  0.0132  0.0132  0   0   0   5   0   0
1394    NW_015145163.1:888976-900771    0.0007  0.0136  0.0129  1   0   0   19  0   0
6470    NW_015145023.1:2281296-2282712  0.0064  0.0191  0.0127  3   0   0   9   0   0
2378    NW_015145157.1:143688-146013    0.0062  0.0186  0.0124  3   0   0   9   0   0
1581    NW_015145005.1:5733031-5735567  0.0031  0.0154  0.0123  3   0   0   15  0   0
5679    NW_015145037.1:578137-579988    0.0000  0.0119  0.0119  0   0   0   2   0   0
474     NW_015145153.1:945274-947100    0.0056  0.0167  0.0111  2   0   0   6   0   0
1863    NW_015145095.1:97674-103675     0.0044  0.0153  0.0109  2   0   0   7   0   0
3515    NW_015145025.1:2434869-2442720  0.0035  0.0139  0.0104  4   0   0   16  0   0
1948    NW_015145003.1:656878-660440    0.0000  0.0100  0.0100  0   0   0   4   0   0
2051    NW_015145023.1:2133414-2143862  0.0010  0.0104  0.0094  1   0   0   10  0   0
852     NW_015148955.1:1184829-1186954  0.0037  0.0130  0.0093  2   0   0   7   0   0
7687    NW_015145003.1:804104-806929    0.0013  0.0104  0.0091  1   0   1   8   0   0
2986    NW_015145005.1:5504884-5508136  0.0000  0.0091  0.0091  0   0   0   4   0   0
6628    NW_015145178.1:338673-340502    0.0057  0.0144  0.0086  2   0   0   5   0   0
7878    NW_015145178.1:258040-261337    0.0028  0.0114  0.0085  2   0   0   8   0   0
1682    NW_015145020.1:45267-46162      0.0028  0.0112  0.0084  1   0   0   4   0   0
7831    NW_015145005.1:4120625-4125899  0.0035  0.0118  0.0084  5   0   1   17  9   0
6836    NW_015145350.1:251188-253471    0.0083  0.0167  0.0083  2   0   1   4   0   0
2951    NW_015145163.1:227785-230673    0.0042  0.0125  0.0083  3   0   0   9   0   0
7031    NW_015145350.1:59677-62423      0.0081  0.0163  0.0081  4   0   0   8   0   0
2971    NW_015145320.1:72756-187265     0.0016  0.0097  0.0081  1   0   0   6   0   0
3498    NW_015145163.1:915030-919136    0.0010  0.0088  0.0078  1   0   0   9   0   0
6256    NW_015145099.1:944458-951709    0.0048  0.0126  0.0078  5   0   0   13  0   0
4283    NW_015145615.1:161037-165080    0.0034  0.0112  0.0077  4   0   0   13  0   0
1984    NW_015145006.1:875190-883971    0.0000  0.0074  0.0074  0   0   0   2   0   0
2037    NW_015148955.1:1513657-1517544  0.0041  0.0115  0.0074  5   0   0   14  0   0
999     NW_015148955.1:81090-84019      0.0027  0.0099  0.0072  3   0   0   11  0   0
598     NW_015145006.1:1150941-1156578  0.0051  0.0123  0.0072  5   0   0   12  0   0
976     NW_015145005.1:2102771-2104667  0.0054  0.0125  0.0072  3   0   0   7   0   0
7616    NW_015145005.1:4595569-4597668  0.0048  0.0119  0.0071  2   0   0   5   0   0
7679    NW_015145350.1:208163-209673    0.0080  0.0152  0.0071  9   0   0   17  0   0
4817    NW_015145161.1:420508-422178    0.0047  0.0118  0.0071  2   0   0   5   0   0
6146    NW_015145070.1:2316872-2318049  0.0000  0.0070  0.0070  0   0   0   4   0   0
3680    NW_015145152.1:6995-28417       0.0000  0.0069  0.0069  0   0   0   5   0   0
1331    NW_015145023.1:1617565-1626985  0.0011  0.0080  0.0068  1   0   0   7   0   0
4466    NW_015145157.1:136500-138970    0.0049  0.0117  0.0068  5   0   0   12  0   0
7233    NW_015145860.1:255652-257517    0.0060  0.0128  0.0068  8   0   0   17  0   0
6291    NW_015145098.1:988521-996880    0.0061  0.0129  0.0068  9   0   0   19  0   0
7176    NW_015145661.1:400051-401741    0.0034  0.0102  0.0068  2   0   0   6   0   0
1466    NW_015145533.1:149632-211538    0.0048  0.0116  0.0068  10  0   0   24  0   0
4412    NW_015145075.1:1883704-1885561  0.0101  0.0168  0.0067  9   0   0   15  0   0
5310    NW_015145004.1:2078622-2083845  0.0044  0.0111  0.0066  4   0   0   10  0   0
2950    NW_015145229.1:965720-967127    0.0013  0.0079  0.0066  1   0   0   6   0   0
1493    NW_015145098.1:63162-69263      0.0022  0.0087  0.0065  1   0   0   4   0   0
7689    NW_015145009.1:1652636-1658501  0.0050  0.0115  0.0065  7   0   0   16  0   0
2270    NW_015145079.1:384157-404721    0.0026  0.0090  0.0064  4   0   0   14  0   0
Total                                                           146 0   3   477 9   0

FP = False Positive, FN = False Negative.

Table 4-S2. Manual annotation of SNPs in accelerated sexual genes.

cluster coordinates                     pdist   pdist   Δ pdist DF  DF  DF  DM  DM  DM
                                        DF      DM              SNP FP  FN  SNP FP  FN
1536    NW_015145088.1:401730-402966    0.0126  0.0000  -0.0126 6   0   0   0   0   0
6109    NW_015145025.1:909373-915643    0.0093  0.0000  -0.0093 2   0   0   0   0   0
6067    NW_015145068.1:991747-995522    0.0150  0.0075  -0.0075 8   0   0   4   0   0
4299*   NW_015145004.1:2083680-2085654  0.0075  0.0000  -0.0075 4   0   0   0   0   13
3937    NW_015145036.1:1408133-1408755  0.0071  0.0000  -0.0071 2   0   0   0   0   0
5907    NW_015145078.1:569758-571332    0.0100  0.0033  -0.0067 3   0   0   1   0   0
4459    NW_015145005.1:4708964-4713497  0.0063  0.0000  -0.0063 3   0   0   0   0   0
5335    NW_015145693.1:373960-374854    0.0061  0.0000  -0.0061 2   0   0   0   0   0
6596    NW_015145845.1:55880-85443      0.0058  0.0000  -0.0058 2   0   0   0   0   2
6548    NW_015145060.1:670567-671740    0.0138  0.0083  -0.0055 5   0   0   3   0   0
5582    NW_015145093.1:404718-418275    0.0071  0.0016  -0.0055 13  0   0   3   0   0
2952    NW_015145044.1:490527-492944    0.0082  0.0027  -0.0054 6   0   0   2   0   0
6795    NW_015145034.1:2086488-2088029  0.0127  0.0072  -0.0054 7   0   0   4   0   0
1705    NW_015145137.1:705607-709105    0.0065  0.0011  -0.0054 6   0   0   1   0   0
3354    NW_015145095.1:1425155-1427832  0.0144  0.0090  -0.0054 8   0   0   5   0   0
4224    NW_015145028.1:2104322-2114165  0.0053  0.0000  -0.0053 1   0   0   0   0   0
1580    NW_015145332.1:180088-182444    0.0092  0.0039  -0.0052 7   0   0   3   0   0
3029    NW_015145025.1:655387-658940    0.0065  0.0013  -0.0052 5   0   0   1   0   0
560     NW_015145006.1:3869219-3869732  0.0103  0.0051  -0.0051 2   0   0   1   0   0
7036    NW_015145085.1:355742-359769    0.0062  0.0012  -0.0050 5   0   0   1   0   0
2687    NW_015145027.1:202486-204481    0.0066  0.0017  -0.0050 4   0   0   1   0   0
528     NW_015145020.1:772046-773302    0.0173  0.0123  -0.0049 7   0   0   5   0   0
4388    NW_015145002.1:2519224-2522717  0.0082  0.0033  -0.0049 5   0   0   2   0   0
7863    NW_015145002.1:1028020-1030852  0.0048  0.0000  -0.0048 3   0   0   0   0   0
4696    NW_015145159.1:244616-245469    0.0095  0.0048  -0.0048 2   0   0   1   0   0
1727    NW_015145035.1:1402626-1409728  0.0107  0.0059  -0.0047 9   0   0   5   0   0
4503    NW_015145135.1:107689-121256    0.0071  0.0026  -0.0045 11  0   0   4   0   0
3135    NW_015145002.1:2027477-2029020  0.0063  0.0018  -0.0045 7   0   0   2   0   0
47      NW_015145087.1:1078823-1086391  0.0089  0.0045  -0.0045 4   0   0   2   0   0
3677    NW_015145158.1:751036-752074    0.0156  0.0111  -0.0044 7   0   0   5   0   0
1419    NW_015145339.1:214600-218862    0.0071  0.0027  -0.0044 8   0   0   3   0   0
4588    NW_015145331.1:656131-658106    0.0088  0.0044  -0.0044 2   0   0   1   0   0
603     NW_015145035.1:1093225-1096999  0.0130  0.0087  -0.0043 3   0   0   2   0   0
6076    NW_015145363.1:420170-439338    0.0076  0.0032  -0.0043 7   0   0   3   0   0
2097    NW_015145088.1:511895-515401    0.0086  0.0043  -0.0043 4   0   0   2   0   0
5248    NW_015145036.1:104626-110177    0.0129  0.0086  -0.0043 6   0   0   4   0   0
5885    NW_015145003.1:188898-190637    0.0107  0.0064  -0.0043 5   0   0   3   0   0
3223    NW_015145002.1:6038104-6039756  0.0043  0.0000  -0.0043 1   0   0   0   0   0
6987    NW_015145039.1:93280-93803      0.0043  0.0000  -0.0043 1   0   0   0   0   0
1860    NW_015145400.1:365022-369380    0.0085  0.0043  -0.0043 8   0   0   4   0   0
2121    NW_015145002.1:3263186-3267156  0.0170  0.0127  -0.0042 16  0   0   12  0   0
1661    NW_015145005.1:5717202-5719747  0.0053  0.0011  -0.0042 5   0   0   1   0   0
5475    NW_015145229.1:768339-769157    0.0042  0.0000  -0.0042 1   0   0   0   0   0
7836    NW_015145020.1:1418226-1429916  0.0084  0.0042  -0.0042 8   0   0   4   0   0
1189    NW_015145099.1:1957413-1960415  0.0147  0.0105  -0.0042 7   0   0   5   0   0
1847    NW_015145573.1:101889-103914    0.0042  0.0000  -0.0042 2   0   0   0   0   0
4985    NW_015145037.1:1362652-1400957  0.0098  0.0056  -0.0042 7   0   0   4   0   0
5823    NW_015145020.1:2184591-2188447  0.0073  0.0031  -0.0042 7   0   0   3   0   0
2705    NW_015145028.1:1454123-1459682  0.0063  0.0021  -0.0042 3   0   0   1   0   0
580     NW_015145339.1:881829-884959    0.0083  0.0042  -0.0042 6   0   0   3   0   0
Total                                                           259 0   0   111 0   2

FP = False Positive, FN = False Negative. Red highlighting corresponds to a tandemly-duplicated gene that was removed from the analysis. The highlighting is not visible in this rendering; the row marked * (cluster 4299) is the only row whose exclusion reproduces the reported totals.

Table 4-S3. Pairwise distances in nuclear CDS regions in Diachasma.

Wasp 1           Wasp 2           # genes  W-statistic  p-value  Mpdist (Variance), Wasp 1  Mpdist (Variance), Wasp 2
D. ferrugineum   D. muliebre I    3127     4527800      4.17E-7  0.005089 (7.95E-6)         0.005496 (9.36E-6)
D. ferrugineum   D. muliebre I    1764     1461800      1.88E-3  0.005173 (9.35E-6)         0.005534 (1.07E-5)
D. muliebre I    D. muliebre II   1764     1561700      0.85     0.005534 (1.07E-5)         0.005508 (1.04E-5)
D. muliebre I    D. muliebre III  1764     1560200      0.89     0.005534 (1.07E-5)         0.005521 (1.06E-5)
D. muliebre I    D. muliebre IV   1764     1557200      0.96     0.005534 (1.07E-5)         0.005534 (1.08E-5)
D. muliebre I    D. muliebre V    1764     1591900      0.23     0.005534 (1.07E-5)         0.005414 (1.05E-5)
D. muliebre II   D. muliebre III  1764     1554300      0.96     0.005508 (1.04E-5)         0.005521 (1.06E-5)
D. muliebre II   D. muliebre IV   1764     1551300      0.88     0.005508 (1.04E-5)         0.005534 (1.08E-5)
D. muliebre II   D. muliebre V    1764     1586000      0.32     0.005508 (1.04E-5)         0.005414 (1.05E-5)
D. muliebre III  D. muliebre IV   1764     1552800      0.92     0.005521 (1.06E-5)         0.005534 (1.08E-5)
D. muliebre III  D. muliebre V    1764     1587600      0.29     0.005521 (1.06E-5)         0.005414 (1.05E-5)
D. muliebre IV   D. muliebre V    1764     1590600      0.25     0.005534 (1.08E-5)         0.005414 (1.05E-5)

Wilcoxon rank sum tests were used to assess significant differences in distributions of pairwise distance measures involving D. ferrugineum (sexual, blue) and D. muliebre (asexual, red) relative to a common outgroup (D. alloeum, sexual).
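
For readers unfamiliar with the test, the sketch below shows how a two-sided Wilcoxon rank sum statistic can be computed from two lists of per-gene pairwise distances. It is an illustration only, not the code used to produce Table 4-S3. It assumes the convention in which W is the rank sum of the first sample minus n(n+1)/2, which is consistent with the tabulated W values lying near n·m/2 (1764 × 1764 / 2 = 1,555,848). It uses a normal approximation with a tie correction and no continuity correction, so its p-values may differ slightly from those of standard statistical packages. The function name and the toy distances are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (W, p_value), where W is the rank sum of x minus n(n+1)/2.
    """
    n, m = len(x), len(y)
    total = n + m
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < total:
        # Find the run of tied values starting at position i.
        j = i
        while j < total and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        tied = j - i
        tie_term += tied ** 3 - tied
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    w = rank_sum_x - n * (n + 1) / 2.0
    mean_w = n * m / 2.0
    var_w = n * m / 12.0 * ((total + 1) - tie_term / (total * (total - 1)))
    if var_w == 0:
        return w, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w, p_value


# Toy per-gene p-distances (hypothetical values, far fewer genes than the real data).
sexual = [0.004, 0.005, 0.006]
asexual = [0.007, 0.008, 0.009]
w_stat, p_val = rank_sum_test(sexual, asexual)
```

The normal approximation is only appropriate for large samples such as the thousands of genes compared here; with a handful of values an exact test would be preferable.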

Table 4-S4. SNP counts in Diachasma NGS datasets.

wasp sample          Total SNPs  Homozygous SNPs  Heterozygous SNPs
D. muliebre I        4013        3969 (98.90%)    44 (1.10%)
D. muliebre II       4005        3973 (99.20%)    32 (0.80%)
D. muliebre III      4013        3960 (98.68%)    53 (1.32%)
D. muliebre IV       4008        3964 (98.90%)    44 (1.10%)
D. muliebre V        3902        3863 (99.00%)    39 (1.00%)
D. ferrugineum RNA   3641        2888 (79.32%)    753 (20.68%)
D. ferrugineum RRL   3472        2830 (81.51%)    642 (18.49%)

Total length of regions in overlap = 734,668 bp
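
The percentages in Table 4-S4 follow directly from the homozygous and heterozygous counts. A minimal sketch of the arithmetic is given below; the function name `zygosity_summary` and the per-kilobase density are additions made for illustration and are not reported in the thesis.

```python
def zygosity_summary(homozygous, heterozygous, region_bp):
    """Summarize SNP zygosity for one sample over a region of known length."""
    total = homozygous + heterozygous
    return {
        "total": total,
        "pct_hom": round(100 * homozygous / total, 2),
        "pct_het": round(100 * heterozygous / total, 2),
        # SNP density per kilobase (illustrative; not tabulated in the thesis).
        "snps_per_kb": 1000 * total / region_bp,
    }


OVERLAP_BP = 734_668  # total length of regions in overlap (Table 4-S4)

muliebre_i = zygosity_summary(3969, 44, OVERLAP_BP)
ferrugineum_rna = zygosity_summary(2888, 753, OVERLAP_BP)
```

The contrast between roughly 1% heterozygous SNPs in D. muliebre and roughly 20% in D. ferrugineum is the pattern referred to in Chapter 5 when the D. muliebre genome is described as highly homozygous.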

Table 4-S5. P-values of relative rate analysis tests in Diachasma mitochondrial genes.

                     p-value
Gene   Length (bp)   all       1st/2nd   3rd
NAD1   921           0.50499   0.05935   0.34578
NAD2   963           0.27523   1.00      0.16552
NAD3   318           0.13167   0.08326   0.47950
NAD4   1305          0.22302   0.05778   0.83483
NAD4L  276           1.00      0.31731   0.65472
NAD5   1608          1.00      1.00      1.00
NAD6   561           0.25135   0.36571   0.47950
CYTB   1104          0.72367   0.36571   0.27523
COX1   1515          0.03426   0.41422   0.04953
COX2   648           0.61708   0.08326   0.16552
COX3   747           0.14440   0.15730   0.43858
ATP6   666           0.61708   0.25684   0.73888
ATP8   117           0.04550   0.08326   0.31731

No individual gene p-value was statistically significant after applying a Bonferroni correction (p < 0.00385).
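
The Bonferroni threshold quoted above is the nominal significance level divided by the number of genes tested. The sketch below reproduces that arithmetic using the "all sites" column of Table 4-S5; the nominal level of 0.05 is an assumption inferred from the stated threshold (0.05 / 13 ≈ 0.00385), and the variable names are hypothetical.

```python
ALPHA = 0.05  # assumed nominal significance level

# "all sites" p-values from Table 4-S5.
p_all_sites = {
    "NAD1": 0.50499, "NAD2": 0.27523, "NAD3": 0.13167, "NAD4": 0.22302,
    "NAD4L": 1.00, "NAD5": 1.00, "NAD6": 0.25135, "CYTB": 0.72367,
    "COX1": 0.03426, "COX2": 0.61708, "COX3": 0.14440, "ATP6": 0.61708,
    "ATP8": 0.04550,
}

# Bonferroni: divide the nominal level by the number of tests.
threshold = ALPHA / len(p_all_sites)

# Genes that remain significant after correction (none are expected here).
significant = [gene for gene, p in p_all_sites.items() if p < threshold]

# Genes below the uncorrected level, for comparison.
nominal_only = [gene for gene, p in p_all_sites.items() if p < ALPHA]
```

Two genes (COX1 and ATP8) fall below the uncorrected level for all sites, but neither survives the correction.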

Table 4-S6. Statistical assessment of normality for mitochondrial and nuclear genome datasets.

Wasp sample     Data type          genome               # genes  W-statistic  p-value
D. ferrugineum  Pairwise distance  nuclear              3127     0.9650       <2.2E-16
D. muliebre I   Pairwise distance  nuclear              3127     0.9648       <2.2E-16
D. ferrugineum  Pairwise distance  nuclear (no zeroes)  2978     0.9454       <2.2E-16
D. muliebre I   Pairwise distance  nuclear (no zeroes)  2995     0.9467       <2.2E-16
D. ferrugineum  Pairwise distance  mitochondrial        13       0.9067       0.17
D. muliebre I   Pairwise distance  mitochondrial        13       0.9768       0.96
D. ferrugineum  ENC                nuclear              3127     0.9389       <2.2E-16
D. muliebre I   ENC                nuclear              3127     0.9390       <2.2E-16
D. ferrugineum  ENC                mitochondrial        13       0.9535       0.65
D. muliebre I   ENC                mitochondrial        13       0.9496       0.59
D. ferrugineum  CDC                nuclear              3127     0.9114       <2.2E-16
D. muliebre I   CDC                nuclear              3127     0.9115       <2.2E-16
D. ferrugineum  CDC                mitochondrial        13       0.6381       1.42E-4
D. muliebre I   CDC                mitochondrial        13       0.6498       1.82E-4
D. ferrugineum  GC3                nuclear              3127     0.9968       4.75E-6
D. muliebre I   GC3                nuclear              3127     0.9969       4.75E-6
D. ferrugineum  GC3                mitochondrial        13       0.9058       0.16
D. muliebre I   GC3                mitochondrial        13       0.9052       0.16

ENC = effective number of codons, CDC = codon deviation coefficient, GC3 = GC content at third codon positions.

Figure 4-S1. Mutation landscapes of sexual and various asexual Diachasma wasps. Columns represent binned frequencies (left y-axis) of corresponding p-distance values from D. alloeum. Curves represent the cumulative frequency of binned columns (right y-axis).

Figure 4-S2. Mutation landscapes of various asexual Diachasma wasps. Columns represent binned frequencies (left y-axis) of corresponding p-distance values from D. alloeum. Curves represent the cumulative frequency of binned columns (right y-axis).

CHAPTER 5: SUMMARY, CONCLUSIONS, AND FUTURE DIRECTIONS

SUMMARY OF FINDINGS

The genome of D. alloeum as a platform for studying the evolution of sex

In Chapter 2, I performed numerous descriptive analyses on the newly sequenced genome of D. alloeum (Tvedte et al., in preparation). In particular, I assessed the quality of the D. alloeum assembly by a) broad characterization of highly conserved nuclear genes, b) de novo assembly of the D. alloeum mitochondrial genome, and c) manual annotation of D. alloeum OXPHOS, chemosensory, and sex-determination genes. I compared whole-genome gene clusters and gene family evolution using datasets from published hymenopteran genomes (A. mellifera, N. vitripennis, M. demolitor) in order to ascertain the overall quality of the D. alloeum assembly (Weinstock et al., 2006; Werren et al., 2010; Burke et al., 2014).

I identified a large proportion of complete BUSCO orthologs in the D. alloeum assembly, comparable to other Hymenoptera. I also confirmed the conservation of the canonical set of OXPHOS genes, consistent with a highly contiguous wasp genome assembly. Of the sex-determination cascade genes, I recovered transformer and doublesex orthologs, but csd was not found. This result is unsurprising, as complementary sex determination (CSD) is infrequent in hymenopteran superfamilies with abundant asexual reproduction; asexual modes that increase genomic homozygosity will tend to produce diploid males under CSD, which may impede transitions to asexuality (van Wilgenburg et al., 2006).

I observed an expanded chemosensory gene repertoire in D. alloeum relative to other hymenopterans. Particularly striking was the expansion of D. alloeum-specific odorant receptors and ionotropic receptors that putatively function in olfactory processes. Consistent with this pattern, GO terms relevant to perception of odors were enriched in genome-wide gene cluster datasets in D. alloeum. The high chemosensory gene count in D. alloeum may underlie this wasp’s ability to discriminate between fruit odors, a life history trait which shows variation in natural populations (Forbes et al., 2009).

Taken together, my analyses confirm the quality of the D. alloeum genome, enabling its use for comparative genomics methods across Diachasma species. Relevant to this study, annotated D. alloeum genes can be used as queries for ortholog searches to compare evolutionary patterns in sexual and asexual wasps.

Maintenance of meiosis genes in Hymenoptera, including sexual and asexual Diachasma species

In Chapter 3, I compiled an inventory of meiosis genes across the breadth of Hymenoptera (Tvedte et al., 2017). Using manually-annotated meiosis genes from N. vitripennis (Schurko et al., 2010) as BLAST queries, I retrieved orthologs from 21 insects, including 18 hymenopterans. Orthologous sequences were used for multiple analyses, including a) description of gene conservation and loss, b) estimation of relative evolutionary rates, and c) generation of a multilocus phylogeny of studied hymenopteran insects. Additionally, I assessed whether variation in reproductive mode in Diachasma produces distinct evolutionary signatures in meiosis genes. I conducted relative rate analyses in meiosis-specific genes to test for evidence of relaxed selection in D. muliebre.

I found an overwhelming majority of meiosis genes across surveyed hymenopterans, including those functioning specifically in meiosis. Hymenopteran infraorders showed across-species gene duplication or loss events (e.g. RAD51C, RAD54B, RECQ2), while other genes were duplicated or lost sporadically (e.g. REC8, SMC1, SMC3). The RECA homolog DMC1 was the sole meiosis-specific gene that was widely absent; however, other organisms lacking or possessing a mutated DMC1 can engage in meiotic recombination (Neale & Keeney, 2006). The maximum likelihood phylogeny of the concatenated gene dataset supported the two parasitic wasp infraorders as sisters, with the aculeates (bees, stinging wasps, ants) as an outgroup. Although the relationships between these groups are contentious, the result from this study is consistent with a recently published phylogeny using hundreds of hymenopteran transcriptomes (Peters et al., 2017).

I recovered identical sets of meiosis genes in sexual and asexual Diachasma species. Loss of the meiosis-specific gene REC8 in D. muliebre could serve as a “preadaptation” to asexuality, as it has been implicated in aborted Meiosis I in an asexual worm (Fradin et al., 2017). However, REC8 is not absolutely required for functional meiosis, as it has been lost in species that clearly engage in sexual reproduction (Schurko et al., 2010). Among six meiosis-specific genes, I found no evidence of accelerated evolution in D. muliebre compared to its close sexual relative, D. ferrugineum. The maintenance of canonical meiotic machinery provides support for the presence of automictic reproduction in D. muliebre, although it is possible that the recent origin of asexuality limits the extent to which differences in evolutionary rates can be observed.

Genomic evidence for accelerated mutation accumulation in D. muliebre

In Chapter 4, I used next-generation sequencing (NGS) datasets to investigate global patterns of molecular evolution in sexual and asexual Diachasma (Tvedte et al., in preparation). To do this, I compared nuclear and mitochondrial coding regions to perform multiple assessments of distinct asexual evolution, including a) measurements of pairwise distances in D. ferrugineum and D. muliebre as an approximation for mutation accumulation, b) relative rate tests to infer whether evolutionary rates at nonsynonymous and synonymous sites differ across Diachasma reproductive modes, and c) estimates of codon usage bias and gBGC in D. ferrugineum vs. D. muliebre to evaluate potential selective and mechanistic effects of sex loss on synonymous variation.
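
To make the first of these measures concrete, the sketch below computes a p-distance, the proportion of compared sites at which two aligned sequences differ, for a sexual and an asexual sequence against a shared outgroup. It is a simplified illustration rather than the Chapter 4 workflow: the function name, the decision to skip gaps and ambiguous bases, and the toy sequences are all assumptions.

```python
def p_distance(seq_a, seq_b, skip_chars="-N?"):
    """Proportion of compared aligned sites at which two sequences differ.

    Sites where either sequence has a gap or ambiguous base are skipped
    (an assumption made for this illustration).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned sequences (hypothetical, 12 sites each).
outgroup = "ATGGCCAAATGA"  # stands in for D. alloeum
sexual = "ATGGCCAAGTGA"    # stands in for D. ferrugineum
asexual = "ATGGCTAAGTGA"   # stands in for D. muliebre

pdist_sexual = p_distance(outgroup, sexual)
pdist_asexual = p_distance(outgroup, asexual)
delta_pdist = pdist_asexual - pdist_sexual  # positive if the asexual is more divergent
```

Because both distances are measured from the same outgroup, a positive difference for a gene indicates more change on the asexual lineage since the two wasps diverged, which is the quantity tabulated as Δ pdist in Tables 4-S1 and 4-S2.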

In the nuclear genome, D. muliebre exhibited signs of accelerated mutation accumulation relative to D. ferrugineum. I observed a substantial increase in genome-wide pairwise distances in D. muliebre, a pattern that I found when considering multiple sexual and asexual wasps. Moreover, accelerated evolution was pronounced in putatively deleterious 1st and 2nd codon positions, which is consistent with an increasingly large body of evidence supporting harmful mutation accumulation in asexual lineages (Hollister et al., 2015; Lovell et al., 2017; Bast et al., 2018).

In contrast to observations in the nuclear genome, D. muliebre showed no evidence of accelerated mitochondrial genome evolution, as had been demonstrated in other asexual systems (Paland & Lynch, 2006; Sharbrough et al., 2018). Pairwise distance and relative rate measures comparing D. ferrugineum and D. muliebre across individual genes were not statistically significant. The concatenated mitogenome dataset showed a higher number of nucleotide changes in D. ferrugineum, particularly at 1st and 2nd codon positions, but the overall effect may be influenced by the evolution of one or a few genes. A possible interpretation of this observation is stronger purifying selection in the mitochondrial genome of D. muliebre to preserve optimal nuclear-mitochondrial gene combinations. An alternative explanation is tighter mitonuclear coevolution in D. ferrugineum, where mutations in nuclear-encoded OXPHOS genes may require compensatory evolution in their mitochondrial counterparts (Rand et al., 2004).
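
This chapter does not name the specific relative rate test that was applied. As a purely illustrative sketch, the code below assumes a Tajima-style test, in which the numbers of sites unique to each of two ingroup sequences (relative to an outgroup) are compared with a chi-square statistic on one degree of freedom. Several p-values in Table 4-S5 (e.g. 0.31731 and 0.15730) coincide with one-degree-of-freedom chi-square p-values for statistics of 1 and 2, which is consistent with, though not proof of, a test of this kind. The function name and toy sequences are hypothetical.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima-style relative rate test on three aligned sequences.

    m1 counts sites where seq1 differs while seq2 matches the outgroup;
    m2 counts sites where seq2 differs while seq1 matches the outgroup.
    Returns (m1, m2, chi_square, p_value) with 1 degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if a != b:
            if b == o:
                m1 += 1  # change unique to seq1
            elif a == o:
                m2 += 1  # change unique to seq2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi_square = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return m1, m2, chi_square, p_value


# Toy alignment (hypothetical): three changes unique to seq1, one unique to seq2.
result = tajima_relative_rate("CCCAAAAA", "AAAGAAAA", "AAAAAAAA")
```

With so few informative sites per mitochondrial gene, a test of this kind has little power, which may contribute to the absence of significant per-gene results noted above.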

The observed increase of synonymous changes in the D. muliebre nuclear genome does not appear to be a consequence of relaxed selection on synonymous sites. Given the recent origin of asexuality in Diachasma, it is possible that actual differences between sexual and asexual wasps are not discernible at this time. Indeed, differences in codon bias were detected in sexual and asexual lineages of oribatid mites but not Timema stick insects, the former having origins of asexuality several million years ago (Brandt et al., 2017; Bast et al., 2018). GC content at third codon positions also did not substantially differ between D. muliebre and D. ferrugineum, in contrast to increased GC3 in sexual lineages demonstrated in Timema (Bast et al., 2018). As above, this could also reflect the recency of sex loss, although the timescale of asexuality in Diachasma is similar to that of Timema lineages (0.2-2 MYA) (Schwander & Crespi, 2009; Schwander et al., 2011). Lack of evidence of arrested gBGC could instead be indicative of the maintenance of meiotic recombination in D. muliebre.
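
GC3, the statistic used here as a proxy for GC-biased gene conversion (gBGC), is simply the fraction of G and C bases at third codon positions. A minimal sketch follows, assuming an in-frame coding sequence; the function name `gc3` and the choice to ignore non-ACGT characters are assumptions made for illustration.

```python
def gc3(cds):
    """GC content at third codon positions of an in-frame coding sequence.

    Characters other than A, C, G, T at third positions are ignored
    (an assumption made for this illustration).
    """
    third_sites = [base for base in cds.upper()[2::3] if base in "ACGT"]
    if not third_sites:
        raise ValueError("no unambiguous third-position sites")
    gc_count = sum(1 for base in third_sites if base in "GC")
    return gc_count / len(third_sites)


# Toy sequence (hypothetical): codons ATG GCC AAA TGA, third positions G, C, A, A.
example_gc3 = gc3("ATGGCCAAATGA")
```

Comparing the per-gene GC3 values of D. muliebre and D. ferrugineum orthologs is then a paired comparison of two such numbers per gene.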

CONTRIBUTIONS OF FINDINGS TO THE FIELD

Genomic consequences of asexuality

The most substantial contributions of my thesis to the broader field of Biology are the comparative evolutionary patterns observed in the genomes of closely related sexual and asexual organisms (Chapters 3 & 4). The utility of NGS to explore the consequences of asexuality is becoming increasingly apparent, as these datasets are amenable to both candidate gene (e.g. Kraaijeveld et al., 2016; Nowell et al., 2018) and genome-wide (e.g. Hollister et al., 2015; Ament-Velasquez et al., 2016; Lovell et al., 2017; Brandt et al., 2017; Bast et al., 2018; Lindsey et al., 2018) approaches. Of particular importance, NGS approaches provide ample signal for mutation accumulation studies, which may be obscured by distinct evolutionary patterns among individual loci. Here, I developed a novel bioinformatic workflow in order to retrieve datasets containing thousands of orthologous Diachasma genes, which can be adapted for use with DNASeq and RNASeq strategies.

My study is especially compelling given the evidence for accelerated mutation accumulation in D. muliebre (Chapter 4) despite support for the maintenance of meiotic recombination (Chapters 3 & 4). Previous work has focused on comparisons between sexuals and apomictic asexuals. Lack of recombination in apomicts yields a clear prediction of reduced efficacy of selection, given that their entire genomes are effectively in linkage disequilibrium. Although computational models have defined parameters for which automictic asexuals could persist over long evolutionary timespans (Engelstädter, 2017), the relationship between sex loss and effectiveness of selection has not been rigorously tested in naturally occurring automicts. The results presented here could motivate similar studies in taxonomic groups with automictic reproductive polymorphism (e.g. nematodes, annelids, arthropods; Stenberg & Saura, 2009).

The Diachasma alloeum genome: a new tool for interspecific and intraspecific comparisons

This study also contributes to the growing genome resources for Hymenoptera, a speciose insect order (Chapter 2). The D. alloeum genome assembly has already been used in broad comparative genomic analyses (Geib et al., 2017) and a meta-analysis of available hymenopteran genomes (Branstetter et al., 2017). Here, a detailed accounting of key gene families will provide opportunities for exploring prominent questions related to Diachasma biology, as well as Biology more broadly. The first group is chemosensory genes, which may be involved in incipient genetic differentiation within and between species. The second group is oxidative phosphorylation genes, which are expected to experience different selective pressures in sexual and asexual lineages.

NEW AND OPEN QUESTIONS

Asexual reproduction: how does D. muliebre do it?

Multiple separate lines of evidence suggest the active maintenance of meiotic recombination in D. muliebre (Forbes et al., 2013; Tvedte et al., 2017; Chapter 4). However, meiosis has not yet been directly observed in this asexual wasp. Cytological analyses would provide unequivocal support for a precise mode of asexuality (e.g. Verma & Ruttner, 1983; Oxley et al., 2014). Alternatively, spatial patterns of genome heterozygosity in mother-daughter relationships could be used to infer reproductive mode (Pearcy et al., 2006), but our inability to rear D. muliebre in a laboratory environment at this time precludes the application of this method.

At the molecular level, the absence of REC8 in D. muliebre provides a potential mechanism for the origin of asexuality. In C. elegans, mutations in REC8 induce the separation of sister chromatids rather than homologous chromosomes during Meiosis I, and the failure to extrude a secondary polar body yields a diploid oocyte (Severson et al., 2009). However, this exact mechanism is unlikely to operate in D. muliebre, because it tends to preserve genomic heterozygosity, whereas the wasp genome is highly homozygous. D. muliebre has also lost DMC1, although the widespread absence of this meiosis-specific gene in Hymenoptera suggests that it is not essential for a successful meiotic program. Although I observed no significant changes in selective regimes acting on meiosis genes, it is possible that differences in transcriptional activity contribute to non-canonical meiotic pathways in D. muliebre. Elucidating the molecular roles of meiosis-related genes in D. muliebre through functional experimentation may inform our understanding of asexual origins in haplodiploid organisms more broadly, which may be ‘preadapted’ to reproductive mode switches (Neiman et al., 2014).

Are patterns of mutation accumulation in Diachasma affected by population structure?

In order to produce the most rigorous and unbiased inquiry into evolutionary rate comparisons, we conducted whole-genome surveys of Diachasma. However, by relying on NGS data produced from small numbers of wasps, we are ultimately limited in the extent to which these patterns reflect diversity in natural populations. Groups of individuals with shared genetic and demographic history function as a basis for evolution by natural selection. Because we ultimately want to know how reproductive mode affects genome dynamics at the species level, population data add power to differentiate between effects of sex and effects of demographic sampling. For example, mutation load may be exacerbated in small populations (Kimura et al., 1963); therefore, substantiated differences in mutation accumulation might not be reflective of sex loss per se, but rather variation in census sizes. Evolutionary rate patterns at the population level should recapitulate predictions from genomic alignments created using single wasp genome representatives. Deviations from this null expectation would implicate processes not related to reproductive strategy in the origin and maintenance of genomic variation.

Low levels of intraspecific variation in asexuals corroborate the conclusion that D. muliebre originated in a single transition to asexuality (Chapter 4). Also, two D. ferrugineum females sampled from distinct geographical locations consistently contained fewer SNPs relative to D. muliebre in regions with shared read coverage. Sampling additional wasp genomes should have two primary benefits. First, it should allow us to more precisely describe where and when the transition to asexuality likely took place. The current distribution of Diachasma species suggests a split in the American Southwest, perhaps representing their shared location during the last glacial maximum, but additional collections are required to assess whether there exist sexual wasps more closely related to D. muliebre than those from our current Iowa collections. Data obtained from this project would therefore contribute to Diachasma natural history and assist in characterization of the putative conditions associated with the origin of an asexual lineage. Second, it will allow us to discern whether mutational changes in Diachasma represent segregating polymorphisms or fixed differences. Estimating relative levels of standing genetic variation in D. muliebre would elucidate patterns in genome-wide heterozygosity as well as the persistence of deleterious variants in natural populations. As a separate prediction of asexual evolution, deleterious mutation accumulation is ultimately determined by the incidence of deleterious mutant fixation events. This proposal would add to the growing use of whole-genome datasets to examine accelerated evolution in asexuals, in which the lack of consensus perhaps is reflected in part by asexual lineage age (e.g. Ament-Velasquez et al., 2016; Brandt et al., 2017; Bast et al., 2018).

Are there distinct patterns of evolution in Diachasma OXPHOS genes?

While I observed accelerated evolution in the D. muliebre nuclear genome, the mitochondrial genome lacked evidence of this pattern (Chapter 4). Individual genes did not show significant differences in evolutionary rates, but the concatenated dataset supported a higher incidence of mutational changes in D. ferrugineum.

A recent study in New Zealand freshwater snails invoked the ability of sexual reproduction (specifically, recombination) to decrease selective interference between nuclear and mitochondrial genomes (Sharbrough et al., 2018). If recombination is retained in D. muliebre, it follows that effective purifying selection could be acting on these genes as important contributors to organismal fitness. Another potential explanation is that tight coevolution of mitochondrial and nuclear genes confers strong selective pressures for compensatory changes in the former as a response to naturally occurring mutations in the latter. Associations between evolutionary rates for OXPHOS genes have been demonstrated, particularly between mitochondrial genes and their physically interacting nuclear counterparts (e.g. Yan et al., 2018). Given the high genomic similarity between all species of Diachasma, it would be unsurprising if per-gene divergence across nuclear-encoded OXPHOS genes is low. Relative rate analyses would be complemented by computational modeling of any protein-folding changes that occur due to nonsynonymous substitutions.

What are the fitness consequences of accelerated mutation accumulation in D. muliebre?

The discovery that D. muliebre exhibits accelerated evolution, particularly at sites likely to change amino acid composition, provides a unique opportunity to test two non-mutually exclusive hypotheses regarding the evolutionary trajectory of these asexual wasps: a) increased genetic diversity confers short-term advantages to asexual wasps occupying diverse niches, and b) within-lineage recombination extends the potential for long-term survivability in D. muliebre, but this potential will ultimately be limited by the lack of outcrossing.

This effort would be greatly aided by obtaining population-level samples of sexual and asexual wasps, a need that is described above. Moreover, testing hypotheses for the actualized consequences of sex loss requires a connection between genotype and phenotype. While my study demonstrates increased incidence of putatively harmful mutations in D. muliebre, none have been explicitly connected to fitness outcomes. There are multiple avenues for advancement here:

First, mutations of large effect (e.g. nonsense, frameshift) could be robust predictors of differential fitness in D. muliebre. Deleterious SNPs were identified in multiple genes coding for sexual traits in an asexual wasp, the loss of which could relieve an unnecessary metabolic burden (Kraaijeveld et al., 2016). In this study, I analyzed genome-wide evolutionary patterns with a focus on complete, intact genes. Future efforts to identify large-effect mutations should use multiple sexual and asexual wasps to assess whether these mutations represent segregating polymorphisms or fixed differences. Although the affected genes might not display the hallmarks of pseudogenization given the recent origin of asexuality, they would be strong candidates for affecting asexual evolution.

Second, genes exhibiting the greatest genetic divergence between sexual and asexual Diachasma can be functionally annotated. Although analysis of genes evolving particularly rapidly in asexuals might not reveal a “smoking gun,” the bioinformatics workflow described in Chapter 4 effectively conducts pairwise data comparisons and can be used to characterize functional enrichment of genes at the tails of distributions. However, removing incomplete genes caused a sizeable drop-out from the original total found in D. alloeum, F. arisanus, and M. demolitor. As a result, GO-term enrichment of groups of genes at either extreme of the pairwise distance distribution would be uninformative. Sequencing female D. ferrugineum to generate DNASeq datasets would be useful here, and these data could additionally be used to study evolutionary patterns in non-protein-coding regions.
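For concreteness, a tail-enrichment test of the kind described above might look like the following sketch. It is not the Chapter 4 workflow: the function names, the 5% default tail fraction, and the use of a one-sided hypergeometric test are assumptions chosen for illustration. Genes are ranked by pairwise distance, the most divergent fraction is taken as the tail, and the probability of seeing at least the observed number of genes annotated with a given GO term in that tail is computed.

```python
from math import comb
from typing import Dict, Set


def tail_genes(distances: Dict[str, float], fraction: float = 0.05) -> Set[str]:
    """Return the genes in the upper tail (largest pairwise distances)."""
    ranked = sorted(distances, key=distances.get, reverse=True)
    n_tail = max(1, int(len(ranked) * fraction))
    return set(ranked[:n_tail])


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation of interest."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def term_enrichment(term_genes: Set[str], tail: Set[str], background: Set[str]) -> float:
    """One-sided enrichment p-value for a single GO term in the tail set."""
    annotated = term_genes & background
    observed = len(annotated & tail)
    return hypergeom_upper_tail(observed, len(annotated), len(tail), len(background))


if __name__ == "__main__":
    # Toy distances and a made-up annotation set, not real data.
    distances = {f"gene{i}": float(i) for i in range(10)}
    tail = tail_genes(distances, fraction=0.2)
    p_value = term_enrichment({"gene8", "gene9"}, tail, set(distances))
    print(sorted(tail), p_value)
```

The test is only as strong as the background set. When many genes drop out during filtering, the background shrinks and the tails contain few genes, which is one way to see why enrichment on the filtered gene set would be uninformative.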

Third, recent studies have identified likely targets for selection in Diachasma. Genetically divergent wasp populations display variation in life history traits that correspond to distinct niches, such as discrimination of fruit volatiles and eclosion time (Forbes et al., 2009; Forbes et al., 2013). D. muliebre may be able to adapt to new niches through evolution of gene content in families with particularly rapid evolution (e.g. chemosensory genes). The D. alloeum chemosensory inventory will be useful in this regard (Chapter 2); however, future studies should use de novo genome assembly approaches to identify novel genes in the other Diachasma wasps. Differences in life history traits might not be encoded in the genes themselves, but instead in their transcriptional activity. Indeed, populations of Rhagoletis hosts with distinct eclosion periods show differential regulation in genes involved in growth and developmental processes (Meyers et al., 2016). In addition to enabling sexual-asexual comparisons, increased wasp sampling will allow for a more detailed characterization of relationships between genomic variability and adaptive potential in D. muliebre, a species that has experienced a new host shift.


CONCLUSIONS AND FUTURE PROSPECTS

Why sex is maintained in natural populations remains a fundamental question in evolutionary biology. A critical accessory to this question is the understanding of factors contributing to the persistence of lineages that have lost the capacity for sexual reproduction. My thesis provides evidence for mutation accumulation in spite of recombination. While many of these mutations are likely contributors to a growing mutation load that will limit lineage survival, some may be beneficial, enabling the establishment of asexuals in new environments.

Although my thesis has already yielded some important insights into the consequences of asexuality, multiple questions remain: In which cases does mutation accumulation proceed, when considering alternate forms of asexuality? What are the ultimate fitness consequences of mutation accumulation in asexuals, and do these occur rapidly enough to impede asexual lineage longevity? Despite some barriers, the increasing cost-effectiveness of genomic sequencing will make investigations of closely related asexual and sexual lineages a feasible endeavor.


REFERENCES

Ament-Velásquez SL, et al. 2016. Population genomics of sexual and asexual lineages in fissiparous ribbon worms (Lineus, Nemertea): hybridization, polyploidy and the Meselson effect. Mol Ecol 25:3356–3369.

Bast J, et al. 2018. Consequences of asexuality in natural populations: insights from stick insects. Mol Biol Evol doi:10.1093/molbev/msy058.

Brandt A, et al. 2017. Effective purifying selection in ancient asexual oribatid mites. Nat Commun 8:873.

Branstetter M, et al. 2017. Genomes of the Hymenoptera. Curr Opin Insect Sci 25:65–75.

Burke GR, Walden KKO, Whitfield JB, Robertson HM, Strand MR. 2014. Widespread genome reorganization of an obligate virus mutualist. PLOS Genet 10:e1004660.

Engelstädter J. 2017. Asexual but not clonal: evolutionary processes in automictic populations. Genetics 206:993–1009.

Fradin H, et al. 2017. Genome architecture and evolution of a unichromosomal asexual nematode. Curr Biol 27:2928–2939.

Geib SM, Liang GH, Murphy TD, Sim SB. 2017. Whole genome sequencing of the braconid parasitoid wasp Fopius arisanus, an important biocontrol agent of pest tepritid [sic] fruit flies. G3-Genes Genom Genet 8:2407–2411.

Hollister JD, et al. 2015. Recurrent loss of sex is associated with accumulation of deleterious mutations in Oenothera. Mol Biol Evol 32:896–905.

Kimura M, Maruyama T, Crow JF. 1963. The mutation load in small populations. Genetics 48:1303–1312.


Kraaijeveld K, et al. 2016. Decay of Sexual Trait Genes in an Asexual Parasitoid Wasp. Genome Biol Evol 8:3685–3695.

Lindsey AR, et al. 2018. Comparative genomics of the miniature wasp and pest control agent Trichogramma pretiosum. BMC Biol 16:54.

Lovell JT, Williamson RJ, Wright SI, McKay JK, Sharbel TF. 2017. Mutation accumulation in an asexual relative of Arabidopsis. PLOS Genet 13:e1006550.

Meyers PJ, et al. 2016. Divergence of the diapause transcriptome in apple maggot flies: winter regulation and post-winter transcriptional repression. J Exp Biol 219:2613–2622.

Neale MJ, Keeney S. 2006. Clarifying the mechanics of DNA strand exchange in meiotic recombination. Nature 442:153–158.

Neiman M, Sharbel TF, Schwander T. 2014. Genetic causes of transitions from sexual reproduction to asexuality in plants and animals. J Evol Biol 27:1346–1359.

Nowell RW, et al. 2018. Comparative genomics of bdelloid rotifers: Insights from desiccating and nondesiccating species. PLOS Biol 16:e2004830.

Oxley PR, et al. 2014. The genome of the clonal raider ant Cerapachys biroi. Curr Biol 24:451–458.

Paland S, Lynch M. 2006. Transitions to asexuality result in excess amino acid substitutions. Science 311:990–992.

Pearcy M, Hardy O, Aron S. 2006. Thelytokous parthenogenesis and its consequences on inbreeding in an ant. Heredity 96:377–382.

Peters RS, et al. 2017. Evolutionary History of the Hymenoptera. Curr Biol 27:1013–1018.


Rand DM, Haney RA, Fry AJ. 2004. Cytonuclear coevolution: the genomics of cooperation. Trends Ecol Evol 19:645–653.

Schurko AM, Mazur DJ, Logsdon Jr JM. 2010. Inventory and phylogenomic distribution of meiotic genes in Nasonia vitripennis and among diverse arthropods. Insect Mol Biol 19:165–180.

Severson AF, Ling L, van Zuylen V, Meyer BJ. 2009. The axial element protein HTP-3 promotes cohesin loading and meiotic axis assembly in C. elegans to implement the meiotic program of chromosome segregation. Gene Dev 23:1763–1778.

Sharbrough J, Luse M, Boore JL, Logsdon Jr JM, Neiman M. 2018. Radical amino acid mutations persist longer in the absence of sex. Evolution 72:808–824.

Stenberg P, Saura A. 2009. Cytology of asexual animals. In Lost sex. ed. Schön I, Martens K, van Dijk P. pp. 63–74. Springer, New York, NY, USA.

Tvedte ES, Forbes AA, Logsdon Jr JM. 2017. Retention of core meiotic genes across diverse Hymenoptera. J Hered 108:791–806.

Tvedte ES, et al. Descriptive analyses of the genome of the parasitic wasp Diachasma alloeum, an emerging model for ecological speciation and transitions to asexual reproduction. In preparation.

Tvedte ES, Ward AC, Forbes AA, Logsdon Jr JM. Accelerated mutation accumulation across the genome of a young asexual lineage. In preparation.

van Wilgenburg E, Driessen G, Beukeboom LW. 2006. Single locus complementary sex determination in Hymenoptera: an "unintelligent" design? Front Zool 3:1.

Verma S, Ruttner F. 1983. Cytological analysis of the thelytokous parthenogenesis in the Cape honeybee (Apis mellifera capensis Escholtz). Apidologie 14:41–57.


Weinstock GM, et al. 2006. Insights into social insects from the genome of the honeybee Apis mellifera. Nature 443:931–949.

Werren JH, et al. 2010. Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science 327:343–348.

Yan Z, Ye GY, Werren J. 2018. Evolutionary rate coevolution between mitochondria and mitochondria-associated nuclear-encoded proteins in insects. bioRxiv, DOI: https://doi.org/10.1101/288456.
